8000 Handle empty input files when saving embedding plots by lazappi · Pull Request #193 · theislab/scib · GitHub
[go: up one dir, main page]
More Web Proxy on the site http://driver.im/
Skip to content

Handle empty input files when saving embedding plots #193

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Oct 23, 2020
Merged

Conversation

lazappi
Copy link
Member
@lazappi lazappi commented Oct 16, 2020

This was a bit trickier than I thought because I think the pipeline will complain if there is no output. What I have done is:

  1. Check if the input .h5ad file is empty, if yes:
  2. Delete any existing embedding output files
  3. Create empty output files

I think it works but I have only tested it very quickly so probably good if @danielStrobl can try running it before we merge.

* master:
  Filter genes without expression before reduce data in save_embeddings.py
  Add progress messages to merge_benchmarks.py
  Check for empty files when merging benchmarks
  Added missing import statement in metrics.py
  Isolated label fix (#187)
@LuckyMD
Copy link
Collaborator
LuckyMD commented Oct 17, 2020

Why delete the existing embedding output files? What should there be if there is an empty h5ad file?

@lazappi
Copy link
Member Author
lazappi commented Oct 19, 2020

I assumed an empty .h5ad meant a failed run and that any existing embedding output was from some previous run so should be deleted. They will just get overwritten will blank files otherwise which is pretty much equivalent. Unless there is some other reason for an empty .h5ad. Easy enough to change if we want.

@LuckyMD
Copy link
Collaborator
LuckyMD commented Oct 19, 2020

I might mean we were not careful in how we save our data if there is sth to delete. Did you come across this case? I would maybe throw an error if this happens.. what do you think? Typical Snakemake procedure would be to overwrite though.

@lazappi
Copy link
Member Author
lazappi commented Oct 19, 2020

Ah, I remember why I did this now. The Path().touch() command only changes the timestamp if the file exists so I had to delete it so out of date stuff was left there. I can probably come up with another way to overwrite with a empty file though.

Does anyone know what we do with the metrics if the .h5ad is empty? Guess we should probably do the same thing here.

@LuckyMD
Copy link
Collaborator
LuckyMD commented Oct 19, 2020

In the metrics case we just output a file that has no value for all metric outputs (I think it was this, and not NAs). That is then captured in the metrics aggregation script to say the method wasn't run. @mumichae might be able to confirm this.

I think the safest approach here would be to throw an error if the file exists. Then we can check manually and just run the snakemake command again if needed.

@mumichae
Copy link
Collaborator

In case the adata is empty, we would want empty output. If we had a non-empty output on the previous run, but for some reason the current run produces an empty adata, we wouldn't want to keep the previous embedding, so I'd overwrite any file that is output from the script but not captured by snakemake. But seeing as we provide all output filenames to the snakemake rule, snakemake should remove these files if they exist before a rerun. So all that you'll need to do is fix your script to output empty files.

@mumichae
Copy link
Collaborator

So to clarify, when we have failed runs, they result in empty .h5ad files. That means an output file will be created. Snakemake doesn't execute a rule if there is no input for it. If you require snakemake to create an output file that it can't create, it'll throw a missing input error.

When input AnnData is empty in save_embeddings.py
@lazappi
Copy link
Member Author
lazappi commented Oct 22, 2020

Ok, I changed how empty output files are created. I think this is more what we want but if not I'm probably a bit confused about what to do.

Copy link
Collaborator
@LuckyMD LuckyMD left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would say: yes... would you agree @mumichae ?

@mumichae
Copy link
Collaborator

Yes, this looks fine

@lazappi lazappi merged commit cd7433a into master Oct 23, 2020
@LuckyMD LuckyMD deleted the embedding-plots branch December 3, 2020 11:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants
0