questions about some commands · Issue #17 · formbio/FLAG · GitHub

questions about some commands #17


Open
spoonbender76 opened this issue Apr 12, 2024 · 12 comments

Comments

@spoonbender76
spoonbender76 commented Apr 12, 2024

I am trying the new big Singularity image (81.5 GB), and from the documentation I have questions about some commands.

  1. --overlay $(pwd)/tempdir will trigger

FATAL: container creation failed: while setting overlay session layout: only root user can use sandbox overlay in setuid mode

so it still requires root for Singularity users. Is this necessary?

  2. Even with -w miniprot, it shows:

chosen protein algos: None

Also, I don't know whether this error can be ignored or whether it has any effect:

WARN: Unknown directive runOptions for process pasa
[b7/9cbcde] NOTE: Process pasa(1) terminated with an error exit status (2) -- Error is ignored

[screenshot]

  3. Just to clarify: with singularity run, you don't need to put Liftoff in -z even if Liftoff is desired?
    The documentation says:
    If Liftoff is desired the above command can be modified such as below:
singularity run --bind $(pwd):/data --bind $(pwd)/tempdir:/tmp \
--overlay $(pwd)/tempdir  singularity_flag.image \
-g Erynnis_tages-GCA_905147235.1-softmasked.fa -r curatedButterflyRNA.fa \
-p curatedButterflyProteins.fa -f GCF_009731565.1_Dplex_v4_genomic.fa \
-a GCF_009731565.1_Dplex_v4_genomic.gff -m skip -t true \
-l lepidoptera_odb10 \
-z Helixer,helixer_trained_augustus -q vertebrate -s small -n Eynnis_tages \
-w miniprot -y normal -p singularity -o outputdir -u singularity

image
In the chosen annotation algos, Liftoff is absent.

  4. Besides, how do I update Helixer to the latest version, v0.3.3?
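
On point 1, a possible workaround (my assumption, not something from the FLAG docs) is to use a writable ext3 image overlay instead of a sandbox directory overlay; the error message itself says only sandbox overlays are restricted to root in setuid mode, and Singularity/Apptainer can create image overlays as a regular user:

```shell
# Create a 1 GiB writable ext3 overlay image (Singularity/Apptainer >= 3.8).
# This is a generic Singularity technique, not a FLAG-documented option.
singularity overlay create --size 1024 tempdir_overlay.img

# Point --overlay at the image file instead of the sandbox directory.
singularity run --bind $(pwd):/data --bind $(pwd)/tempdir:/tmp \
  --overlay tempdir_overlay.img singularity_flag.image ...
```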
@wtroy2
Contributor
wtroy2 commented Apr 12, 2024 via email

@wtroy2
Contributor
wtroy2 commented Apr 12, 2024 via email

@wtroy2
Contributor
wtroy2 commented Apr 12, 2024

The protein algo has been fixed, as well as the instructions for Singularity with Liftoff. That was a typo; thank you for catching those!

@spoonbender76
Author

I set up Docker according to the instructions and have been running the Docker example for over 15 hours, but I don't see any signs of it running; there are no splign/augustus processes visible. I had to Ctrl-C since it appeared to be stalled.

nextflow run main.nf -w workdir/ --output outputdir/ \
    --genome examples/Erynnis_tages-GCA_905147235.1-softmasked.fa --rna examples/curatedButterflyRNA.fa \
    --proteins examples/curatedButterflyProteins.fa --fafile examples/GCF_009731565.1_Dplex_v4_genomic.fa \
    --gtffile examples/GCF_009731565.1_Dplex_v4_genomic.gff --masker skip --transcriptIn true \
    --lineage lepidoptera_odb10 --annotationalgo Liftoff,Helixer,helixer_trained_augustus \
    --helixerModel invertebrate --externalalgo input_transcript,input_proteins --size small --proteinalgo miniprot \
    --speciesScientificName Eynnis_tages \
    --funcAnnotProgram eggnog --eggnogDB eggnogDB.tar.gz -profile docker

[screenshot]

By the way, the flags --fafile and --gtffile appear to be specified twice in the Example Docker Run commands.

If Liftoff is desired the above command can be modified such as below:

nextflow run main.nf -w workdir/ --output outputdir/ \
--genome examples/Erynnis_tages-GCA_905147235.1-softmasked.fa --rna examples/curatedButterflyRNA.fa \
--proteins examples/curatedButterflyProteins.fa --fafile examples/GCF_009731565.1_Dplex_v4_genomic.fa \
--gtffile examples/GCF_009731565.1_Dplex_v4_genomic.gff --masker skip --transcriptIn true \
--lineage lepidoptera_odb10 --annotationalgo Liftoff,Helixer,helixer_trained_augustus \
--helixerModel invertebrate --externalalgo input_transcript,input_proteins --size small --proteinalgo miniprot \
--speciesScientificName Eynnis_tages --fafile examples/monarchGenome.fa --gtffile examples/monarchAnnotation.gff3 \
--funcAnnotProgram eggnog --eggnogDB eggnogDB.tar.gz -profile docker

nextflow run main.nf -w workdir/ --output outputdir/ \
--genome examples/Erynnis_tages-GCA_905147235.1-softmasked.fa --rna examples/curatedButterflyRNA.fa \
--proteins examples/curatedButterflyProteins.fa --fafile examples/GCF_009731565.1_Dplex_v4_genomic.fa \
--gtffile examples/GCF_009731565.1_Dplex_v4_genomic.gff --masker skip --transcriptIn true \
--lineage lepidoptera_odb10 --annotationalgo Liftoff,Helixer,helixer_trained_augustus \
--helixerModel invertebrate --externalalgo input_transcript,input_proteins --size small \
--proteinalgo miniprot --speciesScientificName Eynnis_tages --fafile examples/monarchGenome.fa \
--gtffile examples/monarchAnnotation.gff3 --runMode laptop --funcAnnotProgram eggnog \
--eggnogDB eggnogDB.tar.gz -profile docker_small

@wtroy2
Contributor
wtroy2 commented Apr 17, 2024

This one is interesting. It looks like Splign has stalled out. This is not usually a process that runs into issues, so I'm very unsure why it's stalled. If you have the process logs for that one, feel free to send them.

Augustus and all of the rest of the processes are waiting for Splign to finish before they run, as the Splign outputs feed into the next processes.

And thank you for noticing this; I will fix it ASAP. I'm currently working on making the Singularity version all run from one large container.

@spoonbender76
Author

Thank you for the quick reply! I ran the Docker Liftoff example again, but Nextflow keeps printing lines to .nextflow.log even though no splign processes are running. I also checked the workdir and found that splign.gff3 is empty.

Apr-17 15:22:17.159 [Task monitor] DEBUG n.processor.TaskPollingMonitor - !! executor Local > tasks to be completed: 1 -- submitted tasks are shown below
~> TaskHandler[id: 7; name: splign (1); status: RUNNING; exit: -; error: -; workDir: /home/cnrri01/ssd/FLAG/workdir/b6/4e8fca8b11ebc069bc1491ba514ae6]
tree -s -D -h /home/cnrri01/ssd/FLAG/workdir/b6/4e8fca8b11ebc069bc1491ba514ae6/
[4.0K Apr 17 11:15]  /home/cnrri01/ssd/FLAG/workdir/b6/4e8fca8b11ebc069bc1491ba514ae6/
|-- [4.0K Apr 17 11:21]  1_folder
|   |-- [4.0K Apr 17 11:15]  _SplignLDS2_
|   |   `-- [ 17M Apr 17 11:15]  splign.lds2db
|   |-- [452K Apr 17 11:19]  cdna.compartments
|   |-- [127M Apr 17 11:13]  cdna.fa
|   |-- [2.7M Apr 17 11:16]  cdna.fa.ndb
|   |-- [7.7M Apr 17 11:16]  cdna.fa.nhr
|   |-- [586K Apr 17 11:16]  cdna.fa.nin
|   |-- [ 497 Apr 17 11:16]  cdna.fa.njs
|   |-- [195K Apr 17 11:16]  cdna.fa.nog
|   |-- [1.1M Apr 17 11:16]  cdna.fa.nos
|   |-- [586K Apr 17 11:16]  cdna.fa.not
|   |-- [ 30M Apr 17 11:16]  cdna.fa.nsq
|   |-- [ 16K Apr 17 11:16]  cdna.fa.ntf
|   |-- [195K Apr 17 11:16]  cdna.fa.nto
|   |-- [319M Apr 17 11:15]  genome.fa
|   |-- [ 32K Apr 17 11:15]  genome.fa.ndb
|   |-- [4.4K Apr 17 11:15]  genome.fa.nhr
|   |-- [ 580 Apr 17 11:15]  genome.fa.nin
|   |-- [ 516 Apr 17 11:15]  genome.fa.njs
|   |-- [ 192 Apr 17 11:15]  genome.fa.nog
|   |-- [ 573 Apr 17 11:15]  genome.fa.nos
|   |-- [ 488 Apr 17 11:15]  genome.fa.not
|   |-- [ 79M Apr 17 11:15]  genome.fa.nsq
|   |-- [ 16K Apr 17 11:15]  genome.fa.ntf
|   |-- [ 164 Apr 17 11:15]  genome.fa.nto
|   |-- [ 10M Apr 17 11:21]  splign.asn
|   |-- [   0 Apr 17 11:21]  splign.gff3
|   |-- [ 78K Apr 17 11:21]  splign.log
|   `-- [965K Apr 17 11:21]  splign.out
|-- [4.0K Apr 17 11:20]  2_folder
|   |-- [4.0K Apr 17 11:15]  _SplignLDS2_
|   |   `-- [ 14M Apr 17 11:15]  splign.lds2db
|   |-- [786K Apr 17 11:19]  cdna.compartments
|   |-- [108M Apr 17 11:13]  cdna.fa
|   |-- [2.1M Apr 17 11:16]  cdna.fa.ndb
|   |-- [6.0M Apr 17 11:16]  cdna.fa.nhr
|   |-- [467K Apr 17 11:16]  cdna.fa.nin
|   |-- [ 497 Apr 17 11:16]  cdna.fa.njs
|   |-- [156K Apr 17 11:16]  cdna.fa.nog
|   |-- [895K Apr 17 11:16]  cdna.fa.nos
|   |-- [467K Apr 17 11:16]  cdna.fa.not
|   |-- [ 25M Apr 17 11:16]  cdna.fa.nsq
|   |-- [ 16K Apr 17 11:16]  cdna.fa.ntf
|   |-- [156K Apr 17 11:16]  cdna.fa.nto
|   |-- [319M Apr 17 11:15]  genome.fa
|   |-- [ 32K Apr 17 11:15]  genome.fa.ndb
|   |-- [4.4K Apr 17 11:15]  genome.fa.nhr
|   |-- [ 580 Apr 17 11:15]  genome.fa.nin
|   |-- [ 516 Apr 17 11:15]  genome.fa.njs
|   |-- [ 192 Apr 17 11:15]  genome.fa.nog
|   |-- [ 573 Apr 17 11:15]  genome.fa.nos
|   |-- [ 488 Apr 17 11:15]  genome.fa.not
|   |-- [ 79M Apr 17 11:15]  genome.fa.nsq
|   |-- [ 16K Apr 17 11:15]  genome.fa.ntf
|   |-- [ 164 Apr 17 11:15]  genome.fa.nto
|   |-- [ 13M Apr 17 11:20]  splign.asn
|   |-- [   0 Apr 17 11:20]  splign.gff3
|   |-- [239K Apr 17 11:20]  splign.log
|   `-- [1.2M Apr 17 11:20]  splign.out
|-- [319M Apr 17 11:13]  Erynnis_tages-GCA_905147235.1-softmasked.fa
|-- [234M Apr 17 11:13]  cdna.fa
|-- [234M Apr 17 11:13]  formatted_curatedButterflyRNA.fa
|-- [319M Apr 17 11:13]  genome.fa
|-- [ 32K Apr 17 11:13]  genome.fa.ndb
|-- [4.4K Apr 17 11:13]  genome.fa.nhr
|-- [ 580 Apr 17 11:13]  genome.fa.nin
|-- [ 516 Apr 17 11:13]  genome.fa.njs
|-- [ 192 Apr 17 11:13]  genome.fa.nog
|-- [ 573 Apr 17 11:13]  genome.fa.nos
|-- [ 488 Apr 17 11:13]  genome.fa.not
|-- [ 79M Apr 17 11:13]  genome.fa.nsq
|-- [ 16K Apr 17 11:13]  genome.fa.ntf
|-- [ 164 Apr 17 11:13]  genome.fa.nto
|-- [ 283 Apr 17 11:15]  parallel_001.txt
|-- [ 283 Apr 17 11:15]  parallel_002.txt
`-- [  44 Apr 17 11:15]  parallel_commands.txt

5 directories, 73 files

I've attached some log files here for reference. Please let me know if you need any other information.
nextflow.log
command.log
command.out.txt
command.err.txt
splign.log
splign.out.txt
parallel_001.txt
parallel_002.txt
parallel_commands.txt
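
For anyone hitting the same stall: a generic way to inspect a seemingly stuck Nextflow task (not FLAG-specific) is to look at the hidden .command.* files in its work directory, here the one named in the task monitor log above:

```shell
# Nextflow stages each task in a hash-named work directory alongside
# hidden .command.* files describing exactly what the task runs.
cd /home/cnrri01/ssd/FLAG/workdir/b6/4e8fca8b11ebc069bc1491ba514ae6/
cat .command.sh          # the script Nextflow executes for this task
tail .command.log        # combined stdout/stderr captured so far
ps aux | grep -i splign  # check whether splign is actually consuming CPU
```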

@wtroy2
Contributor
wtroy2 commented Apr 23, 2024

I updated the ncbiclibraries container that splign runs on to hopefully fix the problem you are having.

Tested on a completely fresh Debian system on which I had just installed Nextflow and Docker, and it ran fine:
[screenshot, 2024-04-23]

So I'd try re-pulling the containers, specifically ghcr.io/formbio/flag_ncbiclibraries:latest, and rerunning; fingers crossed it works. The Docker route should be much more stable than Singularity.
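
A minimal way to force the refresh (a sketch using the image name given above) is:

```shell
# Pull the rebuilt image so the locally cached copy is replaced
docker pull ghcr.io/formbio/flag_ncbiclibraries:latest

# Verify the creation date/digest changed before rerunning the pipeline
docker image ls ghcr.io/formbio/flag_ncbiclibraries
```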

@spoonbender76
Author

I haven't used FLAG in a while, but have you tried running it with the
--annotationalgo Liftoff,Helixer,helixer_trained_augustus flag?
I noticed that all the successful run screenshots seem to be without Liftoff in the --annotationalgo flag.

@wtroy2
Copy link
Contributor
wtroy2 commented May 11, 2024

Ya I have. I will do a run tomorrow and add a screenshot to the docs.

@wtroy2
Copy link
Contributor
wtroy2 commented May 13, 2024

A screenshot of it working has been added to the readme.md file on the GitHub main branch. This should also help users as a reference.

@spoonbender76
Author

[screenshot]
Thank you for the help! I have completed a FLAG run using Apptainer, but I encountered significant delays due to the time spent downloading BUSCO lineage files, likely caused by my connection issues. Is there a way to specify a local directory for pre-downloaded BUSCO lineage files, and use the --offline option for all BUSCO commands within the FLAG pipeline?

Details:
I tried to create a FLAG environment with Apptainer using the following commands:

conda create -n flag apptainer
conda activate flag
cp /etc/apptainer/apptainer.config $CONDA_PREFIX/etc/apptainer/

However, the folder /etc/apptainer/ didn't exist. So I installed Apptainer v1.31 manually, re-pulled all containers, and reran the pipeline. Unfortunately, it got stuck at the CombineAndFilter step. On closer inspection, I found that this was primarily due to my very slow connection, which took hours to download the BUSCO file lepidoptera_odb10.tar.gz. I experienced the same issue with Docker, so I wonder if it's possible to use the --offline option for all BUSCO commands within the FLAG pipeline.
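
For what it's worth, standalone BUSCO (v5) already supports this pattern with --download_path and --offline; whether FLAG can pass these flags through is the open question, but on the BUSCO side it would look roughly like this (input and output names here are illustrative):

```shell
# One-time: fetch the lineage on a machine/time with good connectivity
busco --download lepidoptera_odb10 --download_path ./busco_downloads

# Afterwards: run entirely from the local copy, never touching the network
busco -i genome.fa -m genome -l lepidoptera_odb10 \
  --download_path ./busco_downloads --offline -o busco_out
```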

@wtroy2
Contributor
wtroy2 commented May 21, 2024 via email
