-
Notifications
You must be signed in to change notification settings - Fork 859
PRELOADKRAKENUNIQ: Fix preloading mechanism to prevent (partial) double loading of DB #8397
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
params.krakenuniq_ram_chunk_size. Pass ram_chunk_size to database preload call and each classify call.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually I've realised --reload-size
and ram_chunk_size
should be removed entirely as it's an optional thing (--preload
makes sense in the context of the module, but `--preload-size is an optional override)
Krakenuniq command-line option `--preload-size` is non-mandatory. When it is required, it should be passed in `ext.args`. Also, remove the associated process input, `ram_chunk_size`.
Based on test runs, the initial krakenuniq call to load the full database into memory does not seem to be required when option |
@muniheart thanks for the testing! Yes, I agree your proposal makes sense - something like |
Btw @muniheart I strongly suggest you join the nf-core github organisation (https://nf-co.re/join#github) so you don't have to wait for approval for tests to run. If you are unable to join the slack, just let me know here and I can add you. |
When `--preload-size` option is passed in `ext.args`, preloading the database is not necessary and may unnecessarily consume memory.
0d0037b
to
b94764c
Compare
Error in tests is: |
Remove input `ram_chunk_size`. Pass `--preload-size $ram_chunk_size` in `ext.args` to load database in chunks. Database will be loaded into memory prior to classification if and only if option `--preload-size` is not present in `ext.args`.
Sorry, @jfy133, I made changes since your last review. After looking at generated BASH scripts I realized the script code would be clearer if I were to follow your method of using conditional logic in nextflow rather than passing it to BASH. I also made a patch to the module for testing which I will remove when the module PR is approved. |
@muniheart no worries! Firstly, please let me know if you're happy to be added to the nf-core github organisation so we don't have to have me approve the tests each time. Otherwise conceptually yes, the new method will be the best way, just the way you've written it by
|
I'm on slack and happy to be added to nf-core github organization. |
I can't insert nextflow comment as suggested, @jfy133, without closing/reopening script string. Changing to BASH comment may lead to confusion in case |
b18848e
to
7e10bd1
Compare
Great! I've added you to the organisation, you should have an email and also if you go to https://github.com/nf-core you should have invitation there too.
I'm not sure if I follow, sorry. if it's OK with you I can push directly to your branch what I mean. But I wait for your OK first before I do so (I could do it via multiple suggestions but might be ugly) |
You are free to push directly to my branch, @jfy133. |
Have a look now @muniheart and feel free to test it, but I checked teh |
Official test results are in. The .command.sh LGTM, too. Thanks for seeing this through, @jfy133! Apptainer> nf-core modules -g https://github.com/muniheart/modules.git --branch update-my-module test --profile singular
ity krakenuniq/preloadedkrakenuniq
,--./,-.
___ __ __ __ ___ /,-._.--~\
|\ | |__ __ / ` / \ |__) |__ } {
| \| | \__, \__/ | \ |___ \`-._,-`-,
`._,._,'
nf-core/tools version 3.2.1 - https://nf-co.re
INFO Generating nf-test snapshot
╭─────────────────────────────────────────────────── nf-test output ───────────────────────────────────────────────────╮
│ │
│ 🚀 nf-test 0.9.2 │
│ https://www.nf-test.com │
│ (c) 2021 - 2024 Lukas Forer and
8000
Sebastian Schoenherr │
│ │
│ Load .nf-test/plugins/nft-bam/0.5.0/nft-bam-0.5.0.jar │
│ Load .nf-test/plugins/nft-compress/0.1.0/nft-compress-0.1.0.jar │
│ Load .nf-test/plugins/nft-vcf/1.0.7/nft-vcf-1.0.7.jar │
│ Load .nf-test/plugins/nft-csv/0.1.0/nft-csv-0.1.0.jar │
│ Load .nf-test/plugins/nft-utils/0.0.3/nft-utils-0.0.3.jar │
│ │
│ Test Process KRAKENUNIQ_PRELOADEDKRAKENUNIQ │
│ │
│ │
│ Test [de01b0fc] 'sarscov2 - FASTA' PASSED (47.621s) │
│ Test [daed92ca] 'sarscov2 - FASTA - ramchunksize' PASSED (17.027s) │
│ Test [57e5bc3b] 'sarscov2 - Illumina FASTQ single' PASSED (15.898s) │
│ Test [b5727000] 'sarscov2 - Illumina FASTQ paired-end' PASSED (16.598s) │
│ Test [a87eef07] 'sarscov2 - FASTA - stub' PASSED (14.532s) │
│ Test [f426379c] 'sarscov2 - Illumina FASTQ single - stub' PASSED (15.077s) │
│ Test [318d4dfb] 'sarscov2 - Illumina FASTQ paired-end - stub' PASSED (14.943s) │
│ │
│ │
│ SUCCESS: Executed 7 tests in 141.824s │
│ │
╰──────────────────────────────────────────────────────── |
I don't know how I missed it, but the logic is reversed in
|
9h poop, copy paste error... Could you make s be PR? |
No problem. #8483 |
* Pass task.ext.args last to give them precedence over params.krakenuniq_ram_chunk_size. Pass ram_chunk_size to database preload call and each classify call. * Remove krakenuniq command-line option `--preload-size`. Krakenuniq command-line option `--preload-size` is non-mandatory. When it is required, it should be passed in `ext.args`. Also, remove the associated process input, `ram_chunk_size`. * Preload database if and only if option `--preload-size` is not present. When `--preload-size` option is passed in `ext.args`, preloading the database is not necessary and may unnecessarily consume memory. * Fix bug in database preloading. Remove input `ram_chunk_size`. Remove input `ram_chunk_size`. Pass `--preload-size $ram_chunk_size` in `ext.args` to load database in chunks. Database will be loaded into memory prior to classification if and only if option `--preload-size` is not present in `ext.args`. * Reference `preload_cmd` in BASH script instead of concatenating scripts. * Use environment variable to control preloading of database. * Don't export variable. * Restructure preload insertion * Add test that preload works * Fix logic * Fix logic in stub block. --------- Co-authored-by: James A. Fellows Yates <jfy133@gmail.com>
The default value for
--preload-size
isparams.krakenuniq_ram_chunk_size
. Allow the default value to be overridden by placing options passed intask.ext.args
on the command-line after default option.Option
--preload-size
must be passed to the database preload call and to the classify call for each input dataset.PR checklist
Closes #XXX
versions.yml
file.label
nf-core modules test <MODULE> --profile docker
nf-core modules test <MODULE> --profile singularity
nf-core modules test <MODULE> --profile conda
nf-core subworkflows test <SUBWORKFLOW> --profile docker
nf-core subworkflows test <SUBWORKFLOW> --profile singularity
nf-core subworkflows test <SUBWORKFLOW> --profile conda