Add 'final reads for profiling' saving mechanism by jfy133 · Pull Request #272 · nf-core/taxprofiler · GitHub

Add 'final reads for profiling' saving mechanism #272

Merged on Apr 21, 2023 (16 commits)

Conversation

jfy133 (Member) commented Mar 17, 2023

Coming in two stages:

  • Calculating whether a sample in a samplesheet has multiple runs or not; adding this to meta
  • modules.conf logic to select the 'final' FASTQ that goes into profiling for publishing in --outdir
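
The first stage can be sketched roughly as below. This is only an illustration in Python, not the pipeline's Groovy code: the column names `sample` and `run_accession` and the meta key `is_multirun` are assumptions made for the example.

```python
import csv
import io
from collections import Counter


def flag_multirun(samplesheet_text):
    """Return one meta dict per samplesheet row, with an `is_multirun` flag."""
    rows = list(csv.DictReader(io.StringIO(samplesheet_text)))
    # Count how many rows (runs) each sample name has.
    runs_per_sample = Counter(row["sample"] for row in rows)
    return [
        {
            "id": row["sample"],
            "run_accession": row["run_accession"],
            # Hypothetical meta key: True when the sample spans several runs.
            "is_multirun": runs_per_sample[row["sample"]] > 1,
        }
        for row in rows
    ]


# Made-up samplesheet: 2612 has two runs, 2613 has one.
example = """sample,run_accession
2612,run1
2612,run2
2613,run1
"""
metas = flag_multirun(example)
```

With a flag like this in meta, a later publishing rule can tell apart samples that will pass through run merging from those that will not.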

Closes #262

PR checklist

  • This comment contains a description of changes (with reason).
  • If you've fixed a bug or added code that should be tested, add tests!
  • If you've added a new tool - have you followed the pipeline conventions in the contribution docs
  • If necessary, also make a PR on the nf-core/taxprofiler branch on the nf-core/test-datasets repository.
  • Make sure your code lints (nf-core lint).
  • Ensure the test suite passes (nextflow run . -profile test,docker --outdir <OUTDIR>).
  • Usage Documentation in docs/usage.md is updated.
  • Output Documentation in docs/output.md is updated.
  • CHANGELOG.md is updated.
  • README.md is updated (including new tool citations and authors/contributors).

@jfy133 jfy133 marked this pull request as draft March 17, 2023 21:33
@jfy133 jfy133 linked an issue Mar 17, 2023 that may be closed by this pull request
github-actions bot commented Mar 17, 2023

nf-core lint overall result: Passed ✅ ⚠️

Posted for pipeline commit ee76a5f

✅ 156 tests passed
❗ 4 tests had warnings

❗ Test warnings:

  • pipeline_todos - TODO string in README.md: Add citation for pipeline after first release. Uncomment lines below and update Zenodo doi and badge at the top of this file.
  • pipeline_todos - TODO string in WorkflowMain.groovy: Add Zenodo DOI for pipeline after first release
  • pipeline_todos - TODO string in methods_description_template.yml: #Update the HTML below to your prefered methods description, e.g. add publication citation for this pipeline
  • schema_lint - Schema 'description' should be 'Taxonomic classification and profiling of shotgun metagenomic data'
    Found: 'Taxonomic profiling of shotgun metagenomic data'

✅ Tests passed:

Run details

  • nf-core/tools version 2.7.2
  • Run at 2023-04-21 11:48:07

jfy133 (Member, Author) commented Mar 23, 2023

Tests

Expected in all tests:

  • only 2612, 2613 and ERR (as they are the only FASTQ files submitted)

TODO: Documentation

1

✅ preprocessing
✅ complexity / filtlong
✅ host removal
✅ run merging

Expect: in analysis_ready_fastqs only run merged files (3 total)

nextflow run ../main.nf -profile singularity,test_noprofiling --input data/samplesheet.csv --databases data/database.csv --hostremoval_reference data/genome.fasta --outdir ./results -ansi-log false --publish_dir_mode 'symlink' -resume --save_analysis_ready_fastqs

PASSED

2

✅ preprocessing
✅ complexity / filtlong
✅ host removal
❌ run merging

Expect: in analysis_ready_fastqs only host removed files (5 total)

nextflow run ../main.nf -profile singularity,test_noprofiling --input data/samplesheet.csv --databases data/database.csv --hostremoval_reference data/genome.fasta --outdir ./results -ansi-log false --publish_dir_mode 'symlink' -resume --save_analysis_ready_fastqs --perform_runmerging false

PASSED

3

✅ preprocessing
✅ complexity / filtlong
❌ host removal
❌ run merging

Expect: in analysis_ready_fastqs only complexity or filtlong (_filtered) files (5 files)

bbduk

nextflow run ../main.nf -profile singularity,test_noprofiling --input data/samplesheet.csv --databases data/database.csv --hostremoval_reference data/genome.fasta --outdir ./results -ansi-log false --publish_dir_mode 'symlink' -resume --save_analysis_ready_fastqs --perform_shortread_hostremoval false --perform_longread_hostremoval false --perform_runmerging false

PASSED

prinseq

nextflow run ../main.nf -profile singularity,test_noprofiling --input data/samplesheet.csv --databases data/database.csv --hostremoval_reference data/genome.fasta --outdir ./results -ansi-log false --publish_dir_mode 'symlink' -resume --save_analysis_ready_fastqs --perform_shortread_hostremoval false --perform_longread_hostremoval false --perform_runmerging false --shortread_complexityfilter_tool prinseqplusplus

PASSED

3

✅ preprocessing
❌ complexity / filtlong
❌ host removal
❌ run merging

Expect: in analysis_ready_fastqs only preprocessed reads (5 total, or 8 if read collapsing is not performed)

fastp

Expect 5

nextflow run ../main.nf -profile singularity,test_noprofiling --input data/samplesheet.csv --databases data/database.csv --hostremoval_reference data/genome.fasta --outdir ./results -ansi-log false --publish_dir_mode 'symlink' -resume --save_analysis_ready_fastqs --perform_runmerging false --perform_shortread_hostremoval false --perform_longread_hostremoval false --perform_shortread_complexityfilter false --longread_qc_skipqualityfilter false

PASSED

Expect 8(?)

nextflow run ../main.nf -profile singularity,test_noprofiling --input data/samplesheet.csv --databases data/database.csv --hostremoval_reference data/genome.fasta --outdir ./results -ansi-log false --publish_dir_mode 'symlink' -resume --save_analysis_ready_fastqs --perform_runmerging false --perform_shortread_hostremoval false --perform_longread_hostremoval false --perform_shortread_complexityfilter false --longread_qc_skipqualityfilter false --shortread_qc_mergepairs false

PASSED

and

adapterremoval

Expect 5

nextflow run ../main.nf -profile singularity,test_noprofiling --input data/samplesheet.csv --databases data/database.csv --hostremoval_reference data/genome.fasta --outdir ./results -ansi-log false --publish_dir_mode 'symlink' -resume --save_analysis_ready_fastqs --perform_runmerging false --perform_shortread_hostremoval false --perform_longread_hostremoval false --perform_shortread_complexityfilter false --longread_qc_skipqualityfilter false --shortread_qc_tool 'adapterremoval'

PASSED

Expect 8

nextflow run ../main.nf -profile singularity,test_noprofiling --input data/samplesheet.csv --databases data/database.csv --hostremoval_reference data/genome.fasta --outdir ./results -ansi-log false --publish_dir_mode 'symlink' -resume --save_analysis_ready_fastqs --perform_runmerging false --perform_shortread_hostremoval false --perform_longread_hostremoval false --perform_shortread_complexityfilter false --longread_qc_skipqualityfilter false --shortread_qc_tool 'adapterremoval' --shortread_qc_mergepairs false

PASSED

4

✅ preprocessing
❌ complexity / filtlong
✅ host removal
❌ run merging

Expect: in analysis_ready_fastqs host removed reads (*unmapped) (5 files)

nextflow run ../main.nf -profile singularity,test_noprofiling --input data/samplesheet.csv --databases data/database.csv --hostremoval_reference data/genome.fasta --outdir ./results -ansi-log false --publish_dir_mode 'symlink' -resume --save_analysis_ready_fastqs --perform_runmerging false --perform_shortread_complexityfilter false --longread_qc_skipqualityfilter false

PASSED

5

✅ preprocessing
❌ complexity / filtlong
✅ host removal
✅ run merging

Expect: in analysis_ready_fastqs run merged reads (3 files)

nextflow run ../main.nf -profile singularity,test_noprofiling --input data/samplesheet.csv --databases data/database.csv --hostremoval_reference data/genome.fasta --outdir ./results -ansi-log false --publish_dir_mode 'symlink' -resume --save_analysis_ready_fastqs --perform_shortread_complexityfilter false --longread_qc_skipqualityfilter false

PASSED

6

✅ preprocessing
❌ complexity / filtlong
❌ host removal
✅ run merging

Expect: in analysis_ready_fastqs run merged reads (3)

nextflow run ../main.nf -profile singularity,test_noprofiling --input data/samplesheet.csv --databases data/database.csv --hostremoval_reference data/genome.fasta --outdir ./results -ansi-log false --publish_dir_mode 'symlink' -resume --save_analysis_ready_fastqs --perform_shortread_complexityfilter false --longread_qc_skipqualityfilter false --perform_shortread_hostremoval false --perform_longread_hostremoval false

PASSED

7

❌ preprocessing
❌ complexity / filtlong
✅ host removal
✅ run merging

Expect: in analysis_ready_fastqs run merged reads (5 - 2 samples, 2612 and 2613 unmerged)

nextflow run ../main.nf -profile singularity,test_noprofiling --input data/samplesheet.csv --databases data/database.csv --hostremoval_reference data/genome.fasta --outdir ./results -ansi-log false --publish_dir_mode 'symlink' -resume --save_analysis_ready_fastqs --perform_shortread_qc false --perform_longread_qc false --perform_shortread_complexityfilter false --longread_qc_skipqualityfilter false

PASSED

8

❌ preprocessing
✅ complexity
❌ host removal
✅ run merging

Expect: in analysis_ready_fastqs run merged reads (4). No nanopore expected as part of preprocessing

nextflow run ../main.nf -profile singularity,test_noprofiling --input data/samplesheet.csv --databases data/database.csv --hostremoval_reference data/genome.fasta --outdir ./results -ansi-log false --publish_dir_mode 'symlink' -resume --save_analysis_ready_fastqs --perform_shortread_qc false --perform_longread_qc false --perform_shortread_hostremoval false --perform_longread_hostremoval false

PASSED Missing ERR* from filtlong?

9

❌ preprocessing
✅ complexity / filtlong
✅ host removal
✅ run merging

Expect: in analysis_ready_fastqs run merged reads (5 - run-cat but not merged)

nextflow run ../main.nf -profile singularity,test_noprofiling --input data/samplesheet.csv --databases data/database.csv --hostremoval_reference data/genome.fasta --outdir ./results -ansi-log false --publish_dir_mode 'symlink' -resume --save_analysis_ready_fastqs --perform_shortread_qc false --perform_longread_qc false

PASSED

10

❌ preprocessing
❌ complexity / filtlong
❌ host removal
✅ run merging

Expect: in analysis_ready_fastqs run merged reads (only run merged reads; all others are as provided by the user)

nextflow run ../main.nf -profile singularity,test_noprofiling --input data/samplesheet.csv --databases data/database.csv --hostremoval_reference data/genome.fasta --outdir ./results -ansi-log false --publish_dir_mode 'symlink' -resume --save_analysis_ready_fastqs --perform_shortread_qc false --perform_longread_qc false --perform_shortread_complexityfilter false --longread_qc_skipqualityfilter false --perform_shortread_hostremoval false --perform_longread_hostremoval false

PASSED
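
The "Expect: ... (N files)" checks in the tests above could be done by counting the FASTQ files that were published. A minimal sketch in Python, assuming the analysis_ready_fastqs directory sits directly under --outdir and that the files use common FASTQ suffixes; the demo file names are made up:

```python
import tempfile
from pathlib import Path

# Assumed FASTQ file endings; adjust to whatever the pipeline actually writes.
FASTQ_SUFFIXES = (".fastq.gz", ".fq.gz", ".fastq", ".fq")


def count_analysis_ready_fastqs(outdir):
    """Count FASTQ files (or symlinks to them) under <outdir>/analysis_ready_fastqs."""
    ready_dir = Path(outdir) / "analysis_ready_fastqs"
    return sum(
        1
        for path in ready_dir.rglob("*")
        if path.name.endswith(FASTQ_SUFFIXES)
        # Symlinks are counted too, since the tests use --publish_dir_mode 'symlink'.
        and (path.is_file() or path.is_symlink())
    )


# Demo on a throwaway directory with three FASTQ files and one unrelated file.
with tempfile.TemporaryDirectory() as tmp:
    ready = Path(tmp) / "analysis_ready_fastqs"
    ready.mkdir()
    for name in ("2612.fastq.gz", "2613.fastq.gz", "ERR.fastq.gz", "notes.txt"):
        (ready / name).touch()
    demo_count = count_analysis_ready_fastqs(tmp)
```

For a real run, `count_analysis_ready_fastqs("./results")` would be compared against the number stated in each test.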

@jfy133 jfy133 marked this pull request as ready for review April 15, 2023 10:54
jfy133 (Member, Author) commented Apr 15, 2023

Note that this may not be the optimal implementation. For example, one could determine at the beginning of the pipeline which step is the 'final' one, and assign a numeric ID in meta that is then evaluated by the publishDir of the analysis_ready_fastqs directory in modules.conf, depending on which 'checkpoint' the number matches.

However, this wasn't working when I first tried it, and it would require having the logic in multiple places. So I opted for this more verbose approach.
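
For reference, the numeric-checkpoint alternative described above might look roughly like this. It is a sketch in Python rather than Nextflow config, and it is not what was merged: the step names, their numbering, and the `is_multirun` argument are all assumptions made for illustration.

```python
# Ordered checkpoints; a higher number means a later step in the pipeline.
CHECKPOINTS = {
    "raw": 0,
    "preprocessing": 1,
    "complexity_filter": 2,
    "host_removal": 3,
    "run_merging": 4,
}


def final_checkpoint(enabled, is_multirun):
    """Return the name of the last enabled step that applies to this sample."""
    last = "raw"
    for step in CHECKPOINTS:  # dicts keep insertion order
        if step == "raw":
            continue
        # Assumption: run merging only changes samples that have several runs.
        if step == "run_merging" and not is_multirun:
            continue
        if step in enabled:
            last = step
    return last


def should_publish(step, enabled, is_multirun):
    """True when `step` is the checkpoint whose output goes into profiling."""
    final = final_checkpoint(enabled, is_multirun)
    return CHECKPOINTS[step] == CHECKPOINTS[final]
```

Each publishing rule would then only compare its own checkpoint number with the one stored in meta, instead of re-checking every combination of parameters.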

@jfy133 jfy133 requested a review from a team April 15, 2023 10:56
sofstam (Collaborator) left a comment

Since CAT_FASTQ is removed, shall we update the taxprofiler tube as well?

Co-authored-by: Sofia Stamouli <91951607+sofstam@users.noreply.github.com>
jfy133 (Member, Author) commented Apr 20, 2023

CAT_FASTQ is not removed, just moved (further up in the config file :) )

@jfy133 jfy133 requested a review from sofstam April 20, 2023 13:01
Co-authored-by: Sofia Stamouli <91951607+sofstam@users.noreply.github.com>
@jfy133 jfy133 requested a review from sofstam April 21, 2023 11:44
@jfy133 jfy133 merged commit 2140928 into dev Apr 21, 2023
@jfy133 jfy133 deleted the final-reads-saving branch April 21, 2023 13:21
Successfully merging this pull request may close these issues.

Save run-merged reads for single run samples