Final @jfy133 review fixes for Bouncy Basenji release by jfy133 · Pull Request #531 · nf-core/taxprofiler · GitHub
Merged 5 commits on Sep 25, 2024
17 changes: 10 additions & 7 deletions CHANGELOG.md
@@ -7,17 +7,20 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

  ### `Added`

- - [#417](https://github.com/nf-core/taxprofiler/pull/417) - Added reference-free metagenome complexity/coverage estimation with Nonpareil (added by @jfy133)
- - [#466](https://github.com/nf-core/taxprofiler/pull/466) - Input database sheets can specify a `db_type` column to distinguish between short- and long-read databases (added by @LilyAnderssonLee)
- - [#505](https://github.com/nf-core/taxprofiler/pull/505) - Add small files to the file `tower.yml` (added by @LilyAnderssonLee)
- - [#508](https://github.com/nf-core/taxprofiler/pull/508) - Add `nanoq` as a filtering tool for nanopore reads (added by @LilyAnderssonLee)
- - [#511](https://github.com/nf-core/taxprofiler/pull/511) - Add `porechop_abi` as an alternative adapter removal tool for long reads nanopore data (added by @LilyAnderssonLee)
- - [#512](https://github.com/nf-core/taxprofiler/pull/512) - Update all tools to the latest version and include nf-test (updated by @LilyAnderssonLee & @jfy133)
+ - [#417](https://github.com/nf-core/taxprofiler/pull/417) Added reference-free metagenome complexity/coverage estimation with Nonpareil (added by @jfy133)
+ - [#466](https://github.com/nf-core/taxprofiler/pull/466) Input database sheets can specify a `db_type` column to distinguish between short- and long-read databases (added by @LilyAnderssonLee)
+ - [#505](https://github.com/nf-core/taxprofiler/pull/505) Add small files to the file `tower.yml` (added by @LilyAnderssonLee)
+ - [#508](https://github.com/nf-core/taxprofiler/pull/508) Add `nanoq` as a filtering tool for nanopore reads (added by @LilyAnderssonLee)
+ - [#511](https://github.com/nf-core/taxprofiler/pull/511) Add `porechop_abi` as an alternative adapter removal tool for long reads nanopore data (added by @LilyAnderssonLee)
+ - [#512](https://github.com/nf-core/taxprofiler/pull/512) Update all tools to the latest version and include nf-test (updated by @LilyAnderssonLee & @jfy133)
+ - [#512](https://github.com/nf-core/taxprofiler/pull/532) Configure MultiQC to collapse stats of paired-read files into one line (by @jfy133)

  ### `Fixed`

  - [#518](https://github.com/nf-core/taxprofiler/pull/518) Fixed a bug where Oxford Nanopore FASTA input files would not be processed (❤️ to @ikarls for reporting, fixed by @jfy133)
  - [#523](https://github.com/nf-core/taxprofiler/pull/523) Removed hardcoded `-m lca` from GANON_CLASSIFY due to more options in new version of ganon (fixed by @LilyAnderssonLee & @jfy133)
+ - [#531](https://github.com/nf-core/taxprofiler/pull/531) Fix FASTA input validation in schema allowing FASTQ extension, expand allowed FASTA extensions (fixed by @jfy133)
+ - [#512](https://github.com/nf-core/taxprofiler/pull/532) Minor formatting and ordering improvements in MultiQC report (by @jfy133)
  - [#532](https://github.com/nf-core/taxprofiler/pull/532) - Added missing documentation behind the 'ignore' BRACKEN_BRACKEN error strategy (❤️ to @Mavti for reporting, fixed by @jfy133)

  ### `Dependencies`
@@ -35,7 +38,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
  | metaphlan | 4.0.6 | 4.1.1 |
  | minimap2 | 2.24 | 2.28 |
  | motus/profile | 3.0.3 | 3.1.0 |
- | multiqc | 1.21 | 1.24.1 |
+ | multiqc | 1.21 | 1.25 |
  | samtools | 1.17 | 1.20 |

  ### `Deprecated`
59 changes: 36 additions & 23 deletions assets/multiqc_config.yml
@@ -18,41 +18,41 @@ report_section_order:
fastqc-1:
order: 800
fastp:
order: 750
adapterremoval:
order: 700
adapterRemoval:
order: 600
nonpareil:
order: 600
bbduk:
order: 500
prinseqplusplus:
order: 550
porechop:
order: 400
porechop_abi:
order: 450
bbduk:
order: 300
prinseqplusplus:
order: 200
porechop_abi:
order: 400
filtlong:
order: 100
order: 350
nanoq:
order: 95
order: 300
bowtie2:
order: 90
order: 200
samtools:
order: 80
order: 100
kraken:
order: 70
order: 90
bracken:
order: 60
order: 80
centrifuge:
order: 50
order: 70
malt:
order: 40
order: 60
diamond:
order: 30
order: 50
kaiju:
order: 20
order: 40
motus:
order: 10
order: 30

export_plots: true

@@ -63,7 +63,7 @@ custom_logo_title: "nf-core/taxprofiler"

  run_modules:
    - fastqc
-   - adapterRemoval
+   - adapterremoval
    - fastp
    - nonpareil
    - bbduk
@@ -72,7 +72,6 @@ run_modules:
    - filtlong
    - nanoq
    - bowtie2
-   - minimap2
    - samtools
    - kraken
    - kaiju
@@ -83,7 +82,7 @@ run_modules:

  sp:
    diamond:
-     fn_re: ".*.diamond.log$"
+     fn: "*.diamond.log"
    fastqc/data:
      fn_re: ".*(fastqc|falco)_data.txt$"
    fastqc/zip:
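
The DIAMOND search pattern in this hunk changes from a regular expression (`fn_re`) to a glob (`fn`). The sketch below illustrates the practical difference, assuming `fn` is matched glob-style (approximated here with Python's `fnmatch`) and `fn_re` as an ordinary regular expression. The file names and helper functions are made up for illustration and are not part of MultiQC.

```python
import fnmatch
import re

OLD_FN_RE = r".*.diamond.log$"  # old value: the unescaped dots match any character
NEW_FN = "*.diamond.log"        # new value: in a glob the dots are literal


def matches_old(name: str) -> bool:
    """Regex-style match, as assumed for `fn_re`."""
    return re.match(OLD_FN_RE, name) is not None


def matches_new(name: str) -> bool:
    """Glob-style match, as assumed for `fn`."""
    return fnmatch.fnmatchcase(name, NEW_FN)


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    for name in ["sample1.diamond.log", "sample1_diamond_log"]:
        print(f"{name}: old={matches_old(name)} new={matches_new(name)}")
```

Under these assumptions the regex also accepts names such as `sample1_diamond_log`, whereas the glob only accepts names that literally end in `.diamond.log`.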
@@ -210,7 +209,8 @@ table_columns_placement:
    Filtlong:
      Target bases: 600
    nanoq:
-     Read N50: 700
+     Reads: 700
+     Read N50: 710
    BBDuk:
      Input reads: 800
      Total Removed bases percent: 810
@@ -312,6 +312,7 @@ table_columns_visible:
      Target bases: True
    nanoq:
      ReadN50: True
+     Reads: True
    BBDuk:
      Input reads: False
      Total Removed bases Percent: False
@@ -356,6 +357,17 @@ table_columns_name:
      reads_mapped: "Nr. Mapped Reads"
      reads_mapped_percent: "% Mapped Reads"

+ ## Allow collapsing of file names with _R1/_R2 or _1/_2 at the end
+ table_sample_merge:
+   "Read 1":
+     - "_R1"
+     - type: regex
+       pattern: "[_.-][rR]?1$"
+   "Read 2":
+     - "_R2"
+     - type: regex
+       pattern: "[_.-][rR]?2$"
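
To see which sample names the `table_sample_merge` patterns above would group together, here is a rough Python sketch of the matching logic only. It is not MultiQC's implementation: treating plain strings as suffixes, the `split_sample`/`group_samples` helpers, and the sample names are all assumptions made for illustration.

```python
import re
from collections import defaultdict

# Patterns copied from the table_sample_merge block above.
MERGE_PATTERNS = {
    "Read 1": ["_R1", {"type": "regex", "pattern": "[_.-][rR]?1$"}],
    "Read 2": ["_R2", {"type": "regex", "pattern": "[_.-][rR]?2$"}],
}


def split_sample(name):
    """Return (base_name, label) for the first matching pattern, else (name, None)."""
    for label, patterns in MERGE_PATTERNS.items():
        for pat in patterns:
            if isinstance(pat, str):
                # Assumption: plain strings are treated as suffixes.
                if name.endswith(pat):
                    return name[: -len(pat)], label
            elif pat.get("type") == "regex":
                match = re.search(pat["pattern"], name)
                if match:
                    return name[: match.start()], label
    return name, None


def group_samples(names):
    """Group sample names by their base name after stripping the read suffix."""
    groups = defaultdict(dict)
    for name in names:
        base, label = split_sample(name)
        groups[base][label or "unmerged"] = name
    return dict(groups)


if __name__ == "__main__":
    # Hypothetical sample names, for illustration only.
    print(group_samples(["sample1_R1", "sample1_R2", "sample2_1", "sample2_2", "sample3"]))
```

Names that differ only by a trailing `_R1`/`_R2` or `_1`/`_2` end up under one base name, which is the grouping the collapsed General Stats rows rely on.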

Contributor:

Smart!!! Is it possible to share the MultiQC report? I am very curious what the new one looks like.

Member Author:

Haha, I just copied from the new functionality page in the MultiQC docs, but sure!

It's still not perfect - if I removed `_raw` it would collapse everything and not have the expansion thingy

multiqc_.zip

Contributor:

Agree. It's better to keep `_raw`.
I didn't see Nonpareil in the MultiQC report.

  extra_fn_clean_exts:
    - "kraken2.report.txt"
    - ".txt"
@@ -366,6 +378,7 @@ extra_fn_clean_exts:
    - "porechop"
    - "porechop_abi"
    - "_processed"
+   - ".diamond"
    - type: remove
      pattern: "_falco"
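
The new `.diamond` entry affects how sample names are derived from file names. The sketch below shows the intended effect under two assumptions: a plain-string entry truncates the name at its first occurrence, and a `type: remove` entry deletes the pattern wherever it appears. The `clean_name` helper and the example names are hypothetical, and only two of the configured entries are modelled.

```python
# Two entries from extra_fn_clean_exts, modelled for illustration.
CLEAN_EXTS = [
    ".diamond",                               # plain string: assumed to truncate
    {"type": "remove", "pattern": "_falco"},  # explicit removal rule
]


def clean_name(name, rules=CLEAN_EXTS):
    """Apply truncate/remove rules in order and return the cleaned sample name."""
    for rule in rules:
        if isinstance(rule, str):
            rule = {"type": "truncate", "pattern": rule}
        if rule["type"] == "truncate":
            # Keep everything before the first occurrence of the pattern.
            name = name.split(rule["pattern"], 1)[0]
        elif rule["type"] == "remove":
            # Drop the pattern but keep the text around it.
            name = name.replace(rule["pattern"], "")
    return name


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    for raw in ["sample1.diamond.log", "sample1_falco_data"]:
        print(raw, "->", clean_name(raw))
```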

8 changes: 4 additions & 4 deletions assets/schema_input.json
@@ -39,21 +39,21 @@
          "format": "file-path",
          "pattern": "^\\S+\\.f(ast)?q\\.gz$",
          "unique": true,
-         "errorMessage": "FastQ file for reads 1 must be provided, cannot contain spaces and must have extension '.fq.gz' or '.fastq.gz'"
+         "errorMessage": "Gzipped FastQ file for reads 1 must be provided, cannot contain spaces and must have extension '.fq.gz' or '.fastq.gz'"
      },
      "fastq_2": {
          "type": "string",
          "format": "file-path",
          "pattern": "^\\S+\\.f(ast)?q\\.gz$",
          "unique": true,
-         "errorMessage": "FastQ file for reads 2 cannot contain spaces and must have extension '.fq.gz' or '.fastq.gz'. If not applicable, leave it empty."
+         "errorMessage": "Gzipped FastQ file for reads 2 cannot contain spaces and must have extension '.fq.gz' or '.fastq.gz'. If not applicable, leave it empty."
      },
      "fasta": {
          "type": "string",
          "format": "file-path",
-         "pattern": "^\\S+\\.(f(ast)?q|fa(sta)?)\\.gz$",
+         "pattern": "^\\S+\\.(fasta|fas|fna|fa)\\.gz?$",
          "unique": true,
-         "errorMessage": "FastA file must be provided, cannot contain spaces and must have extension '.fa.gz' or '.fasta.gz'. If not applicable, leave it empty."
+         "errorMessage": "Gzipped FastA file must be provided, cannot contain spaces and must have extension '.fa.gz', '.fna.gz', '.fas.gz', or '.fasta.gz'. If not applicable, leave it empty."
      }
  },
  "required": ["sample", "run_accession", "instrument_platform"]
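
The effect of the `fasta` pattern change can be checked directly. The sketch below compares the old and new patterns (with the JSON escaping removed) using Python's `re` module; the file names are invented for illustration. One detail worth noting: in the new pattern as written, `gz?` makes only the `z` optional, so a name ending in `.fa.g` is accepted while an uncompressed `.fa` is not.

```python
import re

# Patterns from schema_input.json with the JSON string escaping removed.
OLD_FASTA = re.compile(r"^\S+\.(f(ast)?q|fa(sta)?)\.gz$")
NEW_FASTA = re.compile(r"^\S+\.(fasta|fas|fna|fa)\.gz?$")


def accepted(pattern, filename):
    """True if the file name satisfies the given schema pattern."""
    return pattern.search(filename) is not None


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    for name in ["reads.fastq.gz", "genome.fa.gz", "genome.fna.gz", "genome.fa", "genome.fa.g"]:
        print(f"{name}: old={accepted(OLD_FASTA, name)} new={accepted(NEW_FASTA, name)}")
```

The old pattern let FASTQ files through the `fasta` column and rejected `.fna.gz` and `.fas.gz`; the new one does the opposite on both counts.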
12 changes: 11 additions & 1 deletion docs/output.md
@@ -723,7 +723,17 @@ You can expect in the MultiQC reports either sections and/or general stats colum
  - motus

  :::info
- The 'General Stats' table by default will only show statistics referring to pre-processing steps, and will not display possible values from each classifier/profiler, unless turned on by the user within the 'Configure Columns' menu or via a custom MultiQC config file (`--multiqc_config`)
+ The 'General Stats' table by default will only show statistics referring to pre-processing steps, and will not display possible values from each classifier/profiler, unless turned on by the user within the 'Configure Columns' menu or via a custom MultiQC config file (`--multiqc_config`).
+
+ For example, DIAMOND output does not have a dedicated section in the MultiQC HTML, only in the general stats table. To turn this on, copy the nf-core/taxprofiler [MultiQC config](https://github.com/nf-core/taxprofiler/blob/master/assets/multiqc_config.yml) and change the DIAMOND entry in `table_columns_visible:` to True.
+ :::
+
+ :::info
+ In the 'General Stats' table, files whose names end with `_R1`/`_R2` or `_1`/`_2` before the file format extension will be collapsed into single rows.
+
+ It is assumed that file names differing only by these characters are associated paired-end reads, and that their stats should be reported together.
+
+ For example, `sample1_R1.fastq.gz` and `sample1_R2.fastq.gz` will be reported together as `sample1`, with R1/R2-specific stats included inside the collapsed row.
  :::

  ### Pipeline information
5 changes: 5 additions & 0 deletions docs/usage.md
@@ -101,6 +101,11 @@ An [example samplesheet](../assets/samplesheet.csv) has been provided with the p
  FASTA input will not go through any preprocessing steps, and will go directly to profiling.
  :::

+ :::warning
+ File names that include `_R1`/`_R2` or `_1`/`_2` before the file format extension will automatically be collapsed in the MultiQC report's General Stats table.
+ Please see the output documentation for more information.
+ :::
+
  ### Full database sheet

  nf-core/taxprofiler supports multiple databases being classified/profiled against in parallel for each tool.
2 changes: 1 addition & 1 deletion modules.json
@@ -192,7 +192,7 @@
      },
      "multiqc": {
          "branch": "master",
-         "git_sha": "06c8865e36741e05ad32ef70ab3fac127486af48",
+         "git_sha": "7c316cae26baf55e0add993bed2b0c9f7105c653",
          "installed_by": ["modules"]
      },
      "nanoq": {
2 changes: 1 addition & 1 deletion modules/nf-core/multiqc/environment.yml
4 changes: 2 additions & 2 deletions modules/nf-core/multiqc/main.nf
6 changes: 3 additions & 3 deletions modules/nf-core/multiqc/tests/main.nf.test.snap

Some generated files are not rendered by default.

3 changes: 2 additions & 1 deletion subworkflows/local/nonpareil.nf
@@ -15,8 +15,9 @@ workflow NONPAREIL {
      .map {
          meta, reads ->
              def reads_new = meta.single_end ? reads : reads[0]
+             // taxprofiler only accepts gzipped input files,
+             // so don't need to account for getBaseName removing all extensions
              def format = reads_new[0].getBaseName().split('\\.').last() in ['fasta', 'fna', 'fa', 'fas'] ? 'fasta' : 'fastq'
-
              [meta, reads_new, format]
      }
      .multiMap {
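
The format detection in this hunk depends on input files always being gzipped. The Python sketch below mirrors the same idea, assuming `getBaseName()` strips only the final extension (approximated here with `PurePath.stem`); the `detect_format` helper and the file names are hypothetical.

```python
from pathlib import PurePath

# Extensions treated as FASTA, taken from the list in the hunk above.
FASTA_EXTENSIONS = {"fasta", "fna", "fa", "fas"}


def detect_format(filename: str) -> str:
    """Guess 'fasta' or 'fastq' from the extension that precedes the final suffix."""
    base = PurePath(filename).stem   # drops only the last suffix, e.g. ".gz"
    last_ext = base.split(".")[-1]   # the extension left after removing ".gz"
    return "fasta" if last_ext in FASTA_EXTENSIONS else "fastq"


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    for name in ["sample.fna.gz", "sample_R1.fastq.gz", "sample.fa"]:
        print(name, "->", detect_format(name))
```

In this sketch an uncompressed `sample.fa` loses its only extension and falls through to `fastq`, which is why the approach is only safe when every input carries a trailing `.gz`.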