2.4 review comments by jfy133 · Pull Request #508 · nf-core/mag · GitHub
Merged · merged 8 commits · Sep 15, 2023

5 changes: 2 additions & 3 deletions CHANGELOG.md
@@ -19,13 +19,12 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

### `Changed`

- - [#428](https://github.com/nf-core/mag/pull/428) - Update to nf-core 2.8 `TEMPLATE` (by @jfy133)
+ - [#428](https://github.com/nf-core/mag/pull/428) [#467](https://github.com/nf-core/mag/pull/467) - Update to nf-core 2.8, 2.9 `TEMPLATE` (by @jfy133)
- [#429](https://github.com/nf-core/mag/pull/429) - Replaced hardcoded CheckM database auto-download URL to a parameter (reported by @erikrikarddaniel, fix by @jfy133)
- [#441](https://github.com/nf-core/mag/pull/441) - Deactivated CONCOCT in AWS 'full test' due to very long runtime (fix by @jfy133).
- [#442](https://github.com/nf-core/mag/pull/442) - Remove warning when BUSCO finds no genes in bins, as this can be expected in some datasets (reported by @Lumimar, fix by @jfy133).
- [#444](https://github.com/nf-core/mag/pull/444) - Moved BUSCO bash code to script (by @jfy133)
- - [#428](https://github.com/nf-core/mag/pull/429) - Update to nf-core 2.9 `TEMPLATE` (by @jfy133)
- - [#437](https://github.com/nf-core/mag/pull/429) - `--gtdb` parameter is split into `--skip_gtdbtk` and `--gtdb_db` to allow finer control over GTDB database retrieval (fix by @jfy133)
+ - [#477](https://github.com/nf-core/mag/pull/477) - `--gtdb` parameter is split into `--skip_gtdbtk` and `--gtdb_db` to allow finer control over GTDB database retrieval (fix by @jfy133)
- [#500](https://github.com/nf-core/mag/pull/500) - Temporarily disabled downstream processing of both refined and raw bins due to bug (by @jfy133)

### `Fixed`
5 changes: 3 additions & 2 deletions README.md
@@ -30,10 +30,11 @@ The pipeline then:
- assigns taxonomy to reads using [Centrifuge](https://ccb.jhu.edu/software/centrifuge/) and/or [Kraken2](https://github.com/DerrickWood/kraken2/wiki)
- performs assembly using [MEGAHIT](https://github.com/voutcn/megahit) and [SPAdes](http://cab.spbu.ru/software/spades/), and checks their quality using [Quast](http://quast.sourceforge.net/quast)
- (optionally) performs ancient DNA assembly validation using [PyDamage](https://github.com/maxibor/pydamage) and contig consensus sequence recalling with [Freebayes](https://github.com/freebayes/freebayes) and [BCFtools](http://samtools.github.io/bcftools/bcftools.html)
- - predicts protein-coding genes for the assemblies using [Prodigal](https://github.com/hyattpd/Prodigal)
+ - predicts protein-coding genes for the assemblies using [Prodigal](https://github.com/hyattpd/Prodigal), and bins with [Prokka](https://github.com/tseemann/prokka) and optionally [MetaEuk](https://github.com/soedinglab/metaeuk)
- performs metagenome binning using [MetaBAT2](https://bitbucket.org/berkeleylab/metabat/src/master/), [MaxBin2](https://sourceforge.net/projects/maxbin2/), and/or with [CONCOCT](https://github.com/BinPro/CONCOCT), and checks the quality of the genome bins using [Busco](https://busco.ezlab.org/), or [CheckM](https://ecogenomics.github.io/CheckM/), and optionally [GUNC](https://grp-bork.embl-community.io/gunc/).
+ - Performs ancient DNA validation and repair with [pyDamage](https://github.com/maxibor/pydamage) and [freebayes](https://github.com/freebayes/freebayes)
- optionally refines bins with [DAS Tool](https://github.com/cmks/DAS_Tool)
- - assigns taxonomy to bins using [GTDB-Tk](https://github.com/Ecogenomics/GTDBTk) and/or [CAT](https://github.com/dutilh/CAT) and optionally identifies viruses in assemblies using [geNomad](https://github.com/apcamargo/genomad)
+ - assigns taxonomy to bins using [GTDB-Tk](https://github.com/Ecogenomics/GTDBTk) and/or [CAT](https://github.com/dutilh/CAT) and optionally identifies viruses in assemblies using [geNomad](https://github.com/apcamargo/genomad), or Eukaryotes with [Tiara](https://github.com/ibe-uw/tiara)

Furthermore, the pipeline creates various reports in the results directory specified, including a [MultiQC](https://multiqc.info/) report summarizing some of the findings and software versions.

2 changes: 0 additions & 2 deletions conf/modules.config
@@ -223,7 +223,6 @@ process {
}

withName: CENTRIFUGE {
- ext.prefix = { ${meta.id} }
publishDir = [
path: { "${params.outdir}/Taxonomy/centrifuge/${meta.id}" },
mode: params.publish_dir_mode,
@@ -232,7 +231,6 @@
}

withName: KRAKEN2 {
- ext.prefix = { ${meta.id} }
ext.args = '--quiet'
publishDir = [
path: { "${params.outdir}/Taxonomy/kraken2/${meta.id}" },
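The `ext.prefix` directive expects a closure that returns a string, so the interpolation needs GString quotes; a bare `${meta.id}` outside a quoted string does not evaluate as intended. A minimal sketch of the well-formed variant, assuming the module actually reads `task.ext.prefix` (the `.centrifuge` suffix is a hypothetical example, not a pipeline default):

```groovy
// Sketch of a well-formed ext.prefix closure in conf/modules.config.
// The closure is resolved per task; note the quotes around the GString --
// without them, ${meta.id} is not a string interpolation at all.
process {
    withName: CENTRIFUGE {
        ext.prefix = { "${meta.id}.centrifuge" } // hypothetical suffix, for illustration
    }
}
```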
Binary file modified docs/images/mag_workflow.png
14 changes: 9 additions & 5 deletions docs/images/mag_workflow.svg
2 changes: 1 addition & 1 deletion docs/output.md
@@ -425,7 +425,7 @@ By default, only the raw bins (and unbinned contigs) from the actual binning methods

⚠️ Due to the ability to perform downstream QC of both raw and refined bins in parallel (via `--postbinning_input`), bin names in DAS Tool's `*_allBins.eval` file will include `Refined`. However, for this particular file they _actually_ refer to the 'raw' input bins. The pipeline renames the input files prior to running DAS Tool to ensure they can be disambiguated from the original bin files in the downstream QC steps.

- ### Tiara
+ ### Tiara

Tiara is a contig classifier that identifies the domain (prokarya, eukarya) of contigs within an assembly. The pipeline uses it to identify, rapidly and with few resources, the most likely domain classification of each bin (or set of unbinned contigs) based on the classifications of its constituent contigs.

3 changes: 1 addition & 2 deletions docs/usage.md
@@ -154,7 +154,6 @@ with `params.yaml` containing:
```yaml
input: './samplesheet.csv'
outdir: './results/'
- input: 'data'
<...>
```

@@ -191,7 +190,7 @@ To allow also reproducible bin QC with BUSCO, run BUSCO providing already downloaded

For the taxonomic bin classification with [CAT](https://github.com/dutilh/CAT), when running the pipeline with `--cat_db_generate` the parameter `--save_cat_db` can be used to also save the generated database to allow reproducibility in future runs. Note that when specifying a pre-built database with `--cat_db`, currently the database can not be saved.

- When it comes to visualizing taxonomic data using [Krona](https://github.com/marbl/Krona), you have the option to provide a taxonomy file, such as `taxonomy.tab`, using the `--krona_db` parameter. If you don't supply a taxonomy file, Krona is designed to automatically download the required taxonomy data for visualization. If you choose to provide a pre-existing taxonomy file using the `--krona_db` parameter, Krona will use that file for visualization. On the other hand, if you omit the `--krona_db` parameter, Krona will download the necessary taxonomy information automatically to enable visualization.
+ When it comes to visualizing taxonomic data using [Krona](https://github.com/marbl/Krona), you have the option to provide a taxonomy file, such as `taxonomy.tab`, using the `--krona_db` parameter. If you don't supply a taxonomy file, Krona is designed to automatically download the required taxonomy data for visualization.

The taxonomic classification of bins with GTDB-Tk is not guaranteed to be reproducible, since the placement of bins in the reference tree is non-deterministic. However, the authors of the GTDB-Tk article examined the reproducibility on a set of 100 genomes across 50 trials and did not observe any difference (see [https://doi.org/10.1093/bioinformatics/btz848](https://doi.org/10.1093/bioinformatics/btz848)).
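A minimal sketch of pinning these databases in a custom config for reproducible re-runs, using only parameters named above (`gtdb_db` comes from the changelog entry in this PR; all paths are hypothetical placeholders):

```groovy
// Sketch: fixing database inputs so repeated runs resolve identical files.
// Every path below is a placeholder, not a pipeline default.
params {
    cat_db_generate = true                       // generate the CAT database once...
    save_cat_db     = true                       // ...and save it for future runs
    krona_db        = '/dbs/krona/taxonomy.tab'  // pre-fetched Krona taxonomy file
    gtdb_db         = '/dbs/gtdbtk/db'           // pre-downloaded GTDB for GTDB-Tk
}
```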

2 changes: 1 addition & 1 deletion modules.json
@@ -12,7 +12,7 @@
},
"aria2": {
"branch": "master",
"git_sha": "0f8a77ff00e65eaeebc509b8156eaa983192474b",
"git_sha": "911696ea0b62df80e900ef244d7867d177971f73",
"installed_by": ["modules"],
"patch": "modules/nf-core/aria2/aria2.diff"
},
2 changes: 1 addition & 1 deletion modules/local/centrifuge.nf
@@ -13,7 +13,7 @@ process CENTRIFUGE {
output:
tuple val("centrifuge"), val(meta), path("results.krona"), emit: results_for_krona
path "report.txt" , emit: report
- tuple val(meta), path("*kreport.txt") , emit: kreport
+ tuple val(meta), path("*kreport.txt") , emit: kreport
path "versions.yml" , emit: versions

script:
9 changes: 0 additions & 9 deletions modules/nf-core/aria2/aria2.diff

Some generated files are not rendered by default.

4 changes: 2 additions & 2 deletions modules/nf-core/aria2/main.nf

Some generated files are not rendered by default.

4 changes: 1 addition & 3 deletions subworkflows/local/binning.nf
@@ -25,9 +25,7 @@ workflow BINNING {
// generate coverage depths for each contig
ch_summarizedepth_input = assemblies
.map { meta, assembly, bams, bais ->
- def meta_keys = meta.keySet()
- def meta_new = meta + meta.subMap(meta_keys)
- [ meta_new, bams, bais ]
+ [ meta, bams, bais ]
}

METABAT2_JGISUMMARIZEBAMCONTIGDEPTHS ( ch_summarizedepth_input )
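The deleted lines amounted to an identity operation: merging a map with a `subMap` taken over all of its own keys reproduces the same map. A plain-Groovy sketch of why dropping them is safe:

```groovy
// Plain Groovy: meta + meta.subMap(meta.keySet()) is a round trip,
// so the channel operator can simply pass `meta` through unchanged.
def meta = [id: 'sample1', group: 0, assembler: 'MEGAHIT'] // illustrative meta map
assert meta + meta.subMap(meta.keySet()) == meta
assert meta.subMap(['id']) == [id: 'sample1']              // subMap picks named keys
```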
4 changes: 0 additions & 4 deletions subworkflows/local/depths.nf
@@ -1,7 +1,3 @@
- params.mag_depths_options = [:]
- params.mag_depths_plot_options = [:]
- params.mag_depths_summary_options = [:]
-
include { MAG_DEPTHS } from '../../modules/local/mag_depths'
include { MAG_DEPTHS_PLOT } from '../../modules/local/mag_depths_plot'
include { MAG_DEPTHS_SUMMARY } from '../../modules/local/mag_depths_summary'
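These `params.*_options = [:]` maps are leftovers of the early nf-core DSL2 pattern of passing option maps into subworkflows via `addParams`; module behaviour is now configured centrally in `conf/modules.config` through process selectors. A generic sketch of the replacement pattern (the selector names a module included above; the extra argument and path are hypothetical placeholders):

```groovy
// Sketch: per-module options now live in conf/modules.config,
// scoped by a process selector instead of an addParams options map.
process {
    withName: MAG_DEPTHS_PLOT {
        ext.args   = '--sample-groups group'  // hypothetical extra CLI arguments
        publishDir = [
            path: { "${params.outdir}/GenomeBinning/depths" },  // illustrative path
            mode: params.publish_dir_mode
        ]
    }
}
```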
24 changes: 12 additions & 12 deletions workflows/mag.nf
@@ -89,18 +89,18 @@ include { COMBINE_TSV as COMBINE_SUMMARY_TSV } from '../modules
//
// SUBWORKFLOW: Consisting of a mix of local and nf-core/modules
//
- include { INPUT_CHECK } from '../subworkflows/local/input_check'
- include { BINNING_PREPARATION } from '../subworkflows/local/binning_preparation'
- include { BINNING } from '../subworkflows/local/binning'
- include { BINNING_REFINEMENT } from '../subworkflows/local/binning_refinement'
- include { BUSCO_QC } from '../subworkflows/local/busco_qc'
- include { VIRUS_IDENTIFICATION} from '../subworkflows/local/virus_identification'
- include { CHECKM_QC } from '../subworkflows/local/checkm_qc'
- include { GUNC_QC } from '../subworkflows/local/gunc_qc'
- include { GTDBTK } from '../subworkflows/local/gtdbtk'
+ include { INPUT_CHECK } from '../subworkflows/local/input_check'
+ include { BINNING_PREPARATION } from '../subworkflows/local/binning_preparation'
+ include { BINNING } from '../subworkflows/local/binning'
+ include { BINNING_REFINEMENT } from '../subworkflows/local/binning_refinement'
+ include { BUSCO_QC } from '../subworkflows/local/busco_qc'
+ include { VIRUS_IDENTIFICATION } from '../subworkflows/local/virus_identification'
+ include { CHECKM_QC } from '../subworkflows/local/checkm_qc'
+ include { GUNC_QC } from '../subworkflows/local/gunc_qc'
+ include { GTDBTK } from '../subworkflows/local/gtdbtk'
include { ANCIENT_DNA_ASSEMBLY_VALIDATION } from '../subworkflows/local/ancient_dna'
- include { DOMAIN_CLASSIFICATION } from '../subworkflows/local/domain_classification'
- include { DEPTHS } from '../subworkflows/local/depths'
+ include { DOMAIN_CLASSIFICATION } from '../subworkflows/local/domain_classification'
+ include { DEPTHS } from '../subworkflows/local/depths'

/*
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -414,7 +414,7 @@ workflow MAG {
ch_versions = ch_versions.mix(NANOPLOT_RAW.out.versions.first())

ch_long_reads = ch_raw_long_reads
- .map {
+ .map {
meta, reads ->
def meta_new = meta - meta.subMap('run')
[ meta_new, reads ]
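For context, the unchanged lines just above use Groovy map subtraction: subtracting a `subMap` drops those entries, so reads from different sequencing runs of the same sample end up with identical meta maps and can be grouped. A plain-Groovy sketch of the idiom:

```groovy
// Plain Groovy: map minus map removes the matching entries,
// here stripping the per-run key before merging runs of a sample.
def meta = [id: 'sample1', run: 2, single_end: false] // illustrative meta map
def meta_new = meta - meta.subMap('run')
assert meta_new == [id: 'sample1', single_end: false]
```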