List of ideas to improve assemblies · Issue #57 · nf-core/bacass · GitHub
List of ideas to improve assemblies #57
Open
2 of 6 issues completed
@d4straub


This is a collection of ideas that should be considered after the DSL2 conversion #56 is finished. The list is subject to change. Any ideas or discussions are welcome.

Preprocessing (check out nf-core/mag, any other examples out there?)

  • Filtlong to filter ONT reads by quality (e.g. >7)
  • Bowtie2 to remove Illumina PhiX reads
  • Nanolyse (alternatively Minimap2) to remove ONT Lambda reads
  • Add an option to down-sample reads, because this can sometimes improve the assembly
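
To illustrate the down-sampling idea, here is a minimal Python sketch that keeps a fixed number of reads using reservoir sampling. It is not pipeline code: the helper names `read_fastq` and `downsample` are hypothetical, the input is assumed to be a plain four-line-per-record FASTQ, and paired-end data would need the same selection applied to both mates. In the pipeline itself, a dedicated tool would likely be the better choice.

```python
import random
from typing import Iterable, Iterator, List, Tuple

# One FASTQ record: header, sequence, separator, quality string.
FastqRecord = Tuple[str, str, str, str]


def read_fastq(lines: Iterable[str]) -> Iterator[FastqRecord]:
    """Group an iterable of lines into four-line FASTQ records (hypothetical helper)."""
    stripped = (line.rstrip("\n") for line in lines)
    # Zipping the same generator four times consumes four lines per record;
    # a trailing incomplete record is silently dropped.
    yield from zip(stripped, stripped, stripped, stripped)


def downsample(records: Iterable[FastqRecord], n: int, seed: int = 0) -> List[FastqRecord]:
    """Keep at most n records, each with equal probability (reservoir sampling)."""
    rng = random.Random(seed)  # fixed seed so a run can be reproduced
    reservoir: List[FastqRecord] = []
    for i, record in enumerate(records):
        if i < n:
            reservoir.append(record)
        else:
            # Replace a random slot with probability n / (i + 1).
            j = rng.randint(0, i)
            if j < n:
                reservoir[j] = record
    return reservoir
```

Because the reservoir never holds more than `n` records, the input can be streamed from disk without loading the whole read set into memory.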

Assemblers:

  • MEGAHIT (a5-miseq, see "Add A5-miseq support" #23, ...) to have an alternative short-read assembler
  • Trycycler to get better hybrid and long-read assemblies than with Unicycler
  • Flye (Tulip, Redbean, Raven) to have more long-read assemblers at hand
  • Pilon to polish Nanopore-derived contigs with Illumina reads (for long-read assemblers)

Assembly QC:

  • BUSCO to check completeness and contamination of assemblies (and possibly bins)
  • MaxBin2 (or any other binner) to separate the assembly into bins (cleanup if contaminated). In contrast to other binners, MaxBin2 outputs "Completeness, Genome size, GC content" for each bin it finds, which comes in very handy when judging whether there is real contamination.

Structural:

  • Use only the most polished assembly for Prokka & QUAST (currently assemblies before polishing are used!)
  • By default, run all (or at least many) assemblers that are appropriate for a data set, including polishing (Medaka & Pilon). That allows easy comparison (with e.g. QUAST and BUSCO) of the performance of different assemblers and makes it easier to choose the best assembly.
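
The two structural points can be sketched in Python as follows. This is only an illustration under stated assumptions: the stage names and their ranking in `POLISH_ORDER` are hypothetical (the issue does not fix an order between Medaka and Pilon), `most_polished` is a made-up helper, and N50 stands in for one of the contiguity metrics that QUAST reports.

```python
from typing import Dict, Iterable

# Assumption: stages ranked from most to least polished. The real order
# depends on how the pipeline chains its polishing steps.
POLISH_ORDER = ("pilon", "medaka", "unpolished")


def most_polished(assemblies: Dict[str, str]) -> str:
    """Return the path of the most polished assembly that is available."""
    for stage in POLISH_ORDER:
        if stage in assemblies:
            return assemblies[stage]
    raise ValueError("no known assembly stage found")


def n50(contig_lengths: Iterable[int]) -> int:
    """Length of the contig at which half of the total assembly size is reached."""
    lengths = sorted(contig_lengths, reverse=True)
    total = sum(lengths)
    running = 0
    for length in lengths:
        running += length
        if running * 2 >= total:
            return length
    return 0  # empty assembly
```

With a selection step like this, downstream annotation and QC would always receive the latest polishing result, while a metric such as N50 gives one simple axis for comparing assemblers side by side.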

Defaults:

  • In my opinion, --skip_kraken2 should either be removed (i.e. --krakendb alone determines whether Kraken2 is run) or --krakendb should get a simple default (small, fast, but helpful), e.g. "https://genome-idx.s3.amazonaws.com/kraken/16S_Greengenes13.5_20200326.tgz". This is a very small 16S database but should be sufficient to detect serious bacterial contamination.
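
The two alternatives for the Kraken2 default can be expressed as a small decision function. This is a Python sketch for discussion only, not the pipeline's Nextflow code: `resolve_krakendb` and its `use_default` switch are hypothetical names, and the URL is the one suggested above.

```python
from typing import Optional

# Small 16S database proposed in this issue as a possible default.
DEFAULT_KRAKENDB = (
    "https://genome-idx.s3.amazonaws.com/kraken/16S_Greengenes13.5_20200326.tgz"
)


def resolve_krakendb(krakendb: Optional[str], use_default: bool = False) -> Optional[str]:
    """Return the database Kraken2 should use, or None if Kraken2 is skipped.

    use_default=False models the first alternative: no --krakendb, no Kraken2.
    use_default=True models the second: fall back to the small default database.
    """
    if krakendb:
        return krakendb  # an explicit database always wins
    return DEFAULT_KRAKENDB if use_default else None
```

In both variants a separate skip flag becomes unnecessary, because the presence or absence of a database already decides whether the step runs.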

Labels: enhancement (New feature or request), help wanted (Extra attention is needed), question (Further information is requested)