This is a collection of ideas that should be considered after the DSL2 conversion #56 is finished. The list is subject to change. Any ideas or discussions are welcome.
Preprocessing (check out nf-core/mag, any other examples out there?)
- Filtlong to filter ONT reads by quality (e.g. >7)
- Bowtie2 to remove Illumina PhiX reads
- Nanolyse (alternatively Minimap2) to remove ONT Lambda reads
- Add an option to down-sample reads, because this can sometimes improve the assembly
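
To make the down-sampling idea concrete, here is a minimal Python sketch of one way it could work: reservoir sampling over FASTQ records, which keeps a fixed number of reads chosen uniformly at random in a single pass. This is only an illustration, not how the pipeline does or should do it. The function names `parse_fastq` and `downsample` are hypothetical, the input is assumed to be unwrapped FASTQ (four lines per read), and paired-end data would need the same selection applied to both files.

```python
import random
from typing import Iterable, Iterator, List, Tuple

# One FASTQ record: header, sequence, separator line, quality string.
FastqRecord = Tuple[str, str, str, str]


def parse_fastq(lines: Iterable[str]) -> Iterator[FastqRecord]:
    """Yield FASTQ records, assuming exactly four lines per read."""
    it = iter(lines)
    # zip over the same iterator four times groups the lines in fours;
    # an incomplete trailing record is silently dropped.
    for header, seq, plus, qual in zip(it, it, it, it):
        yield (header.rstrip("\n"), seq.rstrip("\n"),
               plus.rstrip("\n"), qual.rstrip("\n"))


def downsample(records: Iterable[FastqRecord], n: int, seed: int = 0) -> List[FastqRecord]:
    """Keep at most n records, chosen uniformly at random (reservoir sampling)."""
    rng = random.Random(seed)  # fixed seed makes the subset reproducible
    reservoir: List[FastqRecord] = []
    for i, record in enumerate(records):
        if i < n:
            # Fill the reservoir with the first n records.
            reservoir.append(record)
        else:
            # Replace a random slot with probability n / (i + 1).
            j = rng.randint(0, i)
            if j < n:
                reservoir[j] = record
    return reservoir


if __name__ == "__main__":
    demo = []
    for k in range(10):
        demo += [f"@read{k}\n", "ACGT\n", "+\n", "IIII\n"]
    for rec in downsample(parse_fastq(demo), 3, seed=42):
        print(rec[0])
```

A fixed seed keeps the selected subset reproducible between runs, which matters when comparing assemblies built from the same down-sampled input.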
Assemblers:
- MEGAHIT (a5-miseq #23, ...) to have an alternative short-read assembler
- Trycycler to have better hybrid and long read assembly than Unicycler
- Flye (Tulip, Redbean, Raven) to have more long read assemblers at hand
- Pilon to polish Nanopore-derived contigs with Illumina reads (for long read assemblers)
Assembly QC:
- BUSCO to check completeness and contamination of assemblies (and possibly bins)
- MaxBin2 (or any other binner) to separate the assembly into bins (cleanup if contaminated). In contrast to other binners, MaxBin2 outputs "Completeness, Genome size, GC content" for each bin it finds, which comes in very handy when judging whether there is real contamination.
Structural:
- Use only the most polished assembly for Prokka & QUAST (currently assemblies before polishing are used!)
- By default, run all (or at least many) assemblers that are appropriate for a data set, including polishing (Medaka & Pilon). That allows easy comparison of the performance of different assemblers (with e.g. QUAST and BUSCO) and choosing the best assembly.
Defaults
- In my opinion, `--skip_kraken2` should be either removed (i.e. using `--krakendb` to determine whether Kraken2 is used) or a simple default (small, fast, but helpful) value should be chosen for `--krakendb`, e.g. "https://genome-idx.s3.amazonaws.com/kraken/16S_Greengenes13.5_20200326.tgz". This is a very small 16S database but should be sufficient to detect serious bacterial contamination.
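
The two alternatives for the Kraken2 default can be sketched as a small piece of decision logic. This is a Python illustration of the idea only, not the pipeline's actual parameter handling. The function name `kraken2_database` and the constant `DEFAULT_KRAKENDB` are hypothetical; the URL is the one suggested above.

```python
from typing import Optional

# Assumption: the small 16S database proposed in this issue as a possible default.
DEFAULT_KRAKENDB = (
    "https://genome-idx.s3.amazonaws.com/kraken/16S_Greengenes13.5_20200326.tgz"
)


def kraken2_database(krakendb: Optional[str], default: Optional[str] = None) -> Optional[str]:
    """Return the database Kraken2 should use, or None if Kraken2 is skipped.

    Alternative 1 (no default): Kraken2 runs only when --krakendb is given,
    so a separate --skip_kraken2 flag is not needed.
    Alternative 2 (with default): Kraken2 always runs, falling back to a
    small default database when --krakendb is not given.
    """
    if krakendb:  # a user-supplied database always wins
        return krakendb
    return default  # None means "skip Kraken2"


if __name__ == "__main__":
    # Alternative 1: no database given, Kraken2 is skipped.
    print(kraken2_database(None))
    # Alternative 2: no database given, fall back to the small default.
    print(kraken2_database(None, default=DEFAULT_KRAKENDB))
```

Either way the behaviour is controlled by a single parameter, which is the point of dropping `--skip_kraken2`.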