mpieva/quicksand: A pipeline for the analysis of sedimentary ancient mtDNA

MIT License DOI

quicksand

See readthedocs for the full documentation of the pipeline.

Description

quicksand (quick analysis of sedimentary ancient DNA) is an open-source Nextflow pipeline designed for rapid and accurate taxonomic classification of mammalian mitochondrial DNA (mtDNA) in ancient DNA (aDNA) samples. quicksand combines fast alignment-free classification using KrakenUniq with downstream mapping (BWA), post-classification filtering, and ancient DNA authentication. quicksand is optimized for speed and portability and requires either Singularity or Docker.

Workflow

Graphical representation of the pipeline workflow

Quickstart

Requirements

To run the pipeline, please install Nextflow and either Singularity or Docker.

Note: To run nextflow+singularity, your kernel needs to support user-namespaces (see here or here).
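
As a rough way to check the note above, the following sketch reads the Linux sysctl file `/proc/sys/user/max_user_namespaces`. The helper name `userns_status` is hypothetical and not part of quicksand, and some distributions control user namespaces through additional settings, so treat the result as a hint rather than a definitive answer.

```shell
# Hypothetical helper (not part of quicksand): report whether unprivileged
# user namespaces appear to be enabled on this Linux host.
userns_status() {
  # Default to the standard Linux sysctl file; an optional argument overrides it.
  userns_file="${1:-/proc/sys/user/max_user_namespaces}"
  if [ ! -r "$userns_file" ]; then
    echo "unknown"    # file missing: not Linux, or a kernel without this sysctl
  elif [ "$(cat "$userns_file")" -gt 0 ] 2>/dev/null; then
    echo "enabled"    # a positive limit means namespaces can be created
  else
    echo "disabled"   # a limit of 0 (or unreadable content) blocks user namespaces
  fi
}

userns_status
```

If this prints `disabled` or `unknown`, running with `-profile docker` instead may be an option, assuming Docker is installed.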

Prepare Input

The input for quicksand is a directory with user-supplied files in BAM or FASTQ format. Adapter trimming, overlap merging and sequence demultiplexing need to be performed by the user prior to running quicksand. Provide the directory with the --split flag.

Download Test-file

As a test file, download the Hohlenstein-Stadel mtDNA into the split directory (please see the README for more information):

wget -P split \
http://ftp.eva.mpg.de/neandertal/Hohlenstein-Stadel/BAM/mtDNA/HST.raw_data.ALL.bam
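
An interrupted download can leave an empty or truncated file behind. The sketch below is a minimal sanity check, assuming only that BAM files are BGZF-compressed and therefore start with the gzip magic bytes `1f 8b`. The helper name `has_gzip_magic` is hypothetical, and the check does not validate the BAM contents.

```shell
# Hypothetical helper (not part of quicksand): succeed if the file is non-empty
# and starts with the gzip magic bytes 1f 8b, as BGZF-compressed BAM files do.
has_gzip_magic() {
  [ -s "$1" ] && [ "$(head -c 2 "$1" | od -An -tx1 | tr -d ' \n')" = "1f8b" ]
}

# Path produced by the wget command above.
bam=split/HST.raw_data.ALL.bam
if [ -f "$bam" ]; then
  if has_gzip_magic "$bam"; then
    echo "$bam: starts with gzip magic bytes"
  else
    echo "$bam: empty or not gzip-compressed, consider re-downloading"
  fi
fi
```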

Create Reference Database

The required KrakenUniq database, the reference genomes for mapping and the BED files for low-complexity filtering are available on the MPI EVA FTP servers. Custom versions of the reference material can be created with the quicksand-build pipeline.

Create Test Database

For the quickstart of quicksand, create a fresh database containing only the Hominidae mtDNA reference genomes (runtime: ~3-5 minutes):

nextflow run mpieva/quicksand-build -r v3.0 \
  --include  Hominidae \
  --outdir   refseq \
  -profile   singularity

Download Full Database

To download the full reference database (~60GB), use this command:

latest=$(curl http://ftp.eva.mpg.de/quicksand/LATEST)
wget -r -np -nc -nH --cut-dirs=3 --reject="*index.html*" -q --show-progress -P refseq http://ftp.eva.mpg.de/quicksand/build/$latest

Run quicksand

quicksand is executed directly from GitHub. With the databases created and the test data downloaded, run the pipeline as follows:

# if you encounter a heap-space error, set this to increase the memory available to nextflow
export NXF_OPTS="-Xms10g -Xmx15g" # adjust the initial (-Xms) and maximum (-Xmx) heap size as required

nextflow run mpieva/quicksand -r v2.4 \
  --db        refseq/kraken/Mito_db_kmer22/ \
  --genomes   refseq/genomes/ \
  --bedfiles  refseq/masked/ \
  --split     split/ \
  -profile    singularity
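
Before launching the run, it can help to confirm that the four directories passed to the command above exist. The following is a minimal pre-flight sketch. The helper name `missing_paths` is hypothetical, and the paths are simply the ones used in the quickstart command.

```shell
# Hypothetical pre-flight helper (not part of quicksand): print every argument
# that is not an existing directory, one per line.
missing_paths() {
  for path in "$@"; do
    [ -d "$path" ] || echo "$path"
  done
}

# Directories taken from the quickstart command above.
missing=$(missing_paths refseq/kraken/Mito_db_kmer22 refseq/genomes refseq/masked split)
if [ -n "$missing" ]; then
  echo "Missing directories:"
  echo "$missing"
else
  echo "All input directories found"
fi
```

The check only confirms that the directories exist. It does not inspect their contents.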

Output

Please see the documentation for a comprehensive description of the output!

References

This pipeline uses code inspired by the nf-core initiative, reused here under the MIT license.

The nf-core framework for community-curated bioinformatics pipelines.

Philip Ewels, Alexander Peltzer, Sven Fillinger, Harshil Patel, Johannes Alneberg, Andreas Wilm, Maxime Ulysse Garcia, Paolo Di Tommaso & Sven Nahnsen.

Nat Biotechnol. 2020 Feb 13. doi: 10.1038/s41587-020-0439-x.
