GitHub - WGLab/NanoCaller at v3.3.0

WGLab/NanoCaller


NanoCaller


NanoCaller is a computational method that integrates long reads into a deep convolutional neural network to detect SNPs and indels from long-read sequencing data. NanoCaller uses long-range haplotype structure to predict each SNP candidate site, taking into account pileup information from other candidate sites that share reads with it. It then phases the reads and, at each indel candidate site, performs local realignment of each set of phased reads and of the full set of reads to call indels, building consensus sequences to predict the indel sequence.
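To make the idea of "pileup information from other candidate sites that share reads" more concrete, here is a simplified Python sketch. It is not NanoCaller's feature generation: the function names, the `max_sites` cutoff and the toy reads are hypothetical, and the real model uses richer pileup features. Given which candidate sites each read covers, it picks the sites that share the most reads with a target SNP candidate and lays out the bases those reads carry as a reads-by-sites matrix, the kind of haplotype-aware input a convolutional network can learn from.

```python
from typing import Dict, List, Set, Tuple

Reads = Dict[str, Set[int]]           # read name -> candidate positions it covers
Alleles = Dict[Tuple[str, int], str]  # (read name, position) -> observed base


def neighbouring_sites(target: int, reads: Reads, max_sites: int = 4) -> List[int]:
    """Return up to `max_sites` other candidate sites sharing the most reads with `target`."""
    shared: Dict[int, int] = {}
    for sites in reads.values():
        if target not in sites:
            continue  # reads that skip the target say nothing about it
        for pos in sites:
            if pos != target:
                shared[pos] = shared.get(pos, 0) + 1
    # Most shared reads first; ties broken by distance to the target, then position.
    ranked = sorted(shared, key=lambda p: (-shared[p], abs(p - target), p))
    return sorted(ranked[:max_sites])


def site_matrix(target: int, reads: Reads, alleles: Alleles, max_sites: int = 4):
    """Build a reads-by-sites matrix of bases for the reads covering `target`.

    Columns are the target plus its neighbouring sites in genomic order;
    '-' marks a site for which the read has no recorded base.
    """
    columns = sorted(neighbouring_sites(target, reads, max_sites) + [target])
    rows = [
        [alleles.get((name, pos), "-") for pos in columns]
        for name, sites in sorted(reads.items())
        if target in sites
    ]
    return columns, rows


if __name__ == "__main__":
    reads = {"r1": {100, 250, 400}, "r2": {100, 250}, "r3": {250, 400, 900}, "r4": {100, 900}}
    alleles = {
        ("r1", 100): "A", ("r1", 250): "G", ("r1", 400): "T",
        ("r2", 100): "C", ("r2", 250): "G",
        ("r3", 250): "T", ("r3", 400): "T", ("r3", 900): "C",
        ("r4", 100): "A", ("r4", 900): "C",
    }
    columns, rows = site_matrix(100, reads, alleles, max_sites=2)
    print(columns)  # [100, 250, 400]
    for row in rows:
        print(row)
```

Rows that agree across several linked sites hint at a shared haplotype, which is the signal a single-site pileup cannot provide.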

NanoCaller is distributed under the MIT License by Wang Genomics Lab.

Latest Updates

v3.2.0 (May 14 2023): Added support for haploid variant calling, which significantly improves indel recall. New feature generation methods and models are used for haploid SNP and indel calling. chrY and chrM are now assumed to be haploid; the additional parameter --haploid_X specifies whether chrX is haploid, and the parameter --haploid_genome enables haploid variant calling on all chromosomes.
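The ploidy rules above can be summarized as a small decision function. This is a hypothetical sketch of the documented behaviour, not NanoCaller's code: it assumes contigs are named exactly chrX, chrY and chrM as written in the notes, and that every other chromosome is diploid unless --haploid_genome is set.

```python
# Chromosomes treated as haploid by default, per the v3.2.0 notes.
DEFAULT_HAPLOID = {"chrY", "chrM"}


def ploidy(chrom: str, haploid_X: bool = False, haploid_genome: bool = False) -> int:
    """Return 1 (haploid) or 2 (diploid) for a contig, following the documented rules.

    Hypothetical helper: assumes 'chr'-prefixed names exactly as written in the notes.
    """
    if haploid_genome:
        return 1  # --haploid_genome: every chromosome is called as haploid
    if chrom == "chrX":
        return 1 if haploid_X else 2  # --haploid_X decides chrX
    return 1 if chrom in DEFAULT_HAPLOID else 2


if __name__ == "__main__":
    for chrom in ["chr1", "chrX", "chrY", "chrM"]:
        print(chrom, ploidy(chrom), ploidy(chrom, haploid_X=True))
```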

v3.0.1 (March 14 2023): Fixed several critical bugs related to coverage normalization and integer overflow. These bugs affected samples with very low or very high coverage. The normalization bug was introduced in v3.0.0, so samples processed with earlier versions should not have been affected. The integer overflow bug was much older, but it only affected samples with coverage above 256.
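Why would a bug only show up at a coverage of around 256? One plausible explanation, which is an assumption since the notes do not name the data type, is a per-site read counter stored as an 8-bit unsigned integer, whose maximum value is 255. The hypothetical sketch below uses `ctypes.c_uint8` to show how such a counter silently wraps around once depth exceeds that limit.

```python
import ctypes


def count_reads_uint8(depth: int) -> int:
    """Count `depth` reads with an 8-bit unsigned counter and return what it holds.

    Hypothetical illustration: ctypes integers perform no overflow checking,
    so the stored value is depth modulo 256.
    """
    counter = ctypes.c_uint8(0)
    for _ in range(depth):
        counter.value += 1  # 255 + 1 wraps silently to 0
    return counter.value


if __name__ == "__main__":
    for depth in (30, 255, 300):
        print(depth, "->", count_reads_uint8(depth))
```

At a depth of 300 the counter holds 44, so a very high-coverage site would look like a low-coverage one, which is consistent with the symptom described above.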

v3.0.0 (June 7 2022): A major API update with a single entry point for running NanoCaller. The parallelization routine was substantially reworked, and GNU parallel is no longer used for whole-genome variant calling.

v2.0.0 (Feb 2 2022): A major update to the API and installation instructions, with the release of a bioconda recipe for NanoCaller. Added support for indel calling when phasing is poor or unavailable.

v1.0.0 (Aug 8 2021): First post-production release with a citeable DOI.

v0.4.1 (Aug 3 2021): Fixed a bug that slowed runtime in whole-genome variant calling mode.

v0.4.0 (June 2 2021): Added NanoCaller models trained on ONT reads basecalled with Guppy v4.2.2 and Bonito v0.30, as well as on R10.3 reads. Added new NanoCaller models trained on long CCS reads (15-20 kb library size selection). Improved indel calling with a rolling window for candidate selection, which helps with indels in low-complexity regions.

Installation

NanoCaller can be installed using Docker or Conda. The easiest way to install is from the bioconda channel:

conda install -c bioconda nanocaller

or using Docker:

VERSION="3.2.0"
docker pull genomicslab/nanocaller:${VERSION}

Please refer to Installation for instructions regarding installing NanoCaller through other methods.

Usage

General usage of NanoCaller is described in Usage. Some quick usage examples:

  • NanoCaller --bam YOUR_BAM --ref YOUR_REF --cpu 10 will run NanoCaller on whole genome using 10 parallel processes.
  • NanoCaller --bam YOUR_BAM --ref YOUR_REF --cpu 10 --mode snps will only call SNPs.
  • NanoCaller --bam YOUR_BAM --ref YOUR_REF --cpu 10 --mode snps --phase will only call and phase SNPs, and will additionally phase the BAM file (written to the intermediate_phase_files subfolder, split by chromosome).
  • NanoCaller --bam YOUR_BAM --ref YOUR_REF --cpu 10 --haploid_genome will run NanoCaller on whole genome under the assumption that the genome is haploid.
  • NanoCaller --bam YOUR_BAM --ref YOUR_REF --cpu 10 --regions chr22:20000000-21000000 chr21 will run NanoCaller on chr21 and chr22:20000000-21000000 only.
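The quick usage examples above can also be assembled programmatically, for instance when scripting runs over many samples. The helper below is hypothetical (it is not part of NanoCaller) and only uses the flags shown in those examples; the default of 10 CPUs simply mirrors them. It returns an argument list suitable for `subprocess.run`.

```python
import shlex
from typing import List, Optional, Sequence


def nanocaller_cmd(
    bam: str,
    ref: str,
    cpu: int = 10,
    mode: Optional[str] = None,
    phase: bool = False,
    haploid_genome: bool = False,
    regions: Sequence[str] = (),
) -> List[str]:
    """Build a NanoCaller argument list from the flags in the usage examples."""
    cmd = ["NanoCaller", "--bam", bam, "--ref", ref, "--cpu", str(cpu)]
    if mode is not None:
        cmd += ["--mode", mode]  # e.g. "snps" to call SNPs only
    if phase:
        cmd.append("--phase")
    if haploid_genome:
        cmd.append("--haploid_genome")
    if regions:
        cmd += ["--regions", *regions]  # space-separated contigs or chr:start-end
    return cmd


if __name__ == "__main__":
    cmd = nanocaller_cmd("YOUR_BAM", "YOUR_REF", mode="snps", phase=True)
    print(shlex.join(cmd))
    # To actually run it: subprocess.run(cmd, check=True)
```

Returning a list rather than a single string avoids shell quoting problems with paths that contain spaces.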

For a comprehensive case study of variant calling on Nanopore reads, see ONT Case Study, which describes an end-to-end variant calling pipeline using NanoCaller: aligning the FASTQ files of HG002, calling variants with NanoCaller, and evaluating performance across various genomic regions.

Trained models

Trained models for ONT, CLR and HiFi data can be found here. These models are trained on chr1-22 of the genomes stated below, unless mentioned otherwise.

You can specify SNP and indel models using the --snp_model and --indel_model parameters with a model name from the tables below. For instance, to use the 'ONT-HG002_bonito' SNP model and the 'ONT-HG002' indel model, use the following command:

NanoCaller --bam YOUR_BAM --ref YOUR_REF --snp_model ONT-HG002_bonito --indel_model ONT-HG002

SNP Models

| Model Name | Sequencing Technology | Genome | Coverage (x) | Benchmark | Basecaller |
|---|---|---|---|---|---|
| ONT-HG001 | ONT R9.4.1 | HG001 | 55 | v3.3.2 | Guppy4.2.2 |
| ONT-HG001_GP2.3.8 | ONT R9.4.1 | HG001 | 34 | v3.3.2 | Guppy2.3.8 |
| ONT-HG001_GP2.3.8-4.2.2 | ONT R9.4.1 | HG001 | 45 | v3.3.2 | Guppy (2.3.8 + 4.2.2) |
| ONT-HG001-4_GP4.2.2 | ONT R9.4.1 | HG001-4 | 69 | v3.3.2 (HG001) + v4.2.1 (HG002-4) | Guppy4.2.2 |
| ONT-HG002 | ONT R9.4.1 | HG002 | 47 | v4.2.1 | Guppy4.2.2 |
| ONT-HG002_GP4.2.2_v3.3.2 | ONT R9.4.1 | HG002 | 47 | v3.3.2 | Guppy4.2.2 |
| ONT-HG002_GP2.3.4_v3.3.2 | ONT R9.4.1 | HG002 | 53 | v3.3.2 | Guppy2.3.4 |
| ONT-HG002_GP2.3.4_v4.2.1 | ONT R9.4.1 | HG002 | 53 | v4.2.1 | Guppy2.3.4 |
| ONT-HG002_bonito | ONT R9.4.1 | HG002 (chr1-21) | 51 | v4.2.1 | Bonito v0.30 |
| ONT-HG002_r10.3 | ONT R10.3 | HG002 (chr1-21) | 32 | v4.2.1 | Guppy4.0.11 |
| CCS-HG001 | PacBio CCS | HG001 | 57 | v3.3.2 | - |
| CCS-HG002 | PacBio CCS | HG002 | 56 | v4.2.1 | - |
| CCS-HG001-4 | PacBio CCS | HG001-4 | 55 | v3.3.2 (HG001) + v4.2.1 (HG002-4) | Guppy4.2.2 |
| CLR-HG002 | PacBio CLR | HG002 | 58 | v4.2.1 | - |
| NanoCaller1 | ONT R9.4.1 | HG001 | 34 | v3.3.2 | Guppy2.3.8 |
| NanoCaller2 | ONT R9.4.1 | HG002 | 53 | v3.3.2 | Guppy2.3.4 |
| NanoCaller3 | PacBio CLR | HG003 | 28 | v3.3.2 | - |

Indel Models

| Model Name | Sequencing Technology | Genome | Coverage (x) | Benchmark | Basecaller |
|---|---|---|---|---|---|
| ONT-HG001 | ONT R9.4.1 | HG001 | 55 | v3.3.2 | Guppy4.2.2 |
| ONT-HG002 | ONT R9.4.1 | HG002 | 47 | v4.2.1 | Guppy4.2.2 |
| CCS-HG001 | PacBio CCS | HG001 | 57 | v3.3.2 | - |
| CCS-HG002 | PacBio CCS | HG002 | 56 | v4.2.1 | - |
| NanoCaller1 | ONT R9.4.1 | HG001 | 34 | v3.3.2 | Guppy2.3.8 |
| NanoCaller3 | PacBio CCS | HG001 | 29 | v3.3.2 | - |
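If you script model selection, the two tables above can be kept as plain data. This sketch copies the model names and sequencing technologies from those tables; the helper names are hypothetical, and NanoCaller itself may accept models not listed here, so treat the validation as a local sanity check rather than the tool's own behaviour.

```python
from typing import Dict, List

# Model name -> sequencing technology, copied from the SNP Models table.
SNP_MODELS: Dict[str, str] = {
    "ONT-HG001": "ONT R9.4.1", "ONT-HG001_GP2.3.8": "ONT R9.4.1",
    "ONT-HG001_GP2.3.8-4.2.2": "ONT R9.4.1", "ONT-HG001-4_GP4.2.2": "ONT R9.4.1",
    "ONT-HG002": "ONT R9.4.1", "ONT-HG002_GP4.2.2_v3.3.2": "ONT R9.4.1",
    "ONT-HG002_GP2.3.4_v3.3.2": "ONT R9.4.1", "ONT-HG002_GP2.3.4_v4.2.1": "ONT R9.4.1",
    "ONT-HG002_bonito": "ONT R9.4.1", "ONT-HG002_r10.3": "ONT R10.3",
    "CCS-HG001": "PacBio CCS", "CCS-HG002": "PacBio CCS", "CCS-HG001-4": "PacBio CCS",
    "CLR-HG002": "PacBio CLR", "NanoCaller1": "ONT R9.4.1",
    "NanoCaller2": "ONT R9.4.1", "NanoCaller3": "PacBio CLR",
}

# Model name -> sequencing technology, copied from the Indel Models table.
INDEL_MODELS: Dict[str, str] = {
    "ONT-HG001": "ONT R9.4.1", "ONT-HG002": "ONT R9.4.1",
    "CCS-HG001": "PacBio CCS", "CCS-HG002": "PacBio CCS",
    "NanoCaller1": "ONT R9.4.1", "NanoCaller3": "PacBio CCS",
}


def models_for(technology: str, table: Dict[str, str]) -> List[str]:
    """List the model names in `table` trained on the given sequencing technology."""
    return sorted(name for name, tech in table.items() if tech == technology)


def model_args(snp_model: str, indel_model: str) -> List[str]:
    """Return --snp_model/--indel_model arguments after checking both tables."""
    if snp_model not in SNP_MODELS:
        raise ValueError(f"{snp_model!r} is not in the SNP model table")
    if indel_model not in INDEL_MODELS:
        raise ValueError(f"{indel_model!r} is not in the indel model table")
    return ["--snp_model", snp_model, "--indel_model", indel_model]


if __name__ == "__main__":
    print(models_for("PacBio CCS", INDEL_MODELS))
    print(model_args("ONT-HG002_bonito", "ONT-HG002"))
```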

Citing NanoCaller

Please cite: Ahsan, M.U., Liu, Q., Fang, L. et al. NanoCaller for accurate detection of SNPs and indels in difficult-to-map regions from long-read sequencing by haplotype-aware deep neural networks. Genome Biol 22, 261 (2021). https://doi.org/10.1186/s13059-021-02472-2.

About

Variant calling tool for long-read sequencing data
