GitHub - PoisonAlien/annovar2maf: Tiny python script to generate MAF files from output generated by standard annotation programs

PoisonAlien/annovar2maf

Introduction

This is a tiny Python script that generates MAF files from the output of standard annotation programs. Currently, ANNOVAR (table_annovar.pl) output and bcftools csq output can be converted to MAF.

$ python annovar2maf.py -h
usage: annovar2maf [-h] [-t TSB] [-b BUILD] [-p {refGene,ensGene}] [-c] input

Convert annovar and bcftools-csq annotations to MAF

positional arguments:
  input                 Annovar anotations file [Ex: myanno.hg19_multianno.txt] or a csq formatted file.

optional arguments:
  -h, --help            show this help message and exit
  -t TSB, --tsb TSB     Sample name. Default parses from the file name
  -b BUILD, --build BUILD
                        Reference genome build [Default: hg38]
  -p {refGene,ensGene}, --protocol {refGene,ensGene}
                        Protocol used to generate annovar annotations [Default: refGene]
  -c, --csq             Input file is a bcftools csq formatted output
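
The help text says the sample name defaults to a value parsed from the file name, but it does not spell out the rule. As a rough illustration only, the sketch below assumes the name is everything before the first dot of the file's base name; `guess_sample_name` is a hypothetical helper, not a function taken from annovar2maf.py.

```python
from pathlib import Path


def guess_sample_name(input_path: str) -> str:
    """Hypothetical rule: base name of the file, cut at the first dot."""
    # Assumption: the script's real parsing logic may differ from this.
    return Path(input_path).name.split(".", 1)[0]


print(guess_sample_name("tests/test_mutect.refseq.hg19_multianno.txt"))
```

Passing `-t` explicitly, as in the examples in this README, avoids depending on whatever rule the script actually applies.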

annovar2maf

python annovar2maf.py -t foo -b GRCh37 tests/test_mutect.refseq.hg19_multianno.txt 

# For annovar annotations generated with ensGene as a protocol
python annovar2maf.py -p ensGene -t foo -b GRCh37 tests/test_mutect.ens.hg19_multianno.txt
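
To give a feel for why the protocol matters, here is a minimal sketch of how an ANNOVAR exonic function could be translated into a MAF `Variant_Classification`. ANNOVAR suffixes its columns with the protocol name (for example `ExonicFunc.refGene` versus `ExonicFunc.ensGene`), so a converter needs to know which one to read. The mapping table and the `classify` helper are illustrative assumptions and are not copied from annovar2maf.py, whose actual table may be larger or different.

```python
from typing import Optional

# Assumed, partial mapping from ANNOVAR exonic functions to MAF classes.
EXONIC_TO_MAF = {
    "nonsynonymous SNV": "Missense_Mutation",
    "synonymous SNV": "Silent",
    "stopgain": "Nonsense_Mutation",
    "stoploss": "Nonstop_Mutation",
    "frameshift insertion": "Frame_Shift_Ins",
    "frameshift deletion": "Frame_Shift_Del",
    "nonframeshift insertion": "In_Frame_Ins",
    "nonframeshift deletion": "In_Frame_Del",
}


def classify(row: dict, protocol: str = "refGene") -> Optional[str]:
    """Look up the MAF class for one annotated row (hypothetical helper)."""
    # The column name depends on the protocol passed to table_annovar.pl.
    exonic_func = row.get(f"ExonicFunc.{protocol}", "")
    return EXONIC_TO_MAF.get(exonic_func)  # None when no mapping is known


print(classify({"ExonicFunc.ensGene": "stopgain"}, protocol="ensGene"))
```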

csq2maf

Similar to VEP, the bcftools csq command annotates variants with their consequences. It is lightweight and extremely fast. Its output can be converted to TSV with the split-vep plugin and then converted to MAF.

ref="Homo_sapiens.GRCh37.dna.primary_assembly.fa"

# Get the GFF files for your ref build
## GRCh38 with and without the chr prefix
#wget ftp://ftp.ensembl.org/pub/current_gff3/homo_sapiens/Homo_sapiens.GRCh38.110.chr.gff3.gz
#wget ftp://ftp.ensembl.org/pub/current_gff3/homo_sapiens/Homo_sapiens.GRCh38.110.gff3.gz

## GRCh37 with and without the chr prefix
#wget ftp://ftp.ensembl.org/pub/grch37/release-84/gff3/homo_sapiens/Homo_sapiens.GRCh37.82.chr.gff3.gz
wget ftp://ftp.ensembl.org/pub/grch37/release-84/gff3/homo_sapiens/Homo_sapiens.GRCh37.82.gff3.gz

## Step-1: The commands below left-normalize the VCF, split multi-allelic variants, and annotate the VCF with variant consequences, keeping the worst consequence for each variant.
bcftools norm -f ${ref} -m -both -Oz tests/test_mutect.vcf.gz | bcftools csq -c CSQ -f ${ref} -g Homo_sapiens.GRCh37.82.gff3.gz -p a | \
bcftools +split-vep /dev/stdin -Oz -o tests/test_mutect.csq.vcf.gz -c - -s worst

## Step-2: The command below converts the csq-annotated VCF to TSV
bcftools query -f '%CHROM\t%POS\t%REF\t%ALT\t%gene\t%transcript\t%Consequence\t%amino_acid_change\t%dna_change\n' tests/test_mutect.csq.vcf.gz > tests/test_mutect.csq.tsv

## Step-3: Now convert the TSV to MAF
python annovar2maf.py -c -t foo -b GRCh37 tests/test_mutect.csq.tsv
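
The TSV written in Step-2 has no header line, so its columns follow the order of the `bcftools query -f` format string. The sketch below shows one way such a file could be read and how a MAF-style `Variant_Type` might be derived from the REF and ALT alleles. `read_csq_tsv` and `variant_type` are hypothetical helpers, the example row is invented for illustration, and none of this is taken from annovar2maf.py.

```python
import csv
import io

# Column order taken from the format string used in Step-2.
CSQ_COLUMNS = [
    "CHROM", "POS", "REF", "ALT", "gene", "transcript",
    "Consequence", "amino_acid_change", "dna_change",
]


def variant_type(ref: str, alt: str) -> str:
    """Classify a normalized, bi-allelic variant by allele lengths."""
    if len(ref) == len(alt):
        return {1: "SNP", 2: "DNP", 3: "TNP"}.get(len(ref), "ONP")
    return "INS" if len(alt) > len(ref) else "DEL"


def read_csq_tsv(handle):
    """Yield one dict per TSV line, with an added Variant_Type field."""
    reader = csv.DictReader(
        handle, fieldnames=CSQ_COLUMNS, delimiter="\t", quoting=csv.QUOTE_NONE
    )
    for row in reader:
        row["Variant_Type"] = variant_type(row["REF"], row["ALT"])
        yield row


# Invented example line, only to exercise the helpers.
example = "1\t12345\tAT\tA\tGENE1\tTX1\tframeshift\t.\t.\n"
rows = list(read_csq_tsv(io.StringIO(example)))
print(rows[0]["gene"], rows[0]["Variant_Type"])
```

Classifying by allele length only works because Step-1 left-normalizes the VCF and splits multi-allelic sites, so every line carries exactly one ALT allele.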
