This project introduces a single C++ file as interface to the basic htslib
, which can be easily included in a C++ program
for scripting high-performance genomic analyses.
Features:
- single file to be easily included and compiled
- easy and safe API to use. no worry about free memory.
- has the full functionalities of the
htslib
, eg. supports of compressed VCF/BCF and URL link as filename. - compatible with C++11 and later
- download the released vcfpp.h and include it in your program.
- make sure you have https://github.com/samtools/htslib installed on your system and the it is in your environment.
You can copy paste the example code into a example.cpp
file and compile it by g++ example.cpp -std=c++11 -O2 -Wall -I. -lhts -lz -lm -lbz2 -llzma -lcurl
Let’s count the number of heterozygous sites for each sample in all records. The core is just 10 lines.
#include "vcfpp.h"
using namespace std;
using namespace vcfpp;
int main(int argc, char* argv[])
{
// fetch data from 1000 genomes server
BcfReader vcf("http://ftp.1000genomes.ebi.ac.uk/vol1/ftp/data_collections/1000G_2504_high_coverage/working/20220422_3202_phased_SNV_INDEL_SV/1kGP_high_coverage_Illumina.chr22.filtered.SNV_INDEL_SV_phased_panel.vcf.gz");
BcfRecord v(vcf.header); // construct a variant record
vector<char> gt; // genotype can be of bool, char or int type
vector<int> hetsum(vcf.nsamples);
while (vcf.getNextVariant(v)) {
if (!v.isSNP()) continue; // skip other type of variants
v.getGenotypes(gt);
for (int i = 0; i < gt.size()/2 ; i++) { // for diploid
hetsum[i] += abs(gt[2 * i + 0] - gt[2 * i +1]);
}
}
for (auto i : hetsum) { cout << i << endl; }
return 0;
}
Find more useful algorithms/tools with VCF/BCF input in tools