GitHub - ccwu1212/VCF2Dis: VCF2Dis: A new simple and efficient software to calculate p-distance matrix based Variant Call Format
VCF2Dis

VCF2Dis: A new, simple and efficient tool to calculate a p-distance matrix based on Variant Call Format (VCF) files

1) Install


Download


Just run [make] or [sh make.sh] to compile this software. The final executable can be found at [bin/VCF2Dis].
For Linux/Unix and macOS:

        tar -zxvf  VCF2DisXXX.tar.gz
        cd VCF2DisXXX
        make ; make clean          # if linking fails, re-install the [zlib] library and copy it
                                   # into the library dir VCF2Dis-xx/src/include/zlib
        ./bin/VCF2Dis
  

Note: If linking fails, try re-installing the zlib library.

2) Example


    1. Parameter description:
	Usage: VCF2Dis -InPut  <in.vcf>  -OutPut  <p_dis.mat>

		-InPut     <str>     Input GATK VCF genotype File
		-OutPut    <str>     OutPut Sample p-Distance matrix

		-SubPop    <str>     SubGroup SampleList of VCFFile [ALLsample]
		-KeepMF              Keep the Middle File diff & Use matrix

		-help                Show more help [hewm2008 v1.10]
    2. To create the p_distance matrix
      # 2.1) To compute the p_distance matrix for all samples in the VCF, run VCF2Dis directly:
      ./bin/VCF2Dis	-InPut	in.vcf.gz	-OutPut p_dis.mat

      # 2.2) To compute the p_distance matrix for a subgroup of samples, put their sample names into the file sample.list:
      ./bin/VCF2Dis	-InPut	in.vcf.gz	-OutPut p_dis.mat  -SubPop  sample.list
    3. Construct an NJ tree and display it (this requires other software)
      #    3.1 Run  PHYLIP  
      #   Once the p_distance matrix is computed, PHYLIP 3.69 (http://evolution.genetics.washington.edu/phylip.html) can be used to construct the phylogenetic tree from it with the neighbor-joining method:
       
           PHYLIPNEW-3.69.650/bin/fneighbor  -datafile p_dis.mat  -outfile tree.out1.txt -matrixtype s -treetype n -outtreefile tree.out2.tre

      #    3.2 Run  MEGA  
      #    MEGA6 (http://www.megasoftware.net/) can be used to display the phylogenetic tree from the file [tree.out2.tre]
    4. You can then view the neighbor-joining tree and save it in PDF format.

3) Introduction


VCF2Dis computes the p_distance matrix from a VCF file. For more information about the p_distance matrix, see this website. The SNPs in the VCF dataset are used to calculate the p-distance between individuals; the genetic distance between sample i and sample j is computed with the following formula:

            D_ij = (1/L) * sum_l d(l)_ij


where L is the length of the regions in which SNPs can be identified, the sum runs over the positions l in those regions, and, given that the alleles at position l are A/C:

            d(l)_ij=0.0     if the genotypes of the two individuals were AA and AA;
            d(l)_ij=0.5     if the genotypes of the two individuals were AA and AC;
            d(l)_ij=0.0     if the genotypes of the two individuals were AC and AC;
            d(l)_ij=1.0     if the genotypes of the two individuals were AA and CC;
            d(l)_ij=0.0     if the genotypes of the two individuals were CC and CC;

4) Results


Some NJ-tree images that I drew in previous papers.

5) Discussion


######################swimming in the sky and flying in the sea ########################### ##
