VCF2Dis

VCF2Dis: a simple and efficient tool to calculate a p-distance matrix from a Variant Call Format (VCF) file

1) Install

Run [make] or [sh make.sh] to compile the software. The compiled program can be found at [bin/VCF2Dis].
For Linux/Unix and macOS:

        tar -zxvf  VCF2DisXXX.tar.gz             # if linking fails, try re-installing the [zlib] library
        cd VCF2DisXXX;                           # and copying it to the library directory
        make ; make clean                        # VCF2Dis-xx/src/include/zlib
        ./bin/VCF2Dis

Note: If linking fails, try re-installing the zlib library.

2) Example

1. Parameter description:

	Usage: VCF2Dis -InPut  <in.vcf>  -OutPut  <p_dis.mat>

		-InPut     <str>     Input GATK VCF genotype File
		-OutPut    <str>     OutPut Sample p-Distance matrix

		-SubPop    <str>     SubGroup SampleList of VCFFile [ALLsample]
		-KeepMF              Keep the Middle File diff & Use matrix

		-help                Show more help [hewm2008 v1.10]

2. Create the p_distance matrix

# 2.1) To create the p_distance matrix for all samples in the VCF, run VCF2Dis directly
      ./bin/VCF2Dis	-InPut	in.vcf.gz	-OutPut p_dis.mat

# 2.2) To create the p_distance matrix for a subgroup of samples, put their sample names into the file sample.list
      ./bin/VCF2Dis	-InPut	in.vcf.gz	-OutPut p_dis.mat  -SubPop  sample.list
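
Before handing the matrix to another program, it can help to sanity-check it. The following is a minimal sketch, not part of VCF2Dis. It assumes the output file is a PHYLIP-style square matrix (a first line with the number of samples, then one line per sample holding the name followed by the distances), which is what the `-matrixtype s` option in the fneighbor command of step 3 suggests. The function names `read_square_matrix` and `is_symmetric` are hypothetical, and the three-sample matrix is invented illustration data, not real output.

```python
def read_square_matrix(lines):
    """Parse a PHYLIP-style square distance matrix from an iterable of lines.

    Assumed layout (not confirmed by the VCF2Dis documentation):
    first line = number of samples, then 'name d1 d2 ... dn' per sample,
    with sample names that contain no whitespace.
    """
    rows = [ln.split() for ln in lines if ln.strip()]
    n = int(rows[0][0])
    names, matrix = [], []
    for fields in rows[1:n + 1]:
        names.append(fields[0])
        matrix.append([float(x) for x in fields[1:n + 1]])
    return names, matrix


def is_symmetric(matrix, tol=1e-9):
    """Return True if matrix[i][j] equals matrix[j][i] within tol for all i, j."""
    n = len(matrix)
    return all(abs(matrix[i][j] - matrix[j][i]) <= tol
               for i in range(n) for j in range(n))


# Invented toy matrix, for illustration only.
example = """3
S1  0.0  0.3  0.5
S2  0.3  0.0  0.4
S3  0.5  0.4  0.0
""".splitlines()

names, matrix = read_square_matrix(example)
print(names, is_symmetric(matrix))
```

To check a real file, pass an open file handle instead of the toy lines, for example `with open("p_dis.mat") as fh: names, matrix = read_square_matrix(fh)`.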

3. Construct the NJ tree and display it (requires other software)

      #    3.1 Run  PHYLIP  
      #   Once the p_distance matrix is ready, PHYLIP 3.69 (http://evolution.genetics.washington.edu/phylip.html) with the neighbor-joining method can be used to construct the phylogenetic tree from this p_distance matrix;
       
           PHYLIPNEW-3.69.650/bin/fneighbor  -datafile p_dis.mat  -outfile tree.out1.txt -matrixtype s -treetype n -outtreefile tree.out2.tre

      #    3.2 Run  MEGA  
      #    MEGA6 (http://www.megasoftware.net/) can be used to display the phylogenetic tree from the file [tree.out2.tre]

4. You can then view the neighbor-joining tree and save it in PDF format

3) Introduction

VCF2Dis creates a p_distance matrix from a VCF file. For more information about the p_distance matrix, see this website. The SNP dataset in the VCF is used to calculate the p-distance between individuals. The genetic distance between sample i and sample j is computed with the following formula:

            D_ij=(1/L) * [(sum(d(l)_ij))]

Where L is the length of regions where SNPs can be identified, and given the alleles at position l are A/C:

            d(l)_ij=0.0     if the genotypes of the two individuals were AA and AA;
            d(l)_ij=0.5     if the genotypes of the two individuals were AA and AC;
            d(l)_ij=0.0     if the genotypes of the two individuals were AC and AC;
            d(l)_ij=1.0     if the genotypes of the two individuals were AA and CC;
            d(l)_ij=0.0     if the genotypes of the two individuals were CC and CC;

4) Results

Some NJ-tree images that I drew for previous papers.

5) Discussion

📧 hewm2008@gmail.com / hewm2008@qq.com
Join the QQ Group: 125293663

######################swimming in the sky and flying in the sea ########################### ##
