elkebir-group/Dolphyin: A phylogenetic inference method for 1-Dollo phylogenies


Dolphyin - inferring 1-Dollo, or persistent, phylogenies from single cell SNV sequencing data

Dolphyin takes in a binary matrix of single cell SNV sequencing data and outputs a 1-Dollo phylogeny on this data, that is, a rooted tree on which each character (SNV) is gained and lost at most once. Underlying Dolphyin is a theoretical characterization and decomposition of all data matrices admitting a 1-Dollo phylogeny, which in turn stems from recursively breaking 1-Dollo phylogenies into 1-Dollo linear phylogenies that can be characterized with the consecutive ones property (Fulkerson and Gross (1965)).
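To make the link to the consecutive ones property concrete, here is a minimal sketch, not taken from Dolphyin's source. It assumes that rows are cells and columns are SNVs, and that the cells of a linear phylogeny are listed in the order they appear along the path. Under that assumption, a character gained once and lost at most once is present on one contiguous stretch of the path, so every column should contain a single run of 1s. The function name `hasConsecutiveOnes` and the `order` parameter are hypothetical, and the check only tests one given ordering; it does not search for an ordering.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Binary matrix: rows are cells (taxa), columns are characters (SNVs).
using Matrix = std::vector<std::vector<int>>;

// Returns true if, with rows visited in the given order, the 1s of every
// column form one contiguous run (consecutive ones for that fixed ordering).
// `order` is a hypothetical candidate ordering of row indices.
bool hasConsecutiveOnes(const Matrix& B, const std::vector<std::size_t>& order) {
    if (B.empty()) return true;
    const std::size_t n = B[0].size();
    for (std::size_t j = 0; j < n; ++j) {
        // state 0: before the run, 1: inside the run, 2: after the run
        int state = 0;
        for (std::size_t i : order) {
            assert(i < B.size() && B[i].size() == n);
            if (B[i][j] == 1) {
                if (state == 2) return false;  // a second run of 1s starts
                state = 1;
            } else if (state == 1) {
                state = 2;  // the run of 1s has ended (character lost)
            }
        }
    }
    return true;
}
```

For the matrix with rows `1 0`, `1 1`, `0 1`, the ordering 0, 1, 2 passes the check, while the ordering 0, 2, 1 fails because the first column would read 1, 0, 1.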

Overview of Dolphyin's 1-Dollo decomposition

Dolphyin can also probabilistically account for false negatives in data and can thus be applied to real data. We also include source code (check-FN.cpp) that, given any phylogeny inferred by Dolphyin, will calculate the phylogeny's false-negative rate and provide a .txt file of the tree's Graphviz visualization.

Contents

  1. Getting started
  2. Usage instructions

Getting started

Dolphyin is implemented in C++. If using Dolphyin, we recommend checking out only src, as input and output contain several thousand files of data.

Folder    Description
src       source code for Dolphyin
input     simulated data with errors, based on the simulations in SPhyR (El-Kebir (2018)), and real data from an acute myeloid leukemia cohort (Morita et al. (2020))
output    results on error-free simulations, simulations with errors, and real data

Dependencies

Dolphyin requires a C++ compiler with C++11 support, such as g++ (used in the compilation commands below).

Compilation

To compile Dolphyin, execute the following command from the root of the repository:

$ g++ -std=c++11 src/run-Dolphyin.cpp -o src/run-Dolphyin.o

To compile the additional check-FN tool, execute the following command from the root of the repository:

$ g++ -std=c++11 src/check-FN.cpp -o src/check-FN.o

Usage instructions

I/O formats

The input to Dolphyin is a .csv or .txt file. Its first two lines give the number of cells m and the number of sequenced SNVs n; these are followed by m rows and n columns, where entry (i, j) is 1 if SNV j is present in cell i and 0 otherwise. Dolphyin's second parameter is the location of a .txt file to which it will write the clades of the returned 1-Dollo phylogeny.

Lastly, Dolphyin takes the parameters p, e, and seed for error correction. p denotes the percentage of row pairs to randomly consider for error correction. e (which appears to correspond to -k in the usage synopsis below) denotes a threshold on the normalized Hamming distance: a considered row pair whose distance is under e is corrected by replacing both rows with their bitwise OR. seed is the random seed for this error correction.
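The error-correction step described above can be sketched as follows. This is an illustration under stated assumptions, not Dolphyin's actual implementation: it assumes p is given as a probability in [0, 1] (as the example value 0.25 suggests), that each unordered row pair is considered independently with probability p, and that "under" means strictly less than e. The names `normalizedHamming` and `correctFalseNegatives` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

using Matrix = std::vector<std::vector<int>>;

// Fraction of positions at which two equal-length binary rows differ.
double normalizedHamming(const std::vector<int>& a, const std::vector<int>& b) {
    assert(a.size() == b.size() && !a.empty());
    std::size_t diff = 0;
    for (std::size_t j = 0; j < a.size(); ++j) diff += (a[j] != b[j]);
    return static_cast<double>(diff) / static_cast<double>(a.size());
}

// Hypothetical helper: considers each row pair with probability p (assumed
// to lie in [0, 1]) and, if the pair's normalized Hamming distance is
// strictly below e, replaces both rows with their bitwise OR.
void correctFalseNegatives(Matrix& B, double p, double e, unsigned seed) {
    assert(p >= 0.0 && p <= 1.0);
    std::mt19937 rng(seed);                   // seeded for reproducibility
    std::bernoulli_distribution consider(p);  // true with probability p
    for (std::size_t i = 0; i < B.size(); ++i) {
        for (std::size_t j = i + 1; j < B.size(); ++j) {
            if (!consider(rng)) continue;
            if (normalizedHamming(B[i], B[j]) < e) {
                for (std::size_t k = 0; k < B[i].size(); ++k) {
                    const int merged = B[i][k] | B[j][k];
                    B[i][k] = merged;
                    B[j][k] = merged;
                }
            }
        }
    }
}
```

Because the OR can only turn 0s into 1s, this kind of correction targets false negatives only. With e = 0.00, as in the example invocation further down, the strict comparison in this sketch never succeeds and the matrix is left unchanged.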

check-FN takes a .csv or .txt file of the original data, the .txt file of Dolphyin's output tree, a .txt file to which the false negative count and false negative rate are written, and an optional .txt file to which a Graphviz tree visualization will be written.
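As a rough illustration of what a false negative count and rate could look like, the sketch below compares the observed matrix with the binary matrix implied by an inferred phylogeny. It is not check-FN's code. It assumes the implied matrix is already available (how check-FN derives it from the clade file is not shown here), and it assumes one common definition of the rate: entries observed as 0 but inferred as 1, divided by all entries inferred as 1. The names `FnSummary` and `summarizeFalseNegatives` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<int>>;

struct FnSummary {
    std::size_t count;  // entries observed as 0 but inferred as 1
    double rate;        // count / number of inferred 1s (assumed definition)
};

// Hypothetical helper: compares the observed data with the matrix implied
// by an inferred phylogeny, both with the same dimensions.
FnSummary summarizeFalseNegatives(const Matrix& observed, const Matrix& inferred) {
    assert(observed.size() == inferred.size());
    std::size_t flipped = 0;
    std::size_t ones = 0;
    for (std::size_t i = 0; i < observed.size(); ++i) {
        assert(observed[i].size() == inferred[i].size());
        for (std::size_t j = 0; j < observed[i].size(); ++j) {
            if (inferred[i][j] == 1) {
                ++ones;
                if (observed[i][j] == 0) ++flipped;  // a presumed false negative
            }
        }
    }
    const double rate =
        ones == 0 ? 0.0 : static_cast<double>(flipped) / static_cast<double>(ones);
    return {flipped, rate};
}
```

For an observed matrix with rows `1 0`, `0 1` and an inferred matrix with rows `1 1`, `0 1`, this gives a count of 1 and a rate of 1/3 under the assumed definition.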

Usage:
  ./src/run-Dolphyin.o
     [-input str] [-output-dolphyin-tree str] [-p double] [-k double] [-seed int]
  ./src/check-FN.o
     [-input str] [-dolphyin-tree str] [-output-FN str] *[-output-graphviz str]

Example

The following is an example of how to use Dolphyin.

./src/run-Dolphyin.o input/sims/errors/m25_n25_s1_k1_loss0.1.txt output-m25_n25_s1_k1_loss0.1.txt 0.25 0.00 1

The following is an example of how to use the check-FN executable to get the false-negative rate and Graphviz phylogeny visualization file of Dolphyin's output.

./src/check-FN.o input/sims/errors/m25_n25_s1_k1_loss0.1.txt output-m25_n25_s1_k1_loss0.1.txt output-m25_n25_s1_k1_loss0.1-FNrate.txt output-m25_n25_s1_k1_loss0.1-vis.txt

The following is an example of how to use Graphviz to examine a phylogeny after running Dolphyin and the check-FN executable. Neither Dolphyin nor the check-FN executable depends on Graphviz being installed. The visualization file can be edited to customize the result (removing the explicit labeling of nodes mapping to 0 taxa, labeling characters with other identifiers, etc.).

dot -Tsvg output-m25_n25_s1_k1_loss0.1-vis.txt

Graphviz visualization
