nj-theory

This repository reproduces all results from our paper "Tree reconstruction guarantees from CRISPR-Cas9 lineage tracing data using Neighbor-Joining".

To reproduce all results, first create a Python environment and install all requirements. For instance:

$ conda create --name nj-theory-repro python=3.10
$ conda activate nj-theory-repro
$ pip install -r requirements.txt

If you have any issues setting up the environment, you can install from pip_freeze.txt instead.

Make sure the tests are passing:

$ pip install pytest
$ python -m pytest tests/

Then reproduce all figures by running:

$ time python -m casbench.papers.paper_nj_theory.figures

NOTE: You can specify the number of processes used to parallelize computation by changing the variable NUM_PROCESSES = 8 in figures.py.
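As a rough guide for choosing that value, the sketch below suggests a process count from the number of CPU cores on the machine. The helper name suggest_num_processes and its reserve and default parameters are hypothetical and not part of this repository; only the variable name NUM_PROCESSES and its value of 8 come from figures.py.

```python
import os


def suggest_num_processes(reserve: int = 1, default: int = 8) -> int:
    """Suggest a value for NUM_PROCESSES (hypothetical helper, not in the repo).

    Leaves `reserve` cores free for the rest of the system and never
    returns fewer than 1. Falls back to `default` when the core count
    cannot be determined.
    """
    cores = os.cpu_count()  # may return None on some platforms
    if cores is None:
        return default
    return max(1, cores - reserve)


if __name__ == "__main__":
    print(f"Suggested setting: NUM_PROCESSES = {suggest_num_processes()}")
```

The printed value still has to be copied into figures.py by hand; this sketch does not modify the file.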

Each function call in the figures.py file reproduces one set of figures:

fig_kp() reproduces the results on the KP data, showing that distance correction obtains the best performance on the majority of clones. The results table will be written to the file nj_theory_figures/kp_table.tex.
run_simulated_data_benchmark() reproduces the simulated data benchmark results, showing the performance of each of the 4 models on the different lineage tracing regimes. The figures will be located at nj_theory_figures/simulated_data_benchmark/.
fig_consistency_experiment() reproduces the specific simulation results showing that as the number of lineage tracing characters increases, tree reconstructions become perfect. The figures will be located at nj_theory_figures/nj_theory_paper_consistency/.
fig_statistical_efficiency() reproduces the specific simulation results used to dissect the statistical efficiency of the distance correction approach, where we see that distance correction achieves similar performance with 10-15 percent fewer characters. The figures will be located at nj_theory_figures/nj_theory_paper_statistical_efficiency/.
fig_q_distribution() reproduces the figure showing the CRISPR/Cas9 indel state probabilities. It will be written to the file nj_theory_figures/fig_q_distribution.png.
fig_gt_trees() reproduces the figures showing the simulated trees. The trees will be located in the folder nj_theory_figures/trees/.
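After a full run, a quick way to confirm that every output listed above was produced is to check for the paths on disk. The following is a minimal sketch: the helper missing_outputs is hypothetical, and it assumes the nj_theory_figures directory is created relative to the directory from which the script is run.

```python
from pathlib import Path

# Output names taken from the function descriptions above.
EXPECTED_OUTPUTS = [
    "kp_table.tex",
    "simulated_data_benchmark",
    "nj_theory_paper_consistency",
    "nj_theory_paper_statistical_efficiency",
    "fig_q_distribution.png",
    "trees",
]


def missing_outputs(root="nj_theory_figures"):
    """Return the expected outputs that do not exist under `root`."""
    root = Path(root)
    return [name for name in EXPECTED_OUTPUTS if not (root / name).exists()]


if __name__ == "__main__":
    missing = missing_outputs()
    if missing:
        print("Missing outputs:", ", ".join(missing))
    else:
        print("All expected outputs found.")
```

This only checks that the files and folders exist; it does not validate their contents.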

If you want to make sure that everything will run smoothly, we recommend uncommenting and running the # FAST TEST VERSIONS block of code first.

The codebase uses caching to make benchmarking faster and seamless. The data caches are set to _cache_nj_theory (for the simulation results) and _cache_nj_theory_real_data (for the KP experiment results). Feel free to delete these cache directories to free up space after you are done reproducing our results.
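The cleanup can be scripted. The sketch below reports how much space each cache directory uses and deletes it only when asked to. The helper names are hypothetical, and it assumes the two cache directories sit in the directory passed as base (by default, the current working directory).

```python
import shutil
from pathlib import Path

# Cache directory names as given in this README.
CACHE_DIRS = ["_cache_nj_theory", "_cache_nj_theory_real_data"]


def dir_size_bytes(path: Path) -> int:
    """Total size of all regular files under `path`."""
    return sum(f.stat().st_size for f in path.rglob("*") if f.is_file())


def clear_caches(base=".", dry_run=True):
    """Report the size of each cache directory; delete it unless dry_run.

    Returns a dict mapping each cache directory found to its size in bytes.
    """
    sizes = {}
    for name in CACHE_DIRS:
        path = Path(base) / name
        if not path.is_dir():
            continue  # nothing cached here yet
        sizes[name] = dir_size_bytes(path)
        if not dry_run:
            shutil.rmtree(path)
    return sizes


if __name__ == "__main__":
    # Dry run by default: only prints what would be freed.
    for name, size in clear_caches().items():
        print(f"{name}: {size / 1e6:.1f} MB")
```

Calling clear_caches(dry_run=False) removes the directories; later runs will then recompute everything from scratch.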

Getting simulated trees

To get the trees used in our simulated data benchmark, run:

$ time python -m casbench.papers.paper_nj_theory.dryad

The trees and character matrices will be located in the trees folder.
