GitHub - songlab-cal/nj-theory: NJ Theory paper reproducibility

nj-theory

This repository allows reproducing all results from our paper "Tree reconstruction guarantees from CRISPR-Cas9 lineage tracing data using Neighbor-Joining".

To reproduce all results, first create a Python environment and install all requirements. For instance:

$ conda create --name nj-theory-repro python=3.10
$ conda activate nj-theory-repro
$ pip install -r requirements.txt

If you have any issues setting up the environment, you can install from pip_freeze.txt instead of requirements.txt.

Make sure the tests are passing:

$ pip install pytest
$ python -m pytest tests/

Then, you can just run:

$ time python -m casbench.papers.paper_nj_theory.figures

NOTE: You can specify the number of processes used to parallelize computation by changing the variable NUM_PROCESSES = 8 in figures.py.
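
As a rough aid for choosing that value, the sketch below suggests a process count that does not exceed the CPUs on the current machine. The helper name suggest_num_processes and its reserve parameter are hypothetical and not part of this repository; only the default of 8 comes from the note above.

```python
import os


def suggest_num_processes(requested: int = 8, reserve: int = 1) -> int:
    """Suggest a value for NUM_PROCESSES (hypothetical helper, not in the repo).

    Caps `requested` at the number of CPUs reported by the OS minus `reserve`,
    and never returns less than 1.
    """
    available = os.cpu_count() or 1  # os.cpu_count() may return None
    return max(1, min(requested, available - reserve))


if __name__ == "__main__":
    print(f"Suggested NUM_PROCESSES = {suggest_num_processes()}")
```

You would still need to edit NUM_PROCESSES in figures.py by hand with the suggested value.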

Each function call in the figures.py file reproduces one set of figures:

  • fig_kp() reproduces the results on the KP data, showing that distance correction obtains the best performance on the majority of clones. The results table will be written to the file nj_theory_figures/kp_table.tex.
  • run_simulated_data_benchmark() reproduces the simulated data benchmark results, showing the performance of each of the four models on the different lineage tracing regimes. The figures will be located at nj_theory_figures/simulated_data_benchmark/.
  • fig_consistency_experiment() reproduces the specific simulation results showing that as the number of lineage tracing characters increases, tree reconstructions become perfect. The figures will be located at nj_theory_figures/nj_theory_paper_consistency/.
  • fig_statistical_efficiency() reproduces the specific simulation results used to dissect the statistical efficiency of the distance correction approach, where we see that distance correction achieves similar performance with 10-15 percent fewer characters. The figures will be located at nj_theory_figures/nj_theory_paper_statistical_efficiency/.
  • fig_q_distribution() reproduces the figure showing the CRISPR/Cas9 indel state probabilities. It will be written to the file nj_theory_figures/fig_q_distribution.png.
  • fig_gt_trees() reproduces the figures showing the simulated trees. The trees will be located in the folder nj_theory_figures/trees/.
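
After a run, a small script along these lines can confirm that each output listed above was produced. It is a minimal sketch: the paths are copied from the list above, while the helper names and the assumption that outputs are written relative to the directory you ran figures.py from are ours.

```python
from pathlib import Path

# Output locations as listed above. They are assumed (not confirmed) to be
# relative to the directory from which figures.py was run.
EXPECTED_OUTPUTS = {
    "fig_kp": "nj_theory_figures/kp_table.tex",
    "run_simulated_data_benchmark": "nj_theory_figures/simulated_data_benchmark",
    "fig_consistency_experiment": "nj_theory_figures/nj_theory_paper_consistency",
    "fig_statistical_efficiency": "nj_theory_figures/nj_theory_paper_statistical_efficiency",
    "fig_q_distribution": "nj_theory_figures/fig_q_distribution.png",
    "fig_gt_trees": "nj_theory_figures/trees",
}


def output_present(path: Path) -> bool:
    """A directory counts as present if it is non-empty; a file if it has content."""
    if path.is_dir():
        return any(path.iterdir())
    return path.is_file() and path.stat().st_size > 0


def check_outputs(root: Path = Path(".")) -> dict:
    """Map each figures.py function name to whether its output was found under `root`."""
    return {name: output_present(root / rel) for name, rel in EXPECTED_OUTPUTS.items()}


if __name__ == "__main__":
    for name, ok in check_outputs().items():
        print(f"{'OK     ' if ok else 'MISSING'}  {name}()")
```

The check only looks for non-empty files and folders; it does not validate the contents of the figures or the table.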

If you want to make sure that everything will run smoothly, we recommend uncommenting and running the # FAST TEST VERSIONS block of code first.

The codebase uses caching to make benchmarking faster and seamless. The data caches are stored in _cache_nj_theory (for the simulation results) and _cache_nj_theory_real_data (for the KP experiment results). Feel free to delete these cache directories to free up space after you are done reproducing our results.
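
If you prefer to script the cleanup, the following sketch reports the size of the two cache directories and deletes them only when asked. The directory names come from the paragraph above; the function names, the dry_run flag, and the assumption that the caches sit in the current working directory are ours.

```python
import shutil
from pathlib import Path

# Cache directory names as given above; assumed to live in the working directory.
CACHE_DIRS = ("_cache_nj_theory", "_cache_nj_theory_real_data")


def dir_size_bytes(path: Path) -> int:
    """Total size of all regular files under `path`."""
    return sum(f.stat().st_size for f in path.rglob("*") if f.is_file())


def clear_caches(root: Path = Path("."), dry_run: bool = True) -> int:
    """Report cache sizes under `root`; delete the caches only if dry_run is False.

    Returns the total number of bytes found in the cache directories.
    """
    total = 0
    for name in CACHE_DIRS:
        cache = root / name
        if not cache.is_dir():
            continue  # nothing cached here (or already deleted)
        size = dir_size_bytes(cache)
        total += size
        print(f"{name}: {size / 1e6:.1f} MB")
        if not dry_run:
            shutil.rmtree(cache)
    return total


if __name__ == "__main__":
    clear_caches(dry_run=True)  # switch to dry_run=False to actually delete
```

Defaulting to a dry run is a deliberate choice here, since deleting the caches means the next run recomputes everything from scratch.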

Getting simulated trees

To get the trees used in our simulated data benchmark, run:

$ time python -m casbench.papers.paper_nj_theory.dryad

The trees and character matrices will be located in the trees folder.
