
viannegao/ChromaFold

ChromaFold

ChromaFold is a deep learning model that enables prediction of 3D contact maps from scATAC-seq data alone, by using pseudobulk chromatin accessibility and co-accessibility from scATAC-seq as well as predicted CTCF motif tracks as input features.
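As a rough illustration of the two scATAC-seq-derived features named above, the toy sketch below computes a pseudobulk accessibility track and a pairwise co-accessibility score from a small binary cell-by-tile matrix. This is not ChromaFold's preprocessing code: the matrix, the function names, and the choice of Jaccard similarity as the co-accessibility measure are all assumptions made for the example, and plain Python is used only to keep it dependency-free.

```python
from typing import List

Matrix = List[List[int]]  # rows = cells, columns = genomic tiles (1 = accessible)


def pseudobulk(cells: Matrix) -> List[int]:
    """Sum binary accessibility over all cells for each tile."""
    return [sum(column) for column in zip(*cells)]


def jaccard_coaccessibility(cells: Matrix, i: int, j: int) -> float:
    """Jaccard similarity of tiles i and j across cells (assumed measure)."""
    both = sum(1 for cell in cells if cell[i] and cell[j])
    either = sum(1 for cell in cells if cell[i] or cell[j])
    return both / either if either else 0.0


# Hypothetical toy data: 4 cells x 3 tiles.
cells = [
    [1, 1, 0],
    [1, 1, 0],
    [0, 0, 1],
    [1, 0, 1],
]

bulk = pseudobulk(cells)                         # one value per tile
coacc_01 = jaccard_coaccessibility(cells, 0, 1)  # tiles 0 and 1 often open together
coacc_12 = jaccard_coaccessibility(cells, 1, 2)  # tiles 1 and 2 never open together
print(bulk, coacc_01, coacc_12)
```

In this toy matrix, tiles 0 and 1 are open in the same cells more often than tiles 1 and 2, so they receive the higher co-accessibility score.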

[Figure: model]

Requirements

  • General

    • python=3.8
    • pytorch=1.11
    • numpy=1.21
    • pandas=1.4
    • scipy=1.7
  • Visualization

    • coolbox=0.3
    • matplotlib=3.2
    • seaborn=0.11
    • tabix=1.11
  • R

To use ArchR from R for data preprocessing, create an R environment named chromafold_env by following the steps in R_env.sh.

To deploy ChromaFold, create a conda environment from the provided .yml file:

conda env create -f chromafold.yml
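After creating the environment, it can be useful to confirm that the installed packages match the versions listed under Requirements. The following is a minimal sketch using only the standard library; the helper names are hypothetical, and it assumes PyTorch is registered under the distribution name torch (the conda package is called pytorch).

```python
from importlib import metadata
from typing import Dict

# Pinned major.minor versions copied from the Requirements list above.
# Assumption: PyTorch's installed distribution name is "torch".
PINNED = {"torch": "1.11", "numpy": "1.21", "pandas": "1.4", "scipy": "1.7"}


def version_matches(installed: str, pinned: str) -> bool:
    """True if the leading components of `installed` equal `pinned`."""
    want = pinned.split(".")
    return installed.split(".")[: len(want)] == want


def check_environment(pinned: Dict[str, str] = PINNED) -> Dict[str, str]:
    """Return a per-package status: 'ok', 'missing', or a mismatch note."""
    report = {}
    for name, want in pinned.items():
        try:
            have = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "missing"
            continue
        if version_matches(have, want):
            report[name] = "ok"
        else:
            report[name] = f"found {have}, expected {want}"
    return report


if __name__ == "__main__":
    for package, status in check_environment().items():
        print(f"{package}: {status}")
```

Run it inside the activated conda environment; any line other than "ok" points to a package worth reinstalling.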

Data Preprocessing

Raw data preparation

Sample raw and processed input data can be downloaded from https://drive.google.com/drive/folders/1p6dulb2z51NF_WA6RnAG4hHuUaKfFPrR?usp=sharing

a) Input data preparation

b) Target data preparation

Integration for training


Inference


1. Run inference on IMR-90 cells with ChromaFold

  • Run inference on full chromosome without offset
python ./chromafold/inference.py --data-path ./data/processed_input/ -ct imr90 --model-path ./checkpoints/chromafold_CTCFmotif.pth.tar -chrom 19 -offset -2000000 --genome hg38
  • Run inference only on regions with complete input information
python ./chromafold/inference.py --data-path ./data/processed_input/ -ct imr90 --model-path ./checkpoints/chromafold_CTCFmotif.pth.tar -chrom 19 -offset 0 --genome hg38

Training


1. Training on 3 cell types

  • Train model without co-accessibility component
python ./chromafold/train_bulkOnly.py --data-path ./data/processed_input/ -ct gm12878_hg38 umb_endo imr90
  • Train model with co-accessibility
python ./chromafold/train.py --data-path ./data/processed_input/ -ct gm12878_hg38 umb_endo imr90 --mod-name bothInput
  • Train deterministic model for full reproducibility
python ./chromafold/train_bulkOnly.py --data-path ./data/processed_input/ -ct gm12878_hg38 umb_endo imr90 --deterministic --mod-name deterministic

2. Training on 1 cell type

  • Train model on HUVEC without co-accessibility component
python ./chromafold/train_bulkOnly.py --data-path ./data/processed_input/ -ct umb_endo
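The training commands above differ only in the script used, the cell types, and two optional flags. The sketch below captures that pattern in a small helper; the function name and its parameters are hypothetical. Note that the commands above show --deterministic only with train_bulkOnly.py, so combining it with train.py is an untested assumption.

```python
import shlex
from typing import List, Optional, Sequence


def build_train_cmd(
    cell_types: Sequence[str],
    coaccessibility: bool = False,
    deterministic: bool = False,
    mod_name: Optional[str] = None,
    data_path: str = "./data/processed_input/",
) -> List[str]:
    """Assemble a training command using only the flags shown above."""
    if not cell_types:
        raise ValueError("at least one cell type is required")
    # train.py is used with co-accessibility, train_bulkOnly.py without it.
    script = "./chromafold/train.py" if coaccessibility else "./chromafold/train_bulkOnly.py"
    cmd = ["python", script, "--data-path", data_path, "-ct", *cell_types]
    if deterministic:
        # Shown above only for train_bulkOnly.py; assumed, not verified, for train.py.
        cmd.append("--deterministic")
    if mod_name is not None:
        cmd += ["--mod-name", mod_name]
    return cmd


three_types = ["gm12878_hg38", "umb_endo", "imr90"]
single = shlex.join(build_train_cmd(["umb_endo"]))
both_input = shlex.join(build_train_cmd(three_types, coaccessibility=True, mod_name="bothInput"))
reproducible = shlex.join(build_train_cmd(three_types, deterministic=True, mod_name="deterministic"))
print(single, both_input, reproducible, sep="\n")
```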
