ChromaFold

ChromaFold is a deep learning model that predicts 3D contact maps from scATAC-seq data alone. Its input features are pseudobulk chromatin accessibility and co-accessibility, both derived from scATAC-seq, together with predicted CTCF motif tracks.
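To make the two scATAC-derived features concrete, here is a minimal sketch on a toy cell-by-bin count matrix. Pseudobulk accessibility is taken as the per-bin sum over cells. Co-accessibility is illustrated with a Jaccard index over the cells in which each bin is open; that choice, the function names, and the toy matrix are assumptions for illustration and may not match how the preprocessing pipeline computes these features.

```python
from typing import List


def pseudobulk(cell_by_bin: List[List[int]]) -> List[int]:
    """Sum per-cell counts over all cells, giving one value per genomic bin."""
    return [sum(column) for column in zip(*cell_by_bin)]


def jaccard_coaccessibility(cell_by_bin: List[List[int]], bin_a: int, bin_b: int) -> float:
    """Jaccard index of the sets of cells in which each bin is accessible.

    Assumption: a bin counts as accessible in a cell if its count is > 0.
    """
    open_a = {i for i, cell in enumerate(cell_by_bin) if cell[bin_a] > 0}
    open_b = {i for i, cell in enumerate(cell_by_bin) if cell[bin_b] > 0}
    union = open_a | open_b
    return len(open_a & open_b) / len(union) if union else 0.0


# Toy matrix: 4 cells (rows) x 3 genomic bins (columns).
counts = [
    [1, 0, 2],
    [0, 0, 1],
    [3, 1, 0],
    [1, 1, 0],
]

print(pseudobulk(counts))                     # [5, 2, 3]
print(jaccard_coaccessibility(counts, 0, 1))  # 2 shared cells out of 3 open in either bin
```

In this toy example, bins 0 and 1 are open in mostly the same cells and score higher than bins 0 and 2, which share only one cell.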

Requirements

General
- python=3.8
- pytorch=1.11
- numpy=1.21
- pandas=1.4
- scipy=1.7

Visualization
- coolbox=0.3
- matplotlib=3.2
- seaborn=0.11
- tabix=1.11

R

To use ArchR from R for data preprocessing, create an R environment named chromafold_env by following the steps in R_env.sh.

To deploy ChromaFold, create a conda environment from the provided .yml file:

conda env create -f chromafold.yml
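Once the environment is active, a quick check of the installed package versions against the pins in the Requirements list can catch a mismatched install early. The sketch below is illustrative only: the pins are copied from the list above, the distribution names (for example "torch" for pytorch) are assumptions about how the packages were installed, and tabix is omitted because it is a command-line tool and not a Python package.

```python
from importlib import metadata

# Pins copied from the Requirements list. Distribution names are assumptions.
PINNED = {
    "torch": "1.11",
    "numpy": "1.21",
    "pandas": "1.4",
    "scipy": "1.7",
    "coolbox": "0.3",
    "matplotlib": "3.2",
    "seaborn": "0.11",
}


def matches_pin(installed: str, pin: str) -> bool:
    """True if the installed version equals the pin or extends it (1.21.6 matches 1.21)."""
    return installed == pin or installed.startswith(pin + ".")


def check_environment(pins=PINNED):
    """Return {package: (installed_version_or_None, ok)} for each pinned package."""
    report = {}
    for name, pin in pins.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None
        ok = installed is not None and matches_pin(installed, pin)
        report[name] = (installed, ok)
    return report


for name, (installed, ok) in check_environment().items():
    status = "OK" if ok else "MISMATCH"
    print(f"{name:12s} installed={installed} {status}")
```

A MISMATCH line does not necessarily mean ChromaFold will fail; it only flags a difference from the versions listed above.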

Data Preprocessing

Raw data preparation

Sample raw and processed input data can be downloaded from https://drive.google.com/drive/folders/1p6dulb2z51NF_WA6RnAG4hHuUaKfFPrR?usp=sharing

a) Input data preparation

- Prepare CTCF motif data: CTCF motif data are extracted from the CTCF introduction from the R package AnnotationHub. R scripts for generating the motif data for hg38 and mm10 are in process_input/ctcf_motif. Ready-to-use CTCF motif scores for hg38, hg19, and mm10 are also provided in the Google Drive folder.
- Prepare scATAC data for inference: see the full instructions in preprocessing_pipeline.
- A toy processed input folder containing only chr19 is available in data_subset. The full set of processed input files is in the Google Drive folder.

b) Target data preparation

- An example raw Hi-C file for IMR-90 can be downloaded from ENCODE (https://www.encodeproject.org/files/ENCFF843MZF/@@download/ENCFF843MZF.hic).
- Prepare the normalized Hi-C library for the target: see the full instructions in process_input/hic_normalization (also shown below).
- The HiC-DC+ normalized training target for IMR-90 (all chromosomes) is available in the Google Drive folder, and a chr19 subset is available in data_subset.
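The ENCODE file listed above is large, so it can help to script the download and run a cheap sanity check afterwards. The following is a minimal sketch using only the standard library. The helper names and the destination path in the usage note are hypothetical, and the check only inspects the leading magic bytes of the .hic format; it does not validate the rest of the file.

```python
import shutil
import urllib.request
from pathlib import Path

# URL taken from the target data list above.
HIC_URL = "https://www.encodeproject.org/files/ENCFF843MZF/@@download/ENCFF843MZF.hic"


def download(url: str, dest: Path) -> Path:
    """Stream url to dest, skipping the transfer if dest already exists."""
    dest = Path(dest)
    if dest.exists():
        return dest
    dest.parent.mkdir(parents=True, exist_ok=True)
    # Write to a temporary name first so an interrupted transfer
    # never leaves a truncated file at the final path.
    tmp = dest.with_name(dest.name + ".part")
    with urllib.request.urlopen(url) as response, open(tmp, "wb") as out:
        shutil.copyfileobj(response, out)
    tmp.replace(dest)
    return dest


def looks_like_hic(path: Path) -> bool:
    """Check for the null-terminated 'HIC' magic string at the start of the file."""
    with open(path, "rb") as handle:
        return handle.read(4) == b"HIC\x00"
```

For example, `looks_like_hic(download(HIC_URL, Path("data/raw/ENCFF843MZF.hic")))` would fetch the file once and confirm that it starts like a .hic file; the `data/raw` directory is an arbitrary choice.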

Integration for training

Prepare Hi-C data for training
- Run process_input/Process Input - Hi-C.ipynb.
- Download the juicer tools jar file: https://s3.amazonaws.com/hicfiles.tc4ga.com/public/juicer/juicer_tools_1.22.01.jar
- If this version of juicer tools is not compatible with your Java installation, use an earlier version of juicer tools.
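Before choosing a juicer tools release, it can be useful to know which Java version is on your PATH. The sketch below parses the banner printed by `java -version`; the function names are hypothetical, and no claim is made here about which Java versions a given juicer tools release supports.

```python
import re
import shutil
import subprocess
from typing import Optional


def parse_java_major(version_output: str) -> Optional[int]:
    """Extract the major Java version from `java -version` output."""
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if match is None:
        return None
    first, second = match.group(1), match.group(2)
    # Legacy scheme: "1.8.0_292" means Java 8. Modern scheme: "11.0.2" means Java 11.
    if first == "1" and second is not None:
        return int(second)
    return int(first)


def installed_java_major() -> Optional[int]:
    """Return the major version of the `java` on PATH, or None if unavailable."""
    java = shutil.which("java")
    if java is None:
        return None
    # `java -version` writes its banner to stderr, so check that stream first.
    result = subprocess.run([java, "-version"], capture_output=True, text=True)
    return parse_java_major(result.stderr or result.stdout)


print("Java major version:", installed_java_major())
```

A result of `None` means either that no `java` executable was found or that its banner did not match the expected pattern.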

Inference

1. Run inference on IMR-90 with ChromaFold

Run inference on full chromosome without offset

python ./chromafold/inference.py --data-path ./data/processed_input/ -ct imr90 --model-path ./checkpoints/chromafold_CTCFmotif.pth.tar -chrom 19 -offset -2000000 --genome hg38

Run inference only on regions with complete input information

python ./chromafold/inference.py --data-path ./data/processed_input/ -ct imr90 --model-path ./checkpoints/chromafold_CTCFmotif.pth.tar -chrom 19 -offset 0 --genome hg38
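To run the same inference over several chromosomes, the command above can be wrapped in a small launcher. This is a sketch under assumptions: the flags and default paths are copied from the commands above, while the function names and the `dry_run` switch are hypothetical. Note that the toy data_subset folder contains only chr19, so other chromosomes require the full processed input.

```python
import subprocess
import sys
from typing import List, Sequence


def inference_command(
    chrom,
    offset: int,
    cell_type: str = "imr90",
    genome: str = "hg38",
    data_path: str = "./data/processed_input/",
    model_path: str = "./checkpoints/chromafold_CTCFmotif.pth.tar",
) -> List[str]:
    """Build the argument list for one inference run, using the flags shown above."""
    return [
        sys.executable, "./chromafold/inference.py",
        "--data-path", data_path,
        "-ct", cell_type,
        "--model-path", model_path,
        "-chrom", str(chrom),
        "-offset", str(offset),
        "--genome", genome,
    ]


def run_inference(chroms: Sequence, offset: int, dry_run: bool = True, **kwargs) -> List[List[str]]:
    """Print each command and, unless dry_run is set, execute the runs one after another."""
    commands = [inference_command(chrom, offset, **kwargs) for chrom in chroms]
    for command in commands:
        print(" ".join(command))
        if not dry_run:
            # check=True stops the loop at the first failing chromosome.
            subprocess.run(command, check=True)
    return commands


# Dry run: only prints the command for chr19 with the -2000000 offset.
run_inference([19], offset=-2000000)
```

Passing `dry_run=False` from the repository root executes the runs; using `sys.executable` keeps them in the same Python environment as the launcher.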

Training

1. Training on 3 cell types

Train model without co-accessibility component

python ./chromafold/train_bulkOnly.py --data-path ./data/processed_input/ -ct gm12878_hg38 umb_endo imr90

Train model with co-accessibility

python ./chromafold/train.py --data-path ./data/processed_input/ -ct gm12878_hg38 umb_endo imr90 --mod-name bothInput

Train deterministic model for full reproducibility

python ./chromafold/train_bulkOnly.py --data-path ./data/processed_input/ -ct gm12878_hg38 umb_endo imr90 --deterministic --mod-name deterministic
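For readers wondering what a deterministic training run typically involves in PyTorch, the sketch below shows the usual ingredients: fixed seeds for every random number generator and a request for deterministic kernels. It is not taken from the ChromaFold training scripts, and the `--deterministic` flag may do more or less than this; the function name and default seed are hypothetical.

```python
import os
import random


def seed_everything(seed: int = 0) -> None:
    """Seed the available RNGs and request deterministic kernels where possible."""
    random.seed(seed)
    # Changing PYTHONHASHSEED at runtime only affects subprocesses started later.
    os.environ["PYTHONHASHSEED"] = str(seed)

    try:
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass  # NumPy not installed; nothing to seed.

    try:
        import torch
    except ImportError:
        return  # PyTorch not installed; the stdlib and NumPy RNGs are still seeded.

    torch.manual_seed(seed)  # Seeds the CPU generator and all CUDA devices.
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False  # Autotuning can select different kernels per run.
    # Operations without a deterministic implementation will raise an error.
    # Some CUDA versions also require the CUBLAS_WORKSPACE_CONFIG environment variable.
    torch.use_deterministic_algorithms(True)


seed_everything(42)
```

Deterministic kernels can be slower than the defaults, which is one reason to keep this behaviour behind an opt-in flag.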

2. Training on 1 cell type

Train model on HUVEC without co-accessibility component

python ./chromafold/train_bulkOnly.py --data-path ./data/processed_input/ -ct umb_endo
