CentromereArchitect (CA) is the first tool for annotation of alpha satellite arrays in centromeres of a newly assembled human genome. CA consists of two modules:
- Monomer Inference allows extraction of human monomers based on the given alpha-satellite consensus template and centromeric sequence.
- HOR Inference allows extraction of HORs from the centromeric sequences using the inferred monomers.
Requirements:
- Python 3.5+
The Monomer Inference script needs two parameters: 1) a (centromeric) sequence and 2) a monomer template:
python3 src/monomer_inference.py -seq test_data/cenXtoy.fasta -mon test_data/AlphaSat.fa
The resulting monomers can be found in final/monomers.fa and the sequence annotation in final/final_decomposition.tsv.
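To take a quick look at the inferred monomers, the FASTA output can be read with a few lines of Python. The sketch below is not part of CA: `read_fasta` and `summarize` are hypothetical helper names, and the only assumption about final/monomers.fa is that it is a standard FASTA file (header lines starting with `>`, sequence possibly wrapped over several lines).

```python
from collections import OrderedDict


def read_fasta(path):
    """Return an ordered mapping of record id -> sequence for a FASTA file.

    Hypothetical helper, not part of CA. Assumes standard FASTA layout.
    """
    records = OrderedDict()
    name = None
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            if line.startswith(">"):
                # Use the first whitespace-delimited token of the header as the id.
                header = line[1:].split()
                name = header[0] if header else ""
                records[name] = []
            elif name is not None:
                # Sequence lines are collected and joined once the file is read.
                records[name].append(line)
    return OrderedDict((key, "".join(parts)) for key, parts in records.items())


def summarize(records):
    """Return (number of records, shortest length, longest length)."""
    lengths = [len(seq) for seq in records.values()]
    if not lengths:
        return 0, 0, 0
    return len(lengths), min(lengths), max(lengths)
```

For example, `summarize(read_fasta("final/monomers.fa"))` returns the number of inferred monomers together with the shortest and longest monomer length.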
The HOR Inference script needs four parameters: 1) a (centromeric) sequence, 2) monomers, 3) a sequence annotation, and 4) an output file name:
python3 src/extract_hors.py test_data/cenXtoy.fasta final/monomers.fa final/final_decomposition.tsv final/hor_decomposition.tsv
The resulting HOR annotation can be found in final/hor_decomposition.tsv.
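The two modules can also be chained from a single Python script. The following is a minimal sketch, not something shipped with CA: `build_commands` and `run_pipeline` are hypothetical names, the script and output paths are the ones shown in the commands above, `python3` is assumed as the interpreter for both steps, and everything is assumed to run from the repository root.

```python
import subprocess

# Output location used in the commands above; treated as fixed in this sketch.
FINAL_DIR = "final"


def build_commands(seq, template,
                   hor_out=FINAL_DIR + "/hor_decomposition.tsv",
                   python="python3"):
    """Return the two command lines (as argument lists) in execution order.

    Hypothetical helper, not part of CA.
    """
    monomers = FINAL_DIR + "/monomers.fa"
    annotation = FINAL_DIR + "/final_decomposition.tsv"
    # Step 1: Monomer Inference takes the sequence and the monomer template.
    monomer_cmd = [python, "src/monomer_inference.py",
                   "-seq", seq, "-mon", template]
    # Step 2: HOR Inference takes the sequence, the monomers, the sequence
    # annotation, and the output file name as positional arguments.
    hor_cmd = [python, "src/extract_hors.py",
               seq, monomers, annotation, hor_out]
    return [monomer_cmd, hor_cmd]


def run_pipeline(seq, template, **kwargs):
    """Run both steps in order; stop at the first step that fails."""
    for cmd in build_commands(seq, template, **kwargs):
        # check=True raises subprocess.CalledProcessError on a non-zero exit.
        subprocess.run(cmd, check=True)
```

Calling `run_pipeline("test_data/cenXtoy.fasta", "test_data/AlphaSat.fa")` would run the toy example end to end. Building the argument lists separately from executing them makes it easy to print or inspect the commands before anything is run.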
If you run into any issues, please email t.dvorkina@spbu.ru directly.