

CroMaSt: A workflow for domain family curation through cross-mapping of structural instances between protein domain databases

CroMaSt (Cross Mapper of domain Structural instances) is an automated iterative workflow to clarify domain definition by cross-mapping of domain structural instances between domain databases. It classifies all structural instances of a given domain type into three categories: core, true, and domain-like.

Requirements

  1. Conda or Miniconda
  2. Kpax

Download and install conda (or Miniconda) and Kpax by following the instructions on their official sites.
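Before continuing, it can help to confirm that both tools are reachable from the shell. The sketch below is only an illustration: `require_tools` is a hypothetical helper that is not part of CroMaSt, and the executable name `kpax` in the example call is an assumption about how Kpax is installed on your system.

```shell
# require_tools is a hypothetical helper, not part of CroMaSt.
# It reports every command that cannot be found and returns non-zero
# if at least one is missing.
require_tools() {
    local missing=0 tool
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            printf 'missing: %s\n' "$tool" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Example call (the executable name "kpax" is an assumption):
# require_tools conda kpax || echo "install the missing tools first"
```

The same helper could be called again with `cwl-runner` after the conda environment has been activated.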

Get it running

(Assuming the requirements above are already met)

  1. Clone the repository and change the directory
git clone https://github.com/HrishiDhondge/CroMaSt.git
cd CroMaSt
  2. Create the conda environment for the workflow
conda env create --file yml/environment.yml
conda activate CroMaSt
  3. Change the path of variables in the parameter file (replace /YOUR/PATH/TO_CroMaSt/ with the location of your clone)
sed -i 's/\/home\/hdhondge\/CroMaSt\//\/YOUR\/PATH\/TO_CroMaSt\//g' yml/CroMaSt_input.yml
  4. Create the directories to store files from PDB and SIFTS (if they do not already exist)
mkdir -p PDB_files SIFTS
  5. Download the source input data
cwl-runner Tools/download_data.cwl yml/download_data.yml
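The path substitution in the setup steps is easy to get wrong because every slash has to be escaped. As a minimal sketch, the same replacement can be wrapped in a small function that uses `|` as the sed delimiter. `set_cromast_path` is a hypothetical helper, not part of CroMaSt, and it assumes that neither path contains `|`, `&`, or a backslash.

```shell
# set_cromast_path is a hypothetical helper, not part of CroMaSt.
# It replaces the prefix /home/hdhondge/CroMaSt/ (the one targeted by the
# sed command in this README) with a directory of your choice.
set_cromast_path() {
    local param_file="$1"                      # e.g. yml/CroMaSt_input.yml
    local new_root="${2%/}/"                   # normalise to exactly one trailing slash
    local old_root="/home/hdhondge/CroMaSt/"
    # Using '|' as the delimiter avoids escaping every '/' in the paths.
    # Writing to a temporary file keeps the sketch independent of 'sed -i'.
    sed "s|${old_root}|${new_root}|g" "$param_file" > "${param_file}.tmp" &&
        mv "${param_file}.tmp" "$param_file"
}

# Example call from inside the cloned repository:
# set_cromast_path yml/CroMaSt_input.yml "$(pwd)"
```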

Basic example

1. First, we will run the workflow for the RRM domain, with family identifiers RRM_1 in Pfam and RRM in CATH.

Run the workflow:

cwl-runner --parallel --outdir=Results/ CroMaSt.cwl yml/CroMaSt_input.yml

2. Once the iteration is complete, check the new_param.yml file in the output directory (Results). If it lists any family identifier under either pfam or cath, run the next iteration with the following command. Repeat until the workflow explores no new families:

cwl-runner --parallel --outdir=Results/ CroMaSt.cwl Results/new_param.yml
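The check-and-rerun cycle described above can be scripted. The following is a sketch under stated assumptions, not the authors' procedure: `has_new_families` and `iterate_cromast` are hypothetical helpers, the parser assumes that new_param.yml stores pfam and cath as single-line inline lists (for example `pfam: ['PF00076']` or `pfam: []`), and the cap of 20 iterations is arbitrary.

```shell
# has_new_families and iterate_cromast are hypothetical helpers, not part of CroMaSt.

# Succeeds when a line starting with "pfam:" or "cath:" holds a non-empty
# inline list. Assumes the single-line form, e.g. pfam: ['PF00076'] or pfam: [].
has_new_families() {
    grep -Eq '^(pfam|cath):[[:space:]]*\[[^]]*[[:alnum:]]' "$1"
}

# Runs the workflow repeatedly, feeding each new_param.yml into the next run.
iterate_cromast() {
    local outdir="${1:-Results/}"              # same output directory as the commands above
    local params="${2:-yml/CroMaSt_input.yml}" # parameter file for the first iteration
    local max_iter="${3:-20}"                  # arbitrary safety cap (an assumption)
    local i=1
    outdir="${outdir%/}/"
    while [ "$i" -le "$max_iter" ]; do
        cwl-runner --parallel --outdir="$outdir" CroMaSt.cwl "$params" || return 1
        params="${outdir}new_param.yml"
        has_new_families "$params" || return 0 # no new families left to explore
        i=$((i + 1))
    done
    echo "stopped after ${max_iter} iterations" >&2
    return 2
}

# Example call from inside the cloned repository:
# iterate_cromast Results/
```

Inspecting new_param.yml by hand after the first iteration is still worthwhile, since the format assumption above may not hold for every version of the workflow.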

Extra: Start the workflow with multiple families from one or both databases

To start the workflow with multiple families from one or both databases, list the family identifiers in the parameter file, separated by commas:

pfam: ['PF00076', 'PF08777']
cath: ['3.30.70.330']
  • Pro Tip: Pass a different path to the --outdir option each time you run the workflow, or at least move the results to another location after the first run.

Run the workflow for the protein domain of your choice

1. To run the workflow for a domain of your choice, replace the following family identifiers (for pfam and cath) in the yml/CroMaSt_input.yml file with the identifiers of your choice.

pfam: ['PF00076']
cath: ['3.30.70.330']
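Editing the two lines by hand is usually enough. For scripted runs, the replacement could look like the sketch below. `to_yaml_list` and `set_families` are hypothetical helpers, not part of CroMaSt, and they assume that the parameter file keeps `pfam:` and `cath:` at the start of a line as single-line inline lists, as in the snippet above.

```shell
# to_yaml_list and set_families are hypothetical helpers, not part of CroMaSt.

# Turns "PF00076,PF08777" into "['PF00076', 'PF08777']"; empty input gives "[]".
to_yaml_list() {
    local csv="$1" out="" id
    local IFS=','
    for id in $csv; do
        out="${out:+$out, }'${id}'"
    done
    printf '[%s]' "$out"
}

# Rewrites the pfam: and cath: lines of a parameter file.
# Arguments: file, comma-separated Pfam IDs, comma-separated CATH IDs.
set_families() {
    local param_file="$1" pfam_list cath_list
    pfam_list="$(to_yaml_list "$2")"
    cath_list="$(to_yaml_list "$3")"
    sed -e "s|^pfam:.*|pfam: ${pfam_list}|" \
        -e "s|^cath:.*|cath: ${cath_list}|" \
        "$param_file" > "${param_file}.tmp" &&
        mv "${param_file}.tmp" "$param_file"
}

# Example call from inside the cloned repository:
# set_families yml/CroMaSt_input.yml "PF00076,PF08777" "3.30.70.330"
```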

Data files used in the current version are as follows:

Files in the Data directory can be downloaded as follows:

  1. File used from Pfam database: pdbmap.gz

  2. File used from CATH database: cath-domain-description-file.txt

  3. Obsolete entries from RCSB PDB: obsolete_PDB_entry_ids.txt

CATH Version - 4.3.0 (Ver_Date - 11-Sep-2019), from the CATH FTP site
Pfam Version - 35.0 (Ver_Date - November-2021), from the Pfam FTP site

Reference

Article -
1. Hrishikesh Dhondge, Isaure Chauvot de Beauchêne, Marie-Dominique Devignes, CroMaSt: a workflow for assessing protein domain classification by cross-mapping of structural instances between domain databases and structural alignment, Bioinformatics Advances, Volume 3, Issue 1, 2023, vbad081, https://doi.org/10.1093/bioadv/vbad081

Poster - 
1. Hrishikesh Dhondge, Isaure Chauvot de Beauchêne, Marie-Dominique Devignes. CroMaSt: A workflow for domain family curation through cross-mapping of structural instances between protein domain databases. 21st European Conference on Computational Biology, Sep 2022, Sitges, Spain. ⟨hal-03789541⟩

Acknowledgements

This project has received funding from the Marie Skłodowska-Curie Innovative Training Network (MSCA-ITN) RNAct, supported by the European Union’s Horizon 2020 research and innovation programme under grant agreement No 813239.
