ProtTrans provides state-of-the-art pre-trained models for proteins. The models were trained on thousands of GPUs on Summit and hundreds of Google TPUs using various Transformer models.
Have a look at our paper, "ProtTrans: cracking the language of life’s code through self-supervised deep learning and high performance computing", for more information about our work.
This repository will be updated regularly with new pre-trained models for proteins, in support of the bioinformatics community in general and Covid-19 research in particular, through our project "Accelerate SARS-CoV-2 research with transfer learning using pre-trained language modeling models".
- ⌛️ Models Availability
- ⌛️ Dataset Availability
- 🚀 Usage
- 📊 Expected Results
- ❤️ Community and Contributions
- 📫 Have a question?
- 🤝 Found a bug?
- ✅ Requirements
- 🤵 Team
- 💰 Sponsors
- 📘 License
- ✏️ Citation
Model | Hugging Face | Zenodo |
---|---|---|
ProtT5-XL-UniRef50 | Download | Download |
ProtT5-XL-BFD | Download | Download |
ProtT5-XXL-UniRef50 | Download | Download |
ProtT5-XXL-BFD | Download | Download |
ProtBert-BFD | Download | Download |
ProtBert | Download | Download |
ProtAlbert | Download | Download |
ProtXLNet | Download | Download |
ProtElectra-Generator-BFD | Download | Download |
ProtElectra-Discriminator-BFD | Download | Download |
ProtElectra-Generator | coming soon | |
ProtElectra-Discriminator | coming soon | |
ProtTXL | coming soon | |
ProtTXL-BFD | coming soon | |
Dataset | Dropbox |
---|---|
NEW364 | Download |
Netsurfp2 | Download |
CASP12 | Download |
CB513 | Download |
TS115 | Download |
DeepLoc Train | Download |
DeepLoc Test | Download |
How to use ProtTrans:
- 🧬 Feature Extraction (FE):
Please check: Embedding Section. More information coming soon.
- 💥 Fine Tuning (FT):
Please check: Fine Tuning Section. More information coming soon.
- 🧠 Prediction:
Please check: Prediction Section. More information coming soon.
- ⚗️ Protein Sequences Generation:
Please check: Generate Section. More information coming soon.
- 🧐 Visualization:
Please check: Visualization Section. More information coming soon.
- 📈 Benchmark:
Please check: Benchmark Section. More information coming soon.
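As a rough illustration of the feature extraction use case listed above, the following is a minimal sketch and not the project's reference code. It assumes that ProtBert is published on the Hugging Face Hub under the identifier `Rostlab/prot_bert`, that input sequences must be written with a space between residues, and that the rare residues U, Z, O and B should be mapped to X. The helper names `prepare_sequence` and `embed_protein` are hypothetical.

```python
import re


def prepare_sequence(sequence: str) -> str:
    """Map rare residues (U, Z, O, B) to X and separate residues with spaces.

    Both steps are assumptions about the input format the model expects.
    """
    cleaned = re.sub(r"[UZOB]", "X", sequence.upper())
    return " ".join(cleaned)


def embed_protein(sequence: str, model_name: str = "Rostlab/prot_bert"):
    """Return one embedding vector per residue as a (length, hidden_size) tensor.

    The default model identifier is an assumption; replace it with the
    identifier behind the Hugging Face download link of the model you want.
    """
    # Imported lazily so that prepare_sequence works without PyTorch installed.
    import torch
    from transformers import BertModel, BertTokenizer

    tokenizer = BertTokenizer.from_pretrained(model_name, do_lower_case=False)
    model = BertModel.from_pretrained(model_name).eval()

    inputs = tokenizer(prepare_sequence(sequence), return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state  # (1, length + 2, hidden_size)

    # Drop the [CLS] and [SEP] special tokens added by the tokenizer.
    return hidden[0, 1:-1]
```

Calling `embed_protein("MKTAYIAK")` downloads the model on first use and should return one row per residue; averaging over the first dimension with `.mean(dim=0)` is one simple way to obtain a single per-protein vector.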
- 🧬 Secondary Structure Prediction (Q3):
Model | CASP12 | TS115 | CB513 |
---|---|---|---|
ProtT5-XL-UniRef50 | 81 | 87 | 86 |
ProtT5-XL-BFD | 77 | 85 | 84 |
ProtBert-BFD | 76 | 84 | 83 |
ProtBert | 75 | 83 | 81 |
ProtAlbert | 74 | 82 | 79 |
ProtXLNet | 73 | 81 | 78 |
ProtElectra-Generator | 73 | 78 | 76 |
ProtElectra-Discriminator | 74 | 81 | 79 |
ProtTXL | 71 | 76 | 74 |
ProtTXL-BFD | 72 | 75 | 77 |
- 🧬 Secondary Structure Prediction (Q8):
Model | CASP12 | TS115 | CB513 |
---|---|---|---|
ProtT5-XL-UniRef50 | 70 | 77 | 74 |
ProtT5-XL-BFD | 66 | 74 | 71 |
ProtBert-BFD | 65 | 73 | 70 |
ProtBert | 63 | 72 | 66 |
ProtAlbert | 62 | 70 | 65 |
ProtXLNet | 62 | 69 | 63 |
ProtElectra-Generator | 60 | 66 | 61 |
ProtElectra-Discriminator | 62 | 69 | 65 |
ProtTXL | 59 | 64 | 59 |
ProtTXL-BFD | 60 | 65 | 60 |
- 🧬 Membrane-bound vs Water-soluble (Q2):
Model | DeepLoc |
---|---|
ProtT5-XL-UniRef50 | 91 |
ProtT5-XL-BFD | 91 |
ProtBert-BFD | 89 |
ProtBert | 89 |
ProtAlbert | 88 |
ProtXLNet | 87 |
ProtElectra-Generator | 85 |
ProtElectra-Discriminator | 86 |
ProtTXL | 85 |
ProtTXL-BFD | 86 |
- 🧬 Subcellular Localization (Q10):
Model | DeepLoc |
---|---|
ProtT5-XL-UniRef50 | 81 |
ProtT5-XL-BFD | 77 |
ProtBert-BFD | 74 |
ProtBert | 74 |
ProtAlbert | 74 |
ProtXLNet | 68 |
ProtElectra-Generator | 59 |
ProtElectra-Discriminator | 70 |
ProtTXL | 66 |
ProtTXL-BFD | 65 |
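The scores in the tables above can be read as accuracies in percent, where the number after Q is the number of classes. The sketch below assumes a plain per-position accuracy (per residue for Q3 and Q8, per protein for Q2 and Q10) and uses made-up labels, so it shows the metric only and does not reproduce the benchmark pipeline. The function name `q_accuracy` is hypothetical.

```python
def q_accuracy(predicted: str, observed: str) -> float:
    """Percentage of positions where the predicted label matches the observed one."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    if not observed:
        raise ValueError("cannot score an empty sequence")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * matches / len(observed)


# Hypothetical three-state labels (H = helix, E = strand, C = other) for 10 residues.
observed = "HHHHCCEEEC"
predicted = "HHHCCCEEEE"
print(f"Q3 = {q_accuracy(predicted, observed):.0f}")  # prints "Q3 = 80"
```

The same function works for Q8 or Q10 if each character encodes one of eight or ten classes, since only label equality is checked.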
The ProtTrans project is an open source project supported by various partner companies and research institutions. We are committed to sharing all our pre-trained models and knowledge. We would be more than happy if you could help us by sharing new pre-trained models, fixing bugs, proposing new features, improving our documentation, spreading the word, or supporting our project.
We are happy to hear your questions on the ProtTrans issues page! If you have a private question or want to cooperate with us, you can always reach out to us directly via our RostLab email.
Feel free to file a new issue with a descriptive title and description on the ProtTrans repository. If you have already found a solution to your problem, we would love to review your pull request!
For protein feature extraction or fine-tuning our pre-trained models, PyTorch and the Transformers library from Hugging Face are needed. For model visualization, you need to install the BertViz library.
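A quick way to confirm that these libraries are available before running any notebook is sketched below. It assumes the usual import names `torch`, `transformers` and `bertviz`; the helper `missing_packages` is hypothetical and checks importability only, not versions.

```python
from importlib.util import find_spec

# Import names assumed for the libraries named above.
REQUIRED = {"torch": "PyTorch", "transformers": "Hugging Face Transformers"}
OPTIONAL = {"bertviz": "BertViz (visualization only)"}


def missing_packages(packages: dict) -> list:
    """Return the display names of packages that cannot be imported."""
    return [label for module, label in packages.items() if find_spec(module) is None]


if __name__ == "__main__":
    for kind, packages in (("required", REQUIRED), ("optional", OPTIONAL)):
        missing = missing_packages(packages)
        status = "all found" if not missing else "missing " + ", ".join(missing)
        print(f"{kind}: {status}")
```

Anything reported as missing can typically be installed with `pip install torch transformers bertviz`, adjusted to your platform and hardware.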
- Technical University of Munich:
Ahmed Elnaggar | Michael Heinzinger | Christian Dallago | Ghalia Rehawi | Burkhard Rost |
---|---|---|---|---|
- Med AI Technology:
Yu Wang |
---|
- Google:
Llion Jones |
---|
- Nvidia:
Tom Gibbs | Tamas Feher | Christoph Angerer |
---|---|---|
- Seoul National University:
Martin Steinegger |
---|
- ORNL:
Debsindhu Bhowmik |
---|
Nvidia | ORNL | Software Campus |
---|---|---|
The ProtTrans pretrained models are released under the terms of the Academic Free License v3.0.
If you use this code or our pretrained models for your publication, please cite the original paper:
@ARTICLE{9477085,
author={Elnaggar, Ahmed and Heinzinger, Michael and Dallago, Christian and Rehawi, Ghalia and Yu, Wang and Jones, Llion and Gibbs, Tom and Feher, Tamas and Angerer, Christoph and Steinegger, Martin and Bhowmik, Debsindhu and Rost, Burkhard},
journal={IEEE Transactions on Pattern Analysis and Machine Intelligence},
title={ProtTrans: Towards Cracking the Language of Lifes Code Through Self-Supervised Deep Learning and High Performance Computing},
year={2021},
volume={},
number={},
pages={1-1},
doi={10.1109/TPAMI.2021.3095381}}