OligoGym is a Python package that streamlines training and evaluating predictive models of oligonucleotide properties, covering both ASOs and siRNAs. Its core components are featurizers and models. Featurizers convert compounds written in HELM notation into a set of features that machine learning models can use. Models are built on PyTorch Lightning and scikit-learn and can be trained and evaluated on various datasets. Because the models integrate directly with the featurizers, it is simple to switch between different featurizers and models.
OligoGym is designed to be easy to use and flexible, making it suitable for both researchers and practitioners in the field of oligonucleotide design and optimization.
from oligogym.features import KMersCounts
from oligogym.models import LinearModel
from oligogym.data import DatasetDownloader

# Download a dataset and split it into training and test sets
downloader = DatasetDownloader()
data = downloader.download("siRNA1")
X_train, X_test, y_train, y_test = data.split(split_strategy="random")

# Fit the featurizer on the training set, then apply it to the test set
feat = KMersCounts(k=[1, 2, 3], modification_abundance=True)
X_kmer_train = feat.fit_transform(X_train)
X_kmer_test = feat.transform(X_test)

# Train a model and predict on the held-out test set
model = LinearModel()
model.fit(X_kmer_train, y_train)
y_pred = model.predict(X_kmer_test)
The following featurizers are currently implemented:
- KMersCounts
- OneHotEncoder
- Thermodynamics
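To give a sense of what a k-mer count featurizer computes, here is a minimal sketch that counts overlapping k-mers in a plain nucleotide string. It is not the OligoGym implementation: the function names `kmer_counts` and `featurize` are hypothetical, the input is a bare sequence rather than HELM notation, and chemical modifications are ignored entirely.

```python
from collections import Counter
from itertools import product


def kmer_counts(sequence: str, k: int, alphabet: str = "ACGU") -> dict[str, int]:
    """Count every overlapping k-mer in a plain nucleotide string.

    Hypothetical helper for illustration; not part of OligoGym.
    K-mers containing characters outside `alphabet` are left out of the result.
    """
    # Slide a window of width k across the sequence, one position at a time.
    observed = Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))
    # Emit counts in a fixed order so every sequence shares the same feature layout.
    kmers = ("".join(p) for p in product(alphabet, repeat=k))
    return {kmer: observed.get(kmer, 0) for kmer in kmers}


def featurize(sequence: str, ks: tuple[int, ...] = (1, 2)) -> list[int]:
    """Concatenate the count vectors for several k values into one feature vector."""
    features: list[int] = []
    for k in ks:
        features.extend(kmer_counts(sequence, k).values())
    return features


counts = kmer_counts("AUGGCUA", k=2)
vector = featurize("AUGGCUA")
print(len(vector), vector[:4])
```

Enumerating the full alphabet, rather than only the k-mers observed in one sequence, keeps the feature vector the same length for every input, which is what a downstream model needs.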
The following models are currently implemented:
- SKLearnModel
    - NearestNeighborsModel
    - RandomForestModel
    - XGBoostModel
    - LinearModel
    - GaussianProcessModel
    - TabPFNModel
- LightningModel
    - MLP
    - CNN
    - CausalCNN
    - GRU
Prerequisites:
- Python 3.11+
- Poetry
Clone the repository and navigate to the project directory:
git clone https://github.com/Roche/oligogym
cd oligogym
Install the dependencies:
poetry install
Activate the virtual environment (from Poetry 2.0 onward, the shell command is provided by the poetry-plugin-shell plugin):
poetry shell
Format code using Black:
poetry run black oligogym/ tests/
Lint code using Flake8:
poetry run flake8 oligogym/ tests/
Run tests using Pytest:
poetry run pytest