This package provides an interface to the Orion capabilities described in the manuscript *Deep generative AI models analyzing circulating orphan non-coding RNAs enable accurate detection of early-stage non-small cell lung cancer*.
This package and its notebooks were run on an Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz machine with Ubuntu 22.04.4 LTS and the Python 3.10.12 packages listed below. For installation of scvi-tools on M1-chip Mac machines, please visit: scvi-tools
```
Ubuntu            22.04.4 LTS
python            3.10.12
absl-py           1.0.0
numpy             1.26.4
numpyro           0.15.0
pandas            1.5.3
pytorch-lightning 2.2.5
scipy             1.10.0
scvi-tools        1.1.2
seaborn           0.12.2
shap              0.43.0
torch             2.0.1
torchmetrics      1.4.0.post0
tqdm              4.64.1
```
Instructions for installation with conda (runtime: 5 minutes):
```bash
conda create -n orion python=3.10.12
conda activate orion
conda install pytorch=2.0.1 numpy=1.26.4 pandas=1.5.3
pip install scvi-tools==1.1.2
```
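After installation, an optional sanity check can confirm that the pinned versions resolved correctly (the expected version strings below simply restate the environment listed above):

```python
# Optional sanity check: confirm the pinned dependencies import correctly.
import numpy
import pandas
import scvi
import torch

print(torch.__version__)   # expected: 2.0.1
print(scvi.__version__)    # expected: 1.1.2
print(numpy.__version__)   # expected: 1.26.4
print(pandas.__version__)  # expected: 1.5.3
```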
Please see the following notebook, which demonstrates basic usage of Orion with simulated datasets (runtime: 4 minutes).
Essentially, Orion requires a dictionary of the data with the following keys (a sketch follows this list):
"oncrna_ar": A numpy array of [samples, oncRNA] features count data
"oncrna_names": Name of `oncrna_ar` columns
"patient_names": Names of `oncrna_ar` rows
"onehot_ar": One-hot encoded class labels to train/predict
"smrnamat": A numpy array of [samples, small RNAs] features counts data for learning RNA content
"smrna_names": Name of `smarnamat` columns
"batch_list": A list of integers indicating batches to consider for triplet margin loss anchors
In addition, Orion requires a dictionary of model hyperparameters; the most relevant ones are shown below (an illustrative example follows the list):
"n_input": Number of oncRNAs
"n_input_lib": Number of small RNAs
"dp": Dropout
"loss_scalers": A list of scalers corresponding to NLL, KLD_Z, CE, and TML losses
"lr": Learning rate
"n_hidden": Number of hidden units
"num_lvs": Number of latent variables
"n_layers": Number of layers
"num_epochs": Number of epochs
"mini_batch": Number of samples in each mini batch
"tm_rounds": Number of rounds to sample anchors and compute TMLoss per mini batch
"num_classes": Number of classes
"weight_sample_loss": Weight to assign for each sample for classification task
"use_generative_sampling": Whether to use generative sampling for training the classifier
With these two dictionaries, Orion can be trained as:
```python
trained_model_dict = train_orion_model(
    data_dictionary,
    train_idx,        # array of sample indices used for training
    tune_idx,         # array of sample indices used for reporting metrics
    select_features,  # a subset or all of the features in the `oncrna_ar` and `oncrna_names` keys of the data dictionary
    dict_params=dict_params,
)
```
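One hypothetical way to construct the index arrays for this call is a plain NumPy 80/20 split; this is standard index bookkeeping, not Orion-specific API:

```python
import numpy as np

# Shuffle sample indices and split them into training and tuning sets.
rng = np.random.default_rng(0)
idx = rng.permutation(data_dictionary["oncrna_ar"].shape[0])
train_idx, tune_idx = idx[:80], idx[80:]           # illustrative 80/20 split
select_features = data_dictionary["oncrna_names"]  # here: use all oncRNA features
```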
The datasets required for running the orion-generate-predictions.ipynb notebook are available on Zenodo.
Please use GitHub issues to request assistance with the Orion package.