This repository contains code for gene classification of compound perturbations. To reproduce the experiments, follow these steps:
- Run the notebook `Feature Transformation.ipynb` to generate transformed features.
- Run the notebook `Classification Evaluation.ipynb` to evaluate the classification accuracy of MLP and SLPP.
  - Note: In the notebook, make sure to change the parameter `mode` to either `top_1` or `top_10` to obtain the desired result.
- Run the first part of the cells in the notebook `Run this to set up datasplit and evaluation result.ipynb`, located in the `gzsda.main` directory.
  - This step will generate the required `.mat` files for further processing.
- In your terminal, run the command `bash run_xray` to execute CCVAE and MLP/1-Nearest-Neighbor evaluation.
  - Note 1: You may need to manually change the MLP/1-Nearest-Neighbor evaluation strategy in the file `train_vae2_xray.py`.
  - Note 2: Similar to Step 2, adjust the parameter `mode` to `top_1` or `top_10` to obtain the desired results.
- Run the second and third parts of the cells in the notebook `Run this to set up datasplit and evaluation result.ipynb` after completing Step 4.
  - This will provide the analyzed accuracy and the transformed features required for mAP evaluation.
- Run the notebooks in the `mAP Umap Analysis` folder to perform mAP evaluation.
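
The `mode` parameter mentioned in the steps above switches between `top_1` and `top_10`. As a rough illustration only, and assuming these modes correspond to top-k accuracy (a prediction counts as correct when the true gene is among the k highest-scoring classes), the metric can be sketched as follows. The function name `top_k_accuracy` and the toy scores are hypothetical and are not taken from the notebooks.

```python
from typing import Sequence


def top_k_accuracy(scores: Sequence[Sequence[float]], labels: Sequence[int], k: int) -> float:
    """Fraction of samples whose true label is among the k highest-scoring classes."""
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    if not scores:
        raise ValueError("at least one sample is required")
    hits = 0
    for row, label in zip(scores, labels):
        # Rank class indices from highest to lowest score.
        ranked = sorted(range(len(row)), key=lambda c: row[c], reverse=True)
        if label in ranked[:k]:
            hits += 1
    return hits / len(labels)


# Toy example (hypothetical data): 3 samples, 4 classes.
toy_scores = [
    [0.1, 0.6, 0.2, 0.1],  # true class 1 is ranked first
    [0.5, 0.3, 0.1, 0.1],  # true class 1 is ranked second
    [0.2, 0.2, 0.5, 0.1],  # true class 3 is ranked last
]
toy_labels = [1, 1, 3]

top_1 = top_k_accuracy(toy_scores, toy_labels, k=1)
top_2 = top_k_accuracy(toy_scores, toy_labels, k=2)
```

With a larger `k` the score can only stay the same or increase, which is why `top_10` results are expected to be at least as high as `top_1` results on the same split.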
Notes:
- Make sure to adjust the file paths to match your coding environment.
- The notebook `Classification Evaluation.ipynb` also generates data for CCVAE training and mAP evaluation.
- The notebook `Feature Transformation.ipynb` generates data for MLP and SLPP mAP evaluation. Do not confuse it with `Classification Evaluation.ipynb`!
- If you want to perform cross-validation, modify the `random.seed()` values in both `Classification Evaluation.ipynb` and `Feature Transformation.ipynb`. There are three instances in `Classification Evaluation.ipynb` and two in `Feature Transformation.ipynb`. Ensure that you don't skip any `random.seed()` calls. After modifying the `random.seed()` values, repeat steps 3-6.
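
For the cross-validation note above, the idea is that each seed value yields a different but reproducible data split. The following is a minimal sketch, assuming the notebooks shuffle samples with Python's `random` module after calling `random.seed()`. The helper `split_indices`, the 80/20 ratio, and the seed values are hypothetical.

```python
import random
from typing import List, Tuple


def split_indices(n_samples: int, seed: int, test_fraction: float = 0.2) -> Tuple[List[int], List[int]]:
    """Return (train, test) index lists for one reproducible random split."""
    random.seed(seed)  # same seed -> same shuffle -> same split
    indices = list(range(n_samples))
    random.shuffle(indices)
    n_test = int(n_samples * test_fraction)
    return indices[n_test:], indices[:n_test]


# One split per seed (hypothetical seed values); each seed gives its own fold.
splits = {seed: split_indices(n_samples=100, seed=seed) for seed in (0, 1, 2)}
```

If one `random.seed()` call is left unchanged, the part of the pipeline that depends on it keeps using the old split, which is presumably why every instance has to be updated together.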
Supplementary: You can also explore splitting the data by cell line or time point. Refer to steps 1, 2, and 6, and open the corresponding notebooks.
- Cellprofiler, Dinov2, and Effnetb0 datasets can be accessed from this link: https://www.terabox.com/sharing/link?surl=gwHoxwJsKQ3WBOXU8jYnJw&path=%2FCPJUMP1
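
Splitting by cell line or time point, as mentioned in the supplementary note, means holding out whole groups instead of random samples. The following is a minimal sketch, assuming each sample carries metadata fields such as `cell_line` and `time_point`. The record layout, field names, and values are hypothetical and are not taken from the dataset.

```python
from typing import Hashable, List, Sequence, Tuple


def split_by_group(records: Sequence[dict], key: str, held_out: Hashable) -> Tuple[List[dict], List[dict]]:
    """Put every record whose `key` equals `held_out` in the test set, the rest in train."""
    train = [r for r in records if r[key] != held_out]
    test = [r for r in records if r[key] == held_out]
    return train, test


# Hypothetical metadata records; real field names may differ.
records = [
    {"sample": 0, "cell_line": "A549", "time_point": 24},
    {"sample": 1, "cell_line": "U2OS", "time_point": 24},
    {"sample": 2, "cell_line": "A549", "time_point": 48},
    {"sample": 3, "cell_line": "U2OS", "time_point": 48},
]

# Hold out one cell line entirely.
train_cl, test_cl = split_by_group(records, key="cell_line", held_out="U2OS")

# Hold out one time point entirely.
train_tp, test_tp = split_by_group(records, key="time_point", held_out=48)
```

Holding out a whole group tests whether the classifier generalizes to an unseen condition instead of to unseen samples from conditions it has already seen.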
If you have any questions or need further assistance, feel free to contact me.