8000 GitHub - hmmartiny/AL_PPI: Active Learning (AL) for Predicting Protein-Protein Interactions (PPIs)
[go: up one dir, main page]
More Web Proxy on the site http://driver.im/
Skip to content

hmmartiny/AL_PPI

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

3 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Active Learning For Classifying Protein-Protein Interactions

This project is done as part of the course 02901 Advanced Topics in Machine Learning August 2020.

The aim is to reproduce and possibly improve some of the results of the following paper:

Mohamed, T. P., Carbonell, J. G., & Ganapathiraju, M. K. (2010). Active learning for human protein-protein interaction prediction. BMC bioinformatics, 11(S1), S57.

Focusing on:

  • How the dataset is splitted into training and test: random, sample X% of positive labels to be in training set, or maintain the y proportion in train and test sets.
  • Compare the effect of two different classifiers: Random Forest vs Linear Neural Network

Data preparation

Data source: http://www.cs.cmu.edu/~qyj/papers_sulp/proteins05_pages/feature-download.html

To prepare the data for the active learning system, run data_prepare.py. Basically, it combines the files into one and reduces the feature groups to 0s and 1s. 0 means feature is absent and 1 means feature is present.

Run active learning

The script main.py contains the code for testing the different active learning systems. It needs two files as input: feature file (X) and output labels (y). There are two arguments that can additionally be added:

  • -add_n: Add n samples from pool in each iteration (by default 1)
  • -threshold: Desired threshold to reach on test set (by default 0.7)

Example on how to run the script:

python3 main.py -X data/pro
50DA
tein_interaction_X_reduced.csv -y data/protein_interaction_y.csv -an 250

Visualize results

The results can be visualized with the code located in viz.py.

With the command listed in the example in the previous section, the visualization looks like this:

results visualization

About

Active Learning (AL) for Predicting Protein-Protein Interactions (PPIs)

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages

0