ML-pipeline (source from Shiu Lab Machine Learning Pipeline: https://github.com/bmmoore43/ML-Pipeline)

See the Shiu Lab GitHub repository for a comprehensive machine learning tutorial. This repository follows their scripts and fixes syntax errors caused by updates to the Python environment and its dependencies (see ML_classification_modified.py).

Environment Requirements

biopython 1.78
matplotlib 3.5.3
numpy 1.21.5
pandas 1.3.5
python 3.7.0
scikit-learn 1.0.2
scipy 1.7.3

$ wget http://repo.continuum.io/miniconda/Miniconda3-3.7.0-Linux-x86_64.sh -O ~/miniconda.sh
$ bash ~/miniconda.sh -b -p $HOME/miniconda
$ export PATH="$HOME/miniconda/bin:$PATH"
# source conda.sh
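$ conda create -n ml python==3.7.0
# activate the new environment so the packages below are installed into it
# (older conda releases use: source activate ml)
$ conda activate ml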
$ conda install biopython
$ conda install matplotlib
$ conda install pandas
$ conda install scikit-learn
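
After installation, it can help to confirm that the environment matches the versions listed above. The following is a minimal sketch, not part of the repository: the helper names `installed_version` and `find_mismatches` are hypothetical, and on Python 3.7 it assumes the `importlib_metadata` backport is available.

```python
# Pinned versions copied from the Environment Requirements list above.
REQUIRED = {
    "biopython": "1.78",
    "matplotlib": "3.5.3",
    "numpy": "1.21.5",
    "pandas": "1.3.5",
    "scikit-learn": "1.0.2",
    "scipy": "1.7.3",
}


def installed_version(package):
    """Return the installed version string of `package`, or None if absent."""
    try:
        from importlib import metadata  # standard library from Python 3.8
    except ImportError:
        import importlib_metadata as metadata  # backport (assumed) on Python 3.7
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def find_mismatches(required, lookup=installed_version):
    """Map each package whose version differs from the pin to (wanted, found)."""
    mismatches = {}
    for package, wanted in required.items():
        found = lookup(package)
        if found != wanted:
            mismatches[package] = (wanted, found)
    return mismatches


if __name__ == "__main__":
    for package, (wanted, found) in sorted(find_mismatches(REQUIRED).items()):
        print(f"{package}: wanted {wanted}, found {found or 'not installed'}")
```

Run inside the activated `ml` environment, the script prints one line per package that is missing or differs from the pinned version, and nothing when everything matches.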

Basic ML Pipeline

Code is provided to:

Clean your data (ML_preprocess.py)
Define a testing set (test_set.py)
Select the best subset of features to use as predictors (Feature_Selection.py)
Train and apply a classification (ML_classification.py) or regression (ML_regression.py) machine learning model
Assess the results of your model (output from the ML_classification/ML_regression scripts with additional options in scripts_PostAnalysis)
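
To illustrate the second step in the list above, here is a minimal sketch of holding out a testing set while preserving the class ratio. It does not reproduce test_set.py: the function name `stratified_holdout`, the 10% test fraction, the seed, and the example labels are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def stratified_holdout(labels, test_fraction=0.1, seed=42):
    """Split instance IDs into (train_ids, test_ids), preserving class ratios.

    `labels` maps an instance ID to its class label.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for instance_id, label in labels.items():
        by_class[label].append(instance_id)

    train_ids, test_ids = [], []
    for label in sorted(by_class):
        ids = sorted(by_class[label])  # sort first so the shuffle is reproducible
        rng.shuffle(ids)
        # Hold out at least one instance per class.
        n_test = max(1, round(len(ids) * test_fraction))
        test_ids.extend(ids[:n_test])
        train_ids.extend(ids[n_test:])
    return sorted(train_ids), sorted(test_ids)


# Hypothetical data: 20 positive and 40 negative instances.
labels = {f"gene{i}": ("pos" if i < 20 else "neg") for i in range(60)}
train_ids, test_ids = stratified_holdout(labels, test_fraction=0.1)
print(len(train_ids), "training instances,", len(test_ids), "held out")
```

The held-out instances are never seen during feature selection or training, so the scores measured on them are an independent check of the model.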

Post ML

Extract machine learning results

Run summary_ml_results.R to extract the F1/AUC_ROC results for each group and algorithm. You can then choose the best algorithm from which to extract the K-mers.
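
The selection step described above can be sketched in Python as follows. This is an illustration only and does not reproduce summary_ml_results.R: the record layout, the group names, the algorithm names ("RF", "SVM"), and the scores are hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical rows: one per group, algorithm and training run.
results = [
    {"group": "A", "algorithm": "RF", "F1": 0.80, "AUC_ROC": 0.88},
    {"group": "A", "algorithm": "RF", "F1": 0.84, "AUC_ROC": 0.90},
    {"group": "A", "algorithm": "SVM", "F1": 0.78, "AUC_ROC": 0.85},
    {"group": "A", "algorithm": "SVM", "F1": 0.80, "AUC_ROC": 0.87},
    {"group": "B", "algorithm": "RF", "F1": 0.70, "AUC_ROC": 0.79},
    {"group": "B", "algorithm": "RF", "F1": 0.72, "AUC_ROC": 0.81},
    {"group": "B", "algorithm": "SVM", "F1": 0.75, "AUC_ROC": 0.83},
    {"group": "B", "algorithm": "SVM", "F1": 0.77, "AUC_ROC": 0.85},
]


def summarise(rows, metric="F1"):
    """Average `metric` over runs for every (group, algorithm) pair."""
    scores = defaultdict(list)
    for row in rows:
        scores[(row["group"], row["algorithm"])].append(row[metric])
    return {key: mean(values) for key, values in scores.items()}


def best_algorithm_per_group(rows, metric="F1"):
    """Return {group: (algorithm, mean score)} with the highest mean score."""
    best = {}
    for (group, algorithm), score in summarise(rows, metric).items():
        if group not in best or score > best[group][1]:
            best[group] = (algorithm, score)
    return best


for group, (algorithm, score) in sorted(best_algorithm_per_group(results).items()):
    print(f"group {group}: best algorithm by mean F1 is {algorithm} ({score:.2f})")
```

Passing `metric="AUC_ROC"` ranks the algorithms by AUC-ROC instead of F1.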

PCC filter K-mers across the 10 training runs

> suggested Folder arrangement
> Group
> -algorithm
> --imp files

First, run get_kmer-imp_overlap.py to extract the K-mers that are important across the 10 training runs for the different algorithms. Input: all imp files; Output: imp.txt (a summary file listing each K-mer, its imp value, and its count)
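
The idea behind this summary can be sketched as below. This is not the logic of get_kmer-imp_overlap.py itself: it assumes each imp file has already been read into a dictionary of K-mer to importance score, and that a K-mer counts as important in a run when its score is above zero. The function name, the threshold, and the example K-mers are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def summarise_importance(runs, threshold=0.0):
    """Combine per-run importance scores into (kmer, mean_imp, count) rows.

    `runs` is a list with one {kmer: importance} dict per training run.
    A K-mer is counted in a run when its importance exceeds `threshold`
    (an assumption made for this sketch).
    """
    scores = defaultdict(list)
    for run in runs:
        for kmer, importance in run.items():
            if importance > threshold:
                scores[kmer].append(importance)
    rows = [(kmer, mean(values), len(values)) for kmer, values in scores.items()]
    # Most frequently selected K-mers first, ties broken by mean importance.
    return sorted(rows, key=lambda row: (-row[2], -row[1], row[0]))


# Hypothetical importance scores from three training runs.
runs = [
    {"ACGTAC": 0.30, "TTGACA": 0.10, "GGGCCC": 0.0},
    {"ACGTAC": 0.20, "TTGACA": 0.0, "GGGCCC": 0.0},
    {"ACGTAC": 0.40, "TTGACA": 0.30},
]
for kmer, mean_imp, count in summarise_importance(runs):
    print(f"{kmer}\t{mean_imp:.3f}\t{count}")
```

Here the mean is taken only over the runs in which the K-mer passed the threshold, so the count column shows how consistently each K-mer was selected.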

Then apply ml_pcc_filter.R to pull out the distinct and important K-mers for the selected algorithm. Input: imp.txt; Output: imp_distinct_pcc_enriched_kmer.txt
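
One way such a filter could work is sketched below, assuming PCC stands for the Pearson correlation coefficient and that "distinct" means dropping K-mers whose presence profile closely tracks that of a more important K-mer. This does not reproduce ml_pcc_filter.R: the greedy strategy, the 0.9 cutoff, the function names, and the example data are all assumptions.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant vector has no defined correlation
    return cov / sqrt(var_x * var_y)


def filter_distinct(kmers, presence, max_pcc=0.9):
    """Greedily keep K-mers that are not highly correlated with a kept one.

    `kmers` is ordered from most to least important; `presence` maps each
    K-mer to its 0/1 presence vector across the same set of sequences.
    """
    kept = []
    for kmer in kmers:
        if all(pearson(presence[kmer], presence[k]) < max_pcc for k in kept):
            kept.append(kmer)
    return kept


# Hypothetical presence/absence of three K-mers in six sequences.
presence = {
    "ACGTAC": [1, 1, 0, 0, 1, 0],
    "CGTACG": [1, 1, 0, 0, 1, 0],  # same profile as ACGTAC, so redundant
    "TTGACA": [0, 1, 1, 0, 0, 1],
}
print(filter_distinct(["ACGTAC", "CGTACG", "TTGACA"], presence))
```

Because the K-mers are visited in order of importance, the most important member of each correlated group is the one that survives.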

