An Interpretable Multi-View Representation Learning framework based on Graph Convolutional Network specifically designed for cancer gene prediction

YJY-98/IMVRL-GCN

IMVRL-GCN: Multi-View Representation Learning for Identification of Novel Cancer Genes and Their Causative Biological Mechanisms

Introduction

Tumorigenesis arises from the dysfunction of cancer genes, leading to uncontrolled cell proliferation through various mechanisms. Establishing a complete cancer gene catalogue will make precision oncology possible. Although existing methods based on Graph Neural Networks (GNNs) are effective in identifying cancer genes, they fall short in integrating data from multiple views and in interpreting their predictions. To address these shortcomings, we propose IMVRL-GCN, an interpretable representation learning framework that captures both shared and specific representations from multi-view data and offers insight into the identification of cancer genes.

This repository contains the source code and datasets for our paper, "Multi-View Representation Learning for Identification of Novel Cancer Genes and Their Causative Biological Mechanisms".

Architecture

[Figure: IMVRL-GCN architecture]

Requirements

IMVRL-GCN runs in a PyTorch environment on Linux; the operating system used is CentOS Linux release 7.7.1908. The most important Python packages are listed below:

  • pytorch 1.13.1

  • torch_geometric 2.3.1

  • scikit-learn 0.22

  • numpy 1.21.6

  • pandas 1.1.5

  • scipy 1.4.1

# Create a virtual environment and install the requirements
conda create -n [ENVIRONMENT NAME] python==3.7.0
conda activate [ENVIRONMENT NAME]
pip install -r requirements.txt
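After installation, it can help to confirm that the active environment matches the versions listed above. The following is a minimal sketch and not part of the repository: the helper names `installed_version` and `check_versions` are hypothetical, and the dictionary keys are assumed to be the pip distribution names of the listed packages (for example `torch` for pytorch).

```python
try:
    from importlib.metadata import PackageNotFoundError, version  # Python 3.8+
except ImportError:
    # Python 3.7 (the version used in this README): fall back to pkg_resources.
    from pkg_resources import DistributionNotFound as PackageNotFoundError
    from pkg_resources import get_distribution

    def version(name):
        return get_distribution(name).version

# Versions from the Requirements list, keyed by ASSUMED pip distribution names.
EXPECTED = {
    "torch": "1.13.1",
    "torch_geometric": "2.3.1",
    "scikit-learn": "0.22",
    "numpy": "1.21.6",
    "pandas": "1.1.5",
    "scipy": "1.4.1",
}


def installed_version(name):
    """Return the installed version string, or None if the package is missing."""
    try:
        return version(name)
    except PackageNotFoundError:
        return None


def check_versions(expected, lookup=installed_version):
    """Map each package name to 'ok', 'missing', or 'mismatch (found X)'."""
    report = {}
    for name, wanted in expected.items():
        found = lookup(name)
        if found is None:
            report[name] = "missing"
            continue
        base = found.split("+")[0]  # drop local build tags such as '+cu117'
        if base == wanted or base.startswith(wanted + "."):
            report[name] = "ok"
        else:
            report[name] = f"mismatch (found {found})"
    return report


if __name__ == "__main__":
    for package, status in check_versions(EXPECTED).items():
        print(f"{package:16s} {status}")
```

A patch release such as `0.22.2` is treated as matching `0.22`; tighten the comparison if exact pins matter for your run.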

Dataset

  1. ./data/CPDB_datasets.pkl contains the PPI network extracted from the CPDB database (stored as an $n\times n$ adjacency matrix for input into the GCN, where $n$ is the number of nodes) and the feature matrix X ($n\times d$, where $d$ is the feature dimension; here $d=64$).

  2. ./data/k_sets.pkl contains the five-fold cross-validation setup used to evaluate the performance of our model.
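The internal layout of the two pickle files is not described here, so it is worth inspecting them before relying on any particular structure. The sketch below makes no assumption about that layout: it loads a pickle and prints the nesting of dictionaries, lists, and array-like objects together with their shapes. The function names `summarize` and `describe_pickle` are hypothetical helpers, not part of the repository.

```python
import pickle


def summarize(obj, depth=0, max_depth=6):
    """Return indented lines describing a nested object and any array shapes."""
    pad = "  " * depth
    shape = getattr(obj, "shape", None)  # NumPy arrays, tensors, sparse matrices
    if shape is not None:
        return [f"{pad}{type(obj).__name__} shape={tuple(shape)}"]
    if depth >= max_depth:
        return [f"{pad}{type(obj).__name__}"]
    if isinstance(obj, dict):
        lines = [f"{pad}dict with {len(obj)} keys"]
        for key, value in obj.items():
            lines.append(f"{pad}  key {key!r}:")
            lines.extend(summarize(value, depth + 2, max_depth))
        return lines
    if isinstance(obj, (list, tuple)):
        lines = [f"{pad}{type(obj).__name__} of length {len(obj)}"]
        for item in obj[:5]:  # show only the first few items
            lines.extend(summarize(item, depth + 1, max_depth))
        return lines
    return [f"{pad}{type(obj).__name__}"]


def describe_pickle(path):
    """Load a pickle file (from a trusted source only) and summarize it."""
    with open(path, "rb") as handle:
        return summarize(pickle.load(handle))


if __name__ == "__main__":
    for path in ("./data/CPDB_datasets.pkl", "./data/k_sets.pkl"):
        try:
            print(path)
            print("\n".join(describe_pickle(path)))
        except FileNotFoundError:
            print("  (file not found; run from the repository root)")
```

Unpickling requires the packages that define the stored objects (for example numpy, scipy, or torch) to be installed, and pickle files should only be loaded from sources you trust.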

Demo

Run the demo from the command line:

python IMVRL-GCN.py

Description of some important functions and classes:

  1. Function Args() in IMVRL-GCN.py defines the hyper-parameters, such as device and epochs. Adjust them to suit your hardware and dataset.
  2. Function load_datasets() in IMVRL-GCN.py loads the data and the experimental setup for five-fold cross-validation.
  3. Class Experiment() in IMVRL-GCN.py evaluates the performance of IMVRL-GCN with five-fold cross-validation.

Expected output: The output file is saved in the output directory and includes detailed training and testing results. The evaluation metrics are AUC and AUPR.
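For readers less familiar with the two metrics, the following plain-Python sketch illustrates what they measure. It is not the repository's evaluation code: the function names are hypothetical, AUPR is approximated here by average precision (an assumption about the estimator), and in practice scikit-learn's `roc_auc_score` and `average_precision_score` are the usual choices.

```python
def roc_auc(labels, scores):
    """AUC: probability that a positive outranks a negative (ties count half)."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in positives:  # O(n_pos * n_neg); fine for a small illustration
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def average_precision(labels, scores):
    """Mean precision at the rank of each positive (assumes no tied scores)."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits = 0
    precision_sum = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            precision_sum += hits / rank
    if hits == 0:
        raise ValueError("need at least one positive label")
    return precision_sum / hits


if __name__ == "__main__":
    y_true = [1, 0, 1, 0]           # toy labels: 1 = cancer gene, 0 = non-cancer gene
    y_score = [0.9, 0.8, 0.7, 0.1]  # toy predicted probabilities
    print(f"AUC  = {roc_auc(y_true, y_score):.4f}")            # 0.7500
    print(f"AUPR = {average_precision(y_true, y_score):.4f}")  # 0.8333
```

AUPR is typically the more informative of the two when positives are rare, since it is not inflated by the large number of easy negatives.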

Instructions for use with your own data

To run IMVRL-GCN on your own dataset, use ./data/CPDB_datasets.pkl and ./data/k_sets.pkl as templates for preparing your own adjacency matrix, feature matrix, and five-fold cross-validation setup. Then modify the relevant code in the function load_datasets() in IMVRL-GCN.py.
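As a rough illustration of the cross-validation part, the sketch below builds stratified five-fold splits of labeled sample indices in plain Python. The function name `stratified_folds` and the list of `(train, test)` index pairs it returns are assumptions made for illustration; the actual structure expected by load_datasets() should be taken from the shipped ./data/k_sets.pkl.

```python
import random


def stratified_folds(labels, k=5, seed=0):
    """Split sample indices into k (train, test) pairs with similar class ratios."""
    rng = random.Random(seed)
    buckets = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        members = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(members)
        # Deal the shuffled members of this class round-robin across the folds.
        for position, index in enumerate(members):
            buckets[position % k].append(index)
    folds = []
    for test_fold in range(k):
        test = sorted(buckets[test_fold])
        train = sorted(
            index
            for fold, bucket in enumerate(buckets)
            if fold != test_fold
            for index in bucket
        )
        folds.append((train, test))
    return folds


if __name__ == "__main__":
    # Toy labels: 10 positive and 20 negative genes (hypothetical data).
    toy_labels = [1] * 10 + [0] * 20
    for number, (train, test) in enumerate(stratified_folds(toy_labels), start=1):
        positives = sum(toy_labels[i] for i in test)
        print(f"fold {number}: {len(train)} train, {len(test)} test, {positives} positive")
```

Fixing the seed keeps the splits reproducible across runs, which makes results comparable when hyper-parameters change.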

Citation

If you find this repository useful, please cite the following paper:

@article{10.1093/bib/bbae418,
    author = {Yang, Jianye and Fu, Haitao and Xue, Feiyang and Li, Menglu and Wu, Yuyang and Yu, Zhanhui and Luo, Haohui and Gong, Jing and Niu, Xiaohui and Zhang, Wen},
    title = "{Multiview representation learning for identification of novel cancer genes and their causative biological mechanisms}",
    journal = {Briefings in Bioinformatics},
    volume = {25},
    number = {5},
    pages = {bbae418},
    year = {2024},
    month = {08},
    issn = {1477-4054},
    doi = {10.1093/bib/bbae418},
    url = {https://doi.org/10.1093/bib/bbae418},
}
