IMVRL-GCN: Multi-View Representation Learning for Identification of Novel Cancer Genes and Their Causative Biological Mechanisms


Introduction

Tumorigenesis arises from the dysfunction of cancer genes, leading to uncontrolled cell proliferation through various mechanisms. Establishing a complete cancer gene catalogue will make precision oncology possible. Although existing methods based on Graph Neural Networks (GNN) are effective in identifying cancer genes, they fall short in integrating data from multiple views and interpreting predictive outcomes. To address these shortcomings, an interpretable representation learning framework IMVRL-GCN is proposed to capture both shared and specific representations from multi-view data, offering significant insights for the identification of cancer genes.

This repository contains the source code and datasets for our paper, "Multi-View Representation Learning for Identification of Novel Cancer Genes and Their Causative Biological Mechanisms".

Architecture

Requirements

The code was developed in a PyTorch environment on Linux (CentOS Linux release 7.7.1908). The main Python packages are listed below:

pytorch 1.13.1
torch_geometric 2.3.1
scikit-learn 0.22
numpy 1.21.6
pandas 1.1.5
scipy 1.4.1

# Create a virtual environment and install the requirements
conda create -n [ENVIRONMENT NAME] python==3.7.0
conda activate [ENVIRONMENT NAME]
pip install -r requirements.txt

Dataset

./data/CPDB_datasets.pkl contains the PPI network extracted from the CPDB database, stored as an $n\times n$ adjacency matrix for input to the GCN, and the feature matrix X ($n\times d$, where $d$ is the feature dimension; here $d=64$).
./data/k_sets.pkl contains the five-fold cross-validation splits used to evaluate the performance of the model.
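The internal layout of the two pickle files is not documented here, so it can help to inspect them before writing any loading code. The following is a minimal sketch, not part of the repository: `summarize` and `describe_pickle` are hypothetical helper names, and the toy file only stands in for the real data so that the example runs on its own.

```python
import pickle
import tempfile
from pathlib import Path


def summarize(obj, depth=0, max_depth=3):
    """Return a short outline (list of lines) of a nested Python object."""
    pad = "  " * depth
    shape = getattr(obj, "shape", None)  # arrays, tensors, sparse matrices
    if shape is not None:
        return [f"{pad}{type(obj).__name__}, shape {tuple(shape)}"]
    if isinstance(obj, dict) and depth < max_depth:
        lines = [f"{pad}dict, {len(obj)} keys"]
        for key, value in obj.items():
            lines.append(f"{pad}  key {key!r}:")
            lines.extend(summarize(value, depth + 2, max_depth))
        return lines
    if isinstance(obj, (list, tuple)) and depth < max_depth:
        lines = [f"{pad}{type(obj).__name__}, length {len(obj)}"]
        for item in obj[:3]:  # preview only the first few items
            lines.extend(summarize(item, depth + 1, max_depth))
        return lines
    return [f"{pad}{type(obj).__name__}"]


def describe_pickle(path):
    """Load a pickle file and outline its contents."""
    # pickle.load can execute arbitrary code, so only open files you trust.
    with open(path, "rb") as handle:
        return summarize(pickle.load(handle))


# Toy stand-in for a real data file (its keys are made up for illustration).
with tempfile.TemporaryDirectory() as tmp:
    toy_path = Path(tmp) / "toy.pkl"
    with open(toy_path, "wb") as handle:
        pickle.dump({"adj": [[0, 1], [1, 0]], "labels": (1, 0)}, handle)
    outline = describe_pickle(toy_path)

print("\n".join(outline))
```

To look at the real files, call `describe_pickle("./data/CPDB_datasets.pkl")` or `describe_pickle("./data/k_sets.pkl")` inside the project environment, since unpickling needs the libraries that created the stored objects.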

Demo

Run the model from the command line:

python IMVRL-GCN.py

Description of some important functions and classes:

Function Args() in IMVRL-GCN.py holds the hyper-parameters, such as device and epochs. Adjust them to suit your setup.
Function load_datasets() in IMVRL-GCN.py loads the data and the experimental setup for five-fold cross-validation.
Class Experiment() in IMVRL-GCN.py evaluates the performance of IMVRL-GCN with five-fold cross-validation.

Expected output: The output file is saved in the output directory and includes detailed training and testing results. The evaluation metrics are AUC and AUPR.
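For readers unfamiliar with the two metrics, here is a dependency-free sketch of how they can be computed from binary labels and predicted scores. It is an illustration, not the repository's evaluation code: AUPR is estimated here as average precision, which may differ slightly from a trapezoidal area under the precision-recall curve, and the labels and scores are made-up toy values. In practice, `roc_auc_score` and `average_precision_score` from scikit-learn compute the same quantities.

```python
def roc_auc(labels, scores):
    """AUC: probability that a positive outranks a negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Average precision: sum over thresholds of (recall step) * precision."""
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("need at least one positive label")
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    tp = fp = 0
    prev_recall = 0.0
    ap = 0.0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Consume every item tied at this score before evaluating it.
        while i < len(ranked) and ranked[i][0] == threshold:
            tp += ranked[i][1]
            fp += 1 - ranked[i][1]
            i += 1
        recall = tp / n_pos
        ap += (recall - prev_recall) * tp / (tp + fp)
        prev_recall = recall
    return ap


# Toy example: 1 = cancer gene, 0 = non-cancer gene (values are illustrative).
labels = [1, 0, 1, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
auc_value = roc_auc(labels, scores)
aupr_value = average_precision(labels, scores)
print(f"AUC = {auc_value:.4f}, AUPR = {aupr_value:.4f}")
```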

Instructions for use with your own data

To run IMVRL-GCN on your own dataset, use ./data/CPDB_datasets.pkl and ./data/k_sets.pkl as templates for preparing your own adjacency matrix, feature matrix, and five-fold cross-validation setup. Then modify the relevant code in the function load_datasets() in IMVRL-GCN.py accordingly.
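As a starting point, the two ingredients can be sketched in plain Python: a symmetric adjacency matrix built from an edge list, and stratified five-fold index splits. Everything below is an assumption for illustration: the helper names `build_adjacency` and `stratified_folds`, the gene identifiers, the choice to drop self-loops, and the `(train, test)` split layout are hypothetical, and the result must be reshaped to match whatever structure the provided pickle files actually use.

```python
import random


def build_adjacency(genes, edges):
    """Symmetric 0/1 adjacency matrix (nested lists) for an undirected edge list."""
    index = {gene: i for i, gene in enumerate(genes)}
    n = len(genes)
    adj = [[0] * n for _ in range(n)]
    for a, b in edges:
        # Assumption: unknown genes and self-loops are skipped.
        if a in index and b in index and a != b:
            adj[index[a]][index[b]] = 1
            adj[index[b]][index[a]] = 1
    return adj


def stratified_folds(labels, k=5, seed=0):
    """Split sample indices into k (train, test) pairs with similar class ratios."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        members = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(members)
        # Deal the shuffled members of each class round-robin into the folds.
        for position, idx in enumerate(members):
            folds[position % k].append(idx)
    splits = []
    for f in range(k):
        test = sorted(folds[f])
        train = sorted(i for g in range(k) if g != f for i in folds[g])
        splits.append((train, test))
    return splits


# Toy inputs with placeholder gene names (not real interaction data).
genes = ["G0", "G1", "G2", "G3"]
edges = [("G0", "G1"), ("G1", "G2"), ("G3", "G3"), ("G2", "GX")]
adj = build_adjacency(genes, edges)

labels = [1] * 5 + [0] * 10  # 5 positives, 10 negatives
splits = stratified_folds(labels, k=5, seed=0)
for fold_id, (train, test) in enumerate(splits):
    print(fold_id, "train:", len(train), "test:", test)
```

The nested lists can be converted with `numpy.asarray` and saved with `pickle.dump` once the target layout is known.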

Citation

If you find this repository useful, please cite the following paper:

@article{10.1093/bib/bbae418,
    author = {Yang, Jianye and Fu, Haitao and Xue, Feiyang and Li, Menglu and Wu, Yuyang and Yu, Zhanhui and Luo, Haohui and Gong, Jing and Niu, Xiaohui and Zhang, Wen},
    title = "{Multiview representation learning for identification of novel cancer genes and their causative biological mechanisms}",
    journal = {Briefings in Bioinformatics},
    volume = {25},
    number = {5},
    pages = {bbae418},
    year = {2024},
    month = {08},
    issn = {1477-4054},
    doi = {10.1093/bib/bbae418},
    url = {https://doi.org/10.1093/bib/bbae418},
}

