This repository contains the source code and data for the paper "Hierarchical Structure Sharing Empowers Multi-task Heterogeneous GNNs for Customer Expansion" (accepted to the KDD 2025 Applied Data Science Track).
Figure 1: The left panel shows the overall framework of our StrucHIS, and the right panel shows the details of the structural information sharing module within each layer. StrucHIS is a heterogeneous graph-based multi-task learning framework that explicitly regulates structural knowledge sharing. It decomposes the structure learning phase into multiple stages and introduces a sharing mechanism at each stage, ensuring that task-specific requirements are addressed throughout.

Our model depends on PyTorch (and the CUDA toolkit if you use a GPU). Make sure the following are installed before using our model:
- Python 3.7
- PyTorch 1.10.2
- CUDA 11.1
- DGL > 0.7
You can adjust the versions of Python, PyTorch, and CUDA based on your setup, but make sure the DGL version is above 0.7; otherwise, some functions will be unavailable. The following command installs all required libraries at their specified versions:
pip install -r requirements.txt
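For reference, a requirements.txt consistent with the versions above might look like the sketch below; the exact pins in the repository's file may differ (for example, the CUDA 11.1 build of DGL is published as dgl-cu111, and pyyaml is our assumption for reading the kg_config\*.yaml files).

# Hypothetical requirements.txt matching the versions listed above
torch==1.10.2
dgl>=0.7.0
pyyaml

After installation, you can verify the environment with:

import torch, dgl
print(torch.__version__, dgl.__version__, torch.cuda.is_available())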
Our paper uses two public datasets and one private dataset from JD Logistics. Due to company security and privacy policies, we are unable to release the Logistics Dataset. Here, we provide the two public datasets, Aminer and DBLP, in the Data folder.
Note: The original DBLP data and preprocessing steps follow the method provided at https://github.com/THUDM/HGB/tree/master. However, in our version of DBLP, we removed the venue nodes and introduced paper venue prediction as a new task. As a result, the prediction performance on DBLP differs from that reported in the HGB paper.
To train and evaluate the model, run the following commands:
# Train on Aminer dataset
python main.py --config our_Aminer
# Train on DBLP dataset
python main.py --config our_DBLP
Models are trained on the training set, halted when performance on the validation set has not improved for 40 epochs, and then evaluated on the test set. The train, validation, and test splits are already provided in the Data directory.
The model is run multiple times, and the final output reports the average value and fluctuation range of the results across runs.
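As a sketch of how such an average and fluctuation range can be computed from per-run scores (run_results is a hypothetical list of per-run test metrics; the repository's actual aggregation code may differ):

# Hypothetical aggregation over per-run test scores
import numpy as np
run_results = [0.871, 0.865, 0.878]  # e.g., one test metric per training run
print(f"{np.mean(run_results):.4f} ± {np.std(run_results):.4f}")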
Experiments were conducted on CentOS 7 with one 24GB Tesla P40 GPU.
Basic parameters are set in kg_lib\args.py:
- config: hyper-parameter file name
- seed: random seed
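A minimal sketch of how these two flags are plausibly exposed (the use of argparse and the defaults are assumptions; the actual kg_lib\args.py may differ):

import argparse

def get_args():
    # --config selects a hyper-parameter file under kg_config/; --seed fixes randomness
    parser = argparse.ArgumentParser(description="StrucHIS training")
    parser.add_argument("--config", type=str, default="our_Aminer",
                        help="hyper-parameter file name")
    parser.add_argument("--seed", type=int, default=0, help="random seed")
    return parser.parse_args()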
Hyper-parameters for each dataset are set in kg_config\*.yaml (see the example config after this list):
- rerun_num: Number of repeated training runs for robustness.
- train_epoch: Maximum number of training epochs.
- early_stop: Early stopping patience (in epochs).
- learning_rate: Learning rate used for optimization.
- weight_decay: Weight decay (L2 regularization).
- loss_weight: Weight assigned to each component of the loss function (e.g., [task1_weight, task2_weight]).
- save_metric: Metric used to save the best model (e.g., 'loss').
- eval_inter_ep: Frequency of evaluation during training (in epochs).
- layer_num: Number of heterogeneous graph layers in the model.
- node_feature_hid_len: Hidden dimension for node feature representations.
- GAT_hid_len: Hidden dimension in heterogeneous graph layers.
- edge_feats_len: Dimension of edge feature representations.
- nhead: Number of attention heads.
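Putting these together, a config such as kg_config/our_Aminer.yaml plausibly looks like the following; all values are illustrative placeholders rather than the tuned settings from the paper (except early_stop, which matches the 40-epoch patience mentioned above):

# Illustrative YAML config; values are placeholders
rerun_num: 5
train_epoch: 500
early_stop: 40
learning_rate: 0.001
weight_decay: 0.0005
loss_weight: [1.0, 1.0]   # [task1_weight, task2_weight]
save_metric: 'loss'
eval_inter_ep: 1
layer_num: 2
node_feature_hid_len: 64
GAT_hid_len: 64
edge_feats_len: 32
nhead: 8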
All results will be saved in the Result folder, including:
- best_model_{i}.py: The best model parameters saved from the i-th training run.
- best_result.json: The test-set performance of the best model from each training run.
- log.txt: Logs of the training process; the end of the file records the average value and fluctuation range of the results across all training runs.
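For example, the per-run test results can be inspected with a few lines of Python (the exact JSON schema inside best_result.json is an assumption):

import json

with open("Result/best_result.json") as f:
    results = json.load(f)  # per-run test metrics of the best models
print(results)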
If you use StrucHIS in your research, please cite the following paper:
Xinyue Feng, Shuxin Zhong, Jinquan Hang, Wenjun Lyu, Yuequn Zhang, Guang Yang, Haotian Wang, Desheng Zhang, and Guang Wang. 2025. Hierarchical Structure Sharing Empowers Multi-task Heterogeneous GNNs for Customer Expansion. In Proceedings of the 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining V.2 (KDD ’25), August 3–7, 2025, Toronto, ON, Canada. ACM, New York, NY, USA, 12 pages.
This project is licensed under the MIT License. See the LICENSE file for details.