This repository is the official implementation of WWW 2025 Paper "Compress and Mix: Advancing Efficient Taxonomy Completion with Large Language Models". It includes:
- COMI Reproduction: Easily reproduce the COMI model as described in our paper.
- Backbone Models: Includes base backbone models such as PromptLM and PretrainLM to facilitate TC research.
- Open-Sourced Compressed Tokens: We provide compressed tokens to support further research and inspire more efficient and effective TC techniques.
Please cite the paper if you find the code helpful. Thanks!
To reproduce the experiments, ensure your environment matches the required specifications.
Both `environment.yaml` and `requirements.txt` specify all necessary packages and versions. The dependencies can be set up using either conda or pip:
```shell
# conda
conda env create -f environment.yaml

# pip
pip install -r requirements.txt
```
All hyper-parameters and training settings are defined in config files. To modify these settings, edit the appropriate config file.
Details of all configuration parameters are provided in `./config_files/config.explain.json`; refer to this file to understand and adjust training settings as needed.
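Config files can be inspected and edited programmatically as well as by hand. The sketch below shows the general pattern, assuming nothing beyond standard JSON; the parameter names (`learning_rate`, `batch_size`) are illustrative placeholders, not the repo's actual keys — consult `./config_files/config.explain.json` for those.

```python
import json
import os
import tempfile

# Hypothetical config; the real parameter names are documented in
# ./config_files/config.explain.json.
cfg = {"learning_rate": 1e-4, "batch_size": 32}
path = os.path.join(tempfile.mkdtemp(), "config.json")
with open(path, "w") as f:
    json.dump(cfg, f, indent=2)

# Load the config, override one setting, and write it back —
# the same effect as editing the file in a text editor.
with open(path) as f:
    loaded = json.load(f)
loaded["batch_size"] = 16
with open(path, "w") as f:
    json.dump(loaded, f, indent=2)

with open(path) as f:
    print(json.load(f)["batch_size"])
```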
Follow the steps below to train the COMI model.
When running the program for the first time, the necessary data preprocessing will take some time. The essential intermediate files are then stored in pickle format. For subsequent runs, simply set the `raw` parameter to `False` and `existing_partition` to `True` in the `MAGDataset` within the `Dataloader` to load the intermediate files and avoid repeated processing. Alternatively, you can download the datasets and intermediate files directly from here and put them under `data/` for the experiments.
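The caching behaviour described above follows a common pattern: run the expensive preprocessing once, pickle the result, and load the pickle on later runs. A minimal sketch of that pattern, with `CACHE_PATH` and `preprocess()` as illustrative stand-ins rather than the repo's actual names:

```python
import os
import pickle

# Illustrative names, not the repository's actual API.
CACHE_PATH = "partition.pickle"

def preprocess():
    # Stand-in for the real (slow) data preprocessing step.
    return {"nodes": [0, 1, 2], "edges": [(0, 1), (1, 2)]}

def load_partition(raw=True):
    """Mimics raw=False / existing_partition=True: reuse the pickle if present."""
    if not raw and os.path.exists(CACHE_PATH):
        with open(CACHE_PATH, "rb") as f:
            return pickle.load(f)  # load existing intermediate files
    data = preprocess()
    with open(CACHE_PATH, "wb") as f:
        pickle.dump(data, f)  # store intermediate files for future runs
    return data

first = load_partition(raw=True)    # preprocess and cache
second = load_partition(raw=False)  # load from the pickle
```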
To train the model from scratch and generate compressed tokens, use the following command to perform Semantic Compression:

```shell
python train_id.py --config './config_files/<TAXO_NAME>/config.SemanticCompression.json'
```
Note: Since this stage requires substantial GPU resources, you can directly use our compressed tokens in the `./compressed_token/` directory instead.
With the precomputed compressed tokens provided in `./compressed_token/`, you can proceed directly to Contrastive Structure Modeling using the following command:

```shell
python train_id.py --config './config_files/<TAXO_NAME>/config.StructureContrastive.json'
```
Replace `<TAXO_NAME>` with the name of the taxonomy corresponding to your dataset. Available options include `food`, `mesh`, and `SemEval-V`.
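To run both stages over every taxonomy in one go, a small driver loop such as the following can be used. It is a sketch that assumes the config paths follow the naming pattern shown above; the `echo` makes it a dry run that prints each command — drop `echo` to actually execute them.

```shell
# Dry-run driver: print the Semantic Compression and Contrastive
# Structure Modeling commands for each available taxonomy.
for TAXO_NAME in food mesh SemEval-V; do
  echo python train_id.py --config "./config_files/${TAXO_NAME}/config.SemanticCompression.json"
  echo python train_id.py --config "./config_files/${TAXO_NAME}/config.StructureContrastive.json"
done
```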