cyclexu/COMI

COMI

This repository is the official implementation of WWW 2025 Paper "Compress and Mix: Advancing Efficient Taxonomy Completion with Large Language Models". It includes:

  • COMI Reproduction: Easily reproduce the COMI model as described in our paper.

  • Backbone Models: Includes base backbone models such as PromptLM and PretrainLM to facilitate taxonomy completion (TC) research.

  • Open-Sourced Compressed Tokens: We provide compressed tokens to support further research and inspire more efficient and effective TC techniques.

Please cite the paper if you find the code helpful. Thanks!

Experimental Setup

To reproduce the experiments, ensure your environment matches the required specifications.

Both environment.yaml and requirements.txt specify all necessary packages and versions.

The dependencies can be set up using either conda or pip:

# conda
conda env create -f environment.yaml
# pip
pip install -r requirements.txt

Configuration Details

All hyper-parameters and training settings are defined in config files. To modify these settings, edit the appropriate config file.

Details of all configuration parameters are provided in:

./config_files/config.explain.json

Refer to this file to understand and adjust training settings as needed.
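As a sketch of working with these JSON config files (the key names below are illustrative placeholders, not taken from config.explain.json; see that file for the real parameters):

```python
import json

# Illustrative config snippet -- these keys are hypothetical examples,
# not the actual COMI schema documented in config.explain.json.
sample_config = json.loads("""
{
    "learning_rate": 1e-4,
    "batch_size": 32,
    "epochs": 100
}
""")

# A setting can also be overridden programmatically instead of
# editing the config file by hand.
sample_config["batch_size"] = 64
print(sample_config["batch_size"])
```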

Model Training

Follow the steps below to train the COMI model.

0. First Run

When running the program for the first time, the necessary data preprocessing will take some time; the resulting intermediate files are stored in pickle format. For subsequent runs, simply set the raw parameter to False and existing_partition to True in the MAGDataset within the Dataloader to load the intermediate files and skip the repeated preprocessing.

Alternatively, you can download the datasets and intermediate files directly from here and put them under data/ for the experiments.
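The first-run caching described above follows a common load-or-preprocess pattern. The sketch below illustrates it with a stand-in preprocessing function; load_or_preprocess and the file names are hypothetical and not COMI's actual API:

```python
import os
import pickle
import tempfile

def load_or_preprocess(cache_path, preprocess_fn):
    """Load intermediate data from a pickle cache if it exists,
    otherwise run preprocessing once and cache the result."""
    if os.path.exists(cache_path):  # analogous to raw=False, existing_partition=True
        with open(cache_path, "rb") as f:
            return pickle.load(f)
    data = preprocess_fn()          # analogous to the slow first run
    with open(cache_path, "wb") as f:
        pickle.dump(data, f)
    return data

# Demo with a throwaway cache file and a call-counting preprocess step.
calls = []
cache = os.path.join(tempfile.mkdtemp(), "partition.pickle")
first = load_or_preprocess(cache, lambda: calls.append(1) or {"nodes": [1, 2, 3]})
second = load_or_preprocess(cache, lambda: calls.append(1) or {"nodes": [1, 2, 3]})
```

On the second call the pickle file already exists, so the preprocessing function is never invoked again.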

1. Semantic Compression (Optional)

To train the model from scratch and generate compressed tokens, use the following command to perform Semantic Compression:

python train_id.py --config './config_files/<TAXO_NAME>/config.SemanticCompression.json'

Note: Since this stage requires substantial GPU resources, you can instead directly use the compressed tokens we provide in the ./compressed_token/ directory.

2. Contrastive Structure Modeling

With the precomputed compressed tokens provided in ./compressed_token/, you can directly proceed to Contrastive Structure Modeling using the following command:

python train_id.py --config './config_files/<TAXO_NAME>/config.StructureContrastive.json'

Replace <TAXO_NAME> with the name of the taxonomy corresponding to your dataset. Available options are food, mesh, and SemEval-V.
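For example, choosing the food taxonomy yields the following config path (a minimal illustration of the substitution above):

```python
# Substitute <TAXO_NAME> with one of the supported taxonomies.
taxo_name = "food"  # or "mesh", "SemEval-V"
config_path = f"./config_files/{taxo_name}/config.StructureContrastive.json"
print(config_path)
```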
