This repository contains the official code for CCKT-Det, a novel approach to open-vocabulary object detection, accepted at ICLR 2025. Below, you'll find instructions for installation, data preparation, usage, and more! 🥳
Our models are developed with `python=3.9` and `pytorch=1.13.0`. Other versions may work as well, but are untested.
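For example, a fresh environment can be set up as follows. This is a minimal sketch; the environment name and the `torchvision==0.14.0` pairing (the release matching `torch==1.13.0`) are our assumptions, not part of the original instructions:

```bash
conda create -n cckt-det python=3.9 -y
conda activate cckt-det
pip install torch==1.13.0 torchvision==0.14.0
```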
- Compile CUDA Operators: Follow the instructions from Deformable-DETR to compile the CUDA operators (see the sketch after this list).
- Install Dependencies: Install the required packages, including:
  - open-clip
  - coco-api
  - mmdet
  - timm
  - mmcv-full
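Both steps might look like the following. This is a sketch, not the repo's verbatim setup: the `models/ops` path assumes Deformable-DETR's layout, and `open_clip_torch` / `pycocotools` are assumed to be the PyPI names for open-clip and coco-api:

```bash
# Compile the deformable attention CUDA operators
# (assumes the Deformable-DETR directory layout).
cd ./models/ops
sh ./make.sh
python test.py  # optional: unit-tests the compiled operators
cd ../..

# Install the Python dependencies.
pip install open_clip_torch pycocotools mmdet timm mmcv-full
```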
For the OVD-COCO setting, please download the COCO 2017 dataset and follow OV-DETR to split the data into base and novel classes.
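A minimal download sketch for COCO 2017 is below; the `data/coco` destination follows the layout shown later, and the base/novel split itself comes from OV-DETR's tooling, which is not reproduced here:

```bash
mkdir -p data/coco && cd data/coco
wget http://images.cocodataset.org/zips/train2017.zip
wget http://images.cocodataset.org/zips/val2017.zip
wget http://images.cocodataset.org/annotations/annotations_trainval2017.zip
unzip -q train2017.zip && unzip -q val2017.zip && unzip -q annotations_trainval2017.zip
cd ../..
```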
For the LVIS setting, follow the setup instructions from ViLD.

The data files are organized as follows:
```
data/
├── object365/
├── lvis/
│   ├── lvis_v1_train_norare.json
│   ├── lvis_v1_train_proposal.json
│   └── instances_val2017_all.json
├── coco/
│   ├── instances_train2017_base.json
│   └── instances_val2017_all.json
└── regional_feats.pkl
```
The latest codebase is available here.
The prior concepts file is available here.
Evaluation can be done using:
```bash
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5 python -m torch.distributed.launch --nproc_per_node=6 --use_env main.py --with_box_refine --resume outputs/checkpoint.pth --eval
```
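If you have fewer GPUs, scale `CUDA_VISIBLE_DEVICES` and `--nproc_per_node` together. For example, with two GPUs (a sketch; it assumes the evaluation script imposes no fixed GPU count):

```bash
CUDA_VISIBLE_DEVICES=0,1 python -m torch.distributed.launch --nproc_per_node=2 --use_env main.py --with_box_refine --resume outputs/checkpoint.pth --eval
```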
- Extract Regional Features: Generate `regional_feats.pkl` using:

  ```bash
  python scripts/save_regional_feats.py
  ```

  Note: this process may take some time. Alternatively, download the pre-extracted features here.
- Train the Model: Use 6 GPUs with the provided script:

  ```bash
  bash run_training.sh
  ```

  Ensure `regional_feats.pkl` is in the `coco_path/` directory before training (a quick check is sketched below).
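Before launching training, you can sanity-check that the features file is in place. A minimal sketch; the exact path is an assumption, since the layout above places the file under `data/` while the training note refers to `coco_path/`:

```bash
# Path is an assumption — point this at whichever directory your
# configuration (coco_path) actually reads regional_feats.pkl from.
ls -lh data/regional_feats.pkl
```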
If you find this work useful, please cite our paper:
```bibtex
@article{zhang2025cyclic,
  title={Cyclic Contrastive Knowledge Transfer for Open-Vocabulary Object Detection},
  author={Zhang, Chuhan and Zhu, Chaoyang and Dong, Pingcheng and Chen, Long and Zhang, Dong},
  journal={arXiv preprint arXiv:2503.11005},
  year={2025}
}
```