2025-CVPR-TME

[CVPR 2025] Code for the paper "Learning with Noisy Triplet Correspondence for Composed Image Retrieval"

Preparation

Project structure

The proper project structure for running the code is as follows:

TME/
├── bert-base-uncased/
├── data/
│   ├── CIRR/
│   │   ├── captions/
│   │   │   ├── captions/
│   │   │   │   │ cap.rc2.test1.json
│   │   │   │   │ cap.rc2.train.json
│   │   │   │   │ cap.rc2.val.json
│   │   │   ├── image_splits/
│   │   │   │   │ split.rc2.test1.json
│   │   │   │   │ split.rc2.train.json
│   │   │   │   │ split.rc2.val.json
│   │   ├── dev/
│   │   │   │ dev-0-0-img0.png
│   │   │   │ ...
│   │   ├── test1/
│   │   │   │ test1-0-0-img0.png
│   │   │   │ ...
│   │   ├── train/
│   │   │   ├── 0/
│   │   │   ├── 1/
│   │   │   ├── ...
│   ├── FashionIQ/
│   │   ├── captions
│   │   │   │ cap.dress.train.json
│   │   │   │ cap.dress.val.json
│   │   │   │ ...
│   │   ├── image_splits
│   │   │   │ split.dress.train.json
│   │   │   │ split.dress.val.json
│   │   │   │ ...
│   │   ├── images
│   │   │   │ B00006M009.jpg
│   │   │   │ B00006M00B.jpg
│   │   │   │ ...
├── src/
│   │ ...
├── weight/
│   │ blip2_pretrained.pth
│   │ eva_vit_g.pth

Prerequisites

The following commands create a local Anaconda environment with the necessary packages installed. We list the exact package versions we used, but matching them precisely is not required to run the code. Note that using MKL versions 2024.1+ with PyTorch may result in an "ImportError: undefined symbol: iJIT_NotifyEvent".

conda create -n TME -y python=3.10
conda activate TME
conda install pytorch==2.1.2 torchvision==0.16.2 torchaudio==2.1.2 pytorch-cuda=12.1 -c pytorch -c nvidia
pip install -r requirements.txt
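
As an optional sanity check, you can verify that PyTorch imports correctly and that CUDA is visible in the new environment:

python -c "import torch; print(torch.__version__, torch.cuda.is_available())"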

Dataset settings

Please modify FashioniqPath and CIRRPath in data.py to point to your dataset paths. Also, check the folder names in data.py to ensure they match the names of your dataset folders. The default dataset settings are as follows:

# data.py:
FashioniqPath = './data/FashionIQ/' # end with '/'
CIRRPath = './data/CIRR/' # end with '/'

Pretrained weights

Download the BLIP2 weights and place them in the weight directory.

To download the weights from their online URLs during training instead, edit the following files: src/lavis/models/blip2_models/blip2.py (lines 85-98) and src/lavis/models/eva_vit.py (lines 428-447).
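
For reference, loading a checkpoint from a URL with the LAVIS utilities generally follows the sketch below. This is illustrative only and does not reproduce the exact code at the lines mentioned above; the function name load_checkpoint is hypothetical.

import torch
from lavis.common.utils import is_url
from lavis.common.dist_utils import download_cached_file

def load_checkpoint(url_or_path):
    # Download to the local cache when given a URL, otherwise read from disk.
    if is_url(url_or_path):
        cached = download_cached_file(url_or_path, check_hash=False, progress=True)
        return torch.load(cached, map_location="cpu")
    return torch.load(url_or_path, map_location="cpu")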

Running

Precompute

Since the ViT is frozen, its image encodings do not change during training. To save time, we precompute them with precompute.py, which creates a features directory containing the precomputed features inside the project directory TME.

python src/precompute.py \
    --dataset {'CIRR' or 'FashionIQ'} \
    --split {'test1', 'train' or 'val'} \
    --batch_size 512
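
For example, to precompute the CIRR validation features:

python src/precompute.py --dataset CIRR --split val --batch_size 512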

Storing the features requires a lot of disk space. If your storage is insufficient, you can call compute_vit_features() in precompute.py to generate the features (as a CPU tensor) and the name_to_idx dictionary. Then, replace load_image_features() with load_image_features_tensor_and_dict() in precompute_train.py and precompute_test.py to avoid loading features from local disk storage, as sketched below.
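
A minimal sketch of the in-memory lookup this enables is shown below. The actual signatures of compute_vit_features() and load_image_features_tensor_and_dict() are defined in src/, and the helper name lookup_feature here is purely illustrative.

import torch

def lookup_feature(features: torch.Tensor, name_to_idx: dict, image_name: str) -> torch.Tensor:
    # features: (num_images, feature_dim) CPU tensor produced by compute_vit_features()
    # name_to_idx: maps an image name to its row index in `features`
    return features[name_to_idx[image_name]]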

Training

python src/precompute_train.py \
    --exp_name {your_exp_name} \
    --shuffle_seed 42 \
    --seed 42 \
    --dataset {'CIRR' or 'FashionIQ'} \
    --noise_ratio 0.0 \
    --nc_type "mix" \
    --batch_size 128 \
    --num_epochs 30 \
    --warmup_qformer 3 \
    --warmup_proj 2 \
    --warmup_last 1 \
    --partitioner "GMM" \
    --split_type "loss" \
    --threshold 0.5 \
    --lr "1e-5" \
    --lpm 1.0 \
    --lsa 1.0 \
    --lrd {'0.2' for CIRR and '0.1' for FashionIQ} \
    --save_training \
    --gpu {gpu_default_0}

Test or Evaluation

Note that the test mode is available only for CIRR.

python src/precompute_test.py \
    --dataset {CIRR or FashionIQ} \
    --mode {validate or test} \
    --model-path {your_model_path} \
    --gpu {gpu_default_0} \
    --name {save_folder_name}
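
For example, to evaluate a trained checkpoint on the FashionIQ validation split (the model path and output folder name below are placeholders):

python src/precompute_test.py --dataset FashionIQ --mode validate --model-path {your_model_path} --gpu 0 --name fashioniq_val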

Checkpoints

We provide the best TME model weights for CIRR and FashionIQ in the setting without synthetic noise on Kaggle.

Weights: CIRR, FashionIQ.

Experiment Results

CIRR performance:

FashionIQ performance:

Citation

If you find our work useful in your research, please consider citing:

@InProceedings{TME,
    author    = {Li, Shuxian and He, Changhao and Liu, Xiting and Zhou, Joey Tianyi and Peng, Xi and Hu, Peng},
    title     = {Learning with Noisy Triplet Correspondence for Composed Image Retrieval},
    booktitle = {Proceedings of the Computer Vision and Pattern Recognition Conference (CVPR)},
    month     = {June},
    year      = {2025},
    pages     = {19628-19637}
}

Acknowledgement

Our implementation is based on CLIP4Cir and LAVIS.
