Official PyTorch implementation of Decoupled Global-Local Alignment for Improving Compositional Understanding

Decoupled Global-Local Alignment for Improving Compositional Understanding
Xiaoxing Hu, Kaicheng Yang, Jun Wang, Haoran Xu, Ziyong Feng, Yupei Wang

📖 Introduction

DeGLA is a fine-tuning framework designed to enhance CLIP's compositional understanding while mitigating the catastrophic forgetting of pre-trained knowledge that often occurs during fine-tuning. To achieve this, DeGLA couples a more effective negative-sample generation pipeline with a decoupled global-local training scheme. Experimental results demonstrate that our approach establishes a new state of the art in both compositional understanding and general performance. For any inquiries, please contact xiaoxinghhh@gmail.com or raise an issue. Thank you for your attention.

📣 News

  • [2025/04/24]: ✨ The training code and pretrained weights of DeGLA have been released.
  • [2025/04/24]: ✨ The DeGLA paper has been submitted to arXiv.

💡 Highlights

We propose a simple yet effective negative-caption generation pipeline that harnesses the in-context learning capability of Large Language Models (LLMs) to produce high-quality negative captions, facilitating hard-negative-based fine-tuning.

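To make the pipeline concrete, here is a minimal Python sketch of hard-negative caption generation via in-context learning. The few-shot examples and the `llm` callable are illustrative placeholders, not the actual prompts or model used in the paper:

FEW_SHOT = """\
Rewrite the caption so it is fluent but no longer matches the image.
Caption: A black dog chases a red ball on the grass.
Negative: A red dog chases a black ball on the grass.
Caption: Two men are riding bicycles along the beach.
Negative: Two men are carrying bicycles along the beach.
"""

def build_prompt(caption: str) -> str:
    """Compose a few-shot prompt asking the LLM for a hard negative."""
    return f"{FEW_SHOT}Caption: {caption}\nNegative:"

def generate_negative(caption: str, llm) -> str:
    """`llm` is any callable mapping a prompt string to a completion
    (e.g., a local LLaMA wrapper or an API client) -- hypothetical here."""
    return llm(build_prompt(caption)).strip()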

We introduce the DeGLA framework, which employs a self-distillation mechanism within the global alignment to maintain the model's inherent general comprehension capabilities. Additionally, it combines Image-Grounded Contrast (IGC) loss and Text-Grounded Contrast (TGC) loss to improve vision-language compositional understanding.

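The following is a schematic PyTorch sketch of how such a decoupled objective can be composed: a self-distillation term for global alignment plus IGC/TGC terms over hard negatives. It is illustrative only; the exact formulation and anchor choices are defined in the paper and the released training code:

import torch
import torch.nn.functional as F

def degla_loss(img, txt, neg_txt, img_t, txt_t, tau=0.07, lam=1.0):
    """Schematic sketch of a decoupled global-local objective.
    All embeddings are assumed L2-normalized with shape (B, D);
    `neg_txt` holds one hard-negative caption embedding per sample;
    `img_t`/`txt_t` come from a frozen copy of pre-trained CLIP."""
    # Global alignment via self-distillation: keep the fine-tuned encoders
    # close to the frozen teacher to limit catastrophic forgetting.
    l_distill = F.mse_loss(img, img_t) + F.mse_loss(txt, txt_t)

    # Positive logits: similarity of each image with its own caption.
    pos = (img * txt).sum(-1, keepdim=True) / tau  # (B, 1)

    # Image-Grounded Contrast (IGC): with the image as anchor, the true
    # caption must score higher than the hard-negative caption.
    igc_logits = torch.cat(
        [pos, (img * neg_txt).sum(-1, keepdim=True) / tau], dim=1)

    # Text-Grounded Contrast (TGC): the symmetric, text-anchored term
    # (schematic pairing; see the paper for the exact definition).
    tgc_logits = torch.cat(
        [pos, (txt * neg_txt).sum(-1, keepdim=True) / tau], dim=1)

    target = torch.zeros(img.size(0), dtype=torch.long, device=img.device)
    l_local = F.cross_entropy(igc_logits, target) + F.cross_entropy(tgc_logits, target)
    return l_distill + lam * l_local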

🎨 TODO

  • Release training code
  • Release model weights
  • Release training data

Environment installation

Our work builds on openclip, NegCLIP, and CE-CLIP; you can refer to these repositories for environment setup, then modify them according to our code and proceed with training. Alternatively, you can use the environment detailed below:

conda create -n DeGLA python=3.9 -y
conda activate DeGLA
pip install -r requirements.txt

Our CUDA version is 12.1. You can adjust the versions of the relevant libraries, such as PyTorch, according to your CUDA version.

Training

Our hard-negative data is released on Baidu Yun, Google Drive, and Hugging Face.

git clone https://github.com/xiaoxing2001/DeGLA
cd DeGLA
./scripts/train_DeGLA.sh
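The script encapsulates the full launch command. If you adapt it, the underlying invocation follows open_clip's trainer; a rough, illustrative example is shown below (flag values are placeholders, and the authoritative settings live in scripts/train_DeGLA.sh):

python -m training.main \
    --train-data /path/to/degla_train_with_negatives.csv \
    --model ViT-B-32 \
    --pretrained openai \
    --batch-size 256 \
    --lr 1e-6 \
    --epochs 5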

Evaluation

Our weights are released on Baidu Yun, Google Drive, and Hugging Face. Our compositional reasoning evaluation is based on other repositories: for ARO, please visit ARO; for SugarCrepe, please visit SugarCrepe; for VALSE, please visit VALSE.
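As a starting point for evaluation, a released checkpoint can be loaded through open_clip. This is a minimal sketch; the architecture name and checkpoint path are placeholders to match to the weights you downloaded:

import open_clip

# Load a released DeGLA checkpoint into an open_clip model for evaluation.
model, _, preprocess = open_clip.create_model_and_transforms(
    "ViT-B-32", pretrained="/path/to/DeGLA.pt"
)
tokenizer = open_clip.get_tokenizer("ViT-B-32")
model.eval()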

Results

  • VALSE

  • SugarCrepe

  • ARO

  • Zero-shot Classification (a minimal usage sketch follows this list)
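For the zero-shot classification setting, here is a minimal sketch continuing from the loading snippet in the Evaluation section (the image path and class prompts are illustrative):

import torch
from PIL import Image

labels = ["a photo of a cat", "a photo of a dog"]
image = preprocess(Image.open("example.jpg")).unsqueeze(0)
text = tokenizer(labels)

with torch.no_grad():
    image_features = model.encode_image(image)
    text_features = model.encode_text(text)
    # Normalize, then rank classes by cosine similarity to the image.
    image_features /= image_features.norm(dim=-1, keepdim=True)
    text_features /= text_features.norm(dim=-1, keepdim=True)
    probs = (100.0 * image_features @ text_features.T).softmax(dim=-1)

print(dict(zip(labels, probs.squeeze(0).tolist())))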

Acknowledgements

This project is based on CE-CLIP, NegCLIP, and openclip; thanks for their work.

License

This project is released under the MIT license. Please see the LICENSE file for more information.

📖 Citation

If you find this repository useful, please use the following BibTeX entry for citation.

@misc{hu2025decoupledgloballocalalignmentimproving,
      title={Decoupled Global-Local Alignment for Improving Compositional Understanding}, 
      author={Xiaoxing Hu and Kaicheng Yang and Jun Wang and Haoran Xu and Ziyong Feng and Yupei Wang},
      year={2025},
      eprint={2504.16801},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2504.16801}, 
}
