Official PyTorch implementation of [Decoupled Global-Local Alignment for Improving Compositional Understanding](https://arxiv.org/abs/2504.16801)
Decoupled Global-Local Alignment for Improving Compositional Understanding
Xiaoxing Hu, Kaicheng Yang, Jun Wang, Haoran Xu, Ziyong Feng, Yupei Wang
DeGLA is a novel fine-tuning framework designed to enhance CLIP's compositional understanding while mitigating the catastrophic forgetting of pre-trained knowledge that often occurs during fine-tuning. To achieve this, DeGLA combines a more effective negative-sample generation pipeline with an innovative training framework. Experimental results demonstrate that our approach establishes a new state of the art in both compositional understanding and general performance. For any inquiries, please contact xiaoxinghhh@gmail.com or raise an issue. Thank you for your attention.
- [2025/04/24]:✨The training code and pretrained weights of DeGLA have been released.
- [2025/04/24]:✨The DeGLA paper has been submitted to arXiv.
- We propose a simple yet effective negative caption generation pipeline that harnesses the in-context learning capability of Large Language Models (LLMs) to produce high-quality negative captions, facilitating hard-negative-based fine-tuning.
- We introduce the DeGLA framework, which employs a self-distillation mechanism within the global alignment to maintain the model's inherent general comprehension capabilities. Additionally, it combines Image-Grounded Contrast (IGC) loss and Text-Grounded Contrast (TGC) loss to improve vision-language compositional understanding.
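The negative-caption pipeline can be sketched as a few-shot prompt builder. The instruction wording, function name, and in-context examples below are illustrative assumptions, not the actual prompt used in the paper:

```python
# Hypothetical few-shot prompt for asking an LLM to produce hard negative
# captions; the examples and instruction text are assumptions for illustration.
ICL_EXAMPLES = [
    ("A dog chasing a cat in the yard.", "A cat chasing a dog in the yard."),
    ("A man in a red shirt holding a blue umbrella.",
     "A man in a blue shirt holding a red umbrella."),
]

def build_negative_caption_prompt(caption: str) -> str:
    """Assemble an in-context prompt that asks an LLM to minimally edit a
    caption (swapping objects, attributes, or relations) into a hard negative."""
    lines = [
        "Rewrite each caption into a plausible but incorrect hard negative",
        "by swapping objects, attributes, or relations. Keep edits minimal.",
        "",
    ]
    for pos, neg in ICL_EXAMPLES:
        lines.append(f"Caption: {pos}")
        lines.append(f"Negative: {neg}")
        lines.append("")
    lines.append(f"Caption: {caption}")
    lines.append("Negative:")
    return "\n".join(lines)
```

The returned string would be sent to the LLM, whose completion after `Negative:` is taken as the hard negative caption for that image-text pair.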
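To give a feel for the local alignment, here is a toy sketch of the Text-Grounded Contrast idea: one image embedding is contrasted against its positive caption and the LLM-generated hard negatives. The function name, shapes, and temperature are our assumptions, not the official implementation:

```python
import numpy as np

def text_grounded_contrast(img, pos_txt, neg_txts, tau=0.07):
    """Toy sketch (our naming, not the official code): cross-entropy over
    cosine similarities between an image embedding and one positive caption
    plus several hard negative captions; index 0 is the positive class."""
    img = img / np.linalg.norm(img)
    txts = np.vstack([pos_txt] + list(neg_txts))
    txts = txts / np.linalg.norm(txts, axis=1, keepdims=True)
    logits = txts @ img / tau            # temperature-scaled similarities
    logits -= logits.max()               # subtract max for numerical stability
    probs = np.exp(logits) / np.exp(logits).sum()
    return -np.log(probs[0])             # loss is low when the positive wins
```

An Image-Grounded Contrast term would mirror this with the roles of image and text swapped; the actual DeGLA losses operate on batches and are defined in the released training code.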
- Release training code
- Release model weights
- Release training data
Our work is based on openclip, NegCLIP, and CE-CLIP; you can refer to these repositories for environment setup, then adapt them according to our code and proceed with training. Alternatively, you can set up the environment as detailed below:
```shell
conda create -n DeGLA python=3.9 -y
conda activate DeGLA
pip install -r requirements.txt
```
Our CUDA version is 12.1. You can adjust the versions of the relevant libraries, such as PyTorch, according to your CUDA version.
Our hard negative data is released on Baidu Yun, Google Drive, and Hugging Face.
```shell
git clone https://github.com/xiaoxing2001/DeGLA
cd DeGLA
./scripts/train_DeGLA.sh
```
Our weights are released on Baidu Yun, Google Drive, and Hugging Face. Our compositional reasoning evaluation is based on other repositories: for ARO, please visit ARO; for SugarCrepe, please visit SugarCrepe; for VALSE, please visit VALSE.
This project is based on CE-CLIP, NegCLIP, and openclip; thanks for their great work.
This project is released under the MIT license. Please see the LICENSE file for more information.
If you find this repository useful, please cite it using the following BibTeX entry.
```bibtex
@misc{hu2025decoupledgloballocalalignmentimproving,
      title={Decoupled Global-Local Alignment for Improving Compositional Understanding},
      author={Xiaoxing Hu and Kaicheng Yang and Jun Wang and Haoran Xu and Ziyong Feng and Yupei Wang},
      year={2025},
      eprint={2504.16801},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2504.16801},
}
```