This repository contains the code for "Mitigating Hallucinations in Multimodal Spatial Relations through Constraint-Aware Prompting" (NAACL Findings 2025). The paper is available [here](https://aclanthology.org/2025.findings-naacl.192/).
- Clone this repository and navigate to the CAP folder:

```bash
git clone https://github.com/jwu114/CAP.git
cd CAP
```
- Install dependencies (skip this step if you've already installed tqdm, scikit-learn, and openai):

```bash
conda create -n cap python=3.10 -y
conda activate cap
conda install tqdm scikit-learn openai -y
```
- Download the ARO dataset images and place them under ./dataset/aro/images/
- Download the GQA dataset images and place them under ./dataset/gqa/images/
- Download the MMRel dataset images and place them under ./dataset/mmrel/images/ (a quick layout check is sketched after this list)
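
Before running the evaluation, you can sanity-check the dataset layout. The snippet below is a minimal sketch, not part of the released code; it assumes the annotation files are JSON Lines (as the .jsonl extension suggests) and makes no assumption about their field names:

```python
import json
from pathlib import Path

# Expected layout: ./dataset/<name>/annotation/{test,valid}.jsonl and ./dataset/<name>/images/
for name in ["aro", "gqa", "mmrel"]:
    root = Path("dataset") / name
    images = root / "images"
    print(f"{name}: {'OK' if images.is_dir() and any(images.iterdir()) else 'MISSING images'}")
    for split in ["test", "valid"]:
        ann = root / "annotation" / f"{split}.jsonl"
        if ann.is_file():
            with ann.open() as f:
                first = json.loads(f.readline())  # peek at the first record
            print(f"  {split}.jsonl fields: {sorted(first)}")
```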
You need your own API key from OpenAI. After obtaining the key, add it to the ./run.sh file.
After changing to the correct working directory, enter:

```bash
bash run.sh
```
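
For orientation, run.sh presumably passes the key to run.py, which would then initialize the OpenAI client roughly as below. This is a hedged sketch rather than the repository's actual code; the environment variable name, the model, and the example question are assumptions:

```python
import os
from openai import OpenAI

# Assumption: run.sh exports the key, e.g. `export OPENAI_API_KEY=sk-...`
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

# Hypothetical call shape for a spatial-relation query (model name is an assumption)
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Is the cup to the left of the laptop?"}],
)
print(response.choices[0].message.content)
```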
You can modify the dataset and prompt used in the evaluation. More details about the prompts can be found in ./config/para.py.
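
As an illustration only, a prompt-configuration file like ./config/para.py might map prompt names to templates. All names and template wording below are hypothetical, not the paper's actual prompts:

```python
# Hypothetical structure for illustration; see ./config/para.py for the real definitions.
PROMPTS = {
    # A plain baseline prompt
    "base": "Look at the image and answer: {question}",
    # A constraint-aware variant in the spirit of the paper (wording is invented here)
    "constraint_aware": (
        "Look at the image and answer: {question}\n"
        "Constraint: spatial relations such as 'left of' and 'right of' are "
        "mutually exclusive; your answer must be consistent with exactly one."
    ),
}

DATASETS = ["aro", "gqa", "mmrel"]  # datasets shipped with this repo
```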
The repository is organized as follows:

```
├── config
│   ├── para.py
│   └── path.py
├── dataset
│   ├── aro
│   │   ├── annotation
│   │   │   ├── test.jsonl
│   │   │   └── valid.jsonl
│   │   └── images
│   ├── gqa
│   │   ├── annotation
│   │   │   ├── test.jsonl
│   │   │   └── valid.jsonl
│   │   └── images
│   └── mmrel
│       ├── annotation
│       │   ├── test.jsonl
│       │   └── valid.jsonl
│       └── images
├── run.py
└── run.sh
```
If our work is useful for your research, please cite our paper:
```bibtex
@inproceedings{wu-etal-2025-mitigating,
    title = "Mitigating Hallucinations in Multimodal Spatial Relations through Constraint-Aware Prompting",
    author = "Wu, Jiarui and Liu, Zhuo and He, Hangfeng",
    editor = "Chiruzzo, Luis and Ritter, Alan and Wang, Lu",
    booktitle = "Findings of the Association for Computational Linguistics: NAACL 2025",
    month = apr,
    year = "2025",
    address = "Albuquerque, New Mexico",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2025.findings-naacl.192/",
    pages = "3450--3468",
    ISBN = "979-8-89176-195-7"
}
```