This repository contains the code for "Mitigating Hallucinations in Multimodal Spatial Relations through Constraint-Aware Prompting" (NAACL Findings 2025). The paper is available [here](https://aclanthology.org/2025.findings-naacl.192/).
- Clone this repository and navigate to the CAP folder:

```bash
git clone https://github.com/jwu114/CAP.git
cd CAP
```
- Install dependencies (skip this step if you've already installed tqdm, scikit-learn, and openai):

```bash
conda create -n cap python=3.10 -y
conda activate cap
conda install tqdm scikit-learn openai -y
```
- Download the ARO dataset images and place them under ./dataset/aro/images/
- Download the GQA dataset images and place them under ./dataset/gqa/images/
- Download the MMRel dataset images and place them under ./dataset/mmrel/images/ (a quick layout check is sketched after this list)
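
Before running the evaluation, you can sanity-check the dataset layout. The snippet below is a minimal sketch, not part of the released code; it assumes the annotation files are JSON Lines (as the .jsonl extension suggests) and makes no assumption about their field names:

```python
import json
from pathlib import Path

# Expected layout: ./dataset/<name>/annotation/{test,valid}.jsonl and ./dataset/<name>/images/
for name in ["aro", "gqa", "mmrel"]:
    root = Path("dataset") / name
    images = root / "images"
    print(f"{name}: {'OK' if images.is_dir() and any(images.iterdir()) else 'MISSING images'}")
    for split in ["test", "valid"]:
        ann = root / "annotation" / f"{split}.jsonl"
        if ann.is_file():
            with ann.open() as f:
                first = json.loads(f.readline())  # peek at the first record
            print(f"  {split}.jsonl fields: {sorted(first)}")
```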
You need your own API key from OpenAI. After obtaining the key, add it to the ./run.sh file.
After changing to the correct working directory, enter:

```bash
bash run.sh
```
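
For orientation, run.sh presumably passes the key to run.py, which would then initialize the OpenAI client roughly as below. This is a hedged sketch rather than the repository's actual code; the environment variable name, the model, and the example question are assumptions:

```python
import os
from openai import OpenAI

# Assumption: run.sh exports the key, e.g. `export OPENAI_API_KEY=sk-...`
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

# Hypothetical call shape for a spatial-relation query (model name is an assumption)
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Is the cup to the left of the laptop?"}],
)
print(response.choices[0].message.content)
```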
You can modify the dataset and prompt used in the evaluation. More details about the prompts can be found in ./config/para.py.
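
As an illustration only, a prompt-configuration file like ./config/para.py might map prompt names to templates. All names and template wording below are hypothetical, not the paper's actual prompts:

```python
# Hypothetical structure for illustration; see ./config/para.py for the real definitions.
PROMPTS = {
    # A plain baseline prompt
    "base": "Look at the image and answer: {question}",
    # A constraint-aware variant in the spirit of the paper (wording is invented here)
    "constraint_aware": (
        "Look at the image and answer: {question}\n"
        "Constraint: spatial relations such as 'left of' and 'right of' are "
        "mutually exclusive; your answer must be consistent with exactly one."
    ),
}

DATASETS = ["aro", "gqa", "mmrel"]  # datasets shipped with this repo
```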
The repository is organized as follows:

```
├── config
│   ├── para.py
│   └── path.py
├── dataset
│   ├── aro
│   │   ├── annotation
│   │   │   ├── test.jsonl
│   │   │   └── valid.jsonl
│   │   └── images
│   ├── gqa
│   │   ├── annotation
│   │   │   ├── test.jsonl
│   │   │   └── valid.jsonl
│   │   └── images
│   └── mmrel
│       ├── annotation
│       │   ├── test.jsonl
│       │   └── valid.jsonl
│       └── images
├── run.py
└── run.sh
```
If our work is useful for your research, please cite our paper:
```bibtex
@inproceedings{wu-etal-2025-mitigating,
    title = "Mitigating Hallucinations in Multimodal Spatial Relations through Constraint-Aware Prompting",
    author = "Wu, Jiarui and Liu, Zhuo and He, Hangfeng",
    editor = "Chiruzzo, Luis and Ritter, Alan and Wang, Lu",
    booktitle = "Findings of the Association for Computational Linguistics: NAACL 2025",
    month = apr,
    year = "2025",
    address = "Albuquerque, New Mexico",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2025.findings-naacl.192/",
    pages = "3450--3468",
    ISBN = "979-8-89176-195-7"
}
```