✨ MMPerspective: Do MLLMs Understand Perspective? A Comprehensive Benchmark for Perspective Perception, Reasoning, and Robustness
🌐 Homepage | 🔬 Paper | 👩💻 Code | 📊 Dataset | 📈 Evaluation | 🏆 Leaderboard
MMPerspective is a comprehensive benchmark designed to systematically evaluate the understanding of perspective geometry by Multimodal Large Language Models (MLLMs). It comprises 10 diverse tasks across three key dimensions: Perspective Perception, Reasoning, and Robustness, with 2,711 real-world and synthetic image instances.
MMPerspective enables researchers and practitioners to uncover the strengths, limitations, and potential areas for improvement in MLLMs, offering valuable insights into the challenges of understanding perspective geometry.
@article{tang2025mmperspective,
title = {MMPerspective: Do MLLMs Understand Perspective? A Comprehensive Benchmark for Perspective Perception, Reasoning, and Robustness},
author = {Tang, Yunlong and Liu, Pinxin and Feng, Mingqian and Tan, Zhangyun and Mao, Rui and Huang, Chao and Bi, Jing and Xiao, Yunzhong and Liang, Susan and Hua, Hang and Vosoughi, Ali and Song, Luchuan and Zhang, Zeliang and Xu, Chenliang},
journal = {arXiv preprint arXiv:2505.20426},
year = {2025}
}