We present Perception-R1, a scalable reinforcement learning (RL) framework that applies Group Relative Policy Optimization (GRPO) during MLLM post-training. Key innovations:
- 🎯 Perceptual Perplexity Analysis: We introduce a novel analytical framework that reveals critical thresholds for effective reinforcement learning in perception tasks, providing insight into when and how RL can improve visual understanding.
- 🚀 GRPO Optimization: Scalable policy learning with carefully crafted rule-based reward shaping (a brief sketch follows below).
- 🔥 Surprising Performance: Perception-R1 achieves substantial improvements across multiple visual perception benchmarks, notably reaching 31.9% mAP on the COCO2017 validation set, making it the first 3B-scale MLLM to achieve such performance.
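As a rough illustration of the group-relative update that GRPO relies on, the sketch below shows how a group of rollouts scored by a rule-based reward is turned into per-rollout advantages. This is a minimal, simplified sketch under our own naming; the actual implementation lives in the training code.

```python
# Minimal sketch of GRPO's group-relative advantage (illustrative, not the
# exact Perception-R1 implementation). For each prompt, several rollouts are
# sampled, each is scored by a rule-based reward, and each rollout's advantage
# is its reward normalized by the group's mean and standard deviation.
import numpy as np

def group_relative_advantages(rewards, eps=1e-6):
    """rewards: scalar rule-based rewards for one group of rollouts."""
    r = np.asarray(rewards, dtype=np.float64)
    return (r - r.mean()) / (r.std() + eps)

# Example: four rollouts for the same image/question.
print(group_relative_advantages([0.9, 0.2, 0.5, 0.5]))
```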
News:
- 2025-04-10 🎄: Initial release of Perception-R1 models and evaluation code. 🧐: Released the training code and data of Perception-R1 on the grounding task.
- 2025-05-27 🎉: Additional perception tasks coming soon (detection, OCR, counting...).
# Create and activate a new conda environment
conda create -n pr1 python=3.10 -y
conda activate pr1
# Clone the repository and install dependencies
git clone https://github.com/linkangheng/PR1.git
cd PR1
pip install -e ".[dev]"
pip install flash-attn==2.7.0.post2 --no-build-isolation
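Optionally, you can sanity-check the environment before launching any training or evaluation. The snippet below is our own illustrative check, not part of the repository:

```python
# Quick environment check (illustrative, not part of the repository):
# confirms the pinned flash-attn build imports and that a CUDA device is visible.
import torch
import flash_attn

print("torch:", torch.__version__)
print("flash-attn:", flash_attn.__version__)  # expected: 2.7.0.post2
print("CUDA available:", torch.cuda.is_available())
```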
Before training, modify the script to specify your model and data paths. Then run the experiment using:
bash local_scripts/train/train_qwen2_2b_vl_grounding.sh
The training script includes comprehensive configurations for hyperparameters, data loading, and model checkpointing. For custom training scenarios, you can adjust parameters such as learning rate, batch size, and optimization settings directly in the script.
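For intuition about the rule-based reward shaping that drives training, the sketch below shows one plausible reward for the grounding task: the IoU between the predicted and ground-truth boxes, zeroed out when the output format is invalid. This is an illustrative simplification with our own function names; the exact reward functions are defined in the training code.

```python
# Illustrative rule-based reward for grounding (simplified; the exact reward
# shaping is defined in the training code). The predicted box is scored by its
# IoU with the ground-truth box, and malformed outputs receive zero reward.
def box_iou(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def grounding_reward(pred_box, gt_box, format_ok=True):
    return box_iou(pred_box, gt_box) if format_ok else 0.0

print(grounding_reward((10, 10, 50, 50), (12, 8, 48, 52)))
```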
Download the evaluation data from 🤗 Hugging Face and unzip it into the eval/ folder.

Important: the COCO images are not included in the package and must be downloaded separately from the official COCO website; place them in the eval/images/coco/ directory.

The directory structure should be:
eval/
├── images/
│   ├── coco/
│   ├── pixmo-count/
│   └── ocr/
└── jsons/
    ├── counting/
    ├── grounding/
    ├── ocr/
    └── detection/
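If you prefer to fetch the evaluation package programmatically, a minimal sketch using huggingface_hub is shown below; the repo id is a placeholder, so substitute the actual dataset repository linked from the project page.

```python
# Illustrative download sketch using huggingface_hub. The repo_id below is a
# placeholder, not the real dataset repository; replace it before running.
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="<perception-r1-eval-data>",  # placeholder
    repo_type="dataset",
    local_dir="eval/",
)
```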
Counting evaluation:
python eval/evaluate_counting.py \
    --model_path 'Kangheng/PR1-Qwen2-VL-2B-Counting' \
    --anno_dir 'eval/jsons/counting/' \
    --image_dir 'eval/images/'
Grounding evaluation:
python eval/evaluate_grounding.py \
    --model_path 'Kangheng/PR1-Qwen2-VL-2B-Grounding' \
    --anno_dir 'eval/jsons/grounding/' \
    --image_dir 'eval/images/coco/'
Detection evaluation requires pycocotools:
pip install pycocotools
Then run:
python eval/evaluate_detection.py \
    --model_path Kangheng/PR1-Qwen2.5-VL-3B-Detection \
    --anno_dir 'eval/jsons/detection/coco_val2017.json' \
    --image_dir 'eval/images/coco/val2017/'
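For reference, the COCO-style mAP reported above is the standard AP@[0.50:0.95] computed by pycocotools. The sketch below shows that computation in isolation; the predictions file name is a placeholder, and evaluate_detection.py may already handle this step internally.

```python
# Sketch of the standard COCO mAP computation with pycocotools (illustrative;
# evaluate_detection.py may already do this internally). "predictions.json"
# is a placeholder for a COCO-format detection results file.
from pycocotools.coco import COCO
from pycocotools.cocoeval import COCOeval

coco_gt = COCO("eval/jsons/detection/coco_val2017.json")
coco_dt = coco_gt.loadRes("predictions.json")
evaluator = COCOeval(coco_gt, coco_dt, iouType="bbox")
evaluator.evaluate()
evaluator.accumulate()
evaluator.summarize()  # first line printed is AP@[0.50:0.95], i.e. the mAP
```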
OCR evaluation:
python eval/evaluate_ocr.py \
    --model_path Kangheng/PR1-Qwen2-VL-2B-OCR \
    --anno_dir 'eval/jsons/ocr/' \
    --image_dir 'eval/images/ocr/'
This work builds upon several important open-source projects. We would like to acknowledge the following repositories that inspired our research:
If you find our paper and code useful in your research, please consider giving us a star ⭐ and citing our work ✏️:
@article{yu2025perception,
title={Perception R1: Pioneering Perception Policy with Reinforcement Learning},
author={Yu, En and Lin, Kangheng and Zhao, Liang and Yin, Jisheng and Peng, Yuang and Wei, Haoran and Sun, Jianjian and Han, Chunrui and Ge, Zheng and Zhang, Xiangyu and Jiang, Daxin and Wang, Jingyu and Tao, Wenbing},
journal={arXiv preprint arXiv:2504.07954},
year={2025}
}