
Perception R1:
Pioneering Perception Policy with Reinforcement Learning


arXiv Website HF Model: Perception-R1 HF Model: UTR-Data
HUST · BUPT · StepFun · JHU · THU

📖 Overview

We present Perception-R1, a scalable RL framework that applies Group Relative Policy Optimization (GRPO) during MLLM post-training. Key innovations:

🎯 Perceptual Perplexity Analysis: We introduce a novel analytical framework that reveals critical thresholds for effective reinforcement learning in perception tasks, providing insights into when and how RL can improve visual understanding.

🚀 GRPO Optimization: Scalable policy learning with meticulously crafted rule-based reward shaping.

🔥 Surprising Performance: Perception-R1 achieves remarkable improvements across multiple visual perception benchmarks, notably reaching 31.9% mAP on the COCO2017 validation set, making it the first 3B-scale MLLM to achieve such performance.
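
The group-relative update at the heart of GRPO can be sketched in a few lines: the rewards of a group of rollouts sampled for the same prompt are standardized against the group's own mean and standard deviation, replacing a learned value baseline. This is a minimal sketch of the general GRPO advantage computation, not the repository's implementation:

```python
import statistics

def group_relative_advantages(rewards):
    """Turn a group of rollout rewards into advantages.

    GRPO scores each response relative to the other responses
    sampled for the same prompt: advantage = (r - mean) / std,
    computed within the group, with no learned critic.
    """
    mean = statistics.mean(rewards)
    std = statistics.pstdev(rewards)
    if std == 0:
        # All rollouts scored identically: no learning signal.
        return [0.0 for _ in rewards]
    return [(r - mean) / std for r in rewards]

# Example: four rollouts for one grounding prompt, scored by a
# rule-based reward (e.g., IoU with the ground-truth box).
print(group_relative_advantages([0.9, 0.1, 0.5, 0.5]))
```

Because advantages are centered within each group, rollouts that beat the group average are reinforced and the rest are suppressed, regardless of the reward's absolute scale.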

TODOs

  • 2025-04-10 🎄: Initial release of Perception-R1 models and evaluation code.
  • 🧐: Release the training code and data of Perception-R1 on the grounding task.
  • 2025-05-27 🎉: Additional perception tasks coming soon (detection, OCR, counting...)

🛠️Installation

# Create and activate a new conda environment
conda create -n pr1 python=3.10 -y  
conda activate pr1

# Clone the repository and install dependencies
git clone https://github.com/linkangheng/PR1.git
cd PR1
pip install -e ".[dev]"
pip install flash-attn==2.7.0.post2 --no-build-isolation

🔄Training

Before training, modify the script to specify your model and data paths. Then run the experiment using:

bash local_scripts/train/train_qwen2_2b_vl_grounding.sh

The training script includes comprehensive configurations for hyperparameters, data loading, and model checkpointing. For custom training scenarios, you can adjust parameters such as learning rate, batch size, and optimization settings directly in the script.
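
In practice those edits amount to pointing a few variables at your local paths. The excerpt below is hypothetical; the actual variable names in `local_scripts/train/train_qwen2_2b_vl_grounding.sh` may differ, so check the script itself:

```shell
# Hypothetical configuration block of the training script --
# verify the real variable names in the repository's script.
MODEL_PATH="/path/to/Qwen2-VL-2B-Instruct"   # base MLLM to post-train
DATA_PATH="/path/to/grounding_train.json"    # grounding RL training data
OUTPUT_DIR="checkpoints/pr1-grounding"       # checkpoint destination
LR=1e-6                                      # learning rate
PER_DEVICE_BATCH_SIZE=1                      # batch size per GPU
```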

📊Evaluation

Preparation

Download the evaluation data from 🤗 Hugging Face, then unzip it into the eval/ folder. The directory structure should be:

Important: The COCO images are not included in the package and must be downloaded separately. Please download the COCO images from the official COCO website and place them in the eval/images/coco/ directory.

eval/
├── images/
│   ├── coco/
│   ├── pixmo-count/
│   └── ocr/
└── jsons/
    ├── counting/
    ├── grounding/
    ├── ocr/
    └── detection/
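
A quick way to confirm the layout before running the benchmarks is a small script that checks for the expected sub-directories. This helper is illustrative only, not part of the repository:

```python
from pathlib import Path

# Expected sub-directories under eval/, taken from the layout above.
# The COCO images must still be downloaded separately.
EXPECTED = [
    "images/coco",
    "images/pixmo-count",
    "images/ocr",
    "jsons/counting",
    "jsons/grounding",
    "jsons/ocr",
    "jsons/detection",
]

def missing_eval_dirs(eval_root="eval"):
    """Return the expected sub-directories that are absent."""
    root = Path(eval_root)
    return [p for p in EXPECTED if not (root / p).is_dir()]

if __name__ == "__main__":
    missing = missing_eval_dirs()
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("Evaluation data layout looks complete.")
```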

Running Evaluations

Counting Evaluation

python eval/evaluate_counting.py \
    --model_path 'Kangheng/PR1-Qwen2-VL-2B-Counting' \
    --anno_dir 'eval/jsons/counting/' \
    --image_dir 'eval/images/'

Grounding Evaluation

python eval/evaluate_grounding.py \
    --model_path 'Kangheng/PR1-Qwen2-VL-2B-Grounding' \
    --anno_dir 'eval/jsons/grounding/' \
    --image_dir 'eval/images/coco/'

Detection Evaluation

pip install pycocotools
python eval/evaluate_detection.py \
    --model_path Kangheng/PR1-Qwen2.5-VL-3B-Detection \
    --anno_dir 'eval/jsons/detection/coco_val2017.json' \
    --image_dir 'eval/images/coco/val2017/'
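
The detection script depends on pycocotools, whose COCOeval computes the full mAP protocol (precision averaged over IoU thresholds 0.50:0.95). As a rough illustration of what is being measured, here is a simplified precision/recall computation at a single IoU threshold with greedy score-ordered matching; it is a sketch of the general idea, not the COCO protocol or the repository's code:

```python
def iou(a, b):
    """IoU of two boxes in [x1, y1, x2, y2] format."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def precision_recall_at_iou(preds, gts, thr=0.5):
    """Greedily match score-sorted predictions to ground-truth boxes.

    preds: list of {"box": [...], "score": float}; gts: list of boxes.
    A prediction counts as a true positive if it matches an unclaimed
    ground-truth box with IoU >= thr.
    """
    preds = sorted(preds, key=lambda p: -p["score"])
    matched, tp = set(), 0
    for p in preds:
        best, best_iou = None, thr
        for i, g in enumerate(gts):
            if i in matched:
                continue
            o = iou(p["box"], g)
            if o >= best_iou:
                best, best_iou = i, o
        if best is not None:
            matched.add(best)
            tp += 1
    precision = tp / len(preds) if preds else 0.0
    recall = tp / len(gts) if gts else 0.0
    return precision, recall
```

COCOeval additionally integrates precision over recall levels and IoU thresholds per category, which is why installing pycocotools is required for the real evaluation.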

OCR Evaluation

python eval/evaluate_ocr.py \
    --model_path Kangheng/PR1-Qwen2-VL-2B-OCR \
    --anno_dir 'eval/jsons/ocr/' \
    --image_dir 'eval/images/ocr/'
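
OCR predictions are commonly scored against ground-truth transcriptions by exact match or normalized edit distance; the exact metric used by evaluate_ocr.py may differ. A minimal Levenshtein-based scorer looks like this:

```python
def edit_distance(a, b):
    """Levenshtein distance via row-by-row dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,        # deletion
                           cur[j - 1] + 1,     # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]

def ocr_score(pred, gt):
    """1 - normalized edit distance; 1.0 means an exact match."""
    if not gt:
        return 1.0 if not pred else 0.0
    return max(0.0, 1.0 - edit_distance(pred, gt) / len(gt))
```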

📈Results

Grounding

Evaluation of Grounding

OCR

Evaluation of OCR

Counting and Detection

Evaluation of Counting and Detection

Some Cases

OCR case · Counting case · Detection case · Grounding case

Acknowledgement

This work builds upon several important open-source projects. We would like to acknowledge the following repositories that inspired our research:

📚Citation

If you find our paper and code useful in your research, please consider giving us a star ⭐ and citing our work ✏️:

@article{yu2025perception,
  title={Perception R1: Pioneering Perception Policy with Reinforcement Learning},
  author={Yu, En and Lin, Kangheng and Zhao, Liang and Yin, Jisheng and Peng, Yuang and Wei, Haoran and Sun, Jianjian and Han, Chunrui and Ge, Zheng and Zhang, Xiangyu and Jiang, Daxin and Wang, Jingyu and Tao, Wenbing},
  journal={arXiv preprint arXiv:2504.07954},
  year={2025}
}
