G1

Source code for paper G1: Bootstrapping Perception and Reasoning Abilities of Vision-Language Model via Reinforcement Learning. This repository includes:

VLM-Gym: A parallel environment for training/evaluating Vision-Language Models on visual games
Gaming RL Training: Implementation of reinforcement learning for training G0 and G1 models.

compare.mov

Features of VLM-Gym

RL training curves of G0 and G1 models on different games.

⚙️ Setup

conda create -n vlmgym python=3.10
conda activate vlmgym
bash setup.sh

🏃 Run Parallel Enviroment in VLM-Gym

We provide the evaluation scripts of 4 games using a random policy under ./vlmgym/test. The 2048 enviroment is based on gymnasium-2048.

cd ./vlmgym/test
python eval_2048.py

This would generate the evaluation log file and an image summarizing the curves under ./vlmgym/test/logs dir and the videos documenting all runs under ./vlmgym/test/videos dir. All different runs are conducted in parallel.

It is easy to evaluate different models by implementing the custom_policy function in the evaluation script, such as using OpanAI class or vLLM.

Example 10 parallel random 2048 run curves

📄 Customize Difficulties in VLM-Gym

The game config are in ./vlmgym/sandbox/games/, for example, you can alter the diffculties of Shisen-Sho game by changing the shape and color settings in ./vlmgym/sandbox/games/gamematch.py

🎯 RL Training using VLM-Gym

We provide the RL scripts utilizing the VLM-Gym under ./training/scripts. Our training is based on EasyR1.

For example, to conduct the RL experiments for Shisen-sho game

cd training
bash scripts/rl_shisensho.sh

# Verified least GPU requirements is 4x80G GPUs.

All rollout histroy would be saved under the EasyR1 directory to watch the learning curve.

After training, you can serve the model using vLLM to conduct evaluations with VLM-Gym.

📖 Citation

If you find our work helpful, please kindly cite

@misc{chen2025g1bootstrappingperceptionreasoning,
      title={G1: Bootstrapping Perception and Reasoning Abilities of Vision-Language Model via Reinforcement Learning}, 
      author={Liang Chen and Hongcheng Gao and Tianyu Liu and Zhiqi Huang and Flood Sung and Xinyu Zhou and Yuxin Wu and Baobao Chang},
      year={2025},
      eprint={2505.13426},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2505.13426}, 
}

Name		Name	Last commit message	Last commit date
Latest commit History 19 Commits
assets		assets
training		training
vlmgym		vlmgym
.gitignore		.gitignore
README.md		README.md
setup.py		setup.py
setup.sh		setup.sh

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

G1

⚙️ Setup

🏃 Run Parallel Enviroment in VLM-Gym

📄 Customize Difficulties in VLM-Gym

🎯 RL Training using VLM-Gym

📖 Citation

About

Uh oh!

Releases

Packages

Languages

chenllliang/G1

Folders and files

Latest commit

History

Repository files navigation

G1

⚙️ Setup

🏃 Run Parallel Enviroment in VLM-Gym

📄 Customize Difficulties in VLM-Gym

🎯 RL Training using VLM-Gym

📖 Citation

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages