GitHub - thuml/RLVR-World: Official repository for "RLVR-World: Training World Models with Reinforcement Learning", https://arxiv.org/abs/2505.13934

thuml/RLVR-World

RLVR-World: Training World Models with Reinforcement Learning

Project Page | Paper | Hugging Face

This is the official code base for the paper RLVR-World: Training World Models with Reinforcement Learning.

Give it a star 🌟 if you find our work useful!

🔥 News

  • 🚩 2025.05.26: We release all models and datasets.
  • 🚩 2025.05.21: We open-source our training code.
  • 🚩 2025.05.21: Our paper is released on arXiv.

📋 TL;DR

We pioneer training world models through reinforcement learning with verifiable rewards (RLVR):

  • World models across various modalities (particularly, language and videos) are unified under a sequence modeling formulation;
  • Task-specific prediction metrics serve as verifiable rewards directly optimized by RL.
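To make the second point concrete, here is a minimal sketch of how a prediction metric can act as a verifiable reward for a language world model. It is an illustration under stated assumptions, not the code in this repository: the function names `binary_reward` and `token_f1_reward` are hypothetical, and token-level F1 is only one example of a task-specific metric.

```python
from collections import Counter


def binary_reward(predicted: str, target: str) -> float:
    """Hypothetical binary reward: 1.0 only when the predicted next state
    matches the ground-truth state exactly (ignoring whitespace differences)."""
    return float(predicted.split() == target.split())


def token_f1_reward(predicted: str, target: str) -> float:
    """Hypothetical task-specific reward: token-level F1 between the predicted
    and ground-truth states, giving partial credit for partial matches."""
    pred_tokens = predicted.split()
    tgt_tokens = target.split()
    if not pred_tokens or not tgt_tokens:
        # Both empty counts as a perfect match; exactly one empty scores zero.
        return float(pred_tokens == tgt_tokens)
    # Multiset intersection counts each shared token at most as often as it
    # appears in both sequences.
    overlap = sum((Counter(pred_tokens) & Counter(tgt_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(tgt_tokens)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    target_state = "door is open"
    # Several sampled predictions for the same (state, action) prompt.
    samples = ["door is open", "door is closed", "window is broken"]
    for sample in samples:
        print(sample, binary_reward(sample, target_state),
              round(token_f1_reward(sample, target_state), 3))
```

Both functions compare a sampled prediction against the recorded ground-truth next state, so the reward can be checked automatically without a learned reward model. An RL algorithm would then use these scores to update the world model.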

[Figure: concept]

🤗 Models and Datasets

At the moment, we provide the following models and datasets:

| Modality | Type        | Domain             | Name                                               |
|----------|-------------|--------------------|----------------------------------------------------|
| Language | Dataset     | Text game          | bytesized32-world-model-cot                        |
| Language | World model | Text game          | bytesized32-world-model-sft                        |
| Language | World model | Text game          | bytesized32-world-model-rlvr-binary-reward         |
| Language | World model | Text game          | bytesized32-world-model-rlvr-task-specific-reward  |
| Language | Dataset     | Web navigation     | webarena-world-model-cot                           |
| Language | World model | Web navigation     | webarena-world-model-sft                           |
| Language | World model | Web navigation     | webarena-world-model-rlvr                          |
| Video    | Tokenizer   | Robot manipulation | rt1-frame-tokenizer                                |
| Video    | World model | Robot manipulation | rt1-world-model-single-step-base                   |
| Video    | World model | Robot manipulation | rt1-world-model-single-step-rlvr                   |
| Video    | Tokenizer   | Robot manipulation | rt1-compressive-tokenizer                          |
| Video    | World model | Robot manipulation | rt1-world-model-multi-step-base                    |
| Video    | World model | Robot manipulation | rt1-world-model-multi-step-rlvr                    |
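
As a rough sketch, the artifacts above could be fetched with the `huggingface_hub` package. The `thuml` namespace and the helper names `repo_id` and `download` are assumptions made for illustration; this README lists only the artifact names, so check the Hugging Face link for the actual repository IDs.

```python
# Assumption: the artifacts live under a "thuml" namespace on Hugging Face.
# This README does not state the namespace; adjust it if it differs.
HF_NAMESPACE = "thuml"


def repo_id(name: str, namespace: str = HF_NAMESPACE) -> str:
    """Build a Hugging Face repository ID from an artifact name in the table."""
    return f"{namespace}/{name}"


def download(name: str, is_dataset: bool = False) -> str:
    """Download a full repository snapshot and return its local path.

    Hypothetical helper; requires `pip install huggingface_hub` and network access.
    """
    # Imported lazily so repo_id() works without the package installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(
        repo_id=repo_id(name),
        repo_type="dataset" if is_dataset else "model",
    )


if __name__ == "__main__":
    # Print the IDs that would be requested, without downloading anything.
    print(repo_id("bytesized32-world-model-cot"))
    print(repo_id("rt1-world-model-single-step-rlvr"))
```

Datasets and models use different repository types on Hugging Face, which is why `download` takes an `is_dataset` flag.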

💬 Evaluating Language World Models

See lang_wm:

  • Text game state prediction
  • Web page state prediction
  • Application: Model predictive control for web agents

🎇 Evaluating Video World Models

See vid_wm:

  • Robot manipulation trajectory prediction
  • Application: Real2sim policy evaluation

🎥 Showcases

[Figure: showcase]

🚀 Release Progress

  • Video world model with RLVR
  • Pre-trained & post-trained video world model weights
  • Real2sim policy evaluation with video world models
  • Text game SFT data
  • Web page SFT data
  • Language world model on text games with RLVR
  • Language world model on web pages with RLVR
  • Post-trained language world model weights
  • Web agents with language world models

📜 Citation

If you find this project useful, please cite our paper as:

@article{wu2025rlvr,
    title={RLVR-World: Training World Models with Reinforcement Learning}, 
    author={Jialong Wu and Shaofeng Yin and Ningya Feng and Mingsheng Long},
    journal={arXiv preprint arXiv:2505.13934},
    year={2025},
}

🤝 Contact

If you have any questions, please contact wujialong0229@gmail.com.

💡 Acknowledgement

We sincerely appreciate the following GitHub repos for their valuable codebases, which we build upon:
