RLFactory is an easy and efficient RL post-training framework for Agentic Learning.
RL-Factory decouples the environment from RL post-training, enabling training with just a tool config and reward function while supporting async tool-calling to make RL post-training 2x faster.
The current version natively supports one-click DeepSearch training and features multi-turn tool-calling, model-judge rewards, and training of multiple models including Qwen3. More easy-to-use and efficient agentic learning modules will be added in upcoming releases.
Our goal is to enable users to focus on reward logic and tool setup for fast agentic learning with minimal code, while advanced developers can focus on improving training efficiency and model performance.
For ease of use, we decouple the environment from RL-based post-training, which brings several advantages.
- Easy-to-design reward function: Calculate rewards through rules, model judging, and even tools to meet all your reward-design requirements (a minimal sketch follows this list).
- Seamless tool setup: Simply provide the configuration file for your MCP tools and custom tools to integrate them into RL learning.
- Multi-agent extension: Convert your agent to the MCP format for easy multi-agent interaction. LLM chat simulation will also be added in the future to improve multi-turn dialogue capabilities.
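To make the reward side concrete, here is a minimal rule-based sketch in Python. The function name and signature are hypothetical (RLFactory's actual reward interface is described in `docs/rl_factory/main_tutorial.md`), and the `<answer>` tag convention follows Search-R1-style outputs; a model judge or tool call could replace the exact-match check.

```python
import re

def compute_score(solution_str: str, ground_truth: str) -> float:
    """Hypothetical rule-based reward: extract the final answer from the
    rollout and compare it against the ground truth (exact-match style)."""
    # Pull the text inside <answer>...</answer>, if present.
    match = re.search(r"<answer>(.*?)</answer>", solution_str, re.DOTALL)
    if match is None:
        return 0.0  # malformed output gets no reward
    answer = match.group(1).strip().lower()
    return 1.0 if answer == ground_truth.strip().lower() else 0.0

# Example usage:
print(compute_score("I think <answer>Paris</answer>", "paris"))  # -> 1.0
```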
For efficient learning, we develop several essential modules within the RL post-training framework, making training 2x faster.
- Efficient tool-call: Improve online RL training efficiency through batch processing and asynchronous parallel tool calls (see the sketch after this list).
- Efficient reward calculation: Deploy LRM (like QwQ-32B) in a distributed manner for efficient model judging, and use asynchronous parallelism to speed up reward calculation.
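As a rough illustration of the asynchronous tool-call pattern (the function names below are placeholders, not RLFactory's API), a batch of tool calls can be issued concurrently with `asyncio`, so rollout latency is bounded by the slowest call rather than the sum of all calls:

```python
import asyncio

async def call_tool(query: str) -> str:
    """Illustrative stand-in for a single (possibly slow) tool call,
    e.g. a search request issued over HTTP."""
    await asyncio.sleep(0.1)  # simulate network latency
    return f"result for: {query}"

async def batch_tool_calls(queries: list[str]) -> list[str]:
    """Fire all tool calls in a batch concurrently instead of one by one."""
    return await asyncio.gather(*(call_tool(q) for q in queries))

# Example: 32 rollouts in a batch each need one tool result.
results = asyncio.run(batch_tool_calls([f"query {i}" for i in range(32)]))
print(len(results))  # -> 32, obtained in roughly one tool-call round trip
```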
Going forward, we will continue to prioritize "easy" and "efficient".
- Easier: Use the WebUI to process data, define tools & environments, adjust training configurations, and manage projects. (The WebUI is under rapid development.)
- More efficient: Continuously iterating and improving the training framework (such as AsyncLLMEngine) and RL training algorithms.
We’ll keep a fast release cycle to quickly deliver and polish the upcoming features.
- Version 0.1
- Environment decoupling: define your tool-use environment easily (tool setup and reward function definition)
- Qwen3 model support: quickly train your agent model using Qwen3 (much stronger than Qwen2.5 at tool-calling)
- Efficient training: 2x faster than existing frameworks for rapid model iteration (mainly through async tool-use)
- Version 0.2 (within 2 weeks)
- WebUI: build a WebUI for data processing, tool & environment definition, training configuration, and project management
- More efficient training: support the AsyncLLMEngine for more efficient rollout
- More models: test more models (such as DeepSeek, Llama, etc.) and add corresponding support configurations
- More applications: help create more demos (such as TravelPlanner) to adapt to more benchmarks
- Dependencies (Key)
```
CUDA: >=12.0 (Recommended: 12.4)
Python: >=3.10 (Recommended: 3.10)
vllm: >=0.8.3 (Recommended: 0.8.5)  # For Qwen3 model support
```
- Install Requirements
```bash
pip3 install accelerate bitsandbytes datasets deepspeed==0.16.4 einops flash-attn==2.7.0.post2 isort jsonlines loralib optimum packaging peft "pynvml>=12.0.0" "ray[default]==2.46.0" tensorboard torch torchmetrics tqdm transformers==4.51.3 transformers_stream_generator wandb wheel
pip3 install vllm==0.8.5  # Mainly for Qwen3 model support
pip3 install "qwen-agent[code_interpreter]"
pip3 install llama_index bs4 pymilvus infinity_client codetiming tensordict==0.6 omegaconf torchdata==0.10.0 hydra-core easydict dill python-multipart mcp
pip3 install -e . --no-deps
pip3 install faiss-gpu-cu12  # Optional, needed for end-to-end search model training with rag_server
```
Note: Currently, only Qwen models are tested.
- What do you need to provide?
- An environment is enough! See the minimal tutorial in `docs/rl_factory/main_tutorial.md` (a hypothetical tool-config sketch follows the training command below).
- Training Command
```bash
# Before running, modify MODEL_PATH, REWARD_MODEL_PATH, and several actor_rollout_ref.env parameters as needed
bash main_grpo.sh
```
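As a concrete illustration of what "an environment" means, a tool configuration in the standard MCP client format (the same `mcpServers` shape Qwen-Agent accepts) might look like the sketch below. This is hypothetical: the server entries, the file name `mcp_tools.json`, and how the config is referenced from training are placeholders, not RLFactory's exact schema (see the tutorial for that).

```python
import json

# Hypothetical MCP tool configuration in the standard "mcpServers" client format.
# The server names, commands, and arguments below are placeholders -- point them
# at whatever MCP servers your task needs.
mcp_tools = {
    "mcpServers": {
        "search": {
            "command": "python",
            "args": ["-m", "my_search_mcp_server"],  # placeholder local server
        },
        "time": {
            "command": "uvx",
            "args": ["mcp-server-time"],
        },
    }
}

# Write the config to a file so it can be referenced from the training setup.
with open("mcp_tools.json", "w") as f:
    json.dump(mcp_tools, f, indent=2)
```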
- In `docs/rl_factory/main_tutorial.md`, we provide an RLFactory reproduction example of Search-R1. We use `Qwen3-4B` and `Qwen3-8B` as the base models for RL training.
- Easy: Start with Qwen3 and MCP tools to quickly train your own DeepSearch Agent Model.
  - Provide only one tool configuration and one reward function to start training!
  - Qwen3 demonstrates significant advantages in agentic learning. It can accurately call tools even without SFT, and it also supports the MCP protocol.
- Efficient: Enjoy the efficient training enabled by asynchronous parallel tool-calls.
  - Compared to Search-R1 built on the original verl, training time is reduced by a factor of 1.5 to 2, and the efficiency gain is even greater when a model judge is involved.
  - After 100 steps of training (about 5 hours on 8×A100), `Qwen3-4B` achieves a score of 0.458 and `Qwen3-8B` achieves a score of 0.463.
- The table below presents our training results under identical computational resources, software, and verl versions:
  - RLFactory trains in about half the time of Search-R1, demonstrating high efficiency.
  - Qwen3 as the base model outperforms Qwen2.5, enabling domain-specific tool-calling via RL post-training without SFT.
| Model Name | Test Score (NQ) | Total Training Time (100 steps) | Seconds per Step | Training Resources |
|---|---|---|---|---|
| Search-R1-Qwen2.5-3B-Instruct-GRPO | 0.356 | 7.39 h | 266 s | A100 × 8 |
| Search-R1-Qwen2.5-7B-Instruct-GRPO | 0.451 | 9.25 h | 333 s | A100 × 8 |
| Search-R1-Qwen3-4B-GRPO | 0.420 | 7.95 h | 286 s | A100 × 8 |
| RLFactory-Qwen3-4B-GRPO | 0.458 | 5.30 h | 190 s | A100 × 8 |
| RLFactory-Qwen3-8B-GRPO | 0.463 | 5.76 h | 207 s | A100 × 8 |
We welcome all users and developers to contribute code to RLFactory. If you have any questions, encounter bugs, or would like to collaborate on development, please feel free to contact us!
- Submit an issue directly on GitHub.
- Contact us via email at chaijiajun@meituan.com or gjyin@outlook.com.
- Join our WeChat group and become a pioneer in Agent model training!
This repo benefits from verl, Search-R1, and Qwen-Agent. Thanks for their wonderful work! We will also introduce TRL in the future to further expand the applicability of our framework.