Stars
verl: Volcano Engine Reinforcement Learning for LLMs
verl-agent is an extension of veRL, designed for training LLM/VLM agents via RL. verl-agent is also the official code for the paper "Group-in-Group Policy Optimization for LLM Agent Training".
[ICLR 2023] SQA3D for embodied scene understanding and reasoning
[ICLR 2025] OMG for material modeling in Gaussian Splatting
[NeurIPS 2024] GL-NeRF for training-free ANY NeRF acceleration
An Easy-to-use, Scalable and High-performance RLHF Framework based on Ray (PPO & GRPO & REINFORCE++ & vLLM & Ray & Dynamic Sampling & Async Agent RL)
Benchmark and research code for the paper SWEET-RL: Training Multi-Turn LLM Agents on Collaborative Reasoning Tasks
The official implementation of Self-Play Fine-Tuning (SPIN)
Build your own visual reasoning model
Paper list for Efficient Reasoning.
RAGEN leverages reinforcement learning to train LLM reasoning agents in interactive, stochastic environments.
Official Repo for Open-Reasoner-Zero
Robust recipes to align language models with human and AI preferences
Search-R1: An Efficient, Scalable RL Training Framework for Reasoning & Search Engine Calling interleaved LLM based on veRL
EasyR1: An Efficient, Scalable, Multi-Modality RL Training Framework based on veRL
Benchmarking Agentic LLM and VLM Reasoning On Games
Minimal reproduction of DeepSeek R1-Zero
Soft-QMIX: Integrating Maximum Entropy For Monotonic Value Function Factorization
Easily fine-tune, evaluate and deploy Qwen3, DeepSeek-R1, Llama 4 or any open source LLM / VLM!
RL starter files to immediately train, visualize and evaluate an agent without writing a single line of code
TensorZero is an open-source stack for industrial-grade LLM applications. It unifies an LLM gateway, observability, optimization, evaluation, and experimentation.
Anthropic's educational courses
This repository showcases various advanced techniques for Retrieval-Augmented Generation (RAG) systems. RAG systems combine information retrieval with generative models to provide accurate and cont…