Stars
Preempt-RT Kernel Build Guide for NVIDIA Development Board
Building Open LLM Web Agents with Self-Evolving Online Curriculum RL
NeXT hardware emulator for a NeXT Cube and NeXT Station. Mirrored from SourceForge
AgentLab: An open-source framework for developing, testing, and benchmarking web agents on diverse tasks, designed for scalability and reproducibility.
Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)
Align Anything: Training All-modality Model with Feedback
Train transformer language models with reinforcement learning.
An Easy-to-use, Scalable and High-performance RLHF Framework based on Ray (PPO & GRPO & REINFORCE++ & vLLM & RFT & Dynamic Sampling & Async Agent RL)
🌎💪 BrowserGym, a Gym environment for web task automation
A project that provides help for using DeepMind's mctx on gym-style environments.
[NeurIPS 2022] 🛒WebShop: Towards Scalable Real-World Web Interaction with Grounded Language Agents
Resources to build the MFOS - Noise Toaster Synth by Ray Wilson
(ICML 2024) AlphaZero-like Tree-Search can guide large language model decoding and training
A library for advanced large language model reasoning
An extensible benchmark for evaluating large language models on planning
[NeurIPS 2023 Spotlight] LightZero: A Unified Benchmark for Monte Carlo Tree Search in General Sequential Decision Scenarios (awesome MCTS)
Dopamine is a research framework for fast prototyping of reinforcement learning algorithms.
A number of agents (PPO, MuZero) with a Perceiver-based NN architecture that can be trained to achieve goals in nethack/minihack environments.
Open-source codebase for EfficientZero, from "Mastering Atari Games with Limited Data" at NeurIPS 2021.
Dream to Control: Learning Behaviors by Latent Imagination
Official codebase for Decision Transformer: Reinforcement Learning via Sequence Modeling.
Really Fast End-to-End JAX RL Implementations
(Crafter + NetHack) in JAX. ICML 2024 Spotlight.
Repository for the paper EXPLORER: Exploration-guided Reasoning for Textual Reinforcement Learning (EACL'24)