Stars
Use ChatGPT to summarize arXiv papers. Accelerate the entire research workflow: use ChatGPT for full-paper summarization, professional translation, polishing, peer review, and review responses.
An Easy-to-use, Scalable and High-performance RLHF Framework based on Ray (PPO & GRPO & REINFORCE++ & LoRA & vLLM & RFT)
Listwise Reward Estimation for Offline Preference-based Reinforcement Learning (ICML 2024)
Collections of robotics environments geared towards benchmarking multi-task and meta reinforcement learning
NonTrivial-MIPS is a synthesizable superscalar MIPS processor with branch prediction and FPU support, capable of booting Linux.
Official code for ICLR'25 paper [Adversarial Policy Optimization for Offline Preference-based Reinforcement Learning]
An elegant PyTorch offline reinforcement learning library for researchers.
This is the homepage of a new book entitled "Mathematical Foundations of Reinforcement Learning."
Pref-RL provides ready-to-use PbRL agents that are easily extensible.
PyTorch version of Stable Baselines, reliable implementations of reinforcement learning algorithms.
A repo for distributed training of language models with Reinforcement Learning via Human Feedback (RLHF)
Official codebase for "B-Pref: Benchmarking Preference-Based Reinforcement Learning"; contains scripts to reproduce experiments.
Reproduction of OpenAI and DeepMind's "Deep Reinforcement Learning from Human Preferences"
Human preference data for "Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback"
[NeurIPS 2023] ImageReward: Learning and Evaluating Human Preferences for Text-to-image Generation
Shanghai Jiao Tong University, Zhiyuan College mathematics track, Specialized Seminar Course 3 [Topics in Computational Neuroscience]
Guide on how to use Qemu to create a similar effect to Windows Subsystem for Linux on macOS. Unfinished; contributions are welcome!
All Algorithms implemented in Python
[Lumina Embodied AI Community] A technical guide to embodied AI (Embodied-AI-Guide)