Stars
Benchmark and research code for the paper "SWEET-RL: Training Multi-Turn LLM Agents on Collaborative Reasoning Tasks"
Awesome paper lists for "A Desideratum for Conversational Agents: Capabilities, Challenges, and Future Directions"
Yelp Simulator for WWW'25 AgentSociety Challenge
Evaluation data, LLM query code, and results for "Large Language Models as Zero-Shot Conversational Recommenders" at CIKM 2023.
DialogStudio: Towards Richest and Most Diverse Unified Dataset Collection and Instruction-Aware Models for Conversational AI
📚LeetCUDA: 200+ CUDA/Tensor Cores Kernels, HGEMM, FA-2 MMA.
This repository contains code for the paper "[Revisiting the Test-Time Scaling of o1-like Models: Do they Truly Possess Test-Time Scaling Capabilities?](https://arxiv.org/abs/2502.12215)"
R1-searcher: Incentivizing the Search Capability in LLMs via Reinforcement Learning
limenlp / verl
Forked from volcengine/verl
AdaRFT: Efficient Reinforcement Finetuning via Adaptive Curriculum Learning
verl: Volcano Engine Reinforcement Learning for LLMs
TokenSkip: Controllable Chain-of-Thought Compression in LLMs
Robust recipes to align language models with human and AI preferences
SkyRL-v0: Train Real-World Long-Horizon Agents via Reinforcement Learning
Understanding R1-Zero-Like Training: A Critical Perspective
Code for "Echo Chamber: RL Post-training Amplifies Behaviors Learned in Pretraining"
Code for the paper "Training LLM-based Tutors to Improve Student Learning Outcomes in Dialogues", published at AIED 2025.
A collection of notebooks/recipes showcasing some fun and effective ways of using Claude.
ReCall: Learning to Reason with Tool Call for LLMs via Reinforcement Learning
Best practices & guides on how to write distributed PyTorch training code
[ACL 2024] Self-Training with Direct Preference Optimization Improves Chain-of-Thought Reasoning
Implementation for "Step-DPO: Step-wise Preference Optimization for Long-chain Reasoning of LLMs"
The simplest, fastest repository for training/finetuning medium-sized GPTs.
Simple Text-Generator with OpenAI gpt-2 Pytorch Implementation