This repository contains code for the paper "[Revisiting the Test-Time Scaling of o1-like Models: Do they Truly Possess Test-Time Scaling Capabilities?](https://arxiv.org/abs/2502.12215)"

Jupyter Notebook 8 Updated Apr 9, 2025

UCSB-NLP-Chang / ThinkPrune

Python 32 1 Updated Apr 16, 2025

RUCAIBox / R1-Searcher

R1-Searcher: Incentivizing the Search Capability in LLMs via Reinforcement Learning

Python 561 39 Updated May 25, 2025

limenlp / verl

Forked from volcengine/verl

AdaRFT: Efficient Reinforcement Finetuning via Adaptive Curriculum Learning

Python 37 2 Updated Jun 13, 2025

volcengine / verl

verl: Volcano Engine Reinforcement Learning for LLMs

Python 9,592 1,444 Updated Jun 17, 2025

hemingkx / TokenSkip

TokenSkip: Controllable Chain-of-Thought Compression in LLMs

Python 155 7 Updated Mar 13, 2025

Zanette-Labs / efficient-reasoning

Python 65 7 Updated Apr 13, 2025

huggingface / alignment-handbook

Robust recipes to align language models with human and AI preferences

Python 5,226 448 Updated Apr 30, 2025

NovaSky-AI / SkyRL

SkyRL-v0: Train Real-World Long-Horizon Agents via Reinforcement Learning

Python 411 42 Updated Jun 9, 2025

sail-sg / understand-r1-zero

Understanding R1-Zero-Like Training: A Critical Perspective

Python 988 47 Updated May 24, 2025

rosieyzh / openrlhf-pretrain

Code for "Echo Chamber: RL Post-training Amplifies Behaviors Learned in Pretraining"

Python 18 2 Updated Apr 14, 2025

umass-ml4ed / tutorbot-dpo

Code for the paper "Training LLM-based Tutors to Improve Student Learning Outcomes in Dialogues", published at AIED 2025.

Python 3 Updated Mar 11, 2025

anthropics / anthropic-cookbook

A collection of notebooks/recipes showcasing some fun and effective ways of using Claude.

Jupyter Notebook 14,701 1,627 Updated Jun 13, 2025

Agent-RL / ReCall

ReCall: Learning to Reason with Tool Call for LLMs via Reinforcement Learning

Python 936 63 Updated May 16, 2025

akumar2709 / OVERTHINK_public

Jupyter Notebook 35 2 Updated Apr 3, 2025

LambdaLabsML / distributed-training-guide

Best practices & guides on how to write distributed PyTorch training code

Python 441 36 Updated Feb 24, 2025

TianduoWang / DPO-ST

[ACL 2024] Self-Training with Direct Preference Optimization Improves Chain-of-Thought Reasoning

Python 45 5 Updated Jul 28, 2024

dvlab-research / Step-DPO

Implementation for "Step-DPO: Step-wise Preference Optimization for Long-chain Reasoning of LLMs"

Python 369 15 Updated Jan 19, 2025

shawnricecake / Heima

Code for Heima

Python 46 3 Updated Apr 21, 2025

karpathy / nanoGPT

The simplest, fastest repository for training/finetuning medium-sized GPTs.

Python 41,913 7,002 Updated Dec 9, 2024

graykode / gpt-2-Pytorch

Simple Text-Generator with OpenAI GPT-2 PyTorch Implementation

Python 1,001 228 Updated Jul 8, 2019

JinyanSu1

Stars

facebookresearch / sweet_rl

emrecanacikgoz / awesome-conversational-agents

vardhandongre / Respact

scandukuri / assistant-gate

tsinghua-fib-lab / AgentSocietyChallenge

AaronHeee / LLMs-as-Zero-Shot-Conversational-RecSys

jeon185 / LaViC

salesforce / DialogStudio

xlite-dev / LeetCUDA

ZhiYuanZeng / test-time-scaling-eval