JinyanSu1 / Starred · GitHub

Benchmark and research code for the paper "SWEET-RL: Training Multi-Turn LLM Agents on Collaborative Reasoning Tasks"

Python 218 9 Updated May 5, 2025

Awesome paper lists for "A Desideratum for Conversational Agents: Capabilities, Challenges, and Future Directions"

12 Updated Apr 25, 2025
Jupyter Notebook 5 1 Updated Jun 16, 2025
Python 25 6 Updated May 29, 2024

Yelp Simulator for WWW'25 AgentSociety Challenge

Python 80 23 Updated Apr 27, 2025

Evaluation data, LLM query code, and results for "Large Language Models as Zero-Shot Conversational Recommenders" at CIKM 2023.

Python 78 9 Updated Aug 22, 2023

Implementation of LaViC (KDD 2025)

Python 9 Updated Jun 1, 2025

DialogStudio: Towards Richest and Most Diverse Unified Dataset Collection and Instruction-Aware Models for Conversational AI

Python 502 35 Updated Jan 27, 2025

📚LeetCUDA: 200+ CUDA/Tensor Cores Kernels, HGEMM, FA-2 MMA.

Cuda 4,793 519 Updated Jun 18, 2025

This repository contains code for the paper "[Revisiting the Test-Time Scaling of o1-like Models: Do they Truly Possess Test-Time Scaling Capabilities?](https://arxiv.org/abs/2502.12215)"

Jupyter Notebook 8 Updated Apr 9, 2025
Python 32 1 Updated Apr 16, 2025

R1-searcher: Incentivizing the Search Capability in LLMs via Reinforcement Learning

Python 561 39 Updated May 25, 2025

AdaRFT: Efficient Reinforcement Finetuning via Adaptive Curriculum Learning

Python 37 2 Updated Jun 13, 2025

verl: Volcano Engine Reinforcement Learning for LLMs

Python 9,592 1,444 Updated Jun 17, 2025

TokenSkip: Controllable Chain-of-Thought Compression in LLMs

Python 155 7 Updated Mar 13, 2025

Robust recipes to align language models with human and AI preferences

Python 5,226 448 Updated Apr 30, 2025

SkyRL-v0: Train Real-World Long-Horizon Agents via Reinforcement Learning

Python 411 42 Updated Jun 9, 2025

Understanding R1-Zero-Like Training: A Critical Perspective

Python 988 47 Updated May 24, 2025

Code for "Echo Chamber: RL Post-training Amplifies Behaviors Learned in Pretraining"

Python 18 2 Updated Apr 14, 2025

Code for the paper "Training LLM-based Tutors to Improve Student Learning Outcomes in Dialogues", published at AIED 2025.

Python 3 Updated Mar 11, 2025

A collection of notebooks/recipes showcasing some fun and effective ways of using Claude.

Jupyter Notebook 14,701 1,627 Updated Jun 13, 2025

ReCall: Learning to Reason with Tool Call for LLMs via Reinforcement Learning

Python 936 63 Updated May 16, 2025
Jupyter Notebook 35 2 Updated Apr 3, 2025

Best practices & guides on how to write distributed pytorch training code

Python 441 36 Updated Feb 24, 2025

[ACL 2024] Self-Training with Direct Preference Optimization Improves Chain-of-Thought Reasoning

Python 45 5 Updated Jul 28, 2024

Implementation for "Step-DPO: Step-wise Preference Optimization for Long-chain Reasoning of LLMs"

Python 369 15 Updated Jan 19, 2025

Code for Heima

Python 46 3 Updated Apr 21, 2025

The simplest, fastest repository for training/finetuning medium-sized GPTs.

Python 41,913 7,002 Updated Dec 9, 2024

Simple Text-Generator with OpenAI gpt-2 Pytorch Implementation

Python 1,001 228 Updated Jul 8, 2019