yqy2001

🎯 RL towards the ultimate.

163 followers · 362 following

Tsinghua, AIR
yqy2001.github.io

Lists (16)

Stars

openai / preparedness

Releases from OpenAI Preparedness

Python 761 74 Updated May 30, 2025

thu-yao-01-luo / MultiPowerLaw

Python 8 Updated Mar 18, 2025

feifeibear / DPSKV3MFU

Estimate MFU for DeepSeekV3

Python 24 1 Updated Jan 5, 2025

SandAI-org / MagiAttention

A Distributed Attention Towards Linear Scalability for Ultra-Long Context, Heterogeneous Data Training

Python 372 20 Updated May 29, 2025

SandAI-org / MAGI-1

MAGI-1: Autoregressive Video Generation at Scale

Python 3,196 180 Updated May 30, 2025

ByteDance-Seed / Seed-Thinking-v1.5

772 14 Updated Apr 20, 2025

BytedTsinghua-SIA / DAPO

An Open-source RL System from ByteDance Seed and Tsinghua AIR

Python 1,280 52 Updated May 11, 2025

KellerJordan / modded-nanogpt

NanoGPT (124M) in 3 minutes

Python 2,601 311 Updated May 27, 2025

deepseek-ai / DeepGEMM

DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling

Python 5,396 606 Updated May 27, 2025

ZhengYinan-AIR / Diffusion-Planner

[ICLR 2025 Oral] The official implementation of "Diffusion-Based Planning for Autonomous Driving with Flexible Guidance"

Python 477 56 Updated Apr 5, 2025

volcengine / verl

verl: Volcano Engine Reinforcement Learning for LLMs

Python 8,811 1,100 Updated May 31, 2025

facebookresearch / large_concept_model

Large Concept Models: Language modeling in a sentence representation space

Python 2,211 201 Updated Jan 29, 2025

deepseek-ai / DeepSeek-V3

Python 97,311 15,816 Updated Apr 9, 2025

linkedin / Liger-Kernel

Efficient Triton Kernels for LLM Training

Python 5,121 341 Updated May 31, 2025

KbsdJames / Omni-MATH

The official repository of the Omni-MATH benchmark.

Python 83 1 Updated Dec 22, 2024

openreasoner / openr

OpenR: An Open Source Framework for Advanced Reasoning with Large Language Models

Python 1,775 135 Updated Jan 17, 2025

LyWangPX / Reinforcement-Learning-2nd-Edition-by-Sutton-Exercise-Solutions

Solutions of Reinforcement Learning, An Introduction

Jupyter Notebook 2,228 491 Updated May 20, 2024

Farama-Foundation / Gymnasium

An API standard for single-agent reinforcement learning environments, with popular reference environments and related utilities (formerly Gym)

Python 9,259 1,027 Updated May 28, 2025

openai / spinningup

An educational resource to help anyone learn deep reinforcement learning.

Python 10,922 2,340 Updated Aug 5, 2024

ShangtongZhang / reinforcement-learning-an-introduction

Python Implementation of Reinforcement Learning: An Introduction

Python 14,109 4,921 Updated Aug 9, 2024

baaivision / Emu3

Next-Token Prediction is All You Need

Python 2,135 80 Updated Mar 17, 2025

hijkzzz / Awesome-LLM-Strawberry

A collection of LLM papers, blogs, and projects, with a focus on OpenAI o1 🍓 and reasoning techniques.

6,741 375 Updated May 29, 2025

ScalingIntelligence / large_language_monkeys

Python 92 23 Updated Sep 25, 2024

p-lambda / dsir

DSIR large-scale data selection framework for language model training

Python 249 19 Updated Apr 7, 2024

facebookresearch / fastText

Library for fast text representation and classification.

HTML 26,235 4,768 Updated Mar 22, 2024

mlfoundations / dclm

DataComp for Language Models

HTML 1,304 118 Updated Mar 19, 2025

allenai / OLMoE

OLMoE: Open Mixture-of-Experts Language Models

Jupyter Notebook 770 68 Updated Mar 14, 2025

tianyi-lab / Cherry_LLM

[NAACL'24] Self-data filtering of LLM instruction-tuning data using a novel perplexity-based difficulty score, without using any other models

Python 368 24 Updated Sep 6, 2024

tianyi-lab / Reflection_Tuning

[ACL'24] Selective Reflection-Tuning: Student-Selected Data Recycling for LLM Instruction-Tuning

Python 353 29 Updated Sep 6, 2024

hendrycks / test

Measuring Massive Multitask Language Understanding | ICLR 2021

Python 1,418 103 Updated May 28, 2023

Lists (16)

Agents

🌟 Basic Knowledge Learning

Engineering Product

Fun Things

Great Research Codebases

JAX

leetcode

LLM

MOE

Multimodal

⌨️ Research To Learn

RL

synthetic data

🙌 TO Learn and Try

🔨Toolbox & Engineering Practice

VQ & VisionGen
