Starred repositories
A fast, clean, responsive Hugo theme.
❤️ A clean, elegant, yet advanced and efficient blog theme for Hugo.
A series of math-specific large language models based on Qwen2.
Seed1.5-VL, a vision-language foundation model designed to advance general-purpose multimodal understanding and reasoning, achieving state-of-the-art performance on 38 out of 60 public benchmarks.
simpleR1: A Simple Framework for Training R1-like Models
Agent framework and applications built upon Qwen>=3.0, featuring Function Calling, MCP, Code Interpreter, RAG, Chrome extension, etc.
ToRA is a series of Tool-integrated Reasoning LLM Agents designed to solve challenging mathematical reasoning problems by interacting with tools [ICLR'24].
Gorilla: Training and Evaluating LLMs for Function Calls (Tool Calls)
Search-R1: an efficient, scalable RL training framework, built on veRL, for LLMs that interleave reasoning with search-engine calls.
✨✨ Latest Advances in Multimodal Large Language Models
MTEB: Massive Text Embedding Benchmark
Understanding R1-Zero-Like Training: A Critical Perspective
Efficient Triton Kernels for LLM Training
This package contains the original 2012 AlexNet code.
Lighteval is your all-in-one toolkit for evaluating LLMs across multiple backends
Use PEFT or full-parameter training for CPT/SFT/DPO/GRPO on 500+ LLMs (Qwen3, Qwen3-MoE, Llama4, InternLM3, DeepSeek-R1, ...) and 200+ MLLMs (Qwen2.5-VL, Qwen2.5-Omni, Qwen2-Audio, Ovis2, InternVL3, Llava, GLM4…
A series of technical reports on slow thinking with LLMs.
OpenR: An Open Source Framework for Advanced Reasoning with Large Language Models
verl: Volcano Engine Reinforcement Learning for LLMs
Fully open reproduction of DeepSeek-R1
Democratizing Reinforcement Learning for LLMs
SGLang is a fast serving framework for large language models and vision language models.
The official repo of the Qwen (通义千问) chat and pretrained large language models developed by Alibaba Cloud.
[ICML'24] Magicoder: Empowering Code Generation with OSS-Instruct
This is an official implementation of the Reward rAnked Fine-Tuning Algorithm (RAFT), also known as iterative best-of-n fine-tuning or rejection sampling fine-tuning.