Stars
Multi-SWE-bench: A Multilingual Benchmark for Issue Resolving
OpenDeepWiki is the open-source version of the DeepWiki project, aiming to provide a powerful knowledge management and collaboration platform. The project is mainly developed using C# and TypeScrip…
An Efficient and User-Friendly Scaling Library for Reinforcement Learning with Large Language Models
A project to improve skills of large language models
A Large-Scale, Challenging, Decontaminated, and Verifiable Mathematical Dataset for Advancing Reasoning
Seed-Coder is a family of lightweight open-source code LLMs comprising base, instruct and reasoning models, developed by ByteDance Seed.
Qwen3 is the large language model series developed by the Qwen team, Alibaba Cloud.
[ACL2024] Planning, Creation, Usage: Benchmarking LLMs for Comprehensive Tool Utilization in Real-World Complex Scenarios
Code and data for "Measuring and Narrowing the Compositionality Gap in Language Models"
Gorilla: Training and Evaluating LLMs for Function Calls (Tool Calls)
Hammer: Highly Agile Masks Made Effortlessly from RTL
Model Context Protocol Servers
Lightweight coding agent that runs in your terminal
Function Calling Benchmark & Testing
This is the official repository for The Hundred-Page Language Models Book by Andriy Burkov
Agent-R1: Training Powerful LLM Agents with End-to-End Reinforcement Learning
An Open-source RL System from ByteDance Seed and Tsinghua AIR
A Database of Real Faults and an Experimental Infrastructure to Enable Controlled Experiments in Software Engineering Research
Qihoo360 / 360-LLaMA-Factory
Forked from hiyouga/LLaMA-Factory
Adds Sequence Parallelism into LLaMA-Factory
Democratizing Reinforcement Learning for LLMs
Fully open data curation for reasoning models