Resolves the following prompts that Cursor shows during the free trial period: You've reached your trial request limit. / Too many free trial accounts used on this machine. Please upgrade to pro. We have this limit in place to prevent abuse. Please l…

Shell 21,654 2,671 Updated May 23, 2025

openai / prm800k

800,000 step-level correctness labels on LLM solutions to MATH problems

Python 1,995 118 Updated Jun 1, 2023

GAIR-NLP / OlympicArena

[NeurIPS 2024] OlympicArena: Benchmarking Multi-discipline Cognitive Reasoning for Superintelligent AI

JavaScript 101 4 Updated Mar 6, 2025

LiveBench / LiveBench

LiveBench: A Challenging, Contamination-Free LLM Benchmark

Python 744 59 Updated May 22, 2025

OpenLMLab / GAOKAO-Bench-Updates

GAOKAO-Bench-Updates is a supplement to GAOKAO-Bench, a dataset for evaluating large language models.

Python 30 3 Updated Jan 7, 2025

open-compass / MathBench

[ACL 2024 Findings] MathBench: A Comprehensive Multi-Level Difficulty Mathematics Evaluation Dataset

99 1 Updated May 22, 2025

MetaCopilot / dseval

Python 25 4 Updated Jun 24, 2024

seketeam / DevEval

A Manually-Annotated Code Generation Benchmark Aligned with Real-World Code Repositories

Python 23 1 Updated Sep 4, 2024

deepseek-ai / DeepSeek-Coder

DeepSeek Coder: Let the Code Write Itself

Python 21,558 2,464 Updated May 21, 2024

openai / simple-evals

Python 3,518 346 Updated May 13, 2025

thunlp / DebugBench

The repository for the paper "DebugBench: Evaluating Debugging Capability of Large Language Models".

Python 77 7 Updated Jul 13, 2024

bigcode-project / bigcode-evaluation-harness

A framework for the evaluation of autoregressive code generation language models.

Python 943 243 Updated Oct 31, 2024

open-compass / opencompass

OpenCompass is an LLM evaluation platform, supporting a wide range of models (Llama3, Mistral, InternLM2, GPT-4, LLaMa2, Qwen, GLM, Claude, etc.) over 100+ datasets.

Python 5,380 570 Updated May 22, 2025

google / BIG-bench

Beyond the Imitation Game collaborative benchmark for measuring and extrapolating the capabilities of language models

Python 3,043 606 Updated Jul 19, 2024

suzgunmirac / BIG-Bench-Hard

Challenging BIG-Bench Tasks and Whether Chain-of-Thought Can Solve Them

490 31 Updated Jun 25, 2024

hiyouga / LLaMA-Factory

Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)

Python 49,464 6,023 Updated May 21, 2025

hiyouga / ChatGLM-Efficient-Tuning

Efficient fine-tuning of ChatGLM-6B with PEFT

Python 3,702 477 Updated Oct 12, 2023

Significant-Gravitas / AutoGPT

AutoGPT is the vision of accessible AI for everyone, to use and to build on. Our mission is to provide the tools, so that you can focus on what matters.

Python 175,522 45,727 Updated May 22, 2025

chatchat-space / Langchain-Chatchat

Langchain-Chatchat (formerly Langchain-ChatGLM): RAG and Agent applications built on Langchain and language models such as ChatGLM, Qwen, and Llama | Langchain-Chatchat (formerly langchain-ChatGLM), local knowledge based LLM (like ChatGLM, Qwen and…

TypeScript 35,093 5,890 Updated Mar 25, 2025

THUDM / ChatGLM-6B

ChatGLM-6B: An Open Bilingual Dialogue Language Model

Python 41,054 5,219 Updated Jun 27, 2024

lich99 / ChatGLM-finetune-LoRA

Code for fine-tuning ChatGLM-6B using low-rank adaptation (LoRA)

Jupyter Notebook 719 66 Updated Jul 18, 2023

RUSSO YibinShen

Stars

xlang-ai / OSWorld

camel-ai / owl

lmgame-org / GamingAgent

FoundationAgents / OpenManus

MARIO-Math-Reasoning / MARIO_EVAL

volcengine / verl

huggingface / smolagents

THUDM / DataSciBench

modelscope / modelscope-classroom

yuaotian / go-cursor-help