YibinShen (RUSSO) / Starred · GitHub

[NeurIPS 2024] OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments

Python 1,860 233 Updated May 22, 2025

🦉 OWL: Optimized Workforce Learning for General Multi-Agent Assistance in Real-World Task Automation

Python 16,597 1,950 Updated May 23, 2025

Computer gaming agents that run on your PC and laptops.

Python 598 62 Updated May 23, 2025

No fortress, purely open ground. OpenManus is Coming.

Python 45,966 8,014 Updated May 20, 2025

verl: Volcano Engine Reinforcement Learning for LLMs

Python 8,358 1,028 Updated May 23, 2025

🤗 smolagents: a barebones library for agents that think in code.

Python 19,053 1,648 Updated May 22, 2025

DataSciBench: An LLM Agent Benchmark for Data Science

Python 15 3 Updated Feb 19, 2025
Jupyter Notebook 918 109 Updated May 9, 2025

Fixes the following prompts that Cursor shows during its free trial: You've reached your trial request limit. / Too many free trial accounts used on this machine. Please upgrade to pro. We have this limit in place to prevent abuse. Please l…

Shell 21,654 2,671 Updated May 23, 2025

800,000 step-level correctness labels on LLM solutions to MATH problems

Python 1,995 118 Updated Jun 1, 2023

[NeurIPS 2024] OlympicArena: Benchmarking Multi-discipline Cognitive Reasoning for Superintelligent AI

JavaScript 101 4 Updated Mar 6, 2025

LiveBench: A Challenging, Contamination-Free LLM Benchmark

Python 744 59 Updated May 22, 2025

GAOKAO-Bench-Updates is a supplement to the GAOKAO-Bench, a dataset to evaluate large language models.

Python 30 3 Updated Jan 7, 2025

[ACL 2024 Findings] MathBench: A Comprehensive Multi-Level Difficulty Mathematics Evaluation Dataset

99 1 Updated May 22, 2025
Python 25 4 Updated Jun 24, 2024

A Manually-Annotated Code Generation Benchmark Aligned with Real-World Code Repositories

Python 23 1 Updated Sep 4, 2024

DeepSeek Coder: Let the Code Write Itself

Python 21,558 2,464 Updated May 21, 2024
Python 3,518 346 Updated May 13, 2025

The repository for the paper "DebugBench: Evaluating Debugging Capability of Large Language Models".

Python 77 7 Updated Jul 13, 2024

A framework for the evaluation of autoregressive code generation language models.

Python 943 243 Updated Oct 31, 2024

OpenCompass is an LLM evaluation platform, supporting a wide range of models (Llama3, Mistral, InternLM2, GPT-4, LLaMa2, Qwen, GLM, Claude, etc.) over 100+ datasets.

Python 5,380 570 Updated May 22, 2025

Beyond the Imitation Game collaborative benchmark for measuring and extrapolating the capabilities of language models

Python 3,043 606 Updated Jul 19, 2024

Challenging BIG-Bench Tasks and Whether Chain-of-Thought Can Solve Them

490 31 Updated Jun 25, 2024

Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)

Python 49,464 6,023 Updated May 21, 2025

Fine-tuning ChatGLM-6B with PEFT | Efficient PEFT-based fine-tuning of ChatGLM

Python 3,702 477 Updated Oct 12, 2023

AutoGPT is the vision of accessible AI for everyone, to use and to build on. Our mission is to provide the tools, so that you can focus on what matters.

Python 175,522 45,727 Updated May 22, 2025

Langchain-Chatchat (formerly Langchain-ChatGLM): RAG and Agent applications built on Langchain and language models such as ChatGLM, Qwen, and Llama | Langchain-Chatchat (formerly langchain-ChatGLM), local knowledge based LLM (like ChatGLM, Qwen and…

TypeScript 35,093 5,890 Updated Mar 25, 2025

ChatGLM-6B: An Open Bilingual Dialogue Language Model

Python 41,054 5,219 Updated Jun 27, 2024

Code for fine-tuning ChatGLM-6B using low-rank adaptation (LoRA)

Jupyter Notebook 719 66 Updated Jul 18, 2023