Stars
[NeurIPS 2024] OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments
🦉 OWL: Optimized Workforce Learning for General Multi-Agent Assistance in Real-World Task Automation
Computer gaming agents that run on your PC and laptops.
No fortress, purely open ground. OpenManus is Coming.
verl: Volcano Engine Reinforcement Learning for LLMs
🤗 smolagents: a barebones library for agents that think in code.
DataSciBench: An LLM Agent Benchmark for Data Science
Fixes the following prompts that Cursor shows during the free trial period: You've reached your trial request limit. / Too many free trial accounts used on this machine. Please upgrade to pro. We have this limit in place to prevent abuse. Please l…
800,000 step-level correctness labels on LLM solutions to MATH problems
[NeurIPS 2024] OlympicArena: Benchmarking Multi-discipline Cognitive Reasoning for Superintelligent AI
LiveBench: A Challenging, Contamination-Free LLM Benchmark
GAOKAO-Bench-Updates is a supplement to GAOKAO-Bench, a dataset for evaluating large language models.
[ACL 2024 Findings] MathBench: A Comprehensive Multi-Level Difficulty Mathematics Evaluation Dataset
A Manually-Annotated Code Generation Benchmark Aligned with Real-World Code Repositories
DeepSeek Coder: Let the Code Write Itself
The repository for the paper "DebugBench: Evaluating Debugging Capability of Large Language Models".
A framework for the evaluation of autoregressive code generation language models.
OpenCompass is an LLM evaluation platform, supporting a wide range of models (Llama3, Mistral, InternLM2, GPT-4, LLaMa2, Qwen, GLM, Claude, etc.) over 100+ datasets.
Beyond the Imitation Game collaborative benchmark for measuring and extrapolating the capabilities of language models
Challenging BIG-Bench Tasks and Whether Chain-of-Thought Can Solve Them
Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)
Fine-tuning ChatGLM-6B with PEFT | Efficient ChatGLM fine-tuning based on PEFT
AutoGPT is the vision of accessible AI for everyone, to use and to build on. Our mission is to provide the tools, so that you can focus on what matters.
Langchain-Chatchat (formerly Langchain-ChatGLM): RAG and Agent applications built on Langchain and language models such as ChatGLM, Qwen, and Llama | Langchain-Chatchat (formerly langchain-ChatGLM), local knowledge based LLM (like ChatGLM, Qwen and…
ChatGLM-6B: An Open Bilingual Dialogue Language Model
Code for fine-tuning ChatGLM-6B using low-rank adaptation (LoRA)