This replication package contains supplementary material for the paper "AUCAD: Automated Construction of Alignment Dataset from Log-Related Issues for Enhancing LLM-based Log Generation".
Berkeley Function Calling Leaderboard (BFCL) with Chinese-Language Evaluation
User-space utility to interface with the kernel dropwatch facility
End-to-end, high-speed, privately self-hostable free version of Google Translate (low-footprint, fast, and deployable on your own infrastructure)
Official repository for our paper "FullStack Bench: Evaluating LLMs as Full Stack Coders"
LiveBench: A Challenging, Contamination-Free LLM Benchmark
A comprehensive guide to written tests and interviews for algorithm positions, aspiring to be the "Five Years of Gaokao, Three Years of Mock Exams" of the algorithm field!
The code and data for "MMLU-Pro: A More Robust and Challenging Multi-Task Language Understanding Benchmark" [NeurIPS 2024]
Disable web developer tools opened via the F12 key, the right-click context menu, and the browser menu
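A minimal sketch of the technique such a repo implements, assuming a browser environment: intercept the keyboard shortcuts and context-menu event that open developer tools. The `shouldBlock` helper name is an illustration, not the repo's actual API.

```javascript
// Decide whether a keyboard event is one of the common
// dev-tools shortcuts (F12, Ctrl+Shift+I/J/C, Ctrl+U).
function shouldBlock(event) {
  if (event.key === "F12") return true;
  // Ctrl+Shift+I/J/C open dev tools in Chromium-based browsers
  if (event.ctrlKey && event.shiftKey && ["I", "J", "C"].includes(event.key)) return true;
  // Ctrl+U shows the page source
  if (event.ctrlKey && event.key === "u") return true;
  return false;
}

// Wire the helper up in a browser (guarded so the sketch also loads in Node).
if (typeof document !== "undefined") {
  document.addEventListener("keydown", (e) => {
    if (shouldBlock(e)) e.preventDefault();
  });
  // Block the right-click menu, which carries the "Inspect" entry.
  document.addEventListener("contextmenu", (e) => e.preventDefault());
}
```

Note this is client-side obfuscation only: users can still open dev tools before the page loads or via the browser's own menu on some platforms, so it deters rather than prevents inspection.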
This repo contains the dataset and code for the paper "SWE-Lancer: Can Frontier LLMs Earn $1 Million from Real-World Freelance Software Engineering?"
Making large AI models cheaper, faster, and more accessible
[ICML 2025 Spotlight] CodeI/O: Condensing Reasoning Patterns via Code Input-Output Prediction
[AAAI 2025] The official code of the paper "InverseCoder: Unleashing the Power of Instruction-Tuned Code LLMs with Inverse-Instruct" (https://arxiv.org/abs/2407.05700).
A flexible and efficient training framework for large-scale alignment tasks
[NeurIPS 2024] Can LLMs Learn by Teaching for Better Reasoning? A Preliminary Study
Lets normal apps use system APIs directly with adb/root privileges, via a Java process started with app_process.
Mulberry, an o1-like Reasoning and Reflection MLLM Implemented via Collective MCTS
[ICSE'25] Aligning the Objective of LLM-based Program Repair
The Open Cookbook for Top-Tier Code Large Language Model
Train transformer language models with reinforcement learning.
Samples for CUDA developers demonstrating features in the CUDA Toolkit