Stars
LLM/VLM gaming agents and model evaluation through games.
VeOmni: Scaling any Modality Model Training to any Accelerators with PyTorch native Training Framework
🚀 Efficient implementations of state-of-the-art linear attention models
verl: Volcano Engine Reinforcement Learning for LLMs
A comprehensive repository of reasoning tasks for LLMs (and beyond)
Open-source Next.js template for building apps that are fully generated by AI. By E2B.
LiveBench: A Challenging, Contamination-Free LLM Benchmark
Official Repo for Fine-Tuning Large Vision-Language Models as Decision-Making Agents via Reinforcement Learning
Official repo for "Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models"
[ICLR'24] Mitigating Hallucination in Large Multi-Modal Models via Robust Instruction Tuning
A One-for-All Multimodal Evaluation Toolkit Across Text, Image, Video, and Audio Tasks
Open-source evaluation toolkit for large multi-modality models (LMMs), supporting 220+ LMMs and 80+ benchmarks
A summary of existing representative LLM text datasets.
Convert documents to structured data effortlessly. Unstructured is an open-source ETL solution for transforming complex documents into clean, structured formats for language models. Visit our website …
Codes for the paper "∞Bench: Extending Long Context Evaluation Beyond 100K Tokens": https://arxiv.org/abs/2402.13718
Do-Not-Answer: A Dataset for Evaluating Safeguards in LLMs
[ACL 2024] User-friendly evaluation framework: Eval Suite & Benchmarks: UHGEval, HaluEval, HalluQA, etc.
Chinese safety prompts for evaluating and improving the safety of LLMs.
GPQA: A Graduate-Level Google-Proof Q&A Benchmark
Universal and Transferable Attacks on Aligned Language Models
Simple and efficient pytorch-native transformer text generation in <1000 LOC of python.
Supercharge Your LLM Application Evaluations 🚀
🐙 Guides, papers, lectures, notebooks, and resources for prompt engineering
A high-throughput and memory-efficient inference and serving engine for LLMs
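For the last entry, a minimal offline-inference sketch using vLLM's documented `LLM` / `SamplingParams` API; the model name and prompt below are placeholders, not part of the original list.

```python
# Minimal vLLM offline-inference sketch (placeholder model and prompt).
from vllm import LLM, SamplingParams

prompts = ["Explain contamination-free benchmarking in one sentence."]
sampling_params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=64)

llm = LLM(model="facebook/opt-125m")  # loads weights and sets up the paged KV cache
outputs = llm.generate(prompts, sampling_params)

for output in outputs:
    print(output.prompt, "->", output.outputs[0].text)
```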