Stars
LLM/VLM gaming agents and model evaluation through games.
VeOmni: Scaling any Modality Model Training to any Accelerators with PyTorch native Training Framework
🚀 Efficient implementations of state-of-the-art linear attention models
verl: Volcano Engine Reinforcement Learning for LLMs
A comprehensive repository of reasoning tasks for LLMs (and beyond)
Open-source Next.js template for building apps that are fully generated by AI. By E2B.
LiveBench: A Challenging, Contamination-Free LLM Benchmark
Official Repo for Fine-Tuning Large Vision-Language Models as Decision-Making Agents via Reinforcement Learning
Official repo for "Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models"
[ICLR'24] Mitigating Hallucination in Large Multi-Modal Models via Robust Instruction Tuning
A One-for-All Multimodal Evaluation Toolkit Across Text, Image, Video, and Audio Tasks
Open-source evaluation toolkit for large multi-modality models (LMMs), supporting 220+ LMMs and 80+ benchmarks
A summary of existing representative LLM text datasets.
Convert documents to structured data effortlessly. Unstructured is an open-source ETL solution for transforming complex documents into clean, structured formats for language models. Visit our website …
Codes for the paper "∞Bench: Extending Long Context Evaluation Beyond 100K Tokens": https://arxiv.org/abs/2402.13718
Do-Not-Answer: A Dataset for Evaluating Safeguards in LLMs
[ACL 2024] User-friendly evaluation framework: Eval Suite & Benchmarks: UHGEval, HaluEval, HalluQA, etc.
Chinese safety prompts for evaluating and improving the safety of LLMs.
GPQA: A Graduate-Level Google-Proof Q&A Benchmark
Universal and Transferable Attacks on Aligned Language Models
Simple and efficient pytorch-native transformer text generation in <1000 LOC of python.
Supercharge Your LLM Application Evaluations 🚀
🐙 Guides, papers, lectures, notebooks, and resources for prompt engineering
A high-throughput and memory-efficient inference and serving engine for LLMs
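For the last entry, a minimal offline-inference sketch using vLLM's documented `LLM` / `SamplingParams` API; the model name and prompt below are placeholders, not part of the original list.

```python
# Minimal vLLM offline-inference sketch (placeholder model and prompt).
from vllm import LLM, SamplingParams

prompts = ["Explain contamination-free benchmarking in one sentence."]
sampling_params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=64)

llm = LLM(model="facebook/opt-125m")  # loads weights and sets up the paged KV cache
outputs = llm.generate(prompts, sampling_params)

for output in outputs:
    print(output.prompt, "->", output.outputs[0].text)
```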