dangkai4u (DangKai) / Starred · GitHub

LLM/VLM gaming agents and model evaluation through games.

Python 702 72 Updated Jul 3, 2025

VeOmni: Scaling any Modality Model Training to any Accelerators with PyTorch native Training Framework

Python 363 21 Updated Jul 4, 2025

🚀 Efficient implementations of state-of-the-art linear attention models

Python 2,849 205 Updated Jul 4, 2025

HTML 119 54 Updated Jun 12, 2025

verl: Volcano Engine Reinforcement Learning for LLMs

Python 10,343 1,717 Updated Jul 4, 2025

Python 444 38 Updated Jun 26, 2025

A comprehensive repository of reasoning tasks for LLMs (and beyond)

JavaScript 448 54 Updated Sep 27, 2024

Open-source Next.js template for building apps that are fully generated by AI. By E2B.

TypeScript 5,586 754 Updated Jun 16, 2025

LiveBench: A Challenging, Contamination-Free LLM Benchmark

Python 807 65 Updated Jun 26, 2025

Official Repo for Fine-Tuning Large Vision-Language Models as Decision-Making Agents via Reinforcement Learning

Jupyter Notebook 371 32 Updated Dec 15, 2024

Official repo for "Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models"

Python 3,287 282 Updated May 4, 2024

[ICLR'24] Mitigating Hallucination in Large Multi-Modal Models via Robust Instruction Tuning

Python 281 14 Updated Mar 13, 2024

A One-for-All Multimodal Evaluation Toolkit Across Text, Image, Video, and Audio Tasks

Python 2,709 323 Updated Jul 4, 2025

Open-source evaluation toolkit of large multi-modality models (LMMs), support 220+ LMMs, 80+ benchmarks

Python 2,646 430 Updated Jul 3, 2025

Summarize existing representative LLMs text datasets.

1,307 131 Updated Mar 25, 2025

Public Inflection Benchmarks

68 2 Updated Mar 6, 2024

A bagel, with everything.

Python 322 33 Updated Apr 11, 2024

Convert documents to structured data effortlessly. Unstructured is open-source ETL solution for transforming complex documents into clean, structured formats for language models. Visit our website …

HTML 11,806 975 Updated Jul 3, 2025

Codes for the paper "∞Bench: Extending Long Context Evaluation Beyond 100K Tokens": https://arxiv.org/abs/2402.13718

Python 337 29 Updated Sep 25, 2024

Do-Not-Answer: A Dataset for Evaluating Safeguards in LLMs

Jupyter Notebook 257 27 Updated Jun 7, 2024

[ACL 2024] User-friendly evaluation framework: Eval Suite & Benchmarks: UHGEval, HaluEval, HalluQA, etc.

Python 168 17 Updated Jun 7, 2025

Python 50 8 Updated Mar 2, 2024

Chinese safety prompts for evaluating and improving the safety of LLMs.

1,043 84 Updated Feb 27, 2024

GPQA: A Graduate-Level Google-Proof Q&A Benchmark

Jupyter Notebook 366 30 Updated Sep 30, 2024

Universal and Transferable Attacks on Aligned Language Models

Python 4,030 540 Updated Aug 2, 2024

A multi-dimensional Chinese alignment evaluation benchmark for large language models (ACL 2024)

Python 398 27 Updated Aug 16, 2024

Simple and efficient pytorch-native transformer text generation in <1000 LOC of python.

Python 6,007 558 Updated Apr 11, 2025

Supercharge Your LLM Application Evaluations 🚀

Python 9,799 969 Updated Jul 3, 2025

🐙 Guides, papers, lecture, notebooks and resources for prompt engineering

MDX 58,840 5,874 Updated Jun 19, 2025

A high-throughput and memory-efficient inference and serving engine for LLMs

Python 51,443 8,495 Updated Jul 4, 2025