Quantized Attention achieves speedups of 2-5x and 3-11x compared to FlashAttention and xformers, respectively, without losing end-to-end metrics across language, image, and video models.

Cuda 1,709 121 Updated Jun 11, 2025

ray-project / llmperf

LLMPerf is a library for validating and benchmarking LLMs

Python 933 166 Updated Dec 9, 2024

ModelTC / llmc

[EMNLP 2024 Industry Track] This is the official PyTorch implementation of "LLMC: Benchmarking Large Language Model Quantization with a Versatile Compression Toolkit".

Python 486 56 Updated Jun 9, 2025

xlite-dev / Awesome-LLM-Inference

📚 A curated list of Awesome LLM Inference Papers with Codes.

Python 4,118 286 Updated Jun 9, 2025

HuangOwen / Awesome-LLM-Compression

Awesome LLM compression research papers and tools.

1,558 98 Updated Jun 6, 2025

dkhamsing / open-source-ios-apps

📱 Collaborative List of Open-Source iOS Apps

45,278 5,562 Updated Jun 13, 2025

opendatalab / PDF-Extract-Kit

A Comprehensive Toolkit for High-Quality PDF Content Extraction

Python 7,849 571 Updated Jan 3, 2025

NVIDIA / TensorRT-LLM

TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and support state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorR…

C++ 10,730 1,495 Updated Jun 13, 2025

Dao-AILab / flash-attention

Fast and memory-efficient exact attention

Python 17,816 1,741 Updated Jun 10, 2025

huggingface / pytorch-image-models

The largest collection of PyTorch image encoders / backbones. Including train, eval, inference, export scripts, and pretrained weights -- ResNet, ResNeXT, EfficientNet, NFNet, Vision Transformer (V…

Python 34,441 4,929 Updated Jun 12, 2025

unslothai / unsloth

Fine-tuning & Reinforcement Learning for LLMs. 🦥 Train Qwen3, Llama 4, DeepSeek-R1, Gemma 3, TTS 2x faster with 70% less VRAM.

Python 40,510 3,211 Updated Jun 12, 2025

DevLiuSir / SwiftUI-DesignCode

SwiftUI-DesignCode is a collection of examples written while learning SwiftUI 2.0

Swift 262 29 Updated Apr 18, 2024

assafelovic / gpt-researcher

LLM based autonomous agent that conducts deep local and web research on any topic and generates a long report with citations.

Python 21,855 2,871 Updated Jun 13, 2025

mlc-ai / mlc-llm

Universal LLM Deployment Engine with ML Compilation

Python 20,781 1,745 Updated Jun 8, 2025

commaai / openpilot

openpilot is an operating system for robotics. Currently, it upgrades the driver assistance system on 300+ supported cars.

Python 54,080 9,811 Updated Jun 13, 2025

tinygrad / tinygrad

You like pytorch? You like micrograd? You love tinygrad! ❤️

Python 29,387 3,452 Updated Jun 13, 2025

chatchat-space / Langchain-Chatchat

Langchain-Chatchat (formerly Langchain-ChatGLM): RAG and Agent applications built on Langchain and language models such as ChatGLM, Qwen, and Llama | Langchain-Chatchat (formerly langchain-ChatGLM), local knowledge based LLM (like ChatGLM, Qwen and…

TypeScript 35,279 5,911 Updated Mar 25, 2025

carla-simulator / carla

Open-source simulator for autonomous driving research.

C++ 12,580 4,062 Updated Jun 13, 2025

microsoft / AirSim

Open source simulator for autonomous vehicles built on Unreal Engine / Unity, from Microsoft AI & Research

C++ 17,203 4,725 Updated May 15, 2025

tonylt / tvm

Forked from apache/tvm

Open deep learning compiler stack for cpu, gpu and specialized accelerators

Python 1 Updated Feb 6, 2024