Stars
Supercharge Your LLM with the Fastest KV Cache Layer
[COLM 2024] TriForce: Lossless Acceleration of Long Sequence Generation with Hierarchical Speculative Decoding
Open source repo for Locate 3D Model, 3D-JEPA and Locate 3D Dataset
[ICLR 2025] Palu: Compressing KV-Cache with Low-Rank Projection
Unified KV Cache Compression Methods for Auto-Regressive Models
SGLang is a fast serving framework for large language models and vision language models.
A high-throughput and memory-efficient inference and serving engine for LLMs
Dynamic Memory Management for Serving LLMs without PagedAttention
Awesome LLM compression research papers and tools.
[CVPR 2025 Best Paper Award] VGGT: Visual Geometry Grounded Transformer
📰 Must-read papers and blogs on Speculative Decoding ⚡️
A library to analyze PyTorch traces.
A curated list for Efficient Large Language Models
An implementation of a deep learning recommendation model (DLRM)
New repo collection for NVIDIA Cosmos: https://github.com/nvidia-cosmos
Writing an OS in 1,000 lines.
20+ high-performance LLMs with recipes to pretrain, finetune and deploy at scale.
A powerful framework for building realtime voice AI agents 🤖🎙️📹
Pytorch domain library for recommendation systems
Less than 100 Kilobytes. Works for Android 5.1 and above
Run your own AI cluster at home with everyday devices 📱💻 🖥️⌚
The simplest, fastest repository for training/finetuning medium-sized GPTs.
Notes for software engineers getting up to speed on new AI developments. Serves as a datastore for https://latent.space writing and product brainstorming, but has cleaned-up canonical references und…