Starred repositories
A small build system with a focus on speed
A Flexible Framework for Experiencing Cutting-edge LLM Inference Optimizations
verl: Volcano Engine Reinforcement Learning for LLMs
High-performance inference framework for large language models, focusing on efficiency, flexibility, and availability.
AIInfra (AI Infrastructure) covers the full AI system stack, from underlying hardware such as chips up through the software layers that support training and inference of large AI models.
Making large AI models cheaper, faster and more accessible
A fast and user-friendly runtime for transformer inference (BERT, ALBERT, GPT-2, decoders, etc.) on CPU and GPU.
Universal LLM Deployment Engine with ML Compilation
Large World Model -- Modeling Text and Video with Millions of Tokens of Context
A modern GUI client based on Tauri, designed to run on Windows, macOS, and Linux for a tailored proxy experience
AITemplate is a Python framework that renders neural networks into high-performance CUDA/HIP C++ code, specialized for FP16 TensorCore (NVIDIA GPU) and MatrixCore (AMD GPU) inference.
Janus-Series: Unified Multimodal Understanding and Generation Models
Large Language Model (LLM) Systems Paper List
[ICML 2024] Break the Sequential Dependency of LLM Inference Using Lookahead Decoding
Transformer-related optimization, including BERT and GPT
Dynamic Memory Management for Serving LLMs without PagedAttention
[MLSys'25] QServe: W4A8KV4 Quantization and System Co-design for Efficient LLM Serving; [MLSys'25] LServe: Efficient Long-sequence LLM Serving with Unified Sparse Attention
Large-model (LLM) knowledge explained so anyone can understand it; essential reading before LLM interviews in spring/autumn recruiting, so you can hold your own with interviewers
A Datacenter Scale Distributed Inference Serving Framework
FlashInfer: Kernel Library for LLM Serving
Facebook AI Research Sequence-to-Sequence Toolkit written in Python.
Evolutionary Scale Modeling (esm): Pretrained language models for proteins
Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)