Stars
prime-rl is a codebase for decentralized async RL training at scale
Official PyTorch implementation for "Large Language Diffusion Models"
Official implementation of "Fast-dLLM: Training-free Acceleration of Diffusion LLM by Enabling KV Cache and Parallel Decoding"
3x Faster Inference; Unofficial implementation of EAGLE Speculative Decoding
One minute of voice data can be used to train a good TTS model! (few-shot voice cloning)
Robust Speech Recognition via Large-Scale Weak Supervision
Train transformer language models with reinforcement learning.
LightLLM is a Python-based LLM (Large Language Model) inference and serving framework, notable for its lightweight design, easy scalability, and high-speed performance.
Production-tested AI infrastructure tools for efficient AGI development and community-driven innovation
🐳 Efficient Triton implementations for "Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention"
verl: Volcano Engine Reinforcement Learning for LLMs
High-performance inference framework for large language models, focusing on efficiency, flexibility, and availability.
A fast communication-overlapping library for tensor/expert parallelism on GPUs.
Collection of best practices, reference architectures, model training examples and utilities to train large models on AWS.
Best practices & guides on how to write distributed pytorch training code
LLM training parallelisms (DP, FSDP, TP, PP) in pure C
💯 Curated coding interview preparation materials for busy software engineers
This project shares technical principles and hands-on experience with large language models (LLM engineering and production deployment of LLM applications).
A Telegram bot to recommend arXiv papers
LLM training code for Databricks foundation models
Mirage: Automatically Generating Fast GPU Kernels without Programming in Triton/CUDA
AIHawk aims to ease the job-hunting process by automating job applications. Using artificial intelligence, it enables users to apply to multiple jobs in a tailored way.
LMDeploy is a toolkit for compressing, deploying, and serving LLMs.
SGLang is a fast serving framework for large language models and vision language models.
AirLLM: 70B model inference on a single 4GB GPU