Stars
Tritonbench is a collection of PyTorch custom operators with example inputs to measure their performance.
Ring attention implementation with flash attention
A toolkit for knowledge distillation of large language models
OLMoE: Open Mixture-of-Experts Language Models
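AISystem mainly refers to AI systems, covering full-stack low-level AI technologies such as AI chips, AI compilers, and AI inference and training frameworks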
Train transformer language models with reinforcement learning.
An unofficial CUDA assembler, for all generations of SASS, hopefully :)
A framework for few-shot evaluation of language models.
RTP-LLM: Alibaba's high-performance LLM inference engine for diverse applications.
DeepEP: an efficient expert-parallel communication library
DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling
LeaderWorkerSet: An API for deploying a group of pods as a unit of replication
Production-tested AI infrastructure tools for efficient AGI development and community-driven innovation
✔ (Completed) The most comprehensive deep learning notes [Tudui, PyTorch] [Mu Li, Dive into Deep Learning] [Andrew Ng, Deep Learning]
A Flexible Framework for Experiencing Cutting-edge LLM Inference Optimizations
My learning notes and code for ML systems.
A throughput-oriented high-performance serving framework for LLMs
Minimal reproduction of DeepSeek R1-Zero
Samples for CUDA developers that demonstrate features of the CUDA Toolkit
A highly optimized LLM inference acceleration engine for Llama and its variants.
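This project aims to share the technical principles behind large models along with hands-on experience (large-model engineering and real-world deployment of large-model applications)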
BitBLAS is a library to support mixed-precision matrix multiplications, especially for quantized LLM deployment.
HLLM: Enhancing Sequential Recommendations via Hierarchical Large Language Models for Item and User Modeling