Stars
An Emacs framework for the stubborn martian hacker
DeepEP: an efficient expert-parallel communication library
A Datacenter Scale Distributed Inference Serving Framework
Composable Kernel: Performance Portable Programming Model for Machine Learning Tensor Operators
DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling
A domain-specific language designed to streamline the development of high-performance GPU/CPU/accelerator kernels
A high-performance distributed file system designed to address the challenges of AI training and inference workloads.
FlashMLA: Efficient MLA decoding kernels
A retargetable MLIR-based machine learning compiler and runtime toolkit.
depyf is a tool to help you understand and adapt to the PyTorch compiler torch.compile.
Large Language Model (LLM) Systems Paper List
Code repository for 'From Batch to Stream: Automatic Generation of Online Algorithms' https://arxiv.org/abs/2404.04743
Universal LLM Deployment Engine with ML Compilation
A list of awesome compiler projects and papers for tensor computation and deep learning.
This is a repository for all workshop-related materials.
Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.
ASCII generator (image to text, image to image, video to video)
🌟 Wiki of OI / ICPC for everyone. (An online strategy guide for a certain massive game, featuring dazzling arithmetic magic)
SGLang is a fast serving framework for large language models and vision language models.
Tensors and Dynamic neural networks in Python with strong GPU acceleration
Code for loralib, an implementation of "LoRA: Low-Rank Adaptation of Large Language Models"
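A collection of industry-practice articles on search, recommendation, advertising, user growth, and related topics (sources: Zhihu, Datafuntalk, technical WeChat official accounts)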
Machine learning compiler based on MLIR for Sophgo TPU.
📚LeetCUDA: Modern CUDA Learn Notes with PyTorch for Beginners🐑, 200+ CUDA/Tensor Cores Kernels, HGEMM, FA-2 MMA.🎉
LMDeploy is a toolkit for compressing, deploying, and serving LLMs.