Stars
FlashInfer: Kernel Library for LLM Serving
Accessible large language models via k-bit quantization for PyTorch.
Character Animation (AnimateAnyone, Face Reenactment)
Simple and efficient PyTorch-native transformer text generation in <1000 LOC of Python.
MiniLLM is a minimal system for running modern LLMs on consumer-grade GPUs
OpenAssistant is a chat-based assistant that understands tasks, can interact with third-party systems, and retrieve information dynamically to do so.
A library for calculating the FLOPs in the forward() process based on torch.fx
TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and supports state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs.
Count the MACs / FLOPs of your PyTorch model.
An efficient CUDA implementation of 2D depthwise convolution for large kernels, usable within the PyTorch deep learning framework.
[ARCHIVED] Cooperative primitives for CUDA C++. See https://github.com/NVIDIA/cccl
A high-throughput and memory-efficient inference and serving engine for LLMs
torch_musa is an open-source repository based on PyTorch that makes full use of the computing power of Moore Threads GPUs.
Fast and memory-efficient exact attention
Annotations of the interesting ML papers I read
Optimized BERT transformer inference on NVIDIA GPUs. https://arxiv.org/abs/2210.03052
Transformer related optimization, including BERT, GPT
Development repository for the Triton language and compiler
The C++ Core Guidelines are a set of tried-and-true guidelines, rules, and best practices about coding in C++
PyTorch Tutorial for Deep Learning Researchers
Repository of Jupyter notebook tutorials for teaching the Deep Learning Course at the University of Amsterdam (MSc AI), Fall 2023
State-of-the-Art Deep Learning scripts organized by models - easy to train and deploy with reproducible accuracy and performance on enterprise-grade infrastructure.
A list of awesome compiler projects and papers for tensor computation and deep learning.