Stars
FlashInfer: Kernel Library for LLM Serving
Accessible large language models via k-bit quantization for PyTorch.
Character Animation (AnimateAnyone, Face Reenactment)
Simple and efficient PyTorch-native transformer text generation in <1000 LOC of Python.
MiniLLM is a minimal system for running modern LLMs on consumer-grade GPUs
OpenAssistant is a chat-based assistant that understands tasks, can interact with third-party systems, and retrieve information dynamically to do so.
A library for calculating the FLOPs in the forward() process based on torch.fx
TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and supports state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs.
Count the MACs / FLOPs of your PyTorch model.
An efficient CUDA implementation of 2D depthwise convolution for large kernels, usable within the PyTorch deep learning framework.
[ARCHIVED] Cooperative primitives for CUDA C++. See https://github.com/NVIDIA/cccl
A high-throughput and memory-efficient inference and serving engine for LLMs
torch_musa is an open-source repository based on PyTorch that makes full use of the computing power of Moore Threads GPUs.
Fast and memory-efficient exact attention
Annotations of the interesting ML papers I read
Optimized BERT transformer inference on NVIDIA GPUs. https://arxiv.org/abs/2210.03052
Transformer related optimization, including BERT, GPT
Development repository for the Triton language and compiler
The C++ Core Guidelines are a set of tried-and-true guidelines, rules, and best practices about coding in C++
PyTorch Tutorial for Deep Learning Researchers
Repository of Jupyter notebook tutorials for teaching the Deep Learning Course at the University of Amsterdam (MSc AI), Fall 2023
State-of-the-Art Deep Learning scripts organized by models - easy to train and deploy with reproducible accuracy and performance on enterprise-grade infrastructure.
A list of awesome compiler projects and papers for tensor computation and deep learning.