Stars
StreamDiffusion: A Pipeline-Level Solution for Real-Time Interactive Generation
Radial Attention Official Implementation
The NVIDIA® Tools Extension SDK (NVTX) is a C-based Application Programming Interface (API) for annotating events, code ranges, and resources in your applications.
Task management for the Obsidian knowledge base.
AITemplate is a Python framework that renders neural networks into high-performance CUDA/HIP C++ code. It is specialized for FP16 TensorCore (NVIDIA GPU) and MatrixCore (AMD GPU) inference.
(CVPR 2025) From Slow Bidirectional to Fast Autoregressive Video Diffusion Models
Hunyuan-DiT: A Powerful Multi-Resolution Diffusion Transformer with Fine-Grained Chinese Understanding
Fast Automatic License Plate Recognition (ALPR) framework.
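yolov5 license plate detection, license plate recognition, and Chinese license plate recognition; supports 12 types of Chinese license plates, including double-layer plates.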
Mirage: Automatically Generating Fast GPU Kernels without Programming in Triton/CUDA
Fine-tuning & Reinforcement Learning for LLMs. 🦥 Train Qwen3, Llama 4, DeepSeek-R1, Gemma 3, TTS 2x faster with 70% less VRAM.
Kernl lets you run PyTorch transformer models several times faster on GPU with a single line of code, and is designed to be easily hackable.
Development repository for the Triton language and compiler
A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit floating point (FP8) precision on Hopper, Ada and Blackwell GPUs, to provide better performance with lower memory…
VGDFR: Diffusion-based Video Generation with Dynamic Frame Rate
Neighborhood Attention Extension. Bringing attention to a neighborhood near you!
Let's make video diffusion practical!
Efficient Triton Kernels for LLM Training
End-to-end recipes for optimizing diffusion models with torchao and diffusers (inference and FP8 training).
https://wavespeed.ai/ [WIP] The all-in-one inference optimization solution for ComfyUI: universal, flexible, and fast.
https://wavespeed.ai/ The best inference performance optimization framework for HuggingFace Diffusers on NVIDIA GPUs.
Helpful tools and examples for working with flex-attention
SD.Next: All-in-one WebUI for AI generative image and video creation
NVIDIA curated collection of educational resources related to general purpose GPU programming.
🚀 Efficient implementations of state-of-the-art linear attention models
[CVPR 2025] DiG: Scalable and Efficient Diffusion Models with Gated Linear Attention
Official PyTorch Implementation of "Optimal Stepsize for Diffusion Sampling".