Stars
A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit floating point (FP8) precision on Hopper, Ada and Blackwell GPUs, to provide better performance with lower memory…
VGDFR: Diffusion-based Video Generation with Dynamic Frame Rate
Neighborhood Attention Extension. Bringing attention to a neighborhood near you!
Efficient Triton Kernels for LLM Training
End-to-end recipes for optimizing diffusion models with torchao and diffusers (inference and FP8 training).
https://wavespeed.ai/ [WIP] The all-in-one inference optimization solution for ComfyUI: universal, flexible, and fast.
https://wavespeed.ai/ Best inference performance optimization framework for HuggingFace Diffusers on NVIDIA GPUs.
Helpful tools and examples for working with flex-attention
SD.Next: All-in-one WebUI for AI generative image and video creation
NVIDIA curated collection of educational resources related to general purpose GPU programming.
🚀 Efficient implementations of state-of-the-art linear attention models in Torch and Triton
[CVPR 2025] DiG: Scalable and Efficient Diffusion Models with Gated Linear Attention
Official PyTorch Implementation of "Optimal Stepsize for Diffusion Sampling".
https://wavespeed.ai/ Context parallel attention that accelerates DiT model inference with dynamic caching
⚡ Flash Diffusion ⚡: Accelerating Any Conditional Diffusion Model for Few Steps Image Generation (AAAI 2025 Oral)
[TMLR 2025] Efficient Diffusion Models: A Survey
Hackable and optimized Transformers building blocks, supporting a composable construction.
Accelerate inference in Flux and Sana for ComfyUI.
[ICLR 2025] FasterCache: Training-Free Video Diffusion Model Acceleration with High Quality
deepbeepmeep/Wan2GP (forked from Wan-Video/Wan2.1): Wan 2.1 for the GPU Poor
Implementation of "EasyControl: Adding Efficient and Flexible Control for Diffusion Transformer"
Text- and image-to-video generation: CogVideoX (2024) and CogVideo (ICLR 2023)
Enjoy the magic of Diffusion models!
HunyuanVideo: A Systematic Framework For Large Video Generation Model
Timestep Embedding Tells: It's Time to Cache for Video Diffusion Model