Stars
From Reusing to Forecasting: Accelerating Diffusion Models with TaylorSeers
EEdit⚡: Rethinking the Spatial and Temporal Redundancy for Efficient Image Editing
Accelerating Diffusion Transformers with Token-wise Feature Caching
Official PyTorch implementation of the paper "dLLM-Cache: Accelerating Diffusion Large Language Models with Adaptive Caching".
🤗 PEFT: State-of-the-art Parameter-Efficient Fine-Tuning.
USP: Unified (a.k.a. Hybrid, 2D) Sequence Parallel Attention for Long-Context Transformer Model Training and Inference
The official repo of Pai-Megatron-Patch for LLM & VLM large scale training developed by Alibaba Cloud.
Pretrain, finetune ANY AI model of ANY size on multiple GPUs, TPUs with zero code changes.
verl: Volcano Engine Reinforcement Learning for LLMs
VeOmni: Scaling Any-Modality Model Training to Any Accelerator with a PyTorch-Native Training Framework
Efficient Triton Kernels for LLM Training
A Distributed Attention Mechanism Towards Linear Scalability for Ultra-Long-Context, Heterogeneous Data Training
A pipeline parallel training script for diffusion models.
A fast communication-overlapping library for tensor/expert parallelism on GPUs.
Wan: Open and Advanced Large-Scale Video Generative Models
[CVPR2025 Highlight] SeedVR: Seeding Infinity in Diffusion Transformer Towards Generic Video Restoration
Unofficial Windows wheel package for the Nunchaku (SVDQuant) library.
Text- and image-to-video generation: CogVideoX (2024) and CogVideo (ICLR 2023)
Quantized Attention achieves speedups of 2-3x and 3-5x over FlashAttention and xformers, respectively, without losing end-to-end metrics across language, image, and video models.
Distributed Triton for Parallel Systems
[CVPR2024 Highlight] VBench - We Evaluate Video Generation
[ICLR 2025] FasterCache: Training-Free Video Diffusion Model Acceleration with High Quality
Timestep Embedding Tells: It's Time to Cache for Video Diffusion Model
Context-parallel attention that accelerates DiT model inference with dynamic caching (https://wavespeed.ai/)