Stars
From Reusing to Forecasting: Accelerating Diffusion Models with TaylorSeers
A pipeline parallel training script for diffusion models.
Pocket Flow: Codebase to Tutorial
A TTS model capable of generating ultra-realistic dialogue in one pass.
🎬 3.7× faster video generation E2E 🖼️ 1.6× faster image generation E2E ⚡ ColumnSparseAttn 9.3× vs FlashAttn‑3 💨 ColumnSparseGEMM 2.5× vs cuBLAS
nvidia-modelopt is a unified library of state-of-the-art model optimization techniques like quantization, pruning, distillation, speculative decoding, etc. It compresses deep learning models for do…
Self-contained, minimalistic implementation of diffusion models with PyTorch.
The ultimate training toolkit for finetuning diffusion models
[NeurIPS 2024] AsyncDiff: Parallelizing Diffusion Models by Asynchronous Denoising
[CVPR 2024] DeepCache: Accelerating Diffusion Models for Free
Accessible large language models via k-bit quantization for PyTorch.
OneDiff: An out-of-the-box acceleration library for diffusion models.
Applied AI experiments and examples for PyTorch
fanshiqing / grouped_gemm
Forked from tgale96/grouped_gemm
PyTorch bindings for CUTLASS grouped GEMM.
Combining Teacache with xDiT to Accelerate Visual Generation Models
XAttention: Block Sparse Attention with Antidiagonal Scoring
https://wavespeed.ai/ Context parallel attention that accelerates DiT model inference with dynamic caching
https://wavespeed.ai/ Best inference performance optimization framework for HuggingFace Diffusers on NVIDIA GPUs.
Fast and memory-efficient exact attention
Parallel computation of community structures in graphs
Exocompilation for productive programming of hardware accelerators
Quantized attention that achieves speedups of 2-3x and 3-5x over FlashAttention and xformers, respectively, without losing end-to-end metrics across language, image, and video models.
xDiT: A Scalable Inference Engine for Diffusion Transformers (DiTs) with Massive Parallelism
renardozt / SpoofDPI-Turkiye
Forked from xvzc/SpoofDPI
This version of SpoofDPI is configured for use in Turkiye.
You like pytorch? You like micrograd? You love tinygrad! ❤️
Sky-T1: Train your own O1 preview model within $450
aider is AI pair programming in your terminal
Blazing fast Neovim framework providing solid defaults and a beautiful UI, enhancing your Neovim experience.