SpargeAttention: A training-free sparse attention method that accelerates inference for any model.
attention vit quantization video-generation mlsys inference-acceleration ai-infra vision-transformer sparse-attention llm sageattention
Updated May 14, 2025 - Cuda