Releases: woct0rdho/SageAttention
v2.1.1-windows
First, note that if you just `pip install sageattention`, you get SageAttention 1, which uses only Triton (no CUDA kernels) and is easy to install.
Here is SageAttention 2, which has both Triton and CUDA kernels and can be faster than SageAttention 1 in many cases.
Both SageAttention 1 and 2 only support RTX 30xx and newer GPUs (sm >= 80). RTX 20xx and older are not supported.
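If you're unsure which sm version your GPU has, you can query it with `torch.cuda.get_device_capability()`. Here is a minimal sketch of the support check; the helper name is our own for illustration, not part of SageAttention's API:

```python
# Minimal sketch: decide whether a GPU's compute capability is supported.
# The helper name is illustrative, not part of SageAttention's API.

def is_supported(capability):
    """Return True for sm >= 80 (RTX 30xx and newer)."""
    major, minor = capability
    return major * 10 + minor >= 80

# With a CUDA build of PyTorch you would pass the real capability:
#   import torch
#   print(is_supported(torch.cuda.get_device_capability()))

print(is_supported((8, 6)))   # RTX 30xx (sm86) -> True
print(is_supported((7, 5)))   # RTX 20xx (sm75) -> False
```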
Installation
- Know how to use pip to install packages in the correct Python environment. See https://github.com/woct0rdho/triton-windows
- Install triton-windows
- Install the wheel here
- Choose the wheel for your PyTorch version. For example, 'torch2.7.0' in the filename
- The torch minor version (2.6/2.7 ...) must be correct, but the patch version (2.7.0/2.7.1 ...) can be different
- The CUDA version can be different, because SageAttention does not yet use any CUDA API that breaks across versions
- For torch 2.8, the nightly wheels are unstable, so the SageAttention wheels here may not work with the torch nightly wheel from any given day. They're only tested with torch 2.8.0.dev20250415
- Choose the wheel for your Python version. For example, 'cp312' in the filename means Python 3.12
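To illustrate how the wheel filename encodes these choices, here is a hedged sketch that checks a filename against your environment. The example filename and the matching logic are our own illustration; check the actual release assets for the exact naming:

```python
import re

# Illustrative sketch: match the version tags embedded in a wheel filename
# against the local Python and torch versions. The example filename below is
# an assumption; check the actual release assets for the exact naming.

def wheel_matches(filename, py_version, torch_version):
    """py_version like (3, 12); torch_version like '2.7.1'.
    Only the torch minor version must match; the patch version may differ."""
    torch_tag = re.search(r"torch(\d+)\.(\d+)\.\d+", filename)
    cp_tag = re.search(r"cp(\d)(\d+)", filename)
    if not torch_tag or not cp_tag:
        return False
    torch_major, torch_minor = torch_version.split(".")[:2]
    return (torch_tag.group(1), torch_tag.group(2)) == (torch_major, torch_minor) \
        and (cp_tag.group(1), cp_tag.group(2)) == tuple(str(v) for v in py_version)

name = "sageattention-2.1.1+cu128torch2.7.0-cp312-cp312-win_amd64.whl"
print(wheel_matches(name, (3, 12), "2.7.1"))  # patch version differs -> True
print(wheel_matches(name, (3, 11), "2.7.0"))  # wrong Python -> False
```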
Please help test it on all kinds of GPUs. If you see errors, please open an issue at https://github.com/woct0rdho/SageAttention/issues
Recently we've simplified the installation a lot. There is no need to install Visual Studio or the CUDA toolkit to use Triton and SageAttention (unless you want to step into the world of building from source).
To use SageAttention in ComfyUI, you just need to add `--use-sage-attention` when starting ComfyUI. The `PatchSageAttentionKJ` node is usually not needed, and it is not compatible with all workflows.
Dev notes
- The wheels are built using the workflow https://github.com/woct0rdho/SageAttention/blob/main/.github/workflows/build-sageattn.yml
- CUDA kernels for sm80/86/89/90 are bundled in the wheels, and also sm120 for CUDA 12.8
- The wheels do not use CXX11 ABI
- It's tricky to specify both torch (with index URL at download.pytorch.org ) and pybind11 (not in that index URL) in an isolated build environment. The easiest way I could think of is to use simpleindex
- We cannot publish the wheels to PyPI, because PyPI does not support multiple PyTorch/CUDA variants for the same version of SageAttention. The uv team is working on this: https://x.com/charliermarsh/status/1901634997053804610