Releases: woct0rdho/SageAttention
v2.1.1-windows
First, note that if you just `pip install sageattention`, you get SageAttention 1, which uses only Triton (no CUDA kernels) and is easy to install.
Here is SageAttention 2, which has both Triton and CUDA kernels and can be faster than SageAttention 1 in many cases.
Both SageAttention 1 and 2 only support RTX 30xx and newer GPUs (sm >= 80). RTX 20xx and older are not supported.
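If you're unsure which sm version your GPU has, you can query it with `torch.cuda.get_device_capability()`. Here is a minimal sketch of the support check; the helper name is our own for illustration, not part of SageAttention's API:

```python
# Minimal sketch: decide whether a GPU's compute capability is supported.
# The helper name is illustrative, not part of SageAttention's API.

def is_supported(capability):
    """Return True for sm >= 80 (RTX 30xx and newer)."""
    major, minor = capability
    return major * 10 + minor >= 80

# With a CUDA build of PyTorch you would pass the real capability:
#   import torch
#   print(is_supported(torch.cuda.get_device_capability()))

print(is_supported((8, 6)))   # RTX 30xx (sm86) -> True
print(is_supported((7, 5)))   # RTX 20xx (sm75) -> False
```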
Installation
- Know how to use pip to install packages in the correct Python environment. See https://github.com/woct0rdho/triton-windows
- Install triton-windows
- Install the wheel here
- Choose the wheel for your PyTorch version. For example, 'torch2.7.0' in the filename
- The torch minor version (2.6/2.7 ...) must be correct, but the patch version (2.7.0/2.7.1 ...) can be different
- The CUDA version can be different, because SageAttention does not yet use any CUDA API that breaks across versions
- For torch 2.8, the nightly wheels are unstable, so the SageAttention wheels here may not work with the torch nightly wheel from any given day. They're only tested with torch 2.8.0.dev20250415
- Choose the wheel for your Python version. For example, 'cp312' in the filename means Python 3.12
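To illustrate how the wheel filename encodes these choices, here is a hedged sketch that checks a filename against your environment. The example filename and the matching logic are our own illustration; check the actual release assets for the exact naming:

```python
import re

# Illustrative sketch: match the version tags embedded in a wheel filename
# against the local Python and torch versions. The example filename below is
# an assumption; check the actual release assets for the exact naming.

def wheel_matches(filename, py_version, torch_version):
    """py_version like (3, 12); torch_version like '2.7.1'.
    Only the torch minor version must match; the patch version may differ."""
    torch_tag = re.search(r"torch(\d+)\.(\d+)\.\d+", filename)
    cp_tag = re.search(r"cp(\d)(\d+)", filename)
    if not torch_tag or not cp_tag:
        return False
    torch_major, torch_minor = torch_version.split(".")[:2]
    return (torch_tag.group(1), torch_tag.group(2)) == (torch_major, torch_minor) \
        and (cp_tag.group(1), cp_tag.group(2)) == tuple(str(v) for v in py_version)

name = "sageattention-2.1.1+cu128torch2.7.0-cp312-cp312-win_amd64.whl"
print(wheel_matches(name, (3, 12), "2.7.1"))  # patch version differs -> True
print(wheel_matches(name, (3, 11), "2.7.0"))  # wrong Python -> False
```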
Please help test it on all kinds of GPUs. If you see errors, please open an issue at https://github.com/woct0rdho/SageAttention/issues
Recently we've simplified the installation a lot. There is no need to install Visual Studio or the CUDA toolkit to use Triton and SageAttention (unless you want to step into the world of building from source).
To use SageAttention in ComfyUI, you just need to add `--use-sage-attention` when starting ComfyUI. The `PatchSageAttentionKJ` node is usually not needed, and it is not compatible with all workflows.
Dev notes
- The wheels are built using the workflow https://github.com/woct0rdho/SageAttention/blob/main/.github/workflows/build-sageattn.yml
- CUDA kernels for sm80/86/89/90 are bundled in the wheels, and also sm120 for CUDA 12.8
- The wheels do not use CXX11 ABI
- It's tricky to specify both torch (with index URL at download.pytorch.org ) and pybind11 (not in that index URL) in an isolated build environment. The easiest way I could think of is to use simpleindex
- We cannot publish the wheels to PyPI, because PyPI does not support multiple PyTorch/CUDA variants for the same version of SageAttention. The uv team is working on this: https://x.com/charliermarsh/status/1901634997053804610