Sparse autoencoders (SAEs) for vision transformers (ViTs), implemented in PyTorch.
This is the codebase used for our preprint "Sparse Autoencoders for Scientifically Rigorous Interpretation of Vision Models".
saev is a package for training sparse autoencoders (SAEs) on vision transformers (ViTs) in PyTorch. It also includes an interactive webapp for looking through a trained SAE's features.
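For intuition, the core SAE objective can be sketched in a few lines. This is an illustrative NumPy sketch of a standard ReLU sparse autoencoder (reconstruction MSE plus an L1 sparsity penalty), not saev's actual implementation; all names, sizes, and coefficients here are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: ViT residual-stream width and an expanded SAE width.
d_model, d_sae = 768, 3072
W_enc = rng.normal(0, 0.02, (d_model, d_sae))
b_enc = np.zeros(d_sae)
W_dec = rng.normal(0, 0.02, (d_sae, d_model))
b_dec = np.zeros(d_model)

def sae_forward(x, l1_coeff=1e-3):
    """Encode activations into a sparse code, decode, and score the loss."""
    f = np.maximum(x @ W_enc + b_enc, 0.0)         # ReLU yields a sparse code
    x_hat = f @ W_dec + b_dec                      # linear reconstruction
    mse = np.mean((x - x_hat) ** 2)                # reconstruction term
    l1 = l1_coeff * np.abs(f).sum(axis=-1).mean()  # sparsity penalty
    return x_hat, f, mse + l1

# A batch of fake "ViT activations", just to show the shapes involved.
x = rng.normal(size=(4, d_model))
x_hat, f, loss = sae_forward(x)
print(x_hat.shape, f.shape)  # (4, 768) (4, 3072)
```

Training then amounts to minimizing this loss over activations extracted from a frozen ViT; saev's real training loop (in PyTorch) adds the usual machinery around this idea.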
Originally forked from HugoFry's codebase, which was itself forked from Joseph Bloom's.
Read logbook.md for a detailed log of my thought process.
See related-work.md for a list of works training SAEs on vision models. Please open an issue or a PR if there is missing work.
Installation is supported with uv. saev will likely work with plain pip, conda, etc., but those are not formally supported.
Clone this repository (or fork it), then from the root directory:

```sh
uv run python -m saev --help
```

This will create a virtual environment and display the CLI help.
See the docs for an overview.
You can ask questions about this repo using the llms.txt file.

Example (macOS):

```sh
curl https://osu-nlp-group.github.io/saev/llms.txt | pbcopy
```

Then paste into Claude or any LLM interface of your choice.