Implementation of sparse autoencoders (SAEs) for vision transformers (ViTs) in PyTorch.
saev is a package for training sparse autoencoders (SAEs) on vision transformers (ViTs) in PyTorch. It also includes an interactive webapp for looking through a trained SAE's features.
Originally forked from HugoFry's repository, which was itself forked from Joseph Bloom's.
Read logbook.md for a detailed log of my thought process.
See related-work.md for a list of works training SAEs on vision models. Please open an issue or a PR if there is missing work.
Installation is supported with uv. saev will likely work with plain pip, conda, etc., but I will not formally support them.
To install, clone this repository (maybe fork it first if you want).
In the project root directory, run:
uv run python -m saev --help
The first invocation should create a virtual environment and show a help message.
See the docs for an overview.
I recommend giving the llms.txt file to any LLM provider as context, then asking it your questions about saev.
For example, on macOS you can copy the text to your clipboard with:
curl https://samuelstevens.me/saev/llms.txt | pbcopy
Then paste it into https://claude.ai and ask any question you have.
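pbcopy only exists on macOS, so on other platforms a short Python script can do roughly the same job. This is a minimal sketch that is not part of saev: fetch_llms_txt and build_prompt are hypothetical helper names, and it uses only the standard library. It downloads llms.txt and puts your question after it, so the result can be pasted into any chat interface.

```python
import urllib.request

# URL taken from the curl command in this README.
LLMS_TXT_URL = "https://samuelstevens.me/saev/llms.txt"


def fetch_llms_txt(url: str = LLMS_TXT_URL, timeout: float = 30.0) -> str:
    """Download llms.txt and return it as text (hypothetical helper)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


def build_prompt(context: str, question: str) -> str:
    """Put the question after the documentation text (hypothetical helper)."""
    return f"{context.rstrip()}\n\nQuestion: {question.strip()}\n"


# Example usage (needs network access, so it is left commented out):
# prompt = build_prompt(fetch_llms_txt(), "How do I train an SAE?")
# print(prompt)
```

Redirect the printed prompt to a file, or pipe it into whatever clipboard tool your system provides.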
- Train models with data scaling (norm, mean) turned on.
- Train models on ViT-L/14 datasets.
- Semantic segmentation baseline with linear probe.
- ADE20K experiment to demonstrate faithfulness.
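
To make the first item more concrete, here is a rough sketch of what mean and norm scaling of ViT activations could look like. It assumes "mean" means subtracting the per-dimension dataset mean, and "norm" means rescaling so that the average L2 norm equals sqrt(d). That is one reading of the bullet, not a description of saev's actual code. scale_activations is a hypothetical name, and plain Python lists stand in for tensors to keep the example dependency-free.

```python
import math


def scale_activations(acts, scale_mean=True, scale_norm=True):
    """Center and rescale a batch of activation vectors.

    acts is a non-empty list of equal-length lists of floats.
    Assumption: "mean" subtracts the per-dimension mean, and "norm"
    rescales so the average L2 norm equals sqrt(d).
    """
    if not acts:
        raise ValueError("acts must contain at least one vector")
    n, d = len(acts), len(acts[0])

    if scale_mean:
        # Per-dimension mean over the whole batch.
        mean = [sum(row[j] for row in acts) / n for j in range(d)]
        acts = [[x - m for x, m in zip(row, mean)] for row in acts]

    if scale_norm:
        # Average L2 norm over the batch (after centering, if enabled).
        avg_norm = sum(math.sqrt(sum(x * x for x in row)) for row in acts) / n
        if avg_norm > 0:
            factor = math.sqrt(d) / avg_norm
            acts = [[x * factor for x in row] for row in acts]

    return acts
```

In a real pipeline the mean and average norm would be estimated once over the training set and then reused, not recomputed per batch.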