Commit

Docs and rearranging folders
samuelstevens committed Dec 5, 2024
1 parent 2023fba commit 702859b
Showing 24 changed files with 1,318 additions and 450 deletions.
17 changes: 17 additions & 0 deletions configs/preprint/classification.toml
@@ -0,0 +1,17 @@
tag = "classification-v1.0"

lr = [1e-4, 3e-4, 1e-3, 3e-3]

n_lr_warmup = 500
n_sparsity_warmup = 500

[sae]
sparsity_coeff = [4e-4, 8e-4, 1.6e-3]
ghost_grads = false
normalize_w_dec = true
remove_parallel_grads = true
exp_factor = [16, 32]

[data]
scale_mean = true
scale_norm = true
File renamed without changes.
30 changes: 30 additions & 0 deletions contrib/classification/reproduce.md
@@ -0,0 +1,30 @@
# Reproduce

You can reproduce our classification control experiments from our preprint by following these instructions.

The overall plan (as described in our paper) is:

1. Train an SAE on ImageNet-1K [CLS] token activations from the 11th (second-to-last) layer of a CLIP ViT-B/16.
2. Show, through visualizations, that the SAE learns meaningful features.
3. Train a linear probe on the Oxford Flowers-102 dataset using [CLS] token activations from the same layer of the CLIP ViT-B/16.
4. Show that the probe achieves good accuracy.
5. Manipulate the activations using the proposed SAE features.
6. Be amazed. :)

To do these steps:

## Record ImageNet-1K activations
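A minimal sketch of what this step computes: load CLIP ViT-B/16 and grab the 11th-layer [CLS] token with a forward hook. The timm checkpoint tag, the ImageNet path, and the output file are assumptions for illustration; saev's own activation-recording tooling (which writes the shards under `--data.shard-root`) may differ.

```python
# Sketch only: records 11th-layer (block index 10, i.e. --data.layer -2) [CLS] activations.
# Model tag, dataset path, and output handling are assumptions, not saev's pipeline.
import timm
import torch
from timm.data import create_transform, resolve_data_config
from torch.utils.data import DataLoader
from torchvision.datasets import ImageFolder

model = timm.create_model("vit_base_patch16_clip_224.openai", pretrained=True).eval()
transform = create_transform(**resolve_data_config({}, model=model))

dataset = ImageFolder("/path/to/imagenet/train", transform=transform)  # placeholder path
loader = DataLoader(dataset, batch_size=256, num_workers=8)

acts = []

def hook(_module, _inputs, output):
    acts.append(output[:, 0].detach().cpu())  # [CLS] token, shape (batch, 768)

handle = model.blocks[10].register_forward_hook(hook)
with torch.inference_mode():
    for images, _labels in loader:
        model(images)
handle.remove()

cls_acts = torch.cat(acts)  # (n_images, 768)
torch.save(cls_acts, "imagenet_cls_acts.pt")
```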

## Train an SAE

```sh
uv run python -m saev train --sweep configs/preprint/classification.toml --data.shard-root /local/scratch/stevens.994/cache/saev/ac89246f1934b45e2f0487298aebe36ad998b6bd252d880c0c9ec5de78d793c8/ --data.patches cls --data.layer -2 --sae.d-vit 768
```
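If the list-valued fields in the sweep file expand as a grid (our reading of `--sweep`; check the saev docs), this single command launches 4 learning rates × 3 sparsity coefficients × 2 expansion factors = 24 SAE training runs.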

## Visualize the SAE Features
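One rough way to sanity-check the features: compute SAE latents for every recorded activation and pull the top-activating images per latent. The ReLU-SAE parameterization and the tensors below are assumptions; load the real weights from your trained checkpoint instead of the stand-ins.

```python
# Sketch only: top-k exemplar images per SAE feature. Replace the random
# stand-ins with the recorded activations and the trained SAE's weights.
import torch

d_vit, d_sae = 768, 768 * 16
cls_acts = torch.randn(10_000, d_vit)      # stand-in for imagenet_cls_acts.pt
W_enc = torch.randn(d_vit, d_sae) * 0.01   # stand-in for the SAE encoder weights
b_enc = torch.zeros(d_sae)
b_dec = torch.zeros(d_vit)

latents = torch.relu((cls_acts - b_dec) @ W_enc + b_enc)  # (n_images, d_sae)
top_vals, top_imgs = latents.topk(k=16, dim=0)            # 16 exemplars per feature
print(top_imgs[:, 0])  # dataset indices that most activate feature 0
```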

## Record Oxford Flowers-102 Activations

## Train a Linear Probe
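A minimal sketch of the probe, assuming the Flowers-102 [CLS] activations have already been recorded; scikit-learn's logistic regression stands in for whatever solver the preprint actually uses, and the arrays below are random stand-ins.

```python
# Sketch only: fit a linear probe on recorded Flowers-102 [CLS] activations.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X_train = rng.normal(size=(1020, 768))     # stand-in for recorded train activations
y_train = rng.integers(0, 102, size=1020)  # 102 flower classes
X_test = rng.normal(size=(6149, 768))      # stand-in for recorded test activations
y_test = rng.integers(0, 102, size=6149)

probe = LogisticRegression(max_iter=1_000)
probe.fit(X_train, y_train)
print("test accuracy:", probe.score(X_test, y_test))
```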

## Manipulate
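
A sketch of the manipulation step: encode a [CLS] activation with the SAE, clamp one feature, decode, and feed the edited activation back through the linear probe. The tied-bias ReLU-SAE interface here is an assumption about the trained checkpoint, not saev's actual API.

```python
# Sketch only: edit one SAE feature in an activation and decode it back.
import torch

def manipulate(x, W_enc, b_enc, W_dec, b_dec, feature, value):
    f = torch.relu((x - b_dec) @ W_enc + b_enc)  # encode to SAE features
    f[:, feature] = value                        # clamp the chosen feature
    return f @ W_dec + b_dec                     # decode back to d_vit

d_vit, d_sae = 768, 768 * 16
x = torch.randn(1, d_vit)                  # stand-in for one recorded activation
W_enc = torch.randn(d_vit, d_sae) * 0.01   # stand-ins for the trained SAE weights
b_enc, b_dec = torch.zeros(d_sae), torch.zeros(d_vit)
W_dec = torch.randn(d_sae, d_vit) * 0.01

x_edited = manipulate(x, W_enc, b_enc, W_dec, b_dec, feature=123, value=0.0)
# Compare probe predictions on x vs. x_edited to measure the feature's effect.
```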
