Full-Batch / mini-batch LLC estimates #38

Open
jqhoogland opened this issue Oct 10, 2023 · 1 comment

Comments

@jqhoogland
Contributor

We need to support each of the following choices (from @edmundlth):

There are a few distinct places where this choice needs to be made (using n for the total dataset size and b for the batch size):
1. k=n or k=b when SGLD sampling for w_i.
2. L_n or L_b in E_w L_k(w).
3. L_n or L_b in L_k(w^*).
4. k=n or k=b in (E_w L_{k'}(w) - L_{k'}(w^*)) * k / log(k).
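
For concreteness, the expression in the last item can be sketched as a small helper. This is only an illustration, not the library's API: the name `llc_estimate` and its arguments are hypothetical, and the expectation E_w is assumed to be replaced by a plain mean over the SGLD samples.

```python
import math
from typing import Sequence


def llc_estimate(sample_losses: Sequence[float], loss_at_w_star: float, k: int) -> float:
    """Compute (E_w L_{k'}(w) - L_{k'}(w^*)) * k / log(k).

    sample_losses:  L_{k'}(w_i) for each SGLD sample w_i (full-batch or minibatch).
    loss_at_w_star: L_{k'}(w^*), the loss at the point being probed.
    k:              n or b, i.e. the size used in the k / log(k) prefactor.
    """
    if k <= 1:
        raise ValueError("k must be greater than 1 so that log(k) > 0")
    if len(sample_losses) == 0:
        raise ValueError("need at least one sampled loss")
    # Approximate E_w L_{k'}(w) by the mean over the sampled losses.
    expected_loss = sum(sample_losses) / len(sample_losses)
    return (expected_loss - loss_at_w_star) * k / math.log(k)
```

Which losses go into `sample_losses` and `loss_at_w_star`, and which value is passed as `k`, are exactly the choices listed above.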

One thing I noticed during TMS, and more recently in the grokking experiment, is that the statistics of L_k(w) and L_n(w) can differ over samples of w within an epoch of SGD or SGLD.

Whenever possible, one should use k = n above. That is, in decreasing order of preference and theoretical support:
1. Use k = n for all of items 1-4 above: SGLD sampling with b = n, all loss evaluations on the full dataset, and the final calculation with k = n.
2. Do SGLD sampling with k = b (minibatching), but for every SGLD sample w_i, still evaluate the loss on the full dataset.
3. Do SGLD sampling with minibatching and only use the minibatch loss, i.e. E_w L_n(w) is approximated by mean_i L_b(w_i). The loss evaluation at w^* should still be on the full dataset, i.e. L_n(w^*).
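
A minimal sketch of the three options on a toy linear-regression problem, using NumPy only. Everything here is an assumption made for illustration and is not taken from the issue or the library: the function names, the toy data, the inverse temperature 1/log(n), scaling the minibatch gradient by n, the absence of a localization term, and using k = n in the prefactor for options 2 and 3.

```python
import numpy as np


def loss(w, X, y):
    """Mean squared-error loss L_k(w) over the k rows of (X, y)."""
    return 0.5 * np.mean((X @ w - y) ** 2)


def grad(w, X, y):
    """Gradient of `loss` with respect to w."""
    return X.T @ (X @ w - y) / len(y)


def sgld_losses(X, y, w_star, batch_size, num_steps=500, step_size=1e-4, seed=0):
    """Run SGLD from w_star; record L_b(w_i) and L_n(w_i) at each visited w_i."""
    rng = np.random.default_rng(seed)
    n = len(y)
    beta = 1.0 / np.log(n)  # assumed inverse temperature
    w = w_star.copy()
    batch_losses, full_losses = [], []
    for _ in range(num_steps):
        idx = rng.choice(n, size=batch_size, replace=False)
        Xb, yb = X[idx], y[idx]
        # Losses are recorded at the current w, before the update.
        batch_losses.append(loss(w, Xb, yb))
        full_losses.append(loss(w, X, y))
        # SGLD update: half-step along the scaled (mini)batch gradient plus Gaussian noise.
        drift = n * beta * grad(w, Xb, yb)
        w = w - 0.5 * step_size * drift + np.sqrt(step_size) * rng.normal(size=w.shape)
    return np.array(batch_losses), np.array(full_losses)


def llc(mean_loss, loss_at_w_star, k):
    """(E_w L(w) - L(w^*)) * k / log(k)."""
    return (mean_loss - loss_at_w_star) * k / np.log(k)


# Toy data: n points, d parameters, minibatch size b.
rng = np.random.default_rng(0)
n, d, b = 1000, 3, 50
X = rng.normal(size=(n, d))
y = X @ np.ones(d) + 0.1 * rng.normal(size=n)
w_star = np.linalg.lstsq(X, y, rcond=None)[0]  # minimizer of L_n
L_n_star = loss(w_star, X, y)                  # L_n(w^*), used in all three options

# Option 1: sample with b = n, evaluate on the full dataset.
_, full_1 = sgld_losses(X, y, w_star, batch_size=n)
# Options 2 and 3: sample with minibatches of size b.
batch_2, full_2 = sgld_losses(X, y, w_star, batch_size=b)

estimates = {
    "option_1_full_sampling_full_loss": llc(full_1.mean(), L_n_star, n),
    "option_2_minibatch_sampling_full_loss": llc(full_2.mean(), L_n_star, n),
    "option_3_minibatch_sampling_minibatch_loss": llc(batch_2.mean(), L_n_star, n),
}
for name, value in estimates.items():
    print(f"{name}: {value:.3f}")
```

Because w^* minimizes L_n here, options 1 and 2 cannot go negative, whereas option 3 can, since a minibatch loss L_b(w_i) may fall below L_n(w^*). That is one way the differing statistics of L_b and L_n show up in the estimate.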

@jqhoogland
Contributor Author

See #39

svwingerden added a commit that referenced this issue Jan 24, 2025
ADD [SAEs] SAE-related figures that ended up on lang1 branch