We need to support each of the following choices (from @edmundlth):
There are a few distinct objects where this choice needs to be made (I'll use n for the total dataset size and b for the batch size; the combined estimator is written out after the list):
1. k = n or k = b when SGLD sampling for w_i.
2. L_n or L_b in E_w L_k(w).
3. L_n or L_b in L_k(w^*).
4. k = n or k = b in (E_w L_{k'}(w) - L_{k'}(w^*)) * k / log(k).
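Written out in display form, and writing \hat{\lambda} for the estimated quantity (this is just item 4 above, with k and k' allowed to vary independently between n and b):

```latex
\hat{\lambda}(w^*) = \frac{k}{\log k}\left( \mathbb{E}_{w}\!\left[ L_{k'}(w) \right] - L_{k'}(w^*) \right),
\qquad k, k' \in \{ n, b \}.
```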
One thing I noticed during TMS, and more recently in the grokking experiment, is that the statistics of L_k(w) and L_n(w) can differ across samples of w within an epoch of SGD or SGLD.
Whenever possible, one should use k = n above. That is, in decreasing order of preference and theoretical support (a code sketch of these three variants follows the list):
1. Do all of 1-4 with k = n: SGLD sampling with b = n, all loss evaluations on the full dataset, and the k / log(k) factor computed with k = n.
2. Do SGLD sampling with k = b (minibatching), but for every SGLD sample w_i, still evaluate the loss on the full dataset.
3. Do SGLD sampling with minibatching and use only the minibatch loss, i.e. approximate E_w L_n(w) as \mean_i L_b(w_i). The loss evaluation at w^*, however, should still be on the full dataset, i.e. L_n(w^*).
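A minimal sketch of how options 1-3 might look (illustrative only, not the library's actual API: `llc_estimate`, `sgld_step`, and the `option` flag are hypothetical names, and the final k / log(k) factor is computed with k = n in every variant):

```python
# Hedged sketch: SGLD-based estimation of (E_w L_{k'}(w) - L_n(w*)) * n / log(n),
# with the k = n vs k = b choices above exposed as `option`.
import math
import torch


def sgld_step(params, loss, lr, beta, n, w_star, gamma=1.0):
    """One SGLD update: drift from n * beta * grad(loss) plus a localization
    term gamma * (w - w*) that keeps the chain near w*, plus Gaussian noise."""
    grads = torch.autograd.grad(loss, params)
    with torch.no_grad():
        for p, g, p_star in zip(params, grads, w_star):
            drift = n * beta * g + gamma * (p - p_star)
            noise = torch.randn_like(p) * math.sqrt(lr)
            p.add_(-0.5 * lr * drift + noise)


def llc_estimate(model, loss_fn, X, y, batch_size, num_steps, lr, option=2):
    """option=1: SGLD on the full data, E_w losses on the full data  (k = k' = n)
    option=2: SGLD on minibatches, E_w losses on the full data       (k = b, k' = n)
    option=3: SGLD on minibatches, E_w losses on minibatches         (k = k' = b),
              but L(w*) is still the full-data loss L_n(w*)."""
    n = len(X)
    beta = 1.0 / math.log(n)
    params = [p for p in model.parameters() if p.requires_grad]
    w_star = [p.detach().clone() for p in params]

    def full_loss():
        return loss_fn(model(X), y)

    loss_at_w_star = full_loss().item()  # L_n(w*): always on the full dataset
    draws = []

    for _ in range(num_steps):
        # Choice 1: the loss driving the SGLD update, full data or a minibatch.
        if option == 1:
            sgld_loss = full_loss()
        else:
            idx = torch.randint(0, n, (batch_size,))
            sgld_loss = loss_fn(model(X[idx]), y[idx])

        # Choices 2-3: the loss recorded for E_w L_{k'}(w) at the current draw w_i.
        with torch.no_grad():
            if option == 3:
                draws.append(sgld_loss.item())    # minibatch loss L_b(w_i)
            else:
                draws.append(full_loss().item())  # full-data loss L_n(w_i)

        sgld_step(params, sgld_loss, lr, beta, n, w_star)

    expected_loss = sum(draws) / len(draws)
    return n * beta * (expected_loss - loss_at_w_star)
```

Options 2 and 3 trade the per-sample full-data loss evaluations (the expensive part when n is large) for noisier estimates of E_w L_n(w); only the single evaluation L_n(w^*) stays exact in option 3.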