Non-weight-shared memory #16

Closed

jeremygatineau opened this issue Jan 25, 2025 · 2 comments

Comments

jeremygatineau commented Jan 25, 2025

I might be missing something, but this recent tweak is confusing to me:

```python
if exists(prev_layer_updates):
    prev_layer_updates = TensorDict(prev_layer_updates)
    weights = weights + prev_layer_updates
    per_sample_grad_fn = self.per_sample_grad_fn_expanded_weights # the weights will now have a batch * chunk dimension
```

What is the intuition behind 'accumulated gradients become the weights on the next memory module'?
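
For reference, here is a minimal sketch of how I read that tweak (illustrative only; names like `next_layer_weights` and `base_weights` are mine, not the repo's): the updates accumulated by memory layer $i$ get added onto the base weights of memory layer $i+1$ before that layer takes its per-sample gradients, so the effective weights of layer $i+1$ pick up a leading batch * chunk dimension.

```python
# Hypothetical sketch, not the repository's actual code.
import torch

def next_layer_weights(base_weights: dict, prev_layer_updates: dict | None) -> dict:
    # base_weights: shared parameters of memory layer i+1, each of shape (d_in, d_out)
    # prev_layer_updates: updates accumulated by layer i, each of shape (batch * chunk, d_in, d_out)
    if prev_layer_updates is None:
        return base_weights
    # broadcasting gives every weight tensor a leading batch * chunk dimension,
    # which is why the per-sample grad fn switches to the expanded-weights version
    return {name: base_weights[name] + prev_layer_updates[name] for name in base_weights}

# tiny usage with a single weight matrix
base = {'w': torch.zeros(8, 8)}
prev = {'w': torch.randn(4 * 16, 8, 8)}   # batch = 4, chunk = 16
expanded = next_layer_weights(base, prev)
print(expanded['w'].shape)                # torch.Size([64, 8, 8])
```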

From what I thought I understood, the $x_t$ in the paper would simply stand in for the layer's input when taking the gradient of the memory objective. Non-tied memory modules would then have their own notion of surprise at every level, and the gradient would not have to be trickled up; it could be computed purely in the local layer context. That is, if $x^i_t$ is the input of block $i$ at sequence time $t$, the gradient added to the surprise is just $\nabla_{w}\,\mathrm{MSE}\big(M_t^i(x^i_t W^Q_i),\, x^i_t W^V_i\big)$, where $w$ is a weight of the memory module $M^i$ at block $i$.
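
Concretely, a minimal sketch of the purely local gradient I have in mind (assuming a simple linear memory $M$ with weight `w`; the names `memory_loss`, `W_q`, `W_v` are illustrative, not from the repo):

```python
import torch
import torch.nn.functional as F
from torch.func import grad

def memory_loss(w, x_t, W_q, W_v):
    # M_t^i(x_t W^Q_i) with a linear memory module: (x_t @ W_q) @ w
    pred = (x_t @ W_q) @ w
    target = x_t @ W_v
    return F.mse_loss(pred, target)

# surprise gradient for block i, computed only from that block's own input x_t,
# with nothing trickled up from the layer below
surprise_grad_fn = grad(memory_loss)           # d(loss)/d(w)

d, d_k = 16, 8
x_t = torch.randn(4, d)                        # input tokens of block i at time t
W_q, W_v = torch.randn(d, d_k), torch.randn(d, d_k)
w = torch.randn(d_k, d_k)                      # memory module weight
surprise = surprise_grad_fn(w, x_t, W_q, W_v)  # same shape as w
```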

Please let me know if I am missing something.

@lucidrains
Owner

@jeremygatineau Hey Jeremy, yes, this is related to #6.

Could you voice your insight there if you have any? Thank you.

@jeremygatineau
Author

Will do
