Fix exponent in "Visualize and understand GPU memory in PyTorch" (#2588)
qgallouedec authored Jan 12, 2025
1 parent c422809 commit aa0e1b3
Showing 1 changed file with 2 additions and 2 deletions.
4 changes: 2 additions & 2 deletions train_memory.md
@@ -239,7 +239,7 @@ Not directly, as the number of activations per token depends on the model architecture

This linear relationship allows us to estimate activations using the heuristic:

- \\( A = 4.6894 \times 10^{4} \times N + 1.8494 \times 10^{6} \\)
+ \\( A = 4.6894 \times 10^{-4} \times N + 1.8494 \times 10^{6} \\)

Though this is an approximation, it provides a practical way to estimate activation memory without needing to perform complex calculations for each model.
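
As a quick sanity check on the corrected exponent, here is a minimal Python sketch of the heuristic; the function name and the 1.5B-parameter example value are illustrative assumptions, not taken from the post or this diff.

```python
def estimate_activations_per_token(num_parameters: float) -> float:
    """Heuristic from the post: activations per token as a linear function
    of the parameter count N (with the corrected 1e-4 coefficient)."""
    return 4.6894e-4 * num_parameters + 1.8494e6

# Illustrative check for a ~1.5B-parameter model (example value only):
print(f"{estimate_activations_per_token(1.5e9):.3e}")  # ~2.553e+06 activations per token
```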

@@ -267,7 +267,7 @@ with the following components:
- **Optimizer State**: \\( 2 \times N \times P \\)
- **Gradients**: \\( N \times P \\)
- **Optimizer Intermediates**: \\( N \times P \\)
- - **Activations**: \\( A \times B \times L \times P \\), estimated using the heuristic \\( A = 4.6894 \times 10^{4} \times N + 1.8494 \times 10^{6} \\)
+ - **Activations**: \\( A \times B \times L \times P \\), estimated using the heuristic \\( A = 4.6894 \times 10^{-4} \times N + 1.8494 \times 10^{6} \\)

To make this calculation easier, I created a small tool for you:
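
The tool referred to above is an interactive widget and is not reproduced in this diff. As a rough sketch of the same arithmetic, here is a minimal Python version built only from the component formulas listed above; the parameter-weights term (\\( N \times P \\)) from earlier in the post is assumed here, since it falls outside the visible hunk, and all names and example values are illustrative.

```python
def estimate_training_memory_bytes(
    num_parameters: float,  # N: number of model parameters
    precision_bytes: int,   # P: bytes per element (e.g. 4 for float32)
    batch_size: int,        # B
    sequence_length: int,   # L
) -> float:
    """Rough sum of the memory components listed above (Adam-style training)."""
    # A: activations per token, from the corrected heuristic
    activations_per_token = 4.6894e-4 * num_parameters + 1.8494e6
    parameters = num_parameters * precision_bytes                # N * P (assumed, from earlier in the post)
    optimizer_state = 2 * num_parameters * precision_bytes       # 2 * N * P
    gradients = num_parameters * precision_bytes                 # N * P
    optimizer_intermediates = num_parameters * precision_bytes   # N * P
    activations = activations_per_token * batch_size * sequence_length * precision_bytes  # A * B * L * P
    return parameters + optimizer_state + gradients + optimizer_intermediates + activations

# Illustrative call (values are examples, not from the post):
print(f"{estimate_training_memory_bytes(1.5e9, 4, 8, 1024) / 1e9:.1f} GB")  # ~113.7 GB with these values
```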

