After enabling Transformer Engine's FP8 features for PyTorch on an H100, the forward pass of my linear layers launches GEMM kernels like

`sm90_xmma_gemm_e4m3bf16_e4m3f32_f32_tn_n_tilesize128x128x128_warpgroupsize1x1x1_execute_segment_k_off_kernel__5x_cublas`

instead of

`sm90_xmma_gemm_bf16bf16_bf16f32_f32_tn_n_tilesize128x128x64_warpgroupsize1x1x1_execute_segment_k_off_kernel__5x_cublas`

The main difference appears to be `bf16bf16_bf16f32_f32` -> `e4m3bf16_e4m3f32_f32`.

How should I interpret this? I thought the pattern was `[input_types]_[accumulator_type]_[output_type]`, but that would imply that either the weights or the activations are in bf16 rather than FP8. My understanding is that both are cast to FP8. I'd appreciate it if anyone could help correct my understanding here. Thank you!

Note: I am also using AMP autocast with bf16, so maybe that is affecting things.
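For context, here is a minimal sketch of the kind of setup I'm describing (layer sizes and recipe settings below are placeholders, not my exact configuration):

```python
import torch
import transformer_engine.pytorch as te
from transformer_engine.common.recipe import Format, DelayedScaling

# Placeholder FP8 recipe; HYBRID uses E4M3 for the forward pass.
fp8_recipe = DelayedScaling(
    fp8_format=Format.HYBRID,
    amax_history_len=16,
    amax_compute_algo="max",
)

linear = te.Linear(4096, 4096, bias=True).cuda()
x = torch.randn(8, 4096, device="cuda")

# FP8 autocast nested inside AMP bf16 autocast, as in my setup.
with torch.autocast(device_type="cuda", dtype=torch.bfloat16):
    with te.fp8_autocast(enabled=True, fp8_recipe=fp8_recipe):
        y = linear(x)  # forward pass that shows the e4m3bf16 GEMM kernel in the profiler
```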