[Draft] XLA and AMP #11

Open · wants to merge 2 commits into base: master
Conversation

ekuznetsov139

  • This PR adds support for XLA. Enable it with the command-line flag --use_xla=1 or --use_xla=2. With --use_xla=1, XLA is used to fuse a few specific subgraphs such as AdamWeightDecayOptimizer; with --use_xla=2, TF tries to fuse the entire graph to the maximum extent possible.
  • It enables AMP via the flag --use_fp16=True (superseding the branch https://github.com/ROCmSoftwarePlatform/bert/tree/enable_AMP).
  • Alternatively, it enables fp16 via the flag --manual_fp16=True. This code was lifted straight from NVBERT and has not been tested.
  • It adds continuous logging, letting you watch the loss in real time (also lifted from NVBERT).
  • It adjusts the evaluation logic.
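For context, a minimal sketch of how flags like these typically map onto a TF1-style session config. This is illustrative only: the flag names --use_xla and --use_fp16 come from this PR, but the mapping below (global XLA jit levels, the auto_mixed_precision grappler rewrite) is an assumption about one common way to wire them up, not necessarily what the PR's code does; in particular, per-subgraph fusion as done for --use_xla=1 may instead use jit scopes around individual ops.

```python
# Hypothetical sketch; make_session_config is not a function from this PR.
import tensorflow as tf

def make_session_config(use_xla=0, use_fp16=False):
    config = tf.compat.v1.ConfigProto()
    if use_xla:
        # Global XLA JIT: ON_1 is a more conservative level, ON_2 asks
        # XLA to compile as much of the graph as possible.
        config.graph_options.optimizer_options.global_jit_level = (
            tf.compat.v1.OptimizerOptions.ON_1 if use_xla == 1
            else tf.compat.v1.OptimizerOptions.ON_2)
    if use_fp16:
        # AMP via the auto_mixed_precision grappler pass (TF >= 1.14):
        # casts eligible ops to fp16 and inserts loss scaling support.
        config.graph_options.rewrite_options.auto_mixed_precision = 1
    return config
```

The resulting config would then be passed to the session (or Estimator RunConfig) used for training.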

It has been tested with horovod+xla+adam, both with and without fp16, and it appears to work correctly. With 8x MI50, seq length 128, batch size 10, and 1M steps (125K/GPU), the final loss is 2.179 +/- 0.003 with fp32 and 2.202 with fp16.
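The "continuous logging" item above is about seeing a smoothed loss printed every few steps rather than only at checkpoint time. A framework-independent sketch of that idea, assuming an exponential moving average for smoothing (the class and parameter names here are illustrative, not the PR's actual code, which is lifted from NVBERT):

```python
# Hypothetical sketch of periodic smoothed-loss logging.
class LossLogger:
    def __init__(self, log_every=100, smoothing=0.99):
        self.log_every = log_every
        self.smoothing = smoothing
        self.step = 0
        self.avg = None  # exponential moving average of the loss

    def update(self, loss):
        self.step += 1
        # EMA keeps the printout stable against per-batch noise.
        self.avg = loss if self.avg is None else (
            self.smoothing * self.avg + (1 - self.smoothing) * loss)
        if self.step % self.log_every == 0:
            print(f"step {self.step}: loss {self.avg:.4f}")
        return self.avg
```

In a TF1 Estimator setup, the same effect is usually achieved with a logging hook attached to the loss tensor rather than a hand-rolled class like this.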

A very recent build of TF may be needed for Horovod and XLA to work together.

Commits:
  • Adding support of AMP (FP16)
  • Adding logging