[Draft] XLA and AMP #11

Open · wants to merge 2 commits into base: master
Conversation

ekuznetsov139

  • This PR adds support for XLA. Enable it with the command-line flag --use_xla=1 or --use_xla=2. With --use_xla=1, XLA is used to fuse a few specific subgraphs such as AdamWeightDecayOptimizer; with --use_xla=2, TF tries to fuse the entire graph to the maximum extent possible.
  • It enables AMP via the flag --use_fp16=True (superseding the branch https://github.com/ROCmSoftwarePlatform/bert/tree/enable_AMP).
  • Alternatively, it enables fp16 via the flag --manual_fp16=True. This code was lifted straight from NVBERT and has not been tested.
  • It adds continuous logging, letting you watch the loss in real time (also lifted from NVBERT).
  • It adjusts the evaluation logic.
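For context, a minimal sketch of how flags like these typically map onto a TF1-style session config. This is illustrative only: the flag names --use_xla and --use_fp16 come from this PR, but the mapping below (global XLA jit levels, the auto_mixed_precision grappler rewrite) is an assumption about one common way to wire them up, not necessarily what the PR's code does; in particular, per-subgraph fusion as done for --use_xla=1 may instead use jit scopes around individual ops.

```python
# Hypothetical sketch; make_session_config is not a function from this PR.
import tensorflow as tf

def make_session_config(use_xla=0, use_fp16=False):
    config = tf.compat.v1.ConfigProto()
    if use_xla:
        # Global XLA JIT: ON_1 is a more conservative level, ON_2 asks
        # XLA to compile as much of the graph as possible.
        config.graph_options.optimizer_options.global_jit_level = (
            tf.compat.v1.OptimizerOptions.ON_1 if use_xla == 1
            else tf.compat.v1.OptimizerOptions.ON_2)
    if use_fp16:
        # AMP via the auto_mixed_precision grappler pass (TF >= 1.14):
        # casts eligible ops to fp16 and inserts loss scaling support.
        config.graph_options.rewrite_options.auto_mixed_precision = 1
    return config
```

The resulting config would then be passed to the session (or Estimator RunConfig) used for training.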

It has been tested with horovod+xla+adam, both with and without fp16, and it appears to work correctly. With 8x MI50, seq length 128, batch size 10, and 1M steps (125K/GPU), the final loss is 2.179 +/- 0.003 with fp32 and 2.202 with fp16.
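The "continuous logging" item above is about seeing a smoothed loss printed every few steps rather than only at checkpoint time. A framework-independent sketch of that idea, assuming an exponential moving average for smoothing (the class and parameter names here are illustrative, not the PR's actual code, which is lifted from NVBERT):

```python
# Hypothetical sketch of periodic smoothed-loss logging.
class LossLogger:
    def __init__(self, log_every=100, smoothing=0.99):
        self.log_every = log_every
        self.smoothing = smoothing
        self.step = 0
        self.avg = None  # exponential moving average of the loss

    def update(self, loss):
        self.step += 1
        # EMA keeps the printout stable against per-batch noise.
        self.avg = loss if self.avg is None else (
            self.smoothing * self.avg + (1 - self.smoothing) * loss)
        if self.step % self.log_every == 0:
            print(f"step {self.step}: loss {self.avg:.4f}")
        return self.avg
```

In a TF1 Estimator setup, the same effect is usually achieved with a logging hook attached to the loss tensor rather than a hand-rolled class like this.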

A very recent build of TF may be needed for Horovod and XLA to work together.

Commits:
  • Adding support of AMP (FP16)
  • Adding logging