Driver quantize fp8 update #3715
Conversation
CharlieL7 commented Dec 13, 2024 (edited):
- Updates quantization to always quantize fp8 to the OCP fp8e4m3fn type
- Removes running simplify_qdq and optimize_module during quantization so that the ocp_to_fnuz conversion pass can work properly (see the sketch after this list)
- Don't merge this until FP8 OCP to FP8 FNUZ on hardware with only FP8 FNUZ support #3684 is done.
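For context on why the pass ordering matters: OCP fp8e4m3fn and fp8e4m3fnuz share the same 1-4-3 bit layout, but fnuz uses an exponent bias of 8 instead of 7, so the same bit pattern decodes to half the magnitude under fnuz, and fnuz reserves only the 0x80 pattern for NaN while OCP e4m3fn uses exponent-and-mantissa-all-ones. A conversion pass therefore has to rewrite the quantize/dequantize pairs while they are still visible, before simplify_qdq fuses them away. Below is a minimal standalone decoder sketch illustrating the bias difference; `decode_e4m3` is a hypothetical helper written for this comment, not a MIGraphX API:

```cpp
#include <cmath>
#include <cstdint>
#include <cstdio>

// Hypothetical helper (not MIGraphX code): decode an 8-bit e4m3 pattern.
// bias = 7 for OCP fp8e4m3fn, bias = 8 for fp8e4m3fnuz.
float decode_e4m3(uint8_t bits, int bias, bool fnuz)
{
    int sign = (bits >> 7) & 1;
    int exp  = (bits >> 3) & 0xF;
    int man  = bits & 0x7;
    if(fnuz)
    {
        // fnuz: the only NaN is 0x80 (sign bit set, all else zero); no -0, no inf
        if(bits == 0x80)
            return NAN;
    }
    else
    {
        // OCP e4m3fn: NaN when exponent and mantissa are all ones; no inf
        if(exp == 0xF && man == 0x7)
            return NAN;
    }
    float mag;
    if(exp == 0)
        mag = std::ldexp(man / 8.0f, 1 - bias); // subnormal
    else
        mag = std::ldexp(1.0f + man / 8.0f, exp - bias); // normal
    return sign ? -mag : mag;
}

int main()
{
    uint8_t b = 0x40; // identical bit pattern under both encodings
    printf("e4m3fn:   %g\n", decode_e4m3(b, 7, false)); // prints 2
    printf("e4m3fnuz: %g\n", decode_e4m3(b, 8, true));  // prints 1 (bias differs by one)
}
```

This factor-of-two relationship is presumably what the conversion pass compensates for by adjusting the attached scales; the actual mechanism is in PR #3684.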
Codecov Report: all modified and coverable lines are covered by tests ✅

@@            Coverage Diff             @@
##           develop    #3715      +/-   ##
===========================================
+ Coverage    92.28%   92.29%   +0.01%
===========================================
  Files          519      519
  Lines        22222    22216       -6
===========================================
- Hits         20507    20504       -3
+ Misses        1715     1712       -3

☔ View full report in Codecov by Sentry.
@@ -311,7 +311,6 @@ struct context
     value result;
     result["events"] = events.size();
     result["streams"] = current_device->nstreams();
-    result["gfx_name"] = get_current_device().get_gfx_name();
Why is this removed on serialization?
I added this earlier, when getting FP8 OCP in, to query the gfx number from the driver. We could keep it, but it would not be used anywhere anymore.
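If the gfx name is still needed somewhere after this removal, it can be queried directly from the HIP runtime rather than from the serialized context. A minimal standalone sketch using the public HIP API (plain HIP, not MIGraphX's internal device wrapper):

```cpp
#include <cstdio>
#include <hip/hip_runtime.h>

int main()
{
    hipDeviceProp_t props;
    if(hipGetDeviceProperties(&props, 0) != hipSuccess)
    {
        fprintf(stderr, "no HIP device found\n");
        return 1;
    }
    // gcnArchName looks like "gfx942:sramecc+:xnack-"; the gfx number is what
    // determines whether the hardware supports the fnuz or the OCP fp8 types
    printf("gfx name: %s\n", props.gcnArchName);
    return 0;
}
```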
This build is not recommended to merge 🔴

🔴 bert_large_uncased_fp16: FAILED: MIGraphX is not within tolerance - check verbose output