Driver quantize fp8 update #3715
Conversation
CharlieL7 commented Dec 13, 2024 (edited):
- Updates quantization to always quantize fp8 to the OCP fp8e4m3fn type
- Removes running simplify_qdq and optimize_module during quantization so that the ocp_to_fnuz conversion pass can work properly (see the sketch after this list)
- Don't merge this until FP8 OCP to FP8 FNUZ on hardware with only FP8 FNUZ support #3684 is done.
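For context on why the pass ordering matters: OCP fp8e4m3fn and fp8e4m3fnuz share the same 1-4-3 bit layout, but fnuz uses an exponent bias of 8 instead of 7, so the same bit pattern decodes to half the magnitude under fnuz, and fnuz reserves only the 0x80 pattern for NaN while OCP e4m3fn uses exponent-and-mantissa-all-ones. A conversion pass therefore has to rewrite the quantize/dequantize pairs while they are still visible, before simplify_qdq fuses them away. Below is a minimal standalone decoder sketch illustrating the bias difference; `decode_e4m3` is a hypothetical helper written for this comment, not a MIGraphX API:

```cpp
#include <cmath>
#include <cstdint>
#include <cstdio>

// Hypothetical helper (not MIGraphX code): decode an 8-bit e4m3 pattern.
// bias = 7 for OCP fp8e4m3fn, bias = 8 for fp8e4m3fnuz.
float decode_e4m3(uint8_t bits, int bias, bool fnuz)
{
    int sign = (bits >> 7) & 1;
    int exp  = (bits >> 3) & 0xF;
    int man  = bits & 0x7;
    if(fnuz)
    {
        // fnuz: the only NaN is 0x80 (sign bit set, all else zero); no -0, no inf
        if(bits == 0x80)
            return NAN;
    }
    else
    {
        // OCP e4m3fn: NaN when exponent and mantissa are all ones; no inf
        if(exp == 0xF && man == 0x7)
            return NAN;
    }
    float mag;
    if(exp == 0)
        mag = std::ldexp(man / 8.0f, 1 - bias); // subnormal
    else
        mag = std::ldexp(1.0f + man / 8.0f, exp - bias); // normal
    return sign ? -mag : mag;
}

int main()
{
    uint8_t b = 0x40; // identical bit pattern under both encodings
    printf("e4m3fn:   %g\n", decode_e4m3(b, 7, false)); // prints 2
    printf("e4m3fnuz: %g\n", decode_e4m3(b, 8, true));  // prints 1 (bias differs by one)
}
```

This factor-of-two relationship is presumably what the conversion pass compensates for by adjusting the attached scales; the actual mechanism is in PR #3684.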
Codecov Report: all modified and coverable lines are covered by tests ✅

@@            Coverage Diff             @@
##           develop    #3715      +/-   ##
===========================================
+ Coverage    92.28%   92.29%   +0.01%
===========================================
  Files          519      519
  Lines        22222    22216       -6
===========================================
- Hits         20507    20504       -3
+ Misses        1715     1712       -3

☔ View full report in Codecov by Sentry.
@@ -311,7 +311,6 @@ struct context
     value result;
     result["events"] = events.size();
     result["streams"] = current_device->nstreams();
-    result["gfx_name"] = get_current_device().get_gfx_name();
Why is this removed on serialization?
I added this earlier, when getting FP8 OCP in, to query the gfx number from the driver. We could keep it, but it would not be used anywhere anymore.
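If the gfx name is still needed somewhere after this removal, it can be queried directly from the HIP runtime rather than from the serialized context. A minimal standalone sketch using the public HIP API (plain HIP, not MIGraphX's internal device wrapper):

```cpp
#include <cstdio>
#include <hip/hip_runtime.h>

int main()
{
    hipDeviceProp_t props;
    if(hipGetDeviceProperties(&props, 0) != hipSuccess)
    {
        fprintf(stderr, "no HIP device found\n");
        return 1;
    }
    // gcnArchName looks like "gfx942:sramecc+:xnack-"; the gfx number is what
    // determines whether the hardware supports the fnuz or the OCP fp8 types
    printf("gfx name: %s\n", props.gcnArchName);
    return 0;
}
```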
This build is not recommended to merge 🔴

🔴 bert_large_uncased_fp16: FAILED: MIGraphX is not within tolerance - check verbose output