Add SDXL regression test on mi308 #19747
Conversation
IanNod commented Jan 21, 2025
- Adds a mi308 runner for regression testing and specifies the SKU for tuned specs
- Adds a new unet IR for SDXL with a 960x1024 image shape
- Adds a new tuning spec for the new unet IR on the mi308 SKU
(seeing all the force pushes for "typo fixes" to get tests working) When iterating on a single workflow job, you can put a git trailer in the PR description to run a specific selection of jobs: https://iree.dev/developers/general/contributing/#ci-behavior-manipulation. That reduces the load on unaffected runners and can save a respectable sum from the managed runner time that we pay for.

The PkgCI workflows are also designed to be run without needing to rebuild the packages at all, but that takes some workflow_dispatch triggering and possibly workflow surgery. If you expect to be debugging for a longer duration, I can help with that.

Do you have access to the runner type you are adding these tests for? Or a similar machine? I would start by getting the tests working locally before pushing to GitHub if possible. If tests can only be iterated on via GitHub Actions, that is quite unhealthy for development workflows and should be improved.
@@ -165,3 +173,32 @@ jobs:
        --timeout=600 \
        --retries 7
      echo "$(<job_summary.md )" >> $GITHUB_STEP_SUMMARY
      # Note: allowing 10% deviation from observed averages here to account for
      # different runner conditions.
      - name: "Running SDXL rocm pipeline benchmark (mi300)"
change name here for mi308
LGTM thanks
A few notes but looks good enough to me.
Can you also edit the PR title to match local project style (https://github.com/iree-org/iree/commits/main/)? Notably start with an uppercase letter and include a verb, like "Add SDXL regression test on mi308"
      source ${VENV_DIR}/bin/activate
      pytest ./experimental/benchmarks/sdxl/benchmark_sdxl_rocm.py \
        --goldentime-tolerance-multiplier 1.1 \
        --goldentime-rocm-e2e-ms 797.0 \
Nit: I'd round to 800 (I'm skeptical we can get significant figures within 1ms at this scale, 5ms maybe).
Suggested change:
-        --goldentime-rocm-e2e-ms 797.0 \
+        --goldentime-rocm-e2e-ms 800.0 \
        --goldentime-rocm-punet-int8-fp16-ms 138.0 \
        --goldentime-rocm-punet-int8-fp8-ms 147 \
more nits about precision in benchmark expectations, but you've probably looked at the times closer than me 😅
Suggested change:
-        --goldentime-rocm-punet-int8-fp16-ms 138.0 \
-        --goldentime-rocm-punet-int8-fp8-ms 147 \
+        --goldentime-rocm-punet-int8-fp16-ms 140.0 \
+        --goldentime-rocm-punet-int8-fp8-ms 150.0 \
(These tests are good at guarding against large regressions, but they will flake frequently if the tolerances are too tight and the runners experience different conditions)
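For context on how these flags interact, here is a minimal sketch of a golden-time check with a tolerance multiplier; this illustrates the idea only, not the actual logic in benchmark_sdxl_rocm.py, and the function and parameter names are hypothetical:

```python
# Hypothetical sketch of a golden-time regression check; the function and
# parameter names are illustrative, not taken from the real test suite.
def check_golden_time(
    observed_ms: float, golden_ms: float, tolerance_multiplier: float = 1.1
) -> None:
    """Fail if the observed time exceeds the golden time by more than the
    allowed tolerance (e.g. 1.1 allows a 10% regression before failing)."""
    limit_ms = golden_ms * tolerance_multiplier
    assert observed_ms <= limit_ms, (
        f"Benchmark regressed: {observed_ms:.1f}ms > {limit_ms:.1f}ms "
        f"({golden_ms:.1f}ms golden x {tolerance_multiplier} tolerance)"
    )

# With the numbers suggested above: a golden e2e time of 800.0ms and a 1.1x
# multiplier tolerates observed times up to 880.0ms.
check_golden_time(observed_ms=810.0, golden_ms=800.0)
```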
Oh no, another 200+ line spec file 😨. How does this one differ from the mi300 version?
The differences are mainly that the shapes used for the 960x1024 IR don't match, and the MI308 tunings are different from MI300. It seems we will ideally need a similar spec file for every SKU if we want the best performance after tuning.
sdxl_unet_fp16_960_1024_inference_input_0 = fetch_source_fixture(
    "https://sharkpublic.blob.core.windows.net/sharkpublic/ian/unet_npys/input1.npy",
Nit, not blocking: common infra should really be pointing at a common file path with some versioning built in (like a date) and some developer documentation explaining how to manage the files. Otherwise anyone wanting to modify this in the future will need to ask you personally what to do 😛
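To make the suggestion concrete, here is a sketch of what a versioned common path could look like; the directory layout, version tag, and fixture_url helper are all hypothetical, and fetch_source_fixture is assumed to accept a URL as in the snippet above:

```python
# Sketch of the versioned-common-path idea; the directory layout, the
# FIXTURE_VERSION tag, and the fixture_url() helper are all hypothetical.
FIXTURE_BASE = "https://sharkpublic.blob.core.windows.net/sharkpublic"
FIXTURE_VERSION = "2025_01_21"  # date-based version tag, bumped when files change

def fixture_url(name: str) -> str:
    """Build a URL under a shared, versioned prefix instead of a personal dir."""
    return f"{FIXTURE_BASE}/sdxl_fixtures/{FIXTURE_VERSION}/{name}"

# Usage mirroring the snippet above (assuming fetch_source_fixture takes a URL):
# sdxl_unet_fp16_960_1024_inference_input_0 = fetch_source_fixture(
#     fixture_url("unet_960_1024_input1.npy")
# )
```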
if os.path.isfile(
    f"{iree_test_path_extension}/attention_and_matmul_spec_unet_fp16_{sku}.mlir"
):
This seems brittle. Is this checking whether the test runner has a file in the cache? Should that file be coming from the source code or one of the fetch_source_fixtures? I'm worried this will make the tests even harder to understand and run locally.
This is grabbing the file from the source code. I changed the existing spec name to append the SKU (mi300) and added the one for mi308. If we don't have a tuning spec file for the SKU in question, then it shouldn't use one, since tuning from one SKU doesn't seem to be directly applicable to another.
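For readers following along, here is a sketch of the selection-with-fallback logic being described; the path construction mirrors the quoted snippet, while the surrounding function and the compiler flag are assumptions for illustration only:

```python
import os

# Sketch of the per-SKU spec selection described above; the variable names
# follow the quoted snippet, but the flag-building around them is hypothetical.
def tuning_spec_flags(iree_test_path_extension: str, sku: str) -> list[str]:
    """Use the tuning spec for this SKU if one exists; otherwise run untuned."""
    spec_path = (
        f"{iree_test_path_extension}/attention_and_matmul_spec_unet_fp16_{sku}.mlir"
    )
    if os.path.isfile(spec_path):
        # Assumed compiler flag name, shown for illustration.
        return [f"--iree-codegen-transform-dialect-library={spec_path}"]
    # No spec for this SKU: tunings don't transfer between SKUs, so compile untuned.
    return []
```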