[GPU] Enable GEMMs to first attempt LLVMGPUTileAndFuse with intrinsic by default #19520

Draft: wants to merge 2 commits into main


nirvedhmeshram (Contributor) commented Dec 18, 2024

Based on comparisons with iree-kernel-benchmark (see the linked sheet), the performance of VectorDistribute and TileAndFuse when using intrinsics seems comparable. None of the tests in the sheet used the padding extension that was added to TileAndFuse in #19484, so this is a fair comparison of the pipelines themselves. In some cases TileAndFuse showed a speedup that appears to be beyond the noise level, and on average it is 1.25x faster.

However, we will look at LLAMA and SDXL numbers before considering this PR for merging.

Fixes: #18858
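The "first attempt ... by default" wording in the title can be pictured as an ordered list of configuration strategies, where the first strategy that accepts a GEMM decides its pipeline. The sketch below is hypothetical Python, not IREE's C++ configuration code: the `Gemm` type, the `INTRINSIC_TYPES` table, `tile_and_fuse_with_intrinsic`, `fallback`, and `select_pipeline` are all illustrative assumptions, and the real acceptance checks are not described in this PR.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass(frozen=True)
class Gemm:
    """Hypothetical GEMM description: (M x K) * (K x N) -> (M x N)."""
    m: int
    n: int
    k: int
    lhs: str  # element type of the left operand, e.g. "f16"
    rhs: str  # element type of the right operand
    acc: str  # element type of the accumulator/result


# Hypothetical (lhs, rhs, acc) combinations an intrinsic accepts.
# Illustrative only; not the real list for any GPU target.
INTRINSIC_TYPES = {("f16", "f16", "f32"), ("bf16", "bf16", "f32")}

# A strategy returns a pipeline name, or None to decline the op.
Strategy = Callable[[Gemm], Optional[str]]


def tile_and_fuse_with_intrinsic(g: Gemm) -> Optional[str]:
    # This sketch only checks element types; shape handling is out of scope.
    if (g.lhs, g.rhs, g.acc) in INTRINSIC_TYPES:
        return "LLVMGPUTileAndFuse"
    return None


def fallback(g: Gemm) -> Optional[str]:
    # Stand-in for whatever pipeline would otherwise have been chosen.
    return "fallback"


def select_pipeline(g: Gemm, strategies: List[Strategy]) -> str:
    # Try strategies in order; the first one that accepts the op wins.
    for strategy in strategies:
        choice = strategy(g)
        if choice is not None:
            return choice
    raise ValueError(f"no strategy accepted {g}")


# "First attempt TileAndFuse" corresponds to putting it at the front.
DEFAULT_ORDER: List[Strategy] = [tile_and_fuse_with_intrinsic, fallback]
```

For example, `select_pipeline(Gemm(4096, 4096, 4096, "f16", "f16", "f32"), DEFAULT_ORDER)` picks `"LLVMGPUTileAndFuse"`, while an `f32` GEMM is declined by the first strategy and reaches `"fallback"`. Keeping the order in a list makes "by default" a single place to change.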

nirvedhmeshram force-pushed the enable_tile_and_fuse_matmul branch from 38f5a22 to 7d687d7 on December 18, 2024 22:21
nirvedhmeshram changed the title from "[GPU] Enable GEMMs to use LLVMGPUTileAndFuse by default" to "[GPU] Enable GEMMs to first attempt LLVMGPUTileAndFuse with intrinsic by default" on Dec 18, 2024
nirvedhmeshram marked this pull request as ready for review on December 18, 2024 22:35
nirvedhmeshram marked this pull request as draft on December 19, 2024 16:26
nirvedhmeshram (Contributor Author) commented:

There are compiler failures in the regression suite models; converting to draft while I debug.

nirvedhmeshram force-pushed the enable_tile_and_fuse_matmul branch from 7d687d7 to 7e2cdf8 on December 19, 2024 21:46
nirvedhmeshram (Contributor Author) commented Dec 19, 2024

The problem was missing functionality for GEMMs of type (f16, f16) -> f16; I filed #19532 for it.
We probably can't land this without a solution for that, but since we also solved this problem at the model level, I'm going to keep pushing on this to find other issues.

nirvedhmeshram force-pushed the enable_tile_and_fuse_matmul branch 2 times, most recently from e6aa895 to 3bc822c on December 20, 2024 16:41
nirvedhmeshram (Contributor Author) commented:

Found another issue with accumulating GEMMs: #19546

nirvedhmeshram force-pushed the enable_tile_and_fuse_matmul branch 6 times, most recently from 2adc85d to 2111358 on January 6, 2025 23:19
nirvedhmeshram (Contributor Author) commented:

We also need to disable prefetching when using C promotion, due to #19612.

nirvedhmeshram force-pushed the enable_tile_and_fuse_matmul branch 2 times, most recently from 017e558 to 982856b on January 8, 2025 17:30
Signed-off-by: Nirvedh <[email protected]>
Signed-off-by: Nirvedh Meshram <[email protected]>
nirvedhmeshram force-pushed the enable_tile_and_fuse_matmul branch from c2326e1 to 181a2ed on January 10, 2025 17:19
…t of TileLargeTensorPass

Signed-off-by: Nirvedh Meshram <[email protected]>
nirvedhmeshram force-pushed the enable_tile_and_fuse_matmul branch from 181a2ed to adbc5c8 on January 10, 2025 17:24
Development: successfully merging this pull request may close these issues:

Enable TileAndFuse pipeline with intrinsic targeting for non-intrinsic sized GEMM shapes
1 participant