Debugging N_BLOCKS #75

Draft: wants to merge 4 commits into main


@WilsonCWu commented Dec 10, 2024

Why is the first run (N=16, N_BLOCK=2) failing when the other three configurations pass?

```
/matmul
N=16, N_BLOCK=2
--------------------  M=4096 N=4096 K=4096  --------------------
Block size: 128x32
Allocated host memory
Initialized matrices
Performed CPU matrix multiplication
Allocated device memory
Copied matrices to device
Launching warmup kernel with grid (132, 1), block (384)
Launching kernel with grid (132, 1), block (384)
Avg Kernel execution time: 742.272 us
Achieved performance: 185.16 TFLOPs
Copied result back to host
Converted result back to float
Error at row 0 col 0: -3.09375 != -1.39006 (ref)
Error at row 0 col 1: -2.82812 != 1.66512 (ref)
Error at row 0 col 2: 8.3125 != 2.44395 (ref)
Error at row 0 col 3: -6.21875 != -8.22833 (ref)
Error at row 0 col 4: -3.35938 != 2.22921 (ref)
Error at row 0 col 5: 0.660156 != 6.12578 (ref)
Error at row 0 col 6: 18.875 != -7.62248 (ref)
Error at row 0 col 7: 7.25 != -1.25781 (ref)
Error at row 0 col 8: 1.91406 != 3.26331 (ref)
Error at row 0 col 9: 0.460938 != 3.74322 (ref)
Error at row 0 col 10: 2.375 != -1.60718 (ref)
Error at row 0 col 11: 0.691406 != 3.00533 (ref)
Error at row 0 col 12: -3.89062 != 6.23455 (ref)
Error at row 0 col 13: -4.125 != 2.55398 (ref)
Error at row 0 col 14: 1.05469 != 6.0381 (ref)
Error at row 0 col 15: -0.236328 != 4.82473 (ref)
Error at row 0 col 17: 5.15625 != -4.56702 (ref)
Error at row 0 col 18: -5.625 != 1.13531 (ref)
Error at row 0 col 19: -1.19531 != 7.92724 (ref)
Error at row 0 col 20: 5.96875 != 0.462465 (ref)
Too many errors to show them all.
Max error: 40.0543
Error count: 15008549
N=16, N_BLOCK=1
--------------------  M=4096 N=4096 K=4096  --------------------
Block size: 128x16
Allocated host memory
Initialized matrices
Performed CPU matrix multiplication
Allocated device memory
Copied matrices to device
Launching warmup kernel with grid (132, 1), block (384)
Launching kernel with grid (132, 1), block (384)
Avg Kernel execution time: 1172.43 us
Achieved performance: 117.226 TFLOPs
Copied result back to host
Converted result back to float
Max error: 0.0982647
Error count: 0
N=32, N_BLOCK=1
--------------------  M=4096 N=4096 K=4096  --------------------
Block size: 128x32
Allocated host memory
Initialized matrices
Performed CPU matrix multiplication
Allocated device memory
Copied matrices to device
Launching warmup kernel with grid (132, 1), block (384)
Launching kernel with grid (132, 1), block (384)
Avg Kernel execution time: 633.43 us
Achieved performance: 216.976 TFLOPs
Copied result back to host
Converted result back to float
Max error: 0.0982647
Error count: 0
N=64, N_BLOCK=2
--------------------  M=4096 N=4096 K=4096  --------------------
Block size: 128x128
Allocated host memory
Initialized matrices
Performed CPU matrix multiplication
Allocated device memory
Copied matrices to device
Launching warmup kernel with grid (132, 1), block (384)
Launching kernel with grid (132, 1), block (384)
Avg Kernel execution time: 229.372 us
Achieved performance: 599.198 TFLOPs
Copied result back to host
Converted result back to float
Max error: 0.0982647
Error count: 0
```
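A quick sanity check on the numbers in the log, written as a sketch rather than taken from the kernel's host code. It assumes that the printed block width is BN = N * N_BLOCK with BM = 128 (this matches every "Block size" line), that throughput is computed as 2·M·N·K / time, and that the 132-CTA grid is persistent, so each CTA loops over output tiles. Under those assumptions it reproduces the logged TFLOPs and shows that the failing N=16, N_BLOCK=2 run and the passing N=32, N_BLOCK=1 run schedule the same 128x32 tile and the same number of tiles. That hints the bug is in how the two N=16 sub-blocks inside one tile are handled rather than in tile scheduling, although N=64, N_BLOCK=2 passes, so N_BLOCK > 1 alone is not enough to trigger it.

```python
# Sanity-check sketch for the benchmark log above. Assumptions (not taken from
# the kernel source): the printed block width is BN = N * N_BLOCK with BM = 128,
# TFLOPs = 2*M*N*K / time, and the 132-CTA grid is persistent (each CTA loops
# over output tiles).
M = N_DIM = K = 4096
BM = 128
NUM_CTAS = 132  # grid (132, 1) in every run


def tflops(time_us: float) -> float:
    """Throughput of one M x N x K GEMM that took time_us microseconds."""
    return 2 * M * N_DIM * K / (time_us * 1e-6) / 1e12


def tile_stats(n: int, n_block: int) -> tuple[int, int, float]:
    """(BN, number of output tiles, tiles per CTA) for one configuration."""
    bn = n * n_block
    tiles = (M // BM) * (N_DIM // bn)
    return bn, tiles, tiles / NUM_CTAS


# (N, N_BLOCK, avg kernel time in us, error count) copied from the log
RUNS = [
    (16, 2, 742.272, 15008549),
    (16, 1, 1172.43, 0),
    (32, 1, 633.43, 0),
    (64, 2, 229.372, 0),
]

for n, n_block, time_us, errors in RUNS:
    bn, tiles, per_cta = tile_stats(n, n_block)
    wrong = errors / (M * N_DIM)  # fraction of output elements flagged as wrong
    print(
        f"N={n:<2} N_BLOCK={n_block}  block {BM}x{bn:<3}  tiles={tiles:<4}  "
        f"tiles/CTA={per_cta:5.2f}  {tflops(time_us):7.2f} TFLOPs  wrong={wrong:.1%}"
    )
```

For the failing run this reports a 128x32 block, 4096 tiles (about 31 per CTA) and 89.5% of the 4096x4096 outputs wrong, so the corruption covers most of the matrix rather than a few edge tiles; the N=32, N_BLOCK=1 line has the same block and tile count with no errors.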
