
apply 2D blocking to all kernels #156

Merged 5 commits on Jan 3, 2024
Conversation

@ahgamut (Contributor) commented Dec 30, 2023

Extracts a bit more speed in prompt eval time. Also fixes some typos.

@ahgamut (Contributor, Author) commented Dec 30, 2023

somehow GemmStridedBatchedEx produces garbage output when I try to use 2D blocks there.

@ahgamut ahgamut marked this pull request as ready for review December 30, 2023 19:25
@ahgamut (Contributor, Author) commented Dec 30, 2023

Alright, I'm unable to find the error that stops GemmStridedBatchedEx. It's a bounds check of some kind, I think.
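For reference, the usual failure mode in tiled kernels is an edge tile loaded without a bounds guard. Here is a minimal CPU sketch of the guarded-load pattern (a generic illustration only; `load_tile` and its layout are hypothetical, not the actual llamafile kernel):

```cpp
#include <cassert>

// Generic tile-load pattern (hypothetical names, not llamafile's code).
// When a BM x BK tile straddles the edge of an M x K row-major matrix,
// the out-of-range slots must be zero-filled; skipping this guard makes
// the accumulation read garbage, matching the symptom described above.
template <int BM, int BK>
void load_tile(const float *A, float tile[BM][BK],
               int M, int K, int i0, int k0) {
    for (int i = 0; i < BM; ++i)
        for (int k = 0; k < BK; ++k)
            tile[i][k] = (i0 + i < M && k0 + k < K)
                             ? A[(i0 + i) * K + (k0 + k)]  // in bounds
                             : 0.0f;                       // zero-pad edge
}
```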

@ahgamut (Contributor, Author) commented Jan 2, 2024

I applied 2D blocking to every kernel used. This gives a boost in both the CLIP and prompt_eval parts when using llava, but causes a slowdown in the eval part.

The slowdown is because the blocking for the GemmStridedBatchedEx kernel is not optimal: many threads do no work with the current values of BM, BN, BK. GemmStridedBatchedEx does much better with BM = 32, BN = 4, BK = 32 because fewer threads are wasted.
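The wasted-thread effect can be sketched with a small utilization calculation (an illustration only: the 64x64 "wide tile" is an assumption for contrast, not llamafile's actual default tiling):

```cpp
#include <cassert>

// If each thread block launches BM*BN threads to cover a BM x BN tile
// of the M x N output, utilization is the fraction of launched threads
// that actually write an output element.
static double utilization(int M, int N, int BM, int BN) {
    long blocksM = (M + BM - 1) / BM;  // grid size along rows
    long blocksN = (N + BN - 1) / BN;  // grid size along columns
    long launched = blocksM * blocksN * (long)BM * (long)BN;
    return (double)((long)M * N) / (double)launched;
}

// Single-token eval is effectively N == 1, so a wide tile idles almost
// every thread, while a narrow BM=32, BN=4 tile wastes far fewer:
//   utilization(4096, 1, 64, 64) == 0.015625   (1 thread in 64 works)
//   utilization(4096, 1, 32, 4)  == 0.25       (1 thread in 4 works)
```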

@ahgamut (Contributor, Author) commented Jan 2, 2024

The balancing act will be whether to have BM/BK/BN as template parameters for all the GPU functions, or to redefine those macros before GemmStridedBatchedEx and copy-paste the body of matmul_block2d.
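The template-parameter option might look like the following CPU sketch (hypothetical code under the naming in the comment above, not the actual llamafile kernel, which is GPU code): each call site instantiates its own tile sizes, so GemmStridedBatchedEx can use a narrow tiling without duplicating the loop body.

```cpp
#include <cassert>
#include <vector>

// Sketch: BM/BN/BK as compile-time template parameters instead of
// macros. C (M x N) = A (M x K) * B (K x N), row-major, C pre-zeroed.
template <int BM, int BN, int BK>
void matmul_block2d(const float *A, const float *B, float *C,
                    int M, int N, int K) {
    for (int i0 = 0; i0 < M; i0 += BM)
        for (int j0 = 0; j0 < N; j0 += BN)
            for (int k0 = 0; k0 < K; k0 += BK)
                // Inner loops stop at the tile edge or the matrix edge.
                for (int i = i0; i < i0 + BM && i < M; ++i)
                    for (int j = j0; j < j0 + BN && j < N; ++j) {
                        float acc = 0;
                        for (int k = k0; k < k0 + BK && k < K; ++k)
                            acc += A[i * K + k] * B[k * N + j];
                        C[i * N + j] += acc;  // partial sum per K-block
                    }
}
```

A batched-GEMM call site could then instantiate `matmul_block2d<32, 4, 32>` while the other kernels keep their own tile sizes, with no macro redefinition or copy-pasting.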

@ahgamut ahgamut changed the title apply 2D blocks to GemmBatchedEx apply 2D blocking to all kernels Jan 2, 2024
@jart (Collaborator) left a comment

Nice! It looks like this gives us a 13% performance boost for GPU inference (both eval and batch eval) for Windows users. Looks good to me.

@jart jart merged commit c0589f0 into Mozilla-Ocho:main Jan 3, 2024
1 check passed