apply 2D blocking to all kernels #156
somehow |
Alright, I'm unable to find the error that stops |
also fix some typos.
removed the matmul_single methods
I applied the 2D blocking for every kernel used. This gives a boost in both the The slowdown is because the blocking for the |
A balancing act will be have |
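For readers unfamiliar with the term, the idea behind 2D blocking can be sketched on the CPU. The PR's actual kernels are not shown in this conversation, so the following is only an illustration under stated assumptions: the function name `matmul_2d_blocked` and the parameters `tile_m` and `tile_n` are hypothetical, and each iteration of the two outer loops stands in for the work one GPU thread would do. Each "thread" computes a small `tile_m` x `tile_n` block of the output, so every value it loads from `a` or `b` is reused across the whole block rather than once.

```python
def matmul_2d_blocked(a, b, tile_m=4, tile_n=4):
    """Multiply a (M x K) by b (K x N), computing the output in 2D tiles.

    Illustrative sketch only: tile_m and tile_n are hypothetical tuning
    parameters, not values taken from the PR.
    """
    m, k, n = len(a), len(b), len(b[0])
    c = [[0.0] * n for _ in range(m)]

    # Each (row0, col0) pair plays the role of one GPU thread's work item.
    for row0 in range(0, m, tile_m):
        for col0 in range(0, n, tile_n):
            # Clamp the tile at the matrix edges.
            rows = range(row0, min(row0 + tile_m, m))
            cols = range(col0, min(col0 + tile_n, n))

            # Per-tile accumulators (these would live in registers on a GPU).
            acc = [[0.0] * len(cols) for _ in rows]

            for p in range(k):
                # Load the needed slice of a and b once per step of p ...
                a_frag = [a[i][p] for i in rows]
                b_frag = [b[p][j] for j in cols]
                # ... and reuse it across the whole tile (an outer product).
                for ti, av in enumerate(a_frag):
                    for tj, bv in enumerate(b_frag):
                        acc[ti][tj] += av * bv

            # Write the finished tile back to the output.
            for ti, i in enumerate(rows):
                for tj, j in enumerate(cols):
                    c[i][j] = acc[ti][tj]
    return c
```

In this sketch, a larger tile means more reuse per loaded value but also more accumulators per work item, which on a GPU generally translates to higher register pressure, so the best tile size tends to depend on the kernel and the hardware.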
Nice! It looks like this gives us a 13% performance boost for GPU inference (both eval and batch eval) for Windows users. Looks good to me.
extracts a bit more speed in prompt eval time. also fixes some typos.