
How about the grouplinear? #1386

Open
south-ocean opened this issue Dec 26, 2024 · 2 comments

@south-ocean commented Dec 26, 2024

Hi, I noticed that the multi-stream implementation of grouped linear doesn't use batched GEMM or grouped GEMM. Is there a particular reason for this?

@yaox12 (Collaborator) commented Dec 31, 2024

Batched GEMM doesn't support different GEMM sizes within a single batch.
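To make the size mismatch concrete, here is a small, hypothetical PyTorch illustration (not from this thread or from TE): when per-expert token counts differ, the matrices can't be stacked into the single uniformly shaped batch that a batched GEMM such as `torch.bmm` expects, so each expert needs its own GEMM (or a grouped GEMM kernel).

```python
import torch

# Hypothetical sizes for illustration only.
hidden, ffn = 1024, 4096
tokens_per_expert = [7, 130, 25]   # differs per expert, so GEMM shapes differ

weights = [torch.randn(hidden, ffn) for _ in tokens_per_expert]
inputs = [torch.randn(n, hidden) for n in tokens_per_expert]

# torch.bmm would require stacking the inputs into one (num_experts, n, hidden)
# tensor with a single n, which these shapes don't allow. The straightforward
# alternative is one GEMM per expert:
outputs = [x @ w for x, w in zip(inputs, weights)]   # (7, 4096), (130, 4096), (25, 4096)
```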

For grouped gemm, we have two (potential) implementations:

  • cublasGemmGroupedBatchedEx(): its performance is not as good as the multi-stream implementation's, and it doesn't support FP8 for now.
  • Cutlass: we evaluated the performance with GEMM sizes from popular MoE models, including Mixtral 8x7B/8x22B, Qwen2-57B-A14B, and DeepSeek V2, and multi-stream calls to cuBLASLt show better performance in most cases.

We will keep exploring the performance of grouped GEMM, and will add it to TE if a better implementation emerges.
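A minimal sketch of the multi-stream pattern described above, assuming plain PyTorch CUDA tensors rather than Transformer Engine's actual cuBLASLt path: each expert's GEMM is launched on one of a few CUDA streams so that small GEMMs can overlap on the GPU. The function name and stream count below are illustrative, not TE API.

```python
import torch

def grouped_linear_multistream(inputs, weights, num_streams=4):
    """inputs[i]: (n_i, k) CUDA tensor, weights[i]: (k, m) CUDA tensor."""
    streams = [torch.cuda.Stream() for _ in range(num_streams)]
    main = torch.cuda.current_stream()
    outputs = [None] * len(inputs)
    for i, (x, w) in enumerate(zip(inputs, weights)):
        s = streams[i % num_streams]
        s.wait_stream(main)              # inputs were produced on the main stream
        with torch.cuda.stream(s):
            outputs[i] = x @ w           # each matmul dispatches its own cuBLAS GEMM
    for s in streams:
        main.wait_stream(s)              # make results visible to the main stream
    return outputs
```

This is a simplification: a real implementation would also handle the caching allocator's cross-stream semantics (e.g., Tensor.record_stream) and FP8 scaling, which is where a cuBLASLt-based path does more work.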

@south-ocean (Author)

Got it, thanks.
