Is the project not implemented for 70B Llama? #62
Hi, the modeling file currently does not support GQA, but it should require only minimal changes to add. What you described should work perfectly :)
It seems we need a hierarchical pruning scheme for GQA: group pruning, plus head pruning inside each group, since the number of heads in each group has to stay the same.
In order to let the pruned model still run with tensor parallelism (TP), it would be better to keep the number of groups unchanged.
Pruning query heads independently might leave a different number of queries in different groups. So maybe group-based pruning is more reasonable? @zhangzhenyu13
Could we share the mask of query heads among the different groups?
Yes, your settings are right. We need to share z across groups.
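A minimal sketch of what sharing z across groups could look like (the shapes mimic Llama-2-70B's 64 query heads / 8 KV groups; `z_per_group` and the 0.5 threshold are illustrative, not from this repo):

```python
import torch

# Hypothetical GQA layout: 8 KV groups, 8 query heads per group (64 total).
num_groups = 8
heads_per_group = 8

# One mask over the head slots *inside* a group, shared by all groups
# (values in [0, 1] during mask training).
z_per_group = torch.rand(heads_per_group)

# Broadcast to a full per-query-head mask: (8, 8) -> (64,).
head_z = z_per_group.unsqueeze(0).expand(num_groups, -1).reshape(-1)

# Because every group sees the same mask, every group keeps the same
# number of query heads, so the grouped layout (and TP) survives pruning.
kept_per_group = (head_z.view(num_groups, heads_per_group) > 0.5).sum(dim=-1)
assert torch.all(kept_per_group == kept_per_group[0])
```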
Hi @zhangzhenyu13, I have some confusion. The author's composer llama file does not implement any GQA functionality. Did you implement the GQA forward yourself? Which Llama implementation would be best to refer to?
No GQA implementation is found, so the model cannot scale to 70B for composerLLAMA. Maybe we need to design GQA support and introduce head_z for wq and head_z_kv for wk and wv?
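As a rough sketch of that idea (assuming PyTorch; the class, the 70B-like shapes, and the exact mask placement are illustrative, not the repo's actual code), a GQA forward could take `head_z` for the query heads and `head_z_kv` for the key/value heads:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GQAAttentionSketch(nn.Module):
    """Hypothetical GQA attention with pruning masks: `head_z` scales
    query heads, `head_z_kv` scales KV heads. Not the repo's code."""

    def __init__(self, d_model=8192, n_heads=64, n_kv_heads=8):
        super().__init__()
        self.n_heads, self.n_kv_heads = n_heads, n_kv_heads
        self.head_dim = d_model // n_heads
        self.wq = nn.Linear(d_model, n_heads * self.head_dim, bias=False)
        self.wk = nn.Linear(d_model, n_kv_heads * self.head_dim, bias=False)
        self.wv = nn.Linear(d_model, n_kv_heads * self.head_dim, bias=False)
        self.wo = nn.Linear(n_heads * self.head_dim, d_model, bias=False)

    def forward(self, x, head_z=None, head_z_kv=None):
        B, T, _ = x.shape
        q = self.wq(x).view(B, T, self.n_heads, self.head_dim)
        k = self.wk(x).view(B, T, self.n_kv_heads, self.head_dim)
        v = self.wv(x).view(B, T, self.n_kv_heads, self.head_dim)

        if head_z is not None:       # (n_heads,) mask over query heads
            q = q * head_z.view(1, 1, -1, 1)
        if head_z_kv is not None:    # (n_kv_heads,) mask over KV heads
            k = k * head_z_kv.view(1, 1, -1, 1)
            v = v * head_z_kv.view(1, 1, -1, 1)

        # Repeat each KV head across its group of query heads.
        rep = self.n_heads // self.n_kv_heads
        k = k.repeat_interleave(rep, dim=2)
        v = v.repeat_interleave(rep, dim=2)

        q, k, v = (t.transpose(1, 2) for t in (q, k, v))  # (B, H, T, D)
        out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        out = out.transpose(1, 2).reshape(B, T, -1)
        return self.wo(out)
```

If `head_z` is built by broadcasting a shared per-group mask, as in the earlier sketch, the pruned model keeps the same number of query heads in every group and the group count itself is untouched, which is what the TP constraint above requires.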