
Why are the RoPE params ignored while converting an HF checkpoint to a Composer checkpoint? #66

Open
ZhiYuanZeng opened this issue Mar 22, 2024 · 3 comments

Comments

@ZhiYuanZeng

I found that the RoPE params are ignored in composer_to_hf.py and that the RoPE base in composer_llama.py is hardcoded to 10000. However, it is common to tune the RoPE base for better long-context performance. Should we therefore set the RoPE params (inv_freq) in composer_to_hf.py?

@zhangzhenyu13

RoPE is in fact not trained; its tensors are fixed registered buffers.
It is fine to apply the default RoPE settings without any modifications.
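For reference, here is a minimal sketch of how a Llama-style rotary embedding typically builds its inv_freq buffer; the class and argument names are illustrative and may not match composer_llama.py exactly:

```python
import torch

class RotaryEmbedding(torch.nn.Module):
    def __init__(self, head_dim: int, base: float = 10000.0):
        super().__init__()
        # inv_freq is derived purely from head_dim and base; it is never trained.
        inv_freq = 1.0 / (base ** (torch.arange(0, head_dim, 2).float() / head_dim))
        # Registered as a non-persistent buffer, so no gradients flow through it.
        self.register_buffer("inv_freq", inv_freq, persistent=False)

    def forward(self, positions: torch.Tensor) -> torch.Tensor:
        # Outer product of positions and inverse frequencies gives the angles
        # used to build the cos/sin rotation tables.
        return torch.outer(positions.float(), self.inv_freq)
```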

@ZhiYuanZeng
Author

ZhiYuanZeng commented Mar 25, 2024

Yes, RoPE is parameter-free, but its base is often tuned to support long-context extrapolation. The base in ComposerMosaicLlama is fixed at 10000. That works for the standard Llama model, but it might not be correct for Llama variants.
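As a quick illustration of why the base matters (the numbers below are illustrative, not tied to any particular checkpoint): the base directly sets the frequency spectrum, so the longest rotation wavelength, which roughly bounds the usable context length, grows with it:

```python
import torch

head_dim = 128  # per-head dimension; illustrative value
for base in (10_000.0, 1_000_000.0):
    inv_freq = 1.0 / (base ** (torch.arange(0, head_dim, 2).float() / head_dim))
    # The longest wavelength (2*pi / smallest frequency) roughly bounds how far
    # apart two positions can be before the slowest rotation wraps around.
    longest_wavelength = (2 * torch.pi / inv_freq.min()).item()
    print(f"base={base:>11,.0f}  longest wavelength ≈ {longest_wavelength:,.0f} positions")
```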

@ZhiYuanZeng
Author

It would be better, though, to set the RoPE base from the config file rather than loading it from the checkpoint.

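A hypothetical sketch of that suggestion, reading the base from the HF config rather than copying inv_freq from the checkpoint; the helper name here is made up (the thread does not show composer_to_hf.py), but `rope_theta` is the field Llama-family HF configs use for the base:

```python
from transformers import AutoConfig

def rope_base_from_hf(model_name_or_path: str, default: float = 10000.0) -> float:
    """Read the RoPE base from the HF config instead of hardcoding 10000."""
    hf_cfg = AutoConfig.from_pretrained(model_name_or_path)
    # Llama-family HF configs expose the base as `rope_theta`; fall back to the
    # standard default when the field is absent.
    return float(getattr(hf_cfg, "rope_theta", default))

# The converted Composer config could then carry this value, so the model code
# rebuilds inv_freq with the correct base instead of assuming 10000.
```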
