
Why are the RoPE params ignored while converting an HF checkpoint to a Composer checkpoint? #66

Open
ZhiYuanZeng opened this issue Mar 22, 2024 · 3 comments

Comments

@ZhiYuanZeng

I found that the RoPE params are ignored in composer_to_hf.py and that the RoPE base in composer_llama.py is hardcoded to 10000. However, it is common to tune the RoPE base for better long-context performance. Should we therefore set the RoPE params (inv_freq) in composer_to_hf.py?

@zhangzhenyu13

RoPE is in fact not trained; its tensors are fixed registered buffers.
It is fine to apply the default RoPE settings without any modifications.
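For reference, here is a minimal sketch of how a Llama-style rotary embedding typically builds its inv_freq buffer; the class and argument names are illustrative and may not match composer_llama.py exactly:

```python
import torch

class RotaryEmbedding(torch.nn.Module):
    def __init__(self, head_dim: int, base: float = 10000.0):
        super().__init__()
        # inv_freq is derived purely from head_dim and base; it is never trained.
        inv_freq = 1.0 / (base ** (torch.arange(0, head_dim, 2).float() / head_dim))
        # Registered as a non-persistent buffer, so no gradients flow through it.
        self.register_buffer("inv_freq", inv_freq, persistent=False)

    def forward(self, positions: torch.Tensor) -> torch.Tensor:
        # Outer product of positions and inverse frequencies gives the angles
        # used to build the cos/sin rotation tables.
        return torch.outer(positions.float(), self.inv_freq)
```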

@ZhiYuanZeng
Author

ZhiYuanZeng commented Mar 25, 2024

Yes, RoPE is parameter-free, but its base is often tuned to support long-context extrapolation. The base in ComposerMosaicLlama is fixed at 10000. That works for the standard Llama model, but it might not be correct for Llama variants.
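As a quick illustration of why the base matters (the numbers below are illustrative, not tied to any particular checkpoint): the base directly sets the frequency spectrum, so the longest rotation wavelength, which roughly bounds the usable context length, grows with it:

```python
import torch

head_dim = 128  # per-head dimension; illustrative value
for base in (10_000.0, 1_000_000.0):
    inv_freq = 1.0 / (base ** (torch.arange(0, head_dim, 2).float() / head_dim))
    # The longest wavelength (2*pi / smallest frequency) roughly bounds how far
    # apart two positions can be before the slowest rotation wraps around.
    longest_wavelength = (2 * torch.pi / inv_freq.min()).item()
    print(f"base={base:>11,.0f}  longest wavelength ≈ {longest_wavelength:,.0f} positions")
```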

@ZhiYuanZeng
Author

It would be better, though, to set the RoPE base from the config file rather than loading it from the checkpoint.

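A hypothetical sketch of that suggestion, reading the base from the HF config rather than copying inv_freq from the checkpoint; the helper name here is made up (the thread does not show composer_to_hf.py), but `rope_theta` is the field Llama-family HF configs use for the base:

```python
from transformers import AutoConfig

def rope_base_from_hf(model_name_or_path: str, default: float = 10000.0) -> float:
    """Read the RoPE base from the HF config instead of hardcoding 10000."""
    hf_cfg = AutoConfig.from_pretrained(model_name_or_path)
    # Llama-family HF configs expose the base as `rope_theta`; fall back to the
    # standard default when the field is absent.
    return float(getattr(hf_cfg, "rope_theta", default))

# The converted Composer config could then carry this value, so the model code
# rebuilds inv_freq with the correct base instead of assuming 10000.
```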
