I found that the RoPE parameters are ignored in composer_to_hf.py and that the RoPE base in composer_llama.py is hard-coded to 10000. However, it is common to tune the RoPE base for better long-context performance. Should we therefore set the RoPE parameters (inv_freq) in composer_to_hf.py?
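For context, this is roughly how the base determines the RoPE inverse frequencies, so a hard-coded base silently discards any tuned value. A minimal sketch; `rope_inv_freq` is a hypothetical helper for illustration, not a function in composer_llama.py:

```python
import torch

def rope_inv_freq(dim: int, base: float = 10000.0) -> torch.Tensor:
    """Standard RoPE inverse frequencies, one per pair of head dims.

    A larger base stretches the rotation periods, which is the usual
    knob for long-context extrapolation.
    """
    return 1.0 / (base ** (torch.arange(0, dim, 2, dtype=torch.float32) / dim))

# With the base fixed at 10000, a tuned value like 1_000_000 is lost:
print(rope_inv_freq(128, base=10000.0)[:4])
print(rope_inv_freq(128, base=1_000_000.0)[:4])  # much slower-decaying frequencies
```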
Yes, RoPE is parameter-free, but its base is often tuned to support long-context extrapolation. The base in ComposerMosaicLlama is fixed at 10000. That works for the standard Llama model, but it may not be correct for Llama variants.
That said, it is better to set the RoPE base from the config file rather than loading it from the checkpoint.
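Something along these lines could work in the conversion script. This is only a sketch, assuming the Composer config exposes the tuned base under a key like `rope_base`; the actual config layout in composer_to_hf.py may differ:

```python
from transformers import LlamaConfig

def build_hf_config(composer_cfg: dict) -> LlamaConfig:
    # Hypothetical field names (d_model, n_layers, n_heads, rope_base);
    # adapt to whatever the Composer YAML actually uses.
    return LlamaConfig(
        hidden_size=composer_cfg["d_model"],
        num_hidden_layers=composer_cfg["n_layers"],
        num_attention_heads=composer_cfg["n_heads"],
        # Propagate the RoPE base instead of silently keeping the 10000
        # default; HF's Llama reads this from `rope_theta` and rebuilds
        # inv_freq itself, so the checkpoint value is not needed.
        rope_theta=composer_cfg.get("rope_base", 10000.0),
    )
```

Since the HF model derives inv_freq from `rope_theta` at load time, passing the base through the config avoids having to export inv_freq buffers at all.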