Support for Llama 3.1 and 3.2 fine tuning #114
Comments
Hi @DimensionSTP!
Hello again,

Thank you for your response and clarification regarding the support for Llama 3.1 and 3.2. I attempted to follow the Llama fine-tuning example provided in your comment, but unfortunately, I encountered an issue related to the rope_scaling configuration.

In Llama 3, the configuration includes:

However, in Llama 3.1 and 3.2, the rope_scaling field differs significantly:

Llama 3.1:

Llama 3.2:

When attempting to fine-tune these models, I encountered the following error:

I tried this both with the Transformers version currently supported by Optimum-TPU and after upgrading Transformers to the latest version, but the same error persisted. It seems that Optimum-TPU relies on an older version of Transformers and may not fully support the more complex rope_scaling configurations introduced in Llama 3.1 and 3.2.

Would it be possible to update Optimum-TPU to handle these changes in rope_scaling? Alternatively, could you provide guidance on modifying the library locally to accommodate the new configuration while waiting for official support?

Thank you for your efforts in maintaining this amazing project. I'm looking forward to the updates!
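For readers following along, the kind of rope_scaling mismatch described above can be sketched in plain Python. This is only an illustration, not the actual Transformers or Optimum-TPU code: `validate_legacy_rope_scaling` and `is_accepted` are hypothetical helpers that mimic a strict check accepting only a two-field dictionary, and the `LLAMA_3_1_STYLE` values are assumed from the publicly released Llama 3.1 config, not copied from the comment or from the error reported there.

```python
# Illustrative sketch only: NOT the real Transformers or Optimum-TPU source.
# All names below are hypothetical and exist just for this example.

def validate_legacy_rope_scaling(rope_scaling):
    """Mimic a strict, older-style check: None or exactly {'type', 'factor'}."""
    if rope_scaling is None:
        return  # no scaling configured, nothing to validate
    if not isinstance(rope_scaling, dict) or set(rope_scaling) != {"type", "factor"}:
        raise ValueError(
            "rope_scaling must be a dict with exactly the fields "
            f"'type' and 'factor', got {rope_scaling}"
        )
    factor = rope_scaling["factor"]
    if not isinstance(factor, float) or factor <= 1.0:
        raise ValueError(f"rope_scaling factor must be a float > 1, got {factor}")


# Assumed values in the style of the published Llama 3.1 config.json.
LLAMA_3_1_STYLE = {
    "factor": 8.0,
    "low_freq_factor": 1.0,
    "high_freq_factor": 4.0,
    "original_max_position_embeddings": 8192,
    "rope_type": "llama3",
}


def is_accepted(rope_scaling) -> bool:
    """Return True if the hypothetical legacy check accepts the config."""
    try:
        validate_legacy_rope_scaling(rope_scaling)
        return True
    except ValueError:
        return False


if __name__ == "__main__":
    print(is_accepted(None))                               # no rope_scaling
    print(is_accepted({"type": "linear", "factor": 2.0}))  # two-field form
    print(is_accepted(LLAMA_3_1_STYLE))                    # richer, newer form
```

The point of the sketch is that a check written for the two-field form rejects the richer dictionary before any model code runs, so upgrading one package alone may not help if another component still performs a check of this kind.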
Hi @DimensionSTP, as I mentioned before, support for Llama 3.1 is on the roadmap, and we will work on it soon.
Hello,
I am deeply interested in your Optimum-TPU project.
Currently, I am planning to fine-tune the Llama 3.1 and 3.2 models for my native language and a specific domain, using a fairly large dataset (approximately 60B tokens).
I am using Google TPU Pods, but I have been facing significant challenges in implementing model parallel training from scratch, saving unified checkpoints in the safetensors format, setting up appropriate logging, and configuring hyperparameters.
While exploring solutions, I came across the Optimum-TPU project, which seems incredibly useful. However, I noticed that it currently only supports up to Llama 3.
Are there any plans to extend support to Llama 3.1 and 3.2 for fine-tuning?
I strongly hope that future updates will include support for these versions as well.
Thank you for considering this request!