Does the repo support OFA pre-training? #1
Thanks for your support! I have just released a version of the pre-training script; please try pretrain.sh. However, I do not recommend using this framework for pre-training: we have tried pre-training under it before, and its performance was not as good as the original repo (the fairseq version).
Thank you for your prompt reply. Does the suboptimal performance refer to training from scratch? Have you tried loading a checkpoint (from the fairseq version) and continuing pre-training with new data or tasks? By the way, will OFA pre-training continue to be maintained in the fairseq version, or will it gradually move to the Huggingface version? Thank you!
Yes, the performance refers to training from scratch; we haven't tried continuing pre-training on new data from a loaded checkpoint. If you are interested, you are welcome to try it. In the future, OFA pre-training will still be maintained in the fairseq version, but the compression features (including distillation, pruning, and quantization) will be supported by this repo.
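For anyone who wants to experiment with that, here is a minimal sketch of what continuing pre-training from a converted checkpoint could look like. The `OFAModel` import, the checkpoint path, `build_new_data_loader`, and the `.loss` output field are all assumptions for illustration, not confirmed parts of this repo's API; pretrain.sh remains the actual entry point.

```python
# Rough sketch only -- OFAModel, the checkpoint path, and build_new_data_loader
# are placeholders/assumptions, not confirmed parts of this repo's API.
from torch.optim import AdamW

from ofa import OFAModel  # hypothetical import of the Huggingface-style OFA model

# Weights converted from the fairseq checkpoint (the conversion step is not shown).
model = OFAModel.from_pretrained("path/to/converted_fairseq_ckpt")
model.train()
optimizer = AdamW(model.parameters(), lr=1e-5)  # small LR for continued pre-training

data_loader = build_new_data_loader()  # hypothetical loader over the new data/tasks

for batch in data_loader:
    loss = model(**batch).loss  # assumes the model output exposes a .loss field
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```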
Thank you! In addition, I am a little confused about some details in three places (process_image_text_pair, L12&45 of pretrain.sh, and init_task.py): their configurations and dimensions seem inconsistent.
Awesome job! The Huggingface version of OFA looks more concise. Can this framework support multi-task pre-training like the original repo?
I see that this code file contains many different tasks. Could you provide more details about pre-training (such as data preparation and the submission script)?
Thank you again, and I look forward to your reply!
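To illustrate the kind of multi-task mixing being asked about, here is a small, self-contained sketch of sampling batches from several task-specific datasets by weight. The task names, datasets, and weights are dummies for illustration and are not taken from this repo's implementation.

```python
# Illustrative multi-task batch mixing -- task names, datasets, and weights are
# dummy placeholders, not this repo's actual pre-training implementation.
import random

import torch
from torch.utils.data import DataLoader, TensorDataset

# Dummy stand-ins for real task-specific datasets (e.g. captioning, VQA, matching).
task_loaders = {
    "caption": DataLoader(TensorDataset(torch.randn(128, 8)), batch_size=16, shuffle=True),
    "vqa": DataLoader(TensorDataset(torch.randn(128, 8)), batch_size=16, shuffle=True),
    "image_text_matching": DataLoader(TensorDataset(torch.randn(128, 8)), batch_size=16, shuffle=True),
}

def mixed_batches(loaders, weights, steps):
    """Yield (task_name, batch) pairs, sampling one task per step by weight."""
    iters = {name: iter(dl) for name, dl in loaders.items()}
    names = list(loaders)
    for _ in range(steps):
        task = random.choices(names, weights=weights, k=1)[0]
        try:
            batch = next(iters[task])
        except StopIteration:  # restart an exhausted task loader
            iters[task] = iter(loaders[task])
            batch = next(iters[task])
        yield task, batch

for task, batch in mixed_batches(task_loaders, weights=[0.4, 0.4, 0.2], steps=100):
    pass  # the forward/backward pass with the task-specific loss would go here
```

This only shows the batch-sampling idea; how the per-task losses are weighted and combined is a separate design choice.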