No validation set in openwebtext leads to failure. #195
Comments
I think this is because David changed this from the original. It looks like Line 94 in bf9eff0
get_auto_dataset it should work fine and create a standard tokenized Hugging Face cache with an extra validation set.
Basically, older Mistral just built a conventional Hugging Face cache; David created a new custom data handling setup and, I guess, didn't add creation of a validation set ...
I've started this branch, which should have the Mistral Feb 2022 code plus some bug fixes and has worked with Flash Attention: https://github.com/stanford-crfm/mistral/tree/mistral-flash-dec-2022
I'm on vacation mode, but I am happy to help you get this branch working ... you will need to install Flash Attention and a specially modified Hugging Face as well ...
Some instructions on getting this working (remember to use branch: https://github.com/stanford-crfm/mistral/tree/mistral-flash-dec-2022)
I think this will work with newer PyTorch, etc ... but you need to make sure you build Flash Attention with whatever you are using ...
Please let me know if you run into any issues and we can clean up this branch and the instructions ... but if all goes well you should get super fast Flash Attention GPT2 training, which is something like 2x faster ... In the future we should think about reconciling this branch with current main ... but if you just want something working in the next day, this is the quickest route ...
Sample command: Note add a file called
You need to use bf16 ... a bad feature of this branch right now is that this is just hard-coded here: mistral/src/args/training_args.py Line 67 in 3a7dfac
So it'd be a good idea to make this more transparent ... this branch is sort of my personal experimentation that I got running, and it could use some cleanup ...
Flash Attention requires bf16 or fp16 ... and you need bf16 for the stability ...
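One way the hard-coded precision could be made more transparent is to derive the flags from a single config value and reject combinations that Flash Attention cannot run. The following is only a sketch: the function name `resolve_precision_flags` and the `precision` / `use_flash_attention` parameters are hypothetical and not part of the Mistral code, and it makes no claim about what line 67 of `training_args.py` actually contains. The returned keys follow the `bf16` / `fp16` boolean fields of Hugging Face `TrainingArguments`.

```python
from typing import Dict

# Precisions that Flash Attention can run in, per the comment above.
HALF_PRECISION_MODES = ("bf16", "fp16")
SUPPORTED_MODES = HALF_PRECISION_MODES + ("fp32",)


def resolve_precision_flags(precision: str, use_flash_attention: bool = True) -> Dict[str, bool]:
    """Map one config string to explicit bf16/fp16 flags (hypothetical helper)."""
    mode = precision.strip().lower()
    if mode not in SUPPORTED_MODES:
        raise ValueError(f"unknown precision {precision!r}; expected one of {SUPPORTED_MODES}")
    if use_flash_attention and mode not in HALF_PRECISION_MODES:
        # Fail early instead of silently overriding what the user asked for.
        raise ValueError("Flash Attention requires bf16 or fp16")
    return {"bf16": mode == "bf16", "fp16": mode == "fp16"}
```

The resulting dict could then be merged into whatever keyword arguments the training arguments are built from, so the chosen precision shows up in the config rather than in the source.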
@J38 Hello, I also had the same issue of the code not working for openwebtext due to the missing validation set, so I tried your solution above. But I encountered the error "ImportError: cannot import name 'LMDataCollator' from 'src.core.trainer'".
Can you provide more details about what is causing that error (e.g. what line is failing in what file)? The branch is older code from before the changes were made, so it should not require LMDataCollator. Are you pre-training from scratch or trying to fine-tune a model trained with main branch code?
I guess it is this line: Line 39 in 3a7dfac
Yes, it is this line, and I believe LMDataCollator is used in line 158 of this file. But I was able to fix the dev set problem by adding a few lines of code on the main branch, so I think the issue is resolved.
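For anyone hitting a similar ImportError after switching branches, a quick check of whether the checked-out code defines the name at all can narrow things down. This is a generic sketch using only the standard library; the helper name `check_export` is hypothetical, and it would need to be run from the repository root so that `src.core.trainer` is importable.

```python
import importlib


def check_export(module_name: str, attr: str) -> str:
    """Report whether `module_name` imports and exposes `attr` (hypothetical helper)."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        # The module itself (or one of its own imports) could not be loaded.
        return "missing-module"
    return "ok" if hasattr(module, attr) else "missing-name"
```

Calling `check_export("src.core.trainer", "LMDataCollator")` on each branch would show whether the name exists there, or whether the calling file and the trainer module come from mismatched versions of the code.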
I tried reverting
Describe the bug
After building index for openwebtext, building the trainer fails (at line 161 of `train.py`) because no `validation` dataset is constructed. I believe this is because the `lm_dataset` object is built with huggingface's `load_dataset` on the `openwebtext` named dataset, and it has no validation split. The `validation_ratio` quinine config option is only used in building the `custom_eval_datasets`, not the `lm_dataset` object, so it is not used to portion out part of `openwebtext` as a validation set.

To Reproduce
Replace `datasets/wikitext2.yaml` with `datasets/openwebtext.yaml` in `mistral-micro.yaml` (and make other artefact location changes) and run

Expected behavior
No failure occurs at line 161 of `train.py` when `lm_dataset['validation']` is expressed.
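
The missing piece described above, portioning out part of a train-only dataset as a validation set, can be sketched as follows. This is not the fix that was applied on main (the thread does not show it); the function name `ensure_validation_split`, the default ratio, and the seed are assumptions, and a plain dict of lists stands in for the object returned by `load_dataset`. With a real Hugging Face dataset, `Dataset.train_test_split` would be the natural tool for the same step.

```python
import random
from typing import Dict, List, Sequence


def ensure_validation_split(
    splits: Dict[str, Sequence],
    validation_ratio: float = 0.0005,  # assumed default, not taken from the repo
    seed: int = 21,                    # assumed seed, only for reproducibility
) -> Dict[str, List]:
    """Return a copy of `splits` that always has a "validation" key (hypothetical helper)."""
    if "validation" in splits:
        # Nothing to do: the dataset already ships a validation split.
        return {name: list(rows) for name, rows in splits.items()}
    if not 0.0 < validation_ratio < 1.0:
        raise ValueError("validation_ratio must be strictly between 0 and 1")

    train = list(splits["train"])
    n_val = max(1, int(len(train) * validation_ratio))  # hold out at least one example
    if n_val >= len(train):
        raise ValueError("not enough training examples to hold out a validation set")

    # Shuffle indices deterministically, then hold out the first n_val of them.
    indices = list(range(len(train)))
    random.Random(seed).shuffle(indices)
    held_out = set(indices[:n_val])

    out = {name: list(rows) for name, rows in splits.items() if name != "train"}
    out["train"] = [row for i, row in enumerate(train) if i not in held_out]
    out["validation"] = [train[i] for i in sorted(held_out)]
    return out
```

Applied right after loading, before anything indexes `lm_dataset['validation']`, a step like this would let the existing `validation_ratio` option cover datasets such as `openwebtext` that only provide a train split.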