Unable to resume CoCondenser pretraining #9

Open
eugene-yang opened this issue Dec 13, 2021 · 7 comments

@eugene-yang

The model checkpoints seem to be hard-coded as BertForMaskedLM and cannot be loaded into the CoCondenser class.
Adding the following attributes in the initialization gets past the exception, but the weights are still not loaded.

self._keys_to_ignore_on_save = None
self._keys_to_ignore_on_load_missing = None

Is there a way to resume training after interruptions?
Thanks!

@luyug
Owner

luyug commented Dec 13, 2021

Please elaborate on the issue. Include what you did, what worked and what did not, error messages, etc.

@eugene-yang
Author

Here is the way to reproduce the exception.

I first started training from the model downloaded from Hugging Face.

HF_DATASETS_CACHE="/expscratch/eyang/cache/datasets" TOKENIZERS_PARALLELISM="false"\
  python run_co_pre_training.py \
  --output_dir ./test/bert-base-cased/ \
  --model_name_or_path bert-base-cased \
  --do_train \
  --fp16 \
  --save_steps 1 \
  --save_total_limit 10 \
  --model_type bert \
  --per_device_train_batch_size 256 \
  --cache_chunk_size 12 \
  --gradient_accumulation_steps 1 \
  --warmup_ratio 0.1 \
  --learning_rate 1e-5 \
  --num_train_epochs 8 \
  --dataloader_drop_last \
  --overwrite_output_dir \
  --dataloader_num_workers 10 \
  --n_head_layers 2 \
  --skip_from 6 \
  --max_seq_length 180 \
  --train_path ./processed_text/msmarco-document.span-90.tokenized-bert-base_incomplete.jsonl \
  --weight_decay 0.01 \
  --late_mlm

I then tried to load the checkpoint saved at the first step:

HF_DATASETS_CACHE="/expscratch/eyang/cache/datasets" TOKENIZERS_PARALLELISM="false"\
  python run_co_pre_training.py \
  --output_dir ./test/bert-base-cased/ \
  --model_name_or_path ./test/bert-base-cased/checkpoint-1 \
  --do_train \
  --fp16 \
  --save_steps 100 \
  --save_total_limit 10 \
  --model_type bert \
  --per_device_train_batch_size 256 \
  --cache_chunk_size 12 \
  --gradient_accumulation_steps 1 \
  --warmup_ratio 0.1 \
  --learning_rate 1e-5 \
  --num_train_epochs 8 \
  --dataloader_drop_last \
  --overwrite_output_dir \
  --dataloader_num_workers 10 \
  --n_head_layers 2 \
  --skip_from 6 \
  --max_seq_length 180 \
  --train_path ./processed_text/msmarco-document.span-90.tokenized-bert-base_incomplete.jsonl \
  --weight_decay 0.01 \
  --late_mlm

And here is the exception:

[INFO|tokenization_utils_base.py:1671] 2021-12-13 16:30:45,404 >> Didn't find file ./test/bert-base-cased/checkpoint-1/added_tokens.json. We won't load it.
[INFO|tokenization_utils_base.py:1740] 2021-12-13 16:30:45,404 >> loading file ./test/bert-base-cased/checkpoint-1/vocab.txt
[INFO|tokenization_utils_base.py:1740] 2021-12-13 16:30:45,404 >> loading file ./test/bert-base-cased/checkpoint-1/tokenizer.json
[INFO|tokenization_utils_base.py:1740] 2021-12-13 16:30:45,404 >> loading file None
[INFO|tokenization_utils_base.py:1740] 2021-12-13 16:30:45,404 >> loading file ./test/bert-base-cased/checkpoint-1/special_tokens_map.json
[INFO|tokenization_utils_base.py:1740] 2021-12-13 16:30:45,404 >> loading file ./test/bert-base-cased/checkpoint-1/tokenizer_config.json
[INFO|modeling_utils.py:1350] 2021-12-13 16:30:45,426 >> loading weights file ./test/bert-base-cased/checkpoint-1/pytorch_model.bin
[INFO|modeling_utils.py:1619] 2021-12-13 16:30:47,089 >> All model checkpoint weights were used when initializing BertForMaskedLM.

[INFO|modeling_utils.py:1627] 2021-12-13 16:30:47,089 >> All the weights of BertForMaskedLM were initialized from the model checkpoint at ./test/bert-base-cased/checkpoint-1.
If your task is similar to the task the model of the checkpoint was trained on, you can already use BertForMaskedLM for predictions without further training.
12/13/2021 16:30:47 - INFO - modeling -   loading extra weights from local files
12/13/2021 16:30:47 - INFO - trainer -   Initializing Gradient Cache Trainer
[INFO|trainer.py:439] 2021-12-13 16:30:51,616 >> Using amp half precision backend
/home/hltcoe/eyang/.conda/envs/pretrain/lib/python3.8/site-packages/transformers/trainer.py:1059: FutureWarning: `model_path` is deprecated and will be removed in a future version. Use `resume_from_checkpoint` instead.
  warnings.warn(
[INFO|trainer.py:1089] 2021-12-13 16:30:51,618 >> Loading model from ./test/bert-base-cased/checkpoint-1).
Traceback (most recent call last):
  File "run_co_pre_training.py", line 227, in <module>
    main()
  File "run_co_pre_training.py", line 217, in main
    trainer.train(model_path=model_path)
  File "/home/hltcoe/eyang/.conda/envs/pretrain/lib/python3.8/site-packages/transformers/trainer.py", line 1108, in train
    self._load_state_dict_in_model(state_dict)
  File "/home/hltcoe/eyang/.conda/envs/pretrain/lib/python3.8/site-packages/transformers/trainer.py", line 1484, in _load_state_dict_in_model
    if self.model._keys_to_ignore_on_save is not None and set(load_result.missing_keys) == set(
  File "/home/hltcoe/eyang/.conda/envs/pretrain/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1177, in __getattr__
    raise AttributeError("'{}' object has no attribute '{}'".format(
AttributeError: 'CoCondenserForPretraining' object has no attribute '_keys_to_ignore_on_save'

We can get past this exception by adding the two attributes here.
https://github.com/luyug/Condenser/blob/main/modeling.py#L177
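To make the failure mode concrete, here is a minimal sketch that does not use transformers at all. The classes and the helper `missing_keys_are_expected` are hypothetical stand-ins; the check only mimics the shape of the truncated line shown in the traceback and is not copied from the library.

```python
class LoadResult:
    """Stand-in for the result of load_state_dict(strict=False)."""

    def __init__(self, missing_keys):
        self.missing_keys = missing_keys


class ModelWithoutAttrs:
    """Hypothetical model that lacks the attribute, as in the traceback."""


class ModelWithAttrs:
    """Hypothetical model with the two attributes defined as defaults."""

    _keys_to_ignore_on_save = None
    _keys_to_ignore_on_load_missing = None


def missing_keys_are_expected(model, load_result):
    # Assumed shape of the trainer's check: attribute access comes first,
    # so a model without the attribute fails before any comparison happens.
    ignore = model._keys_to_ignore_on_save
    return ignore is not None and set(load_result.missing_keys) == set(ignore)


result = LoadResult(missing_keys=["co_target"])

try:
    missing_keys_are_expected(ModelWithoutAttrs(), result)
    raised = False
except AttributeError:
    raised = True

print(raised)  # True: the attribute lookup itself fails
print(missing_keys_are_expected(ModelWithAttrs(), result))  # False: no crash
```

Defining the attributes only removes the crash. With both set to None the missing keys are never treated as expected, so the trainer still reports them.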

Executing the same command again then gives the following warnings (clipped, but they cover basically all the layers):

[INFO|tokenization_utils_base.py:1671] 2021-12-13 16:34:43,748 >> Didn't find file ./test/bert-base-cased/checkpoint-1/added_tokens.json. We won't load it.
[INFO|tokenization_utils_base.py:1740] 2021-12-13 16:34:43,749 >> loading file ./test/bert-base-cased/checkpoint-1/vocab.txt
[INFO|tokenization_utils_base.py:1740] 2021-12-13 16:34:43,749 >> loading file ./test/bert-base-cased/checkpoint-1/tokenizer.json
[INFO|tokenization_utils_base.py:1740] 2021-12-13 16:34:43,749 >> loading file None
[INFO|tokenization_utils_base.py:1740] 2021-12-13 16:34:43,749 >> loading file ./test/bert-base-cased/checkpoint-1/special_tokens_map.json
[INFO|tokenization_utils_base.py:1740] 2021-12-13 16:34:43,749 >> loading file ./test/bert-base-cased/checkpoint-1/tokenizer_config.json
[INFO|modeling_utils.py:1350] 2021-12-13 16:34:43,770 >> loading weights file ./test/bert-base-cased/checkpoint-1/pytorch_model.bin
[INFO|modeling_utils.py:1619] 2021-12-13 16:34:45,435 >> All model checkpoint weights were used when initializing BertForMaskedLM.

[INFO|modeling_utils.py:1627] 2021-12-13 16:34:45,435 >> All the weights of BertForMaskedLM were initialized from the model checkpoint at ./test/bert-base-cased/checkpoint-1.
If your task is similar to the task the model of the checkpoint was trained on, you can already use BertForMaskedLM for predictions without further training.
12/13/2021 16:34:45 - INFO - modeling -   loading extra weights from local files
12/13/2021 16:34:45 - INFO - trainer -   Initializing Gradient Cache Trainer
[INFO|trainer.py:439] 2021-12-13 16:34:49,899 >> Using amp half precision backend
/home/hltcoe/eyang/.conda/envs/pretrain/lib/python3.8/site-packages/transformers/trainer.py:1059: FutureWarning: `model_path` is deprecated and will be removed in a future version. Use `resume_from_checkpoint` instead.
  warnings.warn(
[INFO|trainer.py:1089] 2021-12-13 16:34:49,901 >> Loading model from ./test/bert-base-cased/checkpoint-1).
[WARNING|trainer.py:1489] 2021-12-13 16:34:50,315 >> There were missing keys in the checkpoint model loaded: ['co_target', 'lm.bert.embeddings.position_ids', 'lm.bert.embeddings.word_embeddings.weight', 'lm.bert.embeddings.position_embeddings.weight', 'lm.bert.embeddings.token_type_embeddings.weight', 'lm.bert.embeddings.LayerNorm.weight', 'lm.bert.embeddings.LayerNorm.bias', 'lm.bert.encoder.layer.0.attention.self.query.weight', 'lm.bert.encoder.layer.0.attention.self.query.bias', 'lm.bert.encoder.layer.0.attention.self.key.weight', 'lm.bert.encoder.layer.0.attention.self.key.bias', 'lm.bert.encoder.layer.0.attention.self.value.weight', 'lm.bert.encoder.layer.0.attention.self.value.bias', 'lm.bert.encoder.layer.0.attention.output.dense.weight', 'lm.bert.encoder.layer.0.attention.output.dense.bias', 'lm.bert.encoder.layer.0.attention.output.LayerNorm.weight', 'lm.bert.encoder.layer.0.attention.output.LayerNorm.bias', 'lm.bert.encoder.layer.0.intermediate.dense.weight', 'lm.bert.encoder.layer.0.intermediate.dense.bias', 'lm.bert.encoder.layer.0.output.dense.weight', 'lm.bert.encoder.layer.0.output.dense.bias', 'lm.bert.encoder.layer.0.output.LayerNorm.weight', 'lm.bert.encoder.layer.0.output.LayerNorm.bias', 'lm.bert.encoder.layer.1.attention.self.query.weight', 'lm.bert.encoder.layer.1.attention.self.query.bias', 'lm.bert.encoder.layer.1.attention.self.key.weight', 'lm.bert.encoder.layer.1.attention.self.key.bias', 'lm.bert.encoder.layer.1.attention.self.value.weight', 'lm.bert.encoder.layer.1.attention.self.value.bias', 'lm.bert.encoder.layer.1.attention.output.dense.weight', 'lm.bert.encoder.layer.1.attention.output.dense.bias', 'lm.bert.encoder.layer.1.attention.output.LayerNorm.weight', 'lm.bert.encoder.layer.1.attention.output.LayerNorm.bias', 'lm.bert.encoder.layer.1.intermediate.dense.weight', 'lm.bert.encoder.layer.1.intermediate.dense.bias', 'lm.bert.encoder.layer.1.output.dense.weight', 'lm.bert.encoder.layer.1.output.dense.bias', 'lm.bert.encoder.layer.1.output.LayerNorm.weight', 'lm.bert.encoder.layer.1.output.LayerNorm.bias', 'lm.bert.encoder.layer.2.attention.self.query.weight', 'lm.bert.encoder.layer.2.attention.self.query.bias', 'lm.bert.encoder.layer.2.attention.self.key.weight', 'lm.bert.encoder.layer.2.attention.self.key.bias', 'lm.bert.encoder.layer.2.attention.self.value.weight', 'lm.bert.encoder.layer.2.attention.self.value.bias', 'lm.bert.encoder.layer.2.attention.output.dense.weight', 'lm.bert.encoder.layer.2.attention.output.dense.bias', 'lm.bert.encoder.layer.2.attention.output.LayerNorm.weight', 'lm.bert.encoder.layer.2.attention.output.LayerNorm.bias', 'lm.bert.encoder.layer.2.intermediate.dense.weight', 'lm.bert.encoder.layer.2.intermediate.dense.bias', 'lm.bert.encoder.layer.2.output.dense.weight', 'lm.bert.encoder.layer.2.output.dense.bias', 'lm.bert.encoder.layer.2.output.LayerNorm.weight', 'lm.bert.encoder.layer.2.output.LayerNorm.bias', 'lm.bert.encoder.layer.3.attention.self.query.weight', 'lm.bert.encoder.layer.3.attention.self.query.bias', 'lm.bert.encoder.layer.3.attention.self.key.weight', 'lm.bert.encoder.layer.3.attention.self.key.bias', 'lm.bert.encoder.layer.3.attention.self.value.weight', 'lm.bert.encoder.layer.3.attention.self.value.bias', ...
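Every missing key in the warning except co_target starts with `lm.`, which suggests (but does not prove) that the saved pytorch_model.bin stores the same tensors under unprefixed names such as `bert.embeddings...`. The sketch below shows one way to check that kind of mismatch with plain dictionaries. The helper names `diff_keys` and `add_prefix`, the toy key lists, and the unprefixed checkpoint keys are all assumptions for illustration; the integer values are placeholders for tensors.

```python
def diff_keys(model_keys, checkpoint_keys):
    """Return (missing, unexpected) the way a non-strict load would report them."""
    model_keys, checkpoint_keys = set(model_keys), set(checkpoint_keys)
    missing = sorted(model_keys - checkpoint_keys)
    unexpected = sorted(checkpoint_keys - model_keys)
    return missing, unexpected


def add_prefix(state_dict, prefix):
    """Rename every key in a state dict by prepending a prefix."""
    return {prefix + key: value for key, value in state_dict.items()}


# Toy stand-ins: keys the wrapper model expects vs. keys assumed to be on disk.
model_keys = [
    "co_target",
    "lm.bert.embeddings.word_embeddings.weight",
    "lm.bert.embeddings.LayerNorm.bias",
]
checkpoint = {
    "bert.embeddings.word_embeddings.weight": 0,
    "bert.embeddings.LayerNorm.bias": 1,
}

missing, unexpected = diff_keys(model_keys, checkpoint)
print(len(missing), len(unexpected))  # 3 2: nothing lines up

remapped = add_prefix(checkpoint, "lm.")
missing_after, unexpected_after = diff_keys(model_keys, remapped)
print(missing_after)  # ['co_target']
print(unexpected_after)  # []
```

If the real checkpoint behaves like the toy one, the warning reflects a naming mismatch between the wrapper and the wrapped model rather than lost weights. Running the same comparison against the actual state dict keys would confirm or rule that out.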

@luyug
Owner

luyug commented Dec 13, 2021

The attribute _keys_to_ignore_on_save was introduced in a relatively recent release of HF transformers. Maybe I should patch the repo, but for now there are a few easy things you can do:

  • Get an earlier version of transformers. I used 4.2.0 in my experiments.
  • Set model_path=None here.
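For the first option, a quick way to see whether the installed transformers is newer than the 4.2.0 mentioned above is sketched here. The helper names are hypothetical, and the sketch makes no claim about which release first added the attribute; it only compares version numbers.

```python
import re
from importlib import metadata

PINNED = "4.2.0"  # the version reported as used in the experiments


def version_tuple(version):
    """Turn '4.2.0' into (4, 2, 0), stopping at the first non-numeric piece."""
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def is_newer_than_pinned(installed, pinned=PINNED):
    """True when the installed version sorts after the pinned one."""
    return version_tuple(installed) > version_tuple(pinned)


def installed_transformers_version():
    """Return the installed transformers version string, or None if absent."""
    try:
        return metadata.version("transformers")
    except metadata.PackageNotFoundError:
        return None


installed = installed_transformers_version()
if installed is None:
    print("transformers is not installed in this environment")
elif is_newer_than_pinned(installed):
    print(f"transformers {installed} is newer than {PINNED}")
else:
    print(f"transformers {installed} is not newer than {PINNED}")
```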

@eugene-yang
Author

Thank you for the reply!
Isn't setting model_path=None basically telling the trainer to start from scratch and ignore the checkpoint?

@eugene-yang
Author

Would it make more sense to put the path of the checkpoint we want to resume from here (like ./test/bert-base-cased/checkpoint-1 in the example) and leave model_name_or_path as the original model, like bert-base-cased in the example?
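To illustrate what I mean, here is a rough sketch with plain argparse. The `--resume_from` flag and the `resolve_paths` helper are hypothetical and do not exist in the repo; the actual script parses its arguments differently.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser()
    # Existing option: where the initial weights come from.
    parser.add_argument("--model_name_or_path", required=True)
    # Hypothetical option: which checkpoint to resume training from.
    parser.add_argument("--resume_from", default=None)
    return parser


def resolve_paths(args):
    """Keep the two roles separate instead of overloading one path."""
    return {
        "init_weights": args.model_name_or_path,
        "resume_checkpoint": args.resume_from,
    }


args = build_parser().parse_args(
    [
        "--model_name_or_path", "bert-base-cased",
        "--resume_from", "./test/bert-base-cased/checkpoint-1",
    ]
)
paths = resolve_paths(args)
print(paths["init_weights"])  # bert-base-cased
print(paths["resume_checkpoint"])  # ./test/bert-base-cased/checkpoint-1
```

With no `--resume_from` given, `resume_checkpoint` would be None and training would simply start from the initial weights.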

@luyug
Owner

luyug commented Dec 14, 2021

> Thank you for the reply! Isn't setting model_path=None basically telling the trainer to start from scratch and ignore the checkpoint?

Yes, and the CoCondenser object will do the loading. You will see a log message when it does so. Letting the CoCondenser class do the loading makes sure that we can handle multiple load scenarios.

This is more or less a workaround. Eventually, I will probably need to patch the CondenserPreTrainer class so that it no longer loads model weights.

@qherreros

Maybe I am missing something, but from what I can read, setting model_path=None and loading the model through CondenserForPretraining does exactly the same thing as trainer.train(resume_from_checkpoint=model_args.model_name_or_path). You just get rid of the warning; the loading itself should be identical. If you print missing_keys from the custom from_pretrained classmethod of CondenserForPretraining, you'll see it contains the same keys that are logged in the warning.
Maybe ignoring those keys on save is a cleaner solution, but in the end it should not change anything in the training.
