This repository has been archived by the owner on Jul 26, 2019. It is now read-only.
When I use BERT-keras, I don't understand this part:

    class TextEncoder:
        PAD_OFFSET = 0
        MSK_OFFSET = 1
        BOS_OFFSET = 2
        DEL_OFFSET = 3  # delimiter
        EOS_OFFSET = 4
        SPECIAL_COUNT = 5
        NUM_SEGMENTS = 2
        BERT_UNUSED_COUNT = 99  # bert pretrained models
        BERT_SPECIAL_COUNT = 4  # they don't have DEL

Why is it set up like this?
Also, BERT_UNUSED_COUNT = 99 and BERT_SPECIAL_COUNT = 4 are used in load_google_bert.
Hi,
There are some special tokens in the vocabulary (for example, BOS stands for Beginning Of Sentence), and we can put them either at the beginning of the lookup table (the embedding) or at the end. I decided to put them at the beginning.
As for BERT_UNUSED_COUNT, you can check the vocab files in the pretrained BERT models.
Ah, you might be confused by their usage, right?
Let's say you want to feed a sentence into your network. You have to add the BOS and EOS tokens to the sentence, so you need to know their positions in the embedding table.
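To make the role of the offsets concrete, here is a minimal sketch of wrapping a sentence in BOS and EOS ids. It reuses the constants quoted in the question; the toy vocabulary, the `encode_sentence` helper, and the `specials_first` flag are hypothetical and are not taken from BERT-keras. By default it assumes the special rows come first, as described in the reply above. If they are instead appended after the regular vocabulary (which an index such as `vocab_size + TextEncoder.EOS_OFFSET` would suggest), the same offsets are added to the vocabulary size rather than to 0.

```python
class TextEncoder:
    # Constants as quoted in the question.
    PAD_OFFSET = 0
    MSK_OFFSET = 1
    BOS_OFFSET = 2
    DEL_OFFSET = 3  # delimiter
    EOS_OFFSET = 4
    SPECIAL_COUNT = 5


# Hypothetical toy vocabulary, for illustration only.
toy_vocab = {"hello": 0, "world": 1}


def encode_sentence(words, vocab, specials_first=True):
    """Return embedding-row ids for BOS + words + EOS.

    specials_first=True : special rows occupy ids 0..SPECIAL_COUNT-1 and
                          regular tokens are shifted up by SPECIAL_COUNT.
    specials_first=False: regular tokens keep their ids and the special
                          rows start at len(vocab).
    """
    if specials_first:
        special_base, regular_base = 0, TextEncoder.SPECIAL_COUNT
    else:
        special_base, regular_base = len(vocab), 0

    body = [regular_base + vocab[w] for w in words]
    bos = special_base + TextEncoder.BOS_OFFSET
    eos = special_base + TextEncoder.EOS_OFFSET
    return [bos] + body + [eos]


print(encode_sentence(["hello", "world"], toy_vocab))                        # [2, 5, 6, 4]
print(encode_sentence(["hello", "world"], toy_vocab, specials_first=False))  # [4, 0, 1, 6]
```

Either way, the offsets only say where each special token sits relative to the start of the special block; the layout of the embedding table decides what they are added to.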
I see, but in load_google_bert the vocabulary size is computed as vocab_size = vocab_size - TextEncoder.BERT_SPECIAL_COUNT - TextEncoder.BERT_UNUSED_COUNT, and the sizes don't match: when w_id == 2, the line 'weights[w_id][vocab_size + TextEncoder.EOS_OFFSET] = saved[3 + TextEncoder.BERT_UNUSED_COUNT]' cannot load the weight.
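One way to narrow down a mismatch like this is to work through the index arithmetic by hand and compare it with the row counts of both arrays. The sketch below does only that. The `check_eos_copy` helper is hypothetical, and the row counts in the usage lines are assumptions chosen for illustration (30522 is the size of the uncased English BERT vocabulary; the target size assumes one row per regular token plus SPECIAL_COUNT special rows). It does not reproduce load_google_bert itself.

```python
# Constants as quoted in the question.
EOS_OFFSET = 4
SPECIAL_COUNT = 5
BERT_UNUSED_COUNT = 99
BERT_SPECIAL_COUNT = 4


def check_eos_copy(saved_rows, target_rows):
    """Check whether the EOS copy quoted above stays inside both arrays.

    saved_rows : number of rows in the checkpoint's word-embedding matrix
    target_rows: number of rows in the Keras embedding being filled
    """
    vocab_size = saved_rows - BERT_SPECIAL_COUNT - BERT_UNUSED_COUNT
    dst = vocab_size + EOS_OFFSET   # row written in weights[w_id]
    src = 3 + BERT_UNUSED_COUNT     # row read from saved
    return {
        "vocab_size": vocab_size,
        "dst": dst,
        "src": src,
        "dst_ok": 0 <= dst < target_rows,
        "src_ok": 0 <= src < saved_rows,
    }


# Assumed sizes: target table has room for the special rows.
print(check_eos_copy(saved_rows=30522, target_rows=30419 + SPECIAL_COUNT))
# Assumed sizes: target table was built without the special rows.
print(check_eos_copy(saved_rows=30522, target_rows=30419))
```

If `dst_ok` comes out false for the real shapes, the target embedding was built with a different vocabulary size than the loader expects, which would be the first thing to check.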
@ChiuHsin I guess you are right, and it seems that you were able to solve it (based on the other issue you posted).
Can you please send a pull request to correct this problem?
Thanks!