Model fails to load after conversion to sentencepiece #13

Converted the model to an sp model via the class method convert_to_sentencepiece; loading the result then raises an error.
Related issue: google/sentencepiece#156
The model contains "\0". Should it be stripped during conversion, and would that have any side effects?
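A minimal diagnostic sketch (not from the original report) for checking whether the converted model is hitting the NUL-byte restriction described in google/sentencepiece#156. It assumes a sentencepiece version recent enough to ship sentencepiece_model_pb2 inside the package, and reuses the sp.model filename from the report:

```python
# Diagnostic sketch: parse the converted sp.model and count pieces that
# contain a NUL byte, which sentencepiece refuses to load.
# Assumes sentencepiece ships sentencepiece_model_pb2 (recent versions do).
from sentencepiece import sentencepiece_model_pb2 as sp_model

proto = sp_model.ModelProto()
with open('sp.model', 'rb') as f:
    proto.ParseFromString(f.read())

bad = [(i, p.piece) for i, p in enumerate(proto.pieces) if '\0' in p.piece]
print(f'{len(bad)} of {len(proto.pieces)} pieces contain a NUL byte')
for i, piece in bad[:10]:
    print(i, repr(piece))
```

Note that simply deleting such pieces during conversion would shift every later piece id, so the converted vocabulary would no longer line up with the original bytepiece ids; that is one possible side effect to weigh.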
Comments
Could you share the model from before the conversion? Or provide a minimal reproduction script?
@bojone Reproduced it following the example in the README. The model is here: https://microbin.yzlnew.com/upload/sloth-worm-falcon

```python
from bytepiece import Tokenizer

tokenizer1 = Tokenizer('tokenizer_80k_small_isolated.model')
tokenizer1.convert_to_sentencepiece('sp.model')

import sentencepiece as spm

tokenizer2 = spm.SentencePieceProcessor("sp.model")
```
@yzlnew It looks like your model is not an ensure_unicode one? Only models trained with ensure_unicode are guaranteed to convert cleanly to sentencepiece (in more recent versions ensure_unicode is enabled by default; you can check).
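One way such a check might look (a sketch added here, not part of the conversation): tokenize a sample string and confirm that each returned piece decodes as UTF-8, which is what ensure_unicode is meant to guarantee. It assumes Tokenizer.tokenize() returns raw byte pieces as shown in the bytepiece README, and it only covers pieces that the sample text happens to hit:

```python
# Spot-check sketch: do the pieces this tokenizer emits decode as UTF-8?
# Assumes Tokenizer.tokenize() returns a list of bytes, per the bytepiece README.
from bytepiece import Tokenizer

tokenizer = Tokenizer('tokenizer_80k_small_isolated.model')
sample = '今天天气不错 the quick brown fox 1234'  # arbitrary mixed sample
bad = []
for piece in tokenizer.tokenize(sample):
    try:
        piece.decode('utf-8')
    except UnicodeDecodeError:
        bad.append(piece)
print(f'{len(bad)} pieces are not valid UTF-8')
```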
@bojone That is odd; this model was trained with 0.6.3, and it is an ensure_unicode one.
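If the converted model does turn out to contain NUL bytes despite ensure_unicode, one possible workaround (a sketch under that assumption, not a fix proposed in the thread) is to overwrite the offending pieces with unique placeholder strings rather than deleting them, so piece ids stay aligned with the bytepiece vocabulary; the side effect is that those ids can no longer decode back to their original bytes. The placeholder naming and the sp_patched.model filename below are hypothetical:

```python
# Workaround sketch (an assumption, not the maintainers' fix): replace
# NUL-containing pieces with unique placeholders so ids do not shift,
# re-serialize, then try loading the patched model.
from sentencepiece import sentencepiece_model_pb2 as sp_model
import sentencepiece as spm

proto = sp_model.ModelProto()
with open('sp.model', 'rb') as f:
    proto.ParseFromString(f.read())

for i, p in enumerate(proto.pieces):
    if '\0' in p.piece:
        p.piece = f'<bytepiece_unused_{i}>'  # hypothetical placeholder name

with open('sp_patched.model', 'wb') as f:
    f.write(proto.SerializeToString())

tokenizer2 = spm.SentencePieceProcessor(model_file='sp_patched.model')
print(tokenizer2.get_piece_size())
```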