This repository has been archived by the owner on Jan 19, 2019. It is now read-only.

Different Character Encodings #272

Open
nelson-liu opened this issue Mar 23, 2017 · 0 comments

Comments

@nelson-liu
Contributor

Encoding Unicode characters as bytes could be a good idea, as opposed to assigning a single index to each Unicode character.

It would thus be nice if tokenizers that return characters allowed for different character encodings.
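To make the contrast concrete, here is a minimal sketch in plain Python. The function names `char_indices` and `byte_indices`, the `encoding` parameter, and the choice to reserve index 0 (e.g. for padding) are all assumptions for illustration, not part of this repository's tokenizer API.

```python
from typing import Dict, List


def char_indices(text: str, vocab: Dict[str, int]) -> List[int]:
    """One index per Unicode character; the vocabulary grows with each new character."""
    # Index 0 is left unused (assumed to be reserved, e.g. for padding).
    return [vocab.setdefault(ch, len(vocab) + 1) for ch in text]


def byte_indices(text: str, encoding: str = "utf-8") -> List[int]:
    """One index per encoded byte; at most 256 distinct values, shifted by 1 so 0 stays unused."""
    return [b + 1 for b in text.encode(encoding)]


if __name__ == "__main__":
    vocab: Dict[str, int] = {}
    print(char_indices("naïve", vocab))           # [1, 2, 3, 4, 5]
    print(byte_indices("naïve"))                  # [111, 98, 196, 176, 119, 102]
    print(len(byte_indices("naïve", "utf-16-le")))  # 10
```

With per-character indices the vocabulary is unbounded and unseen characters need special handling, while the byte version has a fixed index range but produces longer sequences for non-ASCII text. The sequence length also depends on the chosen encoding, which is why making the encoding configurable would matter.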

