This repository has been archived by the owner on Apr 23, 2024. It is now read-only.

Merge pull request #47 from kalaidin/master
Update docs
kalaidin authored Nov 19, 2019
2 parents 080195d + aeffc5f commit c93cce1
Showing 1 changed file with 3 additions and 0 deletions.
3 changes: 3 additions & 0 deletions README.md
@@ -18,6 +18,9 @@ Key advantages:
* Highly efficient implementation in C++
* Python wrapper and command-line interface

Extra features:
* BPE-dropout (as described in [Provilkov et al., 2019](https://arxiv.org/abs/1910.13267))

As in the algorithm from the original paper, ours does not consider tokens
that cross word boundaries. As in [SentencePiece](https://github.com/google/sentencepiece), all space symbols are replaced by the meta symbol "▁" (U+2581). This allows a sequence of tokens to be converted back to text and word boundaries to be restored.
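
The role of the meta symbol can be illustrated with a minimal Python sketch. It is not the library's implementation: the helper names `mark_words` and `detokenize` are hypothetical, and prefixing the first word with "▁" is an assumption made here for symmetry. Because every word starts with "▁", no token can span two words, and the original spacing is recovered by turning "▁" back into a space.

```python
META = "\u2581"  # the meta symbol "▁" (U+2581) that stands in for a space


def mark_words(text: str) -> list[str]:
    """Split text on whitespace and prefix each word with the meta symbol.

    Hypothetical helper: subword merges would then run inside each
    marked word separately, so no token crosses a word boundary.
    """
    return [META + word for word in text.split()]


def detokenize(tokens: list[str]) -> str:
    """Concatenate tokens and turn every meta symbol back into a space."""
    return "".join(tokens).replace(META, " ").strip()


# A hand-written segmentation of "hello world" into subword tokens.
tokens = [META + "he", "llo", META + "wor", "ld"]
restored = detokenize(tokens)
```

Here `restored` equals `"hello world"`: the two tokens that begin with "▁" mark where each word starts, so the word boundary survives tokenization.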
