Fix symbol table for byte bpe
csukuangfj committed Oct 13, 2023
1 parent efd3cd3 commit 094d772
Showing 1 changed file with 10 additions and 1 deletion.
11 changes: 10 additions & 1 deletion sherpa-onnx/csrc/symbol-table.cc
@@ -60,7 +60,16 @@ void SymbolTable::Init(std::istream &is) {
 }

 assert(!sym.empty());
-assert(sym2id_.count(sym) == 0);
+
+// for byte bpe, after replacing ▁ with a space, whose ascii is also 0x20,
+// there is a conflict between the real byte 0x20 and ▁, so we disable
+// the following check.
+//
+// Note: Only id2sym_ matters as we use it to convert ID to symbols.
+if (sym != " ") {
+  assert(sym2id_.count(sym) == 0);
+}
+
 assert(id2sym_.count(id) == 0);

 sym2id_.insert({sym, id});
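The conflict described in the comment can be reproduced in isolation. The following is a minimal sketch, not the code from `symbol-table.cc`: `NormalizeSymbol`, `Tables`, and `BuildTables` are hypothetical names, and the sketch assumes the ▁ replacement only rewrites a leading U+2581 (UTF-8 bytes `e2 96 81`). It shows that two distinct IDs can end up with the symbol " ", that `id2sym` still keeps both, and that `sym2id` keeps only the first mapping because `insert` does not overwrite an existing key.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical helper: replace a leading ▁ (U+2581, UTF-8 e2 96 81)
// with an ASCII space (0x20).
std::string NormalizeSymbol(std::string sym) {
  if (sym.size() >= 3) {
    const auto *p = reinterpret_cast<const uint8_t *>(sym.data());
    if (p[0] == 0xe2 && p[1] == 0x96 && p[2] == 0x81) {
      sym.replace(0, 3, " ");
    }
  }
  return sym;
}

// Hypothetical stand-in for the two maps held by the symbol table.
struct Tables {
  std::unordered_map<std::string, int32_t> sym2id;
  std::unordered_map<int32_t, std::string> id2sym;
};

// Builds both maps from (symbol, id) pairs, applying the relaxed check
// from the commit: a duplicate symbol is tolerated only when it is " ".
Tables BuildTables(
    const std::vector<std::pair<std::string, int32_t>> &entries) {
  Tables t;
  for (const auto &e : entries) {
    std::string sym = NormalizeSymbol(e.first);
    int32_t id = e.second;

    assert(!sym.empty());
    if (sym != " ") {
      assert(t.sym2id.count(sym) == 0);
    }
    assert(t.id2sym.count(id) == 0);

    t.sym2id.insert({sym, id});  // keeps the first id on a duplicate key
    t.id2sym.insert({id, sym});  // always unique, since ids are unique
  }
  return t;
}
```

Calling `BuildTables` with both a ▁ entry and a literal " " entry would trip the original unconditional assertion in a debug build. With the relaxed check, both IDs convert back to " " through `id2sym`, which matches the note that only ID-to-symbol lookup matters here.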
