Commit
Update README.md
grammar changes
Wavesonics authored Dec 5, 2024
1 parent 40e3e77 commit 9924c80
Showing 1 changed file with 3 additions and 3 deletions.
6 changes: 3 additions & 3 deletions Fdic/README.md
@@ -27,7 +27,7 @@ FrequencyDictionaryIO.writeFdic(dictionary, "en-80k.fdic")
```

## Performance
-Performance varies greatly depending on the combination of machine and dictionary being decoded. But `fdic` always
+Performance varies greatly depending on the combination of machine and dictionary being decoded. But `fdic` is always
superior in both size and speed to both a plain text dictionary, and a GZIPed text dictionary.

Some Machine/Dictionary combinations show more modest speed improvements, but considering simply GZIPing would have
@@ -111,15 +111,15 @@ Integer permission to represent the frequency for `is`.
Pretty quickly the terms require fewer than 8 characters to represent their frequency, so representing it as a binary
number begins to increase the overall size of each entry.

-To solve this I came up with `Variable Length Longs` which only take as many bytes as necisary, the smallest numbers
+To solve this I came up with `Variable Length Longs` which only take as many bytes as necessary, the smallest numbers
requiring just 1 byte when encoded.

### How
This is achieved by using the **Most Significant Bit** (_MSB_) as a `Continuation Flag` when parsing. This means that each
encoded byte only represents 7 bits of data, meaning in worst case scenarios we could use up to 10 bytes to represent
an 8 byte Long. In this use case it never occurs, as term frequencies are never that large.

-When we begin reading a `vlong` feild, we mask out the continuation bit, take the 7 data bits, shift them depending on
+When we begin reading a `vlong` field, we mask out the continuation bit, take the 7 data bits, shift them depending on
how many bytes we've read so far for this vlong, and add that to an accumulator Long.

```mermaid
Expand Down
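
The `vlong` scheme described in the second hunk above can be sketched in Kotlin as follows. This is a minimal illustration, not the library's actual implementation: the names `encodeVLong` and `decodeVLong` are hypothetical, and the sketch assumes that the least significant 7-bit group is written first and that a set continuation flag means another byte follows. Neither detail is confirmed by the diff.

```kotlin
// Hypothetical helpers for illustration only; not the fdic library's API.
// Assumed layout: least significant 7-bit group first, MSB set = more bytes follow.

/** Encodes [value] into 1 to 10 bytes, 7 data bits per byte. */
fun encodeVLong(value: Long): ByteArray {
    val out = ArrayList<Byte>()
    var remaining = value
    do {
        var group = (remaining and 0x7FL).toInt()   // low 7 data bits
        remaining = remaining ushr 7                // unsigned shift, so negative values terminate
        if (remaining != 0L) group = group or 0x80  // set the continuation flag
        out.add(group.toByte())
    } while (remaining != 0L)
    return out.toByteArray()
}

/** Decodes one vlong starting at [offset]; returns the value and the number of bytes consumed. */
fun decodeVLong(bytes: ByteArray, offset: Int = 0): Pair<Long, Int> {
    var accumulator = 0L
    var bytesRead = 0
    var more = true
    while (more) {
        require(bytesRead < 10) { "vlong longer than 10 bytes" }
        val b = bytes[offset + bytesRead].toInt()
        val data = (b and 0x7F).toLong()            // mask out the continuation bit
        // Shift by 7 bits per byte already read. The groups never overlap,
        // so OR-ing here is equivalent to adding to the accumulator.
        accumulator = accumulator or (data shl (7 * bytesRead))
        bytesRead++
        more = (b and 0x80) != 0                    // continuation flag set: keep reading
    }
    return Pair(accumulator, bytesRead)
}
```

Under these assumptions, `encodeVLong(300L)` produces the two bytes `0xAC 0x02`, values up to 127 fit in a single byte, and only values that use all 64 bits (negative Longs in this sketch) reach the 10-byte worst case.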
