a step in normalization of count #105

Open
BeeBreeze opened this issue Oct 21, 2020 · 2 comments

@BeeBreeze

I am new here and still reading the source code. In this line, why is normalizedCounter[s] set to -1? In my opinion, it should be 1 rather than -1. Could you please explain it to me? Thanks a lot.

normalizedCounter[s] = -1;

@Cyan4973
Owner

It's a special case, meaning "this symbol has a weight of 1, because it can't be lower than 1, but really, it's so small that it should be a fraction of that". This information affects the way the table is built: not all positions in the table are equivalent, so such symbols are assigned the least probable positions.
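To make the "weight of 1, but really a fraction of that" idea concrete, here is a minimal sketch of a normalization step that flags such symbols. It is a simplified illustration, not the library's actual routine: the name `normalize_counts`, its parameters, and the crude "hand the rounding error to the largest symbol" correction are all assumptions made for this example. It also assumes `total` is the sum of all counts and that `tableLog` is small enough for the weights to fit in a `short`.

```c
#include <stddef.h>

/* Hypothetical simplified normalizer (not the library's code).
 * Scales count[] so the weights sum to 1 << tableLog, where a weight of -1
 * occupies one cell but marks "really less than one cell".
 * Returns 0 on success, -1 if this simple correction cannot balance the sum. */
int normalize_counts(short *norm, const unsigned *count, size_t nbSymbols,
                     unsigned total, unsigned tableLog)
{
    unsigned const tableSize = 1u << tableLog;
    /* A count at or below this threshold is worth at most one table cell. */
    unsigned const lowThreshold = total >> tableLog;
    int remaining = (int)tableSize;   /* cells not yet handed out */
    short largestNorm = 0;
    size_t largest = 0;
    size_t s;

    if (total == 0) return -1;

    for (s = 0; s < nbSymbols; s++) {
        if (count[s] == 0) {
            norm[s] = 0;              /* absent symbol: no cell at all */
        } else if (count[s] <= lowThreshold) {
            norm[s] = -1;             /* present, but below one cell */
            remaining--;              /* it still consumes one cell */
        } else {
            /* Rounded-down proportional share; always >= 1 in this branch. */
            norm[s] = (short)(((unsigned long long)count[s] * tableSize) / total);
            remaining -= norm[s];
            if (norm[s] > largestNorm) { largestNorm = norm[s]; largest = s; }
        }
    }

    if (remaining == 0) return 0;
    /* Crude correction (an assumption of this sketch): give the surplus or
     * deficit to the most frequent symbol, if it can absorb it. */
    if (largestNorm + remaining < 1) return -1;
    norm[largest] = (short)(largestNorm + remaining);
    return 0;
}
```

For counts `{900, 95, 4, 1}` (total 1000) and `tableLog = 5`, the threshold is `1000 >> 5 = 31`, so the last two symbols are flagged and the result is `{27, 3, -1, -1}`, which still accounts for all 32 cells.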

This is pretty advanced stuff. It's not "necessary" to know it. You may just as well assign "1" to these symbols and it will work; they will simply receive a "normal slot", which will reduce the overall compression ratio by a very small amount, but that's no big deal.
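The difference between -1 and 1 shows up when symbols are laid out in the state table. The following is a minimal sketch under stated assumptions, not the project's actual table builder: `spread_symbols` is a hypothetical name, the step formula is assumed to resemble the one used by this kind of coder, and the table is assumed to have at least 32 cells (`tableLog >= 5`) so that the step is odd and visits every cell.

```c
#include <assert.h>

/* Hypothetical sketch: fill table[0 .. (1 << tableLog) - 1] with symbols.
 * norm[s] > 0  : symbol s gets norm[s] cells, scattered with a fixed step.
 * norm[s] == -1: symbol s gets exactly one cell, reserved at the top end.
 * Assumes the weights (counting -1 as 1) sum to the table size. */
void spread_symbols(unsigned char *table, unsigned tableLog,
                    const short *norm, unsigned nbSymbols)
{
    unsigned const tableSize = 1u << tableLog;
    unsigned const tableMask = tableSize - 1;
    /* Assumed step; odd for tableLog >= 5, so it cycles through all cells. */
    unsigned const step = (tableSize >> 1) + (tableSize >> 3) + 3;
    unsigned highThreshold = tableSize - 1;
    unsigned position = 0;
    unsigned s;

    /* Pass 1: park every low-probability symbol at the end of the table. */
    for (s = 0; s < nbSymbols; s++) {
        if (norm[s] == -1) table[highThreshold--] = (unsigned char)s;
    }

    /* Pass 2: scatter the regular symbols, skipping the reserved area. */
    for (s = 0; s < nbSymbols; s++) {
        int i;
        for (i = 0; i < norm[s]; i++) {
            table[position] = (unsigned char)s;
            position = (position + step) & tableMask;
            while (position > highThreshold)
                position = (position + step) & tableMask;
        }
    }
    /* Every cell was filled exactly once if the walk is back at the start. */
    assert(position == 0);
}
```

With weights `{20, 10, -1, -1}` in a 32-cell table, symbols 2 and 3 end up in the last two cells (31 and 30). With `{20, 10, 1, 1}` they are scattered among the regular cells like any other symbol, which is the "normal slot" case described above.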

@JarekDuda

This is basic tuning. A year ago I finally wrote a paper about tuning: https://arxiv.org/pdf/2106.06438
For 2048 states and a 256-symbol alphabet, a ~100-byte header is enough to get within deltaH/H ~ 0.002 of the Shannon limit.


3 participants