Yappy levels mapping #173

Open
tansy opened this issue Jan 22, 2025 · 0 comments

I happened to check yappy and, to my surprise, it uses 100 levels. 100 levels that are virtually indistinguishable from each other.
I checked the source and I get where that comes from, but these levels are needlessly numerous.
While the first few do provide a measurable difference, the later ones are basically the same - the difference in ratio between levels 98 and 99 is 0.000434%.

# lzbench -eyappy silesia.tar

Compressor name  lv Compr.size Ratio  Filename 
yappy 2014-03-22 -0  109432876 51.63  silesia.tar
yappy 2014-03-22 -1  105755106 49.89  silesia.tar
yappy 2014-03-22 -2  103971220 49.05  silesia.tar
yappy 2014-03-22 -3  102945870 48.57  silesia.tar
yappy 2014-03-22 -4  102351839 48.29  silesia.tar
yappy 2014-03-22 -5  101910770 48.08  silesia.tar
yappy 2014-03-22 -6  101582066 47.93  silesia.tar
yappy 2014-03-22 -7  101303954 47.79  silesia.tar
yappy 2014-03-22 -8  100703725 47.51  silesia.tar
yappy 2014-03-22 -9  100158415 47.25  silesia.tar
yappy 2014-03-22 -10 100020279 47.19  silesia.tar
(...)     
yappy 2014-03-22 -95  98683510 46.56  silesia.tar
yappy 2014-03-22 -96  98682010 46.56  silesia.tar
yappy 2014-03-22 -97  98675172 46.55  silesia.tar
yappy 2014-03-22 -98  98674715 46.55  silesia.tar
yappy 2014-03-22 -99  98673795 46.55  silesia.tar
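
As a rough check, the 0.000434% figure can be reproduced from the table alone. The sketch below is only an illustration: the helper name `ratio_diff_percent` is made up here, and the uncompressed size is not stated in the issue, so it is estimated from the level 0 row (compressed size divided by the printed ratio), which is only as precise as the rounded 51.63.

```python
def ratio_diff_percent(size_a: int, size_b: int, original_size: float) -> float:
    """Difference between two compression ratios, in percentage points."""
    return (size_a - size_b) / original_size * 100.0


# Assumption: estimate the uncompressed size from the level 0 row
# (109432876 bytes at a printed ratio of 51.63%). The printed ratio is
# rounded, so this is an approximation, not the exact file size.
ORIGINAL_SIZE_ESTIMATE = 109432876 / (51.63 / 100.0)

# Levels 98 and 99 from the table above.
diff_98_99 = ratio_diff_percent(98674715, 98673795, ORIGINAL_SIZE_ESTIMATE)
print(f"{diff_98_99:.6f}%")  # about 0.000434%
```

The two levels differ by only 920 bytes of output, which is why the ratio column prints the same 46.55 for both.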

So I tried what every gzip dev would do - map these levels onto a smaller, more meaningful and manageable range, namely 1..9.
Since the returns from additional levels diminish quickly, probably at an exponential pace, I decided to use an exponentially growing number of iterations as a yardstick and came up with this formula:

# "lzbench-yappy-9_lv__lv2=(1<<lv)>>2" -eyappy,1,2,3,4,5,6,7,8,9 silesia.tar

Compressor name  lv  old Compr.size Ratio  Filename 
yappy 2014-03-22 -1    0  109432876 51.63  silesia.tar
yappy 2014-03-22 -2    1  105755106 49.89  silesia.tar
yappy 2014-03-22 -3    2  103971220 49.05  silesia.tar
yappy 2014-03-22 -4    4  102351839 48.29  silesia.tar
yappy 2014-03-22 -5    8  100703725 47.51  silesia.tar
yappy 2014-03-22 -6   16   99471363 46.93  silesia.tar
yappy 2014-03-22 -7   32   98912604 46.67  silesia.tar
yappy 2014-03-22 -8   64   98734033 46.58  silesia.tar
yappy 2014-03-22 -9  128   98644248 46.54  silesia.tar

Now the difference in ratio between levels 8 and 9 is 0.042359%. Still very little, but somewhat more meaningful: about 100 times more.
The low levels are also well spread out, in line with their performance.
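
For reference, the `lv2=(1<<lv)>>2` mapping from the command above can be sketched as follows. This is only an illustration of the formula, not the actual lzbench patch; the function name `map_level` and the range check are assumptions added here.

```python
def map_level(lv: int) -> int:
    """Map a new level 1..9 to the old yappy level using (1 << lv) >> 2."""
    # Assumption: reject anything outside 1..9, the range proposed above.
    if not 1 <= lv <= 9:
        raise ValueError("level must be in 1..9")
    return (1 << lv) >> 2


# Print the new -> old mapping, matching the "lv" and "old" columns above.
for lv in range(1, 10):
    print(lv, map_level(lv))
```

Level 1 maps to old level 0 because `2 >> 2` is 0; from level 2 onward the old level doubles at each step, ending at 128 for level 9, which is above the 0..99 range used in the first run.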
