Performance improvement ideas #12
Comments
Besides, as admitted in the Anyway, I think that the performance of this crate can be improved! Using |
Yeah, I'm not too worried about performance really, zopfli is always going to be horrendously slow 🤣. I just came across crc32fast and thought it might be useful here. Although, ECT has demonstrated that fast zopfli is possible. I guess you could try and figure out how it was done. zopfli#119 and zopfli#110 (including the discussions) may be good reference.
When dealing with near-optimal algorithms like Zopfli, one must keep the no free lunch theorem in mind. A patch may perform better on a given benchmark, to the detriment of other possible benchmarks 😉 However, thank you for sharing the links! ECT certainly made some changes that are worth looking into, and multi-threading sounds like relatively low-hanging fruit that could bring some significant performance improvements to the table.
The current implementation looks very inefficient, handling io::Error for every byte (lines 31 to 50 in 41d8712).
I was focusing my attention on the Edit: I've realized I had a much more efficient-looking hashing read wrapper type lying around from another Rust project of mine, so let's transplant it to Zopfli and see how it performs! |
As pointed out by @kornelski, using IterRead to update a rolling hasher as bytes are pulled from a reader is likely to be very inefficient. Let's use a more specific wrapper type that updates the hasher per block of data read, not per byte. While at it, let's replace the `crc32` dependency with `crc32fast`, and `adler32` with `simd-adler32`, which are faster equivalents. `iter-read` is no longer necessary, so it has been removed. The performance impact of these changes has not been properly analyzed yet, but some preliminary runs of `make zopfli && make test` show no noticeable performance regressions at least, which is good! Addresses #12.
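For readers following along, here is a minimal sketch of the per-block idea from the commit message above. It is not the crate's actual code: `HashingRead`, `Adler32` and `drain_and_hash` are hypothetical names, and the hand-written Adler-32 is only a stand-in so the example needs no dependencies (the commit itself uses `simd-adler32` and `crc32fast`).

```rust
use std::io::{self, Read};

/// Minimal Adler-32 state, written by hand so the sketch needs no crates.
/// Assumption: this stands in for a faster hasher such as `simd-adler32`.
struct Adler32 {
    a: u32,
    b: u32,
}

impl Adler32 {
    const MODULUS: u32 = 65_521;

    fn new() -> Self {
        Adler32 { a: 1, b: 0 }
    }

    /// Feeds a whole block of bytes into the checksum at once.
    fn update(&mut self, block: &[u8]) {
        for &byte in block {
            self.a = (self.a + u32::from(byte)) % Self::MODULUS;
            self.b = (self.b + self.a) % Self::MODULUS;
        }
    }

    fn finish(&self) -> u32 {
        (self.b << 16) | self.a
    }
}

/// Hypothetical wrapper: forwards reads to `inner` and hashes each block
/// that comes back, so I/O errors are handled once per `read` call
/// instead of once per byte.
struct HashingRead<R> {
    inner: R,
    hasher: Adler32,
}

impl<R: Read> HashingRead<R> {
    fn new(inner: R) -> Self {
        HashingRead { inner, hasher: Adler32::new() }
    }

    /// Checksum of all bytes read through the wrapper so far.
    fn checksum(&self) -> u32 {
        self.hasher.finish()
    }
}

impl<R: Read> Read for HashingRead<R> {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        let n = self.inner.read(buf)?;
        // Only the first `n` bytes of `buf` were filled by this call.
        self.hasher.update(&buf[..n]);
        Ok(n)
    }
}

/// Drains `reader` through the wrapper; returns (bytes read, Adler-32).
fn drain_and_hash<R: Read>(reader: R) -> io::Result<(u64, u32)> {
    let mut wrapped = HashingRead::new(reader);
    let n = io::copy(&mut wrapped, &mut io::sink())?;
    Ok((n, wrapped.checksum()))
}
```

Calling `drain_and_hash(&b"Wikipedia"[..])` reads 9 bytes and yields the checksum `0x11E60398`. In a real encoder the wrapped reader would be handed to the compressor rather than drained into a sink, and the checksum read back once the input is exhausted.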
I've switched to |
That looks great!
Nice! I measured this around 3-4% faster on one file I tested.
Ha, true! Though in my experience ECT can compress better than zopfli's default settings while still being about 100x faster (without multithreading). On the topic of near-optimal algorithms, libdeflate has its own implementation which is also very fast, though it falls just short of the ratios that ECT/zopfli can reach.
Indeed, by the looks of it, ECT and the other projects you've linked to made quite a few smart optimizations to the hot paths that work better in the practical scenarios I can imagine. Admittedly, this crate is more focused on correctness and on being a faithful Rust port of the original Zopfli code than on performance. Anyway, I'm glad that cleanup worked well for you, and I'd like this crate to become faster too, by implementing some of those ideas! Truth be told, however, I didn't dive that deep into the matter, and getting these optimizations done right requires a lot of effort. I feel confident about implementing multi-threading at some point, but gaining further substantial speedups by accumulating known micro-optimizations is a task that is over my head right now 😇 For the time being, I'd like to keep this issue open to track performance improvement ideas.
Yeah, I totally understand. Wish I knew more about compression myself 🙂
The modified code was part of the get_cost_stat function, which is pretty hot for bigger files (it takes ~17% of the total CPU cycles for a sample file). Let's make some obvious improvements by using better numeric types here. Related issue: #12
I got some time and will to run This is by no means a scientific benchmark or the end of the road when it comes to performance optimization, but I'm amazed that this relatively simple change had such an outsized impact 😄
Nice find!
This SIMD accelerated crc library may be able to improve performance: https://github.com/srijs/rust-crc32fast
Not sure if this is related to the performance difference between this and zopfli-rs?