Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

"failed to decode huffman code" on image that works fine in other software #229

Open
SludgePhD opened this issue Feb 23, 2022 · 5 comments

Comments

@SludgePhD
Copy link

The following image fails to decode with a "failed to decode huffman code" error, but it displays "fine" (with some corruption in the bottom part of the image) in Firefox, sxiv, VS Code, and Dolphin:

error

@quilan1
Copy link
Contributor

quilan1 commented Jul 8, 2022

This is a case of a corrupt entropy stream, but it's not quite clear what the corruption is. I can, however, illustrate why the stream yields the error. We'll start from the file offset 0xD19 (3353'th byte). Here begins the coefficient decode for the MCU block (x=4, y=14), or in coordinates of the image, (32, 112), specifically component {id=2}.

Please excuse the crude diagram:

   |---69--|---C1--|---A9--|---5B--|---B7--|---1F--|---BE--|---8F--|---FE--|---FB--
   01101001110000011010100101011011100001110001111110111110100011111111111011111011
H3: ---100                   H6: ------011100                  ????????????????????
      H6: ------001101                   H1: --1
                  H1: --0                  H34: --------11
                     H1: --0                          H1: --0
                       H17: ----0                        H0: --

We start off with reading the DC coefficient; H3 means the Huffman symbol for '3' (110b in this table), followed by 3 bits (100b), to be calculated as some coefficient offset value. Irrelevant here. Then it reads the AC coefficients, until it reaches the Huffman '0' symbol, signifying that it's the end of the coefficient information. All good'ish (although I think this row is a bit suspect in general)!

Now comes the next set of coefficients we need to read, for component {id=3}. We have the following set of bits ahead in the stream: 11111111111011111011 -- eleven one-bits ahead.

The problem is that the DC Huffman table in use has no matching code for this pattern of eleven one-bits. It's got the following entries:

Value Code Value Code
0 00 6 111110
1 01 7 1111110
2 10 8 11111110
3 110 9 111111110
4 1110 10 1111111110
5 11110 11 11111111110

At this point, I'm not sure what the correct approach to even trying to recover from the issue would be. Somewhere the stream got garbled by some bits, but there's (to my brief look) no obvious place. I suppose the bare minimum one could theoretically do to recover from this type of error, would be to scan forward for the next RST marker, and drop all of the coefficients until then (super ugly), but the problem here is that this stream doesn't have any.

@micahsnyder
Copy link
Contributor

I have a couple more example jpegs that are handled by other software by fail to load with jpeg-decoder because of the "failed to decode huffman code" error.

1574264676-failed-to-decode-huffman-code

1853202599-failed-to-decode-huffman-code

Admittedly, my sample files look even more like garbage than @SludgePhD's file. But it would be nice if image-rs/jpeg-decoder could make a best effort. My use case is to generate image fuzzy hashes for malware identification. So my hope is to make a best effort even if the image is technically malformed.

@fintelia
Copy link
Contributor

@micahsnyder If you want to match what other viewers display as closely as possible, I'd suggest relying on libjpeg since that's what those viewers are probably using. Any other decoder implementation is almost surely going to display some corrupted images differently since they don't usually target "bug for bug compatibility"

@micahsnyder
Copy link
Contributor

@fintelia that's good advice. Thanks. In my case it's okay if it's not rendered the same, so long as it gives a usable image that can be hashed. Our tools for both creating the hashes and matching the hashes use the same code with image-rs / jpeg-decoder. So I think it is okay for us if the best effort comes out a little differently.

@MoSal
Copy link

MoSal commented Jul 28, 2022

So this basically relates to #252, except "partially-invalid" would be more accurate than "incomplete", and you wouldn't want the decoder to give up as early as possible if you want this to be generally useful.

This patch puts the decoder on par with libjpeg-turbo and FFmpeg (with this patch applied) in my very limited tests.

The first part of the patch (the read_marker part) was done to handle the image from @micahsnyder. This may not be how/where EOF errors should/would be handled.

The last part (continuing after decode block errors) should be generally useful. It fixed an image that brought me here, and it's enough to handle the image from @SludgePhD.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants