Parser is sometimes wrong when using CANONICALIZE_FIELD_NAMES #213
Comments
Ok. Which version is this with? Is the input in the form of bytes (…)?
Version 2.6.1 (latest). I was using the File version of the parser; the problem was in the … So the canonicalizer is given one name in bytes, but returns another, because of a hash collision and a missing safety check.
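To illustrate the failure mode described above, here is a minimal sketch of an interner that trusts the hash alone. This is not Jackson's actual symbol-table code; the class and method names are made up for the example, and the collision pair relies on `String.hashCode()` rather than Jackson's own hashing.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Hypothetical interner: caches decoded names keyed ONLY by hash code,
// with no equality check on collision -- the bug pattern described above.
class NaiveInterner {
    private final Map<Integer, String> byHash = new HashMap<>();

    String intern(byte[] nameBytes) {
        String name = new String(nameBytes, StandardCharsets.UTF_8);
        int hash = name.hashCode();
        // BUG: if two different names share a hash, the first one interned
        // wins, and the second caller silently gets the wrong name back.
        return byHash.computeIfAbsent(hash, h -> name);
    }
}

public class CollisionDemo {
    public static void main(String[] args) {
        NaiveInterner interner = new NaiveInterner();
        // "Aa" and "BB" are a well-known String.hashCode() collision pair.
        String a = interner.intern("Aa".getBytes(StandardCharsets.UTF_8));
        String b = interner.intern("BB".getBytes(StandardCharsets.UTF_8));
        System.out.println(a + " " + b); // both print as "Aa": key swapped
    }
}
```

The missing safety check is a byte-for-byte (or char-for-char) comparison of the cached name against the incoming bytes before returning the cached entry.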
Quick note: I am on vacation, returning in one week. So while this is a critical issue, there will be no progress until then, but we'll get it fixed as soon as I get back next week. I was also wondering if this might be related to https://github.com/FasterXML/jackson-dataformat-smile/issues/26, given that both parsers (and CBOR as well) share the new symbol table implementation for 2.6.
@cowtowncoder we worked around it by not using the Quad class, so I'm not in a hurry for a fix. It is kind of critical (as you mentioned) though :) About the other problem: it might be related, but I haven't looked deeper. From our data set we only saw one key replaced by another; it may cause an array out of bounds in another module.
@cowtowncoder I'm experiencing similar issues related to the … I have a file with dictionaries of about 500-800 elements each; when parsing this file, sometimes a few elements are missing, though most of the time they are not. When I disable …
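For reference, the feature discussed in this thread can be turned off on the `JsonFactory`. A sketch of the workaround, assuming jackson-core 2.x on the classpath (`input` is a placeholder for whatever source you parse from):

```java
// Workaround sketch: disable field-name canonicalization so the shared
// symbol table (and its collision handling) is bypassed entirely.
JsonFactory factory = new JsonFactory();
factory.disable(JsonFactory.Feature.CANONICALIZE_FIELD_NAMES);
JsonParser parser = factory.createParser(input);
```

Note this trades away the speed benefit of interned field names, so it is a stopgap rather than a fix.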
I suspect this -- FasterXML/jackson-databind#916 -- is the same issue.
If you have a big dictionary (150 000 keys), the parser will randomly swap one of the field names with another. We traced it down to CANONICALIZE_FIELD_NAMES (if disabled, it doesn't happen).
Out of 1000 parsings of a file with 150 000 keys, around 50 (5 %) will have a single key swapped. I guess if you try with more keys it will fail more often.
Our keys are randomly generated, matching /[0-9A-Za-z]{17}/.
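For anyone trying to reproduce this, here is a stdlib-only sketch that generates 150 000 unique keys of the same shape as the reporters' data set; the Jackson serialize/parse/compare loop itself is omitted, since it depends on your Jackson setup:

```java
import java.security.SecureRandom;
import java.util.HashSet;
import java.util.Set;

// Sketch: generate random keys matching /[0-9A-Za-z]{17}/, the key shape
// reported to trigger the swapped-field-name bug at ~5% of parses.
public class KeyGen {
    private static final String ALPHABET =
        "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz";
    private static final SecureRandom RND = new SecureRandom();

    static String randomKey() {
        StringBuilder sb = new StringBuilder(17);
        for (int i = 0; i < 17; i++) {
            sb.append(ALPHABET.charAt(RND.nextInt(ALPHABET.length())));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Collect unique keys until we have 150 000, as in the report.
        Set<String> keys = new HashSet<>();
        while (keys.size() < 150_000) {
            keys.add(randomKey());
        }
        System.out.println(keys.size()); // 150000
    }
}
```

To turn this into a full repro, write the keys out as one large JSON object, parse it back repeatedly, and diff the parsed key set against the generated one.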