Parser is sometimes wrong when using CANONICALIZE_FIELD_NAMES #213

ichernev · 2015-08-21T23:23:32Z

When parsing a large JSON object (around 150 000 keys), the parser occasionally returns one field name in place of another. We traced it to CANONICALIZE_FIELD_NAMES: with the feature disabled, the problem does not occur.

Out of 1000 parses of a file with 150 000 keys, around 50 (5%) have a single key swapped. I would expect the failure rate to rise with more keys.

Our keys are randomly generated strings matching /[0-9A-Za-z]{17}/.
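For anyone who wants to build a similar input, here is a minimal sketch of a generator for keys of the shape described above. The class name `RandomKeys`, the fixed seed, and the flat `{"key":index,...}` layout are assumptions made for illustration; the reporter's actual file format is not shown in this thread.

```java
import java.util.LinkedHashSet;
import java.util.Random;
import java.util.Set;

// Hypothetical helper (not part of Jackson): builds test input with keys
// matching /[0-9A-Za-z]{17}/, as described in the report.
final class RandomKeys {
    private static final String ALPHABET =
        "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz";

    /** Returns `count` distinct 17-character alphanumeric keys. */
    static Set<String> generate(int count, long seed) {
        Random rnd = new Random(seed);
        Set<String> keys = new LinkedHashSet<>();
        while (keys.size() < count) {
            StringBuilder sb = new StringBuilder(17);
            for (int i = 0; i < 17; i++) {
                sb.append(ALPHABET.charAt(rnd.nextInt(ALPHABET.length())));
            }
            keys.add(sb.toString()); // duplicates are silently dropped by the set
        }
        return keys;
    }

    /** Builds a flat JSON object; alphanumeric keys need no escaping. */
    static String toJson(Set<String> keys) {
        StringBuilder sb = new StringBuilder("{");
        int i = 0;
        for (String k : keys) {
            if (i > 0) sb.append(',');
            sb.append('"').append(k).append("\":").append(i++);
        }
        return sb.append('}').toString();
    }

    public static void main(String[] args) {
        String json = toJson(generate(150_000, 42L));
        System.out.println("generated " + json.length() + " characters of JSON");
    }
}
```

The resulting string can be written to a file and parsed repeatedly, comparing the field names the parser reports against the generated set.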


cowtowncoder · 2015-08-22T05:25:57Z

Ok. Which version is this with? Is the input in the form of bytes (InputStream, File, byte[]) or chars (Reader, char[])? Also, how does this show itself -- does the parser report an incorrect field name?

ichernev · 2015-08-24T19:42:15Z

Version 2.6.1 (latest). I was using the File version of the parser. The problem is in ByteQuadsCanonicalizer: one equality check is missing (a colleague of mine found it; I'll ask for details).

So the canonicalizer is given one name in bytes, but returns another, because of a hash collision and a missing safety check.
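To illustrate the failure mode described here, the following is a toy sketch of a symbol table whose lookup trusts a hash match without comparing the actual bytes. It is not the ByteQuadsCanonicalizer code: the class `ToySymbolTable`, its methods, and the deliberately weak byte-sum hash are all assumptions chosen so that a collision is easy to reproduce.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Toy open-addressing table (hypothetical, NOT Jackson's implementation).
final class ToySymbolTable {
    private final int[] hashes;
    private final byte[][] raw;
    private final String[] names;
    private final int mask;

    /** Capacity must be a power of two; the table is assumed never to fill up. */
    ToySymbolTable(int capacity) {
        hashes = new int[capacity];
        raw = new byte[capacity][];
        names = new String[capacity];
        mask = capacity - 1;
    }

    /** Deliberately weak hash (sum of bytes) so that "ab" and "ba" collide. */
    static int weakHash(byte[] b) {
        int h = 0;
        for (byte x : b) h += x;
        return h;
    }

    void add(String name) {
        byte[] b = name.getBytes(StandardCharsets.UTF_8);
        int h = weakHash(b);
        int i = h & mask;
        while (names[i] != null) i = (i + 1) & mask; // linear probing
        hashes[i] = h;
        raw[i] = b;
        names[i] = name;
    }

    /** Buggy lookup: returns the first entry whose hash matches. */
    String findUnsafe(byte[] b) {
        int h = weakHash(b);
        for (int i = h & mask; names[i] != null; i = (i + 1) & mask) {
            if (hashes[i] == h) return names[i];
        }
        return null;
    }

    /** Safe lookup: a hash match must be confirmed by comparing the bytes. */
    String findSafe(byte[] b) {
        int h = weakHash(b);
        for (int i = h & mask; names[i] != null; i = (i + 1) & mask) {
            if (hashes[i] == h && Arrays.equals(raw[i], b)) return names[i];
        }
        return null;
    }

    public static void main(String[] args) {
        ToySymbolTable table = new ToySymbolTable(8);
        table.add("ab");
        table.add("ba");
        byte[] query = "ba".getBytes(StandardCharsets.UTF_8);
        System.out.println(table.findUnsafe(query)); // prints "ab" (wrong name)
        System.out.println(table.findSafe(query));   // prints "ba"
    }
}
```

With a realistic hash the collision is rare, which would fit the low failure rate reported above: only a small fraction of parses hit two names with matching hashes.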

ichernev · 2015-08-25T17:06:15Z

This is the offending line: https://github.com/FasterXML/jackson-core/blob/master/src/main/java/com/fasterxml/jackson/core/sym/ByteQuadsCanonicalizer.java#L705

cowtowncoder · 2015-09-01T07:52:34Z

Quick note: I am on vacation, and returning in one week. So while this is a critical issue, there will be no progress until then; we'll get it fixed as soon as I get back next week.

I was also wondering if this might be related to

https://github.com/FasterXML/jackson-dataformat-smile/issues/26

given that both parsers (and CBOR as well) share the new symbol table implementation for 2.6.
So it would seem likely that problems could manifest themselves in multiple places as well.

ichernev · 2015-09-01T16:03:20Z

@cowtowncoder we worked around it by not using the Quad class, so I'm not in a hurry to get a fix. It is kind of critical (as you mentioned) though :)

About the other problem -- it might be related, but I haven't looked deeper. In our data set we only saw one key replaced by another; the same bug might cause an array-out-of-bounds error in another module.

vzx · 2015-09-07T14:55:13Z

@cowtowncoder I'm experiencing similar issues related to the CANONICALIZE_FIELD_NAMES feature as well.

I have a file containing dictionaries with about 500-800 elements each. When parsing this file, a few elements are sometimes missing, though most of the time they are not. When I disable CANONICALIZE_FIELD_NAMES, it works fine.

cowtowncoder · 2015-09-08T22:13:51Z

I suspect this -- FasterXML/jackson-databind#916 -- is the same.
Since that one contains a test case, I'll start with it.

cowtowncoder mentioned this issue Sep 8, 2015

JSON with 17-character keys deserializes incorrectly (2.6.0, 2.6.1) FasterXML/jackson-databind#916

Closed

cowtowncoder added a commit that referenced this issue Sep 8, 2015

Add test for #213

6f496e7

cowtowncoder closed this as completed in cfeaed0 Sep 8, 2015

cowtowncoder added this to the 2.6.2 milestone Sep 8, 2015

tlrx mentioned this issue Sep 17, 2015

Update to Jackson 2.6.2 elastic/elasticsearch#13344

Merged
