[C++] An Error Occurred While Reading Parquet File Using C++ - GetRecordBatchReader - Corrupt snappy compressed data. #31992
Comments
Weston Pace / @westonpace:
I did not get any errors and got the expected output:
Does my test program work in your environment?
SnappyCodec::Decompress() called from SerializedPageReader::DecompressIfNeeded() fails if input_len == 0
Could you open a pull request with a test?
IIRC, levels are compressed together with values. If all values are NULLs, it must have definition levels encoded and compressed. In any case, the compressed length should not be 0. The fix itself looks reasonable to me.
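For readers following the zero-length case mentioned above, here is a minimal, self-contained sketch of the kind of guard being discussed. It is not the actual Arrow patch (the thread does not show it). `DecompressPageSketch` and `DecompressFn` are hypothetical names introduced only for illustration, and the codec is passed in as a plain callback rather than being the real `SnappyCodec`.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for a codec's decompress call (not the Arrow API):
// takes (input, input_len, output, output_capacity), returns bytes written.
using DecompressFn = std::function<std::size_t(
    const std::uint8_t*, std::size_t, std::uint8_t*, std::size_t)>;

// Hypothetical helper: decompress one page body, tolerating an empty input.
// Assumption: a writer may emit a page whose compressed length is 0; such a
// page is accepted only if its declared uncompressed length is also 0.
std::vector<std::uint8_t> DecompressPageSketch(
    const std::vector<std::uint8_t>& compressed,
    std::size_t uncompressed_len,
    const DecompressFn& codec) {
  std::vector<std::uint8_t> out(uncompressed_len);

  if (compressed.empty()) {
    if (uncompressed_len != 0) {
      throw std::runtime_error("empty compressed page with non-zero size");
    }
    return out;  // nothing to decode, so the codec is never called
  }

  const std::size_t written =
      codec(compressed.data(), compressed.size(), out.data(), out.size());
  if (written != uncompressed_len) {
    throw std::runtime_error("page size mismatch after decompression");
  }
  return out;
}
```

The point of the sketch is only that the empty-input check happens before the codec is invoked, so a codec that rejects zero-length input is never handed one.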
Would it be enough if I attach the buggy Parquet file here instead? :) The file was produced by the Java library.
@4ertus2 Do you mind opening a pull request against https://github.com/apache/parquet-testing to add this file?
Hi all,
When I use Arrow to read a Parquet file as follows:
the status is not OK and an error like this occurs:
When I comment out this statement,
the program runs normally and I can read the Parquet file without problems.
The error occurs only when I read multiple columns with _reader->set_use_threads(true); reading a single column does not trigger it.
The test Parquet file was created with pyarrow. It has a single row group containing 3,000,000 records.
The Parquet file has 20 columns of int and string types.
You can create a test Parquet file using the attached Python script.
In my case, I read the columns at indices 0, 1, 2, 3, 4, 5, and 6.
Reading the file: C++, Arrow 7.0.0, snappy 1.1.8
Writing the file: Python 3.8, pyarrow 7.0.0
Looking forward to your reply.
Thank you!
@pitrou
@westonpace
Environment: C++, arrow 7.0.0, snappy 1.1.8, arrow 8.0.0, pyarrow 7.0.0, ubuntu 9.4.0, python3.8
Reporter: yurikoomiga
Original Issue Attachments:
Externally tracked issue: #13186
Note: This issue was originally created as ARROW-16642. Please see the migration documentation for further details.