[comet-parquet-exec] Track remaining test failures in POC 1 & 2 #1228

Open
andygrove opened this issue Jan 7, 2025 · 0 comments
Labels
enhancement New feature or request

andygrove (Member) commented Jan 7, 2025

What is the problem the feature request solves?

I thought it would be useful to have one issue to track all the test failures that we are currently working on resolving.

POC 1

58 failing tests as of Jan 7, excluding stability tests:

  • unsupported Spark types (prefetch enabled) *** FAILED *** (12 milliseconds)
  • unsupported Spark types *** FAILED *** (0 milliseconds)
  • timestamp (prefetch enabled) *** FAILED *** (139 milliseconds)
  • timestamp *** FAILED *** (68 milliseconds)
  • timestamp as int96 (prefetch enabled) *** FAILED *** (165 milliseconds)
  • timestamp as int96 *** FAILED *** (124 milliseconds)
  • test multiple pages with different sizes and nulls (prefetch enabled) *** FAILED *** (142 milliseconds)
  • test multiple pages with different sizes and nulls *** FAILED *** (91 milliseconds)
  • unsigned long supported (prefetch enabled) *** FAILED *** (65 milliseconds)
  • unsigned long supported *** FAILED *** (50 milliseconds)
  • FIXED_LEN_BYTE_ARRAY support (prefetch enabled) *** FAILED *** (63 milliseconds)
  • FIXED_LEN_BYTE_ARRAY support *** FAILED *** (51 milliseconds)
  • schema evolution (prefetch enabled) *** FAILED *** (277 milliseconds)
  • schema evolution *** FAILED *** (275 milliseconds)
  • scan metrics (prefetch enabled) *** FAILED *** (166 milliseconds)
  • scan metrics *** FAILED *** (132 milliseconds)
  • row group skipping doesn't overflow when reading into larger type (prefetch enabled) *** FAILED *** (67 milliseconds)
  • row group skipping doesn't overflow when reading into larger type *** FAILED *** (55 milliseconds)
  • Test V1 parquet scan uses respective scanner (prefetch enabled) *** FAILED *** (126 milliseconds)
  • Test V1 parquet scan uses respective scanner *** FAILED *** (108 milliseconds)
  • columnar shuffle on struct including nulls *** FAILED *** (154 milliseconds)
  • columnar shuffle on nested struct *** FAILED *** (196 milliseconds)
  • native shuffle: different data type *** FAILED *** (1 second, 281 milliseconds)
  • columnar shuffle on struct including nulls *** FAILED *** (127 milliseconds)
  • columnar shuffle on nested struct *** FAILED *** (163 milliseconds)
  • columnar shuffle on struct including nulls *** FAILED *** (121 milliseconds)
  • columnar shuffle on nested struct *** FAILED *** (152 milliseconds)
  • columnar shuffle on struct including nulls *** FAILED *** (126 milliseconds)
  • columnar shuffle on nested struct *** FAILED *** (152 milliseconds)
  • reading ancient dates before 1582 *** FAILED *** (47 milliseconds)
  • reading ancient timestamps before 1582 *** FAILED *** (39 milliseconds)
  • reading ancient int96 timestamps before 1582 *** FAILED *** (313 milliseconds)
  • SortMergeJoin with unsupported key type should fall back to Spark *** FAILED *** (183 milliseconds)
  • HashJoin struct key *** FAILED *** (1 second, 215 milliseconds)
  • unsupported Spark types (prefetch enabled) *** FAILED *** (0 milliseconds)
  • unsupported Spark types *** FAILED *** (0 milliseconds)
  • cast TimestampType to LongType *** FAILED *** (147 milliseconds)
  • cast TimestampType to DecimalType(10,2) *** FAILED *** (116 milliseconds)
  • cast TimestampType to StringType *** FAILED *** (115 milliseconds)
  • cast TimestampType to DateType *** FAILED *** (115 milliseconds)
  • cast StructType to StringType *** FAILED *** (297 milliseconds)
  • Comet native metrics: scan *** FAILED *** (84 milliseconds)
  • explain native plan *** FAILED *** (243 milliseconds)
  • spill sort with (multiple) dictionaries on mixed columns *** FAILED *** (86 milliseconds)
  • columnar shuffle on struct including nulls *** FAILED *** (127 milliseconds)
  • columnar shuffle on nested struct *** FAILED *** (154 milliseconds)
  • basic data type support *** FAILED *** (136 milliseconds)
  • hour on int96 timestamp column *** FAILED *** (84 milliseconds)
  • cast timestamp and timestamp_ntz *** FAILED *** (100 milliseconds)
  • cast timestamp and timestamp_ntz to string *** FAILED *** (90 milliseconds)
  • cast timestamp and timestamp_ntz to long, date *** FAILED *** (102 milliseconds)
  • date_trunc with timestamp_ntz *** FAILED *** (102 milliseconds)
  • date_trunc with format array *** FAILED *** (160 milliseconds)
  • date_trunc on int96 timestamp column *** FAILED *** (106 milliseconds)
  • round *** FAILED *** (378 milliseconds)
  • hex *** FAILED *** (209 milliseconds)
  • Year *** FAILED *** (98 milliseconds)

POC 2

66 failing tests as of Jan 7, excluding stability tests:

  • Standard mode - decimals *** FAILED *** (3 seconds, 417 milliseconds)
  • Legacy mode - decimals *** FAILED *** (2 seconds, 801 milliseconds)
  • unsupported Spark types (prefetch enabled) *** FAILED *** (11 milliseconds)
  • unsupported Spark types *** FAILED *** (0 milliseconds)
  • timestamp *** FAILED *** (128 milliseconds)
  • timestamp as int96 *** FAILED *** (93 milliseconds)
  • test multiple pages with different sizes and nulls *** FAILED *** (71 milliseconds)
  • unsigned long supported *** FAILED *** (41 milliseconds)
  • FIXED_LEN_BYTE_ARRAY support *** FAILED *** (39 milliseconds)
  • scan metrics *** FAILED *** (142 milliseconds)
  • columnar shuffle on struct including nulls *** FAILED *** (144 milliseconds)
  • columnar shuffle on nested struct *** FAILED *** (176 milliseconds)
  • fix: native Unsafe row accessors return incorrect results *** FAILED *** (878 milliseconds)
  • fix: StreamReader should always set useDecimal128 as true *** FAILED *** (184 milliseconds)
  • columnar shuffle: different data type *** FAILED *** (1 second, 504 milliseconds)
  • native shuffle: different data type *** FAILED *** (1 second, 208 milliseconds)
  • columnar shuffle on struct including nulls *** FAILED *** (124 milliseconds)
  • columnar shuffle on nested struct *** FAILED *** (163 milliseconds)
  • fix: native Unsafe row accessors return incorrect results *** FAILED *** (859 milliseconds)
  • fix: StreamReader should always set useDecimal128 as true *** FAILED *** (180 milliseconds)
  • columnar shuffle: different data type *** FAILED *** (1 second, 389 milliseconds)
  • reading ancient dates before 1582 *** FAILED *** (43 milliseconds)
  • reading ancient timestamps before 1582 *** FAILED *** (36 milliseconds)
  • reading ancient int96 timestamps before 1582 *** FAILED *** (292 milliseconds)
  • SortMergeJoin with unsupported key type should fall back to Spark *** FAILED *** (180 milliseconds)
  • unsupported Spark types (prefetch enabled) *** FAILED *** (0 milliseconds)
  • unsupported Spark types *** FAILED *** (1 millisecond)
  • cast TimestampType to LongType *** FAILED *** (123 milliseconds)
  • cast TimestampType to DecimalType(10,2) *** FAILED *** (99 milliseconds)
  • cast TimestampType to StringType *** FAILED *** (102 milliseconds)
  • cast TimestampType to DateType *** FAILED *** (103 milliseconds)
  • cast StructType to StringType *** FAILED *** (284 milliseconds)
  • Sort on single struct should fallback to Spark *** FAILED *** (155 milliseconds)
  • Comet native metrics: scan *** FAILED *** (81 milliseconds)
  • spill sort with (multiple) dictionaries on mixed columns *** FAILED *** (78 milliseconds)
  • columnar shuffle on struct including nulls *** FAILED *** (124 milliseconds)
  • columnar shuffle on nested struct *** FAILED *** (164 milliseconds)
  • fix: native Unsafe row accessors return incorrect results *** FAILED *** (840 milliseconds)
  • fix: StreamReader should always set useDecimal128 as true *** FAILED *** (168 milliseconds)
  • columnar shuffle: different data type *** FAILED *** (1 second, 453 milliseconds)
  • basic data type support *** FAILED *** (119 milliseconds)
  • date and timestamp type literals *** FAILED *** (106 milliseconds)
  • hour on int96 timestamp column *** FAILED *** (80 milliseconds)
  • cast timestamp and timestamp_ntz *** FAILED *** (86 milliseconds)
  • cast timestamp and timestamp_ntz to string *** FAILED *** (76 milliseconds)
  • cast timestamp and timestamp_ntz to long, date *** FAILED *** (88 milliseconds)
  • date_trunc with timestamp_ntz *** FAILED *** (83 milliseconds)
  • date_trunc with format array *** FAILED *** (150 milliseconds)
  • date_trunc on int96 timestamp column *** FAILED *** (107 milliseconds)
  • round *** FAILED *** (318 milliseconds)
  • hex *** FAILED *** (190 milliseconds)
  • Year *** FAILED *** (78 milliseconds)
  • get_struct_field - select primitive fields *** FAILED *** (98 milliseconds)
  • get_struct_field - select subset of struct *** FAILED *** (95 milliseconds)
  • get_struct_field - read entire struct *** FAILED *** (118 milliseconds)
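
Both lists repeat some test names (for example, "columnar shuffle on nested struct"), so a per-name tally can make them easier to compare. The following is a rough sketch only, not part of the Comet build: it assumes the failures are available as text lines in the `<name> *** FAILED *** (<duration>)` shape shown above. The helper names `tally_failures` and `only_in_first` and the sample strings are made up for illustration.

```python
import re
from collections import Counter

# Matches lines such as "• timestamp *** FAILED *** (68 milliseconds)".
# A leading bullet ("•", "-" or "*") is optional and is not part of the name.
FAILED_LINE = re.compile(r"^\s*(?:[•\-*]\s*)?(?P<name>.+?) \*\*\* FAILED \*\*\*")


def tally_failures(log_text):
    """Count how many times each test name is reported as FAILED."""
    counts = Counter()
    for line in log_text.splitlines():
        match = FAILED_LINE.match(line)
        if match:
            counts[match.group("name").strip()] += 1
    return counts


def only_in_first(first, second):
    """Return the sorted test names that fail in `first` but not in `second`."""
    return sorted(set(first) - set(second))


# Hypothetical excerpts in the same shape as the lists above.
SAMPLE_POC1 = """
  • timestamp (prefetch enabled) *** FAILED *** (139 milliseconds)
  • timestamp *** FAILED *** (68 milliseconds)
  • columnar shuffle on nested struct *** FAILED *** (196 milliseconds)
  • columnar shuffle on nested struct *** FAILED *** (163 milliseconds)
  • HashJoin struct key *** FAILED *** (1 second, 215 milliseconds)
"""

SAMPLE_POC2 = """
  • timestamp *** FAILED *** (128 milliseconds)
  • columnar shuffle on nested struct *** FAILED *** (176 milliseconds)
  • get_struct_field - select primitive fields *** FAILED *** (98 milliseconds)
"""

poc1 = tally_failures(SAMPLE_POC1)
poc2 = tally_failures(SAMPLE_POC2)
```

With the samples above, `only_in_first(poc1, poc2)` lists the names that fail only in the first excerpt. The test name alone does not say which suite a failure came from, so repeated names are counted rather than collapsed.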

Issues

Causes of the above failures (partial list)

  • Need to be able to cast between different timestamp types and timezones
  • Need to add scan metrics
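
To illustrate why the first item is not a simple relabeling: an instant-based timestamp and a timezone-less (`timestamp_ntz`-style) wall-clock value carry different information, so converting between them depends on a time zone. The sketch below uses plain Python `datetime` with fixed UTC offsets as a simplification (real time zones also have DST rules). It is not Comet or Spark code, and the function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def instant_to_wall_clock(instant, session_offset):
    """Render an instant as a timezone-less wall-clock value at the given offset."""
    return instant.astimezone(session_offset).replace(tzinfo=None)


def wall_clock_to_instant(wall_clock, session_offset):
    """Interpret a timezone-less value at the given offset and return a UTC instant."""
    return wall_clock.replace(tzinfo=session_offset).astimezone(timezone.utc)


# Fixed offsets stand in for session time zones (assumption for illustration).
UTC_MINUS_8 = timezone(timedelta(hours=-8))
UTC_PLUS_9 = timezone(timedelta(hours=9))

instant = datetime(2025, 1, 7, 12, 0, tzinfo=timezone.utc)

# The same instant yields different wall-clock values per session offset.
west = instant_to_wall_clock(instant, UTC_MINUS_8)  # 2025-01-07 04:00
east = instant_to_wall_clock(instant, UTC_PLUS_9)   # 2025-01-07 21:00

# Converting back only recovers the instant if the same offset is used.
round_trip = wall_clock_to_instant(west, UTC_MINUS_8)
mismatch = wall_clock_to_instant(west, UTC_PLUS_9)
```

Here `round_trip` equals the original instant, while `mismatch` is off by 17 hours, which is the kind of discrepancy a cast has to avoid when the reader and the session disagree on the time zone.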

Describe the potential solution

No response

Additional context

No response

@andygrove andygrove added the enhancement New feature or request label Jan 7, 2025