Chunked compute shouldn't preserve chunks during scan #1943

gatesn · 2025-01-14T16:53:56Z

I don't know the right API for this, but a scan already partitions the file into splits. When column splits don't align, we end up computing filters and projections over chunked arrays.
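To make the misalignment concrete, here is a minimal sketch that assumes a column is stored as consecutive chunks with known row counts. The function `chunks_for_split` is hypothetical and is not part of any real API. It maps a split's row range onto the chunks it overlaps, and a split that crosses a chunk boundary yields more than one piece, which is where a chunked array comes from.

```rust
/// Hypothetical sketch: map a scan split (row range [start, end)) onto the
/// chunks of one column. `chunk_lens` holds the row count of each chunk.
/// Returns (chunk_index, offset_within_chunk, len) for every chunk touched.
fn chunks_for_split(chunk_lens: &[usize], start: usize, end: usize) -> Vec<(usize, usize, usize)> {
    let mut pieces = Vec::new();
    let mut chunk_start = 0;
    for (idx, &len) in chunk_lens.iter().enumerate() {
        let chunk_end = chunk_start + len;
        // Overlap of the split [start, end) with this chunk [chunk_start, chunk_end).
        let lo = start.max(chunk_start);
        let hi = end.min(chunk_end);
        if lo < hi {
            pieces.push((idx, lo - chunk_start, hi - lo));
        }
        chunk_start = chunk_end;
    }
    pieces
}
```

With three chunks of 100 rows, the split `50..150` touches two chunks, `(0, 50, 50)` and `(1, 0, 50)`, so any filter or projection over that split operates on a two-chunk array.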

Currently the chunked compute functions preserve chunking, which adds a lot of overhead within splits that we already know are reasonably sized.

Should we never preserve chunking? Or should we set some thread-local options that we can configure during the scan?
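The thread-local option could look roughly like the sketch below. Everything in it is hypothetical: `PRESERVE_CHUNKS`, `PreserveChunksGuard`, `Chunked`, and `filter_chunked` are illustrative names, not Vortex's API, and a chunked array is modeled as a plain `Vec<Vec<i64>>`. The scan would create a guard at the start of each split so that chunked compute emits one contiguous output, and the previous setting is restored when the guard drops.

```rust
use std::cell::Cell;

thread_local! {
    // Hypothetical option: whether chunked compute keeps the input's chunk layout.
    static PRESERVE_CHUNKS: Cell<bool> = Cell::new(true);
}

/// Hypothetical RAII guard: overrides the option on the current thread and
/// restores the previous value when dropped.
struct PreserveChunksGuard {
    previous: bool,
}

impl PreserveChunksGuard {
    fn set(value: bool) -> Self {
        let previous = PRESERVE_CHUNKS.with(|c| c.replace(value));
        PreserveChunksGuard { previous }
    }
}

impl Drop for PreserveChunksGuard {
    fn drop(&mut self) {
        PRESERVE_CHUNKS.with(|c| c.set(self.previous));
    }
}

/// Hypothetical chunked array: a list of chunks of i64 values.
type Chunked = Vec<Vec<i64>>;

/// Filter every chunk with `pred`. When preserving chunks, output one chunk per
/// input chunk (empty ones included); otherwise concatenate into a single chunk.
fn filter_chunked(input: &Chunked, pred: impl Fn(i64) -> bool) -> Chunked {
    let filtered: Chunked = input
        .iter()
        .map(|chunk| chunk.iter().copied().filter(|&v| pred(v)).collect())
        .collect();
    if PRESERVE_CHUNKS.with(|c| c.get()) {
        filtered
    } else {
        vec![filtered.into_iter().flatten().collect()]
    }
}
```

Using a guard rather than a bare setter keeps the override scoped to one split, so compute outside the scan keeps the default chunk-preserving behavior. The alternative of never preserving chunking would amount to always taking the `else` branch.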
