Core: Support aggregated basic stats in partition summary #11669
@deniskuzZ: Could you please provide a short description of what data is stored in the summary and in what format? I think it is important to understand the cost of keeping this stat up to date: how costly it is to calculate, and how much the data size increases with this change. @findepi: Could this be useful for Trino? Does Trino have an optimization like this?
This discussion could be relevant here too: https://lists.apache.org/thread/0q1csnkfg8jc11zo1dlssjkr4v8s8zz0
@pvary, unfortunately, that won't work. I was looking for an easy way to get basic partition stats; however, I missed that Iceberg only keeps the changed partitions in a SnapshotSummary. Aggregating with just the previous snapshot's value is not enough; it requires looping through all the snapshots.
Do you think it's worth doing this in SnapshotSummary, or is there a simpler/better way, such as creating or updating the partition stats Puffin file right after the commit?
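To illustrate the point above, here is a minimal sketch of why the previous snapshot alone is not enough. It does not use the Iceberg API: each snapshot is modeled as a plain dict that lists only the partitions it changed, and the `records`/`files` fields, the partition names, and the `aggregate_partition_stats` helper are all hypothetical.

```python
from collections import defaultdict


def aggregate_partition_stats(snapshots):
    """Fold per-snapshot partition deltas (oldest first) into running totals.

    Each element of `snapshots` is a hypothetical stand-in for a snapshot
    summary: a dict mapping a partition name to the delta that snapshot
    applied to it. Partitions a snapshot did not touch are absent.
    """
    totals = defaultdict(lambda: {"records": 0, "files": 0})
    for changed in snapshots:
        for partition, delta in changed.items():
            totals[partition]["records"] += delta.get("records", 0)
            totals[partition]["files"] += delta.get("files", 0)
    return {partition: dict(stats) for partition, stats in totals.items()}


# Hypothetical history: three commits, each listing only changed partitions.
history = [
    {"dt=2024-01-01": {"records": 100, "files": 2}},
    {"dt=2024-01-02": {"records": 50, "files": 1}},
    {"dt=2024-01-01": {"records": -10, "files": 0}},
]

# Correct totals need the whole history.
full = aggregate_partition_stats(history)

# Looking only at the last two snapshots loses the first commit's data.
last_two = aggregate_partition_stats(history[-2:])
```

In this toy history, `full` reports 90 records for `dt=2024-01-01`, while `last_two` reports -10 for the same partition, because the commit that created it falls outside the window.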
Found the partition stats tracker issue #8450, with the following design doc: https://docs.google.com/document/d/1vaufuD47kMijz97LxM67X8OX-W2Wq7nmlz3jRo8J5Qk
And here is the relevant mailing list thread: https://lists.apache.org/thread/knl1ol7s1o2p7rglgl2mm8c5mc2pk0sx @ajantha-bhat: Are you still working on the proposal?
Yes, it is still active, but it is not getting enough reviews. #11216 is the last PR that is needed for the functionality to work.
@deniskuzZ: Could you please comment on my last PR that this feature will be helpful for Hive and that you are looking for it?
Should we reopen this PR, or is it superseded by another one?
ref: apache/hive#5498