Refresh cagg uses min value for dimension when start_time is NULL #7546

Open
wants to merge 4 commits into
base: main

gayyappan
Contributor

@gayyappan gayyappan commented Dec 18, 2024

When the start of the refresh_continuous_aggregate window is NULL, use the minimum value present in the hypertable as the beginning of the range, instead of the minimum value of the partition column's type.

We do this only if enable_tiered_reads is set to false.
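
As an illustration only (this is not code from the PR), the rule above can be sketched as a standalone C function. The names `WindowStart` and `resolve_refresh_start`, the boolean flag standing in for enable_tiered_reads, and the fallback to the type minimum for an empty hypertable are all assumptions made for the sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the window start argument; is_null models SQL NULL. */
typedef struct WindowStart
{
    bool    is_null;
    int64_t value;
} WindowStart;

/*
 * Pick the effective start of the refresh range.
 *   type_min: smallest value of the partition column's type (old behavior)
 *   data_min: smallest value known to exist in the hypertable
 *   has_data: false when the hypertable is empty (assumed fallback: type_min)
 */
static int64_t
resolve_refresh_start(WindowStart start, bool tiered_reads_enabled,
                      bool has_data, int64_t data_min, int64_t type_min)
{
    assert(!has_data || data_min >= type_min);

    if (!start.is_null)
        return start.value;     /* an explicit start always wins */
    if (!tiered_reads_enabled && has_data)
        return data_min;        /* behavior proposed in this PR */
    return type_min;            /* open-ended range, as before */
}
```

With a NULL start and tiered reads disabled, the sketch returns the data minimum; in every other NULL case it keeps the open-ended range.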

@gayyappan gayyappan marked this pull request as draft December 18, 2024 19:03

codecov bot commented Dec 18, 2024

Codecov Report

Attention: Patch coverage is 10.81081% with 33 lines in your changes missing coverage. Please review.

Project coverage is 82.15%. Comparing base (59f50f2) to head (8c32688).
Report is 660 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/hypertable.c | 0.00% | 25 Missing ⚠️ |
| tsl/src/continuous_aggs/refresh.c | 33.33% | 4 Missing and 4 partials ⚠️ |

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #7546      +/-   ##
==========================================
+ Coverage   80.06%   82.15%   +2.09%     
==========================================
  Files         190      230      +40     
  Lines       37181    43360    +6179     
  Branches     9450    10912    +1462     
==========================================
+ Hits        29770    35624    +5854     
- Misses       2997     3412     +415     
+ Partials     4414     4324      -90     


@gayyappan gayyappan added this to the TimescaleDB 2.18.0 milestone Jan 10, 2025
Contributor

@mkindahl mkindahl left a comment

The description of the PR explains what is done, but it is more important to explain why it is done. In particular:

The old behavior takes the minimum value of the type when NULL is provided, to create an open-ended range. What situation breaks with that behavior, prompting this approach instead?

src/hypertable.c Outdated
Comment on lines 2430 to 2434
appendStringInfo(command,
"SELECT pg_catalog.min(%s) FROM %s.%s",
quote_identifier(NameStr(dim->fd.column_name)),
quote_identifier(NameStr(ht->fd.schema_name)),
quote_identifier(NameStr(ht->fd.table_name)));
Contributor

This does a full table scan of the entire hypertable. On a large hypertable this will be a significant problem, especially if tiering is involved.

Contributor Author

Changed this to get min value from chunk metadata.
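
A rough sketch of the idea, not the PR's implementation: if each chunk records the range it covers, the lower bound of the oldest chunk can be found by looking only at that metadata, without reading any rows. The `ChunkRange` struct and `min_chunk_range_start` function below are hypothetical names used for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-chunk metadata: the range [range_start, range_end) a chunk covers. */
typedef struct ChunkRange
{
    int64_t range_start;
    int64_t range_end;
} ChunkRange;

/*
 * Find the smallest range_start over all chunks.
 * Returns false (leaving *min_out untouched) when there are no chunks.
 */
static bool
min_chunk_range_start(const ChunkRange *chunks, size_t nchunks, int64_t *min_out)
{
    int64_t min;

    assert(min_out != NULL);
    if (chunks == NULL || nchunks == 0)
        return false;

    min = chunks[0].range_start;
    for (size_t i = 1; i < nchunks; i++)
    {
        if (chunks[i].range_start < min)
            min = chunks[i].range_start;
    }
    *min_out = min;
    return true;
}
```

The cost grows with the number of chunks rather than the number of rows. The result is a lower bound on the data: the oldest row may be later than the start of the oldest chunk's range.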

@gayyappan gayyappan marked this pull request as ready for review January 14, 2025 17:46
@github-actions github-actions bot requested review from akuzm and erimatnor January 14, 2025 17:46

@akuzm, @erimatnor: please review this pull request.

Powered by pull-review

When the refresh_continuous_aggregate window's start is NULL, use
the min value in the hypertable to determine the beginning of
the range instead of the min value for the partition column.

We do this only if enable_tiered_reads is set to false.
* Query for the oldest chunk in the hypertable.
*/
command = makeStringInfo();
appendStringInfo(command, query_str, ht->fd.id, dim->fd.id);
Contributor

Is this the correct argument order? Judging by the query string, dim->fd.id should come first.

Contributor

nit: psprintf would do the same as StringInfo here, but would be shorter.
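
To make the two remarks on this line concrete, here is a standalone sketch. `format_alloc` is a stand-in written for this example that mimics what PostgreSQL's psprintf does (format into a newly allocated string in one call), using malloc instead of palloc. The query text, the table name `slices`, and `build_example_query` are hypothetical; the PR's actual query string is not shown in this conversation.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/* Stand-in for psprintf(): returns a malloc'd formatted string, or NULL on failure. */
static char *
format_alloc(const char *fmt, ...)
{
    va_list args;
    va_list args_copy;
    int     len;
    char   *buf;

    assert(fmt != NULL);
    va_start(args, fmt);
    va_copy(args_copy, args);
    len = vsnprintf(NULL, 0, fmt, args);    /* first pass: measure only */
    va_end(args);
    if (len < 0)
    {
        va_end(args_copy);
        return NULL;
    }
    buf = malloc((size_t) len + 1);
    if (buf != NULL)
        vsnprintf(buf, (size_t) len + 1, fmt, args_copy);   /* second pass: write */
    va_end(args_copy);
    return buf;
}

/* Hypothetical query: the first %d is the dimension id, the second the hypertable id. */
static const char *const example_query =
    "SELECT min(range_start) FROM slices WHERE dimension_id = %d AND hypertable_id = %d";

static char *
build_example_query(int hypertable_id, int dimension_id)
{
    /* Arguments must follow placeholder order: dimension id first, then hypertable id. */
    return format_alloc(example_query, dimension_id, hypertable_id);
}
```

Both ids are integers, so swapping them compiles cleanly and silently produces the wrong query, which is why the argument order is worth checking against the format string.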

Contributor

@fabriziomello fabriziomello left a comment

It is not clear yet if this is the right way to solve the problem of enabling/disabling tiered read and the impacts on the caggs.

Let's please remove it from 2.18.0 milestone and discuss more.

@fabriziomello fabriziomello removed this from the TimescaleDB 2.18.0 milestone Jan 15, 2025
@gayyappan
Contributor Author

> It is not clear yet if this is the right way to solve the problem of enabling/disabling tiered read and the impacts on the caggs.
>
> Let's please remove it from 2.18.0 milestone and discuss more.

This solves the problem only for the special case described here. And yes, we need a more general solution. This PR is meant to address the more common case (where we have bad materialization entries) and to serve as a stopgap until we implement the incremental solution.

4 participants