Refresh cagg uses min value for dimension when start_time is NULL #7546

gayyappan · 2024-12-18T19:03:15Z

When the refresh_continuous_aggregate window's start is NULL, use the minimum value present in the hypertable as the beginning of the range, instead of the minimum value of the partition column's type.

We do this only if enable_tiered_reads is set to false.
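
The rule described above can be sketched in plain C. This is a minimal illustration and not the code in the patch: the function name `refresh_window_start`, its parameters, and the use of `int64_t` for time values are assumptions made for the example.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: choose the start of a refresh window.
 *
 * start_is_null        - true when the caller passed NULL as the window start
 * start                - the explicit start, used only when not NULL
 * tiered_reads_enabled - stands in for the enable_tiered_reads setting
 * type_min             - minimum value of the partition column's type
 * hypertable_min       - minimum value known for the hypertable
 */
int64_t
refresh_window_start(bool start_is_null, int64_t start, bool tiered_reads_enabled,
					 int64_t type_min, int64_t hypertable_min)
{
	/* An explicit start is always used as given. */
	if (!start_is_null)
		return start;

	/* With tiered reads enabled, keep the old open-ended behavior. */
	if (tiered_reads_enabled)
		return type_min;

	/* Otherwise begin the range at the hypertable's own minimum. */
	return hypertable_min;
}
```

For example, with a NULL start and tiered reads disabled, `refresh_window_start(true, 0, false, INT64_MIN, 50)` would return the hypertable minimum rather than `INT64_MIN`.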

codecov · 2024-12-18T19:14:05Z

Codecov Report

Attention: Patch coverage is 10.81081% with 33 lines in your changes missing coverage. Please review.

Project coverage is 82.15%. Comparing base (59f50f2) to head (8c32688).
Report is 660 commits behind head on main.

Files with missing lines	Patch %	Lines
src/hypertable.c	0.00%	25 Missing ⚠️
tsl/src/continuous_aggs/refresh.c	33.33%	4 Missing and 4 partials ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #7546      +/-   ##
==========================================
+ Coverage   80.06%   82.15%   +2.09%     
==========================================
  Files         190      230      +40     
  Lines       37181    43360    +6179     
  Branches     9450    10912    +1462     
==========================================
+ Hits        29770    35624    +5854     
- Misses       2997     3412     +415     
+ Partials     4414     4324      -90


mkindahl

The description of the PR explains what is done, but it is more important to explain why it is done. In particular:

The old behavior takes the minimum value of the type when NULL is provided, to create an open-ended range. What situation breaks and prompts you to take this approach instead?

mkindahl · 2025-01-10T14:06:25Z

src/hypertable.c

+	appendStringInfo(command,
+					 "SELECT pg_catalog.min(%s) FROM %s.%s",
+					 quote_identifier(NameStr(dim->fd.column_name)),
+					 quote_identifier(NameStr(ht->fd.schema_name)),
+					 quote_identifier(NameStr(ht->fd.table_name)));


This does a full table scan of the entire hypertable. On a large hypertable this is going to be a significant problem, especially if tiering is involved.

Changed this to get min value from chunk metadata.
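
To illustrate the difference in cost, here is a rough, self-contained sketch of deriving the lower bound from per-chunk range metadata instead of scanning rows. The `SliceRange` struct and `min_slice_start` function are hypothetical stand-ins and do not reflect the actual TimescaleDB catalog structures or the code in this PR.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one chunk's range on the time dimension. */
typedef struct SliceRange
{
	int64_t range_start;
	int64_t range_end;
} SliceRange;

/*
 * Find the smallest range_start among the given slices.
 * Returns false when there are no slices, leaving *min_start untouched.
 * The cost depends on the number of chunks, not the number of rows.
 */
bool
min_slice_start(const SliceRange *slices, size_t nslices, int64_t *min_start)
{
	if (nslices == 0)
		return false;

	int64_t current_min = slices[0].range_start;

	for (size_t i = 1; i < nslices; i++)
	{
		if (slices[i].range_start < current_min)
			current_min = slices[i].range_start;
	}

	*min_start = current_min;
	return true;
}
```

Note that a value obtained this way is the start of the oldest chunk's range, so it is a lower bound on the smallest stored value rather than the exact minimum that `SELECT min(...)` would return.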

github-actions · 2025-01-14T17:46:31Z

@akuzm, @erimatnor: please review this pull request.


zilder · 2025-01-15T11:01:25Z

src/hypertable.c

+	 * Query for the oldest chunk in the hypertable.
+	 */
+	command = makeStringInfo();
+	appendStringInfo(command, query_str, ht->fd.id, dim->fd.id);


Is this the correct argument order? From the query string, it seems dim->fd.id should come first.

nit: psprintf would do the same as StringInfo here, but would be shorter.
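
Both points can be illustrated with a small sketch. The query text below is hypothetical, since the real query_str is not shown in this thread, and standard `snprintf` stands in for PostgreSQL's `psprintf` so the example compiles without PostgreSQL headers. The function name `format_oldest_slice_query` is likewise an assumption.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical query text. The dimension id placeholder comes first,
 * the hypertable id placeholder second.
 */
#define EXAMPLE_QUERY_FMT \
	"SELECT min(range_start) FROM example_slices " \
	"WHERE dimension_id = %d AND hypertable_id = %d"

/*
 * Format the query into buf and return the snprintf result.
 * The arguments are passed in placeholder order (dimension first),
 * regardless of the order of this function's own parameters.
 */
int
format_oldest_slice_query(char *buf, size_t buflen, int hypertable_id, int dimension_id)
{
	return snprintf(buf, buflen, EXAMPLE_QUERY_FMT, dimension_id, hypertable_id);
}
```

Because both placeholders are `%d`, swapping the two arguments would compile without warnings and silently produce a query that filters on the wrong ids, which is why the order deserves a second look in review.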

fabriziomello

It is not clear yet if this is the right way to solve the problem of enabling/disabling tiered read and the impacts on the caggs.

Let's please remove it from 2.18.0 milestone and discuss more.

gayyappan · 2025-01-15T20:12:35Z

It is not clear yet if this is the right way to solve the problem of enabling/disabling tiered read and the impacts on the caggs.

Let's please remove it from 2.18.0 milestone and discuss more.

This solves the problem only for the special case described here. And yes, we need a more general solution. This PR is meant to address the more common case (where we have bad materialization entries) and to serve as a stopgap until we implement the incremental solution.

gayyappan marked this pull request as draft December 18, 2024 19:03

gayyappan added this to the TimescaleDB 2.18.0 milestone Jan 10, 2025

mkindahl reviewed Jan 10, 2025

gayyappan force-pushed the try branch from 0de7b82 to 7c161d8 Compare January 14, 2025 17:41

gayyappan marked this pull request as ready for review January 14, 2025 17:46

github-actions bot assigned gayyappan Jan 14, 2025

gayyappan requested a review from fabriziomello January 14, 2025 17:46

github-actions bot requested review from akuzm and erimatnor January 14, 2025 17:46

gayyappan added 4 commits January 14, 2025 18:08

Get min value from dimension slice information instead of reading the whole table

7c4f736

Test case for cagg with enable_tiered_reads

88e6f7b

PR file

b465b78

gayyappan force-pushed the try branch from 7c161d8 to b465b78 Compare January 14, 2025 23:12

zilder reviewed Jan 15, 2025

fabriziomello requested changes Jan 15, 2025

fabriziomello removed this from the TimescaleDB 2.18.0 milestone Jan 15, 2025
