Auto-Compaction using Multi-Stage Query Engine #16291
# Conflicts:
#	indexing-service/src/main/java/org/apache/druid/indexing/common/task/CompactionTask.java
#	server/src/main/java/org/apache/druid/server/coordinator/CoordinatorCompactionConfig.java
@gargvishesh , I concur with @LakshSingla on this one. We need UTs to test the entire flow with MSQ the same way we are doing with native. We may add ITs too but most Druid devs rely on UTs more heavily as the IT flow is a little flaky currently.
@@ -305,17 +305,30 @@ private CompactionStatus metricsSpecIsUpToDate()
    if (ArrayUtils.isEmpty(configuredMetricsSpec)) {
      return COMPLETE;
    }
    final AggregatorFactory[] configuredMetricsCombiningFactorySpec =
Why are we comparing just the combining factory now and not the original AggregatorFactory itself? Please add comments (and tests, if not already added) to clarify this change.
Added it as a javadoc for the function
I would advise retaining the old version of the code and doing this change in a follow up. In the current PR, I would prefer it if we didn't modify any of the logic for native compaction. What happens if we just stick to the old logic?
With the last change making problematic aggregator factory definitions unsupported, this change is no longer required. I have reverted it.
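To make the thread above concrete, here is a hypothetical `metricsSpec` fragment (the column names `added` and `sumAdded` are made up for this illustration) showing the kind of aggregator definition the PR ends up rejecting for MSQ compaction rather than special-casing:

```json
[
  { "type": "longSum", "name": "added", "fieldName": "added" },
  { "type": "longSum", "name": "sumAdded", "fieldName": "added" }
]
```

The first entry is fine since `name` equals `fieldName`; the second is among the cases listed later as unsupported for MSQ compaction, because its input and output column names differ.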
@kfaraz @LakshSingla
Please resolve the merge conflicts in the PR.
They have been introduced by a recent NPE bugfix in #16713.
I think this bugfix is also included in the current PR, but I decided to merge it separately as the issue had been reported in the community in a couple of places.
Sorry for the inconvenience!
# Conflicts:
#	server/src/main/java/org/apache/druid/server/coordinator/duty/CompactSegments.java
Thanks a lot for taking this PR to completion, @gargvishesh !
You have been a great sport with addressing the multiple rounds of feedback. 🚀
Thank you @kfaraz, @LakshSingla and @cryptoe for the time and effort to do all those rounds of reviews, especially given the size of the PR.
Follow-up to #16291, this commit enables a subset of existing native compaction ITs on the MSQ engine. In the process, the following changes have been introduced in the MSQ compaction flow:
- Populate `metricsSpec` in `CompactionState` from `querySpec` in `MSQControllerTask` instead of `dataSchema`
- Add a check for pre-rolled-up segments having `AggregatorFactory` with different input and output column names
- Fix passing of the missing cluster-by clause in scan queries
- Add annotation of `CompactionState` to tombstone segments
Description:

Compaction operations issued by the Coordinator currently run using the native query engine. As the majority of the advancements we are making in batch ingestion are in MSQ, it is imperative that we support compaction on MSQ to make compaction more robust and possibly faster. For instance, we have seen OOM errors in native compaction that MSQ could have handled via its auto-calculation of tuning parameters. This commit enables compaction on MSQ to remove the dependency on the native engine.

Main changes:
* `DataSourceCompactionConfig` now has an additional field `engine` that can be one of `[native, msq]`, with `native` being the default.
* If the engine is MSQ, the `CompactSegments` duty assigns all available compaction task slots to the launched `CompactionTask` to ensure full capacity is available to MSQ. This avoids the stalling that could happen if only a fraction of the task slots were allotted and they eventually fell short of the number of tasks required by the MSQ engine to run the compaction.
* `ClientCompactionTaskQuery` has a new field `compactionRunner` with just one `engine` subfield.
* `CompactionTask` now holds a `CompactionRunner` interface instance, with the implementations `NativeCompactionRunner` in core and `MSQCompactionRunner` in the `druid-multi-stage-query` extension. The object mapper deserializes the `ClientCompactionRunnerInfo` in `ClientCompactionTaskQuery` to the `CompactionRunner` instance mapped to the specified type (`native` or `msq`).
* `CompactionTask` uses the `CompactionRunner` instance it receives to create the indexing tasks.
* The `CompactionTask`-to-`MSQControllerTask` conversion logic checks whether metrics are present in the segment schema. If present, the task is created with a native group-by query; if not, the task is issued with a scan query. The `storeCompactionState` flag is set in the context.
* Each created `MSQControllerTask` is launched in-place and its `TaskStatus` is tracked to determine the final status of the `CompactionTask`. The id of each of these tasks is the same as that of the `CompactionTask` since otherwise, the workers would be unable to determine the controller task's location for communication (as they haven't been launched via the Overlord).
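As a concrete sketch of the new `engine` field, a per-datasource compaction config selecting MSQ might look like the following (the datasource name and all values here are hypothetical; only the shape of the config comes from the description above):

```json
{
  "dataSource": "wikipedia",
  "skipOffsetFromLatest": "P1D",
  "engine": "msq",
  "tuningConfig": {
    "partitionsSpec": {
      "type": "dynamic",
      "maxRowsPerSegment": 5000000
    }
  },
  "taskContext": {
    "maxNumTasks": 5
  }
}
```

Here `taskContext` can carry MSQ context parameters such as `maxNumTasks`, which caps the task slots the launched MSQ compaction may use.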
Some things to note:
* `DataSourceCompactionConfig` is passed as-is to the `MSQControllerTask` and hence can contain MSQ context params as well, with the exception of `rowsPerSegment`, which will be overridden by either `targetRowsPerSegment` or `maxRowsPerSegment` if specified in a `partitionsSpec`.
* The `maxRowsInMemory` param is only considered if specified in the context. The value in `DataSourceCompactionConfig.tuningConfig` is not considered, as it is set to a default value (1M) when unspecified by a user, making it indistinguishable whether it came from the user or from the default.
* If no `maxNumTasks` value is specified in the `taskContext`, `min(availableCompactionTaskSlots, 5)` is allotted to MSQ compaction tasks.
* `rollup:true` without any `metricsSpec` de-duplicates rows since all columns are then treated as dimensions -- just as in native compaction.

Currently unsupported for MSQ compaction:
* Updating the engine at the cluster level: the default engine is `native` and can be updated at a per-datasource level. The cluster-level update API will come in a follow-up PR.
* Multi-value dimensions: `groupByEnableMultiValueUnnesting` is disabled. Only array-type columns are supported.
* `partitionsSpec` of type `HashedPartitionsSpec`. Only `DimensionRangePartitionsSpec` and `DynamicPartitionsSpec` work.
* `maxTotalRows` in `DynamicPartitionsSpec`. Only `maxRowsPerSegment` works.
* `rollup` set to `false` in `granularitySpec` when `metricsSpec` is specified. Only `rollup` set to `true` works with a non-empty `metricsSpec`.
* `metricsSpec` entries where `fieldName != name`, or (internally) where `aggregatorFactory.getClass() != aggregatorFactory.getCombiningFactory().getClass()`. This is a major limitation that will be addressed in a follow-up PR.
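To make the partitioning limitation above concrete, here is an illustrative fragment contrasting a `partitionsSpec` that works with MSQ compaction against one that does not (the dimension name and row counts are made up, and the `supported`/`unsupported` wrapper keys exist only for this side-by-side view; in a real `tuningConfig` one of these objects would appear directly as `partitionsSpec`):

```json
{
  "supported": {
    "type": "range",
    "partitionDimensions": ["countryName"],
    "targetRowsPerSegment": 5000000
  },
  "unsupported": {
    "type": "hashed",
    "numShards": 10
  }
}
```

The `range` spec maps to `DimensionRangePartitionsSpec`, which MSQ compaction accepts, while the `hashed` spec maps to `HashedPartitionsSpec`, which is rejected per the list above.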