-
Notifications
You must be signed in to change notification settings - Fork 3.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Introduce Segment Schema Publishing and Polling for Efficient Datasource Schema Building #15817
Changes from 44 commits
33a8dd5
7a7ca55
eb6a145
b9fb83d
aa2bfe7
e97dcda
7badce1
25cdce6
5f5ad18
08e949e
80fc09d
4217cd8
8b7e483
d6ac350
533236b
70f0888
a176bfe
b32dfd6
17417b5
6a395a9
907ace3
cf68c38
441f37a
bd5b048
933d8d1
9e7e364
e7356ce
5d16148
d8884be
0f0805a
e129d3e
b4042c6
01c27c9
87c9873
971b347
89d3845
270dbd5
2da23b8
6f568a6
fe229c0
473b25c
39fb248
cac695a
eb3e3c1
30438f4
e88ad00
61d130b
2e4c45b
255cf2c
1a6dfc5
a961501
2e65726
4b51c42
a4e2097
3ca03c9
902abd3
bcab458
32a4065
6b04ee7
cb93e43
152c480
80c6d26
9cca98e
bb69cde
a604e65
f2d8a5e
37ac4a7
629f3ad
af34c34
4a4f1b0
b142b29
035a884
7935349
26ee22f
1555592
ccd2c42
900dad5
45423fc
33cfb75
6c7e0d5
85594a7
72f671c
ecc8fc4
566d09b
c81b90b
3fb3da5
8c7bd0e
9fc3f0f
6208486
96ec515
3f7e9c0
f9dde90
034d1ef
ee303da
1d1e983
500c375
d9bc9bc
9e89aed
6210504
a0d2886
88826f7
8e7539e
a3343ff
9852f9a
fbf1189
d1f3180
ffc8e06
cede729
264b19e
052d026
96c05b1
17ca156
0b49546
5607196
ab5c0cf
f417abb
0ac4c83
d8cd78e
b66a1f0
b1b9529
e8a6d9b
7cf889d
e0e97ba
9175e93
aa52df0
9fdeedd
fd4e9f6
e86131c
5d55539
5a19650
348a72c
5e9e58f
f2e9488
564e038
88b0146
2ad6639
c7980b0
39f9328
924268d
e14ab29
ca33d8b
cfece06
e103336
9875c44
634ced2
43f1850
12e36d0
c65b2f2
b6b578b
8d962da
2723796
5969c60
78d760b
0fa4f8d
bf20a25
9f78e1d
8816298
ac77102
21a7320
0963fca
80c88de
f783636
0786a6a
3415418
873f373
e3736af
f4fb2ec
b11ec29
52e7c05
9a7b1b6
249313a
5acb93d
b0ab4d9
1d1c803
cf7901a
5f729f6
e883918
aa1ab56
864bc91
a582417
6fb906f
43111e7
55cf345
2b97164
03c0c94
08d0498
1b4497d
ff76aeb
5c181f6
dd46989
bc0e4e3
b54bcac
29107b9
f8f8a97
8a75693
a768dba
6e333a0
8548f2a
edec5e7
e4d525d
86cc24c
0f7df79
43b6b96
b57f987
08f1c15
1c66e5d
86d2d90
e629f9a
77b181f
bf37982
4efe3fb
7ac88c4
5382669
f3b1711
e0db66c
8c29649
158f124
5e6dff5
fdc46cc
e9dbe3c
f3545ac
6138338
aba03e0
55c4ed4
28359e8
7886412
14efbac
ca571ee
4277760
5e7a6a6
86351ae
a5827a6
5383967
5bf86be
a07eccf
0023d24
85208f2
11e5ca7
e78fb65
bb7e98b
ae43a83
26a6472
e65ff4d
a05043b
c44b289
9033a46
7e349ce
472bce9
bc1e131
4adfc77
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -75,6 +75,12 @@ Most metric values reset each emission period, as specified in `druid.monitoring | |
|`metadatacache/schemaPoll/count`|Number of coordinator polls to fetch datasource schema.|| | ||
|`metadatacache/schemaPoll/failed`|Number of failed coordinator polls to fetch datasource schema.|| | ||
|`metadatacache/schemaPoll/time`|Time taken for coordinator polls to fetch datasource schema.|| | ||
|`metadatacache/backfill/count`|Number of segments for which schema was back filled in the database.|`dataSource`| | ||
|`schemacache/realtime/size`|Number of realtime segments for which schema is cached.||Depends on the number of realtime segments.| | ||
|`schemacache/finalizedSegmentMetadata/size`|Number of finalized segments for which schema metadata is cached.||Depends on the number of segments in the cluster.| | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Nit: this should be count and not size. We can do this change as a followup. |
||
|`schemacache/finalizedSchemaPayload/size`|Number of finalized segment schema cached.||Depends on the number of distinct schema in the cluster.| | ||
|`schemacache/inTransitSMQResults/size`|Number of segments for which schema was fetched by executing segment metadata query.||Eventually it should be 0.| | ||
|`schemacache/inTransitSMQPublishedResults/size`|Number of segments for which schema is cached after back filling in the database.||Eventually it should be 0.| | ||
|`serverview/sync/healthy`|Sync status of the Broker with a segment-loading server such as a Historical or Peon. Emitted only when [HTTP-based server view](../configuration/index.md#segment-management) is enabled. This metric can be used in conjunction with `serverview/sync/unstableTime` to debug slow startup of Brokers.|`server`, `tier`|1 for fully synced servers, 0 otherwise| | ||
|`serverview/sync/unstableTime`|Time in milliseconds for which the Broker has been failing to sync with a segment-loading server. Emitted only when [HTTP-based server view](../configuration/index.md#segment-management) is enabled.|`server`, `tier`|Not emitted for synced servers.| | ||
|`subquery/rowLimit/count`|Number of subqueries whose results are materialized as rows (Java objects on heap).|This metric is only available if the `SubqueryCountStatsMonitor` module is included.| | | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -57,6 +57,7 @@ | |
import org.apache.druid.segment.incremental.RowIngestionMeters; | ||
import org.apache.druid.segment.indexing.DataSchema; | ||
import org.apache.druid.segment.indexing.TuningConfig; | ||
import org.apache.druid.segment.metadata.CentralizedDatasourceSchemaConfig; | ||
import org.apache.druid.segment.realtime.appenderator.Appenderator; | ||
import org.apache.druid.segment.realtime.appenderator.AppenderatorConfig; | ||
import org.apache.druid.segment.realtime.appenderator.Appenderators; | ||
|
@@ -192,7 +193,8 @@ public Pair<Integer, ReadableInput> apply(ReadableInput readableInput) | |
frameContext.indexMerger(), | ||
meters, | ||
parseExceptionHandler, | ||
true | ||
true, | ||
CentralizedDatasourceSchemaConfig.create(false) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. MSQ does not support centralized data source schema yet. I think we should put his comment here. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Better to have this comment in the javadoc of |
||
); | ||
|
||
return new SegmentGeneratorFrameProcessor( | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -33,7 +33,7 @@ | |
import org.apache.druid.indexing.overlord.DataSourceMetadata; | ||
import org.apache.druid.indexing.overlord.SegmentPublishResult; | ||
import org.apache.druid.java.util.common.ISE; | ||
import org.apache.druid.segment.MinimalSegmentSchemas; | ||
import org.apache.druid.segment.SegmentSchemaMapping; | ||
import org.apache.druid.segment.SegmentUtils; | ||
import org.apache.druid.timeline.DataSegment; | ||
import org.joda.time.Interval; | ||
|
@@ -70,27 +70,27 @@ public class SegmentTransactionalInsertAction implements TaskAction<SegmentPubli | |
@Nullable | ||
private final String dataSource; | ||
@Nullable | ||
private final MinimalSegmentSchemas minimalSegmentSchemas; | ||
private final SegmentSchemaMapping segmentSchemaMapping; | ||
|
||
public static SegmentTransactionalInsertAction overwriteAction( | ||
@Nullable Set<DataSegment> segmentsToBeOverwritten, | ||
Set<DataSegment> segmentsToPublish, | ||
@Nullable MinimalSegmentSchemas minimalSegmentSchemas | ||
@Nullable SegmentSchemaMapping segmentSchemaMapping | ||
) | ||
{ | ||
return new SegmentTransactionalInsertAction(segmentsToBeOverwritten, segmentsToPublish, null, null, null, | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. For follow-up PR |
||
minimalSegmentSchemas | ||
segmentSchemaMapping | ||
); | ||
} | ||
|
||
public static SegmentTransactionalInsertAction appendAction( | ||
Set<DataSegment> segments, | ||
@Nullable DataSourceMetadata startMetadata, | ||
@Nullable DataSourceMetadata endMetadata, | ||
@Nullable MinimalSegmentSchemas minimalSegmentSchemas | ||
@Nullable SegmentSchemaMapping segmentSchemaMapping | ||
) | ||
{ | ||
return new SegmentTransactionalInsertAction(null, segments, startMetadata, endMetadata, null, minimalSegmentSchemas); | ||
return new SegmentTransactionalInsertAction(null, segments, startMetadata, endMetadata, null, segmentSchemaMapping); | ||
} | ||
|
||
public static SegmentTransactionalInsertAction commitMetadataOnlyAction( | ||
|
@@ -109,15 +109,15 @@ private SegmentTransactionalInsertAction( | |
@JsonProperty("startMetadata") @Nullable DataSourceMetadata startMetadata, | ||
@JsonProperty("endMetadata") @Nullable DataSourceMetadata endMetadata, | ||
@JsonProperty("dataSource") @Nullable String dataSource, | ||
@JsonProperty("minimalSegmentSchemas") @Nullable MinimalSegmentSchemas minimalSegmentSchemas | ||
@JsonProperty("segmentSchemaMapping") @Nullable SegmentSchemaMapping segmentSchemaMapping | ||
) | ||
{ | ||
this.segmentsToBeOverwritten = segmentsToBeOverwritten; | ||
this.segments = segments == null ? ImmutableSet.of() : ImmutableSet.copyOf(segments); | ||
this.startMetadata = startMetadata; | ||
this.endMetadata = endMetadata; | ||
this.dataSource = dataSource; | ||
this.minimalSegmentSchemas = minimalSegmentSchemas; | ||
this.segmentSchemaMapping = segmentSchemaMapping; | ||
} | ||
|
||
@JsonProperty | ||
|
@@ -156,9 +156,9 @@ public String getDataSource() | |
|
||
@JsonProperty | ||
@Nullable | ||
public MinimalSegmentSchemas getMinimalSegmentSchemas() | ||
public SegmentSchemaMapping getSegmentSchemaMapping() | ||
{ | ||
return minimalSegmentSchemas; | ||
return segmentSchemaMapping; | ||
} | ||
|
||
@Override | ||
|
@@ -218,7 +218,7 @@ public SegmentPublishResult perform(Task task, TaskActionToolbox toolbox) | |
segments, | ||
startMetadata, | ||
endMetadata, | ||
minimalSegmentSchemas | ||
segmentSchemaMapping | ||
) | ||
) | ||
.onInvalidLocks( | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For follow-up PR:
The config description should be more like
Indicates whether centralized schema management is enabled
. The description should also link to the page which contains the details of the feature.