Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: Modify optimized compaction to cover edge cases #25594
feat: Modify optimized compaction to cover edge cases #25594
Changes from 25 commits
d631314
67849ae
827e859
5387ca3
3153596
83d28ec
f896a01
f15d9be
54c8e1c
403d888
23d12e1
d3afb03
cf657a8
fc6ca13
4fc4d55
c93bdfb
2dd5ef4
479de96
c392906
f444518
5e4e2da
c315b1f
eb0a77d
1bac192
371f960
5a614c4
3748c36
0799f00
3e69f2d
976291a
0a2ba1e
8c908c5
File filter
Filter by extension
Conversations
Jump to
There are no files selected for viewing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I won't block the PR for this, but this is still a superfluous function. The
fmt.Sprintf
could have been used directly with thestring
var.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Love this comment!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I understand why we don't need to check the points in all blocks. Can you explain why are we checking the
BlockCount
for block 1 and not block 0?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NVM, figured it out.
BlockIterator
is a Java-style iterator, and the index is the number of timesNext
gets called on it, so 1 is actually the first block.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm slightly confused whether this is still what we want to do. We skip a group (i.e., a generation) here if it is large (sum of all files is larger than the largest permissible single file), and the first file has the default maximum points per block and there are no tombstones.
This seems to be mixing metrics from the first file in the generation (points per block) with metrics from the whole generation (combined file size). Do we need to look at the points per block of all the files in the generation? Why are we skipping a generation if it is larger than a single file can be? What's the significance of that?
I understand the original code had this strange mix of conditionals, but do we understand why, and whether we should continue with them? At the very least, the comment
Skip the file if...
is misleading, because we are skipping a generation which may contain more than one file, are we not?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes I think the comment is a bit misleading. I was mostly just keeping Plan, and PlanLevel as is... I would have no problem with modifying the existing logic in them though. Perhaps instead of checking individual file block counts and the entire group size against 2 GB I take the approach checking all the files in the group and all the block sizes in the group? Some pseudo code:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After consideration, I think you were right, @devanbenz, to change
Plan
andPlanLevel
minimally. While their algorithms are obtuse, we shouldn't change them in the PR or at this time, to minimize the risks in what is already a large change to compaction.