-
Notifications
You must be signed in to change notification settings - Fork 530
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
rythm: fix ingestion slack time range #4459
base: main-rhythm
Are you sure you want to change the base?
rythm: fix ingestion slack time range #4459
Conversation
I have not dug into the details of this PR but to share some history about ingestion slack. When we first rolled Tempo out internally it would mark a block's start and time using the min start time and max end time of all spans in the block. We quickly found that every block had a start time of 0 and an end time 100 years in the future due to the data we were consuming. So I created the ingestion slack to prevent this. Ingestion slack is gross to calculate and I have since just wished I had added 5 minutes to the beginning and end of every block instead of trying to watch all spans and only doing it conditionally. You all are welcome to tackle this problem however you want, and I'd be glad to discuss it sync if you're like to, but just wanted to give some history. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the new behaviour, but I'm unsure of changing how Append
and AppendTrace
work like that. Specially only changing it for one parquet version. There is no warning to the callers of the functions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The current changes correctly address the problem, but I still find strange that Append
and AppendTrace
have different behaviors. We should either change the method's signature or be very clear with comments or similar.
BTW, this PR should now be pointed at |
What this PR does:
It fixes the ingestion slack by calculating it according to the partition's last commit.
Using a fixed time on WAL creation won't work as we want for several reasons. Every partition can have a different offset and the wall is created before reading Kafka. Also, we don't have a way to know this start time in case of wal replay.
The adjustment of the ingestion slack should be done outside the WAL, since we could have different strategies. Another possibility could be storing a function to calculate it but I have opted for this approach that extends the current interface without modifying it.