Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DBZ-7117 Prevent OOM in Debezium Redis sink #61

Conversation

eizners
Copy link
Contributor

@eizners eizners commented Jan 16, 2024

No description provided.

@eizners eizners marked this pull request as draft January 16, 2024 20:40
@eizners eizners marked this pull request as ready for review January 16, 2024 21:04
Copy link
Member

@Naros Naros left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few inline comments @eizners.

Comment on lines +63 to +68
totalProcessed += bufferSize;
LOGGER.debug("Total Processed Records: {}", totalProcessed);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If maximumMemory is set to 0, I assume we are not doing any memory check, and is there a case where totalProcessed here would just continue to grow until it overflows potentially? If so, I wonder if it would be better to avoid incrementing and simply just return true only?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is correct is maximumMemory we are not doing any checks. The totalProcessed is long so I dobt that Debezium will ever process 9,223,372,036,854,775,807 records without restart

(estimatedUsedMemory >= maximumMemory), humanReadableSize(usedMemory), humanReadableSize(maximumMemory),
humanReadableSize(accumulatedMemory), humanReadableSize(estimatedUsedMemory),
humanReadableSize(extraMemory), bufferSize, totalProcessed + bufferSize);
totalProcessed += bufferSize;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above, could this eventually overflow?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same answer as above

}

long extimatedBatchSize = extraMemory * bufferFillRate;
long usedMemory = memoryTuple.getItem1();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the memoryTuple() function, the first element stored here could technically be null if the key for this element is not found and could result in a NullPointerException. Similar to how you guarantee there is a non-null value for item2, you should do the same for item1.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

Comment on lines 141 to 174
/**
* Formats a raw file size value into a human-readable string with appropriate units (B, KB, MB, GB).
*
* @param size The size value to be formatted.
* @return A human-readable string representing the file size with units (B, KB, MB, GB).
*/
public static String humanReadableSize(Long size) {
if (size == null) {
return "Not configured";
}

final String[] units = { "B", "KB", "MB", "GB" };
int unitIndex = 0;

double sizeInUnit = size;

if (size == 0) {
return "0 B";
}

while (sizeInUnit >= 1024 && unitIndex < units.length - 1) {
sizeInUnit /= 1024.0;
unitIndex++;
}

return String.format("%.2f %s", sizeInUnit, units[unitIndex]);
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd suggest creating a follow-up Jira and consider moving this to debezium-core inside either the io.debezium.util.NumberConversions class or io.debezium.util.Strings utility classes as I think this could be helpful for any connector that may want to log similar bits in the future, perhaps with a variant that uses the `"%.2f %s" format by default and another that takes a format string that allows customization.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd suggest to name this with a verb, something like:

  • formatReadableSize
  • getSizeInHumanReadableFormat
  • convertSizeToReadableString
  • getFormattedSizeString

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changing to getSizeInHumanReadableFormat. I will create a new Jira for that. This fix will need to go into different repository

@eizners eizners force-pushed the DBZ-7117-prevent-oom-in-debezium-redis-sink-stash branch from 2dec7de to 4592876 Compare January 17, 2024 16:18
@jpechane
Copy link
Contributor

jpechane commented Jan 24, 2024

@eizners Applied via #63, thanks a lot!

@jpechane jpechane closed this Jan 24, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants