-
Notifications
You must be signed in to change notification settings - Fork 10
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add retry logic to PRM watch thread #116
Conversation
Signed-off-by: Michael Honaker <[email protected]>
Signed-off-by: Michael Honaker <[email protected]>
6964328
to
83f81c7
Compare
Signed-off-by: Michael Honaker <[email protected]>
Signed-off-by: Michael Honaker <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I'm probably just missing the bigger picture, but it's not clear how the retry is actually happening
list_resource_version = self.kubernetes_watch.resource_version | ||
self.wait_on_precondition(self.retry_delay.total_seconds()) | ||
self.attempts_left = self.attempts_left - 1 | ||
log.info("Restarting watch with %d attempts left", self.attempts_left) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This may be because I'm reviewing on my phone and can't see indentation well, but I'm not seeing how this actually performs the retry. There isn't a loop above the try
, so it's this being done outside this function?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There isn't a loop above the try, so it's this being done outside this function?
The try
is inside the While True:
loop
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is why I shouldn't review on a phone!
|
||
# Update the resource version to only get new events | ||
list_resource_version = self.kubernetes_watch.resource_version | ||
self.wait_on_precondition(self.retry_delay.total_seconds()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This returns a bool, but I'm not seeing it used anywhere. Can the preconditions fail?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not seeing it used anywhere.
good catch. It should be checked in the retry logic.
Can the preconditions fail?
Yep! If the thread should shutdown then it returns True
): | ||
# Validate leadership on each event | ||
if not self.check_preconditions(): | ||
log.debug("Checking preconditions failed. Shuting down") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know this is an indentation change, but Shutting
Signed-off-by: Michael Honaker <[email protected]>
9ce0bd8
to
13b7f89
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good!
Related Issue
Supports #?
Related PRs
This PR adds better support for ApiExceptions in the watch thread by adding retry logic and a
sys.exit
in-case of failure. In the past if a watch thread failed due to 403/400 errors then the thread would die and nothing would restart it.What this PR does / why we need it
Special notes for your reviewer
If applicable**
What gif most accurately describes how I feel towards this PR?