Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add retry logic to PRM watch thread #116

Merged
merged 5 commits into from
Sep 6, 2024

Conversation

HonakerM
Copy link
Collaborator

@HonakerM HonakerM commented Sep 6, 2024

Related Issue

Supports #?

Related PRs

This PR adds better support for ApiExceptions in the watch thread by adding retry logic and a sys.exit in-case of failure. In the past if a watch thread failed due to 403/400 errors then the thread would die and nothing would restart it.

What this PR does / why we need it

Special notes for your reviewer

If applicable**

  • this PR contains documentation
  • this PR contains unit tests
  • this PR has been tested for backwards compatibility

What gif most accurately describes how I feel towards this PR?

Example of a gif

Signed-off-by: Michael Honaker <[email protected]>
@HonakerM HonakerM force-pushed the add_watch_retry_and_exit branch from 6964328 to 83f81c7 Compare September 6, 2024 15:10
Signed-off-by: Michael Honaker <[email protected]>
Signed-off-by: Michael Honaker <[email protected]>
Copy link
Collaborator

@gabe-l-hart gabe-l-hart left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I'm probably just missing the bigger picture, but it's not clear how the retry is actually happening

list_resource_version = self.kubernetes_watch.resource_version
self.wait_on_precondition(self.retry_delay.total_seconds())
self.attempts_left = self.attempts_left - 1
log.info("Restarting watch with %d attempts left", self.attempts_left)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This may be because I'm reviewing on my phone and can't see indentation well, but I'm not seeing how this actually performs the retry. There isn't a loop above the try, so it's this being done outside this function?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There isn't a loop above the try, so it's this being done outside this function?

The try is inside the While True: loop

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is why I shouldn't review on a phone!


# Update the resource version to only get new events
list_resource_version = self.kubernetes_watch.resource_version
self.wait_on_precondition(self.retry_delay.total_seconds())
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This returns a bool, but I'm not seeing it used anywhere. Can the preconditions fail?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not seeing it used anywhere.

good catch. It should be checked in the retry logic.

Can the preconditions fail?

Yep! If the thread should shutdown then it returns True

):
# Validate leadership on each event
if not self.check_preconditions():
log.debug("Checking preconditions failed. Shuting down")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know this is an indentation change, but Shutting

Signed-off-by: Michael Honaker <[email protected]>
@HonakerM HonakerM force-pushed the add_watch_retry_and_exit branch from 9ce0bd8 to 13b7f89 Compare September 6, 2024 17:57
@HonakerM HonakerM requested a review from gabe-l-hart September 6, 2024 18:10
Copy link
Collaborator

@gabe-l-hart gabe-l-hart left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good!

@gabe-l-hart gabe-l-hart merged commit 9ac5c17 into IBM:main Sep 6, 2024
6 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants