PWM watch thread doesn't restart the service when unable to start #125

pit1sIBM · 2024-10-04T15:57:44Z

Describe the bug

When insufficient RBAC is provided, the service receives 403 errors for the cluster, as expected. However, if the correct permissions are not applied before the watch thread exits, the service never picks up the changes and stays stuck. In the logs this looks like:

2024-10-04T15:38:07.230908 [WTCHT:ERRR:140078775780928] Unable to start watch within 5 attempts
2024-10-04T15:38:12.100343 [TMRTH:DBUG:140078758995520] Timer executing action for event: TimerEvent(time=datetime.datetime(2024, 10, 4, 15, 38, 12, 99364), action=<bound method HeartbeatThread._run_heartbeat of <HeartbeatThread(heartbeat_thread, started daemon 140078758995520)>>, args=(), kwargs={}, stale=False)
2024-10-04T15:38:12.101064 [TMRTH:DBG2:140078758995520] Timer waiting 29.999622s until next scheduled event
2024-10-04T15:38:42.100994 [TMRTH:DBUG:140078758995520] Timer executing action for event: TimerEvent(time=datetime.datetime(2024, 10, 4, 15, 38, 42, 100649), action=<bound method HeartbeatThread._run_heartbeat of <HeartbeatThread(heartbeat_thread, started daemon 140078758995520)>>, args=(), kwargs={}, stale=False)

This is due to the following code, which calls sys.exit(1). I believe that call only terminates the calling thread (same as _thread.exit()) and not the main process.

oper8/oper8/watch_manager/python_watch_manager/threads/watch.py

Lines 180 to 184 in 92e62f5

    
           log.error(
               "Unable to start watch within %d attempts",
               config.python_watch_manager.watch_retry_count,
           )
           sys.exit(1)

I assume we want to be calling the exit on the main thread instead?
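To illustrate the behavior described above, here is a minimal, self-contained sketch (not oper8 code, and not necessarily how the fix was implemented). It shows that sys.exit(1) in a non-main thread raises SystemExit in that thread only, so the process keeps running. The names watch_thread_body, run_demo, and the fatal event are hypothetical, introduced only for this example; signalling the main thread through a threading.Event is just one possible way to propagate the failure.

```python
import sys
import threading


def watch_thread_body(fatal: threading.Event) -> None:
    """Hypothetical stand-in for a watch thread that gives up after its retries."""
    # Tell the main thread that the failure is unrecoverable.
    fatal.set()
    # Raises SystemExit, which ends only this thread, not the process.
    sys.exit(1)


def run_demo() -> int:
    """Start the worker and return the exit code the main thread should use."""
    fatal = threading.Event()
    worker = threading.Thread(
        target=watch_thread_body, args=(fatal,), daemon=True
    )
    worker.start()
    worker.join()
    # Reaching this line shows the process survived the thread's sys.exit(1).
    return 1 if fatal.is_set() else 0


exit_code = run_demo()
print("process still running; main thread would exit with", exit_code)
```

In a real service, the main thread would then call sys.exit(exit_code) itself, so that the process terminates and k8s can restart the pod.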

Platform

Please provide details about the environment you are using, including the following:

Interpreter version: 3.11.7
Library version: 0.1.26

Sample Code

Run the operator without the appropriate RBAC

Expected behavior

Service exits and k8s restarts the pod

Observed behavior

operator pod hangs

Additional context


pit1sIBM · 2024-10-04T18:24:25Z

Looks to be addressed in https://github.com/IBM/oper8/releases/tag/v0.1.27

HonakerM · 2024-11-08T09:55:31Z

Yeah this was resolved by #116

HonakerM closed this as completed Nov 8, 2024
