Describe the bug
Helm chart:
reloadOnCreate: false
Workload's annotation:
deployment.reloader.stakater.com/pause-period: <anything above a few seconds>
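For illustration, a minimal sketch of that configuration (the chart value nesting, workload name, namespace, and image below are illustrative assumptions, not from a real setup):

```yaml
# values.yaml for the Reloader Helm chart (value nesting assumed)
reloader:
  reloadOnCreate: false
```

```yaml
# Workload carrying the pause-period annotation (all names are hypothetical)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
  namespace: demo
  annotations:
    reloader.stakater.com/auto: "true"
    deployment.reloader.stakater.com/pause-period: "5m"
spec:
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
        - name: app
          image: nginx
          envFrom:
            - configMapRef:
                name: my-config   # changing this ConfigMap starts the paused rollout window
```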
With the above configuration, a race condition is created. If a configmap or secret changes and the Reloader pod is then restarted before the pause period ends, the workload is left with its rollout paused indefinitely, until the rollout is unpaused manually or another change is made to the configmap.
Running in single-replica mode narrows the window, but the issue still occurs. Running multiple replicas with leader election widens the timeframe during which the race condition can occur.
Enabling syncAfterRestart alone does not solve the problem.
Enabling both syncAfterRestart and reloadOnCreate technically solves it, but only by restarting all workloads across all monitored namespaces whenever Reloader restarts, which is not really viable.
The problem is more likely to occur with longer pause-period values.
To Reproduce
- Set reloadOnCreate: false
- Set syncAfterRestart to any value
- Add the deployment.reloader.stakater.com/pause-period annotation to any workload, with a value such as one minute.
- Change any configmap / secret that the above workload is referencing.
- Before the pause period ends, manually restart the reloader deployment.
Expected behavior
Reloader pod restarts shouldn't impact pending timed reloads.
Possible solutions
Reloader already adds a temporary annotation recording the time at which the pause was activated, and another annotation contains the pause-period duration. It therefore looks possible for Reloader, with syncAfterRestart enabled, to check for pending reloads on startup without the nuclear option of forcing reloads across all monitored workloads.
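To illustrate why recovery should be possible, after Reloader triggers a paused reload the affected workload already carries roughly this state (the pause-period annotation is the one from this report; spec.paused and the name of the temporary "paused at" annotation are assumptions about the internals, not verified):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
  annotations:
    deployment.reloader.stakater.com/pause-period: "5m"
    # Hypothetical name for the temporary annotation mentioned above,
    # recording when Reloader activated the pause:
    deployment.reloader.stakater.com/paused-at: "2024-05-01T12:00:00Z"
spec:
  paused: true        # rollout stays paused until something resumes it
  # remaining spec fields omitted
```

On startup with syncAfterRestart enabled, a fresh Reloader instance could list workloads, look for this annotation pair, and either resume the rollout immediately (if paused-at plus pause-period has already elapsed) or schedule the resume for the remaining time, instead of reloading everything.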
Alternatively, in leader election mode, the information about pending reloads could be handed over to the new leader so it can handle the affected workloads.
Another option is to terminate the old Reloader pod gracefully, letting it finish handling the pending cases while the new pod takes over all future incoming changes.
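If the graceful-termination route were taken, the Reloader pod would also need enough termination grace to cover the longest configured pause period; a rough sketch of the supporting Deployment settings (values and image name are illustrative, and this only buys time for drain logic that Reloader itself would still need to implement):

```yaml
spec:
  template:
    spec:
      # At least as long as the longest pause-period in use, so the old pod
      # can finish resuming paused workloads before it is killed.
      terminationGracePeriodSeconds: 600
      containers:
        - name: reloader
          image: stakater/reloader
          # On SIGTERM, the old instance would stop accepting new events and
          # drain its pending pause timers before exiting.
```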