
[BUG] Race condition on reloader deployment restart #1016

@ordity

Describe the bug
Helm chart:

    reloadOnCreate: false

Workload's annotation:

    deployment.reloader.stakater.com/pause-period: <anything above a few seconds>

With the above configuration, a race condition is created: if a ConfigMap or Secret changes and the Reloader pod is then restarted before the pause period ends, the workload is left with its rollout paused indefinitely, until the rollout is unpaused manually or the ConfigMap is changed again.

Running in single-replica mode keeps the vulnerable window short, but the race still occurs. Running in leader election mode with multiple replicas prolongs the timeframe during which the race condition can occur.

Enabling syncAfterRestart does not solve the problem.

Enabling both syncAfterRestart and reloadOnCreate technically solves the problem, but only by restarting every workload across all monitored namespaces whenever Reloader restarts, which is not a viable solution.

The problem is more likely to occur with longer pause-period values.

To Reproduce

  1. Set reloadOnCreate: false
  2. Set syncAfterRestart to any value
  3. Add the deployment.reloader.stakater.com/pause-period annotation to any workload, with a value such as one minute.
  4. Change any ConfigMap / Secret that the above workload references.
  5. Before the pause period ends, manually restart the Reloader deployment.

Expected behavior
Restarts of the Reloader pod should not impact pending timed reloads.

Possible solutions
Reloader already adds a temporary annotation recording the time at which the pause was activated, and another annotation contains the pause period duration. With syncAfterRestart enabled, it should therefore be possible for Reloader to check for pending reloads on startup, without the nuclear option of forcing reloads across all monitored workloads; a rough sketch of this idea follows.
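A minimal sketch of such a startup sweep using client-go. The pause-period annotation key is the one from this report; the paused-at key and its RFC 3339 timestamp format are assumptions about how Reloader records the pause start, not its actual internals:

```go
package main

import (
	"context"
	"fmt"
	"time"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/types"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

const (
	pausePeriodAnnotation = "deployment.reloader.stakater.com/pause-period" // key from this report
	pausedAtAnnotation    = "deployment.reloader.stakater.com/paused-at"    // hypothetical key
)

// resumePendingRollouts finds deployments that were paused for a timed reload
// before the restart and unpauses each one once its pause period has elapsed.
func resumePendingRollouts(ctx context.Context, client kubernetes.Interface) error {
	deployments, err := client.AppsV1().Deployments(metav1.NamespaceAll).List(ctx, metav1.ListOptions{})
	if err != nil {
		return err
	}
	for _, d := range deployments.Items {
		period, ok := d.Annotations[pausePeriodAnnotation]
		if !ok || !d.Spec.Paused {
			continue // not under a timed pause, or already running
		}
		dur, err := time.ParseDuration(period)
		if err != nil {
			continue
		}
		pausedAt, err := time.Parse(time.RFC3339, d.Annotations[pausedAtAnnotation])
		if err != nil {
			continue // no usable pause-start record; leave for manual handling
		}
		// Wait out whatever remains of the original pause period. In Reloader
		// proper this would be a scheduled task rather than a blocking sleep.
		if remaining := time.Until(pausedAt.Add(dur)); remaining > 0 {
			time.Sleep(remaining)
		}
		patch := []byte(`{"spec":{"paused":false}}`)
		if _, err := client.AppsV1().Deployments(d.Namespace).Patch(
			ctx, d.Name, types.MergePatchType, patch, metav1.PatchOptions{}); err != nil {
			return fmt.Errorf("unpausing %s/%s: %w", d.Namespace, d.Name, err)
		}
	}
	return nil
}

func main() {
	cfg, err := rest.InClusterConfig()
	if err != nil {
		panic(err)
	}
	client, err := kubernetes.NewForConfig(cfg)
	if err != nil {
		panic(err)
	}
	if err := resumePendingRollouts(context.Background(), client); err != nil {
		panic(err)
	}
}
```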

Alternatively, in leader election mode, the information about pending reloads could be handed over to the new leader so it can finish handling the affected workloads.

Another option is to terminate the old Reloader pod gracefully, letting it finish all pending pause periods while the new pod handles all future incoming changes; see the sketch below.
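A minimal sketch of that shutdown sequence; schedulePause and the demo unpause callback are hypothetical stand-ins for Reloader's internals, not its actual code:

```go
package main

import (
	"context"
	"log"
	"os/signal"
	"sync"
	"syscall"
	"time"
)

var pending sync.WaitGroup // tracks rollouts paused but not yet unpaused

// schedulePause stands in for the handler that pauses a rollout and
// unpauses it once the configured pause-period has elapsed.
func schedulePause(period time.Duration, unpause func()) {
	pending.Add(1)
	go func() {
		defer pending.Done()
		time.Sleep(period)
		unpause()
	}()
}

func main() {
	ctx, stop := signal.NotifyContext(context.Background(), syscall.SIGTERM)
	defer stop()

	// Demo: one change event arrives with a 30s pause-period.
	schedulePause(30*time.Second, func() { log.Println("rollout unpaused") })

	<-ctx.Done()   // SIGTERM: stop accepting new ConfigMap/Secret changes here...
	pending.Wait() // ...but drain every pending unpause before the process exits
	log.Println("all pending pause periods handled; exiting")
}
```

For this to work, the pod's terminationGracePeriodSeconds would have to exceed the longest configured pause-period; otherwise the kubelet kills the old pod before its timers fire.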
