Description
It seems that the ingress update interferes with the operator's leader lease renewal, causing the operator to exit and delaying reconciliation of the CR.
The operator exits while it's waiting for the ingress to get updated:
2023-09-11T07:56:23Z INFO Waiting for ingress to update {"controller": "clusterrelocation", "controllerGroup": "rhsyseng.github.io", "controllerKind": "ClusterRelocation", "ClusterRelocation": {"name":"cluster"}, "namespace": "", "name": "cluster", "reconcileID": "0d68e259-8df3-4af2-a8cb-3cc0015b9c64"}
2023-09-11T07:56:33Z ERROR Reconciler error {"controller": "clusterrelocation", "controllerGroup": "rhsyseng.github.io", "controllerKind": "ClusterRelocation", "ClusterRelocation": {"name":"cluster"}, "namespace": "", "name": "cluster", "reconcileID": "0d68e259-8df3-4af2-a8cb-3cc0015b9c64", "error": "dial tcp 192.168.127.10:443: connect: connection refused"}
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:329
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:274
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2
/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:235
2023-09-11T07:56:33Z INFO validation succeeded {"controller": "clusterrelocation", "controllerGroup": "rhsyseng.github.io", "controllerKind": "ClusterRelocation", "ClusterRelocation": {"name":"cluster"}, "namespace": "", "name": "cluster", "reconcileID": "c2eb83a7-1de7-4d10-b0a7-22c3919ea01d"}
2023-09-11T07:56:33Z INFO TLS cert already exists for Ingresses {"controller": "clusterrelocation", "controllerGroup": "rhsyseng.github.io", "controllerKind": "ClusterRelocation", "ClusterRelocation": {"name":"cluster"}, "namespace": "", "name": "cluster", "reconcileID": "c2eb83a7-1de7-4d10-b0a7-22c3919ea01d"}
2023-09-11T07:56:33Z INFO Using user provided API certificate {"controller": "clusterrelocation", "controllerGroup": "rhsyseng.github.io", "controllerKind": "ClusterRelocation", "ClusterRelocation": {"name":"cluster"}, "namespace": "", "name": "cluster", "reconcileID": "c2eb83a7-1de7-4d10-b0a7-22c3919ea01d", "namespace": "relocation", "name": "new-api-certs"}
2023-09-11T07:56:33Z ERROR Reconciler error {"controller": "clusterrelocation", "controllerGroup": "rhsyseng.github.io", "controllerKind": "ClusterRelocation", "ClusterRelocation": {"name":"cluster"}, "namespace": "", "name": "cluster", "reconcileID": "c2eb83a7-1de7-4d10-b0a7-22c3919ea01d", "error": "dial tcp 192.168.127.10:443: connect: connection refused"}
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:329
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:274
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2
/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:235
E0911 07:56:52.138017 1 leaderelection.go:330] error retrieving resource lock openshift-operators/f4de3632.rhsyseng.github.io: Get "https://172.30.0.1:443/apis/coordination.k8s.io/v1/namespaces/openshift-operators/leases/f4de3632.rhsyseng.github.io": dial tcp 172.30.0.1:443: connect: connection refused
E0911 07:57:02.139324 1 leaderelection.go:330] error retrieving resource lock openshift-operators/f4de3632.rhsyseng.github.io: Get "https://172.30.0.1:443/apis/coordination.k8s.io/v1/namespaces/openshift-operators/leases/f4de3632.rhsyseng.github.io": dial tcp 172.30.0.1:443: connect: connection refused
E0911 07:58:23.958624 1 leaderelection.go:330] error retrieving resource lock openshift-operators/f4de3632.rhsyseng.github.io: Get "https://172.30.0.1:443/apis/coordination.k8s.io/v1/namespaces/openshift-operators/leases/f4de3632.rhsyseng.github.io": dial tcp 172.30.0.1:443: connect: connection refused
E0911 07:58:33.960122 1 leaderelection.go:330] error retrieving resource lock openshift-operators/f4de3632.rhsyseng.github.io: Get "https://172.30.0.1:443/apis/coordination.k8s.io/v1/namespaces/openshift-operators/leases/f4de3632.rhsyseng.github.io": dial tcp 172.30.0.1:443: connect: connection refused
Once the new instance starts, it hangs for over a minute (07:58:56 to 08:00:05) while trying to acquire the lease:
2023-09-11T07:58:56Z INFO setup starting manager
I0911 07:58:56.215805 1 leaderelection.go:248] attempting to acquire leader lease openshift-operators/f4de3632.rhsyseng.github.io...
2023-09-11T07:58:56Z INFO Starting server {"kind": "health probe", "addr": "[::]:8081"}
2023-09-11T07:58:56Z INFO Starting server {"path": "/metrics", "kind": "metrics", "addr": "127.0.0.1:8080"}
I0911 08:00:05.290010 1 leaderelection.go:258] successfully acquired lease openshift-operators/f4de3632.rhsyseng.github.io
2023-09-11T08:00:05Z DEBUG events cluster-relocation-operator-controller-manager-75666d5c5-tmn66_aca225e6-ef45-4574-b015-8132b0091818 became leader
Expected Behavior
Current Behavior
Possible Solution
- Don't use a leader lease if there's a single instance of the operator
- Use a longer lease duration
Steps to Reproduce (for bugs)
- I noticed this when I applied the CR with a new domain and without an ingress certificate (so the operator generated a self-signed one).
Context
This issue delays reconciliation of the ClusterRelocation CR.
I applied the CR on a stable cluster that had been installed hours earlier.
Regression
Unsure
Your Environment
- Version used (cluster-relocation-operator): latest operator from OperatorHub
- Environment name and version (e.g. OCP v1.12.20): 4.10