@joaojuliosap

Description

This PR implements automatic recovery for single-shard Redis Cluster instances whose leader (master) and follower (slave) nodes restart at the same time. After such a restart, each node's nodes.conf file still holds its peer's stale IP address, so the two nodes cannot reconnect on their own. The fix detects this scenario and executes a CLUSTER MEET command that reintroduces the follower node to the leader using the follower's current IP address.
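To make the failure mode concrete, here is a minimal sketch of one way to spot a stale follower address. It is not taken from this PR's diff: the helper names `nodeIP` and `followerAddrIsStale` are hypothetical, and the sketch assumes the leader's `CLUSTER NODES` output and the follower pod's current IP are already available as strings.

```go
package main

import "strings"

// nodeIP extracts the IP from a CLUSTER NODES address field, which has the
// form ip:port@cport (Redis 7 may append ",hostname" after the bus port).
func nodeIP(addr string) string {
	if i := strings.Index(addr, "@"); i >= 0 {
		addr = addr[:i] // drop "@cport" and any trailing hostname
	}
	if i := strings.LastIndex(addr, ":"); i >= 0 {
		addr = addr[:i] // drop ":port"
	}
	return addr
}

// followerAddrIsStale (hypothetical helper) reports whether any replica entry
// in the given CLUSTER NODES output records an IP other than currentIP.
func followerAddrIsStale(clusterNodes, currentIP string) bool {
	for _, line := range strings.Split(clusterNodes, "\n") {
		// Fields: <id> <ip:port@cport> <flags> <master> ...
		fields := strings.Fields(line)
		if len(fields) < 3 {
			continue
		}
		for _, flag := range strings.Split(fields[2], ",") {
			if flag == "slave" && nodeIP(fields[1]) != currentIP {
				return true
			}
		}
	}
	return false
}
```

In a single-shard cluster there is only one replica line to inspect, so comparing it against the pod IP reported by Kubernetes is enough for this illustration.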

The solution adds a new recovery path in the reconciliation loop that specifically handles single-shard clusters (leaderReplicas == 1 && followerReplicas == 1) with unhealthy nodes. It introduces the RepairDisconnectedCluster function that retrieves the current follower pod IP from Kubernetes and issues a CLUSTER MEET command from the leader to re-establish the cluster connection.

Type of change

Bug fix (non-breaking change which fixes an issue)

Additional Context
- E2E tests added for both Redis v6 and v7 to validate the repair mechanism
- No functional behavior changes for clusters with more than one shard

@joaojuliosap joaojuliosap force-pushed the repair-disconnected-cluster-with-one-shard branch from a30fe9b to 0bf1b3a Compare December 17, 2025 10:19