Repair disconnected cluster with one shard #1618

joaojuliosap · 2025-12-15T18:52:05Z

Description

This PR implements automatic recovery for single-shard Redis Cluster instances whose leader and follower nodes restart simultaneously and lose track of each other's IP addresses. After such a restart, both nodes retain stale IP addresses in their nodes.conf files and fail to reconnect automatically. This fix detects that scenario and issues a CLUSTER MEET command that reintroduces the follower node to the leader using the follower's current IP address.

The solution adds a new recovery path in the reconciliation loop that specifically handles single-shard clusters (leaderReplicas == 1 && followerReplicas == 1) with unhealthy nodes. It introduces the RepairDisconnectedCluster function that retrieves the current follower pod IP from Kubernetes and issues a CLUSTER MEET command from the leader to re-establish the cluster connection.

Type of change

Bug fix (non-breaking change which fixes an issue)

Additional Context

- E2E tests added for both Redis v6 and v7 to validate the repair mechanism
- No functional behavior changes for clusters with more than one shard

Signed-off-by: I759672 <[email protected]>

joaojuliosap requested review from drivebyer, iamabhishek-dubey and shubham-cmyk as code owners December 15, 2025 18:52

joaojuliosap added 3 commits December 17, 2025 10:18

fix: repair disconnected cluster with one shard

44dc350

Signed-off-by: I759672 <[email protected]>

fix: formatting issues

c5cc0d1

Signed-off-by: I759672 <[email protected]>

fix: formatting in e2e tests

0bf1b3a

Signed-off-by: I759672 <[email protected]>

joaojuliosap force-pushed the repair-disconnected-cluster-with-one-shard branch from a30fe9b to 0bf1b3a Compare December 17, 2025 10:19
