Primary/replica status fails to update after Redis Cluster node status change #3000

@VilleKylmamaa

Description

Nodes in a Redis Cluster can change role from replica to primary, and vice versa, when a primary node becomes unavailable. When this happens, StackExchange.Redis appears to fail to update its knowledge (held in memory?) of each node's primary/replica status to match the Redis Cluster. Importantly, this causes many calls (1/3 in a Redis Cluster with 3 primary and 3 replica nodes?) to fail with the following exception: Command cannot be issued to a replica (StackExchange.Redis.RedisCommandException). This happens even though the cluster has returned to a fully functional state with all primary and replica nodes available.

This situation, in which some percentage of calls fail with that exception, persists for an indefinite amount of time. On one occasion when it was left without manual intervention, it lasted approximately an hour before resolving by itself, with no clear trigger.

The Command cannot be issued to a replica exception appears to be ExceptionFactory.PrimaryOnly, which according to our stack trace (see below) is thrown by PhysicalBridge.WriteMessageToServerInsideWriteLock. The exception appears to be thrown by the client library without attempting to call Redis.

During this situation, StackExchange.Redis keeps attempting to call ConnectionMultiplexer.ReconfigureAsync, but this does not update the primary/replica status it holds for the Redis Cluster nodes to match the actual nodes. I infer that ConnectionMultiplexer.ReconfigureAsync is being called from the "Endpoint Summary" logs being produced. I have tested forcing ConnectionMultiplexer.ReconfigureAsync calls every minute and it does not solve the situation. Creating a new ConnectionMultiplexer and opening a completely new connection does work, but it should not be required because it is expensive.
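For anyone hitting the same problem, the workaround of replacing the connection could be sketched roughly as follows. This is only an illustration, not the code we run and not a StackExchange.Redis API: RecreatingConnection, ReportFailure, ReportSuccess and the failure threshold are hypothetical names, and the helper is written generically so it does not depend on the library.

```csharp
using System;

// Hypothetical helper (not part of StackExchange.Redis): holds a disposable
// connection and replaces it after a run of consecutive failures.
public sealed class RecreatingConnection<T> where T : class, IDisposable
{
    private readonly Func<T> _factory;
    private readonly int _failureThreshold;
    private readonly object _gate = new object();
    private T _current;
    private int _consecutiveFailures;

    public RecreatingConnection(Func<T> factory, int failureThreshold)
    {
        if (failureThreshold < 1) throw new ArgumentOutOfRangeException(nameof(failureThreshold));
        _factory = factory ?? throw new ArgumentNullException(nameof(factory));
        _failureThreshold = failureThreshold;
        _current = factory();
    }

    // The connection callers should currently use.
    public T Current { get { lock (_gate) { return _current; } } }

    // A successful call resets the failure streak.
    public void ReportSuccess() { lock (_gate) { _consecutiveFailures = 0; } }

    // Returns true when this call caused the connection to be replaced.
    public bool ReportFailure()
    {
        T old;
        lock (_gate)
        {
            if (++_consecutiveFailures < _failureThreshold) return false;
            old = _current;
            _current = _factory();   // expensive: opens a completely new connection
            _consecutiveFailures = 0;
        }
        old.Dispose();               // callers still holding the old instance may see errors
        return true;
    }
}
```

In our case the factory would be something like `() => ConnectionMultiplexer.Connect(configuration)`, and ReportFailure would be called from the catch block that handles the Command cannot be issued to a replica exception. The threshold is an assumption to tune; the sketch only illustrates the workaround and does not address the stale primary/replica status itself.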

Stack trace:

2026-01-27 12:42:18,830 [6616/103] ERROR - Error setting value to Redis cache, key=<REDACTED_KEY>
StackExchange.Redis.RedisConnectionException: InternalFailure on [0]:SETEX <REDACTED_KEY> (BooleanProcessor) ---> StackExchange.Redis.RedisCommandException: Command cannot be issued to a replica: SETEX <REDACTED_KEY>
   at StackExchange.Redis.PhysicalBridge.WriteMessageToServerInsideWriteLock(PhysicalConnection connection, Message message) in /_/src/StackExchange.Redis/PhysicalBridge.cs:line 1573
   --- End of inner exception stack trace ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)

Example of replicating the situation

This was replicated in a test environment with the following specifications:

StackExchange.Redis version: 2.9.32
Redis version: 7.4.6
.NET version: .NET Framework 4.8.1
Redis Cluster set up: 3 primary nodes, 3 replica nodes

Redis Cluster initial state

These endpoint summaries are provided by the Trace logs by StackExchange.Redis. I have redacted server addresses.

2026-01-27 12:31:57,192 [6616/11] INFO - Endpoint Summary:
2026-01-27 12:31:57,207 [6616/11] INFO - Server summary: <NODE_A>: Cluster v7.4.6, replica; keep-alive: 00:01:00; int: ConnectedEstablished;
2026-01-27 12:31:57,207 [6616/11] INFO - Server summary: <NODE_B>: Cluster v7.4.6, replica; keep-alive: 00:01:00; int: ConnectedEstablished;
2026-01-27 12:31:57,207 [6616/11] INFO - Server summary: <NODE_C>: Cluster v7.4.6, primary; keep-alive: 00:01:00; int: ConnectedEstablished;
2026-01-27 12:31:57,207 [6616/11] INFO - Server summary: <NODE_D>: Cluster v7.4.6, replica; keep-alive: 00:01:00; int: ConnectedEstablished;
2026-01-27 12:31:57,207 [6616/11] INFO - Server summary: <NODE_E>: Cluster v7.4.6, primary; keep-alive: 00:01:00; int: ConnectedEstablished;
2026-01-27 12:31:57,207 [6616/11] INFO - Server summary: <NODE_F>: Cluster v7.4.6, primary; keep-alive: 00:01:00; int: ConnectedEstablished;

SHUTDOWN node NODE_C

2026-01-27 12:36:21,306 [6616/103] WARN - RedisConnection.OnConnectionFailed() failure=SocketClosed endPoint=IPEndPoint(<NODE_C>)

This was logged by the following custom code where connection is ConnectionMultiplexer:

connection.ConnectionFailed += (sender, args) =>
{
    var endPoint = EndPointToString(args.EndPoint);
    Loki.Warn($"RedisConnection.OnConnectionFailed() failure={args.FailureType} endPoint={endPoint}", args.Exception);
};

StackExchange.Redis first attempt to reconfigure

See NODE_C in state Connecting

2026-01-27 12:37:25,963 [6616/64] INFO - Endpoint Summary:
2026-01-27 12:37:25,963 [6616/79] INFO - Server summary: <NODE_A>: Cluster v7.4.6, replica; keep-alive: 00:01:00; int: ConnectedEstablished;
2026-01-27 12:37:25,963 [6616/79] INFO - Server summary: <NODE_B>: Cluster v7.4.6, replica; keep-alive: 00:01:00; int: ConnectedEstablished;
2026-01-27 12:37:25,963 [6616/79] INFO - Server summary: <NODE_C>: Cluster v7.4.6, primary; keep-alive: 00:01:00; int: Connecting; sub: n/a;
2026-01-27 12:37:25,963 [6616/79] INFO - Server summary: <NODE_D>: Cluster v7.4.6, replica; keep-alive: 00:01:00; int: ConnectedEstablished;
2026-01-27 12:37:25,963 [6616/79] INFO - Server summary: <NODE_E>: Cluster v7.4.6, primary; keep-alive: 00:01:00; int: ConnectedEstablished;
2026-01-27 12:37:25,963 [6616/79] INFO - Server summary: <NODE_F>: Cluster v7.4.6, primary; keep-alive: 00:01:00; int: ConnectedEstablished;

Command cannot be issued to a replica exceptions already started showing during this stage:

2026-01-27 12:36:44,399 [6616/86] ERROR - Error setting value to Redis cache, key=<REDACTED_KEY>
StackExchange.Redis.RedisConnectionException: InternalFailure on [0]:SETEX <REDACTED_KEY> (BooleanProcessor) ---> StackExchange.Redis.RedisCommandException: Command cannot be issued to a replica: SETEX <REDACTED_KEY>
   at StackExchange.Redis.PhysicalBridge.WriteMessageToServerInsideWriteLock(PhysicalConnection connection, Message message) in /_/src/StackExchange.Redis/PhysicalBridge.cs:line 1573
   --- End of inner exception stack trace ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)

RESTART NODE_C

Note: If the node is restarted too quickly after being shut down, the situation might not reproduce, because the primary/replica status of the nodes does not change.

2026-01-27 12:45:24,395 [6616/87] INFO - RedisConnection.OnConnectionRestored() endPoint=IPEndPoint(<NODE_C>)

This was logged by the following custom code where connection is ConnectionMultiplexer:

connection.ConnectionRestored += (sender, args) =>
{
    var endPoint = EndPointToString(args.EndPoint);
    _log.Info($"RedisConnection.OnConnectionRestored() endPoint={endPoint}");
};

RESULT

Notice that NODE_C has turned into a replica but NODE_D also remains a replica. At this stage all the connections are ConnectedEstablished and the Redis Cluster has returned to a fully operational state with 3 primary and 3 replica nodes, but StackExchange.Redis incorrectly thinks there are 2 primary and 4 replica nodes.

2026-01-27 12:45:30,239 [6616/64] INFO - Endpoint Summary:
2026-01-27 12:45:30,239 [6616/64] INFO - Server summary: <NODE_A>: Cluster v7.4.6, replica; keep-alive: 00:01:00; int: ConnectedEstablished;
2026-01-27 12:45:30,239 [6616/64] INFO - Server summary: <NODE_B>: Cluster v7.4.6, replica; keep-alive: 00:01:00; int: ConnectedEstablished;
2026-01-27 12:45:30,239 [6616/64] INFO - Server summary: <NODE_C>: Cluster v7.4.6, replica; keep-alive: 00:01:00; int: ConnectedEstablished;
2026-01-27 12:45:30,239 [6616/64] INFO - Server summary: <NODE_D>: Cluster v7.4.6, replica; keep-alive: 00:01:00; int: ConnectedEstablished;
2026-01-27 12:45:30,239 [6616/64] INFO - Server summary: <NODE_E>: Cluster v7.4.6, primary; keep-alive: 00:01:00; int: ConnectedEstablished;
2026-01-27 12:45:30,239 [6616/64] INFO - Server summary: <NODE_F>: Cluster v7.4.6, primary; keep-alive: 00:01:00; int: ConnectedEstablished;

StackExchange.Redis does not resolve the situation, and it persists for an indefinite amount of time despite StackExchange.Redis attempting to reconfigure multiple times. Notice that the client still reports 2 primary and 4 replica nodes after more than 6 minutes have passed:

2026-01-27 12:51:59,129 [6616/79] INFO - Endpoint Summary:
2026-01-27 12:51:59,129 [6616/79] INFO - Server summary: <NODE_A>: Cluster v7.4.6, replica; keep-alive: 00:01:00; int: ConnectedEstablished;
2026-01-27 12:51:59,129 [6616/79] INFO - Server summary: <NODE_B>: Cluster v7.4.6, replica; keep-alive: 00:01:00; int: ConnectedEstablished;
2026-01-27 12:51:59,129 [6616/79] INFO - Server summary: <NODE_C>: Cluster v7.4.6, replica; keep-alive: 00:01:00; int: ConnectedEstablished;
2026-01-27 12:51:59,129 [6616/79] INFO - Server summary: <NODE_D>: Cluster v7.4.6, replica; keep-alive: 00:01:00; int: ConnectedEstablished;
2026-01-27 12:51:59,129 [6616/79] INFO - Server summary: <NODE_E>: Cluster v7.4.6, primary; keep-alive: 00:01:00; int: ConnectedEstablished;
2026-01-27 12:51:59,129 [6616/79] INFO - Server summary: <NODE_F>: Cluster v7.4.6, primary; keep-alive: 00:01:00; int: ConnectedEstablished;

Correct primary/replica status only after restarting the application

Only after the connection is fully reset by an application pool recycle does StackExchange.Redis learn the correct primary/replica status, where NODE_C has become a replica and NODE_D has become a primary:

2026-01-27 12:54:12,764 [9928/9] INFO - Endpoint Summary:
2026-01-27 12:54:12,764 [9928/9] INFO - Server summary: <NODE_A>: Cluster v7.4.6, replica; keep-alive: 00:01:00; int: ConnectedEstablished;
2026-01-27 12:54:12,764 [9928/9] INFO - Server summary: <NODE_B>: Cluster v7.4.6, replica; keep-alive: 00:01:00; int: ConnectedEstablished;
2026-01-27 12:54:12,764 [9928/9] INFO - Server summary: <NODE_C>: Cluster v7.4.6, replica; keep-alive: 00:01:00; int: ConnectedEstablished;
2026-01-27 12:54:12,764 [9928/9] INFO - Server summary: <NODE_D>: Cluster v7.4.6, primary; keep-alive: 00:01:00; int: ConnectedEstablished;
2026-01-27 12:54:12,764 [9928/9] INFO - Server summary: <NODE_E>: Cluster v7.4.6, primary; keep-alive: 00:01:00; int: ConnectedEstablished;
2026-01-27 12:54:12,764 [9928/9] INFO - Server summary: <NODE_F>: Cluster v7.4.6, primary; keep-alive: 00:01:00; int: ConnectedEstablished;
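To make the difference between the two summaries explicit, the roles can be compared mechanically. The sketch below is illustrative only: RoleViewDiff and its members are hypothetical names, and the two dictionaries are typed in by hand from the 12:51:59 and 12:54:12 endpoint summaries above. In a live process the client's view could presumably be read from IServer.IsReplica and the cluster's own view from IServer.ClusterNodes(), but that wiring is an assumption and is not shown.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: reports endpoints whose role differs between two views.
public static class RoleViewDiff
{
    // Each dictionary maps an endpoint label to true when that view
    // considers the endpoint a replica. Endpoints missing from the
    // second view are ignored.
    public static List<string> Mismatches(
        IReadOnlyDictionary<string, bool> clientView,
        IReadOnlyDictionary<string, bool> clusterView)
    {
        return clientView
            .Where(kv => clusterView.TryGetValue(kv.Key, out var isReplica) && isReplica != kv.Value)
            .Select(kv => kv.Key)
            .OrderBy(name => name, StringComparer.Ordinal)
            .ToList();
    }

    // Roles from the 12:51:59 endpoint summary (stale view before the recycle).
    public static readonly Dictionary<string, bool> StaleView = new Dictionary<string, bool>
    {
        ["NODE_A"] = true, ["NODE_B"] = true, ["NODE_C"] = true,
        ["NODE_D"] = true, ["NODE_E"] = false, ["NODE_F"] = false,
    };

    // Roles from the 12:54:12 endpoint summary (after the application pool recycle).
    public static readonly Dictionary<string, bool> FreshView = new Dictionary<string, bool>
    {
        ["NODE_A"] = true, ["NODE_B"] = true, ["NODE_C"] = true,
        ["NODE_D"] = false, ["NODE_E"] = false, ["NODE_F"] = false,
    };
}
```

Calling `RoleViewDiff.Mismatches(RoleViewDiff.StaleView, RoleViewDiff.FreshView)` yields only NODE_D, the node that was promoted to primary while NODE_C was down and whose new role the long-lived connection never picked up.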
