Description
Nodes in a Redis Cluster can switch between the replica and primary roles when a primary node becomes unavailable. When this happens, StackExchange.Redis seems to fail to update its (in-memory?) view of the cluster to match the new primary/replica roles. As a result, many calls (1/3 in a Redis Cluster with 3 primary and 3 replica nodes?) fail with the following exception: Command cannot be issued to a replica (StackExchange.Redis.RedisCommandException). This happens even though the cluster has returned to a fully functional state with all primary and replica nodes available.
This state, in which some percentage of calls fail with that exception, persists for an indefinite amount of time. On one occasion when we did not intervene manually, it lasted approximately an hour before resolving by itself, with no clear trigger.
The "Command cannot be issued to a replica" exception appears to be ExceptionFactory.PrimaryOnly, which according to our stack trace (see below) is thrown by PhysicalBridge.WriteMessageToServerInsideWriteLock. The exception appears to be thrown by the client library itself, without any attempt to send the command to Redis.
During this situation, StackExchange.Redis keeps attempting to call ConnectionMultiplexer.ReconfigureAsync, but this does not update the client's primary/replica status for the nodes to match their actual roles. I am deducing that ConnectionMultiplexer.ReconfigureAsync is being called from the "Endpoint Summary" logs being produced. I have tested forcing ConnectionMultiplexer.ReconfigureAsync calls every minute, and it does not resolve the situation. Creating a new ConnectionMultiplexer and opening a completely new connection does work, but should not be required as it is expensive.
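To make the mismatch described above easier to confirm, the role cached by the client can be compared with the role the cluster itself reports. The following is a minimal diagnostic sketch, not part of the original reproduction: the class name ClusterRoleCheck and its Report method are hypothetical, and it assumes the endpoints known to the multiplexer are of the same type (for example IPEndPoint) as those returned by CLUSTER NODES, so that they compare equal.

```csharp
using System;
using System.Linq;
using StackExchange.Redis;

// Hypothetical helper, for diagnosis only.
public static class ClusterRoleCheck
{
    // Prints each endpoint whose role cached by the multiplexer differs from
    // the role that a connected node reports through CLUSTER NODES.
    public static void Report(IConnectionMultiplexer connection)
    {
        var servers = connection.GetEndPoints()
            .Select(endPoint => connection.GetServer(endPoint))
            .ToList();

        // Any connected node can describe the whole cluster.
        var source = servers.FirstOrDefault(server => server.IsConnected);
        if (source == null)
        {
            Console.WriteLine("No connected node available to query.");
            return;
        }

        var cluster = source.ClusterNodes();
        if (cluster == null)
        {
            Console.WriteLine("CLUSTER NODES returned no data.");
            return;
        }

        foreach (var server in servers)
        {
            // Assumption: both sides use the same EndPoint type, so Equals matches.
            var node = cluster.Nodes
                .FirstOrDefault(n => object.Equals(n.EndPoint, server.EndPoint));

            if (node == null)
            {
                Console.WriteLine($"{server.EndPoint}: not present in CLUSTER NODES output");
            }
            else if (node.IsReplica != server.IsReplica)
            {
                Console.WriteLine(
                    $"{server.EndPoint}: client sees {Role(server.IsReplica)}, " +
                    $"cluster reports {Role(node.IsReplica)}");
            }
        }
    }

    private static string Role(bool isReplica) => isReplica ? "replica" : "primary";
}
```

Calling ClusterRoleCheck.Report(connection) while the errors are occurring should, if the diagnosis in this issue is right, print a line for the node that was promoted (NODE_D in the logs below).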
Stack trace:
2026-01-27 12:42:18,830 [6616/103] ERROR - Error setting value to Redis cache, key=<REDACTED_KEY>
StackExchange.Redis.RedisConnectionException: InternalFailure on [0]:SETEX <REDACTED_KEY> (BooleanProcessor) ---> StackExchange.Redis.RedisCommandException: Command cannot be issued to a replica: SETEX <REDACTED_KEY>
at StackExchange.Redis.PhysicalBridge.WriteMessageToServerInsideWriteLock(PhysicalConnection connection, Message message) in /_/src/StackExchange.Redis/PhysicalBridge.cs:line 1573
--- End of inner exception stack trace ---
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
Example of replicating the situation
This was replicated in a test environment with the following specifications:
StackExchange.Redis version: 2.9.32
Redis version: 7.4.6
.NET version: .NET Framework 4.8.1
Redis Cluster set up: 3 primary nodes, 3 replica nodes
Redis Cluster initial state
These endpoint summaries are provided by the Trace logs by StackExchange.Redis. I have redacted server addresses.
2026-01-27 12:31:57,192 [6616/11] INFO - Endpoint Summary:
2026-01-27 12:31:57,207 [6616/11] INFO - Server summary: <NODE_A>: Cluster v7.4.6, replica; keep-alive: 00:01:00; int: ConnectedEstablished;
2026-01-27 12:31:57,207 [6616/11] INFO - Server summary: <NODE_B>: Cluster v7.4.6, replica; keep-alive: 00:01:00; int: ConnectedEstablished;
2026-01-27 12:31:57,207 [6616/11] INFO - Server summary: <NODE_C>: Cluster v7.4.6, primary; keep-alive: 00:01:00; int: ConnectedEstablished;
2026-01-27 12:31:57,207 [6616/11] INFO - Server summary: <NODE_D>: Cluster v7.4.6, replica; keep-alive: 00:01:00; int: ConnectedEstablished;
2026-01-27 12:31:57,207 [6616/11] INFO - Server summary: <NODE_E>: Cluster v7.4.6, primary; keep-alive: 00:01:00; int: ConnectedEstablished;
2026-01-27 12:31:57,207 [6616/11] INFO - Server summary: <NODE_F>: Cluster v7.4.6, primary; keep-alive: 00:01:00; int: ConnectedEstablished;
SHUTDOWN node NODE_C
2026-01-27 12:36:21,306 [6616/103] WARN - RedisConnection.OnConnectionFailed() failure=SocketClosed endPoint=IPEndPoint(<NODE_C>)
This was logged by the following custom code where connection is ConnectionMultiplexer:
connection.ConnectionFailed += (sender, args) =>
{
    var endPoint = EndPointToString(args.EndPoint);
    Loki.Warn($"RedisConnection.OnConnectionFailed() failure={args.FailureType} endPoint={endPoint}", args.Exception);
};
StackExchange.Redis first attempt to reconfigure
See NODE_C in state Connecting
2026-01-27 12:37:25,963 [6616/64] INFO - Endpoint Summary:
2026-01-27 12:37:25,963 [6616/79] INFO - Server summary: <NODE_A>: Cluster v7.4.6, replica; keep-alive: 00:01:00; int: ConnectedEstablished;
2026-01-27 12:37:25,963 [6616/79] INFO - Server summary: <NODE_B>: Cluster v7.4.6, replica; keep-alive: 00:01:00; int: ConnectedEstablished;
2026-01-27 12:37:25,963 [6616/79] INFO - Server summary: <NODE_C>: Cluster v7.4.6, primary; keep-alive: 00:01:00; int: Connecting; sub: n/a;
2026-01-27 12:37:25,963 [6616/79] INFO - Server summary: <NODE_D>: Cluster v7.4.6, replica; keep-alive: 00:01:00; int: ConnectedEstablished;
2026-01-27 12:37:25,963 [6616/79] INFO - Server summary: <NODE_E>: Cluster v7.4.6, primary; keep-alive: 00:01:00; int: ConnectedEstablished;
2026-01-27 12:37:25,963 [6616/79] INFO - Server summary: <NODE_F>: Cluster v7.4.6, primary; keep-alive: 00:01:00; int: ConnectedEstablished;
"Command cannot be issued to a replica" exceptions had already started appearing by this stage:
2026-01-27 12:36:44,399 [6616/86] ERROR - Error setting value to Redis cache, key=<REDACTED_KEY>
StackExchange.Redis.RedisConnectionException: InternalFailure on [0]:SETEX <REDACTED_KEY> (BooleanProcessor) ---> StackExchange.Redis.RedisCommandException: Command cannot be issued to a replica: SETEX <REDACTED_KEY>
at StackExchange.Redis.PhysicalBridge.WriteMessageToServerInsideWriteLock(PhysicalConnection connection, Message message) in /_/src/StackExchange.Redis/PhysicalBridge.cs:line 1573
--- End of inner exception stack trace ---
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
RESTART NODE_C
Note: If the node is restarted too quickly after being shut down, the situation might not reproduce, because the primary/replica status of the nodes does not change.
2026-01-27 12:45:24,395 [6616/87] INFO - RedisConnection.OnConnectionRestored() endPoint=IPEndPoint(<NODE_C>)
This was logged by the following custom code where connection is ConnectionMultiplexer:
connection.ConnectionRestored += (sender, args) =>
{
    var endPoint = EndPointToString(args.EndPoint);
    _log.Info($"RedisConnection.OnConnectionRestored() endPoint={endPoint}");
};
RESULT
Notice that NODE_C has turned into a replica, but NODE_D is also still listed as a replica. At this stage all the connections are ConnectedEstablished and the Redis Cluster has returned to a fully operational state with 3 primary and 3 replica nodes, but StackExchange.Redis incorrectly thinks there are 2 primary and 4 replica nodes.
2026-01-27 12:45:30,239 [6616/64] INFO - Endpoint Summary:
2026-01-27 12:45:30,239 [6616/64] INFO - Server summary: <NODE_A>: Cluster v7.4.6, replica; keep-alive: 00:01:00; int: ConnectedEstablished;
2026-01-27 12:45:30,239 [6616/64] INFO - Server summary: <NODE_B>: Cluster v7.4.6, replica; keep-alive: 00:01:00; int: ConnectedEstablished;
2026-01-27 12:45:30,239 [6616/64] INFO - Server summary: <NODE_C>: Cluster v7.4.6, replica; keep-alive: 00:01:00; int: ConnectedEstablished;
2026-01-27 12:45:30,239 [6616/64] INFO - Server summary: <NODE_D>: Cluster v7.4.6, replica; keep-alive: 00:01:00; int: ConnectedEstablished;
2026-01-27 12:45:30,239 [6616/64] INFO - Server summary: <NODE_E>: Cluster v7.4.6, primary; keep-alive: 00:01:00; int: ConnectedEstablished;
2026-01-27 12:45:30,239 [6616/64] INFO - Server summary: <NODE_F>: Cluster v7.4.6, primary; keep-alive: 00:01:00; int: ConnectedEstablished;
StackExchange.Redis does not solve the situation and it remains for an indefinite amount of time despite StackExchange.Redis attempting to reconfigure multiple times. Notice the situation of 2 primary and 4 replica nodes remains after over 6 minutes have passed:
2026-01-27 12:51:59,129 [6616/79] INFO - Endpoint Summary:
2026-01-27 12:51:59,129 [6616/79] INFO - Server summary: <NODE_A>: Cluster v7.4.6, replica; keep-alive: 00:01:00; int: ConnectedEstablished;
2026-01-27 12:51:59,129 [6616/79] INFO - Server summary: <NODE_B>: Cluster v7.4.6, replica; keep-alive: 00:01:00; int: ConnectedEstablished;
2026-01-27 12:51:59,129 [6616/79] INFO - Server summary: <NODE_C>: Cluster v7.4.6, replica; keep-alive: 00:01:00; int: ConnectedEstablished;
2026-01-27 12:51:59,129 [6616/79] INFO - Server summary: <NODE_D>: Cluster v7.4.6, replica; keep-alive: 00:01:00; int: ConnectedEstablished;
2026-01-27 12:51:59,129 [6616/79] INFO - Server summary: <NODE_E>: Cluster v7.4.6, primary; keep-alive: 00:01:00; int: ConnectedEstablished;
2026-01-27 12:51:59,129 [6616/79] INFO - Server summary: <NODE_F>: Cluster v7.4.6, primary; keep-alive: 00:01:00; int: ConnectedEstablished;
Correct primary/replica status only after restarting the application
Only after the connection is fully reset by an application pool recycle does StackExchange.Redis learn the correct primary/replica status, where NODE_C has become a replica and NODE_D has become a primary:
2026-01-27 12:54:12,764 [9928/9] INFO - Endpoint Summary:
2026-01-27 12:54:12,764 [9928/9] INFO - Server summary: <NODE_A>: Cluster v7.4.6, replica; keep-alive: 00:01:00; int: ConnectedEstablished;
2026-01-27 12:54:12,764 [9928/9] INFO - Server summary: <NODE_B>: Cluster v7.4.6, replica; keep-alive: 00:01:00; int: ConnectedEstablished;
2026-01-27 12:54:12,764 [9928/9] INFO - Server summary: <NODE_C>: Cluster v7.4.6, replica; keep-alive: 00:01:00; int: ConnectedEstablished;
2026-01-27 12:54:12,764 [9928/9] INFO - Server summary: <NODE_D>: Cluster v7.4.6, primary; keep-alive: 00:01:00; int: ConnectedEstablished;
2026-01-27 12:54:12,764 [9928/9] INFO - Server summary: <NODE_E>: Cluster v7.4.6, primary; keep-alive: 00:01:00; int: ConnectedEstablished;
2026-01-27 12:54:12,764 [9928/9] INFO - Server summary: <NODE_F>: Cluster v7.4.6, primary; keep-alive: 00:01:00; int: ConnectedEstablished;
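Since a fresh ConnectionMultiplexer sees the correct roles, one possible stop-gap (not a fix for the underlying problem) is to replace the multiplexer in-process when the replica error is observed, instead of recycling the whole application. The following is a rough sketch under stated assumptions: the class ReconnectingRedis and its members are hypothetical, the error is detected by matching the exception message shown in the stack trace above, and a minimum interval is used so that a burst of failures triggers only one reconnect.

```csharp
using System;
using StackExchange.Redis;

// Hypothetical wrapper; a workaround sketch, not a recommended design.
public sealed class ReconnectingRedis : IDisposable
{
    private readonly string _configuration;
    private readonly TimeSpan _minInterval;
    private readonly object _gate = new object();
    private volatile ConnectionMultiplexer _connection;
    private DateTime _lastReconnectUtc = DateTime.MinValue;

    public ReconnectingRedis(string configuration, TimeSpan minInterval)
    {
        _configuration = configuration;
        _minInterval = minInterval;
        _connection = ConnectionMultiplexer.Connect(configuration);
    }

    // Runs a command; if it fails with the replica error, swaps in a fresh
    // multiplexer and retries once.
    public T Execute<T>(Func<IDatabase, T> command)
    {
        try
        {
            return command(_connection.GetDatabase());
        }
        catch (Exception ex) when (IsReplicaWriteError(ex))
        {
            Reconnect();
            return command(_connection.GetDatabase());
        }
    }

    // Assumption: the message text matches the one in the stack trace above.
    private static bool IsReplicaWriteError(Exception ex)
    {
        for (var e = ex; e != null; e = e.InnerException)
        {
            if (e is RedisCommandException &&
                e.Message.Contains("cannot be issued to a replica"))
            {
                return true;
            }
        }
        return false;
    }

    private void Reconnect()
    {
        ConnectionMultiplexer old;
        lock (_gate)
        {
            var now = DateTime.UtcNow;
            if (now - _lastReconnectUtc < _minInterval)
            {
                return; // another caller reconnected recently
            }
            old = _connection;
            _connection = ConnectionMultiplexer.Connect(_configuration);
            _lastReconnectUtc = now;
        }
        old.Dispose(); // commands still in flight on the old connection will fail
    }

    public void Dispose() => _connection.Dispose();
}
```

A call such as redis.Execute(db => db.StringSet(key, value, TimeSpan.FromMinutes(5))) would then go through the wrapper. This keeps the cost of reconnecting, which is why it is only a mitigation until the stale role state is addressed in the library.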