Commit e7efbee

rabbit: raise failure-timeout
The failure timeout of 30s is far too low. Essentially, it means that a failed node is considered ready again after 30s, even though any start or stop operation takes considerably longer than 30s. We should only expire failures after around 30 minutes to prevent flapping services.
1 parent f933d9e commit e7efbee
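
The reasoning in the commit message can be illustrated with a minimal sketch in plain Ruby. It uses a simplified model, not Pacemaker's actual implementation: it assumes the fail count is reset once `failure-timeout` has passed without a new failure. The helper `fail_count_at` and the 120s spacing between failed restarts are hypothetical; the threshold of 10 matches the `migration-threshold` in the recipe.

```ruby
# Simplified model (an assumption, not Pacemaker's real algorithm):
# the fail count resets once `failure_timeout` seconds pass without a new failure.
def fail_count_at(failure_times, now, failure_timeout)
  count = 0
  last = nil
  failure_times.sort.each do |t|
    break if t > now
    # A long enough quiet period before this failure expires the earlier ones.
    count = 0 if last && t - last >= failure_timeout
    count += 1
    last = t
  end
  # Expire everything if the most recent failure is old enough.
  count = 0 if last && now - last >= failure_timeout
  count
end

# Hypothetical scenario: ten failed restarts, one every 120 seconds.
failures = (0...10).map { |i| i * 120 }
now = failures.last
migration_threshold = 10

[30, 1800].each do |failure_timeout|
  count = fail_count_at(failures, now, failure_timeout)
  reached = count >= migration_threshold
  puts "failure-timeout=#{failure_timeout}s fail count=#{count} threshold reached=#{reached}"
end
```

Under these assumptions, the 30s setting forgets every failure before the next one arrives, so the fail count stays at 1 and the threshold is never reached. With 1800s the count accumulates to 10.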

2 files changed: +7 -6 lines changed


chef/cookbooks/rabbitmq/attributes/default.rb

Lines changed: 6 additions & 5 deletions
@@ -42,13 +42,14 @@
 default[:rabbitmq][:ha][:op][:start][:timeout] = "300s"
 default[:rabbitmq][:ha][:op][:promote][:timeout] = "180s"
 default[:rabbitmq][:ha][:op][:monitor][:interval] = "10s"
-default[:rabbitmq][:ha][:clustered_op][:start][:timeout] = "360s"
-default[:rabbitmq][:ha][:clustered_op][:stop][:timeout] = "120s"
-default[:rabbitmq][:ha][:clustered_op][:promote][:timeout] = "120s"
-default[:rabbitmq][:ha][:clustered_op][:demote][:timeout] = "120s"
+default[:rabbitmq][:ha][:clustered_op][:start][:timeout] = "540s"
+default[:rabbitmq][:ha][:clustered_op][:stop][:timeout] = "180s"
+default[:rabbitmq][:ha][:clustered_op][:promote][:timeout] = "180s"
+default[:rabbitmq][:ha][:clustered_op][:demote][:timeout] = "180s"
 default[:rabbitmq][:ha][:clustered_op][:notify][:timeout] = "180s"
 default[:rabbitmq][:ha][:clustered_op][:monitor] = [
-{ interval: "30s" }, { interval: "27s", role: "Master" }
+{ interval: "60s", timeout: "90s" },
+{ interval: "57s", timeout: "90s", role: "Master" }
 ]
 
 default[:rabbitmq][:hipe_compile] = false
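
To make the shape of the new `clustered_op` defaults easier to see, here is a small sketch in plain Ruby that flattens them into one line per operation. It uses an ordinary Hash in place of Chef node attributes. The helpers `flatten_ops` and `render_op` are hypothetical, and the output format is only loosely modeled on crm shell `op` syntax. How the cookbook's `op` property actually consumes this structure is not shown in this commit.

```ruby
# New default values copied from the diff above, as a plain Hash (not Chef attributes).
clustered_op = {
  start:   { timeout: "540s" },
  stop:    { timeout: "180s" },
  promote: { timeout: "180s" },
  demote:  { timeout: "180s" },
  notify:  { timeout: "180s" },
  monitor: [
    { interval: "60s", timeout: "90s" },
    { interval: "57s", timeout: "90s", role: "Master" }
  ]
}

# Hypothetical helper: return one [name, params] pair per operation.
# An Array value (as used for monitor) yields one pair per element.
def flatten_ops(ops)
  ops.flat_map do |name, params|
    list = params.is_a?(Array) ? params : [params]
    list.map { |entry| [name, entry] }
  end
end

# Hypothetical helper: render a single operation as "op <name> key=value ...".
def render_op(name, params)
  args = params.map { |key, value| "#{key}=#{value}" }.join(" ")
  "op #{name} #{args}"
end

flatten_ops(clustered_op).each { |name, params| puts render_op(name, params) }
```

The array form of `monitor` expands into two separate monitor operations, one of which applies only to the `Master` role.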

chef/cookbooks/rabbitmq/recipes/ha_cluster.rb

Lines changed: 1 addition & 1 deletion
@@ -122,7 +122,7 @@
 op node[:rabbitmq][:ha][:clustered_op]
 meta ({
 "migration-threshold" => "10",
-"failure-timeout" => "30s",
+"failure-timeout" => "1800s",
 "resource-stickiness" => "100"
 })
 action :update
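
As a rough sanity check on the new value, the following plain Ruby sketch compares a `failure-timeout` against the slowest operation timeout. The rule it applies, that failures should not expire faster than a single operation can run, is a rule of thumb inferred from the commit message. It is not something Pacemaker or this cookbook enforces. The helpers `seconds`, `slowest_op`, and `failure_timeout_ok?` are hypothetical, and only the `"<n>s"` duration form used in this diff is parsed.

```ruby
# Parse a duration in whole seconds, e.g. "540s" -> 540.
# Only the "<n>s" form that appears in this diff is supported (assumption).
def seconds(value)
  match = /\A(\d+)s\z/.match(value)
  raise ArgumentError, "unsupported duration: #{value.inspect}" unless match
  match[1].to_i
end

# Return [name, seconds] for the operation with the longest timeout.
def slowest_op(op_timeouts)
  op_timeouts.map { |name, timeout| [name, seconds(timeout)] }
             .max_by { |_name, secs| secs }
end

# Hypothetical rule of thumb: failures should outlive the slowest operation.
def failure_timeout_ok?(failure_timeout, op_timeouts)
  seconds(failure_timeout) > slowest_op(op_timeouts).last
end

# New clustered_op timeouts from attributes/default.rb in this commit.
op_timeouts = {
  start: "540s", stop: "180s", promote: "180s", demote: "180s", notify: "180s"
}

puts failure_timeout_ok?("30s", op_timeouts)   # old value, prints false
puts failure_timeout_ok?("1800s", op_timeouts) # new value, prints true
```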
