| 1 | +--- |
| 2 | +title: Unplanned Downtime, November 2025 |
| 3 | +extensions: |
| 4 | + footnotes: true |
| 5 | +--- |
| 6 | + |
| 7 | +# Unplanned Downtime, November 2025 |
| 8 | + |
| 9 | +Last week, I had some unexpected downtime for my website. The original cause was |
| 10 | +a power issue at the physical server, and out of my control, but an oversight of |
| 11 | +mine made the situation worse. |
| 12 | + |
| 13 | +## The Outage |
| 14 | + |
| 15 | +In the evening (Mountain Time[^1]) of Monday, November 17th, my hosting provider, |
| 16 | +Brownrice, suffered a temporary outage due to power supply failures on the
| 17 | +server that hosts my VPS. Brownrice reports that the outage lasted from 17:30
| 18 | +to 18:00, but I suspect that window covers all of their affected customers. Based
| 19 | +on my server logs, my own server was back up at 17:47. |
| 20 | + |
| 21 | +Once the server was online, the tool I use for managing web requests, |
| 22 | +[Traefik][traefik], restarted automatically based on the |
| 23 | +[configuration for the Docker containers][traefik-restart]. The client websites
| 24 | +that I host, which run on WordPress, also restarted automatically thanks to the |
| 25 | +[same configuration][wordpress-restart]. However, I forgot to apply that |
| 26 | +configuration to my own website! |
| 27 | + |
| 28 | +From 17:51, when the first request was made to my website and failed, until |
| 29 | +18:21 when I manually restarted the web containers for my staging and production |
| 30 | +sites, the downtime was entirely avoidable. |
| 31 | + |
| 32 | +## Mitigation |
| 33 | + |
| 34 | +I got lucky that I noticed the downtime when I did, since I have no automated |
| 35 | +monitoring set up. The first thing I did was to manually restart the Docker
| 36 | +containers for my website. At 18:21, the downtime was over. |
| 37 | + |
| 38 | +Later that same day, I [merged a change][patch-56] so that, going forward,
| 39 | +the Docker containers for my website restart if they stop unexpectedly. If
| 40 | +the entire server goes down and restarts, that change should bring my website
| 41 | +back up as soon as possible. |
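
For readers unfamiliar with restart policies, here is a minimal sketch of what one looks like in a `docker-compose.yml` file. The service name `web` and the image name are hypothetical placeholders, and `unless-stopped` is just one of Docker's restart policies (the others are `no`, `always`, and `on-failure`); the linked commit has the actual change.

```yaml
# Hypothetical docker-compose.yml fragment; the service and image names
# are placeholders, not taken from the real configuration.
services:
  web:
    image: example/website:latest
    # Restart the container if it exits or when the Docker daemon restarts,
    # unless the container was stopped manually.
    restart: unless-stopped
```

A policy like this only brings containers back after a reboot if the Docker daemon itself is configured to start on boot.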
| 42 | + |
| 43 | +## Next Steps |
| 44 | + |
| 45 | +I've started looking into monitoring options, so that I learn about outages
| 46 | +earlier. While I would generally prefer to self-host the tool, putting the
| 47 | +monitoring on the same server as the website means that the server going down
| 48 | +would also take the monitoring down. Self-hosting there isn't an option. |
| 49 | + |
| 50 | +There are a variety of companies that offer monitoring services, but I don't |
| 51 | +need a whole bunch of fancy features, and would rather not need to pay for the |
| 52 | +monitoring - most offers I saw cost more than what I pay to rent the server! |
| 53 | + |
| 54 | +I'll try to find a free tool for monitoring my website status, but if that |
| 55 | +doesn't work I guess I can just leave it unmonitored. This is a personal site, |
| 56 | +not a client site, and uptime is best-effort. |
| 57 | + |
| 58 | +[^1]: All times in this blog post are in Mountain Time; the hosting company is |
| 59 | +based in New Mexico. |
| 60 | + |
| 61 | +[traefik]: https://traefik.io/ |
| 62 | +[traefik-restart]: https://github.com/DanielEScherzer/website-traefik/blob/ea50b0520edc6d0aa8ab33442945ab8e2d88b408/docker-compose.yml#L4 |
| 63 | +[wordpress-restart]: https://github.com/DanielEScherzer/wordpress-site/blob/e27bdb23be09e06bd78de5ba87fc3834ab6ab8df/docker-compose.yml#L28 |
| 64 | +[patch-56]: https://github.com/DanielEScherzer/website-content/commit/bdc2324414ab47d2b6b1c0f6510c12d00fac8de5 |