@emmiegit emmiegit commented Feb 24, 2025

Apologies in advance for the size of this PR. Sorry it's been two months!

As seems to be common in this project, attempting to introduce a significant piece of infrastructure / a new abstraction (in this case wjfiles) revealed several core issues which I've had to work through. Ultimately, this PR changed directions several times as new issues were revealed, but in the end I think the infra is in a better state now for having found these problems.

Major Changes

  • Adds WWS. "Wilson's Web Server" is a Rust webserver which has the overall purpose of serving wjfiles. It takes in requests, looks up the relevant data from its series of handlers (e.g. getting file data from S3), and returns it. This server will not be exposed publicly, but instead is fronted by:
  • Adds Caddy. Caddy is a web server similar to Traefik, which is highly configurable and can handle our routing, certificate renewal, TLS termination, and reverse proxying. It will sit in front of framerail and wws and send requests to the appropriate server. Our host determination logic is now implemented through dynamic generation of the Caddyfile configuration file:
  • Adds CaddyService to DEEPWELL. This new service is responsible for generating the Caddyfile config that is fed to caddy. Instead of doing a host lookup every time we get a new request, we just generate the configuration with all the relevant matching (e.g. this custom domain refers to this site, this site is preferred and should be redirected to, etc.). The generated configuration also adds the internal HTTP headers X-Wikijump-Site-Id and X-Wikijump-Site-Slug, which terminate the host logic and allow wws and framerail to easily look up site information using the site ID. There are tests to ensure that the config generation is as expected for various options.
  • Adds corresponding infrastructure changes. I have added new GitHub workflows to test the new images (note that the diff here is unhelpful and thinks some of these are moves between incorrect source/dest pairs), and local Docker files to build images, particularly for local deploys. When you start Wikijump locally, everything should automatically be set as you expect.
  • Renames infra images. Previously, we had an asymmetry between the names infrastructure code used for services and their normal name. This eliminates this disparity and ensures they are the same. The differences are:
    • api → deepwell
    • web → framerail
    • Adds wws
    • Adds caddy
  • Adds "special errors" as a concept. These are error messages that are in fluent (and so are localized), which can be generated by calling DEEPWELL. This way we can avoid unlocalized text as much as possible, even in cases where we aren't showing pretty error messages. Which brings us to:
  • Adds "fallback errors" as a concept to wws. In cases where wws is unable to show a proper error message or reveal proper information, we show a "fallback error". An example for this case is "wws is trying to show an error message but can't even connect to DEEPWELL to generate an error message", in which case we show that it's an error and print a unique code, like ERROR XF-1003. This way developers can be alerted to the matter and shown an indication of where the problem may be despite the other limitations that clearly existed. (If you're curious, ERROR is obvious and hopefully intelligible English for most people given our Anglocentric modern web, and XF is a combination of X, which seems like a thing you'd see in an error code, and F for fallback. The number should increase monotonically so we can uniquely identify issues.)
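As a rough illustration of the fallback error format described in the last bullet, here is a minimal sketch. The enum, its variants, and the specific numbers are hypothetical and are not taken from the wws source; only the `ERROR XF-<number>` shape comes from the description above.

```rust
use std::fmt;

/// Hypothetical fallback error kinds. The variant names and numbers are
/// illustrative assumptions, not the codes actually used by wws.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum FallbackError {
    /// DEEPWELL could not be reached to render a localized error message.
    DeepwellUnreachable,
    /// A required internal header was missing from the request.
    MissingSiteHeader,
}

impl FallbackError {
    /// Numeric part of the code. Each new variant takes the next unused
    /// number so that every code identifies exactly one failure site.
    pub fn number(self) -> u16 {
        match self {
            FallbackError::DeepwellUnreachable => 1001,
            FallbackError::MissingSiteHeader => 1002,
        }
    }
}

impl fmt::Display for FallbackError {
    /// Renders the unlocalized text shown to the user, e.g. "ERROR XF-1001".
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "ERROR XF-{}", self.number())
    }
}
```

Keeping the number in one `match` makes it easy to see which codes are taken when adding a new one.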

All the other changes should fall into one of the above categories, or be one of many miscellaneous fixes or updates.
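To make the CaddyService idea more concrete, the following is a simplified sketch of generating Caddyfile site blocks from site host settings. It is not the actual DEEPWELL implementation: the `SiteHosts` struct, the `generate_caddyfile` function, and the `main_domain` and `upstream` parameters are assumptions for illustration, and the sketch only routes to a single upstream rather than choosing between framerail and wws. The `reverse_proxy`, `header_up`, and `redir` directives are standard Caddyfile syntax.

```rust
/// Hypothetical, simplified view of one site's host settings.
pub struct SiteHosts {
    pub site_id: i64,
    pub slug: String,
    /// Custom domain, if any. In this sketch it is always treated as the
    /// preferred host when present (an assumption, not the real rule).
    pub custom_domain: Option<String>,
}

/// Renders one or two Caddyfile site blocks per site.
/// `main_domain` and `upstream` are assumed inputs, e.g. "wikijump.test"
/// and "framerail:3000".
pub fn generate_caddyfile(sites: &[SiteHosts], main_domain: &str, upstream: &str) -> String {
    let mut out = String::new();
    for site in sites {
        let canonical = format!("{}.{}", site.slug, main_domain);
        let serving_host = match &site.custom_domain {
            Some(domain) => {
                // The canonical host redirects to the preferred custom domain.
                out.push_str(&format!(
                    "{} {{\n\tredir https://{}{{uri}}\n}}\n\n",
                    canonical, domain
                ));
                domain.clone()
            }
            None => canonical,
        };
        // The serving host proxies upstream with the site already resolved,
        // so the backend never has to inspect the Host header itself.
        out.push_str(&format!(
            "{} {{\n\treverse_proxy {} {{\n\t\theader_up X-Wikijump-Site-Id {}\n\t\theader_up X-Wikijump-Site-Slug {}\n\t}}\n}}\n\n",
            serving_host, upstream, site.site_id, site.slug
        ));
    }
    out
}
```

Because the output is plain text, a unit test can compare it against an expected Caddyfile directly, which is the kind of check the CaddyService tests are described as doing.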

Notes and Caveats

  • wws is not finished, but I figured future PRs would be easier to review if they were not so bulky. I have the "happy path" for file requests finished, in addition to simpler handlers, but code and html requests are entirely stubbed and will be done separately in WJ-1224.
  • The DigitalOcean App Platform configuration was not fully updated, since adding yet another droplet is going to be annoyingly expensive, and I am looking at alternatives (see WJ-1300). I will just deploy this as-is, and in follow-up PRs address the dev environment there. This is also why I have disabled all deploy_on_push settings.
  • I generate a Caddyfile instead of the underlying JSON for two reasons. First, Caddyfiles are human-readable and much easier to inspect for correctness visually. (This is why I have unit tests for CaddyService generation.) Second, I am not nearly as familiar with how to properly generate those JSON rules, and I don't see any issue with caddy generating it for us.
  • The Caddy container will request a new Caddyfile configuration from DEEPWELL on boot, and a cronjob exists inside to auto-update it every hour. This is my solution to having site host changes (like adding a custom domain) become present in the infrastructure, though it has two drawbacks: it adds a time delay (since I'm not sure how to securely make the updates push-only), and a bug in CaddyService means all site updates will be frozen (due to Caddy's auto-rollback mechanism) until it's fixed. This regular config update should have no effect on the infrastructure: according to Caddy's docs, the reload is "lightweight, efficient, and incur[s] zero downtime", and the change isn't even applied anyways if the configuration is the same (which it should always be unless somebody adds a new site or changes one of its domain settings).
  • The "ftml" section in the Dependabot configuration was a leftover I didn't clean up when I split ftml back out into its own repo. Since I needed to add settings for WWS anyways, the diff saw it as just a rename.
  • The /-/health-check route pings DEEPWELL, which in turn pings Postgres and Redis. It's a more "proper" health check. There is also /-/health-check/caddy, which simply checks that Caddy is able to respond. The latter is for debugging issues with routing, so you can determine that things are working as far as Caddy, but that reverse proxying beyond it is broken.
  • The X-Wikijump-Target-Server header can be either main or files, and is used in cases where a request is hitting the main server (e.g. wikijump.com), but wws is handling the request. An example would be robots.txt, which could differ between foo.wikijump.com and foo.wjfiles.com. How exactly sites can customize the contents of robots.txt has yet to be planned, but the groundwork for it is laid here, since we can add appropriate code to wws for that.
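For the X-Wikijump-Target-Server header in the last note, a handler might parse the value into an enum before branching on it. This is a minimal sketch only: the `TargetServer` type and its error message are hypothetical, and the only facts taken from the description are the two allowed values, main and files.

```rust
use std::str::FromStr;

/// Hypothetical representation of the X-Wikijump-Target-Server header value.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum TargetServer {
    /// The request arrived via the main domain (e.g. wikijump.com).
    Main,
    /// The request arrived via the files domain (e.g. wjfiles.com).
    Files,
}

impl FromStr for TargetServer {
    type Err = String;

    /// Accepts exactly "main" or "files". Anything else is rejected
    /// rather than silently defaulting, which is a choice made for this sketch.
    fn from_str(value: &str) -> Result<Self, Self::Err> {
        match value {
            "main" => Ok(TargetServer::Main),
            "files" => Ok(TargetServer::Files),
            other => Err(format!("unknown target server: {}", other)),
        }
    }
}
```

A robots.txt handler could then match on the parsed value to decide which variant of the file to serve.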

We're not handling missing sites at this level.
Host information is already passed through.
@emmiegit emmiegit force-pushed the WJ-1293-wjfiles branch 2 times, most recently from c331f7d to 7e5a81d Compare April 7, 2025 16:38
emmiegit added 8 commits April 7, 2025 12:54
Since we moved the admin panel route, we can also resolve the issue with
_admin being "not a page" for the purposes of stuff like orange links
by actually just making a page-based redirect!

See WJ-331.
This was accidentally added here.
@emmiegit emmiegit marked this pull request as ready for review April 7, 2025 17:26
@emmiegit emmiegit requested a review from Yossipossi1 April 7, 2025 17:27
@emmiegit emmiegit merged commit 02a90a2 into develop Apr 7, 2025
23 checks passed
@emmiegit emmiegit deleted the WJ-1293-wjfiles branch April 7, 2025 20:02