@geekosaur
Collaborator

@geekosaur geekosaur commented Feb 1, 2026

Use cache keys that capture the OS version in sufficient detail to ensure that e.g. OS libraries don't mismatch between caches. This will prevent bad caches from being used when e.g. ubuntu-latest is updated.


Template B: This PR does not modify behaviour or interface

E.g. the PR only touches documentation or tests, does refactorings, etc.

Include the following checklist in your PR:

  • Patches conform to the coding conventions.
  • Is this a PR that fixes CI? If so, it will need to be backported to older cabal release branches (ask maintainers for directions).

@geekosaur geekosaur force-pushed the safe-cache-keys branch 2 times, most recently from fc067ea to bd768c5 February 1, 2026 04:35
@geekosaur
Collaborator Author

Yay, it's doing the right thing. (The current run is of course reporting cache misses, because "ubuntu24" doesn't match "Linux"… but that's exactly what I want here.)
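
For anyone following along, here is roughly why the keys stop matching, as a minimal Python sketch rather than the actual workflow expressions. The key layout and the `ghc` / `plan_hash` components are hypothetical; only the "Linux" and "ubuntu24" values come from this run.

```python
def cache_key(os_part: str, ghc: str, plan_hash: str) -> str:
    """Join key components with dashes (hypothetical layout, for illustration only)."""
    return f"{os_part}-{ghc}-{plan_hash}"


# Old scheme: the runner OS name is just "Linux" for every Ubuntu image.
old_key = cache_key("Linux", "9.10.2", "abc123")
# New scheme: the image OS distinguishes Ubuntu major versions.
new_key = cache_key("ubuntu24", "9.10.2", "abc123")

print(old_key)             # Linux-9.10.2-abc123
print(new_key)             # ubuntu24-9.10.2-abc123
print(old_key == new_key)  # False, so existing caches are not picked up
```

Every cache saved under the old scheme therefore misses once, after which the caches are repopulated under the new keys.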

@geekosaur geekosaur marked this pull request as ready for review February 1, 2026 04:40
@geekosaur geekosaur changed the title from "TESTING: add a debug step to see if image version available" to "Replace OS name with image version in cache keys" Feb 1, 2026
@geekosaur
Collaborator Author

geekosaur commented Feb 1, 2026

This is not yet complete, though, I just realized: bootstrapping and quick jobs also use cache. The latter will be "hardest" to fix, because it uses the cache more times (7 or 8, IIRC, from when I grepped for actions/cache) than bootstrap (1) or validate (3); that said, it's all read only.


ETA: nope, the cache is written as well. It's all Linux, but that means replacing a literal "linux-" prefix with the same imageOS hack, since these are all stores with OS libraries embedded.

@geekosaur geekosaur marked this pull request as draft February 1, 2026 04:43
@geekosaur
Collaborator Author

geekosaur commented Feb 1, 2026

Hm. Bootstrap caches have a datestamp in them. Should that be updated, and if so, when? (Whenever the JSON build plans are regenerated seems likely to me, in which case it probably belongs on the release wiki page that describes regenerating the plans.)
/cc: @Mikolaj @ulysses4ever

@geekosaur
Collaborator Author

Bootstrap and quick-jobs now also updated; let's see if I did it right….

@geekosaur
Collaborator Author

Missed some fallback keys in quick-jobs, which will be fixed on the next push; otherwise this is ready to go. (But I'm not requesting reviews until I do that push.)

This avoids OS updates in runner images giving us caches referencing
possibly nonexistent OS shared objects.
@ulysses4ever
Collaborator

@andreasabel I learned about caching Haskell things in GHA from your work, so I'm taking the liberty of pinging you here for your opinion: do you think this change makes sense?

@geekosaur geekosaur linked an issue Feb 1, 2026 that may be closed by this pull request
@geekosaur
Collaborator Author

The now-linked issue explains why I did this, FWIW. /cc: @andreasabel

@geekosaur
Collaborator Author

geekosaur commented Feb 2, 2026

Uh, let me correct that: the linked issue is mostly notes; the issue it links to, #11296, is where the real problem arose.

To the best of my understanding, GitHub was rolling out 24.04 one runner at a time, and when a build was cached on a newer runner, any older runner trying to use that cache threw errors about cached references to newer GLIBC and/or GLIBCXX symbols. This PR ensures that builds on different OS versions are cached separately.
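
To make the failure mode concrete, here is a minimal Python sketch of the lookup, under the assumption that a cache is restored on an exact key match or, failing that, on the first fallback prefix that matches. This is a simplified model, not the real cache implementation, and the key layout ("plan2" and "plan3" standing in for build-plan hashes) is made up.

```python
from typing import Optional


def restore(saved: list[str], key: str, restore_keys: list[str]) -> Optional[str]:
    """Simplified cache lookup: exact key first, then each fallback prefix
    in order, preferring the most recently saved matching entry."""
    if key in saved:
        return key
    for prefix in restore_keys:
        matches = [k for k in saved if k.startswith(prefix)]
        if matches:
            return matches[-1]  # `saved` is ordered oldest to newest
    return None


# A cache saved by a runner that had already been upgraded to 24.04.
saved_old_scheme = ["Linux-ghc-plan2"]
saved_new_scheme = ["ubuntu24-ghc-plan2"]

# A runner still on 22.04 then asks for a slightly different plan.
hit = restore(saved_old_scheme, "Linux-ghc-plan3", ["Linux-ghc-"])
miss = restore(saved_new_scheme, "ubuntu22-ghc-plan3", ["ubuntu22-ghc-"])

print(hit)   # Linux-ghc-plan2 (built against 24.04 libraries: the bad case)
print(miss)  # None (the older runner falls back to a clean build)
```

With the OS name alone in the key, both runners share one key namespace; with the image OS in the key, each Ubuntu major version gets its own.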

(IIRC we also saw this in a different form: validate-old-ghcs was failing because libtinfo.so.5 couldn't be installed on 24.04. But that didn't poison caches, at least.)

I will also add that we prefer to use -latest so we catch issues with new OS versions as quickly as possible. This is also why we don't pin dependencies in validate jobs.

Member

@andreasabel andreasabel left a comment

This is making cache-keys more accurate, and you seem to need this accuracy (for reusing binaries from the cache).
I am missing the context to know the exact symptom you are fighting.
Just throwing in that the images get updated, and if your setup is very dependent on the versions of everything on the image, you might also need the image version in the key: https://github.com/haskell/cabal/actions/runs/21572227509/job/62153142155#step:1:17
(But before considering this there should be a symptom, of course.)

@geekosaur
Collaborator Author

I have access to the image version (it's $ImageVersion in the same way $ImageOS is available), but I think if we ever need to dice things that finely we're kinda doomed. It's the GLIBC_* and GLIBCXX_* symbols tied to the OS major version that bite us, and the OS major version is sufficient to tell those apart. (Since GitHub only uses LTS versions, we don't have to worry about e.g. 24.10 vs. 24.04.)
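
As a rough sketch of the two granularities (Python for illustration only; the workflows do this in their own steps): `os_key_part` is a hypothetical helper, the "unknown" fallback is an assumption, and the `ImageVersion` value below is made up rather than a real image version.

```python
import os
from typing import Mapping


def os_key_part(env: Mapping[str, str] = os.environ,
                include_image_version: bool = False) -> str:
    """Build the OS portion of a cache key from the runner's environment.

    By default only ImageOS (the OS major version) is used; optionally the
    much finer-grained ImageVersion is appended as well.
    """
    image_os = env.get("ImageOS", "unknown")
    if include_image_version:
        return f"{image_os}-{env.get('ImageVersion', 'unknown')}"
    return image_os


# Made-up environment standing in for a hosted runner.
fake_env = {"ImageOS": "ubuntu24", "ImageVersion": "20260101.1"}

print(os_key_part(fake_env))                              # ubuntu24
print(os_key_part(fake_env, include_image_version=True))  # ubuntu24-20260101.1
```

Adding the image version would invalidate every cache on each image refresh, which is why the coarser key is the one used here.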

@geekosaur
Collaborator Author

I've also realized another case where this can happen: switching between release images. For example, if you for some reason need to downgrade from ubuntu-24.04 to ubuntu-22.04, the caches from the former would be used for builds on the latter as things work now.

I don't know how necessary this is for macOS or Windows (Microsoft, at least, is fairly careful about backward compatibility, but forward compatibility requires an oracle regardless of OS), but this will make sure it can't come up with those either.

Development

Successfully merging this pull request may close these issues.

Replace runner.os with matrix.os in cache keys
