Draft
I tested this out by first trying to abort and watching it fail because
there was no trust quorum configuration. Then I issued an LRTQ upgrade,
which failed because I hadn't restarted the sled-agents to pick up the
LRTQ shares. Then I aborted that configuration, which was stuck in prepare.
Lastly, I successfully issued a new LRTQ upgrade after restarting the
sled-agents and watched it commit.
Here are the external API calls:
```
➜ oxide.rs git:(main) ✗ target/debug/oxide --profile recovery api '/v1/system/hardware/racks/ea7f612b-38ad-43b9-973c-5ce63ef0ddf6/membership/abort' --method POST
error; status code: 404 Not Found
{
"error_code": "Not Found",
"message": "No trust quorum configuration exists for this rack",
"request_id": "819eb6ab-3f04-401c-af5f-663bb15fb029"
}
error
➜ oxide.rs git:(main) ✗
➜ oxide.rs git:(main) ✗ target/debug/oxide --profile recovery api '/v1/system/hardware/racks/ea7f612b-38ad-43b9-973c-5ce63ef0ddf6/membership/abort' --method POST
{
"members": [
{
"part_number": "913-0000019",
"serial_number": "20000000"
},
{
"part_number": "913-0000019",
"serial_number": "20000001"
},
{
"part_number": "913-0000019",
"serial_number": "20000003"
}
],
"rack_id": "ea7f612b-38ad-43b9-973c-5ce63ef0ddf6",
"state": "aborted",
"time_aborted": "2026-01-29T01:54:02.590683Z",
"time_committed": null,
"time_created": "2026-01-29T01:37:07.476451Z",
"unacknowledged_members": [
{
"part_number": "913-0000019",
"serial_number": "20000000"
},
{
"part_number": "913-0000019",
"serial_number": "20000001"
},
{
"part_number": "913-0000019",
"serial_number": "20000003"
}
],
"version": 2
}
```
Here are the omdb calls:
```
root@oxz_switch:~# omdb nexus trust-quorum lrtq-upgrade -w
note: Nexus URL not specified. Will pick one from DNS.
note: using DNS from system config (typically /etc/resolv.conf)
note: (if this is not right, use --dns-server to specify an alternate DNS server)
note: using Nexus URL http://[fd00:17:1:d01::6]:12232
Error: lrtq upgrade
Caused by:
Error Response: status: 500 Internal Server Error; headers: {"content-type": "application/json", "x-request-id": "8503cd68-7ff4-4bf1-b358-0e70279c6347", "content-length": "124", "date": "Thu, 29 Jan 2026 01:37:09 GMT"}; value: Error { error_code: Some("Internal"), message: "Internal Server Error", request_id: "8503cd68-7ff4-4bf1-b358-0e70279c6347" }
root@oxz_switch:~# omdb nexus trust-quorum get-config ea7f612b-38ad-43b9-973c-5ce63ef0ddf6 latest
note: Nexus URL not specified. Will pick one from DNS.
note: using DNS from system config (typically /etc/resolv.conf)
note: (if this is not right, use --dns-server to specify an alternate DNS server)
note: using Nexus URL http://[fd00:17:1:d01::6]:12232
TrustQuorumConfig {
rack_id: ea7f612b-38ad-43b9-973c-5ce63ef0ddf6 (rack),
epoch: Epoch(
2,
),
last_committed_epoch: None,
state: PreparingLrtqUpgrade,
threshold: Threshold(
2,
),
commit_crash_tolerance: 0,
coordinator: BaseboardId {
part_number: "913-0000019",
serial_number: "20000000",
},
encrypted_rack_secrets: None,
members: {
BaseboardId {
part_number: "913-0000019",
serial_number: "20000000",
}: TrustQuorumMemberData {
state: Unacked,
share_digest: None,
time_prepared: None,
time_committed: None,
},
BaseboardId {
part_number: "913-0000019",
serial_number: "20000001",
}: TrustQuorumMemberData {
state: Unacked,
share_digest: None,
time_prepared: None,
time_committed: None,
},
BaseboardId {
part_number: "913-0000019",
serial_number: "20000003",
}: TrustQuorumMemberData {
state: Unacked,
share_digest: None,
time_prepared: None,
time_committed: None,
},
},
time_created: 2026-01-29T01:37:07.476451Z,
time_committing: None,
time_committed: None,
time_aborted: None,
abort_reason: None,
}
root@oxz_switch:~# omdb nexus trust-quorum get-config ea7f612b-38ad-43b9-973c-5ce63ef0ddf6 latest
note: Nexus URL not specified. Will pick one from DNS.
note: using DNS from system config (typically /etc/resolv.conf)
note: (if this is not right, use --dns-server to specify an alternate DNS server)
note: using Nexus URL http://[fd00:17:1:d01::6]:12232
TrustQuorumConfig {
rack_id: ea7f612b-38ad-43b9-973c-5ce63ef0ddf6 (rack),
epoch: Epoch(
2,
),
last_committed_epoch: None,
state: Aborted,
threshold: Threshold(
2,
),
commit_crash_tolerance: 0,
coordinator: BaseboardId {
part_number: "913-0000019",
serial_number: "20000000",
},
encrypted_rack_secrets: None,
members: {
BaseboardId {
part_number: "913-0000019",
serial_number: "20000000",
}: TrustQuorumMemberData {
state: Unacked,
share_digest: None,
time_prepared: None,
time_committed: None,
},
BaseboardId {
part_number: "913-0000019",
serial_number: "20000001",
}: TrustQuorumMemberData {
state: Unacked,
share_digest: None,
time_prepared: None,
time_committed: None,
},
BaseboardId {
part_number: "913-0000019",
serial_number: "20000003",
}: TrustQuorumMemberData {
state: Unacked,
share_digest: None,
time_prepared: None,
time_committed: None,
},
},
time_created: 2026-01-29T01:37:07.476451Z,
time_committing: None,
time_committed: None,
time_aborted: Some(
2026-01-29T01:54:02.590683Z,
),
abort_reason: Some(
"Aborted via API request",
),
}
root@oxz_switch:~# omdb nexus trust-quorum lrtq-upgrade -w
note: Nexus URL not specified. Will pick one from DNS.
note: using DNS from system config (typically /etc/resolv.conf)
note: (if this is not right, use --dns-server to specify an alternate DNS server)
note: using Nexus URL http://[fd00:17:1:d01::6]:12232
Started LRTQ upgrade at epoch 3
root@oxz_switch:~# omdb nexus trust-quorum get-config ea7f612b-38ad-43b9-973c-5ce63ef0ddf6 latest
note: Nexus URL not specified. Will pick one from DNS.
note: using DNS from system config (typically /etc/resolv.conf)
note: (if this is not right, use --dns-server to specify an alternate DNS server)
note: using Nexus URL http://[fd00:17:1:d01::6]:12232
TrustQuorumConfig {
rack_id: ea7f612b-38ad-43b9-973c-5ce63ef0ddf6 (rack),
epoch: Epoch(
3,
),
last_committed_epoch: None,
state: PreparingLrtqUpgrade,
threshold: Threshold(
2,
),
commit_crash_tolerance: 0,
coordinator: BaseboardId {
part_number: "913-0000019",
serial_number: "20000000",
},
encrypted_rack_secrets: None,
members: {
BaseboardId {
part_number: "913-0000019",
serial_number: "20000000",
}: TrustQuorumMemberData {
state: Unacked,
share_digest: None,
time_prepared: None,
time_committed: None,
},
BaseboardId {
part_number: "913-0000019",
serial_number: "20000001",
}: TrustQuorumMemberData {
state: Unacked,
share_digest: None,
time_prepared: None,
time_committed: None,
},
BaseboardId {
part_number: "913-0000019",
serial_number: "20000003",
}: TrustQuorumMemberData {
state: Unacked,
share_digest: None,
time_prepared: None,
time_committed: None,
},
},
time_created: 2026-01-29T02:20:03.848507Z,
time_committing: None,
time_committed: None,
time_aborted: None,
abort_reason: None,
}
root@oxz_switch:~# omdb nexus trust-quorum get-config ea7f612b-38ad-43b9-973c-5ce63ef0ddf6 latest
note: Nexus URL not specified. Will pick one from DNS.
note: using DNS from system config (typically /etc/resolv.conf)
note: (if this is not right, use --dns-server to specify an alternate DNS server)
note: using Nexus URL http://[fd00:17:1:d01::6]:12232
TrustQuorumConfig {
rack_id: ea7f612b-38ad-43b9-973c-5ce63ef0ddf6 (rack),
epoch: Epoch(
3,
),
last_committed_epoch: None,
state: Committed,
threshold: Threshold(
2,
),
commit_crash_tolerance: 0,
coordinator: BaseboardId {
part_number: "913-0000019",
serial_number: "20000000",
},
encrypted_rack_secrets: Some(
EncryptedRackSecrets {
salt: Salt(
[
143,
198,
3,
63,
136,
48,
212,
180,
101,
106,
50,
2,
251,
84,
234,
25,
46,
39,
139,
46,
29,
99,
252,
166,
76,
146,
78,
238,
28,
146,
191,
126,
],
),
data: [
167,
223,
29,
18,
50,
230,
103,
71,
159,
77,
118,
39,
173,
97,
16,
92,
27,
237,
125,
173,
53,
51,
96,
242,
203,
70,
36,
188,
200,
59,
251,
53,
126,
48,
182,
141,
216,
162,
240,
5,
4,
255,
145,
106,
97,
62,
91,
161,
51,
110,
220,
16,
132,
29,
147,
60,
],
},
),
members: {
BaseboardId {
part_number: "913-0000019",
serial_number: "20000000",
}: TrustQuorumMemberData {
state: Committed,
share_digest: Some(
sha3 digest: 13c0a6113e55963ed35b275e49df4c3f0b3221143ea674bb1bd5188f4dac84,
),
time_prepared: Some(
2026-01-29T02:20:46.792674Z,
),
time_committed: Some(
2026-01-29T02:21:49.503179Z,
),
},
BaseboardId {
part_number: "913-0000019",
serial_number: "20000001",
}: TrustQuorumMemberData {
state: Committed,
share_digest: Some(
sha3 digest: 8557d74f678fa4e8278714d917f14befd88ed1411f27c57d641d4bf6c77f3b,
),
time_prepared: Some(
2026-01-29T02:20:47.236089Z,
),
time_committed: Some(
2026-01-29T02:21:49.503179Z,
),
},
BaseboardId {
part_number: "913-0000019",
serial_number: "20000003",
}: TrustQuorumMemberData {
state: Committed,
share_digest: Some(
sha3 digest: d61888c42a1b5e83adcb5ebe29d8c6c66dc586d451652e4e1a92befe41719cd,
),
time_prepared: Some(
2026-01-29T02:20:46.809779Z,
),
time_committed: Some(
2026-01-29T02:21:52.248351Z,
),
},
},
time_created: 2026-01-29T02:20:03.848507Z,
time_committing: Some(
2026-01-29T02:20:47.597276Z,
),
time_committed: Some(
2026-01-29T02:21:52.263198Z,
),
time_aborted: None,
abort_reason: None,
}
```
After chatting with @davepacheco, I changed the authz checks in the datastore to do lookups with Rack scope. This fixed the test bug, but it is only a shortcut: trust quorum should have its own authz object, and I'm going to open an issue for that. Additionally, for methods that already took an authorized connection, I removed the unnecessary authz checks and `opctx` parameter.
This commit adds a three-phase mechanism for sled expungement:

1. Remove the sled from the latest trust quorum configuration via omdb.
2. Poll until the trust quorum removal commits, then reboot the sled.
3. Issue the existing omdb expunge command, which changes the sled policy as before.

The first and second phases remove the need to physically remove the sled before expungement. They act as a software mechanism that keeps sled-agent from restarting on the sled and doing work when the sled should be treated as "absent". We've discussed this numerous times in the update huddle, and it is finally arriving! The third phase is what informs reconfigurator that the sled is gone. It is unchanged except for an extra sanity check that the last committed trust quorum configuration does not contain the sled being expunged.

The removed sled may be added back to this rack, or to another one, after being clean-slated. I tested this in a4x2 by deleting the files in the internal "cluster" and "config" directories and rebooting the removed sled, and it worked.

This PR is marked draft because it changes the current sled-expunge pathway to depend on real trust quorum. We cannot safely merge it until the key-rotation work from #9737 is merged. This also builds on #9741 and should merge after that PR.
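The phase-three sanity check amounts to a membership test against the last committed configuration. This is a minimal sketch, not the PR's code: `BaseboardId` here is a local stand-in mirroring the shape omdb prints above, `check_expunge_allowed` and `StillTrustQuorumMember` are hypothetical names, and it assumes the caller has already loaded the member set of the last committed trust quorum configuration.

```rust
use std::collections::BTreeSet;

/// Local stand-in for a sled's baseboard identity, mirroring the
/// `BaseboardId { part_number, serial_number }` shape printed by omdb.
#[derive(Debug, Clone, PartialEq, Eq, PartialOrd, Ord)]
pub struct BaseboardId {
    pub part_number: String,
    pub serial_number: String,
}

/// Hypothetical error: the sled is still a member of the last committed
/// trust quorum configuration, so the removal (phase one) has not committed.
#[derive(Debug, PartialEq, Eq)]
pub struct StillTrustQuorumMember(pub BaseboardId);

/// Hypothetical phase-three check, run before changing the sled policy.
pub fn check_expunge_allowed(
    last_committed_members: &BTreeSet<BaseboardId>,
    sled: &BaseboardId,
) -> Result<(), StillTrustQuorumMember> {
    if last_committed_members.contains(sled) {
        // Refuse: the removal was never issued or has not committed yet.
        Err(StillTrustQuorumMember(sled.clone()))
    } else {
        Ok(())
    }
}
```

Checking only the *last committed* configuration (rather than the latest one) matters: a removal that is still preparing could be aborted, leaving the sled a member.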
When Trust Quorum commits a new epoch, all U.2 crypt datasets must have their encryption keys rotated to use the new epoch's derived key. This change implements the key rotation flow triggered by epoch commits.

## Trust Quorum Integration
- Add watch channel to `NodeTaskHandle` for epoch change notifications
- Initialize channel with current committed epoch on startup
- Notify subscribers via `send_if_modified()` when epoch changes

## Config Reconciler Integration
- Accept `committed_epoch_rx` watch channel from trust quorum
- Trigger reconciliation when epoch changes
- Track per-disk encryption epoch in `ExternalDisks`
- Add `rekey_for_epoch()` to coordinate key rotation:
  - Filter disks needing rekey (cached epoch < target OR unknown)
  - Derive keys for each disk via `StorageKeyRequester`
  - Send batch request to dataset task
  - Update cached epochs on success
  - Retry on failure via normal reconciliation retry logic

## Dataset Task Changes
- Add `RekeyRequest`/`RekeyResult` types for batch rekey operations
- Add `datasets_rekey()` with idempotency check (skip if already at target)
- Use `Zfs::change_key()` for atomic key + epoch property update

## ZFS Utilities
- Add `Zfs::change_key()` using `zfs_atomic_change_key` crate
- Add `Zfs::load_key()`, `unload_key()`, `dataset_exists()`
- Add `epoch` field to `DatasetProperties`
- Add structured error types for key operations

## Crash Recovery
- Add trial decryption recovery in `sled-storage` for datasets with missing epoch property (e.g., crash during initial creation)
- Unload key before each trial attempt to handle crash-after-load-key
- Set epoch property after successful recovery

## Safety Properties
- Atomic: Key and epoch property set together via `zfs_atomic_change_key`
- Idempotent: Skip rekey if dataset already at target epoch
- Crash-safe: Epoch read from ZFS on restart rebuilds cache correctly
- Conservative: Unknown epochs (None) trigger rekey attempt
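The trial-decryption recovery for datasets missing their epoch property can be sketched as a loop over candidate keys. This is a hedged illustration, not the `sled-storage` code: the `KeyOps` trait, its methods, and `recover_epoch` are hypothetical, and the order of `candidate_epochs` is left to the caller because the description does not say which epochs are tried or in what order.

```rust
/// Hypothetical abstraction over the ZFS key operations used in recovery.
pub trait KeyOps {
    fn unload_key(&mut self, dataset: &str) -> Result<(), String>;
    /// Try to load the key derived for `epoch`; fails if it is the wrong key.
    fn load_key(&mut self, dataset: &str, epoch: u64) -> Result<(), String>;
    /// Record the recovered epoch as a dataset property.
    fn set_epoch(&mut self, dataset: &str, epoch: u64) -> Result<(), String>;
}

/// Try each candidate epoch's key until one loads, then persist that epoch
/// so later restarts can read it from ZFS instead of guessing again.
pub fn recover_epoch<K: KeyOps>(
    ops: &mut K,
    dataset: &str,
    candidate_epochs: &[u64],
) -> Result<u64, String> {
    for &epoch in candidate_epochs {
        // Unload first: a crash after load-key (or a previous trial) may have
        // left a key loaded. Best effort; the PR logs a warning on failure,
        // this sketch simply ignores it.
        let _ = ops.unload_key(dataset);
        if ops.load_key(dataset, epoch).is_ok() {
            ops.set_epoch(dataset, epoch)?;
            return Ok(epoch);
        }
    }
    Err(format!("no candidate epoch decrypts {dataset}"))
}
```

Unloading before every attempt is what makes the loop safe to re-run after a crash: without it, a key left loaded by an interrupted attempt would make every subsequent `load_key` fail.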
Create a new key-manager-types crate containing the disk encryption key types (Aes256GcmDiskEncryptionKey and VersionedAes256GcmDiskEncryptionKey) that were previously defined in key-manager. This breaks the dependency from illumos-utils to key-manager, allowing illumos-utils to depend only on the minimal types crate. The key-manager crate re-exports VersionedAes256GcmDiskEncryptionKey for backwards compatibility.
- Format `ZFS_GET_PROPS` const with `concat!` and clarify epoch field docs
- Preserve error source chain with `anyhow::Error::from` instead of formatting
- Convert `KeyRotationError` from enum to struct (single variant)
- Log current and new epochs in key rotation success and failure paths
- Change `rekey_for_epoch` to return `ReconciliationResult` instead of `bool`
- Add error log for unexpected epoch-ahead-of-target condition
- Simplify epoch filter using `Option<Epoch>` ordering
- Move `dataset_name` allocation into `Ok` branch to minimize scope
- Upgrade all-key-derivations-failed log from info to warn
- Inline `dataset_exists` helper, calling `Zfs::dataset_exists` directly
- Log warning on best-effort `unload_key` failure
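The epoch filter and the epoch-ahead-of-target case can both be expressed with `Option` ordering: for `Option<T: Ord>`, Rust orders `None` below every `Some(_)`, so `cached < Some(target)` is true both for an unknown epoch and for an older one. A minimal sketch: `Epoch` is a local stand-in for the trust quorum type, and `RekeyDecision`, `rekey_decision`, and `disks_needing_rekey` are hypothetical names.

```rust
use std::cmp::Ordering;

/// Local stand-in for the trust quorum `Epoch(u64)` newtype.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub struct Epoch(pub u64);

#[derive(Debug, PartialEq, Eq)]
pub enum RekeyDecision {
    /// Cached epoch is unknown (`None`) or older than the target.
    Rekey,
    /// Idempotency: already at the target epoch, nothing to do.
    AlreadyAtTarget,
    /// Unexpected; the PR logs an error for this case.
    AheadOfTarget,
}

pub fn rekey_decision(cached: Option<Epoch>, target: Epoch) -> RekeyDecision {
    // `None < Some(_)`, so one comparison covers "unknown" and "older".
    match cached.cmp(&Some(target)) {
        Ordering::Less => RekeyDecision::Rekey,
        Ordering::Equal => RekeyDecision::AlreadyAtTarget,
        Ordering::Greater => RekeyDecision::AheadOfTarget,
    }
}

/// Names of disks whose cached epoch is unknown or behind `target`.
pub fn disks_needing_rekey<'a>(
    disks: &[(&'a str, Option<Epoch>)],
    target: Epoch,
) -> Vec<&'a str> {
    disks
        .iter()
        .filter(|(_, cached)| *cached < Some(target))
        .map(|(name, _)| *name)
        .collect()
}
```

This is the "Conservative" safety property in code form: an unknown epoch never compares as up to date, so it always triggers a rekey attempt.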
The mark_unchanged() call before the reconciler loop was a no-op: borrow_and_update() inside do_reconcilation() already reads and processes the current epoch unconditionally, marking it as seen. Also remove a stale comment about copying epoch out of the Ref, which was informational and no longer adds clarity.
This reverts commit 8062384.
Now that illumos ZFS supports `zfs change-key -o oxide:epoch=N`, we no longer need the zfs-atomic-change-key crate that embedded key material in Lua scripts. Zfs::change_key() now just runs the native command, taking (dataset, epoch) instead of (dataset, key). Keyfile management is lifted to the caller in datasets_rekey(), which uses KeyFile::create + zero_and_unlink — matching the existing pattern used for dataset creation and trial decryption. This ensures key material is zeroed from tmpfs promptly after use.
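The caller-side keyfile handling in `datasets_rekey()` can be sketched with only the standard library. Assumptions: `TempKeyFile` is a hypothetical stand-in for omicron's `KeyFile`, `rekey_with` and `change_key_args` are hypothetical, and the sketch assumes the dataset's existing `keylocation` already points at the file the caller writes, so no `keylocation` option is passed. The `zfs` invocation is injected as a closure so the flow can run without ZFS.

```rust
use std::fs::{self, OpenOptions};
use std::io::{self, Write};
use std::path::{Path, PathBuf};

/// Hypothetical keyfile helper: key material lives on disk (tmpfs in the
/// real system) only for the duration of one `zfs change-key` call.
pub struct TempKeyFile {
    path: PathBuf,
    len: usize,
}

impl TempKeyFile {
    /// Write `key` to a new file at `path`; fails if the file already exists.
    pub fn create(path: &Path, key: &[u8]) -> io::Result<Self> {
        let mut f = OpenOptions::new().write(true).create_new(true).open(path)?;
        f.write_all(key)?;
        f.sync_all()?;
        Ok(Self { path: path.to_path_buf(), len: key.len() })
    }

    /// Overwrite the key bytes with zeros, then unlink the file.
    pub fn zero_and_unlink(self) -> io::Result<()> {
        let mut f = OpenOptions::new().write(true).open(&self.path)?;
        f.write_all(&vec![0u8; self.len])?;
        f.sync_all()?;
        drop(f);
        fs::remove_file(&self.path)
    }
}

/// Arguments for the native `zfs change-key -o oxide:epoch=N <dataset>` form.
pub fn change_key_args(dataset: &str, epoch: u64) -> Vec<String> {
    vec![
        "change-key".to_string(),
        "-o".to_string(),
        format!("oxide:epoch={epoch}"),
        dataset.to_string(),
    ]
}

/// Write the key, run `zfs` via `run`, and always zero and unlink the
/// keyfile afterwards, even if the command failed.
pub fn rekey_with<F>(
    keyfile_path: &Path,
    key: &[u8],
    dataset: &str,
    epoch: u64,
    run: F,
) -> io::Result<()>
where
    F: FnOnce(&[String]) -> io::Result<()>,
{
    let keyfile = TempKeyFile::create(keyfile_path, key)?;
    let args = change_key_args(dataset, epoch);
    let result = run(args.as_slice());
    let cleanup = keyfile.zero_and_unlink();
    // Report the command's error first; otherwise report any cleanup error.
    result.and(cleanup)
}
```

In a real caller the closure would run something like `Command::new("zfs").args(args)` and map a non-zero exit status to an error. Running cleanup unconditionally, rather than only on success, is what keeps key material from lingering after a failed `change-key`.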
… into tq-expunge-and-zfs-change-key
This all worked like a charm.
DO NOT MERGE! For testing purposes only.