feat: add configurable scratch region base GPA#1271
feat: add configurable scratch region base GPA#1271danbugs wants to merge 1 commit intohyperlight-dev:mainfrom
Conversation
9275564 to
b961eb1
Compare
ef6485a to
4b579d3
Compare
dblnz
left a comment
There was a problem hiding this comment.
LGTM!
However, I am not that well acquainted with this part of the codebase. Maybe @syntactically can take a look
| ..Default::default() | ||
| }, | ||
| // CR0.ET (bit 4) is hardwired to 1 on modern x86 CPUs. | ||
| // MSHV rejects the vCPU state if ET is not set. |
There was a problem hiding this comment.
How hasn't mshv failed so far? was this scenario not tested before?
There was a problem hiding this comment.
I don't believe there are many tests for the 16/32-bit guests.
| // MSHV (unlike KVM) does not provide sane defaults for the XSAVE | ||
| // area on a freshly created vCPU, and will reject the VP state | ||
| // on the first run if it is left uninitialized. | ||
| vm.reset_xsave().map_err(VmError::Register)?; |
There was a problem hiding this comment.
I'm curious how this worked before
There was a problem hiding this comment.
It looks like at issue here is the difference between bringing up the vcpu in real vs long mode. Does the xsave reset call need to be there in both, or should it just be in real mode? I know we had tests of the long mode mshv stuff passing before.
There was a problem hiding this comment.
I think the real vs long is the issue here. When I did some investigation in mxcsr on kvm awhile back it did work fine in long mode.
There was a problem hiding this comment.
I agree that if we don't need this for some targets it would be nice to not waste cycles here
|
I'm not clear on the need for a configurable region base for scratch. You can just set MAX_GPA/MAX_GVA based on target architecture + feature flags? (see src/hyperlight_common/src/arch/i686/layout.rs). Maybe the selection based on target_arch of which layout module to use is broken in the host for the i686 case where it's the guest arch that counts; maybe that needs to change to something like I'm not clear on how the idea of making snapshot pages writable is going to work with the snapshot-first lifecycle (#1268). This seems like it will break a lot of assumptions and make a lot of code more complex/rife with opportunities for vulnerabilities. If you really are unable to run on amd64 natively, would teaching hyperlight about i686 paging structures be a slightly smaller break of the conceptual model? |
This seems like something we could flush out in that discussion? With CoW changes we've broken some downstream consumers, so it makes sense give an escape hatch (maybe behind a feature flag) while we sort through what we want to do. As far as I understand, this isn't a breaking change and doesn't introduce issues now? or does it? To be clear I think we do need to reconcile this requirement and enabling this wouldn't be a long term solution |
ludfjig
left a comment
There was a problem hiding this comment.
does nanvix not use snapshotting? If we can do this without adding a config option, I think that would be good too.
| // MSHV (unlike KVM) does not provide sane defaults for the XSAVE | ||
| // area on a freshly created vCPU, and will reject the VP state | ||
| // on the first run if it is left uninitialized. | ||
| vm.reset_xsave().map_err(VmError::Register)?; |
There was a problem hiding this comment.
I agree that if we don't need this for some targets it would be nice to not waste cycles here
|
Addressed the config option concern. Latest push removes
The Regarding the xsave commit that was on this branch — it's been dropped from the latest push since it's orthogonal to the scratch GPA changes. The writable snapshot commit remains, gated behind Note that Nanvix doesn't use Hyperlight's snapshotting mechanism. Instead, it has a kernel call that, being guest-aware, can VMExit at any time to snapshot. Nanvix does this because it doesn't execute like our purpose-built guest binaries (e.g., it doesn't use guest function calls). Instead, it just calls |
cac29ff to
f3b4089
Compare
Propagate the init-paging feature flag from hyperlight-host to hyperlight-common so that the layout module selects the i686 constants (32-bit MAX_GPA/MAX_GVA) when init-paging is disabled. This allows 32-bit guests (like Nanvix) to use the correct scratch region placement without needing a configurable GPA override. Also make the snapshot memory region writable for non-init-paging guests: without guest page tables there is no CoW layer, so the CPU needs direct write access (e.g. for GDT descriptor Accessed bits during segment loads). Signed-off-by: danbugs <danilochiarlone@gmail.com>
f3b4089 to
92140ff
Compare
Summary
Adds a configurable scratch region base GPA to
SandboxConfiguration, makes the snapshot memory region writable for non-init-pagingguests, and uses compile-time#[cfg(feature = "init-paging")]gates for scratch GVA computation.Motivation
Nanvix runs as a 32-bit (i386) guest without Hyperlight's
init-pagingfeature. The default scratch region GPA sits at the top of the 36-bit physical address range — unreachable by a non-paging 32-bit kernel. This PR lets the caller place the scratch region at an arbitrary GPA (e.g., just below 4 GB) and ensures the snapshot region is host-writable so non-paging guests can function correctly.Changes
Commit 1:
feat: add configurable scratch region base GPAconfig.rs: Newscratch_base_gpa: u64field (0 = use default) with getter/setter. Usesu64with a zero sentinel rather thanOption<u64>to maintain#[repr(C)]FFI safety, matching the existingheap_size_overridepattern.layout.rs: Uses custom GPA when specifiedhyperlight_vm.rs: Stores and passes throughscratch_base_gpagdb/mod.rs: Adapts to layout changessnapshot.rs: Takesscratch_base_gpa: u64directly instead of computing it from sizeCommit 2:
fix: make snapshot region writable for non-init-paging guestsshared_mem.rs: Makes snapshot region writable at host level so non-paging guests can write to itCommit 3:
fix: use init-paging feature gate for scratch GVA computationlayout.rs+hyperlight_vm.rs: Uses#[cfg(feature = "init-paging")]for scratch GVA computation. Withinit-paging, the GVA is the high-canonical page-table-mapped address; without it, GVA = GPA (identity mapped). This is a compile-time decision, not a runtime one.Commit 4:
fix: use scratch base GPA instead of scratch size in page table walkermgr.rs: Updatesread_guest_memory_by_gva(behindtrace_guest/mem_profilefeatures) to pass the correctscratch_base_gpavalue instead ofscratch_sizetoaccess_gpaandSharedMemoryPageTableBuffer::new, matching their updated signatures from commit 1.Test plan
cargo clippypasses with all feature combinations (kvm,init-paging,kvm,executable_heap,kvm,mshv3,mem_profile,kvm,mshv3,trace_guest, etc.)cargo test -p hyperlight-host --no-default-features -F "kvm,init-paging" --libpassescargo build -p hyperlight-host --no-default-features -F "kvm,executable_heap" --libsucceeds