Description
I am running MeloNX on iOS with MoltenVK 1.4.1. I consistently see random crashes (SIGSEGV) in-game, reproducible across multiple devices, during calls to QueueSubmit. In the debugger, the crash is at:
AGXMetal13_3`-[AGXG13GFamilyRenderContext setVisibilityResultMode:offset:]:
0x11a383bf8 <+0>: pacibsp
0x11a383bfc <+4>: stp x20, x19, [sp, #-0x20]!
0x11a383c00 <+8>: stp x29, x30, [sp, #0x10]
0x11a383c04 <+12>: add x29, sp, #0x10
0x11a383c08 <+16>: adrp x8, 1021
0x11a383c0c <+20>: ldrsw x8, [x8, #0x27c]
0x11a383c10 <+24>: ldr x8, [x0, x8]
0x11a383c14 <+28>: add x9, x8, #0x14, lsl #12 ; =0x14000
0x11a383c18 <+32>: add x19, x9, #0x8c0
0x11a383c1c <+36>: mov w9, #0x88dc ; =35036
0x11a383c20 <+40>: add x9, x8, x9
0x11a383c24 <+44>: ubfx x10, x3, #3, #29
-> 0x11a383c28 <+48>: strh w10, [x9, #0x500]
0x11a383c2c <+52>: cmp w2, #0x1
0x11a383c30 <+56>: cset w11, eq
0x11a383c34 <+60>: cmp w2, #0x0
0x11a383c38 <+64>: cset w12, ne
0x11a383c3c <+68>: ldr w13, [x9]
0x11a383c40 <+72>: lsl w12, w12, #15
0x11a383c44 <+76>: and w13, w13, #0xffff3fff
Further debugging shows the crash happens when a Metal render pass begins while the visibility buffer offset is already near the end of the buffer (e.g., 262136 of 262144): the first occlusion query wraps the buffer immediately. Because firstVisibilityResultOffsetInRenderPass is still 0 (it is only updated after accumulation), the wrap path fires instantly, forcing a render-pass end/restart while the encoder is bound to a near-end offset. ChatGPT suggests that this can fault on AGX, producing duplicated beginMetalRenderPass logs and a crash. Here are the full logs from just before the crash:
[MVK-AGENT]MVKCommandEncoder::beginMetalRenderPass::visibilityBufferSetup hasBuffer:true,offset:262136,size:262144,cmdUse:9,isRestart:false,firstOffsetInPass:0,timestamp:394843302
[MVK-AGENT]MVKCommandEncoder::beginMetalRenderPass::visibilityBufferSetup hasBuffer:true,offset:262136,size:262144,cmdUse:9,isRestart:false,firstOffsetInPass:0,timestamp:394843302
[MVK-AGENT]MVKOcclusionQueryCommandEncoderState::encode::setVisibilityResultMode mode:2,prevMode:0,offset:262136,bufferSize:262144,firstOffsetInPass:0,timestamp:394843302
[MVK-AGENT]MVKOcclusionQueryCommandEncoderState::nextMetalQuery::beforeAdvance offset:262136,size:262144,emptyQueue:false},timestamp:394843302
[MVK-AGENT]MVKVisibilityBuffer::advanceOffset::advance offset:0,halfSize:131072,wrapped:true,timestamp:394843302
[MVK-AGENT]MVKOcclusionQueryCommandEncoderState::nextMetalQuery::afterAdvance offset:0,size:262144,numCopyFences:�,firstOffsetInPass:0,timestamp:394843302
[MVK-AGENT]MVKOcclusionQueryCommandEncoderState::nextMetalQuery::wrapTriggeredStore offset:0,firstOffsetInPass:0,timestamp:394843302
[MVK-AGENT]MVKOcclusionQueryCommandEncoderState::nextMetalQuery::beforeAdvance offset:0,size:262144,emptyQueue:false},timestamp:394843302
[MVK-AGENT]MVKVisibilityBuffer::advanceOffset::advance offset:8,halfSize:131072,wrapped:false,timestamp:394843302
[MVK-AGENT]MVKOcclusionQueryCommandEncoderState::nextMetalQuery::afterAdvance offset:8,size:262144,numCopyFences:�,firstOffsetInPass:0,timestamp:394843302
SIGSEGV
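The offset sequence in the log (262136, then 0 with wrapped:true, then 8) can be reproduced with a toy ring-buffer model. This is only a sketch to illustrate the arithmetic: `ToyVisibilityRing`, `slotsBeforeWrap`, and the 8-byte slot size are assumptions inferred from the logged offsets, not MoltenVK's actual `MVKVisibilityBuffer` implementation.

```cpp
#include <cassert>
#include <cstdint>

// Toy model only. Names and layout are hypothetical, chosen to mirror the
// fields printed in the log; this is not MoltenVK's real data structure.
struct ToyVisibilityRing {
    uint64_t size;      // total buffer size in bytes (262144 in the log)
    uint64_t slotSize;  // bytes per query result (8 is inferred from the 0 -> 8 step)
    uint64_t offset;    // current write offset in bytes
};

// Move to the next slot. Returns true if the offset wrapped back to 0.
inline bool advanceOffset(ToyVisibilityRing& ring) {
    assert(ring.slotSize != 0 && ring.size % ring.slotSize == 0);
    ring.offset += ring.slotSize;
    if (ring.offset >= ring.size) {
        ring.offset = 0;
        return true;
    }
    return false;
}

// Number of slots from the current offset to the end of the buffer,
// counting the slot at the current offset.
inline uint64_t slotsBeforeWrap(const ToyVisibilityRing& ring) {
    return (ring.size - ring.offset) / ring.slotSize;
}
```

Starting this model at offset 262136 with a size of 262144 leaves exactly one slot, so the very first advance in the render pass wraps to 0, which matches the `wrapped:true` line above.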
ChatGPT proposed the following fix, and after some testing it does fix the segmentation fault in-game:
diff --git a/MoltenVK/MoltenVK/Commands/MVKCommandBuffer.mm b/MoltenVK/MoltenVK/Commands/MVKCommandBuffer.mm
index 707c2672..2453603d 100644
--- a/MoltenVK/MoltenVK/Commands/MVKCommandBuffer.mm
+++ b/MoltenVK/MoltenVK/Commands/MVKCommandBuffer.mm
@@ -777,6 +777,9 @@ static MVKBarrierStage commandUseToBarrierStage(MVKCommandUse use) {
if (!_pEncodingContext->visibilityResultBuffer.buffer()) {
_pEncodingContext->visibilityResultBuffer = _device->getVisibilityBuffer();
}
+ // Track the starting visibility offset for this Metal render pass so wrap detection compares
+ // against the correct baseline even when the buffer was already partially consumed.
+ _pEncodingContext->firstVisibilityResultOffsetInRenderPass = _pEncodingContext->visibilityResultBuffer.offset();
mtlRPDesc.visibilityResultBuffer = _pEncodingContext->visibilityResultBuffer.buffer();
}
I am quite new to graphics APIs, so I hope someone can confirm the root cause and get it fixed. Thank you!