Occlusion queries: wrap detection uses stale baseline, crashes when visibility buffer starts near end of pass #2682

@BXYMartin

Description

I am running MeloNX on iOS with MoltenVK 1.4.1. The game crashes with SIGSEGV during QueueSubmit. The timing looks random, but the crash reproduces consistently across multiple devices. In the debugger, it crashes at:

AGXMetal13_3`-[AGXG13GFamilyRenderContext setVisibilityResultMode:offset:]:
    0x11a383bf8 <+0>:   pacibsp 
    0x11a383bfc <+4>:   stp    x20, x19, [sp, #-0x20]!
    0x11a383c00 <+8>:   stp    x29, x30, [sp, #0x10]
    0x11a383c04 <+12>:  add    x29, sp, #0x10
    0x11a383c08 <+16>:  adrp   x8, 1021
    0x11a383c0c <+20>:  ldrsw  x8, [x8, #0x27c]
    0x11a383c10 <+24>:  ldr    x8, [x0, x8]
    0x11a383c14 <+28>:  add    x9, x8, #0x14, lsl #12    ; =0x14000 
    0x11a383c18 <+32>:  add    x19, x9, #0x8c0
    0x11a383c1c <+36>:  mov    w9, #0x88dc               ; =35036 
    0x11a383c20 <+40>:  add    x9, x8, x9
    0x11a383c24 <+44>:  ubfx   x10, x3, #3, #29
->  0x11a383c28 <+48>:  strh   w10, [x9, #0x500]
    0x11a383c2c <+52>:  cmp    w2, #0x1
    0x11a383c30 <+56>:  cset   w11, eq
    0x11a383c34 <+60>:  cmp    w2, #0x0
    0x11a383c38 <+64>:  cset   w12, ne
    0x11a383c3c <+68>:  ldr    w13, [x9]
    0x11a383c40 <+72>:  lsl    w12, w12, #15
    0x11a383c44 <+76>:  and    w13, w13, #0xffff3fff

Further debugging shows that the crash happens when a Metal render pass begins while the visibility buffer offset is already near the end of the buffer (e.g., 262136 of 262144), so the first occlusion query wraps the buffer immediately. Because firstVisibilityResultOffsetInRenderPass is still 0 (it is only updated after accumulation), the wrap path fires right away and forces a render-pass end/restart while the encoder is still bound to the near-end offset. ChatGPT suggests that this can fault on AGX, producing the duplicated beginMetalRenderPass logs and the crash. Here are the full logs leading up to the crash:

[MVK-AGENT]MVKCommandEncoder::beginMetalRenderPass::visibilityBufferSetup hasBuffer:true,offset:262136,size:262144,cmdUse:9,isRestart:false,firstOffsetInPass:0,timestamp:394843302
[MVK-AGENT]MVKCommandEncoder::beginMetalRenderPass::visibilityBufferSetup hasBuffer:true,offset:262136,size:262144,cmdUse:9,isRestart:false,firstOffsetInPass:0,timestamp:394843302
[MVK-AGENT]MVKOcclusionQueryCommandEncoderState::encode::setVisibilityResultMode mode:2,prevMode:0,offset:262136,bufferSize:262144,firstOffsetInPass:0,timestamp:394843302
[MVK-AGENT]MVKOcclusionQueryCommandEncoderState::nextMetalQuery::beforeAdvance offset:262136,size:262144,emptyQueue:false},timestamp:394843302
[MVK-AGENT]MVKVisibilityBuffer::advanceOffset::advance offset:0,halfSize:131072,wrapped:true,timestamp:394843302
[MVK-AGENT]MVKOcclusionQueryCommandEncoderState::nextMetalQuery::afterAdvance offset:0,size:262144,numCopyFences:�,firstOffsetInPass:0,timestamp:394843302
[MVK-AGENT]MVKOcclusionQueryCommandEncoderState::nextMetalQuery::wrapTriggeredStore offset:0,firstOffsetInPass:0,timestamp:394843302
[MVK-AGENT]MVKOcclusionQueryCommandEncoderState::nextMetalQuery::beforeAdvance offset:0,size:262144,emptyQueue:false},timestamp:394843302
[MVK-AGENT]MVKVisibilityBuffer::advanceOffset::advance offset:8,halfSize:131072,wrapped:false,timestamp:394843302
[MVK-AGENT]MVKOcclusionQueryCommandEncoderState::nextMetalQuery::afterAdvance offset:8,size:262144,numCopyFences:�,firstOffsetInPass:0,timestamp:394843302
SIGSEGV

ChatGPT proposed the following fix, and after some testing it does resolve the segmentation fault in-game:

diff --git a/MoltenVK/MoltenVK/Commands/MVKCommandBuffer.mm b/MoltenVK/MoltenVK/Commands/MVKCommandBuffer.mm
index 707c2672..2453603d 100644
--- a/MoltenVK/MoltenVK/Commands/MVKCommandBuffer.mm
+++ b/MoltenVK/MoltenVK/Commands/MVKCommandBuffer.mm
@@ -777,6 +777,9 @@ static MVKBarrierStage commandUseToBarrierStage(MVKCommandUse use) {
                if (!_pEncodingContext->visibilityResultBuffer.buffer()) {
                        _pEncodingContext->visibilityResultBuffer = _device->getVisibilityBuffer();
                }
+               // Track the starting visibility offset for this Metal render pass so wrap detection compares
+               // against the correct baseline even when the buffer was already partially consumed.
+               _pEncodingContext->firstVisibilityResultOffsetInRenderPass = _pEncodingContext->visibilityResultBuffer.offset();
                mtlRPDesc.visibilityResultBuffer = _pEncodingContext->visibilityResultBuffer.buffer();
        }
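
To illustrate the suspected mechanism, here is a small standalone toy model of the offsets seen in the logs above. It is not MoltenVK code: `VisibilityRing` and `queriesUntilWrapTrigger` are hypothetical names, the 8-byte slot size is inferred from the logged offsets, and the trigger condition "advanced offset equals the per-pass baseline" is only an assumption that matches the `wrapTriggeredStore offset:0,firstOffsetInPass:0` line.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the visibility result buffer (not MoltenVK's type).
struct VisibilityRing {
    uint32_t size;    // total buffer size in bytes (262144 in the logs)
    uint32_t offset;  // current write offset in bytes

    // Advance by one 8-byte query slot, wrapping to 0 at the end of the buffer.
    void advance() {
        assert(size % 8 == 0 && offset % 8 == 0);
        offset = (offset + 8) % size;
    }
};

// Assumed wrap check: the wrap path fires when the advanced offset lands on
// the baseline recorded for the current render pass.
// Returns the 1-based index of the query that fires it, or 0 if none of the
// first maxQueries queries does.
uint32_t queriesUntilWrapTrigger(VisibilityRing ring, uint32_t baseline,
                                 uint32_t maxQueries) {
    for (uint32_t n = 1; n <= maxQueries; ++n) {
        ring.advance();
        if (ring.offset == baseline) {
            return n;
        }
    }
    return 0;
}
```

Under these assumptions, `queriesUntilWrapTrigger({262144, 262136}, 0, 100)` returns 1: with the stale baseline of 0, the very first query of the pass fires the wrap path, as in the logs. With the baseline set to the pass's starting offset (262136), as the patch does, the same call returns 0, and the check would only fire after a full lap of 32768 queries.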

I am quite new to graphics APIs, so I hope someone can confirm the root cause and get it fixed. Thank you!
