Skip to content

GVN: invalidate moved-from droppable locals on Operand::Move#155296

Closed
blackms wants to merge 1 commit intorust-lang:mainfrom
blackms:fix-gvn-move-invalidation-155241
Closed

GVN: invalidate moved-from droppable locals on Operand::Move#155296
blackms wants to merge 1 commit intorust-lang:mainfrom
blackms:fix-gvn-move-invalidation-155241

Conversation

@blackms
Copy link
Copy Markdown

@blackms blackms commented Apr 14, 2026

Fixes #155241 (the MIR-GVN half of the issue; the extern "rust-call" const-arg half is handled in #155291).

Problem

simplify_operand was treating Operand::Move(p) and Operand::Copy(p) identically and leaving the recorded VnIndex of p.local in place even after the value had been consumed. A later aggregate rvalue with the same VnIndex was then rewritten to Operand::Copy(p.local) by visit_assign, and StorageRemover downgraded the original Move to a Copy as well, turning a single move into a double-use of memory the callee had already dropped.

The reproducer in the issue is a pair of FnOnce-consumed (Vec::new(),) tuples, both of which receive the same aggregate VnIndex because Vec::new() lowers to a deterministic Aggregate rvalue. GVN then made the second call_once(move _7) into call_once(copy _5) after the first call had already taken ownership of _5, leading to either an unreachable!() panic or heap corruption (SIGABRT on stable 1.94.1) at -Copt-level=2+.

Fix

This patch records simplify_operand's Move branch separately and, when the moved place's type needs_drop, drops everything GVN knows about the backing local via a new invalidate_local helper. Calls reach this path through super_terminator -> visit_operand, so terminator-arg moves are covered with no extra plumbing. !needs_drop types are intentionally left alone: for them Move and Copy are observationally equivalent in the post-borrowck MIR GVN runs on, so the existing unification (e.g. moves of &mut T, *mut T, plain integer aggregates) is preserved.

Soundness sketch

After _5 = move _x GVN no longer believes _5 holds a re-derivable value for any locally-tracked VnIndex, so it cannot rewrite a future _y = some_aggregate_with_same_vn into _y = copy _5. The invalidation is gated on needs_drop because that is precisely when the post-move source is left in an "unspecified state": the callee may drop / move-out of the slot, and the original drop glue for _5 may have been elided by drop elaboration (which ran before GVN).

The narrower alternative — invalidate only at TerminatorKind::Call arguments — was considered and rejected: the same unsoundness arises within a single basic block when a partial move (_x = move _5.field) is followed by another aggregate with the same VnIndex as _5's, and StorageRemover can no longer rescue that pattern.

Test plan

  • New run-pass test tests/ui/mir/gvn-fnonce-miscompile.rs runs the MCVE under -Copt-level=3 -Zmir-enable-passes=+GVN and asserts the second closure sees a fresh, empty Vec.
  • New mir-opt diff test tests/mir-opt/gvn_move_invalidation.rs exercises the post-fix MIR shape for both the droppable (no rewrite) and plain (constant-fold) cases.
  • Three pre-existing mir-opt tests lose the unsound aggregate-to-copy rewrite as a documented consequence:
    • tests/mir-opt/pre-codegen/try_identity.rs (old desugaring): the Ok(x?) identity is no longer collapsed into _0 = copy _1 for the generic Result<T, E>. The new desugaring is unaffected.
    • tests/mir-opt/gvn_copy_aggregate.rs::remove_storage_dead: the rebuilt AlwaysSome::<T>::Some(_) is no longer rewritten to _0 = copy _3.
    • tests/mir-opt/early_otherwise_branch_unwind.rs: EarlyOtherwiseBranch now emits a fresh discriminant read on the cleanup edge instead of reusing the cached one.

x test tests/mir-opt, tests/ui/mir, tests/ui/borrowck, tests/ui/closures, tests/ui/drop, tests/codegen-llvm, tests/assembly-llvm all pass on aarch64-apple-darwin --stage 1. x test tests/ui --stage 1 made it past 13200/21004 cases with no failures before being interrupted for time.

End-to-end verification

  • Bug reproduces on system rustc 1.94.1: -Copt-level=2 exit 134 (SIGABRT).
  • Bug reproduces on stage0 (unpatched): -Copt-level=3 -Zmir-enable-passes=+GVN exit 133.
  • Bug fixed on stage1 (patched): same command exits 0.

Open questions

The following are genuine soundness / policy questions that survive this fix; flagging them for the mir-opt reviewers rather than trying to address them in this PR:

  1. Should StorageRemover::visit_operand also gate its Move -> Copy downgrade on !needs_drop? This PR stops GVN from amplifying that downgrade into unsoundness, but the downgrade itself remains a footgun for any future pass that adds itself to reused_locals. Probably warrants a separate hardening PR.
  2. Indirect-by-ref Call arguments (PassMode::Indirect { mode: ByMutRef, .. }) can be mutated by the callee. Today GVN ignores this entirely. This PR doesn't make that worse, but it's a real soundness gap that should be tracked separately.
  3. simplify_aggregate_to_copy still synthesises Operand::Copy(place) from a place computed by try_as_place without a proper liveness check. The invalidate_local mechanism here keeps the rewrite honest from the GVN side, but a principled liveness analysis would let us drop the needs_drop heuristic and unify even droppable aggregates when it's safe — natural transition path to the "Option B" explored in the investigation.

r? mir-opt


LLM disclosure: this PR was bootstrapped by a Claude-based coding agent team that read the GVN pass end-to-end, formed the root-cause hypothesis, and drafted the initial patch and tests. I then validated the work manually: reproduced the bug on system rustc 1.94.1 and on stage0, ran x test across mir-opt, ui/mir, ui/borrowck, ui/closures, ui/drop, codegen-llvm, and assembly-llvm on aarch64-apple-darwin --stage 1 to confirm there are no regressions beyond the three pre-existing tests documented above, and reviewed the diff line-by-line before pushing. Happy to answer any question about the reasoning or rewrite anything that reads as machine-y.

`simplify_operand` was treating `Operand::Move(p)` and `Operand::Copy(p)`
identically and leaving the recorded `VnIndex` of `p.local` in place even after
the value had been consumed. A later aggregate rvalue with the same `VnIndex`
was then rewritten to `Operand::Copy(p.local)` by `visit_assign`, and
`StorageRemover` downgraded the original `Move` to a `Copy` as well, turning a
single move into a double-use of memory the callee had already dropped.

The reproducer is a pair of `FnOnce`-consumed `(Vec::new(),)` tuples, both of
which receive the same aggregate `VnIndex` because `Vec::new()` lowers to a
deterministic aggregate; GVN then made the second `call_once(move _7)` into
`call_once(copy _5)` after the first call had already taken ownership of `_5`.

This patch records `simplify_operand`'s `Move` branch separately and, when the
moved place's type `needs_drop`, drops everything GVN knows about the backing
local via a new `invalidate_local` helper. Calls reach this path through
`super_terminator` -> `visit_operand`, so terminator-arg moves are covered with
no extra plumbing. `!needs_drop` types are left untouched: for them `Move` and
`Copy` are observationally equivalent in post-borrowck MIR, so the existing
unification (e.g. moves of `&mut T`, `*mut T`, plain integer aggregates) is
preserved.

Three pre-existing mir-opt tests lose the unsound aggregate-to-copy rewrite as
a consequence:

  - `tests/mir-opt/pre-codegen/try_identity.rs` (`old` desugaring): the
    `Ok(x?)` identity is no longer collapsed into `_0 = copy _1` for the
    generic `Result<T, E>`. The `new` desugaring is unaffected.
  - `tests/mir-opt/gvn_copy_aggregate.rs::remove_storage_dead`: the rebuilt
    `AlwaysSome::<T>::Some(_)` is no longer rewritten to `_0 = copy _3`.
  - `tests/mir-opt/early_otherwise_branch_unwind.rs`: `EarlyOtherwiseBranch`
    now emits a fresh `discriminant` read on the cleanup edge instead of
    reusing the cached one.

The new tests pin the fix down:
  - `tests/ui/mir/gvn-fnonce-miscompile.rs` runs the issue's MCVE under
    `-Copt-level=3 -Zmir-enable-passes=+GVN` and asserts the second closure
    sees a fresh, empty `Vec`.
  - `tests/mir-opt/gvn_move_invalidation.rs` exercises the post-fix MIR shape
    for both the droppable (no rewrite) and plain (constant-fold) cases.
@rustbot
Copy link
Copy Markdown
Collaborator

rustbot commented Apr 14, 2026

Some changes occurred to MIR optimizations

cc @rust-lang/wg-mir-opt

@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Apr 14, 2026
@dianqk
Copy link
Copy Markdown
Member

dianqk commented Apr 15, 2026

I don't think this fix is correct; see #155241 (comment).

@dianqk dianqk added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Apr 15, 2026
@blackms
Copy link
Copy Markdown
Author

blackms commented Apr 15, 2026

Thanks for taking a look. I think there may be a mix-up: issue #155241 ended up containing two independent miscompiles, and the comment you linked is about the second one (@theemathas's const VALUE: (Thing,) case).

This PR addresses the issue's first MCVE — the FnOnce(Vec<usize>) one. That one reproduces via GVN alone: disabling -Zmir-enable-passes=-GVN makes the MCVE pass, disabling CopyProp alone does not, and it is unrelated to the extern "rust-call" ABI codegen. The const VALUE: (Thing,) case theemathas minimised is a separate codegen bug in codegen_arguments_untupled (the tupled rust-call path missing the safety-copy added for the non-tupled path in #45996); that one is fixed in the sibling PR #155291.

I verified each fix closes the MCVE it targets in isolation. If you have a specific case that reproduces on a tree with both PRs applied, please share — I haven't stress-tested combinations that might hit a third shape, and would want to know before this one goes further.

@theemathas
Copy link
Copy Markdown
Contributor

What test case is fixed by this PR that is not fixed by #155291?

@RalfJung
Copy link
Copy Markdown
Member

So far there seems to be no indication that #155241 is a bug in GVN. Changing GVN likely merely works around the symptoms but not the cause of the bug. An LLM would of course not notice this, this requires a human to actually understand the problem. This is exactly why LLM-driven contributions are dangerous and generally unsuited for new contributors that are inexperienced in the codebase.

@jieyouxu
Copy link
Copy Markdown
Member

jieyouxu commented Apr 15, 2026

Hi, we have seen that you've generated a lot of PRs recently. I'm going to ask that you slow down on opening PRs for the time being. Several of these are in very tricky areas of the compiler which can easily be wrong. And it's easy to overwhelm reviewers. Furthermore, recently we have seen waves of automated contributions that are not adequately self-reviewed, which is why we are being more cautious.

Separately, there are concerns about #155296 and the potential for unsoundness (as mir-opts are often a source of unsoundness). Rather than try to engage in the PR thread about that PR, it would be better to discuss it on Zulip to talk through the change and why it may or may not be correct.

Additionally, several reviewers have expressed concerns with these assisted contributions:

  • Overly verbose PR descriptions and replies to review comments, that seem to be largely LLM-generated
  • Replying to reviewer comments with LLM output
  • Non-disclosure until reviewers expressly asks
  • Some of the proposed changes seem to exhibit misunderstanding of important aspects of the compiler and such

I am thus going to close the existing PRs from you that exhibit these characteristics that cause frustration with reviewers. If you would like to continue contributing:

  • I recommend discussing such a change in the compiler help thread #t-compiler/help before opening such PRs.
  • Refrain from using LLMs to generate the PR description.
  • Refrain from using LLMs to generate answers to reviewer questions.

Caution

In my capacity as a venue moderator, I am hereby placing a temporary embargo on creation of additional new PRs from you before reviewers develop more confidence that you are self-reviewing and understand your contributions. The temporary embargo will be lifted if the reviewers have sufficient confidence in your contributions.

This is a moderation warning.

Before the temporary embargo is lifted, any further PRs from your account will be closed. If new PRs continue to be raised from your account before so, further moderation action may be applied.

You may contact the moderation team to discuss your warning and temporary embargo.

See our policy and contribution etiquette.

@jieyouxu jieyouxu closed this Apr 15, 2026
@rustbot rustbot removed the S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. label Apr 15, 2026
@rust-lang rust-lang locked and limited conversation to collaborators Apr 15, 2026
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

release-mode heap corruption with non-generic FnOnce(Vec<usize>)

7 participants