Skip to content

Cap iteration count in ScalarReplacementOfAggregates#155290

Closed
blackms wants to merge 1 commit intorust-lang:mainfrom
blackms:fix-sroa-hang-153205
Closed

Cap iteration count in ScalarReplacementOfAggregates#155290
blackms wants to merge 1 commit intorust-lang:mainfrom
blackms:fix-sroa-hang-153205

Conversation

@blackms
Copy link
Copy Markdown

@blackms blackms commented Apr 14, 2026

Fixes #153205.

Problem

ScalarReplacementOfAggregates (compiler/rustc_mir_transform/src/sroa.rs) runs a fixed-point loop { … } that keeps flattening aggregate locals until no further "dead" locals are produced. Inside, compute_flattening calls iter_fields, which normalizes each field type via try_normalize_erasing_regions before feeding it back as a new local.

On the MCVE from the issue:

trait Apply { type Output<T>; }
struct Identity;
impl Apply for Identity { type Output<T> = T; }
struct Thing<A: Apply>(A::Output<Self>);

fn foo<A: Apply>() { let _x: Thing<A>; }
fn main() { foo::<Identity>(); }

normalizing Thing<Identity>'s field type yields Thing<Identity> again — the enclosing struct itself. An E0391 layout-cycle error is emitted up front, but MIR optimization still runs, and every SROA iteration allocates exactly one fresh local of type Thing<Identity> which is itself a fresh flattening candidate. all_dead_locals is therefore non-empty forever, and the loop never terminates.

Reproduced locally with rustc 1.94.1 stable: after the E0391 is printed the compiler hangs indefinitely (killed after 12 s during repro).

Fix

Replace the unbounded loop with for _ in 0..MAX_ITERATIONS where const MAX_ITERATIONS: usize = 16. Per @scottmcm on the issue thread, 2 iterations typically capture all of SROA's value and non-contrived code never exceeds 10, so 16 leaves comfortable headroom while guaranteeing termination on tainted input. The loop body (including the break on empty all_dead_locals) is unchanged, so healthy code reaches its fixed point exactly as before.

I considered adding an early-bail on body.tainted_by_errors / tcx.dcx().has_errors() instead, but opted against it: it's a larger behavioural change (skipping SROA entirely on tainted bodies) and the same non-termination shape could theoretically be reached without the tainting flag being set at pass entry. A structural iteration cap is a simpler, more local invariant and is robust to both.

Test

Added tests/ui/layout/hang-sroa-layout-cycle.rs next to the existing layout-cycle.rs / post-mono-layout-cycle.rs tests. Uses //@ build-fail with //@ compile-flags: -Copt-level=3 (SROA is gated on mir_opt_level >= 2) and asserts the E0391 cycle error via //~^ ERROR. Without the fix compiletest times out on this test; with the fix it completes and matches the expected stderr.

r? mir-opt

@rustbot
Copy link
Copy Markdown
Collaborator

rustbot commented Apr 14, 2026

Some changes occurred to MIR optimizations

cc @rust-lang/wg-mir-opt

@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Apr 14, 2026
@Kivooeo
Copy link
Copy Markdown
Member

Kivooeo commented Apr 14, 2026

  1. how llm was used for this pr?

  2. am i read this correct?

  In practice, 2
   // iterations capture nearly all the value and more than 10 is never
   // needed for non-contrived code.

it says that 2 is nearly covers all cases and more than 10 is never needed, so you go 16? why?

@rust-log-analyzer

This comment has been minimized.

The SROA pass repeatedly flattens aggregate locals into their field
locals, looping until a fixed point is reached. When the MIR body
involves a type whose layout is cyclic (e.g. `struct T(T)` reached via
an associated type projection), the layout cycle error is emitted but
compilation continues into MIR optimization. Inside SROA, `iter_fields`
then keeps returning a single field whose normalized type is the
enclosing struct itself, so each iteration allocates a fresh replacement
local and the fixed-point loop never terminates.

Bound the loop to a small constant. In practice two iterations capture
essentially all of SROA's value and non-contrived code does not exceed
ten, so 10 is sufficient and guarantees termination on tainted input.
@blackms blackms force-pushed the fix-sroa-hang-153205 branch from aa5cf73 to e6d8c3c Compare April 14, 2026 14:48
@blackms
Copy link
Copy Markdown
Author

blackms commented Apr 14, 2026

Thanks for the review @Kivooeo!

how llm was used for this pr?

Full disclosure: I used a Claude-based coding agent team for the initial investigation — reading the SROA pass, forming the root-cause hypothesis, drafting the fix from @scottmcm's comment, and sketching the test. I then validated manually before opening the PR: reproduced the hang locally with rustc 1.94.1 stable on the MCVE (killed after 12 s after the E0391), read the full diff myself, and checked the test placement against sibling files (layout-cycle.rs, post-mono-layout-cycle.rs). Happy to answer any question about the reasoning or rewrite anything if something reads as machine-y.

it says that 2 is nearly covers all cases and more than 10 is never needed, so you go 16? why?

Fair point, I was being overcautious. Dropped the cap to 10 in the force-push just now — that's the number @scottmcm explicitly called out as "never needed" in non-contrived code, so using exactly that value makes the relationship between comment and constant obvious. Comment updated to match.

Also force-pushed a fix for the CI failure: when I renamed the test from hang-sroa-layout-cycle-153205.rs to hang-sroa-layout-cycle.rs I forgot to update the $DIR/… path and line number in the .stderr fixture (the .rs file gained a blank line from the URL rewrite). That's what aarch64-gnu-llvm-21-1 was complaining about; should be green on the new push.

@scottmcm
Copy link
Copy Markdown
Member

TBH, I completely made up those numbers. They're an instinct, not something I researched or data-collected.

So it'd probably be good to justify them somehow -- how often does it really loop in practice?


That said, we might as well see what perf thinks about it anyway.

@bors try @rust-timer queue

@rust-timer

This comment has been minimized.

@rust-bors

This comment has been minimized.

rust-bors bot pushed a commit that referenced this pull request Apr 14, 2026
Cap iteration count in `ScalarReplacementOfAggregates`
@rustbot rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Apr 14, 2026
@rust-bors
Copy link
Copy Markdown
Contributor

rust-bors bot commented Apr 15, 2026

☀️ Try build successful (CI)
Build commit: d96b019 (d96b019342c4bec517214647f0d2880a53721cd1, parent: a5c825cd824ee0ef9463021078a2f464b4cc1a0d)

@rust-timer

This comment has been minimized.

@rust-timer
Copy link
Copy Markdown
Collaborator

Finished benchmarking commit (d96b019): comparison URL.

Overall result: ✅ improvements - no action needed

Benchmarking means the PR may be perf-sensitive. It's automatically marked not fit for rolling up. Overriding is possible but disadvised: it risks changing compiler perf.

@bors rollup=never
@rustbot label: -S-waiting-on-perf -perf-regression

Instruction count

Our most reliable metric. Used to determine the overall result above. However, even this metric can be noisy.

mean range count
Regressions ❌
(primary)
- - 0
Regressions ❌
(secondary)
- - 0
Improvements ✅
(primary)
- - 0
Improvements ✅
(secondary)
-0.5% [-1.4%, -0.2%] 5
All ❌✅ (primary) - - 0

Max RSS (memory usage)

Results (secondary -4.5%)

A less reliable metric. May be of interest, but not used to determine the overall result above.

mean range count
Regressions ❌
(primary)
- - 0
Regressions ❌
(secondary)
- - 0
Improvements ✅
(primary)
- - 0
Improvements ✅
(secondary)
-4.5% [-4.5%, -4.5%] 1
All ❌✅ (primary) - - 0

Cycles

Results (primary -2.6%, secondary 3.3%)

A less reliable metric. May be of interest, but not used to determine the overall result above.

mean range count
Regressions ❌
(primary)
- - 0
Regressions ❌
(secondary)
3.3% [3.3%, 3.3%] 1
Improvements ✅
(primary)
-2.6% [-2.6%, -2.6%] 1
Improvements ✅
(secondary)
- - 0
All ❌✅ (primary) -2.6% [-2.6%, -2.6%] 1

Binary size

Results (primary 0.1%, secondary 0.0%)

A less reliable metric. May be of interest, but not used to determine the overall result above.

mean range count
Regressions ❌
(primary)
0.1% [0.0%, 0.1%] 15
Regressions ❌
(secondary)
0.1% [0.0%, 0.1%] 20
Improvements ✅
(primary)
- - 0
Improvements ✅
(secondary)
-0.3% [-0.3%, -0.3%] 1
All ❌✅ (primary) 0.1% [0.0%, 0.1%] 15

Bootstrap: 489.296s -> 490.526s (0.25%)
Artifact size: 394.04 MiB -> 394.22 MiB (0.05%)

@rustbot rustbot removed the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Apr 15, 2026
@saethlin
Copy link
Copy Markdown
Member

r? mir-opt

@rustbot rustbot assigned wesleywiser and unassigned saethlin Apr 15, 2026
@blackms
Copy link
Copy Markdown
Author

blackms commented Apr 15, 2026

Perf came back clean — 0 regressions, 5 secondary improvements around −0.5%, no primary changes. Thanks for running it.

On the cap value: 10 is structural rather than empirical. SROA only keeps looping when all_dead_locals is non-empty, which means some local got flattened into fresh locals that are themselves eligible for flattening. That recursion depth is bounded by the aggregate-nesting depth of the types in the body (tuple-of-struct, struct-of-tuple, etc.), and real code rarely goes past 2–3 levels. Happy to instrument the pass with a counter and run it across a crater sample if you want empirical numbers, but given the try came back clean I'm not sure it buys much beyond what you already have.

@jieyouxu
Copy link
Copy Markdown
Member

See #155296 (comment)

@jieyouxu jieyouxu closed this Apr 15, 2026
@rustbot rustbot removed the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Apr 15, 2026
@rust-lang rust-lang locked and limited conversation to collaborators Apr 15, 2026
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Compiler hangs with uninitialized variable containing a struct that contains itself

9 participants