turbo-tasks: Reduce allocations on cache hits #92756

Merged
lukesandberg merged 21 commits into canary from fewer_boxes
Apr 16, 2026
Conversation

Contributor

@lukesandberg lukesandberg commented Apr 13, 2026

What?

Reduce heap allocations when turbo-tasks functions get cache hits (~85% of calls).

Why?

Every turbo-tasks function call (generated by #[turbo_tasks::function]) was boxing its arguments into Box<dyn MagicAny> before looking up the task cache. This allocation is wasted on cache hits, which are the overwhelmingly common case.

How?

Deferred boxing via StackMagicAny trait object:

Introduce a StackMagicAny trait that abstracts over a stack-resident Option<T>:

  • as_ref(&self) -> &dyn MagicAny — borrow the argument for hash/equality (cache lookup)
  • take_box(&mut self) -> Box<dyn MagicAny> — move the value to the heap (zero clones)
  • as_any_mut(&mut self) -> &mut dyn Any — downcast to concrete type without boxing

The data flow:

  1. Callsite (macro-generated): creates StackMagicAnySlot::new((args...)) on the stack, calls dynamic_call(..., &mut arg)
  2. dynamic_call: checks resolution via arg.as_ref(), routes to native_call (resolved) or boxes via arg.take_box() for LocalTaskSpec (unresolved)
  3. Backend get_or_create_task_inner: does a read-only raw_get lookup using hash_from_components + eq_components with the borrowed &dyn MagicAny. On cache hit (~85%), returns immediately — zero heap allocation. On cache miss, re-checks under write lock using the same borrowed reference, and only calls arg.take_box() in the vacant entry case (true cache miss).

Boxing is now deferred past all of these:

  • Memory cache hit — the common case, no allocation at all
  • Backing storage hit — found in persistent storage, no allocation needed
  • Lost race under write lock — another thread inserted while we upgraded; we use their task_id, still no allocation
  • Trait method dispatch (no filtering): filter_owned is now Option<FilterOwnedArgsFunctor>; when None (the common case, where all args are used), the original &mut dyn StackMagicAny passes straight through to dynamic_call without boxing
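The lookup-then-defer flow can be modeled with a plain `Option<T>` standing in for the stack slot and a hash-keyed map standing in for the sharded task cache. This is an illustrative model, not the PR's code: the real backend uses a sharded DashMap plus eq_components to resolve hash collisions, and also consults backing storage before boxing.

```rust
use std::any::Any;
use std::collections::HashMap;
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

type TaskId = u32;

/// Stands in for hash_from_components: hashes the borrowed pieces
/// without constructing a CachedTaskType or boxing anything.
fn hash_components<T: Hash>(this: Option<TaskId>, arg: &T) -> u64 {
    let mut h = DefaultHasher::new();
    this.hash(&mut h);
    arg.hash(&mut h);
    h.finish()
}

struct Backend {
    cache: HashMap<u64, TaskId>,
    created: Vec<Box<dyn Any>>, // boxed args of true misses
    next_id: TaskId,
}

impl Backend {
    fn get_or_create_task<T: Hash + 'static>(
        &mut self,
        this: Option<TaskId>,
        slot: &mut Option<T>, // stack slot, standing in for &mut dyn StackMagicAny
    ) -> TaskId {
        let hash = hash_components(this, slot.as_ref().expect("already taken"));
        if let Some(&id) = self.cache.get(&hash) {
            return id; // cache hit (~85% of calls): the argument was never boxed
        }
        // Confirmed miss: only now move the argument to the heap (take_box).
        let boxed: Box<dyn Any> = Box::new(slot.take().expect("already taken"));
        let id = self.next_id;
        self.next_id += 1;
        self.cache.insert(hash, id);
        self.created.push(boxed);
        id
    }
}

fn main() {
    let mut b = Backend { cache: HashMap::new(), created: Vec::new(), next_id: 0 };
    let mut miss = Some(("fib", 10u32));
    let id1 = b.get_or_create_task(None, &mut miss);
    assert!(miss.is_none()); // miss: value moved (boxed) into the backend
    let mut hit = Some(("fib", 10u32));
    let id2 = b.get_or_create_task(None, &mut hit);
    assert_eq!(id1, id2);
    assert!(hit.is_some()); // hit: value still on the stack, no allocation
    assert_eq!(b.created.len(), 1);
}
```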

Optimized filter_owned for traits:

When trait methods do need argument filtering (unused _-prefixed parameters), the old path did take_box() → downcast_args_owned() → dereference → repack. This is an unnecessary heap round-trip. The new downcast_stack_args_owned() function uses as_any_mut() to downcast directly to &mut StackMagicAnySlot<T> and calls take() on the inner Option, skipping the intermediate Box entirely.
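A sketch of the no-Box downcast, with simplified stand-ins (`Slot`, `StackArgLike`) for the real slot and trait types:

```rust
use std::any::Any;

/// Simplified stand-in for `StackMagicAnySlot<T>`.
struct Slot<T>(Option<T>);

/// Simplified stand-in for the `StackMagicAny` trait.
trait StackArgLike {
    fn as_any_mut(&mut self) -> &mut dyn Any;
}

impl<T: 'static> StackArgLike for Slot<T> {
    fn as_any_mut(&mut self) -> &mut dyn Any {
        self
    }
}

/// Sketch of `downcast_stack_args_owned`: recover the concrete slot from the
/// trait object via `Any`, then move the value out of the inner `Option`,
/// with no intermediate `Box` allocation.
fn downcast_stack_args_owned<T: 'static>(arg: &mut dyn StackArgLike) -> T {
    arg.as_any_mut()
        .downcast_mut::<Slot<T>>()
        .expect("argument type mismatch")
        .0
        .take()
        .expect("value already taken")
}

fn main() {
    // A trait method that ignores its second parameter can filter it out
    // before the native call; the tuple moves straight off the stack slot.
    let mut slot = Slot(Some((1u32, "unused".to_string(), 3u64)));
    let (a, _unused, c): (u32, String, u64) = downcast_stack_args_owned(&mut slot);
    assert_eq!((a, c), (1, 3));
    assert!(slot.0.is_none()); // slot emptied, nothing was boxed
}
```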

Additional changes:

  • Backend::get_or_create_*_task now takes decomposed parameters (native_fn, this, &mut dyn StackMagicAny) instead of a pre-constructed CachedTaskType
  • Persistent and transient task creation merged into a shared get_or_create_task_inner(transient: bool)
  • connect_child uses eagerly-set persistent_task_type from initialize_new_task
  • OwnedMagicAny adapter wraps already-boxed args (from async resolution tasks) to fit the StackMagicAny interface
  • Both dynamic_call and trait_call take &mut dyn StackMagicAny (trait dispatch also benefits)
  • CachedTaskType::hash_encode now delegates to hash_encode_components (deduplicated)
  • Removed try_native_call, native_call_if_consistent, try_get_or_create_* — the deferred boxing approach subsumes these
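The adapter for already-boxed arguments can be sketched as follows (simplified trait surface; the real OwnedMagicAny implements the full StackMagicAny interface): take_box hands back the existing Box rather than allocating a new one.

```rust
use std::any::Any;
use std::fmt::Debug;

// Simplified stand-ins for the real traits.
pub trait MagicAny: Any + Debug {}
impl<T: Any + Debug> MagicAny for T {}

pub trait StackMagicAny {
    fn as_ref(&self) -> &dyn MagicAny;
    fn take_box(&mut self) -> Box<dyn MagicAny>;
}

/// Fits an already-boxed argument (e.g. one produced by an async resolution
/// task) to the stack-slot interface without re-boxing.
pub struct OwnedMagicAny(Option<Box<dyn MagicAny>>);

impl OwnedMagicAny {
    pub fn new(boxed: Box<dyn MagicAny>) -> Self {
        Self(Some(boxed))
    }
}

impl StackMagicAny for OwnedMagicAny {
    fn as_ref(&self) -> &dyn MagicAny {
        &**self.0.as_ref().expect("value already taken")
    }
    fn take_box(&mut self) -> Box<dyn MagicAny> {
        // Hand back the original Box: zero additional allocation.
        self.0.take().expect("value already taken")
    }
}

fn main() {
    let boxed: Box<dyn MagicAny> = Box::new(42u32);
    let addr_before = &*boxed as *const dyn MagicAny as *const () as usize;
    let mut owned = OwnedMagicAny::new(boxed);
    assert_eq!(format!("{:?}", owned.as_ref()), "42");
    let addr_after = &*owned.take_box() as *const dyn MagicAny as *const () as usize;
    assert_eq!(addr_before, addr_after); // same heap allocation flows through
}
```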

Binary size

Binary size is neutral (linux-x86_64, --release, stripped + gzipped: 30.9 MB on both canary and this branch).

Overhead benchmark (turbo-tasks-backend, median, lower is better)

Measured on an isolated Firecracker microVM (linux-x86_64). Variance is nontrivial on this environment, but the direction is consistently positive across all turbo-tasks benchmarks.

Benchmark Canary This PR Delta
turbo-cached-same-keys/1 512.6 ns 490.2 ns -4.4%
turbo-cached-same-keys/10 493.4 ns 483.8 ns -2.0%
turbo-cached-same-keys/100 502.1 ns 484.9 ns -3.4%
turbo-cached-same-keys/1000 1.31 µs 774.2 ns -41.0%
turbo-cached-different-keys/1 1.02 µs 1.00 µs -1.4%
turbo-cached-different-keys/10 1.13 µs 1.11 µs -1.8%
turbo-cached-different-keys/100 1.29 µs 1.24 µs -3.8%
turbo-cached-different-keys/1000 2.37 µs 2.32 µs -2.5%
turbo-uncached/1 23.43 µs 20.47 µs -12.6%
turbo-uncached/10 32.23 µs 29.71 µs -7.8%
turbo-uncached/100 126.01 µs 124.95 µs -0.8%
turbo-uncached/1000 1.079 ms 1.048 ms -2.9%
turbo-uncached-parallel/1 6.16 µs 5.67 µs -8.0%
turbo-uncached-parallel/10 5.47 µs 5.12 µs -6.4%
turbo-uncached-parallel/100 14.65 µs 14.55 µs -0.7%
turbo-uncached-parallel/1000 132.09 µs 129.13 µs -2.2%

@nextjs-bot nextjs-bot added created-by: Turbopack team PRs by the Turbopack team. Turbopack Related to Turbopack with Next.js. labels Apr 13, 2026

codspeed-hq bot commented Apr 14, 2026

Merging this PR will not alter performance

✅ 17 untouched benchmarks
⏩ 3 skipped benchmarks¹


Comparing fewer_boxes (59875f4) with canary (73e89d9)

Open in CodSpeed

Footnotes

  1. 3 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, click here and archive them to remove them from the performance reports.

Collaborator

nextjs-bot commented Apr 14, 2026

Stats from current PR

✅ No significant changes detected

📊 All Metrics
📖 Metrics Glossary

Dev Server Metrics:

  • Listen = TCP port starts accepting connections
  • First Request = HTTP server returns successful response
  • Cold = Fresh build (no cache)
  • Warm = With cached build artifacts

Build Metrics:

  • Fresh = Clean build (no .next directory)
  • Cached = With existing .next directory

Change Thresholds:

  • Time: Changes < 50ms AND < 10%, OR < 2% are insignificant
  • Size: Changes < 1KB AND < 1% are insignificant
  • All other changes are flagged to catch regressions

⚡ Dev Server

Metric Canary PR Change Trend
Cold (Listen) 455ms 455ms ▁▁▁▁█
Cold (Ready in log) 442ms 442ms ▂▁▂▂█
Cold (First Request) 832ms 835ms █▇▁▁▂
Warm (Listen) 457ms 456ms ▁▁▁▁█
Warm (Ready in log) 442ms 443ms ▃▂▁▁█
Warm (First Request) 345ms 344ms █▅▆▃▁
📦 Dev Server (Webpack) (Legacy)

📦 Dev Server (Webpack)

Metric Canary PR Change Trend
Cold (Listen) 456ms 455ms ▁▁▁██
Cold (Ready in log) 439ms 439ms ▇▂▄▅▁
Cold (First Request) 1.940s 1.962s ▇▅▁▁▁
Warm (Listen) 456ms 455ms █▅▅▅▁
Warm (Ready in log) 438ms 439ms ▇▃▇▃▁
Warm (First Request) 1.935s 1.953s █▅▃▃▁

⚡ Production Builds

Metric Canary PR Change Trend
Fresh Build 3.987s 3.973s ▂▁▄▃█
Cached Build 3.994s 4.020s ▁▁▃▃█
📦 Production Builds (Webpack) (Legacy)

📦 Production Builds (Webpack)

Metric Canary PR Change Trend
Fresh Build 14.478s 14.470s █▅▇▅▃
Cached Build 14.576s 14.651s █▄██▁
node_modules Size 494 MB 494 MB ███▅█
📦 Bundle Sizes

Bundle Sizes

⚡ Turbopack

Client

Main Bundles
Canary PR Change
01lykb9j9u19z.js gzip 155 B N/A -
037b-ksaecgir.js gzip 158 B N/A -
051awit1pzxy9.js gzip 156 B N/A -
07rxhp_1_g4mu.js gzip 13.1 kB N/A -
08avva-dy02e7.js gzip 10.4 kB N/A -
0b9mvru8gaawc.js gzip 156 B N/A -
0cz1d0mv5g_q7.js gzip 39.4 kB 39.4 kB
0dfverolgqlu_.js gzip 155 B N/A -
0fli3_wppnim5.js gzip 12.9 kB N/A -
0guupo6x26xoo.js gzip 70.8 kB N/A -
0k09jwjeb-tki.js gzip 13.8 kB N/A -
0kb7_ep3r1z0_.js gzip 10.1 kB N/A -
0kpuma6a8t2qh.js gzip 168 B N/A -
0kw8xgqdrilf6.js gzip 8.56 kB N/A -
0ojkk2e654xsc.js gzip 8.59 kB N/A -
0sbq9bkqvh45e.js gzip 152 B N/A -
0wxpyd8r-vipl.js gzip 1.47 kB N/A -
0xnfs20vs3ysc.js gzip 153 B N/A -
0xy2fhla48_rd.js gzip 9.24 kB N/A -
10wqsvi2mgfmi.js gzip 9.82 kB N/A -
16lhqjoqbznyg.js gzip 220 B 220 B
16vepdkipri3r.js gzip 8.51 kB N/A -
17n96uu6y1pxq.js gzip 8.6 kB N/A -
18y4_8-9or0mn.js gzip 8.51 kB N/A -
1elt1qium-r2m.css gzip 115 B 115 B
1gq145j3kps-h.js gzip 8.62 kB N/A -
1l5or8vq6a69s.js gzip 154 B N/A -
1nsh-mbn0e-se.js gzip 8.56 kB N/A -
1tsrrp1tdngti.js gzip 13.3 kB N/A -
1zf460s-ga2zh.js gzip 154 B N/A -
2__-e_ym8n788.js gzip 450 B N/A -
2-ivsrs9yb0b0.js gzip 156 B N/A -
22o6xd9_ywdu6.js gzip 233 B N/A -
26ui6d5bv607a.js gzip 49.3 kB N/A -
2kvj8yrfznmwx.js gzip 5.69 kB N/A -
2mlsou7_9la1i.js gzip 154 B N/A -
2p854ctj-qiki.js gzip 65.5 kB N/A -
2qv7m7xjnokgr.js gzip 8.58 kB N/A -
341itofhl0awt.js gzip 160 B N/A -
342ijzvrpe53h.js gzip 2.29 kB N/A -
44un3--wmqiyh.js gzip 7.61 kB N/A -
turbopack-04..qhq7.js gzip 4.17 kB N/A -
turbopack-0a..u7nt.js gzip 4.19 kB N/A -
turbopack-0b..u348.js gzip 4.19 kB N/A -
turbopack-0e..wy2i.js gzip 4.19 kB N/A -
turbopack-0l..dasq.js gzip 4.19 kB N/A -
turbopack-1-..xex1.js gzip 4.2 kB N/A -
turbopack-1d..whps.js gzip 4.19 kB N/A -
turbopack-1e..pdkr.js gzip 4.18 kB N/A -
turbopack-3_..jxf5.js gzip 4.19 kB N/A -
turbopack-33..vmhx.js gzip 4.19 kB N/A -
turbopack-36..ne43.js gzip 4.19 kB N/A -
turbopack-3s..yafr.js gzip 4.19 kB N/A -
turbopack-3y..cu3g.js gzip 4.19 kB N/A -
turbopack-41..gi52.js gzip 4.19 kB N/A -
06eqw0ze8c7k4.js gzip N/A 65.5 kB -
0arkbdqpxc37i.js gzip N/A 8.6 kB -
0bz-xifewa17d.js gzip N/A 8.63 kB -
0fbm505yboynb.js gzip N/A 49.3 kB -
0tvekitj587fh.js gzip N/A 8.51 kB -
0xz7kqe1wjdqh.js gzip N/A 169 B -
0yvk6-wi8e9wh.js gzip N/A 13.3 kB -
0z83a1om5rvtt.js gzip N/A 7.61 kB -
1-jqyfc89tixo.js gzip N/A 1.46 kB -
14t1kneseb8th.js gzip N/A 2.3 kB -
15sb1-dsqfk_j.js gzip N/A 8.59 kB -
1ab2xruymo-oj.js gzip N/A 449 B -
1hxb-q1ungqh_.js gzip N/A 70.8 kB -
1tu25qtsmfhar.js gzip N/A 9.82 kB -
1vein_gnv3mwr.js gzip N/A 8.56 kB -
1wzrm0xjjbzn5.js gzip N/A 10.1 kB -
1z3g0uaqtv9_3.js gzip N/A 8.56 kB -
2-e64t22r1kgw.js gzip N/A 153 B -
21uyslsd4odmk.js gzip N/A 158 B -
25a1yz7zua29z.js gzip N/A 13.8 kB -
27o93knux3hfn.js gzip N/A 156 B -
2bi5hx402juv-.js gzip N/A 8.58 kB -
2hy56297fog9u.js gzip N/A 8.52 kB -
2r7z7i6hgc457.js gzip N/A 155 B -
2u_rpxq3tzytl.js gzip N/A 233 B -
2upap--8h9cvf.js gzip N/A 157 B -
323ki47w-n3e9.js gzip N/A 157 B -
35a8pvb74ba9h.js gzip N/A 155 B -
368lim5wq0o0r.js gzip N/A 12.9 kB -
3asf9b6dh7q99.js gzip N/A 155 B -
3d2-cjrz3nqd1.js gzip N/A 161 B -
3drqjohogojbw.js gzip N/A 5.69 kB -
3g8l1m2-o-ewi.js gzip N/A 13.1 kB -
3hhvtftvowwye.js gzip N/A 156 B -
3jmkxsnxg0nrh.js gzip N/A 10.4 kB -
3q11pdkvyja5e.js gzip N/A 161 B -
3r03tqt-li-wg.js gzip N/A 158 B -
3wpp8nvyoj121.js gzip N/A 9.24 kB -
turbopack-03..7g4c.js gzip N/A 4.19 kB -
turbopack-0f..tm9h.js gzip N/A 4.19 kB -
turbopack-0i..hzz_.js gzip N/A 4.19 kB -
turbopack-0m..zm5y.js gzip N/A 4.2 kB -
turbopack-0x..34yg.js gzip N/A 4.19 kB -
turbopack-16..zbi5.js gzip N/A 4.19 kB -
turbopack-1h..u9e1.js gzip N/A 4.19 kB -
turbopack-28..n7f5.js gzip N/A 4.19 kB -
turbopack-2h..wq2d.js gzip N/A 4.19 kB -
turbopack-2n..pnni.js gzip N/A 4.19 kB -
turbopack-2u..gsgj.js gzip N/A 4.19 kB -
turbopack-2x..crbl.js gzip N/A 4.19 kB -
turbopack-36..2jpx.js gzip N/A 4.19 kB -
turbopack-3d..8xnw.js gzip N/A 4.17 kB -
Total 465 kB 465 kB ⚠️ +62 B

Server

Middleware
Canary PR Change
middleware-b..fest.js gzip 717 B 721 B
Total 717 B 721 B ⚠️ +4 B
Build Details
Build Manifests
Canary PR Change
_buildManifest.js gzip 432 B 434 B
Total 432 B 434 B ⚠️ +2 B

📦 Webpack

Client

Main Bundles
Canary PR Change
2637-HASH.js gzip 4.63 kB N/A -
7724.HASH.js gzip 169 B N/A -
8274-HASH.js gzip 61.4 kB N/A -
8817-HASH.js gzip 5.59 kB N/A -
c3500254-HASH.js gzip 62.8 kB N/A -
framework-HASH.js gzip 59.7 kB 59.7 kB
main-app-HASH.js gzip 254 B 255 B
main-HASH.js gzip 39.4 kB 39.4 kB
webpack-HASH.js gzip 1.68 kB 1.68 kB
5887-HASH.js gzip N/A 5.61 kB -
6522-HASH.js gzip N/A 60.8 kB -
6779-HASH.js gzip N/A 4.63 kB -
8854.HASH.js gzip N/A 169 B -
eab920f9-HASH.js gzip N/A 62.8 kB -
Total 236 kB 235 kB ✅ -643 B
Polyfills
Canary PR Change
polyfills-HASH.js gzip 39.4 kB 39.4 kB
Total 39.4 kB 39.4 kB
Pages
Canary PR Change
_app-HASH.js gzip 193 B 193 B
_error-HASH.js gzip 182 B 182 B
css-HASH.js gzip 333 B 334 B
dynamic-HASH.js gzip 1.81 kB 1.8 kB
edge-ssr-HASH.js gzip 255 B 255 B
head-HASH.js gzip 353 B 349 B 🟢 4 B (-1%)
hooks-HASH.js gzip 384 B 382 B
image-HASH.js gzip 581 B 581 B
index-HASH.js gzip 260 B 259 B
link-HASH.js gzip 2.51 kB 2.51 kB
routerDirect..HASH.js gzip 316 B 318 B
script-HASH.js gzip 386 B 386 B
withRouter-HASH.js gzip 313 B 314 B
1afbb74e6ecf..834.css gzip 106 B 106 B
Total 7.98 kB 7.97 kB ✅ -10 B

Server

Edge SSR
Canary PR Change
edge-ssr.js gzip 126 kB 126 kB
page.js gzip 273 kB 273 kB
Total 399 kB 399 kB ✅ -382 B
Middleware
Canary PR Change
middleware-b..fest.js gzip 617 B 618 B
middleware-r..fest.js gzip 156 B 156 B
middleware.js gzip 44.2 kB 44.4 kB
edge-runtime..pack.js gzip 842 B 842 B
Total 45.9 kB 46.1 kB ⚠️ +195 B
Build Details
Build Manifests
Canary PR Change
_buildManifest.js gzip 721 B 720 B
Total 721 B 720 B ✅ -1 B
Build Cache
Canary PR Change
0.pack gzip 4.38 MB 4.38 MB
index.pack gzip 113 kB 111 kB 🟢 2.06 kB (-2%)
index.pack.old gzip 117 kB 112 kB 🟢 4.4 kB (-4%)
Total 4.61 MB 4.6 MB ✅ -7.58 kB

🔄 Shared (bundler-independent)

Runtimes
Canary PR Change
app-page-exp...dev.js gzip 347 kB 347 kB
app-page-exp..prod.js gzip 192 kB 192 kB
app-page-tur...dev.js gzip 346 kB 346 kB
app-page-tur..prod.js gzip 192 kB 192 kB
app-page-tur...dev.js gzip 343 kB 343 kB
app-page-tur..prod.js gzip 190 kB 190 kB
app-page.run...dev.js gzip 343 kB 343 kB
app-page.run..prod.js gzip 190 kB 190 kB
app-route-ex...dev.js gzip 77 kB 77 kB
app-route-ex..prod.js gzip 52.5 kB 52.5 kB
app-route-tu...dev.js gzip 77.1 kB 77.1 kB
app-route-tu..prod.js gzip 52.6 kB 52.6 kB
app-route-tu...dev.js gzip 76.7 kB 76.7 kB
app-route-tu..prod.js gzip 52.3 kB 52.3 kB
app-route.ru...dev.js gzip 76.6 kB 76.6 kB
app-route.ru..prod.js gzip 52.3 kB 52.3 kB
dist_client_...dev.js gzip 324 B 324 B
dist_client_...dev.js gzip 326 B 326 B
dist_client_...dev.js gzip 318 B 318 B
dist_client_...dev.js gzip 317 B 317 B
pages-api-tu...dev.js gzip 43.9 kB 43.9 kB
pages-api-tu..prod.js gzip 33.5 kB 33.5 kB
pages-api.ru...dev.js gzip 43.9 kB 43.9 kB
pages-api.ru..prod.js gzip 33.5 kB 33.5 kB
pages-turbo....dev.js gzip 53.3 kB 53.3 kB
pages-turbo...prod.js gzip 39.1 kB 39.1 kB
pages.runtim...dev.js gzip 53.3 kB 53.3 kB
pages.runtim..prod.js gzip 39.1 kB 39.1 kB
server.runti..prod.js gzip 62.9 kB 62.9 kB
Total 3.06 MB 3.06 MB ✅ -1 B
📎 Tarball URL
https://vercel-packages.vercel.app/next/commits/59875f45ec5a4ddd1f78087b1b1a77fe0b0a97dc/next

Collaborator

nextjs-bot commented Apr 14, 2026

Tests Passed

@lukesandberg lukesandberg requested a review from mmastrac April 16, 2026 08:08
@lukesandberg lukesandberg marked this pull request as ready for review April 16, 2026 08:08
Contributor Author

Rename MagicAny to DynTaskInput or RawTaskInputs

... in another PR

lukesandberg and others added 21 commits April 16, 2026 13:41
Two changes to reduce heap allocations when calling turbo-tasks functions:

1. Move persistent_task_type propagation from connect_child/IncreaseActiveCount
   into initialize_new_task. This removes the need to thread task_type through
   operations on every call (hit or miss), and lets connect_child use
   TaskDataCategory::Meta instead of All.

2. Add a fast-path cache lookup (try_native_call / native_call_if_consistent)
   that checks the task_cache with borrowed args before boxing. The macro-
   generated code now tries this read-only lookup first for non-self function
   calls. On cache hit (~85% of calls), no Box<dyn MagicAny> is allocated.
   On miss, falls back to the existing boxed path.

Co-Authored-By: Claude <noreply@anthropic.com>
- Replace redundant closures with `RawVc::TaskOutput` (clippy)
- Return `Err(0)` from VcStorage::try_native_call instead of unreachable!(),
  since the testing backend has no task cache
- Fall back to dynamic_call (not native_call) on cache miss, since
  dynamic_call is the universal entry point all backends implement

Co-Authored-By: Claude <noreply@anthropic.com>
- Extract native_fn before Arc::new(task_type) to avoid an extra .clone()
  in the Vacant arms of get_or_create_{persistent,transient}_task
- Add track_cache_miss_by_fn (mirrors track_cache_hit_by_fn)
- Remove explanatory comments about persistent_task_type eagerness
- Remove unused persistence() method instead of suppressing warning

Co-Authored-By: Claude <noreply@anthropic.com>
The static_block codegen for method calls (self/this pointer) now uses
the same optimized path as free functions: args stay on the stack and
we try a read-only cache lookup before boxing.

For methods, we additionally check this.is_resolved() before taking the
fast path, since unresolved self values need a resolution wrapper task.

Co-Authored-By: Claude <noreply@anthropic.com>
Instead of expanding each macro callsite into two code paths (one for
cache hit, one for miss), introduce a StackArg trait that keeps args
on the caller's stack. The backend does a read-only cache lookup with
a borrowed &dyn MagicAny reference; only on cache miss does take_box()
move the value to the heap — zero clones, single code path per callsite.

Key changes:
- Add StackArg trait + StackArgSlot<T> (stack slot) + OwnedArg (boxed adapter)
- dynamic_call/native_call now take &mut dyn StackArg instead of Box<dyn MagicAny>
- Backend::get_or_create_*_task takes components (native_fn, this, &mut dyn StackArg)
  and does raw_get with borrowed arg before materializing the Box on miss
- Remove try_native_call, native_call_if_consistent, try_get_or_create_*
- Macro static_block reduces to a single dynamic_call with StackArgSlot

Co-Authored-By: Claude <noreply@anthropic.com>
- Comment 4+5: Restore `persistence()` helper, use it in both
  `static_block` and `dynamic_block` to reduce diff from canary
- Comment 6: Make `trait_call` take `&mut dyn StackArg` too, so
  `dynamic_block` (trait dispatch) also uses `StackArgSlot` instead
  of `Box::new(inputs)` — deferred boxing on trait calls
- Comment 2: Merge `get_or_create_persistent_task` and
  `get_or_create_transient_task` into shared `get_or_create_task_inner`
  parameterized by `transient: bool`
- Comment 1: Construct `CachedTaskType` in the transient panic path
  so `panic_persistent_calling_transient` gets a real task description

Co-Authored-By: Claude <noreply@anthropic.com>
Replace the two-phase lookup (read-lock raw_get then write-lock
raw_entry with re-hash) with a single raw_entry_with_hash call that
takes the pre-computed hash and a heterogeneous eq closure. The map
is sharded so write-lock contention is minimal, and this eliminates
redundant hashing on the miss path.

Co-Authored-By: Claude <noreply@anthropic.com>
…ring backing storage read

The single raw_entry_with_hash approach held the dashmap write lock
while calling task_by_type (backing storage). Restore the three-step
flow: raw_get (read lock) -> task_by_type (no lock) -> raw_entry_with_hash
(write lock), but now the write-lock step reuses the pre-computed hash
instead of re-hashing.

Co-Authored-By: Claude <noreply@anthropic.com>
- Delete arc_or_owned.rs (no longer referenced after ArcOrOwned removal)
- Remove or_insert_with, get_mut, into_mut, RefMut and its Deref/DerefMut
  impls from dash_map_raw_entry (none used by current callers)
- VacantEntry::insert now returns () since no caller used RefMut
- Mark panic_persistent_calling_transient as -> ! to make the divergence
  contract explicit

Co-Authored-By: Claude <noreply@anthropic.com>
…MagicAny

Address review comments:
- Rename StackArg -> StackMagicAny, StackArgSlot -> StackMagicAnySlot,
  OwnedArg -> OwnedMagicAny, arg_ref -> as_ref
- FilterOwnedArgsFunctor now takes &mut dyn StackMagicAny and returns
  OwnedMagicAny, so the caller doesn't manually take_box + rewrap

Co-Authored-By: Claude <noreply@anthropic.com>
…eate_task

The Backend trait had two methods with identical signatures that only
differed by transience. The caller just matched on persistence and
dispatched. Merge into a single method that accepts TaskPersistence,
eliminating the redundant trait surface.

Co-Authored-By: Claude <noreply@anthropic.com>
Co-Authored-By: Claude <noreply@anthropic.com>
…inline variable

- Thread 14: Restore comment explaining why read lock is used for Step 1
- Thread 16: Restore descriptive comments on backing storage path
- Thread 19: Remove StackMagicAny doc comment from TurboTasksCallApi
- Thread 20: Inline parent_task local variable in native_call

Co-Authored-By: Claude <noreply@anthropic.com>
When restoring a task from backing storage, reuse the existing
Arc<CachedTaskType> from the stored persistent_task_type rather
than creating a new Arc from the caller's boxed copy. This avoids
having two copies of the same task_type in memory.

Co-Authored-By: Claude <noreply@anthropic.com>
…ents

Verify that hash_from_components produces the same hash as the Hash
impl on a fully constructed CachedTaskType, and that eq_components
correctly matches/rejects on each component (native_fn, this, arg).

Co-Authored-By: Claude <noreply@anthropic.com>
Compute the shard index once from the hash and reuse it for both
the read-only cache check and the subsequent write-lock entry lookup.
Saves a few math operations and a pointer dereference on the miss path.

Co-Authored-By: Claude <noreply@anthropic.com>
Guarantees same layout as Option<T>, making the type suitable for
FFI-like patterns and ensuring no padding overhead.

Co-Authored-By: Claude <noreply@anthropic.com>
Enable dashmap's raw-api feature to access shard internals directly.
get_shard() returns a reference to the shard itself, which is reused
across both read-only and write-lock lookups, eliminating redundant
shard index computation.

Co-Authored-By: Claude <noreply@anthropic.com>
Rework task_by_type and lookup_task_candidates to accept exploded
components (native_fn, this, &dyn MagicAny) instead of &CachedTaskType.
This allows the backing storage lookup to happen using borrowed
references from the stack — the Box<dyn MagicAny> allocation for the
arg is now deferred until both the in-memory cache AND backing storage
have confirmed a miss.

Co-Authored-By: Claude <noreply@anthropic.com>
- Restore "another thread beat us" comment in Occupied race path
- Restore "Initialize storage BEFORE making task_id visible" ordering invariant
- Restore "insert() consumes e, releasing the shard write lock"
- Fix stale connect_child.rs comments about removed task type update
- Restore "stay Meta not All" performance rationale in aggregation_update
- Improve error message in kv_backing_storage to include this parameter

Co-Authored-By: Claude <noreply@anthropic.com>
Contributor Author

This stack of pull requests is managed by Graphite.

@lukesandberg lukesandberg requested a review from mmastrac April 16, 2026 21:28
Contributor

@mmastrac mmastrac left a comment


LGTM. I suspect that we may actually be able to remove the virtual methods on StackMagicAny in a followup, assuming that Rust isn't smart enough to devirtualize them itself with some additional tricks.

Contributor Author

This is the 'explicitly capture layout information and vtables' idea?

@mmastrac
Contributor

This is the 'explicitly capture layout information and vtables' idea?

I missed that Rust already captures layout inside the vtable, so if you 1) have a &dyn MagicAny and 2) guarantee that it doesn't use niches and 3) use a combination of MaybeUninit + Cell with a boolean, you can use that vtable information to write a universal as_ref and take_box.

You'd have something like this (hand-wavey):

#[repr(C)]
pub struct MagicStackAny<T: MagicAny> {
    value: MaybeUninit<T>,
    taken: UnsafeCell<bool>,
}

impl<T: MagicAny> Drop for MagicStackAny<T> {
    fn drop(&mut self) {
        // Drop `value` in place only if `taken` is still false.
    }
}

 /// Move the value out of a `MagicStackAny<T>` and onto the heap as
 /// `Box<dyn MagicAny>`, without knowing `T` statically.
 ///
 /// # Safety
 ///
 /// `value` MUST be a reference obtained from `MagicStackAny::as_ref` on a
 /// `#[repr(C)] MagicStackAny<T>` where `T` matches the vtable carried by
 /// `value`. In particular:
 /// - The byte at offset `size_of_val(value)` past the data pointer must be
 ///   the `UnsafeCell<bool>` `taken` flag.
 /// - The caller must not read `value` again after this returns (the slot
 ///   becomes logically uninitialized, protected only by `taken = true`).
pub unsafe fn take_magic_stack_any(value: &dyn MagicAny) -> Box<dyn MagicAny> {
    use std::alloc::{Layout, alloc, handle_alloc_error};
    // `ptr::metadata` and `ptr::from_raw_parts_mut` require the unstable
    // `ptr_metadata` feature (nightly).
    use std::ptr;

    let size = std::mem::size_of_val(value);
    let align = std::mem::align_of_val(value);
    let src = value as *const dyn MagicAny as *const u8;

    // `taken` sits immediately past `value` (see layout invariants above).
    let taken_ptr = unsafe { src.add(size) } as *mut bool;
    assert!(unsafe { !*taken_ptr }, "take_magic_stack_any called twice");

    // Allocate a heap block matching T's layout and byte-copy the value.
    let layout = unsafe { Layout::from_size_align_unchecked(size, align) };
    let dest = unsafe { alloc(layout) };
    if dest.is_null() {
        handle_alloc_error(layout);
    }
    unsafe { ptr::copy_nonoverlapping(src, dest, size) };

    // Mark the stack slot taken so the MagicStackAny<T>'s Drop skips T::drop.
    unsafe { ptr::write(taken_ptr, true) };

    // Rebuild a Box<dyn MagicAny> with the heap pointer + the original vtable.
    let meta = ptr::metadata(value as *const dyn MagicAny);
    let fat: *mut dyn MagicAny = ptr::from_raw_parts_mut(dest as *mut (), meta);
    unsafe { Box::from_raw(fat) }
}

@lukesandberg lukesandberg merged commit 31a1b63 into canary Apr 16, 2026
189 checks passed
@lukesandberg lukesandberg deleted the fewer_boxes branch April 16, 2026 21:56

3 participants