⚗️ Partial view updates (experimental)#4201

Draft
mormubis wants to merge 6 commits into main from adlrb/partial-view

Conversation

@mormubis
Contributor

@mormubis mormubis commented Feb 18, 2026

Motivation

Every periodic view update was sending the full view payload even when only one counter changed. Benchmarks showed 50–90% of the data was redundant. This implements the partial view updates RFC to reduce bandwidth by sending only changed fields in subsequent view events.

Aligned with rum-events-format #355 (now merged).

Changes

When partial_view_updates is enabled, the SDK sends the first event per view.id as a full view, then sends view_update diffs containing only the changed fields. The diff runs post-assembly in startRumBatch.ts, so beforeSend always sees the full event (backward-compatible). view_update events intentionally bypass the assembly pipeline: they are a bandwidth optimization, not a customer-visible event type.
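
To illustrate what "only changed fields" means, here is a minimal sketch of a recursive diff. It is not the code in viewDiff.ts: `Json`, `JsonObject`, and `diffChangedFields` are hypothetical names, every nested object is treated with merge semantics, removed keys are ignored, and values are compared through `JSON.stringify` for brevity.

```typescript
// Hypothetical types and helper for illustration; not the SDK's actual viewDiff.ts.
type Json = string | number | boolean | null | Json[] | { [key: string]: Json }
type JsonObject = { [key: string]: Json }

function isPlainObject(value: Json | undefined): value is JsonObject {
  return typeof value === 'object' && value !== null && !Array.isArray(value)
}

// Returns only the fields of `next` that differ from `previous`.
// Nested objects are diffed recursively; arrays and primitives are resent whole
// when they differ. Returns undefined when nothing changed.
function diffChangedFields(previous: JsonObject, next: JsonObject): JsonObject | undefined {
  const result: JsonObject = {}
  for (const key of Object.keys(next)) {
    const before = previous[key]
    const after = next[key]
    if (after === undefined) {
      continue
    }
    if (isPlainObject(before) && isPlainObject(after)) {
      const nested = diffChangedFields(before, after)
      if (nested !== undefined) {
        result[key] = nested
      }
    } else if (JSON.stringify(before) !== JSON.stringify(after)) {
      result[key] = after
    }
  }
  return Object.keys(result).length > 0 ? result : undefined
}

const firstView: JsonObject = {
  view: { id: 'abc', action: { count: 1 }, error: { count: 0 } },
  service: 'shop',
}
const laterView: JsonObject = {
  view: { id: 'abc', action: { count: 2 }, error: { count: 0 } },
  service: 'shop',
}

// Only the action counter changed, so only that path survives in the diff.
const viewUpdate = diffChangedFields(firstView, laterView)
console.log(JSON.stringify(viewUpdate))
```

In this sketch an unchanged event yields `undefined`, which matches the "empty diff, no event" behaviour discussed in this PR.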

A full view checkpoint is sent every 100 updates for backend recovery. Checkpoints can be disabled with partial_view_updates_no_checkpoint.

view_update events use batch.add instead of upsert, so a batch can contain a full view followed by view_update events. This is intentional: if we consolidated them, we wouldn't be able to tell if the backend missed an intermediate update or it was never sent.
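
The difference between the two calls can be sketched with a toy batch. `ToyBatch` is a hypothetical stand-in written for this explanation, not the SDK's batch implementation; it only assumes that upsert keeps the latest message per key while add keeps every message.

```typescript
// Toy in-memory batch for illustration only; the SDK's real batch is different.
class ToyBatch {
  private appended: string[] = []
  private upserted = new Map<string, string>()

  // add: every message is kept, in order.
  add(message: object): void {
    this.appended.push(JSON.stringify(message))
  }

  // upsert: a later message with the same key replaces the earlier one.
  upsert(message: object, key: string): void {
    this.upserted.set(key, JSON.stringify(message))
  }

  // Returns the serialized messages and empties the batch.
  flush(): string[] {
    const lines = Array.from(this.upserted.values()).concat(this.appended)
    this.upserted.clear()
    this.appended = []
    return lines
  }
}

const batch = new ToyBatch()
batch.upsert({ type: 'view', view: { id: 'abc' }, _dd: { document_version: 1 } }, 'abc')
// Same key: replaces document_version 1.
batch.upsert({ type: 'view', view: { id: 'abc' }, _dd: { document_version: 2 } }, 'abc')
// Each view_update is kept, so a gap in document_version is visible downstream.
batch.add({ type: 'view_update', view: { id: 'abc' }, _dd: { document_version: 3 } })
batch.add({ type: 'view_update', view: { id: 'abc' }, _dd: { document_version: 4 } })

const lines = batch.flush()
console.log(lines.length)
```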

Test instructions

yarn test:unit
yarn test:e2e:init && yarn test:e2e -g "partial view"

Or manually:

datadogRum.init({
  enableExperimentalFeatures: ['partial_view_updates'],
})

Checklist

  • Tested locally
  • Tested on staging
  • Added unit tests for this change.
  • Added e2e/integration tests for this change.
  • Updated documentation and/or relevant AGENTS.md file

@github-actions

github-actions bot commented Feb 18, 2026

All contributors have signed the CLA ✍️ ✅
Posted by the CLA Assistant Lite bot.

@cit-pr-commenter-54b7da

cit-pr-commenter-54b7da bot commented Feb 18, 2026

Bundles Sizes Evolution

| 📦 Bundle Name | Base Size | Local Size | 𝚫 | 𝚫% | Status |
|---|---|---|---|---|---|
| Rum | 178.50 KiB | 180.65 KiB | +2.15 KiB | +1.20% | |
| Rum Profiler | 6.16 KiB | 6.16 KiB | 0 B | 0.00% | |
| Rum Recorder | 27.03 KiB | 27.03 KiB | 0 B | 0.00% | |
| Logs | 57.00 KiB | 57.11 KiB | +120 B | +0.21% | |
| Rum Slim | 134.32 KiB | 136.40 KiB | +2.08 KiB | +1.55% | |
| Worker | 23.63 KiB | 23.63 KiB | 0 B | 0.00% | |

🚀 CPU Performance

| Action Name | Base CPU Time (ms) | Local CPU Time (ms) | 𝚫% |
|---|---|---|---|
| RUM - add global context | 0.0044 | 0.0039 | -11.36% |
| RUM - add action | 0.014 | 0.0127 | -9.29% |
| RUM - add error | 0.0129 | 0.0122 | -5.43% |
| RUM - add timing | 0.0027 | 0.0025 | -7.41% |
| RUM - start view | 0.0129 | 0.012 | -6.98% |
| RUM - start/stop session replay recording | 0.001 | 0.0007 | -30.00% |
| Logs - log message | 0.0147 | 0.0139 | -5.44% |

🧠 Memory Performance

| Action Name | Base Memory Consumption | Local Memory Consumption | 𝚫 |
|---|---|---|---|
| RUM - add global context | 27.68 KiB | 26.90 KiB | -804 B |
| RUM - add action | 96.69 KiB | 94.14 KiB | -2.55 KiB |
| RUM - add timing | 26.45 KiB | 27.60 KiB | +1.15 KiB |
| RUM - add error | 92.71 KiB | 96.92 KiB | +4.21 KiB |
| RUM - start/stop session replay recording | 25.20 KiB | 26.67 KiB | +1.47 KiB |
| RUM - start view | 487.49 KiB | 482.90 KiB | -4.59 KiB |
| Logs - log message | 99.97 KiB | 91.23 KiB | -8.74 KiB |

🔗 RealWorld

@datadog-datadog-prod-us1

datadog-datadog-prod-us1 bot commented Feb 18, 2026

⚠️ Tests

⚠️ Other Violations

🧪 2 Tests failed

serializeNodeAsChange for snapshotted documents for a simple document when the <html> element's privacy level is ALLOW matches the snapshot from Chrome 63.0.3239.84 (Windows 10)
Error: Timeout - Async function did not complete within 5000ms (set by jasmine.DEFAULT_TIMEOUT_INTERVAL)
    at <Jasmine>
serializeNodeAsChange for snapshotted documents for a simple document when the <html> element's privacy level is MASK_UNLESS_ALLOWLISTED matches the snapshot when the allowlist is empty from Safari 12.1.2 (Mac OS 10.14.6)
Error: Timeout - Async function did not complete within 5000ms (set by jasmine.DEFAULT_TIMEOUT_INTERVAL) in node_modules/jasmine-core/lib/jasmine-core/jasmine.js (line 8638)
<Jasmine>

ℹ️ Info

No other issues found (see more)

❄️ No new flaky tests detected

🎯 Code Coverage (details)
Patch Coverage: 46.51%
Overall Coverage: 77.13% (-0.32%)

🔗 Commit SHA: 08a0506


Nice work on this — the diff engine design looks solid, and the REPLACE/MERGE/APPEND strategy categorization is clean. I've been prototyping a parallel implementation against our staging backend and wanted to share some observations.

High severity

1. No post-assembly strip (~500-650B wasted per view_update)

The diff in viewDiff.ts operates on the raw RawRumViewEvent before assembly. Assembly then adds these fields to every view_update identically to full view events:

  • usr, context, connectivity (~150-350B conditional)
  • _dd.configuration (~143B)
  • ddtags (~88B), service+version (~45B), source (~9B)
  • display.viewport (~27B), view.url+view.referrer (~44B)
  • _dd.sdk_name, _dd.format_version, session.type (~12B)
  • feature_flags when unchanged (~40-400B depending on flag count)

These fields have REPLACE semantics — they don't change between updates. In my prototype I added a second pass in startRumBatch.ts that stores the last assembled VIEW per view ID and strips unchanged REPLACE-semantics fields from subsequent view_updates (constructing a new object, not mutating). Steady-state savings: ~523B/VU base + ~56B per flag-set change. Without this, most of the bandwidth savings from the diff engine are negated by assembly overhead.
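
A minimal sketch of the second pass described above, under stated assumptions: `AssembledEvent`, `STRIPPABLE_FIELDS`, and `stripUnchangedFields` are hypothetical names, the field list is only a sample of the fields listed in this comment, and equality is checked with `JSON.stringify`, which is sensitive to key order.

```typescript
// Hypothetical helper; the field list and event shape are assumptions for illustration.
type AssembledEvent = { [key: string]: unknown }

const STRIPPABLE_FIELDS = ['usr', 'context', 'connectivity', 'ddtags', 'service', 'version', 'source']

// Builds a new view_update object without the strippable fields whose value is
// identical to the one in the last assembled full view. Neither input is mutated.
function stripUnchangedFields(lastFullView: AssembledEvent, viewUpdate: AssembledEvent): AssembledEvent {
  const stripped: AssembledEvent = {}
  for (const key of Object.keys(viewUpdate)) {
    const unchanged =
      STRIPPABLE_FIELDS.indexOf(key) !== -1 &&
      JSON.stringify(viewUpdate[key]) === JSON.stringify(lastFullView[key])
    if (!unchanged) {
      stripped[key] = viewUpdate[key]
    }
  }
  return stripped
}

const lastFullView: AssembledEvent = {
  type: 'view',
  service: 'shop',
  ddtags: 'env:prod',
  usr: { id: '42' },
  view: { id: 'abc', action: { count: 1 } },
}
const assembledUpdate: AssembledEvent = {
  type: 'view_update',
  service: 'shop',
  ddtags: 'env:prod',
  usr: { id: '43' },
  view: { id: 'abc', action: { count: 2 } },
}

// service and ddtags are unchanged and dropped; usr changed and is kept.
const strippedUpdate = stripUnchangedFields(lastFullView, assembledUpdate)
console.log(Object.keys(strippedUpdate).join(','))
```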

2. No periodic full VIEW refresh (no recovery from dropped events)

If any view_update is lost (network failure, batch timeout, intake hiccup), the backend's merged state drifts for the entire view lifetime with no self-healing. For SPAs where views can live for minutes or hours, this is a persistent silent corruption risk.

Suggestion: force a full view event every N updates or every T seconds (e.g., every 10 updates or 60s). Acts as a recovery checkpoint. The backend receives a complete snapshot and resets its merge state from that point.
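
As a rough sketch of that policy (hypothetical names, not code from this PR), a small tracker can decide per event whether a full view or a view_update should go out. `CheckpointPolicy` and `createCheckpointTracker` are assumptions made for this example, and the thresholds used below are deliberately small.

```typescript
// Hypothetical policy: send a full view every `maxUpdates` updates or `maxAgeMs` milliseconds.
interface CheckpointPolicy {
  maxUpdates: number
  maxAgeMs: number
}

function createCheckpointTracker(policy: CheckpointPolicy) {
  let updatesSinceFullView = 0
  let lastFullViewAt: number | undefined

  // Returns the kind of event to send at time `now` (in milliseconds).
  return function nextEventKind(now: number): 'view' | 'view_update' {
    const checkpointDue =
      lastFullViewAt === undefined ||
      updatesSinceFullView >= policy.maxUpdates ||
      now - lastFullViewAt >= policy.maxAgeMs
    if (checkpointDue) {
      updatesSinceFullView = 0
      lastFullViewAt = now
      return 'view'
    }
    updatesSinceFullView += 1
    return 'view_update'
  }
}

const nextEventKind = createCheckpointTracker({ maxUpdates: 3, maxAgeMs: 60000 })

// First event is a full view, then three updates, then the count-based checkpoint.
const kinds = [0, 1000, 2000, 3000, 4000].map((now) => nextEventKind(now))
// More than 60 seconds after the last full view: the time-based checkpoint fires.
const afterLongPause = nextEventKind(70000)
console.log(kinds.join(','), afterLongPause)
```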

3. No full VIEW on view end

When is_active goes false, the current diff sends a view_update containing only the changed fields from the last snapshot. If any earlier updates were lost, the final terminal state in the backend is incomplete.

Suggestion: always emit a full view event (not a diff) when is_active: false. This guarantees a complete final snapshot regardless of any prior losses.

Medium severity

4. _dd.page_states APPEND semantics are lossy on drop

If a view_update carrying new page_states entries is dropped, those foreground/background transitions are permanently unrecoverable — subsequent appends only send elements added after the last sent state. Given page_states is used for foreground time calculations and session replay stitching, this is a data quality risk.

One option: skip page_states entirely from view_update (let them be captured in periodic full VIEW refreshes as in point 2). Another option: always send the full page_states array on change (REPLACE semantics instead of APPEND), since the array is typically small.
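
The loss can be reproduced with a small simulation. This is a sketch with hypothetical names (`PageState`, `appendDelta`, `mergeAppend`) and a simplified entry shape; it only assumes that the backend concatenates APPEND deltas and overwrites on REPLACE.

```typescript
// Simplified, hypothetical shape of a page_states entry.
interface PageState {
  state: string
  start: number
}

// APPEND on the client: send only the entries added since the last sent array.
function appendDelta(lastSent: PageState[], current: PageState[]): PageState[] {
  return current.slice(lastSent.length)
}

// APPEND on the backend (assumed): concatenate the delta to the stored array.
function mergeAppend(stored: PageState[], delta: PageState[]): PageState[] {
  return stored.concat(delta)
}

const statesV1: PageState[] = [{ state: 'active', start: 0 }]
const statesV2: PageState[] = statesV1.concat([{ state: 'hidden', start: 500 }])
const statesV3: PageState[] = statesV2.concat([{ state: 'active', start: 900 }])

// The update carrying the V1 -> V2 delta is dropped, so the backend only merges V2 -> V3.
const backendWithAppend = mergeAppend(statesV1, appendDelta(statesV2, statesV3))

// With REPLACE, the V3 update carries the whole array and overwrites the stored one.
const backendWithReplace = statesV3

console.log(backendWithAppend.map((entry) => entry.state).join(','))
console.log(backendWithReplace.map((entry) => entry.state).join(','))
```

In the APPEND case the `hidden` transition never reaches the backend, while the REPLACE case recovers it on the next update that touches the array.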

5. feature_flags not covered by the diff

featureFlagContext.ts adds feature_flags to every view_update via assembly hooks — outside the diff engine's scope. They're always included even when unchanged. For customers with many flags this adds meaningful bytes per event. If stripping (point 1) is added, feature_flags would be handled there naturally.

Low severity

6. No snapshot cleanup on view end

diffTracker.reset() fires on new view.id, but completed views' snapshots persist until the tracker is overwritten. Minor memory concern for long-running SPAs with many navigations. Explicitly clearing on is_active: false would bound memory to active views only.
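
One possible shape for that cleanup, as a sketch only: `ViewSnapshot` and `createSnapshotStore` are hypothetical and do not mirror the actual diffTracker API.

```typescript
// Hypothetical snapshot store keyed by view ID.
type ViewSnapshot = { view: { id: string; is_active: boolean } }

function createSnapshotStore() {
  const snapshots = new Map<string, ViewSnapshot>()
  return {
    // Keeps the latest snapshot of active views and drops it as soon as the view ends.
    record(event: ViewSnapshot): void {
      if (event.view.is_active) {
        snapshots.set(event.view.id, event)
      } else {
        snapshots.delete(event.view.id)
      }
    },
    size(): number {
      return snapshots.size
    },
  }
}

const store = createSnapshotStore()
store.record({ view: { id: 'view-a', is_active: true } })
store.record({ view: { id: 'view-b', is_active: true } })
// view-a ends: its snapshot is released immediately.
store.record({ view: { id: 'view-a', is_active: false } })
console.log(store.size())
```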


What looks good:

  • REPLACE for custom_timings (correct — whole object semantics)
  • batch.add() instead of upsert() for view_updates (each delta must be independently routable)
  • beforeSend protection (consistent with view)
  • Fallback to full view on diff failure
  • Empty diff = no event emitted
  • Deep clone in diffTracker (safe from mutation)
  • document_version always required in view_update

@mormubis mormubis force-pushed the adlrb/partial-view branch 4 times, most recently from 1950458 to 4ca0710 Compare March 12, 2026 11:26
@mormubis mormubis force-pushed the adlrb/partial-view branch 3 times, most recently from 40aee0f to c60c158 Compare March 18, 2026 11:34
@mormubis
Contributor Author

/to-staging

@gh-worker-devflow-routing-ef8351

gh-worker-devflow-routing-ef8351 bot commented Mar 18, 2026

View all feedbacks in Devflow UI.

2026-03-18 11:55:05 UTC ℹ️ Start processing command /to-staging


2026-03-18 11:55:11 UTC ℹ️ Branch Integration: starting soon, merge expected in approximately 0s (p90)

Commit c60c158a5c will soon be integrated into staging-12.


2026-03-18 12:14:47 UTC ℹ️ Branch Integration: this commit was successfully integrated

Commit c60c158a5c has been merged into staging-12 in merge commit f3d007064e.

Check out the triggered DDCI request.

If you need to revert this integration, you can use the following command: /code revert-integration -b staging-12

gh-worker-dd-mergequeue-cf854d bot added a commit that referenced this pull request Mar 18, 2026
Integrated commit sha: c60c158

Co-authored-by: mormubis <adrian.delarosa@datadoghq.com>
@mormubis mormubis force-pushed the adlrb/partial-view branch from c60c158 to 52a76bb Compare March 30, 2026 09:04
@mormubis
Contributor Author

/to-staging

@gh-worker-devflow-routing-ef8351

gh-worker-devflow-routing-ef8351 bot commented Mar 30, 2026

View all feedbacks in Devflow UI.

2026-03-30 09:53:33 UTC ℹ️ Start processing command /to-staging


2026-03-30 09:53:40 UTC ℹ️ Branch Integration: starting soon, merge expected in approximately 0s (p90)

Commit 52a76bbd97 will soon be integrated into staging-14.


2026-03-30 10:15:17 UTC ℹ️ Branch Integration: this commit was successfully integrated

Commit 52a76bbd97 has been merged into staging-14 in merge commit 1cf29217c5.

Check out the triggered DDCI request.

If you need to revert this integration, you can use the following command: /code revert-integration -b staging-14

gh-worker-dd-mergequeue-cf854d bot added a commit that referenced this pull request Mar 30, 2026
Integrated commit sha: 52a76bb

Co-authored-by: mormubis <adrian.delarosa@datadoghq.com>
Comment on lines +165 to +206
const viewId = serverRumEvent.view.id

// New view started
if (viewId !== currentViewId) {
  currentViewId = viewId
  lastSentView = serverRumEvent
  viewUpdatesSinceCheckpoint = 0
  batch.upsert(serverRumEvent, viewId)
  return
}

// View ended (is_active: false)
if (!(serverRumEvent.view as any).is_active) {
  currentViewId = undefined
  lastSentView = undefined
  viewUpdatesSinceCheckpoint = 0
  batch.upsert(serverRumEvent, viewId)
  return
}

// Checkpoint: every N intermediate updates, send a full view (unless disabled by flag)
if (!isExperimentalFeatureEnabled(ExperimentalFeature.PARTIAL_VIEW_UPDATES_NO_CHECKPOINT)) {
  viewUpdatesSinceCheckpoint += 1
  if (viewUpdatesSinceCheckpoint >= PARTIAL_VIEW_UPDATE_CHECKPOINT_INTERVAL) {
    viewUpdatesSinceCheckpoint = 0
    lastSentView = serverRumEvent
    batch.upsert(serverRumEvent, viewId)
    return
  }
}

// Intermediate update: compute diff and send view_update.
// Note: view_update events are created here, post-assembly, and go directly to batch.add().
// They intentionally bypass RAW_RUM_EVENT_COLLECTED → assembly → RUM_EVENT_COLLECTED, which
// means they skip beforeSend entirely. view_update is an internal bandwidth optimization,
// not a customer-visible event type, and not modifiable via beforeSend.
if (!lastSentView) {
  // Safety fallback (should not happen in practice)
  lastSentView = serverRumEvent
  batch.upsert(serverRumEvent, viewId)
  return
}
Member

suggestion: all this could be simplified with a single condition shouldSendViewUpdate:

const shouldSendViewUpdate = lastSentView && viewId === currentViewId && serverRumEvent.view.is_active && viewUpdatesSinceCheckpoint < PARTIAL_VIEW_UPDATE_CHECKPOINT_INTERVAL

if (shouldSendViewUpdate) {
  viewUpdatesSinceCheckpoint += 1
  const diff = ...
  ...
} else {
  viewUpdatesSinceCheckpoint = 0
  batch.upsert(serverRumEvent, viewId)
}

Contributor Author

I prefer early returns. Each path (new view, view end, checkpoint, diff) is readable on its own. I'm not sure the single condition improves it. What do you think?

Member

Well, a bit of logic is repeated in your current code, which makes things unnecessarily lengthy. Also, there is this case:

if (!lastSentView) {
  // Safety fallback (should not happen in practice)
  lastSentView = serverRumEvent
  batch.upsert(serverRumEvent, viewId)
  return
}

which seems incomplete because it doesn't reset viewUpdatesSinceCheckpoint. It's probably fine, as you stated that it shouldn't happen in practice, but then either it can happen and we should make sure it behaves correctly, or it can't and we should remove it.

It feels like you are struggling because of imprecise typings. How can we improve the schema so types are better defined?

Contributor Author

Removed the fallback. Replaced as any with as RumViewEvent since is_active is already typed there. Not sure if there's a better way to improve the schema to avoid the cast entirely. What do you think?
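
One direction that might avoid the cast, sketched with simplified stand-in types: if the assembled event type is a union discriminated by `type`, a type predicate narrows it and `is_active` becomes accessible directly. `ViewEvent`, `ActionEvent`, and `isViewEvent` below are hypothetical and much smaller than the generated schema types, so this only shows the technique, not a drop-in change.

```typescript
// Hypothetical, simplified event types; the real generated schema types are larger.
interface ViewEvent {
  type: 'view'
  view: { id: string; is_active: boolean }
}
interface ActionEvent {
  type: 'action'
  view: { id: string }
}
type AssembledRumEvent = ViewEvent | ActionEvent

// Type predicate: narrows the union using the `type` discriminant.
function isViewEvent(event: AssembledRumEvent): event is ViewEvent {
  return event.type === 'view'
}

function describeEvent(event: AssembledRumEvent): string {
  if (isViewEvent(event)) {
    // Narrowed to ViewEvent: is_active is accessible without a cast.
    return event.view.is_active ? 'active view' : 'ended view'
  }
  return 'not a view'
}

const endedView = describeEvent({ type: 'view', view: { id: 'abc', is_active: false } })
const actionEvent = describeEvent({ type: 'action', view: { id: 'abc' } })
console.log(endedView, actionEvent)
```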

mormubis added 4 commits April 6, 2026 16:08
- Add PARTIAL_VIEW_UPDATES to ExperimentalFeature enum
- Add VIEW_UPDATE to RumEventType, RawRumViewUpdateEvent, and RawRumEvent union
- Add viewDiff.ts: isEqual and diffMerge utilities implementing MERGE / REPLACE /
  APPEND strategies for computing minimal diffs between assembled view events
When partial_view_updates is enabled, startRumBatch intercepts assembled view
events and sends view_update diffs instead of full views for intermediate updates.

Key design: diff runs post-assembly so beforeSend always sees full view events
(backward-compatible). view_update events intentionally bypass the assembly
pipeline — they are a bandwidth optimization, not a customer-visible event type.

- computeAssembledViewDiff: diffs two assembled view events, always including
  required routing fields (view.id, view.url, _dd.document_version, format_version)
- Routing state machine: handles new view / view-end / checkpoint / diff cases
- view-end events (is_active: false) always sent as full view
- Full view checkpoint every 100 updates for backend recovery
- Exclude view_update from trackEventCounts and assembly beforeSend guard
- Add E2E tests covering all routing cases
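
To illustrate the "always including required routing fields" part of the commit message, here is a minimal sketch. `ViewLike` and `withRoutingFields` are hypothetical names, and placing format_version under `_dd` is an assumption made for this example; the actual computeAssembledViewDiff may be structured differently.

```typescript
// Hypothetical, simplified shapes for illustration.
type Section = { [key: string]: unknown }
interface ViewLike {
  view: { id: string; url: string; [key: string]: unknown }
  _dd: { document_version: number; format_version: number; [key: string]: unknown }
}

// Copies the routing fields from the current event into the diff, whether or not they changed.
function withRoutingFields(diff: { view?: Section; _dd?: Section }, current: ViewLike): ViewLike {
  return {
    ...diff,
    view: { ...diff.view, id: current.view.id, url: current.view.url },
    _dd: {
      ...diff._dd,
      document_version: current._dd.document_version,
      format_version: current._dd.format_version,
    },
  }
}

const currentEvent: ViewLike = {
  view: { id: 'abc', url: 'https://example.com/', action: { count: 2 } },
  _dd: { document_version: 5, format_version: 2 },
}

// The raw diff only carries the changed counter; the routing fields are added on top.
const routedUpdate = withRoutingFields({ view: { action: { count: 2 } } }, currentEvent)
console.log(JSON.stringify(routedUpdate))
```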
@mormubis mormubis force-pushed the adlrb/partial-view branch from 75e714c to 5e79702 Compare April 6, 2026 14:10
@mormubis mormubis requested a review from BenoitZugmeyer April 8, 2026 13:24