
Arpafaucon commented Jan 19, 2026

What does this PR do?

Add gauges for

  • incoming compressed body size
  • incoming uncompressed body size
  • response body size
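For reviewers, here is a minimal sketch of what the three gauges measure for a single exchange. It assumes a hypothetical `MetricSink` trait in place of lading's actual metrics plumbing, hypothetical metric names, and a caller-supplied decoder (a toy run-length decoder stands in for gzip/zstd). It is not the code in this PR.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a metrics backend (not lading's real API).
trait MetricSink {
    fn gauge(&mut self, name: &'static str, value: f64);
}

/// Sink that remembers the last value written to each gauge.
#[derive(Default)]
struct LastValueSink {
    values: HashMap<&'static str, f64>,
}

impl MetricSink for LastValueSink {
    fn gauge(&mut self, name: &'static str, value: f64) {
        self.values.insert(name, value);
    }
}

/// Records the three body-size gauges for one request/response exchange
/// and returns the decoded request body. Metric names are hypothetical.
fn record_body_sizes<S, D, E>(
    sink: &mut S,
    wire_body: &[u8],
    decode: D,
    response_body: &[u8],
) -> Result<Vec<u8>, E>
where
    S: MetricSink,
    D: FnOnce(&[u8]) -> Result<Vec<u8>, E>,
{
    // Incoming body exactly as received, before decompression.
    sink.gauge("http_request_body_compressed_bytes", wire_body.len() as f64);
    let decoded = decode(wire_body)?;
    // Incoming body after decompression.
    sink.gauge("http_request_body_uncompressed_bytes", decoded.len() as f64);
    // Body of the response returned to the client.
    sink.gauge("http_response_body_bytes", response_body.len() as f64);
    Ok(decoded)
}

/// Toy decoder standing in for gzip/zstd: each (count, byte) pair expands
/// to `count` copies of `byte`.
fn rle_decode(input: &[u8]) -> Result<Vec<u8>, String> {
    if input.len() % 2 != 0 {
        return Err(format!("odd-length RLE body ({} bytes)", input.len()));
    }
    Ok(input
        .chunks(2)
        .flat_map(|pair| std::iter::repeat(pair[1]).take(pair[0] as usize))
        .collect())
}

fn main() {
    let mut sink = LastValueSink::default();
    let wire = [5u8, b'a', 3, b'b']; // decodes to "aaaaabbb"
    let decoded = record_body_sizes(&mut sink, &wire, rle_decode, b"{}").expect("valid RLE body");
    assert_eq!(decoded, b"aaaaabbb".to_vec());
    let mut names: Vec<_> = sink.values.keys().copied().collect();
    names.sort();
    for name in names {
        println!("{name} = {}", sink.values[name]);
    }
}
```

The compressed size is recorded before decoding, so a body that fails to decode still reports its wire size.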

Charted course

As a newcomer, I need to write this down somewhere ^^

  • temporary version of the PR with all histogram! calls commented out
    • rationale: George pointed out that downstream services don't ingest histogram metrics yet
  • get the container built by CI
  • test it on my (to-be-created) SMP playground

Motivation

See https://datadoghq.atlassian.net/wiki/spaces/SMP/pages/5915967832/Target+Invariants
We want to investigate how lading can do more to check the correctness of the program under test.
More specifically, this is the first step towards supporting target invariant checks on the target's outputs.

Related issues

Not sure where these are tracked; help needed.

@Arpafaucon Arpafaucon requested a review from a team as a code owner January 19, 2026 10:52
@Arpafaucon Arpafaucon self-assigned this Jan 19, 2026
@Arpafaucon Arpafaucon marked this pull request as draft January 27, 2026 09:34
@Arpafaucon Arpafaucon force-pushed the greg/http_payload_size_metrics branch from 6d5befa to 20e625a January 27, 2026 11:50

Arpafaucon commented Jan 27, 2026

Test run done today. Command run:

aws-vault exec smp -- bash bin/submit_to_cluster --team-id 60693115 --baseline 7.74.1 --comparison 7.75.0 --path-to-experiments ./experiments/regression/agentsimp --tags "purpose=test-lading" --replicas 2

where agentsimp is a copy of agent with

  • all checks except quality_gate_idle removed
  • lading > version in config.yaml set to sha-20e625a87ecedad9bd4b8a5ad6ef9ad01f122993 (the current tip of this branch)

Result:

+ cat /var/folders/dl/2n3g00gs4n9d0333yr_jk8x00000gp/T/tmp.RN2Ps0Kt4x/outputs/report.md
# Regression Detector Results

[Metrics dashboard](https://app.datadoghq.com/dashboard/ykh-ua8-vcu/smp-regression-detector-capture-data----refined?fromUser=true&refresh_mode=paused&tpl_var_run-id%5B0%5D=6c8e61d8-3da0-4df3-a4dd-61472ebcc97f&view=spans&from_ts=1769531584000&to_ts=1769531594000&live=false)
[Target profiles](https://app.datadoghq.com/profiling/explorer?query=env%3Asingle-machine-performance%20service%3Adatadog-agent%20job_id%3A6c8e61d8-3da0-4df3-a4dd-61472ebcc97f&agg_m=count&agg_m_source=base&agg_t=count&fromUser=false&viz=stream&start=1769524384000&end=1769535194000&paused=true)
Run ID: 6c8e61d8-3da0-4df3-a4dd-61472ebcc97f

Baseline: 7.74.1
Comparison: 7.75.0


## Optimization Goals: ✅ No significant changes detected


<details>
<summary><h2>
Fine details of change detection per experiment
</h2></summary>

| perf | experiment        | goal               | Δ mean % | Δ mean % CI    | trials | links                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                     |
|------|-------------------|--------------------|----------|----------------|--------|-------|
|| quality_gate_idle | memory utilization | +0.22    | [-0.10, +0.54] | 1      | [Logs](https://app.datadoghq.com/logs?query=experiment%3Aquality_gate_idle%20run_id%3A6c8e61d8-3da0-4df3-a4dd-61472ebcc97f&agg_m=count&agg_m_source=base&agg_q=%40span.url&agg_q_source=base&agg_t=count&fromUser=true&index=single-machine-performance-target-logs&messageDisplay=inline&refresh_mode=paused&storage=hot&stream_sort=time%2Cdesc&top_n=100&top_o=top&viz=stream&x_missing=true&from_ts=1769524384000&to_ts=1769535194000&live=false) [bounds checks dashboard](https://app.datadoghq.com/dashboard/vz3-jd5-bdi?fromUser=true&refresh_mode=paused&tpl_var_experiment%5B0%5D=quality_gate_idle&tpl_var_job_id%5B0%5D=6c8e61d8-3da0-4df3-a4dd-61472ebcc97f&view=spans&from_ts=1769531584000&to_ts=1769531594000&live=false) |
</details>


<details>
<summary><h2>
Bounds Checks: ✅ Passed
</h2></summary>

| perf | experiment        | bounds_check_name  | replicates_passed | observed_value     | links                                                                                                                                                                                                                                                                               |
|------|-------------------|--------------------|-------------------|--------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
|| quality_gate_idle | intake_connections | 2/2               | 3 = 3              | [bounds checks dashboard](https://app.datadoghq.com/dashboard/vz3-jd5-bdi?fromUser=true&refresh_mode=paused&tpl_var_experiment%5B0%5D=quality_gate_idle&tpl_var_job_id%5B0%5D=6c8e61d8-3da0-4df3-a4dd-61472ebcc97f&view=spans&from_ts=1769531584000&to_ts=1769531594000&live=false) |
|| quality_gate_idle | memory_usage       | 2/2               | 154.06MiB ≤ 175MiB | [bounds checks dashboard](https://app.datadoghq.com/dashboard/vz3-jd5-bdi?fromUser=true&refresh_mode=paused&tpl_var_experiment%5B0%5D=quality_gate_idle&tpl_var_job_id%5B0%5D=6c8e61d8-3da0-4df3-a4dd-61472ebcc97f&view=spans&from_ts=1769531584000&to_ts=1769531594000&live=false) |
</details>


<details>
<summary><h2>
Explanation
</h2></summary>

**Confidence level:** 90.00%
**Effect size tolerance:** |Δ mean %| ≥ 5.00%

Performance changes are noted in the **perf** column of each table:

* ✅ = significantly better comparison variant performance
* ❌ = significantly worse comparison variant performance
* ➖ = no significant change in performance

A regression test is an A/B test of target performance in a repeatable rig, where "performance" is measured as "comparison variant minus baseline variant" for an optimization goal (e.g., ingress throughput). Due to intrinsic variability in measuring that goal, we can only estimate its mean value for each experiment; we report uncertainty in that value as a 90.00% confidence interval denoted "Δ mean % CI".

For each experiment, we decide whether a change in performance is a "regression" -- a change worth investigating further -- if all of the following criteria are true:

1. Its estimated |Δ mean %| ≥ 5.00%, indicating the change is big enough to merit a closer look.

2. Its 90.00% confidence interval "Δ mean % CI" does not contain zero, indicating that *if our statistical model is accurate*, there is at least a 90.00% chance there is a difference in performance between baseline and comparison variants.

3. Its configuration does not mark it "erratic".
</details>
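The three criteria above can be expressed as a small predicate. This is a minimal sketch assuming the CI bounds are inclusive and that the "erratic" flag comes from the experiment configuration; `ExperimentResult` and `is_regression` are hypothetical names, not the Regression Detector's actual code.

```rust
/// One row of the "Fine details" table. Hypothetical type, not the
/// Regression Detector's own data model.
struct ExperimentResult {
    delta_mean_pct: f64,
    ci_low_pct: f64,
    ci_high_pct: f64,
    erratic: bool,
}

/// Applies the three regression criteria listed in the report.
fn is_regression(r: &ExperimentResult, effect_size_tolerance_pct: f64) -> bool {
    // 1. |Δ mean %| is at least the effect size tolerance.
    let big_enough = r.delta_mean_pct.abs() >= effect_size_tolerance_pct;
    // 2. The confidence interval does not contain zero (bounds assumed inclusive).
    let ci_excludes_zero = r.ci_low_pct > 0.0 || r.ci_high_pct < 0.0;
    // 3. The experiment is not marked "erratic" in its configuration.
    big_enough && ci_excludes_zero && !r.erratic
}

fn main() {
    // quality_gate_idle row from the table above; `erratic: false` is an assumption.
    let idle = ExperimentResult {
        delta_mean_pct: 0.22,
        ci_low_pct: -0.10,
        ci_high_pct: 0.54,
        erratic: false,
    };
    let tolerance_pct = 5.0; // "Effect size tolerance: |Δ mean %| ≥ 5.00%"
    // prints "quality_gate_idle regression: false"
    println!("quality_gate_idle regression: {}", is_regression(&idle, tolerance_pct));
}
```

For this run both criterion 1 (0.22 < 5.00) and criterion 2 (the CI straddles zero) fail, which matches the "No significant changes detected" verdict.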

My understanding at this point is that:

  • we don't break anything (no significant changes detected, and all bounds checks pass)
  • we still have the histogram situation to resolve one way or another
