instrumentation telemetry: validate session id headers#6510
instrumentation telemetry: validate session id headers#6510
Conversation
|
As per https://dd.slack.com/archives/D032MDTSCR1/p1773765779731369 We need to assert:
|
Co-authored-by: Munir Abdinur <munir.abdinur@datadoghq.com>
|
Remaining gaps I believe are: Gap 1: The test should assert that for every event where DD-Session-ID != root_session_id (i.e. every child event), DD-Root-Session-ID must be present not just that at least one event has it globally. Gap 2 (exec vs fork): Same validation function for both test cases, exec propagation via env vars isn't distinctly tested. |
Both cases should be covered by the current test. We can discuss it in our next sync |
|
|
khanayan123
left a comment
There was a problem hiding this comment.
Thanks for addressing the comments, tests LGTM
This comment has been minimized.
This comment has been minimized.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 50bdced3a6
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| """ | ||
| # Use lifecycle events only; metrics and log events from lib-datadog can contain | ||
| # runtime/session_ids that do not map to tracer-generated telemetry. | ||
| telemetry_data = list(interfaces.library.get_lifecycle_events()) |
There was a problem hiding this comment.
Scope lifecycle checks to the spawn_child-triggered request
This validator reads all lifecycle telemetry (interfaces.library.get_lifecycle_events()) without filtering to the /spawn_child request that the setup just issued, so it can pass on unrelated startup/shutdown events even when child-process behavior is broken (or /spawn_child returns 404). Because the assertions are not request-scoped, the test does not reliably prove cross-process session header propagation.
Useful? React with 👍 / 👎.
| bool do_crash = strcmp(crash_str, "true") == 0; | ||
| bool use_fork = strcmp(fork_str, "true") == 0; |
There was a problem hiding this comment.
Parse /spawn_child booleans case-insensitively in cpp_nginx
The cpp_nginx handler compares crash and fork to "true" case-sensitively, but the new test setup passes Python booleans as query params (True/False). That makes fork=True evaluate as false here, so the fork path is silently skipped and crash requests are ignored; this invalidates fork-specific coverage for cpp_nginx.
Useful? React with 👍 / 👎.
Motivation
Enable telemetry session ID header tests (
DD-Session-ID,DD-Root-Session-ID,DD-Parent-Session-ID) across process forks per the Stable Service Instance Identifier RFC.Changes
sleep,crash,fork. Uses fork when supported, exec otherwise. Runtimes without fork (Java, Go, PHP, .NET) return 400 forfork=true.test_session_id_headers_across_forksandtest_session_id_headers_across_spawnedvalidate session ID headers in lifecycle telemetry. Usesget_lifecycle_events()to avoid lib-datadog metrics/log events. Asserts: DD-Session-ID = runtime_id, one root per app instance, at least two runtimes (parent + child).get_lifecycle_events()added to filter lifecycle events.docs/weblog/end-to-end_weblog.md.missing_featurefor other weblogs and non-fork runtimes.Workflow
🚀 Once your PR is reviewed and the CI is green, you can merge it!
🛟 #apm-shared-testing 🛟
SDK Implementations
Nodejs: DataDog/dd-trace-js#7821
Go: DataDog/dd-trace-go#4574
Java: DataDog/dd-trace-java#10914