[bpf-ci-bot] sched_ext rt_stall test still flaky: measurement includes pre-sleep CPU time #458


## Summary

The sched_ext `rt_stall` selftest remains flaky in CI despite the synchronization fix in commit 0b82cc331d2e. The test fails because it measures **total** CPU time since fork rather than the delta during the measurement window, causing pre-sleep CPU accumulation to skew the ratio below the 4% threshold.

## Failure Details
- **Test / Component:** selftests/sched_ext rt_stall
- **Frequency:** Most sched_ext CI runs on x86_64 (observed in 3+ independent PRs over 3 days)
- **Failure mode:** Flaky — ratio drops to 3.0-3.9% instead of expected ~5%
- **Affected architectures:** x86_64
- **CI runs observed:**
  - https://github.com/kernel-patches/bpf/actions/runs/22834153226 (bpf-next_test)
  - https://github.com/kernel-patches/bpf/actions/runs/22803463271 (fix constant blinding bypass)
  - https://github.com/kernel-patches/bpf/actions/runs/22788551155 (perf_link: avoid failures)

## Root Cause Analysis

The `rt_stall` test verifies that the sched_ext deadline server prevents RT tasks from starving EXT/FAIR tasks. It forks two children pinned to the same CPU — one EXT/FAIR and one SCHED_FIFO — then measures their CPU time ratio after `sleep(5)`.

Commit 0b82cc331d2e added pipe-based synchronization so children complete their setup before the parent starts `sleep(RUN_TIME)`. However, children start busy-looping **immediately after** `signal_ready()`, while the parent still needs to process both pipe reads before calling `sleep()`. During this gap, both children accumulate CPU time — with the RT child dominating.

The measurement reads **total** CPU time since fork from `/proc/<pid>/stat` (`utime + stime`), which includes this pre-sleep time. Because the RT child dominates the gap, the RT term of the ratio is inflated and the EXT/FAIR share drops:

| Run | Failing iteration | EXT/FAIR | RT | Ratio | Note |
|-----|-------------------|----------|----|-------|------|
| bpf-next_test | i=2 (FAIR) | 0.180s | 4.740s | 3.66% | |
| constant blinding | i=3 (EXT) | 0.180s | **5.690s** | 3.07% | RT > RUN_TIME proves pre-sleep accumulation |
| perf_link | i=3 (EXT) | 0.190s | 4.740s | 3.85% | |

The RT task cannot accumulate **5.69s** of CPU time inside a 5-second sleep window on a single CPU, so at least ~0.69s of that RT time was accumulated outside the measurement window, which matches the pre-sleep gap described above.
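The arithmetic behind the table can be checked with a small sketch. It assumes the ratio is computed as EXT / (EXT + RT), which is inferred from the reported figures rather than quoted from the test source. The helper names are hypothetical, and `RUN_TIME` is taken to be 5 seconds as stated above.

```c
/* Assumed sleep window in seconds, matching the sleep(5) described above. */
#define RUN_TIME 5.0

/*
 * Hypothetical helper: share of CPU time that went to the EXT/FAIR
 * child, in percent. The EXT / (EXT + RT) form is an assumption that
 * reproduces the ratios listed in the table.
 */
static double ext_share_percent(double ext_sec, double rt_sec)
{
	return 100.0 * ext_sec / (ext_sec + rt_sec);
}

/*
 * Hypothetical helper: lower bound on the RT CPU time (seconds) that
 * must have been accrued outside the sleep window, since one CPU
 * cannot deliver more than RUN_TIME seconds inside it.
 */
static double rt_excess_seconds(double rt_sec)
{
	return rt_sec > RUN_TIME ? rt_sec - RUN_TIME : 0.0;
}
```

For the constant blinding run, `ext_share_percent(0.180, 5.690)` gives about 3.07% and `rt_excess_seconds(5.690)` gives about 0.69s. For the two runs with RT = 4.740s the excess bound is 0, so those totals alone do not prove how much pre-sleep time was included.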

In the three runs above the failure occurred in a later iteration (i=2 or i=3), likely because the parent has more work to do between iterations (destroying and re-attaching the sched_ext link), which gives the children more time to accumulate pre-sleep CPU time.

**Relevant code:** `tools/testing/selftests/sched_ext/rt_stall.c:sched_stress_test()`
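For illustration, a total-since-fork reading of the kind described above can be sketched as follows. This is not the selftest's `get_process_runtime()`. The helper name `read_total_cpu_seconds()` is hypothetical, and the sketch assumes the `/proc/<pid>/stat` layout documented in proc(5), where `utime` is field 14 and `stime` is field 15, both in clock ticks.

```c
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (an assumption, not the selftest's code):
 * return utime + stime of `pid` in seconds, or -1.0 on error.
 * The value counts from process creation, not from any later
 * measurement start.
 */
static double read_total_cpu_seconds(pid_t pid)
{
	char path[64], buf[1024];
	unsigned long utime, stime;
	long ticks_per_sec = sysconf(_SC_CLK_TCK);
	size_t n;
	FILE *f;
	char *p;

	if (ticks_per_sec <= 0)
		return -1.0;

	snprintf(path, sizeof(path), "/proc/%d/stat", (int)pid);
	f = fopen(path, "r");
	if (!f)
		return -1.0;
	n = fread(buf, 1, sizeof(buf) - 1, f);
	fclose(f);
	buf[n] = '\0';

	/* comm (field 2) may contain spaces, so resume after the last ')'. */
	p = strrchr(buf, ')');
	if (!p)
		return -1.0;

	/* Skip state (3) and fields 4..13, then read utime (14) and stime (15). */
	if (sscanf(p + 1,
		   " %*c %*d %*d %*d %*d %*d %*u %*lu %*lu %*lu %*lu %lu %lu",
		   &utime, &stime) != 2)
		return -1.0;

	return (double)(utime + stime) / (double)ticks_per_sec;
}
```

Because these counters start at process creation, a single reading taken after `sleep()` already contains whatever the child consumed before the parent entered the sleep.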

## Proposed Fix

Take before/after snapshots of CPU time around the `sleep(RUN_TIME)` window and compute deltas, rather than using total CPU time since fork. This eliminates the pre-measurement bias regardless of how long the gap between `signal_ready()` and `sleep()` takes.

See attached patch: `0001-selftests-sched_ext-Fix-rt_stall-flaky-measurement-w.patch`

The fix adds:
1. A pre-sleep snapshot of both children's CPU times via `get_process_runtime()`
2. Subtraction of the pre-sleep snapshot from the post-sleep reading
3. Error handling for the new snapshot reads
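
The three steps above can be sketched as below. This is a minimal illustration under stated assumptions, not the attached patch: `struct cpu_sample` and `window_share_percent()` are hypothetical names, negative values stand in for failed snapshot reads, and the share is assumed to be EXT / (EXT + RT). In the real test the snapshots would come from `get_process_runtime()`, whose signature is not shown in this issue.

```c
#include <stdbool.h>

/* Hypothetical snapshot of both children's CPU time, in seconds. */
struct cpu_sample {
	double ext_sec;	/* EXT/FAIR child; negative means the read failed */
	double rt_sec;	/* RT child; negative means the read failed */
};

/*
 * Compute the EXT/FAIR share (percent) over the measurement window
 * from snapshots taken right before and right after sleep(RUN_TIME).
 * Returns false if a snapshot is invalid or the deltas make no sense.
 */
static bool window_share_percent(const struct cpu_sample *before,
				 const struct cpu_sample *after,
				 double *share)
{
	double ext, rt;

	/* Error handling for the snapshot reads. */
	if (before->ext_sec < 0 || before->rt_sec < 0 ||
	    after->ext_sec < 0 || after->rt_sec < 0)
		return false;

	/* Subtract the pre-sleep snapshot from the post-sleep reading. */
	ext = after->ext_sec - before->ext_sec;
	rt = after->rt_sec - before->rt_sec;
	if (ext < 0 || rt < 0 || ext + rt <= 0)
		return false;

	*share = 100.0 * ext / (ext + rt);
	return true;
}
```

With made-up snapshots of (0.02s, 0.69s) before and (0.27s, 5.44s) after, the deltas are 0.25s and 4.75s, so the share is 5%. The totals alone would give 0.27 / 5.71, roughly 4.7%. These numbers are illustrative only and were not measured in CI.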

## Impact

Without this fix, the sched_ext test job fails in most CI runs, blocking unrelated PRs from passing CI. The sched_ext job is not marked `continue_on_error`, so any sched_ext test failure fails the entire workflow.

## References
- Prior fix: 0b82cc331d2e ("selftests/sched_ext: Fix rt_stall flaky failure")
- Original test: be621a76341c ("selftests/sched_ext: Add test for sched_ext dl_server")
- Related vmtest issue: https://github.com/kernel-patches/vmtest/issues/453 (covers numa test, not rt_stall)

