[bpf-ci-bot] sched_ext rt_stall test still flaky: measurement includes pre-sleep CPU time #458

@kernel-patches-review-bot

Summary

The sched_ext rt_stall selftest remains flaky in CI despite the synchronization fix in commit 0b82cc331d2e. The test fails because it uses each child's total CPU time since fork rather than the time consumed during the measurement window, so CPU time accumulated before the window opens skews the ratio below the 4% threshold.

Failure Details

Root Cause Analysis

The rt_stall test verifies that the sched_ext deadline server prevents RT tasks from starving EXT/FAIR tasks. It forks two children pinned to the same CPU, one EXT/FAIR and one SCHED_FIFO, and then compares their CPU time after the parent's sleep(RUN_TIME) (5 seconds).
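A minimal sketch of the per-child setup, assuming the children pin themselves with sched_setaffinity() and the RT child switches itself with sched_setscheduler(). pin_to_cpu() and make_fifo() are hypothetical helpers, not the selftest's actual code, and how the EXT/FAIR child selects its class is left out:

```c
#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>

/*
 * Hypothetical helper: restrict the calling process to a single CPU.
 * Returns 0 on success, -1 on failure.
 */
static int pin_to_cpu(int cpu)
{
	cpu_set_t set;

	CPU_ZERO(&set);
	CPU_SET(cpu, &set);
	if (sched_setaffinity(0, sizeof(set), &set)) {
		perror("sched_setaffinity");
		return -1;
	}
	return 0;
}

/*
 * Hypothetical helper: switch the calling process to SCHED_FIFO at
 * priority @prio (1..99). Needs CAP_SYS_NICE or a suitable
 * RLIMIT_RTPRIO, otherwise it fails with EPERM and returns -1.
 */
static int make_fifo(int prio)
{
	struct sched_param param = { .sched_priority = prio };

	if (sched_setscheduler(0, SCHED_FIFO, &param)) {
		perror("sched_setscheduler");
		return -1;
	}
	return 0;
}
```

In the layout the test describes, both children would call pin_to_cpu() with the same CPU number and only the RT child would call make_fifo(), so the two compete for one CPU.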

Commit 0b82cc331d2e added pipe-based synchronization so that the children finish their setup before the parent starts sleep(RUN_TIME). However, each child starts busy-looping immediately after signal_ready(), while the parent still has to complete both pipe reads before calling sleep(). During this gap both children accumulate CPU time, and the RT child gets most of it.
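A sketch of the pipe handshake, to make the gap concrete. signal_ready() and wait_ready() here are hypothetical stand-ins whose signatures may not match the helpers in rt_stall.c, and demo_handshake() is only a usage example with one child that exits instead of busy-looping:

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Child side: report that setup is done by writing one byte. */
static int signal_ready(int wfd)
{
	char c = 'r';

	return write(wfd, &c, 1) == 1 ? 0 : -1;
}

/* Parent side: block until the child's byte arrives (retry on EINTR). */
static int wait_ready(int rfd)
{
	char c;
	ssize_t n;

	do {
		n = read(rfd, &c, 1);
	} while (n < 0 && errno == EINTR);
	return n == 1 ? 0 : -1;
}

/* Usage example: fork one child, wait for its signal, reap it. */
static int demo_handshake(void)
{
	int fds[2], status, ok;
	pid_t pid;

	if (pipe(fds))
		return -1;
	pid = fork();
	if (pid < 0) {
		close(fds[0]);
		close(fds[1]);
		return -1;
	}
	if (pid == 0) {
		close(fds[0]);
		/* In the real test the child busy-loops right after this. */
		_exit(signal_ready(fds[1]) ? 1 : 0);
	}
	close(fds[1]);
	ok = wait_ready(fds[0]) == 0;
	/* Anything the parent does from here until sleep() is pre-window time. */
	close(fds[0]);
	if (waitpid(pid, &status, 0) != pid)
		return -1;
	return ok && WIFEXITED(status) && WEXITSTATUS(status) == 0 ? 0 : -1;
}
```

With two children, the first child to signal keeps running while the parent waits on the second pipe and then performs its remaining work before sleep(), which is where the unmeasured head start comes from.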

The measurement reads each child's total CPU time (utime + stime) from /proc/pid/stat, which includes this pre-sleep time. This inflates the RT term in the ratio's denominator:
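A minimal sketch of reading utime + stime from /proc/<pid>/stat, based on the field layout documented in proc(5) (utime and stime are fields 14 and 15, in clock ticks). read_cpu_seconds() is a hypothetical name; the selftest's own get_process_runtime() may parse the file differently:

```c
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: return utime + stime of @pid in seconds,
 * or -1.0 on error.
 */
static double read_cpu_seconds(pid_t pid)
{
	char path[64], buf[1024];
	unsigned long utime, stime;
	long hz = sysconf(_SC_CLK_TCK);
	size_t n;
	FILE *f;
	char *p;

	snprintf(path, sizeof(path), "/proc/%d/stat", (int)pid);
	f = fopen(path, "r");
	if (!f)
		return -1.0;
	n = fread(buf, 1, sizeof(buf) - 1, f);
	fclose(f);
	buf[n] = '\0';

	/* comm (field 2) may contain spaces, so parse after the last ')'. */
	p = strrchr(buf, ')');
	if (!p || hz <= 0)
		return -1.0;
	/* Skip fields 3..13 (state .. cmajflt), then read fields 14 and 15. */
	if (sscanf(p + 1, " %*c %*d %*d %*d %*d %*d %*u %*lu %*lu %*lu %*lu %lu %lu",
		   &utime, &stime) != 2)
		return -1.0;
	return (double)(utime + stime) / hz;
}
```

Because a forked child's counters start at zero and only grow, any single read like this includes everything the child ran before the measurement window opened.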

Run               | Failing iteration | EXT/FAIR CPU | RT CPU | EXT/FAIR share, EXT/(EXT+RT) | Note
bpf-next_test     | i=2 (FAIR)        | 0.180s       | 4.740s | 3.66%                        |
constant blinding | i=3 (EXT)         | 0.180s       | 5.690s | 3.07%                        | RT > RUN_TIME proves pre-sleep accumulation
perf_link         | i=3 (EXT)         | 0.190s       | 4.740s | 3.85%                        |
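The Ratio values in the table are consistent with EXT/(EXT+RT), i.e. the EXT/FAIR child's share of the CPU time consumed by both children. A minimal sketch of that arithmetic, assuming the test passes when the share is at least 4% (the exact comparison used in rt_stall.c is an assumption here); ext_share() and share_ok() are hypothetical names:

```c
#include <stdbool.h>

/* EXT/FAIR share of the CPU time consumed by both children. */
static double ext_share(double ext_s, double rt_s)
{
	double total = ext_s + rt_s;

	return total > 0.0 ? ext_s / total : 0.0;
}

/* Assumed pass criterion: share must reach the 4% threshold. */
static bool share_ok(double ext_s, double rt_s)
{
	return ext_share(ext_s, rt_s) >= 0.04;
}
```

Plugging in the constant blinding row, ext_share(0.18, 5.69) is about 0.0307, matching the 3.07% in the table and falling short of the threshold.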

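The RT child receiving 5.69s of CPU time in a 5-second sleep window is conclusive evidence of pre-sleep accumulation: both children share one CPU, so at most about 5s can fall inside the window, and at least ~0.69s of the RT time must have been accumulated before the measurement started.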
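The same bound, plus why the bias pushes the ratio down, written out with notation introduced here for illustration: E and R are the EXT/FAIR and RT CPU times, each split into a pre-window part (pre) and an in-window part (win), and T = RUN_TIME = 5 s:

```latex
\begin{align*}
R_{\mathrm{meas}} &= R_{\mathrm{pre}} + R_{\mathrm{win}}, \qquad
  E_{\mathrm{win}} + R_{\mathrm{win}} \lesssim T = 5\,\mathrm{s} \\
\Rightarrow\quad R_{\mathrm{pre}} &\gtrsim 5.69\,\mathrm{s} - 5\,\mathrm{s} = 0.69\,\mathrm{s} \\
r_{\mathrm{meas}} &= \frac{E_{\mathrm{pre}} + E_{\mathrm{win}}}
  {(E_{\mathrm{pre}} + R_{\mathrm{pre}}) + (E_{\mathrm{win}} + R_{\mathrm{win}})}
\end{align*}
```

r_meas is the mediant of E_pre/(E_pre + R_pre) and E_win/(E_win + R_win), so it lies between the two. Whenever the RT child dominates the pre-window gap, the pre-window share is the smaller one and the measured ratio understates the in-window ratio the test is meant to check.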

The failure tends to occur in later iterations (i=2 or i=3) because the parent has more work to do between iterations (destroying/re-attaching the sched_ext link), giving children more time to accumulate pre-sleep CPU time.

Relevant code: tools/testing/selftests/sched_ext/rt_stall.c:sched_stress_test()

Proposed Fix

Take before/after snapshots of CPU time around the sleep(RUN_TIME) window and compute deltas, rather than using total CPU time since fork. This eliminates the pre-measurement bias regardless of how long the gap between signal_ready() and sleep() takes.
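The arithmetic of the proposed fix, sketched with a hypothetical struct cpu_snapshot holding both children's CPU time in seconds; the names are illustrative and not taken from the attached patch:

```c
/* Hypothetical: CPU time of both children at one instant, in seconds. */
struct cpu_snapshot {
	double ext;
	double rt;
};

/*
 * EXT/FAIR share computed only from time consumed between @before and
 * @after, so anything accumulated before the first snapshot cancels out.
 * Returns -1.0 if the deltas are negative or no time was consumed.
 */
static double window_ext_share(struct cpu_snapshot before,
			       struct cpu_snapshot after)
{
	double ext = after.ext - before.ext;
	double rt = after.rt - before.rt;

	if (ext < 0.0 || rt < 0.0 || ext + rt <= 0.0)
		return -1.0;
	return ext / (ext + rt);
}
```

With made-up readings of 1.0s/2.0s (EXT/RT) before and 1.5s/6.5s after, the deltas are 0.5s and 4.5s and the share is 0.1; the 1.0s and 2.0s accumulated before the first snapshot have no effect on the result.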

See attached patch: 0001-selftests-sched_ext-Fix-rt_stall-flaky-measurement-w.patch

The fix adds:

  1. A pre-sleep snapshot of both children's CPU times via get_process_runtime()
  2. Subtraction of the pre-sleep snapshot from the post-sleep reading
  3. Error handling for the new snapshot reads
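Steps 1 to 3 can be sketched as one routine that takes the per-process reader as a callback. measure_window(), runtime_fn and the demo reader are hypothetical; they stand in for the patch's use of get_process_runtime(), whose real signature may differ:

```c
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical reader: CPU time of @pid in seconds, negative on error. */
typedef double (*runtime_fn)(pid_t pid);

/*
 * Snapshot both children, sleep for the window, read again, and store
 * the EXT/FAIR share of the deltas in *share.
 * Returns 0 on success, -1 if any read fails or no time was consumed.
 */
static int measure_window(pid_t ext_pid, pid_t rt_pid, unsigned int secs,
			  runtime_fn get, double *share)
{
	double ext0, rt0, ext1, rt1, ext, rt;

	/* 1. Pre-sleep snapshot, with error handling (3). */
	ext0 = get(ext_pid);
	rt0 = get(rt_pid);
	if (ext0 < 0.0 || rt0 < 0.0) {
		fprintf(stderr, "pre-sleep runtime read failed\n");
		return -1;
	}

	sleep(secs);

	/* Post-sleep reading, also checked. */
	ext1 = get(ext_pid);
	rt1 = get(rt_pid);
	if (ext1 < 0.0 || rt1 < 0.0) {
		fprintf(stderr, "post-sleep runtime read failed\n");
		return -1;
	}

	/* 2. Subtract the snapshot so only in-window time counts. */
	ext = ext1 - ext0;
	rt = rt1 - rt0;
	if (ext + rt <= 0.0) {
		fprintf(stderr, "no CPU time consumed in window\n");
		return -1;
	}
	*share = ext / (ext + rt);
	return 0;
}

/* Demo reader: replays synthetic values (EXT, RT, EXT, RT), then fails. */
static const double demo_values[] = { 1.0, 1.0, 1.5, 5.5 };
static int demo_idx;

static double demo_runtime(pid_t pid)
{
	(void)pid;
	return demo_idx < 4 ? demo_values[demo_idx++] : -1.0;
}
```

With the demo reader, measure_window() sees deltas of 0.5s and 4.5s and reports a share of 0.1; a second call finds the values exhausted and takes the error path, returning -1.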

Impact

Without this fix, the sched_ext test job fails in most CI runs, blocking unrelated PRs from passing CI. The sched_ext job is not marked continue_on_error, so any sched_ext test failure fails the entire workflow.
