Skip to content

Allow overall test to pass when a retried phase eventually succeeds#1274

Open
aoyoqui wants to merge 2 commits intogoogle:masterfrom
aoyoqui:feature/overall-pass-on-retried-successful-phase
Open

Allow overall test to pass when a retried phase eventually succeeds#1274
aoyoqui wants to merge 2 commits intogoogle:masterfrom
aoyoqui:feature/overall-pass-on-retried-successful-phase

Conversation

@aoyoqui
Copy link

@aoyoqui aoyoqui commented Mar 2, 2026

Summary

With repeat_on_measurement_fail, a phase can run multiple times after measurement failures. Each failed run is recorded as a phase with outcome FAIL. Overall outcome is “any FAIL phase → test FAIL,” so the test is reported as Failed even when a later retry passes. That makes “retry until it passes” unusable when the desired result is “test Passed if the last run passed.”

New phase option: skip_failed_retries

When set with repeat_on_measurement_fail, a run that fails and triggers a retry is recorded as SKIP instead of FAIL. Retries still run; only the stored outcome of that iteration changes. The last run still decides the phase (PASS/FAIL), and the existing rule for overall outcome is unchanged: any FAIL phase → test FAIL. So if the last run passes, there is no FAIL phase and the test can be Passed.

Why implement it in the executor?

Single rule: “Any phase with outcome FAIL → test FAIL” stays the only rule. We don’t add a second path that ignores some FAILs only at finalization.
stop_on_first_failure: That logic uses phases[-1].outcome == FAIL. By turning the failed iteration into SKIP before retrying, the last phase is no longer FAIL, so we don’t stop and the retries run. If we only changed finalization and left those iterations as FAIL, we’d have to special-case “don’t stop when this FAIL will be retried” in the executor as well, duplicating the “this FAIL doesn’t count” rule.
History change: The trade-off is of course that we rewrite history. The user must decide if having failed retries is a must. For my particular use case, marking failed retries as SKIP actually simplifies data analysis

@google-cla
Copy link

google-cla bot commented Mar 2, 2026

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.

@glados-verma
Copy link
Collaborator

Thanks for your contribution! This is indeed a bug. However, my thinking is that stop_on_first_failure should not be needed, and the behavior of repeat_on_measurement_fail should simply be changed to match the intended behavior. Any particular reason you added stop_on_first_failure, beyond maintaining backwards compatibility?

@aoyoqui
Copy link
Author

aoyoqui commented Mar 3, 2026

Thanks for the prompt reply. Yes, I added the flag skip_failed_retries to maintain backwards compatibility. I guess you meant this flag in your message because I did not add stop_on_first_failure. There might be users out there that want to see failed iterations as Failed, I couldn't see in the history if this was the original intent, or is a bug.

My comment in the description regarding stop_on_first_failure is confusing and wrong. Please ignore that bit.

So if you want to change the behaviour, I can remove this new flag and use the existing repeat_on_measurement_fail but somebody else might come later asking for the current behaviour. Personally I don't necessarily think the current behaviour is wrong.

If my proposal is ok, I could add some more cases in repeat.py that demonstrate the usage of the stop flags and the new skip_failed_retries flag and their effect. Namely, cover all these cases:

With repeat_on_measurement_fail = True:

phase.stop_on_measurement_fail test.stop_on_first_failure phase.skip_failed_retries First run fail. Last run passes? Overall test outcome Test continues
T T T Won't repeat FAIL No
T T F Won't repeat FAIL No
T F T Won't repeat FAIL No
T F F Won't repeat FAIL No
F T T Yes PASS Yes
F T F No FAIL No
F F T Yes PASS Yes
F F F No FAIL Yes

Note that stop_on_measurement_fail "wins" over skip_failed_retries. Execution stops if there is a failure on the first iteration. This is ok since skip_failed_retries has more to do with what we report and not with execution flow.

@glados-verma
Copy link
Collaborator

Ah right, I meant skip_failed_retries.

This repeat_on_measurement_fail feature was added to "repeat a phase until it passes, up to a limit on repeats, because sometimes measurements can be flaky or borderline" (there are other ways to accomplish this, so this feature didn't necessarily have to be added, but now it exists :). The original requester didn't verify the behavior post-implementation.

If we get rid of skip_failed_retries, then the feature will work as intended, but the existing use case of "rerun phase if it fails, stop rerunning if the measurement passes, but fail the test anyway" is a rather weird "feature". It is also possible to accomplish that in other ways (and would be less obscure). My first question would be, why not use repeat_limit and force_repeat in PhaseOptions, or explicitly record separate measurements (possibly dimensioned measurements) without rerunning the phase.

So I have to weigh the balance of further complicating the API (it already has a number of "repeat"-like options) vs potentially breaking existing users. I think we should not add another option, "fix" this behavior, and see if anyone expresses a desire for the original behavior.


So, can you:

  • Remove the skip_failed_retries option
  • Update exe_test.py to test this behavior. I'd change phase_repeat_on_measurement_fail to take a callable instead of meas_value, and "measure" the result of the callable - thus you can pass in a mock object with side_effect set depending on the test case. The parameterized test cases can then pass a list of sequential return values (the side_effect) rather than the current static return value.
  • Ensure the phase is rerun within the limit if and only if the measurement fails.
  • It stops rerunning if the measurement passes, without hitting the repeat limit, and the test is a PASS
  • It should not rerun if the measurement isn't set
  • or if the phase errored out
  • or if the phase explicitly returned STOP

I might have missed something - let me know what you think.

@aoyoqui aoyoqui force-pushed the feature/overall-pass-on-retried-successful-phase branch 2 times, most recently from 9e5c93b to 1b598fb Compare March 9, 2026 11:15
@aoyoqui
Copy link
Author

aoyoqui commented Mar 9, 2026

  • I've removed the flag I introduced: skip_failed_retries.
  • I fixed the behaviour to match what happens when the phase returns PhaseResult.REPEAT: phase outcome is PhaseOutcome.SKIP, except when repeat_limit is reached, in which case is FAIL.
  • I adapted the tests in exe_test.py, is this what you had in mind?
    PTAL
    Thanks!

…is True, unless repeat_limit is reached.

Remove skip_failed_retries flag; fix behaviour instead
@aoyoqui aoyoqui force-pushed the feature/overall-pass-on-retried-successful-phase branch from 1b598fb to 26f13e5 Compare March 9, 2026 11:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants