fix: multi-mode runtime cross-platform file ops and transition recovery #2203
+207
−5
Summary
Fixes three bugs in the multi-mode runtime system:
1. Cross-platform file write failure (manager.py): os.rename() fails on Windows when the destination file already exists. The single-mode cortex already uses os.replace() correctly, but the multi-mode ModeManager still used os.rename() in two places: _create_runtime_config_file and _save_mode_state. Replaced both with os.replace() for atomic cross-platform writes.
2. Incorrect return value for skipped transitions (manager.py): _execute_transition returned True when a transition was skipped because another transition was already in progress. This is misleading, because callers interpret True as "transition completed successfully". Changed to return False so callers can distinguish between a successful transition and a skipped one.
3. No recovery on failed mode transitions (cortex.py): _on_mode_transition had an explicit "TODO: Implement fallback/recovery mechanism" comment. When the transition to a new mode failed (e.g. bad config, missing plugin), the exception propagated and left the runtime in a broken state: old orchestrators stopped, new ones never started. Implemented rollback logic that attempts to re-initialize the previous mode when the target mode fails, keeping the system operational.

Closes #2202
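For reviewers, here is a minimal sketch of the behaviour that the skipped-transition and rollback fixes aim for. It is not the code in this PR: the class ModeRuntimeSketch, the start_mode callable, and the lock-based "in progress" check are all hypothetical stand-ins chosen for illustration, and the real ModeManager and cortex may be structured quite differently.

```python
import threading


class ModeRuntimeSketch:
    """Hypothetical stand-in for the runtime; not the project's actual classes."""

    def __init__(self, initial_mode, start_mode):
        self.current_mode = initial_mode
        # Assumed callable that starts a mode and raises on failure.
        self._start_mode = start_mode
        self._transition_lock = threading.Lock()

    def execute_transition(self, target_mode):
        # Non-blocking acquire: if another transition holds the lock,
        # this one is skipped, and a skip is reported as False.
        if not self._transition_lock.acquire(blocking=False):
            return False
        try:
            previous_mode = self.current_mode
            try:
                self._start_mode(target_mode)
            except Exception:
                # Rollback: try to bring the previous mode back up.
                # If this also fails, the exception propagates to the caller.
                self._start_mode(previous_mode)
                return False
            self.current_mode = target_mode
            return True
        finally:
            self._transition_lock.release()
```

With this shape, True means only "the target mode is now running". A skipped transition and a rolled-back transition both return False and leave current_mode unchanged.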
Test plan
- TestFileOperations: tests verifying that os.replace overwrites work correctly
- TestTransitionReturnValue: tests verifying return value semantics for _execute_transition
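As a rough illustration of what a file-operation test might check, the sketch below writes a JSON file through a temporary file and os.replace, then overwrites it. The helper atomic_write_json and the test name are assumptions made for this example and do not come from the PR's test suite.

```python
import json
import os
import tempfile


def atomic_write_json(path, data):
    """Hypothetical helper: write to a temp file in the target's directory,
    then move it into place with os.replace."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            json.dump(data, f)
        # Unlike os.rename, os.replace overwrites an existing destination
        # on Windows as well as on POSIX systems.
        os.replace(tmp_path, path)
    except BaseException:
        # Clean up the temp file if the write or the replace failed.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise


def test_replace_overwrites_existing_file():
    with tempfile.TemporaryDirectory() as d:
        target = os.path.join(d, "state.json")
        atomic_write_json(target, {"mode": "a"})
        # The second write hits an existing destination, which is the case
        # where os.rename fails on Windows.
        atomic_write_json(target, {"mode": "b"})
        with open(target, encoding="utf-8") as f:
            assert json.load(f) == {"mode": "b"}
        # No temp files should be left behind.
        assert os.listdir(d) == ["state.json"]
```

Creating the temp file in the same directory as the target keeps the final os.replace on one filesystem, so it does not fail the way a cross-device move would.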