@Ridwannurudeen
Contributor

Summary

Fixes three bugs in the multi-mode runtime system:

  1. Cross-platform file write failure (manager.py): os.rename() raises FileExistsError on Windows when the destination file already exists. The single-mode cortex already uses os.replace(), but the multi-mode ModeManager still called os.rename() in two places: _create_runtime_config_file and _save_mode_state. Replaced both with os.replace(), which overwrites an existing destination atomically on both POSIX and Windows.

  2. Incorrect return value for skipped transitions (manager.py): _execute_transition returned True when a transition was skipped because another transition was already in progress. This is misleading because callers interpret True as "transition completed successfully". Changed it to return False so callers can distinguish a successful transition from a skipped one.

  3. No recovery on failed mode transitions (cortex.py): _on_mode_transition carried an explicit "TODO: Implement fallback/recovery mechanism" comment. When the transition to a new mode failed (e.g. bad config, missing plugin), the exception propagated and left the runtime in a broken state: the old orchestrators were stopped and the new ones never started. Implemented rollback logic that attempts to re-initialize the previous mode when the target mode fails, keeping the system operational.
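
For reviewers who want the intended semantics of items 2 and 3 in one place, here is a minimal sketch. It is not the code in manager.py or cortex.py: the class TransitionSketch, its start_mode callable, and the single-lock design are hypothetical simplifications. Only the behaviors come from this PR: a skipped transition returns False, a failed transition attempts to restore the previous mode, and the reloading flag is cleared on every path.

```python
import threading


class TransitionSketch:
    """Hypothetical stand-in for the transition logic; not the project's API."""

    def __init__(self, initial_mode, start_mode):
        self.current_mode = initial_mode
        self._start_mode = start_mode  # assumed callable that raises on failure
        self._lock = threading.Lock()
        self.reloading = False

    def execute_transition(self, target_mode):
        # Another transition is in progress: skip and report False, not True.
        if not self._lock.acquire(blocking=False):
            return False
        previous_mode = self.current_mode
        self.reloading = True
        try:
            self._start_mode(target_mode)
            self.current_mode = target_mode
            return True
        except Exception:
            # Target mode failed: try to bring the previous mode back up.
            try:
                self._start_mode(previous_mode)
            except Exception:
                # Rollback failed too; a real system would log this.
                self.current_mode = None
            return False
        finally:
            # Cleared on success, on rollback, and on total failure.
            self.reloading = False
            self._lock.release()
```

In this sketch a caller that receives False cannot tell a skipped transition from a rolled-back one; whether the real callers need that distinction is outside what this PR describes.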

Closes #2202

Test plan

  • Added TestFileOperations tests verifying os.replace overwrites work correctly
  • Added TestTransitionReturnValue tests verifying return value semantics for _execute_transition
  • Added 4 transition recovery tests: rollback on init failure, rollback on start failure, reloading flag cleared on success, reloading flag cleared on total failure
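
The overwrite behavior that the TestFileOperations tests target can be illustrated with a short sketch. The helper name write_json_atomically, the JSON payload, and the temp-file naming are assumptions made for illustration; the PR only states that os.rename was replaced with os.replace.

```python
import json
import os
import tempfile


def write_json_atomically(path, data):
    """Write data to a temp file, then swap it into place with os.replace."""
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temp file in the same directory so the swap stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            json.dump(data, handle)
        # os.replace overwrites an existing destination on POSIX and Windows;
        # os.rename raises FileExistsError on Windows in that case.
        os.replace(tmp_path, path)
    except BaseException:
        # Do not leave a stray temp file behind if anything went wrong.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

Calling the helper twice with the same path should leave only the second payload on disk, which is the case that failed with os.rename on Windows.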

- Replace os.rename with os.replace in ModeManager for atomic file writes
  that work on Windows (where os.rename fails if destination exists)
- Fix _execute_transition to return False when skipping concurrent
  transitions instead of incorrectly returning True
- Implement rollback recovery in _on_mode_transition so the system
  returns to the previous mode when a transition fails, instead of
  leaving the runtime in a broken state
- Add tests for file overwrite operations, transition return values,
  and mode transition rollback scenarios
@Ridwannurudeen Ridwannurudeen requested review from a team as code owners February 10, 2026 19:16
@github-actions github-actions bot added robotics Robotics code changes python Python code tests Test files labels Feb 10, 2026

Development

Successfully merging this pull request may close these issues.

[BUG] Multi-mode runtime: cross-platform file write failure and missing transition recovery
