
Train iter improvements #43

Merged: rbx merged 3 commits into master from train-iter-improvements on Mar 25, 2026

rbx (Member) commented Mar 25, 2026

  • Pipe each session's output to its own log file.
  • Add an interactive mode with an option to terminate individual sessions.

coderabbitai bot (Contributor) commented Mar 25, 2026

Warning

Rate limit exceeded

@rbx has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 8 minutes and 35 seconds before requesting another review.


ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 7ce09a38-ac9d-4667-9d46-c7901369e82a

📥 Commits

Reviewing files that changed from the base of the PR and between 3845857 and 252d39e.

📒 Files selected for processing (1)
  • train_iter.py
📝 Walkthrough

Walkthrough

Replaced the basic parallel execution loop with a task orchestration framework that adds per-run log file capture, an optional interactive curses TUI for monitoring and terminating runs, and a plain-text progress fallback mode. Added a --no-tui CLI flag and helper functions for log directory creation, filename mapping, and elapsed-time formatting.
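The per-run log capture described above can be sketched with the standard subprocess module. This is a minimal sketch, not the code from train_iter.py: launch_with_log is a hypothetical name, and it assumes each run is an argv list whose stdout and stderr both go to a single log file.

```python
import os
import subprocess
import sys
import tempfile
import time
from typing import IO, List, Tuple


def launch_with_log(cmd: List[str], log_path: str) -> Tuple[subprocess.Popen, IO[bytes], float]:
    """Start cmd with stdout and stderr redirected into log_path (hypothetical helper).

    Returns the process, the open log handle (the caller closes it once the
    process has exited), and a monotonic start time for elapsed-time display.
    """
    log_fh = open(log_path, "wb")
    try:
        # stderr=STDOUT interleaves both streams into the same file.
        proc = subprocess.Popen(cmd, stdout=log_fh, stderr=subprocess.STDOUT)
    except BaseException:
        log_fh.close()  # do not leak the handle if the launch itself fails
        raise
    return proc, log_fh, time.monotonic()


if __name__ == "__main__":
    log_dir = tempfile.mkdtemp(prefix="train_iter_logs_")
    path = os.path.join(log_dir, "demo.log")
    proc, fh, t0 = launch_with_log([sys.executable, "-c", "print('hello from child')"], path)
    rc = proc.wait()
    fh.close()
    with open(path) as f:
        print(rc, f.read().strip())
```

Handing the file object straight to Popen means the child writes to disk directly, so the parent never has to read or relay the output.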

Changes

| Cohort / File(s) | Summary |
|---|---|
| Parallel Execution Framework (`train_iter.py`) | Refactored task execution to capture each subprocess's output in a log file; added an interactive curses TUI mode (`_run_tui`, `_draw_tui`) that shows queued/running/completed states with elapsed times and can terminate runs; added a plain progress fallback mode (`_run_plain`) and helper functions for log management. Added a `no_tui` parameter to `run_all_parallel` and a corresponding CLI flag. |
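The summary mentions helpers for filename mapping and elapsed-time formatting, but their bodies are not shown on this page. The following is a hypothetical sketch: label_to_filename reuses the name that appears in the review comments, with an assumed sanitizing rule, and format_elapsed is an invented name.

```python
import re


def label_to_filename(label: str) -> str:
    """Map a run label such as 'lr=0.01 seed=3' to a filesystem-safe log name (assumed rule)."""
    # Collapse every run of characters outside [A-Za-z0-9._-] into one underscore.
    safe = re.sub(r"[^A-Za-z0-9._-]+", "_", label).strip("_")
    return (safe or "run") + ".log"


def format_elapsed(seconds: float) -> str:
    """Format a duration as H:MM:SS; hours are not zero-padded (hypothetical helper)."""
    total = int(seconds)
    hours, rem = divmod(total, 3600)
    minutes, secs = divmod(rem, 60)
    return f"{hours}:{minutes:02d}:{secs:02d}"


print(label_to_filename("lr=0.01 seed=3"))  # lr_0.01_seed_3.log
print(format_elapsed(3725.9))              # 1:02:05
```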

Sequence Diagram

sequenceDiagram
    actor User
    participant CLI
    participant Orchestrator
    participant TUI/Logger
    participant Subprocess
    
    User->>CLI: Invoke with --no-tui flag (optional)
    CLI->>Orchestrator: Call run_all_parallel(no_tui=True/False)
    
    alt isatty() and not no_tui
        Orchestrator->>TUI/Logger: Initialize curses TUI mode
        Note over TUI/Logger: Display queued/running/completed runs
        User->>TUI/Logger: Request termination by index
    else
        Orchestrator->>TUI/Logger: Initialize plain progress mode
    end
    
    loop For each run
        TUI/Logger->>Subprocess: Launch with log file capture
        Subprocess->>Subprocess: Execute task, stream to disk
        Subprocess-->>TUI/Logger: Exit with status
        TUI/Logger->>TUI/Logger: Update elapsed time & state
        TUI/Logger->>User: Display progress (TUI or plain)
    end
    
    TUI/Logger-->>User: Final summary
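The branch at the top of the diagram (TUI only when isatty() holds and --no-tui is absent) can be expressed roughly as follows. Assumptions: the terminal check is made on stdout, and build_parser and choose_mode are hypothetical names; only the --no-tui flag and the no_tui parameter come from the PR.

```python
import argparse
import sys


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Run training sessions in parallel.")
    # store_true: defaults to False, becomes True when the flag is present.
    # argparse exposes "--no-tui" as the attribute name "no_tui".
    parser.add_argument("--no-tui", action="store_true",
                        help="disable the curses TUI and print plain progress lines")
    return parser


def choose_mode(no_tui: bool, stdout_is_tty: bool) -> str:
    """Use the TUI only on a real terminal and only when it was not disabled."""
    return "tui" if stdout_is_tty and not no_tui else "plain"


if __name__ == "__main__":
    parser = build_parser()
    for argv in ([], ["--no-tui"]):
        args = parser.parse_args(argv)
        print(argv, "->", choose_mode(args.no_tui, sys.stdout.isatty()))
```

Because the flag defaults to False, existing invocations keep the TUI on a terminal, and piping output to a file falls back to plain mode automatically.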

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 inconclusive)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Title check | ❓ Inconclusive | The title 'Train iter improvements' is vague and generic, using non-descriptive terms that don't convey meaningful information about the specific changes made. | Provide a more specific title that highlights the main changes, such as 'Add interactive TUI and per-run log files to train_iter.py' or 'Implement interactive training session orchestration with logging'. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description check | ✅ Passed | The description directly relates to the changeset by mentioning log file piping and interactive termination, which are core features implemented in the changes. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |


coderabbitai bot (Contributor) left a comment


Actionable comments posted: 4

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@train_iter.py`:
- Around lines 175-209: The loop in _run_plain (and the other runner) can leave
child processes running and log_fh handles open if launch() raises or the loop
exits via an exception. Wrap the main scheduling loop in a try/finally (or add a
top-level finally) that iterates over active and, for each (proc, label, log_fh,
t0), closes log_fh and terminates the proc (proc.terminate(); if it is still
alive after a timeout, proc.kill()), then calls proc.wait() to avoid zombies.
Also guard the launch(combo, seed) call with try/except so the same cleanup runs
for already-started entries before the exception is re-raised; this way no child
processes or file handles are leaked.
- Around lines 298-309: The key handler removes the tuple from active and then
calls proc.wait() synchronously, which freezes the TUI until the child exits.
Instead, after proc.terminate(), hand the cleanup to a short-lived background
worker (e.g., threading.Thread with daemon=True) that calls proc.wait(), closes
fh, increments failure_count and appends to done_log, so the main loop stays
non-blocking; alternatively, schedule the cleanup on the main loop using
proc.poll(). In the key handler that manipulates active, proc, fh, failure_count
and done_log, replace the direct proc.wait() / fh.close() / failure_count += 1 /
done_log.append calls with one of these asynchronous reaping approaches.
- Around lines 324-333: The final TUI screen always blocks on stdscr.getch(), so
the CLI does not return to the shell until a key is pressed. In the end-of-run
block around _draw_tui, stdscr.nodelay and the summary, do not call
stdscr.getch() unconditionally: either remove the blocking getch and let the
function return, or make the wait conditional on an interactive terminal (e.g.,
check sys.stdin.isatty() / os.isatty(0)) and only then call getch (or use
stdscr.timeout for a short, bounded wait). Update the code paths that reference
_draw_tui, stdscr.nodelay(False) and stdscr.getch accordingly.
- Around lines 156-166: The proc-log paths and filenames are deterministic, so
logs get clobbered when sessions are reused or labels repeat. Update
make_log_dir to create a unique subdirectory (e.g., append a high-precision
timestamp or a uuid4 nonce to the session folder name) and change
label_to_filename to append a short unique suffix (timestamp microseconds or a
uuid) to each filename so repeated labels don't overwrite each other. Also audit
any code that opens logs (around make_log_dir and the other log-creation block
referenced later) so it keeps using the unique directory and filename, and
consider opening files with exclusive-create semantics ("x") rather than plain
"w".
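The try/finally cleanup suggested for _run_plain can be sketched as below. This illustrates the review suggestion under assumptions and is not the merged code: run_plain is a simplified stand-in for _run_plain, launch takes a single hypothetical job string instead of (combo, seed), and only the (proc, label, log_fh, t0) tuple layout comes from the comment.

```python
import subprocess
import sys
import time
from typing import IO, Callable, List, Tuple

# (proc, label, log_fh, t0): the tuple layout named in the review comment.
Active = Tuple[subprocess.Popen, str, IO[bytes], float]


def shutdown_active(active: List[Active], timeout: float = 5.0) -> None:
    """Terminate still-running children, reap them, and close their log handles."""
    for proc, _label, log_fh, _t0 in active:
        try:
            if proc.poll() is None:
                proc.terminate()  # polite stop first (SIGTERM on POSIX)
                try:
                    proc.wait(timeout=timeout)
                except subprocess.TimeoutExpired:
                    proc.kill()   # escalate (SIGKILL on POSIX)
                    proc.wait()   # reap so no zombie is left behind
        finally:
            log_fh.close()
    active.clear()


def run_plain(jobs: List[str], launch: Callable[[str], Active], max_parallel: int = 2) -> int:
    """Run jobs with bounded parallelism; returns the number of non-zero exits.

    The finally block runs on normal completion, on an exception from launch(),
    and on KeyboardInterrupt, so nothing started here outlives the call.
    """
    pending = list(jobs)
    active: List[Active] = []
    failures = 0
    try:
        while pending or active:
            while pending and len(active) < max_parallel:
                active.append(launch(pending.pop(0)))  # may raise
            for entry in list(active):
                proc, _label, log_fh, _t0 = entry
                if proc.poll() is not None:
                    log_fh.close()
                    active.remove(entry)
                    if proc.returncode != 0:
                        failures += 1
            time.sleep(0.1)
    finally:
        shutdown_active(active)  # only entries that are still running remain here
    return failures
```

A single finally also covers the launch(combo, seed) case from the comment, since the exception propagates through it after the already-started entries are cleaned up.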
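For the non-blocking termination comment, the proc.poll() variant avoids threads entirely. In this hypothetical sketch the key handler only calls request_terminate, and reap_terminating runs once per redraw tick. RunState, Terminating and the grace period are assumptions; failure_count and done_log mirror the names in the comment.

```python
import subprocess
import time
from dataclasses import dataclass, field
from typing import IO, List


@dataclass
class Terminating:
    proc: subprocess.Popen
    label: str
    log_fh: IO[bytes]
    deadline: float  # monotonic time after which we escalate to kill()


@dataclass
class RunState:
    terminating: List[Terminating] = field(default_factory=list)
    done_log: List[str] = field(default_factory=list)
    failure_count: int = 0


def request_terminate(state: RunState, proc: subprocess.Popen, label: str,
                      log_fh: IO[bytes], grace: float = 5.0) -> None:
    """Called from the key handler: signal the child and return immediately."""
    proc.terminate()
    state.terminating.append(Terminating(proc, label, log_fh, time.monotonic() + grace))


def reap_terminating(state: RunState) -> None:
    """Called once per UI tick: finalize children that have exited, never blocking."""
    still_waiting: List[Terminating] = []
    for t in state.terminating:
        if t.proc.poll() is None:
            if time.monotonic() >= t.deadline:
                t.proc.kill()  # grace period over; a later tick reaps it
            still_waiting.append(t)
            continue
        t.log_fh.close()
        state.failure_count += 1
        state.done_log.append(f"{t.label}: terminated (rc={t.proc.returncode})")
    state.terminating = still_waiting
```

Polling from the main loop keeps every update to failure_count and done_log on one thread, whereas a daemon-thread reaper would mutate that shared state concurrently with the drawing code.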
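The conditional, bounded end-of-run wait from the getch comment can be sketched with curses' window.timeout(), which makes getch() return -1 once the delay (in milliseconds) expires. final_screen_wait and is_interactive are hypothetical helpers. stdscr is the window curses passes in, and the code only calls methods on it, so it can be exercised with a stub.

```python
import sys


def is_interactive() -> bool:
    """True only if stdin is a TTY, i.e. a human could actually press a key."""
    return sys.stdin is not None and sys.stdin.isatty()


def final_screen_wait(stdscr, interactive: bool, max_wait_ms: int = 10_000) -> None:
    """Pause on the final summary only when it makes sense (hypothetical helper).

    Non-interactive: return immediately so the CLI exits.
    Interactive: wait for a key, but give up after max_wait_ms.
    """
    if not interactive:
        return
    stdscr.timeout(max_wait_ms)  # getch() now returns -1 after max_wait_ms
    stdscr.getch()
```

In the TUI this would run right after the last _draw_tui call, for example as final_screen_wait(stdscr, is_interactive()).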
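For the log-clobbering comment, one way to make log locations collision-free is a per-invocation nonce combined with exclusive-create opening. This sketch assumes a (root, session) signature for make_log_dir, which the PR page does not show, and open_unique_log is a hypothetical helper. Mode "x" raises FileExistsError instead of truncating an existing file.

```python
import os
import re
import time
import uuid
from typing import IO


def make_log_dir(root: str, session: str) -> str:
    """Create a fresh directory per invocation (assumed signature)."""
    stamp = time.strftime("%Y%m%d-%H%M%S")
    path = os.path.join(root, f"{session}-{stamp}-{uuid.uuid4().hex[:8]}")
    os.makedirs(path, exist_ok=False)  # fail loudly rather than reuse a directory
    return path


def open_unique_log(log_dir: str, label: str) -> IO[bytes]:
    """Open <label>-<nonce>.log exclusively so repeated labels never overwrite."""
    safe = re.sub(r"[^A-Za-z0-9._-]+", "_", label).strip("_") or "run"
    while True:
        path = os.path.join(log_dir, f"{safe}-{uuid.uuid4().hex[:8]}.log")
        try:
            return open(path, "xb")  # exclusive create: never truncates
        except FileExistsError:
            continue  # nonce collision is unlikely; retry with a fresh one
```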

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 9573f4a3-74fa-4ba0-b72c-5454cca71e29

📥 Commits

Reviewing files that changed from the base of the PR and between fa448ed and 3845857.

📒 Files selected for processing (1)
  • train_iter.py

rbx merged commit 8225f99 into master on Mar 25, 2026 (4 checks passed).
rbx deleted the train-iter-improvements branch on March 25, 2026 08:53.
