Disclaimer:
My words have been formatted by an LLM, but I am a person.
The problem is simple: Auto does a solid job selecting Pro vs. Flash at the start of a user turn. But that selection is locked for the entire agentic loop. If Auto correctly escalates to Pro for initial reasoning and planning, Pro then stays hot for every subsequent inference round — tool calls, debugging, file writes, all of it — burning expensive tokens on execution work Flash could handle blindfolded.
Prior art already exists. Goose's lead/worker pattern solves exactly this:
🦢 Lead model (e.g. Claude Opus, GPT-4o) handles the first N inference rounds — planning, architecture, complex reasoning
⚙️ Worker model (e.g. Claude Haiku, GPT-4o-mini) takes over for execution rounds — file edits, test runs, routine implementation
🛡️ Failure fallback — if the worker starts producing broken code or repeated tool failures, the orchestrator automatically pulls the lead back in for a few recovery rounds, then re-delegates
Critically: same context window, same session history. The switch happens at the seam between inference rounds — after tool results return to the orchestrator but before the next LLM completion is dispatched. No forked contexts, no separate planner-executor sessions. The orchestrator just points the next API call at a different model endpoint.
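For illustration, here's roughly what the configuration surface could look like if something like this were exposed in Gemini CLI — every field name below is hypothetical, not an existing setting:

```typescript
// Hypothetical shape for a lead/worker routing config.
// None of these fields exist in Gemini CLI today; this only shows how
// small the configuration surface for the pattern could be.
interface LeadWorkerConfig {
  leadModel: string;        // model used for the first N inference rounds
  workerModel: string;      // model used for execution rounds afterwards
  leadRounds: number;       // rounds the lead keeps before handing off
  failureThreshold: number; // consecutive worker failures before escalating back
  recoveryRounds: number;   // rounds the lead holds during recovery before re-delegating
}

const exampleConfig: LeadWorkerConfig = {
  leadModel: "gemini-2.5-pro",
  workerModel: "gemini-2.5-flash",
  leadRounds: 2,
  failureThreshold: 2,
  recoveryRounds: 2,
};
```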
This is the seam that already exists in any agentic loop:
```
User Prompt
  → [Auto selects Pro] Inference Round 1 (planning)
  → Tool calls execute → results return to orchestrator
  → 🎯 THIS IS WHERE YOU SWITCH TO FLASH
  → Inference Round 2 (execution)
  → Tool calls execute → results return
  → Inference Round N...
  → Final Response
```
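To make the seam concrete, here's a minimal sketch of an agentic loop that re-selects the model at each round boundary. This is TypeScript against imagined `complete`/`runTools` helpers (and the hypothetical `LeadWorkerConfig` above), not Gemini CLI's actual internals:

```typescript
// Hypothetical message/completion shapes, just enough to make the loop read.
type Message = { role: "user" | "assistant" | "tool"; content: string };
type ToolCall = { name: string; args: unknown };
type Completion = { message: Message; toolCalls: ToolCall[] };

// Minimal sketch of an agentic loop with per-round model routing.
// `complete` and `runTools` are assumed helpers, not real Gemini CLI APIs.
async function agenticTurn(
  userPrompt: string,
  config: LeadWorkerConfig,
  complete: (model: string, history: Message[]) => Promise<Completion>,
  runTools: (calls: ToolCall[]) => Promise<Message[]>,
): Promise<string> {
  const history: Message[] = [{ role: "user", content: userPrompt }];
  let round = 1;

  while (true) {
    // The only change from today's behavior: the model is chosen per round,
    // not once per user turn. Same history, same context window.
    const model = round <= config.leadRounds ? config.leadModel : config.workerModel;

    const completion = await complete(model, history);
    history.push(completion.message);

    if (completion.toolCalls.length === 0) {
      return completion.message.content; // no more tool calls: final response
    }

    // Tool results come back to the orchestrator -- this is the seam.
    const toolResults = await runTools(completion.toolCalls);
    history.push(...toolResults);
    round++;
  }
}
```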
The orchestrator has full control between every inference round. Goose exploits this. Gemini CLI's Auto mode currently doesn't — it makes one model selection and holds it for the entire loop.
What I'm proposing is lightweight. Not a full planner-executor architecture with separate contexts. Not subagents. Just: let Auto re-evaluate (or deterministically downshift) at inference-round boundaries within a single user turn. Pro plans, Flash executes, and if Flash fumbles, Pro steps back in.
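The deterministic-downshift-plus-fallback policy could be as small as this — again a sketch, where `recentFailures` is an assumed counter of consecutive worker-round failures (tool errors, failed patches, etc.) and how that's detected is an open design question:

```typescript
// Sketch of a per-round routing decision with failure-based escalation.
// `recentFailures` counting is left abstract; Gemini CLI exposes nothing like this today.
function pickModel(
  round: number,
  recentFailures: number,
  config: LeadWorkerConfig,
): string {
  // Escalate back to the lead if the worker keeps stumbling.
  if (recentFailures >= config.failureThreshold) {
    return config.leadModel;
  }
  // Deterministic downshift once the planning rounds are done.
  return round <= config.leadRounds ? config.leadModel : config.workerModel;
}
```

A fuller version would also track how long the lead holds during recovery (the `recoveryRounds` knob) before re-delegating to the worker, but the core decision really is this small.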
The open question: I don't know Gemini CLI's internals well enough to say whether the agentic loop exposes this orchestrator seam cleanly, or whether the whole thing is delegated to the API as a monolithic streaming call with no natural injection point. If it's the former, this should be relatively straightforward to implement — Goose's Rust codebase is open source and the pattern is well-documented. If it's the latter, it's a deeper architectural ask.
Would love to hear if others feel this pain, and whether the Gemini team has considered inference-round-level model routing as a feature.