Planner Graph Refactor Proposals: Ranked Evaluation
Evaluator: Fable 5 (claude-fable-5)
Date: 2026-06-11
Inputs: docs/planner-graph-ref/current-graph.md, the 11 files in docs/planner-graph-ref/proposals/, and the live code in src/venturescope/planner/ and src/venturescope/conversation/graph.py.
Disclosure: one of the evaluated proposals (fable-5-proposal.md) was authored by the same model family as this evaluator. To keep the ranking honest, every load-bearing claim in every proposal was checked against the code, and the criteria below are applied uniformly. Where the fable-5 proposal ranks well, the reasons are verifiable code facts, not taste.
Method
Each proposal was scored against six criteria, in descending weight:
- A. Behavioral fidelity: does the new topology map 1:1 to the current plan_node() semantics, or does it silently change behavior (rewrite scope, loop caps, extra LLM calls)? This is a hot-path refactor of every conversation turn; fidelity dominates.
- B. LangGraph idiom soundness: pure conditional-edge functions, interrupt discipline, checkpoint-serializable state additions.
- C. Operational awareness: checkpoint write amplification on the Postgres saver, and migration of in-flight threads parked at the ask_user interrupt.
- D. Granularity balance: enough nodes that the graph tells the truth, few enough that it stays readable and cheap.
- E. Migration & test plan quality: staged, each stage green, test seams identified.
- F. Document groundedness: accuracy of claims about the actual codebase.
Ground-truth facts (verified in code)
Proposals contradict each other on several points. These facts, checked against the current sources, are the referee:
1. plan_node() spans agent.py:846-1192; route_after_plan() (agent.py:2070) trivially returns decision.action. The god-node diagnosis shared by all 11 proposals is correct.
2. The planner subgraph is Postgres-checkpointed in production. conversation/graph.py:58 compiles it with the outer graph's AsyncPostgresSaver; the MemorySaver in agent.py:2127 is only the standalone compile_graph() helper. Therefore every graph superstep is a real DB checkpoint write, and between outer turns the planner thread is parked mid-graph at the ask_user interrupt. Node renames/removals are not free for live conversations.
3. There is exactly one interrupt() site, inside ask_user_node (agent.py:2051). Region/currency answers are dispatched inside observe_user_node via _handle_region_answer/_handle_currency_answer (agent.py:1684-1686).
4. Four LLM call sites live inside plan_node's scope: proactive decomposition (agent.py:628 via _proactive_decompositions), blocked-path decomposition (agent.py:961), the planner decision itself (agent.py:1059), and post-LLM decomposition for composite targets (agent.py:1080). Any proposal claiming its "preparation/guard" stage is LLM-free while including proactive decomposition is internally inconsistent.
5. Deterministic decisions bypass the rewrite chain. The acquisition fast path (agent.py:1018-1024) and auto-finish (agent.py:1026-1035) return early with only _adjust_calculation_decision applied. The redirect/cap chain (agent.py:1070-1192) applies to LLM decisions only. Wiring deterministic decisions through the full policy chain is a behavior change, not a refactor.
6. run_planner_step() attributes per-turn search attempts using an iteration floor (_turn_searches, planner/__init__.py:212,228). Any topology where a loop path skips the iteration increment breaks both the max_iters safety net and turn attribution in the UI.
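Fact 5 is the one most proposals get wrong, so a minimal sketch of the split may help. The names below (Decision, apply_redirects_and_caps, adjust_calculation, guard) are hypothetical stand-ins rather than the functions in agent.py, and the "deterministic" origin value is an assumption; the sketch only illustrates that deterministic decisions receive the calculation adjustment alone while LLM decisions go through the full chain.

```python
from dataclasses import dataclass, replace
from typing import Literal


@dataclass(frozen=True)
class Decision:
    action: str
    origin: Literal["llm", "deterministic"]  # assumed values, for illustration
    applied: tuple[str, ...] = ()  # trace of rewrite steps, illustration only


def apply_redirects_and_caps(d: Decision) -> Decision:
    # Stand-in for the redirect/cap chain that applies to LLM decisions only.
    return replace(d, applied=d.applied + ("redirects", "caps"))


def adjust_calculation(d: Decision) -> Decision:
    # Stand-in for the calculation adjustment that every decision receives.
    return replace(d, applied=d.applied + ("calc_adjust",))


def guard(d: Decision) -> Decision:
    """Origin-gated policy: full chain for LLM decisions, calc-adjust only otherwise."""
    if d.origin == "llm":
        d = apply_redirects_and_caps(d)
    return adjust_calculation(d)
```

Routing a deterministic decision through apply_redirects_and_caps as well would still produce a valid decision, which is why the scope broadening is easy to miss in review.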
Where all proposals agree
The direction is not in dispute. All 11 proposals converge on: the plan node is a god node; the LLM decision should become a thin, isolated node; deterministic gates (iteration caps, region/currency bootstrap, calculator lifecycle, acquisition, auto-finish) should become graph stages with conditional edges; action nodes should loop back to a deterministic entry node; and migration should be "extract helpers first, rewire second."
The disagreements — and therefore the ranking axes — are: granularity (2 to ~14 control nodes), where post-LLM rewrites live (node vs. edge function vs. per-rule nodes), bootstrap question handling (shared ask_user vs. dedicated interrupt nodes), and state additions (flags vs. cached objects).
Ranked list
1. fable-5-proposal.md: tick → prepare → select → decide → guard
Five control stages; action nodes unchanged, looping back to tick; state gains decision_origin and llm_failed.
Good:
- The only proposal that addresses in-flight checkpoint compatibility (fact 2): it recognizes that threads parked at the interrupt reference the old topology, and mitigates with a planner thread-namespace bump (:planner:v2) plus the already-supported re-bootstrap from prior_schema/prior_dynamic_decompositions. Every other proposal is silent or wrong here.
- The only proposal that preserves the rewrite-scope split exactly (fact 5): guard applies the full pipeline for decision_origin="llm" and only _adjust_calculation_decision for deterministic decisions. Proposals 2, 6, and 8 silently broaden the scope.
- Carries llm_failed through state, preserving the current infinite-loop protection after structured-output failures.
- Prices the cost honestly: ~4 extra checkpoint writes per tick, with a named fallback (fuse tick+prepare) if it ever matters. Isolating prepare makes the proactive-decomposition LLM call individually resumable — a concrete crash-recovery win, since LangGraph replays from node boundaries.
- 1:1 responsibility table with line numbers; explicit "this is a topology refactor, not a behavior change" contract; clean 3-step migration where step 1 keeps all tests green.
- State additions are plain str/bool: serializer-safe by construction under the checkpoint_serde.py discipline.
Bad:
- Bootstrap questions are fused into tick, giving it two jobs (iteration lifecycle + region/currency). gpt-5.4/gpt-5.5 make this stage more explicit; fable-5 trades explicitness for one fewer per-iteration superstep — defensible, but a judgment call it under-argues.
- select keeps an embedded LLM call (blocked-path decomposition, agent.py:961) and a five-step deterministic ladder — it remains the densest node; the "every LLM call is a node" goal is explicitly deferred.
- Heavy reliance on current line numbers makes the document rot-prone.
Verdict: the most complete answer on the two hardest, least-visible problems (parked threads, rewrite-scope fidelity), at sane granularity.
2. gpt-5.5-proposal.md: enter_iteration → prepare_schema → require_region/require_currency → acquisition_gate → plan → normalize_decision → retry_gate → maybe_calculate
Seven control nodes; bootstrap emit-nodes route into the existing ask_user.
Good:
- The most disciplined responsibility contracts: each node section states what it owns and what it must not do — the best defense against the god node re-growing.
- require_region/require_currency emit decisions but reuse the single ask_user interrupt — explicit bootstrap stages without multiplying the interrupt surface (fact 3). The cleanest treatment of bootstrap among all proposals.
- Explicitly warns against caching FieldAcquisition recipes in state unless serialization is proven safe — directly rebutting the mistake deepseek and mimo make.
- Best test migration map: names which existing test_planner_decisions.py assertions move to which new node. Best non-goals section, including "do not move planner logic into the outer graph."
- Recommends llm_failed/decision_origin state additions — aligned with fact 5's needs.
Bad:
- Its wiring routes deterministic component tasks through normalize_decision → retry_gate → maybe_calculate — broader than today's calc-adjust-only treatment (fact 5). It names decision_origin but never gates the policy nodes on it, so as written this subtly changes behavior.
- The three-node policy chain (normalize → retry → calculate) means three extra Postgres supersteps after every decision; one origin-gated policy node achieves the same testability at a third of the write cost.
- Checkpoint compatibility is treated only as "don't remove state fields"; node renames vs. parked threads and write amplification (fact 2) go unmentioned.
Verdict: the most rigorous specification; needs origin gating and one policy node instead of three to be safely adoptable.
3. gpt-5.4-proposal.md: tick → bootstrap_gate → prepare_context → acquisition_gate → llm_plan → decision_policy
Five-to-six control stages; one consolidated policy node.
Good:
- The best design judgment, distilled into explicit anti-pattern rules that several other proposals violate: conditional-edge functions must not mutate state (qwen-3.6 fails this); only tick increments iterations (qwen-3.7 fails this); ask_user stays the single interrupt node (kimi and qwen-3.6 fail this); and do not turn every rewrite into a node (opus's step 3 fails this).
- Notices the operational constraint others miss: attempt tracking and turn-scoped search extraction depend on the "one iteration per decision cycle" invariant (fact 6) — its loopback-to-tick rule protects it.
- Granularity is the sweet spot: one decision_policy node for all rewrites, with the explicit argument that a derived_redirect → source_policy → retry_policy → calculator_policy chain "would make the graph harder to follow than it is today."
- Sensible 6-stage migration, smallest-seam-first, with "rename tests only after behavior is stable."
Bad:
- Under-specified: no state-field design at all (how does acquisition_gate tell the router "deterministic decision present" vs. "needs LLM"? no decision_origin/llm_failed), no line-level mapping, no code sketches.
- Deterministic decisions flow through decision_policy with no origin gating — the fact-5 fidelity risk exists here too, unaddressed.
- No checkpoint-compatibility or parked-thread discussion.
Verdict: the right rules and the right shape, but a principles memo rather than an implementable spec; it needs fable-5's or gpt-5.5's mechanics underneath it.
4. opus-proposal.md: gates + correctors as ~14 individual nodes with a dispatch invariant
Good:
- The sharpest problem analysis in the set: a 17-row concern inventory, and the clearest conceptual vocabulary (gate / enrich / acquire / plan / corrector / dispatch). The "every path deposits a PlannerDecision; dispatch reads decision.action" invariant is an elegant unifying contract.
- The per-node write-ownership table genuinely solves the duplicated schema_changed/dynamic_decomps bookkeeping that plagues the current early returns.
- Honest about hop costs; 3-step migration with per-step risk ratings; thoughtful open questions (corrector merge trade-offs, enrich-before-gates ordering).
Bad:
- The end state is over-granular: ~10 supersteps per planner iteration, each a Postgres checkpoint write (fact 2). Worse, the proposed mitigation is factually wrong: statically add_edge-chained nodes still execute as separate checkpointed supersteps — LangGraph does not batch them. The main cost therefore has no real mitigation.
- Per-corrector nodes (c_redirect_derived, c_redirect_web, …) add graph noise without behavioral benefit — the correctors are already small pure functions; promoting each to a node buys checkpoint granularity nobody needs between two pure rewrites. gpt-5.4's counter-argument applies directly.
- Smaller inaccuracies: the Mermaid loops action nodes back to START rather than the iteration gate; "checkpoint footprint shrinks" is misleading (channel deltas are small, but checkpoint count grows ~10×).
Verdict: read it for the analysis and taxonomy; implement only its steps 1-2. Step 3 (per-corrector nodes) should be rejected.
5. deepseek-4-pro-proposal.md: guard → prepare → plan → adjust
Good:
- Clean four-node shape with the correct loopback target (guard), and the only proposal that explains why deterministic decisions still route through adjust ("the alternative would duplicate _adjust_calculation_decision logic") — closest to engaging with fact 5, even if it then over-applies the chain.
- The alternatives-considered section is unique and valuable: it explicitly rejects the 16-node explosion, the guard-only minimal change, and a calculator subgraph — with reasons that hold up.
- Concrete 4-phase migration ending with the actual verify commands (make test, ruff, mypy).
Bad:
- Adds recipes: dict[str, FieldAcquisition] to checkpointed state. FieldAcquisition objects would need MSGPACK_ALLOWLIST registration (checkpoint_serde.py is the project's single source of truth for this), and the cached copy can drift from dynamic_decompositions — exactly the trap gpt-5.5 warns against. Rebuilding recipes per node is cheap; caching them in durable state is the wrong trade.
- Smuggles in a behavior change: a plan → plan self-loop that re-prompts the LLM after generating a decomposition. Today the post-LLM decomposition proceeds straight into the redirect chain (agent.py:1070-1088) with no second planner call. Extra LLM cost and a new loop, presented as pure restructuring.
- LOC estimates are optimistic (prepare "~50 lines" actually absorbs composition + calculator gates + acquisition + auto-finish — most of the old node); parked-thread compatibility is waved off as "just a different node name."
Verdict: a good skeleton with two specific mistakes (recipes-in-state, self-loop) that must be stripped out before use.
6. kimi-2.6-proposal.md: tick → prepare → calc_gate → acquire → (route_direct | decide) → enforce_policy
Good:
- Sound phase separation; decide is genuinely LLM-only (~30 lines); a single consolidated enforce_policy node matches the gpt-5.4 sweet spot.
- route_direct makes the deterministic-task-to-policy bridge an explicit, visible adapter.
- The appendix decision matrix (every logic block → proposed location → reason) is the best quick-reference table in the set.
Bad:
- Introduces ask_region/ask_currency as dedicated interrupt nodes. Today there is exactly one interrupt site (fact 3); multiplying it complicates resume handling, the outer _extract_interrupt_payload contract, and parked-thread migration — for a question type observe_user already handles. gpt-5.4 names this exact anti-pattern.
- Double-application bug in its own wiring: route_direct applies _adjust_calculation_decision, then routes into enforce_policy, which applies caps, redirects, and _adjust_calculation_decision again — both a deviation from fact 5 and a double-adjust.
- Sloppy in places: a calc_adjust node appears in the diagram but is never defined; there is no state-changes section at all (the _route tag mechanism is only implied by a legacy-wrapper sketch); no checkpoint-compat discussion.
Verdict: right instincts on policy consolidation, undermined by the interrupt-surface expansion and unexamined wiring details.
7. glm-5.1-proposal.md: pre_check → acquire_or_plan (2 nodes)
Good:
- The most honest minimal increment: explicitly keeps post-LLM rewriting fused with the LLM node ("Option A") and defers a validate_decision split, with stated reasoning. Lowest migration risk of any proposal.
- The _build_plan_output helper — collapsing the 9-way conditional-return assembly into one composable function — is a genuinely useful cleanup nobody else proposed, valuable during any proposal's extract phase.
- Flags resume-after-interrupt as something to test, and its risk table is mostly sober.
Bad:
- It does not reach the goal. acquire_or_plan remains a ~200-line multi-concern node (acquisition + LLM + full rewrite chain) by its own admission. The LLM call is still not its own superstep, so a crash after the planner call still redoes it, and observability still shows one opaque step.
- Internal contradiction: pre_check is billed as "pure state inspection — no LLM calls," yet its responsibility list includes _proactive_decompositions(), which calls generate_decomposition — an LLM call (fact 4).
- "LangGraph checkpointer is keyed by state, not node name" is misleading reassurance: parked threads carry pending-task/next-node references, which is precisely why fable-5 bumps the thread namespace.
- As an end state it is strictly dominated: every higher-ranked proposal contains glm-5.1 as its phase 1.
Verdict: a safe first commit, not a destination.
8. qwen-3.7-max-proposal.md: 10 control nodes with full code sketches
Good:
- By far the most implementation-ready on paper: complete node bodies, routing functions, and the full _build_state_graph(). Correctly orders the completion check before the LLM, and carries _llm_failed.
- The guard/transformation/LLM node classification is clear and the per-node code makes review concrete.
Bad:
- A real wiring bug: observe_user → check_region_currency skips check_termination, the only node that increments iterations. Every ask→answer cycle therefore never ticks the counter — the max_iters safety net is dead on the dominant loop, and run_planner_step's iteration-floor turn attribution (fact 6) silently breaks. The proposal's completeness makes this verifiable, and it fails verification.
- Claims _route/_llm_failed are "excluded from checkpointing" — no such mechanism exists for LangGraph TypedDict state without custom serde; they will persist in every checkpoint.
- Routes deterministic acquisition decisions through enforce_caps (contra fact 5), even though its own acquisition_routing code already applied calc-adjust — scope broadening plus double-apply.
- 10 control nodes ≈ 6-8 Postgres supersteps per iteration (fact 2), and build_dynamic_recipes is recomputed in five separate nodes.
Verdict: impressive effort whose own detail exposes that it was never traced against the real control flow; the granularity is also past the useful point.
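The iteration-skip bug can be illustrated with a toy graph walker. This is a simplified model under stated assumptions, not the LangGraph runtime: each node has one successor, only a node named tick increments the counter, and the node names are shortened stand-ins for the ones in the proposal.

```python
def run(edges: dict[str, str], start: str, max_iters: int, step_budget: int = 1000) -> str:
    """Walk a toy graph in which only the 'tick' node increments the iteration counter."""
    node, iterations = start, 0
    for _ in range(step_budget):
        if node == "tick":
            iterations += 1
            if iterations > max_iters:
                return "stopped by max_iters"
        node = edges[node]
    return "step budget exhausted (loop never reached the cap)"


# Loop that returns to the tick node: the cap fires.
safe = {"tick": "ask_user", "ask_user": "observe_user", "observe_user": "tick"}

# Loop that re-enters after the tick node: the counter never moves again.
unsafe = {
    "tick": "check_bootstrap",
    "check_bootstrap": "ask_user",
    "ask_user": "observe_user",
    "observe_user": "check_bootstrap",
}
```

Calling run(safe, "tick", 5) ends on the cap, while run(unsafe, "tick", 5) only ends because the toy walker has its own step budget; a real graph would need some other bound.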
9. mimo-2.5-pro-proposal.md: guards → acquire → decide (3 nodes)
Good:
- A reasonable coarse split, and the phase-1 idea of keeping plan_node as a thin orchestrator "to surface hidden coupling before the full split" is a sensible de-risking move. Honest open questions (separate validate_decision? plan_router node or function?).
Bad:
- decide remains a mini god node: LLM call + acquisition-task conversion + all three redirectors + cap enforcement. The claimed "~80 lines" is unrealistic against the ~200 source lines those concerns occupy; the central problem is relocated rather than solved.
- Internal contradiction: the risk table promises "keep all state keys identical" while acquire introduces a new next_acquisition_task state field — an AcquisitionTask object that would hit the serializer allowlist problem, unflagged.
- The decide → plan_router edge is left ambiguous ("or inline routing"); guards bundles four concerns (iteration, bootstrap, calculator gates, abort); interrupt/checkpoint concerns get one hand-wave row.
Verdict: the thinnest of the workable proposals — nothing fatally broken, but little added over the obvious split, with self-contradictions.
10. qwen-3.6-plus-proposal.md: preflight → decompose → plan, redirects in route_after_plan, route_finish_check, dedicated region/currency ask+observe nodes
Good:
- Thorough before/after tables grounded in real helpers (_handle_region_answer etc.); a decompose node that isolates the decomposition LLM call; a 5-phase migration; the route_finish_check instinct (validate finish before END) is interesting.
Bad — two disqualifying design errors:
- Its self-declared "biggest win" — moving the redirect/cap logic into the route_after_plan conditional-edge function — cannot work in LangGraph. Edge functions return route strings; they cannot persist a rewritten decision. The cap fallbacks construct new decisions (a new user_question, status="aborted"); with rewrite-in-edge, ask_user would interrupt carrying the stale search decision (no question text), and the abort status would be lost. This is exactly the "no state mutation in edge functions" anti-pattern gpt-5.4 names.
- route_finish_check →|missing fields| plan loops directly back to plan, bypassing preflight — no iteration increment on that path (fact 6), so an LLM that keeps emitting finish while fields are missing produces an unbounded LLM-call loop.
- Also the largest interrupt/resume surface of any proposal: three interrupt-adjacent ask nodes plus two new observe nodes; and the "no new state fields needed" claim is shaky given the routing its own design requires.
Verdict: detailed but built on a mechanism LangGraph does not support, plus an uncapped loop; would need its centerpiece redesigned into a node (at which point it converges on gpt-5.4/kimi).
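A toy model makes the rewrite-in-edge problem concrete. The step function below is not LangGraph; it is a hypothetical runner that mirrors one rule described above: node return values are merged into state, while a router contributes only a route name. The state keys and the question text are invented for illustration.

```python
import copy
from typing import Callable


def step(state: dict, node: Callable[[dict], dict], router: Callable[[dict], str]) -> tuple[dict, str]:
    """Toy superstep: merge the node's returned update, then ask the router for a route."""
    new_state = {**state, **node(state)}
    route = router(copy.deepcopy(new_state))  # only the returned route name is kept
    return new_state, route


def plan(state: dict) -> dict:
    # Stand-in for the planner: proposes a search with no question text.
    return {"decision": {"action": "search", "user_question": None}}


def rewriting_router(state: dict) -> str:
    # Anti-pattern: rewrites the decision while routing; the rewrite is discarded.
    if state["searches"] >= state["max_searches"]:
        state["decision"] = {"action": "ask_user", "user_question": "Please confirm the missing value."}
        return "ask_user"
    return state["decision"]["action"]


def policy(state: dict) -> dict:
    # Preferred: the rewrite happens in a node, so it is part of the persisted update.
    if state["searches"] >= state["max_searches"]:
        return {"decision": {"action": "ask_user", "user_question": "Please confirm the missing value."}}
    return {}


def pure_router(state: dict) -> str:
    return state["decision"]["action"]
```

With searches at the cap, step(state, plan, rewriting_router) routes to ask_user while the stored decision is still the search with no question text. Running policy as a node followed by pure_router stores the rewritten decision before routing.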
11. gemini-3.1-pro-proposal.md: prepare_state → evaluate_rules → llm_plan (3 nodes)
Good:
- The core three-way cut (state mutation / deterministic rules / LLM) is sound and clearly narrated; correct that visualizers (LangSmith/Mermaid) cannot see the hidden flow; loop-back to prepare_state keeps guards running each iteration; "zero risk of unintentionally invoking the LLM when a deterministic rule should have fired" is a fair framing of the value.
Bad:
- Materially incomplete. The entire post-LLM rewrite/cap layer (agent.py:1070-1192 — derived/web redirects, search and ask caps, calculation adjustment) is absent from both its problem statement and its design; its route_after_llm "just blindly returns decision.action." Implemented as written, the loop protections vanish (regressions: infinite search retries, no ask caps); implemented charitably, they stay buried in llm_plan and the result is glm-5.1 with less analysis.
- evaluate_rules is itself a rules god node — six-plus concerns in one node; the problem moves rather than dissolves.
- Factual sloppiness: repeatedly calls the planner "the Gemini LLM" (the project uses a provider-agnostic LLMClient with an OpenAI implementation); "~150-line procedural waterfall" undercounts a ~350-line function; no state spec, no risk analysis, no checkpoint discussion.
Verdict: the shallowest and least accurate of the set; superseded by every proposal above it.
Comparison table
| # | Proposal | Control nodes | LLM call isolated | Det.-rewrite scope preserved (fact 5) | Single interrupt kept (fact 3) | Pure edge fns | Parked-thread compat (fact 2) | Correctness issues |
|---|---|---|---|---|---|---|---|---|
| 1 | fable-5 | 5 | yes (decide; decomposition partially) | yes (decision_origin) | yes | yes | yes (namespace bump) | none found |
| 2 | gpt-5.5 | 7 | yes | no (policy chain unguarded) | yes | yes | partial (state fields only) | scope broadening |
| 3 | gpt-5.4 | 5-6 | yes | unaddressed | yes (explicit rule) | yes (explicit rule) | no | under-specified mechanics |
| 4 | opus | ~14 | yes | yes (corrector order kept) | yes | yes | no | wrong batching claim; write amplification |
| 5 | deepseek-4-pro | 4 | yes | mostly (explicit adjust routing) | yes | yes | no | recipes-in-state; plan self-loop re-prompt |
| 6 | kimi-2.6 | ~8 | yes | no (route_direct double-adjust) | no (ask_region/ask_currency) | yes | no | undefined node; no state spec |
| 7 | glm-5.1 | 2 | no (fused with rewrites) | yes (by not splitting) | yes | yes | claimed-safe (misleading) | pre_check "no LLM" contradiction |
| 8 | qwen-3.7-max | 10 | yes | no (enforce_caps on det. path) | yes | yes | no | iteration-skip wiring bug; "excluded from checkpoint" claim false |
| 9 | mimo-2.5-pro | 3 | no (decide bundles rewrites) | partially | yes | ambiguous | no | state-key contradiction |
| 10 | qwen-3.6-plus | 6+ | yes | n/a (mechanism broken) | no (3 interrupt nodes) | no (rewrites in edge fn) | no | edge-rewrite impossible; uncapped finish-check loop |
| 11 | gemini-3.1-pro | 3 | yes | omitted entirely | yes | yes | no | caps/redirects missing from design |
Conclusion and recommendation
No single proposal should be adopted verbatim, but the choice of backbone is clear: take the fable-5 topology and harden it with gpt-5.4's discipline rules and gpt-5.5's specification assets.
The fable-5 and gpt-5.4 proposals are nearly isomorphic five-to-six-stage pipelines (tick/prepare/select/decide/guard vs. tick/bootstrap_gate/prepare_context/acquisition_gate/llm_plan/decision_policy), and gpt-5.5 is the same shape at one notch finer granularity. That convergence, reached independently, is itself evidence that ~5-6 control nodes with a single consolidated policy node is the right end state: glm-5.1's 2 nodes don't dissolve the god node, and opus's 14 / qwen-3.7's 10 nodes buy graph noise and 6-10 Postgres checkpoint writes per iteration for no behavioral benefit.
Within that shape, fable-5 wins the backbone slot because it is the only proposal that solves the two problems that would actually hurt in production and that nothing in the test suite would catch early:
- Parked-thread migration (fact 2): live conversations sit at the ask_user interrupt between turns; removing the plan node invalidates their pending state. The thread-namespace bump ({conversation_id}:planner:v2) with re-bootstrap degradation is the only stated, workable migration story.
- Rewrite-scope fidelity (fact 5): decision_origin gating is the only stated mechanism that keeps deterministic decisions on the calc-adjust-only path while LLM decisions get the full redirect/cap chain. These are the subtle semantics that gpt-5.5, kimi, and qwen-3.7 all silently break.
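The namespace bump can be sketched in a few lines. Only the {conversation_id}:planner:v2 pattern comes from the proposal; the helper names and the version constant below are hypothetical, and how the project actually builds its thread ids is not shown in this evaluation. The config shape follows LangGraph's convention of keying checkpoints by configurable.thread_id.

```python
PLANNER_NAMESPACE_VERSION = "v2"  # hypothetical constant, bumped when the topology changes


def planner_thread_id(conversation_id: str, version: str = PLANNER_NAMESPACE_VERSION) -> str:
    """Build a versioned planner thread id so a new topology starts from fresh checkpoints."""
    return f"{conversation_id}:planner:{version}"


def planner_config(conversation_id: str) -> dict:
    # Checkpoints are looked up by thread_id, so a new id never resumes an old parked thread.
    return {"configurable": {"thread_id": planner_thread_id(conversation_id)}}
```

Threads parked under the previous id are simply never resumed; the conversation degrades to the re-bootstrap path rather than loading a checkpoint that references removed nodes.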
Recommended composition
- Backbone (fable-5): tick → prepare → select → decide → guard, with action nodes looping back to tick; add decision_origin + llm_failed to State/PlannerState; bump the planner thread namespace at rewire time; three-step migration (extract → rewire → regenerate docs).
- Acceptance rules (gpt-5.4), enforced in review: conditional-edge functions are pure routers; only tick mutates iterations (protects max_iters and _turn_searches attribution, fact 6); ask_user remains the single interrupt() node; exactly one policy/guard node, not one per rewrite.
- Specification assets (gpt-5.5): adopt its per-node "owns / must-not-do" contracts and its test-migration map (which test_planner_decisions.py assertions move to which node); adopt its prohibition on caching FieldAcquisition/recipes in checkpointed state. If tick proves too dense in review, its require_region/require_currency emit-node pattern (routing into the shared ask_user) is the approved way to split bootstrap out, not dedicated interrupt nodes.
- Incidental cleanups (glm-5.1, opus): apply glm-5.1's _build_plan_output-style consolidation of the 9-way return assembly during the extract phase; use opus's gate/enrich/acquire/plan/corrector taxonomy as naming and review vocabulary.
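The difference between primitive state additions and cached domain objects can be shown with a small round-trip check. This is a sketch under assumptions: json stands in for the project's msgpack-based serde, FieldAcquisitionStub is a hypothetical placeholder for a rich domain object, and the "deterministic" origin value is assumed.

```python
import json
from dataclasses import dataclass
from typing import Literal, TypedDict


class PlannerStateAdditions(TypedDict):
    decision_origin: Literal["llm", "deterministic"]  # assumed value set
    llm_failed: bool


@dataclass
class FieldAcquisitionStub:  # hypothetical stand-in, not the project's class
    field_name: str


def survives_roundtrip(value: object) -> bool:
    """True if the value encodes with a stock serializer and decodes to an equal value."""
    try:
        return json.loads(json.dumps(value)) == value
    except TypeError:
        return False
```

A dict holding only decision_origin and llm_failed passes the check, while a dict that caches FieldAcquisitionStub instances fails it until an encoder is registered, which is the same kind of extra step the allowlist represents.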
Explicitly rejected ideas (with sources)
- Decision rewrites inside conditional-edge functions: cannot persist rewritten decisions; loses question text and abort status (qwen-3.6).
- Dedicated interrupt nodes for region/currency: multiplies the resume surface for questions observe_user already handles (kimi-2.6, qwen-3.6).
- Caching recipes/AcquisitionTask objects in checkpointed state: serializer-allowlist hazard plus drift (deepseek-4-pro, mimo-2.5-pro).
- plan → plan self-loop to re-prompt after decomposition: an unrequested behavior change with extra LLM cost (deepseek-4-pro).
- Per-corrector / per-rule node explosion: 6-10 checkpoint writes per iteration with no batching mechanism in LangGraph (opus step 3, qwen-3.7-max).
- Any loop edge that bypasses the iteration tick: kills max_iters and turn attribution (qwen-3.7-max's observe_user → check_region_currency, qwen-3.6's route_finish_check → plan).
Suggested execution order
- Extract (no topology change): pull tick/prepare/select/decide/guard bodies out of plan_node as module functions; plan_node becomes a thin sequential composition; add decision_origin/llm_failed; apply the _build_plan_output consolidation. All existing tests stay green.
- Rewire: register the five nodes and conditional edges; point action-node returns at tick; delete plan_node/route_after_plan; bump the planner thread namespace; migrate tests per the gpt-5.5 map; verify interrupt resume through run_planner_step end-to-end (bootstrap, ask→answer, abort, calc-success paths).
- Document: regenerate current-graph.md from the new topology; most of its "Routing details" prose disappears because the graph now expresses it; link this analysis and the adopted proposal from an ADR.
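The extract step might look roughly like the sketch below. The five stage names come from the adopted backbone, but every body here is a hypothetical placeholder, as are the pending_task key and the "deterministic" origin value. The point is only the shape: each stage already takes state and returns an update, so the rewire step can register the same functions as nodes without changing them.

```python
def tick(state: dict) -> dict:
    # Only this stage touches the iteration counter; it also clears the previous decision.
    return {"iterations": state.get("iterations", 0) + 1, "decision": None}


def prepare(state: dict) -> dict:
    return {}  # placeholder for schema and decomposition preparation


def select(state: dict) -> dict:
    # Placeholder deterministic ladder: may settle the decision without the LLM.
    if state.get("pending_task"):
        return {"decision": {"action": state["pending_task"]}, "decision_origin": "deterministic"}
    return {}


def decide(state: dict) -> dict:
    if state.get("decision") is not None:
        return {}  # a deterministic decision already exists; skip the LLM
    # Stand-in for the planner LLM call; always proposes a search here.
    return {"decision": {"action": "search"}, "decision_origin": "llm"}


def guard(state: dict) -> dict:
    return {}  # placeholder for the origin-gated policy stage


def plan_node(state: dict) -> dict:
    """Thin sequential composition: same entry point, stages run in order on merged state."""
    update: dict = {}
    for stage in (tick, prepare, select, decide, guard):
        update.update(stage({**state, **update}))
    return update
```

Because plan_node keeps its signature during this phase, existing tests can keep calling it while new per-stage tests are added alongside.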