
Planner Graph Refactor Proposals — Ranked Analysis

Author: Sisyphus (qwen3.6-plus)
Date: 2026-06-11
Scope: 11 proposals in docs/planner-graph-ref/proposals/
Reference: docs/planner-graph-ref/current-graph.md, src/venturescope/planner/agent.py:846-1192


Common Problem Statement (all proposals agree)

The plan_node() function (~350 lines, 15+ early-return paths) is a monolith that hides the planner's real control flow from the graph diagram. All proposals agree on the diagnosis:

| Concern | Lines | LLM needed? |
|---|---|---|
| Iteration tick + abort/max_iters | 848–862 | No |
| Region/currency bootstrap | 864–887 | No |
| Proactive decomposition + schema composition | 889–917 | Yes (lazy) |
| Calculator lifecycle gates | 919–943 | No |
| Blocked-calc acquisition + dynamic decomp | 945–988 | Yes (lazy) |
| Auto-finish + acquisition fast path | 993–1035 | No |
| Planner LLM call | 1037–1068 | Yes |
| Post-LLM redirects + cap enforcement | 1070–1192 | Yes (lazy) |

The shared goal across all proposals: move deterministic orchestration to graph-level nodes + conditional edges, shrink plan to LLM-only.


Proposal Ranking (Best → Worst)

1. fable-5 — ⭐ Best balance of ambition and pragmatism

Nodes: tick, prepare, select, decide, guard (5 new nodes)

Strengths:

  • decision_origin flag ("deterministic" vs "llm") is the single best insight across all proposals. It cleanly gates which rewrite subset guard applies, preserving the current llm_failed infinite-loop protection without ad-hoc state juggling.
  • select as a deterministic ladder preserves the exact order of today's early returns (calc cap → calc success → blocked calc → acquisition fast path → auto-finish). This is critical: the ordering is not accidental.
  • Migration plan is genuinely safe: Phase 1 extracts without rewiring (plan_node becomes thin sequential composition). Phase 2 rewires. This means tests stay green at every step.
  • Explicitly addresses checkpoint compatibility: Bump the planner thread namespace (:planner:v2). This is the only proposal that treats in-flight checkpoint breakage as a first-class concern with a concrete mitigation.
  • Follow-up section is honest: Acknowledges decompose as a loop node and Command(goto=...) as deferred improvements. Scope discipline.
  • llm_failed in state replaces the local variable cleanly, enabling the guard to skip calc-adjustment after structured-output failure.

Weaknesses:

  • 4 checkpoint writes per tick (tick, prepare, select, decide/guard). Acceptable but measurable.
  • select still contains one LLM call (blocked-field decomposition). Deferred to a follow-up decompose node, which is honest but leaves a rough edge.

Verdict: The most production-ready proposal. Five nodes is the sweet spot between "still a black box" and "graph noise." The decision_origin concept is worth adopting even if the node count changes.
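To make the ordering argument concrete, here is a minimal sketch of a deterministic ladder in the spirit of fable-5's select. The rung order follows the review; every state key and action name is a hypothetical placeholder, since the real planner state is not shown in this analysis.

```python
from typing import Callable, List, Optional

State = dict  # stand-in for the planner state; the real schema is not shown here

# Each rung returns an action name, or None to fall through to the next rung.
# All state keys and action names below are hypothetical placeholders.
def calc_cap(state: State) -> Optional[str]:
    over = state.get("calc_attempts", 0) >= state.get("calc_cap", 3)
    return "finish_abort" if over else None

def calc_success(state: State) -> Optional[str]:
    return "finish" if state.get("calc_done") else None

def blocked_calc(state: State) -> Optional[str]:
    return "acquire_blocked" if state.get("blocked_fields") else None

def acquisition_fast_path(state: State) -> Optional[str]:
    return "acquire" if state.get("pending_tasks") else None

def auto_finish(state: State) -> Optional[str]:
    return "finish" if state.get("all_fields_filled") else None

# Order mirrors the ladder named in the review; reordering changes behavior.
LADDER: List[Callable[[State], Optional[str]]] = [
    calc_cap, calc_success, blocked_calc, acquisition_fast_path, auto_finish,
]

def select(state: State) -> Optional[str]:
    for rung in LADDER:
        action = rung(state)
        if action is not None:
            return action
    return None  # nothing deterministic applies; hand over to the LLM node
```

Because the first matching rung wins, a state that satisfies both the calc cap and calc success resolves to the cap, which is the kind of precedence the review calls "not accidental."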


2. gpt-5.5 — ⭐ Second best, excellent migration discipline

Nodes: enter_iteration, prepare_schema, require_region/require_currency, acquisition_gate, plan, normalize_decision, retry_gate, maybe_calculate (8 new nodes)

Strengths:

  • normalize_decision + retry_gate split is the cleanest post-LLM pipeline across all proposals. Normalization (derived redirect, web-first redirect, blocked-calc wording) is conceptually separate from retry limits (search cap, ask-user cap).
  • maybe_calculate as a dedicated calculator router replaces the hidden _adjust_calculation_decision() call with an explicit graph node. This makes the finish→calculate rewrite visible in the diagram.
  • Migration plan is exemplary: Step 1 extracts pure helpers without changing graph shape. Step 2 adds nodes one group at a time. Step 3 renames only after behavior is stable. Step 4 updates docs. This is textbook safe refactoring.
  • Explicit non-goals section prevents scope creep: no outer graph changes, no PlannerDecision redesign, no search backend changes.
  • State changes are minimal and justified: Only llm_failed or decision_origin. Explicitly warns against adding prepared_recipes to state unless serialization is proven safe.

Weaknesses:

  • 8 nodes is a lot. The graph becomes visually larger than fable-5's 5 nodes.
  • require_region and require_currency as separate nodes may be over-splitting: they're structurally identical and could be one bootstrap_gate node with a routing parameter.
  • The Mermaid diagram has some routing ambiguities (e.g., prepare_schema → three conditional edges plus a fall-through to acquisition_gate is not clearly expressed).

Verdict: If the team values explicitness over conciseness, this is the best proposal. The migration plan is the most detailed and safest. The normalize_decision/retry_gate split is architecturally sound.
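The separation praised above can be sketched as two small pure functions: one that rewrites what the decision means, and one that only enforces attempt limits. This is an illustration under assumptions, not gpt-5.5's code; the Decision shape, the web-first rule, and the fallback actions are hypothetical.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Decision:
    action: str
    field: str = ""

def normalize_decision(decision: Decision, web_first_fields: frozenset) -> Decision:
    # Normalization: an ask_user for a web-first field becomes a search (assumed rule).
    if decision.action == "ask_user" and decision.field in web_first_fields:
        return replace(decision, action="search")
    return decision

def retry_gate(decision: Decision, attempts: dict, caps: dict) -> Decision:
    # Retry limits: once an action hits its cap, fall back (fallbacks are assumed).
    cap = caps.get(decision.action)
    if cap is not None and attempts.get(decision.action, 0) >= cap:
        fallback = "ask_user" if decision.action == "search" else "finish"
        return replace(decision, action=fallback)
    return decision
```

Keeping the two stages apart means a cap can be tested without constructing any redirect scenario, and vice versa.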


3. gpt-5.4 — ⭐ Third, pragmatic 6-node pipeline

Nodes: tick, bootstrap_gate, prepare_context, acquisition_gate, llm_plan, decision_policy (6 new nodes)

Strengths:

  • 6 nodes is the right granularity. Not too few (still hides logic), not too many (graph noise). Each node has a clear, named responsibility.
  • decision_policy as a single post-LLM node is the right call. The proposal explicitly warns against splitting every rewrite into separate nodes: "Splitting every rewrite into separate nodes would add graph noise without improving clarity." This is correct.
  • Migration order is optimal: tick first (cleanest seam), then bootstrap_gate, then prepare_context, then acquisition_gate, then slim plan, then decision_policy. Each step adds one node without breaking the previous step's behavior.
  • Anti-patterns section is valuable: "Do not move state mutation into edge functions," "Do not increment iterations in every gate," "Do not turn every policy rule into a node." These are lessons learned from reading the other proposals.
  • Design rule is crisp: "Deterministic orchestration belongs in graph phases; business logic stays in helpers; LLM planning stays small."

Weaknesses:

  • Less detailed on state changes than fable-5 or gpt-5.5. No explicit decision_origin or llm_failed flag discussion.
  • The Mermaid diagram doesn't show the decision_policy → action nodes routing as clearly as it could.
  • No explicit checkpoint compatibility discussion.

Verdict: The most balanced proposal in terms of node count and migration safety. The anti-patterns section alone makes it worth reading. If fable-5's decision_origin concept were added, this would be #1.
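The "do not increment iterations in every gate" rule can be illustrated with a tick node that owns the counter and a gate that never touches it. The state keys, the default max_iters, and the comparison used for the cap are assumptions for the sketch.

```python
def tick(state: dict) -> dict:
    """The only node that advances the iteration counter."""
    iteration = state.get("iteration", 0) + 1
    update = {"iteration": iteration}
    # Abort/max_iters check lives next to the increment (threshold is assumed).
    if state.get("aborted") or iteration > state.get("max_iters", 10):
        update["decision"] = {"action": "finish", "reason": "abort_or_max_iters"}
    return update

def bootstrap_gate(state: dict) -> dict:
    """A gate downstream of tick: may emit a decision, never writes the counter."""
    if not state.get("region"):
        return {"decision": {"action": "ask_user", "field": "region"}}
    if not state.get("currency"):
        return {"decision": {"action": "ask_user", "field": "currency"}}
    return {}
```

Each node returns only the keys it changes, so an accidental second increment would show up as an unexpected "iteration" key in a gate's update.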


4. opus — Thorough but over-granular

Nodes: g_iter_cap, g_region, g_currency, enrich_schema, g_calc_caps, acquire, plan_llm, c_target_decompose, c_redirect_derived, c_redirect_web, c_search_cap, c_ask_cap, c_calc_adjust, dispatch (14 nodes)

Strengths:

  • Most detailed node-by-node mapping of any proposal. Every line range in plan_node is mapped to a specific new node.
  • g_* / c_* naming convention is clear: gates vs correctors. The invariant ("every path deposits a PlannerDecision into state["decision"]") is well-stated.
  • State write table is excellent: each node's write set is explicitly documented, eliminating the 8 separate if schema_changed blocks.
  • Three-step migration is reasonable: extract helpers → hoist gates → split correctors.
  • Open questions are genuine: Should enrich_schema always run? Should acquire be one node or two? Should c_search_cap and c_ask_cap be merged?

Weaknesses:

  • 14 nodes is too many. The corrector chain (6 sequential c_* nodes) adds 6 checkpoint writes per iteration for zero routing benefit: they're a straight add_edge chain with no branching. This is graph noise.
  • Splitting g_region and g_currency adds a node each for what is structurally identical logic. A single bootstrap_gate would suffice.
  • The Mermaid diagram is hard to read with 14 nodes and many intermediate emit_* nodes.
  • Cost analysis acknowledges the problem: "up to 10 hops per iteration" with "measurable" checkpoint writes. The mitigation (use add_edge not add_conditional_edges) helps but doesn't eliminate the overhead.

Verdict: Excellent analysis, over-engineered architecture. The node-by-node mapping and state write table are reference-quality. The corrector chain should be collapsed into a single enforce_policy node.
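Collapsing the corrector chain could look like the following sketch: the correctors stay as separately testable functions, but one node applies them in order and produces a single state write. Only two correctors are shown, and their bodies, state keys, and cap defaults are hypothetical.

```python
from typing import Callable, List

Decision = dict
Corrector = Callable[[Decision, dict], Decision]

def c_search_cap(decision: Decision, state: dict) -> Decision:
    # Hypothetical rule: a search over its cap becomes an ask_user.
    if decision["action"] == "search" and state.get("search_attempts", 0) >= state.get("search_cap", 3):
        return {**decision, "action": "ask_user"}
    return decision

def c_ask_cap(decision: Decision, state: dict) -> Decision:
    # Hypothetical rule: an ask_user over its cap becomes a finish.
    if decision["action"] == "ask_user" and state.get("ask_attempts", 0) >= state.get("ask_cap", 2):
        return {**decision, "action": "finish"}
    return decision

# Order is part of the contract, exactly as it would be for a chain of nodes.
CORRECTORS: List[Corrector] = [c_search_cap, c_ask_cap]

def enforce_policy(state: dict) -> dict:
    """Single node: run every corrector in order, emit one state update."""
    decision = state["decision"]
    for corrector in CORRECTORS:
        decision = corrector(decision, state)
    return {"decision": decision}
```

The sequencing is identical to a straight chain of nodes; what changes is that the graph records one step instead of one per corrector.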


5. kimi-2.6 — Solid 8-node pipeline with good decision matrix

Nodes: tick, ask_region/ask_currency, prepare, calc_gate, acquire, route_direct, decide, enforce_policy (8 nodes + 2 bootstrap)

Strengths:

  • Decision matrix appendix is the best reference artifact across all proposals. Every logic block is mapped to current location, proposed location, and reason.
  • calc_gate as a dedicated node makes calculator lifecycle checks visible. The three-way routing (finish_success | finish_abort | continue) is clean.
  • route_direct as a tiny adapter bridges deterministic acquisition tasks into the same post-processing pipeline as LLM decisions. This is a good pattern.
  • Migration path is practical: Extract helpers → register as nodes → delete plan_node. The phased wrapper approach (plan_node becomes a thin delegator during transition) is safe.

Weaknesses:

  • ask_region and ask_currency as separate nodes is over-splitting (same issue as opus).
  • enforce_policy is still complex (all post-LLM rewrites in one node). This is acceptable but the proposal doesn't acknowledge the trade-off.
  • No explicit decision_origin or llm_failed state field discussion.
  • The Mermaid diagram shows ask_region → observe_user and ask_currency → observe_user but doesn't show how observe_user routes back to tick.

Verdict: The decision matrix appendix makes this a valuable reference. The node count is reasonable. The lack of decision_origin is a notable gap.
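The three-way routing out of calc_gate can be written as a small typed predicate. The route labels come from the review; the shape of the calculator sub-state and the attempt limit are assumptions.

```python
from typing import Literal

CalcRoute = Literal["finish_success", "finish_abort", "continue"]

def route_after_calc_gate(state: dict) -> CalcRoute:
    """Map calculator lifecycle state to one of three routes (keys are hypothetical)."""
    calc = state.get("calculator", {})
    if calc.get("status") == "success":
        return "finish_success"
    if calc.get("attempts", 0) >= calc.get("max_attempts", 3):
        return "finish_abort"
    return "continue"
```

Typing the return value as a Literal lets a type checker flag a misspelled route label before it reaches the graph.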


6. glm-5.1 — Conservative 2-node split

Nodes: pre_check, acquire_or_plan (2 new nodes)

Strengths:

  • Most conservative proposal. Only 2 new nodes, minimal graph changes. Lowest risk.
  • _build_plan_output helper to collapse the 9 conditional returns is a good code-level improvement independent of graph changes.
  • Post-LLM rewriting options (Option A: keep inline, Option B: extract validate_decision) show good judgment about when to stop refactoring.
  • State change is minimal: Only _pre_check_route field. The alternative (extending Action enum) is correctly rejected.
  • Risk table is thorough: Loop-back edge changes, region/currency timing, _build_plan_output hiding logic, dynamic decomposition dependencies.

Weaknesses:

  • 2 nodes is too few. acquire_or_plan is still ~200 LOC after extraction. The proposal acknowledges this ("still substantial") but doesn't fully address the cognitive load.
  • Doesn't solve the core problem fully. The graph still hides most routing inside acquire_or_plan. The Mermaid diagram is only marginally more informative than today's.
  • No explicit migration testing strategy. Phase 1 says "update tests" but doesn't specify how.
  • The _pre_check_route field as a string is less type-safe than fable-5's decision_origin: Literal["deterministic", "llm"].

Verdict: Good first step, not a complete solution. If the team wants to start small, this is the right entry point. But it should be followed by further decomposition.


7. deepseek-4-pro — Well-structured but over-engineered

Nodes: guard, prepare, plan, adjust (4 new nodes, but with self-loop on plan)

Strengths:

  • 4-node pipeline is conceptually clean. Guard → Prepare → Plan → Adjust maps well to the mental model.
  • Detailed trade-off table with before/after comparison is useful.
  • Alternatives considered section shows good engineering judgment: rejected 16-node explosion, rejected keep-plan-node-with-edges, rejected calculator subgraph, rejected guard-only extraction.
  • Migration phases are well-ordered: Extract → Update graph → Remove old code → Verify.

Weaknesses:

  • plan self-loop for decomposition is a rough edge. The conditional edge plan → plan (when decomposition is generated) adds complexity that fable-5 avoids by putting decomposition in prepare.
  • recipes in state is a new field that fable-5 and gpt-5.5 avoid. The proposal acknowledges this risk ("must stay synchronized with dynamic_decompositions") but doesn't fully mitigate it.
  • guard handles region/currency + abort/max_iters but prepare handles calculator gates + auto-finish. The split between these two is not obvious: why are calculator gates in prepare rather than guard? The proposal doesn't justify this boundary clearly.
  • No checkpoint compatibility discussion.

Verdict: Good structure, but the plan self-loop and recipes in state are rough edges that fable-5 handles better.


8. qwen-3.6-plus — Good ideas, flawed graph topology

Nodes: preflight, ask_region/ask_currency, decompose, plan, route_finish_check (5 new nodes + 4 bootstrap)

Strengths:

  • route_finish_check as a validation node is a good idea: the current blind finish → END path doesn't verify that finish is actually appropriate.
  • Routing function signatures are well-documented. Each routing function has a clear return type and responsibility.
  • Phase-by-phase migration is reasonable, starting with pre-flight nodes (lowest risk).

Weaknesses:

  • Graph topology has a critical flaw: The loop-back edges point to plan instead of preflight. This means guard checks (aborted, max_iters, region, currency) are NOT re-run on every iteration. This is a behavioral regression.
  • ask_region/ask_currency with dedicated observe_region/observe_currency adds 4 nodes for what is currently 20 lines of inline logic. Over-splitting.
  • route_after_plan becomes a routing function with redirect logic AND state mutation. This violates the principle that edge functions should be pure predicates. The proposal acknowledges this but proceeds anyway.
  • No decision_origin or llm_failed state field. The redirect logic in route_after_plan needs to know whether the decision came from LLM or deterministic path.
  • The Mermaid diagram is inconsistent with the proposed graph construction code (e.g., observe → plan vs observe → preflight).

Verdict: Good ideas buried in an inconsistent topology. The loop-back-to-plan bug alone makes this proposal unsafe to implement as written.
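The loop-back flaw is easy to see in a toy model. The sketch below walks two simplified three-node graphs (the real proposal has more nodes and conditional edges) and counts how often preflight runs; the graph shapes are reduced to the one edge that matters.

```python
def run(edges: dict, entry: str, hops: int) -> list:
    """Follow unconditional edges from entry for a fixed number of hops."""
    visited, node = [], entry
    for _ in range(hops):
        visited.append(node)
        node = edges[node]
    return visited

# Simplified topologies: only the loop-back target differs.
buggy = {"preflight": "plan", "plan": "observe", "observe": "plan"}
fixed = {"preflight": "plan", "plan": "observe", "observe": "preflight"}

def preflight_runs(edges: dict) -> int:
    return run(edges, "preflight", 9).count("preflight")
```

Over nine hops the buggy wiring runs preflight once and the fixed wiring runs it three times, so any abort or max_iters check placed in preflight is evaluated only on the first pass in the buggy version.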


9. qwen-3.7-max — Overly granular, 10 nodes with _route state pollution

Nodes: check_termination, check_region_currency, compose_schema, check_calculator, acquisition_routing, check_completion, llm_decide, post_process, enforce_caps, route_decision (10 new nodes)

Strengths:

  • Most detailed code examples of any proposal. Each node has a full Python implementation sketch.
  • _route transient field pattern is consistent across all nodes.
  • Migration strategy is phased: Guards → Transformations → LLM → Graph → Cleanup.

Weaknesses:

  • 10 nodes is excessive. The proposal splits every concern into its own node without considering whether the granularity adds value.
  • _route field in state is a code smell. Using state as a routing signal means every node writes a _route key that is immediately consumed and discarded. This pollutes the checkpoint and makes the state schema noisy.
  • observe_user → check_region_currency breaks the existing observe_user → plan loop. Region/currency answers currently go through the same observe_user_node that handles all user answers. This changes behavior.
  • No decision_origin concept. The proposal doesn't distinguish between deterministic and LLM-originated decisions in the post-processing pipeline.
  • The code examples have bugs: check_termination_node returns _route: "finish" but the routing function reads state.get("_route", "continue"); the default would never trigger.
  • Checkpoint serialization concern is acknowledged ("exclude transient _route fields") but LangGraph's PostgresSaver doesn't support field-level exclusion.

Verdict: The code examples are impressive but the architecture is over-engineered. The _route state pollution is a significant anti-pattern.


10. gemini-3.1-pro — Incomplete analysis, too coarse

Nodes: prepare_state, evaluate_rules, llm_plan (3 new nodes)

Strengths:

  • 3-node split is the simplest proposal. Easy to understand, easy to implement.
  • Benefits section is clear: True graph visibility, reduced latency/cost risk, cleaner unit tests.

Weaknesses:

  • Doesn't address post-LLM rewriting at all. The current _redirect_derived_direct_decision, _redirect_premature_ask_for_web_field, cap enforcement, and _adjust_calculation_decision are all left unaddressed.
  • evaluate_rules is still a mega-node. It combines abort/max_iters checks, region/currency gating, calculator status, acquisition task selection, and auto-finish logic. This is barely better than the current plan_node.
  • No migration plan. Just "refactor agent.py functions" with no phasing or testing strategy.
  • No state changes discussion. The proposal doesn't address how evaluate_rules communicates its decision to the routing function.
  • The Mermaid diagram shows route_after_rules and route_after_llm as diamond nodes (conditional edges), but doesn't specify what state they read.

Verdict: Too coarse to be useful. The 3-node split doesn't solve the core problem — evaluate_rules is still a black box.


11. mimo-2.5-pro — Incomplete, naming confusion

Nodes: guards, acquire, decide (3 new nodes)

Strengths:

  • 3-node split is simple. Guards → Acquire → Decide is easy to understand.
  • Migration plan is phased: Extract without changing topology → Wire into graph → Cleanup.
  • Expected benefits table is clear and measurable.

Weaknesses:

  • decide node is still ~80 lines and includes LLM call + all redirectors + cap enforcement. This is barely better than the current plan_node.
  • The graph construction code has a bug: builder.add_edge("decide", "plan_router") is followed by builder.add_conditional_edges("plan_router", route_after_plan, ...), but plan_router is never defined or registered as a node.
  • route_after_acquire routes to "decide" or "finish" but doesn't handle the case where acquire finds a task that should go directly to an action node.
  • No decision_origin or llm_failed state field.
  • Open questions are left unanswered: Should decide include loop/cap detection? Should plan_router be a node or routing function? Naming preferences?

Verdict: Incomplete and has implementation bugs in the proposed code. The 3-node split is too coarse.
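A wiring mistake like the unregistered plan_router can be caught before the graph is ever built with a plain set check. This sketch is framework-independent; the sentinel names and every edge other than decide → plan_router are assumptions made for illustration.

```python
from typing import Iterable, Set, Tuple

# Placeholder names for the graph's entry and exit sentinels (assumed).
SENTINELS = {"__start__", "__end__"}

def unregistered_endpoints(nodes: Set[str], edges: Iterable[Tuple[str, str]]) -> Set[str]:
    """Return edge endpoints that were never registered as nodes."""
    endpoints = {name for edge in edges for name in edge}
    return endpoints - nodes - SENTINELS

nodes = {"guards", "acquire", "decide"}
edges = [
    ("__start__", "guards"),
    ("guards", "acquire"),
    ("acquire", "decide"),
    ("decide", "plan_router"),  # the edge quoted in the review
]
```

Running the check on this wiring reports plan_router as the only dangling name, which is a cheap assertion to add to a test suite during any of the migrations discussed here.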


Core concepts to adopt (from multiple proposals):

| Concept | Source | Why |
|---|---|---|
| decision_origin flag | fable-5 | Cleanly gates which rewrite subset applies; preserves llm_failed protection |
| tick as sole iteration incrementer | fable-5, gpt-5.4 | Prevents drift in attempt tracking |
| Deterministic ladder in select | fable-5 | Preserves exact ordering of today's early returns |
| Single decision_policy / enforce_policy node | gpt-5.4, kimi-2.6 | Avoids graph noise from splitting every rewrite |
| Phase 1: extract without rewiring | fable-5, gpt-5.5, opus | Keeps tests green at every step |
| Bump planner thread namespace | fable-5 | Concrete mitigation for in-flight checkpoint breakage |
| Anti-patterns list | gpt-5.4 | "Don't mutate state in edge functions," "Don't split every rule" |
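As a rough illustration of how decision_origin and llm_failed could gate the post-decision rewrites, consider the sketch below. The field names come from the review; which rewrites belong to which subset is an assumption, as is the default origin of "llm".

```python
from typing import List, Literal, TypedDict

class PlannerState(TypedDict, total=False):
    decision_origin: Literal["deterministic", "llm"]
    llm_failed: bool

def rewrites_for(state: PlannerState) -> List[str]:
    """Pick which rewrite groups the guard node should run (subsets are hypothetical)."""
    if state.get("decision_origin", "llm") == "deterministic":
        # Deterministic decisions are assumed to need only cap enforcement.
        return ["caps"]
    if state.get("llm_failed", False):
        # After a structured-output failure, skip calc-adjustment to avoid looping.
        return ["redirects", "caps"]
    return ["redirects", "caps", "calc_adjust"]
```

The point of the flag is that the guard never has to infer where a decision came from by inspecting unrelated state.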

The sweet spot is between glm-5.1's 2 nodes (too few) and opus's 14 nodes (too many). The fable-5 proposal's 5 nodes (tick, prepare, select, decide, guard) is the right granularity.

Recommended migration sequence:

  1. Extract helpers without graph changes: plan_node becomes thin sequential composition
  2. Add decision_origin and llm_failed to State, serializer-compatible with defaults
  3. Rewire graph: register 5 nodes, add conditional edges, bump thread namespace
  4. Delete plan_node, then update tests and docs
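Steps 2 and 3 can be sketched in a few lines. The :planner:v2 suffix is the one quoted from fable-5; how the thread ID is assembled and what defaults the new fields take are assumptions, and with_defaults is a hypothetical helper.

```python
PLANNER_THREAD_SUFFIX = ":planner:v2"  # suffix quoted in the review

def planner_thread_id(session_id: str) -> str:
    """Versioned thread ID so old in-flight checkpoints are never resumed (assumed scheme)."""
    return f"{session_id}{PLANNER_THREAD_SUFFIX}"

# Assumed defaults for the two new state fields.
NEW_FIELD_DEFAULTS = {"decision_origin": "llm", "llm_failed": False}

def with_defaults(state: dict) -> dict:
    """Fill the new fields when a state written before the change lacks them."""
    return {**NEW_FIELD_DEFAULTS, **state}
```

Because existing keys win over the defaults, a state that already carries llm_failed=True is left unchanged.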

What NOT to do:

  • Don't split correctors into individual nodes. A single enforce_policy / guard node handles all post-LLM rewrites. The sequential chain adds checkpoint writes with zero routing benefit.
  • Don't add _route to state. Use decision_origin + decision.action for routing. Transient routing fields pollute checkpoints.
  • Don't make plan self-loop. Decomposition belongs in prepare, not as a post-LLM loop.
  • Don't move state mutation into edge functions. Edge functions should be pure predicates.
  • Don't split region/currency into separate nodes. One bootstrap_gate or include in tick.
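The "pure predicate" rule for edge functions, combined with routing on decision.action rather than a transient _route key, might look like this. The action names appear in the review; the mapping to node names and the fallback to finish are assumptions.

```python
import copy

# Hypothetical mapping from decision action to the node that handles it.
ACTION_TO_NODE = {
    "search": "search",
    "ask_user": "ask_user",
    "calculate": "calculate",
    "finish": "finish",
}

def route_after_guard(state: dict) -> str:
    """Pure predicate: reads state, returns a node name, writes nothing."""
    action = state.get("decision", {}).get("action", "finish")
    return ACTION_TO_NODE.get(action, "finish")

state = {"decision": {"action": "search"}, "decision_origin": "llm"}
snapshot = copy.deepcopy(state)
next_node = route_after_guard(state)
```

Comparing the state against a deep copy after the call is a simple way to assert in tests that a routing function has no side effects.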