Planner Graph Refactor Proposals: Ranked Analysis

Author: Sisyphus (qwen3.6-plus)

Date: 2026-06-11

Scope: 11 proposals in docs/planner-graph-ref/proposals/

Reference: docs/planner-graph-ref/current-graph.md, src/venturescope/planner/agent.py:846-1192

Common Problem Statement (all proposals agree)

The plan_node() function (~350 lines, 15+ early-return paths) is a monolith that hides the planner's real control flow from the graph diagram. All proposals agree on the diagnosis:

| Concern | Lines | LLM needed? |
|---|---|---|
| Iteration tick + abort/max_iters | 848–862 | No |
| Region/currency bootstrap | 864–887 | No |
| Proactive decomposition + schema composition | 889–917 | Yes (lazy) |
| Calculator lifecycle gates | 919–943 | No |
| Blocked-calc acquisition + dynamic decomp | 945–988 | Yes (lazy) |
| Auto-finish + acquisition fast path | 993–1035 | No |
| Planner LLM call | 1037–1068 | Yes |
| Post-LLM redirects + cap enforcement | 1070–1192 | Yes (lazy) |

The shared goal across all proposals: move deterministic orchestration to graph-level nodes + conditional edges, shrink plan to LLM-only.

Proposal Ranking (Best → Worst)

1. fable-5: ⭐ Best balance of ambition and pragmatism

Nodes: tick → prepare → select → decide → guard (5 new nodes)

Strengths:
- decision_origin flag ("deterministic" vs "llm") is the single best insight across all proposals. It cleanly gates which rewrite subset guard applies, preserving the current llm_failed infinite-loop protection without ad-hoc state juggling.
- select as a deterministic ladder preserves the exact order of today's early returns (calc cap → calc success → blocked calc → acquisition fast path → auto-finish). This is critical: the ordering is not accidental.
- Migration plan is genuinely safe: Phase 1 extracts without rewiring (plan_node becomes a thin sequential composition). Phase 2 rewires. This means tests stay green at every step.
- Explicitly addresses checkpoint compatibility: bump the planner thread namespace (:planner:v2). This is the only proposal that treats in-flight checkpoint breakage as a first-class concern with a concrete mitigation.
- Follow-up section is honest: it acknowledges decompose as a loop node and Command(goto=...) as deferred improvements. Scope discipline.
- llm_failed in state replaces the local variable cleanly, enabling the guard to skip calc-adjustment after structured-output failure.
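The deterministic ladder can be pictured as an ordered tuple of checks where the first match wins. The following is a minimal sketch under stated assumptions, not fable-5's code: the state fields, outcome labels, and `MAX_CALC_ATTEMPTS` are hypothetical names chosen for illustration. Only the rung order comes from the proposal.

```python
from typing import Callable, Optional, Tuple, TypedDict


class State(TypedDict, total=False):
    # All field names below are hypothetical, chosen for illustration.
    calc_attempts: int
    calc_succeeded: bool
    blocked_calc_field: Optional[str]
    pending_acquisition: Optional[str]
    all_fields_filled: bool


# A rung inspects state and returns an outcome label, or None to fall through.
Rung = Callable[[State], Optional[str]]

MAX_CALC_ATTEMPTS = 3  # assumed cap, not taken from the source


def calc_cap(state: State) -> Optional[str]:
    return "finish_abort" if state.get("calc_attempts", 0) >= MAX_CALC_ATTEMPTS else None


def calc_success(state: State) -> Optional[str]:
    return "finish_success" if state.get("calc_succeeded") else None


def blocked_calc(state: State) -> Optional[str]:
    return "acquire_blocked" if state.get("blocked_calc_field") else None


def acquisition_fast_path(state: State) -> Optional[str]:
    return "acquire" if state.get("pending_acquisition") else None


def auto_finish(state: State) -> Optional[str]:
    return "finish_success" if state.get("all_fields_filled") else None


# Order mirrors the early-return order quoted in the review.
LADDER: Tuple[Rung, ...] = (
    calc_cap,
    calc_success,
    blocked_calc,
    acquisition_fast_path,
    auto_finish,
)


def select(state: State) -> Optional[str]:
    """Return the first rung's outcome, or None when the LLM must decide."""
    for rung in LADDER:
        outcome = rung(state)
        if outcome is not None:
            return outcome
    return None
```

Keeping the order in one tuple makes a reordering show up as a single-line diff, which is where a behavioral change of this kind is easiest to catch in review.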

Weaknesses:
- 4 checkpoint writes per tick (tick, prepare, select, decide/guard). Acceptable but measurable.
- select still contains one LLM call (blocked-field decomposition). Deferred to a follow-up decompose node, which is honest but leaves a rough edge.

Verdict: The most production-ready proposal. Five nodes is the sweet spot between "still a black box" and "graph noise." The decision_origin concept is worth adopting even if the node count changes.
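To make the decision_origin idea concrete, here is a hedged sketch of a guard that picks its rewrite subset from state. Which rewrites apply to which origin is an assumption made for this example, as are the two stand-in rewrites, `MAX_SEARCHES`, and the `search_count` field. The proposal itself only establishes that the flag gates the subset and that calc-adjustment is skipped after an LLM failure.

```python
from typing import Callable, List, Literal, TypedDict


class Decision(TypedDict):
    action: str
    reason: str


class State(TypedDict, total=False):
    decision: Decision
    decision_origin: Literal["deterministic", "llm"]
    llm_failed: bool
    search_count: int  # hypothetical counter


Rewrite = Callable[[State, Decision], Decision]

MAX_SEARCHES = 5  # assumed cap


def enforce_search_cap(state: State, decision: Decision) -> Decision:
    # Stand-in for cap enforcement: too many searches becomes ask_user.
    if decision["action"] == "search" and state.get("search_count", 0) >= MAX_SEARCHES:
        return {"action": "ask_user", "reason": "search cap reached"}
    return decision


def adjust_calculation(state: State, decision: Decision) -> Decision:
    # Stand-in for the calc adjustment: rewrite finish into calculate.
    if decision["action"] == "finish":
        return {"action": "calculate", "reason": "calculation required before finish"}
    return decision


def rewrites_for(state: State) -> List[Rewrite]:
    """Choose the rewrite subset. The split below is an assumption."""
    if state.get("decision_origin") == "deterministic":
        return [enforce_search_cap]
    rewrites: List[Rewrite] = [enforce_search_cap]
    if not state.get("llm_failed", False):
        # Skipped after a structured-output failure to avoid looping.
        rewrites.append(adjust_calculation)
    return rewrites


def guard(state: State) -> State:
    decision = state["decision"]
    for rewrite in rewrites_for(state):
        decision = rewrite(state, decision)
    return {"decision": decision}
```

Because the subset is derived from two state fields, the guard needs no local variables carried over from the LLM call, which is the property the review credits to this design.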

2. gpt-5.5: ⭐ Second best, excellent migration discipline

Nodes: enter_iteration → prepare_schema → require_region/require_currency → acquisition_gate → plan → normalize_decision → retry_gate → maybe_calculate (8 new nodes)

Strengths:
- normalize_decision + retry_gate split is the cleanest post-LLM pipeline across all proposals. Normalization (derived redirect, web-first redirect, blocked-calc wording) is conceptually separate from retry limits (search cap, ask-user cap).
- maybe_calculate as a dedicated calculator router replaces the hidden _adjust_calculation_decision() call with an explicit graph node. This makes the finish→calculate rewrite visible in the diagram.
- Migration plan is exemplary: Step 1 extracts pure helpers without changing graph shape. Step 2 adds nodes one group at a time. Step 3 renames only after behavior is stable. Step 4 updates docs. This is textbook safe refactoring.
- Explicit non-goals section prevents scope creep: no outer graph changes, no PlannerDecision redesign, no search backend changes.
- State changes are minimal and justified: only llm_failed or decision_origin. Explicitly warns against adding prepared_recipes to state unless serialization is proven safe.

Weaknesses:
- 8 nodes is a lot. The graph becomes visually larger than fable-5's 5 nodes.
- require_region and require_currency as separate nodes may be over-splitting: they're structurally identical and could be one bootstrap_gate node with a routing parameter.
- The Mermaid diagram has some routing ambiguities (e.g., prepare_schema → three conditional edges plus a fall-through to acquisition_gate is not clearly expressed).

Verdict: If the team values explicitness over conciseness, this is the best proposal. The migration plan is the most detailed and safest. The normalize_decision/retry_gate split is architecturally sound.

3. gpt-5.4: ⭐ Third, pragmatic 6-node pipeline

Nodes: tick → bootstrap_gate → prepare_context → acquisition_gate → llm_plan → decision_policy (6 new nodes)

Strengths:
- 6 nodes is the right granularity. Not too few (still hides logic), not too many (graph noise). Each node has a clear, named responsibility.
- decision_policy as a single post-LLM node is the right call. The proposal explicitly warns against splitting every rewrite into separate nodes: "Splitting every rewrite into separate nodes would add graph noise without improving clarity." This is correct.
- Migration order is optimal: tick first (cleanest seam), then bootstrap_gate, then prepare_context, then acquisition_gate, then slim plan, then decision_policy. Each step adds one node without breaking the previous step's behavior.
- Anti-patterns section is valuable: "Do not move state mutation into edge functions," "Do not increment iterations in every gate," "Do not turn every policy rule into a node." These are lessons learned from reading the other proposals.
- Design rule is crisp: "Deterministic orchestration belongs in graph phases; business logic stays in helpers; LLM planning stays small."

Weaknesses:
- Less detailed on state changes than fable-5 or gpt-5.5. No explicit decision_origin or llm_failed flag discussion.
- The Mermaid diagram doesn't show the decision_policy → action nodes routing as clearly as it could.
- No explicit checkpoint compatibility discussion.

Verdict: The most balanced proposal in terms of node count and migration safety. The anti-patterns section alone makes it worth reading. If fable-5's decision_origin concept were added, this would be #1.

4. opus: Thorough but over-granular

Nodes: g_iter_cap → g_region → g_currency → enrich_schema → g_calc_caps → acquire → plan_llm → c_target_decompose → c_redirect_derived → c_redirect_web → c_search_cap → c_ask_cap → c_calc_adjust → dispatch (14 nodes)

Strengths:
- Most detailed node-by-node mapping of any proposal. Every line range in plan_node is mapped to a specific new node.
- g_* / c_* naming convention is clear: gates vs correctors. The invariant ("every path deposits a PlannerDecision into state["decision"]") is well-stated.
- State write table is excellent: each node's write set is explicitly documented, eliminating the 8 separate if schema_changed blocks.
- Three-step migration is reasonable: extract helpers → hoist gates → split correctors.
- Open questions are genuine: Should enrich_schema always run? Should acquire be one node or two? Should c_search_cap and c_ask_cap be merged?

Weaknesses:
- 14 nodes is too many. The corrector chain (6 sequential c_* nodes) adds 6 checkpoint writes per iteration for zero routing benefit; they are a straight add_edge chain with no branching. This is graph noise.
- Splitting g_region and g_currency adds a node each for what is structurally identical logic. A single bootstrap_gate would suffice.
- The Mermaid diagram is hard to read with 14 nodes and many intermediate emit_* nodes.
- Cost analysis acknowledges the problem: "up to 10 hops per iteration" with "measurable" checkpoint writes. The mitigation (use add_edge not add_conditional_edges) helps but doesn't eliminate the overhead.

Verdict: Excellent analysis, over-engineered architecture. The node-by-node mapping and state write table are reference-quality. The corrector chain should be collapsed into a single enforce_policy node.
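The cost argument against the corrector chain can be illustrated with a toy runner that counts one checkpoint write per executed node. That accounting is an assumption of this sketch. The two correctors below are simplified stand-ins for the six c_* nodes, and `search_count`, `ask_count`, and the cap values are hypothetical.

```python
from typing import Any, Callable, Dict, List, Tuple

State = Dict[str, Any]
Node = Callable[[State], State]  # a node returns a partial state update


def run_nodes(nodes: List[Node], state: State) -> Tuple[State, int]:
    """Run nodes in order, counting one checkpoint write per node (assumed)."""
    writes = 0
    for node in nodes:
        state = {**state, **node(state)}
        writes += 1
    return state, writes


def c_search_cap(state: State) -> State:
    decision = state["decision"]
    if decision["action"] == "search" and state.get("search_count", 0) >= 5:
        return {"decision": {**decision, "action": "ask_user"}}
    return {}


def c_ask_cap(state: State) -> State:
    decision = state["decision"]
    if decision["action"] == "ask_user" and state.get("ask_count", 0) >= 3:
        return {"decision": {**decision, "action": "finish"}}
    return {}


CORRECTORS: List[Node] = [c_search_cap, c_ask_cap]


def enforce_policy(state: State) -> State:
    """Apply every corrector inside one node, so only one write results."""
    merged = dict(state)
    update: State = {}
    for corrector in CORRECTORS:
        step = corrector(merged)
        merged.update(step)
        update.update(step)
    return update


state = {"decision": {"action": "search"}, "search_count": 5, "ask_count": 3}
chain_state, chain_writes = run_nodes(CORRECTORS, state)
single_state, single_writes = run_nodes([enforce_policy], state)
```

Both runs end in the same state, but the chain pays one write per corrector while the collapsed node pays one in total. The correctors remain separate functions, so they stay individually unit-testable.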

5. kimi-2.6: Solid 8-node pipeline with good decision matrix

Nodes: tick → ask_region/ask_currency → prepare → calc_gate → acquire → route_direct → decide → enforce_policy (8 nodes + 2 bootstrap)

Strengths:
- Decision matrix appendix is the best reference artifact across all proposals. Every logic block is mapped to current location, proposed location, and reason.
- calc_gate as a dedicated node makes calculator lifecycle checks visible. The three-way routing (finish_success | finish_abort | continue) is clean.
- route_direct as a tiny adapter bridges deterministic acquisition tasks into the same post-processing pipeline as LLM decisions. This is a good pattern.
- Migration path is practical: extract helpers → register as nodes → delete plan_node. The phased wrapper approach (plan_node becomes a thin delegator during transition) is safe.

Weaknesses:
- ask_region and ask_currency as separate nodes is over-splitting (same issue as opus).
- enforce_policy is still complex (all post-LLM rewrites in one node). This is acceptable but the proposal doesn't acknowledge the trade-off.
- No explicit decision_origin or llm_failed state field discussion.
- The Mermaid diagram shows ask_region → observe_user and ask_currency → observe_user but doesn't show how observe_user routes back to tick.

Verdict: The decision matrix appendix makes this a valuable reference. The node count is reasonable. The lack of decision_origin is a notable gap.

6. glm-5.1: Conservative 2-node split

Nodes: pre_check → acquire_or_plan (2 new nodes)

Strengths:
- Most conservative proposal. Only 2 new nodes, minimal graph changes. Lowest risk.
- _build_plan_output helper to collapse the 9 conditional-return pattern is a good code-level improvement independent of graph changes.
- Post-LLM rewriting options (Option A: keep inline, Option B: extract validate_decision) shows good judgment about when to stop refactoring.
- State change is minimal: only _pre_check_route field. The alternative (extending Action enum) is correctly rejected.
- Risk table is thorough: loop-back edge changes, region/currency timing, _build_plan_output hiding logic, dynamic decomposition dependencies.

Weaknesses:
- 2 nodes is too few. acquire_or_plan is still ~200 LOC after extraction. The proposal acknowledges this ("still substantial") but doesn't fully address the cognitive load.
- Doesn't solve the core problem fully. The graph still hides most routing inside acquire_or_plan. The Mermaid diagram is only marginally more informative than today's.
- No explicit migration testing strategy. Phase 1 says "update tests" but doesn't specify how.
- The _pre_check_route field as a string is less type-safe than fable-5's decision_origin: Literal["deterministic", "llm"].

Verdict: Good first step, not a complete solution. If the team wants to start small, this is the right entry point. But it should be followed by further decomposition.
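A helper in the spirit of _build_plan_output might look like the sketch below. The signature and the output keys are assumptions; the proposal's actual helper is not reproduced here. The sketch assumes the repeated conditional is the schema_changed check mentioned in the opus review, so the schema is written only when it changed.

```python
from typing import Any, Dict, Optional


def _build_plan_output(
    decision: Dict[str, Any],
    *,
    iterations: int,
    schema: Optional[Dict[str, Any]] = None,
    schema_changed: bool = False,
) -> Dict[str, Any]:
    """Assemble a node's state update in one place (hypothetical signature)."""
    output: Dict[str, Any] = {"decision": decision, "iterations": iterations}
    if schema_changed:
        # Only emit the schema key when it actually changed, so an
        # unchanged schema is never rewritten by an early-return path.
        output["schema"] = schema
    return output


unchanged = _build_plan_output({"action": "finish"}, iterations=2)
changed = _build_plan_output(
    {"action": "search"},
    iterations=1,
    schema={"fields": ["tam"]},
    schema_changed=True,
)
```

Each early return then becomes a single call, and the conditional lives in exactly one function. That is also the risk the proposal's own table flags: the helper hides logic that used to be visible at each return site.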

7. deepseek-4-pro: Well-structured but over-engineered

Nodes: guard → prepare → plan → adjust (4 new nodes, but with self-loop on plan)

Strengths:
- 4-node pipeline is conceptually clean. Guard → Prepare → Plan → Adjust maps well to the mental model.
- Detailed trade-off table with before/after comparison is useful.
- Alternatives considered section shows good engineering judgment: rejected 16-node explosion, rejected keep-plan-node-with-edges, rejected calculator subgraph, rejected guard-only extraction.
- Migration phases are well-ordered: Extract → Update graph → Remove old code → Verify.

Weaknesses:
- plan self-loop for decomposition is a rough edge. The conditional edge plan → plan (when decomposition is generated) adds complexity that fable-5 avoids by putting decomposition in prepare.
- recipes in state is a new field that fable-5 and gpt-5.5 avoid. The proposal acknowledges this risk ("must stay synchronized with dynamic_decompositions") but doesn't fully mitigate it.
- guard handles region/currency + abort/max_iters but prepare handles calculator gates + auto-finish. The split between these two is not obvious: why are calculator gates in prepare rather than guard? The proposal doesn't justify this boundary clearly.
- No checkpoint compatibility discussion.

Verdict: Good structure, but the plan self-loop and recipes in state are rough edges that fable-5 handles better.

8. qwen-3.6-plus: Good ideas, flawed graph topology

Nodes: preflight → ask_region/ask_currency → decompose → plan → route_finish_check (5 new nodes + 4 bootstrap)

Strengths:
- route_finish_check as a validation node is a good idea: the current blind finish → END path doesn't verify that finish is actually appropriate.
- Routing function signatures are well-documented. Each routing function has a clear return type and responsibility.
- Phase-by-phase migration is reasonable, starting with pre-flight nodes (lowest risk).

Weaknesses:
- Graph topology has a critical flaw: the loop-back edges point to plan instead of preflight. This means guard checks (aborted, max_iters, region, currency) are NOT re-run on every iteration. This is a behavioral regression.
- ask_region/ask_currency with dedicated observe_region/observe_currency adds 4 nodes for what is currently 20 lines of inline logic. Over-splitting.
- route_after_plan becomes a routing function with redirect logic AND state mutation. This violates the principle that edge functions should be pure predicates. The proposal acknowledges this but proceeds anyway.
- No decision_origin or llm_failed state field. The redirect logic in route_after_plan needs to know whether the decision came from LLM or deterministic path.
- The Mermaid diagram is inconsistent with the proposed graph construction code (e.g., observe → plan vs observe → preflight).

Verdict: Good ideas buried in an inconsistent topology. The loop-back-to-plan bug alone makes this proposal unsafe to implement as written.
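The loop-back flaw is easy to see by walking the cycle in a simplified, single-successor version of the graph. This sketch is only an illustration: the `search` node and the linear edge map are assumptions, and the real proposal uses conditional edges with more branches.

```python
from typing import Dict, List


def nodes_on_loop(edges: Dict[str, str], start: str) -> List[str]:
    """Follow single-successor edges from start until a node repeats."""
    visited: List[str] = []
    node = start
    while node not in visited:
        visited.append(node)
        node = edges[node]
    return visited


# Topology as written in the proposal: the loop returns to plan.
as_written = {
    "preflight": "plan",
    "plan": "search",
    "search": "observe",
    "observe": "plan",
}

# Corrected topology: the loop returns to preflight.
corrected = {**as_written, "observe": "preflight"}

loop_as_written = nodes_on_loop(as_written, "observe")
loop_corrected = nodes_on_loop(corrected, "observe")
```

In the first map, preflight never appears on the cycle, so the abort, max_iters, region, and currency checks run once at most. In the corrected map, preflight is visited on every iteration.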

9. qwen-3.7-max: Overly granular, 10 nodes with `_route` state pollution

Nodes: check_termination → check_region_currency → compose_schema → check_calculator → acquisition_routing → check_completion → llm_decide → post_process → enforce_caps → route_decision (10 new nodes)

Strengths:
- Most detailed code examples of any proposal. Each node has a full Python implementation sketch.
- _route transient field pattern is consistent across all nodes.
- Migration strategy is phased: Guards → Transformations → LLM → Graph → Cleanup.

Weaknesses:
- 10 nodes is excessive. The proposal splits every concern into its own node without considering whether the granularity adds value.
- _route field in state is a code smell. Using state as a routing signal means every node writes a _route key that is immediately consumed and discarded. This pollutes the checkpoint and makes the state schema noisy.
- observe_user → check_region_currency breaks the existing observe_user → plan loop. Region/currency answers currently go through the same observe_user_node that handles all user answers. This changes behavior.
- No decision_origin concept. The proposal doesn't distinguish between deterministic and LLM-originated decisions in the post-processing pipeline.
- The code examples have bugs: check_termination_node returns _route: "finish" but the routing function reads state.get("_route", "continue"), so the default would never trigger.
- Checkpoint serialization concern is acknowledged ("exclude transient _route fields") but LangGraph's PostgresSaver doesn't support field-level exclusion.

Verdict: The code examples are impressive but the architecture is over-engineered. The _route state pollution is a significant anti-pattern.

10. gemini-3.1-pro: Incomplete analysis, too coarse

Nodes: prepare_state → evaluate_rules → llm_plan (3 new nodes)

Strengths:
- 3-node split is the simplest proposal. Easy to understand, easy to implement.
- Benefits section is clear: true graph visibility, reduced latency/cost risk, cleaner unit tests.

Weaknesses:
- Doesn't address post-LLM rewriting at all. The current _redirect_derived_direct_decision, _redirect_premature_ask_for_web_field, cap enforcement, and _adjust_calculation_decision are all left unaddressed.
- evaluate_rules is still a mega-node. It combines abort/max_iters checks, region/currency gating, calculator status, acquisition task selection, and auto-finish logic. This is barely better than the current plan_node.
- No migration plan. Just "refactor agent.py functions" with no phasing or testing strategy.
- No state changes discussion. The proposal doesn't address how evaluate_rules communicates its decision to the routing function.
- The Mermaid diagram shows route_after_rules and route_after_llm as diamond nodes (conditional edges), but doesn't specify what state they read.

Verdict: Too coarse to be useful. The 3-node split doesn't solve the core problem — evaluate_rules is still a black box.

11. mimo-2.5-pro: Incomplete, naming confusion

Nodes: guards → acquire → decide (3 new nodes)

Strengths:
- 3-node split is simple. Guards → Acquire → Decide is easy to understand.
- Migration plan is phased: extract without changing topology → wire into graph → cleanup.
- Expected benefits table is clear and measurable.

Weaknesses:
- decide node is still ~80 lines and includes LLM call + all redirectors + cap enforcement. This is barely better than the current plan_node.
- plan_router is referenced but never registered as a node. The graph construction code calls builder.add_edge("decide", "plan_router") followed by builder.add_conditional_edges("plan_router", route_after_plan, ...), so the code is broken as written.
- route_after_acquire routes to "decide" or "finish" but doesn't handle the case where acquire finds a task that should go directly to an action node.
- No decision_origin or llm_failed state field.
- Open questions are left unanswered: Should decide include loop/cap detection? Should plan_router be a node or routing function? Naming preferences?

Verdict: Incomplete and has implementation bugs in the proposed code. The 3-node split is too coarse.
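A dangling reference like plan_router can be caught before any graph is built by comparing edge endpoints against the registered node names. The helper below is a hypothetical plain-data check written for this review, not part of LangGraph's API, and the edge list is a simplified rendering of the proposal's wiring.

```python
from typing import Iterable, List, Set, Tuple


def unregistered_endpoints(
    nodes: Set[str], edges: Iterable[Tuple[str, str]]
) -> List[str]:
    """Return edge endpoints that were never registered as nodes."""
    referenced = {name for edge in edges for name in edge}
    return sorted(referenced - nodes)


# Nodes and edges as described for the mimo-2.5-pro proposal (simplified).
registered = {"guards", "acquire", "decide"}
wiring = [
    ("guards", "acquire"),
    ("acquire", "decide"),
    ("decide", "plan_router"),  # target is never registered
]

missing = unregistered_endpoints(registered, wiring)
missing_after_fix = unregistered_endpoints(registered | {"plan_router"}, wiring)
```

Run as a unit test over the proposed wiring, a check of this kind would have flagged the bug at review time.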

Synthesis: Recommended Approach

Core concepts to adopt (from multiple proposals):

| Concept | Source | Why |
|---|---|---|
| `decision_origin` flag | fable-5 | Cleanly gates which rewrite subset applies; preserves `llm_failed` protection |
| `tick` as sole iteration incrementer | fable-5, gpt-5.4 | Prevents drift in attempt tracking |
| Deterministic ladder in `select` | fable-5 | Preserves exact ordering of today's early returns |
| Single `decision_policy` / `enforce_policy` node | gpt-5.4, gpt-5.5, kimi-2.6 | Avoids graph noise from splitting every rewrite |
| Phase 1: extract without rewiring | fable-5, gpt-5.5, opus | Keeps tests green at every step |
| Bump planner thread namespace | fable-5 | Concrete mitigation for in-flight checkpoint breakage |
| Anti-patterns list | gpt-5.4 | "Don't mutate state in edge functions," "Don't split every rule" |
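The namespace bump can be as small as a versioned thread-id helper. The sketch below assumes the thread id is built from a session id plus a suffix; only the `:planner:v2` suffix comes from the fable-5 proposal, and both function names are hypothetical. The `configurable` / `thread_id` layout follows the config convention LangGraph checkpointers commonly use, but treat it as an assumption and confirm it against the project's actual invocation code.

```python
PLANNER_NAMESPACE_VERSION = "v2"  # bumped when the planner graph shape changes


def planner_thread_id(
    session_id: str, version: str = PLANNER_NAMESPACE_VERSION
) -> str:
    """Build a thread id that old-shape checkpoints will never match."""
    return f"{session_id}:planner:{version}"


def planner_config(session_id: str) -> dict:
    # Assumed config layout; adjust to however the project invokes the graph.
    return {"configurable": {"thread_id": planner_thread_id(session_id)}}


old_id = planner_thread_id("session-123", version="v1")
new_id = planner_thread_id("session-123")
```

Because the ids differ, a run started after the rewire never resumes from a checkpoint written by the old graph. In-flight sessions restart the planner rather than resuming mid-node.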

Recommended node count: 5-6 nodes

The sweet spot is between glm-5.1's 2 nodes (too few) and opus's 14 nodes (too many). The fable-5 proposal's 5 nodes (tick, prepare, select, decide, guard) is the right granularity.

Recommended migration order:

1. Extract helpers without graph changes: plan_node becomes a thin sequential composition.
2. Add decision_origin and llm_failed to State, serializer-compatible with defaults.
3. Rewire graph: register 5 nodes, add conditional edges, bump thread namespace.
4. Delete plan_node; update tests and docs.
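The first migration step can be sketched as plan_node calling the extracted helpers in order while the graph still sees a single node. This is a minimal illustration, not the project's code: the helper bodies are stubs, `MAX_ITERS` and `pending_acquisition` are hypothetical, and the short-circuit rule (skip the remaining pre-LLM steps once a decision exists) is an assumption.

```python
from typing import Any, Dict

State = Dict[str, Any]

MAX_ITERS = 10  # assumed limit


def tick(state: State) -> State:
    # Sole place where the iteration counter is incremented.
    iterations = int(state.get("iterations", 0)) + 1
    out: State = {"iterations": iterations}
    if iterations > MAX_ITERS:
        out.update(decision={"action": "finish"}, decision_origin="deterministic")
    return out


def prepare(state: State) -> State:
    return {}  # stub: schema composition would go here


def select(state: State) -> State:
    if state.get("pending_acquisition"):
        return {"decision": {"action": "search"}, "decision_origin": "deterministic"}
    return {}


def decide(state: State) -> State:
    # Stand-in for the planner LLM call.
    return {"decision": {"action": "ask_user"}, "decision_origin": "llm"}


def guard(state: State) -> State:
    return {}  # stub: post-decision rewrites would go here


def plan_node(state: State) -> State:
    """Thin sequential composition: same graph shape, extracted helpers."""
    updates: State = {}
    for step in (tick, prepare, select):
        if "decision" in updates:
            break  # a deterministic step already decided
        updates.update(step({**state, **updates}))
    if "decision" not in updates:
        updates.update(decide({**state, **updates}))
    updates.update(guard({**state, **updates}))
    return updates
```

Since plan_node keeps its name and return shape, existing tests continue to exercise it unchanged while each helper gains its own unit tests ahead of the rewire.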

What NOT to do:

- Don't split correctors into individual nodes. A single enforce_policy / guard node handles all post-LLM rewrites. The sequential chain adds checkpoint writes with zero routing benefit.
- Don't add _route to state. Use decision_origin + decision.action for routing. Transient routing fields pollute checkpoints.
- Don't make plan self-loop. Decomposition belongs in prepare, not as a post-LLM loop.
- Don't move state mutation into edge functions. Edge functions should be pure predicates.
- Don't split region/currency into separate nodes. One bootstrap_gate or include in tick.
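The pure-predicate rule for edge functions can be shown with a routing function that only reads the decision already stored in state. The node names in `ACTION_NODES` and the fallback to "finish" for unknown actions are assumptions made for this sketch.

```python
from typing import Any, Dict

State = Dict[str, Any]

# Hypothetical action node names; the real graph defines its own set.
ACTION_NODES = frozenset({"search", "ask_user", "calculate"})


def route_after_guard(state: State) -> str:
    """Pure predicate: reads state, returns a node name, mutates nothing."""
    action = state["decision"]["action"]
    # Any rewrite of the decision belongs in the guard node, not here.
    return action if action in ACTION_NODES else "finish"


state = {"decision": {"action": "search"}, "decision_origin": "llm"}
target = route_after_guard(state)
```

No transient routing key is needed, because the target is derived from the decision on every call. Nothing extra reaches the checkpoint, and the function can be tested with a plain dict.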