Planner Graph Refactor Proposals: Ranked Analysis
Author: Sisyphus (qwen3.6-plus)
Date: 2026-06-11
Scope: 11 proposals in docs/planner-graph-ref/proposals/
Reference: docs/planner-graph-ref/current-graph.md, src/venturescope/planner/agent.py:846-1192
Common Problem Statement (all proposals agree)
The plan_node() function (~350 lines, 15+ early-return paths) is a monolith that hides the planner's real control flow from the graph diagram. All proposals agree on the diagnosis:
| Concern | Lines | LLM needed? |
|---|---|---|
| Iteration tick + abort/max_iters | 848–862 | No |
| Region/currency bootstrap | 864–887 | No |
| Proactive decomposition + schema composition | 889–917 | Yes (lazy) |
| Calculator lifecycle gates | 919–943 | No |
| Blocked-calc acquisition + dynamic decomp | 945–988 | Yes (lazy) |
| Auto-finish + acquisition fast path | 993–1035 | No |
| Planner LLM call | 1037–1068 | Yes |
| Post-LLM redirects + cap enforcement | 1070–1192 | Yes (lazy) |
The shared goal across all proposals: move deterministic orchestration into graph-level nodes and conditional edges, and shrink plan to an LLM-only node.
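A minimal sketch of that shared goal, assuming a hypothetical state shape (the keys `iterations`, `max_iters`, `aborted`, and `region` are illustrative and not taken from agent.py). The counter update lives in a node, while the routing decision lives in a side-effect-free function of the kind that would be passed to a conditional edge.

```python
from typing import TypedDict


class PlannerState(TypedDict, total=False):
    # Hypothetical subset of the planner state, for illustration only.
    iterations: int
    max_iters: int
    aborted: bool
    region: str


def tick(state: PlannerState) -> dict:
    """Node: the only place that increments the iteration counter."""
    return {"iterations": state.get("iterations", 0) + 1}


def route_after_tick(state: PlannerState) -> str:
    """Edge function: reads state, returns the next node name, mutates nothing."""
    if state.get("aborted"):
        return "finish"
    if state.get("iterations", 0) > state.get("max_iters", 10):
        return "finish"
    if not state.get("region"):
        return "ask_region"
    return "plan"
```

Because `route_after_tick` only reads state, it can be unit-tested with plain dictionaries and no graph runtime.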
Proposal Ranking (Best → Worst)
1. fable-5: ⭐ Best balance of ambition and pragmatism
Nodes: tick → prepare → select → decide → guard (5 new nodes)
Strengths:
- decision_origin flag ("deterministic" vs "llm") is the single best insight across all proposals. It cleanly gates which rewrite subset guard applies, preserving the current llm_failed infinite-loop protection without ad-hoc state juggling.
- select as a deterministic ladder preserves the exact order of today's early returns (calc cap → calc success → blocked calc → acquisition fast path → auto-finish). This is critical — the ordering is not accidental.
- Migration plan is genuinely safe: Phase 1 extracts without rewiring (plan_node becomes thin sequential composition). Phase 2 rewires. This means tests stay green at every step.
- Explicitly addresses checkpoint compatibility: Bump the planner thread namespace (:planner:v2). This is the only proposal that treats in-flight checkpoint breakage as a first-class concern with a concrete mitigation.
- Follow-up section is honest: Acknowledges decompose as a loop node and Command(goto=...) as deferred improvements. Scope discipline.
- llm_failed in state replaces the local variable cleanly, enabling the guard to skip calc-adjustment after structured-output failure.
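The ordered ladder described for select could look roughly like the sketch below. The state keys and returned action names are hypothetical placeholders. Only the rung order (calc cap, calc success, blocked calc, acquisition fast path, auto-finish) comes from the proposal.

```python
from typing import Callable, List, Optional

# Each rung inspects state and returns an action name, or None to fall through.
Rung = Callable[[dict], Optional[str]]


def calc_cap(state: dict) -> Optional[str]:
    capped = state.get("calc_attempts", 0) >= state.get("calc_cap", 3)
    return "finish_abort" if capped else None


def calc_success(state: dict) -> Optional[str]:
    return "finish_success" if state.get("calc_ok") else None


def blocked_calc(state: dict) -> Optional[str]:
    return "acquire_blocked" if state.get("blocked_fields") else None


def acquisition_fast_path(state: dict) -> Optional[str]:
    return "acquire" if state.get("pending_tasks") else None


def auto_finish(state: dict) -> Optional[str]:
    return "finish" if state.get("schema_complete") else None


# Order matters: it mirrors the order of today's early returns.
LADDER: List[Rung] = [
    calc_cap,
    calc_success,
    blocked_calc,
    acquisition_fast_path,
    auto_finish,
]


def select(state: dict) -> str:
    """Return the first rung's verdict, or defer to the LLM if none fires."""
    for rung in LADDER:
        action = rung(state)
        if action is not None:
            return action
    return "decide"
```

Keeping the rungs in a list makes the ordering a single reviewable line instead of something implied by the position of early returns.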
Weaknesses:
- 4 checkpoint writes per tick (tick, prepare, select, decide/guard). Acceptable but measurable.
- select still contains one LLM call (blocked-field decomposition). Deferred to a follow-up decompose node, which is honest but leaves a rough edge.
Verdict: The most production-ready proposal. Five nodes is the sweet spot between "still a black box" and "graph noise." The decision_origin concept is worth adopting even if the node count changes.
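To illustrate how decision_origin and llm_failed might gate the guard, here is a sketch under stated assumptions. The rewrite group names (`redirects`, `caps`, `calc_adjust`) and the assignment of groups to origins are guesses for illustration. The proposal only states that the flag selects a rewrite subset and that calc-adjustment is skipped after a structured-output failure.

```python
from typing import List, Literal, TypedDict


class Decision(TypedDict):
    action: str


class GuardState(TypedDict, total=False):
    decision: Decision
    decision_origin: Literal["deterministic", "llm"]
    llm_failed: bool


def applicable_rewrites(state: GuardState) -> List[str]:
    """Pick which rewrite groups guard should run (group names are hypothetical)."""
    rewrites: List[str] = []
    if state.get("decision_origin") == "llm":
        # Assumption: redirects and caps only apply to LLM-produced decisions.
        rewrites += ["redirects", "caps"]
    if not state.get("llm_failed", False):
        # Skip calculator adjustment after a structured-output failure.
        rewrites.append("calc_adjust")
    return rewrites
```

With both fields in state and given defaults, the guard needs no local variables carried over from the LLM call.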
2. gpt-5.5: ⭐ Second best, excellent migration discipline
Nodes: enter_iteration → prepare_schema → require_region/require_currency → acquisition_gate → plan → normalize_decision → retry_gate → maybe_calculate (8 new nodes)
Strengths:
- normalize_decision + retry_gate split is the cleanest post-LLM pipeline across all proposals. Normalization (derived redirect, web-first redirect, blocked-calc wording) is conceptually separate from retry limits (search cap, ask-user cap).
- maybe_calculate as a dedicated calculator router replaces the hidden _adjust_calculation_decision() call with an explicit graph node. This makes the finish→calculate rewrite visible in the diagram.
- Migration plan is exemplary: Step 1 extracts pure helpers without changing graph shape. Step 2 adds nodes one group at a time. Step 3 renames only after behavior is stable. Step 4 updates docs. This is textbook safe refactoring.
- Explicit non-goals section prevents scope creep: no outer graph changes, no PlannerDecision redesign, no search backend changes.
- State changes are minimal and justified: Only llm_failed or decision_origin. Explicitly warns against adding prepared_recipes to state unless serialization is proven safe.
Weaknesses:
- 8 nodes is a lot. The graph becomes visually larger than fable-5's 5 nodes.
- require_region and require_currency as separate nodes may be over-splitting — they're structurally identical and could be one bootstrap_gate node with a routing parameter.
- The Mermaid diagram has some routing ambiguities (e.g., prepare_schema → three conditional edges plus a fall-through to acquisition_gate is not clearly expressed).
Verdict: If the team values explicitness over conciseness, this is the best proposal. The migration plan is the most detailed and safest. The normalize_decision/retry_gate split is architecturally sound.
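The normalize_decision / retry_gate split can be pictured as two functions with disjoint inputs. This is a sketch only: the cap values, the `web_first_fields` and `attempts` keys, and the fallback to `finish` when a cap is hit are all assumptions made for the example.

```python
from typing import Dict

# Hypothetical caps; the real limits live in the planner configuration.
CAPS: Dict[str, int] = {"search": 3, "ask_user": 2}


def normalize_decision(decision: dict, state: dict) -> dict:
    """Rewrite what the decision says, without looking at attempt counters."""
    field = decision.get("field")
    if decision["action"] == "ask_user" and field in state.get("web_first_fields", ()):
        # Web-first redirect: try a search before asking the user.
        return {**decision, "action": "search"}
    return decision


def retry_gate(decision: dict, state: dict) -> dict:
    """Enforce per-action attempt caps; knows nothing about redirects."""
    action = decision["action"]
    used = state.get("attempts", {}).get(action, 0)
    if action in CAPS and used >= CAPS[action]:
        # Assumed fallback for the sketch: give up on this action.
        return {**decision, "action": "finish", "reason": f"{action} cap reached"}
    return decision
```

Running normalization first means the cap is checked against the action that will actually execute, not the one the LLM originally proposed.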
3. gpt-5.4: ⭐ Third, pragmatic 6-node pipeline
Nodes: tick → bootstrap_gate → prepare_context → acquisition_gate → llm_plan → decision_policy (6 new nodes)
Strengths:
- 6 nodes is the right granularity. Not too few (still hides logic), not too many (graph noise). Each node has a clear, named responsibility.
- decision_policy as a single post-LLM node is the right call. The proposal explicitly warns against splitting every rewrite into separate nodes: "Splitting every rewrite into separate nodes would add graph noise without improving clarity." This is correct.
- Migration order is optimal: tick first (cleanest seam), then bootstrap_gate, then prepare_context, then acquisition_gate, then slim plan, then decision_policy. Each step adds one node without breaking the previous step's behavior.
- Anti-patterns section is valuable: "Do not move state mutation into edge functions," "Do not increment iterations in every gate," "Do not turn every policy rule into a node." These are lessons learned from reading the other proposals.
- Design rule is crisp: "Deterministic orchestration belongs in graph phases; business logic stays in helpers; LLM planning stays small."
Weaknesses:
- Less detailed on state changes than fable-5 or gpt-5.5. No explicit decision_origin or llm_failed flag discussion.
- The Mermaid diagram doesn't show the decision_policy → action nodes routing as clearly as it could.
- No explicit checkpoint compatibility discussion.
Verdict: The most balanced proposal in terms of node count and migration safety. The anti-patterns section alone makes it worth reading. If fable-5's decision_origin concept were added, this would be #1.
4. opus: Thorough but over-granular
Nodes: g_iter_cap → g_region → g_currency → enrich_schema → g_calc_caps → acquire → plan_llm → c_target_decompose → c_redirect_derived → c_redirect_web → c_search_cap → c_ask_cap → c_calc_adjust → dispatch (14 nodes)
Strengths:
- Most detailed node-by-node mapping of any proposal. Every line range in plan_node is mapped to a specific new node.
- g_* / c_* naming convention is clear: gates vs correctors. The invariant ("every path deposits a PlannerDecision into state["decision"]") is well-stated.
- State write table is excellent: each node's write set is explicitly documented, eliminating the 8 separate if schema_changed blocks.
- Three-step migration is reasonable: extract helpers → hoist gates → split correctors.
- Open questions are genuine: Should enrich_schema always run? Should acquire be one node or two? Should c_search_cap and c_ask_cap be merged?
Weaknesses:
- 14 nodes is too many. The corrector chain (6 sequential c_* nodes) adds 6 checkpoint writes per iteration for zero routing benefit — they're a straight add_edge chain with no branching. This is graph noise.
- Splitting g_region and g_currency adds a node each for what is structurally identical logic. A single bootstrap_gate would suffice.
- The Mermaid diagram is hard to read with 14 nodes and many intermediate emit_* nodes.
- Cost analysis acknowledges the problem: "up to 10 hops per iteration" with "measurable" checkpoint writes. The mitigation (use add_edge not add_conditional_edges) helps but doesn't eliminate the overhead.
Verdict: Excellent analysis, over-engineered architecture. The node-by-node mapping and state write table are reference-quality. The corrector chain should be collapsed into a single enforce_policy node.
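Collapsing the corrector chain could be as simple as folding a list of functions inside one node. In this sketch the two corrector bodies are placeholders (the thresholds and the `search_attempts` / `ask_attempts` keys are invented). Only the names c_search_cap and c_ask_cap come from the proposal.

```python
from functools import reduce
from typing import Callable, List

Corrector = Callable[[dict], dict]


def c_search_cap(decision: dict) -> dict:
    # Placeholder body: the real corrector would consult attempt counters in state.
    if decision["action"] == "search" and decision.get("search_attempts", 0) >= 3:
        return {**decision, "action": "ask_user"}
    return decision


def c_ask_cap(decision: dict) -> dict:
    # Placeholder body, same caveat as above.
    if decision["action"] == "ask_user" and decision.get("ask_attempts", 0) >= 2:
        return {**decision, "action": "finish"}
    return decision


# The order of this list plays the role of the straight add_edge chain.
CORRECTORS: List[Corrector] = [c_search_cap, c_ask_cap]


def enforce_policy(decision: dict) -> dict:
    """Run every corrector in order inside a single node."""
    return reduce(lambda d, fix: fix(d), CORRECTORS, decision)
```

Each corrector stays individually testable, while the graph sees one node instead of six.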
5. kimi-2.6: Solid 8-node pipeline with good decision matrix
Nodes: tick → ask_region/ask_currency → prepare → calc_gate → acquire → route_direct → decide → enforce_policy (8 nodes + 2 bootstrap)
Strengths:
- Decision matrix appendix is the best reference artifact across all proposals. Every logic block is mapped to current location, proposed location, and reason.
- calc_gate as a dedicated node makes calculator lifecycle checks visible. The three-way routing (finish_success | finish_abort | continue) is clean.
- route_direct as a tiny adapter bridges deterministic acquisition tasks into the same post-processing pipeline as LLM decisions. This is a good pattern.
- Migration path is practical: Extract helpers → register as nodes → delete plan_node. The phased wrapper approach (plan_node becomes a thin delegator during transition) is safe.
Weaknesses:
- ask_region and ask_currency as separate nodes is over-splitting (same issue as opus).
- enforce_policy is still complex (all post-LLM rewrites in one node). This is acceptable but the proposal doesn't acknowledge the trade-off.
- No explicit decision_origin or llm_failed state field discussion.
- The Mermaid diagram shows ask_region → observe_user and ask_currency → observe_user but doesn't show how observe_user routes back to tick.
Verdict: The decision matrix appendix makes this a valuable reference. The node count is reasonable. The lack of decision_origin is a notable gap.
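The route_direct adapter pattern might look like the following sketch. The task shape (`kind`, `field`) and the decision shape are hypothetical. The point is only that a deterministic acquisition task is wrapped into the same structure an LLM decision would have, so one downstream node can post-process both.

```python
from typing import Optional


def route_direct(task: Optional[dict]) -> Optional[dict]:
    """Wrap a deterministic acquisition task as a decision-shaped dict."""
    if task is None:
        # Nothing to do deterministically; the LLM node decides instead.
        return None
    return {
        "action": task["kind"],  # e.g. "search" or "ask_user" (illustrative values)
        "field": task["field"],
    }
```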
6. glm-5.1: Conservative 2-node split
Nodes: pre_check → acquire_or_plan (2 new nodes)
Strengths:
- Most conservative proposal. Only 2 new nodes, minimal graph changes. Lowest risk.
- _build_plan_output helper to collapse the 9 conditional-return pattern is a good code-level improvement independent of graph changes.
- Post-LLM rewriting options (Option A: keep inline, Option B: extract validate_decision) shows good judgment about when to stop refactoring.
- State change is minimal: Only _pre_check_route field. The alternative (extending Action enum) is correctly rejected.
- Risk table is thorough: Loop-back edge changes, region/currency timing, _build_plan_output hiding logic, dynamic decomposition dependencies.
Weaknesses:
- 2 nodes is too few. acquire_or_plan is still ~200 LOC after extraction. The proposal acknowledges this ("still substantial") but doesn't fully address the cognitive load.
- Doesn't solve the core problem fully. The graph still hides most routing inside acquire_or_plan. The Mermaid diagram is only marginally more informative than today's.
- No explicit migration testing strategy. Phase 1 says "update tests" but doesn't specify how.
- The _pre_check_route field as a string is less type-safe than fable-5's decision_origin: Literal["deterministic", "llm"].
Verdict: Good first step, not a complete solution. If the team wants to start small, this is the right entry point. But it should be followed by further decomposition.
7. deepseek-4-pro: Well-structured but over-engineered
Nodes: guard → prepare → plan → adjust (4 new nodes, but with self-loop on plan)
Strengths:
- 4-node pipeline is conceptually clean. Guard → Prepare → Plan → Adjust maps well to the mental model.
- Detailed trade-off table with before/after comparison is useful.
- Alternatives considered section shows good engineering judgment: rejected 16-node explosion, rejected keep-plan-node-with-edges, rejected calculator subgraph, rejected guard-only extraction.
- Migration phases are well-ordered: Extract → Update graph → Remove old code → Verify.
Weaknesses:
- plan self-loop for decomposition is a rough edge. The conditional edge plan → plan (when decomposition is generated) adds complexity that fable-5 avoids by putting decomposition in prepare.
- recipes in state is a new field that fable-5 and gpt-5.5 avoid. The proposal acknowledges this risk ("must stay synchronized with dynamic_decompositions") but doesn't fully mitigate it.
- guard handles region/currency + abort/max_iters but prepare handles calculator gates + auto-finish. The split between these two is not obvious — why are calculator gates in prepare rather than guard? The proposal doesn't justify this boundary clearly.
- No checkpoint compatibility discussion.
Verdict: Good structure, but the plan self-loop and recipes in state are rough edges that fable-5 handles better.
8. qwen-3.6-plus: Good ideas, flawed graph topology
Nodes: preflight → ask_region/ask_currency → decompose → plan → route_finish_check (5 new nodes + 4 bootstrap)
Strengths:
- route_finish_check as a validation node is a good idea — the current blind finish → END path doesn't verify that finish is actually appropriate.
- Routing function signatures are well-documented. Each routing function has a clear return type and responsibility.
- Phase-by-phase migration is reasonable, starting with pre-flight nodes (lowest risk).
Weaknesses:
- Graph topology has a critical flaw: The loop-back edges point to plan instead of preflight. This means guard checks (aborted, max_iters, region, currency) are NOT re-run on every iteration. This is a behavioral regression.
- ask_region/ask_currency with dedicated observe_region/observe_currency adds 4 nodes for what is currently 20 lines of inline logic. Over-splitting.
- route_after_plan becomes a routing function with redirect logic AND state mutation. This violates the principle that edge functions should be pure predicates. The proposal acknowledges this but proceeds anyway.
- No decision_origin or llm_failed state field. The redirect logic in route_after_plan needs to know whether the decision came from LLM or deterministic path.
- The Mermaid diagram is inconsistent with the proposed graph construction code (e.g., observe → plan vs observe → preflight).
Verdict: Good ideas buried in an inconsistent topology. The loop-back-to-plan bug alone makes this proposal unsafe to implement as written.
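The loop-back problem is easy to reproduce on a toy edge map. The sketch below uses a deliberately simplified graph (one action node named `observe`, no branching) and counts how often preflight runs under each wiring.

```python
from typing import Dict, List


def visits(edges: Dict[str, str], start: str, steps: int) -> List[str]:
    """Follow a single-successor edge map for a fixed number of hops."""
    path, node = [start], start
    for _ in range(steps):
        node = edges[node]
        path.append(node)
    return path


# Loop-back to plan, as in the proposal's graph construction code.
as_proposed = {"preflight": "plan", "plan": "observe", "observe": "plan"}
# Loop-back to preflight, so the guard checks run on every iteration.
as_fixed = {"preflight": "plan", "plan": "observe", "observe": "preflight"}

proposed_path = visits(as_proposed, "preflight", 6)
fixed_path = visits(as_fixed, "preflight", 6)

print(proposed_path.count("preflight"))  # guard checks run once
print(fixed_path.count("preflight"))     # guard checks run every loop
```

A check like this could serve as a regression test on the real edge list once the graph is rewired.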
9. qwen-3.7-max: Overly granular, 10 nodes with _route state pollution
Nodes: check_termination → check_region_currency → compose_schema → check_calculator → acquisition_routing → check_completion → llm_decide → post_process → enforce_caps → route_decision (10 new nodes)
Strengths:
- Most detailed code examples of any proposal. Each node has a full Python implementation sketch.
- _route transient field pattern is consistent across all nodes.
- Migration strategy is phased: Guards → Transformations → LLM → Graph → Cleanup.
Weaknesses:
- 10 nodes is excessive. The proposal splits every concern into its own node without considering whether the granularity adds value.
- _route field in state is a code smell. Using state as a routing signal means every node writes a _route key that is immediately consumed and discarded. This pollutes the checkpoint and makes the state schema noisy.
- observe_user → check_region_currency breaks the existing observe_user → plan loop. Region/currency answers currently go through the same observe_user_node that handles all user answers. This changes behavior.
- No decision_origin concept. The proposal doesn't distinguish between deterministic and LLM-originated decisions in the post-processing pipeline.
- The code examples have bugs: check_termination_node returns _route: "finish" but the routing function reads state.get("_route", "continue") — the default would never trigger.
- Checkpoint serialization concern is acknowledged ("exclude transient _route fields") but LangGraph's PostgresSaver doesn't support field-level exclusion.
Verdict: The code examples are impressive but the architecture is over-engineered. The _route state pollution is a significant anti-pattern.
10. gemini-3.1-pro: Incomplete analysis, too coarse
Nodes: prepare_state → evaluate_rules → llm_plan (3 new nodes)
Strengths:
- 3-node split is the simplest proposal. Easy to understand, easy to implement.
- Benefits section is clear: True graph visibility, reduced latency/cost risk, cleaner unit tests.
Weaknesses:
- Doesn't address post-LLM rewriting at all. The current _redirect_derived_direct_decision, _redirect_premature_ask_for_web_field, cap enforcement, and _adjust_calculation_decision are all left unaddressed.
- evaluate_rules is still a mega-node. It combines abort/max_iters checks, region/currency gating, calculator status, acquisition task selection, and auto-finish logic. This is barely better than the current plan_node.
- No migration plan. Just "refactor agent.py functions" with no phasing or testing strategy.
- No state changes discussion. The proposal doesn't address how evaluate_rules communicates its decision to the routing function.
- The Mermaid diagram shows route_after_rules and route_after_llm as diamond nodes (conditional edges), but doesn't specify what state they read.
Verdict: Too coarse to be useful. The 3-node split doesn't solve the core problem — evaluate_rules is still a black box.
11. mimo-2.5-pro: Incomplete, naming confusion
Nodes: guards → acquire → decide (3 new nodes)
Strengths:
- 3-node split is simple. Guards → Acquire → Decide is easy to understand.
- Migration plan is phased: Extract without changing topology → Wire into graph → Cleanup.
- Expected benefits table is clear and measurable.
Weaknesses:
- decide node is still ~80 lines and includes LLM call + all redirectors + cap enforcement. This is barely better than the current plan_node.
- plan_router is mentioned but not defined. The graph construction references builder.add_edge("decide", "plan_router") but plan_router is not registered as a node.
- route_after_acquire routes to "decide" or "finish" but doesn't handle the case where acquire finds a task that should go directly to an action node.
- No decision_origin or llm_failed state field.
- Open questions are left unanswered: Should decide include loop/cap detection? Should plan_router be a node or routing function? Naming preferences?
- The graph construction code has a bug: builder.add_edge("decide", "plan_router") followed by builder.add_conditional_edges("plan_router", route_after_plan, ...) — but plan_router is never registered as a node.
Verdict: Incomplete and has implementation bugs in the proposed code. The 3-node split is too coarse.
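An unregistered-node bug of this kind can be caught before any graph is built with a small consistency check. This sketch is independent of any graph library. The node set and the first two edges are assumed from the Guards → Acquire → Decide description, and only the decide → plan_router edge is quoted from the proposal.

```python
from typing import Iterable, List, Set, Tuple


def unknown_endpoints(nodes: Set[str], edges: Iterable[Tuple[str, str]]) -> List[str]:
    """Return edge endpoints that were never registered as nodes, in edge order."""
    missing: List[str] = []
    for src, dst in edges:
        for name in (src, dst):
            if name not in nodes and name not in missing:
                missing.append(name)
    return missing


# START/END handling is omitted to keep the example short.
nodes = {"guards", "acquire", "decide"}
edges = [("guards", "acquire"), ("acquire", "decide"), ("decide", "plan_router")]

print(unknown_endpoints(nodes, edges))
```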
Synthesis: Recommended Approach
Core concepts to adopt (from multiple proposals):
| Concept | Source | Why |
|---|---|---|
| decision_origin flag | fable-5 | Cleanly gates which rewrite subset applies; preserves llm_failed protection |
| tick as sole iteration incrementer | fable-5, gpt-5.4 | Prevents drift in attempt tracking |
| Deterministic ladder in select | fable-5 | Preserves exact ordering of today's early returns |
| Single decision_policy / enforce_policy node | gpt-5.4, gpt-5.5, kimi-2.6 | Avoids graph noise from splitting every rewrite |
| Phase 1: extract without rewiring | fable-5, gpt-5.5, opus | Keeps tests green at every step |
| Bump planner thread namespace | fable-5 | Concrete mitigation for in-flight checkpoint breakage |
| Anti-patterns list | gpt-5.4 | "Don't mutate state in edge functions," "Don't split every rule" |
Recommended node count: 5-6 nodes
The sweet spot is between glm-5.1's 2 nodes (too few) and opus's 14 nodes (too many). The fable-5 proposal's 5 nodes (tick, prepare, select, decide, guard) is the right granularity.
Recommended migration order:
- Extract helpers without graph changes: plan_node becomes a thin sequential composition
- Add decision_origin and llm_failed to State: serializer-compatible, with defaults
- Rewire graph: register 5 nodes, add conditional edges, bump thread namespace
- Delete plan_node: update tests and docs
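The first and third steps of the migration order can be sketched as follows, under simplifying assumptions. Here each extracted helper either returns a partial state update (ending the node early) or returns None to fall through. The real helpers may also need to thread intermediate state between steps. The thread id format is assumed; only the `:planner:v2` suffix comes from the fable-5 proposal.

```python
from typing import Callable, List, Optional

Step = Callable[[dict], Optional[dict]]


def compose(steps: List[Step]) -> Callable[[dict], dict]:
    """Build a plan_node that tries each extracted helper in order."""

    def plan_node(state: dict) -> dict:
        for step in steps:
            update = step(state)
            if update is not None:
                return update
        raise RuntimeError("no step produced a decision")

    return plan_node


def planner_thread_id(base: str, version: int = 2) -> str:
    """Namespace bump: checkpoints written under the old id are left untouched."""
    return f"{base}:planner:v{version}"
```

Since the composed plan_node has the same signature as the monolith, the graph shape and existing tests do not change during the extraction phase.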
What NOT to do:
- Don't split correctors into individual nodes. A single enforce_policy/guard node handles all post-LLM rewrites. The sequential chain adds checkpoint writes with zero routing benefit.
- Don't add _route to state. Use decision_origin + decision.action for routing. Transient routing fields pollute checkpoints.
- Don't make plan self-loop. Decomposition belongs in prepare, not as a post-LLM loop.
- Don't move state mutation into edge functions. Edge functions should be pure predicates.
- Don't split region/currency into separate nodes. Use one bootstrap_gate or include the check in tick.
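As an illustration of the last point, a single gate can cover both bootstrap fields by iterating over a table. This is a sketch: the question wording and the shape of the returned decision are invented for the example.

```python
from typing import Optional, Tuple

# Ordered list of required bootstrap fields; the question text is illustrative.
REQUIRED: Tuple[Tuple[str, str], ...] = (
    ("region", "Which region should the analysis cover?"),
    ("currency", "Which currency should amounts be reported in?"),
)


def bootstrap_gate(state: dict) -> Optional[dict]:
    """Return an ask-user decision for the first missing field, else None."""
    for field, question in REQUIRED:
        if not state.get(field):
            return {"action": "ask_user", "field": field, "question": question}
    return None
```

Adding a third bootstrap field then means adding a row to the table rather than another node and edge pair.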