
Planner Graph Refactor: Proposal Evaluation & Ranking

Evaluator: Sisyphus (deepseek-v4-pro)
Date: 2026-06-11
Source proposals: docs/planner-graph-ref/proposals/ (11 proposals)
Context: docs/planner-graph-ref/current-graph.md, src/venturescope/planner/agent.py#L846-L1192


0. Problem Summary (Consensus Across All Proposals)

All 11 proposals agree on the core diagnosis:

  1. plan_node() is a 346-line god node with 15+ early-return paths, mixing 6 distinct concern categories (guard, preparation, acquisition, LLM planning, correction, logging).
  2. The graph diagram lies. route_after_plan() is 4 lines that just read decision.action. All real routing decisions happen inside plan_node as if-statement branches, where they are invisible to LangGraph/LangSmith visualizers.
  3. Testing is coarse. A test for "calculator cap forces abort" must construct full State and walk through 7 preceding gates.
  4. Cross-concern bugs are easy. Post-LLM correctors each independently mutate decision; ordering changes are silent.

All proposals converge on the same solution principle: move deterministic orchestration from inside plan_node to graph-level nodes and conditional edges. Where they differ is how many nodes and where to draw the boundaries.
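The testability point can be made concrete with a minimal sketch of a gate extracted into its own function. The state keys calculator_succeeded, calculator_attempts and calculator_cap, and the default cap of 3, are hypothetical placeholders, not names taken from agent.py.

```python
from typing import Any

State = dict[str, Any]


def calc_gate_node(state: State) -> State:
    """Decide the calculator lifecycle only; knows nothing about other gates.

    All state keys read here are hypothetical placeholders.
    """
    if state.get("calculator_succeeded"):
        return {"decision": {"action": "finish", "reason": "calculator_success"}}
    if state.get("calculator_attempts", 0) >= state.get("calculator_cap", 3):
        return {"decision": {"action": "finish", "reason": "calculator_cap"}}
    return {}  # no decision: the graph continues to the next node


def route_after_calc_gate(update: State) -> str:
    """Pure router: inspects the gate's output and returns an edge label.

    Simplification: a real router sees the merged state, so it would also
    have to ignore a stale decision left over from a previous iteration.
    """
    return "finish" if "decision" in update else "acquire"


# "Calculator cap forces abort" now needs two fields, not a full State.
capped = calc_gate_node({"calculator_attempts": 3, "calculator_cap": 3})
print(capped["decision"]["reason"], route_after_calc_gate(capped))
```

The test no longer has to walk through the preceding gates, because the gate reads only the fields it owns.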


1. Individual Proposal Evaluations (Ranked)

#1: kimi-2.6-proposal — BEST OVERALL

Decomposition: 7 new nodes: tick, ask_region/ask_currency, prepare, calc_gate, acquire, route_direct, decide, enforce_policy

What's good:
  • Optimal granularity. 7 nodes strike the right balance: each has a clear single responsibility, but no concern is split into sub-atomic fragments.
  • The Decision Matrix appendix (Section: "What Goes Where") is outstanding documentation. It maps every code block from the current plan_node to its proposed new location, with rationale.
  • Bootstrap questions as dedicated nodes (ask_region, ask_currency) make the mandatory-first-questions contract structurally visible.
  • calc_gate as an explicit node makes the calculator lifecycle (success → finish, cap → abort) a first-class graph transition.
  • The acquire vs decide separation correctly isolates deterministic acquisition-task resolution from LLM-based decision-making. The route_direct adapter node bridging both paths into enforce_policy is a clean pattern.
  • enforce_policy as a single corrector node keeps all post-LLM rewrites (redirects, caps, calculator adjustments) in one place without splitting them into a 5-node corrector chain.
  • The phase-out wrapper pattern (plan_node delegates internally during the transition) allows a zero-risk first migration step.
  • Good migration path: extract helpers → register as nodes → delete legacy.

What's problematic:
  • ask_region/ask_currency as separate nodes add 2 dedicated nodes, and they bypass observe_user (redirecting to a separate handler). The current code handles region/currency answers in observe_user_node via _handle_region_answer/_handle_currency_answer, so separating these means duplicating or moving that logic.
  • route_direct as a separate node between acquire and enforce_policy is a ~20-line adapter. It could be inlined into acquire without losing clarity.
  • No explicit decision_origin state field. Unlike fable-5, this proposal doesn't distinguish whether enforce_policy should run the full or a partial correction pipeline based on where the decision came from.

Bottom line: The most well-rounded proposal. Good depth of analysis, pragmatic node count, excellent documentation. Minor issues with bootstrap node separation and missing decision_origin can be resolved in implementation.


#2: fable-5-proposal — STRONG RUNNER-UP

Decomposition: 5 nodes: tick, prepare, select, decide, guard

What's good:
  • The decision_origin field (Literal["deterministic", "llm"]) is the single most valuable state design idea across all proposals. It tells guard (the corrector node) which rewrite subset to apply, elegantly solving the problem of running different correction pipelines for deterministic vs LLM-originated decisions.
  • select as a "deterministic decision ladder" is a conceptually clean unit: all the rules that can decide "we know what to do next without the LLM" live in one place.
  • Checkpoint namespace versioning (planner_thread_id() → {conv_id}:planner:v2) shows practical awareness of migration risks with in-flight conversations.
  • The explicit follow-up section acknowledges the remaining LLM calls (blocked-field decomposition, guard on-demand decomposition) and defers a decompose loop node. It is honest about incomplete decomposition.
  • The migration plan (extract → rewire → docs) is realistic, with specific files and step order.

What's problematic:
  • 5 nodes might be slightly too few. select handles calculator cap/success, blocked-calc acquisition, generic acquisition, and auto-finish; that's still ~95 lines for one node. guard handles ALL correctors (derived redirect, web-pref redirect, search cap, ask cap, calculator adjustment); that's ~120 lines.
  • The prepare → select → decide → guard pipeline has 4 sequential hops before reaching an action node. Each hop writes to the checkpointer.
  • The Command(goto=...) suggestion in the follow-up is deferred; using goto would reduce router functions but make the topology less declaratively visible in _build_state_graph().
  • It is not clear how decision_origin interacts with llm_failed. The proposal mentions both, but the interaction isn't fully specified.

Bottom line: Excellent architectural thinking with the decision_origin concept. Slightly under-decomposed — select and guard could each benefit from one more split. The state design ideas should be adopted by whichever proposal is implemented.


#3: gpt-5.5-proposal — MOST THOROUGH DOCUMENTATION

Decomposition: 7+ nodes: enter_iteration, prepare_schema, require_region/require_currency, acquisition_gate, plan, normalize_decision, retry_gate, maybe_calculate

What's good:
  • Exceptionally clear about non-goals (Section: "Non-goals"). It explicitly states what won't change: PlannerDecision actions, search backend, schema merge, calculator semantics, SQL persistence. This reduces scope anxiety for reviewers.
  • A dedicated retry_gate separate from normalize_decision recognizes that per-field cap enforcement is a different concern from LLM output correction.
  • maybe_calculate as a calculator-only router cleanly separates "should we run the calculator before finishing?" from the rest of decision routing.
  • prepare_schema as a reusable graph stage: downstream nodes can assume schema and recipes are current, eliminating redundant build_dynamic_recipes() calls.
  • Detailed test migration notes map each existing test category to new node targets.
  • Keeps the plan_node name during migration, a pragmatic choice for backward compatibility.
  • Good risk analysis about state handoff discipline being harder with more nodes.

What's problematic:
  • require_region/require_currency as separate nodes: the same issue as kimi-2.6. These are two ~20-line nodes that do the same thing (create an ask_user decision with different field/question text). The graph would be cleaner with a single node.
  • normalize_decision + retry_gate + maybe_calculate is arguably over-split. These three run sequentially and always in the same order. Merging retry_gate into normalize_decision would reduce hop count without losing clarity.
  • The 7-stage migration plan (one stage per new node) is ambitious; intermediate graph states with partially extracted nodes may have subtle interactions.
  • llm_failed OR decision_origin in state: the proposal offers both as options without committing. The ambiguity suggests the design isn't fully settled.

Bottom line: Most thoroughly documented — excellent as a reference for what each concern does. The 7-node topology is defensible but slightly over-split. The retry_gate/normalize_decision/maybe_calculate trio could be 2 nodes without loss.


#4: deepseek-4-pro-proposal — WELL-STRUCTURED, SLIGHTLY UNDER-DECOMPOSED

Decomposition: 4 nodes: guard, prepare, plan, adjust

What's good:
  • Routing acquired-task decisions through adjust is a key design insight. It ensures calculator decision adjustments (_adjust_calculation_decision) still apply to deterministic decisions, not just LLM ones.
  • recipes in state is a practical addition. Recipes are currently recomputed in multiple nodes; caching avoids redundant work.
  • Phase 1 extraction without graph changes is the safest migration strategy. It validates correctness before touching edges.
  • A good trade-off table (Section 5) clearly enumerates advantages and mitigations for each concern.
  • Explicit consideration and rejection of alternatives (4 alternatives considered) shows thorough design thinking.

What's problematic:
  • guard mixes concerns: iteration/abort checks AND region/currency questions. These are semantically different (bookkeeping vs bootstrap gating).
  • plan self-loop for decomposition regeneration: the proposal shows plan → plan as a conditional edge when decomposition was generated. This is a subtle pattern that's hard to test and reason about (will it loop forever if decomposition generation fails?).
  • 4 nodes is likely insufficient to solve the testability problem. adjust at ~80 lines still handles 5 distinct corrector concerns in a single function.
  • No decision_origin or llm_failed state field: adjust can't distinguish between "LLM produced this" and "deterministic gate produced this", so it must always run the full pipeline.
  • route_after_plan is redefined as checking for decomposition, which changes the semantics of an existing function name and could confuse readers.

Bottom line: Pragmatic and well-argued. The 4-node split is the right starting point but doesn't go far enough to solve the core problems. Best used as Phase 1 of a larger decomposition.


#5: glm-5.1-proposal — CONSERVATIVE, PRAGMATIC

Decomposition: 2 nodes: pre_check + acquire_or_plan (renamed from plan)

What's good:
  • Most conservative approach. Two new nodes is the smallest surface area change, with the lowest migration risk.
  • The _build_plan_output helper, which collapses the 9-way conditional return pattern, is a valuable side improvement that can be adopted independently.
  • Explicit Options A/B for post-LLM rewriting acknowledge the trade-off without committing to a harder migration.
  • The _pre_check_route field is a clear routing contract between pre_check and route_after_pre_check.
  • Good risk table with specific mitigations for each concern.

What's problematic:
  • pre_check is still ~150 LOC. It mixes iteration counting, abort checks, region/currency gates, proactive decomposition, schema composition, and calculator cap/success checks. This is ~6 concerns in one node: the same monolithic problem, just at a smaller scale.
  • acquire_or_plan is still ~200 LOC. It contains acquisition task selection, the LLM call, and all 5 post-LLM correctors. This is the bulk of the current plan_node just moved to a new name.
  • Doesn't meaningfully improve testability: to test "calculator cap forces abort", you still need to set up state to bypass iteration checks, abort checks, region gates, and currency gates.
  • The "Phase 3" and "Phase 4" sections suggest more decomposition is needed; the proposal itself acknowledges this isn't the end state.

Bottom line: Best for a risk-averse "first step." Adopt _build_plan_output helper regardless. Not sufficient as the final architecture — the proposal itself plans for further decomposition.


#6: gpt-5.4-proposal — CLEAN PIPELINE, OVER-SEQUENTIAL MIGRATION

Decomposition: 6 nodes: tick, bootstrap_gate, prepare_context, acquisition_gate, llm_plan, decision_policy

What's good:
  • The design anti-pattern warnings (Section: "Risks and anti-patterns to avoid") are excellent: "Don't move state mutation into edge functions", "Don't increment iterations in every gate", "Don't turn every policy rule into a node". These protect against common LangGraph refactoring mistakes.
  • One decision_policy node for all correctors is the correct choice. Splitting into a chain of derived_redirect -> source_policy -> retry_policy -> calculator_policy would make the graph harder to follow.
  • tick owns iteration only, a clean separation. It is the only place that writes iterations.
  • bootstrap_gate as a dedicated node for region/currency gating is a clean single concern.

What's problematic:
  • 6-stage sequential migration plan (one stage per new node, in order). Adding nodes one at a time means 5 intermediate graph states where the planner is partially decomposed. Each intermediate state needs its own verification. High risk.
  • acquisition_gate handles both calculator-blocked acquisition AND auto-finish checks, two distinct concerns (recovery vs completion).
  • prepare_context is described as "should only orchestrate, helpers own domain logic", but the boundary between orchestration and domain logic isn't clearly specified.
  • No state additions proposed. The proposal avoids adding llm_failed or decision_origin fields, which means decision_policy can't distinguish deterministic vs LLM decisions.

Bottom line: Strong design principles and anti-pattern awareness. The 6-stage linear migration is impractical. The acquisition_gate responsibility is slightly overloaded. Good as a reference for what NOT to do (anti-patterns section).


#7: gemini-3.1-pro-proposal — SIMPLEST, MOST NAIVE

Decomposition: 3 nodes: prepare_state, evaluate_rules, llm_plan

What's good:
  • Easiest to understand: 3 nodes, each with one clear job.
  • Good emphasis on LangGraph/LangSmith visualization: the concrete benefit of showing "evaluate_rules → ask_user" vs "evaluate_rules → llm_plan" in a trace.
  • Clear problem statement with specific line numbers and concerns listed.

What's problematic:
  • evaluate_rules becomes the same monolith by a different name. It handles max_iters/aborted, region, currency, calculator caps, calculator success, blocked-calc acquisition, next task selection, and auto-finish checks. That's 8 concerns in one node, ~150 lines.
  • No post-LLM correction handling. The proposal shows LLM output routing directly to action nodes; all the redirects (derived fields, web-preferred, cap enforcement, calculator adjustment) are simply missing from the proposed graph.
  • prepare_state includes both iteration increment AND schema composition, mixing bookkeeping with domain logic.
  • No migration path: just implementation steps with no risk assessment or phased approach.
  • Likely results in worse testability than the current state: evaluate_rules has the same branching complexity as the top half of plan_node but fewer integration points to hook tests into.

Bottom line: A good first-draft analysis of the problem but an incomplete solution. Useful as a thinking exercise; not suitable for implementation. The missing post-LLM correction pipeline is a showstopper.


#8: mimo-2.5-pro-proposal — REASONABLE STRUCTURE, INCOMPLETE EXECUTION

Decomposition: 3 nodes: guards, acquire, decide

What's good:
  • Clean ASCII diagram showing the three-way split.
  • Good open questions about naming alternatives and whether loop/cap detection should be separate.
  • Phase-based migration with explicit risk labels (low, medium, low).
  • Benefit table with concrete before/after LOC estimates.

What's problematic:
  • The guards node is overloaded: it handles iteration counting, abort checks, region/currency gates, AND calculator cap/success checks. That's 5 distinct concerns.
  • The decide node is overloaded in the other direction: it handles acquisition task conversion, the LLM call, AND all 5 post-LLM correctors. ~80 lines is optimistic; the full redirect pipeline is closer to 120 lines.
  • The acquire node is described as "populates dynamic_decompositions and selects the next task", but the current code has decomposition generation BEFORE acquisition tasks (lines 889-917) and acquisition tasks AFTER calculator checks (lines 919-988). The proposal reorders these without addressing the dependency.
  • No corrector/cap enforcement separation; these remain buried inside decide.
  • Whether plan_router is a node or a routing function is an open question left unresolved. This should be a routing function, not a node.
  • The naming proposal (guards / acquire / decide) is less self-documenting than other proposals' naming conventions.

Bottom line: The 3-node split is insufficient. The proposal correctly identifies the problem and the outline of a solution but doesn't go far enough. The guards and decide nodes would both be too large.


#9: qwen-3.7-max-proposal — MOST GRANULAR, OVER-ENGINEERED

Decomposition: 10+ nodes: check_termination, check_region_currency, compose_schema, check_calculator, acquisition_routing, check_completion, llm_decide, post_process, enforce_caps, route_decision

What's good:
  • Complete, compilable code examples for every proposed node. The most implementation-ready proposal in terms of artifact completeness.
  • Good separation of calculator checks (check_calculator) from acquisition routing (acquisition_routing) and from completion checks (check_completion).
  • route_decision as a separate node that emits events and routes isolates the logging concern.
  • Comprehensive graph construction code with all edges declared.

What's problematic:
  • 10+ nodes means 10+ checkpoint writes per iteration under PostgresSaver. For a chat-paced agent with max_iters=50, that's potentially 500+ small writes per conversation turn. Acceptable for a slow human-paced interaction, but measurable.
  • The _route transient field spread across ALL nodes is a code smell. It's an ad-hoc routing protocol layered on top of LangGraph's own conditional edge system. Every node writes it and every routing function reads it, which is tight coupling.
  • The acquisition_routing node is ~70 lines with significant domain logic (decomposition generation, blocked-calc task selection). This is one of the most complex pieces of plan_node and it gets ONE node, while check_termination (increment + 2 if-checks) also gets ONE node. The granularity is inconsistent.
  • The check_completion node duplicates acquisition task selection logic from acquisition_routing (both call next_acquisition_task). This violates DRY.
  • check_region_currency → ask_user routing uses the existing ask_user node for region/currency interrupts, but observe_user → check_region_currency means the outer observe_user handler must still special-case region/currency field targets. The proposal doesn't fully extract this concern.

Bottom line: The most complete in terms of code, but over-engineered. The inconsistent granularity (a 5-line check gets a node; a 70-line acquisition resolver gets one too) and _route field proliferation indicate the decomposition wasn't driven by clear criteria. Takes several good ideas too far.


#10: qwen-3.6-plus-proposal — MISPLACED RESPONSIBILITIES

Decomposition: 7 new nodes: preflight, ask_region/ask_currency, observe_region/observe_currency, decompose, route_finish_check, plus significantly modified plan and route_after_plan

What's good:
  • decompose as a separate node makes explicit that recipe building happens before planning. (Though this could also be argued as over-separation.)
  • route_finish_check as a finish-validation node prevents premature termination when fields are still missing.
  • Good benefit table mapping 5 categories of improvement.
  • Honest open questions about whether decompose should be a node and whether separate observe nodes are needed.

What's problematic:
  • route_after_plan gains redirect logic. This is explicitly warned AGAINST by the gpt-5.4 proposal as an anti-pattern: "Don't move state mutation into edge functions." The proposal puts 6 redirect checks into a conditional edge function, which should be a pure state-reader, not a state-mutator.
  • route_finish_check as a node that can route back to plan creates a loop node that's semantically a conditional edge. The graph already has plan as the main loop entry; adding another loop-back point fragments the control flow.
  • Separate observe_region/observe_currency nodes: the current code handles region/currency answers in observe_user_node via _handle_region_answer/_handle_currency_answer. Separating them means 4 extra nodes (ask_region, ask_currency, observe_region, observe_currency) for what's currently 2 branches of a single handler.
  • preflight is ~60 lines, but the proposal's code sketch is just comments showing 7 responsibilities. Each of those is a distinct concern.
  • Migration Phase 4 ("Move redirects to routing function") puts ~100 lines of logic into route_after_plan, which is the wrong place for business logic. Routing functions should be thin dispatch, not policy engines.

Bottom line: The route_after_plan redirect idea is architecturally wrong. Putting decision-rewrite logic in routing functions violates LangGraph's separation of concerns. The region/currency observe node separation is unnecessary complexity. Some good individual ideas (decompose node, route_finish_check) but the overall architecture is flawed.


#11: opus-proposal — MOST PRINCIPLED, LEAST PRACTICAL

Decomposition: 12+ nodes: g_iter_cap, g_region+emit_region, g_currency+emit_currency, enrich_schema, g_calc_caps (sub-nodes emit_calc_abort/emit_calc_done), acquire, plan_llm, c_target_decompose, c_redirect_derived, c_redirect_web, c_search_cap, c_ask_cap, c_calc_adjust, dispatch

What's good:
  • Best naming convention across all proposals: g_* for gates, c_* for correctors, emit_* for decision emitters, dispatch for the action router. Self-documenting.
  • Most principled decomposition. Every concern gets its own node. The invariant "every path deposits a PlannerDecision into state['decision']" is clean.
  • Single-responsibility taken to its logical conclusion. plan_llm is literally just the LLM call and nothing else.
  • The state write matrix (Section 5) shows exactly which node writes which fields, eliminating the 8 scattered if schema_changed: out["schema"] = schema_dict blocks in the current code.
  • Good open questions (Section 9) about whether enrich_schema should always run, whether acquire should be split further, and whether c_search_cap/c_ask_cap should merge.
  • The g_region/g_currency split from emit_region/emit_currency separates predicate from decision emission, making predicates independently testable.

What's problematic:
  • 12+ nodes = 12+ checkpoint writes per iteration. Under PostgresSaver, even small writes accumulate. With max_iters=50, worst case is 600 checkpoint writes per conversation turn. While LangGraph can batch on add_edge (static edges for the c_* chain), the g_* gate stage alone has 5+ conditional hops that must be checkpointed individually.
  • The split of predicate from emission (g_region + emit_region) is philosophically correct but doubles the node count for what are 15-line functions: 2 nodes where 1 suffices.
  • Highest migration risk across all proposals. Step 3 alone ("Split correctors") involves promoting 6 internal functions to graph nodes, each with its own edges, checkpointer impact, and test requirements.
  • The c_* corrector chain is 6 sequential nodes connected by add_edge. Each is 5-20 lines. The same logic could be a single enforce_policy node (as in kimi-2.6) with a sequential function pipeline inside. The extra granularity adds checkpoint overhead without proportional debugging benefit.
  • The acquire node is described as one node, but the text suggests splitting it into acquire_blocked → acquire_generic; the proposal itself acknowledges it might need more nodes.

Bottom line: The most architecturally pure proposal. If checkpoint cost weren't a concern, this would be the ideal solution. In practice, the 12+ node count creates unacceptable overhead for a chat-paced agent. Adopt the naming conventions; use a more pragmatic node count.
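The checkpoint arithmetic used in the evaluations above can be reproduced with a short sketch. It follows the same simplifying assumption as the text: one checkpoint write per control node per iteration, every control node visited on every iteration, and action nodes not counted, so the figures are upper bounds and not measurements.

```python
def worst_case_checkpoint_writes(control_nodes: int, max_iters: int = 50) -> int:
    """Upper bound on control-node checkpoint writes for one conversation turn."""
    return control_nodes * max_iters


# Node counts as quoted in this evaluation.
node_counts = {"kimi-2.6": 7, "qwen-3.7-max": 10, "opus": 12}

for proposal, nodes in node_counts.items():
    print(f"{proposal}: {worst_case_checkpoint_writes(nodes)} writes")
# kimi-2.6: 350 writes
# qwen-3.7-max: 500 writes
# opus: 600 writes
```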


2. Cross-Cutting Observations

2.1 What ALL Proposals Get Right

  • tick (or equivalent) as the loop-entry point — every proposal separates iteration counting from planning. This is the consensus "first extraction."
  • LLM call isolated to one node — every proposal agrees plan_node should shrink to just prompt building + LLM structured output.
  • Action nodes unchanged — no proposal touches search, observe, calculate, ask_user, observe_user, reflect, finish. Correct.
  • Loop-back to new entry point — all proposals recognize that action nodes should return to the new loop-entry node (whatever it's named) rather than "plan."
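  • Outer contract preserved: no proposal changes run_planner_step() or build_planner_graph(). The thread namespace is also left alone, except by fable-5, which versions it to {conv_id}:planner:v2 to protect in-flight conversations.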

2.2 Where Proposals Disagree

| Concern | Minimalist View | Maximalist View | Recommendation |
|---|---|---|---|
| Bootstrap questions (region/currency) | In tick (fable-5, glm-5.1) | Separate nodes (kimi-2.6, qwen-3.6-plus, qwen-3.7-max, opus, gpt-5.5) | In tick: they're simple predicates. Separate nodes add graph noise. |
| Calculator lifecycle | In acquire/acquire_or_plan (several) | Separate calc_gate (kimi-2.6, fable-5 via select, gpt-5.5 via maybe_calculate, qwen-3.7-max) | Separate calc_gate: the calculator has 3 distinct states (not-run, blocked, success) that deserve a visible transition. |
| Post-LLM correctors | All in one node (kimi-2.6 enforce_policy, gpt-5.4 decision_policy) | Split into 4-6 nodes (opus c_* chain, gpt-5.5 normalize_decision + retry_gate) | One enforce_policy node. The correctors form a pipeline; splitting into nodes adds checkpoint overhead without proportional clarity. |
| State routing field | _route / _pre_check_route (glm-5.1, qwen-3.7-max) | decision_origin (fable-5) | decision_origin: it's semantically meaningful (deterministic vs LLM), not just a routing hack. |
| Pre-LLM acquisition | In acquire (most) + route_direct adapter (kimi-2.6) | Merged into acquire (fable-5 select, gpt-5.4 acquisition_gate) | In acquire, no separate adapter. The adapter adds a node for a 20-line conversion. |
| Schema preparation | In prepare node (most) | In enrich_schema (opus) or prepare_context (gpt-5.4) | prepare: naming should be consistent, not fancy. |

2.3 Common Pitfalls to Avoid

  1. State mutation in routing functions — qwen-3.6-plus's route_after_plan with redirect logic is the canonical example. Routing functions should inspect state and return route labels ONLY.
  2. _route transient field proliferation — qwen-3.7-max spreads this across 10 nodes. Use decision_origin (fable-5) or just check state["decision"] directly.
  3. Over-splitting — opus's 12+ nodes and qwen-3.7-max's 10 nodes create excessive checkpoint writes.
  4. Under-splitting — gemini-3.1-pro's 3 nodes and mimo-2.5-pro's 3 nodes don't solve the testability problem.
  5. Sequential migration that creates broken intermediate states — gpt-5.4's 6-stage plan where each stage adds one node. Prefer: extract all helpers first (zero graph change), then rewire the entire graph at once.
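Pitfall 1 can be guarded against mechanically. The sketch below shows a routing function that only reads state and returns a label, checked against a read-only view so that any accidental write would raise TypeError. The function name route_after_enforce_policy and the fallback to "finish" for unknown actions are assumptions made for illustration.

```python
from types import MappingProxyType
from typing import Any, Mapping

# Edge labels leaving the corrector node, as listed in this document.
ROUTES = frozenset({"search", "ask_user", "calculate", "reflect", "finish"})


def route_after_enforce_policy(state: Mapping[str, Any]) -> str:
    """Thin dispatch: inspect the decision, return a route label, write nothing."""
    action = state["decision"]["action"]
    # Assumption: an unrecognised action ends the run instead of raising.
    return action if action in ROUTES else "finish"


# A read-only view of the state: assigning to it raises TypeError, so a test
# built this way fails as soon as a router starts mutating state.
frozen_state = MappingProxyType({"decision": MappingProxyType({"action": "search"})})
print(route_after_enforce_policy(frozen_state))
```

Any rewrite of the decision belongs in a node upstream of the router, where it is checkpointed and visible in traces.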


3. Recommended Synthesis

3.1 Best Combination

The optimal solution combines elements from multiple proposals:

| Element | Source | Why |
|---|---|---|
| tick node (iteration + terminal checks + region/currency bootstrap) | fable-5, gpt-5.4, gpt-5.5 | Single loop-entry point. Region/currency predicates are simple; separate nodes are overkill. |
| prepare node (decomposition + schema composition + recipe building) | kimi-2.6, deepseek-4-pro | Pure enrichment, no decisions. Writes schema and dynamic_decompositions. |
| calc_gate node (calculator cap/success → finish) | kimi-2.6 | Calculator lifecycle deserves a visible graph transition. |
| acquire node (blocked-calc recovery + deterministic task selection + auto-finish check) | fable-5 select, gpt-5.5 acquisition_gate | All deterministic "we know what to do" logic in one place. Routes to decide only when no deterministic decision exists. |
| decide node (LLM call ONLY) | ALL proposals | The core planning node. Prompt build + structured output. Nothing else. |
| enforce_policy node (all post-decision corrections) | kimi-2.6, gpt-5.4 | Single corrector node with internal pipeline. Don't split into normalize_decision + retry_gate + maybe_calculate. |
| decision_origin state field | fable-5 | Tells enforce_policy which correction pipeline to run. Deterministic decisions skip LLM-specific redirects. |
| llm_failed state field | fable-5, gpt-5.5, opus | Prevents enforce_policy from running _adjust_calculation_decision after LLM failure (avoids infinite loop). |
| Migration: extract helpers first, then rewire | deepseek-4-pro, glm-5.1 | Lowest risk. Validates extraction before changing graph topology. |

3.2 Proposed Graph

flowchart TD
    START([START]) --> tick[tick]

    tick -->|aborted or max_iters| finish[finish]
    tick -->|region/currency missing| ask_user[ask_user / interrupt]
    tick -->|continue| prepare[prepare]

    prepare --> calc_gate[calc_gate]

    calc_gate -->|cap reached or success| finish
    calc_gate -->|continue| acquire[acquire]

    acquire -->|deterministic decision| enforce_policy[enforce_policy]
    acquire -->|needs LLM| decide[decide]

    decide -->|LLM failed| finish
    decide -->|valid decision| enforce_policy

    enforce_policy -->|search| search[search]
    enforce_policy -->|ask_user| ask_user
    enforce_policy -->|calculate| calculate[calculate]
    enforce_policy -->|reflect| reflect[reflect]
    enforce_policy -->|finish| finish

    search -->|last_observation| observe[observe]
    search -->|no hits| tick
    observe --> tick
    calculate --> tick
    ask_user --> observe_user[observe_user]
    observe_user --> tick
    reflect --> tick
    finish --> END([END])

6 new control nodes (including tick, which replaces plan as the loop entry) plus the unchanged action nodes. Each has a single, clear responsibility. The graph diagram tells the truth about control flow.
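One way to keep the diagram honest is to transcribe its edges into a table and check a few structural properties. The sketch below copies the edges from the flowchart above and checks three things: every node is reachable from START, every node can reach END, and nothing still loops back to the old plan node. It is plain Python and does not call LangGraph.

```python
from collections import deque

# Edges transcribed from the proposed flowchart.
EDGES: dict[str, list[str]] = {
    "START": ["tick"],
    "tick": ["finish", "ask_user", "prepare"],
    "prepare": ["calc_gate"],
    "calc_gate": ["finish", "acquire"],
    "acquire": ["enforce_policy", "decide"],
    "decide": ["finish", "enforce_policy"],
    "enforce_policy": ["search", "ask_user", "calculate", "reflect", "finish"],
    "search": ["observe", "tick"],
    "observe": ["tick"],
    "calculate": ["tick"],
    "ask_user": ["observe_user"],
    "observe_user": ["tick"],
    "reflect": ["tick"],
    "finish": ["END"],
    "END": [],
}


def reachable(start: str) -> set[str]:
    """Breadth-first search over EDGES; the start node counts as reachable."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in EDGES[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


all_nodes = set(EDGES)
unreachable = all_nodes - reachable("START")
dead_ends = {node for node in all_nodes if "END" not in reachable(node)}
stale_targets = {t for targets in EDGES.values() for t in targets if t == "plan"}
print(unreachable, dead_ends, stale_targets)
```

All three sets come out empty for the edges as drawn. A check like this could run in a unit test whenever the topology changes.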

3.3 Node Responsibilities (Summary)

| Node | Lines (est.) | One-Sentence Responsibility |
|---|---|---|
| tick | ~40 | Increment iterations; check abort/max_iters; handle region/currency bootstrap questions. |
| prepare | ~45 | Generate proactive decompositions, build dynamic recipes, compose ready fields. |
| calc_gate | ~25 | Check calculator cap reached or calculator success → finish. |
| acquire | ~80 | Resolve blocked-calc acquisition tasks; select next task from open component work; auto-finish when all inputs collected. |
| decide | ~40 | Build planner prompt; call LLM structured output; fallback to finish on failure. |
| enforce_policy | ~100 | Run corrector pipeline: composite-field decomposition, derived-field redirect, web-pref redirect, search cap, ask-user cap, calculator adjustment, logging. |

Total new code: ~330 lines (split across 6 nodes) vs current 346 lines in one node. The LOC is similar but the architectural clarity is dramatically better.
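The "single node with an internal pipeline" shape recommended for enforce_policy can be sketched as an ordered list of corrector functions. The two rules below are simplified placeholders, not the project's real correctors, and the state keys derived_fields, search_count and search_cap are hypothetical. Keeping the order as data means an ordering change shows up as a one-line diff instead of a silent behavior change.

```python
from typing import Any, Callable

Decision = dict[str, Any]
State = dict[str, Any]
Corrector = Callable[[Decision, State], Decision]


def redirect_derived(decision: Decision, state: State) -> Decision:
    # Placeholder rule: searching for a derived field becomes a calculation.
    if decision["action"] == "search" and decision.get("field") in state.get("derived_fields", ()):
        return {**decision, "action": "calculate"}
    return decision


def enforce_search_cap(decision: Decision, state: State) -> Decision:
    # Placeholder rule: once the search cap is hit, ask the user instead.
    if decision["action"] == "search" and state.get("search_count", 0) >= state.get("search_cap", 3):
        return {**decision, "action": "ask_user"}
    return decision


# The pipeline order lives in one place.
CORRECTORS: list[tuple[str, Corrector]] = [
    ("derived_redirect", redirect_derived),
    ("search_cap", enforce_search_cap),
]


def enforce_policy(decision: Decision, state: State) -> tuple[Decision, list[str]]:
    """Apply each corrector in order and report which ones rewrote the decision."""
    applied: list[str] = []
    for name, corrector in CORRECTORS:
        corrected = corrector(decision, state)
        if corrected != decision:
            applied.append(name)
        decision = corrected
    return decision, applied


final, applied = enforce_policy(
    {"action": "search", "field": "margin"}, {"derived_fields": {"margin"}}
)
print(final["action"], applied)
```

Each corrector returns a new decision instead of mutating its input, so a test can call one rule in isolation and compare before and after.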

3.4 State Changes

from typing import Literal, TypedDict

class State(TypedDict):
    # ... existing fields ...
    decision_origin: Literal["deterministic", "llm"] | None  # Set by acquire/decide
    llm_failed: bool                                         # Set by decide on LLM error
    recipes: dict[str, FieldAcquisition]                     # Cached from build_dynamic_recipes

  • decision_origin: Set by acquire when it produces a deterministic decision, or by decide when the LLM produces one. Read by enforce_policy to select the correct correction pipeline.
  • llm_failed: Set by decide when structured output fails. Read by enforce_policy to skip _adjust_calculation_decision (prevents infinite planner→reflect→planner loop).
  • recipes: Optional — currently recomputed in multiple nodes. Adding to state avoids redundant recomputation.
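How enforce_policy might consume the two new fields is sketched below. The grouping of correctors is an assumption for illustration: the two redirects are treated as LLM-specific, the caps as always-on, and the calculator adjustment as applying to both origins unless the LLM call failed. The corrector names are labels only, not existing function names.

```python
from typing import Literal, Optional

Origin = Optional[Literal["deterministic", "llm"]]

# Hypothetical grouping of the corrector steps.
LLM_ONLY = ("derived_redirect", "web_pref_redirect")
ALWAYS = ("search_cap", "ask_cap")
CALCULATOR = ("calc_adjust",)


def select_correctors(origin: Origin, llm_failed: bool) -> list[str]:
    """Return the ordered corrector names that should run for this decision."""
    steps: list[str] = []
    if origin == "llm":
        steps.extend(LLM_ONLY)  # deterministic decisions skip these
    steps.extend(ALWAYS)
    if not llm_failed:
        steps.extend(CALCULATOR)  # skipped after an LLM failure to avoid a loop
    return steps


print(select_correctors("deterministic", llm_failed=False))
print(select_correctors("llm", llm_failed=True))
```

Because the selection is a pure function of two fields, the interaction between decision_origin and llm_failed can be pinned down in a four-row truth-table test.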

3.5 Migration Plan

Phase 1: Extract helpers (no graph change, 1 PR)
  1. Pull tick_node(), prepare_node(), calc_gate_node(), acquire_node(), decide_node(), enforce_policy_node() out of plan_node as module-level functions.
  2. plan_node becomes a thin sequential composition calling each.
  3. Add decision_origin and llm_failed to State.
  4. Verify: all existing tests pass. No behavior change.
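The Phase 1 wrapper could look like the sketch below. The six helper names come from the plan; their bodies here are trivial stand-ins, and the state keys next_task, aborted and max_iters are hypothetical. The wrapper mirrors the intended graph: a decision from tick or calc_gate returns immediately, a decision from acquire skips the LLM, and an LLM failure bypasses the correctors.

```python
from typing import Any

State = dict[str, Any]


# Stand-in helpers: in the real refactor these bodies are cut out of plan_node.
def tick_node(state: State) -> State:
    iterations = state.get("iterations", 0) + 1
    update: State = {"iterations": iterations}
    if state.get("aborted") or iterations > state.get("max_iters", 50):
        update["decision"] = {"action": "finish"}
    return update


def prepare_node(state: State) -> State:
    return {}


def calc_gate_node(state: State) -> State:
    return {}


def acquire_node(state: State) -> State:
    task = state.get("next_task")  # hypothetical key
    if task is None:
        return {}
    return {"decision": {"action": "search", "field": task}, "decision_origin": "deterministic"}


def decide_node(state: State) -> State:
    return {"decision": {"action": "reflect"}, "decision_origin": "llm", "llm_failed": False}


def enforce_policy_node(state: State) -> State:
    return {}


def plan_node(state: State) -> State:
    """Thin composition: same external contract, helpers called in graph order."""
    merged: State = {}
    for gate in (tick_node, prepare_node, calc_gate_node):
        merged.update(gate({**state, **merged}))
        if "decision" in merged:  # a gate decided: nothing else runs
            return merged
    merged.update(acquire_node({**state, **merged}))
    if "decision" not in merged:  # no deterministic decision: ask the LLM
        merged.update(decide_node({**state, **merged}))
        if merged.get("llm_failed"):
            return merged
    merged.update(enforce_policy_node({**state, **merged}))
    return merged


print(plan_node({"next_task": "revenue"}))
```

The wrapper checks only the updates produced in this call, not the incoming state, so a decision left over from the previous iteration cannot short-circuit the gates.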

Phase 2: Rewire graph (1 PR)
  1. Register 6 new nodes in _build_state_graph().
  2. START → "tick" replaces START → "plan".
  3. Add conditional edges per the Mermaid diagram above.
  4. Update action node loop-back edges: → "tick" instead of → "plan".
  5. Bump planner thread namespace to {conversation_id}:planner:v2 for checkpoint compatibility.
  6. Delete plan_node and route_after_plan.
  7. Verify: planner integration tests pass; full test suite green.
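Step 5 amounts to a one-line change if the thread id is built in a single helper. The sketch below assumes a helper named planner_thread_id, as referenced in the fable-5 evaluation. The version constant is a hypothetical addition, and the pre-bump id format is not specified in this document.

```python
PLANNER_CHECKPOINT_VERSION = "v2"  # bump whenever the node set or topology changes


def planner_thread_id(conversation_id: str) -> str:
    """Build the checkpointer thread id for a conversation's planner graph.

    Checkpoints written under an earlier id are not picked up after the bump,
    so an in-flight conversation starts a fresh planner thread instead of
    resuming into a graph whose nodes no longer exist.
    """
    return f"{conversation_id}:planner:{PLANNER_CHECKPOINT_VERSION}"


print(planner_thread_id("conv-123"))  # conv-123:planner:v2
```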

Phase 3: Docs (1 commit)
  1. Replace docs/planner-graph-ref/current-graph.md with new diagram.
  2. Update src/venturescope/planner/AGENTS.md node descriptions.


4. Conclusion

The proposal set represents a remarkably consistent diagnosis: plan_node has too many responsibilities and the graph should own the routing logic. The disagreement is only about how many nodes and where to draw the boundaries.

The kimi-2.6 proposal is the best single proposal because it hits the sweet spot: enough decomposition to make every concern testable and every graph transition visible, but not so much that checkpoint overhead or migration risk becomes unacceptable. Its decision matrix appendix is a model of clear documentation.

However, the ideal solution synthesizes strengths from multiple proposals:
  • kimi-2.6 provides the topology (tick → prepare → calc_gate → acquire → decide → enforce_policy)
  • fable-5 provides the state design (decision_origin)
  • deepseek-4-pro provides the migration safety pattern (extract → rewire → delete)
  • gpt-5.5 provides the non-goals clarity and boundary discipline
  • opus provides the naming conventions (though the node count is excessive)

Recommendation: Implement the synthesis described in Section 3. Start with Phase 1 (helper extraction) immediately: it leaves the graph topology untouched, so the risk is minimal, and it makes the code reviewable regardless of which final graph topology is chosen.


Appendix: Proposal Comparison Matrix

| # | Proposal | Nodes (control) | State Fields Added | Correctors Strategy | Migration Steps | Key Innovation | Overall |
|---|---|---|---|---|---|---|---|
| 1 | kimi-2.6 | 7 | 0 | Single enforce_policy node | 3 | Decision matrix appendix | ★★★★★ |
| 2 | fable-5 | 5 | 2 (decision_origin, llm_failed) | Single guard node with origin-based dispatch | 3 | decision_origin concept | ★★★★★ |
| 3 | gpt-5.5 | 7 | 1-2 | Split: normalize_decision + retry_gate + maybe_calculate | 4 | Non-goals section, test mapping | ★★★★☆ |
| 4 | deepseek-4-pro | 4 | 1 (recipes) | Single adjust node | 4 | Self-loop for decomposition, routing through adjust | ★★★★☆ |
| 5 | glm-5.1 | 2 | 1 (_pre_check_route) | In acquire_or_plan (Option A) | 4 | _build_plan_output helper, Option A/B for correctors | ★★★☆☆ |
| 6 | gpt-5.4 | 6 | 0 | Single decision_policy node | 6 | Anti-patterns section | ★★★☆☆ |
| 7 | gemini-3.1-pro | 3 | 0 | Missing (no post-LLM correction) | N/A (no plan) | Problem articulation | ★★☆☆☆ |
| 8 | mimo-2.5-pro | 3 | 0 | Inside decide node | 3 | Open questions section | ★★☆☆☆ |
| 9 | qwen-3.7-max | 10 | 2 (_route, _llm_failed) | Split: post_process + enforce_caps | 5 | Complete code examples | ★★☆☆☆ |
| 10 | qwen-3.6-plus | 7 | 0 | In routing function (anti-pattern) | 5 | route_finish_check node | ★☆☆☆☆ |
| 11 | opus | 12+ | 1 (llm_failed) | Split into 6 c_* nodes | 3 | Naming convention (g_*, c_*), state write matrix | ★★☆☆☆ |