Planner Graph Refactor Proposals — Evaluation & Ranking

Evaluator: mimo-2.5-pro (Sisyphus)
Date: 2026-06-11
Scope: 11 proposals in docs/planner-graph-ref/proposals/
Reference: current-graph.md, src/venturescope/planner/agent.py:846-1192

Evaluation Criteria

Each proposal is assessed on:

Criterion	Weight	What it measures
Problem accuracy	15%	Does the proposal correctly identify all 10+ responsibilities in `plan_node`?
Decomposition quality	25%	Are node boundaries drawn at the right seams? Single-responsibility?
Graph honesty	20%	Does the resulting Mermaid diagram actually represent the control flow?
Migration safety	15%	Can this ship incrementally without breaking existing planner behavior?
Test seam quality	10%	Can each new node be unit-tested in isolation?
LangGraph alignment	10%	Does it follow LangGraph idioms (nodes do work, edges do routing)?
Practical completeness	5%	Are state changes, risks, and edge cases addressed?
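The document does not spell out how the per-criterion scores roll up into the overall score. One plausible reading, assumed here, is a weight-normalised mean. The sketch below applies the weights from the table above to fable-5's criterion scores; `weighted_score` is a hypothetical helper, not something taken from the proposals.

```python
# Criterion weights from the table above, in percent.
WEIGHTS = {
    "problem_accuracy": 15,
    "decomposition_quality": 25,
    "graph_honesty": 20,
    "migration_safety": 15,
    "test_seam_quality": 10,
    "langgraph_alignment": 10,
    "practical_completeness": 5,
}


def weighted_score(scores: dict[str, float]) -> float:
    """Weighted mean of per-criterion scores (each on a 0-10 scale)."""
    if set(scores) != set(WEIGHTS):
        raise ValueError("scores must cover exactly the criteria in WEIGHTS")
    total = sum(WEIGHTS[name] * value for name, value in scores.items())
    return total / sum(WEIGHTS.values())


# fable-5's per-criterion scores, copied from its table in the ranked list.
fable_5 = {
    "problem_accuracy": 9,
    "decomposition_quality": 9.5,
    "graph_honesty": 9.5,
    "migration_safety": 9,
    "test_seam_quality": 9.5,
    "langgraph_alignment": 9,
    "practical_completeness": 9,
}

print(round(weighted_score(fable_5), 3))  # 9.275
```

Under this assumption the result is close to, but not identical to, the reported 9.2, so the published overall scores probably include some rounding or evaluator judgment on top of the weights.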

Ranked List

1. fable-5 — "Decompose `plan_node` into Graph-Level Stages"

Score: 9.2/10

Criterion	Score	Notes
Problem accuracy	9	Maps all 9 responsibility blocks with line ranges
Decomposition quality	9.5	`tick` / `prepare` / `select` / `decide` / `guard` — clean, well-bounded
Graph honesty	9.5	Mermaid diagram shows real control flow including deterministic bypass
Migration safety	9	Two-phase: extract without rewiring, then rewire. Thread namespace bump for in-flight checkpoints.
Test seam quality	9.5	`decision_origin` field cleanly separates deterministic vs LLM paths for `guard`
LangGraph alignment	9	Nodes own mutations, edges own routing, `tick` as loop entry
Practical completeness	9	State changes documented, checkpoint compatibility addressed

What's good:

- The `tick` → `prepare` → `select` → `decide` → `guard` pipeline is the most natural decomposition. Each node has exactly one job.
- `decision_origin: Literal["deterministic", "llm"]` is elegant: it tells `guard` which rewrite subset to apply without `guard` having to track which upstream node produced the decision.
- The `select` node correctly captures the "deterministic decision ladder" as a distinct phase before the LLM is ever called.
- Checkpoint compatibility is explicitly addressed with thread namespace bumping.
- The "Follow-up" section honestly defers the decompose loop node and `Command(goto=...)` as out of scope.

What's bad:

- `select` still has one LLM call (decomposition for a blocked field without a recipe, lines 960-968), which slightly violates the "deterministic" label. The proposal acknowledges this.
- The `guard` node's `decision_origin`-based branching adds a small amount of complexity that needs documentation.
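To make the `decision_origin` idea concrete, here is a minimal sketch of a `guard` node that picks its rewrite subset from that field. Only `guard` and `decision_origin` come from the proposal; the two rewrite functions, the `web_preferred` and `search_count` state keys, the cap value, and the choice of which rewrites apply to which origin are all assumptions made for illustration.

```python
from typing import Callable, Literal, TypedDict


class PlannerDecision(TypedDict):
    action: str
    decision_origin: Literal["deterministic", "llm"]


Rewrite = Callable[[PlannerDecision, dict], PlannerDecision]

MAX_SEARCHES = 3  # hypothetical cap


def cap_searches(decision: PlannerDecision, state: dict) -> PlannerDecision:
    """Hypothetical cap: once the search budget is spent, ask the user instead."""
    if decision["action"] == "search" and state.get("search_count", 0) >= MAX_SEARCHES:
        return {**decision, "action": "ask_user"}
    return decision


def redirect_web_preferred(decision: PlannerDecision, state: dict) -> PlannerDecision:
    """Hypothetical LLM-only correction: search instead of asking the user."""
    if decision["action"] == "ask_user" and state.get("web_preferred", False):
        return {**decision, "action": "search"}
    return decision


# Assumed split: caps apply to every decision, LLM corrections only to LLM output.
REWRITES: dict[str, tuple[Rewrite, ...]] = {
    "deterministic": (cap_searches,),
    "llm": (redirect_web_preferred, cap_searches),
}


def guard(state: dict) -> dict:
    """Node: apply the rewrite subset selected by decision_origin."""
    decision = state["decision"]
    for rewrite in REWRITES[decision["decision_origin"]]:
        decision = rewrite(decision, state)
    return {"decision": decision}  # returned as a state update, not mutated in place
```

Because `guard` is a plain function of state, the deterministic and LLM paths can each be exercised in a unit test by constructing a state with the corresponding `decision_origin`.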

2. gpt-5.5 — "Move Planner Control Logic to Graph Level"

Score: 9.0/10

Criterion	Score	Notes
Problem accuracy	9	Lists all 9 responsibility categories
Decomposition quality	9	`enter_iteration` / `prepare_schema` / `require_region` / `require_currency` / `acquisition_gate` / `plan` / `normalize_decision` / `retry_gate` / `maybe_calculate`
Graph honesty	9	Mermaid is detailed and accurate
Migration safety	8.5	4-step staged migration, but many nodes to wire at once
Test seam quality	9.5	Each policy node is independently testable
LangGraph alignment	9	Clean separation of concerns
Practical completeness	9	State changes, test strategy, non-goals all documented

What's good:

- Making `normalize_decision`, `retry_gate`, and `maybe_calculate` separate nodes is the most granular correct decomposition. Each post-LLM policy is independently testable.
- `require_region` and `require_currency` as dedicated nodes (not just `ask_user`) make the bootstrap flow explicit in the graph.
- The "Non-goals" section is precise: no outer graph changes, no SQL persistence changes.
- The need for an `llm_failed` or `decision_origin` state field for calculator loop protection is correctly identified.

What's bad:

- 9 new nodes is a lot. The graph becomes visually complex even though behavior is the same. For a team that reads Mermaid diagrams, this may be harder to follow than a 5-node pipeline.
- The migration plan ("Step 1: extract helpers, Step 2: add graph nodes one group at a time") is good but doesn't address checkpoint compatibility for in-flight conversations.
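For comparison, fable-5 handles the in-flight checkpoint problem with a thread namespace bump. One possible reading of that idea is sketched below, assuming checkpoints are keyed by the `thread_id` entry in LangGraph's `configurable` config. The `planner-v<N>:` naming scheme, the version constant, and the helper name are hypothetical.

```python
PLANNER_GRAPH_VERSION = 2  # hypothetical: bump whenever the node topology changes


def planner_thread_config(conversation_id: str, version: int = PLANNER_GRAPH_VERSION) -> dict:
    """Build a run config whose thread_id is namespaced by graph version.

    Conversations started under an older topology keep their old thread_id,
    so their checkpoints are never resumed against the new node layout.
    """
    thread_id = f"planner-v{version}:{conversation_id}"
    return {"configurable": {"thread_id": thread_id}}


old = planner_thread_config("conv-42", version=1)
new = planner_thread_config("conv-42")
print(old["configurable"]["thread_id"])  # planner-v1:conv-42
print(new["configurable"]["thread_id"])  # planner-v2:conv-42
```

The returned dict would be passed as the `config` argument when invoking the compiled graph. The trade-off is that in-flight conversations restart planning from scratch rather than resuming mid-iteration.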

3. gpt-5.4 — "Planner Graph 5.4 Proposal"

Score: 8.8/10

Criterion	Score	Notes
Problem accuracy	8.5	Lists 7 responsibility categories (slightly less granular)
Decomposition quality	9	`tick` / `bootstrap_gate` / `prepare_context` / `acquisition_gate` / `llm_plan` / `decision_policy`
Graph honesty	9	Clean Mermaid with 6 pipeline stages
Migration safety	9	6-stage migration, each stage is a standalone commit
Test seam quality	8.5	Good, but `decision_policy` bundles multiple rewrites
LangGraph alignment	9	Explicitly warns against state mutation in edge functions
Practical completeness	8.5	Anti-patterns section is excellent

What's good:

- The "anti-patterns to avoid" section is unique and valuable: "Do not move state mutation into edge functions", "Do not increment iterations in every gate", "Do not turn every policy rule into a node", "Do not move interrupts into policy gates".
- The 6-stage migration plan is the most detailed and safest.
- `tick` as the sole owner of the `iterations` increment is correctly called out as critical.
- "deterministic orchestration belongs in graph phases; business logic stays in helpers; LLM planning stays small" is the clearest design principle statement.

What's bad:

- `decision_policy` bundles derived-field redirect, web-preferred redirect, search cap, ask-user cap, and calc adjustment into one node. This is a pragmatic choice, but it means the node is still ~80 lines with multiple concerns.
- `bootstrap_gate` as a single node that routes to `ask_user` for either region or currency loses some explicitness compared to separate `require_region`/`require_currency` nodes.
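The "only `tick` increments iterations" rule can be illustrated with a small sketch. The node names `tick` and `bootstrap_gate` come from the proposal; the `MAX_ITERATIONS` value, the `aborted` flag, the `"finish"` target, and the `run_tick` helper that simulates the state merge are assumptions made for illustration.

```python
MAX_ITERATIONS = 8  # hypothetical iteration cap


def tick(state: dict) -> dict:
    """Loop entry node: the only place where `iterations` is incremented."""
    return {"iterations": state.get("iterations", 0) + 1}


def route_after_tick(state: dict) -> str:
    """Edge function: reads state and names the next node, nothing else."""
    if state.get("aborted") or state["iterations"] > MAX_ITERATIONS:
        return "finish"
    return "bootstrap_gate"


def run_tick(state: dict) -> tuple[dict, str]:
    """Simulate one pass through `tick`: merge its update, then route."""
    merged = {**state, **tick(state)}
    return merged, route_after_tick(merged)


state, nxt = run_tick({"iterations": 0})
print(state["iterations"], nxt)  # 1 bootstrap_gate
```

Since every loop-back edge re-enters through `tick`, no other gate needs to touch the counter, and the cap is enforced in exactly one place.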

4. deepseek-4-pro — "Decompose `plan_node` into Graph-Level Logic"

Score: 8.5/10

Criterion	Score	Notes
Problem accuracy	9	Lists all 12 responsibility items with line references
Decomposition quality	8.5	`guard` / `prepare` / `plan` / `adjust` — 4 nodes
Graph honesty	8.5	Good Mermaid, but `prepare` routing is complex
Migration safety	9	4-phase migration, Phase 1 is extraction without graph changes
Test seam quality	8	Good, but `prepare` has 5 exit paths
LangGraph alignment	8.5	Mostly good, but `plan` self-loop for decomposition is unusual
Practical completeness	8.5	Risk assessment is honest

What's good:

- The `guard` → `prepare` → `plan` → `adjust` pipeline is the simplest 4-node decomposition that captures the essential phases.
- `recipes: dict[str, FieldAcquisition]` in state is a good optimization: it avoids recomputing in multiple nodes.
- The migration path is clear: Phase 1 (extract helpers), Phase 2 (rewire graph), Phase 3 (cleanup).
- Risk assessment is honest about `recipes` synchronization concerns.

What's bad:

- `prepare` has 5 exit conditions (calculator cap, calculator success, blocked→acquire, auto-complete, pass-through). This is more routing complexity than ideal for a "preparation" node.
- `plan` self-looping for decomposition generation is architecturally unusual: a node looping to itself with a conditional edge feels like a workaround.
- `adjust` at ~80 lines is still substantial for a "correction" node.

5. opus — "Hoist `plan_node` Policy Into Graph Edges"

Score: 8.3/10

Criterion	Score	Notes
Problem accuracy	9.5	Most detailed responsibility table (17 items with line ranges)
Decomposition quality	8	`g_iter_cap` / `g_region` / `g_currency` / `enrich_schema` / `g_calc_caps` / `acquire` / `plan_llm` / 6 `c_*` correctors / `dispatch`
Graph honesty	8.5	Very detailed Mermaid with naming conventions
Migration safety	8	3-step migration, but Step 2 is "hoist gates into graph" which is a big change
Test seam quality	8.5	Each corrector is independently testable
LangGraph alignment	7.5	10+ nodes with `add_edge` chains for correctors — unusual pattern
Practical completeness	8.5	State surface changes, routing rules documented

What's good:

- The `g_*` (gate) / `enrich` / `acquire` / `plan_llm` / `c_*` (corrector) / `dispatch` naming convention is the clearest taxonomy.
- The "one invariant ties everything together" section is excellent: every path deposits a `PlannerDecision`, and `dispatch` reads `decision.action`.
- The 17-item responsibility mapping is the most thorough problem analysis.
- The state surface changes table (which node writes which fields) is unique and valuable.

What's bad:

- 10+ graph nodes is too many. The `c_*` corrector chain as 6 separate nodes with `add_edge` (not `add_conditional_edges`) creates a long linear chain that adds checkpoint writes without adding routing intelligence.
- The `g_region` → `emit_region` split (predicate vs emitter) is over-engineering for a 2-line check.
- Migration Step 2 ("hoist gates into graph") is a large change: multiple new nodes and edges at once.
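The invariant praised above (every path deposits a `PlannerDecision`, and `dispatch` only reads `decision.action`) could look roughly like this. The names `PlannerDecision`, `dispatch`, and `action` come from the proposal; the fields of the dataclass, the set of action names, and modelling `dispatch` as a plain routing function are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PlannerDecision:
    action: str
    reason: str = ""  # hypothetical field, useful for logging


# Hypothetical action vocabulary; the real planner may use different names.
KNOWN_ACTIONS = frozenset({"ask_user", "search", "calculate", "finish"})


def dispatch(state: dict) -> str:
    """Route on decision.action alone, failing loudly if the invariant is broken."""
    decision = state.get("decision")
    if not isinstance(decision, PlannerDecision):
        raise ValueError("invariant violated: no PlannerDecision in state")
    if decision.action not in KNOWN_ACTIONS:
        raise ValueError(f"unknown action: {decision.action!r}")
    return decision.action


print(dispatch({"decision": PlannerDecision("search", reason="gate: field blocked")}))  # search
```

The benefit of the invariant is that gates, the LLM node, and correctors never need their own routing tables: they only have to leave a valid decision behind.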

6. kimi-2.6 — "Decompose `plan_node` into Graph-Level Pipeline"

Score: 8.1/10

Criterion	Score	Notes
Problem accuracy	8.5	Lists 9 responsibility categories
Decomposition quality	8	`tick` / `ask_region` / `ask_currency` / `prepare` / `calc_gate` / `acquire` / `route_direct` / `decide` / `enforce_policy`
Graph honesty	8	Good Mermaid with clear node labels
Migration safety	7.5	3-step migration but lacks checkpoint compatibility details
Test seam quality	8	Each node is testable
LangGraph alignment	8	Good separation
Practical completeness	7.5	Decision matrix appendix is helpful but risks section is thin

What's good:

- `ask_region` / `ask_currency` as dedicated nodes (not generic `ask_user`) is correct: bootstrap questions are structural, not LLM decisions.
- The decision matrix appendix (Logic Block → Current Location → Proposed Location → Reason) is excellent for implementation.
- `route_direct` as a tiny adapter node bridging deterministic acquisition into the post-processing pipeline is a clean pattern.

What's bad:

- `enforce_policy` bundles 7 different rewrites into one node. The proposal acknowledges this ("still the most complex node") but doesn't offer a clear path to further decomposition.
- The migration plan is thin: "Extract helper functions" → "Register as nodes" → "Delete `plan_node`" is too high-level.
- No discussion of checkpoint compatibility or in-flight conversation handling.

7. mimo-2.5-pro — "Refactor `plan_node` — Move Routing Logic to Graph Level"

Score: 7.8/10

Criterion	Score	Notes
Problem accuracy	8	Lists 10 responsibility categories
Decomposition quality	7.5	`guards` / `acquire` / `decide` — 3 nodes
Graph honesty	7.5	Simple Mermaid, but `decide` bundles too much
Migration safety	8	3-phase migration with risk assessment
Test seam quality	7.5	`decide` node is still ~80 lines with multiple concerns
LangGraph alignment	7.5	Mostly good, but `plan_router` as a routing function (not a node) is inconsistent
Practical completeness	7.5	Open questions section is honest

What's good:

- The 3-node decomposition (`guards` → `acquire` → `decide`) is the simplest proposal. Easy to understand and implement.
- Phase 1 (extract without changing topology) is the safest migration approach.
- The "Expected Benefits" table is honest about LOC counts.

What's bad:

- `decide` bundles LLM call + redirectors + loop/cap detection. This is still a complex node (~80 lines) with multiple concerns.
- `plan_router` as a routing function (not a node) is inconsistent with the rest of the graph: it's a function that `decide` calls, not a graph-level construct.
- Open questions (should `decide` include loop detection? should `plan_router` be a node?) suggest the design isn't fully resolved.

8. gemini-3.1-pro — "Refactoring Planner Logic to the Graph Level"

Score: 7.5/10

Criterion	Score	Notes
Problem accuracy	7.5	Lists 8 responsibility items (less granular)
Decomposition quality	7	`prepare_state` / `evaluate_rules` / `llm_plan` — 3 nodes
Graph honesty	7	Simple Mermaid, but `evaluate_rules` bundles too much
Migration safety	7.5	3-step implementation plan
Test seam quality	7	`evaluate_rules` is still complex
LangGraph alignment	7.5	Good separation of concerns
Practical completeness	6.5	Benefits section is thin, no risks section

What's good:

- The 3-node split (`prepare_state` / `evaluate_rules` / `llm_plan`) is the simplest decomposition that captures the essential separation.
- "Reduced Latency/Cost Risk: The LLM node becomes completely isolated" is a good insight.
- The implementation steps are concrete and actionable.

What's bad:

- `evaluate_rules` bundles 7 different checks (max_iters, aborted, region, currency, calculator, acquisition, auto-finish) into one node. This is still a complex function.
- The `route_after_rules` edge function handles both deterministic routing and the `needs_llm` flag, which mixes concerns.
- No discussion of state changes, checkpoint compatibility, or risks.
- The Mermaid diagram draws `route_after_rules` and `route_after_llm` as diamond nodes, but in LangGraph these are edge functions, not nodes. This is slightly misleading.

9. glm-5.1 — "Decompose `plan` Node into Graph-Level Routing"

Score: 7.3/10

Criterion	Score	Notes
Problem accuracy	7.5	Lists 6 responsibility categories
Decomposition quality	7	`pre_check` / `acquire_or_plan` — 2 nodes
Graph honesty	7	Simple Mermaid, but `acquire_or_plan` is still complex
Migration safety	7.5	4-phase migration with optional Phase 4
Test seam quality	7	`pre_check` is testable, `acquire_or_plan` is still ~200 LOC
LangGraph alignment	7	Good separation
Practical completeness	7	Risks section is present but thin

What's good:

- The `pre_check` extraction is the minimal valuable change: it pulls all no-LLM guards out into a testable node.
- Phase 4 (optional `validate_decision` node) is honest about what's deferred.
- The `_build_plan_output` helper idea is practical: it collapses 9 conditional-return patterns.

What's bad:

- `acquire_or_plan` at ~200 LOC is still a substantial node with 7 sequential steps. The proposal acknowledges this but doesn't offer a clear path to further decomposition.
- With only 2 new nodes, this is the least ambitious decomposition. It solves the "guard interleaving" problem but leaves the "post-LLM rewriting" problem inside `acquire_or_plan`.
- The `_pre_check_route` state field uses a string literal instead of a typed enum, which is less safe.
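The typed-enum alternative mentioned in the last point might look like the sketch below. Only the `_pre_check_route` field name comes from the proposal; the enum, its members, and the edge function name are hypothetical.

```python
from enum import Enum


class PreCheckRoute(str, Enum):
    """Hypothetical set of targets that pre_check may select."""

    CONTINUE = "acquire_or_plan"
    ASK_USER = "ask_user"
    FINISH = "finish"


def route_after_pre_check(state: dict) -> str:
    """Edge function: validate the stored route before handing it to the graph."""
    # Enum lookup by value raises ValueError on a typo such as "ask_usr",
    # instead of silently returning a node name that does not exist.
    return PreCheckRoute(state["_pre_check_route"]).value


print(route_after_pre_check({"_pre_check_route": PreCheckRoute.ASK_USER}))  # ask_user
print(route_after_pre_check({"_pre_check_route": "finish"}))  # finish
```

Subclassing `str` keeps the value serialisable as an ordinary string in checkpoints, while type checkers and the runtime lookup still catch misspelled routes.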

10. qwen-3.6-plus — "Decompose `plan_node` into Graph-Level Routing"

Score: 7.0/10

Criterion	Score	Notes
Problem accuracy	7.5	Lists 12 responsibility items
Decomposition quality	6.5	`preflight` / `decompose` / `plan` + `route_after_plan` with redirects + `route_finish_check`
Graph honesty	7	Good Mermaid, but `route_after_plan` as edge function doing redirects is unusual
Migration safety	7	5-phase migration
Test seam quality	6.5	`route_after_plan` as an edge function doing redirects is hard to test
LangGraph alignment	6	Edge functions doing state mutation (redirects) violates LangGraph idioms
Practical completeness	7	Open questions section is thoughtful

What's good:

- `ask_region` / `ask_currency` with dedicated `observe_region` / `observe_currency` nodes is the most explicit bootstrap handling.
- `route_finish_check` as a validation node before `END` is a unique insight: it catches cases where finish is premature.
- The 5-phase migration plan is detailed.

What's bad:

- Moving post-LLM redirects into `route_after_plan` (an edge function) is an anti-pattern in LangGraph. Edge functions should be pure routing decisions, not state mutations. This makes checkpoint behavior harder to reason about.
- `route_after_plan` doing 6 different redirect checks is complex for an edge function; it should be a node.
- The proposal creates `observe_region` / `observe_currency` as separate nodes, which adds graph complexity without clear benefit (they share most logic with `observe_user`).
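The alternative implied by this criticism is to do the redirect in a node and keep the edge function read-only. A minimal sketch follows, assuming a single redirect rule (a premature `finish` is turned into `ask_user` while fields are still missing). The node and edge names, the `missing_fields` key, and the rule itself are hypothetical.

```python
from collections.abc import Mapping
from types import MappingProxyType


def apply_redirects(state: Mapping) -> dict:
    """Node: rewrites the decision and returns it as an explicit state update."""
    decision = state["decision"]
    if decision["action"] == "finish" and state.get("missing_fields"):
        decision = {**decision, "action": "ask_user"}
    return {"decision": decision}


def route_after_redirects(state: Mapping) -> str:
    """Edge function: read-only, it only names the next node."""
    return state["decision"]["action"]


state = {"decision": {"action": "finish"}, "missing_fields": ["currency"]}
merged = {**state, **apply_redirects(state)}

# A read-only view demonstrates that the edge function needs no write access.
next_node = route_after_redirects(MappingProxyType(merged))
print(next_node)  # ask_user
```

With this split, the rewrite is an ordinary node update that the checkpointer records, and the routing step can be tested with nothing more than a dict.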

11. qwen-3.7-max — "Decomposing the `plan` Node"

Score: 6.5/10

Criterion	Score	Notes
Problem accuracy	7	Lists 11 responsibility items
Decomposition quality	6	10 nodes including `check_termination`, `check_region_currency`, `compose_schema`, `check_calculator`, `acquisition_routing`, `check_completion`, `llm_decide`, `post_process`, `enforce_caps`, `route_decision`
Graph honesty	6.5	Detailed Mermaid but overly granular
Migration safety	6	5-phase migration, but 10 nodes is a big change
Test seam quality	6.5	Each node is testable, but graph complexity is high
LangGraph alignment	5.5	`_route` state field as transient routing hint is an anti-pattern
Practical completeness	6.5	State schema changes documented, but `_route` field approach is problematic

What's good:

- It is the most granular decomposition: every check is its own node.
- Code examples for each node are complete and runnable.
- The `_route` state field approach is at least explicit about what it's doing.

What's bad:

- 10 new nodes is too many. The graph becomes harder to read than the original monolith. The Mermaid diagram has 20+ edges.
- Using `_route: NotRequired[str]` as a transient routing hint stored in state is an anti-pattern. LangGraph conditional edges should read existing state and return a string; they shouldn't need a special routing field. This adds state mutation that the checkpointer sees.
- `check_termination` and `check_region_currency` as separate nodes add a hop for what could be a single preflight node.
- `route_decision` as a node that just reads `decision.action` is redundant: this is what `route_after_plan` already does as an edge function.

Summary Table

Rank	Proposal	Author	Nodes	Score	Key Strength	Key Weakness
1	fable-5	fable-5	5 new	9.2	`decision_origin` for deterministic vs LLM separation	`select` still has one LLM call
2	gpt-5.5	gpt-5.5	9 new	9.0	Most granular correct decomposition	Graph visually complex
3	gpt-5.4	gpt-5.4	6 new	8.8	Best anti-patterns section, safest migration	`decision_policy` bundles multiple rewrites
4	deepseek-4-pro	deepseek-v4-pro	4 new	8.5	Simplest pipeline that captures essential phases	`plan` self-loop for decomposition
5	opus	opus-4.7	10+ new	8.3	Best naming convention and responsibility mapping	Too many nodes, corrector chain overhead
6	kimi-2.6	kimi-k2.6	8 new	8.1	Decision matrix appendix	`enforce_policy` still complex
7	mimo-2.5-pro	mimo-2.5-pro	3 new	7.8	Simplest proposal, easiest to understand	`decide` bundles too much
8	gemini-3.1-pro	gemini-3.1-pro	3 new	7.5	Simplest viable decomposition	`evaluate_rules` still complex, thin docs
9	glm-5.1	glm-5.1	2 new	7.3	Minimal valuable change	`acquire_or_plan` still ~200 LOC
10	qwen-3.6-plus	qwen-3.6-plus	5 new	7.0	`route_finish_check` validation node	Edge functions doing state mutation
11	qwen-3.7-max	qwen-3.7-max	10 new	6.5	Most granular	Too many nodes, `_route` anti-pattern

Recommendation

Best single proposal: fable-5. It has the best balance of decomposition quality, migration safety, and LangGraph alignment. The decision_origin field is an elegant solution for separating deterministic and LLM paths.

Best combination: fable-5 core + gpt-5.4 anti-patterns + gpt-5.5 post-LLM granularity.

Concretely:

1. Adopt fable-5's 5-node pipeline (`tick` → `prepare` → `select` → `decide` → `guard`) as the core architecture.
2. Adopt gpt-5.4's anti-patterns section as implementation constraints (no state mutation in edges, only `tick` increments `iterations`, don't turn every rule into a node, keep `ask_user` as sole interrupt).
3. Consider splitting `guard` into `normalize_decision` + `retry_gate` + `maybe_calculate` (from gpt-5.5) if post-LLM policy testing proves difficult after the initial refactor. This can be a follow-up PR.
4. Adopt gpt-5.4's 6-stage migration plan for safety (each stage is a standalone commit).

The combined approach gives:

- 5 nodes in the initial refactor (fable-5)
- Clear design constraints (gpt-5.4 anti-patterns)
- Optional further decomposition path (gpt-5.5 post-LLM nodes)
- Safest possible migration (gpt-5.4 staged approach)

What to avoid:

- qwen-3.7-max's 10-node decomposition and `_route` state field pattern
- qwen-3.6-plus's edge functions doing state mutation
- opus's 10+ node corrector chain (too many checkpoint writes per iteration)

Appendix: Convergence Analysis

Most proposals agree on these architectural decisions:

Decision	Consensus	Dissent
Extract iteration/abort guards into a `tick` node	10/11 agree	glm-5.1 calls it `pre_check`
Extract schema preparation (decomposition + composition) into its own node	9/11 agree	qwen-3.7-max calls it `compose_schema`
Isolate the LLM call into a slim node	11/11 agree	—
Extract post-LLM decision correction into a separate phase	8/11 agree	glm-5.1 keeps it in `acquire_or_plan`
Region/currency bootstrap should bypass normal planning	7/11 agree	Some keep it in `tick`/`guard`
Loop-back edges should target the first guard node, not `plan`	9/11 agree	—
`llm_failed` or `decision_origin` state field for calculator loop protection	6/11 agree	Some handle it inline

The strongest consensus is on isolating the LLM call and extracting iteration guards. The weakest consensus is on how granular post-LLM correction should be (one node vs multiple).
