# GPT-5.5 range analysis of planner graph refactor proposals

## Scope and evaluation lens

This analysis ranks the proposals in `docs/planner-graph-ref/proposals` against the current planner implementation in `src/venturescope/planner/agent.py` and the current graph reference in `docs/planner-graph-ref/current-graph.md`.

The current pain point is real: plan_node() is the hidden policy engine of the planner. The public graph has a small plan -> action shape, while the actual implementation mixes iteration ticking, mandatory region/currency questions, decomposition, schema composition, calculator gates, deterministic acquisition, LLM planning, retry caps, and decision rewrites.

I used these criteria for the ranking:

- Behavior preservation. Keep the current outer contract, checkpoint-owned planner state, the {conversation_id}:planner namespace strategy, the single ask_user interrupt surface, and existing schema/calculator/search semantics.
- LangGraph fit. Nodes should own state mutations; conditional edge functions should stay side-effect free and route only from state.
- Right-sized graph. Expose meaningful durable phases without turning every if branch into a Postgres checkpoint write.
- State-surface discipline. Avoid storing complex derived objects such as FieldAcquisition recipes unless necessary. If adding flags, mirror them in PlannerState with safe defaults.
- Migration risk. Existing tests import plan_node() directly, and run_planner_step() relies on iteration counts for turn-scoped search reporting, so staged extraction matters.
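The LangGraph-fit criterion is easier to see in code. The following is a minimal pure-Python sketch, deliberately not tied to the LangGraph API, of the split it asks for: a node returns a partial state update, and a routing function only reads state and returns the name of the next node. The state keys and function names (`SketchState`, `tick_node`, `route_after_tick`, `iterations`, `max_iterations`) are hypothetical stand-ins, not the planner's actual schema.

```python
from typing import TypedDict


class SketchState(TypedDict):
    # Hypothetical, simplified slice of planner state.
    iterations: int
    max_iterations: int


def tick_node(state: SketchState) -> dict:
    # Nodes own mutations: return a partial update instead of editing state in place.
    return {"iterations": state["iterations"] + 1}


def route_after_tick(state: SketchState) -> str:
    # Routers stay side-effect free: read state, return the next node's name.
    if state["iterations"] > state["max_iterations"]:
        return "finish"
    return "prepare"


state: SketchState = {"iterations": 2, "max_iterations": 3}
state = {**state, **tick_node(state)}  # the graph runtime performs this merge
print(route_after_tick(state))  # prepare
```

Because the router never writes, it can be called any number of times during routing or replay without changing the outcome.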

LangGraph-specific constraints that affect the ranking:

- Treat node boundaries as replay/checkpoint boundaries. Do not rely on in-node progress surviving an interrupt, resume, or process restart.
- Keep graph state explicit, typed, and serializer-friendly. Route hints such as _route are not truly transient unless the graph state contract handles them deliberately.
- Changing node names or topology can affect in-flight checkpoints, because pending graph tasks refer to node names and checkpoint namespaces. A namespace bump is a product decision, not a code-style detail.
- Use subgraph boundaries for ownership, not for hiding side effects. The planner should remain the owner of planner attempts, interrupts, and acquisition policy; the outer conversation graph should stay a bridge.
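A namespace bump can be kept explicit and reviewable. The sketch below assumes the current namespace is built as `{conversation_id}:planner`, as stated above, and that a bump appends a version suffix. The `PLANNER_GRAPH_VERSION` constant, the `planner_namespace` helper, and the `:v2` suffix format are hypothetical.

```python
PLANNER_GRAPH_VERSION = 2  # hypothetical: bump only when node names/topology change


def planner_namespace(conversation_id: str, version: int = PLANNER_GRAPH_VERSION) -> str:
    # Version 1 keeps the existing "{conversation_id}:planner" form, so old
    # checkpoints stay addressable; later versions get a distinct namespace.
    base = f"{conversation_id}:planner"
    return base if version <= 1 else f"{base}:v{version}"


print(planner_namespace("conv-42", version=1))  # conv-42:planner
print(planner_namespace("conv-42"))             # conv-42:planner:v2
```

Mapping version 1 to the legacy string means a rollback does not orphan existing threads. Whether old in-flight threads are drained, migrated, or abandoned is still a product decision.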

## Ranking

| Rank | Proposal | Grade | Verdict |
|---|---|---|---|
| 1 | `fable-5-proposal.md` | A | Best concrete topology and migration detail. |
| 2 | `gpt-5.4-proposal.md` | A- | Best architecture principles and staging guidance. |
| 3 | `gpt-5.5-proposal.md` | B+ | Strong policy split, but its stage ordering needs correction. |
| 4 | `deepseek-4-pro-proposal.md` | B+ | Good compact graph, with state-caching risk. |
| 5 | `opus-proposal.md` | B | Useful exhaustive map, but too many graph nodes for a first implementation. |
| 6 | `kimi-2.6-proposal.md` | B- | Mostly sound, but over-specializes bootstrap nodes and adds graph noise. |
| 7 | `glm-5.1-proposal.md` | C+ | Pragmatic low-risk first cut, but leaves too much in the new planner node. |
| 8 | `qwen-3.7-max-proposal.md` | C | Detailed, but contains implementation hazards around routing state and graph wiring. |
| 9 | `gemini-3.1-pro-proposal.md` | C | Clear, but too simplified for the actual current planner. |
| 10 | `mimo-2.5-pro-proposal.md` | C- | Good diagnosis, weak topology, and some semantic mismatches. |
| 11 | `qwen-3.6-plus-proposal.md` | D | Moves mutation into routing functions and misses important current behavior. |

## Proposal-by-proposal evaluation

### 1. `fable-5-proposal.md`: best primary basis

Good sides

- Splits the monolith into durable phases that map well to the current code: tick, prepare, select, decide, and guard.
- Preserves the important current order: iteration and region/currency checks happen before expensive preparation/decomposition.
- Correctly notes that a single plan_node() execution today may perform several LLM calls. Moving decomposition/preparation behind node boundaries therefore means a retried or replayed node repeats less LLM work.
- Adds decision_origin and llm_failed, which are useful because deterministic decisions and LLM decisions do not need the same rewrite pipeline.
- Explicitly calls out checkpoint compatibility and suggests a planner thread namespace bump for in-flight checkpoints from the old graph.
- Keeps action nodes and the outer planner contract unchanged.
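The value of decision_origin and llm_failed shows most clearly in the guard phase. Deterministic decisions can skip the LLM rewrite pipeline, and a structured-output failure is not rewritten into another planner loop. The following is a minimal sketch under assumed flag values (`"deterministic"`, `"llm"`), a simplified `PlannerDecision`, and a hypothetical `rewrite_llm_decision` placeholder. Fable's actual guard may differ.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class PlannerDecision:
    # Simplified stand-in for the real PlannerDecision model.
    action: str
    field: Optional[str] = None


def rewrite_llm_decision(decision: PlannerDecision) -> PlannerDecision:
    # Hypothetical placeholder for the post-LLM rewrites (redirects, caps, calculator).
    if decision.action == "search" and decision.field == "derived_total":
        return replace(decision, action="calculate")
    return decision


def guard_node(state: dict) -> dict:
    # Assumption: a missing decision_origin is treated as "llm" (full rewrite pipeline).
    origin = state.get("decision_origin", "llm")
    if origin == "deterministic" or state.get("llm_failed", False):
        # Deterministic decisions skip the LLM rewrite pipeline, and an LLM failure
        # keeps its fallback decision so it is not rewritten into another loop.
        return {}
    return {"decision": rewrite_llm_decision(state["decision"])}
```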

Bad sides / risks

- It adds two new state fields. That is acceptable, but they must be added to both State and PlannerState with defaults and reset at loop entry.
- Its select node still contains blocked-field decomposition, so not every LLM call is isolated. This is probably acceptable for a first pass, but it should be acknowledged.
- The migration says to delete plan_node() after rewiring. Because tests import it directly today, it would be safer to keep a temporary compatibility wrapper, or to rename it only after those tests move.
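A temporary wrapper lets existing tests keep calling plan_node() while the graph is rewired. The sketch below assumes each phase is a function that takes state and returns a partial update. The phase names mirror Fable's tick/prepare/select/decide/guard, but their bodies, the `PIPELINE` list, and the skip-decide rule are hypothetical placeholders.

```python
from typing import Callable

Phase = Callable[[dict], dict]


# Placeholder phases; real bodies would come from the extracted helpers.
def tick(state: dict) -> dict:
    return {"iterations": state.get("iterations", 0) + 1}


def prepare(state: dict) -> dict:
    return {"prepared": True}


def select(state: dict) -> dict:
    return {}  # would return a deterministic decision when one applies


def decide(state: dict) -> dict:
    return {"decision": "llm-decision", "decision_origin": "llm"}


def guard(state: dict) -> dict:
    return {}


PIPELINE: list[tuple[str, Phase]] = [
    ("tick", tick),
    ("prepare", prepare),
    ("select", select),
    ("decide", decide),
    ("guard", guard),
]


def plan_node(state: dict) -> dict:
    """Deprecated shim: runs the new phases in one call for legacy tests."""
    working = dict(state)
    update: dict = {}
    for name, phase in PIPELINE:
        if name == "decide" and "decision" in update:
            continue  # select already produced a deterministic decision
        delta = phase(working)
        working.update(delta)  # later phases see earlier updates
        update.update(delta)
    return update
```

The shim runs every phase inside one call, so it gives up the checkpoint-boundary benefit. Its only job is to keep direct tests working until they move to the phase functions.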

### 2. `gpt-5.4-proposal.md`: best principle set

Good sides

- Has the cleanest rule: deterministic orchestration belongs in graph phases, business logic stays in helpers, and LLM planning stays small.
- Correctly warns against mutating state in conditional edge functions.
- Correctly warns that only one node should increment iterations. This matters because attempts use the iteration number and run_planner_step() filters turn searches by the pre-invocation iteration floor.
- Keeps ask_user as the single interrupting node and leaves the outer conversation graph unchanged.
- Recommends a staged migration from tick through decision_policy, which is operationally safer than one large graph rewrite.
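The turn-scoped filter shows why a single increment matters. Assume, hypothetically, that each recorded search carries the iteration in which it ran, and that run_planner_step() reads the iteration count before invoking the planner. The searches from this turn are then the ones with a higher iteration number. The record shape and the `turn_searches` name are illustrative.

```python
def turn_searches(searches: list[dict], iteration_floor: int) -> list[dict]:
    # Keep only searches recorded after the pre-invocation iteration count.
    return [s for s in searches if s["iteration"] > iteration_floor]


searches = [
    {"query": "market size", "iteration": 3},  # previous turn
    {"query": "competitors", "iteration": 4},  # this turn
    {"query": "pricing", "iteration": 5},      # this turn
]
floor = 3  # iteration count read before the planner was invoked
print([s["query"] for s in turn_searches(searches, floor)])  # ['competitors', 'pricing']
```

Suppose a second node also incremented the counter. Attempt numbers derived from iterations would then skip values, and any iteration cap would trip after half as many decision cycles. Keeping the increment in one tick node keeps both consistent.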

Bad sides / risks

- Less precise than Fable about how downstream phases should distinguish deterministic decisions from LLM-failure decisions.
- Says planner thread namespacing should stay unchanged. That is fine for fresh development checkpoints, but unsafe for real in-flight checkpoints whose pending node names reference the old graph.
- Recommends renaming to llm_plan_node(). That is the right end state, but the rename should wait until the direct tests are moved.

### 3. `gpt-5.5-proposal.md`: strong policy split, flawed ordering

Good sides

- Separates normalization, retry policy, and calculator routing in a way that matches the actual hidden policy blocks in plan_node().
- Correctly keeps all planner internals inside the planner subgraph and out of conversation/graph.py.
- Correctly recommends minimal state additions and warns against storing prepared recipes unless their serialization safety and benefit are proven.
- Lists concrete tests to move from plan_node() to the policy nodes.

Bad sides / risks

- Its diagram runs prepare_schema before the mandatory region/currency gates. That changes the current ordering and can waste or misdirect decomposition/search context before location and currency are confirmed.
- Splitting normalize_decision, retry_gate, and maybe_calculate into separate nodes is probably more graph detail than the first implementation needs. A single decision_policy/guard node can hold those deterministic rewrites initially.
- The require_region and require_currency nodes should create normal PlannerDecision(action="ask_user") decisions and route to the existing ask_user node, rather than creating a separate interrupt surface.
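The ordering correction and the single interrupt surface can be expressed together. The gate runs before schema preparation, and when a mandatory value is missing it emits an ordinary ask_user decision instead of interrupting itself. This sketch collapses the proposal's require_region and require_currency into one hypothetical `require_location_node`. The state keys (`region`, `currency`), the simplified `PlannerDecision`, and the node names are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PlannerDecision:
    # Simplified stand-in for the real PlannerDecision model.
    action: str
    field: Optional[str] = None


def require_location_node(state: dict) -> dict:
    # Runs before prepare_schema so decomposition never uses unconfirmed values.
    for key in ("region", "currency"):
        if not state.get(key):
            return {"decision": PlannerDecision(action="ask_user", field=key)}
    return {"decision": None}  # clear any stale decision from the previous cycle


def route_after_gate(state: dict) -> str:
    decision = state.get("decision")
    if decision is not None and decision.action == "ask_user":
        return "ask_user"  # the existing, single interrupting node
    return "prepare_schema"
```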

### 4. `deepseek-4-pro-proposal.md`: good compact graph

Good sides

- The guard -> prepare -> plan -> adjust shape is easy to understand and likely sufficient to expose the main hidden lifecycle.
- It routes all deterministic and LLM decisions through an adjust node, avoiding duplicated calculator-adjustment logic.
- It explicitly avoids turning every predicate into a node.
- Its migration path starts with extraction before graph rewiring, which is the right sequence.

Bad sides / risks

- It proposes a recipes field in state. That is risky because recipes are derived from static_field_acquisition and dynamic_decompositions; serializing them adds synchronization and compatibility burden for little benefit.
- The proposed plan -> plan loop for newly generated decompositions may cause extra LLM planning calls. A deterministic normalizer that rewrites the decision to the next component task is safer.
- It underplays the checkpoint compatibility risk from node-name/topology changes.
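The deterministic alternative to a plan -> plan loop can be small. Once a decomposition exists, pick the next unfilled component and emit the acquisition decision directly. The following assumptions are for illustration only: a decomposition is an ordered list of component field names, filled values live in a `fields` dict, and the action names `acquire` and `calculate` are placeholders.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PlannerDecision:
    # Simplified stand-in for the real PlannerDecision model.
    action: str
    field: Optional[str] = None


def next_component_decision(target: str, components: list[str], fields: dict) -> PlannerDecision:
    # Acquire the first missing component without another LLM planning call.
    for component in components:
        if fields.get(component) is None:
            return PlannerDecision(action="acquire", field=component)
    # All components present: compute the target deterministically.
    return PlannerDecision(action="calculate", field=target)


fields = {"units_sold": 1200, "unit_price": None}
print(next_component_decision("revenue", ["units_sold", "unit_price"], fields))
# PlannerDecision(action='acquire', field='unit_price')
```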

### 5. `opus-proposal.md`: valuable map, too granular as a target

Good sides

- Provides the most exhaustive mapping from current line-level concerns to new gate/corrector responsibilities.
- Correctly introduces llm_failed to preserve the current safeguard that prevents a structured-output failure from being rewritten into another planner loop.
- Has a useful node-write table; this is the right discipline for checkpointed graph state.
- Recommends a three-step migration: helpers first, gates second, correctors third.
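A node-write table is also cheap to check mechanically. The sketch below assumes a hand-maintained mapping from node name to the state keys that node may write. The node names, the key names, and `SINGLE_WRITER_KEYS` are hypothetical. It checks that iterations has exactly one writer, and it rejects updates that touch undeclared keys.

```python
NODE_WRITES: dict[str, set[str]] = {
    # Hypothetical ownership table: node name -> state keys it may write.
    "tick": {"iterations", "decision_origin", "llm_failed"},  # resets flags at loop entry
    "prepare": {"dynamic_decompositions"},
    "select": {"decision", "decision_origin"},
    "decide": {"decision", "decision_origin", "llm_failed"},
    "guard": {"decision", "status"},
}

SINGLE_WRITER_KEYS = {"iterations"}


def writers(key: str) -> list[str]:
    # All nodes that declare the key, sorted for stable comparisons.
    return sorted(node for node, keys in NODE_WRITES.items() if key in keys)


def check_update(node: str, update: dict) -> None:
    # Fail fast if a node writes a key it does not own.
    extra = set(update) - NODE_WRITES[node]
    if extra:
        raise ValueError(f"{node} wrote undeclared keys: {sorted(extra)}")


for key in SINGLE_WRITER_KEYS:
    assert len(writers(key)) == 1, (key, writers(key))

check_update("tick", {"iterations": 4})
```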

Bad sides / risks

- Too many graph nodes for the first implementation. Splitting every corrector into its own c_* node would add many Postgres checkpoint writes and make the graph noisy.
- Some proposal links point at a different local path, so it should not be applied mechanically.
- The same clarity can be achieved initially with one decision_policy node and pure helper functions inside it.
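One decision_policy node with pure helpers could work like this. Each rule is a plain function from a decision (plus read-only state) to a decision, and the node applies the rules in a fixed order and writes the result once. The rule names follow rewrites this document mentions (derived-field redirect, web-preferred redirect, retry cap). Their bodies, the fallback actions, and the state keys they read are placeholders. A calculator-adjustment rule would slot into the same list.

```python
from dataclasses import dataclass, replace
from typing import Callable, Optional


@dataclass(frozen=True)
class PlannerDecision:
    # Simplified stand-in for the real PlannerDecision model.
    action: str
    field: Optional[str] = None


Rule = Callable[[PlannerDecision, dict], PlannerDecision]


def redirect_derived_field(d: PlannerDecision, state: dict) -> PlannerDecision:
    # Placeholder rule: derived fields are calculated rather than searched.
    if d.action == "search" and d.field in state.get("derived_fields", set()):
        return replace(d, action="calculate")
    return d


def redirect_web_preferred(d: PlannerDecision, state: dict) -> PlannerDecision:
    # Placeholder rule: web-preferred fields are searched before asking the user.
    if d.action == "ask_user" and d.field in state.get("web_preferred", set()):
        return replace(d, action="search")
    return d


def cap_retries(d: PlannerDecision, state: dict) -> PlannerDecision:
    # Placeholder rule: stop searching once the attempt budget is spent.
    attempts = state.get("attempts", {}).get(d.field, 0)
    if d.action == "search" and attempts >= state.get("max_attempts", 3):
        return replace(d, action="ask_user")
    return d


POLICY: list[Rule] = [redirect_derived_field, redirect_web_preferred, cap_retries]


def decision_policy_node(state: dict) -> dict:
    decision = state["decision"]
    for rule in POLICY:  # order matters: later rules see earlier rewrites
        decision = rule(decision, state)
    return {"decision": decision}
```

Because each rule is pure, it can be unit-tested without building the graph, and the node still produces one checkpointed write.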

### 6. `kimi-2.6-proposal.md`: mostly sound but noisy

Good sides

- Good decomposition table and a reasonable shape: tick, prepare, calc_gate, acquire, decide, enforce_policy.
- Keeps post-decision safety rules together in enforce_policy, which is a good first-step compromise.
- Suggests a temporary wrapper so plan_node() can delegate to the new pipeline during migration.

Bad sides / risks

- Dedicated ask_user_region and ask_user_currency nodes duplicate the existing ask_user interrupt node and make the resume surface less uniform.
- Some diagram details are under-specified, such as the calc_adjust path.
- It is more node-heavy than necessary before the team has shown that the smaller phase split is insufficient.

### 7. `glm-5.1-proposal.md`: practical but incomplete

Good sides

- Low risk: starts by extracting a pre_check node and leaves most of the remaining behavior intact.
- Correctly recognizes that post-LLM rewriting can stay together at first instead of immediately becoming several graph nodes.
- Calls out a helper to standardize output assembly, which would remove many repeated `out` construction branches.
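The output-assembly helper could be a single function that every branch returns through, so the update keys stay uniform. The `make_output` name and its keys are hypothetical. The decision_origin and llm_failed keys are borrowed from the Fable proposal, and real branches may need more.

```python
from typing import Any, Optional


def make_output(
    decision: Any,
    *,
    origin: str,
    llm_failed: bool = False,
    status: Optional[str] = None,
    **extra: Any,
) -> dict:
    # One place to build the partial state update that every branch returns.
    out: dict = {"decision": decision, "decision_origin": origin, "llm_failed": llm_failed}
    if status is not None:
        out["status"] = status
    out.update(extra)  # branch-specific keys, kept explicit at the call site
    return out
```

Keyword-only arguments make each branch state its origin explicitly, which keeps a forgotten flag from silently inheriting a stale value.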

Bad sides / risks

- Leaves a large acquire_or_plan node containing acquisition, auto-finish, LLM planning, decomposition, retry caps, and policy rewrites. That solves only part of the monolith problem.
- Uses a _pre_check_route state field where a PlannerDecision plus a simple router, or a small typed phase flag, would be cleaner.
- Does not fully address the llm_failed flag that is currently local to plan_node().
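If a routing hint is needed at all, a small typed phase field is more deliberate than an ad hoc _pre_check_route key, provided it is declared in the state schema with a default. The field name, phase values, and node names below are illustrative only.

```python
from typing import Literal, TypedDict, get_args

Phase = Literal["pre_check", "acquire", "plan", "done"]


class PlannerStateSketch(TypedDict, total=False):
    # Declared in the schema (and mirrored in State) instead of an ad hoc "_route".
    phase: Phase
    iterations: int


DEFAULTS: PlannerStateSketch = {"phase": "pre_check", "iterations": 0}


def route_by_phase(state: PlannerStateSketch) -> str:
    # Pure router: a missing phase falls back to the schema default.
    phase = state.get("phase", DEFAULTS["phase"])
    if phase not in get_args(Phase):
        raise ValueError(f"unknown phase: {phase!r}")
    return "finish" if phase == "done" else phase
```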

### 8. `qwen-3.7-max-proposal.md`: detailed but hazardous

Good sides

- Gives concrete pseudo-code for many nodes, which is useful for implementation thinking.
- Separates completion, post-processing, and cap enforcement clearly.
- Identifies the need for an LLM-failure flag.

Bad sides / risks

- Uses _route and _llm_failed as transient state and says they should be excluded from checkpointing. In this project graph state is checkpoint-owned, so transient routing fields would need explicit schema/reducer discipline, not just comments.
- The graph sketch routes observe_user to check_region_currency, bypassing the iteration tick after generic user answers. That risks breaking the one-iteration-per-decision-cycle semantics.
- The sketch routes from a route_decision source without registering it as a normal node in the builder shown.
- It is too implementation-heavy while still containing wiring errors, so it should be mined for node snippets only.
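A table-driven test can catch wiring errors like the observe_user bypass. The sketch assumes the edges out of action nodes are kept in a plain mapping. The node names are hypothetical and loosely follow the proposals. The check lists every action node that re-enters the planner without passing through the iteration tick.

```python
EDGES: dict[str, str] = {
    # Hypothetical planner edges: action node -> node it returns to.
    "search": "tick",
    "calculate": "tick",
    "observe_user": "check_region_currency",  # the bypass described above
}

ACTION_NODES = {"search", "calculate", "observe_user"}


def loop_edges_bypassing_tick(edges: dict[str, str]) -> list[str]:
    # Action nodes whose return edge skips the iteration tick.
    return sorted(node for node in ACTION_NODES if edges.get(node) != "tick")


print(loop_edges_bypassing_tick(EDGES))  # ['observe_user']
```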

### 9. `gemini-3.1-pro-proposal.md`: clear but too simplified

Good sides

- Easy to understand: prepare_state, evaluate_rules, llm_plan.
- Correctly isolates the LLM call and makes deterministic rules testable.
- Good high-level diagnosis of the hidden control flow.

Bad sides / risks

- Does not cover the current post-LLM policy pipeline: derived-field redirects, web-preferred redirects, retry caps, and calculator adjustment.
- The proposed route_after_llm, which routes directly on the raw LLM decision, is too weak for the current planner.
- It understates the complexity of the current plan_node() and would likely leave behavior gaps around calculator-backed component acquisition.

### 10. `mimo-2.5-pro-proposal.md`: weak topology despite good diagnosis

Good sides

- Correctly diagnoses plan_node() as a guard/router/LLM/policy monolith.
- Suggests extraction without changing topology first.
- Keeps the target graph small.

Bad sides / risks

- The acquire routing is semantically off: deterministic acquisition tasks should often bypass the LLM, not route into a generic decide node that also owns LLM calls and all post-processing.
- A no_task -> finish edge is not safe without the current auto-finish and missing-field checks.
- Leaves too much in decide, so the final design is still a smaller monolith.
- The sketch references a plan_router without making its role concrete.

### 11. `qwen-3.6-plus-proposal.md`: avoid as an implementation basis

Good sides

- Correctly identifies the monolithic plan_node() issue.
- Correctly wants the graph diagram to become more truthful.

Bad sides / risks

- Proposes moving redirects and cap enforcement into route_after_plan(). That is a LangGraph anti-pattern here because those rules mutate decision, status, and sometimes decomposition state; state mutation belongs in nodes, not conditional edge functions.
- Splits region/currency into dedicated observe nodes even though observe_user_node() already handles them via its target field. This duplicates resume logic and widens the interrupt surface.
- Leaves search/observe/calculation loops returning to plan in several places, so loop-entry guards and iteration ticking are not applied consistently.
- Omits enough acquisition and calculator nuance that it is unlikely to preserve behavior.
