GPT-5.5 ranking analysis of planner graph refactor proposals

Scope and evaluation lens

This analysis ranks the proposals in docs/planner-graph-ref/proposals against the current planner implementation in src/venturescope/planner/agent.py and the current graph reference in docs/planner-graph-ref/current-graph.md.

The current pain point is real: plan_node() is the hidden policy engine of the planner. The public graph has a small plan -> action shape, while the actual implementation mixes iteration ticking, mandatory region/currency questions, decomposition, schema composition, calculator gates, deterministic acquisition, LLM planning, retry caps, and decision rewrites.

I used these criteria for the ranking:

  1. Behavior preservation. Keep the current outer contract, checkpoint-owned planner state, {conversation_id}:planner namespace strategy, one ask_user interrupt surface, and existing schema/calculator/search semantics.
  2. LangGraph fit. Nodes should own state mutations; conditional edge functions should stay side-effect free and route only from state.
  3. Right-sized graph. Expose meaningful durable phases without turning every if branch into a Postgres checkpoint write.
  4. State-surface discipline. Avoid storing complex derived objects such as FieldAcquisition recipes unless necessary. If adding flags, mirror them in PlannerState with safe defaults.
  5. Migration risk. Existing tests import plan_node() directly and run_planner_step() relies on iteration counts for turn-scoped search reporting, so staged extraction matters.

LangGraph-specific constraints that affect the ranking:

  • Treat node boundaries as replay/checkpoint boundaries. Do not rely on local in-node progress surviving an interrupt, resume, or process restart.
  • Keep graph state explicit, typed, and serializer-friendly. Route hints such as _route are not truly transient unless the graph state contract handles them deliberately.
  • Changing node names/topology can affect in-flight checkpoints because pending graph tasks refer to node names and checkpoint namespaces. A namespace bump is a product decision, not a code-style detail.
  • Use subgraph boundaries for ownership, not for hiding side effects. The planner should remain the owner of planner attempts, interrupts, and acquisition policy; the outer conversation graph should stay a bridge.
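
The node/edge split described above can be sketched in plain Python. This is a minimal illustration, not the planner's real code: SketchState, its fields, and the default cap of 10 are assumptions made for the example. Only the names tick, finish, and bootstrap_gate come from the target shape discussed later in this analysis.

```python
from typing import Literal, TypedDict


class SketchState(TypedDict, total=False):
    # Hypothetical subset of planner state, for illustration only.
    iterations: int
    max_iterations: int


def tick(state: SketchState) -> dict:
    """Node: owns the state mutation and returns a partial update."""
    return {"iterations": state.get("iterations", 0) + 1}


def route_after_tick(state: SketchState) -> Literal["finish", "bootstrap_gate"]:
    """Conditional edge: reads state and returns a label, mutating nothing."""
    if state.get("iterations", 0) > state.get("max_iterations", 10):
        return "finish"
    return "bootstrap_gate"


state: SketchState = {"iterations": 0, "max_iterations": 2}
state.update(tick(state))        # the node's update is merged into state
snapshot = dict(state)
label = route_after_tick(state)  # routing leaves state untouched
```

In a LangGraph builder, a function shaped like route_after_tick is what would be passed to StateGraph.add_conditional_edges, while tick would be registered as a node. Keeping the router pure means a replay from the checkpoint after tick produces the same route.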

Ranking

| Rank | Proposal | Grade | Verdict |
|------|----------|-------|---------|
| 1 | fable-5-proposal.md | A | Best concrete topology and migration detail. |
| 2 | gpt-5.4-proposal.md | A- | Best architecture principles and staging guidance. |
| 3 | gpt-5.5-proposal.md | B+ | Strong policy split, but its stage ordering needs correction. |
| 4 | deepseek-4-pro-proposal.md | B+ | Good compact graph, with state-caching risk. |
| 5 | opus-proposal.md | B | Useful exhaustive map, but too many graph nodes for first implementation. |
| 6 | kimi-2.6-proposal.md | B- | Mostly sound, but over-specializes bootstrap nodes and has graph noise. |
| 7 | glm-5.1-proposal.md | C+ | Pragmatic low-risk first cut, but leaves too much in the new planner node. |
| 8 | qwen-3.7-max-proposal.md | C | Detailed, but contains implementation hazards around routing state and graph wiring. |
| 9 | gemini-3.1-pro-proposal.md | C | Clear but too simplified for the actual current planner. |
| 10 | mimo-2.5-pro-proposal.md | C- | Good diagnosis, weak topology and some semantic mismatches. |
| 11 | qwen-3.6-plus-proposal.md | D | Moves mutation into routing functions and misses important current behavior. |

Proposal-by-proposal evaluation

1. fable-5-proposal.md — best primary basis

Good sides

  • Splits the monolith into durable phases that map well to the current code: tick, prepare, select, decide, and guard.
  • Preserves the important current order: iteration and region/currency checks happen before expensive preparation/decomposition.
  • Correctly notices that a single current plan_node() execution may perform several LLM calls, so moving decomposition/preparation behind node boundaries improves checkpoint replay behavior.
  • Adds decision_origin and llm_failed, which are useful because deterministic decisions and LLM decisions do not need the same rewrite pipeline.
  • Explicitly calls out checkpoint compatibility and suggests a planner thread namespace bump for in-flight old graph states.
  • Keeps action nodes and outer planner contract unchanged.

Bad sides / risks

  • It adds two new state fields. That is acceptable, but they must be added to both State and PlannerState with defaults and reset at the loop entry.
  • Its select node still contains blocked-field decomposition, so not every LLM call is isolated. This is probably fine for the first pass, but should be acknowledged.
  • The migration says to delete plan_node() after rewiring. Because tests import it directly today, keeping a temporary compatibility wrapper or renaming only after tests move would be safer.
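
One way to keep direct imports of plan_node() working during the migration is a thin wrapper that delegates to the extracted phases. The sketch below assumes dict-shaped state and uses stand-in bodies for select, decide, and decision_policy. The pending_field key and the decision dict layout are hypothetical, and the wrapper assumes the previous decision is cleared at loop entry.

```python
from typing import Any, Dict

State = Dict[str, Any]


def select(state: State) -> State:
    """Stand-in: emit a deterministic decision when one is available."""
    if state.get("pending_field"):  # hypothetical key
        return {
            "decision": {"action": "search", "field": state["pending_field"]},
            "decision_origin": "deterministic",
        }
    return {}


def decide(state: State) -> State:
    """Stand-in for the LLM-only planning call."""
    return {"decision": {"action": "finish"}, "decision_origin": "llm"}


def decision_policy(state: State) -> State:
    """Stand-in for the deterministic post-decision rewrites (no-op here)."""
    return {}


def plan_node(state: State) -> State:
    """Temporary wrapper: same entry point, delegating to extracted phases."""
    merged = {**state, "decision": None}  # assume a stale decision is cleared
    updates: State = {}
    for phase in (select, decide, decision_policy):
        # Skip the LLM phase when select already produced a decision.
        if phase is decide and merged.get("decision") is not None:
            continue
        delta = phase(merged)
        merged.update(delta)
        updates.update(delta)
    return updates
```

With a wrapper like this, existing tests can keep calling plan_node() while new tests target the phase functions directly. The wrapper can be deleted once the old tests have moved.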

2. gpt-5.4-proposal.md — best principle set

Good sides

  • Has the cleanest rule: deterministic orchestration belongs in graph phases; business logic stays in helpers; LLM planning stays small.
  • Correctly warns not to mutate state in conditional edge functions.
  • Correctly warns that only one node should increment iterations; this matters because attempts use the iteration number and run_planner_step() filters turn searches by the pre-invocation iteration floor.
  • Keeps ask_user as the single interrupting node and keeps the outer conversation graph unchanged.
  • Recommends a staged migration from tick through decision_policy, which is operationally safer than one large graph rewrite.

Bad sides / risks

  • Less precise than Fable about how deterministic decisions and LLM-failure decisions should be distinguished downstream.
  • Says planner thread namespacing should stay unchanged. That is fine for fresh development checkpoints, but unsafe for real in-flight checkpoints whose pending node names reference the old graph.
  • Recommends renaming to llm_plan_node(). That is correct eventually, but should be delayed until direct tests are moved.
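
The single-increment rule is easier to see with a toy version of the turn-scoped filter. This sketch assumes attempts are dicts carrying the iteration at which they were recorded, and that a turn's searches are those strictly above the pre-invocation iteration value. Both the record shape and the strict comparison are assumptions, not the behavior of run_planner_step() as implemented.

```python
from typing import Any, Dict, List

Attempt = Dict[str, Any]


def turn_searches(attempts: List[Attempt], iteration_floor: int) -> List[Attempt]:
    """Keep only search attempts recorded after the pre-invocation iteration."""
    return [
        a for a in attempts
        if a["action"] == "search" and a["iteration"] > iteration_floor
    ]


attempts: List[Attempt] = [
    {"iteration": 1, "action": "search", "field": "region"},       # earlier turn
    {"iteration": 2, "action": "search", "field": "market_size"},  # this turn
    {"iteration": 3, "action": "calculate", "field": "tam"},       # not a search
]

iteration_floor = 1  # iteration count captured before invoking the planner
current = turn_searches(attempts, iteration_floor)
```

If a second node also incremented the counter, attempts from one decision cycle would be stamped with different iteration numbers, and a filter like this one would no longer line up with decision cycles.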

3. gpt-5.5-proposal.md — strong policy split, flawed ordering

Good sides

  • Separates normalization, retry policy, and calculator routing in a way that matches the actual hidden policy blocks in plan_node().
  • Correctly keeps all planner internals inside the planner subgraph and out of conversation/graph.py.
  • Correctly recommends minimal state additions and warns against storing prepared recipes unless serialization and benefit are proven.
  • Lists concrete tests to move from plan_node() to policy nodes.

Bad sides / risks

  • Its diagram runs prepare_schema before mandatory region/currency gates. That changes current ordering and can waste or misdirect decomposition/search context before location and currency are confirmed.
  • Splitting normalize_decision, retry_gate, and maybe_calculate into separate nodes is probably more graph detail than the first implementation needs. A single decision_policy/guard node can contain those deterministic rewrites initially.
  • The require_region and require_currency nodes should create normal PlannerDecision(action="ask_user") decisions and route to the existing ask_user node, not create a separate interrupt surface.
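
A gate that reuses the existing interrupt surface could look roughly like the following. The PlannerDecision class here is a stand-in: only action="ask_user" is taken from the text above, while target_field, question, and the state keys region and currency are hypothetical names chosen for the example.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass
class PlannerDecision:
    # Stand-in for the real class; fields other than `action` are assumed.
    action: str
    target_field: Optional[str] = None
    question: Optional[str] = None


def bootstrap_gate(state: Dict[str, Any]) -> Dict[str, Any]:
    """Node: emit a normal ask_user decision for the first missing field."""
    for field_name in ("region", "currency"):
        if not state.get(field_name):
            return {
                "decision": PlannerDecision(
                    action="ask_user",
                    target_field=field_name,
                    question=f"Which {field_name} should the plan use?",
                )
            }
    return {"decision": None}


def route_after_bootstrap(state: Dict[str, Any]) -> str:
    """Edge: route to the existing ask_user node or on to prepare."""
    decision = state.get("decision")
    if decision is not None and decision.action == "ask_user":
        return "ask_user"
    return "prepare"


missing_currency = bootstrap_gate({"region": "EU"})
ready = bootstrap_gate({"region": "EU", "currency": "EUR"})
```

Because the gate only produces a decision, the single ask_user node remains the one place that interrupts, and resume handling does not need a second code path.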

4. deepseek-4-pro-proposal.md — good compact graph

Good sides

  • The guard -> prepare -> plan -> adjust shape is easy to understand and likely enough to expose the main hidden lifecycle.
  • It routes all deterministic and LLM decisions through an adjust node, avoiding duplicated calculator adjustment logic.
  • It explicitly avoids exploding every predicate into a node.
  • The migration path starts with extraction before graph rewiring, which is the right sequence.

Bad sides / risks

  • It proposes a recipes field in state. That is risky because recipes are derived from static_field_acquisition and dynamic_decompositions; serializing them adds sync and compatibility burden for little benefit.
  • The proposed plan -> plan loop for newly generated decomposition may cause extra LLM planning calls. A deterministic normalizer that rewrites to the next component task is safer.
  • It underplays checkpoint compatibility risk from node-name/topology changes.

5. opus-proposal.md — valuable map, too granular as target

Good sides

  • Provides the most exhaustive mapping from current line-level concerns to new gate/corrector responsibilities.
  • Correctly introduces llm_failed to preserve the current safeguard that prevents a structured-output failure from being rewritten into another planner loop.
  • Has a useful node-write table; this is the right discipline for checkpointed graph state.
  • Recommends a three-step migration: helpers first, gates second, correctors third.

Bad sides / risks

  • Too many graph nodes for the first implementation. Splitting every corrector into a c_* node would add many Postgres checkpoint writes and make the graph noisy.
  • Some links in the proposal point at a different local path, so its references should be checked against this repository rather than followed mechanically.
  • The same clarity can be achieved initially with one decision_policy node and pure helper functions inside it.
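
A single policy node built from pure helpers might be shaped like this. The two rewrites, the state keys (attempts, max_attempts, calculable), and the dict-shaped decision are illustrative assumptions. The real rewrite set (derived-field redirects, web-preferred redirects, retry caps, calculator adjustment) would come from the existing plan_node() code.

```python
from typing import Any, Callable, Dict, List

Decision = Dict[str, Any]
State = Dict[str, Any]
Rewrite = Callable[[Decision, State], Decision]


def cap_retries(decision: Decision, state: State) -> Decision:
    """Hypothetical retry cap: stop searching a field after too many attempts."""
    attempts = state.get("attempts", {}).get(decision.get("field"), 0)
    if decision["action"] == "search" and attempts >= state.get("max_attempts", 3):
        return {**decision, "action": "ask_user"}
    return decision


def prefer_calculator(decision: Decision, state: State) -> Decision:
    """Hypothetical calculator adjustment for fields that can be computed."""
    if decision["action"] == "search" and decision.get("field") in state.get("calculable", ()):
        return {**decision, "action": "calculate"}
    return decision


REWRITES: List[Rewrite] = [cap_retries, prefer_calculator]


def decision_policy(state: State) -> State:
    """One graph node, one checkpoint write, several pure rewrites."""
    decision = state["decision"]
    # An LLM failure must not be rewritten into another planner loop.
    if state.get("decision_origin") == "llm_error":
        return {"decision": decision}
    for rewrite in REWRITES:
        decision = rewrite(decision, state)
    return {"decision": decision}
```

Each helper stays unit-testable on its own, and the node can be split later if the rewrite list grows beyond what one function comfortably holds.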

6. kimi-2.6-proposal.md — mostly sound but noisy

Good sides

  • Good decomposition table and a reasonable tick, prepare, calc_gate, acquire, decide, enforce_policy shape.
  • Keeps post-decision safety rules together in enforce_policy, which is a good first-step compromise.
  • Suggests a temporary wrapper so plan_node() can delegate to the new pipeline during migration.

Bad sides / risks

  • Dedicated ask_user_region and ask_user_currency nodes duplicate the existing ask_user interrupt node and make the resume surface less uniform.
  • Some diagram details are under-specified, such as the calc_adjust path.
  • It is more node-heavy than necessary before the team proves the smaller phase split is insufficient.

7. glm-5.1-proposal.md — practical but incomplete

Good sides

  • Low-risk: starts by extracting a pre_check node and leaving most of the remaining behavior intact.
  • Correctly recognizes that post-LLM rewriting can stay together at first instead of becoming several graph nodes immediately.
  • Calls out a helper to standardize output assembly, which would reduce many repeated out construction branches.

Bad sides / risks

  • Leaves a large acquire_or_plan node containing acquisition, auto-finish, LLM planning, decomposition, retry caps, and policy rewrites. That solves only part of the monolith problem.
  • Uses a _pre_check_route state field where a PlannerDecision plus simple router, or a small typed phase flag, would be cleaner.
  • Does not fully address the current llm_failed behavior, which today is local to plan_node() and would need an explicit home once the function is split.

8. qwen-3.7-max-proposal.md — detailed but hazardous

Good sides

  • Gives concrete pseudo-code for many nodes, which is useful for implementation thinking.
  • Separates completion, post-processing, and cap enforcement clearly.
  • Identifies the need for an LLM-failure flag.

Bad sides / risks

  • Uses _route and _llm_failed as transient state and says they should be excluded from checkpointing. In this project, graph state is checkpoint-owned; transient routing fields would need explicit schema/reducer discipline, not just comments.
  • The graph sketch routes observe_user to check_region_currency, bypassing the iteration tick after generic user answers. That risks breaking the one-iteration-per-decision-cycle semantics.
  • The sketch routes through a route_decision source without registering it as a normal node in the shown builder.
  • It is too implementation-heavy while still containing wiring errors, so it should be mined for node snippets only.

9. gemini-3.1-pro-proposal.md — clear but too simplified

Good sides

  • Easy to understand: prepare_state, evaluate_rules, llm_plan.
  • Correctly isolates the LLM call and makes deterministic rules testable.
  • Good high-level diagnosis of hidden control flow.

Bad sides / risks

  • Does not cover the current post-LLM policy pipeline: derived-field redirects, web-preferred redirects, retry caps, and calculator adjustment.
  • The proposed route_after_llm routes on the raw LLM decision, which is too weak for the current planner because it skips the deterministic rewrites listed above.
  • It understates the complexity of current plan_node() and would likely leave behavior gaps around calculator-backed component acquisition.

10. mimo-2.5-pro-proposal.md — weak topology despite good diagnosis

Good sides

  • Correctly diagnoses plan_node() as a guard/router/LLM/policy monolith.
  • Suggests extraction without changing topology first.
  • Keeps the target graph small.

Bad sides / risks

  • The acquire routing is semantically off: deterministic acquisition tasks should often bypass the LLM, not route into a generic decide node that also owns LLM calls and all post-processing.
  • no_task -> finish is not safe without the current auto-finish and missing-field checks.
  • Leaves too much in decide, so the final design is still a smaller monolith.
  • The sketch references a plan_router without making its role concrete.

11. qwen-3.6-plus-proposal.md — avoid as an implementation basis

Good sides

  • Correctly identifies the monolithic plan_node() issue.
  • Correctly wants the graph diagram to become more truthful.

Bad sides / risks

  • Proposes moving redirects and cap enforcement into route_after_plan(). That is a LangGraph anti-pattern here because those rules mutate decision, status, and sometimes decomposition state; state mutation belongs in nodes, not conditional edge functions.
  • Splits region/currency into dedicated observe nodes even though observe_user_node() already handles them via target field. This duplicates resume logic and widens the interrupt surface.
  • Leaves search/observe/calculation loops returning to plan in several places, so loop-entry guards and iteration ticking are not consistently applied.
  • Omits enough acquisition and calculator nuance that it is unlikely to preserve behavior.

Recommendation

I would not adopt any proposal verbatim. The best solution is a hybrid:

  1. Use Fable's topology as the base: tick -> bootstrap_gate -> prepare -> select -> decide -> decision_policy -> action.
  2. Use GPT-5.4's rules: keep edge functions side-effect free, increment iterations only in tick, keep ask_user as the only interrupt node, and keep the outer conversation graph unchanged.
  3. Use GPT-5.5's state discipline: add only a small decision_origin field, with decision_origin = "llm_error" taking the place of a separate llm_failed flag, and do not store derived recipes in checkpoint state unless proven necessary.
  4. Use Opus/Kimi only for mapping details: their line-by-line decompositions are useful when extracting helper functions, but their final graphs are too granular for the first implementation.

Concrete target shape

flowchart TD
    START --> tick
    tick -->|terminal| finish
    tick -->|continue| bootstrap_gate

    bootstrap_gate -->|region/currency question| ask_user
    bootstrap_gate -->|ready| prepare

    prepare --> select
    select -->|deterministic decision| decision_policy
    select -->|needs LLM| decide

    decide --> decision_policy
    decision_policy -->|search| search
    decision_policy -->|ask_user| ask_user
    decision_policy -->|reflect| reflect
    decision_policy -->|calculate| calculate
    decision_policy -->|finish| finish

    search -->|observation| observe
    search -->|no observation| tick
    observe --> tick
    calculate --> tick
    ask_user --> observe_user
    observe_user --> tick
    reflect --> tick
    finish --> END
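
The flowchart above can be transcribed into a small edge table and checked mechanically. The sketch below tests one property the design relies on: every loop re-enters through tick, so removing tick should leave an acyclic graph. The has_cycle helper is written for this example and is not part of the planner.

```python
from typing import Dict, Set

# Edge table transcribed from the flowchart above.
EDGES: Dict[str, Set[str]] = {
    "START": {"tick"},
    "tick": {"finish", "bootstrap_gate"},
    "bootstrap_gate": {"ask_user", "prepare"},
    "prepare": {"select"},
    "select": {"decision_policy", "decide"},
    "decide": {"decision_policy"},
    "decision_policy": {"search", "ask_user", "reflect", "calculate", "finish"},
    "search": {"observe", "tick"},
    "observe": {"tick"},
    "calculate": {"tick"},
    "ask_user": {"observe_user"},
    "observe_user": {"tick"},
    "reflect": {"tick"},
    "finish": {"END"},
}


def has_cycle(edges: Dict[str, Set[str]], removed: Set[str]) -> bool:
    """Depth-first cycle check that ignores the nodes in `removed`."""
    visiting: Set[str] = set()
    done: Set[str] = set()

    def visit(node: str) -> bool:
        if node in removed or node in done:
            return False
        if node in visiting:
            return True  # back edge: a loop that avoids `removed`
        visiting.add(node)
        found = any(visit(nxt) for nxt in edges.get(node, ()))
        visiting.discard(node)
        done.add(node)
        return found

    return any(visit(node) for node in edges)


loops_exist = has_cycle(EDGES, removed=set())
loops_bypass_tick = has_cycle(EDGES, removed={"tick"})
```

A check like this could live in a unit test next to the graph builder, so that a later edge that returns to prepare or select directly, skipping tick, is caught before it changes iteration semantics.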

Suggested migration order

  1. Extract pure helpers from plan_node() while keeping the old graph topology. Existing tests should remain mostly unchanged.
  2. Add a small state flag such as decision_origin: Literal["deterministic", "llm", "llm_error"] | None to State and PlannerState; reset it in tick.
  3. Introduce tick and bootstrap_gate first. This preserves current region/currency-before-preparation ordering.
  4. Introduce prepare and select next. Keep recipes derived locally from existing state, not stored as a new checkpoint field.
  5. Reduce plan_node() to LLM-only decide behavior, but keep the exported function name until tests are moved.
  6. Move all deterministic rewrites into one decision_policy node. Split it later only if it remains too large after helper extraction.
  7. Decide explicitly whether to bump planner thread namespace to :planner:v2. If existing in-flight planner checkpoints matter, bump it; if this is a development-only migration, document why it is safe not to.
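
If the namespace bump in step 7 is adopted, it helps to make the version an explicit parameter rather than a string edit scattered across call sites. This is a minimal sketch: planner_thread_id and its version argument are hypothetical names, while the {conversation_id}:planner and :planner:v2 formats come from the text above.

```python
def planner_thread_id(conversation_id: str, version: int = 1) -> str:
    """Build the planner thread namespace; version 1 keeps the current format."""
    base = f"{conversation_id}:planner"
    return base if version == 1 else f"{base}:v{version}"


old_id = planner_thread_id("conv-123")
new_id = planner_thread_id("conv-123", version=2)
```

Old checkpoints stay addressable under the version 1 key, so in-flight conversations on the old topology are not resumed against node names that no longer exist.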

This combination gives the main architectural win without turning the planner into a graph of tiny predicates. It preserves the current runtime surface while making the graph honest about the phases that are currently hidden inside plan_node().