GPT-5.5 ranking analysis of planner graph refactor proposals
Scope and evaluation lens
This analysis ranks the proposals in docs/planner-graph-ref/proposals against the current planner implementation in src/venturescope/planner/agent.py and the current graph reference in docs/planner-graph-ref/current-graph.md.
The current pain point is real: plan_node() is the hidden policy engine of the planner. The public graph has a small plan -> action shape, while the actual implementation mixes iteration ticking, mandatory region/currency questions, decomposition, schema composition, calculator gates, deterministic acquisition, LLM planning, retry caps, and decision rewrites.
I used these criteria for the ranking:
- Behavior preservation. Keep the current outer contract, checkpoint-owned planner state, `{conversation_id}:planner` namespace strategy, one `ask_user` interrupt surface, and existing schema/calculator/search semantics.
- LangGraph fit. Nodes should own state mutations; conditional edge functions should stay side-effect free and route only from state.
- Right-sized graph. Expose meaningful durable phases without turning every `if` branch into a Postgres checkpoint write.
- State-surface discipline. Avoid storing complex derived objects such as `FieldAcquisition` recipes unless necessary. If adding flags, mirror them in `PlannerState` with safe defaults.
- Migration risk. Existing tests import `plan_node()` directly and `run_planner_step()` relies on iteration counts for turn-scoped search reporting, so staged extraction matters.
LangGraph-specific constraints that affect the ranking:
- Treat node boundaries as replay/checkpoint boundaries. Do not rely on local in-node progress surviving an interrupt, resume, or process restart.
- Keep graph state explicit, typed, and serializer-friendly. Route hints such as `_route` are not truly transient unless the graph state contract handles them deliberately.
- Changing node names/topology can affect in-flight checkpoints because pending graph tasks refer to node names and checkpoint namespaces. A namespace bump is a product decision, not a code-style detail.
- Use subgraph boundaries for ownership, not for hiding side effects. The planner should remain the owner of planner attempts, interrupts, and acquisition policy; the outer conversation graph should stay a bridge.
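
The "explicit, typed, and serializer-friendly" constraint can be illustrated with a small sketch. This is plain Python rather than LangGraph code, and the field set is an assumption: only `iterations` and the proposed `decision_origin` come from this analysis, `with_defaults` is a hypothetical helper, and JSON stands in for whatever serializer the checkpointer actually uses. It shows a newly added flag receiving a safe default so that a checkpoint written before the flag existed still loads.

```python
import json
from typing import Literal, Optional, TypedDict

DecisionOrigin = Optional[Literal["deterministic", "llm", "llm_error"]]


class PlannerState(TypedDict, total=False):
    # Illustrative subset only; the real PlannerState has more fields.
    iterations: int
    decision_origin: DecisionOrigin


def with_defaults(raw: dict) -> PlannerState:
    """Hypothetical loader: fill safe defaults for fields an old checkpoint lacks."""
    return PlannerState(
        iterations=raw.get("iterations", 0),
        decision_origin=raw.get("decision_origin"),  # None means "no decision yet"
    )


# A checkpoint payload written before decision_origin was introduced.
old_checkpoint = json.loads('{"iterations": 3}')
state = with_defaults(old_checkpoint)

# Dicts, ints, strings, and None survive a JSON round trip unchanged.
round_tripped = json.loads(json.dumps(state))
```

Keeping the flag a plain string or `None` is what makes the round trip trivial; a stored recipe object would need its own serialization story.
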
Ranking
| Rank | Proposal | Grade | Verdict |
|---|---|---|---|
| 1 | `fable-5-proposal.md` | A | Best concrete topology and migration detail. |
| 2 | `gpt-5.4-proposal.md` | A- | Best architecture principles and staging guidance. |
| 3 | `gpt-5.5-proposal.md` | B+ | Strong policy split, but its stage ordering needs correction. |
| 4 | `deepseek-4-pro-proposal.md` | B+ | Good compact graph, with state-caching risk. |
| 5 | `opus-proposal.md` | B | Useful exhaustive map, but too many graph nodes for first implementation. |
| 6 | `kimi-2.6-proposal.md` | B- | Mostly sound, but over-specializes bootstrap nodes and has graph noise. |
| 7 | `glm-5.1-proposal.md` | C+ | Pragmatic low-risk first cut, but leaves too much in the new planner node. |
| 8 | `qwen-3.7-max-proposal.md` | C | Detailed, but contains implementation hazards around routing state and graph wiring. |
| 9 | `gemini-3.1-pro-proposal.md` | C | Clear but too simplified for the actual current planner. |
| 10 | `mimo-2.5-pro-proposal.md` | C- | Good diagnosis, weak topology and some semantic mismatches. |
| 11 | `qwen-3.6-plus-proposal.md` | D | Moves mutation into routing functions and misses important current behavior. |
Proposal-by-proposal evaluation
1. fable-5-proposal.md: best primary basis
Good sides
- Splits the monolith into durable phases that map well to the current code: `tick`, `prepare`, `select`, `decide`, and `guard`.
- Preserves the important current order: iteration and region/currency checks happen before expensive preparation/decomposition.
- Correctly notices that a single current `plan_node()` execution may perform several LLM calls, so moving decomposition/preparation behind node boundaries improves checkpoint replay behavior.
- Adds `decision_origin` and `llm_failed`, which are useful because deterministic decisions and LLM decisions do not need the same rewrite pipeline.
- Explicitly calls out checkpoint compatibility and suggests a planner thread namespace bump for in-flight old graph states.
- Keeps action nodes and outer planner contract unchanged.
Bad sides / risks
- It adds two new state fields. That is acceptable, but they must be added to both `State` and `PlannerState` with defaults and reset at the loop entry.
- Its `select` node still contains blocked-field decomposition, so not every LLM call is isolated. This is probably fine for the first pass, but should be acknowledged.
- The migration says to delete `plan_node()` after rewiring. Because tests import it directly today, keeping a temporary compatibility wrapper or renaming only after tests move would be safer.
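
The temporary compatibility wrapper mentioned in the last point could look roughly like the sketch below. It is plain Python, not the project's code: the phase bodies are stand-ins, `pending_task` is a hypothetical state key, and the dictionary-shaped decisions are an assumption. The only idea it is meant to show is that `plan_node()` can keep its name and return shape while delegating to the extracted phases in order.

```python
from typing import Any, Dict

State = Dict[str, Any]


def tick(state: State) -> State:
    # Loop entry: bump the counter and reset per-cycle fields.
    return {
        "iterations": state.get("iterations", 0) + 1,
        "decision": None,
        "decision_origin": None,
    }


def prepare(state: State) -> State:
    return {}  # stand-in for schema preparation / decomposition


def select(state: State) -> State:
    # Hypothetical rule: a queued task yields a deterministic decision.
    if state.get("pending_task"):
        return {"decision": {"action": "search"}, "decision_origin": "deterministic"}
    return {}


def decide(state: State) -> State:
    if state.get("decision") is not None:
        return {}  # a deterministic decision already exists; skip the LLM
    # Stand-in for the LLM planning call.
    return {"decision": {"action": "finish"}, "decision_origin": "llm"}


def guard(state: State) -> State:
    return {}  # stand-in for post-decision rewrites


def plan_node(state: State) -> State:
    """Temporary wrapper so existing imports of plan_node() keep working."""
    merged = dict(state)
    update: State = {}
    for phase in (tick, prepare, select, decide, guard):
        delta = phase(merged)
        merged.update(delta)  # later phases see earlier phases' writes
        update.update(delta)  # the caller receives one combined update
    return update
```

Tests that call `plan_node()` directly would keep passing against this wrapper while new tests target the individual phases.
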
2. gpt-5.4-proposal.md: best principle set
Good sides
- Has the cleanest rule: deterministic orchestration belongs in graph phases; business logic stays in helpers; LLM planning stays small.
- Correctly warns not to mutate state in conditional edge functions.
- Correctly warns that only one node should increment `iterations`; this matters because attempts use the iteration number and `run_planner_step()` filters turn searches by the pre-invocation iteration floor.
- Keeps `ask_user` as the single interrupting node and keeps the outer conversation graph unchanged.
- Recommends a staged migration from `tick` through `decision_policy`, which is operationally safer than one large graph rewrite.
Bad sides / risks
- Less precise than Fable about how deterministic decisions and LLM-failure decisions should be distinguished downstream.
- Says planner thread namespacing should stay unchanged. That is fine for fresh development checkpoints, but unsafe for real in-flight checkpoints whose pending node names reference the old graph.
- Recommends renaming to `llm_plan_node()`. That is correct eventually, but should be delayed until direct tests are moved.
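
The single-increment rule is easier to see with a toy model of the iteration floor. The sketch below is an assumption-laden illustration, not the real `run_planner_step()`: `Attempt` and `searches_for_turn` are hypothetical names, and the strict `>` comparison is a guess at how the floor is applied.

```python
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass(frozen=True)
class Attempt:
    iteration: int  # value of state["iterations"] when the attempt was made
    query: str


def tick(state: Dict[str, Any]) -> Dict[str, Any]:
    # The only place where the counter moves.
    return {"iterations": state["iterations"] + 1}


def searches_for_turn(attempts: List[Attempt], floor: int) -> List[Attempt]:
    # Assumption: an attempt belongs to this turn if it was recorded
    # after the counter moved past the pre-invocation value.
    return [a for a in attempts if a.iteration > floor]


state = {"iterations": 2}
attempts = [Attempt(1, "old query a"), Attempt(2, "old query b")]
floor = state["iterations"]  # captured before the planner is invoked

for query in ("new query a", "new query b"):
    state.update(tick(state))
    attempts.append(Attempt(state["iterations"], query))

this_turn = [a.query for a in searches_for_turn(attempts, floor)]
```

If a second node also incremented the counter, attempt numbers would skip or collide relative to the floor, and this kind of filter would silently report the wrong searches.
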
3. gpt-5.5-proposal.md: strong policy split, flawed ordering
Good sides
- Separates normalization, retry policy, and calculator routing in a way that matches the actual hidden policy blocks in `plan_node()`.
- Correctly keeps all planner internals inside the planner subgraph and out of `conversation/graph.py`.
- Correctly recommends minimal state additions and warns against storing prepared recipes unless serialization and benefit are proven.
- Lists concrete tests to move from `plan_node()` to policy nodes.
Bad sides / risks
- Its diagram runs `prepare_schema` before mandatory region/currency gates. That changes current ordering and can waste or misdirect decomposition/search context before location and currency are confirmed.
- Splitting `normalize_decision`, `retry_gate`, and `maybe_calculate` into separate nodes is probably more graph detail than the first implementation needs. A single `decision_policy`/`guard` node can contain those deterministic rewrites initially.
- The `require_region` and `require_currency` nodes should create normal `PlannerDecision(action="ask_user")` decisions and route to the existing `ask_user` node, not create a separate interrupt surface.
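
A gate that runs before preparation and reuses the existing `ask_user` node might be sketched as follows. This is plain Python under stated assumptions: the `PlannerDecision` fields other than `action`, the question texts, and the `region`/`currency` state keys are hypothetical, and `bootstrap_gate` borrows the node name from the hybrid recommended later in this analysis.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass
class PlannerDecision:
    action: str
    target_field: Optional[str] = None  # hypothetical field
    question: Optional[str] = None      # hypothetical field


# Mandatory questions, asked in this order before any preparation work.
REQUIRED = (
    ("region", "Which region should the analysis cover?"),
    ("currency", "Which currency should amounts be reported in?"),
)


def bootstrap_gate(state: Dict[str, Any]) -> Dict[str, Any]:
    """Node: emit a normal ask_user decision for the first missing field."""
    for field, question in REQUIRED:
        if not state.get(field):
            return {
                "decision": PlannerDecision("ask_user", field, question),
                "decision_origin": "deterministic",
            }
    return {"decision": None}


def route_after_bootstrap(state: Dict[str, Any]) -> str:
    """Edge: reads state only and returns the next node name."""
    decision = state.get("decision")
    if decision is not None and decision.action == "ask_user":
        return "ask_user"
    return "prepare"
```

Because the gate emits an ordinary decision, the interrupt and resume path stays the one that `ask_user` already owns.
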
4. deepseek-4-pro-proposal.md: good compact graph
Good sides
- The `guard -> prepare -> plan -> adjust` shape is easy to understand and likely enough to expose the main hidden lifecycle.
- It routes all deterministic and LLM decisions through an `adjust` node, avoiding duplicated calculator adjustment logic.
- It explicitly avoids exploding every predicate into a node.
- The migration path starts with extraction before graph rewiring, which is the right sequence.
Bad sides / risks
- It proposes a `recipes` field in state. That is risky because recipes are derived from `static_field_acquisition` and `dynamic_decompositions`; serializing them adds sync and compatibility burden for little benefit.
- The proposed `plan -> plan` loop for newly generated decomposition may cause extra LLM planning calls. A deterministic normalizer that rewrites to the next component task is safer.
- It underplays checkpoint compatibility risk from node-name/topology changes.
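
The alternative to a stored `recipes` field is to recompute recipes inside the node that needs them. The sketch below assumes a lot about shapes: it treats `static_field_acquisition` and `dynamic_decompositions` as simple name-to-action mappings, lets the dynamic entry win on conflict, and uses a hypothetical `values` key for already-known fields. None of that is confirmed by the current code; the point is only that the derived object stays a local variable.

```python
from typing import Any, Dict

State = Dict[str, Any]


def derive_recipes(state: State) -> Dict[str, Any]:
    """Rebuild recipes from the two source fields on every call.

    Assumption: both fields map a field name to an acquisition action,
    and a dynamic decomposition overrides a static entry of the same name.
    """
    recipes = dict(state.get("static_field_acquisition", {}))
    recipes.update(state.get("dynamic_decompositions", {}))
    return recipes


def select(state: State) -> State:
    recipes = derive_recipes(state)  # local only, never written to state
    known = state.get("values", {})
    for name in sorted(recipes):
        if name not in known:
            return {"decision": {"action": recipes[name], "field": name}}
    return {}


example = {
    "static_field_acquisition": {"price": "search", "margin": "search"},
    "dynamic_decompositions": {"margin": "calculate"},
    "values": {"price": 10},
}
update = select(example)
```

Since the update contains no `recipes` key, there is nothing extra to keep in sync when either source field changes.
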
5. opus-proposal.md: valuable map, too granular as target
Good sides
- Provides the most exhaustive mapping from current line-level concerns to new gate/corrector responsibilities.
- Correctly introduces `llm_failed` to preserve the current safeguard that prevents a structured-output failure from being rewritten into another planner loop.
- Has a useful node-write table; this is the right discipline for checkpointed graph state.
- Recommends a three-step migration: helpers first, gates second, correctors third.
Bad sides / risks
- Too many graph nodes for the first implementation. Splitting every corrector into a `c_*` node would add many Postgres checkpoint writes and make the graph noisy.
- Some proposal links point at a different local path, so it should not be used mechanically.
- The same clarity can be achieved initially with one `decision_policy` node and pure helper functions inside it.
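
One `decision_policy` node with pure helpers inside it could be shaped like the following sketch. The two correctors, the cap value, and the state keys `search_counts` and `calculable_fields` are hypothetical stand-ins for the real rewrite rules; the `llm_error` short-circuit assumes the failure flag is carried as a `decision_origin` value, which is the variant recommended later in this analysis.

```python
from typing import Any, Callable, Dict, List

Decision = Dict[str, Any]
State = Dict[str, Any]
Corrector = Callable[[Decision, State], Decision]

MAX_SEARCHES_PER_FIELD = 3  # hypothetical cap


def prefer_calculator(decision: Decision, state: State) -> Decision:
    # Stand-in rule: calculable fields are computed instead of searched.
    if decision["action"] == "search" and decision.get("field") in state.get(
        "calculable_fields", ()
    ):
        return {"action": "calculate", "field": decision["field"]}
    return decision


def enforce_retry_cap(decision: Decision, state: State) -> Decision:
    # Stand-in rule: after too many searches, ask the user instead.
    if decision["action"] == "search":
        field = decision.get("field")
        if state.get("search_counts", {}).get(field, 0) >= MAX_SEARCHES_PER_FIELD:
            return {"action": "ask_user", "field": field}
    return decision


CORRECTORS: List[Corrector] = [prefer_calculator, enforce_retry_cap]


def decision_policy(state: State) -> State:
    """Single node: applies every corrector in order, one checkpoint write."""
    decision = state["decision"]
    if state.get("decision_origin") == "llm_error":
        return {"decision": decision}  # never rewrite a failure into another loop
    for corrector in CORRECTORS:
        decision = corrector(decision, state)
    return {"decision": decision}
```

Each corrector is unit-testable on its own, which gives most of the clarity of the `c_*` nodes without the per-node checkpoint cost.
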
6. kimi-2.6-proposal.md: mostly sound but noisy
Good sides
- Good decomposition table and a reasonable `tick`, `prepare`, `calc_gate`, `acquire`, `decide`, `enforce_policy` shape.
- Keeps post-decision safety rules together in `enforce_policy`, which is a good first-step compromise.
- Suggests a temporary wrapper so `plan_node()` can delegate to the new pipeline during migration.
Bad sides / risks
- Dedicated `ask_user_region` and `ask_user_currency` nodes duplicate the existing `ask_user` interrupt node and make the resume surface less uniform.
- Some diagram details are under-specified, such as the `calc_adjust` path.
- It is more node-heavy than necessary before the team proves the smaller phase split is insufficient.
7. glm-5.1-proposal.md: practical but incomplete
Good sides
- Low-risk: starts by extracting a `pre_check` node and leaving most of the remaining behavior intact.
- Correctly recognizes that post-LLM rewriting can stay together at first instead of becoming several graph nodes immediately.
- Calls out a helper to standardize output assembly, which would reduce many repeated `out` construction branches.
Bad sides / risks
- Leaves a large `acquire_or_plan` node containing acquisition, auto-finish, LLM planning, decomposition, retry caps, and policy rewrites. That solves only part of the monolith problem.
- Uses a `_pre_check_route` state field where a `PlannerDecision` plus simple router, or a small typed phase flag, would be cleaner.
- Does not fully address the current `llm_failed` local behavior.
8. qwen-3.7-max-proposal.md: detailed but hazardous
Good sides
- Gives concrete pseudo-code for many nodes, which is useful for implementation thinking.
- Separates completion, post-processing, and cap enforcement clearly.
- Identifies the need for an LLM-failure flag.
Bad sides / risks
- Uses `_route` and `_llm_failed` as transient state and says they should be excluded from checkpointing. In this project, graph state is checkpoint-owned; transient routing fields would need explicit schema/reducer discipline, not just comments.
- The graph sketch routes `observe_user` to `check_region_currency`, bypassing the iteration tick after generic user answers. That risks breaking the one-iteration-per-decision-cycle semantics.
- The sketch routes through a `route_decision` source without registering it as a normal node in the shown builder.
- It is too implementation-heavy while still containing wiring errors, so it should be mined for node snippets only.
9. gemini-3.1-pro-proposal.md: clear but too simplified
Good sides
- Easy to understand: `prepare_state`, `evaluate_rules`, `llm_plan`.
- Correctly isolates the LLM call and makes deterministic rules testable.
- Good high-level diagnosis of hidden control flow.
Bad sides / risks
- Does not cover the current post-LLM policy pipeline: derived-field redirects, web-preferred redirects, retry caps, and calculator adjustment.
- The proposed raw `route_after_llm` is too weak for the current planner.
- It understates the complexity of current `plan_node()` and would likely leave behavior gaps around calculator-backed component acquisition.
10. mimo-2.5-pro-proposal.md: weak topology despite good diagnosis
Good sides
- Correctly diagnoses `plan_node()` as a guard/router/LLM/policy monolith.
- Suggests extraction without changing topology first.
- Keeps the target graph small.
Bad sides / risks
- The `acquire` routing is semantically off: deterministic acquisition tasks should often bypass the LLM, not route into a generic `decide` node that also owns LLM calls and all post-processing.
- `no_task -> finish` is not safe without the current auto-finish and missing-field checks.
- Leaves too much in `decide`, so the final design is still a smaller monolith.
- The sketch references a `plan_router` without making its role concrete.
11. qwen-3.6-plus-proposal.md: avoid as an implementation basis
Good sides
- Correctly identifies the monolithic `plan_node()` issue.
- Correctly wants the graph diagram to become more truthful.
Bad sides / risks
- Proposes moving redirects and cap enforcement into `route_after_plan()`. That is a LangGraph anti-pattern here because those rules mutate `decision`, `status`, and sometimes decomposition state; state mutation belongs in nodes, not conditional edge functions.
- Splits region/currency into dedicated observe nodes even though `observe_user_node()` already handles them via target field. This duplicates resume logic and widens the interrupt surface.
- Leaves search/observe/calculation loops returning to `plan` in several places, so loop-entry guards and iteration ticking are not consistently applied.
- Omits enough acquisition and calculator nuance that it is unlikely to preserve behavior.
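
For contrast with the first risk above, a conditional edge that stays side-effect free can be as small as the sketch below. It is plain Python, not a registered LangGraph edge; `route_after_policy` is a hypothetical name, the dictionary-shaped decision is an assumption, and the action set is taken from the target flowchart in this analysis. Any redirect or cap rewrite would already have happened in the node that runs before this function.

```python
import copy
from typing import Any, Dict

State = Dict[str, Any]

ACTION_NODES = {"search", "ask_user", "reflect", "calculate", "finish"}


def route_after_policy(state: State) -> str:
    """Conditional edge: reads the decision and returns a node name, nothing else."""
    action = state["decision"]["action"]
    if action not in ACTION_NODES:
        raise ValueError(f"unknown action: {action!r}")
    return action


state = {"decision": {"action": "search", "field": "price"}, "status": "running"}
snapshot = copy.deepcopy(state)
target = route_after_policy(state)
```

Comparing the state against a deep copy after routing is a cheap test-time check that the edge function really did not mutate anything.
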
Recommended solution
I would not adopt any proposal verbatim. The best solution is a hybrid:
- Use Fable's topology as the base: `tick -> bootstrap_gate -> prepare -> select -> decide -> decision_policy -> action`.
- Use GPT-5.4's rules: keep edge functions side-effect free, increment `iterations` only in `tick`, keep `ask_user` as the only interrupt node, and keep the outer conversation graph unchanged.
- Use GPT-5.5's state discipline: add only a small `decision_origin` field, or `decision_origin = "llm_error"` instead of a separate `llm_failed`, and do not store derived recipes in checkpoint state unless proven necessary.
- Use Opus/Kimi only for mapping details: their line-by-line decompositions are useful when extracting helper functions, but their final graphs are too granular for the first implementation.
Concrete target shape
```mermaid
flowchart TD
    START --> tick
    tick -->|terminal| finish
    tick -->|continue| bootstrap_gate
    bootstrap_gate -->|region/currency question| ask_user
    bootstrap_gate -->|ready| prepare
    prepare --> select
    select -->|deterministic decision| decision_policy
    select -->|needs LLM| decide
    decide --> decision_policy
    decision_policy -->|search| search
    decision_policy -->|ask_user| ask_user
    decision_policy -->|reflect| reflect
    decision_policy -->|calculate| calculate
    decision_policy -->|finish| finish
    search -->|observation| observe
    search -->|no observation| tick
    observe --> tick
    calculate --> tick
    ask_user --> observe_user
    observe_user --> tick
    reflect --> tick
    finish --> END
```
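
The target shape can also be written down as a plain adjacency map and checked mechanically. The sketch below is a hypothetical test-style helper, not graph-builder code; it transcribes the edges of the flowchart and verifies two properties this analysis cares about: every loop re-enters through `tick`, and `ask_user` is the only way into `observe_user`.

```python
from typing import Dict, List, Set

# Edges transcribed from the target flowchart (edge labels omitted).
EDGES: Dict[str, List[str]] = {
    "START": ["tick"],
    "tick": ["finish", "bootstrap_gate"],
    "bootstrap_gate": ["ask_user", "prepare"],
    "prepare": ["select"],
    "select": ["decision_policy", "decide"],
    "decide": ["decision_policy"],
    "decision_policy": ["search", "ask_user", "reflect", "calculate", "finish"],
    "search": ["observe", "tick"],
    "observe": ["tick"],
    "calculate": ["tick"],
    "ask_user": ["observe_user"],
    "observe_user": ["tick"],
    "reflect": ["tick"],
    "finish": ["END"],
}


def predecessors(node: str) -> Set[str]:
    """All nodes with a direct edge into the given node."""
    return {src for src, dsts in EDGES.items() if node in dsts}


def reachable(start: str) -> Set[str]:
    """All nodes reachable from start, including start itself."""
    seen: Set[str] = set()
    stack = [start]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        stack.extend(EDGES.get(node, []))
    return seen
```

A check such as `predecessors("bootstrap_gate") == {"tick"}` would fail for the qwen-3.7-max sketch, where `observe_user` routes past the tick.
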
Suggested migration order
- Extract pure helpers from `plan_node()` while keeping the old graph topology. Existing tests should remain mostly unchanged.
- Add a small state flag such as `decision_origin: Literal["deterministic", "llm", "llm_error"] | None` to `State` and `PlannerState`; reset it in `tick`.
- Introduce `tick` and `bootstrap_gate` first. This preserves current region/currency-before-preparation ordering.
- Introduce `prepare` and `select` next. Keep recipes derived locally from existing state, not stored as a new checkpoint field.
- Reduce `plan_node()` to LLM-only `decide` behavior, but keep the exported function name until tests are moved.
- Move all deterministic rewrites into one `decision_policy` node. Split it later only if it remains too large after helper extraction.
- Decide explicitly whether to bump planner thread namespace to `:planner:v2`. If existing in-flight planner checkpoints matter, bump it; if this is a development-only migration, document why it is safe not to.
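
If the namespace bump is chosen, keeping the identifier format in one helper makes the decision explicit and reversible. The function below is a hypothetical sketch: only the `{conversation_id}:planner` pattern and the `:planner:v2` suffix come from this analysis, and how the resulting string is passed to the checkpointer is left out.

```python
from typing import Optional


def planner_thread_id(conversation_id: str, version: Optional[int] = None) -> str:
    """Build the planner thread identifier; version=None keeps the current format."""
    base = f"{conversation_id}:planner"
    return base if version is None else f"{base}:v{version}"


current_id = planner_thread_id("conv-42")
bumped_id = planner_thread_id("conv-42", version=2)
```

Under a bumped identifier the new graph would start from fresh planner state, while checkpoints written by the old topology stay under the old identifier.
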
This combination gives the main architectural win without turning the planner into a graph of tiny predicates. It preserves the current runtime surface while making the graph honest about the phases that are currently hidden inside plan_node().