Planner Graph Refactoring: Proposal Analysis

This document evaluates the 11 proposals submitted for refactoring the monolithic plan_node in src/venturescope/planner/agent.py. The evaluation is based on LangGraph best practices, architectural cohesion, state management overhead (checkpointing), and observability.

Architectural Paradigms Observed

Across all proposals, three distinct architectural paradigms emerged to solve the plan_node monolith (the "Minimalist Stage" proposals ranked below are a coarser variant of the phase-based pipeline):

Micro-node Explosion (Extreme Granularity): Breaking every single if statement and rule into its own dedicated LangGraph node.
Edge-heavy Routing: Moving business logic and policy enforcement directly into LangGraph conditional edges.
Phase-based Pipeline (Balanced Granularity): Grouping deterministic logic into cohesive lifecycle phases (e.g., prepare, acquire, adjust) surrounding a pure LLM node.

Ranked Evaluation

Rank 1: The "Phase-Based Pipeline" (Highly Recommended)

Proposals: gpt-5.4-proposal.md, fable-5-proposal.md, deepseek-4-pro-proposal.md

These proposals correctly identify that while plan_node does too much, exploding it into a dozen nodes is a mistake. They advocate for a 4-to-5 node pipeline:
- tick / prepare: Deterministic state mutations, schema composition, and iteration checks.
- acquisition_gate / select: Deterministic routing that bypasses the LLM when the next step is obvious.
- llm_plan: A thin, pure node exclusively for the LLM call.
- decision_policy / adjust: Post-LLM normalization and guardrail enforcement.

Pros:
- Efficient Checkpointing: LangGraph checkpoints state after every super-step, which in a sequential graph means after every node. A 4-node pipeline persists state only at meaningful boundaries (before and after the LLM call), avoiding DB bloat and the latency of unnecessary PostgreSQL roundtrips.
- High Cohesion: Business rules for "policy" stay together in a testable, pure Python helper rather than being scattered across the graph topology.
- Clean Traces: LangSmith traces remain readable, clearly separating preparation, LLM inference, and decision adjustment.

Cons:
- Some routing logic (like specific calculator rules) remains hidden inside the acquisition_gate or decision_policy node functions rather than being expressed as explicit graph edges.
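As an illustration of the High Cohesion point, the sketch below keeps one policy rule (retry caps) in a pure Python helper that can be unit-tested without building a graph. It assumes a hypothetical PlannerDecision dataclass with action and target fields, a hypothetical RETRY_CAPS table, and a hypothetical fallback to "finish"; none of these names or values come from the proposals.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class PlannerDecision:
    # Hypothetical shape; the real PlannerDecision in agent.py may differ.
    action: str       # e.g. "search", "ask", "finish"
    target: str = ""  # field or query the action refers to


# Hypothetical retry caps per action.
RETRY_CAPS = {"search": 3, "ask": 2}


def apply_retry_caps(decision: PlannerDecision, attempts: dict[str, int]) -> PlannerDecision:
    """Return the decision unchanged, or rewritten to "finish" once its cap is reached.

    Pure: reads its inputs and returns a value without mutating anything.
    """
    cap = RETRY_CAPS.get(decision.action)
    if cap is not None and attempts.get(decision.action, 0) >= cap:
        return replace(decision, action="finish")
    return decision


# Usage: plain assertions, no graph or LLM required.
d = PlannerDecision(action="search", target="market_size")
assert apply_retry_caps(d, {"search": 1}) is d
assert apply_retry_caps(d, {"search": 3}) == PlannerDecision("finish", "market_size")
```

Because the helper never touches graph state, the node that calls it stays a thin adapter, and adding or changing a cap is a one-line edit covered by ordinary unit tests.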

Rank 2: The "Minimalist Stage" Approach

Proposals: mimo-2.5-pro-proposal.md, glm-5.1-proposal.md, gemini-3.1-pro-proposal.md

These proposals suggest a very coarse-grained split, typically into just 3 nodes: guards, acquire, and decide (the LLM).

Pros:
- Much better than the current monolith.
- Very easy and low-risk to implement and migrate.

Cons:
- They tend to lump pre-LLM preparation (schema composition) and post-LLM correction (decision rewriting) either back into the LLM node or awkwardly into the guards node, meaning we still have mixed concerns.

Rank 3: The "Micro-node Explosion" (Not Recommended)

Proposals: qwen-3.7-max-proposal.md, gpt-5.5-proposal.md, kimi-2.6-proposal.md, qwen-3.6-plus-proposal.md

These proposals advocate for mapping the control flow almost 1:1 to LangGraph nodes (8 to 10 nodes, such as check_termination, check_region_currency, compose_schema, check_calculator, and enforce_caps).

Pros:
- The graph visualization (Mermaid diagram) becomes a faithful, one-to-one representation of every business rule.

Cons:
- Severe Checkpoint Bloat: LangGraph will write a checkpoint to the database up to 10 times for a single logical turn before even reaching an action, and each write is a PostgreSQL roundtrip that adds latency.
- Trace Noise: LangSmith traces will be flooded with micro-execution steps (e.g., executing check_calculator -> returns True -> executing compose_schema), making it hard to quickly debug the actual LLM behavior.
- Fragility: Changing a simple business rule now requires rewiring the graph edges rather than just updating a unit-tested Python function.
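The checkpoint argument is linear arithmetic: in a strictly sequential graph, each node runs in its own super-step, so the number of checkpoint writes per turn grows with the number of nodes. The toy runner below is not LangGraph; it simply records one simulated write per node. The first five micro-node names come from the proposals, while the remaining five are hypothetical fillers to reach the 10 nodes cited above.

```python
from typing import Callable

State = dict
Node = Callable[[State], State]


def run_sequential(nodes: list[tuple[str, Node]], state: State) -> tuple[State, list[str]]:
    """Toy model: apply each node's partial update, then record one simulated checkpoint."""
    writes: list[str] = []
    for name, node in nodes:
        state = {**state, **node(state)}  # merge the node's partial update
        writes.append(name)               # one simulated checkpoint write per node
    return state, writes


def noop(state: State) -> State:
    # Placeholder node body; only the node count matters for this comparison.
    return {}


phased = [(n, noop) for n in ("prepare", "acquisition_gate", "llm_plan", "policy_guard")]
micro = [(n, noop) for n in (
    "check_termination", "check_region_currency", "compose_schema",
    "check_calculator", "enforce_caps",
    # Hypothetical remaining micro-nodes:
    "increment_iterations", "llm_plan", "normalize_redirects",
    "rewrite_invalid", "finalize_decision",
)]

_, phased_writes = run_sequential(phased, {})
_, micro_writes = run_sequential(micro, {})
print(len(phased_writes), len(micro_writes))  # 4 10
```

Real LangGraph runs may add a checkpoint for the initial input as well, so the absolute counts differ slightly; the linear growth with node count is the point.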

Rank 4: The "Edge-Heavy" Approach (Anti-Pattern)

Proposals: opus-proposal.md

This approach attempts to hoist policy logic directly into LangGraph conditional edges (route_after_plan).

Pros:
- Keeps the node count low while making routing explicit.

Cons:
- LangGraph Anti-Pattern: Conditional edges in LangGraph are designed to read state and return the name of the next node. They should not mutate state (like incrementing iterations, mutating the schema, or generating dynamic decompositions), because a routing function's return value selects a path rather than producing a state update. Pushing complex policy into edges therefore either relies on untracked side effects or requires awkward workarounds.
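A minimal sketch of the pattern this critique implies: mutation lives in a node that returns a partial state update, and the routing function only reads state and returns the name of the next node. The state keys (iterations, max_iterations), the default cap of 5, and the node names are hypothetical. In LangGraph, prepare_node would be registered with add_node and route_after_prepare with add_conditional_edges.

```python
from typing import Any

State = dict[str, Any]


def prepare_node(state: State) -> State:
    """Node: the right place for mutation. Returns a partial update; never edits `state`."""
    return {"iterations": state.get("iterations", 0) + 1}


def route_after_prepare(state: State) -> str:
    """Routing function: reads state only and returns the next node's name."""
    if state["iterations"] > state.get("max_iterations", 5):  # hypothetical default cap
        return "finish"
    return "acquisition_gate"


# Usage: the runtime (here, a manual merge) applies the node's update before routing.
state = {"iterations": 5, "max_iterations": 5}
state = {**state, **prepare_node(state)}
print(route_after_prepare(state))  # finish
```

Keeping the router side-effect free means every state change passes through a node, so it is checkpointed and visible in traces.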

Conclusion & Best Solution

The best solution is a combination of the Phase-Based Pipeline approaches (specifically drawing from gpt-5.4-proposal.md and fable-5-proposal.md).

Recommended Target Architecture:

We should decompose plan_node into exactly four distinct graph nodes:

prepare_node: Handles deterministic setup. Increments iterations, checks early aborts, handles the mandatory region/currency bootstrap gating, and composes the schema.
acquisition_gate_node: A deterministic router. Evaluates whether the calculator is blocked or whether there are obvious missing fields. If it finds a deterministic task, it outputs a PlannerDecision and bypasses the LLM entirely.
llm_plan_node: The core AI component. Receives a prepared state and returns a raw PlannerDecision. Beyond recording that raw decision, no state mutation or policy enforcement happens here.
policy_guard_node: Post-processes the raw LLM decision. Enforces search/ask retry caps, normalizes derived-field redirects, and rewrites invalid choices.
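The four-node target can be sketched in plain Python to show the data flow, including the gate bypass. Each node takes the state and returns a partial update, as LangGraph nodes do; run_turn is a toy driver standing in for the compiled graph, and the trailing comment shows roughly how the same edges would be declared on a StateGraph. All state keys, the dict standing in for PlannerDecision, and the ask cap of 2 are hypothetical assumptions; schema composition, region/currency gating, and the calculator check are omitted for brevity.

```python
from typing import Any, Callable

State = dict[str, Any]
LLM = Callable[[State], dict]  # hypothetical: returns a raw decision dict


def prepare_node(state: State) -> State:
    # Deterministic setup: bump the iteration counter.
    return {"iterations": state.get("iterations", 0) + 1}


def acquisition_gate_node(state: State) -> State:
    # Deterministic shortcut: an obviously missing field becomes an "ask" decision.
    missing = state.get("missing_fields", [])
    if missing:
        return {"decision": {"action": "ask", "target": missing[0]}}
    return {"decision": None}


def route_after_gate(state: State) -> str:
    # Pure router: skip the LLM when the gate already produced a decision.
    return "policy_guard" if state.get("decision") else "llm_plan"


def make_llm_plan_node(llm: LLM) -> Callable[[State], State]:
    # Thin wrapper around the only non-deterministic step; the LLM is injected.
    def llm_plan_node(state: State) -> State:
        return {"decision": llm(state)}
    return llm_plan_node


def policy_guard_node(state: State) -> State:
    # Post-processing: rewrite "ask" to "finish" once the hypothetical cap of 2 is hit.
    decision = dict(state["decision"])
    if decision["action"] == "ask" and state.get("attempts", {}).get("ask", 0) >= 2:
        decision["action"] = "finish"
    return {"decision": decision}


def run_turn(state: State, llm: LLM) -> State:
    # Toy driver: merge each node's partial update in the order the edges dictate.
    for node in (prepare_node, acquisition_gate_node):
        state = {**state, **node(state)}
    if route_after_gate(state) == "llm_plan":
        state = {**state, **make_llm_plan_node(llm)(state)}
    return {**state, **policy_guard_node(state)}

# Roughly equivalent LangGraph edges (after add_node calls on a StateGraph `builder`):
#   builder.add_edge(START, "prepare")
#   builder.add_edge("prepare", "acquisition_gate")
#   builder.add_conditional_edges("acquisition_gate", route_after_gate)
#   builder.add_edge("llm_plan", "policy_guard")
#   builder.add_edge("policy_guard", END)
```

For example, run_turn({"missing_fields": ["region"]}, llm) returns an ask decision for region without calling llm, while an empty missing_fields list routes the turn through the LLM before the policy guard runs.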

Why this is the winner:

This architecture balances visibility and performance. It isolates the non-deterministic LLM call from deterministic business logic, so the deterministic nodes can be unit-tested without invoking an LLM. At the same time, by limiting the decomposition to four major phases, it avoids the checkpointing latency and trace noise that the micro-node proposals would introduce.