Planner Graph 5.4 Proposal

Problem

plan_node() (src/venturescope/planner/agent.py:846) currently handles too many responsibilities inside a single node:

iteration ticking and terminal checks
region/currency bootstrap gating
schema composition and dynamic decomposition prep
calculator-blocked acquisition recovery
deterministic completion checks
LLM planning
post-LLM policy rewrites and retry caps

That makes the graph in docs/planner-graph-ref/current-graph.md look simple while the real orchestration is hidden inside one node. As a result, checkpoints are harder to reason about, tests are harder to target, and changing any single policy carries more risk.

Proposal

Move orchestration policy to graph level, but do not explode every rule into its own node. The target shape should use a small decision pipeline:

graph nodes for durable orchestration phases
pure helpers for business rules and schema logic
one slim LLM planning node that only proposes a PlannerDecision
one policy node that normalizes that decision before routing

What should move out of `plan_node()`

1. Iteration and terminal control -> `tick`

Move these checks into a dedicated loop-entry node:

increment iterations
emit the planning-step event
short-circuit when planner is already aborted
stop on max_iters

All loopbacks should return to tick, not directly to the planner node. That preserves the current "one iteration per decision cycle" behavior that attempt tracking in search_node(), observe_user_node(), and reflect_node() relies on.
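A minimal sketch of the loop-entry node and its routing function, assuming a dict-shaped state. The keys `iterations`, `max_iters`, `aborted`, `events`, and `terminal` are hypothetical, as is the choice to stop once the counter exceeds `max_iters`; the real planner state may differ.

```python
from typing import Any, Dict

State = Dict[str, Any]  # hypothetical state shape


def tick(state: State) -> State:
    """Loop-entry node: the only place that increments `iterations`."""
    if state.get("aborted"):
        # Planner already aborted: short-circuit without consuming an iteration.
        return {"terminal": True}
    iterations = state.get("iterations", 0) + 1
    # Record the planning-step event (stand-in for the real event emitter).
    events = state.get("events", []) + [f"planning_step:{iterations}"]
    # Assumption: the run stops once the counter exceeds max_iters.
    terminal = iterations > state.get("max_iters", 10)
    return {"iterations": iterations, "events": events, "terminal": terminal}


def route_after_tick(state: State) -> str:
    """Conditional edge: reads state and returns a route label, nothing else."""
    return "terminal" if state.get("terminal") else "continue"
```

The node returns a state update and the edge function only reads it, so the counter changes in exactly one place per cycle.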

2. Bootstrap questions -> `bootstrap_gate`

Move deterministic region/currency gating into a separate node:

_needs_region_question()
_needs_currency_question()
creation of deterministic PlannerDecision(action="ask_user", ...)

This keeps bootstrap policy visible in the graph instead of being buried as an early return.
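One possible shape for the gate, as a sketch. `PlannerDecision` here is a simplified stand-in (only `action` is named in this proposal; `question` is a hypothetical field), and the two `needs_*` functions stand in for `_needs_region_question()` and `_needs_currency_question()`, whose real logic is not shown here.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional

State = Dict[str, Any]


@dataclass
class PlannerDecision:
    """Simplified stand-in; only `action` is named in this proposal."""
    action: str
    question: Optional[str] = None  # hypothetical field


def needs_region_question(state: State) -> bool:
    # Stand-in for _needs_region_question(); the real check may differ.
    return not state.get("region")


def needs_currency_question(state: State) -> bool:
    # Stand-in for _needs_currency_question(); the real check may differ.
    return not state.get("currency")


def bootstrap_gate(state: State) -> State:
    """Create a deterministic ask_user decision, or clear it when ready."""
    if needs_region_question(state):
        return {"decision": PlannerDecision("ask_user", "Which region applies?")}
    if needs_currency_question(state):
        return {"decision": PlannerDecision("ask_user", "Which currency applies?")}
    return {"decision": None}


def route_after_bootstrap(state: State) -> str:
    """Conditional edge: route to ask_user only when the gate asked for it."""
    decision = state.get("decision")
    if decision is not None and decision.action == "ask_user":
        return "ask_user"
    return "ready"
```

The gate only creates the decision. Pausing for the answer stays in ask_user.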

3. Schema/decomposition preparation -> `prepare_context`

Move context preparation before any planning decision:

_proactive_decompositions()
build_dynamic_recipes()
compose_ready_fields()
persistence of updated schema and dynamic_decompositions

The node should only orchestrate. Existing helpers should continue to own the domain logic.
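A sketch of what "only orchestrate" could look like. The three callables stand in for `_proactive_decompositions()`, `build_dynamic_recipes()`, and `compose_ready_fields()`; their real signatures are not shown in this proposal, so the argument and return shapes below are assumptions.

```python
from typing import Any, Callable, Dict, List

State = Dict[str, Any]


def prepare_context(
    state: State,
    *,
    decompose: Callable[[State], List[str]],
    build_recipes: Callable[[List[str]], Dict[str, Any]],
    compose_fields: Callable[[Dict[str, Any], Dict[str, Any]], Dict[str, Any]],
) -> State:
    """Call the helpers in order and persist their results; no domain logic here."""
    decompositions = decompose(state)
    recipes = build_recipes(decompositions)
    schema = compose_fields(state.get("schema", {}), recipes)
    # Persist both outputs so later nodes read them from state.
    return {"schema": schema, "dynamic_decompositions": decompositions}


# Toy helpers, only to show the call shape.
update = prepare_context(
    {"schema": {"revenue": "raw"}},
    decompose=lambda s: ["margin"],
    build_recipes=lambda names: {name: "derived" for name in names},
    compose_fields=lambda schema, recipes: {**schema, **recipes},
)
```

Passing the helpers in keeps the node testable with fakes while the real helpers keep owning the domain logic.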

4. Deterministic acquisition/calculation routing -> `acquisition_gate`

Move deterministic routing that does not require the planner LLM into a pre-LLM gate:

calculation-attempt cap
successful-calculation finish
blocked-calculation acquisition selection
blocked-field decomposition generation
open acquisition task selection
auto-finish checks when raw inputs are complete

This is the highest-value extraction because it removes calculator and acquisition policy from the LLM planning path.
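The gate can be sketched as an ordered list of rules where the first rule that fires wins and "no rule fired" means the LLM is needed. The rule bodies, state keys, cap value, and the action each rule returns are all hypothetical; only the rule categories come from the list above.

```python
from typing import Any, Callable, Dict, List, Optional

State = Dict[str, Any]
Rule = Callable[[State], Optional[str]]

MAX_CALC_ATTEMPTS = 3  # hypothetical cap


def calc_attempt_cap(state: State) -> Optional[str]:
    # Assumption: hitting the cap ends the run; the real fallback may differ.
    return "finish" if state.get("calc_attempts", 0) >= MAX_CALC_ATTEMPTS else None


def successful_calculation(state: State) -> Optional[str]:
    return "finish" if state.get("calc_succeeded") else None


def open_acquisition_task(state: State) -> Optional[str]:
    return "search" if state.get("open_tasks") else None


# Order matters: earlier rules take precedence.
RULES: List[Rule] = [calc_attempt_cap, successful_calculation, open_acquisition_task]


def acquisition_gate(state: State) -> State:
    """Return the first deterministic decision, or None when the LLM must plan."""
    for rule in RULES:
        action = rule(state)
        if action is not None:
            return {"decision": action, "decided_by": rule.__name__}
    return {"decision": None, "decided_by": None}


def route_after_acquisition(state: State) -> str:
    """Conditional edge using the labels from the target graph."""
    return "deterministic decision" if state.get("decision") else "needs LLM"
```

Recording `decided_by` is optional, but it makes checkpointed state show which rule bypassed the LLM.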

5. Decision rewrites and retry policy -> `decision_policy`

Move post-LLM normalization into one deterministic node:

derived-field redirects
premature ask_user -> search redirect for web-preferred fields
duplicate/capped search fallback
ask-user retry cap
_adjust_calculation_decision()

Keep this as one policy node. Splitting every rewrite into separate nodes would add graph noise without improving clarity.
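A sketch of one policy node that applies an ordered chain of pure rewrites. `Decision` is a simplified stand-in for `PlannerDecision`; the `field` attribute, the state keys, the cap value, and the choice to fall back to finish when the ask-user cap is hit are all assumptions. Only two of the rewrites listed above are shown.

```python
from dataclasses import dataclass, replace
from typing import Any, Callable, Dict, Tuple

State = Dict[str, Any]
MAX_ASK_USER = 2  # hypothetical retry cap


@dataclass(frozen=True)
class Decision:
    action: str
    field: str = ""  # hypothetical: the field the decision is about


def redirect_premature_ask(decision: Decision, state: State) -> Decision:
    """ask_user becomes search for web-preferred fields not yet searched."""
    if (
        decision.action == "ask_user"
        and decision.field in state.get("web_preferred", ())
        and decision.field not in state.get("searched", ())
    ):
        return replace(decision, action="search")
    return decision


def cap_ask_user(decision: Decision, state: State) -> Decision:
    """Assumption: once the cap is reached, the planner finishes instead."""
    asked = state.get("ask_counts", {}).get(decision.field, 0)
    if decision.action == "ask_user" and asked >= MAX_ASK_USER:
        return replace(decision, action="finish")
    return decision


Rewrite = Callable[[Decision, State], Decision]
REWRITES: Tuple[Rewrite, ...] = (redirect_premature_ask, cap_ask_user)


def decision_policy(state: State) -> State:
    """Normalize the proposed decision by applying every rewrite in order."""
    decision = state["decision"]
    for rewrite in REWRITES:
        decision = rewrite(decision, state)
    return {"decision": decision}
```

Each rewrite stays individually testable as a function, while the graph still shows a single policy phase.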

What should stay in the planning node

Rename plan_node() to llm_plan_node() and keep only work that truly belongs to the planner model:

compute prompt inputs from current state
call planner_prompt(...)
call _llm().structured(..., schema=PlannerDecision)
fall back to finish when structured output is invalid

This node should propose a decision, not run the whole planner.
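A minimal sketch of the slimmed node. The model call is injected as `structured_call` so the sketch runs without an LLM; it stands in for `_llm().structured(..., schema=PlannerDecision)`, and `parse_decision` stands in for schema validation. The prompt text and the dict-shaped decision are hypothetical.

```python
from typing import Any, Callable, Dict

State = Dict[str, Any]
# Actions taken from the routes out of decision_policy in the target graph.
ALLOWED_ACTIONS = {"search", "ask_user", "reflect", "calculate", "finish"}


def parse_decision(raw: Any) -> Dict[str, Any]:
    """Minimal validation standing in for schema=PlannerDecision."""
    if not isinstance(raw, dict) or raw.get("action") not in ALLOWED_ACTIONS:
        raise ValueError(f"invalid planner output: {raw!r}")
    return raw


def llm_plan_node(state: State, structured_call: Callable[[str], Any]) -> State:
    """Propose one decision; all other planner policy lives in other nodes."""
    # Hypothetical prompt; the real node calls planner_prompt(...).
    prompt = "Missing fields: " + ", ".join(sorted(state.get("missing", [])))
    try:
        decision = parse_decision(structured_call(prompt))
    except ValueError:
        # Invalid structured output falls back to finish.
        decision = {"action": "finish", "reason": "invalid structured output"}
    return {"decision": decision}
```

With the model call injected, tests can cover the fallback path using a plain function that returns malformed output.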

Target graph shape

flowchart TD
    START --> tick
    tick -->|continue| bootstrap_gate
    tick -->|terminal| finish

    bootstrap_gate -->|needs region/currency| ask_user
    bootstrap_gate -->|ready| prepare_context

    prepare_context --> acquisition_gate
    acquisition_gate -->|deterministic decision| decision_policy
    acquisition_gate -->|needs LLM| llm_plan

    llm_plan --> decision_policy

    decision_policy -->|search| search
    decision_policy -->|ask_user| ask_user
    decision_policy -->|reflect| reflect
    decision_policy -->|calculate| calculate
    decision_policy -->|finish| finish

    search -->|last_observation| observe
    search -->|no observation| tick
    observe --> tick
    calculate --> tick
    reflect --> tick
    ask_user --> observe_user
    observe_user --> tick
    finish --> END
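The same shape can be kept as a plain adjacency table so structural rules are checkable in a test. In this sketch the table is transcribed from the flowchart above; the helper names are hypothetical.

```python
from typing import Dict, List, Set

# Edges transcribed from the target flowchart.
EDGES: Dict[str, List[str]] = {
    "START": ["tick"],
    "tick": ["bootstrap_gate", "finish"],
    "bootstrap_gate": ["ask_user", "prepare_context"],
    "prepare_context": ["acquisition_gate"],
    "acquisition_gate": ["decision_policy", "llm_plan"],
    "llm_plan": ["decision_policy"],
    "decision_policy": ["search", "ask_user", "reflect", "calculate", "finish"],
    "search": ["observe", "tick"],
    "observe": ["tick"],
    "calculate": ["tick"],
    "reflect": ["tick"],
    "ask_user": ["observe_user"],
    "observe_user": ["tick"],
    "finish": ["END"],
}


def predecessors(node: str) -> List[str]:
    """All nodes with an edge into `node`, sorted for stable comparison."""
    return sorted(src for src, dsts in EDGES.items() if node in dsts)


def reachable(start: str) -> Set[str]:
    """Every node reachable from `start` by following edges."""
    seen: Set[str] = set()
    stack = [start]
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(EDGES.get(node, []))
    return seen
```

For example, `predecessors("bootstrap_gate")` returning only `["tick"]` confirms that no loopback bypasses the loop-entry node.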

Why this is a better fit

Graph-level visibility

Checkpointed phase boundaries become explicit:

loop entry
bootstrap interrupt gating
state preparation
deterministic acquisition/calc routing
LLM proposal
policy normalization

That matches the actual planner lifecycle better than the current single-node hub.

Better test seams

Current tests in tests/planner/test_planner_decisions.py heavily target plan_node(). After the split, tests can target:

tick for iteration/terminal semantics
bootstrap_gate for region/currency behavior
prepare_context for decomposition/schema composition
acquisition_gate for deterministic routing
decision_policy for redirect/cap behavior
llm_plan_node for prompt + structured-output handling

This isolates policy regressions instead of forcing broad plan_node() fixture coverage.

Better alignment with LangGraph

This follows the usual LangGraph split:

nodes do work and persist state updates
routing stays explicit in graph edges
interrupts remain at the surface node that actually pauses execution

Importantly, conditional edge functions should stay simple and side-effect free. Mutating state in edge functions would make checkpoint behavior harder to reason about.

What should not change

ask_user remains the only interrupt() node
planner thread namespacing in planner_thread_id() stays unchanged
outer run_planner_step() bootstrap/resume contract stays unchanged
domain helpers keep owning decomposition, acquisition, validation, and schema merge logic
planner still emits exactly one actionable decision per iteration

This proposal is an internal graph refactor, not an outer orchestration redesign.

Migration order

Stage 1: extract `tick`

First, change START -> plan to START -> tick -> plan. Remove the iteration increment and terminal checks from plan_node(), and keep the rest intact.

This gives the cleanest first seam with minimal checkpoint churn.

Stage 2: extract `bootstrap_gate`

Move region/currency gating out next. This preserves current interrupt behavior because the graph still routes through ask_user -> observe_user.

Stage 3: extract `prepare_context`

Move proactive decomposition and schema composition out before any planning decision.

Stage 4: extract `acquisition_gate`

Move deterministic calculator and acquisition routing out of the LLM path.

Stage 5: slim to `llm_plan_node()`

Rename the old planner node and strip it down to prompt construction, structured decision generation, and the post-LLM rewrites that Stage 6 removes.

Stage 6: extract `decision_policy`

Move all decision rewrites and retry caps into one post-planning policy node.

Rename and reorganize tests only after behavior is stable.

Risks and anti-patterns to avoid

Do not move state mutation into edge functions

Conditional edges should inspect state and return route labels only. Nodes should own mutations so checkpoints capture phase changes explicitly.
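One way to enforce this in tests is to hand edge functions a read-only view of the state. This sketch uses the standard library's `types.MappingProxyType`; the helper and the two example edge functions are hypothetical. The view is shallow, so nested mutable values are not protected.

```python
from types import MappingProxyType
from typing import Any, Callable, Dict, Mapping


def call_edge_read_only(
    edge: Callable[[Mapping[str, Any]], str], state: Dict[str, Any]
) -> str:
    """Run an edge function against a read-only view of the state."""
    return edge(MappingProxyType(state))


def good_edge(state: Mapping[str, Any]) -> str:
    # Inspects state and returns a route label only.
    return "terminal" if state.get("terminal") else "continue"


def bad_edge(state: Any) -> str:
    # Anti-pattern: mutating state inside a conditional edge.
    state["iterations"] = state.get("iterations", 0) + 1
    return "continue"
```

Calling `call_edge_read_only(bad_edge, ...)` raises `TypeError`, because a mapping proxy does not support item assignment, so the mutation is caught before it reaches a checkpoint.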

Do not increment iterations in every gate

Only tick should change iterations. Otherwise the counter no longer matches decision cycles, and attempt logs and turn-scoped search extraction in run_planner_step() will drift.
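A small guard can make this rule fail loudly during development. The wrapper below is a hypothetical sketch that assumes nodes return dict-shaped state updates and that the counter lives under an `iterations` key.

```python
from typing import Any, Callable, Dict

State = Dict[str, Any]
Node = Callable[[State], State]


def guard_iterations(name: str, node: Node) -> Node:
    """Wrap a node so that only `tick` may return an `iterations` update."""

    def wrapped(state: State) -> State:
        update = node(state)
        if name != "tick" and "iterations" in update:
            raise AssertionError(f"{name} must not change iterations")
        return update

    return wrapped
```

Wrapping each node at graph-construction time in tests is enough; production wiring does not need to change.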

Do not turn every policy rule into a node

One decision_policy node is enough. A longer chain like derived_redirect -> source_policy -> retry_policy -> calculator_policy would make the graph harder to follow than it is today.

Do not move interrupts into policy gates

ask_user should remain the single interrupting node. Gate nodes should create decisions, not pause execution themselves.

Recommendation

Adopt the staged graph split above.

The key design rule is:

deterministic orchestration belongs in graph phases; business logic stays in helpers; LLM planning stays small.

That keeps the planner graph honest: the graph shows the lifecycle, while helpers keep the domain complexity out of routing glue.