Planner Graph 5.4 Proposal
Problem
plan_node() (src/venturescope/planner/agent.py:846) currently does too many jobs inside one node:
- iteration ticking and terminal checks
- region/currency bootstrap gating
- schema composition and dynamic decomposition prep
- calculator-blocked acquisition recovery
- deterministic completion checks
- LLM planning
- post-LLM policy rewrites and retry caps
That makes the graph in docs/planner-graph-ref/current-graph.md look simple while the real orchestration is hidden inside one node. The result is harder checkpoint reasoning, harder test targeting, and more risk when changing one policy.
Proposal
Move orchestration policy to graph level, but do not explode every rule into its own node. The target shape should use a small decision pipeline:
- graph nodes for durable orchestration phases
- pure helpers for business rules and schema logic
- one slim LLM planning node that only proposes a PlannerDecision
- one policy node that normalizes that decision before routing
What should move out of plan_node()
1. Iteration and terminal control -> tick
Move these checks into a dedicated loop-entry node:
- increment iterations
- emit the planning-step event
- short-circuit when planner is already aborted
- stop on max_iters
All loopbacks should return to tick, not directly to the planner node. That preserves the current "one iteration per decision cycle" behavior used by attempt tracking in search_node(), observe_user_node(), and reflect_node().
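A minimal sketch of what tick could look like, assuming the planner state behaves like a plain dict and nodes return partial state updates. The key names (iterations, max_iters, aborted, events, terminal), the default cap, the order of the checks, and the route_after_tick function are illustrative assumptions; the real node should mirror whatever plan_node() does today.

```python
from typing import Any, Dict

State = Dict[str, Any]  # assumption: the real planner state type may differ


def tick(state: State) -> State:
    """Loop-entry node: the only place that increments iterations."""
    if state.get("aborted"):
        # Already aborted: short-circuit without counting another iteration.
        return {"terminal": True}
    if state.get("iterations", 0) >= state.get("max_iters", 10):
        # Assumed comparison; match the existing max_iters semantics.
        return {"terminal": True}
    iterations = state.get("iterations", 0) + 1
    events = list(state.get("events", []))
    events.append({"type": "planning_step", "iteration": iterations})
    return {"iterations": iterations, "events": events, "terminal": False}


def route_after_tick(state: State) -> str:
    """Pure edge function: reads state and returns a route label only."""
    return "terminal" if state.get("terminal") else "continue"
```

Keeping the terminal flag in state lets the edge function stay a one-line lookup, so the decision itself is captured by the checkpoint written after tick.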
2. Bootstrap questions -> bootstrap_gate
Move deterministic region/currency gating into a separate node:
- _needs_region_question()
- _needs_currency_question()
- creation of deterministic PlannerDecision(action="ask_user", ...)
This keeps bootstrap policy visible in the graph instead of being buried as an early return.
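One way the gate could be shaped, as a sketch only. The PlannerDecision class and both _needs_* helpers below are stand-ins so the example runs on its own; their real signatures, fields, and question wording are not shown in this proposal and are assumed here.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional

State = Dict[str, Any]  # assumption: plain-dict state


@dataclass
class PlannerDecision:
    # Stand-in for the real PlannerDecision; only the fields used here.
    action: str
    question: Optional[str] = None


def _needs_region_question(state: State) -> bool:
    # Stand-in: the real helper's signature and logic may differ.
    return not state.get("region")


def _needs_currency_question(state: State) -> bool:
    # Stand-in: the real helper's signature and logic may differ.
    return not state.get("currency")


def bootstrap_gate(state: State) -> State:
    """Create a deterministic ask_user decision, or clear it when ready."""
    if _needs_region_question(state):
        return {"decision": PlannerDecision("ask_user", "Which region applies?")}
    if _needs_currency_question(state):
        return {"decision": PlannerDecision("ask_user", "Which currency applies?")}
    return {"decision": None}


def route_after_bootstrap(state: State) -> str:
    """Pure edge function using the labels from the target flowchart."""
    decision = state.get("decision")
    if decision is not None and decision.action == "ask_user":
        return "needs region/currency"
    return "ready"
```

The gate only creates the decision. The pause itself still happens in ask_user, which keeps the single-interrupt rule intact.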
3. Schema/decomposition preparation -> prepare_context
Move context preparation before any planning decision:
- _proactive_decompositions()
- build_dynamic_recipes()
- compose_ready_fields()
- persistence of updated schema and dynamic_decompositions
The node should only orchestrate. Existing helpers should continue to own the domain logic.
4. Deterministic acquisition/calculation routing -> acquisition_gate
Move deterministic routing that does not require the planner LLM into a pre-LLM gate:
- calculation-attempt cap
- successful-calculation finish
- blocked-calculation acquisition selection
- blocked-field decomposition generation
- open acquisition task selection
- auto-finish checks when raw inputs are complete
This is the highest-value extraction because it removes calculator and acquisition policy from the LLM planning path.
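The gate could be organized as an ordered list of small checks where the first one that produces a decision wins, and "no decision" means the LLM is needed. The sketch below shows that shape with three of the checks listed above. The state keys, the cap value, and the dict used in place of PlannerDecision are all hypothetical.

```python
from typing import Any, Callable, Dict, List, Optional

State = Dict[str, Any]     # assumption: plain-dict state
Decision = Dict[str, Any]  # stand-in for PlannerDecision
Check = Callable[[State], Optional[Decision]]


def calculation_cap(state: State) -> Optional[Decision]:
    # Hypothetical cap: finish once calculation attempts are exhausted.
    if state.get("calc_attempts", 0) >= state.get("max_calc_attempts", 3):
        return {"action": "finish", "reason": "calculation attempt cap"}
    return None


def calculation_succeeded(state: State) -> Optional[Decision]:
    if state.get("calc_status") == "ok":
        return {"action": "finish", "reason": "calculation succeeded"}
    return None


def open_acquisition_task(state: State) -> Optional[Decision]:
    tasks = state.get("open_tasks", [])
    if tasks:
        return {"action": "search", "field": tasks[0]}
    return None


# Order encodes priority; the real ordering must match current behavior.
CHECKS: List[Check] = [calculation_cap, calculation_succeeded, open_acquisition_task]


def acquisition_gate(state: State) -> State:
    """Return the first deterministic decision, or None to defer to the LLM."""
    for check in CHECKS:
        decision = check(state)
        if decision is not None:
            return {"decision": decision}
    return {"decision": None}


def route_after_acquisition(state: State) -> str:
    if state.get("decision") is None:
        return "needs LLM"
    return "deterministic decision"
```

Each check is a pure function of state, so it can be unit-tested without building the graph or mocking the planner model.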
5. Decision rewrites and retry policy -> decision_policy
Move post-LLM normalization into one deterministic node:
- derived-field redirects
- premature ask_user -> search redirect for web-preferred fields
- duplicate/capped search fallback
- ask-user retry cap
- _adjust_calculation_decision()
Keep this as one policy node. Splitting every rewrite into separate nodes would add graph noise without improving clarity.
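Inside that single node, the rewrites can still be separate pure functions applied in a fixed order. The following is a sketch under assumptions: PlannerDecision is a stand-in, the state keys are invented, and falling back to finish at the ask-user cap is a guess at the current behavior.

```python
from dataclasses import dataclass, replace
from typing import Any, Callable, Dict, List, Optional

State = Dict[str, Any]  # assumption: plain-dict state


@dataclass(frozen=True)
class PlannerDecision:
    # Stand-in for the real PlannerDecision.
    action: str
    field: Optional[str] = None


Rewrite = Callable[[PlannerDecision, State], PlannerDecision]


def prefer_web_search(decision: PlannerDecision, state: State) -> PlannerDecision:
    # Redirect a premature ask_user to search for web-preferred fields.
    web_preferred = state.get("web_preferred", ())
    if decision.action == "ask_user" and decision.field in web_preferred:
        return replace(decision, action="search")
    return decision


def cap_ask_user(decision: PlannerDecision, state: State) -> PlannerDecision:
    # Hypothetical cap: stop asking once the retry budget is spent.
    capped = state.get("ask_attempts", 0) >= state.get("max_ask_attempts", 2)
    if decision.action == "ask_user" and capped:
        return replace(decision, action="finish")
    return decision


# One node, several rules: order is part of the policy.
REWRITES: List[Rewrite] = [prefer_web_search, cap_ask_user]


def decision_policy(state: State) -> State:
    decision = state["decision"]
    for rewrite in REWRITES:
        decision = rewrite(decision, state)
    return {"decision": decision}


def route_decision(state: State) -> str:
    """Pure edge function: the normalized action is the route label."""
    return state["decision"].action
```

This keeps the graph at one policy node while still letting each rule be tested in isolation.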
What should stay in the planning node
Rename plan_node() to llm_plan_node() and keep only work that truly belongs to the planner model:
- compute prompt inputs from current state
- call planner_prompt(...)
- call _llm().structured(..., schema=PlannerDecision)
- fall back to finish when structured output is invalid
This node should propose a decision, not run the whole planner.
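A sketch of the slimmed-down node, focused on the fallback path. The propose callable is a hypothetical stand-in for the planner_prompt(...) plus _llm().structured(...) pair, whose real signatures are not shown here; the PlannerDecision class and its validation are likewise assumed, with the action set taken from the target flowchart.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict

State = Dict[str, Any]  # assumption: plain-dict state
VALID_ACTIONS = {"search", "ask_user", "reflect", "calculate", "finish"}


@dataclass
class PlannerDecision:
    # Stand-in for the real schema; validates the action on construction.
    action: str
    reason: str = ""

    def __post_init__(self) -> None:
        if self.action not in VALID_ACTIONS:
            raise ValueError(f"unknown action: {self.action!r}")


def llm_plan_node(state: State, propose: Callable[[str], Dict[str, Any]]) -> State:
    """Propose one decision; fall back to finish on invalid structured output."""
    # Placeholder prompt; the real node would call planner_prompt(...).
    prompt = f"Missing fields: {sorted(state.get('missing', []))}"
    try:
        decision = PlannerDecision(**propose(prompt))
    except (TypeError, ValueError):
        # Wrong keys raise TypeError, unknown actions raise ValueError.
        decision = PlannerDecision("finish", "invalid structured output")
    return {"decision": decision}
```

For example, `llm_plan_node({"missing": ["revenue"]}, lambda prompt: {"action": "search"})` returns a search decision, while a payload with an unknown action or unexpected keys yields the finish fallback.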
Target graph shape
flowchart TD
START --> tick
tick -->|continue| bootstrap_gate
tick -->|terminal| finish
bootstrap_gate -->|needs region/currency| ask_user
bootstrap_gate -->|ready| prepare_context
prepare_context --> acquisition_gate
acquisition_gate -->|deterministic decision| decision_policy
acquisition_gate -->|needs LLM| llm_plan
llm_plan --> decision_policy
decision_policy -->|search| search
decision_policy -->|ask_user| ask_user
decision_policy -->|reflect| reflect
decision_policy -->|calculate| calculate
decision_policy -->|finish| finish
search -->|last_observation| observe
search -->|no observation| tick
observe --> tick
calculate --> tick
reflect --> tick
ask_user --> observe_user
observe_user --> tick
finish --> END
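The flowchart above can be transcribed into a plain edge table, which makes the loopback rule checkable without running the graph. This is an illustrative sketch in plain Python, not LangGraph wiring; EDGES, PIPELINE, and entry_points are names invented for the example.

```python
from typing import Dict, Set

# Edge table transcribed from the flowchart: node -> {route label: next node}.
# An empty label means an unconditional edge.
EDGES: Dict[str, Dict[str, str]] = {
    "START": {"": "tick"},
    "tick": {"continue": "bootstrap_gate", "terminal": "finish"},
    "bootstrap_gate": {
        "needs region/currency": "ask_user",
        "ready": "prepare_context",
    },
    "prepare_context": {"": "acquisition_gate"},
    "acquisition_gate": {
        "deterministic decision": "decision_policy",
        "needs LLM": "llm_plan",
    },
    "llm_plan": {"": "decision_policy"},
    "decision_policy": {
        "search": "search",
        "ask_user": "ask_user",
        "reflect": "reflect",
        "calculate": "calculate",
        "finish": "finish",
    },
    "search": {"last_observation": "observe", "no observation": "tick"},
    "observe": {"": "tick"},
    "calculate": {"": "tick"},
    "reflect": {"": "tick"},
    "ask_user": {"": "observe_user"},
    "observe_user": {"": "tick"},
    "finish": {"": "END"},
}

# The decision pipeline that every iteration must enter through tick.
PIPELINE: Set[str] = {
    "bootstrap_gate",
    "prepare_context",
    "acquisition_gate",
    "llm_plan",
    "decision_policy",
}


def entry_points(targets: Set[str]) -> Set[str]:
    """Nodes outside `targets` that have at least one edge into `targets`."""
    return {
        src
        for src, routes in EDGES.items()
        if src not in targets and targets & set(routes.values())
    }
```

With this table, `entry_points(PIPELINE)` evaluates to `{"tick"}`: no action node loops back into the pipeline directly, so each decision cycle passes through tick exactly once.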
Why this is a better fit
Graph-level visibility
Checkpointed phase boundaries become explicit:
- loop entry
- bootstrap interrupt gating
- state preparation
- deterministic acquisition/calc routing
- LLM proposal
- policy normalization
That matches the actual planner lifecycle better than the current single-node hub.
Better test seams
Current tests in tests/planner/test_planner_decisions.py heavily target plan_node(). After the split, tests can target:
- tick for iteration/terminal semantics
- bootstrap_gate for region/currency behavior
- prepare_context for decomposition/schema composition
- acquisition_gate for deterministic routing
- decision_policy for redirect/cap behavior
- llm_plan_node for prompt + structured-output handling
This isolates policy regressions instead of forcing broad plan_node() fixture coverage.
Better alignment with LangGraph
This follows the usual LangGraph split:
- nodes do work and persist state updates
- routing stays explicit in graph edges
- interrupts remain at the surface node that actually pauses execution
Importantly, conditional edge functions should stay simple and side-effect free. Mutating state in edge functions would make checkpoint behavior harder to reason about.
What should not change
- ask_user remains the only interrupt() node
- planner thread namespacing in planner_thread_id() stays unchanged
- outer run_planner_step() bootstrap/resume contract stays unchanged
- domain helpers keep owning decomposition, acquisition, validation, and schema merge logic
- planner still emits exactly one actionable decision per iteration
This proposal is an internal graph refactor, not an outer orchestration redesign.
Migration order
Stage 1: extract tick
First, change START -> plan to START -> tick -> plan. Remove the iteration increment and terminal checks from plan_node(), but keep the rest intact.
This gives the cleanest first seam with minimal checkpoint churn.
Stage 2: extract bootstrap_gate
Move region/currency gating out next. This preserves current interrupt behavior because the graph still routes through ask_user -> observe_user.
Stage 3: extract prepare_context
Move proactive decomposition and schema composition out before any planning decision.
Stage 4: extract acquisition_gate
Move deterministic calculator and acquisition routing out of the LLM path.
Stage 5: slim to llm_plan_node()
Reduce the old planner node to prompt construction and structured decision generation only.
Stage 6: extract decision_policy
Move all decision rewrites and retry caps into one post-planning policy node.
Only after behavior is stable should tests be renamed/reorganized.
Risks and anti-patterns to avoid
Do not move state mutation into edge functions
Conditional edges should inspect state and return route labels only. Nodes should own mutations so checkpoints capture phase changes explicitly.
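This rule can be enforced in tests with a small purity check. The sketch below assumes a plain-dict state; route_after_search mirrors the search edge labels from the target flowchart, and assert_pure is a hypothetical test helper, not an existing utility.

```python
import copy
from typing import Any, Callable, Dict

State = Dict[str, Any]  # assumption: plain-dict state


def route_after_search(state: State) -> str:
    """Pure edge function: inspects state and returns a label, nothing else."""
    if state.get("last_observation"):
        return "last_observation"
    return "no observation"


def assert_pure(route: Callable[[State], str], state: State) -> str:
    """Test helper: fail if the edge function mutates the state it was given."""
    before = copy.deepcopy(state)
    label = route(state)
    assert state == before, f"{route.__name__} mutated state"
    return label
```

Wrapping every edge function in a check like this during tests catches accidental mutation early, before it shows up as confusing checkpoint contents.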
Do not increment iterations in every gate
Only tick should change iterations. Otherwise attempt logs and turn-scoped search extraction in run_planner_step() will drift.
Do not turn every policy rule into a node
One decision_policy node is enough. A longer chain like derived_redirect -> source_policy -> retry_policy -> calculator_policy would make the graph harder to follow than it is today.
Do not move interrupts into policy gates
ask_user should remain the single interrupting node. Gate nodes should create decisions, not pause execution themselves.
Recommendation
Adopt the staged graph split above.
The key design rule is:
deterministic orchestration belongs in graph phases; business logic stays in helpers; LLM planning stays small.
That keeps the planner graph honest: the graph shows the lifecycle, while helpers keep the domain complexity out of routing glue.