# Thesis Index for `docs/planner-graph-ref/analyse`

This file normalizes the main theses from the evaluator analyses in this directory.

Source evaluator docs:

- `deepseek-4-pro-range.md`
- `gemini-3.1-pro-range.md`
- `glm-5.1-pro-range.md`
- `gpt-5.4-range.md`
- `gpt-5.5-range.md`
- `kimi-2.6-range.md`
- `mimo-2.5-pro-range.md`
- `opus-4.7-range.md`
- `qwen-3.6-range.md`
- `qwen-3.7-max-range.md`

Cross-check used for ranking counts:

- `rankings-matrix.md`

## Method

- Theses are normalized: semantically equivalent statements are merged into one line.
- "Agreeing evaluators" means the evaluator states the thesis directly or clearly endorses it in its final ranking or synthesis.
- Some theses conflict with each other. This file preserves those disagreements instead of forcing a false consensus.
- `rankings-matrix.md` is used to cross-check ranking counts. It is not listed as an evaluator because it is an aggregate synthesis file.
- The companion stance grid lives in `thesis-matrix.md` and marks each evaluator as agree / disagree / no-position for theses 1-40.
- The ranked agreement chart lives in `thesis-agreement-chart.md` and sorts theses by evaluator agreement count.
- The evaluator ranking by summed thesis commonness lives in `evaluator-common-thesis-ranking.md`.
- The evaluator preferability ranking for "most deliberated and comprehensive analysis" lives in `evaluator-preferability-ranking.md`.
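The agreement chart amounts to counting agreeing evaluators per thesis and sorting by that count. The sketch below shows that calculation. It assumes a hypothetical in-memory mapping from a thesis label to its set of agreeing evaluators, because the storage format of `thesis-agreement-chart.md` is not described here. The two sample labels are shortened versions of theses from this index.

```python
from collections.abc import Mapping

# The ten evaluators indexed in this file.
EVALUATORS = frozenset({
    "deepseek-4-pro", "gemini-3.1-pro", "glm-5.1", "gpt-5.4", "gpt-5.5",
    "kimi-2.6", "mimo-2.5-pro", "opus-4.7", "qwen-3.6-plus", "qwen-3.7-max",
})


def agreement_chart(stances: Mapping[str, set[str]]) -> list[tuple[str, int]]:
    """Return (thesis, agreeing-evaluator count) pairs, highest count first."""
    rows = []
    for thesis, agreeing in stances.items():
        unknown = set(agreeing) - EVALUATORS
        if unknown:
            raise ValueError(f"unknown evaluators for {thesis!r}: {sorted(unknown)}")
        rows.append((thesis, len(agreeing)))
    # Count descending, then alphabetical so ties sort deterministically.
    return sorted(rows, key=lambda row: (-row[1], row[0]))


def is_universal(count: int) -> bool:
    """A thesis is universal when every indexed evaluator agrees."""
    return count == len(EVALUATORS)


# Hypothetical sample input: shortened labels for two theses in this index.
SAMPLE = {
    "decision_origin beats a bare llm_failed boolean": {"deepseek-4-pro", "gpt-5.5"},
    "ask_user should remain the single interrupting node": set(EVALUATORS),
}

for thesis, count in agreement_chart(SAMPLE):
    marker = "universal" if is_universal(count) else ""
    print(f"{count:2d}/{len(EVALUATORS)}  {thesis}  {marker}".rstrip())
```

Sorting ties alphabetically keeps the chart stable between regenerations. The evaluator check catches typos in evaluator names before they silently lower a count.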

## 1. Universal theses

All ten evaluator docs (deepseek-4-pro, gemini-3.1-pro, glm-5.1, gpt-5.4, gpt-5.5, kimi-2.6, mimo-2.5-pro, opus-4.7, qwen-3.6-plus, qwen-3.7-max) agree on every thesis in this section:

- `plan_node` is a monolith ("god node") and should be split.
- The current graph is misleading because too much routing intelligence is hidden inside `plan_node`.
- Deterministic pre-LLM logic belongs in graph-visible phases.
- The LLM step should shrink to a thin planning phase rather than remain a mixed orchestration node.
- Loop-back edges from action nodes should re-enter at a top-of-loop phase, not jump back into the middle of planning.
- `ask_user` should remain the single interrupting node.
- The outer conversation graph is out of scope for this refactor.
- The right end state is a medium-granularity pipeline of durable phases, not one node per `if` branch.
- Heavy over-decomposition adds graph noise and checkpoint overhead.
- Conservative 2-3 node splits may be acceptable as migration slices, but they are not the strongest final architecture.
- Post-LLM routing/policy must become explicit architecture, not remain buried in a catch-all planner function.
- Redirect / mutation logic does not belong in edge functions in the final design.
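The universal theses describe a target shape rather than code. The sketch below illustrates that shape without any graph framework and assumes a plain dict as the checkpointed state. Apart from `ask_user`, the node names (`loop_entry`, `bootstrap_gate`, `plan`, `post_policy`, `call_tool`) and all state keys are hypothetical, and the LLM is stubbed. Nodes return partial updates, edge functions only read state, and the only loop-back edge re-enters at the top-of-loop phase.

```python
from typing import Callable

State = dict  # checkpointed state, simplified to a plain dict
END = "__end__"


# Nodes: each returns a partial state update.
def loop_entry(state: State) -> dict:
    # Top-of-loop phase: every loop-back edge re-enters here.
    return {"iterations": state.get("iterations", 0) + 1}


def bootstrap_gate(state: State) -> dict:
    # Deterministic pre-LLM check, visible in the graph instead of hidden in planning.
    return {"missing_region": state.get("region") is None}


def plan(state: State) -> dict:
    # Thin LLM phase, stubbed: call a tool once, then finish.
    return {"raw_decision": "call_tool" if state["iterations"] < 2 else "FINISH"}


def post_policy(state: State) -> dict:
    # Explicit post-LLM policy phase (this sketch only normalizes the decision).
    return {"decision": state["raw_decision"].strip().lower()}


def call_tool(state: State) -> dict:
    return {"tool_calls": state.get("tool_calls", 0) + 1}


def ask_user(state: State) -> dict:
    # The single interrupting node; this sketch simply stops the run here.
    return {"interrupted": True}


NODES: dict[str, Callable[[State], dict]] = {
    "loop_entry": loop_entry, "bootstrap_gate": bootstrap_gate, "plan": plan,
    "post_policy": post_policy, "call_tool": call_tool, "ask_user": ask_user,
}

# Edges: pure functions that only read state and name the next node.
EDGES: dict[str, Callable[[State], str]] = {
    "loop_entry": lambda s: "bootstrap_gate",
    "bootstrap_gate": lambda s: "ask_user" if s["missing_region"] else "plan",
    "plan": lambda s: "post_policy",
    "post_policy": lambda s: "call_tool" if s["decision"] == "call_tool" else END,
    "call_tool": lambda s: "loop_entry",  # loop back to the top, not into planning
    "ask_user": lambda s: END,
}


def run(state: State, max_steps: int = 50) -> tuple[State, list[str]]:
    """Execute nodes until END, returning the final state and the visited nodes."""
    node, trace = "loop_entry", []
    while node != END and len(trace) < max_steps:
        trace.append(node)
        state = {**state, **NODES[node](state)}
        node = EDGES[node](state)
    return state, trace
```

Each phase is one durable step rather than one `if` branch, which is the medium granularity the evaluators converge on. A real graph runtime would checkpoint between steps and resume after the `ask_user` interrupt instead of stopping.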

## 2. Strong-majority theses

gpt-5.4 is the strongest baseline and the consensus winner.
- Agreeing evaluators: gemini-3.1-pro, gpt-5.4, gpt-5.5, opus-4.7, qwen-3.6-plus, qwen-3.7-max

deepseek-4-pro is the strongest 4-node alternative and the most stable second-tier option.
- Agreeing evaluators: deepseek-4-pro, gemini-3.1-pro, kimi-2.6, mimo-2.5-pro, qwen-3.7-max

gpt-5.5 contributes the best state hygiene, consequence analysis, and test-migration ideas, even if its full topology is too granular.
- Agreeing evaluators: deepseek-4-pro, glm-5.1, gpt-5.5, opus-4.7, qwen-3.7-max

The planner needs explicit LLM-failure provenance (`llm_failed` or `decision_origin`) so failure paths are not rewritten into bad calculator/reflect loops.
- Agreeing evaluators: deepseek-4-pro, glm-5.1, gpt-5.5, opus-4.7, qwen-3.7-max

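The value of provenance is that the post-LLM policy can decline to "correct" a decision that never existed. The sketch below assumes hypothetical `decision_origin` values, a stand-in `LLMError`, and hypothetical target nodes `report_failure` and `calculator`. It stores the enum's string value so the checkpoint holds only plain strings.

```python
from enum import Enum
from typing import Callable


class DecisionOrigin(str, Enum):
    """Hypothetical values for a decision_origin state field."""
    LLM = "llm"
    LLM_FAILED = "llm_failed"
    DETERMINISTIC = "deterministic"  # e.g. a rule-based acquisition decision


class LLMError(Exception):
    """Stand-in for whatever error the real LLM client raises."""


def plan_phase(state: dict, call_llm: Callable[[dict], str]) -> dict:
    try:
        decision = call_llm(state)
    except LLMError:
        # Record the failure explicitly instead of inventing a decision.
        return {"decision": None, "decision_origin": DecisionOrigin.LLM_FAILED.value}
    return {"decision": decision, "decision_origin": DecisionOrigin.LLM.value}


def route_after_policy(state: dict) -> str:
    if state["decision_origin"] == DecisionOrigin.LLM_FAILED.value:
        # Failure path: never rewritten into a calculator/reflect retry.
        return "report_failure"
    if state["decision"] == "finish" and state.get("unverified_numbers"):
        # A corrective rewrite that only applies to genuine decisions.
        return "calculator"
    return state["decision"]


def broken_llm(state: dict) -> str:
    raise LLMError("timeout")


state = {"unverified_numbers": True}
state |= plan_phase(state, broken_llm)
print(route_after_policy(state))  # report_failure
```

A three-valued field also leaves room for deterministic decisions, which a bare boolean cannot express.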
Exactly one loop-entry node should own incrementing the iteration count.
- Agreeing evaluators: deepseek-4-pro, gpt-5.4, gpt-5.5, kimi-2.6, opus-4.7, qwen-3.6-plus, qwen-3.7-max

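Single ownership ensures that each pass through the loop counts exactly once, whichever action node looped back. The sketch below assumes a hypothetical `iterations` key, a hypothetical budget of 8, and two stub action nodes. The budget check is derived from `iterations`, so no separate flag has to be checkpointed.

```python
MAX_ITERATIONS = 8  # hypothetical loop budget


def loop_entry(state: dict) -> dict:
    """The only node that writes the (hypothetical) iterations key."""
    return {"iterations": state.get("iterations", 0) + 1}


def budget_exhausted(state: dict) -> bool:
    # Derived from iterations on demand rather than stored as a flag.
    return state["iterations"] > MAX_ITERATIONS


# Action nodes update their own keys and never touch iterations.
ACTION_NODES = {
    "call_tool": lambda s: {"tool_calls": s.get("tool_calls", 0) + 1},
    "reflect": lambda s: {"reflections": s.get("reflections", 0) + 1},
}


def run_loop(actions: list[str]) -> dict:
    """Simulate one loop pass per action; every pass re-enters at loop_entry."""
    state: dict = {}
    for action in actions:
        state |= loop_entry(state)
        if budget_exhausted(state):
            break
        state |= ACTION_NODES[action](state)
    return state
```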
`_route` / route-tag fields in checkpointed state are a bad pattern.
- Agreeing evaluators: deepseek-4-pro, gpt-5.4, kimi-2.6, mimo-2.5-pro, opus-4.7, qwen-3.6-plus, qwen-3.7-max

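A route tag can go wrong by outliving the facts it was computed from. The sketch below assumes a plain dict as checkpointed state and hypothetical `missing_fields` / `region` keys. `_route` is the pattern the evaluators reject and is shown only for contrast.

```python
# Anti-pattern, shown only for contrast: the node stores its routing verdict.
def plan_with_route_tag(state: dict) -> dict:
    return {"_route": "ask_user" if state.get("missing_fields") else "call_tool"}


def edge_from_tag(state: dict) -> str:
    return state["_route"]


# Preferred: the edge recomputes the route from durable facts each time.
def edge_from_facts(state: dict) -> str:
    return "ask_user" if state.get("missing_fields") else "call_tool"


state = {"missing_fields": ["region"]}
state |= plan_with_route_tag(state)
# Later, the user's answer is merged into the checkpoint and the gap closes.
state |= {"missing_fields": [], "region": "EU"}

print(edge_from_tag(state))    # ask_user  (stale verdict still checkpointed)
print(edge_from_facts(state))  # call_tool (reflects current facts)
```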
Caching recipes / `FieldAcquisition`-like acquisition objects in planner state is risky until serializer compatibility is proven.
- Agreeing evaluators: gpt-5.4, gpt-5.5, opus-4.7, qwen-3.7-max

Explicit state-write ownership is worth adopting as implementation documentation.
- Agreeing evaluators: deepseek-4-pro, glm-5.1, gpt-5.5, opus-4.7

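An ownership table can be checked mechanically as well as read. The sketch below assumes that nodes return partial-update dicts. The table contents and node names are hypothetical and are not copied from the Opus or GPT-5.5 tables.

```python
import functools
from typing import Callable

# Hypothetical ownership table: node name -> state keys it may write.
WRITE_OWNERSHIP: dict[str, frozenset[str]] = {
    "loop_entry": frozenset({"iterations"}),
    "bootstrap_gate": frozenset({"missing_fields"}),
    "plan": frozenset({"decision", "decision_origin"}),
    "call_tool": frozenset({"tool_results"}),
}


def violations(node: str, update: dict) -> list[str]:
    """Keys in update that node does not own."""
    return sorted(set(update) - WRITE_OWNERSHIP.get(node, frozenset()))


def shared_keys() -> set[str]:
    """Keys claimed by more than one node (should stay empty)."""
    seen: set[str] = set()
    shared: set[str] = set()
    for keys in WRITE_OWNERSHIP.values():
        shared |= seen & keys
        seen |= keys
    return shared


def owns(node: str):
    """Decorator that rejects partial updates touching keys the node does not own."""
    def decorator(fn: Callable[[dict], dict]) -> Callable[[dict], dict]:
        @functools.wraps(fn)
        def wrapper(state: dict) -> dict:
            update = fn(state)
            bad = violations(node, update)
            if bad:
                raise RuntimeError(f"{node} wrote keys it does not own: {bad}")
            return update
        return wrapper
    return decorator


@owns("call_tool")
def call_tool(state: dict) -> dict:
    return {"tool_results": ["ok"]}
```

Running `shared_keys()` in a unit test keeps the documented table and the code from drifting apart.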
Calculator lifecycle deserves explicit, named handling (a gate, a policy seam, or a dedicated concern), not incidental inline logic.
- Agreeing evaluators: deepseek-4-pro, glm-5.1, gpt-5.5, kimi-2.6, qwen-3.7-max

GLM's `_build_plan_output()` helper is worth adopting regardless of the final graph shape.
- Agreeing evaluators: deepseek-4-pro, glm-5.1, qwen-3.6-plus, qwen-3.7-max

Kimi's decision-matrix / "what-goes-where" appendix is one of the best implementation reference artifacts in the set.
- Agreeing evaluators: deepseek-4-pro, glm-5.1, gpt-5.5, kimi-2.6, qwen-3.7-max

Opus's state/write mapping tables are valuable reference material, even if the proposed graph is too chatty.
- Agreeing evaluators: deepseek-4-pro, glm-5.1, gpt-5.5, opus-4.7

qwen-3.7-max is useful as implementation pseudo-code / reference, but not as the target topology.
- Agreeing evaluators: deepseek-4-pro, glm-5.1, opus-4.7, qwen-3.6-plus, qwen-3.7-max

qwen-3.6-plus contributes a useful finish-validation idea (`route_finish_check` / finish re-check), even though the rest of its design should not be adopted as-is.
- Agreeing evaluators: deepseek-4-pro, mimo-2.5-pro, opus-4.7, qwen-3.6-plus, qwen-3.7-max

## 3. Contested theses

Dedicated `ask_region` / `ask_currency` nodes improve bootstrap visibility and graph honesty.
- Agreeing evaluators: kimi-2.6, qwen-3.6-plus

Dedicated bootstrap nodes are mostly graph noise; one gate feeding the existing `ask_user` path is better.
- Agreeing evaluators: deepseek-4-pro, gemini-3.1-pro, glm-5.1, gpt-5.4, gpt-5.5, mimo-2.5-pro, opus-4.7, qwen-3.7-max

`route_direct` is a good adapter pattern for converging deterministic acquisition decisions and LLM decisions into one post-policy phase.
- Agreeing evaluators: deepseek-4-pro, kimi-2.6, qwen-3.7-max

`decision_origin` is a better long-term state field than a bare `llm_failed` boolean.
- Agreeing evaluators: deepseek-4-pro, gpt-5.5

Splitting post-LLM policy into `normalize_decision -> retry_gate -> maybe_calculate` improves correctness and testability despite the extra hops.
- Agreeing evaluators: glm-5.1, gpt-5.5, qwen-3.7-max

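The three-phase split gives each policy concern a small function that can be tested on its own. The sketch below uses the three phase names from this thesis. Their bodies, the action set, the retry budget, and the `call_calculator` rewrite are hypothetical stand-ins, not code from the GLM, GPT-5.5, or Qwen proposals.

```python
VALID_ACTIONS = frozenset({"call_tool", "ask_user", "finish"})
MAX_RETRIES = 2  # hypothetical re-prompt budget


def normalize_decision(state: dict) -> dict:
    """Map raw LLM output onto a known action, or None if it is unusable."""
    raw = (state.get("raw_decision") or "").strip().lower()
    return {"decision": raw if raw in VALID_ACTIONS else None}


def retry_gate(state: dict) -> dict:
    """Count a retry only when normalization produced nothing usable."""
    if state["decision"] is None:
        return {"retries": state.get("retries", 0) + 1}
    return {}


def route_after_retry_gate(state: dict) -> str:
    if state["decision"] is not None:
        return "maybe_calculate"
    return "plan" if state["retries"] <= MAX_RETRIES else "ask_user"


def maybe_calculate(state: dict) -> dict:
    """Insert a calculator step before finishing if numbers are unverified."""
    if state["decision"] == "finish" and state.get("unverified_numbers"):
        return {"decision": "call_calculator"}
    return {}


def run_policy(state: dict) -> tuple[dict, str]:
    """Apply the three phases in order; return the new state and next node."""
    state = {**state, **normalize_decision(state)}
    state = {**state, **retry_gate(state)}
    target = route_after_retry_gate(state)
    if target != "maybe_calculate":
        return state, target
    state = {**state, **maybe_calculate(state)}
    return state, state["decision"]
```

Each phase takes and returns a plain dict, which is the testability argument. The cost is the two extra hops per loop pass that the dissenting evaluators object to.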
DeepSeek's `plan -> plan` self-loop is the cleanest way to surface late decomposition / composite-target re-prompting.
- Agreeing evaluators: deepseek-4-pro, opus-4.7, qwen-3.7-max

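The self-loop makes the re-prompt after a late decomposition an explicit `plan -> plan` edge rather than a hidden retry inside the node. The sketch below shows only that edge shape and is not DeepSeek's code. It assumes a hypothetical `is_composite` rule, an `acquire:<target>` decision format, and a `post_policy` successor.

```python
def is_composite(target: str) -> bool:
    # Hypothetical rule: "X and Y" names two targets.
    return " and " in target


def plan(state: dict) -> dict:
    if state.get("subtargets") is None and is_composite(state["target"]):
        # Late decomposition: record the parts and leave the decision empty,
        # so the next pass through plan re-prompts with the smaller targets.
        return {"subtargets": state["target"].split(" and "), "decision": None}
    focus = (state.get("subtargets") or [state["target"]])[0]
    return {"decision": f"acquire:{focus}"}


def route_after_plan(state: dict) -> str:
    # No decision yet means decomposition just happened: take the plan -> plan edge.
    return "plan" if state["decision"] is None else "post_policy"


def run_plan_phase(state: dict, max_passes: int = 3) -> tuple[dict, list[str]]:
    """Run plan until it hands off, capping self-loop passes."""
    hops: list[str] = []
    node = "plan"
    while node == "plan" and len(hops) < max_passes:
        hops.append(node)
        state = {**state, **plan(state)}
        node = route_after_plan(state)
    return state, hops + [node]
```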
Keeping recipes in planner state is a worthwhile cache if compatibility is verified.
- Agreeing evaluators: deepseek-4-pro

Opus-style `g_*` / `c_*` naming improves graph self-documentation.
- Agreeing evaluators: deepseek-4-pro, opus-4.7

A dedicated dispatch phase is useful as a canonical final log-and-route point.
- Agreeing evaluators: deepseek-4-pro, opus-4.7, qwen-3.7-max
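A dispatch phase concentrates two responsibilities: logging the final decision once and mapping it to an action node in one edge. The sketch below uses Python's standard `logging` module. The logger name, the action-node set, and the state keys are hypothetical.

```python
import logging

logger = logging.getLogger("planner.dispatch")

# Hypothetical set of nodes the dispatch phase may hand off to.
ACTION_NODES = frozenset({"call_tool", "calculator", "ask_user", "finish"})


def dispatch(state: dict) -> dict:
    """Canonical final log point: one structured line per routed decision."""
    logger.info(
        "iteration=%s decision=%s origin=%s",
        state.get("iterations"), state["decision"], state.get("decision_origin"),
    )
    return {}  # logs only; writes no state


def route_from_dispatch(state: dict) -> str:
    """Canonical final route point: the only edge that maps decisions to actions."""
    decision = state["decision"]
    if decision not in ACTION_NODES:
        raise ValueError(f"dispatch received an unknown decision: {decision!r}")
    return decision


logging.basicConfig(level=logging.INFO)
state = {"iterations": 3, "decision": "calculator", "decision_origin": "llm"}
dispatch(state)
print(route_from_dispatch(state))  # calculator
```

Because `dispatch` writes no state, the extra hop costs a checkpoint step but adds nothing to the checkpoint payload.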

## 4. Proposal-specific rejection theses

qwen-3.6-plus's redirect-in-edge pattern should be rejected, even if some of its finish-validation ideas are retained.
- Agreeing evaluators: deepseek-4-pro, gemini-3.1-pro, glm-5.1, gpt-5.4, gpt-5.5, kimi-2.6, mimo-2.5-pro, opus-4.7, qwen-3.7-max

The full Opus corrector lattice is analytically excellent but too chatty for production.
- Agreeing evaluators: deepseek-4-pro, gemini-3.1-pro, glm-5.1, gpt-5.4, gpt-5.5, kimi-2.6, mimo-2.5-pro, opus-4.7, qwen-3.6-plus, qwen-3.7-max

Gemini's `evaluate_rules` shape is too coarse and would become a new god node.
- Agreeing evaluators: deepseek-4-pro, gemini-3.1-pro, glm-5.1, gpt-5.4, gpt-5.5, kimi-2.6, mimo-2.5-pro, opus-4.7, qwen-3.6-plus, qwen-3.7-max

Mimo's `decide` node still hides too much logic and does not solve enough of the original problem.
- Agreeing evaluators: deepseek-4-pro, gemini-3.1-pro, glm-5.1, gpt-5.4, gpt-5.5, kimi-2.6, mimo-2.5-pro, opus-4.7, qwen-3.6-plus, qwen-3.7-max

qwen-3.7-max's `_route`-heavy design is brittle, even if its code sketches are useful.
- Agreeing evaluators: deepseek-4-pro, gpt-5.4, kimi-2.6, mimo-2.5-pro, opus-4.7, qwen-3.6-plus

## 5. Ranking theses

gpt-5.4 was ranked #1 by six evaluators.
- Agreeing evaluators: gemini-3.1-pro, gpt-5.4, gpt-5.5, opus-4.7, qwen-3.6-plus, qwen-3.7-max

deepseek-4-pro was ranked #1 by three evaluators.
- Agreeing evaluators: deepseek-4-pro, kimi-2.6, mimo-2.5-pro

gpt-5.5 was ranked #1 by one evaluator.
- Agreeing evaluators: glm-5.1

gpt-5.4 and deepseek-4-pro together form the clear top tier across the directory.
- Agreeing evaluators: deepseek-4-pro, gemini-3.1-pro, gpt-5.5, kimi-2.6, mimo-2.5-pro, opus-4.7, qwen-3.6-plus, qwen-3.7-max
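The #1 counts above can be re-derived mechanically as a consistency check alongside `rankings-matrix.md`. The mapping below is transcribed from the agreeing-evaluator lists in this section, not read from the matrix file.

```python
from collections import Counter

# Evaluator -> proposal it ranked #1, transcribed from the lists above.
FIRST_PLACE = {
    "deepseek-4-pro": "deepseek-4-pro",
    "gemini-3.1-pro": "gpt-5.4",
    "glm-5.1": "gpt-5.5",
    "gpt-5.4": "gpt-5.4",
    "gpt-5.5": "gpt-5.4",
    "kimi-2.6": "deepseek-4-pro",
    "mimo-2.5-pro": "deepseek-4-pro",
    "opus-4.7": "gpt-5.4",
    "qwen-3.6-plus": "gpt-5.4",
    "qwen-3.7-max": "gpt-5.4",
}

tally = Counter(FIRST_PLACE.values())
# Every evaluator casts exactly one #1 vote, so the counts must sum to ten.
assert sum(tally.values()) == len(FIRST_PLACE) == 10

for proposal, votes in tally.most_common():
    print(f"{proposal}: {votes}")
# gpt-5.4: 6
# deepseek-4-pro: 3
# gpt-5.5: 1
```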
