Thesis Index for `docs/planner-graph-ref/analyse`
This file normalizes the main theses from the evaluator analyses in this directory.
Source evaluator docs:
- `deepseek-4-pro-range.md`
- `gemini-3.1-pro-range.md`
- `glm-5.1-pro-range.md`
- `gpt-5.4-range.md`
- `gpt-5.5-range.md`
- `kimi-2.6-range.md`
- `mimo-2.5-pro-range.md`
- `opus-4.7-range.md`
- `qwen-3.6-range.md`
- `qwen-3.7-max-range.md`
Cross-check used for ranking counts:
- `rankings-matrix.md`
Method
- Theses are normalized: semantically equivalent statements are merged into one line.
- "Agreeing evaluators" means the evaluator states the thesis directly or clearly endorses it in its final ranking/synthesis.
- Some theses conflict with each other. This file preserves those disagreements instead of forcing a false consensus.
- `rankings-matrix.md` is used to cross-check ranking counts, but it is not listed as an evaluator because it is an aggregate synthesis file.
- The companion stance grid lives in `thesis-matrix.md` and marks each evaluator as agree / disagree / no-position for theses 1-40.
- The ranked agreement chart lives in `thesis-agreement-chart.md` and sorts theses by evaluator agreement count.
- The evaluator ranking by summed thesis commonness lives in `evaluator-common-thesis-ranking.md`.
- The evaluator preferability ranking for "most deliberated and comprehensive analysis" lives in `evaluator-preferability-ranking.md`.
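The counting rules above can be sketched in a few lines of Python. This is a minimal illustration, not the script that produced the companion files: the grid shape (thesis id mapped to per-evaluator stances), the placeholder thesis ids and evaluator names, and the scoring used for the commonness ranking (an evaluator earns the agreement count of every thesis it agrees with) are all assumptions.

```python
from typing import Dict, List, Tuple

# Stance values used by the companion grid.
Stance = str  # "agree" | "disagree" | "no-position"

# Hypothetical placeholder grid: thesis id -> {evaluator: stance}.
GRID: Dict[str, Dict[str, Stance]] = {
    "T1": {"eval-a": "agree", "eval-b": "agree", "eval-c": "agree"},
    "T2": {"eval-a": "agree", "eval-b": "disagree", "eval-c": "no-position"},
    "T3": {"eval-a": "no-position", "eval-b": "agree", "eval-c": "agree"},
}


def agreement_counts(grid: Dict[str, Dict[str, Stance]]) -> Dict[str, int]:
    # Only explicit "agree" stances count; "disagree" and "no-position" do not.
    return {t: sum(s == "agree" for s in row.values()) for t, row in grid.items()}


def agreement_chart(grid: Dict[str, Dict[str, Stance]]) -> List[Tuple[str, int]]:
    # Sort theses by agreement count, highest first; ties break by thesis id.
    counts = agreement_counts(grid)
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


def commonness_ranking(grid: Dict[str, Dict[str, Stance]]) -> List[Tuple[str, int]]:
    # Assumed scoring: an evaluator earns each agreed thesis's agreement count.
    counts = agreement_counts(grid)
    scores: Dict[str, int] = {}
    for thesis, row in grid.items():
        for evaluator, stance in row.items():
            scores.setdefault(evaluator, 0)
            if stance == "agree":
                scores[evaluator] += counts[thesis]
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
```

With the placeholder grid, `agreement_chart(GRID)` returns `[("T1", 3), ("T3", 2), ("T2", 1)]`. Under this scoring, evaluators that side with widely shared theses rise to the top of the commonness ranking, which is why that ranking is kept separate from the preferability ranking.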
1. Universal theses
All ten evaluator docs agree on every thesis in this section:
`deepseek-4-pro`, `gemini-3.1-pro`, `glm-5.1`, `gpt-5.4`, `gpt-5.5`, `kimi-2.6`, `mimo-2.5-pro`, `opus-4.7`, `qwen-3.6-plus`, `qwen-3.7-max`

- `plan_node` is a monolith / god node and should be split.
- The current graph is misleading because too much routing intelligence is hidden inside `plan_node`.
- Deterministic pre-LLM logic belongs in graph-visible phases.
- The LLM step should shrink to a thin planning phase rather than remain a mixed orchestration node.
- Loop-back edges from action nodes should re-enter at a top-of-loop phase, not jump back into the middle of planning.
- `ask_user` should remain the single interrupting node.
- The outer conversation graph is out of scope for this refactor.
- The right end-state is a medium-granularity pipeline of durable phases, not a node per `if` branch.
- Heavy over-decomposition adds graph noise and checkpoint overhead.
- Conservative 2-3 node splits may be acceptable as migration slices, but they are not the strongest final architecture.
- Post-LLM routing/policy must become explicit architecture, not remain buried in a catch-all planner function.
- Redirect / mutation logic does not belong in edge functions in the final design.
2. Strong-majority theses
- `gpt-5.4` is the strongest baseline and the consensus winner.
  - Agreeing evaluators: `gemini-3.1-pro`, `gpt-5.4`, `gpt-5.5`, `opus-4.7`, `qwen-3.6-plus`, `qwen-3.7-max`
- `deepseek-4-pro` is the strongest 4-node alternative and the most stable second-tier option.
  - Agreeing evaluators: `deepseek-4-pro`, `gemini-3.1-pro`, `kimi-2.6`, `mimo-2.5-pro`, `qwen-3.7-max`
- `gpt-5.5` contributes the best state hygiene, consequence analysis, and test-migration ideas, even if its full topology is too granular.
  - Agreeing evaluators: `deepseek-4-pro`, `glm-5.1`, `gpt-5.5`, `opus-4.7`, `qwen-3.7-max`
- The planner needs explicit LLM-failure provenance (`llm_failed` or `decision_origin`) so failure paths are not rewritten into bad calculator/reflect loops.
  - Agreeing evaluators: `deepseek-4-pro`, `glm-5.1`, `gpt-5.5`, `opus-4.7`, `qwen-3.7-max`
- Exactly one loop-entry node should own `iterations` increments.
  - Agreeing evaluators: `deepseek-4-pro`, `gpt-5.4`, `gpt-5.5`, `kimi-2.6`, `opus-4.7`, `qwen-3.6-plus`, `qwen-3.7-max`
- `_route` / route-tag fields in checkpointed state are a bad pattern.
  - Agreeing evaluators: `deepseek-4-pro`, `gpt-5.4`, `kimi-2.6`, `mimo-2.5-pro`, `opus-4.7`, `qwen-3.6-plus`, `qwen-3.7-max`
- Caching `recipes` / `FieldAcquisition`-like acquisition objects in planner state is risky until serializer compatibility is proven.
  - Agreeing evaluators: `gpt-5.4`, `gpt-5.5`, `opus-4.7`, `qwen-3.7-max`
- Explicit state-write ownership is worth adopting as implementation documentation.
  - Agreeing evaluators: `deepseek-4-pro`, `glm-5.1`, `gpt-5.5`, `opus-4.7`
- Calculator lifecycle deserves explicit named handling (gate, policy seam, or dedicated concern), not incidental inline logic.
  - Agreeing evaluators: `deepseek-4-pro`, `glm-5.1`, `gpt-5.5`, `kimi-2.6`, `qwen-3.7-max`
- GLM's `_build_plan_output()` helper is worth adopting regardless of the final graph shape.
  - Agreeing evaluators: `deepseek-4-pro`, `glm-5.1`, `qwen-3.6-plus`, `qwen-3.7-max`
- Kimi's decision-matrix / "what-goes-where" appendix is one of the best implementation reference artifacts in the set.
  - Agreeing evaluators: `deepseek-4-pro`, `glm-5.1`, `gpt-5.5`, `kimi-2.6`, `qwen-3.7-max`
- Opus's state/write mapping tables are valuable reference material, even if the proposed graph is too chatty.
  - Agreeing evaluators: `deepseek-4-pro`, `glm-5.1`, `gpt-5.5`, `opus-4.7`
- `qwen-3.7-max` is useful as implementation pseudo-code / reference, but not as the target topology.
  - Agreeing evaluators: `deepseek-4-pro`, `glm-5.1`, `opus-4.7`, `qwen-3.6-plus`, `qwen-3.7-max`
- `qwen-3.6-plus` contributes a useful finish-validation idea (`route_finish_check` / finish re-check), even though the rest of its design should not be adopted as-is.
  - Agreeing evaluators: `deepseek-4-pro`, `mimo-2.5-pro`, `opus-4.7`, `qwen-3.6-plus`, `qwen-3.7-max`
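To make the loop-ownership theses concrete (loop-back edges re-enter at a top-of-loop phase, and exactly one loop-entry node increments `iterations`), here is a framework-free Python sketch. It deliberately avoids any graph-library API; the node names `loop_entry`, `plan`, and `lookup`, the `facts` field, and the `MAX_ITERATIONS` cap are hypothetical stand-ins, not names taken from the evaluator docs or the codebase.

```python
from typing import Any, Dict

# Plain dict standing in for the checkpointed planner state (assumption).
State = Dict[str, Any]

MAX_ITERATIONS = 3  # hypothetical loop cap


def loop_entry(state: State) -> State:
    # Top-of-loop phase and the ONLY writer of `iterations`.
    return {**state, "iterations": state["iterations"] + 1}


def plan(state: State) -> State:
    # Thin planning phase; a trivial rule stands in for the LLM call.
    action = "finish" if state["facts"] >= 2 else "lookup"
    return {**state, "action": action}


def lookup(state: State) -> State:
    # Hypothetical action node: does its work but never touches `iterations`.
    return {**state, "facts": state["facts"] + 1}


def run(state: State) -> State:
    while True:
        state = loop_entry(state)  # every loop-back edge re-enters here
        if state["iterations"] > MAX_ITERATIONS:
            return {**state, "action": "give_up"}
        state = plan(state)
        if state["action"] == "finish":
            return state
        state = lookup(state)  # then back to the top of the loop, not mid-plan
```

Because action nodes never write the counter, adding another action node cannot double-count an iteration, and the cap check lives in exactly one place.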
3. Contested theses
- Dedicated `ask_region` / `ask_currency` nodes improve bootstrap visibility and graph honesty.
  - Agreeing evaluators: `kimi-2.6`, `qwen-3.6-plus`
- Dedicated bootstrap nodes are mostly graph noise; one gate feeding the existing `ask_user` path is better.
  - Agreeing evaluators: `deepseek-4-pro`, `gemini-3.1-pro`, `gpt-5.4`, `gpt-5.5`, `glm-5.1`, `mimo-2.5-pro`, `opus-4.7`, `qwen-3.7-max`
- `route_direct` is a good adapter pattern for converging deterministic acquisition decisions and LLM decisions into one post-policy phase.
  - Agreeing evaluators: `deepseek-4-pro`, `kimi-2.6`, `qwen-3.7-max`
- `decision_origin` is a better long-term state field than a bare `llm_failed` boolean.
  - Agreeing evaluators: `deepseek-4-pro`, `gpt-5.5`
- Splitting post-LLM policy into `normalize_decision -> retry_gate -> maybe_calculate` improves correctness and testability despite the extra hops.
  - Agreeing evaluators: `glm-5.1`, `gpt-5.5`, `qwen-3.7-max`
- DeepSeek's `plan -> plan` self-loop is the cleanest way to surface late decomposition / composite-target re-prompting.
  - Agreeing evaluators: `deepseek-4-pro`, `opus-4.7`, `qwen-3.7-max`
- Keeping `recipes` in planner state is a worthwhile cache if compatibility is verified.
  - Agreeing evaluators: `deepseek-4-pro`
- Opus-style `g_*` / `c_*` naming improves graph self-documentation.
  - Agreeing evaluators: `deepseek-4-pro`, `opus-4.7`
- A dedicated `dispatch` phase is useful as a canonical final log+route point.
  - Agreeing evaluators: `deepseek-4-pro`, `opus-4.7`, `qwen-3.7-max`
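The contested `decision_origin` thesis is easier to weigh with an example. The sketch below assumes a three-valued origin; the enum members and the phase names `handle_llm_failure` and `post_llm_policy` are hypothetical, and only `decision_origin`, `llm_failed`, and `dispatch` come from the evaluator docs. What it illustrates is that a boolean `llm_failed` only separates failure from non-failure, while an origin field also lets routing tell a deterministic acquisition decision apart from an LLM decision.

```python
from enum import Enum


class DecisionOrigin(str, Enum):
    # Hypothetical members: the evaluator docs name the field, not its values.
    LLM = "llm"
    DETERMINISTIC = "deterministic"
    LLM_FAILED = "llm_failed"


def next_phase(state: dict) -> str:
    """Route on provenance instead of inferring it from the decision's contents."""
    # Accepts either an enum member or its string value (e.g. after deserialization).
    origin = DecisionOrigin(state["decision_origin"])
    if origin is DecisionOrigin.LLM_FAILED:
        # Surface the failure explicitly instead of rewriting it
        # into a calculator/reflect loop.
        return "handle_llm_failure"
    if origin is DecisionOrigin.DETERMINISTIC:
        # Assumption for this sketch: deterministic decisions go straight to dispatch.
        return "dispatch"
    # Genuine LLM output still passes through post-LLM policy.
    return "post_llm_policy"
```

Subclassing `str` keeps the stored value a plain string, which matters for the serializer-compatibility concerns raised elsewhere in this index.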
4. Proposal-specific rejection theses
- `qwen-3.6-plus`'s redirect-in-edge pattern should be rejected, even if some of its finish-validation ideas are retained.
  - Agreeing evaluators: `deepseek-4-pro`, `gemini-3.1-pro`, `gpt-5.4`, `gpt-5.5`, `glm-5.1`, `kimi-2.6`, `mimo-2.5-pro`, `opus-4.7`, `qwen-3.7-max`
- The full Opus corrector lattice is analytically excellent but too chatty for production.
  - Agreeing evaluators: `deepseek-4-pro`, `gemini-3.1-pro`, `glm-5.1`, `gpt-5.4`, `gpt-5.5`, `kimi-2.6`, `mimo-2.5-pro`, `opus-4.7`, `qwen-3.6-plus`, `qwen-3.7-max`
- Gemini's `evaluate_rules` shape is too coarse and would become a new god node.
  - Agreeing evaluators: `deepseek-4-pro`, `gemini-3.1-pro`, `glm-5.1`, `gpt-5.4`, `gpt-5.5`, `kimi-2.6`, `mimo-2.5-pro`, `opus-4.7`, `qwen-3.6-plus`, `qwen-3.7-max`
- Mimo's `decide` node still hides too much logic and does not solve enough of the original problem.
  - Agreeing evaluators: `deepseek-4-pro`, `gemini-3.1-pro`, `glm-5.1`, `gpt-5.4`, `gpt-5.5`, `kimi-2.6`, `mimo-2.5-pro`, `opus-4.7`, `qwen-3.6-plus`, `qwen-3.7-max`
- `qwen-3.7-max`'s `_route`-heavy design is brittle, even if its code sketches are useful.
  - Agreeing evaluators: `deepseek-4-pro`, `gpt-5.4`, `kimi-2.6`, `mimo-2.5-pro`, `opus-4.7`, `qwen-3.6-plus`
5. Ranking theses
- `gpt-5.4` was ranked #1 by six evaluators.
  - Agreeing evaluators: `gemini-3.1-pro`, `gpt-5.4`, `gpt-5.5`, `opus-4.7`, `qwen-3.6-plus`, `qwen-3.7-max`
- `deepseek-4-pro` was ranked #1 by three evaluators.
  - Agreeing evaluators: `deepseek-4-pro`, `kimi-2.6`, `mimo-2.5-pro`
- `gpt-5.5` was ranked #1 by one evaluator.
  - Agreeing evaluators: `glm-5.1`
- `gpt-5.4` plus `deepseek-4-pro` form the clear top tier across the directory.
  - Agreeing evaluators: `deepseek-4-pro`, `gemini-3.1-pro`, `gpt-5.5`, `kimi-2.6`, `mimo-2.5-pro`, `opus-4.7`, `qwen-3.6-plus`, `qwen-3.7-max`