Thesis Stance Matrix for docs/planner-graph-ref/analyse

This companion file turns the normalized thesis catalog in thesis-index.md into a stance matrix.

For a rank-ordered summary by agreement count, see thesis-agreement-chart.md. For an evaluator ranking derived from those agreement counts, see evaluator-common-thesis-ranking.md. For the final meta-ranking of evaluator preferability, see evaluator-preferability-ranking.md.

Legend

  • A = agrees
  • D = disagrees
  • (blank) = says nothing / no clear position
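
The legend can be modelled as a small enumeration, which makes per-row agreement counts easy to compute. This is a minimal sketch, not part of the original analysis: the names `Stance` and `tally` are hypothetical, and it assumes that the third legend entry corresponds to an empty cell. The sample data is row 13 of the non-universal table, in column order DS through Q37.

```python
from collections import Counter
from enum import Enum


class Stance(Enum):
    AGREE = "A"
    DISAGREE = "D"
    SILENT = ""  # assumption: an empty cell means "no clear position"


def tally(cells):
    """Count the stances in one matrix row given as a list of cell strings."""
    return Counter(Stance(cell.strip()) for cell in cells)


# Row 13 (fully populated), columns DS, GEM, GLM, G54, G55, KIM, MIM, OPS, Q36, Q37.
row_13 = ["D", "A", "D", "A", "A", "D", "D", "A", "A", "A"]
counts = tally(row_13)
print(counts[Stance.AGREE], counts[Stance.DISAGREE], counts[Stance.SILENT])  # 6 4 0
```

Any cell value other than `A`, `D`, or empty raises `ValueError`, which is a cheap way to catch transcription mistakes in a row.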

Evaluator abbreviations

| Abbrev | Evaluator |
| --- | --- |
| DS | deepseek-4-pro |
| GEM | gemini-3.1-pro |
| GLM | glm-5.1 |
| G54 | gpt-5.4 |
| G55 | gpt-5.5 |
| KIM | kimi-2.6 |
| MIM | mimo-2.5-pro |
| OPS | opus-4.7 |
| Q36 | qwen-3.6-plus |
| Q37 | qwen-3.7-max |

Scope note

  • Theses 1-40 are covered below.
  • Theses 41-44 from thesis-index.md are aggregate ranking-count statements derived from rankings-matrix.md, so they are not modeled here as per-evaluator stance rows.
  • Disagreement is conservative: a D appears only when an evaluator clearly argues the opposite thesis rather than merely failing to mention it.

Universal theses (1-12)

All evaluators agree with every thesis in this section.

| # | Thesis | DS | GEM | GLM | G54 | G55 | KIM | MIM | OPS | Q36 | Q37 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | plan_node is a monolith / god node and should be split | A | A | A | A | A | A | A | A | A | A |
| 2 | The graph is misleading because routing intelligence is hidden inside plan_node | A | A | A | A | A | A | A | A | A | A |
| 3 | Deterministic pre-LLM logic belongs in graph-visible phases | A | A | A | A | A | A | A | A | A | A |
| 4 | The LLM step should shrink to a thin planning phase | A | A | A | A | A | A | A | A | A | A |
| 5 | Loop-backs should re-enter at a top-of-loop phase | A | A | A | A | A | A | A | A | A | A |
| 6 | ask_user should remain the single interrupting node | A | A | A | A | A | A | A | A | A | A |
| 7 | The outer conversation graph is out of scope | A | A | A | A | A | A | A | A | A | A |
| 8 | The best end-state is a medium-granularity durable pipeline | A | A | A | A | A | A | A | A | A | A |
| 9 | Heavy over-decomposition adds graph noise and checkpoint overhead | A | A | A | A | A | A | A | A | A | A |
| 10 | Conservative 2-3 node splits are migration slices, not the strongest final architecture | A | A | A | A | A | A | A | A | A | A |
| 11 | Post-LLM routing/policy must become explicit architecture | A | A | A | A | A | A | A | A | A | A |
| 12 | Redirect / mutation logic does not belong in edge functions | A | A | A | A | A | A | A | A | A | A |
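
The "universal" label can be checked mechanically once a row is split into cells. The following is a sketch under stated assumptions: `is_universal` and `dissenters` are hypothetical helper names, and rows are assumed to be lists of ten cells in the column order of the evaluator abbreviations. Row 1 comes from this section; row 36 from the non-universal section is included for contrast.

```python
# Column order taken from the evaluator abbreviations list.
EVALUATORS = ["DS", "GEM", "GLM", "G54", "G55", "KIM", "MIM", "OPS", "Q36", "Q37"]


def is_universal(cells):
    """True only if every evaluator column holds an explicit 'A'."""
    return len(cells) == len(EVALUATORS) and all(cell == "A" for cell in cells)


def dissenters(cells):
    """Return the abbreviations of the evaluators marked 'D' in this row."""
    return [name for name, cell in zip(EVALUATORS, cells) if cell == "D"]


row_1 = ["A"] * 10                                           # thesis 1
row_36 = ["A", "A", "A", "A", "A", "A", "A", "A", "D", "A"]  # thesis 36

print(is_universal(row_1))   # True
print(is_universal(row_36))  # False
print(dissenters(row_36))    # ['Q36']
```

Requiring an explicit `A` in every column means a row with a blank cell is not treated as universal, which matches the conservative reading of the legend.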

Non-universal theses (13-40)

# Thesis DS GEM GLM G54 G55 KIM MIM OPS Q36 Q37
13 gpt-5.4 is the strongest baseline and consensus winner D A D A A D D A A A
14 deepseek-4-pro is the strongest 4-node alternative / stable second tier A A A A A
15 gpt-5.5 contributes the best state-hygiene / consequence-analysis ideas A A A A A
16 The planner needs explicit LLM-failure provenance (llm_failed or decision_origin) A A A A A
17 Exactly one loop-entry node should own iterations increments A A A A A A A
18 _route / route-tag fields in checkpointed state are a bad pattern A A A A A A A
19 Caching recipes / FieldAcquisition-like objects in planner state is risky until serializer safety is proven D A A A A
20 Explicit state-write ownership is valuable implementation documentation A A A A
21 Calculator lifecycle deserves explicit named handling A A A A A
22 GLM's _build_plan_output() helper is worth adopting regardless of final graph shape A A A A
23 Kimi's decision-matrix appendix is one of the best implementation reference artifacts A A A A A
24 Opus's state/write mapping tables are valuable even if the graph is too chatty A A A A
25 qwen-3.7-max is useful as pseudo-code / reference but not as target topology A A A A A
26 qwen-3.6-plus contributes a useful finish-validation idea but should not be adopted as-is A A A A A
27 Dedicated ask_region / ask_currency nodes improve bootstrap visibility and graph honesty D D D D D A D D A D
28 Dedicated bootstrap nodes are mostly graph noise; one gate feeding ask_user is better A A A A A D A A D A
29 route_direct is a good adapter pattern for converging deterministic and LLM decisions A A A
30 decision_origin is a better long-term state field than bare llm_failed A A
31 Splitting post-LLM policy into normalize_decision -> retry_gate -> maybe_calculate improves correctness despite extra hops D D A D A D D D D A
32 DeepSeek's plan -> plan self-loop is the cleanest way to surface late decomposition re-prompting A A A
33 Keeping recipes in planner state is worthwhile if compatibility is verified A D D D D
34 Opus-style g_* / c_* naming improves graph self-documentation A A
35 A dedicated dispatch phase is useful as a canonical final log+route point A A A
36 qwen-3.6-plus's redirect-in-edge pattern should be rejected A A A A A A A A D A
37 The full Opus corrector lattice is analytically excellent but too chatty for production A A A A A A A A A A
38 Gemini's evaluate_rules shape is too coarse and becomes a new god node A A A A A A A A A A
39 Mimo's decide node still hides too much logic to solve enough of the problem A A A A A A A A A A
40 qwen-3.7-max's _route-heavy design is brittle even if the code sketches are useful A A A A A A D
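
Theses 27 and 28 take opposite positions on dedicated bootstrap nodes, so their rows would be expected to mirror each other. A minimal sketch of that consistency check follows; `mirrors` and `FLIP` are hypothetical names, and splitting on whitespace only works for fully populated rows such as these two, since blank cells leave no token behind.

```python
# Each stance maps to its opposite; a blank cell stays blank.
FLIP = {"A": "D", "D": "A", "": ""}


def mirrors(row_a, row_b):
    """True if every stance in row_a is the opposite of the one in row_b."""
    return len(row_a) == len(row_b) and all(
        FLIP[a] == b for a, b in zip(row_a, row_b)
    )


# Stance cells copied from rows 27 and 28, columns DS through Q37.
row_27 = "D D D D D A D D A D".split()
row_28 = "A A A A A D A A D A".split()

print(mirrors(row_27, row_28))  # True
```

A `False` result would not necessarily indicate an error, because an evaluator may hold no clear position on one of the two theses, but it would flag the pair for a manual look.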