Thesis Stance Matrix for docs/planner-graph-ref/analyse
This companion file turns the normalized thesis catalog in thesis-index.md into a stance matrix.
For a rank-ordered summary by agreement count, see thesis-agreement-chart.md.
For an evaluator ranking derived from those agreement counts, see evaluator-common-thesis-ranking.md.
For the final meta-ranking of evaluator preferability, see evaluator-preferability-ranking.md.
Legend
- `A` = agrees
- `D` = disagrees
- `—` = says nothing / no clear position
Evaluator abbreviations
| Abbrev | Evaluator |
|---|---|
| DS | deepseek-4-pro |
| GEM | gemini-3.1-pro |
| GLM | glm-5.1 |
| G54 | gpt-5.4 |
| G55 | gpt-5.5 |
| KIM | kimi-2.6 |
| MIM | mimo-2.5-pro |
| OPS | opus-4.7 |
| Q36 | qwen-3.6-plus |
| Q37 | qwen-3.7-max |
Scope note
- Theses 1-40 are covered below.
- Theses 41-44 from `thesis-index.md` are aggregate ranking-count statements derived from `rankings-matrix.md`, so they are not modeled here as per-evaluator stance rows.
- Disagreement is conservative: a `D` appears only when an evaluator clearly argues the opposite thesis rather than merely failing to mention it.
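The legend and the per-thesis agreement counts can be checked mechanically. The following is a minimal sketch, assuming each matrix row is a single pipe-delimited markdown line with the ten evaluator columns in the order given in the abbreviation table. The names `parse_row` and `tally` are hypothetical helpers for illustration and are not part of the analysed project.

```python
from collections import Counter

# Column order taken from the evaluator abbreviation table.
EVALUATORS = ["DS", "GEM", "GLM", "G54", "G55", "KIM", "MIM", "OPS", "Q36", "Q37"]
# Stance symbols from the legend.
VALID_STANCES = ("A", "D", "—")


def parse_row(line: str):
    """Split one table row into (thesis number, thesis text, stance per evaluator).

    Assumption: the thesis text itself contains no '|' character.
    """
    cells = [cell.strip() for cell in line.strip().strip("|").split("|")]
    number, thesis, stances = int(cells[0]), cells[1], cells[2:]
    if len(stances) != len(EVALUATORS):
        raise ValueError(f"thesis {number}: expected {len(EVALUATORS)} stances, got {len(stances)}")
    for stance in stances:
        if stance not in VALID_STANCES:
            raise ValueError(f"thesis {number}: unknown stance {stance!r}")
    return number, thesis, dict(zip(EVALUATORS, stances))


def tally(stances: dict) -> dict:
    """Count how many evaluators agree, disagree, or take no position."""
    counts = Counter(stances.values())
    return {symbol: counts.get(symbol, 0) for symbol in VALID_STANCES}


if __name__ == "__main__":
    row = ("| 13 | gpt-5.4 is the strongest baseline and consensus winner "
           "| D | A | D | A | A | D | D | A | A | A |")
    number, thesis, stances = parse_row(row)
    print(number, tally(stances))
```

Because a `D` is recorded only for explicit opposition, the `—` count is kept separate from the `D` count and is never folded into disagreement.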
Universal theses (1-12)
All evaluators agree with every thesis in this section.
| # | Thesis | DS | GEM | GLM | G54 | G55 | KIM | MIM | OPS | Q36 | Q37 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | `plan_node` is a monolith / god node and should be split | A | A | A | A | A | A | A | A | A | A |
| 2 | The graph is misleading because routing intelligence is hidden inside `plan_node` | A | A | A | A | A | A | A | A | A | A |
| 3 | Deterministic pre-LLM logic belongs in graph-visible phases | A | A | A | A | A | A | A | A | A | A |
| 4 | The LLM step should shrink to a thin planning phase | A | A | A | A | A | A | A | A | A | A |
| 5 | Loop-backs should re-enter at a top-of-loop phase | A | A | A | A | A | A | A | A | A | A |
| 6 | `ask_user` should remain the single interrupting node | A | A | A | A | A | A | A | A | A | A |
| 7 | The outer conversation graph is out of scope | A | A | A | A | A | A | A | A | A | A |
| 8 | The best end-state is a medium-granularity durable pipeline | A | A | A | A | A | A | A | A | A | A |
| 9 | Heavy over-decomposition adds graph noise and checkpoint overhead | A | A | A | A | A | A | A | A | A | A |
| 10 | Conservative 2-3 node splits are migration slices, not the strongest final architecture | A | A | A | A | A | A | A | A | A | A |
| 11 | Post-LLM routing/policy must become explicit architecture | A | A | A | A | A | A | A | A | A | A |
| 12 | Redirect / mutation logic does not belong in edge functions | A | A | A | A | A | A | A | A | A | A |
Non-universal theses (13-40)
| # | Thesis | DS | GEM | GLM | G54 | G55 | KIM | MIM | OPS | Q36 | Q37 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 13 | `gpt-5.4` is the strongest baseline and consensus winner | D | A | D | A | A | D | D | A | A | A |
| 14 | `deepseek-4-pro` is the strongest 4-node alternative / stable second tier | A | A | — | — | — | A | A | — | — | A |
| 15 | `gpt-5.5` contributes the best state-hygiene / consequence-analysis ideas | A | — | A | — | A | — | — | A | — | A |
| 16 | The planner needs explicit LLM-failure provenance (`llm_failed` or `decision_origin`) | A | — | A | — | A | — | — | A | — | A |
| 17 | Exactly one loop-entry node should own `iterations` increments | A | — | — | A | A | A | — | A | A | A |
| 18 | `_route` / route-tag fields in checkpointed state are a bad pattern | A | — | — | A | — | A | A | A | A | A |
| 19 | Caching `recipes` / `FieldAcquisition`-like objects in planner state is risky until serializer safety is proven | D | — | — | A | A | — | — | A | — | A |
| 20 | Explicit state-write ownership is valuable implementation documentation | A | — | A | — | A | — | — | A | — | — |
| 21 | Calculator lifecycle deserves explicit named handling | A | — | A | — | A | A | — | — | — | A |
| 22 | GLM's `_build_plan_output()` helper is worth adopting regardless of final graph shape | A | — | A | — | — | — | — | — | A | A |
| 23 | Kimi's decision-matrix appendix is one of the best implementation reference artifacts | A | — | A | — | A | A | — | — | — | A |
| 24 | Opus's state/write mapping tables are valuable even if the graph is too chatty | A | — | A | — | A | — | — | A | — | — |
| 25 | `qwen-3.7-max` is useful as pseudo-code / reference but not as target topology | A | — | A | — | — | — | — | A | A | A |
| 26 | `qwen-3.6-plus` contributes a useful finish-validation idea but should not be adopted as-is | A | — | — | — | — | — | A | A | A | A |
| 27 | Dedicated `ask_region` / `ask_currency` nodes improve bootstrap visibility and graph honesty | D | D | D | D | D | A | D | D | A | D |
| 28 | Dedicated bootstrap nodes are mostly graph noise; one gate feeding `ask_user` is better | A | A | A | A | A | D | A | A | D | A |
| 29 | `route_direct` is a good adapter pattern for converging deterministic and LLM decisions | A | — | — | — | — | A | — | — | — | A |
| 30 | `decision_origin` is a better long-term state field than bare `llm_failed` | A | — | — | — | A | — | — | — | — | — |
| 31 | Splitting post-LLM policy into `normalize_decision -> retry_gate -> maybe_calculate` improves correctness despite extra hops | D | D | A | D | A | D | D | D | D | A |
| 32 | DeepSeek's `plan -> plan` self-loop is the cleanest way to surface late decomposition re-prompting | A | — | — | — | — | — | — | A | — | A |
| 33 | Keeping `recipes` in planner state is worthwhile if compatibility is verified | A | — | — | D | D | — | — | D | — | D |
| 34 | Opus-style `g_*` / `c_*` naming improves graph self-documentation | A | — | — | — | — | — | — | A | — | — |
| 35 | A dedicated `dispatch` phase is useful as a canonical final log+route point | A | — | — | — | — | — | — | A | — | A |
| 36 | `qwen-3.6-plus`'s redirect-in-edge pattern should be rejected | A | A | A | A | A | A | A | A | D | A |
| 37 | The full Opus corrector lattice is analytically excellent but too chatty for production | A | A | A | A | A | A | A | A | A | A |
| 38 | Gemini's `evaluate_rules` shape is too coarse and becomes a new god node | A | A | A | A | A | A | A | A | A | A |
| 39 | Mimo's `decide` node still hides too much logic to solve enough of the problem | A | A | A | A | A | A | A | A | A | A |
| 40 | `qwen-3.7-max`'s `_route`-heavy design is brittle even if the code sketches are useful | A | — | — | A | — | A | A | A | A | D |
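Some rows in the table above state opposing positions on the same question: theses 27 and 28 on dedicated bootstrap nodes, and theses 19 and 33 on keeping recipes in planner state. One possible sanity check when editing the matrix is to confirm that such pairs are stance mirrors. The sketch below is illustrative only; treating these pairs as opposites is a reading of the thesis wording rather than something the matrix declares, and `is_mirror` is a hypothetical helper name.

```python
# Flipping rule: agreement with one thesis corresponds to disagreement with
# its opposite, and silence stays silence.
FLIP = {"A": "D", "D": "A", "—": "—"}


def is_mirror(row_a, row_b) -> bool:
    """Return True if every stance in row_a is the flipped stance in row_b."""
    return len(row_a) == len(row_b) and all(FLIP[a] == b for a, b in zip(row_a, row_b))


# Stances copied from the matrix, in column order DS..Q37.
THESIS_19 = "D — — A A — — A — A".split()
THESIS_33 = "A — — D D — — D — D".split()
THESIS_27 = "D D D D D A D D A D".split()
THESIS_28 = "A A A A A D A A D A".split()

if __name__ == "__main__":
    for name, (a, b) in {"27/28": (THESIS_27, THESIS_28),
                         "19/33": (THESIS_19, THESIS_33)}.items():
        print(name, "mirror" if is_mirror(a, b) else "not a mirror")
```

A failed check would not by itself indicate an error, since under the conservative disagreement rule an evaluator may argue for one thesis while saying nothing clear about its opposite.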