Thesis Stance Matrix for `docs/planner-graph-ref/analyse`

This companion file turns the normalized thesis catalog in thesis-index.md into a stance matrix: one row per thesis, one column per evaluator.

For a rank-ordered summary by agreement count, see thesis-agreement-chart.md. For an evaluator ranking derived from those agreement counts, see evaluator-common-thesis-ranking.md. For the final meta-ranking of evaluator preferability, see evaluator-preferability-ranking.md.

Legend

A = agrees
D = disagrees
— = says nothing / no clear position
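The legend maps onto a small parsing helper. The following is a minimal sketch, assuming the matrix rows are available as tab-separated text; the names `Stance`, `parse_row`, and `tally` are hypothetical and do not come from the analysed code base.

```python
from collections import Counter
from enum import Enum


class Stance(Enum):
    """The three legend symbols used in the matrix cells."""
    AGREE = "A"
    DISAGREE = "D"
    SILENT = "—"  # says nothing / no clear position


def parse_row(line: str) -> tuple[int, str, list[Stance]]:
    """Split one tab-separated row into (thesis number, thesis text, stances)."""
    number, thesis, *cells = line.rstrip("\n").split("\t")
    return int(number), thesis, [Stance(cell.strip()) for cell in cells]


def tally(stances: list[Stance]) -> Counter:
    """Count how often each stance occurs in a row."""
    return Counter(stances)


# Row 30 of the matrix, copied as tab-separated text.
row = (
    "30\t`decision_origin` is a better long-term state field than bare `llm_failed`"
    "\tA\t—\t—\t—\tA\t—\t—\t—\t—\t—"
)
number, thesis, stances = parse_row(row)
counts = tally(stances)
print(number, counts[Stance.AGREE], counts[Stance.DISAGREE], counts[Stance.SILENT])
```

Using an enum rather than raw strings means an unexpected cell value raises a `ValueError` at parse time instead of silently skewing the counts.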

Evaluator abbreviations

Abbrev	Evaluator
DS	`deepseek-4-pro`
GEM	`gemini-3.1-pro`
GLM	`glm-5.1`
G54	`gpt-5.4`
G55	`gpt-5.5`
KIM	`kimi-2.6`
MIM	`mimo-2.5-pro`
OPS	`opus-4.7`
Q36	`qwen-3.6-plus`
Q37	`qwen-3.7-max`

Scope note

Theses 1-40 are covered below.
Theses 41-44 from thesis-index.md are aggregate ranking-count statements derived from rankings-matrix.md, so they are not modeled here as per-evaluator stance rows.
Disagreement is conservative: a D appears only when an evaluator clearly argues the opposite thesis rather than merely failing to mention it.

Universal theses (`1-12`)

All evaluators agree with every thesis in this section.

#	Thesis	DS	GEM	GLM	G54	G55	KIM	MIM	OPS	Q36	Q37
1	`plan_node` is a monolith / god node and should be split	A	A	A	A	A	A	A	A	A	A
2	The graph is misleading because routing intelligence is hidden inside `plan_node`	A	A	A	A	A	A	A	A	A	A
3	Deterministic pre-LLM logic belongs in graph-visible phases	A	A	A	A	A	A	A	A	A	A
4	The LLM step should shrink to a thin planning phase	A	A	A	A	A	A	A	A	A	A
5	Loop-backs should re-enter at a top-of-loop phase	A	A	A	A	A	A	A	A	A	A
6	`ask_user` should remain the single interrupting node	A	A	A	A	A	A	A	A	A	A
7	The outer conversation graph is out of scope	A	A	A	A	A	A	A	A	A	A
8	The best end-state is a medium-granularity durable pipeline	A	A	A	A	A	A	A	A	A	A
9	Heavy over-decomposition adds graph noise and checkpoint overhead	A	A	A	A	A	A	A	A	A	A
10	Conservative 2-3 node splits are migration slices, not the strongest final architecture	A	A	A	A	A	A	A	A	A	A
11	Post-LLM routing/policy must become explicit architecture	A	A	A	A	A	A	A	A	A	A
12	Redirect / mutation logic does not belong in edge functions	A	A	A	A	A	A	A	A	A	A
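The "all evaluators agree" claim can be checked mechanically. The following is a minimal sketch, assuming rows are tab-separated with the thesis number and text in the first two columns; `EVALUATORS` and `is_universal` are hypothetical names introduced for illustration.

```python
# Column order as given in the evaluator abbreviation table.
EVALUATORS = ["DS", "GEM", "GLM", "G54", "G55", "KIM", "MIM", "OPS", "Q36", "Q37"]


def is_universal(line: str) -> bool:
    """Return True when every evaluator column of the row holds an 'A'."""
    cells = line.rstrip("\n").split("\t")[2:]  # skip number and thesis text
    return len(cells) == len(EVALUATORS) and all(cell == "A" for cell in cells)


rows = [
    # Thesis 6: every evaluator agrees.
    "6\t`ask_user` should remain the single interrupting node\t" + "\t".join(["A"] * 10),
    # Thesis 36: Q36 disagrees, so the row is not universal.
    "36\t`qwen-3.6-plus`'s redirect-in-edge pattern should be rejected"
    "\tA\tA\tA\tA\tA\tA\tA\tA\tD\tA",
]

for row in rows:
    print(row.split("\t")[0], is_universal(row))
```

Applied to the whole matrix, the same check also returns `True` for rows 37-39, which are listed under the non-universal heading.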

Non-universal theses (`13-40`)

#	Thesis	DS	GEM	GLM	G54	G55	KIM	MIM	OPS	Q36	Q37
13	`gpt-5.4` is the strongest baseline and consensus winner	D	A	D	A	A	D	D	A	A	A
14	`deepseek-4-pro` is the strongest 4-node alternative / stable second tier	A	A	—	—	—	A	A	—	—	A
15	`gpt-5.5` contributes the best state-hygiene / consequence-analysis ideas	A	—	A	—	A	—	—	A	—	A
16	The planner needs explicit LLM-failure provenance (`llm_failed` or `decision_origin`)	A	—	A	—	A	—	—	A	—	A
17	Exactly one loop-entry node should own `iterations` increments	A	—	—	A	A	A	—	A	A	A
18	`_route` / route-tag fields in checkpointed state are a bad pattern	A	—	—	A	—	A	A	A	A	A
19	Caching `recipes` / `FieldAcquisition`-like objects in planner state is risky until serializer safety is proven	D	—	—	A	A	—	—	A	—	A
20	Explicit state-write ownership is valuable implementation documentation	A	—	A	—	A	—	—	A	—	—
21	Calculator lifecycle deserves explicit named handling	A	—	A	—	A	A	—	—	—	A
22	GLM's `_build_plan_output()` helper is worth adopting regardless of final graph shape	A	—	A	—	—	—	—	—	A	A
23	Kimi's decision-matrix appendix is one of the best implementation reference artifacts	A	—	A	—	A	A	—	—	—	A
24	Opus's state/write mapping tables are valuable even if the graph is too chatty	A	—	A	—	A	—	—	A	—	—
25	`qwen-3.7-max` is useful as pseudo-code / reference but not as target topology	A	—	A	—	—	—	—	A	A	A
26	`qwen-3.6-plus` contributes a useful finish-validation idea but should not be adopted as-is	A	—	—	—	—	—	A	A	A	A
27	Dedicated `ask_region` / `ask_currency` nodes improve bootstrap visibility and graph honesty	D	D	D	D	D	A	D	D	A	D
28	Dedicated bootstrap nodes are mostly graph noise; one gate feeding `ask_user` is better	A	A	A	A	A	D	A	A	D	A
29	`route_direct` is a good adapter pattern for converging deterministic and LLM decisions	A	—	—	—	—	A	—	—	—	A
30	`decision_origin` is a better long-term state field than bare `llm_failed`	A	—	—	—	A	—	—	—	—	—
31	Splitting post-LLM policy into `normalize_decision -> retry_gate -> maybe_calculate` improves correctness despite extra hops	D	D	A	D	A	D	D	D	D	A
32	DeepSeek's `plan -> plan` self-loop is the cleanest way to surface late decomposition re-prompting	A	—	—	—	—	—	—	A	—	A
33	Keeping `recipes` in planner state is worthwhile if compatibility is verified	A	—	—	D	D	—	—	D	—	D
34	Opus-style `g_` / `c_` naming improves graph self-documentation	A	—	—	—	—	—	—	A	—	—
35	A dedicated `dispatch` phase is useful as a canonical final log+route point	A	—	—	—	—	—	—	A	—	A
36	`qwen-3.6-plus`'s redirect-in-edge pattern should be rejected	A	A	A	A	A	A	A	A	D	A
37	The full Opus corrector lattice is analytically excellent but too chatty for production	A	A	A	A	A	A	A	A	A	A
38	Gemini's `evaluate_rules` shape is too coarse and becomes a new god node	A	A	A	A	A	A	A	A	A	A
39	Mimo's `decide` node still hides too much logic to solve enough of the problem	A	A	A	A	A	A	A	A	A	A
40	`qwen-3.7-max`'s `_route`-heavy design is brittle even if the code sketches are useful	A	—	—	A	—	A	A	A	A	D
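Some rows state opposing positions, for example theses 27 and 28 on dedicated bootstrap nodes and theses 19 and 33 on keeping `recipes` in planner state. Reading each pair as strict opposites is an interpretation rather than something the matrix states, but under that assumption each row should be the mirror image of its partner. The following is a minimal consistency sketch; `FLIP` and `mirrors` are hypothetical names.

```python
# Flipping a stance: agreement with one thesis implies disagreement with
# its opposite, while silence stays silence.
FLIP = {"A": "D", "D": "A", "—": "—"}


def mirrors(row_a: list[str], row_b: list[str]) -> bool:
    """Return True when row_b is the stance-wise opposite of row_a."""
    return [FLIP[cell] for cell in row_a] == list(row_b)


# Stance cells copied from the matrix, in evaluator column order.
thesis_27 = "D D D D D A D D A D".split()
thesis_28 = "A A A A A D A A D A".split()
thesis_19 = "D — — A A — — A — A".split()
thesis_33 = "A — — D D — — D — D".split()

print(mirrors(thesis_27, thesis_28))
print(mirrors(thesis_19, thesis_33))
```

A failing check would not necessarily indicate an error, since an evaluator can reject one thesis without endorsing its counterpart, but it flags rows worth re-reading against the source evaluations.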