Thesis Agreement Chart for `docs/planner-graph-ref/analyse`

This file ranks normalized theses by the number of evaluators that agree with them.

For the inverse view, which ranks evaluators by the summed commonness of the theses they contain, see evaluator-common-thesis-ranking.md.

Method

Source of truth: thesis-matrix.md
Sort order: agreement count descending, then thesis number ascending
Evaluator count: 10 total
Covered here: theses 1-40
Omitted here: theses 41-44, because they are aggregate ranking-count facts rather than substantive theses with independent evaluator stances
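
The sort order and coverage range above can be sketched in Python as a two-part sort key. This is a minimal illustration, not the script that produced this file: `ThesisRow` and `rank_theses` are hypothetical names, and the sample counts are copied from four rows of the ranked chart.

```python
from typing import NamedTuple


class ThesisRow(NamedTuple):
    thesis: int    # thesis number from thesis-matrix.md
    agree: int     # evaluators marked A
    disagree: int  # evaluators marked D
    silent: int    # evaluators with no stance


def rank_theses(rows, first=1, last=40):
    """Return (rank, row) pairs: agreement descending, thesis number ascending.

    Rows outside first..last are dropped, mirroring the "Covered here" range.
    """
    covered = [r for r in rows if first <= r.thesis <= last]
    ordered = sorted(covered, key=lambda r: (-r.agree, r.thesis))
    return list(enumerate(ordered, start=1))


# Counts taken from the chart for theses 13, 36, 40 and 1, in arbitrary order.
sample = [
    ThesisRow(13, 6, 4, 0),
    ThesisRow(36, 9, 1, 0),
    ThesisRow(40, 6, 1, 3),
    ThesisRow(1, 10, 0, 0),
]
ranked = rank_theses(sample)
print([row.thesis for _, row in ranked])  # [1, 36, 13, 40]
```

Negating `agree` in the key gives descending agreement while keeping the thesis number ascending as the tie-breaker, which is why thesis 13 lands ahead of thesis 40 even though both have six agreeing evaluators.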

Legend

Agree = number of evaluators marked A
Disagree = number of evaluators marked D
Silent = number of evaluators marked —
Bar = agreement count drawn as one `█` per agreeing evaluator, out of 10 evaluators
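
As a sketch of how the legend columns relate to the per-evaluator marks, the snippet below tallies one thesis and renders its bar. It assumes each thesis has exactly one mark per evaluator; `tally` and `bar` are hypothetical helper names, and the order of marks in the example is invented (only the totals match thesis 40's row).

```python
from collections import Counter

EVALUATOR_COUNT = 10
SILENT = "\u2014"      # the "—" mark used for silent evaluators
FULL_BLOCK = "\u2588"  # the "█" character used in the Bar column


def tally(marks):
    """Return (agree, disagree, silent) for one thesis's list of marks."""
    if len(marks) != EVALUATOR_COUNT:
        raise ValueError(f"expected {EVALUATOR_COUNT} marks, got {len(marks)}")
    counts = Counter(marks)
    unknown = set(counts) - {"A", "D", SILENT}
    if unknown:
        raise ValueError(f"unexpected marks: {sorted(unknown)}")
    return counts["A"], counts["D"], counts[SILENT]


def bar(agree):
    """Render the Bar cell: one full block per agreeing evaluator."""
    return FULL_BLOCK * agree


# Illustrative ordering; the totals (6 / 1 / 3) are those listed for thesis 40.
marks = ["A"] * 6 + ["D"] + [SILENT] * 3
agree, disagree, silent = tally(marks)
print(agree, disagree, silent, bar(agree))
```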

Ranked chart

Rank	Thesis #	Agree	Disagree	Silent	Bar	Thesis
1	1	10	0	0	`██████████`	`plan_node` is a monolith / god node and should be split
2	2	10	0	0	`██████████`	The graph is misleading because routing intelligence is hidden inside `plan_node`
3	3	10	0	0	`██████████`	Deterministic pre-LLM logic belongs in graph-visible phases
4	4	10	0	0	`██████████`	The LLM step should shrink to a thin planning phase
5	5	10	0	0	`██████████`	Loop-backs should re-enter at a top-of-loop phase
6	6	10	0	0	`██████████`	`ask_user` should remain the single interrupting node
7	7	10	0	0	`██████████`	The outer conversation graph is out of scope
8	8	10	0	0	`██████████`	The best end-state is a medium-granularity durable pipeline
9	9	10	0	0	`██████████`	Heavy over-decomposition adds graph noise and checkpoint overhead
10	10	10	0	0	`██████████`	Conservative 2-3 node splits are migration slices, not the strongest final architecture
11	11	10	0	0	`██████████`	Post-LLM routing/policy must become explicit architecture
12	12	10	0	0	`██████████`	Redirect / mutation logic does not belong in edge functions
13	37	10	0	0	`██████████`	The full Opus corrector lattice is analytically excellent but too chatty for production
14	38	10	0	0	`██████████`	Gemini's `evaluate_rules` shape is too coarse and becomes a new god node
15	39	10	0	0	`██████████`	Mimo's `decide` node still hides too much logic to solve enough of the problem
16	36	9	1	0	`█████████`	`qwen-3.6-plus`'s redirect-in-edge pattern should be rejected
17	28	8	2	0	`████████`	Dedicated bootstrap nodes are mostly graph noise; one gate feeding `ask_user` is better
18	17	7	0	3	`███████`	Exactly one loop-entry node should own `iterations` increments
19	18	7	0	3	`███████`	`_route` / route-tag fields in checkpointed state are a bad pattern
20	13	6	4	0	`██████`	`gpt-5.4` is the strongest baseline and consensus winner
21	40	6	1	3	`██████`	`qwen-3.7-max`'s `_route`-heavy design is brittle even if the code sketches are useful
22	14	5	0	5	`█████`	`deepseek-4-pro` is the strongest 4-node alternative / stable second tier
23	15	5	0	5	`█████`	`gpt-5.5` contributes the best state-hygiene / consequence-analysis ideas
24	16	5	0	5	`█████`	The planner needs explicit LLM-failure provenance (`llm_failed` or `decision_origin`)
25	21	5	0	5	`█████`	Calculator lifecycle deserves explicit named handling
26	23	5	0	5	`█████`	Kimi's decision-matrix appendix is one of the best implementation reference artifacts
27	25	5	0	5	`█████`	`qwen-3.7-max` is useful as pseudo-code / reference but not as target topology
28	26	5	0	5	`█████`	`qwen-3.6-plus` contributes a useful finish-validation idea but should not be adopted as-is
29	19	4	1	5	`████`	Caching `recipes` / `FieldAcquisition`-like objects in planner state is risky until serializer safety is proven
30	20	4	0	6	`████`	Explicit state-write ownership is valuable implementation documentation
31	22	4	0	6	`████`	GLM's `_build_plan_output()` helper is worth adopting regardless of final graph shape
32	24	4	0	6	`████`	Opus's state/write mapping tables are valuable even if the graph is too chatty
33	29	3	0	7	`███`	`route_direct` is a good adapter pattern for converging deterministic and LLM decisions
34	31	3	7	0	`███`	Splitting post-LLM policy into `normalize_decision -> retry_gate -> maybe_calculate` improves correctness despite extra hops
35	32	3	0	7	`███`	DeepSeek's `plan -> plan` self-loop is the cleanest way to surface late decomposition re-prompting
36	35	3	0	7	`███`	A dedicated `dispatch` phase is useful as a canonical final log+route point
37	27	2	8	0	`██`	Dedicated `ask_region` / `ask_currency` nodes improve bootstrap visibility and graph honesty
38	30	2	0	8	`██`	`decision_origin` is a better long-term state field than bare `llm_failed`
39	34	2	0	8	`██`	Opus-style `g_` / `c_` naming improves graph self-documentation
40	33	1	4	5	`█`	Keeping `recipes` in planner state is worthwhile if compatibility is verified
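
The chart can be sanity-checked mechanically. The sketch below assumes the rows are available as tab-separated lines in the column order shown above; `check_rows` is a hypothetical helper, not part of any existing tooling. It verifies that each row's counts sum to 10, that the bar length equals the Agree count, and that rows follow the stated sort order.

```python
EVALUATOR_COUNT = 10
FULL_BLOCK = "\u2588"  # the "█" character used in the Bar column


def check_rows(lines):
    """Return a list of human-readable problems found in chart rows."""
    problems = []
    previous_key = None
    for line in lines:
        rank, thesis, agree, disagree, silent, bar_cell, _text = line.split("\t", 6)
        thesis, agree, disagree, silent = (
            int(v) for v in (thesis, agree, disagree, silent)
        )
        if agree + disagree + silent != EVALUATOR_COUNT:
            problems.append(f"rank {rank}: counts do not sum to {EVALUATOR_COUNT}")
        # The Bar cell is wrapped in backticks in this file.
        if bar_cell.strip("`") != FULL_BLOCK * agree:
            problems.append(f"rank {rank}: bar length does not match Agree")
        # Agreement descending, then thesis number ascending.
        key = (-agree, thesis)
        if previous_key is not None and key <= previous_key:
            problems.append(f"rank {rank}: out of sort order")
        previous_key = key
    return problems


# The last three rows of the chart, copied as tab-separated text.
sample_rows = [
    "38\t30\t2\t0\t8\t`██`\t`decision_origin` is a better long-term state field than bare `llm_failed`",
    "39\t34\t2\t0\t8\t`██`\tOpus-style `g_` / `c_` naming improves graph self-documentation",
    "40\t33\t1\t4\t5\t`█`\tKeeping `recipes` in planner state is worthwhile if compatibility is verified",
]
print(check_rows(sample_rows))  # []
```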