Thesis Agreement Chart for docs/planner-graph-ref/analyse
This file ranks normalized theses by the number of evaluators that agree with them.
For the inverse view - ranking evaluators by the summed commonness of the theses they contain - see evaluator-common-thesis-ranking.md.
Method
- Source of truth: thesis-matrix.md
- Sort order: agreement count descending, then thesis number ascending
- Evaluator count: 10 total
- Covered here: theses 1-40
- Omitted here: theses 41-44, because those are already aggregate ranking-count facts rather than substantive theses with independent evaluator stances
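As a rough illustration of the sort order above, the ranking can be sketched in Python. The `ThesisRow` type, its field names, and the `rank_theses` helper are hypothetical and are not taken from thesis-matrix.md; the four sample rows reuse thesis numbers and agreement counts that appear in the chart below.

```python
from typing import NamedTuple


class ThesisRow(NamedTuple):
    # Hypothetical record type; the field names are assumptions for illustration.
    thesis: int  # thesis number
    agree: int   # number of evaluators that agree


def rank_theses(rows):
    """Sort by agreement count descending, then thesis number ascending."""
    return sorted(rows, key=lambda r: (-r.agree, r.thesis))


# Sample rows: theses 40 and 13 both have 6 agreeing evaluators,
# theses 37 and 1 both have 10.
rows = [ThesisRow(40, 6), ThesisRow(13, 6), ThesisRow(37, 10), ThesisRow(1, 10)]
ranked = rank_theses(rows)
print([r.thesis for r in ranked])  # [1, 37, 13, 40]
```

Negating the agreement count in the sort key keeps both criteria in a single pass, so ties on agreement fall back to the lower thesis number first.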
Legend
- Agree = number of evaluators marked A
- Disagree = number of evaluators marked D
- Silent = number of evaluators marked —
- Bar = visualized agreement count out of 10 evaluators
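The legend columns can be derived from one thesis's per-evaluator marks roughly as follows. This is a minimal sketch: the `tally` and `bar` helpers and the list-of-marks input are assumptions for illustration, and the order of marks in the example is made up (only the totals 6 / 1 / 3 correspond to a row in the chart).

```python
EVALUATOR_COUNT = 10  # total number of evaluators, per the Method section


def tally(marks):
    """Return (agree, disagree, silent) counts for one thesis.

    `marks` is assumed to hold one entry per evaluator: "A", "D", or "—".
    """
    agree = marks.count("A")
    disagree = marks.count("D")
    silent = marks.count("—")
    if len(marks) != EVALUATOR_COUNT or agree + disagree + silent != len(marks):
        raise ValueError("expected exactly one A, D, or — mark per evaluator")
    return agree, disagree, silent


def bar(agree):
    """Render the agreement count as one block character per agreeing evaluator."""
    return "█" * agree


# Hypothetical mark ordering; the totals match a 6 / 1 / 3 row.
marks = ["A"] * 6 + ["D"] + ["—"] * 3
agree, disagree, silent = tally(marks)
print(agree, disagree, silent, bar(agree))
```

The length check enforces the invariant that Agree, Disagree, and Silent always sum to the evaluator count.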
Ranked chart
| Rank | Thesis # | Agree | Disagree | Silent | Bar | Thesis |
|---|---|---|---|---|---|---|
| 1 | 1 | 10 | 0 | 0 | ██████████ | plan_node is a monolith / god node and should be split |
| 2 | 2 | 10 | 0 | 0 | ██████████ | The graph is misleading because routing intelligence is hidden inside plan_node |
| 3 | 3 | 10 | 0 | 0 | ██████████ | Deterministic pre-LLM logic belongs in graph-visible phases |
| 4 | 4 | 10 | 0 | 0 | ██████████ | The LLM step should shrink to a thin planning phase |
| 5 | 5 | 10 | 0 | 0 | ██████████ | Loop-backs should re-enter at a top-of-loop phase |
| 6 | 6 | 10 | 0 | 0 | ██████████ | ask_user should remain the single interrupting node |
| 7 | 7 | 10 | 0 | 0 | ██████████ | The outer conversation graph is out of scope |
| 8 | 8 | 10 | 0 | 0 | ██████████ | The best end-state is a medium-granularity durable pipeline |
| 9 | 9 | 10 | 0 | 0 | ██████████ | Heavy over-decomposition adds graph noise and checkpoint overhead |
| 10 | 10 | 10 | 0 | 0 | ██████████ | Conservative 2-3 node splits are migration slices, not the strongest final architecture |
| 11 | 11 | 10 | 0 | 0 | ██████████ | Post-LLM routing/policy must become explicit architecture |
| 12 | 12 | 10 | 0 | 0 | ██████████ | Redirect / mutation logic does not belong in edge functions |
| 13 | 37 | 10 | 0 | 0 | ██████████ | The full Opus corrector lattice is analytically excellent but too chatty for production |
| 14 | 38 | 10 | 0 | 0 | ██████████ | Gemini's evaluate_rules shape is too coarse and becomes a new god node |
| 15 | 39 | 10 | 0 | 0 | ██████████ | Mimo's decide node still hides too much logic to solve enough of the problem |
| 16 | 36 | 9 | 1 | 0 | █████████ | qwen-3.6-plus's redirect-in-edge pattern should be rejected |
| 17 | 28 | 8 | 2 | 0 | ████████ | Dedicated bootstrap nodes are mostly graph noise; one gate feeding ask_user is better |
| 18 | 17 | 7 | 0 | 3 | ███████ | Exactly one loop-entry node should own iterations increments |
| 19 | 18 | 7 | 0 | 3 | ███████ | _route / route-tag fields in checkpointed state are a bad pattern |
| 20 | 13 | 6 | 4 | 0 | ██████ | gpt-5.4 is the strongest baseline and consensus winner |
| 21 | 40 | 6 | 1 | 3 | ██████ | qwen-3.7-max's _route-heavy design is brittle even if the code sketches are useful |
| 22 | 14 | 5 | 0 | 5 | █████ | deepseek-4-pro is the strongest 4-node alternative / stable second tier |
| 23 | 15 | 5 | 0 | 5 | █████ | gpt-5.5 contributes the best state-hygiene / consequence-analysis ideas |
| 24 | 16 | 5 | 0 | 5 | █████ | The planner needs explicit LLM-failure provenance (llm_failed or decision_origin) |
| 25 | 21 | 5 | 0 | 5 | █████ | Calculator lifecycle deserves explicit named handling |
| 26 | 23 | 5 | 0 | 5 | █████ | Kimi's decision-matrix appendix is one of the best implementation reference artifacts |
| 27 | 25 | 5 | 0 | 5 | █████ | qwen-3.7-max is useful as pseudo-code / reference but not as target topology |
| 28 | 26 | 5 | 0 | 5 | █████ | qwen-3.6-plus contributes a useful finish-validation idea but should not be adopted as-is |
| 29 | 19 | 4 | 1 | 5 | ████ | Caching recipes / FieldAcquisition-like objects in planner state is risky until serializer safety is proven |
| 30 | 20 | 4 | 0 | 6 | ████ | Explicit state-write ownership is valuable implementation documentation |
| 31 | 22 | 4 | 0 | 6 | ████ | GLM's _build_plan_output() helper is worth adopting regardless of final graph shape |
| 32 | 24 | 4 | 0 | 6 | ████ | Opus's state/write mapping tables are valuable even if the graph is too chatty |
| 33 | 29 | 3 | 0 | 7 | ███ | route_direct is a good adapter pattern for converging deterministic and LLM decisions |
| 34 | 31 | 3 | 7 | 0 | ███ | Splitting post-LLM policy into normalize_decision -> retry_gate -> maybe_calculate improves correctness despite extra hops |
| 35 | 32 | 3 | 0 | 7 | ███ | DeepSeek's plan -> plan self-loop is the cleanest way to surface late decomposition re-prompting |
| 36 | 35 | 3 | 0 | 7 | ███ | A dedicated dispatch phase is useful as a canonical final log+route point |
| 37 | 27 | 2 | 8 | 0 | ██ | Dedicated ask_region / ask_currency nodes improve bootstrap visibility and graph honesty |
| 38 | 30 | 2 | 0 | 8 | ██ | decision_origin is a better long-term state field than bare llm_failed |
| 39 | 34 | 2 | 0 | 8 | ██ | Opus-style g_* / c_* naming improves graph self-documentation |
| 40 | 33 | 1 | 4 | 5 | █ | Keeping recipes in planner state is worthwhile if compatibility is verified |