Thesis Agreement Chart for docs/planner-graph-ref/analyse

This file ranks normalized theses by the number of evaluators that agree with them.

For the inverse view - ranking evaluators by the summed commonness of the theses they contain - see evaluator-common-thesis-ranking.md.

Method

  • Source of truth: thesis-matrix.md
  • Sort order: agreement count descending, then thesis number ascending
  • Evaluator count: 10 total
  • Covered here: theses 1-40
  • Omitted here: theses 41-44, because those are already aggregate ranking-count facts rather than substantive theses with independent evaluator stances
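The sort and coverage rules above can be sketched in a few lines of Python. This is only an illustration: `ThesisRow` and `rank_theses` are hypothetical names, the layout of thesis-matrix.md is not shown here, and the counts for thesis 41 are placeholders rather than values from the source.

```python
from typing import NamedTuple


class ThesisRow(NamedTuple):
    """Hypothetical per-thesis tally; not the real thesis-matrix.md schema."""
    number: int
    agree: int
    disagree: int
    silent: int


def rank_theses(rows, first=1, last=40):
    """Keep theses first..last, then sort by agreement count descending
    and thesis number ascending (the tie-breaker)."""
    covered = [r for r in rows if first <= r.number <= last]
    return sorted(covered, key=lambda r: (-r.agree, r.number))


rows = [
    ThesisRow(13, 6, 4, 0),
    ThesisRow(36, 9, 1, 0),
    ThesisRow(40, 6, 1, 3),
    ThesisRow(41, 0, 0, 10),  # placeholder counts; theses 41-44 are omitted
    ThesisRow(1, 10, 0, 0),
]

for rank, row in enumerate(rank_theses(rows), start=1):
    print(rank, row.number, row.agree)
```

Negating `agree` in the sort key gives descending agreement while keeping thesis numbers ascending within a tie, so theses 13 and 40 (both with 6 agreeing evaluators) come out in that order.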

Legend

  • Agree = number of evaluators marked A
  • Disagree = number of evaluators marked D
  • Silent = number of evaluators marked neither A nor D (no recorded stance)
  • Bar = one █ per agreeing evaluator, out of 10 evaluators
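As a rough sketch of how the legend columns relate, the snippet below tallies one thesis's stances and draws its bar. The names `tally` and `bar` are hypothetical, and it assumes that any cell other than A or D counts as silent, which the source does not state explicitly.

```python
EVALUATOR_COUNT = 10  # from the Method section


def tally(stances):
    """Return (agree, disagree, silent) for one thesis.

    Assumption: every cell that is neither "A" nor "D" is silent.
    """
    agree = stances.count("A")
    disagree = stances.count("D")
    silent = len(stances) - agree - disagree
    return agree, disagree, silent


def bar(agree):
    """One full block per agreeing evaluator."""
    return "█" * agree


# Stances chosen to reproduce the counts charted for thesis 40 (6 / 1 / 3).
stances = ["A"] * 6 + ["D"] + [""] * 3
assert len(stances) == EVALUATOR_COUNT

agree, disagree, silent = tally(stances)
print(agree, disagree, silent, bar(agree))
```

Under this reading the three counts always sum to 10, and a thesis with a single agreeing evaluator still gets a one-block bar.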

Ranked chart

| Rank | Thesis # | Agree | Disagree | Silent | Bar | Thesis |
|---|---|---|---|---|---|---|
| 1 | 1 | 10 | 0 | 0 | ██████████ | plan_node is a monolith / god node and should be split |
| 2 | 2 | 10 | 0 | 0 | ██████████ | The graph is misleading because routing intelligence is hidden inside plan_node |
| 3 | 3 | 10 | 0 | 0 | ██████████ | Deterministic pre-LLM logic belongs in graph-visible phases |
| 4 | 4 | 10 | 0 | 0 | ██████████ | The LLM step should shrink to a thin planning phase |
| 5 | 5 | 10 | 0 | 0 | ██████████ | Loop-backs should re-enter at a top-of-loop phase |
| 6 | 6 | 10 | 0 | 0 | ██████████ | ask_user should remain the single interrupting node |
| 7 | 7 | 10 | 0 | 0 | ██████████ | The outer conversation graph is out of scope |
| 8 | 8 | 10 | 0 | 0 | ██████████ | The best end-state is a medium-granularity durable pipeline |
| 9 | 9 | 10 | 0 | 0 | ██████████ | Heavy over-decomposition adds graph noise and checkpoint overhead |
| 10 | 10 | 10 | 0 | 0 | ██████████ | Conservative 2-3 node splits are migration slices, not the strongest final architecture |
| 11 | 11 | 10 | 0 | 0 | ██████████ | Post-LLM routing/policy must become explicit architecture |
| 12 | 12 | 10 | 0 | 0 | ██████████ | Redirect / mutation logic does not belong in edge functions |
| 13 | 37 | 10 | 0 | 0 | ██████████ | The full Opus corrector lattice is analytically excellent but too chatty for production |
| 14 | 38 | 10 | 0 | 0 | ██████████ | Gemini's evaluate_rules shape is too coarse and becomes a new god node |
| 15 | 39 | 10 | 0 | 0 | ██████████ | Mimo's decide node still hides too much logic to solve enough of the problem |
| 16 | 36 | 9 | 1 | 0 | █████████ | qwen-3.6-plus's redirect-in-edge pattern should be rejected |
| 17 | 28 | 8 | 2 | 0 | ████████ | Dedicated bootstrap nodes are mostly graph noise; one gate feeding ask_user is better |
| 18 | 17 | 7 | 0 | 3 | ███████ | Exactly one loop-entry node should own iterations increments |
| 19 | 18 | 7 | 0 | 3 | ███████ | _route / route-tag fields in checkpointed state are a bad pattern |
| 20 | 13 | 6 | 4 | 0 | ██████ | gpt-5.4 is the strongest baseline and consensus winner |
| 21 | 40 | 6 | 1 | 3 | ██████ | qwen-3.7-max's _route-heavy design is brittle even if the code sketches are useful |
| 22 | 14 | 5 | 0 | 5 | █████ | deepseek-4-pro is the strongest 4-node alternative / stable second tier |
| 23 | 15 | 5 | 0 | 5 | █████ | gpt-5.5 contributes the best state-hygiene / consequence-analysis ideas |
| 24 | 16 | 5 | 0 | 5 | █████ | The planner needs explicit LLM-failure provenance (llm_failed or decision_origin) |
| 25 | 21 | 5 | 0 | 5 | █████ | Calculator lifecycle deserves explicit named handling |
| 26 | 23 | 5 | 0 | 5 | █████ | Kimi's decision-matrix appendix is one of the best implementation reference artifacts |
| 27 | 25 | 5 | 0 | 5 | █████ | qwen-3.7-max is useful as pseudo-code / reference but not as target topology |
| 28 | 26 | 5 | 0 | 5 | █████ | qwen-3.6-plus contributes a useful finish-validation idea but should not be adopted as-is |
| 29 | 19 | 4 | 1 | 5 | ████ | Caching recipes / FieldAcquisition-like objects in planner state is risky until serializer safety is proven |
| 30 | 20 | 4 | 0 | 6 | ████ | Explicit state-write ownership is valuable implementation documentation |
| 31 | 22 | 4 | 0 | 6 | ████ | GLM's _build_plan_output() helper is worth adopting regardless of final graph shape |
| 32 | 24 | 4 | 0 | 6 | ████ | Opus's state/write mapping tables are valuable even if the graph is too chatty |
| 33 | 29 | 3 | 0 | 7 | ███ | route_direct is a good adapter pattern for converging deterministic and LLM decisions |
| 34 | 31 | 3 | 7 | 0 | ███ | Splitting post-LLM policy into normalize_decision -> retry_gate -> maybe_calculate improves correctness despite extra hops |
| 35 | 32 | 3 | 0 | 7 | ███ | DeepSeek's plan -> plan self-loop is the cleanest way to surface late decomposition re-prompting |
| 36 | 35 | 3 | 0 | 7 | ███ | A dedicated dispatch phase is useful as a canonical final log+route point |
| 37 | 27 | 2 | 8 | 0 | ██ | Dedicated ask_region / ask_currency nodes improve bootstrap visibility and graph honesty |
| 38 | 30 | 2 | 0 | 8 | ██ | decision_origin is a better long-term state field than bare llm_failed |
| 39 | 34 | 2 | 0 | 8 | ██ | Opus-style g_* / c_* naming improves graph self-documentation |
| 40 | 33 | 1 | 4 | 5 | █ | Keeping recipes in planner state is worthwhile if compatibility is verified |