Twilight of the Gods. Comparing how 11 LLMs approach a code-reorganization task.
Other languages
This article is also available in Russian: Гибель богов.
This is a detailed write-up of one experiment. I took a god node from a real LangGraph agent and asked 5 American and 6 Chinese models first to propose how to untangle it, then to evaluate each other's proposals. After that, I tried three different ways to figure out which of them to trust on the matter.