A multi-agent debate system (that also picked a World Cup winner)

After a long debate, I think I finally know who’s going to win the FIFA World Cup 2026.

This weekend I finished an idea I’d been sitting on for a few months: getting four different LLMs to debate a specific topic within a defined set of guidelines. So I took ChatGPT, Claude, Gemini and Grok and, using their APIs and Python, put together a structure where they interact and eventually reach a conclusion about who will be the next FIFA World Cup champion.

First, I defined the topic and the framework in as much detail as possible, using up-to-date information. Both get fed into the system through two variables pointing to two txt files. Then one of the models reads the topic and proposes four roles designed to unfold the debate from different angles, with each model taking on the part of a domain expert. The debate runs for a maximum of four rounds, and from round 2 onwards a judge, also one of the models, can call it off early if it finds the discussion has become redundant. Otherwise, it continues all the way to round 4.
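As a rough illustration (not the code in the repo), the round cap and the judge's early exit could be sketched like this. `run_round` and `judge_says_redundant` are hypothetical callables standing in for the real model calls, and checking with the judge after each round rather than before it is an assumption.

```python
from typing import Callable

MAX_ROUNDS = 4  # the post caps the debate at four rounds


def run_debate(
    run_round: Callable[[int], None],
    judge_says_redundant: Callable[[int], bool],
    max_rounds: int = MAX_ROUNDS,
) -> int:
    """Run rounds 1..max_rounds and return how many rounds were completed.

    run_round(n) is a hypothetical hook that does whatever round n involves.
    judge_says_redundant(n) is a hypothetical hook that asks the judge model
    whether the discussion has become redundant after round n.
    """
    completed = 0
    for round_no in range(1, max_rounds + 1):
        run_round(round_no)
        completed = round_no
        # The judge only gets a say from round 2 onwards; after the last
        # round there is nothing left to cancel, so it is not consulted.
        if 2 <= round_no < max_rounds and judge_says_redundant(round_no):
            break
    return completed
```

Keeping the judge as a separate callable means the stopping rule can be swapped (a different model, a different prompt) without touching the loop.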

In round 1, each participant delivers an initial, reasoned response from their assigned role. The responses are stored in a SQL table under a unique session ID. In round 2, each participant reads what the other three wrote and puts forward a new argument from their role. Round 3 follows the same dynamic, with each model now having access to everything that came before. Finally, in round 4 the judge wraps up the discussion, and a synthesizer model reads through every response from round 1 onwards to consolidate the whole thing into an exportable txt summary.
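To make the storage step concrete, here is a minimal sketch that uses Python's built-in `sqlite3` as a stand-in for the real database. The table name, column names and function names are all assumptions made for illustration; the actual schema may look quite different.

```python
import sqlite3
from typing import List, Tuple


def open_store() -> sqlite3.Connection:
    """Create an in-memory database with a hypothetical responses table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE responses (
            session_id  TEXT    NOT NULL,
            round_no    INTEGER NOT NULL,
            participant TEXT    NOT NULL,
            content     TEXT    NOT NULL
        )
        """
    )
    return conn


def save_response(conn: sqlite3.Connection, session_id: str, round_no: int,
                  participant: str, content: str) -> None:
    """Store one response under the session ID."""
    conn.execute(
        "INSERT INTO responses VALUES (?, ?, ?, ?)",
        (session_id, round_no, participant, content),
    )
    conn.commit()


def responses_from_others(conn: sqlite3.Connection, session_id: str,
                          before_round: int,
                          reader: str) -> List[Tuple[int, str, str]]:
    """Return (round_no, participant, content) rows from earlier rounds of
    this session, excluding the reader's own responses."""
    cursor = conn.execute(
        "SELECT round_no, participant, content FROM responses "
        "WHERE session_id = ? AND round_no < ? AND participant <> ? "
        "ORDER BY round_no, participant",
        (session_id, before_round, reader),
    )
    return cursor.fetchall()
```

In this sketch a session ID could be generated once per debate with something like `uuid.uuid4().hex`, and the synthesizer would run a similar query without the participant filter to read every response from round 1 onwards.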

Persistence is the core of this system: every response is saved to a table in Azure SQL, and from round 2 onwards the models can read what the others have written. None of the models knows who’s behind each response; they only see one another as Analyst A, B, C or D.
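The anonymity part can be sketched in a few lines. The function names are hypothetical, and shuffling the models before handing out letters is an assumption; the post doesn't say how the labels are assigned.

```python
import random
import string
from typing import Dict, Iterable, List, Optional, Tuple


def assign_labels(model_names: Iterable[str],
                  seed: Optional[int] = None) -> Dict[str, str]:
    """Map each model name to an anonymous label such as 'Analyst A'.

    The order is shuffled (an assumption) so the letter does not reveal
    the order in which the models were configured.
    """
    rng = random.Random(seed)
    shuffled: List[str] = list(model_names)
    rng.shuffle(shuffled)
    return {
        name: f"Analyst {string.ascii_uppercase[i]}"
        for i, name in enumerate(shuffled)
    }


def anonymize(responses: Iterable[Tuple[str, str]],
              labels: Dict[str, str]) -> str:
    """Build the text another model gets to read, with each (model, text)
    pair shown under its anonymous label instead of the model name."""
    return "\n\n".join(f"{labels[model]}: {text}" for model, text in responses)
```

Note that this only hides the author label; it does not scrub a model's name if the response text itself happens to mention it.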

After four rounds, the debate narrowed down to a tight call between France and Spain, but the more interesting finding was how every top candidate ended up with a real structural flaw, and how the 48-team format, the heat and the cross-continental travel compress the margins far more than the headline odds suggest.

debate_20260510_193138_EN

https://github.com/arroba1250/agora


Carmonex
