
Multi-agent system architecture: roles, contracts, coordination

What a multi-agent system is made of: the agent as a building block, input/output contracts, and coordination and message exchange between agents.

In brief, for executives. The architecture of a multi-agent system is defined not by the choice of model but by how the boundaries between participants are built: what each one accepts, what it returns, and who decides the next step. These boundaries are what determine the cost of changing the system a year from now. A clear architecture keeps the cost of change manageable; a tangled one turns every significant change into a full rewrite.

“Agent” colloquially means almost anything, from a model call with an instruction to an autonomous decision-making entity. Because of this blur, the architecture of a multi-agent system often goes undesigned, and the result is a tangle of prompts calling one another. Let’s go through what the system actually consists of.

Architecture is set by contracts at the seams, not by the choice of model.

Hypothesis: architecture is set by contracts, not the model

The robustness of a distributed process is set not by how smart each participant is but by how strictly the seams between them are defined. A contract at an agent’s input and output is the unit of architecture. You can swap the model behind a contract; you cannot swap the implicit arrangements between a dozen agents without a rewrite.
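To make “swapping the model behind a contract” concrete, here is a minimal Python sketch. Everything in it is hypothetical and only illustrative: the `ClassifyInput`/`ClassifyOutput` contract, the three-category vocabulary, and the stand-in functions `model_a` and `model_b`, which play the role of two different LLMs with different output habits. The downstream function `route` depends only on the output contract, so replacing the model does not touch it.

```python
from dataclasses import dataclass
from typing import Callable, Protocol


@dataclass(frozen=True)
class ClassifyInput:
    """Input contract of a hypothetical classifier agent."""
    ticket_text: str


@dataclass(frozen=True)
class ClassifyOutput:
    """Output contract: a category from a fixed vocabulary plus an "unsure" flag."""
    category: str
    unsure: bool


class Classifier(Protocol):
    """Anything that honours the contract can be plugged in."""
    def run(self, inp: ClassifyInput) -> ClassifyOutput: ...


KNOWN_CATEGORIES = {"billing", "bug", "other"}  # illustrative vocabulary


class ModelBackedClassifier:
    """Wraps an arbitrary model call; the model is an implementation detail."""

    def __init__(self, call_model: Callable[[str], str]) -> None:
        self._call_model = call_model

    def run(self, inp: ClassifyInput) -> ClassifyOutput:
        raw = self._call_model(inp.ticket_text).strip().lower()
        if raw in KNOWN_CATEGORIES:
            return ClassifyOutput(category=raw, unsure=False)
        # Output outside the agreed vocabulary becomes an explicit "unsure".
        return ClassifyOutput(category="other", unsure=True)


# Hypothetical stand-ins for two different models.
def model_a(text: str) -> str:
    return "billing" if "invoice" in text.lower() else "other"


def model_b(text: str) -> str:
    return "BILLING " if "invoice" in text.lower() else "no idea"


def route(out: ClassifyOutput) -> str:
    """Downstream logic sees only the contract, never the model."""
    return "human_review" if out.unsure else f"queue:{out.category}"


agents: list[Classifier] = [ModelBackedClassifier(model_a), ModelBackedClassifier(model_b)]
for agent in agents:
    print(route(agent.run(ClassifyInput("Where is my invoice?"))))
```

Both iterations print `queue:billing`: the second model’s differently formatted answer is normalized at the contract boundary, and an answer outside the vocabulary would surface as `unsure` rather than leak into the next agent.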


Why multi-agent systems fail (1,600+ execution traces)

Nearly 80% of failures fall into specification and inter-agent coordination, i.e. architecture, not a “weak model”. They are fixed with contracts and explicit coordination, not by swapping the LLM.

Source: Why Do Multi-Agent LLM Systems Fail? (MAST, UC Berkeley), NeurIPS 2025 https://arxiv.org/pdf/2503.13657

The failure distribution supports the thesis: most breakage happens where specification and coordination are blurred, that is, at the seams rather than inside the agents.

Problem: “agent” is understood as a prompt

When an agent is conceived as “an instruction to the model that happens to work”, the system has no interfaces: participants exchange free text, the model infers the next step from the correspondence, and boundaries of responsibility are undefined. Such a system works in a demo but cannot be debugged in production: you cannot say which participant failed, because there are no checkable boundaries between them.

Why the usual approaches don’t work

Free text as an interface does not scale: it cannot be typed, validated or versioned. The more agents there are, the more implicit links appear, and the faster the cost of any change grows.

A centralized “smart coordinator” that decides the next route by free reasoning on every step is a single point of failure and unpredictability: its behaviour cannot be reproduced and tested.

Without contract versioning, a change in one agent’s format silently breaks a neighbouring one. In a system without explicit seams, this surfaces in production rather than at build time.
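One way to turn a silent break into a loud, build-time one is sketched below, under stated assumptions: every message carries a `contract_version` string, the consumer checks it before reading any field, and a contract test runs in CI. The compatibility policy (minor bumps only add optional fields, a major bump breaks) is borrowed from semantic versioning and is an assumption, as are all names (`ContractVersion`, `check_compatible`, `consume`, `test_producer_contract`).

```python
from dataclasses import dataclass


class IncompatibleContract(Exception):
    """Raised instead of silently misreading a neighbour's message."""


@dataclass(frozen=True)
class ContractVersion:
    major: int
    minor: int

    @classmethod
    def parse(cls, text: str) -> "ContractVersion":
        major, minor = text.split(".")
        return cls(int(major), int(minor))


# Hypothetical: the contract version this consumer agent was built against.
CONSUMER_SUPPORTS = ContractVersion(2, 1)


def check_compatible(produced: str,
                     supported: ContractVersion = CONSUMER_SUPPORTS) -> None:
    """Assumed policy: minor bumps only add optional fields; a major bump breaks."""
    got = ContractVersion.parse(produced)
    if got.major != supported.major:
        raise IncompatibleContract(
            f"producer sends v{got.major}.x, consumer expects v{supported.major}.x"
        )


def consume(message: dict) -> str:
    """The consumer checks the version before reading any field."""
    check_compatible(message["contract_version"])
    return message["summary"]


def test_producer_contract() -> None:
    """Contract test for CI; in practice the sample would come from the producer."""
    sample = {"contract_version": "2.3", "summary": "refund approved"}
    assert consume(sample) == "refund approved"
```

If the producer moves to `3.0`, `consume` raises `IncompatibleContract` instead of returning garbage, and the contract test fails before deployment, provided it is fed the producer’s real sample messages.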

Engineering model: roles, contracts, coordination

Roles. Each agent has one area of responsibility, a limited toolset and a defined task. A classifier agent doesn’t write the answer; an executor agent doesn’t decide whether escalation is needed. A narrow role makes an agent testable and replaceable.
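A limited toolset is easiest to guarantee when it is enforced in code rather than stated in the prompt. The sketch below is a hypothetical illustration: the `Role` type, the tool names and the stub tool functions are assumptions standing in for real tools with side effects.

```python
from collections.abc import Callable
from dataclasses import dataclass


class ToolNotAllowed(Exception):
    """Raised when an agent reaches outside its role."""


@dataclass(frozen=True)
class Role:
    name: str
    task: str
    allowed_tools: frozenset[str]


def call_tool(role: Role, tool: str,
              registry: dict[str, Callable[[], str]]) -> str:
    """Enforce the role boundary in code rather than in the prompt."""
    if tool not in role.allowed_tools:
        raise ToolNotAllowed(f"{role.name} may not use {tool}")
    return registry[tool]()


# Hypothetical tools for a support-ticket flow (stubs, no real side effects).
TOOLS: dict[str, Callable[[], str]] = {
    "read_ticket": lambda: "ticket text",
    "write_reply": lambda: "reply sent",
    "escalate": lambda: "escalated",
}

CLASSIFIER = Role("classifier", "label the ticket", frozenset({"read_ticket"}))
EXECUTOR = Role("executor", "draft and send the reply",
                frozenset({"read_ticket", "write_reply"}))
# Neither role holds "escalate": that decision belongs to a separate role
# or to the coordinator.
```

Because the boundary is data, a test can assert that the classifier cannot write a reply and the executor cannot escalate, which is what makes a narrow role checkable.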

Contracts. Input and output are typed structures with mandatory fields, including an explicit “unsure / couldn’t” flag. A contract makes the boundary of responsibility observable and lets a participant be changed without touching the rest. Contracts are versioned: a format change is a managed event, not a silent break.
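A typed output contract with mandatory fields and an explicit “unsure” flag can be sketched with the standard library alone; a schema library such as Pydantic or JSON Schema would serve the same purpose. The `ExtractionResult` fields, the `1.0` version string and the validation rules are hypothetical choices for an invoice-extraction agent, not a prescribed format.

```python
from dataclasses import dataclass
from typing import Optional


class ContractViolation(ValueError):
    """The message does not match the agreed structure."""


@dataclass(frozen=True)
class ExtractionResult:
    """Hypothetical output contract of an invoice-extraction agent, v1.0."""
    contract_version: str
    invoice_id: Optional[str]
    amount_cents: Optional[int]
    unsure: bool        # explicit "unsure / couldn't" flag
    reason: str = ""    # why the agent is unsure; empty otherwise

    def __post_init__(self) -> None:
        if self.contract_version != "1.0":
            raise ContractViolation(f"unexpected version {self.contract_version}")
        if not isinstance(self.unsure, bool):
            raise ContractViolation("unsure must be a boolean")
        if not self.unsure and (self.invoice_id is None or self.amount_cents is None):
            raise ContractViolation("a confident result must fill every field")
        if self.unsure and not self.reason:
            raise ContractViolation("an unsure result must say why")


REQUIRED = {"contract_version", "invoice_id", "amount_cents", "unsure"}
ALLOWED = REQUIRED | {"reason"}


def parse_result(payload: dict) -> ExtractionResult:
    """Validate raw agent output at the seam; anything off-contract fails loudly."""
    keys = set(payload)
    if REQUIRED - keys:
        raise ContractViolation(f"missing fields: {sorted(REQUIRED - keys)}")
    if keys - ALLOWED:
        raise ContractViolation(f"unknown fields: {sorted(keys - ALLOWED)}")
    return ExtractionResult(**payload)
```

The “unsure” path is part of the contract, not an exception: `parse_result({"contract_version": "1.0", "invoice_id": None, "amount_cents": None, "unsure": True, "reason": "scan unreadable"})` is a valid message that the next step can route on.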

Coordination. It must be unambiguous who decides the next step: either a coordinator that routes by the previous step’s result, or an event-driven choreography in which a step emits an event and the next participant is subscribed to it. The route is set by the architecture, not inferred by the model from free-form correspondence. Coordination also needs explicit limits: an iteration cap, timeouts and early exit.
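The coordinator variant can be sketched as a fixed routing table keyed by (agent, result status), plus the three limits: a step cap, a deadline and early exit. This is a hypothetical illustration: the `StepResult` envelope, the status values, the agent names and the stub agents are assumptions, and the deadline is only checked between steps, so a hung agent call would need its own timeout.

```python
import time
from collections.abc import Callable
from dataclasses import dataclass


@dataclass(frozen=True)
class StepResult:
    """Hypothetical envelope every agent returns to the coordinator."""
    status: str   # "ok", "unsure" or "done"
    payload: str


Agent = Callable[[str], StepResult]

# (agent, status) -> next agent. The route lives in code, not in a prompt.
ROUTES: dict[tuple[str, str], str] = {
    ("classifier", "ok"): "executor",
    ("classifier", "unsure"): "human",
    ("executor", "ok"): "reviewer",
    ("executor", "unsure"): "human",
    ("reviewer", "unsure"): "executor",  # send back for another attempt
}


def coordinate(agents: dict[str, Agent], task: str, max_steps: int = 6,
               timeout_s: float = 30.0) -> tuple[str, list[str]]:
    """Follow ROUTES until done, escalation, the step cap or the deadline."""
    deadline = time.monotonic() + timeout_s
    current, data, trail = "classifier", task, []
    for _ in range(max_steps):
        # Checked between steps only; a hung agent call needs its own timeout.
        if time.monotonic() > deadline:
            return "timeout", trail
        result = agents[current](data)
        trail.append(f"{current}:{result.status}")
        if result.status == "done":  # early exit
            return "done", trail
        nxt = ROUTES.get((current, result.status))
        if nxt is None or nxt == "human":  # no rule, or explicit escalation
            return "escalated", trail
        current, data = nxt, result.payload
    return "step_limit", trail


# Stub agents standing in for model-backed ones.
agents = {
    "classifier": lambda t: StepResult("ok", t),
    "executor": lambda t: StepResult("ok", f"reply to: {t}"),
    "reviewer": lambda t: StepResult("done", t),
}
print(coordinate(agents, "Where is my refund?"))
# ('done', ['classifier:ok', 'executor:ok', 'reviewer:done'])
```

If the reviewer never approves, the executor/reviewer loop stops at `max_steps` with `step_limit` instead of running forever; a combination the table does not cover escalates rather than letting a model improvise the next hop.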

Observability as part of the architecture. Typed messages between agents are logged and traced by step. This turns “an error somewhere” into “a failure at the seam of agent A and B on contract version N”.
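A minimal sketch of per-step tracing, under assumptions: one structured event is recorded per hop across a seam, and printing JSON stands in for a real log or tracing backend. The `TraceEvent` fields and the `Tracer` class are hypothetical; the point is that a failure is reported as a named seam and contract version rather than “an error somewhere”.

```python
import json
import uuid
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class TraceEvent:
    """One hop across a seam; field names are illustrative."""
    trace_id: str
    step: int
    sender: str
    receiver: str
    contract_version: str
    valid: bool
    error: str = ""


class Tracer:
    """Collects one event per hop; printing JSON stands in for a real exporter."""

    def __init__(self) -> None:
        self.trace_id = uuid.uuid4().hex
        self.events: list[TraceEvent] = []

    def record(self, sender: str, receiver: str, version: str,
               valid: bool, error: str = "") -> None:
        event = TraceEvent(self.trace_id, len(self.events) + 1, sender,
                           receiver, version, valid, error)
        self.events.append(event)
        print(json.dumps(asdict(event)))

    def first_failure(self) -> str:
        """Name the first failing seam and the contract version it used."""
        for e in self.events:
            if not e.valid:
                return (f"step {e.step}: seam {e.sender} -> {e.receiver}, "
                        f"contract v{e.contract_version}: {e.error}")
        return "no failures"


tracer = Tracer()
tracer.record("classifier", "executor", "1.0", valid=True)
tracer.record("executor", "reviewer", "2.0", valid=False,
              error="missing field amount_cents")
print(tracer.first_failure())
# step 2: seam executor -> reviewer, contract v2.0: missing field amount_cents
```

The shared `trace_id` ties all hops of one run together, so the events can be filtered and replayed step by step.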

Practical takeaway for business

Architecture is what determines the cost of ownership, and you can check it without being an engineer. Ask to be shown which contracts the agents have and whether they include an “unsure” field; who decides the next step, and by what rule; and what happens when the model is swapped. If the answer is “the agents agree among themselves in text”, the cost of future changes is unbounded.


Almost everyone has adopted — few capture value

Adoption is near-universal, but measurable business impact is rare. The gap is not access to AI; it is whether AI was built into a managed process.

Source: McKinsey, The State of AI 2025 https://www.mckinsey.com/capabilities/quantumblack/our-insights/the-state-of-ai

The gap between “adopted” and “got an effect” is largely the gap between a system with an architecture and a tangle of prompts. The first can be reworked piece by piece and delivers a measurable effect; the second looks good in a demo and never reaches a stable result.


Open questions

Centralized coordination or choreography: the choice depends on the process. Centralization is easier to debug, while choreography is more robust to the failure of an individual participant; there is no universal answer. How strict to make contracts is a trade-off between flexibility and predictability. Whether emergent behaviour in large agent systems can be managed is an open research question, not a settled engineering practice.
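To make the choreography side of this trade-off concrete, here is a minimal in-process sketch under assumptions: the `EventBus` class, the event names and the handlers are hypothetical, and a production system would use a real message broker. No component holds the route; each subscriber reacts to an event and emits the next one. A failing subscriber is isolated into a dead-letter list while the others continue, which illustrates the robustness point; the cost is that the route exists only implicitly in the subscriptions and has to be reconstructed from the log.

```python
from collections import defaultdict, deque
from collections.abc import Callable

# A handler receives an event payload and returns the events it emits.
Handler = Callable[[dict], list[tuple[str, dict]]]


class EventBus:
    """Minimal in-process bus; a real system would use a message broker."""

    def __init__(self, max_events: int = 50) -> None:
        self.subscribers: dict[str, list[Handler]] = defaultdict(list)
        self.log: list[str] = []                       # event types, in order
        self.dead_letters: list[tuple[str, str]] = []  # (event type, error)
        self.max_events = max_events                   # limit against event loops

    def subscribe(self, event_type: str, handler: Handler) -> None:
        self.subscribers[event_type].append(handler)

    def publish(self, event_type: str, payload: dict) -> None:
        queue = deque([(event_type, payload)])
        while queue and len(self.log) < self.max_events:
            etype, data = queue.popleft()
            self.log.append(etype)
            for handler in self.subscribers[etype]:
                try:
                    queue.extend(handler(data))
                except Exception as exc:
                    # One failing subscriber is isolated; the others still run.
                    self.dead_letters.append((etype, repr(exc)))


def classify(d: dict) -> list[tuple[str, dict]]:
    return [("ticket.classified", {**d, "category": "billing"})]


def draft_reply(d: dict) -> list[tuple[str, dict]]:
    return [("reply.drafted", {**d, "reply": "Your invoice is attached."})]


def analytics(d: dict) -> list[tuple[str, dict]]:
    raise RuntimeError("analytics store unavailable")


bus = EventBus()
bus.subscribe("ticket.received", classify)
bus.subscribe("ticket.classified", analytics)
bus.subscribe("ticket.classified", draft_reply)
bus.publish("ticket.received", {"id": 1})
print(bus.log)           # ['ticket.received', 'ticket.classified', 'reply.drafted']
print(bus.dead_letters)  # [('ticket.classified', "RuntimeError('analytics store unavailable')")]
```

The `max_events` cap plays the same role as an iteration limit: a subscriber that re-emits its own event type stops after the cap instead of looping forever.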

If you are assessing a multi-agent system built by a contractor or in-house, dissect its architecture by contracts and coordination before work starts: roles, seams and the cost of future changes.