engineering notes

Why natural language is inconvenient for machine coordination

Where natural language adds cost and errors to exchanges between agents, and how compact representations of meaning address both.

In brief for executives. Natural language is an excellent interface between a human and a machine and a poor protocol between machines. When agents coordinate in text, two bills grow: for tokens and for parse errors. This is not philosophy but an engineering decision: the exchange language between agents directly affects the system’s cost of ownership and reliability.

The temptation is clear: models work well with text, so let agents communicate in text too. In a demo it looks elegant. In production it turns out that natural language is the most expensive and least reliable way to connect machines.

Between people — language. Between machines — a contract.

Hypothesis: the exchange language is an engineering decision, not a given

The “language” agents exchange in is a design decision with measurable consequences. Natural language is redundant (many tokens per unit of meaning) and ambiguous (one phrasing — different interpretations). For coordinating machines both properties are defects.

data

Machine-exchange reliability: free text vs structure

The more strictly the exchange is formalized, the more reliable the coordination. Natural language as a protocol between agents is the least reliable option; a strict schema almost eliminates parse failures.

Source: Structured Output Reliability, a review of approaches, 2025 https://www.cognitivetoday.com/2025/10/structured-output-ai-reliability/


Problem: ambiguity and token volume

Ambiguity. One agent reads “Prepare a client report for the period” one way, and another agent reads it differently: which period, which format, what counts as a client. Between people this gets clarified in conversation; between agents it becomes a silent error.

Token volume. To remove ambiguity, the text gets longer: more explanations, more examples. Every clarification costs tokens, and that cost recurs at every step of every process.

data

Why multi-agent systems fail (1,600+ execution traces)

Nearly 80% of failures come from specification and coordination, that is, from architecture rather than a “weak model”. They are fixed by contracts and explicit coordination, not by swapping the LLM.

Source: Why Do Multi-Agent LLM Systems Fail? (MAST, UC Berkeley), NeurIPS 2025 https://arxiv.org/pdf/2503.13657

The failure distribution of multi-agent systems is consistent with this: most failures are coordination and specification breakdowns, the kind that free-text exchange tends to produce.

Why the usual approaches don’t work

“Phrase it more precisely in the prompt” reduces ambiguity at the cost of growing tokens — the problem is moved, not removed.

“Ask in words for a JSON answer” without schema enforcement usually yields valid syntax but a drifting format: extra fields appear and mandatory ones go missing.
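A minimal sketch of that drift, using only the standard library. The field names and both replies are hypothetical, invented for illustration; the point is that `json.loads` accepts both replies while only one of them matches what a downstream agent expects.

```python
import json

# Hypothetical contract for illustration: the fields a downstream agent expects.
REQUIRED_FIELDS = {"client_id", "period_start", "period_end"}
ALLOWED_FIELDS = REQUIRED_FIELDS | {"format"}


def check_drift(raw: str) -> dict:
    """Parse a reply and report fields that are missing or unexpected."""
    payload = json.loads(raw)  # succeeds for both replies below
    keys = set(payload)
    return {
        "missing": sorted(REQUIRED_FIELDS - keys),
        "extra": sorted(keys - ALLOWED_FIELDS),
    }


# Two replies to the same request; both are syntactically valid JSON.
reply_a = '{"client_id": "c-17", "period_start": "2025-01-01", "period_end": "2025-03-31"}'
reply_b = '{"client": "c-17", "period": "Q1 2025", "notes": "see attachment"}'

drift_a = check_drift(reply_a)
drift_b = check_drift(reply_b)
print(drift_a)
print(drift_b)
```

The second reply parses without error, so a syntax-only check would pass it along; the mismatch shows up only when the keys are compared against an explicit contract.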

“Add examples to the prompt” turns the exchange into an unreadable, unversionable and expensive construction — the same flaw at greater volume.

Engineering model: formalized exchange

Schema as a constraint, not a hint. An agent’s output conforms to the schema by construction rather than being “usually valid”, which removes parse failures at the source.
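One way to approximate this on the receiving side is a parser that returns a typed object or raises, with nothing in between. The sketch below is an assumption-laden illustration: `ReportRequest`, `ContractError` and the field names are hypothetical, and it checks messages at the boundary rather than constraining the producing model's decoder, which would need support from the model-serving layer.

```python
import json
from dataclasses import dataclass, fields
from datetime import date


class ContractError(ValueError):
    """Raised when a message does not match the contract."""


@dataclass(frozen=True)
class ReportRequest:
    # Hypothetical contract fields, chosen for illustration.
    client_id: str
    period_start: date
    period_end: date


def parse_report_request(raw: str) -> ReportRequest:
    """Return a ReportRequest or raise ContractError; never a partial result."""
    try:
        payload = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ContractError(f"not JSON: {exc}") from exc
    if not isinstance(payload, dict):
        raise ContractError("top-level value must be an object")
    expected = {f.name for f in fields(ReportRequest)}
    if set(payload) != expected:
        raise ContractError(f"fields {sorted(payload)} != {sorted(expected)}")
    if not isinstance(payload["client_id"], str):
        raise ContractError("client_id must be a string")
    try:
        start = date.fromisoformat(payload["period_start"])
        end = date.fromisoformat(payload["period_end"])
    except (TypeError, ValueError) as exc:
        raise ContractError(f"bad date: {exc}") from exc
    if start > end:
        raise ContractError("period_start is after period_end")
    return ReportRequest(payload["client_id"], start, end)


request = parse_report_request(
    '{"client_id": "c-17", "period_start": "2025-01-01", "period_end": "2025-03-31"}'
)
print(request)
```

Code after the parser can rely on the types and never needs to re-check the message, so a malformed reply fails loudly at one place instead of silently further down the process.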

Compact meaning. A structure with mandatory fields is passed, not a paragraph of text: fewer tokens for the same information, no room for ambiguity.
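As a rough illustration, the client-report request can be written once as prose and once as a structure that answers the open questions (which period, which format, what counts as a client) explicitly. Both messages and all field names are hypothetical. Character count is used here only as a crude stand-in for token count; a real comparison would need the tokenizer of the model in use.

```python
import json

# Hypothetical prose version of the request, lengthened to remove ambiguity.
prose = (
    "Please prepare a report for client c-17 covering the first quarter of 2025, "
    "from January 1 through March 31 inclusive, as a PDF, counting only accounts "
    "with an active contract as clients."
)

# The same request as a structure with mandatory fields (names are illustrative).
message = {
    "task": "client_report",
    "client_id": "c-17",
    "period_start": "2025-01-01",
    "period_end": "2025-03-31",
    "format": "pdf",
    "client_definition": "active_contract",
}

# Compact separators drop the whitespace json.dumps adds by default.
compact = json.dumps(message, separators=(",", ":"))

print(len(prose), len(compact))
```

The structured form is shorter in characters here, and each ambiguous point has exactly one place where it must be stated.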

Natural language at the human boundary. Where a human is in the loop (a request, an explanation, an escalation), text is appropriate and needed. Between machines — a formal representation.

Contracts with an explicit “unsure”. Uncertainty is expressed as a field, not the tone of text — and is handled by the process, not guessed.
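A minimal sketch of such a contract, assuming a two-state status and a hypothetical `route` step that sends unsure results to a human. The names `Status`, `AgentResult` and `route` are illustrative and not taken from any particular framework.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Status(Enum):
    OK = "ok"
    UNSURE = "unsure"


@dataclass(frozen=True)
class AgentResult:
    status: Status
    value: Optional[str] = None          # required when status is OK
    unsure_reason: Optional[str] = None  # required when status is UNSURE

    def __post_init__(self) -> None:
        # Reject results that claim a status without the matching field.
        if self.status is Status.OK and self.value is None:
            raise ValueError("OK result requires a value")
        if self.status is Status.UNSURE and not self.unsure_reason:
            raise ValueError("UNSURE result requires a reason")


def route(result: AgentResult) -> str:
    """Pick the next step from the status field alone, not from wording."""
    if result.status is Status.UNSURE:
        return f"escalate_to_human: {result.unsure_reason}"
    return f"continue: {result.value}"


confident = AgentResult(Status.OK, value="report-2025-q1.pdf")
doubtful = AgentResult(Status.UNSURE, unsure_reason="two clients match the name")
print(route(confident))
print(route(doubtful))
```

Because the uncertainty is a field, the process can branch on it deterministically; no downstream agent has to infer doubt from phrases like "probably" or "I think".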

Practical takeaway for business

The exchange language between agents is a cost and risk line. A system with formal exchange is cheaper in tokens and more robust to failures than one where agents “correspond”. This is visible in the bill and in the number of incidents.

Ask how the agents exchange among themselves: by schema or by text. “They understand each other in natural language” is a marketing plus and an engineering minus at once.


Open questions

Where exactly the “human: text, machine: structure” boundary runs in a specific process is a design decision, not a general rule. How much formalization reduces a system’s flexibility is a per-domain trade-off. Whether models will match schema-level reliability on free text is unsettled: there is progress, but for critical processes formalization remains cheaper than trust.

If your agents coordinate in text, that means extra tokens and silent errors. We’ll look at where to replace text exchange with a formal contract.