
LLM orchestration: why wrappers don't survive the second version

How a working AI system differs from an LLM wrapper, where wrappers break under real requirements, and what orchestrating a workflow around the model is actually made of.

An LLM wrapper is one stateless model call: text in, text out. For a demo, that is enough. Under real requirements, such a system does not survive the second iteration. The model is not the problem: the complexity lives around the model call rather than in it, and a wrapper has nowhere to put it.

What breaks first

The scenario is always the same. Version one is “user asked, model answered”. Then real requirements arrive, and each one breaks the wrapper on its own.

“Sometimes it needs to fetch data from a system.” A tool call appears. Now it is not one call but at least four steps: decide that a tool is needed → call it → parse the result → answer. In a wrapper, the only place to keep state between these steps is one ever-growing prompt.
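A minimal sketch of the tool-call path as four explicit steps over one state object, instead of one growing prompt. All names here are hypothetical: `fake_model_decide` stands in for the LLM deciding whether a tool is needed, and `fake_lookup_order` stands in for an external system that returns raw JSON.

```python
import json
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ToolCallState:
    """Hypothetical workflow state: what each step has produced so far."""
    question: str
    tool_name: Optional[str] = None
    tool_args: dict = field(default_factory=dict)
    tool_raw: Optional[str] = None      # raw tool response, before parsing
    tool_result: Optional[dict] = None  # parsed tool response
    answer: Optional[str] = None


def fake_model_decide(question: str) -> tuple[Optional[str], dict]:
    # Hypothetical stand-in for an LLM deciding whether a tool is needed.
    if "order" in question.lower():
        return "lookup_order", {"order_id": "A-17"}
    return None, {}


def fake_lookup_order(order_id: str) -> str:
    # Hypothetical stand-in for an external system returning raw JSON.
    return json.dumps({"order_id": order_id, "status": "shipped"})


TOOLS = {"lookup_order": fake_lookup_order}


def step_decide(state: ToolCallState) -> ToolCallState:
    state.tool_name, state.tool_args = fake_model_decide(state.question)
    return state


def step_call_tool(state: ToolCallState) -> ToolCallState:
    if state.tool_name is not None:
        state.tool_raw = TOOLS[state.tool_name](**state.tool_args)
    return state


def step_parse(state: ToolCallState) -> ToolCallState:
    if state.tool_raw is not None:
        state.tool_result = json.loads(state.tool_raw)
    return state


def step_answer(state: ToolCallState) -> ToolCallState:
    # In a real system this would be a second model call; a template stands in.
    if state.tool_result is not None:
        r = state.tool_result
        state.answer = f"Order {r['order_id']} is {r['status']}."
    else:
        state.answer = "No system data needed."
    return state


def run(question: str) -> ToolCallState:
    state = ToolCallState(question=question)
    for step in (step_decide, step_call_tool, step_parse, step_answer):
        state = step(state)  # state is inspectable between every step
    return state
```

Because every intermediate result is a field on `ToolCallState`, a failed parse or a wrong tool choice shows up as a concrete value rather than as text buried in a prompt.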

“The answer is sometimes invalid.” You need validation and retry, and retry means branching plus an exit condition. In a wrapper this becomes an ad-hoc while loop with heuristics nobody can explain a month later.
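A minimal sketch of validation and retry with an explicit exit condition, assuming a hypothetical output contract (a JSON object with a string `answer` field). The `generate` callable is a stand-in for the model call and receives the attempt number; a real implementation might also feed back the previous validation error.

```python
import json
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class RetryOutcome:
    value: Optional[dict]  # validated output, or None if every attempt failed
    attempts: int
    errors: list[str]


def validate(raw: str) -> dict:
    """Hypothetical contract: JSON object with a string 'answer' field."""
    data = json.loads(raw)  # json.JSONDecodeError is a subclass of ValueError
    if not isinstance(data, dict) or not isinstance(data.get("answer"), str):
        raise ValueError("missing string field 'answer'")
    return data


def generate_with_retry(
    generate: Callable[[int], str], max_attempts: int = 3
) -> RetryOutcome:
    errors: list[str] = []
    for attempt in range(1, max_attempts + 1):
        raw = generate(attempt)
        try:
            return RetryOutcome(validate(raw), attempt, errors)
        except ValueError as exc:
            errors.append(f"attempt {attempt}: {exc}")
    # Exit condition reached: no valid output; the caller decides what happens next.
    return RetryOutcome(None, max_attempts, errors)
```

The loop bound and the recorded errors are what a hand-written while loop usually lacks: both the stopping rule and the reason for each retry are visible in the result.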

“Sometimes it must go to a human.” You need an escalation point that preserves state so the human sees the context. A wrapper keeps no state, so escalation ends up bolted on from the side.
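A minimal sketch of an escalation point that snapshots workflow state for a human reviewer and lets the workflow resume with the human's decision. The `Workflow` class, its status strings and the in-memory `human_queue` are hypothetical; a real system would persist the queue.

```python
import copy
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Workflow:
    """Hypothetical workflow with explicit state and an escalation queue."""
    state: dict[str, Any] = field(default_factory=dict)
    status: str = "running"
    human_queue: list[dict[str, Any]] = field(default_factory=list)

    def escalate(self, reason: str) -> None:
        # Deep-copy the state so the human sees exactly what the workflow saw,
        # even if the live state changes afterwards.
        self.human_queue.append({"reason": reason, "context": copy.deepcopy(self.state)})
        self.status = "waiting_for_human"

    def resume(self, human_decision: str) -> None:
        if self.status != "waiting_for_human":
            raise RuntimeError("nothing to resume")
        self.state["human_decision"] = human_decision
        self.status = "running"
```

Usage: a workflow escalates with its context intact, then continues from the same state once the human answers.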

“Why did it answer differently yesterday?” You need tracing of steps and decisions. A wrapper has no steps, only one call, so there is fundamentally no way to say what went wrong.
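A minimal sketch of step and decision tracing: each step records its input, output and the transition it chose with a reason, so "why did it do this" becomes a query over data. `Tracer` and `TraceEvent` are hypothetical names; the events are kept in memory here and exported as JSON Lines, assuming inputs and outputs are JSON-serializable.

```python
import json
import time
from dataclasses import asdict, dataclass
from typing import Any


@dataclass
class TraceEvent:
    run_id: str
    step: str
    step_input: Any
    step_output: Any
    decision: str  # which transition was taken, and why
    ts: float


class Tracer:
    """Hypothetical in-memory tracer; a real system would persist events."""

    def __init__(self, run_id: str) -> None:
        self.run_id = run_id
        self.events: list[TraceEvent] = []

    def record(self, step: str, step_input: Any, step_output: Any, decision: str) -> None:
        self.events.append(
            TraceEvent(self.run_id, step, step_input, step_output, decision, time.time())
        )

    def explain(self, step: str) -> list[str]:
        """Answer "why did the system act this way at this step" from data."""
        return [e.decision for e in self.events if e.step == step]

    def to_jsonl(self) -> str:
        # One JSON object per line, assuming inputs/outputs are JSON-serializable.
        return "\n".join(json.dumps(asdict(e)) for e in self.events)
```

Comparing two runs' JSONL exports step by step is one way to answer "why did it answer differently yesterday".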

Each requirement seems small on its own. Together they mean the system must have explicit state, explicit steps and explicit transitions between them. That is orchestration, and it cannot be added to a wrapper after the fact; it has to be built in.

What orchestration is made of

Materialized workflow state. State lives in the workflow, not inside the prompt: what has been done, what tools returned, which step we are on. For each step, the prompt is assembled from the state that step needs instead of accumulating everything.
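A minimal sketch of state that lives outside the prompt, with a per-step prompt builder that pulls in only the fields that step needs. `WorkflowState`, the step names and the `STEP_INPUTS` mapping are hypothetical, chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class WorkflowState:
    """Hypothetical materialized state: it lives outside any prompt."""
    question: str
    current_step: str = "classify"
    completed_steps: list[str] = field(default_factory=list)
    tool_results: dict[str, str] = field(default_factory=dict)
    draft_answer: Optional[str] = None


# Which state fields each step is allowed to see (an assumption for illustration).
STEP_INPUTS = {
    "classify": ["question"],
    "answer": ["question", "tool_results"],
    "review": ["question", "draft_answer"],
}


def build_prompt(state: WorkflowState, step: str) -> str:
    """Assemble the prompt for one step from the fields it needs, nothing more."""
    parts = [f"Task: {step}"]
    for name in STEP_INPUTS[step]:
        parts.append(f"{name}: {getattr(state, name)}")
    return "\n".join(parts)
```

The review step, for example, sees the draft answer but not the raw tool results, so its prompt stays the same size however long the workflow has been running.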

Explicit steps and contracts between them. Each step has typed input and output. That is what makes behavior reproducible and testable: a step can be run in isolation with a known input and its output checked.
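A minimal sketch of a step with a typed input and output, where the output type also enforces its contract at runtime, plus a test that runs the step in isolation. The intent labels and confidence values are hypothetical, and a keyword rule stands in for the model call so the step is deterministic.

```python
from dataclasses import dataclass
from typing import Literal

Intent = Literal["order_status", "refund", "other"]


@dataclass(frozen=True)
class ClassifyInput:
    message: str


@dataclass(frozen=True)
class ClassifyOutput:
    intent: Intent
    confidence: float

    def __post_init__(self) -> None:
        # Enforce the contract at runtime, not just in type hints.
        if self.intent not in ("order_status", "refund", "other"):
            raise ValueError(f"unknown intent: {self.intent}")
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must be in [0, 1]")


def classify_step(inp: ClassifyInput) -> ClassifyOutput:
    """Hypothetical step; a real one would call the LLM and parse into ClassifyOutput."""
    text = inp.message.lower()
    if "refund" in text:
        return ClassifyOutput("refund", 0.9)
    if "order" in text:
        return ClassifyOutput("order_status", 0.8)
    return ClassifyOutput("other", 0.5)


def test_classify_step_in_isolation() -> None:
    # Known input, checked output: no other step has to run.
    out = classify_step(ClassifyInput("I want a refund for my order"))
    assert out == ClassifyOutput("refund", 0.9)
```

If the model later returns an intent outside the contract, construction of `ClassifyOutput` fails at this step instead of corrupting the next one.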

Explicit branching and termination conditions. When to retry, when to advance, when to stop — set explicitly, not inferred by the model inside one prompt. A loop that doesn’t converge must stop by rule, not by timeout.
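A minimal sketch of an explicit transition function: the next move is computed from step data by a fixed rule rather than left to the model. The `Next` enum, `StepResult` and the `MAX_ATTEMPTS` value are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class Next(Enum):
    RETRY = "retry"
    ADVANCE = "advance"
    STOP = "stop"  # hand off to a terminal handler


@dataclass(frozen=True)
class StepResult:
    valid: bool
    attempt: int


MAX_ATTEMPTS = 3  # hypothetical limit; what matters is that it is a rule


def transition(result: StepResult) -> Next:
    """Decide the next move from data, not from the model's own judgment."""
    if result.valid:
        return Next.ADVANCE
    if result.attempt < MAX_ATTEMPTS:
        return Next.RETRY
    # Non-converging loop: stop by rule after MAX_ATTEMPTS, not by timeout.
    return Next.STOP
```

Because `transition` is a pure function, every branch can be unit-tested and every decision can be logged alongside the data that produced it.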

Fault tolerance as part of the workflow. Timeouts, fallback to a simpler strategy, humans as the terminal handler. One step’s failure must not collapse the workflow — it must move it to a predefined state.
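A minimal sketch of a step wrapper in which every failure lands in a predefined state: full result, degraded result from a simpler fallback, or hand-off to a human. The state names are hypothetical, and the sketch assumes each callable enforces its own timeout (for example via its HTTP client) and raises, such as `TimeoutError`, instead of hanging.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Predefined states a step can end in (hypothetical names).
DONE, DONE_DEGRADED, NEEDS_HUMAN = "done", "done_degraded", "needs_human"


@dataclass
class Outcome:
    state: str             # always one of the predefined states above
    output: Optional[str]
    error: Optional[str]


def run_step(primary: Callable[[], str], fallback: Callable[[], str]) -> Outcome:
    """Try the primary strategy, then a simpler fallback, then hand to a human."""
    try:
        return Outcome(DONE, primary(), None)
    except Exception as exc:  # any failure moves us to a known state
        first_error = f"primary: {exc!r}"
    try:
        return Outcome(DONE_DEGRADED, fallback(), first_error)
    except Exception as exc:
        # Both strategies failed: the human is the terminal handler.
        return Outcome(NEEDS_HUMAN, None, f"{first_error}; fallback: {exc!r}")
```

The workflow never sees a raw exception from a step, only one of three states it already knows how to route.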

Where the line is

If the question “at which step, and why, did the system make this decision?” cannot be answered from workflow data, you have a wrapper, even if there are several model calls inside. The mark of orchestration is not the number of calls but explicit state and traceable transitions.
Decomposition into steps is a tradeoff, not a goal. Every extra step adds latency and coordination cost. Splitting makes sense where it buys testability and risk control, not for the sake of looking “architectural”.
A wrapper is cheapest to rewrite into orchestration before it has users. After that, the rewrite becomes a migration of a live system, with no logs to tell you what it was actually doing.

Conclusion

The value of production AI is not the model call but managing state, branching and failures around it. A wrapper holds up exactly until the first real requirement, because it has nowhere to put the complexity that requirement creates. Orchestration is not “a more complex wrapper” — it is a different architecture: a workflow with explicit state, contracts and traceable transitions. You build it from the start, because you cannot retrofit it into a wrapper.
