engineering notes
Event-driven AI systems instead of simple scenarios
Why a linear scenario breaks on exceptions while an event-driven architecture makes an AI system robust and observable.
In brief, for executives. A linear scenario of “step 1 → step 2 → step 3” looks good in a demo and is fragile in production, because reality is not linear: events arrive out of order, services respond late, and some steps repeat. An event-driven architecture is about operational reliability: fewer incidents, predictable behaviour under load, and observability. It is an engineering decision with a direct effect on operating cost.
Most AI automations are drawn as a straight line: a request comes in, the system does A, then B, then C, and returns the result. In a demo this works. In production, events arrive in the wrong order and several at a time, and the line snaps.
Reality is not linear — and systems shouldn’t be either.
Hypothesis: real processes are event-driven, not linear
In a real process, several things happen at once: new data arrives mid-processing; an external service responds later than the step that was waiting for it; the same signal arrives twice. These are not exceptions to handle “later”; they are the normal shape of the flow. A system designed as a line fights the very nature of the task.
Problem: a linear scenario depends on the “right order”
A linear pipeline implicitly assumes that everything happens in order and exactly once. A real flow breaks this assumption constantly: races, repeats, late responses. The line responds with hangs, duplicated actions and silent losses, and gets patched with endless “ifs” layered on top of the original scenario.
Nearly 80% of failures come down to specification and coordination, that is, to architecture rather than a “weak model”. They are fixed by contracts and explicit coordination, not by swapping the LLM.
Most failures of multi-step AI systems are exactly coordination and state breakdowns: precisely what a linear scenario doesn’t model.
Why the usual approaches don’t work
“Add handling for this case” on top of the line doesn’t scale: each new “if” complicates the scenario and spawns new races.
“Add a retry on error” without idempotency leads to duplicated actions: if the first attempt actually succeeded and only its response was lost, the retry performs the action a second time.
“Wait for the response synchronously” turns an external service’s delay into a hang of the whole process.
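The retry problem in the list above can be sketched in a few lines. This is a minimal illustration, not a production pattern: `PaymentService`, `charge`, `with_retry` and the `idempotency_key` parameter are hypothetical names invented for the example. The service performs the action and then loses the first response, which is the case where a blind retry does the most harm.

```python
import uuid


class PaymentService:
    """Hypothetical external service: performs the action, then loses the first response."""

    def __init__(self):
        self.charges = []                # every charge actually performed
        self._results_by_key = {}        # idempotency key -> stored result
        self._lose_next_response = True

    def charge(self, amount, idempotency_key=None):
        # A known key means the action was already done: return the stored result.
        if idempotency_key is not None and idempotency_key in self._results_by_key:
            return self._results_by_key[idempotency_key]

        self.charges.append(amount)      # the side effect happens here
        result = {"status": "ok", "amount": amount}
        if idempotency_key is not None:
            self._results_by_key[idempotency_key] = result

        if self._lose_next_response:
            self._lose_next_response = False
            raise TimeoutError("response lost after the action was performed")
        return result


def with_retry(call, attempts=3):
    """Retry a call on timeout; safe only if the call itself is idempotent."""
    last_error = None
    for _ in range(attempts):
        try:
            return call()
        except TimeoutError as err:
            last_error = err
    raise last_error


# Retry without idempotency: the action is performed twice.
naive_service = PaymentService()
with_retry(lambda: naive_service.charge(100))

# Retry with one key reused across attempts: the action is performed once.
safe_service = PaymentService()
key = str(uuid.uuid4())
with_retry(lambda: safe_service.charge(100, idempotency_key=key))
```

After this runs, `naive_service.charges` holds two entries and `safe_service.charges` holds one. The key is generated once per logical action and reused on every attempt; generating a new key per attempt would bring the duplication back.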
Engineering model: events, state, idempotency
The event as the unit. A step does not “call the next one”; it emits an event, and whoever must react subscribes to it. Ordering and parallelism stop breaking the process.
Process state. A task has explicit state: what has happened and what is done. A late or repeated event is handled correctly because there is something to reconcile it against.
Idempotency. Repeating a step does not change the result or double the action. This is what makes repeats safe, and therefore permissible.
Escalation by event. Low confidence, a timeout, a contradiction: these are events that the handoff to a human subscribes to, not an “if” branch at the end of a function.
Flow observability. Events, their order, delays and handling are visible. An incident turns from “hung somewhere” into “event X not handled by subscriber Y”.
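The model above can be sketched as a tiny in-process event bus. Everything here is an assumption made for illustration: the `Event` and `EventBus` classes, the event names (`request_received`, `classified`, `low_confidence`, `timeout`), the 0.8 confidence threshold, and the `confidence` value passed in the payload as a stand-in for a model's output. A real system would more likely sit on a message broker with durable delivery; the sketch only shows the shape of the idea.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Event:
    name: str
    task_id: str
    payload: dict = field(default_factory=dict)


class EventBus:
    """Synchronous in-process bus that records who handled each event."""

    def __init__(self):
        self._subscribers = defaultdict(list)
        self.log = []  # (event name, task id, handler name or None)

    def subscribe(self, event_name, handler):
        self._subscribers[event_name].append(handler)

    def emit(self, event):
        handlers = self._subscribers.get(event.name, [])
        if not handlers:
            # Observability: an event nobody handled is visible, not silently lost.
            self.log.append((event.name, event.task_id, None))
            return
        for handler in handlers:
            self.log.append((event.name, event.task_id, handler.__name__))
            handler(event)


bus = EventBus()
human_queue = []


def classify(event):
    # The step does not call the next step; it emits an event describing the outcome.
    confidence = event.payload["confidence"]
    next_name = "classified" if confidence >= 0.8 else "low_confidence"
    bus.emit(Event(next_name, event.task_id, event.payload))


def hand_off_to_human(event):
    # Escalation is a subscriber, not an "if" branch inside classify().
    human_queue.append(event.task_id)


bus.subscribe("request_received", classify)
bus.subscribe("low_confidence", hand_off_to_human)
bus.subscribe("timeout", hand_off_to_human)

bus.emit(Event("request_received", "task-1", {"confidence": 0.95}))
bus.emit(Event("request_received", "task-2", {"confidence": 0.40}))
bus.emit(Event("timeout", "task-3"))

# Events that no subscriber handled, taken straight from the log.
unhandled = [entry for entry in bus.log if entry[2] is None]
```

In this run `human_queue` ends up with `task-2` and `task-3`, and `unhandled` contains the `classified` event for `task-1`, because nothing subscribes to it yet. That is the “event X not handled by subscriber Y” view: the gap shows up in the log instead of as a hang.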
Practical takeaway for business
An event-driven design is about the number of incidents and the cost of operation. A linear system needs constant manual fixing of exceptions; an event-driven one brings those exceptions inside the model, so the system stabilizes.
According to IDC, of every 33 pilots launched only about 4 reach production. The cause of failure is not the technology but the underestimated complexity of embedding it in a real process.
Part of why pilots don’t reach production is exactly here: a linear demo doesn’t withstand the real, unordered flow, and rewriting it into an event-driven one “later” is costlier than designing it in from the start.
Ask how the system behaves when an event arrives late or arrives twice. If the answer is “that won’t happen”, it will, and the cost of learning that in production is higher than the cost of designing for it in advance.
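One way to make that question concrete is to write down what the task state does with each event. The sketch below is an assumption-laden illustration: the statuses (`new`, `waiting`, `done`, `escalated`), the event kinds (`submitted`, `service_replied`, `timed_out`) and the function `apply_event` are all hypothetical. It shows a reply that arrives after the task has already timed out, followed by the same reply delivered again.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class TaskState:
    status: str = "new"
    result: Optional[str] = None
    seen_event_ids: set = field(default_factory=set)
    late_replies: list = field(default_factory=list)


def apply_event(state, event_id, kind, payload=None):
    """Apply one event to the task state and report what was done with it."""
    # Repeat event: same id seen before, so nothing is done a second time.
    if event_id in state.seen_event_ids:
        return "duplicate_ignored"
    state.seen_event_ids.add(event_id)

    if kind == "submitted" and state.status == "new":
        state.status = "waiting"
        return "applied"
    if kind == "timed_out" and state.status == "waiting":
        state.status = "escalated"
        return "applied"
    if kind == "service_replied":
        if state.status == "waiting":
            state.status, state.result = "done", payload
            return "applied"
        if state.status == "escalated":
            # Late event: keep it for the human reviewer, do not overwrite the state.
            state.late_replies.append(payload)
            return "late_recorded"
    # Anything else does not fit the current state and is rejected explicitly.
    return "rejected"


state = TaskState()
outcomes = [
    apply_event(state, "e1", "submitted"),
    apply_event(state, "e2", "timed_out"),
    apply_event(state, "e3", "service_replied", "answer"),  # arrives after the timeout
    apply_event(state, "e3", "service_replied", "answer"),  # same event delivered again
]
```

Here `outcomes` is `["applied", "applied", "late_recorded", "duplicate_ignored"]` and the task stays `escalated`. The point is not these particular rules but that each case has an explicit, inspectable answer instead of “that won’t happen”.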
Open questions
Where the limit of justified event-driven design lies. For simple, rarely run processes a line is enough; the question is honest assessment, not architectural fashion.
Event-driven systems are harder to debug without good observability. That is the built-in price of the approach.
How finely to split a process into events is a trade-off between flexibility and complexity.
If your automation constantly requires manual fixing of “strange cases”, that is a symptom of a linear architecture. We’ll look at the event flow and where it snaps.