Engineering Notes
Carbonfay Engineering Notes — dense engineering write-ups: AI-system architecture, context engineering, economics and organization. Not SEO filler, but what we learned in practice.
RED-driven development: an AI agent's maturity lives in its red tests
Why the "feature → test → green" cycle isn't enough for AI agents, and how RED-driven development measures maturity by the quality of the red tests an agent survives, not the count of green ones.
GREEN bias: how development environments falsify AI-agent quality
Why testing AI agents yields green reports while the agent fails in production: the built-in GREEN bias of development environments, and how to bypass it with honest evaluation.
The bot that passes every test and doesn't sell
Why an AI bot with 95% green tests barely moves sales: the tests check knowledge of facts, not the ability to handle doubt, hold the dialog, or bring the customer back.
Why every demo bot is smarter than the real one
An AI bot's demo is always smarter than production: tests are written by people who know the system and unconsciously help the agent. That's a class of engineering error, not a prompt tweak.
Context contamination in tests
If the tester — human or LLM — knows how the system is built, the test is spoiled: a hint leaks into the check. The fix is full isolation of the test loop from knowledge of the implementation.
Intents don't exist
Why the classic "user → intent → slots → result" scheme only works in slide decks, while a real person holds several intentions at once and shifts them as the dialog unfolds.
Chaos as the primary form of human dialog
Why teams treat a chaotic dialog as the user's mistake, when in fact chaos is the norm of live conversation, and a dialog system must be designed for it rather than "training" the human.
Why the user doesn't know what they want
A conversational AI agent treats the first message as a fully formed need, but more often the need takes shape during the conversation. How to design an agent that leads the user to a formulation instead of guessing.
User lying as a normal operating mode
Users routinely give an AI agent wrong data — budget, dates, goal — and not out of malice. A mirror of LLM hallucination: designing for unreliable input as the foundation of agent reliability.
A model of cognitive noise
An AI agent's quality is defined not by the right answer but by resilience to dialog noise: topic switches, contradictions, emotions, returns to old questions. How to redefine the quality metric.
A red team for conversational AI
Why a developer can't honestly test their own conversational agent, and why you need an AI red team — an independent opponent agent whose job is to prove the agent doesn't work.
Agent versus agent: a new model of QA
Why the tester of a conversational AI is another agent, not a human. The "Customer Simulator → Target Agent → Judge" architecture as multi-agent engineering applied to QA.
Testing an agent against real dialogs
Why synthetic scenarios are useless for chatbot and AI-agent testing, and how a corpus of real dialogs becomes the source of truth and a company asset.
Why LLMs play the customer role badly
LLM-based customer simulators behave too reasonably and help the agent instead of breaking it. Why honest AI-agent testing needs explicit models of difficult behavior.
Adopting AI in a company: where to start and what it costs
AI adoption pays off on specific repeated processes: where to start, how to compute cost and effect, what not to do.
Automating business processes with AI: what actually works
Which business processes AI actually automates — classification, routing, drafts, status reconciliation — and where it doesn't pay off.
Best approaches to AI agents for business: how to measure "best"
How to measure the "best" AI agent for business: reliability, cost of ownership, human control and embeddability — not the model.
Building multi-agent systems: architecture that doesn't fall apart
How to design multi-agent systems that work in production: roles, contracts, coordination, fault tolerance and predictable cost.
Context as the main resource of an AI system
Why an AI system's quality is set by context management, not model size, and how to manage it as an engineering discipline.
Context entropy and the degradation of answer quality
How noise accumulating in context lowers an AI system's answer quality, and which engineering techniques hold it back.
Cost-aware architecture for AI systems
How to design AI systems where cost is an engineering metric alongside latency and reliability, not a surprise at month's end.
Do machines need their own languages to coordinate?
Why agents need compact machine representations of meaning instead of natural language, and what it changes in cost and reliability.
Event-driven AI systems instead of simple scenarios
Why a linear scenario breaks on exceptions while an event-driven architecture makes an AI system robust and observable.
Hidden hardcode in AI automation
How hard-wired rules and prompt chains turn AI automation into technical debt, and why that raises the cost of every change.
How AI compresses operational processes
How AI removes intermediate steps, approvals and waits from operational processes, and what that gains in cycle time.
How to build a RAG system that doesn't lie in production
A practical breakdown of building a RAG system: sources, event-based indexing, hybrid search with reranking, and grounding evaluation.
How to compute the payback of AI agents
A model for computing AI-agent payback: what to count as benefit, how to account for token and operation cost, which assumptions are dangerous.
Multi-agent system architecture: roles, contracts, coordination
What a multi-agent system is made of: the agent as an element, input/output contracts, coordination and message exchange between agents.
On-prem RAG: when it's justified and when it's not
When an on-prem RAG system is really needed: data security, the perimeter, cost of ownership — and when the cloud wins.
Orchestrating AI agents in business processes
What AI-agent orchestration is: how to connect agents, tools and people into a managed business process with cost control.
Problems of multi-agent systems and how to avoid them
A breakdown of typical multi-agent failures — looping, context drift, cost growth — and engineering ways to avoid them.
RAG system architecture: sources, indexing, reranking
How a RAG (retrieval-augmented generation) system is built: sources, indexing, hybrid search, reranking and delivering the minimally sufficient context.
RAG: where it helps and where it creates an illusion of knowledge
When RAG actually raises accuracy and when it merely errs confidently, and how to tell search-over-a-base from understanding the business.
Reducing coordination costs with AI
Why a large company's main hidden cost is coordination, and how an AI layer lowers it without cutting people.
What AI-agent development costs and what drives the price
What makes up the cost of AI-agent development: process, integrations, human control, operation and token cost.
What AI-native engineering is and why it's not "coding with ChatGPT"
What AI-native engineering means: runtime thinking, architecture around models and cost control — and why it's not code generation in a chat.
Why a chatbot is not an AI architecture
How a corporate chatbot differs from an AI system: state, contracts, human control — and why this decides money and risk.
Why AI automation can suddenly become expensive
Where uncontrolled cost growth in AI automation comes from — context length, retries, bad routing — and how to keep the budget.
Why companies need an operational AI environment, not a chatbot
Why a point chatbot doesn't scale, while an operational AI environment lowers coordination costs and gives leaders visibility into processes.
Why natural language is inconvenient for machine coordination
Where natural language adds cost and errors to exchanges between agents, and how compact representations of meaning address both.