RAG Systems
Enterprise knowledge retrieval as engineering infrastructure: sources, event-driven indexing, re-ranking, quality evaluation.
Most RAG systems break in production not because of the model but because of context. “Vector DB plus a prompt” is a prototype: it works in a demo because the data is small, fresh and selected to match the questions. In production every one of those assumptions stops holding, and the system starts giving confidently wrong answers.
Sources and indexing
Internal documents, conversations, tasks, repositories and CRM records are normalized and versioned. The index is rebuilt on a source-change event, not on a weekly schedule. This removes an entire class of “confidently stale” answers, the most insidious kind, because they are not reproducible and cannot be debugged from a user complaint.
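A minimal sketch of event-driven indexing, assuming each source connector emits a change event that carries a document id, a monotonically increasing version and the new text. The names `SourceChange`, `Index` and `chunk` are hypothetical and exist only for illustration. A real index would also embed the chunks and persist them.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class SourceChange:
    """Hypothetical change event emitted by a source connector."""
    doc_id: str
    version: int
    text: Optional[str]  # None means the document was deleted at the source


def chunk(text: str) -> list[str]:
    """Naive chunking by blank-line-separated paragraphs (illustrative only)."""
    return [p.strip() for p in text.split("\n\n") if p.strip()]


@dataclass
class Index:
    """Toy in-memory index: doc_id -> (version, chunks)."""
    entries: dict[str, tuple[int, list[str]]] = field(default_factory=dict)

    def apply(self, event: SourceChange) -> bool:
        """Apply one change event; return True if the index changed."""
        current = self.entries.get(event.doc_id)
        # Drop late or duplicate events so an old version never overwrites a newer one.
        if current is not None and current[0] >= event.version:
            return False
        # A deletion is kept as an empty, versioned entry (a tombstone), so a
        # delayed older update cannot bring the document back.
        chunks = [] if event.text is None else chunk(event.text)
        self.entries[event.doc_id] = (event.version, chunks)
        return True
```

Only the changed document is re-indexed, and the version check makes event handling idempotent, which matters when the event transport can deliver duplicates or deliver out of order.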
Retrieval and context supply
Hybrid retrieval: dense retrieval finds candidates by meaning, lexical matching cuts those that “look like it but aren’t”, and re-ranking orders the rest by relevance to the specific query. What goes into the model is not “everything we found” but a re-ranked minimum within a set context budget. This is about quality and cost at once: on long, noisy context the model is worse at extracting what matters, and every call costs more.
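The stages above can be sketched as one function. This is an illustration under assumptions, not a reference implementation: `dense` and `rerank` are hypothetical scoring callables standing in for an embedding model and a re-ranker, `lexical_overlap` is a deliberately simple substitute for a lexical scorer such as BM25, and whitespace word counts approximate tokens.

```python
from typing import Callable

Scorer = Callable[[str, str], float]  # (query, chunk) -> score, higher is better


def lexical_overlap(query: str, chunk: str) -> float:
    """Fraction of query terms that literally occur in the chunk."""
    terms = set(query.lower().split())
    words = set(chunk.lower().split())
    return len(terms & words) / len(terms) if terms else 0.0


def build_context(
    query: str,
    chunks: list[str],
    dense: Scorer,
    rerank: Scorer,
    candidates: int = 20,
    min_overlap: float = 0.3,
    budget_tokens: int = 200,
) -> list[str]:
    """Return a re-ranked minimum of chunks that fits the context budget."""
    # 1. Dense stage: a wide candidate pool selected by meaning.
    pool = sorted(chunks, key=lambda c: dense(query, c), reverse=True)[:candidates]
    # 2. Lexical stage: cut candidates that share too few literal terms with the query.
    pool = [c for c in pool if lexical_overlap(query, c) >= min_overlap]
    # 3. Re-ranking: order the survivors by relevance to this specific query.
    pool.sort(key=lambda c: rerank(query, c), reverse=True)
    # 4. Budget: take chunks in ranked order, skipping any that no longer fit.
    context: list[str] = []
    used = 0
    for c in pool:
        cost = len(c.split())  # whitespace words as a rough token estimate
        if used + cost > budget_tokens:
            continue
        context.append(c)
        used += cost
    return context
```

The budget is enforced after re-ranking, so what gets dropped is always the least relevant material rather than whatever happened to be retrieved last.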
Quality evaluation
Without it, RAG degrades invisibly. You need relevance and groundedness metrics (how much the answer relies on the supplied context vs. the model’s memory), regression sets with known answers, and tracing of which fragment influenced the answer. The “the sources don’t contain this” behavior has to be designed separately: an honest refusal beats a confident guess, but it does not appear on its own.
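A regression set with an explicit refusal case might look like the sketch below. Everything here is an assumption made for illustration: `Case`, `REFUSAL`, `run_regression` and the `answer_fn` signature are hypothetical, and the word-overlap `groundedness` is a crude proxy for a real groundedness metric. Fragment-level tracing is not covered.

```python
from dataclasses import dataclass
from typing import Callable, Optional

REFUSAL = "The sources don't contain this."  # hypothetical fixed refusal message

# Hypothetical pipeline entry point: question -> (answer, context chunks used).
AnswerFn = Callable[[str], tuple[str, list[str]]]


@dataclass(frozen=True)
class Case:
    question: str
    expected: Optional[str]  # None: the sources do not cover this, so a refusal is expected


def groundedness(answer: str, context: list[str]) -> float:
    """Share of answer words that also occur in the supplied context (crude proxy)."""
    words = answer.lower().split()
    seen = set(" ".join(context).lower().split())
    return sum(w in seen for w in words) / len(words) if words else 0.0


def run_regression(
    cases: list[Case], answer_fn: AnswerFn, min_grounded: float = 0.5
) -> list[str]:
    """Return the questions whose answers regressed."""
    failures = []
    for case in cases:
        answer, context = answer_fn(case.question)
        if case.expected is None:
            # Out-of-scope question: only an explicit refusal passes.
            ok = answer == REFUSAL
        else:
            # In-scope question: the known answer must appear and be backed by context.
            ok = (
                case.expected.lower() in answer.lower()
                and groundedness(answer, context) >= min_grounded
            )
        if not ok:
            failures.append(case.question)
    return failures
```

Running such a set on every index or prompt change turns invisible degradation into a failing check, and the refusal cases keep the “the sources don’t contain this” path from regressing silently.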
Where the line is
If demo quality is excellent but production “sometimes lies”, the culprit is almost always index freshness and missing re-ranking, not the retrieval algorithm and not the model. “Add more context” hurts more often than it helps: accuracy depends more on context relevance than on completeness. RAG is context infrastructure, not model tuning.