RAG Systems
Building a RAG system (retrieval-augmented generation): enterprise knowledge retrieval as engineering infrastructure — sources, event-driven indexing, re-ranking, quality evaluation.
Most RAG systems break in production not because of the model but because of context. “Vector DB plus a prompt” is a prototype: it works in a demo because the data is small, fresh and selected to match the questions. In production every one of those assumptions stops holding, and the system starts giving confidently wrong answers.
Sources and indexing
Internal documents, conversations, tasks, repositories and CRM records are normalized and versioned. The index is rebuilt when a source changes, not on a weekly schedule. This removes an entire class of “confidently stale” answers, the most insidious kind, because such an answer is not reproducible and can’t be debugged from a user complaint.
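A minimal sketch of event-driven indexing, to make the idea above concrete. The `SourceChanged` event, the `Index` class and the hash-based version are hypothetical names chosen for illustration, not a specific product's API; a real pipeline would also chunk and embed the text before storing it.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class SourceChanged:
    """Hypothetical event emitted when a source document is created or edited."""
    doc_id: str
    text: str


@dataclass
class Index:
    """Toy in-memory index keyed by document id (stand-in for a real store)."""
    entries: dict = field(default_factory=dict)

    def handle(self, event: SourceChanged) -> bool:
        """Re-index one document; return True if the index actually changed."""
        # Normalize: collapse whitespace and lowercase before versioning.
        normalized = " ".join(event.text.split()).lower()
        # Version = short content hash, so identical content maps to one version.
        version = hashlib.sha256(normalized.encode("utf-8")).hexdigest()[:12]
        current = self.entries.get(event.doc_id)
        if current is not None and current["version"] == version:
            return False  # same content: skip redundant re-indexing
        self.entries[event.doc_id] = {"version": version, "text": normalized}
        return True


index = Index()
first = index.handle(SourceChanged("policy-1", "Refunds within 14 days."))
repeat = index.handle(SourceChanged("policy-1", "Refunds  within 14 days."))
edited = index.handle(SourceChanged("policy-1", "Refunds within 30 days."))
print(first, repeat, edited)
```

The index is updated at the moment the source changes, and only for the document that changed. A whitespace-only edit produces the same version, so it is skipped.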
Retrieval and context supply
Retrieval is hybrid: dense search finds candidates by meaning, lexical search filters out fragments that look similar but are not about the same thing, and re-ranking orders the rest by relevance to the specific query. What goes into the model is not “everything we found” but the smallest re-ranked set that fits a fixed context budget. This affects quality and cost at once: given long, noisy context, the model is worse at extracting what matters, and every call costs more.
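One way to sketch the pipeline above is to merge the dense and lexical result lists, then pack fragments into a budget. The merge here uses reciprocal rank fusion, a common technique picked for illustration rather than one the text prescribes. The function names, the sample fragments and the whitespace word count used as a token estimate are all assumptions. A re-ranking model would normally sit between the two steps; the fused order stands in for it.

```python
def rrf_fuse(rankings, k=60):
    """Reciprocal rank fusion: merge several ranked id lists into one."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Sort by fused score (descending); ties broken by id for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


def pack_context(ranked_ids, texts, budget_tokens):
    """Take fragments in ranked order while they fit into the budget."""
    chosen, used = [], 0
    for doc_id in ranked_ids:
        cost = len(texts[doc_id].split())  # crude token estimate: whitespace words
        if used + cost > budget_tokens:
            continue  # does not fit; a smaller fragment further down still might
        chosen.append(doc_id)
        used += cost
    return chosen, used


texts = {
    "a": "refund window is 14 days",
    "b": "refunds are issued within 14 days of purchase",
    "c": "shipping takes 3 days",
    "d": "refund requests need an order number",
}
dense = ["a", "b", "c"]    # hypothetical output of a dense (embedding) search
lexical = ["b", "d", "a"]  # hypothetical output of a lexical (keyword) search

fused = rrf_fuse([dense, lexical])
chosen, used = pack_context(fused, texts, budget_tokens=15)
print(fused, chosen, used)
```

Fragments ranked well by both retrievers rise to the top, and the budget decides how many of them the model actually sees.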
Quality evaluation
Without evaluation, RAG degrades invisibly. You need relevance and groundedness metrics (how much the answer relies on the supplied context rather than the model’s memory), regression sets with known answers, and tracing of which fragment influenced the answer. The “the sources don’t contain this” response has to be designed separately: an honest refusal beats a confident guess, but it does not appear on its own.
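As a rough illustration of the ideas above, the sketch below combines a groundedness score, an explicit refusal and a tiny regression set. Token overlap is a deliberately crude proxy for groundedness, used here only as an assumption to keep the example self-contained. The 0.6 threshold, the function names and the sample cases are hypothetical.

```python
import re

REFUSAL = "The sources don't contain this."


def tokens(text):
    """Lowercased alphanumeric tokens of a text, as a set."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def groundedness(answer, fragments):
    """Share of answer tokens that also occur in the supplied context (0..1)."""
    answer_tokens = tokens(answer)
    if not answer_tokens:
        return 0.0
    context_tokens = set().union(*(tokens(f) for f in fragments))
    return len(answer_tokens & context_tokens) / len(answer_tokens)


def guard(answer, fragments, threshold=0.6):
    """Replace a weakly grounded answer with an explicit refusal."""
    return answer if groundedness(answer, fragments) >= threshold else REFUSAL


context = ["Refunds are issued within 14 days of purchase."]

# Regression set: known cases with the behavior we expect to stay stable.
cases = [
    {"answer": "Refunds are issued within 14 days", "expect_refusal": False},
    {"answer": "Refunds take 90 business days via bank wire", "expect_refusal": True},
]

results = [
    (guard(case["answer"], context) == REFUSAL) == case["expect_refusal"]
    for case in cases
]
print(all(results))
```

Running a set like this on every index or prompt change is what makes degradation visible. The refusal path only exists because it was written explicitly.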
Where the line is
If demo quality is excellent but production “sometimes lies”, the culprit is almost always a stale index and missing re-ranking, not the retrieval algorithm and not the model. “Add more context” hurts more often than it helps: accuracy depends more on the relevance of the context than on its completeness. RAG is context infrastructure, not model tuning.
A vector database is not yet RAG
A vector database stores embeddings and answers “what is semantically similar” — that’s one component, not a system. A working RAG system is everything around it: source normalization and versioning, event-driven indexing, hybrid retrieval with re-ranking, context budgets and quality evaluation. The vector DB answers “what’s similar”; the RAG system answers “what to put into the model so the answer is grounded and reproducible”. Mistaking one for the other is a common reason “vector DB + a prompt” looks great in a demo and drifts in production.
Built for your task
We design RAG as a model-swappable layer of an AI system that runs locally, on external models or in a hybrid setup, depending on your data and cost requirements. Knowledge retrieval is a frequent step in AI agents and in process automation.