Carbonfay

RAG Systems

Building a RAG system (retrieval-augmented generation): enterprise knowledge retrieval as engineering infrastructure — sources, event-driven indexing, re-ranking, quality evaluation.

Diagram: sources → index → retrieve → re-rank → context → model, with event-driven indexing and quality evaluation (groundedness).

Most RAG systems break in production not because of the model but because of the context. “Vector DB plus a prompt” is a prototype: it works in a demo because the data is small, fresh and hand-picked to match the questions. In production every one of those assumptions stops holding, and the system starts giving confidently wrong answers.

Sources and indexing

Internal documents, conversations, tasks, repositories and CRM data are normalized and versioned. The index is rebuilt on a source-change event rather than on a weekly schedule. This removes an entire class of “confidently stale” answers, the most insidious kind: they are not reproducible and can’t be debugged from a user complaint.
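A minimal sketch of event-driven reindexing, assuming a hypothetical `SourceChangeEvent` that each source connector emits on upsert or delete, and a toy in-memory `Index` (both names are illustrative, not a specific product's API). The hash of the normalized text serves as the document version, so only documents that actually changed get re-chunked.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass(frozen=True)
class SourceChangeEvent:
    """Hypothetical event a source connector (wiki, CRM, repo) emits on change."""
    doc_id: str
    kind: str  # "upsert" or "delete"
    text: str = ""


@dataclass
class Index:
    """Toy in-memory index: doc_id -> (version hash, list of chunks)."""
    docs: dict = field(default_factory=dict)
    chunk_size: int = 200

    def handle(self, event: SourceChangeEvent) -> str:
        if event.kind == "delete":
            self.docs.pop(event.doc_id, None)
            return "deleted"
        if event.kind != "upsert":
            raise ValueError(f"unknown event kind: {event.kind}")
        # Normalize whitespace first so cosmetic edits don't bump the version.
        normalized = " ".join(event.text.split())
        version = hashlib.sha256(normalized.encode("utf-8")).hexdigest()
        current = self.docs.get(event.doc_id)
        if current is not None and current[0] == version:
            return "unchanged"
        # Re-chunk only this document; the rest of the index is untouched.
        size = self.chunk_size
        chunks = [normalized[i:i + size] for i in range(0, len(normalized), size)]
        self.docs[event.doc_id] = (version, chunks)
        return "reindexed"


index = Index()
print(index.handle(SourceChangeEvent("wiki/vpn", "upsert", "VPN  setup: use client v2.")))  # reindexed
print(index.handle(SourceChangeEvent("wiki/vpn", "upsert", "VPN setup: use client v2.")))   # unchanged
print(index.handle(SourceChangeEvent("wiki/vpn", "upsert", "VPN setup: use client v3.")))   # reindexed
print(index.handle(SourceChangeEvent("wiki/vpn", "delete")))                                 # deleted
```

In a real deployment the events could come from webhooks or change-data-capture on each source; the point of the design is that staleness is bounded by event latency rather than by a rebuild schedule.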

Retrieval and context supply

Hybrid retrieval: dense (embedding) search finds candidates by meaning, lexical search on exact terms weeds out passages that look related but aren’t, and re-ranking orders the survivors by relevance to the specific query. What goes into the model is not “everything we found” but the smallest re-ranked set that fits a fixed context budget. This affects quality and cost at once: with long, noisy context the model is worse at extracting what matters, and each call costs more.
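A sketch of the retrieval step, assuming the dense and lexical searches have already returned ranked lists of document IDs. Fusion uses reciprocal rank fusion (one common choice, swapped in here for illustration), `overlap_rerank` is a hypothetical stand-in for a real re-ranker such as a cross-encoder, and token cost is approximated by word count.

```python
from typing import Callable


def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Reciprocal rank fusion: score(d) = sum over rankings of 1 / (k + rank)."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda d: scores[d], reverse=True)


def overlap_rerank(query: str, text: str) -> float:
    """Hypothetical stand-in for a cross-encoder: share of query terms in the text."""
    q = set(query.lower().split())
    t = set(text.lower().split())
    return len(q & t) / len(q) if q else 0.0


def build_context(
    query: str,
    dense: list[str],
    lexical: list[str],
    texts: dict[str, str],
    rerank: Callable[[str, str], float] = overlap_rerank,
    budget_tokens: int = 512,
    min_score: float = 0.0,
    candidates: int = 20,
) -> list[str]:
    """Fuse both rankings, re-rank the top candidates, keep what fits the budget."""
    fused = rrf_fuse([dense, lexical])[:candidates]
    scored = [(d, rerank(query, texts[d])) for d in fused]
    scored.sort(key=lambda pair: pair[1], reverse=True)  # stable: ties keep fused order
    picked: list[str] = []
    used = 0
    for doc_id, score in scored:
        if score < min_score:
            break  # sorted descending, so everything after is below the bar too
        cost = len(texts[doc_id].split())  # crude token estimate: whitespace words
        if used + cost > budget_tokens:
            continue  # doesn't fit; a smaller, lower-ranked chunk still might
        picked.append(doc_id)
        used += cost
    return picked


texts = {
    "a": "refund policy for enterprise plans is 30 days",
    "b": "refund of travel expenses requires receipts",
    "c": "enterprise plans include SSO",
}
dense, lexical = ["b", "a", "c"], ["a", "c"]
print(rrf_fuse([dense, lexical]))  # ['a', 'c', 'b']
print(build_context("enterprise refund policy", dense, lexical, texts, budget_tokens=12))  # ['a', 'c']
print(build_context("enterprise refund policy", dense, lexical, texts, min_score=0.5))  # ['a']
```

`min_score` implements the “minimum” side and `budget_tokens` the ceiling. Skipping an oversized chunk rather than truncating it is a design choice: it keeps fragments intact, which makes tracing which fragment influenced an answer simpler.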

Quality evaluation

Without it, RAG degrades invisibly. You need relevance and groundedness metrics (groundedness: how much the answer relies on the supplied context rather than on the model’s memory), regression sets with known answers, and tracing of which fragment influenced the answer. The “the sources don’t contain this” behavior has to be designed separately: an honest refusal beats a confident guess, but it won’t appear on its own.
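A sketch of three evaluation pieces under stated assumptions: `groundedness` is a crude lexical proxy (the share of answer sentences whose words mostly occur in the supplied context), `answer_or_refuse` makes refusal an explicit, designed branch keyed on the best re-rank score, and `run_regression` reports questions whose answers lost a known expected string. All names and thresholds are hypothetical.

```python
import re
from typing import Callable

REFUSAL = "The sources don't contain this."


def groundedness(answer: str, context: list[str], min_overlap: float = 0.6) -> float:
    """Share of answer sentences whose words mostly occur in the supplied context.

    Crude lexical proxy (assumption); NLI models or LLM judges are commonly used
    for a stronger version of the same check.
    """
    ctx_words = set(re.findall(r"\w+", " ".join(context).lower()))
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", answer.strip()) if s]
    if not sentences:
        return 0.0
    supported = 0
    for sentence in sentences:
        words = re.findall(r"\w+", sentence.lower())
        if words and sum(w in ctx_words for w in words) / len(words) >= min_overlap:
            supported += 1
    return supported / len(sentences)


def answer_or_refuse(best_score: float, draft: str, threshold: float = 0.5) -> str:
    """Designed refusal: if nothing relevant enough was retrieved, don't guess."""
    return draft if best_score >= threshold else REFUSAL


def run_regression(cases: list[tuple[str, str]], answer_fn: Callable[[str], str]) -> list[str]:
    """Return the questions whose answers no longer contain the known expected text."""
    return [q for q, expected in cases if expected.lower() not in answer_fn(q).lower()]


context = ["Enterprise plans have a 30-day refund window."]
answer = "Enterprise plans have a 30-day refund window. Hardware is covered for two years."
print(groundedness(answer, context))  # 0.5: the second sentence has no support
print(answer_or_refuse(0.2, "Probably 14 days."))  # The sources don't contain this.
cases = [("What is the refund window?", "30-day"), ("Is SSO included?", "SSO")]
print(run_regression(cases, lambda q: "Enterprise plans have a 30-day refund window."))
# ['Is SSO included?']
```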

Where the line is

If demo quality is excellent but production “sometimes lies”, the culprit is almost always index freshness and missing re-ranking, not the retrieval algorithm and not the model. “Add more context” hurts more often than it helps: accuracy depends more on the relevance of the context than on its completeness. RAG is context infrastructure, not model tuning.

A vector database is not yet RAG

A vector database stores embeddings and answers “what is semantically similar” — that’s one component, not a system. A working RAG system is everything around it: source normalization and versioning, event-driven indexing, hybrid retrieval with re-ranking, context budgets and quality evaluation. The vector DB answers “what’s similar”; the RAG system answers “what to put into the model so the answer is grounded and reproducible”. Mistaking one for the other is a common reason “vector DB + a prompt” looks great in a demo and drifts in production.

Built for your task

We design RAG as a model-swappable layer of an AI system: local, on external models or hybrid, depending on your data and cost requirements. Knowledge retrieval is a frequent step in AI agents and in process automation.
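A sketch of what “model-swappable” can mean in code: retrieval and prompt assembly depend only on a small `ChatModel` protocol, and a per-step mapping decides which implementation runs. `LocalModel`, `ExternalModel` and the step names are hypothetical stubs, not clients for any real runtime or API.

```python
from dataclasses import dataclass
from typing import Protocol


class ChatModel(Protocol):
    """The only thing the RAG pipeline knows about a model."""

    def generate(self, prompt: str) -> str: ...


@dataclass
class LocalModel:
    """Hypothetical stub for a self-hosted model (data stays in your network)."""
    name: str

    def generate(self, prompt: str) -> str:
        return f"{self.name} (local) got {len(prompt)} chars"


@dataclass
class ExternalModel:
    """Hypothetical stub for a hosted model behind an API client."""
    name: str

    def generate(self, prompt: str) -> str:
        return f"{self.name} (external) got {len(prompt)} chars"


def build_prompt(question: str, context: list[str]) -> str:
    """Same grounded prompt regardless of which model will read it."""
    sources = "\n".join(f"[{i}] {c}" for i, c in enumerate(context, start=1))
    return f"Answer only from the sources below.\n{sources}\nQuestion: {question}"


def run_step(step: str, question: str, context: list[str], models: dict[str, ChatModel]) -> str:
    """Retrieval output and prompt are shared; only the model behind `step` changes."""
    return models[step].generate(build_prompt(question, context))


# Hybrid setup: query rewriting stays local, the final answer uses an external model.
models: dict[str, ChatModel] = {
    "rewrite_query": LocalModel("local-llm"),
    "final_answer": ExternalModel("hosted-llm"),
}
print(run_step("rewrite_query", "refund window?", ["Refunds: 30 days."], models))
print(run_step("final_answer", "refund window?", ["Refunds: 30 days."], models))
```

Switching a step from external to local (or back) is then a configuration change in `models`, not a rewrite of retrieval or evaluation.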


Straight answers

Why is building a RAG system harder than "a vector DB + a prompt"?
"Vector DB + prompt" works in a demo and breaks in production: the index goes stale, retrieval gets noisy, context is unbounded. A working RAG system means normalized, versioned sources, event-driven indexing, hybrid retrieval with re-ranking, context budgets and continuous quality evaluation.
Local RAG or external models?
It depends on your data and cost requirements. The architecture is the same; only the model layer differs. We design that layer to be swappable: local, external, or hybrid per step.
How do you know RAG is working correctly?
By relevance and groundedness metrics (the answer relies on sources, not the model's memory), regression sets, and tracing which fragment influenced the answer. Without this, degradation is invisible.


Next step

Let's design an AI-native automation layer for your operations.
