engineering notes
RAG: where it helps and where it creates an illusion of knowledge
When RAG actually raises accuracy and when it merely errs confidently, and how to tell search-over-a-base from understanding the business.
In brief for executives. RAG (knowledge search + a model) really raises accuracy when the answer exists in the sources and can be found. But the same mechanism creates an illusion of knowledge: the system answers confidently even where the wrong thing, or nothing, was found. For the business the confident error is exactly the danger — it is more expensive than an honest “I don’t know”, because it passes unnoticed further down the process. The line “where RAG helps and where it harms” is a business decision about error cost, not a technical parameter.
RAG is sold as a way to “let the model know your data”. Part of that promise is true: when the answer exists in the sources, accuracy rises several-fold. The other part is a trap: the same mechanism answers just as confidently when there is no answer. Let’s go through where the line runs.
Search over a base is not understanding the business.
Hypothesis: RAG is search, not understanding
RAG does not “understand the business”. It finds similar text and retells it. When the right fragment is found, that is useful. When something similar but wrong is found, the model retells the wrong thing just as smoothly. Search over a base and understanding the domain are different things, and substituting one for the other is the illusion of knowledge.
Retrieval, not model size, decides accuracy: with the model held fixed, adding RAG raises accuracy several-fold. Answer quality is set by the context that reaches the model.
Where an answer exists and is found, the gain is real and large. That is “where it helps”.
Problem: confidence doesn’t depend on correctness
A RAG answer has no built-in “I’m unsure” indicator. The answer reads the same whether a precise fragment was found or a similar but stale one was pulled in. The user sees no difference; the system gives no signal. That is why the illusion of knowledge is more dangerous than explicit ignorance: the error stays invisible until its consequences surface.
Why the usual approaches don’t work
“Add more documents” raises the chance that something similar is found for any question, so the share of confident off-base answers grows too.
“Take a better model” doesn’t help: the model doesn’t know that the provided fragment fails to answer the question; it simply turns that fragment into a coherent answer.
“Trust it, it’s usually right” doesn’t work where the cost of a rare error is high: average accuracy says nothing about the cost of a specific confident error.
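The first point above can be illustrated with a toy simulation. This is a sketch under strong assumptions: documents are modeled as random unit vectors (real embeddings are not random), the dimensionality and corpus sizes are arbitrary, and the names `random_unit_vector` and `best_similarity` are invented for the example. It only shows the direction of the effect: for a question that has no answer in the base, the closest unrelated fragment cannot get less similar as the base grows.

```python
import math
import random


def random_unit_vector(rng, dim):
    """A random direction, standing in for the embedding of an unrelated document."""
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def best_similarity(query, docs):
    """Cosine similarity of the closest document (all vectors are unit length)."""
    return max(sum(q * d for q, d in zip(query, doc)) for doc in docs)


rng = random.Random(0)
DIM = 16  # toy dimensionality, chosen only to keep the run fast

# A question whose answer is NOT in the base: every document is unrelated to it.
query = random_unit_vector(rng, DIM)
corpus = [random_unit_vector(rng, DIM) for _ in range(5000)]

# The corpora are nested, so the best match can only stay the same or improve.
best_by_size = {n: best_similarity(query, corpus[:n]) for n in (10, 100, 1000, 5000)}
for n, score in best_by_size.items():
    print(f"{n:>5} docs: closest unrelated fragment scores {score:.2f}")
```

A fixed similarity cutoff that rejected everything in a small base may therefore start letting unrelated fragments through once the base is large.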
Engineering model: where RAG helps and where it harms
Helps when: the answer exists in the sources; the sources are fresh and versioned; there is reranking (the fragment that answers the question is found, not just the one that resembles it); the cost of a single error is moderate and the answer gets checked.
Reranking is the layer missing from the naive “vector → model” scheme, and the one that most shifts results from “wrong” to “right”.
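The two-stage idea can be sketched as follows. Everything here is a toy under stated assumptions: bag-of-words cosine stands in for vector search, and `toy_answer_score` is a hypothetical placeholder for a real reranker (a cross-encoder or an LLM judge); it merely rewards fragments that state a concrete number. The documents and the question are invented for the example.

```python
import math
from collections import Counter


def tokens(text):
    words = (w.strip(".,:;?!").lower() for w in text.split())
    return [w for w in words if w]


def cosine(a, b):
    """Bag-of-words cosine similarity: a toy stand-in for embedding similarity."""
    ca, cb = Counter(tokens(a)), Counter(tokens(b))
    dot = sum(ca[t] * cb[t] for t in ca)
    norm = math.sqrt(sum(v * v for v in ca.values())) * math.sqrt(sum(v * v for v in cb.values()))
    return dot / norm if norm else 0.0


def retrieve(question, docs, k):
    """Stage 1: cheap search that returns the k most SIMILAR fragments."""
    return sorted(docs, key=lambda d: cosine(question, d), reverse=True)[:k]


def rerank(question, candidates, score_fn):
    """Stage 2: reorder the candidates by how well each one ANSWERS the question."""
    return sorted(candidates, key=lambda d: score_fn(question, d), reverse=True)


def toy_answer_score(question, fragment):
    """Hypothetical reranker: 1.0 if the fragment states a concrete number, else 0.0.
    It ignores the question; a real reranker would score the (question, fragment) pair."""
    return 1.0 if any(ch.isdigit() for ch in fragment) else 0.0


docs = [
    "Refund period for annual plans: see the annual plans refund period policy page.",
    "Annual plans can be refunded within 30 days of purchase.",
    "Monthly plans renew automatically.",
]
question = "What is the refund period for annual plans?"

candidates = retrieve(question, docs, k=2)
reranked = rerank(question, candidates, toy_answer_score)
print("most similar:  ", candidates[0])
print("after rerank:  ", reranked[0])
```

In this toy run the fragment that repeats the question’s wording wins the similarity search, while the fragment that actually states the period moves to the top only after the second stage.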
Creates an illusion when: the answer may not be in the sources but the system answers anyway; there is no grounding evaluation (no check that the answer is actually supported by the retrieved fragments); there is no “honestly don’t know” mode; the error cost is high (legal, financial, medical contexts).
The engineering answer is not “more RAG” but three things: continuous grounding evaluation on production traffic, explicit refusal when the basis is insufficient, and calibrating “answer / stay silent” to the process’s error cost.
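A minimal sketch of a refusal gate, assuming a deliberately crude grounding score: the share of the answer’s words that also occur in the retrieved fragments. The names `grounding_score`, `answer_or_refuse`, and `REFUSAL`, and the 0.8 threshold, are illustrative assumptions. A production check would more likely use an entailment model or an LLM judge, and the threshold would be set per process.

```python
REFUSAL = "Not found in the sources."


def tokens(text):
    words = (w.strip(".,:;?!").lower() for w in text.split())
    return {w for w in words if w}


def grounding_score(answer, fragments):
    """Share of the answer's words that also occur in the retrieved fragments (0.0 to 1.0)."""
    answer_terms = tokens(answer)
    if not answer_terms:
        return 0.0
    source_terms = set().union(*(tokens(f) for f in fragments))
    return len(answer_terms & source_terms) / len(answer_terms)


def answer_or_refuse(draft_answer, fragments, threshold):
    """Return the draft only if it is sufficiently grounded; otherwise refuse explicitly."""
    score = grounding_score(draft_answer, fragments)
    if score < threshold:
        return REFUSAL, score
    return draft_answer, score


fragments = ["Annual plans can be refunded within 30 days of purchase."]
supported = "Annual plans can be refunded within 30 days."
unsupported = "Refunds take 14 business days by wire transfer."

print(answer_or_refuse(supported, fragments, threshold=0.8))
print(answer_or_refuse(unsupported, fragments, threshold=0.8))
```

The point of the design is that the refusal is an explicit, separate outcome with its score attached, so downstream steps can tell “answered” from “did not find” instead of receiving fluent text in both cases.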
Practical takeaway for business
Decide which is more expensive: silence or a confident error. In reference scenarios it is cheaper to answer and accept the risk; in legal and financial ones it is cheaper to honestly say “I don’t know”. This decision is made before development and defines the architecture, not the other way round.
Require a refusal mode. Ask: what does the system do when the basis is insufficient: answer anyway or honestly say “didn’t find”? If the reply is “it always answers”, you are buying an illusion of knowledge, not knowledge.
Don’t confuse search over a base with understanding the business. RAG is powerful search with retelling; expert decisions with a high error cost require human control on expensive steps, not more faith in search.
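The first takeaway above reduces to simple arithmetic. As a sketch, assume the process owner can put a rough number on the cost of one confident error and on the cost of one refusal, and that the system has some estimate `p_correct` of the chance its answer is right. Answering is cheaper than refusing when (1 − p_correct) × cost_of_error < cost_of_silence. The function names and all cost figures below are invented for illustration.

```python
def should_answer(p_correct, cost_of_error, cost_of_silence):
    """Answer only if the expected cost of a wrong answer is below the cost of refusing."""
    return (1.0 - p_correct) * cost_of_error < cost_of_silence


def min_confidence_to_answer(cost_of_error, cost_of_silence):
    """Smallest p_correct at which answering stops being more expensive than silence."""
    if cost_of_error <= 0:
        raise ValueError("cost_of_error must be positive")
    return max(0.0, 1.0 - cost_of_silence / cost_of_error)


# Illustrative numbers only: a reference lookup vs. a legal question.
reference = {"cost_of_error": 2.0, "cost_of_silence": 1.0}
legal = {"cost_of_error": 100.0, "cost_of_silence": 1.0}

for name, costs in (("reference", reference), ("legal", legal)):
    threshold = min_confidence_to_answer(**costs)
    decision = should_answer(0.9, **costs)
    print(f"{name}: answer above p={threshold:.2f}; at p=0.90 answer? {decision}")
```

The same 90%-reliable system is worth letting answer in the reference scenario and worth silencing in the legal one, which is why the threshold belongs to the process, not to the model.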
Open questions
How to reliably measure grounding without manual labeling: we approximate it with automated scores, but they do not replace spot checks.
Where exactly the “answer / stay silent” line runs: this is a business decision that changes from process to process.
How much better new models calibrate their own uncertainty: there is progress, but on critical processes it does not cancel human control.
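On the first question, one way to combine automated scores with spot checks is sketched below. It assumes a log of `(answer_id, automated_score)` pairs; the function name `pick_for_review`, the cutoff, and the sample size are hypothetical. Every low-scoring answer goes to a human, plus a random sample of the high-scoring ones, because the automated score can itself be wrong.

```python
import random


def pick_for_review(scored_answers, low_score, sample_size, seed=0):
    """Select answers for manual review.

    scored_answers: list of (answer_id, automated_score) pairs.
    Returns every answer scored below low_score, followed by a random
    sample of the remaining answers (at most sample_size of them).
    """
    suspicious = [a for a in scored_answers if a[1] < low_score]
    rest = [a for a in scored_answers if a[1] >= low_score]
    rng = random.Random(seed)  # fixed seed so the audit sample is reproducible
    sampled = rng.sample(rest, min(sample_size, len(rest)))
    return suspicious + sampled


# Illustrative log: ten answers with automated scores 0.0, 0.1, ..., 0.9.
log = [(i, i / 10) for i in range(10)]
review_queue = pick_for_review(log, low_score=0.3, sample_size=2)
print(review_queue)
```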
If your system always answers, even when it didn’t find anything, that is an illusion of knowledge, and it is expensive on critical processes. We’ll define the error cost and where an honest “don’t know” mode is needed.