Carbonfay

engineering notes

RAG: where it helps and where it creates an illusion of knowledge

When RAG actually raises accuracy, when it merely errs with confidence, and how to tell search over a knowledge base from understanding the business.

In brief for executives. RAG (retrieval-augmented generation: search over a knowledge base plus a language model) really does raise accuracy when the answer exists in the sources and can be found. But the same mechanism creates an illusion of knowledge: the system answers confidently even when it retrieved the wrong thing, or nothing. For the business, the confident error is the real danger: it costs more than an honest “I don’t know” because it travels unnoticed further down the process. Where the line runs between “RAG helps” and “RAG harms” is a business decision about error cost, not a technical parameter.


RAG is sold as a way to “let the model know your data”. Part of that promise is true: when the answer exists in the sources, accuracy rises several-fold. The other part is a trap: the same mechanism answers just as confidently when there is no answer. Let’s go through where the line runs.

Search over a base is not understanding the business.

Hypothesis: RAG is search, not understanding

RAG does not “understand the business”. It finds similar passages and restates them. When the right passage is found, that is useful. When something similar but wrong is found, the model restates the wrong thing just as smoothly. Search over a knowledge base and understanding the domain are different things, and substituting one for the other is the illusion of knowledge.

data
Answer accuracy: model alone vs the same model with RAG
Base model, no retrieval (HaluEval): 10%
Same model with RAG (HaluEval): 45%
Base model, no retrieval (TriviaQA): 5%
Same model with RAG (TriviaQA): 35%

What decides accuracy is retrieval, not model size: on the same model, adding RAG multiplies accuracy. Answer quality is set by which context reached the model.

Source: Exploring RAG Solutions to Reduce Hallucinations in LLMs, IEEE, 2024 https://ieeexplore.ieee.org/document/11014810/

Where an answer exists and is found — the gain is real and large. That is “where it helps”.
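The figures above can be turned into two numbers that matter for a decision: how much RAG multiplies accuracy, and how many answers are still not correct afterwards. The sketch below is plain arithmetic on the chart values, assuming they are answer accuracy in percent; the names `ACCURACY` and `summarize` are illustrative.

```python
# Chart figures read as answer accuracy in percent:
# (model alone, same model with RAG). Assumption: both columns
# are measured on the same question set.
ACCURACY = {
    "HaluEval": (10, 45),
    "TriviaQA": (5, 35),
}


def summarize(base: int, with_rag: int) -> dict:
    """Return the accuracy multiplier, the gain in points, and the residual."""
    return {
        "multiplier": with_rag / base,      # how many times accuracy grew
        "gain_points": with_rag - base,     # absolute gain, percentage points
        "still_wrong": 100 - with_rag,      # share of answers still not correct
    }


if __name__ == "__main__":
    for name, (base, with_rag) in ACCURACY.items():
        s = summarize(base, with_rag)
        print(
            f"{name}: x{s['multiplier']:.1f}, "
            f"+{s['gain_points']} points, "
            f"{s['still_wrong']}% still not correct"
        )
```

On these figures the multiplier is 4.5 and 7, yet more than half of the answers are still not correct with RAG switched on. That residual is where confident errors can hide.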

Problem: confidence doesn’t depend on correctness

A RAG answer has no built-in “I’m unsure” indicator. The answer reads the same whether a precise fragment was found or a similar but stale one was pulled in. The user sees no difference; the system gives no signal. That is why the illusion of knowledge is more dangerous than explicit ignorance: the error stays invisible until its consequences surface.

Why the usual approaches don’t work

“Add more documents” raises the chance that something similar is found for any question — so the share of confident off-base answers grows too.

“Take a better model” doesn’t help: the model doesn’t know that the provided fragment doesn’t answer the question; it simply turns that fragment into a coherent answer.

“Trust it, it’s usually right” doesn’t work where the cost of a rare error is high: average accuracy says nothing about the cost of a specific confident error.
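The first point above, that more documents make “something similar” easier to find, can be illustrated with a toy model. The sketch below assumes random unit vectors as stand-ins for embeddings of documents that are unrelated to the question; real embeddings are not random, so treat it as an illustration of the effect, not a measurement. The function names and default parameters are hypothetical.

```python
import math
import random


def unit_vector(rng: random.Random, dim: int) -> list:
    """Draw a random direction: a stand-in for an unrelated document's embedding."""
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def best_similarity_by_corpus_size(sizes, dim: int = 32, seed: int = 0) -> dict:
    """For each corpus size n, return the best cosine similarity between one
    query and the first n documents of the same growing corpus."""
    rng = random.Random(seed)
    query = unit_vector(rng, dim)
    corpus = [unit_vector(rng, dim) for _ in range(max(sizes))]
    # Vectors are unit length, so the dot product is the cosine similarity.
    sims = [sum(q * d for q, d in zip(query, doc)) for doc in corpus]
    return {n: max(sims[:n]) for n in sizes}


if __name__ == "__main__":
    for n, best in best_similarity_by_corpus_size([10, 100, 1000, 5000]).items():
        print(f"{n:>5} unrelated documents: best similarity {best:.3f}")
```

Because each larger corpus contains the smaller one, the best match can only stay the same or improve as documents are added, even though none of them answers the question. A fixed similarity cutoff therefore lets more off-base fragments through as the base grows.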

Engineering model: where RAG helps and where it harms

Helps when: the answer exists in the sources; sources are fresh and versioned; there is reranking (so the passage that answers the question is found, not just a similar one); the cost of a single error is moderate and there is a check.

data
What a reranking step (cross-encoder) buys you
+25–48%: retrieval-quality gain from reranking (depending on baseline and domain)
+4 nDCG: cross-encoder advantage over a strong bi-encoder, BEIR average

Reranking is the layer missing from the naive “vector → model” scheme, and the one that most shifts results from “wrong” to “right”.

Source: BEIR benchmark; studies of cross-encoder reranking, 2022–2024 https://datasets-benchmarks-proceedings.neurips.cc/paper/2021/file/65b9eea6e1cc6bb9f0cd2a47751a186f-Paper-round2.pdf
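The two-stage shape behind these numbers can be sketched in a few lines: a cheap first stage retrieves a wide set of candidates, then a stronger scorer looks at each (question, passage) pair and reorders them. Everything below is illustrative. Token overlap stands in for vector similarity, and `toy_pair_score` is a hypothetical placeholder where a real system would call a cross-encoder model.

```python
from typing import Callable, List


def tokens(text: str) -> set:
    return set(text.lower().split())


def first_stage(query: str, docs: List[str], k: int) -> List[str]:
    """Cheap, wide retrieval: rank by raw token overlap
    (a stand-in for vector similarity)."""
    q = tokens(query)
    return sorted(docs, key=lambda d: len(q & tokens(d)), reverse=True)[:k]


def rerank(query: str, candidates: List[str],
           score_pair: Callable[[str, str], float], top_n: int) -> List[str]:
    """Re-score every (query, candidate) pair with a stronger scorer."""
    return sorted(candidates, key=lambda d: score_pair(query, d),
                  reverse=True)[:top_n]


def toy_pair_score(query: str, doc: str) -> float:
    """Hypothetical placeholder for a cross-encoder: Jaccard overlap, which
    penalizes long passages that match the query only incidentally."""
    q, d = tokens(query), tokens(doc)
    return len(q & d) / len(q | d) if q | d else 0.0


QUERY = "how many days to request a refund"
DOCS = [
    "customers have 14 days to request a refund",
    "how to request a refund form many customers ask how to fill in "
    "the form and where to send it",
    "office opening hours and holiday schedule",
]

if __name__ == "__main__":
    candidates = first_stage(QUERY, DOCS, k=2)
    print("first stage:", candidates)
    print("after rerank:", rerank(QUERY, candidates, toy_pair_score, top_n=1))
```

In this toy data the first stage ranks the long passage about refund forms highest because it shares more words with the question, while the pairwise scorer moves the passage that actually states the deadline to the top. That is the “similar” versus “answering” distinction in miniature.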

Creates an illusion when: the answer may not be in the sources but the system answers anyway; there is no grounding evaluation; there is no “honestly don’t know” mode; the error cost is high (legal, financial, medical contexts).

The engineering answer is not “more RAG” but: grounding evaluation on the live flow of requests, explicit refusal when the retrieved basis is insufficient, and calibrating “answer / stay silent” to the process’s error cost.
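A minimal sketch of what an explicit refusal path could look like, under stated assumptions: the retriever or reranker returns a relevance score per passage, and grounding is approximated by the share of answer words that also appear in the retrieved text. Both the `Passage` type and the crude `grounding_score` proxy are hypothetical; a production check would be stronger and still backed by spot checks.

```python
from dataclasses import dataclass
from typing import Callable, List

REFUSAL = "Not found in the sources."


@dataclass
class Passage:
    text: str
    score: float  # relevance from the retriever or reranker (assumed scale 0..1)


def grounding_score(answer: str, passages: List[Passage]) -> float:
    """Crude proxy: share of answer tokens that occur in the retrieved text."""
    answer_tokens = answer.lower().split()
    if not answer_tokens:
        return 0.0
    context = set(" ".join(p.text for p in passages).lower().split())
    return sum(t in context for t in answer_tokens) / len(answer_tokens)


def answer_or_refuse(
    question: str,
    passages: List[Passage],
    generate: Callable[[str, List[Passage]], str],
    min_relevance: float,
    min_grounding: float,
) -> str:
    """Answer only if something relevant was retrieved AND the draft
    answer is supported by it; otherwise refuse explicitly."""
    relevant = [p for p in passages if p.score >= min_relevance]
    if not relevant:
        return REFUSAL  # nothing good enough was found: do not call the model
    draft = generate(question, relevant)
    if grounding_score(draft, relevant) < min_grounding:
        return REFUSAL  # the model said more than the sources support
    return draft
```

The two thresholds are the “answer / stay silent” dial: raising them trades coverage for fewer confident errors, so they belong to the process owner as much as to the engineer.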

Practical takeaway for business

Decide which is more expensive: silence or a confident error. In reference scenarios it is cheaper to answer with risk; in legal and financial ones it is cheaper to honestly say “I don’t know”. This decision is made before development and defines the architecture, not the other way round.

Require a refusal mode. Ask what the system does when the basis is insufficient: does it answer anyway or honestly say “didn’t find”? If the answer is “it always answers”, you are buying an illusion of knowledge, not knowledge.

Don’t confuse search over a base with understanding the business. RAG is powerful search with retelling; expert decisions with a high error cost require human control on expensive steps, not more faith in search.
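The first decision above, silence versus a confident error, can be written as a break-even rule. Assume an answer is correct with probability p, a wrong answer costs `cost_of_error`, and a refusal costs `cost_of_silence`. Answering is cheaper in expectation when (1 - p) * cost_of_error < cost_of_silence, that is, when p > 1 - cost_of_silence / cost_of_error. The function name and the cost figures in the usage lines are hypothetical, and the rule only helps if p is reasonably calibrated.

```python
def min_confidence_to_answer(cost_of_error: float, cost_of_silence: float) -> float:
    """Smallest probability of being correct at which answering is
    cheaper in expectation than refusing.

    Derivation: answer if (1 - p) * cost_of_error < cost_of_silence,
    i.e. p > 1 - cost_of_silence / cost_of_error.
    """
    if cost_of_error <= 0:
        raise ValueError("cost_of_error must be positive")
    if cost_of_silence < 0:
        raise ValueError("cost_of_silence must not be negative")
    # If silence costs at least as much as an error, always answering is fine.
    return max(0.0, 1.0 - cost_of_silence / cost_of_error)


if __name__ == "__main__":
    # Illustrative costs in arbitrary units, not measured values.
    print("reference lookup:", min_confidence_to_answer(cost_of_error=2, cost_of_silence=1))
    print("legal or financial:", min_confidence_to_answer(cost_of_error=100, cost_of_silence=1))
```

With an error twice as expensive as silence, the system may answer when it is right at least half the time. With an error a hundred times as expensive, it should stay silent unless it is right about 99 times out of 100, which is why the same RAG stack can be acceptable in one process and harmful in another.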


Open questions

How to reliably measure grounding without manual labeling: we approximate it with automated scores, but these do not replace spot checks.

Where exactly the “answer / stay silent” line runs: this is a business decision that changes from process to process.

How much better new models calibrate their own uncertainty: there is progress, but on critical processes it does not remove the need for human control.


If your system always answers, even when it found nothing, that is an illusion of knowledge, and it is expensive on critical processes. The first step is to define the error cost and where an honest “don’t know” mode is needed.
