engineering notes

On-prem RAG: when it's justified and when it's not

When an on-prem RAG system is really needed: data security, the perimeter, cost of ownership — and when the cloud wins.

In brief, for executives: an on-prem RAG system is a decision about the data regime, not about fashion or about having “our own neural net”. For a significant share of large companies, it is the data class and regulation, not model capability, that determine what can be deployed at all. But an on-prem perimeter also carries a higher cost of ownership. The decision should follow from a “data class × requirements × TCO” matrix, not from a default in either direction.

“We need it on-prem, we don’t send data to the cloud” is heard on every other project with sensitive data. Sometimes there is a real requirement behind it, sometimes a reflexive fear. Let’s go through when an on-prem RAG system is justified and when the cloud is objectively cheaper.

“On-prem” is a decision about data, not about fashion.

Hypothesis: “on-prem” is about data governance, not technology

An on-prem perimeter does not make a RAG system better or worse as technology. It changes one thing: where the data physically lives and who has access to it. This is a management and regulatory decision, and it must be based on the data class, not on a general feeling that “it’s safer this way”.


What actually holds back AI adoption in large companies

53% of organizations named data privacy the top barrier to AI adoption (about 1,500 IT leaders surveyed)

An on-prem perimeter is not a fashion choice: for roughly half of companies, it is the data regime, not model capability, that decides what can be deployed at all.

Source: Cloudera, survey of IT leaders, 2025 https://www.cloudera.com/about/news-and-blogs.html

For roughly half of companies, data privacy is not just one factor among many but the main barrier: it determines not the quality of adoption but the very list of permissible use cases.

Problem: the perimeter is chosen by default

There are two mirror-image mistakes. The first is “everything to the cloud, it’s cheaper and faster”, with no analysis of which data goes there or whether regulation permits it. The second is “everything on-prem, it’s safer”, which ignores that an on-prem perimeter requires operation, updates and a team, and that some of the data inside it is not sensitive at all. Both decisions are made before the analysis, not as a result of it.

Why the usual approaches don’t work

“On-prem by default” doesn’t work, because on-prem RAG without context engineering reproduces all the same failures (stale index, no reranking, illusion of knowledge) and adds the cost of operating hardware and models.

“Cloud by default” doesn’t work where the data class or regulation directly forbids moving the data out of the perimeter; in that case the cost question is secondary.

“On-prem because it feels calmer” doesn’t work as an engineering rationale: calm is not a criterion; the criterion is data class, SLA and cost of ownership.

Engineering model: the decision matrix

The decision is assembled from four axes.

Data class. Part of the data is sensitive and cannot leave the perimeter; part is not. Most knowledge bases are heterogeneous: the sensible answer is not “all on-prem” but a split by class.

Regulation and data sovereignty. Where the requirements are explicit, the choice is predetermined and there is no cost argument to be had.

SLA and load. An on-prem perimeter requires its own fault tolerance and peak capacity; for rare requests this is costlier than the cloud, for a constant flow it may be cheaper.

Cost of ownership. On-prem is not only hardware but also model updates, operation and a team. This line item must be counted before the decision, not after.

The practical result is a hybrid scheme: the sensitive perimeter is processed on-prem, while non-sensitive operations and heavy models run wherever it is cheaper and faster, with an explicit boundary between the perimeters.
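The four axes can be sketched as a small routing function. This is only an illustration under stated assumptions, not a prescribed implementation: the names Workload, Placement, place and split are hypothetical, the two cost fields are assumed to be already computed for the required SLA and load, and a real classification would come from a data audit rather than two boolean flags.

```python
from dataclasses import dataclass
from enum import Enum


class Placement(Enum):
    ON_PREM = "on-prem"
    CLOUD = "cloud"


@dataclass(frozen=True)
class Workload:
    """Hypothetical description of one slice of the knowledge base."""
    name: str
    sensitive: bool              # data class: must the data stay in the perimeter?
    export_forbidden: bool       # regulation or sovereignty rule forbids export
    on_prem_monthly_cost: float  # assumed full cost of ownership at the required SLA
    cloud_monthly_cost: float    # assumed cloud bill for the same load and SLA


def place(w: Workload) -> Placement:
    # Data class and regulation decide first; cost is not consulted at all.
    if w.sensitive or w.export_forbidden:
        return Placement.ON_PREM
    # Only for data that may leave the perimeter do SLA, load and cost decide.
    if w.on_prem_monthly_cost < w.cloud_monthly_cost:
        return Placement.ON_PREM
    return Placement.CLOUD


def split(workloads):
    """Group workload names by placement, producing the hybrid scheme."""
    plan = {p: [] for p in Placement}
    for w in workloads:
        plan[place(w)].append(w.name)
    return plan
```

The order of the checks is the point of the sketch: a sensitive workload goes on-prem even when the cloud is far cheaper, and cost only breaks the tie for data that is allowed to leave.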


Where enterprise AI infrastructure runs today

71% of AI infrastructure runs outside the public cloud: on-prem and at the edge

On-prem and hybrid deployment is not a niche case but already the prevailing mode in large enterprises, especially in finance. The question is not “can we” but “when is it justified”.

Source: Enterprise Strategy Group (ESG), 2025 https://www.esg-global.com/

On-prem and hybrid deployment is already the prevailing mode, not a niche one: the question is not “can we at all” but “what exactly should we keep in our own perimeter, and why”.

Practical takeaway for business

Don’t choose the perimeter before classifying the data. First, establish which data is sensitive and what regulation says; then look at SLA and load; only then at cost of ownership. The decision is almost always hybrid, not “all on-prem / all cloud”.

Count the on-prem TCO honestly. Hardware is the smaller part; the main part is operation, updates and people. A contractor who calls on-prem cheap “because there are no token bills” hasn’t counted the cost of ownership.

On-prem is a constraint by data class, not a goal. The goal is a working process while meeting data requirements; the perimeter is chosen for it.
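The advice to count on-prem TCO honestly can be made concrete with a back-of-the-envelope break-even calculation. The sketch below is an assumption-laden illustration: the function names, the cost lines and every number in the example are hypothetical, and the cloud side is simplified to a flat price per request.

```python
def on_prem_monthly_tco(hardware_cost, amortization_months,
                        operations, staff, model_updates):
    """Monthly cost of ownership: amortized hardware plus recurring lines."""
    return hardware_cost / amortization_months + operations + staff + model_updates


def break_even_requests(on_prem_monthly, cloud_cost_per_request):
    """Monthly request volume above which on-prem becomes cheaper than the cloud."""
    return on_prem_monthly / cloud_cost_per_request


# Hypothetical numbers, chosen only to show the shape of the calculation.
tco = on_prem_monthly_tco(
    hardware_cost=36_000,    # one-off purchase
    amortization_months=36,  # gives 1,000 per month for hardware
    operations=2_000,
    staff=6_000,
    model_updates=1_000,
)
volume = break_even_requests(tco, cloud_cost_per_request=0.02)
print(f"monthly TCO: {tco:.0f}, break-even: about {volume:.0f} requests/month")
```

With these made-up inputs, hardware is a tenth of the monthly total, and on-prem only pays off for a steady flow of roughly half a million requests per month. Below that volume, the “no token bills” argument does not hold for data that is allowed to leave the perimeter.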


Open questions

Where exactly the boundary between perimeters runs for a specific knowledge base is a data-classification task that is solved in an audit, not by a general rule. How to balance quality and isolation when the best models become available in the cloud first is an open trade-off. How well on-prem models catch up with cloud ones on your tasks is checked by measurement on your data, not by a promise.

If you have sensitive data and a high cost of error in answers, it is worth working out what must be on-prem and what need not be. We’ll lay it out by data class, SLA and cost of ownership.