engineering notes

Context as the main resource of an AI system

Why an AI system's quality is set by context management, not model size, and how to manage it as an engineering discipline.

In brief for executives. In an AI system the main managed resource is not the model but the context: what is fed to the model at each step. It is managed like a budget, because both answer quality and cost depend directly on it. Companies that treat context as a resource get a predictable system; those that “throw everything into the model just in case” get a growing bill and unstable quality.

When an AI system behaves badly, the first impulse is to blame the model. Almost always it is not the model but what, and in what volume, made it into the model. Context is exactly the resource a system lives on or chokes on.

An AI system’s main resource is not the model, but the context.

Hypothesis: quality and cost are set by context, not the model

The same model on precise, minimally sufficient context gives a useful answer; on bloated, noisy context — a confident error. Model size barely moves this result. So designing an AI system is first of all designing context management, not picking a model.

data

Model context windows grow roughly 30× per year

4K → 1M+ tokens of context window (early 2023 → 2025)

≈30×/yr growth rate of context length since mid-2023

The window grows faster than the ability to use it: you can fit almost anything, but accuracy depends on what is put there and how. Window size is no substitute for context engineering.

Source: Epoch AI, context length analysis https://epoch.ai/data-insights/context-windows

The context window has grown by orders of magnitude, so “fitting everything” has become technically possible. But capacity and usefulness are different things.

Problem: context is treated as “add more text”

A common mental model: quality is low, so let’s add more history, more documents, more examples to the context. The context grows, cost grows with it (you pay per token), and accuracy falls (the model mixes sources and loses the important among the unimportant).

data

Answer accuracy by position of the needed fact in a long context

Models use long context unevenly: what lands in the middle gets lost. “More context” without managed delivery lowers accuracy rather than raising it. Values are illustrative; the profile is from the study.

Source: Lost in the Middle: How Language Models Use Long Contexts (Liu et al.), 2023 https://arxiv.org/abs/2307.03172

The model uses long context unevenly: what’s in the middle is lost. “More” literally means “worse” if delivery isn’t managed.

Why the usual approaches don’t work

“Put in all retrieved material” doesn’t work, because the relevant drowns in the almost-relevant, and the bill grows linearly with length.

“Take a model with a bigger window” doesn’t work, because the problem is not capacity but selection: the model still uses long context unevenly.

“Put in the whole dialogue history” doesn’t work, because most of the history isn’t needed for the current step, yet is paid for and adds noise.
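The cost side of the last point can be checked with simple arithmetic. The sketch below is an illustration under stated assumptions: every turn adds the same number of tokens, and the function name and the `window` parameter (resend only the last few turns) are hypothetical, not a recommended policy. It shows that resending the full history each turn makes the total billed input grow quadratically with dialogue length.

```python
from typing import Optional


def cumulative_input_tokens(turns: int, tokens_per_turn: int,
                            window: Optional[int] = None) -> int:
    """Total input tokens billed over a dialogue.

    window=None resends the full history on every turn;
    window=k resends only the last k turns (an illustrative cap).
    """
    total = 0
    for turn in range(1, turns + 1):
        sent_turns = turn if window is None else min(turn, window)
        total += sent_turns * tokens_per_turn
    return total


# 20 turns of 200 tokens each (numbers are made up for illustration)
full_history = cumulative_input_tokens(20, 200)
last_three = cumulative_input_tokens(20, 200, window=3)
print(full_history, last_three)
```

With these made-up numbers the full-history variant bills several times more input tokens than the capped one, even though each individual turn is small.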

Engineering model: context as a budgeted resource

A budget per step. Each step is given a ceiling: how many context tokens it may use. This forces the system to select, not accumulate.
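A per-step ceiling could be sketched as follows. This is a minimal illustration, not a reference implementation: `StepBudget` and `fit_to_budget` are hypothetical names, and `estimate_tokens` is a crude word count standing in for the tokenizer of whatever model is actually used.

```python
from dataclasses import dataclass


def estimate_tokens(text: str) -> int:
    """Crude stand-in for a real tokenizer: counts whitespace-separated words."""
    return len(text.split())


@dataclass(frozen=True)
class StepBudget:
    step: str
    max_tokens: int  # ceiling on context tokens for this step


def fit_to_budget(fragments: list[str], budget: StepBudget) -> list[str]:
    """Keep fragments, in the given priority order, while they fit under the ceiling."""
    kept: list[str] = []
    used = 0
    for fragment in fragments:
        cost = estimate_tokens(fragment)
        if used + cost > budget.max_tokens:
            continue  # does not fit; a smaller later fragment still may
        kept.append(fragment)
        used += cost
    return kept


budget = StepBudget(step="draft_reply", max_tokens=5)
print(fit_to_budget(["a b c", "d e f g h", "i j"], budget))
```

The point of the ceiling is that anything over budget has to be dropped explicitly, so the order of `fragments` becomes a deliberate priority decision.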

Selection, not accumulation. A reranked minimum sufficient for the answer goes into context, not “everything found”. Selection is a separate engineering step, not a side effect of search.
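Selection as a separate step might look like the sketch below. It assumes a reranker has already assigned each candidate a relevance score in [0, 1]; the `Candidate` type, the threshold and the `top_k` value are hypothetical and would be tuned by measurement.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    text: str
    score: float  # relevance from a reranker (assumed to lie in [0, 1])


def select_context(candidates: list[Candidate],
                   min_score: float = 0.5,
                   top_k: int = 3) -> list[Candidate]:
    """Drop weak matches first, then keep only the best few."""
    strong = [c for c in candidates if c.score >= min_score]
    ranked = sorted(strong, key=lambda c: c.score, reverse=True)
    return ranked[:top_k]


found = [
    Candidate("refund policy", 0.9),
    Candidate("old press release", 0.4),
    Candidate("refund form", 0.7),
    Candidate("shipping terms", 0.6),
    Candidate("returns FAQ", 0.55),
]
print([c.text for c in select_context(found)])
```

Search returns five candidates here, but only three reach the model; the rest are discarded by an explicit rule rather than left for the model to sort out.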

Versions and freshness. Context fragments have a source, version and date; the stale is not fed even if semantically similar.
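A freshness filter over such metadata could be sketched like this. The `Fragment` fields mirror the source, version and date mentioned above; the 180-day cutoff and the "latest version per source" rule are illustrative assumptions, not a prescribed policy.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class Fragment:
    source: str
    version: int
    updated: date
    text: str


def fresh_fragments(fragments: list[Fragment], today: date,
                    max_age_days: int = 180) -> list[Fragment]:
    """Drop stale fragments, then keep only the newest version per source."""
    cutoff = today - timedelta(days=max_age_days)
    latest: dict[str, Fragment] = {}
    for fragment in fragments:
        if fragment.updated < cutoff:
            continue  # stale: not fed, however similar it looks semantically
        current = latest.get(fragment.source)
        if current is None or fragment.version > current.version:
            latest[fragment.source] = fragment
    return list(latest.values())


docs = [
    Fragment("pricing", 1, date(2023, 1, 1), "old prices"),
    Fragment("pricing", 2, date(2025, 5, 1), "current prices"),
    Fragment("faq", 1, date(2025, 4, 1), "faq text"),
]
print([(f.source, f.version) for f in fresh_fragments(docs, today=date(2025, 6, 1))])
```

The filter runs on metadata only, so it can be applied before any similarity scoring and does not depend on the retrieval method.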

Context per step, not per system. Different steps need different context. A global “shared context for everything” is exactly the source of bloat.
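One way to make per-step context explicit is a declarative table of what each step may see. The step names, source names and ceilings below are invented for illustration; the point is only that a step receives what its entry lists and nothing else.

```python
# Hypothetical pipeline: each step declares its own context sources and ceiling.
STEP_CONTEXT = {
    "classify_request": {"sources": ["user_message"], "max_tokens": 300},
    "retrieve_answer": {"sources": ["user_message", "retrieved_docs"], "max_tokens": 2000},
    "draft_reply": {"sources": ["user_message", "selected_facts"], "max_tokens": 1500},
}


def build_context(step: str, available: dict[str, str]) -> dict[str, str]:
    """Return only the sources declared for this step, ignoring everything else."""
    spec = STEP_CONTEXT[step]
    return {name: available[name] for name in spec["sources"] if name in available}


available = {
    "user_message": "Where is my refund?",
    "retrieved_docs": "refund policy text",
    "dialogue_history": "twenty earlier turns",
}
print(build_context("classify_request", available))
```

In this sketch the dialogue history exists in the system but reaches no step, because no step declares it. Adding it for a step becomes a visible, reviewable change.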

Context observability. It is visible what exactly was fed at each step and how much it cost. Without it, context growth is invisible until the bill.
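A per-step trace record could be as small as the sketch below. The field names and the price per 1,000 tokens are assumptions for illustration; in practice the record would go to whatever tracing or logging backend is already in use.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class ContextTrace:
    request_id: str
    step: str
    fragment_ids: list[str]  # what exactly was fed
    input_tokens: int        # how much of it
    cost_usd: float          # what it cost


def trace_step(request_id: str, step: str, fragments: dict[str, int],
               usd_per_1k_tokens: float) -> ContextTrace:
    """Build a trace record from a mapping of fragment id -> token count."""
    tokens = sum(fragments.values())
    cost = round(tokens / 1000 * usd_per_1k_tokens, 6)
    return ContextTrace(request_id, step, sorted(fragments), tokens, cost)


# The price below is a made-up figure, not a real tariff.
record = trace_step("req-1", "draft_reply", {"doc-7": 400, "doc-2": 600}, usd_per_1k_tokens=0.5)
print(json.dumps(asdict(record)))
```

Emitting one such line per step is enough to answer both questions from the paragraph above: what was fed, and how much it cost.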

Practical takeaway for business

Context is a budget line, and it is managed before launch. If spend grows faster than the number of requests, the context length is almost always silently growing; this is found via tracing in hours — if observability is built in.
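The symptom described above, spend growing faster than the number of requests, reduces to watching input tokens per request. The sketch below assumes daily totals are available from tracing; the function names and the 20% tolerance are hypothetical.

```python
def tokens_per_request(daily: list[tuple[int, int]]) -> list[float]:
    """daily holds (requests, input_tokens) per day; days with no requests are skipped."""
    return [tokens / requests for requests, tokens in daily if requests > 0]


def context_is_growing(daily: list[tuple[int, int]], tolerance: float = 1.2) -> bool:
    """Flag when tokens per request on the last day exceed the first day by the tolerance."""
    ratios = tokens_per_request(daily)
    if len(ratios) < 2:
        return False
    return ratios[-1] > ratios[0] * tolerance


# Requests grow 20%, input tokens grow 80%: context per request is swelling.
usage = [(1000, 2_000_000), (1100, 2_400_000), (1200, 3_600_000)]
print(context_is_growing(usage))
```

If load doubles and tokens double with it, the check stays quiet; it fires only when the average context per request drifts upward.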

Ask the contractor whether there is a context budget per step and how what is fed to the model is selected. “We put everything in, the model will sort it out” means unpredictable cost and unstable quality at once.

Don’t buy “a bigger model” as a quality fix. More often a quality problem is a context-selection problem, and it is solved cheaper and more reliably than by swapping the model.


Open questions

How to measure context “sufficiency” on your tasks without a labeled gold set remains a problem without a mature general solution; we build the gold set from historical requests. Where the limit of context compression without loss of meaning lies is an open trade-off that is resolved by measurement. How far new long-context models remove the lost-in-the-middle problem is also open: it is improving, but that does not cancel the need for selection.
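Once a gold set built from historical requests exists, one simple proxy for sufficiency is whether the selected context contains every fragment known to be needed for the answer. The sketch below is an assumption-laden illustration: the gold-set layout, the `select` callable and the metric itself are hypothetical, and the metric says nothing about excess context.

```python
from typing import Callable


def context_recall(gold: list[dict], select: Callable[[str], list[str]]) -> float:
    """Share of gold requests whose selected context includes every required fragment."""
    if not gold:
        return 0.0
    hits = 0
    for item in gold:
        selected = set(select(item["request"]))
        if set(item["required_fragments"]) <= selected:
            hits += 1
    return hits / len(gold)


# Toy gold set and a toy selector standing in for the real selection step.
gold_set = [
    {"request": "refund status", "required_fragments": ["refund-policy"]},
    {"request": "change address", "required_fragments": ["profile-api", "kyc-rules"]},
]


def toy_select(request: str) -> list[str]:
    table = {"refund status": ["refund-policy", "faq"], "change address": ["profile-api"]}
    return table.get(request, [])


print(context_recall(gold_set, toy_select))
```

Tracking this number alongside tokens per request shows whether tightening the budget starts to cut into what the answer actually needs.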

If your AI system gets more expensive faster than the load grows, the cause is almost certainly context. We’ll look at what’s fed to the model and where money leaks into unmanaged context.