engineering notes
Why the user doesn't know what they want
A dialog AI agent usually treats the first message as a finished need, but more often the need takes shape inside the conversation. Here is how to design an agent that leads the user to a formulation instead of guessing.
Executive summary. Most dialog assistants are designed on the assumption that the customer arrives with a finished request that simply needs to be fulfilled. On live traffic that assumption breaks: people more often arrive with a sense of a problem than with its formulation, and the real need is born during the conversation. An agent that literally executes the first phrase serves not the customer but their guess about themselves, and it loses everyone who hasn’t decided yet, which is most of them. This is not a model defect but a problem-framing defect, and it costs revenue.
When people build a dialog assistant, they usually draw a diagram: the user says what they want, the agent recognizes it and executes. The diagram is elegant and passes every demo, because in a demo the request is always crisp. The trouble is that a real person rarely knows what they want at the moment they start the conversation. They know something is bothering them — and the formulation comes later, in the process.
The user arrives not with a need but with a sense of a problem. The need is something you have to build together with them, in the dialog.
Hypothesis: the need forms inside the dialog, not before it
The standard model of dialog assumes the user has an intent, and the agent’s job is to extract and execute it. We claim the opposite: in most sales and support scenarios the intent does not exist in finished form at the input. A person says “I want X”, but behind it there is almost always “I want to understand whether I need X at all” or “I want to solve a problem for which X is only one hypothesis”. The first message is not a request but an entry point. If the agent treats it as a specification, it builds the right answer to the wrong question. A mature agent must lead the user toward formulating the need, not execute whatever phrase comes first as a command.
Problem: executing the first message serves a guess, not a person
Take a typical opening in a travel service: “show me package tours to Italy in June”. A linear agent dutifully shows the catalog of package tours to Italy in June, and almost always misses, because almost anything could stand behind that phrase: “I want to go somewhere with the kids but I’m not sure a group tour is for us”, “I have a budget and I’m checking whether I fit”, “I’m actually deciding between a package tour and travelling on our own”. The person named the first hypothesis that came to mind, not their task. By executing it literally, the agent gave a technically correct but useless answer and lost the chance to close someone who was, in fact, still choosing.
This is the quiet mechanism of lost revenue. Those whose request truly is finished would have gotten there anyway; they barely need the agent. The agent’s value lies precisely in working with those who haven’t formulated their need yet, and they are the majority. The linear “recognize → execute” model systematically drops exactly this part of the traffic, the most valuable one. Worse, you can’t see it in tests: test scenarios are written by people who already know what they want, so in testing the request is always finished, while in production it almost never is.
Why the usual approaches don’t work
The first familiar approach is to scale up intent recognition: more intents, a sharper classifier, more clarifying questions along a predefined tree. It doesn’t help, because the very framework “intent → slots → result” is wired in as a linear script. A clarification tree assumes we know in advance all the branches that uncertainty might take. But the user’s uncertainty doesn’t decompose into a fixed tree: they aren’t choosing from our options, they form their own as they go. The more rigid the tree, the harder the agent drags the person down the “right” branch instead of their own.
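For contrast, here is a minimal sketch of the linear “intent → slots → result” script this paragraph criticizes. It assumes a toy keyword classifier and a regex slot filler; `detect_intent`, `fill_slots`, `linear_agent` and the `search_tours` intent are hypothetical names, not any framework’s API. The point is structural: the script has no notion of request maturity, so it either executes or asks for the next missing slot of its own branch.

```python
import re
from dataclasses import dataclass, field


@dataclass
class LinearResult:
    intent: str
    slots: dict = field(default_factory=dict)
    action: str = ""


def detect_intent(message: str) -> str:
    # Hypothetical keyword classifier: the first message is taken as the task.
    if "tour" in message.lower():
        return "search_tours"
    return "fallback"


def fill_slots(intent: str, message: str) -> dict:
    # Hypothetical slot extraction for the single intent this toy script knows.
    slots = {}
    if intent == "search_tours":
        dest = re.search(r"to (\w+)", message)
        month = re.search(r"in (\w+)", message)
        if dest:
            slots["destination"] = dest.group(1)
        if month:
            slots["month"] = month.group(1)
    return slots


def linear_agent(message: str) -> LinearResult:
    # intent -> slots -> result, with no assessment of how mature the request is.
    intent = detect_intent(message)
    slots = fill_slots(intent, message)
    if intent == "search_tours" and all(k in slots for k in ("destination", "month")):
        return LinearResult(intent, slots, action="show_catalog")
    return LinearResult(intent, slots, action="ask_missing_slot")
```

An undecided opening such as “I’m not sure a group tour is for us, we have kids” still lands on `search_tours` and gets asked for a destination: the script pulls the person down its own branch instead of finding theirs.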
The second approach is to rely on the “smartness” of a large model: give the LLM full freedom and it’ll figure it out. It won’t. By default the LLM is obliging: it takes the first message as the task and tries to execute it, because that’s how training for “helpfulness” shapes it. Without an explicit engineering instruction such as “first understand the task, then execute”, the model reproduces the same mistake as the linear agent, only more verbosely. Here the model’s eagerness is not a help but the very source of the miss.
The third approach is to add more clarifying questions “just in case”. This errs in the opposite direction: the user who did arrive with a finished request now gets interrogated. A blunt “always clarify” is as wrong as “always execute”, because the real problem isn’t the number of questions but the absence of a model of where the person currently is in their own understanding of the problem.
Engineering model: the agent leads to the formulation of the need
The working model changes the agent’s goal. The goal is not “execute the request” but “bring the user to a formulated need, and only then to a solution”. This is a different loop, and it rests on three pillars.
The first is assessing request maturity. From the first messages the agent determines the user’s state: whether they already have a finished need, only a hypothesis, or only a sense of a problem. The whole downstream strategy depends on it: for the ready user, execute and don’t get in the way; for the undecided one, help them formulate. This signal is not a separate intent classifier but a continuous assessment of the dialog state, and it is the agent’s primary working resource.
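A minimal sketch of request-maturity assessment, using a keyword heuristic purely as a placeholder. In a real agent this would more likely be an LLM judge or a model calibrated on real opening messages; `assess_maturity`, the marker lists and the strategy names are hypothetical. What matters is the shape: the assessment runs over the whole dialog so far, and the downstream strategy is a function of its result.

```python
from enum import Enum


class Maturity(Enum):
    FINISHED_NEED = "finished_need"
    HYPOTHESIS = "hypothesis"
    SENSE_OF_PROBLEM = "sense_of_problem"


# Hypothetical placeholder markers, standing in for a real judge.
HEDGE_MARKERS = ("not sure", "maybe", "deciding", "whether", "don't know")
CONCRETE_MARKERS = ("book", "order", "buy", "confirm")

STRATEGY = {
    Maturity.FINISHED_NEED: "execute",            # ready: execute, don't get in the way
    Maturity.HYPOTHESIS: "step_toward_task",      # named a hypothesis: find the task behind it
    Maturity.SENSE_OF_PROBLEM: "help_formulate",  # only a sense of a problem
}


def assess_maturity(turns: list[str]) -> Maturity:
    """Re-assess the user's state over all turns so far, not just the first one."""
    text = " ".join(turns).lower()
    hedged = any(m in text for m in HEDGE_MARKERS)
    concrete = any(m in text for m in CONCRETE_MARKERS)
    if concrete and not hedged:
        return Maturity.FINISHED_NEED
    if hedged and not concrete:
        return Maturity.SENSE_OF_PROBLEM
    return Maturity.HYPOTHESIS


def choose_strategy(turns: list[str]) -> str:
    return STRATEGY[assess_maturity(turns)]
```

Because `choose_strategy` takes the whole list of turns, it is meant to be called again after every message, which is what “continuous assessment” amounts to in this sketch.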
The second is moving toward the task, not the order. When the person names a hypothesis (“a tour in June”), the agent doesn’t rush to execute it but takes a step toward the task behind it: for whom, why now, what should the outcome be. Not a checklist interrogation, but a few precise steps that turn “I want X” into “here’s my task, and what fits it is this”. Only after that does execution become meaningful.
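One way to sketch “moving toward the task”: keep a small frame for the three questions the text names (for whom, why now, what outcome) and ask only about the first gap, one step at a time. `TaskFrame`, the question wording and the `MAX_STEPS` cap are assumptions for illustration; the cap stands in for the tiredness threshold that has to be calibrated on live traffic.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TaskFrame:
    # The user's stated hypothesis plus the task behind it (hypothetical structure).
    hypothesis: str
    for_whom: Optional[str] = None
    why_now: Optional[str] = None
    outcome: Optional[str] = None

    def missing(self) -> list[str]:
        return [name for name in ("for_whom", "why_now", "outcome")
                if getattr(self, name) is None]


# Hypothetical wording: one open question per step, not a checklist.
QUESTIONS = {
    "for_whom": "Who is this for?",
    "why_now": "Why is this coming up now?",
    "outcome": "What outcome would count as success?",
}

MAX_STEPS = 3  # assumption: cap on clarification steps, to be tuned on real dialogs


def next_step(frame: TaskFrame, steps_taken: int) -> str:
    """Return the next question, or 'execute' once the task is clear or the budget is spent."""
    gaps = frame.missing()
    if not gaps or steps_taken >= MAX_STEPS:
        return "execute"
    return QUESTIONS[gaps[0]]
```

A user who answers several questions in one message fills several fields at once, so the frame skips straight past them; the number of questions follows the gaps, not a fixed script.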
The third is checking the hypothesis out loud. Before producing a solution, the agent reflects the user’s task back in its own words: “do I understand correctly that this matters most to you, and that matters less?” This is cheap in turns and sharply raises the hit rate, because the person sees their need formulated for the first time and either confirms or corrects it. That is the moment the need turns from a sense into a decision.
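A sketch of the out-loud check, assuming the agent has already collected the task and its priorities in a simple structure; `Understanding`, `reflect_back`, `after_reflection` and the action names are hypothetical. The template is deliberately plain: its value lies in showing the person their need formulated, and only a confirmation moves the dialog on to a solution.

```python
from dataclasses import dataclass, field


@dataclass
class Understanding:
    task: str
    priorities: list[str] = field(default_factory=list)
    secondary: list[str] = field(default_factory=list)


def reflect_back(u: Understanding) -> str:
    """Restate the user's task in the agent's own words as a yes/no question."""
    parts = [f"Do I understand correctly that you want {u.task}"]
    if u.priorities:
        parts.append(f"that what matters most is {' and '.join(u.priorities)}")
    if u.secondary:
        parts.append(f"and that {' and '.join(u.secondary)} is secondary")
    return ", ".join(parts) + "?"


def after_reflection(confirmed: bool) -> str:
    # Only a confirmed formulation unlocks a solution; a correction loops back.
    return "propose_solution" if confirmed else "revise_understanding"
```

For example, `reflect_back(Understanding("a summer trip with the kids", ["a short flight"], ["sightseeing"]))` yields “Do I understand correctly that you want a summer trip with the kids, that what matters most is a short flight, and that sightseeing is secondary?”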
This model doesn’t conflict with handling finished requests — it doesn’t break them, it adds a branch for everyone else. Technically it is close to an event-driven rather than a linear-script architecture: the agent reacts to the dialog state instead of running everyone through one pipe.
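A minimal sketch of the event-driven shape, assuming the dialog state has already been assessed elsewhere and arrives with each message. Handlers are registered per state rather than chained into one pipe; the `on` decorator, the state names and the handler bodies are hypothetical placeholders. A finished request goes straight to execution, so adding the other branches leaves its handling unchanged.

```python
from typing import Callable

Handler = Callable[[str], str]
HANDLERS: dict[str, Handler] = {}


def on(state: str) -> Callable[[Handler], Handler]:
    """Register a handler for a dialog state (hypothetical decorator)."""
    def register(fn: Handler) -> Handler:
        HANDLERS[state] = fn
        return fn
    return register


@on("finished_need")
def execute(message: str) -> str:
    return f"executing: {message}"


@on("hypothesis")
def step_toward_task(message: str) -> str:
    return "asking about the task behind the request"


@on("sense_of_problem")
def help_formulate(message: str) -> str:
    return "helping to formulate the need"


def handle_event(state: str, message: str) -> str:
    # Each message is an event; the current state, not a position in a script, picks the handler.
    return HANDLERS[state](message)
```

Because the state is re-assessed per message, the same dialog can move from `hypothesis` to `finished_need` halfway through, and the next event simply lands on the execution handler.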
Practical takeaway for the business
Change the success metric. If you measure the agent by the share of “correctly executed requests”, you reward exactly the mistake that loses money — literal execution of the first message. The right metric is the share of dialogs in which an undecided user reached a formulated need and a solution. That is the work that makes the agent pay off.
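To make the difference between the two metrics concrete, here is a sketch that computes both from annotated dialog logs. The `DialogLog` fields are assumptions: in practice they would be filled in by a judge that scores outcomes. `EXAMPLE_LOGS` is made-up input, chosen only to show that the two numbers can diverge.

```python
from dataclasses import dataclass


@dataclass
class DialogLog:
    # Hypothetical per-dialog annotations, e.g. filled in by an outcome judge.
    opened_undecided: bool        # arrived without a formulated need
    executed_first_request: bool  # agent executed the opening message literally
    need_formulated: bool         # user confirmed a formulated need
    reached_solution: bool        # dialog ended in a solution the user accepted


def literal_execution_rate(logs: list[DialogLog]) -> float:
    """The misleading metric: share of dialogs where the opening request was executed."""
    return sum(d.executed_first_request for d in logs) / len(logs) if logs else 0.0


def undecided_success_rate(logs: list[DialogLog]) -> float:
    """Share of undecided users who reached a formulated need and a solution."""
    undecided = [d for d in logs if d.opened_undecided]
    if not undecided:
        return 0.0
    hits = sum(d.need_formulated and d.reached_solution for d in undecided)
    return hits / len(undecided)


# Made-up logs, for illustration only.
EXAMPLE_LOGS = [
    DialogLog(False, True, True, True),   # ready user, executed as asked
    DialogLog(True, True, False, False),  # undecided, executed literally, lost
    DialogLog(True, False, True, True),   # undecided, led to a formulated need
    DialogLog(True, True, False, False),  # undecided, executed literally, lost
]
```

On these illustrative logs the literal-execution rate is 0.75 while only one undecided user in three reached a formulated need and a solution: the first metric looks healthy exactly where the agent is losing customers.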
What to delegate. Build an explicit requirement into the spec: the agent first assesses request maturity, then acts. This is not “one more prompt” but a separate engineering task, with tests on vague rather than finished requests. Check it against a corpus of real opening messages: that’s where you see how people actually come in.
What not to do. Don’t take “the bot answers crisp questions correctly” as readiness for production: on crisp questions there’s almost no difference between a good and a bad agent; the difference is all in the vague ones. Don’t fix the problem with a blunt “always ask three clarifying questions”: that irritates ready customers and doesn’t help undecided ones. The root is in state assessment, not in the number of questions.
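A sketch of such a test, assuming each real opening message is labelled with the expected first move. The three `CORPUS` rows are illustrative, not real data, and `failures`, `always_execute` and `always_lead` are hypothetical names; in practice the rows would be loaded from logged openings and the function under test would call the real agent.

```python
from typing import Callable

# Each row: (opening message, expected first move). Illustrative rows only;
# a real corpus comes from logged openings labelled by reviewers.
CORPUS: list[tuple[str, str]] = [
    ("Book seat 12A on the 14:00 flight to Rome", "execute"),
    ("show me package tours to Italy in June", "lead_to_task"),
    ("I'm deciding between a package tour and travelling on our own", "lead_to_task"),
]


def failures(first_move: Callable[[str], str], corpus: list[tuple[str, str]]) -> list[str]:
    """Return the openings where the agent's first move differs from the expected one."""
    return [msg for msg, expected in corpus if first_move(msg) != expected]


def always_execute(message: str) -> str:
    # Baseline: treats every opening as a finished request.
    return "execute"


def always_lead(message: str) -> str:
    # Baseline: questions everyone, including users who are already ready.
    return "lead_to_task"
```

Both blunt baselines fail part of the corpus, from opposite sides: `always_execute` misses the two vague openings and `always_lead` misses the crisp one. Only a first move that depends on the assessed state can pass all three.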
Open questions
Where does the line run between “help formulate” and “impose your own frame”? An over-active agent starts selling its own vision of the task instead of the customer’s. That is a separate risk, held in check by a corpus of real dialogs rather than by the developer’s intuition.
How do you measure “reached a formulated need” objectively? This is harder than measuring whether an answer was given, and it requires a judge that scores the outcome, not formal completion.
How many clarification steps are acceptable before the person tires? It depends on the channel and the stakes of the question, and it is calibrated on live traffic, not assigned from the armchair.
If your assistant answers crisp questions confidently but loses those still choosing, the problem isn’t the model: the agent executes a message instead of understanding the task.