
AI can write code and prove theorems. It still can't book me a restaurant

The gap between what AI agents can do in demos and what they can do in daily life is the biggest unsolved problem in consumer technology

by Ian Lyall
Photo by Andrea De Santis / Unsplash

The AI industry has a party trick problem. Every few weeks, a new model arrives that can write production-grade software, analyse 28,000 genes, or discover mathematical proofs. The benchmarks go up. The demos look spectacular. And then you try to get it to do something genuinely useful in your actual life, and you are back to managing another inbox.

This is the state of consumer AI in 2026. The technology is capable enough to help. It is not capable enough to help without being managed. That distinction matters more than any benchmark score.

The management tax nobody talks about

The promise of an AI agent is that it lifts the load. The reality is that it adds one. You have to notice the task, remember the agent exists, translate the task into a prompt, decide how much permission to grant, and then supervise the result. For a simple job like booking a restaurant, that sequence is more work than picking up the phone.

Chatbots succeeded because they required almost no behavioural shift. Twenty years of typing queries into Google had already trained the muscle memory. Agents are different. They ask you to figure out what to delegate and how to describe it, which is a skill most people have only ever practised with other humans, where shared history and social context do the heavy lifting.

The result is a technology that works brilliantly in controlled conditions and disappears from daily life within a week. The demo has a prepared user. Real life has someone juggling three tabs, a school pickup, and a half-read email thread.

Protestors hold signs saying "quit gpt" with crossed-out logos.
Photo by Nathan Kuczmarski / Unsplash

The anticipation gap

The missing ingredient is anticipation. A useful assistant does not wait to be summoned. It notices the delayed flight, checks the connection, and texts you the options before you open the airline app. The situation calls the agent into existence, not the other way around.

Consumer software has crossed smaller versions of this threshold before. Push notifications, recommendation feeds, autocomplete, smart replies. They all worked because they were narrow, bounded, and reversible. Agents are trying to do the same thing across dozens of domains, with real-world actions and real-world costs if they get it wrong.

There is no test suite for life admin. Code either compiles or it doesn't. A restaurant booking depends on budget, timing, dietary requirements, who you are trying to impress, and whether you care about parking. Defining "right" is subjective in a way that makes the engineering problem far harder than anything in software development.

white robot near brown wall
Photo by Alex Knight / Unsplash

The trust ladder

The companies closest to solving this are thinking about permission as a ladder rather than a switch. Step one: the agent can read. Step two: it can suggest. Step three: it can draft. Step four: it can act with confirmation. Step five: it acts on its own. Each step requires the user to extend a little more trust, and breaking that trust at any point sends them back to the bottom.
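The ladder is really a small state machine. Here is a minimal sketch of how it might look, assuming hypothetical level names and a simple rule set (one step up per successful action, straight back to the bottom on any breach); no shipping product is known to implement it exactly this way.

```python
from enum import IntEnum


class TrustLevel(IntEnum):
    """Hypothetical rungs of the trust ladder."""
    READ = 1          # the agent can read
    SUGGEST = 2       # it can suggest
    DRAFT = 3         # it can draft
    ACT_CONFIRM = 4   # it can act with confirmation
    AUTONOMOUS = 5    # it acts on its own


class TrustLadder:
    def __init__(self) -> None:
        self.level = TrustLevel.READ

    def record_success(self) -> None:
        # Each successful step extends a little more trust.
        if self.level < TrustLevel.AUTONOMOUS:
            self.level = TrustLevel(self.level + 1)

    def record_breach(self) -> None:
        # Breaking trust at any point sends the user back to the bottom.
        self.level = TrustLevel.READ
```

The asymmetry is the point: trust accrues one rung at a time but is lost all at once, which is why a single bad autonomous action can undo months of good suggestions.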

Several products are making credible attempts. Poke lives inside iMessage and connects to your email, calendar, and search. Clickie sits beside your cursor and watches your screen. Anthropic's Cowork applies the multi-step approach that made Claude Code valuable to non-technical knowledge work. None of them has crossed the threshold into genuine proactivity yet, but the building blocks are visible.

The race is on

OpenAI recently hired Peter Steinberger, known for building the open-source agent framework OpenClaw. That is not a research hire. That is a product signal. The race to build the first proactive consumer agent is underway, and the window is short.

The model that wins this race will not be the one with the highest benchmark score. It will be the one that knows when to show up, when to ask, and when to shut up. Nobody has built that yet. But somebody will, and when they do, managing your own calendar will feel as quaint as hand-washing your clothes.
