
Luma AI founder says world models, not LLMs, are the trillion-dollar opportunity

The company raised $1.4 billion to build what comes after large language models. Its founder thinks most of the industry is solving the wrong problem.

by Ian Lyall
Photo by Andrea De Santis / Unsplash

Amit Jain argues that models trained on text have a hard ceiling, and that the real opportunity lies in teaching machines to understand the physical world.

The AI boom was built on large language models. One well-funded startup thinks they are already obsolete.

Luma AI, a Bay Area company that has raised more than $1.4 billion from investors including a16z, Nvidia, and Amazon, has released an early version of what it calls a unified intelligence model.

Its founder, Amit Jain, argues that LLMs such as ChatGPT, Claude, and Gemini have a fundamental ceiling, and that the next generation of AI will need to understand the physical world, not just text written about it.

The ceiling argument

The case against LLMs starts with data. Current frontier models are trained on between 15 and 20 trillion tokens, and Jain estimates that all of humanity's written text combined would barely reach 30 trillion. That ceiling is already in sight.

The deeper problem is what text can and cannot teach. LLMs contain human logic and humanity's interpretation of the universe. They do not contain the universe itself. A model trained on text can describe how to swim in precise detail. It cannot learn to swim.

Real-world tasks, including robotics, navigation, and physical interaction, require a different kind of understanding than language can provide.

What a world model actually is

Several companies are working on what the industry calls world models, including World Labs, DeepMind, and Runway. Jain is dismissive of most of them. He describes current world models as lazy attempts at making video models interactive, arguing they lack genuine understanding of physics, causality, or long-range reasoning.

His definition is stricter. A true world model must combine understanding of the physical world with understanding of language. It does not need to be interactive or fast. Even a slow image model with a genuine grasp of architecture, physical motion, and cause and effect qualifies, in his view. Most current offerings do not.

Jain is also sceptical of approaches built around 3D data. Luma was itself a pioneer in 3D before pivoting. The lesson he draws is that 3D data does not exist at scale the way video does. Eight billion phones are producing vast quantities of 2D data continuously. AI research has shown that general methods applied to large amounts of compute and data outperform specialised approaches. Chasing 3D is, in his words, a fool's errand.

Agents, not models

The next phase of Luma's work is agentic. Jain draws a distinction between a model, which produces a specific output such as a clip or an image, and an agent, which takes a high-level brief and completes a full task end-to-end, evaluating and correcting its own work.

Give a model a 30-second ad brief and it generates something. Give an agent the same brief and it works through the problem, makes decisions, and delivers a finished product.

Luma's current customers are film studios and advertising agencies that use generation models, which require continuous prompting to produce usable results. The goal is to remove most of that friction.

On jobs, Jain argues the bigger structural problem in film is not AI displacement but a shortage of creatives. Companies that fail to retrain staff and rethink their business model will shed jobs; those that adapt will find AI expands what they can produce. He is not sympathetic to studio heads resisting the shift.

The roadmap covers three stages: generation, understanding, and robotics. Solving the first two makes the third possible. The goal is a single model that can reproduce the world, reason about it, and operate within it.

He thinks that looks a lot like AGI. He also thinks most people in AI are still thinking too small to get there.
