The most important disagreement in AI right now does not appear in product announcements. It rarely surfaces in the coverage that follows each new model release. But it is the question that determines whether the current trajectory of AI development leads to systems that genuinely understand the world, or to extremely sophisticated pattern-matching machines that will remain fundamentally brittle in ways that matter.
The dispute is between two schools of thought: the scaling laws camp and the world models camp.
The scaling laws thesis originated from an empirical observation that proved remarkably durable. Researchers at OpenAI formalised it in 2020: the performance of language models improves as a predictable power law in three variables — model size (parameter count), training data volume, and compute (measured in floating-point operations). Increase all three in the right proportions, and performance improves reliably across multiple orders of magnitude. The implication was almost algorithmic: more compute, more data, more parameters equals a better model. Run the recipe longer.
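The shape of the relationship can be sketched in a few lines. The function below uses the loss-versus-parameters form from the 2020 paper; the coefficients are the paper's reported fits and should be read as illustrative of the curve's shape, not as predictions for any particular modern model.

```python
# Power-law form from the 2020 scaling-laws paper: L(N) = (N_c / N)^alpha.
# Coefficients are the paper's reported fits for loss vs. parameter count;
# illustrative only, not a forecast for current systems.

def loss_vs_params(n_params: float, n_c: float = 8.8e13, alpha: float = 0.076) -> float:
    """Predicted cross-entropy loss as a function of parameter count N."""
    return (n_c / n_params) ** alpha

# Each 10x increase in parameters shaves a predictable fraction off the loss.
for n in (1e8, 1e9, 1e10, 1e11):
    print(f"{n:.0e} params -> predicted loss {loss_vs_params(n):.3f}")
```

The small exponent is the whole story: gains per order of magnitude are modest but, crucially, they keep arriving — which is what made the curve worth hundreds of millions of dollars to ride.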
This thesis has been validated empirically at every scale at which it has been tested. GPT-2 to GPT-3 to GPT-4; the progression of Gemini and Claude models — each generation has confirmed that scaling produces meaningful capability gains. The infrastructure investment that followed — tens of thousands of GPUs, multi-gigawatt data centres, training runs costing hundreds of millions — is the industrial expression of belief in scaling. If the curve holds, you build more.
The critique comes most forcefully from Yann LeCun, Chief AI Scientist at Meta and one of the foundational architects of modern deep learning. LeCun's argument is not about the empirical validity of scaling curves. It is about what the curves are measuring. His claim: predicting the next token in a sequence of text — no matter how much data, no matter how many parameters — cannot produce a system that understands causality, physics, or the structure of the world. It produces a system that has extraordinary knowledge of how humans describe the world in language. These are not the same thing.
His demonstration is simple but pointed. A large language model can describe in accurate detail what happens when a glass falls off a table. It can generate the physics, the sound, the consequence. But it has never seen a glass fall. It has no internal model of gravity, of rigidity, of fragmentation dynamics. What it has is a very good statistical model of how humans write about glasses falling. Ask it a question that lies slightly outside the distribution of human descriptions — a novel physical configuration, a situation no one has written about — and the facade can collapse into confident nonsense.
LeCun's alternative — the world model approach, currently expressed in his Joint Embedding Predictive Architecture (JEPA) — proposes that genuine intelligence requires an internal simulation of the world: a latent model of physics, causality, and object permanence, learned from multimodal observation (video, images, tactile data) rather than text prediction. Such a system could reason by simulation — "if I do X, what is the consequence?" — rather than by statistical retrieval of descriptions.
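The "reason by simulation" loop can be made concrete with a toy sketch. This is a generic illustration of planning against an internal predictive model, not the JEPA architecture itself; the hand-coded gravity dynamics stand in for what would, in a real system, be a learned latent predictor, and all names here are hypothetical.

```python
# Toy reasoning-by-simulation: roll each candidate action forward through a
# world model and pick the action whose predicted consequence scores best.
# The dynamics are a hand-coded stand-in for a learned predictor.

from typing import Callable

State = tuple[float, float]  # (height, velocity) of a falling object

def predict(state: State, action: str, dt: float = 0.1) -> State:
    """Stand-in world model: 1-D gravity; 'catch' halts the fall."""
    h, v = state
    if action == "catch":
        return (h, 0.0)
    v = v - 9.81 * dt                    # gravity accelerates the fall
    return (max(h + v * dt, 0.0), v)     # clamp at the floor

def plan(state: State, actions: list[str], score: Callable[[State], float]) -> str:
    """Reason by simulation: imagine each action's outcome, keep the best."""
    return max(actions, key=lambda a: score(predict(state, a)))

# A glass half a metre up, already moving down; prefer low-impact outcomes.
best = plan((0.5, -1.0), ["wait", "catch"], score=lambda s: -abs(s[1]))
print(best)  # -> catch
```

The contrast with next-token prediction is the point: the loop above answers "what happens if I do X?" by running X forward through a model of consequences, not by retrieving how such situations have typically been described.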
The scaling response is that the emergence of reasoning capabilities in large models — chain-of-thought, mathematical problem-solving, multi-step planning — suggests that scaling may be producing something like a world model, implicitly, through a different path. Whether these capabilities reflect genuine causal understanding or sophisticated pattern matching remains an open empirical question. The research literature contains compelling evidence for both interpretations.
What both schools agree on is that current AI systems, whatever their underlying nature, remain probabilistically fallible in ways that matter for how we use them. A system that predicts plausible text is not the same as a system that reasons about ground truth. Until we have systems that reliably distinguish between the two — and we currently do not — the work of maintaining that distinction belongs to the humans using them. The question of which school is right about the mechanism has a direct bearing on how long, and how much, we will have to keep doing that work ourselves.