
World models: Computing the uncomputable

Packy McCormick makes a provocative claim that could reshape how we view the next decade of artificial intelligence: the path to machines that truly understand and navigate the physical world doesn't run through language, but through action. While the industry chases the next large language model, McCormick argues that "World Models"—systems that learn to predict the future based on specific interventions—are the only architecture capable of computing the uncomputable complexity of reality. This is not just a technical distinction; it is a fundamental shift from passive observation to active simulation, a concept backed by billions in new capital from figures like Yann LeCun and Fei-Fei Li.

The Cost of Complexity

McCormick opens by highlighting a critical bottleneck in current robotics and AI: the inability of traditional computing to handle the sheer chaos of the real world without exploding in cost. He illustrates this with a vivid scenario of a soccer stadium, noting that "simulating N fans is at least an O(N) or O(N²) problem" in traditional engines, where every interaction must be explicitly calculated. In contrast, he argues that World Models solve this by reducing dynamic, stochastic environments into a "single fixed cost operation in a neural network."
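The cost contrast can be made concrete with a toy sketch (not from the essay; the dynamics and numbers are invented for illustration). A traditional engine computes every fan-to-fan interaction explicitly, so one tick scales as O(N²), while a learned model replaces the whole scene update with a single fixed-size forward pass:

```python
def pairwise_step(positions):
    """Traditional engine: every fan reacts to every other fan,
    so one tick of the simulation costs O(N^2) interactions."""
    updated = []
    for i, p in enumerate(positions):
        # Toy repulsion dynamics, invented purely for illustration.
        push = sum(0.01 * (p - q) for j, q in enumerate(positions) if j != i)
        updated.append(p + push)
    return updated


def learned_step(state, weights):
    """Stand-in for a neural world model: one fixed-size matrix multiply
    per tick, however many interactions the scene contains."""
    return [sum(w * s for w, s in zip(row, state)) for row in weights]
```

The point is not that a linear map captures crowd dynamics; it is that a learned model amortizes the interaction cost into the network's fixed forward pass, which is the cost structure McCormick is describing.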


This distinction is vital. Where standard video models predict the next frame based on probability, McCormick explains that a World Model predicts the next state based on intervention, mathematically expressed as P(s_{t+1} | s_t, a_t). The inclusion of the action variable, a_t, is the "magic" that allows the system to understand cause and effect rather than just correlation. This approach mirrors how human cognition works, a concept deeply rooted in the history of predictive coding, where the brain is viewed not as a passive receiver of data but as an active generator of predictions that are constantly updated by sensory input.

"Actions act as a form of compression to predict unfolding dynamics: they hold the information to unroll future states in an environment, until more actions take place and add new inputs into the environment."

The author's framing here is compelling because it moves beyond the hype of "generative AI" creating pretty pictures. Instead, it positions these models as essential infrastructure for embodied cognition, where a machine must understand that pushing a cup will make it fall, not just that a falling cup looks a certain way in a video. Critics might note that the leap from learning these dynamics in a video game to applying them in the messy, unstructured physical world remains a massive engineering hurdle, but the theoretical foundation is sound.
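The difference between P(s_{t+1} | s_t) and P(s_{t+1} | s_t, a_t) can be reduced to a few lines using the cup example (a toy sketch; the state names and transitions are invented). Without an action input, the likeliest continuation of a cup sitting on a table is more of the same; conditioning on the action lets the identical state branch into different futures:

```python
# Toy state space for the cup example, invented for illustration.
CUP_ON_TABLE, CUP_FALLING = "on_table", "falling"


def video_model_step(state):
    """Passive prediction, P(s_{t+1} | s_t): with no action input, the most
    likely next frame of a cup on a table is the cup still on the table."""
    return state


def world_model_step(state, action):
    """Action-conditioned prediction, P(s_{t+1} | s_t, a_t): the same state
    yields different futures depending on the intervention a_t."""
    if state == CUP_ON_TABLE and action == "push":
        return CUP_FALLING
    return state
```

The extra argument is the whole difference: it turns a correlation machine ("falling cups look like this") into a causal one ("pushing makes the cup fall").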

From Dreams to Reality

The essay draws a sharp line between passive video generation and the active planning required for general intelligence. McCormick writes, "Think about models like dreams. Have you ever had a dream where you simply stood and watched what was happening without the ability to intervene? That's a video model." He contrasts this with the lucid dream, where the dreamer can shape the narrative, calling this the true definition of a World Model.

This analogy effectively demystifies a complex technical concept. The core of the argument is that current Large Language Models (LLMs) are limited because they lack this interactive loop; they can discuss the world but cannot simulate the consequences of acting within it. McCormick asserts that "World Models are a new and potentially more powerful class of foundation model than LLMs for environments that require deep spatial and temporal reasoning."
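What that interactive loop buys is planning: an agent can unroll candidate actions inside the model and act on the imagined outcomes. A minimal sketch (not McCormick's method; an exhaustive short-horizon search over a toy one-dimensional world, with every name invented for illustration):

```python
from itertools import product


def plan(world_model, state, actions, reward, horizon=3):
    """Imagine every action sequence with the world model, score each
    rollout, and return the first action of the best imagined future."""
    best_first, best_return = None, float("-inf")
    for seq in product(actions, repeat=horizon):
        s, total = state, 0.0
        for a in seq:
            s = world_model(s, a)  # simulate, don't act
            total += reward(s)
        if total > best_return:
            best_first, best_return = seq[0], total
    return best_first
```

With `world_model = lambda s, a: s + a` and a reward of `-abs(s - goal)`, the planner picks the action that moves toward the goal. Real systems replace exhaustive search with sampling or gradient-based optimization over learned latent dynamics, but the loop, simulate consequences first, act second, is the one the dream analogy describes.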

The investment landscape is already shifting to reflect this belief. With Fei-Fei Li's World Labs raising $1 billion and Yann LeCun's AMI securing $1.03 billion, the financial markets are betting that language alone is insufficient for the next leap in AI. As McCormick notes, "Yann LeCun, who has been skeptical that LLMs are the path to general intelligence, just announced that he raised $1.03 billion for AMI." This convergence of top-tier talent and capital suggests a maturing field that is moving past the "cool videos" phase toward functional utility.

"The ability to compute the uncomputable is why we believe World Models will unlock progress in embodied AI in a way that current model architectures can't."

However, the field is not without its confusion. McCormick candidly admits that definitions are still "shaky" and that hype is distorting the conversation. He quotes Alexandre LeBrun, CEO of AMI Labs, who predicts, "In six months, every company will call itself a World Model to raise funding." This self-awareness adds credibility to the piece; McCormick isn't selling a finished product but mapping a nascent and rapidly evolving frontier.

The Simulation Hypothesis Revisited

To ground the technical discussion, McCormick weaves in a rich historical context, tracing the idea of simulated reality from Plato's Allegory of the Cave to Philip K. Dick's 1977 assertion that "We are living in a computer-programmed reality." He even references the original script for The Matrix, where humans were conceived as part of a collective neural network, a concept that was simplified for the movie but remains philosophically relevant.

This historical deep dive serves a specific purpose: it reframes the "World Model" not as a new invention, but as the technological realization of a question humans have asked for millennia. McCormick writes, "Being human, after all, is spending a lifetime taking actions based on what we experience, observe, and learn." By connecting modern AI architecture to ancient philosophical inquiries, he elevates the discussion from a product roadmap to a fundamental shift in how we understand intelligence.

The argument gains further weight when McCormick cites NVIDIA's Jim Fan, who predicts that "2026 will mark the first year that Large World Models lay real foundations for robotics." This timeline provides a concrete horizon for readers to watch, moving the conversation from abstract theory to near-future application.

"Unfortunately, the most hyped use case of World Models right now is AI video slop (and coming up, game slop). I bet with full confidence that 2026 will mark the first year that Large World Models lay real foundations for robotics, and for multimodal AI more broadly."

A counterargument worth considering is that the "video slop" criticism might be premature; the ability to generate high-fidelity video is often the necessary training data for these models to learn physics and object permanence. The "slop" may be the raw material from which the "foundation" is built. Nevertheless, McCormick's insistence on distinguishing between passive generation and active planning remains the most crucial takeaway.

Bottom Line

McCormick's strongest contribution is the clear demarcation between passive video models and active World Models, arguing that the latter is the only viable path to machines that can truly operate in the physical world. The piece's biggest vulnerability lies in the timeline; while the theoretical case is robust, the engineering challenges of scaling these models to handle the infinite variables of reality are immense and may not resolve by 2026. Readers should watch for the first demonstrations of agents that can plan multi-step physical tasks in unstructured environments, as these will be the true proof of concept for this emerging paradigm.

Deep Dives

Explore these related deep dives:

  • Predictive coding

    This neuroscience theory posits that the brain functions as a prediction machine, providing the biological blueprint for the 'dreaming' and future-prediction mechanics central to the article's definition of World Models.

  • Artificial general intelligence

    The essay frames World Models as a more plausible route to general intelligence than language models alone, making this long-standing goal of human-level machine reasoning the stake behind the architectural debate.

  • Embodied cognition

    The article argues that true intelligence requires interaction with the physical world, making this niche field of robotics and simulation the necessary context for understanding why 'World Models' are distinct from standard Large Language Models.

Sources

World models: Computing the uncomputable

by Packy McCormick · Not Boring
