Andrej Karpathy has a message for anyone betting on AI agents arriving this year: don't hold your breath.
The former OpenAI researcher and Tesla Autopilot architect says we're not even close to having agents that can truly work alongside humans. In his view, the industry is suffering from severe overoptimism. "This will be the decade of agents," he told Dwarkesh Patel in a recent interview. "We're very early."
Karpathy spent nearly two decades in AI research before stepping back to observe the field from a distance. His perspective isn't about hype; it's about what's actually tractable. He points out that today's agents, like Claude and Codex, can't yet do real knowledge work. They lack multimodal capabilities, computer use skills, continual learning, and genuine cognition. These aren't minor gaps; they're fundamental barriers.
Why a decade? Karpathy's intuition comes from watching AI predictions play out over fifteen years. The problems are difficult but surmountable, and when he averages out the remaining work, it adds up to roughly ten years' worth.
The History of Seismic Shifts
Karpathy has lived through two or three transformative moments in AI, and each arrived with surprising irregularity.
When he started, deep learning was a niche side project, not the dominant force it became after AlexNet reoriented the field toward neural networks. Early agent research tried to perceive and act in simple worlds, most famously deep reinforcement learning on Atari games starting in 2013. Karpathy calls this period "a misstep": the reward signals were too sparse and the environments too simple.
At OpenAI, he worked on Universe — an agent using keyboard and mouse to operate web pages. The goal was digital knowledge work: something interacting with the actual computer screen rather than gaming worlds. It failed. They jumped too early, before neural networks had enough representational power.
"We shouldn't have been working on that," Karpathy said. "It was way too early." People were trying to build agents before getting the language model representations right.
We're Building Ghosts, Not Animals
The deepest insight from the conversation involves what Karpathy calls ghosts instead of animals.
Humans and animals take in everything at once: raw, unlabeled sensory data, made sense of through immersion, with millions of years of evolution already baked into their hardware. Current AI training is radically different, built on imitating human-generated data rather than on embodied experience.
"We're not actually building animals," he said. "We're building ghosts." Digital spirit entities that mimic humans but lack true embodiment or understanding. They're ethereal because they're fully digital, trained on human data rather than evolving in the real world.
Animals evolved through natural selection, with massive capability built into the hardware by their DNA: a zebra runs minutes after birth. That complexity isn't learned through reinforcement; it's baked in by evolution's outer loop. Humans, Karpathy argues, don't actually use reinforcement learning for intelligence tasks like problem solving; in humans it applies mostly to motor skills.
The vision for AGI, in his view, should be something more like a human: taking in raw sensory data and figuring out the world from scratch, without pre-labeled training data.
Counterpoints
Critics might note that Karpathy's pessimism could undersell current progress. Recent advances in reasoning models and computer-use agents have shown capability gains that could undercut his decade timeline, and others argue that multimodal systems are already emerging faster than his framework allows.
A reasonable counterargument: perhaps the ghost metaphor underestimates how quickly digital entities can become more embodied through feedback loops with real-world data. The gap between current AI and animal-like intelligence may narrow differently than Karpathy predicts.
Bottom Line
Karpathy's strongest contribution isn't prediction — it's diagnosis. His distinction between ghosts and animals clarifies what most AI hype ignores: we have representation power but not embodiment. The decade timeline is a bet on difficulty, not impossibility. Watch for whether multimodal agents and computer use capabilities arrive faster than his cautious outlook suggests.