
Richard Sutton – Father of RL Thinks LLMs Are a Dead End

The Fundamental Difference

Sutton sees LLMs as fundamentally different from intelligence. "Large language models are about mimicking people, doing what people say you should do," he says. "They're not about figuring out what to do." That's the distinction: one mimics, the other acts.

The problem runs deeper than just approach. Sutton argues that without a goal, there's no sense of right or wrong, better or worse. And LLMs don't have real goals—they only predict what tokens should come next, not what will actually happen in the world.


The Goal Question

When asked if LLMs have goals, Sutton's answer is damning: "Next token prediction. That's not a goal." A goal requires changing the external world—achieving something beyond just predicting accurately. Token prediction doesn't influence anything. It's purely internal.
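The "purely internal" point can be made concrete with a toy next-token model. The corpus, names, and bigram scheme below are invented for illustration; the point is that the model's entire objective is to match its training text, with no outcome in the world ever feeding back:

```python
from collections import Counter, defaultdict

# Toy corpus: the model's entire "world" is this text.
corpus = "the cat sat on the mat the cat ran".split()

# Build a bigram next-token model by counting: pure imitation of the data.
counts = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    counts[prev][nxt] += 1

def predict_next(token):
    """Return the most frequent next token seen after `token` in the corpus."""
    return counts[token].most_common(1)[0][0]

# The model "succeeds" by matching the text's statistics. Nothing external
# changes, and no consequence is observed: the objective is internal.
print(predict_next("the"))  # "cat" — the most frequent successor in the corpus
```

A real LLM replaces counting with a learned distribution, but the shape of the objective is the same: accuracy against existing text, not an effect on the world.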

Sutton points out that even when LLMs appear to solve math problems or demonstrate chain-of-thought reasoning, they're still just pattern-matching. They're given goals by humans—like solving an Olympiad problem—but they don't have genuine understanding of why solving those problems matters in the broader sense.

Imitation vs Experience

The interview draws an interesting parallel with human learning. Children initially learn through imitation—they mimic what adults do without understanding meaning. But at some point, humans transition to learning from experience: trying things, seeing what happens, adjusting based on results.

Sutton believes this distinction is crucial for AI. The LLM approach is essentially imitative—it learns from "here's a situation and here's what a person did" rather than from actual experience. And without genuine feedback about what actually works in the world, you can't truly learn.

"What we want, to quote Alan Turing, is a machine that can learn from experience."

The problem with LLMs isn't just philosophical. Without goals or experience, there's no ground truth—no way to verify what's actually right. In reinforcement learning, the "right thing to do" is the action that gets you reward. That definition exists because there's a clear outcome in the world. But in LLMs? There's no definition of what the right thing to say is.
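This contrast between imitation and reward-grounded learning can be sketched with a toy two-armed bandit (the payoff probabilities and action names are invented for illustration). The imitator copies a demonstrator's habit and has no way to discover it is wrong; the experiential learner defines "right" by observed reward:

```python
import random

random.seed(0)

# Hypothetical environment: action "B" pays off more often than "A".
# These probabilities stand in for "what actually works in the world".
payoff = {"A": 0.2, "B": 0.8}

def pull(action):
    """The environment returns a reward of 1 or 0 for the chosen action."""
    return 1 if random.random() < payoff[action] else 0

# Imitation: a demonstrator happens to prefer "A". The cloner copies that
# choice and never finds out it is worse — no feedback, no correction.
cloned_policy = "A"

# Experience: try both actions, keep average rewards, then pick the best.
totals = {"A": 0.0, "B": 0.0}
pulls = {"A": 0, "B": 0}
for _ in range(1000):
    action = random.choice(["A", "B"])   # explore uniformly
    totals[action] += pull(action)
    pulls[action] += 1

estimates = {a: totals[a] / pulls[a] for a in totals}
learned_policy = max(estimates, key=estimates.get)

print(cloned_policy, learned_policy)  # imitation keeps "A"; experience finds "B"
```

The reward signal is what supplies the ground truth Sutton describes: "right" is defined by outcome, not by resemblance to a demonstration.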

The Bitter Lesson

Sutton wrote "The Bitter Lesson" in 2019, and it became one of the most influential essays in AI, often cited to justify scaling up LLMs with ever-more compute. But Sutton now says that reading is a misunderstanding.

The lesson isn't just about computation. It's specifically about learning from experience rather than human knowledge. The essay argues that methods using genuine experience always win over those relying on human-crafted knowledge. And that pattern has repeated across AI history: every time we think human knowledge is sufficient, the experiential approach eventually supersedes it.

Critics might note that Sutton's view seems to discount the remarkable capabilities LLMs have already demonstrated: Olympiad-level reasoning, chain-of-thought, even self-correction within context. Math problems are not the physical world, but genuine problem-solving does appear to emerge from these systems.

Bottom Line

Sutton's core argument is compelling: intelligence requires goals that change the external world, not just pattern-matching over human text. The argument's vulnerability is practical: we're already building systems that seem to work without following his prescription. The real question isn't whether LLMs are fundamentally limited; it's whether those limitations matter when these systems already outperform humans on many tasks. Sutton would say yes. The industry seems to be saying no.


Sources

Richard Sutton – Father of RL Thinks LLMs Are a Dead End, by Dwarkesh Patel (video interview)

Transcript excerpt:

Sutton: Why are you trying to distinguish humans? Humans are animals. What we have in common is more interesting. What distinguishes us, we should be paying less attention to.

Patel: We're trying to replicate intelligence, right? No animal can go to the moon or make semiconductors. So we want to understand what makes humans special.

Sutton: I like the way you consider that obvious, because I consider the opposite obvious. If we understood a squirrel, we'd be almost all the way there. I am personally just kind of content being out of sync with my field for a long period of time, perhaps decades, because occasionally I have proved right in the past. I don't think learning is really about training. It's about an active process. The child tries things and sees what happens. I think we should be proud that we are giving rise to this great transition in the universe.

Patel: Today I'm chatting with Richard Sutton, who is one of the founding fathers of reinforcement learning and an inventor of many of the main techniques used there, like TD learning and policy gradient methods. For that he received this year's Turing Award, which, if you don't know, is basically the Nobel Prize for computer science. Richard, congratulations.

Sutton: Thank you, Dwarkesh.

Patel: And thanks for coming on the podcast.

Sutton: It's my pleasure.

Patel: OK, so first question. My audience and I are familiar with the LLM way of thinking about AI. Conceptually, what are we missing in terms of thinking about AI from the RL perspective?

Sutton: Well, yes, I think it's really quite a different point of view, and the two can easily get separated and lose the ability to talk to each other. And yeah, large language models have become such a big thing, generative AI in general a big thing, and our field is subject to bandwagons and fashions. So we lose track of the basic things, because I consider reinforcement learning to be basic AI. The problem of intelligence is to understand your world, and reinforcement learning is about understanding your world, whereas large language models are about mimicking people, doing what people say you should do. They're not about figuring out what to do.

Patel: Huh. I guess you would think that to emulate the trillions of tokens in the corpus of internet text, ...