Richard Sutton – Father of RL thinks LLMs are a dead end
Why are you trying to distinguish humans? Humans are animals. What we have in common is more interesting. What distinguishes us, we should be paying less attention to.
>> I mean, we're trying to replicate intelligence, right? No animal can go to the moon or make semiconductors. So we want to understand what makes humans special.
>> So I like the way you consider that obvious, because I consider the opposite obvious.
If we understood a squirrel, we'd be almost all the way there. I am personally content being out of sync with my field for a long period of time, perhaps decades, because occasionally I have been proved right in the past. I don't think learning is really about training. It's an active process.
The child tries things and sees what happens. I think we should be proud that we are giving rise to this great transition in the universe.

Today I'm chatting with Richard Sutton, one of the founding fathers of reinforcement learning and the inventor of many of its main techniques, like TD learning and policy gradient methods. For that work he received this year's Turing Award, which, if you don't know, is basically the Nobel Prize for computer science. Richard, congratulations.
>> Thank you, Dwarkesh.
>> And thanks for coming on the podcast.
>> It's my pleasure.
>> Okay, so first question. My audience and I are familiar with the LLM way of thinking about AI. Conceptually, what are we missing in terms of thinking about AI from the RL perspective?
>> Well, yes, I think it's really quite a different point of view, and the two can easily get separated and lose the ability to talk to each other.
>> Mhm.
>> Large language models have become such a big thing, and generative AI in general, and our field is subject to bandwagons and fashions. So we lose track of the basic things, because I consider reinforcement learning to be basic AI. The problem of intelligence is to understand your world, and reinforcement learning is about understanding your world, whereas large language models are about mimicking people, doing what people say you should do. They're not about figuring out what to do.
>> Huh. I guess you would think that to emulate the ...
Watch the full video by Dwarkesh Patel on YouTube.