Is RL + LLMs enough for AGI? — Sholto Douglas & Trenton Bricken

Okay, I'm joined again by my friends, Sholto Bricken. Wait. Did I do this last? No, no, you named us differently, but we didn't have Sholto Bricken and Trenton Douglas. Sholto Douglas and Trenton Bricken.

Who are now both at Anthropic. Sholto is scaling RL. Trenton is still working on mechanistic interpretability. Welcome back.

Happy to be here. Yeah, it's fun. What's changed since last year? We talked basically this month in 2024.

Now we're in 2025. What's happened? Okay, so I think the biggest thing that's changed is that RL and language models has finally worked. And this is manifested in the fact that we finally have proof of an algorithm that can give us expert human reliability and performance, given the right feedback loop.

And so I think this has only really been conclusively demonstrated in competitive programming and math, basically. If you think of these two axes, one is the intellectual complexity of the task and the other is the time horizon over which the task is being completed. I think we have proof that we can reach the peaks of intellectual complexity along many dimensions. We haven't yet demonstrated long-running agentic performance. You're seeing the first stumbling steps of that now, and should see much more conclusive evidence of that basically by the end of the year, with real software engineering agents doing real work. And I think, Trenton, you're experimenting with this at the moment, right?

Yeah, absolutely. I mean, the most public example people could go to today is Claude Plays Pokémon, right? Seeing it struggle in a way that's kind of painful to watch, but each model generation gets further through the game. And it seems more like a limitation of it being able to use a memory system than anything else.

Yeah. I wish we had recorded predictions last year. We definitely should this year. Oh, yeah.

Hold us accountable. Yeah, that's right. Would you have said that agents would be only this powerful as of last year? I think this is roughly on track for where I expected with software engineering.

I think I expected them to be a little bit better at computer use. But I ...

Watch the full video by Dwarkesh Patel on YouTube.