
Is RL + LLMs Enough for AGI? — Sholto Douglas & Trenton Bricken

Sholto Douglas and Trenton Bricken predict that by late 2025, AI agents will complete a full day's worth of software engineering work—marking a decisive shift from mere chatbot assistants to genuine digital workers. The claim isn't speculative: it's based on concrete evidence from competitive programming and mathematics domains where reinforcement learning has achieved expert-level performance.

The New Reality

Sholto Douglas and Trenton Bricken are both researchers at Anthropic, focused respectively on scaling reinforcement learning and mechanistic interpretability. They sat down with host Dwarkesh Patel to discuss what's changed since their 2024 conversation—and the answer is striking: RL and language models have finally worked.


The proof points are concrete. In competitive programming and mathematics, these systems now demonstrate reliable performance when given clean feedback signals. Unit tests pass. Code compiles. Math problems resolve to correct answers. The axis of intellectual complexity has been scaled.

What hasn't emerged yet is long-running agentic performance—autonomous systems that can operate across extended time horizons without constant human intervention. But the early steps are visible, and real software engineering agents are beginning to demonstrate competence.

Why Software Engineering Leads

The domain where these models excel most immediately is software engineering—not by accident but by design. The field offers what researchers call "verifiable rewards": does the code pass a test? Does it compile? Does it run?
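A "verifiable reward" can be illustrated with a minimal sketch (this is not Anthropic's actual training code; the function name and file handling are assumptions for the example): run the candidate program against its unit tests and return a binary signal.

```python
import subprocess
import sys
import tempfile

def verifiable_reward(candidate_code: str, test_code: str) -> float:
    """Binary reward: 1.0 if the candidate passes its tests, else 0.0.

    Sketch only -- a production RL pipeline would add sandboxing,
    resource limits, and possibly per-test partial credit.
    """
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(candidate_code + "\n\n" + test_code + "\n")
        path = f.name
    # Run the combined file; a nonzero exit code means a failed assertion
    # or a crash, both of which earn zero reward.
    result = subprocess.run([sys.executable, path],
                            capture_output=True, timeout=10)
    return 1.0 if result.returncode == 0 else 0.0

# A passing candidate earns reward 1.0; a buggy one earns 0.0.
good = "def add(a, b):\n    return a + b"
bad = "def add(a, b):\n    return a - b"
tests = "assert add(2, 3) == 5"
```

The essay-grading contrast discussed below is exactly the absence of a function like this: there is no exit code for "good prose."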

Writing essays lacks equivalent verifiability. Grading an essay requires subjective taste, and humans are notoriously poor judges of quality; their assessments are laced with length biases and preferences that don't correlate with actual excellence.

The researchers noted an illuminating comparison: a Nobel Prize-winning discovery may be more achievable than a Pulitzer-worthy novel. Scientific work layers verifiable experiments, measurable outcomes, and iterative testing. Literary creativity remains stubbornly resistant to automated verification.

This reveals something fundamental about the current generation of AI systems: when given a clear feedback loop, they perform impressively. Without one, they struggle.

The Reliability Question

The core bottleneck isn't merely adding "extra nines" of reliability (moving from 90% to 99% to 99.9% task success) as previously assumed. Instead, it's missing context and the inability to handle complex, multi-file changes across larger scopes of work.

When a task requires discovery and iteration within an environment, these systems still falter. They handle well-scoped problems with high intellectual complexity effectively but struggle when tasks become amorphous or require sustained interaction with the world.

The researchers expect agents to perform close to a junior engineer's worth of work by the end of 2025—perhaps saving an entire day of boilerplate coding or delivering several hours of competent, independent software engineering.

The Capabilities Debate

A persistent critique questions whether RL is actually creating new capabilities or simply revealing what was already present in pre-training. A paper from Stanford demonstrated that giving a base model enough tries allows it to answer questions as well as reasoning models—essentially narrowing down the probability space of possible answers rather than generating new ones.
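The "enough tries" effect is typically measured with pass@k: the probability that at least one of k sampled answers is correct. A common unbiased estimator, introduced alongside OpenAI's HumanEval benchmark, is sketched below; the numbers are illustrative and not taken from the Stanford paper.

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased estimator of pass@k: given n samples of which c are
    correct, the probability that at least one of k draws is correct."""
    if n - c < k:
        # Fewer wrong samples than draws, so a correct one is guaranteed.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# With 20 correct answers out of 100 samples, a single try succeeds
# only 20% of the time, but pass@10 is far higher: the base model
# "knew" the answer all along, and extra tries merely surface it.
p1 = pass_at_k(100, 20, 1)
p10 = pass_at_k(100, 20, 10)
```

The critique, in these terms: if a reasoning model's pass@1 merely catches up to the base model's pass@k, RL may be concentrating probability mass rather than adding capability.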

Douglas counters that the compute investment in reinforcement learning represents genuine capability addition, not mere revelation. The amount of training data matters: when RL signal is sufficiently clean, these systems learn knowledge exceeding human-level performance—from chess to Go—and do so from scratch.

The economics are telling. Labs currently spend hundreds of millions of dollars on base model pre-training but only a few million on reinforcement learning. RL isn't compute-limited yet, but its share of training compute will grow significantly as it scales.

Counterpoints

Critics might note that the distinction between "new capabilities" and "revealed capabilities" matters significantly for understanding what's actually happening with these systems. If reasoning models merely narrow exploration rather than learn new patterns, the apparent progress in mathematics and programming could be an artifact of evaluation design rather than fundamental improvement.

Additionally, predicting software engineering agents performing a full day's work by 2025 carries inherent uncertainty—the field has seen premature declarations before.

Bottom Line

Douglas and Bricken present compelling evidence that RL plus language models has crossed a threshold in verifiable domains like competitive programming and mathematics. Their prediction for end-of-2025 agent capability represents the most substantive shift from chatbot assistants to autonomous workers we've yet seen. The fundamental insight is clear: when feedback loops are clean, these systems perform remarkably; without them, they falter. The next frontier isn't more powerful base models—it's constructing environments where AI can reliably learn from genuine success and failure.


Sources

Is RL + LLMs Enough for AGI? — Sholto Douglas & Trenton Bricken

by Dwarkesh Patel

Okay, I'm joined again by my friends Sholto and Trenton. Wait, did I do this last time? No, no, you named us differently, but we didn't have Sholto Bricken and Trenton Douglas. Sholto Douglas and Trenton Bricken.

Who are now both at Anthropic. Sholto, scaling RL. Trenton, still working on mechanistic interpretability. Welcome back.

Happy to be here. Yeah, it's fun. What's changed since last year? We talked basically this month in 2024.

Now we're in 2025. What's happened? Okay, so I think the biggest thing that's changed is RL and language models has finally worked. And this is manifested in the fact that we finally have proof of an algorithm that can give us expert human reliability and performance, given the right feedback loop.

And so I think this has only really been conclusively demonstrated in competitive programming and math, basically. And so if you think of these two axes, one is the intellectual complexity of the task and the other is the time horizon over which the task is being completed. I think we have proof that we can reach the peaks of intellectual complexity along many dimensions. We haven't yet demonstrated long-running agentic performance. You're seeing the first stumbling steps of that now, and should see much more conclusive evidence of it basically by the end of the year, with real software engineering agents doing real work. And I think, Trenton, you're experimenting with this at the moment, right?

Yeah, absolutely. The most public example people could go to today is Claude Plays Pokémon, right? Seeing it struggle in a way that's kind of painful to watch, but each model generation gets further through the game. And it seems more like a limitation of it being able to use its memory system than anything else.

Yeah. I wish we had recorded predictions last year. We definitely should this year. Oh, yeah.

Hold us accountable. Yeah, that's right. Would you have said that agents would be only this powerful as of last year? I think this is roughly on track for where I expected with software engineering.

I think I expected them to be a little bit better at computer use. But I understand all the reasons for why that is, and I think that's well on track to be solved. It's just a sort of temporary ...