Dwarkesh Patel", ## The Age of Scaling Is Over
In the world of artificial intelligence, something strange is happening. The biggest companies are investing roughly 1% of GDP into AI, resources that would have sounded like science fiction decades ago, but for most people nothing has actually changed. That's the paradox at the heart of this conversation with Ilya Sutskever.
Sutskever, a pioneer in deep learning and co-founder of OpenAI, makes a bold claim: we're moving from an era where scaling existing techniques was enough to drive progress, into something far more uncertain—an age of genuine research. The shift isn't just semantic. It's a fundamental change in how we think about what these models can become.
### The Puzzle of Smarter Models
One of the most confusing things about current AI systems is how to reconcile their remarkable performance on evaluations with their relatively modest economic impact. The models do incredibly well on tests, sometimes outperforming human experts, yet the economy hasn't felt a corresponding shift.
Sutskever points out a striking example: when you use AI coding tools, something odd happens. You can ask a model to fix a bug, and it introduces a second bug. Then you ask it to fix that new bug, and somehow it brings back the first one. You can alternate between bugs forever.
How is this possible? The models seem smart enough for evaluations but struggle with basic tasks in the real world.
Sutskever offers two explanations. First, maybe RL training has made the models too single-minded: unaware of broader context even while they're highly capable in narrow domains. Second, when companies do pre-training, they use essentially everything available, but during reinforcement learning training they have to choose which environments to include, and those choices inevitably narrow what the models are exposed to.
### The Two Students
Sutskever introduces a human analogy that makes this clearer:
Imagine two students wanting to become great programmers. Student one trains like a competitive programmer: 10,000 hours of practice, solving every problem, memorizing proof techniques, mastering algorithms. They become exceptional at coding competitions.
Student two puts in maybe 100 hours of practice but also explores broadly, picking up a wider range of skills and intuitions along the way.
Which student succeeds more in their career? The second one.
The models today are much more like the first student. Companies say: we want our model to do great at coding competitions, so let's give it every single competitive programming problem ever created, augment the data, and train on that. Now you have an excellent competitive programmer—but this level of preparation doesn't necessarily generalize to other things.
### What Pre-Training Actually Does
The distinction between pre-training and reinforcement learning matters more than most people realize.
Pre-training's main strength is sheer volume—you can include basically all the world's data without carefully selecting what goes in. It's like the 10,000 hours of practice you get for free because that knowledge already exists somewhere in the pre-training distribution.
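For readers who want the mechanics, here is a minimal sketch of the next-token-prediction objective that pre-training optimizes. This is the standard setup rather than anything specific from the conversation; the tensor sizes are arbitrary, and a real model would put attention layers between the embedding and the output head.

```python
import torch
import torch.nn.functional as F

vocab, dim, seq = 50257, 64, 128
embed = torch.nn.Embedding(vocab, dim)
head = torch.nn.Linear(dim, vocab)

tokens = torch.randint(0, vocab, (1, seq))  # stand-in for a slice of corpus text
hidden = embed(tokens)                      # a real model adds attention layers here
logits = head(hidden)

# The entire objective: predict token t+1 from everything up to token t.
# The only "curation" is which text goes into the corpus, which is why
# sheer volume, not per-task environment design, is the lever.
loss = F.cross_entropy(
    logits[:, :-1].reshape(-1, vocab),
    tokens[:, 1:].reshape(-1),
)
loss.backward()
```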
But here's what's surprising: with a tiny fraction of that data, a 15-year-old human knows much more than these models do, and knows it more deeply. And the mistakes the models make are not the kind you'd expect from systems trained at that scale.
Sutskever also explores whether evolution is an analogy for pre-training: three billion years of search producing human intelligence. But he points to a revealing difference, drawn from cases of brain damage. When someone loses emotional processing, they remain articulate and perform well on tests, yet they can't actually make decisions: they make terrible financial choices and struggle with even basic ones.
This suggests emotions might be something you can't get from pre-training alone—some kind of value function that tells you what to optimize for.
### The Problem With Value Functions
In reinforcement learning, when you train an agent, you give it a problem and let it take thousands of actions. Then you score the solution and use that signal to update every action in the trajectory.
The problem: if you're doing something that takes a long time to solve, you learn nothing until you actually finish solving it.
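A minimal sketch makes this concrete. The toy REINFORCE-style update below is my own illustration, not anything specific from the conversation: a single score arrives only when the episode ends, and it is applied uniformly to every action that produced it.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(logits):
    z = np.exp(logits - logits.max())
    return z / z.sum()

def run_episode(policy_logits, horizon=1000):
    """Take `horizon` actions with no feedback along the way."""
    actions = [int(rng.choice(len(policy_logits), p=softmax(policy_logits)))
               for _ in range(horizon)]
    reward = float(sum(actions) > horizon // 2)  # toy end-of-episode score
    return actions, reward

def reinforce_update(policy_logits, actions, reward, lr=0.01):
    """Every action in the trajectory gets the same terminal signal."""
    for a in actions:
        grad = -softmax(policy_logits)  # gradient of log pi(a) w.r.t. logits
        grad[a] += 1.0
        policy_logits = policy_logits + lr * reward * grad
    return policy_logits

logits = np.zeros(2)
actions, reward = run_episode(logits)
logits = reinforce_update(logits, actions, reward)
# If reward == 0.0, the update is exactly zero: a thousand steps of work
# produce no learning signal at all until the episode finally ends.
```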
Value functions help here. In chess, when you lose a piece, you know immediately that what preceded it was bad—you don't need to play the entire game to update your behavior. The value function lets you short-circuit learning before reaching the end of a long chain of reasoning.
But in domains like math or programming, if you've spent a thousand steps pursuing a solution direction and conclude it's unpromising, a good value function could have delivered that signal around step one rather than at the very end, when you finally give up. Despite this, value functions don't play as prominent a role in current AI research as that would suggest.
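To see why a value function helps, here is a sketch with made-up numbers (again my own illustration). It computes temporal-difference errors: the moment the value estimate collapses, say right after the chess blunder or at the step where the proof direction went wrong, a negative learning signal appears at that step instead of at the end of the game.

```python
def td_errors(values, rewards, gamma=0.99):
    """values[t] = V(s_t) for t = 0..T; rewards[t] scores transition t -> t+1."""
    errors = []
    for t in range(len(rewards)):
        # delta_t = r_t + gamma * V(s_{t+1}) - V(s_t), available at step t
        errors.append(rewards[t] + gamma * values[t + 1] - values[t])
    return errors

# A thousand silent steps, then a terminal reward of 0. With a value
# function that already predicts failure early, the negative signal
# shows up where the bad decision was made, not at the very end.
values = [0.9] * 500 + [0.1] * 501  # confidence collapses at step 500
rewards = [0.0] * 1000
deltas = td_errors(values, rewards)
print(deltas[499])  # large negative TD error right at the collapse
print(deltas[900])  # near zero far from the decision point
```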
### Moving Forward
Sutskever's core argument is clear: we need to stop simply stacking more environments and more diversity, and instead figure out approaches that let models learn in one environment and improve in another. The age of scaling taught us important lessons about what works, but the next era requires genuine research into how these systems actually reason.
"If you combine this with generalization being inadequate, it explains a lot of what's going on—the disconnect between eval performance and real-world performance that we don't even fully understand yet."
Critics might note that focusing too narrowly on coding competitions or specific benchmarks risks missing broader capabilities. The evaluation metrics themselves might be too narrow, and expanding them might not capture what actually matters for deployment.
### Bottom Line
Sutskever's strongest insight is recognizing that scale alone won't produce generalizable intelligence; the analogy of the 100-hour student suggests something more fundamental about how learning works. The weak point of his argument is less technical than conceptual: we don't yet know what those broader capabilities should look like, or how to design environments that test for them instead of just stacking more of the same.
The age of research isn't just a new era—it's a confession that we have no idea how to get where we're going.