The tech industry's sudden pivot from "scaling is everything" to "scaling is dead" feels less like a scientific breakthrough and more like a strategic retreat. Arvind Narayanan & Sayash Kapoor dismantle the narrative that artificial intelligence progress has stalled, arguing instead that the shift is convenient cover for companies that are hitting data limits and trying to save face. For busy leaders tracking the economic impact of AI, this piece is essential because it exposes the gap between industry hype and the messy reality of product development.
The Myth of the Crystal Ball
The authors challenge the automatic deference granted to Silicon Valley executives, pointing out that their predictions are often driven by fundraising needs rather than hard data. "Declaring the death of model scaling is premature," they write, noting that the industry's flip-flopping reveals a lack of foresight. Narayanan & Kapoor argue that when a company like OpenAI admits to struggles, it creates a cascade effect where competitors follow suit to explain away their own delays. This is a crucial observation: the narrative isn't shifting because of new physics, but because of market psychology.
The core of their argument rests on the idea that insiders are not significantly better informed than the rest of us. "Industry leaders don't have a good track record of predicting AI developments," they state, drawing a parallel to the decade-long overoptimism surrounding self-driving cars. This lands with force because it strips away the mystique of the "tech genius" and replaces it with a view of executives as rational actors protecting their commercial interests. As Narayanan & Kapoor put it, "Their narratives are heavily influenced by their vested interests."
Critics might argue that insiders do possess proprietary data on unreleased models that outsiders simply cannot access. However, the authors counter that this advantage is marginal in the context of multi-year forecasts, especially since the open-source movement spreads weights and insights widely. The real bottleneck, they suggest, isn't a lack of insider knowledge but a lack of fresh data to feed the models. "Scaling as usual ended with GPT-4 class models, because these models are trained on most of the readily available data sources," they explain. This reframes the problem from a technological dead end to a resource constraint.
The Pivot to "Thinking" Models
With the narrative of raw model size hitting a wall, the industry is pivoting to "inference scaling," or test-time compute. This involves spending more computational power while the model is running, essentially forcing it to "think" before it answers. Narayanan & Kapoor acknowledge this is a real trend with potential, but they warn against viewing it as a magic bullet. "Inference scaling is real, and there is a lot of low-hanging fruit," they concede, but immediately caution that the improvements will be "unpredictable and unevenly distributed among domains."
The authors provide a nuanced breakdown of where this approach works and where it fails. It excels in tasks with clear correct answers, like coding or math, where a model can verify its own logic. "For problems where it does work well, how much of an improvement is possible by doing more computation during inference?" they ask, highlighting the uncertainty. Conversely, for creative tasks or translation, the authors argue that reasoning cannot compensate for a lack of training data. If a model doesn't know a language's idioms, no amount of "thinking" will fix it.
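To make this distinction concrete, here is a minimal toy sketch (my illustration, not the authors' method) of one common form of inference scaling: best-of-N sampling with a verifier. The model_sample and verify functions are hypothetical stand-ins; the point is that in a checkable domain like math or code, a candidate answer can be tested directly, so extra samples buy accuracy.

    import random

    random.seed(0)

    def model_sample(problem):
        # Stand-in for one forward pass: returns the right answer 30% of the time.
        return problem["answer"] if random.random() < 0.3 else "wrong"

    def verify(problem, candidate):
        # Checkable domain (math/code): test the candidate directly,
        # e.g. run unit tests or recompute the arithmetic.
        return candidate == problem["answer"]

    def best_of_n(problem, n):
        # Spend more inference-time compute: draw n samples, return a verified one.
        for _ in range(n):
            candidate = model_sample(problem)
            if verify(problem, candidate):
                return candidate
        return None  # no sample passed verification

    problem = {"question": "2 + 2", "answer": "4"}
    for n in (1, 4, 16):
        solved = sum(best_of_n(problem, n) is not None for _ in range(1000))
        print(f"n={n:2d}: solved {solved / 10:.1f}% of 1000 trials")

In this toy setup, success climbs from roughly 30% at one sample toward near-certainty at sixteen, but only because the verifier tells us which sample to keep. Strip the verifier out, as in translation or creative writing, and the extra compute gives no signal about which answer is best, which is exactly the authors' point.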
This distinction is vital for stakeholders. If the next wave of AI progress is limited to logic-heavy tasks, the economic disruption will look very different than if it applies to general creativity. Narayanan & Kapoor note that early evidence supports this split: "Improvements in exam performance seem to strongly correlate with the importance of reasoning for answering questions, as opposed to knowledge or creativity." They point out that while math scores jump, performance in subjects like biology or English shows negligible gains.
The Disconnect Between Capability and Impact
Perhaps the most sobering point in the essay is the assertion that better AI models do not automatically translate to better societal outcomes. The authors argue that the connection between raw capability and real-world impact is "extremely weak." "The bottlenecks for impact are the pace of product development and the rate of adoption, not AI capabilities," they write. This shifts the focus from the algorithm to the ecosystem: how fast can companies build usable tools, and how quickly will businesses adopt them?
Narayanan & Kapoor suggest that the industry's obsession with capability metrics distracts from these harder, messier problems. They criticize the tendency to overestimate the immediate utility of new models, noting that even if a model can solve a math problem perfectly, it might be useless in a real-world workflow if it cannot integrate with existing software. "The most important factor is whether scaling will make business sense, not whether it is technically feasible," they conclude. This is a stark reminder that the market, not the math, will dictate the pace of AI's integration into the economy.
Critics might object that downplaying the link between capability and impact is premature, since history shows that foundational breakthroughs often lead to unforeseen applications. Yet the authors' emphasis on product-development and adoption bottlenecks remains a strong counterweight to the hype cycle. They urge journalists and policymakers to stop treating industry forecasts as gospel. "We must now caution against excessive pessimism about model scaling," they write, just as they previously warned against excessive optimism. The goal is a more grounded, evidence-based approach to understanding AI's trajectory.
Bottom Line
The strongest part of this argument is its ruthless deconstruction of industry authority, making a persuasive case that the sudden shift away from scaling is a strategic maneuver rather than a scientific consensus. Its biggest vulnerability is the difficulty of predicting when inference scaling will hit its own limits, a gap the authors admit remains open. Readers should watch whether the promised gains in reasoning translate into reliable, scalable products or stay confined to academic benchmarks. The real story isn't whether AI is slowing down; it's whether the industry can pivot fast enough to build something useful before the hype runs out.