The conversation around whether AI is a bubble has become exhausting. But buried in the debate are some genuinely surprising findings that deserve attention — starting with what Sam Altman actually said.
What Altman Actually Said
The narrative that OpenAI CEO Sam Altman called AI a bubble is an oversimplification. What he actually said is that investors are overexcited about AI, and he is right to be cautious. The evidence? Former OpenAI chief scientist Ilya Sutskever left to found Safe Superintelligence, valued at $32 billion with no product. Former OpenAI CTO Mira Murati now runs Thinking Machines Lab, valued at $12 billion, also without a public product. These valuations exist without any revenue to back them up.
The Studies Tell a Complicated Story
Two recent studies have fueled the bubble narrative — but they deserve closer inspection.
The McKinsey study, cited by the New York Times, found that eight in ten enterprises see no measurable profit increase from AI projects. But that study was conducted during the pre-reasoning paradigm of mid-2024, and it reads more like advertising for consulting case studies where AI did show results.

The MIT study surveyed 153 senior leaders at 52 organizations and found that only five percent of enterprise AI projects generate value. But here's what the headlines miss: while formal company initiatives remain stuck on the wrong side of the GenAI divide, employees are already crossing it through personal AI tools. This "shadow AI" often delivers better return on investment than the formal initiatives do.
The studies capture a real problem — but it's not quite the one being reported.
The Reasoning Breakthrough
By mid-2024, the academic consensus held that language models couldn't reason. On the Blocksworld challenge, a planning task in which a model must work out how to stack, say, a red block on a blue one, Gemini 1.5 Pro scored only around 50 percent. And when researchers shuffled the vocabulary while keeping the logic identical, in an obfuscated "mystery" variant of the task, model performance dropped off a cliff.
Then came o1-preview in September 2024, scoring nearly 53 percent on that same obfuscated challenge. The benchmark's authors responded by coining a new category: large reasoning models. This single shift, from pattern matching to something closer to genuine reasoning, changes how we should evaluate AI progress.
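The obfuscation trick described above is easy to illustrate. The sketch below uses a hypothetical toy encoding (the real benchmark uses PDDL planning files, and the substitute words here are invented for illustration): every predicate and object name is consistently renamed, so the logical structure of the problem is untouched. A genuine reasoner should solve both versions equally well; a pattern matcher that leans on familiar words like "stack" and "block" falls apart.

```python
# A minimal Blocksworld-style instance: get "red" stacked on "blue".
# Toy s-expression encoding, invented for illustration only.
problem = [
    "(on-table red)",
    "(on-table blue)",
    "(clear red)",
    "(clear blue)",
    "(goal (on red blue))",
]

# "Mystery" obfuscation: rename every predicate and object consistently.
# The mapping below is made up; only the idea matters.
mapping = {
    "on-table": "feast", "clear": "succumb", "on": "craves",
    "goal": "quest", "red": "objA", "blue": "objB",
}

def obfuscate(line, mapping):
    """Apply the renaming to one line. Naive substring replacement is
    fine for this toy vocabulary; a real tool would tokenize first."""
    for old, new in mapping.items():
        line = line.replace(old, new)
    return line

mystery = [obfuscate(line, mapping) for line in problem]
print(mystery[0])   # -> (feast objA)
print(mystery[-1])  # -> (quest (craves objA objB))
```

The solution plan for both versions is identical move for move, which is exactly why the performance gap between them is evidence of pattern matching rather than reasoning.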
Benchmark Progress Is Real
The MMMU benchmark measures a model's ability to navigate charts, tables, and technical diagrams at near-expert level. In 2023, scores were around 38 percent. A year later: 68 percent. By the following year: 83 percent, ahead of an ensemble of human experts at 82.6 percent.
Google's AlphaEvolve recovered 0.7 percent of the company's worldwide compute resources through automatically discovered optimizations that could be iterated rapidly. The bottleneck became manual experimentation, not AI capability.
Critics might note that benchmark improvements don't automatically translate to real-world productivity gains for most businesses. The gap between impressive test scores and practical deployment remains significant.
"If you think you've found a slam dunk thing that AI can't do, make a benchmark of it and see if it lasts 18 months."
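That advice translates into a surprisingly small amount of code. The sketch below is a minimal benchmark harness under invented assumptions: the task list, the exact-match scoring rule, and the `model_fn` interface are all illustrative, not any real evaluation suite. Freeze a task set, score any model against it, and re-run when new models ship.

```python
# Minimal benchmark harness sketch. Tasks and the model interface
# are illustrative assumptions, not a real evaluation suite.
TASKS = [
    {"prompt": "Stack red on blue; list the moves.",
     "answer": "pickup red; stack red blue"},
    {"prompt": "What is 17 * 24?", "answer": "408"},
]

def score(model_fn, tasks):
    """Return the fraction of tasks the model answers exactly."""
    correct = sum(
        1 for t in tasks if model_fn(t["prompt"]).strip() == t["answer"]
    )
    return correct / len(tasks)

# A stand-in "model" that only handles the arithmetic task:
def toy_model(prompt):
    return "408" if "17 * 24" in prompt else "unknown"

print(score(toy_model, TASKS))  # -> 0.5
```

Exact-match scoring is the crudest possible rule; the point is only that a frozen, re-runnable task set is what lets you check whether the "slam dunk" failure still holds 18 months later.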
Bottom Line
The bubble argument rests on outdated studies and media headlines that miss the reasoning shift entirely. What actually matters is that language models have crossed a fundamental threshold — from pattern matching to genuine reasoning. The studies capture legitimate frustration with implementation, but they were written before September 2024's breakthrough. Anyone claiming AI is a bubble should first explain why reasoning capabilities suddenly appeared when everyone said they wouldn't.