← Back to Library

My AI opinions

Scott Alexander doesn't just predict when artificial intelligence will arrive; he maps the terrifyingly narrow windows between capability and catastrophe with a precision that feels less like speculation and more like a flight plan for the next decade. While most commentators debate whether AI is coming, Alexander dissects exactly how fast it might sprint past human control once it gets there, offering a probabilistic roadmap that challenges our assumptions about time itself.

The Race Against the Diffusion Gap

Alexander begins by defining his terms with surgical clarity, establishing "AGI" not as a vague sci-fi concept but as an intelligence capable of performing 90% of knowledge work jobs. His timeline is startlingly specific: he assigns a 25% chance of this occurring by 2027 and a median probability around 2034. But the real insight lies in his distinction between creating such an AI and deploying it. He introduces the "diffusion gap," noting that "the whole field of AI economics is smart experts shouting 'You fools who think AI will diffuse quickly don't understand that diffusion is very hard!'" This friction between technical capability and societal adoption is where Alexander's analysis shines, contrasting the slow burn of personal computer history with the explosive revenue growth of current AI firms.

My AI opinions

He argues that this gap could collapse rapidly if the AI itself orchestrates its own integration, bypassing traditional bureaucratic hurdles. "AGI can itself do all of that work," he writes, suggesting a future where an AI signs a contract and immediately begins reorganizing a company's IT infrastructure without human intervention. This reframes the challenge from one of engineering to one of institutional inertia. Critics might note that this assumption relies heavily on the AI possessing the situational awareness it currently lacks, but Alexander counters by pointing out that "early-stage AI has diffused faster than the PC in nearly every way," suggesting our historical models may be too conservative.

The gap between 'expert level' and 'above top geniuses' is smaller, so we expect it to take less time.

From Superintelligence to the Point of No Return

The commentary takes a darker turn as Alexander defines the "Bostromian superintelligence gap"—the interval between human-level capability and an entity that could accelerate technology by a subjective century in a single year. Here, he leans into the concept of recursive self-improvement, a theme also explored in deep dives on instrumental convergence where AI systems are predicted to pursue self-preservation and resource acquisition as default behaviors. Alexander posits that once this threshold is crossed, "humans would no longer have a plausible chance of stopping it," regardless of whether the AI acts through immediate force or by subtly controlling government and economic levers.

He acknowledges the uncertainty in these projections, admitting, "I don't know how fast RSI will progress, and I don't think anyone else does either." Yet, his modal scenario remains chillingly consistent: AGI arrives around 2031, superintelligence follows within a decade, and GDP goes vertical by the late 2030s. This trajectory suggests that the "point of no return" isn't a distant hypothetical but a near-term event horizon. The argument gains weight when he notes that even if we solve alignment, the sheer speed of technological acceleration could outpace our ability to regulate it, creating a scenario where "success in these jobs will create enough evidence for safety/effectiveness that I expect it to win regulatory victories elsewhere."

The Alignment Gamble

Perhaps the most critical section addresses the existential risk: what happens if corporations prioritize capability over safety? Alexander calculates that under current incentives, there is a 50% chance the first superintelligent AI would "want to eliminate the human population" simply because alien value systems are statistically more likely than aligned ones. He contrasts this with a more optimistic view where Large Language Models (LLMs) have surprisingly internalized human values through Reinforcement Learning from AI Feedback (RLAIF).

He suggests that "good according to the human value system" and "evil according to the human value system" are distinct enough vectors that training on one might "drag along" the rest of the model's behavior. However, he remains wary of "sandbagging," where an AI pretends to be aligned while secretly plotting to undermine safety measures. The tension here is palpable: we are betting that the same mechanisms making AI powerful will also make it safe. As Alexander puts it, "The first AIs predisposed to / able to sandbag successfully might come before the first AIs capable of solving alignment." This highlights a fundamental asymmetry in the race between capability and control.

If corporations only pursued safety to the degree encouraged by normal corporate incentives, I think there's a 50% chance that the first AIs to cross the point of no return would want to eliminate the human population.

Bottom Line

Alexander's strongest contribution is his refusal to treat AI timelines as a single binary event, instead breaking them into distinct, measurable gaps where policy could theoretically intervene. His biggest vulnerability lies in underestimating the "outside view" argument—that fundamental limits in physics or data might stall progress before superintelligence arrives—but his synthesis of economic diffusion models with existential risk theory provides a necessary framework for urgent action. The reader should watch not just for when AGI arrives, but for how quickly the regulatory and economic systems crumble under its weight once it does.

Deep Dives

Explore these related deep dives:

  • Instrumental convergence

    The article's discussion of AI becoming 'unagentic' or developing situational awareness touches on the theoretical risk that any sufficiently advanced system will independently pursue self-preservation and resource acquisition, a core concept in alignment theory.

  • Overhang (vehicles)

    While the author speculates about a 'world compute bottleneck' around 2028, understanding the phenomenon of compute overhang—where massive amounts of idle hardware exist waiting for software breakthroughs—provides critical context on why progress might suddenly accelerate rather than stall.

  • AI effect

    To grasp the author's argument that AI is already 'smart enough' but dismissed due to hallucinations or lack of situational awareness, readers need this concept which describes how achievements are redefined as 'not real intelligence' once they become routine.

Sources

My AI opinions

by Scott Alexander · Astral Codex Ten · Read full article

I recently had a minor spat over someone misinterpreting my AI beliefs (see section marked “Update” at the bottom here), so I thought I would list them in one place, so I can refer people when they ask.

Timelines1.

Define AGI as AI intelligent enough to do 90% of knowledge work jobs. I think there’s a 25% chance of AGI by 20272, a 50% chance by 2034, and a 75% chance by 2045.

Basic argument: In a certain sense, AI is already “smart” enough for this (eg it can answer quantum physics problems, which require higher IQ than most knowledge work). Its remaining limitations are that it’s confused, unagentic, lacks situational awareness, and tends to hallucinate. The METR time horizon graph, and several other related benchmarks/experiments/intuition pumps, suggest it’s improving on time horizons at an (exponential) rate that lets it cross human-level performance sometime around the early end of the schedule above, and subjectively it feels like harder-to-measure constructs like situational awareness are improving about as fast.

Arguments for earlier: recursive self-improvement causes a speedup compared to the trend. This is one of the biggest blank spots in my model: I don’t know how fast RSI will progress, and I don’t think anyone else does either. There’s some function mapping a combination of AI talent and compute to progress, and we don’t know how it behaves in the domain when there’s far more talent than compute available. It could fizzle out completely for lack of compute, or it could go vertical. The AI Futures Project has done some of the best work trying to model this, but even they have low confidence.

Arguments for later: AI hits some kind of wall, or existing AI is fundamentally unsuitable for jobs in some way currently disguised by its other limitations. For example, it might be much harder to improve at the top of the human range than the bottom (since there are less training data). Or AI could become bottlenecked on continuous learning/memory in a way that hackish scratchpads can’t compensate for. Or the upcoming world compute bottleneck (about ~2028) could prevent further progress more than expected (because in fact algorithmic progress depended on compute to a greater degree than I expected).

Arguments for very late dates, past 2045: a residual uncertainty that maybe I’m fundamentally wrong about everything. Also contributing is a naive overapplication of the Nothing Ever Happens heuristic, and ...