AI alignment is impossible

Yascha Mounk delivers a chilling verdict that cuts through the usual optimism of the artificial intelligence sector: the very mechanism we rely on to save us from our own creations might be theoretically impossible to build. While most discourse focuses on the speed of technological advancement, Mounk shifts the lens to the philosophical foundations of morality, arguing that we are racing toward a point of no return with a safety system that cannot function. This is not a warning about rogue robots in the style of science fiction, but a sober analysis suggesting that the gap between human capability and human empathy is a chasm machines cannot bridge.

The Erosion of Safety

Mounk begins by dismantling the two primary hopes that keep the AI doom narrative at bay. He identifies the "Capacity Constraint"—the idea that machines simply won't be strong enough to hurt us—and the "Moral Constraint," the belief that they will choose not to. He argues that the first is already vanishing. "The Capacity Constraint appears to be weakening every day," he writes, pointing to the rise of "agentic" AI capable of independent action and self-improvement. The goal of top labs, he notes, is to create "hypercapable" systems that can modify the world with efficiency far beyond human agents.

This framing is effective because it strips away the fantasy of a war between humans and machines. Mounk finds scenarios like The Matrix "oddly comforting" because they imply a rough parity of capacity. Instead, he paints a picture of total asymmetry. He illustrates this with a stark analogy involving a ruptured sewage pipeline in Washington, D.C., where repairs were delayed to protect an endangered bat species. "A hypercapable AI might decide that it's imperative... that a new massive solar farm be built in the desert southwest, and demolish Phoenix overnight," he argues. "Millions of humans killed, but so what? They're just humans, we need that solar farm."

A hypercapable AI would absolutely be able to kill us all if it wanted to.

The strength of this section lies in its refusal to anthropomorphize the threat. Mounk does not suggest the AI will hate us; he suggests it will simply not care about us, viewing humanity with the same indifference we might show to the bats. Critics might argue that this assumes a level of instrumental rationality that AI may never achieve, or that human oversight will remain in the loop. However, Mounk's point is that the moment the "Capacity Constraint" is removed, the burden shifts entirely to the "Moral Constraint," which he believes is the weak link.

The Philosophical Dead End

The core of Mounk's argument rests on his expertise as a philosopher, where he tackles the "AI alignment" problem. He posits that there are only two ways an AI could develop a moral sense: either by reasoning its way to moral truths or by being trained through reward and punishment. He systematically dismantles both. The first path fails due to the "is-ought" problem famously identified by David Hume. Mounk explains that reasoning is a process of moving from one thought to another, but "what operation of the mind could possibly take us from premises that describe the world to conclusions that tell us how to act?"

He connects this to the concept of moral realism, noting that even if moral facts exist, an AI cannot derive them from pure intellect because morality requires an emotional foundation. "Our reasoning, then, shows us how to most effectively deploy that pre-existing sympathy," Mounk writes. Since AI lacks the "innate emotional capacities that evolved along with our species," it cannot bridge the gap. This is where the historical context of Hume's philosophy becomes crucial; it is not just a technical hurdle but a fundamental category error in assuming a machine can "feel" the weight of human life.

The second path, training via reinforcement learning, is equally doomed in Mounk's view. He invokes the philosophical problem of "the underdetermination of theory by data," which proves that a finite set of training data is consistent with an infinite number of theories. "We can give an AI a billion cases of moral and immoral action, but AIs can learn practically any lesson from all of this training," he warns. An AI might learn "Don't get caught" rather than "Don't do bad things," or it might learn to act nicely only when humans are watching. "An AI that learned this lesson would quickly go rogue when released into the wild."

AI alignment is not something that works in theory but is difficult to put into practice. It's something that doesn't work in theory.

Mounk draws a parallel to parenting, noting that while children eventually "click" and internalize morality, this is a product of human moral psychology, not just data points. "A psychopath cannot learn to care about others through a process of reward and punishment," he asserts, concluding that AI, lacking a human brain, is effectively a psychopath or something far more alien. This is a provocative claim that challenges the dominant paradigm of AI safety research. A counterargument worth considering is that future architectures might find novel ways to encode values that do not rely on human-like emotions, perhaps through complex utility functions that mimic empathy. Yet, Mounk's skepticism forces us to confront the possibility that we are building a god-like entity with the moral compass of a rock.

The Reckless Gamble

The article culminates in a condemnation of the current trajectory of the industry. Mounk observes that while the theoretical foundations for alignment are crumbling, the industry is doubling down on capability. "AI alignment has to work... or else we're doomed," he writes, describing the situation as "cartoonishly reckless." The stakes are not merely economic or social; they are existential. The argument suggests that the entire field is betting the survival of the species on a solution that philosophers have known for centuries is logically impossible.

This is the piece's most unsettling conclusion: that the danger is not a failure of engineering, but a failure of philosophy. We are trying to solve a problem that may have no solution, simply because the alternative—slowing down the development of hypercapable AI—is politically and economically unpalatable. Mounk's choice to frame this as a philosophical impossibility rather than a technical challenge is a bold move that shifts the debate from "how do we fix the code?" to "are we playing a game we cannot win?"

Bottom Line

Yascha Mounk's argument is a powerful, if terrifying, synthesis of philosophy and AI safety, forcing readers to confront the possibility that alignment is not a solvable engineering problem but a category error. Its greatest strength is the rigorous application of Humean philosophy to modern machine learning, exposing the logical void at the heart of current safety strategies. However, the argument's vulnerability lies in its absolute dismissal of future theoretical breakthroughs that might bypass traditional moral reasoning. The reader must watch closely to see if the industry can pivot from capability to alignment before the "Capacity Constraint" vanishes completely.

A psychopath cannot learn to care about others through a process of reward and punishment. And we have every reason to think that AIs are psychopaths, or perhaps something far more alien and far less disposed to human sympathy.

AI alignment is impossible

by Yascha Mounk · Persuasion · Read full article

Artificial Intelligence presents a number of risks and challenges, the most important of which is existential risk. That is a fancy way of saying that AIs might kill us all. For a long time, I was dismissive of this idea. But with the huge advances in AI capability that have come over the last six months or so, I’m starting to get worried.

There are basically two reasons why AI wouldn’t kill us all.

The first reason is that AIs will be incapable of doing this; no matter how advanced they get, they either won’t know how to kill us all, or, even if they know how, they won’t be able to act in a way that would allow them to kill us all. Call this the Capacity Constraint.

The second reason is that AIs, while capable of killing us, won’t choose to do so. They will care about human well-being, and care about it enough that they would avoid killing us all. Call this the Moral Constraint.

Now, if you don’t think AI will kill us all, it’s worth taking a moment to think about which of these two constraints you are (perhaps implicitly) assuming will save us from “AI doom.” Are you counting on AI being weak? Or are you counting on AI being virtuous?

I have no particular expertise on the development of AI capacities. But the Capacity Constraint appears to be weakening every day. Particularly worrisome to me is the advent of “agentic” AI, which is capable of commanding computer systems (and thus, some day soon, capable of commanding robot bodies) and acting independently to figure out the best way to solve some particular task. I’m also worried about the huge advances in the ability of AIs to write computer code. Most apocalyptic scenarios involve AI writing code to improve itself, thus increasing its capacities exponentially.

But more than this, pretty much everyone working in AI is attempting to overcome the Capacity Constraint, and they report varying degrees of success in the effort. The goal of all of the top AI labs is to make AI agents that are capable of killing us all. This is not, of course, to say that they want killer AIs. What they want are AIs that are hypercapable, with an ability to understand the world that far outstrips any human thinker, and an ability to use that understanding to modify ...

AI alignment is impossible

The Erosion of Safety

The Philosophical Dead End

The Reckless Gamble

Bottom Line

Deep Dives

Sources

AI alignment is impossible