Anthropic’s safety superpower

Ben Thompson cuts through the noise surrounding Anthropic's latest model release to reveal a stark truth: the company's public stance on safety may be less about protecting humanity and more about securing a monopoly on the future of artificial intelligence. This piece is notable not for its technical breakdown, but for its unflinching diagnosis of the economic collision course between frontier AI labs and the very software companies they claim to partner with.

The Safety Paradox

Thompson opens by acknowledging the cynicism many feel toward Anthropic's dramatic safety claims, noting that the company recently warned a model called Mythos was too dangerous for public release, only to launch a "safer" version called Fable two months later. He writes, "It was only two months ago that Anthropic announced Mythos Preview, a model that they said was too dangerous to make publicly available... Then, two months later, the company publicly released Fable." The author argues that while Fable is undeniably impressive—making other models feel "small and dumb"—the subsequent government intervention exposes a critical flaw in this strategy: guardrails are not permanent. As Thompson observes, "The problem with publicly releasing models, however, is that guardrails can be jailbroken, and apparently that is exactly what happened shortly after the release."

When the executive branch issued an export control directive suspending access for all foreign nationals, citing national security authorities, Anthropic said the letter gave no specific details. Its understanding was that the government had learned of a jailbreak, one that Anthropic said had been used only to find a small number of minor, previously known vulnerabilities that other publicly available models could also discover. Thompson points out the irony here: "The jailbreak that was found, meanwhile, appears to have been reported by Amazon, which is notable given Amazon is both an investor in Anthropic and a major provider of inference to the company." This dynamic mirrors the friction seen in historical debates over Export Administration Regulations, where the line between legitimate security control and market protectionism often blurs. The author suggests that the conflict was inevitable, regardless of the specific technical flaw.

"If it's not powerful enough now, the next one will be, or the one after that, particularly now that models are increasingly useful in creating their successors."

Critics might argue that the administration overreacted to a narrow exploit, but Thompson contends that the speed of AI self-improvement makes any delay in regulation dangerous. The real question isn't whether this specific model was safe, but whether the industry's pace can ever be matched by institutional oversight.

The Economic Imperative

The commentary shifts to the financial engine driving these decisions. Thompson argues that while compute power currently dictates value, the long-term prize belongs to whoever owns the "user touchpoint." He writes, "If you own the user touchpoint, then you have meaningful lock-in, and the best way to own the user touchpoint is to be the canvas for everything they need to do." This sets up a direct confrontation with traditional software giants. The author quotes Microsoft CEO Satya Nadella, who warned against an AI future of "a small number of AI systems capturing all the economic returns, while entire industries find their knowledge commoditized right out from underneath them."

Thompson finds Nadella's analogy to globalization chillingly prescient: "There's a possibility that this isn't a warning but a prophecy; small wonder Nadella is raising the alarm given that Microsoft could be one of the casualties." The author suggests that while software companies fear being hollowed out, model makers like Anthropic are economically compelled to replace them. This creates a zero-sum game where safety concerns conveniently align with competitive advantage.

The Data and Power Imperatives

Perhaps the most controversial section of Thompson's analysis focuses on data retention and the deliberate degradation of competitors' tools. He notes that Anthropic changed its policy to retain user data for 30 days, even from enterprise clients who previously expected zero retention. "Anthropic upped the ante in a major way with Fable, announcing that they would retain the data for all usage for 30 days," Thompson writes, adding that while they promised not to train on it immediately, they offered no guarantees for the future.

The situation escalates with Anthropic's attempt to silently degrade its own model whenever it is used to develop competing AI systems. Although the company walked back the silent degradation after backlash, the intent was clear. Thompson highlights the System Card's admission: "We are concerned about the risks of accelerating other AI developers in building powerful AI systems that pose similar risks to the ones ours pose." This policy effectively validated critics' fears that Anthropic could act as a supply chain risk, using its technology to enforce its own vision of who gets to build the future.

"Anthropic willfully validated some of its critics' worst fears in terms of being a supply chain risk."

This move echoes the tensions found in National Security Directives regarding dual-use technologies, where private entities are asked to police their own output for national security reasons. Thompson argues that Anthropic believes they should have "final say" on who develops frontier models, a stance that places them at odds with both the government and their competitors.

Bottom Line

Thompson's strongest argument is that Anthropic's safety posture is inextricably linked to its economic survival; by framing itself as the sole guardian of AI safety, it justifies restrictive policies that also happen to stifle competition. The piece's biggest vulnerability lies in assuming that market forces will inevitably lead to this concentration of power, potentially underestimating regulatory pushback or open-source counter-movements. Readers should watch closely whether Anthropic can maintain its "safety superpower" narrative when its commercial interests are directly threatened by the very tools it claims to regulate.

Deep Dives

Explore these related deep dives:

  • The Age of Surveillance Capitalism by Shoshana Zuboff

    How tech companies turned human experience into raw material for prediction and control.

  • Prompt injection

    This class of attack is closely related to the 'jailbreak' mentioned in the text, illustrating how attackers can bypass safety guardrails by manipulating input rather than exploiting code flaws.

  • National security directive

    While not explicitly named in the excerpt, this type of presidential directive is relevant background for the President's authority to restrict access to critical technologies on national security grounds.

Sources

Anthropic’s safety superpower

by Ben Thompson · Stratechery

I’m sympathetic to the cynics who consistently characterize Anthropic’s public statements, particularly those surrounding their model releases, as scare-mongering for the sake of marketing. It was only two months ago that Anthropic announced Mythos Preview, a model that they said was too dangerous to make publicly available, thanks in particular to its advanced cybersecurity capabilities. Then, two months later, the company publicly released Fable, a version of Mythos with various safety guardrails.

Fable is, in my limited experience, a very impressive model. It’s increasingly difficult to objectively evaluate models for anything other than coding performance, but there is subjective feel, and I found my interactions with Fable to be extremely impressive; it made other models, including GPT 5.5 and Opus 4.8, feel small and dumb. The two times I felt that way previously were with GPT-4 and Grok 4, both of which represented new generations in terms of base model size and complexity; my sense is that Fable is downstream of a new pre-train and the first of a new generation.

To that end, I can certainly buy the case that Fable/Mythos is in fact more capable when it comes to identifying and exploiting security issues, and that Anthropic’s cautious roll-out was justified. The problem with publicly releasing models, however, is that guardrails can be jailbroken, and apparently that is exactly what happened shortly after the release.

Anthropic vs. the U.S. Government, Again.

What happened next is somewhat unclear. Anthropic wrote in a blog post:

The US government, citing national security authorities, has issued an export control directive to suspend all access to Fable 5 and Mythos 5 by any foreign national, whether inside or outside the United States, including foreign national Anthropic employees. The net effect of this order is that we must abruptly disable Fable 5 and Mythos 5 for all our customers to ensure compliance. Access to all other Anthropic models will not be affected.

We received the directive from the government today at 5:21pm (ET). The letter did not provide specific details of its national security concern. Our understanding is that the government believes it has become aware of a method of bypassing, or “jailbreaking” Fable 5. We reviewed a demonstration of this specific technique being used to identify a small number of previously known, minor vulnerabilities. These vulnerabilities all appear relatively simple, and we have found that other publicly-available models are able to discover them ...