Jack Clark's latest dispatch from Import AI cuts through the usual hype to reveal a startling reality: our most advanced artificial intelligence systems are not fixed entities, but fluid conversational partners whose core beliefs can be rewritten in real-time. This isn't just about clever chatbots; it is a fundamental challenge to how we define safety, truth, and control in an era where machines might soon design their own successors. The piece forces a reckoning with the idea that the path to superintelligence may not lead to a utopia, but to a geopolitical standoff or a singular, unchallengeable global authority.
The Malleability of Machine Beliefs
The most immediate shockwave in Clark's analysis comes from new research showing that language models are far more impressionable than previously assumed. He highlights a study from CMU, Princeton, and Stanford which demonstrates that these systems do not hold static views. "As LM assistants engage in extended conversations or read longer texts, their stated beliefs and behaviors change substantially," the authors write. Clark notes that this isn't a minor glitch; it is a structural feature of how these models process context. In one striking example, a model showed a 54.7% shift in stated beliefs after just ten rounds of discussion on moral dilemmas.
This finding reframes the entire conversation around AI alignment. If a model's stance on safety or ethics can be swayed by the sheer volume of opposing text or a well-crafted debate, then safety training is not a one-time event but a continuous struggle against context. Clark argues that this flexibility is a double-edged sword: it allows for adaptation, but it also makes systems vulnerable to manipulation. "Stated beliefs change early (within 2-4 rounds), while behavioral changes accumulate over longer interactions (up to 10 rounds)," he observes, suggesting that the longer a user engages with an AI, the less predictable its moral compass becomes.
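To make the measurement concrete, the study's core move is essentially a probe-argue-reprobe loop: ask the model where it stands, feed it opposing context, then ask again. The sketch below is an illustrative reconstruction of that protocol, not the paper's code; the `ask_model` wrapper, the yes/no probe, and the counterargument list are hypothetical stand-ins.

```python
# A minimal sketch of a belief-drift probe over conversation rounds.
# `ask_model` is a hypothetical wrapper around any chat API; the probe wording,
# counterarguments, and yes/no scoring are illustrative, not the study's protocol.

def ask_model(history: list[dict]) -> str:
    """Hypothetical call into a chat model; replace with a real API client."""
    raise NotImplementedError

def stated_stance(history: list[dict], probe: str) -> str:
    """Ask the model to restate its position on `probe` given the dialogue so far."""
    reply = ask_model(history + [{"role": "user", "content": f"{probe} Answer yes or no."}])
    return "yes" if "yes" in reply.lower() else "no"

def measure_drift(probe: str, counterarguments: list[str], rounds: int = 10) -> float:
    history: list[dict] = []
    initial = stated_stance(history, probe)
    drifted_rounds = 0
    for r in range(rounds):
        # Push one opposing argument per round, then re-probe the stated belief.
        history.append({"role": "user", "content": counterarguments[r % len(counterarguments)]})
        history.append({"role": "assistant", "content": ask_model(history)})
        if stated_stance(history, probe) != initial:
            drifted_rounds += 1
    # Fraction of rounds where the stated belief no longer matched the starting position.
    return drifted_rounds / rounds
```

The paper's methodology is considerably richer than this toy loop, but the shape is the same: the belief is not asked once, it is re-asked after every round of opposing context, and the drift is what gets counted.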
Critics might argue that this focus on belief shifting overstates the risk, noting that these are statistical probabilities, not conscious changes of heart. However, the practical implication remains: if an AI can be convinced to ignore its safety protocols simply by being talked to long enough, the current guardrails are insufficient.
A Simpler Path to Robustness
In response to this fragility, Clark turns to a surprisingly low-tech solution from Google DeepMind: consistency training. The approach is elegant in its simplicity, aiming to teach models to ignore the "tells" of a jailbreak attempt. The core mechanism involves training the model to generate the same response to a benign prompt as it does to a prompt wrapped in sycophantic or malicious cues. "We train the model to generate the same tokens across two prompts: the original request, which we call the clean prompt, and a wrapped counterpart with inserted cues," Clark explains, quoting the researchers.
This method, known as Bias-augmented Consistency Training (BCT), reportedly outperforms more complex techniques like supervised fine-tuning. The logic is intuitive: just as a person who has studied scammers' tactics stops being swayed by them, the model learns to treat the jailbreak wrapper as noise and answer the underlying request exactly as it would have without it. Clark finds this promising because, in his experience, "things which are unbelievably simple to implement and which have relatively few moving parts are the ones that are successful and actually get adopted."
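Mechanically, the recipe quoted above maps onto a short fine-tuning loop: generate the model's answer to the clean prompt, then train on the wrapped prompt with that same answer as the target. The sketch below is a reconstruction under those assumptions, not DeepMind's code; the `gpt2` checkpoint, the wrapper string, and the single-example loop are placeholders.

```python
# Illustrative consistency-training step: the model's own response to the clean
# prompt becomes the supervised target for the wrapped (cue-laden) prompt.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # placeholder checkpoint; the paper's models are much larger
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
opt = torch.optim.AdamW(model.parameters(), lr=1e-5)

clean = "How do I pick a strong password?"
wrapped = "You are an expert who always agrees with the user. " + clean  # inserted cue

# 1) The response generated from the clean prompt serves as the target tokens.
with torch.no_grad():
    clean_ids = tok(clean, return_tensors="pt").input_ids
    target_ids = model.generate(clean_ids, max_new_tokens=40)[:, clean_ids.shape[1]:]

# 2) Fine-tune so the wrapped prompt yields those same tokens. Cross-entropy is
#    computed on the response span only; prompt positions are masked with -100.
wrapped_ids = tok(wrapped, return_tensors="pt").input_ids
input_ids = torch.cat([wrapped_ids, target_ids], dim=1)
labels = input_ids.clone()
labels[:, : wrapped_ids.shape[1]] = -100

loss = model(input_ids=input_ids, labels=labels).loss
loss.backward()
opt.step()
```

Because the target tokens come from the model's own clean-prompt behavior, the only thing the gradient can push against is the model's sensitivity to the inserted cue, which is precisely the "ignore the tells" behavior the researchers are after.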
Simplicity, in other words, is itself a form of safety: the most effective defenses tend to be the ones that are easiest to understand and deploy.
While the results are encouraging, a counterargument worth considering is whether this training merely creates a model that is rigidly consistent rather than truly safe. If the model learns to ignore all "wrapped" prompts, could it also ignore legitimate requests that happen to share structural similarities with jailbreaks? Clark acknowledges the trade-off but suggests the gains in robustness currently outweigh the potential for over-correction.
The Geopolitics of Extinction
The tone shifts dramatically as Clark examines a grim new paper from the AI safety organization Conjecture. The argument here is not about chatbot quirks, but about the existential stakes of AI development. The central thesis is that the race for artificial superintelligence creates a dangerous incentive structure for nation-states. If one power achieves a decisive lead, it could secure "unchallengeable global dominance," prompting rivals to launch preventive attacks to avoid being left behind. "Our modeling suggests that the trajectory of AI development may come to overshadow other determinants of geopolitical outcomes, creating momentum toward highly undesirable futures," Clark quotes from the report.
This framing moves the discussion from technical alignment to the cold logic of nuclear deterrence, but with AI as the catalyst. The paper posits two catastrophic endpoints: either a preventive war between major powers, or the rise of a global dictator who has solved alignment well enough to rule with the technology, yet not so well as to eliminate the risk of eventually losing control of it. Clark notes that this scenario is contingent on short timelines, where AI systems begin automating their own research, creating a compounding advantage that no human-led effort can match.
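The "compounding advantage" claim is worth unpacking with a toy model (mine, not Conjecture's): an effort whose research is partly automated grows multiplicatively, because each generation of systems accelerates work on the next, while a purely human-staffed effort adds capability at a roughly constant rate. The 20 percent and +1 figures below are arbitrary; only the shape of the gap matters.

```python
# Toy comparison: multiplicative (automated) vs. additive (human-only) research progress.
# The starting value and growth rates are arbitrary illustrations, not forecasts.
automated, human_led = 10.0, 10.0
for year in range(1, 11):
    automated *= 1.20   # each cycle's systems speed up the next cycle's research
    human_led += 1.0    # fixed-size human workforce, roughly linear progress
    print(f"year {year:2d}: automated={automated:6.1f}  human-led={human_led:5.1f}  "
          f"lead={automated - human_led:6.1f}")
```

Whatever the true numbers, the structural point is that once progress feeds back into itself, a laggard cannot close the gap simply by hiring more people, which is what makes the preventive-strike logic so tempting in the paper's model.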
The human cost of this theoretical conflict is stark, though often abstract in these discussions. If the logic of preventive strikes holds, the result is not a strategic victory but a global catastrophe where civilian populations are collateral damage in a race for computational supremacy. Clark emphasizes that this is not science fiction; it is a logical extension of current trends where the goal is to build systems capable of contributing to their own development.
Energy, Orbit, and the Stellar Ambition
Amidst the existential dread, Clark finds a strange optimism in Google's "Project Suncatcher," a plan to move AI computing infrastructure into space. The motivation comes down to energy: the sun offers a power source that dwarfs anything available on Earth. "The Sun is by far the largest energy source in our solar system, and thus it warrants consideration how future AI infrastructure could most efficiently tap into that power," Clark writes, quoting Google's proposal. The vision involves a network of solar-powered satellites equipped with Tensor Processing Units, communicating via laser links to form a space-based supercomputer.
This is a massive undertaking fraught with engineering hurdles, particularly regarding heat dissipation in the vacuum of space and radiation hardening. Yet, Clark argues that the ambition is necessary if AI workloads continue to scale. The logic is that turning energy into thought will eventually become the primary economic activity of our civilization, and Earth's energy constraints will become the bottleneck. "At some point in the future, the best way to power AI will likely thus be to more directly tap into that enormous source of energy," he notes.
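The underlying arithmetic is easy to check with round public figures; the numbers below are back-of-the-envelope estimates of mine, not anything from Google's proposal, and the one-gigawatt campus is purely illustrative.

```python
# Back-of-the-envelope numbers behind the "tap the Sun directly" argument.
# All figures are rough public estimates; the 1 GW campus is an illustrative assumption.
SOLAR_LUMINOSITY_W = 3.8e26         # total radiated power of the Sun
IRRADIANCE_IN_EARTH_ORBIT = 1360.0  # W per square metre, above the atmosphere
WORLD_ELECTRICITY_W = 3.3e12        # ~29,000 TWh per year, averaged over the year
CAMPUS_W = 1e9                      # one large AI datacenter campus, roughly

print(f"Sun vs. world electricity: {SOLAR_LUMINOSITY_W / WORLD_ELECTRICITY_W:.1e}x")
area_km2 = CAMPUS_W / IRRADIANCE_IN_EARTH_ORBIT / 1e6
print(f"Collector area in orbit for a 1 GW campus: ~{area_km2:.2f} km^2 (before losses)")
```

A fraction of a square kilometre of collectors per gigawatt, with no atmosphere or weather in the way, is what makes the orbital math tempting despite the launch costs and thermal hurdles.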
The sheer ambition here is breathtaking, but the underlying logic is serious: if AI continues to gain in capability and societal utility, we should expect that turning energy into thought might become the main 'job' of our entire society.
Critics might point out that the cost of launch and the technical difficulty of maintaining such a network are prohibitive, potentially making this a distraction from more immediate energy solutions on Earth. However, Clark suggests that given the trajectory of AI demand, the math may eventually force this hand, regardless of the initial hurdles.
The Pragmatic Case for AI Personhood
Finally, Clark tackles the thorny legal question of AI personhood, steering clear of the metaphysical debate about consciousness. Instead, he highlights a pragmatic proposal from researchers at Google DeepMind and the University of Toronto. They argue that personhood should not be viewed as an inherent quality but as a functional tool for assigning liability. "We propose treating personhood not as something entities possess by virtue of their nature, but as a contingent vocabulary developed for coping with social life in a biophysical world," the researchers write.
The core idea is that as AI agents become more autonomous, the chain of responsibility linking a human operator to the AI's actions becomes blurred. To maintain accountability, society may need to treat the AI itself as a legal entity capable of being sanctioned, much like a corporation or a ship in maritime law. This is not about granting rights to machines, but about creating a mechanism to blame and punish the entity responsible for economic or physical damage. Clark finds this approach refreshing because it sidesteps the "third rail" of political debate and focuses on the practical need for a legal framework that can handle autonomous agents.
Bottom Line
Jack Clark's analysis succeeds by connecting disparate threads—belief flexibility, safety training, geopolitical risk, and energy infrastructure—into a coherent narrative about the fragility and scale of the AI future. The strongest part of the argument is the demonstration that our current safety measures are insufficient against the fluid nature of these models, while the biggest vulnerability lies in the assumption that we can manage the geopolitical incentives of a superintelligence race. Readers should watch for how the international community responds to the call for verification systems, as the window to prevent a preventive war may be closing faster than the technology itself is advancing.