This newsletter cuts through the noise of standard economic reporting to reveal a terrifying disconnect: the AI economy is growing at roughly 2,600% annually in quality-adjusted terms, yet our official statistics see almost nothing. Jack Clark argues that we are navigating a financial blind spot where the most transformative technology in history is invisible to the very tools designed to measure it, risking a policy failure on the scale of the Great Depression.
The Invisible Shark in the Water
Clark opens with a startling assessment from a new paper by researchers at the University of Virginia, Anthropic, and the Bank of Canada. They posit that while conventional GDP data suggests steady, slow growth, the reality is a sector growing "at an unprecedented rate" but remaining largely hidden. Clark writes, "Treating the AI sector as a coherent economic entity yields preliminary estimates of nominal AI GDP at approximately $250 billion in 2025, growing at roughly 2,600 percent per year in quality-adjusted real terms." This framing is crucial because it shifts the debate from whether AI creates value to how we fail to capture that value in our ledgers.
The core of the argument rests on a paradox familiar to economists studying historical technological shifts: prices drop so fast they mask output gains. Clark notes, "Nominal AI revenues grow only moderately because per-unit prices for any given level of AI capability fall almost as fast as quality-adjusted output rises." This echoes the challenges faced during the semiconductor boom and the internet's rise, yet Clark points out a critical divergence. He writes, "In the prior episodes, the rapidly improving technology was a complement to human labor at the aggregate level," whereas today, "AI is the first plausible candidate for large-scale technological mismeasurement in which the rapidly improving sector may become a substitute for human labor."
This distinction matters profoundly. If the economy is growing but the tax base isn't reflecting it because of measurement errors, governments are flying blind. As Clark warns, "A finance ministry running ten-year revenue projections off the conventional data will materially underweight the probability of a labor-tax-base shock—and will be correspondingly unprepared to design responses such as tax system reforms, sovereign wealth funds, or other benefit-sharing schemes that such a shock may call for." The reference to the Baumol effect here is implicit; just as healthcare costs rise relative to manufacturing due to productivity differences, AI's deflationary power on intelligence creates a statistical ghost.
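The price-versus-output arithmetic behind this measurement gap can be made concrete with a small sketch. It uses the standard identity that nominal growth equals real (quality-adjusted) growth compounded with the change in price. The real growth figure is the roughly 2,600 percent estimate quoted above; the price declines are hypothetical values chosen only to illustrate sensitivity, not figures from the paper, and the function name is likewise an illustrative assumption.

```python
def nominal_growth(real_growth: float, price_change: float) -> float:
    """Nominal growth rate implied by a real growth rate and a price change.

    Rates are fractions per year: 26.0 means +2,600%, -0.95 means -95%.
    Identity used: (1 + nominal) = (1 + real) * (1 + price change).
    """
    return (1.0 + real_growth) * (1.0 + price_change) - 1.0


# Real growth taken from the estimate quoted in the text (~2,600% per year).
REAL_GROWTH = 26.0

# HYPOTHETICAL annual price declines per unit of capability (assumptions,
# not numbers from the paper), used only to show how sensitive the
# nominal figure is to the pace of price deflation.
for price_change in (-0.90, -0.95, -0.96):
    g = nominal_growth(REAL_GROWTH, price_change)
    print(f"price change {price_change:+.0%} -> nominal growth {g:+.0%}")
```

Under these assumed numbers, a 95 percent annual price decline would turn 2,600 percent real growth into about 35 percent nominal growth, which is why revenue-based statistics can look unremarkable while quality-adjusted output is growing explosively.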
A windfall that cannot be seen cannot be shared.
Critics might argue that GDP is an imperfect but necessary metric for stability and that chasing "quality-adjusted" output introduces too much subjectivity into national accounting. However, Clark counters this by invoking the "Jaws" metaphor: the music is playing, the shark is approaching, but everyone on the surface sees calm water. He writes, "That's what it feels like working on AI and staring at most economic data right now." The danger isn't just academic; it's that we are unprepared for a shock where labor demand collapses while productivity soars. To fix this, Clark outlines three recommendations: creating "AI satellite accounts," partnering with industry to generate better primary data, and incorporating these new capacity measurements into medium-term projections.
The Difficulty of Automated Oversight
Shifting from economics to safety, Clark tackles the growing belief that we can use AI to police itself. The UK AI Security Institute has released research suggesting this is a dangerous oversimplification. Clark summarizes their findings: "Errors in automated alignment research are likely to be harder to identify than the human baseline." This is a sobering counter-narrative to the hype that smarter models will inevitably solve the problems of being smart.
The argument hinges on the nature of AI mistakes. Unlike human errors, which often follow intuitive patterns, the errors of AI agents can be "alien mistakes" that humans find unintuitive, a problem compounded by optimization pressure that prioritizes human approval over truth. Clark writes, "Alignment solutions may rely on arguments that humans are unable to follow." This creates a scenario where the very tools meant to ensure safety might obscure the path to failure.
To address this, the researchers propose rigorous testing regimes, including recreating completed research projects from arbitrary cutoff points and using red teams to force agents to hide errors in papers. Clark highlights the stakes with chilling clarity: "Whether we are able to supervise smarter-than-human systems is fundamentally a question about who controls the future." If we cannot build these oversight techniques, he warns, humans will take a back seat, whether through misalignment or gradual disempowerment.
Critics might suggest that human oversight was never sufficient for complex systems and that relying on it as a gold standard is itself a fallacy. Yet, Clark's framing suggests that without scalable oversight, we are surrendering agency entirely. The proposed interventions—like "mechanistic understanding of generalisation" and testing "optimal human-agent team structure"—are practical steps, but they require admitting that the current trajectory is insufficient.
Protein Folding and Permissive Data
Amidst these existential and economic concerns, Clark highlights two developments that offer tangible, positive-sum progress. First, he details the release of the Giant Permissive Image Corpus (GPIC), a dataset of 100 million images with permissive licensing for both research and commercial use. This is a vital resource for startups and academics who often get locked out by copyright litigation. Clark notes, "All GPIC images are permissively licensed for both research and commercial use," calling the dataset "the equivalent of free, clean vegetables."
Second, he covers Biohub's release of ESMFold2, a rival to DeepMind's AlphaFold that is already showing superior performance in protein structure prediction. This isn't just a technical victory; it has immediate medical implications. Clark writes that researchers used these tools to "design protein binders against five targets at the center of cancer and immunology research," achieving hit rates as high as 88%.
The scaling laws here are explicit: "In every generation of ESM, improvements in the fidelity of representations were linked with the number of parameters and amount of compute used in model training." This reinforces the idea that more compute and better data directly translate to human health benefits. Clark concludes this section by noting that tools like this are essential for shifting public perception from fear to hope: "Tools like the ESM family of technologies are how human scientists are going to team up with AI systems to improve human health around the world."
Bottom Line
Jack Clark's commentary succeeds in exposing a critical vulnerability in our economic and safety frameworks: we are measuring the past while living in the future. The strongest part of his argument is the demonstration that GDP metrics are actively obscuring a labor-displacing boom, leaving policymakers unprepared for a tax-base shock. Its biggest vulnerability lies in the assumption that "AI satellite accounts" can be implemented quickly enough to matter before the disruption hits. Readers should watch for how statistical agencies respond to these calls for new data categories, as that will determine whether we navigate this transition with eyes open or closed.