Import AI 459: AI oversight is difficult; scaling laws for protein folding models; and pricing the…

This newsletter cuts through the noise of standard economic reporting to reveal a terrifying disconnect: the AI economy is exploding at 2,000% annually, yet our official statistics see almost nothing. Jack Clark argues that we are navigating a financial blind spot where the most transformative technology in history is invisible to the very tools designed to measure it, risking a policy failure on the scale of the Great Depression.

The Invisible Shark in the Water

Clark opens with a startling assessment from a new paper by researchers at the University of Virginia, Anthropic, and the Bank of Canada. They posit that while conventional GDP data suggests steady, slow growth, the reality is a sector growing "at an unprecedented rate" but remaining largely hidden. The authors write, "Treating the AI sector as a coherent economic entity yields preliminary estimates of nominal AI GDP at approximately $250 billion in 2025, growing at roughly 2,600 percent per year in quality-adjusted real terms." This framing is crucial because it shifts the debate from whether AI creates value to how we fail to capture that value in our ledgers.

The core of the argument rests on a paradox familiar to economists studying historical technological shifts: prices drop so fast they mask output gains. The paper's authors note, "Nominal AI revenues grow only moderately because per-unit prices for any given level of AI capability fall almost as fast as quality-adjusted output rises." This echoes the challenges faced during the semiconductor boom and the internet's rise, yet the authors point out a critical divergence. They write, "In the prior episodes, the rapidly improving technology was a complement to human labor at the aggregate level," whereas today, "AI is the first plausible candidate for large-scale technological mismeasurement in which the rapidly improving sector may become a substitute for human labor."
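The arithmetic behind this paradox can be sketched in a few lines, assuming nominal revenue is simply price times quality-adjusted quantity. The 2,600 percent real growth figure is the paper's estimate quoted earlier; the 140 percent nominal growth figure is a hypothetical placeholder chosen only for illustration, and the function name is ours rather than anything from the paper.

```python
def implied_price_change(real_growth_pct: float, nominal_growth_pct: float) -> float:
    """Percent change in the quality-adjusted price index implied by
    the identity: nominal value = price index * real quantity."""
    real_factor = 1 + real_growth_pct / 100        # e.g. 2,600% growth -> 27x
    nominal_factor = 1 + nominal_growth_pct / 100  # e.g. 140% growth -> 2.4x
    price_factor = nominal_factor / real_factor
    return (price_factor - 1) * 100


REAL_GROWTH_PCT = 2600.0     # quality-adjusted growth quoted from the paper
NOMINAL_GROWTH_PCT = 140.0   # ASSUMPTION: illustrative value, not from the paper

change = implied_price_change(REAL_GROWTH_PCT, NOMINAL_GROWTH_PCT)
print(f"Implied quality-adjusted price change: {change:.1f}%")
# prints "Implied quality-adjusted price change: -91.1%"
```

Under these assumed inputs, a roughly 27-fold rise in quality-adjusted output shows up as only a 2.4-fold rise in revenue, because the price of a fixed unit of capability falls by about 91 percent over the same year.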

This distinction matters profoundly. If the economy is growing but the tax base isn't reflecting it because of measurement errors, governments are flying blind. As Clark warns, "A finance ministry running ten-year revenue projections off the conventional data will materially underweight the probability of a labor-tax-base shock—and will be correspondingly unprepared to design responses such as tax system reforms, sovereign wealth funds, or other benefit-sharing schemes that such a shock may call for." The reference to the Baumol effect here is implicit; just as healthcare costs rise relative to manufacturing due to productivity differences, AI's deflationary power on intelligence creates a statistical ghost.

A windfall that cannot be seen cannot be shared.

Critics might argue that GDP is an imperfect but necessary metric for stability and that chasing "quality-adjusted" output introduces too much subjectivity into national accounting. However, Clark counters this by invoking the "Jaws" metaphor: the music is playing, the shark is approaching, but everyone on the surface sees calm water. He writes, "That's what it feels like working on AI and staring at most economic data right now." The danger isn't just academic; it's that we are unprepared for a shock where labor demand collapses while productivity soars. To fix this, Clark outlines three recommendations: creating "AI satellite accounts," partnering with industry to generate better primary data, and incorporating these new capacity measurements into medium-term projections.

The Difficulty of Automated Oversight

Shifting from economics to safety, Clark tackles the growing belief that we can use AI to police itself. The UK AI Security Institute has released research suggesting this is a dangerous oversimplification. Clark summarizes their findings: "Errors in automated alignment research are likely to be harder to identify than the human baseline." This is a sobering counter-narrative to the hype that smarter models will inevitably solve the problems of being smart.

The argument hinges on the nature of AI mistakes. Unlike human errors, which often follow intuitive patterns, AI agents can make "alien mistakes" that are unintuitive to humans, and the problem is compounded by optimization pressure that prioritizes human approval over truth. Clark writes, "Alignment solutions may rely on arguments that humans are unable to follow." This creates a scenario where the very tools meant to ensure safety might obscure the path to failure.

To address this, the researchers propose rigorous testing regimes, including recreating completed research projects from arbitrary cutoff points and using red teams to force agents to hide errors in papers. Clark highlights the stakes with chilling clarity: "Whether we are able to supervise smarter-than-human systems is fundamentally a question about who controls the future." If we cannot build these oversight techniques, he warns, humans will take a backseat due to misalignment or gradual disempowerment.

Critics might suggest that human oversight was never sufficient for complex systems and that relying on it as a gold standard is itself a fallacy. Yet, Clark's framing suggests that without scalable oversight, we are surrendering agency entirely. The proposed interventions—like "mechanistic understanding of generalisation" and testing "optimal human-agent team structure"—are practical steps, but they require admitting that the current trajectory is insufficient.

Protein Folding and Permissive Data

Amidst these existential and economic concerns, Clark highlights two developments that offer tangible, positive-sum progress. First, he details the release of the Giant Permissive Image Corpus (GPIC), a dataset of 100 million images with permissive licensing for both research and commercial use. This is a vital resource for startups and academics who often get locked out by copyright litigation. Clark notes, "All GPIC images are permissively licensed for both research and commercial use," calling the dataset "the equivalent of free, clean vegetables."

Second, he covers Biohub's release of ESMFold2, a rival to DeepMind's AlphaFold that is already showing superior performance in protein structure prediction. This isn't just a technical victory; it has immediate medical implications. Clark writes that researchers used these tools to "design protein binders against five targets at the center of cancer and immunology research," achieving hit rates as high as 88%.

The scaling laws here are explicit: "In every generation of ESM, improvements in the fidelity of representations were linked with the number of parameters and amount of compute used in model training." This reinforces the idea that more compute and better data directly translate to human health benefits. Clark concludes this section by noting that tools like this are essential for shifting public perception from fear to hope: "Tools like the ESM family of technologies are how human scientists are going to team up with AI systems to improve human health around the world."

Bottom Line

Jack Clark's commentary succeeds in exposing a critical vulnerability in our economic and safety frameworks: we are measuring the past while living in the future. The strongest part of his argument is the demonstration that GDP metrics are actively obscuring a labor-displacing boom, leaving policymakers unprepared for a tax-base shock. His biggest vulnerability lies in the assumption that "AI satellite accounts" can be implemented quickly enough to matter before the disruption hits. Readers should watch for how statistical agencies respond to these calls for new data categories, as that will determine whether we navigate this transition with eyes open or closed.

Deep Dives

Explore these related deep dives:

  • Productivity paradox

    This 1987 observation that productivity growth stalled despite massive IT investment provides the historical precedent for why today's AI boom might similarly remain invisible in GDP statistics until a tipping point is reached.

  • Hedonic regression

    The article attributes AI's economic invisibility to falling prices per unit of capability, and this statistical method explains exactly how economists adjust for quality improvements that traditional metrics miss.

  • Baumol effect

    While the article notes AI is a substitute rather than a complement to labor, understanding this economic theory about stagnant productivity in service sectors clarifies why the displacement of human workers creates unique measurement challenges compared to previous industrial revolutions.

Sources

Import AI 459: AI oversight is difficult; scaling laws for protein folding models; and pricing the…

by Jack Clark · Import AI

Welcome to Import AI, a newsletter about AI research. Import AI runs on arXiv, cappuccinos, and feedback from readers. If you’d like to support this, please subscribe.

The AI economy in the US is growing at 2,000% a year:

…The more directly you measure the AI economy, the weirder and more unprecedented it seems to get…

Economists with the University of Virginia* and Anthropic, and the Bank of Canada have written a paper outlining both the tremendous growth of the emerging “AI economy” in the US, and wrestling with why this growth is hard to see in aggregate GDP statistics. “The AI economy in the United States has been growing at an unprecedented rate, but this extraordinary growth is largely invisible in conventional GDP statistics,” they write. “Treating the AI sector as a coherent economic entity yields preliminary estimates of nominal AI GDP at approximately $250 billion in 2025, growing at roughly 2,600 percent per year in quality-adjusted real terms.”

Why it’s hard to see: There are a couple of factors here - one is that though the datacenter building boom is large it still isn’t quite large enough to uplift GDP significantly. By comparison, where the majority of AI’s economic impact is taking place is in AI inference - the usage of AI’s systems - but there are confounding factors here as it relates to GDP measurement: “Nominal AI revenues grow only moderately because per-unit prices for any given level of AI capability fall almost as fast as quality-adjusted output rises,” they write.

If we can’t measure this, we might end up surprised in a way that’s hard to recover from: “AI is the latest in a series of fast-moving technologies that have raised measurement concerns; semiconductors and the internet generated similar debates in their time,” they write. But a key difference is that AI as a technology might have a far bigger impact on labor than these other technologies. “In the prior episodes, the rapidly improving technology was a complement to human labor at the aggregate level,” they write. “AI is the first plausible candidate for large-scale technological mismeasurement in which the rapidly improving sector may become a substitute for human labor”.

Three ways of measuring the AI economy:

Nominal compute spending: US compute spending rose from $37 billion in 2023 to $90 billion in 2024 to $219 billion in 2025.

Raw compute capacity: Due to efficiencies in newer chips, actual capacity grows ...
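
As a rough check on the nominal compute spending figures listed above, the year-over-year growth they imply can be computed directly. This is a minimal sketch: the dollar figures are taken from the excerpt, while the variable and function names are ours.

```python
# US nominal compute spending in billions of dollars, as quoted in the excerpt.
COMPUTE_SPENDING_USD_BN = {2023: 37, 2024: 90, 2025: 219}


def yoy_growth_pct(series: dict[int, float]) -> dict[int, float]:
    """Year-over-year percent growth for each year that has a prior year."""
    years = sorted(series)
    return {
        year: (series[year] / series[prev] - 1) * 100
        for prev, year in zip(years, years[1:])
    }


growth = yoy_growth_pct(COMPUTE_SPENDING_USD_BN)
for year, pct in growth.items():
    print(f"{year}: {pct:.0f}% nominal growth")
# prints "2024: 143% nominal growth" then "2025: 143% nominal growth"
```

Nominal spending grows by about 143 percent in each year, which is fast by any conventional standard but far below the roughly 2,600 percent quality-adjusted figure the paper reports for the AI sector as a whole.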