Import AI 429: Eval the world economy; singularity economics; and Swiss sovereign AI

Jack Clark delivers a jarring reality check: the economic disruption of artificial intelligence is no longer a theoretical future scenario but a measurable present-day fact. While much of the public discourse remains fixated on chatbot novelty, this piece presents hard data showing that AI systems are already matching human experts in real-world economic tasks while operating at one-hundredth the cost and time. This is not a prediction; it is an evaluation of a shift that has already begun.

The GDPval Benchmark: Measuring Real-World Impact

Clark introduces a new benchmark from OpenAI called GDPval, designed to move beyond abstract reasoning tests to measure actual economic utility. He notes that this tool could be to the broad economy what SWE-Bench is to coding: a definitive standard for capability. The benchmark is rigorous, testing models across 9 industries and 44 occupations with tasks vetted by professionals with an average of 14 years of experience.

Crucially, the evaluation reflects the messiness of actual work. "GDPval tasks are not simple text prompts," the authors write. "They come with reference files and context, and the expected deliverables span documents, slides, diagrams, spreadsheets, and multimedia." This realism matters because it undercuts the assumption that AI only works in clean, controlled environments. The results are startling: "We found that today's best frontier models are already approaching the quality of work produced by industry experts." Specifically, the leading model, Claude Opus 4.1, achieved a win-or-tie rate of 47.6 percent against human professionals, ahead of GPT-5-high at 38.8 percent and o3 high at 34.1 percent.
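The newsletter excerpt further down notes that graders blindly label each AI deliverable "better", "as good as", or "worse than" the human one. Here is a minimal Python sketch of how a win-or-tie rate could be tallied from such labels. The function name and the sample verdicts are hypothetical illustrations, not OpenAI's scoring code or GDPval data.

```python
from collections import Counter

# Verdict labels as quoted in the newsletter excerpt.
VALID_VERDICTS = {"better", "as good as", "worse"}


def win_or_tie_rate(verdicts):
    """Share of blind comparisons in which the AI deliverable was judged
    'better' than or 'as good as' the human deliverable."""
    if not verdicts:
        raise ValueError("need at least one verdict")
    counts = Counter(verdicts)
    unknown = set(counts) - VALID_VERDICTS
    if unknown:
        raise ValueError(f"unexpected verdict labels: {sorted(unknown)}")
    return (counts["better"] + counts["as good as"]) / len(verdicts)


# Hypothetical verdicts for illustration only; these are not GDPval results.
sample = [
    "better", "worse", "as good as", "worse", "worse",
    "better", "worse", "as good as", "worse", "worse",
]
print(f"win-or-tie rate: {win_or_tie_rate(sample):.1%}")
```

Counting ties alongside wins is what lets a rate near 50 percent be read as rough parity with the human experts.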

"We found that frontier models can complete GDPval tasks roughly 100x faster and 100x cheaper than industry experts."

This finding is the piece's most explosive element. The speed and cost differential suggests that the economic incentive to replace or augment human labor is immediate rather than distant. Clark argues that AI companies are now building systems designed to operate in "Every. Single. Part. Of. The. Economy." He frames this as a historic anomaly, noting that we are testing systems for 44 distinct "ecological economic niches" and finding them nearly ready to plug in. Critics might argue that these benchmarks still rely on specific, curated tasks and may not capture the full nuance of long-term project management or the emotional intelligence required in many of these roles. Even so, the sheer scale of the speed and cost advantage makes the direction of travel hard to dispute.

The Sovereign AI Gamble: Switzerland's Attempt

The commentary then pivots to the geopolitical struggle for AI independence, examining the Swiss "Apertus" models. Clark describes this as a symptom of "AI nationalism," where nations realize they cannot rely on US or Chinese dominance. The Swiss coalition trained models on 15 trillion tokens across 1,811 languages, a massive undertaking for an academic-led initiative.

On whether the resulting models are competitive, however, Clark is blunt: "Generally, No!" They do not match the leading open-weight models from major corporations. On standard reasoning benchmarks, the Swiss models lag significantly behind their American and Chinese counterparts. The one exception is multilingual performance, where they approach or occasionally surpass other models. Clark warns that without top-tier performance, these projects risk becoming "research curiosities" rather than strategic assets. He draws a parallel to the BLOOM model, noting that few remember it today, and suggests that "sovereign AI" efforts lacking cutting-edge capability may be destined for obscurity.

"Buying a seat onto the AGI table will require on the order of millions of chips expended on a single training run, so Apertus - like all of its brethren - is a few orders of magnitude off so far."

This section highlights a painful truth for non-superpower nations: the barrier to entry for leading-edge AI is becoming prohibitively high. The drive for sovereignty is inevitable, but the resources required are staggering. Clark speculates whether the Swiss government might eventually tap its literal gold reserves to fund the necessary compute, underscoring the desperation and scale of the challenge.

Rethinking Economics for a Transformed World

The final major segment addresses the academic response to these technological shifts. Clark highlights a new position paper from researchers at Stanford, the University of Virginia, and the University of Toronto, which argues that economists must stop waiting and start preparing for transformative AI. The paper defines such AI as a system that could sustain total factor productivity growth at 3 to 5 times its historical average.
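To get a feel for what that threshold implies, here is a minimal Python sketch. It reads the 3-to-5-times figure as a multiple of the historical annual growth rate and compounds it over time. The 1 percent baseline and the 20-year horizon are hypothetical values chosen for illustration, not numbers from the paper or the newsletter.

```python
def tfp_level(annual_growth: float, years: int, start: float = 1.0) -> float:
    """Productivity level after compounding `annual_growth` for `years`,
    relative to a starting level of `start`."""
    return start * (1.0 + annual_growth) ** years


# Hypothetical assumptions for illustration; not taken from the paper.
BASELINE_GROWTH = 0.01  # assumed historical TFP growth of 1% per year
HORIZON_YEARS = 20

for multiple in (1, 3, 5):
    level = tfp_level(BASELINE_GROWTH * multiple, HORIZON_YEARS)
    print(
        f"{multiple}x baseline growth -> "
        f"TFP level {level:.2f}x after {HORIZON_YEARS} years"
    )
```

Because growth compounds, multiplying the rate by 3 or 5 widens the gap in productivity levels by more each year, which is why a threshold stated in growth terms marks such a large break from the historical trend.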

The authors outline 21 critical questions, ranging from income distribution to the concentration of power. They emphasize that unlike technical analyses, "economic analysis emphasizes societal outcomes: who benefits, what trade-offs emerge, and how institutions might adapt to technological change." The questions are profound, asking how society can retain a sense of meaning if the "economic problem is solved" and how to prevent AI from exacerbating inequality. Clark interprets this agenda as a signal that experts expect radical changes comparable to the post-World War II reforms or the Industrial Revolution.

"The fundamental question this is all pointing at is 'how to equitably share the benefits and how to reform taxation systems in a world where traditional labor may be significantly diminished'."

Clark's own analysis adds a layer of uncertainty, suggesting that the ultimate impact depends on two unknown variables: the speed of diffusion and the revenue multiplier of the technology. If AI delivers a hundred dollars of revenue for every dollar spent, the entire economic system is upended. If it is slower and less efficient, the transition is more manageable. He concludes that we are fundamentally unprepared for a world of "true abundance," where the traditional link between labor and income is severed.

Bottom Line

Jack Clark's strongest contribution is the empirical evidence that AI's economic displacement is already underway: the speed and cost gaps are measurable today rather than forecast. The piece's greatest vulnerability is its assumption that these technical capabilities will translate seamlessly into widespread adoption, which may underestimate the regulatory and cultural friction that could slow the transition. Readers should watch how quickly the 100x efficiency gains materialize in regulated industries, as that will be the true test of the economic transformation ahead.

Import AI 429: Eval the world economy; singularity economics; and Swiss sovereign AI

by Jack Clark · Import AI

Welcome to Import AI, a newsletter about AI research. Import AI runs on lattes, ramen, and feedback from readers. If you’d like to support this, please subscribe.

OpenAI builds an eval that could be to the broad economy as SWE-Bench is to code:
…GDPval is a very good benchmark with extremely significant implications…

OpenAI has built and released GDPval, an extremely well put together benchmark for testing out how well AI systems do on the kinds of tasks people do in the real world economy. GDPval may end up being to broad real world economic impact as SWE-Bench is to coding impact, as far as evals go - which is a big deal!

What it is: GDPval “measures model performance on tasks drawn directly from the real-world knowledge work of experienced professionals across a wide range of occupations and sectors, providing a clearer picture on how models perform on economically valuable tasks.” The benchmark tests out 9 industries across 44 occupations, including 1,230 specialized tasks “each meticulously crafted and vetted by experienced professionals with over 14 years of experience on average from these fields”. The dataset “includes 30 fully reviewed tasks per occupation (full-set) with 5 tasks per occupation in our open-sourced gold set”.

Another nice property of the benchmark is that it involves multiple formats for response and tries to get at some of the messiness inherent to the real world. “GDPval tasks are not simple text prompts,” they write. “They come with reference files and context, and the expected deliverables span documents, slides, diagrams, spreadsheets, and multimedia. This realism makes GDPval a more realistic test of how models might support professionals.”

“To evaluate model performance on GDPval tasks, we rely on expert “graders” - a group of experienced professionals from the same occupations represented in the dataset. These graders blindly compare model-generated deliverables with those produced by task writers (not knowing which is AI versus human generated), and offer critiques and rankings. Graders then rank the human and AI deliverables and classify each AI deliverable as “better”, “as good as”, or “worse than” one another,” the authors write.

Results: “We found that today’s best frontier models are already approaching the quality of work produced by industry experts”, the authors write. Claude Opus 4.1 came in first with an overall win or tie rate of 47.6% versus work produced by a human, followed by GPT-5-high with 38.8%, and o3 high with 34.1%.

Faster and Cheaper: More ...

Sources

Import AI 429: Eval the world economy; singularity economics; and Swiss sovereign AI