
Inside the AI industry's most expensive mistake

This piece cuts through the industry's current obsession with scale to expose a fundamental architectural flaw: the belief that forcing artificial intelligence to "think" in words is a feature, not a bug. Alberto Romero argues that the trillion-dollar rush to burn tokens is not a sign of intelligence, but a costly scaffolding propping up a system that cannot yet reason without a verbal crutch.

The Token Trap

Romero opens by highlighting a bizarre cultural shift within elite engineering teams, where spending massive sums on token processing has become a status symbol. He points to Meta's internal leaderboard, "Claudeonomics," where users climb ranks from casual user through "Session Immortal" to the top tier, "Token Legend." The scale is staggering: over a single 30-day period, usage on this dashboard topped "~60 trillion tokens," roughly three times the estimated 20 trillion tokens in all books published throughout history.


This behavior isn't limited to Meta. Romero cites Nvidia CEO Jensen Huang, who said that if a $500,000-a-year engineer spent less than $250,000 a year on tokens, he would be "deeply alarmed." The industry has normalized this excess, with OpenAI even creating a "Tokens of Appreciation" program to reward high-volume users. Romero observes that engineers are now "tokenmaxxing," boasting about processing volumes that exceed their own salaries. As Huang put it, "If you are not burning through your token cap, you are like the chip designer who refuses to use CAD tools and instead designs by hand, with a pencil."

The author's framing here is sharp. He correctly identifies that this is a classic case of Goodhart's law in action: "What do you think your employees will do if you take this simple, hackable metric as proof of work? Well, hack it. Duh." The industry has incentivized inefficiency, rewarding the generation of output over the quality of thought. Critics might argue that this "brute force" approach is simply the current path of least resistance for scaling intelligence, but Romero suggests this is a temporary fix that has been mistaken for a permanent solution.

The intelligence age has mistaken the scaffolding for the cathedral.

The Pre-Linguistic Reality

The core of Romero's argument shifts from economics to cognitive science. He challenges the assumption that thinking requires language, drawing a parallel between human cognition and the potential of AI. Romero describes his own thought process not as a stream of words, but as navigating a landscape of "attraction points" and "sensations." He argues that language is merely a "lossy compression" of these richer, pre-linguistic thoughts.

To support this, Romero leans on historical and modern evidence. He references a 1945 survey by mathematician Jacques Hadamard, which found that great minds like Einstein rarely thought in words. Einstein famously stated, "The words of the language, as they are written or spoken, do not seem to play any role in my mechanism of thought." Romero connects this to 2024 research from Evelina Fedorenko's lab at MIT, which confirmed that the brain's language network does not activate during reasoning tasks. As Romero puts it, "Language is primarily a tool for communication rather than thought."

This distinction is crucial for understanding the current AI bottleneck. Romero argues that current models are forced to "think before you respond" by generating sequential tokens, a process he compares to being "forced to narrate every thought aloud before you're allowed to have the next one." This architectural constraint forces models to spell out their knowledge to access it, creating a massive inefficiency. He writes, "It makes no sense that an AI model has to spell out what it 'knows' in order to know it."

The Architectural Dead End

Romero contends that the industry is stuck because it has conflated the "scaffolding" of inference-time token generation with the actual building of intelligence. He describes the current state of AI as a choice between a "dumb wordcel" (pre-trained models that respond instantly) or a "mute savant" (post-trained models that can reason but only by generating excessive text). The current solution, he argues, is a "prosthesis" that companies are desperately trying to convince us is load-bearing architecture.

The author highlights a growing faction of researchers who are trying to break this cycle by moving away from token prediction entirely. He points to the work of Yann LeCun, who has long argued that large language models are "a dead end." Romero details LeCun's work on Joint Embedding Predictive Architecture (JEPA), which aims to predict meaning and abstract representations in "latent space" rather than predicting the next word. He notes that Meta's own FAIR lab published "Coconut" (Chain of Continuous Thought) and the "Large Concept Model," both of which attempt to reason through continuous space instead of decoding into tokens.
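The contrast Romero draws here can be made concrete with a toy sketch. The snippet below is a hypothetical illustration, not the real Coconut or JEPA architectures: it contrasts a reasoning loop that must collapse each step into a discrete token (the "verbal crutch") with one that carries the full continuous state forward and only decodes at the end. All function names and the update rules are invented for illustration.

```python
# Toy contrast between token-bottlenecked reasoning and latent-space
# reasoning. Hypothetical stand-ins only; not the real models.

def decode_to_token(state):
    # Lossy bottleneck: keep only the index of the strongest feature.
    return max(range(len(state)), key=lambda i: state[i])

def reencode_from_token(token, dim):
    # The next step sees only a one-hot reconstruction of that token;
    # every other nuance of the original state is gone.
    return [1.0 if i == token else 0.0 for i in range(dim)]

def latent_update(state):
    # A smooth update that preserves all coordinates of the state.
    return [0.9 * x + 0.1 for x in state]

def token_reasoning(state, steps):
    """Each reasoning step is forced through the token bottleneck."""
    trace = []
    for _ in range(steps):
        token = decode_to_token(state)
        trace.append(token)
        state = reencode_from_token(token, len(state))
    return state, trace

def latent_reasoning(state, steps):
    """Each step feeds the full continuous state forward; nothing is
    decoded until the final answer is actually needed."""
    for _ in range(steps):
        state = latent_update(state)
    return state

start = [0.5, 0.2, 0.1, 0.3]
tok_state, trace = token_reasoning(start, 3)
lat_state = latent_reasoning(start, 3)
print(trace)      # the verbal "scratchpad": a run of discrete tokens
print(tok_state)  # degenerate one-hot state: the detail is destroyed
print(lat_state)  # all four coordinates survive, still distinguishable
```

The point of the sketch is only the structural difference: in the token loop, every step's rich state is quantized to a single symbol before the next step can use it, which is the inefficiency Romero's "spell out what it 'knows' in order to know it" line gestures at.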

The tension between this scientific approach and the commercial reality is stark. Romero reveals that LeCun left Meta in late 2025, not just due to organizational changes, but because the company doubled down on "safe and proven" token-based approaches after the flop of Llama 4. The author suggests that the industry is prioritizing short-term profitability over the long-term viability of the technology. He writes, "Making a clean profit is what truly matters. It's just not economically feasible to overhaul an entire industry just because it's resting on top of a fallacy."

Bottom Line

Romero's most compelling contribution is the reframing of the token explosion not as a triumph of scale, but as a symptom of an architectural failure to replicate human-like, pre-linguistic reasoning. The argument's greatest vulnerability lies in the timeline; while the science of latent space reasoning is promising, the industry's reliance on token-based scaling is deeply entrenched and profitable. Readers should watch for whether the next generation of models can truly escape the "token crutch" or if the industry will continue to burn trillions to keep the scaffolding standing.

Deep Dives

Explore these related deep dives:

  • Goodhart's law

    The article explicitly invokes this principle to explain how engineers are gaming token metrics by intentionally generating inefficient loops, turning a performance indicator into a self-defeating target.

  • Latent space

    This concept provides the necessary technical counterpoint to the article's critique of 'tokenmaxxing,' illustrating that true model intelligence resides in high-dimensional vector relationships rather than the sheer volume of sequential tokens processed.

  • What Is It Like to Be a Bat?

    Thomas Nagel's thought experiment is used here to question whether the massive expenditure of tokens actually correlates with genuine understanding or merely simulates the appearance of cognition without subjective experience.

Sources

Inside the AI industry's most expensive mistake

Hey there, I’m Alberto!

Each week, I publish long-form AI analysis covering culture, philosophy, and business for The Algorithmic Bridge. Paid subscribers also get Monday how-to guides and Friday news commentary. I publish occasional extra articles and essays.

Today, I bring you a hot analysis + hot news, the best mix.

I.

Meta employees have an internal leaderboard called “Claudeonomics.” The ranks go up from casual user to “Session Immortal” to the top tier: “Token Legend.” Setting aside the cringe, it’s crazy that over a 30-day period, total usage on the dashboard topped ~60 trillion tokens (words). For context, it has been estimated that all books published throughout history amount to ~20 trillion tokens.

It’s not just Meta, though. Nvidia CEO Jensen Huang said on the All-In Podcast that if a $500k engineer spent less than $250k a year on tokens, he’d be “deeply alarmed.” When asked whether Nvidia is spending $2 billion on tokens for its engineering team, he said: “We’re trying to.” Last year, OpenAI introduced “Tokens of Appreciation,” a program to recognize developers and organizations that have processed high volumes of data through the API in three tiers: silver, black, and blue.

Token usage is, by all accounts, a new status symbol. Engineers at the top labs boast about token numbers like a bad coder would about lines of code: look how excessive I am. They are “tokenmaxxing,” as the New York Times wrote last month. “An engineer at OpenAI processed 210 billion ‘tokens,’” writes the NYT, “enough text to fill Wikipedia 33 times.”

Some engineers spend more on tokens than they earn in salary. Token budgets are being pitched as a “fourth component” of compensation, alongside salary, equity, and bonuses. Candidates are asking in interviews: how many tokens come with the job? API expenses are now competing with labor budgets.

If you are not burning through your token cap, said Huang, you are like the chip designer who refuses to use CAD tools and instead designs by hand, with a pencil. From an industrial perspective, this behavior makes sense: these are the tools we have. The more you spend, the more they think, and, presumably, the better the output you get, and thus the more benefit you create for your company.

In the “intelligence age,” the token is the unit of cognitive labor.

The ...