This piece cuts through the industry's current obsession with scale to expose a fundamental architectural flaw: the belief that forcing artificial intelligence to "think" in words is a feature, not a bug. Alberto Romero argues that the trillion-dollar rush to burn tokens is not a sign of intelligence but costly scaffolding, propping up systems that cannot yet reason without a verbal crutch.
The Token Trap
Romero opens by highlighting a bizarre cultural shift within elite engineering teams, where spending massive sums of money on token consumption has become a status symbol. He points to Meta's internal leaderboard, "Claudeonomics," where users are ranked from "Session Immortal" to "Token Legend." The scale is staggering; Romero notes that over a single 30-day period, usage on this dashboard topped "~60 trillion tokens," a figure that dwarfs the estimated 20 trillion tokens comprising all books published throughout history.
This behavior isn't limited to Meta. Romero cites Nvidia CEO Jensen Huang, who said that if an engineer spent less than $250,000 a year on tokens, he would be "deeply alarmed." The industry has normalized this excess, with OpenAI even creating a "Tokens of Appreciation" program to reward high-volume users. Romero observes that engineers are now "tokenmaxxing," boasting about token bills that exceed their own salaries. He writes, "If you are not burning through your token cap, you are like the chip designer who refuses to use CAD tools and instead designs by hand, with a pencil."
The author's framing here is sharp. He correctly identifies that this is a classic case of Goodhart's law in action: "What do you think your employees will do if you take this simple, hackable metric as proof of work? Well, hack it. Duh." The industry has incentivized inefficiency, rewarding the generation of output over the quality of thought. Critics might argue that this "brute force" approach is simply the current path of least resistance for scaling intelligence, but Romero suggests this is a temporary fix that has been mistaken for a permanent solution.
The intelligence age has mistaken the scaffolding for the cathedral.
The Pre-Linguistic Reality
The core of Romero's argument shifts from economics to cognitive science. He challenges the assumption that thinking requires language, drawing a parallel between human cognition and the potential of AI. Romero describes his own thought process not as a stream of words, but as navigating a landscape of "attraction points" and "sensations." He argues that language is merely a "lossy compression" of these richer, pre-linguistic thoughts.
To support this, Romero leans on historical and modern evidence. He references a 1945 survey by mathematician Jacques Hadamard, which found that great minds like Einstein rarely thought in words. Einstein famously stated, "The words of the language, as they are written or spoken, do not seem to play any role in my mechanism of thought." Romero connects this to 2024 research from Evelina Fedorenko's lab at MIT, which found that the brain's language network does not activate during reasoning tasks. As Romero puts it, "Language is primarily a tool for communication rather than thought."
This distinction is crucial for understanding the current AI bottleneck. Romero argues that current models are forced to "think before you respond" by generating sequential tokens, a process he compares to being "forced to narrate every thought aloud before you're allowed to have the next one." This architectural constraint forces models to spell out their knowledge to access it, creating a massive inefficiency. He writes, "It makes no sense that an AI model has to spell out what it 'knows' in order to know it."
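The serialization Romero describes can be made concrete with a toy sketch. This is not any real model's code; `mock_forward`, the token names, and the step counts are all illustrative. The point is structural: in an autoregressive model, every intermediate reasoning step must be decoded into a discrete token before the next step can happen, and each emitted token costs a full forward pass.

```python
# Toy illustration of token-serialized "reasoning": the only memory the
# model carries between steps is the text it has already emitted, so every
# thought must be verbalized (one forward pass each) before the next one.

def mock_forward(tokens):
    """Stand-in for a transformer forward pass: returns the next 'token'."""
    return f"step{len(tokens)}"

def reason_in_tokens(prompt_tokens, n_steps):
    tokens = list(prompt_tokens)
    forward_passes = 0
    for _ in range(n_steps):
        nxt = mock_forward(tokens)  # one full pass per emitted token
        tokens.append(nxt)          # the thought exists only once spelled out
        forward_passes += 1
    return tokens, forward_passes

tokens, cost = reason_in_tokens(["think", ":"], 5)
# Five reasoning steps require five decoded tokens and five forward passes;
# none of the intermediate state survives except as text.
```

Nothing here depends on model size: the cost scales linearly with the number of verbalized steps, which is exactly the inefficiency Romero is pointing at.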
The Architectural Dead End
Romero contends that the industry is stuck because it has conflated the "scaffolding" of inference-time token generation with the actual building of intelligence. He describes the current state of AI as a choice between a "dumb wordcel" (pre-trained models that respond instantly) or a "mute savant" (post-trained models that can reason but only by generating excessive text). The current solution, he argues, is a "prosthesis" that companies are desperately trying to convince us is load-bearing architecture.
The author highlights a growing faction of researchers who are trying to break this cycle by moving away from token prediction entirely. He points to the work of Yann LeCun, who has long argued that large language models are "a dead end." Romero details LeCun's work on Joint Embedding Predictive Architecture (JEPA), which aims to predict meaning and abstract representations in "latent space" rather than predicting the next word. He notes that Meta's own FAIR lab published "Coconut" (Chain of Continuous Thought) and the "Large Concept Model," both of which attempt to reason through continuous space instead of decoding into tokens.
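The contrast between these two camps can be sketched in a few lines. This is a hedged toy, not Meta's Coconut or JEPA code; the recurrence matrix `W`, the four-word vocabulary, and the `decode` projection are all invented for illustration. The token-based loop squashes each hidden state to the nearest discrete symbol before continuing; the continuous loop feeds the full-precision state straight back, which is the core idea of reasoning in latent space.

```python
# Toy contrast: tokenized vs. continuous "thought" loops. All names and
# dimensions here are illustrative assumptions, not any published model.
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((8, 8)) * 0.1    # stand-in for the model's recurrence

def step(h):
    """One latent reasoning step: full-precision vector in, vector out."""
    return np.tanh(W @ h)

def decode(h, vocab):
    """Lossy projection onto the nearest 'token' embedding (argmax-style)."""
    return max(range(len(vocab)), key=lambda i: float(vocab[i] @ h))

vocab = rng.standard_normal((4, 8))      # four toy token embeddings
h0 = rng.standard_normal(8)

# Token-based loop: each step is collapsed to one of 4 symbols, re-embedded.
h_tok = h0
for _ in range(3):
    h_tok = step(vocab[decode(h_tok, vocab)])

# Continuous loop: the entire hidden state survives between steps.
h_cont = h0
for _ in range(3):
    h_cont = step(h_cont)
```

The tokenized path throws away everything in `h` except which of four symbols it is closest to, which is the "lossy compression" Romero attributes to language itself; the continuous path is what Coconut-style architectures try to preserve.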
The tension between this scientific approach and the commercial reality is stark. Romero reveals that LeCun left Meta in late 2025, not just due to organizational changes, but because the company doubled down on "safe and proven" token-based approaches after the flop of Llama 4. The author suggests that the industry is prioritizing short-term profitability over the long-term viability of the technology. He writes, "Making a clean profit is what truly matters. It's just not economically feasible to overhaul an entire industry just because it's resting on top of a fallacy."
Bottom Line
Romero's most compelling contribution is the reframing of the token explosion not as a triumph of scale, but as a symptom of an architectural failure to replicate human-like, pre-linguistic reasoning. The argument's greatest vulnerability lies in the timeline; while the science of latent space reasoning is promising, the industry's reliance on token-based scaling is deeply entrenched and profitable. Readers should watch for whether the next generation of models can truly escape the "token crutch" or if the industry will continue to burn trillions to keep the scaffolding standing.