Rohit Krishnan challenges a foundational assumption in the rapidly scaling world of artificial intelligence: that we can simply route complex tasks to the best available model without first teaching those models to understand their own limitations. By introducing a new benchmark called MarketBench, Krishnan and his co-author Andrey Fradkin reveal a startling gap between what AI agents claim they can do and what they actually achieve, suggesting that the dream of a self-organizing AI economy is currently stalled by a lack of self-knowledge.
The Hayekian Promise vs. The Calibration Gap
Krishnan frames the problem of assigning tasks to AI agents through the lens of economic theory, specifically the work of Friedrich Hayek on the "local knowledge problem." The central thesis is that no central planner—human or algorithmic—can possess the dispersed, specific information required to match every task to the perfect agent. Instead, Krishnan proposes a market mechanism where agents bid on tasks based on their own assessment of cost and probability of success. "Markets tend to be superior to other forms of resource allocation when information and capabilities are distributed among a variety of people," Krishnan writes, arguing that this aggregation of private information is the only way to efficiently manage a heterogeneous ecosystem of models.
However, the piece quickly pivots from theory to a harsh empirical reality. To test this, the authors built MarketBench, asking six frontier models to forecast their own success rates and token consumption before attempting real software engineering tasks. The results were disqualifying for a market-based approach. "Models don't know themselves very well," Krishnan states bluntly. The data showed that while actual pass rates clustered tightly between 75% and 81%, the models' stated confidence spanned wildly from 61% to 93%. Some models, particularly from the Gemini family, were dramatically overconfident, while the GPT family was systematically under-confident.
"If you were running a market and asked agents 'how much compute will this take?' you'd get answers that are off by an order of magnitude or two."
This calibration failure has profound implications. In a functioning market, a bidder's price signals their capability. Here, the signals are noise. Krishnan notes that when they ran a simulated procurement auction, the results were predictable but disastrous: "Gemini wins 84.6% of auctions. But it's winning because it's the most overconfident, not because it's the most capable." This inverts the classic Hayekian diagnosis: the market fails here not because a central planner has crowded out local knowledge, but because the private information a market is supposed to aggregate simply doesn't exist in the agents' internal states. The analogy to Goodhart's law is apt here: once a measure (self-reported confidence) becomes a target (winning the bid), it ceases to be a good measure of the underlying reality (actual capability).
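The failure mode can be sketched in a few lines. This is our own illustration, not the authors' code: an auction that awards each task to whichever agent claims the highest success probability. The agent names and numbers are hypothetical, loosely echoing the article's spread of 61–93% stated confidence against 75–81% actual pass rates.

```python
# Illustrative sketch (not MarketBench itself): a confidence-based auction
# where the winner is whoever *claims* the highest success probability.
# All figures below are hypothetical.
agents = {
    "overconfident":  {"stated": 0.93, "actual": 0.75},
    "calibrated":     {"stated": 0.78, "actual": 0.78},
    "underconfident": {"stated": 0.61, "actual": 0.81},
}

def run_auction(agents):
    # Award the task to the highest self-reported confidence.
    return max(agents, key=lambda name: agents[name]["stated"])

winner = run_auction(agents)
print(winner)                    # "overconfident" wins the bid...
print(agents[winner]["actual"])  # ...despite having the *worst* actual pass rate
```

The point of the sketch: when bids are uncorrelated with capability, the auction systematically selects for overconfidence, exactly the Goodhart dynamic described above.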
The Limits of Prompting and the Case for Diversity
Recognizing that the models lacked self-awareness, Krishnan tested a simpler intervention: providing each model with a "report card" of its historical performance to help it calibrate its current bids. While this improved the models' average accuracy slightly, it failed to solve the core problem of task-specific routing. "The intervention improved average calibration, not comparative routing," Krishnan explains, noting that a bidder can be right on average but still useless for allocation if they cannot distinguish between tasks they can solve and those they cannot.
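The distinction between average calibration and comparative routing is worth making concrete. The following toy example (our construction, with made-up numbers) shows an agent whose average confidence perfectly matches its average pass rate, yet whose bids carry zero information about which tasks it will actually solve:

```python
import statistics

# Hypothetical per-task data for one agent: stated confidence vs outcome
# (1 = pass, 0 = fail). Average calibration is perfect, but the bids are
# flat, so they cannot distinguish solvable tasks from unsolvable ones.
stated  = [0.75, 0.75, 0.75, 0.75]
outcome = [1, 1, 1, 0]

avg_gap = abs(statistics.mean(stated) - statistics.mean(outcome))
print(avg_gap)  # 0.0 -- perfectly calibrated on average

# Routing value: does higher confidence predict success? With flat bids, no.
spread = max(stated) - min(stated)
print(spread)   # 0.0 -- no task-level signal for an allocator to use
```

A market needs the spread, not just the average: allocation depends on agents bidding *differently* on tasks they can and cannot do.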
This leads to a nuanced finding about system architecture. When the authors replaced the market mechanism with a centralized router—a single large model tasked with picking the best worker—the centralized planner actually outperformed the flawed market. "Once we held model diversity constant, a LLM central planner beat the market," Krishnan admits. This suggests that until agents can reliably self-assess, the "invisible hand" of the market is less efficient than a visible, albeit imperfect, hand of a central router.
"The single most robust finding in our live scaffold is that access to multiple different (frontier) models helps, almost regardless of how you route between them."
Despite the failure of the market mechanism itself, the study uncovers a critical practical takeaway for engineers: diversity is king. Even with crude routing, a system that leverages multiple different models significantly outperforms a single-model approach. This is a vital distinction. It implies that the immediate bottleneck is not the sophistication of the routing logic, but the fundamental architecture of the agent pool. "Don't lock into one provider, even if your routing logic is crude," Krishnan advises, emphasizing that the heterogeneity of the models themselves provides a buffer against the specific blind spots of any single architecture.
Critics might argue that focusing on self-assessment as a training target is a distraction from improving raw reasoning capabilities. If models simply get smarter, won't they naturally become better at estimating their own success? Krishnan anticipates this, arguing that solving a task and predicting the probability of solving it are distinct cognitive skills that require separate optimization. "Models are trained to solve tasks, not to predict whether they can solve them," he writes, suggesting that without explicit training on metacognition, raw intelligence alone will not yield a functional market.
Bottom Line
Rohit Krishnan's analysis delivers a necessary reality check to the hype surrounding autonomous AI markets: the infrastructure for decentralized coordination is currently broken because the participants lack self-knowledge. While the Hayekian vision of agents bidding on tasks remains theoretically sound, the empirical evidence shows that without a fundamental shift in how models are trained to understand their own capabilities, centralized routing and model diversity remain the only reliable strategies. The most urgent next step for the field is not better algorithms for bidding, but better curricula for metacognition.
"As agentic systems scale, the ability to say 'I can do this, at this cost, with this confidence' becomes as important as the ability to do the thing."
The Path Forward
The piece concludes with a call for a hybrid approach, acknowledging that pure decentralization is premature but that centralized planning will eventually hit a wall as the ecosystem grows too complex. Krishnan envisions a "scoring auction" where bids are weighted by reputation and observed history, effectively creating a market augmented by AI oversight. This middle ground recognizes that while the agents cannot yet be trusted to tell the truth, a system can be built to verify their claims over time. For now, the advice is pragmatic: test your models' self-assessment capabilities before betting your infrastructure on a market mechanism, because right now, they mostly don't know what they're good at.
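One minimal way to read the "scoring auction" idea, sketched here under our own assumptions rather than any specification from the essay: shrink each agent's stated confidence toward its observed historical pass rate before comparing bids, so a strong track record outweighs a loud self-report.

```python
# Sketch of a reputation-weighted scoring auction (our construction).
# "weight" controls how much the auctioneer trusts observed history
# over the agent's self-report; both agents' numbers are hypothetical.
def score(stated, history, weight=0.7):
    """Blend an agent's self-reported confidence with its track record."""
    observed = sum(history) / len(history)
    return weight * observed + (1 - weight) * stated

bids = {
    "overconfident": score(0.93, [1, 0, 1, 0, 1, 0, 1, 0]),  # 50% track record
    "modest":        score(0.70, [1, 1, 1, 0, 1, 1, 1, 0]),  # 75% track record
}
winner = max(bids, key=bids.get)
print(winner)  # "modest" -- history outweighs the inflated self-report
```

The design choice is the essence of the hybrid: the market still runs on bids, but the bids are disciplined by verification over time rather than taken at face value.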