Gergely Orosz dismantles the comforting illusion that buying a single enterprise license solves the AI coding dilemma. In a landscape where the default answer eighteen months ago was simply "buy GitHub Copilot," Orosz reveals a chaotic, fragmented reality where developer trust has replaced executive mandates as the primary driver of adoption. This is not a story about which model is fastest; it is a forensic look at how organizations of vastly different sizes are navigating a market where the tools are evolving faster than their procurement policies can handle.
The Speed of Trust vs. The Weight of Bureaucracy
Orosz draws a sharp line between the agility of small teams and the paralysis of large enterprises. For startups with fewer than sixty engineers, the selection process is almost entirely organic. The author notes that at a seed-stage logistics startup, the head of engineering adopted a "high-trust" model: "We agreed to try new tools for 2 weeks and see how everyone felt. We didn't use any hard-and-fast measurement. TLDR: I trust our devs and their opinion is a big part of this." This approach allows tools like CodeRabbit to "stick" within days if the team likes them, while inferior options are discarded just as quickly.
This fluidity stands in stark contrast to the mid-sized and large companies Orosz interviewed, where the decision-making machinery grinds to a halt under the weight of security reviews and budgetary scrutiny. At a cloud infrastructure firm with 900 employees, the principal engineer describes a painful tug-of-war: "We started with Copilot because it was easy to procure... Then switching to Cursor took forever. Pricing keeps shifting." The author highlights a critical friction point: executives are often unwilling to absorb the cost jump from a $40 monthly subscription to the $150 required for more advanced agents, even when developers are demonstrably more productive with the latter.
"The Copilot → Cursor → Claude Code migration path is well trodden, and nobody has cracked productivity measurement yet."
Critics might argue that Orosz underestimates the necessity of security gates in regulated industries, but the evidence he presents suggests these gates often become barriers to innovation rather than safeguards. The author illustrates this with a cautionary tale of an EU-based software company that declared itself "AI-first" in the summer of 2025 only to find itself gridlocked. Because leadership had not budgeted for alternatives to the default Microsoft offering, and because legal teams were paralyzed by the EU's AI Act, developers were left "stuck" with outdated models while the market moved on. As one engineer there lamented, "The pace of new models and tools in the second half of 2025 left the leadership team completely unprepared."
The Measurement Paradox
Perhaps the most damning insight in Orosz's deep dive is the universal failure to measure the actual value of AI tools. Every company, from the five-person startup to the publicly listed fintech, struggles to prove ROI. The industry's default metric—lines of code generated—is widely distrusted by engineers because it incentivizes volume over quality and ignores the most valuable use cases, such as debugging or research. Orosz writes, "Using the 'lines of code written by AI' metric creates bad incentives, a sentiment shared by AI enthusiasts and skeptics alike at the company."
The author contrasts this with the rigorous, albeit imperfect, frameworks some companies are building. Wealthsimple, for instance, ran a two-month selection process backed by usage data from Jellyfish, while WeTravel constructed a structured scoring system across five dimensions to evaluate code review tools. Yet, even with these efforts, the consensus remains that no single metric works. At a large fintech that ran a comparative test of Copilot, Claude, and Cursor across 50 pull requests, the results were nuanced: "Cursor reviews the most precise, Claude the most balanced, and Copilot the most quality-focused." This specificity is lost when companies rely on blunt, one-size-fits-all KPIs.
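A structured scoring system of the kind WeTravel built can be sketched as a simple weighted rubric. The dimension names, weights, and scores below are illustrative assumptions for this sketch only; the article says the company scored code review tools across five dimensions but does not name them.

```python
"""Hypothetical sketch of a five-dimension tool-scoring rubric.

All dimension names, weights, and scores are illustrative assumptions,
not WeTravel's actual criteria.
"""
from dataclasses import dataclass


@dataclass
class ToolScore:
    name: str
    scores: dict[str, float]  # each dimension scored 1-5 by reviewers


# Illustrative dimensions; weights sum to 1.0 so totals stay on the 1-5 scale.
WEIGHTS = {
    "review_precision": 0.30,    # does it flag real issues?
    "signal_to_noise": 0.25,     # fewer spurious comments scores higher
    "latency": 0.15,             # time from PR open to first review
    "integration_effort": 0.15,  # setup cost against the existing workflow
    "cost_per_seat": 0.15,       # cheaper scores higher
}


def weighted_score(tool: ToolScore) -> float:
    """Combine per-dimension scores into one weighted total."""
    return sum(WEIGHTS[dim] * tool.scores[dim] for dim in WEIGHTS)


candidates = [
    ToolScore("tool_a", {"review_precision": 4, "signal_to_noise": 3,
                         "latency": 5, "integration_effort": 4,
                         "cost_per_seat": 2}),
    ToolScore("tool_b", {"review_precision": 5, "signal_to_noise": 4,
                         "latency": 3, "integration_effort": 3,
                         "cost_per_seat": 3}),
]

ranked = sorted(candidates, key=weighted_score, reverse=True)
for tool in ranked:
    print(f"{tool.name}: {weighted_score(tool):.2f}")
```

The point of a rubric like this is not the arithmetic but the forced conversation about weights: a team that values signal-to-noise over raw precision will rank the same candidates differently, which is exactly the "no single metric works" consensus the article describes.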
"There's no single vendor that's universally rated by every team in all contexts."
This finding challenges the vendor marketing narrative that a single "best" tool exists for everyone. Orosz points out that a tool beloved by one team can be loathed by another, depending on their specific workflow and codebase. The author notes that at a Series D observability company, non-engineers like product managers were using Claude Code more than median engineers to handle customer bug reports, a use case that would never show up in a standard developer productivity dashboard. This highlights a blind spot in how organizations currently evaluate their tech stacks.
The Vendor Lock-in Trap
A recurring theme in Orosz's reporting is the fear of vendor lock-in, particularly among larger firms. A staff engineer at a public travel company with 1,500 employees explicitly stated, "Our main concern is avoiding vendor lock-in with a single solution." This caution is understandable given the rapid pace of change; what is standard today may be obsolete in six months. However, the author suggests that this fear often leads to inaction. The EU-based company mentioned earlier found itself unable to approve new tools for six months, leading to a "vicious cycle where the tool feels underwhelming, which suppresses adoption and makes it harder to justify further investment."
Orosz also touches on the human element of this transition. At one Series A startup, a staff engineer noted that code reviews had become a headache: code quantity increased while quality dipped, until the adoption of newer models like Opus 4.5. The solution wasn't just a new tool but a cultural shift, supported by "Agents.md" and "Claude.md" files that served as a single source of truth for coding style. This suggests that the technology is only half the battle; the other half is the documentation and shared context that allows teams to leverage these tools effectively.
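An Agents.md file of the kind described is typically just a markdown document checked into the repository root, which coding agents read for project conventions. The contents below are an illustrative sketch; the article does not show the startup's actual file, and every rule, command, and directory name here is a hypothetical example.

```markdown
# AGENTS.md — shared context for coding agents
<!-- Illustrative sketch only; rules and paths are hypothetical. -->

## Code style
- TypeScript strict mode; no `any` without a justifying comment.
- Prefer small, pure functions; colocate tests next to source files.

## Review expectations
- Every PR needs a test covering the changed behavior.
- Run `npm run lint && npm test` locally before opening a PR.

## Project layout
- `src/api/` — HTTP handlers
- `src/core/` — business logic (no I/O here)
```

Because both humans and agents read the same file, it doubles as the "single source of truth" the article credits with restoring review quality: style disputes get settled once, in writing, rather than relitigated in every AI-generated pull request.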
Bottom Line
Orosz's most compelling argument is that the era of the "silver bullet" AI tool is over; the future belongs to organizations that can build flexible, trust-based evaluation processes rather than relying on executive mandates or vanity metrics. The piece's greatest vulnerability is its reliance on anonymous sources for the majority of its data, which, while necessary for candidness, limits the ability to verify the specific outcomes of these tooling shifts. Readers should watch for how the industry develops standardized, non-invasive metrics that can actually capture the value of AI without distorting developer behavior.