← Back to Library

The Great GPU Shortage – Rental Capacity – Launching Our H100 1-Year Rental Price Index

Dylan Patel doesn't just report on a shortage; he reveals a market that has fundamentally broken its own pricing logic. While most observers expected compute costs to plummet as new hardware arrived, Patel documents a scenario where prices are soaring, availability has vanished, and the very act of securing a server feels less like a business transaction and more like a desperate scramble. This is the definitive read for anyone trying to understand why the AI boom has hit a physical wall, backed by proprietary data that exposes the gap between public listings and the brutal reality of closed-door negotiations.

The Great Reversal

The core of Patel's argument dismantles the prevailing wisdom of late 2025. For months, the industry assumed that the arrival of next-generation Blackwell chips would render older Hopper architectures obsolete, driving rental rates down. Patel writes, "Only six months ago, most market observers were skeptical on GPU terminal value and assumed an inexorably steep fall in GPU rental rates over time." He then flips the script, showing that the opposite occurred: demand for the older H100 chips not only held firm but strengthened as the ecosystem pivoted toward inference and agentic workflows.


This shift wasn't driven by a single application but by a structural change in how AI is consumed. Patel notes that "the rapid adoption of open-weight models and accelerating inference demand at that time was the first sign of the insatiable wave of compute demand coming to market." The evidence is stark: H100 rental prices jumped nearly 40% in just five months, moving from $1.70 per hour to $2.35. The market has become so tight that providers are locking up capacity for years, with some H100 contracts being renewed for four-year terms extending into 2028. This suggests that the supply chain cannot keep pace with the velocity of adoption, creating a bottleneck that affects everything from memory chips to the gas turbines powering data centers.
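The quoted price move is easy to sanity-check. A minimal sketch, using only the two figures from the article ($1.70/hr to $2.35/hr over five months); the annualized pace is my own extrapolation, not a SemiAnalysis figure:

```python
# Check the quoted H100 rental move: $1.70/hr to $2.35/hr over five months.
start, end = 1.70, 2.35

pct_change = (end - start) / start * 100            # total move over the period
annualized = ((end / start) ** (12 / 5) - 1) * 100  # same pace compounded over 12 months

print(f"5-month change: {pct_change:.1f}%")   # ~38.2%, i.e. "nearly 40%"
print(f"annualized pace: {annualized:.1f}%")  # well over 100% if the pace held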

Trying to find GPU compute in early 2026 has been like trying to book airplane tickets on the last flight out: high prices and almost no availability.

The Anatomy of Scarcity

Patel's coverage excels in detailing the chaotic mechanics of this shortage. He moves beyond abstract numbers to describe a market where "customers are fighting to pay $14/hr/GPU for p6-b200 spot instances in AWS" and where providers are refusing to release on-demand instances back into the pool despite price hikes. The analogy he draws is visceral: "Trying to rent a cluster is actually like trying to buy drugs." This is not hyperbole; it reflects a market where liquidity has dried up and access is determined by relationships and prepayment rather than standard commercial terms.

The driver of this frenzy is the rise of multi-agent workloads. Patel observes that "multi-agent workloads executing multi-step workflows, operating at high concurrency and iterating continuously, leading to parabolic growth in token and compute consumption." He points to tools like Claude Code as a prime example, noting that his own firm has seen AI consume billions of tokens in a single week. The implication is clear: the return on investment for these tools is so high (estimated at 5-10x) that companies are willing to pay almost any price for the compute required to run them. As Patel puts it, "if the return on investment from using AI tools is 5-10x, then there is clearly a long way to go in GPU rental pricing before prices rise enough to curtail demand."
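Patel's inelasticity argument can be made concrete with a back-of-envelope break-even calculation. The sketch below is illustrative, assuming value per GPU-hour stays fixed as prices rise; only the $2.35/hr rate and the 5-10x ROI range come from the article:

```python
# Back-of-envelope: if AI tools return 5-10x their compute cost, how far could
# rental prices rise before demand rationally curtails? Illustrative only.
current_price = 2.35  # $/hr for an H100, per the article

for roi in (5, 10):
    # Break-even: the rental price at which compute cost equals the value
    # it generates, holding value per GPU-hour constant.
    ceiling = current_price * roi
    print(f"ROI {roi}x -> break-even rental price ~${ceiling:.2f}/hr")
```

On these assumptions, prices could rise several-fold from today's levels before the economics of agentic workloads stop clearing, which is the sense in which Patel says there is "a long way to go" in rental pricing.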

Critics might argue that such inelastic demand is unsustainable and that high prices will eventually force a market correction or a shift to more efficient models. However, the speed at which capacity is being absorbed—every cluster coming online until August 2026 is already booked—suggests that any correction is far off. The supply side is also constrained by an "AI Server Pricing Apocalypse," where rising memory costs have forced original equipment manufacturers (OEMs) to hike server prices, causing some operators to delay or abandon deployments, further tightening the rental market.

A New Market Structure

The most significant insight Patel offers is the transformation of the rental market itself. The dynamic has shifted from a buyer's market to a seller's market where providers dictate terms. "Neoclouds and Hyperscalers are now in the driver's seat – they can now negotiate for more favorable terms such as higher prepay, better pricing, longer contract lengths and can even pick and choose the contract start and end dates," he writes. This is a departure from the competitive pricing environment of the past, where operators were desperate to fill their racks.

To capture this reality, Patel's team has launched a new index based on direct survey data and transaction validation, rather than relying on public spot prices which often lag behind actual market movements. He explains that "most of the GPU rental market is transacted on a long-term basis with contracts of at least 6mths and longer," and these negotiated prices are rarely visible to the public. By releasing the H100 1-year contract price index, Patel provides a rare window into the true cost of intelligence, stripping away the noise of posted rates to reveal the underlying scarcity.
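The mechanics of such an index can be sketched simply. The version below is a hypothetical illustration, not the actual SemiAnalysis methodology: it assumes each survey response reports a negotiated $/GPU-hr rate and the GPU count the contract covers, and all figures are invented:

```python
# Minimal sketch of a survey-based contract price index. Field names and
# figures are hypothetical; the real methodology is not public in this detail.
surveys = [
    {"rate": 2.20, "gpus": 4096},  # example 1-year H100 contract responses
    {"rate": 2.45, "gpus": 1024},
    {"rate": 2.30, "gpus": 2048},
]

# GPU-weighted average, so large clusters move the index more than small ones.
total_gpus = sum(s["gpus"] for s in surveys)
index = sum(s["rate"] * s["gpus"] for s in surveys) / total_gpus
print(f"index value: ${index:.2f}/GPU-hr")
```

Weighting by contracted GPUs rather than averaging posted rates is what lets an index like this track the large negotiated deals that public spot listings miss.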

The debate on the true return of using AI is now a settled question – the use of AI tools can deliver value an order of magnitude greater than the cost of using the tools.

Bottom Line

Patel's analysis is a crucial correction to the narrative that AI infrastructure is becoming cheaper and more abundant; the data proves we are entering an era of extreme scarcity where compute is the primary constraint on innovation. The strongest part of his argument is the evidence of demand inelasticity driven by high-return agentic workflows, which suggests prices will continue to rise until supply catches up. The biggest vulnerability in this outlook is the potential for a sudden technological breakthrough in efficiency or a regulatory shift that could abruptly cool demand, but for now, the market is locked in a high-stakes race for silicon.

Sources

The Great GPU Shortage – Rental Capacity – Launching Our H100 1-Year Rental Price Index

by Dylan Patel · SemiAnalysis
