Amazon’s AI Resurgence: AWS & Anthropic's Multi-Gigawatt Trainium Expansion

Dylan Patel flips the script on the prevailing narrative that Amazon is losing the artificial intelligence race. While Wall Street punishes the company for lagging behind Microsoft and Google, Patel argues this is a temporary misreading of a massive, multi-gigawatt infrastructure pivot that is only just beginning to bear fruit.

The Anchor Customer Strategy

Patel identifies a specific catalyst that the broader market has overlooked: the deepening integration between Amazon Web Services and Anthropic. He writes, "Amazon's savior has a name: Anthropic," positioning the startup not just as a client, but as the architect of Amazon's next growth phase. The argument rests on the idea that the cloud wars are no longer about who has the most generic GPU capacity, but who can secure the "anchor customers" willing to bet billions on custom infrastructure.

The evidence is in the construction speed. Patel notes that "AWS is building datacenters faster than it ever has in its entire history," driven by a commitment to house nearly a million of Amazon's custom Trainium chips for Anthropic's training needs. This is a bold departure from the standard playbook. Critics might note that betting on a custom chip that currently lags behind Nvidia's best hardware is a massive risk, especially when the alternative is simply buying more Nvidia GPUs. However, Patel contends that for Anthropic's specific roadmap, the trade-off is calculated.

"Dario Amodei's startup was heavily involved in the design process, and its influence on the Trainium roadmap only grows from here."

This level of collaboration is rare. Patel suggests this partnership effectively makes Anthropic a co-designer of Amazon's silicon, creating a moat that generic cloud providers cannot easily replicate. The financial stakes are staggering, with Amazon having invested over $5 billion in Anthropic, a move that now looks less like a venture bet and more like a strategic acquisition of a future revenue stream.

The Trainium Gamble

The core of Patel's thesis challenges the assumption that Nvidia is unbeatable. He admits the raw specifications favor the competition, stating, "Trainium2 lags Nvidia's systems in many ways." On paper, Nvidia's chips offer significantly higher floating-point performance and memory bandwidth. Yet, Patel argues that raw speed is the wrong metric for this specific use case.

He pivots to Total Cost of Ownership (TCO), a more nuanced measure of efficiency. "Trainium2 is highly competitive on a TCO per million Tokens and TCO per TB/s of memory bandwidth," he writes. This is the crux of the argument: Anthropic's workloads are memory-bound, not just compute-bound. By optimizing for memory bandwidth per dollar rather than peak theoretical performance, the custom chips become viable, even superior, for the specific task of Reinforcement Learning.
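The TCO logic above can be made concrete with a back-of-the-envelope calculation. The sketch below uses entirely hypothetical prices and bandwidth figures (none appear in the article) to show why a chip that loses on peak specs can still win on cost per unit of memory bandwidth, the metric Patel argues matters for memory-bound workloads:

```python
# Illustrative TCO comparison. All numbers are hypothetical placeholders,
# not figures from the article: the point is the shape of the argument,
# not the specific values.

def tco_per_tbps(hourly_cost_usd: float, mem_bandwidth_tbps: float) -> float:
    """Hourly cost per TB/s of memory bandwidth (lower is better)."""
    return hourly_cost_usd / mem_bandwidth_tbps

# Hypothetical chips: one faster but pricier, one slower but cheaper.
fast_gpu = {"hourly_cost": 6.00, "bandwidth_tbps": 8.0}
custom_asic = {"hourly_cost": 2.50, "bandwidth_tbps": 4.0}

gpu_score = tco_per_tbps(fast_gpu["hourly_cost"], fast_gpu["bandwidth_tbps"])
asic_score = tco_per_tbps(custom_asic["hourly_cost"], custom_asic["bandwidth_tbps"])

print(f"GPU:  ${gpu_score:.3f}/hr per TB/s")   # 6.00 / 8.0 = 0.750
print(f"ASIC: ${asic_score:.3f}/hr per TB/s")  # 2.50 / 4.0 = 0.625
# Despite delivering half the raw bandwidth, the cheaper chip costs less
# per TB/s, so a memory-bound job can be cheaper to run on it overall.
```

The same framing applies to "TCO per million tokens": divide cost by useful output rather than comparing peak theoretical FLOPS.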

"This will enable Anthropic to be, alongside Google DeepMind, the only AI labs benefiting from tight hardware–software co-design in the near horizon."

This framing is compelling because it moves the conversation away from a simple spec-sheet comparison to a systems-level analysis. It suggests that the future of AI infrastructure isn't just about buying the fastest chips, but about designing a stack where the hardware and the model are built for each other. Patel points out that Amazon's custom networking fabric, EFA, has historically been a weakness compared to Nvidia's InfiniBand, but he notes that the latest iterations are closing the gap, specifically tailored for these large-scale clusters.

The Timeline for Resurgence

Patel is careful not to promise immediate returns. He acknowledges that the massive datacenters currently under construction are not yet generating significant revenue. "While these datacenters look built from the skies, we don't think they are generating any meaningful revenue yet," he admits, citing yield issues in the early assembly of the new chips.

However, the forecast is aggressive. He predicts that by the end of 2025, these facilities will "meaningfully contribute to AWS' top line and jack up growth above the 20% YoY threshold." This timeline aligns with Anthropic's own scaling plans, which involve a $13 billion funding round to fuel further expansion. The argument is that the market is pricing in the past year's stagnation, failing to account for the infrastructure that will come online in the next twelve months.

"Given the sheer scale, we can't understate how bold Anthropic's bet is. Not only are they committing to spending tens of billions of dollars, they're doing it on a largely unproven chip!"

The risk here is real. If the custom chips fail to deliver the promised efficiency, or if Anthropic's growth slows, the entire thesis could unravel. Yet, the depth of the partnership suggests that both parties are too invested to let it fail. The provisioning of cloud resources is shifting from a commodity market to a bespoke engineering challenge, and Amazon is positioning itself to lead that shift.

Bottom Line

Patel's strongest move is reframing Amazon's "underperformance" not as a failure of strategy, but as a lag in execution for a massive, custom-built infrastructure play that is only now coming online. The argument's biggest vulnerability is the reliance on the success of a single, unproven chip architecture and the continued dominance of one specific partner. Investors should watch whether the Trainium chips can actually deliver the cost efficiencies promised when they hit full scale later this year.

Sources

Amazon’s AI Resurgence: AWS & Anthropic's Multi-Gigawatt Trainium Expansion

by Dylan Patel · SemiAnalysis

Two-and-a-half years ago, we flagged a looming “cloud crisis” at AWS. Today, the evidence has mounted. AWS is the crown jewel of the Amazon empire, generating ~60% of group profits, and dominating the lucrative Cloud Computing market. But it struggles to translate this strength into the new GPU/XPU Cloud era.

Microsoft Azure now leads the market on quarterly new cloud revenue, and the gap between Google Cloud and AWS has materially narrowed, especially with Google's big moves on the TPU that we've been posting about for over a month. Markets have noticed. Year-to-date, Amazon is the clear laggard among the four tech-and-AI titans, as investors mark down the company most for losing momentum in AI.

Today, SemiAnalysis is back with another out-of-consensus call. While the market overplays the Cloud Crisis theme, we call for an AWS AI Resurgence. We laid out our thesis a month ago to our Core Research subscribers, forecasting an upcoming acceleration beyond 20% year-over-year growth by the end of 2025.

Amazon’s savior has a name: Anthropic. The startup has been the clear outperformer in the GenAI market in 2025, multiplying revenue fivefold year-to-date to reach $5B annualized.

To keep that trajectory, Anthropic is betting hard on Scaling Laws. While Dario’s startup draws fewer headlines than OpenAI, xAI and Meta Superintelligence, it isn’t shy about investment. AWS has well over a gigawatt of datacenter capacity in final stages of construction for its anchor customer. AWS is building datacenters faster than it ever has in its entire history. And there’s much more on the horizon.

To understand and forecast GPU/XPU power capacity by AI Lab broken down by Cloud Provider, we rely on our proprietary Datacenter Industry Model powered by real-time satellite imagery. Trusted by all hyperscalers, AI labs, and the world’s largest investors, it provides a quarterly building-by-building datacenter forecast for OpenAI, Anthropic, xAI, Meta Superintelligence, Google DeepMind, and more. Contact us for more information.

Trainium vs GPUs.

While Amazon’s AI datacenters are impressive in scale and speed, the design of the individual buildings is unremarkable. Hyper-optimized for air cooling, this blueprint is identical to that of traditional AWS Cloud datacenters from five years ago.

What makes these facilities unique is their inside: they’ll host the world’s largest cluster of non-Nvidia AI chips, with just under a million Trainium2 in the largest campus. To understand everything about the Trainium2 system, read our December 2024 technical deep dive.

Trainium2 lags Nvidia’s systems in many ways, ...