
Nvidia, AMD, Amkor, Arista @ UBS Tech Conference

The most striking revelation from the UBS technology conference isn't a new chip launch, but a quiet admission: the AI hardware boom is still in its "additive" phase, with no significant replacement of older infrastructure yet. Chipstrat captures a pivotal moment where the industry's bottleneck has shifted from silicon availability to power constraints, forcing a re-evaluation of how hyperscalers manage their data centers. This analysis cuts through the hype of "next-gen" announcements to ask a more pragmatic question: when will the economics of power force the retirement of yesterday's compute?

The Additive Era and the Power Wall

Chipstrat reports that Colette Kress, Nvidia's chief financial officer, confirmed a counterintuitive reality: "It's true that most of the installed base still stays there." Rather than discarding older graphics processing units (GPUs) for the latest Blackwell architecture, companies are layering new hardware on top of existing fleets. The piece argues that this is logical; older chips remain highly effective for specific tasks like fine-tuning, data labeling, and synthetic data generation. "R&D teams everywhere can absorb essentially unlimited amounts of old GPU compute," the article notes, pointing out that depreciated hardware is too valuable to scrap when it can still crank out tokens.


However, this "additive" model faces a hard physical limit. The commentary highlights a crucial insight from Amazon's leadership regarding capacity. When asked about constraints, Andrew Jassy noted, "On the capacity side... maybe the bottleneck is power." Chipstrat uses this to pivot the conversation from chip scarcity to energy reallocation. The central thesis emerges: a replacement cycle may not be driven by the obsolescence of older chips, but by the need to free up power budgets for newer, more efficient units. "Could we see a scenario where older GPUs get unplugged to free up the power for the latest generation chips that produce more tokens per Watt?" the piece asks.

If power is the ultimate constraint, the most efficient way to scale isn't just building more data centers, but ruthlessly optimizing the mix of hardware inside them.
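The power-reallocation argument can be made concrete with a back-of-envelope calculation. The sketch below compares tokens delivered per watt for an older versus a newer accelerator under a fixed power budget; all throughput and wattage figures are illustrative assumptions, not vendor specifications.

```python
# Hypothetical sketch: when does it pay to unplug an old GPU to free power
# for a newer one? All numbers below are illustrative assumptions.

def tokens_per_watt(tokens_per_sec: float, watts: float) -> float:
    """Serving efficiency: tokens generated per second per watt drawn."""
    return tokens_per_sec / watts

# Assumed fleet entries: (name, tokens/sec per unit, power draw in watts)
old_gpu = ("prior-gen", 1_000.0, 700.0)
new_gpu = ("latest-gen", 4_000.0, 1_000.0)

def tokens_freed_by_swap(old, new, power_budget_w: float) -> float:
    """Tokens/sec gained by retiring old units and spending the same
    power budget on new units (fractional units allowed for simplicity)."""
    old_tps = power_budget_w / old[2] * old[1]
    new_tps = power_budget_w / new[2] * new[1]
    return new_tps - old_tps

budget = 7_000.0  # watts freed by unplugging ten prior-gen cards
print(tokens_freed_by_swap(old_gpu, new_gpu, budget))  # → 18000.0
```

Under these assumed numbers, the same 7 kW produces 2.8x the tokens on the newer parts, which is exactly the tokens-per-watt arithmetic that could eventually trigger a retirement decision.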

Critics might note that this assumes a uniform efficiency gain across all workloads, ignoring that some legacy tasks simply cannot be migrated to newer architectures without significant software re-engineering. Yet, the logic holds for the massive scale of training clusters where every watt counts.

The Memory Ceiling and the Lifecycle Question

While power is a hard limit, memory capacity is the soft limit that will eventually force obsolescence. Chipstrat draws a compelling parallel to consumer electronics: "Think of this like older smartphones that eventually fail to run modern apps because the memory budget no longer matches the software expectations." As models demand larger context windows, older GPUs with smaller High Bandwidth Memory (HBM) footprints will struggle to hold working datasets, forcing data offloading that kills throughput.
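The context-window pressure on HBM can be sketched with a standard KV-cache size estimate. The model shape below (a 70B-class network with grouped-query attention) and the resulting figure are illustrative assumptions, not a measurement of any specific deployment.

```python
# Back-of-envelope sketch of why long context pressures HBM capacity.
# Model dimensions are assumed for illustration (70B-class, GQA, FP16).

def kv_cache_gib(layers: int, kv_heads: int, head_dim: int,
                 context_len: int, batch: int, bytes_per_elem: int = 2) -> float:
    """KV-cache size in GiB: two tensors (K and V) per layer, per token."""
    elems = 2 * layers * kv_heads * head_dim * context_len * batch
    return elems * bytes_per_elem / 2**30

# Assumed shape: 80 layers, 8 KV heads, head_dim 128, FP16 elements.
need = kv_cache_gib(layers=80, kv_heads=8, head_dim=128,
                    context_len=128_000, batch=8)
print(round(need, 1))  # → 312.5 GiB for the cache alone
```

Even under these modest assumptions, the cache alone dwarfs the HBM of a single older accelerator, forcing either many more cards per request or the offloading that kills throughput.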

This section connects deeply to the historical context of High Bandwidth Memory. Just as the industry moved from GDDR to HBM to solve the memory wall in previous generations, the current trajectory suggests that without sufficient on-chip memory, even the fastest compute cores become idle. The piece suggests that while older GPUs can handle "decode-only workloads," their utility will narrow as software expectations outpace hardware capabilities.

The commentary also raises a fascinating, unanswered question about the fate of retired hardware. "If Google isn't using all of the oldest ones… what happened to them?" Chipstrat wonders. Given the history of TPU iterations, where v1 and v2 units were eventually repurposed or decommissioned, the industry faces a growing challenge of hardware recycling. The article suggests a potential "industry to academia recycling" model, similar to how nanoelectronics fabrication tools were once donated to universities, though no major program currently exists for AI compute.

The Flexibility Debate: GPUs vs. Custom Accelerators

The most contentious part of the coverage involves the competition between general-purpose GPUs and custom "hyperscaler XPUs" like Google's Tensor Processing Units (TPUs) and Amazon's Trainium. Lisa Su, CEO of AMD, framed the debate by stating that while custom chips are "purpose-built," they lack the "same programmability" and "model flexibility" of GPUs. Chipstrat reports that Su believes GPUs will remain the "significant majority of the market" for the next five years because developers need the freedom to innovate on algorithms they cannot yet predict.

The piece offers a sharp critique of the "fixed function ASIC" label often applied to these custom chips. Chipstrat argues that calling Amazon's Trainium a "fixed function ASIC" is a misnomer; it is a "programmable accelerator" with architectural choices tuned specifically for generative AI. "Trainium3 is not 'fixed function'; it is a programmable accelerator with architectural choices tuned for GenAI," the article clarifies, noting that AWS has doubled down on open-sourcing its software stack to broaden adoption.

The real distinction isn't between programmable and fixed; it's between a merchant silicon vendor designing for peak performance across all use cases and a cloud provider optimizing for their own specific time-to-market and total cost of ownership.

The commentary suggests that while custom chips offer efficiency today, they carry a hidden risk: a shorter productive lifespan. "XPUs make deliberate precision choices that optimize for today's workloads but narrow future flexibility," Chipstrat observes. If future algorithms require precision regimes (like FP4 or FP6) that the custom silicon wasn't built to handle efficiently, that hardware may become obsolete faster than a flexible GPU. This echoes the historical trajectory of specialized accelerators, which often struggle when the software stack evolves away from their specific design assumptions.
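The precision-regime risk can be illustrated with a toy quantization experiment: the fewer bits a format carries, the larger the rounding error, which is why hardware built around one regime may serve another poorly. This is a generic symmetric round-to-nearest integer quantizer, not any vendor's FP4 or FP6 format.

```python
# Toy illustration of precision tradeoffs: symmetric round-to-nearest
# quantization at different bit widths. Generic sketch, not a real format.

def quantize_rmse(values, bits: int) -> float:
    """RMS error after symmetric integer quantization to `bits` bits."""
    qmax = 2 ** (bits - 1) - 1          # largest representable level
    scale = max(abs(v) for v in values) / qmax
    err = [(round(v / scale) * scale - v) ** 2 for v in values]
    return (sum(err) / len(err)) ** 0.5

weights = [(-1) ** i * (i / 100) for i in range(1, 101)]  # toy weight values
print(quantize_rmse(weights, 4) > quantize_rmse(weights, 8))  # → True
```

A chip whose datapaths assume one narrow regime bears this error (or emulation overhead) when workloads move elsewhere, which is the flexibility gap the commentary attributes to custom silicon.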

The Ecosystem Players: Arista and Amkor

Beyond the chipmakers, the piece highlights the critical infrastructure providers Arista Networks and Amkor Technology. Chipstrat notes that both companies believe their growth trajectories are only beginning. Arista is positioned to benefit from the massive scale-out and scale-up Ethernet requirements of AI clusters, while Amkor is capitalizing on the demand for advanced packaging in the United States. The article touches on the geopolitical dimension, mentioning the competition from TSMC's Arizona facilities, but focuses on the immediate technical necessity: "American advanced packaging" is essential to support the domestic supply chain for these high-performance chips.

Bottom Line

Chipstrat's coverage succeeds by shifting the focus from the excitement of new chip launches to the gritty realities of power, memory, and lifecycle management. The strongest argument is that the "replacement cycle" will be driven by physics (power and memory) rather than pure performance obsolescence, a nuance often missed in hype-driven analysis. However, the piece's skepticism regarding the longevity of custom accelerators may underestimate the rapid iteration capabilities of hyperscalers, who can redesign their silicon faster than merchant vendors can update their roadmaps. The reader should watch for the first major hyperscaler announcement that explicitly cites power reallocation as the reason for retiring a generation of hardware, as that will mark the true end of the "additive" era.

Deep Dives

Explore these related deep dives:

  • High Bandwidth Memory

    The article discusses HBM capacity and bandwidth limitations in older GPUs as a potential driver of replacement cycles. Understanding HBM's technical architecture, stacking technology, and evolution across generations would give readers deeper insight into why memory constraints matter for AI workloads.

  • Tensor Processing Unit

    The article references Google TPUs and questions what happens to older versions, while also discussing the GPU vs XPU competition. Understanding TPU architecture, its differences from GPUs, and Google's design philosophy provides essential context for the hyperscaler competition discussion.

  • Fabless manufacturing

    The article contrasts Nvidia as a 'merchant silicon vendor' against hyperscalers designing custom chips. Understanding the fabless model explains why companies like Nvidia, AMD, and now cloud providers make different architectural tradeoffs and how the semiconductor industry structure shapes AI chip competition.

Sources

Nvidia, AMD, Amkor, Arista @ UBS Tech Conference

by Various · Chipstrat

Thoughts from various conversations at the UBS conference last week:

Nvidia.

No Replacement Cycles Yet.

Colette confirmed that there hasn’t been a datacenter GPU replacement cycle yet:

Timothy Arcuri: And I get the question a lot about how much of what you’re shipping is replacing existing GPUs versus just additive to the existing base. And it seems like almost all of what you’re shipping is just additive to the base. We haven’t even begun to replace the existing installed base. Is that correct?

Colette Kress: It’s true. It’s true that most of the installed base still stays there. And what we are seeing is the advanced new models want to go to the latest generation because a lot of our codesign was working with the researchers of all of these companies to help understand what they’re going to need for their next models. So that’s the important part that they do. They move that model to the newest architecture and stay with the existing. So yes, to this date, most of what you’re seeing is all brand new builds throughout the U.S. and across the world.

On the one hand this is fairly obvious: GPUs, even older ones, are super useful whether you’re pre-training, post-training, fine-tuning, serving inference, labeling data, simulating autonomy, synthetic data generation, ablation studies, regression testing, etc etc. R&D teams everywhere can absorb essentially unlimited amounts of old GPU compute. Every lab has more experiments it wants to run than budget for new GPUs.

So why throw out old GPUs that can still crank out tokens, even if the throughput is lower? Especially if they are nearly or fully depreciated!

But it does raise the question: what would cause GPU replacement cycles?

Power Budget Reallocation.

Recall that power is a constraint. Remember how Andy Jassy answered a capacity question on the Amazon earnings call in terms of power and not chips?

Justin Post: I’ll ask on AWS. Can you just kind of go through how you’re feeling about your capacity levels and how capacity constrained you are right now?

Andrew Jassy: On the capacity side, we brought in quite a bit of capacity, as I mentioned in my opening comments, 3.8 gigawatts of capacity in the last year with another gigawatt plus coming in the fourth quarter and we expect to double our overall capacity by the end of 2027. So we’re bringing in quite a bit ...