Right-sized AI infrastructure. Marvell XPUs

Chipstrat dismantles the myth that AI datacenters are monolithic Nvidia fortresses, revealing instead a chaotic, high-stakes marketplace of modular components where hyperscalers are actively unbundling the stack. The piece argues that the era of "one size fits all" is over, replaced by a combinatorial explosion of hardware choices driven by the specific economic and performance needs of diverse workloads. This is not just an engineering deep dive; it is a strategic map for understanding where the next trillion dollars of infrastructure investment will actually flow.

The Lego Block Revolution

The core of the argument rests on a simple but profound metaphor: AI clusters are not pre-built monoliths but collections of interchangeable parts. Chipstrat reports, "With just a few legos you can create quite a diverse set of ducks!" This framing effectively strips away the mystique of "supercomputing" to reveal the practical reality of system design. The piece notes that while Nvidia GPUs dominate the compute layer, the surrounding infrastructure—networking, storage, and memory—is where the real customization happens.

The editors highlight how even the most standardized components are now subject to intense scrutiny. "At Meta, we handle hundreds of trillions of AI model executions per day... Custom designing much of our own hardware, software, and network fabrics allows us to optimize the end-to-end experience," the piece quotes from Meta's own disclosures. This is a critical signal: the biggest players are no longer content to buy turnkey solutions. They are A/B testing network fabrics, pitting Ethernet against InfiniBand, to find the optimal balance between performance and cost.

Critics might argue that this level of fragmentation increases complexity and slows deployment, but the piece counters that the alternative—paying a premium for "Cadillac" performance on workloads that only need a "Honda"—is a far greater financial risk. The text emphasizes that "the design space is exploding," listing a crowded field of vendors from Broadcom to Credo and Arista competing for every socket.

"You have to think like the GM of the business; your job is to also manage risk and costs."

This shift in perspective is the piece's most valuable insight. It moves the conversation from pure technical specs to business strategy, suggesting that the winners in the next cycle won't be those with the fastest chips, but those who can best match hardware to specific workload shapes.

The Shape of Workloads

The commentary then pivots to a nuanced analysis of how different AI applications demand vastly different infrastructure. Chipstrat argues that a voice-to-voice assistant and a deep-reasoning agent have fundamentally different "shapes" of requirements. The former needs instant time-to-first-token to avoid the awkward silence of a broken connection, while the latter can tolerate a longer wait for a more complex answer.

The article illustrates this with a striking observation: "The shape of these two workloads are fairly similar!" referring to voice assistants and ad-tech copy rewriting. Both require high memory bandwidth but not massive context windows. Conversely, video generation and deep research models demand massive compute and context, creating a completely different infrastructure profile. "Notice how there are sort of two families of workloads here, and they result in different infra demands," the piece notes.

This distinction challenges the prevailing narrative that every datacenter must be built for the absolute cutting edge. The editors suggest that hyperscalers will increasingly deploy a mix of clusters: a cost-optimized fleet for lightweight tasks and a state-of-the-art fleet for heavy lifting. "A cost-optimized cluster for fast, lightweight workloads... A SOTA cluster for deep reasoning and generative video," they propose. This approach allows companies to depreciate older hardware for less demanding jobs, a strategy that could significantly alter the capital expenditure landscape.
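The two-fleet idea can be sketched as a simple dispatch rule keyed on workload shape. The thresholds, field names, and cluster labels below are illustrative assumptions, not anything specified in the piece:

```python
from dataclasses import dataclass

@dataclass
class Workload:
    name: str
    max_ttft_ms: int      # time-to-first-token budget
    context_tokens: int   # typical context length
    compute_heavy: bool   # e.g. video generation, deep reasoning

def route(w: Workload) -> str:
    """Send long-context or compute-heavy jobs to the SOTA fleet,
    lightweight latency-sensitive jobs to the cost-optimized fleet."""
    if w.compute_heavy or w.context_tokens > 32_000:  # threshold is a made-up example
        return "sota-cluster"
    return "cost-optimized-cluster"

voice = Workload("voice assistant", max_ttft_ms=200, context_tokens=4_000, compute_heavy=False)
research = Workload("deep research agent", max_ttft_ms=30_000, context_tokens=200_000, compute_heavy=True)
```

In practice the routing signal would come from the serving layer rather than a static flag, but the point stands: the two workload families sort cleanly into two fleets.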

The Economics of "Good Enough"

Perhaps the most provocative claim in the piece is the idea that "good enough" performance is a viable, and often superior, strategy. Chipstrat writes, "Some Cadillac systems could be overkill for the workloads when a Honda would do." The article posits that if a workload only requires 100 tokens per second, a system delivering 1,000 tokens per second is a waste of capital and energy.

The editors illustrate this with a scenario where a lower-cost configuration using previous-generation compute or cheaper memory types (GDDR instead of HBM) can hit the "acceptable performance" threshold at a fraction of the cost. "In this scenario, the second configuration can hit 'acceptable performance' at a lower cost than the first configuration," they explain. This reframes the entire market dynamic: it's not just about who has the most powerful chip, but who can deliver the right performance-to-cost ratio for a specific use case.

The piece also touches on the emerging trend of custom silicon, noting that hyperscalers are increasingly working with vendors like Marvell and Broadcom to design their own accelerators. "Which makes the case for designing the Lego blocks you need; i.e. working with a company like Marvell or Broadcom to make custom silicon for your datacenter," the article states. This suggests a future where the merchant silicon market is bifurcating between standardized, high-volume chips and highly specialized, custom-built solutions.

Bottom Line

Chipstrat's strongest contribution is its rejection of the "bigger is better" dogma, replacing it with a sophisticated framework for workload-specific optimization. The argument that infrastructure must be "right-sized" to the specific shape of the AI application is compelling and timely. However, the piece's biggest vulnerability lies in its assumption that hyperscalers have the engineering bandwidth to manage this growing complexity; the operational overhead of maintaining a fragmented, multi-vendor ecosystem could prove to be a significant drag on innovation. The reader should watch for how quickly the industry moves from A/B testing to full-scale deployment of these mixed-architecture clusters, as that will be the true test of this new paradigm.

Sources

Right-sized AI infrastructure. Marvell XPUs

by Various · Chipstrat

Ready to study Marvell’s datacenter business?

Marvell tends to zoom in and talk about the components that power said AI datacenters, namely “XPUs and XPU attach”. And we’ll talk about those.

But first, let’s zoom out.

Marvell helps hyperscalers develop custom AI datacenters.

While “custom datacenters” might sound niche, pretty much all AI datacenters are custom. And of course, the AI datacenter TAM is insane.

But I thought the vast majority of the market uses Nvidia GPUs with NVLink and InfiniBand or Spectrum-X… so what do you mean by custom? Seems like mostly all Nvidia…

Recall that AI clusters are made up of many components beyond just the compute (GPU/XPU). They also include networking (scale-up and scale-out), storage, and software. And yes, while many AI clusters share similar components, the resulting system configuration is often distinct.

And just like the image above, each hyperscaler needs to choose the right building blocks and tune the AI cluster to meet its specific needs. A lot of the tuning can be done in software, but some workloads place specific demands on the underlying hardware configurations.
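The "same blocks, distinct systems" idea can be sketched as a cluster spec that bundles one choice per component; the field names and values here are an illustrative sketch, not any vendor's actual schema:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ClusterSpec:
    """One AI cluster as a bundle of interchangeable 'Lego' choices."""
    compute: str    # GPU or custom XPU
    scale_up: str   # intra-rack fabric
    scale_out: str  # inter-rack fabric
    storage: str
    software: str   # scheduler / collectives stack

# Two hyperscalers picking from similar blocks but tuning differently
hyperscaler_a = ClusterSpec("merchant GPU", "NVLink", "InfiniBand", "NVMe", "NCCL")
hyperscaler_b = ClusterSpec("custom XPU", "proprietary fabric", "Ethernet", "NVMe", "custom collectives")
```

Shared blocks (storage, say) coexist with divergent ones (scale-out fabric), which is exactly why two clusters built from the same catalog can behave like different machines.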

And sometimes the existing Lego blocks don’t quite meet the hyperscaler’s specific needs. Which makes the case for designing the Lego blocks you need; i.e. working with a company like Marvell or Broadcom to make custom silicon for your datacenter.

But first, let’s take a step back and look at the evolution of the merchant silicon offerings.

From the earliest days, new Lego blocks started to emerge that let AI datacenter designers manage trade-offs and tune for specific workloads.

AI Datacenter Diversity.

Early in the AI datacenter game, the compute option was largely standardized (e.g. Nvidia Ada/Hopper/Blackwell), and the scale-up network was too (NVLink). So it kind of felt like all roads led to Nvidia.

But, for some time now, the scale-out infrastructure has been split between InfiniBand and Ethernet, and increasingly so. (More here).

You can start to see the increase in Lego blocks…

Scale Out Networking Diversity.

Here’s an example from Meta to illustrate.

Meta basically A/B tested two AI training clusters. Many components are the same, but the scale-out fabric was different; one with Ethernet and the other InfiniBand.
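Meta's A/B setup can be pictured as two configs that diff in exactly one field. The component values below are an illustrative reconstruction, not Meta's actual bill of materials:

```python
# Two training-cluster configs, identical except for the scale-out fabric
cluster_a = {"compute": "H100", "scale_up": "NVLink", "scale_out": "RoCE Ethernet"}
cluster_b = {"compute": "H100", "scale_up": "NVLink", "scale_out": "InfiniBand"}

def config_diff(a: dict, b: dict) -> dict:
    """Return the fields where the two builds disagree."""
    return {k: (a[k], b[k]) for k in a if a[k] != b[k]}
```

Running `config_diff(cluster_a, cluster_b)` isolates the single varying component, which is the whole point of an A/B test: one block changes, everything else is held constant.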

BTW, Meta nicely described the custom nature of their infra tuned to meet Meta’s specific needs:

At Meta, we handle hundreds of trillions of AI model executions per day. Delivering these services at a large scale ...