
Vera Rubin – Extreme Co-Design: An Evolution from Grace Blackwell Oberon

Nvidia Designs the Whole Rack Now

At CES 2026, Nvidia unveiled the complete Vera Rubin platform: six silicon products spanning the Rubin Graphics Processing Unit (GPU), Vera Central Processing Unit (CPU), NVLink 6 Switch, ConnectX-9 Network Interface Card (NIC), BlueField-4 Data Processing Unit (DPU), and Spectrum-6 switch. The VR NVL72 rack represents the second generation of Nvidia's Oberon rack-scale architecture, and the company's ambitions have grown accordingly. Where Grace Blackwell was Nvidia's first attempt at treating the rack as a unit of compute, Vera Rubin is the version where Nvidia asserts near-total control over what goes inside it.

SemiAnalysis describes this strategy as "extreme co-design," and the framing is apt:

Nvidia's competitiveness strengthens with its extreme co-design supremacy. It is the only player with the best in class or close to the best in class silicon product offerings for all the major silicon contents in an Nvidia trail-blazed AI server system design.

No other chip company offers a competitive GPU, a state-of-the-art scale-up switch, a top-tier NIC, a leading Ethernet networking switch, and a purpose-built CPU all under one roof. AMD has competitive GPUs. Broadcom makes excellent switch ASICs. But nobody else fields the complete lineup. This vertical integration is Nvidia's moat, and Vera Rubin widens it.


The Silicon: Brute Force at 3 Nanometers

The Rubin GPU moves to a 3-nanometer process node with 336 billion transistors, a 60 percent increase over Blackwell. Dense FP4 performance hits 35 petaflops, a 3.5x improvement over GB200, achieved through more streaming multiprocessors, doubled tensor core width, and a 25 percent clock speed bump. Nvidia also claims an "effective" 50 petaflops through a new adaptive compression engine that dynamically eliminates zeros in the data stream.

While models that utilize Post Training Quantization or Quantization Aware Training will be tuned to maximize adaptive compression speedups, they are not strictly needed to take advantage of dynamic compression.

This is a meaningful departure from the rigid 2:4 structured sparsity of previous generations, which programmers largely ignored. The new approach works automatically on existing models. But SemiAnalysis appropriately flags the skepticism:

Many ML Systems engineers are still skeptical that this new form of sparsity will work well, and it is very possible that Nvidia's 50 PFLOPS is purely marketing like prior generations.

Whether that 50 petaflop figure holds up in real workloads or joins structured sparsity in the dustbin of marketing claims remains an open question.
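To see what the "effective" figure implies, here is a toy model, entirely our own construction and not Nvidia's actual scheme: if the compression engine can skip a fraction of zero-valued operands, effective throughput scales by 1/(1 - f). Working backwards from the claimed numbers shows what zero fraction the marketing math assumes.

```python
def effective_pflops(dense_pflops: float, zero_fraction: float) -> float:
    """Toy model: if a compression engine skips a fraction f of
    zero-valued operands, effective throughput scales by 1/(1 - f).
    This is an idealized upper bound, not Nvidia's actual mechanism."""
    if not 0.0 <= zero_fraction < 1.0:
        raise ValueError("zero_fraction must be in [0, 1)")
    return dense_pflops / (1.0 - zero_fraction)

# Nvidia's claimed 50 "effective" PFLOPS over a 35 PFLOPS dense base
# implies roughly 30% of operands eliminated under this toy model:
print(effective_pflops(35.0, 0.3))  # 50.0
```

In other words, the claim only holds if real workloads consistently carry about 30 percent exploitable zeros, which is precisely what the skeptics doubt.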

On the memory front, the shift to High Bandwidth Memory 4 (HBM4) delivers 22 terabytes per second of bandwidth at 288 gigabytes of capacity. That is 2.75x Blackwell. But the article reveals a telling supply chain reality: Nvidia requested pin speeds well above the Joint Electron Device Engineering Council (JEDEC) specification, and memory suppliers are struggling to deliver.

While Nvidia is targeting 22TB/s, we understand that memory suppliers are having challenges hitting Nvidia's requirements and we see it likely that initial shipments will come in slightly below at closer to 20TB/s.

Meanwhile, Micron appears to have fallen out of the HBM4 race entirely for Rubin, leaving SK Hynix and Samsung as the only viable suppliers. The concentration risk here deserves attention.
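A back-of-envelope calculation shows why suppliers are straining. Assuming eight HBM4 stacks per GPU, each with the 2048-bit interface JEDEC defines for HBM4 (the stack count is our assumption, not stated in the article), the 22 TB/s target implies a per-pin data rate well above the roughly 8 Gb/s JEDEC baseline:

```python
# Back-of-envelope: per-pin data rate implied by Rubin's bandwidth target.
# ASSUMPTION: 8 HBM4 stacks, each with a 2048-bit (JEDEC HBM4) interface.
TARGET_BW_TBPS = 22.0                    # Nvidia's stated target, TB/s
STACKS = 8
BITS_PER_STACK = 2048
BYTES_PER_STACK = BITS_PER_STACK // 8    # 256 bytes transferred per cycle per stack

per_stack_tbps = TARGET_BW_TBPS / STACKS                       # 2.75 TB/s per stack
pin_rate_gbps = per_stack_tbps * 1e12 / BYTES_PER_STACK / 1e9  # Gb/s per pin
print(f"Implied pin rate: {pin_rate_gbps:.1f} Gb/s per pin")   # ~10.7 Gb/s
```

Roughly 10.7 Gb/s per pin against a spec baseline near 8 Gb/s is a large ask, which is consistent with initial shipments landing closer to 20 TB/s.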

The Vera CPU and the Full Stack Play

The Vera CPU doubles performance over Grace by moving to a 3-nanometer reticle-sized compute die with 88 cores, simultaneous multithreading for 176 threads, and a 40 percent larger L3 cache. Memory bus width doubles to 1024 bits. SemiAnalysis notes that Nvidia was notably aggressive here:

NVIDIA was aggressive on the CPU front, with Vera doubling performance over Grace by moving to a 3nm reticle-sized compute die and disaggregating the memory controllers and I/O into chiplets.

The CPU has never been where Nvidia wins deals. But a mediocre CPU creates friction in the system sale. By making Vera genuinely competitive, Nvidia eliminates one more reason for customers to look elsewhere.

Cableless Design: Engineering Elegance or Lock-In?

Perhaps the most consequential change in VR NVL72 is the elimination of internal cables from the compute tray. Flyover cables were the major point of failure in Grace Blackwell assembly, and Jensen Huang claimed at CES that the new cableless design reduces compute tray assembly time from two hours to five minutes.

The modular approach replaces cables with board-to-board connectors from Amphenol (Paladin HD2) and routes signals through a printed circuit board (PCB) midplane. This demanded significant upgrades to PCB materials, including M8/M9 grade copper clad laminate (CCL) and HVLP4 copper foil, with quartz cloth still under consideration for the longest signal paths.

The cableless design is a genuine engineering achievement. But it also tightens Nvidia's grip on the ecosystem. Customization options for hyperscalers shrink considerably:

For VR NVL72, although some level of customization is still available, there are a lot more limitations on the form factor. Given the modular and the cableless design of VR NVL72, the customized modules at the front of the chassis must match the form factor and dimension of Nvidia's reference design.

Only the power delivery module, BlueField-4, and management modules remain customizable, and even those must conform to Nvidia's specified dimensions. This is a double-edged sword. Simpler assembly and fewer failure modes benefit everyone. But hyperscalers who built competitive advantages through custom server designs now find their options constrained. Whether this creates meaningful pushback or simply gets accepted as the cost of staying on Nvidia's roadmap will play out over the next year.

Cooling a 220-Kilowatt Rack

VR NVL72 runs at 180 to 220 kilowatts per rack, roughly double Blackwell. The compute tray is now 100 percent liquid cooled, with fans eliminated entirely. Cold plates cover every module, connected through internal manifolds with miniature quick disconnects. The Rubin GPU cold plate uses micro-channel technology with 100-micron pitch channels, down from 150 microns, with gold plating to resist corrosion from liquid metal thermal interface material.

Nvidia claims the platform can operate with 45-degree Celsius inlet water temperatures, potentially avoiding mechanical chillers. SemiAnalysis provides useful context: this is not as revolutionary as the market perceived it. Blackwell already operates above 40 degrees, and vendors like Lenovo and HPE discussed 45-degree liquid cooling architectures in early 2025. The real engineering challenge is the tightened temperature differential. With less delta between inlet and outlet, flow rates must increase roughly 2 to 2.5 times.
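The flow-rate consequence falls directly out of Q = m_dot * c_p * dT: at fixed heat load, halving the temperature delta roughly doubles the required coolant flow. The delta values below are illustrative assumptions, not figures from the article:

```python
def coolant_flow_kg_s(heat_w: float, delta_t_k: float, cp: float = 4186.0) -> float:
    """Required coolant mass flow from Q = m_dot * c_p * dT.
    cp defaults to water's specific heat in J/(kg*K)."""
    return heat_w / (cp * delta_t_k)

RACK_W = 220_000.0  # upper end of VR NVL72 rack power
# Illustrative deltas (our assumption): a relaxed 15 K delta vs a tightened 7 K delta.
wide = coolant_flow_kg_s(RACK_W, 15.0)
tight = coolant_flow_kg_s(RACK_W, 7.0)
print(f"{wide:.1f} kg/s -> {tight:.1f} kg/s ({tight / wide:.2f}x more flow)")
```

Tightening the delta from 15 K to 7 K pushes flow up about 2.14x, squarely in the 2 to 2.5x range the article cites, which in turn drives pump power, manifold sizing, and quick-disconnect pressure ratings.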

Networking: Bidirectional Signaling Over Copper

The scale-up network doubles bandwidth without doubling cables, thanks to bidirectional signaling over existing copper backplane connections. Each electrical lane carries 448 gigabits per second, up from 224 in NVLink 5, by sending signals in both directions simultaneously over the same differential pair. Echo cancellation at each end of the wire separates the local transmit signal from the received signal.

This is a clever solution to a hard physical constraint. Doubling the roughly five thousand copper cables on the Blackwell backplane would have been, as the article puts it, "a tall order" given the reliability problems already experienced at scale. The tradeoff is that echo cancellation must be precisely calibrated, and any delay in generating the local transmit copy can cause link failure.
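The failure mode is easy to see in a toy numeric model (our own sketch, ignoring attenuation, reflections, and analog implementation details): both ends drive the same differential pair, so each receiver sees a superposition and must subtract its own transmit copy, and a mistimed copy corrupts the recovered symbols.

```python
import random

# Toy model of bidirectional signaling: both ends drive the same wire,
# so each receiver sees the superposition and must subtract its own
# transmit signal (echo cancellation) to recover the remote one.
random.seed(0)
local_tx = [random.choice([-1.0, 1.0]) for _ in range(8)]
remote_tx = [random.choice([-1.0, 1.0]) for _ in range(8)]

wire = [a + b for a, b in zip(local_tx, remote_tx)]   # superposed signal
recovered = [w - l for w, l in zip(wire, local_tx)]   # cancel the local echo
assert recovered == remote_tx                         # perfect timing: link works

# A mistimed local copy (here, one symbol late) fails to cancel:
delayed = [0.0] + local_tx[:-1]
corrupted = [w - d for w, d in zip(wire, delayed)]
assert corrupted != remote_tx                         # link failure
```

The real analog problem is far harder than this digital sketch, but the structure is the same: the cancellation term must align with the echo essentially perfectly, or the remote signal is unrecoverable.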

On the scale-out side, VR NVL72 doubles per-GPU bandwidth to 1.6 terabits per second using two ConnectX-9 NICs per GPU rather than a single faster NIC. This enables multi-plane network architectures that can scale to clusters of hundreds of thousands of GPUs. The article walks through InfiniBand, Spectrum Ethernet, and third-party Ethernet deployment options in considerable detail, with co-packaged optics (CPO) making its first appearance in Nvidia's scale-out backend network.

The TCO Question

For all the engineering brilliance, the business case requires scrutiny. VR NVL72 carries roughly 45 percent higher per-GPU capital cost versus GB300 and 14 to 15 percent higher cost versus AMD's MI4XX. SemiAnalysis frames a potential advantage in memory economics:

NVIDIA directly procures memory, allowing them to negotiate long-term agreements, volume-preferential terms with memory suppliers and most importantly, VVIP pricing. We think this will shield end customers from spikes in memory costs.

AMD, by contrast, carries roughly double the Dynamic Random Access Memory (DRAM) content per rack and leaves DDR5 procurement to assemblers, creating greater exposure to memory price spikes. In a rising DRAM market, Nvidia's centralized procurement could offset its higher sticker price. But that 45 percent premium is steep, and the Total Cost of Ownership (TCO) advantage depends heavily on assumptions about memory pricing trajectories and actual workload performance that have yet to be proven in production.
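The mechanism can be sketched with hypothetical numbers (all figures below are invented, normalized units, not the article's BoM estimates): a vendor with a higher base price but locked-in memory terms passes through little of a DRAM spike, while a vendor with more DRAM content and spot-exposed procurement passes through all of it.

```python
# Illustrative only: how centralized memory procurement can offset a higher
# sticker price under DRAM inflation. ALL dollar figures are hypothetical.
def rack_capex(base_cost: float, dram_cost: float, dram_spike: float,
               pass_through: float) -> float:
    """pass_through: fraction of a DRAM price spike the buyer actually pays
    (lower when the vendor locked in long-term volume pricing)."""
    return base_cost + dram_cost * (1.0 + dram_spike * pass_through)

spike = 0.5  # hypothetical 50% DRAM price rise
# Vendor A: higher base cost, locked-in memory pricing (little spike exposure).
a = rack_capex(base_cost=1.45, dram_cost=0.30, dram_spike=spike, pass_through=0.1)
# Vendor B: lower base cost, double the DRAM content, full spike exposure.
b = rack_capex(base_cost=1.00, dram_cost=0.60, dram_spike=spike, pass_through=1.0)
print(f"A={a:.3f}, B={b:.3f}")
```

With these invented inputs the premium vendor ends up cheaper, but flip the spike assumption to flat or falling DRAM prices and the ordering reverses, which is exactly why the TCO case rests on memory pricing trajectories.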

Bottom Line

Vera Rubin represents Nvidia at peak vertical integration. The company now designs the GPU, CPU, DPU, NIC, and both scale-up and scale-out switch ASICs, and it dictates the rack architecture in which they all sit. The cableless modular design is genuinely impressive engineering that solves real manufacturing problems. The bidirectional signaling approach to doubling scale-up bandwidth without doubling cables is elegant.

The risk, as always with Nvidia, is that the moat depends on execution across an enormous surface area. Memory suppliers are already struggling to hit HBM4 specifications. The 50-petaflop adaptive sparsity claim is unproven. The 45 percent cost premium demands that every promised performance gain actually materializes. And the reduced customization options could eventually motivate the largest hyperscalers to accelerate their own silicon programs.

For now, though, the competitive picture is clear. No other company can deliver this level of integration at rack scale. SemiAnalysis has produced a comprehensive technical breakdown of what may be the most complex commercially deployed computing system ever built, and the sheer density of engineering decisions documented here -- from quartz cloth PCB substrates to gold-plated micro-channel cold plates to bidirectional echo cancellation -- underscores just how far ahead Nvidia remains in the co-design game.


Sources

Vera Rubin – Extreme Co-Design: An Evolution from Grace Blackwell Oberon

by Dylan Patel · SemiAnalysis · Read full article

At CES 2026, Nvidia officially announced in detail all 6 Rubin platform products: the Rubin GPU, Vera CPU, NVLink 6 Switch, ConnectX-9, BlueField-4, and Spectrum-6. VR NVL72 is the second generation of Nvidia’s rack scale Oberon architecture that takes the stage. With competition catching up on rack scale game, Trainium 3 in the Gen2 UltraServer, AMD MI450X Helios Racks, and Google’s TPU which was at rack scale even before GB200, Nvidia answers with “extreme co-design” supremacy. With extreme co-design, Nvidia takes rack scale integration to the next level. Rack system becomes a unit of compute, a single distributed accelerator, and Nvidia designs the system.

For the Vera Rubin platform, Nvidia is asserting even more control over the system and rack level design. Rack scale integration and assembly have become more challenging, as every component is being pushed to the limit, whilst also optimizing for cost efficiency. VR NVL72 has a much more holistic design with a modular approach compared to Grace Blackwell for the purpose of integration efficiency and throughput.

Nvidia’s competitiveness strengthens with its extreme co-design supremacy. It is the only player with the best in class or close to the best in class silicon product offerings for all the major silicon contents in an Nvidia trail-blazed AI server system design. Nvidia offers the best accelerator, a SOTA scale up switch, the best NIC, and one of the best Ethernet networking switch, and a much improved purpose-designed CPU. No other competitors have such a complete suite of integrated silicon products.

In the sections below, we will discuss the 6 silicon products of the Vera Rubin platform at the silicon level. Then, we will discuss the rack and compute tray evolution from Grace Blackwell to Vera Rubin from the design perspective and the implication to components: cables, connectors, PCB, thermal, mechanical, and power.

Next, we will discuss the major networks of the VR NVL72 system, namely the scale up NVLink 6 network and the backend scale out network. We will discuss the logistical implications of much more limited hyperscaler customisation and the assembly supplier landscape.

Lastly, the report ends with a discussion on the TCO of the VR NVL72 system as well as the BoM and Power Budget estimate supporting the TCO analysis. Behind the paywall, we also provide readers with insight into Nvidia’s plans for their Groq IP. We will also cover some of the challenges with regards to ...