
No Jensen, not all compute is created equal

Jordan Schneider cuts through the noise of aggregate statistics to deliver a counterintuitive truth: in the race for artificial intelligence, quantity is a dangerous illusion. While popular narratives obsess over total chip counts or raw floating-point operations, Schneider argues that the architecture of the hardware itself creates an unbridgeable chasm between nations. This piece is essential reading because it dismantles the comforting notion that a massive stockpile of older, weaker chips can simply "scale up" to match the performance of a smaller number of cutting-edge accelerators.

The Illusion of Aggregate Power

The article begins by challenging a provocative claim from Jensen Huang, the CEO of Nvidia, who recently suggested on Dwarkesh Patel's podcast that China possesses sufficient computing power to build frontier AI models. Huang argued, "AI is a parallel computing problem, isn't it? Why can't they just put 4x, 10x, as many chips together because energy's free?" Schneider immediately pushes back, noting that while this line of reasoning is compelling to some policymakers, it fundamentally misunderstands the architecture of modern computing. The author points out that John Moolenaar, who chairs the House Select Committee on China, has already begun to grapple with this nuance, proposing a "rolling technical threshold" to prevent what his letter calls "death by a thousand sub-threshold chips."

Schneider writes, "Legacy chips don't matter for AI," a blunt assertion that reorients the entire debate. The author explains that the vast majority of chips manufactured in China are "mainstream" components found in washing machines, car engine management systems, and industrial motors. These are typically produced at 28-nanometer nodes or larger. As Schneider puts it, "A chip in your microwave cannot do matrix multiplication for a transformer, and a 40nm microcontroller in a Chinese EV does not help run DeepSeek-V4." This distinction is critical; it separates the industrial backbone of an economy from the specific, high-performance silicon required to train the next generation of intelligence.

A chip in your microwave cannot do matrix multiplication for a transformer.

The argument gains depth when Schneider introduces the concept of FLOPs (floating-point operations per second) as the true currency of AI, rather than simple chip counts. The disparity is staggering: a single Nvidia Blackwell B200 chip delivers roughly 10 petaFLOPs, while a typical automotive microcontroller delivers only 0.12 teraFLOPs. Schneider illustrates this gap with a vivid comparison: "if a country had 100,000 Blackwells, its rival would need more than the absurd number of two billion legacy chips to match the same FLOPs output." This reframing forces the reader to abandon the idea that volume can compensate for obsolescence.
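
To make this arithmetic concrete, here is a minimal back-of-envelope sketch in Python. The per-chip figures are the ones quoted above; the script simply computes how many legacy chips a rival would need to match the nominal FLOPs of a 100,000-Blackwell stockpile.

```python
# Back-of-envelope FLOPs parity, using the figures quoted above.
BLACKWELL_FLOPS = 10e15   # ~10 petaFLOPs per Nvidia B200 (article's figure)
LEGACY_FLOPS = 0.12e12    # ~0.12 teraFLOPs per automotive microcontroller

frontier_chips = 100_000
total_frontier = frontier_chips * BLACKWELL_FLOPS

legacy_needed = total_frontier / LEGACY_FLOPS
print(f"Legacy chips needed for nominal parity: {legacy_needed:,.0f}")
# Roughly 8.3 billion, comfortably "more than two billion", and that is
# before the memory and network bottlenecks discussed below are counted.
```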

The Architecture of Failure

To make the technical limitations concrete, Schneider constructs a hypothetical scenario comparing two nations: "Nvidiana," which possesses a lean stockpile of top-tier frontier chips, and "Huaweiopolis," which relies on a massive volume of weaker, older accelerators. Even if both nations possess the same total number of "H100-equivalents" on paper, their capabilities diverge radically. Schneider notes that Nvidiana can train the next generation of models, while Huaweiopolis "cannot, and more chips will not close the gap."

The author identifies three specific bottlenecks that prevent the "quantity over quality" strategy from working: numerical precision, memory bandwidth, and network bandwidth. On precision, Schneider explains that newer chips like the Blackwell series can perform calculations at FP4 (4-bit floating-point precision), effectively doubling their speed compared to older chips limited to INT8. "By calculating half the digits, they have double the speed," Schneider writes, highlighting how older hardware is not just slower, but fundamentally incompatible with the efficiency demands of modern AI training.
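
A minimal illustrative sketch of the precision point, assuming nothing about any specific chip: at a fixed hardware budget (bits processed per cycle), halving the bits per value doubles the number of values processed.

```python
# Illustrative only: throughput scales inversely with bits per value
# when the hardware budget (bits processed per cycle) is fixed.
DATAPATH_BITS_PER_CYCLE = 4096   # hypothetical fixed datapath width

for fmt, bits in [("FP16", 16), ("FP8/INT8", 8), ("FP4", 4)]:
    ops_per_cycle = DATAPATH_BITS_PER_CYCLE // bits
    print(f"{fmt:9s} -> {ops_per_cycle} values per cycle")
# FP4 moves twice as many values per cycle as INT8: "half the digits,
# double the speed." Chips without FP4 support cannot claim this doubling.
```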

Memory bandwidth presents an even steeper hurdle. Schneider uses a powerful analogy to describe the problem: "A chip with ample FLOPs but insufficient memory bandwidth is like a chef with incredible knife skills but a single narrow hallway between the pantry and the kitchen." In this scenario, the chef (the processor) is ready to work, but the ingredients (data) arrive too slowly to keep pace. The article notes that while the latest chips utilize HBM3e or HBM4 memory with bandwidths exceeding 22 terabytes per second, domestic Chinese chips like the Ascend 910C are still reliant on older HBM2E technology with only 3.2 terabytes per second. This creates a situation where the logic units are constantly idle, waiting for data.
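
The chef-and-hallway analogy corresponds to the standard roofline model, in which delivered throughput is the minimum of a chip's peak FLOPs and memory bandwidth times the workload's arithmetic intensity. In the sketch below, the bandwidth figures are the ones the article quotes; the peak FLOPs and arithmetic intensity are hypothetical round numbers chosen only for illustration.

```python
# Simple roofline model: attainable FLOPs = min(peak, bandwidth * intensity).
# Bandwidths are the article's figures; the peak and workload intensity
# are hypothetical round numbers for illustration.
PEAK_FLOPS = 10e15   # hypothetical 10-petaFLOP logic unit
INTENSITY = 300.0    # FLOPs per byte fetched (workload-dependent)

for name, bw in [("HBM3e/HBM4 (22 TB/s)", 22e12), ("HBM2E (3.2 TB/s)", 3.2e12)]:
    attainable = min(PEAK_FLOPS, bw * INTENSITY)
    print(f"{name}: {attainable/1e15:.1f} PFLOPs delivered "
          f"({attainable/PEAK_FLOPS:.0%} of peak)")
# 22 TB/s sustains 6.6 PFLOPs (66% of peak); 3.2 TB/s sustains under
# 1 PFLOP (~10%). With the narrow hallway, the logic units mostly sit idle.
```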

Finally, the piece addresses network bandwidth, the speed at which chips communicate with one another. This is the Achilles' heel for a strategy based on stacking millions of weaker chips. Schneider argues that as the cluster grows, the communication overhead becomes the dominant cost. "At scale, this turns communication into the dominant cost, meaning that adding more chips yields diminishing returns and eventually no additional performance at all," the author writes. The result is not just slower training, but potential system failure. Schneider warns that if communication is too slow, "gradients do not reliably descend — and training can become unstable or fail to converge altogether."
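
A hedged toy model makes the saturation visible. Assume data-parallel training with a fixed global batch: per-chip compute time shrinks as 1/N, but a ring all-reduce of the gradients costs roughly 2 x model-bytes / link-bandwidth per step no matter how many chips participate. All numbers below are hypothetical placeholders chosen only to show the shape of the curve.

```python
# Toy data-parallel scaling: step_time(N) = compute_time/N + allreduce_time.
# All numbers are hypothetical placeholders.
MODEL_BYTES = 2 * 70e9    # ~70B parameters as FP16 gradients
COMPUTE_TIME_1 = 1000.0   # seconds per step on a single (weak) chip

def step_time(n_chips, link_bw):
    # Ring all-reduce moves ~2*(N-1)/N of the gradient bytes per chip.
    comm = 2 * (n_chips - 1) / n_chips * MODEL_BYTES / link_bw
    return COMPUTE_TIME_1 / n_chips + comm

for n in [1_000, 10_000, 100_000, 1_000_000]:
    slow = step_time(n, link_bw=25e9)    # 25 GB/s links (weak interconnect)
    fast = step_time(n, link_bw=900e9)   # 900 GB/s links (NVLink-class)
    print(f"N={n:>9,}: weak links {slow:6.2f}s/step, fast links {fast:6.2f}s/step")
# With weak links the step time flattens near 2*MODEL_BYTES/25e9, about 11 s:
# past that point, adding chips buys essentially nothing.
```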

Critics might note that this analysis assumes a static technological landscape where Chinese firms cannot innovate rapidly enough to close the memory and interconnect gaps. However, the current constraints on high-bandwidth memory (HBM) production, a technology where the global supply chain is tightly controlled, suggest these bottlenecks are structural rather than temporary. Furthermore, the historical context of chiplet technology—which attempts to combine smaller, cheaper dies to mimic larger ones—has shown that while it can extend the life of older nodes, it cannot fully replicate the raw throughput of monolithic, advanced-node designs without incurring massive latency penalties.

A naive FLOPs equivalence between a Huaweiopolis cluster and an Nvidiana cluster hides the fact that the Huaweiopolis cluster will suffer in performance for both training and inference.

Policy Implications and the Path Forward

The commentary concludes by observing a shift in how US policymakers are interpreting these technical realities. Schneider points to the recent introduction of the SCALE Act by Representative John Moolenaar, which moves away from a blunt cap on aggregate compute capacity. Instead, the new proposal would limit exports to "110% of the performance of the best chips China can already manufacture domestically at scale." Schneider views this as a significant maturation of policy, stating, "It is a narrower, more observable target, and it takes the quality-over-quantity insight more seriously than the aggregate headcount approach did."
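
As a hedged illustration of how a capability-pegged threshold differs from an aggregate cap (the performance figures below are hypothetical placeholders, not the bill's actual benchmark):

```python
# Hypothetical illustration of a capability-pegged export cap:
# exportable chips may not exceed 110% of the performance of the best
# chip the rival can already manufacture domestically at scale.
def export_allowed(chip_perf, best_domestic_perf, margin=1.10):
    """Return True if chip_perf is within the margin of the domestic frontier."""
    return chip_perf <= margin * best_domestic_perf

best_domestic = 0.8e15  # placeholder: ~0.8 PFLOPs domestic frontier
print(export_allowed(0.85e15, best_domestic))  # True: within 110% of the frontier
print(export_allowed(10e15, best_domestic))    # False: a frontier-class chip
```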

The author argues that future enforcement must focus on these "crown jewels"—the specific chips that enable frontier model training—rather than trying to police a vague aggregate number. "We should be building enforcement around these crown jewels rather than solely around an aggregate FLOP count, and definitely not based on dubious chip counts!" Schneider asserts. This approach acknowledges that while aggregate compute matters for the broad diffusion of AI across an economy, the race for the most powerful models is won by the concentration of the best hardware, not the sheer volume of the rest.

Bottom Line

Schneider's strongest contribution is the dismantling of the "parallel computing" fallacy, proving that architectural superiority in memory and interconnects creates a performance floor that sheer volume cannot breach. The argument's vulnerability lies in its assumption that the gap in memory bandwidth and network efficiency will remain insurmountable, a premise that could be challenged by unexpected breakthroughs in domestic Chinese supply chains. Readers should watch for how the new legislative thresholds, pegged to domestic capability rather than total capacity, will be enforced in practice, as this will likely define the next phase of the global AI hardware race.

Deep Dives

Explore these related deep dives:

  • The New Silk Roads: The Present and Future of the World by Peter Frankopan

  • Chiplet

    The article discusses the strategy of combining many sub-threshold chips to overcome performance gaps, a technique that relies on advanced packaging technologies like chiplets to function effectively.

  • High Bandwidth Memory

    Jensen Huang's argument that 'energy is free' ignores the critical bottleneck of memory bandwidth; High Bandwidth Memory is the technology whose supply constraints make bandwidth, not chip counts, the limiting factor for scaling AI clusters.

  • List of Huawei products

    While the article mentions Huawei's Ascend line as China's primary AI alternative, the specific Wikipedia entry details the architectural constraints and yield challenges of these chips compared to Nvidia's Hopper and Blackwell architectures.

Sources

No Jensen, not all compute is created equal

by Jordan Schneider · ChinaTalk


We’ve recently tried to pin down how much compute China actually has, approaching the question from both the supply and demand sides. We converged on roughly 2.5 to 2.8 million H100-equivalents. But a single aggregate figure only captures part of the picture.

Jensen on China.

On Dwarkesh’s podcast last week, Jensen Huang argued that China already has enough compute to build frontier AI.

“They manufacture 60% of the world’s mainstream chips, maybe more.”

When Dwarkesh raised the gap in advanced chips, Jensen responded,

“AI is a parallel computing problem, isn’t it? Why can’t they just put 4x, 10x, as many chips together because energy’s free?”

Jensen is wrong, but that doesn’t mean people aren’t compelled by this line of reasoning. John Moolenaar, who chairs the House Select Committee on China, sent a letter to Commerce Secretary Howard Lutnick in December proposing a rolling technical threshold that would cap Chinese aggregate AI compute at 10% of US compute capacity. It’s much more nuanced — accounting for memory and network bandwidth as part of this calculus — but ultimately seems motivated by preventing, as the letter calls it, “death by a thousand sub-threshold chips.”

Export restrictions are a difficult line to walk, and total computing power does matter. But not all compute is created equal. The compute that can train a frontier model, serve inference on an existing one, and power your laptop are different things, and a “death by a thousand sub-threshold chips” is less concerning for the trajectory of AI than a concentration of the most important chips.

Legacy Chips Don’t Matter for AI.

It’s hard to know where Jensen is getting his claim that “China manufactures 60% of the world’s mainstream chips.” Perhaps it traces back to a 2024 projection from former Commerce Secretary Gina Raimondo about new legacy chip capacity coming online in China. But this is not a measure of AI compute. It includes the chips running your car’s engine management system, your washing machine’s control board, and the power electronics in an industrial motor, typically manufactured at 28nm or larger. They matter, but they are not the chips that train frontier AI. A chip in your microwave cannot do matrix multiplication for a transformer, and a 40nm microcontroller in a Chinese EV does not help run DeepSeek-V4.

The sliver of ...