Babbage reframes the current AI hardware frenzy not as a sudden race, but as the culmination of a twelve-year strategy that predates the generative AI boom by a decade. While the market fixates on the latest stock movements, the author reveals that the architecture powering today's most advanced models was conceived when deep learning was still an academic curiosity, not a trillion-dollar industry.
The Twelve-Year Overnight Success
The piece's most striking revelation is the sheer patience required to build a hardware empire. Babbage writes, "The origins of the TPU program at Google date all the way back to 2013 - almost a decade before the launch of ChatGPT." This timeline is crucial for understanding why Google can now challenge the dominant player in the market. The author traces the architectural DNA even further back, noting that the "essence of the design dates back even further - into the 1970s," specifically citing a 1978 paper by H.T. Kung and Charles E. Leiserson on systolic arrays. This historical context is vital; it reminds us that the efficiency gains driving modern AI are not magic, but the result of decades of theoretical work on rhythmic data processing that was once limited by fabrication technology.
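To make the systolic idea concrete, here is a minimal Python sketch (mine, not the article's) of an output-stationary systolic array computing a matrix product. Operand skewing is the key trick: inputs are staggered so that each multiply-accumulate cell meets the right pair of values on the right tick.

```python
import numpy as np

def systolic_matmul(A, B):
    """Simulate an output-stationary systolic array computing C = A @ B.

    Rows of A flow in from the left and columns of B from the top, each
    skewed by one tick per row or column, so operands arrive at cell
    (i, j) exactly when their partner does: the 'rhythmic' data movement
    of Kung and Leiserson's systolic-array design.
    """
    n, k = A.shape
    k2, m = B.shape
    assert k == k2
    C = np.zeros((n, m), dtype=A.dtype)
    # Run enough ticks for the last skewed operands to drain through.
    for t in range(n + m + k - 2):
        for i in range(n):
            for j in range(m):
                s = t - i - j  # which element of the dot product arrives now
                if 0 <= s < k:
                    C[i, j] += A[i, s] * B[s, j]
    return C

A = np.random.randint(-8, 8, size=(3, 4))
B = np.random.randint(-8, 8, size=(4, 2))
assert np.array_equal(systolic_matmul(A, B), A @ B)
```

The hardware appeal is that each cell talks only to its immediate neighbours, so the design needs no long wires or global memory traffic and scales naturally in silicon.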
The approach didn't gain widespread adoption in the 1970s and 1980s, when fabrication technology wasn't yet up to the task, but by 2013 the time was right. Babbage argues that the first generation of these chips was surprisingly rudimentary, yet perfectly timed. "With the benefit of hindsight this first Google design seems remarkably primitive," the author notes, describing it as a co-processor that "could only perform integer arithmetic, and was only useful for inference." Yet this simplicity was its strength: it met the immediate need for cost-effective, scalable inference and launched a wave of imitation across the industry. The author quips that the TPU v1 led to "the launch of a thousand chips," sparking a venture capital frenzy and a buying spree by Intel for domain-specific architecture startups.
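The trade-off behind an integer-only inference chip is easy to demonstrate. The sketch below (a simplification; it assumes nothing about the TPU v1's actual quantization scheme) pushes float weights and activations down to int8, runs the matmul entirely in integer arithmetic, and rescales once at the end:

```python
import numpy as np

def quantize_int8(x):
    """Map a float tensor onto int8 with a single per-tensor scale."""
    scale = np.max(np.abs(x)) / 127.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

W = np.random.randn(64, 64).astype(np.float32)  # weights
x = np.random.randn(64).astype(np.float32)      # activations

Wq, w_scale = quantize_int8(W)
xq, x_scale = quantize_int8(x)

# Integer-only multiply, widened to int32 so the accumulation cannot
# overflow, then a single rescale back to float at the end.
acc = Wq.astype(np.int32) @ xq.astype(np.int32)
y = acc.astype(np.float32) * (w_scale * x_scale)

print(np.max(np.abs(y - W @ x)))  # small error, far cheaper arithmetic
```

For inference, that small quantization error is usually tolerable, which is exactly why an integer-only design could be so cost-effective.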
The commentary effectively highlights the iterative nature of this success. Unlike the "big bang" launches often seen in consumer tech, Google's strategy involved steady, incremental upgrades from v2 through v7. The author details how each generation introduced specific improvements, such as the shift to floating-point arithmetic with the bfloat16 format in v2—a critical evolution that allowed for the training of complex neural networks without the precision loss that plagued earlier integer-only designs. This progression mirrors the development of Apple Silicon, where control over the entire stack allowed for optimizations that general-purpose competitors couldn't match.
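The significance of bfloat16 is easy to show numerically: it keeps float32's 8-bit exponent, and with it float32's dynamic range, while cutting the mantissa from 23 bits to 7. A quick illustration with JAX's dtypes:

```python
import jax.numpy as jnp

x = jnp.array([3.0e38, 1.0, 1.001], dtype=jnp.float32)

# bfloat16 keeps the range: 3e38 survives, but 1.0 and 1.001 collapse
# to the same value because only 7 mantissa bits remain.
print(x.astype(jnp.bfloat16))  # roughly [3e38, 1, 1]

# float16 keeps more mantissa bits but overflows on large magnitudes,
# the failure mode that destabilizes training.
print(x.astype(jnp.float16))   # roughly [inf, 1, 1.001]
```

Gradients during training swing across enormous magnitudes, so trading precision for range in exactly this way is what made a floating-point TPU viable for training.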
Leading the Accelerator Crowd
The core of Babbage's argument rests on the unique advantages Google holds over its competitors, particularly Nvidia. The author posits that Google's dominance isn't just about raw chip performance, but about ecosystem control. "Full control over the stack - including software and machine learning algorithms - that it applies in its services," Babbage writes, "means that there can be feedback into the hardware design as software and algorithms evolve." This vertical integration is the secret sauce: it lets Google's internal AI labs dictate hardware roadmaps that perfectly suit their models, a luxury external chipmakers struggle to replicate.
The author also points out a strategic advantage often overlooked: the lack of legacy baggage. "No legacy (non-AI applications) to support," Babbage notes, allowing the TPU team to optimize purely for AI workloads. This stands in contrast to Nvidia, which must maintain backward compatibility for a vast array of scientific and graphics computing tasks. Tellingly, the industry is converging on Google's choices: Nvidia itself has moved closer to the TPU model, adding matrix multiply units and scaling back support for high-precision floating-point arithmetic.
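That convergence is already visible at the API level. A small JAX sketch shows the pattern both TPU matrix units and recent GPU tensor cores implement: low-precision operands combined with higher-precision accumulation (here via lax.dot's preferred_element_type):

```python
import jax
import jax.numpy as jnp

key_a, key_b = jax.random.split(jax.random.PRNGKey(0))
a = jax.random.normal(key_a, (1024, 1024), dtype=jnp.bfloat16)
b = jax.random.normal(key_b, (1024, 1024), dtype=jnp.bfloat16)

# bfloat16 inputs, float32 partial sums: matrix units multiply at low
# precision but accumulate wider, keeping long dot products accurate.
c = jax.lax.dot(a, b, preferred_element_type=jnp.float32)
print(c.dtype)  # float32
```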
There is a note of mythological reverence in how Google's Tensor Processing Unit gets discussed: while the world watches Nvidia's gravity drag ever more companies into its orbit, there sits Google, imperial and singular.
Critics might argue that this "imperial" stance is fragile, relying heavily on Google's ability to maintain its software moat: if the open-source community can replicate the compiler and runtime layers, the hardware advantage could evaporate. Babbage engages with the software question directly, noting that the "critical missing ingredient" for Google to truly challenge Nvidia's CUDA dominance is to open-source its XLA:TPU compiler and runtime code.
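For readers unfamiliar with where that layer sits, a small JAX example makes it concrete: user code is traced and lowered to StableHLO, a portable intermediate representation, which the XLA compiler and runtime then turn into code for whichever backend is attached. It is this lowering-and-compilation path, on the TPU side, that Babbage argues would need to be opened up.

```python
import jax
import jax.numpy as jnp

def layer(x, w):
    # Ordinary JAX code; nothing here is TPU-specific.
    return jax.nn.relu(x @ w)

# jit traces the function and lowers it to StableHLO, the IR that the
# XLA compiler turns into CPU, GPU, or TPU machine code.
lowered = jax.jit(layer).lower(jnp.ones((8, 16)), jnp.ones((16, 4)))
print(lowered.as_text()[:300])
```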
Google's Dilemma
Perhaps the most intriguing section of the commentary addresses the paradox of Google's recent decision to sell these chips to third parties, including rivals. Babbage frames this as a high-stakes strategic gamble. "I'd love to have been a fly on the wall for Google's decision to sell TPUs externally," the author muses, highlighting the tension between capturing hardware margins and empowering competitors. The willingness to sell to entities like OpenAI, even at a discount, suggests a shift in priorities from a jealously guarded in-house advantage to ecosystem dominance.
The author suggests that Google's leverage lies in its ability to control the supply. "It can decide, on a year by year basis, who gets TPUs and how many TPUs they get," Babbage writes. This creates a precarious dependency for customers. The commentary warns that while Google may not be "truly evil," the history of the company suggests that no product is safe from discontinuation. "A key takeaway from this is that none of these potential TPU customers should really make themselves reliant on Google," the author cautions, a sobering thought for any enterprise betting its infrastructure on a single vendor.
The Architecture War
Finally, Babbage challenges the prevailing narrative that GPUs are superior for training while TPUs are relegated to inference. The author points to the training of Google's latest Gemini 3 model as a counter-example: "Google's exclusive use of TPUs to train Gemini 3 undermines the first part of this narrative somewhat," Babbage asserts. The article suggests that the future is not a binary winner-take-all scenario, but a convergence of architectures. The competition, the author argues, will be less about the theoretical merits of the silicon and more about the quality of the software stacks and the ability to scale.
Indeed, Babbage suggests the competition between the two approaches (and not just between Nvidia and Google) looks likely to be the most interesting and keenly fought since CISC vs RISC in the 1980s.
That historical parallel provides a sobering perspective on the current hype cycle. Just as the CISC vs RISC battle resulted in a complex, hybrid landscape rather than a single victor, the GPU vs. TPU war will likely end in a fragmented market where different architectures serve different niches. The author's focus on the supply chain and the "optical circuit switch" systems in the latest TPU v7 (Ironwood) reveals that the real battleground has shifted from the chip itself to the interconnects and the ability to scale clusters to thousands of units.
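The shift toward interconnects shows up even in a toy example. The sketch below, which assumes nothing about Ironwood's actual topology, uses JAX's pmap to run an all-reduce across whatever devices are attached (one CPU locally, thousands of chips in a TPU pod); at pod scale, the cost of exactly this kind of collective is set by the network between chips rather than by any single chip:

```python
import jax
import jax.numpy as jnp

n = jax.device_count()

def all_reduce(x):
    # psum sums x across every device on the named axis; at cluster
    # scale its latency is dominated by the interconnect fabric.
    return jax.lax.psum(x, axis_name="chips")

distributed_sum = jax.pmap(all_reduce, axis_name="chips")
print(distributed_sum(jnp.arange(float(n))))  # same total on every device
```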
Bottom Line
Babbage's strongest contribution is the dismantling of the "overnight success" myth, replacing it with a narrative of deliberate, long-term architectural planning that began long before the current AI gold rush. The piece's biggest vulnerability is its reliance on Google's continued benevolence in managing a competitive ecosystem; the company's track record of discontinuing products offers little guarantee on that front. Readers should watch whether Google actually opens its software stack, whether doing so can truly break Nvidia's moat, and whether the "imperial" nature of Google's control ultimately limits adoption of its hardware by the broader market.