Babbage reframes the current AI hardware frenzy not as a sudden race, but as the culmination of a twelve-year strategy that predates the generative AI boom by a decade. While the market fixates on the latest stock movements, the author reveals that the architecture powering today's most advanced models was conceived when deep learning was still an academic curiosity, not a trillion-dollar industry.
The Twelve-Year Overnight Success
The piece's most striking revelation is the sheer patience required to build a hardware empire. Babbage writes, "The origins of the TPU program at Google date all the way back to 2013 - almost a decade before the launch of ChatGPT." This timeline is crucial for understanding why Google can now challenge the dominant player in the market. The author traces the architectural DNA even further back, noting that the "essence of the design dates back even further - into the 1970s," specifically citing a 1978 paper by H.T. Kung and Charles E. Leiserson on systolic arrays. This historical context is vital; it reminds us that the efficiency gains driving modern AI are not magic, but the result of decades of theoretical work on rhythmic data processing that was once limited by fabrication technology.
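To make the systolic idea concrete, here is a minimal Python sketch (mine, not the article's) of an output-stationary systolic array computing a matrix product. Operand skewing is the key trick: inputs are staggered so that each multiply-accumulate cell meets the right pair of values on the right tick.

```python
import numpy as np

def systolic_matmul(A, B):
    """Simulate an output-stationary systolic array computing C = A @ B.

    Rows of A flow in from the left and columns of B from the top, each
    skewed by one tick per row or column, so operands arrive at cell
    (i, j) exactly when their partner does: the 'rhythmic' data movement
    of Kung and Leiserson's systolic-array design.
    """
    n, k = A.shape
    k2, m = B.shape
    assert k == k2
    C = np.zeros((n, m), dtype=A.dtype)
    # Run enough ticks for the last skewed operands to drain through.
    for t in range(n + m + k - 2):
        for i in range(n):
            for j in range(m):
                s = t - i - j  # which element of the dot product arrives now
                if 0 <= s < k:
                    C[i, j] += A[i, s] * B[s, j]
    return C

A = np.random.randint(-8, 8, size=(3, 4))
B = np.random.randint(-8, 8, size=(4, 2))
assert np.array_equal(systolic_matmul(A, B), A @ B)
```

The hardware appeal is that each cell talks only to its immediate neighbours, so the design needs no long wires or global memory traffic and scales naturally in silicon.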
The approach didn't gain widespread adoption in the 1970s and 1980s, when fabrication technology wasn't yet up to the task, but by 2013 the time was right. Babbage argues that the first generation of these chips was surprisingly rudimentary, yet perfectly timed. "With the benefit of hindsight this first Google design seems remarkably primitive," the author notes, describing it as a co-processor that "could only perform integer arithmetic, and was only useful for inference." Yet this simplicity was its strength: it met the immediate need for cost-effective, scalable inference and launched a wave of imitation across the industry. The author quips that the TPU v1 led to "the launch of a thousand chips," sparking a venture capital frenzy and a buying spree by Intel for domain-specific architecture startups.
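The trade-off behind an integer-only inference chip is easy to demonstrate. The sketch below (a simplification; it assumes nothing about the TPU v1's actual quantization scheme) pushes float weights and activations down to int8, runs the matmul entirely in integer arithmetic, and rescales once at the end:

```python
import numpy as np

def quantize_int8(x):
    """Map a float tensor onto int8 with a single per-tensor scale."""
    scale = np.max(np.abs(x)) / 127.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

W = np.random.randn(64, 64).astype(np.float32)  # weights
x = np.random.randn(64).astype(np.float32)      # activations

Wq, w_scale = quantize_int8(W)
xq, x_scale = quantize_int8(x)

# Integer-only multiply, widened to int32 so the accumulation cannot
# overflow, then a single rescale back to float at the end.
acc = Wq.astype(np.int32) @ xq.astype(np.int32)
y = acc.astype(np.float32) * (w_scale * x_scale)

print(np.max(np.abs(y - W @ x)))  # small error, far cheaper arithmetic
```

For inference, that small quantization error is usually tolerable, which is exactly why an integer-only design could be so cost-effective.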
The commentary effectively highlights the iterative nature of this success. Unlike the "big bang" launches often seen in consumer tech, Google's strategy involved steady, incremental upgrades from v2 through v7. The author details how each generation introduced specific improvements, such as the shift to floating-point arithmetic with the bfloat16 format in v2—a critical evolution that allowed for the training of complex neural networks without the precision loss that plagued earlier integer-only designs. This progression mirrors the development of Apple Silicon, where control over the entire stack allowed for optimizations that general-purpose competitors couldn't match.
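The significance of bfloat16 is easy to show numerically: it keeps float32's 8-bit exponent, and with it float32's dynamic range, while cutting the mantissa from 23 bits to 7. A quick illustration with JAX's dtypes:

```python
import jax.numpy as jnp

x = jnp.array([3.0e38, 1.0, 1.001], dtype=jnp.float32)

# bfloat16 keeps the range: 3e38 survives, but 1.0 and 1.001 collapse
# to the same value because only 7 mantissa bits remain.
print(x.astype(jnp.bfloat16))  # roughly [3e38, 1, 1]

# float16 keeps more mantissa bits but overflows on large magnitudes,
# the failure mode that destabilizes training.
print(x.astype(jnp.float16))   # roughly [inf, 1, 1.001]
```

Gradients during training swing across enormous magnitudes, so trading precision for range in exactly this way is what made a floating-point TPU viable for training.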
Leading the Accelerator Crowd
The core of Babbage's argument rests on the unique advantages Google holds over its competitors, particularly Nvidia. The author posits that Google's dominance isn't just about raw chip performance, but about ecosystem control. "Full control over the stack - including software and machine learning algorithms - that it applies in its services," Babbage writes, "means that there can be feedback into the hardware design as software and algorithms evolve." This vertical integration is the secret sauce: it lets Google's internal AI labs dictate hardware roadmaps that perfectly suit their models, a luxury external chipmakers struggle to replicate.
The author also points out a strategic advantage often overlooked: the lack of legacy baggage. "No legacy (non-AI applications) to support," Babbage notes, allowing the TPU team to optimize purely for AI workloads. This stands in contrast to Nvidia, which must maintain backward compatibility for a vast array of scientific and graphics computing tasks. Tellingly, the industry is converging on Google's choices: Nvidia itself has moved closer to the TPU model, adding matrix multiply units and scaling back support for high-precision floating-point arithmetic.
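That convergence is already visible at the API level. A small JAX sketch shows the pattern both TPU matrix units and recent GPU tensor cores implement: low-precision operands combined with higher-precision accumulation (here via lax.dot's preferred_element_type):

```python
import jax
import jax.numpy as jnp

key_a, key_b = jax.random.split(jax.random.PRNGKey(0))
a = jax.random.normal(key_a, (1024, 1024), dtype=jnp.bfloat16)
b = jax.random.normal(key_b, (1024, 1024), dtype=jnp.bfloat16)

# bfloat16 inputs, float32 partial sums: matrix units multiply at low
# precision but accumulate wider, keeping long dot products accurate.
c = jax.lax.dot(a, b, preferred_element_type=jnp.float32)
print(c.dtype)  # float32
```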
There is a note of mythological reverence in how Google's Tensor Processing Unit gets discussed: while the world watches Nvidia's gravity drag ever more companies into its orbit, there sits Google, imperial and singular.
Critics might argue that this "imperial" stance is fragile, relying heavily on Google's ability to maintain its software moat: if the open-source community can replicate the compiler and runtime layers, the hardware advantage could evaporate. Babbage engages with the software question directly, noting that the "critical missing ingredient" for Google to truly challenge Nvidia's CUDA dominance is to open-source its XLA:TPU compiler and runtime code.
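For readers unfamiliar with where that layer sits, a small JAX example makes it concrete: user code is traced and lowered to StableHLO, a portable intermediate representation, which the XLA compiler and runtime then turn into code for whichever backend is attached. It is this lowering-and-compilation path, on the TPU side, that Babbage argues would need to be opened up.

```python
import jax
import jax.numpy as jnp

def layer(x, w):
    # Ordinary JAX code; nothing here is TPU-specific.
    return jax.nn.relu(x @ w)

# jit traces the function and lowers it to StableHLO, the IR that the
# XLA compiler turns into CPU, GPU, or TPU machine code.
lowered = jax.jit(layer).lower(jnp.ones((8, 16)), jnp.ones((16, 4)))
print(lowered.as_text()[:300])
```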
Google's Dilemma
Perhaps the most intriguing section of the commentary addresses the paradox of Google's recent decision to sell these chips to third parties, including rivals. Babbage frames this as a high-stakes strategic gamble. "I'd love to have been a fly on the wall for Google's decision to sell TPUs externally," the author muses, highlighting the tension between capturing hardware margins and empowering competitors. The willingness to sell to entities like OpenAI, even at a discount, suggests a shift in priorities from a jealously guarded in-house advantage to ecosystem dominance.
The author suggests that Google's leverage lies in its ability to control the supply. "It can decide, on a year by year basis, who gets TPUs and how many TPUs they get," Babbage writes. This creates a precarious dependency for customers. The commentary warns that while Google may not be "truly evil," the history of the company suggests that no product is safe from discontinuation. "A key takeaway from this is that none of these potential TPU customers should really make themselves reliant on Google," the author cautions, a sobering thought for any enterprise betting its infrastructure on a single vendor.
The Architecture War
Finally, Babbage challenges the prevailing narrative that GPUs are superior for training while TPUs are relegated to inference. The author points to the training of Google's latest Gemini 3 model as a counter-example: "Google's exclusive use of TPUs to train Gemini 3 undermines the first part of this narrative somewhat," Babbage asserts. The article suggests that the future is not a binary winner-take-all scenario, but a convergence of architectures. The competition, the author argues, will be less about the theoretical merits of the silicon and more about the quality of the software stacks and the ability to scale.
Indeed, Babbage suggests the competition between the two approaches (and not just between Nvidia and Google) looks likely to be the most interesting and keenly fought since CISC vs RISC in the 1980s.
That historical parallel provides a sobering perspective on the current hype cycle. Just as the CISC vs RISC battle resulted in a complex, hybrid landscape rather than a single victor, the GPU vs. TPU war will likely end in a fragmented market where different architectures serve different niches. The author's focus on the supply chain and the "optical circuit switch" systems in the latest TPU v7 (Ironwood) reveals that the real battleground has shifted from the chip itself to the interconnects and the ability to scale clusters to thousands of units.
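The shift toward interconnects shows up even in a toy example. The sketch below, which assumes nothing about Ironwood's actual topology, uses JAX's pmap to run an all-reduce across whatever devices are attached (one CPU locally, thousands of chips in a TPU pod); at pod scale, the cost of exactly this kind of collective is set by the network between chips rather than by any single chip:

```python
import jax
import jax.numpy as jnp

n = jax.device_count()

def all_reduce(x):
    # psum sums x across every device on the named axis; at cluster
    # scale its latency is dominated by the interconnect fabric.
    return jax.lax.psum(x, axis_name="chips")

distributed_sum = jax.pmap(all_reduce, axis_name="chips")
print(distributed_sum(jnp.arange(float(n))))  # same total on every device
```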
Bottom Line
Babbage's strongest contribution is the dismantling of the "overnight success" myth, replacing it with a narrative of deliberate, long-term architectural planning that began long before the current AI gold rush. The piece's biggest vulnerability is its reliance on Google's continued benevolence in managing a competitive ecosystem; the company's track record of discontinuing products offers little guarantee on that front. Readers should watch whether Google actually opens its software stack, whether doing so can truly break Nvidia's moat, and whether the "imperial" nature of Google's control ultimately limits adoption of its hardware by the broader market.