This interview cuts through the hardware hype cycle with a rare combination of insider credibility and first-principles economics. Chipstrat doesn't just ask about specs; it forces a confrontation with the most expensive assumption in modern computing: that software lock-in is unbreakable. The piece argues that the era of blindly trusting a single vendor's ecosystem is over, not because of politics, but because the math of trillion-dollar compute bills has finally tipped the scales.
The Economics of Breaking Lock-in
The narrative begins with a striking coincidence: the founders of MatX, Reiner Pope and Mike Gunter, exited Google's Brain and TPU teams just one week before the public release of ChatGPT. Pope notes that while the public was just waking up, insiders knew the economics were broken. "The big question prior to ChatGPT was: okay, cool demo, but it's too expensive. Can you actually productize it?" the piece reports. This framing is crucial; it shifts the conversation from "who will build the best model" to "who can afford to run it."
The article's most compelling argument challenges the notion of Nvidia's CUDA moat. Historically, software engineering costs dwarfed hardware costs, making it rational to optimize for developer convenience over raw efficiency. Chipstrat highlights Pope's counter-intuitive insight: "This is really the first time that balance has changed, and it has violated a lot of people's intuitions." With frontier labs now spending tens of billions on compute, the calculus has flipped. The rational choice is no longer to stick with the familiar tool, but to rewrite the software stack to slash hardware bills.
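The flipped calculus can be made concrete with a back-of-envelope break-even calculation. All numbers below are hypothetical illustrations chosen to make the arithmetic visible; they are not figures from the interview.

```python
# Back-of-envelope sketch of the lock-in break-even argument.
# All dollar figures are hypothetical, not from the interview.

def breakeven_years(annual_compute_spend: float,
                    hardware_savings_rate: float,
                    rewrite_cost: float) -> float:
    """Years until a software-stack rewrite pays for itself
    through cheaper hardware."""
    annual_savings = annual_compute_spend * hardware_savings_rate
    return rewrite_cost / annual_savings

# A lab spending $10B/yr on compute, saving 20% by moving to
# cheaper hardware, at a $500M cost to port off the incumbent stack:
print(breakeven_years(10e9, 0.20, 500e6))  # 0.25 years

# The same rewrite at a hypothetical 2015-era spend of $50M/yr:
print(breakeven_years(50e6, 0.20, 500e6))  # 50.0 years
```

Under these assumed numbers the rewrite pays for itself in a quarter at frontier-lab scale, while the identical effort would have taken decades to recoup when compute budgets were small, which is the intuition violation Pope describes.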
Critics might note that rewriting decades of CUDA-optimized code is a massive operational risk that few companies can afford. However, the piece points to the reality that major players like OpenAI, Anthropic, and Meta are already running multi-platform strategies, suggesting the lock-in is more fragile than the market assumes.
"The rational choice is to do anything you can to get hardware costs down, be multi-platform, get the negotiating power that comes from that."
A Hybrid Architecture for a New Era
The technical core of the interview focuses on MatX's attempt to solve the memory bottleneck that has plagued deep learning since the rise of the Transformer architecture in 2017. The piece details a hybrid approach that merges the low-latency benefits of Static Random-Access Memory (SRAM) with the high-throughput capacity of High Bandwidth Memory (HBM). This isn't just a tweak; it's a structural reimagining designed to handle the massive context windows required by modern agentic AI.
Pope explains that while competitors like Cerebras and Groq excel at latency using SRAM, they struggle with capacity. Conversely, traditional designs using HBM suffer from latency. "It takes careful engineering and you need to balance the system right," Chipstrat reports, quoting Pope. The result is a system where pipeline parallelism—often the "ugly stepchild" of distributed training techniques—finally becomes as efficient as tensor or expert parallelism.
This architectural bet relies on a specific insight about how models process data. By optimizing for the specific memory access patterns of Large Language Models, MatX aims to deliver performance that physics allows, rather than settling for the compromises of general-purpose GPUs. The piece notes that this approach was a "core idea going in," driven by a deep understanding of workload mapping that predates the current AI boom.
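The capacity side of the SRAM/HBM trade-off can be illustrated with the standard KV-cache size formula. The model shape and context length below are hypothetical, chosen to resemble a modern grouped-query-attention model; only the formula itself is standard.

```python
# Why capacity forces HBM into the picture: KV-cache footprint of one
# long-context request. Model dimensions are illustrative assumptions.

def kv_cache_bytes(layers: int, kv_heads: int, head_dim: int,
                   seq_len: int, bytes_per_elem: int = 2) -> int:
    """KV-cache size: 2x (keys and values) per layer per KV head,
    fp16/bf16 elements by default."""
    return 2 * layers * kv_heads * head_dim * seq_len * bytes_per_elem

# Hypothetical 80-layer GQA model, 8 KV heads, head_dim 128,
# serving a single 128K-token context:
gib = kv_cache_bytes(80, 8, 128, 128 * 1024) / 2**30
print(f"KV cache for one 128K-token request: {gib:.1f} GiB")  # 40.0 GiB
```

A single accelerator die typically carries on the order of hundreds of megabytes of SRAM, so a cache of this size cannot live on-chip alone; HBM can hold it, but at higher access latency. That gap is the opening the hybrid design targets.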
The Publishing Paradox
A subtle but significant tension runs through the interview: the trade-off between open research and competitive advantage. Pope reflects on the "disappointing inflection point in 2022" when Google ceased publishing its research, a move that stalled the industry's collective knowledge base. "You could get all of the trend lines of where the best models are going until then, and then that stopped," the article quotes him as saying.
MatX attempts to navigate this by continuing to publish on attention mechanisms and memory efficiency, while withholding their proprietary numerics work on a one-to-two-year delay. This strategy serves a dual purpose: it advocates for better hardware-aware model design while protecting their core IP. The piece argues that this ability to publish remains a key differentiator for hiring top talent in a market where secrecy is becoming the norm.
"We're not selling ML, we're selling GEMMs. But the agenda of our ML team is twofold. First is attention research... The second is numerics."
Bottom Line
Chipstrat's coverage effectively dismantles the myth of inevitable hardware monopolies by grounding the argument in the brutal economics of AI scaling. The strongest part of the piece is its demonstration that the software lock-in narrative is crumbling under the weight of trillion-dollar compute bills. Its biggest vulnerability lies in the execution risk: can a 100-person startup truly manufacture and deploy at the scale required to challenge the incumbents? The reader should watch for the first real-world benchmarks of the MatX One chip, as that will be the ultimate test of whether this hybrid architecture can deliver on its physics-based promises.