Dylan Patel delivers a startling verdict on the most hyped AI release of the year: the model that terrified Western markets is failing as a consumer product because its creators never intended it to be one. While headlines fixated on price wars, Patel uncovers a strategic pivot where DeepSeek sacrificed user experience to hoard computing power for research, turning its own API into a loss-leader for open-source adoption. This isn't just a market analysis; it's a forensic look at how export controls and resource scarcity are reshaping the global AI architecture.
The Illusion of the Price War
The narrative that DeepSeek's success was purely a result of undercutting competitors on price is, according to Patel, a dangerous oversimplification. He argues that the low cost was a byproduct of severe engineering trade-offs, not a sustainable business model for direct consumption. "The answer lies in tokenomics and the myriad of tradeoffs between the KPIs for serving a model," Patel writes, shifting the focus from sticker price to the hidden costs of latency and context.
He illustrates that DeepSeek's own hosted service forces users to endure significant delays to keep costs down. "A big reason why DeepSeek is able to price their product so cheaply is because they force users to wait many seconds before the model responds with the first token." For interactive use, where users expect instant feedback, this is a critical distinction. The data shows that while the model is cheap, its time-to-first-token is often worse than that of competitors charging twice as much.
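The trade-off Patel describes can be sketched as a toy model: packing more requests into each batch amortizes GPU cost across more tokens (cheaper per token) but makes each user wait longer before the first token arrives. Every constant below is an illustrative assumption, not a figure from the article.

```python
# Toy model of the batching trade-off: cost per token falls as batch size
# grows, while time-to-first-token (TTFT) rises. All numbers are assumed.

GPU_COST_PER_HOUR = 2.0          # assumed hourly cost of one accelerator
TOKENS_PER_SEC_PER_GPU = 2000    # assumed decode throughput at full batch

def cost_per_million_tokens(batch_size: int, max_batch: int = 64) -> float:
    """Serving cost per 1M output tokens; throughput scales with utilization."""
    utilization = min(batch_size / max_batch, 1.0)
    tokens_per_hour = TOKENS_PER_SEC_PER_GPU * utilization * 3600
    return GPU_COST_PER_HOUR / tokens_per_hour * 1_000_000

def time_to_first_token(batch_size: int, queue_wait_per_slot: float = 0.08) -> float:
    """Seconds of queueing before decode begins; grows with batch size."""
    return batch_size * queue_wait_per_slot

for b in (4, 16, 64):
    print(f"batch={b:3d}  $/1M tok={cost_per_million_tokens(b):6.2f}  "
          f"TTFT={time_to_first_token(b):5.2f}s")
```

Under these assumed numbers, a 16x larger batch cuts the per-token cost by the same factor while multiplying the wait, which is the shape of the DeepSeek pricing story: the low sticker price is purchased with the user's time.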
Critics might argue that for many batch-processing tasks, latency is irrelevant, but Patel's data on consumer app traffic suggests otherwise. The explosive initial growth has stalled, with web traffic declining in absolute terms while other providers surge. This suggests that for the general market, speed is a non-negotiable feature, not a luxury.
To be clear, this is an active decision by DeepSeek. They are not interested in making money off users or in serving them lots of tokens via a chat app or an API service. The company is singularly focused on reaching AGI and is not interested in end user experience.
The Hardware Reality and the Open Source Strategy
Patel's most compelling insight is that DeepSeek's strategy is a rational response to hardware constraints imposed by international export controls. With access to the latest chips limited, the company cannot afford to burn compute on serving millions of casual users. Instead, they are using open-source releases to win global mind share while reserving their scarce internal resources for training.
"Batching at extremely high rates allows them to use the minimal amount of compute possible for inference and external usage. This keeps the maximal amount of compute internal for research and development purposes." This reframes the "commoditization" fear: DeepSeek isn't trying to sell a product; they are trying to survive a blockade. By letting third-party providers like OpenRouter host their models, they offload the infrastructure burden while still capturing the ecosystem's attention.
The article draws a parallel to the broader Chinese ecosystem, noting that while export controls have "greatly limited China's capability in inferencing models at scale," they have not stopped the training of useful models from giants like Tencent and Alibaba. This nuance is vital—it suggests the bottleneck is in serving, not in creating intelligence. The trade-off is stark: a model that is brilliant but slow and context-limited when hosted directly, versus a model that thrives when distributed across global clouds.
Compute Scarcity as a Global Phenomenon
Patel expands the lens beyond China, arguing that compute scarcity is now a universal constraint affecting even well-funded US companies. He draws a surprising parallel between DeepSeek and Anthropic, noting that both are forced to throttle performance to manage demand. "The reason for this is not unlike DeepSeek's – to manage all the incoming requests with the available compute, they have to batch at higher rates."
This comparison is effective because it demystifies the performance dips seen in popular tools like Claude Code. Patel points out that Anthropic's speed decreased by 40% recently, a direct result of batching strategies similar to those DeepSeek employs. However, he notes a key differentiator: Anthropic's models are more efficient, requiring fewer tokens to answer a question. "Indeed, Claude has the lowest amount of total output tokens for leading reasoning models," Patel observes, which compensates for the slower generation speed.
This highlights a new frontier in AI competition: efficiency per token. It's not just about how smart the model is, but how much intelligence it can deliver for the least amount of compute. As Patel puts it, "It is not just more intelligence, but more intelligence per token produced." This shifts the metric of success from raw capability to resource optimization, a critical pivot for an industry facing a global chip shortage.
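The "intelligence per token" point has a simple arithmetic core: end-to-end latency is time-to-first-token plus output length divided by generation speed, so a concise model can finish sooner even while streaming more slowly. The numbers below are hypothetical, chosen only to illustrate the mechanism Patel attributes to Claude.

```python
# Why fewer output tokens can offset slower generation. A "verbose" model
# streams faster but emits more tokens to answer; a "concise" model
# streams slower but says less. All figures are hypothetical.

def total_response_time(ttft_s: float, output_tokens: int,
                        tokens_per_sec: float) -> float:
    """End-to-end latency = time to first token + decode time."""
    return ttft_s + output_tokens / tokens_per_sec

verbose = total_response_time(ttft_s=0.5, output_tokens=2000, tokens_per_sec=100)
concise = total_response_time(ttft_s=0.9, output_tokens=800, tokens_per_sec=55)

print(f"verbose model: {verbose:.1f}s")  # 20.5s
print(f"concise model: {concise:.1f}s")  # 15.4s
```

Despite generating at roughly half the speed and starting later, the concise model finishes first because it needs far fewer tokens, which is the compensation effect Patel describes.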
Unlike a normal factory, where prices track marginal cost, the token price is a variable that model providers can solve for based on the model's other attributes, such as speed, context length, and batching strategy.
Bottom Line
Patel's analysis cuts through the noise of price-per-token headlines to reveal a structural shift where hardware constraints dictate product strategy more than market demand. The strongest part of this argument is the reframing of DeepSeek's "failure" as a deliberate, rational choice to prioritize research over revenue in the face of geopolitical barriers. The biggest vulnerability is the assumption that third-party hosting can fully compensate for the lack of a polished, direct-to-consumer interface, which remains a significant hurdle for mass adoption. Watch for how other nations respond to these hardware bottlenecks, as the race is no longer just about who builds the smartest model, but who can run it most efficiently.