
Role of Storage in AI, Primer on NAND Flash, and Deep-Dive into QLC SSDs

Vikram Sekar makes a compelling case that the bottleneck in artificial intelligence isn't just about how fast chips can think, but how quickly they can remember. While the industry fixates on graphics processing units, Sekar argues that the silent hero of the AI revolution is the storage hierarchy, specifically the evolution of NAND flash technology. This is a crucial pivot for investors and engineers who might otherwise overlook the physical constraints of data movement in favor of raw compute power.

The Hidden Cost of Idle Compute

Sekar begins by dismantling the assumption that faster processors alone solve AI scaling problems. He writes, "What gets less attention is the design of high capacity storage for AI workloads." This observation is vital because the economics of AI training are brutal; every second a powerful GPU sits waiting for data is money burned. Sekar explains that frontier models require tens of trillions of tokens, and the sheer volume of data needed for inference—handling hundreds of millions of user queries—demands storage systems capable of holding 50 to 100 petabytes in a single rack.
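To see why idle accelerators dominate the economics, consider a rough back-of-envelope calculation. The sketch below is not from Sekar's article; the cluster size, GPU price, stall fraction, and run length are all illustrative assumptions.

```python
# Back-of-envelope sketch of the "idle compute" cost described above.
# Every figure here is an illustrative assumption, not from the article.

GPU_HOURLY_COST = 3.00   # assumed cloud price per GPU-hour (USD)
CLUSTER_GPUS = 10_000    # assumed training cluster size
STALL_FRACTION = 0.10    # assumed share of wall-clock time GPUs wait on storage
TRAINING_DAYS = 30       # assumed length of the training run

idle_gpu_hours = CLUSTER_GPUS * TRAINING_DAYS * 24 * STALL_FRACTION
wasted_dollars = idle_gpu_hours * GPU_HOURLY_COST

print(f"Idle GPU-hours: {idle_gpu_hours:,.0f}")
print(f"Cost of storage-induced stalls: ${wasted_dollars:,.0f}")
# -> Idle GPU-hours: 720,000
# -> Cost of storage-induced stalls: $2,160,000
```

Even a 10 percent stall rate on a modest cluster burns millions of dollars over a single run, which is why storage throughput is treated as a first-order design constraint.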


The author effectively reframes the data center not as a collection of brains, but as a complex logistics network. He notes that "underutilizing GPU time quickly adds to model training costs," a stark reminder that efficiency is the primary currency in this sector. The hierarchy he describes moves from the ultra-fast, expensive High Bandwidth Memory (HBM) closest to the chip, down to the slower, cheaper hard disk drives used for cold storage. The efficiency of this flow determines whether a data center is a profit center or a money pit.
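To make that hierarchy concrete, here is a minimal sketch of the tiers Sekar describes, modeled as a simple Python structure. The latency and cost figures are order-of-magnitude assumptions for illustration, not numbers from the article.

```python
# A minimal sketch of the datacenter storage hierarchy. All latency and
# cost figures are order-of-magnitude assumptions, not values from the
# article. (HBM and DRAM have similar latency; HBM's real advantage is
# bandwidth to the GPU.)
from dataclasses import dataclass

@dataclass
class Tier:
    name: str
    rough_latency: str        # typical access time, order of magnitude
    rough_cost_per_gb: float  # assumed USD/GB

HIERARCHY = [
    Tier("HBM (on-package)",     "~100 ns", 8.0),
    Tier("DRAM (host memory)",   "~100 ns", 3.0),
    Tier("Local NVMe SSD (TLC)", "~100 us", 0.08),
    Tier("Networked SSD (QLC)",  "~1 ms",   0.05),
    Tier("HDD (archival)",       "~10 ms",  0.015),
]

for tier in HIERARCHY:
    print(f"{tier.name:<24} {tier.rough_latency:>8}  ~${tier.rough_cost_per_gb}/GB")
```

The pattern to notice is the rough inverse relationship: each step down the hierarchy trades about an order of magnitude in speed for a lower cost per gigabyte.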

Critics might argue that emerging technologies like optical interconnects or new memory architectures could eventually render this specific storage hierarchy obsolete. However, Sekar's grounding in current physical constraints makes his analysis immediately actionable for anyone deploying infrastructure today.

"The efficiency of this hierarchy determines GPU utilization. A bottleneck at any level translates directly into idle compute time and wasted resources."

The Physics of Density

The core of Sekar's technical argument lies in the trade-offs of NAND flash cells. He details how the industry has moved from single-level cells (SLC), which store one bit per cell, to quad-level cells (QLC), which store four bits. This shift is not merely a marketing upgrade; it is a fundamental physical compromise. Sekar writes, "The downside to storing multiple bits in a cell requires charge storage levels to be precisely controlled, and there are many more voltage reference levels that must be determined for every cell."

This precision comes at a cost. As Sekar points out, higher density results in higher latency and lower endurance. The more voltage levels a cell must distinguish between, the longer it takes to read or write data, and the more stress is placed on the cell's oxide layer. He explains that while QLC offers incredible storage density, making it a potential replacement for hard disk drives in some scenarios, it introduces significant thermal stability issues and requires more capable SSD controllers to handle the heavier error correction.
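The scaling behind this trade-off is simple to state: a cell storing b bits must distinguish 2^b charge states separated by 2^b - 1 reference levels, and each additional bit adds at least one more sense comparison per read. The snippet below sketches that arithmetic; it illustrates the scaling argument only, not any vendor's actual sensing scheme.

```python
# Why more bits per cell means slower reads: a cell storing b bits must
# distinguish 2**b charge states, separated by 2**b - 1 reference levels.
# A read that binary-searches those levels needs roughly b comparisons.

for name, bits in [("SLC", 1), ("MLC", 2), ("TLC", 3), ("QLC", 4)]:
    states = 2 ** bits
    reference_levels = states - 1
    min_read_comparisons = bits  # log2(states), assuming binary search
    print(f"{name}: {bits} bit(s)/cell, {states} charge states, "
          f"{reference_levels} reference levels, "
          f">= {min_read_comparisons} sense comparisons per read")
```

Note that the SLC row (two states, one reference level, one comparison) matches the description quoted later in this piece.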

The author's breakdown of the "charge trap flash" technology versus the older "floating gate" design provides necessary context for why modern drives behave the way they do. He notes that the industry has largely adopted charge trap flash due to "better cell endurance, lower leakage and better ease of 3D stacking." This technical nuance is often glossed over in broader market reports, yet it dictates the reliability of the storage systems powering the next generation of AI models.

The QLC vs. HDD Battle

Perhaps the most significant implication of Sekar's analysis is the looming displacement of traditional hard disk drives. For decades, spinning disks have been the go-to for archival storage due to their low cost per terabyte. However, Sekar suggests that the advent of QLC-based storage systems is changing the equation: he devotes a section to "what the advent of QLC-based storage systems means for HDD storage," implying a shift where the speed advantage of flash begins to outweigh the cost advantage of magnetic media.

This transition is not without friction. Sekar highlights that testing QLC SSDs presents unique challenges, particularly regarding performance consistency under heavy load. He argues that while QLC offers a middle ground between the speed of TLC (triple-level cell) flash and the capacity of HDDs, it requires a rethinking of how data centers are architected. The move to QLC isn't just about buying bigger drives; it's about redesigning the entire data flow to accommodate the specific latency profiles of these new cells.

A counterargument worth considering is that the cost per gigabyte of HDDs may remain lower for the foreseeable future, especially for truly "cold" data that is rarely accessed. Sekar acknowledges the role of HDDs for long-term archival but suggests that as AI workloads demand faster retrieval of historical data, the line between "warm" and "cold" storage will blur, favoring the speed of flash.
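A rough cost model shows why the crossover is about access patterns rather than capacity alone. In the hedged sketch below, every price and latency figure is an assumption chosen for illustration, not data from Sekar's piece.

```python
# A hedged sketch of the QLC-vs-HDD economics discussed above. Prices and
# access-time figures are assumptions for illustration only.

HDD_COST_PER_TB = 15.0   # assumed USD/TB for nearline HDD
QLC_COST_PER_TB = 45.0   # assumed USD/TB for high-capacity QLC SSD
HDD_ACCESS_MS = 10.0     # assumed random-access latency
QLC_ACCESS_MS = 0.5      # assumed random-access latency

capacity_tb = 1_000  # a 1 PB slice of "warm" data

hdd_capex = capacity_tb * HDD_COST_PER_TB
qlc_capex = capacity_tb * QLC_COST_PER_TB
premium = qlc_capex - hdd_capex
speedup = HDD_ACCESS_MS / QLC_ACCESS_MS

print(f"HDD capex: ${hdd_capex:,.0f}, QLC capex: ${qlc_capex:,.0f}")
print(f"Flash premium: ${premium:,.0f} for ~{speedup:.0f}x faster random access")
# Whether the premium is worth paying depends on how often the data is
# touched -- exactly the "warm vs. cold" blurring Sekar anticipates.
```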

"Since only one charge level exists (other than zero charge), this kind of cell is called a single layer cell (SLC). SLC flash is typically fast because only one voltage comparison needs to be made to the reference level."

Bottom Line

Vikram Sekar's analysis succeeds by shifting the focus from the glamour of AI compute to the gritty reality of data movement, proving that storage architecture is the unsung determinant of AI scalability. While the piece is dense with technical specifics on voltage levels and cell endurance, its strongest argument is economic: the cost of idle GPUs makes storage efficiency a non-negotiable priority. The biggest vulnerability in this narrative is the assumption that QLC technology will scale perfectly to replace HDDs without unforeseen reliability issues, but the trajectory toward flash-only data centers seems increasingly inevitable. Investors and engineers should watch how quickly the industry adapts its infrastructure to these new physical limits, as the next bottleneck in AI will likely be found in the storage rack, not the processor.

Sources

Role of Storage in AI, Primer on NAND Flash, and Deep-Dive into QLC SSDs

by Vikram Sekar · Vik's Newsletter


A lot of attention is given to compute, memory and networking in an AI data center - from how quickly GPUs get data from HBM, to how fast the interconnects in a datacenter are. What gets less attention is the design of high capacity storage for AI workloads.

Frontier models are trained on tens of trillions of tokens, which typically occupy multiple terabytes of training data depending on the precision format used. During the training process, voluminous amounts of data need to be delivered to the GPU on time for computation because underutilizing GPU time quickly adds to model training costs.
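As a quick sanity check on those numbers, the token count and bytes-per-token below are illustrative assumptions, not figures from the article:

```python
# Rough sizing of a frontier training set, following the paragraph above.

tokens = 15e12       # "tens of trillions of tokens" -- assume 15T
bytes_per_token = 2  # e.g. 16-bit token IDs; raw text varies with format

dataset_bytes = tokens * bytes_per_token
print(f"~{dataset_bytes / 1e12:.0f} TB of tokenized training data")
# -> ~30 TB
```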

The storage demands on the inference side are even higher. With 800 million ChatGPT users and counting, all the queries, documents, images, and videos generated or uploaded need to be stored somewhere, and query results must be returned in a matter of milliseconds, or a few seconds at most. This requires storage in the range of 50-100 petabytes (1 PB = 1,000 TB) in a single storage rack.
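For scale, here is a rough count of the drives such a rack implies, assuming a 61.44 TB QLC SSD as the building block (an assumed capacity point, not one named in the article):

```python
# How many drives does a 100 PB storage rack imply?

rack_capacity_pb = 100
ssd_capacity_tb = 61.44  # assumed high-capacity QLC SSD

drives_needed = (rack_capacity_pb * 1_000) / ssd_capacity_tb
print(f"~{drives_needed:.0f} QLC SSDs for {rack_capacity_pb} PB")
# -> ~1628 QLC SSDs
```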

All these workloads require fast access to high-capacity storage, which is present as local SSDs within the compute tray, networked SSDs in a storage server connected over fabrics like Ethernet/InfiniBand, or massive clusters of HDDs for long-term archival.

In this article, we will look at how data flows during training and inference, understand storage hierarchies in a datacenter, develop a thorough understanding of NAND flash technologies, and discuss emerging QLC NAND flash in quite some detail.

For free subscribers:

Role of Storage in LLMs: How the interplay between HBM and SSDs is essential to training and inference.

Storage hierarchy in AI datacenters: Understanding the cost, capacity and speed constraints of various storage devices.

Basic types of NAND cells: Floating ...