
How OpenAI Scaled to 800 Million Users with Postgres

In an era where every scaling crisis is met with a reflexive rush to shard databases, Alex Xu presents a counterintuitive case study that challenges the very foundation of modern distributed systems dogma. The claim is audacious: OpenAI scaled PostgreSQL to serve 800 million users and handle millions of queries per second with a single primary writer, relying on aggressive optimization and read replicas rather than sharding. For busy engineers and architects, this isn't just a technical deep dive; it's a strategic reminder that pragmatism often beats theoretical perfection.

The Myth of the Sharding Threshold

Xu opens by dismantling the industry's default assumption. "The common wisdom suggests that beyond a certain scale, you must shard the database or risk failure," he writes, immediately setting up the central tension of the piece. Instead of following the conventional playbook of splitting data across independent databases—a move that introduces immense complexity—the OpenAI engineering team chose to push a single primary PostgreSQL instance to its absolute limit.


This decision was driven by a starkly pragmatic calculation. Xu notes that sharding would have required "modifying hundreds of application endpoints and could take months or years to complete." In the fast-paced world of AI development, that timeline was a non-starter. The team realized their workload was overwhelmingly read-heavy, making a single-primary architecture viable if they could eliminate bottlenecks elsewhere. This is a crucial distinction often lost in system design debates: scale is not just about raw capacity, but about matching architecture to workload characteristics.

Critics might argue that relying on a single writer creates a single point of failure that no amount of optimization can fully mitigate. While valid, Xu's evidence suggests that for read-dominant systems, the operational cost of sharding often outweighs the theoretical risk of a bottleneck, provided the primary is treated with surgical precision.

"OpenAI's success came not from adopting the latest distributed database technology but from deeply understanding their workload characteristics and eliminating bottlenecks."

The Three Pillars of Optimization

The core of Xu's argument rests on three specific pillars, each addressing a different layer of the stack. First, minimizing load on the primary. The team didn't just offload reads; they migrated write-heavy workloads to sharded systems like Azure Cosmos DB, leaving PostgreSQL to do what it does best. "They implemented lazy writes where appropriate to smooth traffic spikes rather than hitting the database with sudden bursts," Xu explains. This approach of smoothing traffic rather than fighting it is a masterclass in operational discipline.
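
The article doesn't show OpenAI's implementation, but the lazy-write pattern is simple to illustrate. Below is a minimal Python sketch under stated assumptions: `LazyWriter` and its `flush_batch` callback are hypothetical names, and the buffer-and-flush loop stands in for whatever queueing machinery OpenAI actually uses.

```python
import queue
import threading
import time


class LazyWriter:
    """Buffers non-urgent writes and flushes them in periodic batches,
    so the primary sees a steady trickle instead of sudden bursts."""

    def __init__(self, flush_batch, interval_s=1.0, max_batch=500):
        self._flush_batch = flush_batch  # callable that persists a list of rows
        self._buffer = queue.Queue()
        self._interval_s = interval_s
        self._max_batch = max_batch
        threading.Thread(target=self._run, daemon=True).start()

    def write(self, row):
        # Callers return immediately; durability is deliberately deferred,
        # which only suits data that tolerates a small loss window.
        self._buffer.put(row)

    def _run(self):
        while True:
            time.sleep(self._interval_s)
            batch = []
            while not self._buffer.empty() and len(batch) < self._max_batch:
                batch.append(self._buffer.get())
            if batch:
                self._flush_batch(batch)  # one batched INSERT instead of N round trips
```

The trade-off is explicit: a crash loses whatever is still buffered, which is why this pattern suits usage counters and activity logs rather than anything transactional.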

Second, the article dives into query and connection optimization. Xu highlights a particularly instructive failure mode: a single query joining 12 tables that caused high-severity incidents. "The team learned to avoid complex multi-table joins in their OLTP system," he writes, opting instead to move join logic to the application layer. This is a significant departure from the convenience of Object-Relational Mapping (ORM) frameworks, which Xu notes can "produce inefficient queries." By manually reviewing ORM-generated SQL and implementing strict timeouts, they prevented long-running queries from blocking the database's cleanup processes.
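
Xu doesn't publish the queries involved, but the two defenses he describes, strict timeouts and application-side joins, are easy to sketch. The following Python example uses psycopg2; the `conversations` and `orgs` tables and the 2-second limit are invented for illustration, while `statement_timeout` itself is a standard PostgreSQL setting.

```python
import psycopg2

conn = psycopg2.connect("dbname=app")
user_id = 42  # example value

with conn.cursor() as cur:
    # A server-side timeout: any statement running longer than 2 seconds is
    # cancelled before it can block vacuum or pile up on the primary.
    cur.execute("SET statement_timeout = '2s'")

    # Instead of one sprawling multi-table join, issue small indexed lookups
    # and combine the results in the application.
    cur.execute("SELECT id, org_id FROM conversations WHERE user_id = %s", (user_id,))
    conversations = cur.fetchall()

    org_ids = list({org_id for _, org_id in conversations})
    cur.execute("SELECT id, name FROM orgs WHERE id = ANY(%s)", (org_ids,))
    org_names = dict(cur.fetchall())

    # The application-layer "join": stitch the rows together in memory.
    result = [(conv_id, org_names.get(org_id)) for conv_id, org_id in conversations]
```

Each lookup here is cheap, predictable, and individually cacheable, which is exactly what the team wanted from its OLTP path.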

The third pillar focuses on preventing cascading failures. Xu describes a dangerous feedback loop where cache misses trigger database spikes, leading to retries that amplify the load. To break this cycle, OpenAI implemented a "cache locking and leasing mechanism." When multiple requests miss the cache, only one fetches the data while others wait. This simple logic prevents the database from being hammered by a thundering herd. "This targeted load shedding enables rapid recovery from sudden surges of expensive queries," Xu concludes, emphasizing that resilience is often about what you block, not just what you allow.
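
The article names the pattern but not its implementation. One common way to realize cache leasing is an atomic `SET NX` lock in Redis, sketched below in Python; `get_with_lease`, the lease-key naming, and the timing values are assumptions, not OpenAI's actual code.

```python
import time
import redis

r = redis.Redis()

def get_with_lease(key, fetch_from_db, ttl_s=300, lease_ms=5000):
    """Cache read with leasing: on a miss, only one caller recomputes
    the value; the rest poll until the cache is repopulated."""
    value = r.get(key)
    if value is not None:
        return value

    # SET NX PX atomically grants the lease to exactly one caller.
    if r.set(f"lease:{key}", "1", nx=True, px=lease_ms):
        value = fetch_from_db()            # the single database hit
        r.set(key, value, ex=ttl_s)
        r.delete(f"lease:{key}")
        return value

    # Everyone else waits for the lease holder instead of hammering Postgres.
    deadline = time.time() + lease_ms / 1000
    while time.time() < deadline:
        time.sleep(0.05)
        value = r.get(key)
        if value is not None:
            return value
    return fetch_from_db()  # lease expired unfulfilled; fall back to one direct read
```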

Navigating Architectural Constraints

No system is without its historical baggage, and Xu does not shy away from PostgreSQL's specific constraints. He explains how Multi-Version Concurrency Control (MVCC), a design dating back to the early 1990s that allows concurrent transactions to proceed without blocking one another, "creates challenges for write-heavy workloads," as Xu writes: because every update produces a new row version rather than modifying data in place, heavy write traffic causes write amplification and table bloat. The team's solution was to migrate these specific workloads away from Postgres entirely, a decision that acknowledges the database's strengths while respecting its limits.
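
A concrete way to see the bloat Xu describes is PostgreSQL's own statistics: every updated or deleted row leaves a dead tuple behind until vacuum reclaims it. The query below against the standard pg_stat_user_tables view is a generic monitoring sketch, not something from the article.

```python
import psycopg2

conn = psycopg2.connect("dbname=app")
with conn.cursor() as cur:
    # Rank tables by dead tuples; a high dead/live ratio signals the
    # write-amplification and bloat problems MVCC creates under heavy writes.
    cur.execute("""
        SELECT relname,
               n_live_tup,
               n_dead_tup,
               round(n_dead_tup::numeric / GREATEST(n_live_tup, 1), 2) AS dead_ratio
        FROM pg_stat_user_tables
        ORDER BY n_dead_tup DESC
        LIMIT 10
    """)
    for name, live, dead, ratio in cur.fetchall():
        print(f"{name}: {dead} dead vs {live} live tuples (ratio {ratio})")
```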

Similarly, the article addresses the pain of schema changes. In many systems, altering a column type can trigger a full table rewrite, locking the database for hours. OpenAI enforced strict rules: "Only lightweight schema changes are permitted," and "All schema changes have a 5-second timeout." This discipline prevents the database from grinding to a halt during maintenance, a lesson that echoes the early days of database administration when a single bad migration could take down a service for a day. The team also leverages Write-Ahead Log (WAL) streaming to keep replicas synchronized, though Xu notes the network bandwidth limits of this approach as the number of replicas grows.
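
The article states the 5-second rule but not its mechanism. In PostgreSQL, the natural enforcement is the lock_timeout setting, and "lightweight" typically means metadata-only changes such as adding a nullable column. The sketch below assumes a hypothetical conversations table.

```python
import psycopg2

conn = psycopg2.connect("dbname=app")
conn.autocommit = True

with conn.cursor() as cur:
    # If the ALTER cannot acquire its lock within 5 seconds (say, a long
    # transaction is holding the table), it fails fast instead of queueing
    # up and blocking every query behind it.
    cur.execute("SET lock_timeout = '5s'")

    # A "lightweight" change: adding a nullable column is a metadata-only
    # update in Postgres and does not rewrite the table. Changing a column's
    # type, by contrast, can force a full rewrite and would be off-limits.
    cur.execute("ALTER TABLE conversations ADD COLUMN archived_at timestamptz")
```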

The Human Cost of Complexity

While the technical details are dense, the underlying message is about the human cost of architectural complexity. Xu points out that sharding is not just a technical challenge but an organizational one. "Sharding would require modifying hundreds of application endpoints," he writes, implying a massive coordination effort across engineering teams. By avoiding this, OpenAI preserved its velocity. The article serves as a reminder that the most scalable system is often the one that allows your team to move fast without breaking things.

"The road wasn't easy. In this article, we will look at the challenges OpenAI faced while scaling Postgres and how the team handled the various scenarios."

This admission of difficulty is refreshing. Xu does not present the solution as a magic bullet but as the result of rigorous optimization, careful monitoring, and operational discipline. The team achieved five-nines availability (99.999% uptime) not by magic, but by systematically eliminating failure points. Even with these measures, they faced a SEV-0 incident during the viral launch of a new feature, proving that no architecture is immune to the chaos of viral growth.

Bottom Line

Alex Xu's analysis delivers a powerful verdict: the rush to distributed databases is often premature, and a single-primary PostgreSQL instance can handle massive scale if treated with extreme care. The strongest part of this argument is its grounding in real-world constraints, specifically the time and complexity costs of sharding. The biggest vulnerability remains the inherent risk of a single writer, a risk that will only grow as the user base expands further. For now, the lesson is clear: optimize the boring stuff before you build the complex stuff. Watch for how OpenAI's use of cascading replication evolves, as that will be the true test of whether this architecture can scale indefinitely without a fundamental redesign.

Sources

Alex Xu, "How OpenAI Scaled to 800 Million Users with Postgres"

