OpenAI scales PostgreSQL to millions of queries per second for ChatGPT users
The company says careful engineering and a growing shift to sharded databases have allowed it to keep latency low and availability high as ChatGPT usage surged to hundreds of millions of users.
OpenAI has detailed how it scaled its PostgreSQL database infrastructure to handle millions of queries per second for around 800 million users of ChatGPT, while maintaining near-continuous availability.
In a statement, OpenAI said its PostgreSQL load has grown more than tenfold over the past year. The system now runs on a single primary instance of Azure Database for PostgreSQL Flexible Server, supported by nearly 50 read replicas distributed across multiple regions.
For a lay reader, PostgreSQL is a widely used open-source database that stores and retrieves structured data. In large systems, one server typically handles all writes, while additional “read replicas” serve copies of the data to answer queries more quickly and spread the load. That architecture can work at scale, but it comes with trade-offs.
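To make that concrete, here is a minimal sketch of how an application might route traffic in such a setup. The hostnames and the driver choice (psycopg 3) are illustrative assumptions, not details OpenAI has published:

```python
import random

import psycopg  # PostgreSQL driver (psycopg 3)

# Hypothetical endpoints; OpenAI has not published its actual topology.
PRIMARY_DSN = "host=primary.example.internal dbname=app"
REPLICA_DSNS = [
    "host=replica-us.example.internal dbname=app",
    "host=replica-eu.example.internal dbname=app",
]

def run_write(sql: str, params=None):
    """All writes must go through the single primary."""
    with psycopg.connect(PRIMARY_DSN) as conn:
        conn.execute(sql, params)

def run_read(sql: str, params=None):
    """Reads can be served by any replica, spreading the load."""
    with psycopg.connect(random.choice(REPLICA_DSNS)) as conn:
        return conn.execute(sql, params).fetchall()
```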
OpenAI said PostgreSQL’s design relies on a technique called multiversion concurrency control, or MVCC, which allows many users to read and write data at the same time without blocking one another. The trade-off is that every update writes a new row version and leaves the old one behind until a vacuum process reclaims it, which can lead to “write amplification” and table bloat, meaning the database does more work and consumes more storage as traffic increases.
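That accumulation is visible in PostgreSQL’s own statistics views. A short sketch of how an operator might watch dead row versions pile up; the connection string is a placeholder:

```python
import psycopg

# Under MVCC, each UPDATE writes a new row version and leaves the old one
# behind as a "dead tuple" until VACUUM reclaims it. The standard view
# pg_stat_user_tables makes the accumulation visible per table.
with psycopg.connect("host=primary.example.internal dbname=app") as conn:
    rows = conn.execute(
        """
        SELECT relname, n_live_tup, n_dead_tup
        FROM pg_stat_user_tables
        ORDER BY n_dead_tup DESC
        LIMIT 10
        """
    ).fetchall()

for relname, live, dead in rows:
    print(f"{relname}: {live:,} live rows, {dead:,} dead row versions")
```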
As a result, the company said it has been steadily moving write-heavy and easily partitioned workloads to sharded systems such as Azure Cosmos DB, where data is split across many independent servers. New workloads now default to these sharded systems, and OpenAI said adding new tables to the existing PostgreSQL deployment is no longer allowed.
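Sharding itself is conceptually simple: a partition key deterministically maps each record to one of many independent servers. A toy sketch of hash-based routing follows, with made-up shard addresses; managed systems such as Cosmos DB handle this routing, plus rebalancing, internally:

```python
import hashlib

# Hypothetical shard endpoints, for illustration only.
SHARD_DSNS = [
    "host=shard-0.example.internal dbname=app",
    "host=shard-1.example.internal dbname=app",
    "host=shard-2.example.internal dbname=app",
]

def shard_for(key: str) -> str:
    """Hash the partition key to pick a shard deterministically."""
    digest = hashlib.sha256(key.encode()).digest()
    index = int.from_bytes(digest[:8], "big") % len(SHARD_DSNS)
    return SHARD_DSNS[index]

# The same key always routes to the same shard, so each server
# owns a fixed slice of the data and its write traffic.
print(shard_for("user:12345"))
```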
To keep the remaining PostgreSQL setup stable, OpenAI described a series of operational changes aimed at reducing pressure on the primary database. Read traffic is aggressively offloaded to replicas, and background tasks such as data backfills are rate-limited to avoid spikes. The company also uses PgBouncer, a connection-pooling tool, to manage database connections more efficiently.
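PgBouncer does its pooling as a separate proxy layer, but the underlying idea can be sketched with a client-side pool. The pool sizes and connection string below are illustrative assumptions, not OpenAI’s configuration:

```python
from psycopg_pool import ConnectionPool

# A connection pool opens connections once and reuses them, so each
# query skips the TCP/TLS/auth handshake that makes fresh connections
# expensive. PgBouncer applies the same principle as a shared proxy.
pool = ConnectionPool(
    "host=primary.example.internal dbname=app",
    min_size=4,   # keep warm connections ready
    max_size=20,  # cap the connections the server ever sees
)

def fetch_one(sql: str):
    with pool.connection() as conn:  # borrow, then return to the pool
        return conn.execute(sql).fetchone()
```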
That change alone had a marked impact. OpenAI said average connection times fell from about 50 milliseconds to around 5 milliseconds in internal benchmarks, helping prevent the primary server from hitting its 5,000-connection limit.
The primary database runs in a high-availability configuration with a hot standby, meaning a secondary instance is kept ready to take over if the main one fails. OpenAI added that it is testing cascading replication with the Azure PostgreSQL team, a technique that allows replicas to feed other replicas, reducing the load on the primary server as the number of read copies grows.
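Replication health of this kind can be observed from the primary itself. A small example, again with an assumed connection string, using PostgreSQL’s pg_stat_replication view, which lists each directly attached standby and its replay lag; with cascading replication, mid-tier replicas expose the same view for the standbys that feed from them:

```python
import psycopg

# Run on the primary (or any replica that feeds others) to see how far
# behind each attached standby is.
with psycopg.connect("host=primary.example.internal dbname=postgres") as conn:
    for name, state, lag in conn.execute(
        "SELECT application_name, state, replay_lag FROM pg_stat_replication"
    ):
        print(f"{name}: {state}, replay lag = {lag}")
```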
According to the company, these measures have allowed it to maintain near-zero replication lag, 99th-percentile (p99) client-side latency in the low tens of milliseconds, and “five nines” availability, or 99.999% uptime, even while operating close to 50 read replicas.
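As a back-of-the-envelope check on what those availability figures mean in practice:

```python
# What an availability target leaves as a yearly downtime budget:
SECONDS_PER_YEAR = 365 * 24 * 60 * 60

for availability in (0.999, 0.9999, 0.99999):
    downtime_min = SECONDS_PER_YEAR * (1 - availability) / 60
    print(f"{availability:.3%} uptime -> {downtime_min:.1f} min of downtime per year")

# 99.900% uptime -> 525.6 min of downtime per year
# 99.990% uptime -> 52.6 min of downtime per year
# 99.999% uptime -> 5.3 min of downtime per year
```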
OpenAI reported just one SEV-0, or highest-severity, PostgreSQL incident in the past 12 months. It occurred during the viral launch of ChatGPT ImageGen, when write traffic surged more than tenfold as over 100 million new users signed up within a single week.
Looking ahead, the company said it will continue migrating remaining write-heavy workloads away from PostgreSQL, work with Azure to enable cascading replication in production, and evaluate sharded PostgreSQL and other distributed database systems for future scalability.
The account highlights a broader reality of running consumer-scale AI services. While cutting-edge models often grab attention, the reliability of underlying databases remains critical. OpenAI’s experience suggests that even mature technologies like PostgreSQL can be pushed to extraordinary limits, provided there is a willingness to redesign workloads, impose constraints and accept that no single system can do everything forever.
The Recap
- OpenAI scaled PostgreSQL to millions of queries per second.
- Deployed a single primary and nearly 50 read replicas globally.
- Remaining write-heavy workloads are migrating to sharded systems.