OpenAI builds hybrid billing engine to keep Codex and Sora running at peak demand
The system blends rate limits with purchasable credits so users are not cut off when traffic spikes
OpenAI has built a real-time access engine that combines rate limits with purchasable credits, allowing users of its Codex coding agent and Sora video generator to keep working when demand exceeds baseline capacity.
The company said existing approaches forced a choice between hard rate limits, which cut users off at a fixed threshold, and pure usage-based billing, which charges from the first request.
Its hybrid system instead enforces rate limits first and, when a user's allocation runs out, switches to credit-based access within the same request, with the decision made in real time.
Under the hood, the engine tracks per-user, per-feature usage, maintains rate-limit windows and live credit balances, and settles charges through a streaming asynchronous processor.
Every request follows a single evaluation path: it consumes rate-limit allowance synchronously, checks the credit balance only when that allowance is exhausted, and returns one definitive outcome. Any credit debit is then settled asynchronously.
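OpenAI has not published code for this path. As a rough illustration only, the flow might look like the Python sketch below. The names `Account`, `evaluate` and `settle` are hypothetical, the rate-limit window is reduced to a simple counter, and an in-memory queue stands in for the streaming processor.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Account:
    rate_limit: int                 # requests allowed per window (hypothetical)
    used_in_window: int = 0         # requests already consumed in this window
    credits: int = 0                # last settled credit balance
    pending_debits: deque = field(default_factory=deque)


def evaluate(account: Account, cost: int) -> str:
    """Return one outcome per request: 'rate_limit', 'credits' or 'denied'."""
    # Step 1: consume rate-limit allowance synchronously while any remains.
    if account.used_in_window < account.rate_limit:
        account.used_in_window += 1
        return "rate_limit"
    # Step 2: allowance exhausted, so fall back to the credit balance.
    if account.credits >= cost:
        # Step 3: allow the request now and queue the debit for later.
        account.pending_debits.append(cost)
        return "credits"
    return "denied"


def settle(account: Account) -> None:
    """Stand-in for the asynchronous processor: apply queued debits."""
    while account.pending_debits:
        account.credits -= account.pending_debits.popleft()
```

With `Account(rate_limit=1, credits=3)` and a cost of 3, the first call is covered by the rate limit and the second by credits. Because the balance in this sketch only changes when `settle` runs, requests arriving before settlement see a slightly stale figure.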
OpenAI said the billing architecture ties together three datasets: product usage events, monetisation events and balance updates, enabling independent audit and reconciliation.
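One way to picture how three linked datasets support reconciliation is the Python sketch below. It is an assumption-laden illustration, not OpenAI's schema: the field names `usage_id`, `charge_id` and `amount`, and the function `reconcile`, are invented for the example.

```python
def reconcile(usage, monetisation, balance_updates):
    """Return a list of (problem, charge_id) pairs found across the datasets."""
    usage_ids = {u["usage_id"] for u in usage}
    problems = []

    # Every monetisation event should point back to a recorded usage event.
    for m in monetisation:
        if m["usage_id"] not in usage_ids:
            problems.append(("charge_without_usage", m["charge_id"]))

    # Sum the balance updates applied for each charge.
    applied = {}
    for b in balance_updates:
        applied[b["charge_id"]] = applied.get(b["charge_id"], 0) + b["amount"]

    # Every charge should be matched by balance updates of the same total.
    for m in monetisation:
        if applied.get(m["charge_id"], 0) != m["amount"]:
            problems.append(("amount_mismatch", m["charge_id"]))

    return problems
```

An empty result means the three datasets agree; any entry names the charge an auditor would need to investigate.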
Each transaction carries a stable idempotency key, a unique identifier that prevents the same charge from being applied twice.
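The general idempotency-key technique can be sketched in a few lines of Python. This is a minimal illustration under assumed names (`Ledger`, `debit`), with a dictionary standing in for durable storage; it is not a description of OpenAI's implementation.

```python
class Ledger:
    def __init__(self, balance):
        self.balance = balance
        self.applied = {}  # idempotency key -> amount already debited

    def debit(self, key, amount):
        """Apply the debit once; return False if the key was already seen."""
        if key in self.applied:
            return False  # replayed request: leave the balance untouched
        self.applied[key] = amount
        self.balance -= amount
        return True
```

Calling `debit("req-1", 4)` twice on a ledger holding 10 credits leaves a balance of 6, because the second call is recognised as a replay.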
Balance updates are applied asynchronously but in near real time, with automatic refunds issued if an update overshoots a user's balance.
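OpenAI did not detail how the refund is calculated. One plausible reading is that a debit larger than the remaining balance is partly refunded so the balance stops at zero. The Python sketch below follows that assumption, and the function name `apply_debit` is hypothetical.

```python
def apply_debit(balance: int, amount: int) -> tuple[int, int]:
    """Return (new_balance, refund); the balance never drops below zero."""
    # The overshoot is the part of the debit the balance cannot cover.
    overshoot = max(0, amount - balance)
    refund = overshoot
    new_balance = balance - amount + refund
    return new_balance, refund
```

Under this reading, a 3-credit debit against a 2-credit balance leaves the balance at zero and refunds 1 credit.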
The company said balance decrements and their corresponding records are written in a single atomic database transaction and serialised per account to prevent race conditions, a class of bug in which simultaneous operations produce incorrect results.
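The atomic-write pattern can be illustrated with Python's built-in `sqlite3` module. This is a sketch under stated assumptions: the table names, the `debit` function and the `CHECK` constraint used to force a failure are all invented for the example, and SQLite's `BEGIN IMMEDIATE` serialises every writer on the database rather than per account as OpenAI describes.

```python
import sqlite3

# isolation_level=None lets the code issue BEGIN/COMMIT/ROLLBACK itself.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute(
    "CREATE TABLE balances ("
    "account TEXT PRIMARY KEY, credits INTEGER CHECK (credits >= 0))"
)
conn.execute("CREATE TABLE debits (account TEXT, amount INTEGER)")
conn.execute("INSERT INTO balances VALUES ('acct-1', 10)")


def debit(account: str, amount: int) -> bool:
    """Write the debit record and the balance decrement together, or neither."""
    conn.execute("BEGIN IMMEDIATE")  # take the write lock before any change
    try:
        conn.execute("INSERT INTO debits VALUES (?, ?)", (account, amount))
        conn.execute(
            "UPDATE balances SET credits = credits - ? WHERE account = ?",
            (amount, account),
        )
        conn.execute("COMMIT")
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint failed, so undo the debit record as well.
        conn.execute("ROLLBACK")
        return False
```

If the balance update fails, the rollback also discards the debit record, so the two tables cannot drift apart.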
Codex and Sora are the first products to use the system, but OpenAI said the infrastructure is designed to extend to additional services over time.
The Recap
- OpenAI built a real-time access engine for Codex and Sora.
- The system tracks usage, rate-limit windows and credit balances.
- Billing links usage events, monetisation events and balance updates for audit.