OpenAI develops internal AI agent to accelerate data-driven decision making
Platform serves over 3,500 staff with reasoning grounded in 70,000 datasets and institutional memory
OpenAI has built an internal artificial intelligence agent designed to help employees explore its data platform and generate faster answers to business and product questions.
The company said in a blog post that the agent is for internal use only and integrates with its own data, workflows and permission systems. It uses OpenAI’s Codex, the GPT‑5.2 model, the Evals API and the Embeddings API.
The tool serves more than 3,500 internal users and spans 70,000 datasets and 600 petabytes of data. It is available through Slack, a web interface, integrated development environments, the Codex command-line interface via the Model Context Protocol (MCP), and OpenAI’s internal ChatGPT app through an MCP connector.
According to the company, the agent moves users from question to insight in minutes and includes a memory system that learns continuously from interactions.
Its reasoning draws on multiple context layers, including dataset usage and lineage, human annotations, Codex-derived code enrichment, institutional content from Slack, Google Docs and Notion, persistent memory, and live queries executed at runtime.
A daily offline pipeline aggregates usage data, annotations and code-based enrichment into embeddings, which are used at query time for retrieval-augmented generation. OpenAI said this ensures that the agent can surface the most relevant context in response to user questions.
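To illustrate the general pattern of precomputing embeddings offline and searching them at query time, here is a minimal sketch. It is not OpenAI's implementation: the `embed` function is a toy bag-of-words stand-in for a real embedding model, and the dataset names and descriptions are hypothetical.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_index(docs: dict[str, str]) -> dict[str, Counter]:
    """Offline step: embed each dataset's context once (for example, daily)."""
    return {name: embed(text) for name, text in docs.items()}


def retrieve(index: dict[str, Counter], question: str, k: int = 2) -> list[str]:
    """Query-time step: rank datasets by similarity to the question."""
    q = embed(question)
    ranked = sorted(index, key=lambda name: cosine(q, index[name]), reverse=True)
    return ranked[:k]


# Hypothetical dataset descriptions, standing in for usage data and annotations.
DOCS = {
    "daily_active_users": "daily active users per product and region",
    "billing_invoices": "invoices issued to customers with amount and currency",
    "support_tickets": "customer support tickets with status and priority",
}
INDEX = build_index(DOCS)

if __name__ == "__main__":
    print(retrieve(INDEX, "how many active users per region", k=1))
```

In a retrieval-augmented setup, the names returned by `retrieve` would be used to pull the matching context into the model's prompt before it writes a query.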
Quality control is enforced through the Evals framework, which pairs natural-language queries with validated SQL examples. The company said it compares generated SQL and output against known results to detect regressions.
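A regression check of this kind can be sketched as follows. This is an illustration under stated assumptions, not the Evals API: it uses Python's built-in `sqlite3` module, a hypothetical `orders` table, and an order-insensitive comparison of result rows, none of which come from OpenAI's description.

```python
import sqlite3


def run(conn: sqlite3.Connection, sql: str) -> list[tuple]:
    """Execute a query and return its rows in sorted order."""
    return sorted(conn.execute(sql).fetchall())


def check_case(conn: sqlite3.Connection, generated_sql: str, reference_sql: str) -> bool:
    """Pass if the generated SQL returns the same rows as the validated SQL."""
    try:
        got = run(conn, generated_sql)
    except sqlite3.Error:
        # SQL that fails to execute counts as a failed case.
        return False
    return got == run(conn, reference_sql)


# Hypothetical fixture data for the evaluation.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE orders (id INTEGER, region TEXT, amount REAL);
    INSERT INTO orders VALUES (1, 'EU', 10.0), (2, 'EU', 5.0), (3, 'US', 7.0);
    """
)

reference = "SELECT region, SUM(amount) FROM orders GROUP BY region"
good = (
    "SELECT region, SUM(amount) AS total FROM orders "
    "GROUP BY region ORDER BY total DESC"
)
bad = "SELECT region, COUNT(*) FROM orders GROUP BY region"

if __name__ == "__main__":
    print("good passes:", check_case(conn, good, reference))
    print("bad passes:", check_case(conn, bad, reference))
```

Comparing query output rather than SQL text alone lets differently written but equivalent queries pass, while a query that returns different rows is flagged.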
The agent preserves permission boundaries through pass-through access and displays its reasoning alongside each answer, with links to underlying data sources. OpenAI said it continues to improve the tool’s reliability, validation methods and workflow integration.
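The idea of pass-through access can be shown with a small sketch in which the agent holds no credentials of its own and every read is checked against the requesting user's grants. The `ACL` mapping, table names and `agent_answer` function are hypothetical and serve only to illustrate the pattern.

```python
class PermissionDenied(Exception):
    """Raised when a user asks for a table they cannot read."""


# Hypothetical access-control list: user -> tables that user may read.
ACL = {"alice": {"sales", "usage"}, "bob": {"usage"}}

# Hypothetical table contents.
TABLES = {
    "sales": [("EU", 15.0), ("US", 7.0)],
    "usage": [("EU", 120), ("US", 340), ("APAC", 95)],
}


def read_table(user: str, table: str) -> list[tuple]:
    """Data-layer read that enforces the caller's own permissions."""
    if table not in ACL.get(user, set()):
        raise PermissionDenied(f"{user} may not read {table}")
    return TABLES[table]


def agent_answer(user: str, table: str) -> dict:
    """Agent step: forwards the user's identity instead of using its own."""
    rows = read_table(user, table)
    # Return the answer together with the sources it was derived from.
    return {"row_count": len(rows), "sources": [table]}


if __name__ == "__main__":
    print(agent_answer("alice", "sales"))
    try:
        agent_answer("bob", "sales")
    except PermissionDenied as err:
        print("denied:", err)
```

Because the check sits in the data layer and uses the caller's identity, the agent cannot return data that the user could not have queried directly.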
The Recap
- OpenAI built an internal agent to explore company data.
- The platform covers 70,000 datasets and 600 petabytes of data for more than 3,500 users.
- Daily offline pipeline produces embeddings used at query time.