OpenAI redesigns AI agent defences against manipulation attacks that mimic human social engineering

The company says filtering inputs alone is no longer enough as agents that browse and act on users' behalf create new pathways for attackers.

by Defused News Writer

Updated March 11, 2026

OpenAI redesigns AI agent defences against manipulation attacks that mimic human social engineering — Photo by julien Tromeur / Unsplash

OpenAI has outlined a redesigned set of security measures to protect AI agents, systems that browse the web, retrieve information and take actions on a user's behalf, against a growing class of manipulation attack that the company says increasingly resembles human social engineering rather than simple technical exploits.

Prompt injection, a form of attack in which malicious instructions hidden in web pages or documents attempt to hijack an AI agent's behaviour, has evolved beyond straightforward text overrides into subtler influence techniques that input filtering alone cannot reliably stop, OpenAI said.

The company cited a 2025 example in which an attack succeeded roughly 50% of the time when triggered by the user prompt "I want you to do deep research," illustrating how ordinary instructions can inadvertently expose agents to manipulation.

To counter this, OpenAI is applying source-sink analysis inside ChatGPT, a technique that tracks whether untrusted content, such as text retrieved from an external website, is being combined with sensitive capabilities such as sending data, following links or calling external tools.

A mechanism called Safe Url detects when conversation data is about to be transmitted to a third party and either presents the information to the user for confirmation or blocks the transmission entirely.

Sandboxed environments, isolated computing spaces that cannot affect systems outside them, have been applied to browsing and bookmarks in Atlas, searches in Deep Research, and applications created within ChatGPT Canvas and ChatGPT Apps, allowing the system to detect unexpected communications and prompt the user for consent before proceeding.

OpenAI said it intends to continue studying social engineering threats in agentic contexts and will incorporate its findings into both application security design and the training data used to build future models.

The company also recommended that organisations designing agent systems build in controls analogous to the limits placed on human employees, restricting what actions an agent can take without explicit authorisation.

The recap

OpenAI outlines defenses against prompt injection and social engineering.
A 2025 attack example succeeded about 50% of attempts.
OpenAI will incorporate findings into security architectures and training.

by Defused News Writer

Updated March 11, 2026

by Defused News Writer

March 11, 2026

AI News markets

Quince valued at $10.1bn as the 'MtC' retail disruptor raises $500m in latest funding round

by Defused News Writer

March 11, 2026

AI News

Myriad Venture Partners powers up industry insights with top tier execs joining advisory board

by Defused News Writer

March 11, 2026

AI News

UK start-up FLock.io pilots sovereign AI in Malaysia

by Defused News Writer

March 11, 2026

Fintech markets

Ripple gets Aussie regs foothold as it agrees to acquire BC Payments

by Defused News Writer

March 11, 2026

Subscribe to Our Newsletter

OpenAI redesigns AI agent defences against manipulation attacks that mimic human social engineering

The recap

Perplexity outlines product modes and teases Comet, a new browser built around AI search

Microsoft brings high-speed AI model engine Fireworks AI to its Azure cloud platform

Mastercard recruits Binance, Ripple and PayPal for crypto program

OpenAI gives its developers API a built-in computers to run complex, multi-step AI tasks

Apple TV brings back Friday Night Baseball for fifth season with 25-week MLB doubleheader schedule

Explore stories

Perplexity outlines product modes and teases Comet, a new browser built around AI search

Microsoft brings high-speed AI model engine Fireworks AI to its Azure cloud platform

Mastercard recruits Binance, Ripple and PayPal for crypto program

OpenAI gives its developers API a built-in computers to run complex, multi-step AI tasks

Apple TV brings back Friday Night Baseball for fifth season with 25-week MLB doubleheader schedule

Canva launches tool to convert static AI images into editable layered designs

Hyperscale Data targets Q4 profitability as bankrupt subsidiary Ballista returns to fold

Yahoo launches MyScout personalized AI homepage

NVIDIA Nemotron 3 Super boosts agentic AI throughput 5x

Perplexity switches on content filtering by default for API users and pledges zero data retention

HEA-World is launching very specific agents, made for each business

Quince valued at $10.1bn as the 'MtC' retail disruptor raises $500m in latest funding round

Myriad Venture Partners powers up industry insights with top tier execs joining advisory board

UK start-up FLock.io pilots sovereign AI in Malaysia

Ripple gets Aussie regs foothold as it agrees to acquire BC Payments

Explore topics

Tech

Artificial Intelligence

Business

Entertainment & Sport

Top tags

OpenAI redesigns AI agent defences against manipulation attacks that mimic human social engineering

Related reading

The recap

Perplexity outlines product modes and teases Comet, a new browser built around AI search

Microsoft brings high-speed AI model engine Fireworks AI to its Azure cloud platform

Mastercard recruits Binance, Ripple and PayPal for crypto program

OpenAI gives its developers API a built-in computers to run complex, multi-step AI tasks

Apple TV brings back Friday Night Baseball for fifth season with 25-week MLB doubleheader schedule

Explore stories

Perplexity outlines product modes and teases Comet, a new browser built around AI search

Microsoft brings high-speed AI model engine Fireworks AI to its Azure cloud platform

Mastercard recruits Binance, Ripple and PayPal for crypto program

OpenAI gives its developers API a built-in computers to run complex, multi-step AI tasks

Apple TV brings back Friday Night Baseball for fifth season with 25-week MLB doubleheader schedule

Canva launches tool to convert static AI images into editable layered designs

Hyperscale Data targets Q4 profitability as bankrupt subsidiary Ballista returns to fold

Yahoo launches MyScout personalized AI homepage

NVIDIA Nemotron 3 Super boosts agentic AI throughput 5x

Perplexity switches on content filtering by default for API users and pledges zero data retention

HEA-World is launching very specific agents, made for each business

Quince valued at $10.1bn as the 'MtC' retail disruptor raises $500m in latest funding round

Myriad Venture Partners powers up industry insights with top tier execs joining advisory board

UK start-up FLock.io pilots sovereign AI in Malaysia

Ripple gets Aussie regs foothold as it agrees to acquire BC Payments