Subscribe to Our Newsletter

Success! Now Check Your Email

To complete Subscribe, click the confirmation link in your inbox. If it doesn’t arrive within 3 minutes, check your spam folder.

Ok, Thanks

OpenAI is paying researchers to find ways its AI can be weaponised

The new Safety Bug Bounty program targets harms that fall outside conventional security definitions. Prompt injection, agent hijacking and account manipulation are all in scope

Defused News Writer profile image
by Defused News Writer
OpenAI is paying researchers to find ways its AI can be weaponised
Photo by James Wainscoat / Unsplash

OpenAI is launching a public Safety Bug Bounty program designed to surface AI abuse risks that its existing security processes were not built to catch.

The program sits alongside OpenAI's Security Bug Bounty but targets a different category of problem: issues that cause tangible harm without meeting the technical definition of a security vulnerability. In practice, that means the grey zone where AI systems can be manipulated, redirected or exploited at scale.

What researchers can report

The scope is broad. Agentic risks are explicitly in scope, including third-party prompt injection and data exfiltration, where attacker-controlled text can reliably hijack an agent's behaviour. To qualify, a reported issue must be reproducible at least 50% of the time.

Also in scope: agentic products performing disallowed actions at scale, model outputs that return OpenAI proprietary information, and vulnerabilities that expose other proprietary data. Account and platform integrity issues are covered too, including bypassing anti-automation controls, manipulating account trust signals and evading bans or restrictions.

Jailbreaks are out of scope. Issues that allow access beyond authorised permissions belong in the Security Bug Bounty rather than this one. OpenAI says submissions will be triaged by both teams and rerouted depending on scope.

Why this program exists

The conventional security model was built for software with defined inputs and outputs. AI systems do not behave that way. An agent that can browse the web, write code and take actions on a user's behalf creates attack surfaces that have no equivalent in traditional vulnerability research.

Prompt injection is the clearest example. An attacker embeds instructions in content the agent reads, and the agent follows them. No system is compromised in the conventional sense. The model simply does what it was told, by someone other than the user.

OpenAI's decision to pay for these reports acknowledges that finding and fixing them requires external eyes. The company also runs private campaigns targeting specific harm categories, with biorisk content in ChatGPT Agent and GPT-5 cited as current examples.

The broader signal

A public bug bounty is also a statement of intent. It tells researchers that OpenAI regards safety abuse as a legitimate and compensable area of work, not a policy problem to be managed separately from engineering.

"Our goal is to ensure our systems remain safe and secure against misuse or abuse that could lead to tangible harm," the company said in its announcement.

Researchers, ethical hackers and safety teams can apply through the Safety Bug Bounty program to submit findings and work with OpenAI on remediation.

The recap

  • OpenAI launches public Safety Bug Bounty program for AI abuse.
  • Reports must be reproducible at least 50% of the time.
  • Researchers can apply through the Safety Bug Bounty program.
Defused News Writer profile image
by Defused News Writer

Explore stories