Subscribe to Our Newsletter

Success! Now Check Your Email

To complete Subscribe, click the confirmation link in your inbox. If it doesn’t arrive within 3 minutes, check your spam folder.

Ok, Thanks

OpenAI releases training dataset to help AI models follow instructions in the right order

The IH-Challenge dataset is designed to make AI systems better at deciding whose instructions to obey when they conflict

Defused News Writer profile image
by Defused News Writer
OpenAI releases training dataset to help AI models follow instructions in the right order
Photo by Patrick Perkins / Unsplash

OpenAI has released a training dataset called IH-Challenge, designed to teach its AI models to follow instructions in the correct order of priority and resist attempts to hijack their behaviour.

The company describes this as an instruction hierarchy: a trust ranking that places system-level instructions above those from developers, developers above users, and users above automated tools.

The practical challenge arises when instructions conflict, for example, when a developer tells a model to act as a maths tutor and never give away answers, but a user then asks it to do exactly that.

IH-Challenge structures thousands of such scenarios as automated tests, allowing OpenAI to check whether a model correctly sides with the higher-ranking instruction.

The dataset is also designed to improve resistance to prompt injection, a type of attack in which malicious text hidden in a document or website attempts to hijack an AI model's behaviour by posing as a legitimate instruction.

OpenAI said a version of its GPT-5 Mini model trained on IH-Challenge, called GPT-5 Mini-R, outperformed the standard GPT-5 Mini on several safety benchmarks while maintaining its usefulness for everyday tasks.

The company said the trained model did not become overly cautious or prone to refusing legitimate requests, a common pitfall when safety measures are tightened.

OpenAI is releasing the dataset publicly to support further research, arguing that models trained on these structured scenarios learn behaviour that transfers to more realistic and adversarial situations as AI systems become more capable.

The recap

  • OpenAI introduces IH‑Challenge training dataset for instruction hierarchy.
  • GPT‑5 Mini‑R improves TensorTrust (sys‑user) from 0.86 to 0.94.
  • IH‑Challenge dataset will be released to support further research.
Defused News Writer profile image
by Defused News Writer