OpenAI has released a training dataset called IH-Challenge, designed to teach its AI models to follow instructions in the correct order of priority and resist attempts to hijack their behaviour.
The company describes this as an instruction hierarchy: a trust ranking that places system-level instructions above those from developers, developers above users, and users above automated tools.
The practical challenge arises when instructions conflict, for example, when a developer tells a model to act as a maths tutor and never give away answers, but a user then asks it to do exactly that.
IH-Challenge structures thousands of such scenarios as automated tests, allowing OpenAI to check whether a model correctly sides with the higher-ranking instruction.
The dataset is also designed to improve resistance to prompt injection, a type of attack in which malicious text hidden in a document or website attempts to hijack an AI model's behaviour by posing as a legitimate instruction.
OpenAI said a version of its GPT-5 Mini model trained on IH-Challenge, called GPT-5 Mini-R, outperformed the standard GPT-5 Mini on several safety benchmarks while maintaining its usefulness for everyday tasks.
Related reading
- OpenAI to acquire Promptfoo to bolster security
- OpenAI unveils education tools to close AI capability gaps
- OpenAI extends single-minus amplitudes to gravitons
The company said the trained model did not become overly cautious or prone to refusing legitimate requests, a common pitfall when safety measures are tightened.
OpenAI is releasing the dataset publicly to support further research, arguing that models trained on these structured scenarios learn behaviour that transfers to more realistic and adversarial situations as AI systems become more capable.
The recap
- OpenAI introduces IH‑Challenge training dataset for instruction hierarchy.
- GPT‑5 Mini‑R improves TensorTrust (sys‑user) from 0.86 to 0.94.
- IH‑Challenge dataset will be released to support further research.