OpenAI's GPT-5.5 has matched the cybersecurity capabilities of Anthropic's closely guarded Claude Mythos Preview, the UK AI Security Institute (AISI) said in an evaluation published on Wednesday, making it only the second model to complete an end-to-end simulated corporate network attack.
The finding is significant because Anthropic chose not to release Mythos publicly, describing it as too powerful for general access and restricting it to roughly 50 organisations through its Project Glasswing initiative.
GPT-5.5, by contrast, is already available to millions of ChatGPT subscribers and through OpenAI's API.
AISI tested GPT-5.5 against a suite of 95 capture-the-flag tasks across four difficulty tiers, built in collaboration with cybersecurity firms Crystal Peak Security and Irregular.
At the highest "Expert" level, GPT-5.5 achieved an average success rate of 71.4%, compared to 68.6% for Mythos Preview.
The gap falls within the statistical margin of error, but AISI noted GPT-5.5 "may be the strongest model we have tested."
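AISI's point that the gap falls within the margin of error can be sanity-checked with a standard two-proportion z-test. The sketch below is illustrative only: AISI does not report how many Expert-tier tasks each model faced, so the sample size `n=35` is a hypothetical figure chosen for the example.

```python
import math

def two_proportion_p_value(p1: float, p2: float, n: int) -> float:
    """Two-sided p-value for the difference between two success rates,
    each treated as a proportion over n independent trials."""
    pooled = (p1 + p2) / 2
    se = math.sqrt(pooled * (1 - pooled) * (2 / n))
    z = (p1 - p2) / se
    # Standard normal CDF computed from the error function.
    phi = 0.5 * (1 + math.erf(abs(z) / math.sqrt(2)))
    return 2 * (1 - phi)

# GPT-5.5 (71.4%) vs Mythos Preview (68.6%) at an assumed n=35 tasks.
p_value = two_proportion_p_value(0.714, 0.686, n=35)
print(f"p ≈ {p_value:.2f}")  # far above 0.05: the gap is not significant
```

At any plausible task count in the dozens, a difference of under three percentage points yields a p-value nowhere near conventional significance thresholds, consistent with AISI's caveat.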
Both represent a dramatic leap over the previous generation: GPT-5.4 scored 52.4% and Claude Opus 4.7 reached 48.6% on the same tasks.
The most demanding test was "The Last Ones" (TLO), a 32-step corporate network attack simulation built with SpecterOps that spans four subnets and roughly twenty hosts, requiring the model to chain together reconnaissance, credential theft, lateral movement across multiple Active Directory forests, a supply chain pivot and data exfiltration.
AISI estimates a human expert would need approximately 20 hours to complete the full chain.
GPT-5.5 completed it end-to-end in two out of ten attempts, while Mythos managed three out of ten, both at a budget of 100 million tokens per attempt.
No other model has solved the simulation.
AISI said the results raise the question of whether rising cybersecurity capability is a breakthrough specific to any single model or a broad trend driven by improvements in autonomy and coding. The finding also casts doubt on whether Anthropic's decision to restrict Mythos was necessary, or whether it reflected compute constraints as much as safety concerns.
"The results from GPT-5.5 suggest the latter: a second model, from a different developer, now reaches a similar level of performance," AISI wrote.
Separately, cybersecurity firm XBOW, which received early access to GPT-5.5, reported that the model's miss rate on real-world vulnerability scanning had fallen from 40% in earlier generations to approximately 10%, a level it described as comparable to Mythos.
OpenAI has since announced GPT-5.5-Cyber, a restricted variant available only through its Trusted Access for Cyber programme, alongside $10 million in API credits for security researchers and partnerships with organisations including Trail of Bits.
GPT-5.5 received a "High" classification under OpenAI's own cybersecurity risk framework but remained below the "Critical" threshold, defined as the ability to develop zero-day exploits autonomously without human assistance.
AISI cautioned that its tests were conducted on networks with no active defences, and that future evaluations will need to incorporate hardened environments with real-time monitoring and incident response to keep pace with rapidly advancing model capabilities.
The recap
- The UK AI Security Institute compared OpenAI's GPT-5.5 with Anthropic's restricted Claude Mythos Preview.
- The assessment focused on cybersecurity capabilities rather than general performance.
- AISI published its findings in an evaluation released on Wednesday.