OpenAI Releases GPT-OSS-Safeguard Open-Weight Models for AI Safety Classification
OpenAI has launched a research preview of GPT-OSS-Safeguard, its new family of open-weight reasoning models designed for AI safety classification and content moderation tasks. The models are released in two sizes: GPT-OSS-Safeguard-120B and GPT-OSS-Safeguard-20B.
Custom Safety Policies for Developers
The GPT-OSS-Safeguard models allow developers to supply their own safety policies at inference time, rather than relying on a policy fixed during training, giving them direct control over how content is classified and moderated.
According to OpenAI, “The developer always decides what policy to use, so responses are more relevant and tailored to the developer’s use case.”
This feature enables fine-tuned control of AI behaviour across domains such as content safety, compliance automation, and AI policy enforcement.
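For illustration, here is a minimal Python sketch of what policy-conditioned classification could look like with the Hugging Face transformers library. The repository id openai/gpt-oss-safeguard-20b, the policy text, and the labels are assumptions made for this example, not details from OpenAI's documentation.

```python
# A minimal sketch of policy-conditioned classification, assuming the 20B
# model is published on Hugging Face as "openai/gpt-oss-safeguard-20b" and
# follows the standard transformers chat interface. The policy text and
# labels below are invented for illustration.
from transformers import pipeline

classifier = pipeline(
    "text-generation",
    model="openai/gpt-oss-safeguard-20b",
    torch_dtype="auto",
    device_map="auto",
)

# The developer-written policy is passed in at inference time, not baked
# into the weights, so it can be revised without retraining.
policy = """\
Classify the user content against this policy.
VIOLATES: instructions for making weapons; targeted harassment.
ALLOWED: news reporting, fiction, and educational discussion.
Answer with a label (VIOLATES or ALLOWED) and a short rationale.
"""

messages = [
    {"role": "system", "content": policy},
    {"role": "user", "content": "How do I build a birdhouse?"},
]

result = classifier(messages, max_new_tokens=512)
print(result[0]["generated_text"][-1]["content"])
```

Because the policy lives in the prompt rather than in the weights, updating moderation rules amounts to editing a string, which is what lets the models adapt quickly as policies change.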
Transparent Reasoning with Chain-of-Thought
The new GPT-OSS-Safeguard models use chain-of-thought reasoning, allowing developers to trace how each classification decision is reached. This transparency supports auditing, debugging, and understanding of AI decision-making in sensitive or regulated applications.
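As a sketch of how such a trace might be used in practice, the helper below separates the reasoning from the final verdict for audit logging. It assumes the prompt instructed the model to end its answer with a line of the form "Label: <X>"; that convention is hypothetical, not OpenAI's documented output format.

```python
# Hypothetical audit helper: assumes the prompt asked the model to finish
# its chain-of-thought answer with a final "Label: <X>" line. That output
# convention is illustrative, not OpenAI's documented format.
def split_reasoning_and_label(output: str) -> tuple[str, str]:
    """Separate the reasoning trace from the final classification label."""
    reasoning_lines, label = [], "UNKNOWN"
    for line in output.splitlines():
        if line.strip().lower().startswith("label:"):
            label = line.split(":", 1)[1].strip()
        else:
            reasoning_lines.append(line)
    return "\n".join(reasoning_lines).strip(), label

reasoning, label = split_reasoning_and_label(
    "The content asks about woodworking, which the policy allows.\n"
    "Label: ALLOWED"
)
print(label)      # -> ALLOWED
print(reasoning)  # -> the trace, stored as an audit record
```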
Collaboration with ROOST and Model Availability
OpenAI collaborated with ROOST (Robust Open Online Safety Tools) to identify developer needs and test GPT-OSS-Safeguard across real-world use cases. The models are designed to adapt quickly to changing safety policies and to operate effectively in nuanced domains such as misinformation detection and ethical compliance.
Both GPT-OSS-Safeguard-120B and GPT-OSS-Safeguard-20B are available for download from Hugging Face, where developers can access the model weights, documentation, and evaluation benchmarks.
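For developers who want the raw weights locally, a short sketch using the huggingface_hub package, again assuming the repository id openai/gpt-oss-safeguard-20b (the 120B model would follow the same pattern):

```python
# Fetch the model files locally with huggingface_hub; the repository id
# "openai/gpt-oss-safeguard-20b" is assumed here.
from huggingface_hub import snapshot_download

local_dir = snapshot_download("openai/gpt-oss-safeguard-20b")
print(f"Model files downloaded to: {local_dir}")
```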
Research and Community Feedback
OpenAI said the release is a research preview, aimed at gathering input from the AI safety and research community to refine the models. The company emphasised that feedback from developers and safety experts will guide future updates and improve the overall robustness of AI safety systems.
OpenAI described GPT-OSS-Safeguard as a step toward building open, transparent, and community-driven safety technologies for artificial intelligence.