Details
- Anthropic has activated AI Safety Level 3 (ASL-3) protections for its new Claude Opus 4 model, adopting advanced security and deployment protocols as a precautionary measure against potential misuse involving chemical, biological, radiological, and nuclear (CBRN) threats.
- The company’s ASL-3 standards pair stricter security measures, aimed at preventing theft of the model’s weights, with deployment measures that limit the model’s capacity to assist in CBRN weapons development.
- Newly introduced safeguards include Constitutional Classifiers that screen inputs and outputs for CBRN-related content, strict egress bandwidth controls that make large-scale exfiltration of model weights detectable, and a three-layer anti-jailbreak approach combining prevention, monitoring, and rapid iteration on defenses (a rough sketch of such a classifier-gated pipeline follows this list).
- Previous Claude models shipped under ASL-2 protections; the move to ASL-3 is precautionary, reflecting Anthropic’s position that it cannot yet rule out Claude Opus 4 crossing the capability thresholds that would require the stricter standard.
- Anthropic’s system card notes more than 100 security controls, such as two-party authorization and binary allowlisting, while keeping the acceptance rate for legitimate user queries high (see the two-party authorization sketch after this list).
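
Anthropic has not published the internals of these safeguards. Purely as an illustration of the prevention/monitoring/iteration layering described above, the hypothetical Python sketch below gates each request through a safety classifier before and after the model call; every name in it (`risk_score`, `BLOCK_THRESHOLD`, `log_for_review`) is invented for this sketch and is not Anthropic's API.

```python
# Hypothetical sketch of a classifier-gated inference pipeline.
# None of these names come from Anthropic's stack; the structure only
# illustrates the prevention -> monitoring -> iteration layering.

from dataclasses import dataclass

BLOCK_THRESHOLD = 0.9  # assumed tolerance; a real system tunes this empirically


@dataclass
class Verdict:
    allowed: bool
    score: float
    reason: str


def risk_score(text: str) -> float:
    """Stand-in for a trained safety classifier scoring risk in [0, 1]."""
    flagged_terms = ("toy-flagged-phrase-a", "toy-flagged-phrase-b")  # toy list
    return 1.0 if any(term in text.lower() for term in flagged_terms) else 0.0


def screen(text: str) -> Verdict:
    score = risk_score(text)
    if score >= BLOCK_THRESHOLD:
        return Verdict(False, score, "classifier block")
    return Verdict(True, score, "ok")


def log_for_review(prompt: str, reply: str, score: float) -> None:
    # Blocked pairs feed the rapid-iteration layer as future training data.
    print(f"flagged (score={score:.2f}) for defense iteration")


def guarded_completion(prompt: str, model_call) -> str:
    # Layer 1 (prevention): screen the prompt before it reaches the model.
    inbound = screen(prompt)
    if not inbound.allowed:
        return "Request declined."
    reply = model_call(prompt)
    # Layer 2 (monitoring): screen the output as well.
    outbound = screen(reply)
    if not outbound.allowed:
        log_for_review(prompt, reply, outbound.score)
        return "Response withheld."
    return reply


if __name__ == "__main__":
    def echo_model(prompt: str) -> str:
        return f"(model reply to: {prompt})"

    print(guarded_completion("hello", echo_model))
```

Screening both directions matters: a prompt can look benign while eliciting a harmful completion, so the output check backstops the input check.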
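
Two-party authorization is a standard security control: a sensitive operation proceeds only after two distinct authorized people approve it. A minimal sketch, with an invented approver roster and action name:

```python
# Minimal two-party authorization sketch; the roster and action name
# are invented for illustration.

AUTHORIZED_APPROVERS = {"alice", "bob", "carol"}


def authorize(action: str, approvals: set[str]) -> None:
    """Raise unless at least two distinct authorized people approved."""
    valid = approvals & AUTHORIZED_APPROVERS
    if len(valid) < 2:
        raise PermissionError(
            f"{action!r} requires two distinct approvers; got {len(valid)}"
        )


if __name__ == "__main__":
    authorize("export-weights-snapshot", {"alice", "bob"})  # proceeds
    try:
        # A set deduplicates, so one person approving twice still counts once.
        authorize("export-weights-snapshot", {"alice"})
    except PermissionError as err:
        print(err)
```

The point of the control is that no single insider or compromised account can trigger the action alone.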
Impact
This step positions Anthropic as a leader in setting industry standards for AI safety, demonstrating a rigorous, transparent approach as regulatory debates intensify globally. The explicit focus on CBRN mitigation strengthens Anthropic’s influence in policy discussions and raises the bar for rivals like OpenAI and Google DeepMind to publicly adopt comparable safety frameworks. As AI models become more powerful, such protocols will shape enterprise trust and compliance across sensitive sectors.