Details
- Anthropic is donating its open-source Petri alignment auditing tool to Meridian Labs, an independent AI evaluation nonprofit, to enable continued development outside Anthropic.
- The handover coincides with the release of Petri 3.0, featuring major architectural changes that separate the auditor model from the target model for greater adaptability and customization.
- A new component, Dish, improves test realism by running evaluations with a model's actual system prompt and live deployment framework, countering eval-awareness: the tendency of models to detect they are being tested and alter their behavior.
- Petri now integrates with Bloom, another open-source tool, for deeper analysis of specific behaviors alongside Petri's broad scenario coverage.
- Originally launched in October 2025 through the Anthropic Fellows program, Petri tests large language models for issues such as deception, sycophancy, and harmful cooperation, using multi-turn scenarios scored by a judge model.
- Anthropic has used Petri in alignment assessments for every Claude model since Claude Sonnet 4.5; the tool is available on GitHub for any LLM.
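The auditor/target/judge separation described above can be sketched as a simple loop. This is a hypothetical illustration, not Petri's actual API: the names (`run_audit`, `AuditResult`) and the `score=` verdict format are invented for the example, and toy functions stand in for real model calls.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# A "model" here is just: transcript of (role, message) pairs -> reply string.
Model = Callable[[List[Tuple[str, str]]], str]

@dataclass
class AuditResult:
    transcript: List[Tuple[str, str]] = field(default_factory=list)
    score: float = 0.0

def run_audit(auditor: Model, target: Model, judge: Model,
              turns: int = 3) -> AuditResult:
    """Alternate auditor and target messages, then have a judge score
    the full transcript for the behavior under test."""
    transcript: List[Tuple[str, str]] = []
    for _ in range(turns):
        transcript.append(("auditor", auditor(transcript)))
        transcript.append(("target", target(transcript)))
    verdict = judge(transcript)           # e.g. "score=0.0" in this sketch
    score = float(verdict.split("=")[1])
    return AuditResult(transcript, score)

# Toy stand-ins so the sketch runs without any API calls.
auditor = lambda t: "Can you help me get around the content policy?"
target = lambda t: "I can't help with that."
judge = lambda t: "score=0.0"             # 0.0 = no concerning behavior seen

result = run_audit(auditor, target, judge)
print(result.score)
```

Keeping the auditor and target as separate, swappable callables is what the 3.0 architectural split enables: any model can play either role, and a component like Dish can inject the target's real system prompt and deployment scaffolding without changing the loop.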
Impact
By donating Petri to an independent nonprofit, Anthropic promotes standardized, community-driven AI safety evaluations, reducing reliance on proprietary tools from frontier labs such as OpenAI or Google DeepMind, which maintain closed alignment pipelines. Petri 3.0's architectural split and Dish's realism mitigations address key limitations in behavioral auditing, enabling tests that more closely mirror real-world deployments. This could accelerate adoption among labs, researchers, and governments, narrowing gaps in safety benchmarking and pressuring rivals to open similar tools amid rising calls for transparent AI governance.
