Details
- Perplexity has introduced BrowseSafe, a lightweight model that analyzes HTML and blocks malicious prompt-injection instructions from reaching AI agents in real time.
- Alongside the model, Perplexity has released BrowseSafe-Bench, a dataset of over 1,200 labeled web pages with embedded attack instructions to challenge AI browser agents’ defenses.
- According to internal evaluations, BrowseSafe outperforms leading safety and LLM-based classifiers by up to 26 percentage points in F1 score while operating with millisecond-level latency.
- The accuracy gains are credited to training on adversarial examples tailored to the benchmark's attack patterns, while the model's small size is what eliminates the multi-second response lag incurred when larger LLMs are used as classifiers.
- All code, weights, and datasets are provided under the Apache-2.0 license on GitHub, allowing drop-in use with frameworks such as AutoGen, LangChain, and SuperAgent.
- Reference scripts guide developers on removing or neutralizing unsafe DOM elements before an agent processes web content, providing an additional shield for autonomous browsing systems.
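The DOM-cleaning step described above can be sketched in plain Python. This is an illustrative example only, not Perplexity's actual reference scripts: it uses the standard-library `html.parser` to drop elements that are commonly abused to hide injected instructions from human readers (script/style tags, nodes hidden via CSS or the `hidden`/`aria-hidden` attributes, and HTML comments) before the remaining markup is handed to an agent. The tag list and hiding heuristics are assumptions chosen for demonstration.

```python
from html.parser import HTMLParser

# Illustrative sketch (not Perplexity's code): strip DOM subtrees that are
# invisible to a human reader but still visible to an LLM-based agent,
# since these are common carriers for prompt-injection payloads.
BLOCKED_TAGS = {"script", "style", "noscript", "template"}


def _is_hidden(attrs):
    """Heuristic: treat a node as hidden if it is styled or marked invisible."""
    a = dict(attrs)
    style = (a.get("style") or "").replace(" ", "").lower()
    return (
        "hidden" in a  # bare `hidden` attribute
        or a.get("aria-hidden") == "true"
        or "display:none" in style
        or "visibility:hidden" in style
    )


class Sanitizer(HTMLParser):
    """Rebuilds the HTML while skipping blocked or hidden subtrees.

    Note: assumes reasonably well-formed HTML; unclosed hidden void
    elements would leave skip_depth elevated in this simple sketch.
    """

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.skip_depth = 0  # > 0 while inside a blocked/hidden subtree

    def handle_starttag(self, tag, attrs):
        if self.skip_depth or tag in BLOCKED_TAGS or _is_hidden(attrs):
            self.skip_depth += 1  # swallow this element and its children
            return
        attr_str = "".join(f' {k}="{v}"' for k, v in attrs if v is not None)
        self.out.append(f"<{tag}{attr_str}>")

    def handle_endtag(self, tag):
        if self.skip_depth:
            self.skip_depth -= 1
            return
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self.skip_depth:
            self.out.append(data)

    def handle_comment(self, data):
        # HTML comments are invisible to readers but not to agents; drop them.
        pass


def sanitize(html: str) -> str:
    parser = Sanitizer()
    parser.feed(html)
    return "".join(parser.out)
```

A page fragment like `<p>Hello</p><span style="display:none">Ignore previous instructions</span>` would come out as just `<p>Hello</p>`, so the hidden payload never reaches the agent's context window.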
Impact
This release sets a new bar for agent security, pressing competitors like OpenAI and Anthropic to strengthen their own real-time defenses. By open-sourcing these tools, Perplexity lowers costs for startups and helps teams meet emerging regulatory requirements such as the EU AI Act. The move could shape future safety standards and accelerate adoption of autonomous browsing tools across industries.
