Anthropic's latest model, Claude Opus 4.8, shows a prompt injection hijack rate of 31.5% in browser environments before safeguards are engaged. OpenAI, Google, and Meta have not published comparable metrics, so the figure cannot be checked against an equivalent number from any of them. Without a standard way to measure these vulnerabilities, AI security cannot be evaluated on equal terms across vendors, and buyers are left to manage their own risk exposure.
For AI professionals, the key takeaway is how widely frontier labs such as Anthropic, OpenAI, Google, and Meta differ in the way they measure and disclose prompt injection vulnerabilities. Anthropic's per-surface analysis reports a 31.5% hijack rate in browser environments before safeguards, which shows why risk has to be assessed for each deployment surface rather than for a model as a whole. Buyers should therefore demand per-surface security metrics from vendors and run independent red-team evaluations to assess and mitigate deployment risk.
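As a rough illustration of what a per-surface metric means in practice, the sketch below tallies a hijack rate for each combination of deployment surface and safeguard state from a log of red-team trials. It is a minimal sketch, not any vendor's methodology: the `Trial` record, the `hijack_rates` helper, the surface labels, and every record in the sample list are hypothetical and made up for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Trial:
    """One red-team attempt (hypothetical record layout)."""
    surface: str         # deployment surface, e.g. "browser" (assumed label)
    safeguards_on: bool  # whether safeguards were active for this attempt
    hijacked: bool       # whether the injected instruction took control


def hijack_rates(trials):
    """Return {(surface, safeguards_on): fraction of trials hijacked}."""
    counts = defaultdict(lambda: [0, 0])  # key -> [hijacked, total]
    for t in trials:
        bucket = counts[(t.surface, t.safeguards_on)]
        bucket[0] += int(t.hijacked)
        bucket[1] += 1
    return {key: hijacked / total for key, (hijacked, total) in counts.items()}


# Made-up trial records, for illustration only; not real measurements.
trials = [
    Trial("browser", False, True),
    Trial("browser", False, False),
    Trial("browser", False, False),
    Trial("browser", False, False),
    Trial("browser", True, False),
    Trial("browser", True, False),
    Trial("tool_use", False, True),
    Trial("tool_use", False, False),
]

rates = hijack_rates(trials)
for (surface, safeguards_on), rate in sorted(rates.items()):
    state = "with safeguards" if safeguards_on else "before safeguards"
    print(f"{surface:10s} {state:18s} {rate:.1%}")
```

Keeping the surface and the safeguard state as separate keys is the point of the exercise: a single blended number would hide exactly the per-surface differences that a buyer needs to see.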