Shared from twixb · venturebeat.com

Surprise upset: GPT-5.5 beats Claude Fable 5 on brutal new Agents’ Last Exam benchmark

venturebeat.com·Jun 10, 2026

Researchers from UC Berkeley have introduced the Agents' Last Exam (ALE), a new benchmark that evaluates AI's ability to perform complex, economically valuable tasks. OpenAI's GPT-5.5 achieved the highest pass rate, at 24.0%. The benchmark aims to address issues with previous AI evaluations by employing a rigorous, multi-faceted assessment framework, and it reveals significant shortcomings in current AI models, including a 0.0% pass rate on the most challenging tasks.

UC Berkeley's Center for Responsible, Decentralized Intelligence designed Agents' Last Exam (ALE) to evaluate AI models on economically valuable, long-horizon professional workflows. The low pass rates of even the most advanced models, such as OpenAI's GPT-5.5 and Anthropic's Claude Fable 5, highlight a significant gap between academic benchmarks and real-world impact. For professionals tracking AI advancements, ALE serves as a necessary reality check and a tool to measure genuine progress in AI's ability to perform complex, professional tasks.
