Researchers at UC Berkeley have introduced Agents' Last Exam (ALE), a new benchmark that evaluates the ability of AI models to perform complex, economically valuable tasks. OpenAI's GPT-5.5 achieved the highest pass rate, at 24.0%. The benchmark aims to address shortcomings of previous AI evaluations with a rigorous, multi-faceted assessment framework, and its results reveal significant weaknesses in current models, including a 0.0% pass rate on the most challenging tasks.
Agents' Last Exam (ALE), introduced by UC Berkeley's Center for Responsible, Decentralized Intelligence, is a new benchmark designed to evaluate AI models on economically valuable, long-horizon professional workflows. The low pass rates of even the most advanced models, such as OpenAI's GPT-5.5 and Anthropic's Claude Fable 5, highlight a significant gap between performance on academic benchmarks and real-world impact. For professionals tracking AI advancements, ALE serves as a reality check and as a tool for measuring genuine progress in AI's ability to perform complex professional tasks.