Shared from twixb · venturebeat.com

Surprise upset: GPT-5.5 beats Claude Fable 5 on brutal new Agents’ Last Exam benchmark

venturebeat.com·Jun 10, 2026

Researchers from UC Berkeley have introduced the Agents' Last Exam (ALE), a new benchmark that evaluates AI's ability to perform complex, economically valuable tasks. OpenAI's GPT-5.5 achieved the highest pass rate, at 24.0%. The benchmark aims to address problems with previous AI evaluations by using a rigorous, multi-faceted assessment framework, and it reveals significant shortcomings in current AI models, including a 0.0% pass rate on the most challenging tasks.

The Agents' Last Exam (ALE), introduced by UC Berkeley's Center for Responsible, Decentralized Intelligence, is a new benchmark designed to evaluate AI models on economically valuable, long-horizon professional workflows. The low pass rates of even the most advanced models, such as OpenAI's GPT-5.5 and Anthropic's Claude Fable 5, highlight a significant gap between academic benchmarks and real-world impact. For professionals tracking AI advancements, ALE serves as a reality check and a tool for measuring genuine progress in AI's ability to perform complex professional tasks.
