
Alibaba's Metis agent cuts redundant AI tool calls from 98% to 2% — and gets more accurate doing it

venturebeat.com·Apr 30, 2026

Researchers at Alibaba have developed a new reinforcement learning framework called Hierarchical Decoupled Policy Optimization (HDPO), which trains AI agents to balance internal knowledge against external tools, sharply reducing unnecessary tool invocations while improving reasoning accuracy. Their multimodal model, Metis, demonstrates this capability, achieving state-of-the-art performance while minimizing redundant tool usage. The work marks a shift toward cultivating metacognitive ability in AI tool use: knowing when a tool call is actually needed.

The key takeaway is how HDPO achieves this: it decouples task accuracy and execution efficiency into separate optimization channels, yielding cleaner learning signals for each objective. In Alibaba's experiments, this cut redundant tool invocations from 98% to just 2% in the Metis model. For real-world deployments, that translates into fewer latency bottlenecks and lower API costs without sacrificing reasoning accuracy.
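The article does not publish HDPO's equations, but the idea of decoupled optimization channels can be illustrated with a minimal sketch: instead of folding accuracy and tool-efficiency rewards into one scalar, each channel is normalized separately (in the style of group-relative baselines) so a noisy efficiency signal cannot drown out the accuracy signal. The function name and reward values below are illustrative assumptions, not Alibaba's actual implementation.

```python
import statistics

def decoupled_advantages(accuracy_rewards, efficiency_rewards):
    """Hypothetical sketch of decoupled reward channels.

    Each channel is standardized independently across a group of
    sampled rollouts, producing two advantage streams that can be
    weighted and combined at the policy-gradient step rather than
    mixed into a single scalar reward up front.
    """
    def normalize(rewards):
        mean = statistics.mean(rewards)
        std = statistics.pstdev(rewards) or 1.0  # avoid divide-by-zero
        return [(r - mean) / std for r in rewards]

    return normalize(accuracy_rewards), normalize(efficiency_rewards)

# Four sampled rollouts for one query: accuracy is 1.0 if the final
# answer is correct; efficiency is higher when fewer redundant tool
# calls were made (values here are made up for illustration).
acc = [1.0, 1.0, 0.0, 1.0]
eff = [0.2, 0.9, 0.9, 0.3]
adv_acc, adv_eff = decoupled_advantages(acc, eff)
```

The design point this illustrates: rollout 2 (correct answer, many redundant calls) gets a positive accuracy advantage but a negative efficiency advantage, so the policy can be rewarded for correctness and penalized for over-calling at the same time, rather than receiving one muddled blended signal.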
