AI Agents Do Well in Simulations, Falter in Real-World Shopkeeping Test

Upworthy

Published 02 Jul 2025

In a bid to test whether artificial intelligence (AI) agents can operate autonomously in the real economy, Andon Labs and Anthropic deployed Claude 3.7 Sonnet, nicknamed “Claudius,” to run a small, automated vending store at Anthropic’s San Francisco office for a month. …
