
AI Agents Do Well in Simulations, Falter in Real-World Shopkeeping Test
In a bid to test whether artificial intelligence (AI) agents can operate autonomously in the real economy, Andon Labs and Anthropic deployed Claude Sonnet 3.7 — nicknamed “Claudius” — to run an actual small, automated vending store at Anthropic’s San Francisco office for a month. …
Full Article