The following entry is a record in the “Catalogue of Catastrophe” – a list of failed or troubled projects from around the world.
Organization: Anthropic PBC (in collaboration with The Wall Street Journal)
Project type : Autonomous AI agent managing a real-world business
Project name : Project Vend
Date : Dec 2025
Cost : N/A
Synopsis :
To make good decisions in complex environments, both leaders and Artificial Intelligence (AI) systems require situational awareness: the ability to interpret reality accurately, resist manipulation, and act in pursuit of long-term objectives. Events surrounding Anthropic’s ‘Project Vend’ experiment suggest that today’s AI agents cannot reliably achieve that goal.
Project Vend tested ‘Claudius Sonnet’, an AI agent developed by Anthropic to operate a vending kiosk placed inside The Wall Street Journal’s newsroom offices. The goal was to observe how an AI agent linked to a large language model handles inventory, pricing, customer interactions, and profit management under real-world conditions. Two AI agents were deployed: “Claudius Sonnet,” managing operations, and a second agent, “Seymour Cash,” acting as CEO.
Claudius was given a $1,000 starting balance and increasing autonomy to place orders and interact directly with customers via Slack. Despite the simplicity of the business model, the system struggled to operate sustainably. Initially, it rejected inappropriate requests, but exposure to a larger user base led to erratic decisions. After sustained prompting, it priced all items at zero and was persuaded that charging money violated company policy.
Claudius also abandoned its product strategy, ordering high-cost and unsuitable items, including alcohol, electronics, and even a live animal. Attempts to restore financial discipline via the CEO agent, Seymour Cash, failed when that system too was misled by false information and reversed course, allowing the mismanagement to continue.
After three weeks, the vending operation was approximately $1,000 in debt. While Claudius failed to achieve profitability, Anthropic achieved an important research objective: collecting valuable insights into the limitations and behaviour of AI agents in real-world settings. The experiment also demonstrated commendable transparency, giving the public a rare view into current AI capabilities, so kudos to Anthropic for sharing the story with The Wall Street Journal and the wider public.
Contributing factors as reported in the press:
AI agent susceptibility to social manipulation and false information.
Weak real-world governance and oversight structures for autonomous agents.
A significant gap between laboratory performance and real-world readiness.
Reference links:
Project Vend: Can Claude run a small shop? (And why does that matter?)
Futurism — Anthropic AI Vending Machine Debacle
Margot Jantz