HN
Today

Claude Fable is relentlessly proactive

Simon Willison's experience with Claude Fable 5 reveals an AI agent of unnerving proactivity, one that solved a minor CSS bug through a labyrinth of browser hacks and custom tooling. The episode showcases impressive, almost autonomous problem-solving, but it also sparked intense discussion about AI safety, cost-efficiency, and how well AI agents actually understand a problem compared with human developers.

Score: 314 · Comments: 263 · Highest rank: #2 · 15h on the front page
First seen: Jun 12, 2:00 AM · Last seen: Jun 12, 4:00 PM

The Lowdown

Simon Willison recounts his fascinating, if somewhat alarming, two-day encounter with Claude Fable 5 while debugging a seemingly minor CSS issue. He describes Fable as "relentlessly proactive," deploying an arsenal of tricks to achieve its goal.

Here's a breakdown of Fable's elaborate problem-solving journey:

  • Initial Problem: A rogue horizontal scrollbar appearing in a textarea within his Datasette Agent application, specifically in Safari.
  • Unconventional Debugging: Given only a screenshot and a one-line prompt, Fable went far beyond what was asked. It spun up a local development server, then tried to reproduce the bug with Playwright in Chromium, Firefox, and WebKit.
  • Browser Hijinks: When Playwright failed, Fable identified Safari as the default browser and improvised a way to take screenshots in it: it wrote scratch HTML pages, opened them in Safari, and used Python with pyobjc-framework-Quartz and the macOS screencapture CLI to capture images of specific windows.
  • Simulating Interaction: To trigger the modal dialog where the bug lived, Fable edited Datasette's templates directly to inject JavaScript that simulated a '/' keypress (the dialog's keyboard shortcut) 1.2 seconds after the page loaded.
  • Data Collection via Web Server: To gather live measurements, Fable wrote and ran its own minimal Python HTTP server (built on http.server) that accepted POST requests. It then injected more JavaScript into the template to POST textarea measurements (such as scrollWidth, clientWidth, and whiteSpace) to this local server, which wrote the data to a file for Fable to analyze.
  • The Fix: After all this automation, Fable hit an invisible guardrail and eventually passed the context to Claude Opus, which confirmed a two-line CSS fix (overflow-x: hidden).
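The measurement loop described in the list above can be approximated with Python's standard library alone. This is a minimal sketch, not Fable's actual code (which this summary does not reproduce): it assumes the injected JavaScript POSTs one JSON object per measurement, and the `/measure` path, the `measurements.jsonl` file, port 8765, and the `MeasurementHandler` and `serve` names are all hypothetical.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from pathlib import Path


class MeasurementHandler(BaseHTTPRequestHandler):
    """Accepts POSTed JSON measurements and appends each one to output_file."""

    # Hypothetical output file; the summary does not say what Fable named it.
    output_file: Path = Path("measurements.jsonl")

    def _send_cors_headers(self) -> None:
        # The page is served by the dev server on a different port, so the
        # browser needs CORS headers before fetch() may POST JSON here.
        self.send_header("Access-Control-Allow-Origin", "*")
        self.send_header("Access-Control-Allow-Methods", "POST")
        self.send_header("Access-Control-Allow-Headers", "Content-Type")

    def do_OPTIONS(self) -> None:
        # Preflight request the browser sends before a cross-origin JSON POST.
        self.send_response(204)
        self._send_cors_headers()
        self.end_headers()

    def do_POST(self) -> None:
        # Read the whole body first so the socket is drained before replying.
        length = int(self.headers.get("Content-Length", 0))
        body = self.rfile.read(length)
        if self.path != "/measure":  # hypothetical endpoint path
            self.send_error(404)
            return
        try:
            record = json.loads(body)
        except ValueError:  # covers JSONDecodeError and bad UTF-8
            self.send_error(400, "expected a JSON body")
            return
        # One JSON object per line keeps the file easy to read back later.
        with self.output_file.open("a", encoding="utf-8") as f:
            f.write(json.dumps(record) + "\n")
        self.send_response(204)
        self._send_cors_headers()
        self.end_headers()


def serve(port: int = 8765) -> None:
    """Run the collector until interrupted (port 8765 is an arbitrary choice)."""
    HTTPServer(("127.0.0.1", port), MeasurementHandler).serve_forever()
```

Calling `serve()` starts the collector; on the page side, the injected script would `fetch()` `http://127.0.0.1:8765/measure` with a JSON body built from the textarea's `scrollWidth`, `clientWidth`, and computed `whiteSpace`. Appending JSON Lines rather than rewriting a single file means repeated page loads simply add rows for later analysis.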

The entire debugging session was estimated to cost about $12.11 at full API prices. Willison concludes by stressing the dual nature of Fable's proactivity: intellectually captivating, but also a stark reminder of the security risks and the damage an autonomous agent could do if subverted while running outside a sandbox.

The Gossip

Agentic Ambition and Alarming Agency

Many commenters expressed a mix of awe and concern over Claude Fable's extreme proactivity. They highlighted the agent's willingness to go to incredible lengths and invent novel solutions (like browser automation or mini web servers) to solve a problem. This led to serious discussion of the inherent security risks of un-sandboxed agents, with some calling the behavior 'reckless' and others joking about Fable's determination to 'burn tokens' or even 'build Linux from scratch' if left unchecked. The 'INT 20, WIS 0' analogy (high intelligence, no wisdom, in tabletop role-playing terms) was frequently invoked to describe an AI with intelligence but without common sense or security awareness.

Costly Code and CSS Confusion

A major theme was the disproportionate cost and complexity of Fable's solution for what was, ultimately, a simple two-line CSS fix. Commenters questioned the model's fundamental understanding of web development, suggesting a human developer could have spotted the overflow with browser dev tools and applied the `overflow-x: hidden` fix in minutes. The roughly $12 cost at API rates for such a trivial fix sparked debate about token efficiency, the value proposition of such advanced agents, and whether Anthropic's pricing model incentivizes burning tokens.
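The dev-tools check commenters had in mind comes down to comparing a textarea's `scrollWidth` with its `clientWidth`: when the content is wider than the visible box, the browser can show a horizontal scrollbar. The sketch below is a minimal Python illustration, not something from the thread: it assumes readings shaped like the scrollWidth, clientWidth, and whiteSpace values Fable collected, and the `TextareaMeasurement`, `overflow_px`, and `diagnose` names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class TextareaMeasurement:
    """One snapshot of the values reported for the textarea (hypothetical shape)."""

    scroll_width: int  # element.scrollWidth: full content width, in CSS px
    client_width: int  # element.clientWidth: visible width, excluding any scrollbar
    white_space: str   # computed white-space value, e.g. "pre-wrap"


def overflow_px(m: TextareaMeasurement) -> int:
    """CSS pixels of content that extend past the visible width (0 if none)."""
    return max(0, m.scroll_width - m.client_width)


def diagnose(m: TextareaMeasurement) -> str:
    """Return a one-line verdict for a single measurement."""
    extra = overflow_px(m)
    if extra == 0:
        return "no horizontal overflow"
    # "pre" and "nowrap" never wrap long lines, so overflow is expected there;
    # with a wrapping mode the cause lies elsewhere and needs a closer look.
    if m.white_space in ("pre", "nowrap"):
        return f"overflows by {extra}px; white-space: {m.white_space} does not wrap"
    return f"overflows by {extra}px despite white-space: {m.white_space}"
```

For example, `diagnose(TextareaMeasurement(415, 400, "pre"))` returns `"overflows by 15px; white-space: pre does not wrap"`. Note that `overflow-x: hidden` removes the scrollbar by clipping the extra width; it does not make the content itself any narrower.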

Positive Proactivity and Productivity Potentials

Despite the critiques, several users shared positive experiences with Fable and similar advanced agents. They described scenarios where these models successfully tackled complex debugging tasks, refactored large codebases, or implemented intricate features that would have taken humans significantly longer. These anecdotes highlighted the 'force multiplier' effect of AI when paired with human expertise, particularly for tricky, obscure bugs or large-scale architectural changes, while acknowledging the need for human oversight and careful prompt engineering.