HN Today

Anthropic silently downgraded cache TTL from 1h to 5m on March 6th

A developer presents a detailed data analysis alleging that Anthropic silently downgraded the default prompt cache Time-To-Live (TTL) for its Claude Code API from 1 hour to 5 minutes. This unannounced server-side change inflated cache costs by 20-32% and caused subscription users to hit their quotas more often. The story resonates on HN as a critical examination of transparency and billing practices in the rapidly evolving AI API landscape.

Score: 8
Comments: 1
Highest Rank: #2
Time on Front Page: 9h
First Seen: Apr 12, 8:00 AM
Last Seen: Apr 12, 7:00 PM
Rank Over Time: (hourly rank chart not reproducible in text)

The Lowdown

An in-depth analysis of Claude Code session logs reveals that Anthropic seemingly made a covert, server-side change to its API's default prompt cache TTL. This adjustment, which occurred around March 6-8, 2026, reduced the TTL from a consistent 1 hour to just 5 minutes, leading to substantial financial and quota implications for users.
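An analysis like this can be reproduced from session logs without any client-side changes. The sketch below shows one way to bound the effective TTL: find the longest gap between calls that still produced a cache read. The usage field name `cache_read_input_tokens` matches Anthropic's Messages API usage object, but the log record layout and the session data are hypothetical.

```python
# Infer a lower bound on the effective cache TTL from per-session API
# call records. Record layout here is an assumption: each record has a
# timestamp `ts` and the usage counter `cache_read_input_tokens`.
from datetime import datetime, timedelta

def infer_ttl_lower_bound(calls):
    """Given chronologically ordered call records for one session,
    return the largest inter-call gap that still produced a cache
    *read* -- the effective TTL must be at least this long."""
    longest_hit_gap = timedelta(0)
    for prev, curr in zip(calls, calls[1:]):
        gap = curr["ts"] - prev["ts"]
        if curr["cache_read_input_tokens"] > 0 and gap > longest_hit_gap:
            longest_hit_gap = gap
    return longest_hit_gap

# Hypothetical session: a 12-minute gap still hits the cache, which
# rules out a 5-minute TTL for that period.
t0 = datetime(2026, 2, 1, 9, 0)
calls = [
    {"ts": t0,                         "cache_read_input_tokens": 0},
    {"ts": t0 + timedelta(minutes=12), "cache_read_input_tokens": 180_000},
    {"ts": t0 + timedelta(minutes=14), "cache_read_input_tokens": 180_000},
]
print(infer_ttl_lower_bound(calls))
```

Applied across many sessions, a sudden drop in this bound from tens of minutes to under five would mark the date of a server-side TTL change.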

  • The analysis, based on 119,866 API calls across two independent machines and accounts, demonstrates a clear shift in cache behavior.
  • From February 1 to March 5, 2026, the API consistently used a 1-hour TTL, indicating it was likely the intended default.
  • Starting March 6, 5-minute TTL calls began to reappear, quickly becoming dominant by March 8, despite no client-side changes.
  • This reversion resulted in a 20-32% increase in cache creation costs for users, as frequent cache expirations forced re-uploads at higher 'write' rates instead of cheaper 'read' rates.
  • Pro/subscription users also reported hitting their quota limits for the first time, directly correlating with the TTL change.
  • The author hypothesizes that the 1-hour TTL was the deliberate default, and the subsequent downgrade was either an intentional cost-saving measure or an accidental regression.
  • The post requests Anthropic to confirm the change, clarify the intended TTL, consider restoring the 1-hour default, and disclose quota counting for cache reads.
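The write-vs-read cost mechanics behind the 20-32% figure can be sketched with a toy model. All rates and session parameters below are illustrative assumptions, not Anthropic's actual pricing; a short TTL in a slow-paced session is a worst case, so this toy gap is larger than the averaged increase the post reports.

```python
# Back-of-the-envelope cost model for prompt caching under two TTLs.
# Rates and session shape are illustrative assumptions only.

def session_cache_cost(ttl_min, gap_min, turns, cached_tokens,
                       write_rate, read_rate):
    """Cost (in $) of keeping `cached_tokens` warm across `turns`
    requests spaced `gap_min` minutes apart. When the gap exceeds the
    TTL the cache has expired and the prefix is re-written at the
    write rate; otherwise it is read back at the cheaper read rate."""
    cost = cached_tokens / 1e6 * write_rate  # initial cache write
    for _ in range(turns - 1):
        if gap_min > ttl_min:                # cache expired -> re-write
            cost += cached_tokens / 1e6 * write_rate
        else:                                # cache hit -> cheap read
            cost += cached_tokens / 1e6 * read_rate
    return cost

# Hypothetical session: 200k-token cached prefix, 20 turns, 8-minute
# gaps, $3.75/MTok writes vs $0.30/MTok reads (made-up rates).
long_ttl  = session_cache_cost(60, 8, 20, 200_000, 3.75, 0.30)
short_ttl = session_cache_cost(5,  8, 20, 200_000, 3.75, 0.30)
print(f"1h TTL: ${long_ttl:.2f}  5m TTL: ${short_ttl:.2f}")
```

With 8-minute gaps, every turn under a 5-minute TTL misses the cache and pays the write rate, while a 1-hour TTL pays it once; the same mechanism, averaged over real traffic with varied gaps, yields the more modest 20-32% increase described above.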

This meticulous investigation highlights how a seemingly minor, unannounced technical alteration can have significant financial consequences for developers and raises important questions about transparency and fair billing practices from major AI service providers.