An update on recent Claude Code quality reports
Anthropic released a candid post-mortem addressing widespread user complaints about Claude Code's degraded performance, detailing three distinct changes that inadvertently hobbled the AI. The report reveals the delicate balance between latency, cost, and model intelligence, and shows why frustrated users felt 'gaslit' by a shifting product experience and hidden operational trade-offs.
The Lowdown
Anthropic published a post-mortem to explain the recent degradation in Claude Code's performance, acknowledging user reports and outlining three primary issues that contributed to the observed quality drop.
- Reasoning Effort Default: On March 4, the default reasoning effort for Claude Code was lowered from 'high' to 'medium' for Sonnet 4.6 and Opus 4.6 to reduce UI latency. This was reverted on April 7 after user feedback indicated a preference for higher intelligence, even at the cost of longer latency.
- Caching Bug: A caching optimization shipped on March 26 was intended to clear older thinking output once from sessions idle for more than an hour. A bug caused the clearing to repeat on subsequent passes instead, making Claude seem forgetful and repetitive. This affected Sonnet 4.6 and Opus 4.6 and was fixed on April 10.
- System Prompt Change: On April 16, a system prompt instruction was added to reduce model verbosity. Combined with other changes, this inadvertently hurt coding quality for Sonnet 4.6, Opus 4.6, and Opus 4.7, and was reverted on April 20.
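The caching bug described above can be sketched in miniature. Everything here (class and function names, the threshold constant, the guard flag) is a hypothetical illustration of "clear once vs. clear repeatedly", not Anthropic's actual implementation:

```python
import time

IDLE_THRESHOLD_S = 3600  # "idle" means no activity for over an hour

class Session:
    def __init__(self):
        self.last_active = time.time()
        self.thinking_blocks = []     # older reasoning retained in context
        self.already_cleared = False

def clear_stale_thinking_buggy(session, now):
    """Shipped behavior: clears thinking on EVERY pass over an idle session."""
    if now - session.last_active > IDLE_THRESHOLD_S:
        session.thinking_blocks.clear()   # repeats each time the job runs

def clear_stale_thinking_fixed(session, now):
    """Intended behavior: clear older thinking only ONCE per idle session."""
    if now - session.last_active > IDLE_THRESHOLD_S and not session.already_cleared:
        session.thinking_blocks.clear()
        session.already_cleared = True    # guard prevents repeated clearing
```

The missing one-shot guard is the difference between trimming stale context once and silently discarding any reasoning the session accumulates afterward, which matches the "forgetful and repetitive" symptom users reported.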
Anthropic admitted that these issues were challenging to reproduce internally and to distinguish from normal variation in user feedback. Moving forward, the company plans to increase internal dogfooding on public builds, enhance its Code Review tool, implement tighter controls and broader evaluations for system prompt changes, and use gradual rollouts. As a gesture of goodwill, all subscribers' usage limits were reset.
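Gradual rollouts of the kind described above are commonly implemented by hashing a stable user identifier into a bucket and comparing it against a ramp percentage, so each user's assignment is deterministic across restarts. A minimal sketch; the function name, feature name, and percentages are illustrative, not Anthropic's system:

```python
import hashlib

def in_rollout(user_id: str, feature: str, percent: float) -> bool:
    """Deterministically map (feature, user) to a bucket in [0.0, 100.0)."""
    digest = hashlib.sha256(f"{feature}:{user_id}".encode()).hexdigest()
    bucket = int(digest, 16) % 10_000 / 100.0   # 0.00 .. 99.99
    return bucket < percent

# Ramp a hypothetical system-prompt change from 1% to 100%, watching
# coding-quality evals for the cohort vs. a control group at each stage.
users = [f"user-{i}" for i in range(1000)]
for pct in (1, 10, 50, 100):
    cohort = [u for u in users if in_rollout(u, "terse-system-prompt", pct)]
```

Because the hash keys on both the feature name and the user ID, a user who lands in the 10% cohort for one experiment is not automatically in the cohort for the next one.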
The Gossip
Gaslighting Grievances
Many users expressed deep frustration, feeling that Anthropic initially dismissed their concerns about degradation, leading to accusations of 'gaslighting.' They point to earlier episodes in which public statements contradicted internal changes that affected model behavior and quality, eroding trust. This sentiment is fueled by the realization that the issues were acknowledged only after widespread complaints, suggesting a reactive rather than proactive approach to user feedback.
The Cache Conundrum
The decision to clear Claude's thinking history after an hour of inactivity, originally intended to reduce latency and cost, sparked significant debate. Users found this disruptive, especially those with workflows involving long, intermittent sessions. While an Anthropic developer explained the technical and cost-saving rationale, many argued it was a hidden 'cost' for users, leading to higher token usage when the full context had to be re-processed. Suggestions for user control over caching or better UI indicators were common.
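The "hidden cost" argument can be made concrete with rough arithmetic. The multipliers below follow Anthropic's published prompt-caching pricing (cache writes billed at roughly 1.25x the base input rate, cache reads at roughly 0.1x), but the context size is invented for illustration:

```python
# Relative per-token input costs, using Anthropic's published prompt-caching
# multipliers (cache write ~1.25x base input, cache read ~0.1x base input).
BASE_INPUT = 1.00
CACHE_WRITE = 1.25 * BASE_INPUT
CACHE_READ = 0.10 * BASE_INPUT

def next_turn_cost(context_tokens: int, cache_alive: bool) -> float:
    """Relative input cost of sending the next message over an existing context."""
    if cache_alive:
        return context_tokens * CACHE_READ   # cheap read of the cached prefix
    # Cache expired (e.g. the session sat idle too long): the full context
    # must be re-processed and re-written into the cache.
    return context_tokens * CACHE_WRITE

context = 150_000  # hypothetical long coding-session context, in tokens
penalty = next_turn_cost(context, False) / next_turn_cost(context, True)
```

Under these assumptions, resuming after cache expiry costs about 12.5x more input spend than resuming against a warm cache, which is why intermittent-session users felt the hour-long expiry as a real cost rather than an invisible optimization.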
Model Maladies and Migration
Commenters widely shared anecdotal experiences of Claude's perceived 'dumbing down,' particularly with Opus 4.7. Reported issues included the model responding to its internal prompts, increased verbosity (ironically, despite Anthropic's attempt to reduce it), forgetfulness, and a general decline in code quality. Many users reported switching back to older Claude models (4.6 or even 4.5) or migrating to competitors like OpenAI's Codex/GPT-5.4, citing better performance and reliability, and questioning Claude's value proposition given its premium pricing.
Transparency Troubles
A recurring critique centered on Anthropic's lack of transparent communication regarding significant product changes. Users highlighted that crucial behavioral shifts, like default reasoning levels or caching policies, were implemented without clear public announcements, often discovered through degraded experience or social media interactions. This 'black box' approach fosters distrust and makes it difficult for users to build reliable workflows on a constantly shifting platform.