Anthropic pushes back on claims of Claude degradation as user complaints mount

Viral benchmarks and a detailed GitHub analysis have fuelled debate over whether the AI developer has quietly reduced model performance

by Defused News Writer

Anthropic is facing a wave of user complaints alleging that its Claude Opus 4.6 model and its Claude Code coding tool have deteriorated. Posts across GitHub, X and Reddit present logs, benchmarks and comparative tests as evidence of weaker reasoning and more frequent task abandonment.

The debate intensified following a GitHub analysis by Stella Laurenzo reviewing 6,852 Claude Code session files, 17,871 thinking blocks and 234,760 tool calls, which argued that estimated reasoning depth had declined while premature stopping and backtracking had increased.

Benchmark figures circulated alongside the complaint thread, with testing platform BridgeBench reporting a drop in accuracy from 83.3% to 68.3% and a fall from second to tenth place in its rankings.

However, outside researcher Paul Calcraft cautioned that the two benchmark runs covered different task sets, and said performance on directly comparable tasks had shifted only modestly.

Anthropic staff have disputed the degradation claims while acknowledging product-level changes.

Boris Cherny, lead of Claude Code, said in a pinned GitHub reply that a header labelled "redact-thinking-2026-02-12" was a user interface change only, designed to hide reasoning from the display and reduce latency. He added that model defaults had shifted to adaptive thinking at a medium effort level.

Cherny also noted that users can enter the command /effort high to request more extended reasoning. Responding directly to claims of secret model degradation in a widely shared post on X, he wrote: "This is false."

Anthropic has separately confirmed changes to session and cache behaviour introduced to manage demand. In an email to VentureBeat, the company said Team and Enterprise customers were unaffected and that approximately 7% of users were likely to hit session limits more frequently during peak hours under the revised rules.

The company said many visible differences reflect disclosed product settings and cache experiments. It plans to release environment variables that let users set cache durations manually, and will continue investing in capacity.

The recap

  • Developers report Claude Opus 4.6 regressions across GitHub and X.
  • One analysis reviewed 6,852 session files and 234,760 tool calls.
  • Anthropic will expose cache controls and invest in scaling.