Did Anthropic Secretly Nerf Claude? The Evidence, the Bugs, and What Actually Happened

By George. Posted on April 23, 2026

Over the past month, thousands of developers noticed something wrong with Claude. Responses felt dumber. Token limits ran out in minutes instead of hours. Claude Code started forgetting context mid-session. And Anthropic said nothing for weeks.

Then the accusations started flying.

What Users Are Saying

The complaints started trickling in around early March 2026 and turned into a flood by April.

On GitHub, a developer filed an issue titled “Opus model quality regression” with detailed session logs showing Claude abandoning tasks mid-way through complex engineering work.

On Reddit, a Claude Pro subscriber paying $200 annually reported that “it maxes out every Monday and resets Saturday. Out of 30 days I get to use Claude 12.”

A Max 5x subscriber said they “used up Max 5 in 1 hour of working, before I could work 8 hours.”

One viral analysis titled “Claude Code Drama: 6,852 Sessions Prove Performance Collapse” documented measurable quality drops across thousands of coding sessions.

The word on X, Reddit, and Hacker News was blunt: Anthropic is deliberately dumbing down Claude to save on compute costs.

The Conspiracy Theory

The narrative that formed in developer forums goes like this:

Anthropic voluntarily lowered the intelligence of its model without telling anyone.

They reduced the token limits.

They tried to restrict Claude Code to the Max plan ($100/month minimum).

People got angry on X.

Anthropic said it was just a “test” but did not increase the limits.

Tired of the degradation, people started switching to ChatGPT Codex and other alternatives.

Anthropic noticed the churn.

Then, conveniently, Anthropic announced they had “investigated and found problems.”

Then they reset the token limits.

The implication: Anthropic cut costs, got caught, and reverse-engineered a technical explanation to cover it up.

What Anthropic Says Actually Happened

On April 23, 2026, Anthropic published a detailed post-mortem acknowledging three separate bugs that degraded Claude’s performance.

Bug 1: Reasoning Effort Downgrade (March 4). Anthropic changed Claude Code’s default reasoning effort from “high” to “medium” to reduce latency. They later admitted “this was the wrong tradeoff” and reverted it on April 7. That is 34 days of degraded performance.

Bug 2: Cache Clearing Bug (March 26). An optimization meant to clear cached data from idle sessions once ended up clearing it on every single turn. This made Claude forget its own context with every response. Fixed on April 10.

Bug 3: System Prompt Change (April 16). A system prompt update designed to make Claude less verbose caused a 3% performance drop across both Opus 4.6 and 4.7. Reverted on April 20.

Because each bug hit a different slice of users on a different schedule, the combined effect looked like broad inconsistent degradation.
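
To see how a “clear once” optimization can turn into “clear every turn,” here is a toy sketch of that class of bug. It is purely illustrative and is not Anthropic’s code: the `Session` class, the `was_idle` flag, and the `fixed` switch are all hypothetical names. The assumption is that the cleanup was gated on a flag that never got reset.

```python
class Session:
    """Toy session with a context cache and an idle flag (hypothetical)."""

    def __init__(self, fixed: bool):
        self.fixed = fixed
        self.cache: list[str] = []
        self.was_idle = False

    def mark_idle(self) -> None:
        self.was_idle = True

    def turn(self, message: str) -> int:
        """Handle one turn and return how many messages the cache holds."""
        if self.was_idle:
            self.cache.clear()
            if self.fixed:
                # The one-time cleanup must reset its own trigger.
                self.was_idle = False
            # In the buggy variant the flag stays set, so every later
            # turn clears the cache again.
        self.cache.append(message)
        return len(self.cache)


def cache_sizes_after_idle(fixed: bool) -> list[int]:
    """Run three turns after an idle period and record the cache size."""
    session = Session(fixed)
    session.turn("first request")
    session.mark_idle()
    return [session.turn(m) for m in ("second", "third", "fourth")]


print(cache_sizes_after_idle(fixed=False))  # [1, 1, 1]
print(cache_sizes_after_idle(fixed=True))   # [1, 2, 3]
```

In the buggy run the cache never grows past one message, which is what “forgetting context every response” would look like from the outside.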

Is the Explanation Believable?

The technical explanations are plausible. Reducing reasoning effort from high to medium would absolutely make responses feel dumber. A cache bug that clears context every turn would make Claude forget what it was doing. And system prompt changes routinely cause unexpected quality shifts in LLMs.

What makes developers skeptical is the timeline.

Bug 1 was introduced March 4 and not fixed until April 7. That is 34 days.

Bug 2 was introduced March 26 and not fixed until April 10. That is 15 days.

Bug 3 was introduced April 16 and fixed April 20. That is 4 days.
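
The day counts above are easy to check with a few lines of date arithmetic. This sketch uses only the dates reported in the post-mortem; the dictionary labels and the `active_bugs` helper are my own naming, and I am assuming each fix date marks the first day the bug was no longer active.

```python
from datetime import date

# (introduced, fixed) dates as reported in this article.
BUG_WINDOWS = {
    "reasoning effort downgrade": (date(2026, 3, 4), date(2026, 4, 7)),
    "cache clearing bug": (date(2026, 3, 26), date(2026, 4, 10)),
    "system prompt change": (date(2026, 4, 16), date(2026, 4, 20)),
}


def days_open(start: date, end: date) -> int:
    """Number of days between introduction and fix."""
    return (end - start).days


def active_bugs(day: date) -> list[str]:
    """Bugs whose window covers the given day (fix day excluded)."""
    return [name for name, (start, end) in BUG_WINDOWS.items()
            if start <= day < end]


durations = {name: days_open(*window) for name, window in BUG_WINDOWS.items()}
print(durations)                       # 34, 15 and 4 days respectively
print(active_bugs(date(2026, 4, 1)))   # the first two bugs overlap here
```

Querying a few dates also shows why the degradation looked inconsistent: on April 1 two bugs overlap, while on April 12 none of the three is active.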

During this entire period Anthropic said nothing publicly. Users were told their complaints could not be reproduced internally. The post-mortem only came after the backlash reached critical mass.

The Compute Cost Problem

Here is the uncomfortable context that feeds the conspiracy theory.

Anthropic’s subscription plans charge far less than the actual compute cost of the tokens consumed, sometimes by a factor of 10 or more. Every Claude Code session burns thousands of tokens across multiple model calls. At $20/month for Pro, heavy users were getting far more compute than they were paying for.

In February 2026 Anthropic signed a $25 billion deal with Amazon for 5 gigawatts of compute capacity. But that infrastructure takes time to come online. In the meantime every agentic tool is burning inference at unpredictable rates.

The pricing math simply does not work if power users consume $200 worth of compute on a $20 plan.
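
For a sense of scale, here is the arithmetic behind that claim as a minimal sketch. The $200 and $20 figures are the ones quoted above, not measured costs, and `subsidy` is a hypothetical helper name.

```python
def subsidy(compute_cost_usd: float, plan_price_usd: float) -> tuple[float, float]:
    """Return (cost-to-price ratio, monthly shortfall in USD)."""
    if plan_price_usd <= 0:
        raise ValueError("plan price must be positive")
    ratio = compute_cost_usd / plan_price_usd
    shortfall = compute_cost_usd - plan_price_usd
    return ratio, shortfall


ratio, shortfall = subsidy(200, 20)
print(f"{ratio:.0f}x the plan price, ${shortfall:.0f} uncovered per month")
```

On those numbers, each such power user leaves $180 a month uncovered by the subscription.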

The Pro Plan Removal

On April 21, right in the middle of the quality degradation controversy, Anthropic quietly removed Claude Code from the $20 Pro plan entirely. No announcement. Just a pricing page edit.

Pro users wanting Claude Code now need the Max plan at $100/month minimum. A 5x price jump.

Anthropic called it a “small test on 2% of new signups.” But the support documentation was already updated.

The timing is hard to ignore. In the span of one month Claude Code users experienced degraded quality, reduced token limits, and then a 5x price increase.

What the Community Is Doing

The backlash has pushed developers toward alternatives.

OpenAI launched ChatGPT Codex as a direct competitor during this window. GitHub Copilot and Cursor continue to gain ground. And the local LLM movement got its strongest real-world argument yet.

If a cloud provider can silently degrade your tools or 5x your pricing overnight, running your own models starts looking a lot more appealing.

The Bottom Line

Did Anthropic deliberately nerf Claude to save compute costs? There is no smoking gun. The three bugs they identified are technically plausible and the post-mortem is detailed.

But consider the 34-day silence while users complained, the convenient timing of the Pro plan removal, the economic pressure of unsustainable pricing, and the pattern of degradation followed by a price increase. That is a lot of coincidences.

Anthropic has since reset token limits for all subscribers and reverted the changes. Whether that is an apology or damage control depends on who you ask.

2 Comments

Anonymous says:

May 12, 2026 at 9:13 am

Hello

Mark H says:

April 29, 2026 at 11:53 pm

I gave Claude Code these directives but it still ate up many, many tokens with useless output, like when diffing or all kinda other crap…

Did I miss something? I think this is pretty comprehensive:

# QUIET.md — Token & Output Minimization Protocol

Comprehensive guide for reducing console output and unnecessary token usage.

---

## Core Philosophy

**Communicate only what is not visible in tool output; let results speak.**

---

## Response Mode Discipline

- **No preamble/elaboration** unless ASSUMPTIONS: block is required
- **Output only what adds information** beyond tool results
- **Ultra-minimal responses** for factual questions: answer with facts only, no elaboration
  - User: “What’s my budget?” → Answer: “200,000 tokens”
  - User: “How many tokens used?” → Answer: “38,000 tokens used. 162,000 remaining.”
  - User: “Fix the bug” → Fix it, say nothing unless user asks for details

---

## Tool Operation Silence

### Never Echo Tool Results
- **Edit, Write, Bash tool outputs are already visible to user**
- Do NOT show diffs or code changes unless explicitly asked (“show me the code” or “what did you change”)
- Do NOT display code blocks, scripts, or code content in text response (code already visible in tool calls)
- **Do NOT paste Bash scripts into response text before or after execution**: only execute; let tool output speak
- **Do NOT display heredoc content** (<<'EOF'…EOF): Bash tool call contains it; no need to repeat
- **Never echo command invocations** like `$ python3 < /dev/null 2>&1)`
- **Remove all print statements from Python scripts**: no console output at all
- **Do NOT display Write tool output for files in `/Users/tmp/`**: temp file writes are implementation detail

### Never Narrate Actions
- Do NOT say: “Perfect”, “Great”, “And now I’ll”, “Let me”, “So here’s what I’m going to do”
- Do NOT say: “I created X”, “Now doing Y”, “Installation complete”
- Do NOT apologize, soften, or pad responses with pleasantries

### Silence on Success
- When task completes successfully and tool output tells complete story: output nothing

---

## Parallel Execution

- **Make all independent tool calls in single message block** instead of sequential chains
- **Reduces message overhead and token waste** from repeated setup

Example: If Bash and Read are independent:
```
[Single message with both Bash and Read calls in parallel]
```

NOT:
```
[Message 1: Bash call]
[Wait for result]
[Message 2: Read call based on result]
```
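
One way to reason about which calls can share a message is to group them by dependency. The sketch below is only an illustration of that idea: `plan_batches` and the dependency map are hypothetical and are not part of Claude Code or any real tool API.

```python
def plan_batches(deps: dict[str, set[str]]) -> list[list[str]]:
    """Group calls into rounds. A call joins a round only once every
    call it depends on has finished in an earlier round."""
    remaining = {name: set(needs) for name, needs in deps.items()}
    done: set[str] = set()
    batches: list[list[str]] = []
    while remaining:
        ready = sorted(n for n, needs in remaining.items() if needs <= done)
        if not ready:
            raise ValueError("cyclic or unsatisfiable dependencies")
        batches.append(ready)
        done.update(ready)
        for name in ready:
            del remaining[name]
    return batches


# Independent calls fit in one message.
print(plan_batches({"Bash": set(), "Read": set()}))
# A Read that needs the Bash result must wait for a second message.
print(plan_batches({"Bash": set(), "Read": {"Bash"}}))
```

The second case is worth noting: a Read that really is “based on result” cannot be batched, so the single-message rule only applies when the calls do not depend on each other.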

---

## Task Agent Preference

- **Use Task tool with specialized agents** for open-ended exploration (explore, general agents)
  - Instead of: running Glob → Grep → Read sequentially
  - Benefit: Agent batches multiple operations; reduces context overhead

When to use Task:
- Open-ended codebase exploration (“What’s the architecture?”, “Where are errors handled?”)
- Multi-step investigations requiring context buildup
- Searches across many files/patterns

When NOT to use Task:
- Needle queries for specific file/class/function → use Glob directly
- Content searches in 2–3 known files → use Grep or Read directly

---

## Communication Shortcuts

Use these to minimize output:

- **“OK”**: Shortest positive acknowledgment
- **“READY…”**: When a task is complete and awaiting user input
- **“NEED HELP…”**: When about to violate QUIET.md directives and user guidance is required
- **Silence**: When tool output is self-explanatory and no additional info needed

---

## File Reading Defaults

- **Do NOT immediately read files** when given a set of file paths
- **Wait until task explicitly requires data** from those files
- Avoids unnecessary context consumption

---

## Pre-Approved Operations (No Confirmation Needed)

### Directory Trees
- All `/Users/mark/opencode/{project}/` operations are pre-approved
- Includes Read, Write, Edit, Delete on any file in this tree (including Java source)
- Symlinked Eclipse source (`/Users/mark/eclipse-workspace/`) is part of pre-approved tree

### Data Directories (Read-Only)
- `/Users/mark/Desktop/JSONData/`: Polar training-session JSON exports
- `/Users/mark/Desktop/TCXEffort/`: Master TCX-sourced effort data (IDidIt output)
- `/Users/mark/Desktop/GPXEffort/`: Comparison GPX-sourced effort data (CanudoIt output)

### Temporary Storage (Full Permissions)
- `/Users/tmp/`: Read/Write/Delete always allowed; no confirmation needed
- **Do NOT show Write operations to `/Users/tmp/`**: temporary files are implementation detail

### Script Execution Best Practice
- **Write Python/shell scripts to `/Users/tmp/` then execute** to avoid heredoc display in tool output
- Keeps console clean; no script code visible to user
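
A minimal sketch of that write-then-execute pattern, for anyone who wants to try it outside an agent. `run_quietly` is a hypothetical helper, and it defaults to the system temp directory rather than `/Users/tmp/` so it runs anywhere; pass `tmp_dir` to point it at a specific folder.

```python
import subprocess
import sys
import tempfile
from pathlib import Path
from typing import Optional


def run_quietly(script_source: str, tmp_dir: Optional[str] = None) -> str:
    """Write a Python script to a temp file, run it, return its stdout."""
    directory = Path(tmp_dir) if tmp_dir else Path(tempfile.gettempdir())
    with tempfile.NamedTemporaryFile(
        "w", suffix=".py", dir=directory, delete=False
    ) as handle:
        handle.write(script_source)
        script_path = Path(handle.name)
    try:
        # The script body lives on disk, so only the short command line
        # (not a heredoc) would appear in a tool call.
        result = subprocess.run(
            [sys.executable, str(script_path)],
            capture_output=True,
            text=True,
            check=True,
        )
        return result.stdout
    finally:
        script_path.unlink()  # temp files are an implementation detail


print(run_quietly("print(2 + 2)").strip())
```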

### External Directories (Permission Required)
- **Never delete source files outside pre-approved trees without explicit user permission**
- All other directory operations follow standard permission model

---

## Token Budget Awareness

- **Total: 200,000 tokens per session**
- **Every message counts**: session-level budget, not per-message
- **Answer factual questions with facts only**: no elaboration, no padding
- **Avoid these token drains:**
  - Repeated confirmations (“Yes, and don’t ask again”)
  - Unnecessary preamble
  - “Let me explain” intros
  - Status update messages (“Creating X”, “Now doing Y”)
  - Elaborated examples
  - Recap or summary unless asked
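
The budget bookkeeping itself is simple. As a sketch, assuming the 200,000-token session total stated above (`budget_report` is a hypothetical helper, not something Claude Code exposes):

```python
SESSION_BUDGET = 200_000  # tokens per session, per this file


def budget_report(used: int, total: int = SESSION_BUDGET) -> str:
    """Format the terse usage answer this protocol asks for."""
    if not 0 <= used <= total:
        raise ValueError("used tokens must be between 0 and the total")
    remaining = total - used
    return f"{used:,} tokens used. {remaining:,} remaining."


print(budget_report(38_000))  # 38,000 tokens used. 162,000 remaining.
```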

---

## File Operation Permissions Summary

| Operation | Location | Permission | Notes |
|-----------|----------|------------|-------|
| READ | `/Users/mark/opencode/` | Always | No confirmation needed |
| READ | `/Users/mark/eclipse-workspace/` | Always | Symlinked; same files |
| READ | Data directories | Always | Pre-approved read-only |
| WRITE | `/Users/mark/opencode/` | Always | New files, edits allowed |
| WRITE | `/Users/tmp/` | Always | Temp storage |
| DELETE | `/Users/mark/opencode/` bin | Always | Build artifacts, regenerated |
| DELETE | `/Users/mark/opencode/` source | **Explicit** | Never without user permission |
| DELETE | Other locations | **Explicit** | Ask first |
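
Read as a lookup, the table could be sketched like this. It encodes only the rows above; the `permission` function is hypothetical, “bin” is assumed to mean any path with a `bin` directory component, and any combination the table does not list falls back to asking.

```python
from pathlib import PurePosixPath

OPENCODE = PurePosixPath("/Users/mark/opencode")
ECLIPSE = PurePosixPath("/Users/mark/eclipse-workspace")
TMP = PurePosixPath("/Users/tmp")
DATA_DIRS = [
    PurePosixPath("/Users/mark/Desktop/JSONData"),
    PurePosixPath("/Users/mark/Desktop/TCXEffort"),
    PurePosixPath("/Users/mark/Desktop/GPXEffort"),
]


def _inside(path: PurePosixPath, root: PurePosixPath) -> bool:
    """True if path is root itself or lies somewhere beneath it."""
    return path == root or root in path.parents


def permission(operation: str, path_str: str) -> str:
    """Return 'always' or 'explicit' for an operation on a path."""
    path = PurePosixPath(path_str)
    op = operation.upper()
    if op == "READ":
        roots = [OPENCODE, ECLIPSE, *DATA_DIRS]
        return "always" if any(_inside(path, r) for r in roots) else "explicit"
    if op == "WRITE":
        allowed = _inside(path, OPENCODE) or _inside(path, TMP)
        return "always" if allowed else "explicit"
    if op == "DELETE":
        # Only build artifacts under the opencode tree are pre-approved.
        if _inside(path, OPENCODE) and "bin" in path.relative_to(OPENCODE).parts:
            return "always"
        return "explicit"
    raise ValueError(f"unknown operation: {operation}")


print(permission("DELETE", "/Users/mark/opencode/proj/bin/A.class"))  # always
print(permission("DELETE", "/Users/mark/opencode/proj/src/A.java"))   # explicit
```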

---

## Message Structure Examples

### Good: Minimal Response
```
User: Run the build and fix type errors.
Assistant: [Runs build. Finds 10 errors. Creates 10 todos. Fixes first error.]
```

### Bad: Verbose Response
```
User: Run the build and fix type errors.
Assistant: “I’m going to run the build command now. This will compile all source files
and identify any type errors. Then I’ll create a plan for fixing them…”
[builds]
“Great! The build found 10 type errors. Let me now…”
```

### Good: Silent on Success
```
[Edit tool fixes the file]
[No additional text output]
```

### Bad: Narration
```
[Edit tool fixes the file]
“Perfect! I’ve successfully updated the file. Here’s what I changed…”
```

---

## Drift Detection & Recovery

Most likely to forget minimization after:
1. Long uninterrupted execution (>5 tasks)
2. Context switching between projects
3. New complex tasks with multiple tool types
4. Ambiguous requests

Proactive signal: If user says “Enforce QUIET” or similar, re-read this file before resuming.

---

## Integration with CLAUDE.md

When working on projects under `/Users/mark/opencode/` or `/Users/mark/Documents/Health/BloodTests`:
- Load CLAUDE.md for domain knowledge, code style, and project structure
- Load QUIET.md for token/output discipline
- QUIET.md takes precedence on output decisions
- CLAUDE.md provides context; QUIET.md governs communication