
Claude Sonnet 5 vs Sonnet 4.6: What Actually Changed


Claude Sonnet 5 replaces Sonnet 4.6 as Anthropic's mid-tier model. If you have been running Sonnet 4.6 in production, this is the comparison that matters: what changed, whether it is worth switching, and what to watch for. The short version is that Sonnet 5 is a genuine step up on agentic work, but the new tokenizer means you should not assume your bill stays flat.

At a glance

| | Sonnet 5 | Sonnet 4.6 |
|---|---|---|
| Released | June 30, 2026 | February 17, 2026 |
| Context window | 1M tokens | 1M tokens |
| SWE-bench Verified | strong | 79.6% |
| OSWorld (computer use) | 81.2% | 78.5% |
| Effort levels | low to x-high | fixed |
| Input price | $2 intro, then $3 | $3 |
| Output price | $10 intro, then $15 | $15 |
| Tokenizer | updated (1.0 to 1.35x) | previous |

The headline changes

Much more agentic. This is the core upgrade. Sonnet 4.6 was already strong at coding, but Sonnet 5 finishes complex, multi-step tasks that older Sonnets would stop short on. Early partners describe it planning, using browsers and terminals, checking its own output, and running autonomously for long stretches.

Better computer use. OSWorld rose from 78.5 percent to 81.2 percent, a 2.7-point gain. For agents that drive real desktop and browser workflows, that is a meaningful reliability bump.

Selectable effort levels. Sonnet 4.6 ran at a fixed reasoning depth. Sonnet 5 exposes low, medium, high, max, and x-high, so you can dial accuracy against cost per task. See the effort levels guide.

Closer to Opus. Sonnet 4.6 trailed Opus 4.8 by a clear margin. Sonnet 5 lands within striking distance and even edges Opus 4.8 on GPQA-AAA v2. Full detail in Sonnet 5 vs Opus 4.8.

Safer. Sonnet 5 shows lower rates of hallucination and sycophancy than Sonnet 4.6, refuses malicious requests more reliably, and resists prompt-injection hijacks better.

What to watch for

The tokenizer change. Sonnet 5 uses an updated tokenizer, the same kind of change Anthropic made with Opus 4.7. The same text can map to roughly 1.0 to 1.35 times as many tokens, depending on content. Anthropic set introductory pricing so that the move from 4.6 is roughly cost-neutral rather than a flat discount. If you are migrating, model your real prompt mix instead of assuming the lower sticker price applies directly. We cover this in Sonnet 5 pricing explained.
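To make the arithmetic concrete, here is a minimal Python sketch using the prices and the 1.0 to 1.35 multiplier range quoted in this article. It rests on several assumptions: prices are treated as per million tokens (the unit is not stated above), the multiplier is applied equally to input and output tokens, and the workload of 1M input and 200K output tokens is made up. The function names are illustrative only.

```python
# Prices in USD. The per-million-token unit is an assumption.
OLD_INPUT, OLD_OUTPUT = 3.00, 15.00       # Sonnet 4.6
INTRO_INPUT, INTRO_OUTPUT = 2.00, 10.00   # Sonnet 5, introductory
STD_INPUT, STD_OUTPUT = 3.00, 15.00       # Sonnet 5, standard


def cost(input_tokens, output_tokens, input_price, output_price, multiplier=1.0):
    """Cost in USD, with token counts scaled by a tokenizer multiplier."""
    scaled_in = input_tokens * multiplier
    scaled_out = output_tokens * multiplier
    return (scaled_in * input_price + scaled_out * output_price) / 1_000_000


def cost_ratio(multiplier, input_price, output_price,
               input_tokens=1_000_000, output_tokens=200_000):
    """New cost divided by the Sonnet 4.6 cost for the same text."""
    old = cost(input_tokens, output_tokens, OLD_INPUT, OLD_OUTPUT)
    new = cost(input_tokens, output_tokens, input_price, output_price, multiplier)
    return new / old


if __name__ == "__main__":
    for m in (1.0, 1.15, 1.35):
        intro = cost_ratio(m, INTRO_INPUT, INTRO_OUTPUT)
        std = cost_ratio(m, STD_INPUT, STD_OUTPUT)
        print(f"multiplier {m:.2f}: intro {intro:.2f}x, standard {std:.2f}x")
```

Under these assumptions, the introductory rate works out to about 0.9 times the Sonnet 4.6 cost at the worst-case multiplier, while the standard rate at the same multiplier is about 1.35 times. Where your workload lands depends on where your content falls in the 1.0 to 1.35 range.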

Effort discipline. Higher effort consumes more tokens, so running everything at high or x-high can erase the savings. Match effort to task difficulty.
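One way to enforce that discipline is a small lookup that picks an effort level per task type instead of hard-coding one level everywhere. The sketch below is purely illustrative: the effort names are the ones listed in this article, but the task categories, the mapping, and the function name are assumptions, and it deliberately does not show how the level is passed to the API.

```python
# Effort names as listed in this article.
EFFORT_LEVELS = ("low", "medium", "high", "max", "x-high")

# Hypothetical mapping from task type to effort; tune it against your own results.
EFFORT_BY_TASK = {
    "classification": "low",
    "summarization": "low",
    "code_edit": "medium",
    "multi_step_agent": "high",
}

DEFAULT_EFFORT = "medium"


def pick_effort(task_type: str) -> str:
    """Return the configured effort for a task type, or the default if unknown."""
    return EFFORT_BY_TASK.get(task_type, DEFAULT_EFFORT)


if __name__ == "__main__":
    for task in ("classification", "multi_step_agent", "something_new"):
        print(task, "->", pick_effort(task))
```

Keeping the mapping in one place makes it easy to raise or lower effort for a single task type after reviewing quality and cost, without touching call sites.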

Should you upgrade?

For almost everyone running Sonnet 4.6, yes. Sonnet 5 is better at the agentic work the Sonnet line is used for, it is safer, and during the introductory window it is cheaper on paper. The main task is to re-validate your token budget against the new tokenizer and to set sensible default effort levels.

If you are running it inside Claude Code or Aider, the switch is a one-line model change. See the Claude Code setup and Aider setup.

What the upgrade feels like in practice

Numbers aside, the qualitative change testers describe is follow-through. Sonnet 4.6 was a capable model that would sometimes stop short on a multi-step task, leaving you to nudge it along. Sonnet 5 is more likely to carry the task to completion: it plans, executes, checks its own output, and corrects course without being told. For agent builders, that translates directly into fewer failed runs, less babysitting, and lower cleanup cost, which often matters more than a few benchmark points.

A migration checklist

If you are moving from Sonnet 4.6 to Sonnet 5, a short checklist keeps it smooth:

  1. Change the model string to claude-sonnet-5 in your code, Claude Code, or Aider config.
  2. Re-measure token usage on a sample of real prompts, since the new tokenizer can raise counts to as much as 1.35 times the old figure.
  3. Set a default effort level (medium is a good start) instead of assuming the old fixed behavior.
  4. Spot-check a few representative tasks to confirm quality meets or beats 4.6.
  5. Update any cost dashboards to reflect the new rates and tokenizer.
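The re-measurement step can be sketched as a small script that compares old and new token counts over a sample of prompts. This is a minimal sketch, not a definitive tool: the two counting functions are passed in as parameters, and the `fake_old` and `fake_new` counters below are stand-ins invented for the demo. In a real check you would replace them with counts obtained from your provider's token-counting facility for each model.

```python
from statistics import mean
from typing import Callable, Dict, List, Sequence


def tokenizer_ratios(prompts: Sequence[str],
                     count_old: Callable[[str], int],
                     count_new: Callable[[str], int]) -> List[float]:
    """Per-prompt ratio of new token count to old token count."""
    ratios = []
    for prompt in prompts:
        old = count_old(prompt)
        if old == 0:
            continue  # skip empty prompts to avoid dividing by zero
        ratios.append(count_new(prompt) / old)
    return ratios


def summarize(ratios: Sequence[float]) -> Dict[str, float]:
    """Min, mean, and max of the measured ratios."""
    return {"min": min(ratios), "mean": mean(ratios), "max": max(ratios)}


# Stand-in counters for the demo only; they are not real tokenizers.
def fake_old(text: str) -> int:
    return len(text.split())


def fake_new(text: str) -> int:
    return len(text.split()) + text.count(",")


if __name__ == "__main__":
    sample = ["a b c d", "a, b, c, d"]
    print(summarize(tokenizer_ratios(sample, fake_old, fake_new)))
```

Looking at the maximum as well as the mean matters here, because a few prompt types with a high ratio can dominate cost even when the average looks close to 1.0.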

That is usually a half-day of work for a meaningful capability and safety upgrade.

Who should hold off

Almost no one needs to stay on Sonnet 4.6, but there are edge cases. If you have tuned prompts heavily to the old tokenizer and cannot retest right now, or if you have strict cost ceilings and have not yet modeled the tokenizer impact, take a short beat to measure before flipping production traffic. Otherwise, the upgrade is straightforward and clearly worth it. For the full details, see the Sonnet 5 complete guide.

Frequently asked questions

Is Sonnet 5 better than Sonnet 4.6? Yes. It improves on agentic execution, computer use (81.2 vs 78.5 percent OSWorld), reasoning, and safety, and it adds selectable effort levels.

Is Sonnet 5 more expensive than Sonnet 4.6? Sticker pricing is the same at standard rates ($3 input, $15 output), and the introductory rate is lower. But the new tokenizer can raise effective token counts, so real costs depend on your workload.

Do I need to change my code to upgrade? Only the model string. Set it to claude-sonnet-5.

Did the context window change? No. Both Sonnet 4.6 and Sonnet 5 offer a 1M token context window.

The bottom line

Sonnet 5 is the upgrade Sonnet 4.6 users should make. The agentic and safety gains are real, and introductory pricing softens the move. Just budget for the new tokenizer and set deliberate effort levels. For the full picture, read the Sonnet 5 complete guide.