
Claude Opus 4.7 vs 4.6: What Changed and Is It Worth Upgrading?


Anthropic dropped Claude Opus 4.7 yesterday (April 16, 2026), just ten weeks after Opus 4.6. The headline numbers are impressive — a 10.9-point jump on SWE-bench Pro, near-perfect vision scores, and a bunch of new agentic features. But there’s a catch: a new tokenizer that inflates token counts by up to 35%. Here’s the full breakdown.

At a glance

| | Opus 4.7 | Opus 4.6 |
|---|---|---|
| Released | April 16, 2026 | Feb 5, 2026 |
| SWE-bench Pro | 64.3% | 53.4% |
| CursorBench | 70% | 58% |
| Vision (XBOW) | 98.5% | 54.5% |
| SWE-bench Multilingual | 80.5% | 77.8% |
| Context window | 1M tokens | 1M tokens (beta) |
| Pricing (input/output) | $5 / $25 per 1M | $5 / $25 per 1M |
| Tokenizer | New (up to 35% more tokens) | Standard |
| Effort levels | low, medium, high, xhigh, max | low, medium, high, max |
| Vision resolution | 2,576px long edge (~3.75 MP) | ~1,568px long edge (~1.15 MP) |
| Prefilling | ❌ Not supported (400 error) | ✅ Supported |

The per-token pricing is identical, but the new tokenizer means the same text produces more tokens — so your effective cost goes up. More on that below.

Where Opus 4.7 wins

Coding

The jump from 53.4% to 64.3% on SWE-bench Pro is substantial — that’s a 20% relative improvement. CursorBench tells a similar story, going from 58% to 70%. In practice, Opus 4.7 handles multi-file refactors, complex debugging, and cross-language tasks noticeably better. The SWE-bench Multilingual bump from 77.8% to 80.5% confirms it’s not just English-centric improvements.

Vision

This is the biggest leap. XBOW scores went from 54.5% to 98.5% — nearly doubling. The max resolution jumped to 2,576px on the long edge (~3.75 megapixels), roughly 3x the area of Opus 4.6. If you’re doing screenshot analysis, diagram parsing, or UI review, this is a different model.
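If your vision pipeline downscales images before sending them, the new limit changes the math. Here's a minimal sketch of a fit-to-long-edge resize check using the two limits from the table; the helper itself is illustrative, not an official SDK utility:

```python
# Long-edge limits from the comparison table above.
OPUS_47_LONG_EDGE = 2576
OPUS_46_LONG_EDGE = 1568

def fit_to_long_edge(width: int, height: int, limit: int) -> tuple[int, int]:
    """Scale (width, height) down so the longer side fits within `limit`."""
    long_edge = max(width, height)
    if long_edge <= limit:
        return width, height  # already within the limit, no resize needed
    scale = limit / long_edge
    return round(width * scale), round(height * scale)

# A 4032x3024 phone photo: Opus 4.6 forced a heavy downscale,
# Opus 4.7 preserves far more pixels for the model to read.
print(fit_to_long_edge(4032, 3024, OPUS_46_LONG_EDGE))  # (1568, 1176)
print(fit_to_long_edge(4032, 3024, OPUS_47_LONG_EDGE))  # (2576, 1932)
```

For dense content like screenshots of code or small-text diagrams, that extra resolution is often the difference between the model reading the text and guessing at it.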

New features

  • xhigh effort level — slots between high and max, giving you a better cost/quality tradeoff for tasks that need more reasoning but don’t justify max compute.
  • File system memory — the model can persist context across sessions by reading/writing to the file system. This is a big deal for agentic workflows.
  • Self-verification — Opus 4.7 checks its own outputs before returning them. This reduces hallucinations and incorrect code, especially on longer tasks.
  • Adaptive thinking is now the default — the budget_tokens parameter is deprecated. The model decides how much to think on its own.
  • Task budgets (beta) — set spending limits on agentic tasks.
  • Auto mode for Max users — Claude.ai Max subscribers get automatic model routing.
  • /ultrareview in Claude Code — a new command for deep code review passes.
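To make the new knobs concrete, here's a sketch of what a request using them might look like. Everything non-obvious is an assumption for illustration: the model identifier and the field names (`effort`, `task_budget_usd`) are hypothetical, not confirmed API parameters. Note what's absent: no `budget_tokens` (adaptive thinking is the default) and no assistant prefill.

```python
# Hypothetical request builder for Opus 4.7. Field names "effort" and
# "task_budget_usd" and the model id are assumptions, not documented API.
EFFORT_LEVELS = {"low", "medium", "high", "xhigh", "max"}  # Opus 4.7 levels

def build_request(prompt: str, effort: str = "xhigh") -> dict:
    if effort not in EFFORT_LEVELS:
        raise ValueError(f"unknown effort level: {effort}")
    return {
        "model": "claude-opus-4-7",    # hypothetical identifier
        "max_tokens": 4096,
        "effort": effort,              # hypothetical field name
        "task_budget_usd": 2.00,       # beta task budget (hypothetical)
        "messages": [{"role": "user", "content": prompt}],
    }
```

The point of `xhigh` as a default here: it's the new middle ground for tasks where `high` falls short but `max` is overkill.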

The elephant in the room: the tokenizer tax

The pricing table says $5/$25 per million tokens — same as Opus 4.6. But Opus 4.7 ships with a new tokenizer that produces up to 35% more tokens for the same input text. This means:

  • A prompt that cost $1.00 on Opus 4.6 could cost up to $1.35 on Opus 4.7
  • Your context window fills up faster
  • Rate limits hit sooner

Anthropic hasn’t said much about why the tokenizer changed. The likely explanation is that the new tokenizer supports the improved multilingual and vision capabilities, but the cost impact is real. If you’re running high-volume API workloads, benchmark your actual token counts before switching. The 35% figure is a worst case — typical English text seems to land around 15-25% more tokens in early testing.

There’s no way to use the old tokenizer with the new model.
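A back-of-envelope model of the tokenizer tax, using the published $5/$25 prices and assuming (as a simplification) that inflation applies to both input and output token counts:

```python
# Published per-token prices, identical for both models.
INPUT_PER_M = 5.00    # $ per 1M input tokens
OUTPUT_PER_M = 25.00  # $ per 1M output tokens

def workload_cost(input_tokens: int, output_tokens: int,
                  inflation: float = 0.0) -> float:
    """Dollar cost with token counts inflated by `inflation` (0.25 = +25%)."""
    factor = 1.0 + inflation
    return (input_tokens * factor * INPUT_PER_M
            + output_tokens * factor * OUTPUT_PER_M) / 1_000_000

base = workload_cost(50_000_000, 10_000_000)         # Opus 4.6 baseline
worst = workload_cost(50_000_000, 10_000_000, 0.35)  # worst-case Opus 4.7
print(f"${base:.2f} -> ${worst:.2f}")
```

Same workload, same sticker price, up to 35% more spend. Swap in your real token counts and the 15-25% typical-case figure to estimate your own impact.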

The Opus 4.6 degradation context

It’s impossible to talk about Opus 4.7 without addressing what happened to Opus 4.6 in the weeks before this release.

Starting in mid-March, users began reporting that Opus 4.6 felt noticeably worse — shorter responses, shallower reasoning, more refusals. This wasn’t just vibes. A HuggingFace analysis across 6,852 sessions documented a 67% drop in reasoning depth. BridgeBench accuracy fell from 83.3% (ranked #2) to 68.3% (ranked #10). An AMD senior director posted forensic evidence on GitHub showing measurable degradation in structured outputs.

The community coined the term “AI shrinkflation” — same price, less capability. Anthropic denied modifying the model weights.

Whether the degradation was intentional, a side effect of infrastructure changes, or something else entirely remains unclear. What is clear: many users feel that Opus 4.7 restores the quality they were getting from Opus 4.6 at launch. Some have joked that Opus 4.7 feels like “early Opus 4.6” — which, depending on your perspective, is either reassuring or concerning.

Migration checklist

If you’re moving API code from Opus 4.6 to 4.7, here’s what to change:

  1. Update the model string — use the new Opus 4.7 model identifier in your API calls.
  2. Remove budget_tokens — this parameter is deprecated. Adaptive thinking is now the default. Sending budget_tokens still works but is ignored.
  3. Remove any prefilling — Opus 4.7 returns a 400 error if you include assistant prefill content. Strip any assistant message prefills from your requests.
  4. Test your token counts — the new tokenizer will inflate your usage. Run your typical prompts through the tokenizer endpoint and compare.
  5. Consider xhigh effort — if you were using max effort and finding it slow/expensive, try xhigh as a middle ground.
  6. Update vision pipelines — if you were downscaling images for Opus 4.6, you can now send higher-resolution images (up to 2,576px long edge) for better results.
  7. Test agent workflows — file system memory and self-verification change how the model behaves in multi-turn agentic loops. Test thoroughly.
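Steps 1-3 of the checklist can be sketched as a single payload rewrite. This is a minimal illustration, not an official migration tool: the model identifiers are hypothetical placeholders, and the message shape follows the Messages API convention where a trailing assistant message acts as a prefill.

```python
def migrate_request(request: dict) -> dict:
    """Return a copy of an Opus 4.6 request adjusted for Opus 4.7."""
    req = dict(request)
    req["model"] = "claude-opus-4-7"   # step 1: hypothetical new identifier
    req.pop("budget_tokens", None)     # step 2: deprecated, now ignored
    messages = list(req.get("messages", []))
    # Step 3: drop a trailing assistant prefill, which now triggers a 400.
    if messages and messages[-1].get("role") == "assistant":
        messages.pop()
    req["messages"] = messages
    return req

old = {
    "model": "claude-opus-4-6",
    "budget_tokens": 8000,
    "messages": [
        {"role": "user", "content": "Return a JSON object"},
        {"role": "assistant", "content": "{"},  # prefill
    ],
}
new = migrate_request(old)
print(new["model"], "budget_tokens" in new, len(new["messages"]))
# → claude-opus-4-7 False 1
```

Note that dropping a prefill isn't a no-op: if you relied on prefilling to force JSON or a specific opening, you'll need to move that constraint into the prompt itself (step 4 of testing should catch regressions here).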

Should you upgrade?

Claude.ai / Max users: Yes, just use it. You get better coding, dramatically better vision, and new features like /ultrareview and auto mode. The tokenizer change doesn’t affect you on subscription plans.

API users (low volume): Probably yes. The quality improvements are significant, especially for coding and vision tasks. The tokenizer cost increase is manageable at low volumes. Just remove prefilling and budget_tokens first.

API users (high volume): Test first. Run your actual workloads and measure the token count difference. For some use cases the 15-35% token inflation could meaningfully impact costs. If you’re heavily reliant on prefilling, you’ll need to refactor that out entirely. The quality gains are real, but so is the cost increase — do the math for your specific usage.

If you were affected by Opus 4.6 degradation: Upgrade immediately. Early reports suggest Opus 4.7 restores (and exceeds) the quality level of Opus 4.6 at launch.

