Anthropic released Claude Opus 4.7 on April 16, 2026. It’s the new top-of-the-line model in the Claude family, replacing Opus 4.6 as the go-to for complex coding, agentic workflows, and vision tasks. The headline number: 64.3% on SWE-bench Pro, which puts it well ahead of both its predecessor (53.4%) and GPT-5.4 (57.7%). Vision got a massive upgrade too — 3.75 megapixel support with 98.5% accuracy on XBOW, nearly doubling the previous score. If you’re using Claude for serious development work, this is a significant step up. But there are tradeoffs you need to know about, especially around the new tokenizer.
Quick specs
| Spec | Value |
| --- | --- |
| API model string | claude-opus-4-7 |
| Release date | April 16, 2026 |
| Context window | 1,000,000 tokens |
| Input pricing | $5 / 1M tokens |
| Output pricing | $25 / 1M tokens |
| Max vision resolution | 2,576px long edge (~3.75 MP) |
| Effort levels | low, medium, high, xhigh (new), max |
| Thinking mode | Adaptive (default) — budget_tokens deprecated |
| Availability | Claude.ai (Pro, Max, Team, Enterprise), API, Amazon Bedrock, Google Vertex AI, Microsoft Foundry |
| Free tier | No (Sonnet only on free) |
Benchmark results
The numbers tell a clear story. Opus 4.7 leads on every major coding benchmark against publicly available models.
| Benchmark | Opus 4.7 | Opus 4.6 | GPT-5.4 | Gemini 3.1 Pro |
| --- | --- | --- | --- | --- |
| SWE-bench Pro | 64.3% | 53.4% | 57.7% | 54.2% |
| CursorBench | 70% | 58% | — | — |
| Vision accuracy (XBOW) | 98.5% | 54.5% | — | — |
| SWE-bench Multilingual | 80.5% | 77.8% | — | — |
| BigLaw Bench (high effort) | 90.9% | — | — | — |
| Finance Agent eval | SOTA | — | — | — |
| GDPval-AA | SOTA | — | — | — |
Additional highlights: Rakuten-SWE-Bench shows 3x more production tasks resolved compared to Opus 4.6, and there’s a 13% improvement on a 93-task coding benchmark. The SWE-bench Pro gap over GPT-5.4 (64.3% vs 57.7%) is the widest lead any Claude model has held over OpenAI’s flagship in this benchmark’s history.
One caveat: Opus 4.7 still trails Anthropic’s own Mythos Preview on most benchmarks. Mythos remains restricted to vetted security firms through the Cyber Verification Program, so it’s not a practical alternative for most developers.
What’s new in Opus 4.7
Vision overhaul
The biggest surprise in this release. Opus 4.7 supports images up to 2,576px on the long edge — roughly 3.75 megapixels, which is 3x what previous models handled. The XBOW vision accuracy score jumped from 54.5% to 98.5%. If you’ve been avoiding Claude for vision tasks, this changes the calculus entirely. Screenshots, diagrams, UI mockups — all of these are now processed at much higher fidelity.
xhigh effort level
There’s a new effort level slotted between high and max. The xhigh level gives you most of the quality gains of max without the full token cost. In practice, it’s the sweet spot for complex coding tasks where high isn’t quite enough but max burns through your budget. Claude Code now defaults to xhigh for all plans.
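Given the effort levels and model string listed above, a request using the new level might be assembled like this. This is a sketch built from the names in this article, not official SDK code; verify the exact parameter surface against the API reference before shipping:

```python
# Sketch: selecting the new "xhigh" effort level in a raw request body.
# The "effort" parameter and the claude-opus-4-7 model string follow the
# names used in this article and are assumptions, not verified API fields.

VALID_EFFORT_LEVELS = {"low", "medium", "high", "xhigh", "max"}

def build_request(prompt: str, effort: str = "xhigh") -> dict:
    """Assemble a messages-style request body with an effort level."""
    if effort not in VALID_EFFORT_LEVELS:
        raise ValueError(f"unknown effort level: {effort!r}")
    return {
        "model": "claude-opus-4-7",
        "max_tokens": 4096,
        "effort": effort,
        "messages": [{"role": "user", "content": prompt}],
    }

req = build_request("Refactor this module for clarity.")
print(req["effort"])  # xhigh
```

Validating the effort string client-side is cheap insurance: a typo like "x-high" fails fast instead of surfacing as an opaque API error mid-pipeline.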
File system-level memory
Opus 4.7 can maintain memory across multi-session work at the file system level. This means context about your project structure, conventions, and previous decisions persists between separate conversations when working through Claude Code or similar agentic setups. No more re-explaining your architecture every session.
Self-verification
The model now verifies its own outputs before reporting back to you. This is especially noticeable in agentic coding tasks — Opus 4.7 will run its own sanity checks on generated code, catch obvious errors, and fix them before presenting the result. It doesn’t eliminate all mistakes, but it reduces the “looks right, doesn’t compile” problem.
Adaptive thinking (default)
Adaptive thinking is now the default mode. The old budget_tokens parameter is deprecated. Instead, you control thinking depth through the effort parameter. The model decides how much thinking a given prompt needs rather than you pre-allocating a fixed token budget.
Task budgets (beta)
A new beta feature that lets you set a hard ceiling on token spend for long-running agentic tasks. Useful if you’re running Opus 4.7 on automated pipelines and need cost predictability.
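Since the feature is in beta and this article doesn’t document the request field, here is a hypothetical sketch that enforces the same idea client-side: a hard ceiling on cumulative token spend across an agentic loop. The class and exception names are inventions of this sketch:

```python
# Hypothetical client-side budget guard for long-running agentic tasks.
# The server-side task-budget field name is not specified in this article,
# so this sketch enforces the ceiling locally by tracking reported usage.

class TokenBudgetExceeded(RuntimeError):
    pass

class TaskBudget:
    def __init__(self, max_tokens: int):
        self.max_tokens = max_tokens
        self.used = 0

    def charge(self, input_tokens: int, output_tokens: int) -> None:
        """Record one API call's usage; raise once the ceiling is crossed."""
        self.used += input_tokens + output_tokens
        if self.used > self.max_tokens:
            raise TokenBudgetExceeded(
                f"budget of {self.max_tokens} tokens exceeded ({self.used} used)"
            )

budget = TaskBudget(max_tokens=50_000)
budget.charge(12_000, 8_000)  # fine: 20,000 of 50,000 used
print(budget.used)            # 20000
```

Wrapping every call's usage report in `charge()` means a runaway loop stops within one call of the ceiling, which is the cost predictability the beta feature is aiming at.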
Claude Code updates
Opus 4.7 ships alongside meaningful Claude Code improvements:
- /ultrareview — A new slash command that spins up a dedicated review session for your code. It flags bugs, design issues, and potential problems in a structured format. Pro and Max users get 3 free ultrareviews included.
- Auto mode for Max users — Previously limited to Enterprise, auto mode now works for Max plan users. Fewer permission interruptions on long multi-step tasks.
- xhigh default — Claude Code’s default effort level is now `xhigh` across all plans, up from `high`. You’ll notice better output quality out of the box, but also higher token usage.
The tokenizer warning
This is the part you need to read carefully. Opus 4.7 ships with an updated tokenizer, and the same text can now map to anywhere from 1.0x to 1.35x as many tokens as it did with Opus 4.6.
What this means in practice: your existing prompts and workflows could cost up to 35% more in tokens even though the per-token price hasn’t changed. The pricing looks identical on paper ($5/$25 per million tokens), but your actual bill may increase because the same content consumes more tokens.
This hits hardest on:
- Long system prompts you reuse across calls
- Agentic workflows with lots of context passing
- Higher effort levels (which already generate more output tokens)
Before you migrate production workloads, run your typical prompts through the new tokenizer and compare token counts. Don’t assume your costs stay flat just because the rate card didn’t change.
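As a rough planning aid, the worst-case bill impact follows directly from the published rates and the 1.35x upper bound quoted above. The token volumes below are illustrative, not measurements:

```python
# Estimate cost impact of the tokenizer change at the published rates
# ($5 / 1M input tokens, $25 / 1M output tokens). The 1.35 multiplier is
# the worst-case expansion quoted for Opus 4.7's new tokenizer.

INPUT_RATE = 5.00 / 1_000_000    # dollars per input token
OUTPUT_RATE = 25.00 / 1_000_000  # dollars per output token

def monthly_cost(input_tokens: int, output_tokens: int,
                 tokenizer_multiplier: float = 1.0) -> float:
    """Cost in dollars, scaling both token counts by a tokenizer multiplier."""
    return (input_tokens * tokenizer_multiplier * INPUT_RATE
            + output_tokens * tokenizer_multiplier * OUTPUT_RATE)

old = monthly_cost(40_000_000, 8_000_000)          # counted with Opus 4.6
worst = monthly_cost(40_000_000, 8_000_000, 1.35)  # worst case on 4.7
print(f"${old:.2f} -> ${worst:.2f}")  # $400.00 -> $540.00
```

The real multiplier for your workload sits somewhere in the 1.0–1.35 range and depends on your content, which is exactly why measuring with the new tokenizer beats guessing.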
Migration from Opus 4.6
What to change
Thinking mode: Replace `thinking: {type: 'enabled'}` plus `budget_tokens` with `thinking: {type: 'adaptive'}`, and control thinking depth through the `effort` parameter instead. The old format still works for now but is deprecated and will eventually return errors.
Old (Opus 4.6):

```json
{
  "thinking": { "type": "enabled", "budget_tokens": 10000 }
}
```

New (Opus 4.7):

```json
{
  "thinking": { "type": "adaptive" },
  "effort": "xhigh"
}
```
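The old-to-new mapping can be mechanized if you have many call sites. This is a sketch of a request-body migration helper using the field names shown above; note that mapping a `budget_tokens` value to an effort level is a heuristic of this sketch, not a documented equivalence:

```python
# Sketch: rewrite an Opus 4.6-style request body for Opus 4.7.
# Drops the deprecated budget_tokens field and switches thinking to
# adaptive mode. The budget-to-effort mapping below is a heuristic
# invented here, not an official conversion table.

def migrate_request(body: dict) -> dict:
    new = dict(body)
    if new.get("model") == "claude-opus-4-6":
        new["model"] = "claude-opus-4-7"
    thinking = new.get("thinking", {})
    if thinking.get("type") == "enabled":
        budget = thinking.get("budget_tokens", 0)
        new["thinking"] = {"type": "adaptive"}
        # Heuristic: a large explicit budget suggests a higher effort level.
        new.setdefault("effort", "xhigh" if budget >= 10_000 else "high")
    return new

old = {"model": "claude-opus-4-6",
       "thinking": {"type": "enabled", "budget_tokens": 10000}}
print(migrate_request(old)["thinking"])  # {'type': 'adaptive'}
```

Requests that already use adaptive thinking pass through untouched, so the helper is safe to run over a mixed codebase.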
Prompt tuning: Opus 4.7 interprets instructions more literally than 4.6. If your prompts relied on the model “reading between the lines” or inferring intent from vague instructions, you may need to be more explicit. Test your critical prompts before switching over.
What breaks
Prefilling assistant messages — This is a hard break. If you’re using the technique of prefilling the assistant’s response to steer output format, Opus 4.7 returns a 400 error. You’ll need to move that guidance into your system or user prompts instead.
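If your code appends a partial assistant message to steer output format, one adaptation is to fold that steering text into the system prompt before sending. The sketch below assumes the standard messages shape; whether the model honors a "begin your reply exactly with" instruction as reliably as a true prefill is not guaranteed and should be tested:

```python
# Sketch: convert a prefilled-assistant request into one that carries the
# same steering text in the system prompt, since Opus 4.7 rejects a
# trailing assistant message with a 400 error per this article.

def strip_prefill(body: dict) -> dict:
    msgs = list(body.get("messages", []))
    if msgs and msgs[-1].get("role") == "assistant":
        prefill = msgs.pop()["content"]
        steering = f"Begin your reply exactly with: {prefill}"
        body = dict(body, messages=msgs)
        body["system"] = (body.get("system", "") + "\n" + steering).strip()
    return body

old = {
    "system": "You are a JSON API.",
    "messages": [
        {"role": "user", "content": "List three colors."},
        {"role": "assistant", "content": '{"colors": ['},  # the old prefill
    ],
}
new = strip_prefill(old)
print(new["messages"][-1]["role"])  # user
```

Requests without a trailing assistant message are returned unchanged, so the shim can sit in front of every call during migration.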
Token budgets — Any hardcoded token budget assumptions need revisiting due to the tokenizer change. A prompt that used 1,000 tokens on 4.6 might use 1,350 on 4.7.
Pricing & availability
Pricing is unchanged from Opus 4.6:
- Input: $5 per 1M tokens
- Output: $25 per 1M tokens
- Context window: 1,000,000 tokens
Available on Claude.ai (Pro, Max, Team, and Enterprise plans), the Anthropic API, Amazon Bedrock, Google Vertex AI, and Microsoft Foundry. Not available on the free tier — free users are limited to Sonnet.
Remember: the tokenizer change means your effective cost per prompt may be higher even though the rate is the same. Budget accordingly.
Safety notes
Anthropic reports a similar overall safety profile to Opus 4.6, with some specific changes:
- Better: Improved honesty scores and stronger resistance to prompt injection attacks.
- Worse: Modestly weaker on harm-reduction advice for controlled substances.
- Deliberate: Cyber capabilities are intentionally reduced compared to Mythos Preview. Security professionals who need those capabilities can apply through the new Cyber Verification Program.
- Opus 4.7 is part of Anthropic’s Project Glasswing safeguards testing.
Who should upgrade (and who should wait)
Upgrade now if you:
- Use Claude primarily for coding tasks (the SWE-bench Pro jump from 53.4% to 64.3% is substantial)
- Need vision capabilities (the XBOW improvement from 54.5% to 98.5% is transformative)
- Run multi-session agentic workflows that benefit from persistent memory
- Use Claude Code regularly (xhigh default + /ultrareview are genuine workflow improvements)
Wait if you:
- Have finely-tuned Opus 4.6 prompts in production (test thoroughly first — the literal interpretation change and tokenizer shift can break things)
- Rely on assistant message prefilling (this is a breaking change, full stop)
- Are cost-sensitive and haven’t measured the tokenizer impact on your specific workloads
- Don’t need coding or vision improvements (for pure text tasks, the gains are more modest)
The honest take: Opus 4.7 is a clear upgrade for developer-focused use cases. The coding benchmarks are genuinely impressive, and the vision overhaul fills what was a real gap. But the tokenizer change is a hidden cost increase that Anthropic hasn’t exactly highlighted, and the prefilling removal will break some production setups. Test before you migrate.