Anthropic released Claude Opus 4.7 on April 16, 2026. It’s the new top-of-the-line model in the Claude family, replacing Opus 4.6 as the go-to for complex coding, agentic workflows, and vision tasks. The headline number: 64.3% on SWE-bench Pro, which puts it well ahead of both its predecessor (53.4%) and GPT-5.4 (57.7%). Vision got a massive upgrade too — 3.75 megapixel support with 98.5% accuracy on XBOW, nearly doubling the previous score. If you’re using Claude for serious development work, this is a significant step up. But there are tradeoffs you need to know about, especially around the new tokenizer.
Quick specs
| Spec | Value |
| --- | --- |
| API model string | claude-opus-4-7 |
| Release date | April 16, 2026 |
| Context window | 1,000,000 tokens |
| Input pricing | $5 / 1M tokens |
| Output pricing | $25 / 1M tokens |
| Max vision resolution | 2,576px long edge (~3.75 MP) |
| Effort levels | low, medium, high, xhigh (new), max |
| Thinking mode | Adaptive (default) — budget_tokens deprecated |
| Availability | Claude.ai (Pro, Max, Team, Enterprise), API, Amazon Bedrock, Google Vertex AI, Microsoft Foundry |
| Free tier | No (Sonnet only on free) |
Benchmark results
The numbers tell a clear story. Opus 4.7 leads on every major coding benchmark against publicly available models.
| Benchmark | Opus 4.7 | Opus 4.6 | GPT-5.4 | Gemini 3.1 Pro |
| --- | --- | --- | --- | --- |
| SWE-bench Pro | 64.3% | 53.4% | 57.7% | 54.2% |
| CursorBench | 70% | 58% | — | — |
| Vision accuracy (XBOW) | 98.5% | 54.5% | — | — |
| SWE-bench Multilingual | 80.5% | 77.8% | — | — |
| BigLaw Bench (high effort) | 90.9% | — | — | — |
| Finance Agent eval | SOTA | — | — | — |
| GDPval-AA | SOTA | — | — | — |
Additional highlights: Rakuten-SWE-Bench shows 3x more production tasks resolved compared to Opus 4.6, and there’s a 13% improvement on a 93-task coding benchmark. The SWE-bench Pro gap over GPT-5.4 (64.3% vs 57.7%) is the widest lead any Claude model has held over OpenAI’s flagship in this benchmark’s history.
One caveat: Opus 4.7 still trails Anthropic’s own Mythos Preview on most benchmarks. Mythos remains restricted to vetted security firms through the Cyber Verification Program, so it’s not a practical alternative for most developers.
What’s new in Opus 4.7
Vision overhaul
The biggest surprise in this release. Opus 4.7 supports images up to 2,576px on the long edge — roughly 3.75 megapixels, which is 3x what previous models handled. The XBOW vision accuracy score jumped from 54.5% to 98.5%. If you’ve been avoiding Claude for vision tasks, this changes the calculus entirely. Screenshots, diagrams, UI mockups — all of these are now processed at much higher fidelity.
xhigh effort level
There’s a new effort level slotted between high and max. The xhigh level gives you most of the quality gains of max without the full token cost. In practice, it’s the sweet spot for complex coding tasks where high isn’t quite enough but max burns through your budget. Claude Code now defaults to xhigh for all plans.
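Given the effort levels and model string listed above, a request using the new level might be assembled like this. This is a sketch built from the names in this article, not official SDK code; verify the exact parameter surface against the API reference before shipping:

```python
# Sketch: selecting the new "xhigh" effort level in a raw request body.
# The "effort" parameter and the claude-opus-4-7 model string follow the
# names used in this article and are assumptions, not verified API fields.

VALID_EFFORT_LEVELS = {"low", "medium", "high", "xhigh", "max"}

def build_request(prompt: str, effort: str = "xhigh") -> dict:
    """Assemble a messages-style request body with an effort level."""
    if effort not in VALID_EFFORT_LEVELS:
        raise ValueError(f"unknown effort level: {effort!r}")
    return {
        "model": "claude-opus-4-7",
        "max_tokens": 4096,
        "effort": effort,
        "messages": [{"role": "user", "content": prompt}],
    }

req = build_request("Refactor this module for clarity.")
print(req["effort"])  # xhigh
```

Validating the effort string client-side is cheap insurance: a typo like "x-high" fails fast instead of surfacing as an opaque API error mid-pipeline.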
File system-level memory
Opus 4.7 can maintain memory across multi-session work at the file system level. This means context about your project structure, conventions, and previous decisions persists between separate conversations when working through Claude Code or similar agentic setups. No more re-explaining your architecture every session.
Self-verification
The model now verifies its own outputs before reporting back to you. This is especially noticeable in agentic coding tasks — Opus 4.7 will run its own sanity checks on generated code, catch obvious errors, and fix them before presenting the result. It doesn’t eliminate all mistakes, but it reduces the “looks right, doesn’t compile” problem.
Adaptive thinking (default)
Adaptive thinking is now the default mode. The old budget_tokens parameter is deprecated. Instead, you control thinking depth through the effort parameter. The model decides how much thinking a given prompt needs rather than you pre-allocating a fixed token budget.
Task budgets (beta)
A new beta feature that lets you set a hard ceiling on token spend for long-running agentic tasks. Useful if you’re running Opus 4.7 on automated pipelines and need cost predictability.
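Since the feature is in beta and this article doesn’t document the request field, here is a hypothetical sketch that enforces the same idea client-side: a hard ceiling on cumulative token spend across an agentic loop. The class and exception names are inventions of this sketch:

```python
# Hypothetical client-side budget guard for long-running agentic tasks.
# The server-side task-budget field name is not specified in this article,
# so this sketch enforces the ceiling locally by tracking reported usage.

class TokenBudgetExceeded(RuntimeError):
    pass

class TaskBudget:
    def __init__(self, max_tokens: int):
        self.max_tokens = max_tokens
        self.used = 0

    def charge(self, input_tokens: int, output_tokens: int) -> None:
        """Record one API call's usage; raise once the ceiling is crossed."""
        self.used += input_tokens + output_tokens
        if self.used > self.max_tokens:
            raise TokenBudgetExceeded(
                f"budget of {self.max_tokens} tokens exceeded ({self.used} used)"
            )

budget = TaskBudget(max_tokens=50_000)
budget.charge(12_000, 8_000)  # fine: 20,000 of 50,000 used
print(budget.used)            # 20000
```

Wrapping every call's usage report in `charge()` means a runaway loop stops within one call of the ceiling, which is the cost predictability the beta feature is aiming at.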
Claude Code updates
Opus 4.7 ships alongside meaningful Claude Code improvements:
- /ultrareview — A new slash command that spins up a dedicated review session for your code. It flags bugs, design issues, and potential problems in a structured format. Pro and Max users get 3 free ultrareviews included.
- Auto mode for Max users — Previously limited to Enterprise, auto mode now works for Max plan users. Fewer permission interruptions on long multi-step tasks.
- xhigh default — Claude Code’s default effort level is now `xhigh` across all plans, up from `high`. You’ll notice better output quality out of the box, but also higher token usage.
The tokenizer warning
This is the part you need to read carefully. Opus 4.7 ships with an updated tokenizer, and the same text can now map to anywhere from 1.0x to 1.35x as many tokens as it did with Opus 4.6.
What this means in practice: your existing prompts and workflows could cost up to 35% more in tokens even though the per-token price hasn’t changed. The pricing looks identical on paper ($5/$25 per million tokens), but your actual bill may increase because the same content consumes more tokens.
This hits hardest on:
- Long system prompts you reuse across calls
- Agentic workflows with lots of context passing
- Higher effort levels (which already generate more output tokens)
Before you migrate production workloads, run your typical prompts through the new tokenizer and compare token counts. Don’t assume your costs stay flat just because the rate card didn’t change.
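As a rough planning aid, the worst-case bill impact follows directly from the published rates and the 1.35x upper bound quoted above. The token volumes below are illustrative, not measurements:

```python
# Estimate cost impact of the tokenizer change at the published rates
# ($5 / 1M input tokens, $25 / 1M output tokens). The 1.35 multiplier is
# the worst-case expansion quoted for Opus 4.7's new tokenizer.

INPUT_RATE = 5.00 / 1_000_000    # dollars per input token
OUTPUT_RATE = 25.00 / 1_000_000  # dollars per output token

def monthly_cost(input_tokens: int, output_tokens: int,
                 tokenizer_multiplier: float = 1.0) -> float:
    """Cost in dollars, scaling both token counts by a tokenizer multiplier."""
    return (input_tokens * tokenizer_multiplier * INPUT_RATE
            + output_tokens * tokenizer_multiplier * OUTPUT_RATE)

old = monthly_cost(40_000_000, 8_000_000)          # counted with Opus 4.6
worst = monthly_cost(40_000_000, 8_000_000, 1.35)  # worst case on 4.7
print(f"${old:.2f} -> ${worst:.2f}")  # $400.00 -> $540.00
```

The real multiplier for your workload sits somewhere in the 1.0–1.35 range and depends on your content, which is exactly why measuring with the new tokenizer beats guessing.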
Migration from Opus 4.6
What to change
Thinking mode: Replace `thinking: {type: 'enabled'}` plus `budget_tokens` with `thinking: {type: 'adaptive'}`, and control thinking depth through the `effort` parameter instead. The old format still works for now but is deprecated and will eventually return errors.
Old (Opus 4.6):

```json
{
  "thinking": { "type": "enabled", "budget_tokens": 10000 }
}
```

New (Opus 4.7):

```json
{
  "thinking": { "type": "adaptive" },
  "effort": "xhigh"
}
```
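The old-to-new mapping can be mechanized if you have many call sites. This is a sketch of a request-body migration helper using the field names shown above; note that mapping a `budget_tokens` value to an effort level is a heuristic of this sketch, not a documented equivalence:

```python
# Sketch: rewrite an Opus 4.6-style request body for Opus 4.7.
# Drops the deprecated budget_tokens field and switches thinking to
# adaptive mode. The budget-to-effort mapping below is a heuristic
# invented here, not an official conversion table.

def migrate_request(body: dict) -> dict:
    new = dict(body)
    if new.get("model") == "claude-opus-4-6":
        new["model"] = "claude-opus-4-7"
    thinking = new.get("thinking", {})
    if thinking.get("type") == "enabled":
        budget = thinking.get("budget_tokens", 0)
        new["thinking"] = {"type": "adaptive"}
        # Heuristic: a large explicit budget suggests a higher effort level.
        new.setdefault("effort", "xhigh" if budget >= 10_000 else "high")
    return new

old = {"model": "claude-opus-4-6",
       "thinking": {"type": "enabled", "budget_tokens": 10000}}
print(migrate_request(old)["thinking"])  # {'type': 'adaptive'}
```

Requests that already use adaptive thinking pass through untouched, so the helper is safe to run over a mixed codebase.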
Prompt tuning: Opus 4.7 interprets instructions more literally than 4.6. If your prompts relied on the model “reading between the lines” or inferring intent from vague instructions, you may need to be more explicit. Test your critical prompts before switching over.
What breaks
Prefilling assistant messages — This is a hard break. If you’re using the technique of prefilling the assistant’s response to steer output format, Opus 4.7 returns a 400 error. You’ll need to move that guidance into your system or user prompts instead.
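If your code appends a partial assistant message to steer output format, one adaptation is to fold that steering text into the system prompt before sending. The sketch below assumes the standard messages shape; whether the model honors a "begin your reply exactly with" instruction as reliably as a true prefill is not guaranteed and should be tested:

```python
# Sketch: convert a prefilled-assistant request into one that carries the
# same steering text in the system prompt, since Opus 4.7 rejects a
# trailing assistant message with a 400 error per this article.

def strip_prefill(body: dict) -> dict:
    msgs = list(body.get("messages", []))
    if msgs and msgs[-1].get("role") == "assistant":
        prefill = msgs.pop()["content"]
        steering = f"Begin your reply exactly with: {prefill}"
        body = dict(body, messages=msgs)
        body["system"] = (body.get("system", "") + "\n" + steering).strip()
    return body

old = {
    "system": "You are a JSON API.",
    "messages": [
        {"role": "user", "content": "List three colors."},
        {"role": "assistant", "content": '{"colors": ['},  # the old prefill
    ],
}
new = strip_prefill(old)
print(new["messages"][-1]["role"])  # user
```

Requests without a trailing assistant message are returned unchanged, so the shim can sit in front of every call during migration.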
Token budgets — Any hardcoded token budget assumptions need revisiting due to the tokenizer change. A prompt that used 1,000 tokens on 4.6 might use 1,350 on 4.7.
Pricing & availability
Pricing is unchanged from Opus 4.6:
- Input: $5 per 1M tokens
- Output: $25 per 1M tokens
- Context window: 1,000,000 tokens
Available on Claude.ai (Pro, Max, Team, and Enterprise plans), the Anthropic API, Amazon Bedrock, Google Vertex AI, and Microsoft Foundry. Not available on the free tier — free users are limited to Sonnet.
Remember: the tokenizer change means your effective cost per prompt may be higher even though the rate is the same. Budget accordingly.
Safety notes
Anthropic reports a similar overall safety profile to Opus 4.6, with some specific changes:
- Better: Improved honesty scores and stronger resistance to prompt injection attacks.
- Worse: Modestly weaker on harm-reduction advice for controlled substances.
- Deliberate: Cyber capabilities are intentionally reduced compared to Mythos Preview. Security professionals who need those capabilities can apply through the new Cyber Verification Program.
- Opus 4.7 is part of Anthropic’s Project Glasswing safeguards testing.
Who should upgrade (and who should wait)
Upgrade now if you:
- Use Claude primarily for coding tasks (the SWE-bench Pro jump from 53.4% to 64.3% is substantial)
- Need vision capabilities (the XBOW improvement from 54.5% to 98.5% is transformative)
- Run multi-session agentic workflows that benefit from persistent memory
- Use Claude Code regularly (xhigh default + /ultrareview are genuine workflow improvements)
Wait if you:
- Have finely-tuned Opus 4.6 prompts in production (test thoroughly first — the literal interpretation change and tokenizer shift can break things)
- Rely on assistant message prefilling (this is a breaking change, full stop)
- Are cost-sensitive and haven’t measured the tokenizer impact on your specific workloads
- Don’t need coding or vision improvements (for pure text tasks, the gains are more modest)
The honest take: Opus 4.7 is a clear upgrade for developer-focused use cases. The coding benchmarks are genuinely impressive, and the vision overhaul fills what was a real gap. But the tokenizer change is a hidden cost increase that Anthropic hasn’t exactly highlighted, and the prefilling removal will break some production setups. Test before you migrate.