Anthropic released Claude Opus 4.8 on May 28, 2026. It replaces Opus 4.7 as the top model in the Claude family. The headline numbers: 69.2% on SWE-bench Pro (up from 64.3%), 74.2% on Terminal-Bench 2.1 (up 8.4 points), and a fourfold drop in code flaws that go unflagged. Pricing stays the same at $5/$25 per million tokens.
The release also introduces dynamic workflows in Claude Code, a feature that spawns hundreds of parallel subagents to tackle codebase-scale problems, and effort control across all Claude products. Anthropic calls it “a modest but tangible improvement” while noting that Mythos-class models with even higher intelligence are due in the coming weeks.
Quick specs
| Spec | Value |
|---|---|
| API model string | claude-opus-4-8 |
| Release date | May 28, 2026 |
| Context window | 1,000,000 tokens |
| Input pricing | $5 / 1M tokens |
| Output pricing | $25 / 1M tokens |
| Fast mode pricing | $10 / $50 per 1M tokens (3x cheaper than before) |
| Fast mode speed | 2.5× standard speed |
| Effort levels | low, medium, high (default), extra/xhigh, max |
| Dynamic workflows | Yes (research preview, Max/Team/Enterprise) |
| Availability | Claude API, Amazon Bedrock, Google Cloud Vertex AI, Microsoft Foundry |
Benchmark comparison
| Benchmark | Opus 4.8 | Opus 4.7 | GPT-5.5 | Gemini 3.5 Flash | What it measures |
|---|---|---|---|---|---|
| SWE-bench Pro | 69.2% | 64.3% | 58.6% | 54.2% | Agentic coding (real GitHub issues) |
| Terminal-Bench 2.1 | 74.2% | 65.8% | 72.1%* | — | Command-line task completion |
| SWE-bench Verified | 88.6% | 85.2% | 78.1% | — | Code generation accuracy |
| OSWorld-Verified | 87.1% | 82.3% | — | — | Computer use tasks |
| Humanity’s Last Exam | 57.9% | 51.2% | 53.4% | — | Multidisciplinary reasoning (with tools) |
| Finance Agent v2 | 53.9% | 48.7% | — | 57.9% | Financial analysis tasks |
| Artificial Analysis Index | 61.4 | 57.3 | 60.2 | — | Overall intelligence composite |
*GPT-5.5 reaches 83.4% on Terminal-Bench with the Codex CLI harness. That result is not directly comparable, because the other models were run on the standard Terminus-2 harness.
Opus 4.8 leads on agentic coding (SWE-bench Pro) by a wide margin — 10.6 points ahead of GPT-5.5. It also takes the #1 spot on the Artificial Analysis Intelligence Index, edging out GPT-5.5 by 1.2 points.
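The margins quoted above are simple differences between table entries. As a quick check, here is a minimal sketch that recomputes them; the `SCORES` dictionary and the `lead` helper are names invented for this illustration, and the values are copied from the benchmark table.

```python
# Scores copied from the benchmark table (percent; the index is unitless).
SCORES = {
    "SWE-bench Pro": {"Opus 4.8": 69.2, "Opus 4.7": 64.3, "GPT-5.5": 58.6},
    "Terminal-Bench 2.1": {"Opus 4.8": 74.2, "Opus 4.7": 65.8, "GPT-5.5": 72.1},
    "Artificial Analysis Index": {"Opus 4.8": 61.4, "Opus 4.7": 57.3, "GPT-5.5": 60.2},
}


def lead(benchmark: str, model: str, baseline: str) -> float:
    """Return how many points `model` leads `baseline` by, rounded to 0.1."""
    row = SCORES[benchmark]
    return round(row[model] - row[baseline], 1)


if __name__ == "__main__":
    for name in SCORES:
        vs_prev = lead(name, "Opus 4.8", "Opus 4.7")
        vs_gpt = lead(name, "Opus 4.8", "GPT-5.5")
        print(f"{name}: +{vs_prev} vs Opus 4.7, +{vs_gpt} vs GPT-5.5")
```

Rounding to one decimal place matches the precision of the published scores and avoids floating-point noise in the subtraction.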
What changed from Opus 4.7
For a detailed side-by-side, see our Opus 4.8 vs 4.7 comparison. The key improvements:
Honesty and self-correction
The most notable improvement is reliability. Opus 4.8 is four times less likely than 4.7 to let flawed code pass without flagging the issue. It proactively identifies uncertainties, pushes back on unsound plans, and catches its own mistakes before reporting results.
Devin’s CEO Scott Wu noted: “It improves on Opus 4.6 and fixes the comment-verbosity and tool-calling issues we saw with Opus 4.7.”
Agentic coding
The 4.9-point jump on SWE-bench Pro (64.3% → 69.2%) represents a meaningful improvement in multi-step coding tasks. Opus 4.8 handles complex debugging, multi-file refactoring, and long-running agent sessions with better coherence.
Tool calling efficiency
Cursor’s CEO Michael Truell reported: “Tool calling is meaningfully more efficient, using fewer steps for the same intelligence.” This means lower token usage for the same quality of output in agentic workflows.
Computer use
OSWorld-Verified improved from 82.3% to 87.1%. The model scores 84% on Online-Mind2Web, making it the strongest browser-agent model tested according to Browserbase.
Alignment
Anthropic’s alignment team found Opus 4.8 “reaches new highs on our measures of prosocial traits like supporting user autonomy and acting in the user’s best interest.” Misaligned behavior rates are substantially lower than 4.7 and match Claude Mythos Preview.
Dynamic workflows
The biggest feature launch alongside Opus 4.8 is dynamic workflows in Claude Code. This allows Claude to:
- Plan a large task and break it into subtasks
- Spawn tens to hundreds of parallel subagents
- Run them simultaneously with independent verification
- Check results before reporting back
Use cases include codebase-wide migrations, security audits, bug hunts across entire services, and language ports. Jarred Sumner used dynamic workflows to port Bun from Zig to Rust — 750,000 lines, 99.8% test pass rate, eleven days from first commit to merge.
Dynamic workflows are available on Max, Team, and Enterprise plans. Enable them by asking Claude to “create a workflow” or by setting effort to ultracode in Claude Code.
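The plan, spawn, verify, and report steps listed above follow a familiar fan-out/fan-in pattern. The sketch below illustrates that general pattern with Python's `asyncio`; it is not how Claude Code implements dynamic workflows, and `run_subagent`, `verify`, and `run_workflow` are hypothetical stand-ins written for this example.

```python
import asyncio


async def run_subagent(subtask: str) -> str:
    """Hypothetical stand-in for one subagent; real work would happen here."""
    await asyncio.sleep(0)  # yield control so the tasks can interleave
    return f"done: {subtask}"


def verify(subtask: str, result: str) -> bool:
    """Hypothetical independent check of a subagent's result."""
    return result == f"done: {subtask}"


async def run_workflow(subtasks: list[str], max_parallel: int = 100) -> dict[str, bool]:
    """Fan out subtasks with bounded concurrency, then verify each result."""
    limiter = asyncio.Semaphore(max_parallel)

    async def guarded(subtask: str) -> tuple[str, bool]:
        async with limiter:  # at most `max_parallel` subagents run at once
            result = await run_subagent(subtask)
        return subtask, verify(subtask, result)

    pairs = await asyncio.gather(*(guarded(s) for s in subtasks))
    return dict(pairs)


if __name__ == "__main__":
    plan = [f"migrate module {i}" for i in range(5)]
    report = asyncio.run(run_workflow(plan))
    print(report)
```

The semaphore caps how many subagents run at once, and each result is checked before it enters the final report, mirroring the "check results before reporting back" step.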
Effort control
Users now have explicit control over how much effort Claude puts into a response:
- Low — Fast responses, minimal thinking, consumes rate limits more slowly
- Medium — Balanced
- High (default for Opus 4.8) — Similar token spend to Opus 4.7’s default, but better performance
- Extra/xhigh — More thinking for difficult tasks
- Max — Maximum effort, recommended for the hardest problems
In the API, set effort via the thinking parameter. In Claude Code, use /effort or the effort menu.
Fast mode
Fast mode makes Opus 4.8 respond at 2.5× normal speed. It now costs one third of what fast mode cost for previous models:
- Standard: $5 input / $25 output per million tokens
- Fast mode: $10 input / $50 output per million tokens (was $30/$150 for Opus 4.7)
This makes fast mode viable for production workloads where latency matters more than cost.
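To see what the latency premium costs in practice, the following sketch prices a single request in both modes. The per-million rates come from the list above; the `request_cost` helper and the example token counts are assumptions made for illustration.

```python
# USD per million tokens, taken from the pricing figures in this section.
PRICES = {
    "standard": {"input": 5.00, "output": 25.00},
    "fast": {"input": 10.00, "output": 50.00},
}


def request_cost(mode: str, input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of one request in the given mode."""
    rate = PRICES[mode]
    cost = (input_tokens * rate["input"] + output_tokens * rate["output"]) / 1_000_000
    return round(cost, 6)


if __name__ == "__main__":
    # Hypothetical workload: 200K input tokens and 20K output tokens.
    for mode in PRICES:
        print(mode, request_cost(mode, 200_000, 20_000))
```

At these rates fast mode costs exactly twice as much as standard mode for any mix of input and output tokens.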
API changes
One notable API addition: system entries inside the messages array. Developers can now update Claude’s instructions mid-task without breaking the prompt cache or routing through a user turn. This is useful for:
- Updating permissions as an agent runs
- Adjusting token budgets mid-session
- Changing environment context dynamically
```python
from anthropic import Anthropic

client = Anthropic()

response = client.messages.create(
    model="claude-opus-4-8",
    max_tokens=4096,
    messages=[
        {"role": "user", "content": "Analyze this codebase"},
        {"role": "assistant", "content": "I'll start by..."},
        {"role": "system", "content": "Budget remaining: 50K tokens. Prioritize critical issues only."},
        {"role": "user", "content": "Continue"},
    ],
)
```
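In an agent loop, one way to use this is to append the system entry to the running history rather than edit earlier messages. The sketch below works on plain dictionaries in the message shape shown above and makes no API call; `Message` and `with_system_update` are hypothetical names. That appending leaves earlier entries untouched is presumably what keeps the cached prefix valid, but that link is an assumption here.

```python
from typing import TypedDict


class Message(TypedDict):
    role: str
    content: str


def with_system_update(history: list[Message], instruction: str) -> list[Message]:
    """Return a new history with a system entry appended; earlier entries are untouched."""
    return [*history, {"role": "system", "content": instruction}]


if __name__ == "__main__":
    history: list[Message] = [
        {"role": "user", "content": "Analyze this codebase"},
        {"role": "assistant", "content": "I'll start by..."},
    ]
    updated = with_system_update(history, "Budget remaining: 50K tokens.")
    updated.append({"role": "user", "content": "Continue"})
    print([m["role"] for m in updated])
```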
Pricing comparison
| Model | Input/M | Output/M | SWE-bench Pro |
|---|---|---|---|
| Claude Opus 4.8 | $5.00 | $25.00 | 69.2% |
| GPT-5.5 | $5.00 | $30.00 | 58.6% |
| Gemini 3.5 Flash | $0.15 | $0.60 | 54.2% |
| DeepSeek V4-Pro | $0.435 | $0.87 | — |
| MiMo V2.5 Pro | $0.435 | $0.87 | — |
Opus 4.8 is the most capable model here for agentic coding and sits near the top of the price range; only GPT-5.5 charges more for output ($30 vs $25). For cost-sensitive workloads, the Chinese models in the table cost roughly 11x less for input and 29x less for output, with competitive (though not leading) benchmark scores. For a direct comparison, see Opus 4.8 vs DeepSeek V4-Pro.
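Price multiples depend on whether you compare input or output rates, so it helps to compute both. This is a small sketch using the figures from the pricing table; `PRICING` and `price_ratio` are names made up for the example.

```python
# USD per million tokens (input, output), copied from the pricing table.
PRICING = {
    "Claude Opus 4.8": (5.00, 25.00),
    "GPT-5.5": (5.00, 30.00),
    "Gemini 3.5 Flash": (0.15, 0.60),
    "DeepSeek V4-Pro": (0.435, 0.87),
    "MiMo V2.5 Pro": (0.435, 0.87),
}


def price_ratio(model: str, baseline: str) -> tuple[float, float]:
    """Return (input, output) price of `model` as a multiple of `baseline`."""
    m_in, m_out = PRICING[model]
    b_in, b_out = PRICING[baseline]
    return round(m_in / b_in, 1), round(m_out / b_out, 1)


if __name__ == "__main__":
    for other in PRICING:
        if other != "Claude Opus 4.8":
            print(other, price_ratio("Claude Opus 4.8", other))
```

Because output tokens usually dominate the bill in agentic work, the output multiple is often the more relevant of the two.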
What’s next: Mythos
Anthropic confirmed that Mythos-class models — with “even higher intelligence than Opus” — will be available to all customers “in the coming weeks.” Currently, Claude Mythos Preview is limited to cybersecurity work under Project Glasswing. The company is developing safety guardrails to enable broader release.
Who should upgrade
- Already on Opus 4.7: Upgrade immediately. Same price, better at everything, fewer bugs in output.
- On Sonnet 4.6: Opus 4.8 is worth the premium if you do complex agentic work, multi-file refactoring, or need high reliability.
- On GPT-5.5: Opus 4.8 beats it on SWE-bench Pro by 10.6 points. If coding quality is your priority, switch.
- On DeepSeek/MiMo: Stay if cost is your primary concern. Opus 4.8 is better but roughly 11x (input) to 29x (output) more expensive per token, based on the pricing table.
FAQ
Is Opus 4.8 worth the upgrade from 4.7?
Yes. Same price, better benchmarks across the board, fewer unflagged errors, and dynamic workflows. There is no reason to stay on 4.7.
How does Opus 4.8 compare to GPT-5.5 for coding?
Opus 4.8 leads on SWE-bench Pro (69.2% vs 58.6%) and the Artificial Analysis Intelligence Index (61.4 vs 60.2). GPT-5.5 scores higher on Terminal-Bench with its native Codex CLI harness (83.4% vs 74.2%), but that comparison uses different tooling. On a level playing field, Opus 4.8 is the stronger coding model.
What are dynamic workflows?
A new Claude Code feature that spawns hundreds of parallel subagents to tackle large-scale problems. Think codebase migrations, security audits, or language ports. Available on Max, Team, and Enterprise plans. See our dynamic workflows guide.
Is the pricing the same as Opus 4.7?
Yes. $5/M input, $25/M output for standard mode. Fast mode is actually cheaper now: $10/$50 (was $30/$150 for 4.7).
When is Mythos coming?
Anthropic says “in the coming weeks.” It’s currently in limited preview for cybersecurity work. Expect broader availability in June-July 2026.
Should I use Opus 4.8 or Gemini 3.5 Flash?
Depends on your budget. Opus 4.8 is the better model for coding (69.2% vs 54.2% on SWE-bench Pro) but costs about 33x more for input and 42x more for output. Gemini 3.5 Flash wins on some tool-use benchmarks, leads on Finance Agent v2 in the table above (57.9% vs 53.9%), and is better value for simpler tasks. See our Opus 4.8 vs Gemini 3.5 Flash comparison.