Claude Opus 4.8: Complete Guide to Benchmarks, Features & Pricing (2026)


Anthropic released Claude Opus 4.8 on May 28, 2026. It replaces Opus 4.7 as the top model in the Claude family. The headline numbers: 69.2% on SWE-bench Pro (up from 64.3%), 74.2% on Terminal-Bench 2.1 (up 8.4 points), and four times fewer unflagged code flaws. Pricing stays the same at $5/$25 per million tokens.

The release also introduces dynamic workflows in Claude Code — a feature that spawns hundreds of parallel subagents to tackle codebase-scale problems — and effort control across all Claude products. Anthropic calls it “a modest but tangible improvement” while noting that Mythos-class models with even higher intelligence are coming in weeks.

Quick specs

API model string: claude-opus-4-8
Release date: May 28, 2026
Context window: 1,000,000 tokens
Input pricing: $5 / 1M tokens
Output pricing: $25 / 1M tokens
Fast mode pricing: $10 / $50 per 1M tokens (one-third of the Opus 4.7 fast mode price)
Fast mode speed: 2.5× standard speed
Effort levels: low, medium, high (default), extra/xhigh, max
Dynamic workflows: Yes (research preview; Max, Team, and Enterprise plans)
Availability: Claude API, Amazon Bedrock, Google Cloud Vertex AI, Microsoft Foundry

Benchmark comparison

Benchmark | Opus 4.8 | Opus 4.7 | GPT-5.5 | Gemini 3.5 Flash | What it measures
SWE-bench Pro | 69.2% | 64.3% | 58.6% | 54.2% | Agentic coding (real GitHub issues)
Terminal-Bench 2.1 | 74.2% | 65.8% | 72.1%* | Command-line task completion
SWE-bench Verified | 88.6% | 85.2% | 78.1% | Code generation accuracy
OSWorld-Verified | 87.1% | 82.3% | Computer use tasks
Humanity’s Last Exam | 57.9% | 51.2% | 53.4% | Multidisciplinary reasoning (with tools)
Finance Agent v2 | 53.9% | 48.7% | 57.9% | Financial analysis tasks
Artificial Analysis Index | 61.4 | 57.3 | 60.2 | Overall intelligence composite

*GPT-5.5’s Terminal-Bench score of 83.4% uses the Codex CLI harness, not the standard Terminus-2 harness used for all other models.

Opus 4.8 leads on agentic coding (SWE-bench Pro) by a wide margin — 10.6 points ahead of GPT-5.5. It also takes the #1 spot on the Artificial Analysis Intelligence Index, edging out GPT-5.5 by 1.2 points.
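The point gaps quoted in this article are simple differences between table entries. A minimal sketch that recomputes them, using only scores copied from the table above (the `scores` dictionary and the `lead` helper are illustrative names, not part of any benchmark tooling):

```python
# Scores copied from the benchmark table in this article.
# Only the entries the article explicitly compares are included.
scores = {
    "SWE-bench Pro": {"Opus 4.8": 69.2, "Opus 4.7": 64.3, "GPT-5.5": 58.6},
    "Terminal-Bench 2.1": {"Opus 4.8": 74.2, "Opus 4.7": 65.8},
    "Artificial Analysis Index": {"Opus 4.8": 61.4, "GPT-5.5": 60.2},
}


def lead(benchmark: str, model: str, baseline: str) -> float:
    """Return how many points `model` leads `baseline` by, rounded to 0.1."""
    row = scores[benchmark]
    return round(row[model] - row[baseline], 1)


if __name__ == "__main__":
    print(lead("SWE-bench Pro", "Opus 4.8", "GPT-5.5"))              # 10.6
    print(lead("SWE-bench Pro", "Opus 4.8", "Opus 4.7"))             # 4.9
    print(lead("Terminal-Bench 2.1", "Opus 4.8", "Opus 4.7"))        # 8.4
    print(lead("Artificial Analysis Index", "Opus 4.8", "GPT-5.5"))  # 1.2
```

The rounding step is there only to hide floating-point noise; the Artificial Analysis figure is in index points rather than percentage points.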

What changed from Opus 4.7

For a detailed side-by-side, see our Opus 4.8 vs 4.7 comparison. The key improvements:

Honesty and self-correction

The most notable improvement is reliability. Opus 4.8 is four times less likely than 4.7 to let flawed code pass without flagging the issue. It proactively identifies uncertainties, pushes back on unsound plans, and catches its own mistakes before reporting results.

Devin’s CEO Scott Wu noted: “It improves on Opus 4.6 and fixes the comment-verbosity and tool-calling issues we saw with Opus 4.7.”

Agentic coding

The 4.9-point jump on SWE-bench Pro (64.3% → 69.2%) represents a meaningful improvement in multi-step coding tasks. Opus 4.8 handles complex debugging, multi-file refactoring, and long-running agent sessions with better coherence.

Tool calling efficiency

Cursor’s CEO Michael Truell reported: “Tool calling is meaningfully more efficient, using fewer steps for the same intelligence.” This means lower token usage for the same quality of output in agentic workflows.

Computer use

OSWorld-Verified improved from 82.3% to 87.1%. The model scores 84% on Online-Mind2Web, making it the strongest browser-agent model tested according to Browserbase.

Alignment

Anthropic’s alignment team found Opus 4.8 “reaches new highs on our measures of prosocial traits like supporting user autonomy and acting in the user’s best interest.” Misaligned behavior rates are substantially lower than 4.7 and match Claude Mythos Preview.

Dynamic workflows

The biggest feature launch alongside Opus 4.8 is dynamic workflows in Claude Code. This allows Claude to:

  1. Plan a large task and break it into subtasks
  2. Spawn tens to hundreds of parallel subagents
  3. Run them simultaneously with independent verification
  4. Check results before reporting back

Use cases include codebase-wide migrations, security audits, bug hunts across entire services, and language ports. Jarred Sumner used dynamic workflows to port Bun from Zig to Rust — 750,000 lines, 99.8% test pass rate, eleven days from first commit to merge.

Dynamic workflows are available on Max, Team, and Enterprise plans. Enable them by asking Claude to “create a workflow” or by setting effort to ultracode in Claude Code.
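To make the plan, spawn, verify, report loop concrete, here is a toy sketch of the pattern in plain Python. It is not how Claude Code implements dynamic workflows: every name in it (`plan_subtasks`, `run_subagent`, `verify`, `run_workflow`) is hypothetical, and the "subagents" are ordinary functions running on a thread pool rather than model calls.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass
class Result:
    subtask: str
    output: str
    verified: bool = False


def plan_subtasks(task: str, files: list[str]) -> list[str]:
    """Step 1: split one large task into one subtask per file (hypothetical planner)."""
    return [f"{task}: {path}" for path in files]


def run_subagent(subtask: str) -> Result:
    """Step 2: stand-in for a subagent; a real one would call a model here."""
    return Result(subtask=subtask, output=f"done({subtask})")


def verify(result: Result) -> Result:
    """Step 3: independent check of a single result (a trivial string check here)."""
    result.verified = result.output == f"done({result.subtask})"
    return result


def run_workflow(task: str, files: list[str], max_workers: int = 8) -> list[Result]:
    subtasks = plan_subtasks(task, files)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map runs the calls concurrently and preserves input order.
        results = list(pool.map(run_subagent, subtasks))
        checked = list(pool.map(verify, results))
    # Step 4: report back only if every result passed verification.
    failed = [r.subtask for r in checked if not r.verified]
    if failed:
        raise RuntimeError(f"verification failed for: {failed}")
    return checked


if __name__ == "__main__":
    for r in run_workflow("migrate to new logger", ["a.py", "b.py", "c.py"]):
        print(r.subtask, r.verified)
```

The point of the sketch is the shape: planning produces independent units of work, execution fans out, and nothing is reported until a separate verification pass has run over every result.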

Effort control

Users now have explicit control over how much effort Claude puts into a response:

  • Low: fast responses and minimal thinking; consumes rate limits more slowly
  • Medium: a balance between speed and depth of reasoning
  • High (default for Opus 4.8): similar token spend to Opus 4.7’s default, but better performance
  • Extra/xhigh: more thinking for difficult tasks
  • Max: maximum effort, recommended for the hardest problems

In the API, set effort via the thinking parameter. In Claude Code, use /effort or the effort menu.

Fast mode

Fast mode makes Opus 4.8 respond at 2.5× normal speed. It now costs one-third of what fast mode cost on Opus 4.7, and twice the standard rate:

  • Standard: $5 input / $25 output per million tokens
  • Fast mode: $10 input / $50 output per million tokens (was $30/$150 for Opus 4.7)

This makes fast mode viable for production workloads where latency matters more than cost.
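A quick way to see what the trade-off means per request is to price one out. The sketch below uses only the per-million-token rates quoted above; the token counts in the example (200,000 input, 20,000 output) are made up for illustration, and `request_cost` is a hypothetical helper, not an SDK function.

```python
# USD per 1M tokens, as quoted in this article.
PRICES = {
    "standard": {"input": 5.00, "output": 25.00},
    "fast": {"input": 10.00, "output": 50.00},
    "fast_opus_4_7": {"input": 30.00, "output": 150.00},
}


def request_cost(mode: str, input_tokens: int, output_tokens: int) -> float:
    """Return the cost in USD of one request under the given pricing mode."""
    p = PRICES[mode]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000


if __name__ == "__main__":
    # Illustrative request: 200,000 input tokens and 20,000 output tokens.
    for mode in PRICES:
        print(mode, request_cost(mode, 200_000, 20_000))
    # standard 1.5
    # fast 3.0
    # fast_opus_4_7 9.0
```

For this example request, fast mode doubles the bill (from $1.50 to $3.00) in exchange for the 2.5× speedup, where the Opus 4.7 fast-mode rates would have charged $9.00.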

API changes

One notable API addition: system entries inside the messages array. Developers can now update Claude’s instructions mid-task without breaking the prompt cache or routing through a user turn. This is useful for:

  • Updating permissions as an agent runs
  • Adjusting token budgets mid-session
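  • Changing environment context dynamically

A minimal example with the Python SDK:

from anthropic import Anthropic

# Reads the API key from the ANTHROPIC_API_KEY environment variable.
client = Anthropic()

response = client.messages.create(
    model="claude-opus-4-8",
    max_tokens=4096,
    messages=[
        {"role": "user", "content": "Analyze this codebase"},
        {"role": "assistant", "content": "I'll start by..."},
        # Mid-task system entry: updates the instructions without a user turn.
        {"role": "system", "content": "Budget remaining: 50K tokens. Prioritize critical issues only."},
        {"role": "user", "content": "Continue"},
    ],
)

# The reply is a list of content blocks; print the text of the first one.
print(response.content[0].text)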

Pricing comparison

Model | Input per 1M tokens | Output per 1M tokens | SWE-bench Pro
Claude Opus 4.8 | $5.00 | $25.00 | 69.2%
GPT-5.5 | $5.00 | $30.00 | 58.6%
Gemini 3.5 Flash | $0.15 | $0.60 | 54.2%
DeepSeek V4-Pro | $0.435 | $0.87
MiMo V2.5 Pro | $0.435 | $0.87

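Opus 4.8 is the most capable model for agentic coding but also one of the most expensive; only GPT-5.5 charges more, and only for output tokens. For cost-sensitive workloads, the Chinese models (DeepSeek V4-Pro and MiMo V2.5 Pro) are roughly 11x cheaper on input and 29x cheaper on output, with competitive (though not leading) benchmark scores. For a direct comparison, see Opus 4.8 vs DeepSeek V4-Pro.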
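Price multiples depend on whether you compare input or output rates, so it helps to compute both. A small sketch using only the prices in the table above (`price_ratio` is an illustrative helper name):

```python
# (input, output) prices in USD per 1M tokens, copied from the pricing table.
PRICES = {
    "Claude Opus 4.8": (5.00, 25.00),
    "GPT-5.5": (5.00, 30.00),
    "Gemini 3.5 Flash": (0.15, 0.60),
    "DeepSeek V4-Pro": (0.435, 0.87),
    "MiMo V2.5 Pro": (0.435, 0.87),
}


def price_ratio(model: str, baseline: str) -> tuple[float, float]:
    """Return (input ratio, output ratio) of `model` price over `baseline` price."""
    m_in, m_out = PRICES[model]
    b_in, b_out = PRICES[baseline]
    return round(m_in / b_in, 1), round(m_out / b_out, 1)


if __name__ == "__main__":
    print(price_ratio("Claude Opus 4.8", "DeepSeek V4-Pro"))   # (11.5, 28.7)
    print(price_ratio("Claude Opus 4.8", "Gemini 3.5 Flash"))  # (33.3, 41.7)
    print(price_ratio("Claude Opus 4.8", "GPT-5.5"))           # (1.0, 0.8)
```

Which ratio matters more depends on the workload: long prompts with short replies are dominated by the input rate, while generation-heavy agent runs are dominated by the output rate.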

What’s next: Mythos

Anthropic confirmed that Mythos-class models — with “even higher intelligence than Opus” — will be available to all customers “in the coming weeks.” Currently, Claude Mythos Preview is limited to cybersecurity work under Project Glasswing. The company is developing safety guardrails to enable broader release.

Who should upgrade

  • Already on Opus 4.7: Upgrade immediately. Same price, higher scores on every benchmark listed above, and fewer unflagged flaws in output.
  • On Sonnet 4.6: Opus 4.8 is worth the premium if you do complex agentic work, multi-file refactoring, or need high reliability.
  • On GPT-5.5: Opus 4.8 beats it on SWE-bench Pro by 10.6 points. If coding quality is your priority, switch.
  • On DeepSeek/MiMo: Stay if cost is your primary concern. Opus 4.8 is better but, at the prices listed above, roughly 11x more expensive per input token and 29x more expensive per output token.

FAQ

Is Opus 4.8 worth the upgrade from 4.7?

Yes. Same price, better benchmarks across the board, fewer unflagged errors, and dynamic workflows. There is no reason to stay on 4.7.

How does Opus 4.8 compare to GPT-5.5 for coding?

Opus 4.8 leads on SWE-bench Pro (69.2% vs 58.6%) and the Artificial Analysis Intelligence Index (61.4 vs 60.2). GPT-5.5 scores higher on Terminal-Bench with its native Codex CLI harness (83.4% vs 74.2%), but that comparison uses different tooling. On a level playing field, Opus 4.8 is the stronger coding model.

What are dynamic workflows?

A new Claude Code feature that spawns hundreds of parallel subagents to tackle large-scale problems. Think codebase migrations, security audits, or language ports. Available on Max, Team, and Enterprise plans. See our dynamic workflows guide.

Is the pricing the same as Opus 4.7?

Yes. $5/M input, $25/M output for standard mode. Fast mode is actually cheaper now: $10/$50 (was $30/$150 for 4.7).

When is Mythos coming?

Anthropic says “in the coming weeks.” It’s currently in limited preview for cybersecurity work. If that timeline holds, broader availability would land around June or July 2026.

Should I use Opus 4.8 or Gemini 3.5 Flash?

Depends on your budget. Opus 4.8 is the better model for coding (69.2% vs 54.2% on SWE-bench Pro) but costs about 33x more per input token and about 42x more per output token. Gemini 3.5 Flash wins on some tool-use benchmarks and is better value for simpler tasks. See our Opus 4.8 vs Gemini 3.5 Flash comparison.