Sonnet 4.6 narrowed the gap with Opus 4.6 to almost nothing on key benchmarks — while costing 40% less. So is Opus still worth it?
⚡ Update (April 17, 2026): Opus 4.7 has arrived and widens the gap again — 64.3% on SWE-bench Pro, 70% on CursorBench, and 98.5% vision accuracy. The Sonnet-vs-Opus calculus has shifted. See our Opus 4.7 complete guide for the full picture. The comparison below still applies to Sonnet 4.6 vs Opus 4.6.
At a glance
| | Sonnet 4.6 | Opus 4.6 |
|---|---|---|
| Context window | 1M tokens | 1M tokens (beta) |
| Max output | 64K tokens | 128K tokens |
| SWE-bench Verified | 79.6% | 80.8% |
| OSWorld (computer use) | 72.5% | — |
| Adaptive thinking | Yes | Yes |
| Agent teams | No | Yes |
| Input price | $3 / 1M tokens | $5 / 1M tokens |
| Output price | $15 / 1M tokens | $25 / 1M tokens |
The 1.2-point gap
On SWE-bench Verified — the most important coding benchmark — Opus 4.6 scores 80.8% vs Sonnet’s 79.6%. That’s a 1.2-point difference. For that gap, you’re paying 67% more on both input and output tokens.
For most developers, that math doesn’t work out.
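To make that math concrete, here is a back-of-the-envelope cost sketch using the list prices from the table above. The token volumes are just an illustrative example, not a recommended workload:

```python
# List prices (USD per 1M tokens) from the comparison table above.
PRICES = {
    "sonnet-4.6": {"input": 3.00, "output": 15.00},
    "opus-4.6": {"input": 5.00, "output": 25.00},
}

def workload_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD for a given token volume under list pricing."""
    p = PRICES[model]
    return (input_tokens / 1_000_000) * p["input"] + (output_tokens / 1_000_000) * p["output"]

# Example: 10M input tokens and 2M output tokens per month.
sonnet = workload_cost("sonnet-4.6", 10_000_000, 2_000_000)  # $60.00
opus = workload_cost("opus-4.6", 10_000_000, 2_000_000)      # $100.00
print(f"Sonnet: ${sonnet:.2f}  Opus: ${opus:.2f}  Premium: {opus / sonnet - 1:.0%}")
```

At that volume, the same workload costs $60 on Sonnet and $100 on Opus — the 67% premium, buying a 1.2-point benchmark gain.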
When Sonnet 4.6 is the better choice
- Most coding tasks. The 1.2-point gap is negligible for day-to-day development.
- High-volume API use. At $3/$15 vs $5/$25, the savings compound fast.
- Computer use / UI agents. Sonnet 4.6 scores 72.5% on OSWorld — excellent for browser automation.
- General assistant work. Writing, analysis, summarization — Sonnet handles these just as well.
When Opus 4.6 is still worth it
- Complex multi-file architecture. For large-scale refactors across many files, Opus’s extra reasoning depth shows.
- Agent teams. Only Opus 4.6 supports collaborative agent teams in Claude Code.
- 128K output. If you need very long generated outputs (Sonnet caps at 64K).
- Hardest reasoning tasks. On the most complex problems, Opus still has an edge.
Bottom line
Start with Sonnet 4.6. It’s the default on claude.ai for a reason — it gives you 95%+ of Opus’s capability at 40% less cost. Only upgrade to Opus if you’re hitting Sonnet’s limits on complex agentic workflows or need the longer output.
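That default-to-Sonnet, escalate-to-Opus policy can be sketched as a simple routing function. The thresholds below and the model ID strings are illustrative placeholders — check Anthropic’s docs for the exact model names before passing one to the API:

```python
def pick_model(*, files_touched: int, needs_agent_team: bool, max_output_tokens: int) -> str:
    """Return a model ID: Opus for agent teams, outputs beyond Sonnet's
    64K cap, or large multi-file refactors; Sonnet for everything else.
    The IDs and the files_touched threshold are illustrative, not official."""
    if needs_agent_team or max_output_tokens > 64_000 or files_touched > 10:
        return "claude-opus-4-6"    # placeholder ID
    return "claude-sonnet-4-6"      # placeholder ID

# A routine bug fix routes to Sonnet; a sprawling refactor routes to Opus.
print(pick_model(files_touched=3, needs_agent_team=False, max_output_tokens=8_000))
print(pick_model(files_touched=40, needs_agent_team=False, max_output_tokens=8_000))
```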
The fact that a Sonnet model is even comparable to Opus is the real story here. Anthropic has essentially made flagship-level AI accessible at mid-tier pricing.
FAQ
Is Opus 4.6 worth the premium over Sonnet 4.6?
For most developers, no. The 1.2-point SWE-bench gap (80.8% vs 79.6%) doesn’t justify paying 67% more. Opus is worth it only for complex multi-file architecture work, agent teams in Claude Code, or when you need 128K output tokens (vs Sonnet’s 64K).
When is Sonnet 4.6 enough?
For the vast majority of coding tasks, general assistant work, computer use agents, and high-volume API usage. Sonnet 4.6 gives you 95%+ of Opus’s capability at 40% less cost. It’s the default model on claude.ai for a reason.
Which is better for coding?
Opus 4.6 scores 80.8% vs Sonnet’s 79.6% on SWE-bench — a small but real gap. For day-to-day development, the difference is negligible. Opus shows its advantage on the hardest tasks: large-scale refactors across many files, complex architecture decisions, and long agent sessions where reasoning depth compounds.
See our full AI Model Comparison for all models side by side.
Related: Claude Code Guide · AI Coding Tools Pricing · How to Reduce LLM API Costs