Sonnet 4.6 has narrowed the gap with Opus 4.6 to almost nothing on key benchmarks while costing 40% less per token at the list prices below. So is Opus still worth it?
At a glance
| | Sonnet 4.6 | Opus 4.6 |
|---|---|---|
| Context window | 1M tokens | 1M tokens (beta) |
| Max output | 64K tokens | 128K tokens |
| SWE-bench Verified | 79.6% | 80.8% |
| OSWorld (computer use) | 72.5% | — |
| Adaptive thinking | Yes | Yes |
| Agent teams | No | Yes |
| Input price | $3 / 1M tokens | $5 / 1M tokens |
| Output price | $15 / 1M tokens | $25 / 1M tokens |
The 1.2-point gap
On SWE-bench Verified, a widely cited coding benchmark, Opus 4.6 scores 80.8% vs Sonnet’s 79.6%. That’s a difference of 1.2 percentage points. For that gap, you’re paying 67% more on both input ($5 vs $3 per 1M tokens) and output ($25 vs $15 per 1M tokens).
For most developers, that math doesn’t work out.
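To make that math concrete, here is a minimal sketch that recomputes the percentages from the figures in the table above. The dictionary layout and helper names are illustrative assumptions, not part of any official SDK.

```python
# Figures copied from the comparison table in this article:
# prices in USD per 1M tokens, SWE-bench Verified scores in percent.
SONNET = {"input": 3.0, "output": 15.0, "swe_bench": 79.6}
OPUS = {"input": 5.0, "output": 25.0, "swe_bench": 80.8}


def premium_pct(cheaper: float, pricier: float) -> float:
    """How much more the pricier option costs, relative to the cheaper one."""
    return (pricier - cheaper) / cheaper * 100


def discount_pct(cheaper: float, pricier: float) -> float:
    """How much less the cheaper option costs, relative to the pricier one."""
    return (pricier - cheaper) / pricier * 100


input_premium = premium_pct(SONNET["input"], OPUS["input"])
output_premium = premium_pct(SONNET["output"], OPUS["output"])
input_discount = discount_pct(SONNET["input"], OPUS["input"])
score_gap = OPUS["swe_bench"] - SONNET["swe_bench"]
score_ratio = SONNET["swe_bench"] / OPUS["swe_bench"] * 100

print(f"Opus premium: {input_premium:.0f}% input, {output_premium:.0f}% output")
print(f"Sonnet discount: {input_discount:.0f}%")
print(f"Gap: {score_gap:.1f} points; Sonnet reaches {score_ratio:.1f}% of Opus's score")
```

Note that "67% more" and "40% less" describe the same price difference: the first is measured against Sonnet's price, the second against Opus's.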
When Sonnet 4.6 is the better choice
- Most coding tasks. The 1.2-point gap is negligible for day-to-day development.
- High-volume API use. At $3/$15 vs $5/$25 per 1M input/output tokens, the savings compound fast.
- Computer use / UI agents. Sonnet 4.6 scores 72.5% on OSWorld, which makes it a strong fit for browser automation.
- General assistant work. Sonnet handles writing, analysis, and summarization just as well.
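For the high-volume case, a rough cost estimate can be sketched as follows. The model labels and the monthly token volumes are hypothetical placeholders, and the calculation uses only the list prices from the table; it ignores any discounts or pricing tiers not listed there.

```python
# List prices in USD per 1M tokens, taken from the table in this article.
# The dictionary keys are illustrative labels, not official API model IDs.
PRICES = {
    "sonnet-4.6": {"input": 3.0, "output": 15.0},
    "opus-4.6": {"input": 5.0, "output": 25.0},
}


def monthly_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate spend in USD for a given token volume at list price."""
    price = PRICES[model]
    total = input_tokens * price["input"] + output_tokens * price["output"]
    return total / 1_000_000


# Hypothetical workload: 500M input tokens and 100M output tokens per month.
INPUT_TOKENS = 500_000_000
OUTPUT_TOKENS = 100_000_000

sonnet_cost = monthly_cost("sonnet-4.6", INPUT_TOKENS, OUTPUT_TOKENS)
opus_cost = monthly_cost("opus-4.6", INPUT_TOKENS, OUTPUT_TOKENS)
savings = opus_cost - sonnet_cost

print(f"Sonnet: ${sonnet_cost:,.0f}  Opus: ${opus_cost:,.0f}  Savings: ${savings:,.0f}")
```

Because both input and output prices differ by the same ratio, the relative saving stays at 40% whatever the input/output mix; only the absolute dollar amount changes with volume.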
When Opus 4.6 is still worth it
- Complex multi-file architecture. For large-scale refactors across many files, Opus’s extra reasoning depth shows.
- Agent teams. Only Opus 4.6 supports collaborative agent teams in Claude Code.
- 128K output. Choose Opus if you need very long generated outputs; Sonnet caps at 64K tokens.
- Hardest reasoning tasks. On the most complex problems, Opus still has an edge.
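The criteria in the two lists above can be expressed as a simple routing rule. This is only a sketch: the `Task` fields, the `pick_model` helper, and the model labels are hypothetical, the exact token limits behind "64K" and "128K" are assumed, and "hardest reasoning" is a subjective flag you would have to set yourself.

```python
from dataclasses import dataclass

# Output caps from the table; treating "64K"/"128K" as round numbers is an assumption.
SONNET_MAX_OUTPUT = 64_000
OPUS_MAX_OUTPUT = 128_000


@dataclass
class Task:
    expected_output_tokens: int
    needs_agent_teams: bool = False   # collaborative agent teams in Claude Code
    hardest_reasoning: bool = False   # subjective: large refactors, hardest problems


def pick_model(task: Task) -> str:
    """Default to Sonnet; escalate to Opus only when a listed limit is hit."""
    if task.expected_output_tokens > OPUS_MAX_OUTPUT:
        raise ValueError("Output exceeds both models' caps; split the task.")
    needs_opus = (
        task.needs_agent_teams
        or task.hardest_reasoning
        or task.expected_output_tokens > SONNET_MAX_OUTPUT
    )
    # Illustrative labels, not official API model IDs.
    return "opus-4.6" if needs_opus else "sonnet-4.6"


print(pick_model(Task(expected_output_tokens=2_000)))
print(pick_model(Task(expected_output_tokens=100_000)))
print(pick_model(Task(expected_output_tokens=2_000, needs_agent_teams=True)))
```

The design choice here mirrors the article's advice: Sonnet is the default, and each Opus condition is an explicit, checkable reason to pay the premium.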
Bottom line
Start with Sonnet 4.6. It’s the default on claude.ai for a reason: it scores within 1.2 points of Opus on SWE-bench Verified at 40% less cost. Upgrade to Opus only if you’re hitting Sonnet’s limits on complex agentic workflows or need the longer output.
The fact that a Sonnet model is even comparable to Opus is the real story here. Anthropic has essentially made flagship-level AI accessible at mid-tier pricing.
See our full AI Model Comparison for all models side by side.