Two flagship models. Two very different philosophies. Anthropic dropped Claude Opus 4.7 yesterday (April 16, 2026), barely a month after OpenAI shipped GPT-5.4. Both claim the crown. Neither is wrong — they just win in different arenas.
This is the honest breakdown. No cheerleading. Just benchmarks, pricing, and practical advice on which one deserves your API budget.
At a Glance
| Feature | Claude Opus 4.7 | GPT-5.4 |
|---|---|---|
| Provider | Anthropic | OpenAI |
| Released | April 16, 2026 | ~March 2026 |
| Context Window | 1,000,000 tokens | 1,000,000 tokens |
| API Name | claude-opus-4-7 | — |
| Input Pricing | $5 / 1M tokens | ~$5 / 1M tokens |
| Output Pricing | $25 / 1M tokens | ~$25 / 1M tokens |
| Max Output | 128K tokens | — |
| SWE-bench Pro | 64.3% | 57.7% |
| Vision | 98.5% XBOW, 3.75 MP | Strong (details vary) |
| Access | API, Claude Code | API, ChatGPT Pro ($200/mo), Plus ($20/mo) |
Coding: Opus 4.7 Wins Clearly
This is where Opus 4.7 pulls away. The numbers aren’t subtle:
- SWE-bench Pro: 64.3% vs 57.7% — a 6.6-point gap on real-world software engineering tasks.
- CursorBench: 70% — Opus 4.7 is the first model to reach the 70% mark on this IDE-integrated coding benchmark.
- SWE-bench Multilingual: 80.5% — strong performance across languages, not just Python.
- BigLaw Bench: 90.9% — not coding per se, but it signals the kind of precise, detail-oriented reasoning that matters in complex codebases.
Opus 4.7 also introduces five effort levels (low, medium, high, xhigh, and max), letting you dial compute up or down depending on the task. The new xhigh tier sits between the old high and max, giving you a sweet spot for tasks that need serious reasoning without burning through your budget at max effort.
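In practice, the effort tiers lend themselves to simple per-task routing. A minimal sketch of what that could look like, assuming a request body with an effort field — the parameter name and exact payload shape here are illustrative assumptions, not confirmed API fields:

```python
# Hypothetical sketch: mapping task complexity to an effort tier.
# The "effort" field name and payload shape are assumptions based on
# the five tiers described above, not a documented API contract.

EFFORT_TIERS = ["low", "medium", "high", "xhigh", "max"]

def pick_effort(complexity: int) -> str:
    """Map a rough 1-5 complexity score to an effort tier."""
    return EFFORT_TIERS[min(max(complexity, 1), 5) - 1]

def build_request(prompt: str, complexity: int) -> dict:
    """Assemble a request body with the chosen effort tier."""
    return {
        "model": "claude-opus-4-7",  # API name from the table above
        "effort": pick_effort(complexity),
        "messages": [{"role": "user", "content": prompt}],
    }
```

The point of a mapping like this is that quick lookups stay cheap at low effort while gnarly refactors get xhigh or max, rather than paying max-tier compute for everything.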
A caveat worth stating plainly: these benchmarks come from Anthropic’s announcement. They chose which benchmarks to highlight. GPT-5.4 may perform better on benchmarks Anthropic didn’t include — OpenAI’s own Terminal-Bench results, for instance, are competitive by their own reporting. Take any vendor-selected comparison with a grain of salt.
General Reasoning: Closer Than You’d Think
GPT-5.4 doesn’t have a single headline number that screams dominance here, but in practice it holds up well. OpenAI has consistently optimized for multi-step reasoning and agent workflows, and GPT-5.4 continues that trajectory.
Where GPT-5.4 tends to shine:
- Research and synthesis — pulling together information from long contexts into coherent analysis.
- Writing and brainstorming — still the model many writers and content teams reach for first.
- Mixed workflows — when you need a single model that’s good-enough at everything rather than best-in-class at one thing.
Opus 4.7 counters with state-of-the-art results on Finance Agent and GDPval-AA benchmarks, suggesting it’s no slouch at complex reasoning either. But if your workload is more “research assistant” than “code generator,” GPT-5.4 remains a strong pick.
Vision: Opus 4.7’s 3.75 Megapixels Is a Big Deal
Opus 4.7 scores 98.5% on the XBOW vision benchmark and supports images up to 3.75 megapixels. That’s a meaningful jump — it means you can feed it high-resolution screenshots, architectural diagrams, or dense data visualizations without downscaling.
For developers working with UI screenshots, design specs, or document processing, this matters. GPT-5.4 has solid vision capabilities too, but Anthropic is clearly pushing the resolution ceiling higher with this release.
Pricing: Similar on Paper, but Read the Fine Print
Both models land in the same ballpark: roughly $5 per million input tokens and $25 per million output tokens. On a spec sheet, it’s a wash.
But there’s a catch with Opus 4.7: the new tokenizer.
Anthropic shipped a new tokenizer with Opus 4.7 that can produce up to 35% more tokens for the same text compared to previous Claude models. That means the same prompt that cost you X tokens on Opus 4 might cost you 1.35X tokens on Opus 4.7. The per-token price looks identical, but your actual bill could be meaningfully higher for the same workload.
This isn’t a hidden fee — Anthropic has been transparent about it — but it’s easy to miss if you’re just comparing rate cards. If you’re migrating from an older Claude model, benchmark your actual token counts before committing to a budget.
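A quick back-of-envelope check makes the tokenizer effect concrete. Rates are taken from the comparison table above; the 1.35x factor is the worst case cited, and actual inflation will vary with your text:

```python
# Back-of-envelope cost check for the tokenizer change described above.
# Rates from the comparison table; 1.35x is the worst-case inflation,
# real-world inflation depends on your actual prompts.

INPUT_RATE = 5 / 1_000_000    # $ per input token
OUTPUT_RATE = 25 / 1_000_000  # $ per output token
TOKENIZER_INFLATION = 1.35    # up to 35% more tokens for the same text

def cost(input_tokens: int, output_tokens: int,
         inflation: float = 1.0) -> float:
    """Dollar cost for a token volume, optionally inflated."""
    return (input_tokens * inflation * INPUT_RATE
            + output_tokens * inflation * OUTPUT_RATE)

# Same workload, measured with the old tokenizer's counts:
base = cost(10_000_000, 2_000_000)                        # $100.00
worst = cost(10_000_000, 2_000_000, TOKENIZER_INFLATION)  # $135.00
```

Same rate card, same workload, up to 35% higher bill — which is exactly why measuring your real token counts before budgeting matters.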
GPT-5.4’s pricing is straightforward through the API, and OpenAI also offers access through ChatGPT Pro ($200/month) and Plus ($20/month) subscriptions, which can be more economical for individual users who don’t need raw API access.
Ecosystem: Claude Code vs ChatGPT + Codex CLI
The model is only half the story. The tooling around it matters just as much.
Opus 4.7’s ecosystem leans heavily into developer workflows:
- Claude Code with the new /ultrareview command for deep code review.
- Auto mode that lets the model decide its own effort level per task.
- Task budgets to cap spending on agentic workflows.
- File system memory — persistent context across sessions without manual prompt stuffing.
GPT-5.4’s ecosystem plays to breadth:
- ChatGPT remains the most widely used AI interface, period.
- Codex CLI gives terminal-native developers an OpenAI-powered coding assistant.
- The plugin and GPT ecosystem offers integrations Anthropic hasn’t matched yet.
- ChatGPT Pro and Plus tiers make it accessible without API key management.
If you live in the terminal and write code all day, Claude Code’s feature set is hard to beat right now. If you need a model that plugs into a broader set of tools and workflows — or you have non-technical team members who need access — OpenAI’s ecosystem is more mature.
Who Should Pick Which
Choose Claude Opus 4.7 if you:
- Write code professionally and want the best available coding model
- Need high-resolution vision for screenshots, diagrams, or documents
- Work in Claude Code or terminal-first environments
- Want granular control over compute effort (the five-tier system is genuinely useful)
- Need multilingual code support
Choose GPT-5.4 if you:
- Need a strong generalist for research, writing, and mixed tasks
- Want the ChatGPT interface for team members who aren’t developers
- Prefer OpenAI’s broader ecosystem and integrations
- Want predictable tokenization costs without a new tokenizer to account for
- Already have workflows built around OpenAI’s API
Or use both. Seriously. At similar price points, there’s no rule that says you pick one. Route coding tasks to Opus 4.7 and general reasoning to GPT-5.4. The models are good at different things — let them be.
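The "use both" approach can be as simple as a routing shim in front of your API calls. A minimal sketch — the keyword heuristic is purely illustrative (real routers use classifiers or explicit task types), and the model names match the ones used in this article:

```python
# Illustrative router for the dual-model setup described above:
# coding tasks go to Opus 4.7, everything else to GPT-5.4.
# The keyword heuristic is a stand-in for a real task classifier.

CODING_HINTS = ("code", "bug", "refactor", "function", "stack trace")

def route(task: str) -> str:
    """Return the model name best suited to the task."""
    text = task.lower()
    if any(hint in text for hint in CODING_HINTS):
        return "claude-opus-4-7"
    return "gpt-5.4"
```

At near-identical price points, a shim like this costs you nothing extra and lets each model play to its strengths.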
Related Links
- Anthropic’s Opus 4.7 announcement — official benchmarks and feature details
- OpenAI GPT-5.4 documentation — API reference and pricing
- SWE-bench Pro leaderboard — independent coding benchmark results
- Claude Code documentation — /ultrareview, auto mode, and task budgets
- ChatGPT pricing — Pro, Plus, and API rate cards