On paper, Claude Sonnet 5 looks like a steep discount on near-flagship quality. The real cost is more subtle, because of a tokenizer change that most launch coverage skipped. This guide breaks down the actual numbers so you can budget honestly.
The sticker price
| Period | Input (per 1M tokens) | Output (per 1M tokens) |
|---|---|---|
| Introductory (through Aug 31, 2026) | $2 | $10 |
| Standard (from Sep 1, 2026) | $3 | $15 |
For comparison, Opus 4.8 is $5 input and $25 output. At standard rates, Sonnet 5 costs 60 percent of the flagship price on both input and output, and during the introductory window that drops to 40 percent.
The tokenizer catch
Here is the part you need to understand. Sonnet 5 uses an updated tokenizer, the same kind of change Anthropic introduced with Opus 4.7. Because the model now splits text into tokens differently, the same input can map to roughly 1.0 to 1.35 times as many tokens, depending on content type.
That matters because you are billed per token, not per word. If your prompts land near the high end of that range, a nominal $2 input rate behaves more like $2.70 in real terms compared to the old tokenizer. Anthropic was transparent about this: it set the introductory price specifically so that moving from Sonnet 4.6 is roughly cost-neutral, not a flat price cut.
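To see what the tokenizer does to the headline rate, here is a minimal Python sketch that converts a sticker price into an effective price per million old-tokenizer tokens. It assumes a single multiplier for your whole workload; in practice you would measure it per content type. The function name `effective_rate` is illustrative, not part of any SDK.

```python
def effective_rate(nominal_rate: float, token_multiplier: float) -> float:
    """Effective USD per 1M tokens, measured in old-tokenizer tokens.

    nominal_rate: sticker price per 1M tokens on the new tokenizer.
    token_multiplier: new-tokenizer tokens divided by old-tokenizer tokens
    for your content (roughly 1.0 to 1.35 per the stated range).
    """
    if token_multiplier <= 0:
        raise ValueError("token_multiplier must be positive")
    return nominal_rate * token_multiplier


# High end of the stated range: a $2 sticker input rate with 1.35x the tokens.
print(f"${effective_rate(2.00, 1.35):.2f}")  # $2.70
```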
The takeaway: do not assume your bill drops just because the sticker price did. Measure your real prompt mix.
How to estimate your real cost
- Take a representative sample of your prompts and completions.
- Run them through Sonnet 5 and record actual input and output token counts from the API response.
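- Multiply by the rates, which are quoted per million tokens: divide each count by 1,000,000, then apply $2 input and $10 output during the intro window (and $3 and $15 to preview standard pricing).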
- Compare against your Sonnet 4.6 baseline for the same tasks.
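The steps above can be sketched in Python as a small cost model. This is a minimal sketch assuming you have already collected token counts; with Anthropic's Python SDK those come from `response.usage.input_tokens` and `response.usage.output_tokens` on each message response. The `Sample` class, the helper names, and the sample numbers are hypothetical; the baseline you pass to `cost_delta` is whatever you measured for the same tasks on Sonnet 4.6.

```python
from dataclasses import dataclass

# USD per 1M tokens (input, output), from the pricing table above.
INTRO_RATES = (2.00, 10.00)      # through Aug 31, 2026
STANDARD_RATES = (3.00, 15.00)   # from Sep 1, 2026


@dataclass
class Sample:
    """Token counts recorded from one API response (hypothetical helper)."""
    input_tokens: int
    output_tokens: int


def sample_cost(samples: list[Sample], rates: tuple[float, float]) -> float:
    """Total USD for all samples at the given (input, output) per-1M rates."""
    in_rate, out_rate = rates
    total_in = sum(s.input_tokens for s in samples)
    total_out = sum(s.output_tokens for s in samples)
    return total_in / 1_000_000 * in_rate + total_out / 1_000_000 * out_rate


def cost_delta(new_cost: float, baseline_cost: float) -> float:
    """Relative change vs. the baseline: 0.10 means 10 percent more expensive."""
    return (new_cost - baseline_cost) / baseline_cost


# Hypothetical measurements from a representative prompt sample.
samples = [Sample(1_200, 300), Sample(800, 200), Sample(2_500, 650)]
intro = sample_cost(samples, INTRO_RATES)
standard = sample_cost(samples, STANDARD_RATES)
print(f"intro ${intro:.4f}, standard ${standard:.4f}")
```

Pricing the sample at both rate sets up front means the same measurements also feed the September budget.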
This gives you a true cost delta rather than a sticker-price guess. For tooling, see the guides on monitoring and controlling AI API spending and the LLM inference cost calculator.
The effort-level cost lever
Sonnet 5 exposes reasoning effort levels: low, medium, high, max, and x-high. Higher effort uses more tokens, so effort is a direct cost multiplier. The trap: at x-high, Sonnet 5 can cost more than Opus 4.8 at a comparable accuracy point. If you push Sonnet 5 to maximum effort to match the flagship, you can spend more than you would have on Opus 4.8 directly. The fix is to run lower effort for routine work and escalate to Opus 4.8 for the hard tasks rather than maxing out Sonnet. See the effort levels guide and Sonnet 5 vs Opus 4.8.
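One way to put that advice into code is a small router that picks a model and effort level per task. This is a hypothetical Python sketch: the model name strings are placeholders rather than real API identifiers, the three difficulty buckets are an assumption about how you classify work, and the effort chosen for Opus 4.8 is a guess you should tune.

```python
from enum import Enum


class Difficulty(Enum):
    """Hypothetical buckets; how you classify a task is up to you."""
    ROUTINE = "routine"
    MODERATE = "moderate"
    HARD = "hard"


# Placeholder model names, not real API identifiers.
SONNET = "sonnet-5"
OPUS = "opus-4.8"


def route(difficulty: Difficulty) -> tuple[str, str]:
    """Return (model, effort) for a task.

    Routine work runs Sonnet 5 at low effort; hard work escalates to Opus 4.8
    instead of pushing Sonnet 5 to x-high or max, where it can cost more.
    """
    if difficulty is Difficulty.ROUTINE:
        return SONNET, "low"
    if difficulty is Difficulty.MODERATE:
        return SONNET, "medium"
    return OPUS, "high"  # assumed effort for Opus 4.8; tune for your tasks


for d in Difficulty:
    print(d.value, route(d))
```

The design choice is that Sonnet 5 never reaches its top effort settings in this router: hard tasks change model rather than effort.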
Ways to cut your Sonnet 5 bill
- Prompt caching. Cache long system prompts and shared context so repeated calls read them from the cache at a reduced rate instead of paying the full input price every time.
- Trim context. The 1M window is large, but every token you send costs money. Send only what the task needs.
- Default to low effort. Reserve high and x-high for tasks that genuinely need them.
- Batch where possible. Send non-urgent work through Anthropic's Message Batches API, which has historically been priced at a discount to real-time requests.
- Track per-feature spend. Know which workflows burn the most tokens.
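For the last point, a per-feature ledger can be as simple as a dictionary keyed by feature name. The Python sketch below is a hypothetical helper, not an Anthropic SDK feature; it prices calls at flat sticker rates and ignores prompt-caching and batch pricing, so treat its totals as approximate.

```python
from collections import defaultdict


class SpendTracker:
    """Hypothetical per-feature ledger built from recorded token counts."""

    def __init__(self, input_rate: float, output_rate: float) -> None:
        # USD per 1M tokens, e.g. 2.00 and 10.00 during the intro window.
        self.input_rate = input_rate
        self.output_rate = output_rate
        self.spend: dict[str, float] = defaultdict(float)

    def record(self, feature: str, input_tokens: int, output_tokens: int) -> float:
        """Add one call's estimated cost to `feature` and return that cost."""
        cost = (input_tokens * self.input_rate
                + output_tokens * self.output_rate) / 1_000_000
        self.spend[feature] += cost
        return cost

    def top(self, n: int = 3) -> list[tuple[str, float]]:
        """The n features with the highest total spend, most expensive first."""
        return sorted(self.spend.items(), key=lambda kv: kv[1], reverse=True)[:n]


tracker = SpendTracker(2.00, 10.00)
tracker.record("search", 50_000, 2_000)      # hypothetical feature and volumes
tracker.record("summaries", 200_000, 40_000)
tracker.record("search", 60_000, 1_500)
for feature, usd in tracker.top():
    print(f"{feature}: ${usd:.4f}")
```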
Sonnet 5 vs the field on price
Among frontier-adjacent models, Sonnet 5 is aggressively priced. It undercuts Opus 4.8 sharply and sits below the Mythos-class Fable 5, which was $10 input and $50 output before its ban. Against rivals like GPT-5.5 and Gemini 3.5 Flash, the value case depends on your workload; see Sonnet 5 vs GPT-5.5 and Sonnet 5 vs Gemini 3.5 Flash.
A concrete cost example
Say you process 2 million input tokens and generate 400,000 output tokens per day. At the introductory rate, that is 2 times $2 plus 0.4 times $10, which is $4 plus $4, for $8 a day at the sticker rate. Now apply the tokenizer: if your content lands near the high end of the 1.0 to 1.35 times range, your real input is closer to 2.7 million tokens, pushing input cost to about $5.40 and the daily total to roughly $9.40. (This scales input only; if your outputs inflate too, scale them the same way.) After the introductory window ends and rates move to $3 input and $15 output, the same real usage is about $8.10 input plus $6 output, or roughly $14.10 a day. Running the same workload on Opus 4.8 at $5 and $25 would cost far more, which is the core value case, but the example shows why you should budget with the tokenizer and post-introductory rates in mind, not just today's sticker price.
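The arithmetic in this example is easy to re-run with your own volumes. A minimal Python sketch, assuming volumes in millions of tokens and, like the example, applying the tokenizer multiplier to input only; `daily_cost` is an illustrative name.

```python
def daily_cost(input_m: float, output_m: float, in_rate: float, out_rate: float,
               input_multiplier: float = 1.0) -> float:
    """USD per day for token volumes given in millions.

    input_multiplier scales input volume for the new tokenizer (1.0 to 1.35).
    As in the worked example, only input is scaled here.
    """
    return input_m * input_multiplier * in_rate + output_m * out_rate


sticker = daily_cost(2.0, 0.4, 2.00, 10.00)                       # intro, sticker
intro_real = daily_cost(2.0, 0.4, 2.00, 10.00, input_multiplier=1.35)
standard_real = daily_cost(2.0, 0.4, 3.00, 15.00, input_multiplier=1.35)
print(f"{sticker:.2f} {intro_real:.2f} {standard_real:.2f}")  # 8.00 9.40 14.10
```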
When Sonnet 5 is not the cheapest option
There are two situations where Sonnet 5 can lose its price advantage. The first is running it at x-high effort to match Opus 4.8 on hard tasks, where the sheer volume of reasoning tokens can exceed Opus 4.8's cost at comparable accuracy. The second is sending huge context on every call when only a fraction is needed, which inflates input tokens that the tokenizer then multiplies. Both are avoidable: escalate hard tasks to Opus 4.8 rather than maxing out Sonnet, and trim context aggressively. See the effort levels guide and token efficiency guide.
Budgeting for the price change in September
One easy thing to overlook: the introductory rate ends on August 31, 2026, after which input rises from $2 to $3 and output from $10 to $15, a 50 percent increase. If you size your budget on the introductory rate alone, you will get a surprise in September. Model both rates now so the transition is planned rather than painful.
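A simple way to model both rates is to price each day by the rate in effect on that date. This Python sketch assumes a constant daily volume (the 2.7 million input and 0.4 million output tokens from the worked example) and uses sticker rates only; the function and constant names are illustrative.

```python
from datetime import date, timedelta

INTRO_END = date(2026, 8, 31)  # last day of introductory pricing
INTRO = (2.00, 10.00)          # USD per 1M input / output tokens
STANDARD = (3.00, 15.00)


def rates_on(day: date) -> tuple[float, float]:
    """Sticker (input, output) rates in effect on a given day."""
    return INTRO if day <= INTRO_END else STANDARD


def period_cost(start: date, end: date, input_m_per_day: float,
                output_m_per_day: float) -> float:
    """Total USD from start to end inclusive at a constant daily volume (millions)."""
    total = 0.0
    day = start
    while day <= end:
        in_rate, out_rate = rates_on(day)
        total += input_m_per_day * in_rate + output_m_per_day * out_rate
        day += timedelta(days=1)
    return total


august = period_cost(date(2026, 8, 1), date(2026, 8, 31), 2.7, 0.4)
september = period_cost(date(2026, 9, 1), date(2026, 9, 30), 2.7, 0.4)
print(f"August ${august:.2f}, September ${september:.2f}")  # August $291.40, September $423.00
```

Because rates are looked up per day, a budget window that straddles August 31 is priced correctly without special-casing.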
Frequently asked questions
How much does Claude Sonnet 5 cost? $2 input and $10 output per million tokens through August 31, 2026, then $3 and $15.
Why might my real cost be higher than the sticker price suggests? Sonnet 5 uses a new tokenizer that can map the same text to up to 1.35 times as many tokens, so effective costs depend on your content.
Is Sonnet 5 cheaper than Opus 4.8? Yes on a per-token basis (40 percent of the price during the introductory window, 60 percent at standard rates), but at maximum effort it can cost more than Opus 4.8 at a comparable accuracy point.
When does the introductory pricing end? August 31, 2026. After that, standard pricing applies.
How do I lower my Sonnet 5 bill? Use prompt caching, trim context, default to lower effort levels, and batch non-urgent work.
The bottom line
Sonnet 5 is genuinely cheaper than the flagship, but the new tokenizer and effort levels mean the headline rate is not the whole story. Measure your real token usage, set sensible effort defaults, and escalate to Opus 4.8 rather than maxing out Sonnet. For the full model picture, read the complete guide.