Claude Sonnet 5 is priced to be a value model, but its real cost depends heavily on how efficiently you use tokens. Two factors make this especially important: a new tokenizer that can inflate effective token counts, and selectable effort levels that multiply reasoning tokens. This guide shows how to get the most output per dollar.
Why token efficiency matters more with Sonnet 5
Sonnet 5 uses an updated tokenizer, the same kind of change Anthropic introduced with Opus 4.7. The same text can map to roughly 1.0 to 1.35 times more tokens than older models depending on content type. Since you pay per token, that difference flows straight to your bill. Anthropic set the introductory price so the move from Sonnet 4.6 is roughly cost-neutral, which tells you the tokenizer genuinely raises counts. Efficiency is how you claw that back.
Tactic 1: Prompt caching
If you send the same system prompt or shared context repeatedly, prompt caching lets you avoid being billed full price for those tokens every call. For agents and chat apps with stable instructions, this is often the single biggest saving. Structure your prompts so the stable part comes first and the variable part last.
Tactic 2: Trim the context window
Sonnet 5βs one million token context window is powerful, but every token you send costs input money. Loading an entire repo when the task touches three files is wasteful. Be deliberate:
- Include only the files and documents the task needs.
- Summarize long histories instead of resending them in full.
- Drop context that is no longer relevant.
For deeper patterns, see context window management.
Tactic 3: Tune effort levels
Higher effort means more reasoning tokens. Running everything at high or x-high burns tokens you often do not need. Default to medium, drop to low for routine work, and reserve high effort for hard tasks. Crucially, before pushing Sonnet 5 to x-high, check whether Opus 4.8 at lower effort is cheaper for the same accuracy. See the effort levels guide.
Tactic 4: Control output length
Output tokens cost more than input tokens ($10 versus $2 during the intro window). Ask for concise outputs when you do not need long ones, set sensible max_tokens limits, and avoid prompting the model into verbose explanations you will not read.
Tactic 5: Batch and route
- Group non-urgent work to reduce per-call overhead.
- Route routine tasks to lower effort or cheaper models, and reserve Sonnet 5βs higher tiers and Opus 4.8 for work that needs them.
Measure, then optimize
The only way to know your true efficiency is to measure. Pull actual input and output token counts from API responses, attribute them to features, and watch the trend. You will usually find a small number of workflows dominate your spend. Optimize those first. For tooling, see monitor and control AI API spending and the LLM inference cost calculator.
A worked example
Suppose you run a code-review agent that processes 1,000 pull requests a month. Each review sends a 12,000 token system prompt plus context and produces a 2,000 token response. Without caching, you pay for that 12,000 token system prompt on every single call. With prompt caching, you pay full price once and a reduced rate thereafter, which on a stable prompt can cut input costs dramatically. Now layer in the tokenizer: if your content lands at the high end of the 1.35 times range, your real input tokens are closer to 16,000 than 12,000 before optimization. The lesson is that caching and context discipline are not nice-to-haves with Sonnet 5, they are how you keep the model as cheap as the sticker price suggests.
Where the waste usually hides
In most applications, a small number of patterns account for most wasted tokens:
- Resending entire conversation histories instead of summaries.
- Loading whole files or repos when only a few functions are relevant.
- Verbose system prompts repeated on every call without caching.
- Running high or x-high effort on tasks that medium handles fine.
- Letting outputs run long when a concise answer would do.
Fix these five and most teams see a meaningful drop in spend without any loss of quality.
Build a habit of measurement
Token efficiency is not a one-time cleanup, it is a habit. Pull token counts from API responses, attribute them to features, and review the trend monthly. When a new feature ships, check its token profile before it scales. For tooling and dashboards, see monitor and control AI API spending. For the full pricing picture including the tokenizer math, see Sonnet 5 pricing explained.
Frequently asked questions
Why does Sonnet 5 use more tokens for the same text? It uses an updated tokenizer that can map the same text to up to 1.35 times more tokens depending on content type.
What is the single best way to cut Sonnet 5 costs? For apps with stable system prompts, prompt caching usually saves the most. For agents, trimming context and tuning effort levels matter most.
Do output tokens cost more than input tokens? Yes. During the intro window, output is $10 per million versus $2 for input, so concise outputs save real money.
Should I always use the full 1M context window? No. Send only what the task needs. Large context costs input tokens on every call.
The bottom line
Sonnet 5βs value is real, but the new tokenizer means efficiency is on you. Cache stable prompts, trim context, default to medium effort, and keep outputs tight. Measure your real token usage and optimize the workflows that dominate spend. For the full pricing picture, read Sonnet 5 pricing explained.