How to Use the Claude Sonnet 5 API: Setup, Code, and Pricing (2026)


Claude Sonnet 5 is available through the Claude API from day one under the model string claude-sonnet-5. This guide walks through getting set up, making your first call, using effort levels and the 1M context window, and keeping costs under control.

Step 1: Get an API key

  1. Sign in to the Claude Console at platform.claude.com.
  2. Open the API keys section and create a new key.
  3. Store it in an environment variable rather than hardcoding it:
export ANTHROPIC_API_KEY="your-key-here"

Keeping the key in an environment variable is a basic security habit. Never commit it to a repo. For more on this, see our guide to securing AI API keys.
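If you want your script to fail fast with a readable message when the variable is missing, a small check like the one below can help. This is a minimal sketch: load_api_key is a hypothetical helper name, not part of the SDK, and the SDK client reads ANTHROPIC_API_KEY on its own when the variable is set.

```python
import os


def load_api_key(env_var: str = "ANTHROPIC_API_KEY") -> str:
    """Return the API key from the environment, or raise a clear error."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        # Fail early so a missing key does not surface later as an auth error.
        raise RuntimeError(
            f"{env_var} is not set. Export it in your shell before running this script."
        )
    return key
```

Call load_api_key() once at startup, before you construct the client.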

Step 2: Install the SDK

For Python:

pip install anthropic

For JavaScript or TypeScript:

npm install @anthropic-ai/sdk

Step 3: Your first call

Python:

import anthropic

client = anthropic.Anthropic()

message = client.messages.create(
    model="claude-sonnet-5",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Explain what a race condition is in two sentences."}
    ],
)

print(message.content[0].text)

JavaScript:

import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic();

const message = await client.messages.create({
  model: "claude-sonnet-5",
  max_tokens: 1024,
  messages: [
    { role: "user", content: "Explain what a race condition is in two sentences." },
  ],
});

console.log(message.content[0].text);

That is the whole loop. The only change from older Claude models is the model string.

Step 4: Set the effort level

Sonnet 5 supports selectable reasoning effort: low, medium, high, max, and x-high. Lower effort is faster and cheaper; higher effort is more accurate but uses more tokens. Use low or medium for routine tasks and reserve x-high for genuinely hard reasoning. Our effort levels guide explains when each tier pays off and warns about a cost trap: at x-high, Sonnet 5 can cost more than Opus 4.8 at a comparable accuracy point.

Step 5: Use the 1M context window

Sonnet 5 accepts up to one million tokens of context, enough to load an entire codebase or a large set of documents in a single prompt. Pass long context as ordinary message content. Keep in mind that every input token is billed on every call, so a very large prompt gets expensive quickly; trim what the model does not need. For patterns on this, see context window management.
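One simple way to trim is to pack documents into a fixed token budget before building the prompt. The sketch below is illustrative only: pack_documents is a hypothetical helper, and the four-characters-per-token ratio is a rough assumption that will be off for code, non-English text, and the updated tokenizer. For exact numbers, rely on the token counts the API reports.

```python
def pack_documents(docs, token_budget, chars_per_token=4.0):
    """Greedily keep (name, text) pairs, in order, while the estimate fits the budget.

    chars_per_token is a rough heuristic, not the real tokenizer.
    """
    kept, used = [], 0
    for name, text in docs:
        estimate = int(len(text) / chars_per_token) + 1
        if used + estimate > token_budget:
            continue  # skip any document that would overflow the budget
        kept.append((name, text))
        used += estimate
    return kept, used


docs = [
    ("README.md", "x" * 400),
    ("big_dump.sql", "y" * 4000),
    ("notes.txt", "z" * 40),
]
kept, used = pack_documents(docs, token_budget=200)
print([name for name, _ in kept], used)
```

Put the documents you care about most first, since the helper keeps them in the order given.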

Pricing

Introductory pricing runs through August 31, 2026:

  • Input: $2 per million tokens
  • Output: $10 per million tokens

After that, standard pricing is $3 per million input tokens and $15 per million output tokens. One important caveat: Sonnet 5 uses an updated tokenizer, so the same text can map to up to 1.35 times as many tokens as it did on older models. Budget for that. The full breakdown is in Sonnet 5 pricing explained.
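To make the tokenizer caveat concrete, here is a small cost estimator built from the prices quoted in this section. It is a sketch, not a billing tool: estimate_cost and tokenizer_factor are hypothetical names, and applying the worst-case 1.35 multiplier to both input and output is an assumption.

```python
INTRO_PRICES = {"input": 2.00, "output": 10.00}     # USD per million tokens
STANDARD_PRICES = {"input": 3.00, "output": 15.00}  # USD per million tokens


def estimate_cost(input_tokens, output_tokens, prices, tokenizer_factor=1.0):
    """Estimate the USD cost of one request.

    tokenizer_factor scales token counts measured with an older tokenizer;
    use 1.0 if the counts already come from Sonnet 5.
    """
    scaled_in = input_tokens * tokenizer_factor
    scaled_out = output_tokens * tokenizer_factor
    return (scaled_in * prices["input"] + scaled_out * prices["output"]) / 1_000_000


# A 200,000-token prompt with a 4,000-token reply, counted with an older tokenizer.
print(estimate_cost(200_000, 4_000, INTRO_PRICES))
print(estimate_cost(200_000, 4_000, INTRO_PRICES, tokenizer_factor=1.35))
print(estimate_cost(200_000, 4_000, STANDARD_PRICES, tokenizer_factor=1.35))
```

For that request, the estimate is about $0.44 at introductory prices, about $0.59 with the worst-case tokenizer factor, and about $0.89 once standard pricing applies.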

Cost control tips

  • Use prompt caching for repeated system prompts and long shared context.
  • Run lower effort levels for routine work.
  • Trim context to what the task needs.
  • Set sensible max_tokens limits so runaway outputs do not surprise you.
  • Track spend per request. See monitor and control AI API spending.
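As an illustration of the first tip, the sketch below builds request parameters that mark a long system prompt as cacheable. It assumes Sonnet 5 accepts the same cache_control block shape the Messages API uses for earlier Claude models; build_cached_request is a hypothetical helper name.

```python
def build_cached_request(system_prompt, user_text, model="claude-sonnet-5", max_tokens=1024):
    """Build Messages API parameters with the system prompt marked for caching."""
    return {
        "model": model,
        "max_tokens": max_tokens,  # cap output so a runaway reply cannot surprise you
        "system": [
            {
                "type": "text",
                "text": system_prompt,
                # Assumption: same cache marker as on earlier Claude models.
                "cache_control": {"type": "ephemeral"},
            }
        ],
        "messages": [{"role": "user", "content": user_text}],
    }


params = build_cached_request("You are a careful code reviewer.", "Review this diff.")
# Pass the parameters to the client from Step 3:
#   message = client.messages.create(**params)
#   print(message.usage)  # token counts for this request, useful for tracking spend
```

Keeping the cached text identical across requests is what lets later calls reuse it.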

Other ways to access Sonnet 5

Beyond the native API, Sonnet 5 is available on Amazon Bedrock, Microsoft Foundry, and Google Cloud Vertex AI, and through multi-provider routers. If you want failover across providers, see the Sonnet 5 OpenRouter setup. To use it in your editor or terminal, see the Claude Code setup and Aider setup.

Streaming responses

For interactive apps, streaming the response token by token improves perceived speed. The SDK supports it directly. In Python:

with client.messages.stream(
    model="claude-sonnet-5",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Refactor this function for readability."}],
) as stream:
    for text in stream.text_stream:
        print(text, end="", flush=True)

Streaming does not change your token cost, but it makes agents and chat interfaces feel far more responsive, which matters for user-facing products.

Tool use and agentic calls

Sonnet 5 is built for tool use, so the most powerful pattern is giving it tools (functions) it can call. You define tools in the request, the model decides when to call them, you execute the call and return the result, and the model continues. This is the foundation of agentic workflows where Sonnet 5 plans, acts, and verifies. Because it was trained to check its own output, it tends to recover gracefully when a tool call returns an error, which reduces stuck loops.
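The define, call, execute, and return cycle can be sketched as follows. This is a minimal illustration under assumptions: get_weather, HANDLERS, and run_tool_calls are hypothetical names, and content blocks are shown as plain dicts, whereas the SDK returns objects that expose the same fields as attributes.

```python
def get_weather(city: str) -> str:
    """Hypothetical local tool; a real one would call a weather service."""
    return f"Sunny in {city}"


# Tool definitions sent with the request so the model knows what it can call.
TOOLS = [
    {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "input_schema": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    }
]

HANDLERS = {"get_weather": get_weather}


def run_tool_calls(content_blocks):
    """Execute each tool_use block and build the tool_result blocks to send back."""
    results = []
    for block in content_blocks:
        if block["type"] != "tool_use":
            continue  # text blocks need no action
        handler = HANDLERS.get(block["name"])
        try:
            if handler is None:
                raise ValueError(f"unknown tool: {block['name']}")
            output, is_error = handler(**block["input"]), False
        except Exception as exc:
            # Report the failure to the model instead of crashing the loop.
            output, is_error = str(exc), True
        results.append(
            {
                "type": "tool_result",
                "tool_use_id": block["id"],
                "content": output,
                "is_error": is_error,
            }
        )
    return results
```

In a full loop you would pass tools=TOOLS to client.messages.create, run this helper while the response's stop_reason is "tool_use", and send the results back as the content of a new user message.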

Handling errors and rate limits

Production code should handle transient failures. Wrap calls with retry logic that backs off on rate-limit (429) and server (5xx) responses, and surface a clear message on auth (401) errors. For broader patterns, see our guide to handling API errors. Anthropic raised rate limits across its platform alongside the Sonnet 5 launch to accommodate higher-effort usage, but high-volume apps should still implement backoff.
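A retry wrapper along those lines might look like the sketch below. It is generic on purpose: call_with_backoff is a hypothetical helper that only inspects a status_code attribute on the raised exception, which the SDK's status errors expose, and the delay values are illustrative defaults rather than recommendations. The SDK also retries some failures on its own, so treat this as an outer safety net.

```python
import random
import time


def call_with_backoff(fn, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call fn(), retrying 429 and 5xx failures with exponential backoff and jitter."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except Exception as exc:
            status = getattr(exc, "status_code", None)
            if status == 401:
                # Auth problems will not fix themselves; surface a clear message.
                raise RuntimeError("Authentication failed: check ANTHROPIC_API_KEY.") from exc
            retryable = status == 429 or (isinstance(status, int) and 500 <= status < 600)
            if not retryable or attempt == max_attempts:
                raise
            delay = base_delay * 2 ** (attempt - 1)  # 1s, 2s, 4s, ...
            sleep(delay + random.uniform(0, delay / 2))  # jitter spreads out retries
```

Wrap any call from the earlier steps, for example call_with_backoff(lambda: client.messages.create(**params)), where params holds your request arguments.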

Choosing the right effort per request

Because you set effort per request, you can route programmatically: low effort for classification or formatting, medium for standard generation, and high only for genuinely hard reasoning. This single decision often has a bigger impact on your bill than any other optimization. See the effort levels guide for concrete guidance.
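A routing table like this can be as simple as a dictionary lookup. In the sketch below, the task names and the choose_effort helper are hypothetical. How the chosen level is passed in the request is not covered in this article, so check the effort levels guide or the API reference for the exact parameter.

```python
# Hypothetical task categories mapped to the effort tiers named in this guide.
EFFORT_BY_TASK = {
    "classification": "low",
    "formatting": "low",
    "generation": "medium",
    "hard_reasoning": "high",
}


def choose_effort(task_type, default="medium"):
    """Return the effort level for a task type, falling back to a middle tier."""
    return EFFORT_BY_TASK.get(task_type, default)


print(choose_effort("classification"))
print(choose_effort("hard_reasoning"))
print(choose_effort("something_else"))
```

Keeping the mapping in one place makes it easy to audit which request types are allowed to use the more expensive tiers.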

Frequently asked questions

What is the Claude Sonnet 5 API model string? claude-sonnet-5.

How much does the Sonnet 5 API cost? $2 input and $10 output per million tokens through August 31, 2026, then $3 and $15.

Does the API support the full 1M context window? Yes, the API supports the one million token context window.

Can I set the reasoning effort through the API? Yes. Sonnet 5 exposes effort levels from low through x-high.

Is there a free way to try Sonnet 5? Sonnet 5 is the default model on the Free plan in the Claude app. For API use you pay per token.

The bottom line

Calling Sonnet 5 is a one-line change from older Claude models: set model to claude-sonnet-5. The real work is tuning effort levels and managing the 1M context window so your costs stay predictable. Start with low effort, escalate only when a task needs it, and watch the tokenizer math. For the full model picture, read the complete guide.