
GPT-5 Complete Guide: Models, Pricing, Benchmarks, and API Setup (2026)


GPT-5 launched in August 2025 as OpenAI’s biggest leap since GPT-4. GPT-5.4, released March 2026, is the current flagship β€” the first AI model to exceed human performance on desktop computer tasks (75% on OSWorld vs 72.4% human baseline). It handles up to 1 million tokens of context and can natively control software.

Here’s everything you need to know about the GPT-5 family.

The GPT-5 model family

| Model | Context | Input price | Output price | Best for |
|---|---|---|---|---|
| GPT-5.4 | 1M tokens | $2.50/1M | $15.00/1M | Complex reasoning, coding, computer use |
| GPT-5.4 Mini | 400K tokens | $0.75/1M | $4.50/1M | Fast tasks, high throughput, cost-sensitive |
| GPT-5.4 Pro | 1M tokens | $30.00/1M | $180.00/1M | Maximum quality, research, hard problems |
| GPT-5 Mini | 128K tokens | $0.25/1M | $2.00/1M | Budget option, simple tasks |

Pricing note: Input tokens above 272K are charged at 2x the standard rate. Cached input tokens get a 75-90% discount.
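To see how the surcharge and caching interact, here is a minimal cost estimator using the rates above. Two assumptions are mine, not OpenAI's documented billing logic: the 2x surcharge applies only to input tokens past 272K, and cached tokens get the low end (75%) of the stated discount.

```python
# Sketch: estimate a GPT-5.4 request cost from the published rates.
# ASSUMPTIONS: surcharge applies only to fresh input tokens past 272K;
# cached tokens get the 75% (low-end) discount.

INPUT_RATE = 2.50 / 1_000_000      # $ per input token
OUTPUT_RATE = 15.00 / 1_000_000    # $ per output token
SURCHARGE_THRESHOLD = 272_000      # input tokens above this bill at 2x
CACHE_DISCOUNT = 0.75              # cached input costs 25% of normal

def estimate_cost(input_tokens: int, output_tokens: int, cached_tokens: int = 0) -> float:
    fresh = input_tokens - cached_tokens
    standard = min(fresh, SURCHARGE_THRESHOLD)
    surcharged = max(fresh - SURCHARGE_THRESHOLD, 0)
    cost = standard * INPUT_RATE
    cost += surcharged * INPUT_RATE * 2        # long-context surcharge
    cost += cached_tokens * INPUT_RATE * (1 - CACHE_DISCOUNT)
    cost += output_tokens * OUTPUT_RATE
    return cost

# 300K input (50K of it cached), 4K output
print(f"${estimate_cost(300_000, 4_000, cached_tokens=50_000):.4f}")
```

Under these assumptions, a 300K-input request with 50K cached tokens and 4K of output costs well under a dollar; the output tokens are a meaningful share of that despite their small count, because the output rate is 6x the input rate.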

GPT-5.4 dynamically routes between internal models: a lightweight engine for routine tasks, a deeper reasoning engine for complex queries, and fallbacks to Mini/Nano variants when usage limits are reached.

Benchmarks

| Benchmark | GPT-5.4 | Claude Opus 4.6 | Gemini 3.1 Pro | What it tests |
|---|---|---|---|---|
| OSWorld | 75% (beats humans) | 68% | 71% | Desktop computer use |
| SWE-bench Verified | 69.2% | 72.1% | 65.8% | Real bug fixing |
| GPQA Diamond | 78.4% | 81.2% | 94.3% | PhD-level science |
| ARC-AGI-2 | 52.1% | 48.7% | 77.1% | Novel reasoning |
| GDPval | 83% | 79% | 76% | Professional knowledge |
| AIME 2025 | 86.7% | 82.3% | 88.1% | Competition math |

The takeaway: GPT-5.4 leads on computer use (OSWorld) and professional knowledge (GDPval). Claude leads on coding (SWE-bench). Gemini leads on reasoning (ARC-AGI-2, GPQA). No single model wins everything.

Key features

Computer use

GPT-5.4 can natively control desktop software β€” clicking buttons, filling forms, navigating menus. This is what the 75% OSWorld score means: given a task like β€œcreate a pivot table in this spreadsheet,” GPT-5.4 can actually operate the software to do it.

This powers Codex CLI and the OpenAI Agents SDK sandbox execution.

1M token context

The full 1M context window means you can feed entire codebases, long documents, or extensive conversation histories in a single request. Output is capped at 128K tokens.

For comparison: Claude also offers 1M context, Gemini offers 1M+, and GLM-5.1 offers 128K.
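Before sending a very large payload, it can be worth a cheap pre-flight check that it fits. This sketch uses the common ~4-characters-per-token heuristic, which is only an approximation; a real tokenizer would be more accurate.

```python
# Sketch: rough pre-flight check against GPT-5.4's 1M-token window,
# reserving room for the 128K output cap. The chars/4 ratio is a
# heuristic, not an exact token count.

CONTEXT_WINDOW = 1_000_000   # GPT-5.4 input context
MAX_OUTPUT = 128_000         # output cap

def fits_in_context(text: str, reserved_output: int = MAX_OUTPUT) -> bool:
    approx_tokens = len(text) // 4
    return approx_tokens + reserved_output <= CONTEXT_WINDOW

print(fits_in_context("x" * 400_000))  # β†’ True (~100K tokens)
```

For anything close to the limit, count tokens with an actual tokenizer rather than this heuristic, since the 272K surcharge threshold also makes the exact count financially relevant.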

Dynamic routing

GPT-5.4 internally routes between model sizes based on query complexity. Simple questions get fast, cheap processing. Complex questions get full reasoning. You pay per token regardless, but response times are faster for simple queries.
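You can apply the same idea on the client side, where it also saves money: send easy prompts to Mini and hard ones to the full model. The length/keyword heuristic below is purely illustrative and has nothing to do with OpenAI's actual internal routing.

```python
# Sketch: client-side routing in the same spirit as GPT-5.4's internal
# routing. The heuristic (prompt length + reasoning keywords) is an
# illustrative assumption, not OpenAI's logic.

REASONING_HINTS = ("prove", "debug", "refactor", "analyze", "step by step")

def pick_model(prompt: str) -> str:
    hard = len(prompt) > 2_000 or any(h in prompt.lower() for h in REASONING_HINTS)
    return "gpt-5.4" if hard else "gpt-5.4-mini"

print(pick_model("What's the capital of France?"))          # β†’ gpt-5.4-mini
print(pick_model("Debug this race condition step by step")) # β†’ gpt-5.4
```

In practice you would tune the heuristic on your own traffic; even a crude split can route the bulk of queries to the 3x-cheaper Mini.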

API setup

```python
from openai import OpenAI

client = OpenAI(api_key="sk-...")

# GPT-5.4 (standard)
response = client.chat.completions.create(
    model="gpt-5.4",
    messages=[
        {"role": "system", "content": "You are a senior software engineer."},
        {"role": "user", "content": "Review this authentication middleware for security issues."},
    ],
    max_tokens=4096,
)
print(response.choices[0].message.content)
```

Using GPT-5.4 Mini (cheaper, faster)

```python
# 3x cheaper, 2x faster β€” good for most tasks
response = client.chat.completions.create(
    model="gpt-5.4-mini",
    messages=[{"role": "user", "content": "Explain dependency injection in 3 sentences."}],
)
```

Using with OpenRouter (multi-provider fallback)

```python
client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key="or-...",
)

response = client.chat.completions.create(
    model="openai/gpt-5.4",
    messages=[{"role": "user", "content": "..."}],
)
```

OpenRouter gives you automatic fallback to other providers if OpenAI is down.
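If you would rather control the fallback order yourself, a small wrapper works with any OpenAI-compatible client. This is a generic sketch, not an OpenRouter feature: it just tries each (client, model) pair in turn and returns the first success.

```python
# Sketch: manual fallback across providers. Works with any client that
# exposes .chat.completions.create(...), e.g. a direct OpenAI client
# first and an OpenRouter client second.

def complete_with_fallback(clients, messages):
    last_error = None
    for client, model in clients:
        try:
            return client.chat.completions.create(model=model, messages=messages)
        except Exception as exc:   # network errors, rate limits, outages
            last_error = exc
    raise RuntimeError("all providers failed") from last_error
```

Usage mirrors the snippets above: `complete_with_fallback([(openai_client, "gpt-5.4"), (openrouter_client, "openai/gpt-5.4")], messages)`.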

ChatGPT subscription plans

| Plan | Price | Models included | Best for |
|---|---|---|---|
| Free | $0 | GPT-4o (limited) | Casual use |
| Plus | $20/mo | GPT-5.4, GPT-5.4 Mini | Individual developers |
| Pro | $200/mo | GPT-5.4 Pro, Deep Research | Power users, researchers |
| Team | $25/user/mo | GPT-5.4, admin controls | Small teams |
| Enterprise | ~$60/user/mo | All models, SSO, compliance | Organizations |

GPT-5.4 vs GPT-4o: what changed

| Feature | GPT-4o | GPT-5.4 |
|---|---|---|
| Context window | 128K | 1M |
| Max output | 16K | 128K |
| Computer use | ❌ | βœ… Native |
| OSWorld | ~35% | 75% |
| SWE-bench | ~48% | 69.2% |
| Dynamic routing | ❌ | βœ… |
| Price (input) | $2.50/1M | $2.50/1M |
| Price (output) | $10.00/1M | $15.00/1M |

Output is 50% more expensive, but the capability jump is massive. For most developers, GPT-5.4 replaces GPT-4o entirely.

When to use GPT-5.4 vs alternatives

| Use case | Best model | Why |
|---|---|---|
| General coding | Claude Sonnet | Higher SWE-bench, better at multi-file edits |
| Computer use / automation | GPT-5.4 | Best OSWorld score, native computer control |
| Complex reasoning | Gemini 3.1 Pro | 77.1% ARC-AGI-2 |
| Budget coding | GPT-5.4 Mini | $0.75/1M input, good quality |
| Free coding | GLM-5.1 or Qwen 3.6 | MIT licensed, free API tiers |
| Local/private | DeepSeek R1 or Qwen 3.5 | Runs on your hardware |
| Agent infrastructure | OpenAI Agents SDK + GPT-5.4 | Best sandbox execution |

Cost optimization

GPT-5.4 at $2.50/$15 per 1M tokens adds up fast. Strategies to control costs:

  1. Use Mini for simple tasks β€” 3x cheaper, handles 80% of queries well
  2. Cache system prompts β€” repeated prefixes get 75-90% discount
  3. Route by complexity β€” see our model routing guide
  4. Use OpenRouter β€” compare prices across providers
  5. Set per-user budgets β€” prevent runaway costs
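Strategy 5 can be as simple as an in-memory spend tracker. This is a minimal sketch of my own design, not an OpenAI feature; a production version would persist spend and reset it on a billing cycle.

```python
# Sketch: per-user budget guard (strategy 5). Blocks requests once a
# user's tracked spend reaches the monthly cap. In-memory only; persist
# and reset monthly in production.

from collections import defaultdict

class BudgetGuard:
    def __init__(self, monthly_cap_usd: float):
        self.cap = monthly_cap_usd
        self.spent = defaultdict(float)

    def check(self, user: str) -> bool:
        """True if the user still has budget left."""
        return self.spent[user] < self.cap

    def record(self, user: str, cost_usd: float) -> None:
        self.spent[user] += cost_usd

guard = BudgetGuard(monthly_cap_usd=50.0)
guard.record("alice", 49.50)
print(guard.check("alice"))  # β†’ True
guard.record("alice", 1.00)
print(guard.check("alice"))  # β†’ False
```

Call `check()` before each request and `record()` with the estimated cost afterward; rejected requests can fall back to a cheaper model instead of failing outright.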

See our AI API spending guide and FinOps for AI for detailed cost management.

FAQ

How much does GPT-5 cost?

GPT-5.4 costs $2.50 per million input tokens and $15.00 per million output tokens via the API. GPT-5.4 Mini is significantly cheaper at $0.75/$4.50 per million tokens. Through ChatGPT, the Plus plan at $20/month includes GPT-5.4 access.

Is GPT-5 better than Claude?

It depends on the task. GPT-5.4 leads on computer use (75% OSWorld vs Claude’s 68%) and professional knowledge benchmarks, while Claude Opus 4.6 leads on coding tasks like SWE-bench (72.1% vs 69.2%). Neither model dominates across all categories.

Can I use GPT-5 for free?

The free ChatGPT tier only includes GPT-4o, not GPT-5.4. To access GPT-5.4, you need at least a ChatGPT Plus subscription ($20/month) or use the API with pay-per-token pricing. There is no free API tier for GPT-5 models.

What’s the context window of GPT-5?

GPT-5.4 supports up to 1 million tokens of input context with a maximum output of 128K tokens. GPT-5.4 Mini has a 400K token context window, and the older GPT-5 Mini supports 128K tokens. Note that input tokens beyond 272K are charged at double the standard rate.

Related: Claude Code vs Codex CLI vs Gemini CLI Β· OpenAI Agents SDK Guide Β· GPT-5 vs Gemini 2.5 Pro Β· DeepSeek V3 vs GPT-5 Β· AI Coding Tools Pricing Β· OpenRouter Complete Guide Β· Best AI Coding Tools