
GPT-5 Complete Guide: Models, Pricing, Benchmarks, and API Setup (2026)


GPT-5 launched in August 2025 as OpenAI’s biggest leap since GPT-4. GPT-5.4, released March 2026, is the current flagship β€” the first AI model to exceed human performance on desktop computer tasks (75% on OSWorld vs 72.4% human baseline). It handles up to 1 million tokens of context and can natively control software.

Here’s everything you need to know about the GPT-5 family.

The GPT-5 model family

| Model | Context | Input price | Output price | Best for |
|---|---|---|---|---|
| GPT-5.4 | 1M tokens | $2.50/1M | $15.00/1M | Complex reasoning, coding, computer use |
| GPT-5.4 Mini | 400K tokens | $0.75/1M | $4.50/1M | Fast tasks, high throughput, cost-sensitive |
| GPT-5.4 Pro | 1M tokens | $30.00/1M | $180.00/1M | Maximum quality, research, hard problems |
| GPT-5 Mini | 128K tokens | $0.25/1M | $2.00/1M | Budget option, simple tasks |

Pricing note: Input tokens above 272K are charged at 2x the standard rate. Cached input tokens get a 75-90% discount.
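To see how the surcharge and caching interact, here is a minimal cost estimator using the rates above. Two assumptions are mine, not OpenAI's documented billing logic: the 2x surcharge applies only to input tokens past 272K, and cached tokens get the low end (75%) of the stated discount.

```python
# Sketch: estimate a GPT-5.4 request cost from the published rates.
# ASSUMPTIONS: surcharge applies only to fresh input tokens past 272K;
# cached tokens get the 75% (low-end) discount.

INPUT_RATE = 2.50 / 1_000_000      # $ per input token
OUTPUT_RATE = 15.00 / 1_000_000    # $ per output token
SURCHARGE_THRESHOLD = 272_000      # input tokens above this bill at 2x
CACHE_DISCOUNT = 0.75              # cached input costs 25% of normal

def estimate_cost(input_tokens: int, output_tokens: int, cached_tokens: int = 0) -> float:
    fresh = input_tokens - cached_tokens
    standard = min(fresh, SURCHARGE_THRESHOLD)
    surcharged = max(fresh - SURCHARGE_THRESHOLD, 0)
    cost = standard * INPUT_RATE
    cost += surcharged * INPUT_RATE * 2        # long-context surcharge
    cost += cached_tokens * INPUT_RATE * (1 - CACHE_DISCOUNT)
    cost += output_tokens * OUTPUT_RATE
    return cost

# 300K input (50K of it cached), 4K output
print(f"${estimate_cost(300_000, 4_000, cached_tokens=50_000):.4f}")
```

Under these assumptions, a 300K-input request with 50K cached tokens and 4K of output costs well under a dollar; the output tokens are a meaningful share of that despite their small count, because the output rate is 6x the input rate.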

GPT-5.4 dynamically routes between internal models: a lightweight engine for routine tasks, a deeper reasoning engine for complex queries, and fallbacks to Mini/Nano variants when usage limits are reached.

Benchmarks

| Benchmark | GPT-5.4 | Claude Opus 4.6 | Gemini 3.1 Pro | What it tests |
|---|---|---|---|---|
| OSWorld | 75% (beats humans) | 68% | 71% | Desktop computer use |
| SWE-bench Verified | 69.2% | 72.1% | 65.8% | Real bug fixing |
| GPQA Diamond | 78.4% | 81.2% | 94.3% | PhD-level science |
| ARC-AGI-2 | 52.1% | 48.7% | 77.1% | Novel reasoning |
| GDPval | 83% | 79% | 76% | Professional knowledge |
| AIME 2025 | 86.7% | 82.3% | 88.1% | Competition math |

The takeaway: GPT-5.4 leads on computer use (OSWorld) and professional knowledge (GDPval). Claude leads on coding (SWE-bench). Gemini leads on reasoning (ARC-AGI-2, GPQA). No single model wins everything.

Key features

Computer use

GPT-5.4 can natively control desktop software β€” clicking buttons, filling forms, navigating menus. This is what the 75% OSWorld score means: given a task like β€œcreate a pivot table in this spreadsheet,” GPT-5.4 can actually operate the software to do it.

This powers Codex CLI and the OpenAI Agents SDK sandbox execution.

1M token context

The full 1M context window means you can feed entire codebases, long documents, or extensive conversation histories in a single request. Output is capped at 128K tokens.

For comparison: Claude also offers 1M context, Gemini offers 1M+, and GLM-5.1 offers 128K.
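Before sending a very large payload, it can be worth a cheap pre-flight check that it fits. This sketch uses the common ~4-characters-per-token heuristic, which is only an approximation; a real tokenizer would be more accurate.

```python
# Sketch: rough pre-flight check against GPT-5.4's 1M-token window,
# reserving room for the 128K output cap. The chars/4 ratio is a
# heuristic, not an exact token count.

CONTEXT_WINDOW = 1_000_000   # GPT-5.4 input context
MAX_OUTPUT = 128_000         # output cap

def fits_in_context(text: str, reserved_output: int = MAX_OUTPUT) -> bool:
    approx_tokens = len(text) // 4
    return approx_tokens + reserved_output <= CONTEXT_WINDOW

print(fits_in_context("x" * 400_000))  # β†’ True (~100K tokens)
```

For anything close to the limit, count tokens with an actual tokenizer rather than this heuristic, since the 272K surcharge threshold also makes the exact count financially relevant.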

Dynamic routing

GPT-5.4 internally routes between model sizes based on query complexity. Simple questions get fast, cheap processing. Complex questions get full reasoning. You pay per token regardless, but response times are faster for simple queries.
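You can apply the same idea on the client side, where it also saves money: send easy prompts to Mini and hard ones to the full model. The length/keyword heuristic below is purely illustrative and has nothing to do with OpenAI's actual internal routing.

```python
# Sketch: client-side routing in the same spirit as GPT-5.4's internal
# routing. The heuristic (prompt length + reasoning keywords) is an
# illustrative assumption, not OpenAI's logic.

REASONING_HINTS = ("prove", "debug", "refactor", "analyze", "step by step")

def pick_model(prompt: str) -> str:
    hard = len(prompt) > 2_000 or any(h in prompt.lower() for h in REASONING_HINTS)
    return "gpt-5.4" if hard else "gpt-5.4-mini"

print(pick_model("What's the capital of France?"))          # β†’ gpt-5.4-mini
print(pick_model("Debug this race condition step by step")) # β†’ gpt-5.4
```

In practice you would tune the heuristic on your own traffic; even a crude split can route the bulk of queries to the 3x-cheaper Mini.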

API setup

```python
from openai import OpenAI

client = OpenAI(api_key="sk-...")

# GPT-5.4 (standard)
response = client.chat.completions.create(
    model="gpt-5.4",
    messages=[
        {"role": "system", "content": "You are a senior software engineer."},
        {"role": "user", "content": "Review this authentication middleware for security issues."},
    ],
    max_tokens=4096,
)
print(response.choices[0].message.content)
```

Using GPT-5.4 Mini (cheaper, faster)

```python
# 3x cheaper, 2x faster β€” good for most tasks
response = client.chat.completions.create(
    model="gpt-5.4-mini",
    messages=[{"role": "user", "content": "Explain dependency injection in 3 sentences."}],
)
```

Using with OpenRouter (multi-provider fallback)

```python
client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key="or-...",
)

response = client.chat.completions.create(
    model="openai/gpt-5.4",
    messages=[{"role": "user", "content": "..."}],
)
```

OpenRouter gives you automatic fallback to other providers if OpenAI is down.
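If you would rather control the fallback order yourself, a small wrapper works with any OpenAI-compatible client. This is a generic sketch, not an OpenRouter feature: it just tries each (client, model) pair in turn and returns the first success.

```python
# Sketch: manual fallback across providers. Works with any client that
# exposes .chat.completions.create(...), e.g. a direct OpenAI client
# first and an OpenRouter client second.

def complete_with_fallback(clients, messages):
    last_error = None
    for client, model in clients:
        try:
            return client.chat.completions.create(model=model, messages=messages)
        except Exception as exc:   # network errors, rate limits, outages
            last_error = exc
    raise RuntimeError("all providers failed") from last_error
```

Usage mirrors the snippets above: `complete_with_fallback([(openai_client, "gpt-5.4"), (openrouter_client, "openai/gpt-5.4")], messages)`.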

ChatGPT subscription plans

| Plan | Price | Models included | Best for |
|---|---|---|---|
| Free | $0 | GPT-4o (limited) | Casual use |
| Plus | $20/mo | GPT-5.4, GPT-5.4 Mini | Individual developers |
| Pro | $200/mo | GPT-5.4 Pro, Deep Research | Power users, researchers |
| Team | $25/user/mo | GPT-5.4, admin controls | Small teams |
| Enterprise | ~$60/user/mo | All models, SSO, compliance | Organizations |

GPT-5.4 vs GPT-4o: what changed

| Feature | GPT-4o | GPT-5.4 |
|---|---|---|
| Context window | 128K | 1M |
| Max output | 16K | 128K |
| Computer use | ❌ | βœ… Native |
| OSWorld | ~35% | 75% |
| SWE-bench | ~48% | 69.2% |
| Dynamic routing | ❌ | βœ… |
| Price (input) | $2.50/1M | $2.50/1M |
| Price (output) | $10.00/1M | $15.00/1M |

Output is 50% more expensive, but the capability jump is massive. For most developers, GPT-5.4 replaces GPT-4o entirely.

When to use GPT-5.4 vs alternatives

| Use case | Best model | Why |
|---|---|---|
| General coding | Claude Sonnet | Higher SWE-bench, better at multi-file edits |
| Computer use / automation | GPT-5.4 | Best OSWorld score, native computer control |
| Complex reasoning | Gemini 3.1 Pro | 77.1% ARC-AGI-2 |
| Budget coding | GPT-5.4 Mini | $0.75/1M input, good quality |
| Free coding | GLM-5.1 or Qwen 3.6 | MIT licensed, free API tiers |
| Local/private | DeepSeek R1 or Qwen 3.5 | Runs on your hardware |
| Agent infrastructure | OpenAI Agents SDK + GPT-5.4 | Best sandbox execution |

Cost optimization

GPT-5.4 at $2.50/$15 per 1M tokens adds up fast. Strategies to control costs:

  1. Use Mini for simple tasks β€” 3x cheaper, handles 80% of queries well
  2. Cache system prompts β€” repeated prefixes get 75-90% discount
  3. Route by complexity β€” see our model routing guide
  4. Use OpenRouter β€” compare prices across providers
  5. Set per-user budgets β€” prevent runaway costs
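Strategy 5 can be as simple as an in-memory spend tracker. This is a minimal sketch of my own design, not an OpenAI feature; a production version would persist spend and reset it on a billing cycle.

```python
# Sketch: per-user budget guard (strategy 5). Blocks requests once a
# user's tracked spend reaches the monthly cap. In-memory only; persist
# and reset monthly in production.

from collections import defaultdict

class BudgetGuard:
    def __init__(self, monthly_cap_usd: float):
        self.cap = monthly_cap_usd
        self.spent = defaultdict(float)

    def check(self, user: str) -> bool:
        """True if the user still has budget left."""
        return self.spent[user] < self.cap

    def record(self, user: str, cost_usd: float) -> None:
        self.spent[user] += cost_usd

guard = BudgetGuard(monthly_cap_usd=50.0)
guard.record("alice", 49.50)
print(guard.check("alice"))  # β†’ True
guard.record("alice", 1.00)
print(guard.check("alice"))  # β†’ False
```

Call `check()` before each request and `record()` with the estimated cost afterward; rejected requests can fall back to a cheaper model instead of failing outright.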

See our AI API spending guide and FinOps for AI for detailed cost management.

FAQ

How much does GPT-5 cost?

GPT-5.4 costs $2.50 per million input tokens and $15.00 per million output tokens via the API. GPT-5.4 Mini is significantly cheaper at $0.75/$4.50 per million tokens. Through ChatGPT, the Plus plan at $20/month includes GPT-5.4 access.

Is GPT-5 better than Claude?

It depends on the task. GPT-5.4 leads on computer use (75% OSWorld vs Claude’s 68%) and professional knowledge benchmarks, while Claude Opus 4.6 leads on coding tasks like SWE-bench (72.1% vs 69.2%). Neither model dominates across all categories.

Can I use GPT-5 for free?

The free ChatGPT tier only includes GPT-4o, not GPT-5.4. To access GPT-5.4, you need at least a ChatGPT Plus subscription ($20/month) or use the API with pay-per-token pricing. There is no free API tier for GPT-5 models.

What’s the context window of GPT-5?

GPT-5.4 supports up to 1 million tokens of input context with a maximum output of 128K tokens. GPT-5.4 Mini has a 400K token context window, and the older GPT-5 Mini supports 128K tokens. Note that input tokens beyond 272K are charged at double the standard rate.

Related: Claude Code vs Codex CLI vs Gemini CLI Β· OpenAI Agents SDK Guide Β· GPT-5 vs Gemini 2.5 Pro Β· DeepSeek V3 vs GPT-5 Β· AI Coding Tools Pricing Β· OpenRouter Complete Guide Β· Best AI Coding Tools