Jul 2, 2026 · 7 min read

GPT-5.6 Pricing: Sol, Terra, and Luna Compared (With the Cache Math)

GPT-5.6 introduces a three-tier pricing structure and a completely revamped caching system. If you are trying to figure out what this will actually cost in production, you need to understand both the base prices and how the cache mechanics change your effective cost.

Here is the full breakdown, including real scenarios and comparisons against Claude Sonnet 5, Claude Opus 4.8, and DeepSeek models.

Base Pricing

All prices are per 1 million tokens:

Model	Input	Output	Model ID
GPT-5.6 Sol	$5.00	$30.00	gpt-5.6-sol
GPT-5.6 Terra	$2.50	$15.00	gpt-5.6-terra
GPT-5.6 Luna	$1.00	$6.00	gpt-5.6-luna

For context, here is how these compare to other frontier models available in 2026:

Model	Input	Output
Claude Opus 4.8	$15.00	$75.00
GPT-5.6 Sol	$5.00	$30.00
Claude Sonnet 5	$2.00	$10.00
GPT-5.6 Terra	$2.50	$15.00
GPT-5.6 Luna	$1.00	$6.00

Sol is expensive but not the most expensive model on the market: Opus 4.8 costs 3x as much on input and 2.5x as much on output. Terra is priced near Sonnet 5, though it is 25% higher on input and 50% higher on output. Luna is the cheapest model in this table.

For a comprehensive comparison across all providers, see our AI API pricing guide for 2026.

The Cache System: How It Actually Works

GPT-5.6 introduces explicit cache management that gives developers direct control over what gets cached and how. This is different from the implicit caching in previous OpenAI models.

Cache Mechanics

Cache writes: 1.25x the standard input price
Cache reads: 90% discount (you pay 10% of standard input price)
Minimum lifetime: 30 minutes
Breakpoints: You define explicit cache boundaries in your prompts

Cache Prices Per Model

Model	Standard Input	Cache Write	Cache Read
Sol	$5.00	$6.25	$0.50
Terra	$2.50	$3.125	$0.25
Luna	$1.00	$1.25	$0.10
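The table follows directly from the two multipliers listed under Cache Mechanics. As a minimal sketch, assuming those multipliers apply uniformly to every tier (the dictionary and function names here are illustrative, not part of any SDK):

```python
# Base input prices per 1M tokens, taken from the Base Pricing table.
BASE_INPUT = {"sol": 5.00, "terra": 2.50, "luna": 1.00}

CACHE_WRITE_MULTIPLIER = 1.25  # cache writes cost 1.25x standard input
CACHE_READ_MULTIPLIER = 0.10   # cache reads cost 10% of standard input


def cache_prices(base_input: float) -> tuple[float, float]:
    """Return (cache_write, cache_read) prices per 1M tokens."""
    return base_input * CACHE_WRITE_MULTIPLIER, base_input * CACHE_READ_MULTIPLIER


for model, price in BASE_INPUT.items():
    write, read = cache_prices(price)
    print(f"{model}: write ${write:.3f}, read ${read:.3f}")
```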

How Breakpoints Work

You place explicit markers in your prompt to define cacheable sections. Everything before a breakpoint is eligible for caching. When a subsequent request shares the same prefix up to a breakpoint, the cached version is read instead of reprocessed.

This is fundamentally better than implicit caching because:

You control what gets cached (no surprises)
You know the minimum lifetime (30 minutes guaranteed)
You can optimize prompt structure around cache boundaries
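To make the prefix rule concrete, here is a toy model in Python. It is not the real service or its API: the class, the segment list, and the `breakpoint_index` parameter are all hypothetical, and cache expiry is ignored. It only illustrates that a request gets a cache read when everything before the breakpoint is identical to an earlier request.

```python
import hashlib


class ToyPrefixCache:
    """Toy stand-in for a prefix cache; hypothetical, for illustration only."""

    def __init__(self) -> None:
        self._keys: set[str] = set()

    def lookup(self, segments: list[str], breakpoint_index: int) -> str:
        """Return 'read' if segments[:breakpoint_index] was seen before, else 'write'."""
        prefix = "\x00".join(segments[:breakpoint_index])
        key = hashlib.sha256(prefix.encode("utf-8")).hexdigest()
        if key in self._keys:
            return "read"
        self._keys.add(key)
        return "write"


cache = ToyPrefixCache()
system = "You are a code reviewer."
guide = "Style guide: prefer small functions."

# Breakpoint after the second segment: system prompt + guide form the prefix.
print(cache.lookup([system, guide, "Review file A"], 2))  # write (first time)
print(cache.lookup([system, guide, "Review file B"], 2))  # read (same prefix)
print(cache.lookup([system + " Be terse.", guide, "Review file A"], 2))  # write (prefix changed)
```

Note that changing even one character before the breakpoint produces a different prefix, so the third call pays for a new write.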

The Cache Math: When Does It Pay Off?

Let us calculate when caching becomes worthwhile for Sol:

Setup: A system prompt of 50,000 tokens that stays constant across requests.

Without caching:

Each request costs $0.25 for the system prompt input (50K × $5/1M)

With caching:

First request (cache write): $0.3125 (50K × $6.25/1M)
Subsequent requests (cache read): $0.025 (50K × $0.50/1M)

Break-even point: The cache write costs $0.0625 more than sending the same prompt uncached. Each cache read saves $0.225 compared to uncached input. You break even after just 1 additional request, because the first cache read more than recovers the write premium.

After 10 requests with the same prefix within 30 minutes:

Without cache: 10 × $0.25 = $2.50
With cache: $0.3125 + (9 × $0.025) = $0.5375
Savings: 78.5%

After 100 requests:

Without cache: 100 × $0.25 = $25.00
With cache: $0.3125 + (99 × $0.025) = $2.7875
Savings: 89%

The takeaway: if you are making 2 or more requests with the same prefix within a 30-minute window, always cache. The savings approach the 90% read discount as the number of requests grows.
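The arithmetic above can be reproduced with a short Python sketch. It assumes a single cache write followed by cache reads that all land inside the cache lifetime; the function names are illustrative.

```python
def uncached_cost(n_requests: int, prefix_tokens: int, input_price: float) -> float:
    """Dollar cost of sending the prefix uncached on every request."""
    return n_requests * prefix_tokens * input_price / 1_000_000


def cached_cost(n_requests: int, prefix_tokens: int, input_price: float,
                write_mult: float = 1.25, read_mult: float = 0.10) -> float:
    """One cache write, then cache reads for the remaining requests."""
    if n_requests <= 0:
        return 0.0
    per_million = prefix_tokens / 1_000_000
    write = per_million * input_price * write_mult
    reads = (n_requests - 1) * per_million * input_price * read_mult
    return write + reads


def break_even_requests(prefix_tokens: int, input_price: float) -> int:
    """Smallest request count at which caching is cheaper than not caching."""
    n = 1
    while cached_cost(n, prefix_tokens, input_price) >= uncached_cost(n, prefix_tokens, input_price):
        n += 1
    return n


# Sol, 50K-token system prompt, as in the setup above.
for n in (1, 2, 10, 100):
    plain = uncached_cost(n, 50_000, 5.00)
    cached = cached_cost(n, 50_000, 5.00)
    print(f"{n:>3} requests: ${plain:.4f} uncached, ${cached:.4f} cached, "
          f"{100 * (1 - cached / plain):.1f}% saved")

print("break-even at", break_even_requests(50_000, 5.00), "requests")
```

Because the write and read multipliers are the same for all three tiers, the break-even request count does not depend on which model you use, only the dollar amounts do.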

Real-World Cost Scenarios

Scenario 1: Coding Assistant (Individual Developer)

Typical usage: 50 requests per day, 5K token system prompt, 3K token user message, 2K token output.

Using Sol:

System prompt cached (after first request): 50 × 5K × $0.50/1M = $0.125
User messages (unique, not cached): 50 × 3K × $5/1M = $0.75
Output: 50 × 2K × $30/1M = $3.00
Daily cost: ~$3.88
Monthly cost: ~$77.50 (20 working days)

Using Luna:

System prompt cached: 50 × 5K × $0.10/1M = $0.025
User messages: 50 × 3K × $1/1M = $0.15
Output: 50 × 2K × $6/1M = $0.60
Daily cost: ~$0.78
Monthly cost: ~$15.50

Using Claude Sonnet 5:

System prompt (Anthropic caching): 50 × 5K × $0.20/1M ≈ $0.05 (estimated with Anthropic cache)
User messages: 50 × 3K × $2/1M = $0.30
Output: 50 × 2K × $10/1M = $1.00
Daily cost: ~$1.35
Monthly cost: ~$27.00
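The three estimates in this scenario share one formula, sketched below in Python. Like the figures above, it treats every system prompt as a cache read and ignores the occasional cache write; the Sonnet 5 cached rate is the article's estimate, and the class and function names are illustrative.

```python
from dataclasses import dataclass


@dataclass
class Prices:
    cached_input: float  # $ per 1M cached input tokens
    input: float         # $ per 1M uncached input tokens
    output: float        # $ per 1M output tokens


PRICES = {
    "GPT-5.6 Sol": Prices(0.50, 5.00, 30.00),
    "GPT-5.6 Luna": Prices(0.10, 1.00, 6.00),
    "Claude Sonnet 5": Prices(0.20, 2.00, 10.00),  # cached rate is an estimate
}


def daily_cost(p: Prices, requests: int = 50, system_tokens: int = 5_000,
               user_tokens: int = 3_000, output_tokens: int = 2_000) -> float:
    """Daily dollar cost, assuming the system prompt is always read from cache."""
    per_request = (system_tokens * p.cached_input
                   + user_tokens * p.input
                   + output_tokens * p.output) / 1_000_000
    return requests * per_request


WORKING_DAYS = 20
for name, prices in PRICES.items():
    day = daily_cost(prices)
    print(f"{name}: ${day:.3f}/day, ${day * WORKING_DAYS:.2f}/month")
```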

Scenario 2: High-Volume API Service

Usage: 10,000 requests per hour, 20K token context (mostly cached), 1K token output.

Using Sol:

Cached input: 10,000 × 20K × $0.50/1M = $100/hour
Output: 10,000 × 1K × $30/1M = $300/hour
Hourly: $400 | Monthly: ~$288,000

Using Luna:

Cached input: 10,000 × 20K × $0.10/1M = $20/hour
Output: 10,000 × 1K × $6/1M = $60/hour
Hourly: $80 | Monthly: ~$57,600

Using Claude Sonnet 5:

Cached input: 10,000 × 20K × estimated $0.20/1M = $40/hour
Output: 10,000 × 1K × $10/1M = $100/hour
Hourly: $140 | Monthly: ~$100,800

Luna is significantly cheaper for high-volume workloads. In this scenario, Luna costs $43,200/month less than Sonnet 5 (about 43% lower) while likely delivering better Terminal-Bench performance (Luna scores 84.3%; Sonnet 5’s score is not listed here).
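The hourly and monthly figures in this scenario can be checked with the Python sketch below. It assumes a 720-hour month (24 × 30), which is what the monthly totals above imply, and treats the entire 20K context as cached; the names are illustrative.

```python
HOURS_PER_MONTH = 24 * 30  # 720 hours, matching the monthly totals above


def hourly_cost(requests_per_hour: int, context_tokens: int, output_tokens: int,
                cached_input_price: float, output_price: float) -> float:
    """Hourly dollar cost when the whole context is billed at the cached rate."""
    per_request = context_tokens * cached_input_price + output_tokens * output_price
    return requests_per_hour * per_request / 1_000_000


# (cached input price, output price) per 1M tokens; Sonnet 5 cached rate is an estimate.
RATES = {
    "GPT-5.6 Sol": (0.50, 30.00),
    "GPT-5.6 Luna": (0.10, 6.00),
    "Claude Sonnet 5": (0.20, 10.00),
}

monthly = {
    name: hourly_cost(10_000, 20_000, 1_000, cached, output) * HOURS_PER_MONTH
    for name, (cached, output) in RATES.items()
}

for name, cost in monthly.items():
    print(f"{name}: ${cost:,.0f}/month")

gap = monthly["Claude Sonnet 5"] - monthly["GPT-5.6 Luna"]
print(f"Luna vs Sonnet 5: ${gap:,.0f}/month less")
```

At this volume output tokens dominate the bill for every model, so the output price matters more than the cache discount.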

Scenario 3: Agentic Workflow (Sol Ultra)

A complex coding task using Sol’s ultra mode, where the model spawns 4 subagents:

Main context: 100K tokens input, 50K output
Each subagent: 30K input, 20K output (4 subagents)
Synthesis: 80K additional input, 10K output

Cost per complex task:

Main: (100K × $5 + 50K × $30) / 1M = $0.50 + $1.50 = $2.00
Subagents: 4 × (30K × $5 + 20K × $30) / 1M = 4 × ($0.15 + $0.60) = $3.00
Synthesis: (80K × $5 + 10K × $30) / 1M = $0.40 + $0.30 = $0.70
Total per task: ~$5.70
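The same breakdown as a Python sketch, using uncached Sol prices throughout as the calculation above does. The step list mirrors the token counts given for this scenario; the function name is illustrative.

```python
SOL_INPUT = 5.00    # $ per 1M input tokens
SOL_OUTPUT = 30.00  # $ per 1M output tokens


def call_cost(input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one Sol call at standard (uncached) prices."""
    return (input_tokens * SOL_INPUT + output_tokens * SOL_OUTPUT) / 1_000_000


# (label, input tokens, output tokens, number of calls)
STEPS = [
    ("main", 100_000, 50_000, 1),
    ("subagent", 30_000, 20_000, 4),
    ("synthesis", 80_000, 10_000, 1),
]

total = 0.0
for label, tokens_in, tokens_out, count in STEPS:
    cost = count * call_cost(tokens_in, tokens_out)
    total += cost
    print(f"{label:<10} ${cost:.2f}")

print(f"total      ${total:.2f}")
```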

Ultra mode is expensive. Use it only when the quality improvement justifies the 3 to 5x cost increase over standard Sol. For most tasks, standard Sol or even Luna will be sufficient.

For tracking these costs across projects, see our guide on monitoring and controlling AI API spending.

Optimization Strategies

1. Structure Prompts for Maximum Cache Hits

Put stable content first: system instructions, reference documents, few-shot examples. Place the variable content (user query, dynamic context) after the cache breakpoint. This maximizes the cacheable prefix.
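A small Python sketch shows why ordering matters. It assumes, per the prefix rule described earlier, that only the unbroken run of stable content at the start of the prompt can be cached; the segment representation is hypothetical.

```python
def cacheable_prefix_tokens(segments: list[tuple[int, bool]]) -> int:
    """Sum tokens in the leading run of stable segments.

    Each segment is (token_count, is_stable). Counting stops at the first
    variable segment, because anything after it no longer shares a prefix.
    """
    total = 0
    for tokens, is_stable in segments:
        if not is_stable:
            break
        total += tokens
    return total


# Same content, two orderings.
interleaved = [(2_000, True), (500, False), (40_000, True), (3_000, False)]
stable_first = [(2_000, True), (40_000, True), (500, False), (3_000, False)]

print(cacheable_prefix_tokens(interleaved))   # 2000
print(cacheable_prefix_tokens(stable_first))  # 42000
```

In the interleaved ordering, a 500-token dynamic block placed early prevents the 40,000-token reference document behind it from being part of the cached prefix.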

2. Batch Requests Within 30-Minute Windows

The cache has a 30-minute minimum lifetime. Cluster your requests so cached prefixes get reused within that window. If your workload is bursty, this happens naturally. If it is spread out, consider batching.
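The effect of clustering can be estimated with a simple Python simulation. It makes a pessimistic assumption that is not confirmed by the pricing details above: an entry lives exactly 30 minutes from its write and reads do not extend it. The function names and request schedules are illustrative.

```python
def count_writes_and_reads(request_minutes: list[int], ttl_minutes: int = 30) -> tuple[int, int]:
    """Count cache writes and reads for requests at the given minute offsets.

    Assumption: an entry expires exactly ttl_minutes after it is written,
    and reads do not refresh it.
    """
    writes = reads = 0
    expires_at = None
    for t in sorted(request_minutes):
        if expires_at is None or t >= expires_at:
            writes += 1
            expires_at = t + ttl_minutes
        else:
            reads += 1
    return writes, reads


def prefix_cost(writes: int, reads: int, prefix_tokens: int = 50_000,
                input_price: float = 5.00) -> float:
    """Dollar cost of the cached prefix at 1.25x for writes and 0.10x for reads."""
    per_million = prefix_tokens / 1_000_000
    return writes * per_million * input_price * 1.25 + reads * per_million * input_price * 0.10


spread = [0, 45, 90, 135, 180, 225]   # one request every 45 minutes
batched = [0, 5, 10, 180, 185, 190]   # same six requests in two bursts

for label, schedule in (("spread", spread), ("batched", batched)):
    writes, reads = count_writes_and_reads(schedule)
    print(f"{label}: {writes} writes, {reads} reads, ${prefix_cost(writes, reads):.3f}")
```

Under these assumptions the spread-out schedule pays six cache writes ($1.875 for a 50K Sol prefix), which is more than the $1.50 it would cost uncached, while the batched schedule pays $0.725.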

3. Use Luna for Everything You Can

Luna at $1/$6 with 84.3% Terminal-Bench performance is remarkable value. Start with Luna and only escalate to Terra or Sol when Luna’s quality is insufficient for your specific task. Do not default to Sol because it is the flagship.
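One way to apply this is an escalation ladder, sketched below in Python. The model IDs come from the Base Pricing table; `call_model` and `is_good_enough` are hypothetical callables you would supply yourself (your API client and your own quality check), and the fake versions here exist only to make the example runnable.

```python
from typing import Callable

# Model IDs from the Base Pricing table, ordered cheapest first.
TIERS = ["gpt-5.6-luna", "gpt-5.6-terra", "gpt-5.6-sol"]


def run_with_escalation(
    task: str,
    call_model: Callable[[str, str], str],
    is_good_enough: Callable[[str], bool],
) -> tuple[str, str]:
    """Try each tier in price order; return (model_id, answer) for the first acceptable answer.

    Falls back to the most expensive tier's answer if none passes the check.
    """
    answer = ""
    for model in TIERS:
        answer = call_model(model, task)
        if is_good_enough(answer):
            return model, answer
    return TIERS[-1], answer


# Stand-ins for illustration only: a fake client and a fake quality check.
def fake_call_model(model: str, task: str) -> str:
    return f"[{model}] answer to: {task}"


def fake_check(answer: str) -> bool:
    return "terra" in answer


model_used, _ = run_with_escalation("refactor the parser", fake_call_model, fake_check)
print(model_used)
```

Keep in mind that a rejected Luna attempt is still billed, so escalation only pays off when Luna succeeds often enough to offset the retries.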

4. Reserve Sol Ultra for High-Value Tasks

Ultra mode (subagents) multiplies your cost by 3 to 5x. Reserve it for tasks where the 3.1-point Terminal-Bench improvement (88.8% to 91.9%) translates to meaningful output quality differences. Routine coding tasks do not need it.

5. Compare Against Available Alternatives

Remember that GPT-5.6 is government-gated. If you are evaluating costs for future planning but cannot actually access the models yet, build your current infrastructure around publicly available options. Claude Sonnet 5 at $2/$10 is available now and competitively priced.

DeepSeek Comparison

DeepSeek V4 Flash offers aggressive pricing that undercuts even Luna for some workloads. However, GPT-5.6 Luna at $1/$6 with 84.3% Terminal-Bench is competitive. The decision between them depends on:

Access (GPT-5.6 is government-gated)
Specific task performance beyond Terminal-Bench
Data residency and provider trust considerations
Whether the cache system provides enough savings to close any price gap

Check our AI API providers comparison for the full landscape including DeepSeek options.

The Cost of Government Gating

There is an indirect cost worth noting: you cannot actually use these prices yet unless you are one of approximately 20 approved partners. The pricing is published, but access is restricted.

For planning purposes, budget based on models you can actually use today. If GPT-5.6 access opens up, the cost reduction (especially with Luna or the cache system) could be significant. But do not commit to a budget projection based on a model you cannot access.

This connects to the broader AI model supply chain risk discussion. Your cost model should not depend on government decisions.

FAQ

Is the cache shared across API keys or organizations?

The cache is per-organization, so it is scoped to the organization rather than to an individual API key. Your cached prompts are not shared with other organizations, and their caches do not benefit you. Each organization builds its own cache.

What happens when the 30-minute cache lifetime expires?

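The cache entry becomes eligible for eviction. Once it is evicted, the next request with that prefix incurs a cache write charge (1.25x) again. Because 30 minutes is only the guaranteed minimum, plan around it: structure your usage patterns so reuse happens within the 30-minute window for maximum savings.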

Can I use the cache system with Luna and Terra, or just Sol?

All three tiers support the new cache system with the same mechanics (1.25x write, 90% read discount, 30-minute minimum, explicit breakpoints). The dollar amounts differ based on each model’s base pricing.

How does GPT-5.6 Sol compare to Opus 4.8 on cost-per-quality?

Sol at $5/$30 with 88.8% Terminal-Bench offers dramatically better cost-per-quality than Opus 4.8 at $15/$75 with 78.9% Terminal-Bench. Sol is cheaper AND better on this metric. Opus 4.8 may still win on specific tasks not captured by Terminal-Bench, but the general pricing advantage is clearly with Sol.
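One rough way to put a number on "cost-per-quality" is dollars per Terminal-Bench point, sketched below in Python. This metric is an illustration chosen here, not an established benchmark, and it assumes an arbitrary workload of 1M input tokens plus 1M output tokens.

```python
def blended_cost(input_price: float, output_price: float,
                 input_tokens: int = 1_000_000, output_tokens: int = 1_000_000) -> float:
    """Dollar cost of a workload with the given token mix (default 1M in, 1M out)."""
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


def cost_per_point(input_price: float, output_price: float, score: float) -> float:
    """Blended dollar cost divided by the Terminal-Bench score in percentage points."""
    return blended_cost(input_price, output_price) / score


sol = cost_per_point(5.00, 30.00, 88.8)
opus = cost_per_point(15.00, 75.00, 78.9)

print(f"Sol:      ${sol:.3f} per point")
print(f"Opus 4.8: ${opus:.3f} per point")
print(f"Opus 4.8 costs {opus / sol:.1f}x as much per point")
```

A different token mix changes the exact ratio, but since Sol is cheaper on both input and output and scores higher, it comes out ahead for any mix under this metric.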

Is GPT-5.6 Luna really cheaper than all frontier models?

At $1/$6 per 1M tokens with 84.3% Terminal-Bench, Luna is the cheapest model at this capability level. DeepSeek V4 Flash may be cheaper on raw token pricing but likely trails on capability. Claude Sonnet 5 at $2/$10 costs twice as much on input and about 1.67x as much on output. For models above 80% Terminal-Bench, Luna is currently the price leader.
