GPT-5.6 Pricing: Sol, Terra, and Luna Compared (With the Cache Math)
GPT-5.6 introduces a three-tier pricing structure and a completely revamped caching system. If you are trying to figure out what this will actually cost in production, you need to understand both the base prices and how the cache mechanics change your effective cost.
Here is the full breakdown, including real scenarios and comparisons against Claude Sonnet 5, Claude Opus 4.8, and DeepSeek models.
Base Pricing
All prices are per 1 million tokens:
| Model | Input | Output | Model ID |
|---|---|---|---|
| GPT-5.6 Sol | $5.00 | $30.00 | gpt-5.6-sol |
| GPT-5.6 Terra | $2.50 | $15.00 | gpt-5.6-terra |
| GPT-5.6 Luna | $1.00 | $6.00 | gpt-5.6-luna |
For context, here is how these compare to other frontier models available in 2026:
| Model | Input | Output |
|---|---|---|
| Claude Opus 4.8 | $15.00 | $75.00 |
| GPT-5.6 Sol | $5.00 | $30.00 |
| GPT-5.6 Terra | $2.50 | $15.00 |
| Claude Sonnet 5 | $2.00 | $10.00 |
| GPT-5.6 Luna | $1.00 | $6.00 |
Sol is expensive but not the most expensive model on the market: Opus 4.8 costs 3x as much on input and 2.5x as much on output. Terra sits just above Sonnet 5 ($2.50/$15 vs $2/$10). Luna undercuts every other model in this table.
For a comprehensive comparison across all providers, see our AI API pricing guide for 2026.
The Cache System: How It Actually Works
GPT-5.6 introduces explicit cache management that gives developers direct control over what gets cached and how. This is different from the implicit caching in previous OpenAI models.
Cache Mechanics
- Cache writes: 1.25x the standard input price
- Cache reads: 90% discount (you pay 10% of standard input price)
- Minimum lifetime: 30 minutes
- Breakpoints: You define explicit cache boundaries in your prompts
Cache Prices Per Model
| Model | Standard Input | Cache Write | Cache Read |
|---|---|---|---|
| Sol | $5.00 | $6.25 | $0.50 |
| Terra | $2.50 | $3.125 | $0.25 |
| Luna | $1.00 | $1.25 | $0.10 |
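The table follows directly from the two multipliers listed under Cache Mechanics. As a minimal sketch (the dictionary and function names are illustrative, not part of any SDK), the per-tier cache prices can be derived like this:

```python
# Base input prices in dollars per 1M tokens, copied from the pricing table.
BASE_INPUT_PRICE = {"sol": 5.00, "terra": 2.50, "luna": 1.00}

CACHE_WRITE_MULTIPLIER = 1.25  # cache writes cost 1.25x standard input
CACHE_READ_MULTIPLIER = 0.10   # cache reads get a 90% discount


def cache_prices(tier: str) -> dict[str, float]:
    """Return standard, cache-write, and cache-read prices per 1M tokens."""
    base = BASE_INPUT_PRICE[tier]
    return {
        "standard": base,
        "cache_write": round(base * CACHE_WRITE_MULTIPLIER, 4),
        "cache_read": round(base * CACHE_READ_MULTIPLIER, 4),
    }


for tier in BASE_INPUT_PRICE:
    print(tier, cache_prices(tier))
```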
How Breakpoints Work
You place explicit markers in your prompt to define cacheable sections. Everything before a breakpoint is eligible for caching. When a subsequent request shares the same prefix up to a breakpoint, the cached version is read instead of reprocessed.
This is fundamentally better than implicit caching because:
- You control what gets cached (no surprises)
- You know the minimum lifetime (30 minutes guaranteed)
- You can optimize prompt structure around cache boundaries
The Cache Math: When Does It Pay Off?
Let us calculate when caching becomes worthwhile for Sol:
Setup: A system prompt of 50,000 tokens that stays constant across requests.
Without caching:
- Each request costs $0.25 for the system prompt input (50K × $5/1M)
With caching:
- First request (cache write): $0.3125 (50K × $6.25/1M)
- Subsequent requests (cache read): $0.025 (50K × $0.50/1M)
Break-even point: The cache write costs $0.0625 more than a standard input pass over the same prompt. Each cached read saves $0.225 compared to standard input. The first cached read more than covers the write premium, so you break even after just 1 additional request.
After 10 requests with the same prefix within 30 minutes:
- Without cache: 10 × $0.25 = $2.50
- With cache: $0.3125 + (9 × $0.025) = $0.5375
- Savings: about 78.5%
After 100 requests:
- Without cache: 100 × $0.25 = $25.00
- With cache: $0.3125 + (99 × $0.025) = $2.7875
- Savings: 89%
The takeaway: if you expect two or more requests with the same prefix within a 30-minute window, cache it. One cached read already recovers the write premium, and the savings on the prefix approach 90% as volume grows.
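The arithmetic above generalizes to any prefix size and request count. Here is a small sketch that reproduces it, assuming all requests land inside a single cache lifetime. The function name and signature are hypothetical.

```python
def prefix_cost(requests: int, prefix_tokens: int, input_price: float,
                use_cache: bool) -> float:
    """Dollar cost of sending the same prefix `requests` times.

    input_price is in dollars per 1M tokens. With caching, the first
    request pays the 1.25x write price and every later one pays the
    10% read price.
    """
    per_request = prefix_tokens * input_price / 1_000_000
    if not use_cache or requests == 0:
        return requests * per_request
    write = per_request * 1.25
    read = per_request * 0.10
    return write + (requests - 1) * read


# Sol with the 50,000-token system prompt from the example.
for n in (1, 2, 10, 100):
    plain = prefix_cost(n, 50_000, 5.00, use_cache=False)
    cached = prefix_cost(n, 50_000, 5.00, use_cache=True)
    print(f"{n:>3} requests: ${plain:.4f} uncached, ${cached:.4f} cached")
```

With a single request, caching costs more than sending the prompt uncached. From the second request onward it is cheaper. Because both multipliers scale with the base price, that crossover is the same for Sol, Terra, and Luna.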
Real-World Cost Scenarios
Scenario 1: Coding Assistant (Individual Developer)
Typical usage: 50 requests per day, 5K token system prompt, 3K token user message, 2K token output.
Using Sol:
- System prompt cached (after first request): 50 × 5K × $0.50/1M = $0.125
- User messages (unique, not cached): 50 × 3K × $5/1M = $0.75
- Output: 50 × 2K × $30/1M = $3.00
- Daily cost: ~$3.88
- Monthly cost: ~$77.50 (20 working days)
Using Luna:
- System prompt cached: 50 × 5K × $0.10/1M = $0.025
- User messages: 50 × 3K × $1/1M = $0.15
- Output: 50 × 2K × $6/1M = $0.60
- Daily cost: ~$0.78
- Monthly cost: ~$15.50
Using Claude Sonnet 5:
- System prompt (Anthropic caching): 50 × 5K × $0.20/1M ≈ $0.05 (estimated with Anthropic cache)
- User messages: 50 × 3K × $2/1M = $0.30
- Output: 50 × 2K × $10/1M = $1.00
- Daily cost: ~$1.35
- Monthly cost: ~$27.00
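To rerun this scenario with your own request counts or token sizes, a sketch along these lines may help. The `Pricing` class and `daily_cost` function are illustrative names. Like the figures above, the sketch bills every system prompt at the cached-read rate and ignores the occasional cache write.

```python
from dataclasses import dataclass


@dataclass
class Pricing:
    input: float         # $ per 1M standard input tokens
    output: float        # $ per 1M output tokens
    cached_input: float  # $ per 1M cached-read tokens


PRICING = {
    "GPT-5.6 Sol": Pricing(5.00, 30.00, 0.50),
    "GPT-5.6 Luna": Pricing(1.00, 6.00, 0.10),
    # The cached rate for Sonnet 5 is the estimate used in this scenario.
    "Claude Sonnet 5": Pricing(2.00, 10.00, 0.20),
}


def daily_cost(p: Pricing, requests: int = 50, system_tokens: int = 5_000,
               user_tokens: int = 3_000, output_tokens: int = 2_000) -> float:
    """Daily dollar cost for the coding-assistant usage profile."""
    system = requests * system_tokens * p.cached_input
    user = requests * user_tokens * p.input
    out = requests * output_tokens * p.output
    return (system + user + out) / 1_000_000


WORKING_DAYS = 20  # same assumption as the monthly figures above

for name, pricing in PRICING.items():
    day = daily_cost(pricing)
    print(f"{name}: ${day:.3f}/day, ${day * WORKING_DAYS:.2f}/month")
```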
Scenario 2: High-Volume API Service
Usage: 10,000 requests per hour, 20K token context (mostly cached), 1K token output.
Using Sol:
- Cached input: 10,000 × 20K × $0.50/1M = $100/hour
- Output: 10,000 × 1K × $30/1M = $300/hour
- Hourly: $400 | Monthly: ~$288,000
Using Luna:
- Cached input: 10,000 × 20K × $0.10/1M = $20/hour
- Output: 10,000 × 1K × $6/1M = $60/hour
- Hourly: $80 | Monthly: ~$57,600
Using Claude Sonnet 5:
- Cached input: 10,000 × 20K × estimated $0.20/1M = $40/hour
- Output: 10,000 × 1K × $10/1M = $100/hour
- Hourly: $140 | Monthly: ~$100,800
Luna is the cheapest option for high-volume workloads. In this scenario it saves $43,200/month compared to Sonnet 5 (about 43% less) while likely delivering better Terminal-Bench performance (84.3% vs Sonnet 5's score).
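The monthly figures in this scenario imply round-the-clock traffic: $400/hour × 24 hours × 30 days = $288,000. The sketch below makes that assumption explicit so you can change it. The function and constant names are hypothetical, and the whole context is billed at the cached-read rate, as in the figures above.

```python
HOURS_PER_MONTH = 24 * 30  # assumption: constant 24/7 load over a 30-day month


def hourly_cost(requests_per_hour: int, context_tokens: int, output_tokens: int,
                cached_input_price: float, output_price: float) -> float:
    """Hourly dollar cost when the whole context is served from cache."""
    input_cost = requests_per_hour * context_tokens * cached_input_price / 1_000_000
    output_cost = requests_per_hour * output_tokens * output_price / 1_000_000
    return input_cost + output_cost


# (cached input price, output price) in dollars per 1M tokens.
MODELS = {
    "GPT-5.6 Sol": (0.50, 30.00),
    "GPT-5.6 Luna": (0.10, 6.00),
    "Claude Sonnet 5": (0.20, 10.00),  # cached rate is this scenario's estimate
}

for name, (cached, out) in MODELS.items():
    hourly = hourly_cost(10_000, 20_000, 1_000, cached, out)
    print(f"{name}: ${hourly:,.0f}/hour, ${hourly * HOURS_PER_MONTH:,.0f}/month")
```

Output tokens account for 75% of the Sol bill here ($300 of $400 per hour). Once the context is cached, shortening responses saves more than further shrinking the prompt.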
Scenario 3: Agentic Workflow (Sol Ultra)
A complex coding task using Sol's ultra mode, where the model spawns 4 subagents:
- Main context: 100K tokens input, 50K output
- Each subagent: 30K input, 20K output (4 subagents)
- Synthesis: 80K additional input, 10K output
Cost per complex task:
- Main: (100K × $5 + 50K × $30) / 1M = $0.50 + $1.50 = $2.00
- Subagents: 4 × (30K × $5 + 20K × $30) / 1M = 4 × ($0.15 + $0.60) = $3.00
- Synthesis: (80K × $5 + 10K × $30) / 1M = $0.40 + $0.30 = $0.70
- Total per task: ~$5.70
Ultra mode is expensive. Use it only when the quality improvement justifies the 3 to 5x cost increase over standard Sol. For most tasks, standard Sol or even Luna will be sufficient.
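To estimate an ultra-mode task with different stage sizes or subagent counts, the per-stage breakdown can be sketched as follows. The stage structure mirrors this scenario, `stage_cost` is an illustrative helper, and no caching is applied, matching the figures above.

```python
SOL_INPUT = 5.00    # $ per 1M input tokens
SOL_OUTPUT = 30.00  # $ per 1M output tokens


def stage_cost(input_tokens: int, output_tokens: int, copies: int = 1) -> float:
    """Dollar cost of one stage, repeated `copies` times, at Sol prices."""
    one = (input_tokens * SOL_INPUT + output_tokens * SOL_OUTPUT) / 1_000_000
    return copies * one


stages = {
    "main": stage_cost(100_000, 50_000),
    "subagents": stage_cost(30_000, 20_000, copies=4),
    "synthesis": stage_cost(80_000, 10_000),
}
total = sum(stages.values())

for name, cost in stages.items():
    print(f"{name}: ${cost:.2f}")
print(f"total: ${total:.2f} ({total / stages['main']:.2f}x the main pass alone)")
```

In this example the subagents are the largest line item at $3.00. The full task costs about 2.85x the $2.00 main pass, which sits near the low end of the 3 to 5x range.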
For tracking these costs across projects, see our guide on monitoring and controlling AI API spending.
Optimization Strategies
1. Structure Prompts for Maximum Cache Hits
Put stable content first: system instructions, reference documents, few-shot examples. Place the variable content (user query, dynamic context) after the cache breakpoint. This maximizes the cacheable prefix.
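The effect of ordering can be illustrated with a toy model of prefix caching. This is not the GPT-5.6 API: the `Segment` type, the token counts, and the function below are all hypothetical. They only show that content after the first changing segment cannot be part of a shared prefix.

```python
from typing import NamedTuple


class Segment(NamedTuple):
    name: str
    tokens: int
    stable: bool  # True if the segment is identical across requests


def cacheable_prefix_tokens(segments: list[Segment]) -> int:
    """Count tokens in the leading run of stable segments.

    A shared prefix ends at the first segment that changes between
    requests, so everything after it is reprocessed at full price.
    """
    total = 0
    for seg in segments:
        if not seg.stable:
            break
        total += seg.tokens
    return total


good_order = [
    Segment("system instructions", 2_000, True),
    Segment("reference documents", 15_000, True),
    Segment("few-shot examples", 3_000, True),
    Segment("user query", 500, False),
]
bad_order = [
    Segment("system instructions", 2_000, True),
    Segment("user query", 500, False),
    Segment("reference documents", 15_000, True),
    Segment("few-shot examples", 3_000, True),
]

print(cacheable_prefix_tokens(good_order))  # 20000
print(cacheable_prefix_tokens(bad_order))   # 2000
```

Both prompts contain the same 20,500 tokens. Moving the user query to the end raises the cacheable portion from 2,000 to 20,000 tokens.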
2. Batch Requests Within 30-Minute Windows
The cache has a 30-minute minimum lifetime. Cluster your requests so cached prefixes get reused within that window. If your workload is bursty, this happens naturally. If it is spread out, consider batching.
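A rough way to see why clustering matters is to count cache writes and reads for a given request schedule. The sketch below makes a conservative assumption that the article does not confirm: an entry lives exactly 30 minutes from its write, and reads do not extend it. The function name is illustrative.

```python
CACHE_LIFETIME_MIN = 30  # minimum lifetime, treated here as the exact lifetime


def count_cache_events(request_minutes: list[int]) -> tuple[int, int]:
    """Return (writes, reads) for requests at the given minute offsets."""
    writes = reads = 0
    expires_at = None
    for t in sorted(request_minutes):
        if expires_at is None or t >= expires_at:
            writes += 1  # cache is cold or expired: pay the write price
            expires_at = t + CACHE_LIFETIME_MIN
        else:
            reads += 1   # still inside the window: pay the read price
    return writes, reads


def sol_prefix_cost(request_minutes: list[int], prefix_tokens: int = 50_000) -> float:
    """Dollar cost of the prefix at Sol cache prices ($6.25 write, $0.50 read)."""
    writes, reads = count_cache_events(request_minutes)
    return (writes * 6.25 + reads * 0.50) * prefix_tokens / 1_000_000


spread = [0, 40, 80, 120, 160, 200]  # one request every 40 minutes
clustered = [0, 5, 10, 15, 20, 25]   # the same six requests in one window

print(count_cache_events(spread), sol_prefix_cost(spread))
print(count_cache_events(clustered), sol_prefix_cost(clustered))
```

Under this assumption the spread-out schedule pays six cache writes ($1.875), which is more than sending the prompt uncached six times ($1.50). The clustered schedule pays one write and five reads ($0.4375).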
3. Use Luna for Everything You Can
Luna at $1/$6 with 84.3% Terminal-Bench performance is remarkable value. Start with Luna and only escalate to Terra or Sol when Luna's quality is insufficient for your specific task. Do not default to Sol because it is the flagship.
4. Reserve Sol Ultra for High-Value Tasks
Ultra mode (subagents) multiplies your cost by 3 to 5x. Reserve it for tasks where the 3.1-percentage-point Terminal-Bench improvement (88.8% to 91.9%) translates to meaningful output quality differences. Routine coding tasks do not need it.
5. Compare Against Available Alternatives
Remember that GPT-5.6 is government-gated. If you are evaluating costs for future planning but cannot actually access the models yet, build your current infrastructure around publicly available options. Claude Sonnet 5 at $2/$10 is available now and competitively priced.
DeepSeek Comparison
DeepSeek V4 Flash offers aggressive pricing that undercuts even Luna for some workloads. However, GPT-5.6 Luna at $1/$6 with 84.3% Terminal-Bench is competitive. The decision between them depends on:
- Access (GPT-5.6 is government-gated)
- Specific task performance beyond Terminal-Bench
- Data residency and provider trust considerations
- Whether the cache system provides enough savings to close any price gap
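For the last point, a useful number is the blended input price at a given cache hit rate, which you can then compare against another provider's list price. The sketch below is a simplification: the function name is hypothetical, and it ignores the 1.25x write premium, so it slightly understates the real cost.

```python
def effective_input_price(base_price: float, cache_hit_fraction: float) -> float:
    """Blended $ per 1M input tokens for a given share of cached tokens.

    Cached tokens are billed at 10% of the base price and the rest at
    the full base price. Cache-write premiums are not included.
    """
    if not 0.0 <= cache_hit_fraction <= 1.0:
        raise ValueError("cache_hit_fraction must be between 0 and 1")
    cached_price = base_price * 0.10
    return cache_hit_fraction * cached_price + (1 - cache_hit_fraction) * base_price


# Luna ($1.00 per 1M input tokens) at several cache hit rates.
for hit in (0.0, 0.5, 0.9):
    print(f"{hit:.0%} cached: ${effective_input_price(1.00, hit):.2f} per 1M input tokens")
```

At a 90% hit rate, Luna's effective input price falls from $1.00 to about $0.19 per 1M tokens. Output tokens are unaffected by caching, so the comparison still depends on your input-to-output ratio.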
Check our AI API providers comparison for the full landscape including DeepSeek options.
The Cost of Government Gating
There is an indirect cost worth noting: you cannot actually use these prices yet unless you are one of approximately 20 approved partners. The pricing is published, but access is restricted.
For planning purposes, budget based on models you can actually use today. If GPT-5.6 access opens up, the cost reduction (especially with Luna or the cache system) could be significant. But do not commit to a budget projection based on a model you cannot access.
This connects to the broader AI model supply chain risk discussion. Your cost model should not depend on government decisions.
FAQ
Is the cache shared across API keys or organizations?
The cache is per-organization. Your cached prompts are not shared with other users, and other users' caches do not benefit you. Each organization builds its own cache.
What happens when the 30-minute cache lifetime expires?
Once the 30-minute minimum has passed, the cache entry can be evicted, and the next request with that prefix then incurs a cache write charge (1.25x) again. Structure your usage patterns so that repeat requests land within the 30-minute window for maximum savings.
Can I use the cache system with Luna and Terra, or just Sol?
All three tiers support the new cache system with the same mechanics (1.25x write, 90% read discount, 30-minute minimum, explicit breakpoints). The dollar amounts differ based on each model's base pricing.
How does GPT-5.6 Sol compare to Opus 4.8 on cost-per-quality?
Sol at $5/$30 with 88.8% Terminal-Bench offers dramatically better cost-per-quality than Opus 4.8 at $15/$75 with 78.9% Terminal-Bench. Sol is cheaper AND better on this metric. Opus 4.8 may still win on specific tasks not captured by Terminal-Bench, but the general pricing advantage is clearly with Sol.
Is GPT-5.6 Luna really cheaper than all frontier models?
At $1/$6 per 1M tokens with 84.3% Terminal-Bench, Luna is the cheapest model at this capability level. DeepSeek V4 Flash may be cheaper on raw token pricing but likely trails on capability. Claude Sonnet 5 at $2/$10 costs twice as much on input and about 1.7x as much on output. For models above 80% Terminal-Bench, Luna is currently the price leader.