GPT-5.6 Sol, Terra, and Luna: Complete Guide to OpenAI's Government-Gated Models (2026)
GPT-5.6 Sol, Terra, and Luna: Complete Guide to OpenAI’s Government-Gated Models (2026)
On June 26, 2026, OpenAI launched GPT-5.6 as a limited preview. Not limited in the “sign up for a waitlist” sense. Limited in the “the US government decides who gets access” sense.
This is the first time OpenAI has shipped a model family where access is entirely controlled by a government entity. There is no public waitlist. There is no ChatGPT integration. Approximately 20 trusted partners have access right now, and that number is not growing fast.
If you are building with AI APIs today, you need to understand what GPT-5.6 offers, why you probably cannot use it yet, and what your alternatives are. This guide covers all three tiers, the new features, the safety stack that drove the restrictions, and practical comparisons to Claude Sonnet 5 and Claude Opus 4.8.
The Three Tiers: Sol, Terra, and Luna
GPT-5.6 ships as three distinct models, each targeting a different use case and budget:
Sol (gpt-5.6-sol)
The flagship. Sol is OpenAI’s most capable model to date and the reason this release is government-gated.
- Input: $5 per 1M tokens
- Output: $30 per 1M tokens
- Terminal-Bench 2.1: 88.8% (plain), 91.9% (ultra mode)
- Best for: Complex agentic workflows, deep reasoning, multi-step coding tasks
Sol is the only model in the family that supports ultra mode, a new feature that spawns subagent processes for complex reasoning. More on that below.
Terra (gpt-5.6-terra)
The balanced option. Terra targets the same use cases as GPT-5.5 but at roughly half the cost.
- Input: $2.50 per 1M tokens
- Output: $15 per 1M tokens
- Terminal-Bench 2.1: 82.5%
- Best for: General development tasks, code generation, standard chat applications
Terra sits in an interesting position. It scores below GPT-5.5’s 88.0% on Terminal-Bench, but costs significantly less. Whether that tradeoff makes sense depends on your workload.
Luna (gpt-5.6-luna)
The speed tier, and the surprise of this launch. Luna scores above Terra on Terminal-Bench despite being the cheapest model in the family.
- Input: $1 per 1M tokens
- Output: $6 per 1M tokens
- Terminal-Bench 2.1: 84.3%
- Best for: High-volume workloads, latency-sensitive applications, cost-optimized pipelines
Luna at 84.3% beating Terra at 82.5% is unusual. This likely reflects architectural differences in how each model was optimized. Luna was built for speed and throughput, and that optimization apparently benefits certain benchmark tasks.
Benchmark Context
Here is how GPT-5.6 compares to other frontier models:
| Model | Terminal-Bench 2.1 | Price (in/out per 1M) |
|---|---|---|
| GPT-5.6 Sol Ultra | 91.9% | $5/$30 + subagent costs |
| GPT-5.6 Sol | 88.8% | $5/$30 |
| GPT-5.5 | 88.0% | varies |
| GPT-5.6 Luna | 84.3% | $1/$6 |
| GPT-5.6 Terra | 82.5% | $2.50/$15 |
| Claude Opus 4.8 | 78.9% | $15/$75 |
| Claude Sonnet 5 | competitive | $2/$10 |
For SWE-bench Pro comparisons, Claude Opus 4.8 scores 69.2% and Claude Sonnet 5 hits 63.2%. OpenAI has not published SWE-bench Pro numbers for GPT-5.6 yet.
New Features: Max Reasoning Effort and Ultra Mode
GPT-5.6 introduces two related but distinct features for controlling model behavior.
Max Reasoning Effort
All three models support a reasoning effort parameter. You can dial it from low to max, trading latency and cost for deeper thinking. This is conceptually similar to Claude’s extended thinking, but implemented as a continuous scale rather than discrete levels.
Ultra Mode (Sol Only)
Ultra mode is the headline feature. When enabled on Sol, the model can spawn subagent processes that work in parallel on different aspects of a problem. Think of it as the model breaking a complex task into subtasks, delegating each to a separate reasoning process, then synthesizing results.
The results speak for themselves: 88.8% on Terminal-Bench with standard Sol jumps to 91.9% with ultra mode enabled. That 3.1 percentage point gain is significant at this level of the benchmark.
Ultra mode has implications for cost. Each subagent consumes tokens independently, so a single ultra-mode request can cost several times what a standard request costs. For managing API spending, you will want to monitor ultra-mode usage carefully.
The Cache System
GPT-5.6 introduces a revamped caching system with explicit developer control:
- Cache writes: 1.25x the standard input price
- Cache reads: 90% discount on standard input price
- Minimum cache lifetime: 30 minutes
- Explicit breakpoints: You define where cache boundaries sit in your prompts
This is a meaningful improvement over implicit caching. You can now structure your prompts with stable prefixes (system instructions, context documents) and mark breakpoints so those sections get cached reliably. The 90% read discount means that for repeated calls with similar prefixes, your effective input cost drops dramatically.
For Sol at $5/1M input tokens: cache writes cost $6.25/1M, but subsequent reads cost just $0.50/1M. If you are making 10+ calls with the same prefix within 30 minutes, the math works out heavily in your favor.
Access: The Government Gate
This is the part that matters most for practical planning. GPT-5.6 is not available through normal channels.
- No ChatGPT integration
- No public API waitlist
- No self-serve access of any kind
- Approximately 20 trusted partners have access
- The US government decides who gets added
OpenAI previewed GPT-5.6’s capabilities to the US government ahead of launch, at the government’s request. This is the second frontier model in a month to face government restrictions, following the Fable 5 ban.
For most developers reading this, GPT-5.6 is not something you can use today. Plan accordingly. If you need frontier-level performance right now, Claude Sonnet 5 launched June 30 at $2/$10 and is publicly available.
Safety Stack
The access restrictions exist because of GPT-5.6’s capabilities in sensitive domains. OpenAI invested 700,000 A100-equivalent GPU hours in automated red-teaming and reports the following scores:
- Virology Capabilities Test: 53.5%
- Molecular Biology: 60%
- Human Pathogen: 68.4%
- ExploitBench: Competitive with Mythos Preview at 1/3 the output tokens
The safety architecture is three-layered:
- Model-level training: Safety behaviors baked into the model weights
- Real-time classifiers: External systems that monitor inputs and outputs
- Account-level review: Human oversight of partner organizations
This is the most restrictive safety deployment OpenAI has ever done, and it reflects the broader trend toward AI model supply chain risks being taken seriously at the policy level.
Cerebras Hosting: 750 Tokens Per Second
Coming in July 2026, Cerebras will offer hosting for GPT-5.6 Sol at approximately 750 tokens per second. This is a notable partnership because it demonstrates that even government-gated models can be deployed on alternative infrastructure.
For developers who do get access, the Cerebras option means you will not be locked into OpenAI’s own inference infrastructure. Check our best AI API providers guide for updates as this becomes available.
Comparison with Claude Models
If you are choosing between model families today, here is the practical reality:
GPT-5.6 Sol vs Claude Opus 4.8: Sol wins on Terminal-Bench (88.8% vs 78.9%) and costs far less ($5/$30 vs $15/$75). But you cannot use Sol unless you are one of ~20 partners. Opus 4.8 is available now.
GPT-5.6 Luna vs Claude Sonnet 5: Luna scores higher on Terminal-Bench (84.3% vs Sonnet 5’s competitive range) and costs less ($1/$6 vs $2/$10). But again, access. Sonnet 5 launched publicly on June 30.
GPT-5.6 Terra vs Claude Sonnet 5: Terra at $2.50/$15 is more expensive than Sonnet 5 at $2/$10 and likely scores similarly. Sonnet 5 wins on value and availability.
For a deeper comparison, see our GPT-5.6 Sol vs Claude Sonnet 5 analysis and Claude Sonnet 5 vs Opus 4.8 breakdown.
Context Window
OpenAI has not explicitly stated GPT-5.6’s context window. Given that GPT-5.5 supported 1M+ tokens and the trend toward larger contexts, GPT-5.6 likely supports at least 1M tokens across all three tiers. We will update this guide when official numbers are confirmed.
Practical Recommendations
If you have access: Use Sol for complex multi-step tasks where quality matters most. Use Luna for high-volume workloads where cost matters. Terra sits awkwardly between them since Luna outperforms it on Terminal-Bench at lower cost.
If you do not have access (most developers): Use Claude Sonnet 5 for balanced performance at $2/$10. Use Claude Opus 4.8 when you need maximum reasoning depth. Keep your API keys secure regardless of provider.
If you are evaluating for future planning: Watch the access situation. The government gate will not last forever, but there is no timeline for broader availability. Build your systems to be model-agnostic so you can swap in GPT-5.6 when it opens up.
FAQ
Can I sign up for GPT-5.6 access?
No. There is no public waitlist or self-serve signup. Access is controlled by the US government and limited to approximately 20 trusted partner organizations. This is fundamentally different from previous OpenAI launches.
Why does Luna score higher than Terra on Terminal-Bench?
Luna was optimized for speed and throughput, which appears to benefit its performance on coding-focused benchmarks like Terminal-Bench 2.1. Terra is optimized for general-purpose balanced performance. Different architectural choices lead to different benchmark profiles.
Is GPT-5.6 available in ChatGPT?
No. GPT-5.6 is not integrated into ChatGPT in any tier. It is API-only, and even API access is restricted to government-approved partners.
How does GPT-5.6 Sol Ultra compare to Claude Opus 4.8?
Sol Ultra scores 91.9% on Terminal-Bench 2.1, compared to Claude Opus 4.8’s 78.9%. That is a massive gap. However, Opus 4.8 is publicly available and has its own strengths in long-form reasoning and instruction following. See our Opus 4.8 guide for details.
When will GPT-5.6 become publicly available?
No timeline has been announced. Given the safety concerns driving the restrictions (particularly the biology and exploit capabilities), broader access may depend on additional safety infrastructure being deployed. This could be months, not weeks.