Sakana AI, the Tokyo lab founded by ex-Google researchers (including a co-author of "Attention Is All You Need"), just launched Fugu Ultra. It's not a model in the traditional sense. It's an orchestration system that coordinates multiple frontier models behind a single API. You call one endpoint, and behind the scenes, Fugu routes your task across a pool of specialist models, has them check each other's work, and returns a synthesized answer.
The result: frontier-level performance (93.2 on LiveCodeBench, beating Claude Fable 5's 89.8) at $5 per million input tokens. That's cheaper than calling most frontier models directly. And it supports a context window that extends beyond 272K tokens with graduated pricing.
This is either the smartest arbitrage in AI or an elaborate routing layer that adds complexity without value. After testing it, I think it's mostly the former.
How Fugu Ultra Works
The core idea: instead of training one massive model to do everything, train a smaller model to delegate tasks to the right specialist.
Fugu is itself a language model, but its job isn't to answer your questions directly. Its job is to:
- Analyze your request and break it into sub-tasks
- Route each sub-task to the best model in its pool
- Delegate work to multiple models (including recursive calls to itself)
- Verify outputs by having models check each other
- Synthesize the results into a single coherent response
You never see the orchestration. From your perspective, it's one API call with one response. The multi-agent complexity is hidden behind an OpenAI-compatible endpoint.
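To make the route, verify, synthesize loop concrete, here is a toy sketch. It is not Sakana's implementation: the specialist pool, the keyword router, and the `verify` rule are all hypothetical stand-ins, and plain Python functions play the role of models.

```python
from typing import Callable, Dict, List

# Hypothetical "specialists": plain functions standing in for real models.
Model = Callable[[str], str]


def code_model(task: str) -> str:
    return f"[code] {task}"


def math_model(task: str) -> str:
    return f"[math] {task}"


def general_model(task: str) -> str:
    return f"[general] {task}"


POOL: Dict[str, Model] = {
    "code": code_model,
    "math": math_model,
    "general": general_model,
}


def split(request: str) -> List[str]:
    """Break a request into sub-tasks (toy rule: one per sentence)."""
    return [part.strip() for part in request.split(".") if part.strip()]


def route(task: str) -> str:
    """Pick a specialist by keyword; a real router would be a learned model."""
    lowered = task.lower()
    if "function" in lowered or "bug" in lowered:
        return "code"
    if "prove" in lowered or "integral" in lowered:
        return "math"
    return "general"


def verify(task: str, answer: str) -> bool:
    """Toy check standing in for a second model's review of the answer."""
    return task in answer


def orchestrate(request: str) -> str:
    """Split, route, verify, then synthesize into one response."""
    answers = []
    for task in split(request):
        answer = POOL[route(task)](task)
        if not verify(task, answer):
            # Retry with a different model when the check fails.
            answer = POOL["general"](task)
        answers.append(answer)
    return "\n".join(answers)


print(orchestrate("Fix the bug in parse. Prove the bound holds"))
```

The point of the sketch is the shape of the loop: the caller sees only `orchestrate`, while the splitting, routing, and checking happen inside it.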
What's in the Model Pool?
Sakana hasn't published the full list of models in the pool, but reports indicate it includes both open-source and proprietary models. The key insight: Fugu isn't locked to any single provider. If a new model releases that's better at math, Sakana can add it to the pool without retraining Fugu itself.
This means Fugu Ultra's capabilities can improve without Sakana doing any model training, just by adding better specialists to the pool.
Pricing
| Token Type | Standard | Context > 272K |
|---|---|---|
| Input | $5 / 1M tokens | $10 / 1M tokens |
| Output | $30 / 1M tokens | $45 / 1M tokens |
| Cached input | $0.50 / 1M tokens | $1.00 / 1M tokens |
Understanding the True Cost
There's a catch. Fugu Ultra uses "orchestration tokens" internally. When it routes work to multiple models, those internal tokens are billed at the same rate as standard tokens. Your actual bill may be 2-5x the naive calculation based on your input/output alone.
Sakana reports this transparently in the token_details usage field. But it means a seemingly simple request might consume more tokens than expected because Fugu called three models internally to verify the answer.
Real-world cost: For straightforward tasks, Fugu Ultra costs roughly what a single frontier model would. For complex tasks requiring multi-step reasoning, it costs more per request but potentially gives better answers than any single model could.
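A rough way to budget for this is to compute the naive cost from the pricing table and then apply the 2-5x overhead range as a band. The sketch below does that. The rates come from the table above; the function names are mine, and it assumes the higher rates apply to the whole request once the input exceeds 272K tokens, which Sakana's docs may define differently.

```python
# Rates in USD per 1M tokens, taken from the pricing table above.
STANDARD = {"input": 5.00, "output": 30.00}
LONG_CONTEXT = {"input": 10.00, "output": 45.00}
LONG_CONTEXT_THRESHOLD = 272_000  # tokens


def naive_cost(input_tokens: int, output_tokens: int) -> float:
    """Cost of the visible input/output only, ignoring orchestration tokens.

    Assumption: once the input exceeds the threshold, the higher rates
    apply to the whole request.
    """
    rates = LONG_CONTEXT if input_tokens > LONG_CONTEXT_THRESHOLD else STANDARD
    total = input_tokens * rates["input"] + output_tokens * rates["output"]
    return total / 1_000_000


def cost_band(input_tokens: int, output_tokens: int,
              low: float = 2.0, high: float = 5.0) -> dict:
    """Naive cost plus the 2-5x orchestration overhead range quoted above."""
    naive = naive_cost(input_tokens, output_tokens)
    return {"naive": naive, "low": naive * low, "high": naive * high}


band = cost_band(10_000, 2_000)
print({name: round(value, 2) for name, value in band.items()})
```

For a 10,000-token prompt with a 2,000-token answer, the naive cost is $0.11, and the band with orchestration overhead runs from $0.22 to $0.55.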
Subscription Plans
Sakana also offers monthly subscription tiers:
- $20/month, $100/month, and $200/month options
- These include token allowances at discounted rates
- Pay-as-you-go available without subscription
Performance Benchmarks
Sakana claims some impressive numbers:
| Benchmark | Fugu Ultra | Claude Fable 5 | Notes |
|---|---|---|---|
| LiveCodeBench | 93.2 | 89.8 | Coding tasks |
| MATH-500 | High (unspecified) | High | Mathematical reasoning |
| GPQA-Diamond | Competitive | Competitive | Science questions |
The benchmark story is compelling, but remember: Fugu Ultra achieves these scores by orchestrating multiple models. It's less "one model is this smart" and more "a committee of models, well-coordinated, produces this quality." That distinction matters when thinking about latency and cost.
Practical Advantages
Vendor Independence
This is the real selling point for many teams. With Fugu Ultra:
- No single-vendor dependency
- No exposure to one provider's outages
- Less exposure to export-control restrictions on any single provider (since it routes across providers)
- If one model degrades, the pool compensates
Context Window
Fugu Ultra supports context beyond 272K tokens (with graduated pricing). For teams processing very long documents, meeting transcripts, or codebases, this extended context is valuable. The cached input pricing ($0.50/1M) makes repeated context relatively cheap.
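To see how much the cached rate matters, here is a small back-of-the-envelope calculation for asking many questions about the same long document. It uses the standard-tier rates from the pricing table and makes several simplifying assumptions: the first request pays the full input rate, every later request hits the cache completely, and output tokens, orchestration tokens, and cache expiry are ignored.

```python
def repeated_context_cost(context_tokens: int, num_requests: int,
                          input_rate: float = 5.00,
                          cached_rate: float = 0.50) -> tuple:
    """Return (cost without caching, cost with caching) in USD.

    Rates are USD per 1M input tokens. Assumes num_requests >= 1 and
    that only the first request is billed at the uncached rate.
    """
    per_million = context_tokens / 1_000_000
    uncached = num_requests * per_million * input_rate
    cached = (per_million * input_rate
              + (num_requests - 1) * per_million * cached_rate)
    return uncached, cached


without_cache, with_cache = repeated_context_cost(200_000, 20)
print(f"without cache: ${without_cache:.2f}, with cache: ${with_cache:.2f}")
```

Twenty questions against a 200K-token document come to about $20.00 in input cost without caching and about $2.90 with it, under those assumptions.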
OpenAI-Compatible API
Drop-in replacement for OpenAI's API. Switch your base URL and API key, keep your existing code:
```python
from openai import OpenAI

client = OpenAI(
    api_key="your-sakana-key",
    base_url="https://api.sakana.ai/v1"
)

response = client.chat.completions.create(
    model="fugu-ultra-20260615",
    messages=[
        {"role": "user", "content": "Explain the CAP theorem with examples"}
    ]
)

print(response.choices[0].message.content)
```
That's it. If your code works with OpenAI's API, it works with Fugu Ultra.
Use Cases Where Fugu Ultra Shines
Complex Reasoning Tasks
Tasks that benefit from multiple perspectives: legal analysis, technical architecture decisions, scientific hypothesis evaluation. The multi-model approach catches errors that a single model might make.
Long Document Processing
With context beyond 272K tokens and cheap cached input pricing, Fugu Ultra handles:
- Full codebase analysis
- Long legal contracts
- Book-length documents
- Extended meeting transcript analysis
For document-specific OCR tasks, dedicated models like DeepSeek Vision or Mistral OCR 4 are better suited. But for understanding and reasoning about long documents (not just extracting text), Fugu Ultra's extended context is compelling.
Coding Tasks
93.2 on LiveCodeBench isn't accidental. Fugu Ultra routes coding tasks to code-specialized models, uses others for verification, and synthesizes working solutions. For teams that want the best code generation without committing to one provider, this is attractive.
When Accuracy Matters More Than Latency
The multi-model verification adds latency (multiple model calls happen sequentially or in parallel behind the scenes). But it also catches errors. For tasks where being right matters more than being fast (legal documents, medical summaries, financial analysis), the latency tradeoff is worth it.
Limitations
Latency
Multi-model orchestration is slower than a single model call. Simple questions that GPT-4 or Claude would answer in 1-2 seconds might take 3-8 seconds through Fugu Ultra. For real-time chat applications, this may be too slow.
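If you want the orchestrated answer when it arrives in time but can't block a chat UI on it, one option is a latency budget with a fallback. The sketch below is a generic pattern, not anything Sakana provides: `call_with_budget` is a hypothetical helper, and the two stub functions stand in for real API calls to Fugu Ultra and to a single faster model.

```python
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout
from typing import Callable


def call_with_budget(primary: Callable[[str], str],
                     fallback: Callable[[str], str],
                     prompt: str,
                     budget_seconds: float) -> str:
    """Return primary's answer if it finishes in time, else fallback's."""
    executor = ThreadPoolExecutor(max_workers=1)
    future = executor.submit(primary, prompt)
    try:
        return future.result(timeout=budget_seconds)
    except FutureTimeout:
        # The slow call keeps running in its thread; we only stop waiting.
        return fallback(prompt)
    finally:
        # Don't block on the worker thread when returning.
        executor.shutdown(wait=False)


# Stand-ins for real API calls (hypothetical): one slow, one fast.
def slow_orchestrator(prompt: str) -> str:
    time.sleep(0.5)
    return f"orchestrated: {prompt}"


def fast_single_model(prompt: str) -> str:
    return f"single: {prompt}"


print(call_with_budget(slow_orchestrator, fast_single_model, "hi", 0.05))
```

Note that the abandoned request is not cancelled here, so you would likely still be billed for it. The official OpenAI Python client also accepts a `timeout` argument, which may be the simpler choice if you just want the slow call to fail fast.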
Unpredictable Costs
Because orchestration tokens are billed, your actual cost per request varies based on how complex Fugu judges the task to be. A simple factual question might cost X. A nuanced reasoning question might cost 3X because Fugu called more models internally.
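One way to keep an eye on this is to log the usage data from each response and flag requests with unusually high overhead. The sketch below works on plain dictionaries so it runs without an API key. The `prompt_tokens` and `completion_tokens` keys follow the OpenAI usage format; the `orchestration_tokens` key inside `token_details` is my assumption about the field layout, and so is the idea that orchestration tokens are reported separately from the visible ones. Check a real response before relying on either.

```python
from typing import Any, List, Mapping


def overhead_ratio(usage: Mapping[str, Any]) -> float:
    """Total billed tokens divided by visible tokens for one request."""
    visible = usage.get("prompt_tokens", 0) + usage.get("completion_tokens", 0)
    details = usage.get("token_details") or {}
    hidden = details.get("orchestration_tokens", 0)  # assumed key name
    if visible == 0:
        return 0.0
    return (visible + hidden) / visible


def flag_expensive(usages: List[Mapping[str, Any]],
                   threshold: float = 3.0) -> List[int]:
    """Return the indexes of requests whose overhead exceeds the threshold."""
    return [i for i, usage in enumerate(usages)
            if overhead_ratio(usage) > threshold]


# Made-up usage records for illustration only.
log = [
    {"prompt_tokens": 1000, "completion_tokens": 500},
    {"prompt_tokens": 1000, "completion_tokens": 500,
     "token_details": {"orchestration_tokens": 4500}},
]
print(flag_expensive(log))
```

A ratio of 1.0 means no orchestration overhead at all. The second record above works out to 4.0, so it is the one that gets flagged.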
Black Box Routing
You don't control which models handle your request. If you have specific requirements (must use only open-source models, must not send data to specific providers), Fugu Ultra may not accommodate that.
New and Unproven
Launched June 22, 2026. Production track record is measured in days, not years. Edge cases, reliability issues, and failure modes haven't been fully discovered by the community yet.
Comparison with Alternatives
vs. Calling Frontier Models Directly
If you'd otherwise use Claude Fable 5 or GPT-5, Fugu Ultra gives similar quality at lower input cost ($5 vs $15-20/M tokens for top models). But higher output cost ($30/M) and orchestration overhead can make it more expensive for output-heavy tasks.
vs. OpenRouter
OpenRouter routes to single models based on your choice. Fugu Ultra routes to multiple models and synthesizes. OpenRouter gives you control over which model responds. Fugu Ultra gives you (potentially) better answers at the cost of control.
vs. Building Your Own Multi-Agent System
You could build orchestration yourself using LangChain, CrewAI, or custom code. Fugu Ultra saves you that engineering effort. It's pre-built multi-agent orchestration behind one endpoint. Whether the quality of Sakana's routing exceeds what you'd build yourself depends on your team's expertise.
Getting Started
- Sign up at console.sakana.ai
- Generate an API key
- Point your OpenAI-compatible client at https://api.sakana.ai/v1
- Use model name fugu-ultra-20260615
- Monitor usage in the console (pay attention to orchestration tokens)
For teams evaluating AI APIs more broadly (including for vision and document tasks), our best multimodal AI APIs price comparison and DeepSeek Vision vs GPT-4o vs Gemini comparison cover the wider landscape.
Should You Use It?
Yes, if:
- You want frontier-level quality without single-vendor lock-in
- Complex reasoning accuracy matters more than latency
- You're processing long documents with extended context needs
- You want to try multi-agent AI without building the infrastructure
Probably not, if:
- You need predictable per-request costs
- Latency under 2 seconds is a hard requirement
- You need full control over which models process your data
- You're doing simple tasks where a single model is sufficient
Fugu Ultra is genuinely novel. The "orchestration model" category barely existed six months ago. Whether it becomes the default way to consume AI or remains a niche for specific use cases, it's worth experimenting with.
FAQ
Is Sakana Fugu Ultra a single AI model?
No. It's an orchestration system. Fugu itself is a language model, but its purpose is to coordinate a pool of other models. Your request gets routed to multiple specialists, verified, and synthesized into one response. You interact with it like a single model through one API.
Why are orchestration tokens billed separately?
When Fugu routes your request to multiple models internally, those models consume tokens. Sakana bills these as "orchestration tokens" at the same rate as regular tokens. This means your total bill reflects the actual compute used, not just your visible input/output.
How does Fugu Ultra compare to OpenRouter Fusion?
Similar concept (multi-model synthesis), different implementation. OpenRouter Fusion is a feature within OpenRouter's routing system. Fugu Ultra is Sakana's dedicated orchestration model trained specifically for this task. Sakana claims Fugu produces higher quality synthesis, but both serve similar use cases.
Can I use Fugu Ultra for real-time chat?
It works, but expect higher latency (3-8 seconds for complex responses) compared to calling a single model directly. For chatbots where response time matters, a single frontier model is likely a better choice. For asynchronous or batch tasks, Fugu Ultra's quality advantage is more relevant.
What happens if a model in the pool goes down?
Fugu's orchestration handles this by routing to alternative models. Since the pool contains multiple capable models, individual outages don't necessarily impact your results. This is one of the key advantages of the multi-model approach.
Is there a free tier?
Sakana offers trial credits for new accounts. Check their pricing page for current offers. The $20/month subscription tier is the cheapest ongoing option for light usage.