Jun 24, 2026 · 8 min read

Sakana Fugu Ultra: Multi-Agent AI for Nearly Free

Sakana AI, the Tokyo lab founded by ex-Google researchers (including a co-author of “Attention Is All You Need”), just launched Fugu Ultra. It’s not a model in the traditional sense. It’s an orchestration system that coordinates multiple frontier models behind a single API. You call one endpoint, and behind the scenes, Fugu routes your task across a pool of specialist models, has them check each other’s work, and returns a synthesized answer.

Sakana’s headline claim: frontier-level performance (93.2 on LiveCodeBench versus 89.8 for Claude Fable 5) at $5 per million input tokens. On input price, that’s cheaper than calling most frontier models directly, although output tokens and orchestration overhead are billed on top. It also supports a context window that extends beyond 272K tokens, with graduated pricing.

This is either the smartest arbitrage in AI or an elaborate routing layer that adds complexity without value. After testing it, I think it’s mostly the former.

How Fugu Ultra Works

The core idea: instead of training one massive model to do everything, train a smaller model to delegate tasks to the right specialist.

Fugu is itself a language model, but its job isn’t to answer your questions directly. Its job is to:

Analyze your request and break it into sub-tasks
Route each sub-task to the best model in its pool
Delegate work to multiple models (including recursive calls to itself)
Verify outputs by having models check each other
Synthesize the results into a single coherent response
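
To make those steps concrete, here is a toy sketch of the decompose, route, verify, synthesize loop. It is not Sakana's implementation: Fugu is a trained model, while this uses hand-written rules, and every name here (SubTask, decompose, orchestrate, the "kind: prompt" request format, the "general" fallback) is hypothetical. Recursive self-calls are left out for brevity.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# A "specialist" is any callable that maps a prompt to an answer.
Specialist = Callable[[str], str]


@dataclass
class SubTask:
    kind: str    # hypothetical task label, e.g. "math" or "code"
    prompt: str


def decompose(request: str) -> List[SubTask]:
    """Toy decomposition: one sub-task per line, written as 'kind: prompt'."""
    tasks = []
    for line in request.splitlines():
        kind, _, prompt = line.partition(":")
        tasks.append(SubTask(kind.strip(), prompt.strip()))
    return tasks


def orchestrate(request: str,
                pool: Dict[str, Specialist],
                verifier: Callable[[SubTask, str], bool]) -> str:
    """Decompose, route, verify, then synthesize. `pool` must contain 'general'."""
    parts = []
    for task in decompose(request):
        worker = pool.get(task.kind, pool["general"])  # route to a specialist
        answer = worker(task.prompt)                   # delegate the work
        if not verifier(task, answer):                 # have a checker review it
            answer = pool["general"](task.prompt)      # fall back once on failure
        parts.append(answer)
    return "\n".join(parts)                            # naive synthesis: join answers


if __name__ == "__main__":
    # Stub specialists stand in for real model calls.
    pool = {
        "math": lambda p: f"[math] {p}",
        "code": lambda p: f"[code] {p}",
        "general": lambda p: f"[general] {p}",
    }
    print(orchestrate("math: 2+2\ncode: reverse a list", pool, lambda t, a: True))
```

In a real system each stub would be an API call to a different model, and the verifier would itself be a model. That is where the extra tokens and latency discussed later come from.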

You never see the orchestration. From your perspective, it’s one API call with one response. The multi-agent complexity is hidden behind an OpenAI-compatible endpoint.

What’s in the Model Pool?

Sakana hasn’t published the full list of models in the pool, but reports indicate it includes both open-source and proprietary models. The key insight: Fugu isn’t locked to any single provider. If a new model releases that’s better at math, Sakana can add it to the pool without retraining Fugu itself.

This means Fugu Ultra’s capabilities can improve without Sakana doing any model training, just by adding better specialists to the pool.

Pricing

Token Type	Standard	Context > 272K
Input	$5 / 1M tokens	$10 / 1M tokens
Output	$30 / 1M tokens	$45 / 1M tokens
Cached input	$0.50 / 1M tokens	$1.00 / 1M tokens

Understanding the True Cost

There’s a catch. Fugu Ultra uses “orchestration tokens” internally. When it routes work to multiple models, those internal tokens are billed at the same rate as standard tokens. Your actual bill may be 2-5x the naive calculation based on your input/output alone.

Sakana reports this transparently in the token_details usage field. But it means a seemingly simple request might consume more tokens than expected because Fugu called three models internally to verify the answer.

Real-world cost: For straightforward tasks, Fugu Ultra costs roughly what a single frontier model would. For complex tasks requiring multi-step reasoning, it costs more per request but potentially gives better answers than any single model could.
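
A rough estimator makes the gap between the naive and the real bill easier to see. The rates come from the pricing table above. Two things are my assumptions rather than documented behavior: that the long-context tier applies to the whole request once input exceeds 272K tokens, and that orchestration overhead can be modeled as a single multiplier (the 2-5x range) on the naive cost.

```python
# Rates in USD per 1M tokens, copied from the pricing table.
RATES = {
    "standard": {"input": 5.00, "output": 30.00, "cached": 0.50},
    "long": {"input": 10.00, "output": 45.00, "cached": 1.00},
}
LONG_CONTEXT_THRESHOLD = 272_000  # tokens


def estimate_cost(input_tokens: int, output_tokens: int,
                  cached_tokens: int = 0, overhead: float = 1.0) -> float:
    """Estimate the USD cost of one request.

    Assumptions (not confirmed by Sakana): the long-context tier covers the
    whole request once total input exceeds the threshold, and orchestration
    tokens scale the naive bill by `overhead`.
    """
    total_input = input_tokens + cached_tokens
    tier = "long" if total_input > LONG_CONTEXT_THRESHOLD else "standard"
    rate = RATES[tier]
    naive = (
        input_tokens * rate["input"]
        + cached_tokens * rate["cached"]
        + output_tokens * rate["output"]
    ) / 1_000_000
    return naive * overhead


if __name__ == "__main__":
    # 10K input tokens and 2K output tokens at 1x, 2x, and 5x overhead.
    for overhead in (1.0, 2.0, 5.0):
        print(overhead, round(estimate_cost(10_000, 2_000, overhead=overhead), 4))
```

For that request the naive figure is $0.11, so the 2-5x range puts the real bill somewhere between $0.22 and $0.55.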

Subscription Plans

Sakana also offers monthly subscription tiers:

$20/month, $100/month, and $200/month options
These include token allowances at discounted rates
Pay-as-you-go available without subscription

Performance Benchmarks

Sakana claims some impressive numbers:

Benchmark	Fugu Ultra	Claude Fable 5	Notes
LiveCodeBench	93.2	89.8	Coding tasks
MATH-500	High (unspecified)	High	Mathematical reasoning
GPQA-Diamond	Competitive	Competitive	Science questions

The benchmark story is compelling, but remember: Fugu Ultra achieves these scores by orchestrating multiple models. It’s less “one model is this smart” and more “a committee of models, well-coordinated, produces this quality.” That distinction matters when thinking about latency and cost.

Practical Advantages

Vendor Independence

This is the real selling point for many teams. With Fugu Ultra:

No single-vendor dependency
No exposure to one provider’s outages
Less exposure to any one provider’s regional or export restrictions (since it routes across providers)
If one model degrades, the pool compensates

Context Window

Fugu Ultra supports context beyond 272K tokens (with graduated pricing). For teams processing very long documents, meeting transcripts, or codebases, this extended context is valuable. The cached input pricing ($0.50/1M) makes repeated context relatively cheap.
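
To see how much the cached rate matters, here is a small sketch that prices the input side of sending the same document repeatedly. It uses the standard-tier rates from the pricing table and assumes the first call pays the full input rate while every later call hits the cache. Actual cache rules (expiry, minimum sizes) are not covered in this article, and the function name is mine.

```python
INPUT_RATE = 5.00    # USD per 1M uncached input tokens (standard tier)
CACHED_RATE = 0.50   # USD per 1M cached input tokens (standard tier)


def repeated_context_cost(context_tokens: int, calls: int, cached: bool) -> float:
    """Input-side USD cost of sending the same context `calls` times.

    Assumes the first call is billed at the full input rate and, when
    `cached` is True, every later call is billed at the cached rate.
    """
    if calls <= 0:
        return 0.0
    first = context_tokens * INPUT_RATE
    later_rate = CACHED_RATE if cached else INPUT_RATE
    rest = context_tokens * later_rate * (calls - 1)
    return (first + rest) / 1_000_000


if __name__ == "__main__":
    # A 200K-token document queried 10 times, without and with caching.
    print(round(repeated_context_cost(200_000, 10, cached=False), 2))
    print(round(repeated_context_cost(200_000, 10, cached=True), 2))
```

Under those assumptions, ten questions against a 200K-token document cost $10.00 in input tokens without caching and $1.90 with it, before output and orchestration tokens.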

OpenAI-Compatible API

Fugu Ultra works as a drop-in replacement for OpenAI’s chat completions API. Switch your base URL and API key, and keep your existing code:

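from openai import OpenAI

# Point the standard OpenAI client at Sakana's endpoint instead of OpenAI's.
client = OpenAI(
    api_key="your-sakana-key",
    base_url="https://api.sakana.ai/v1"
)

# Same chat completions call as with OpenAI; only the model name changes.
response = client.chat.completions.create(
    model="fugu-ultra-20260615",
    messages=[
        {"role": "user", "content": "Explain the CAP theorem with examples"}
    ]
)

# The synthesized answer comes back as a single message.
print(response.choices[0].message.content)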

That’s it. If your code already uses OpenAI’s chat completions API, it should work with Fugu Ultra after those two changes.

Use Cases Where Fugu Ultra Shines

Complex Reasoning Tasks

Tasks that benefit from multiple perspectives: legal analysis, technical architecture decisions, scientific hypothesis evaluation. The multi-model approach catches errors that a single model might make.

Long Document Processing

With context beyond 272K tokens and cheap cached input pricing, Fugu Ultra handles:

Full codebase analysis
Long legal contracts
Book-length documents
Extended meeting transcript analysis

For document-specific OCR tasks, dedicated models like DeepSeek Vision or Mistral OCR 4 are better suited. But for understanding and reasoning about long documents (not just extracting text), Fugu Ultra’s extended context is compelling.

Coding Tasks

93.2 on LiveCodeBench isn’t accidental. Fugu Ultra routes coding tasks to code-specialized models, uses others for verification, and synthesizes working solutions. For teams that want the best code generation without committing to one provider, this is attractive.

When Accuracy Matters More Than Latency

The multi-model verification adds latency (multiple model calls happen sequentially or in parallel behind the scenes). But it also catches errors. For tasks where being right matters more than being fast (legal documents, medical summaries, financial analysis), the latency tradeoff is worth it.

Limitations

Latency

Multi-model orchestration is slower than a single model call. Simple questions that GPT-4 or Claude would answer in 1-2 seconds might take 3-8 seconds through Fugu Ultra. For real-time chat applications, this may be too slow.
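
Before ruling Fugu Ultra in or out for a latency-sensitive feature, it is worth measuring it against your own prompts. The helper below is a generic timing sketch, not anything Sakana provides: measure_latency and meets_budget are names I made up, and the stub call stands in for a real request such as client.chat.completions.create(...).

```python
import time
from statistics import median
from typing import Callable, List


def measure_latency(call: Callable[[], object], runs: int = 5) -> List[float]:
    """Return the wall-clock seconds taken by each of `runs` invocations."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        call()
        timings.append(time.perf_counter() - start)
    return timings


def meets_budget(timings: List[float], budget_s: float) -> bool:
    """True if the median latency is within the budget (in seconds)."""
    return median(timings) <= budget_s


if __name__ == "__main__":
    # Stub standing in for an API call; replace it with a real request.
    def fake_call() -> None:
        time.sleep(0.01)

    timings = measure_latency(fake_call, runs=3)
    print([round(t, 3) for t in timings], meets_budget(timings, budget_s=2.0))
```

The median keeps one slow outlier from dominating the verdict. If tail latency matters more for your application, compare against the maximum or a high percentile instead.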

Unpredictable Costs

Because orchestration tokens are billed, your actual cost per request varies based on how complex Fugu judges the task to be. A simple factual question might cost X. A nuanced reasoning question might cost 3X because Fugu called more models internally.

Black Box Routing

You don’t control which models handle your request. If you have specific requirements (must use only open-source models, must not send data to specific providers), Fugu Ultra may not accommodate that.

New and Unproven

Launched June 22, 2026. Production track record is measured in days, not years. Edge cases, reliability issues, and failure modes haven’t been fully discovered by the community yet.

Comparison with Alternatives

vs. Calling Frontier Models Directly

If you’d otherwise use Claude Fable 5 or GPT-5, Fugu Ultra gives similar quality at a lower input cost ($5 vs $15-20 per million input tokens for top models). But the $30/M output price and the orchestration overhead can make it more expensive for output-heavy tasks.
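
One way to reason about that tradeoff without guessing a competitor's output price is to solve for the break-even point. The sketch below asks: for a given workload, what would a direct model have to charge per million output tokens for the two options to cost the same? The Fugu rates are from this article. The function name, the single overhead multiplier, and the $15/M direct input price (the low end of the range above) are illustrative assumptions.

```python
FUGU_INPUT = 5.00    # USD per 1M input tokens (from the article)
FUGU_OUTPUT = 30.00  # USD per 1M output tokens (from the article)


def break_even_output_price(input_tokens: int, output_tokens: int,
                            direct_input_price: float,
                            overhead: float = 1.0) -> float:
    """Direct-model output price (USD per 1M) at which both options cost the same.

    If the direct model charges less than this for output, calling it
    directly is cheaper for this workload. A negative result means Fugu is
    cheaper no matter what the direct model charges for output.
    """
    if output_tokens <= 0:
        raise ValueError("output_tokens must be positive")
    fugu = overhead * (input_tokens * FUGU_INPUT + output_tokens * FUGU_OUTPUT)
    direct_input = input_tokens * direct_input_price
    return (fugu - direct_input) / output_tokens


if __name__ == "__main__":
    # Balanced workload with no overhead, then with 2x orchestration overhead.
    print(break_even_output_price(1_000_000, 1_000_000, 15.0))
    print(break_even_output_price(1_000_000, 1_000_000, 15.0, overhead=2.0))
    # Input-heavy workload: 10M tokens in, 1M tokens out.
    print(break_even_output_price(10_000_000, 1_000_000, 15.0))
```

With equal input and output volumes, a direct model only needs to charge under $20/M for output to beat Fugu at 1x overhead, and under $55/M at 2x. For the input-heavy case the break-even goes negative, which is the arithmetic behind the claim that Fugu's pricing favors input-heavy work.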

vs. OpenRouter

OpenRouter routes to single models based on your choice. Fugu Ultra routes to multiple models and synthesizes. OpenRouter gives you control over which model responds. Fugu Ultra gives you (potentially) better answers at the cost of control.

vs. Building Your Own Multi-Agent System

You could build orchestration yourself using LangChain, CrewAI, or custom code. Fugu Ultra saves you that engineering effort. It’s pre-built multi-agent orchestration behind one endpoint. Whether the quality of Sakana’s routing exceeds what you’d build yourself depends on your team’s expertise.

Getting Started

Sign up at console.sakana.ai
Generate an API key
Point your OpenAI-compatible client at https://api.sakana.ai/v1
Use model name fugu-ultra-20260615
Monitor usage in the console (pay attention to orchestration tokens)

For teams evaluating AI APIs more broadly (including for vision and document tasks), our best multimodal AI APIs price comparison and DeepSeek Vision vs GPT-4o vs Gemini comparison cover the wider landscape.

Should You Use It?

Yes, if:

You want frontier-level quality without single-vendor lock-in
Complex reasoning accuracy matters more than latency
You’re processing long documents with extended context needs
You want to try multi-agent AI without building the infrastructure

Probably not, if:

You need predictable per-request costs
Latency under 2 seconds is a hard requirement
You need full control over which models process your data
You’re doing simple tasks where a single model is sufficient

Fugu Ultra is genuinely novel. The “orchestration model” category barely existed six months ago. Whether it becomes the default way to consume AI or remains a niche for specific use cases, it’s worth experimenting with.

FAQ

Is Sakana Fugu Ultra a single AI model?

No. It’s an orchestration system. Fugu itself is a language model, but its purpose is to coordinate a pool of other models. Your request gets routed to multiple specialists, verified, and synthesized into one response. You interact with it like a single model through one API.

Why are orchestration tokens billed separately?

When Fugu routes your request to multiple models internally, those models consume tokens. Sakana bills these as “orchestration tokens” at the same rate as regular tokens. This means your total bill reflects the actual compute used, not just your visible input/output.

How does Fugu Ultra compare to OpenRouter Fusion?

Similar concept (multi-model synthesis), different implementation. OpenRouter Fusion is a feature within OpenRouter’s routing system. Fugu Ultra is Sakana’s dedicated orchestration model trained specifically for this task. Sakana claims Fugu produces higher quality synthesis, but both serve similar use cases.

Can I use Fugu Ultra for real-time chat?

It works, but expect higher latency (roughly 3-8 seconds, even for simple questions) compared to the 1-2 seconds of calling a single model directly. For chatbots where response time matters, a single frontier model is likely a better choice. For asynchronous or batch tasks, Fugu Ultra’s quality advantage is more relevant.

What happens if a model in the pool goes down?

Fugu’s orchestration handles this by routing to alternative models. Since the pool contains multiple capable models, individual outages don’t necessarily impact your results. This is one of the key advantages of the multi-model approach.

Is there a free tier?

Sakana offers trial credits for new accounts. Check their pricing page for current offers. The $20/month subscription tier is the cheapest ongoing option for light usage.