
Sakana Fugu Ultra: Multi-Agent AI for Nearly Free


Sakana AI, the Tokyo lab founded by ex-Google researchers (including a co-author of “Attention Is All You Need”), just launched Fugu Ultra. It’s not a model in the traditional sense. It’s an orchestration system that coordinates multiple frontier models behind a single API. You call one endpoint, and behind the scenes, Fugu routes your task across a pool of specialist models, has them check each other’s work, and returns a synthesized answer.

The result: frontier-level performance (93.2 on LiveCodeBench, beating Claude Fable 5’s 89.8) at $5 per million input tokens. On input price, that undercuts most frontier models called directly, though billed orchestration tokens can raise the real cost. It also supports context windows beyond 272K tokens, with higher per-token pricing above that threshold.

This is either the smartest arbitrage in AI or an elaborate routing layer that adds complexity without value. After testing it, I think it’s mostly the former.

How Fugu Ultra Works

The core idea: instead of training one massive model to do everything, train a smaller model to delegate tasks to the right specialist.

Fugu is itself a language model, but its job isn’t to answer your questions directly. Its job is to:

  1. Analyze your request and break it into sub-tasks
  2. Route each sub-task to the best model in its pool
  3. Delegate work to multiple models (including recursive calls to itself)
  4. Verify outputs by having models check each other
  5. Synthesize the results into a single coherent response
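To make those five steps concrete, here is a toy sketch in Python. It is not Sakana’s implementation, which hasn’t been published. The `SubTask` type, the `kind: prompt` request format, the `pool` dictionary, and the `verifier` callback are all invented for illustration, and the real routing is done by a trained model rather than a dictionary lookup. Recursive calls are left out.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical stand-in: a "specialist" is any function from prompt to answer.
Specialist = Callable[[str], str]


@dataclass
class SubTask:
    kind: str    # label such as "math" or "code" (invented for this sketch)
    prompt: str


def decompose(request: str) -> list[SubTask]:
    """Step 1: split a request into sub-tasks, one per 'kind: prompt' line."""
    tasks = []
    for line in request.strip().splitlines():
        kind, _, prompt = line.partition(":")
        tasks.append(SubTask(kind.strip(), prompt.strip()))
    return tasks


def orchestrate(
    request: str,
    pool: dict[str, Specialist],
    verifier: Callable[[str, str], bool],
) -> str:
    parts = []
    for task in decompose(request):
        # Steps 2-3: route to the matching specialist, else the generalist.
        specialist = pool.get(task.kind, pool["general"])
        answer = specialist(task.prompt)
        # Step 4: ask for a second opinion; on rejection, retry with the generalist.
        if not verifier(task.prompt, answer):
            answer = pool["general"](task.prompt)
        parts.append(answer)
    # Step 5: "synthesis" in this toy version is plain concatenation.
    return "\n".join(parts)


# Stub specialists so the sketch runs without any network calls.
pool = {
    "math": lambda prompt: "4",
    "general": lambda prompt: f"general: {prompt}",
}
accept_non_empty = lambda prompt, answer: bool(answer)

print(orchestrate("math: 2+2\nprose: say hi", pool, accept_non_empty))
```

Every extra specialist or verifier call in a loop like this is another model invocation, which is where the added latency and the billed orchestration tokens discussed later come from.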

You never see the orchestration. From your perspective, it’s one API call with one response. The multi-agent complexity is hidden behind an OpenAI-compatible endpoint.

What’s in the Model Pool?

Sakana hasn’t published the full list of models in the pool, but reports indicate it includes both open-source and proprietary models. The key insight: Fugu isn’t locked to any single provider. If a new model releases that’s better at math, Sakana can add it to the pool without retraining Fugu itself.

This means Fugu Ultra’s capabilities can improve without Sakana doing any model training, just by adding better specialists to the pool.

Pricing

| Token Type | Standard | Context > 272K |
| --- | --- | --- |
| Input | $5 / 1M tokens | $10 / 1M tokens |
| Output | $30 / 1M tokens | $45 / 1M tokens |
| Cached input | $0.50 / 1M tokens | $1.00 / 1M tokens |

Understanding the True Cost

There’s a catch. Fugu Ultra uses “orchestration tokens” internally. When it routes work to multiple models, those internal tokens are billed at the same rate as standard tokens. Your actual bill may be 2-5x the naive calculation based on your input/output alone.

Sakana reports this transparently in the token_details usage field. But it means a seemingly simple request might consume more tokens than expected because Fugu called three models internally to verify the answer.

Real-world cost: For straightforward tasks, Fugu Ultra costs roughly what a single frontier model would. For complex tasks requiring multi-step reasoning, it costs more per request but potentially gives better answers than any single model could.
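A rough estimator helps put numbers on this. The sketch below uses the prices from the table above and models orchestration as a flat multiplier on the visible cost. Both of those are simplifications: the `overhead` parameter is my own shorthand for the 2-5x range, and I’m assuming the higher tier applies to the whole request once the prompt exceeds 272K tokens, which Sakana’s billing may not do exactly.

```python
# USD per million tokens, taken from the pricing table.
PRICES = {
    "standard": {"input": 5.00, "output": 30.00, "cached_input": 0.50},
    "long_context": {"input": 10.00, "output": 45.00, "cached_input": 1.00},
}
LONG_CONTEXT_THRESHOLD = 272_000  # tokens


def estimate_cost(
    input_tokens: int,
    output_tokens: int,
    cached_tokens: int = 0,
    overhead: float = 1.0,
) -> float:
    """Estimate the USD cost of one request.

    overhead is an assumed multiplier for orchestration tokens
    (1.0 = none, 2.0-5.0 = the range quoted above).
    """
    # Assumption: the tier is chosen by total prompt size (fresh + cached).
    prompt_tokens = input_tokens + cached_tokens
    tier = "long_context" if prompt_tokens > LONG_CONTEXT_THRESHOLD else "standard"
    price = PRICES[tier]
    visible = (
        input_tokens * price["input"]
        + output_tokens * price["output"]
        + cached_tokens * price["cached_input"]
    ) / 1_000_000
    return visible * overhead


naive = estimate_cost(10_000, 2_000)
with_overhead = estimate_cost(10_000, 2_000, overhead=3.0)
print(f"naive: ${naive:.2f}, with 3x orchestration: ${with_overhead:.2f}")
```

For a 10K-token prompt and a 2K-token answer, the naive figure is $0.11; at a 3x overhead the same request lands at $0.33.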

Subscription Plans

Sakana also offers monthly subscription tiers:

  • $20/month, $100/month, and $200/month options
  • These include token allowances at discounted rates
  • Pay-as-you-go available without subscription

Performance Benchmarks

Sakana claims some impressive numbers:

| Benchmark | Fugu Ultra | Claude Fable 5 | Notes |
| --- | --- | --- | --- |
| LiveCodeBench | 93.2 | 89.8 | Coding tasks |
| MATH-500 | High (unspecified) | High | Mathematical reasoning |
| GPQA-Diamond | Competitive | Competitive | Science questions |

The benchmark story is compelling, but remember: Fugu Ultra achieves these scores by orchestrating multiple models. It’s less “one model is this smart” and more “a committee of models, well-coordinated, produces this quality.” That distinction matters when thinking about latency and cost.

Practical Advantages

Vendor Independence

This is the real selling point for many teams. With Fugu Ultra:

  • No single-vendor dependency
  • No exposure to one provider’s outages
  • Reduced exposure to any one provider’s regional or export restrictions (since it routes across providers)
  • If one model degrades, the pool compensates

Context Window

Fugu Ultra supports context beyond 272K tokens (with graduated pricing). For teams processing very long documents, meeting transcripts, or codebases, this extended context is valuable. The cached input pricing ($0.50/1M) makes repeated context relatively cheap.
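Here’s the arithmetic behind that claim, as a small sketch. It assumes the first request pays the full input price and every later request hits the cache for the entire context; it ignores output and orchestration tokens, and the function name is mine.

```python
def repeated_context_cost(
    context_tokens: int,
    calls: int,
    input_price: float = 5.00,    # USD per 1M tokens, standard tier
    cached_price: float = 0.50,   # USD per 1M cached tokens, standard tier
) -> tuple[float, float]:
    """Return (uncached_total, cached_total) in USD for re-sending one context."""
    if calls < 1:
        raise ValueError("calls must be at least 1")
    millions = context_tokens / 1_000_000
    uncached = calls * millions * input_price
    # Assumption: first call is billed in full, the remaining calls are cache hits.
    cached = millions * input_price + (calls - 1) * millions * cached_price
    return uncached, cached


uncached, cached = repeated_context_cost(200_000, 20)
print(f"uncached: ${uncached:.2f}, cached: ${cached:.2f}")
```

Asking 20 questions about the same 200K-token document comes to about $20.00 in input cost without caching and about $2.90 with it, under those assumptions.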

OpenAI-Compatible API

Drop-in replacement for OpenAI’s API. Switch your base URL and API key, keep your existing code:

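from openai import OpenAI

# Point the standard OpenAI client at Sakana's endpoint.
client = OpenAI(
    api_key="your-sakana-key",
    base_url="https://api.sakana.ai/v1",
)

# Same chat-completions call as with OpenAI; only the model name changes.
response = client.chat.completions.create(
    model="fugu-ultra-20260615",
    messages=[
        {"role": "user", "content": "Explain the CAP theorem with examples"},
    ],
)

# The synthesized answer comes back as a single message.
print(response.choices[0].message.content)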

That’s it. If your code works with OpenAI’s API, it works with Fugu Ultra.

Use Cases Where Fugu Ultra Shines

Complex Reasoning Tasks

Tasks that benefit from multiple perspectives: legal analysis, technical architecture decisions, scientific hypothesis evaluation. The multi-model approach catches errors that a single model might make.

Long Document Processing

With context beyond 272K tokens and cheap cached input pricing, Fugu Ultra handles:

  • Full codebase analysis
  • Long legal contracts
  • Book-length documents
  • Extended meeting transcript analysis

For document-specific OCR tasks, dedicated models like DeepSeek Vision or Mistral OCR 4 are better suited. But for understanding and reasoning about long documents (not just extracting text), Fugu Ultra’s extended context is compelling.

Coding Tasks

93.2 on LiveCodeBench isn’t accidental. Fugu Ultra routes coding tasks to code-specialized models, uses others for verification, and synthesizes working solutions. For teams that want the best code generation without committing to one provider, this is attractive.

When Accuracy Matters More Than Latency

The multi-model verification adds latency (multiple model calls happen sequentially or in parallel behind the scenes). But it also catches errors. For tasks where being right matters more than being fast (legal documents, medical summaries, financial analysis), the latency tradeoff is worth it.

Limitations

Latency

Multi-model orchestration is slower than a single model call. Simple questions that GPT-4 or Claude would answer in 1-2 seconds might take 3-8 seconds through Fugu Ultra. For real-time chat applications, this may be too slow.
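Rather than trusting those ranges, it’s worth measuring latency on your own prompts. A minimal timing wrapper (the `timed` helper is my own, not part of any SDK) looks like this:

```python
import time


def timed(fn, *args, **kwargs):
    """Call fn and return (result, elapsed_seconds) using a monotonic clock."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


# Stand-in workload so the sketch runs offline. In practice you would pass
# client.chat.completions.create plus its keyword arguments instead.
result, seconds = timed(sum, range(1_000_000))
print(f"result={result}, took {seconds:.3f}s")
```

Run the same prompts through Fugu Ultra and a single model and compare the distributions, not just the averages. The OpenAI Python client also accepts a `timeout` argument if you need a hard ceiling per request.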

Unpredictable Costs

Because orchestration tokens are billed, your actual cost per request varies based on how complex Fugu judges the task to be. A simple factual question might cost X. A nuanced reasoning question might cost 3X because Fugu called more models internally.
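One way to live with that variance is to budget for the worst case before sending a request. The guard below is a sketch, not anything Sakana provides: the class name is hypothetical, and the default multiplier of 5 simply takes the top of the 2-5x range quoted in the pricing section.

```python
class BudgetGuard:
    """Track spend and refuse requests whose worst case would exceed a limit."""

    def __init__(self, limit_usd: float):
        self.limit_usd = limit_usd
        self.spent_usd = 0.0

    def can_afford(self, visible_cost_usd: float, worst_case_multiplier: float = 5.0) -> bool:
        # Assume orchestration could inflate the visible cost by the multiplier.
        worst_case = visible_cost_usd * worst_case_multiplier
        return self.spent_usd + worst_case <= self.limit_usd

    def record(self, billed_cost_usd: float) -> None:
        # Call this with the actual billed amount once the response arrives.
        self.spent_usd += billed_cost_usd


guard = BudgetGuard(limit_usd=1.00)
print(guard.can_afford(0.11))   # worst case $0.55 fits in a $1.00 budget
guard.record(0.55)
print(guard.can_afford(0.11))   # another worst case would total $1.10
```

The first check passes and the second fails, so the caller can fall back to a cheaper single-model call or queue the request for later.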

Black Box Routing

You don’t control which models handle your request. If you have specific requirements (must use only open-source models, must not send data to specific providers), Fugu Ultra may not accommodate that.

New and Unproven

Launched June 22, 2026. Production track record is measured in days, not years. Edge cases, reliability issues, and failure modes haven’t been fully discovered by the community yet.

Comparison with Alternatives

vs. Calling Frontier Models Directly

If you’d otherwise use Claude Fable 5 or GPT-5, Fugu Ultra gives similar quality at lower input cost ($5 vs $15-20/M tokens for top models). But higher output cost ($30/M) and orchestration overhead can make it more expensive for output-heavy tasks.
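To see where the crossover sits for your workload, plug in your own numbers. In the sketch below the Fugu prices come from the pricing table and the $15 direct input price is the low end of the range above. The $25 direct output price is a placeholder I made up for illustration, not a quoted price, so substitute your provider’s real figure.

```python
def compare_costs(
    input_tokens: int,
    output_tokens: int,
    direct_input_price: float,
    direct_output_price: float,
    fugu_overhead: float = 1.0,
) -> tuple[float, float]:
    """Return (fugu_usd, direct_usd) for one request. Prices are USD per 1M tokens."""
    # Fugu Ultra standard-tier prices; overhead is an assumed orchestration multiplier.
    fugu = fugu_overhead * (input_tokens * 5.00 + output_tokens * 30.00) / 1_000_000
    direct = (
        input_tokens * direct_input_price + output_tokens * direct_output_price
    ) / 1_000_000
    return fugu, direct


PLACEHOLDER_OUTPUT_PRICE = 25.00  # hypothetical, replace with a real price

# Input-heavy request: long prompt, short answer.
print(compare_costs(100_000, 1_000, 15.00, PLACEHOLDER_OUTPUT_PRICE))
# Output-heavy request: short prompt, long answer.
print(compare_costs(1_000, 100_000, 15.00, PLACEHOLDER_OUTPUT_PRICE))
```

With these inputs the input-heavy request is cheaper on Fugu Ultra and the output-heavy one is cheaper on the direct model, and any `fugu_overhead` above 1.0 shifts the balance further toward the direct call.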

vs. OpenRouter

OpenRouter routes to single models based on your choice. Fugu Ultra routes to multiple models and synthesizes. OpenRouter gives you control over which model responds. Fugu Ultra gives you (potentially) better answers at the cost of control.

vs. Building Your Own Multi-Agent System

You could build orchestration yourself using LangChain, CrewAI, or custom code. Fugu Ultra saves you that engineering effort. It’s pre-built multi-agent orchestration behind one endpoint. Whether the quality of Sakana’s routing exceeds what you’d build yourself depends on your team’s expertise.

Getting Started

  1. Sign up at console.sakana.ai
  2. Generate an API key
  3. Point your OpenAI-compatible client at https://api.sakana.ai/v1
  4. Use model name fugu-ultra-20260615
  5. Monitor usage in the console (pay attention to orchestration tokens)

For teams evaluating AI APIs more broadly (including for vision and document tasks), our best multimodal AI APIs price comparison and DeepSeek Vision vs GPT-4o vs Gemini comparison cover the wider landscape.

Should You Use It?

Yes, if:

  • You want frontier-level quality without single-vendor lock-in
  • Complex reasoning accuracy matters more than latency
  • You’re processing long documents with extended context needs
  • You want to try multi-agent AI without building the infrastructure

Probably not, if:

  • You need predictable per-request costs
  • Latency under 2 seconds is a hard requirement
  • You need full control over which models process your data
  • You’re doing simple tasks where a single model is sufficient

Fugu Ultra is genuinely novel. The “orchestration model” category barely existed six months ago. Whether it becomes the default way to consume AI or remains a niche for specific use cases, it’s worth experimenting with.

FAQ

Is Sakana Fugu Ultra a single AI model?

No. It’s an orchestration system. Fugu itself is a language model, but its purpose is to coordinate a pool of other models. Your request gets routed to multiple specialists, verified, and synthesized into one response. You interact with it like a single model through one API.

Why are orchestration tokens billed separately?

When Fugu routes your request to multiple models internally, those models consume tokens. Sakana bills these as “orchestration tokens” at the same rate as regular tokens. This means your total bill reflects the actual compute used, not just your visible input/output.

How does Fugu Ultra compare to OpenRouter Fusion?

Similar concept (multi-model synthesis), different implementation. OpenRouter Fusion is a feature within OpenRouter’s routing system. Fugu Ultra is Sakana’s dedicated orchestration model trained specifically for this task. Sakana claims Fugu produces higher quality synthesis, but both serve similar use cases.

Can I use Fugu Ultra for real-time chat?

It works, but expect higher latency (3-8 seconds for complex responses) compared to calling a single model directly. For chatbots where response time matters, a single frontier model is likely a better choice. For asynchronous or batch tasks, Fugu Ultra’s quality advantage is more relevant.

What happens if a model in the pool goes down?

Fugu’s orchestration handles this by routing to alternative models. Since the pool contains multiple capable models, individual outages don’t necessarily impact your results. This is one of the key advantages of the multi-model approach.

Is there a free tier?

Sakana offers trial credits for new accounts. Check their pricing page for current offers. The $20/month subscription tier is the cheapest ongoing option for light usage.