Two βFlashβ models, two different approaches to the same goal: maximum speed at minimum cost. Googleβs Gemini 3.5 Flash launched at I/O 2026 (May 19) and immediately became the default for cost-sensitive workloads. StepFunβs Step 3.7 Flash dropped ten days later with 2Γ the throughput and native video understanding.
Both cost under $1 per million output tokens. Both are fast enough for real-time applications. But they differ significantly in architecture, multimodal capabilities, and ecosystem. Here is how to choose.
Head-to-head comparison
| Step 3.7 Flash | Gemini 3.5 Flash | |
|---|---|---|
| Developer | StepFun (China) | |
| Architecture | MoE (198B total, 11B active) | Dense (undisclosed size) |
| Speed | 400 t/s | ~200 t/s |
| Context window | 256K | 1M |
| Input price | ~$0.20/M | $0.15/M |
| Output price | ~$0.80/M | $0.60/M |
| Cache hit | $0.04/M | $0.0375/M |
| Vision | β Native (images + video) | β Native (images) |
| Video | β Native | β |
| Reasoning tiers | 3 levels (Low/Medium/High) | β (single mode) |
| Advisor Mode | β (auto-escalation) | β |
| Open weight | β | β |
| Self-hostable | β (128GB RAM) | β |
| MCP-Atlas tool use | β | 83.6% |
| Finance Agent v2 | β | 57.9% |
| ClawEval-1.1 (agent reliability) | 67.1 | β |
| BrowseComp (search) | 75.82% | β |
| Available on OpenRouter | β | β |
Speed: Step 3.7 Flash wins (2Γ)
Step 3.7 Flash generates at 400 tokens per second β double Gemini 3.5 Flashβs ~200 t/s. For applications where latency matters (autocomplete, real-time chat, interactive coding), this is a meaningful difference.
The speed advantage comes from the MoE architecture: only 11B parameters activate per token despite the model having 198B total. Less compute per token means faster generation.
Context: Gemini wins (4Γ)
Gemini 3.5 Flash supports 1 million tokens of context β 4Γ Step 3.7 Flashβs 256K. For workloads that need to process entire codebases, long documents, or extensive conversation histories, Gemini has a clear advantage.
For most practical tasks (single file coding, chat, document analysis), 256K is sufficient. But if you regularly work with contexts above 256K tokens, Gemini is the only option.
Pricing: Gemini is slightly cheaper
Gemini 3.5 Flash is 25% cheaper on input ($0.15 vs $0.20) and 25% cheaper on output ($0.60 vs $0.80). Cache hit prices are nearly identical ($0.0375 vs $0.04).
In practice, the cost difference is small in absolute terms. A full day of heavy usage might cost $0.50 more with Step 3.7 Flash. The speed advantage (2Γ throughput) may offset this for time-sensitive workloads.
Multimodal: Step 3.7 Flash wins
Both models handle images natively. But Step 3.7 Flash adds:
- Native video understanding β Process video frames for temporal reasoning
- GUI interaction β Can open browsers, inspect rendered pages, interact with UIs
- Image manipulation β Crops, zooms, draws bounding boxes using Python tools
- Emergent tool combination β Writes code, visually verifies it, fixes based on what it βseesβ
Gemini 3.5 Flash handles images well but does not have the same agentic visual capabilities. If your workflow involves video processing or GUI automation, Step 3.7 Flash is the clear choice.
Reasoning flexibility: Step 3.7 Flash wins
Step 3.7 Flash offers three reasoning tiers per API call:
- Low: Fast responses for simple tasks (cheapest)
- Medium: Balanced for standard work
- High: Deep reasoning for complex problems (most expensive)
Gemini 3.5 Flash has a single inference mode. You get the same level of reasoning regardless of task complexity. This means you pay the same whether the task is trivial or hard.
With Step 3.7 Flash, you can use Low tier for 80% of requests (saving money) and High tier for the 20% that actually need deep reasoning. This granular control can reduce costs by 30-50% for mixed workloads.
Open weight vs closed
Step 3.7 Flash is fully open-weight (Hugging Face, GitHub). You can:
- Self-host on your own infrastructure
- Fine-tune for specific domains
- Inspect model behavior
- Run completely offline for data privacy
Gemini 3.5 Flash is closed-source, API-only. No self-hosting, no fine-tuning, no offline use.
For enterprises with strict data residency requirements, this alone may decide the choice.
Ecosystem and tooling
Gemini 3.5 Flash advantages:
- Native Antigravity CLI integration
- Google Cloud / Vertex AI integration
- Established community and documentation
- Higher benchmark scores on tool use (MCP-Atlas: 83.6%)
- Part of the Google AI ecosystem (Search, Workspace, etc.)
Step 3.7 Flash advantages:
- Available on OpenRouter (single API key for everything)
- Supports vLLM, SGLang, llama.cpp, Transformers
- Advisor Mode for automatic cost optimization
- Open-weight community (fine-tunes, quantizations)
- Works with Aider and Continue via OpenRouter
When to choose Step 3.7 Flash
- You need maximum speed (400 t/s)
- Your workload involves video understanding
- You want tunable reasoning tiers to optimize cost
- You need to self-host for data privacy
- You want Advisor Mode (auto-escalation to stronger models)
- You are building GUI automation agents
- You prefer open-weight models you can inspect and fine-tune
When to choose Gemini 3.5 Flash
- You need 1M token context (large codebases, long documents)
- You are in the Google Cloud ecosystem
- You want the cheapest possible per-token price
- You use Antigravity CLI as your primary coding tool
- You need proven tool-use reliability (83.6% MCP-Atlas)
- You want the most established documentation and community support
- Financial analysis is a key use case (57.9% Finance Agent v2)
Using both
Since both are available on OpenRouter, you can route between them based on task type:
def choose_flash_model(task):
if task.needs_video or task.needs_gui:
return "stepfun/step-3.7-flash"
elif task.context_length > 256000:
return "google/gemini-3.5-flash"
elif task.latency_critical:
return "stepfun/step-3.7-flash" # 2x faster
else:
return "google/gemini-3.5-flash" # slightly cheaper
The bigger picture
Both models represent the same trend: frontier-class capabilities at budget prices. A year ago, you needed $15-75/M tokens for this level of performance. Now you get it for $0.15-0.80/M.
Step 3.7 Flash adds another strong option to the growing list of cheap, capable models alongside DeepSeek V4-Pro, MiMo V2.5 Pro, and Gemini 3.5 Flash. The competition is driving prices toward zero while quality keeps improving.
FAQ
Which is better for coding?
For routine coding, both are comparable. Step 3.7 Flashβs Advisor Mode achieves 97% of Opus 4.6βs coding quality at $0.19/task. Gemini 3.5 Flash scores 54.2% on SWE-bench Pro. Neither matches Claude Opus 4.8 (69.2%) or DeepSeek V4-Pro on hard coding tasks.
Can Step 3.7 Flash replace Gemini 3.5 Flash entirely?
For most workloads under 256K context: yes. The speed advantage and reasoning tiers make it competitive or better. The main blocker is the 256K context limit β if you regularly exceed that, you need Gemini.
Is Step 3.7 Flash reliable enough for production?
The 67.1 ClawEval score (agent reliability) and 75.82% BrowseComp (search accuracy) suggest yes for agent workloads. It is newer than Gemini, so less production data exists. Start with non-critical workloads and validate.
How much RAM do I need to self-host?
128GB unified memory (Mac Studio/MacBook Pro) or 120GB system RAM (AMD). NVIDIA DGX also works. The GGUF quantized version is the most practical for local deployment.
Which has better documentation?
Gemini 3.5 Flash, by a wide margin. Google has extensive docs, tutorials, and community resources. Step 3.7 Flash is brand new with minimal English documentation. Expect this to improve as the community grows.
Can I use Step 3.7 Flash with Claude Code?
Not directly (Claude Code only supports Anthropic models). Use it via Aider or Continue with the OpenRouter endpoint. Or use it as a standalone API for non-coding tasks.