
Gemini 3.5 Flash Complete Guide: Google's Fastest Frontier Model


Google dropped Gemini 3.5 Flash at Google I/O 2026 on May 19, and it’s a significant leap. This is the first model in the Gemini 3.5 family — a Flash-tier model that outperforms the previous generation’s Pro on coding and agentic benchmarks, while pushing 289 tokens per second. That’s 4x faster than Claude Opus 4.7 and GPT-5.5. For developers building AI agents, this changes the cost-performance equation entirely.

Here’s everything you need to know about Gemini 3.5 Flash: specs, benchmarks, pricing, API access, and where it fits in the current model landscape. If you’ve been following the pre-I/O leaks, some of this will confirm what was rumored — but the benchmark numbers are now official.

Key Specs

Spec | Value
Release date | May 19, 2026 (Google I/O 2026)
Model family | Gemini 3.5 (first in family)
Input modalities | Text, Image, Video, Audio, PDF
Output modalities | Text only
Context window | 1,000,000 tokens
Max output tokens | 65,536
Knowledge cutoff | January 2025
Output speed | 289 tokens/sec
Input pricing (global) | $1.50 per 1M tokens
Output pricing (global) | $9.00 per 1M tokens
Cached input pricing | $0.15 per 1M tokens
Tool use | Function calling, Structured output, Search as a tool, Code execution
Computer Use | Not supported

The 1M context window matches Gemini 3.1 Pro. The 65K max output is generous for a Flash model — enough for full file rewrites and long-form generation without truncation.
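To make the two limits concrete, here is a minimal sketch of a pre-flight check against the 1,000,000-token context window and the 65,536-token output cap from the spec table. The `estimate_tokens` helper and its 4-characters-per-token ratio are assumptions for illustration only, not Gemini's tokenizer; exact counts should come from the API's own token counting.

```python
CONTEXT_WINDOW = 1_000_000  # input token limit, from the spec table
MAX_OUTPUT_TOKENS = 65_536  # output token limit, from the spec table
CHARS_PER_TOKEN = 4         # assumption: crude heuristic, not the real tokenizer


def estimate_tokens(text: str) -> int:
    """Very rough token estimate based on character count."""
    return max(1, len(text) // CHARS_PER_TOKEN)


def check_request(prompt: str, max_output_tokens: int) -> list[str]:
    """Return a list of problems; an empty list means the request fits."""
    problems = []
    estimated = estimate_tokens(prompt)
    if estimated > CONTEXT_WINDOW:
        problems.append(
            f"prompt is ~{estimated:,} tokens, over the {CONTEXT_WINDOW:,} limit"
        )
    if max_output_tokens > MAX_OUTPUT_TOKENS:
        problems.append(
            f"max_output_tokens {max_output_tokens:,} exceeds {MAX_OUTPUT_TOKENS:,}"
        )
    return problems


print(check_request("Summarize this file.", 1024))  # fits: empty list
print(check_request("x" * 5_000_000, 70_000))       # both limits exceeded
```

A check like this is only a guard rail for obviously oversized requests; borderline prompts need a real token count.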

Benchmarks

These numbers come directly from Google DeepMind’s official release. Bold indicates the best score in each row.

CategoryBenchmarkGemini 3.5 FlashGemini 3 FlashGemini 3.1 ProClaude Sonnet 4.6Claude Opus 4.7GPT-5.5
CodingTerminal-bench 2.176.2%58.0%70.3%66.1%78.2%
CodingSWE-Bench Pro55.1%49.6%54.2%64.3%58.6%
AgenticMCP Atlas83.6%62.0%78.2%69.5%79.1%75.3%
AgenticToolathlon56.5%49.4%55.6%
UI ControlOSWorld-Verified78.4%65.1%76.2%72.5%78.0%78.7%
ExpertFinance Agent v257.9%42.6%43.0%51.0%51.5%51.8%
ExpertGDPval-AA (Elo)165612041314167617531769
MultimodalCharXiv Reasoning84.2%80.3%83.3%72.4%82.1%84.1%
MultimodalMMMU-Pro83.6%81.2%80.5%74.5%75.2%81.2%
Long contextMRCR v2 (128k)77.3%67.2%84.9%59.2%46.9%41.4%
ReasoningARC-AGI-272.1%33.6%77.1%58.3%75.8%84.6%

Key takeaways from benchmarks

Gemini 3.5 Flash leads on:

  • MCP Atlas (83.6%) — the top agentic tool-use benchmark. Critical if you’re building MCP-based agents.
  • Toolathlon (56.5%) — multi-step tool orchestration
  • Finance Agent v2 (57.9%) — domain-specific agent tasks
  • CharXiv Reasoning (84.2%) and MMMU-Pro (83.6%) — multimodal understanding

GPT-5.5 still beats Gemini 3.5 Flash on:

  • Terminal-bench 2.1 (78.2%) — raw coding in terminal environments
  • SWE-Bench Pro (58.6%) — real-world software engineering (though Claude Opus 4.7 leads at 64.3%)
  • ARC-AGI-2 (84.6%) — abstract reasoning

The story: A Flash-tier model now beats the previous Pro on coding (76.2% vs 70.3% on Terminal-bench) and agentic tasks (83.6% vs 78.2% on MCP Atlas). For a deeper comparison, see our Gemini 3.5 Flash vs Claude Opus 4.7 vs GPT-5.5 breakdown.

What’s New vs Previous Gemini Models

Compared to Gemini 3 Flash and Gemini 3.1 Pro:

  • Agentic performance leap: +21.6 points on MCP Atlas over Gemini 3 Flash. This model was built for tool use and MCP.
  • Speed: 289 tok/s is a massive jump. Gemini 3 Flash was already fast; 3.5 Flash is faster while being significantly more capable.
  • Coding ahead of Pro: 76.2% on Terminal-bench vs 70.3% for 3.1 Pro. A Flash model isn’t supposed to beat the previous Pro, but here it does.
  • Multimodal gains: Best-in-class on CharXiv and MMMU-Pro, beating all competitors including GPT-5.5.
  • Long context regression: 77.3% on MRCR v2 vs 84.9% for 3.1 Pro. If long-context retrieval is your primary use case, 3.1 Pro still has an edge. See our Gemini 3.5 Flash vs 3.1 Pro comparison for details.
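The deltas in the list above are simple subtractions over the published scores. A minimal sketch that recomputes them from the figures quoted in this article (the dictionary layout and the `delta` helper are illustrative choices, not part of any Google tooling):

```python
# Scores in percent, copied from the benchmark figures quoted in this article.
SCORES = {
    "Terminal-bench 2.1": {"Gemini 3.5 Flash": 76.2, "Gemini 3.1 Pro": 70.3},
    "MCP Atlas": {
        "Gemini 3.5 Flash": 83.6,
        "Gemini 3 Flash": 62.0,
        "Gemini 3.1 Pro": 78.2,
    },
    "MRCR v2 (128k)": {
        "Gemini 3.5 Flash": 77.3,
        "Gemini 3 Flash": 67.2,
        "Gemini 3.1 Pro": 84.9,
    },
}


def delta(benchmark: str, baseline: str, model: str = "Gemini 3.5 Flash") -> float:
    """Percentage-point difference between `model` and `baseline` on one benchmark."""
    row = SCORES[benchmark]
    return round(row[model] - row[baseline], 1)


for benchmark, row in SCORES.items():
    for baseline in row:
        if baseline != "Gemini 3.5 Flash":
            print(f"{benchmark}: {delta(benchmark, baseline):+.1f} vs {baseline}")
```

A negative value marks a regression, which is how the MRCR v2 gap against 3.1 Pro shows up.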

Google’s positioning has shifted from “chatbot” to “agent.” Gemini 3.5 Flash powers Antigravity 2.0, Gemini Spark, and AI Mode in Search. The model is optimized for multi-step tool orchestration, not just single-turn Q&A.

Pricing Breakdown

Model | Input (per 1M) | Output (per 1M) | Notes
Gemini 3.5 Flash | $1.50 | $9.00 | Global tier. Non-global: $1.65/$9.90
Gemini 3 Flash | $0.50 | $3.00 | 3x cheaper, significantly less capable
Gemini 3.1 Pro | $2.50 | $15.00 | 67% more expensive, worse on agentic
Claude Sonnet 4.6 | $3.00 | $15.00 | 2x input, 1.7x output cost
Claude Opus 4.7 | $15.00 | $75.00 | 10x input, 8.3x output cost
GPT-5.5 | $5.00 | $15.00 | 3.3x input, 1.7x output cost
DeepSeek V4 Pro | $2.19 | $8.76 | Closest competitor on price-performance

Cached input at $0.15/M is extremely aggressive — 90% discount on repeated context. For agent loops that re-send system prompts and tool definitions, this adds up fast.
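To see how the cached rate plays out in an agent loop, here is a rough cost sketch using the per-token prices quoted above. It assumes the shared prefix (system prompt plus tool definitions) is billed at the full rate on the first call and at the cached rate on every later call, and it ignores any cache storage fees or minimum cacheable sizes. The function name and workload numbers are made up for illustration.

```python
INPUT_PER_M = 1.50   # USD per 1M input tokens (global tier)
CACHED_PER_M = 0.15  # USD per 1M cached input tokens
OUTPUT_PER_M = 9.00  # USD per 1M output tokens


def loop_cost(steps, prefix_tokens, fresh_tokens, output_tokens, use_cache):
    """Estimate the USD cost of an agent loop that re-sends a fixed prefix each step."""
    if use_cache:
        # Assumption: the prefix is paid in full once, then at the cached rate.
        full_input = prefix_tokens + fresh_tokens * steps
        cached_input = prefix_tokens * max(steps - 1, 0)
    else:
        full_input = (prefix_tokens + fresh_tokens) * steps
        cached_input = 0
    output = output_tokens * steps
    total = (
        full_input * INPUT_PER_M
        + cached_input * CACHED_PER_M
        + output * OUTPUT_PER_M
    )
    return total / 1_000_000


# Hypothetical workload: 50 steps, a 20k-token prefix,
# 2k fresh input tokens and 500 output tokens per step.
without = loop_cost(50, 20_000, 2_000, 500, use_cache=False)
with_cache = loop_cost(50, 20_000, 2_000, 500, use_cache=True)
print(f"no cache: ${without:.3f}  cached: ${with_cache:.3f}")
```

For this made-up workload the estimate drops from about $1.88 to about $0.55 per run, with output tokens becoming the largest remaining line item.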

For a full pricing analysis across all major APIs, see our AI API Pricing Compared 2026 guide. If you’re looking for free-tier options, check Best Free AI APIs 2026.

The closest price-performance competitor is DeepSeek V4 Pro at $2.19/$8.76 — but Gemini 3.5 Flash beats it on agentic benchmarks while being cheaper on input. Full comparison: Gemini 3.5 Flash vs DeepSeek V4.

Where to Use Gemini 3.5 Flash

Gemini 3.5 Flash is available on:

  • Gemini API — Direct access via generateContent endpoint
  • Google AI Studio — Browser-based playground for testing
  • Gemini App — Consumer chat interface (web + mobile)
  • Antigravity — Google’s agentic coding IDE (full review)
  • OpenRouter — Multi-provider routing (OpenRouter guide)
  • Android Studio — Integrated coding assistant
  • Gemini CLI — Terminal-based access (Gemini CLI guide)

Quick API Example

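# Requires the google-genai SDK: pip install google-genai
from google import genai

# In real code, load the key from an environment variable instead of hard-coding it.
client = genai.Client(api_key="YOUR_API_KEY")

response = client.models.generate_content(
    model="gemini-3.5-flash",
    contents="Explain the MCP protocol in 3 sentences.",
    config={
        "temperature": 0.7,         # sampling randomness
        "max_output_tokens": 1024,  # cap on response length
    },
)

print(response.text)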

For a full setup walkthrough including streaming, function calling, and structured output, see our Gemini 3.5 Flash API Setup Guide.
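Streaming is usually the first of those features a chat or agent front end needs. The sketch below wraps the SDK's `generate_content_stream` call in a small generator. `stream_text` is a hypothetical helper name, and the model ID is the one used in the example above. The client is passed in rather than created inside the helper, so it can be exercised with a stub in tests.

```python
from typing import Iterator


def stream_text(client, prompt: str, model: str = "gemini-3.5-flash") -> Iterator[str]:
    """Yield response text piece by piece instead of waiting for the full reply."""
    for chunk in client.models.generate_content_stream(model=model, contents=prompt):
        # Some chunks may carry no text, so skip them.
        if chunk.text:
            yield chunk.text


# Usage, assuming the google-genai SDK is installed and a valid key is set:
#   from google import genai
#   client = genai.Client(api_key="YOUR_API_KEY")
#   for piece in stream_text(client, "Explain the MCP protocol in 3 sentences."):
#       print(piece, end="", flush=True)
```

Printing with `end=""` and `flush=True` keeps the pieces on one line and makes them appear as soon as they arrive.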

If you’re using it through Gemini CLI or comparing terminal coding tools, check our Claude Code vs Codex CLI vs Gemini CLI comparison.

Google AI Subscription Tiers

Gemini 3.5 Flash is included in all paid Google AI tiers:

Tier | Price | Gemini 3.5 Flash Access | Extras
Google AI Plus | $20/mo | Yes (rate-limited) | Basic access to Gemini App
Google AI Pro | $20/mo | Yes (higher limits) | API access, AI Studio
Google AI Ultra | $100/mo or $200/mo | Yes (highest limits) | Priority access, 3.5 Pro early access

The free tier still gives access to Gemini 3 Flash. Upgrading to Plus or Pro unlocks 3.5 Flash in the Gemini App and API respectively.

Thinking Mode

Gemini 3.5 Flash supports a “thinking” mode similar to what was introduced in Gemini 2.5. When enabled, the model performs extended reasoning before responding — useful for complex coding, math, and multi-step planning tasks.

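# Reuses the client from the Quick API Example above.
response = client.models.generate_content(
    model="gemini-3.5-flash",
    contents="Find the bug in this code and explain the fix step by step.",
    config={
        "thinking_config": {
            "thinking_budget": 8192,   # max tokens for internal reasoning
            "include_thoughts": True,  # without this, no thought parts are returned
        }
    },
)

# Thought parts are flagged with part.thought; everything else is the answer.
for part in response.candidates[0].content.parts:
    if not part.text:
        continue  # skip non-text parts
    if part.thought:
        print("Thinking:", part.text)
    else:
        print("Response:", part.text)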

Thinking mode increases latency but improves accuracy on reasoning-heavy tasks. The thinking_budget parameter caps how many tokens the model can spend on internal reasoning. Earlier Gemini models bill thinking tokens as output tokens, so check the pricing page before assuming they are free here. For agentic workflows where speed matters more than deep reasoning, leave thinking mode off.

What This Means for Developers

Agent builders: The 83.6% MCP Atlas score is the headline. If you’re building agents that use MCP tools, Gemini 3.5 Flash now posts the top tool-orchestration score in this comparison, and it’s cheap. At $1.50/$9.00 with 289 tok/s, you can run complex agent loops without burning through budget.

Coding assistants: 76.2% on Terminal-bench puts it in striking distance of GPT-5.5 (78.2%). For IDE integrations and code generation, it’s a viable primary model at a fraction of the cost of Opus or GPT-5.5.

Multimodal apps: Best-in-class on CharXiv and MMMU-Pro means it handles charts, diagrams, and document understanding better than any other model at this price point.

Cost optimization: If you were using Gemini 3.1 Pro, switching to 3.5 Flash saves 40% on input and 40% on output while getting better agentic and coding performance. The only tradeoff is long-context retrieval (77.3% vs 84.9%).
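To check the savings claim against your own traffic, here is a small sketch that prices one workload across the models in the pricing table. The prices are the ones listed in this article; the workload size and helper names are hypothetical.

```python
# (input, output) prices in USD per 1M tokens, copied from the pricing table.
PRICES = {
    "Gemini 3.5 Flash": (1.50, 9.00),
    "Gemini 3 Flash": (0.50, 3.00),
    "Gemini 3.1 Pro": (2.50, 15.00),
    "Claude Sonnet 4.6": (3.00, 15.00),
    "Claude Opus 4.7": (15.00, 75.00),
    "GPT-5.5": (5.00, 15.00),
    "DeepSeek V4 Pro": (2.19, 8.76),
}


def monthly_cost(model: str, input_millions: float, output_millions: float) -> float:
    """USD cost for a workload given in millions of input and output tokens."""
    input_price, output_price = PRICES[model]
    return input_price * input_millions + output_price * output_millions


def savings_vs(baseline: str, model: str, input_millions: float, output_millions: float) -> float:
    """Fraction saved by moving the same workload from `baseline` to `model`."""
    base = monthly_cost(baseline, input_millions, output_millions)
    return 1 - monthly_cost(model, input_millions, output_millions) / base


# Hypothetical workload: 100M input tokens and 20M output tokens per month.
for name in sorted(PRICES, key=lambda m: monthly_cost(m, 100, 20)):
    print(f"{name:<18} ${monthly_cost(name, 100, 20):>9,.2f}")

saving = savings_vs("Gemini 3.1 Pro", "Gemini 3.5 Flash", 100, 20)
print(f"3.1 Pro -> 3.5 Flash saving: {saving:.0%}")
```

Because 3.5 Flash is 40% cheaper than 3.1 Pro on both input and output, the saving stays at 40% whatever the input/output mix; against the other models it depends on the mix.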

Track how Gemini models compare over time on our Gemini race page.

Limitations

Be aware of these constraints before building on Gemini 3.5 Flash:

  • No Computer Use: Unlike Claude Opus 4.7, there’s no browser/desktop control capability. If you need UI automation, look elsewhere.
  • Text-only output: Input is multimodal (text, image, video, audio, PDF), but output is text only. No image generation, no audio synthesis, no video output.
  • Knowledge cutoff January 2025: The model doesn’t know about events after January 2025 unless you provide context or enable Search as a tool.
  • Long context retrieval: At 77.3% on MRCR v2 (128k), it’s good but not best-in-class. Gemini 3.1 Pro (84.9%) is still better for needle-in-haystack retrieval over very long documents.
  • SWE-Bench gap: At 55.1%, it trails Claude Opus 4.7 (64.3%) and GPT-5.5 (58.6%) on real-world software engineering tasks. For complex multi-file refactors, Opus remains stronger.
  • Gemini 3.5 Pro coming June: If you need absolute frontier performance, the Pro variant is expected next month. This is the Flash (speed-optimized) tier.

FAQ

Is Gemini 3.5 Flash free?

Not directly via API. The free tier still uses Gemini 3 Flash. However, Gemini 3.5 Flash is included in Google AI Plus ($20/mo) and Pro ($20/mo) subscriptions. API usage is pay-per-token at $1.50/M input and $9.00/M output. Check our best free AI APIs guide for free alternatives.

Gemini 3.5 Flash vs GPT-5.5 — which is better?

Depends on the task. Gemini 3.5 Flash wins on agentic benchmarks (MCP Atlas: 83.6% vs 75.3%) and multimodal (MMMU-Pro: 83.6% vs 81.2%), and it is roughly 4x faster at 289 tok/s vs 71 tok/s. GPT-5.5 wins on coding (Terminal-bench: 78.2% vs 76.2%) and reasoning (ARC-AGI-2: 84.6% vs 72.1%). Gemini is also 3.3x cheaper on input. Full comparison: Gemini 3.5 Flash vs Claude Opus 4.7 vs GPT-5.5.

When is Gemini 3.5 Pro coming?

Google confirmed Gemini 3.5 Pro is coming in June 2026. It will be the full frontier model in the 3.5 family, expected to compete directly with Claude Opus 4.7 and GPT-5.5 on all benchmarks.

Is Gemini 3.5 Flash better than Gemini 3.1 Pro?

On most benchmarks, yes. It beats 3.1 Pro on Terminal-bench (76.2% vs 70.3%), MCP Atlas (83.6% vs 78.2%), and multimodal tasks. The exception is long-context retrieval where 3.1 Pro still leads (84.9% vs 77.3%). It’s also 40% cheaper. See Gemini 3.5 Flash vs 3.1 Pro.

Does Gemini 3.5 Flash support MCP?

Yes. It supports function calling and scores 83.6% on MCP Atlas, the highest of the models compared above. It’s optimized for MCP tool orchestration. It also supports Search as a tool and Code execution.
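Function calling is the mechanism behind most tool use: you describe a function, the model replies with a function name and arguments, and your code runs it. The sketch below shows only the local half of that loop, with no API call. The declaration follows the JSON-schema-style shape the Gemini API uses for function declarations; the `get_token_price` tool, its fields, and the `dispatch` helper are hypothetical.

```python
# Hypothetical tool declaration in the JSON-schema style used for function calling.
GET_TOKEN_PRICE = {
    "name": "get_token_price",
    "description": "Return the USD price per 1M tokens for a model and direction.",
    "parameters": {
        "type": "object",
        "properties": {
            "model": {"type": "string", "description": "Model name"},
            "direction": {"type": "string", "enum": ["input", "output"]},
        },
        "required": ["model", "direction"],
    },
}

# Prices from this article, used as the tool's toy data source.
_PRICES = {"gemini-3.5-flash": {"input": 1.50, "output": 9.00}}


def get_token_price(model: str, direction: str) -> float:
    return _PRICES[model][direction]


# Registries mapping tool names to their declarations and local callables.
DECLARATIONS = {GET_TOKEN_PRICE["name"]: GET_TOKEN_PRICE}
TOOLS = {"get_token_price": get_token_price}


def dispatch(name: str, args: dict):
    """Run the local function the model asked for, checking required arguments first."""
    if name not in TOOLS:
        raise ValueError(f"unknown tool: {name}")
    required = DECLARATIONS[name]["parameters"]["required"]
    missing = [key for key in required if key not in args]
    if missing:
        raise ValueError(f"missing arguments for {name}: {missing}")
    return TOOLS[name](**args)


# A function call as the model might return it: a name plus an arguments dict.
print(dispatch("get_token_price", {"model": "gemini-3.5-flash", "direction": "output"}))
```

Validating the name and required arguments before calling anything keeps a malformed model response from turning into an unhandled exception deep inside a tool.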

What’s the context window for Gemini 3.5 Flash?

1 million tokens input, 65,536 tokens max output. Same context length as Gemini 3.1 Pro.

Can Gemini 3.5 Flash generate images?

No. Despite accepting multimodal input (images, video, audio, PDF), Gemini 3.5 Flash only outputs text. For image generation, use Imagen 4 or other dedicated models.

How fast is Gemini 3.5 Flash?

289 output tokens per second. For comparison: Claude Opus 4.7 runs at 67 tok/s and GPT-5.5 at 71 tok/s. This makes Gemini 3.5 Flash roughly 4x faster than competing frontier models.
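To translate these throughput figures into wall-clock time, divide the output length by the tokens-per-second rate. A quick sketch, assuming constant throughput and ignoring time to first token and network latency (the `generation_seconds` helper is illustrative):

```python
# Output speeds quoted above, in tokens per second.
SPEEDS = {"Gemini 3.5 Flash": 289, "Claude Opus 4.7": 67, "GPT-5.5": 71}
MAX_OUTPUT = 65_536  # max output tokens for Gemini 3.5 Flash


def generation_seconds(output_tokens: int, tokens_per_second: float) -> float:
    """Seconds needed to emit `output_tokens` at a steady rate."""
    return output_tokens / tokens_per_second


for model, tps in SPEEDS.items():
    seconds = generation_seconds(MAX_OUTPUT, tps)
    ratio = SPEEDS["Gemini 3.5 Flash"] / tps
    print(f"{model:<17} {seconds:7.1f} s  (3.5 Flash is {ratio:.1f}x as fast)")
```

Under these assumptions a full 65,536-token response takes a little under four minutes at 289 tok/s, versus roughly 15 to 16 minutes at 67 to 71 tok/s.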