May 20, 2026 · 6 min read

Gemini 3.5 Flash API Setup Guide: Get Started in 5 Minutes

The Gemini 3.5 Flash API is one of the fastest and most cost-effective ways to add AI capabilities to your applications. With a 1M token context window, multimodal input support, and pricing starting at just $1.50 per million input tokens, it’s an excellent choice for developers building production-grade AI features.

In this guide, you’ll go from zero to making your first Gemini 3.5 Flash API call in under 5 minutes. We’ll cover the Google AI SDK, Node.js, cURL, thinking mode, streaming, and how to access the model via OpenRouter.

For a broader overview of the model’s capabilities and benchmarks, check out our Gemini 3.5 Flash complete guide.

Prerequisites

Before you start, make sure you have:

A Google account (for API key access)
Python 3.9+ or Node.js 18+ installed
A terminal or code editor ready

That’s it. The Gemini 3.5 Flash API has a free tier, so you don’t need a credit card to get started.

Step 1: Get Your Gemini 3.5 Flash API Key

There are two ways to access the Gemini 3.5 Flash API:

Google AI Studio (free tier, ideal for prototyping and personal projects)
Vertex AI (enterprise, with SLAs and higher rate limits)

For this tutorial, we’ll use Google AI Studio:

Go to Google AI Studio
Sign in with your Google account
Click “Get API Key” in the left sidebar
Click “Create API Key” and select a project
Copy your key — you’ll need it in the next step

Tip: Store your API key in an environment variable (GEMINI_API_KEY) rather than hardcoding it. This is a security best practice.
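
One way to follow the tip above is a small helper that reads the key from the environment and fails early with a readable message. This is a minimal sketch: the function name load_api_key is our own, not part of any SDK.

```python
import os


def load_api_key(var_name: str = "GEMINI_API_KEY") -> str:
    """Return the API key stored in the given environment variable.

    Raises RuntimeError when the variable is missing or blank, so a
    misconfigured shell fails before any request is sent.
    """
    key = os.environ.get(var_name, "").strip()
    if not key:
        raise RuntimeError(
            f"{var_name} is not set; export it in your shell before running."
        )
    return key
```

You can then pass load_api_key() wherever the examples in this guide expect a key string.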

The free tier gives you 15 requests per minute and 1,500 requests per day — more than enough for development and testing. If you need higher limits, check out our best free AI APIs guide for alternatives.

Step 2: Make Your First Request (Python)

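Install the Google AI SDK. The snippets in this guide use the google-generativeai package; Google’s newer google-genai package has a different client interface, so the two are not interchangeable: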

pip install google-generativeai

Now make your first Gemini 3.5 Flash API call:

import os

import google.generativeai as genai

# Read the key from the environment instead of hardcoding it
genai.configure(api_key=os.environ["GEMINI_API_KEY"])
model = genai.GenerativeModel("gemini-3.5-flash")
response = model.generate_content("Explain microservices")
print(response.text)

That’s it: a handful of lines. The model ID is gemini-3.5-flash, and response.text returns the reply as plain text by default.

The Gemini 3.5 Flash API supports text, image, video, audio, and PDF input with text output. You can pass multimodal content by including file parts in your request.
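
To make the idea of file parts concrete, here is a rough sketch that builds a request body with one text part and one inline image part, using only the standard library. The helper name build_image_request is ours, and the inline_data / mime_type / data field names reflect our understanding of the REST request format; treat them as assumptions and check them against the API reference. Large files usually go through a separate upload step instead of inline base64.

```python
import base64
import json


def build_image_request(prompt: str, image_bytes: bytes,
                        mime_type: str = "image/png") -> dict:
    """Build a request body with a text part and an inline image part."""
    # Inline data is sent as base64 text (assumed field names, see lead-in)
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {
        "contents": [
            {
                "parts": [
                    {"text": prompt},
                    {"inline_data": {"mime_type": mime_type, "data": encoded}},
                ]
            }
        ]
    }


if __name__ == "__main__":
    # Placeholder bytes; in practice read them with open(path, "rb").read()
    body = build_image_request("Describe this image", b"placeholder image bytes")
    print(json.dumps(body, indent=2))
```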

Step 3: Node.js Example

Install the SDK:

npm install @google/generative-ai

Then use it:

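import { GoogleGenerativeAI } from "@google/generative-ai";

// Read the key from the environment instead of hardcoding it
const genAI = new GoogleGenerativeAI(process.env.GEMINI_API_KEY);
const model = genAI.getGenerativeModel({ model: "gemini-3.5-flash" });
// Top-level await requires an ES module (.mjs or "type": "module")
const result = await model.generateContent("Explain REST APIs");
console.log(result.response.text());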

The Node.js SDK mirrors the Python SDK closely. Both support streaming, function calling, and structured output. For more on streaming implementations, see our guide on streaming AI responses in Node.js.

Step 4: cURL Example

If you prefer testing from the command line:

curl -X POST "https://generativelanguage.googleapis.com/v1beta/models/gemini-3.5-flash:generateContent" \
  -H "x-goog-api-key: $GEMINI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"contents":[{"parts":[{"text":"Hello"}]}]}'

This is useful for quick testing or integrating with languages that don’t have an official SDK. The REST API gives you full access to all Gemini 3.5 Flash API features.
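
If you want to call the REST endpoint from a script without installing an SDK, the same request can be sketched with Python’s standard library. It targets the same endpoint and JSON body as the cURL command and sends the key in the x-goog-api-key header. The function names are ours, and the response shape read by extract_text (candidates, content, parts, text) is an assumption you should verify against a real response.

```python
import json
import urllib.request

API_URL = (
    "https://generativelanguage.googleapis.com/v1beta/models/"
    "gemini-3.5-flash:generateContent"
)


def build_request(prompt: str, api_key: str) -> urllib.request.Request:
    """Build the POST request without sending it."""
    body = json.dumps({"contents": [{"parts": [{"text": prompt}]}]})
    return urllib.request.Request(
        API_URL,
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json", "x-goog-api-key": api_key},
        method="POST",
    )


def extract_text(payload: dict) -> str:
    """Join the text parts of the first candidate; return "" if there are none."""
    candidates = payload.get("candidates") or []
    if not candidates:
        return ""
    parts = candidates[0].get("content", {}).get("parts", [])
    return "".join(part.get("text", "") for part in parts)


def generate(prompt: str, api_key: str) -> str:
    """Send the request and return the reply text (performs a network call)."""
    with urllib.request.urlopen(build_request(prompt, api_key), timeout=30) as resp:
        return extract_text(json.load(resp))
```

Splitting the request builder from the network call keeps the payload easy to inspect and test offline.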

Using Thinking Mode

One of the standout features of Gemini 3.5 Flash is thinking mode. When enabled, the model performs chain-of-thought reasoning before generating its final answer — improving accuracy on complex tasks like coding, math, and system design.

response = model.generate_content(
    "Design a caching strategy for a high-traffic API",
    generation_config={"thinking": {"enabled": True}}
)

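Thinking mode spends additional tokens on the reasoning process, which adds cost and latency, in exchange for better answers on complex prompts. The name and shape of the thinking option differ between SDK versions, so confirm the generation_config key shown above against the reference for the version you have installed. Thinking mode is especially useful for: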

Architecture and design questions
Multi-step reasoning problems
Code generation with complex requirements
Mathematical proofs and calculations

Streaming Responses

For better UX in chat applications, use streaming to display tokens as they arrive:

response = model.generate_content(
    "Write a detailed guide to Docker networking",
    stream=True
)

for chunk in response:
    print(chunk.text, end="", flush=True)

Streaming is essential for production applications where perceived latency matters. The Gemini 3.5 Flash API supports streaming across all SDKs and the REST API. Learn more in our streaming AI responses guide.
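
In a chat application you usually want both the live output and the full text afterwards (for logging or conversation history). The helper below is a sketch of that pattern. It assumes each chunk exposes a .text attribute, as in the loop above, and that a chunk without usable text should be skipped; collect_stream and the stand-in chunks are our own names.

```python
from types import SimpleNamespace
from typing import Iterable


def collect_stream(chunks: Iterable, echo: bool = True) -> str:
    """Print chunks as they arrive and return the concatenated text."""
    pieces = []
    for chunk in chunks:
        try:
            text = chunk.text
        except (ValueError, AttributeError):
            # Assumption: chunks without readable text are skipped
            continue
        if not text:
            continue
        if echo:
            print(text, end="", flush=True)
        pieces.append(text)
    return "".join(pieces)


if __name__ == "__main__":
    # Stand-in chunks so the sketch runs without the SDK or a network call
    demo_chunks = [SimpleNamespace(text="Docker "), SimpleNamespace(text="networking")]
    full_text = collect_stream(demo_chunks)
    print(f"\n[{len(full_text)} characters received]")
```

With the SDK, you would pass the streaming response directly: collect_stream(model.generate_content(prompt, stream=True)).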

Access via OpenRouter

OpenRouter provides a unified API that lets you access Gemini 3.5 Flash alongside hundreds of other models using an OpenAI-compatible endpoint. This is ideal if you want to switch between models without changing your code.

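import os

from openai import OpenAI

client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    # OPENROUTER_API_KEY is an assumed variable name; use whatever you exported
    api_key=os.environ["OPENROUTER_API_KEY"],
)

response = client.chat.completions.create(
    model="google/gemini-3.5-flash",
    messages=[{"role": "user", "content": "Hello"}]
)
print(response.choices[0].message.content)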

The model ID on OpenRouter is google/gemini-3.5-flash. Since it uses the OpenAI SDK format, you can swap models by changing a single string. This makes it easy to compare Gemini 3.5 Flash against Claude Opus 4.7 or GPT-5.5 in your own applications.
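
Because only the model string changes, a side-by-side comparison can be a short loop. The sketch below assumes a client with the OpenAI SDK’s chat.completions.create interface; compare_models is our own helper, and the stand-in client and the second model ID are hypothetical placeholders so the example runs offline.

```python
from types import SimpleNamespace


def compare_models(client, model_ids, prompt):
    """Send one prompt to each model ID and return {model_id: reply_text}."""
    results = {}
    for model_id in model_ids:
        response = client.chat.completions.create(
            model=model_id,
            messages=[{"role": "user", "content": prompt}],
        )
        results[model_id] = response.choices[0].message.content
    return results


class FakeCompletions:
    """Offline stand-in that mimics the shape of a chat completion response."""

    def create(self, model, messages):
        reply = f"{model} received: {messages[0]['content']}"
        message = SimpleNamespace(content=reply)
        return SimpleNamespace(choices=[SimpleNamespace(message=message)])


fake_client = SimpleNamespace(chat=SimpleNamespace(completions=FakeCompletions()))

if __name__ == "__main__":
    # "example/other-model" is a placeholder, not a real OpenRouter model ID
    replies = compare_models(
        fake_client, ["google/gemini-3.5-flash", "example/other-model"], "Hello"
    )
    for model_id, text in replies.items():
        print(f"{model_id}: {text}")
```

Swap fake_client for the real OpenAI client configured with the OpenRouter base_url to run the comparison against live models.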

Structured Output

The Gemini 3.5 Flash API supports structured output via JSON schema, making it easy to get predictable, parseable responses:

import google.generativeai as genai

model = genai.GenerativeModel("gemini-3.5-flash")

response = model.generate_content(
    "List 3 programming languages with their use cases",
    generation_config={
        "response_mime_type": "application/json",
        "response_schema": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "language": {"type": "string"},
                    "use_case": {"type": "string"}
                }
            }
        }
    }
)

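Structured output removes the need for fragile regex parsing, because the response is constrained to JSON that follows your schema. Parse it with json.loads(response.text), and still validate the values before trusting them. For a deeper dive, read our structured outputs explained article.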
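
Even with a schema in place, checking the parsed result before using it is cheap. The sketch below parses a response string and verifies the two fields from the schema above; parse_languages is our own helper, and the sample string stands in for response.text.

```python
import json


def parse_languages(raw_json: str) -> list:
    """Parse the JSON reply and check each item has the expected string fields."""
    data = json.loads(raw_json)
    if not isinstance(data, list):
        raise ValueError("expected a JSON array")
    for index, item in enumerate(data):
        if not isinstance(item, dict):
            raise ValueError(f"item {index} is not an object")
        for field in ("language", "use_case"):
            if not isinstance(item.get(field), str):
                raise ValueError(f"item {index} is missing string field {field!r}")
    return data


if __name__ == "__main__":
    # Stand-in for response.text from the structured output call
    sample = '[{"language": "Python", "use_case": "data analysis"}]'
    for entry in parse_languages(sample):
        print(f"{entry['language']}: {entry['use_case']}")
```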

Tool Use and Function Calling

Gemini 3.5 Flash supports several tool-use capabilities:

Function calling — let the model invoke your defined functions
Search as tool — ground responses with real-time web data
Code execution — run Python code in a sandboxed environment

These features make the Gemini 3.5 Flash API suitable for building agents and complex workflows. You can combine them with the Antigravity CLI for local development workflows.
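
On your side of function calling, the model’s request has to be routed to real code. A common pattern is a registry that maps tool names to Python functions. The sketch below shows only that local dispatch step; it assumes the function call arrives as a dict with "name" and "args" keys, and get_weather is a hypothetical tool, not part of any API.

```python
from typing import Any, Callable, Dict

# Registry of local functions the model is allowed to invoke
TOOLS: Dict[str, Callable[..., Any]] = {}


def tool(func: Callable[..., Any]) -> Callable[..., Any]:
    """Decorator that registers a function under its own name."""
    TOOLS[func.__name__] = func
    return func


@tool
def get_weather(city: str) -> dict:
    """Hypothetical tool; a real one would query a weather service."""
    return {"city": city, "forecast": "sunny"}


def dispatch(function_call: dict) -> Any:
    """Run the requested tool. Assumes {"name": ..., "args": {...}} as input."""
    name = function_call.get("name")
    if name not in TOOLS:
        # Refuse anything that was not explicitly registered
        raise KeyError(f"model requested unknown tool: {name!r}")
    return TOOLS[name](**function_call.get("args", {}))


if __name__ == "__main__":
    print(dispatch({"name": "get_weather", "args": {"city": "Oslo"}}))
```

Restricting dispatch to registered names means the model can only trigger code you have deliberately exposed.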

Rate Limits and Pricing

Free Tier (Google AI Studio)

Limit	Value
Requests per minute	15 RPM
Requests per day	1,500
Context window	1M tokens
Max output	65K tokens
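
To stay under the 15 requests per minute limit during development, you can throttle on the client side. The class below is a minimal sliding-window sketch; SlidingWindowLimiter is our own name, and the server’s accounting may differ from this local estimate, so treat it as a courtesy throttle, not a guarantee against rate-limit errors.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Allow at most max_calls within any window_seconds span."""

    def __init__(self, max_calls: int = 15, window_seconds: float = 60.0,
                 clock=time.monotonic):
        self.max_calls = max_calls
        self.window_seconds = window_seconds
        self._clock = clock
        self._calls = deque()  # timestamps of recent calls, oldest first

    def wait_time(self) -> float:
        """Seconds until the next call is allowed (0.0 if allowed now)."""
        now = self._clock()
        # Drop timestamps that have aged out of the window
        while self._calls and now - self._calls[0] >= self.window_seconds:
            self._calls.popleft()
        if len(self._calls) < self.max_calls:
            return 0.0
        return self.window_seconds - (now - self._calls[0])

    def acquire(self, sleep=time.sleep) -> None:
        """Block until a call is allowed, then record it."""
        while True:
            delay = self.wait_time()
            if delay <= 0:
                break
            sleep(delay)
        self._calls.append(self._clock())
```

Call limiter.acquire() before each request. A second instance with max_calls=1500 and window_seconds=86400 could cover the daily limit in the same way.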

Paid Pricing

Type	Cost per 1M tokens
Input tokens	$1.50
Output tokens	$9.00
Cached input	$0.15

Cached input costs $0.15 per million tokens, one tenth of the standard $1.50 input rate, which suits applications that reuse system prompts or context. For a full comparison with other models, see our AI API pricing comparison for 2026.
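
To see what these rates mean for a workload, here is a small cost estimator built from the table above. It is a sketch under one assumption: input_tokens counts only uncached input, and cached_tokens are billed separately at the cached rate. The function name estimate_cost is ours.

```python
# Prices in USD per 1M tokens, copied from the pricing table
PRICE_PER_MILLION = {"input": 1.50, "output": 9.00, "cached_input": 0.15}


def estimate_cost(input_tokens: int, output_tokens: int,
                  cached_tokens: int = 0) -> float:
    """Estimate cost in USD; cached_tokens are billed at the cached rate."""
    total = (
        input_tokens * PRICE_PER_MILLION["input"]
        + output_tokens * PRICE_PER_MILLION["output"]
        + cached_tokens * PRICE_PER_MILLION["cached_input"]
    )
    return total / 1_000_000


if __name__ == "__main__":
    # A 10,000-token system prompt reused across 1,000 requests
    prompt_tokens = 10_000 * 1_000
    print(f"uncached: ${estimate_cost(prompt_tokens, 0):.2f}")
    print(f"cached:   ${estimate_cost(0, 0, cached_tokens=prompt_tokens):.2f}")
```

For the reused system prompt in the example, the uncached cost works out to 10M tokens at $1.50 per million, and the cached cost to the same 10M tokens at $0.15 per million.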

Want to optimize your costs further? Check out our guide on how to reduce LLM API costs.

FAQ

How do I get a Gemini 3.5 Flash API key for free?

Go to Google AI Studio, sign in with your Google account, and click “Get API Key.” The free tier includes 15 requests per minute and 1,500 per day — no credit card required.

What is the model ID for Gemini 3.5 Flash?

The model ID is gemini-3.5-flash. Use this string in all SDK calls and REST API requests. On OpenRouter, use google/gemini-3.5-flash.

Does Gemini 3.5 Flash support images and video?

Yes. The Gemini 3.5 Flash API accepts text, images, video, audio, and PDF as input. It produces text output. You can pass multimodal content using the file parts API or inline base64 data.

What’s the difference between Google AI Studio and Vertex AI?

Google AI Studio offers a free tier and is designed for prototyping. Vertex AI is Google Cloud’s enterprise platform with higher rate limits, SLAs, data residency controls, and VPC integration. Both serve the same Gemini 3.5 Flash model.

Can I use Gemini 3.5 Flash with the OpenAI SDK?

Yes. Google provides an OpenAI-compatible endpoint, and you can also access Gemini 3.5 Flash through OpenRouter using the standard OpenAI Python or Node.js SDK with base_url set to https://openrouter.ai/api/v1.

How does Gemini 3.5 Flash thinking mode work?

Thinking mode enables chain-of-thought reasoning. The model generates internal reasoning tokens before producing its final answer. Enable it by setting "thinking": {"enabled": True} in the generation config. It uses more tokens but improves accuracy on complex tasks.

Next Steps

You’re now set up with the Gemini 3.5 Flash API. Here’s where to go next:

Gemini 3.5 Flash Complete Guide — full capabilities, benchmarks, and use cases
Gemini 3.5 Flash vs Claude Opus 4.7 vs GPT-5.5 — head-to-head comparison
OpenRouter Complete Guide — access 200+ models through one API
Antigravity 2 Complete Guide — use Gemini 3.5 Flash in your local dev workflow
Best Free AI APIs in 2026 — explore more free options