The Gemini 3.5 Flash API is one of the fastest and most cost-effective ways to add AI capabilities to your applications. With a 1M token context window, multimodal input support, and pricing starting at just $1.50 per million input tokens, it’s an excellent choice for developers building production-grade AI features.
In this guide, you’ll go from zero to making your first Gemini 3.5 Flash API call in under 5 minutes. We’ll cover the Python and Node.js SDKs, cURL, thinking mode, streaming, OpenRouter access, structured output, and tool use.
For a broader overview of the model’s capabilities and benchmarks, check out our Gemini 3.5 Flash complete guide.
Prerequisites
Before you start, make sure you have:
- A Google account (for API key access)
- Python 3.9+ or Node.js 18+ installed
- A terminal or code editor ready
That’s it. The Gemini 3.5 Flash API has a free tier, so you don’t need a credit card to get started.
Step 1: Get Your Gemini 3.5 Flash API Key
There are two ways to access the Gemini 3.5 Flash API:
- Google AI Studio (free tier, ideal for prototyping and personal projects)
- Vertex AI (enterprise, with SLAs and higher rate limits)
For this tutorial, we’ll use Google AI Studio:
- Go to Google AI Studio
- Sign in with your Google account
- Click “Get API Key” in the left sidebar
- Click “Create API Key” and select a project
- Copy your key — you’ll need it in the next step
Tip: Store your API key in an environment variable (GEMINI_API_KEY) rather than hardcoding it. This is a security best practice.
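Here is a minimal sketch of reading the key from the environment, using only the standard library. The helper name `load_api_key` is our own invention, not part of any SDK.

```python
import os


def load_api_key(var_name: str = "GEMINI_API_KEY") -> str:
    """Return the API key stored in the given environment variable.

    Raises RuntimeError with a readable message when the variable is
    missing or empty, so a misconfigured shell fails fast.
    """
    key = os.environ.get(var_name, "").strip()
    if not key:
        raise RuntimeError(
            f"{var_name} is not set. Run: export {var_name}='your-key-here'"
        )
    return key
```

You could then pass `load_api_key()` wherever the examples in this guide use the `"YOUR_KEY"` placeholder.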
The free tier gives you 15 requests per minute and 1,500 requests per day — more than enough for development and testing. If you need higher limits, check out our best free AI APIs guide for alternatives.
Step 2: Make Your First Request (Python)
Install the Google AI SDK:
pip install google-generativeai
Now make your first Gemini 3.5 Flash API call:
import google.generativeai as genai
genai.configure(api_key="YOUR_KEY")
model = genai.GenerativeModel("gemini-3.5-flash")
response = model.generate_content("Explain microservices")
print(response.text)
That’s it — five lines of code. The model ID is gemini-3.5-flash. The response comes back as plain text by default.
The Gemini 3.5 Flash API supports text, image, video, audio, and PDF input with text output. You can pass multimodal content by including file parts in your request.
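To make the idea of "parts" concrete, here is a rough sketch that builds a request body with one text part and one inline file part. The `inline_data`, `mime_type`, and `data` field names reflect our understanding of the REST format, so treat them as assumptions and check the API reference. `build_multimodal_body` is a hypothetical helper.

```python
import base64
import json


def build_multimodal_body(prompt: str, data: bytes, mime_type: str) -> dict:
    """Build a request body with one text part and one inline file part."""
    # File bytes travel as base64 text inside the JSON body.
    encoded = base64.b64encode(data).decode("ascii")
    return {
        "contents": [
            {
                "parts": [
                    {"text": prompt},
                    # Assumed field names for inline file data.
                    {"inline_data": {"mime_type": mime_type, "data": encoded}},
                ]
            }
        ]
    }


# Stand-in bytes; in practice, read them with open(path, "rb").read().
fake_png = b"\x89PNG\r\n\x1a\n"
body = build_multimodal_body("Describe this image", fake_png, "image/png")
print(json.dumps(body, indent=2))
```

Inline data suits small files. For large video or audio, a separate file upload is likely the better route.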
Step 3: Node.js Example
Install the SDK:
npm install @google/generative-ai
Then use it:
import { GoogleGenerativeAI } from "@google/generative-ai";
const genAI = new GoogleGenerativeAI("YOUR_KEY");
const model = genAI.getGenerativeModel({ model: "gemini-3.5-flash" });
const result = await model.generateContent("Explain REST APIs");
console.log(result.response.text());
The Node.js SDK mirrors the Python SDK closely. Both support streaming, function calling, and structured output. For more on streaming implementations, see our guide on streaming AI responses in Node.js.
Step 4: cURL Example
If you prefer testing from the command line:
curl -X POST "https://generativelanguage.googleapis.com/v1beta/models/gemini-3.5-flash:generateContent?key=YOUR_KEY" \
-H "Content-Type: application/json" \
-d '{"contents":[{"parts":[{"text":"Hello"}]}]}'
This is useful for quick testing or integrating with languages that don’t have an official SDK. The REST API gives you full access to all Gemini 3.5 Flash API features.
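If you want the same call without an SDK in Python, a sketch with the standard library could look like this. It reuses the URL and JSON body from the cURL command. `build_request` and `send` are names we made up, and only `send` touches the network.

```python
import json
import urllib.request

API_ROOT = "https://generativelanguage.googleapis.com/v1beta/models"


def build_request(api_key: str, prompt: str,
                  model: str = "gemini-3.5-flash") -> urllib.request.Request:
    """Prepare (but do not send) the same POST request as the cURL command."""
    url = f"{API_ROOT}/{model}:generateContent?key={api_key}"
    body = {"contents": [{"parts": [{"text": prompt}]}]}
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request) -> dict:
    """Send the request and decode the JSON reply (needs network and a valid key)."""
    with urllib.request.urlopen(request, timeout=30) as resp:
        return json.load(resp)


req = build_request("YOUR_KEY", "Hello")
print(req.get_method(), req.full_url)
```

Splitting "build" from "send" keeps the request easy to inspect and test offline.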
Using Thinking Mode
One of the standout features of Gemini 3.5 Flash is thinking mode. When enabled, the model performs chain-of-thought reasoning before generating its final answer — improving accuracy on complex tasks like coding, math, and system design.
response = model.generate_content(
    "Design a caching strategy for a high-traffic API",
    generation_config={"thinking": {"enabled": True}}
)
Thinking mode uses additional tokens for the reasoning process, so expect higher cost and latency in exchange for better answers on complex prompts. It’s especially useful for:
- Architecture and design questions
- Multi-step reasoning problems
- Code generation with complex requirements
- Mathematical proofs and calculations
Streaming Responses
For better UX in chat applications, use streaming to display tokens as they arrive:
response = model.generate_content(
    "Write a detailed guide to Docker networking",
    stream=True
)
for chunk in response:
    print(chunk.text, end="", flush=True)
Streaming is essential for production applications where perceived latency matters. The Gemini 3.5 Flash API supports streaming across all SDKs and the REST API. Learn more in our streaming AI responses guide.
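In a real app you usually want to display fragments as they arrive and also keep the full text for logging or chat history. Here is a small sketch of that pattern. It runs offline against a fake stream, and `consume_stream` is a hypothetical helper, not an SDK function.

```python
from typing import Callable, Iterable


def consume_stream(chunks: Iterable[str], on_text: Callable[[str], None]) -> str:
    """Forward each text fragment to on_text as it arrives; return the full text."""
    pieces = []
    for text in chunks:
        if not text:  # skip empty fragments
            continue
        on_text(text)
        pieces.append(text)
    return "".join(pieces)


def fake_stream():
    """Stand-in for a streaming response; yields text fragments one by one."""
    yield from ["Docker ", "networking ", "", "basics"]


full_text = consume_stream(fake_stream(), lambda t: print(t, end="", flush=True))
print()
```

With the SDK loop shown earlier, you would pass `(chunk.text for chunk in response)` in place of `fake_stream()`.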
Access via OpenRouter
OpenRouter provides a unified API that lets you access Gemini 3.5 Flash alongside hundreds of other models using an OpenAI-compatible endpoint. This is ideal if you want to switch between models without changing your code.
from openai import OpenAI
client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key="your-openrouter-key"
)
response = client.chat.completions.create(
    model="google/gemini-3.5-flash",
    messages=[{"role": "user", "content": "Hello"}]
)
# The reply text lives on the first choice's message
print(response.choices[0].message.content)
The model ID on OpenRouter is google/gemini-3.5-flash. Since it uses the OpenAI SDK format, you can swap models by changing a single string. This makes it easy to compare Gemini 3.5 Flash against Claude Opus 4.7 or GPT-5.5 in your own applications.
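As a sketch of what "swap models by changing a single string" looks like in practice, the helper below sends one prompt to several model IDs. It accepts any client shaped like the OpenAI SDK client. To stay runnable offline it uses a fake client here, and `example/other-model` is a placeholder rather than a real OpenRouter model ID.

```python
from types import SimpleNamespace


def ask_each(client, model_ids, prompt):
    """Send the same prompt to each model and map model ID -> reply text."""
    replies = {}
    for model_id in model_ids:
        response = client.chat.completions.create(
            model=model_id,
            messages=[{"role": "user", "content": prompt}],
        )
        replies[model_id] = response.choices[0].message.content
    return replies


class FakeCompletions:
    """Offline stand-in that mimics the shape of client.chat.completions."""

    def create(self, model, messages):
        text = f"[{model}] echo: {messages[-1]['content']}"
        message = SimpleNamespace(content=text)
        return SimpleNamespace(choices=[SimpleNamespace(message=message)])


fake_client = SimpleNamespace(chat=SimpleNamespace(completions=FakeCompletions()))

# Placeholder second ID; look up real IDs in the OpenRouter model list.
model_ids = ["google/gemini-3.5-flash", "example/other-model"]
replies = ask_each(fake_client, model_ids, "Hello")
for model_id, text in replies.items():
    print(model_id, "->", text)
```

Swap `fake_client` for the real `client` from the snippet above to run the comparison against live models.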
Structured Output
The Gemini 3.5 Flash API supports structured output via JSON schema, making it easy to get predictable, parseable responses:
import google.generativeai as genai
model = genai.GenerativeModel("gemini-3.5-flash")
response = model.generate_content(
    "List 3 programming languages with their use cases",
    generation_config={
        "response_mime_type": "application/json",
        "response_schema": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "language": {"type": "string"},
                    "use_case": {"type": "string"}
                }
            }
        }
    }
)
Structured output eliminates the need for fragile regex parsing. The model guarantees valid JSON matching your schema. For a deeper dive, read our structured outputs explained article.
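Even with a schema in place, a light check on the client side is cheap insurance before the data reaches the rest of your app. Here is a minimal sketch for the schema above. `parse_languages` is a hypothetical helper, and the sample string stands in for `response.text`; it is not real model output.

```python
import json


def parse_languages(raw_json: str) -> list[dict]:
    """Parse the JSON text and check each item has string 'language' and 'use_case'."""
    data = json.loads(raw_json)
    if not isinstance(data, list):
        raise ValueError("expected a JSON array")
    for item in data:
        if not isinstance(item, dict):
            raise ValueError(f"expected an object, got {item!r}")
        for field in ("language", "use_case"):
            if not isinstance(item.get(field), str):
                raise ValueError(f"missing or non-string field {field!r} in {item!r}")
    return data


# Sample text standing in for response.text.
sample = '[{"language": "Python", "use_case": "Data analysis"}]'
print(parse_languages(sample))
```

The example schema does not mark any field as required, so a check like this catches items that omit one.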
Tool Use and Function Calling
Gemini 3.5 Flash supports several tool-use capabilities:
- Function calling — let the model invoke your defined functions
- Search as tool — ground responses with real-time web data
- Code execution — run Python code in a sandboxed environment
These features make the Gemini 3.5 Flash API suitable for building agents and complex workflows. You can combine them with the Antigravity CLI for local development workflows.
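On your side of the wire, function calling usually boils down to a lookup table: the model names a function and supplies arguments, and your code runs it and sends the result back. The sketch below shows only that dispatch step. The `name`/`args` call shape and the `get_weather` function are illustrative assumptions, not the exact SDK objects.

```python
def get_weather(city: str) -> dict:
    """Example local function; returns canned data instead of calling a real service."""
    return {"city": city, "forecast": "sunny"}


# Registry of functions the model is allowed to call, keyed by name.
TOOLS = {"get_weather": get_weather}


def dispatch(call: dict) -> dict:
    """Run the function named in `call` and wrap the result for the model."""
    name = call["name"]
    if name not in TOOLS:
        # Never execute names outside the registry.
        return {"name": name, "response": {"error": f"unknown function {name!r}"}}
    result = TOOLS[name](**call.get("args", {}))
    return {"name": name, "response": result}


# Hypothetical function call: a name plus keyword arguments.
print(dispatch({"name": "get_weather", "args": {"city": "Paris"}}))
```

An explicit registry means the model can only trigger functions you have deliberately exposed.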
Rate Limits and Pricing
Free Tier (Google AI Studio)
| Limit | Value |
|---|---|
| Requests per minute | 15 |
| Requests per day | 1,500 |
| Context window | 1M tokens |
| Max output | 65K tokens |
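To stay under the per-minute limit from the client side, one option is a sliding-window limiter. This is a sketch under simple assumptions: a single process, only the 15-requests-per-minute limit, and no tracking of the daily cap. `SlidingWindowLimiter` is our own class, not part of any SDK.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Allow at most max_calls calls in any window of `period` seconds."""

    def __init__(self, max_calls: int = 15, period: float = 60.0,
                 clock=time.monotonic):
        self.max_calls = max_calls
        self.period = period
        self.clock = clock
        self.calls = deque()  # timestamps of recent calls

    def wait_time(self) -> float:
        """Seconds to wait before the next call is allowed (0.0 if allowed now)."""
        now = self.clock()
        # Drop timestamps that have left the window.
        while self.calls and now - self.calls[0] >= self.period:
            self.calls.popleft()
        if len(self.calls) < self.max_calls:
            return 0.0
        return self.period - (now - self.calls[0])

    def acquire(self) -> None:
        """Block until a call is allowed, then record it."""
        delay = self.wait_time()
        if delay > 0:
            time.sleep(delay)
        self.calls.append(self.clock())


limiter = SlidingWindowLimiter()  # defaults mirror the free tier: 15 per minute
limiter.acquire()  # call this right before each API request
```

The injectable `clock` argument exists so the limiter can be tested without real waiting.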
Paid Pricing
| Type | Cost per 1M tokens |
|---|---|
| Input tokens | $1.50 |
| Output tokens | $9.00 |
| Cached input | $0.15 |
The cached input pricing at $0.15 per million tokens is exceptionally low — perfect for applications that reuse system prompts or context. For a full comparison with other models, see our AI API pricing comparison for 2026.
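To see what caching does to a bill, here is a back-of-the-envelope estimator using the three prices from the table. It assumes cached tokens are a subset of the input tokens. It ignores anything the table does not list, such as cache storage fees or how thinking tokens are billed.

```python
# Prices in USD per 1M tokens, copied from the pricing table.
PRICE_INPUT = 1.50
PRICE_OUTPUT = 9.00
PRICE_CACHED_INPUT = 0.15


def estimate_cost(input_tokens: int, output_tokens: int,
                  cached_tokens: int = 0) -> float:
    """Estimate one request's cost in USD.

    cached_tokens is the portion of input_tokens billed at the cached rate.
    """
    fresh_tokens = input_tokens - cached_tokens
    total = (
        fresh_tokens * PRICE_INPUT
        + cached_tokens * PRICE_CACHED_INPUT
        + output_tokens * PRICE_OUTPUT
    )
    return total / 1_000_000


# 100K input tokens and 2K output tokens, with and without an 80K cached prefix.
print(f"${estimate_cost(100_000, 2_000, cached_tokens=80_000):.4f}")
print(f"${estimate_cost(100_000, 2_000):.4f}")
```

In this example the cached request comes to about $0.06 versus about $0.168 uncached, which is why reusing a long system prompt pays off.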
Want to optimize your costs further? Check out our guide on how to reduce LLM API costs.
FAQ
How do I get a Gemini 3.5 Flash API key for free?
Go to Google AI Studio, sign in with your Google account, and click “Get API Key.” The free tier includes 15 requests per minute and 1,500 per day — no credit card required.
What is the model ID for Gemini 3.5 Flash?
The model ID is gemini-3.5-flash. Use this string in all SDK calls and REST API requests. On OpenRouter, use google/gemini-3.5-flash.
Does Gemini 3.5 Flash support images and video?
Yes. The Gemini 3.5 Flash API accepts text, images, video, audio, and PDF as input. It produces text output. You can pass multimodal content using the file parts API or inline base64 data.
What’s the difference between Google AI Studio and Vertex AI?
Google AI Studio offers a free tier and is designed for prototyping. Vertex AI is Google Cloud’s enterprise platform with higher rate limits, SLAs, data residency controls, and VPC integration. Both serve the same Gemini 3.5 Flash model.
Can I use Gemini 3.5 Flash with the OpenAI SDK?
Yes. Google provides an OpenAI-compatible endpoint, and you can also access Gemini 3.5 Flash through OpenRouter using the standard OpenAI Python or Node.js SDK with base_url set to https://openrouter.ai/api/v1.
How does Gemini 3.5 Flash thinking mode work?
Thinking mode enables chain-of-thought reasoning. The model generates internal reasoning tokens before producing its final answer. Enable it by setting "thinking": {"enabled": True} in the generation config. It uses more tokens but improves accuracy on complex tasks.
Next Steps
You’re now set up with the Gemini 3.5 Flash API. Here’s where to go next:
- Gemini 3.5 Flash Complete Guide — full capabilities, benchmarks, and use cases
- Gemini 3.5 Flash vs Claude Opus 4.7 vs GPT-5.5 — head-to-head comparison
- OpenRouter Complete Guide — access 200+ models through one API
- Antigravity 2 Complete Guide — use Gemini 3.5 Flash in your local dev workflow
- Best Free AI APIs in 2026 — explore more free options