DeepSeek’s V4 models now understand images, and the best part is you don’t need to learn a new SDK. The API is OpenAI-compatible, so if you’ve used the OpenAI Python library before, you already know 90% of what you need. This tutorial walks through everything from your first API call to production-ready batch processing.
If you’re looking for a broader overview of what DeepSeek Vision can do, check out our complete guide. This article is purely hands-on code.
Prerequisites
You’ll need:
- Python 3.9+
- An API key from platform.deepseek.com
- The openai Python package (version 1.0+)
Install the dependency:
```bash
pip install openai
```
Set your API key as an environment variable:
```bash
export DEEPSEEK_API_KEY="your-key-here"
```
Basic Setup
Since DeepSeek uses an OpenAI-compatible API, you configure the standard OpenAI client with a different base URL:
```python
import os

from openai import OpenAI

client = OpenAI(
    api_key=os.environ["DEEPSEEK_API_KEY"],
    base_url="https://api.deepseek.com",
)
```
That’s it. Every example below uses this same client instance. You’ve got two model choices:
- deepseek-v4-flash: Fast and cheap ($0.14 input / $0.28 output per million tokens). Great for straightforward tasks.
- deepseek-v4-pro: More capable ($1.74 input / $3.48 output per million tokens). Better for complex reasoning about images.
Example 1: Basic Image Description
Let’s start simple. Send an image URL and ask the model to describe it:
```python
response = client.chat.completions.create(
    model="deepseek-v4-flash",
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "Describe this image in detail."
                },
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "https://example.com/photo.jpg"
                    }
                }
            ]
        }
    ],
    max_tokens=500
)
print(response.choices[0].message.content)
```
Sample response:
```
The image shows a golden retriever sitting on a wooden dock overlooking a
calm lake at sunset. The dog is facing away from the camera, looking out
over the water. The sky has shades of orange and pink reflected on the
lake surface. Pine trees line the far shore.
```
The content field takes a list of parts. You can mix text and images in any order. The model sees them all as a single prompt.
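Because content is just an ordered list of parts, a few tiny helpers can save you from hand-writing the nested dictionaries. This is a sketch: text_part, image_part, and user_message are hypothetical helper names (not part of the openai SDK) that produce exactly the part shapes used in Example 1.

```python
from typing import Any, Dict


def text_part(text: str) -> Dict[str, Any]:
    """A text part for the multimodal content list."""
    return {"type": "text", "text": text}


def image_part(url: str) -> Dict[str, Any]:
    """An image part; url may be an https:// URL or a data: URL."""
    return {"type": "image_url", "image_url": {"url": url}}


def user_message(*parts: Dict[str, Any]) -> Dict[str, Any]:
    """Wrap the parts, in the order given, into a single user message."""
    return {"role": "user", "content": list(parts)}


# Text and images can be interleaved in any order.
message = user_message(
    text_part("Here is the first photo:"),
    image_part("https://example.com/photo.jpg"),
    text_part("What breed is the dog?"),
)
```

You would then pass messages=[message] to client.chat.completions.create(...).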
Example 2: Using Base64-Encoded Images
For local files, encode them as base64:
```python
import base64

def encode_image(image_path: str) -> str:
    with open(image_path, "rb") as f:
        return base64.standard_b64encode(f.read()).decode("utf-8")

image_data = encode_image("receipt.jpg")

response = client.chat.completions.create(
    model="deepseek-v4-flash",
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "Extract all text from this image."
                },
                {
                    "type": "image_url",
                    "image_url": {
                        "url": f"data:image/jpeg;base64,{image_data}"
                    }
                }
            ]
        }
    ],
    max_tokens=1000
)
print(response.choices[0].message.content)
```
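Both JPEG and PNG work fine, as long as the MIME type in the data URL matches the file (image/png for a PNG rather than image/jpeg). For most use cases, JPEG is preferred since smaller files mean faster uploads and smaller request payloads.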
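To avoid hard-coding image/jpeg, you can derive the MIME type from the file extension when building the data URL. A minimal sketch, assuming only the four extensions listed below; image_to_data_url and MIME_TYPES are hypothetical names, and files with any other extension are rejected rather than guessed.

```python
import base64
from pathlib import Path

# Extensions this sketch knows how to label; extend as needed.
MIME_TYPES = {
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".png": "image/png",
    ".webp": "image/webp",
}


def image_to_data_url(image_path: str) -> str:
    """Read a local image and return a data: URL with a matching MIME type."""
    suffix = Path(image_path).suffix.lower()
    if suffix not in MIME_TYPES:
        raise ValueError(f"Unsupported image extension: {suffix!r}")
    with open(image_path, "rb") as f:
        encoded = base64.standard_b64encode(f.read()).decode("ascii")
    return f"data:{MIME_TYPES[suffix]};base64,{encoded}"
```

The result goes straight into the request: {"type": "image_url", "image_url": {"url": image_to_data_url("diagram.png")}}.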
Example 3: OCR Text Extraction
OCR is where DeepSeek Vision really shines compared to its price. Here’s a structured extraction example:
```python
def extract_document_text(image_path: str) -> str:
    """Return the model's reply, a JSON string; parse it with json.loads()."""
    image_data = encode_image(image_path)
    response = client.chat.completions.create(
        model="deepseek-v4-flash",
        messages=[
            {
                "role": "user",
                "content": [
                    {
                        "type": "text",
                        "text": """Extract all text from this document image.
Return it as JSON with the following structure:
{
    "document_type": "invoice/receipt/form/other",
    "extracted_text": "full text content",
    "key_fields": {"field_name": "value"}
}"""
                    },
                    {
                        "type": "image_url",
                        "image_url": {
                            "url": f"data:image/jpeg;base64,{image_data}"
                        }
                    }
                ]
            }
        ],
        max_tokens=2000,
        temperature=0.1
    )
    return response.choices[0].message.content
```
Setting temperature=0.1 keeps OCR output more consistent across runs. You don’t want creative interpretation of text on an invoice.
Sample response for a receipt:
```json
{
  "document_type": "receipt",
  "extracted_text": "WHOLE FOODS MARKET\n123 Main St\nAustin, TX 78701\n\nOrganic Bananas $1.99\nAlmond Milk $4.49\nSourdough Bread $5.99\n\nSubtotal: $12.47\nTax: $0.62\nTotal: $13.09\n\nVISA ***1234\n06/15/2026 14:32",
  "key_fields": {
    "store": "Whole Foods Market",
    "total": "$13.09",
    "date": "06/15/2026",
    "payment_method": "VISA ***1234"
  }
}
```
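extract_document_text returns the reply as a string, so you still need to parse it. Models sometimes wrap JSON in a Markdown code fence even when asked for plain JSON, so a tolerant parser helps. This is a sketch; parse_json_reply is a hypothetical helper, and it still raises json.JSONDecodeError if the reply isn't valid JSON once any fence is removed.

```python
import json
import re
from typing import Any

# An optional Markdown fence: three backticks, optional "json" tag, body, three backticks.
_FENCE = re.compile(r"^`{3}(?:json)?\s*(.*?)\s*`{3}$", re.DOTALL)


def parse_json_reply(reply: str) -> Any:
    """Parse a reply that should be JSON, tolerating a surrounding code fence."""
    text = reply.strip()
    match = _FENCE.match(text)
    if match:
        text = match.group(1)
    return json.loads(text)
```

Usage: data = parse_json_reply(extract_document_text("receipt.jpg")), then read fields such as data["key_fields"]["total"].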
For a deeper dive into OCR pipelines, check out our DeepSeek Vision OCR guide.
Example 4: Multiple Images in One Request
You can send multiple images in a single message. This is useful for comparing documents or processing related images together:
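```python
from pathlib import Path

# Data-URL MIME type for each supported extension
MIME_TYPES = {
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".png": "image/png",
    ".webp": "image/webp",
}

def compare_images(image_paths: list[str], question: str) -> str:
    content = [{"type": "text", "text": question}]
    for path in image_paths:
        image_data = encode_image(path)
        # Label each image with its real type instead of assuming JPEG
        mime_type = MIME_TYPES.get(Path(path).suffix.lower(), "image/jpeg")
        content.append({
            "type": "image_url",
            "image_url": {
                "url": f"data:{mime_type};base64,{image_data}"
            }
        })
    response = client.chat.completions.create(
        model="deepseek-v4-pro",
        messages=[{"role": "user", "content": content}],
        max_tokens=1500
    )
    return response.choices[0].message.content

# Compare two versions of a design
result = compare_images(
    ["design_v1.png", "design_v2.png"],
    "What are the differences between these two UI designs?"
)
print(result)
```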
Each image uses only about 90 KV cache entries in DeepSeek’s architecture, so you can fit many images within the 1M-token context window before context length becomes the limiting factor.
Example 5: Streaming Responses
For longer outputs (like detailed image descriptions or large OCR jobs), streaming gives you results as they generate:
```python
stream = client.chat.completions.create(
    model="deepseek-v4-flash",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe everything in this image."},
                {
                    "type": "image_url",
                    "image_url": {"url": "https://example.com/complex-scene.jpg"}
                }
            ]
        }
    ],
    max_tokens=2000,
    stream=True
)

full_response = ""
for chunk in stream:
    # Skip chunks that carry no choices or no new text
    if chunk.choices and chunk.choices[0].delta.content:
        content = chunk.choices[0].delta.content
        print(content, end="", flush=True)
        full_response += content
print()  # newline at the end
```
Streaming is especially helpful in web applications where you want to show progress to users instead of making them wait for the entire response.
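In a web app it is often handier to expose the stream as a plain generator of text pieces, which most Python web frameworks can send as a streaming HTTP response. A minimal sketch; iter_text is a hypothetical helper that relies only on the chunk.choices[0].delta.content shape used in the loop above.

```python
from typing import Iterable, Iterator


def iter_text(stream: Iterable) -> Iterator[str]:
    """Yield only the non-empty text deltas from a chat completion stream."""
    for chunk in stream:
        # Some chunks can arrive with an empty choices list or a None delta.
        if not chunk.choices:
            continue
        content = chunk.choices[0].delta.content
        if content:
            yield content
```

Collecting the whole reply then becomes full_response = "".join(iter_text(stream)).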
Example 6: Batch Processing Multiple Images
Here’s a practical pattern for processing a folder of images:
```python
import json
import time
from pathlib import Path

# Assumes `client` and `encode_image()` from the earlier examples.

# Data-URL MIME type for each supported extension
MIME_TYPES = {
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".png": "image/png",
    ".webp": "image/webp",
}

def batch_process_images(
    folder: str,
    prompt: str,
    model: str = "deepseek-v4-flash",
    delay: float = 0.5
) -> list[dict]:
    results = []
    image_files = sorted(
        f for f in Path(folder).iterdir()
        if f.suffix.lower() in MIME_TYPES
    )
    print(f"Processing {len(image_files)} images...")

    for i, image_path in enumerate(image_files):
        try:
            image_data = encode_image(str(image_path))
            mime_type = MIME_TYPES[image_path.suffix.lower()]
            response = client.chat.completions.create(
                model=model,
                messages=[
                    {
                        "role": "user",
                        "content": [
                            {"type": "text", "text": prompt},
                            {
                                "type": "image_url",
                                "image_url": {
                                    "url": f"data:{mime_type};base64,{image_data}"
                                }
                            }
                        ]
                    }
                ],
                max_tokens=1500,
                temperature=0.1
            )
            results.append({
                "file": image_path.name,
                "result": response.choices[0].message.content,
                "tokens_used": response.usage.total_tokens
            })
            print(f"  [{i+1}/{len(image_files)}] {image_path.name} - OK")
        except Exception as e:
            # Record the failure and keep going with the rest of the folder
            results.append({
                "file": image_path.name,
                "error": str(e)
            })
            print(f"  [{i+1}/{len(image_files)}] {image_path.name} - ERROR: {e}")
        time.sleep(delay)  # simple client-side rate limiting

    return results

# Usage
results = batch_process_images(
    folder="./invoices",
    prompt="Extract the invoice number, date, total amount, and vendor name as JSON."
)

# Save results
with open("results.json", "w") as f:
    json.dump(results, f, indent=2)
```
The delay parameter spaces out requests to help you stay under rate limits. For production workloads, you’d want proper retry logic, which brings us to the next section.
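If a long batch job dies halfway, you can make it restartable by skipping files that already have a successful entry in results.json. A sketch, assuming results are saved in the format above, where failed entries carry an "error" key and are retried; pending_images is a hypothetical helper, not part of the tutorial's code.

```python
import json
from pathlib import Path
from typing import List

IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".webp"}


def pending_images(folder: str, results_path: str) -> List[Path]:
    """Return images in folder with no successful entry in results_path."""
    done = set()
    results_file = Path(results_path)
    if results_file.exists():
        for entry in json.loads(results_file.read_text()):
            # Entries with an "error" key failed, so they stay pending.
            if "error" not in entry:
                done.add(entry["file"])
    return sorted(
        f for f in Path(folder).iterdir()
        if f.suffix.lower() in IMAGE_EXTENSIONS and f.name not in done
    )
```

You could loop over pending_images("./invoices", "results.json") instead of listing the folder directly, then merge the new results with the old ones before saving.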
Example 7: Robust Error Handling
Production code needs to handle API errors gracefully:
```python
import time

from openai import (
    APIConnectionError,
    APIStatusError,
    APITimeoutError,
    RateLimitError,
)

def call_with_retry(
    messages: list,
    model: str = "deepseek-v4-flash",
    max_retries: int = 3,
    max_tokens: int = 1500
) -> str:
    for attempt in range(max_retries):
        try:
            response = client.chat.completions.create(
                model=model,
                messages=messages,
                max_tokens=max_tokens,
                timeout=60.0  # per-request timeout in seconds
            )
            return response.choices[0].message.content

        except RateLimitError:
            wait_time = 2 ** attempt * 5  # 5s, 10s, 20s
            print(f"Rate limited. Waiting {wait_time}s...")
            time.sleep(wait_time)

        except APITimeoutError:
            # Must come before APIConnectionError, which it subclasses
            print(f"Timeout on attempt {attempt + 1}. Retrying...")
            time.sleep(2)

        except APIConnectionError as e:
            print(f"Connection error: {e}. Retrying in 5s...")
            time.sleep(5)

        except APIStatusError as e:
            # APIStatusError (unlike the base APIError) carries status_code
            if e.status_code >= 500:
                print(f"Server error ({e.status_code}). Retrying...")
                time.sleep(3)
            else:
                raise  # other client errors shouldn't be retried

    raise RuntimeError(f"Failed after {max_retries} attempts")
```
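This handles the most common failure modes: rate limits, timeouts, connection issues, and server errors. Other client errors (4xx responses besides 429, which RateLimitError already covers) are raised immediately since retrying won’t fix them. Keep in mind that the openai client also retries some errors on its own (twice by default, configurable with the max_retries argument to OpenAI()), so one call here can make more HTTP attempts than the loop suggests.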
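The fixed waits above are fine for a single script, but when several workers get rate limited at once they tend to retry in lockstep. A common refinement is exponential backoff with "full jitter": wait a random time between zero and an exponentially growing ceiling. This sketch only computes the delay; backoff_delay and its base/cap defaults are illustrative assumptions, not DeepSeek recommendations.

```python
import random
from typing import Optional


def backoff_delay(
    attempt: int,
    base: float = 5.0,
    cap: float = 60.0,
    rng: Optional[random.Random] = None,
) -> float:
    """Full-jitter backoff: a random wait in [0, min(cap, base * 2**attempt)]."""
    rng = rng or random.Random()
    ceiling = min(cap, base * (2 ** attempt))
    return rng.uniform(0, ceiling)
```

In call_with_retry you could replace time.sleep(wait_time) with time.sleep(backoff_delay(attempt)). Passing a seeded random.Random makes the delays reproducible in tests.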
Cost Tracking
Keep track of what you’re spending:
```python
class CostTracker:
    def __init__(self, model: str = "deepseek-v4-flash"):
        self.model = model
        self.total_input_tokens = 0
        self.total_output_tokens = 0
        # Prices per million tokens (USD)
        self.prices = {
            "deepseek-v4-flash": {"input": 0.14, "output": 0.28},
            "deepseek-v4-pro": {"input": 1.74, "output": 3.48},
        }

    def add_usage(self, usage):
        """Accumulate token counts from a response's usage object."""
        self.total_input_tokens += usage.prompt_tokens
        self.total_output_tokens += usage.completion_tokens

    @property
    def total_cost(self) -> float:
        prices = self.prices[self.model]
        input_cost = (self.total_input_tokens / 1_000_000) * prices["input"]
        output_cost = (self.total_output_tokens / 1_000_000) * prices["output"]
        return input_cost + output_cost

    def report(self) -> str:
        return (
            f"Tokens: {self.total_input_tokens:,} in / "
            f"{self.total_output_tokens:,} out\n"
            f"Cost: ${self.total_cost:.4f}"
        )

# Usage
tracker = CostTracker("deepseek-v4-flash")
response = client.chat.completions.create(
    model="deepseek-v4-flash",
    messages=[...],  # your messages here
    max_tokens=1000
)
tracker.add_usage(response.usage)
print(tracker.report())
# Tokens: 1,245 in / 387 out
# Cost: $0.0003
```
At $0.14 per million input tokens with V4-Flash, the request above costs about $0.0003, so even 1,000 similar requests come to roughly $0.28.
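To budget a larger job before running it, multiply the per-request token counts by the number of requests. This sketch assumes the token counts from the example output above (1,245 in / 387 out) are typical of your workload; real counts depend on your images, prompt, and output length. estimate_cost is a hypothetical helper using the prices listed earlier in this tutorial.

```python
# Prices per million tokens (USD), as listed earlier in this tutorial.
PRICES = {
    "deepseek-v4-flash": {"input": 0.14, "output": 0.28},
    "deepseek-v4-pro": {"input": 1.74, "output": 3.48},
}


def estimate_cost(
    n_requests: int,
    input_tokens: int,
    output_tokens: int,
    model: str = "deepseek-v4-flash",
) -> float:
    """Estimated USD cost of n_requests that each use the given token counts."""
    price = PRICES[model]
    per_request = (
        input_tokens * price["input"] + output_tokens * price["output"]
    ) / 1_000_000
    return n_requests * per_request


flash = estimate_cost(1_000, 1_245, 387)
pro = estimate_cost(1_000, 1_245, 387, model="deepseek-v4-pro")
print(f"Flash: ${flash:.2f}  Pro: ${pro:.2f}")  # Flash: $0.28  Pro: $3.51
```

On these assumptions V4-Pro costs roughly 12x more for the same job, which is why the examples default to V4-Flash.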
Tips and Best Practices
Choose the right model. Use V4-Flash for straightforward tasks like basic OCR, image labeling, and simple descriptions. Switch to V4-Pro when you need the model to reason about what it sees, like comparing documents or interpreting complex diagrams.
Keep images reasonable. While the API accepts large images, resizing to 800x800 or smaller usually gives the same quality at lower cost. The model’s visual primitives compress the information anyway.
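The resize arithmetic is simple: scale the longer side down to the target and the shorter side by the same factor. The sketch below only computes target dimensions (fit_within is a hypothetical helper, and 800 is just the default suggested above); if you use Pillow, Image.thumbnail((800, 800)) performs the equivalent aspect-preserving, never-upscaling resize in place.

```python
from typing import Tuple


def fit_within(width: int, height: int, max_side: int = 800) -> Tuple[int, int]:
    """Scale (width, height) to fit in a max_side box, keeping the aspect ratio.

    Images that already fit are returned unchanged (never upscaled).
    """
    if width <= max_side and height <= max_side:
        return width, height
    scale = max_side / max(width, height)
    return max(1, round(width * scale)), max(1, round(height * scale))
```

For example, a 4032x3024 phone photo maps to 800x600, while a 640x480 image is left as-is.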
Use low temperature for extraction. When you want consistent, factual output (OCR, data extraction), set temperature to 0.1 or even 0. Save higher temperatures for creative descriptions.
Batch wisely. Sending multiple images in one request can be cheaper than separate requests because the shared instructions (your text prompt plus any system prompt) are sent and billed only once. But don’t overdo it: if one image in a batch causes an error, you lose the whole response.
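A middle ground is to send images in small fixed-size groups, so one bad image only costs you that group's response. The group size is a trade-off you would tune yourself; chunked is a hypothetical helper, and the size of 4 in the usage note is arbitrary.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def chunked(items: Sequence[T], size: int) -> Iterator[List[T]]:
    """Split items into consecutive groups of at most `size` elements."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])
```

For example: for group in chunked(paths, 4): print(compare_images(group, question)).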
For a comparison of how DeepSeek Vision stacks up against GPT-4o and Gemini on these tasks, see our detailed benchmark comparison.
FAQ
What image formats does DeepSeek Vision support?
JPEG, PNG, WebP, and GIF (first frame only). Both URLs and base64-encoded images work. For most use cases, JPEG gives you the best quality-to-size ratio.
Is there a maximum image size?
The API accepts images up to 20MB. However, images larger than 2048x2048 pixels are automatically resized. For cost efficiency, pre-resize your images to around 800x800 before sending them.
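To fail fast before uploading, you can check a file against these limits locally. This sketch hard-codes the formats and size limit exactly as stated in this FAQ, reading "20MB" conservatively as 20,000,000 bytes; treat both as assumptions and confirm them against DeepSeek's current documentation. check_image is a hypothetical helper.

```python
from pathlib import Path

# Limits as stated in this FAQ (assumed; confirm against DeepSeek's docs).
MAX_BYTES = 20_000_000  # conservative reading of "20MB"
SUPPORTED_EXTENSIONS = {".jpg", ".jpeg", ".png", ".webp", ".gif"}


def check_image(path: str) -> None:
    """Raise ValueError if the file has an unsupported type or is too large."""
    p = Path(path)
    if p.suffix.lower() not in SUPPORTED_EXTENSIONS:
        raise ValueError(f"{p.name}: unsupported format {p.suffix!r}")
    size = p.stat().st_size
    if size > MAX_BYTES:
        raise ValueError(f"{p.name}: {size:,} bytes exceeds the 20MB limit")
```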
Can I use the async OpenAI client?
Yes. The AsyncOpenAI class works identically. Just pass the same base_url and api_key, then use await client.chat.completions.create(...). This is the recommended approach for web applications and high-throughput pipelines.
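Here is one way to describe a list of image URLs concurrently while capping how many requests are in flight, using asyncio.Semaphore. This is a sketch: describe_urls is a hypothetical helper, the concurrency limit of 5 and max_tokens=300 are arbitrary defaults, and client is assumed to be an AsyncOpenAI instance configured as in the answer above.

```python
import asyncio
from typing import Any, List


async def describe_urls(
    client: Any,
    urls: List[str],
    prompt: str = "Describe this image in one sentence.",
    model: str = "deepseek-v4-flash",
    max_concurrency: int = 5,
) -> List[str]:
    """Describe several images concurrently, at most max_concurrency at a time."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def describe(url: str) -> str:
        async with semaphore:  # wait for a free slot before calling the API
            response = await client.chat.completions.create(
                model=model,
                messages=[{
                    "role": "user",
                    "content": [
                        {"type": "text", "text": prompt},
                        {"type": "image_url", "image_url": {"url": url}},
                    ],
                }],
                max_tokens=300,
            )
            return response.choices[0].message.content

    # gather() returns results in the same order as urls
    return await asyncio.gather(*(describe(u) for u in urls))
```

Create the client with AsyncOpenAI(api_key=os.environ["DEEPSEEK_API_KEY"], base_url="https://api.deepseek.com"), then run results = asyncio.run(describe_urls(client, urls)).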
How does the 1M context window work with images?
Each image uses approximately 90 KV cache entries regardless of resolution (after internal resizing). That means you could theoretically include thousands of images in a single conversation. In practice, you’re more likely limited by the base64 payload size and API timeout settings.
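The payload side is easy to estimate: base64 turns every 3 bytes (rounded up) into 4 characters, plus the short data-URL prefix. A sketch; data_url_length is a hypothetical helper and the 500 KB figure in the note below is an assumed example size.

```python
def data_url_length(num_bytes: int, mime_type: str = "image/jpeg") -> int:
    """Number of characters in a base64 data URL for a file of num_bytes bytes."""
    prefix = f"data:{mime_type};base64,"
    # Base64 encodes each started group of 3 bytes as 4 characters.
    return len(prefix) + 4 * ((num_bytes + 2) // 3)
```

For example, 100 JPEGs of 500,000 bytes each add up to about 67 million characters of base64 in one request body, roughly a third larger than the raw files.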
What’s the rate limit?
DeepSeek’s rate limits vary by plan. Free tier accounts get lower throughput. Paid accounts typically get 60 requests per minute and 1M tokens per minute. Check your dashboard at platform.deepseek.com for your specific limits.
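If your limit is expressed in requests per minute, a small throttle in front of each API call keeps a batch job under it more precisely than a fixed sleep. This sketch uses 60 RPM only as an example value; RequestSpacer is a hypothetical class, and it only spaces requests evenly, without tracking tokens-per-minute limits.

```python
import threading
import time
from typing import Callable


class RequestSpacer:
    """Space calls so they never exceed requests_per_minute.

    Each wait() blocks until at least 60 / requests_per_minute seconds
    have passed since the previously reserved slot.
    """

    def __init__(
        self,
        requests_per_minute: float,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ):
        self.interval = 60.0 / requests_per_minute
        self._clock = clock
        self._sleep = sleep
        self._next_slot = float("-inf")
        self._lock = threading.Lock()  # safe to share across threads

    def wait(self) -> None:
        with self._lock:
            now = self._clock()
            slot = max(now, self._next_slot)
            self._next_slot = slot + self.interval
        delay = slot - now
        if delay > 0:
            self._sleep(delay)
```

Create one spacer = RequestSpacer(60) per process and call spacer.wait() immediately before each client.chat.completions.create(...).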
Can I use this with LangChain or LlamaIndex?
Yes to both. Since the API is OpenAI-compatible, any framework that supports custom OpenAI endpoints works out of the box. Just set the base URL to https://api.deepseek.com in your framework’s configuration.