Mistral OCR 4 launched on June 23, 2026 with an API that’s clean, well-documented, and easy to integrate. This tutorial covers everything from your first API call to production-ready batch processing with proper error handling.
We’ll build up from basic text extraction to using bounding boxes, confidence scores, and batch endpoints. By the end, you’ll have working code for a document processing pipeline.
Prerequisites
You’ll need:
- Python 3.9+
- A Mistral API key (sign up at console.mistral.ai)
- The mistralai Python package
Install the SDK:
pip install mistralai
Authentication
Every request needs your API key. Set it as an environment variable (recommended) or pass it directly:
import os
from mistralai import Mistral
# From environment variable (recommended)
client = Mistral(api_key=os.environ["MISTRAL_API_KEY"])
# Or directly (not recommended for production)
client = Mistral(api_key="your-api-key-here")
Store your key in a .env file and use python-dotenv for local development:
# .env
MISTRAL_API_KEY=your-key-here
import os
from dotenv import load_dotenv
from mistralai import Mistral

load_dotenv()  # copies the variables defined in .env into os.environ
client = Mistral(api_key=os.environ["MISTRAL_API_KEY"])
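A missing key otherwise surfaces as a bare KeyError, which can be confusing in logs. The sketch below is one way to fail with a clearer message. The helper name load_api_key and its env parameter are illustrative and not part of the Mistral SDK.

```python
import os


def load_api_key(env=None, name="MISTRAL_API_KEY"):
    """Return the API key from the environment, or raise a descriptive error.

    `env` defaults to os.environ; passing a plain dict makes the helper
    easy to exercise in tests. (Hypothetical helper, not an SDK function.)
    """
    env = os.environ if env is None else env
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(
            f"{name} is not set; export it or add it to your .env file"
        )
    return key
```

You could then build the client with Mistral(api_key=load_api_key()) instead of indexing os.environ directly.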
Basic Document Processing
Processing a URL
The simplest case: point the API at a document URL.
import os
from mistralai import Mistral

client = Mistral(api_key=os.environ["MISTRAL_API_KEY"])

response = client.ocr.process(
    model="mistral-ocr-4",
    document={
        "type": "document_url",
        "document_url": "https://example.com/sample-invoice.pdf",
    },
)

# Print extracted text, one page at a time
for page in response.pages:
    print(f"--- Page {page.index} ---")
    for block in page.blocks:
        print(block.text)
Processing a Local File
For local files, you upload the file content as base64:
import base64
from pathlib import Path

def process_local_file(client, file_path):
    """Process a local PDF or image file."""
    file_bytes = Path(file_path).read_bytes()
    encoded = base64.b64encode(file_bytes).decode("utf-8")

    # Determine MIME type from the file extension
    suffix = Path(file_path).suffix.lower()
    mime_types = {
        ".pdf": "application/pdf",
        ".png": "image/png",
        ".jpg": "image/jpeg",
        ".jpeg": "image/jpeg",
        ".tiff": "image/tiff",
        ".webp": "image/webp",
    }
    # Unknown extensions fall back to PDF
    mime_type = mime_types.get(suffix, "application/pdf")

    response = client.ocr.process(
        model="mistral-ocr-4",
        document={
            "type": "base64",
            "base64": encoded,
            "mime_type": mime_type,
        },
    )
    return response

result = process_local_file(client, "invoice.pdf")
for page in result.pages:
    if page.blocks:  # a blank page may have no blocks
        print(page.blocks[0].text[:100])
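Falling back to application/pdf for unknown extensions can hide mistakes, such as passing a .docx file. As an alternative, here is a small sketch that rejects unsupported extensions up front. The function name detect_mime_type is illustrative, and the extension table mirrors the formats listed in this tutorial plus the common .tif alias.

```python
from pathlib import Path

# Extensions this tutorial lists as accepted, plus ".tif" as a TIFF alias.
SUPPORTED_MIME_TYPES = {
    ".pdf": "application/pdf",
    ".png": "image/png",
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".tif": "image/tiff",
    ".tiff": "image/tiff",
    ".webp": "image/webp",
}


def detect_mime_type(file_path):
    """Return the MIME type for a supported file, or raise ValueError."""
    suffix = Path(file_path).suffix.lower()
    try:
        return SUPPORTED_MIME_TYPES[suffix]
    except KeyError:
        supported = ", ".join(sorted(SUPPORTED_MIME_TYPES))
        raise ValueError(
            f"Unsupported file type {suffix!r}; expected one of: {supported}"
        ) from None
```

Raising early keeps a bad file from consuming an API call that is likely to fail anyway.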
Working with Bounding Boxes
Bounding boxes are one of OCR 4’s standout features. Each block includes coordinates showing exactly where it sits on the page.
url = "https://example.com/sample-invoice.pdf"  # any publicly reachable document

response = client.ocr.process(
    model="mistral-ocr-4",
    document={"type": "document_url", "document_url": url},
    include_bounding_boxes=True,
)

for page in response.pages:
    print(f"Page {page.index} (size: {page.width}x{page.height})")
    for block in page.blocks:
        bbox = block.bounding_box
        print(f"  Type: {block.type}")
        print(f"  Position: ({bbox.x}, {bbox.y}) to ({bbox.x + bbox.width}, {bbox.y + bbox.height})")
        print(f"  Text: {block.text[:60]}...")
        print()
Visualizing Bounding Boxes
You can draw bounding boxes on the original document for debugging:
from PIL import Image, ImageDraw
import fitz  # PyMuPDF

def visualize_boxes(pdf_path, response, output_path="annotated.png"):
    """Draw bounding boxes on the first page of a PDF."""
    doc = fitz.open(pdf_path)
    page = doc[0]
    pix = page.get_pixmap(dpi=150)
    img = Image.frombytes("RGB", [pix.width, pix.height], pix.samples)
    doc.close()
    draw = ImageDraw.Draw(img)

    # Scale factors: map the page dimensions reported by the API
    # onto the pixel dimensions of the rendered image
    scale_x = pix.width / response.pages[0].width
    scale_y = pix.height / response.pages[0].height

    colors = {
        "title": "red",
        "paragraph": "blue",
        "table": "green",
        "formula": "purple",
        "signature": "orange",
    }

    for block in response.pages[0].blocks:
        bbox = block.bounding_box
        color = colors.get(block.type, "gray")  # gray for any other block type
        x1 = bbox.x * scale_x
        y1 = bbox.y * scale_y
        x2 = (bbox.x + bbox.width) * scale_x
        y2 = (bbox.y + bbox.height) * scale_y
        draw.rectangle([x1, y1, x2, y2], outline=color, width=2)

    img.save(output_path)
    print(f"Saved annotated image to {output_path}")
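The coordinate conversion is the part most likely to go wrong, so it can help to isolate it in a pure function that is testable without a PDF. This is a sketch under the same assumption the code above makes: box coordinates use the same units as the page width and height. The name scale_bbox and the tuple layout are illustrative.

```python
def scale_bbox(bbox, page_size, image_size):
    """Convert a box from page units to pixel corners.

    bbox:       (x, y, width, height) in page units
    page_size:  (page_width, page_height) in the same units
    image_size: (image_width, image_height) in pixels
    Returns (x1, y1, x2, y2) rounded to whole pixels.
    """
    x, y, width, height = bbox
    page_w, page_h = page_size
    img_w, img_h = image_size
    if page_w <= 0 or page_h <= 0:
        raise ValueError("page dimensions must be positive")
    sx = img_w / page_w
    sy = img_h / page_h
    return (
        round(x * sx),
        round(y * sy),
        round((x + width) * sx),
        round((y + height) * sy),
    )


# A 100x50 box at (10, 20) on a 200x400 page, rendered at 400x800 pixels
print(scale_bbox((10, 20, 100, 50), (200, 400), (400, 800)))  # (20, 40, 220, 140)
```

If boxes appear shifted or squashed in the annotated image, checking this step against a known box is a quick way to find out whether the units match.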
Using Confidence Scores
Confidence scores help you decide which extractions to trust and which to flag for review:
def process_with_confidence_check(client, url, threshold=0.85):
    """Process a document and flag low-confidence blocks."""
    response = client.ocr.process(
        model="mistral-ocr-4",
        document={"type": "document_url", "document_url": url},
        include_bounding_boxes=True,
    )

    high_confidence = []
    needs_review = []

    for page in response.pages:
        if page.confidence < threshold:
            print(f"Warning: Page {page.index} has low confidence ({page.confidence:.2f})")
        for block in page.blocks:
            if block.confidence >= threshold:
                high_confidence.append(block)
            else:
                needs_review.append({
                    "page": page.index,
                    "text": block.text,
                    "confidence": block.confidence,
                    "type": block.type,
                })

    return high_confidence, needs_review

document_url = "https://example.com/sample-invoice.pdf"
trusted, flagged = process_with_confidence_check(client, document_url)
print(f"Trusted blocks: {len(trusted)}, Needs review: {len(flagged)}")
for item in flagged:
    print(f"  Page {item['page']}: [{item['type']}] {item['text'][:50]}... ({item['confidence']:.2f})")
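When tuning the threshold, a per-document summary is often more useful than a list of individual blocks. The sketch below works on plain dicts shaped like the needs_review entries above, so it runs without calling the API. The name summarize_confidence is illustrative.

```python
def summarize_confidence(blocks, threshold=0.85):
    """Summarize confidence scores for a list of block dicts.

    Each block is expected to carry a numeric "confidence" key.
    Returns the count, mean, minimum, and number of blocks below threshold.
    """
    scores = [block["confidence"] for block in blocks]
    if not scores:
        return {"count": 0, "mean": None, "min": None, "below_threshold": 0}
    return {
        "count": len(scores),
        "mean": sum(scores) / len(scores),
        "min": min(scores),
        "below_threshold": sum(1 for score in scores if score < threshold),
    }


sample = [{"confidence": 0.9}, {"confidence": 0.5}, {"confidence": 1.0}]
print(summarize_confidence(sample))
```

Running this over a sample of your own documents at a few thresholds gives a rough sense of how much manual review each setting would create.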
Batch Processing
For large document sets, the batch endpoint is half the price ($2/1K pages instead of $4). It’s asynchronous: you submit documents, then poll for results.
import time

def batch_process_documents(client, document_urls, poll_interval=30):
    """Submit a batch job and wait for results."""
    # Submit the batch
    batch = client.ocr.batch.create(
        model="mistral-ocr-4",
        documents=[
            {"type": "document_url", "document_url": url}
            for url in document_urls
        ],
    )
    print(f"Batch submitted: {batch.id}")
    print(f"Status: {batch.status}")

    # Poll until complete
    while batch.status in ("queued", "processing"):
        time.sleep(poll_interval)
        batch = client.ocr.batch.retrieve(batch.id)
        print(f"Status: {batch.status} ({batch.completed}/{batch.total} pages)")

    if batch.status == "completed":
        results = client.ocr.batch.results(batch.id)
        return results
    else:
        raise RuntimeError(f"Batch failed: {batch.error}")

# Usage
urls = [
    "https://example.com/doc1.pdf",
    "https://example.com/doc2.pdf",
    "https://example.com/doc3.pdf",
]
results = batch_process_documents(client, urls)
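The polling loop above waits indefinitely if a job never leaves the queued state. One option is a generic polling helper with a deadline, sketched below. The names poll_until, fetch, and is_done are illustrative and not part of the SDK. The sleep and clock parameters exist only so the helper can be tested without real waiting.

```python
import time


def poll_until(fetch, is_done, interval=30.0, timeout=3600.0,
               sleep=time.sleep, clock=time.monotonic):
    """Call fetch() repeatedly until is_done(state) is true.

    Raises TimeoutError once `timeout` seconds have elapsed without
    reaching a finished state. Returns the final state otherwise.
    """
    deadline = clock() + timeout
    while True:
        state = fetch()
        if is_done(state):
            return state
        if clock() >= deadline:
            raise TimeoutError(f"Gave up after {timeout} seconds")
        sleep(interval)


# Simulated job that completes on the third status check
statuses = iter(["queued", "processing", "completed"])
final = poll_until(
    fetch=lambda: next(statuses),
    is_done=lambda status: status not in ("queued", "processing"),
    sleep=lambda seconds: None,  # skip real waiting in this demo
)
print(final)  # completed
```

With the batch calls used in this tutorial, fetch could wrap client.ocr.batch.retrieve(batch.id) and is_done could inspect the returned status.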
Processing a Directory of Files
A practical pattern for processing all PDFs in a folder:
from pathlib import Path
import json

def process_directory(client, input_dir, output_dir):
    """Process all PDFs in a directory and save results as JSON."""
    input_path = Path(input_dir)
    output_path = Path(output_dir)
    output_path.mkdir(parents=True, exist_ok=True)

    pdf_files = list(input_path.glob("*.pdf"))
    print(f"Found {len(pdf_files)} PDFs to process")

    for pdf_file in pdf_files:
        print(f"Processing: {pdf_file.name}")
        try:
            # process_local_file is defined in "Processing a Local File" above
            response = process_local_file(client, str(pdf_file))

            # Save structured output
            output_file = output_path / f"{pdf_file.stem}.json"
            output_data = {
                "source": pdf_file.name,
                "pages": [
                    {
                        "index": page.index,
                        "confidence": page.confidence,
                        "blocks": [
                            {
                                "type": block.type,
                                "text": block.text,
                                "confidence": block.confidence,
                            }
                            for block in page.blocks
                        ],
                    }
                    for page in response.pages
                ],
            }
            output_file.write_text(json.dumps(output_data, indent=2))
        except Exception as e:
            # Log and continue so one bad file doesn't stop the run
            print(f"  Error: {e}")

process_directory(client, "./documents/", "./extracted/")
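If a run is interrupted, reprocessing every file costs both time and money. A simple way to resume is to skip PDFs that already have a JSON output, as in the sketch below. The helper name pending_pdfs is illustrative, and it assumes the stem-based naming used by process_directory above.

```python
from pathlib import Path


def pending_pdfs(input_dir, output_dir):
    """Return PDFs in input_dir that have no matching .json in output_dir.

    Files are returned in sorted order so repeated runs are deterministic.
    """
    output_path = Path(output_dir)
    return [
        pdf
        for pdf in sorted(Path(input_dir).glob("*.pdf"))
        if not (output_path / f"{pdf.stem}.json").exists()
    ]
```

Looping over pending_pdfs("./documents/", "./extracted/") instead of the raw glob would make a second run pick up only what the first one missed.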
Handling Multi-Page PDFs
OCR 4 handles multi-page PDFs natively. You don’t need to split them:
def extract_tables_from_pdf(client, pdf_url):
    """Extract only table blocks from a multi-page PDF."""
    response = client.ocr.process(
        model="mistral-ocr-4",
        document={"type": "document_url", "document_url": pdf_url},
        include_bounding_boxes=True,
    )

    tables = []
    for page in response.pages:
        for block in page.blocks:
            if block.type == "table":
                tables.append({
                    "page": page.index,
                    "content": block.text,
                    "position": {
                        "x": block.bounding_box.x,
                        "y": block.bounding_box.y,
                        "width": block.bounding_box.width,
                        "height": block.bounding_box.height,
                    },
                })
    return tables

tables = extract_tables_from_pdf(client, "https://example.com/report.pdf")
print(f"Found {len(tables)} tables across all pages")
Error Handling
Production code needs proper error handling:
# Exception names as used in this tutorial; check your installed SDK
# version for the exact import path.
from mistralai.exceptions import MistralAPIError, MistralConnectionError
import time

def process_with_retry(client, document, max_retries=3, backoff=2):
    """Process a document with exponential backoff retry."""
    for attempt in range(max_retries):
        try:
            return client.ocr.process(
                model="mistral-ocr-4",
                document=document,
                include_bounding_boxes=True,
            )
        except MistralAPIError as e:
            if e.status_code == 429:  # Rate limited
                reason = "Rate limited"
            elif e.status_code >= 500:  # Server error
                reason = f"Server error ({e.status_code})"
            else:
                raise  # Client error, don't retry
        except MistralConnectionError:
            reason = "Connection error"

        # Sleep only if another attempt remains
        if attempt < max_retries - 1:
            wait = backoff ** attempt  # 1s, 2s, 4s, ...
            print(f"{reason}. Retrying in {wait}s...")
            time.sleep(wait)

    raise RuntimeError(f"Failed after {max_retries} attempts")
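Plain exponential backoff has two weaknesses under load: delays grow without bound, and many clients retry at the same moment. A common refinement is to cap the delay and add random jitter. The sketch below only computes the delay schedule. The name backoff_delays and its parameters are illustrative.

```python
import random


def backoff_delays(max_retries=3, base=2.0, cap=60.0, jitter=0.0,
                   rng=random.random):
    """Return the wait time in seconds before each retry.

    Delay for attempt n is min(cap, base ** n), plus up to `jitter`
    times that delay of random extra wait. `rng` must return a float
    in [0, 1); it is injectable so the schedule can be tested.
    """
    delays = []
    for attempt in range(max_retries):
        delay = min(cap, base ** attempt)
        delay += delay * jitter * rng()
        delays.append(delay)
    return delays


print(backoff_delays(4))            # [1.0, 2.0, 4.0, 8.0]
print(backoff_delays(4, cap=5.0))   # [1.0, 2.0, 4.0, 5.0]
```

To use it with the retry function above, you would replace backoff ** attempt with the matching entry from this schedule, for example with jitter=0.25.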
Building a Complete Pipeline
Here’s a complete example that ties everything together: processing invoices, extracting structured data, and handling errors:
import os
import json
from pathlib import Path
from mistralai import Mistral

def create_invoice_pipeline():
    """Complete invoice processing pipeline."""
    client = Mistral(api_key=os.environ["MISTRAL_API_KEY"])

    def process_invoice(file_path):
        # process_local_file is defined in "Processing a Local File" above
        response = process_local_file(client, file_path)
        invoice_data = {
            "file": Path(file_path).name,
            "pages": len(response.pages),
            "text_blocks": [],
            "tables": [],
            "low_confidence": [],
        }
        for page in response.pages:
            for block in page.blocks:
                if block.type == "table":
                    invoice_data["tables"].append(block.text)
                else:
                    invoice_data["text_blocks"].append(block.text)
                if block.confidence < 0.8:
                    invoice_data["low_confidence"].append({
                        "page": page.index,
                        "text": block.text[:100],
                        "confidence": block.confidence,
                    })
        return invoice_data

    return process_invoice

# Usage
pipeline = create_invoice_pipeline()
result = pipeline("./invoices/invoice-001.pdf")
print(json.dumps(result, indent=2))
Tips for Production
- Use environment variables for API keys. Never hardcode them.
- Implement retry logic for rate limits (429) and server errors (5xx).
- Use batch processing when latency isn’t critical. It’s half the price.
- Set confidence thresholds appropriate for your use case. Medical/legal documents need higher thresholds.
- Monitor API costs by tracking pages processed per batch.
- Cache results for documents you’ve already processed.
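The caching tip can be sketched with a content hash as the cache key, so a renamed copy of the same file still hits the cache. The names file_digest and cached_result are illustrative. The compute argument stands in for whatever function calls the API and returns JSON-serializable data.

```python
import hashlib
import json
from pathlib import Path


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def cached_result(path, cache_dir, compute):
    """Return compute(path), reusing a JSON cache keyed by file content."""
    cache_path = Path(cache_dir)
    cache_path.mkdir(parents=True, exist_ok=True)
    cache_file = cache_path / f"{file_digest(path)}.json"
    if cache_file.exists():
        return json.loads(cache_file.read_text(encoding="utf-8"))
    result = compute(path)
    cache_file.write_text(json.dumps(result), encoding="utf-8")
    return result
```

In the invoice pipeline above, cached_result(file_path, "./cache", pipeline) would call the API only the first time a given file's contents are seen.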
If you’re comparing this workflow to DeepSeek’s approach, see our DeepSeek Vision Python tutorial for a side-by-side look at how different multimodal APIs handle document extraction. For the self-hosting alternative, check out our guide on running OCR locally.
FAQ
What Python version does the Mistral SDK require?
Python 3.9 or higher. The SDK uses modern Python features like type hints and dataclasses extensively.
How fast is the Mistral OCR 4 API?
Typical response time is 1-3 seconds for a single page, depending on document complexity. Multi-page PDFs take roughly 1-2 seconds per page. Batch processing is slower (results in minutes to hours) but cheaper.
Is there a rate limit?
Yes. Free tier accounts have lower rate limits. Paid accounts typically get 100+ requests per minute. If you hit rate limits, use the batch endpoint or implement exponential backoff.
Can I process images (not just PDFs)?
Yes. Mistral OCR 4 accepts JPEG, PNG, TIFF, and WebP images in addition to PDFs. Set the appropriate MIME type when uploading base64-encoded files.
How do I handle documents larger than 100 pages?
The API handles multi-page PDFs natively with no documented page limit. For very large documents (hundreds of pages), the batch endpoint is recommended to avoid timeout issues on the standard endpoint.
What’s the difference between the standard and batch endpoint?
Standard: synchronous, results in seconds, $4/1K pages. Batch: asynchronous, results in minutes to hours, $2/1K pages. Use standard for real-time applications, batch for bulk processing where latency isn’t critical.
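To see what the price difference means for a given workload, here is a short cost estimate using the per-page rates quoted in this article ($4 and $2 per 1,000 pages). The function name and constants are illustrative, so check current pricing before budgeting.

```python
# Rates quoted in this article, in USD per 1,000 pages
STANDARD_USD_PER_1K_PAGES = 4.0
BATCH_USD_PER_1K_PAGES = 2.0


def estimate_cost(pages, batch=False):
    """Estimate the cost in USD of processing `pages` pages."""
    if pages < 0:
        raise ValueError("pages must be non-negative")
    rate = BATCH_USD_PER_1K_PAGES if batch else STANDARD_USD_PER_1K_PAGES
    return pages / 1000 * rate


pages = 50_000
print(f"Standard: ${estimate_cost(pages):.2f}")             # Standard: $200.00
print(f"Batch:    ${estimate_cost(pages, batch=True):.2f}") # Batch:    $100.00
```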