How to Use Mistral OCR 4 API (Python Tutorial)


Mistral OCR 4 launched on June 23, 2026, with an API that’s clean, well-documented, and easy to integrate. This tutorial covers everything from your first API call to production-ready batch processing with proper error handling.

We’ll build up from basic text extraction to using bounding boxes, confidence scores, and batch endpoints. By the end, you’ll have working code for a document processing pipeline.

Prerequisites

You’ll need:

  • Python 3.9+
  • A Mistral API key (sign up at console.mistral.ai)
  • The mistralai Python package

Install the SDK:

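pip install mistralai

Later examples also use python-dotenv (for .env files) plus Pillow and PyMuPDF (for drawing bounding boxes):

pip install python-dotenv pillow pymupdf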

Authentication

Every request needs your API key. Set it as an environment variable (recommended) or pass it directly:

import os
from mistralai import Mistral

# From environment variable (recommended)
client = Mistral(api_key=os.environ["MISTRAL_API_KEY"])

# Or directly (not recommended for production)
client = Mistral(api_key="your-api-key-here")

Store your key in a .env file and use python-dotenv for local development:

# .env
MISTRAL_API_KEY=your-key-here

Then load it at startup:

import os
from dotenv import load_dotenv
from mistralai import Mistral

load_dotenv()  # reads .env into os.environ

client = Mistral(api_key=os.environ["MISTRAL_API_KEY"])
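If MISTRAL_API_KEY isn’t set, os.environ[...] fails with a bare KeyError. A small helper can fail fast with a clearer message instead. This is a minimal sketch; require_api_key is a hypothetical name, not part of the SDK.

```python
import os


def require_api_key(name: str = "MISTRAL_API_KEY") -> str:
    """Return the API key from the environment, or fail with a clear message."""
    key = os.environ.get(name, "").strip()
    if not key:
        raise RuntimeError(
            f"{name} is not set. Add it to your .env file or export it in your shell."
        )
    return key
```

You would then create the client with client = Mistral(api_key=require_api_key()).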

Basic Document Processing

Processing a URL

The simplest case: point the API at a document URL.

import os
from mistralai import Mistral

client = Mistral(api_key=os.environ["MISTRAL_API_KEY"])

response = client.ocr.process(
    model="mistral-ocr-4",
    document={
        "type": "document_url",
        "document_url": "https://example.com/sample-invoice.pdf"
    }
)

# Print extracted text
for page in response.pages:
    print(f"--- Page {page.index} ---")
    for block in page.blocks:
        print(block.text)
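Often you want one string per page rather than individual blocks. The helper below joins each page’s block text with blank lines. It only relies on the pages, index, blocks, and text attributes used in the loop above; page_texts is a hypothetical name, and the usage example runs against a stand-in object rather than a real API response.

```python
from types import SimpleNamespace


def page_texts(response) -> dict[int, str]:
    """Map each page index to its blocks' text, joined with blank lines."""
    return {
        page.index: "\n\n".join(block.text for block in page.blocks if block.text)
        for page in response.pages
    }


# Stand-in object shaped like the response in the loop above
fake = SimpleNamespace(pages=[
    SimpleNamespace(index=0, blocks=[SimpleNamespace(text="INVOICE"),
                                     SimpleNamespace(text="Total: 42.00")]),
])
print(page_texts(fake))
```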

Processing a Local File

For local files, you upload the file content as base64:

import base64
from pathlib import Path

def process_local_file(client, file_path):
    """Process a local PDF or image file."""
    file_bytes = Path(file_path).read_bytes()
    encoded = base64.b64encode(file_bytes).decode("utf-8")

    # Determine MIME type
    suffix = Path(file_path).suffix.lower()
    mime_types = {
        ".pdf": "application/pdf",
        ".png": "image/png",
        ".jpg": "image/jpeg",
        ".jpeg": "image/jpeg",
        ".tiff": "image/tiff",
        ".webp": "image/webp",
    }
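    # Fail loudly instead of mislabeling an unsupported file as a PDF
    if suffix not in mime_types:
        raise ValueError(f"Unsupported file type: {suffix}")
    mime_type = mime_types[suffix]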

    response = client.ocr.process(
        model="mistral-ocr-4",
        document={
            "type": "base64",
            "base64": encoded,
            "mime_type": mime_type,
        }
    )
    return response

result = process_local_file(client, "invoice.pdf")
for page in result.pages:
    if page.blocks:  # skip pages where no blocks were detected
        print(page.blocks[0].text[:100])
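Base64 adds overhead: every 3 bytes of input become 4 characters of output, so an n-byte file encodes to 4 × ⌈n/3⌉ characters, about a third larger than the original. If you suspect you’re near a request-size limit (this tutorial doesn’t state one, so check the current limits for your account), you can estimate the encoded size before uploading. encoded_size is a hypothetical helper.

```python
import base64


def encoded_size(num_bytes: int) -> int:
    """Length of the standard (padded) base64 encoding of num_bytes bytes."""
    return 4 * ((num_bytes + 2) // 3)


# A 3 MB file becomes a 4 MB base64 string
print(encoded_size(3_000_000))  # 4000000

# Cross-check against the real encoder
assert encoded_size(10) == len(base64.b64encode(b"x" * 10))
```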

Working with Bounding Boxes

Bounding boxes are one of OCR 4’s standout features. Each block includes coordinates showing exactly where it sits on the page.

url = "https://example.com/sample-invoice.pdf"

response = client.ocr.process(
    model="mistral-ocr-4",
    document={"type": "document_url", "document_url": url},
    include_bounding_boxes=True
)

for page in response.pages:
    print(f"Page {page.index} (size: {page.width}x{page.height})")
    for block in page.blocks:
        bbox = block.bounding_box
        print(f"  Type: {block.type}")
        print(f"  Position: ({bbox.x}, {bbox.y}) to ({bbox.x + bbox.width}, {bbox.y + bbox.height})")
        print(f"  Text: {block.text[:60]}...")
        print()
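This tutorial doesn’t say what order blocks come back in. If you need a simple top-to-bottom, left-to-right order for single-column pages, one rough heuristic is to sort by the box’s top edge, treating tops within a small band as the same line, and then by the left edge. This is a sketch only: reading_order, the box helper, and the default line_tolerance of 5 units are assumptions, and it won’t handle multi-column layouts. The blocks in the example are stand-ins, not real API output.

```python
from types import SimpleNamespace


def reading_order(blocks, line_tolerance=5.0):
    """Sort blocks top-to-bottom, then left-to-right (single-column heuristic).

    Top edges are bucketed into bands of line_tolerance units, so blocks on
    roughly the same line are ordered by x rather than by tiny y differences.
    """
    return sorted(
        blocks,
        key=lambda b: (int(b.bounding_box.y // line_tolerance), b.bounding_box.x),
    )


def box(x, y, w, h):
    return SimpleNamespace(x=x, y=y, width=w, height=h)


blocks = [
    SimpleNamespace(text="right", bounding_box=box(300, 101, 50, 10)),
    SimpleNamespace(text="footer", bounding_box=box(10, 700, 200, 10)),
    SimpleNamespace(text="left", bounding_box=box(10, 102, 50, 10)),
]
print([b.text for b in reading_order(blocks)])  # ['left', 'right', 'footer']
```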

Visualizing Bounding Boxes

You can draw bounding boxes on the original document for debugging:

from PIL import Image, ImageDraw
import fitz  # PyMuPDF

def visualize_boxes(pdf_path, response, output_path="annotated.png"):
    """Draw bounding boxes on the first page of a PDF."""
    doc = fitz.open(pdf_path)
    page = doc[0]
    pix = page.get_pixmap(dpi=150)
    img = Image.frombytes("RGB", [pix.width, pix.height], pix.samples)
    doc.close()  # the rendered image no longer needs the open PDF

    draw = ImageDraw.Draw(img)

    # Scale factors: assumes box coordinates use the same units as
    # page.width/page.height, so rescale them to the rendered image size
    scale_x = pix.width / response.pages[0].width
    scale_y = pix.height / response.pages[0].height

    colors = {
        "title": "red",
        "paragraph": "blue",
        "table": "green",
        "formula": "purple",
        "signature": "orange",
    }

    for block in response.pages[0].blocks:
        bbox = block.bounding_box
        color = colors.get(block.type, "gray")
        x1 = bbox.x * scale_x
        y1 = bbox.y * scale_y
        x2 = (bbox.x + bbox.width) * scale_x
        y2 = (bbox.y + bbox.height) * scale_y
        draw.rectangle([x1, y1, x2, y2], outline=color, width=2)

    img.save(output_path)
    print(f"Saved annotated image to {output_path}")
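The scaling step is easier to check as a pure function. This sketch converts an (x, y, width, height) box into pixel corners on the rendered image, assuming (as the scale factors above imply) that box coordinates share units with page.width and page.height. box_to_pixels is a hypothetical name, and the numbers in the example are illustrative: a 612×792 page rendered to a 1275×1650 image, which is what a US Letter page in PDF points gives at 150 dpi.

```python
def box_to_pixels(x, y, width, height, page_size, image_size):
    """Scale an (x, y, width, height) box from page units to image pixel corners.

    Assumes the box uses the same units as page_size.
    Returns (x1, y1, x2, y2), the form PIL's ImageDraw.rectangle accepts.
    """
    page_w, page_h = page_size
    img_w, img_h = image_size
    return (
        x * img_w / page_w,
        y * img_h / page_h,
        (x + width) * img_w / page_w,
        (y + height) * img_h / page_h,
    )


print(box_to_pixels(72, 72, 144, 36, (612, 792), (1275, 1650)))
# (150.0, 150.0, 450.0, 225.0)
```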

Using Confidence Scores

Confidence scores help you decide which extractions to trust and which to flag for review:

def process_with_confidence_check(client, url, threshold=0.85):
    """Process a document and flag low-confidence blocks."""
    response = client.ocr.process(
        model="mistral-ocr-4",
        document={"type": "document_url", "document_url": url},
        include_bounding_boxes=True
    )

    high_confidence = []
    needs_review = []

    for page in response.pages:
        if page.confidence < threshold:
            print(f"Warning: Page {page.index} has low confidence ({page.confidence:.2f})")

        for block in page.blocks:
            if block.confidence >= threshold:
                high_confidence.append(block)
            else:
                needs_review.append({
                    "page": page.index,
                    "text": block.text,
                    "confidence": block.confidence,
                    "type": block.type,
                })

    return high_confidence, needs_review

document_url = "https://example.com/sample-invoice.pdf"
trusted, flagged = process_with_confidence_check(client, document_url)
print(f"Trusted blocks: {len(trusted)}, Needs review: {len(flagged)}")
for item in flagged:
    print(f"  Page {item['page']}: [{item['type']}] {item['text'][:50]}... ({item['confidence']:.2f})")
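Choosing a threshold is easier with data. Given confidence scores from a representative sample of your documents (for example, [b.confidence for p in response.pages for b in p.blocks]), this sketch reports what fraction of blocks each candidate threshold would send to review. review_rate and the candidate values are illustrative, and the scores in the example are made up only to show the output shape.

```python
def review_rate(confidences, thresholds=(0.7, 0.8, 0.85, 0.9, 0.95)):
    """Fraction of blocks whose confidence falls below each candidate threshold."""
    if not confidences:
        return {t: 0.0 for t in thresholds}
    n = len(confidences)
    return {t: sum(c < t for c in confidences) / n for t in thresholds}


# Made-up sample scores
scores = [0.99, 0.97, 0.93, 0.88, 0.82, 0.64]
for t, rate in review_rate(scores).items():
    print(f"threshold {t:.2f}: {rate:.0%} of blocks flagged")
```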

Batch Processing

For large document sets, the batch endpoint costs half as much as the standard endpoint ($2 per 1,000 pages instead of $4). It’s asynchronous: you submit documents, then poll for results.

import time

def batch_process_documents(client, document_urls, poll_interval=30):
    """Submit a batch job and wait for results."""
    # Submit the batch
    batch = client.ocr.batch.create(
        model="mistral-ocr-4",
        documents=[
            {"type": "document_url", "document_url": url}
            for url in document_urls
        ]
    )

    print(f"Batch submitted: {batch.id}")
    print(f"Status: {batch.status}")

    # Poll until complete
    while batch.status in ("queued", "processing"):
        time.sleep(poll_interval)
        batch = client.ocr.batch.retrieve(batch.id)
        print(f"Status: {batch.status} ({batch.completed}/{batch.total} pages)")

    if batch.status == "completed":
        results = client.ocr.batch.results(batch.id)
        return results
    else:
        raise RuntimeError(f"Batch failed: {batch.error}")

# Usage
urls = [
    "https://example.com/doc1.pdf",
    "https://example.com/doc2.pdf",
    "https://example.com/doc3.pdf",
]
results = batch_process_documents(client, urls)
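The polling loop above never gives up if a job stalls. A deadline makes it safer. This generic sketch takes any zero-argument fetch callable and a predicate for "finished"; poll_until and its default interval and timeout are assumptions, not SDK features. The commented usage reuses client.ocr.batch.retrieve and the status values from the code above.

```python
import time


def poll_until(fetch, is_done, interval=30.0, timeout=3600.0):
    """Call fetch() every `interval` seconds until is_done(result) is true.

    Raises TimeoutError once another poll would fall past the deadline.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = fetch()
        if is_done(result):
            return result
        if time.monotonic() + interval > deadline:
            raise TimeoutError(f"Still not done after {timeout:.0f}s")
        time.sleep(interval)


# Usage with the batch API from above:
# batch = poll_until(
#     lambda: client.ocr.batch.retrieve(batch.id),
#     lambda b: b.status not in ("queued", "processing"),
# )
```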

Processing a Directory of Files

A practical pattern for processing all PDFs in a folder:

from pathlib import Path
import json

def process_directory(client, input_dir, output_dir):
    """Process all PDFs in a directory and save results as JSON."""
    input_path = Path(input_dir)
    output_path = Path(output_dir)
    output_path.mkdir(parents=True, exist_ok=True)

    pdf_files = list(input_path.glob("*.pdf"))
    print(f"Found {len(pdf_files)} PDFs to process")

    for pdf_file in pdf_files:
        print(f"Processing: {pdf_file.name}")
        try:
            response = process_local_file(client, str(pdf_file))

            # Save structured output
            output_file = output_path / f"{pdf_file.stem}.json"
            output_data = {
                "source": pdf_file.name,
                "pages": [
                    {
                        "index": page.index,
                        "confidence": page.confidence,
                        "blocks": [
                            {
                                "type": block.type,
                                "text": block.text,
                                "confidence": block.confidence,
                            }
                            for block in page.blocks
                        ]
                    }
                    for page in response.pages
                ]
            }
            output_file.write_text(json.dumps(output_data, indent=2))

        except Exception as e:
            print(f"  Error: {e}")

process_directory(client, "./documents/", "./extracted/")
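Re-running process_directory processes (and pays for) every PDF again. One simple guard is to skip files whose JSON output already exists and is newer than the source PDF. A sketch: pending_pdfs is a hypothetical helper, and modification times are only a rough proxy for "unchanged".

```python
from pathlib import Path


def pending_pdfs(input_dir, output_dir):
    """Return PDFs with no JSON output yet, or whose output is older than the PDF."""
    pending = []
    for pdf in sorted(Path(input_dir).glob("*.pdf")):
        out = Path(output_dir) / f"{pdf.stem}.json"
        if not out.exists() or out.stat().st_mtime < pdf.stat().st_mtime:
            pending.append(pdf)
    return pending
```

To use it, replace list(input_path.glob("*.pdf")) inside process_directory with pending_pdfs(input_dir, output_dir).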

Handling Multi-Page PDFs

OCR 4 handles multi-page PDFs natively. You don’t need to split them:

def extract_tables_from_pdf(client, pdf_url):
    """Extract only table blocks from a multi-page PDF."""
    response = client.ocr.process(
        model="mistral-ocr-4",
        document={"type": "document_url", "document_url": pdf_url},
        include_bounding_boxes=True
    )

    tables = []
    for page in response.pages:
        for block in page.blocks:
            if block.type == "table":
                tables.append({
                    "page": page.index,
                    "content": block.text,
                    "position": {
                        "x": block.bounding_box.x,
                        "y": block.bounding_box.y,
                        "width": block.bounding_box.width,
                        "height": block.bounding_box.height,
                    }
                })

    return tables

tables = extract_tables_from_pdf(client, "https://example.com/report.pdf")
print(f"Found {len(tables)} tables across all pages")
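This tutorial doesn’t specify what block.text contains for a table. If your table blocks come back as Markdown pipe tables (an assumption worth checking against your own output), a small parser can turn them into rows. parse_pipe_table is hypothetical, and it doesn’t handle escaped pipes inside cells.

```python
def parse_pipe_table(text):
    """Parse a Markdown pipe table into a list of rows (lists of cell strings).

    Skips separator rows like |---|:--:|. Escaped pipes are not supported.
    """
    rows = []
    for line in text.strip().splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue
        cells = [c.strip() for c in line.strip("|").split("|")]
        # A separator row has only non-empty cells made of '-', ':' and spaces
        if all(c and set(c) <= set("-: ") for c in cells):
            continue
        rows.append(cells)
    return rows


table = """
| Item   | Qty | Price |
|--------|----:|------:|
| Widget | 2   | 9.50  |
"""
print(parse_pipe_table(table))
# [['Item', 'Qty', 'Price'], ['Widget', '2', '9.50']]
```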

Error Handling

Production code should retry transient failures (rate limits, server errors, dropped connections) and fail fast on everything else:

import time

import httpx
# mistralai 1.x raises SDKError for HTTP errors; class names differ in older SDKs
from mistralai.models import SDKError

def process_with_retry(client, document, max_retries=3, backoff=2):
    """Process a document, retrying transient failures with exponential backoff."""
    last_error = None
    for attempt in range(max_retries):
        try:
            return client.ocr.process(
                model="mistral-ocr-4",
                document=document,
                include_bounding_boxes=True
            )
        except SDKError as e:
            # Retry only rate limits (429) and server errors (5xx)
            if e.status_code != 429 and e.status_code < 500:
                raise  # Client error, don't retry
            last_error = e
        except httpx.TransportError as e:
            # Connection failures and timeouts (surface as httpx errors in mistralai 1.x)
            last_error = e

        if attempt < max_retries - 1:  # no point sleeping after the final attempt
            wait = backoff ** attempt
            print(f"Attempt {attempt + 1} failed ({last_error!r}). Retrying in {wait}s...")
            time.sleep(wait)

    raise RuntimeError(f"Failed after {max_retries} attempts") from last_error
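Plain exponential backoff means clients that failed at the same moment also retry at the same moment. A common refinement is "full jitter": sleep a random amount between zero and the exponential delay, capped at some maximum. This sketch only computes the delay; backoff_delay and the 60-second cap are assumptions, and you would call it in place of backoff ** attempt in the retry loop above.

```python
import random


def backoff_delay(attempt, base=2.0, cap=60.0, rng=random.random):
    """Full-jitter backoff: a random delay in [0, min(cap, base ** attempt))."""
    return rng() * min(cap, base ** attempt)
```

The rng parameter exists so tests can pass a fixed value instead of real randomness.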

Building a Complete Pipeline

Here’s an example that ties the pieces together: it processes an invoice with process_local_file from earlier, separates tables from other text blocks, and flags low-confidence blocks for review:

import os
import json
from pathlib import Path
from mistralai import Mistral

def create_invoice_pipeline():
    """Complete invoice processing pipeline."""
    client = Mistral(api_key=os.environ["MISTRAL_API_KEY"])

    def process_invoice(file_path):
        response = process_local_file(client, file_path)

        invoice_data = {
            "file": Path(file_path).name,
            "pages": len(response.pages),
            "text_blocks": [],
            "tables": [],
            "low_confidence": [],
        }

        for page in response.pages:
            for block in page.blocks:
                if block.type == "table":
                    invoice_data["tables"].append(block.text)
                else:
                    invoice_data["text_blocks"].append(block.text)

                if block.confidence < 0.8:
                    invoice_data["low_confidence"].append({
                        "page": page.index,
                        "text": block.text[:100],
                        "confidence": block.confidence,
                    })

        return invoice_data

    return process_invoice

# Usage
pipeline = create_invoice_pipeline()
result = pipeline("./invoices/invoice-001.pdf")
print(json.dumps(result, indent=2))
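The pipeline above sorts blocks but doesn’t pull out individual fields. For simple cases, a regular expression over the text blocks can find a labeled amount. This is only a sketch: the "Total" label, the two-decimal currency format, and the helper name find_total are assumptions about your invoices. It takes the last match because totals usually follow subtotals, and the \b word boundary keeps "Subtotal" from matching.

```python
import re
from decimal import Decimal

# Hypothetical pattern: "Total", optional "Due"/"Amount", optional colon and currency sign
TOTAL_RE = re.compile(
    r"\btotal(?:\s+(?:due|amount))?\s*:?\s*[$€£]?\s*([\d,]+\.\d{2})",
    re.IGNORECASE,
)


def find_total(text_blocks):
    """Return the last 'Total ...' amount found in the blocks as a Decimal, or None."""
    matches = [m for block in text_blocks for m in TOTAL_RE.finditer(block)]
    if not matches:
        return None
    return Decimal(matches[-1].group(1).replace(",", ""))


print(find_total(["Subtotal: $1,200.00", "Tax: $96.00", "Total Due: $1,296.00"]))
# 1296.00
```

With the pipeline output you would call find_total(result["text_blocks"]); if your totals sit inside tables, search result["tables"] as well.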

Tips for Production

  1. Use environment variables for API keys. Never hardcode them.
  2. Implement retry logic for rate limits (429) and server errors (5xx).
  3. Use batch processing when latency isn’t critical. It’s half the price.
  4. Set confidence thresholds appropriate for your use case. Medical/legal documents need higher thresholds.
  5. Monitor API costs by tracking pages processed per batch.
  6. Cache results for documents you’ve already processed.
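For tip 5, the prices quoted in this tutorial ($4 per 1,000 pages on the standard endpoint, $2 per 1,000 on batch) turn into a quick estimate. Prices change, so treat these constants as a snapshot and check current pricing; estimate_cost is a hypothetical helper.

```python
from decimal import Decimal

# Prices per 1,000 pages as quoted in this tutorial; verify current pricing
PRICE_PER_1K = {"standard": Decimal("4"), "batch": Decimal("2")}


def estimate_cost(pages, endpoint="standard"):
    """Estimated cost in USD for processing `pages` pages on the given endpoint."""
    return PRICE_PER_1K[endpoint] * pages / 1000


print(estimate_cost(250_000))           # 1000
print(estimate_cost(250_000, "batch"))  # 500
```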

If you’re comparing this workflow to DeepSeek’s approach, see our DeepSeek Vision Python tutorial for a side-by-side look at how different multimodal APIs handle document extraction. For the self-hosting alternative, check out our guide on running OCR locally.

FAQ

What Python version does the Mistral SDK require?

Python 3.9 or higher. The SDK uses modern Python features like type hints and dataclasses extensively.

How fast is the Mistral OCR 4 API?

Typical response time is 1-3 seconds for a single page, depending on document complexity. Multi-page PDFs take roughly 1-2 seconds per page. Batch processing is slower (results in minutes to hours) but cheaper.

Is there a rate limit?

Yes. Free tier accounts have lower rate limits. Paid accounts typically get 100+ requests per minute. If you hit rate limits, use the batch endpoint or implement exponential backoff.
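Besides backoff, you can avoid most 429 responses by pacing requests on the client side. This sketch spaces calls at least 60 / max_per_minute seconds apart. The 100 requests per minute mentioned above is only an example, so use your account’s actual limit. RequestPacer is a hypothetical helper and is not thread-safe.

```python
import time


class RequestPacer:
    """Make successive wait() calls return at least 60/max_per_minute seconds apart."""

    def __init__(self, max_per_minute, clock=time.monotonic, sleep=time.sleep):
        self.interval = 60.0 / max_per_minute
        self.clock = clock
        self.sleep = sleep
        self.next_allowed = float("-inf")

    def wait(self):
        now = self.clock()
        if now < self.next_allowed:
            self.sleep(self.next_allowed - now)
            now = self.next_allowed
        self.next_allowed = now + self.interval


# Usage: call pacer.wait() before each client.ocr.process(...) request
pacer = RequestPacer(max_per_minute=100)
```

The clock and sleep parameters let you substitute fakes in tests instead of waiting in real time.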

Can I process images (not just PDFs)?

Yes. Mistral OCR 4 accepts JPEG, PNG, TIFF, and WebP images in addition to PDFs. Set the appropriate MIME type when uploading base64-encoded files.

How do I handle documents larger than 100 pages?

The API handles multi-page PDFs natively with no documented page limit. For very large documents (hundreds of pages), the batch endpoint is recommended to avoid timeout issues on the standard endpoint.

What’s the difference between the standard and batch endpoint?

Standard: synchronous, results in seconds, $4/1K pages. Batch: asynchronous, results in minutes to hours, $2/1K pages. Use standard for real-time applications, batch for bulk processing where latency isn’t critical.
