
Mistral OCR 4: Complete Guide (Pricing, API, Features)


Mistral just dropped OCR 4 on June 23, 2026, and it’s the most complete document intelligence model they’ve ever shipped. It doesn’t just extract text. It gives you paragraph-level bounding boxes, confidence scores, block classification, and support for 170 languages. At $4 per 1,000 pages (or $2 in batch mode), it undercuts most enterprise alternatives while delivering a 72% win rate in blind human evaluations.

If you’ve been following the OCR space (and you should, because it’s moving fast), this is a significant release. It tops the OlmOCRBench leaderboard and ships with a Document AI studio mode that makes it accessible to non-developers too.

Let’s break down everything you need to know.

What Is Mistral OCR 4?

Mistral OCR 4 is a dedicated document intelligence model from Mistral AI. Unlike general-purpose vision models that happen to read text, OCR 4 is purpose-built for structured document extraction. Think invoices, contracts, scientific papers, forms, multilingual documents, anything on a page.

It’s available on la Plateforme (Mistral’s own API), Microsoft Foundry (Azure’s model catalog), and as a self-hosted enterprise deployment for organizations that need on-prem control.

The “4” in the name signals a major generational leap. Previous Mistral OCR versions were solid but lacked the spatial awareness that enterprise customers demanded. OCR 4 fixes that with bounding boxes and confidence scoring at the paragraph level.

Key Features

170 Language Support

This isn’t just the usual “we technically support 170 languages” claim. Mistral reports strong performance across Latin, Cyrillic, Arabic, CJK, Devanagari, and other scripts. For teams processing multilingual document sets, this matters. You don’t need separate models for different language families.

Paragraph-Level Bounding Boxes

Every extracted element comes with coordinate data showing exactly where it lives on the page. This enables:

  • Document reconstruction (maintaining original layout)
  • Compliance auditing (proving which section a data point came from)
  • Search indexing with spatial awareness
  • Overlay and highlight features in document viewers
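
As a rough sketch of the overlay use case, the snippet below scales a bounding box to a viewer's pixel size. The box format here (corners as normalized `x0, y0, x1, y1` fractions of the page) and the `BBox` and `to_pixels` names are assumptions for illustration, not the documented response schema, so check how OCR 4 actually encodes coordinates before relying on it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BBox:
    """Hypothetical box: corners as fractions (0.0 to 1.0) of the page size."""
    x0: float
    y0: float
    x1: float
    y1: float


def to_pixels(box: BBox, width_px: int, height_px: int) -> tuple[int, int, int, int]:
    """Scale a normalized box to (left, top, right, bottom) pixel coordinates."""
    return (
        round(box.x0 * width_px),
        round(box.y0 * height_px),
        round(box.x1 * width_px),
        round(box.y1 * height_px),
    )


# Highlight a paragraph on a page rendered at 1000 x 1400 pixels.
rect = to_pixels(BBox(0.1, 0.25, 0.9, 0.5), width_px=1000, height_px=1400)
print(rect)
```

If the API returns absolute pixel or point coordinates instead, the same helper shape still applies; only the scaling step changes.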

Confidence Scores

OCR 4 reports confidence at both the page and word level. Low-confidence extractions can be flagged for human review, which is critical for regulated industries like finance and healthcare.
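
Here's a minimal sketch of that review step. It assumes each extracted block is available as a dict with a `confidence` value between 0 and 1; the field name, the dict shape, and the 0.85 threshold are all illustrative assumptions rather than anything Mistral documents.

```python
def split_by_confidence(blocks, threshold=0.85):
    """Partition blocks into (accepted, needs_review) by confidence score."""
    accepted, needs_review = [], []
    for block in blocks:
        target = accepted if block["confidence"] >= threshold else needs_review
        target.append(block)
    return accepted, needs_review


# Hypothetical blocks; the real response fields may be named differently.
blocks = [
    {"text": "Invoice total: 1,240.00 EUR", "confidence": 0.98},
    {"text": "Due date: 2O26-07-15", "confidence": 0.61},
]
accepted, needs_review = split_by_confidence(blocks)
print(len(accepted), "accepted,", len(needs_review), "sent to human review")
```

The right threshold depends on your error tolerance, so it's worth tuning against a labeled sample of your own documents.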

Block Classification

The model classifies each extracted region: title, paragraph, table, formula, signature, header, footer. This structured output means you can route different content types to different processing pipelines without manual tagging.
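
Routing by block type could look something like the sketch below. The type names come from the list above, but the exact strings the API returns and the dict shape of a block are assumptions here.

```python
from collections import defaultdict

# Block types as listed in this article; the exact API strings are an assumption.
KNOWN_TYPES = {"title", "paragraph", "table", "formula", "signature", "header", "footer"}


def group_by_type(blocks):
    """Bucket blocks by classified type; unrecognized types go to 'other'."""
    buckets = defaultdict(list)
    for block in blocks:
        kind = block["type"] if block["type"] in KNOWN_TYPES else "other"
        buckets[kind].append(block)
    return dict(buckets)


# Hypothetical blocks for illustration.
blocks = [
    {"type": "title", "text": "Master Services Agreement"},
    {"type": "table", "text": "Item | Qty | Price"},
    {"type": "paragraph", "text": "This agreement is entered into..."},
    {"type": "stamp", "text": "RECEIVED"},
]
routed = group_by_type(blocks)
print(sorted(routed))
```

From there, each bucket can be handed to its own pipeline, for example tables to a structured parser and signatures to a verification step.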

Document AI Studio Mode

For non-developers, Mistral offers a studio interface where you can upload documents, configure extraction settings, and preview results without writing code. It’s a nice on-ramp, though most production workloads will use the API directly.

Pricing

Mistral keeps it simple:

Mode | Price
Standard API | $4 / 1,000 pages
Batch API | $2 / 1,000 pages

The batch endpoint is half the price but designed for asynchronous workloads where you don’t need results in real time. Upload a batch of documents, get results within hours.

For context, Google Document AI charges around $5 per 1,000 pages for similar functionality. So Mistral undercuts Google by 20% on standard pricing, and its batch rate sits 60% below Google’s $5 figure. That adds up fast at enterprise scale.
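
To sanity-check those percentages and see what they mean at volume, here's a small calculator using only the prices quoted in this article. The plan names and the 2 million page volume are made up for illustration, and the Google figure is the approximate one cited above.

```python
from decimal import Decimal

# Prices per 1,000 pages as quoted in this article (USD).
PRICE_PER_1K = {
    "mistral_standard": Decimal("4"),
    "mistral_batch": Decimal("2"),
    "google_document_ai": Decimal("5"),  # approximate, per the article
}


def cost(pages: int, plan: str) -> Decimal:
    """Return the cost in USD for a page count under the given plan."""
    return PRICE_PER_1K[plan] * pages / 1000


def savings_vs_google(plan: str) -> Decimal:
    """Percentage saved relative to the quoted Google Document AI price."""
    google = PRICE_PER_1K["google_document_ai"]
    return (google - PRICE_PER_1K[plan]) / google * 100


pages = 2_000_000  # illustrative volume
for plan in PRICE_PER_1K:
    print(plan, cost(pages, plan))
print(savings_vs_google("mistral_standard"), savings_vs_google("mistral_batch"))
```

At 2 million pages, that works out to $8,000 on the standard endpoint, $4,000 in batch mode, and $10,000 at the quoted Google rate.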

If you’re comparing costs across the broader multimodal API landscape, check our best multimodal AI APIs price comparison for a full breakdown.

Benchmark Performance

The headline number: 72% win rate in blind human evaluations against competing OCR solutions. That means human judges preferred Mistral OCR 4’s output nearly three-quarters of the time when shown anonymized results from multiple systems.

On OlmOCRBench (a public academic benchmark for OCR quality), OCR 4 takes the top spot. This benchmark tests:

  • Text extraction accuracy
  • Layout preservation
  • Table structure recognition
  • Formula handling
  • Multilingual text

Mistral hasn’t published exact scores for every sub-benchmark, but the overall top ranking is verified on the public leaderboard.

API Usage

The API follows Mistral’s standard patterns. You authenticate with an API key, send documents (PDF, images), and get back structured JSON with text, bounding boxes, confidence scores, and block classifications.

Here’s a basic example:

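import os

from mistralai import Mistral

# Read the key from the environment rather than hard-coding it.
client = Mistral(api_key=os.environ["MISTRAL_API_KEY"])

# Request OCR on a hosted PDF, with bounding boxes included in the response.
response = client.ocr.process(
    model="mistral-ocr-4",
    document={"type": "document_url", "document_url": "https://example.com/invoice.pdf"},
    include_bounding_boxes=True,
)

# Walk each page and print every block's type, a text preview, and its position.
for page in response.pages:
    print(f"Page {page.index} (confidence: {page.confidence})")
    for block in page.blocks:
        print(f"  [{block.type}] {block.text[:80]}...")
        print(f"  Bbox: {block.bounding_box}")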

For a complete Python walkthrough with batch processing and error handling, see our Mistral OCR 4 Python tutorial.

Available On Microsoft Foundry

Microsoft announced same-day availability of Mistral OCR 4 in Microsoft Foundry (formerly Azure AI Foundry). This means Azure customers can access OCR 4 through their existing Azure billing, compliance, and networking setup without a separate Mistral account.

The Foundry deployment supports:

  • Virtual network integration
  • Managed identity authentication
  • Azure Private Link for zero-trust networking
  • Regional data residency

For enterprise teams already on Azure, this removes a procurement barrier entirely.

Enterprise Self-Hosting

Mistral offers selective self-hosting for enterprises that need full data sovereignty. Documents never leave your infrastructure. This is particularly relevant for:

  • Government agencies with classified documents
  • Healthcare organizations under HIPAA
  • Financial institutions with strict data residency requirements
  • Legal firms handling privileged communications

Self-hosting pricing isn’t public (you need to contact Mistral sales), but the option exists, which is more than most cloud-only OCR services can say.

How It Compares

The OCR landscape in 2026 is competitive. Here’s where Mistral OCR 4 fits:

  • vs. Google Document AI: Cheaper ($4 vs $5/1K pages), more languages, similar accuracy. Google has tighter GCP integration. See our detailed Mistral OCR 4 vs Google Document AI comparison.
  • vs. DeepSeek Vision: DeepSeek is cheaper per token for light workloads and fully open-source. But it’s a general vision model, not a dedicated OCR system. It lacks bounding boxes and batch endpoints. Read more in our DeepSeek Vision complete guide.
  • vs. Baidu Unlimited-OCR: Baidu’s model is free and self-hostable under MIT license, but it’s a 3B parameter model you run yourself. No managed API, no enterprise SLA. Different use case entirely.

For the full three-way breakdown, see Mistral OCR 4 vs DeepSeek Vision vs Baidu Unlimited-OCR.

Best Use Cases

OCR 4 shines brightest for:

  1. Enterprise document pipelines: Invoice processing, contract extraction, compliance scanning at scale.
  2. Multilingual archives: Organizations with documents in dozens of languages that need a single extraction pipeline.
  3. Regulated industries: Where confidence scores and bounding boxes provide the audit trail regulators demand.
  4. RAG ingestion: Feeding structured document content into retrieval-augmented generation systems.
  5. Batch digitization: Large-scale scanning projects where the $2/1K pages batch pricing makes cost predictable.
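
For the RAG ingestion case, one plausible approach is to start a new chunk at every title block and drop page headers and footers. The sketch below assumes blocks arrive in reading order as dicts with `type` and `text` keys, which is an illustrative shape rather than the documented schema; `chunk_by_title` is a hypothetical helper name.

```python
def chunk_by_title(blocks):
    """Group blocks into chunks, starting a new chunk at each title block."""
    chunks = []
    current = {"title": None, "text": []}
    for block in blocks:
        if block["type"] in ("header", "footer"):
            continue  # page furniture rarely helps retrieval
        if block["type"] == "title":
            if current["text"]:
                chunks.append(current)
            current = {"title": block["text"], "text": []}
        else:
            current["text"].append(block["text"])
    if current["text"]:
        chunks.append(current)
    # Titles with no body text under them are dropped in this simple version.
    return [{"title": c["title"], "text": "\n".join(c["text"])} for c in chunks]


# Hypothetical blocks in reading order.
blocks = [
    {"type": "header", "text": "ACME Corp"},
    {"type": "title", "text": "1. Scope"},
    {"type": "paragraph", "text": "The supplier shall..."},
    {"type": "paragraph", "text": "Services include..."},
    {"type": "title", "text": "2. Fees"},
    {"type": "table", "text": "Item | Price"},
    {"type": "footer", "text": "Page 1"},
]
chunks = chunk_by_title(blocks)
for chunk in chunks:
    print(chunk["title"], "->", len(chunk["text"]), "chars")
```

Keeping the title alongside each chunk gives the retriever some section context, and bounding boxes could be carried along the same way if you want citations that point back to a spot on the page.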

Limitations

No tool is perfect. Here’s what to keep in mind:

  • No open-source option: Unlike DeepSeek or Baidu Unlimited-OCR, you can’t run this locally without an enterprise contract.
  • Page-based pricing: If your documents are image-heavy with little text, you still pay per page. Token-based pricing (like DeepSeek) can be cheaper for simple documents.
  • Latency: The standard endpoint has typical cloud API latency. For real-time applications, you’ll want to test if it meets your SLA.
  • New model: OCR 4 launched on June 23, 2026, so it has very little production track record. Edge cases and language-specific quirks will surface as more teams adopt it.
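
On the latency point, a small timing harness is usually enough to find out whether an endpoint fits your SLA. The sketch below times any callable and reports the median and a nearest-rank 95th percentile; `fake_request` is a stand-in that just sleeps, so swap in your own OCR call.

```python
import math
import statistics
import time


def measure_latency(call, runs=20):
    """Time repeated invocations of `call`; return median and p95 in seconds."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        call()
        samples.append(time.perf_counter() - start)
    samples.sort()
    # Nearest-rank percentile: the smallest sample covering 95% of the runs.
    p95_index = min(len(samples) - 1, math.ceil(0.95 * len(samples)) - 1)
    return {"median": statistics.median(samples), "p95": samples[p95_index]}


def fake_request():
    """Stand-in for a real OCR request; replace with your own call."""
    time.sleep(0.01)


stats = measure_latency(fake_request, runs=10)
print(stats)
```

Run it with documents that look like your real workload, since page count and image quality will likely affect response times.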

Getting Started

  1. Sign up at console.mistral.ai
  2. Generate an API key
  3. Install the Python SDK: pip install mistralai
  4. Start with the Document AI studio for interactive testing
  5. Move to the API for production workloads

For Azure users: search “Mistral OCR 4” in the Microsoft Foundry model catalog and deploy to your subscription.

FAQ

How much does Mistral OCR 4 cost?

$4 per 1,000 pages for standard API access, or $2 per 1,000 pages for batch processing. Enterprise self-hosting pricing requires contacting Mistral sales directly.

How many languages does Mistral OCR 4 support?

170 languages across multiple scripts including Latin, Cyrillic, Arabic, CJK (Chinese, Japanese, Korean), Devanagari, and others. Performance varies by script but is reported as strong across all supported languages.

Can I self-host Mistral OCR 4?

Yes, but only through Mistral’s enterprise program. It’s not open-source or freely downloadable. You need to contact their sales team for on-premise deployment options.

How does Mistral OCR 4 compare to Google Document AI?

Mistral is 20% cheaper on standard pricing ($4 vs $5 per 1K pages), supports more languages (170 vs ~60), and includes bounding boxes by default. Google has tighter GCP ecosystem integration. Both are strong choices depending on your existing cloud provider.

What formats does Mistral OCR 4 accept?

PDFs (single and multi-page), JPEG, PNG, TIFF, and WebP images. The API handles format detection automatically.
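
Even with automatic detection on the API side, a quick client-side filter can keep unsupported files out of a batch before you upload anything. This sketch checks file extensions against the formats listed above; the extension spellings (for example `.tif` alongside `.tiff`) and the `is_supported` helper are assumptions for illustration.

```python
from pathlib import Path

# Formats listed in this article; common extension spellings are an assumption.
SUPPORTED_SUFFIXES = {".pdf", ".jpg", ".jpeg", ".png", ".tif", ".tiff", ".webp"}


def is_supported(path: str) -> bool:
    """Return True if the file extension matches a listed input format."""
    return Path(path).suffix.lower() in SUPPORTED_SUFFIXES


files = ["invoice.PDF", "scan.tiff", "notes.docx"]
uploadable = [f for f in files if is_supported(f)]
print(uploadable)
```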

Does Mistral OCR 4 handle tables and formulas?

Yes. The block classification system identifies tables and formulas specifically, and the structured output preserves table layouts and renders formulas in a parseable format. It’s one of the top-scoring models for table extraction on OlmOCRBench.