Jun 4, 2026

How to Run MAI-Thinking-1 Locally: What We Know About Microsoft's 35B Model (2026)

MAI-Thinking-1 is Microsoft’s 35B reasoning model announced at Build 2026. Developers are already asking: can I run it locally? The short answer: not yet. But here is what we know, what to expect, and what to use in the meantime.

Current status: enterprise-only

As of June 2026, the status of MAI-Thinking-1 is:

❌ Not publicly available
❌ No weights released
❌ No public API
❌ Not on OpenRouter or any third-party provider
✅ Available to Microsoft enterprise customers via Azure

Microsoft has given no indication it will release weights publicly. The MAI models are positioned as proprietary differentiators for Azure, not open-source contributions.

If weights were released: hardware requirements

Based on the 35B parameter size, here is roughly what the weights alone would need (context and KV cache add more on top):

Quantization	Memory needed (weights)	Hardware	Speed (est.)
FP16	~70GB	Mac Studio 128GB, 1× A100 80GB	15-25 t/s
Q8	~35GB	Mac 64GB (too large for a single RTX 5090 32GB without CPU offload)	25-40 t/s
Q4_K_M	~20GB	RTX 4090 24GB, Mac 32GB	35-55 t/s
Q3_K	~15GB	RTX 4070 Ti Super 16GB	40-60 t/s

At Q4 quantization, a 35B model is manageable on consumer hardware. Its roughly 20GB of weights would fit on an RTX 4090 (24GB) with about 4GB left for context, and would fit easily on RTX Spark (128GB). This is the same size class as Mistral Medium 3.5 and Granite 4.1 34B.
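The memory figures above follow from simple arithmetic: parameter count times bits per weight, divided by eight. Here is a minimal sketch of that calculation. The bits-per-weight values are assumptions chosen to roughly reproduce the table, not official numbers; real quantized files vary by format and tend to run somewhat larger. The `fits` helper and its 2GB overhead allowance are hypothetical and only for illustration.

```python
# Rough weight-memory estimate: parameters x bits-per-weight / 8.
# Bits-per-weight values are approximate assumptions picked to match the
# table above; actual quantized files differ by format and are often larger.
BITS_PER_WEIGHT = {
    "FP16": 16.0,
    "Q8": 8.0,
    "Q4_K_M": 4.5,
    "Q3_K": 3.5,
}


def weight_memory_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate memory for the weights alone, in GB (1 GB = 1e9 bytes)."""
    total_bytes = params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 1e9


def fits(params_billion: float, quant: str, device_gb: float,
         overhead_gb: float = 2.0) -> bool:
    """True if weights plus a fixed overhead allowance fit in device memory.

    overhead_gb is a hypothetical allowance for context and runtime buffers.
    """
    needed = weight_memory_gb(params_billion, BITS_PER_WEIGHT[quant]) + overhead_gb
    return needed <= device_gb


if __name__ == "__main__":
    for quant, bits in BITS_PER_WEIGHT.items():
        print(f"{quant}: ~{weight_memory_gb(35, bits):.1f} GB")
```

Under these assumptions, `fits(35, "Q4_K_M", 24)` is true while `fits(35, "Q8", 32)` is false, which is why an 8-bit 35B model does not fit on a single 32GB card without offloading.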

The Aion models: local Windows AI (available soon)

While MAI-Thinking-1 itself is enterprise/cloud-only, Microsoft did announce local models:

Aion 1.0 Instruct — On-device reasoning for Windows
Aion 1.0 Plan — On-device planning and tool use for Windows

These will ship with RTX Spark hardware this fall and run natively on Windows. They are smaller than MAI-Thinking-1 but designed specifically for on-device agent workflows.

What to use in the meantime (35B-class local models)

If you want a ~35B reasoning model running locally today, these are available now:

Best alternatives at similar size:

Model	Size	How to run	Strength
Qwen 3.7 27B	27B	`ollama pull qwen3.7:27b`	Excellent coding
Qwen 3.6 35B-A3B	35B (3B active)	`ollama pull qwen3.6:35b-a3b`	Fast (80+ t/s)
Mistral Medium 3.5	~40B	`ollama pull mistral-medium-3.5`	Strong reasoning
Granite 4.1 34B	34B	`ollama pull granite4.1:34b`	Tool calling
Devstral 2	~50B	`ollama pull devstral2`	Code specialist

All of these are open-weight and available today, and most run on the same class of hardware that MAI-Thinking-1 would require (the larger Devstral 2 needs more memory at the same quantization). Qwen 3.7 27B is the closest match for balancing reasoning and coding at a manageable size.
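Once a model is pulled, Ollama serves it over a local HTTP API, by default on port 11434. As a rough sketch, this is how you might call one of the models above from Python using only the standard library. The model tag is taken from the table and is assumed to be pulled already. The helper names `build_request` and `generate` are ours, not part of Ollama.

```python
import json
import urllib.request

# Ollama's default local endpoint for one-shot text generation.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(model: str, prompt: str) -> urllib.request.Request:
    """Build a non-streaming generate request for a local Ollama server."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(model: str, prompt: str, timeout: float = 300.0) -> str:
    """Send the prompt and return the generated text.

    Requires a running Ollama server with the model already pulled.
    """
    request = build_request(model, prompt)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.loads(response.read())
    return body["response"]
```

With the server running, `generate("qwen3.7:27b", "Explain KV caching in two sentences.")` would return the completion as a string. Setting `stream` to false keeps the example simple. For interactive use you would probably stream tokens instead.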

For enterprise-grade reasoning (API):

If you need MAI-Thinking-1’s target quality (Sonnet 4.6 class) via API today (prices are per million input/output tokens):

Claude Sonnet 4.6 — $3/$15, the exact benchmark target
DeepSeek V4-Pro — $0.435/$0.87, likely exceeds MAI-Thinking-1 on coding
Qwen 3.7 Max — $2.50/$7.50, 92.4% GPQA reasoning
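To compare these options for your own workload, multiply token volumes by the listed rates. The sketch below assumes the prices above are USD per million input and output tokens. The workload of 10M input and 2M output tokens is a made-up example, and `request_cost_usd` is a hypothetical helper.

```python
# (input price, output price), assumed to be USD per million tokens,
# copied from the list above.
PRICES_PER_MILLION = {
    "Claude Sonnet 4.6": (3.00, 15.00),
    "DeepSeek V4-Pro": (0.435, 0.87),
    "Qwen 3.7 Max": (2.50, 7.50),
}


def request_cost_usd(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimated cost in USD for the given token volumes."""
    input_price, output_price = PRICES_PER_MILLION[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


if __name__ == "__main__":
    # Hypothetical monthly workload: 10M input tokens, 2M output tokens.
    for name in PRICES_PER_MILLION:
        cost = request_cost_usd(name, 10_000_000, 2_000_000)
        print(f"{name}: ${cost:.2f}")
```

Output tokens usually dominate the bill for reasoning models, since long chains of thought are billed as output, so it is worth measuring your actual input/output ratio before comparing providers.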

See our best AI API providers guide for the full landscape.

Will Microsoft ever open-source MAI models?

Unlikely for the flagship models. Microsoft’s history:

Phi models (1-4): Open-source (small models, research-focused)
MAI models: Proprietary (enterprise differentiators)
Pattern: Small/research models = open. Large/commercial models = closed.

Microsoft may release smaller variants (as it did with Phi), but MAI-Thinking-1 itself will likely remain Azure-exclusive. The Aion local models may be more accessible since they ship with Windows hardware.

The RTX Spark Dev Box angle

Microsoft’s Surface RTX Spark Dev Box ships preloaded with:

Windows 11 Pro
WSL2 with CUDA GPU passthrough
VS Code + GitHub Copilot
Python, Git, Node.js

This hardware (128GB unified memory) can run any open-weight 35B model locally. Even if MAI-Thinking-1 stays closed, the Dev Box runs Qwen 3.7 27B, Mistral Medium 3.5, and dozens of other open models that match or exceed MAI-Thinking-1’s claimed quality. See Best LLMs for RTX Spark.

FAQ

When will MAI-Thinking-1 be publicly available?

No date has been announced. Enterprise Azure access is live. Our guess is that a public API arrives in Q3 2026 at the earliest, and that open weights are unlikely ever.

Is the Aion 1.0 model the same as MAI-Thinking-1?

No. Aion models are smaller, designed for on-device Windows tasks. MAI-Thinking-1 is the cloud/enterprise flagship. Think of Aion as “MAI Lite for laptops.”

Can I use MAI-Thinking-1 with Aider or Claude Code?

Not yet. No public API exists. If Microsoft releases one, it will likely be Azure-only rather than an OpenAI-compatible endpoint, so tools like Aider would need Azure-specific integration.

What’s better right now: MAI-Thinking-1 (if I had access) or DeepSeek V4-Pro?

DeepSeek V4-Pro almost certainly beats MAI-Thinking-1 on coding (80.6% on SWE-bench, versus MAI-Thinking-1’s claimed Sonnet 4.6-class performance). MAI-Thinking-1’s advantage is enterprise compliance: commercially licensed data, Azure integration, and no Chinese provider concerns. If you don’t have those constraints, DeepSeek is better and available today.

Should I wait for MAI models or use alternatives?

Use alternatives now. Qwen 3.7 27B locally or DeepSeek V4-Pro via API both exceed MAI-Thinking-1’s claimed Sonnet 4.6-class quality at similar or lower cost — and they’re available today.

What about the Surface RTX Spark Dev Box?

The Surface RTX Spark Dev Box ships with Windows, CUDA, and the full dev stack preloaded. Even without MAI-Thinking-1 weights, its 128GB of unified memory can hold any open-weight 35B model entirely in memory, even at FP16 (~70GB). That makes it well suited to local AI development on Windows, whether you’re running Microsoft’s models or open alternatives. If MAI-Thinking-1 ever becomes available locally, the Dev Box would run it comfortably at Q4 quantization (~20GB of its 128GB used).

Is there a timeline for MAI models becoming open?

No. Microsoft has not announced any plans to open-source MAI-Thinking-1 or MAI-Code-1-Flash. Its smaller, research-focused Phi models are open, but the commercial MAI models appear set to remain Azure exclusives. The Aion on-device models ship with hardware, but it is unclear whether they can be extracted or redistributed.