How to Run MAI-Thinking-1 Locally: What We Know About Microsoft's 35B Model (2026)
MAI-Thinking-1 is Microsoft's 35B reasoning model announced at Build 2026. Developers are already asking: can I run it locally? The short answer: not yet. But here is what we know, what to expect, and what to use in the meantime.
Current status: enterprise-only
As of June 2026, MAI-Thinking-1 is:
- ❌ Not publicly available
- ❌ No weights released
- ❌ No public API
- ❌ Not on OpenRouter or any third-party provider
- ✅ Available to Microsoft enterprise customers via Azure
Microsoft has given no indication it will release weights publicly. The MAI models are positioned as proprietary differentiators for Azure, not open-source contributions.
If weights were released: hardware requirements
Based on the 35B parameter size, here is what you would need:
| Quantization | Memory needed | Hardware | Speed (est.) |
|---|---|---|---|
| FP16 | ~70GB | Mac Studio 128GB, 1× A100 80GB | 15-25 t/s |
| Q8 | ~35GB | Mac 64GB; RTX 5090 32GB only with partial CPU offload | 25-40 t/s |
| Q4_K_M | ~20GB | RTX 4090 24GB, Mac 32GB | 35-55 t/s |
| Q3_K | ~15GB | RTX 4070 Ti Super 16GB (tight) | 40-60 t/s |
A 35B model at Q4 quantization is very manageable on consumer hardware. It would fit on an RTX 4090 (24GB) with room for context, or easily on RTX Spark (128GB). This is the same size class as Mistral Medium 3.5 and Granite 4.1 34B.
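The memory column follows from simple arithmetic: parameter count times bits per weight, divided by 8 to get bytes. Below is a minimal sketch of that estimate. The bits-per-weight values are rounded assumptions picked to illustrate the calculation, not exact figures for any quantization format, and the result covers weights only (no KV cache or runtime overhead).

```python
# Rough weight-memory estimate: parameters x bits per weight / 8.
# The bits-per-weight figures are illustrative assumptions, not exact
# values for any specific quantization format.
APPROX_BITS_PER_WEIGHT = {
    "FP16": 16.0,
    "Q8": 8.0,
    "Q4_K_M": 4.5,
    "Q3_K": 3.5,
}


def weight_memory_gb(params_billion: float, bits_per_weight: float) -> float:
    """Return approximate weight memory in GB (1 GB = 1e9 bytes)."""
    total_bytes = params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 1e9


if __name__ == "__main__":
    for name, bits in APPROX_BITS_PER_WEIGHT.items():
        print(f"{name:7s} ~{weight_memory_gb(35, bits):.1f} GB")
```

For 35B parameters this gives roughly 70, 35, 19.7, and 15.3 GB, in line with the table. Real usage will be higher once context is allocated, so treat these as lower bounds.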
The Aion models: local Windows AI (available soon)
While MAI-Thinking-1 itself is enterprise/cloud-only, Microsoft DID announce local models:
- Aion 1.0 Instruct: on-device reasoning for Windows
- Aion 1.0 Plan: on-device planning and tool use for Windows
These will ship with RTX Spark hardware this fall and run natively on Windows. They are smaller than MAI-Thinking-1 but designed specifically for on-device agent workflows.
What to use in the meantime (35B-class local models)
If you want a ~35B reasoning model running locally today, these are available now:
Best alternatives at similar size:
| Model | Size | How to run | Quality |
|---|---|---|---|
| Qwen 3.7 27B | 27B | ollama pull qwen3.7:27b | Excellent coding |
| Qwen 3.6 35B-A3B | 35B (3B active) | ollama pull qwen3.6:35b-a3b | Fast (80+ t/s) |
| Mistral Medium 3.5 | ~40B | ollama pull mistral-medium-3.5 | Strong reasoning |
| Granite 4.1 34B | 34B | ollama pull granite4.1:34b | Tool calling |
| Devstral 2 | ~50B | ollama pull devstral2 | Code specialist |
All of these are open-weight, available today, and run on the same hardware that MAI-Thinking-1 would require. Qwen 3.7 27B is the closest match in terms of balancing reasoning + coding at a manageable size.
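Once a model is pulled, Ollama serves it over a local HTTP API, so it can be scripted as well as used from the terminal. The sketch below assumes a default Ollama install listening on `localhost:11434`. The model tag comes from the table above and may differ from what the Ollama library actually publishes. The helper names `build_request` and `generate` are ours, for illustration only.

```python
import json
import urllib.request

# Default local endpoint of an Ollama server (assumes a stock install).
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(model: str, prompt: str) -> urllib.request.Request:
    """Build a non-streaming generate request for a locally pulled model."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(model: str, prompt: str, timeout: float = 300.0) -> str:
    """Send the prompt to the local server and return the generated text."""
    request = build_request(model, prompt)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.loads(response.read().decode("utf-8"))
    return body["response"]
```

With the server running, `generate("qwen3.7:27b", "Explain mutexes in two sentences.")` should return the model's answer as a string. Large models can take a while to load on first use, hence the generous timeout.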
For enterprise-grade reasoning (API):
If you need MAI-Thinking-1's target quality (Sonnet 4.6 class) via API today (prices are per million input/output tokens):
- Claude Sonnet 4.6: $3/$15, the exact benchmark target
- DeepSeek V4-Pro: $0.435/$0.87, likely exceeds MAI-Thinking-1 on coding
- Qwen 3.7 Max: $2.50/$7.50, 92.4% GPQA reasoning
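To compare these options for a concrete workload, the listed prices can be turned into a per-request cost. This is a minimal sketch that assumes the figures are USD per million input and output tokens. Confirm the unit and current rates on each provider's pricing page before relying on it.

```python
# Prices copied from the list above, read as USD per million tokens
# (input, output). The unit is an assumption, not confirmed by the providers.
PRICES_PER_MTOK = {
    "Claude Sonnet 4.6": (3.00, 15.00),
    "DeepSeek V4-Pro": (0.435, 0.87),
    "Qwen 3.7 Max": (2.50, 7.50),
}


def request_cost_usd(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the estimated cost in USD of one request."""
    price_in, price_out = PRICES_PER_MTOK[model]
    return (input_tokens * price_in + output_tokens * price_out) / 1_000_000


if __name__ == "__main__":
    # Example workload: 2,000 prompt tokens and 1,000 completion tokens.
    for name in PRICES_PER_MTOK:
        print(f"{name:18s} ${request_cost_usd(name, 2_000, 1_000):.5f}")
```

Output tokens dominate the bill for reasoning models, so the gap between providers widens as responses get longer.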
See our best AI API providers guide for the full landscape.
Will Microsoft ever open-source MAI models?
Unlikely for the flagship models. Microsoft's history:
- Phi models (1-4): Open-source (small models, research-focused)
- MAI models: Proprietary (enterprise differentiators)
- Pattern: Small/research models = open. Large/commercial models = closed.
Microsoft may release smaller variants (as it did with Phi), but MAI-Thinking-1 itself will likely remain Azure-exclusive. The Aion local models may be more accessible since they ship with Windows hardware.
The RTX Spark Dev Box angle
Microsoft's Surface RTX Spark Dev Box ships preloaded with:
- Windows 11 Pro
- WSL2 with CUDA GPU passthrough
- VS Code + GitHub Copilot
- Python, Git, Node.js
This hardware (128GB unified memory) can run any open-weight 35B model locally. Even if MAI-Thinking-1 stays closed, the Dev Box runs Qwen 3.7 27B, Mistral Medium 3.5, and dozens of other open models that match or exceed MAI-Thinking-1's claimed quality. See Best LLMs for RTX Spark.
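A quick way to sanity-check what fits where is to compare estimated weight size plus some headroom against available memory. The sketch below uses the nominal sizes from the alternatives table. The 4.5 bits per weight and the 4GB headroom are illustrative assumptions, and actual GGUF file sizes and context needs will vary.

```python
# Nominal parameter counts (billions), taken from the alternatives table.
MODELS_B = {
    "Qwen 3.7 27B": 27,
    "Granite 4.1 34B": 34,
    "Qwen 3.6 35B-A3B": 35,
    "Mistral Medium 3.5": 40,
    "Devstral 2": 50,
}

ASSUMED_BITS_PER_WEIGHT = 4.5  # rough figure for a Q4-style quantization (assumption)
ASSUMED_HEADROOM_GB = 4.0      # hypothetical allowance for KV cache and buffers


def fits(params_billion: float, memory_gb: float,
         bits: float = ASSUMED_BITS_PER_WEIGHT,
         headroom_gb: float = ASSUMED_HEADROOM_GB) -> bool:
    """Return True if the quantized weights plus headroom fit in memory."""
    weights_gb = params_billion * bits / 8  # billions of params x bytes per param
    return weights_gb + headroom_gb <= memory_gb


if __name__ == "__main__":
    for name, size in MODELS_B.items():
        print(f"{name:20s} 24GB: {fits(size, 24)!s:5s} 128GB: {fits(size, 128)}")
```

Under these assumptions every model in the table fits in 128GB with a wide margin, while the ~40B and ~50B entries exceed a 24GB card unless you drop to a smaller quantization or offload layers to the CPU.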
FAQ
When will MAI-Thinking-1 be publicly available?
No date announced. Enterprise Azure access is live. Public API likely Q3 2026 at earliest. Open weights: unlikely ever.
Is the Aion 1.0 model the same as MAI-Thinking-1?
No. Aion models are smaller, designed for on-device Windows tasks. MAI-Thinking-1 is the cloud/enterprise flagship. Think of Aion as "MAI Lite for laptops."
Can I use MAI-Thinking-1 with Aider or Claude Code?
Not yet. No public API exists. If and when Microsoft releases one, it will likely be Azure-only rather than an OpenAI-compatible endpoint, so tools like Aider would need Azure-specific integration.
What's better right now: MAI-Thinking-1 (if I had access) or DeepSeek V4-Pro?
DeepSeek V4-Pro almost certainly beats MAI-Thinking-1 on coding (80.6% SWE-bench vs Sonnet 4.6-class). MAI-Thinking-1's advantage is enterprise compliance: commercially licensed data, Azure integration, and no Chinese-provider concerns. If you don't have those constraints, DeepSeek is better and available today.
Should I wait for MAI models or use alternatives?
Use alternatives now. Qwen 3.7 27B locally or DeepSeek V4-Pro via API both exceed MAI-Thinking-1's claimed Sonnet 4.6-class quality at similar or lower cost, and they're available today.
What about the Surface RTX Spark Dev Box?
The Surface RTX Spark Dev Box ships with Windows, CUDA, and the full dev stack preloaded. Even without MAI-Thinking-1 weights, it can run every open-weight 35B model at full speed. It is the ideal hardware for local AI development on Windows, whether you're running Microsoft's models or open alternatives. If MAI-Thinking-1 ever becomes available locally, the Dev Box would run it effortlessly at Q4 quantization (~20GB of its 128GB used).
Is there a timeline for MAI models becoming open?
No. Microsoft has not announced any plans to open-source MAI-Thinking-1 or MAI-Code-1-Flash. Their smaller Phi models (research-focused) are open, but commercial MAI models appear to be permanent Azure exclusives. The Aion on-device models ship with hardware but whether they are extractable/redistributable is unclear.