NVIDIA unveiled RTX Spark at Computex on June 1, 2026 β a new class of Windows PC designed specifically for running AI agents locally. It combines an ARM CPU with a Blackwell GPU and 128GB of unified memory in a form factor that ranges from slim laptops to compact desktops. The headline claim: it can run 120-billion-parameter LLMs with up to 1 million tokens of context on-device.
This is not a GPU upgrade. It is an entirely new product category β NVIDIA entering the Windows PC market directly, competing with Apple Silicon on unified memory and with Intel/AMD on the CPU side. PCs from ASUS, Dell, HP, Lenovo, Microsoft Surface, and MSI will ship this fall.
For developers running AI models locally, RTX Spark is the first Windows machine purpose-built for the task. Here is everything we know.
Specs
| Chip name | RTX Spark Superchip (also called N1X) |
| CPU | ARM-based (Windows on ARM) |
| GPU | Blackwell architecture |
| Unified memory | 128GB (shared between CPU and GPU) |
| AI compute | Up to 1 petaflop |
| LLM capacity | 120B parameters, up to 1M token context |
| OS | Windows (ARM) |
| Form factors | Laptops (slim, all-day battery) and desktops |
| Availability | Fall 2026 |
| OEM partners | ASUS, Dell, HP, Lenovo, Microsoft Surface, MSI |
| Software stack | CUDA, llama.cpp, vLLM, LM Studio, TensorRT |
| Security | NVIDIA OpenShell (local agent security runtime) |
What models can run on RTX Spark?
The 128GB unified memory determines what fits. Here is a realistic breakdown based on model sizes and quantization:
| Model | Parameters | Quantization | Memory needed | Runs on RTX Spark? |
|---|---|---|---|---|
| Qwen 3.6 27B | 27B | Q4_K_M | ~16GB | β Easily |
| Qwen 3.6 35B | 35B (3B active) | FP16 | ~7GB | β Easily |
| Llama 4 Scout | 109B (17B active) | Q4_K_M | ~60GB | β Fits |
| DeepSeek V4 Flash | 70B (estimated active) | Q4_K_M | ~40GB | β Fits |
| Qwen 3.6 27B | 27B | FP16 | ~54GB | β Fits |
| MiniMax M3 | 200-400B (estimated) | Q4_K_M | ~100-200GB | β οΈ Tight/unlikely |
| DeepSeek V4-Pro | 1.6T (49B active) | Q4_K_M | ~100-200GB | β Too large |
| MiMo V2.5 Pro | Dense (large) | Any | >128GB | β Too large |
| Claude Opus 4.8 | Closed source | N/A | N/A | β API only |
The sweet spot is models up to ~70B parameters at FP16 or ~120B at Q4 quantization. NVIDIA specifically benchmarked with Qwen 3.6 models and demonstrated 2x performance improvements on the 27B variant.
Models that will run great on RTX Spark:
- Qwen 3.6 27B and 35B (NVIDIAβs optimized targets)
- Llama 4 Scout (109B MoE, 17B active β fits comfortably)
- Gemma 4 variants
- Mistral Medium 3.5
- Any model under 70B dense or 120B MoE
Models that will NOT run on RTX Spark:
- DeepSeek V4-Pro (1.6T total, even quantized needs 200GB+)
- MiMo V2.5 Pro (dense architecture, too large for 128GB)
- MiniMax M3 (estimated 200-400B, likely too large)
- Any model requiring >128GB total memory
Performance: what NVIDIA demonstrated
NVIDIA showed concrete llama.cpp benchmarks at Computex:
- Qwen 3.6 27B: 2Γ throughput improvement with multi-token prediction + optimizations
- Qwen 3.6 35B: 1.6Γ throughput improvement
- Multi-GPU (2Γ GPUs): 2Γ memory and 1.8Γ compute via tensor parallelism
These optimizations are available in llama.cpp and LM Studio today on existing RTX hardware, and will be further optimized for RTX Spark at launch.
The 1M token context claim is significant β most local setups are limited to 32-128K tokens due to memory constraints. With 128GB unified memory, RTX Spark can maintain much larger contexts without swapping to disk.
RTX Spark vs Mac Studio for local AI
The most obvious comparison is Appleβs Mac Studio with M4 Ultra:
| NVIDIA RTX Spark | Apple Mac Studio M4 Ultra | |
|---|---|---|
| Unified memory | 128GB | Up to 192GB |
| GPU compute | 1 PFLOP (Blackwell) | ~27 TFLOPS (Apple GPU) |
| AI-specific hardware | Tensor Cores, TensorRT | Neural Engine |
| Max model size | ~120B params | ~140B params (192GB config) |
| OS | Windows (ARM) | macOS |
| CUDA support | β Native | β |
| llama.cpp optimization | β NVIDIA-optimized | β Metal-optimized |
| vLLM support | β | β |
| Price (estimated) | TBD (likely $2,000-4,000) | $4,000-7,000 |
| Form factor | Laptops + desktops | Desktop only |
| Battery (laptop) | All-day | N/A (desktop) |
RTX Sparkβs advantage: CUDA ecosystem, TensorRT acceleration, laptop form factor, likely lower price. Mac Studioβs advantage: more memory (192GB option), mature ecosystem for local AI, proven performance.
For a deeper dive on Mac options, see our best AI models for Mac 2026 guide.
The NVIDIA OpenShell security layer
RTX Spark ships with NVIDIA OpenShell β a runtime for running AI agents securely on Windows. It provides:
- Policy controls: Define what agents can and cannot do
- Privacy routing: Automatically route queries to local models based on privacy settings
- Data masking: Disguise personal information in queries sent to cloud models
- Sandboxing: Contain agent actions within defined boundaries
This addresses the main concern with running AI agents locally: security. OpenShell is being adopted by Hermes Agent and OpenClaw (both popular open-source agent frameworks).
Who should wait for RTX Spark?
Buy RTX Spark if you:
- Want to run 27-70B parameter models locally on a laptop
- Need CUDA support (many AI tools require it)
- Want the Windows ecosystem with native AI agent support
- Are currently limited by GPU VRAM on your existing setup
- Need a machine that handles both AI workloads AND daily work (gaming, creative apps)
Stick with your current setup if you:
- Already have a Mac Studio with 128-192GB (similar capability, available now)
- Only need small models (<14B) that run on any modern hardware
- Primarily use API-based models (DeepSeek, MiMo) where local isnβt needed
- Cannot wait until fall 2026
Stick with API models if you:
- Your workloads need models larger than 120B parameters
- You need DeepSeek V4-Pro or MiMo V2.5 Pro quality (too large for 128GB)
- Cost of API calls is lower than hardware investment for your volume
- You need multiple models simultaneously
DGX Spark vs RTX Spark
NVIDIA also sells DGX Spark β a more powerful deskside workstation aimed at developers and researchers:
| RTX Spark | DGX Spark | |
|---|---|---|
| Target user | Consumer/prosumer | Developer/researcher |
| OS | Windows | Linux (Ubuntu) |
| Memory | 128GB unified | 128GB unified |
| GPU | Blackwell (consumer) | Blackwell (data center class) |
| Use case | Personal AI agent PC | Always-on development server |
| Price range | ~$2,000-4,000 (est.) | ~$3,000-5,000 (est.) |
DGX Spark runs Linux and is designed as an always-on AI development machine. RTX Spark is a Windows PC you also use for everything else. For most developers who want to run models locally alongside their daily workflow, RTX Spark is the better fit.
Software ecosystem at launch
RTX Spark will launch with support for:
- llama.cpp β With NVIDIA-optimized multi-token prediction (2Γ performance)
- vLLM β Server-grade inference with NVFP4 checkpoint support
- LM Studio β Consumer-friendly GUI for running models
- ComfyUI β AI image/video generation with multi-GPU optimizations
- TensorRT β NVIDIAβs inference optimizer
- OpenShell β Secure agent runtime
- Hermes Agent β Open-source AI agent (OpenShell integrated)
- OpenClaw β Open-source agent framework
- H Company Holo β Computer-use agent harness
For guides on running models with these tools, see our Ollama complete guide, LM Studio guide, and vLLM vs Ollama comparison.
What this means for self-hosted AI
RTX Spark represents a shift in who can run serious AI models locally:
Before RTX Spark: Running 70B+ models required a multi-GPU server ($10K+), a Mac Studio ($4K-7K), or cloud GPU rentals. The barrier was high.
After RTX Spark: A Windows laptop or compact desktop with 128GB unified memory can run 120B models. Multiple PC manufacturers will compete on price. The barrier drops to consumer hardware levels.
This accelerates the trend away from API dependence. If you can run Qwen 3.6 35B or Llama 4 Scout locally with no per-token cost, the break-even point for self-hosting vs API shifts dramatically.
FAQ
When can I buy RTX Spark?
Fall 2026. NVIDIA has not announced a specific date. PCs from ASUS, Dell, HP, Lenovo, Microsoft Surface, and MSI will be available.
How much will RTX Spark cost?
Not announced. Based on the specs (128GB unified memory, Blackwell GPU), expect $2,000-4,000 for desktops and $2,500-5,000 for laptops. This is speculative β NVIDIA has not published pricing.
Can I run DeepSeek V4-Pro or MiMo V2.5 Pro locally on RTX Spark?
No. DeepSeek V4-Pro has 1.6T total parameters (even at Q4 quantization it needs 200GB+). MiMo V2.5 Pro is a large dense model that exceeds 128GB. These models require multi-GPU server setups or API access. Stick with the DeepSeek API at $0.435/M tokens or MiMo API at $0.435/M.
Whatβs the biggest model I can run?
Approximately 120B parameters at Q4 quantization, or 70B at FP16. NVIDIA specifically demonstrated 120B models with 1M token context. MoE models where only a subset of parameters are active (like Llama 4 Scout at 109B total but 17B active) run particularly well.
How does it compare to a Mac Studio?
Similar unified memory concept (128GB vs up to 192GB on Mac). RTX Spark has CUDA support (critical for many AI tools), likely lower pricing, and comes in laptop form factor. Mac Studio has more memory options and a mature local AI ecosystem. See the comparison table above.
Will it run Ollama?
Yes. Ollama uses llama.cpp under the hood, which will be NVIDIA-optimized for RTX Spark with 2Γ performance gains. Ollama setup guide.
Is this better than cloud GPUs?
For sustained local use: yes. A one-time hardware purchase eliminates per-hour GPU rental costs. For occasional use or models that donβt fit in 128GB: no, cloud GPUs (A100 80GB, H100) still offer more memory and compute. See our self-hosted vs API comparison for the break-even analysis.
Does it support multi-GPU?
RTX Spark is a single-chip design. For multi-GPU setups, look at DGX Station for Windows (data-center-class GPU in a desktop) or build a custom multi-GPU rig with GeForce RTX 5090s.