Jun 3, 2026 · 8 min read

NVIDIA RTX Spark: Complete Guide to the AI-First Windows PC (2026)

NVIDIA unveiled RTX Spark at Computex on June 1, 2026 — a new class of Windows PC designed specifically for running AI agents locally. It combines an ARM CPU with a Blackwell GPU and 128GB of unified memory in a form factor that ranges from slim laptops to compact desktops. The headline claim: it can run 120-billion-parameter LLMs with up to 1 million tokens of context on-device.

This is not a GPU upgrade. It is an entirely new product category — NVIDIA entering the Windows PC market directly, competing with Apple Silicon on unified memory and with Intel/AMD on the CPU side. PCs from ASUS, Dell, HP, Lenovo, Microsoft Surface, and MSI will ship this fall.

For developers running AI models locally, RTX Spark is the first Windows machine purpose-built for the task. Here is everything we know.

Specs

Chip name	RTX Spark Superchip (also called N1X)
CPU	ARM-based (Windows on ARM)
GPU	Blackwell architecture
Unified memory	128GB (shared between CPU and GPU)
AI compute	Up to 1 petaflop
LLM capacity	120B parameters, up to 1M token context
OS	Windows (ARM)
Form factors	Laptops (slim, all-day battery) and desktops
Availability	Fall 2026
OEM partners	ASUS, Dell, HP, Lenovo, Microsoft Surface, MSI
Software stack	CUDA, llama.cpp, vLLM, LM Studio, TensorRT
Security	NVIDIA OpenShell (local agent security runtime)

What models can run on RTX Spark?

The 128GB unified memory determines what fits. Here is a realistic breakdown based on model sizes and quantization (memory figures cover the weights only; long contexts need extra room for the KV cache):

Model	Parameters	Quantization	Memory needed	Runs on RTX Spark?
Qwen 3.6 27B	27B	Q4_K_M	~16GB	✅ Easily
Qwen 3.6 35B	35B (3B active)	FP16	~70GB	✅ Fits
Llama 4 Scout	109B (17B active)	Q4_K_M	~60GB	✅ Fits
DeepSeek V4 Flash	70B (estimated active)	Q4_K_M	~40GB	✅ Fits
Qwen 3.6 27B	27B	FP16	~54GB	✅ Fits
MiniMax M3	200-400B (estimated)	Q4_K_M	~100-200GB	⚠️ Tight/unlikely
DeepSeek V4-Pro	1.6T (49B active)	Q4_K_M	~900GB+	❌ Too large
MiMo V2.5 Pro	Dense (large)	Any	>128GB	❌ Too large
Claude Opus 4.8	Closed source	N/A	N/A	❌ API only

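The sweet spot is models up to roughly 50B parameters at FP16 or ~120B at Q4 quantization; a 70B dense model at FP16 needs about 140GB for its weights alone, more than the entire 128GB pool. NVIDIA specifically benchmarked with Qwen 3.6 models and demonstrated a 2× throughput improvement on the 27B variant.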
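
The memory column follows a simple rule of thumb: weight memory ≈ parameters × bits per weight ÷ 8. The sketch below assumes approximate averages of 16 bits (FP16), 8.5 bits (Q8_0) and about 4.85 bits (Q4_K_M) per weight, plus a hypothetical 16GB of headroom for the KV cache, OS and runtime. Real GGUF files differ by a few percent, so treat the output as an estimate.

```python
# Approximate average bits stored per weight (assumed; real GGUF files vary slightly).
BITS_PER_WEIGHT = {"FP16": 16.0, "Q8_0": 8.5, "Q4_K_M": 4.85}


def weight_memory_gb(params_billion: float, fmt: str) -> float:
    """Memory for the weights alone, in GB (1 GB = 1e9 bytes)."""
    # params_billion * 1e9 weights * bits / 8 bits-per-byte / 1e9 bytes-per-GB
    return params_billion * BITS_PER_WEIGHT[fmt] / 8


def fits(params_billion: float, fmt: str,
         budget_gb: float = 128.0, headroom_gb: float = 16.0) -> bool:
    """True if the weights plus a hypothetical headroom fit in the memory budget."""
    return weight_memory_gb(params_billion, fmt) + headroom_gb <= budget_gb


for name, params, fmt in [
    ("Qwen 3.6 27B", 27, "Q4_K_M"),
    ("Qwen 3.6 27B", 27, "FP16"),
    ("70B dense", 70, "FP16"),
    ("Llama 4 Scout", 109, "Q4_K_M"),
    ("DeepSeek V4-Pro", 1600, "Q4_K_M"),
]:
    gb = weight_memory_gb(params, fmt)
    print(f"{name:16} {fmt:7} {gb:7.1f} GB  fits={fits(params, fmt)}")
```

Under these assumptions, `fits(70, "FP16")` is False (140GB of weights), while `fits(109, "Q4_K_M")` is True because Llama 4 Scout's weights come to roughly 66GB.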

Models that will run great on RTX Spark:

Qwen 3.6 27B and 35B (NVIDIA’s optimized targets)
Llama 4 Scout (109B MoE, 17B active — fits comfortably)
Gemma 4 variants
Mistral Medium 3.5
Any model under 70B dense or 120B MoE

Models that will NOT run on RTX Spark:

DeepSeek V4-Pro (1.6T total; even at Q4 the weights alone need roughly 900GB)
MiMo V2.5 Pro (dense architecture, too large for 128GB)
MiniMax M3 (estimated 200-400B, likely too large)
Any model requiring >128GB total memory

Performance: what NVIDIA demonstrated

NVIDIA showed concrete llama.cpp benchmarks at Computex:

Qwen 3.6 27B: 2× throughput improvement with multi-token prediction + optimizations
Qwen 3.6 35B: 1.6× throughput improvement
Multi-GPU (2× GPUs): 2× memory and 1.8× compute via tensor parallelism
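
Tensor parallelism splits each weight matrix across devices, so each GPU stores and multiplies only its own shard. The pure-Python sketch below simulates a column-parallel matrix multiply on two hypothetical "GPUs" (no real devices are involved). Each shard holds half the columns, which is why usable memory doubles, and concatenating the partial outputs reproduces the full result. Real implementations also pay for inter-GPU communication, which is one plausible reason compute scaled 1.8× rather than 2×.

```python
def matmul(a, b):
    """Plain matrix multiply: (n x k) @ (k x m) -> (n x m)."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def split_columns(w, parts):
    """Split W column-wise into `parts` equal shards (columns must divide evenly)."""
    cols = len(w[0])
    assert cols % parts == 0, "column count must divide evenly"
    step = cols // parts
    return [[row[p * step:(p + 1) * step] for row in w] for p in range(parts)]


def tensor_parallel_matmul(x, w, parts=2):
    """Column-parallel X @ W: each simulated device holds one shard of W."""
    shards = split_columns(w, parts)           # device p stores only shards[p]
    partials = [matmul(x, s) for s in shards]  # computed independently per device
    # "All-gather": concatenate each row's partial columns in device order.
    return [[v for p in partials for v in p[i]] for i in range(len(x))]


x = [[1.0, 2.0], [3.0, 4.0]]
w = [[1.0, 2.0, 3.0, 4.0], [5.0, 6.0, 7.0, 8.0]]
print(tensor_parallel_matmul(x, w, parts=2) == matmul(x, w))
```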

These optimizations are available today in llama.cpp and LM Studio on existing RTX hardware, and NVIDIA says it will tune them further for RTX Spark by launch.
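
NVIDIA has not detailed how multi-token prediction works inside llama.cpp. Under the common speculative-decoding view (an assumption here), extra prediction heads draft k tokens and the main model verifies them in a single forward pass. If each drafted token is accepted independently with probability a, the expected number of tokens emitted per pass is 1 + a + … + a^k = (1 − a^(k+1)) / (1 − a):

```python
def expected_tokens_per_pass(acceptance: float, k: int) -> float:
    """Expected tokens emitted per verification pass when k drafted tokens are
    each accepted with independent probability `acceptance`."""
    if not 0.0 <= acceptance <= 1.0:
        raise ValueError("acceptance must be in [0, 1]")
    if acceptance == 1.0:
        return float(k + 1)  # every draft accepted, plus the verifier's own token
    # Geometric series 1 + a + a^2 + ... + a^k
    return (1 - acceptance ** (k + 1)) / (1 - acceptance)


for a in (0.5, 0.7, 0.9):
    print(a, [round(expected_tokens_per_pass(a, k), 2) for k in (1, 2, 3)])
```

With a = 0.7 and k = 2 this gives about 2.19 tokens per pass, so a roughly 2× speedup is plausible when verification is cheap. The real acceptance rate depends on the model and the prompt.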

The 1M token context claim is significant — most local setups are limited to 32-128K tokens due to memory constraints. With 128GB unified memory, RTX Spark can maintain much larger contexts without swapping to disk.
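
Context length is limited mainly by the KV cache, which grows linearly with tokens: 2 (keys and values) × layers × KV heads × head dimension × bytes per element × tokens. The sketch below uses a hypothetical dense-attention configuration (64 layers, 8 KV heads, head dimension 128). These numbers are illustrative and do not come from any specific model's published config.

```python
def kv_cache_gb(n_layers: int, n_kv_heads: int, head_dim: int,
                n_tokens: int, bytes_per_elem: float = 2.0) -> float:
    """KV-cache size in GB (1e9 bytes) for one sequence.

    Factor 2 = one key and one value vector per layer, per KV head, per token.
    bytes_per_elem = 2 for an FP16 cache, 1 for an 8-bit quantized cache.
    """
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem * n_tokens / 1e9


# Hypothetical dense-attention config, not any specific model's published numbers.
LAYERS, KV_HEADS, HEAD_DIM = 64, 8, 128

for tokens in (32_768, 131_072, 1_000_000):
    fp16 = kv_cache_gb(LAYERS, KV_HEADS, HEAD_DIM, tokens)
    q8 = kv_cache_gb(LAYERS, KV_HEADS, HEAD_DIM, tokens, bytes_per_elem=1.0)
    print(f"{tokens:>9,} tokens: {fp16:6.1f} GB (FP16)  {q8:6.1f} GB (8-bit)")
```

For this hypothetical model, 128K tokens needs about 34GB of cache, but 1M tokens needs about 262GB at FP16 (131GB at 8-bit). That is more than the whole 128GB pool. A 1M-token context therefore depends on a smaller or more cache-efficient model, KV-cache quantization, or an architecture that keeps a full cache in only some layers. NVIDIA has not said which approach its demo used.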

RTX Spark vs Mac Studio for local AI

The most obvious comparison is Apple’s Mac Studio with M4 Ultra:

	NVIDIA RTX Spark	Apple Mac Studio M4 Ultra
Unified memory	128GB	Up to 192GB
GPU compute	1 PFLOP (Blackwell)	~27 TFLOPS (Apple GPU)
AI-specific hardware	Tensor Cores, TensorRT	Neural Engine
Max model size	~120B params	~140B params (192GB config)
OS	Windows (ARM)	macOS
CUDA support	✅ Native	❌
llama.cpp optimization	✅ NVIDIA-optimized	✅ Metal-optimized
vLLM support	✅	❌
Price (estimated)	TBD (likely $2,000-4,000)	$4,000-7,000
Form factor	Laptops + desktops	Desktop only
Battery (laptop)	All-day	N/A (desktop)

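RTX Spark’s advantage: CUDA ecosystem, TensorRT acceleration, laptop form factor, likely lower price. Mac Studio’s advantage: more memory (192GB option), mature ecosystem for local AI, proven performance. The compute row is not a like-for-like comparison: NVIDIA’s petaflop is a headline AI figure, likely quoted at low precision as it is for DGX Spark (FP4), while ~27 TFLOPS is general-purpose GPU throughput.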

For a deeper dive on Mac options, see our best AI models for Mac 2026 guide.

The NVIDIA OpenShell security layer

RTX Spark ships with NVIDIA OpenShell — a runtime for running AI agents securely on Windows. It provides:

Policy controls: Define what agents can and cannot do
Privacy routing: Automatically route queries to local models based on privacy settings
Data masking: Disguise personal information in queries sent to cloud models
Sandboxing: Contain agent actions within defined boundaries
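
OpenShell’s actual policy format has not been published in detail, so the sketch below is purely hypothetical: `PrivacyPolicy`, `route` and `mask` are illustrative names, not OpenShell APIs. It shows how privacy routing and data masking can fit together. Queries containing detected personal data stay on the local model when the policy requires it; anything sent to the cloud is masked first. The regular expressions are deliberately simple and would miss many real-world formats.

```python
import re
from dataclasses import dataclass

# Simple patterns for illustration only; production PII detection needs far more.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
PHONE = re.compile(r"\+?\d[\d\s().-]{7,}\d")


@dataclass
class PrivacyPolicy:
    """Hypothetical policy object; not OpenShell's real configuration format."""
    keep_sensitive_local: bool = True


def contains_pii(text: str) -> bool:
    return bool(EMAIL.search(text) or PHONE.search(text))


def mask(text: str) -> str:
    """Replace detected emails and phone numbers with placeholders."""
    return PHONE.sub("[PHONE]", EMAIL.sub("[EMAIL]", text))


def route(query: str, policy: PrivacyPolicy) -> tuple[str, str]:
    """Return (destination, text to send)."""
    if contains_pii(query) and policy.keep_sensitive_local:
        return "local", query    # sensitive: never leaves the machine
    return "cloud", mask(query)  # otherwise mask anything detected first


print(route("Email alice@example.com about the bill", PrivacyPolicy()))
print(route("Email alice@example.com", PrivacyPolicy(keep_sensitive_local=False)))
```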

This addresses the main concern with running AI agents locally: security. OpenShell is being adopted by Hermes Agent and OpenClaw (both popular open-source agent frameworks).

Who should wait for RTX Spark?

Buy RTX Spark if you:

Want to run 27-70B parameter models locally on a laptop
Need CUDA support (many AI tools require it)
Want the Windows ecosystem with native AI agent support
Are currently limited by GPU VRAM on your existing setup
Need a machine that handles both AI workloads AND daily work (gaming, creative apps)

Stick with your current setup if you:

Already have a Mac Studio with 128-192GB (similar capability, available now)
Only need small models (<14B) that run on any modern hardware
Primarily use API-based models (DeepSeek, MiMo) where local isn’t needed
Cannot wait until fall 2026

Stick with API models if you:

Need models larger than 120B parameters
Need DeepSeek V4-Pro or MiMo V2.5 Pro quality (too large for 128GB)
Would spend less on API calls at your volume than on the hardware
Need multiple models running simultaneously

DGX Spark vs RTX Spark

NVIDIA also sells DGX Spark — a more powerful deskside workstation aimed at developers and researchers:

	RTX Spark	DGX Spark
Target user	Consumer/prosumer	Developer/researcher
OS	Windows	Linux (Ubuntu)
Memory	128GB unified	128GB unified
GPU	Blackwell (consumer)	Blackwell (data center class)
Use case	Personal AI agent PC	Always-on development server
Price range	~$2,000-4,000 (est.)	~$3,000-5,000 (est.)

DGX Spark runs Linux and is designed as an always-on AI development machine. RTX Spark is a Windows PC you also use for everything else. For most developers who want to run models locally alongside their daily workflow, RTX Spark is the better fit.

Software ecosystem at launch

RTX Spark will launch with support for:

llama.cpp — With NVIDIA-optimized multi-token prediction (up to 2× throughput on Qwen 3.6 27B)
vLLM — Server-grade inference with NVFP4 checkpoint support
LM Studio — Consumer-friendly GUI for running models
ComfyUI — AI image/video generation with multi-GPU optimizations
TensorRT — NVIDIA’s inference optimizer
OpenShell — Secure agent runtime
Hermes Agent — Open-source AI agent (OpenShell integrated)
OpenClaw — Open-source agent framework
H Company Holo — Computer-use agent harness

For guides on running models with these tools, see our Ollama complete guide, LM Studio guide, and vLLM vs Ollama comparison.

What this means for self-hosted AI

RTX Spark represents a shift in who can run serious AI models locally:

Before RTX Spark: Running 70B+ models required a multi-GPU server ($10K+), a Mac Studio ($4K-7K), or cloud GPU rentals. The barrier was high.

After RTX Spark: A Windows laptop or compact desktop with 128GB unified memory can run 120B models. Multiple PC manufacturers will compete on price. The barrier drops to consumer hardware levels.

This accelerates the trend away from API dependence. If you can run Qwen 3.6 35B or Llama 4 Scout locally with no per-token API fees, the break-even point for self-hosting vs API shifts dramatically.
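
A simple break-even sketch makes the trade-off concrete. It assumes a hypothetical $3,000 machine (within this article’s speculative price range) and the $0.435 per million tokens API price quoted in the FAQ. An optional local running cost per million tokens can stand in for electricity. Depreciation and your own time are ignored.

```python
import math


def break_even_million_tokens(hardware_usd: float, api_usd_per_million: float,
                              local_usd_per_million: float = 0.0) -> float:
    """Millions of tokens after which buying hardware beats paying the API.

    local_usd_per_million is a hypothetical running cost (e.g. electricity).
    Returns math.inf when running locally saves nothing per token.
    """
    saving_per_million = api_usd_per_million - local_usd_per_million
    if saving_per_million <= 0:
        return math.inf
    return hardware_usd / saving_per_million


def months_to_break_even(hardware_usd: float, api_usd_per_million: float,
                         million_tokens_per_month: float,
                         local_usd_per_million: float = 0.0) -> float:
    total = break_even_million_tokens(hardware_usd, api_usd_per_million,
                                      local_usd_per_million)
    return total / million_tokens_per_month


# Assumptions: $3,000 machine, $0.435/M API price, no local running cost.
tokens_m = break_even_million_tokens(3000, 0.435)
print(f"break-even after ~{tokens_m / 1000:.1f}B tokens")
for monthly in (10, 100, 1000):  # millions of tokens per month
    months = months_to_break_even(3000, 0.435, monthly)
    print(f"{monthly:>5}M tokens/month -> {months:.1f} months")
```

At those numbers the hardware pays for itself after about 6.9 billion tokens. That works out to roughly 69 months at 100 million tokens a month, or about 7 months at a billion tokens a month. The answer therefore depends heavily on your volume and on how expensive the API model you would otherwise use is.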

FAQ

When can I buy RTX Spark?

Fall 2026. NVIDIA has not announced a specific date. PCs from ASUS, Dell, HP, Lenovo, Microsoft Surface, and MSI will be available.

How much will RTX Spark cost?

Not announced. Based on the specs (128GB unified memory, Blackwell GPU), expect $2,000-4,000 for desktops and $2,500-5,000 for laptops. This is speculative — NVIDIA has not published pricing.

Can I run DeepSeek V4-Pro or MiMo V2.5 Pro locally on RTX Spark?

No. DeepSeek V4-Pro has 1.6T total parameters, so even at Q4 quantization its weights need roughly 900GB. MiMo V2.5 Pro is a large dense model that exceeds 128GB. These models require multi-GPU server setups or API access; use the DeepSeek API ($0.435/M tokens) or the MiMo API ($0.435/M) instead.

What’s the biggest model I can run?

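Approximately 120B parameters at Q4 quantization, or around 50B at FP16 (a 70B model at FP16 needs ~140GB). NVIDIA specifically demonstrated 120B models with 1M token context. MoE models where only a subset of parameters is active (like Llama 4 Scout at 109B total but 17B active) run particularly well. All 109B parameters must still fit in memory, but each generated token only reads the 17B active ones, so generation is much faster than with a dense model of the same size.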
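
MoE models run well largely because of memory bandwidth. During generation, each token must read every active weight once, so a rough ceiling on decode speed is bandwidth ÷ bytes of active weights. RTX Spark’s bandwidth has not been announced. The example assumes a hypothetical 273 GB/s (the figure NVIDIA lists for DGX Spark) and about 4.85 bits per weight for Q4_K_M.

```python
def decode_upper_bound_tok_s(bandwidth_gb_s: float, active_params_billion: float,
                             bits_per_weight: float) -> float:
    """Bandwidth-bound ceiling on generation speed for a single sequence.

    Each generated token reads every active weight once, so
    tokens/s <= bandwidth / bytes of active weights. Ignores KV-cache reads,
    compute limits and overhead, so real speeds are lower.
    """
    active_gb = active_params_billion * bits_per_weight / 8
    return bandwidth_gb_s / active_gb


BANDWIDTH = 273.0   # assumed GB/s; RTX Spark's real figure is unannounced
Q4_K_M_BITS = 4.85  # approximate average bits per weight

scout = decode_upper_bound_tok_s(BANDWIDTH, 17, Q4_K_M_BITS)
dense = decode_upper_bound_tok_s(BANDWIDTH, 109, Q4_K_M_BITS)
print(f"Llama 4 Scout (17B active): {scout:.1f} tok/s max")
print(f"Dense 109B model:           {dense:.1f} tok/s max")
```

Under these assumptions, the ceiling is about 26 tokens/s for Scout’s 17B active parameters versus about 4 tokens/s if all 109B were dense. Real throughput will be lower than either bound.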

How does it compare to a Mac Studio?

Similar unified memory concept (128GB vs up to 192GB on Mac). RTX Spark has CUDA support (critical for many AI tools), likely lower pricing, and comes in laptop as well as desktop form factors. Mac Studio has more memory options and a mature local AI ecosystem. See the comparison table above.

Will it run Ollama?

Yes. Ollama uses llama.cpp under the hood, which NVIDIA is optimizing for RTX Spark (up to 2× throughput on Qwen 3.6 27B in NVIDIA’s demo). See our Ollama setup guide.
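
Ollama serves a local HTTP API, by default at http://localhost:11434, so scripts can call a local model using only the Python standard library. In this sketch, the model tag "qwen3.6:27b" mentioned below is a placeholder. Check `ollama list` for the tags you have actually pulled.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint


def build_generate_request(model: str, prompt: str) -> urllib.request.Request:
    """Build a non-streaming /api/generate request."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode("utf-8")
    return urllib.request.Request(
        OLLAMA_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(model: str, prompt: str, timeout: float = 120.0) -> str:
    """Send the prompt to a running Ollama server and return the generated text."""
    with urllib.request.urlopen(build_generate_request(model, prompt), timeout=timeout) as resp:
        return json.loads(resp.read())["response"]
```

With Ollama running and a model pulled, `generate("qwen3.6:27b", "Explain unified memory in one sentence.")` returns the completion as a string. Setting `"stream": False` trades token-by-token output for a single JSON reply, which keeps the sketch short.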

Is this better than cloud GPUs?

For sustained local use: yes. A one-time hardware purchase eliminates per-hour GPU rental costs. For occasional use or models that don’t fit in 128GB: no, multi-GPU cloud instances (A100 80GB, H100) still offer more total memory and compute. See our self-hosted vs API comparison for the break-even analysis.

Does it support multi-GPU?

RTX Spark is a single-chip design. For multi-GPU setups, look at DGX Station for Windows (data-center-class GPU in a desktop) or build a custom multi-GPU rig with GeForce RTX 5090s.