
NVIDIA RTX Spark: Complete Guide to the AI-First Windows PC (2026)


NVIDIA unveiled RTX Spark at Computex on June 1, 2026: a new class of Windows PC designed specifically for running AI agents locally. It combines an ARM CPU with a Blackwell GPU and 128GB of unified memory in a form factor that ranges from slim laptops to compact desktops. The headline claim: it can run 120-billion-parameter LLMs with up to 1 million tokens of context on-device.

This is not a GPU upgrade. It is an entirely new product category: NVIDIA is entering the Windows PC market directly, competing with Apple Silicon on unified memory and with Intel/AMD on the CPU side. PCs from ASUS, Dell, HP, Lenovo, Microsoft Surface, and MSI will ship this fall.

For developers running AI models locally, RTX Spark is the first Windows machine purpose-built for the task. Here is everything we know.

Specs

Chip name | RTX Spark Superchip (also called N1X)
CPU | ARM-based (Windows on ARM)
GPU | Blackwell architecture
Unified memory | 128GB (shared between CPU and GPU)
AI compute | Up to 1 petaflop
LLM capacity | 120B parameters, up to 1M token context
OS | Windows (ARM)
Form factors | Laptops (slim, all-day battery) and desktops
Availability | Fall 2026
OEM partners | ASUS, Dell, HP, Lenovo, Microsoft Surface, MSI
Software stack | CUDA, llama.cpp, vLLM, LM Studio, TensorRT
Security | NVIDIA OpenShell (local agent security runtime)

What models can run on RTX Spark?

The 128GB unified memory determines what fits. Here is a realistic breakdown based on model sizes and quantization:

Model | Parameters | Quantization | Memory needed | Runs on RTX Spark?
--- | --- | --- | --- | ---
Qwen 3.6 27B | 27B | Q4_K_M | ~16GB | ✅ Easily
Qwen 3.6 35B | 35B (3B active) | FP16 | ~70GB | ✅ Fits
Llama 4 Scout | 109B (17B active) | Q4_K_M | ~60GB | ✅ Fits
DeepSeek V4 Flash | 70B (estimated active) | Q4_K_M | ~40GB | ✅ Fits
Qwen 3.6 27B | 27B | FP16 | ~54GB | ✅ Fits
MiniMax M3 | 200-400B (estimated) | Q4_K_M | ~100-200GB | ⚠️ Tight/unlikely
DeepSeek V4-Pro | 1.6T (49B active) | Q4_K_M | 200GB+ | ❌ Too large
MiMo V2.5 Pro | Dense (large) | Any | >128GB | ❌ Too large
Claude Opus 4.8 | Closed source | N/A | N/A | ❌ API only

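The sweet spot is dense models up to roughly 60B parameters at FP16, or about 120B at Q4 quantization. At 2 bytes per parameter, a 70B model at FP16 needs about 140GB for weights alone, so 70B-class models need 8-bit or lower quantization to fit in 128GB. NVIDIA specifically benchmarked with Qwen 3.6 models and demonstrated a 2× throughput improvement on the 27B variant.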
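The memory figures in the table follow from simple arithmetic: total parameter count times bytes per parameter. The sketch below is a rough estimator, not a measurement. The bits-per-weight values for the quantized formats are approximate averages and the 16GB headroom for the OS, KV cache, and activations is a hypothetical allowance. Note that mixture-of-experts models must hold all of their parameters in memory, so the total count matters here, not the active count.

```python
# Approximate bits per weight. FP16 is exact; the Q8_0 and Q4_K_M values are
# rough averages for llama.cpp GGUF files and are assumptions, not specs.
BITS_PER_WEIGHT = {"FP16": 16.0, "Q8_0": 8.5, "Q4_K_M": 4.8}

UNIFIED_MEMORY_GB = 128  # from the spec table


def weight_memory_gb(total_params_billions: float, quant: str) -> float:
    """Memory for the weights alone, in GB (1 GB = 1e9 bytes)."""
    bytes_per_weight = BITS_PER_WEIGHT[quant] / 8
    return total_params_billions * bytes_per_weight


def fits(total_params_billions: float, quant: str, headroom_gb: float = 16.0) -> bool:
    """True if the weights plus a hypothetical headroom fit in unified memory."""
    needed = weight_memory_gb(total_params_billions, quant) + headroom_gb
    return needed <= UNIFIED_MEMORY_GB


if __name__ == "__main__":
    rows = [
        ("Qwen 3.6 27B", 27, "Q4_K_M"),
        ("Qwen 3.6 27B", 27, "FP16"),
        ("Llama 4 Scout", 109, "Q4_K_M"),  # MoE: use total, not active, parameters
    ]
    for name, params, quant in rows:
        gb = weight_memory_gb(params, quant)
        print(f"{name:14s} {quant:7s} ~{gb:.1f} GB  fits={fits(params, quant)}")
```

Real files vary by a few GB because different tensors are quantized at different precisions, so treat the output as a first-pass filter rather than a guarantee.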

Models that will run great on RTX Spark:

  • Qwen 3.6 27B and 35B (NVIDIA's optimized targets)
  • Llama 4 Scout (109B MoE, 17B active; fits comfortably)
  • Gemma 4 variants
  • Mistral Medium 3.5
  • Any model under 70B dense or 120B MoE

Models that will NOT run on RTX Spark:

  • DeepSeek V4-Pro (1.6T total, even quantized needs 200GB+)
  • MiMo V2.5 Pro (dense architecture, too large for 128GB)
  • MiniMax M3 (estimated 200-400B, likely too large)
  • Any model requiring >128GB total memory

Performance: what NVIDIA demonstrated

NVIDIA showed concrete llama.cpp benchmarks at Computex:

  • Qwen 3.6 27B: 2× throughput improvement with multi-token prediction + optimizations
  • Qwen 3.6 35B: 1.6× throughput improvement
  • Multi-GPU (2× GPUs): 2× memory and 1.8× compute via tensor parallelism

These optimizations are available in llama.cpp and LM Studio today on existing RTX hardware, and will be further optimized for RTX Spark at launch.
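To make the tensor parallelism figure concrete, here is a toy sketch of the column-parallel idea in plain Python: the weight matrix is split column-wise so each device stores half of it (hence 2× memory), each computes its slice of the output, and the slices are concatenated. This is an illustration of the general technique, not NVIDIA's or llama.cpp's implementation, and all function names are hypothetical.

```python
def matmul(x, w):
    """Multiply a vector x (length n) by a matrix w (n rows, m columns)."""
    return [sum(x[i] * w[i][j] for i in range(len(x))) for j in range(len(w[0]))]


def split_columns(w, parts):
    """Split w column-wise into `parts` equal shards, one per device."""
    cols = len(w[0])
    if cols % parts != 0:
        raise ValueError("column count must be divisible by the number of shards")
    step = cols // parts
    return [[row[k * step:(k + 1) * step] for row in w] for k in range(parts)]


def tensor_parallel_matmul(x, w, parts=2):
    """Compute x @ w with each shard handled independently, then concatenate."""
    shards = split_columns(w, parts)
    # On real hardware each shard's matmul would run on a separate GPU at once.
    partials = [matmul(x, shard) for shard in shards]
    return [value for part in partials for value in part]


def scaling_efficiency(speedup, devices):
    """Fraction of ideal linear scaling achieved."""
    return speedup / devices
```

By this measure, the reported 1.8× compute on two GPUs corresponds to a scaling efficiency of 1.8 / 2 = 0.9. The remainder is typically lost to communication between devices.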

The 1M token context claim is significant: most local setups are limited to 32-128K tokens due to memory constraints. With 128GB unified memory, RTX Spark can maintain much larger contexts without swapping to disk.
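The reason context length is memory-bound is the KV cache, which grows linearly with the number of tokens. The sketch below uses the standard formula for a transformer with grouped-query attention. The layer count, KV head count, and head dimension are hypothetical values chosen for illustration, not the configuration of any model named in this article.

```python
def kv_cache_gb(tokens, layers, kv_heads, head_dim, bytes_per_value=2):
    """Approximate KV cache size in GB (1 GB = 1e9 bytes).

    The leading 2 accounts for storing one key and one value vector
    per token, per layer, per KV head. bytes_per_value=2 means FP16.
    """
    return 2 * tokens * layers * kv_heads * head_dim * bytes_per_value / 1e9


if __name__ == "__main__":
    # Hypothetical architecture: 32 layers, 4 KV heads, head dimension 128.
    for tokens in (32_000, 128_000, 1_000_000):
        gb = kv_cache_gb(tokens, layers=32, kv_heads=4, head_dim=128)
        print(f"{tokens:>9,} tokens -> ~{gb:.1f} GB of KV cache")
```

Under these assumptions each token costs 64 KiB of cache, so a 1M-token context needs tens of gigabytes on top of the weights. That is well beyond a typical discrete GPU's VRAM, but plausible within a 128GB unified pool. Models with more layers or KV heads scale this up proportionally.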

RTX Spark vs Mac Studio for local AI

The most obvious comparison is Apple's Mac Studio with M4 Ultra:

Feature | NVIDIA RTX Spark | Apple Mac Studio M4 Ultra
--- | --- | ---
Unified memory | 128GB | Up to 192GB
GPU compute | 1 PFLOP (Blackwell) | ~27 TFLOPS (Apple GPU)
AI-specific hardware | Tensor Cores, TensorRT | Neural Engine
Max model size | ~120B params | ~140B params (192GB config)
OS | Windows (ARM) | macOS
CUDA support | ✅ Native | ❌
llama.cpp optimization | ✅ NVIDIA-optimized | ✅ Metal-optimized
vLLM support | ✅ | ❌
Price (estimated) | TBD (likely $2,000-4,000) | $4,000-7,000
Form factor | Laptops + desktops | Desktop only
Battery (laptop) | All-day | N/A (desktop)

RTX Spark's advantage: CUDA ecosystem, TensorRT acceleration, laptop form factor, likely lower price. Mac Studio's advantage: more memory (192GB option), mature ecosystem for local AI, proven performance.

For a deeper dive on Mac options, see our best AI models for Mac 2026 guide.

The NVIDIA OpenShell security layer

RTX Spark ships with NVIDIA OpenShell, a runtime for running AI agents securely on Windows. It provides:

  • Policy controls: Define what agents can and cannot do
  • Privacy routing: Automatically route queries to local models based on privacy settings
  • Data masking: Disguise personal information in queries sent to cloud models
  • Sandboxing: Contain agent actions within defined boundaries

This addresses the main concern with running AI agents locally: security. OpenShell is being adopted by Hermes Agent and OpenClaw (both popular open-source agent frameworks).
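NVIDIA has not published OpenShell's API in the material covered here, so the following is only a conceptual sketch of what privacy routing and data masking mean in practice. Every name in it is hypothetical and none of it is OpenShell code: private queries stay on the local model, and anything sent to the cloud has obvious personal identifiers replaced first.

```python
import re

# Deliberately simple patterns for illustration. Real PII detection needs
# far more than two regular expressions.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
PHONE = re.compile(r"\+?\d[\d\s().-]{7,}\d")


def mask_personal_data(text: str) -> str:
    """Replace email addresses and phone-like numbers with placeholders."""
    text = EMAIL.sub("[EMAIL]", text)
    return PHONE.sub("[PHONE]", text)


def route(query: str, private: bool) -> tuple[str, str]:
    """Return (destination, payload) for a query.

    Private queries go to the local model unchanged; everything else
    goes to the cloud with personal data masked.
    """
    if private:
        return "local", query
    return "cloud", mask_personal_data(query)
```

For example, `route("Mail jane.doe@example.com or call +1 555-123-4567", private=False)` yields a cloud-bound payload with both identifiers replaced by placeholders, while the same call with `private=True` keeps the text intact and on-device.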

Who should wait for RTX Spark?

Buy RTX Spark if you:

  • Want to run 27-70B parameter models locally on a laptop
  • Need CUDA support (many AI tools require it)
  • Want the Windows ecosystem with native AI agent support
  • Are currently limited by GPU VRAM on your existing setup
  • Need a machine that handles both AI workloads AND daily work (gaming, creative apps)

Stick with your current setup if you:

  • Already have a Mac Studio with 128-192GB (similar capability, available now)
  • Only need small models (<14B) that run on any modern hardware
  • Primarily use API-based models (DeepSeek, MiMo) where local isn't needed
  • Cannot wait until fall 2026

Stick with API models if:

  • Your workloads need models larger than 120B parameters
  • You need DeepSeek V4-Pro or MiMo V2.5 Pro quality (too large for 128GB)
  • The cost of API calls is lower than the hardware investment for your volume
  • You need to run multiple models simultaneously

DGX Spark vs RTX Spark

NVIDIA also sells DGX Spark, a more powerful deskside workstation aimed at developers and researchers:

Feature | RTX Spark | DGX Spark
--- | --- | ---
Target user | Consumer/prosumer | Developer/researcher
OS | Windows | Linux (Ubuntu)
Memory | 128GB unified | 128GB unified
GPU | Blackwell (consumer) | Blackwell (data center class)
Use case | Personal AI agent PC | Always-on development server
Price range | ~$2,000-4,000 (est.) | ~$3,000-5,000 (est.)

DGX Spark runs Linux and is designed as an always-on AI development machine. RTX Spark is a Windows PC you also use for everything else. For most developers who want to run models locally alongside their daily workflow, RTX Spark is the better fit.

Software ecosystem at launch

RTX Spark will launch with support for:

  • llama.cpp: With NVIDIA-optimized multi-token prediction (2× performance)
  • vLLM: Server-grade inference with NVFP4 checkpoint support
  • LM Studio: Consumer-friendly GUI for running models
  • ComfyUI: AI image/video generation with multi-GPU optimizations
  • TensorRT: NVIDIA's inference optimizer
  • OpenShell: Secure agent runtime
  • Hermes Agent: Open-source AI agent (OpenShell integrated)
  • OpenClaw: Open-source agent framework
  • H Company Holo: Computer-use agent harness

For guides on running models with these tools, see our Ollama complete guide, LM Studio guide, and vLLM vs Ollama comparison.

What this means for self-hosted AI

RTX Spark represents a shift in who can run serious AI models locally:

Before RTX Spark: Running 70B+ models required a multi-GPU server ($10K+), a Mac Studio ($4K-7K), or cloud GPU rentals. The barrier was high.

After RTX Spark: A Windows laptop or compact desktop with 128GB unified memory can run 120B models. Multiple PC manufacturers will compete on price. The barrier drops to consumer hardware levels.

This accelerates the trend away from API dependence. If you can run Qwen 3.6 35B or Llama 4 Scout locally with no per-token cost, the break-even point for self-hosting vs API shifts dramatically.
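The break-even point can be estimated with one division. The sketch below is a simplification that ignores electricity, depreciation, and quality differences between a local model and a hosted one. The inputs in the usage note are assumptions: a $3,000 machine sits in the middle of the speculative price range quoted in this article, and $0.435 per million tokens is the API price cited in the FAQ.

```python
def break_even_tokens(hardware_cost_usd: float, api_price_per_million_usd: float) -> float:
    """Tokens you must generate locally before the hardware pays for itself."""
    return hardware_cost_usd / api_price_per_million_usd * 1_000_000


def break_even_days(hardware_cost_usd: float,
                    api_price_per_million_usd: float,
                    tokens_per_day: float) -> float:
    """Days of sustained usage needed to reach break-even."""
    tokens = break_even_tokens(hardware_cost_usd, api_price_per_million_usd)
    return tokens / tokens_per_day


if __name__ == "__main__":
    # Hypothetical inputs; replace with your own price and daily volume.
    tokens = break_even_tokens(3_000, 0.435)
    days = break_even_days(3_000, 0.435, tokens_per_day=10_000_000)
    print(f"Break-even after ~{tokens / 1e9:.1f}B tokens (~{days:.0f} days at 10M tokens/day)")
```

The result is very sensitive to daily volume and to which API you compare against, so it is worth running with your own numbers before treating local hardware as the cheaper option.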

FAQ

When can I buy RTX Spark?

Fall 2026. NVIDIA has not announced a specific date. PCs from ASUS, Dell, HP, Lenovo, Microsoft Surface, and MSI will be available.

How much will RTX Spark cost?

Not announced. Based on the specs (128GB unified memory, Blackwell GPU), expect $2,000-4,000 for desktops and $2,500-5,000 for laptops. This is speculative; NVIDIA has not published pricing.

Can I run DeepSeek V4-Pro or MiMo V2.5 Pro locally on RTX Spark?

No. DeepSeek V4-Pro has 1.6T total parameters (even at Q4 quantization it needs 200GB+). MiMo V2.5 Pro is a large dense model that exceeds 128GB. These models require multi-GPU server setups or API access. Stick with the DeepSeek API at $0.435/M tokens or MiMo API at $0.435/M.

What's the biggest model I can run?

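Approximately 120B parameters at Q4 quantization, or roughly 60B at FP16 (a 70B model at FP16 needs about 140GB for weights alone, so it requires 8-bit or lower quantization). NVIDIA specifically demonstrated 120B models with 1M token context. MoE models where only a subset of parameters are active (like Llama 4 Scout at 109B total but 17B active) run particularly well, although all of their parameters still have to fit in memory.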

How does it compare to a Mac Studio?

Similar unified memory concept (128GB vs up to 192GB on Mac). RTX Spark has CUDA support (critical for many AI tools), likely lower pricing, and comes in laptop form factor. Mac Studio has more memory options and a mature local AI ecosystem. See the comparison table above.

Will it run Ollama?

Yes. Ollama uses llama.cpp under the hood, which will be NVIDIA-optimized for RTX Spark with 2× performance gains. See our Ollama setup guide.
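For reference, this is roughly what talking to a local Ollama server looks like from Python, using only the standard library. It assumes Ollama is running on its default port (11434) and uses its `/api/generate` endpoint. The model tag in the usage comment is a placeholder, since the exact tag for any given model depends on what has been published to the Ollama library.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint


def build_request(model: str, prompt: str) -> urllib.request.Request:
    """Build a non-streaming generate request for a local Ollama server."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(model: str, prompt: str, timeout: float = 120.0) -> str:
    """Send the prompt and return the generated text (requires a running server)."""
    with urllib.request.urlopen(build_request(model, prompt), timeout=timeout) as resp:
        return json.loads(resp.read())["response"]


# Usage, with a placeholder model tag:
#   print(generate("your-model-tag", "Summarize unified memory in one sentence."))
```

Because the request goes to localhost, nothing leaves the machine and there is no per-token charge, which is the main appeal of the setup described in this article.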

Is this better than cloud GPUs?

For sustained local use: yes. A one-time hardware purchase eliminates per-hour GPU rental costs. For occasional use or models that don't fit in 128GB: no, cloud GPUs (A100 80GB, H100) still offer more memory and compute. See our self-hosted vs API comparison for the break-even analysis.

Does it support multi-GPU?

RTX Spark is a single-chip design. For multi-GPU setups, look at DGX Station for Windows (data-center-class GPU in a desktop) or build a custom multi-GPU rig with GeForce RTX 5090s.