
Run AI on a Raspberry Pi — Which Models Actually Work? (2026)


You can run AI models on a Raspberry Pi 5. Not toy demos — actual useful language models that answer questions, write code, and summarize text. The catch: you need to pick the right model and set realistic expectations about speed.

What actually works on a Pi 5

The Raspberry Pi 5 has 8GB RAM (or 16GB on the newer variant). That limits you to small models, but small models in 2026 are surprisingly capable.

| Model | Size | Speed on Pi 5 (8GB) | Speed on Pi 5 (16GB) | Quality |
|---|---|---|---|---|
| Qwen3.5-0.8B | 0.8B | ~8-10 tok/s | ~10-12 tok/s | Basic tasks, fast |
| Qwen 2.5 0.5B | 0.5B | ~12-15 tok/s | ~15 tok/s | Simple Q&A |
| TinyLlama 1.1B | 1.1B | ~6-8 tok/s | ~8-10 tok/s | Decent for size |
| Qwen3.5-4B (Q2) | 4B | Too slow | ~3-4 tok/s | Good quality, slow |
| Qwen3.5-35B-A3B (Q2) | 3B active | Too slow | ~4.5 tok/s | Surprisingly good |

The sweet spot is Qwen3.5-0.8B on 8GB or Qwen3.5-35B-A3B on 16GB. The 35B-A3B model is remarkable — it has 35B parameters worth of knowledge but only activates 3B per token, so it runs on minimal hardware while being much smarter than its speed suggests.
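The memory side of this is easy to sanity-check: a model's weight footprint is roughly parameters × bits per weight ÷ 8. A rough sketch (it ignores KV cache, activations, and runtime overhead, so treat the numbers as lower bounds):

```python
def model_size_gb(params_billion: float, bits_per_weight: float) -> float:
    """Rough weight-memory footprint in GB: params × bits / 8.

    Ignores KV cache, activations, and runtime overhead.
    """
    return params_billion * bits_per_weight / 8

print(model_size_gb(0.8, 4))   # Qwen3.5-0.8B at Q4: 0.4 GB, easy fit in 8GB
print(model_size_gb(35, 2))    # 35B-A3B at Q2: 8.75 GB, needs the 16GB Pi
```

Note that the MoE trick only helps speed: all 35B parameters must still sit in memory, which is why the A3B model is a 16GB-only option.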

Setup with Ollama

# Install Ollama on Pi 5
curl -fsSL https://ollama.com/install.sh | sh

# Run the best model for your Pi
ollama run qwen3.5:0.8b    # 8GB Pi
ollama run qwen3.5:35b-a3b  # 16GB Pi (Q2 quantization)

Ollama handles ARM compilation automatically. First run downloads the model — make sure you have an SD card or SSD with enough space.
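Once the model is running, you can talk to it from other devices on your network via Ollama's REST API (it listens on port 11434 by default). A minimal sketch using only the standard library — the helper names `build_generate_request` and `ask` are mine, not Ollama's, and the model tag is the one from the article:

```python
import json
import urllib.request

# Ollama's default local endpoint for one-shot generation
OLLAMA_URL = "http://localhost:11434/api/generate"

def build_generate_request(model: str, prompt: str) -> dict:
    """Payload for Ollama's /api/generate; stream=False returns one JSON object."""
    return {"model": model, "prompt": prompt, "stream": False}

def ask(model: str, prompt: str) -> str:
    """POST the prompt to the local Ollama server and return the generated text."""
    data = json.dumps(build_generate_request(model, prompt)).encode()
    req = urllib.request.Request(
        OLLAMA_URL, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

# Example (requires Ollama running and the model pulled):
# print(ask("qwen3.5:0.8b", "Summarize: the Pi 5 ships with 8GB or 16GB of RAM."))
```

This is the pattern behind most of the use cases below: the Pi sits on your LAN as a tiny private inference server, and anything that can make an HTTP request can query it.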

Use an SSD, not an SD card

This is critical. Model loading from an SD card is painfully slow. An NVMe SSD in a USB 3.0 enclosure (or attached to the Pi 5's PCIe connector via an M.2 HAT) makes a massive difference:

  • SD card: 2-3 minutes to load a model
  • NVMe SSD: 10-15 seconds

A 256GB NVMe SSD costs ~$25 and transforms the Pi experience.
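If you're not sure whether your storage is the bottleneck, a quick `dd` throughput check gives a ballpark (this writes a temporary 256MB file in the current directory; compare the reported MB/s on SD vs SSD):

```shell
# Sequential write, flushed to disk so the number is honest
dd if=/dev/zero of=./iotest.bin bs=1M count=256 conv=fsync 2>&1 | tail -n1
# Sequential read; may hit the page cache — reboot first (or use iflag=direct) for a cold read
dd if=./iotest.bin of=/dev/null bs=1M 2>&1 | tail -n1
rm ./iotest.bin
```

A decent SD card manages ~40-90 MB/s sequential; a USB 3.0 NVMe SSD should report several hundred, which is where the model-load difference comes from.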

Realistic expectations

At 3-10 tokens per second, a Pi is not replacing your laptop for interactive coding. But it’s genuinely useful for:

  • Home assistant backend. Run a local AI that answers questions without sending data to the cloud.
  • Smart home automation. Process voice commands or sensor data locally.
  • Learning and experimentation. Understand how AI inference works on constrained hardware.
  • Offline AI. A Pi with a model is a portable, battery-powered AI that works anywhere.
  • Privacy-first applications. Data never leaves your network.

It’s NOT useful for:

  • Real-time coding assistance (too slow)
  • Long document analysis (not enough RAM for context)
  • Anything requiring fast responses
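To see why, a quick back-of-the-envelope on response latency (the 200-token answer length is illustrative):

```python
def response_seconds(n_tokens: int, tok_per_s: float) -> float:
    """How long generating a reply of n_tokens takes at a given speed."""
    return n_tokens / tok_per_s

print(response_seconds(200, 4))    # 16GB Pi running 35B-A3B: 50 seconds
print(response_seconds(200, 10))   # 8GB Pi running Qwen3.5-0.8B: 20 seconds
print(response_seconds(200, 35))   # Mac Mini M4: under 6 seconds
```

Fifty seconds per answer is fine for a background summarizer or a voice assistant that speaks one sentence back; it's unusable for autocomplete.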

Pi 5 vs other cheap hardware

| Device | RAM | Price | Best model | Speed |
|---|---|---|---|---|
| Raspberry Pi 5 (8GB) | 8GB | ~$80 | Qwen3.5-0.8B | ~10 tok/s |
| Raspberry Pi 5 (16GB) | 16GB | ~$120 | Qwen3.5-35B-A3B | ~4.5 tok/s |
| Mac Mini M4 (16GB) | 16GB | $599 | Qwen3.5-9B | ~35 tok/s |
| Used mini PC (16GB) | 16GB | ~$150-200 | Qwen3.5-4B | ~8-12 tok/s |

If your budget is over $150, a used mini PC with 16GB RAM will outperform a Pi significantly. But if you already have a Pi or want the smallest possible AI device, it works.