
Run AI on a Raspberry Pi — Which Models Actually Work? (2026)


You can run AI models on a Raspberry Pi 5. Not toy demos — actual useful language models that answer questions, write code, and summarize text. The catch: you need to pick the right model and set realistic expectations about speed.

What actually works on a Pi 5

The Raspberry Pi 5 has 8GB RAM (or 16GB on the newer variant). That limits you to small models, but small models in 2026 are surprisingly capable.

| Model | Size | Speed on Pi 5 (8GB) | Speed on Pi 5 (16GB) | Quality |
|---|---|---|---|---|
| Qwen3.5-0.8B | 0.8B | ~8-10 tok/s | ~10-12 tok/s | Basic tasks, fast |
| Qwen 2.5 0.5B | 0.5B | ~12-15 tok/s | ~15 tok/s | Simple Q&A |
| TinyLlama 1.1B | 1.1B | ~6-8 tok/s | ~8-10 tok/s | Decent for size |
| Qwen3.5-4B (Q2) | 4B | Too slow | ~3-4 tok/s | Good quality, slow |
| Qwen3.5-35B-A3B (Q2) | 3B active | Too slow | ~4.5 tok/s | Surprisingly good |

The sweet spot is Qwen3.5-0.8B on 8GB or Qwen3.5-35B-A3B on 16GB. The 35B-A3B model is remarkable — it has 35B parameters worth of knowledge but only activates 3B per token, so it runs on minimal hardware while being much smarter than its speed suggests.
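The memory side of this is easy to sanity-check: a model's weight footprint is roughly parameters × bits per weight ÷ 8. A rough sketch (it ignores KV cache, activations, and runtime overhead, so treat the numbers as lower bounds):

```python
def model_size_gb(params_billion: float, bits_per_weight: float) -> float:
    """Rough weight-memory footprint in GB: params × bits / 8.

    Ignores KV cache, activations, and runtime overhead.
    """
    return params_billion * bits_per_weight / 8

print(model_size_gb(0.8, 4))   # Qwen3.5-0.8B at Q4: 0.4 GB, easy fit in 8GB
print(model_size_gb(35, 2))    # 35B-A3B at Q2: 8.75 GB, needs the 16GB Pi
```

Note that the MoE trick only helps speed: all 35B parameters must still sit in memory, which is why the A3B model is a 16GB-only option.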

Setup with Ollama

# Install Ollama on Pi 5
curl -fsSL https://ollama.com/install.sh | sh

# Run the best model for your Pi
ollama run qwen3.5:0.8b    # 8GB Pi
ollama run qwen3.5:35b-a3b  # 16GB Pi (Q2 quantization)

Ollama handles ARM compilation automatically. First run downloads the model — make sure you have an SD card or SSD with enough space.
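Once the model is running, you can talk to it from other devices on your network via Ollama's REST API (it listens on port 11434 by default). A minimal sketch using only the standard library — the helper names `build_generate_request` and `ask` are mine, not Ollama's, and the model tag is the one from the article:

```python
import json
import urllib.request

# Ollama's default local endpoint for one-shot generation
OLLAMA_URL = "http://localhost:11434/api/generate"

def build_generate_request(model: str, prompt: str) -> dict:
    """Payload for Ollama's /api/generate; stream=False returns one JSON object."""
    return {"model": model, "prompt": prompt, "stream": False}

def ask(model: str, prompt: str) -> str:
    """POST the prompt to the local Ollama server and return the generated text."""
    data = json.dumps(build_generate_request(model, prompt)).encode()
    req = urllib.request.Request(
        OLLAMA_URL, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

# Example (requires Ollama running and the model pulled):
# print(ask("qwen3.5:0.8b", "Summarize: the Pi 5 ships with 8GB or 16GB of RAM."))
```

This is the pattern behind most of the use cases below: the Pi sits on your LAN as a tiny private inference server, and anything that can make an HTTP request can query it.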

Use an SSD, not an SD card

This is critical. Model loading from an SD card is painfully slow. An NVMe SSD in a USB 3.0 enclosure (or attached to the Pi 5's PCIe connector via an M.2 HAT) makes a massive difference:

  • SD card: 2-3 minutes to load a model
  • NVMe SSD: 10-15 seconds

A 256GB NVMe SSD costs ~$25 and transforms the Pi experience.
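If you're not sure whether your storage is the bottleneck, a quick `dd` throughput check gives a ballpark (this writes a temporary 256MB file in the current directory; compare the reported MB/s on SD vs SSD):

```shell
# Sequential write, flushed to disk so the number is honest
dd if=/dev/zero of=./iotest.bin bs=1M count=256 conv=fsync 2>&1 | tail -n1
# Sequential read; may hit the page cache — reboot first (or use iflag=direct) for a cold read
dd if=./iotest.bin of=/dev/null bs=1M 2>&1 | tail -n1
rm ./iotest.bin
```

A decent SD card manages ~40-90 MB/s sequential; a USB 3.0 NVMe SSD should report several hundred, which is where the model-load difference comes from.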

Realistic expectations

At 3-10 tokens per second, a Pi is not replacing your laptop for interactive coding. But it’s genuinely useful for:

  • Home assistant backend. Run a local AI that answers questions without sending data to the cloud.
  • Smart home automation. Process voice commands or sensor data locally.
  • Learning and experimentation. Understand how AI inference works on constrained hardware.
  • Offline AI. A Pi with a model is a portable, battery-powered AI that works anywhere.
  • Privacy-first applications. Data never leaves your network.

It’s NOT useful for:

  • Real-time coding assistance (too slow)
  • Long document analysis (not enough RAM for context)
  • Anything requiring fast responses
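To see why, a quick back-of-the-envelope on response latency (the 200-token answer length is illustrative):

```python
def response_seconds(n_tokens: int, tok_per_s: float) -> float:
    """How long generating a reply of n_tokens takes at a given speed."""
    return n_tokens / tok_per_s

print(response_seconds(200, 4))    # 16GB Pi running 35B-A3B: 50 seconds
print(response_seconds(200, 10))   # 8GB Pi running Qwen3.5-0.8B: 20 seconds
print(response_seconds(200, 35))   # Mac Mini M4: under 6 seconds
```

Fifty seconds per answer is fine for a background summarizer or a voice assistant that speaks one sentence back; it's unusable for autocomplete.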

Pi 5 vs other cheap hardware

| Device | RAM | Price | Best model | Speed |
|---|---|---|---|---|
| Raspberry Pi 5 (8GB) | 8GB | ~$80 | Qwen3.5-0.8B | ~10 tok/s |
| Raspberry Pi 5 (16GB) | 16GB | ~$120 | Qwen3.5-35B-A3B | ~4.5 tok/s |
| Mac Mini M4 (16GB) | 16GB | $599 | Qwen3.5-9B | ~35 tok/s |
| Used mini PC (16GB) | 16GB | ~$150-200 | Qwen3.5-4B | ~8-12 tok/s |

If your budget is over $150, a used mini PC with 16GB RAM will outperform a Pi significantly. But if you already have a Pi or want the smallest possible AI device, it works.