Jun 19, 2026 · 3 min read

Build a Private Voice Assistant with Ollama — No Cloud, No Alexa (2026)

Alexa, Google Assistant, and Siri rely on cloud servers to process what you say: commands, questions, and accidental activations alike. A local voice assistant keeps your voice data on your own hardware, and with 2026 models, the quality is surprisingly good.

Architecture

Microphone → Whisper (speech-to-text, local)
    → Ollama (AI response, local)
        → Piper TTS (text-to-speech, local)
            → Speaker

Everything runs locally. No internet required after initial setup.
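
As a rough sketch of how the three stages chain together, one conversational turn can be written as a function that receives each stage as a callable. The names run_turn, transcribe, generate, and synthesize are illustrative and not part of any library; in this article those roles are filled by Whisper, Ollama, and Piper.

```python
from typing import Callable

def run_turn(
    audio: bytes,
    transcribe: Callable[[bytes], str],
    generate: Callable[[str], str],
    synthesize: Callable[[str], bytes],
) -> bytes:
    """Run one turn: speech-to-text -> language model -> text-to-speech."""
    text = transcribe(audio)
    if not text.strip():
        return b""  # nothing was said, so produce no audio
    reply = generate(text)
    return synthesize(reply)

# Stand-in stages so the sketch runs without any models installed.
if __name__ == "__main__":
    out = run_turn(
        b"\x00\x01",
        transcribe=lambda a: "what time is it",
        generate=lambda t: f"You asked: {t}",
        synthesize=lambda r: r.encode("utf-8"),
    )
    print(out)
```

Keeping the stages as plain callables makes it easy to swap one component (for example, a different speech-to-text engine) without touching the others.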

Requirements

Component	Minimum	Recommended
CPU/GPU	Any modern CPU	Apple Silicon or NVIDIA GPU
RAM	8 GB	16 GB
Storage	10 GB	20 GB
Microphone	Any USB mic	ReSpeaker array
Speaker	Any	3.5mm or Bluetooth

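Works on: Linux desktop, Raspberry Pi 5 (8GB), any x86 machine, and Mac. The script below records and plays audio with arecord and aplay, which are Linux (ALSA) tools, so on a Mac substitute an equivalent recorder and player.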

Want to try this without buying hardware? Cloud GPU providers let you spin up the right GPU in minutes — useful if you want to run a larger model for better response quality.

Setup

1. Install components

# Ollama (AI brain)
curl -fsSL https://ollama.com/install.sh | sh
ollama pull qwen3:8b

# Whisper.cpp (speech-to-text)
git clone https://github.com/ggerganov/whisper.cpp
cd whisper.cpp && make
bash models/download-ggml-model.sh base.en

# Piper TTS (text-to-speech)
pip install piper-tts

# Ollama Python client (imported by the assistant script)
pip install ollama

2. Python assistant script

import subprocess
import ollama
import tempfile
import os

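def listen(duration=5):
    """Record audio and transcribe with Whisper."""
    # mkstemp creates the file safely; tempfile.mktemp is deprecated and race-prone
    fd, audio_file = tempfile.mkstemp(suffix=".wav")
    os.close(fd)
    try:
        # Record 16 kHz, 16-bit audio (arecord is an ALSA tool, so Linux only)
        subprocess.run(["arecord", "-d", str(duration), "-f", "S16_LE", "-r", "16000", audio_file],
                       capture_output=True)
        # Transcribe. Newer whisper.cpp builds may place the binary at
        # build/bin/whisper-cli instead of main; adjust the path to your build.
        result = subprocess.run(
            ["./whisper.cpp/main", "-m", "./whisper.cpp/models/ggml-base.en.bin", "-f", audio_file, "--no-timestamps"],
            capture_output=True, text=True
        )
    finally:
        # Remove the recording even if transcription fails
        os.unlink(audio_file)
    return result.stdout.strip()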

def think(prompt):
    """Get AI response from Ollama."""
    response = ollama.chat(model="qwen3:8b", messages=[
        {"role": "system", "content": "You are a helpful voice assistant. Keep responses short and conversational — 1-3 sentences max."},
        {"role": "user", "content": prompt},
    ])
    return response["message"]["content"]

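def speak(text):
    """Convert text to speech with Piper."""
    # Send the text on stdin rather than through a shell string, so quotes or
    # $ characters in the model's reply cannot break or inject commands.
    piper = subprocess.Popen(["piper", "--model", "en_US-lessac-medium", "--output-raw"],
                             stdin=subprocess.PIPE, stdout=subprocess.PIPE)
    aplay = subprocess.Popen(["aplay", "-r", "22050", "-f", "S16_LE", "-t", "raw", "-"],
                             stdin=piper.stdout)
    piper.stdout.close()  # aplay now owns the read end of the pipe
    piper.stdin.write((text + "\n").encode("utf-8"))
    piper.stdin.close()   # EOF tells Piper to finish synthesizing
    aplay.wait()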

# Main loop
print("🎤 Listening... (say 'quit' to exit)")
while True:
    text = listen()
    if not text or "quit" in text.lower():
        break
    print(f"You: {text}")
    response = think(text)
    print(f"AI: {response}")
    speak(response)
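
The main loop treats any transcript containing "quit" as an exit request, which also matches words such as "quite", and it passes whatever Whisper prints straight to the model. Below is a minimal sketch of two helpers that tighten this. It assumes whisper.cpp marks silence or noise with bracketed tags such as [BLANK_AUDIO]; check what your build actually prints. The names clean_transcript and is_exit_command are illustrative and not part of the original script.

```python
import re

# Assumption: non-speech is marked with [bracketed] or (parenthesized) tags,
# e.g. [BLANK_AUDIO]. Adjust the pattern to match your whisper.cpp output.
_ANNOTATION = re.compile(r"\[[^\]]*\]|\([^)]*\)")

def clean_transcript(raw: str) -> str:
    """Strip non-speech annotations and collapse runs of whitespace."""
    text = _ANNOTATION.sub(" ", raw)
    return " ".join(text.split())

def is_exit_command(text: str) -> bool:
    """True only when 'quit' appears as a whole word, in any letter case."""
    return re.search(r"\bquit\b", text, flags=re.IGNORECASE) is not None
```

In the loop, text = clean_transcript(listen()) followed by a check on is_exit_command(text) could stand in for the substring test.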

3. Run it

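# Save the script above as assistant.py in the directory that contains
# whisper.cpp/ (the script uses relative paths), then run it from there
python3 assistant.py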
# Say something — it listens, thinks, and responds
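
Before the first run, it can help to confirm that the command-line tools the script shells out to are actually on your PATH. This is a small sketch under that assumption; the tool list is taken from the script in this article, and missing_tools is a hypothetical helper name.

```python
import shutil

# Command names used by the script in this article; adjust for your platform.
REQUIRED_TOOLS = ["ollama", "piper", "arecord", "aplay"]

def missing_tools(tools=REQUIRED_TOOLS):
    """Return the commands from `tools` that are not found on PATH."""
    return [name for name in tools if shutil.which(name) is None]

if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("All command-line tools found.")
```

The whisper.cpp binary is referenced by a relative path rather than through PATH, so it is not covered by this check.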

Raspberry Pi 5 setup

The Pi 5 (8GB) can run this stack with a small model:

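# Use a smaller model for Pi
ollama pull phi4-mini:3.8b  # Fits in 4GB, leaves room for Whisper

# Use tiny Whisper model (run from inside the whisper.cpp directory)
bash models/download-ggml-model.sh tiny.en

# Then point assistant.py at the new names:
# model="phi4-mini:3.8b" and models/ggml-tiny.en.bin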

Response time on Pi 5: ~3-5 seconds (acceptable for a voice assistant). On a Mac with Apple Silicon: <1 second.
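
Response times vary with hardware, model, and prompt length, so it is worth measuring each stage on your own machine. One possible way to do that is a small timing context manager; the name timed and the results dictionary are illustrative, and the sleep call stands in for a real stage such as think(text).

```python
import time
from contextlib import contextmanager

@contextmanager
def timed(label, results):
    """Record the wall-clock duration of the enclosed block in results[label]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        results[label] = time.perf_counter() - start

if __name__ == "__main__":
    timings = {}
    with timed("think", timings):
        time.sleep(0.05)  # stand-in for think(text)
    for stage, seconds in timings.items():
        print(f"{stage}: {seconds:.2f}s")
```

Wrapping listen(), think(), and speak() separately shows which stage dominates the delay. Note that listen() always includes the fixed recording window.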

Add wake word detection

To avoid transcribing everything the microphone picks up, gate the assistant behind a wake word:

pip install openwakeword

from openwakeword import Model

wake_model = Model(wakeword_models=["hey_jarvis"])

def wait_for_wake_word():
    """Listen for wake word before activating."""
    # ... microphone stream processing
    # Returns True when wake word detected
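
The detection loop itself can be sketched independently of the audio source. The version below takes the frame source and the scoring function as parameters, so it runs without a microphone. It assumes the scoring function returns a dictionary of wake-word names to scores, which is how openwakeword's Model.predict is documented to behave. The name detect_wake_word and the 0.5 threshold are illustrative choices.

```python
from typing import Callable, Dict, Iterable

def detect_wake_word(
    frames: Iterable,
    predict: Callable[[object], Dict[str, float]],
    threshold: float = 0.5,
) -> bool:
    """Return True as soon as any wake-word score exceeds `threshold`.

    Returns False if the frame source ends without a detection.
    """
    for frame in frames:
        scores = predict(frame)
        if scores and max(scores.values()) > threshold:
            return True
    return False
```

With openwakeword, predict would be wake_model.predict and frames a generator that yields 16 kHz, 16-bit mono chunks from the microphone. The frame size and threshold are worth tuning against false activations in your own room.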

Connect to Home Assistant

For smart home control, add Home Assistant integration:

import requests

HA_URL = "http://homeassistant.local:8123"
HA_TOKEN = "your-long-lived-access-token"

def control_home(command):
    """Let AI decide what Home Assistant action to take."""
    response = ollama.chat(model="qwen3:8b", messages=[{
        "role": "user",
        "content": f"""The user said: "{command}"
        
Available actions:
- turn_on/turn_off: light.living_room, light.bedroom, switch.fan
- set_temperature: climate.thermostat (range: 18-25)
- lock/unlock: lock.front_door

Respond with JSON: {{"action": "turn_on", "entity": "light.living_room"}}
Or respond with {{"action": "none"}} if this isn't a home control request."""
    }])
    # Return the raw JSON text; the caller parses it and calls the Home Assistant API
    return response["message"]["content"]
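
Because the model's reply ends up controlling lights and door locks, it is safer to validate it before acting on it. Here is a minimal sketch of the parsing step, assuming the reply is the bare JSON object requested in the prompt. The allowlist mirrors the actions listed there, set_temperature is left out because the prompt's JSON format carries no temperature value, and parse_action is a hypothetical helper name.

```python
import json

# Allowlist mirroring the actions and entities listed in the prompt.
ALLOWED = {
    "turn_on": {"light.living_room", "light.bedroom", "switch.fan"},
    "turn_off": {"light.living_room", "light.bedroom", "switch.fan"},
    "lock": {"lock.front_door"},
    "unlock": {"lock.front_door"},
}

def parse_action(reply: str):
    """Turn the model's JSON reply into (service_path, payload), or None.

    Replies that are not valid JSON, not on the allowlist, or marked
    "none" are rejected.
    """
    try:
        data = json.loads(reply)
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict):
        return None
    action, entity = data.get("action"), data.get("entity")
    if not isinstance(action, str) or not isinstance(entity, str):
        return None
    if entity not in ALLOWED.get(action, set()):
        return None
    domain = entity.split(".", 1)[0]  # "light.bedroom" -> "light"
    return f"/api/services/{domain}/{action}", {"entity_id": entity}
```

The returned path and payload fit Home Assistant's REST service endpoint, so the call would look like requests.post(HA_URL + path, headers={"Authorization": f"Bearer {HA_TOKEN}"}, json=payload, timeout=10). If the model wraps its JSON in extra prose, this sketch returns None instead of guessing.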

Privacy comparison

Assistant	Voice data sent to cloud	Always listening	Local processing
Alexa	✅ Yes	✅ Yes	❌ No
Google Assistant	✅ Yes	✅ Yes	❌ No
Siri	✅ Yes	✅ Yes	Partial
This setup	❌ No	Optional	✅ Yes

Your voice never leaves your network. No recordings stored in the cloud. No “Alexa, I didn’t say that” moments.
