Build a Private Voice Assistant with Ollama: No Cloud, No Alexa (2026)


Alexa, Google Assistant, and Siri send what you say to the cloud: every command, every question, every accidental activation. A local voice assistant keeps your voice data on your own hardware, and with 2026 models the quality is surprisingly good.

Architecture

Microphone → Whisper (speech-to-text, local)
    → Ollama (AI response, local)
        → Piper TTS (text-to-speech, local)
            → Speaker

Everything runs locally. No internet required after initial setup.
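
The data flow above can be sketched as one function that chains three stages. The stage names (`transcribe`, `respond`, `synthesize`) and the `run_turn` helper are illustrative assumptions, not part of any library. Passing the stages in as parameters makes it possible to try the flow with stand-ins before any audio hardware is wired up.

```python
from typing import Callable, Optional

def run_turn(
    transcribe: Callable[[], str],
    respond: Callable[[str], str],
    synthesize: Callable[[str], None],
) -> Optional[str]:
    """Run one microphone -> text -> reply -> speech turn.

    Returns the reply, or None when nothing was heard.
    """
    heard = transcribe()      # speech-to-text stage (Whisper in this stack)
    if not heard:
        return None           # silence: skip the model and the speaker
    reply = respond(heard)    # language-model stage (Ollama in this stack)
    synthesize(reply)         # text-to-speech stage (Piper in this stack)
    return reply

# Stand-in stages so the flow can be tried without a microphone or a model
spoken = []
reply = run_turn(lambda: "what time is it", lambda t: f"You asked: {t}", spoken.append)
```

Swapping the lambdas for real recording, chat, and playback functions should leave `run_turn` itself unchanged.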

Requirements

| Component | Minimum | Recommended |
|---|---|---|
| CPU/GPU | Any modern CPU | Apple Silicon or NVIDIA GPU |
| RAM | 8 GB | 16 GB |
| Storage | 10 GB | 20 GB |
| Microphone | Any USB mic | ReSpeaker array |
| Speaker | Any | 3.5mm or Bluetooth |

Works on: Mac, Linux desktop, Raspberry Pi 5 (8GB), any x86 machine.

Want to try this without buying hardware? Cloud GPU providers let you spin up the right GPU in minutes, which is useful if you want to run a larger model for better response quality.

Setup

1. Install components

# Ollama (AI brain)
curl -fsSL https://ollama.com/install.sh | sh
ollama pull qwen3:8b

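# Whisper.cpp (speech-to-text)
git clone https://github.com/ggerganov/whisper.cpp
cd whisper.cpp && make
bash models/download-ggml-model.sh base.en
cd ..  # the assistant script expects to run from the parent directory
# Note: recent whisper.cpp releases build with CMake and name the binary
# build/bin/whisper-cli instead of main; adjust the path in the script if so.

# Piper TTS (text-to-speech) and the Ollama Python client used by the script
pip install piper-tts ollama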
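
Before moving on, it can help to confirm that the commands the script shells out to are actually on `PATH`. The following is a minimal sketch: the `REQUIRED_TOOLS` list and the `missing_tools` helper are assumptions made for illustration, and the list should be adjusted for platforms without ALSA's `arecord` and `aplay`.

```python
import shutil
from typing import Callable, Iterable, List, Optional

# Commands the assistant script calls; adjust for your platform (assumed list)
REQUIRED_TOOLS = ["ollama", "piper", "arecord", "aplay"]

def missing_tools(
    tools: Iterable[str] = REQUIRED_TOOLS,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> List[str]:
    """Return the tools that cannot be found on PATH."""
    return [tool for tool in tools if which(tool) is None]

if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing from PATH:", ", ".join(missing))
    else:
        print("All required commands found.")
```

The `which` parameter exists only so the check can be exercised without touching the real `PATH`.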

2. Python assistant script

import subprocess
import ollama
import tempfile
import os

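def listen(duration=5):
    """Record audio and transcribe with Whisper."""
    # tempfile.mktemp() is deprecated and race-prone; create the file safely
    with tempfile.NamedTemporaryFile(suffix=".wav", delete=False) as tmp:
        audio_file = tmp.name
    try:
        # Record 16 kHz, 16-bit audio (arecord is an ALSA tool, so Linux only;
        # on macOS substitute another command-line recorder)
        subprocess.run(["arecord", "-d", str(duration), "-f", "S16_LE", "-r", "16000", audio_file],
                       capture_output=True)
        # Transcribe (newer whisper.cpp builds name the binary build/bin/whisper-cli)
        result = subprocess.run(
            ["./whisper.cpp/main", "-m", "whisper.cpp/models/ggml-base.en.bin", "-f", audio_file, "--no-timestamps"],
            capture_output=True, text=True
        )
    finally:
        os.unlink(audio_file)  # always remove the recording, even on failure
    return result.stdout.strip()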

def think(prompt):
    """Get AI response from Ollama."""
    response = ollama.chat(model="qwen3:8b", messages=[
        {"role": "system", "content": "You are a helpful voice assistant. Keep responses short and conversational: 1-3 sentences max."},
        {"role": "user", "content": prompt},
    ])
    return response["message"]["content"]

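def speak(text):
    """Convert text to speech with Piper."""
    # Send the text on stdin instead of interpolating it into a shell string,
    # so quotes or $ in the model's reply cannot break or inject commands.
    piper = subprocess.Popen(
        ["piper", "--model", "en_US-lessac-medium", "--output-raw"],
        stdin=subprocess.PIPE, stdout=subprocess.PIPE
    )
    # Piper emits raw 22050 Hz 16-bit PCM, so tell aplay not to expect a WAV header
    aplay = subprocess.Popen(["aplay", "-r", "22050", "-f", "S16_LE", "-t", "raw", "-"],
                             stdin=piper.stdout)
    piper.stdout.close()  # aplay now owns the read end of the pipe
    piper.stdin.write((text + "\n").encode("utf-8"))
    piper.stdin.close()
    aplay.wait()
    piper.wait()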

# Main loop
print("🎤 Listening... (say 'quit' to exit)")
while True:
    text = listen()
    # Match the whole utterance so words like "quite" do not end the session
    if not text or text.lower().strip(" .!?") == "quit":
        break
    print(f"You: {text}")
    response = think(text)
    print(f"AI: {response}")
    speak(response)
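
One gap in the loop above: whisper.cpp can emit bracketed annotations such as `[BLANK_AUDIO]` for silence, which pass the `if not text` check and would be sent to the model as if the user had said them. A minimal sketch of a cleanup step follows; the `clean_transcript` helper and its regular expression are assumptions for illustration and may need tuning against the output of your Whisper build.

```python
import re

# Matches bracketed or parenthesised annotations such as [BLANK_AUDIO] or (music)
_ANNOTATION = re.compile(r"\[[^\]]*\]|\([^)]*\)")

def clean_transcript(raw: str) -> str:
    """Strip non-speech annotations and collapse runs of whitespace."""
    without_tags = _ANNOTATION.sub(" ", raw)
    return " ".join(without_tags.split())
```

Wrapping the return value of `listen()` in `clean_transcript(...)` would let the existing empty-string check treat silence as silence.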

3. Run it

python3 assistant.py
# Say something: it listens, thinks, and responds

Raspberry Pi 5 setup

The Pi 5 (8GB) can run this stack with a small model:

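# Use a smaller model for Pi
ollama pull phi4-mini  # 3.8B parameters; fits in 4GB, leaves room for Whisper

# Use tiny Whisper model (run from inside the whisper.cpp directory)
bash models/download-ggml-model.sh tiny.en

# Then change the model names in assistant.py to match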

Response time on Pi 5: ~3-5 seconds (acceptable for a voice assistant). On a Mac with Apple Silicon: <1 second.
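
Response times vary with model, quantization, and hardware, so it is worth measuring each stage on your own machine. The following is a minimal sketch of a timing wrapper; the `timed` helper and the `timings` dictionary are assumptions for illustration, not part of the script above.

```python
import time
from typing import Callable, Dict, TypeVar

T = TypeVar("T")

def timed(label: str, timings: Dict[str, float], fn: Callable[..., T], *args) -> T:
    """Call fn(*args) and record its wall-clock duration in seconds under label."""
    start = time.perf_counter()
    result = fn(*args)
    timings[label] = time.perf_counter() - start
    return result

# Stand-in stage; in the assistant this would be listen, think, or speak
timings: Dict[str, float] = {}
answer = timed("think", timings, lambda prompt: prompt.upper(), "hello")
```

In the main loop this could look like `response = timed("think", timings, think, text)`. Keep in mind that `listen()` includes the fixed recording window, so its time is mostly the `duration` argument rather than transcription.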

Add wake word detection

To keep the assistant from responding to everything it hears, add a wake word:

pip install openwakeword

from openwakeword import Model

wake_model = Model(wakeword_models=["hey_jarvis"])

def wait_for_wake_word():
    """Listen for wake word before activating."""
    # ... microphone stream processing
    # Returns True when wake word detected
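
The elided microphone processing could take roughly the following shape. This is a sketch under stated assumptions: audio arrives as raw 16 kHz 16-bit mono PCM, it is scored in 80 ms frames (1280 samples), and the scorer returns a dictionary of wake-word names to scores. The `frames_from_stream` helper, the `predict` parameter, and the 0.5 threshold are all illustrative choices, and the scorer is injected so the logic can be tried without a microphone or a model.

```python
import io
from typing import BinaryIO, Callable, Dict, Iterable, Iterator

FRAME_SAMPLES = 1280             # 80 ms of audio at 16 kHz
FRAME_BYTES = FRAME_SAMPLES * 2  # 16-bit samples are 2 bytes each

def frames_from_stream(stream: BinaryIO) -> Iterator[bytes]:
    """Yield fixed-size raw PCM frames until the stream runs dry."""
    while True:
        frame = stream.read(FRAME_BYTES)
        if len(frame) < FRAME_BYTES:
            return               # drop a trailing partial frame
        yield frame

def wait_for_wake_word(
    frames: Iterable[bytes],
    predict: Callable[[bytes], Dict[str, float]],
    threshold: float = 0.5,
) -> bool:
    """Return True as soon as any wake-word score reaches the threshold."""
    for frame in frames:
        scores = predict(frame)
        if scores and max(scores.values()) >= threshold:
            return True
    return False                 # stream ended without a detection

# Stand-in audio and scorer: three silent frames, detection on the third
calls = []

def fake_predict(frame: bytes) -> Dict[str, float]:
    calls.append(frame)
    return {"hey_jarvis": 0.9 if len(calls) == 3 else 0.1}

silence = io.BytesIO(bytes(FRAME_BYTES * 3))
detected = wait_for_wake_word(frames_from_stream(silence), fake_predict)
```

With openWakeWord, `predict` would presumably wrap `wake_model.predict` after converting each frame to a 16-bit NumPy array, and the stream could be the stdout of a raw-mode `arecord` process. Check both against the version you have installed.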

Connect to Home Assistant

For smart home control, add Home Assistant integration:

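import json
import ollama
import requests

HA_URL = "http://homeassistant.local:8123"
HA_TOKEN = "your-long-lived-access-token"

def control_home(command):
    """Let AI decide what Home Assistant action to take."""
    # format="json" asks Ollama to constrain the reply to valid JSON
    response = ollama.chat(model="qwen3:8b", format="json", messages=[{
        "role": "user",
        "content": f"""The user said: "{command}"

Available actions:
- turn_on/turn_off: light.living_room, light.bedroom, switch.fan
- set_temperature: climate.thermostat (range: 18-25)
- lock/unlock: lock.front_door

Respond with JSON: {{"action": "turn_on", "entity": "light.living_room"}}
Or respond with {{"action": "none"}} if this isn't a home control request."""
    }])
    # Parse the reply and execute it via the Home Assistant REST API
    action = json.loads(response["message"]["content"])
    if action.get("action", "none") == "none":
        return None
    domain = action["entity"].split(".")[0]  # e.g. "light" from "light.living_room"
    # Note: set_temperature also needs a temperature value, which the JSON
    # shape requested above does not carry yet.
    result = requests.post(
        f"{HA_URL}/api/services/{domain}/{action['action']}",
        headers={"Authorization": f"Bearer {HA_TOKEN}"},
        json={"entity_id": action["entity"]},
        timeout=10,
    )
    result.raise_for_status()
    return action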
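
Because the model's reply ends up controlling locks and thermostats, it is worth validating it against an allowlist before anything is executed. The following is a minimal sketch: the `ALLOWED` table mirrors the actions listed in the prompt, while the `to_service_call` helper and the `temperature` field in the reply are assumptions added for illustration. It returns a path in the `/api/services/<domain>/<service>` form used by the Home Assistant REST API, plus the JSON payload, or `None` when the reply is rejected.

```python
import json
from typing import Optional, Tuple

# Hypothetical allowlist mirroring the actions offered in the prompt
_SWITCHABLE = {"light.living_room", "light.bedroom", "switch.fan"}
ALLOWED = {
    "turn_on": _SWITCHABLE,
    "turn_off": _SWITCHABLE,
    "set_temperature": {"climate.thermostat"},
    "lock": {"lock.front_door"},
    "unlock": {"lock.front_door"},
}

def to_service_call(reply: str) -> Optional[Tuple[str, dict]]:
    """Turn the model's JSON reply into (API path, payload), or None if rejected."""
    try:
        data = json.loads(reply)
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict):
        return None
    action, entity = data.get("action"), data.get("entity")
    if not isinstance(action, str) or not isinstance(entity, str):
        return None
    if entity not in ALLOWED.get(action, set()):
        return None              # unknown action, or entity not valid for it
    payload = {"entity_id": entity}
    if action == "set_temperature":
        temp = data.get("temperature")  # assumed extra field in the reply
        if isinstance(temp, bool) or not isinstance(temp, (int, float)):
            return None
        if not 18 <= temp <= 25:        # range stated in the prompt
            return None
        payload["temperature"] = temp
    domain = entity.split(".", 1)[0]    # "light" from "light.living_room"
    return f"/api/services/{domain}/{action}", payload
```

A reply such as `{"action": "unlock", "entity": "light.bedroom"}` is rejected because the entity is not listed for that action, so a confused or manipulated model cannot reach services outside the table.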

Privacy comparison

| Assistant | Voice data sent to cloud | Always listening | Local processing |
|---|---|---|---|
| Alexa | ✅ Yes | ✅ Yes | ❌ No |
| Google Assistant | ✅ Yes | ✅ Yes | ❌ No |
| Siri | ✅ Yes | ✅ Yes | Partial |
| This setup | ❌ No | Optional | ✅ Yes |

Your voice never leaves your network. No recordings stored in the cloud. No "Alexa, I didn't say that" moments.
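
That guarantee only holds while every endpoint the assistant talks to is on your own machine or LAN. The following is a rough sketch of a sanity check for configured URLs, such as the Ollama address (`http://127.0.0.1:11434` by default) or `HA_URL`. The `stays_on_network` helper is a hypothetical name, and treating any `.local` hostname as on-network is an assumption.

```python
import ipaddress
from urllib.parse import urlparse

def stays_on_network(url: str) -> bool:
    """True if the URL's host is localhost, a .local name, or a private/loopback IP."""
    host = urlparse(url).hostname
    if host is None:
        return False
    if host == "localhost" or host.endswith(".local"):
        return True
    try:
        ip = ipaddress.ip_address(host)
    except ValueError:
        return False             # any other hostname: treat as remote
    return ip.is_loopback or ip.is_private
```

Running this over each configured URL at startup would flag a setup that has drifted to a hosted endpoint.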
