Build a Private Voice Assistant with Ollama: No Cloud, No Alexa (2026)
Alexa, Google Assistant, and Siri send everything you say to the cloud. Every command, every question, every accidental activation. A local voice assistant keeps your voice data on your hardware, and with 2026 models, the quality is surprisingly good.
Architecture
Microphone → Whisper (speech-to-text, local)
           → Ollama (AI response, local)
           → Piper TTS (text-to-speech, local)
           → Speaker
Everything runs locally. No internet required after initial setup.
Requirements
| Component | Minimum | Recommended |
|---|---|---|
| CPU/GPU | Any modern CPU | Apple Silicon or NVIDIA GPU |
| RAM | 8 GB | 16 GB |
| Storage | 10 GB | 20 GB |
| Microphone | Any USB mic | ReSpeaker array |
| Speaker | Any | 3.5mm or Bluetooth |
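Works on: Mac, Linux desktop, Raspberry Pi 5 (8GB), any x86 machine. The script in this guide records and plays audio with ALSA's arecord and aplay, which are Linux tools; on a Mac, swap in another recorder and player.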
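To sanity-check the RAM column against a particular model, a rough estimate can help. The sketch below is a rule of thumb, not a measurement: the default of 4.5 bits per weight (a typical 4-bit quantization plus metadata) and the 1 GB allowance for context cache and runtime buffers are assumptions, and `estimate_model_ram_gb` is a hypothetical helper name.

```python
def estimate_model_ram_gb(params_billions, bits_per_weight=4.5, overhead_gb=1.0):
    """Rough RAM needed to load a quantized model, in GB.

    params_billions: parameter count in billions (8 for an 8B model).
    bits_per_weight: assumed average for a 4-bit quantization with metadata.
    overhead_gb:     assumed allowance for context cache and runtime buffers.
    """
    # billions of parameters * bits per weight / 8 bits per byte = gigabytes of weights
    weights_gb = params_billions * bits_per_weight / 8
    return round(weights_gb + overhead_gb, 1)


if __name__ == "__main__":
    for name, size in [("8B model", 8), ("3.8B model", 3.8)]:
        print(f"{name}: ~{estimate_model_ram_gb(size)} GB")
```

Whisper, Piper, and the operating system need memory on top of this, which is why an 8B model is more comfortable on 16 GB than on 8 GB.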
Want to try this without buying hardware? Cloud GPU providers let you spin up the right GPU in minutes, which is useful if you want to run a larger model for better response quality.
Setup
1. Install components
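# Ollama (AI brain)
curl -fsSL https://ollama.com/install.sh | sh
ollama pull qwen3:8b
pip install ollama  # Python client imported by the assistant script
# Whisper.cpp (speech-to-text)
git clone https://github.com/ggerganov/whisper.cpp
cd whisper.cpp && make
bash models/download-ggml-model.sh base.en
cd ..  # the assistant script runs from the directory that contains whisper.cpp/
# Piper TTS (text-to-speech)
pip install piper-tts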
2. Python assistant script
import subprocess
import ollama
import tempfile
import os
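def listen(duration=5):
    """Record audio and transcribe with Whisper."""
    # NamedTemporaryFile replaces tempfile.mktemp, which is deprecated and race-prone
    with tempfile.NamedTemporaryFile(suffix=".wav", delete=False) as tmp:
        audio_file = tmp.name
    try:
        # Record 16 kHz, 16-bit mono audio, the input format whisper.cpp expects
        subprocess.run(
            ["arecord", "-d", str(duration), "-f", "S16_LE", "-r", "16000", "-c", "1", audio_file],
            capture_output=True,
        )
        # Transcribe. Newer whisper.cpp releases build the CLI as
        # whisper.cpp/build/bin/whisper-cli; adjust the path if ./whisper.cpp/main is missing.
        result = subprocess.run(
            ["./whisper.cpp/main", "-m", "whisper.cpp/models/ggml-base.en.bin",
             "-f", audio_file, "--no-timestamps"],
            capture_output=True, text=True,
        )
    finally:
        os.unlink(audio_file)
    return result.stdout.strip()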
def think(prompt):
    """Get AI response from Ollama."""
    response = ollama.chat(model="qwen3:8b", messages=[
        {"role": "system", "content": "You are a helpful voice assistant. Keep responses short and conversational: 1-3 sentences max."},
        {"role": "user", "content": prompt},
    ])
    return response["message"]["content"]
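def speak(text):
    """Convert text to speech with Piper."""
    # Feed the text on stdin instead of interpolating it into a shell string,
    # so quotes or shell metacharacters in the reply cannot break the command.
    piper = subprocess.run(
        ["piper", "--model", "en_US-lessac-medium", "--output-raw"],
        input=text.encode("utf-8"), capture_output=True,
    )
    # Play Piper's raw 22050 Hz, 16-bit output
    subprocess.run(["aplay", "-r", "22050", "-f", "S16_LE", "-t", "raw", "-"], input=piper.stdout)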
# Main loop
print("🎤 Listening... (say 'quit' to exit)")
while True:
    text = listen()
    if not text or "quit" in text.lower():
        break
    print(f"You: {text}")
    response = think(text)
    print(f"AI: {response}")
    speak(response)
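The main loop exits on any transcript containing "quit", which also matches words like "quite", and it treats any non-empty transcript as speech. Speech-to-text engines can emit bracketed markers for silence or noise (for example `[BLANK_AUDIO]`; the exact markers are an assumption here and vary by version). A minimal sketch of two hypothetical helpers, `clean_transcript` and `wants_to_quit`, that could sit between `listen()` and the checks:

```python
import re

# Bracketed or parenthesized annotations such as [BLANK_AUDIO] or (music)
_ANNOTATION = re.compile(r"\[[^\]]*\]|\([^)]*\)")


def clean_transcript(raw):
    """Drop non-speech annotations and collapse whitespace."""
    return " ".join(_ANNOTATION.sub(" ", raw).split())


def wants_to_quit(text):
    """True only when 'quit' appears as a whole word."""
    return re.search(r"\bquit\b", text, flags=re.IGNORECASE) is not None
```

In the loop this would read `text = clean_transcript(listen())` followed by `if not text or wants_to_quit(text): break`.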
3. Run it
python3 assistant.py
# Say something: it listens, thinks, and responds
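Model replies sometimes contain text that sounds wrong when read aloud, such as Markdown markers or, with reasoning models, an inline `<think>...</think>` block. Whether either shows up depends on the model and Ollama version, so treat the following as a defensive sketch. `prepare_for_speech` is a hypothetical helper, not part of Piper or Ollama.

```python
import re


def prepare_for_speech(reply):
    """Remove text a TTS engine should not read aloud."""
    # Drop <think>...</think> reasoning blocks, if the model emits them inline
    reply = re.sub(r"<think>.*?</think>", " ", reply, flags=re.DOTALL)
    # Drop Markdown emphasis, heading, and code markers
    reply = re.sub(r"[*`#]+", "", reply)
    # Collapse the leftover whitespace and newlines
    return " ".join(reply.split())
```

It would be applied just before speaking: `speak(prepare_for_speech(response))`.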
Raspberry Pi 5 setup
The Pi 5 (8GB) can run this stack with a small model:
# Use a smaller model for Pi
ollama pull phi4-mini:3.8b # Fits in 4GB, leaves room for Whisper
# Use tiny Whisper model (run from inside whisper.cpp/)
bash models/download-ggml-model.sh tiny.en
# Then point the script at phi4-mini:3.8b and ggml-tiny.en.bin
Response time on Pi 5: ~3-5 seconds (acceptable for a voice assistant). On a Mac with Apple Silicon: <1 second.
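These figures vary with the model, microphone, and recording length, so it is worth measuring each stage on your own hardware. A small sketch using only the standard library; `timed` and `timings` are hypothetical names, and the `time.sleep` call is a stand-in for a real stage such as `think(text)`.

```python
import time
from contextlib import contextmanager

timings = {}


@contextmanager
def timed(stage):
    """Record wall-clock seconds spent inside the with-block under `stage`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[stage] = time.perf_counter() - start


if __name__ == "__main__":
    with timed("think"):
        time.sleep(0.05)  # stand-in for think(text)
    print({stage: round(seconds, 2) for stage, seconds in timings.items()})
```

Wrapping `listen()`, `think()`, and `speak()` separately shows which stage dominates; note that `listen()` always includes the fixed recording duration.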
Add wake word detection
To avoid transcribing everything the microphone hears, add a wake word:
pip install openwakeword
from openwakeword import Model

wake_model = Model(wakeword_models=["hey_jarvis"])

def wait_for_wake_word():
    """Listen for wake word before activating."""
    # ... microphone stream processing
    # Returns True when wake word detected
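The detection loop itself is simple once audio frames are available. The sketch below separates it from the microphone so it can be tried without hardware: `detect_wake_word` is a hypothetical helper, `frames` is any iterable of audio frames, and `predict` is any callable that returns a dictionary of scores per wake word. The 0.5 threshold is an assumption to tune.

```python
def detect_wake_word(frames, predict, threshold=0.5):
    """Return True once any wake-word score reaches the threshold.

    frames:  iterable of audio frames (hypothetical source, e.g. a microphone reader)
    predict: callable mapping one frame to a {model_name: score} dict
    """
    for frame in frames:
        scores = predict(frame)
        if scores and max(scores.values()) >= threshold:
            return True
    return False  # the frame source ended without a detection


if __name__ == "__main__":
    # Fake scores stand in for a real detector so the loop runs without a microphone
    fake_scores = iter([{"hey_jarvis": 0.02}, {"hey_jarvis": 0.91}])
    print(detect_wake_word(range(2), lambda _frame: next(fake_scores)))
```

With openWakeWord, `predict` would be `wake_model.predict`, and each frame would be a 16 kHz, 16-bit mono NumPy array, ideally 1280 samples (80 ms) long. Depending on the installed version, the pre-trained models may need to be downloaded once before `Model(...)` can load them.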
Connect to Home Assistant
For smart home control, add Home Assistant integration:
import requests

HA_URL = "http://homeassistant.local:8123"
HA_TOKEN = "your-long-lived-access-token"

def control_home(command):
    """Let AI decide what Home Assistant action to take."""
    # format="json" asks Ollama to constrain the reply to valid JSON
    response = ollama.chat(model="qwen3:8b", format="json", messages=[{
        "role": "user",
        "content": f"""The user said: "{command}"
Available actions:
- turn_on/turn_off: light.living_room, light.bedroom, switch.fan
- set_temperature: climate.thermostat (range: 18-25)
- lock/unlock: lock.front_door
Respond with JSON: {{"action": "turn_on", "entity": "light.living_room"}}
Or respond with {{"action": "none"}} if this isn't a home control request."""
    }])
    reply = response["message"]["content"]  # JSON text describing the action
    # Parse and execute via Home Assistant API
    return reply
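Because the model's reply ends up controlling real devices (including a door lock), it is safer to validate it against a fixed allowlist before calling Home Assistant. The sketch below is one way to do the parsing step. `build_service_call` and `ALLOWED` are hypothetical names, the allowlist simply mirrors the entities in the prompt, and `set_temperature` is left out because the prompt's JSON format does not define a field for the target temperature. The URL follows Home Assistant's REST pattern `/api/services/<domain>/<service>`.

```python
import json

# Assumed allowlist, mirroring the actions and entities listed in the prompt
ALLOWED = {
    "turn_on": {"light.living_room", "light.bedroom", "switch.fan"},
    "turn_off": {"light.living_room", "light.bedroom", "switch.fan"},
    "lock": {"lock.front_door"},
    "unlock": {"lock.front_door"},
}


def build_service_call(model_reply, base_url="http://homeassistant.local:8123"):
    """Turn the model's JSON reply into (url, payload), or None if it is not allowed."""
    try:
        data = json.loads(model_reply)
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict):
        return None
    action, entity = data.get("action"), data.get("entity")
    if not isinstance(action, str) or not isinstance(entity, str):
        return None
    if entity not in ALLOWED.get(action, set()):
        return None
    domain = entity.split(".", 1)[0]  # "light.bedroom" -> "light"
    return f"{base_url}/api/services/{domain}/{action}", {"entity_id": entity}
```

When the result is not `None`, the call can be sent with `requests.post(url, json=payload, headers={"Authorization": f"Bearer {HA_TOKEN}"})`. Anything the allowlist rejects, including `{"action": "none"}`, falls through to the normal conversational path.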
Privacy comparison
| Assistant | Voice data sent to cloud | Always listening | Local processing |
|---|---|---|---|
| Alexa | Yes | Yes | No |
| Google Assistant | Yes | Yes | No |
| Siri | Yes | Yes | Partial |
| This setup | No | Optional | Yes |
Your voice never leaves your network. No recordings stored in the cloud. No "Alexa, I didn't say that" moments.