May 16, 2026 · 5 min read

Build a Local Voice Assistant with Whisper + Ollama (2026)

Alexa, Siri, and Google Assistant are convenient — but every word you say gets shipped to a server you don’t control. What if you could build the same thing, running entirely on your own hardware, with zero cloud dependency?

That’s exactly what we’re building today. A local voice assistant that listens through your microphone, transcribes speech with OpenAI’s Whisper, thinks with a local LLM via Ollama, and speaks the answer back to you. No API keys, no subscriptions, no data leaving your machine.

Architecture

The pipeline is straightforward:

🎤 Microphone
   ↓
🔊 Whisper (speech-to-text, runs locally)
   ↓
🧠 Ollama (LLM generates a response)
   ↓
🔈 pyttsx3 / edge-tts (text-to-speech)
   ↓
🎧 Speaker

Each piece is swappable. You could replace Whisper with faster-whisper for speed, swap Ollama models on the fly, or switch TTS engines depending on your platform. The glue between them is a short Python script.

Prerequisites

Before we start, make sure you have:

Python 3.10+ installed
Ollama installed and running — if you’re new to it, check out our complete Ollama guide first
A working microphone
Around 8 GB of RAM minimum (16 GB recommended for comfortable model loading — see best AI models under 16 GB VRAM)

Pull a model in Ollama before continuing:

ollama pull llama3.2

Step 1: Install Whisper

OpenAI’s Whisper runs locally and handles speech-to-text. We’ll also need sounddevice and scipy to capture audio from the microphone.

pip install openai-whisper sounddevice scipy numpy

Whisper comes in several sizes. For a voice assistant where latency matters, base or small are the sweet spot. The base model is about 140 MB and transcribes in near real-time on most machines:

import whisper

model = whisper.load_model("base")
result = model.transcribe("audio.wav")
print(result["text"])

If you have a decent GPU, you can bump up to small or medium for better accuracy. On CPU-only setups, stick with base — it’s surprisingly good for English.

Tip: For even faster transcription, consider faster-whisper which uses CTranslate2 under the hood and can be 4x faster than the original.

Step 2: Set Up Ollama

Ollama serves as the brain. It takes the transcribed text and generates a response. If you followed the prerequisite, you already have a model pulled.

Test it from the command line:

ollama run llama3.2 "What is the capital of France?"

In Python, we talk to Ollama through its local HTTP API:

import requests

def ask_ollama(prompt, model="llama3.2"):
    response = requests.post("http://localhost:11434/api/generate", json={
        "model": model,
        "prompt": prompt,
        "stream": False
    })
    return response.json()["response"]

That’s it. No API keys, no tokens, no rate limits. Just a local HTTP call. For a deeper dive into what models work best for this kind of task, see our guide on the cheapest way to run AI locally.

Step 3: Text-to-Speech

For speaking the response back, pyttsx3 is the simplest option — it works offline and cross-platform with no extra downloads:

pip install pyttsx3

import pyttsx3

engine = pyttsx3.init()
engine.say("Hello, I am your local voice assistant.")
engine.runAndWait()

pyttsx3 uses your OS’s built-in speech engine (SAPI5 on Windows, NSSpeechSynthesizer on macOS, espeak on Linux). The voice quality is functional but robotic.

If you want more natural-sounding speech and don’t mind a slightly heavier dependency, edge-tts is an excellent alternative:

pip install edge-tts playsound

import edge_tts, asyncio

async def speak(text):
    communicate = edge_tts.Communicate(text, "en-US-AriaNeural")
    await communicate.save("response.mp3")

asyncio.run(speak("Hello from edge TTS"))

Note that edge-tts does make network calls to Microsoft’s edge services, so it’s not fully local. For a 100% offline setup, stick with pyttsx3.

Step 4: Wire It All Together

Here’s the complete script. It records from your microphone, transcribes with Whisper, sends the text to Ollama, and speaks the response:

import whisper
import sounddevice as sd
import scipy.io.wavfile as wav
import numpy as np
import requests
import pyttsx3
import tempfile
import os

# --- Config ---
WHISPER_MODEL = "base"
OLLAMA_MODEL = "llama3.2"
SAMPLE_RATE = 16000
RECORD_SECONDS = 5
OLLAMA_URL = "http://localhost:11434/api/generate"

# --- Load models ---
print("Loading Whisper model...")
whisper_model = whisper.load_model(WHISPER_MODEL)
tts_engine = pyttsx3.init()

def record_audio(duration=RECORD_SECONDS):
    """Record audio from the microphone."""
    print(f"🎤 Listening for {duration} seconds...")
    audio = sd.rec(int(duration * SAMPLE_RATE), samplerate=SAMPLE_RATE,
                   channels=1, dtype="float32")
    sd.wait()
    return audio.flatten()

def transcribe(audio):
    """Transcribe audio using Whisper."""
    tmp = tempfile.mktemp(suffix=".wav")
    wav.write(tmp, SAMPLE_RATE, (audio * 32767).astype(np.int16))
    result = whisper_model.transcribe(tmp)
    os.remove(tmp)
    return result["text"].strip()

def ask_ollama(prompt):
    """Send prompt to Ollama and return the response."""
    response = requests.post(OLLAMA_URL, json={
        "model": OLLAMA_MODEL,
        "prompt": prompt,
        "stream": False
    })
    return response.json()["response"]

def speak(text):
    """Speak text using pyttsx3."""
    print(f"🔈 {text}")
    tts_engine.say(text)
    tts_engine.runAndWait()

def main():
    print("Voice assistant ready. Press Ctrl+C to quit.\n")
    while True:
        try:
            audio = record_audio()
            text = transcribe(audio)

            if not text or len(text) < 2:
                continue

            print(f"📝 You said: {text}")
            response = ask_ollama(text)
            speak(response)
            print()

        except KeyboardInterrupt:
            print("\nGoodbye!")
            break

if __name__ == "__main__":
    main()

Save this as voice_assistant.py and run it:

python voice_assistant.py

The assistant will record 5 seconds of audio each loop, transcribe it, get a response from Ollama, and read it back to you. Simple, private, and entirely local.

Improvements

This basic version works, but there’s plenty of room to make it better:

Wake word detection — Instead of recording in a fixed loop, use a library like pvporcupine or openwakeword to trigger recording only when you say a keyword like “Hey assistant.” This makes it feel much more natural.

Streaming responses — Right now we wait for Ollama to finish generating before speaking. You can stream the Ollama response token by token and start TTS as soon as the first sentence is complete. This cuts perceived latency dramatically.

Conversation memory — The current script sends each prompt in isolation. Wrap it with a conversation history that passes previous exchanges to Ollama for context-aware responses.

Voice activity detection — Replace the fixed 5-second recording with webrtcvad or silero-vad to detect when you stop talking and automatically end the recording.

Better TTS — Look into Coqui TTS or Piper for high-quality, fully offline text-to-speech with natural-sounding voices.

For a more polished take on this concept with additional features, check out our private voice assistant with Ollama tutorial.

Wrapping Up

You now have a fully functional voice assistant that never phones home. Whisper handles the ears, Ollama provides the brain, and pyttsx3 gives it a voice — all running on your hardware.

The beauty of this setup is modularity. Swap base for medium when you need better transcription. Switch from llama3.2 to mistral or phi-3 depending on your task. Replace pyttsx3 with a neural TTS engine when you want a more human voice.

No cloud bills. No privacy concerns. Just your machine, doing what you tell it to.

Build a Local Voice Assistant with Whisper + Ollama (2026)

Architecture

Prerequisites

Step 1: Install Whisper

Step 2: Set Up Ollama

Step 3: Text-to-Speech

Step 4: Wire It All Together

Improvements

Wrapping Up

Related Links

📬 AI Dev Weekly

You might also like

Build an AI Expense Tracker That Reads Your Bank CSV Files

Build a CLI That Generates README Files From Your Code

Build an AI-Powered Changelog Generator From Git Tags

Build a Personal AI Knowledge Base with Obsidian + Ollama