Ollama usually just works. When it doesn’t, here are fixes for the most common errors.
Ollama won’t start
Symptoms: ollama run hangs or returns “could not connect.”
```shell
# Check if Ollama is already running
ps aux | grep ollama
# If running but unresponsive, kill and restart
pkill ollama
ollama serve
```
On Mac: Check Activity Monitor for “ollama” processes. Force quit all, then reopen the Ollama app.
On Linux (systemd):
```shell
sudo systemctl restart ollama
sudo systemctl status ollama
journalctl -u ollama -n 50   # check the logs
```
“command not found: ollama”
Ollama isn’t in your PATH.
```shell
# Mac (Homebrew)
brew install ollama
# Linux: reinstall
curl -fsSL https://ollama.com/install.sh | sh
# Check the installation
which ollama
ollama --version
```
GPU not detected (running on CPU)
Symptoms: Inference is extremely slow. ollama ps shows no GPU.
NVIDIA:
```shell
# Check that NVIDIA drivers are installed
nvidia-smi
# If not found, install them (Ubuntu/Debian; pick the current driver version)
sudo apt install nvidia-driver-535
# Restart Ollama after the driver install
sudo systemctl restart ollama
```
Apple Silicon: GPU is used automatically. If slow, check that you’re not running out of unified memory:
```shell
# Check total memory (reported in bytes)
sysctl hw.memsize
# Compare model size against memory in use
ollama ps
```
No GPU at all? See our guide on running AI without a GPU. CPU inference works but is 5-10x slower.
Model download stuck
Symptoms: Download progress stops or is extremely slow.
```shell
# Cancel (Ctrl+C) and retry
ollama pull devstral-small:24b
# If still stuck, clear only the partial downloads; their blob files
# contain "-partial". Deleting all of sha256-* would wipe every model.
rm -f ~/.ollama/models/blobs/sha256-*-partial*
ollama pull devstral-small:24b
# Check disk space
df -h ~/.ollama
```
Common cause: Not enough disk space. Models are large (5-20GB). Free up space or change the Ollama storage directory:
```shell
# Move Ollama storage (only affects processes started from this shell)
export OLLAMA_MODELS=/path/to/larger/drive/.ollama/models
```
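Note that `export` only affects the shell it runs in; the Ollama server process itself has to see the variable. On Linux, where Ollama runs as a systemd service, a drop-in override makes it permanent (the path below is an example, not a required location):

```ini
# /etc/systemd/system/ollama.service.d/override.conf
# (create it with: sudo systemctl edit ollama)
[Service]
Environment="OLLAMA_MODELS=/mnt/bigdrive/ollama/models"
```

After saving, run `sudo systemctl daemon-reload && sudo systemctl restart ollama`, and make sure the user the service runs as can read and write the new directory.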
“model requires more system memory”
The model is too large for your RAM.
| Your RAM | Max model size | Recommended model |
|---|---|---|
| 8 GB | ~5 GB (8B models) | Qwen3 8B |
| 16 GB | ~12 GB (14-24B) | Devstral Small 24B |
| 32 GB | ~24 GB (27-32B) | Qwen3 32B |
| 64 GB | ~48 GB (70B) | Llama 3.3 70B |
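As a rule of thumb, the table above works out to roughly 0.6 GB per billion parameters at the common 4-bit quantization, plus a couple of GB of runtime overhead. A quick sketch of that arithmetic (the constants are rough assumptions, not exact figures):

```shell
# Rough check: will an N-billion-parameter model fit in your RAM?
# Assumes ~4-bit quantization (~0.6 GB per billion params) plus ~2 GB
# of overhead for context and runtime.
fits_in_ram() {
  local params_b=$1 ram_gb=$2
  # integer math in tenths of a GB
  local need_tenths=$(( params_b * 6 + 20 ))
  local have_tenths=$(( ram_gb * 10 ))
  if [ "$need_tenths" -le "$have_tenths" ]; then
    echo "fits (~$(( need_tenths / 10 )) GB needed, ${ram_gb} GB available)"
  else
    echo "too big (~$(( need_tenths / 10 )) GB needed, ${ram_gb} GB available)"
  fi
}

fits_in_ram 8 16    # -> fits (~6 GB needed, 16 GB available)
fits_in_ram 70 16   # -> too big (~44 GB needed, 16 GB available)
```

If the estimate is borderline, assume it won’t fit comfortably: you still need headroom for the OS and the context window.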
Fixes:
- Use a smaller model: `ollama pull qwen3:8b`
- Use a more quantized version: `ollama pull devstral-small:24b-q4_K_M`
- Close other applications to free RAM
- Upgrade your hardware (see best AI models for Mac)
“address already in use” (port 11434)
Another Ollama instance or process is using the port.
```shell
# Find what's using the port
lsof -i :11434
# Kill it
kill -9 $(lsof -t -i :11434)
# Or run Ollama on a different port
OLLAMA_HOST=0.0.0.0:11435 ollama serve
```
“context length exceeded”
Your prompt + response exceeds the model’s context window.
```shell
# Increase the context window (defaults are small, often 4K)
# Server-wide, via environment variable (recent Ollama versions):
OLLAMA_CONTEXT_LENGTH=65536 ollama serve
# Or per session, at the >>> prompt inside `ollama run`:
#   /set parameter num_ctx 65536
```
Important: Larger context uses more RAM. A 64K context on a 24B model needs ~20GB RAM. See our context engineering guide for managing context efficiently.
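Where does that extra RAM go? Mostly into the KV cache, which grows linearly with context length. A rough sketch of the arithmetic, using shape numbers (40 layers, 8 KV heads, head dimension 128) that only approximate a 24B-class model, with the cache held at fp16:

```shell
# Rough KV-cache size estimate:
# bytes = 2 (K and V) * layers * kv_heads * head_dim * 2 (fp16 bytes) * ctx
# The layer/head/dim values here are illustrative assumptions.
kv_cache_gb() {
  local ctx=$1 layers=$2 kv_heads=$3 head_dim=$4
  echo $(( 2 * layers * kv_heads * head_dim * 2 * ctx / 1024 / 1024 / 1024 ))
}

kv_cache_gb 65536 40 8 128   # -> 10 (GB at 64K context)
```

Add that to the ~14 GB a 4-bit 24B model occupies on its own and you land in the same ballpark as the ~20 GB figure above; runtimes that quantize the KV cache need less.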
Responses are very slow
Possible causes:
- Running on CPU instead of GPU: check with `ollama ps`
- Model too large for RAM: causes swapping, 10-100x slower
- Context window too large: reduce `num_ctx`
- Other processes using the GPU: close other GPU-heavy apps
```shell
# Check current performance
ollama ps
# Shows: model name, size, processor (GPU/CPU), and memory usage

# Benchmark your setup (--verbose prints timing and tokens/s stats)
ollama run --verbose qwen3:8b "Write hello world in Python"
```
Expected speeds on Apple Silicon:
| Hardware | 8B model | 24B model |
|---|---|---|
| M1 8GB | ~15 tok/s | Too slow |
| M2 16GB | ~25 tok/s | ~15 tok/s |
| M3 Pro 36GB | ~35 tok/s | ~25 tok/s |
| M4 Pro 48GB | ~40 tok/s | ~30 tok/s |
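Those throughput numbers translate directly into wall-clock time: seconds ≈ tokens ÷ tokens-per-second, ignoring prompt processing. A trivial sketch:

```shell
# How long will a reply take at a given generation speed?
# (generation only; prompt processing adds time on top)
reply_seconds() {
  local tokens=$1 tok_per_s=$2
  echo $(( tokens / tok_per_s ))
}

reply_seconds 500 15   # 500-token answer at M1-class 8B speed -> 33
reply_seconds 500 40   # same answer at M4 Pro 8B speed       -> 12
```

If a typical answer feels much slower than this arithmetic predicts for your hardware tier, suspect CPU fallback or swapping rather than normal model speed.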
Connection refused from other tools
Aider, Continue.dev, or other tools can’t connect to Ollama.
```shell
# Check Ollama is running and listening
curl http://localhost:11434/api/tags
# If connecting from another machine, Ollama must listen on all interfaces
OLLAMA_HOST=0.0.0.0 ollama serve
```
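If curl isn’t available, a bash-only reachability check using the `/dev/tcp` trick works as well; this is just a sketch, and the host and port should match whatever your client tool is configured with:

```shell
# Bash-only TCP reachability check (no curl needed).
# /dev/tcp is a bash feature; it won't work in plain sh/dash.
ollama_reachable() {
  local host=${1:-127.0.0.1} port=${2:-11434}
  if (exec 3<>"/dev/tcp/$host/$port") 2>/dev/null; then
    echo "reachable at $host:$port"
  else
    echo "not reachable at $host:$port"
  fi
}

ollama_reachable 127.0.0.1 11434
```

“Not reachable” means the server isn’t running or is bound to a different interface/port; “reachable” but still failing in your tool usually points at the tool’s base-URL setting instead.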
Docker users:
```shell
# Ollama inside Docker: host networking (Linux only)
docker run --network host ollama/ollama
# Or publish the port explicitly (works on Mac/Windows too)
docker run -d -p 11434:11434 ollama/ollama
```
Clean reinstall
When nothing else works:
```shell
# Mac
brew uninstall ollama
rm -rf ~/.ollama
brew install ollama

# Linux
sudo systemctl stop ollama
sudo rm /usr/local/bin/ollama
rm -rf ~/.ollama
curl -fsSL https://ollama.com/install.sh | sh
```
This removes all models and settings. You’ll need to re-download your models.
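One way to soften the blow: save the output of `ollama list` before wiping, then turn it back into pull commands afterwards. A small sketch, assuming the standard `ollama list` layout (one header row, model name in the first column):

```shell
# Before wiping, save your model list:
#   ollama list > models.txt
# After reinstalling, turn the saved listing back into pull commands.
restore_cmds() {
  awk 'NR > 1 && NF { print "ollama pull " $1 }' "$1"
}

# Example with a saved listing:
printf 'NAME            ID        SIZE    MODIFIED\nqwen3:8b        abc123    5.2 GB  2 days ago\n' > models.txt
restore_cmds models.txt
# -> ollama pull qwen3:8b
```

Pipe the output through `sh` (or review it first) to re-download everything in one go.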
FAQ
Why is Ollama slow?
The most common causes are: running on CPU instead of GPU (check with ollama ps), using a model too large for your RAM which causes disk swapping, or having an excessively large context window set. Ensure your GPU drivers are installed and pick a model size that fits comfortably in your available memory.
Why can’t Ollama find my GPU?
On NVIDIA systems, this usually means the GPU drivers aren’t installed or are outdated — run nvidia-smi to verify. After installing or updating drivers, restart Ollama with sudo systemctl restart ollama. On Apple Silicon, the GPU is used automatically and doesn’t require driver setup.
How do I fix Ollama connection refused?
First verify Ollama is running with curl http://localhost:11434/api/tags. If it’s not running, start it with ollama serve. If connecting from another machine or Docker container, Ollama needs to listen on all interfaces: OLLAMA_HOST=0.0.0.0 ollama serve.
How do I clear Ollama’s cache?
To remove all downloaded models and reset Ollama, delete the ~/.ollama directory with rm -rf ~/.ollama. For removing just a specific model, use ollama rm model-name. After clearing, you’ll need to re-download any models you want to use.
Related: Ollama Complete Guide · Best Ollama Models for Coding · Ollama vs LM Studio vs vLLM · How to Run AI Without GPU · Best AI Models for Mac