Ollama usually just works. When it doesn’t, here are fixes for the most common errors.
Ollama won’t start
Symptoms: ollama run hangs or returns “could not connect.”
```shell
# Check if Ollama is already running
ps aux | grep ollama
# If running but unresponsive, kill and restart
pkill ollama
ollama serve
```
On Mac: Check Activity Monitor for “ollama” processes. Force quit all, then reopen the Ollama app.
On Linux (systemd):
```shell
sudo systemctl restart ollama
sudo systemctl status ollama
journalctl -u ollama -n 50   # check the logs
```
“command not found: ollama”
Ollama isn’t in your PATH.
```shell
# Mac (Homebrew)
brew install ollama
# Linux: reinstall
curl -fsSL https://ollama.com/install.sh | sh
# Check the installation
which ollama
ollama --version
```
GPU not detected (running on CPU)
Symptoms: Inference is extremely slow. ollama ps shows no GPU.
NVIDIA:
```shell
# Check that NVIDIA drivers are installed
nvidia-smi
# If not found, install them (Ubuntu/Debian; pick the current driver version)
sudo apt install nvidia-driver-535
# Restart Ollama after the driver install
sudo systemctl restart ollama
```
Apple Silicon: GPU is used automatically. If slow, check that you’re not running out of unified memory:
```shell
# Check total memory (reported in bytes)
sysctl hw.memsize
# Compare model size against memory in use
ollama ps
```
No GPU at all? See our guide on running AI without a GPU. CPU inference works but is 5-10x slower.
Model download stuck
Symptoms: Download progress stops or is extremely slow.
```shell
# Cancel (Ctrl+C) and retry
ollama pull devstral-small:24b
# If still stuck, clear only the partial downloads; their blob files
# contain "-partial". Deleting all of sha256-* would wipe every model.
rm -f ~/.ollama/models/blobs/sha256-*-partial*
ollama pull devstral-small:24b
# Check disk space
df -h ~/.ollama
```
Common cause: Not enough disk space. Models are large (5-20GB). Free up space or change the Ollama storage directory:
```shell
# Move Ollama storage (only affects processes started from this shell)
export OLLAMA_MODELS=/path/to/larger/drive/.ollama/models
```
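Note that `export` only affects the shell it runs in; the Ollama server process itself has to see the variable. On Linux, where Ollama runs as a systemd service, a drop-in override makes it permanent (the path below is an example, not a required location):

```ini
# /etc/systemd/system/ollama.service.d/override.conf
# (create it with: sudo systemctl edit ollama)
[Service]
Environment="OLLAMA_MODELS=/mnt/bigdrive/ollama/models"
```

After saving, run `sudo systemctl daemon-reload && sudo systemctl restart ollama`, and make sure the user the service runs as can read and write the new directory.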
“model requires more system memory”
The model is too large for your RAM.
| Your RAM | Max model size | Recommended model |
|---|---|---|
| 8 GB | ~5 GB (8B models) | Qwen3 8B |
| 16 GB | ~12 GB (14-24B) | Devstral Small 24B |
| 32 GB | ~24 GB (27-32B) | Qwen3 32B |
| 64 GB | ~48 GB (70B) | Llama 3.3 70B |
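As a rule of thumb, the table above works out to roughly 0.6 GB per billion parameters at the common 4-bit quantization, plus a couple of GB of runtime overhead. A quick sketch of that arithmetic (the constants are rough assumptions, not exact figures):

```shell
# Rough check: will an N-billion-parameter model fit in your RAM?
# Assumes ~4-bit quantization (~0.6 GB per billion params) plus ~2 GB
# of overhead for context and runtime.
fits_in_ram() {
  local params_b=$1 ram_gb=$2
  # integer math in tenths of a GB
  local need_tenths=$(( params_b * 6 + 20 ))
  local have_tenths=$(( ram_gb * 10 ))
  if [ "$need_tenths" -le "$have_tenths" ]; then
    echo "fits (~$(( need_tenths / 10 )) GB needed, ${ram_gb} GB available)"
  else
    echo "too big (~$(( need_tenths / 10 )) GB needed, ${ram_gb} GB available)"
  fi
}

fits_in_ram 8 16    # -> fits (~6 GB needed, 16 GB available)
fits_in_ram 70 16   # -> too big (~44 GB needed, 16 GB available)
```

If the estimate is borderline, assume it won’t fit comfortably: you still need headroom for the OS and the context window.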
Fixes:
- Use a smaller model: `ollama pull qwen3:8b`
- Use a more quantized version: `ollama pull devstral-small:24b-q4_K_M`
- Close other applications to free RAM
- Upgrade your hardware (see best AI models for Mac)
“address already in use” (port 11434)
Another Ollama instance or process is using the port.
```shell
# Find what's using the port
lsof -i :11434
# Kill it
kill -9 $(lsof -t -i :11434)
# Or run Ollama on a different port
OLLAMA_HOST=0.0.0.0:11435 ollama serve
```
“context length exceeded”
Your prompt + response exceeds the model’s context window.
```shell
# Increase the context window (defaults are small, often 4K)
# Server-wide, via environment variable (recent Ollama versions):
OLLAMA_CONTEXT_LENGTH=65536 ollama serve
# Or per session, at the >>> prompt inside `ollama run`:
#   /set parameter num_ctx 65536
```
Important: Larger context uses more RAM. A 64K context on a 24B model needs ~20GB RAM. See our context engineering guide for managing context efficiently.
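Where does that extra RAM go? Mostly into the KV cache, which grows linearly with context length. A rough sketch of the arithmetic, using shape numbers (40 layers, 8 KV heads, head dimension 128) that only approximate a 24B-class model, with the cache held at fp16:

```shell
# Rough KV-cache size estimate:
# bytes = 2 (K and V) * layers * kv_heads * head_dim * 2 (fp16 bytes) * ctx
# The layer/head/dim values here are illustrative assumptions.
kv_cache_gb() {
  local ctx=$1 layers=$2 kv_heads=$3 head_dim=$4
  echo $(( 2 * layers * kv_heads * head_dim * 2 * ctx / 1024 / 1024 / 1024 ))
}

kv_cache_gb 65536 40 8 128   # -> 10 (GB at 64K context)
```

Add that to the ~14 GB a 4-bit 24B model occupies on its own and you land in the same ballpark as the ~20 GB figure above; runtimes that quantize the KV cache need less.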
Responses are very slow
Possible causes:
- Running on CPU instead of GPU: check with `ollama ps`
- Model too large for RAM: causes swapping, 10-100x slower
- Context window too large: reduce `num_ctx`
- Other processes using the GPU: close other GPU-heavy apps
```shell
# Check current performance
ollama ps
# Shows: model name, size, processor (GPU/CPU), and memory usage

# Benchmark your setup (--verbose prints timing and tokens/s stats)
ollama run --verbose qwen3:8b "Write hello world in Python"
```
Expected speeds on Apple Silicon:
| Hardware | 8B model | 24B model |
|---|---|---|
| M1 8GB | ~15 tok/s | Too slow |
| M2 16GB | ~25 tok/s | ~15 tok/s |
| M3 Pro 36GB | ~35 tok/s | ~25 tok/s |
| M4 Pro 48GB | ~40 tok/s | ~30 tok/s |
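Those throughput numbers translate directly into wall-clock time: seconds ≈ tokens ÷ tokens-per-second, ignoring prompt processing. A trivial sketch:

```shell
# How long will a reply take at a given generation speed?
# (generation only; prompt processing adds time on top)
reply_seconds() {
  local tokens=$1 tok_per_s=$2
  echo $(( tokens / tok_per_s ))
}

reply_seconds 500 15   # 500-token answer at M1-class 8B speed -> 33
reply_seconds 500 40   # same answer at M4 Pro 8B speed       -> 12
```

If a typical answer feels much slower than this arithmetic predicts for your hardware tier, suspect CPU fallback or swapping rather than normal model speed.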
Connection refused from other tools
Aider, Continue.dev, or other tools can’t connect to Ollama.
```shell
# Check Ollama is running and listening
curl http://localhost:11434/api/tags
# If connecting from another machine, Ollama must listen on all interfaces
OLLAMA_HOST=0.0.0.0 ollama serve
```
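If curl isn’t available, a bash-only reachability check using the `/dev/tcp` trick works as well; this is just a sketch, and the host and port should match whatever your client tool is configured with:

```shell
# Bash-only TCP reachability check (no curl needed).
# /dev/tcp is a bash feature; it won't work in plain sh/dash.
ollama_reachable() {
  local host=${1:-127.0.0.1} port=${2:-11434}
  if (exec 3<>"/dev/tcp/$host/$port") 2>/dev/null; then
    echo "reachable at $host:$port"
  else
    echo "not reachable at $host:$port"
  fi
}

ollama_reachable 127.0.0.1 11434
```

“Not reachable” means the server isn’t running or is bound to a different interface/port; “reachable” but still failing in your tool usually points at the tool’s base-URL setting instead.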
Docker users:
```shell
# Ollama inside Docker: host networking (Linux only)
docker run --network host ollama/ollama
# Or publish the port explicitly (works on Mac/Windows too)
docker run -d -p 11434:11434 ollama/ollama
```
Clean reinstall
When nothing else works:
```shell
# Mac
brew uninstall ollama
rm -rf ~/.ollama
brew install ollama

# Linux
sudo systemctl stop ollama
sudo rm /usr/local/bin/ollama
rm -rf ~/.ollama
curl -fsSL https://ollama.com/install.sh | sh
```
This removes all models and settings. You’ll need to re-download your models.
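One way to soften the blow: save the output of `ollama list` before wiping, then turn it back into pull commands afterwards. A small sketch, assuming the standard `ollama list` layout (one header row, model name in the first column):

```shell
# Before wiping, save your model list:
#   ollama list > models.txt
# After reinstalling, turn the saved listing back into pull commands.
restore_cmds() {
  awk 'NR > 1 && NF { print "ollama pull " $1 }' "$1"
}

# Example with a saved listing:
printf 'NAME            ID        SIZE    MODIFIED\nqwen3:8b        abc123    5.2 GB  2 days ago\n' > models.txt
restore_cmds models.txt
# -> ollama pull qwen3:8b
```

Pipe the output through `sh` (or review it first) to re-download everything in one go.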
FAQ
Why is Ollama slow?
The most common causes are: running on CPU instead of GPU (check with ollama ps), using a model too large for your RAM which causes disk swapping, or having an excessively large context window set. Ensure your GPU drivers are installed and pick a model size that fits comfortably in your available memory.
Why can’t Ollama find my GPU?
On NVIDIA systems, this usually means the GPU drivers aren’t installed or are outdated — run nvidia-smi to verify. After installing or updating drivers, restart Ollama with sudo systemctl restart ollama. On Apple Silicon, the GPU is used automatically and doesn’t require driver setup.
How do I fix Ollama connection refused?
First verify Ollama is running with curl http://localhost:11434/api/tags. If it’s not running, start it with ollama serve. If connecting from another machine or Docker container, Ollama needs to listen on all interfaces: OLLAMA_HOST=0.0.0.0 ollama serve.
How do I clear Ollama’s cache?
To remove all downloaded models and reset Ollama, delete the ~/.ollama directory with rm -rf ~/.ollama. For removing just a specific model, use ollama rm model-name. After clearing, you’ll need to re-download any models you want to use.
Related: Ollama Complete Guide · Best Ollama Models for Coding · Ollama vs LM Studio vs vLLM · How to Run AI Without GPU · Best AI Models for Mac