Ollama Troubleshooting Guide — Fix Every Common Error


Ollama usually just works. When it doesn’t, here are the fixes for every common error.

Ollama won’t start

Symptoms: ollama run hangs or returns “could not connect.”

# Check if Ollama is already running
ps aux | grep ollama

# If running but unresponsive, kill and restart
pkill ollama
ollama serve

On Mac: Check Activity Monitor for “ollama” processes. Force quit all, then reopen the Ollama app.

On Linux (systemd):

sudo systemctl restart ollama
sudo systemctl status ollama
journalctl -u ollama -n 50  # Check logs

“command not found: ollama”

Ollama isn’t in your PATH.

# Mac (Homebrew)
brew install ollama

# Linux - reinstall
curl -fsSL https://ollama.com/install.sh | sh

# Check installation
which ollama
ollama --version
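
If `which ollama` still finds nothing after installing, the binary may simply live in a directory that isn't on your PATH. A minimal check, assuming the default install locations for the Linux installer and Homebrew:

```shell
# Look in the usual install locations (these paths are assumptions)
ls /usr/local/bin/ollama /opt/homebrew/bin/ollama 2>/dev/null || true

# Add the likely directory to PATH for the current shell
export PATH="$PATH:/usr/local/bin"

# Confirm the directory is now on PATH
case ":$PATH:" in
  *:/usr/local/bin:*) echo "on PATH" ;;
  *) echo "still missing" ;;
esac
```

To make it permanent, add the `export` line to your `~/.zshrc` or `~/.bashrc`.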

GPU not detected (running on CPU)

Symptoms: Inference is extremely slow. ollama ps shows no GPU.

NVIDIA:

# Check if NVIDIA drivers are installed
nvidia-smi

# If not found, install drivers
# Ubuntu/Debian:
sudo apt install nvidia-driver-535

# Restart Ollama after driver install
sudo systemctl restart ollama

Apple Silicon: GPU is used automatically. If slow, check that you’re not running out of unified memory:

# Check available memory
sysctl hw.memsize
# Check model size vs available memory
ollama ps
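
`hw.memsize` prints raw bytes, which is awkward to compare against model sizes quoted in GB. A one-liner to convert (the fallback value of 16 GiB is just an example for non-macOS shells):

```shell
# hw.memsize reports bytes; convert to GiB for comparison with model sizes
bytes=$(sysctl -n hw.memsize 2>/dev/null || echo 17179869184)
echo "$bytes" | awk '{printf "%.1f GiB\n", $1 / (1024*1024*1024)}'
```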

No GPU at all? See our guide on running AI without a GPU. CPU inference works but is 5-10x slower.

Model download stuck

Symptoms: Download progress stops or is extremely slow.

# Cancel and retry
# Ctrl+C, then:
ollama pull devstral-small:24b

# If still stuck, remove incomplete downloads (files ending in -partial).
# Don't delete all sha256-* blobs — that would wipe every downloaded model.
rm -f ~/.ollama/models/blobs/*-partial
ollama pull devstral-small:24b

# Check disk space
df -h ~/.ollama

Common cause: Not enough disk space. Models are large (5-20GB). Free up space or change the Ollama storage directory:

# Move Ollama storage (the variable must be visible to the ollama serve
# process — restart the server after setting it)
export OLLAMA_MODELS=/path/to/larger/drive/.ollama/models
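
An `export` in your shell only affects processes launched from that shell. If Ollama runs as a systemd service on Linux, the variable has to be set on the service itself via a drop-in override. A sketch, assuming the service is named `ollama` (the storage path is a placeholder):

```ini
# /etc/systemd/system/ollama.service.d/override.conf
# Create it with: sudo systemctl edit ollama
[Service]
Environment="OLLAMA_MODELS=/path/to/larger/drive/.ollama/models"
```

Then apply it with `sudo systemctl daemon-reload && sudo systemctl restart ollama`.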

“model requires more system memory”

The model is too large for your RAM.

| Your RAM | Max model size | Recommended model |
| --- | --- | --- |
| 8 GB | ~5 GB (8B models) | Qwen3 8B |
| 16 GB | ~12 GB (14-24B) | Devstral Small 24B |
| 32 GB | ~24 GB (27-32B) | Qwen 3.5 27B |
| 64 GB | ~48 GB (70B) | Llama 4 Scout 70B |
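
The table above follows a rough rule of thumb: a model's download size should stay under about 75% of system RAM, leaving headroom for context and the OS. That check as a quick shell function (the 75% threshold is an assumption inferred from the table, not an Ollama-documented limit):

```shell
# Rule-of-thumb check: does a model of size $1 GB fit in $2 GB of RAM?
fits() {
  model_gb=$1
  ram_gb=$2
  # POSIX integer math: compare model*100 against ram*75
  [ $((model_gb * 100)) -le $((ram_gb * 75)) ] && echo "fits" || echo "too big"
}

fits 12 16   # prints "fits"    (a 14-24B quant on a 16 GB machine)
fits 24 16   # prints "too big" (will swap and crawl)
```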

Fixes:

  • Use a smaller model: ollama pull qwen3:8b
  • Use a more quantized version: ollama pull devstral-small:24b-q4_K_M
  • Close other applications to free RAM
  • Upgrade hardware — see best AI models for Mac

“address already in use” (port 11434)

Another Ollama instance or process is using the port.

# Find what's using the port
lsof -i :11434

# Kill it
kill -9 $(lsof -t -i :11434)

# Or run the server on a different port (clients must set OLLAMA_HOST to match)
OLLAMA_HOST=0.0.0.0:11435 ollama serve

“context length exceeded”

Your prompt + response exceeds the model’s context window.

# Increase the context window (the default is often 4K, which is too small).
# In an interactive session, use the slash command:
ollama run devstral-small:24b
>>> /set parameter num_ctx 65536

# Or set a larger default for the whole server before starting it:
OLLAMA_CONTEXT_LENGTH=65536 ollama serve

Important: Larger context uses more RAM. A 64K context on a 24B model needs ~20GB RAM. See our context engineering guide for managing context efficiently.
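
If you drive Ollama through its HTTP API rather than the CLI, the context window is a per-request option called `num_ctx` inside the generate endpoint's `options` object. A sketch of the request body (model and prompt are placeholders):

```json
{
  "model": "devstral-small:24b",
  "prompt": "Summarize this file",
  "options": { "num_ctx": 65536 }
}
```

POST it with, for example, `curl http://localhost:11434/api/generate -d @body.json`.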

Responses are very slow

Possible causes:

  1. Running on CPU instead of GPU — check with ollama ps
  2. Model too large for RAM — causes swapping, 10-100x slower
  3. Context window too large — reduce num_ctx
  4. Other processes using GPU — close other GPU-heavy apps
# Check current performance
ollama ps
# Shows: model name, size, processor (GPU/CPU), and memory usage

# Benchmark your setup
time ollama run qwen3:8b "Write hello world in Python" --verbose

Expected speeds on Apple Silicon:

| Hardware | 8B model | 24B model |
| --- | --- | --- |
| M1 8GB | ~15 tok/s | Too slow |
| M2 16GB | ~25 tok/s | ~15 tok/s |
| M3 Pro 36GB | ~35 tok/s | ~25 tok/s |
| M4 Pro 48GB | ~40 tok/s | ~30 tok/s |

Connection refused from other tools

Aider, Continue.dev, or other tools can’t connect to Ollama.

# Check Ollama is running and listening
curl http://localhost:11434/api/tags

# If connecting from another machine, Ollama needs to listen on all interfaces
OLLAMA_HOST=0.0.0.0 ollama serve

Docker users:

# Publish the port so host tools can reach the container
# (add --gpus=all for NVIDIA; host networking also works on Linux)
docker run -d -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama
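
For a longer-lived setup, the same thing as a Docker Compose sketch (image name and port are from the official image; the named volume keeps downloaded models across container restarts):

```yaml
services:
  ollama:
    image: ollama/ollama
    ports:
      - "11434:11434"
    volumes:
      - ollama:/root/.ollama

volumes:
  ollama:
```

Start it with `docker compose up -d`; tools then point at `localhost:11434` as usual.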

Clean reinstall

When nothing else works:

# Mac
brew uninstall ollama
rm -rf ~/.ollama
brew install ollama

# Linux
sudo systemctl stop ollama
sudo systemctl disable ollama
sudo rm /etc/systemd/system/ollama.service
sudo rm /usr/local/bin/ollama
rm -rf ~/.ollama
curl -fsSL https://ollama.com/install.sh | sh

This removes all models and settings. You’ll need to re-download your models.

FAQ

Why is Ollama slow?

The most common causes are: running on CPU instead of GPU (check with ollama ps), using a model too large for your RAM which causes disk swapping, or having an excessively large context window set. Ensure your GPU drivers are installed and pick a model size that fits comfortably in your available memory.

Why can’t Ollama find my GPU?

On NVIDIA systems, this usually means the GPU drivers aren’t installed or are outdated — run nvidia-smi to verify. After installing or updating drivers, restart Ollama with sudo systemctl restart ollama. On Apple Silicon, the GPU is used automatically and doesn’t require driver setup.

How do I fix Ollama connection refused?

First verify Ollama is running with curl http://localhost:11434/api/tags. If it’s not running, start it with ollama serve. If connecting from another machine or Docker container, Ollama needs to listen on all interfaces: OLLAMA_HOST=0.0.0.0 ollama serve.

How do I clear Ollama’s cache?

To remove all downloaded models and reset Ollama, delete the ~/.ollama directory with rm -rf ~/.ollama. For removing just a specific model, use ollama rm model-name. After clearing, you’ll need to re-download any models you want to use.

Related: Ollama Complete Guide · Best Ollama Models for Coding · Ollama vs LM Studio vs vLLM · How to Run AI Without GPU · Best AI Models for Mac