Ollama GPU Not Detected Fix: CUDA and Metal Acceleration Issues (2026)
Ollama is running but using CPU instead of GPU. Your responses are 5-10x slower than they should be. Here's how to fix GPU detection for both NVIDIA and Apple Silicon. If you don't have a compatible GPU at all, cloud GPU providers are a quick alternative.
Check if GPU is being used
# Check running models and their processor
ollama ps
# Look for "GPU" in the PROCESSOR column; if it says "CPU", GPU isn't being used
# NVIDIA: check if GPU is visible
nvidia-smi
# Should show your GPU with driver version
# macOS: Metal is used automatically on Apple Silicon
# If slow, check Activity Monitor → GPU History
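If you want to script this check, here is a small sketch. The `check_processor` helper is hypothetical; it only inspects the text that `ollama ps` prints in the PROCESSOR column:

```shell
# Hypothetical helper: classify the PROCESSOR column from "ollama ps".
# Values look like "100% GPU", "100% CPU", or "48%/52% CPU/GPU" (partial offload).
check_processor() {
  case "$1" in
    *GPU*) echo "GPU in use" ;;
    *CPU*) echo "CPU only - GPU not detected" ;;
    *)     echo "unknown" ;;
  esac
}

check_processor "100% GPU"   # GPU in use
check_processor "100% CPU"   # CPU only - GPU not detected
```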
NVIDIA fixes
Fix 1: Install/update CUDA drivers
# Check current driver
nvidia-smi
# If "command not found" β drivers aren't installed
# Ubuntu/Debian
sudo apt update
sudo apt install nvidia-driver-550 # Or run "ubuntu-drivers devices" (Ubuntu) to see the recommended version
sudo reboot
# After reboot, verify
nvidia-smi
Fix 2: Install NVIDIA Container Toolkit (Docker)
If running Ollama in Docker, GPU passthrough needs the toolkit:
# Install NVIDIA Container Toolkit
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
sudo apt update
sudo apt install nvidia-container-toolkit
sudo systemctl restart docker
# Verify passthrough: this should print the same GPU table as on the host
sudo docker run --rm --gpus all ubuntu nvidia-smi
Then run Ollama with GPU access:
# docker-compose.yml
services:
  ollama:
    image: ollama/ollama
    runtime: nvidia
    environment:
      - NVIDIA_VISIBLE_DEVICES=all
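Note that `runtime: nvidia` requires the NVIDIA runtime to be registered with the Docker daemon. On recent Docker/Compose versions you can instead request GPUs through the device-reservation syntax, roughly:

```yaml
# docker-compose.yml (alternative GPU syntax for recent Compose versions)
services:
  ollama:
    image: ollama/ollama
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
```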
Fix 3: Force GPU layers
# Force all layers to GPU
OLLAMA_NUM_GPU=999 ollama serve
# Or set specific number of layers
OLLAMA_NUM_GPU=35 ollama serve
If the model is too large for VRAM, Ollama silently falls back to CPU. See our out of memory fix for solutions.
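To gauge whether a model fits before Ollama falls back, here is a rough back-of-the-envelope sketch. It uses integer GB math, and the 20% overhead factor for KV cache and runtime buffers is an assumption, not an Ollama constant:

```shell
# Rough VRAM estimate: billions of params * bits per weight / 8 = weight size in GB,
# plus ~20% for KV cache and runtime buffers. Integer math, so results round down.
estimate_vram_gb() {
  local params_b=$1 bits=$2
  echo $(( params_b * bits / 8 * 12 / 10 ))
}

estimate_vram_gb 8 4    # 8B model, 4-bit quant  -> ~4 GB
estimate_vram_gb 70 4   # 70B model, 4-bit quant -> ~42 GB
```

Compare the estimate against your card's capacity, e.g. `nvidia-smi --query-gpu=memory.total --format=csv,noheader`.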
Apple Silicon fixes
Fix 1: Check Metal support
Metal GPU acceleration should work automatically on M1/M2/M3/M4. If it's not:
# Reinstall Ollama
brew reinstall ollama
# Or download fresh from ollama.com
Fix 2: Unified memory pressure
On Apple Silicon, GPU and CPU share memory. If other apps are using too much:
# Check memory pressure
vm_stat | head -10
# Close memory-hungry apps
# Chrome, Docker, Xcode, Simulator are common culprits
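`vm_stat` reports counts of pages, not bytes. To turn those counts into something readable, here is a small conversion sketch (it assumes the 16 KiB page size used on Apple Silicon; `vm_stat` prints the actual page size in its first line):

```shell
# Convert a vm_stat page count to GiB, assuming 16384-byte pages (Apple Silicon).
pages_to_gib() {
  echo $(( $1 * 16384 / 1073741824 ))
}

# e.g. vm_stat reports "Pages free: 131072."
pages_to_gib 131072   # -> 2 (GiB free)
```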
Fix 3: Rosetta issues (Intel apps)
If you're running Ollama through Rosetta (Intel emulation), Metal won't work:
# Check architecture
file $(which ollama)
# Should say "arm64" not "x86_64"
# If x86_64, reinstall the ARM version
brew uninstall ollama
arch -arm64 brew install ollama
Verify GPU is working
After applying fixes:
ollama run qwen3:8b --verbose "Hello"
# Check the output for:
# - "eval rate" should be 20+ tok/s (GPU) vs 2-5 tok/s (CPU)
# - GPU memory usage in nvidia-smi should increase
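As a quick sanity check on the numbers, the thresholds above can be turned into a tiny script. The 20 tok/s cutoff is this article's rule of thumb, not a hard limit; real rates vary with model size and hardware:

```shell
# Classify a reported "eval rate" (whole tokens/sec) using the rule of thumb above.
classify_rate() {
  if [ "$1" -ge 20 ]; then
    echo "likely GPU"
  else
    echo "likely CPU"
  fi
}

classify_rate 45   # likely GPU
classify_rate 3    # likely CPU
```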
Related: Ollama Complete Guide · Ollama Out of Memory Fix · Ollama Slow Inference Fix · Best GPU for AI Locally · CPU vs GPU for LLM Inference · How Much VRAM for AI Models