🔧 Error Fixes
· 2 min read

Ollama GPU Not Detected Fix: CUDA and Metal Acceleration Issues (2026)


Ollama is running, but it’s using the CPU instead of your GPU, making responses 5-10x slower than they should be. Here’s how to fix GPU detection for both NVIDIA cards and Apple Silicon. If you don’t have a compatible GPU at all, cloud GPU providers are a quick alternative.

Check if GPU is being used

# Check running models and their processor
ollama ps
# Look at the PROCESSOR column; if it says "CPU", the GPU isn't being used

# NVIDIA: check if GPU is visible
nvidia-smi
# Should show your GPU with driver version

# macOS: Metal is used automatically on Apple Silicon
# If slow, check Activity Monitor → GPU History
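
For a definitive answer, you can also ask the Ollama API directly: /api/ps reports size and size_vram per loaded model. A minimal sketch, with a trimmed sample response hard-coded so the parsing is visible without a running server (on a live install, pipe `curl -s http://localhost:11434/api/ps` through the same commands):

```shell
# Sample of what /api/ps returns (trimmed); in practice:
#   curl -s http://localhost:11434/api/ps
RESP='{"models":[{"name":"qwen3:8b","size":5600000000,"size_vram":5600000000}]}'

# size_vram == size means the model is fully on GPU; 0 means CPU only;
# anything in between is a partial offload.
SIZE=$(echo "$RESP" | grep -o '"size":[0-9]*' | cut -d: -f2)
VRAM=$(echo "$RESP" | grep -o '"size_vram":[0-9]*' | cut -d: -f2)
[ "$VRAM" -eq 0 ] && echo "CPU only" || echo "GPU: ${VRAM} of ${SIZE} bytes offloaded"
```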

NVIDIA fixes

Fix 1: Install/update CUDA drivers

# Check current driver
nvidia-smi
# If "command not found", drivers aren't installed

# Ubuntu/Debian
sudo apt update
sudo apt install nvidia-driver-550  # Or latest version
sudo reboot

# After reboot, verify
nvidia-smi

Fix 2: Install NVIDIA Container Toolkit (Docker)

If running Ollama in Docker, GPU passthrough needs the toolkit:

# Install NVIDIA Container Toolkit
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
  sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
  sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
sudo apt update
sudo apt install nvidia-container-toolkit
sudo systemctl restart docker

Then run Ollama with GPU access:

# docker-compose.yml
services:
  ollama:
    image: ollama/ollama
    runtime: nvidia
    environment:
      - NVIDIA_VISIBLE_DEVICES=all
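
If you prefer a one-off container over Compose, the equivalent docker run (per the Ollama Docker image instructions) is:

```shell
# Requires the NVIDIA Container Toolkit from Fix 2
docker run -d --gpus=all \
  -v ollama:/root/.ollama \
  -p 11434:11434 \
  --name ollama \
  ollama/ollama
```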

Fix 3: Force GPU layers

# Force all layers to GPU
OLLAMA_NUM_GPU=999 ollama serve

# Or set specific number of layers
OLLAMA_NUM_GPU=35 ollama serve

If the model is too large for VRAM, Ollama silently falls back to CPU. See our out of memory fix for solutions.
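
The silent fallback is easy to catch ahead of time by comparing the model’s size (from `ollama list`) against free VRAM. A rough sketch, assuming `nvidia-smi` is available; the fallback value is only there so the arithmetic stays visible on machines without an NVIDIA GPU:

```shell
MODEL_GB=5   # sample: an 8B model at 4-bit quantization is roughly 5 GB; check `ollama list`
FREE_MB=$(nvidia-smi --query-gpu=memory.free --format=csv,noheader,nounits 2>/dev/null | head -1 || true)
FREE_MB=${FREE_MB:-8192}   # sample fallback when no NVIDIA GPU is present
NEEDED_MB=$((MODEL_GB * 1024))
if [ "$FREE_MB" -ge "$NEEDED_MB" ]; then
  echo "fits: ${FREE_MB} MB free >= ${NEEDED_MB} MB needed"
else
  echo "too big: expect a silent CPU fallback or partial offload"
fi
```

Leave a couple of GB of headroom on top of the model size for the KV cache, which grows with context length.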

Apple Silicon fixes

Fix 1: Check Metal support

Metal GPU acceleration should work automatically on M1/M2/M3/M4. If it isn’t working:

# Reinstall Ollama
brew reinstall ollama

# Or download fresh from ollama.com

Fix 2: Unified memory pressure

On Apple Silicon, GPU and CPU share memory. If other apps are using too much:

# Check memory pressure
vm_stat | head -10

# Close memory-hungry apps
# Chrome, Docker, Xcode, Simulator are common culprits
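
vm_stat reports page counts rather than bytes; on Apple Silicon a page is 16 KB. A quick awk conversion (a sample line is hard-coded here so the arithmetic is visible anywhere; on a Mac, pipe the real `vm_stat` output through the same awk):

```shell
# Sample vm_stat line; page size on Apple Silicon is 16384 bytes
LINE='Pages free:                              123456.'
echo "$LINE" | awk '{ gsub(/\./, "", $3); printf "free: %.2f GB\n", $3 * 16384 / 1e9 }'
# 123456 pages x 16384 bytes ≈ 2.02 GB free
```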

Fix 3: Rosetta issues (Intel apps)

If you’re running Ollama through Rosetta (Intel emulation), Metal won’t work:

# Check architecture
file $(which ollama)
# Should say "arm64" not "x86_64"

# If x86_64, reinstall the ARM version
brew uninstall ollama
arch -arm64 brew install ollama

Verify GPU is working

After applying fixes:

ollama run qwen3:8b --verbose "Hello"
# Check the output for:
# - "eval rate" should be 20+ tok/s (GPU) vs 2-5 tok/s (CPU)
# - GPU memory usage in nvidia-smi should increase
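
For exact numbers, the /api/generate response also includes eval_count (tokens generated) and eval_duration (in nanoseconds), from which the eval rate is computed. Sample values are hard-coded below so the math is visible without a server:

```shell
# The two fields come from a non-streaming /api/generate response, e.g.:
#   curl -s http://localhost:11434/api/generate \
#     -d '{"model": "qwen3:8b", "prompt": "Hello", "stream": false}'
EVAL_COUNT=128            # sample: tokens generated
EVAL_DURATION=4000000000  # sample: 4 seconds, in nanoseconds
TOK_S=$(awk -v c="$EVAL_COUNT" -v d="$EVAL_DURATION" 'BEGIN { printf "%.1f", c / (d / 1e9) }')
echo "eval rate: ${TOK_S} tok/s"   # single digits usually mean CPU
```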

Related: Ollama Complete Guide Β· Ollama Out of Memory Fix Β· Ollama Slow Inference Fix Β· Best GPU for AI Locally Β· CPU vs GPU for LLM Inference Β· How Much VRAM for AI Models