Ollama GPU Not Detected Fix: CUDA and Metal Acceleration Issues (2026)
Ollama is running but using CPU instead of GPU. Your responses are 5-10x slower than they should be. Here's how to fix GPU detection for both NVIDIA and Apple Silicon. If you don't have a compatible GPU at all, cloud GPU providers are a quick alternative.
Check if GPU is being used
# Check running models and their processor
ollama ps
# Look for "GPU" in the PROCESSOR column; if it says "CPU", GPU isn't being used
# NVIDIA: check if GPU is visible
nvidia-smi
# Should show your GPU with driver version
# macOS: Metal is used automatically on Apple Silicon
# If slow, check Activity Monitor → GPU History
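If you want to script this check, here is a small sketch. The `check_processor` helper is hypothetical; it only inspects the text that `ollama ps` prints in the PROCESSOR column:

```shell
# Hypothetical helper: classify the PROCESSOR column from "ollama ps".
# Values look like "100% GPU", "100% CPU", or "48%/52% CPU/GPU" (partial offload).
check_processor() {
  case "$1" in
    *GPU*) echo "GPU in use" ;;
    *CPU*) echo "CPU only - GPU not detected" ;;
    *)     echo "unknown" ;;
  esac
}

check_processor "100% GPU"   # GPU in use
check_processor "100% CPU"   # CPU only - GPU not detected
```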
NVIDIA fixes
Fix 1: Install/update CUDA drivers
# Check current driver
nvidia-smi
# If "command not found" β drivers aren't installed
# Ubuntu/Debian
sudo apt update
sudo apt install nvidia-driver-550 # Or run "ubuntu-drivers devices" (Ubuntu) to see the recommended version
sudo reboot
# After reboot, verify
nvidia-smi
Fix 2: Install NVIDIA Container Toolkit (Docker)
If running Ollama in Docker, GPU passthrough needs the toolkit:
# Install NVIDIA Container Toolkit
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
sudo apt update
sudo apt install nvidia-container-toolkit
sudo systemctl restart docker
# Verify passthrough: this should print the same GPU table as on the host
sudo docker run --rm --gpus all ubuntu nvidia-smi
Then run Ollama with GPU access:
# docker-compose.yml
services:
  ollama:
    image: ollama/ollama
    runtime: nvidia
    environment:
      - NVIDIA_VISIBLE_DEVICES=all
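Note that `runtime: nvidia` requires the NVIDIA runtime to be registered with the Docker daemon. On recent Docker/Compose versions you can instead request GPUs through the device-reservation syntax, roughly:

```yaml
# docker-compose.yml (alternative GPU syntax for recent Compose versions)
services:
  ollama:
    image: ollama/ollama
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
```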
Fix 3: Force GPU layers
# Force all layers to GPU
OLLAMA_NUM_GPU=999 ollama serve
# Or set specific number of layers
OLLAMA_NUM_GPU=35 ollama serve
If the model is too large for VRAM, Ollama silently falls back to CPU. See our out of memory fix for solutions.
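To gauge whether a model fits before Ollama falls back, here is a rough back-of-the-envelope sketch. It uses integer GB math, and the 20% overhead factor for KV cache and runtime buffers is an assumption, not an Ollama constant:

```shell
# Rough VRAM estimate: billions of params * bits per weight / 8 = weight size in GB,
# plus ~20% for KV cache and runtime buffers. Integer math, so results round down.
estimate_vram_gb() {
  local params_b=$1 bits=$2
  echo $(( params_b * bits / 8 * 12 / 10 ))
}

estimate_vram_gb 8 4    # 8B model, 4-bit quant  -> ~4 GB
estimate_vram_gb 70 4   # 70B model, 4-bit quant -> ~42 GB
```

Compare the estimate against your card's capacity, e.g. `nvidia-smi --query-gpu=memory.total --format=csv,noheader`.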
Apple Silicon fixes
Fix 1: Check Metal support
Metal GPU acceleration should work automatically on M1/M2/M3/M4. If it's not:
# Reinstall Ollama
brew reinstall ollama
# Or download fresh from ollama.com
Fix 2: Unified memory pressure
On Apple Silicon, GPU and CPU share memory. If other apps are using too much:
# Check memory pressure
vm_stat | head -10
# Close memory-hungry apps
# Chrome, Docker, Xcode, Simulator are common culprits
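`vm_stat` reports counts of pages, not bytes. To turn those counts into something readable, here is a small conversion sketch (it assumes the 16 KiB page size used on Apple Silicon; `vm_stat` prints the actual page size in its first line):

```shell
# Convert a vm_stat page count to GiB, assuming 16384-byte pages (Apple Silicon).
pages_to_gib() {
  echo $(( $1 * 16384 / 1073741824 ))
}

# e.g. vm_stat reports "Pages free: 131072."
pages_to_gib 131072   # -> 2 (GiB free)
```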
Fix 3: Rosetta issues (Intel apps)
If you're running Ollama through Rosetta (Intel emulation), Metal won't work:
# Check architecture
file $(which ollama)
# Should say "arm64" not "x86_64"
# If x86_64, reinstall the ARM version
brew uninstall ollama
arch -arm64 brew install ollama
Verify GPU is working
After applying fixes:
ollama run qwen3:8b --verbose "Hello"
# Check the output for:
# - "eval rate" should be 20+ tok/s (GPU) vs 2-5 tok/s (CPU)
# - GPU memory usage in nvidia-smi should increase
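As a quick sanity check on the numbers, the thresholds above can be turned into a tiny script. The 20 tok/s cutoff is this article's rule of thumb, not a hard limit; real rates vary with model size and hardware:

```shell
# Classify a reported "eval rate" (whole tokens/sec) using the rule of thumb above.
classify_rate() {
  if [ "$1" -ge 20 ]; then
    echo "likely GPU"
  else
    echo "likely CPU"
  fi
}

classify_rate 45   # likely GPU
classify_rate 3    # likely CPU
```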
Related: Ollama Complete Guide · Ollama Out of Memory Fix · Ollama Slow Inference Fix · Best GPU for AI Locally · CPU vs GPU for LLM Inference · How Much VRAM for AI Models