Ollama Docker Setup Guide: Run Local LLMs in Containers (2026)
Running LLMs locally is powerful. Running them inside Docker makes them production-ready. This guide walks you through setting up Ollama in Docker, from a single command to a full-stack deployment with GPU passthrough and a web UI.
Why Run Ollama in Docker?
Installing Ollama directly on your machine works fine for personal use. Docker solves the problems that show up everywhere else:
- Reproducibility: the same container image runs identically on your laptop, your colleague's machine, and your production server. No "works on my machine" issues.
- Team sharing: spin up a shared Ollama instance on a team server. Everyone hits the same API endpoint without installing anything locally.
- Server deployment: deploy to any cloud VM, on-prem server, or Kubernetes cluster with a single docker compose up.
- Isolation: keep Ollama and its dependencies contained. No conflicts with other software on the host.
- Easy upgrades: pull a new image tag and restart. Rolling back is just as simple; a short sketch follows below.
If you're deploying Ollama for anything beyond personal tinkering, Docker is the way to go.
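For example, upgrading with the Compose setup shown later in this guide is a two-command affair (a minimal sketch; the version tag below is a placeholder, not a specific release):
# Pull the newer image and recreate only the Ollama service
docker compose pull ollama
docker compose up -d ollama
# To roll back, pin a known-good tag in docker-compose.yml
# (e.g. image: ollama/ollama:<version>) and rerun the same two commands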
Prerequisites
You need two things installed on your host machine:
Docker Engine (v20.10+) or Docker Desktop:
# Verify Docker is installed
docker --version
docker compose version
NVIDIA Container Toolkit (required only for GPU passthrough):
If you plan to run models on an NVIDIA GPU (and you should for anything larger than 7B parameters), you'll need the NVIDIA Container Toolkit. We cover the full installation in the GPU passthrough section below.
For CPU-only usage, Docker alone is sufficient.
Quick Start with Docker Run
The fastest way to get Ollama running in a container:
docker run -d \
--name ollama \
-p 11434:11434 \
-v ollama_data:/root/.ollama \
ollama/ollama
This does three things:
- Exposes the Ollama API on port 11434
- Creates a named volume ollama_data so downloaded models persist across container restarts
- Runs in detached mode (-d)
Pull and run a model:
docker exec ollama ollama pull llama3.1
docker exec ollama ollama run llama3.1 "Explain Docker in one sentence"
Hit the API from your host:
curl http://localhost:11434/api/generate -d '{
"model": "llama3.1",
"prompt": "What is Docker?",
"stream": false
}'
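For multi-turn conversations, Ollama also exposes a chat endpoint. A minimal example against the same container (this assumes the model has already been pulled as shown above):
curl http://localhost:11434/api/chat -d '{
  "model": "llama3.1",
  "messages": [
    {"role": "user", "content": "What is Docker?"}
  ],
  "stream": false
}'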
This works, but Docker Compose gives you more control.
Docker Compose with GPU and Persistent Storage
Create a docker-compose.yml for a production-ready Ollama setup:
services:
ollama:
image: ollama/ollama:latest
container_name: ollama
ports:
- "11434:11434"
volumes:
- ollama_data:/root/.ollama
deploy:
resources:
reservations:
devices:
- driver: nvidia
count: all
capabilities: [gpu]
restart: unless-stopped
environment:
- OLLAMA_HOST=0.0.0.0
volumes:
ollama_data:
Start it:
docker compose up -d
Key details:
- deploy.resources.reservations.devices: passes all NVIDIA GPUs to the container. Change count: all to count: 1 if you want to limit it.
- ollama_data volume: models are stored in /root/.ollama inside the container. The named volume ensures they survive container recreation.
- OLLAMA_HOST=0.0.0.0: makes Ollama listen on all interfaces, which is necessary for access from other containers or external clients.
- restart: unless-stopped: the container comes back up automatically after reboots.
For CPU-only usage, remove the entire deploy block.
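To double-check that the container actually sees your GPU, two quick probes (assuming the GPU passthrough setup covered later in this guide; the NVIDIA runtime typically makes nvidia-smi available inside the container):
# GPU visible inside the container?
docker exec ollama nvidia-smi
# After a model has been loaded, ollama ps reports whether it runs on GPU or CPU
docker exec ollama ollama ps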
Docker Compose with Open WebUI (Full Stack)
For a complete local AI setup with a ChatGPT-style interface, combine Ollama with Open WebUI:
services:
ollama:
image: ollama/ollama:latest
container_name: ollama
ports:
- "11434:11434"
volumes:
- ollama_data:/root/.ollama
deploy:
resources:
reservations:
devices:
- driver: nvidia
count: all
capabilities: [gpu]
restart: unless-stopped
environment:
- OLLAMA_HOST=0.0.0.0
open-webui:
image: ghcr.io/open-webui/open-webui:main
container_name: open-webui
ports:
- "3000:8080"
volumes:
- openwebui_data:/app/backend/data
environment:
- OLLAMA_BASE_URL=http://ollama:11434
depends_on:
- ollama
restart: unless-stopped
volumes:
ollama_data:
openwebui_data:
docker compose up -d
Open http://localhost:3000 in your browser. Create an account (stored locally), pull a model through the UI or via CLI, and start chatting. The OLLAMA_BASE_URL environment variable tells Open WebUI where to find the Ollama API; Docker's internal DNS resolves the ollama service name automatically.
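If you prefer the CLI for that initial model pull, the same docker exec approach from the quick start works against this stack:
# Pull a model into the ollama service
docker exec ollama ollama pull llama3.1
# Confirm both services are running
docker compose ps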
This is the setup we recommend for teams. Deploy it on a shared server with a GPU and everyone gets a private, self-hosted ChatGPT alternative. See our full Open WebUI guide for authentication, model management, and customization.
GPU Passthrough Setup
Docker doesn't see your GPU by default. The NVIDIA Container Toolkit bridges that gap.
Step 1. Install NVIDIA drivers on the host:
# Verify your GPU is detected
nvidia-smi
If this command doesn't work, install the appropriate NVIDIA drivers for your distribution first.
Step 2. Install the NVIDIA Container Toolkit:
# Add the repository
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | \
sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
sudo apt-get update
sudo apt-get install -y nvidia-container-toolkit
Step 3. Configure Docker to use the NVIDIA runtime:
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
Step 4. Verify GPU access inside a container:
docker run --rm --gpus all nvidia/cuda:12.4.0-base-ubuntu22.04 nvidia-smi
You should see your GPU listed. If this works, the deploy.resources block in the Compose files above will function correctly.
For non-NVIDIA GPUs (AMD ROCm), check the Ollama documentation for ROCm-specific Docker images.
Common Issues
Models disappear after restarting the container
You're not using a persistent volume. Make sure you have -v ollama_data:/root/.ollama in your run command or the volumes section in Compose.
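A quick way to confirm the volume exists and see where it lives on the host (using the volume name from the examples above):
docker volume inspect ollama_data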
"could not select device driver" error
The NVIDIA Container Toolkit isn't installed or configured. Run through the GPU passthrough setup steps above.
Container canโt access GPU but nvidia-smi works on host
Restart Docker after installing the toolkit: sudo systemctl restart docker. If using Docker Desktop, restart the application entirely.
Open WebUI shows "Ollama connection failed"
Check that OLLAMA_BASE_URL is set to http://ollama:11434 (the service name, not localhost). Both containers must be on the same Docker network, which Compose handles automatically.
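Two quick checks, assuming the Compose stack from this guide:
docker compose ps                # both services should be running
docker compose logs open-webui   # look for errors reaching the Ollama URL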
Slow model loading
First loads are slow because the model weights are read from disk into memory (or VRAM). Subsequent requests are fast. Use a volume backed by SSD storage for best performance.
Port 11434 already in use
You have Ollama running natively on the host. Stop it with systemctl stop ollama or change the Docker port mapping to something like 11435:11434.
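To see what is holding the port on a Linux host (ss ships with iproute2):
sudo ss -ltnp | grep 11434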
What's Next
You now have Ollama running in Docker with GPU acceleration and an optional web UI. From here you can:
- Scale with multiple model instances behind a load balancer
- Add authentication with a reverse proxy like Traefik or Caddy
- Integrate with your applications via the REST API on port 11434
- Explore vLLM for higher-throughput serving if you need production-grade inference
FAQ
Can Ollama run in Docker?
Yes, Ollama has an official Docker image (ollama/ollama) that works out of the box. You can run it with a single docker run command and access the API on port 11434, just like a native installation.
Does Docker Ollama support GPU?
Yes, Docker Ollama supports NVIDIA GPU passthrough via the NVIDIA Container Toolkit. Once the toolkit is installed and configured, you add a deploy.resources.reservations.devices block to your Compose file or use --gpus all with docker run.
How do I persist models in Docker?
Mount a named volume or bind mount to /root/.ollama inside the container. This ensures downloaded models survive container restarts and recreations; without it, you'd re-download models every time the container is removed.
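For example, a bind mount to a host directory works the same way as the named volume; the host path here is only an illustration:
docker run -d --name ollama \
  -p 11434:11434 \
  -v /srv/ollama:/root/.ollama \
  ollama/ollama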
Can I use Docker Compose with Ollama?
Absolutely. Docker Compose is the recommended approach for production setups. It lets you define GPU passthrough, persistent volumes, environment variables, and multi-service stacks (like Ollama + Open WebUI) in a single declarative YAML file.
Related Guides
- Ollama Complete Guide 2026: installation, model management, API usage, and configuration
- Ollama + Open WebUI Setup: deep dive into the web interface
- Serve LLMs with vLLM: high-throughput alternative for production workloads
- Best Self-Hosted AI Models 2026: which models to run locally
- Self-Hosted AI for Enterprise: scaling local AI across your organization