๐Ÿ“ Tutorials
ยท 6 min read
Last updated on

Ollama Docker Setup Guide: Run Local LLMs in Containers (2026)


Running LLMs locally is powerful. Running them inside Docker makes it production-ready. This guide walks you through setting up Ollama in Docker, from a single command to a full-stack deployment with GPU passthrough and a web UI.

Why Run Ollama in Docker?

Installing Ollama directly on your machine works fine for personal use. Docker solves the problems that show up everywhere else:

  • Reproducibility: the same container image runs identically on your laptop, your colleague's machine, and your production server. No "works on my machine" issues.
  • Team sharing: spin up a shared Ollama instance on a team server. Everyone hits the same API endpoint without installing anything locally.
  • Server deployment: deploy to any cloud VM, on-prem server, or Kubernetes cluster with a single docker compose up.
  • Isolation: keep Ollama and its dependencies contained. No conflicts with other software on the host.
  • Easy upgrades: pull a new image tag and restart. Rolling back is just as simple.

If you're deploying Ollama for anything beyond personal tinkering, Docker is the way to go.

Prerequisites

You need two things installed on your host machine:

Docker Engine (v20.10+) or Docker Desktop:

# Verify Docker is installed
docker --version
docker compose version

NVIDIA Container Toolkit (required only for GPU passthrough):

If you plan to run models on an NVIDIA GPU (and you should for anything larger than 7B parameters), you'll need the NVIDIA Container Toolkit. We cover the full installation in the GPU passthrough section below.

For CPU-only usage, Docker alone is sufficient.

Quick Start with Docker Run

The fastest way to get Ollama running in a container:

docker run -d \
  --name ollama \
  -p 11434:11434 \
  -v ollama_data:/root/.ollama \
  ollama/ollama

This does four things:

  1. Names the container ollama so you can reference it in later commands
  2. Exposes the Ollama API on port 11434
  3. Creates a named volume ollama_data so downloaded models persist across container restarts
  4. Runs in detached mode (-d), so the container keeps running in the background

Pull and run a model:

docker exec ollama ollama pull llama3.1
docker exec ollama ollama run llama3.1 "Explain Docker in one sentence"

Hit the API from your host:

curl http://localhost:11434/api/generate -d '{
  "model": "llama3.1",
  "prompt": "What is Docker?",
  "stream": false
}'
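The same endpoint is just as easy to call from application code. Below is a minimal Python sketch using only the standard library; the base URL assumes the container above is listening on localhost:11434, and the model name is a placeholder for whatever you have pulled:

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # assumes the container above is running


def build_payload(model: str, prompt: str, stream: bool = False) -> bytes:
    """Serialize the request body for POST /api/generate."""
    return json.dumps({"model": model, "prompt": prompt, "stream": stream}).encode()


def generate(model: str, prompt: str) -> str:
    """Send a non-streaming generate request and return the response text."""
    req = urllib.request.Request(
        f"{OLLAMA_URL}/api/generate",
        data=build_payload(model, prompt),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]


if __name__ == "__main__":
    print(generate("llama3.1", "What is Docker?"))
```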

This works, but Docker Compose gives you more control.

Docker Compose with GPU and Persistent Storage

Create a docker-compose.yml for a production-ready Ollama setup:

services:
  ollama:
    image: ollama/ollama:latest
    container_name: ollama
    ports:
      - "11434:11434"
    volumes:
      - ollama_data:/root/.ollama
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
    restart: unless-stopped
    environment:
      - OLLAMA_HOST=0.0.0.0

volumes:
  ollama_data:

Start it:

docker compose up -d

Key details:

  • deploy.resources.reservations.devices passes all NVIDIA GPUs to the container. Change count: all to count: 1 if you want to limit it.
  • The ollama_data volume: models are stored in /root/.ollama inside the container. The named volume ensures they survive container recreation.
  • OLLAMA_HOST=0.0.0.0 makes Ollama listen on all interfaces, necessary for access from other containers or external clients.
  • restart: unless-stopped brings the container back up automatically after reboots.

For CPU-only usage, remove the entire deploy block.
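Beyond OLLAMA_HOST, Ollama reads several tuning knobs from environment variables. A sketch of an extended environment block, assuming the variable names used by recent Ollama releases (values here are illustrative; check the Ollama docs for your version):

```yaml
    environment:
      - OLLAMA_HOST=0.0.0.0
      # How long a model stays loaded in memory after a request (default 5m).
      # Raising it avoids reload latency between bursts of requests.
      - OLLAMA_KEEP_ALIVE=30m
      # How many requests each loaded model serves concurrently.
      - OLLAMA_NUM_PARALLEL=4
```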

Docker Compose with Open WebUI (Full Stack)

For a complete local AI setup with a ChatGPT-style interface, combine Ollama with Open WebUI:

services:
  ollama:
    image: ollama/ollama:latest
    container_name: ollama
    ports:
      - "11434:11434"
    volumes:
      - ollama_data:/root/.ollama
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
    restart: unless-stopped
    environment:
      - OLLAMA_HOST=0.0.0.0

  open-webui:
    image: ghcr.io/open-webui/open-webui:main
    container_name: open-webui
    ports:
      - "3000:8080"
    volumes:
      - openwebui_data:/app/backend/data
    environment:
      - OLLAMA_BASE_URL=http://ollama:11434
    depends_on:
      - ollama
    restart: unless-stopped

volumes:
  ollama_data:
  openwebui_data:

Start the stack:

docker compose up -d

Open http://localhost:3000 in your browser. Create an account (stored locally), pull a model through the UI or via CLI, and start chatting. The OLLAMA_BASE_URL environment variable tells Open WebUI where to find the Ollama API: Docker's internal DNS resolves the ollama service name automatically.

This is the setup we recommend for teams. Deploy it on a shared server with a GPU and everyone gets a private, self-hosted ChatGPT alternative. See our full Open WebUI guide for authentication, model management, and customization.
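If Open WebUI should wait until Ollama is actually serving requests (not merely started), Compose healthchecks can express that. A sketch, assuming the ollama CLI inside the official image works as a liveness probe:

```yaml
  ollama:
    # ...same service definition as above, plus:
    healthcheck:
      test: ["CMD", "ollama", "list"]   # succeeds once the API is serving
      interval: 10s
      timeout: 5s
      retries: 5

  open-webui:
    # ...same service definition as above, but gate startup on the healthcheck:
    depends_on:
      ollama:
        condition: service_healthy
```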

GPU Passthrough Setup

Docker doesn't see your GPU by default. The NVIDIA Container Toolkit bridges that gap.

Step 1. Install NVIDIA drivers on the host:

# Verify your GPU is detected
nvidia-smi

If this command doesn't work, install the appropriate NVIDIA drivers for your distribution first.

Step 2. Install the NVIDIA Container Toolkit:

# Add the repository
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | \
  sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg

curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
  sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
  sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list

sudo apt-get update
sudo apt-get install -y nvidia-container-toolkit

Step 3. Configure Docker to use the NVIDIA runtime:

sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker

Step 4. Verify GPU access inside a container:

docker run --rm --gpus all nvidia/cuda:12.4.0-base-ubuntu22.04 nvidia-smi

You should see your GPU listed. If this works, the deploy.resources block in the Compose files above will function correctly.

For non-NVIDIA GPUs (AMD ROCm), check the Ollama documentation for ROCm-specific Docker images.

Common Issues

Models disappear after restarting the container
You're not using a persistent volume. Make sure you have -v ollama_data:/root/.ollama in your run command or the volumes section in Compose.

"could not select device driver" error
The NVIDIA Container Toolkit isn't installed or configured. Run through the GPU passthrough setup steps above.

Container can't access GPU but nvidia-smi works on host
Restart Docker after installing the toolkit: sudo systemctl restart docker. If using Docker Desktop, restart the application entirely.

Open WebUI shows "Ollama connection failed"
Check that OLLAMA_BASE_URL is set to http://ollama:11434 (the service name, not localhost). Both containers must be on the same Docker network, which Compose handles automatically.

Slow model loading
First loads are slow because the model weights are read from disk into memory (or VRAM). Subsequent requests are fast. Use a volume backed by SSD storage for best performance.

Port 11434 already in use
You have Ollama running natively on the host. Stop it with systemctl stop ollama or change the Docker port mapping to something like 11435:11434.
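The slow-model-loading issue above can also be mitigated by preloading: per the Ollama API docs, a generate request with no prompt loads the model into memory, and the keep_alive field controls how long it stays resident. A hedged Python sketch (model name and keep-alive duration are placeholders):

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # assumes the container is running


def preload_payload(model: str, keep_alive: str = "30m") -> bytes:
    """A generate request with no prompt asks Ollama to load the model;
    keep_alive controls how long it stays resident (a duration like "30m",
    or -1 to keep it loaded indefinitely)."""
    return json.dumps({"model": model, "keep_alive": keep_alive}).encode()


def preload(model: str, keep_alive: str = "30m") -> None:
    """Warm a model before real traffic arrives."""
    req = urllib.request.Request(
        f"{OLLAMA_URL}/api/generate",
        data=preload_payload(model, keep_alive),
        headers={"Content-Type": "application/json"},
    )
    urllib.request.urlopen(req).read()  # returns once the model is loaded


if __name__ == "__main__":
    preload("llama3.1")
```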

Whatโ€™s Next

You now have Ollama running in Docker with GPU acceleration and an optional web UI. From here you can:

  • Scale with multiple model instances behind a load balancer
  • Add authentication with a reverse proxy like Traefik or Caddy
  • Integrate with your applications via the REST API on port 11434
  • Explore vLLM for higher-throughput serving if you need production-grade inference
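As an illustration of the reverse-proxy idea, here is a minimal Caddyfile sketch that puts basic auth in front of the Ollama API. The domain and the password hash are placeholders (generate a real hash with caddy hash-password), and the directive is named basic_auth in recent Caddy releases (basicauth in older ones):

```
ollama.example.com {
    basic_auth {
        # placeholder bcrypt hash; generate your own with: caddy hash-password
        admin JDJhJDE0JEh...
    }
    reverse_proxy localhost:11434
}
```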

FAQ

Can Ollama run in Docker?

Yes, Ollama has an official Docker image (ollama/ollama) that works out of the box. You can run it with a single docker run command and access the API on port 11434, just like a native installation.

Does Docker Ollama support GPU?

Yes, Docker Ollama supports NVIDIA GPU passthrough via the NVIDIA Container Toolkit. Once the toolkit is installed and configured, you add a deploy.resources.reservations.devices block to your Compose file or use --gpus all with docker run.

How do I persist models in Docker?

Mount a named volume or bind mount to /root/.ollama inside the container. This ensures downloaded models survive container restarts and recreations; without it, you'd re-download models every time the container is removed.

Can I use Docker Compose with Ollama?

Absolutely. Docker Compose is the recommended approach for production setups. It lets you define GPU passthrough, persistent volumes, environment variables, and multi-service stacks (like Ollama + Open WebUI) in a single declarative YAML file.
