Jun 19, 2026 · 5 min read

Deploy a RAG Pipeline on DigitalOcean (Python + Postgres + Embeddings)

You’ve built a RAG pipeline locally. It searches your documents, generates answers, and works great on your laptop. Now you need it running in production — reliable, fast, and accessible via API.

DigitalOcean is a good fit for this. You get a simple Droplet for compute, managed Postgres with pgvector for embeddings, and straightforward pricing, without AWS-level complexity or surprise bills.

In this tutorial, we’ll deploy a complete RAG pipeline: document ingestion, vector storage with pgvector, retrieval, and generation via DeepSeek/OpenAI.

Architecture Overview

Here’s what we’re building:

User → FastAPI (Droplet) → pgvector (Managed Postgres)
                         → DeepSeek API (generation)

Components:

DigitalOcean Droplet: runs your FastAPI app
Managed PostgreSQL + pgvector: stores document embeddings
OpenAI API: turns documents and questions into embeddings
DeepSeek API (or OpenAI): generates answers from retrieved context

If you’re new to RAG, start with our what is RAG explainer.

Getting Started

You’ll need a DigitalOcean account:

Get DigitalOcean credits

New accounts get free credits to test the setup without paying upfront.

Step 1: Create a Managed Postgres Database

Why managed? You get automatic backups, updates, and failover, and the pgvector extension is available without any installation work.

Go to Databases → Create Database Cluster
Choose PostgreSQL 16
Plan: Basic ($15/mo — 1 vCPU, 1GB RAM, 10GB storage)
Pick your data center
Create

Once provisioned (takes 2-3 minutes), grab the connection string from the dashboard.

Enable pgvector:

# From your shell: connect using the connection string from the dashboard
psql "your-connection-string"

-- Then, at the psql prompt: enable the extension
CREATE EXTENSION IF NOT EXISTS vector;

Create your documents table:

CREATE TABLE documents (
    id SERIAL PRIMARY KEY,
    content TEXT NOT NULL,
    embedding vector(1536),
    metadata JSONB DEFAULT '{}',
    created_at TIMESTAMP DEFAULT NOW()
);

CREATE INDEX ON documents USING ivfflat (embedding vector_cosine_ops)
    WITH (lists = 100);

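The vector(1536) dimension matches OpenAI’s text-embedding-3-small. Adjust it if you use a different embedding model. IVFFlat builds its clusters from the rows that exist when the index is created, so for better recall, create (or rebuild) the index after you have ingested some data.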
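The lists = 100 value is only a starting point. pgvector's documentation suggests sizing lists from the row count (rows / 1000 up to 1M rows, sqrt(rows) beyond that) and starting probes near sqrt(lists). Here is a minimal sketch of that rule of thumb. The function names are hypothetical helpers for illustration, not part of pgvector:

```python
import math


def suggest_ivfflat_lists(row_count: int) -> int:
    """Starting value for IVFFlat `lists`, following pgvector's rule of thumb."""
    if row_count <= 1_000_000:
        return max(1, row_count // 1000)
    return max(1, round(math.sqrt(row_count)))


def suggest_ivfflat_probes(lists: int) -> int:
    """Starting value for `ivfflat.probes`: roughly sqrt(lists)."""
    return max(1, round(math.sqrt(lists)))


if __name__ == "__main__":
    for rows in (100_000, 4_000_000):
        lists = suggest_ivfflat_lists(rows)
        probes = suggest_ivfflat_probes(lists)
        print(f"{rows} rows: lists={lists}, probes={probes}")
```

Probes are set per session with SET ivfflat.probes = 10; higher values improve recall at the cost of query speed, so treat these numbers as a baseline to tune against your own data.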

For more on vector databases, see what is a vector database.

Step 2: Create a Droplet

Droplets → Create Droplet
OS: Ubuntu 22.04
Plan: Basic $12/mo (2 vCPU, 2GB RAM), which is enough for FastAPI because embeddings are generated by the OpenAI API rather than on the Droplet
Add your SSH key
Create

SSH in:

ssh root@YOUR_DROPLET_IP

Step 3: Build the RAG Application

Install system dependencies:

apt update && apt install -y python3-pip python3-venv

Create the project:

mkdir /opt/rag-api && cd /opt/rag-api
python3 -m venv venv
source venv/bin/activate

Install Python packages:

pip install fastapi uvicorn psycopg2-binary openai httpx pgvector numpy

Create main.py:

import os

import psycopg2
import psycopg2.extras  # provides Json for the JSONB metadata column
from fastapi import FastAPI
from openai import OpenAI
from pgvector.psycopg2 import register_vector
from pydantic import BaseModel

app = FastAPI()

# Config
DB_URL = os.environ["DATABASE_URL"]
OPENAI_API_KEY = os.environ.get("OPENAI_API_KEY")
DEEPSEEK_API_KEY = os.environ["DEEPSEEK_API_KEY"]

# Embedding client (OpenAI for embeddings)
embed_client = OpenAI(api_key=OPENAI_API_KEY)

# Generation client (DeepSeek for answers)
gen_client = OpenAI(
    api_key=DEEPSEEK_API_KEY,
    base_url="https://api.deepseek.com"
)

def get_db():
    conn = psycopg2.connect(DB_URL)
    register_vector(conn)
    return conn

def get_embedding(text: str) -> list[float]:
    response = embed_client.embeddings.create(
        model="text-embedding-3-small",
        input=text
    )
    return response.data[0].embedding

class IngestRequest(BaseModel):
    content: str
    metadata: dict = {}

class QueryRequest(BaseModel):
    question: str
    top_k: int = 5

@app.post("/ingest")
def ingest(req: IngestRequest):
    # Plain def: FastAPI runs it in a thread pool, so the blocking
    # OpenAI and psycopg2 calls do not stall the event loop.
    embedding = get_embedding(req.content)
    conn = get_db()
    try:
        with conn.cursor() as cur:
            cur.execute(
                "INSERT INTO documents (content, embedding, metadata) VALUES (%s, %s, %s)",
                (req.content, embedding, psycopg2.extras.Json(req.metadata))
            )
        conn.commit()
    finally:
        conn.close()  # release the connection even if the insert fails
    return {"status": "ingested"}

@app.post("/query")
def query(req: QueryRequest):
    # Plain def so the blocking calls below run in FastAPI's thread pool.
    # Get embedding for the question
    query_embedding = get_embedding(req.question)

    # Retrieve similar documents (<=> is pgvector's cosine distance operator)
    conn = get_db()
    try:
        with conn.cursor() as cur:
            cur.execute(
                """
                SELECT content, 1 - (embedding <=> %s::vector) AS similarity
                FROM documents
                ORDER BY embedding <=> %s::vector
                LIMIT %s
                """,
                (query_embedding, query_embedding, req.top_k)
            )
            results = cur.fetchall()
    finally:
        conn.close()  # release the connection even if the query fails

    # Build context from retrieved docs
    context = "\n\n---\n\n".join([row[0] for row in results])

    # Generate answer with DeepSeek
    response = gen_client.chat.completions.create(
        model="deepseek-chat",
        messages=[
            {"role": "system", "content": f"Answer based on this context:\n\n{context}"},
            {"role": "user", "content": req.question}
        ],
        max_tokens=500
    )

    return {
        "answer": response.choices[0].message.content,
        "sources": [{"content": r[0][:200], "similarity": r[1]} for r in results]
    }

@app.get("/health")
async def health():
    return {"status": "healthy"}

main.py calls psycopg2.extras.Json, which needs its own import. Make sure this line is at the top of the file:

import psycopg2.extras
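The /ingest endpoint embeds each request's content as a single vector, so long documents are usually split into smaller pieces first and each piece is sent as its own request. Here is a minimal sketch of a character-based splitter with overlap. The chunk_text name and the default sizes are assumptions for illustration, not part of the app above:

```python
def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 200) -> list[str]:
    """Split text into overlapping character windows.

    Consecutive chunks share `overlap` characters so that a sentence cut
    at a boundary still appears whole in at least one chunk.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")

    chunks: list[str] = []
    step = chunk_size - overlap
    start = 0
    while start < len(text):
        chunk = text[start:start + chunk_size].strip()
        if chunk:
            chunks.append(chunk)
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
        start += step
    return chunks


if __name__ == "__main__":
    sample = "FastAPI is a modern Python web framework. " * 50
    pieces = chunk_text(sample, chunk_size=300, overlap=50)
    print(f"{len(pieces)} chunks, first chunk has {len(pieces[0])} characters")
```

Character windows are the simplest option. Splitting on paragraphs or tokens often retrieves better, but the right chunk size depends on your documents.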

Step 4: Configure Environment Variables

Create an env file:

cat > /opt/rag-api/.env << 'EOF'
DATABASE_URL=postgresql://user:password@your-db-host:25060/defaultdb?sslmode=require
OPENAI_API_KEY=sk-your-openai-key
DEEPSEEK_API_KEY=sk-your-deepseek-key
EOF

# The file holds secrets, so restrict it to the owner
chmod 600 /opt/rag-api/.env

Get the DATABASE_URL from your DigitalOcean database dashboard (connection string).
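A malformed connection string is an easy mistake to make at this step, and it only surfaces when the first request fails. The sketch below checks a DATABASE_URL before you start the service. The check_database_url helper is hypothetical, and the rules it applies (a Postgres scheme, credentials present, an SSL mode set) are assumptions based on the example URL above:

```python
from urllib.parse import parse_qs, urlparse


def check_database_url(url: str) -> list[str]:
    """Return a list of human-readable problems; an empty list means it looks fine."""
    problems: list[str] = []
    parsed = urlparse(url)

    if parsed.scheme not in ("postgresql", "postgres"):
        problems.append(f"unexpected scheme: {parsed.scheme!r}")
    if not parsed.hostname:
        problems.append("missing host")
    if not parsed.username or not parsed.password:
        problems.append("missing username or password")

    # The example connection string above sets sslmode=require.
    sslmode = parse_qs(parsed.query).get("sslmode", [""])[0]
    if sslmode not in ("require", "verify-ca", "verify-full"):
        problems.append("sslmode should be require (or stricter)")

    return problems


if __name__ == "__main__":
    example = "postgresql://user:password@your-db-host:25060/defaultdb?sslmode=require"
    print(check_database_url(example) or "looks OK")
```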

Step 5: Create a Systemd Service

cat > /etc/systemd/system/rag-api.service << 'EOF'
[Unit]
Description=RAG API Service
After=network.target

[Service]
WorkingDirectory=/opt/rag-api
EnvironmentFile=/opt/rag-api/.env
ExecStart=/opt/rag-api/venv/bin/uvicorn main:app --host 0.0.0.0 --port 8000
Restart=always
RestartSec=5

[Install]
WantedBy=multi-user.target
EOF

systemctl enable --now rag-api

Step 6: Test Your RAG Pipeline

Ingest a document:

curl -X POST http://YOUR_DROPLET_IP:8000/ingest \
  -H "Content-Type: application/json" \
  -d '{"content": "FastAPI is a modern Python web framework. It supports async, automatic OpenAPI docs, and type validation via Pydantic.", "metadata": {"source": "docs"}}'

Query it:

curl -X POST http://YOUR_DROPLET_IP:8000/query \
  -H "Content-Type: application/json" \
  -d '{"question": "What is FastAPI?"}'

You should get a generated answer that references the ingested content.
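Each entry in sources carries a similarity score, which the query computes as 1 minus the result of pgvector's <=> cosine distance operator. The pure-Python sketch below re-implements that arithmetic for illustration only. It is not what pgvector runs internally:

```python
import math
from typing import Sequence


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine distance: 1 - (a . b) / (|a| * |b|)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine distance is undefined for zero vectors")
    return 1.0 - dot / (norm_a * norm_b)


def similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """The score returned by /query: 1 - cosine distance."""
    return 1.0 - cosine_distance(a, b)


if __name__ == "__main__":
    print(similarity([1.0, 0.0], [1.0, 0.0]))   # same direction
    print(similarity([1.0, 0.0], [0.0, 1.0]))   # orthogonal
    print(similarity([1.0, 0.0], [-1.0, 0.0]))  # opposite direction
```

Scores near 1 mean the chunk points in almost the same direction as the question embedding, scores near 0 mean they are unrelated, and negative scores mean they point in opposite directions.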

Step 7: Add Nginx + SSL (Production)

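For production, put nginx in front of the app so traffic arrives over HTTPS. Once this works, change --host 0.0.0.0 to --host 127.0.0.1 in the systemd unit (or firewall port 8000) so the API is reachable only through nginx: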

apt install -y nginx certbot python3-certbot-nginx

# Configure nginx
cat > /etc/nginx/sites-available/rag-api << 'EOF'
server {
    listen 80;
    server_name your-domain.com;

    location / {
        proxy_pass http://127.0.0.1:8000;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
    }
}
EOF

ln -s /etc/nginx/sites-available/rag-api /etc/nginx/sites-enabled/
nginx -t && systemctl reload nginx

# Get SSL cert
certbot --nginx -d your-domain.com

Cost Breakdown

Component	Monthly Cost
Droplet (2 vCPU, 2GB)	$12/mo
Managed Postgres (Basic)	$15/mo
OpenAI embeddings (~1M tokens)	~$0.02/mo
DeepSeek generation (~1M tokens)	~$0.14/mo
Total	~$27/mo

That’s a production RAG pipeline for under $30/month. Scale by upgrading the Droplet and database plan as traffic grows.

For scaling strategies, check building a RAG system that scales.

If you want to run everything locally first, try our build a local RAG pipeline with Ollama tutorial.

FAQ

Why Postgres + pgvector instead of a dedicated vector DB?

Simplicity. Postgres handles your vectors AND your regular data in one database. No extra service to manage. pgvector performance is excellent up to ~1M vectors. Beyond that, consider dedicated solutions like Qdrant or Pinecone.

Can I use a free embedding model instead of OpenAI?

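Yes. Run a local embedding model on the Droplet (like sentence-transformers/all-MiniLM-L6-v2) to eliminate the OpenAI dependency. You’ll need a slightly bigger Droplet (4GB RAM). Change the get_embedding function to call your local model, and change the column to vector(384), because all-MiniLM-L6-v2 produces 384-dimensional embeddings rather than 1536. Embeddings from different models are not comparable, so re-embed any documents you have already ingested.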
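Here is a minimal sketch of what a local replacement for get_embedding could look like, assuming the sentence-transformers package is installed on the Droplet. The names load_local_model and get_embedding_local are hypothetical. The model is passed in as an argument so it is loaded once at startup rather than on every request:

```python
from typing import Any

LOCAL_MODEL_NAME = "sentence-transformers/all-MiniLM-L6-v2"
LOCAL_EMBEDDING_DIM = 384  # output size of all-MiniLM-L6-v2


def load_local_model() -> Any:
    """Load the model once at startup (needs `pip install sentence-transformers`)."""
    from sentence_transformers import SentenceTransformer

    return SentenceTransformer(LOCAL_MODEL_NAME)


def get_embedding_local(text: str, model: Any) -> list[float]:
    """Embed text with a local model and return plain Python floats."""
    vector = model.encode(text)  # 1-D array for a single string
    values = [float(x) for x in vector]
    if len(values) != LOCAL_EMBEDDING_DIM:
        # Guard against a model/column mismatch before it reaches Postgres.
        raise ValueError(
            f"expected {LOCAL_EMBEDDING_DIM} dimensions, got {len(values)}"
        )
    return values
```

In main.py you would call load_local_model() once at module level and pass the result to get_embedding_local wherever get_embedding is used today.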

How many documents can this handle?

The Basic Postgres plan (10GB storage) holds roughly 500K-1M document chunks with embeddings. Query performance stays under 100ms up to ~500K vectors with the IVFFlat index. For millions of vectors, upgrade the DB plan and switch to HNSW indexing.
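To sanity-check the capacity estimate, you can work out the raw vector storage yourself. pgvector documents each vector as taking 4 * dimensions + 8 bytes. The sketch below applies that formula. It deliberately ignores the content text, metadata, row overhead, and the index, all of which add to the total, so read the result as a lower bound:

```python
def vector_storage_bytes(num_vectors: int, dimensions: int = 1536) -> int:
    """Raw storage for the embedding column only: 4 bytes per dimension + 8 per vector."""
    return num_vectors * (4 * dimensions + 8)


if __name__ == "__main__":
    for count in (500_000, 1_000_000):
        gb = vector_storage_bytes(count) / 1e9
        print(f"{count} vectors at 1536 dimensions: about {gb:.2f} GB of raw vector data")
```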

What if I don’t want managed Postgres?

Install Postgres + pgvector directly on the Droplet to save $15/mo. Trade-off: you handle backups, updates, and recovery yourself. For side projects, that’s fine. For production, managed is worth it.
