You've built a RAG pipeline locally. It searches your documents, generates answers, and works great on your laptop. Now you need it running in production: reliable, fast, and accessible via API.
DigitalOcean is perfect for this. You get a simple Droplet for compute, managed Postgres with pgvector for embeddings, and straightforward pricing. No AWS-level complexity, no surprise bills.
In this tutorial, we'll deploy a complete RAG pipeline: document ingestion, vector storage with pgvector, retrieval, and generation via DeepSeek/OpenAI.
Architecture Overview
Here's what we're building:
User → FastAPI (Droplet) → pgvector (Managed Postgres)
                 → DeepSeek API (generation)
Components:
- DigitalOcean Droplet: runs your FastAPI app
- Managed PostgreSQL + pgvector: stores document embeddings
- DeepSeek/OpenAI API: generates answers from retrieved context (the OpenAI API also produces the embeddings)
If you're new to RAG, start with our what is RAG explainer.
Getting Started
You'll need a DigitalOcean account. New accounts get free credits to test the setup without paying upfront.
Step 1: Create a Managed Postgres Database
Why managed? You get automatic backups, updates, and failover, and pgvector comes pre-installed.
- Go to Databases → Create Database Cluster
- Choose PostgreSQL 16
- Plan: Basic ($15/mo; 1 vCPU, 1GB RAM, 10GB storage)
- Pick your data center
- Create
Once provisioned (takes 2-3 minutes), grab the connection string from the dashboard.
Enable pgvector:
# In your shell: connect with the connection string from the dashboard
psql "your-connection-string"
-- At the psql prompt: enable the extension
CREATE EXTENSION IF NOT EXISTS vector;
Create your documents table:
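CREATE TABLE documents (
    id SERIAL PRIMARY KEY,
    content TEXT NOT NULL,
    embedding vector(1536),
    metadata JSONB DEFAULT '{}',
    created_at TIMESTAMP DEFAULT NOW()
);
-- IVFFlat picks its cluster centers from existing rows, so for better recall
-- consider (re)creating this index after you have loaded some data.
CREATE INDEX ON documents USING ivfflat (embedding vector_cosine_ops)
    WITH (lists = 100);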
The vector(1536) dimension matches OpenAI's text-embedding-3-small. Adjust if using a different embedding model.
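To make the "adjust if using a different embedding model" step concrete, here is a minimal sketch that derives the column type from the model name and rejects embeddings of the wrong size before they reach Postgres. The EMBEDDING_DIMS table and both helper names are illustrative assumptions, not part of the tutorial's code; the sizes listed are the default output dimensions of those two models, so check your provider's documentation if you request a reduced size.

```python
# Default output dimensions for two embedding models mentioned in this tutorial.
# This lookup table is an illustrative assumption; extend it for your own model.
EMBEDDING_DIMS = {
    "text-embedding-3-small": 1536,
    "sentence-transformers/all-MiniLM-L6-v2": 384,
}


def vector_column(model: str) -> str:
    """Return the pgvector column type matching the model's output size."""
    if model not in EMBEDDING_DIMS:
        raise ValueError(f"unknown embedding model: {model!r}")
    return f"vector({EMBEDDING_DIMS[model]})"


def check_embedding(embedding: list[float], model: str) -> None:
    """Raise early if an embedding would not fit the configured column."""
    expected = EMBEDDING_DIMS[model]
    if len(embedding) != expected:
        raise ValueError(f"expected {expected} dimensions, got {len(embedding)}")


print(vector_column("text-embedding-3-small"))
```

Calling check_embedding just before the INSERT is one way to turn a confusing database error into a clear message when the model and the table definition drift apart.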
For more on vector databases, see what is a vector database.
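The lists = 100 setting in the index definition is a tuning knob rather than a fixed value. As a rough sketch, the helper below encodes the starting points suggested in the pgvector README (rows / 1000 up to 1M rows, sqrt(rows) beyond that, and probes of about sqrt(lists)). The function name is hypothetical, and these are starting points to benchmark against your own data, not guarantees.

```python
import math


def suggest_ivfflat_params(row_count: int) -> tuple[int, int]:
    """Return (lists, probes) starting points for an IVFFlat index."""
    if row_count < 0:
        raise ValueError("row_count must be non-negative")
    if row_count <= 1_000_000:
        lists = max(1, row_count // 1000)
    else:
        lists = max(1, round(math.sqrt(row_count)))
    # More probes improves recall at the cost of speed.
    probes = max(1, round(math.sqrt(lists)))
    return lists, probes


for rows in (100_000, 1_000_000, 4_000_000):
    print(rows, suggest_ivfflat_params(rows))
```

Under this rule of thumb, lists = 100 corresponds to roughly 100,000 rows. The probes value is applied per session in Postgres with SET ivfflat.probes.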
Step 2: Create a Droplet
- Droplets → Create Droplet
- OS: Ubuntu 22.04
- Plan: Basic $12/mo (2 vCPU, 2GB RAM), enough for FastAPI because embeddings and generation run on external APIs
- Add your SSH key
- Create
SSH in:
ssh root@YOUR_DROPLET_IP
Step 3: Build the RAG Application
Install system dependencies:
apt update && apt install -y python3-pip python3-venv
Create the project:
mkdir /opt/rag-api && cd /opt/rag-api
python3 -m venv venv
source venv/bin/activate
Install Python packages:
pip install fastapi uvicorn psycopg2-binary openai httpx pgvector numpy
Create main.py:
from fastapi import FastAPI
from pydantic import BaseModel
import psycopg2
import psycopg2.extras  # provides Json; not loaded by "import psycopg2" alone
from pgvector.psycopg2 import register_vector
from openai import OpenAI
import os

app = FastAPI()

# Config (all three are required, so fail fast at startup if one is missing)
DB_URL = os.environ["DATABASE_URL"]
OPENAI_API_KEY = os.environ["OPENAI_API_KEY"]
DEEPSEEK_API_KEY = os.environ["DEEPSEEK_API_KEY"]

# Embedding client (OpenAI for embeddings)
embed_client = OpenAI(api_key=OPENAI_API_KEY)

# Generation client (DeepSeek for answers)
gen_client = OpenAI(
    api_key=DEEPSEEK_API_KEY,
    base_url="https://api.deepseek.com"
)


def get_db():
    # Open a new connection and register the vector type with psycopg2
    conn = psycopg2.connect(DB_URL)
    register_vector(conn)
    return conn


def get_embedding(text: str) -> list[float]:
    response = embed_client.embeddings.create(
        model="text-embedding-3-small",
        input=text
    )
    return response.data[0].embedding


class IngestRequest(BaseModel):
    content: str
    metadata: dict = {}


class QueryRequest(BaseModel):
    question: str
    top_k: int = 5


# The handlers below use plain "def" rather than "async def": psycopg2 and the
# OpenAI client make blocking calls, and FastAPI runs plain "def" handlers in a
# thread pool so they do not stall the event loop.
@app.post("/ingest")
def ingest(req: IngestRequest):
    embedding = get_embedding(req.content)
    conn = get_db()
    try:
        with conn.cursor() as cur:
            cur.execute(
                "INSERT INTO documents (content, embedding, metadata) VALUES (%s, %s, %s)",
                (req.content, embedding, psycopg2.extras.Json(req.metadata))
            )
        conn.commit()
    finally:
        # Close the connection even if the insert fails
        conn.close()
    return {"status": "ingested"}


@app.post("/query")
def query(req: QueryRequest):
    # Get embedding for the question
    query_embedding = get_embedding(req.question)

    # Retrieve similar documents (<=> is cosine distance, so 1 - distance is similarity)
    conn = get_db()
    try:
        with conn.cursor() as cur:
            cur.execute(
                """
                SELECT content, 1 - (embedding <=> %s::vector) AS similarity
                FROM documents
                ORDER BY embedding <=> %s::vector
                LIMIT %s
                """,
                (query_embedding, query_embedding, req.top_k)
            )
            results = cur.fetchall()
    finally:
        conn.close()

    # Build context from retrieved docs
    context = "\n\n---\n\n".join([row[0] for row in results])

    # Generate answer with DeepSeek
    response = gen_client.chat.completions.create(
        model="deepseek-chat",
        messages=[
            {"role": "system", "content": f"Answer based on this context:\n\n{context}"},
            {"role": "user", "content": req.question}
        ],
        max_tokens=500
    )
    return {
        "answer": response.choices[0].message.content,
        "sources": [{"content": r[0][:200], "similarity": r[1]} for r in results]
    }


@app.get("/health")
async def health():
    return {"status": "healthy"}
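The /ingest handler uses psycopg2.extras.Json, and a bare import psycopg2 does not load that submodule. Make sure this line is at the top of the file:
import psycopg2.extras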
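The /query endpoint in main.py ranks rows with pgvector's <=> operator, which is cosine distance, and reports 1 minus that distance as the similarity. For intuition only, here is a pure-Python sketch of the same ranking on toy vectors. The function names are hypothetical, and Postgres performs this inside the database rather than in application code.

```python
import math


def cosine_distance(a: list[float], b: list[float]) -> float:
    """Cosine distance between two vectors: 1 - cos(angle between them)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_a * norm_b)


def top_k(query: list[float], docs: list[tuple[str, list[float]]], k: int = 5):
    """Rank (content, embedding) pairs by similarity, highest first."""
    scored = [(content, 1.0 - cosine_distance(query, emb)) for content, emb in docs]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


docs = [("a", [0.0, 1.0]), ("b", [1.0, 0.0]), ("c", [1.0, 1.0])]
print(top_k([1.0, 0.0], docs, k=2))
```

With the query pointing along the first axis, "b" ranks first (same direction), "c" second (45 degrees away), and "a" is dropped by k=2.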
Step 4: Configure Environment Variables
Create an env file:
cat > /opt/rag-api/.env << 'EOF'
DATABASE_URL=postgresql://user:password@your-db-host:25060/defaultdb?sslmode=require
OPENAI_API_KEY=sk-your-openai-key
DEEPSEEK_API_KEY=sk-your-deepseek-key
EOF
Get the DATABASE_URL from your DigitalOcean database dashboard (connection string).
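A malformed DATABASE_URL is a common reason the service fails at startup. As a hedged sketch, the helper below checks the pieces this tutorial's connection string relies on, using only the standard library. The function name and the exact set of checks are assumptions for illustration; it does not contact the database.

```python
from urllib.parse import urlsplit, parse_qs


def check_database_url(url: str) -> list[str]:
    """Return a list of problems found in a Postgres connection URL."""
    problems = []
    parts = urlsplit(url)
    if parts.scheme not in ("postgresql", "postgres"):
        problems.append(f"unexpected scheme: {parts.scheme!r}")
    if not parts.hostname:
        problems.append("missing host")
    if not parts.username or parts.password is None:
        problems.append("missing user or password")
    if not parts.path.strip("/"):
        problems.append("missing database name")
    # The tutorial's URL sets sslmode=require; stricter libpq modes also pass.
    sslmode = parse_qs(parts.query).get("sslmode", [""])[0]
    if sslmode not in ("require", "verify-ca", "verify-full"):
        problems.append("sslmode does not enforce TLS")
    return problems


example = "postgresql://user:password@your-db-host:25060/defaultdb?sslmode=require"
print(check_database_url(example) or "looks fine")
```

An empty list means the URL has the expected shape; it says nothing about whether the credentials are valid.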
Step 5: Create a Systemd Service
cat > /etc/systemd/system/rag-api.service << 'EOF'
[Unit]
Description=RAG API Service
After=network.target
[Service]
WorkingDirectory=/opt/rag-api
EnvironmentFile=/opt/rag-api/.env
ExecStart=/opt/rag-api/venv/bin/uvicorn main:app --host 0.0.0.0 --port 8000
Restart=always
RestartSec=5
[Install]
WantedBy=multi-user.target
EOF
systemctl enable --now rag-api
Step 6: Test Your RAG Pipeline
Ingest a document:
curl -X POST http://YOUR_DROPLET_IP:8000/ingest \
-H "Content-Type: application/json" \
-d '{"content": "FastAPI is a modern Python web framework. It supports async, automatic OpenAPI docs, and type validation via Pydantic.", "metadata": {"source": "docs"}}'
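The request above ingests one short passage. Longer documents are usually split into chunks first, with each chunk sent to /ingest as its own request. Here is a minimal sketch of one way to do that; splitting on whitespace-separated words and the default sizes are assumptions for illustration, not something the tutorial prescribes.

```python
def chunk_words(text: str, chunk_size: int = 200, overlap: int = 40) -> list[str]:
    """Split text into chunks of up to chunk_size words that share `overlap` words."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        # Stop once a chunk has reached the end of the text.
        if start + chunk_size >= len(words):
            break
    return chunks


sample = " ".join(str(i) for i in range(10))
print(chunk_words(sample, chunk_size=4, overlap=1))
```

The overlap keeps a sentence that straddles a boundary retrievable from either neighbor, at the cost of storing some words twice.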
Query it:
curl -X POST http://YOUR_DROPLET_IP:8000/query \
-H "Content-Type: application/json" \
-d '{"question": "What is FastAPI?"}'
You should get a generated answer that references the ingested content.
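If you would rather script these checks than run curl by hand, the same two calls can be sketched with Python's standard library. The helper names below are hypothetical, and the commented example assumes the service from the previous steps is running at the placeholder address.

```python
import json
import urllib.request


def build_request(base_url: str, path: str, payload: dict) -> urllib.request.Request:
    """Build a JSON POST request equivalent to the curl commands."""
    return urllib.request.Request(
        url=base_url.rstrip("/") + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 30.0) -> dict:
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Example (needs the running service; replace the placeholder host first):
# req = build_request("http://YOUR_DROPLET_IP:8000", "/query",
#                     {"question": "What is FastAPI?"})
# print(send(req)["answer"])
```

Separating request construction from sending keeps the payload logic checkable without a network connection.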
Step 7: Add Nginx + SSL (Production)
For production, put nginx in front:
apt install -y nginx certbot python3-certbot-nginx
# Configure nginx
cat > /etc/nginx/sites-available/rag-api << 'EOF'
server {
    listen 80;
    server_name your-domain.com;
    location / {
        proxy_pass http://127.0.0.1:8000;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
    }
}
EOF
ln -s /etc/nginx/sites-available/rag-api /etc/nginx/sites-enabled/
nginx -t && systemctl reload nginx
# Get SSL cert
certbot --nginx -d your-domain.com
Cost Breakdown
| Component | Monthly Cost |
|---|---|
| Droplet (2 vCPU, 2GB) | $12/mo |
| Managed Postgres (Basic) | $15/mo |
| OpenAI embeddings (~1M tokens) | ~$0.02/mo |
| DeepSeek generation (~1M tokens) | ~$0.14/mo |
| Total | ~$27/mo |
That's a production RAG pipeline for under $30/month. Scale by upgrading the Droplet and database plan as traffic grows.
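The total in the table is simple arithmetic, and writing it out shows how little the usage-based part contributes at this scale. The sketch below treats the table's per-1M-token figures as rates, which is an assumption for illustration; actual provider pricing may differ and changes over time.

```python
# Fixed monthly costs from the table, in US dollars.
DROPLET = 12.00
MANAGED_POSTGRES = 15.00

# Usage-based rates implied by the table (dollars per 1M tokens); assumptions.
EMBEDDING_PER_MTOK = 0.02
GENERATION_PER_MTOK = 0.14


def monthly_cost(embedding_mtok: float, generation_mtok: float) -> float:
    """Estimate the monthly bill for the given token volumes (in millions)."""
    fixed = DROPLET + MANAGED_POSTGRES
    usage = embedding_mtok * EMBEDDING_PER_MTOK + generation_mtok * GENERATION_PER_MTOK
    return round(fixed + usage, 2)


print(monthly_cost(1, 1))    # the table's scenario
print(monthly_cost(10, 10))  # ten times the token volume
```

Even at ten times the token volume, the estimate stays under $30 because the Droplet and the database dominate the bill.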
For scaling strategies, check building a RAG system that scales.
If you want to run everything locally first, try our build a local RAG pipeline with Ollama tutorial.
FAQ
Why Postgres + pgvector instead of a dedicated vector DB?
Simplicity. Postgres handles your vectors AND your regular data in one database. No extra service to manage. pgvector performance is excellent up to ~1M vectors. Beyond that, consider dedicated solutions like Qdrant or Pinecone.
Can I use a free embedding model instead of OpenAI?
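Yes. Run a local embedding model on the Droplet (like sentence-transformers/all-MiniLM-L6-v2) to eliminate the OpenAI dependency. You'll need a slightly bigger Droplet (4GB RAM). Change the get_embedding function to call your local model, and change the embedding column to match its output size: all-MiniLM-L6-v2 produces 384-dimensional vectors, so the column becomes vector(384).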
How many documents can this handle?
The Basic Postgres plan (10GB storage) holds roughly 500K-1M document chunks with embeddings. Query performance stays under 100ms up to ~500K vectors with the IVFFlat index. For millions of vectors, upgrade the DB plan and switch to HNSW indexing.
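A back-of-the-envelope check on that capacity estimate: according to the pgvector README, each stored vector takes 4 bytes per dimension plus 8 bytes. The sketch below applies that to 1536-dimensional embeddings. It counts the raw vectors only, so the chunk text, row overhead, and the index all come on top; the function name is hypothetical.

```python
def vector_storage_bytes(dims: int, rows: int) -> int:
    """Raw storage for `rows` pgvector values of `dims` dimensions."""
    return rows * (4 * dims + 8)


GB = 10**9
for rows in (500_000, 1_000_000):
    size = vector_storage_bytes(1536, rows)
    print(f"{rows:>9,} vectors: {size / GB:.2f} GB")
```

That works out to about 3.08 GB for 500K vectors and 6.15 GB for 1M, so on a 10GB plan the upper end of the range leaves limited room for the text and the index.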
What if I don't want managed Postgres?
Install Postgres + pgvector directly on the Droplet to save $15/mo. Trade-off: you handle backups, updates, and recovery yourself. For side projects, that's fine. For production, managed is worth it.