πŸ“ Tutorials
Β· 5 min read

Self-Host an LLM on Contabo VPS for €4.99/Month


Some links in this article are affiliate links. We earn a commission at no extra cost to you when you purchase through them. Full disclosure.

What if you could have your own AI assistant running 24/7, accessible from anywhere, for less than the price of a coffee? No API keys, no rate limits, no per-token billing. Just your own LLM on a server you control.

The catch? You’ll be running on CPU, so it’s not blazing fast. But for personal use β€” a coding assistant, writing helper, or API backend for side projects β€” it’s more than good enough. And at €4.99/month, it’s hard to beat.

Let me show you how to set this up on a Contabo VPS.

Why Contabo?

Contabo is the budget king of VPS hosting. Their Cloud VPS S plan gives you:

  • 8GB RAM β€” enough for 7B parameter models
  • 4 vCPU cores β€” decent for CPU inference
  • 200GB SSD β€” plenty for multiple models
  • Unlimited traffic β€” no bandwidth limits
  • €4.99/month β€” seriously

The tradeoff: no GPU. You’re doing CPU inference, which means slower responses (think 3-8 tokens/second instead of 50+). But for an always-on personal AI, that’s acceptable.

For a deeper comparison of CPU vs GPU inference, check our CPU vs GPU for LLM inference breakdown.

Getting Started

Grab a Contabo VPS to follow along:

Get Contabo VPS

The Cloud VPS S (8GB RAM, €4.99/mo) is the sweet spot. If you want to run slightly larger models, the Cloud VPS M (16GB RAM, €8.99/mo) gives you more headroom.

Step 1: Order Your VPS

  1. Go to Contabo and select Cloud VPS S
  2. Choose Ubuntu 22.04 as the OS
  3. Pick a data center close to you (EU or US options available)
  4. Set a root password or upload your SSH key
  5. Complete the order

Provisioning takes a few minutes (sometimes up to an hour during peak times β€” not instant like cloud providers, but it’s a one-time wait).

Step 2: SSH Into Your Server

Once you get the confirmation email with your IP:

ssh root@YOUR_SERVER_IP

First, update the system:

apt update && apt upgrade -y

Step 3: Install Ollama

curl -fsSL https://ollama.com/install.sh | sh

Ollama installs and starts automatically. Verify:

ollama --version
systemctl status ollama

For everything Ollama can do, see our complete Ollama guide.

Step 4: Pull a Model That Fits 8GB RAM

This is the crucial part. With 8GB RAM and no GPU, you need models that:

  • Fit in memory (leave ~2GB for the OS)
  • Are optimized for CPU inference
  • Still produce useful output

Here are your best options:

# Qwen 3 4B β€” fast, capable, fits easily
ollama pull qwen3:4b

# Gemma 4 E4B β€” Google's efficient 4B model
ollama pull gemma3:4b

# Phi-3 Mini β€” Microsoft's compact model
ollama pull phi3:mini

For the 8GB VPS, stick to 4B models or smaller. They’ll use about 3-4GB RAM, leaving room for the OS and Ollama overhead.

If you upgrade to the 16GB plan, you can run proper 7-8B models:

# Only on 16GB+ RAM:
ollama pull qwen3:8b
ollama pull llama3.1:8b

Check our guide on best AI models under 4GB RAM for more options.

Step 5: Test Your Model

Interactive chat:

ollama run qwen3:4b

Expect your first response in 5-10 seconds (model loading), then 3-8 tokens/second for generation. Not instant, but totally usable for personal tasks.

Test via API:

curl http://localhost:11434/api/generate -d '{
  "model": "qwen3:4b",
  "prompt": "Write a Python function to reverse a string",
  "stream": false
}'

Step 6: Configure for Remote Access

To access your LLM from other devices or apps:

sudo systemctl edit ollama

Add:

[Service]
Environment="OLLAMA_HOST=0.0.0.0"

Restart:

sudo systemctl restart ollama

Important: Set up a firewall to restrict access:

ufw allow ssh
ufw allow from YOUR_HOME_IP to any port 11434
ufw enable

Now you can hit your LLM from your laptop, phone, or other services.

Step 7: Keep It Running 24/7

Ollama runs as a systemd service by default, so it survives reboots. Verify:

systemctl is-enabled ollama

That’s it. Your personal AI is now always on, always available.

Performance: What to Expect

Real-world benchmarks on Contabo Cloud VPS S (4 vCPU, 8GB RAM):

ModelSizeTokens/secTime for 200-word response
Qwen 3 4B~2.5GB6-8 tok/s~25 seconds
Phi-3 Mini~2.3GB7-9 tok/s~22 seconds
Gemma 4 E4B~2.8GB5-7 tok/s~28 seconds

Not fast, but predictable. No cold starts, no rate limits, no per-token costs. For background tasks, coding assistance, or async workflows, this is perfectly fine.

Cost Comparison

OptionMonthly CostSpeedAlways On?
Contabo VPS S€4.99/moSlow (CPU)βœ… Yes
OpenAI API (light use)~$5-20/moFastN/A (per-call)
Vultr GPU (on-demand)~$50-100/moVery fast❌ Expensive 24/7
Your laptop€0Medium❌ Drains battery

The Contabo option wins when you want: always available, predictable cost, full privacy, no API dependencies.

For more budget-friendly approaches, see cheapest way to run AI locally.

Use Cases That Work Great

  • Personal coding assistant β€” pipe code questions via API, get answers in 20-30 seconds
  • Automated writing helper β€” blog post drafts, email rewrites, summaries
  • API backend for side projects β€” your apps call your own LLM, zero API costs
  • Private AI β€” sensitive queries never leave your server
  • Learning and experimentation β€” try different models, prompts, and configs

FAQ

Is CPU inference really usable?

For personal, async use β€” yes. You won’t get instant chat responses, but 5-8 tokens/second is fine for background tasks, batch processing, or when you don’t mind waiting 20-30 seconds for a response. It’s like having a slightly slow but always-free assistant.

Can I run larger models on Contabo?

On the 8GB plan, max out at 4B models. The 16GB plan (€8.99/mo) lets you run 7-8B models. For anything larger (13B+), you need GPU instances β€” check our how much VRAM guide. Contabo doesn’t offer GPU servers.

Will Contabo throttle my CPU usage?

Contabo is known for generous sustained CPU allowance. Running Ollama at full CPU won’t get you flagged. They’re one of the few providers that actually let you use what you pay for.

How does this compare to just using ChatGPT?

ChatGPT is faster and smarter (GPT-4 class). But it costs $20/mo, has rate limits, logs your data, and requires internet. A self-hosted LLM on Contabo gives you full privacy, no limits, always-on access for €4.99/mo β€” with the tradeoff of slower, less capable models.