Some links in this article are affiliate links. We earn a commission at no extra cost to you when you purchase through them. Full disclosure.
What if you could have your own AI assistant running 24/7, accessible from anywhere, for less than the price of a coffee? No API keys, no rate limits, no per-token billing. Just your own LLM on a server you control.
The catch? You'll be running on CPU, so it's not blazing fast. But for personal use (a coding assistant, writing helper, or API backend for side projects) it's more than good enough. And at €4.99/month, it's hard to beat.
Let me show you how to set this up on a Contabo VPS.
Why Contabo?
Contabo is the budget king of VPS hosting. Their Cloud VPS S plan gives you:
- 8GB RAM: enough for quantized 4B-parameter models (7-8B models need the 16GB plan)
- 4 vCPU cores: decent for CPU inference
- 200GB SSD: plenty for multiple models
- Unlimited traffic: no bandwidth limits
- €4.99/month: seriously
The tradeoff: no GPU. You're doing CPU inference, which means slower responses (think 3-8 tokens/second instead of 50+). But for an always-on personal AI, that's acceptable.
For a deeper comparison of CPU vs GPU inference, check our CPU vs GPU for LLM inference breakdown.
Getting Started
Grab a Contabo VPS to follow along.
The Cloud VPS S (8GB RAM, €4.99/mo) is the sweet spot. If you want to run slightly larger models, the Cloud VPS M (16GB RAM, €8.99/mo) gives you more headroom.
Step 1: Order Your VPS
- Go to Contabo and select Cloud VPS S
- Choose Ubuntu 22.04 as the OS
- Pick a data center close to you (EU or US options available)
- Set a root password or upload your SSH key
- Complete the order
Provisioning takes a few minutes, sometimes up to an hour during peak times. It's not instant like the big cloud providers, but it's a one-time wait.
Step 2: SSH Into Your Server
Once you get the confirmation email with your IP:
ssh root@YOUR_SERVER_IP
First, update the system:
apt update && apt upgrade -y
Step 3: Install Ollama
curl -fsSL https://ollama.com/install.sh | sh
Ollama installs and starts automatically. Verify:
ollama --version
systemctl status ollama
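Right after installation the service can take a moment before it accepts connections. Here is a minimal sketch of a generic retry helper you could use to wait for it. The function name `retry` and its argument order are made up for this example; it is not part of Ollama.

```shell
# retry ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it succeeds or ATTEMPTS is used up.
# Returns 0 on the first success, 1 if every attempt failed.
retry() {
    attempts=$1
    delay=$2
    shift 2
    i=1
    while [ "$i" -le "$attempts" ]; do
        "$@" && return 0
        i=$(( i + 1 ))
        # Only sleep when another attempt is still coming.
        [ "$i" -le "$attempts" ] && sleep "$delay"
    done
    return 1
}
```

For example, `retry 10 1 curl -fsS http://localhost:11434/` polls the local Ollama port once per second for up to ten tries.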
For everything Ollama can do, see our complete Ollama guide.
Step 4: Pull a Model That Fits 8GB RAM
This is the crucial part. With 8GB RAM and no GPU, you need models that:
- Fit in memory (leave ~2GB for the OS)
- Are optimized for CPU inference
- Still produce useful output
Here are your best options:
# Qwen 3 4B: fast, capable, fits easily
ollama pull qwen3:4b
# Gemma 3 4B: Google's efficient 4B model
ollama pull gemma3:4b
# Phi-3 Mini: Microsoft's compact model
ollama pull phi3:mini
For the 8GB VPS, stick to 4B models or smaller. They'll use about 3-4GB RAM, leaving room for the OS and Ollama overhead.
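If you want to sanity-check the memory budget before pulling a model, a rough sketch like the one below can help. Both function names are hypothetical, the 2048MB default simply mirrors the ~2GB of OS headroom suggested above, and the model's RAM usage is something you supply yourself.

```shell
# fits_in_ram MODEL_MB TOTAL_MB [RESERVE_MB]
# Succeeds (exit 0) when the model fits after reserving memory for the OS.
# RESERVE_MB defaults to 2048 (about 2GB).
fits_in_ram() {
    model_mb=$1
    total_mb=$2
    reserve_mb=${3:-2048}
    [ "$model_mb" -le $(( total_mb - reserve_mb )) ]
}

# total_ram_mb: total system memory in MB, read from /proc/meminfo (Linux only).
total_ram_mb() {
    awk '/^MemTotal:/ { printf "%d\n", $2 / 1024 }' /proc/meminfo
}
```

On the server, `fits_in_ram 4000 "$(total_ram_mb)" && ollama pull qwen3:4b` would only pull the model if about 4GB of usage still leaves the reserve free.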
If you upgrade to the 16GB plan, you can run proper 7-8B models:
# Only on 16GB+ RAM:
ollama pull qwen3:8b
ollama pull llama3.1:8b
Check our guide on best AI models under 4GB RAM for more options.
Step 5: Test Your Model
Interactive chat:
ollama run qwen3:4b
Expect your first response in 5-10 seconds (model loading), then 3-8 tokens/second for generation. Not instant, but totally usable for personal tasks.
Test via API:
curl http://localhost:11434/api/generate -d '{
"model": "qwen3:4b",
"prompt": "Write a Python function to reverse a string",
"stream": false
}'
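Once you start calling the API from scripts, hand-writing JSON gets error-prone. The sketch below builds the same request body from two arguments. The function names and the `OLLAMA_URL` variable are invented for this example, and the escaping only covers backslashes and double quotes, so it assumes prompts without newlines or control characters.

```shell
# json_escape STRING: escape backslashes and double quotes so the value can
# sit inside a JSON string. Assumption: no newlines or control characters.
json_escape() {
    printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

# build_payload MODEL PROMPT: print a non-streaming /api/generate request body.
build_payload() {
    printf '{"model":"%s","prompt":"%s","stream":false}\n' \
        "$(json_escape "$1")" "$(json_escape "$2")"
}

# ask MODEL PROMPT: send the request to the Ollama server.
# OLLAMA_URL is a hypothetical override; it defaults to the local server.
ask() {
    build_payload "$1" "$2" |
        curl -sS "${OLLAMA_URL:-http://localhost:11434}/api/generate" -d @-
}
```

With that in place, `ask qwen3:4b "Write a Python function to reverse a string"` sends the same request as the curl command above.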
Step 6: Configure for Remote Access
To access your LLM from other devices or apps:
sudo systemctl edit ollama
Add:
[Service]
Environment="OLLAMA_HOST=0.0.0.0"
Restart:
sudo systemctl restart ollama
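If you'd rather script this step than use the interactive editor, a sketch like the following writes the same override as a drop-in file. The function name is made up for this example. The default directory is the standard systemd drop-in location for the `ollama` unit, which is also where `systemctl edit` saves its `override.conf`.

```shell
# write_ollama_override [DIR]: write the override non-interactively.
# DIR defaults to the systemd drop-in directory for the ollama unit.
write_ollama_override() {
    dir=${1:-/etc/systemd/system/ollama.service.d}
    mkdir -p "$dir" || return 1
    cat > "$dir/override.conf" <<'EOF'
[Service]
Environment="OLLAMA_HOST=0.0.0.0"
EOF
}
```

After calling `write_ollama_override` as root, run `systemctl daemon-reload` before the restart so systemd picks up the new file.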
Important: Ollama's API has no built-in authentication, so set up a firewall to restrict access:
ufw allow ssh
ufw allow from YOUR_HOME_IP to any port 11434
ufw enable
Now you can hit your LLM from your laptop, phone, or other services.
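If you connect from more than one place (home, office, another server), you need one allow rule per address. Here is a small sketch that prints the rules so you can review them before applying anything. The function name is hypothetical, and the addresses in the usage note are documentation placeholders.

```shell
# ufw_rules PORT IP...: print one "ufw allow" command per trusted address.
# Printing instead of executing lets you review the rules first.
ufw_rules() {
    port=$1
    shift
    for ip in "$@"; do
        printf 'ufw allow from %s to any port %s proto tcp\n' "$ip" "$port"
    done
}
```

Running `ufw_rules 11434 203.0.113.5 198.51.100.7` prints two commands. Once they look right, you can paste them into the shell or pipe the output to `sh`.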
Step 7: Keep It Running 24/7
Ollama runs as a systemd service by default, so it survives reboots. Verify:
systemctl is-enabled ollama
That's it. Your personal AI is now always on, always available.
Performance: What to Expect
Real-world benchmarks on Contabo Cloud VPS S (4 vCPU, 8GB RAM):
| Model | Size | Tokens/sec | Time for 200-word response |
|---|---|---|---|
| Qwen 3 4B | ~2.5GB | 6-8 tok/s | ~25 seconds |
| Phi-3 Mini | ~2.3GB | 7-9 tok/s | ~22 seconds |
| Gemma 3 4B | ~2.8GB | 5-7 tok/s | ~28 seconds |
Not fast, but predictable. No cold starts, no rate limits, no per-token costs. For background tasks, coding assistance, or async workflows, this is perfectly fine.
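To check numbers like these on your own server, you can compute tokens per second from the `eval_count` and `eval_duration` fields that Ollama returns in a non-streaming `/api/generate` response (the duration is in nanoseconds). The helpers below are a rough sketch with made-up names. The time estimate assumes about one token per word by default, which is what the table's times appear consistent with; at roughly 1.3 tokens per word, a common rule of thumb for English, the estimates come out longer.

```shell
# tokens_per_second EVAL_COUNT EVAL_DURATION_NS
# eval_count is the number of generated tokens, eval_duration the time spent
# generating them in nanoseconds.
tokens_per_second() {
    awk -v n="$1" -v ns="$2" 'BEGIN { printf "%.1f\n", n / (ns / 1e9) }'
}

# estimate_seconds WORDS TOK_PER_SEC [TOKENS_PER_100_WORDS]
# Integer estimate (rounded down) of generation time. The third argument
# defaults to 100, i.e. about one token per word.
estimate_seconds() {
    tokens=$(( $1 * ${3:-100} / 100 ))
    echo $(( tokens / $2 ))
}
```

For instance, `estimate_seconds 200 8` gives 25 seconds, while `estimate_seconds 200 8 133` gives 33.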
Cost Comparison
| Option | Monthly Cost | Speed | Always On? |
|---|---|---|---|
| Contabo VPS S | €4.99/mo | Slow (CPU) | Yes |
| OpenAI API (light use) | ~$5-20/mo | Fast | N/A (per-call) |
| Vultr GPU (on-demand) | ~$50-100/mo | Very fast | Expensive 24/7 |
| Your laptop | €0 | Medium | Drains battery |
The Contabo option wins when you want: always available, predictable cost, full privacy, no API dependencies.
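To put "predictable cost" into numbers, here is a back-of-the-envelope sketch of the theoretical ceiling: how many tokens the server could generate in a month if it ran flat out, and what the flat fee works out to per million tokens. The function names are invented for this example, and real usage will sit far below 100% load, so treat the result as a lower bound on cost per token.

```shell
# monthly_tokens TOK_PER_SEC: tokens generated in a 30-day month at 100% load.
monthly_tokens() {
    echo $(( $1 * 86400 * 30 ))
}

# cents_per_million_tokens MONTHLY_CENTS TOK_PER_SEC
# Flat monthly price spread over that maximum output, in whole cents.
cents_per_million_tokens() {
    echo $(( $1 * 1000000 / $(monthly_tokens "$2") ))
}
```

At 7 tok/s (the middle of the Qwen 3 4B range above), `monthly_tokens 7` gives 18144000 tokens, and `cents_per_million_tokens 499 7` gives 27 cents per million tokens.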
For more budget-friendly approaches, see cheapest way to run AI locally.
Use Cases That Work Great
- Personal coding assistant: pipe code questions via API, get answers in 20-30 seconds
- Automated writing helper: blog post drafts, email rewrites, summaries
- API backend for side projects: your apps call your own LLM, zero API costs
- Private AI: sensitive queries never leave your server
- Learning and experimentation: try different models, prompts, and configs
FAQ
Is CPU inference really usable?
For personal, async use: yes. You won't get instant chat responses, but 5-8 tokens/second is fine for background tasks, batch processing, or when you don't mind waiting 20-30 seconds for a response. It's like having a slightly slow but always-free assistant.
Can I run larger models on Contabo?
On the 8GB plan, max out at 4B models. The 16GB plan (€8.99/mo) lets you run 7-8B models. For anything larger (13B+), you need GPU instances; check our how much VRAM guide. Contabo doesn't offer GPU servers.
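A quick way to see where these limits come from is to estimate the size of the weights alone: parameters times bits per weight, divided by 8. The sketch below assumes 4-bit quantization. The function name is made up, and the result ignores the context cache and runtime overhead, so actual files and RAM usage will be larger.

```shell
# weights_mb PARAMS_MILLIONS BITS_PER_WEIGHT
# Approximate size of the weights alone in MB: parameters x bits / 8.
weights_mb() {
    echo $(( $1 * $2 / 8 ))
}
```

By this estimate, `weights_mb 4000 4` is about 2000MB, `weights_mb 8000 4` about 4000MB, and `weights_mb 13000 4` about 6500MB before any overhead.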
Will Contabo throttle my CPU usage?
Contabo is known for generous sustained CPU allowance. Running Ollama at full CPU won't get you flagged. They're one of the few providers that actually let you use what you pay for.
How does this compare to just using ChatGPT?
ChatGPT is faster and smarter (GPT-4 class). But ChatGPT Plus costs $20/mo, has rate limits, and logs your data. A self-hosted LLM on Contabo gives you full privacy, no limits, and always-on access for €4.99/mo, with the tradeoff of slower, less capable models.