Some links in this article are affiliate links. We earn a commission at no extra cost to you when you purchase through them. Full disclosure.
You can run AI coding models locally for free. No API keys, no monthly subscriptions, no rate limits, no data leaving your machine. Hereβs how to set it up in 15 minutes.
Option 1: Ollama (easiest, recommended)
Ollama is the simplest way to run models locally. One command to install, one command to run.
Install
# macOS
brew install ollama
# Linux
curl -fsSL https://ollama.com/install.sh | sh
# Windows
# Download from ollama.com
Pull a coding model
# Best coding model for most hardware (24B, needs 16GB RAM)
ollama pull devstral-small:24b
# Lighter option (8B, needs 8GB RAM)
ollama pull qwen3:8b
# Smallest useful option (4B, needs 4GB RAM)
ollama pull qwen3:4b
See our best AI models for Mac guide for hardware-specific recommendations.
Connect to your coding tools
Aider (terminal):
aider --model ollama/devstral-small:24b
Continue.dev (VS Code):
Add to .continue/config.json:
{
"models": [{
"title": "Devstral Local",
"provider": "ollama",
"model": "devstral-small:24b"
}]
}
OpenCode (terminal):
opencode --provider ollama --model devstral-small:24b
Thatβs it. Free AI coding with zero API costs.
Option 2: LM Studio (GUI, beginner-friendly)
If you prefer a graphical interface:
- Download LM Studio from lmstudio.ai
- Search for βdevstralβ or βqwen3β in the model browser
- Download a model (GGUF format)
- Click βStart Serverβ β it runs an OpenAI-compatible API on localhost
Connect your tools to http://localhost:1234/v1 as the API base URL.
Option 3: vLLM (production, multi-user)
For serving models to a team, vLLM provides production-grade inference:
pip install vllm
vllm serve devstral-small-2506 --port 8000
This gives you an OpenAI-compatible API that handles multiple concurrent users with continuous batching and prefix caching.
Best for: teams of 5+ developers sharing one GPU server.
Hardware requirements
| Model | RAM needed | GPU VRAM | Best hardware |
|---|---|---|---|
| Qwen3 4B | 4 GB | 3 GB | Any modern laptop |
| Qwen3 8B | 8 GB | 6 GB | MacBook Air M2+ |
| Devstral Small 24B | 16 GB | 14 GB | MacBook Pro M2+, RTX 3090 |
| Qwen3.5 27B | 20 GB | 16 GB | MacBook Pro M3+, RTX 4090 |
| DeepSeek R1 14B | 12 GB | 10 GB | MacBook Pro M2+, RTX 3080 |
No GPU? See our guide on running AI without a GPU. Apple Silicon Macs run models on the unified memory, no discrete GPU needed.
Cost comparison: local vs API
| Setup | Monthly cost | Speed | Privacy |
|---|---|---|---|
| Local (Ollama) | $0 | Depends on hardware | β Full |
| OpenRouter free tier | $0 | Fast | β Data sent to API |
| Claude Code | $20/mo | Fastest | β Data sent to Anthropic |
| DeepSeek API | ~$5/mo | Fast | β Data sent to China |
Local inference is the only option with true privacy. If you work with sensitive code or GDPR-regulated data, self-hosting is the way to go. Use a VPN for additional network privacy.
Sharing with your team
Once your local server is running, share it with teammates:
On your local network
# Ollama already listens on all interfaces
# Teammates connect to your-ip:11434
ollama serve
Over the internet (with Tailscale)
# Install Tailscale on server and client machines
# Server: ollama serves on tailscale IP
# Client: connect to 100.x.x.x:11434
On a VPS for the whole team
Deploy on a Hetzner or Vultr VPS:
# On a VPS with 32GB RAM
ollama pull devstral-small:24b
# Team connects via SSH tunnel or VPN
Cost: β¬8.50/month on Hetzner for a 16GB server. Split across 5 developers = β¬1.70/month each for unlimited AI coding.
The recommended setup
For most developers:
- Install Ollama (5 minutes)
- Pull Devstral Small 24B (if you have 16GB+ RAM) or Qwen3 8B (if less)
- Connect Aider for terminal coding
- Connect Continue.dev for VS Code autocomplete
- Keep Claude Code for complex tasks that need frontier quality
This gives you free AI for 80% of coding tasks and paid frontier AI for the hard 20%. Total cost: $0-20/month depending on how much you use Claude.
Related: Ollama Complete Guide Β· Best AI Models for Mac Β· How to Serve LLMs with vLLM Β· Self-Hosted AI for Enterprise Β· Best AI Coding Agents for Privacy Β· Best VPNs for Developers