GitHub Copilot costs $10-19/month and sends your code to Microsoft's servers. Here's how to replace it with a free, private alternative that runs entirely on your machine. Total setup time: 10 minutes.
What you'll get
- Inline code completion (tab to accept, just like Copilot)
- AI chat sidebar for code questions
- Code explanation, refactoring, and test generation
- Works offline
- Your code never leaves your machine
- $0/month, forever
Requirements
- VS Code (or JetBrains)
- 16GB+ RAM (for basic models) or 24GB+ VRAM (for best quality)
- 10 minutes
Step 1: Install Ollama
# macOS / Linux
curl -fsSL https://ollama.com/install.sh | sh
# Windows: download from ollama.com
Step 2: Download a coding model
Pick based on your hardware:
# 8GB RAM → basic but functional
ollama pull qwen3.5:9b
# 12-16GB VRAM → great autocomplete
ollama pull codestral
# 24GB VRAM → best overall quality
ollama pull qwen2.5-coder:32b
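The tiers above reduce to a simple decision rule. A hypothetical sketch of that rule (the `pick_model` helper is ours, not part of Ollama; the thresholds mirror the list above):

```python
def pick_model(vram_gb: int = 0) -> str:
    """Suggest an Ollama model tag from the hardware tiers above.

    vram_gb is dedicated GPU memory; with little or no VRAM, the
    small model runs on CPU and system RAM instead.
    """
    if vram_gb >= 24:
        return "qwen2.5-coder:32b"  # best overall quality
    if vram_gb >= 12:
        return "codestral"          # great autocomplete
    return "qwen3.5:9b"             # basic but functional, CPU-friendly


print(pick_model(vram_gb=0))   # qwen3.5:9b
print(pick_model(vram_gb=24))  # qwen2.5-coder:32b
```

The cutoffs are rough: quantization level and context length shift the real memory footprint, so treat them as starting points.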
Step 3: Install Continue in VS Code
- Open VS Code
- Go to Extensions (Ctrl+Shift+X)
- Search "Continue"
- Install "Continue - Codestral, Claude, and more"
- Click the Continue icon in the sidebar
Step 4: Configure Continue
Click the gear icon in Continue and set up your models:
If you have 24GB VRAM (best setup):
{
  "tabAutocompleteModel": {
    "title": "Codestral",
    "provider": "ollama",
    "model": "codestral"
  },
  "models": [
    {
      "title": "Qwen Coder 32B",
      "provider": "ollama",
      "model": "qwen2.5-coder:32b"
    }
  ]
}
If you have 16GB or less:
{
  "tabAutocompleteModel": {
    "title": "Qwen 9B",
    "provider": "ollama",
    "model": "qwen3.5:9b"
  },
  "models": [
    {
      "title": "Qwen 9B",
      "provider": "ollama",
      "model": "qwen3.5:9b"
    }
  ]
}
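One optional tweak: if Ollama runs on a different machine (a GPU desktop on your LAN, say), Continue can point at it with an `apiBase` on the model entry. This assumes your Continue version supports `apiBase` for the Ollama provider; the address below is a placeholder:

```json
{
  "tabAutocompleteModel": {
    "title": "Codestral",
    "provider": "ollama",
    "model": "codestral",
    "apiBase": "http://192.168.1.50:11434"
  }
}
```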
Step 5: Start coding
That's it. Open any code file and start typing. You'll see inline suggestions appear, just like Copilot. Press Tab to accept.
Use the chat sidebar (Ctrl+L) to ask questions about your code, request refactoring, or generate tests.
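Under the hood, Continue talks to Ollama's local HTTP API. A minimal sketch of the kind of request it sends to Ollama's `/api/chat` endpoint (a real endpoint; the model tag and prompt here are just examples):

```python
import json
from urllib import request

# Build the JSON body Ollama's /api/chat endpoint expects:
# a model tag plus an OpenAI-style messages list.
payload = {
    "model": "qwen2.5-coder:32b",
    "messages": [
        {"role": "user", "content": "Explain this regex: ^\\d{4}-\\d{2}$"}
    ],
    "stream": False,
}
body = json.dumps(payload).encode()
print(body.decode()[:60])

# Uncomment with Ollama running locally to get a real answer:
# req = request.Request("http://localhost:11434/api/chat", data=body,
#                       headers={"Content-Type": "application/json"})
# print(json.loads(request.urlopen(req).read())["message"]["content"])
```

Nothing about this setup is tied to the editor: any tool that can POST JSON to port 11434 can use the same models.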
How it compares
After a week of using this setup vs Copilot:
- Autocomplete quality: Codestral is arguably better than Copilot for inline suggestions (Mistral reports 95.3% FIM accuracy)
- Chat quality: Qwen Coder 32B matches GPT-4o on coding benchmarks
- Speed: Local inference adds no network latency, so suggestions appear instantly on adequate hardware
- Context awareness: Copilot is better at understanding your full project. Local models see less context.
- Multi-file edits: Copilot is better here. Local models work best on single-file tasks.
For 80% of daily coding work (writing functions, fixing bugs, generating boilerplate), the free setup is indistinguishable from Copilot.
Troubleshooting
Suggestions are slow: Your model is too large for your hardware. Drop to a smaller model.
No suggestions appearing: Make sure Ollama is running (ollama serve) and the model is downloaded.
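A quick way to check whether Ollama is actually listening is to probe its default port. A small diagnostic sketch (the `ollama_reachable` helper is ours; 11434 is Ollama's default port):

```python
import socket

def ollama_reachable(host: str = "localhost", port: int = 11434) -> bool:
    """Return True if something is listening on Ollama's default port."""
    try:
        with socket.create_connection((host, port), timeout=1):
            return True
    except OSError:
        return False

if ollama_reachable():
    print("Ollama is up")
else:
    print("Nothing on port 11434 - start it with: ollama serve")
```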
Quality is poor: Upgrade to a larger model if your hardware allows it. The jump from 9B to 32B is significant.
Related
- Best Free AI Coding Assistant in 2026
- What Is Codestral? Mistral's Coding Model Explained
- Qwen 2.5 Coder vs Codestral: Best Open-Source Coding Model?
- Best Self-Hosted AI Models in 2026