GitHub Copilot costs $10-19/month. What if you could get the same experience (tab completions, AI chat, inline edits) running entirely on your machine for free?
That's exactly what Continue.dev + Ollama gives you. Continue is an open-source AI coding extension for VS Code (and JetBrains). Ollama runs LLMs locally. Together, they deliver Copilot-style coding assistance with zero cloud dependency, zero API keys, and zero subscription fees.
This guide walks you through the full setup in about 15 minutes.
What You Need
- VS Code (or VS Code Insiders)
- Ollama: local LLM runtime (install guide)
- Continue.dev extension: free from the VS Code marketplace
- 8GB+ VRAM recommended (16GB+ for larger chat models); see how much VRAM you actually need
That's it. No accounts, no tokens, no sign-ups.
Step 1: Install Ollama and Pull Models
If you don't have Ollama yet, install it from ollama.com:
# macOS / Linux
curl -fsSL https://ollama.com/install.sh | sh
# Windows: download the installer from ollama.com
Now pull the two models we'll use, one for chat and one for autocomplete:
# Chat model: excellent reasoning for code tasks
ollama pull qwen3.6-35b-a3b
# Autocomplete model: small and fast for tab completions
ollama pull qwen2.5-coder:7b
Why two models? Autocomplete triggers constantly as you type, so it needs a small, fast model (3-7B parameters). Chat can afford a larger, smarter model since you only trigger it manually. This split keeps completions fast without giving up chat quality.
Verify both models are ready:
ollama list
You should see both qwen3.6-35b-a3b and qwen2.5-coder:7b in the output. For more model options, check our best Ollama models for coding roundup.
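If you would rather script that check, the sketch below reads ollama list output from stdin and reports whether each expected model is present. The helper names has_model and check_models are hypothetical, not Ollama commands. The sketch assumes the model name is the first whitespace-separated column after a header row, and that an untagged name may be listed with a :latest suffix.

```shell
# has_model NAME: succeed if NAME is in the first column of `ollama list`
# output read from stdin (header row skipped). An untagged NAME also matches
# "NAME:latest". Hypothetical helper, not an Ollama command.
has_model() {
  awk -v want="$1" '
    NR > 1 && ($1 == want || $1 == (want ":latest")) { found = 1 }
    END { exit(found ? 0 : 1) }
  '
}

# check_models NAME...: read one listing from stdin, report each NAME,
# and return non-zero if any of them is missing.
check_models() {
  listing=$(cat)
  missing=0
  for name in "$@"; do
    if printf '%s\n' "$listing" | has_model "$name"; then
      echo "ok: $name"
    else
      echo "missing: $name"
      missing=1
    fi
  done
  return "$missing"
}

# Usage (requires Ollama to be installed):
#   ollama list | check_models qwen3.6-35b-a3b qwen2.5-coder:7b
```

A non-zero exit status means at least one model still needs an ollama pull.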
Step 2: Install the Continue.dev Extension
- Open VS Code
- Go to Extensions (Ctrl+Shift+X/Cmd+Shift+X)
- Search for Continue
- Click Install on the "Continue - Codestral, Claude, GPT, Gemini, Llama, etc." extension
- After install, you'll see the Continue icon in the sidebar (the arrow logo)
Continue will open a welcome panel on first launch. You can skip the cloud setup, since we're going fully local.
Step 3: Configure config.json
This is where the magic happens. Continue reads its configuration from ~/.continue/config.json. Open it:
# macOS / Linux
code ~/.continue/config.json
# Windows
code %USERPROFILE%\.continue\config.json
Replace the contents with this complete configuration:
{
  "models": [
    {
      "title": "Qwen 3.6 35B-A3B (Chat)",
      "provider": "ollama",
      "model": "qwen3.6-35b-a3b",
      "apiBase": "http://localhost:11434"
    }
  ],
  "tabAutocompleteModel": {
    "title": "Qwen 2.5 Coder 7B (Autocomplete)",
    "provider": "ollama",
    "model": "qwen2.5-coder:7b",
    "apiBase": "http://localhost:11434"
  },
  "tabAutocompleteOptions": {
    "debounceDelay": 400,
    "maxPromptTokens": 1500,
    "multilineCompletions": "always"
  },
  "allowAnonymousTelemetry": false
}
Key settings explained:
- models: the model(s) available in the chat sidebar. You can add multiple and switch between them.
- tabAutocompleteModel: the model that powers tab completions. Keep this small and fast.
- debounceDelay: milliseconds to wait after you stop typing before triggering autocomplete. 400ms is a good balance between responsiveness and not hammering your GPU.
- multilineCompletions: set to "always" to get multi-line suggestions like Copilot.
- allowAnonymousTelemetry: set to false because we're doing this for privacy.
Save the file. Continue picks up changes automatically, so no restart is needed.
Step 4: Test Chat and Autocomplete
Test chat
- Click the Continue icon in the VS Code sidebar
- Type a question: "Write a Python function that checks if a string is a palindrome"
- You should see Qwen 3.6 streaming a response
If you get a connection error, make sure Ollama is running (ollama serve in a terminal).
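One way to tell a stopped server from a Continue misconfiguration is to query Ollama directly. This sketch assumes the default address http://localhost:11434 from the config above and uses Ollama's /api/tags endpoint, which lists locally installed models. The ollama_up helper name is hypothetical.

```shell
# ollama_up [BASE_URL]: succeed if an Ollama server answers at BASE_URL.
# Defaults to the apiBase used in config.json. Hypothetical helper.
ollama_up() {
  base=${1:-http://localhost:11434}
  # -f: treat HTTP errors as failure; --max-time: give up if nothing answers
  curl -fsS --max-time 3 "$base/api/tags" >/dev/null 2>&1
}

if ollama_up; then
  echo "Ollama is reachable; check the model names in config.json next."
else
  echo "Ollama is not reachable; start it with: ollama serve"
fi
```

If the server answers but chat still fails, the likelier culprit is a model name in config.json that does not match what ollama list reports.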
Test autocomplete
- Open any code file (or create a new .py file)
- Start typing a function signature: def calculate_fibonacci(
- Pause briefly; you should see a ghost-text suggestion appear
- Press Tab to accept, or keep typing to dismiss
The autocomplete works exactly like Copilot: ghost text appears inline, Tab accepts it. The difference is everything runs on your hardware.
Test inline editing
- Select a block of code
- Press Ctrl+I (Cmd+I on Mac)
- Type an instruction like "add error handling"
- Continue will suggest edits inline that you can accept or reject
Best Model Combinations
| Task | Model | Why |
|---|---|---|
| Chat / Q&A | qwen3.6-35b-a3b | Best reasoning-to-size ratio for code |
| Autocomplete | qwen2.5-coder:7b | Fast, code-specialized, low VRAM |
| Autocomplete (low VRAM) | qwen2.5-coder:3b | Fits in 4GB, still decent completions |
| Chat (low VRAM) | qwen2.5-coder:7b | Use same model for both if VRAM is tight |
| Chat (max quality) | qwen3.6-35b-a3b | Larger active parameters, deeper reasoning |
For a deeper dive into model selection, see best AI models for coding locally.
Tips for Better Results
Keep Ollama running in the background. The first request after a cold start loads the model into VRAM, which takes a few seconds. After that, requests skip the load step and responses start much sooner.
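To avoid paying that load cost on your first completion, you can warm a model up ahead of time. The sketch below assumes Ollama's /api/generate endpoint, where a request with a model name but no prompt loads the model, and its keep_alive field, which controls how long the model stays in memory afterwards. The helper names and the 30m value are illustrative choices, not recommendations from Ollama or Continue.

```shell
# preload_payload MODEL [KEEP_ALIVE]: print the JSON body for a prompt-less
# generate request. Hypothetical helper; 30m is an arbitrary default.
preload_payload() {
  printf '{"model": "%s", "keep_alive": "%s"}\n' "$1" "${2:-30m}"
}

# preload_model MODEL [KEEP_ALIVE]: ask a local Ollama server to load MODEL.
preload_model() {
  curl -fsS --max-time 120 http://localhost:11434/api/generate \
    -d "$(preload_payload "$@")" >/dev/null
}

# Usage (requires a running Ollama server):
#   preload_model qwen2.5-coder:7b
```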
Use @file and @codebase in chat. Continue can pull context from your project. Type @filename.py in the chat to reference a specific file, or @codebase to let it search your project for relevant code.
Adjust debounce delay to your hardware. If autocomplete feels laggy, increase debounceDelay to 500-600ms. If your GPU is fast, drop it to 200-300ms.
Try LM Studio as an alternative backend. If you prefer a GUI for model management, LM Studio also works with Continue. Just change apiBase to http://localhost:1234/v1 and set the provider to "lmstudio".
Don't run both models simultaneously on limited VRAM. Ollama will swap models in and out of memory as needed, but if you only have 8GB VRAM, expect a brief pause when switching between chat and autocomplete.
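To see which model is resident at a given moment, ollama ps lists the models currently loaded. The sketch below pulls just the names out of that output. It assumes the name is the first column after a header row, and loaded_models is a hypothetical helper.

```shell
# loaded_models: print the model names from `ollama ps` output read on stdin.
# Skips the header row and any blank lines. Hypothetical helper.
loaded_models() {
  awk 'NR > 1 && NF > 0 { print $1 }'
}

# Usage (requires a running Ollama server):
#   ollama ps | loaded_models
#   ollama stop qwen3.6-35b-a3b   # unload the chat model to free VRAM
```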
Continue.dev vs GitHub Copilot
| | Continue.dev + Ollama | GitHub Copilot |
|---|---|---|
| Cost | Free (after hardware) | $10-19/month |
| Privacy | 100% local | Code sent to cloud |
| Tab completion | Yes | Yes |
| Chat | Yes | Yes |
| Inline editing | Yes | Yes |
| Codebase context | Yes | Yes (Copilot Chat) |
| Model choice | Any local model | GPT-4o / Claude |
| Quality | Good (depends on model/hardware) | Excellent |
| Speed | Depends on GPU | Consistently fast |
| Offline | Yes | No |
The honest trade-off: Copilot's cloud models are still stronger for complex tasks. But for everyday coding (completions, boilerplate, refactoring, quick questions) a local setup with good models is surprisingly close. And you own every byte of it.
For more options beyond Continue, check our best free AI coding assistants comparison.