May 1, 2026 · 5 min read

Ollama + Continue.dev Setup — Free Local AI Coding in VS Code (2026)

GitHub Copilot costs $10–19/month. What if you could get the same experience — tab completions, AI chat, inline edits — running entirely on your machine for free?

That’s exactly what Continue.dev + Ollama gives you. Continue is an open-source AI coding extension for VS Code (and JetBrains). Ollama runs LLMs locally. Together, they deliver Copilot-style coding assistance with zero cloud dependency, zero API keys, and zero subscription fees.

This guide walks you through the full setup in about 15 minutes.

What You Need

VS Code (or VS Code Insiders)
Ollama — local LLM runtime (install guide)
Continue.dev extension — free from the VS Code marketplace
8GB+ VRAM recommended (16GB+ for larger chat models) — see how much VRAM you actually need

That’s it. No accounts, no tokens, no sign-ups.

Step 1: Install Ollama and Pull Models

If you don’t have Ollama yet, install it from ollama.com:

# macOS / Linux
curl -fsSL https://ollama.com/install.sh | sh

# Windows — download the installer from ollama.com

Now pull the two models we’ll use — one for chat, one for autocomplete:

# Chat model — excellent reasoning for code tasks
ollama pull qwen3.6-35b-a3b

# Autocomplete model — small and fast for tab completions
ollama pull qwen2.5-coder:7b

Why two models? Autocomplete fires on every keystroke, so it needs a small, fast model (3–7B parameters). Chat can afford a larger, smarter model since you only trigger it manually. This split gives you the best of both worlds.

Verify both models are ready:

ollama list

You should see both qwen3.6-35b-a3b and qwen2.5-coder:7b in the output. For more model options, check our best Ollama models for coding roundup.

Step 2: Install the Continue.dev Extension

Open VS Code
Go to Extensions (Ctrl+Shift+X / Cmd+Shift+X)
Search for Continue
Click Install on the “Continue - Codestral, Claude, GPT, Gemini, Llama, etc.” extension
After install, you’ll see the Continue icon in the sidebar (the arrow logo)

Continue will open a welcome panel on first launch. You can skip the cloud setup — we’re going fully local.

Step 3: Configure config.json

This is where the magic happens. Continue reads its configuration from ~/.continue/config.json. Open it:

# macOS / Linux
code ~/.continue/config.json

# Windows
code %USERPROFILE%\.continue\config.json

Replace the contents with this complete configuration:

{
  "models": [
    {
      "title": "Qwen 3.6 35B-A3B (Chat)",
      "provider": "ollama",
      "model": "qwen3.6-35b-a3b",
      "apiBase": "http://localhost:11434"
    }
  ],
  "tabAutocompleteModel": {
    "title": "Qwen 2.5 Coder 7B (Autocomplete)",
    "provider": "ollama",
    "model": "qwen2.5-coder:7b",
    "apiBase": "http://localhost:11434"
  },
  "tabAutocompleteOptions": {
    "debounceDelay": 400,
    "maxPromptTokens": 1500,
    "multilineCompletions": "always"
  },
  "allowAnonymousTelemetry": false
}

Key settings explained:

models — the model(s) available in the chat sidebar. You can add multiple and switch between them.
tabAutocompleteModel — the model that powers tab completions. Keep this small and fast.
debounceDelay — milliseconds to wait after you stop typing before triggering autocomplete. 400ms is a good balance between responsiveness and not hammering your GPU.
multilineCompletions — set to "always" to get multi-line suggestions like Copilot.
allowAnonymousTelemetry — set to false because we’re doing this for privacy.

Save the file. Continue picks up changes automatically — no restart needed.

Step 4: Test Chat and Autocomplete

Test chat

Click the Continue icon in the VS Code sidebar
Type a question: “Write a Python function that checks if a string is a palindrome”
You should see Qwen 3.6 streaming a response

If you get a connection error, make sure Ollama is running (ollama serve in a terminal).

Test autocomplete

Open any code file (or create a new .py file)
Start typing a function signature: def calculate_fibonacci(
Pause briefly — you should see a ghost-text suggestion appear
Press Tab to accept, or keep typing to dismiss

The autocomplete works exactly like Copilot: ghost text appears inline, Tab accepts it. The difference is everything runs on your hardware.

Test inline editing

Select a block of code
Press Ctrl+I (Cmd+I on Mac)
Type an instruction like “add error handling”
Continue will suggest edits inline that you can accept or reject

Best Model Combinations

Task	Model	Why
Chat / Q&A	qwen3.6-35b-a3b	Best reasoning-to-size ratio for code
Autocomplete	qwen2.5-coder:7b	Fast, code-specialized, low VRAM
Autocomplete (low VRAM)	qwen2.5-coder:3b	Fits in 4GB, still decent completions
Chat (low VRAM)	qwen2.5-coder:7b	Use same model for both if VRAM is tight
Chat (max quality)	qwen3.6-35b-a3b	Larger active parameters, deeper reasoning

For a deeper dive into model selection, see best AI models for coding locally.

Tips for Better Results

Keep Ollama running in the background. The first request after a cold start loads the model into VRAM, which takes a few seconds. After that, responses are near-instant.

Use @file and @codebase in chat. Continue can pull context from your project. Type @filename.py in the chat to reference a specific file, or @codebase to let it search your project for relevant code.

Adjust debounce delay to your hardware. If autocomplete feels laggy, increase debounceDelay to 500–600ms. If your GPU is fast, drop it to 200–300ms.

Try LM Studio as an alternative backend. If you prefer a GUI for model management, LM Studio also works with Continue. Just change apiBase to http://localhost:1234/v1 and set the provider to "lmstudio".

Don’t run both models simultaneously on limited VRAM. Ollama will swap models in and out of memory as needed, but if you only have 8GB VRAM, expect a brief pause when switching between chat and autocomplete.

Continue.dev vs GitHub Copilot

	Continue.dev + Ollama	GitHub Copilot
Cost	Free (after hardware)	$10–19/month
Privacy	100% local	Code sent to cloud
Tab completion	✅	✅
Chat	✅	✅
Inline editing	✅	✅
Codebase context	✅	✅ (Copilot Chat)
Model choice	Any local model	GPT-4o / Claude
Quality	Good (depends on model/hardware)	Excellent
Speed	Depends on GPU	Consistently fast
Offline	✅	❌

The honest trade-off: Copilot’s cloud models are still stronger for complex tasks. But for everyday coding — completions, boilerplate, refactoring, quick questions — a local setup with good models is surprisingly close. And you own every byte of it.

For more options beyond Continue, check our best free AI coding assistants comparison.

Ollama + Continue.dev Setup — Free Local AI Coding in VS Code (2026)

What You Need

Step 1: Install Ollama and Pull Models

Step 2: Install the Continue.dev Extension

Step 3: Configure config.json

Step 4: Test Chat and Autocomplete

Test chat

Test autocomplete

Test inline editing

Best Model Combinations

Tips for Better Results

Continue.dev vs GitHub Copilot

Related Links

📬 AI Dev Weekly

You might also like

Build an AI-Powered Cron Job Monitor That Explains Failures

Build a Local AI Image Describer — Vision Models + Ollama

Build an AI Expense Tracker That Reads Your Bank CSV Files

Build a CLI That Generates README Files From Your Code