
Ollama + Continue.dev Setup: Free Local AI Coding in VS Code (2026)


GitHub Copilot costs $10–19/month. What if you could get the same experience (tab completions, AI chat, inline edits) running entirely on your machine for free?

That's exactly what Continue.dev + Ollama gives you. Continue is an open-source AI coding extension for VS Code (and JetBrains). Ollama runs LLMs locally. Together, they deliver Copilot-style coding assistance with zero cloud dependency, zero API keys, and zero subscription fees.

This guide walks you through the full setup in about 15 minutes.

What You Need

  • VS Code (or VS Code Insiders)
  • Ollama: local LLM runtime (install guide)
  • Continue.dev extension: free from the VS Code marketplace
  • 8GB+ VRAM recommended (16GB+ for larger chat models); see how much VRAM you actually need

That's it. No accounts, no tokens, no sign-ups.

Step 1: Install Ollama and Pull Models

If you don't have Ollama yet, install it from ollama.com:

# macOS / Linux
curl -fsSL https://ollama.com/install.sh | sh

# Windows: download the installer from ollama.com

Now pull the two models we'll use, one for chat and one for autocomplete:

# Chat model: excellent reasoning for code tasks
ollama pull qwen3.6-35b-a3b

# Autocomplete model: small and fast for tab completions
ollama pull qwen2.5-coder:7b

Why two models? Autocomplete fires on every keystroke, so it needs a small, fast model (3–7B parameters). Chat can afford a larger, smarter model since you only trigger it manually. This split gives you the best of both worlds.

Verify both models are ready:

ollama list

You should see both qwen3.6-35b-a3b and qwen2.5-coder:7b in the output. For more model options, check our best Ollama models for coding roundup.
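
If you'd rather script this check, here is a minimal sketch. It assumes Ollama's HTTP API is listening on the default http://localhost:11434 and that its /api/tags endpoint returns a JSON object with a "models" list whose entries carry a "name" field. The helper names (normalize, missing_models, fetch_tags) are made up for this example.

```python
import json
import urllib.request

# The two models pulled earlier in this guide.
REQUIRED = ["qwen3.6-35b-a3b", "qwen2.5-coder:7b"]


def normalize(name: str) -> str:
    """A model name without a tag is treated as '<name>:latest'."""
    return name if ":" in name else f"{name}:latest"


def missing_models(tags_response: dict, required: list[str]) -> list[str]:
    """Return the required models that are absent from an /api/tags response."""
    installed = {normalize(m["name"]) for m in tags_response.get("models", [])}
    return [name for name in required if normalize(name) not in installed]


def fetch_tags(base_url: str = "http://localhost:11434") -> dict:
    """Fetch the list of locally installed models (needs a running Ollama)."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=5) as resp:
        return json.load(resp)
```

With Ollama running, missing_models(fetch_tags(), REQUIRED) should come back as an empty list once both pulls have finished.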

Step 2: Install the Continue.dev Extension

  1. Open VS Code
  2. Go to Extensions (Ctrl+Shift+X / Cmd+Shift+X)
  3. Search for Continue
  4. Click Install on the "Continue - Codestral, Claude, GPT, Gemini, Llama, etc." extension
  5. After install, you'll see the Continue icon in the sidebar (the arrow logo)

Continue will open a welcome panel on first launch. You can skip the cloud setup, since we're going fully local.

Step 3: Configure config.json

This is where the magic happens. Continue reads its configuration from ~/.continue/config.json. Open it:

# macOS / Linux
code ~/.continue/config.json

# Windows
code %USERPROFILE%\.continue\config.json

Replace the contents with this complete configuration:

{
  "models": [
    {
      "title": "Qwen 3.6 35B-A3B (Chat)",
      "provider": "ollama",
      "model": "qwen3.6-35b-a3b",
      "apiBase": "http://localhost:11434"
    }
  ],
  "tabAutocompleteModel": {
    "title": "Qwen 2.5 Coder 7B (Autocomplete)",
    "provider": "ollama",
    "model": "qwen2.5-coder:7b",
    "apiBase": "http://localhost:11434"
  },
  "tabAutocompleteOptions": {
    "debounceDelay": 400,
    "maxPromptTokens": 1500,
    "multilineCompletions": "always"
  },
  "allowAnonymousTelemetry": false
}

Key settings explained:

  • models: the model(s) available in the chat sidebar. You can add multiple and switch between them.
  • tabAutocompleteModel: the model that powers tab completions. Keep this small and fast.
  • debounceDelay: milliseconds to wait after you stop typing before triggering autocomplete. 400ms is a good balance between responsiveness and not hammering your GPU.
  • multilineCompletions: set to "always" to get multi-line suggestions like Copilot.
  • allowAnonymousTelemetry: set to false because we're doing this for privacy.

Save the file. Continue picks up changes automatically, so no restart is needed.
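
A stray comma or a missing key is the most common way to break this file, so a quick sanity check can save some head-scratching. The sketch below is an assumption-heavy example: the checks reflect the keys used in this guide, not Continue's own schema validation, and check_config / load_config are hypothetical helper names.

```python
import json
from pathlib import Path


def check_config(config: dict) -> list[str]:
    """Return a list of problems found; an empty list means the basics look fine."""
    problems = []

    chat_models = config.get("models", [])
    if not chat_models:
        problems.append("no chat models defined under 'models'")
    for i, entry in enumerate(chat_models):
        for key in ("title", "provider", "model"):
            if key not in entry:
                problems.append(f"models[{i}] is missing '{key}'")

    autocomplete = config.get("tabAutocompleteModel")
    if autocomplete is None:
        problems.append("'tabAutocompleteModel' is not set")
    else:
        for key in ("provider", "model"):
            if key not in autocomplete:
                problems.append(f"tabAutocompleteModel is missing '{key}'")

    delay = config.get("tabAutocompleteOptions", {}).get("debounceDelay")
    if delay is not None and (not isinstance(delay, (int, float)) or delay < 0):
        problems.append("debounceDelay should be a non-negative number of milliseconds")
    return problems


def load_config(path=None) -> dict:
    """Parse the config file; a json.JSONDecodeError here points at a syntax error."""
    if path is None:
        path = Path.home() / ".continue" / "config.json"
    return json.loads(Path(path).read_text(encoding="utf-8"))
```

Running check_config(load_config()) against the configuration above should return an empty list.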

Step 4: Test Chat and Autocomplete

Test chat

  1. Click the Continue icon in the VS Code sidebar
  2. Type a question: "Write a Python function that checks if a string is a palindrome"
  3. You should see Qwen 3.6 streaming a response

If you get a connection error, make sure Ollama is running (ollama serve in a terminal).
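
To rule out Continue itself, you can probe the server directly. This is a rough sketch that assumes Ollama is on its default address and answers a plain GET on the root URL with HTTP 200; ollama_is_running is a hypothetical helper, and it only confirms that some HTTP server responds there, not that it is Ollama.

```python
import http.client
import urllib.request


def ollama_is_running(base_url: str = "http://localhost:11434", timeout: float = 2.0) -> bool:
    """Return True if an HTTP server answers with status 200 at base_url."""
    try:
        with urllib.request.urlopen(base_url, timeout=timeout) as resp:
            return resp.status == 200
    except (OSError, http.client.HTTPException):
        # Connection refused, timeouts, and urllib's URLError are all OSError subclasses.
        return False
```

If this returns False, start the server with ollama serve and try again before touching the Continue config.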

Test autocomplete

  1. Open any code file (or create a new .py file)
  2. Start typing a function signature: def calculate_fibonacci(
  3. Pause briefly; you should see a ghost-text suggestion appear
  4. Press Tab to accept, or keep typing to dismiss

The autocomplete works exactly like Copilot: ghost text appears inline, Tab accepts it. The difference is everything runs on your hardware.
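
If ghost text never shows up, it helps to check how long the autocomplete model takes to answer on its own. The sketch below assumes Ollama's /api/generate endpoint with "model", "prompt", and "stream" fields and a "response" field in the reply. It sends a plain prompt, which is not the prompt format Continue itself uses, so treat the timing as a ballpark figure. build_generate_request and time_completion are hypothetical names.

```python
import json
import time
import urllib.request


def build_generate_request(model: str, prompt: str,
                           base_url: str = "http://localhost:11434") -> urllib.request.Request:
    """Build a non-streaming POST request for Ollama's /api/generate endpoint."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/api/generate",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def time_completion(model: str, prompt: str) -> tuple[str, float]:
    """Send the prompt and return (generated text, seconds elapsed). Needs a running Ollama."""
    request = build_generate_request(model, prompt)
    start = time.perf_counter()
    with urllib.request.urlopen(request, timeout=120) as resp:
        text = json.load(resp)["response"]
    return text, time.perf_counter() - start
```

Calling time_completion("qwen2.5-coder:7b", "def calculate_fibonacci(") twice is useful: the first call includes model loading, the second is closer to what you'll feel while typing.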

Test inline editing

  1. Select a block of code
  2. Press Ctrl+I (Cmd+I on Mac)
  3. Type an instruction like "add error handling"
  4. Continue will suggest edits inline that you can accept or reject

Best Model Combinations

| Task | Model | Why |
|---|---|---|
| Chat / Q&A | qwen3.6-35b-a3b | Best reasoning-to-size ratio for code |
| Autocomplete | qwen2.5-coder:7b | Fast, code-specialized, low VRAM |
| Autocomplete (low VRAM) | qwen2.5-coder:3b | Fits in 4GB, still decent completions |
| Chat (low VRAM) | qwen2.5-coder:7b | Use same model for both if VRAM is tight |
| Chat (max quality) | qwen3.6-35b-a3b | Larger active parameters, deeper reasoning |

For a deeper dive into model selection, see best AI models for coding locally.

Tips for Better Results

Keep Ollama running in the background. The first request after a cold start loads the model into VRAM, which takes a few seconds. After that, responses are near-instant.

Use @file and @codebase in chat. Continue can pull context from your project. Type @filename.py in the chat to reference a specific file, or @codebase to let it search your project for relevant code.

Adjust debounce delay to your hardware. If autocomplete feels laggy, increase debounceDelay to 500–600ms. If your GPU is fast, drop it to 200–300ms.
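
To get a feel for what the delay does, here is a small simulation. It models a generic debounce timer (a request fires only once the pause after a keystroke reaches the delay) and is not Continue's actual implementation; debounced_triggers is a hypothetical helper and the keystroke timings are invented.

```python
def debounced_triggers(keystroke_times_ms: list[int], debounce_delay_ms: int) -> list[int]:
    """Return the times (in ms) at which a completion request would fire.

    A request fires debounce_delay_ms after a keystroke unless another
    keystroke arrives first and restarts the timer. Input must be sorted.
    """
    fire_times = []
    last_index = len(keystroke_times_ms) - 1
    for i, t in enumerate(keystroke_times_ms):
        if i == last_index or keystroke_times_ms[i + 1] - t >= debounce_delay_ms:
            fire_times.append(t + debounce_delay_ms)
    return fire_times


# Five keystrokes: a quick burst, a pause, then two more.
keys = [0, 100, 200, 900, 1000]
print(debounced_triggers(keys, 400))  # [600, 1400]
print(debounced_triggers(keys, 50))   # [50, 150, 250, 950, 1050]
```

With a 400ms delay the burst produces two requests; with 50ms every keystroke produces one, which is the "hammering your GPU" case.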

Try LM Studio as an alternative backend. If you prefer a GUI for model management, LM Studio also works with Continue. Just change apiBase to http://localhost:1234/v1 and set the provider to "lmstudio".

Don't run both models simultaneously on limited VRAM. Ollama will swap models in and out of memory as needed, but if you only have 8GB VRAM, expect a brief pause when switching between chat and autocomplete.
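
For a back-of-the-envelope check of whether two models can stay loaded together, you can estimate weight sizes from parameter counts. This is only a rough sketch: it assumes about 4 bits per weight (a common quantization level, not something this guide measured) and a flat 1 GB of overhead per model for context and buffers, which is a guess. Real usage depends on the quantization and context length.

```python
def weights_gb(params_billion: float, bits_per_weight: float = 4.0) -> float:
    """Approximate size of the model weights in GB (1 GB = 1e9 bytes)."""
    return params_billion * bits_per_weight / 8


def fits_together(models_params_billion: list[float], vram_gb: float,
                  bits_per_weight: float = 4.0,
                  overhead_gb_per_model: float = 1.0) -> bool:
    """Return True if all models plus the assumed overhead fit in vram_gb."""
    total = sum(weights_gb(p, bits_per_weight) + overhead_gb_per_model
                for p in models_params_billion)
    return total <= vram_gb


print(weights_gb(7))               # 3.5
print(weights_gb(3))               # 1.5
print(fits_together([7, 3], 8.0))  # True
```

Under these assumptions a 7B and a 3B model fit side by side in 8GB, while pairing a 7B model with a much larger chat model does not, which is when the swapping pause shows up.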

Continue.dev vs GitHub Copilot

| | Continue.dev + Ollama | GitHub Copilot |
|---|---|---|
| Cost | Free (after hardware) | $10–19/month |
| Privacy | 100% local | Code sent to cloud |
| Tab completion | ✅ | ✅ |
| Chat | ✅ | ✅ |
| Inline editing | ✅ | ✅ |
| Codebase context | ✅ | ✅ (Copilot Chat) |
| Model choice | Any local model | GPT-4o / Claude |
| Quality | Good (depends on model/hardware) | Excellent |
| Speed | Depends on GPU | Consistently fast |
| Offline | ✅ | ❌ |

The honest trade-off: Copilot's cloud models are still stronger for complex tasks. But for everyday coding (completions, boilerplate, refactoring, quick questions) a local setup with good models is surprisingly close. And you own every byte of it.

For more options beyond Continue, check our best free AI coding assistants comparison.