Jun 5, 2026 · 5 min read

Last updated on Apr 19, 2026

How to Use Aider with Ollama — Free Local AI Coding Setup

Aider + Ollama = free AI coding with zero API costs and complete privacy. No tokens leave your machine, no usage limits, no monthly bills. If you have a decent GPU (or a Mac with 16+ GB RAM), this is the most cost-effective AI coding setup available.

Why use Aider with Ollama

Cost: Zero. No API keys, no per-token charges. Run it 24/7 without worrying about bills.

Privacy: Your code never leaves your machine. Critical for proprietary codebases, regulated industries, or anyone who doesn’t want their code in someone else’s training data.

Speed: No network round-trip. Generation starts as soon as the model has processed your prompt, with no trip to a cloud API. On a good GPU, local models generate 25-40 tokens/second.

Availability: Works offline. No API outages, no rate limits, no degraded service during peak hours.

The trade-off is quality — local models (7B-27B) aren’t as capable as Claude Sonnet or GPT-4o. But for routine coding tasks (refactoring, boilerplate, tests, documentation), they’re more than sufficient.

Step-by-step setup

1. Install Ollama

# macOS
brew install ollama

# Linux
curl -fsSL https://ollama.com/install.sh | sh

# Start the Ollama server (skip this if it is already running as a background service)
ollama serve

Verify it’s running: curl http://localhost:11434/api/tags
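If you start Ollama and Aider from the same script, the server may not be ready on the first request. The sketch below retries the same /api/tags check a few times before giving up. The function name wait_for_ollama, the OLLAMA_URL variable, and the retry count are illustrative assumptions, not part of Ollama or Aider.

```shell
#!/bin/sh
# Sketch: poll the Ollama API until it answers, or give up after N tries.
# OLLAMA_URL and wait_for_ollama are hypothetical names used for illustration.
OLLAMA_URL="${OLLAMA_URL:-http://localhost:11434}"

wait_for_ollama() {
    tries="${1:-5}"
    i=1
    while [ "$i" -le "$tries" ]; do
        # -f: fail on HTTP errors, -s: quiet, --max-time: do not hang forever
        if curl -fs --max-time 2 "$OLLAMA_URL/api/tags" >/dev/null 2>&1; then
            echo "Ollama is up at $OLLAMA_URL"
            return 0
        fi
        sleep 1
        i=$((i + 1))
    done
    echo "Ollama did not respond at $OLLAMA_URL" >&2
    return 1
}

# Usage (uncomment to run):
# wait_for_ollama 10 && echo "ready to start aider"
```

The function returns a non-zero status when the server never answers, so it can gate later steps with &&.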

2. Install Aider

pip install aider-chat

# Or with pipx for isolation
pipx install aider-chat

3. Pull a coding model

# Best quality (needs 16GB+ VRAM)
ollama pull qwen3.5:27b

# Good balance (needs 14GB+ VRAM)
ollama pull devstral-small:24b

# Lower VRAM option (needs 5GB+)
ollama pull qwen3.5:9b
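Pulling a model that is already present wastes little, but a script may still want to check first. The sketch below assumes that ollama list prints a header row followed by one model per line, with the model tag in the first column; has_model is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: succeed if a model tag appears in `ollama list` output read from stdin.
# Assumes a header row, then one model per line with the tag in column 1.
has_model() {
    awk -v want="$1" 'NR > 1 && $1 == want { found = 1 } END { exit !found }'
}

# Usage (uncomment to run against a real Ollama install):
# ollama list | has_model qwen3.5:27b || ollama pull qwen3.5:27b
```

Because the helper reads from standard input, it can be tried on saved output without Ollama installed.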

4. Start coding

cd your-project
aider --model ollama/qwen3.5:27b

That’s it. You now have free AI pair programming with full Git integration — Aider automatically commits changes with descriptive messages.

Best Ollama models for Aider

Not all models work equally well with Aider. Aider needs models that can follow edit instructions precisely and output structured diffs. Here are the best options:

Model	VRAM needed	Coding quality	Speed	Best for
Qwen 3.5 27B	16 GB	Excellent	~25 tok/s	Best all-rounder
Devstral Small 24B	14 GB	Excellent	~28 tok/s	Agentic coding tasks
Gemma 4 27B	16 GB	Very good	~25 tok/s	Multi-language
Qwen 3.5 9B	5 GB	Good	~40 tok/s	Low VRAM / fast iteration
DeepSeek Coder V2 16B	10 GB	Good	~32 tok/s	Code-specific tasks

Recommendation: If you have 16+ GB VRAM, use Qwen 3.5 27B. It has the best instruction-following for Aider’s edit format. If you have 14 GB, Devstral Small 24B at Q4 fits and excels at coding. With less than that, Qwen 3.5 9B is the safer choice. For models needing more VRAM than you have locally, cloud GPU providers let you run larger models on demand.
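The VRAM thresholds in the table can be turned into a small lookup. This is a sketch that only encodes the article's own numbers for the three models pulled earlier; pick_model is a hypothetical helper, and the VRAM figure is passed in by hand rather than detected.

```shell
#!/bin/sh
# Sketch: map available VRAM (whole GB) to one of the model tags used in this guide.
# Thresholds come from the table above; pick_model is an illustrative name.
pick_model() {
    vram_gb="$1"
    if [ "$vram_gb" -ge 16 ]; then
        echo "qwen3.5:27b"
    elif [ "$vram_gb" -ge 14 ]; then
        echo "devstral-small:24b"
    elif [ "$vram_gb" -ge 5 ]; then
        echo "qwen3.5:9b"
    else
        echo "none"
        return 1
    fi
}

# Usage (uncomment to run):
# aider --model "ollama/$(pick_model 16)"
```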

Configuration options

Create .aider.conf.yml in your project root or home directory:

# Model configuration
model: ollama/qwen3.5:27b

# Edit format — diff is best for local models (fewer tokens)
edit-format: diff

# Git integration and linting
auto-commits: true
auto-lint: true

# Context management
map-tokens: 1024
map-refresh: auto

# Performance
stream: true

Key settings explained

edit-format: Aider supports several edit formats, including whole, diff, and udiff. For local models, diff is strongly recommended: it uses fewer tokens (important when generation is slower) and local models handle it well.

map-tokens: Controls how many tokens Aider uses for the repository map. Lower values (512-1024) work better with smaller context windows. Default is 1024.

auto-commits: When true, Aider commits each change with a descriptive message. Great for tracking what the AI changed, and for easy rollback with Aider’s /undo command (or a plain git revert).

Environment variables

# Set default model
export AIDER_MODEL=ollama/qwen3.5:27b

# Point Aider at the Ollama server (if not localhost)
export OLLAMA_API_BASE=http://192.168.1.100:11434

Performance tips

1. Use diff edit format

aider --model ollama/qwen3.5:27b --edit-format diff

This can cut output tokens by roughly 60-80% compared to whole-file edits, since the model sends only the changed hunks. Faster generation, less chance of errors.
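To see what that saving means in wall-clock time, here is a rough back-of-the-envelope calculation. The 2000-token reply size is an assumption chosen for illustration, the 70% reduction is the midpoint of the range above, and 25 tok/s is the speed quoted for the 27B model.

```shell
#!/bin/sh
# Sketch: estimate generation time as tokens / tokens-per-second, rounded up.
# est_seconds is a hypothetical helper; the token counts are illustrative.
est_seconds() {
    echo $(( ($1 + $2 - 1) / $2 ))
}

whole_tokens=2000                           # assumed size of a whole-file reply
diff_tokens=$(( whole_tokens * 30 / 100 ))  # 70% fewer tokens than whole

echo "whole: $(est_seconds "$whole_tokens" 25)s"   # prints "whole: 80s"
echo "diff: $(est_seconds "$diff_tokens" 25)s"     # prints "diff: 24s"
```

The gap grows with file size, which is why the format matters more on slower hardware.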

2. Keep context small

Local models typically run with smaller context windows than cloud models. Be selective about which files you add:

/add src/auth/login.py        # Only the file you're editing
/read-only src/auth/types.py  # Read-only reference (cheaper)
/drop src/unrelated.py        # Remove files you're done with

3. Use /read-only for reference files

Files added with /read-only provide context, but Aider won’t try to edit them. This is cheaper and prevents the model from making unwanted changes to reference files.

4. Limit repository map

For large repos, the default repo map can consume too many tokens:

map-tokens: 512  # Reduce from default 1024

5. Use a faster model for simple tasks

Switch models based on task complexity:

# Quick refactoring — use the fast model
aider --model ollama/qwen3.5:9b

# Complex architecture changes — use the big model
aider --model ollama/qwen3.5:27b
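If you switch often, a tiny wrapper saves retyping model tags. This is a sketch only: model_for_task and the quick/deep labels are made up for illustration, and the tags are the ones used throughout this guide.

```shell
#!/bin/sh
# Sketch: translate a task label into one of the model names used above.
# model_for_task and its labels are hypothetical, not an Aider feature.
model_for_task() {
    case "$1" in
        quick) echo "ollama/qwen3.5:9b" ;;
        deep)  echo "ollama/qwen3.5:27b" ;;
        *)     echo "usage: model_for_task quick|deep" >&2; return 2 ;;
    esac
}

# Usage (uncomment to run):
# aider --model "$(model_for_task quick)"
```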

6. GPU offloading

Ensure Ollama is using your GPU fully. Check with:

ollama ps  # Shows which models are loaded and GPU usage

If you see CPU inference, check that your GPU drivers are installed correctly.
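The check can be scripted by reading the output of ollama ps. The sketch below assumes the output has a header row and a PROCESSOR column whose values look like "100% GPU", "100% CPU", or a split such as "48%/52% CPU/GPU"; that layout may differ between Ollama versions, and gpu_status is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: summarize where each loaded model is running, from `ollama ps` on stdin.
# Assumes a header row and processor values like "100% GPU" or "48%/52% CPU/GPU".
gpu_status() {
    awk 'NR > 1 {
        if ($0 ~ /CPU\/GPU/)      print $1 ": split between CPU and GPU"
        else if ($0 ~ /100% GPU/) print $1 ": fully on GPU"
        else if ($0 ~ /100% CPU/) print $1 ": CPU only"
        else                      print $1 ": unknown"
    }'
}

# Usage (uncomment to run against a real Ollama install):
# ollama ps | gpu_status
```

A "split" result usually means the model does not fit in VRAM, in which case a smaller model or quantization is the likely fix rather than a driver reinstall.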

Combining local and cloud models

Aider supports switching models mid-session. Use local for routine work, cloud for hard problems:

# Start with local
aider --model ollama/qwen3.5:27b

# In-session, switch to cloud for a complex task
/model deepseek/deepseek-chat
# ... do the complex work ...
/model ollama/qwen3.5:27b  # Switch back to free

You can also configure a “weak model” for commit messages and simple tasks:

model: ollama/qwen3.5:27b
weak-model: ollama/qwen3.5:9b  # Used for commit messages

Common issues

Model not found: Make sure you’ve pulled the model first with ollama pull model-name. Run ollama list to see available models.

Slow generation: Ensure GPU is being used (ollama ps). If on CPU, expect 5-10 tok/s instead of 25-40 tok/s.

Edit failures: If the model produces malformed edits, try --edit-format whole (slower but more reliable) or switch to a larger model.

Context too long: Local models have limited context windows. If you get context errors, or the model seems to forget earlier instructions, /drop unnecessary files or reduce map-tokens.
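Before adding a large file, it can help to estimate how many tokens it will cost. Inside a session, Aider's /tokens command reports the actual usage. Outside one, the sketch below uses the rough rule of thumb of about four bytes per token, which is an approximation and not a real tokenizer; estimate_tokens is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: rough token estimate for one or more files, assuming ~4 bytes per token.
# This is a heuristic only; real counts depend on the model's tokenizer.
estimate_tokens() {
    bytes=$(cat "$@" | wc -c)
    echo $(( bytes / 4 ))
}

# Usage (uncomment to run):
# estimate_tokens src/auth/login.py src/auth/types.py
```

If the estimate approaches the model's context window, add fewer files or lower map-tokens.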

FAQ

Which Ollama model is best for Aider?

Qwen 3.5 27B is currently the best all-around choice for Aider with Ollama. It has strong instruction-following (critical for Aider’s edit format), good coding ability, and runs at ~25 tok/s on a 24 GB GPU. If you’re VRAM-constrained, Devstral Small 24B is excellent for coding-specific tasks, and Qwen 3.5 9B is the best option under 8 GB VRAM. See our guide to the best Ollama models for coding for detailed benchmarks.

Is Aider with Ollama slower than cloud APIs?

Token generation is comparable (25-40 tok/s locally vs 30-80 tok/s from cloud APIs). The real difference is model capability — a local 27B model produces less sophisticated solutions than Claude Sonnet or GPT-4o. For routine tasks (refactoring, tests, boilerplate), you won’t notice a practical difference. For complex architectural decisions or novel algorithms, cloud models are noticeably better. Many developers use local for 80% of tasks and switch to cloud for the hard 20%.

Can I use Aider with multiple models simultaneously?

Not simultaneously, but you can switch models within a session using the /model command. A common workflow: use a fast local model (9B) for quick edits and commit messages, switch to a larger local model (27B) for complex changes, and occasionally switch to a cloud model for the hardest problems. You can also configure weak-model in your config to automatically use a smaller model for lightweight tasks like generating commit messages.