Aider + Ollama = free AI coding with zero API costs and complete privacy. No tokens leave your machine, no usage limits, no monthly bills. If you have a decent GPU (or a Mac with 16+ GB RAM), this is the most cost-effective AI coding setup available.
Why use Aider with Ollama
Cost: Zero. No API keys, no per-token charges. Run it 24/7 without worrying about bills.
Privacy: Your code never leaves your machine. Critical for proprietary codebases, regulated industries, or anyone who doesn’t want their code in someone else’s training data.
Speed: No network round-trip to a cloud API, so responses are not subject to network latency or queueing. Once the model is loaded, a good GPU generates 25-40 tokens/second.
Availability: Works offline. No API outages, no rate limits, no degraded service during peak hours.
The trade-off is quality — local models (7B-27B) aren’t as capable as Claude Sonnet or GPT-4o. But for routine coding tasks (refactoring, boilerplate, tests, documentation), they’re more than sufficient.
Step-by-step setup
1. Install Ollama
# macOS
brew install ollama
# Linux
curl -fsSL https://ollama.com/install.sh | sh
# Start the Ollama server (skip this if it is already running as a background service)
ollama serve
Verify it’s running: curl http://localhost:11434/api/tags
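To check from a script that a specific model is already pulled, you can inspect the JSON that `/api/tags` returns. This is a minimal sketch: `has_model` is a hypothetical helper name, and it assumes the response lists each model as a compact `"name":"<tag>"` pair with no space after the colon.

```shell
# has_model TAG: hypothetical helper, not part of Ollama or Aider.
# Reads the /api/tags JSON on stdin and succeeds if TAG is listed.
# Assumes compact JSON, i.e. "name":"<tag>" with no space after the colon.
has_model() {
  grep -qF "\"name\":\"$1\""
}
```

With the server running, `curl -fsS http://localhost:11434/api/tags | has_model qwen3.5:27b && echo "model is pulled"` would print the message only when that tag is present.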
2. Install Aider
pip install aider-chat
# Or with pipx for isolation
pipx install aider-chat
3. Pull a coding model
# Best quality (needs 16GB+ VRAM)
ollama pull qwen3.5:27b
# Good balance (needs 14GB+ VRAM)
ollama pull devstral-small:24b
# Lower VRAM option (needs 5GB+)
ollama pull qwen3.5:9b
4. Start coding
cd your-project
aider --model ollama/qwen3.5:27b
That’s it. You now have free AI pair programming with full Git integration — Aider automatically commits changes with descriptive messages.
Best Ollama models for Aider
Not all models work equally well with Aider. Aider needs models that can follow edit instructions precisely and output structured diffs. Here are the best options:
| Model | VRAM needed | Coding quality | Speed | Best for |
|---|---|---|---|---|
| Qwen 3.5 27B | 16 GB | Excellent | ~25 tok/s | Best all-rounder |
| Devstral Small 24B | 14 GB | Excellent | ~28 tok/s | Agentic coding tasks |
| Gemma 4 27B | 16 GB | Very good | ~25 tok/s | Multi-language |
| Qwen 3.5 9B | 5 GB | Good | ~40 tok/s | Low VRAM / fast iteration |
| DeepSeek Coder V2 16B | 10 GB | Good | ~32 tok/s | Code-specific tasks |
Recommendation: If you have 16+ GB VRAM, use Qwen 3.5 27B. It has the best instruction-following for Aider’s edit format. If you have around 14 GB, Devstral Small 24B at Q4 fits and excels at coding. For models needing more VRAM than you have locally, cloud GPU providers let you run larger models on demand.
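The VRAM thresholds from the table can be turned into a small selection helper. This is only a sketch: `pick_model` is a hypothetical function, it takes whole gigabytes, and it covers only the models whose Ollama tags appear in the pull commands earlier in this guide.

```shell
# pick_model VRAM_GB: print the Aider model name for a given amount of VRAM.
# Hypothetical helper; thresholds are copied from the table in this guide.
# VRAM_GB must be a whole number.
pick_model() {
  vram_gb=$1
  if [ "$vram_gb" -ge 16 ]; then
    echo "ollama/qwen3.5:27b"
  elif [ "$vram_gb" -ge 14 ]; then
    echo "ollama/devstral-small:24b"
  elif [ "$vram_gb" -ge 5 ]; then
    echo "ollama/qwen3.5:9b"
  else
    echo "no listed model fits in ${vram_gb} GB" >&2
    return 1
  fi
}
```

You could then start a session with `aider --model "$(pick_model 16)"`.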
Configuration options
Create .aider.conf.yml in your project root or home directory:
# Model configuration
model: ollama/qwen3.5:27b
# Edit format — diff is best for local models (fewer tokens)
edit-format: diff
# Git integration and linting
auto-commits: true
auto-lint: true
# Context management
map-tokens: 1024
map-refresh: auto
# Performance
stream: true
Key settings explained
edit-format: Aider supports whole, diff, and udiff formats. For local models, diff is strongly recommended — it uses fewer tokens (important when generation is slower) and local models handle it well.
map-tokens: Controls how many tokens Aider uses for the repository map. Lower values (512-1024) work better with smaller context windows. Default is 1024.
auto-commits: When true, Aider commits each change with a descriptive message. This makes it easy to track what the AI changed and to roll back the last change with Aider’s /undo command.
Environment variables
# Set default model
export AIDER_MODEL=ollama/qwen3.5:27b
# Point Aider at the Ollama server (needed if it is not on localhost)
export OLLAMA_API_BASE=http://192.168.1.100:11434
Performance tips
1. Use diff edit format
aider --model ollama/qwen3.5:27b --edit-format diff
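Compared with whole-file edits, this can cut output tokens by roughly 60-80% when the change is small relative to the file. Fewer output tokens means faster generation and less output for the model to get wrong.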
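A rough calculation shows why output size matters at local speeds. The 2,000-token file and the 70% reduction are illustrative assumptions (the latter is the midpoint of the range quoted above), and `gen_seconds` is a hypothetical helper.

```shell
# gen_seconds TOKENS RATE: seconds needed to emit TOKENS at RATE tok/s,
# rounded up to a whole second. Hypothetical helper for illustration.
gen_seconds() {
  echo $(( ($1 + $2 - 1) / $2 ))
}

gen_seconds 2000 25   # whole-file edit of an assumed 2,000-token file
gen_seconds 600 25    # same change as a diff, assuming 70% fewer tokens
```

At 25 tok/s this prints 80 and 24, so the same edit would take about 80 seconds as a whole-file rewrite and about 24 seconds as a diff.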
2. Keep context small
Local models have less capacity than cloud models. Be selective about what files you add:
# Only the file you're editing
/add src/auth/login.py
# Read-only reference (cannot be edited)
/read-only src/auth/types.py
# Remove files you're done with
/drop src/unrelated.py
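Before adding files, it can help to estimate how much context they will consume. The sketch below uses the common rule of thumb of about 4 bytes per token, which is an assumption and not Aider's actual tokenizer. `est_tokens` is a hypothetical helper.

```shell
# est_tokens FILE...: rough token estimate for the given files.
# Assumes ~4 bytes per token, a common rule of thumb for code and English
# text; the real count depends on the model's tokenizer.
est_tokens() {
  total=0
  for f in "$@"; do
    bytes=$(wc -c < "$f")
    total=$(( total + bytes ))
  done
  echo $(( total / 4 ))
}
```

For example, `est_tokens src/auth/login.py src/auth/types.py` gives a ballpark figure to compare against your model's context window. Inside a session, Aider's /tokens command reports the actual usage.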
3. Use /read-only for reference files
Files added with /read-only provide context, but Aider won’t try to edit them. They still count toward the context window, but the model spends no output tokens rewriting them and cannot make unwanted changes to reference files.
4. Limit repository map
For large repos, the default repo map can consume too many tokens:
map-tokens: 512 # Reduce from default 1024
5. Use a faster model for simple tasks
Switch models based on task complexity:
# Quick refactoring — use the fast model
aider --model ollama/qwen3.5:9b
# Complex architecture changes — use the big model
aider --model ollama/qwen3.5:27b
6. GPU offloading
Ensure Ollama is using your GPU fully. Check with:
ollama ps # Shows which models are loaded and GPU usage
If you see CPU inference, check that your GPU drivers are installed correctly.
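The check can be scripted by filtering the `ollama ps` output. This sketch assumes the column layout of recent Ollama versions, where the PROCESSOR column reads `100% GPU` for a fully offloaded model and something like `48%/52% CPU/GPU` for a split one. `not_fully_on_gpu` is a hypothetical helper.

```shell
# not_fully_on_gpu: read `ollama ps` output on stdin and print the names of
# loaded models whose row does not say "100% GPU".
# Assumes the first line is a header and the first column is the model name.
not_fully_on_gpu() {
  awk 'NR > 1 && $0 !~ /100% GPU/ { print $1 }'
}
```

Running `ollama ps | not_fully_on_gpu` prints nothing when every loaded model is entirely on the GPU. Any name it does print is a candidate for a smaller model or a lower quantization.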
Combining local and cloud models
Aider supports switching models mid-session. Use local for routine work, cloud for hard problems:
# Start with local
aider --model ollama/qwen3.5:27b
# In-session, switch to cloud for a complex task (needs that provider's API key)
/model deepseek/deepseek-chat
# ... do the complex work ...
# Switch back to the free local model
/model ollama/qwen3.5:27b
You can also configure a “weak model” for commit messages and simple tasks:
model: ollama/qwen3.5:27b
weak-model: ollama/qwen3.5:9b # Used for commit messages
Common issues
Model not found: Make sure you’ve pulled the model first with ollama pull model-name. Run ollama list to see available models.
Slow generation: Ensure GPU is being used (ollama ps). If on CPU, expect 5-10 tok/s instead of 25-40 tok/s.
Edit failures: If the model produces malformed edits, try --edit-format whole (slower but more reliable) or switch to a larger model.
Context too long: Local models have limited context. If you get errors, /drop unnecessary files or reduce map-tokens.
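If dropping files is not enough, the context window Ollama allocates for a model can also be set explicitly. Aider's Ollama documentation describes doing this with an `.aider.model.settings.yml` file that sets `num_ctx` under `extra_params`. The sketch below writes such a file. `write_ctx_settings` is a hypothetical helper, and the 16384 used in the usage note is an arbitrary example value, not a recommendation.

```shell
# write_ctx_settings MODEL NUM_CTX [DIR]: hypothetical helper that writes an
# .aider.model.settings.yml pinning Ollama's context window for one model.
# NUM_CTX is in tokens; larger values need more VRAM.
# Note: this overwrites any existing settings file in DIR (default: current dir).
write_ctx_settings() {
  cat > "${3:-.}/.aider.model.settings.yml" <<EOF
- name: $1
  extra_params:
    num_ctx: $2
EOF
}
```

For example, `write_ctx_settings ollama/qwen3.5:27b 16384` creates the file in the current project directory.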
FAQ
Which Ollama model is best for Aider?
Qwen 3.5 27B is currently the best all-around choice for Aider with Ollama. It has strong instruction-following (critical for Aider’s edit format), good coding ability, and runs at ~25 tok/s on a 24 GB GPU. If you’re VRAM-constrained, Devstral Small 24B is excellent for coding-specific tasks, and Qwen 3.5 9B is the best option under 8 GB VRAM. See our best Ollama models for coding for detailed benchmarks.
Is Aider with Ollama slower than cloud APIs?
Token generation is comparable (25-40 tok/s locally vs 30-80 tok/s from cloud APIs). The real difference is model capability — a local 27B model produces less sophisticated solutions than Claude Sonnet or GPT-4o. For routine tasks (refactoring, tests, boilerplate), you won’t notice a practical difference. For complex architectural decisions or novel algorithms, cloud models are noticeably better. Many developers use local for 80% of tasks and switch to cloud for the hard 20%.
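To measure your own generation speed instead of relying on these ranges, you can use the statistics Ollama reports for a completed generation: `eval_count` (tokens generated) and `eval_duration` (in nanoseconds). The helper below is a sketch that only does the arithmetic. `tok_per_sec` is a hypothetical name, and the numbers in the example are made up.

```shell
# tok_per_sec EVAL_COUNT EVAL_DURATION_NS: generation speed in tokens/second.
# The inputs correspond to the eval_count and eval_duration fields that
# Ollama reports for a completed generation (duration is in nanoseconds).
tok_per_sec() {
  awk -v n="$1" -v ns="$2" 'BEGIN { printf "%.1f\n", n / (ns / 1e9) }'
}

tok_per_sec 250 10000000000   # 250 tokens generated in 10 seconds
```

The example prints 25.0, the low end of the local range quoted above.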
Can I use Aider with multiple models simultaneously?
Not simultaneously, but you can switch models within a session using the /model command. A common workflow: use a fast local model (9B) for quick edits and commit messages, switch to a larger local model (27B) for complex changes, and occasionally switch to a cloud model for the hardest problems. You can also configure weak-model in your config to automatically use a smaller model for lightweight tasks like generating commit messages.