
How to Use Granite 4.1 with Aider and Continue.dev (2026)


IBM Granite 4.1 is one of the strongest open-weight coding models available in 2026. The 8B model scores 87.2 on HumanEval and the 30B leads BFCL V3 tool calling at 73.68 — both under Apache 2.0. But raw benchmarks don’t matter if you can’t plug the model into your actual workflow. Here’s how to set up Granite 4.1 with Aider, Continue.dev, and other popular coding tools.

Prerequisites

Before configuring any tool, you need Granite 4.1 running somewhere. You have three options:

Option 1: Local with Ollama

# Install Ollama if you haven't
curl -fsSL https://ollama.com/install.sh | sh

# Pull Granite 4.1 (pick your size)
ollama pull granite4.1:3b      # ~2 GB, runs on any machine
ollama pull granite4.1:8b      # ~5 GB, needs 8+ GB RAM
ollama pull granite4.1:30b     # ~18 GB quantized, needs 32+ GB RAM

Ollama serves models on http://localhost:11434 by default with an OpenAI-compatible API.
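As a quick sanity check before wiring up any tool, it helps to see what a request to that endpoint looks like. Here's a minimal sketch of the OpenAI-format chat payload — it only builds and prints the JSON, nothing is sent, and the prompt is just an example:

```python
import json

# OpenAI-compatible chat completions payload, as you'd POST it to
# http://localhost:11434/v1/chat/completions (nothing is sent here).
payload = {
    "model": "granite4.1:8b",
    "messages": [
        {"role": "user", "content": "Write a Python function that reverses a string."}
    ],
    "temperature": 0.2,
}

print(json.dumps(payload, indent=2))
```

Every tool in this guide — Aider, Continue.dev, Open Interpreter — ultimately sends a request shaped like this.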

Option 2: Local with LM Studio

  1. Download LM Studio from lmstudio.ai
  2. Search for “granite 4.1” in the model library
  3. Download your preferred size
  4. Start the local server (default: http://localhost:1234/v1)

Option 3: Cloud API

For the 30B model without local hardware, use a cloud provider:

  • OpenRouter: ibm-granite/granite-4.1-30b-instruct — pay per token
  • watsonx.ai: IBM’s managed platform — enterprise SLAs available
  • Replicate: On-demand inference — no GPU management

Each provider gives you an OpenAI-compatible endpoint and API key.

Setting up Aider with Granite 4.1

Aider is a terminal-based AI coding assistant that edits files directly in your repository. It works with any OpenAI-compatible API, which makes Granite 4.1 integration straightforward.

Install Aider

pip install aider-chat

Configure for local Ollama

Create or edit ~/.aider.conf.yml:

# Granite 4.1 via Ollama
model: ollama/granite4.1:8b

Or pass it directly:

aider --model ollama/granite4.1:8b

Aider auto-detects Ollama on localhost:11434. No API key needed.
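If Ollama runs on a different machine — a shared GPU box, for example — you can point Aider at it with the OLLAMA_API_BASE environment variable. The hostname below is a placeholder; substitute your own server's address:

```shell
# Hypothetical remote Ollama host -- replace with your server's address
export OLLAMA_API_BASE=http://gpu-box.local:11434
aider --model ollama/granite4.1:8b
```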

Configure for OpenRouter

export OPENROUTER_API_KEY=your-key-here
aider --model openrouter/ibm-granite/granite-4.1-30b-instruct

Or in ~/.aider.conf.yml:

model: openrouter/ibm-granite/granite-4.1-30b-instruct

Configure for a generic OpenAI-compatible endpoint

If you’re running Granite 4.1 via vLLM, LM Studio, or any other OpenAI-compatible server:

export OPENAI_API_BASE=http://localhost:1234/v1
export OPENAI_API_KEY=not-needed
aider --model openai/granite-4.1-8b-instruct

Aider performance tips for Granite 4.1

Use the 8B for most tasks. Granite 4.1 8B scores 87.2 on HumanEval — that’s strong enough for most code editing tasks. The 30B is better for complex multi-file refactors but slower locally.

Enable the editor model pattern. Use the 3B as a fast editor model and the 8B or 30B as the main model:

aider --model ollama/granite4.1:8b --editor-model ollama/granite4.1:3b

The 3B handles simple edits quickly while the 8B tackles complex reasoning.

Set appropriate context. Granite 4.1 8B supports 512K tokens, so you can add many files to context:

aider --model ollama/granite4.1:8b
# Then in Aider:
/add src/**/*.py

Use map-tokens for large repos. Aider’s repo map helps Granite understand your codebase structure:

# ~/.aider.conf.yml
model: ollama/granite4.1:8b
map-tokens: 2048

Setting up Continue.dev with Granite 4.1

Continue.dev is an open-source AI coding assistant that runs inside VS Code and JetBrains. It supports local models through Ollama and remote APIs.

Install Continue.dev

  1. Open VS Code
  2. Go to Extensions (Ctrl+Shift+X)
  3. Search “Continue” and install it, or install from the marketplace as continue.continue

Configure for local Ollama

Edit your Continue config file at ~/.continue/config.json:

{
  "models": [
    {
      "title": "Granite 4.1 8B",
      "provider": "ollama",
      "model": "granite4.1:8b",
      "contextLength": 32768
    }
  ],
  "tabAutocompleteModel": {
    "title": "Granite 4.1 3B",
    "provider": "ollama",
    "model": "granite4.1:3b",
    "contextLength": 4096
  }
}

This configuration uses the 8B for chat/edit tasks and the 3B for fast tab autocomplete.

Configure for OpenRouter

{
  "models": [
    {
      "title": "Granite 4.1 30B",
      "provider": "openrouter",
      "model": "ibm-granite/granite-4.1-30b-instruct",
      "apiKey": "your-openrouter-key"
    }
  ]
}

Configure for a custom OpenAI-compatible endpoint

{
  "models": [
    {
      "title": "Granite 4.1 8B",
      "provider": "openai",
      "model": "granite-4.1-8b-instruct",
      "apiBase": "http://localhost:1234/v1",
      "apiKey": "not-needed"
    }
  ]
}

Configure for watsonx.ai

{
  "models": [
    {
      "title": "Granite 4.1 30B (watsonx)",
      "provider": "openai",
      "model": "ibm/granite-4-1-30b-instruct",
      "apiBase": "https://us-south.ml.cloud.ibm.com/ml/v1/text/generation",
      "apiKey": "your-watsonx-api-key",
      "requestOptions": {
        "headers": {
          "Content-Type": "application/json"
        }
      }
    }
  ]
}

Continue.dev performance tips for Granite 4.1

Use the 3B for autocomplete. Tab completion needs to be fast — under 200ms ideally. Granite 4.1 3B is small enough to deliver near-instant completions while still being accurate (79.27 on HumanEval).

Set context length appropriately. While Granite supports up to 512K tokens, Continue.dev works best with a focused context window. Set contextLength to 32768 for chat and 4096 for autocomplete to keep responses fast.

Enable codebase indexing. Continue.dev can index your codebase for retrieval-augmented generation:

{
  "contextProviders": [
    {
      "name": "codebase",
      "params": {
        "nRetrieve": 15,
        "nFinal": 5
      }
    }
  ]
}

This helps Granite understand your project structure without loading everything into context.

Use slash commands. Continue.dev supports slash commands that work well with Granite’s instruction-following capabilities (IFEval 89.65 for the 30B):

  • /edit — edit selected code
  • /comment — add comments to code
  • /test — generate tests for selected code
  • /fix — fix errors in selected code

Setting up other coding tools

vLLM (high-throughput serving)

For team deployments or production use, vLLM provides optimized inference:

pip install vllm

vllm serve ibm-granite/granite-4.1-8b-instruct \
  --max-model-len 32768 \
  --gpu-memory-utilization 0.9

This serves an OpenAI-compatible API on http://localhost:8000/v1 that any tool can connect to.
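To illustrate what "OpenAI-compatible" means in practice, here's a stdlib-only sketch that assembles (but does not send) a chat completion request against the vLLM endpoint. The helper name and defaults are ours for illustration, not part of vLLM:

```python
import json
import urllib.request

def build_chat_request(base_url, model, prompt, api_key="not-needed"):
    """Assemble an OpenAI-compatible chat completion request (not sent)."""
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/chat/completions",
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )

req = build_chat_request(
    "http://localhost:8000/v1",
    "ibm-granite/granite-4.1-8b-instruct",
    "Explain this traceback.",
)
print(req.full_url)
```

Passing the request to urllib.request.urlopen would send it; any tool that speaks the OpenAI API is doing the same thing under the hood, which is why a single vLLM server can back Aider, Continue.dev, and Cursor simultaneously.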

For the 30B with FP8 quantization:

vllm serve ibm-granite/granite-4.1-30b-instruct \
  --quantization fp8 \
  --max-model-len 32768 \
  --tensor-parallel-size 2

Open Interpreter

pip install open-interpreter

# With Ollama
interpreter --model ollama/granite4.1:8b

# With OpenRouter
export OPENROUTER_API_KEY=your-key
interpreter --model openrouter/ibm-granite/granite-4.1-30b-instruct

Cursor / Windsurf (custom model)

Both Cursor and Windsurf support custom OpenAI-compatible endpoints. Point them at your Ollama, vLLM, or LM Studio server:

  • API Base: http://localhost:11434/v1 (Ollama) or http://localhost:8000/v1 (vLLM)
  • Model: granite4.1:8b
  • API Key: not-needed (for local)

AnythingLLM

AnythingLLM supports Granite 4.1 directly:

  1. Open AnythingLLM settings
  2. Select Ollama as the LLM provider
  3. Choose granite4.1:8b from the model list
  4. Set your workspace preferences

Which Granite 4.1 size for which tool?

| Tool | Recommended size | Why |
| --- | --- | --- |
| Aider (main model) | 8B or 30B | Needs strong code editing; 8B is the sweet spot |
| Aider (editor model) | 3B | Fast simple edits |
| Continue.dev (chat) | 8B | Good balance of quality and speed |
| Continue.dev (autocomplete) | 3B | Speed is critical for tab completion |
| vLLM (team server) | 30B | Maximum quality for shared deployment |
| Open Interpreter | 8B | General-purpose coding tasks |
| Quick prototyping | 3B | Fast iteration, good enough quality |

The 8B is the sweet spot for most individual developer workflows. It scores 87.2 on HumanEval, 80.2 on EvalPlus, and 68.27 on BFCL V3 tool calling — all while fitting in 16 GB of RAM with quantization.

The 30B is worth the hardware cost for team deployments or when you need maximum quality for complex tasks. It leads BFCL V3 at 73.68 and scores 89.63 on HumanEval.

The 3B is ideal for autocomplete and fast editing. At 79.27 on HumanEval, it’s remarkably capable for its size and runs on virtually any hardware.

Performance expectations

Here’s what to expect in terms of speed and quality:

Local inference speed (approximate)

| Model | Hardware | Tokens/sec | Latency feel |
| --- | --- | --- | --- |
| 3B (Q4) | MacBook Air M2 8GB | ~40-60 | Instant |
| 3B (Q4) | RTX 3060 12GB | ~50-70 | Instant |
| 8B (Q4) | MacBook Pro M3 16GB | ~25-40 | Fast |
| 8B (Q4) | RTX 4070 12GB | ~30-50 | Fast |
| 30B (Q4) | Mac Studio M2 Ultra 64GB | ~10-20 | Acceptable |
| 30B (FP8) | A100 80GB | ~30-50 | Fast |

Quality expectations by task

| Task | 3B | 8B | 30B |
| --- | --- | --- | --- |
| Single function generation | Good | Excellent | Excellent |
| Multi-file refactoring | Fair | Good | Excellent |
| Test generation | Good | Excellent | Excellent |
| Bug fixing | Good | Excellent | Excellent |
| API integration / tool calling | Fair | Good | Excellent |
| Code explanation | Fair | Good | Excellent |
| Complex architecture decisions | Poor | Fair | Good |

Troubleshooting common issues

“Model not found” in Aider

Make sure Ollama is running and the model is pulled:

ollama list  # Check available models
ollama pull granite4.1:8b  # Pull if missing

Slow responses in Continue.dev

Reduce contextLength in your config. Large context windows require more memory and compute:

{
  "models": [
    {
      "title": "Granite 4.1 8B",
      "provider": "ollama",
      "model": "granite4.1:8b",
      "contextLength": 16384
    }
  ]
}

Out of memory errors

Switch to a smaller model or enable quantization:

# Use the 3B instead
ollama pull granite4.1:3b

# Or use a quantized variant
ollama pull granite4.1:8b-q4_0

Tool calling not working

Granite 4.1 supports tool calling natively, but the tool needs to format the request correctly. Make sure your tool is sending the tools parameter in the OpenAI-compatible format. Aider and Continue.dev handle this automatically.
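If you're integrating a custom tool, this is the shape the tools parameter needs to take. The get_weather function below is a made-up example, not part of Aider or Continue.dev:

```python
import json

# An OpenAI-format `tools` array. `get_weather` is a hypothetical
# example function used only to show the required schema.
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {"type": "string", "description": "City name"},
                },
                "required": ["city"],
            },
        },
    }
]

# The tools array rides along with the normal chat payload.
payload = {
    "model": "granite4.1:8b",
    "messages": [{"role": "user", "content": "What's the weather in Oslo?"}],
    "tools": tools,
}

print(json.dumps(payload, indent=2))
```

If a tool sends function definitions in some other shape, the model never sees them — which is the most common reason tool calling "doesn't work" with an otherwise healthy setup.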

Connection refused errors

Check that your local server is running:

# For Ollama
curl http://localhost:11434/api/tags

# For vLLM
curl http://localhost:8000/v1/models

# For LM Studio
curl http://localhost:1234/v1/models

Granite 4.1 vs other local coding models in these tools

How does Granite 4.1 compare to other popular choices for Aider and Continue.dev?

| Model | HumanEval | Tool calling | Context | Best for |
| --- | --- | --- | --- | --- |
| Granite 4.1 8B | 87.2 | 68.27 (BFCL V3) | 512K | Tool calling, enterprise |
| Granite 4.1 30B | 89.63 | 73.68 (BFCL V3) | 512K | Maximum coding quality |
| Qwen 3-Coder 30B | ~88 | ~65 | 128K | General coding |
| Gemma 4 27B | ~83 | ~72.7 | 256K | Multimodal coding |
| Phi-4 14B | ~82 | ~60 | 128K | Compact reasoning |

Granite 4.1 8B offers the best coding performance per GB of VRAM. The 30B leads on tool calling. For Aider specifically, Granite’s strong instruction following (IFEval 89.65) translates to reliable code edits — it follows Aider’s edit format consistently.

For more details on the model itself, see our Granite 4.1 complete guide. For API configuration options, check the Granite 4.1 API guide. For Aider setup with other models, see our Aider complete guide, and for Continue.dev, the Continue.dev complete guide.


FAQ

Which Granite 4.1 size should I use with Aider?

The 8B is the best default. It scores 87.2 on HumanEval, fits in 16 GB RAM with quantization, and generates tokens fast enough for interactive use. Use the 30B for complex multi-file refactors if you have the hardware (32+ GB RAM). Use the 3B as an editor model for fast simple edits alongside the 8B as the main model.

Does Granite 4.1 work with Continue.dev autocomplete?

Yes. Use the 3B model for tab autocomplete — it’s fast enough for sub-200ms completions while scoring 79.27 on HumanEval. Configure it as the tabAutocompleteModel in your Continue config with a short context length (4096 tokens) for maximum speed.

Can I use Granite 4.1 with Aider without an API key?

Yes, when running locally through Ollama or LM Studio. No API key is needed — Aider connects directly to the local server. For cloud providers (OpenRouter, watsonx.ai), you’ll need an API key from that provider.

How does Granite 4.1 compare to Qwen 3-Coder in Aider?

Granite 4.1 8B and Qwen 3-Coder are close on raw coding benchmarks, but Granite has two advantages in Aider: stronger instruction following (IFEval 89.65) means it follows Aider’s edit format more reliably, and better tool calling (BFCL V3 68.27) means it handles structured operations better. Granite also offers 512K context vs Qwen’s 128K.

Is the 30B model worth the extra hardware for coding tools?

It depends on your tasks. The 30B scores 89.63 on HumanEval vs 87.2 for the 8B — a modest improvement. Where the 30B shines is tool calling (73.68 vs 68.27 on BFCL V3) and complex multi-file operations. If you’re doing simple function generation and bug fixes, the 8B is sufficient. For complex refactoring, API integration, or team deployments, the 30B justifies the hardware.

Can I run Granite 4.1 and another model simultaneously?

Yes, with Ollama. Ollama can serve multiple models — it loads them on demand and keeps recently used models in memory. You can configure Aider to use Granite 4.1 8B as the main model and a different model (or Granite 3B) as the editor model. Continue.dev also supports multiple model configurations for different tasks.

What’s the minimum hardware for a usable Granite 4.1 coding setup?

A MacBook with 8 GB RAM or any machine with 8 GB VRAM can run the 3B model comfortably. For the 8B (recommended), you need 16 GB RAM/VRAM. A MacBook Pro M2/M3 with 16 GB unified memory runs the 8B well for Aider and Continue.dev. The 30B needs 32+ GB — a Mac Studio or a GPU like the RTX 4090 or A6000.

Related: Granite 4.1 complete guide · Granite 4.1 API guide · Aider complete guide · Continue.dev complete guide