
OpenRouter Rate Limit Fix: 429 Errors and Retry Strategies (2026)


You’re calling the OpenRouter API and getting:

{"error": {"code": 429, "message": "Rate limit exceeded"}}

OpenRouter has multiple rate limit layers: per-account, per-model, and per-provider. Here’s how to handle each one.

Understanding the limits

OpenRouter rate limits come from three sources:

Source         Limit                      Error message
Your account   Based on credits/plan      "Rate limit exceeded"
The model      Per-model request limits   "Model rate limit exceeded"
The provider   Upstream provider limit    "Provider rate limit exceeded"

Check your current limits:

curl -s https://openrouter.ai/api/v1/auth/key \
  -H "Authorization: Bearer $OPENROUTER_API_KEY" | jq
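If you prefer to inspect the limits from Python, the response carries usage figures. A minimal sketch, with an illustrative payload shape; verify the real field names against what the endpoint returns for your key:

```python
import json

# Illustrative response shape only; check the actual /api/v1/auth/key
# payload for your account before relying on these field names
sample = json.dumps({"data": {"usage": 1.25, "limit": 10.0, "is_free_tier": False}})

info = json.loads(sample)["data"]
remaining = info["limit"] - info["usage"]  # credit left before 429s start
print(f"Remaining credit: ${remaining:.2f}")
```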

Fix 1: Add retry with exponential backoff

import asyncio
import os
import random

from openai import OpenAI, RateLimitError

client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key=os.environ["OPENROUTER_API_KEY"],
)

async def call_with_retry(messages, model, max_retries=5):
    for attempt in range(max_retries):
        try:
            # Note: the default OpenAI client is synchronous; switch to
            # AsyncOpenAI if this blocking call matters in your event loop
            return client.chat.completions.create(
                model=model,
                messages=messages,
            )
        except RateLimitError:
            if attempt == max_retries - 1:
                raise  # out of retries; let the caller handle it
            # Exponential backoff with jitter so concurrent clients
            # don't retry in lockstep
            delay = (2 ** attempt) + random.uniform(0, 1)
            print(f"Rate limited. Waiting {delay:.1f}s...")
            await asyncio.sleep(delay)
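The schedule doubles each attempt plus up to a second of jitter, so the total wait before giving up stays bounded. A quick sanity check of those bounds:

```python
import random

def backoff_delay(attempt: int) -> float:
    # Same formula call_with_retry uses: exponential base plus 0-1s jitter
    return (2 ** attempt) + random.uniform(0, 1)

delays = [backoff_delay(a) for a in range(5)]

# With max_retries=5 the loop only sleeps after attempts 0-3 (the final
# attempt re-raises), so the worst case is 1+2+4+8 = 15s plus up to 4s jitter
worst_case = sum(2 ** a + 1 for a in range(4))
```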

Fix 2: Fall back to another model

If one model is rate limited, switch to an equivalent:

FALLBACK_CHAIN = [
    "anthropic/claude-sonnet-4",
    "openai/gpt-5.4",
    "google/gemini-2.5-pro",
    "deepseek/deepseek-chat",
]

async def call_with_fallback(messages):
    for model in FALLBACK_CHAIN:
        try:
            return client.chat.completions.create(model=model, messages=messages)
        except RateLimitError:
            continue  # this model is saturated; try the next one
    raise RuntimeError("All models in the fallback chain are rate limited")
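OpenRouter can also run this fallback server-side: its model-routing feature accepts a models array in place of a single model and tries each entry in order. A sketch of the request body, treating the field name as something to verify against the routing docs:

```python
import json

FALLBACK_CHAIN = [
    "anthropic/claude-sonnet-4",
    "openai/gpt-5.4",
    "google/gemini-2.5-pro",
]

# "models" (plural) replaces "model": the router tries each entry in
# order and moves on when one is rate limited or unavailable
payload = {
    "models": FALLBACK_CHAIN,
    "messages": [{"role": "user", "content": "Hello"}],
}
body = json.dumps(payload)
```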

This is the core value of OpenRouter — one API key, many providers. See our model fallback guide for advanced patterns.

Fix 3: Add credits

The most common cause of 429s on OpenRouter: you ran out of credits.

  1. Go to openrouter.ai/credits
  2. Add credits ($5-50)
  3. Retry

OpenRouter also has free models that don’t require credits:

# Free models (no credits needed)
response = client.chat.completions.create(
    model="qwen/qwen3.6-plus:free",
    messages=messages,
)

Fix 4: Reduce request frequency

If you’re hitting per-minute limits:

import asyncio
import time

# Simple client-side rate limiter
class RateLimiter:
    def __init__(self, max_per_minute=20):
        self.interval = 60.0 / max_per_minute
        self.last_call = 0.0

    async def wait(self):
        # time.monotonic() is immune to system clock adjustments
        elapsed = time.monotonic() - self.last_call
        if elapsed < self.interval:
            await asyncio.sleep(self.interval - elapsed)
        self.last_call = time.monotonic()

limiter = RateLimiter(max_per_minute=20)

async def rate_limited_call(messages, model):
    await limiter.wait()
    return client.chat.completions.create(model=model, messages=messages)
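At max_per_minute=20 the limiter enforces a 3-second gap between calls. A self-contained demo with a much shorter interval so it finishes quickly (the class mirrors the one above):

```python
import asyncio
import time

class RateLimiter:
    def __init__(self, max_per_minute=20):
        self.interval = 60.0 / max_per_minute
        self.last_call = 0.0

    async def wait(self):
        elapsed = time.monotonic() - self.last_call
        if elapsed < self.interval:
            await asyncio.sleep(self.interval - elapsed)
        self.last_call = time.monotonic()

async def demo():
    # 1200/min = one call every 0.05s, so the demo runs fast
    limiter = RateLimiter(max_per_minute=1200)
    start = time.monotonic()
    for _ in range(3):
        await limiter.wait()
    return time.monotonic() - start

elapsed = asyncio.run(demo())
print(f"3 calls took {elapsed:.2f}s")  # first call is free, then two 0.05s gaps
```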

Fix 5: Use caching

Avoid hitting rate limits by caching identical requests:

import hashlib

# In-memory cache; unbounded, so add eviction (e.g. an LRU)
# for long-running services
cache = {}

def cached_call(messages, model):
    key = hashlib.md5(f"{model}:{messages}".encode()).hexdigest()
    if key in cache:
        return cache[key]
    result = client.chat.completions.create(model=model, messages=messages)
    cache[key] = result
    return result
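To see the caching logic work without spending credits, swap in a stub for the API call (fake_completion here is a hypothetical stand-in, not part of any SDK):

```python
import hashlib

cache = {}
calls = {"n": 0}

def fake_completion(model, messages):
    # Stand-in for client.chat.completions.create so the caching
    # logic can be exercised without network access
    calls["n"] += 1
    return {"model": model, "echo": messages[-1]["content"]}

def cached_call(messages, model):
    key = hashlib.md5(f"{model}:{messages}".encode()).hexdigest()
    if key not in cache:
        cache[key] = fake_completion(model, messages)
    return cache[key]

msgs = [{"role": "user", "content": "hi"}]
first = cached_call(msgs, "test-model")
second = cached_call(msgs, "test-model")  # served from the cache
```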

For production caching, see our AI agent cost management guide.

Related: OpenRouter Complete Guide · OpenRouter as Model Fallback · AI Agent Error Handling · AI Agent Cost Management · AI Coding Tools Pricing