You’re calling the OpenRouter API and getting:
```json
{"error": {"code": 429, "message": "Rate limit exceeded"}}
```
OpenRouter has multiple rate limit layers: per-account, per-model, and per-provider. Here’s how to handle each one.
## Understanding the limits
OpenRouter rate limits come from three sources:
| Source | Limit | Error message |
|---|---|---|
| Your account | Based on credits/plan | "Rate limit exceeded" |
| The model | Per-model request limits | "Model rate limit exceeded" |
| The provider | Upstream provider limit | "Provider rate limit exceeded" |
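Which layer you hit determines the right recovery: account-level limits call for backoff or more credits, while model- and provider-level limits can be dodged by switching models. A minimal sketch of routing on the error message (the `classify_429` helper is illustrative, and the message strings are taken from the table above):

```python
def classify_429(message: str) -> str:
    """Map a 429 error message to a recovery strategy."""
    msg = message.lower()
    if "provider" in msg:
        return "switch-provider"  # upstream provider throttled: try another provider/model
    if "model" in msg:
        return "switch-model"  # this model is throttled: fall back to an equivalent
    return "backoff-or-add-credits"  # account-level limit: retry later or top up
```
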
Check your current limits:
```bash
curl -s https://openrouter.ai/api/v1/auth/key \
  -H "Authorization: Bearer $OPENROUTER_API_KEY" | jq
```
## Fix 1: Add retry with exponential backoff
```python
import asyncio
import os
import random

from openai import OpenAI, RateLimitError

client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key=os.environ["OPENROUTER_API_KEY"],
)

async def call_with_retry(messages, model, max_retries=5):
    for attempt in range(max_retries):
        try:
            return client.chat.completions.create(
                model=model,
                messages=messages,
            )
        except RateLimitError:
            if attempt == max_retries - 1:
                raise
            # Exponential backoff with jitter: ~1-2s, ~2-3s, ~4-5s, ...
            delay = (2 ** attempt) + random.uniform(0, 1)
            print(f"Rate limited. Waiting {delay:.1f}s...")
            await asyncio.sleep(delay)
```
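The delay schedule is worth factoring out so it can be unit-tested on its own, and capping it so late retries stay bounded. A sketch (the `backoff_delay` helper is not part of any library, just the formula above with a cap added):

```python
import random

def backoff_delay(attempt: int, base: float = 2.0, cap: float = 60.0) -> float:
    """Exponential backoff with jitter, capped so late retries stay bounded."""
    return min(cap, base ** attempt) + random.uniform(0, 1)

# Attempt 0 waits ~1-2s, attempt 3 waits ~8-9s, attempt 10 is capped at ~60-61s.
```
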
## Fix 2: Fall back to another model
If one model is rate limited, switch to an equivalent one:
```python
FALLBACK_CHAIN = [
    "anthropic/claude-sonnet-4",
    "openai/gpt-5.4",
    "google/gemini-2.5-pro",
    "deepseek/deepseek-chat",
]

async def call_with_fallback(messages):
    for model in FALLBACK_CHAIN:
        try:
            return client.chat.completions.create(model=model, messages=messages)
        except RateLimitError:
            continue
    raise RuntimeError("All models rate limited")
```
This is the core value of OpenRouter: one API key, many providers. See our model fallback guide for advanced patterns.
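Retry and fallback compose: try each model a couple of times before moving down the chain. A provider-agnostic sketch of that control flow, written against plain callables so it is easy to test (the names here are illustrative, not OpenRouter API):

```python
class RateLimited(Exception):
    """Stand-in for openai.RateLimitError in this sketch."""

def call_with_retry_then_fallback(models, call, retries_per_model=2):
    """Try each model up to `retries_per_model` times before falling back."""
    for model in models:
        for _ in range(retries_per_model):
            try:
                return call(model)
            except RateLimited:
                continue  # retry the same model; after the loop, fall to the next
    raise RateLimited("All models rate limited")

# Simulate: the first model is always throttled, the second succeeds.
calls = []
def fake_call(model):
    calls.append(model)
    if model == "a":
        raise RateLimited()
    return f"ok:{model}"

result = call_with_retry_then_fallback(["a", "b"], fake_call)
```

Passing `call` as a parameter keeps the retry/fallback logic separate from the HTTP client, which is what makes it testable without the network.
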
## Fix 3: Add credits
The most common cause of 429s on OpenRouter: you ran out of credits.
- Go to openrouter.ai/credits
- Add credits ($5-50)
- Retry
OpenRouter also has free models that don’t require credits:
```python
# Free models (no credits needed)
response = client.chat.completions.create(
    model="qwen/qwen3.6-plus:free",
    messages=messages,
)
```
## Fix 4: Reduce request frequency
If you’re hitting per-minute limits:
```python
import asyncio
import time

class RateLimiter:
    """Simple rate limiter: enforces a minimum interval between calls."""

    def __init__(self, max_per_minute=20):
        self.interval = 60.0 / max_per_minute
        self.last_call = 0.0

    async def wait(self):
        now = time.time()
        elapsed = now - self.last_call
        if elapsed < self.interval:
            await asyncio.sleep(self.interval - elapsed)
        self.last_call = time.time()

limiter = RateLimiter(max_per_minute=20)

async def rate_limited_call(messages, model):
    await limiter.wait()
    return client.chat.completions.create(model=model, messages=messages)
```
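One caveat: under concurrent tasks this limiter can let two callers through at once, because both may read `last_call` before either updates it. Guarding the check-and-update with an `asyncio.Lock` closes that gap (a sketch; `SafeRateLimiter` is an illustrative name, not a library class):

```python
import asyncio
import time

class SafeRateLimiter:
    """Rate limiter safe for concurrent asyncio tasks."""

    def __init__(self, max_per_minute=20):
        self.interval = 60.0 / max_per_minute
        self.last_call = 0.0
        self.lock = asyncio.Lock()

    async def wait(self):
        async with self.lock:  # only one task checks/updates last_call at a time
            now = time.time()
            elapsed = now - self.last_call
            if elapsed < self.interval:
                await asyncio.sleep(self.interval - elapsed)
            self.last_call = time.time()
```
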
## Fix 5: Use caching
Avoid hitting rate limits by caching identical requests:
```python
import hashlib

cache = {}

def cached_call(messages, model):
    # Hash model + messages to get a stable cache key
    key = hashlib.md5(f"{model}:{messages}".encode()).hexdigest()
    if key in cache:
        return cache[key]
    result = client.chat.completions.create(model=model, messages=messages)
    cache[key] = result
    return result
```
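The cache above never expires entries, which is fine for short scripts but leaks memory and serves stale answers in a long-running service. A minimal TTL variant (a sketch, not a production cache; `cached_call_ttl` takes the underlying API function as a parameter so it can be tested without the network):

```python
import hashlib
import time

_cache = {}

def cached_call_ttl(messages, model, call, ttl=300.0):
    """Cache results for `ttl` seconds; `call` is the underlying API function."""
    key = hashlib.md5(f"{model}:{messages}".encode()).hexdigest()
    hit = _cache.get(key)
    if hit is not None and time.time() - hit[0] < ttl:
        return hit[1]  # fresh cached result
    result = call(messages, model)
    _cache[key] = (time.time(), result)
    return result
```

In real use you would pass something like `lambda m, mod: client.chat.completions.create(model=mod, messages=m)` as `call`.
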
For production caching, see our AI agent cost management guide.
Related: OpenRouter Complete Guide · OpenRouter as Model Fallback · AI Agent Error Handling · AI Agent Cost Management · AI Coding Tools Pricing