πŸ“ Tutorials
Β· 3 min read
Last updated on

What is Rate Limiting? A Simple Explanation for Developers


Rate limiting means restricting how many requests a user or client can make to your API within a time window.

Example: β€œ100 requests per minute per user.” Request #101 gets rejected with a 429 Too Many Requests status code.

Why rate limit?

  • Prevent abuse β€” stop bots from hammering your API
  • Protect your server β€” one user shouldn’t be able to crash your service
  • Fair usage β€” ensure all users get a fair share of resources
  • Cost control β€” if you pay per API call (database, AI models), rate limiting caps your bill
  • Security β€” slow down brute-force login attempts

Common strategies

Fixed window

Count requests in fixed time blocks (e.g., per minute). Simple to implement, but it allows bursts of up to twice the limit at window boundaries — e.g., 100 requests at 0:59 and another 100 at 1:01.

Minute 1: 0-60s β†’ 100 requests allowed
Minute 2: 60-120s β†’ 100 requests allowed
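The fixed-window idea fits in a few lines of Python. This is a minimal in-memory, single-process sketch; the class and method names are illustrative, not from any library:

```python
import time

class FixedWindowLimiter:
    """Allow at most `limit` requests per user per `window_s`-second block."""

    def __init__(self, limit, window_s=60):
        self.limit = limit
        self.window_s = window_s
        self.counts = {}  # (user, window index) -> request count

    def allow(self, user, now=None):
        now = time.time() if now is None else now
        window = int(now // self.window_s)  # e.g. minute number since epoch
        key = (user, window)
        self.counts[key] = self.counts.get(key, 0) + 1
        return self.counts[key] <= self.limit
```

The counter resets implicitly: a new window index means a fresh dictionary key, so the old count simply stops mattering.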

Sliding window

Smooths out the fixed-window boundary problem by counting requests over a rolling period that ends at the current moment, rather than in fixed blocks.
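One common variant is the sliding window log: keep a timestamp per accepted request and drop entries older than the window before checking the count. A sketch (in-memory, illustrative names):

```python
import time
from collections import deque

class SlidingWindowLimiter:
    """Allow at most `limit` requests in any rolling `window_s`-second period."""

    def __init__(self, limit, window_s=60):
        self.limit = limit
        self.window_s = window_s
        self.timestamps = {}  # user -> deque of accepted request times

    def allow(self, user, now=None):
        now = time.time() if now is None else now
        log = self.timestamps.setdefault(user, deque())
        # Drop requests that have aged out of the rolling window
        while log and log[0] <= now - self.window_s:
            log.popleft()
        if len(log) >= self.limit:
            return False
        log.append(now)
        return True
```

Storing a timestamp per request costs more memory than a single counter, which is why production systems often approximate this with a weighted average of two fixed windows instead.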

Token bucket

Each client gains tokens at a steady rate, up to a maximum bucket size. Each request costs a token; a request arriving when the bucket is empty is rejected (or made to wait). This allows short bursts while enforcing an average rate.
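A minimal token-bucket sketch, assuming a single process and an illustrative class name (real implementations also handle locking and per-user buckets):

```python
import time

class TokenBucket:
    """Refill `rate` tokens per second up to `capacity`; each request costs one."""

    def __init__(self, rate, capacity):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity  # bucket starts full, so an initial burst is allowed
        self.last = 0.0

    def allow(self, now=None):
        now = time.time() if now is None else now
        # Add tokens earned since the last request, capped at the bucket size
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

The capacity controls the burst size and the rate controls the long-run average — two knobs the fixed and sliding windows conflate into one.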

How to implement it

Express.js (Node.js)

import express from 'express';
import rateLimit from 'express-rate-limit';

const app = express();

const limiter = rateLimit({
  windowMs: 60 * 1000, // 1 minute
  max: 100, // 100 requests per minute per IP (the option is named `limit` in v7+)
  message: { error: 'Too many requests, try again later' },
});

app.use('/api/', limiter);

Nginx

http {
    # 10 MB shared memory zone keyed by client IP, refilled at 10 requests/second
    limit_req_zone $binary_remote_addr zone=api:10m rate=10r/s;

    server {
        location /api/ {
            # Allow bursts of up to 20 excess requests; reject the rest
            limit_req zone=api burst=20 nodelay;
            limit_req_status 429;  # Nginx rejects with 503 by default
        }
    }
}

Python (Flask)

from flask import Flask
from flask_limiter import Limiter
from flask_limiter.util import get_remote_address

app = Flask(__name__)

# Flask-Limiter 3.x takes the key function as the first argument
limiter = Limiter(get_remote_address, app=app, default_limits=["100 per minute"])

@app.route("/api/data")
@limiter.limit("10 per second")
def get_data():
    return {"data": "..."}

Rate limit headers

Most APIs report your rate limit status in response headers (the X-RateLimit-* names are a widely used convention rather than a formal standard):

X-RateLimit-Limit: 100        # Max requests allowed
X-RateLimit-Remaining: 42     # Requests left in this window
X-RateLimit-Reset: 1710590400 # When the window resets (Unix timestamp)
Retry-After: 30               # Seconds to wait (on 429 responses)
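Emitting these yourself is straightforward. A sketch of building the header dict for a response — rate_limit_headers is a hypothetical helper, not a library function:

```python
def rate_limit_headers(limit, used, reset_at, retry_after=None):
    """Build response headers describing the caller's rate limit status."""
    headers = {
        "X-RateLimit-Limit": str(limit),
        "X-RateLimit-Remaining": str(max(0, limit - used)),
        "X-RateLimit-Reset": str(reset_at),  # Unix timestamp of the window reset
    }
    if retry_after is not None:  # only sent on 429 responses
        headers["Retry-After"] = str(retry_after)
    return headers
```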

As an API consumer

If you’re calling someone else’s API and hitting rate limits:

  • Respect Retry-After β€” wait the specified time before retrying
  • Add exponential backoff β€” wait 1s, then 2s, then 4s, etc.
  • Cache responses β€” don’t re-fetch data you already have
  • Batch requests β€” combine multiple calls into one where possible
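The first two points combine naturally: honor Retry-After when the server sends it, and fall back to exponential backoff with a little jitter otherwise. A sketch, with backoff_delay as an illustrative helper:

```python
import random

def backoff_delay(attempt, retry_after=None, base=1.0, cap=60.0):
    """Seconds to wait before retry number `attempt` (0-based).

    Uses the server's Retry-After when given; otherwise exponential
    backoff (1s, 2s, 4s, ...) plus a small random jitter, capped.
    """
    if retry_after is not None:
        return float(retry_after)
    # Jitter spreads out retries so many clients don't hammer the API in sync
    return min(cap, base * (2 ** attempt)) + random.uniform(0, 0.1)
```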

FAQ

Where should I implement rate limiting β€” application or infrastructure level?

Both. Use infrastructure-level rate limiting (Nginx, Cloudflare, API gateway) as your first line of defense to block obvious abuse before it hits your app. Add application-level rate limiting for more granular control β€” per-user limits, endpoint-specific rules, or limits based on subscription tier.

How do I rate limit in a distributed system with multiple servers?

Use a shared store like Redis to track request counts across all your servers. Libraries like rate-limiter-flexible (Node.js) support Redis as a backend, ensuring consistent counting regardless of which server handles the request. Without a shared store, each server tracks independently and users can multiply their limit by the number of servers.
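A sketch of the fixed-window variant against a shared store. The allow_request function is illustrative; redis-py's client does provide incr and expire with these shapes, though a production setup would make the two calls atomic (e.g., with a Lua script or a pipeline):

```python
def allow_request(store, user, limit=100, window_s=60):
    """Fixed-window check against a shared store (e.g. a redis-py client).

    `store` needs Redis-style incr(key) and expire(key, seconds). Because
    the count lives in one place, every app server sees the same total.
    """
    key = f"ratelimit:{user}"
    count = store.incr(key)  # atomic increment, returns the new value
    if count == 1:
        store.expire(key, window_s)  # start the window on the first request
    return count <= limit
```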

What’s a good default rate limit for a public API?

A common starting point is 100 requests per minute for authenticated users and 20 per minute for unauthenticated requests. Adjust based on your use case β€” read-heavy endpoints can be more generous, while write endpoints (creating resources, sending emails) should be stricter. Always include rate limit headers so consumers can self-regulate.
