Problem
Design a rate limiter that works across multiple application server nodes behind a load balancer. The rate limiter must enforce global rate limits (e.g., 100 requests per minute per API key) regardless of which server handles the request.
Requirements
- Global Rate Limiting: Enforce a single rate limit across all server nodes. If the limit is 100 req/min, the total requests from a client across all nodes must not exceed 100.
- Multiple Algorithms: Support fixed-window, sliding-window, and token-bucket rate limiting.
- Tiered Limits: Different API keys may have different rate limits (free: 60/min, pro: 600/min, enterprise: 6000/min).
- Burst Handling: Allow short bursts above the sustained rate (e.g., token bucket with burst capacity).
- Response Headers: Return standard rate limit headers (X-RateLimit-Limit, X-RateLimit-Remaining, X-RateLimit-Reset).
- Graceful Degradation: If the centralized rate limit store is unavailable, the system should degrade gracefully (e.g., fail open or fall back to per-node limits) rather than blocking all requests.
Constraints
- 20 application server nodes behind a load balancer.
- Combined throughput: 50,000 requests per second.
- Rate limit check must add less than 2ms of latency (p99).
- Rate limit state should not consume more than 10 GB of memory.
- The system must handle clock skew between nodes (up to 100ms).
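A quick back-of-envelope check shows the constraints above are workable. The per-key state size is an assumption (not given in the problem): roughly 128 bytes covers a key string, counters, and timestamps for a token-bucket entry.

```python
# Back-of-envelope check against the stated constraints.
# Assumption (not from the problem statement): ~128 bytes of rate limit
# state per API key (key string, counters, timestamps).
bytes_per_key = 128
memory_budget = 10 * 1024**3                 # 10 GB memory constraint
max_tracked_keys = memory_budget // bytes_per_key
print(max_tracked_keys)                      # → 83886080 (~84M keys fit)

# Per-node load if the load balancer spreads traffic evenly.
total_rps, nodes = 50_000, 20
print(total_rps // nodes)                    # → 2500 req/s per node
```

So even tens of millions of active API keys fit within the 10 GB budget, and each node sees only a few thousand rate limit checks per second.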
What to Design
- The algorithm for each rate limiting strategy (fixed window, sliding window, token bucket)
- How state is synchronized across nodes
- The data structure in the centralized store
- Your graceful degradation strategy when the store is unavailable
- How you handle race conditions in concurrent rate limit checks