Why IP-Based Rate Limiting Falls Short
The simplest rate limit implementation keys on client IP address: allow N requests per minute from any given IP, block or throttle the rest. This is trivially defeated by anyone with access to more than one IP address, which describes essentially every motivated attacker.
Residential Proxy Networks
Residential proxy services route traffic through real consumer ISP addresses. Requests from these proxies appear to originate from genuine home broadband customers across hundreds of thousands of IP addresses globally. Each individual IP stays well below typical rate limits while collectively achieving arbitrarily high aggregate request rates.
Services like Bright Data, Oxylabs, and their grey-market equivalents sell access by the GB transferred. A $50 budget buys enough residential proxy bandwidth to brute-force a 6-digit OTP space in hours, each attempt from a different IP, against a limit of 5 attempts per IP per hour.
Cloud IP Rotation
AWS, GCP, and Azure provide vast pools of egress IPs. A Lambda function or Cloud Run container gets a new IP on each cold start. An attacker can trigger thousands of cold starts to use thousands of IPs, or explicitly release and reallocate Elastic IPs. This is less convincing than residential proxies (datacenter CIDR blocks are identifiable), but costs are extremely low and the IP pool is essentially unlimited.
IP-only rate limiting is a starting point, not a solution. Any limit that can be bypassed by acquiring a different IP will be bypassed. Effective rate limiting requires additional identity signals beyond the network address.
Header Spoofing: Fooling Proxy-Aware Implementations
Many applications sit behind reverse proxies, load balancers, or CDNs. To get the real client IP, they read headers like X-Forwarded-For, X-Real-IP, or CF-Connecting-IP. If the application trusts these headers without verifying they come from a trusted upstream, attackers can set arbitrary values.
The X-Forwarded-For Trust Problem
X-Forwarded-For is a list header. Each proxy appends the IP it received the request from. A correctly configured chain looks like:
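An illustration with documentation-range addresses: the client is 198.51.100.7, the CDN edge is 203.0.113.10, and each proxy appends the peer address it accepted the connection from:

```
# honest request: client -> CDN -> internal load balancer -> app
X-Forwarded-For: 198.51.100.7, 203.0.113.10

# forged request: the client pre-populates the header before the proxies append
X-Forwarded-For: 192.0.2.66, 198.51.100.7, 203.0.113.10
```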
The application should read the rightmost IP that it didn't add itself: the one appended by its own trusted upstream proxy. Reading the leftmost value (client-supplied) is the vulnerability.
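A minimal sketch of the rightmost-untrusted rule (the trusted set of literal proxy IPs is an assumption; production code usually matches CIDR ranges rather than exact addresses):

```python
def client_ip(xff: str, trusted_proxies: set[str]) -> str:
    """Return the rightmost X-Forwarded-For hop that is not one of our own proxies."""
    hops = [h.strip() for h in xff.split(",")]
    for ip in reversed(hops):
        if ip not in trusted_proxies:
            return ip           # first untrusted address from the right: the real client
    return hops[0]              # degenerate case: every hop was a trusted proxy

# The forged leftmost value never wins; the proxy-appended hop does.
client_ip("192.0.2.66, 198.51.100.7, 203.0.113.10", {"203.0.113.10"})  # -> "198.51.100.7"
```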
Headers Commonly Abused
- X-Forwarded-For: the most common. Applications should only trust the last value added by a known-good proxy.
- X-Real-IP: single-value header set by an nginx upstream. If the application trusts it regardless of source, an attacker can override it.
- True-Client-IP: Akamai's header. Applications fronted by Akamai that only accept this header from Akamai's edge are protected; applications that read it without verifying it came through Akamai are not.
- X-Cluster-Client-IP: used by some load balancers. Same issue.
Parameter Manipulation to Evade Limits
Some rate limiting implementations key on request parameters (user ID, email, or username) in addition to or instead of IP. If the keying logic has inconsistencies, attackers can vary parameter encoding to appear as different identities.
Email Normalisation Gaps
A login rate limit keyed on email address can be bypassed if the application doesn't normalise emails before rate-limit lookups. RFC 5321 specifies that the local part of an email is case-sensitive, but in practice Gmail, Outlook, and most providers are case-insensitive and ignore dots in the local part:
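For instance (addresses hypothetical), every string below delivers to one Gmail mailbox, yet a limiter keyed on the raw value sees five unrelated identities:

```python
# All of these route to the same Gmail inbox, but a limiter keyed on the
# raw string treats them as five separate accounts (addresses hypothetical).
variants = [
    "victim@gmail.com",
    "Victim@gmail.com",
    "v.ictim@gmail.com",
    "victim+a@gmail.com",
    "victim+b@gmail.com",
]
print(len(set(variants)))  # 5 distinct rate-limit buckets for one account
```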
Unicode Normalisation
Similar issues arise with Unicode equivalents. Usernames containing characters with multiple Unicode representations (e.g., precomposed vs. decomposed forms, fullwidth vs. halfwidth) can bypass string-equality checks used as rate limit keys. The application authenticates using normalised identity, but the rate limit key uses the raw string.
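A quick illustration with Python's unicodedata, comparing precomposed, decomposed, and fullwidth forms:

```python
import unicodedata

precomposed = "jos\u00e9"    # 'é' as one code point (NFC form)
decomposed = "jose\u0301"    # 'e' plus a combining acute accent (NFD form)
fullwidth = "\uff41dmin"     # fullwidth 'a' followed by ASCII "dmin"

print(precomposed == decomposed)                      # False: two raw keys, one identity
print(unicodedata.normalize("NFKC", precomposed)
      == unicodedata.normalize("NFKC", decomposed))   # True after normalisation
print(unicodedata.normalize("NFKC", fullwidth))       # admin
```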
HTTP Method and Path Variations
Some rate limiters track by (IP, path) tuple but don't account for method variation on the same logical endpoint, or path encoding differences:
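For instance, each of these request lines may reach the same login handler while landing in a different (IP, path) bucket (paths illustrative):

```
POST /api/login
POST /api/login/          trailing slash
POST /api//login          duplicate slash
POST /api/%6Cogin         percent-encoded 'l'
GET  /api/login           different method, same handler if the route allows both
```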
Nginx's limit_req module, for example, keys on whatever variable the zone is configured with; if that's $request_uri, the key includes the query string. An attacker appending ?_=1, ?_=2, etc. gets a fresh bucket for each value unless query parameters are stripped from the key.
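A hedged sketch of both keying choices (zone names and rates illustrative):

```nginx
# Vulnerable: $request_uri includes the query string, so ?_=1, ?_=2, ...
# each get their own bucket.
limit_req_zone $binary_remote_addr$request_uri zone=per_uri:10m rate=5r/m;

# Safer: $uri is the path with the query string already stripped.
limit_req_zone $binary_remote_addr$uri zone=per_path:10m rate=5r/m;
```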
Distributed Abuse: Staying Below the Threshold
A more sophisticated approach doesn't try to bypass rate limits at all: it respects them, but uses enough parallel identities that the aggregate impact is still damaging.
Credential Stuffing at Scale
A credential stuffing attack using a list of 10 million breached credentials doesn't need to hammer a single endpoint. With 10,000 IPs from a residential proxy pool, each IP sends one request per hour (safely below a 5/hour limit). That's 240,000 login attempts per day across the IP pool, all within rate limits for every individual IP.
Slowloris for Rate Limit Exhaustion
Some rate limit implementations track in-flight requests rather than completed requests. Keeping connections open (slow reading, slow sending) occupies request slots without triggering the per-minute completion-based counter. A small number of connections can exhaust per-IP concurrent connection limits while the rate counter stays at zero.
Account-Level Aggregation
For APIs that require authentication, an attacker who controls many accounts (created with fake email addresses, purchased, or compromised) can distribute requests across accounts. Per-account limits are respected; total impact is unlimited.
This is the primary attack surface for data scraping against APIs that require authentication. Each account extracts small amounts of data per rate window; thousands of accounts extract the entire dataset.
Rate Limiting Algorithm Flaws
Even well-intentioned implementations using the right signals can have algorithmic vulnerabilities.
Fixed Window Race Condition
The fixed window algorithm (allow N requests per minute, reset the counter at the top of each minute) has a well-known burst vulnerability. An attacker can send N requests at 11:59:59, then N more at 12:00:01, getting 2N requests in a 2-second window, with both windows technically within limit:
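A minimal in-memory simulation of the burst (limits and timestamps illustrative):

```python
import collections

class FixedWindowLimiter:
    """Fixed window: at most `limit` requests per window; the counter
    resets at each window boundary (names illustrative)."""
    def __init__(self, limit, window_seconds):
        self.limit = limit
        self.window = window_seconds
        self.counts = collections.Counter()

    def allow(self, key, now):
        bucket = (key, int(now // self.window))   # new window index = fresh counter
        if self.counts[bucket] >= self.limit:
            return False
        self.counts[bucket] += 1
        return True

limiter = FixedWindowLimiter(limit=100, window_seconds=60)
# 100 requests just before the minute boundary, 100 more just after
# (timestamps in seconds; 719_999 and 720_001 fall in adjacent windows):
burst1 = sum(limiter.allow("attacker", 719_999) for _ in range(100))
burst2 = sum(limiter.allow("attacker", 720_001) for _ in range(100))
print(burst1 + burst2)  # 200 accepted within two seconds of wall-clock time
```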
The sliding window log and sliding window counter algorithms eliminate this by tracking request timestamps rather than resetting at fixed intervals. Sliding window counter approximates the sliding log at much lower memory cost and is the recommended approach for most use cases.
Token Bucket Timing Leak
A correctly implemented token bucket refills at a fixed rate. But the response behaviour, specifically whether a 429 is returned immediately or after a delay, can leak information about the current bucket state. An attacker who times the gap between a rate-limited response and the next successful response can determine the refill rate and predict exactly when the next token will be available, allowing precise timing of high-value requests.
Leaky Bucket and Queue Amplification
The leaky bucket algorithm queues excess requests and processes them at a fixed rate. An attacker who fills the queue with cheap requests can cause expensive requests from legitimate users to queue behind them, effectively a denial of service without triggering any rate limit violation. The limit says "process 10 requests per second"; the attacker submits 10 cheap requests per second indefinitely, occupying all processing slots.
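A toy queue model of the amplification (capacity and names illustrative):

```python
from collections import deque

class LeakyBucket:
    """Leaky bucket: excess requests queue up and drain at a fixed rate
    (capacity and names illustrative)."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.queue = deque()

    def submit(self, request):
        if len(self.queue) >= self.capacity:
            return False              # queue full: request shed
        self.queue.append(request)
        return True

    def drain_one(self):
        # Called by the worker at the fixed processing rate.
        return self.queue.popleft() if self.queue else None

bucket = LeakyBucket(capacity=10)
for i in range(10):
    bucket.submit(("attacker", i))    # cheap requests occupy every slot
accepted = bucket.submit(("victim", 0))
print(accepted)  # False: the legitimate request is shed, no limit "violated"
```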
Account-Level and Token-Level Bypasses
API Key Sharing
Developer APIs that rate-limit by API key are vulnerable to key sharing. If the key-to-account mapping is not enforced (e.g., you can share a key across multiple services), a single paid account's key can be used from arbitrarily many locations simultaneously, multiplying effective throughput without triggering per-key limits that assume single-client use.
OAuth Token Proliferation
Some OAuth implementations allow unlimited token generation. An attacker with a single account can generate thousands of valid access tokens and rotate through them, spreading requests across token-keyed rate limit buckets. If the limit is per-token rather than per-account, this bypasses it trivially.
Shared Infrastructure Limits
When rate limits apply at the infrastructure level (e.g., a CDN or API gateway rule), but the actual limit key is per-upstream-IP rather than per-client, multiple clients behind a corporate NAT or a large university share a limit pool. A single abusive user behind a NAT blocks all other users at the same IP. This is a denial of service via rate limit exhaustion, not a bypass, but it has the same effect of disrupting service.
Building Rate Limits That Hold
Multi-Signal Identity
Effective rate limiting combines multiple signals to form a rate limit key that can't be rotated cheaply. The key should include some subset of:
- Authenticated user or account ID (when available; the most reliable signal)
- Normalised client IP (after correct proxy header handling)
- Fingerprinted device characteristics (TLS fingerprint, HTTP/2 settings, User-Agent)
- Session token or API key
Unauthenticated endpoints (login, registration, password reset) can't use user ID, so they need IP combined with device fingerprinting and ideally challenge-response (CAPTCHA, proof-of-work) for high-sensitivity operations.
Sliding Window Counter in Redis
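An in-memory sketch of the algorithm, with a dict standing in for Redis: the get/increment pattern maps onto Redis GET and INCR with an EXPIRE of two windows (names, limits, and the dual-key helper are illustrative):

```python
import time

class SlidingWindowCounter:
    """Sliding window counter over two adjacent fixed windows."""
    def __init__(self, limit, window_seconds, store=None):
        self.limit = limit
        self.window = window_seconds
        self.store = store if store is not None else {}  # Redis in production

    def allow(self, key, now=None):
        now = time.time() if now is None else now
        idx = int(now // self.window)
        elapsed_frac = (now % self.window) / self.window
        prev = self.store.get(f"{key}:{idx - 1}", 0)
        curr = self.store.get(f"{key}:{idx}", 0)
        # Weighted estimate: the current window plus the overlapping tail
        # of the previous window.
        estimated = curr + prev * (1 - elapsed_frac)
        if estimated >= self.limit:
            return False
        self.store[f"{key}:{idx}"] = curr + 1  # Redis: INCR + EXPIRE 2*window
        return True

def allow_login(ip_limiter, email_limiter, ip, email, now=None):
    """Dual-key check: the attempt must fit both the per-IP and per-email
    budgets. (Simplified: a request rejected on the email key has already
    consumed IP quota; a production version would check before spending.)"""
    return ip_limiter.allow(f"ip:{ip}", now) and email_limiter.allow(f"email:{email}", now)
```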
The dual-key approach is important: an IP limit stops distributed accounts from the same IP, and an email limit stops distributed IPs targeting the same account. Both must be satisfied.
Normalisation Before Keying
Rate limit keys must use normalised forms. For email: lowercase, strip subaddressing (+tag), strip dots for known providers. For paths: resolve URL encoding, strip query strings unless they're semantically meaningful.
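A minimal email normaliser along these lines, with an illustrative provider list:

```python
# Providers known to ignore dots in the local part (illustrative, not exhaustive).
DOT_INSENSITIVE = frozenset({"gmail.com", "googlemail.com"})

def normalize_email(email: str) -> str:
    """Canonicalise an email address for use as a rate-limit key."""
    local, _, domain = email.strip().lower().partition("@")
    local = local.split("+", 1)[0]        # strip subaddressing (+tag)
    if domain in DOT_INSENSITIVE:
        local = local.replace(".", "")    # these providers ignore local-part dots
    return f"{local}@{domain}"

normalize_email("V.ictim+promo@Gmail.com")  # -> "victim@gmail.com"
```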
Progressive Penalties
A hard cutoff at N requests is binary and frustrating for legitimate users who occasionally spike. Progressive backoff is more user-friendly and harder to game:
- 0-5 requests: no restriction
- 6-10 requests: 1-second artificial delay added to response
- 11-20 requests: CAPTCHA challenge before proceeding
- 21+ requests: hard block for 15 minutes
The delay step is important: it burns time for automated tools without blocking legitimate users, who merely notice a slowdown. A bot that ignores the slowdown still wastes wall-clock time, making the attack less economical.
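The tiers above can be sketched as a simple mapping (thresholds taken from the list; tier names and the return shape are assumptions):

```python
def penalty(recent_requests: int) -> tuple[str, float]:
    """Map a rolling request count to a progressive penalty tier."""
    if recent_requests <= 5:
        return ("allow", 0)
    if recent_requests <= 10:
        return ("delay", 1.0)        # add 1 second of artificial latency
    if recent_requests <= 20:
        return ("captcha", 0)        # require a challenge before proceeding
    return ("block", 15 * 60)        # hard block for 15 minutes

penalty(7)  # -> ("delay", 1.0)
```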
Detection and Response
Signals That Indicate Bypass Attempts
- X-Forwarded-For with many unique values from a single autonomous system number: a consistent ASN suggests a single actor with IP rotation.
- Near-limit request rates across many IPs: legitimate users show Poisson-distributed request rates; attackers tuned to stay just under a threshold show suspiciously regular spacing.
- Identical request bodies or payloads from different IPs: legitimate variation is expected; a credential list being tried shows structural similarity across "different" clients.
- Low TLS fingerprint diversity despite high IP diversity: traffic through residential proxies often comes from a single underlying HTTP library, producing identical TLS ClientHellos from many IPs.
TLS Fingerprinting as a Signal
JA3 (or JA4, its successor) fingerprints the TLS ClientHello by hashing the TLS version, cipher suites, extensions, elliptic curves, and point formats. Legitimate users across different machines show diverse fingerprints. Bots using a single HTTP library show identical fingerprints regardless of source IP.
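A sketch of the JA3 construction, with illustrative ClientHello field values:

```python
import hashlib

def ja3(version, ciphers, extensions, curves, point_formats):
    """Build the JA3 string: five comma-separated fields, values
    dash-separated, then MD5 the result."""
    fields = [
        str(version),
        "-".join(map(str, ciphers)),
        "-".join(map(str, extensions)),
        "-".join(map(str, curves)),
        "-".join(map(str, point_formats)),
    ]
    return hashlib.md5(",".join(fields).encode()).hexdigest()

# Two "different" clients built on the same HTTP library present identical
# ClientHellos, so their fingerprints match regardless of source IP.
a = ja3(771, [4865, 4866, 4867], [0, 11, 10, 35], [29, 23, 24], [0])
b = ja3(771, [4865, 4866, 4867], [0, 11, 10, 35], [29, 23, 24], [0])
print(a == b)  # True
```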
The attacker with a million IPs is a harder problem than the one without. But the attacker with a million IPs and a single HTTP library fingerprint is identifiable. The goal is not to make bypass impossible: it's to make it expensive enough that the economics don't work out.
Observability Requirements
Rate limiting is only useful if you can see it working. At minimum, instrument:
- 429 rate by endpoint, key type (IP vs. user vs. email), and time
- Requests that are delayed vs. blocked: delays are early warning signals
- Top-N rate-limited keys by request count: these are your current attackers
- False positive rate: legitimate users hitting limits due to misconfiguration
A rate limiting system with no observability is a black box. You won't know if it's working, if the limits are too tight for legitimate users, or if an ongoing attack is succeeding by staying under the thresholds.