System design: rate limiter
Like a lot of system design exercises, most engineers don’t actually build rate limiters at work. Most companies rely on off-the-shelf solutions they’re already using—Cloudflare, NGINX, etc. Still, rate limiting is a foundational concept for web developers and a classic system design problem.
The ByteByteGo write-up is helpful, but I don’t retain much by just reading. Instead, after getting comfortable with the high-level ideas, I used Claude to build a toy rate limiter.
It was surprisingly satisfying to run curl locally and watch the headers change as requests were consumed:
me@MacBookPro ~> curl -i localhost:3000/ping
...
x-ratelimit-limit: 10
x-ratelimit-remaining: 9
x-ratelimit-reset: 1768250604
...
{"message":"pong","timestamp":"2026-01-12T20:43:18Z"}⏎
me@MacBookPro ~> curl -i localhost:3000/ping
...
x-ratelimit-limit: 10
x-ratelimit-remaining: 8
x-ratelimit-reset: 1768250610
...
{"message":"pong","timestamp":"2026-01-12T20:43:20Z"}⏎
And finally:
me@MacBookPro ~> curl -i localhost:3000/ping
...
x-ratelimit-limit: 10
x-ratelimit-remaining: 0
x-ratelimit-reset: 1768250721
retry-after: 55
...
{"error":"Rate limit exceeded"}⏎
I implemented the rate limiter as a Rails middleware using an in-memory token bucket. At its core, it’s just a hash with some bookkeeping:
module RateLimiter
module Stores
class Memory < Base
def initialize
@data = {}
@mutex = Mutex.new
end
def get(key)
@mutex.synchronize do
entry = @data[key]
return nil unless entry
if entry[:expires_at] && Time.now > entry[:expires_at]
@data.delete(key)
return nil
end
entry[:value]
end
end
...
The mutex caught me off guard. Multithreading is easy to overlook in an exercise like this.
me@MacBookPro ~/R/rate-limiter-tester> rails server -p 3000
=> Booting Puma
=> Rails 8.1.1 application starting in development
=> Run `bin/rails server --help` for more startup options
Puma starting in single mode...
* Puma version: 7.1.0 ("Neon Witch")
* Ruby version: ruby 4.0.0 (2025-12-25 revision 553f1675f3) +PRISM [arm64-darwin25]
* Min threads: 3
* Max threads: 3
* Environment: development
* PID: 60042
* Listening on http://127.0.0.1:3000
* Listening on http://[::1]:3000
Use Ctrl-C to stop
Even in development, Puma is running multiple threads. Without a Mutex, it’s easy to introduce subtle race conditions and inconsistent token counts under load.
None of this is groundbreaking. You can absorb it by carefully reading an article. But implementing a toy version forces the details to click. When it’s reasonable, I try to build small, working versions of the systems I’m learning—it’s consistently the fastest way I’ve found to make the details stick.