Slow Dispatch


Measure before you cache


Every 'make it faster' project I've ever joined has had the same first hour. Here's what I've learned about resisting the urge to add a Redis.


Every “make it faster” project I’ve ever joined has had the same first hour: someone confidently identifies the bottleneck, someone confidently disagrees, and we end up arguing about caching strategies for two days before anyone produces a profile.

The actual order of operations is the opposite. You measure first. You decide whether to cache only after you have evidence. The reason this is unintuitive — the reason it took me years of doing it backwards — is that caching is fun and measurement is boring.

The actual order of operations

  1. Measure the thing. Production-shaped traffic, production-shaped data, production-shaped concurrency. Lab numbers lie. [1]
  2. Look at where the wall time is. Not where you think it is. Where the flame graph says it is.
  3. Then decide if caching is the right tool. Often it isn’t. Often the answer is “remove this redundant call entirely,” or “this is N+1 and has been since 2019.”

I was on the Northwind edge-auth project where we knew the auth round-trip was the bottleneck because we measured it. We measured it because the existing dashboard had been describing the round-trip as “expected baseline” for three years. The dashboard was right about the number and wrong about the framing. Once we framed it as “an unnecessary round-trip we pay every request,” the design was obvious.

If we had cached harder at the application layer first — the intuitive move — we would have made things faster by maybe 10%, declared victory, and shipped the same architectural problem to next year’s on-call rotation.

Why we skip step one

Because step one is boring. Caching is fun. Caching has a satisfying mental model: store thing, retrieve thing, request faster. You can ship a cache in an afternoon and feel like you accomplished something. Profiling takes patience and produces information that is sometimes inconvenient.

There is also a class of engineer (I have been this engineer) for whom the cache is an aesthetic preference. They want to see an “if (cached) return cached” early-return at the top of every hot function. Whether that early-return is actually pulling its weight matters less than the visceral sense of having reduced the amount of work the system has to do.

This is fine right up until the cache invalidation problem catches up to you. Which it always does.
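One way to check whether an early-return is pulling its weight is to count hits and misses. The sketch below uses Python's standard `functools.lru_cache`, which exposes those counters through `cache_info()`. The `lookup_permissions` function and the traffic pattern are made up for illustration; the assumption is simply that most keys are seen once.

```python
from functools import lru_cache


@lru_cache(maxsize=1024)
def lookup_permissions(user_id):
    # Hypothetical stand-in for an expensive lookup.
    return f"permissions-for-{user_id}"


def hit_rate(cached_fn):
    """Fraction of calls served from the cache, or 0.0 if never called."""
    info = cached_fn.cache_info()
    total = info.hits + info.misses
    return info.hits / total if total else 0.0


# Invented traffic: mostly unique keys, so the cache rarely helps.
for user_id in [1, 2, 3, 4, 5, 6, 7, 1]:
    lookup_permissions(user_id)

print(lookup_permissions.cache_info())
print(f"hit rate: {hit_rate(lookup_permissions):.1%}")
```

A cache with a low hit rate still carries its full invalidation cost, so this number is worth looking at before the early-return becomes permanent.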

A heuristic that has served me well

If you cannot describe, in one sentence, what specific operation the cache is replacing — not “the slow part,” but “this exact lookup that costs N ms and is called M times per request” — you are not ready to cache. Go measure.

If the sentence ends with “and we’d save approximately X ms across approximately Y requests per day,” you have a budget. You can compare it to the cost of the cache (memory, complexity, invalidation, the half-day-a-quarter someone will spend debugging stale reads). You can decide.
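The budget sentence can be turned into arithmetic. This is a sketch under stated assumptions: the function name, the input numbers, and the expected hit rate are all hypothetical, and the hit rate is an extra term added here because a cache only saves time on the calls it actually serves.

```python
def daily_savings_ms(cost_ms, calls_per_request, requests_per_day, expected_hit_rate):
    """Estimated milliseconds saved per day by caching one specific lookup.

    All inputs should come from measurement; the values used below are invented.
    """
    return cost_ms * calls_per_request * requests_per_day * expected_hit_rate


# "This exact lookup costs 4 ms and is called 3 times per request."
savings_ms = daily_savings_ms(
    cost_ms=4.0,
    calls_per_request=3,
    requests_per_day=2_000_000,
    expected_hit_rate=0.75,  # assumption: three in four calls hit the cache
)
per_request_ms = savings_ms / 2_000_000

print(f"saved per request: {per_request_ms:.1f} ms")
print(f"saved per day: {savings_ms / 1000 / 3600:.1f} hours of wall time")
```

With a figure like this in hand, the comparison against memory, complexity, and invalidation cost becomes a trade you can argue about with numbers instead of instincts.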

If the sentence is “we should cache the slow part,” you are about to cause a pager incident in nine months, and the post-mortem is going to call out a load-bearing TTL that no one remembers setting.

Footnotes

  1. Lab numbers especially lie when the lab is your laptop. Your laptop has a hot CPU cache, a warm filesystem cache, no contention, and a network that is always ten milliseconds away. Production has none of those things.