Measure before you cache
Published March 14, 2026
A short essay on the most expensive engineering mistake I have made repeatedly: assuming I knew where the time was going.
- #performance
- #pragmatism
Every “make it faster” project I have ever joined has had the same first hour: someone confidently identifies the bottleneck, someone confidently disagrees, and we end up arguing about caching strategies for two days before anyone produces a profile.
The actual order of operations
- Measure the thing. Production-shaped traffic, production-shaped data, production-shaped concurrency. Lab numbers lie.
- Look at where the wall time is. Not where you think it is. Where the flame graph says it is.
- Then decide if caching is the right tool. Often it isn’t. Often the answer is “remove this redundant call entirely,” or “this is N+1 and has been since 2019.”
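The steps above can be sketched in a few lines. This is a minimal illustration, not anyone's real service: `handle_request` and `fetch_user` are hypothetical stand-ins, with a sleep in place of a real query, deliberately shaped like the N+1 pattern so the profile can point at it.

```python
import cProfile
import io
import pstats
import time

def fetch_user(user_id):
    # Hypothetical per-item lookup; the sleep stands in for a query round-trip.
    time.sleep(0.002)
    return {"id": user_id}

def handle_request(user_ids):
    # The classic N+1 shape: one lookup per item instead of one batched call.
    return [fetch_user(uid) for uid in user_ids]

profiler = cProfile.Profile()
profiler.enable()
handle_request(range(50))
profiler.disable()

# Let the profile, not intuition, name the hot spot: sort by cumulative
# time and print the top entries. fetch_user shows up with 50 calls.
out = io.StringIO()
pstats.Stats(profiler, stream=out).sort_stats("cumulative").print_stats(5)
print(out.getvalue())
```

The point of the sketch is the workflow, not the numbers: the profile makes the “remove this redundant call entirely” option visible before caching ever comes up.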
Why we skip step one
Because step one is boring. Caching is fun. Caching has a satisfying mental model: store thing, retrieve thing, request faster. You can ship a cache in an afternoon and feel like you accomplished something. Profiling takes patience and produces information that is sometimes inconvenient.
The team I worked with on the Northwind edge-auth project knew the auth round-trip was the bottleneck because we measured it. We measured it precisely because the existing dashboard had spent three years describing the round-trip as “expected baseline.” The dashboard was right about the number and wrong about the framing. Once we reframed it as “an unnecessary round-trip we pay on every request,” the design was obvious.
If we had cached harder at the application layer first — the intuitive move — we would have made things faster by maybe 10%, declared victory, and shipped the same architectural problem to next year’s on-call rotation.
A heuristic that has served me well
If you cannot describe, in one sentence, what specific operation the cache is replacing — not “the slow part,” but “this exact lookup” — you are not ready to cache. Go measure.
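If you can write that sentence, it translates directly into code. A minimal sketch with Python's `functools.lru_cache`, where `get_tenant_config` is a hypothetical lookup standing in for the exact operation the cache replaces:

```python
from functools import lru_cache

# The one-sentence description this cache passes: "it replaces the
# get_tenant_config(tenant_id) database read" — a specific lookup,
# not "the slow part".
@lru_cache(maxsize=1024)
def get_tenant_config(tenant_id: str) -> dict:
    # Hypothetical stand-in for the exact database read being replaced.
    return {"tenant": tenant_id, "feature_flags": []}

get_tenant_config("acme")              # miss: performs the lookup
get_tenant_config("acme")              # hit: served from the cache
print(get_tenant_config.cache_info())  # hits=1, misses=1
```

Note that the decorator sits on one named operation with one named key; if you cannot say which function and which argument, there is nothing yet to decorate.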