Measure before you cache
Published March 14, 2026
A short essay on the most expensive engineering mistake I have made repeatedly: assuming I knew where the time was going.
- #performance
- #pragmatism
Every “make it faster” project I have ever joined has had the same first hour: someone confidently identifies the bottleneck, someone confidently disagrees, and we end up arguing about caching strategies for two days before anyone produces a profile.
The actual order of operations
- Measure the thing. Production-shaped traffic, production-shaped data, production-shaped concurrency. Lab numbers lie.
- Look at where the wall time is. Not where you think it is. Where the flame graph says it is.
- Then decide if caching is the right tool. Often it isn’t. Often the answer is “remove this redundant call entirely,” or “this is N+1 and has been since 2019.”
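The steps above can be sketched in a few lines. This is a minimal illustration, not anyone's real service: `handle_request` and `fetch_user` are hypothetical stand-ins, with a sleep in place of a real query, deliberately shaped like the N+1 pattern so the profile can point at it.

```python
import cProfile
import io
import pstats
import time

def fetch_user(user_id):
    # Hypothetical per-item lookup; the sleep stands in for a query round-trip.
    time.sleep(0.002)
    return {"id": user_id}

def handle_request(user_ids):
    # The classic N+1 shape: one lookup per item instead of one batched call.
    return [fetch_user(uid) for uid in user_ids]

profiler = cProfile.Profile()
profiler.enable()
handle_request(range(50))
profiler.disable()

# Let the profile, not intuition, name the hot spot: sort by cumulative
# time and print the top entries. fetch_user shows up with 50 calls.
out = io.StringIO()
pstats.Stats(profiler, stream=out).sort_stats("cumulative").print_stats(5)
print(out.getvalue())
```

The point of the sketch is the workflow, not the numbers: the profile makes the “remove this redundant call entirely” option visible before caching ever comes up.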
Why we skip step one
Because step one is boring. Caching is fun. Caching has a satisfying mental model: store thing, retrieve thing, request faster. You can ship a cache in an afternoon and feel like you accomplished something. Profiling takes patience and produces information that is sometimes inconvenient.
The team I worked with on the Northwind edge-auth project knew the auth round-trip was the bottleneck because we measured it. We measured it precisely because the existing dashboard had spent three years describing the round-trip as “expected baseline.” The dashboard was right about the number and wrong about the framing. Once we reframed it as “an unnecessary round-trip we pay on every request,” the design was obvious.
If we had cached harder at the application layer first — the intuitive move — we would have made things faster by maybe 10%, declared victory, and shipped the same architectural problem to next year’s on-call rotation.
A heuristic that has served me well
If you cannot describe, in one sentence, what specific operation the cache is replacing — not “the slow part,” but “this exact lookup” — you are not ready to cache. Go measure.
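If you can write that sentence, it translates directly into code. A minimal sketch with Python's `functools.lru_cache`, where `get_tenant_config` is a hypothetical lookup standing in for the exact operation the cache replaces:

```python
from functools import lru_cache

# The one-sentence description this cache passes: "it replaces the
# get_tenant_config(tenant_id) database read" — a specific lookup,
# not "the slow part".
@lru_cache(maxsize=1024)
def get_tenant_config(tenant_id: str) -> dict:
    # Hypothetical stand-in for the exact database read being replaced.
    return {"tenant": tenant_id, "feature_flags": []}

get_tenant_config("acme")              # miss: performs the lookup
get_tenant_config("acme")              # hit: served from the cache
print(get_tenant_config.cache_info())  # hits=1, misses=1
```

Note that the decorator sits on one named operation with one named key; if you cannot say which function and which argument, there is nothing yet to decorate.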