jordan.dev

Cutting p95 latency 40% by moving auth to the edge

Moved session validation and per-request authorization to a Cloudflare Worker tier in front of the origin. Same correctness contract; far less work for the application layer.

Role: Senior engineer, infrastructure
Year: 2022
Where: Northwind Studios
Period: Q1 2022 — Q2 2022

Problem

Our auth middleware ran in the application layer. Every request paid an unnecessary 80–120ms round-trip even when the answer was already in cache. The observability dashboard called it "expected baseline." It was not.

Outcome

Authenticated p95 latency 312ms → 187ms. p99 cut by 38%. Origin egress dropped 22% as a side effect.

Stack

  • Go
  • Cloudflare Workers
  • Redis
  • PostgreSQL
  • OpenTelemetry
  • Terraform

The number nobody wanted to look at

Authenticated requests had a steady-state p95 of 312 milliseconds. The internal narrative was “yeah, that’s just what the auth check costs.” The dashboard had a baseline annotation explaining this. The annotation had been there for three years.

When I started pulling threads, the real cost was that every single authenticated request — including ones we already had cached, including ones whose session token we’d validated 8 milliseconds ago for the same user — went through the full middleware stack at the application layer. The session lookup hit Redis in the same data center, but the round-trip from the load balancer to the app pod and back was the actual long pole.

The constraint

We could not change the auth contract. Anything that flipped a single behavior — token expiry semantics, idle timeout, two-factor escalation — would break partner integrations on a six-month deprecation timeline. So whatever we built had to be byte-for-byte compatible with the existing application-layer middleware. No “while we’re here” cleanups. Boring on purpose.

Where the work went

The win came from moving the common case out of the application layer entirely. Cloudflare Workers run before the request ever reaches origin. If the worker can validate the session against a KV-replicated session store, it produces a signed assertion that the origin verifies and then uses to skip the redundant work. If the worker can’t decide — token expired, escalation required, anything ambiguous — the request falls through to the existing middleware unchanged.

Three things made this defensible:

  • The fallback path is the source of truth. Workers can be wrong; origin must not be. Workers produce assertions origin verifies. They never produce assertions origin trusts blindly.
  • Cache invalidation is a single broadcast. When a session is revoked, we invalidate by user ID, not by token. A revoked user becomes uncacheable for 30 seconds while the broadcast settles. We are okay with a 30-second window for a security event because the application layer still re-validates anything risky.
  • We rolled out behind a per-request shadow flag for two weeks. The worker decided what it would have decided; the application layer still decided for real. We compared 60 million requests. Found three correctness bugs. Shipped only after those were fixed.
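The shadow comparison in the last bullet can be sketched as a small Go harness. The `Decision` type and both decider signatures are hypothetical — the post doesn’t show the real API — but the invariant it encodes is the one described above: the edge deferring (`FallThrough`) is never a mismatch, while any disagreement on a decided request is a correctness bug to investigate.

```go
package main

import "fmt"

// Decision is what either tier would do with a request.
type Decision int

const (
	Allow Decision = iota
	Deny
	FallThrough // edge declines to decide; origin middleware handles it
)

// shadowCompare runs both deciders over the same requests. Only origin's
// answer would take effect in production; edge answers are recorded and
// disagreements collected for review.
func shadowCompare(reqs []string, edge, origin func(string) Decision) (mismatches []string) {
	for _, r := range reqs {
		e, o := edge(r), origin(r)
		// Deferring is always safe, so FallThrough never counts as a mismatch.
		if e != FallThrough && e != o {
			mismatches = append(mismatches, r)
		}
	}
	return mismatches
}

func main() {
	edge := func(r string) Decision {
		if r == "expired-token" {
			return FallThrough
		}
		return Allow // "bad-cache-entry" wrongly allowed: the kind of bug shadowing catches
	}
	origin := func(r string) Decision {
		if r == "bad-cache-entry" {
			return Deny
		}
		return Allow
	}
	m := shadowCompare([]string{"ok", "expired-token", "bad-cache-entry"}, edge, origin)
	fmt.Println("mismatches:", m)
}
```

Run over real traffic instead of three literals, this is the loop that compared 60 million requests and surfaced the three bugs before the worker was allowed to decide anything for real.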

What surprised me

Origin egress dropped 22%. That wasn’t the goal — the goal was latency — but caching session assertions at the edge meant fewer requests were even reaching origin. The savings showed up on the next AWS bill and paid for the project.

What I’d do differently

I’d shadow-test for four weeks, not two. Two of the three correctness bugs we found were in the first week. The third surfaced on day 11. I have no confidence there wasn’t a fourth on day 22 that we shipped past.

Concrete outcome

  • Authenticated p95: 312ms → 187ms (40% reduction).
  • Authenticated p99: −38%.
  • Origin egress: −22% (unbudgeted bonus).
  • Zero customer-visible incidents during rollout.
  • Auth contract unchanged. Partner integrations unaffected.