#BackendSystems · Jun 2, 2026 · 5 min read

Measuring Latency Honestly

Why averages flatter the truth, what percentiles fix, and how to keep a fast metric from quietly becoming a wrong one.

A while ago a dashboard told me everything was fine. The average response time was healthy, the line was flat, and nobody was complaining loudly — and yet a real slice of users were having a slow, frustrating time. The average wasn't lying, exactly. It was just telling a comfortable story.

That is the trouble with a single number. An average folds the best and the worst experiences into one figure, and the worst ones are the experiences people actually remember. A system can keep a flat average while the slow tail quietly gets slower.

So when I think about measuring latency, or almost anything else users feel in a product, the first job isn't computing a number. It's choosing a number that tells the truth.

Why the average lies

An average is dominated by the bulk of your data. If most requests are fast, a small number of very slow ones barely move it. But that small number is not small to the people living in it: it's the user whose page took five seconds, the request that timed out, the session someone abandoned.

The dangerous part is that the average can stay still while reality gets worse. Your mean can hold at a reassuring number while the slowest few percent of requests double, because a modest speedup in the fast majority is enough to cancel them out in the arithmetic. Nothing on the dashboard moves, and yet more people are having a bad time. A metric that can hide a regression isn't really measuring the thing you care about.
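
To make that concrete, here is a small sketch with made-up numbers. Everything in it is an assumption for illustration: the latencies, the 95/5 split between fast and slow requests, and the `percentile` helper, which uses the simple nearest-rank definition rather than whatever your metrics system computes.

```python
import math
from statistics import mean


def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample with at least p% of samples at or below it."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


# Made-up latencies in milliseconds, 1000 requests in each period.
before = [100] * 950 + [400] * 50   # 95% fast, 5% slow
after = [80] * 950 + [800] * 50     # fast requests got a bit faster, slow ones doubled

for label, samples in (("before", before), ("after", after)):
    print(
        label,
        "mean:", mean(samples),
        "p50:", percentile(samples, 50),
        "p99:", percentile(samples, 99),
    )
```

In this toy data the mean goes from 115 ms to 116 ms, which nobody would notice on a chart, while the p99 goes from 400 ms to 800 ms.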

Percentiles, and the traps around them

The honest alternative is to report the distribution — percentiles like p90 or p95, the experience at the slow edge rather than the middle. "Ninety percent of requests finished under X" is a sentence about real users, not about arithmetic.

Percentiles come with their own traps, though, and getting them wrong quietly reintroduces the lie:

  • You can't average percentiles. Combining the p90 of yesterday and the p90 of today doesn't give you the p90 of both days. Aggregating across time or buckets has to be done deliberately, from the underlying samples or counts and weighted by how much traffic each period saw, or the summarized number drifts away from the real one.
  • An empty period isn't a zero. If a time bucket has no data, skipping it or treating it as zero bends the trend line. A gap should read as "nothing happened here," not as a dip that never occurred, so empty periods get filled with an explicit no-data point rather than dropped.
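
Both traps fit in a few lines. This is a rough sketch, not how any particular metrics store does it: the sample data is invented, `percentile` is a plain nearest-rank helper, and `fill_gaps` with its `None` marker is a hypothetical way of representing "no data" that I am assuming for the example.

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample with at least p% of samples at or below it."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


# Trap 1: averaging percentiles. Invented latencies in milliseconds.
quiet_day = [100] * 9 + [200]         # 10 requests, almost all fast
busy_day = [100] * 80 + [900] * 20    # 100 requests, a fifth of them slow

averaged_p90 = (percentile(quiet_day, 90) + percentile(busy_day, 90)) / 2
merged_p90 = percentile(quiet_day + busy_day, 90)  # computed from all samples

print("average of daily p90s:", averaged_p90)
print("p90 over both days:   ", merged_p90)


# Trap 2: empty buckets. Emit every expected bucket and mark the empty
# ones explicitly, instead of dropping them or writing a zero.
def fill_gaps(series, expected_buckets):
    return [(bucket, series.get(bucket)) for bucket in expected_buckets]


hourly_p90 = {0: 120, 1: 130, 3: 125}  # hour 2 saw no requests
print(fill_gaps(hourly_p90, range(4)))
```

Here the average of the two daily p90s comes out at 500 ms, while the p90 over all the requests is 900 ms, because the quiet day gets the same vote as the busy one. In the second half, hour 2 stays in the series as `None`, so a chart can draw a gap instead of a dip.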

An average is the story the data tells about itself. A percentile is closer to what people actually felt.

Fast versus correct

Honest measurement runs into a practical wall: the moment you ask these questions over a long date range, the queries get slow and heavy. The usual fix is pre-aggregation — rollup tables that store the answer ahead of time so a big range loads instantly.

But a rollup is a second copy of the truth, and second copies go stale. A rollup can be empty for a period that hasn't been computed yet, or out of date after the underlying data changed. If a dashboard serves that rollup blindly, it shows a confident number that happens to be wrong — the worst kind.

The integrity move is a fallback: when a rollup is missing or stale, fall back to the raw query. It's slower, but it's correct. Speed is allowed to be an optimisation; it is never allowed to become a quiet lie.
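
As a minimal sketch of that fallback: the `Rollup` shape, the `read_p90` function, and the staleness rule (a rollup computed before the underlying data last changed counts as stale) are all assumptions I am making for the example. A real system would pick its own freshness check.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Callable, Optional, Tuple


@dataclass
class Rollup:
    """A precomputed answer plus the time it was computed (hypothetical shape)."""
    p90_ms: float
    computed_at: datetime


def read_p90(
    rollup: Optional[Rollup],
    source_updated_at: datetime,
    raw_query: Callable[[], float],
) -> Tuple[float, str]:
    """Serve the rollup only if it exists and is not older than the last data change."""
    missing = rollup is None
    stale = rollup is not None and rollup.computed_at < source_updated_at
    if missing or stale:
        return raw_query(), "raw"       # slower, but correct
    return rollup.p90_ms, "rollup"      # fast path


def slow_raw_query() -> float:
    # Stand-in for the expensive query over the raw rows.
    return 480.0


last_change = datetime(2026, 6, 1, 12, 0)
fresh = Rollup(p90_ms=480.0, computed_at=datetime(2026, 6, 1, 13, 0))
stale = Rollup(p90_ms=310.0, computed_at=datetime(2026, 6, 1, 9, 0))

print(read_p90(fresh, last_change, slow_raw_query))
print(read_p90(stale, last_change, slow_raw_query))
print(read_p90(None, last_change, slow_raw_query))
```

Returning the source next to the value is a deliberate choice in this sketch: it lets a dashboard show, or at least log, when it had to take the slow path.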

The lesson

A metric is an interface to reality, and like any interface it can be honest or flattering. Measuring well is mostly refusing the comfortable version: report the tail, not just the mean; aggregate percentiles carefully instead of averaging them; treat a gap as a gap; and make sure a fast answer is never silently a wrong one. The goal isn't a prettier dashboard. It's a number you can trust enough to act on.