External Notes On Monitoring
High-level takeaways:
● Alerts are not logs: optimise for a false-positive rate of zero, even if that means accepting a nonzero false-negative rate
● Measure the work being performed that affects business value (e.g. CAPS: capacity, availability, performance, scalability)
● Consider the best type of metric for the job: gauge, counter, meter, histogram, timer
● Use OODA (observe, orient, decide, act) to build a culture of observability
○ You can’t improve what you can’t measure
○ You can’t measure what you can’t observe
○ Decide what you care about and how to measure it
○ Find a way to get the data
● Aim to deliver knowledge, not just information. Start with a hypothesis and ask the graphs the question; don't go to a graph and then ask what the numbers mean
● Avoid fixed, static thresholds; tune them against the system's observed behaviour instead of picking a number once
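As a rough illustration of how the five metric types in the takeaways differ, here is a minimal, in-process sketch. The class and method names are hypothetical and not any particular library's API (the five type names do match the vocabulary of libraries such as Dropwizard Metrics). A production library would also handle thread safety, bounded sample reservoirs and exponentially weighted rates, all of which are omitted here.

```python
import math
import time
from typing import Any, Callable, List


class Gauge:
    """Instantaneous value read on demand (e.g. current queue depth)."""

    def __init__(self, read: Callable[[], float]) -> None:
        self._read = read

    def value(self) -> float:
        return self._read()


class Counter:
    """Running count that can go up or down (e.g. active connections)."""

    def __init__(self) -> None:
        self.count = 0

    def inc(self, n: int = 1) -> None:
        self.count += n

    def dec(self, n: int = 1) -> None:
        self.count -= n


class Meter:
    """Rate of events over time; this sketch only tracks the mean rate since creation."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._start = clock()
        self.count = 0

    def mark(self, n: int = 1) -> None:
        self.count += n

    def mean_rate(self) -> float:
        elapsed = self._clock() - self._start
        return self.count / elapsed if elapsed > 0 else 0.0


class Histogram:
    """Distribution of values; keeps every sample (real libraries use bounded reservoirs)."""

    def __init__(self) -> None:
        self._values: List[float] = []

    def update(self, value: float) -> None:
        self._values.append(value)

    def percentile(self, p: float) -> float:
        """Nearest-rank percentile, p in (0, 100]."""
        if not self._values:
            raise ValueError("no samples")
        ordered = sorted(self._values)
        rank = max(1, math.ceil(p * len(ordered) / 100))
        return ordered[rank - 1]


class Timer:
    """A histogram of durations plus a meter of how often the timed code runs."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self.durations = Histogram()
        self.calls = Meter(clock)

    def time(self, fn: Callable[..., Any], *args: Any, **kwargs: Any) -> Any:
        start = self._clock()
        try:
            return fn(*args, **kwargs)
        finally:
            # Record the duration and the call even if fn raises.
            self.durations.update(self._clock() - start)
            self.calls.mark()
```

Injecting the clock keeps the rate and duration arithmetic testable without real sleeps.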
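The first and last takeaways (alerts tuned for zero false positives, and no static thresholds) can be combined in one possible alerting rule, sketched below under stated assumptions: the threshold is the baseline mean plus `k` standard deviations, and the alert fires only after several consecutive breaches. The rule and every name in it (`DynamicThresholdAlert`, `window`, `k`, `required_breaches`, `min_baseline`) are illustrative choices, not something these notes prescribe.

```python
from collections import deque
from statistics import mean, pstdev
from typing import Deque


class DynamicThresholdAlert:
    """Hypothetical alert rule: fire only after `required_breaches` consecutive
    samples exceed a baseline-derived threshold (mean + k * std deviation)."""

    def __init__(self, window: int = 60, k: float = 3.0,
                 required_breaches: int = 3, min_baseline: int = 10) -> None:
        self.history: Deque[float] = deque(maxlen=window)  # rolling baseline
        self.k = k
        self.required_breaches = required_breaches
        self.min_baseline = min_baseline
        self.streak = 0

    def threshold(self) -> float:
        # A perfectly flat baseline gives pstdev == 0, so any rise breaches;
        # a minimum absolute margin may be worth adding in practice.
        return mean(self.history) + self.k * pstdev(self.history)

    def observe(self, value: float) -> bool:
        """Record one sample; return True if the alert should fire."""
        if len(self.history) < self.min_baseline:
            # Not enough history yet to judge what "normal" looks like.
            self.history.append(value)
            return False
        if value > self.threshold():
            # Breaching samples are kept out of the baseline so a sustained
            # problem cannot raise its own threshold.
            self.streak += 1
        else:
            self.streak = 0
            self.history.append(value)
        return self.streak >= self.required_breaches
```

Requiring consecutive breaches deliberately trades detection delay, and missed short-lived spikes (false negatives), for fewer pages on one-off noise (false positives). Because breaching samples never enter the baseline, a genuine and permanent change in behaviour keeps alerting until the baseline is rebuilt, for example by constructing a new instance.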
Monitoring challenges
● Measuring business value
○ Customer happiness
■ Time to value
■ Availability
■ Response time
○ Cost efficiency
■ Utilisation
■ Optimisation
■ Automation
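Several of the business-value measures listed above reduce to simple arithmetic once the raw data is available. The sketch below assumes a hypothetical `Request` record (a success flag plus latency in milliseconds) and hypothetical function names. It computes request-based availability (one common definition; time-based uptime is another), a nearest-rank latency percentile for response time, and utilisation as busy time over provisioned capacity.

```python
import math
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Request:
    # Hypothetical record shape; real data would come from access logs or tracing.
    succeeded: bool
    latency_ms: float


def availability(requests: Sequence[Request]) -> float:
    """Fraction of requests that succeeded (request-based availability)."""
    if not requests:
        raise ValueError("no requests")
    return sum(r.succeeded for r in requests) / len(requests)


def latency_percentile(requests: Sequence[Request], p: float) -> float:
    """Nearest-rank percentile of latency in milliseconds, p in (0, 100]."""
    ordered = sorted(r.latency_ms for r in requests)
    if not ordered:
        raise ValueError("no requests")
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


def utilisation(busy_seconds: float, capacity_seconds: float) -> float:
    """Share of provisioned capacity actually used over the same period."""
    if capacity_seconds <= 0:
        raise ValueError("capacity must be positive")
    return busy_seconds / capacity_seconds
```

Response time is reported as a percentile rather than a mean because a handful of very slow requests can hide behind a healthy average.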