Availability Management Worksheet
As the figure shows, the response time of the IT organization and any external contractors is one of the factors
determining the downtime. As this factor can be controlled by the IT organization and directly affects the service
quality, agreements about it can be included in the SLA.
The measurements can be averaged to give a good impression of the relevant factors. The averages can be used
to determine the achieved service levels, and to estimate the expected future availability of a service.
This information can also be used to develop improvement plans.
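
As a minimal sketch of how averaged measurements might be turned into an achieved service level, the Python
snippet below assumes that the downtime of each incident is recorded in minutes and that an agreed service time
is known for the reporting period. The function names, the sample figures, and the availability formula (agreed
service time minus downtime, divided by agreed service time) are illustrative assumptions, not taken from this
worksheet.

```python
from statistics import mean


def average_downtime(downtimes_min):
    """Average downtime per incident, in minutes."""
    return mean(downtimes_min)


def achieved_availability(agreed_service_min, downtimes_min):
    """Fraction of the agreed service time during which the service was up."""
    total_downtime = sum(downtimes_min)
    return (agreed_service_min - total_downtime) / agreed_service_min


# Hypothetical measurements: four incidents in a 30-day, 24x7 reporting period.
downtimes = [30, 45, 15, 90]
agreed = 30 * 24 * 60  # agreed service time in minutes

avg = average_downtime(downtimes)
availability = achieved_availability(agreed, downtimes)
print(f"average downtime: {avg} min")
print(f"achieved availability: {availability:.2%}")
```

The achieved figure could then be compared with the availability target agreed in the SLA, and the average
downtime used as a rough input when estimating future availability.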
The following metrics are commonly used in Availability Management:
- Mean Time to Repair (MTTR): average time between the occurrence of a fault and service recovery, also
  known as the downtime. It is the sum of the detection time and the resolution time. This metric relates to
  the recoverability and serviceability of the service.
- Mean Time Between Failures (MTBF): mean time between the recovery from one incident and the
  occurrence of the next incident, also known as uptime. This metric relates to the reliability of the service.
- Mean Time Between System Incidents (MTBSI): mean time between the occurrence of two consecutive
  incidents. The MTBSI is the sum of the MTTR and MTBF.
The ratio of the MTBF to the MTBSI indicates whether there are many minor faults or just a few major faults.
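
The metric definitions above can be sketched in Python as follows. This is an illustration under stated
assumptions: each incident is recorded as a hypothetical (fault occurred, service recovered) pair, expressed in
hours since the start of measurement; incidents are sorted and do not overlap; and the function and variable
names are invented for the example. The averages are taken over complete incident cycles (from one fault to the
next), so the last incident only closes the final cycle and its own downtime is not counted.

```python
from statistics import mean


def availability_metrics(incidents):
    """Return (MTTR, MTBF, MTBSI) from sorted (fault_time, recovery_time) pairs."""
    if len(incidents) < 2:
        raise ValueError("at least two incidents are needed to form a cycle")
    cycles = list(zip(incidents, incidents[1:]))
    # Downtime: occurrence of a fault until service recovery.
    mttr = mean([cur[1] - cur[0] for cur, _ in cycles])
    # Uptime: recovery from one incident until the occurrence of the next.
    mtbf = mean([nxt[0] - cur[1] for cur, nxt in cycles])
    # Time between the occurrence of two consecutive incidents.
    mtbsi = mean([nxt[0] - cur[0] for cur, nxt in cycles])
    return mttr, mtbf, mtbsi


# Hypothetical incident log, in hours since the start of measurement.
incident_log = [(100, 102), (300, 301), (650, 653), (1000, 1004)]

mttr, mtbf, mtbsi = availability_metrics(incident_log)
ratio = mtbf / mtbsi
print(f"MTTR={mttr} h, MTBF={mtbf} h, MTBSI={mtbsi} h, MTBF/MTBSI={ratio:.4f}")
```

Because each cycle consists of one period of downtime followed by one period of uptime, the computed MTBSI
equals MTTR plus MTBF, and the ratio MTBF/MTBSI is the share of an average cycle spent in uptime.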