Availability management
ITIL® 4 Practice Guide
AXELOS.com
24th February 2020
AXELOS Copyright
View Only – Not for Redistribution
© 2020
2 General information
2.1 PURPOSE AND DESCRIPTION
Key message
The purpose of the availability management practice is to ensure that services deliver the
agreed levels of availability to meet the needs of customers and users.
The availability management practice ensures that requirements for the availability of services and
resources are understood and fulfilled efficiently and in line with the organization’s strategy and
commitments. To enable this, this practice is applied throughout the organization’s product and
service lifecycle, from ideation to operations.
This practice is extremely important when products and services are planned and designed;
decisions made at this stage will affect availability levels and related constraints, as well as the
organization’s ability to monitor and manage these aspects.
Availability is an important service characteristic from the consumers’ perspective, and therefore
it is subject to negotiation, agreement, monitoring, and reporting. These activities involve
multiple practices (including the business analysis, relationship management, service design,
service level management (SLM), and measurement and reporting practices, among others), and
the availability management practice is used in conjunction with those to ensure that availability
is sufficiently and consistently addressed.
Definition: Availability
The ability of an IT service or other configuration item to perform its agreed function
when required.
Theoretically, availability is simple to measure and understand; it depends on how frequently the
service fails and how quickly it recovers after a failure. These characteristics are often expressed
as mean time between failures (MTBF) and mean time to restore service (MTRS):
● MTBF measures how frequently the service fails. For example, on average, a service with an
MTBF of four weeks fails 13 times each year.
● MTRS measures how quickly service is restored after a failure. For example, on average, a
service with an MTRS of four hours will fully recover from failure in four hours.
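The two examples above can be checked with a short calculation. The following is a minimal sketch rather than part of the guidance: the function names are illustrative, a year is assumed to have 52 weeks, and the ratio MTBF / (MTBF + MTRS) assumes that MTBF counts only the uptime between failures.

```python
HOURS_PER_WEEK = 7 * 24


def failures_per_year(mtbf_weeks: float, weeks_per_year: float = 52.0) -> float:
    """Average number of failures per year for a given MTBF (in weeks)."""
    return weeks_per_year / mtbf_weeks


def expected_annual_downtime_hours(mtbf_weeks: float, mtrs_hours: float) -> float:
    """Average downtime per year: failures per year times hours to restore."""
    return failures_per_year(mtbf_weeks) * mtrs_hours


def steady_state_availability(mtbf_hours: float, mtrs_hours: float) -> float:
    """Fraction of time the service is up, assuming MTBF counts uptime only."""
    return mtbf_hours / (mtbf_hours + mtrs_hours)


if __name__ == "__main__":
    print(failures_per_year(4))                  # 13.0 failures per year
    print(expected_annual_downtime_hours(4, 4))  # 52.0 hours of downtime per year
    print(steady_state_availability(4 * HOURS_PER_WEEK, 4))
```

Under these assumptions, a service with an MTBF of four weeks and an MTRS of four hours is down for about 52 hours a year.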
Service availability is central to business success; there is a direct correlation between service
availability and customer and user satisfaction. However, it is possible to achieve customer
satisfaction when services fail. The way in which a service provider reacts in a failed situation has
a major influence on customer perception.
It is difficult to improve availability without understanding how the services support the consumer.
● Where a service provides access to a resource (such as network, print, or email services),
availability is defined and measured in terms of resource availability.
● Where a service includes fulfilment actions (such as user support), availability is often not an
applicable measure. Instead, the focus should be on timely request completion.
● the number of users, business units, and/or sites that are impacted; for example, the service
may only be considered unavailable if more than a certain percentage of users are impacted
● whether certain vital users, business units, sites, and so on, are impacted; for example, for an
email service, it may be that, if users who need to communicate directly with customers and
partners are able to use the service, the service is considered available
● the service delivery schedule and peak hours: a service that only has outages at night or on
weekends may not be considered unavailable.
These factors reflect how the service provider and customers define unavailability. It is good
practice to document the agreed availability criteria for the service in a service level agreement.
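Agreed availability criteria of this kind can be written down as an explicit rule. The sketch below is one possible encoding, not a prescribed one: the class names, field names, the 10% threshold, and the weekday 08:00 to 18:00 service hours are all hypothetical, and for simplicity only the start time of a disruption is checked against the service hours.

```python
from dataclasses import dataclass
from datetime import datetime, time


@dataclass
class AvailabilityCriteria:
    """Hypothetical agreed criteria; names and defaults are assumptions."""
    min_impacted_fraction: float = 0.10                # share of users that must be exceeded
    vital_groups: frozenset = frozenset()              # groups whose impact always counts
    service_start: time = time(8, 0)                   # agreed service hours
    service_end: time = time(18, 0)
    service_weekdays: frozenset = frozenset(range(5))  # Monday=0 .. Friday=4


@dataclass
class Disruption:
    started_at: datetime
    impacted_users: int
    total_users: int
    impacted_groups: frozenset = frozenset()


def counts_as_unavailable(d: Disruption, c: AvailabilityCriteria) -> bool:
    """Apply the agreed criteria to decide whether a disruption is unavailability."""
    in_service_hours = (
        d.started_at.weekday() in c.service_weekdays
        and c.service_start <= d.started_at.time() < c.service_end
    )
    if not in_service_hours:
        return False  # outside the agreed service delivery schedule
    if d.impacted_groups & c.vital_groups:
        return True   # a vital user group is impacted
    return d.impacted_users / d.total_users > c.min_impacted_fraction
```

With these example values, a night-time outage or a daytime disruption affecting 5% of users would not count as unavailability, whereas any disruption during service hours that reaches a vital group would.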
● The more frequent the outages are, the higher the losses are, because the expenses associated
with managing a loss event and restarting business operations are high.
Availability can be measured, assessed, and reported in various ways. These include, but are not
limited to, the following metrics:
● MTBF
● minimum time between failures
● number of service disruptions
● total downtime over the period
● maximum single outage
● MTRS.
When defining metrics to measure availability, it is crucial to reflect the business impact of service
disruptions rather than the technical availability of service components.
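The metrics listed above can all be derived from a list of outage intervals. The following is a minimal sketch under stated assumptions: the `Outage` class and function name are illustrative, outages are assumed not to overlap and to fall inside the reporting period, and MTBF is taken as total uptime divided by the number of failures, which is one common convention rather than the only one.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Outage:
    start: datetime
    end: datetime


def availability_metrics(outages, period_start, period_end):
    """Compute the listed metrics for non-overlapping outages within a period."""
    outages = sorted(outages, key=lambda o: o.start)
    durations = [o.end - o.start for o in outages]
    total_downtime = sum(durations, timedelta())
    # Uptime between the end of one outage and the start of the next.
    gaps = [b.start - a.end for a, b in zip(outages, outages[1:])]
    count = len(outages)
    uptime = (period_end - period_start) - total_downtime
    return {
        "number_of_disruptions": count,
        "total_downtime": total_downtime,
        "max_single_outage": max(durations, default=timedelta()),
        "mtrs": total_downtime / count if count else None,
        "mtbf": uptime / count if count else None,
        "min_time_between_failures": min(gaps) if gaps else None,
    }
```

These figures describe technical downtime only. To reflect business impact, the outage list would first need to be filtered by the agreed availability criteria.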
Availability measurement method: Incident records
Description: Incident records usually include the timestamps when the incident was identified
and resolved so that the duration of outage can be calculated. However, this
method has limitations, including:
● The incident may not be identified and recorded at the same time as the service
becoming unavailable.
● The incident may not be resolved, and its resolution may not be recorded, at the
same time as service availability is restored.
● Not all incidents are availability incidents (see section 2.2.3 for details about
availability criteria).
● Related incident records should be linked and the possible overlap of incidents
over time should be considered in order to accurately estimate the period of
downtime.
These issues might be overcome by developing a service health model; a model that
determines how the underperformance or outage of a component impacts other
components in the service model.
Developing a service health model is a time-consuming exercise that, in many
cases, is not the best use of time because the IT infrastructure changes rapidly.

Availability measurement method: Business transaction monitoring/real user monitoring
Description: Business transaction monitoring is a way of measuring the availability and
performance of IT services from a business operations/transactions perspective. A
variety of data collection methods might be used for the purpose, including
network packet sniffing, log parsing, agent-based middleware protocol sniffing,
reading database records, and others.
Two particular methods of business transaction monitoring are:
● Real user monitoring (RUM): RUM may capture server-side data in order to
reconstruct end-user experience or directly monitor user interactions with the
application and what users experience at the point of service consumption.
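The point made above about incident records, that overlapping incidents must be considered to estimate downtime accurately, can be illustrated with a short interval-merging routine. This is a sketch only: the function name is illustrative, and incidents are assumed to be available as (start, end) timestamp pairs.

```python
from datetime import datetime, timedelta


def merged_downtime(incidents):
    """Total downtime from (start, end) pairs, counting overlapping time once."""
    total = timedelta()
    current_start = current_end = None
    for start, end in sorted(incidents):
        if current_end is None or start > current_end:
            # No overlap with the running window: close it and open a new one.
            if current_end is not None:
                total += current_end - current_start
            current_start, current_end = start, end
        else:
            # Overlapping or adjacent incident: extend the running window.
            current_end = max(current_end, end)
    if current_end is not None:
        total += current_end - current_start
    return total
```

For example, incidents recorded at 09:00 to 10:00 and 09:30 to 11:00 on the same day contribute two hours of downtime, not the two and a half hours obtained by adding their durations.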
2.3 SCOPE
The availability management practice ensures that services deliver agreed levels of availability to
meet the needs of customers and users cost-effectively. To achieve this, the practice includes the
definition, measurement, analysis, and improvement of availability and provides a centre of
expertise for availability matters to support other service management practices.
The scope of the availability management practice is very broad. Almost every ITIL practice
contributes to service availability, directly or indirectly. Activities of other practices that are
closely related to the availability management practice are listed in Table 2.2. It is important to
remember that ITIL practices are merely collections of tools to use in the context of value
streams; they should be combined as necessary, depending on the situation.
Table 2.2 Activities related to the availability management practice described in other practice
guides
2.3.2 Availability management’s role in managing service risks
The concept of risk is central to the availability management practice. In order to meet service
availability targets, the practice needs information about risks, which can be provided by the risk
management practice.
An effective availability management practice can therefore contribute significantly to risk
management. A large proportion of risk mitigation measures are related in some way to availability
controls.
Availability management generally focuses on identifying and eliminating single points of failure or
unreliable or weak components, when it is cost-justifiable (see 2.4.3 for details).
A practice success factor (PSF) is more than a task or activity, as it includes components of all four
dimensions of service management. The nature of the activities and resources of PSFs within a
practice may differ, but together they ensure that the practice is effective.
The availability management practice includes the following PSFs:
To effectively manage availability, the service provider should identify the service availability
requirements. These requirements should reflect how service customers may be impacted by
service outages.
Identifying a service’s availability requirements may be a separate activity, but it is more
commonly a part of service level negotiation within the SLM practice, or a broader BIA performed
jointly with the service continuity management practice.
Identifying service availability requirements includes:
● understanding customer requirements for service availability
● determining availability criteria
● determining availability metrics and setting targets.
Service providers must be able to measure, assess, and report availability correctly. It is a widely
accepted practice to report availability as a percentage, which can be calculated using a simple
formula based on uptime and downtime. Although it can be suitable in many cases (especially for
resource provision services), this method lacks visibility of the business impacts of complicated
service disruption scenarios.
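The percentage calculation, and its blind spot, can be shown in a few lines. The sketch below assumes the common form of the formula (agreed service time minus downtime, divided by agreed service time); the two outage patterns and the 720-hour month are hypothetical.

```python
def availability_percentage(agreed_service_hours: float, downtime_hours: float) -> float:
    """Availability as a percentage of agreed service time."""
    return (agreed_service_hours - downtime_hours) / agreed_service_hours * 100


# Two hypothetical months with 720 agreed service hours each.
one_long_outage = [6.0]            # a single six-hour outage
many_short_outages = [0.25] * 24   # twenty-four 15-minute outages

pct_long = availability_percentage(720, sum(one_long_outage))
pct_short = availability_percentage(720, sum(many_short_outages))

print(pct_long == pct_short)  # True: the percentage cannot tell the two months apart
```

Both months report the same availability percentage, although one long outage and 24 short interruptions are likely to affect business operations very differently.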
It is important to consider various ways of measuring, assessing, and reporting availability,
including, but not limited to, the following metrics (see 2.2.4 for details):
● MTBF
● minimum time between failures
● number of service disruptions
● total downtime over the period
● maximum single outage
● MTRS.
Whichever set of metrics is suitable for a service, it is important to reflect the business impact of
service disruptions, rather than the technical availability of service components.
One of the most important objectives for the availability management practice is to design and
ensure sufficient availability monitoring, and then to translate the monitoring data into meaningful
service availability information.
Incident records are one obvious source of service disruption data. However, it is often difficult to
obtain reliable availability data based on incident records, especially for user-reported incidents.
It is also difficult to align the data with agreed service availability metrics.
Infrastructure monitoring tools are more reliable sources of availability data. However, although
they work well for measuring resource provision services, it is very difficult to measure the
availability of services that enable business operations correctly based on infrastructure
monitoring data. Tools such as real user monitoring, business transaction monitoring, and so on can
help with this (see section 2.2.5).
The availability management practice is not only about planning and monitoring availability. This
practice includes the definition and management of controls to manage a range of risks that might
impact service availability. For this, it is used in conjunction with the risk management practice
and other risk-focused practices (including the service continuity management, capacity and
performance management, and information security management practices). An effective
availability management practice can make a significant contribution to risk management.¹
The measures outlined in Table 2.4 may be designed and implemented as a part of an overall risk
mitigation plan.
Improved testing
1 Risk management: ITIL® Practice Guide.
When choosing an availability control, the effectiveness and efficiency of each option should be
assessed.² It is also important to continually control and validate the effectiveness and efficiency
of availability arrangements.
● Efficiency: The costs of an availability control should also be assessed and compared to its
benefits. Benefits are calculated by estimating the reduction in the likelihood of incidents after
the control is implemented, then multiplying it by the severity of the impact the incidents
would have if they occurred. This value should then be compared with the cost of
implementing the measure (cost-benefit analysis can be used here).
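The efficiency assessment described above amounts to a small calculation. The sketch below is illustrative only: the class, field names, and the choice of annual figures are assumptions, and likelihood is expressed as the expected number of incidents per year.

```python
from dataclasses import dataclass


@dataclass
class AvailabilityControl:
    """Hypothetical description of a candidate availability control."""
    name: str
    annual_cost: float          # cost of implementing and running the control
    likelihood_before: float    # expected incidents per year without the control
    likelihood_after: float     # expected incidents per year with the control
    impact_per_incident: float  # estimated loss per incident


def expected_benefit(control: AvailabilityControl) -> float:
    """Reduction in likelihood multiplied by the severity of the impact."""
    reduction = control.likelihood_before - control.likelihood_after
    return reduction * control.impact_per_incident


def net_benefit(control: AvailabilityControl) -> float:
    """Benefit minus cost; a positive value suggests the control is cost-justified."""
    return expected_benefit(control) - control.annual_cost
```

For instance, a control that costs 20,000 per year and cuts expected incidents from four to one per year, at an estimated 10,000 per incident, yields a benefit of 30,000 and a net benefit of 10,000.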
It is usually cheaper to design the right level of service availability into a service from the start,
rather than try to add it subsequently. Also, once a service gains a reputation for unreliability, that
reputation becomes very difficult to repair.
The following forms of loss, proposed by FAIR,³ might be useful when assessing service availability
risks:
● productivity: the reduction in a service provider’s ability to deliver services
● response: expenses associated with managing a loss event
● replacement: the intrinsic value of an asset, or the expense associated with replacing lost or
damaged assets (e.g. purchasing a replacement server)
● SLA fines and regulatory judgments: legal or regulatory actions levied against the service
provider
2 For details see Risk management: ITIL® Practice Guide.
3 An Introduction to Factor Analysis of Information Risk (FAIR)
ftp://mail.im.tku.edu.tw/Prof_Liang/IRM/10%20An%20Introduction%20to%20Factor%20Analysis%20of%20Information%20Risk.pdf [Accessed 24th February 2020]
The effectiveness and performance of ITIL practices should be assessed within the context of the
value streams to which each practice contributes. As with the performance of any tool, the
practice’s performance can only be assessed within the context of its application. However, tools
can differ greatly in design and quality, and these differences define a tool’s potential or
capability to be effective when used according to its purpose. Further guidance on metrics, key
performance indicators (KPIs), and other tools that can help with this can be found in the
measurement and reporting practice guide.
Key metrics for the availability management practice are mapped to its PSFs. They can be used as
KPIs in the context of value streams to assess the contribution of the practice to the effectiveness
and efficiency of those value streams. Some examples of key metrics are given in Table 2.5.
Table 2.5 Example metrics for the practice success factors
Identifying service availability requirements: Percentage of products and services with clearly
documented availability criteria
MTRS achievement
The correct aggregation of metrics into complex indicators will make it easier to use the data for
the ongoing management of value streams, and for the periodic assessment and continual
improvement of the availability management practice. There is no single best solution. Metrics will
be based on the overall service strategy and priorities of an organization, as well as the goals of
the value streams to which the practice contributes.
Like any other ITIL management practice, the availability management practice contributes to
multiple value streams. It is important to remember that a value stream is never formed from a
single practice. The availability management practice combines with other practices to provide
high-quality services to consumers. The main value chain activities to which availability
management contributes are:
● plan
● deliver and support
● design and transition
● obtain/build
● improve
The contribution of the availability management practice to the service value chain is shown in
Figure 3.1.
Figure 3.1 Heat map of the contribution of the availability management practice to value chain
activities
3.2 PROCESSES
Each practice may include one or more processes and activities that may be necessary to fulfil the
purpose of that practice.
Definition: Process
A set of interrelated or interacting activities that transform inputs into outputs. A process
takes one or more defined inputs and turns them into defined outputs. Processes define
the sequence of actions and their dependencies.
Figure 3.2 Workflow for the establishing service availability control process
These activities may be performed with varying levels of formality by many people in the
organization. Table 3.2 describes these activities further.
Table 3.2 Activities of the establishing service availability control process
Activity: Identifying service availability requirements
Description: The organization may have draft SLRs and service availability requirements,
but they are rarely defined in a measurable and manageable way. Customers
communicate their requirements for service availability based on their
business needs.
The availability management practice should work with the SLM practice to
clarify service availability criteria and availability indicators, which should
accurately reflect the impacts of service outages on the customer (see
sections 2.2.3 and 2.4.1 for details).
measurement requirements: customer reporting requirements, type of the service, and available
monitoring tools.
● availability percentage
● MTBF
● minimum time between failures
● number of service disruptions
● total downtime over the period
● maximum downtime
● MTRS.
Table 3.3 Inputs, activities, and outputs of the analysing and improving service availability process
Risk register(s)
Service specification(s)
These activities may be carried out with varying levels of formality by many people in the
organization.
Figure 3.3 shows a workflow diagram of the process.
Figure 3.3 Workflow of the analysing and improving service availability process
Activity: Service availability analysis
Description: The achievement of service availability requirements must be confirmed.
All deviations from pre-defined levels must be subject to investigation, and
corrective action must be undertaken if a fault is found.
Trend analysis should be performed to detect flaws that have not yet
caused incidents. Problems or risks may be logged.
Because business needs and customer demand may change, the levels of
availability for a service may need to be revised. Such reviews should be
part of the SLM practice’s regular service reviews. Inputs from the service
continuity management practice, particularly from BIAs and risk assessment
exercises, should also be considered regularly.
4 See the Service level management: ITIL® 4 Practice Guide for details.
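The trend analysis mentioned in the service availability analysis activity can be as simple as fitting a straight line to a periodic measure, such as monthly downtime. The following is a minimal sketch assuming equally spaced observations; the function name and the choice of a least-squares slope are illustrative, not a method prescribed by the guidance.

```python
def linear_trend(values):
    """Least-squares slope of values against their index (0, 1, 2, ...)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    numerator = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    denominator = sum((x - mean_x) ** 2 for x in range(n))
    return numerator / denominator


# Hypothetical monthly downtime in minutes: each month may still meet its target,
# but a positive slope indicates that availability is deteriorating.
monthly_downtime_minutes = [12, 15, 14, 19, 23, 26]
print(linear_trend(monthly_downtime_minutes) > 0)  # True
```

A persistently positive slope is a reason to log a problem or a risk before the agreed availability level is actually breached.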
The ITIL practice guides do not describe the practice management roles, such as practice owner,
practice lead, or practice coach. They focus instead on the specialist roles that are specific to
each practice. The structure and naming of each role may differ from organization to organization,
so any roles defined in ITIL should not be treated as mandatory, or even recommended.
Remember, roles are not job titles. One person can take on multiple roles and one role can be
assigned to multiple people.
Roles are described in the context of processes and activities. Each role is characterized by a
competency profile based on the model shown in Table 4.1.
Table 4.1 Competency codes and profiles
Examples of roles that are involved in availability management activities are listed in Table 4.2,
together with the associated competency profiles and specific skills.
Table 4.2 Examples of roles with responsibility for availability management activities
Although the role of availability manager may be supported with formal positions and job
descriptions, it is unusual to see a dedicated organizational structure for the availability
management practice. Service availability is managed by other practices and organizational
functions. These functions are outlined in Table 4.3.
Table 4.3 Examples of availability management activities connected with other practices and
organizational functions
Planning and designing service availability: Performed in conjunction with the risk
manager and business continuity administrator. Depending on the service
lifecycle stage and organizational context, a business analyst, architecture manager,
information security manager, and/or system administrator may be involved.
Because availability is impacted by almost every ITIL practice, it is a good idea to appoint an
availability manager who is accountable for ensuring cost-efficient availability management. This
role may be combined with the roles of service continuity administrator or IT risk manager.
The effectiveness of the availability management practice is based on the quality of the
information used. This information includes, but is not limited to, information about:
● consumers’ business processes
● services and their architecture and design
● partners and suppliers, and the services they provide
● regulatory requirements regarding service availability
● technology and services available on the market that may be relevant for service availability
arrangements
● monitoring tools and techniques.
This information may take various forms. The key inputs and outputs of the practice are listed in
section 3.
In some cases, the availability management practice can significantly benefit from automation (see
section 3 for details). Where this is possible and effective, it may involve the solutions outlined in
Table 5.1.
Table 5.1 Automation solutions for availability management activities
Process activity | Means of automation | Key functionality | Impact on the effectiveness of the practice
7 Important reminder
Most of the content of the practice guides should be taken as a suggestion of areas that an
organization might consider when establishing and nurturing their own practices. The practice
guides are catalogues of topics that organizations might think about, not a list of answers. When
using the content of the practice guides, organizations should always follow the ITIL guiding
principles:
● focus on value
● start where you are
● progress iteratively with feedback
● collaborate and promote visibility
● think and work holistically
● keep it simple and practical
● optimize and automate.
More information on the guiding principles and their application can be found in section 4.3 of
ITIL® Foundation: ITIL 4 Edition.
8 Acknowledgments
AXELOS Ltd is grateful to everyone who has contributed to the development of this guidance.
These practice guides incorporate an unprecedented level of enthusiasm and feedback from across
the ITIL community. In particular, AXELOS would like to thank the following people.
8.1 AUTHORS
Pavel Demin
8.2 REVIEWERS
Roman Jouravlev