Incident Management

The document serves as a practical guide for incident management within the ITIL 4 framework, detailing its purpose, processes, and the roles of various stakeholders. It emphasizes the importance of minimizing the negative impact of incidents through quick restoration of services and includes recommendations for effective practices. Key concepts such as incident models, major incidents, and prioritization are also discussed to enhance incident management efficiency.

April 13, 2023 36 min read

Incident management: ITIL 4 Practice Guide

This document provides practical guidance for the incident management practice.

Table of Contents

1. About this guide

2. General information

3. Value streams and processes

4. Organizations and people

5. Information and technology

6. Partners and suppliers

7. Capability assessment and development

8. Recommendations for practice success

9. Acknowledgements

1. About this guide

It is split into seven main sections, covering:

general information about the practice

the practice’s processes and activities and their roles in the service value chain

the organizations and people involved in the practice

the information and technology supporting the practice

considerations for partners and suppliers for the practice

information on assessing and developing the capability of the practice

recommendations for succeeding in the practice.

1.1 ITIL 4 qualification scheme

Selected content of this guide is examinable as a part of the following syllabuses:

ITIL Specialist Create, Deliver and Support

ITIL Specialist High-velocity IT

ITIL Specialist Monitor, Support, and Fulfil

Please refer to the respective syllabus documents for details.

2. General information

2.1 Purpose and description

Key message

The purpose of the incident management practice is to minimize the negative impact of incidents by restoring normal
service operation as quickly as possible.

The definition refers to a ‘normal service operation’. Conditions of normal service operation are typically defined within service level agreements
(SLAs), or other forms of service quality specification, either agreed with the customer or defined by the service provider. In some cases, the
service provider’s internal specification can include more quality criteria than were initially agreed with the customers (see more on this in the service
level management practice guide). The incident management practice is not limited to the service quality perceived by users. It includes
restoration of the normal operation of services and resources, even when their failure or deviation is not visible to the service consumers. In this
case, normal operation can be defined in the technical specifications of services or configuration items (CIs). Finally, if there is no documented
specification of a normal operation, an expert opinion may be used to assess the status of the resources and services.

Tips

A simple flow to decide if there is an incident:

If users perceive the situation as abnormal, it is recommended to register an incident and work on making users happy as quickly as possible, regardless of whether there is a breach of SLA.

If users have not reported anything, but a service level agreement is breached, register an incident and work to restore the agreed level of service before it affects users.

If a service or configuration item is not working as defined in a technical specification, register an incident and work to restore normal performance before it affects the SLA and users.

If there are no formal specifications of normal operation for the service or component, or if the service works within the specifications but a specialist thinks that it is not operating normally, register an incident and restore normal operation as quickly as reasonably possible.
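
The flow above can be sketched as a small function. This is only an illustrative sketch, not part of the ITIL guidance: the Observation fields and the incident_reason name are hypothetical, and a real tool would draw these facts from monitoring data, SLA reports, and user contacts.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Observation:
    """Hypothetical snapshot of what is known about a service or CI."""
    users_perceive_abnormal: bool = False
    sla_breached: bool = False
    # None means that no technical specification exists.
    meets_technical_spec: Optional[bool] = None
    specialist_suspects_abnormal: bool = False


def incident_reason(obs: Observation) -> Optional[str]:
    """Return why an incident should be registered, or None if no rule applies.

    The checks run in the same order as the flow in the text.
    """
    if obs.users_perceive_abnormal:
        return "users perceive the situation as abnormal"
    if obs.sla_breached:
        return "service level agreement is breached"
    if obs.meets_technical_spec is False:
        return "service or CI deviates from its technical specification"
    # Reached when there is no specification or the specification is met.
    if obs.specialist_suspects_abnormal:
        return "specialist judges operation to be abnormal"
    return None
```

Returning the reason rather than a bare yes/no keeps a record of which rule triggered the registration, which may help later reviews.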

The incident management practice is a fundamental element of service management. This practice is beneficial for both IT service providers
and their service consumers.

Benefits for service providers include:

Reduced losses caused by IT service unavailability

Better image due to uninterrupted IT services

Fulfilment of the SLAs with service consumers

Reduced costs of service restoration due to knowledge capture and reuse

Higher user satisfaction.

Benefits for service consumers include:

Reduced losses caused by business service unavailability

Better image due to uninterrupted business services

Higher client and employee satisfaction.

The quick restoration of a service is a key factor in user and customer satisfaction, the credibility of the service provider, and the value the
service provider creates in the service relationships.

2.2 Terms and concepts

Definition: Incident

An unplanned interruption to a service or reduction in the quality of a service.

The incident management practice ensures that periods of unplanned service unavailability or degradation are minimized, thus reducing
negative impacts on users. There are two main factors enabling this: early incident detection and the quick restoration of normal operation.

The quick detection and resolution of incidents is made possible with effective and efficient processes, automation, and supplier relationships
alongside skilled and motivated specialist teams. Resources from the four dimensions of service management are combined to form the
incident management practice.

2.2.1 Incident models

Some systems and services demonstrate patterns of operations that include so-called typical incidents. These may be associated with known
errors, such as a lack of compatibility or patterns of incorrect user behaviour. Service providers benefit from defining incident models to
optimize the handling and resolution of repeating or similar incidents. Incident models help to resolve incidents quickly and efficiently, and
often with better results, due to the application of proven and tested solutions.

Definition: Incident model

A repeatable approach to the management of a particular type of incident.

The creation and use of incident models are important activities in the incident management practice. They are described further in section 3.

2.2.2 Major incidents

Although some incidents have a relatively low impact on service operation and on the work of users, others may lead to dramatic consequences for
service consumers and the service provider. These are called major incidents and require special attention.

Definition: Major incident

An incident with significant business impact, requiring an immediate coordinated resolution.

A significant business impact is not the only characteristic of a major incident. Major incidents are often associated with a higher level of
complexity. Many systems and services are designed for high availability, and single failures are unlikely to cause a significant business impact.
Failures in these systems are quickly, and often automatically, detected and fixed. However, if multiple seemingly trivial events coincide, they
may lead to a major disruption of multiple services and have a high impact on service consumers. Complex incidents such as these require a
special approach to management and resolution.

It is recommended to implement a model to manage all major incidents, even though major incidents rarely recur and usually differ in nature.
A model for major incidents typically includes:

clear criteria to distinguish major incidents from disasters and other incidents

a special accountable coordinator, sometimes referred to as the major incident manager (MIM)

a dedicated temporary team created to investigate and resolve a major incident

other dedicated resources (including budget); for example, for urgent consultations with third-party experts or procurement of components

special methods of investigation (for example, swarming: see section 2.4.2)

an agreed model of communications with users, customers, regulators, media, and other stakeholders

an agreed procedure for review and follow-up activities.

2.2.3 Workarounds

Definition: Workaround

A solution that reduces or eliminates the impact of an incident or problem for which a full resolution is not yet available.
Some workarounds reduce the likelihood of incidents.

Sometimes, it may be impossible to find a systemic solution for an incident. In these situations, service providers may apply a workaround.

Workarounds promptly restore the service to an acceptable quality. However, workarounds can increase technical debt and may lead to new
incidents in the future. The problem management practice can be used to reduce the technical debt created by incident workarounds. In
many cases, understanding the cause or causes of an incident can help find an optimal solution.

Definition: Technical debt

The total rework backlog accumulated by choosing workarounds instead of systemic solutions that would take longer.

2.3 Scope

The scope of the incident management practice includes:

detecting and registering incidents

diagnosing and investigating incidents

restoring the affected services and configuration items to an agreed quality

managing incident records

communicating with relevant stakeholders throughout the incident lifecycle

reviewing incidents and initiating improvements to services and to the incident management practice after resolution.

There are a number of activities and areas of responsibility that are not included in the incident management practice, although they are
closely related to it. These activities are listed in Table 2.1, along with references to the practice guides in which they can be found. Management
practices should be combined to form service value streams, as described in section 3.2.

Table 2.1 Activities related to the incident management practice described in other practice guides

Activity | Practice guide
Investigating causes of incidents | Problem management
Communicating with users | Service desk
Implementation of changes to products and services | Change enablement; deployment management; infrastructure and platform; project management; release management; software development management
Monitoring technology, teams, and supplier performance | Monitoring and event management
Management of improvement initiatives | Continual improvement
Management and fulfilment of service requests | Service request management
Restoring normal operations in case of a disaster | Service continuity management

2.4 Practice success factors

Definition: Practice success factor

A complex functional component of a practice that is required for the practice to fulfil its purpose.

A practice success factor (PSF) is more than a task or activity; it includes components from all four dimensions of service management. The
nature of the activities and resources of PSFs within a practice may differ, but together they ensure that the practice is effective.

The incident management practice includes the following PSFs:

detecting incidents early

resolving incidents quickly and efficiently

continually improving incident management.

2.4.1 Detecting incidents early

Previously, it was a common practice to register most incidents based on information from end users and IT specialists. This method of
sourcing information is still widely used, but good practice currently suggests detecting and registering incidents automatically wherever
possible. This can be done immediately after incidents occur and before they start affecting users. This approach has multiple benefits:

Earlier incident detection decreases the time of the service unavailability or degradation, which in turn decreases the losses and other
negative business impact caused by incidents.

The higher quality of the initially collected data supports the correct response to and resolution of incidents, including automated
resolution, also known as self-healing.

Some incidents remain invisible to users, improving user satisfaction and customer satisfaction.

Some incidents may be resolved before they affect the service quality agreed with customers, improving the perceived service and the
reported service quality.

Costs associated with incident management may decrease.

Early detection of incidents is enabled by the monitoring and event management practice. This includes tools and processes for event
categorization that distinguish incidents from other types of events. Automatically detected incidents can be classified either automatically,
manually, or with partial automation. A partially automated categorization is made manually but is based on suggestions made by the system.
Automated incident detection and categorization may benefit from machine learning solutions, using the data available from past incidents,
events, known errors, and other sources. See section 3.1.1 for more details on incident classification.
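
A partially automated categorization can be sketched as follows. This is a minimal illustration under stated assumptions: the keyword rules in RULES, the Suggestion type, and the categorize function are all hypothetical, and a real monitoring tool would use far richer rules or a trained model. The sketch applies a category automatically when exactly one rule matches and otherwise hands the event to a person, together with a suggestion where one exists.

```python
from dataclasses import dataclass

# Hypothetical rules: keyword in the event message -> (category, is_incident).
RULES = {
    "disk full": ("storage", True),
    "service down": ("availability", True),
    "backup completed": ("informational", False),
}


@dataclass
class Suggestion:
    category: str
    is_incident: bool
    needs_manual_review: bool


def categorize(message: str) -> Suggestion:
    """Suggest a category for a monitoring event using simple keyword rules."""
    text = message.lower()
    matches = [rule for keyword, rule in RULES.items() if keyword in text]
    if len(matches) == 1:
        # Unambiguous match: categorize fully automatically.
        category, is_incident = matches[0]
        return Suggestion(category, is_incident, needs_manual_review=False)
    if matches:
        # Several rules match: offer the first as a suggestion for a person.
        category, is_incident = matches[0]
        return Suggestion(category, is_incident, needs_manual_review=True)
    # No rule matches: leave the decision to a person.
    return Suggestion("unclassified", False, needs_manual_review=True)
```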

When automated incident detection is not possible, incidents are usually detected when they have already impacted users and their work.
Even then, the earlier an incident is reported and registered, the better. This can be achieved by promoting a culture of responsible service
consumption among users that includes encouraging reporting of suspicious events and behaviour, and tolerating false reports, within reason.

2.4.2 Resolving incidents quickly and efficiently

This PSF is vital for the success of the incident management practice and for general service quality. After incidents are detected, they should
be handled effectively and efficiently, considering the complexity of the environment:

In clear situations, such as recurring and well-known incidents, pre-defined resolution procedures are likely to be effective. These may
include automated resolution or standardized routing and handling (according to an appropriate pre-agreed incident model).

In complicated situations, where the exact nature of the incident is unknown but the systems and components are familiar to the support
teams and the organization has access to expert knowledge, incidents are usually routed to a specialist group or groups for diagnosis and
resolution. Sometimes this can assist in identifying patterns and lead to a model and/or a solution which can be applied to similar
incidents in the future.

In complex situations, where it is difficult or impossible to define an expert area and group, or where defined groups of experts fail to find a
solution, a collective approach may be useful. This technique is known as swarming.

Definition: Swarming

A technique for solving various complex tasks. In swarming, multiple people with different areas of expertise work together
on a task until it becomes clear which competencies are the most relevant and needed.

Usually, swarming assists in decreasing the level of complexity and makes it possible to switch to the techniques used in complicated or clear
situations. One example where swarming is particularly relevant is a major incident of an unknown nature. In these situations, pulling
together numerous specialized resources is cost-effective compared to the losses resulting from the incident remaining unsolved.
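
The choice between the three kinds of situation described in this section can be sketched as a simple routing decision. The sketch is illustrative only: the three yes/no inputs and the names assess and APPROACH are assumptions made for the example, and in practice the assessment relies on judgement rather than flags.

```python
from enum import Enum


class Situation(Enum):
    CLEAR = "clear"
    COMPLICATED = "complicated"
    COMPLEX = "complex"


# Handling approach per situation, paraphrasing the text above.
APPROACH = {
    Situation.CLEAR: "apply the pre-agreed incident model",
    Situation.COMPLICATED: "route to a specialist group for diagnosis",
    Situation.COMPLEX: "swarm with people from several areas of expertise",
}


def assess(has_incident_model: bool, expert_group_known: bool,
           experts_stuck: bool = False) -> Situation:
    """Classify the situation from three hypothetical yes/no inputs."""
    if has_incident_model:
        return Situation.CLEAR
    if expert_group_known and not experts_stuck:
        return Situation.COMPLICATED
    # No obvious expert group, or the experts failed to find a solution.
    return Situation.COMPLEX
```

For example, APPROACH[assess(False, True, experts_stuck=True)] selects swarming, because a known expert group has already failed to find a solution.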

Physical meetings are not required when swarming. When a plan is established, experts may work alone to run experiments, perform analysis,
and use other tools to discover what is happening. To engage with the incident, swarming utilizes the correct people rather than a large
number of people. It is usual to involve people from different teams in swarming; this requires organizational solutions which allow involving
team members at very short notice.

Other techniques can be used in complex situations. For example, expert analysis may be replaced or combined with a series of safe-to-fail
experiments which aim to improve the understanding of the nature of the incident. Adopting and utilizing a complexity-based framework for
decision-making1 is useful for dealing with incidents in situations of high and changing complexity.

As mentioned in section 2.2.1, some incidents recur and can be handled in a well-known, repeatable way. Ideally, such recurrences should be
analysed and further repetition prevented (this usually involves the problem management practice). However, problem management may take
significant time, and some incidents, even if well understood, cannot be effectively prevented. Their occurrence and nature are clear, and their
handling often can follow a well-defined incident model. To optimize the time and resources for resolution of such incidents, the shift-left
approach can be used.

Definition: Shift-left approach

An approach to managing work that focuses on moving activities closer to the source of the work, in order to avoid
potentially expensive delays or escalations. In a software development context, a shift-left approach might be characterized
by moving testing activities closer to (or integrated with) development activities. In a support context, a shift-left approach
might be characterized by providing self-help tools to end-users.

In incident management, shift-left can be used to delegate more activities to users: not only reporting an incident, but also self-help using
chatbots, FAQ pages, and other resources. Another form of shift-left is training service desk agents to diagnose and solve more types
of incidents. Any opportunity to solve incidents without transferring them to other teams should be used, especially as the transfer is likely to
take extra time and cost extra money. This should not, however, create unacceptable delays; the speed of incident resolution remains the most
important requirement. The shift-left approach works best in clear, well-known situations, where less experienced people can successfully
follow well-tested and safe instructions.

Regardless of the complexity, it is important to review and confirm the high quality of the incident data from the first steps of incident
handling. This has a strong influence on the:

correctness of the decisions made

speed of service recovery

effective use of resources

ability to find and remedy the underlying cause(s)

possibility and quality of machine learning.

2.4.2.1 Incident prioritization

Incidents should be resolved as soon as possible. However, the resources of the teams involved in incident resolution are limited and these
teams are often simultaneously involved in other types of work. Some incidents should be prioritized over others to minimize negative impacts
on users and optimize the use of resources.

Definitions

Prioritization

An action of selecting tasks to work on first when it is impossible to assign resources to all tasks in the backlog.

Task priority

The importance of a task relative to other tasks. Tasks with a higher priority should be worked on first. Priority is defined in
the context of all the tasks in a backlog.

There are a number of simple guidelines for prioritization which apply to all types of tasks, including incidents:

Prioritization is a tool for assigning tasks to people in the context of a team. If an incident is handled by multiple teams, it will be prioritized
within each team depending on resource availability, target resolution time, and estimated processing time. If resolution of an incident
requires several tasks to be performed by different teams working in parallel, each team will be prioritizing their own task.

Prioritization is needed only when there is a resource conflict. Where there are sufficient resources to process every task within the time
constraints, prioritization is unnecessary.

In each team, all types of tasks (including incidents) should await prioritization and assignment in a single backlog, together with other
tasks (planned and unplanned).

Visualization tools, such as Kanban, and Lean principles, such as the limiting of work in progress, are useful for effective prioritization.

These rules apply to all types of work, whether planned or unplanned, performed by the service provider’s specialist teams. It is important that
they are agreed and followed by everyone involved in the organization’s service management activities, across all practices. Specific to incident
management, the following additional recommendations should be considered:

Evaluation of the impact and urgency of an incident is performed during the incident classification (see section 3.1.1). This evaluation and
the related time constraints for its investigation and resolution (often guided by a service level agreement) are NOT prioritization. However,
this evaluation provides important input for prioritization.

Resource availability and estimated processing time are defined by each team. For well-known repeating operations, the processing time
may be standardized. The target resolution time may be defined by SLAs and/or the internal service specifications of the service provider.
The impact assessment and completion (resolution) time may change as support teams discover new information.
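
One way to combine target resolution time and estimated processing time within a single team backlog is a least-slack-first ordering. This ordering rule is an assumption chosen for illustration, not something the guidance prescribes, and the Task, slack, and prioritize names are hypothetical. The backlog deliberately mixes incidents with planned work, in line with the single-backlog guideline above.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List


@dataclass
class Task:
    name: str
    target_resolution: datetime      # for example, derived from an SLA
    estimated_processing: timedelta  # the team's own estimate


def slack(task: Task, now: datetime) -> timedelta:
    """Time left before work must start to still meet the target."""
    return task.target_resolution - now - task.estimated_processing


def prioritize(backlog: List[Task], now: datetime) -> List[Task]:
    """Order one team's backlog so the task with the least slack comes first."""
    return sorted(backlog, key=lambda task: slack(task, now))
```

Because the impact assessment and resolution time may change as new information arrives, an ordering like this would need to be recomputed whenever a task's target or estimate is updated.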

2.4.3 Continually improving incident management

Periodic reviews of incidents should be conducted to improve the effectiveness and efficiency of the incident management practice. Some
incidents will require an individual review upon resolution. This usually applies to major incidents, new types of incidents, and incidents that
were not resolved on time. Most incidents, however, do not require an individual review beyond confirming their successful resolution.
Nonetheless, an overview of the incident management records at certain intervals will help to identify positive experiences and room for
improvement; share knowledge between specialist teams; identify new types of incidents; and improve or introduce incident models.

Periodic reviews provide an opportunity to analyse the stakeholders’ satisfaction with the incident management practice. Periodic incident
review is also key for the continual improvement of the practice and the organization’s products and services.

Key message

The importance of data

Effective reviews will always need data; therefore, it is important to agree the requirements for documenting it. Data should
be:

- Concurrent: It is useful to know exactly what was done when, to assist in continual improvement. This requires
stakeholders to update incident records during, not after, the event. Also, an accurate timeline may be useful for
investigating the problem.

- Complete: A considerable amount of activity can be hidden behind a simple statement. For example, a statement such as
‘We restarted the cluster and normal function was observed after 45 minutes’ may hide useful detail. It could mean: ‘We
restarted Server 1, then 2, then 3 and found that Server 4, which was operating normally, stopped. We checked the manual
and restarted Servers 2 and 4, then 1 and 3. All were processing data correctly after 10 minutes.’

- Comprehensive: Describing why an action was taken can be just as important as describing the action itself.

2.5 Key metrics

The practice metrics should be applied to a specific context such as type of incident, services, specialist groups, or periods of time.

The effectiveness and performance of the ITIL practices should be assessed within the context of the value streams to which the practices
contribute. The context of the business and the value streams is important to define what is considered good or not so good performance of a
practice. This is why this practice guide cannot recommend universal key performance indicators for incident management: the target values
for each metric can only be defined in the organization’s context.

Table 2.2 Key metrics for incident management

Practice success factor: Detecting incidents early
Key metrics:
- Time between incident occurrence and detection
- Percentage of incidents detected via monitoring and event management

Practice success factor: Resolving incidents quickly and efficiently
Key metrics:
- Time between incident detection and acceptance for diagnosis
- Time of diagnosis
- Number of reassignments
- Percentage of waiting time in the overall incident handling time
- First-time resolution rate
- Meeting the agreed resolution time
- User satisfaction with incident handling and resolution
- Percentage of incidents resolved automatically
- Percentage of incidents resolved before being reported by users

Practice success factor: Continually improving incident management
Key metrics:
- Percentage of incident resolutions using previously identified and recorded solutions
- Percentage of incidents resolved using incident models
- Improvement of the key practice indicators over time
- Balance between the speed and effectiveness metrics for incident resolution
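
A few of the metrics in Table 2.2 can be computed directly from incident records, as the sketch below suggests. The IncidentRecord fields and the key_metrics function are hypothetical and assume that each record carries occurrence and detection timestamps; real incident records will differ by tool. The sketch reports values only, since target values depend on the organization's context.

```python
from dataclasses import dataclass
from datetime import datetime
from statistics import mean
from typing import Dict, List


@dataclass
class IncidentRecord:
    occurred: datetime
    detected: datetime
    detected_by_monitoring: bool
    reassignments: int
    resolved_first_time: bool


def key_metrics(records: List[IncidentRecord]) -> Dict[str, float]:
    """Compute a few Table 2.2 metrics; assumes at least one record."""
    count = len(records)
    return {
        "mean_minutes_to_detect": mean(
            (r.detected - r.occurred).total_seconds() / 60 for r in records
        ),
        "pct_detected_by_monitoring": 100
        * sum(r.detected_by_monitoring for r in records) / count,
        "mean_reassignments": mean(r.reassignments for r in records),
        "first_time_resolution_rate": 100
        * sum(r.resolved_first_time for r in records) / count,
    }
```

In line with the note that metrics should be applied to a specific context, the function could be called on a filtered list of records, for example one service or one period at a time.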

3. Value streams and processes

Definition: Process

A set of interrelated or interacting activities that transform inputs into outputs. A process takes one or more defined inputs
and turns them into defined outputs. Processes define the sequence of actions and their dependencies.

Incident management activities form two processes:

Incident handling and resolution: This process is focused on the handling and resolution of individual incidents, from detection to closure.

Periodic incident review: This process ensures that the lessons from incident handling and resolution are learned and that approaches to incident management are continually improved.

3.1.1 Incident handling and resolution

This process includes the activities listed in Table 3.1, and transforms the inputs into outputs.

Table 3.1 Inputs, activities, and outputs of the incident handling and resolution process

Key inputs:
- Monitoring and event data
- User queries
- Configuration information
- IT asset information
- Service catalogue
- SLAs with consumers and suppliers/partners
- Capacity and performance information
- Continuity policies and plans
- Information security policies and plans
- Problem records
- Knowledge base

Activities:
- Incident detection
- Incident registration
- Incident classification
- Incident diagnosis
- Incident resolution
- Incident closure

Key outputs:
- Incident records
- Incident status communications
- Problem investigation requests
- Change request
- Incident reports
- Updates to the knowledge base
- Restored CIs and services

Figure 3.1 shows a workflow diagram of the process.

Figure 3.1 Workflow of the incident handling and resolution process

Throughout the process, ownership over each incident should be ensured. The ownership may be transferred via the handling and resolution
process, but each incident should have a person responsible for it at any time. Also, stakeholder communications should be updated whenever
there are changes in the status of the incident.

The process may vary significantly, depending on the incident model. Table 3.2 provides descriptions of the activities in two incident models
(manual and automatic), which are just two of many options. They are meant to illustrate the difference between incident models.

Table 3.2 Activities of the incident handling and resolution process

Activity: Incident detection

Manually processed user-detected incidents: A user detects a malfunction in service operation and contacts the service provider’s service desk through the agreed channel(s). A service desk agent performs the initial triage of the user query, confirming that the query does indeed refer to an incident.

Automatically detected and processed incidents: An event is detected by a monitoring system and identified as an incident based on a pre-defined classification.

Activity: Incident registration

Manually processed user-detected incidents: The service desk agent performs incident registration, adding the available data to the incident record.

Automatically detected and processed incidents: An incident record is registered and associated with the CI where the event has been detected. Pre-defined technical data is registered. If needed, a notification is sent to the relevant technical specialists.

Activity: Incident classification

Manually processed user-detected incidents: The service desk agent performs initial classification of the incident; this helps to qualify incident impact, identify the team responsible for the failed CIs and/or services, and link the incident to other past and ongoing events, incidents, and/or problems. In some cases, classification helps to reveal a previously defined solution for this type of incident.

Automatically detected and processed incidents: Based on pre-defined rules, the following is automatically discovered:
- the incident’s impact on services and users
- the solutions available
- the technical team(s) responsible for the incident resolution if automated solutions are ineffective or unavailable.

Activity: Incident diagnosis

Manually processed user-detected incidents: If classification does not provide an understanding of a solution, technical specialist teams perform incident diagnosis. This may involve transfer of the incident between the teams (also known as functional escalation), or joint techniques, such as swarming. If classification is wrong because of an incorrect CI assignment, this information should be communicated to those responsible for configuration control (see the service configuration practice guide).

Automatically detected and processed incidents: If the automated solution is ineffective or unavailable, the incident is escalated to the responsible technical team for manual diagnosis. It may involve transfer of the incident between the teams, or joint techniques, such as swarming. If an automated solution failed because of an incorrect CI association, this information should be communicated to those responsible for the configuration control (see the service configuration practice guide).

Activity: Incident resolution

Manually processed user-detected incidents: When a solution is found, the relevant specialist teams attempt to apply it, working sequentially or in parallel. It may require the initiation of a change. If the solution does not work, additional diagnosis is performed.

Automatically detected and processed incidents: If there is an automated solution available, it is applied, tested, and confirmed. If a manual intervention is required, a relevant specialist team attempts to apply it. It may require the initiation of a change. If the solution proves not to work, additional diagnosis is performed.

Activity: Incident closure

Manually processed user-detected incidents: After the incident is successfully resolved, several formal closure procedures may be needed:
- user confirmation of service restoration
- resolution costs calculation and reporting
- resolution price calculation and invoicing
- problem investigation initiation
- incident review.
After all the required actions are completed and the incident records are updated accordingly, the incident is formally closed. This can be done by the product owner, service owner, incident manager, or service desk agent, depending on the agreed incident model.

Automatically detected and processed incidents: If the automated solution proves effective, incident records are automatically updated and closed. A report is sent to the responsible technical team. If information about the incident has been communicated to other stakeholders at any of the previous steps, the closure of the incident should also be communicated.
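
The automatically detected and processed model in Table 3.2 can be sketched as a short workflow with a fallback to manual diagnosis. Everything named here is an assumption made for illustration: the Incident and IncidentModel types, the handle function, the status strings, and the default "service-desk" team are hypothetical, and a real implementation would also update stakeholders and handle changes.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Incident:
    ci: str
    event_type: str
    status: str = "registered"
    assigned_team: Optional[str] = None
    log: List[str] = field(default_factory=list)


@dataclass
class IncidentModel:
    """Hypothetical incident model: who is responsible, plus an optional fix."""
    responsible_team: str
    # Returns True when the automated solution was applied and confirmed.
    automated_fix: Optional[Callable[[Incident], bool]] = None


def handle(event_type: str, ci: str, models: Dict[str, IncidentModel],
           default_team: str = "service-desk") -> Incident:
    # Registration: create the record and associate it with the CI.
    incident = Incident(ci=ci, event_type=event_type)
    incident.log.append("registered")
    # Classification: look up the pre-defined model for this event type.
    model = models.get(event_type)
    team = model.responsible_team if model else default_team
    # Resolution: try the automated solution if the model provides one.
    if model and model.automated_fix and model.automated_fix(incident):
        incident.log.append("resolved automatically")
        incident.status = "closed"
        return incident
    # Diagnosis: no effective automated solution, so escalate to a team.
    incident.assigned_team = team
    incident.status = "escalated"
    incident.log.append("escalated to " + team)
    return incident
```

Note that the incident always has either a closed status or an assigned team, which reflects the requirement that each incident has an owner at any time.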

3.1.2 Periodic incident review

This process is focused on the continual improvement of the incident management practice, incident models, and incident handling
procedures. It is either performed regularly or triggered by incident reports highlighting inefficiencies and other improvement opportunities.
Regular reviews may take place every two to three months or more frequently, depending on the effectiveness of the existing models and
procedures.

This process includes the activities listed in Table 3.3 and transforms the inputs into outputs.

Table 3.3 Inputs, activities, and outputs of the periodic incident review process

Key inputs:

Current incident models and procedures
Incident records
Incident reports
Policies and regulatory requirements
Configuration information
IT asset information
SLAs with consumers and suppliers/partners
Capacity and performance information
Continuity policies and plans
Security policies and plans

Activities:

Incident review and incident records analysis
Incident model improvement initiation
Incident model update communication

Key outputs:

Updated incident models
Updated incident handling procedures
Incident records
Change requests
Improvement initiatives
Incident review reports

Figure 3.2 shows a workflow diagram of the process.

Figure 3.2 Workflow of the periodic incident review process

Table 3.4 provides a description of the process activities.

Table 3.4 Activities of the periodic incident review process

Activity: Description

Incident review and incident records analysis: The incident manager, together with service owners and other relevant stakeholders, performs a review of selected incidents such as major incidents, those not resolved in time, or all incidents over a certain period. They identify opportunities for incident model and incident handling procedures optimization, including the automation of incident processing and resolution.

Incident model improvement initiation: The incident manager registers the improvement initiatives to be processed with the involvement of the continual improvement practice or initiates a change request (if incident models, procedures, and automation are included within the scope of the change enablement practice).

Incident model update communication: If the incident model is successfully updated, it is communicated to the relevant stakeholders. This is usually done by the incident manager and/or the service or resource owner.
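The selection of incidents for a periodic review (major incidents, those not resolved in time, or all incidents over a certain period) can be sketched as a simple filter. The `Incident` fields and the `select_for_review` function are hypothetical names used for illustration only; a real ITSM tool would expose its own record structure.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Incident:
    incident_id: str
    is_major: bool
    opened: datetime
    resolved: datetime
    target: timedelta  # agreed resolution time for this incident


def breached_target(incident: Incident) -> bool:
    """True when the incident took longer to resolve than agreed."""
    return incident.resolved - incident.opened > incident.target


def select_for_review(incidents, period_start, period_end, review_all=False):
    """Pick incidents opened in the period; by default only major or late ones."""
    in_period = [i for i in incidents if period_start <= i.opened < period_end]
    if review_all:
        return in_period
    return [i for i in in_period if i.is_major or breached_target(i)]
```

Setting `review_all=True` corresponds to reviewing all incidents over the period rather than a selected subset.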

3.2 Value stream contribution

3.2.1 Service value streams

To perform certain tasks or respond to particular situations, organizations create service value streams. These are specific combinations of
activities and practices, and each one is designed for a particular scenario. Once designed, value streams should be subject to continual
improvement.

Definition: Value Stream

A series of steps an organization undertakes to create and deliver products and services to consumers.

In practice, however, many organizations come to use the value stream concept after having worked for a while (sometimes for years)
without the value streams being managed, mapped, or understood. This means that when the importance of the concept becomes clear, the
first step is to understand and map the ‘As Is’ situation, the de-facto flows of work, and to analyse them in order to identify and eliminate the
non-value-adding activities and other forms of waste.

Identifying and understanding the existing value streams is critical to improving an organization’s performance. Structuring the organization’s
activities in the form of value streams allows it to have a clear picture of what it delivers and how, and to make continual improvements to its
services.

Combined, organizations’ value streams form an operating model which can be used to understand and improve how the organization creates
value for the stakeholders.

Many organizations have been following best practice recommendations for various service management practices, such as incident
management, change enablement, software development, and many others. Incident management is one of the most adopted and mature
practices; organizations often start their ITSM journey with incident management.

However, the practices have often been adopted and organized in a siloed, isolated manner, just as they were presented in the service
management bodies of knowledge. In reality, a flow of work required to create or restore value, for a customer or another stakeholder, is almost
never limited to one practice.

The incident management practice is not enough to restore normal service after it has been interrupted. The real-life workflow may include the
activities outlined in Table 3.5, which are described as parts of different practices.

Table 3.5 Management practices in the incident resolution value stream

Activity Practice

Incident detection Service desk (for user-reported incidents)

or
Monitoring and event management
Incident management

Incident registration Incident management

Incident classification

Incident diagnosis Incident management

Knowledge management
Problem management

Incident resolution Incident management and one or more of

Problem management
Change enablement
Software development and management
Service validation and testing
Deployment management
Release management
Service desk
Infrastructure and platform management
Supplier management

Incident closure Incident management

Service desk
Monitoring and event management
Problem management
Knowledge management
Relationship management

The incident management practice is core for this value stream, but it is not enough to complete the value stream and restore value co-creation.

ITIL 4 recommends that organizations examine how they perform work and map all the value streams they can identify. This will enable them to
analyse their current state and identify any barriers to workflow and non-value-adding activities (waste). Wasteful activities should be
eliminated to increase productivity.

Opportunities to increase value-adding activities can be found across the service value chain. These may be new activities or modifications to
existing ones, which can make the organization more productive. Value stream optimization may include process automation or adoption of
emerging technologies and ways of working to gain efficiencies or enhance user experience.

Value streams should be defined by organizations for all their products and services. Depending on the organization’s strategy, value streams
can be redefined to react to changing demand and other circumstances, or remain stable for a significant amount of time. In any case, they
should be continually reviewed and improved to ensure that the organization achieves its objectives in an optimal way.

3.2.2.2 Incident management in other service value streams

The main and most obvious value stream involving incident management is described in section 3.2.2.1. Unlike most other practices, incident management is rarely involved in other value streams. Incidents occurring in other value streams trigger the value stream to restore normal operation, rather than involving the incident management practice in their own context. For example, if an incident occurs during a new product release, it triggers the value stream to restore normal operation, while the release-related value stream continues, most likely by rolling back the unsuccessful changes. Similarly, if an incident occurs during fulfilment of a service request, it does not involve incident management in the ongoing request fulfilment workflow; instead, it triggers the value stream to restore normal operation, while the request-related value stream continues or restarts.

However, some organizations come up with operating models where incident management is involved in other value streams. The examples
include:

Involving the incident management practice to deal with unplanned events in development, testing, and other pre-live environments.
Although these events do not impact live services and don’t have a direct business impact, they can be processed using the same or similar
processes, competencies, tools and third parties: in other words, the same practice. In most cases, people involved in the related workflows are
different from those involved in management of incidents in the live environment.

Separating the restoration value streams for incidents detected by users and incidents detected by monitoring. The former value stream would
be initiated by users contacting the service desk and focused on restoring the services to an agreed level and to the users’ expectations. The latter
value stream would be triggered by events captured by the monitoring systems and focused on restoring the components and services to an
agreed technical specification, preventing any negative impact on the live services and their users.

There is no single operating model fitting all organizations. Different solutions work for different organizations, involving different value streams
which in turn involve different management practices.
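The second operating model above, with separate restoration value streams for user-detected and monitoring-detected incidents, can be sketched as a lookup keyed by detection source. The enum, the dictionary keys, and the wording of the targets are illustrative assumptions drawn from the description, not a prescribed design.

```python
from enum import Enum


class Source(Enum):
    USER = "user"
    MONITORING = "monitoring"


# Each detection source starts its own restoration value stream.
RESTORATION_STREAMS = {
    Source.USER: {
        "trigger": "user contacts the service desk",
        "restore_to": "agreed service level and user expectations",
    },
    Source.MONITORING: {
        "trigger": "event captured by monitoring systems",
        "restore_to": "agreed technical specification",
    },
}


def select_stream(source: Source) -> dict:
    """Return the restoration value stream definition for a detection source."""
    return RESTORATION_STREAMS[source]
```

Keeping the two definitions side by side makes the difference in triggers and restoration targets explicit when the streams are designed or reviewed.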

3.2.3 Analysing a service value stream

3.2.3.1 The key steps of a service value stream analysis

The following are some simple and practical recommendations for service value stream analysis and mapping:

1. Identify the scope of the value stream analysis. It can be mapped to a particular product or service or applied to most or all of them. Similarly, service value streams may differ for different consumers; for example, incidents can be solved and communicated differently for internal and external customers, or for B2B and B2C products, or for services based on products developed in-house or sourced externally.

2. Define the purpose of the value stream from the business standpoint. Make sure the stakeholders’ concerns are clearly understood, since they are the ones defining value. In the case of incident management, it is usually the user who needs to return to normal work as soon as possible; however, there are usually other interested parties. For example, internal users may be unable to provide normal service to a business customer because of the incident, and the value of the value stream should be considered from the business perspective, not solely from the user perspective.

3. Do the service value stream walk. Walk through or directly experience the steps and information flow as they happen in practice (consider the Lean technique of the Gemba walk):

a. Identify the workflow steps.

b. Collect data as you walk.

c. Evaluate the workflow steps. Typically, the criteria for evaluation are:

value for the stakeholder (does the step add value for the business stakeholder?)

effectiveness or performance (is the step performed well?)

availability (are required resources available to execute the step?)

capacity (are required resources enough?)

flexibility (are the required resources interchangeable within the step?)

d. Map the activities and the information flows. In an ideal situation, the flow goes smoothly without delays and pauses, there are no disconnections between the steps, and the workload is level with minimal (and agreed) variation.

e. Create and review the timeline and resource level. Map out process times and lead times for resources and workload through the workflow steps.

4. Reflect on the value stream map (VSM). Identify factors that might not have been entirely apparent at first. The information collected is used at this step to find the waste.

5. Create a ‘to be’ VSM. This informs and drives improvement. The value stream should be considered holistically to ensure end-to-end efficiency and value creation, not just local improvements.

6. Using the ‘to be’ VSM, plan improvements. Refer to the continual improvement practice guide for a practical improvement model.
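The timeline step of the analysis (mapping process times and lead times through the workflow steps) lends itself to a small worked calculation. The sketch below assumes each step has been measured as active working time plus waiting time before the step starts. The `Step` class, the function names, and the flow efficiency ratio (process time divided by lead time, a common Lean measure) are assumptions added for illustration and are not defined in this guide.

```python
from dataclasses import dataclass


@dataclass
class Step:
    name: str
    process_minutes: float  # time actively spent working on the step
    wait_minutes: float     # time the work waited before the step started


def process_time(steps) -> float:
    """Total time actively spent across all steps."""
    return sum(s.process_minutes for s in steps)


def lead_time(steps) -> float:
    """Total elapsed time: active work plus waiting."""
    return sum(s.process_minutes + s.wait_minutes for s in steps)


def flow_efficiency(steps) -> float:
    """Share of the lead time spent on active work (0.0 if nothing was measured)."""
    total = lead_time(steps)
    return process_time(steps) / total if total else 0.0


def longest_wait(steps) -> Step:
    """The step preceded by the longest wait, a first candidate when looking for waste."""
    return max(steps, key=lambda s: s.wait_minutes)
```

For example, with hypothetical measurements of 100 minutes of active work and 140 minutes of waiting across the incident handling steps, the lead time is 240 minutes and roughly 42% of it is spent on active work.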

3.2.3.2 Incident management considerations in a service value stream analysis

To ensure that relevant incident management activities are included in service value streams, the following steps can be added to the above
recommendations.

At the scoping step (1), identify the IT and business services related to the value stream and the involved business stakeholders. For
example, when an IT service provider delivers IT services consumed by business users who in turn provide services to the business clients,
should the incident-related service value stream involve restoration of normal business services for the clients, or should it be limited to
the restoration of normal IT services for the business users?

Make sure the value stream is understood (step 2) from the standpoint of the business, not only of the service provider.

During the service value stream walk (3a), identify other practices involved in dealing with incidents at every step. Which practices provide
required information (configuration data, asset data, previously identified solutions, agreed timeline for the service restoration…)? What if
the incident resolution requires changes? What if incident diagnosis and/or resolution involves third parties?

During the workflow steps evaluation (3c), evaluate the step’s impact on the value restoration. Special attention should be paid to steps
with low business value, low performance, and availability or capacity issues. It is not unusual to find steps which serve some internal
control or bureaucratic purposes but delay the incident resolution.

At the reflection and planning steps (4-5), ensure that the incident management flow is optimized for business value throughout the
stream, not only at the incident management practice activities.

Include creation or update of incident models (see sections 2.2.1 and 3.1.2) in the value stream improvement plans (step 6).

4. Organizations and people

4.1 Roles, competencies, and responsibilities

The practice guides do not describe the practice management roles such as practice owner, practice lead, or practice coach. They focus instead
on the specialist roles that are specific to each practice. The structure and naming of each role may differ from organization to organization, so
any roles defined in ITIL should not be treated as mandatory, or even recommended.

Remember, roles are not job titles. One person can take on multiple roles and one role can be assigned to multiple people.

Roles are described in the context of processes and activities. Each role is characterized with a competency profile based on the model shown
in Table 4.1.

Table 4.1 Competency codes and profiles

Competency code: Competency profile (activities and skills)

L: Leader. Decision-making, delegating, overseeing other activities, providing incentives and motivation, and evaluating outcomes

A: Administrator. Assigning and prioritizing tasks, record-keeping, ongoing reporting, and initiating basic improvements

C: Coordinator/communicator. Coordinating multiple parties, maintaining communication between stakeholders, and running awareness campaigns

M: Methods and techniques expert. Designing and implementing work techniques, documenting procedures, consulting on processes, work analysis, and continual improvement

T: Technical expert. Providing technical (subject matter) expertise and conducting expertise-based assignments

4.1.1 Incident manager role

In many organizations, the incident manager role is performed by a dedicated person, sometimes under the incident manager job title. In
other organizations, the responsibilities of an incident manager are taken by the person or team responsible for the CI, service, or product with
which the incident is associated; this may be the resource owner, service owner, or product owner.

This role is typically responsible for:

the coordination of incident handling in the organization or in a specific area, such as territory, product, or technology, depending on the
organizational design

coordinating manual work with incidents, especially those involving multiple teams

monitoring and reviewing the work of teams that handle and resolve incidents

ensuring sufficient awareness of the incidents and their status across the organization

conducting regular incident reviews and initiating improvements of the incident management practice, the incident models, and the
incident handling procedures

developing the organization’s expertise in the processes and methods of the incident management practice.

In some cases, organizations may introduce an additional role of the major incident manager (MIM). This role has similar responsibilities to the
incident manager but focuses exclusively on major incidents. This role becomes the main point of contact and coordination during major
incidents. The MIM usually has wider authority and may have dedicated resources for major incident management.

The competency profile for these roles is CMAT, though the importance of each of these competencies varies from activity to activity.
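A competency profile such as CMAT is simply a sequence of the codes from Table 4.1. As a small illustration, the sketch below expands a profile string into the corresponding competency names; the dictionary and function names are hypothetical.

```python
# Competency codes as listed in Table 4.1.
COMPETENCY_CODES = {
    "L": "Leader",
    "A": "Administrator",
    "C": "Coordinator/communicator",
    "M": "Methods and techniques expert",
    "T": "Technical expert",
}


def decode_profile(profile: str) -> list:
    """Expand a profile such as 'CMAT' into competency names, keeping the order."""
    unknown = [code for code in profile if code not in COMPETENCY_CODES]
    if unknown:
        raise ValueError(f"Unknown competency codes: {unknown}")
    return [COMPETENCY_CODES[code] for code in profile]
```

Calling `decode_profile("CMAT")` lists the coordinator/communicator, methods and techniques expert, administrator, and technical expert competencies, in that order.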

4.1.2 Other roles involved in incident management activities

Examples of other roles which can be involved in incident management activities are listed in Table 4.2, together with the associated
competency profiles and specific skills.

Table 4.2 Examples of roles with responsibility for incident management activities

Each activity below lists the responsible roles, the competency profile, and the specific skills.

Incident handling and resolution process

Incident detection
Responsible roles: Technical specialist, User
Competency profile: TC
Specific skills: Understanding of the service design, resource configuration, and business impact of events and symptoms

Incident registration
Responsible roles: Incident manager, Service desk agent, Technical specialist
Competency profile: AT
Specific skills: Good knowledge of IT service management (ITSM) tools and procedures

Incident classification
Responsible roles: Incident manager, Service desk agent, Technical specialist
Competency profile: TC
Specific skills: Understanding of the service design, resource configuration, and business impact; Good knowledge of requirements and commitments for incident resolution; Good knowledge of incident models

Incident diagnosis
Responsible roles: Supplier, Technical specialist
Competency profile: TC
Specific skills: Understanding of the service design, resource configuration, and business impact; Knowledge of incident models, diagnostic tools, methods; Analytical skills

Incident resolution
Responsible roles: Supplier, Technical specialist, User
Competency profile: T
Specific skills: Understanding of methods and procedures required for incident resolutions

Incident closure
Responsible roles: Incident manager, Service desk agent, Technical specialist
Competency profile: ACT
Specific skills: Understanding of the service design, resource configuration, and business impact; Good knowledge of the requirements and commitments for incident resolution

Incident review and incident records analysis
Responsible roles: Incident manager, Product owner, Service owner, Supplier
Competency profile: TCL
Specific skills: Understanding of the service design, resource configuration, and business impact; Good knowledge of the requirements and commitments for incident resolution; Knowledge of incident models, diagnostic tools, methods, and analytic skills

Incident model improvement initiation
Responsible roles: Incident manager, Product owner, Service owner
Competency profile: TMC
Specific skills: Understanding of the service design, resource configuration, and business impact; Good knowledge of the requirements and commitments for incident resolution; Knowledge of incident models, diagnostic tools, and methods; Knowledge of the organization's continual improvement and change enablement practices

Incident model update communication
Responsible roles: Incident manager, Product owner, Service desk agent, Service owner
Competency profile: CA
Specific skills: Knowledge of communication procedures and tools

4.2 Organizational structures and teams

The organizational structure and size of an organization influence how the incident management practice is performed and how it is integrated
in the organization’s value streams. Incident management involves specialists with different areas and levels of expertise; these specialists may
belong to different organizational teams. Typical methods of grouping specialists include, among others:

technical domain

product/service

territory

consumer types.

The method of organization will vary, depending on the organization’s needs and resources. The incident management practice should take a
flexible approach to its organization, involving resources from various internal and external teams as necessary. Either way, it is crucial to ensure
effective cooperation between members of different teams involved in handling and resolution of incidents.

4.2.1 Tiered versus flat team structures

Historically, teams working on incidents had a tiered or levelled structure in which competency, expertise, and specialization increased with
each level. It aimed to resolve most of the incidents at the lowest level possible to reduce costs. Incidents were transferred to the upper level, or
escalated, if they could not be resolved at the current level. In such teams, there were clear boundaries between levels and clear procedures for
the escalation of incidents. Unfortunately, such structures can restrain collaboration and information flow, resulting in prolonged resolution
time. So, for high-priority incidents, teams collaborate to facilitate speedy resolution.

The expansion of Agile methods and evolution of IT systems (such as self-healing systems) call for the wider use of horizontal team structures,
rather than hierarchical team structures. Flatter structures and respective collaboration methods, such as swarming, replace tiered ones to
facilitate cooperation and the free flow of information. The main driver of such change is the rejection of rigid tiering and its replacement by a
more dynamic, self-organized collaboration.

4.2.2 Team dynamics

Team dynamics are a foundation of the incident management practice, because they affect the functioning of the support operation. The
following issues regularly recur:

incidents are bounced between teams

team members experience a lack of autonomy and report being blocked by others

a culture prevails where lone ‘heroes’ are rewarded when incidents are solved.

This leads to numerous negative effects, such as:

the incident management practice being out of sync

resolutions happening slowly or not at all

a decrease in morale

a lack of motivation

an unhealthy degree of competitiveness entering the workplace.

Furthermore, trust between team members breaks down. Approaches such as DevOps and techniques such as swarming show some of the
characteristics needed to encourage a positive culture, although it is not necessary to follow these approaches to achieve the correct team
dynamic. The following three main areas need to be addressed.

4.2.2.1 Collective responsibility

If resolving incidents is the primary responsibility, that is what individuals within the teams will focus on, and team dynamics will come second
to achieving the SLA or meeting a deadline. The first step in changing this is to build a culture where team members share successes and
failures. Teams that share responsibility may have a single person who sees an incident through to resolution, but they should be encouraged
to engage other experienced people in the process. When this occurs, the organization will benefit from a fast restoration of normal service as
well as knowledge-sharing.

4.2.2.2 No-blame culture

There should be a no-blame culture within teams; without one, trust between individuals, teams, and
suppliers deteriorates. Incident investigations and reviews need to address incident resolution and service restoration. Incident teams must be encouraged
to act without fear of retribution if their idea fails to work. This requires transparency and positive leadership. Mistakes should be treated as
shared learning opportunities rather than personal failures.

4.2.2.3 Continual learning

Team members need to share the lessons that they have learned from experimenting so they can learn and improve. This can prove to be a
significant cultural leap in many environments, particularly those with a large percentage of outsourcing.

5. Information and technology

5.1 Information exchange

The effectiveness of the incident management practice is based on the quality of the information used. This includes, but is not limited to,
information about:

customers and users

architecture and design of services

partners and suppliers, including contract and SLA information on the services they provide

policies and requirements which regulate service provision

stakeholder satisfaction with the practice.

This information may take various forms, depending on the incident models in use. The key inputs and outputs of the practice are listed in
chapter 3.

Details of incidents are the most important pieces of information. These usually include:

sources of information

a reference to the product, service, or CI that is failing or performing below standard

the impacted users or services

the symptoms of the poor performance

when the symptoms are observed

the last known time of correct operation before the symptoms began

whether an automatic fix was applied (and if not, the reason)

the location, both geographic and virtual

the nature and extent of the impact on normal operations

similar systems which might be affected by the poor performance but are currently operating normally

the sequence of events leading up to the observation of the symptom.

Additional information that will be exchanged and recorded during the incident management practice should include details of:

the investigation

every action taken, including the results.

Any actions taken should be documented to produce an accurate timeline. If it is not practical to document actions in real time, the
documentation should specify when the action was started and completed to avoid the creation of a false history log. It is preferable, however,
to capture real-time actions if the customer can see the information through a portal. Where possible, the registration of actions should be
automated.
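The incident details and the action log described above can be sketched as a pair of record types. The field names below are illustrative assumptions that follow the listed details; the point of the sketch is that each action keeps its own start and completion times separately from the time it was recorded, so that entries written after the fact do not create a false history.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional


@dataclass
class Action:
    description: str
    result: str
    started: datetime
    completed: datetime
    recorded_at: datetime  # when the entry was written, which may be later


@dataclass
class IncidentDetails:
    source: str
    affected_item: str              # product, service, or CI
    symptoms: str
    observed_at: datetime
    last_known_good: Optional[datetime]
    automatic_fix_applied: bool
    location: str
    actions: List[Action] = field(default_factory=list)

    def timeline(self) -> List[Action]:
        """Actions ordered by when they actually started, not when they were logged."""
        return sorted(self.actions, key=lambda a: a.started)
```

Sorting by `started` rather than `recorded_at` yields an accurate timeline even when some actions were documented only after the incident was resolved.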

5.2 Automation and tooling

The incident management practice can significantly benefit from automation. The term automation is used in this and other ITIL publications
to refer to the use of digital technology to enable, support, or enhance various activities. This includes, but is not limited to, the full automation
of activities where technology solutions remove the need for human intervention. Table 5.1 provides a list of the key automation tools supporting the
practice and their most common applications.

Table 5.1 Automation solutions for the incident management practice

Automation tools: Application in incident management

Monitoring and event management tools: Detection of incidents; Analysis of trends and events during incident diagnosis; Confirmation of incident resolution

Workflow management and collaboration tools (including user query (‘ticket’) management tools): Management of incident lifecycle; Support and automation of incident models; Communications between specialists involved in incident handling and resolution; Integration of the practices into service value systems

Knowledge management tools: Classification and assignment of incidents, identification of known incident solutions

Service configuration management tools: Incident classification and diagnosis

Classification and analysis tools, including ML-enhanced: Incident classification and analysis

Remote administration, diagnosis, deployment, and other infrastructure and software management tools: Incident diagnosis and resolution

Work planning and prioritization tools: Planning and tracking of improvement initiatives

Analysis and reporting tools: Practice measurement and reporting

Survey tools: Collection of feedback for practice improvement

Detailed descriptions of how these tools support the practice’s activities are outlined in Table 5.2.

In some cases, all activities after a particular activity in the incident handling and resolution process can be fully automated using pre-defined
scripts and scenarios for specific types of incidents.

Note that automation tools used in the incident management practice could include not only organization-wide tools, which are valid for all
incidents, but also some local custom tools and scripts created as a result of a periodic incident review process for specific incident models.
Both should be used to drive automation efforts.
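The idea of pre-defined scripts for specific types of incidents can be sketched as a registry that maps an incident type to an automated solution. Everything named here (the `restart_service` stand-in, the incident type key, the returned status strings) is hypothetical; the sketch only mirrors the flow described in this guide: apply the automated solution if one exists, close the record if it proves effective, and fall back to further diagnosis or manual handling otherwise.

```python
def restart_service(incident: dict) -> bool:
    """Stand-in for a real remediation script; returns True if the fix is confirmed."""
    return incident.get("restart_succeeds", False)


# Registry of automated solutions, keyed by incident type (illustrative entry).
AUTOMATED_SOLUTIONS = {
    "service_unresponsive": restart_service,
}


def handle_incident(incident: dict) -> str:
    """Apply an automated solution when one is registered for the incident type."""
    script = AUTOMATED_SOLUTIONS.get(incident["type"])
    if script is None:
        return "manual_handling"       # no script for this incident model
    if script(incident):
        return "closed_automatically"  # solution applied, tested, and confirmed
    return "additional_diagnosis"      # solution did not work
```

Local scripts produced by a periodic incident review could be added to the same registry alongside organization-wide ones.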

Table 5.2 Details of automation of the incident management activities

Each process activity below lists the means of automation, the key functionality, and the impact.

Incident handling and resolution process

Incident detection
Means of automation: Monitoring and event management tools
Key functionality: Early detection and correlation of incidents; initiating the incident management practice
Impact: High

Incident registration
Means of automation: Workflow management and collaboration tools, including user query ('ticket') management tools
Key functionality: Efficient registration of incidents
Impact: High

Incident classification
Means of automation: Workflow management and collaboration tools, including user query ('ticket') management tools; Knowledge management tools; Service configuration management tools; Classification and analysis tools
Key functionality: Fast and correct classification and assignment of the incidents, identification of known solutions, identification of major incidents
Impact: Very high, especially when the number of incidents is high

Incident diagnosis
Means of automation: Workflow management and collaboration tools, including user query ('ticket') management tools; Knowledge management tools; Service configuration management tools
Key functionality: Fast and correct definition and testing of hypotheses, effective collaboration of multiple specialists/teams
Impact: High, especially when the number of complex incidents requiring manual collaboration efforts is high

Incident resolution
Means of automation: Remote administration, diagnosis, deployment, and other infrastructure and software management tools
Key functionality: Fast correction of the faulty CIs and restoration of the services
Impact: High, especially when services are provided in remote locations

Incident closure
Means of automation: Workflow management and collaboration tools, including user query ('ticket') management tools
Key functionality: Fast and comprehensive overview of the incident lifecycle
Impact: Medium

Periodic incident review process

Incident review and incident records analysis
Means of automation: Analysis and reporting tools; Workflow management and collaboration tools; Survey tools
Key functionality: Remote collaboration, incident data analysis, and user survey data analysis and reports
Impact: Medium to high, especially for high volumes of incidents

Incident model improvement initiation
Means of automation: Workflow management and collaboration tools
Key functionality: Registration and tracking of the improvement initiatives
Impact: Low to medium

Incident model update communications
Means of automation: Workflow management and collaboration tools
Key functionality: Communicating updates to the relevant teams
Impact: Medium to high, especially when the organization is large and the number of updates is high

5.2.1 Recommendations for automation of incident management

The following recommendations can help when applying automation to incident management:

Automate the value stream Although incident management is often one of the first practices to be developed by a service provider, the
implementation of ITSM automation systems also often starts with the incident management processes. Even if other practices may not
be mature at this stage, it is important to define requirements and design workflows that will support the full value stream, from
detection, to resolution of incidents. For incident resolution that requires changes, the automation tool should allow for a simple change
tracking workflow; for recurring incidents, it should be possible to capture and reuse of proven solutions. Think and work holistically.

Allow different workflows for user- and event- initiated incidents Detection, classification, communications, and conditions for closing
a record are all handled differently for user-initiated and event-initiated incidents, even if the latter are handled manually. Attempts to fit
both types of incidents in one workflow with the same forms and business logic are unlikely to be successful. The handling of event-
generated incidents can and should be automated.

Do not overcomplicate the workflows and business rules Forms filled in manually should be user-friendly and should not take much
time to fill in. When designing user journeys and interfaces, treat IT support teams as you would treat external users whose expectations
are based on their experience with mobile apps and modern web sites.

Pay attention to measurement and reporting from the beginning Incident management is a high-load practice, and it is not possible to
monitor the status of incidents and the performance of the practice without a convenient dashboard; it is impossible to understand the
trends and to analyse the work of teams without a flexible reporting engine. The popular statement ‘you cannot manage what you don’t
measure’ is not always true, but it certainly applies to large amounts of data, and the incident management practice generates large
amounts of data.
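
To illustrate the kind of figures a reporting engine might derive from incident records, here is a small sketch that computes the mean time to resolve and the share of incidents resolved within their target time. The `ResolvedIncident` record and its fields are assumptions made for the example; the metrics an organization actually reports should follow its own agreements.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import mean


@dataclass
class ResolvedIncident:
    opened: datetime
    resolved: datetime
    target: timedelta  # agreed target resolution time (assumed stored per record)


def resolution_report(incidents: list[ResolvedIncident]) -> dict[str, float]:
    """Summarize resolution performance for a list of resolved incidents."""
    if not incidents:
        return {"count": 0, "mean_hours_to_resolve": 0.0, "share_within_target": 0.0}
    durations = [i.resolved - i.opened for i in incidents]
    within = sum(1 for i, d in zip(incidents, durations) if d <= i.target)
    return {
        "count": len(incidents),
        "mean_hours_to_resolve": mean(d.total_seconds() for d in durations) / 3600,
        "share_within_target": within / len(incidents),
    }
```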

Allow for swarming and other forms of cross-team collaboration Some incident management tools are designed for a linear flow and
transfer of incident records between the teams. When a joint action is required, it is often unsupported; specialists meet and work
together, but the incident records do not reflect it. Design the tool for collaborative and non-linear workflows.

Communications are important Informing people about incidents, both on the service consumer side and within the service provider, is
a crucial part of incident management. Relevant and proactive communications significantly reduce work duplication and optimize the
resources of the incident management and service desk practices.

Leverage machine learning capabilities Incident detection, matching, classification, and prioritization can be enhanced or fully
automated using machine learning. Effective use of machine learning requires high-quality data and effective integration with various
sources of information. If used properly, it can significantly improve the incident management practice.

6. Partners and suppliers

Very few services are delivered using only an organization’s own resources. Most, if not all, depend on other services, often provided by third
parties outside the organization (see section 2.4 of ITIL Foundation: ITIL 4 Edition for a model of a service relationship). Relationships and
dependencies introduced by supporting services are described in the practice guides for service design, architecture management, and
supplier management.

Partners and suppliers may support the development, management, and execution of the incident management practice. The forms of
support include the following:

Performing incident management activities Some incident management activities can be largely or completely performed by a
specialized supplier. Third parties are often involved in incident diagnosis and resolution, and sometimes in other activities. It is important
to ensure effective integration of the third parties in the incident-related workflows and information exchange, as well as their adherence
to relevant policies. Incident models should define how third parties are involved in incident resolution and how the organization ensures
effective collaboration. This will depend on the architecture and design solutions for products, services, and value streams. Nonetheless,
the optimization of incident models supporting these solutions will involve the incident management practice. Generally, after the correct
model is selected for an incident, further consideration of third-party dependencies is needed during incident diagnosis, resolution, and
review. Defined standard interfaces may become an easy way to communicate the necessary conditions and requirements for a supplier
to become a part of the organization’s ecosystem. Such interface description may include rules of data exchange, tools, and processes that
will create a common language in the multi-vendor environment. Where organizations aim to ensure fast and effective incident
resolution, they usually try to agree close cooperation with their partners and suppliers, removing formal bureaucratic barriers in
communication, collaboration, and decision-making (see the supplier management practice guide for more information).

Provision of software tools Most software tools used for incident management are shared with other practices. However, implementation
and use of integrated service management information systems often starts with automating incident management (and service desk)
activities. In this case, the owner of the incident management practice and the managers of the teams involved in incident management
should define requirements and interact with other teams and practices of the service provider to ensure that the required tools are
procured, implemented, and used in an optimal way.

Consulting and advisory Specialized suppliers who have developed expertise in incident management can help establish and develop
practices, adopt methods and techniques (such as swarming), and initially develop incident models.

7. Capability assessment and development

7.1 The practice capability levels

The practice success factors described in section 2.4 cannot be developed overnight. The ITIL maturity model defines the following capability
levels applicable to any management practice:

Level 1 The practice is not well organized; it’s performed as initial or intuitive. It may occasionally or partially achieve its purpose through an
incomplete set of activities.

Level 2 The practice systematically achieves its purpose through a basic set of activities supported by specialized resources.

Level 3 The practice is well defined and achieves its purpose in an organized way, using dedicated resources and relying on inputs from other
practices that are integrated into a service management system.

Level 4 The practice achieves its purpose in a highly organized way, and its performance is continually measured and assessed in the context of
the service management system.

Level 5 The practice is continually improving organizational capabilities associated with its purpose.

For each practice, the ITIL maturity model defines criteria for every capability level from level two to level five. These criteria can be used to
assess the practice’s ability to fulfil its purpose and to contribute to the organization’s service value system.

Each criterion is mapped to one of the four dimensions of service management and to the supported capability level. The higher the capability
level, the more comprehensive the expected realization of the practice. For example, criteria related to practice automation are typically
defined at level 3 or higher because effective automation is only possible if the practice is well defined and organized.
Figure 7.1 Design of the capability criteria

This approach results in every practice having up to 30 capability criteria based on the practice PSFs and mapped to the four dimensions of
service management. The number of criteria at each level differs; the four dimensions are comprehensively covered starting from level 3, so this
level typically has more criteria than others.

Table 7.1 outlines the capability criteria that are defined in the ITIL maturity model for the incident management practice.

Table 7.1 Incident management capability criteria

PSF | Criterion | Dimension | Capability level
--- | --- | --- | ---
Detecting incidents early | Incidents are usually detected immediately after they occur | Value streams and processes | 2
Detecting incidents early | Incident detection is automated, where relevant | Information and technology | 2
Detecting incidents early | The users and other relevant stakeholders know how to report incidents and report them as soon as possible | Organizations and people | 2
Detecting incidents early | Incident detection is integrated into the relevant value streams | Value streams and processes | 3
Detecting incidents early | Third-party incidents are detected and reported as soon as possible | Partners and suppliers | 3
Detecting incidents early | Information about detected incidents is traced and managed in an integrated information system | Information and technology | 3
Detecting incidents early | The effectiveness of incident detection is measured and reported | Value streams and processes | 4
Detecting incidents early | The effectiveness of incident detection is regularly reviewed and continually improved | Value streams and processes | 5
Resolving incidents quickly | Incidents are usually resolved in the quickest possible way | Value streams and processes | 2
Resolving incidents quickly | Incidents are usually resolved within the agreed target resolution times | Value streams and processes | 2
Resolving incidents quickly | The resolution of incidents is standardized, where relevant | Value streams and processes | 3
Resolving incidents quickly | The competencies required to resolve incidents are identified and skilled human resources are available | Information and technology | 3
Resolving incidents quickly | The third-party dependencies affecting incident resolution are identified and third-party resources are available, where relevant | Partners and suppliers | 3
Resolving incidents quickly | Information about incident resolution is tracked and managed in an integrated information system | Information and technology | 3
Resolving incidents quickly | Incident resolution is optimized for the complexity of the environment | Value streams and processes | 4
Resolving incidents quickly | Incident resolution is integrated into the relevant value streams | Value streams and processes | 4
Resolving incidents quickly | The effectiveness of incident resolution is measured and reported | Value streams and processes | 4
Resolving incidents quickly | The effectiveness of incident resolution is regularly reviewed and continually improved | Value streams and processes | 5
Continually improving incident management | The approach to incident management is defined, discussed, and agreed at the relevant level of the organization | Value streams and processes | 3
Continually improving incident management | The responsibility for the approach to incident management is clearly defined | Value streams and processes | 3
Continually improving incident management | The competencies required for performing incident management are identified and skilled human resources are available | Organizations and people | 3
Continually improving incident management | The incident management approach is integrated with other standards and approaches adopted by the organization | Value streams and processes | 4
Continually improving incident management | The effectiveness of the incident management approach is measured and reported | Value streams and processes | 4
Continually improving incident management | The incident management approach is regularly reviewed and continually improved | Value streams and processes | 5

These capability criteria can be used by organizations for self-assessment and improvement of the practice.

7.2 Capability self-assessment

The self-assessment can be conducted by the service provider’s internal audit team, if the service provider has one, or by the respective team of
the parent organization. If there is no specialized team in the organization, the assessment can be done by a team of practice owners and
managers responsible for other management practices of the service provider, or a mixed team of the service provider’s executive leaders and
managers.

To perform a quick self-assessment using the capability criteria, the following rules should be followed.

1. Start with the level 2 criteria. Based on the knowledge of your organization, answer the question, ‘Is this a valid description of our
organization in MOST cases?’

2. If the answer to the question above is ‘yes’, make a list of at least three types of material evidence that could prove the answer.
These can be records, documents, or interviews with business stakeholders or the service provider’s employees.

3. If the answer is ‘yes’ to all criteria of level 2, this level is considered achieved. Proceed to the criteria of level 3.

4. If not all criteria of level 2 are met, the practice is considered to be at level 1. Focus on the criteria that are not met: what is missing
in the organization? Why? How can it affect the service consumer and the quality of the IT services? What can be done to meet
the criteria that are not currently met?

5. The same approach is applied at each subsequent level; the practice is considered to be at the highest level for which all criteria,
and those of every lower level, are met. It is important to focus on the missing capabilities and improvement opportunities, rather
than on a formal achievement of a high capability level.
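
The rules above can be sketched as a short function. This is only an illustration under stated assumptions: the `answers` structure (a mapping from capability level to a list of yes/no answers, one per criterion) and the function name `assess_capability` are hypothetical, and a level with no recorded answers is treated as not yet assessed.

```python
def assess_capability(
    answers: dict[int, list[bool]],
) -> tuple[int, list[tuple[int, int]]]:
    """Apply the quick self-assessment rules.

    `answers` maps a capability level (2 to 5) to yes/no answers, one per
    criterion defined at that level. Returns the achieved level and the
    unmet criteria, as (level, criterion index) pairs, at the first level
    that is not fully met.
    """
    achieved = 1  # the practice is at level 1 until all level 2 criteria are met
    for level in range(2, 6):
        results = answers.get(level, [])
        missing = [(level, idx) for idx, ok in enumerate(results) if not ok]
        if not results or missing:
            # Stop at the first level that is unassessed or has unmet criteria.
            return achieved, missing
        achieved = level
    return achieved, []
```

The returned list of unmet criteria is the useful part of the result: it points at the missing capabilities to work on, rather than at the level number alone.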

7.3 Incident management capability development

Management practices should support achievement of the organization’s objectives and enable creation of value for the stakeholders.
Depending on the service provider’s strategy, positioning, and business and operating models, some practices may be more important and
therefore require a higher level of capability. No organization requires all management practices to be at capability level 5.
A higher capability level provides higher assurance of the fulfilment of the practice’s purpose, but it comes at a cost: the cost of management,
automation, and training, for example. To achieve optimal performance with a sufficient level of assurance, organizations should define a target
capability level for each management practice.
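
As a small illustration of working with target capability levels, the sketch below compares assessed levels with targets and lists only the practices that fall short. The function name, the input dictionaries, and the choice to treat an unassessed practice as level 1 are assumptions made for the example.

```python
def capability_gaps(current: dict[str, int], target: dict[str, int]) -> dict[str, int]:
    """Return practices whose current level is below target, with the gap size.

    A practice with no assessed level is assumed to be at level 1.
    """
    return {
        practice: target_level - current.get(practice, 1)
        for practice, target_level in target.items()
        if current.get(practice, 1) < target_level
    }
```

For example, with hypothetical inputs `current = {"incident management": 2}` and `target = {"incident management": 4}`, the function reports a gap of 2 levels for incident management; practices already at or above target are omitted.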

Figure 7.2 and table 7.2 show the capability development model, which can be applied to every management practice. The structure of this
publication is aligned with the development steps.

Figure 7.2 The capability development steps and levels

Table 7.2 The incident management capability development steps

Capability level | Define, agree, and implement | Comment for incident management | Chapter (for recommendations)
--- | --- | --- | ---
2 | Purpose and objectives | Key stakeholder groups; types of incidents | 2.1
2 | Scope | | 2.3
2 | Processes and activities | Workflows; incident prioritization; roles and responsibilities; automation and information exchange | 3
2 | Roles and responsibilities | |
2 | Tools and procedures | | 5
3 | Dependencies and integration | Use of an integrated information system | 5
3 | | Suppliers and other parties involved in incident management | 6
4 | Measurement and reporting | Metrics | 2.5
5 | Continual improvement | Regular review of practice and the incident management capability development | 2.4, 2.5, 7

8. Recommendations for practice success

Most of the content of the practice guides should be taken as a suggestion of areas that an organization might consider when establishing and
nurturing its own practices. When using the content of the practice guides, organizations should always follow the ITIL guiding principles:

focus on value

start where you are

progress iteratively with feedback

collaborate and promote visibility

think and work holistically

keep it simple and practical

optimize and automate.

More information on the guiding principles and their application can be found in section 4.3 of ITIL Foundation: ITIL 4 Edition.

Table 8.1 outlines recommendations for the success of the incident management practice, linked to the relevant guiding principles.

Table 8.1 Recommendations for the success of incident management

Recommendation | Comments | ITIL guiding principles
--- | --- | ---
Look at the incidents from the service consumer perspective | For user-reported incidents, do not hide behind SLAs; aim to restore a level of service that satisfies the users. For monitoring-based incidents, assess business impact even if there are no directly affected users yet. Prioritize incidents according to their business impact. | Focus on value; Collaborate and promote visibility
Gather and reuse data | Many incidents recur. Significant time and resources can be saved by developing incident models and reusing known resolutions. Do not rely on individuals' experience; motivate team members to document and share their knowledge. Leverage automation tools to manage knowledge and automate solutions, where possible. | Collaborate and promote visibility; Optimize and automate
Understand, manage, and improve the incident resolution value stream, not only the incident management practice | The incident lifecycle spans beyond one practice. Ensure effective integration with service desk, change enablement, problem management, and other relevant practices. | Think and work holistically; Focus on value
Develop the practice continually but don't overcomplicate it | Start with the most critical products and services and with a basic workflow from detection to resolution. Gradually increase both the scope and the capability level based on business requirements and stakeholder feedback. Use the capability criteria and continual improvement model as guidance. | Start where you are; Progress iteratively with feedback; Keep it simple and practical
Adjust for complexity | Shift left and automate handling and resolution of repeating clear incidents. Use swarming to optimize resolution of unusual, complex, and major incidents. | Optimize and automate; Collaborate and promote visibility
Demonstrate business value | Measure the practice and produce regular reports and dashboards for internal (within the service provider) and external (service consumer) stakeholders. Use dashboards for the current state and regular reports for analysis and highlights. | Focus on value; Collaborate and promote visibility

9. Acknowledgements

PeopleCert is grateful to everyone who has contributed to the development of this guidance. These practice guides incorporate an
unprecedented level of enthusiasm and feedback from across the ITIL community. In particular, PeopleCert would like to thank the following
people.

Authors

Barry Corless, Roman Jouravlev, Andrew Vermes

Reviewers

Akshay Anand, Sofi Fahlberg, Michael G. Hall, Steve Harrop, Piia Karvonen, Anton Lykov, Paula Määttänen, Christian F. Nissen, Mark O’Loughlin,
Tatiana Orlova, Elina Pirjanti, Stuart Rance

2023 Revision

David Cannon, Antonina Douannes, Peter Farenden, Adam Griffith, Roman Jouravlev, Kaimar Karu, Barclay Rae, Stuart Rance, Nicola Reeves
