CHAPTER 10

ITIL PROBLEM MANAGEMENT

In their ITIL 4 framework, Axelos Ltd define the practice of problem management as distinct from the incident management practice. Reactive problem management involves responding to incidents that have already occurred in order to understand their underlying causes and address them. Proactive problem management is about identifying risks and responding to them before they manifest themselves as incidents.

PROACTIVE PROBLEM MANAGEMENT

A key component of proactive problem management is a well-defined patching policy. Security risks may be reduced by routinely deploying, in a timely manner, the security patches issued by vendors. Many organisations understand the need for security patches but fail to take the need to deploy other patches seriously. Patches and hotfixes are issued for two reasons: to deliver feature enhancements and to fix defects in the design of the product. If defect patches are not deployed, then by definition there are unresolved problems within your product. Patching policies are needed not just for software applications but also for the firmware that comes with hardware. In a recent case, the hardware vendor HPE identified a fault in the firmware of some of its hard drive products.1 A particular model of SSD would fail, with total loss of data, after 32,768 hours of operation (less than four years) unless the firmware was updated. This is an extreme case, in which the vendor was proactive in informing its customers of the need to upgrade the firmware. Hardware vendors produce firmware updates on a regular basis, and it is important that each organisation has a patching policy stating how frequently it will respond to these updates.
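
The HPE case lends itself to a simple proactive check. The sketch below is illustrative only: it assumes a hypothetical inventory export (the `Drive` record and its fields are invented here, not taken from HPE or any ITSM tool) giving each drive’s power-on hours and whether the corrected firmware is installed, and it lists the unpatched drives that will reach the 32,768-hour threshold within a planning horizon, soonest first. As a sanity check on the figure in the text, 32,768 hours is 2^15 hours, or about 3.74 years of continuous running.

```python
from dataclasses import dataclass

# Threshold from the vendor advisory discussed in the text: 32,768 hours (2**15).
FAILURE_THRESHOLD_HOURS = 2 ** 15


@dataclass
class Drive:
    """One entry in a hypothetical hardware inventory export."""
    serial: str
    power_on_hours: int
    firmware_patched: bool  # True once the corrected firmware is installed


def days_until_threshold(drive: Drive, hours_per_day: float = 24.0) -> float:
    """Days of running left before the drive reaches the threshold (0 if past it).

    Assumes the drive runs continuously, which is the worst case.
    """
    remaining = FAILURE_THRESHOLD_HOURS - drive.power_on_hours
    return max(remaining, 0) / hours_per_day


def drives_at_risk(inventory: list[Drive], horizon_days: float) -> list[str]:
    """Serials of unpatched drives that hit the threshold within the horizon,
    soonest first."""
    at_risk = [
        d for d in inventory
        if not d.firmware_patched and days_until_threshold(d) <= horizon_days
    ]
    at_risk.sort(key=days_until_threshold)
    return [d.serial for d in at_risk]


if __name__ == "__main__":
    inventory = [
        Drive("SN-001", power_on_hours=32_000, firmware_patched=False),
        Drive("SN-002", power_on_hours=10_000, firmware_patched=False),
        Drive("SN-003", power_on_hours=32_700, firmware_patched=True),
    ]
    print(FAILURE_THRESHOLD_HOURS / (24 * 365))  # about 3.74 years
    print(drives_at_risk(inventory, horizon_days=90))  # ['SN-001']
```

Tying the horizon to the patching policy is the design point: if firmware updates are reviewed quarterly, a horizon of at least 90 days means no drive can cross the threshold unnoticed between two reviews, because a drive accumulates at most 24 power-on hours per day.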

DOI: 10.1201/9781003119975-11

One of the best ways of doing proactive problem management is to learn from other people’s incidents. Following the industry news can be useful for alerting you to major, widespread issues. For example, there are regular reports on the effects of Microsoft’s Patch Tuesday (the second Tuesday of each month, when Microsoft issues its scheduled Windows updates) on the stability of the computers receiving the patches. However, being part of a support community can add greater value than this. Sharing your experiences and learning from other people’s is useful in its own right, but it also provides greater leverage with vendors. Vendors are more likely to address an underlying issue if multiple clients are pursuing them, and if those clients work together, their individual voices carry extra weight. I have known vendors claim that teething issues with a new software application were local to our organisation, but when I spoke to other organisations using the same product it was apparent that the issues were ubiquitous. We were able to apply greater pressure on the vendor when we combined to speak with a common voice.
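
Because Patch Tuesday falls on a predictable date, a team could schedule its review of industry reports, and closer attention to incident trends, for the days after each release. A minimal sketch, assuming a seven-day watch window (the window length and the function names are illustrative choices, not from the source):

```python
import calendar
from datetime import date, timedelta


def patch_tuesday(year: int, month: int) -> date:
    """Return the second Tuesday of the given month."""
    first = date(year, month, 1)
    # date.weekday() counts Monday as 0; calendar.TUESDAY is 1.
    days_to_first_tuesday = (calendar.TUESDAY - first.weekday()) % 7
    return first + timedelta(days=days_to_first_tuesday + 7)


def in_watch_window(day: date, window_days: int = 7) -> bool:
    """True if `day` falls within `window_days` of that month's Patch Tuesday,
    counting the Tuesday itself as day one."""
    start = patch_tuesday(day.year, day.month)
    return start <= day < start + timedelta(days=window_days)


if __name__ == "__main__":
    print(patch_tuesday(2024, 1))              # 2024-01-09
    print(in_watch_window(date(2024, 1, 12)))  # True
```

Patch Tuesday always falls between the 8th and the 14th of the month, so a window of up to two weeks never spills into the following month and the check only needs to look at the current month.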

Various techniques mentioned earlier in this book, such as failure mode analysis, are important tools in proactive problem management. There is also value in conducting an independent audit of an end-to-end service in order to assess the risks to that service.
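
Failure mode analysis is covered earlier in the book, and its method may differ in detail from what follows. As one concrete illustration, conventional FMEA (failure mode and effects analysis) scores each failure mode for severity, occurrence and detection on 1 to 10 scales and multiplies them into a risk priority number (RPN). The failure modes listed for the end-to-end service below are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FailureMode:
    component: str
    description: str
    severity: int    # 1 = negligible effect, 10 = catastrophic
    occurrence: int  # 1 = very unlikely, 10 = almost certain
    detection: int   # 1 = almost certain to be caught first, 10 = undetectable

    def __post_init__(self) -> None:
        for name in ("severity", "occurrence", "detection"):
            value = getattr(self, name)
            if not 1 <= value <= 10:
                raise ValueError(f"{name} must be between 1 and 10, got {value}")

    @property
    def rpn(self) -> int:
        """Risk priority number: severity x occurrence x detection."""
        return self.severity * self.occurrence * self.detection


def rank_by_risk(modes: list[FailureMode]) -> list[FailureMode]:
    """Highest-risk failure modes first."""
    return sorted(modes, key=lambda m: m.rpn, reverse=True)


if __name__ == "__main__":
    service = [
        FailureMode("SSD array", "firmware fault at fixed power-on hours", 10, 3, 8),
        FailureMode("Web tier", "session cookie clash between applications", 6, 4, 5),
        FailureMode("Integration platform", "downstream field length mismatch", 7, 3, 7),
    ]
    for mode in rank_by_risk(service):
        print(mode.rpn, mode.component)
```

The RPN is only a ranking aid; a failure mode with maximum severity, such as total data loss, usually deserves attention even when its RPN is modest.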

PROBLEM CONTROL
ITIL 4 recommends treating the process for controlling and managing problems as a key aspect of problem management. Each problem that is identified (through either reactive or proactive problem management) should be recorded in a problem record within an ITSM tool or similar system. Problem records should be linked to related resources. In reactive problem management, the related incidents should be linked to the problem record. Configuration Items (CIs) such as desktops, servers, printers and software assets should also be linked to the problem record as required. The problem record is a way to:
• collate the information
• prioritise the effort
• coordinate who is involved
• spin off tasks that progress the diagnosis and resolution of the problem
• keep a historical record, which may be referred to should a similar problem occur in the future
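
A minimal sketch of a problem record that collates linked incidents and CIs, spins off tasks and keeps a history, as listed above. The class, its fields and the identifier formats (`PRB0001`, `INC1001`, `CI-WEB-01`) are hypothetical; real ITSM tools each have their own data model and would persist the record rather than hold it in memory.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    assignee: str
    description: str
    done: bool = False


@dataclass
class ProblemRecord:
    """Hypothetical problem record; real ITSM tools each have their own model."""
    record_id: str
    title: str
    origin: str  # "reactive" or "proactive"
    incidents: set[str] = field(default_factory=set)
    config_items: set[str] = field(default_factory=set)
    tasks: list[Task] = field(default_factory=list)
    history: list[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        if self.origin not in ("reactive", "proactive"):
            raise ValueError(f"origin must be 'reactive' or 'proactive', not {self.origin!r}")

    def link_incident(self, incident_id: str) -> None:
        if incident_id in self.incidents:  # already linked: nothing to record
            return
        self.incidents.add(incident_id)
        self.history.append(f"linked incident {incident_id}")

    def link_ci(self, ci_id: str) -> None:
        if ci_id in self.config_items:
            return
        self.config_items.add(ci_id)
        self.history.append(f"linked CI {ci_id}")

    def spin_off_task(self, assignee: str, description: str) -> Task:
        task = Task(assignee, description)
        self.tasks.append(task)
        self.history.append(f"task for {assignee}: {description}")
        return task

    def open_tasks(self) -> list[Task]:
        return [t for t in self.tasks if not t.done]


if __name__ == "__main__":
    prb = ProblemRecord("PRB0001", "Intermittent login failures", "reactive")
    for inc in ("INC1001", "INC1002", "INC1001"):  # the duplicate link is ignored
        prb.link_incident(inc)
    prb.link_ci("CI-WEB-01")
    prb.spin_off_task("Web team", "Capture browser traces from affected users")
    print(sorted(prb.incidents), len(prb.open_tasks()))
```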

Each problem record will have a lifecycle, although different stages in the lifecycle may overlap. Some organisations prefer greater granularity in the lifecycle, whilst others use a coarser approach, but the following stages may be helpful:
• Logged: A problem record is created because a problem is suspected; at this stage, the problem has not been confirmed. In reactive problem management, a logged problem record indicates a suspicion that a group of incidents may be related and that the problem has not been seen before. In proactive problem management, it may be that an issue has been identified in another organisation but it is not yet clear whether that issue will affect your organisation.
• Identification: This is a confirmation stage, in which the problem is confirmed as consistent and, ideally, reproducible. Data is collated at this stage, and a trawl through recent incidents may surface further related incidents that were not initially identified. A prioritisation process needs to happen at this point to determine how much effort will be devoted to the problem record; this is typically scored according to both impact and urgency. Some problem records will be left at this stage because either the impact or the urgency is low. They will be reviewed periodically to see whether new data (e.g. additional incidents) warrants a change to the priority.
• Investigation: The problem-solving techniques outlined in this book may be employed to identify one or more root causes, or other possible means of progressing the problem.
• Known Error: ITIL defines a Known Error as a problem that has been analysed but not resolved. From a problem resolution point of view, this is not an important stage. However, it is useful for the Service Desk to have a list of current Known Errors, together with an explanation of how to identify whether an incident is related to one of them and what action should be taken if one is encountered. It is worth reflecting on the frustration Service Desk analysts experience if they spend time trying to resolve an issue for a customer, fail, refer it to second line and only then are told that it is a known error.
• Workaround available: The role of incident management is to get users/customers back up and working as quickly as possible. It is often possible to identify a workaround that achieves this as an interim solution whilst the permanent solution is sought. In an earlier example, I noted that clearing the web browser’s cookies before visiting a web application provided a viable workaround (as did accessing it from an incognito window). Whilst this was not a desirable action to have to take for any prolonged period, it did allow users to carry on working whilst the IT teams identified the right solution and implemented it. Where there are outstanding incidents, the workaround needs to be communicated to the affected users. Some workarounds become permanent. These increase the organisation’s technical debt and need to be added to a Continual Improvement register.
• Root Cause Identified: Whilst not all problem records get to this point, it is hoped that for significant problems (those with a high impact or a high urgency) the root cause will be identified within a reasonable timeframe.
• Evaluation: It is tempting to jump straight from identifying the root cause to fixing it, but it is important to include an evaluation step first. Chapter 9 looks at resolution evaluation methods for discerning the best way of addressing a root cause. It should be remembered that not all root causes should be fixed; in some cases, a workaround may be deemed adequate. In the case of the cookie clash mentioned previously, two root causes were identified, and an evaluation was needed to determine whether one or both would be fixed. The evaluation concluded that the cookie needed to be fixed, because the clash might also affect other web-based applications, either now (in ways not yet identified) or in the future. The corporate application was also patched, because a patch was available and recommended by the vendor. Whether this was essential was the subject of a risk assessment, which concluded that it was easier to apply the patch than to run the risk of the problem happening again. It should be noted that applying the software patch took several days of staff effort; had the impact of this problem been lower, it might not have been considered cost-effective.
• Resolving the root cause: Once the optimum means of fixing the root cause has been evaluated, the fix needs to be added to the work queue for the relevant teams, appropriately prioritised alongside their other work. Any changes to the system need to be adequately tested before the fix is implemented, and normal change enablement processes need to be followed. Once the fix is in place, its effect on the users who have been affected needs to be evaluated. Sometimes a fix at the server end will not resolve the issue for end users, who may also need to make a change on their desktops (e.g. clearing the cache). If users are still using a workaround, they need to be notified that the permanent fix is now in place. Once this has been completed, the Known Error may be removed from the Service Desk’s list of current Known Errors.
• Long-term monitoring: Unlike an incident, which should be marked as resolved as soon as possible after it has been resolved, a problem record will typically be left in a semi-open state for a period of time in order to assess whether the fix that has been applied has been effective. Not all fixes address all issues. If the incidents recur, the problem record should be re-activated and moved back to the identification stage. However, it is often the case that the incidents for two related problems will all have been linked to the first problem record. If there is evidence that the first problem has been successfully fixed but that a second problem exists with a different root cause, then a new problem record should be created and the relevant incidents moved across to it. As a rule of thumb, an incident should not be linked to two problem records, as there should not be two independent problems causing it (as distinct from one problem with multiple root causes).
• Closed: A problem record that has been monitored for a reasonable length of time with no recurrences may be marked as closed.
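
The Identification stage scores each record on impact and urgency. One widely used convention, assumed here rather than taken from this chapter, is a three-by-three matrix in which priority = impact + urgency - 1, with each rated from 1 (high) to 3 (low), giving priorities from 1 (highest) to 5 (lowest). The incident-count thresholds used to derive impact are also hypothetical; in practice impact would reflect business criticality as well as volume.

```python
from enum import IntEnum


class Level(IntEnum):
    HIGH = 1
    MEDIUM = 2
    LOW = 3


def priority(impact: Level, urgency: Level) -> int:
    """Priority 1 (highest) to 5 (lowest) from a 3x3 impact/urgency matrix."""
    return int(impact) + int(urgency) - 1


def impact_from_incidents(incident_count: int) -> Level:
    """Hypothetical thresholds: impact grows as more incidents are linked."""
    if incident_count >= 50:
        return Level.HIGH
    if incident_count >= 10:
        return Level.MEDIUM
    return Level.LOW


def review(incident_count: int, urgency: Level, investigate_at: int = 3) -> tuple[int, bool]:
    """Re-score a parked record; return (priority, whether to move to Investigation)."""
    p = priority(impact_from_incidents(incident_count), urgency)
    return p, p <= investigate_at


if __name__ == "__main__":
    print(review(4, Level.MEDIUM))   # (4, False): stays at Identification
    print(review(12, Level.MEDIUM))  # (3, True): new incidents raise the priority
```

The `review` function models the periodic re-check described for records parked at Identification: with the same urgency, more linked incidents can push a record over the investigation threshold.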
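
The lifecycle stages can be sketched as a small state machine. This is one possible encoding, not the book’s or ITIL’s: the permitted transitions, the decision to hold Known Error and workaround status as attributes (since the text notes that stages overlap), and the point at which a Known Error leaves the Service Desk list are all assumptions, flagged in the comments. `IncidentLinks` enforces the rule of thumb that an incident belongs to at most one problem record, and supports moving incidents to a new record when a second root cause emerges.

```python
from enum import Enum, auto


class Stage(Enum):
    LOGGED = auto()
    IDENTIFICATION = auto()
    INVESTIGATION = auto()
    ROOT_CAUSE_IDENTIFIED = auto()
    EVALUATION = auto()
    RESOLVING = auto()
    MONITORING = auto()
    CLOSED = auto()


# Permitted moves. Assumptions made for this sketch: an unconfirmed logged
# problem may be closed; an evaluation that judges the workaround adequate may
# close the record; a recurrence during monitoring re-activates the record at
# Identification. Low-priority records simply stay at Identification.
ALLOWED = {
    Stage.LOGGED: {Stage.IDENTIFICATION, Stage.CLOSED},
    Stage.IDENTIFICATION: {Stage.INVESTIGATION},
    Stage.INVESTIGATION: {Stage.ROOT_CAUSE_IDENTIFIED},
    Stage.ROOT_CAUSE_IDENTIFIED: {Stage.EVALUATION},
    Stage.EVALUATION: {Stage.RESOLVING, Stage.CLOSED},
    Stage.RESOLVING: {Stage.MONITORING},
    Stage.MONITORING: {Stage.IDENTIFICATION, Stage.CLOSED},
    Stage.CLOSED: set(),
}


class Problem:
    def __init__(self, record_id: str) -> None:
        self.record_id = record_id
        self.stage = Stage.LOGGED
        # Known Error and workaround status overlap with several stages,
        # so they are kept as attributes rather than as stages of their own.
        self.known_error = False
        self.workaround = None  # text of the current workaround, if any

    def advance(self, new_stage: Stage) -> None:
        if new_stage not in ALLOWED[self.stage]:
            raise ValueError(
                f"{self.record_id}: {self.stage.name} -> {new_stage.name} not allowed"
            )
        self.stage = new_stage


def service_desk_known_errors(problems: list[Problem]) -> list[str]:
    """Known Errors to show the Service Desk. Assumption: an entry drops off once
    the fix is in place (Monitoring) and returns if the record is re-activated."""
    hidden = {Stage.MONITORING, Stage.CLOSED}
    return [p.record_id for p in problems if p.known_error and p.stage not in hidden]


class IncidentLinks:
    """Enforces the rule of thumb that an incident links to at most one problem."""

    def __init__(self) -> None:
        self._problem_for = {}  # incident id -> problem record id

    def link(self, incident_id: str, problem_id: str) -> None:
        current = self._problem_for.get(incident_id)
        if current is not None and current != problem_id:
            raise ValueError(f"{incident_id} is already linked to {current}")
        self._problem_for[incident_id] = problem_id

    def move(self, incident_ids: list[str], new_problem_id: str) -> None:
        """Move incidents to a new record once a second root cause is found."""
        for incident_id in incident_ids:
            self._problem_for[incident_id] = new_problem_id

    def incidents_for(self, problem_id: str) -> list[str]:
        return sorted(i for i, p in self._problem_for.items() if p == problem_id)


if __name__ == "__main__":
    prb = Problem("PRB0001")
    prb.advance(Stage.IDENTIFICATION)
    prb.advance(Stage.INVESTIGATION)
    prb.known_error, prb.workaround = True, "clear the browser's cookies"
    print(service_desk_known_errors([prb]))  # ['PRB0001']
```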

KNOWLEDGE MANAGEMENT
One key aspect of both proactive and reactive problem management is knowing how data is meant to flow between systems. It is common practice in large organisations for integration platforms to be used as midpoints between different corporate applications. Data is not shared on a point-to-point basis; instead, it is shared with the integration platform, which then passes it on. Whilst this approach has many technical and operational benefits, it can obscure how the data is used. Periodically, changes are made to the metadata for corporate applications. In some cases this will be the addition of a new field; in others it will be a change in the format of a field (e.g. extending the field length to allow for longer surnames, or changing the encoding of a field from an 8-bit character set such as extended ASCII to a 16-bit Unicode encoding such as UTF-16). In other cases it will just be a change in the contents of the dataset, such as agreeing that invoice codes can now be 6 digits rather than 5, or adding new country codes to reflect a changing political landscape. It is important to recognise the knock-on consequences of changes to the data in one corporate application for the other corporate applications that are downstream consumers of that data. If change enablement does not adequately consider the implications of these types of change, problems can arise some time later. Tracking these problems back to the change concerned can prove time-consuming if records of how the data is used are not kept in sufficient detail.
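
A minimal sketch of the record-keeping this calls for: a register of which downstream applications consume which fields through the integration platform, queried when a change is proposed, plus a legacy validator showing how the 5-to-6-digit invoice code change can break a consumer silently. All application and field names are hypothetical.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldChange:
    source_app: str
    field_name: str
    description: str


# Hypothetical register of which downstream applications consume which fields
# via the integration platform. Keeping it current is the point: for an
# unregistered field, the lookup below silently returns nothing.
CONSUMERS: dict[tuple[str, str], list[str]] = {
    ("HR", "surname"): ["Payroll", "Directory"],
    ("Finance", "invoice_code"): ["Reporting", "Procurement"],
    ("CRM", "country_code"): ["Billing"],
}


def affected_consumers(change: FieldChange) -> list[str]:
    """Downstream applications to review before the change is approved."""
    return sorted(CONSUMERS.get((change.source_app, change.field_name), []))


# A downstream consumer that still validates invoice codes as exactly 5 digits.
LEGACY_INVOICE_CODE = re.compile(r"[0-9]{5}")


def legacy_accepts(code: str) -> bool:
    return LEGACY_INVOICE_CODE.fullmatch(code) is not None


if __name__ == "__main__":
    change = FieldChange("Finance", "invoice_code", "allow 6-digit codes")
    print(affected_consumers(change))  # ['Procurement', 'Reporting']
    print(legacy_accepts("12345"), legacy_accepts("123456"))  # True False
```

The encoding example works the same way: a surname that fits a hypothetical 20-character limit occupies 40 bytes in UTF-16, so a downstream system that sized the field in bytes for an 8-bit character set would truncate it.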

Knowledge Management may be used both for keeping track of shared data about the services and systems available and for providing Service Desk analysts with drill-down checklists and other aids to resolving incidents.

SUMMARY
A formal practice and process for problem management, such as the ITIL 4 practice, is a good way of methodically keeping track of problems.

Note
1 https://support.hpe.com/hpesc/public/docDisplay?docId=emr_na-a00092491en_us
