Sample Process Guide - Problem Management
Problem Management
Process Guide
Process Re-engineering
Problem Management Process
Version Control
2. Purpose of the Document
This document contains high-level process flows for the Problem Management
Service in the XXX IT environment. It provides a framework and roadmap from
which lower-level operational procedures can be defined and implemented by the
Service Improvement Team and IT Service Delivery staff. It also provides material
for high-level training and education of the end-user and IT communities,
supporting a general understanding of process-based service delivery and of the
specific process-based tasks for the Problem Management Service.
This process captures information about problems and resolves them according to
XXX standards and policies. Problems flow in from, and out to, the XXX Incident
Management process.
The process identifies, documents, analyses, tracks and resolves all problems within
the XXX IT environment.
The suggested Problem Definition is: any deviation from an expected norm. That is, a
problem is any event resulting in a loss, or potential loss, of the availability or
performance of a service delivery resource and/or its supporting environment. This
includes errors related to systems, networks, workstations and their connectivity,
hardware, software, and applications. Problems can be recognised at any point in
the environment and can be identified using a variety of automated and
non-automated methods.
A problem is the underlying cause of one or more incidents, the exact nature of which
has not yet been diagnosed. Restoring normal service to the users should normally
take priority over investigating and diagnosing problems, although this may not
always be possible.
2.2 Terminology
INCIDENT
PROBLEM
A problem is the underlying cause of one or more incidents, the exact nature of which
has not yet been diagnosed. Restoring normal service to the users should normally
take priority over investigating and diagnosing problems, although this may not
always be possible.
KNOWN PROBLEM
A known problem is a problem which has been diagnosed and for which a resolution
or circumvention exists. There may be good reasons for leaving a problem outstanding
even though a resolution is possible, for example if the problem is minor and the
resolution will impact on normal service provision.
REQUEST FOR CHANGE
Incident Control
- Restoring normal service when service has gone wrong
Problem Control
- Getting to the root cause of the problems
- Correcting problems
Management Information
- Resulting from the other areas
Problem Management is also concerned with proactively preventing problems from
occurring.
The Problem Management process and Incident Management process are closely
linked, with many of the Problem sub-process activities performed by the Helpdesk.
2.4 Scope
The Problem Management service begins with receipt of a problem record.
The assumption upon entering Problem Management is that the problem has already
been logged, as a problem, via the Incident Management Process.
3. Overview of the Problem Management Process
The overall Problem Management process comprises a number of tasks or activities.
1. Notification
The identification of a problem. Examples of a problem might be an outage, an
incorrect result or an unusual result. This sub process also includes notifying the
appropriate support structure that there is a problem and a need for assistance.
2. Problem Determination
The collection, analysis and correlation of data to determine and isolate the
cause of the problem.
4. Problem Resolution
The identification, implementation, and verification of solutions, and notification
to affected clients.
5. Tracking
The assignment of ownership for resolving problems and the follow-up activity
to ensure that the goals for problem resolution are being met. It includes setting
priorities and escalating issues via the appropriate system.
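As a rough illustration, the activities listed above could be modelled as an ordered enumeration so that a tracking tool can record which activity a problem record has reached. This is a minimal Python sketch, not part of any XXX tooling: the names `Stage` and `next_stage` are hypothetical, and only the four activities listed in this overview are included.

```python
from enum import Enum
from typing import Optional


class Stage(Enum):
    """Activities named in the overview, in the order they are listed."""
    NOTIFICATION = "Notification"
    DETERMINATION = "Problem Determination"
    RESOLUTION = "Problem Resolution"
    TRACKING = "Tracking"


def next_stage(current: Stage) -> Optional[Stage]:
    """Return the activity listed after `current`, or None at the end."""
    stages = list(Stage)  # Enum iteration preserves definition order
    index = stages.index(current)
    return stages[index + 1] if index + 1 < len(stages) else None
```

For example, `next_stage(Stage.NOTIFICATION)` gives `Stage.DETERMINATION`. In practice Tracking runs alongside the other activities, so a real tool would probably not treat it as a strictly final step.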
Note:
- Emergency Changes will always relate to a problem record
- There will be known problems that will not be fixed
- There will be known problems for which XXX will be waiting on a vendor to
  provide the fix.
[Figure: Problem Management process overview. Flow elements: Problem Record;
Problem Determination; Validated Severity; Level 2 Priority; Escalate to Problem
Manager; Problem Workaround; To Incident Process; Updated Problem Record;
Escalations; Communications; Tracking; Closed Problems.]
3.3 Notification
Inputs:
- Problem Record
- External Notification
- User Communication
Outputs:
- Assigned Problem
- Escalated record
Roles:
- Problem Management Controller / Problem Manager
[Figure: AIB IT - Notification. Flow elements: Incident from Incident process;
Problem Record; Escalate to Problem Management; Validate Assignment (Incorrect /
Correct); Validate Severity Level; Allocate & Prioritise; Severity 1 & 2: Critical
Situation Management.]
The Identification and notification sub process includes the following steps:
service delivery groups and / or user management. If necessary, the problem manager
should convene “problem/critical situation” meetings with the relevant experts to
determine the best course of action and maintain progress in line with severity
and impact.
If a Level 2 to Level 2 reassignment takes place, the group passing the problem
on will notify the incident manager/Helpdesk of the move.
The problem manager will also assign significant problems that have been
externally notified, for example urgent notification of virus signatures from the
relevant external agencies.
[Figure: Problem Determination. Flow elements: Validated Severity; Collect & Analyse
Data; Cause Identified (Yes / No); Level 3 Required for Prob. Determination (Yes /
No); Invoke Level 3 Support; Level 2 Priority; Update Problem Record; Prob.
Identification Complete (Yes / No); Escalate to Problem Manager; Prob. Workaround.]
2. Is this a problem requiring a specialist service (Level 2 or 3 support)?
Based on the available problem data, decide whether this problem is of a
“specialist” nature, for example a performance problem.
3. Correct Owner?
Determine if the problem has been referred to the correct owner (work
group/queue).
6. Resolve Incident
The Problem Management Controller co-ordinates resolution actions, with the
assistance and participation of relevant support groups.
9. Is it a Problem?
Determine if the reported problem is actually a problem.
Update Problem Record to Indicate that Reported Problem is not Actually a Problem
Update the problem record to indicate why the reported problem is not a problem.
Note: The problem record is then closed by way of the Close Request activity of the
Incident Management service.
Incident Management
Call Management is responsible for resetting the severity/priority.
[Figure: Problem Workaround & Recovery. Flow elements: Problem Record; Operational
Procedures Required (Yes / No); Operational Procedures; Emergency Change Mgmt
Process; Recover Resources / Services; Verify recovery actions; Back out bypass;
Change Management; Successful Bypass / Recovery (Yes / No); Update Problem record
with details; Project Appropriate? (Yes / No); Project Request; Escalate According
to Severity.]
The Problem Workaround & Recovery sub process includes the following steps:
2. Project Required?
Based on Policy.
Determine if a project is required to implement the bypass.
- If yes, proceed to Project Request.
- If No, proceed to Change Management Required?
5. Change Management
If required, invoke Change Control to approve and schedule the workaround.
7. Operational Processes
If a project is not required, start the implementation of the workaround or
recovery plan by way of the operational procedures that perform implementation
tasks such as:
- Emergency Change Management
- Implement the bypass
- Apply temporary fixes
- Recover resources and services
- Verify that the bypass/recovery actions work
- Back out the bypass if it was unsuccessful
8. Successful Workaround?
- If Yes, proceed to Update Problem Record to Indicate Workaround was
Successful
- If No, proceed to Update Problem Record to Indicate Workaround was
Unsuccessful.
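The decisions in steps 2, 5, 7 and 8 above can be sketched as a small function that lists the activities a problem record passes through. This is a minimal Python sketch under stated assumptions: the function name `workaround_path` and its three boolean flags are hypothetical stand-ins for the decisions made by staff, and the policy that drives each decision is not modelled.

```python
from typing import List


def workaround_path(project_required: bool,
                    change_required: bool,
                    workaround_successful: bool) -> List[str]:
    """List the activities visited for one problem record.

    The three flags are hypothetical inputs standing in for the
    decisions taken at steps 2, 5 and 8.
    """
    if project_required:
        # Step 2: a project is needed, so hand over to Project Request.
        return ["Project Request"]

    path = []
    if change_required:
        # Step 5: Change Control approves and schedules the workaround.
        path.append("Change Management")
    # Step 7: implement through the operational procedures.
    path.append("Operational Processes")
    # Step 8: record the outcome on the problem record.
    if workaround_successful:
        path.append("Update Problem Record: workaround successful")
    else:
        path.append("Update Problem Record: workaround unsuccessful")
    return path
```

Calling `workaround_path(False, True, True)` walks through Change Management, the operational procedures, and the successful-workaround update, in that order.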
[Figure: Problem Resolution. Flow elements: Escalate to Level 3; Project Deferred
(Yes / No); Project Proposal; Develop Project Work Plan; Update PBM Record;
Resolution.]
The Problem Resolution sub process includes the following steps:
3. Review/Design Solution
Review or design the permanent solution for the problem.
5. Project Required?
Based on Policy.
Determine if a project is required to implement the solution.
- If Yes, proceed to Project Request (in tracking)
- If No, proceed to Select Problem Solution
6. Project Request
If a project is required, invoke Project Request to implement the solution.
7. Provide Service
After handling the entitlement failure, determine if service is to be provided; that
is, will the recommended solution or an acceptable alternative be implemented?
3.7 Problem Tracking
Inputs:
- Problem record
- Knowledge database
- Configuration Information
Outputs:
- Problem Status (Updated Problem Record)
- Problem analysis information
- Root cause analysis
- Possible Problem Solution
Roles:
- Problem Management Co-ordinator / Team Leader
- Problem Management Controller
[Figure: AIB IT - Problem Tracking. Flow elements: Problem Record; Co-ordinate /
Monitor Incident resolution; Communicate feedback; Check status of call, provide
progress of problems; Identify issues for investigation; Update users via Helpdesk;
Follow up enquiries on actions; Ascertain Trends; Escalate Problem; Advise Problem
Manager; Confirm Resolution; Review Problem Record (Satisfied / Not Satisfied);
Communicate Resolution; Close Problem; Route to Problem Resolution; Re-drive
Problem; Project Required.]
helpdesk), ascertaining trends (for feed into production management information)
and potential escalation
Identify Resolution
The problem owner verifies that the solution has successfully resolved the problem or
known error.
Confirm Resolution
The Problem Manager checks the resolution details, confirming them and resolving
any inconsistency with the problem owner, change owner and Change Manager as
required.
Close Problem
The Problem Manager completes the closure details, closes the associated incident
links (unless this is done by the Helpdesk as part of Incident Management, in which
case the Problem Manager advises the Helpdesk manager of completion), and closes
the problem.
[Figure: Recurring Issue / Bigger Problem. Long Term: Project Required / Long Term
Review; Project Request; Project Proposal; Project Work / Change Mgmt etc. Short
Term: Update Problem Record.]
3.8 Report and Control
Inputs:
- Problem Record
[Figure: Report and Control. Flow elements: Problem Record; Root Cause Resolved;
Project Required? (Yes / No); Problem Management Information Sub-Process;
Resolution; Process Not Working?; Document Process Improvement; service
improvements required; Project Identified Not Implemented; Report / Escalate.]
The Report and Control Problems sub process includes the following steps:
3. Project Required?
Based upon the outcome of analysis of generic incidents or problems, determine
whether or not specific project activities are required.
3.9 Grouped Level 2 XXX Problem Management Process
[Figure: Grouped Level 2 XXX Problem Management Process. A single diagram combining
the elements of the Notification, Problem Determination, Problem Workaround &
Recovery, Problem Resolution, Problem Tracking, and Report and Control flows.
Additional elements in the resolution path: Level 2 Resolution (Yes / No);
Investigate Solutions; Select Solution; Review / Specify Solution; Design Solution;
Project Required? (Yes / No).]
3.10 ITIL Problem Management Overview
[Figure: ITIL Problem Management overview flowchart. Flow elements, in approximate
order:]
- Set severity & priority & advise customer with ref no.
- Support group ring customer within SLA to discuss problem / give fix time /
  confirm priority
- Priority change needed? If Y, inform Service Desk, who will change priority
- Support group perform problem determination (PD) and develop fix
- Change needed? If Y, create change record, update problem record, inform user of
  status
- Change implemented? (Y / N)
- Support group inform customer of solution, update problem record with full
  description and cause code; set record to "open, resolved" status
- Cust Sat questionnaire needed? If Y, SD complete questionnaire with customer
- Close record
- END
4. Problem Management Measures
The reports produced for the problem management system are designed to
help manage the process. Daily reports identify results from the previous day and any
problems that must be addressed during the day. Weekly reports provide a
summary of the previous week’s successes, current status and weekly trend information.
Daily reports are primarily for technicians. Weekly reports enable effective
management of the process. Monthly reports can also enable IT to evaluate the
effectiveness of the problem management system.
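To make the daily and weekly reporting idea concrete, the sketch below counts how many problems were opened per day and per ISO week. It is a minimal Python illustration only: the function names and the assumption that each problem record carries an opening date are hypothetical, and a real report would also cover closures, status and trends.

```python
from collections import Counter
from datetime import date
from typing import Iterable


def daily_counts(opened_dates: Iterable[date]) -> Counter:
    """Count problems opened on each calendar day."""
    return Counter(opened_dates)


def weekly_counts(opened_dates: Iterable[date]) -> Counter:
    """Count problems opened in each ISO (year, week) pair."""
    counts: Counter = Counter()
    for opened in opened_dates:
        iso = opened.isocalendar()  # (ISO year, ISO week, ISO weekday)
        counts[(iso[0], iso[1])] += 1
    return counts
```

Comparing the weekly counts from one week to the next gives a simple form of the weekly trend information described above.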
5. Roles and Responsibilities
5.1 Problem Management Process Owner
Job Purpose: This position is a senior service delivery co-ordination and development
role for the Problem Management and underpinning technical services. The process
owner is responsible for ensuring the problem management system is in place and
effective.
Major Tasks:
- Is responsible for and owns the overall Problem Management service
- Builds the process. This includes defining what a problem is, setting the goals
  and objectives of the problem management process, understanding what severities,
  priorities and service levels are required, and setting up the information flows
- Is responsible for overall performance to target service levels for
  Problem Management and underpinning technical services
- Is ultimately responsible for resolving Problem Management and
  technology service dissatisfaction issues
- Escalates exceptions to senior management as appropriate
- Has a nominated deputy to cover for service owner absence
- Develops requirements for Problem Management standards,
  procedures, measurements, tools and technology in conjunction with
  the Incident Management service owner
- Sponsors and / or manages internal improvement projects to
  implement new technology and process improvement, ensuring
  compatibility and integration with other XXX services and non-XXX
  service providers
- Communicates Problem Management procedures and working
  practices, and changes to internal standards, processes, procedures
  and technology
- Co-ordinates and sets annual service requirements, objectives and
  targets for Problem Management and underpinning technical
  services in conjunction with technology service owners
- Approves and sponsors Problem Management and technical service
  improvement ideas
- Attends senior management level service support and
  development reviews as appropriate
- Is involved in the development and subsequent agreement of service level
  targets and target improvements related to the Problem
  Management and underpinning technical services.
5.2 Problem Management Controller (Helpdesk – Level 1)
Job Purpose: The Helpdesk personnel play the key role in the day-to-day operation of
the problem management process and, in the majority of incidents,
become the problem owner.
The problem owner/controller assumes responsibility for all
communications and for co-ordinating resolution activity on that problem,
in accordance with severity.
Major Tasks:
- Act as the initial point of contact for the client community
- Perform the initial problem logging and problem determination
- Resolve most Level 1 problems
- Contact vendors for most hardware problems
- Perform problem tracking
- Provide feedback to the client who reported the problem
- Record all calls that require a problem or incident to be opened
- Complete the initial descriptive portion of the problem record for all
  problems
- Assign the problem severity level and the initial priority
- Update the problem record and maintain a list for tracking all problems
  that have been assigned problem numbers
- Assign the problem and send a copy of the problem record to the
  appropriate group(s) for additional problem determination and problem
  resolution
- Reassign the problem if the Level 2 group that was first assigned is not the
  correct group to fix the problem
- Summarise daily, weekly and monthly statistics and provide reports to
  interested departments
- Provide the problem management co-ordination. In that role, the
  responsibilities are:
  - Oversee and track all exception problems affecting clients, from initial
    recording, through management review and escalation, to closing
  - Notify management of the requirements to schedule
    escalation/problem review meetings
  - Prepare problem reports
  - Review closed problems for validity
5.3 Problem Management Analysts
Job Purpose: The problem analyst is a member of the Problem Management function
and is responsible for examining incidents escalated from first level
support to identify their cause. Incidents are either related to existing
problems or known problems, or recorded as new problems which will
normally be allocated to a support area and subsequently progressed by
the problem owner / controller.
Major Tasks:
- Responsible for effective implementation and maintenance of Problem
  Management procedures and working practices
- Defines training and development needs for individuals within the team
- Ensures adherence to staff training plan
- Undertakes performance review meetings with team members in
  compliance with XXX policy
- Invokes escalation procedures and communicates with management
  as appropriate
- Identifies and reports exception items to management as appropriate
- Identifies incident and problem trends to anticipate potential service
  outages and duplicated problems
- Co-ordinates / undertakes appropriate action as a result of service
  deterioration
- Participates in customer satisfaction surveys, obtaining feedback from
  customers with respect to service level attainment and service quality
  and feeding information into the service improvement process
- Provides first line escalation point for customer service dissatisfaction
- Recommends working practice improvement ideas with the team,
  passing them to the Problem Management Controller and / or Service
  Owner for approval and action
- Provides individual input to Problem Management service
  improvement.
5.5 Level 2 Support (Operations/Other)
Job Purpose: Level 2 support is responsible for problem determination and resolution,
and for bypass, recovery and / or circumvention when the Helpdesk
(Level 1) or operations functions are unable to resolve the problem.
Operations have specific responsibility for identifying those problems
that are caused by systems and operational activities.
Major Tasks:
- Timely acceptance of responsibility for resolving problems which are
  assigned by the Helpdesk
- Timely reaction based on priority of the problem
- Meeting the established objectives for the problem resolution priority
- Determining the failing component or the cause of the problem
- Creating bypass/recovery/circumvention procedures, making the
  decisions as to when they need to be invoked, and invoking them
  when necessary
- Providing the solution to the problem or contacting the vendor to
  resolve
- Updating the resolution section of the problem record; working with
  the Helpdesk when the problem status changes, when there is
  activity, and when the problem is resolved
- Assisting with Problem Determination when requested by others
Operations
- Notify the Helpdesk of problems in the operations environment
  which will affect the user community
- Identify the failing component or the cause of the problem
- Assist the Helpdesk with problem determination when requested
- Help determine the availability of Bypass/Recovery procedures
- Obtain approval for Bypass/Recovery procedures and execute them
  when necessary, or contact the appropriate group to perform
  Bypass/Recovery
- Update the problem record or have the Helpdesk update it.
Need to add more of a Level 3 description, so that level 3 can be integrated into the
process.
6. Appendices
6.1 Appendix A: Assigning Severity Codes
The impact of a problem is a composite of many factors: the number of clients
affected, the type of service disrupted, the length of outage, the number of times the
problem has recurred, the availability of a workaround, and the length of time the
problem has been open.
Severity codes provide the means for assigning a value to a problem so that the impact
of the problem can be communicated to the people involved in the Problem
Management Process. Helpdesk personnel assign the severity code
for client problems when the problem record is created.
Sample Incident/Problem Close Codes
A = User Error
B = Request for Information / Education / Advice
C = Desktop Hardware
D = Desktop Software
E = System Hardware
F = System Software
G = Network
H = Security
I = Change
J = Duplicate Call
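The sample close codes above map naturally onto a lookup table. The following Python sketch is illustrative only: the names `CloseCode` and `describe_close_code` are hypothetical, and the codes are the sample values from this appendix rather than a definitive list.

```python
from enum import Enum


class CloseCode(Enum):
    """Sample incident/problem close codes from this appendix."""
    A = "User Error"
    B = "Request for Information / Education / Advice"
    C = "Desktop Hardware"
    D = "Desktop Software"
    E = "System Hardware"
    F = "System Software"
    G = "Network"
    H = "Security"
    I = "Change"
    J = "Duplicate Call"


def describe_close_code(code: str) -> str:
    """Return the description for a one-letter close code.

    Surrounding whitespace and letter case are ignored; an unknown
    code raises ValueError.
    """
    try:
        return CloseCode[code.strip().upper()].value
    except KeyError:
        raise ValueError(f"Unknown close code: {code!r}") from None
```

Validating codes at entry in this way keeps close-code statistics consistent when they are later summarised in reports.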
6.2 Appendix B: Managing Escalation
Escalation is a normal part of the problem management process, which recognises that
some problems will not be resolved within established time frames.
The Helpdesk, with the participation of the appropriate Level 2 departments and
managers, manages the escalation process. The purpose of the escalation process is to
bring additional resources to a problem which is not meeting the resolution objective
for any number of reasons, such as lack of resource, the problem being more difficult
to resolve than anticipated, or lack of attention on the part of the client.
The escalation process is the means for bringing additional effort and emphasis to a
problem.
- The Helpdesk Manager identifies the appropriate managers or supervisors to be
  involved. They set the objectives of the escalation and identify who needs to be
  involved as part of the resolution team
- The Helpdesk provides the history of the problem (via the Helpdesk call record)
  and ensures that an action plan is developed
- The team develops an action plan that outlines the action, sets target times and
  ensures resource commitments
- If there is no agreement on a plan, or if the objective is missed, the problem is
  escalated to the next level of management
- The assignee ensures that the affected department/clients are notified and are in
  agreement with the plan. If they are not, then agreement must be obtained
- The Helpdesk documents the results of the escalation
- The assignee notifies the appropriate management of the situation and plan
- The assigned Level 2 department is responsible for updating the problem call in a
  timely fashion.
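The rule that a problem moves to the next level of management when no plan is agreed, or when the objective is missed, can be sketched as follows. This is a minimal Python sketch under stated assumptions: the function `escalation_target` is hypothetical, and the management chain is supplied by the caller because this guide does not define one.

```python
from typing import Sequence


def escalation_target(chain: Sequence[str],
                      current: str,
                      plan_agreed: bool,
                      objective_missed: bool) -> str:
    """Return who should own the escalation after a review.

    `chain` is a hypothetical ordered list of management levels,
    lowest first; `current` must be one of its entries.
    """
    needs_escalation = (not plan_agreed) or objective_missed
    if not needs_escalation:
        return current
    position = chain.index(current)
    # Stay at the top of the chain when there is no higher level.
    return chain[min(position + 1, len(chain) - 1)]
```

With an assumed chain such as `["Helpdesk Manager", "Service Delivery Manager", "IT Director"]` (only the first title appears in this guide), a missed objective at the Helpdesk Manager level would move the escalation to the second entry.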
6.3 Appendix C: Support Levels
Support levels define the problem management functions to be performed by the staff
and departments. Example support levels are described below. They can help each
department determine how well prepared it is to support the problem
management process.
Level 1
- Act as the first point of contact for clients
- Perform problem logging and tracking
- Answer basic operational and product knowledge questions
- Resolve most procedural and usage problems
- Perform problem determination for some applications, some hardware, and
  network usage problems. Level 1 should be able to perform routine Problem
  Determination for PC workstations, key generic applications, and the network
- Dispatch problems to Level 2 or vendors
Level 2
- Be able to operate and install
- Take responsibility for problem resolution
- Isolate complex problems to the failing component
- Fix routine technical problems
- Identify bypass and recovery procedures
- Work with vendors to resolve problems
- Use diagnostic tools
- Update the problem tracking system
Level 3 (usually the vendor)
- Work with Level 2 to resolve complex problems
- Supply solutions with target time frames
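The three example support levels can be represented as an ordered enumeration, with a helper that names the next level to involve when a problem cannot be resolved at the current one. This is a simplified Python sketch: the names `SupportLevel` and `dispatch_from` are hypothetical, and the strict one-level-at-a-time hand-off is an assumption, since Level 1 may also dispatch directly to vendors.

```python
from enum import IntEnum
from typing import Optional


class SupportLevel(IntEnum):
    LEVEL_1 = 1  # Helpdesk: first point of contact
    LEVEL_2 = 2  # Operations and other IT departments
    LEVEL_3 = 3  # Usually the vendor


def dispatch_from(level: SupportLevel) -> Optional[SupportLevel]:
    """Return the next support level to involve, or None at Level 3."""
    if level is SupportLevel.LEVEL_3:
        return None
    return SupportLevel(level + 1)
```

For example, `dispatch_from(SupportLevel.LEVEL_1)` returns `SupportLevel.LEVEL_2`.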
6.4 Appendix D: Problem Management System Participants
The process participants are the XXX IT departments and groups identified below.
- Client
- Problem Management Process Owner
- Helpdesk (Level 1)
- Operations
- Other XXX IT departments (Level 2), e.g. ITD, Networking
- Management
- Vendors (Level 3)