
Problem Management Process Ver1.0

This document outlines Loblaw's Problem Management Process. It defines the process goal as preventing problems and resulting incidents from happening. It describes the process scope, benefits, triggers, and interfaces with other ITSM processes like incident management and change management. Key roles and their responsibilities are defined. The document provides high-level process flows for both reactive and proactive problem management. It also covers topics like problem priorities, service level targets, and metrics.

Copyright
© Attribution Non-Commercial (BY-NC)

Loblaw

IT Service Management Processes

Problem Management Process

Document Name: Problem Management Process

Version History

Version   Name          Comment (the reason for the increment to the version)   Date
1.00      Ali Alaswad   1st draft Final                                          July 18, 2008

Document Distribution Control

Recipient Name     Version   Date
Bill Charters      1.0       July , 2008
Patrick Ma         1.0       July , 2008
Dorota Mac         1.0       July , 2008
Bobby Seebalack    1.0       July , 2008

Table of Contents
1. Process Goal
2. Process Scope
3. Process Benefits
4. Process Overview
4.1. Problem Management includes the following standard phases:
4.2. High Level process Flow (Reactive)
4.3. High Level process Flow (Proactive)
6. Process Interfaces with Other ITSM Processes
8. Roles and Responsibilities
9. Roles Assignment Matrix
10. Problem Priorities
11. Impact-Urgency Matrix
12. Problem Service Level Targets Definition
13. Major Problem Review
14. Known Error Database
15. Process Deliverables
16. Process Measurement (Metrics) and Reporting
16.1. Metrics
17. Process Meetings
17.1. Problem Management Meeting
17.2. Monthly Meeting
18. Process RACI Chart
19. Process Detailed Description
20. Legend & Definitions
21. Attachments

1. Process Goal
To prevent problems and resulting incidents from happening, to eliminate recurring incidents and to minimize the impact of incidents that cannot be prevented.

2. Process Scope
- Diagnose the root cause of incidents.
- Determine the resolution to those problems.
- Ensure that the resolution is implemented through the appropriate control procedures, especially Change Management and Release Management.
- Maintain information about problems and the appropriate workarounds and resolutions.

3. Process Benefits
- Improved IT service quality
- Incident volume reduction
- Improved knowledge base
- Permanent solutions
- Better service desk first-time fix rate (workaround)
- Works together with Incident Management and Change Management to ensure that IT service availability and quality are increased.
- Recording information about problems will speed up the resolution time and identify permanent solutions, reducing the number and resolution time of incidents.
- Higher productivity of business and IT staff
- Reduction in cost of effort in fire-fighting or resolving repeated incidents.

4. Process Overview
The Problem Management process is one of the IT Service Management processes. It works very closely with the Incident Management process and the Change Management process. There are two types of process activities: Reactive and Proactive.

The Reactive activity is concerned with detected errors (mainly from the Incident Management process) that require Root Cause Analysis. The outcome of this activity is a Known Error and workaround, which is recorded in the Known Error Database.

The Proactive activity is concerned with reviewing the Known Errors and Incident/Problem Reports (patterns of failures/events). It analyses this information and uses data collected by other IT Service Management processes to identify trends or significant problems. The outcome of this activity is a solution that prevents the error from happening again, or a workaround. It is driven as part of Continual Service Improvement.

4.1. Problem Management includes the following standard phases:


1. Problem Detection
2. Problem Recording (Prioritize & Categorize)
3. Investigation and Diagnosis
4. Create Known Error Record (Workaround)
5. Resolution (Permanent Solution)
6. Problem Closure

Figure - 1

4.2. High Level process Flow (Reactive)


1. Problem Detected
2. Problem Recorded
3. Assign Problem Record to Problem Manager
4. Review and Validate
5. Assign to Appropriate Workgroup
6. Investigate and Isolate Root Cause
7. Provide Solution
8. Verify Resolution
9. Update Known Error Database
10. Closure Of Problem Record

Figure - 2

4.3. High Level process Flow (Proactive)


1. Review Data/Reports, Proactive Monitoring
2. Create Problem Record
3. Follow the Same Steps in the Reactive Workflow
4. Review and Close Problem Record

Figure - 3

5. Process Triggers
- Reaction to one or more incidents
- A problem triggered in testing
- Trend analysis of errors and faults
- Suppliers may trigger the need for some Problem Records through the notification of potential faults or known deficiencies in their products or services.
- Availability Management: a problem initiated to investigate, diagnose and analyze how to reduce downtime and increase uptime.

6. Process Interfaces with Other ITSM Processes


Problem Management interacts with other processes as shown in the diagram below.

Change Management

RFCs

Request to participate in the post implementation review Entry to problem and known errors records

Configuration Management

Configuration Item information in CMDB

Incident Management

Known Error Records, Workarounds, Problem Resolution Logged incident against Configuration Item(s) Reports of problems and known errors by service

Service Level Management


SLA Reports of capacity related problems and known errors

Problem Management Process

Capacity Management

Resolutions for capacity related problems and known errors

Availability Management

Reports of availability related problems and known errors Availability reports used to indicate current or future problems

Release Management

Reports of any problem(s) introduced by release Notice of release

Figure - 4

7. Problem Policy
Policy -1: Incident and Problem are two separate processes, but they mostly use the same tools and similar categorization, impact and priority coding systems.
Policy -2: A problem record can be created by anyone in IT, anyone benefiting from IT services, or anyone providing services to IT.
Policy -3: A problem is different from an incident. A problem is created to isolate the cause when an incident occurs with an unknown cause, to eliminate the known cause of an incident (permanent solution), or to prevent an incident from occurring.
Policy -4: Each Problem Record documents the lifecycle of a single problem.
Policy -5: One centralized tool is used for problems across the IT organization.
Policy -6: Problem Management should maintain information about problems and the appropriate workarounds and resolutions. All known errors and workarounds must be registered in the Known Errors Database.
Policy -7: The problem management meeting should be conducted regularly (weekly). The problem manager is accountable and responsible for facilitating and managing those meetings.
Policy -8: There are two types of Problem Management Process activities: Reactive and Proactive.
Policy -9: All problems go through the process, and all problem initiators must complete the required information in the problem record.
Policy -10: The Problem Manager is accountable for the complete problem life cycle and provides a single point of coordination.
Policy -11: End user means all parties or individuals benefiting from IT services.


8. Roles and Responsibilities

Role: Process Owner
Responsibilities:
- Owns the Problem Management Process
- Defining the process strategy
- Ensuring that appropriate process documentation is available and current
- Defining appropriate policies and standards to be employed throughout the process
- Periodically auditing the process to ensure compliance to policy and standards
- Periodically reviewing the process strategy to ensure that it is still appropriate and change as required
- Communicating process information or changes as appropriate to ensure awareness
- Providing process resources to support activities required throughout the Service Management lifecycle
- Ensuring process implementers have the required knowledge and the required technical and business understanding to deliver the process, and understand their role in the process
- Reviewing opportunities for process enhancements and for improving the efficiency and effectiveness of the process
- Addressing issues with the running of the process
- Providing input to the ongoing service improvement plan

Role: Process Manager/Problem Manager
Responsibilities:
- Liaison with all problem resolution groups to ensure swift resolution of problems within SLA targets
- Ownership and protection of the KEDB (Known Error Database)
- Gatekeeper for the inclusion of all Known Errors and management of search algorithms
- Formal closure of all Problem Records
- Liaison with suppliers, contractors, etc. to ensure that third parties fulfill their contractual obligations, especially with regard to resolving problems and providing problem-related information and data
- Arranging, running, documenting and all follow-up activities relating to Major Problem Reviews (Critical/High priority)
- Ensure that the correct number and level of resources is available in the problem-solving team
- Validate problems and ensure each has been set with the correct priority

Role: Problem-Solving Group
Responsibilities:
- Investigate, diagnose and isolate the root cause
- Update the KEDB with known errors
- Develop corrective action plans to implement a permanent solution
- Escalate issues, risks and obstacles to the problem manager
- Request 3rd party company (suppliers/partners) involvement when needed
- Verify problem resolution with the initiator
- Update the problem record
- Create a problem record as a proactive action to prevent an incident from occurring

Figure - 5


9. Roles Assignment Matrix

Role                                                       Name of Resources   Location   Tel        Email               Time Zone
Process Owner                                              Patrick Ma          Toronto    905-861-   [email protected]   EST
Process Manager (Problem Manager)                          Patrick Ma          Toronto    905-861-   [email protected]   EST
Problem-Solving group(s) (IT Service Support Specialist)   TBD                 TBD        TBD        TBD                 TBD
Third party Companies (Suppliers/partners)                 TBD                 TBD        TBD        TBD                 TBD


10. Problem Priorities


Problems are prioritized in the same way as incidents: the priority depends on urgency and impact. Take the following points into account to set the correct priority on a problem record:
- Can the system be recovered, or does it need to be replaced?
- How much will it cost?
- How many people, with what skills, will be needed to fix the problem?
- How long will it take to fix the problem?
- How extensive is the problem (e.g. how many CIs are affected)?

Critical: Complete or partial outage of service(s) or component(s) that stops one or more of the Vital Business Functions, causing significant loss of revenue or of the ability to deliver important public services. Service(s) or component(s) supporting a critical business process are down or not functioning correctly, or one or several critical business processes are unavailable, affecting all users. There is no workaround.

High: Severely affecting some key users, or impacting a large number of users. Service(s) or component(s) are not down, but there is a serious problem affecting a great majority of the users and their productivity, or affecting an individual's ability to conduct business effectively. The workaround (if provided) is awkward and inefficient.

Medium: No severe impact. Service(s) or component(s) are not down, but there is a problem affecting a small number of users. Business-critical work can be performed. An acceptable workaround is available.

Low: Service(s) or component(s) are not down and business-critical work can be performed, but cosmetic work would be beneficial.


11. Impact-Urgency Matrix

Priorities by Impact (columns) and Urgency (rows):

                   Impact: High   Impact: Medium   Impact: Low
Urgency: High      1              2                3
Urgency: Medium    2              3                4
Urgency: Low       3              4                5

Figure - 6
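
The matrix lookup can be sketched in a few lines of Python. This is only an illustration: the name `priority_code` and the string labels are assumptions, not part of any tool named in this document. The cell values are copied from Figure - 6.

```python
# Hypothetical sketch: Impact-Urgency lookup using the values from Figure - 6.
# Keys are (impact, urgency); the function and label names are illustrative.
PRIORITY_MATRIX = {
    ("High", "High"): 1, ("Medium", "High"): 2, ("Low", "High"): 3,
    ("High", "Medium"): 2, ("Medium", "Medium"): 3, ("Low", "Medium"): 4,
    ("High", "Low"): 3, ("Medium", "Low"): 4, ("Low", "Low"): 5,
}


def priority_code(impact: str, urgency: str) -> int:
    """Return the priority (1 = highest, 5 = lowest) for an impact/urgency pair."""
    try:
        return PRIORITY_MATRIX[(impact, urgency)]
    except KeyError:
        raise ValueError(f"unknown impact/urgency: {impact!r}, {urgency!r}")
```

For example, `priority_code("Medium", "Low")` looks up the Medium-impact column and Low-urgency row and yields 4.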

12. Problem Service Level Targets Definition


Service Level Targets

Priority Code   Accept Problem Record   Apply Root Cause Analysis   Permanent Resolution
Critical        4 hr                    48 hr                       N/A
High            12 hr                   4 days                      N/A
Medium          48 hr                   10 days                     N/A
Low             7 days                  21 days                     N/A
Planned         Planning

Figure - 7
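
As a rough illustration of how these targets translate into due times, the sketch below adds the target durations from Figure - 7 to the time a problem record was opened. It assumes plain calendar time (the document does not say whether targets run on business hours), and the names `SLT` and `slt_deadlines` are hypothetical.

```python
from datetime import datetime, timedelta

# Targets from Figure - 7 as (accept problem record, apply root cause analysis).
# Assumption: calendar time, since the document does not define business hours.
SLT = {
    "Critical": (timedelta(hours=4), timedelta(hours=48)),
    "High": (timedelta(hours=12), timedelta(days=4)),
    "Medium": (timedelta(hours=48), timedelta(days=10)),
    "Low": (timedelta(days=7), timedelta(days=21)),
}


def slt_deadlines(priority: str, opened_at: datetime) -> dict:
    """Return the due times for accepting the record and completing RCA."""
    accept, rca = SLT[priority]
    return {"accept_by": opened_at + accept, "rca_by": opened_at + rca}
```

A High problem opened on July 18, 2008 at 09:00 would, under these assumptions, need to be accepted by 21:00 the same day and have its root cause analysis applied by July 22 at 09:00.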


13. Major Problem Review


After every major problem (as determined by the priority definition), while memories are still fresh, a review should be conducted to learn any lessons for the future. Specifically, the review should examine:
- Those things that were done correctly
- Those things that were done wrong
- What could be done better in the future
- How to prevent recurrence
- Whether there has been any third-party responsibility and whether follow-up actions are needed.

Such reviews can be used as part of training and awareness activities for support staff, and any lessons learned should be documented in appropriate procedures, work instructions, diagnostic scripts or Known Error Records. The Problem Manager facilitates the session and documents any agreed actions. It is recommended that the review take place within three days of problem closure.

14. Known Error Database


The purpose of a Known Error Database is to allow storage of previous knowledge of incidents and problems and how they were overcome to allow quicker diagnosis and resolution if they recur. The Known Error Record should hold exact details of the fault and the symptoms that occurred, together with precise details of any workaround or resolution action that can be taken to restore the service and/or resolve the problem. An incident count will also be useful to determine the frequency with which incidents are likely to recur and influence priorities, etc.
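
A minimal sketch of what such a record and lookup might look like is shown below. The field names follow the paragraph above (fault, symptoms, workaround, resolution, incident count); the class names, the `search` method and its simple substring matching are assumptions for illustration, not a description of any particular tool.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class KnownErrorRecord:
    error_id: str                      # hypothetical identifier
    fault: str                         # exact details of the fault
    symptoms: List[str] = field(default_factory=list)
    workaround: Optional[str] = None   # steps that restore the service
    resolution: Optional[str] = None   # permanent fix, if one exists
    incident_count: int = 0            # how often related incidents occurred


class KnownErrorDatabase:
    def __init__(self) -> None:
        self._records: Dict[str, KnownErrorRecord] = {}

    def add(self, record: KnownErrorRecord) -> None:
        self._records[record.error_id] = record

    def search(self, text: str) -> List[KnownErrorRecord]:
        """Case-insensitive substring match on fault and symptoms."""
        needle = text.lower()
        return [
            r for r in self._records.values()
            if needle in r.fault.lower()
            or any(needle in s.lower() for s in r.symptoms)
        ]

    def link_incident(self, error_id: str) -> int:
        """Count one more incident against a known error; return the new count."""
        record = self._records[error_id]
        record.incident_count += 1
        return record.incident_count
```

Keeping the incident count on the record itself makes it straightforward to sort known errors by frequency when deciding which ones deserve a permanent solution first.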

15. Process Deliverables


- Rejected problem record
- Accepted problem record
- Known Error/Workaround
- Permanent solution


16. Process Measurement (Metrics) and Reporting


The metrics below are used to judge the effectiveness and efficiency of the Problem Management process and its operation:

16.1. Metrics
- The total number of problems recorded in the period (as a control measure)
- The percentage of problems resolved within SLA targets (and the percentage that are not!)
- The number and percentage of problems that exceeded their target resolution times
- The backlog of outstanding problems and the trend (static, reducing or increasing?)
- The average cost of handling a problem
- The number of major problems (opened and closed and backlog)
- The percentage of Major Problem Reviews successfully performed
- The number of Known Errors added to the KEDB
- The percentage accuracy of the KEDB (from audits of the database)
- The percentage of Major Problem Reviews completed successfully and on time.
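
A few of these metrics can be derived directly from the problem records. The sketch below is an assumption-laden illustration: `ProblemStat` and `summarize` are hypothetical names, and each record is reduced to its actual resolution time and its target, both in hours.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ProblemStat:
    resolved_hours: Optional[float]  # None while the problem is still open
    target_hours: float              # target resolution time for its priority


def summarize(problems: List[ProblemStat]) -> dict:
    """Compute total recorded, backlog, and within/exceeded-target figures."""
    resolved = [p for p in problems if p.resolved_hours is not None]
    within = sum(1 for p in resolved if p.resolved_hours <= p.target_hours)
    pct = round(100.0 * within / len(resolved), 1) if resolved else 0.0
    return {
        "total_recorded": len(problems),
        "backlog": len(problems) - len(resolved),
        "pct_within_target": pct,
        "exceeded_target": len(resolved) - within,
    }
```

Running the same summary for consecutive periods and comparing the backlog values gives the trend (static, reducing or increasing) mentioned in the list.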


17. Process Meetings


17.1. Problem Management Meeting
Title: Problem Management Meeting

Purpose: The purpose of this meeting is to control and minimize the impact of incidents, problems and changes to the business environment that are caused by errors within the IT environment. The problem manager and other problem-solving group(s) meet to review problem records, problem trending and failed changes, and they ensure the root cause is isolated and a corrective action plan developed.

Frequency: Weekly

Role Players (Attendees):
- Problem Manager
- IT Lead team (Problem-Solving Group(s))
- 3rd Party Companies (If required)
- Business Manager(s) (If required)
- Incident Manager (If required)
- Change Manager (If required)

Agenda Content:
- Review open problem records
- Problem records backlog
- Review the Root Cause Analysis assignment and progress
- Approve/Reject problem resolution
- Conduct a Major problem review (If any)
- Develop action plan for the outstanding problems
- Update records
- Close completed problem records
- Review problem management process performance (reports from the system)
- Process Improvement: improvement opportunities identified and discussed
- Meeting closure
- Review known errors that require a permanent solution

The agenda needs to be submitted to all invitees at least 24 hours before the meeting.

Method of Communication:
- Face to face, or
- Conference Call (Tel Number: 1-88...), or
- Electronically through a supporting tool and emails.


17.2. Monthly Meeting


Title: Monthly Process Governance Meeting

Purpose:
- Overall review of process performance
- Identify gaps and develop action plans to accommodate solutions
- Review report on changes created during the last month and outstanding incidents
- Ensure that corrective action has been taken and that it was effective

Frequency: Monthly

Role Players:
- Problem Manager (Facilitator, prepares agenda and writes minutes of meeting)
- Process Owner
- IT Directors and Vice Presidents (Infrastructure & Applications)
- Business operation representative

Agenda Content:
- Comparison between required and actual performance
- Review business impacts and reports on total problem cost
- Reports on overall SLA performance (breaches vs. exceeding the agreed service level targets)
- Review the status of the actions assigned during previous meetings
- Develop action plan for the new outstanding issues

The agenda will be submitted to the problem manager a minimum of two days before the meeting.

Method of Communication:
- Tools: Change Management System Repository for keeping meeting agenda and minutes
- Conference Call (Tel Number: 1-88...)
- Face to face


18. Process RACI Chart


Step

Activity

Problem Initiator

Problem Manager

ProblemSolving group

3rd party Company (Suppliers/ Partners)

1,2,3, 4

Problem triggered by Change management, Problem management (Proactive Activities), Incident management (Further Root Cause Analysis) , Incident management (Incident post incident review) Problem Initiated Create Problem Record Does this Problem Exist somewhere else in the environment? Create a Class Problem Categorize and Prioritize Problem Assign Problem Record to Problem Manager Problem Record Resides under the Problem Manager Queue Review and Validate Problem Valid Problem? Update & Close Problem Record Inform Problem Initiator Duplicate Problem? Correct Priority? Set the Correct Priority Inform Problem Initiator 3rd Party Required? Assign Problem to Appropriate 3rd Party Company for Root Cause Analysis Assign Problem to Appropriate IT Service Specialist for Root Cause Analysis Problem Record Reside Under the Appropriate Problem Management Queue Investigate and Isolate Root Cause Root Cause Found? Requires 3rd Party Company Participation? Send Request 3rd Party Company Escalate to Problem Manager Document Root Cause and Mark as Known Error, Add to Known Error Database

AR

5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28

AR AR AR AR AR AR AR AR I AR AR AR AR I AR AR AR AR AR AR AR AR AR I AR AR C I C


Step

Activity

Problem Initiator

Problem Manager

ProblemSolving group
AR AR AR AR

3rd party Company (Suppliers/ Partners)

29 30 31 32 33

Is Permanent Solution Available? Develop Corrective Action Plan Evaluate Each CA to Determine if Change is Required Change Required? Change Management Process Create RFC I

AR

34

Change Management Process Changes Tested

AR

35 36 37 38 39 40 41 42 43 44 45 46 47

Implement Corrective Action Plan Issues with CA Implementation? Notify Problem Manager Verify CA Completion All CA Completed? Problem Resolved Problem Evolved from a Major Incident? Problem Resolution Needs to be Accepted by IT Team Leader Verify Problem Resolution with Initiator Is solution accepted? Update Problem Record Notify Problem Manager on Resolution Proactive Activity Review Known Errors in Known Errors Database I C C I

AR AR AR AR AR AR AR AR AR AR AR AR AR

48 49

Can Provide Permanent Solution? Proactive Activity Re-open the problem record

AR AR

50 51 52

Proactive Activity Discover Potential Incident Prepare Problem Report, Send to Problem Manager for Review & Distribution Notify Problem Manager I

AR AR AR


Step

Activity

Problem Initiator

Problem Manager

ProblemSolving group

3rd party Company (Suppliers/ Partners)

53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69

Conduct a Weekly Problem Management Meeting with ITLT Review Problem Status Problem Can be Closed by ITLT? Is it a Major Problem? Conduct a Major Problem Review Update and Close Problem Record Inform Problem-Solving Group; assure the Known Error Database Updated Accurately. Inform Problem-Solving Group to Update CMDB-If apply Develop Action Plan Monitor Implementation and Problem Status Receive Request for Root Cause Analysis Provide Assistance or Full Problem Resolution Can Provide Permanent Solution? Access Permitted to LCL Known Errors Database? Inform LCL IT Service Specialist to add the Known Errors and Workaround to KEDB Follow LCL Problem Management Process Add Known Errors and Workaround to KEDB (3rd Party Company) Inform LCL IT Service Specialist/Problem Manager on Problem Resolution

AR AR AR AR AR AR AR AR AR AR AR AR AR C I AR AR AR AR C C C I

70

AR

Legend   Explanation
R        Responsible for the action but not necessarily an authority or approval
A        Accountable for the action, only one person
C        Consulted before or during the action
I        Informed


19. Process Detailed Description


Steps 1, 2, 3, 4: Problem triggered by Change Management, Problem Management (Proactive Activities), Incident Management (Further Root Cause Analysis), Incident Management (Post Incident Review)
A problem can be triggered by the following:
- Change Management: if testing or implementation of a change did not succeed and the change tester/implementer does not know the reason, a problem record is created to find the root cause of the failure.
- Problem Management (Proactive activities): as a result of reviewing problem and incident reports, patterns of failures, monitoring alerts from IT infrastructure systems, or the outcome of a problem or incident meeting, a problem record is created. It is not associated with an incident; it is created to prevent an incident from happening.
- Incident Management (RCA): during the incident management lifecycle, and in order to provide a resolution to an incident, a root cause analysis is required from the problem-solving group to investigate, diagnose and analyze more deeply in order to isolate and identify the root cause of the incident.
- Incident Management (Post Incident Review): after the resolution of a critical or high incident, a post review is conducted. One of the outcomes of this review is a problem record created to identify the root cause of the incident.

Problem Initiated

Step 5: Create Problem Record
The problem record is created in the problem management system; the requester should complete the required information. Please see the Problem Logging Template in Section 21, Attachments.

Step 6: Does this Problem Exist somewhere else in the environment?
The problem initiator should consider the possibility that the same problem exists somewhere else in the IT environment. The problem initiator works within his/her knowledge and can share the concern with others (higher expertise) to get the correct data.

Step 7: Create a Class Problem
Example: Router Brand XX Model 123 is experiencing a problem after downloading a new version of software. There are five of them in the IT infrastructure. A Class Problem Record is created to cover one single problem on multiple components.


Step 8: Categorize and Prioritize Problem
The problem initiator should select the correct category, such as hardware > network > Router Brand XX. Problems must be categorized in the same way as incidents so that the true nature of the problem can be easily traced in the future and meaningful management information can be obtained. The initiator also needs to select the appropriate priority for the problem, which depends on the urgency and the impact of the problem.

Step 9: Assign Problem Record to Problem Manager
The problem record is dispatched to the problem manager automatically once the required fields are completed and the record is submitted.

Step 10: Problem Record Resides under the Problem Manager Queue
The record resides under the problem manager queue, waiting for the problem manager to open it, review it and proceed with the process.

Step 11: Review and Validate Problem
The problem manager opens and reviews the problem record and checks whether the required information is complete and whether it is a valid problem (sometimes incidents are created as problems and mistakenly dispatched to the problem manager).

Step 12: Valid Problem?
If YES then GOTO activity 15
If NO then Continue with activity 13

Step 13: Inform Problem Initiator
The problem manager calls the problem initiator by phone and explains the reasons behind the rejection: based on the problem definition and criteria, this is not a problem but an incident. The problem manager advises the initiator to create an incident record in the incident management system.

Step 14: Update & Close Problem Record
The problem manager updates the problem record with his/her reasons for rejecting the request, and closes the record.
Process END


Step 15: Duplicate Problem?
The problem manager checks whether a record has already been created for the same problem, by the same initiator or a different one, making this record a duplicate. For the record to be considered a duplicate, the following points need to be taken into account:
- The previous problem record must still be open
- Same configuration item and same problem description
- Associated with the same incident record or change record
- Call the initiator to confirm the duplication

The problem manager relies on his/her knowledge of the existing open problems and searches the system by the name of the initiator or by configuration item. The tool might give an informational message (pop-up) when the same problem initiator or configuration item exists in a previously opened record.
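
The duplicate criteria for step 15 can be sketched as a simple filter over existing records. This is an illustration only: `ProblemRecord`, its fields and `find_duplicates` are hypothetical names, and matching descriptions after normalizing case and whitespace is an assumption. The phone call to the initiator remains a manual confirmation step.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ProblemRecord:
    record_id: str
    initiator: str
    configuration_item: str
    description: str
    linked_record: Optional[str]  # associated incident or change record id
    is_open: bool = True


def _normalize(text: str) -> str:
    """Lower-case and collapse whitespace so trivial differences are ignored."""
    return " ".join(text.lower().split())


def find_duplicates(new: ProblemRecord,
                    existing: List[ProblemRecord]) -> List[ProblemRecord]:
    """Return open records that match the new record on all three criteria."""
    return [
        old for old in existing
        if old.is_open
        and old.record_id != new.record_id
        and old.configuration_item == new.configuration_item
        and _normalize(old.description) == _normalize(new.description)
        and old.linked_record == new.linked_record
    ]
```

Any candidates returned here would only be suggestions for the problem manager, who still confirms the duplication with the initiator.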

Step 16: Correct Priority?
The problem manager reviews the current priority of the problem record against the agreed definition of priorities.
If it is correct then GOTO activity 19
If NO then Continue with activity 17

Step 17: Inform Problem Initiator
The problem manager calls the problem initiator by phone and informs him/her that the wrong priority was selected.

Step 18: Set the Correct Priority
The problem manager sets the correct priority on the problem record based on the priority definition.

Step 19: 3rd Party Required?
A 3rd party is required if the problem exists in components or services managed or maintained by a 3rd party company, or requires expertise that does not exist inside the organization.
If YES then Continue with activity 20
If NO then GOTO activity 21

Step 20: Request the Loblaws Business Relationship Manager to Assign Problem to the Appropriate 3rd Party Company for Root Cause Analysis
The problem manager contacts the business relationship manager to assign the problem record to the 3rd party company. If the 3rd party has access to the problem management system, the problem manager dispatches the record and calls by phone to confirm that the record was received. If no access to the system has been granted, the problem manager sends an email with the problem record details and calls by phone.
GOTO activity 63


Step 21
Activity: Assign Problem to Appropriate Problem-Solving Group (IT Service Specialist for Root Cause Analysis)
Explanation: The problem manager assigns the problem record to the appropriate problem-solving group, depending on the category of the problem.

Step 22
Activity: Problem Record Resides Under the Appropriate Problem Management Queue
Explanation: The problem record resides in the problem-solving group's queue. The group receives an automatic notification from the system; the notification can be sent by email and as a message through the system.

Step 23
Activity: Investigate and Isolate Root Cause
Explanation: The problem-solving group receives the problem record, opens and reviews the problem details, and performs investigation and diagnosis activities to isolate the root cause. The speed and nature of this investigation will vary depending on the impact, severity and urgency of the problem, but the level of resources and expertise applied to finding a resolution should be proportionate to the priority code allocated and the service target in place for that priority level. Many problem analysis, diagnosis and solving techniques are available, and much research has been done in this area. Some of the most useful and frequently used techniques include: Chronological Analysis, Pain Value Analysis, Kepner and Tregoe, Brainstorming, Ishikawa Diagrams, and Pareto Analysis.

Step 24
Activity: Root Cause Found?
Explanation: If the problem-solving group found the root cause, GOTO activity 28. If NO, continue with activity 25.

Step 25
Activity: Requires 3rd Party Company Participation?
Explanation: If YES, continue with activity 26. If NO, GOTO activity 27.

Step 26
Activity: Send Request to Loblaws Business Relationship Manager to Assign Problem to the Appropriate 3rd Party Company
Explanation: The problem-solving group sends the request to the business relationship manager, who forwards it to the appropriate 3rd party company to assist in identifying the root cause. GOTO activity 63.
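Pareto Analysis, one of the techniques named in step 23, can be illustrated with a minimal Python sketch. The incident categories, the counts, and the 80% threshold below are assumptions chosen for illustration; they do not come from this process document.

```python
from collections import Counter


def pareto(categories, threshold=0.8):
    """Return (category, count, cumulative_share) rows, most frequent first,
    stopping once the cumulative share reaches the threshold."""
    counts = Counter(categories)
    total = sum(counts.values())
    rows = []
    running = 0
    for category, count in counts.most_common():
        running += count
        rows.append((category, count, running / total))
        if running / total >= threshold:
            break
    return rows


# Hypothetical incident categories, for illustration only.
incidents = ["network"] * 5 + ["application"] * 3 + ["server"] + ["security"]
vital_few = pareto(incidents)
for category, count, share in vital_few:
    print(f"{category}: {count} incidents, cumulative {share:.0%}")
```

With this sample data, two categories account for 80% of the incidents, which suggests where root cause analysis effort might be concentrated first.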


Step 27
Activity: Escalate to Problem Manager
Explanation: Neither the problem-solving group nor the 3rd party company can identify the root cause. The problem-solving group escalates the problem to the problem manager as an issue, and it is added as an item to the weekly problem management meeting, where the next step is discussed and decided and an action plan is developed. The problem manager follows up and monitors the implementation of those actions until the issue is resolved. GOTO activity 53.

Step 28
Activity: Document Root Cause and Mark as Known Error, Add to Known Error Database
Explanation: The problem-solving group documents the identified root cause and registers the error in the Known Error Database as a known error. If a workaround is found during the root cause analysis activity, it should be recorded in the problem record and the problem record kept open; it is important that work on a permanent resolution continues where this is justified. If no work on a permanent solution is planned, the problem record can be closed. Later, during the regular review of the known errors that are pending a permanent solution, a corrective action plan can be developed to provide a permanent solution if possible.

Step 29
Activity: Is Permanent Solution Available?
Explanation: If YES, continue with activity 30. If NO, GOTO activity 51.

Step 30
Activity: Develop Corrective Action Plan
Explanation: The problem-solving group develops a corrective action plan to eliminate the root cause permanently. The action plan contains, but is not limited to, the following: tasks, resources assigned against each task, timeline for each task, approver of each task, and the objective of the plan.

Step 31
Activity: Evaluate Each CA to Determine if Change is Required
Explanation: The problem-solving group evaluates each corrective action to determine whether a change is required.

Step 32
Activity: Change Required?
Explanation: If YES, continue with activity 33. If NO, GOTO activity 35.

Step 33
Activity: Change Management Process: Create RFC
Explanation: The problem-solving group creates the request for change and follows the change management process to get the change assessed and approved.
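The corrective action plan contents listed in step 30, and the change check in steps 31 and 32, can be sketched as a small data structure. This is only an illustration: the class names, field names, and sample values are hypothetical, and treating the plan as needing an RFC when any single task requires a change is an assumption about how the per-action evaluation is aggregated.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Task:
    description: str
    resource: str                  # person or team assigned to the task
    due: date                      # timeline for the task
    approver: str
    requires_change: bool = False  # outcome of the step 31 evaluation


@dataclass
class CorrectiveActionPlan:
    objective: str
    tasks: list[Task] = field(default_factory=list)

    def next_activity(self) -> int:
        """Step 32: activity 33 (create RFC) if any task needs a change, else 35."""
        return 33 if any(task.requires_change for task in self.tasks) else 35


# Hypothetical plan, for illustration only.
plan = CorrectiveActionPlan(
    objective="Eliminate recurring mail queue failures",
    tasks=[
        Task("Apply vendor patch", "Server team", date(2008, 8, 1), "IT team leader",
             requires_change=True),
        Task("Update monitoring thresholds", "Operations", date(2008, 8, 5),
             "IT team leader"),
    ],
)
print(plan.next_activity())
```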


Step 34
Activity: Change Management Process: Changes Tested
Explanation: The change is tested in the development environment before it is implemented in the production environment.

Step 35
Activity: Implement Corrective Action Plan
Explanation: The corrective action plan is implemented.

Step 36
Activity: Issues with CA Implementation?
Explanation: If YES, continue with activity 37. If NO, GOTO activity 38.

Step 37
Activity: Notify Problem Manager
Explanation: The problem-solving group notifies the problem manager of the issue(s) encountered during the implementation, such as shortage of resources, overtime payment, or technology constraints. The problem manager reviews and discusses the raised issue(s) during the problem management meeting and invites the people concerned to come up with an immediate solution or action plan. GOTO activity 53.

Step 38
Activity: Verify CA Completion
Explanation: The problem-solving group verifies that the corrective actions are complete, to ensure that all tasks were completed as per the plan and that no task was missed.

Step 39
Activity: All CA Completed?
Explanation: If YES, continue with activity 40. If NO, GOTO activity 52 and, in parallel, GOTO activity 35.

Step 40
Activity: Problem Resolved
Explanation: The problem is resolved by implementing the corrective actions that eliminated the root cause and provided a permanent solution.

Step 41
Activity: Problem Evolved from a Major Incident?
Explanation: If the problem evolved from, or is associated with, a critical or high priority incident, continue with activity 42. If NO, GOTO activity 43.

Step 42
Activity: Problem Resolution Needs to be Accepted by IT Team Leader
Explanation: The problem-solving group contacts the IT team leader by phone to verify and accept the solution before notifying the problem manager of the resolution. GOTO activity 44.

Step 43
Activity: Verify Problem Resolution with Initiator
Explanation: If the problem is not associated with a major incident, the problem-solving group calls the problem initiator by phone to verify the resolution.

Step 44
Activity: Is Solution Accepted?
Explanation: If the solution is accepted, continue with activity 45. If the solution is not accepted, GOTO activity 35 and, in parallel, notify the problem manager of the situation (GOTO activity 53).
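The decision points in steps 36, 39, 41 and 44 can be expressed as a lookup table, which makes the parallel branches explicit. The sketch below is illustrative only: the names ROUTES and next_activities are hypothetical, and representing a parallel branch as a list of two activity numbers is an assumption about how a tool might model it.

```python
# (decision step, answer) -> activities that follow; two entries mean
# both activities start in parallel.
ROUTES = {
    (36, True): [37],       # issues with CA implementation
    (36, False): [38],
    (39, True): [40],       # all corrective actions completed
    (39, False): [52, 35],  # notify problem manager and keep implementing
    (41, True): [42],       # problem evolved from a major incident
    (41, False): [43],
    (44, True): [45],       # solution accepted
    (44, False): [35, 53],  # rework and raise at the weekly meeting
}


def next_activities(step: int, answer: bool) -> list[int]:
    """Return the activity numbers that follow a YES/NO decision step."""
    try:
        return ROUTES[(step, answer)]
    except KeyError:
        raise ValueError(f"step {step} is not a decision step in this table") from None


print(next_activities(39, False))
print(next_activities(44, True))
```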


Step 45
Activity: Update Problem Record
Explanation: The problem-solving group updates the problem record with the details of the work done. It is recommended to attach the corrective action plan to the record and document the results.

Step 46
Activity: Notify Problem Manager on Resolution
Explanation: A notification is sent to the problem manager to inform him/her of the resolution. The notification can be made by one or more of the following methods: phone call (MUST), email, automatic notification through the system. The next step is for the problem manager to evaluate the resolution and close the record. GOTO activity 53.

Step 47
Activity: Proactive Activity: Review Known Errors in Known Errors Database
Explanation: The problem-solving group for a given IT area (such as Network, Servers and Active Directory, Applications, or Security) should periodically review the known errors within its technical area that are registered in the Known Errors Database with no permanent solution. The purpose of the review is to evaluate the possibility of providing a permanent solution to the problem.

Step 48
Activity: Can Provide Permanent Solution?
Explanation: If YES, continue with activity 49. If NO, continue monitoring and reviewing the registered known errors in the KEDB (GOTO activity 47).

Step 49
Activity: Proactive Activity: Re-open the Problem Record
Explanation: If the problem record was closed, re-open it; otherwise just follow the process. GOTO activity 30.

Step 50
Activity: Proactive Activity: Discover Potential Incident
Explanation: A potential incident can be discovered by the problem-solving group, by members of the problem management meeting, or by other IT staff, by reviewing the reports from the system or the alerts generated by the monitoring systems. GOTO activity 5

Step 51
Activity: Prepare Problem Report, Send to Problem Manager for Review & Distribution
Explanation: The problem-solving group prepares a problem report that explains the activities that took place, the result of the root cause analysis, and the reasons for not providing a permanent solution. The report is sent to the problem manager for his/her review and distribution. GOTO activity 53.
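The periodic review in steps 47 and 48 amounts to filtering the Known Errors Database by technical area and by the absence of a permanent solution. The following sketch assumes a hypothetical record layout (KnownError and its fields) and sample entries; the actual KEDB schema is not described in this document.

```python
from dataclasses import dataclass


@dataclass
class KnownError:
    error_id: str
    area: str                     # e.g. "Network", "Applications", "Security"
    workaround: str
    has_permanent_solution: bool


def pending_review(kedb, area):
    """Known errors in one technical area that still lack a permanent solution."""
    return [e for e in kedb if e.area == area and not e.has_permanent_solution]


# Hypothetical KEDB entries, for illustration only.
kedb = [
    KnownError("KE-001", "Network", "Restart the switch port", False),
    KnownError("KE-002", "Network", "Fail over to backup link", True),
    KnownError("KE-003", "Applications", "Clear the session cache", False),
]
for error in pending_review(kedb, "Network"):
    print(error.error_id, error.workaround)
```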


Step 52
Activity: Notify Problem Manager
Explanation: The problem-solving group notifies the problem manager when not all the corrective actions are completed and more time and effort are required to complete them. GOTO activity 53.

Step 53
Activity: Conduct a Weekly Problem Management Meeting with ITLT
Explanation: The problem manager conducts a weekly meeting with the IT Lead Team, the problem-solving group, and others depending on the agenda items (see section 17.1 Process Weekly Meeting).

Step 54
Activity: Review Problem Status
Explanation: Review the open problem records and the problems under solution, review issues and take action to resolve them, and review the reports generated from the system on the current status of the problems in the IT environment.

Step 55
Activity: Problem Can be Closed by ITLT?
Explanation: If the problem has been provided with a solution and can be closed, continue with activity 56. If the problem is still open and requires intervention to expedite the work or to resolve outstanding issues that are keeping the problem open, GOTO activity 61.

Step 56
Activity: Is it a Major Problem?
Explanation: A major problem is a problem with critical or high priority. If YES, continue with activity 57. If NO, GOTO activity 58.

Step 57
Activity: Conduct a Major Problem Review
Explanation: The problem manager conducts a major problem review (see Section 13 Major Problem Review).

Step 58
Activity: Update and Close Problem Record
Explanation: The change manager performs a check at this time to ensure that the record contains a full historical description of all events; if not, the record should be updated. The Problem Record is then formally closed.

Step 59
Activity: Inform Problem-Solving Group to Assure the Known Error Database is Updated Accurately
Explanation: The change manager informs the problem-solving group to update the KEDB accurately.

Step 60
Activity: Inform Problem-Solving Group to Update CMDB (if applicable)
Explanation: The change manager notifies the configuration manager of any change to a configuration item that took place during the problem-solving process. Process End.


Step 61
Activity: Develop Action Plan
Explanation: The change manager is accountable and responsible for developing an action plan to overcome issues or to expedite problem-solving activities.

Step 62
Activity: Monitor Implementation and Problem Status
Explanation: The change manager monitors the implementation of the action plan and the problem status. GOTO activity 55.

Step 63
Activity: Receive Request for Root Cause Analysis
Explanation: The 3rd party company receives a request from the LCL problem-solving group or from the problem manager.

Step 64
Activity: Provide Assistance or Full Problem Resolution
Explanation: The 3rd party company can participate partially or fully in the problem-solving activities. (Partial participation means, for example, assisting the LCL group in the root cause analysis or implementing the permanent solution.)

Step 65
Activity: Can Provide Permanent Solution?
Explanation: Can the 3rd party company provide a permanent solution? If YES, GOTO activity 68. If NO, continue with activity 66.

Step 66
Activity: Access Permitted to LCL Known Errors Database?
Explanation: Access refers to the level and extent of a service's functionality or data that a user is entitled to use. If the 3rd party company has access to the KEDB, GOTO activity 69. If NO, continue with activity 67.

Step 67
Activity: Inform LCL IT Service Specialist to Add the Known Errors and Workaround to KEDB
Explanation: The 3rd party company informs the problem-solving group to add the known error to the Known Error Database.

Step 68
Activity: Follow LCL Problem Management Process
Explanation: The 3rd party company should follow the LCL problem management process, which includes attending the LCL problem management meeting, notifying the problem manager when required, adding known errors to the LCL KEDB (if access is granted), and updating the problem record (if access is granted).

Step 69
Activity: Add Known Errors and Workaround to KEDB (3rd Party Company)
Explanation: Errors are added to the Known Error Database.

Step 70
Activity: Inform LCL IT Service Specialist/Problem Manager on Problem Resolution
Explanation: Inform the problem-solving group and the problem manager of the problem resolution.


20. Legend & Definitions


Problem: A cause of one or more Incidents. The cause is not usually known at the time a Problem Record is created, and the Problem Management Process is responsible for further investigation.

Workaround: Reducing or eliminating the Impact of an Incident or Problem for which a full Resolution is not yet available, for example by restarting a failed Configuration Item. Workarounds for Problems are documented in Known Error Records. Workarounds for Incidents that do not have associated Problem Records are documented in the Incident Record.

Known Error: A Problem that has a documented Root Cause and a Workaround. Known Errors are created and managed throughout their Lifecycle by Problem Management. Known Errors may also be identified by Development or Suppliers.

Trend Analysis: Analysis of data to identify time-related patterns. Trend Analysis is used in Problem Management to identify common Failures or fragile Configuration Items, and in Capacity Management as a Modeling tool to predict future behavior. It is also used as a management tool for identifying deficiencies in IT Service Management Processes.

Root Cause Analysis: An Activity that identifies the Root Cause of an Incident or Problem. RCA typically concentrates on IT Infrastructure failures.

Proactive Problem Management: Part of the Problem Management Process. The Objective of Proactive Problem Management is to identify Problems that might otherwise be missed. Proactive Problem Management analyses Incident Records, and uses data collected by other IT Service Management Processes to identify trends or significant problems.

21. Attachments

Problem Analysis Techniques.doc

Problem Record template.doc
