
Problem Management Overview: HDI Capital Area Chapter September 16, 2009 Hugo Mendoza, Column Technologies

The document provides an overview of problem management according to the ITIL framework. It defines problem management and discusses its goals, business value, lifecycle, roles and responsibilities, and metrics. Problem management aims to minimize the impact of incidents by identifying root causes and resolving known errors in order to prevent future incidents. It involves both reactive and proactive activities, including trend analysis, to resolve current problems and prevent potential problems. The document outlines the key processes, success factors, and value of implementing an effective problem management program.

Copyright
© Attribution Non-Commercial (BY-NC)

Problem Management Overview

HDI Capital Area Chapter September 16, 2009 Hugo Mendoza, Column Technologies

Problem Management Overview Agenda


- Overview of the ITIL framework
- Overview of Problem Management
  - Definition of Problem Management
  - Goals of Problem Management
  - Business Value of Problem Management
  - Problem Management Lifecycle
  - Critical Success Factors of Problem Management
  - Problem Management Roles and Responsibilities
  - Problem Management Metrics

Challenges Facing IT
IT is constantly being asked to:
- Improve service quality
- Reduce the complexity of IT
- Reduce risk
- Lower the cost of operations
- Manage compliance
- Reduce the burden on an overworked IT workforce
- Manage the IT organization more like a business

ITIL can provide the framework for a strategy to make IT, and particularly Problem Management, more efficient.

Introduction to ITIL
ITIL is a framework for IT Service Management best practice produced by the OGC (the UK Office of Government Commerce).

Adopting ITIL guidance offers a range of benefits that includes:
- Reduced costs
- Improved IT services through the use of proven best practice processes
- Improved customer satisfaction through a more professional approach to service delivery
- Standards and guidance
- Improved productivity
- Improved use of skills and experience

ITIL Statistics (Problem Management Focused)


From Pink Elephant:
- ITIL process improvements present senior IT management with an opportunity to improve efficiency and customer service quality, reduce IT workload and control costs by 20-40 per cent.
- Implementing key ITIL processes at Nationwide Insurance led to a 40% reduction in its system outages. The company estimates a $4.3 million ROI over the next three years.
- An ITIL program at Capital One resulted in a 30% reduction in system crashes and software-distribution errors, and a 92% reduction in 'business-critical' incidents, in 2 years.

Agreed ITIL Disciplines


- Financial Management
- Demand Management
- Service Portfolio Management
- Service Catalogue Management
- Capacity Management
- Availability Management
- Service Level Management
- Information Security Management
- Vendor Management
- Continuity Management
- Planning and Support
- Change Management
- Incident Management
- Problem Management
- Service Validation & Testing Management
- Release & Deployment Management
- Service Evaluation Management
- Knowledge Management
- Event Management
- Request Fulfillment
- Access Management
- Service Asset & Configuration

ITIL (v3) Library Components


ITIL Core:
- Service Strategy
- Service Design
- Service Transition
- Service Operation
- Continual Service Improvement

ITIL Complementary Guidance

Set of publications specific to:
- Industry sectors
- Organization types
- Operating models
- Technology architectures

Available via web

http://www.best-management-practice.com

ITIL (v3) Service Operation


Service Operation
- Achieving effectiveness and efficiency in the delivery and support of services
- Ensuring value to customers

Topics
- Stability in service operations
- Managing availability
- Controlling demand
- Scheduling operations
- Fixing problems

Processes
- Event Management
- Incident Management
- Request Fulfillment
- Problem Management
- Access Management

Incident vs Problem
Incident Management is concerned with restoring normal service operation as quickly as possible and minimizing the adverse effect on business operations. ('Normal service operation' is defined here as service operation within Service Level Agreement (SLA) limits.)

Problem Management is the process that seeks to resolve the root cause of incidents, and thus to minimize the adverse impact on the business of incidents and problems caused by errors within the IT infrastructure, and to prevent recurrence of incidents related to these errors.

A 'problem' is an unknown underlying cause of one or more incidents, and a 'known error' is a problem that is successfully diagnosed and for which either a work-around or a permanent resolution has been identified.

Service Operation Balance


Reactive vs Proactive

Reactive organizations
- Do not act unless triggered
- Reactive efforts tend to build until all work is reactive

Proactive organizations
- Always looking for ways to improve services
- Can be overly expensive

Extremely Reactive: An organization here is out of balance and is not able to effectively support the business strategy.

Extremely Proactive: An organization here is quite balanced, but tends to fix services that are not broken, resulting in higher levels of change.

Goal of Problem Management


Problem Management is both reactive and proactive in the identification and resolution of errors.

Goal
- To minimize the adverse impact of Incidents and Problems caused by errors in the infrastructure and to proactively prevent the occurrence of Incidents, Problems and errors.
- Incident Management, by contrast, is concerned with restoring service.

Objectives
- Resolve Problems quickly and effectively
- Ensure resources are prioritized to resolve Problems in the most appropriate order based on business need
- Proactively identify and resolve Problems and Known Errors to minimize or prevent Incidents from occurring
- Minimize the impact of Incidents that cannot be prevented
- Improve the productivity of support staff
- Provide relevant management information

Problem Management Definitions

Problem: The unknown root cause of one or more existing or potential Incidents.

Known Error: A fault in a CI identified by the successful diagnosis of a problem and for which a temporary workaround or permanent solution has been identified.

Workaround: A temporary remedy to eliminate or reduce interruption in service due to an Incident.

Urgency: A measure of business criticality of an Incident, Problem or Change where there is an effect upon business deadlines.

Impact: A measure of the effect that an Incident, Problem or Change might have on the business service being provided.

CI: A Configuration Item (CI) is any object being managed by the IT Organization that is stored within the CMDB.

CMDB: A Configuration Management Database (CMDB) is a repository of all managed CIs and their associated relationships.
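
The relationship between the terms defined above can be sketched as a small data model. This is only an illustration: the class names, field names and sample records are assumptions made for the example and do not come from ITIL or from any particular service desk tool. It shows a Problem being treated as a Known Error once its root cause is diagnosed and a workaround or permanent solution has been identified.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Incident:
    incident_id: str
    ci: str       # Configuration Item affected, as it would be named in the CMDB
    summary: str


@dataclass
class Problem:
    problem_id: str
    incidents: List[Incident] = field(default_factory=list)
    root_cause: Optional[str] = None          # unknown until diagnosis succeeds
    workaround: Optional[str] = None          # temporary remedy
    permanent_solution: Optional[str] = None  # e.g. delivered through a change

    def is_known_error(self) -> bool:
        """True once diagnosed and a workaround or permanent solution exists."""
        has_remedy = self.workaround is not None or self.permanent_solution is not None
        return self.root_cause is not None and has_remedy


# Hypothetical example data
problem = Problem(
    "PRB-1",
    incidents=[Incident("INC-1", "mail-server-01", "Mail queue stalls")],
)
before = problem.is_known_error()

problem.root_cause = "Disk fills up with unrotated logs"
diagnosed_only = problem.is_known_error()  # root cause alone is not enough

problem.workaround = "Purge old logs manually"
after = problem.is_known_error()

print(before, diagnosed_only, after)
```

Keeping the Known Error as a state of the Problem record, rather than a separate record type, is a simplification chosen for brevity here.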

Scope of Problem Management

Scope
- Diagnosis of the root cause
  - Identifies Known Errors
- Strong relationship with Knowledge Management
  - Populates the Knowledge Management database
- Uses similar if not identical tools and categorization as Incident Management
- Key process area within the ITIL framework

KPIs for Problem Management


- Ratio of number of incidents versus number of problems, sometimes grouped by services and in some cases by CIs
- Number of repeat problems (problems, not incidents)
- Balance of Problems solved with a KE - RFC or other
- Average problem closure duration
- % of unmodified/neglected problems
- % of problems with a root cause analysis
- Average cost to solve a problem
- % of problems with available workaround

KPIs continued
- Number of Incidents resolved by Problem resolution
- Costs incurred during Problem resolution
- Expected plans and timelines for open Problems and Errors
- Number of Incidents resolved using the Knowledge Base

Source: Column PM Scorecard and KPILibrary.com
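
A few of the KPIs listed above can be computed directly from problem records. The sketch below is a minimal illustration under stated assumptions: the `ProblemRecord` fields, the day-number timestamps and the sample data are all invented for the example, and a real tool would expose this information differently.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class ProblemRecord:
    problem_id: str
    opened_day: int             # day number on which the problem was opened
    closed_day: Optional[int]   # None while the problem is still open
    has_root_cause_analysis: bool
    has_workaround: bool


def percent(part: int, whole: int) -> float:
    """Percentage of `part` in `whole`, or 0.0 when there is nothing to count."""
    return 100.0 * part / whole if whole else 0.0


def problem_kpis(problems: List[ProblemRecord], incident_count: int) -> Dict[str, float]:
    """Compute four of the listed KPIs for one reporting period."""
    closed = [p for p in problems if p.closed_day is not None]
    durations = [p.closed_day - p.opened_day for p in closed]
    total = len(problems)
    return {
        "incidents_per_problem": incident_count / total if total else 0.0,
        "avg_closure_days": sum(durations) / len(durations) if durations else 0.0,
        "pct_with_rca": percent(sum(p.has_root_cause_analysis for p in problems), total),
        "pct_with_workaround": percent(sum(p.has_workaround for p in problems), total),
    }


# Hypothetical sample data
sample = [
    ProblemRecord("PRB-1", 1, 11, True, True),
    ProblemRecord("PRB-2", 3, 8, True, False),
    ProblemRecord("PRB-3", 5, None, False, True),
    ProblemRecord("PRB-4", 6, None, False, False),
]
kpis = problem_kpis(sample, incident_count=20)
print(kpis)
```

Only closed problems contribute to the average closure duration; open problems are still counted in the percentage-based KPIs.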

Business Value of Problem Management


Value to Business
- Problem Management reduces the Known Errors in the environment, resulting in improved availability and fewer incidents

Other benefits
- Higher availability of IT services
- Higher productivity of business and IT staff
- Reduced expenditure on workarounds or fixes that do not work
- Reduction in cost of effort in firefighting or resolving repeat incidents
- Better first-time fix rate of the Service Desk
- Improved organizational learning

Reactive vs. Proactive Problem Management


Reactive Problem Management
Reactive problem management seeks to cure the symptoms of problems. The reactive approach responds to reports of incidents that have already occurred.
- Problem Control Activities
- Error Control Activities

Proactive Problem Management


Proactive problem management seeks to inoculate IT systems against problems. The proactive approach identifies potential problems before they emerge.
- Trend Analysis
- Targeting Preventative Action
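
Trend analysis can be as simple as counting incidents that share a CI and a category and flagging the groups that keep recurring. The sketch below assumes incidents are available as `(ci, category)` pairs and uses an arbitrary threshold of three occurrences; the function name, the threshold and the sample data are illustrative assumptions, not part of ITIL.

```python
from collections import Counter
from typing import Dict, List, Tuple

IncidentKey = Tuple[str, str]  # (configuration item, incident category)


def recurring_incident_groups(
    incidents: List[IncidentKey], threshold: int = 3
) -> Dict[IncidentKey, int]:
    """Return the (CI, category) groups seen at least `threshold` times."""
    counts = Counter(incidents)
    return {key: n for key, n in counts.items() if n >= threshold}


# Hypothetical incident log for one reporting period
incident_log = (
    [("mail-server-01", "disk full")] * 4
    + [("vpn-gw", "timeout")] * 2
    + [("print-srv", "driver")]
)

candidates = recurring_incident_groups(incident_log)
print(candidates)
```

Each flagged group is only a candidate for a proactive problem record; deciding whether to open one remains a judgment about business impact.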

Problem Management High Level Process

Inputs
- Incident Details
- Workarounds
- Configuration details
- IT Infrastructure details
- Known Errors from Releases

Problem Management

Outputs
- Known Errors
- Requests for Change (RFCs)
- Problem Records
- Management Information

Problem Management Activities


Problem control
- Problem identification and recording
- Problem classification
- Problem investigation and diagnosis
- RFC and possible resolution and closure
- Tracking and monitoring of problems

Error control
- Error identification and recording
- Error assessment
- Recording error resolution
- Error closure
- Monitoring resolution progress

Assistance with the handling of major Incidents

Proactive prevention of Problems
- Trend analysis
- Targeting support action
- Providing information to the organization

Obtaining management information from Problem data

Completing major Problem reviews

Problem Management Lifecycle


[Diagram: two parallel tracks, Problem Control and Error Control]

Problem Control
- Problem Identification and Recording
- Problem Classification
- Problem Investigation and Diagnosis
- RFC and possible Resolution and Closure
- Tracking and Monitoring of Problems

To Error Control: Known Error, Workaround, Solution

Error Control
- Error Identification and Recording
- Error Assessment
- Record Error Resolution
- RFC, then Successful Change Implementation
- Close Error and Associated Problems
- Tracking and Monitoring of Errors

Note: Error Control does not require a Problem to begin tracking and resolution of Errors.
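
The two lifecycle tracks can be sketched as ordered stage lists with a helper that returns the next stage. The stage names are taken from the slide; their ordering within each track is an assumption based on the order of the activity lists, since the original diagram did not survive extraction. Tracking and monitoring is left out because it runs alongside the other stages rather than between them.

```python
from typing import List, Optional

# Assumed ordering of the stages named in the lifecycle diagram.
PROBLEM_CONTROL: List[str] = [
    "Problem Identification and Recording",
    "Problem Classification",
    "Problem Investigation and Diagnosis",
    "RFC and possible Resolution and Closure",
]

ERROR_CONTROL: List[str] = [
    "Error Identification and Recording",
    "Error Assessment",
    "Record Error Resolution",
    "Close Error and Associated Problems",
]


def next_stage(stages: List[str], current: str) -> Optional[str]:
    """Return the stage that follows `current`, or None at the end of the track."""
    position = stages.index(current)  # raises ValueError for an unknown stage
    return stages[position + 1] if position + 1 < len(stages) else None


after_classification = next_stage(PROBLEM_CONTROL, "Problem Classification")
after_closure = next_stage(ERROR_CONTROL, "Close Error and Associated Problems")
print(after_classification)
print(after_closure)
```

Because Error Control does not require a Problem to begin, the two tracks are kept as independent lists rather than one chained sequence.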

Problem Management Critical Success Factors


- Effective automated registration of Incidents
  - Should be linked with Incident records
- Setting obtainable objectives and making use of skills of the Problem-solving team
- Good cooperation between Incident Management and Problem Management
- Setting aside time for true proactive Problem Management
  - A little time goes a long way to reduce the number of Incidents
  - Over time, the reactive part of Problem Management will be reduced and more time spent on proactive Problem Management
  - Focus on key Problems that cause the greatest pain
- Errors in released software should be incorporated into the Known Error database for live services
- Well defined Problem Management Roles

Problem Management Roles


Roles

Problem Manager
- Person or people responsible for Problem Management
- Responsible for:
  - Liaison with all problem resolution groups
  - Formal closure of all Problem Records
  - Develop and maintain relationships with suppliers and 3rd parties
  - Major Problem Reviews

Problem Solving Groups
- Technical groups and/or suppliers
- Responsible for problem solving

Problem Management Process Owner
- Overall authority and responsibility for the process metrics, policies and procedures

Knowledge Manager
- Responsible for the quality and integrity of the Knowledge Database

Problem Management Metrics (KPIs)

Problem Management Key Pitfalls


- Poor link between Incident Management and Problem Management
- Lack of management commitment
- Insufficient time and resources to build and maintain the knowledge base
- Ineffective communication of Known Errors from the development environment to the live environment
- Organizational resistance to change

Questions and Answers
