Problem Management Overview: HDI Capital Area Chapter September 16, 2009 Hugo Mendoza, Column Technologies
Challenges Facing IT
IT is constantly being asked to:
Improve service quality
Reduce the complexity of IT
Reduce risk
Lower the cost of operations
Manage compliance
Reduce the burden on an overworked IT workforce
Manage the IT organization more like a business
ITIL can provide the framework for a strategy to make IT, and Problem Management in particular, more efficient.
Introduction to ITIL
ITIL is a framework for IT Service Management best practice produced by the OGC (the UK Office of Government Commerce).
Adopting ITIL guidance offers a range of benefits, including:
Reduced costs; Improved IT services through the use of proven best practice processes; Improved customer satisfaction through a more professional approach to service delivery; Standards and guidance; Improved productivity; Improved use of skills and experience
[Figure: ITIL process diagram. Surviving labels: Service Transition; Service Validation & Testing; Release & Deployment Management; Service Evaluation; Knowledge Management; Event Management; Request Fulfillment; Access Management; Service Asset & Configuration Management]
Incident vs Problem
Incident Management restores normal service operation as quickly as possible and minimizes the adverse effect on business operations. ('Normal service operation' is defined here as service operation within Service Level Agreement (SLA) limits.)
Problem Management is the process that seeks to resolve the root cause of incidents. Its aims are to minimize the adverse impact on the business of incidents and problems caused by errors within the IT infrastructure, and to prevent recurrence of incidents related to these errors.
A 'problem' is an unknown underlying cause of one or more incidents. A 'known error' is a problem that has been successfully diagnosed and for which either a workaround or a permanent resolution has been identified.
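The relationship between incidents, problems and known errors can be sketched as a small data model. This is only an illustration of the definitions above, not part of ITIL or of any tool; the class names, field names and sample records are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Incident:
    incident_id: str
    summary: str


@dataclass
class Problem:
    problem_id: str
    incidents: List[Incident] = field(default_factory=list)  # incidents it explains
    root_cause: Optional[str] = None            # unknown until diagnosed
    workaround: Optional[str] = None            # temporary remedy, if any
    permanent_resolution: Optional[str] = None  # permanent fix, if any

    def is_known_error(self) -> bool:
        """True once diagnosed AND a workaround or permanent resolution exists."""
        diagnosed = self.root_cause is not None
        has_remedy = (self.workaround is not None
                      or self.permanent_resolution is not None)
        return diagnosed and has_remedy


# Hypothetical sample data: two incidents traced to one underlying problem.
problem = Problem(
    "PRB-1",
    incidents=[Incident("INC-1", "Mail slow"), Incident("INC-2", "Mail timeout")],
)
print(problem.is_known_error())  # False: root cause still unknown

problem.root_cause = "Full disk on mail server"
problem.workaround = "Purge old log files"
print(problem.is_known_error())  # True: diagnosed, workaround identified
```

Under this sketch a diagnosed problem with no workaround or resolution is still only a problem, which matches the definition of a known error given above.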
Proactive organizations
Always looking for ways to improve services
Can be overly expensive
[Figure: spectrum running from Extremely Reactive to Extremely Proactive]
Objectives
Resolve Problems quickly and effectively
Ensure resources are prioritized to resolve Problems in the most appropriate order based on business need
Proactively identify and resolve Problems and Known Errors to minimize or prevent Incidents
Minimize the impact of Incidents that cannot be prevented
Improve the productivity of support staff
Provide relevant management information
Problem: The unknown root cause of one or more existing or potential Incidents
Known Error: A fault in a CI identified by the successful diagnosis of a Problem, and for which a temporary workaround or permanent solution has been identified
Workaround: A temporary remedy to eliminate or reduce interruption in service due to an Incident
Urgency: A measure of business criticality of an Incident, Problem or Change where there is an effect upon business deadlines
Impact: A measure of the effect that an Incident, Problem or Change might have on the business service being provided
CI: A Configuration Item (CI) is any object managed by the IT organization that is stored within the CMDB
CMDB: A Configuration Management Database (CMDB) is a repository of all managed CIs and their associated relationships
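To make the CI and CMDB definitions concrete, here is a minimal in-memory sketch of a repository of CIs and their relationships. It is an assumption-laden toy, not a real CMDB product or schema: the `CMDB` class, its methods, the single "depends on" relationship type and the CI names are all invented for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Set


class CMDB:
    """Toy CMDB: CIs plus directed 'depends on' relationships (hypothetical)."""

    def __init__(self) -> None:
        self.cis: Dict[str, str] = {}  # CI name -> CI type
        # Maps a CI to the set of CIs that depend on it.
        self.dependents: Dict[str, Set[str]] = defaultdict(set)

    def add_ci(self, name: str, ci_type: str) -> None:
        self.cis[name] = ci_type

    def add_dependency(self, ci: str, depends_on: str) -> None:
        if ci not in self.cis or depends_on not in self.cis:
            raise KeyError("both CIs must be registered first")
        self.dependents[depends_on].add(ci)

    def affected_by(self, faulty_ci: str) -> List[str]:
        """All CIs that directly or indirectly depend on faulty_ci."""
        seen: Set[str] = set()
        stack = [faulty_ci]
        while stack:
            current = stack.pop()
            for dep in self.dependents.get(current, ()):
                if dep not in seen:
                    seen.add(dep)
                    stack.append(dep)
        return sorted(seen)


# Hypothetical CIs: a server, a database on it, and a service using the database.
cmdb = CMDB()
cmdb.add_ci("db-server-01", "server")
cmdb.add_ci("orders-db", "database")
cmdb.add_ci("order-entry", "business service")
cmdb.add_dependency("orders-db", depends_on="db-server-01")
cmdb.add_dependency("order-entry", depends_on="orders-db")

print(cmdb.affected_by("db-server-01"))  # ['order-entry', 'orders-db']
```

Walking the relationships from a faulty CI is one way the Impact of a Problem on business services might be estimated.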
Uses tools and categorization similar, if not identical, to those of Incident Management
Key process area within the ITIL framework
KPIs continued
Number of Incidents resolved by Problem resolution
Costs incurred during Problem resolution
Expected plans and timelines for open Problems and Errors
Number of Incidents resolved using the Knowledge Base
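As a rough sketch, the countable KPIs above could be derived from incident and problem records like this. The record layout is an assumption: the `resolved_via` and `resolution_cost` fields and their values are hypothetical and would differ in any real service desk tool.

```python
from collections import Counter
from typing import Dict, List


def problem_kpis(incidents: List[dict], problems: List[dict]) -> Dict[str, float]:
    """Summarize a few Problem Management KPIs from plain record dicts."""
    # Count incidents by how they were resolved (hypothetical field).
    routes = Counter(inc["resolved_via"] for inc in incidents)
    return {
        "incidents_resolved_by_problem_resolution": routes["problem_resolution"],
        "incidents_resolved_using_knowledge_base": routes["knowledge_base"],
        "total_problem_resolution_cost": sum(p["resolution_cost"] for p in problems),
    }


# Hypothetical sample records.
sample_incidents = [
    {"id": "INC-1", "resolved_via": "problem_resolution"},
    {"id": "INC-2", "resolved_via": "knowledge_base"},
    {"id": "INC-3", "resolved_via": "knowledge_base"},
    {"id": "INC-4", "resolved_via": "other"},
]
sample_problems = [
    {"id": "PRB-1", "resolution_cost": 1200.0},
    {"id": "PRB-2", "resolution_cost": 300.0},
]

for name, value in problem_kpis(sample_incidents, sample_problems).items():
    print(f"{name}: {value}")
```

Plans and timelines for open Problems are not counted here, since they are reported as schedules, not totals.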
Problem Management inputs:
Incident details
Workarounds
Configuration details
IT infrastructure details
Known Errors from Releases
Problem Management outputs:
Known Errors
Requests for Change (RFCs)
Problem Records
Management information
Error control
Obtaining management information from Problem data
Completing major Problem reviews
[Figure: Error Control flow. Surviving labels: Problem Classification; Error Assessment; RFC; Known Error; Workaround; Solution]
Note: Error Control does not require a Problem in order to begin tracking and resolving Errors.
Setting attainable objectives and making use of the skills of the Problem-solving team
Good cooperation between Incident Management and Problem Management
Setting aside time for true proactive Problem Management
  A little time goes a long way to reduce the number of Incidents
  Over time, the reactive part of Problem Management will shrink and more time will be spent on proactive Problem Management
  Focus on key Problems that cause the greatest pain
Errors in released software should be incorporated into the Known Error database for live services
Well-defined Problem Management roles
Knowledge Manager
Responsible for the quality and integrity of the Knowledge Database