IT Infrastructure Library (ITIL)
Video 3: Service Management
Service Management Lifecycle An approach to IT Service Management
that emphasizes the importance of
coordination and Control across the various
Functions, Processes, and Systems
necessary to manage the full Lifecycle of IT
Services. The Service Management
Lifecycle approach considers the Strategy,
Design, Transition, Operation and
Continuous Improvement of IT Services.
Service Manager A manager who is responsible for
managing the end-to-end Lifecycle of one
or more IT Services. The term Service
Manager is also used to mean any
manager within the IT Service Provider.
Most commonly used to refer to a Business
Relationship Manager, a Process Manager,
an Account Manager or a senior manager
with responsibility for IT Services overall.
Service Management Service Management is a set of specialized
organizational capabilities for providing
value to customers in the form of services.
Service A means of delivering value to Customers
by facilitating Outcomes Customers want to
achieve without the ownership of specific
Costs and Risks.
Deming quality circle:
Mission:
Specific
Measurable
Appropriate
Realistic
Time-bound
Measuring planning:
Balanced Scorecard (BSC) = Critical Success Factors (CSFs): Key Performance Indicators (KPIs; can be
divided into smaller PIs)
Process Management:
Process A structured set of Activities designed to
accomplish a specific Objective. A Process
takes one or more defined inputs and turns
them into defined outputs. A Process may
include any of the Roles, responsibilities,
tools and management Controls required
to reliably deliver the outputs. A Process
may define Policies, Standards,
Guidelines, Activities, and Work
Instructions if they are needed.
"manage to process"
Measurement and control with metrics; quantifiable indicators, based on baseline data.
Process relationships: software to modules, transfers of data to databases, etc.
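The note above on measurement and control (quantifiable indicators compared against baseline data) can be sketched roughly as follows. This is only an illustration: the metric (resolution time in minutes), the sample numbers, and the 10% tolerance are assumptions, not values from ITIL or from the video.

```python
from statistics import mean

def deviation_from_baseline(baseline, current):
    """Return the relative change of the current mean versus the baseline mean."""
    base = mean(baseline)
    return (mean(current) - base) / base

def within_control(baseline, current, tolerance=0.10):
    """True if the current mean is within +/- tolerance of the baseline mean."""
    return abs(deviation_from_baseline(baseline, current)) <= tolerance

# Hypothetical resolution times in minutes
baseline_minutes = [40, 50, 60]   # baseline period
current_minutes = [50, 55, 60]    # current period
print(within_control(baseline_minutes, current_minutes))
```

A process owner could run such a check per reporting period and investigate only when the metric drifts outside the tolerance.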
Service Desk Support
Video 4
Service Desk (Service Operation) The Single Point of
Contact between the Service Provider and
the Users. A typical Service Desk manages
Incidents and Service Requests, and also
handles communication with the Users.
Help Desk (Service Operation) A point of contact for
Users to log Incidents. A Help Desk is
usually more technically focused than a
Service Desk and does not provide a
Single Point of Contact for all interaction.
The term Help Desk is often used as a
synonym for Service Desk.
Call center typically deals with large call volume
The SD has a larger responsibility than telesales or incident response. It is a consolidated,
comprehensive, single interface between IT and users. It improves communication and teamwork
by informing users about status, progress, assessment, and changes (short-term and long-term).
Service Desk:
Change Management (Service Transition) The Process
responsible for controlling the Lifecycle of
all Changes. The primary objective of
Change Management is to enable
beneficial Changes to be made, with
minimum disruption to IT Services.
Incident Management (Service Operation) The Process
responsible for managing the Lifecycle of
all Incidents. The primary Objective of
Incident Management is to return the IT
Service to Users as quickly as possible.
Service Level Management (SLM) (Service Design) (Continual Service
Improvement) The Process responsible for
negotiating Service Level Agreements, and
ensuring that these are met. SLM is
responsible for ensuring that all IT Service
Management Processes, Operational Level
Agreements, and Underpinning Contracts,
are appropriate for the agreed Service
Level Targets. SLM monitors and reports
on Service Levels, and holds regular
Customer reviews.
Configuration Management (Service Transition) The Process
responsible for maintaining information
about Configuration Items required to
deliver an IT Service, including their
Relationships. This information is managed
throughout the Lifecycle of the CI.
Configuration Management is part of an
overall Service Asset and Configuration
Management Process.
Release Management (Service Transition) The Process
responsible for Planning, scheduling and
controlling the movement of Releases to
Test and Live Environments. The primary
Objective of Release Management is to
ensure that the integrity of the Live
Environment is protected and that the
correct Components are released.
Release Management is part of the Release
and Deployment Management Process.
Underpinning Contract (UC) (Service Design) A Contract between an
IT Service Provider and a Third Party.
The Third Party provides goods or Services
that support delivery of an IT Service to a
Customer. The Underpinning Contract
defines targets and responsibilities that are
required to meet agreed Service Level
Targets in an SLA.
Centralized Service Desk
Centralized responsibility for accepting and recording service calls, routing, monitoring,
escalation
Unified business operations support
Uses a common incident reporting/recording system
Bridges physical with operation via direct communication
Rapid response and close proximity are objectives
Local Service Desk (Distributed)
Similar to centralized service desk, but the service desks are distributed amongst different
physical locations
Virtual Service Desk
Modern, specialized version of Local Service Desk
Several local service desks virtualized as a single telecom unit
Provides global, round-the-clock support
Difficult to provide on-site support
Phone numbers are often re-routed as network center business hours transition
Excellent solution for large multinational organizations
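The re-routing of calls as business hours move from one center to the next might look roughly like this. The desk names and the UTC hour ranges are made up for illustration; a real virtual service desk would do this in the telephony platform.

```python
from datetime import datetime, timezone

# Hypothetical local desks with business hours in UTC (start inclusive, end exclusive)
DESKS = [
    ("Singapore", 0, 8),
    ("London", 8, 16),
    ("New York", 16, 24),
]

def route_call(now_utc):
    """Return the name of the local desk that is on duty at the given UTC time."""
    for name, start, end in DESKS:
        if start <= now_utc.hour < end:
            return name
    raise ValueError("no desk covers this hour")

print(route_call(datetime(2024, 1, 1, 9, 30, tzinfo=timezone.utc)))
```

Because the three ranges cover all 24 hours, the caller always reaches one desk through the same number.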
Incident Management
Video 5
Overview of Incident Management
To lower or even eliminate effects or occurrences of IT interruptions, disturbances or
quality reduction
Must get users back to work as soon as humanly possible (a.s.a.h.p.)
Hire appropriate specialists to record, classify, and allocate all incidents
Progress should be monitored
Incidents must all be resolved
Process must be closed after resolution
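The overview above (record, classify, allocate, monitor, resolve, close) can be read as a small state machine. The sketch below is one possible reading; the state names and the rule that no step may be skipped are assumptions for illustration.

```python
# Allowed transitions between hypothetical incident states
TRANSITIONS = {
    "recorded": {"classified"},
    "classified": {"allocated"},
    "allocated": {"in_progress"},
    "in_progress": {"resolved"},
    "resolved": {"closed"},
    "closed": set(),
}

class Incident:
    def __init__(self, summary):
        self.summary = summary
        self.state = "recorded"
        self.history = ["recorded"]  # kept so that progress can be monitored

    def advance(self, new_state):
        """Move to new_state, refusing transitions that skip a step."""
        if new_state not in TRANSITIONS[self.state]:
            raise ValueError(f"cannot go from {self.state} to {new_state}")
        self.state = new_state
        self.history.append(new_state)

incident = Incident("Printer on floor 2 is offline")
for step in ["classified", "allocated", "in_progress", "resolved", "closed"]:
    incident.advance(step)
print(incident.history)
```

The history list gives a simple audit trail, and the transition table makes sure an incident cannot be closed before it is resolved.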
Incident (Service Operation) An unplanned
interruption to an IT Service or a
reduction in the Quality of an IT Service.
Failure of a Configuration Item that has not
yet impacted Service is also an Incident.
For example Failure of one disk from a
mirror set.
Incident Management (Service Operation) The Process
responsible for managing the Lifecycle of all
Incidents. The primary Objective of Incident
Management is to return the IT Service to
Users as quickly as possible.
Incident Records vs. Problem records
Incident records are end-user focused
Incident reporting management focuses on measurements of downtime and business
impact
Problem records start with IT management decision to invest IT resources and end-user
downtime for root cause investigation and implementation of resolution
Problem records focus on internal IT processes
Management reporting on problems focuses on root causes and implementing structural
resolutions
Incident Record (Service Operation) A Record
containing the details of an Incident.
Each Incident record documents the
Lifecycle of a single Incident.
Problem Record (Service Operation) A Record
containing the details of a Problem.
Each Problem Record documents the
Lifecycle of a single Problem.
Problem (Service Operation) A cause of one or
more Incidents. The cause is not
usually known at the time a Problem
Record is created, and the Problem
Management Process is responsible for
further investigation.
Escalation (Service Operation) An Activity that
obtains additional Resources when these
are needed to meet Service Level Targets
or Customer expectations. Escalation may
be needed within any IT Service
Management Process, but is most
commonly associated with Incident
Management, Problem Management and
the management of Customer complaints.
There are two types of Escalation,
Functional Escalation and Hierarchic
Escalation.
Functional Escalation (Horizontal); develop first. (Service Operation) Transferring an
Incident, Problem or Change to a technical
team with a higher level of expertise to
assist in an Escalation.
Hierarchic Escalation (Vertical) ; develop second. (Service Operation) Informing or involving
more senior levels of management to assist
in an Escalation.
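The two escalation types can be contrasted in a few lines. The tier names and the 80% threshold below are assumptions for illustration only; the glossary does not prescribe them.

```python
# Hypothetical support tiers, ordered from least to most specialised
SUPPORT_TIERS = ["service_desk", "second_line", "third_line"]

def functional_escalation(current_tier):
    """Return the next, more specialised tier, or None if already at the top."""
    i = SUPPORT_TIERS.index(current_tier)
    return SUPPORT_TIERS[i + 1] if i + 1 < len(SUPPORT_TIERS) else None

def needs_hierarchic_escalation(elapsed_minutes, target_minutes, threshold=0.8):
    """True once elapsed time reaches the given share of the Service Level Target."""
    return elapsed_minutes >= threshold * target_minutes

print(functional_escalation("service_desk"))
print(needs_hierarchic_escalation(50, 60))
```

Functional escalation changes who works on the incident; hierarchic escalation changes who is informed about it.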
Benefits, costs, and challenges
Greater effectiveness- reduced business impact
proactive enhancements and changes
business-focused SLA management
Improved monitoring and performance-service quality
Better staff utilization and efficiency
Eliminate future incidents and service requests
More accurate database
Improved customer/user satisfaction
Initial planning and implementation- communication
training and ramp-up
hardware and software tools
users/staff bypass procedures
incident backlog-process overload
too many escalations
lack of clear definitions, SLA's, commitment
Problem Management
Video 6
Problem Management (Service Operation) The Process
responsible for managing the Lifecycle of
all Problems. The primary Objectives of
Problem Management are to prevent
Incidents from happening, and to
minimize the Impact of Incidents that
cannot be prevented.
Incident Management (Service Operation) The Process
responsible for managing the Lifecycle of
all Incidents. The primary Objective of
Incident Management is to return the IT
Service to Users as quickly as possible.
PM is closely related to SD and incident management
Main concern is eliminating infrastructure errors
Goal is to find the underlying causes of actual and potential failures
PM also tracks and monitors infrastructure
Problem management vs. incident management
The two actually have contradictory goals; many organizations combine the two processes, to their
detriment
Problem management supports incident management with workarounds and temporary solutions, not
incident resolution
PM takes time to identify root causes of incidents, often with extended periods of unplanned downtime
Optimally the two processes should be separated
PM is often the escalation point for service desk IM
Problem management goals and objectives
Fire fighting and fire prevention
Minimize adverse effects on business
Proactively prevent the occurrence of incidents, problems, and errors
(1) Identify errors (2) Go through Decision-making processes (3) Act accordingly
Present proposals for improvement, rectifying errors, responding to RFC (request for change)
Identify weaknesses/vulnerabilities in infrastructures
Problem Control
Identifying problems-investigating root causes
Turn problems into known errors through classification, investigation, and diagnosis
Generate RFC, then resolve and close case
Classification includes categorization, impact, urgency, priority, status
Investigation and diagnosis is repeated to get closer to resolution
Temporary or emergency fixes may be applied
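Classification above lists impact, urgency, and priority. One common convention, assumed here rather than taken from the notes, is to derive priority from the other two. The three-level scale and the resulting 1 to 5 range are illustrative choices.

```python
# Hypothetical three-level scale for both impact and urgency
LEVELS = {"low": 1, "medium": 2, "high": 3}

def priority(impact, urgency):
    """Map impact and urgency to a priority from 1 (highest) to 5 (lowest)."""
    score = LEVELS[impact] + LEVELS[urgency]  # ranges from 2 to 6
    return 7 - score                          # score 6 gives 1, score 2 gives 5

print(priority("high", "medium"))
```

An organization would normally agree on its own matrix; the point is only that priority is computed from the other two values, not chosen independently.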
Error Control
Monitoring and managing known errors until resolved
Issues RFC to change management IT service
May evaluate changes in a post-implementation review (PIR)
Can involve several departments/units
Includes error identification and recording, error assessment, documentation resolution, closing
the case
Tracking and monitoring is done through all stages
Proactive management
Focuses on quality of infrastructure and services
Uses trend analysis to pre-empt problems and incidents
Goal is to look for weaknesses, perform penetration testing, keep up with vendor alerts and
bulletins
Firewall to prevent inter-domain problems
Overall identification and ongoing investigation of systems
The Problem Manager: Responsible for all PM activities
Maintain all problem/error control procedures
Assess effectiveness of problem management process
Protect integrity and independence of incident management
Govern proactive prevention campaigns
Manage personnel and resources(acquisition)
Develop and improve processes
Conduct "Post-Mortem" reviews
Change Management
Video 7
Change Management (Service Transition) The Process
responsible for controlling the Lifecycle of
all Changes. The primary objective of
Change Management is to enable
beneficial Changes to be made, with
minimum disruption to IT Services.
Most incidents are directly related to change
Change management is like a thermostat
Goal of CM is to manage the change process and limit the introduction of errors and incidents
related to the changes
Innovation, improvement, modifications, corrections
CM is tightly coupled with configuration management and release management
Change (Service Transition) The addition,
modification or removal of anything that
could have an effect on IT Services. The
Scope should include all IT Services,
Configuration Items, Processes,
Documentation etc.
Change Advisory Board (CAB) (Service Transition) A group of people
that advises the Change Manager in the
Assessment, prioritization and scheduling
of Changes. This board is usually made up
of representatives from all areas within the
IT Service Provider, the Business, and
Third Parties such as Suppliers.
Change Case (Service Operation) A technique used to
predict the impact of proposed Changes.
Change Cases use specific scenarios to
clarify the scope of proposed Changes and
to help with Cost Benefit Analysis.
See Use Case.
Change History (Service Transition) Information about all
changes made to a Configuration Item
during its life. Change History consists of all
those Change Records that apply to the CI.
Change Model (Service Transition) A repeatable way of
dealing with a particular Category of
Change. A Change Model defines specific
pre-defined steps that will be followed
for a Change of this Category. Change
Models may be very simple, with no
requirement for approval (e.g. Password
Reset) or may be very complex with many
steps that require approval (e.g. major
software Release).
See Standard Change, Change Advisory
Board.
Change Record (Service Transition) A Record containing
the details of a Change. Each Change
Record documents the Lifecycle of a single
Change. A Change Record is created for
every Request for Change that is
received, even those that are
subsequently rejected. Change Records
should reference the Configuration Items
that are affected by the Change. Change
Records are stored in the Configuration
Management System.
Change Schedule (Service Transition) A Document that
lists all approved Changes and their
planned implementation dates. A
Change Schedule is sometimes called a
Forward Schedule of Change, even though
it also contains information about Changes
that have already been implemented.
Request for Change (RFC) (Service Transition) A formal proposal for
a Change to be made. An RFC includes
details of the proposed Change, and may
be recorded on paper or electronically. The
term RFC is often misused to mean a
Change Record, or the Change itself.
Change Management Actions
Recording
All RFC's are logged with Ref. Number of known error
Routine, standard changes don't generate RFC's (service requests not assessed by change
management)
RFC's come from PM, IT staff, customers, legislation and mandates, vendors and suppliers,
projects
Log should contain: RFC unique ID, cross-referenced known-error/problem number, description,
reason for change, current/new versioning, info about submitter, submission date, estimated
timeframe, resource allocations
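The log fields listed above map naturally onto a record type. The sketch below is one possible layout; every field name is invented for illustration and does not come from an ITIL tool.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional

@dataclass
class RFCLogEntry:
    rfc_id: str                            # unique RFC ID
    description: str
    reason: str                            # reason for change
    submitter: str                         # info about submitter
    submitted_on: date                     # submission date
    known_error_ref: Optional[str] = None  # cross-referenced known-error/problem number
    current_version: Optional[str] = None
    new_version: Optional[str] = None
    estimated_days: Optional[int] = None   # estimated timeframe
    resources: list = field(default_factory=list)  # resource allocations

class RFCLog:
    """In-memory log that refuses duplicate RFC IDs."""
    def __init__(self):
        self._entries = {}

    def record(self, entry):
        if entry.rfc_id in self._entries:
            raise ValueError(f"RFC {entry.rfc_id} is already logged")
        self._entries[entry.rfc_id] = entry

    def get(self, rfc_id):
        return self._entries[rfc_id]

log = RFCLog()
log.record(RFCLogEntry("RFC-001", "Upgrade mail server", "Known error fix",
                       "j.doe", date(2024, 1, 15), known_error_ref="KE-042"))
print(log.get("RFC-001").known_error_ref)
```

Keeping the known-error reference on the entry is what lets a reviewer trace a change back to the problem that triggered it.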
Acceptance
Initial assessment of the RFC and the configuration items (CIs) it affects, with a re-request option
Acceptance will lead to (1) status change of existing CI, (2)change in relationship between CI's,
(3)new CI, (4)new location or owner of CI
Information needed to further process change is included in the change record
Classification
Priority and category are designated for accepted RFC
Example priorities: 1-low, 2-normal, 3-high, 4-highest (critical)
Priorities and categories determined with CAB and possibly other steering committees
Sample Categories: Minor Impact, Substantial Impact, Major Impact
Planning and Approval
CM uses change calendar or FSC-(Forward schedule of changes) to plan
FSC contains: Approved change details, planned implementation dates
CM with CAB approves 3 aspects of change: Financial, technical, Business
Also includes estimating impact and resources
Coordination
Approved changes go to product specialists, who construct and integrate the changes
Before implementation changes should be pilot tested
This may involve release management IT service
Phases: Build--Test--Implement
Evaluation
All implemented non-standard changes should be assessed
Did change meet intended goal(s)?
Are all parties satisfied with the results?
What were unplanned side effects of change?
Did implementation exceed estimated costs or downtime?
RFC is closed after success and evaluation
Results are documented in the PIR (Post implementation review)
Change Manager Role
Overall responsibility for change management in consultation with management liaison and
change advisory board
may be single individual or steering committee
In charge of reaching all CM goals and developing method of ensuring effectiveness and
efficiency
defines the scope of cm and associated processes
receives, logs, prioritizes RFC's-convenes CAB to review
Submits CAB recommendations to stakeholders, etc.
Release Management
Video 8
Release Management (Service Transition) The Process
responsible for Planning, scheduling and
controlling the movement of Releases to
Test and Live Environments. The primary
Objective of Release Management is to
ensure that the integrity of the Live
Environment is protected and that the
correct Components are released.
Release Management is part of the
Release and Deployment Management
Process.
The goals of release management
Garner holistic view of IT service change
Assure technical and non-technical aspects are considered
RM is hands-on working group for change management
Manage release planning and policy, design, building, and configuration
Campaigns for acceptance, plans the eventual rollout
conducts extensive testing and auditing(post)
Preparation, installation, training
Storage, release, distribution, install of software
Release Types
Full- Test, distribute, and implement all components of that release unit
Delta- Includes only the components within a release unit that have changed since the last release,
rather than replacing all of them. Also known as a partial release
Package- Includes a set of two or more full releases and is tested and released live as a full package.
Example: Microsoft Office 2007 to 2010
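The difference between a full and a delta release can be shown with two version manifests. The component names and version strings are hypothetical; the sketch only illustrates the selection rule.

```python
def full_release(candidate):
    """A full release ships every component of the release unit."""
    return dict(candidate)

def delta_release(live, candidate):
    """A delta release ships only components that are new or changed versus live."""
    return {name: version for name, version in candidate.items()
            if live.get(name) != version}

# Hypothetical manifests: component name -> version
live = {"app": "1.0", "db": "2.3"}
candidate = {"app": "1.1", "db": "2.3", "report": "0.1"}
print(delta_release(live, candidate))
```

Here the delta release contains only the changed "app" and the new "report" component, while the full release would redistribute "db" as well.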
Definitive Media Library (DML) (Service Transition) One or more
locations in which the definitive and
approved versions of all software
Configuration Items are securely stored.
The DML may also contain associated CIs
such as licenses and documentation. The
DML is a single logical storage area
even if there are multiple locations. All
software in the DML is under the control
of Change and Release Management and
is recorded in the Configuration
Management System. Only software
from the DML is acceptable for use in a
Release.
RM development Environment
Release Policy
A set of rules for deploying releases into the live operational environment, defining different
approaches for releases depending on their urgency and impact.
Release Plan
A Document that embraces all Releases in line for rollout and their planned implementation
dates.
Release (Service Transition) A collection of
hardware, software, documentation,
Processes or other Components required
to implement one or more approved
Changes to IT Services. The contents of
each Release are managed, Tested, and
Deployed as a single entity.
Design and develop software or Purchase software (or hardware)
RM controlled test environment
Build and configure (with back-out plan)
Fit-for-purpose tests
Release acceptance
Rollout planning
Communication, preparation, and training
RM live Environment
Release distribution-Audit trails/chain of custody
Installation- Installing software, drivers, etc.
Cost and potential problems
Costs and Pitfalls of Release management
Personnel, DSL/DML/DHS storage (backup and recovery too)
Build/test/distribute environments
Software and hardware costs-installation
Pitfalls/Challenges: Resistance from parties, circumventing release management process,
distribution is out-of-sync, inadequate testing
Configuration Management
Video 9
Configuration Item (CI) (Service Transition) Any Component that
needs to be managed in order to deliver an
IT Service. Information about each CI is
recorded in a Configuration Record within
the Configuration Management System and
is maintained throughout its Lifecycle by
Configuration Management. CIs are under
the control of Change Management. CIs
typically include IT Services, hardware,
software, buildings, people, and formal
documentation such as Process
documentation and SLAs.
Configuration Management (Service Transition) The Process
responsible for maintaining information
about Configuration Items required to
deliver an IT Service, including their
Relationships. This information is
managed throughout the Lifecycle of the
CI. Configuration Management is part of
an overall Service Asset and
Configuration Management Process.
Configuration Management Database (CMDB) (Service Transition) A database used to
store Configuration Records throughout
their Lifecycle. The Configuration
Management System maintains one or
more CMDBs, and each CMDB stores
Attributes of CIs, and Relationships with
other CIs.
Concepts and Objectives
Goal is to keep info about IT infrastructure current-specific item details and relationship to other
items
Make sure changes have been properly logged/documented
Maintains accurate topology of existing configuration items (CI)
Provide info about product policy, trouble-shooting data, impact assessment, provisioning of
services
Concepts-CI and CMDB
Configuration management goes well beyond asset management
CMDB (service transition)
CMDB is DB store of configuration records through lifecycle
Configuration management system contains one or more CMDB's
Each DB stores attributes of CI's and inter-relationships
CMDB is designed based on a configuration structure
Similar to the version control software used by developers (vms, scm, webdav)
Configuration Item Attributes
ITIL mandatory = alphanumeric unique CI identifier, status
Optional attributes = REV number, location, owner, part number, license data, etc.
Other fields linked to other DBs = RFC numbers, change numbers, problem numbers, incident
numbers
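A CMDB that stores CI attributes plus relationships can answer impact-assessment questions such as "what depends on this server?". The sketch below is a toy in-memory version; the class, method names, and sample CIs are all assumptions for illustration.

```python
from collections import deque

class CMDB:
    def __init__(self):
        self.items = {}        # ci_id -> attribute dict (status is mandatory)
        self.dependents = {}   # ci_id -> set of ci_ids that depend on it

    def add_ci(self, ci_id, status, **optional):
        """Store a CI with its mandatory status and any optional attributes."""
        self.items[ci_id] = {"status": status, **optional}
        self.dependents.setdefault(ci_id, set())

    def relate(self, ci_id, depends_on):
        """Record that ci_id depends on depends_on (both must already exist)."""
        self.dependents[depends_on].add(ci_id)

    def impacted_by(self, ci_id):
        """Return all CIs that directly or indirectly depend on ci_id."""
        seen, queue = set(), deque([ci_id])
        while queue:
            for dep in self.dependents[queue.popleft()]:
                if dep not in seen:
                    seen.add(dep)
                    queue.append(dep)
        return seen

cmdb = CMDB()
cmdb.add_ci("SRV-001", "live", location="rack 4")
cmdb.add_ci("DB-001", "live", owner="dba team")
cmdb.add_ci("APP-001", "live")
cmdb.relate("DB-001", depends_on="SRV-001")
cmdb.relate("APP-001", depends_on="DB-001")
print(sorted(cmdb.impacted_by("SRV-001")))
```

This relationship traversal is what separates configuration management from a plain asset list, which records items but not how they depend on each other.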
Benefits of Configuration management
Manage IT components and services
Contributes to faster trouble shooting and change processing
Better control of software and hardware
Improved security and compliance
enhanced planning for procurement/expenditures
Support for capacity management and availability management
Foundation for IT continuity management
Costs and challenges
Added hardware, software, licenses, fees
Database and CMS design, implementation, maintenance
Erroneous CMDB scope and CI detail are challenges
Timing moving from manual to automated systems
Effects of sudden changes and over-reaching schedules
Obtaining buy-in from management/ stakeholders
Users/customers bypassing CM process
SLA Management
Video 10
Service Level (SMART: Specific, Measurable, Achievable, Relevant, Timely) Measured and reported achievement
against one or more Service Level Targets.
The term Service Level is sometimes used
informally to mean Service Level Target.
Service Level Target (SMART: Specific, Measurable, Achievable, Relevant, Timely) (Service Design) (Continual Service
Improvement) A commitment that is
documented in a Service Level Agreement.
Service Level Targets are based on Service
Level Requirements, and are needed to
ensure that the IT Service design is Fit for
Purpose. Service Level Targets should be
SMART, and are usually based on KPIs.
Service Level Agreement (SLA) (Service Design) (Continual Service
Improvement) An Agreement between an
IT Service Provider and a Customer. The
SLA describes the IT Service, documents
Service Level Targets, and specifies the
responsibilities of the IT Service Provider
and the Customer. A single SLA may cover
multiple IT Services or multiple Customers.
See Operational Level Agreement.
Service Level Management (SLM) (Service Design) (Continual Service
Improvement) The Process responsible for
negotiating Service Level Agreements, and
ensuring that these are met. SLM is
responsible for ensuring that all IT Service
Management Processes, Operational Level
Agreements, and Underpinning Contracts,
are appropriate for the agreed Service
Level Targets. SLM monitors and reports
on Service Levels, and holds regular
Customer reviews.
Service Level Requirement (SLR) (Service Design) (Continual Service
Improvement) A Customer Requirement for
an aspect of an IT Service. SLRs are based
on Business Objectives and are used to
negotiate agreed Service Level Targets.
Service Hours (Service Design) (Continual Service
Improvement) An agreed time period when
a particular IT Service should be Available.
For example, "Monday-Friday 08:00 to
17:00 except public holidays". Service
Hours should be defined in a Service Level
Agreement.
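The Service Hours example above ("Monday-Friday 08:00 to 17:00 except public holidays") can be checked in code. The holiday calendar below is a made-up placeholder, and treating 17:00 itself as outside service hours is an assumption.

```python
from datetime import date, datetime, time

# Hypothetical public holiday calendar
PUBLIC_HOLIDAYS = {date(2024, 1, 1)}

def in_service_hours(moment, holidays=PUBLIC_HOLIDAYS):
    """True if moment falls on Monday-Friday, 08:00 to 17:00, and not on a holiday."""
    if moment.date() in holidays:
        return False
    if moment.weekday() >= 5:  # 5 = Saturday, 6 = Sunday
        return False
    return time(8, 0) <= moment.time() < time(17, 0)

print(in_service_hours(datetime(2024, 1, 2, 9, 0)))
```

Availability reporting against an SLA would typically count only downtime for which this check is true.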
Service Capacity Management (SCM) (Service Design) (Continual Service
Improvement) The Activity responsible for
understanding the Performance and
Capacity of IT Services. The Resources
used by each IT Service and the pattern of
usage over time are collected, recorded,
and analyzed for use in the Capacity Plan.
See Business Capacity Management,
Component Capacity Management.
Goals and Scope of SLM
Maintain and improve IT service quality through continual consensus, monitoring, and logging, and by
eliminating poor service
SLM builds long-term relationships with all parties
SLM covers every aspect of IT service provisioning
Also concerned with customer/vendor negotiations
SLM is at the top of responsibility chain
Costs vs. benefits
Informed decisions with customers must be made
Costs are staffing, accommodation, support tools, hardware costs, marketing costs
Benefits of SLM are extensive:
Improved relationships and customer perception of IT
Clear demarcation between IT groups/ IT and customers
Isolated targets for measuring and reporting
IT goals are much more "Business-centric"
Expectations are easier agreed upon and met
Baseline for measuring vendor performance
Clearly defined IT services
The SLM Process
Identifying-Identify customer needs
Defining-Defining the depth and scope of the requirements; ISO 9001-design, develop, produce, install,
maintain
Finalizing-Contract phase, contract-SLA
Monitoring-Clearly identified and monitored: Logs, Docs
Reporting-From monitoring, you get the logs and documents
Reviewing-A process that reviews the whole process at regular intervals
The SLM manager
Manages interfacing between customers and IT
SLM manager is both customer and IT advocate
Must be diplomat and facilitator
He/She must strive to be objective/Impartial
Rare combo of technical expertise and interpersonal manager
SLM manager needs to be intimate with all other IT service areas
Aligns all operational and tactical IT processes with business objectives
IT services Financial Management
Video 11
Goal is to promote cost awareness and prudence
IT services must address quality, cost, and customer needs (First 2 often contradict)
Budgeting
Accounting
Charging
Costs
Types of cost
Direct Cost (Service Strategy) A cost of providing an
IT Service which can be allocated in full
to a specific Customer, Cost Centre,
Project etc. For example cost of providing
non-shared servers or software licenses.
See Indirect Cost.
Indirect Cost (Service Strategy) A Cost of providing
an IT Service which cannot be allocated
in full to a specific Customer. For
example Cost of providing shared
Servers or software licenses. Also known
as Overhead.
See Direct Cost.
Fixed Cost (Service Strategy) A Cost that does not
vary with IT Service usage. For example
the cost of Server hardware.
See Variable Cost.
Variable Cost (Service Strategy) A Cost that depends
on how much the IT Service is used, how
many products are produced, the number
and type of Users, or something else that
cannot be fixed in advance.
See Variable Cost Dynamics.
Operational Cost Cost resulting from running the IT
Services. Often repeating payments. For
example staff costs, hardware
maintenance and electricity (also known
as "current expenditure" or "revenue
expenditure").
See Capital Expenditure.
Capital Expenditure (CAPEX) (Service Strategy) The Cost of
purchasing something that will become a
financial Asset, for example computer
equipment and buildings. The value of
the Asset is Depreciated over multiple
accounting periods.
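The CAPEX definition says the asset value is depreciated over multiple accounting periods but does not name a method. The sketch below assumes straight-line depreciation, the simplest option; the purchase cost, residual value, and period count are made-up figures.

```python
def straight_line_depreciation(cost, residual_value, periods):
    """Return the book value at the end of each accounting period."""
    charge = (cost - residual_value) / periods  # equal charge per period
    return [cost - charge * n for n in range(1, periods + 1)]

# Hypothetical server bought for 10000, worth 1000 after 3 years
print(straight_line_depreciation(10000, 1000, 3))
```

The yearly charge (3000 in this example) is what shows up in each period's accounts, while the purchase itself is a one-off capital cost.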
Cost Elements
Equipment cost unit, software cost unit, organizational cost unit, accommodation cost unit, transfer cost
unit, cost accounting
Goals of IT financial Management
Reduce long-term costs-empower management
Declares added value of IT
Improved TCO and ROI
Forces business to make service levels and their costs more visible
Assures senior management/stakeholders that IT is well-managed and meeting business needs
Assists change management processes
Financial Management Activities
Budgeting
Planning/managing financial activities of the organization
Involves corporate and strategic long-term planning (1-5 years)
Incremental vs. zero-base budgeting
Process = sales & marketing; production; administrative; cost and investment
Determine the budget period
Accounting
Identifying and qualifying costs & expenditures
Understand the way costs are structured
Defining cost elements ( 1 year fixed)
Base cost structure on a service structure (Cisco IIN)
Subdivide cost units for personnel, hardware, software, and overhead
Budgets are formed annually for each cost element and service (based on past analysis,
growth)
Charging
Tool to allow for careful usage of IT resources
Should be compatible with the organization's financial policies
Used to recover all incurred costs of IT business unit
Charging policy should include: Communication; pricing; flexibility; notional
charging
Pricing policy should include : Cost plus; going rate; target return; negotiated contract
price
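Of the pricing options listed above, cost plus is the easiest to show in code: the price is the cost plus an agreed markup. The 25% markup and the figures below are assumptions for illustration, and the recovery check simply compares totals.

```python
def cost_plus_price(unit_cost, markup):
    """Return the price for one unit: cost plus a fractional markup (0.25 = 25%)."""
    return unit_cost * (1 + markup)

def costs_recovered(charges, incurred_costs):
    """True if total charges cover all incurred costs of the IT unit."""
    return sum(charges) >= sum(incurred_costs)

# Hypothetical service costing 200 per user with a 25% markup
print(cost_plus_price(200, 0.25))
print(costs_recovered([250, 250], [200, 200, 50]))
```

With notional charging the same figures would be reported to the customer without an actual invoice being raised.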
Reporting
Invoicing and communicating to the customer
Conduct regular meetings with customer under umbrella of SLM
The following are reported to SLM:
IT expenditures per customer
Differences in actual/estimated and charging
Methods for accounting and charging
Disputes and their solutions
Possible Pitfalls
IT personnel are often unfamiliar with monitoring, calculating, and charging costs
May often require substantial non-IT information
Hard to find people with dual expertise
Difficult to accomplish if corporate strategy & IT/IS objectives aren't formalized via
policy
Because of above factors, non-cooperation is common
Lack of management commitment trickles down
Capacity Management
Video 12
Basic Terminology
Capacity Management (Service Design) The Process
responsible for ensuring that the
Capacity of IT Services and the IT
Infrastructure is able to deliver agreed
Service Level Targets in a Cost
Effective and timely manner. Capacity
Management considers all Resources
required to deliver the IT Service, and
plans for short, medium and long term
Business Requirements.
Performance Management (Continual Service Improvement) The
Process responsible for day-to-day
Capacity Management Activities. These
include Monitoring, Threshold detection,
Performance analysis and Tuning, and
implementing Changes related to
Performance and Capacity.
Application Sizing (Service Design) The Activity responsible
for understanding the Resource
Requirements needed to support a new
Application, or a major Change to an
existing Application. Application Sizing
helps to ensure that the IT Service can
meet its agreed Service Level Targets for
Capacity and Performance.
Capacity Plan (Service Design) A Capacity Plan is used
to manage the Resources required to
deliver IT Services. The Plan contains
scenarios for different predictions of
Business demand, and costed options to
deliver the agreed Service Level Targets.
Capacity Planning (Service Design) The Activity within
Capacity Management responsible for
creating a Capacity Plan.
Workload The Resources required to deliver an
identifiable part of an IT Service.
Workloads may be Categorized by Users,
groups of Users, or Functions within the IT
Service. This is used to assist in analyzing
and managing the Capacity, Performance
and Utilization of Configuration Items and
IT Services. The term Workload is
sometimes used as a synonym for
Throughput.
Modeling A technique that is used to predict the
future behavior of a System, Process, IT
Service, Configuration Item etc. Modeling
is commonly used in Financial
Management, Capacity Management and
Availability Management.
Benefits of Capacity management
Efficient management of resources to reduce IT risk through continuous monitoring
Understand impact of new or changed services
Cost reduction and maximized investment
Reduced disruption of business via link to change management
Quicker response to customer needs
Lower capacity-related expenditures
Capacity management sub-processes
Business capacity management
Gain knowledge of current and future business requirements and goals
Data gathered from customer, strategic planning, marketing campaigns, trend analysis
BCM involves trending, proactive modeling, forecasting, prototyping, sizing, and
documenting present and future business requirements
Service Capacity Management
Determine usage of IT products/services to customers
Understand performance and peak loads to meet SLAs
SCM is tightly coupled to SLM
Involved with SLA negotiations
Engages in monitoring, analyzing, tuning, and reporting on service performance
Establishes baselines for service usage
Manages the demand for services
Resource capacity management
To decide on use of IT infrastructure/ components
Stay current on technological developments and actively monitor trends
Monitoring, analyzing, and reporting on infrastructure and component utilization,
profiling, & baselining
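The trending and forecasting mentioned under business capacity management can be sketched as a least-squares straight line fitted through past utilization samples. This is only one simple way to forecast. The function names, the sample data, and the 80% threshold are assumptions made for illustration.

```python
def linear_trend(samples):
    """Fit y = slope * x + intercept by least squares, with x = 0, 1, 2, ..."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples to fit a trend")
    mean_x = (n - 1) / 2
    mean_y = sum(samples) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(samples))
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def periods_until_threshold(samples, threshold):
    """Return how many periods after the last sample the trend line reaches
    the threshold, or None if the trend is flat or falling."""
    slope, intercept = linear_trend(samples)
    if slope <= 0:
        return None
    last_x = len(samples) - 1
    x_cross = (threshold - intercept) / slope
    return max(0.0, x_cross - last_x)


# Hypothetical monthly CPU utilization (percent) for one component.
monthly_utilization = [50, 52, 54, 56, 58, 60]
remaining = periods_until_threshold(monthly_utilization, threshold=80)
print(f"Months until the 80% threshold on the current trend: {remaining}")
```

A result like this would feed the capacity plan as one demand scenario; real workloads are rarely linear, so the figure is a planning aid rather than a prediction.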
Activities of Capacity Management
Develop the plan
Modeling
Application Sizing
Monitor
Analyze
Tune
Implement
Demand management
Create and populate the CDB (Capacity Database, a collection of databases)
Costs and related issues
Hardware & software tools and CDB management
Project management costs
Personnel, training, and support overhead
Related facilities and services
Unrealistic expectations
Lack of data and supplier input
Complex implementations
Lack of buy-in from management
Availability Management
Video 13
Availability Management (Service Design) The Process
responsible for defining, analyzing,
Planning, measuring and improving all
aspects of the Availability of IT
Services. Availability Management is
responsible for ensuring that all IT
Infrastructure, Processes, Tools, Roles etc
are appropriate for the agreed Service
Level Targets for Availability.
Availability (Service Design) Ability of a Configuration
Item or IT Service to perform its agreed
Function when required. Availability is
determined by Reliability, Maintainability,
Serviceability, Performance, and Security.
Availability is usually calculated as a
percentage. This calculation is often based
on Agreed Service Time and Downtime. It
is Best Practice to calculate Availability
using measurements of the Business
output of the IT Service.
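The percentage calculation based on Agreed Service Time and Downtime can be sketched as below. The function name and the sample figures (720 agreed hours, 3.6 hours of downtime) are assumptions for illustration.

```python
def availability_percent(agreed_service_time, downtime):
    """Availability (%) = (Agreed Service Time - Downtime) / Agreed Service Time * 100.

    Both arguments must use the same unit (hours, for example).
    """
    if agreed_service_time <= 0:
        raise ValueError("agreed_service_time must be positive")
    if not 0 <= downtime <= agreed_service_time:
        raise ValueError("downtime must be between 0 and agreed_service_time")
    return (agreed_service_time - downtime) / agreed_service_time * 100


# Hypothetical month: 720 agreed service hours with 3.6 hours of downtime.
monthly_availability = availability_percent(720, 3.6)
print(f"Availability: {monthly_availability:.2f}%")
```

Only downtime that falls inside the agreed service time should be counted; an outage during a period when the service was not due to be available does not reduce this figure.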
Availability management in a nutshell
Making sure IT services are there when needed
Defined in SLAs via SLM integration
Monitors availability from end-to-end
Also works with problem management service
Goal of A.M. is to optimize IT infrastructure
Must maintain customer/user satisfaction
3 Key principles of Availability
IT services must be consistently accessible
One-time disruptions must be addressed rapidly and completely, with no repeat incidents
Availability is a mission-critical business/infrastructure component
Reliability (Service Design) (Continual Service
Improvement) A measure of how long a
Configuration Item or IT Service can
perform its agreed Function without
interruption. Usually measured as MTBF
or MTBSI. The term Reliability can also
be used to state how likely it is that a
Process, Function etc. will deliver its
required outputs.
See Availability.
Serviceability (Service Design) (Continual Service
Improvement) The ability of a Third Party
Supplier to meet the terms of their
Contract. This Contract will include
agreed levels of Reliability, Maintainability
or Availability for a Configuration Item.
Maintainability (Service Design) A measure of how
quickly and Effectively a Configuration
Item or IT Service can be restored to
normal working after a Failure.
Maintainability is often measured and
reported as MTRS.
Maintainability is also used in the context
of Software or IT Service Development to
mean ability to be Changed or Repaired
easily.
High Availability (Service Design) An approach or Design
that minimizes or hides the effects of
Configuration Item Failure on the Users of
an IT Service. High Availability solutions
are Designed to achieve an agreed level
of Availability and make use of techniques
such as Fault Tolerance, Resilience and
fast Recovery to reduce the number of
Incidents, and the Impact of Incidents.
Continuous Availability (Service Design) An approach or design
to achieve 100% Availability. A
Continuously Available IT Service has no
planned or unplanned Downtime.
Continuous Operation (Service Design) An approach or design
to eliminate planned Downtime of an IT
Service. Note that individual Configuration
Items may be down even though the IT
Service is Available.
Availability management metrics
Mean Time To Repair (MTTR) The average time taken to repair a
Configuration Item or IT Service after a
Failure. MTTR is measured from when the
CI or IT Service fails until it is Repaired.
MTTR does not include the time required to
Recover or Restore. MTTR is sometimes
incorrectly used to mean Mean Time to
Restore Service.
Mean Time Between Failures (MTBF) (Service Design) A Metric for measuring
and reporting Reliability. MTBF is the
average time that a Configuration Item or IT
Service can perform its agreed Function
without interruption. This is measured from
when the CI or IT Service starts working,
until it next fails.
Mean Time Between Service Incidents (MTBSI) (Service Design) A Metric used for
measuring and reporting Reliability. MTBSI
is the mean time from when a System or IT
Service fails, until it next fails. MTBSI is
equal to MTBF + MTRS.
Mean Time to Restore Service (MTRS) The average time taken to Restore a
Configuration Item or IT Service after a
Failure. MTRS is measured from when the
CI or IT Service fails until it is fully Restored
and delivering its normal functionality.
See Maintainability, Mean Time to Repair.
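The relationship MTBSI = MTBF + MTRS from the definitions above can be checked with a small calculation over an incident log. This is a sketch under assumptions: the log format (failure time and restore time in hours), the function name, and the sample data are hypothetical, and the averages are taken over complete failure-to-failure cycles only.

```python
def reliability_metrics(incidents):
    """Return (mtbf, mtrs, mtbsi) in the same time unit as the input.

    incidents: chronological list of (failed_at, restored_at) pairs.
    Each cycle runs from one failure to the next failure, so the last
    incident only supplies the end point of the final cycle.
    """
    if len(incidents) < 2:
        raise ValueError("need at least two incidents to form a cycle")
    cycles = list(zip(incidents, incidents[1:]))
    # Time from failure until service is restored.
    restore_times = [restored - failed for (failed, restored), _ in cycles]
    # Time from restoration until the next failure (uninterrupted operation).
    uptimes = [next_failed - restored
               for (_, restored), (next_failed, _) in cycles]
    # Time from one failure to the next failure.
    between_failures = [next_failed - failed
                        for (failed, _), (next_failed, _) in cycles]
    n = len(cycles)
    return sum(uptimes) / n, sum(restore_times) / n, sum(between_failures) / n


# Hypothetical log: hours since the start of the measurement period.
incident_log = [(100, 102), (300, 304), (600, 601)]
mtbf, mtrs, mtbsi = reliability_metrics(incident_log)
print(f"MTBF={mtbf} MTRS={mtrs} MTBSI={mtbsi}")
```

Because every cycle is split into a restore part and an uptime part, the three averages always satisfy MTBSI = MTBF + MTRS for the same set of cycles.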
Methodologies for availability management
Component Failure Impact Analysis (CFIA) (Service Design) A technique that helps to
identify the impact of CI failure on IT
Services. A matrix is created with IT
Services on one edge and CIs on the other.
This enables the identification of critical CIs
(that could cause the failure of multiple IT
Services) and of fragile IT Services (that
have multiple Single Points of Failure).
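A CFIA matrix like the one described above can be sketched as a nested dictionary. The marking scheme used here ("X" when failure of the CI stops the service, "A" when an alternative CI can take over) is an assumption for this example, as are the CI names, service names, and function names.

```python
# Rows are CIs, columns are IT Services.
# "X": failure of the CI causes the service to fail (single point of failure).
# "A": an alternative CI can take over. Missing entry: the service does not use the CI.
cfia_matrix = {
    "router-1":   {"email": "X", "payroll": "X", "intranet": "X"},
    "server-a":   {"email": "X"},
    "server-b":   {"payroll": "A", "intranet": "A"},
    "disk-array": {"payroll": "X"},
}


def critical_cis(matrix):
    """CIs whose failure would bring down more than one IT Service."""
    return sorted(
        ci for ci, row in matrix.items()
        if sum(1 for mark in row.values() if mark == "X") > 1
    )


def fragile_services(matrix):
    """IT Services that depend on more than one single point of failure."""
    spof_counts = {}
    for row in matrix.values():
        for service, mark in row.items():
            if mark == "X":
                spof_counts[service] = spof_counts.get(service, 0) + 1
    return sorted(s for s, count in spof_counts.items() if count > 1)


print("Critical CIs:", critical_cis(cfia_matrix))
print("Fragile services:", fragile_services(cfia_matrix))
```

Reading the matrix by row finds critical CIs; reading it by column finds fragile services, which is the two-way view the definition describes.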
Fault Tree Analysis (FTA) (Service Design) (Continual Service
Improvement) A technique that can be used
to determine the chain of Events that leads
to a Problem. Fault Tree Analysis
represents a chain of Events using Boolean
notation in a diagram.
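The Boolean notation used in Fault Tree Analysis can be sketched with AND and OR gates over basic events. The tree below, its event names, and the tuple representation are hypothetical choices for illustration only.

```python
def evaluate(node, events):
    """Evaluate a fault tree node.

    node: either a basic event name (str) or a tuple (gate, child, child, ...)
          where gate is "AND" or "OR".
    events: dict mapping each basic event name to True (occurred) or False.
    """
    if isinstance(node, str):
        return events[node]
    gate, *children = node
    results = [evaluate(child, events) for child in children]
    if gate == "AND":
        return all(results)
    if gate == "OR":
        return any(results)
    raise ValueError(f"unknown gate: {gate}")


# Hypothetical tree: the service is down if the network fails,
# or if both the primary and the standby server fail.
service_down = (
    "OR",
    "network_failure",
    ("AND", "primary_server_failure", "standby_server_failure"),
)

observed = {
    "network_failure": False,
    "primary_server_failure": True,
    "standby_server_failure": False,
}
print("Service down:", evaluate(service_down, observed))
```

An AND gate models redundancy (all inputs must fail), while an OR gate models a single point of failure (any one input is enough).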
CRAMM A methodology and tool for analyzing and
managing Risks. CRAMM was developed
by the UK Government, but is now privately
owned. Further information is available
from http://www.cramm.com/
SOA (Service outage analysis) A technique used to reduce the frequency and duration of
outages while improving Mean Time To
Repair (MTTR). The result of SOA is clear
exposure of the risk of future outages, as
well as recommendations for improvement.
CRAMM (CCTA Risk Analysis and Management Method) was created in 1987 by the Central
Computing and Telecommunications Agency (CCTA) of the United Kingdom government.
CRAMM is currently on its fifth version, CRAMM Version 5.0. It comprises three stages, each
supported by objective questionnaires and guidelines. The first two stages identify and analyze
the risks to the system. The third stage recommends how these risks should be managed. The
three stages of CRAMM are as follows:
Stage 1 The establishment of the objectives for security by:
Defining the boundary for the study;
Identifying and valuing the physical assets that form part of the system;
Determining the ‘value’ of the data held by interviewing users about the potential business
impacts that could arise from unavailability, destruction, disclosure or modification;
Identifying and valuing the software assets that form part of the system.
Stage 2 The assessment of the risks to the proposed system and the requirements for security
by:
Identifying and assessing the type and level of threats that may affect the system;
Assessing the extent of the system's vulnerabilities to the identified threats;
Combining threat and vulnerability assessments with asset values to calculate measures of risks.
Stage 3 Identification and selection of countermeasures that are commensurate with the
measures of risks calculated in Stage 2. CRAMM contains a very large countermeasure library
consisting of over 3000 detailed countermeasures organized into over 70 logical groupings.
IT service continuity management
Video 14
Overview
Also known as disaster recovery
A disaster goes far beyond an incident
Consists of business continuity planning and continuity planning
Emphasis is on disaster prevention (avoidance)
Supports overall business continuity management (BCM)
Deals also with restoration and recovery
Business Impact Analysis (BIA) (Service Strategy) BIA is the Activity in
Business Continuity Management that
identifies Vital Business Functions and
their dependencies. These dependencies
may include Suppliers, people, other
Business Processes, IT Services etc.
BIA defines the recovery requirements for
IT Services. These requirements include
Recovery Time Objectives, Recovery
Point Objectives and minimum Service
Level Targets for each IT Service.
Risk Assessment The initial steps of Risk Management.
Analyzing the value of Assets to the
business, identifying Threats to those
Assets, and evaluating how Vulnerable
each Asset is to those Threats. Risk
Assessment can be quantitative (based
on numerical data) or qualitative.
IT Service Continuity Plan (Service Design) A Plan defining the
steps required to Recover one or more IT
Services. The Plan will also identify the
triggers for Invocation, people to be
involved, communications etc. The IT
Service Continuity Plan should be part of
a Business Continuity Plan.
IT service continuity strategy
Elaborate prevention measures (stronghold/fortress)
Choose recovery options: personnel, accommodations, IT systems, networks, support services,
archiving
Viable options: no response, paper-based system, reciprocal relationships, cold standby, warm
standby, hot start/hot standby, hybrid approach
Costs and potential problems
Time and finances are extensive for planning, initiating, developing, and implementing
Risk management personnel, software, hardware
Recovery management (hot site, warm site) are typically expensive solutions
Resources and commitment issues/budgeting
Assessing recovery sites
No management commitment/buy-in (risk-takers? Let insurance handle it)
Perpetual delays
Lack of in-house IT expertise
Lack of business familiarity and awareness
IT security management
Video 15
The Big Picture
Confidentiality (Service Design) A security principle that
requires that data should only be accessed
by authorized people.
Integrity (Service Design) A security principle that
ensures data and Configuration Items are
only modified by authorized personnel and
Activities. Integrity considers all possible
causes of modification, including software
and hardware Failure, environmental
Events, and human intervention.
Availability (Service Design) Ability of a Configuration
Item or IT Service to perform its agreed
Function when required. Availability is
determined by Reliability, Maintainability,
Serviceability, Performance, and Security.
Availability is usually calculated as a
percentage. This calculation is often based
on Agreed Service Time and Downtime. It
is Best Practice to calculate Availability
using measurements of the Business output
of the IT Service.
Threat Anything that might exploit a Vulnerability.
Any potential cause of an Incident can be
considered to be a Threat. For example a
fire is a Threat that could exploit the
Vulnerability of flammable floor coverings.
This term is commonly used in Information
Security Management and IT Service
Continuity Management, but also applies to
other areas such as Problem and
Availability Management.
Threat agent Hacker
Risk A possible Event that could cause harm or
loss, or affect the ability to achieve
Objectives. A Risk is measured by the
probability of a Threat, the Vulnerability of
the Asset to that Threat, and the Impact it
would have if it occurred.
Vulnerability A weakness that could be exploited by a
Threat. For example an open firewall port, a
password that is never changed, or a
flammable carpet. A missing Control is also
considered to be a Vulnerability.
Exposure A scenario in which you are exposed to potential
losses from a threat
Countermeasure Can be used to refer to any type of Control.
The term Countermeasure is most often
used when referring to measures
that increase Resilience, Fault Tolerance or
Reliability of an IT Service.
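The Risk definition above names three inputs: the probability of a Threat, the Vulnerability of the Asset, and the Impact. One common simplification multiplies them into a single score so that risks can be ranked. This is an assumed scoring scheme for illustration, not the method prescribed by ITIL or CRAMM; the function name, the scales, and the sample figures are hypothetical.

```python
def risk_score(threat_probability, vulnerability, impact):
    """Return a simple multiplicative risk score.

    threat_probability: chance the threat occurs in the period, 0.0 to 1.0.
    vulnerability: chance the threat succeeds against the asset, 0.0 to 1.0.
    impact: estimated loss if it happens (any non-negative unit, e.g. currency).
    """
    for name, value in (("threat_probability", threat_probability),
                        ("vulnerability", vulnerability)):
        if not 0 <= value <= 1:
            raise ValueError(f"{name} must be between 0 and 1")
    if impact < 0:
        raise ValueError("impact must not be negative")
    return threat_probability * vulnerability * impact


# Hypothetical risks: (description, probability, vulnerability, impact).
risks = [
    ("fire in server room", 0.01, 0.5, 1_000_000),
    ("password never changed", 0.5, 0.5, 100_000),
]

ranked = sorted(risks, key=lambda r: risk_score(r[1], r[2], r[3]), reverse=True)
for description, p, v, i in ranked:
    print(f"{description}: {risk_score(p, v, i):.0f}")
```

A Countermeasure lowers the score by reducing one of the three inputs, for example by removing the Vulnerability or limiting the Impact.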
Security and the SLA
SLA must define extent that security is provided
Security elements must be customer-specific
Customer determines security policy
Security needs are compared with the provider's service catalog -> gap analysis performed ->
then negotiation
Operational level agreement (OLA) will have a security section as well
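The gap analysis step above, comparing the customer's security needs with the provider's service catalog, amounts to a set difference. The sketch below uses hypothetical requirement names and a hypothetical function name.

```python
def security_gap(customer_needs, catalog_offerings):
    """Return the customer's security needs that the catalog does not cover."""
    return sorted(set(customer_needs) - set(catalog_offerings))


# Hypothetical inputs for one customer.
customer_needs = ["disk encryption", "daily backups", "two-factor login"]
catalog_offerings = ["daily backups", "antivirus", "disk encryption"]

gaps = security_gap(customer_needs, catalog_offerings)
print("Items to negotiate:", gaps)
```

Each item in the result is a candidate for the negotiation that follows the gap analysis.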
Costs and possible pitfalls
Personnel, cash outlay
No security/ poor security = lost production, replacement, data loss/damage, goodwill
diminished, reputation, legal action, government fines
Lack of commitment and ambition
Poor attitudes and human behavior
Lack of awareness training and verification checks
No change management service
Lack of detection (IDS, IPS, FW)
Over-reliance on reactive techniques