FOUNDATIONS OF SERVICE LEVEL MANAGEMENT

Service level management can be the key to custom[…] provider, including ASPs, ISPs, Telcos, and IT organizations. […] been a lack of information and advice about […] and Service Level Agreements (SLAs), at least […] Management solves that problem by providing the […] use with service level management:

• Practical tips, cautions, and notes of interest to help you take advantage of the experiences of the authors and others who have implemented service level management
• Templates for building effective Service Level Agreements
• Guidelines to shorten the process of negotiating SLAs
• Advice on developing service level management disciplines within your organization
• Sample business justifications supporting service level management investments
• A comprehensive list of products that can be used for service level management

[…] this book can save managers time and money, as well as help them avoid the frustration of attempting to "reinvent the wheel" for service level management.
Rick Sturm has over 25 years of experience in the computer industry. He is president of Enterprise Management Associates, a leading industry analyst firm that provides strategic and tactical advice on the issues of managing computing and communications environments and the delivery of those services. He was co-chair of the IETF Working Group that developed the SNMP MIB for managing applications, and was a founder of the OpenView Forum. Rick also is a columnist for Internet Week, has published numerous articles in other leading trade publications, and is a frequent speaker at industry events.

Wayne Morris has over 20 years of experience in the computer industry. He is the vice president of corporate marketing and a company officer of BMC Software, the leading supplier of application service assurance solutions. He has held a variety of technical, support, sales, marketing, and executive management positions in several companies in Australia and the United States.

Mary Jander has spent 15 years tracking information technology. She is presently a senior analyst with Enterprise Management Associates. Prior to that, she was with Data Communications magazine, where she covered network and systems management.

CATEGORY: NETWORKING
COVERS: SERVICE LEVEL MANAGEMENT
$29.99 USA / $41.99 CAN / £21.99 Net UK
ISBN 0-672-31743-5

SAMS
www.samspublishing.com

RICK STURM, WAYNE MORRIS, AND MARY JANDER
Foundations of Service Level Management

Rick Sturm, Wayne Morris, and Mary Jander

SAMS
800 E. 96th Street, Indianapolis, Indiana 46240
Foundations of Service Level Management
Rick Sturm, Wayne Morris, and Mary Jander

Copyright © 2000 by Sams

All rights reserved. No part of this book shall be reproduced, stored in a retrieval system, or transmitted by any means, electronic, mechanical, photocopying, recording, or otherwise, without written permission from the publisher. No patent liability is assumed with respect to the use of the information contained herein. Although every precaution has been taken in the preparation of this book, the publisher and author assume no responsibility for errors or omissions. Nor is any liability assumed for damages resulting from the use of the information contained herein.

International Standard Book Number: 0672317435
Library of Congress Catalog Card Number: 99-63620
Printed in the United States of America
First Printing: April, 2000
07 06 05 9 8 7 6

Trademarks
All terms mentioned in this book that are known to be trademarks or service marks have been appropriately capitalized. Sams cannot attest to the accuracy of this information. Use of a term in this book should not be regarded as affecting the validity of any trademark or service mark.

Warning and Disclaimer
Every effort has been made to make this book as complete and as accurate as possible, but no warranty or fitness is implied. The information provided is on an "as is" basis. The author(s) and the publisher shall have neither liability nor responsibility to any person or entity with respect to any loss or damages arising from the information contained in this book or from the use of programs accompanying it.

Associate Publisher: Michael Stephens
Executive Editor: Tim Ryan
Acquisitions Editor: Steve Anglin
Development Editor: Songlin Qiu
Managing Editor: Lisa Wilson
Project Editor: Elizabeth Roberts
Copy Editor: Rhonda Tinch-Mize
Indexer: Kevin Kent
Proofreader: Katherin Bidwell
Professional Reviewers: Ben Bolles, Eric Goldfarb
Interior Designer: Dan Armstrong
Cover Designer: Alan Clements
Copywriter: Eric Borgert
Production: Darin Crone

About the Authors
Rick Sturm has over 25 years experience in the computer industry. He is president of Enterprise Management Associates, a leading industry analyst firm that provides strategic and tactical advice on the issues of managing computing and communications environments and the delivery of those services. He was co-chair of the IETF Working Group that developed the SNMP MIB for managing applications and was a founder of the OpenView Forum. He also is a columnist for Internet Week, has published numerous articles in other leading trade publications, and is a frequent speaker at industry events.

Wayne Morris has over 20 years experience in the computer industry. He is the vice president of corporate marketing and a company officer of BMC Software, the leading supplier of application service assurance solutions. He has held a variety of technical, support, sales, marketing, and executive management positions in several companies in Australia and the United States. His articles on systems and service level management have been published in the United States, Europe, and Australia, and he speaks regularly at industry conferences.

Mary Jander has spent 15 years tracking information technology. She is a senior analyst with Enterprise Management Associates (Boulder, CO). Prior to that, she was with Data Communications magazine, where she covered network and systems management for an international readership of network architects and information systems managers. She has also worked at Computer Decisions magazine and as a freelance writer and copy editor.

Dedication
To Marilyn and David: thanks again for your understanding, encouragement, and forbearance through the entire process of creating this book.
Rick Sturm

To my family on two continents whose understanding, support, and love carry me.
Wayne Morris

To Rusty, with love and gratitude for all your support.
Mary Jander

Acknowledgments
This book generated significant interest and excitement as we put it together. Service level management is a growing management discipline, and there are a number of professionals who are contributing to the growth in understanding and acceptance of proactive service management. This includes industry analysts, members of the press, and courageous individuals within IT departments who have taken a leadership role in implementing service level management, and who have subsequently shared their best practices in industry forums and conferences.

Many of our colleagues and friends contributed their insights and support for this book. To mention them all here is not possible, but we'd like to call out some individuals who made our jobs much easier.

Thanks to Jeanne Moreno, Linda Harvey, and Alex Shootman of BMC Software who have been instrumental in implementing service level management and in educating many others in the methodology and procedures for assuring that service levels can be met. David Spuler, David Johnson, Sharon Dearman, and Shannon Whiting of BMC Software contributed research that helped in many chapters. Amy DeCarlo, Mike Howell, Elizabeth North, and Colleen Prinster with Enterprise Management Associates contributed research and editorial assistance that helped in many chapters and Appendix E. Finally, our thanks go to Sara Nupen with Enterprise Management Associates. She assisted with the creation of several of the illustrations for the book.

Many vendors contributed information to the descriptions of current service level management products. Although it is not practical to list all those companies, our special thanks go to BMC Software, Cabletron, Candle Corporation, FirstSense, Hewlett-Packard, IBM/Tivoli, Landmark, Luminate, and Mercury Interactive for their contributions.

Our thanks go to Rosemarie Waiand and Jan Watson for setting up the Web site, http://www.slm-info.org, which will be used as a repository for templates and also for discussion forums around service level management.

There are many others who had a direct role in developing this book and bringing it to market including Songlin Qiu, Tim Ryan, and Steve Anglin of Sams Publishing and the book reviewer, Ben Bolles.

To all these individuals and others who helped us, we give our thanks and sincere appreciation.

Contents at a Glance
Introduction
Part I: Theory and Principles
1 The Challenge
2 The Perception and Management of Service Levels
3 Service Level Reporting
4 Service Level Agreements
5 Standards Efforts
Part II: Reality
6 Service Level Management Practices
7 Service Level Management Products
Part III: Recommendations
8 Business Case for Service Level Management
9 Implementing Service Level Management
10 Capturing Data for Service Level Agreements (SLAs)
11 Service Level Management as Service Enabler
12 Moving Forward
Appendixes
A Internal Service Level Agreement Template
B Simple Internal Service Level Agreement Template
C Sample Customer Satisfaction Survey
D Sample Reporting Schedule
E Sample Value Statement & Return on Investment (ROI) Analysis for a Service Provider Delivering an SAP Application
F Selected Vendors of Service Level Management Products
Glossary
Index

Table of Contents

Introduction

Part I: Theory and Principles
1 The Challenge
  Mission Impossible
  Divergent Views
  Technical Challenge
  What Is SLM?
  Pros and Cons
  Other Service Providers
  The Importance of SLM
  Why Now?
  Summary
2 The Perception and Management of Service Levels
  Availability
  Performance
  Workload Levels
  Security
  Accuracy
  Recoverability
  Affordability
  Summary
3 Service Level Reporting
  Audience
  Types of Reports
  Frequency of Reporting
  Real-Time Reporting
  Summary
4 Service Level Agreements
  The Need for SLAs
  Functions of SLAs
  Types of SLAs
  SLA Processes
  Summary
5 Standards Efforts
  IT Infrastructure Library
  Distributed Management Task Force (DMTF) SLA Working Group
  Internet Engineering Task Force (IETF): Application Management MIB
  Application Response Measurement Working Group
  Summary

Part II: Reality
6 Service Level Management Practices
  Lack of Common Understanding
  Current Service Level Management Practices
  Summary
  References
7 Service Level Management Products
  Monitoring Tools
  Reporting Tools
  SLM Analysis
  Administration Tools
  Summary

Part III: Recommendations
8 Business Case for Service Level Management
  Cost Justifying Proactive Service Level Management
  Quantifying the Benefits of Service Level Management
  A Sample Cost Justification Worksheet
  Summary
9 Implementing Service Level Management
  Planning the Rollout
  Going Live with SLM
  Following Through
  Summary
10 Capturing Data for Service Level Agreements (SLAs)
  Metrics for Measuring Service Levels
  Methods for Capturing Service Metrics
  Monitoring Individual Components and Aggregating Results
  Inspecting Network Traffic for Application Transactions
  End-to-End Service Level Measurement
  Common Architectures and Technologies for Data Capture Solutions
  Summary
11 Service Level Management as Service Enabler
  The Ascendance of IP
  A Spectrum of Providers
  The Importance of SLAs in the Service Environment
  Different Strokes
  Smart Implementation
  Advice for Users
  Advice for Service Providers
  Summary
12 Moving Forward
  Establishing the Need for Service Level Management
  Defining the Services to Be Managed
  Communicating with the Business
  Negotiating Service Level Agreements
  Managing to the Service Level Agreement
  Using Commercial Management Solutions
  Continuously Improving Service Quality
  Evolution of Service Level Management Standards
  Evolution of Management Solution Capabilities

Part IV: Appendixes
A Internal Service Level Agreement Template
  About the SLA
  About the Service
  About Service Availability
  About Service Measures
B Simple Internal Service Level Agreement Template
C Sample Customer Satisfaction Survey
  Rating Service Quality
  General Comments
  Current Usage
  Future Requirements
  Optional Information
D Sample Reporting Schedule
  Daily Report
  Weekly Report
  Monthly Report
  Quarterly Report
E Sample Value Statement & Return on Investment (ROI) Analysis for a Service Provider Delivering an SAP Application
  Summary of Value
  Return on Investment (ROI) Value Areas
  Benefit Areas
  Return on Investment Analysis
  Summary
F Selected Vendors of Service Level Management Products
Glossary
Index

Tell Us What You Think!
As the reader of this book, you are our most important critic and commentator. We value your opinion and want to know what we're doing right, what we could do better, what areas you'd like to see us publish in, and any other words of wisdom you're willing to pass our way.

As an Associate Publisher for Sams, I welcome your comments. You can fax, email, or write me directly to let me know what you did or didn't like about this book, as well as what we can do to make our books stronger.

Please note that I cannot help you with technical problems related to the topic of this book, and that due to the high volume of mail I receive, I might not be able to reply to every message.

When you write, please be sure to include this book's title and author as well as your name and phone or fax number. I will carefully review your comments and share them with the author and editors who worked on the book.

Fax: 317-581-4770
Email: [email protected]
Mail: Michael Stephens
Associate Publisher
Sams
201 West 103rd Street
Indianapolis, IN 46290 USA
Introduction
Information Technology (IT) departments in both large and small businesses are under pressure to operate more like a business and to become more efficient. Customers are demanding assurances of service levels from their IT departments, Telcos, ASPs, ISPs, and other service providers. New categories of service providers are emerging, such as application service providers (ASPs). Increasingly, businesses are turning to outsourcing of IT functions as a way to control costs and to achieve consistent levels of service. Businesses are increasingly dependent upon service providers, both their own internal IT departments and external service providers. No longer can service providers, such as an IT department, focus only on keeping each of the pieces (network, systems, databases, and so on) running. Today's environment demands a comprehensive, customer-focused, holistic approach to management. In some cases, IT itself is becoming the focal point for new business, as evidenced in the growing trend toward managed e-commerce, e-business, application services, and virtual private networks. To fulfill these new roles, IT managers must remain focused on important customers while still providing affordable service to other clients.

The Need for a Book About Service Level Management

In a way, service level management is similar to the weather: there is much talk about it, but little action. Executives recognize the importance of service level management, and of Service Level Agreements even more so. However, a recent study found that a majority of IT organizations have not yet implemented Service Level Agreements with their clients. Why? The answer is quite simple: they do not know how to go about establishing a program for service level management or how to write Service Level Agreements.

This book resolves that uncertainty by showing how to establish a program of effective service level management and how to write meaningful Service Level Agreements.

Who Will Benefit from This Book?


This book is written for professionals whose organizations are service providers or clients of service providers. Service providers include IT organizations, telcos, application service providers, outsourcing companies, and so on. The book has been designed to serve as a practical guide to service level management for the IT managers and other professionals who are responsible for providing services to their clients.
Features of This Book

There has been a lack of readily available documented knowledge about service level management procedures and implementation. This book addresses the theory and methodology behind service level management, and provides an assessment of current technologies and products that can be used to implement proactive service management. The book also provides specific recommendations for developing a service level management discipline within your organization. It includes practical tips, cautions, and notes of interest to help you take advantage of the experiences of the authors and others who have implemented service level management.

Templates for building Service Level Agreements, along with sample business justifications supporting service level management investments, should allow the reader to implement a disciplined approach to service level management more quickly.

This book can save the manager of an IT organization or other service provider time and allow the manager to avoid the frustration of attempting to "reinvent the wheel" for SLM. Similarly, this book will help the clients of service providers to understand what they can reasonably expect from their service provider in terms of service level guarantees. If this book's guidelines are followed, even the process of negotiating a Service Level Agreement can be shortened and made more efficient.

Another valuable resource that this book offers is a comprehensive list of products that can be used to facilitate service level reporting. This list was compiled with many hours of research and can be used to quickly identify products that will be useful in specific situations.

How This Book Is Organized

Over the past five years, service level management has become a hotbed of activity, hype, hoopla, and misinformation. Therefore, this book begins by laying out a clear blueprint of what service level management is and what it is not. It then continues with a review of the principles that underlie the effective management of service levels in a service provider's environment, IT environment, or any other service environment.

• Part I: Theory and Principles presents a detailed guide to creating Service Level Agreements, the heart of any SLM program. It shows how to go about creating your own Service Level Agreements.
• Part II: Reality provides practical advice, tips, and guidelines for creating an effective program for service level management. The advice that this book gives can make the difference between creating an SLM program that is successful and one that is a complete failure and a waste of time, money, and effort. It also looks at the various types of products that are available to help with service level management.
• Part III: Recommendations provides insights and guidance about the actual contents of a Service Level Agreement. It provides guidance on building a business case for service level management. It provides guidelines for choosing the appropriate metrics for Service Level Agreements. A third key component of this section is guidelines for implementing a service level management program in any organization.
• The Appendixes provide detailed information that will be helpful for implementing service level management. They include templates for reports, Service Level Agreements, and follow-up assessments. There is also an appendix that contains a comprehensive list of vendors and their products to assist with service level management.
• The Glossary addresses the confusion surrounding service level management terminology; a set of clear, concise definitions is vital to understanding this subject. It contains definitions of over 60 terms important to understanding service level management.

From the Authors

We believe this comprehensive approach to service level management will provide sufficient understanding to enable you to successfully adopt this management approach. This will help you to operate the IT department more like a business and to stay focused on your most important customers.

We expect continued evolution of SLM techniques, technology, and vendor solutions in the industry. We encourage you to commence a dialog and share best practices with us and other colleagues in the industry. To that end, we invite you to visit our Web site, http://www.slm-info.org, where electronic copies of the templates are available together with chat facilities. You can communicate with any of the authors via this Web site or send email directly to the following addresses:

Rick Sturm: [email protected]
Wayne Morris: [email protected]
Mary Jander: jander@enterprisemanagement.com
Conventions Used in This Book
Note
A Note presents interesting pieces of information related to the surrounding discussion.

Tip
A Tip offers advice or teaches an easier way to do something.

Caution
A Caution advises you about potential problems and helps you steer clear of disaster.
PART I

Theory and Principles

Chapter
1 The Challenge
2 The Perception and Management of Service Levels
3 Service Level Reporting
4 Service Level Agreements
5 Standards Efforts
CHAPTER 1

The Challenge

Like the peasants in an old monster film, armed with torches and pitchforks and ready to storm the castle, IT clients today are fed up with the service that they have been receiving. They are storming IT castles demanding change. They want improved service and they want it now. This book looks at the problem and how IT can respond to the demands of the user community.

Note
In this book, the terms users and clients are used interchangeably to identify those people and groups within a company who are the consumers of services provided by IT. The term customer is reserved for those groups and individuals who buy the company's goods and services.

The world of business has always been one of change and innovation. The objective of those changes has always been to maximize profits. Owners and managers have constantly sought new ways to achieve this objective. Historically, change has been slow and often subtle, measured in generations or even centuries. However, that has changed. In this century, and particularly since 1940, changes in business practices have accelerated, thanks to the introduction of technologies that streamline processes and enable new approaches to traditional methods.

Mission Impossible

Information technology (IT) plays a double role in today's global business environment. IT's role in facilitating change is well-known and well-documented. However, it is also subject to forces of change from outside the IT department and even from outside the corporation.

Among the forces affecting IT is the accounting department. Over the past two decades, companies have sought to become even more competitive. In some cases, this has been driven by a desire for increased profits, and in other instances, it is a matter of survival, as competitors force prices downward. This has translated into pressure on IT to reduce costs. IT is being asked to live with smaller budgets, both for capital expenditures and for ongoing expenses. The result is that it is difficult to acquire additional equipment to accommodate the growth in usage most companies are experiencing (increased number of users, transaction volume, number of applications, and so on). Acquisition of more modern, faster, and more reliable equipment is made difficult.

Reductions in expense budgets usually translate into reductions in the size of the IT staff because payroll is normally the greatest single expense in an IT budget. Other casualties of budget cuts are salary increases and training for the IT staff. In a competitive job market, the results of these latter items can be higher staff turnover and employees who are less experienced and not as well trained as would be desirable. Ultimately, this limits IT's ability to improve or maintain the levels of service delivered to the end users.

While pressures to reduce costs have been mounting, IT clients, the end users, have become less ignorant and increasingly sophisticated and technically savvy in the ways of computing. They have computers at home. They have computers on their desks in their offices and many have purchased servers for their departments. They are no longer as accepting of excuses or explanations from IT as they once were. The users know what they want and believe they know what is possible. They have aggressive timetables for the delivery of new systems and high expectations in terms of the availability and performance of those systems.

Just as users' technical awareness has risen, so has their reliance on computer systems. The number of mission-critical systems (systems essential to the operation of the business and, ultimately, its very survival) continues to grow daily. Thus, IT's level of responsibility within the corporation has risen significantly. IT has gone from a facilitator of the business process to becoming part of the process and from supporting staff functions to becoming a key element of the business.

Today, some businesses are built solely on electronic commerce. In these cases, the company has no existence except through its computers. Companies like Amazon.com, eBay, e*trade, and the like cease to exist without the functions provided by IT.

However, the criticality of systems is not limited to cyberspace. For industries such as the airlines, financial services, and telecommunications, continuous availability of mission-critical applications is essential. The spread of enterprise resource planning (ERP) applications (for example, SAP R/3, PeopleSoft, Oracle, and so on) has produced another form of mission-critical application. The need for highly available applications with high performance levels has become nearly ubiquitous.

IT has found itself in an unenviable position. CIOs around the world are being told to reduce their budgets and improve service levels for an ever-increasing number of applications. In other words, they are faced with the impossible situation of having to deliver "more" with "less."

Divergent Views

IT managers have not been ignoring the needs of their users or the business impacts of the services provided by their organizations. In fact, from the very beginning, IT managers have attempted to measure and assess the performance of the services provided by their organizations. However, they have been limited by perspective and technology.

Historically, IT managers have measured the effectiveness of their organizations by looking at the individual hardware and software components. In the beginning, this made perfect sense: there was only one computer, and it could only run a single program at a time. However, that condition did not last long. Today, analyzing individual components provides information that is relevant and important for managing a specific device or component, but it does not provide a perspective on the overall service being provided to the end user.

Consider the example of Acme Manufacturing. The order entry department has negotiated a Service Level Agreement (SLA) with IT. That agreement calls for the order entry system to be available 99.9% of the time, and no component is to have more than 10 minutes of total downtime in a month. This SLA is incorporated into the objectives for each of the IT department managers. Table 1.1 shows the results for one month. At the end of the month, almost everyone is pleased with the results. All but one of the components have met or exceeded the objectives for availability and for total downtime in the month.
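Every availability figure in this example follows from one relation: availability = 1 − (minutes of downtime ÷ minutes in the period), where a 30-day month has 43,200 minutes. As a minimal sketch, the Python below applies that relation to the downtime minutes reported in Table 1.1 and checks them against Acme's two SLA terms, first component by component (IT's view) and then for the order entry service as a whole (the users' view). It assumes a 30-day month, no overlapping outages, and outages charged to their root-cause component, as the example stipulates; the constant and function names are our own, hypothetical choices.

```python
# Availability arithmetic for the Acme order entry example: a minimal sketch.
# Assumptions (ours, not the book's): a 30-day month, no overlapping outages,
# and every outage already charged to its root-cause component.
from __future__ import annotations

MINUTES_PER_MONTH = 30 * 24 * 60  # 43,200 minutes in a 30-day month

AVAILABILITY_TARGET = 0.999    # SLA: order entry system available 99.9% of the time
MAX_COMPONENT_DOWNTIME = 10.0  # SLA: no component down more than 10 minutes a month

# Minutes of downtime per component for the month, as reported in Table 1.1.
DOWNTIME = {
    "Building Hub": 0.0,
    "Customer Database": 4.32,
    "Inventory Database": 0.0,
    "LAN": 6.00,
    "Local Server": 8.64,
    "Order Entry Application": 7.54,
    "Remote Host": 69.72,
    "WAN": 9.88,
}


def availability(downtime_minutes: float, period_minutes: float = MINUTES_PER_MONTH) -> float:
    """Fraction of the period during which the item was available."""
    return 1.0 - downtime_minutes / period_minutes


def component_view(downtime: dict[str, float]) -> dict[str, tuple[float, bool]]:
    """IT's view: each component's availability and whether it met both objectives."""
    result = {}
    for name, minutes in downtime.items():
        avail = availability(minutes)
        met = avail >= AVAILABILITY_TARGET and minutes <= MAX_COMPONENT_DOWNTIME
        result[name] = (avail, met)
    return result


def user_view(downtime: dict[str, float]) -> tuple[float, float]:
    """Users' view: an outage of any component is an outage of order entry.

    With non-overlapping, root-cause-charged outages, the service's downtime
    is simply the sum of the component downtimes.
    """
    total = sum(downtime.values())
    return total, availability(total)


if __name__ == "__main__":
    for name, (avail, met) in component_view(DOWNTIME).items():
        print(f"{name:<25}{avail:8.2%}  {'met' if met else 'missed'}")
    total, avail = user_view(DOWNTIME)
    allowed = (1 - AVAILABILITY_TARGET) * MINUTES_PER_MONTH
    print(f"Order entry downtime: {total:.1f} min (allowed: {allowed:.1f} min)")
    print(f"Order entry availability: {avail:.2%}")
```

Run as a script, the sketch flags only the Remote Host as missing its objectives, yet reports 106.1 minutes of order entry downtime against an allowance of 43.2 minutes, or 99.75% availability. Summing the component downtimes is valid only under the no-overlap assumption; overlapping outages would make the simple sum overstate the users' downtime.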
Table 1.1 Acme Order Entry System Performance for One Month

Component                  Minutes of Downtime   Availability
Building Hub                              0          100.00%
Customer Database                      4.32           99.99%
Inventory Database                        0          100.00%
LAN                                    6.00           99.99%
Local Server                           8.64           99.98%
Order Entry Application                7.54           99.98%
Remote Host                           69.72           99.84%
WAN                                    9.88           99.98%

Obviously, the Remote Host had some problems during the month, but this was because of the failure of a circuit board. Operations had to wait for a service technician to arrive on site and install the new board. Considering this, even the operations group is reasonably satisfied with the performance that IT delivered.

The management of the order entry department takes a much different view of the performance for the month. The end users see a month in which there was a total of 106.1 minutes in which they could not process orders. As shown in Table 1.2, the availability that they experienced was 99.75%, well below the target of 99.9%. (Availability of 99.9% would allow a total unavailable time of 43.2 minutes in a 30-day month.) It is possible that there was some overlap in outages; however, that is statistically unlikely. Also, this example has been constructed using the assumption that outages impacting more than one component would be charged to the root cause. For example, if the remote host fails, it will necessarily result in an outage for the order entry application, the customer database, and the inventory database. However, the outage would only be charged against the remote host because that is the cause of the other components being unavailable.

Table 1.2 Users' Perspective: Acme Order Entry System Performance

Component                  Minutes of Downtime   Availability
Building Hub                              0          100.00%
Customer Database                      4.32           99.99%
Inventory Database                        0          100.00%
LAN                                    6.00           99.99%
Local Server                           8.64           99.98%
Order Entry Application                7.54           99.98%
Remote Host                           69.72           99.84%

At this point, IT managers are congratulating each other on the high level of service that they have delivered to the order entry department. Meanwhile, the order entry department managers are hopping mad. The users see IT as being unresponsive and unable (or unwilling) to meet their needs. As they look at the situation, outsourcing the IT function starts to sound like an appealing alternative.

Studies by Enterprise Management Associates have found another problem. Too often, IT fails to provide service level statistics that are meaningful to the end user. Even worse than the component-centric view taken in the previous example, some IT managers substitute techno-babble for meaningful information. For example, it is not uncommon to find IT groups giving reports to their clients that contain such things as packets dropped, page faults, and so on. This type of data might be meaningful to the engineers working with the specific component (for example, the wide area network), but it is little more than gibberish to their clients.

The two communities, the IT organization and their clients, have vastly different perspectives on IT services. IT feels that the client community needs to have more realistic expectations, basing them on what is possible, practical, and affordable. Faced with constraints of immature technology, tight budgets, limited headcount, and scarcity of skilled personnel, IT feels that they should be given credit for being able to accomplish so much. In other words, they feel that they deserve an "A" for effort. On the other hand, the clients' very survival often depends on the delivery of adequate levels of service by IT. They feel that they are paying for the services and should have the right to define what will be delivered.

Technical Challenge

It is true that IT still maintains far too much of a component-centric view of the services that it delivers. Although IT organizations must share in the blame for this continued limited perspective, much of it can be explained by the limitations of the technology available for management reporting. Consider the problem that is reflected in Figure 1.1. This is a simplified illustration of a distributed computing environment. Each type of component has a unique management system attached. Those unique management systems (element management systems) can provide a great deal of information about any single device. The problem presented by these systems is that they also produce a fragmented view of the service. Element management systems are not designed to assess each device in the overall context of the service that it is helping to deliver.

Element management systems are providing information about each component in isolation. It is as if a doctor carefully examines a single part of your body. From that examination, it will be possible to describe the state of that body part. In the
WAN 9 . 88 99.98% case of certain critical body parts (such as the heart, liver, and so on), it might be
Composite 106.1 99.75%
possible to state how the part is impacting your general health or lilt' expectancy.
However, more often it is necessary to consider the part within the context of the
total body. It is this perspective that is moving the medical community toward a
holistic approach to treatment and diagnosis. There is a similar need within the IT
community.
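The arithmetic behind Table 1.2 is easy to reproduce. The following Python sketch assumes, as the example does, a 30-day month (43,200 minutes) and non-overlapping outages that are each charged to their root cause, so the composite downtime the users experience is simply the sum of the component downtimes. The names `downtime`, `availability_pct`, and `allowed_downtime` are illustrative only.

```python
# Minutes in the 30-day reporting month used in the Acme example.
MINUTES_PER_MONTH = 30 * 24 * 60  # 43,200

# Downtime per component, in minutes, as listed in Table 1.2.
downtime = {
    "Building Hub": 0.0,
    "Customer Database": 4.32,
    "Inventory Database": 0.0,
    "LAN": 6.00,
    "Local Server": 8.64,
    "Order Entry Application": 7.54,
    "Remote Host": 69.72,
    "WAN": 9.88,
}


def availability_pct(minutes_down: float, period: float = MINUTES_PER_MONTH) -> float:
    """Percentage of the period during which the item was available."""
    return 100.0 * (1.0 - minutes_down / period)


def allowed_downtime(target_pct: float, period: float = MINUTES_PER_MONTH) -> float:
    """Minutes of downtime that an availability target permits over the period."""
    return period * (1.0 - target_pct / 100.0)


# Users see the sum of all component outages, assuming (as the text does)
# that outages do not overlap and each one is charged to its root cause.
composite_minutes = sum(downtime.values())

for name, minutes in downtime.items():
    print(f"{name:<25} {minutes:6.2f} {availability_pct(minutes):7.2f}%")
print(f"{'Composite':<25} {composite_minutes:6.2f} {availability_pct(composite_minutes):7.2f}%")
print(f"A 99.9% target allows {allowed_downtime(99.9):.1f} minutes per month")
```

Run as is, the composite line reproduces the 106.1 minutes and 99.75% of the table, and the last line reproduces the 43.2-minute allowance quoted in the text.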

Figure 1.1 A simple distributed environment.

Figure 1.2 A complex distributed environment.
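The contrast between a single router and redundant routers can be made concrete by treating the environment as a graph and asking which single component failures actually cut users off from the service. The Python sketch below is a minimal illustration under stated assumptions: the node names and links are hypothetical stand-ins for the two figures, links are treated as bidirectional, and only connectivity is considered, not capacity or performance.

```python
from collections import deque


def reachable(links, start, goal, failed=frozenset()):
    """Breadth-first search: can `start` reach `goal` without using failed nodes?"""
    if start in failed or goal in failed:
        return False
    graph = {}
    for a, b in links:  # links are treated as bidirectional
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    seen, queue = {start}, deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            return True
        for nxt in graph.get(node, ()):
            if nxt not in seen and nxt not in failed:
                seen.add(nxt)
                queue.append(nxt)
    return False


def single_points_of_failure(links, start, goal):
    """Components whose failure alone interrupts the start-to-goal service."""
    nodes = {n for link in links for n in link} - {start, goal}
    return sorted(n for n in nodes if not reachable(links, start, goal, {n}))


# Hypothetical topologies loosely modeled on the simple and complex environments.
simple = [("desktop", "hub"), ("hub", "router"), ("router", "wan"), ("wan", "host")]
complex_ = [("desktop", "hub"), ("hub", "router_a"), ("hub", "router_b"),
            ("router_a", "wan"), ("router_b", "wan"), ("wan", "host")]

print(single_points_of_failure(simple, "desktop", "host"))    # ['hub', 'router', 'wan']
print(single_points_of_failure(complex_, "desktop", "host"))  # ['hub', 'wan']
```

In the redundant topology, the loss of either router no longer appears as a service outage, which is exactly the judgment an element management system reporting on one device cannot make by itself.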

In Figure 1.1, there is only a single router, and if the router fails, there is no question that the service is interrupted. However, in a more complex environment, with many routers (see Figure 1.2), alternate paths for data, and so on, the impact of the failure of a single router is not as obvious.

What is needed is the ability to assess the impact of any aberration in the service delivery environment on the service and the end users. Herein lies the challenge: assessing the overall impact when the data is only available on a piecemeal basis. Many companies simply rely on the subjective judgment of the operations personnel. Unfortunately, this is an unreliable approach that cannot produce accurate measurements of the overall level of the service being delivered. Fortunately, new software products are emerging that aim to provide such measurements. We will look at these tools in Chapter 7, "Service Level Management Products."

What Is SLM?
Service level management (SLM) is the disciplined, proactive methodology and procedures used to ensure that adequate levels of service are delivered to all IT users in accordance with business priorities and at acceptable cost. Effective SLM requires the IT organization to thoroughly understand each service it provides, including the relative priority and business importance of each.

Service levels typically are defined in terms of the availability, responsiveness, integrity, and security delivered to the users of the service. These criteria must be viewed in light of the specific goals of the application being provided. For example, a human resources application might require communications such as email among individuals. An order-entry application might involve multiple cooperating applications such as supply chain management. In all cases, the service should be treated as a closed-loop system with all service levels related directly to the end-user experience.

The instrument for enforcing SLM is the Service Level Agreement (SLA): a contract between IT and its clients that specifies the parameters of system capacity, network performance, and overall response time required to meet business objectives. The SLA also specifies a process for measuring and reporting the quality of service provided by IT, and it describes the compensation due the client if IT misses the mark.
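To make the measurement and compensation terms of an SLA concrete, the following Python sketch represents a hypothetical agreement as a small data structure and evaluates one month's measurements against it. The field names, the 95th-percentile response-time target, and the tiered credit schedule are illustrative assumptions, not terms prescribed by any standard or by this book; only the measured 99.75% availability comes from the Acme example.

```python
from dataclasses import dataclass, field


@dataclass
class ServiceLevelAgreement:
    """Hypothetical SLA terms for one service (illustrative field names)."""
    service: str
    availability_target_pct: float  # e.g. 99.9
    p95_response_target_s: float    # 95% of transactions must finish within this time
    # Hypothetical credit schedule: (minimum availability %, credit as % of monthly charge).
    credit_tiers: list = field(
        default_factory=lambda: [(99.9, 0.0), (99.5, 10.0), (99.0, 25.0), (0.0, 50.0)]
    )

    def evaluate(self, measured_availability_pct, measured_p95_s, monthly_charge):
        """Compare one month's measurements with the targets and compute any credit."""
        credit_pct = 0.0
        # Walk the tiers from the highest threshold down; the first one met applies.
        for threshold, pct in sorted(self.credit_tiers, reverse=True):
            if measured_availability_pct >= threshold:
                credit_pct = pct
                break
        return {
            "service": self.service,
            "availability_met": measured_availability_pct >= self.availability_target_pct,
            "response_met": measured_p95_s <= self.p95_response_target_s,
            "credit": round(monthly_charge * credit_pct / 100, 2),
        }


# The 99.75% availability is the Acme figure; the other numbers are made up.
sla = ServiceLevelAgreement("Acme order entry", 99.9, 2.0)
print(sla.evaluate(99.75, 1.8, monthly_charge=20000))
```

Keeping the targets and the credit rule in one agreed structure mirrors the point that both parties should agree not only on the objectives but on how compliance is measured and what follows from a miss.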
Pros and Cons
Some IT managers have a negative perception of service level management (SLM) and Service Level Agreements (SLAs). To begin with, there is a tendency to view SLM as just another fad sweeping across the IT landscape. Certainly in recent years there have been many such fads. Implementing SLM requires time and effort. When IT is already working with limited resources, it is difficult to rationalize allocating some of those resources to work on SLM (especially if it is just a passing fad).

Another reason for a negative perception is that in some organizations, the SLM process and the associated SLAs have been abused by the clients of IT. Specifically, the users have succeeded in negotiating unreasonable or unattainable service level commitments and used them as a "club" against the IT organization. Some IT managers believe their organizations expect nothing less than 100% uptime, and they think that signing an SLA merely gives clients a means of documenting the perceived failures of the IT group.

Although these concerns and reservations are understandable and valid, IT management should not allow them to prevent development of service level management, which can be vitally important to the company and the IT organization.

The importance of SLM is demonstrated by its rising popularity. The research firm Cahners In-Stat Group (Newton, MA) reports that the use of SLAs rose 25% among Fortune 1000 companies during 1999, and the market for service level management products is expected to reach $280 million in worldwide revenue by the end of 2000.

Organizations that implement SLM testify to its value in a variety of ways. A comprehensive SLA served as the basis for a lucrative contract between systems integrator 2020 Group Ltd. (Middlesex, United Kingdom) and IMS U.K. and Ireland (Pinner, United Kingdom), a healthcare consultancy. The 2020 Group was enlisted by IMS to help plan and launch a multisite network requiring 100% reliability. The integrator met this objective by using a "tailored and detailed" SLA, which also helped the team manage the transition to the new network and ensured that 2020's work for IMS was showcased to the best advantage. As a result, the integrator won a lucrative contract to outsource IMS's facilities management on a full-time basis.

At Stanford University (Stanford, CA), the UNIX Systems Support group within the university's Information Technology Systems and Services organization uses SLAs to offer various levels of fee-paid services to staff and students. This approach increases IT's efficiency and allows clients to plan maintenance costs as a regular budget item instead of as an unexpected expense.

There are plenty of other SLM success stories. At the National Institutes of Health (NIH, Bethesda, MD), SLAs have been implemented by IT not only to furnish dependable support and timely response to problems, but also to "lower costs through standardized configurations." As an additional benefit, NIH sees SLAs as a means of modeling IT efficiency for other government agencies, thereby taking a leadership role.

SLM's benefits are so compelling that its use isn't limited to IT environments. Seminole Electric Cooperative Inc. (Tampa, FL) also uses SLAs as a key element in its multilevel service offerings. Its customers receive cycles of electrical power instead of packets or system capacity.

Other Service Providers
This book has been written mainly from the perspective of IT as a provider of technology services to its clients. However, it is important to note that this was done as a matter of convenience for the authors and the readers. In reading this book, it should always be remembered that IT is also a consumer of services, and IT's interest in establishing SLAs with its own service providers is on the rise. The research firm International Data Corp. (Framingham, MA) announced in September 1999 that the share of executives in its annual survey of 500 who require SLAs from all service providers had risen from the 30% recorded in the 1998 survey.

When IT is the client or customer of a service provider, all the principles set forth in this book with regard to IT's clients apply equally well to IT in its role as a user of services. Some of the external service providers with whom IT might interface include Internet Service Providers (ISPs), various forms of communications service providers (telcos), outsourcing companies, and application service providers (ASPs). Similarly, there might be other service providers within the same company as the IT organization. (Whether a service provider is within the same company as the IT department or external will change the form of the Service Level Agreement, but not the need for the agreement and service level guarantees.) If the organizations providing services to IT do not deliver a consistent, acceptable level of service, IT will not be able to meet its own service commitments. It is only reasonable for IT to insist on service level guarantees from its service providers.

This book is also relevant to non-IT service providers. Companies such as telcos, ISPs, ASPs, and so on can equally draw on the principles and guidelines in this book, as we will explore in depth in Chapter 11, "Service Level Management as Service Enabler."

The Importance of SLM
All this makes it easy to answer the question, "Is service level management really important to IT?" with an unequivocal "Yes!" For IT, effective service level management is a matter of survival. To understand this, it might be helpful to think of an IT organization as a company. That company's products are the services that it delivers to its clients. It is rare to find a successful company that sells its products on the basis of "take it or leave it" (caveat emptor). Instead, companies go to considerable effort and expense to clearly define both the capabilities and limitations of the product they are selling. Often these are defined in a contract with the buyer. Similarly, a company cannot hope to be successful if it does not implement production controls to ensure that what is shipped is within specifications.
Consider a company that produces sugar (powdered and granulated) and sells it in two-pound bags. First, the specifications ("granulated sugar" and "net weight: 2 lb.") appear clearly on the bag. For reasons of customer satisfaction and cost control, in addition to governmental regulations, the product must meet those specifications.

Note
An IT department without a service level management program is like a sugar producer that puts a "reasonable" amount of product in unlabeled bags. Clearly, this is not a formula for success.

There are six basic reasons for an IT organization to implement service level management. These reasons are as follows:

• Client satisfaction
• Managing expectations
• Resource regulation
• Internal marketing of IT services
• Cost control
• Defensive strategy

Client Satisfaction
The leading reason for implementing service level management is client satisfaction. To begin with, SLM necessitates a dialog between IT managers and their clients. This is necessary in order for IT to be able to understand the client's service requirements. It also forces clients to clearly state (perhaps for the first time) their requirements or expectations. When IT and the client agree on what is an acceptable level of service, they are establishing a benchmark against which IT performance can be measured. IT is able to shift toward a defined objective: the client's requirements. The dialog that is initially established continues through the process with regular reports. Even a process of service level management cannot produce happy clients when service level commitments are not met. However, it will significantly raise overall client satisfaction when commitments are met. It can also help to improve the situation when targets are missed.

Managing Expectations
An ancillary benefit of implementing service level management is that it makes it possible to avoid so-called expectation creep, that is, the ever-rising levels of users' undocumented expectations. It is common for people to want improvements over the status quo. If users' requirements are not documented, their expectations are not established at a particular level. Instead, undocumented requirements are free to rise steadily, always staying ahead of the level of service that is being delivered. When Service Level Agreements are negotiated, requirements are documented. Although users might continue to want higher levels of service, the agreement serves as a braking mechanism. IT is able to point to the commitments in the SLA that were previously identified as being acceptable. Any changes require a renegotiation of the agreement and, potentially, additional funding for IT in order to provide the higher level of service.

Resource Regulation
SLM provides a form of governance over IT resources. In some organizations, a powerful user group will sometimes demand support for an application that unfairly ties up resources. With an SLA in place, it is more difficult for a strong minority to outweigh the interests of the majority. SLAs also help IT avoid capacity problems that result when too many applications crowd the network, server, mainframe, or desktop. And because SLAs specify levels of service, they can be used as indicators of ongoing system capacity and network bandwidth requirements. Specific resources will be needed to keep abreast of SLA parameters, and the monitoring and measurement deployed by IT to keep up with SLAs provide early warning of any new capacity that might be required.

Internal Marketing of IT Services
When used correctly, SLM not only helps IT departments to deploy resources fairly, but also can be a great marketing tool. By ensuring ongoing, consistent levels of response time and availability, SLAs provide a powerful way for IT to let internal clients know what a terrific job it is doing. Before the advent of SLAs, the only contact many organizations had with their IT departments occurred when something went wrong. This state of affairs tended to place IT in a negative light, causing clients to view IT as a necessary evil and the object of blame for system failures. In changing this perception, SLM takes IT out of the category of a liability and puts it among the company's assets. With the right approach, SLM puts IT in the limelight with other departments, such as finance and accounting, that are charged with sustaining and growing the business.

Cost Control
In the context of cost control, service level management is a double-edged sword. First, it helps IT to better determine the appropriate level of service to provide. Without service level objectives arrived at in dialog with their clients, IT management is forced to guess. Too often, this guesswork leads to excess. That is, it can lead to over-staffing, configuring networks with excess capacity, buying larger, faster computers, and so on.
In the absence of dialog with IT, the users' requirements are established by what is desirable rather than what is affordable. The requirements and expectations are not tempered by the reality of feasibility or affordability. Service level management can also affect costs by moderating user demands for higher levels of service. This can happen in two ways. As discussed under Managing Expectations, service level management can limit the escalation of user demands. Also, as part of the dialog with IT, the financial impact of higher levels of service can be explained. In some instances, the business case will justify the additional cost of providing higher levels of service. In other cases, there will not be a financial justification and, hopefully, the unnecessary cost will be avoided.

Defensive Strategy
Ultimately, everyone is motivated by self-interest, and IT managers are no different. It can clearly be in the interest of IT managers to implement a service level management process. With SLM in place, IT has a tool to use in defending itself from user attacks. Clear objectives are set and documented, so there is no room for doubt about whether the objectives have been met. In a well-written Service Level Agreement, even the metrics for measuring service levels are defined and agreed to by both the users and IT.

Ultimately, service level management is something that can benefit the user, the IT organization, and the corporation in which they both work. The process of service level management can temper users' demands for higher levels of service. Conversely, service level management can hold IT accountable for delivering agreed-upon levels of service, while providing it with clear objectives for service. Outsourcing continues to be very popular, and SLM can be the best defensive strategy IT has against the user dissatisfaction that can lead to outsourcing.

Why Now?
If the case for service level management is so compelling, why is it just now receiving widespread attention? Several reasons help explain the sudden attention that service level management is receiving. First, there has been a dramatic increase in the number of applications (that is, the number of services being provided) and in the relative importance of those applications. Companies are more dependent upon the services that IT provides.

The next factor driving the increased interest in Service Level Agreements is increasing user sophistication and users' growing dissatisfaction with the level of service they are receiving. This change in the user community is discussed at the beginning of this chapter.

Probably more important than all the other factors fueling interest in service level management is the fact that technology has matured, making end-to-end measurement and reporting available at a reasonable cost. Dozens of vendors, ranging from the very largest to minuscule start-ups, have focused their attention and the considerable talents of their technical wizards on the challenges of service level management. A 1996 Enterprise Management Associates survey of IT managers found only twelve products that were being used for service level management. However, possibly only one of those products (Microsoft Excel) added any value to the process, albeit minimal. Eighteen months later, in May 1998, the number of companies that identified themselves as offering products specifically for service level management had risen to 62. By March of 1999, the number had climbed still higher (see Figure 1.3).

Figure 1.3 Growth in SLM Products (number of SLM products in 1996, 1998, and 1999).

If a company wants to implement SLM, it is a much simpler process than it was even 4 years ago. In the past, collecting the data (if available) and generating the SLM reports was slow and labor-intensive. Sometimes it required custom programs to be written, or expensive data collection products to be purchased. Even then, the results were usually marginal. The situation has improved significantly. The introduction of new products has facilitated the data collection process as well as the merging or correlating of data from diverse sources. Although there are more advances to come, SLM reporting has become dramatically easier.
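Merging data from diverse sources is where naive reporting usually goes wrong: when outage records from several element management systems overlap, simply adding them double-counts the shared time. The Python sketch below assumes that each source reports (start, end) outage intervals and that every listed component is required for the service, so any reported outage is a service outage; it merges the intervals before totaling them. The function names and timestamps are hypothetical.

```python
from datetime import datetime, timedelta


def merge_outages(intervals):
    """Union overlapping or touching (start, end) intervals so shared time counts once."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def service_downtime_minutes(*sources):
    """Total service downtime in minutes across outage lists from several sources."""
    combined = [interval for source in sources for interval in source]
    return sum((end - start).total_seconds() / 60 for start, end in merge_outages(combined))


# Hypothetical outage records from three element management systems.
t = datetime(1999, 3, 1, 8, 0)
network = [(t, t + timedelta(minutes=10))]
server = [(t + timedelta(minutes=5), t + timedelta(minutes=20)),
          (t + timedelta(hours=3), t + timedelta(hours=3, minutes=4))]
database = [(t + timedelta(minutes=15), t + timedelta(minutes=18))]

print(service_downtime_minutes(network, server, database))  # 24.0 (a naive sum gives 32.0)
```

The gap between the merged total and the naive sum is the overlap that a component-by-component report silently counts twice.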
Summary
In today's global business environment, IT professionals find themselves under pressure to reduce costs and deliver higher-than-ever levels of service to increasingly savvy users. To achieve this, they are deploying service level management (SLM), a methodology for ensuring consistent levels of capacity and performance in IT environments. SLM includes a contract between IT and its clients (whether in-house or external to the organization) that specifies the client's expectations, IT's responsibilities, and the compensation IT will provide if goals are not met. Despite some initial misgivings, the value and importance of SLM have been established. Its successful use is well documented in numerous case studies, and its popularity is increasing not only within IT organizations but also among service providers. A range of new products geared to supporting SLM further testifies to its deployment as an unquestioned requirement in IT organizations worldwide.

CHAPTER 2

The Perception and Management of Service Levels

Service level management (SLM) is the continuous process of measuring, reporting, and improving the quality of service provided by the IT organization to its business. This requires that the IT organization understand each service it provides, including relative priorities, business importance, and which lines of business and individual users consume which service.

There are a number of important aspects that relate to the perception and management of service levels. The first consideration is to ensure that the service levels to be managed are measured and evaluated from a perspective that matches the business goals of the IT organization. The IT department supports business productivity by ensuring that the applications used by internal personnel are available to them when required and that they are responsive enough to allow these users to be optimally productive. Failures and errors that lead to service outages are almost certain to occur. When they do, the IT department must restore the service as quickly as possible with the least amount of disruption to other services.
The IT department must also ensure that automated business processes complete in a timely fashion to meet required deadlines and to enhance effectiveness and profitability. Additionally, if customers directly use or interact with IT services, IT must ensure that their experience is pleasurable, leading to customer loyalty and repeat business.
Service levels are measured and managed to improve a number of quantifiable
aspects of the perceived quality of the services delivered. These are described in
the remainder of the chapter.

Availability
Availability is the percentage of the time that a service is available for use. This can be a controversial measure of service quality because of the number of different measurement mechanisms. The variability of measurements is a result of differing perspectives of service goals, which vary primarily by the job function of the individual doing the measuring. For example, the network manager typically sees the service as the network connectivity; the system manager views the service as the server being operational; the database administrator sees the service as available access to data held in the database. Hence, quoted availability measurements typically relate to individual components (for example, server availability or network availability) and do not match the IT user's perception of availability. The end user or line of business wants to know that he can access the applications and data required for him to perform productive work.

Note
Unless availability measurement relates directly to the user experience, it will have little positive value and might, in fact, damage the credibility of the IT organization quoting such measurements.

True availability must be measured end-to-end from the end user through all the technology layers and components to the desired business application and data, and back to the end user. Such an aggregate value can be difficult to measure directly and might have to be derived by combining the availability of all the components traversed. Figure 2.1 shows the concept of end-to-end service level management. The reality of a business-oriented IT service might, in fact, be more complex, involving multiple applications, extranets, and Internet connections.

The first step in measuring and managing service levels is to define each service and map out the service from end-to-end. Each of the end users and their locations should be identified together with the path they take to access the business application providing the core part of the service. The data used by the application should be determined along with where it resides and how it is accessed. If the core application needs to interact with other applications, these should also be identified and mapped. In this manner the overall flow of a service or business transaction can be determined, recorded, and used to define transaction types, component dependencies, and appropriate service measurement points.

Figure 2.1 End-to-end service level management.

Note
This end-to-end definition of service using the user's perspective is required for all measures relating to service quality.

Availability of a service is the capability to successfully complete an entire service or business transaction as defined previously. Component availability describes when an individual component that the service depends on is operational.

The availability of each service is defined within standard hours of operation for that service. Most IT organizations need to remove applications from service at periodic intervals (called a maintenance window) in order to undertake routine maintenance of the application, supporting databases, and underlying infrastructure. Hence availability objectives will also specify planned outages by service, together with the schedule for those outages. The standard hours of operation will vary depending on the nature and criticality of each service. As more services become Internet based, the length of the maintenance window shrinks, and a maintenance window might not be acceptable at all.

Availability is the most important factor that influences the users' perception of the quality of IT services. It is also the most critical factor affecting user productivity, particularly if the user depends entirely on a particular business application to perform her job function.
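Deriving end-to-end availability by combining component availabilities can be sketched in Python. Assuming components fail independently and every component in the path is required (a serial chain), end-to-end availability is the product of the component availabilities; for small unavailabilities this gives nearly the same answer as adding component downtimes. The second function measures availability only within the agreed hours of operation and assumes planned outages in the scheduled maintenance window are removed from the base rather than counted as downtime. All figures in the example are hypothetical.

```python
from math import prod


def serial_availability(component_availabilities):
    """End-to-end availability of a chain in which every component is required,
    assuming components fail independently of one another."""
    return prod(component_availabilities)


def availability_in_service_hours(scheduled_min, planned_outage_min, unplanned_outage_min):
    """Availability within the standard hours of operation. Planned outages in the
    agreed maintenance window are excluded from the base, not counted as downtime."""
    committed = scheduled_min - planned_outage_min
    return (committed - unplanned_outage_min) / committed


# Hypothetical path: desktop LAN, WAN, application server, database.
path = [0.9999, 0.9998, 0.9998, 0.9999]
print(f"End-to-end: {serial_availability(path):.2%}")  # End-to-end: 99.94%

# Hypothetical service offered 12 hours a day on 22 business days, with a
# 4-hour scheduled maintenance window and 30 minutes of unplanned outage.
print(f"In service hours: {availability_in_service_hours(12 * 60 * 22, 240, 30):.2%}")
```

Note that the end-to-end figure is lower than any single component's availability, which is why component-level reports tend to look better than what the user actually experiences.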
Performance
As with availability, performance must be measured from the end users' perspective and must also relate to the business goals of IT. The performance of a service is measured by the responsiveness of the application to interactive users and the time required to complete each batch job to be processed (also called job turnaround). The responsiveness of the application and batch job processing times will be affected directly by the amount of work to be processed (also called workload levels). This concept is discussed in the next section.

Interactive Responsiveness
Interactive responsiveness relates to the time taken to complete a request on behalf of a user. The quicker the requests are completed, the more responsive the service. The request could be processing a service transaction or retrieving some information. It is important that any measure of responsiveness match the user experience; hence response time measures must be end-to-end, from the end user's desktop through the business application (including any database access and interaction with other applications) and back to the end user.

The responsiveness of the service is second only to availability as an important factor in the user's perception of the quality of services provided by IT. There is a direct correlation between how fast the application responds to online users and their productivity. An important consideration is the consistency of the interactive response times experienced by the end users. Erratic and unpredictable response times that vary from exceptionally fast to extremely slow will be perceived by the users as unacceptable and far worse than consistent response times that might be merely adequate.

It is important that all applications supported by the IT environment meet their performance goals. If balanced performance is not maintained, one application service might meet its performance objectives at the expense of other application services, which will result in a dissatisfied user community.

Tip
In cases where performance is certain to degrade over time because of increasing workloads, and where responsiveness is significantly better than required initially, some IT departments build in latency that can be removed gradually over time. This ensures that response times can be held constant at acceptable levels, and ensures that unrealistic expectations aren't set.

Batch Job Turnaround
A large amount of processing does not require continuous interaction with either the user or a system operator and happens in batch mode or as background tasks. In this case, job streams are scheduled and processed to perform routine operations that produce predetermined outputs such as payroll, financial reporting, inventory, and so on. Responsiveness of the batch jobs is referred to as turnaround. This is the time between submitting the batch or background request and the completion of all processing associated with that request, including delivering output to the required recipients.

In a large IT environment, numerous batch jobs will have to be processed every day, and the volume of jobs typically varies in cycles, with peaks at the end of the week, month, quarter, and fiscal year.

There is usually a specified "batch window" within which all batch processing has to finish to ensure that the performance, and particularly the responsiveness, of interactive processing is not degraded by the batch processing. Some background tasks are continuous in nature, such as print spooling or file system management.

Critical Deadlines
In addition to the normal window for processing all batch jobs, there might be specified times at which certain jobs or tasks must finish to satisfy external vendors or regulations. For example, the payroll run might have to be completed by a set time to ensure that the information is sent to the bank in time for the electronic funds transfer to be completed that night.

Note
Meeting critical deadlines can be very important because there might be monetary damages or penalties for not completing the work by the specified time.

In many cases, the completion of critical deadline jobs will take precedence over completing all jobs within the batch window and might have priority over interactive processing.

User Perception of Performance
Usability studies have identified the relationship between response times and user satisfaction for various user and work profiles. This relationship varies tremendously with the nature of the work involved, the perceived difficulty of the task being performed by the automated process, and the relationship between response time and user "think" time. Similar parameters will affect the user's perception of the adequacy of batch job turnaround. As mentioned previously, consistency of responsiveness is critical to user satisfaction.
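Two of the performance points above lend themselves to simple checks. The Python sketch below summarizes response-time samples with a mean, a population standard deviation, and a nearest-rank 95th percentile, which exposes erratic behavior that an average alone hides. It also flags batch jobs that finished after their own critical deadline or, lacking one, after the batch window. The sample values, job names, and times are hypothetical.

```python
import math
import statistics
from datetime import datetime


def response_profile(samples_s):
    """Mean, population standard deviation, and nearest-rank 95th percentile (seconds)."""
    ordered = sorted(samples_s)
    rank = math.ceil(0.95 * len(ordered))  # nearest-rank method
    return {
        "mean": statistics.mean(ordered),
        "stdev": statistics.pstdev(ordered),
        "p95": ordered[rank - 1],
    }


def late_jobs(jobs, batch_window_end):
    """Names of jobs finishing after their critical deadline, or after the
    batch window when the job has no deadline of its own."""
    late = []
    for name, finished, deadline in jobs:
        limit = deadline if deadline is not None else batch_window_end
        if finished > limit:
            late.append(name)
    return late


# Hypothetical response-time samples: similar averages, very different consistency.
steady = [1.9, 2.0, 2.1, 2.0, 1.9, 2.1, 2.0, 2.0, 1.9, 2.1]
erratic = [0.3, 0.4, 0.3, 5.5, 0.4, 0.3, 6.0, 0.4, 0.3, 5.9]
print(response_profile(steady))
print(response_profile(erratic))

# Hypothetical overnight run: (job, finish time, critical deadline or None).
window_end = datetime(2000, 1, 15, 6, 0)
jobs = [
    ("payroll", datetime(2000, 1, 15, 1, 20), datetime(2000, 1, 15, 1, 0)),
    ("inventory", datetime(2000, 1, 15, 5, 45), None),
    ("financial_reports", datetime(2000, 1, 15, 6, 30), None),
]
print(late_jobs(jobs, window_end))  # ['payroll', 'financial_reports']
```

The erratic sample has a slightly lower mean than the steady one, yet a 95th percentile roughly three times higher, which matches the observation that consistent, merely adequate response times are preferable to unpredictable ones.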

Workload Levels
The workload level is the volume of processing performed by a particular service. This includes both the rate of processing interactive transactions and the number of batch jobs completed within a given time period.

These service workloads generally relate to specific applications; however, workload processing might span multiple applications and generate work on multiple systems. A service workload uses all the components involved in delivering the service, including network, system, database, and middleware resources. To plan capacity requirements and understand the effect of service workloads, it is very useful to correlate service workloads with individual resource utilization levels across all components that provide the service infrastructure.

Note
The most important measures of service workload volumes are online transaction rates, the number of batch jobs to be processed, and the number of these jobs that will be completed in parallel.

Transaction Rates
Interactive workloads are usually measured as the number of transactions per second; however, it is important to understand the nature of the quoted transactions. Transactions represent a complete unit of useful work. It is important to recognize the difference between a transaction and a system or application interaction. A single interaction is simply one pair of messages in a dialog between the user and the application, such as the user submitting a request for service and receiving an acknowledgment of the request. A single transaction can involve multiple user interactions.

Definition of a Transaction
An application transaction performs some business task that results in a change to the data associated with, or the state of, the automated application. A business transaction changes the state of a business entity, changes the state of the relationship between business entities, or performs some service on behalf of a business customer.

For example, a business transaction might result in the sale of a number of shares in a publicly traded company. To complete this business transaction, a number of different interactions and application transactions might occur. The broker registers the customer's desire to sell the shares using one application transaction. The broker's system checks the stock price from the stock exchange with a different application transaction. Perhaps the broker then lists the volume to be sold and its asking price with a market maker by interfacing to that entity's trading indication. A trading house accepts the bid in yet another transaction. Finally, the sale confirmation is sent back to the broker in another transaction, and the broker notifies the customer of the sale. Each of these interactions could be considered a business transaction by itself, but will have little relevance to the customer wanting to sell stock. The customer's perception will be that he completed one business transaction: he sold a volume of stock.

All measures of interactive workloads need to specify whether business transactions or application transactions are being quoted. If business transactions are quoted, the parties to the business transaction should also be understood, particularly whether one or more of the parties are outside the corporation.

Caution
Measuring business transactions can be complicated unless the application code itself supports such a measure.

Mapping business transactions to application transactions and to underlying interactions, and relating how the supporting technology components were used to process each transaction, will be required in order to plan capacity requirements.

Client/Server Interactions
Many applications have been developed using the client/server architecture. In many cases, this is multi-tiered, such as an application that is split among client-side presentation, an application server, and a database server. This complicates the measurement of the transaction.

Tip
The overriding rule for determining the scope of a business transaction is that it begins with an end user initiating a business action or request and ends when the automated business process fulfills that initial business request.

Using a client/server application architecture increases the need to carefully map the transaction to ensure that all the subordinate interactions are encapsulated by the business transaction definition. This ensures that the measurement of transaction rates and associated response times matches the end user's experience.

Batch Job Concurrency
Another measure of workload levels is the number of batch jobs that are run simultaneously. Most operating systems allow the operator to control the number of background jobs that can be initiated concurrently, and the optimal number will vary depending on the power of the system performing the processing and the characteristics of the workloads themselves.
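To make the distinction between business transactions, application transactions, and interactions concrete, the following sketch counts the same workload at all three levels. It is a hypothetical model, not a measurement tool: the class names, the five-step breakdown of the stock sale, and the interaction counts are assumptions chosen for illustration. It shows how one stream of work can be quoted as 2, 10, or 12 "transactions" per second depending on what is counted, which is why quoted rates must state their unit.

```python
from dataclasses import dataclass, field


@dataclass
class ApplicationTransaction:
    name: str
    interactions: int  # request/response message pairs (assumed counts)


@dataclass
class BusinessTransaction:
    name: str
    steps: list[ApplicationTransaction] = field(default_factory=list)


def rates(completed: list[BusinessTransaction], window_s: float) -> dict[str, float]:
    """Per-second rates at each level for business transactions completed in window_s."""
    app_count = sum(len(bt.steps) for bt in completed)
    interaction_count = sum(step.interactions for bt in completed for step in bt.steps)
    return {
        "business_tps": len(completed) / window_s,
        "application_tps": app_count / window_s,
        "interactions_per_s": interaction_count / window_s,
    }


# The stock-sale example: one business transaction made of five
# application transactions (interaction counts are illustrative only).
sell_stock = BusinessTransaction(
    "sell shares",
    [
        ApplicationTransaction("register sell order", 2),
        ApplicationTransaction("check exchange price", 1),
        ApplicationTransaction("list with market maker", 1),
        ApplicationTransaction("trading house accepts bid", 1),
        ApplicationTransaction("send confirmation", 1),
    ],
)

if __name__ == "__main__":
    # Assume 120 sales complete in a 60-second measurement window.
    print(rates([sell_stock] * 120, 60.0))
```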
Tip
Workload balancing is important to ensure that synergistic (complementary) workloads run concurrently, because conflicting jobs (jobs with similar characteristics, such as being CPU intensive or I/O intensive) might lead to thrashing and performance degradation.

It becomes particularly important to carefully control the amount and priority of batch jobs and background processing when these are performed concurrently with interactive processing. There is a very real possibility that these background tasks and jobs will detract from the quality of service provided to interactive users.

Batch Job Dependencies
Batch job stream specifications include not only which jobs to run, but also their sequencing and interdependencies. As a simple example, a payroll run every two weeks might include first identifying and flagging new recruits within the employee database, then processing all employee records to determine vacation or sick days used by employees, and then processing the payroll to calculate wages and taxes owed. In this case, the first two steps must complete prior to the third step of calculating the payroll numbers.

Complying with job dependencies will place limits on the number of batch jobs that can be run in parallel, hence constraining the overall workload levels. Ensuring that all dependencies are met can also negatively impact the total time required to complete all jobs in the batch window. Therefore, careful job stream planning, scheduling, and operational management are important aspects of meeting required service levels.

Note
Where critical deadlines exist, extra coordination is required to ensure that the critical jobs, together with any dependencies, can finish in the required timeframe.

Security
Defining the security of a service includes the definition of who can access the service, the nature of the access, and the mechanisms used to detect, prevent, and report unauthorized access. As applications span multiple platforms and users require access to data across multiple databases, the complexity of the security environment increases tremendously, and multiple security management systems will be employed.

Coordinating actions and administration across these multiple security systems becomes critical for ensuring consistency of access privileges and reducing the administrative overhead and potential for errors.

Defining Resources
All users and resources, including services, data, applications, systems, and network elements, must be defined to the security systems. To avoid issues with multiple inconsistent definitions, a resource-naming architecture should be defined and adopted. Additionally, a centralized security administration application can be used to automate the coordination and propagation of definitions and updates between multiple distributed security systems.

In more complex environments, it will be very useful to maintain a registry or directory outlining the relationships between the defined resources. For example, as the service is mapped onto the underlying infrastructure (as outlined in the section on Availability), it would be very useful to capture and maintain this information and use it when diagnosing service difficulties and when aggregating component service levels to produce end-to-end service level reports.

Access Controls
When the resources have been defined, access control lists are defined and then used to determine which users have access to which resources, and the nature of the authorized access. Depending on the type of resource to be accessed or used, the nature of the access will vary. For example, a user might be authorized to use a particular service that also requires access rights to certain data objects within a database. The nature or level of the access can include read, write, update, create, or delete. Depending on her access privileges to the underlying data, the user's ability to invoke the various service options will vary.

Note
Understanding, mapping, and maintaining the resources used by each service is important for understanding which resources a particular user will need to access in order to perform his job function.

The service options available to the user will determine the level of access to each application and resource she requires. Again, the use of a registry or directory service can simplify this aspect of maintaining access control.
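The payroll example can be expressed as a dependency graph. The sketch below uses Python's standard `graphlib.TopologicalSorter` (Python 3.9 or later) to group jobs into waves that respect both the dependencies and an assumed limit on concurrently initiated jobs. The job names and the `run_waves` helper are hypothetical. The text requires only that the first two steps finish before the payroll calculation, so they are modeled as independent of each other. A real scheduler would also weigh run times, priorities, and critical deadlines.

```python
from graphlib import TopologicalSorter

# Hypothetical job names for the payroll example; each job maps to the
# set of jobs that must complete before it can start.
payroll_stream = {
    "flag_new_recruits": set(),
    "tally_leave_days": set(),
    "calculate_payroll": {"flag_new_recruits", "tally_leave_days"},
}


def run_waves(stream: dict[str, set[str]], max_parallel: int) -> list[list[str]]:
    """Group jobs into waves that respect dependencies and a concurrency limit.

    Jobs in the same wave could run in parallel; no wave exceeds
    max_parallel jobs (an assumed operator-set initiator limit).
    """
    sorter = TopologicalSorter(stream)
    sorter.prepare()
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        # If more jobs are ready than the limit allows, run them in
        # successive waves; jobs unblocked meanwhile wait for the next pass.
        for start in range(0, len(ready), max_parallel):
            wave = ready[start:start + max_parallel]
            waves.append(wave)
            sorter.done(*wave)
    return waves


if __name__ == "__main__":
    for number, wave in enumerate(run_waves(payroll_stream, max_parallel=2), start=1):
        print(number, wave)
```

With two initiators the run takes two waves; with only one, it takes three, which illustrates how dependencies and concurrency limits together stretch the batch window.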
Assigning Users to Privilege Classes
To improve the consistency of resource access, users can be allocated to privilege classes that group users together with common profiles. This grouping could be by job function, job level, organization structure, physical location, or some combination of these. In this manner, changes to the access privileges in existing or new resources can be applied to an entire group or class of users simultaneously.

Tip
The use of a registry to hold and maintain this information can simplify the administration of user groups and subsequent granting or revoking of access privileges to resources.

Intrusion Detection
After users, resources, and authorized access privileges have been defined, a continuous process of ensuring only authorized access to resources takes place. The security systems should automatically enforce the security policies as defined by the user group associations and the access control lists for defined resources. Service level management should ensure that the definitions correctly allow access to and use of authorized services while refusing unauthorized access.

Another aspect of service management is monitoring the IT environment to detect unauthorized access or attempts to access resources illegally. This includes logging failed access attempts, particularly methodical repeated access attempts; notifying security, system, and service administrators of these attempts in real time; and exercising escalation procedures to increase the difficulty of obtaining unauthorized access.

Privacy Issues
An important consideration when implementing security systems is to ensure that, where appropriate, the identity of the users is kept private and is not available to unauthorized access. If registries and directory services are used, controlling access to these data stores is an important aspect of ensuring information privacy.

Note
Information privacy becomes more important for those applications that directly touch customers (for example, e-commerce applications) or where applications interface with business partners, such as with supply chain and e-business applications.

The issue of information privacy is a sensitive topic within the Internet community and one that has direct impact on the users' perception of service quality. Standards and regulation can be expected to continue to evolve in this area, and hence service level management must embrace managing information privacy.

Business Ownership of Security
The IT resource definitions, as well as the identity and information associated with users who access those resources, are all business assets, and as such it is important to identify the business owner of IT security. This security business manager must be responsible for defining security requirements, policies, privilege classes, access controls, escalation procedures, and monitoring roles and procedures.

The security aspects of service level management and reporting should be aimed at satisfying the requirements specified by this security business owner.

Accuracy
Service accuracy is a difficult concept to define and measure quantitatively, but the perception of the quality of the service offering will be influenced by a number of aspects relating to the accuracy of the data used for decisions and the accuracy of implementing IT procedures.

Data Integrity
Data integrity is the most significant aspect of ensuring the accuracy of the data used for making decisions. Hardware failures, logic errors, and program architecture issues, as well as operator and user error, can all impact the integrity of data. Ensuring data integrity requires checking the consistency of data and database structures, including views, stored procedures, indices, and so on. Additionally, defining and implementing appropriate data backup and recovery procedures will improve data integrity by enabling restoration of corrupted data. Recovery of data is addressed in more detail in the section "Recoverability" later in the chapter.

Data Currency
Another important aspect of data accuracy is the currency of data. This is particularly important when data is distributed across multiple data stores such as replicated databases, data warehouses, and data marts. In these cases, the latency, or delay in propagating data changes to the distributed data stores, affects the accuracy of the data. Longer propagation delays result in data that is not consistent across the enterprise, and different users will be working with various versions of the data.
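Privilege classes and access control lists combine naturally. Rights are granted to a class, and a user's effective rights are the union of the rights of every class the user belongs to. The sketch below is a hypothetical in-memory model using the access levels listed under Access Controls. The user names, class names, the `payroll_db` resource, and the `allowed` function are all invented for illustration. A production environment would keep this data in the registry or directory service and enforce it in the security systems themselves.

```python
from enum import Flag, auto


class Access(Flag):
    """Access levels named in the text; combinable as bit flags."""
    NONE = 0
    READ = auto()
    WRITE = auto()
    UPDATE = auto()
    CREATE = auto()
    DELETE = auto()


# Hypothetical privilege classes grouping users by job function.
user_classes = {
    "alice": {"payroll_clerk"},
    "bob": {"auditor"},
}

# Hypothetical access control list: resource -> privilege class -> granted access.
acl = {
    "payroll_db": {
        "payroll_clerk": Access.READ | Access.UPDATE,
        "auditor": Access.READ,
    },
}


def allowed(user: str, resource: str, wanted: Access) -> bool:
    """True if the user's privilege classes together grant every requested right."""
    granted = Access.NONE
    for cls in user_classes.get(user, set()):
        granted |= acl.get(resource, {}).get(cls, Access.NONE)
    return (granted & wanted) == wanted
```

Because rights attach to classes rather than individuals, revoking `UPDATE` from the `payroll_clerk` entry changes the rights of every clerk at once, which is the administrative benefit the text attributes to privilege classes. Unknown users fall through to `Access.NONE` and are refused.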
Web servers, e-business, and e-commerce exacerbate this problem because in many cases data is moved from operational databases to data stores outside the firewall, for example, to the external Web site or to the business partner via an extranet.

Caution
Applications using replicated data can result in customers and partners having different and inaccurate data available to them, depending on the frequency of data updates.

Job Control
The accuracy of provided services also depends on ensuring that all the required batch jobs are run with the correct sequencing and dependency rules and that critical deadlines are met. This aspect can rely on operator intervention, job scripts, or an automated job scheduler.

Scheduled Maintenance
As mentioned when discussing service availability, most IT environments require maintenance functions to be performed regularly during scheduled downtimes. Service availability is directly affected by the IT department's ability to remain within the scheduled periods. Service quality also depends on the IT department ensuring that all appropriate maintenance (including backups, bulk data moves and loads, database reorganizations, and database schema changes, as well as upgrades to applications, supporting software, and hardware) is completed correctly during the planned downtime. Hence, service management should include precise definition and implementation of scheduled maintenance requirements, frequency, and procedures.

Recoverability
Recovering from unplanned outage conditions as rapidly as possible is necessary to improve the availability of services provided by IT. The ultimate goal of a recoverability strategy is to provide business continuity, or to come as close to this ideal as possible. Hence, the IT organization must be able to recover from multiple types of outages in a minimal time and with minimal disruption to the other services provided by IT.

Types of Outages
Outages can be caused by physical failures, logic errors, or a natural disaster. Depending on the nature of the outage, a single service might be disrupted or multiple services might be affected. For example, a physical failure will affect those services using that device or component, whereas a logic error will impact a single service. In either case, the failure might have a cascading effect on other services that use the data or other output from a service initially impacted by the failure. A full disaster will affect all services in that location and all other services that depend on any physical devices in that location.

Note
The vast majority of outages today are because of logic errors rather than either hardware failure or disasters.

Understanding the impact of each outage type and planning for the correct recovery procedure requires knowledge of the relationship between each service and the underlying resources, as well as the interrelationships between services, particularly where there is data sharing. The registry or directory of services and associated supporting resources can be invaluable in understanding the effect of an individual resource failure. Similarly, knowledge of the business process, application integration, data model, and the association between data objects and access by application services will allow the impact of one service outage on other services to be assessed.

Levels of Recovery
Recovering from an outage will take place in multiple stages. In the event of a physical failure, the device is repaired or replaced. Then the data must be restored from a backup copy; the application restarted; and as much lost work as possible, from the time of the last backup to the time of failure, re-created. Then business processing can be resumed. Each of these processes can be automated to some degree, and the use of additional automation reduces the time required for recovery and can also reduce the possibility of error in the recovery process.

Recovering to a Specified Point in Time
In the event of a program logic error, operator error, or user error, the goal is to recover to the point in time immediately prior to the error that caused the outage. This reduces the time required to resume normal business operations. During normal operation, transactions are captured and logged by applications, middleware, and the database. Following an outage, these logs can be analyzed and used to re-create transactions after the data is recovered from the backup, which provides a snapshot of the data as it existed when the backup copy was taken. Those transactions that were completed between the time the backup copy was made and the point in time immediately prior to the error are re-created. Automated solutions are available that will analyze the logs and generate a script that is replayed to re-create the transactions.
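The point-in-time procedure can be sketched in a few lines. This is a simplified, hypothetical model: the `LogEntry` record, the key/value state, and the `recover_to_point` function are invented for illustration. Real database logs record far richer operations, and real tools generate replay scripts rather than patching a dictionary. The sketch shows the essential rule from the text: start from the backup snapshot, then reapply only the logged changes made after the backup and before the error.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LogEntry:
    timestamp: float  # seconds on an assumed common clock
    key: str
    value: int


def recover_to_point(snapshot: dict[str, int], snapshot_time: float,
                     log: list[LogEntry], error_time: float) -> dict[str, int]:
    """Restore the backup snapshot, then replay logged changes made after
    the backup was taken and strictly before the error occurred."""
    state = dict(snapshot)  # work on a copy; never modify the backup itself
    for entry in sorted(log, key=lambda e: e.timestamp):
        if snapshot_time < entry.timestamp < error_time:
            state[entry.key] = entry.value
    return state


if __name__ == "__main__":
    backup = {"acct1": 500, "acct2": 300}  # snapshot taken at t=100
    log = [
        LogEntry(90, "acct1", 450),     # already reflected in the backup
        LogEntry(110, "acct1", 400),    # lost work: replayed
        LogEntry(120, "acct2", 350),    # lost work: replayed
        LogEntry(130, "acct2", -9999),  # the erroneous update: skipped
    ]
    print(recover_to_point(backup, 100, log, error_time=130))
```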
Time to Recover
The time taken for the recovery includes the time required to cease processing, restore a stable environment, recover corrupted data, and re-create lost transactions. The recovery time directly impacts service availability, whereas the ability to recover all data and completed transactions has a direct effect on the accuracy and integrity of the data and the perceived quality of the service.

The time taken to restore a stable environment depends on the extent of any physical damage and the availability of additional or substitute hardware resources. The additional recovery time depends on the amount of data to be recovered, the time required to locate and mount the backup media, the speed of data transfer from the backup media, the time required to re-create transactions, and the time required to initialize and restart applications and background tasks.

In a disaster situation, or if multiple services are affected by an outage, the time taken to recover an individual service will depend on the procedures used and the priority given to recovering that particular service.

Tip
Recovering the most critical business services and those with the most stringent service level requirements first helps to increase the satisfaction of users and lines of business.

Affordability
A distinct balance exists between the service levels provided by the IT department and the associated costs of delivering the service. Typically, the higher the availability and performance required, the more costly it is to provide the service. To better understand this relationship, and to ensure that lines of business use fully loaded costs when assessing their profit and loss, many organizations charge IT costs directly to the users of IT services.

When allocating costs, a mechanism for calculating IT costs, together with a methodology for allocating those costs to the various users and lines of business, should be negotiated and agreed to by IT and the user community.

Quantifying Cost
The costs associated with running the IT environment include hardware costs (capital depreciation and expenses), software costs, maintenance costs, personnel costs, telecommunications costs, consultant and professional service costs, and environmental costs. In most cases, the costs associated with operating the IT environment are the largest costs incurred, outweighing the expenses and capital depreciation of hardware and software. This means that effectively managing the environment is very important to keeping these costs under control. Effective service level management can help contain operating costs.

The total IT costs can be calculated relatively easily; however, allocating costs to individual services is complex and can be subject to dispute.

Note
An important decision in the operation of the IT department will be whether to allocate costs at all. If not, IT costs can be considered part of general administration costs.

Tip
Assigning IT costs to lines of business allows IT to be seen as a business partner and service supplier to the business, rather than as a cost center.

This shift in positioning of the IT department as a business partner is subtle but important if IT is willing to take on a more strategic role in helping ensure business success. The goal of IT becomes a combination of improving business efficiency as well as business effectiveness. A further refinement of this is that IT will be measured on return on investment (ROI), rather than simply by total cost of ownership (TCO).

This is a significant step forward in recognizing the business importance of the IT department. As more corporations implement e-business initiatives, the IT department becomes a critical revenue generator for the company, and the strategic role of IT becomes one of helping the company make money.

When assigning costs to lines of business, considerations include what costs to use, what method of allocation to use, and how to demonstrate return on investment and show value for money.

What Costs to Count
Some costs are directly related to the use of IT resources such as CPU, memory, disk space, and application software. Greater utilization of these resources results in increased demand for hardware capacity and software licenses, which must be met either by additional purchases or at the price of reduced service. Environmental costs and IT operations staff can be considered constant overhead within certain limits. Typically these are stepped functions, in which additional staff or perhaps larger floor space become necessary as the IT environment grows past certain size thresholds.

Software development costs vary depending on the demand for custom applications and can normally be related to specific projects or business initiatives.
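The difference between usage-driven costs and stepped overhead can be shown numerically. The sketch below is a hypothetical illustration; the 40-servers-per-operator threshold, the 70,000 cost per step, and the helper names are assumptions rather than figures from the text. It contrasts a cost that grows in direct proportion to consumption with one that stays flat until a size threshold is crossed and then jumps.

```python
import math


def stepped_cost(units: int, units_per_step: int, cost_per_step: float) -> float:
    """Overhead that stays flat until a size threshold is crossed, then jumps.

    Hypothetical example: one more operator (cost_per_step) is needed for
    every units_per_step servers in the environment.
    """
    if units <= 0:
        return 0.0
    return math.ceil(units / units_per_step) * cost_per_step


def usage_cost(cpu_hours: float, rate_per_cpu_hour: float) -> float:
    """Resource-driven cost that grows in direct proportion to consumption."""
    return cpu_hours * rate_per_cpu_hour


if __name__ == "__main__":
    # Assumed figures: 40 servers per operator, 70,000 per operator per year.
    for servers in (10, 40, 41, 80, 81):
        print(servers, stepped_cost(servers, 40, 70_000.0))
```

Growing from 10 to 40 servers adds no staff cost, but the 41st server doubles it, which is why such overhead is easier to allocate by agreement than by measured usage.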
Software license costs for application software usually increase with the number of users. Assigning application license costs and custom development costs to lines of business is generally straightforward.

Other costs will vary with the size and complexity of the IT environment, including additional capacity for contingencies, backups, and hot standby systems, along with utility software and network and system management solutions. These are difficult to assign to individual users or lines of business but are necessary for smooth operation.

Tip
These costs lend themselves more to an agreed-upon allocation as overhead, rather than trying to relate costs to usage by the lines of business.

Assigning Costs to Lines of Business
The requirement to assign IT costs to lines of business varies according to each company's accounting practices and desire for profit-and-loss reporting by line of business. In many companies, IT is seen as a cost center allocated to general administration overhead. However, there is a trend toward viewing IT as a competitive advantage that can increase revenues and enhance market position. In this case, allocating IT costs to lines of business as a cost of sales is very appropriate.

Note
As e-commerce continues to gain momentum, IT costs might, in fact, become the primary cost of sales for a growing number of companies or specific lines of business.

A variety of mechanisms are in use to allocate IT costs to lines of business. In many, usage statistics are gathered, including CPU utilization, disk space consumed, output generated, and so on, and a formula is used to calculate a usage cost based on these measurements. Additional costs such as telecommunication, environmental, and labor costs are factored into the formula, such that total IT costs are covered by the sum of the costs allocated to each line of business. This method is popular in traditional mainframe environments, where most costs are centered on the centralized processing environment. The relationship between transactions processed and business value is relatively easy to establish there because much of the processing is batch oriented, and the interactive processing is very transaction oriented.

The acceptance and growth of distributed computing environments has made collecting usage costs by individual user or group much more complex, and it can be very difficult for lines of business to relate IT resource consumption to the way they conduct their business. Multi-tiered applications mean that more resources, across a wider range of desktops, Web servers, application servers, database servers, and corporate mainframes, are used to complete single transactions. There is also much greater network complexity, resulting in additional equipment, local area network, and telecommunication service costs. These are difficult to allocate based on actual usage.

More simplistic methods have gathered support in distributed environments, including allocating costs based on service subscriptions or calculating costs based on the volume of business transactions. These are easier to calculate and have analogies that most management can relate to, making them easier to understand and sell to the lines of business.

Service subscription cost allocation is based on the cable television model. Lines of business pay for services that are accessed by their personnel, and the cost per user does not vary by the intensity of usage. This is a very simple model and, provided that the cost per user per service is set appropriately, it is easy to understand and easy to calculate. The price for each service subscription should relate to the cost of providing the service and preferably will also reflect the perceived value of the service.

Caution
The IT department needs to ensure that the sum of all subscriptions sold equals the total costs to be allocated.

Allocating cost by business transaction volumes can be very attractive if business transactions are easily measured. This is an easy allocation mechanism for the lines of business to understand. The IT department and each line of business will have to agree on a suitable cost per transaction. There will be significant work on the part of the IT department to calculate appropriate transaction costs for each transaction type based on the resources consumed to perform the transaction, the length of the transaction, and the perceived business value of the transaction.

Caution
Again, the IT department must ensure that the total charges across all lines of business are adequate to cover all IT-related costs.

Relating Value to Cost
The ease of relating IT costs to business value varies depending on the mechanism used to calculate and allocate costs. Typically, most lines of business will care little about the actual IT resources consumed, but will understand that they use certain applications and services in order to conduct business and complete business transactions.
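The usage-based formula described for mainframe environments can be sketched as a weighted, proportional split. This is a hypothetical illustration because the text gives no specific formula. The metric names, the weights, and the choice to fold overhead in by apportioning the *total* cost in proportion to weighted usage are all assumptions. The sketch does preserve the property the text requires: the allocated charges add up to the total IT cost.

```python
def allocate_by_usage(total_cost: float,
                      usage: dict[str, dict[str, float]],
                      weights: dict[str, float]) -> dict[str, float]:
    """Split total_cost across lines of business in proportion to weighted usage.

    usage:   line of business -> metric name -> measured quantity
    weights: metric name -> relative importance of that metric (assumed)
    Overheads are folded in by allocating the total cost, so the
    charges always sum to total_cost.
    """
    scores = {
        lob: sum(weights[metric] * qty for metric, qty in metrics.items())
        for lob, metrics in usage.items()
    }
    total_score = sum(scores.values())
    if total_score == 0:
        raise ValueError("no measured usage to allocate against")
    return {lob: total_cost * score / total_score for lob, score in scores.items()}


if __name__ == "__main__":
    measured = {
        "sales": {"cpu_hours": 300, "disk_gb": 200},
        "finance": {"cpu_hours": 500, "disk_gb": 200},
    }
    print(allocate_by_usage(100_000.0, measured, {"cpu_hours": 1.0, "disk_gb": 0.5}))
```

With these assumed figures, sales scores 400 and finance 600, so the 100,000 total splits 40,000 to 60,000. The difficulty the text describes is not the arithmetic but getting lines of business to accept CPU hours and disk space as a fair proxy for how they do business.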
Allocating costs based on resource consumption will represent IT as a cost center because it will be difficult for the lines of business to directly associate the cost allocation with business volumes. In this case, the IT services will be viewed more as a commodity, and the lines of business will seek to lower costs either by reducing the budget for the IT department or by looking for an alternative low-cost provider.

Using service subscriptions can be easy for the user community to understand because there is an analogy to the cable television industry. There is no direct correlation between cost, usage intensity, and business volumes in this model, and this might cause some confusion when trying to relate business value.

Using business transaction volumes provides a more direct link between cost and business value, and the analogy is bank fees on banking transactions. The difficulty here might be in negotiating a suitable fee per business transaction and differentiating value between the various types of transactions, while still having a simple model that is easy to calculate. This also places the responsibility on the IT department to understand business volumes by transaction type well enough to ensure that total IT costs are recovered.
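The transaction-fee model can be sketched as pricing each transaction type so that projected volumes recover the total cost. This is a hypothetical illustration. The transaction types, the projected volumes, and the relative-value ratios (a trade "worth" four inquiries) are assumptions that stand in for the negotiated figures the text describes.

```python
def fees_per_transaction(total_cost: float, volumes: dict[str, int],
                         relative_value: dict[str, float]) -> dict[str, float]:
    """Price each transaction type so projected charges recover total_cost.

    relative_value expresses the agreed relative worth of each type;
    these ratios and the projected volumes are assumptions to be
    negotiated with the lines of business.
    """
    weighted_volume = sum(volumes[t] * relative_value[t] for t in volumes)
    if weighted_volume == 0:
        raise ValueError("no projected transaction volume to charge against")
    base_fee = total_cost / weighted_volume
    return {t: base_fee * relative_value[t] for t in volumes}


if __name__ == "__main__":
    projected = {"trade": 10_000, "inquiry": 50_000}
    print(fees_per_transaction(90_000.0, projected, {"trade": 4.0, "inquiry": 1.0}))
```

The fees recover the full cost only if actual volumes match the projection. If volumes fall short, IT under-recovers, which is why the text places the burden on the IT department to understand business volumes by transaction type.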

Summary
As outlined in this chapter, there are many aspects to service level management. The most important concept is to ensure that the definition of the services to be managed relates to the perception of the lines of business and the IT users. The quality of the services delivered to these users will be judged according to the users' ability to use the services safely, effectively, and cost-efficiently when required to perform their jobs.

Service Level Reporting
Service level reporting is an important communication vehicle between the IT department, the user community, and the lines of business. It should be viewed as a means for demonstrating the value of IT services and as a way to promote the quality of the services provided by the IT department. Providing the reports in a format that aligns with the goals of the lines of business, and that is easily understood by business managers as well as corporate executives, demonstrates the IT department's understanding of and support for key business initiatives.

Effective reporting provides a way to proactively address service difficulties and reduce the negative effect on the reputation of the IT department as a result of a service outage or degradation.

Tip
Proactive reporting of service difficulties can also reduce the load on help desk personnel by
decreasing the number of problem reports initiated by users of the affected IT services.
Audience
When determining how best to report on the quality of services provided by the IT department, various audience types should be identified and categorized along with their interest areas and characteristics. Each audience category requires different information that varies in focus, granularity, and frequency. Many common elements can provide the underlying information used in all reports; however, the perspective and presentation format will differ by audience.

Executive Management
Executive management wants to know that the IT department is providing value to the business overall and contributing to business success. As information technology is increasingly viewed as a competitive advantage, senior management becomes more attuned to the impact (positive and negative) of the service quality delivered by the IT department. This includes understanding how enhancing the quality of IT services improves business competitiveness and efficiency. Similarly, management understands that outages and degraded service cost the business both real dollars and lost opportunities. As IT services are provided directly to customers, such as with e-commerce and e-business initiatives, the visibility of service difficulties increases and extends to the press, the financial community, and investors who assess the impact of service problems on business viability and performance.

degradation in terms directly related to the business is equally important; hence, opportunity costs and lost productivity should be determined and reported.

Note
Establishing the relationship between service quality and the ability to optimize business transactions is important.

Increases in business transaction volumes might be related to improved service levels, and business expenses might be reduced by improved staff productivity because of better service performance. These types of relationships can be shown with many different types of applications, such as automated manufacturing operations, where the bottleneck might lie with computerized control, or customer-facing operations such as reservation systems, where computer delays lead to increases in staffing levels and reductions in customer satisfaction. Therefore, it is important that this relationship be explained to the lines of business and utilized in service level reports. The goal is to quantify the business benefits associated with the reported service quality.

Relating customer satisfaction directly to the quality of IT services might be more difficult, and conducting primary research such as customer surveys, or providing a feedback mechanism as part of the service transactions, might be required. After it is established, the relationship between IT service quality and customer satisfaction can be an important tool for establishing the value of IT and for justifying additional IT resources. If there is an established, credible mechanism for regularly assessing customer satisfaction, tracking the results of service satisfaction against delivered service levels directly shows the relationship and trends.

Reports aimed at the executive management team must be highly summarized and outline the quality of service experienced by the company's personnel, customers,
and business partners. The report should directly relate the delivery of superior ser-
vice to associated productivity improvements. Conversely, service outages or degra-
dation should be related to real costs as well as lost opportunity costs in both Internal to IT
revenue and staff productivity. IT must be service oriented in order to provide better support for the business.
To foster this orientation, the same service level reports provided to the lines of
Note business should be available to and reviewed by all levels of IT management. Many
Although reports that include the business impact of service difficulties might be painful and IT departments are organizing first-level support along service lines rather than
embarrassing, they build credibility and might be very helpful when asking for management's sup- technology layers. This provides a focus for service level reviews as well as natural
port to fix the problems. interface points for user communities. These service-oriented teams also act as the
user advocates within the IT department.
Additional reports showing all underlying technology outages and performance
Lines of Business degradation should be produced. Where possible, these reports should be correlated
with overall service quality using time as the common variable. These allow IT
The lines of business are interested in knowing how the quality of services pro- management and technology-focused second-level support to relate the impact of
vided by IT help them to drive more business. This means the reports should relate technology and component failures and degradation to the quality of service levels
service levels to business transaction volumes, personnel productivity, and, where delivered to the lines of business. Overall service delivery performance should be
possible, customer satisfaction. Reporting the impact of service outages or service graded against service level objectives. This ensures all IT personnel know how
well the department is performing overall, and how their particular role and acr vice levels achieved with ally business impact is an important aspect of the
the technology they support affects the achievement of these objectives. executive summary.
The executive summary should be self-contained, particularly for end-of-period
Outside Customers uel,orts aimed at senior management and lines of business. If service difficulties
hive been experienced, they should be highlighted with references to any
Summarized reports should be available to the customers of IT services who are su pporting documentation or detailed reports.
outside the corporation. These should provide information on the quality of the
services delivered to them, and should also outline the steps taken to improve ser-
vice quality, particularly if customer expectations have not been met. Service Availability Reporting
Service availability should be shown mapped against objectives. This includes
Tip distinguishing between normal operating hours, off-hour shifts, and downtime as
Regular customer satisfaction surveys should be conducted to relate the satisfaction of external IT a result of scheduled maintenance. Availability should be shown by service or
users to the service levels delivered to them. application rather than by components, and should represent the experience of
users by organization, by location, and by line of business. Roll-up summaries
A powerful business driver can be established if service levels can be related to cus- showing percentage availability by service and by line of business will be useful
tomer satisfaction and if there is a relationship between customer satisfaction, cus- when communicating with the lines of business and senior management.
tomer loyalty, and buying behaviors. If these relationships can be demonstrated, the 'Hie only audience who should be interested in the availability of individual com-
IT department is in a powerful position to show the true value of the services and ponents will be the IT department. These reports should be used to evaluate the
service quality it provides.
reliability of technology components and the impact of technology problems on
overall service quality. These technology-focused reports can be very useful when
Tip reviewing the performance of technology vendors and might also be useful for
One aspect of service level reporting that can dramatically improve customer perception and satis- senior management when justifying the acquisition of additional IT resources.
faction is the proactive notification provided by real-time reporting and alerts as outlined in the
next section.

Performance Reporting
Performance must relate directly to the end-user experience and should be broken
Types of Reports out by online transaction responsiveness as well as batch job turnaround. To com-
municate effectively with lines of business and senior management, responsiveness
Several different report types are required to provide sufficient detail on all the should be shown by application, user group, location, and line of business.
aspects of service quality and to satisfy the interests and focus of the different audi-
ence types. The format and content of each report also varies with the frequency
Tip
with which it is produced. Reporting frequency is discussed in the next section.
This section outlines the components of a service level report; however, not all It might be beneficial to group transactions based on their characteristics, as well as report respon-
siveness of those transaction types individually as well as the aggregate performance.
reports incorporate all components and not all audiences are interested in receiv-
ing all components.
Characteristics that could be used include degree of difficulty, importance to the
business, and value based on improved user productivity. This provides a more
Executive Summary granular view and helps to associate business value with transaction responsiveness.
This report provides an overall assessment of achieved service levels including To give the IT department a better understanding of the impact of the various
quantitative and qualitative reports against agreed service level objectives. It should technology layers on overall responsiveness, response times and propagation delays
provide quick summaries of the quality of the services delivered and, preferably, by each technology layer (network, operating system, middleware, and database)
make effective use of graphs and charts to impart this inti rnration. Relating the
TT 4t

should be calculated if possible and reported. These should then be graphed against I hose reports should be timed when reviewing security procedures with lines of
overall end-user response times to show any correlation using time as the common business, and when discussing ;md recommending additional security measures.
variable. This enables IT management, capacity planners, and performance analysts
to focus on the most critical performance issues.
Recoveries
All outages should have an additional report outlining recovery time, technique
Workload Volumes used to recover, and procedures implemented to prevent or reduce the impact of
The lines of business and senior management want to see workload volumes subsequent occurrences.This report is very useful to ensure that IT operations
expressed in terms of business transaction rates.This provides a common basis for Income more proactive and move away from continuously operating in a reactive
discussion of workload levels between the IT department and the lines of business. niode.The preventive measures might require additional IT resources (human or
equipment) and might involve implementing new procedures within the IT
Note department or the lines of business.
Business transaction volume supports a better understanding of the value of the services provided by
IT, and provides a foundation for demonstrating the positive (or negative) impact of service levels on Tip
business productivity, revenue, and efficiency. Outlining the real costs, as well as the lost opportunity costs of the downtime caused by each out-
age together with the incremental expense of preventing future occurrences, allows an informed

Workload reports external to the IT department should show business transaction business decision to be taken.

volumes by user group, by location, and by line of business.


To increase understanding of workload characteristics within the IT department,
business transactions should be correlated with transaction rates and utilization Cost Allocation
levels for each of the various technology layers. The most interesting measures are II' costs are allocated to the lines of business, either to allow them to be charged
traffic volumes and utilization of the network, CPU utilization and transaction hack or to provide a pseudo profit and loss statement, an appropriate report will
rates on the servers, I/O rates on the databases and storage subsystem, and transac- he required. This report should outline the methodology used to calculate IT costs,
tion and message rates across the middleware environment. Understanding the total costs calculated by this method, the mechanism used to allocate costs to indi-
relationship between these measures and the business transaction rates is extremely vidual lines of business, and the calculated cost for each line of business.
useful when predicting future performance under various business scenarios and
supports more accurate business decision based on all costs including required IT
Caution
resource capacity. Internal IT workload reports should show business transaction
Using a cost allocation model can lead to unpleasant discussions about the amount of costs
volumes as well as technology utilization levels and transaction rates.
involved, unless this report is also accompanied by the associated service level reports showing the
value of the IT services in business terms.

Security Intrusion
Many corporations choose not to allocate IT costs, and treat the IT department as
An important aspect of service quality is maintaining the confidentiality, privacy, and
a general administration expense. This overlooks the value of IT as a competitive
integrity of business data. Thus, reports should be provided on security intrusion
advantage for the business and overstates the profitability of individual lines of
attempts, security violations, and compromised or damaged data. Additionally, reports
business. In this case, total IT costs are allocated to the lines of business using the
on virus infections and their associated impacts are useful to understanding how
formula for allocating general administration costs.This is a simple way of allocat-
viruses are spread and for making decisions on preventive measures. In all cases of
data damage or security violations, the report should also include a summary of ing IT costs, but doesn't necessarily provide a true representation of how IT
resources are used in reality or the relative utilization and value of IT services to
techniques used to detect the intrusion, the recovery procedures used to restore data
integrity, and the processes and mechanisms used to prevent reoccurrence. each line of business.
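The difference between a general administration formula and a usage-based allocation can be made concrete with a small calculation. The sketch below is illustrative only: it assumes headcount as the general administration driver and business transaction volume as the usage driver, and the line-of-business names and figures are hypothetical rather than drawn from the text.

```python
from __future__ import annotations


def allocate(total_cost: float, drivers: dict[str, float]) -> dict[str, float]:
    """Split total_cost across lines of business in proportion to a driver value."""
    driver_total = sum(drivers.values())
    if driver_total <= 0:
        raise ValueError("driver values must sum to a positive number")
    return {lob: total_cost * value / driver_total for lob, value in drivers.items()}


# Hypothetical figures, for illustration only.
TOTAL_IT_COST = 1_200_000.0
headcount = {"Sales": 300, "Manufacturing": 500, "Finance": 200}  # general admin driver
transactions = {"Sales": 600_000, "Manufacturing": 300_000, "Finance": 100_000}  # usage driver

by_admin_formula = allocate(TOTAL_IT_COST, headcount)
by_usage = allocate(TOTAL_IT_COST, transactions)

for lob in headcount:
    print(f"{lob:<14} admin formula: {by_admin_formula[lob]:>12,.0f}"
          f"   usage-based: {by_usage[lob]:>12,.0f}")
```

With these made-up figures, the headcount formula charges Sales 360,000 while the usage-based split charges it 720,000. That is the kind of distortion described above. Any measurable driver (CPU time, storage, help desk calls) could be passed to the same function.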
However the allocated costs are calculated, the IT department has to decide whether to produce a single report showing allocated costs for all lines of business, or to produce individual reports for each line of business showing only the costs associated with that organization. This decision depends on the culture of the organization and any associated internal political ramifications.

Report Card Summary
A number of organizations have designed reports that use a school report card format, which provides summarized reports that are easily understood by all audiences, rather than highly technical, complex reports. Whether using an alphabetic system or a numeric scale, multiple aspects of service are graded and reported. Determining what aspects of service to grade is typically done in conjunction with the user community. Additional underlying information is also provided so that more detail is available if a service level grade is unacceptable.

Figure 3.1 shows a sample service level weekly report card that shows attributes for two services supported by the IT department. In this case, there was a minor outage for the Financials application and a more significant outage for the Help Desk system.

Service Level Report - Week of July 5-7, 1999

Measure                               Oracle Financials   Vantive Help Desk
Availability for Normal Operations    B+                  C
Proactive Problem Notification        A                   A
Outage Recovery Times                 A                   B
Responsiveness for Queries            A                   A
Responsiveness for Update             B                   B
Report Timeliness                     A                   A
Security of Data                      A                   A

Figure 3.1 A sample service level weekly report in report card format.

Frequency of Reporting
Reports are produced with varying frequencies, depending on the audience and level of detail. The more frequently produced reports contain a very detailed analysis, whereas the summary reports are produced less frequently, are aimed at a higher-level audience, and relate more to the business aspects of service delivery and the resulting business impact of service level quality.

Daily Reports
Reports produced daily are detailed and show the quality of service provided by the IT department during the previous day. All reports and quality grades of the various aspects of service should be provided segmented by application, user group, location, and line of business. The report should also show how the service quality varied by time of day for each of these segments.

Tip
Daily reports are very detailed and are useful for identifying any patterns or trends in workload volumes or service quality that require analysis or improvement.

Detailed, daily reports are primarily for consumption within the IT department and, thus, can be more technology focused. However, the relationship between technology performance and the service quality experienced by end-users should be clearly established.

Figure 3.2 shows a sample daily report for a help desk application. This report clearly shows that the response time experienced by the users in Paris was problematic.

[Chart: Vantive Help Desk Performance for Wednesday July 7, 1999. One series of VANTIVE_HELPDESK_COLLECTOR.Average_Response per location: Camberley, Austin, Copenhagen, Frankfurt, Houston, Houston 2, LOCALHOST, Madrid, Melbourne, Nieuwegein, San Jose, Paris, Singapore, Sunnyvale, and Waltham.]

Figure 3.2 A sample daily service level report showing response times by location for a help desk application service.
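A daily report such as Figure 3.2 essentially groups response-time measurements by location and by hour and compares each group with an objective. The following sketch assumes measurements arrive as (location, hour, seconds) tuples and uses a hypothetical 2-second objective; neither the data format nor the threshold comes from the text.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical measurements: (location, hour of day, response time in seconds).
samples = [
    ("Paris", 9, 4.8), ("Paris", 9, 5.2), ("Paris", 10, 6.1),
    ("Austin", 9, 1.1), ("Austin", 10, 1.3), ("Houston", 9, 0.9),
]
OBJECTIVE_SECONDS = 2.0  # assumed service level objective


def hourly_averages(rows):
    """Average response time for each (location, hour) segment."""
    buckets = defaultdict(list)
    for location, hour, seconds in rows:
        buckets[(location, hour)].append(seconds)
    return {key: mean(values) for key, values in buckets.items()}


def locations_missing_objective(averages, objective):
    """Locations with at least one hourly average above the objective."""
    return sorted({loc for (loc, _hour), avg in averages.items() if avg > objective})


averages = hourly_averages(samples)
print(locations_missing_objective(averages, OBJECTIVE_SECONDS))  # ['Paris']
```

Keeping the per-hour averages, rather than only a daily total, preserves the time-of-day variation that the daily report is meant to show.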

Weekly Summaries
Weekly summaries provide similar information to the daily reports, but are summarized relative to time. The service quality can be summarized by shift or half-shift for each day of the week rather than on an hourly basis. If additional detail is required to explain a pattern or trend, the drill-down detail from a particular day's report should be provided.

The weekly reports should start with a business focus of availability, performance, and workload volumes, provide technology-focused reports of these same measures, and highlight any correlation to show the impact of technology issues on overall service delivery. Additional aspects to be covered in the report should be a summary of security violations and attempted intrusions, as well as a detailed analysis of outages and recoveries.

Although the primary audience is the IT department, lines of business might also want to review the weekly reports, particularly if the service quality was perceived to be abnormal.

Monthly Overviews
The monthly overview is primarily a reporting mechanism for the lines of business and senior management. It should communicate the quality of services delivered by the IT department succinctly and should relate the quality of IT services to business value.

Tip
The report card format, combined with graphical explanation, enables clear understanding and quick interpretation.

All aspects of service quality should be covered, but allocated costs are not typically shown in monthly reports unless accounting and budget procedures call for monthly reports.

Additional reports internal to the IT department should relate the availability and performance of the various technology layers to the business view of service. This allows the correlation between technology problems and associated business impact to be clearly established so that resources can be focused on the most important issues.

Quarterly Business Summaries
The quarterly summary report showing overall service levels, associated business and productivity leverage, as well as outages and associated costs and lost opportunity costs, can be very useful for a quarterly line of business review conducted by the IT department.

Tip
The quarterly summary report, combined with a line of business customer satisfaction survey, is an excellent vehicle for continuing the communications between IT and its internal and external customers.

These business reports can also be useful for understanding future plans and IT requirements for each business unit and for renegotiating service level agreements as necessary. The quarterly summaries are also where costs would typically be allocated if a chargeback mechanism were implemented. In order to ensure no surprises for the lines of business, additional exception reports might be required more frequently if anticipated costs are exceeded.

Real-Time Reporting
Real-time reporting adds significant value to the users of IT services as an addition to historical service level reporting. Real-time reports increase the satisfaction of the IT user community and also reduce the workload of help desk personnel and the overhead on problem reporting systems. Proactive notification of known problems also increases the end-users' willingness to work more flexibly with the IT department to reduce the business impact of outages and service degradation. For example, work shifts might be rescheduled or back-up systems put in operation.

Outage Alerts

Tip
Providing a proactive mechanism that lets users know of problems identified by the IT department significantly increases the confidence of the user community in the IT department.

When communicating outage information, it is important to show which users are affected, together with impacted applications, locations, and lines of business. Other important information includes the nature of the problem and its symptoms, as well as the anticipated service resumption time.

This requires a reporting facility using Web or push technology. It significantly reduces the number of calls to the help desk and allows IT personnel to focus their resources and energy on fixing the problem, rather than on responding to user queries.
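Showing which users are affected by an outage amounts to matching the outage's scope against a directory of who uses which application, and where. The sketch below assumes a simple in-memory directory with hypothetical field names and entries; in practice the data would come from whatever user or asset inventory the IT department maintains, and the matching rule (an impacted application at an impacted location) is itself an assumption.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class User:
    """One entry in a hypothetical user directory."""
    name: str
    application: str
    location: str
    line_of_business: str


@dataclass
class OutageAlert:
    """The items the text says an outage alert should carry."""
    applications: set[str]
    locations: set[str]
    nature: str
    symptoms: str
    estimated_resumption: str

    def affects(self, user: User) -> bool:
        # Assumed rule: affected if the user relies on an impacted
        # application at an impacted location.
        return user.application in self.applications and user.location in self.locations


def affected_users(alert: OutageAlert, directory: list[User]) -> list[User]:
    """Users to notify, in directory order."""
    return [user for user in directory if alert.affects(user)]


def affected_lines_of_business(alert: OutageAlert, directory: list[User]) -> list[str]:
    """Distinct lines of business with at least one affected user."""
    return sorted({user.line_of_business for user in affected_users(alert, directory)})


# Hypothetical directory and alert.
directory = [
    User("ana", "Document Management", "Houston", "R&D"),
    User("bo", "Document Management", "Paris", "Sales"),
    User("cy", "Financials", "Houston", "Finance"),
    User("di", "Document Management", "Waltham", "R&D"),
]
alert = OutageAlert(
    applications={"Document Management"},
    locations={"Houston", "Waltham"},
    nature="Corrupted data in the document database",
    symptoms="Documents fail to open",
    estimated_resumption="18:00 CST",
)

print([u.name for u in affected_users(alert, directory)])  # ['ana', 'di']
print(affected_lines_of_business(alert, directory))        # ['R&D']
```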
Planned Downtime
Scheduled maintenance is nearly always disruptive to business operations. This is exacerbated by e-commerce, e-business, and other internal online initiatives that enable business personnel and customers to use IT services at any time of the day or night via Internet access, home PCs, and dial-up networks. When alerting users to planned downtime, information should be provided on the service, location, and user groups affected, as well as the reason for the downtime and any available alternative service offerings.

Tip
It might also improve user relations if the alert is published with sufficient notice to enable the user community to negotiate an alternative schedule for the downtime or to make other business arrangements.
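The information the text asks for in a planned-downtime alert maps naturally onto a small record, and the Tip's advice about sufficient notice can be checked mechanically. This is a hypothetical structure; the seven-day minimum notice, the user group, and the alternative service are assumed values, not ones given in the text.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import datetime, timedelta

# Assumed policy value; the text only asks for "sufficient notice".
MIN_NOTICE = timedelta(days=7)


@dataclass
class PlannedDowntime:
    """Information the text says a planned-downtime alert should provide."""
    service: str
    locations: list[str]
    user_groups: list[str]
    reason: str
    start: datetime
    end: datetime
    alternatives: list[str] = field(default_factory=list)

    def has_sufficient_notice(self, published: datetime) -> bool:
        """True if the alert gives users at least MIN_NOTICE before the downtime starts."""
        return self.start - published >= MIN_NOTICE

    def notice_text(self) -> str:
        """Render a one-paragraph notice for the affected user groups."""
        alternatives = ", ".join(self.alternatives) or "none available"
        return (
            f"Planned downtime: {self.service} at {', '.join(self.locations)} "
            f"for {', '.join(self.user_groups)}, "
            f"{self.start:%Y-%m-%d %H:%M} to {self.end:%Y-%m-%d %H:%M}. "
            f"Reason: {self.reason}. Alternatives: {alternatives}."
        )


window = PlannedDowntime(
    service="Oracle Financials",
    locations=["Houston"],
    user_groups=["Accounts Payable"],
    reason="database upgrade",
    start=datetime(1999, 7, 17, 22, 0),
    end=datetime(1999, 7, 18, 2, 0),
    alternatives=["read-only reporting server"],
)
print(window.has_sufficient_notice(datetime(1999, 7, 5, 9, 0)))   # True
print(window.has_sufficient_notice(datetime(1999, 7, 14, 9, 0)))  # False
print(window.notice_text())
```

A notice that fails the check could still be sent, but the shortfall is a signal to offer the user community a chance to renegotiate the schedule.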

Performance Degradation
Typically, users will experience performance degradation in the form of poor responsiveness before a problem is identified by the IT department, unless technology solutions are implemented to proactively measure the response times experienced by end-users. In either case, as soon as the performance degradation is identified, an alert should be sent to all impacted users notifying them of the condition. In many cases, isolating the cause of the performance degradation is complex, and it might be difficult to determine how long this will take. However, making users aware of the problem, and that the IT department is actively working to improve performance, reduces the number of problem reports and calls to the help desk.

Heavy Security Attacks
If the IT department detects security intrusion attempts or a spreading virus, it needs to alert users immediately in order to reduce the potential damage caused by these attacks. The alert should contain the nature of the attack, the immediate steps being taken by IT to contain the attack, and the precautions that users should take to limit their vulnerability. In some cases, the IT department must take certain applications and services offline, and an estimate of the downtime should also be provided. Following the resolution of the security violation or virus, the IT department will need to notify users of how they can recover any lost or corrupted data and the steps to be taken to avoid any repeat occurrences of the attack.

Figure 3.3 demonstrates a sample online reporting system in which one service is currently in warning status and one service is currently in alert status. Drill-down details for each service are available as shown in Figure 3.4.

[Screenshot: BMC Compass intranet page listing the status of each service and its sub-services for one location.]

Figure 3.3 A sample online alert system showing the service status for a specific location.

Figure 3.4 shows that the first warning is due to a problem with the document management system. This additional information shows the locations affected by the problem together with the organizations, as well as the estimated time for service resumption.

[Screenshot: BMC Compass alert detail with columns for status, sub-service, date/time down, estimated date/time up, locations impacted, areas/organizations impacted, and description; the locations shown include Houston, Sunnyvale, and Waltham.]

Figure 3.4 The details of the service disruption alert.
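Figures 3.3 and 3.4 suggest a two-level view: each service shows a single status for a location, and drilling down reveals the sub-service responsible. One simple way to produce such a roll-up, assumed here rather than taken from the product shown in the figures, is to report each service at the most severe status of its sub-services. The service names and states below are hypothetical.

```python
from enum import IntEnum


class Status(IntEnum):
    # Ordered by severity so that max() picks the most severe status.
    OK = 0
    WARNING = 1
    ALERT = 2


# Hypothetical sub-service states: (service, sub-service, location, status).
SUB_SERVICES = [
    ("Information Systems", "Document Management", "Houston", Status.WARNING),
    ("Information Systems", "Imaging Database", "Houston", Status.OK),
    ("Communications", "Voice Mail", "Houston", Status.ALERT),
    ("Communications", "Phone", "Houston", Status.OK),
    ("Financial & HR Applications", "Payroll", "Houston", Status.OK),
    ("Communications", "Voice Mail", "Paris", Status.OK),
]


def service_status(location):
    """Roll sub-service states up to one status per service for a location."""
    rollup = {}
    for service, _sub, loc, status in SUB_SERVICES:
        if loc == location:
            rollup[service] = max(rollup.get(service, Status.OK), status)
    return rollup


def drill_down(service, location):
    """Sub-services of a service that are not OK at a location (the detail view)."""
    return [(sub, status.name) for svc, sub, loc, status in SUB_SERVICES
            if svc == service and loc == location and status != Status.OK]


print({svc: st.name for svc, st in service_status("Houston").items()})
# {'Information Systems': 'WARNING', 'Communications': 'ALERT', 'Financial & HR Applications': 'OK'}
print(drill_down("Information Systems", "Houston"))
# [('Document Management', 'WARNING')]
```

Taking the worst sub-service status is a conservative choice; a site might instead weight sub-services by business importance before rolling them up.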
Service level reporting consists of a variety of report formats, each with different content and production frequency. Each of these reports has a different set of audiences, and the report format and content should be tailored to each specific audience. Effective reporting requires an understanding of the audiences, and service levels should be reported from the audience's perspective. Of particular importance is establishing and reporting the relationship between the delivered service levels and the business impact (both positive and negative). Additionally, real-time alerting and reporting mechanisms are very valuable to the service users, and can significantly enhance the reputation of the IT department as well as reduce the workload on the help desk staff and systems.

Summary
Reporting achieved service levels is an important aspect of communicating between the IT department and the lines of business it serves. The reports will vary by audience, frequency, and content detail; however, in all cases, the reports should discuss the quality of services provided in terms the audience understands. If possible, establishing a direct link between service quality and business impact adds credibility to the reports. Additionally, the use of online alerts and proactive notification of service degradation allows the IT department to show greater responsiveness to the IT user community. This improves user relationships while reducing the number of calls to help desk personnel.

CHAPTER 4
Service Level Agreements

Service Level Agreements (SLAs) are central to managing the quality of service
delivered by, or received by, an IT organization. More than anything else, SLAs are
what people think of when they discuss service level management. Obviously, as
discussed in Chapter 3, "Service Level Reporting," Service Level Reports are a key
component of service level management. However, without service level agree-
ments, efforts to manage service levels are little more than a collection of good
intentions.

The Need for SLAs


Why are Service Level Agreements so important to service level management? The answer is that SLAs set the standard to measure against. It is analogous to having a cooking thermometer that is not calibrated (there are no gradations on it). Without calibration, you do not know the range of the thermometer. Therefore, you can use it, but it provides little value. If you stick the thermometer into a turkey and then take it out, it shows that your prospective dinner is hot enough to register halfway up the thermometer. What does this tell you? Is the turkey nearly done, barely thawed, or so overcooked that it soon can be converted to jerky? From the data provided by the uncalibrated thermometer, you simply cannot judge. If you have used this thermometer enough times prior to this, you might be able to interpret its display. However, such interpretation requires considerable experience and is highly subjective.

Take the analogy of a cooking thermometer a bit further. Assume that you are going to cook some steaks for a group of friends. Now these are not just ordinary steaks. They are ostrich steaks. They are quite expensive, and you have never cooked them before. To make matters worse, your friends are quite particular about how their steaks are to be served. Each of your friends tells you how he or she would like their steak prepared. Two of them request rare. One requests medium and one requests medium rare, "but not too rare."

In far too many cases, IT managers are working with uncalibrated thermometers. That is, they are working with data that is difficult to relate to the service levels expected or provided. For example, detailed collections of data (such as packet collisions) might be very useful to network administrators and, indeed, might be related to the level of service being provided. However, that relationship is not immediately apparent.

Because you have never cooked ostrich before, you do not feel that you will be able to visually judge when the steaks are properly cooked. Fortunately, you have a cookbook that tells you the temperature to which an ostrich steak must be cooked for various degrees of completion. Although you have upgraded your thermometer and now have one that has degrees marked on it, you still have a problem. You do not know if the cookbook's standard for medium rare matches your friends' expectations. What should you do? You could proceed to cook the steaks using the standard specified in the cookbook and hope that your friends approve of your decision. You decide that would be too risky. You could tell your friends to cook their steaks themselves, thereby absolving yourself of responsibility. However, you want to impress your friends with your great culinary skills, and telling them to cook their own steaks could hardly be expected to impress them as you are hoping to do. As you ponder your dilemma, you notice that your cookbook contains a short description for each of the labels provided (rare, medium, and so on). You hit upon an idea: you read the descriptions to your friends. A discussion follows in which you learn that your friend who wanted her steak cooked medium rare, but not too rare, in reality would like her steak cooked to what your cookbook describes as medium-well done. The others agree with the cookbook's descriptions. Using your new thermometer, you cook the steaks to perfection and your friends hail you as a great chef.

So now you might be asking, "What do ostrich steaks have to do with Service Level Agreements?" The answer is that the cooking analogy actually represents a very simplistic example of a Service Level Agreement. You and your friends agreed on what would be an acceptable way of cooking their steaks. In essence, you negotiated an agreement about the service (cooking) that you were about to provide and what would be an acceptable level of service (how well done the steaks would be). In doing this, you have settled on a mutual understanding and set a target level of performance.

Functions of SLAs
Like defining how well steaks will be cooked, creating a Service Level Agreement yields some basic benefits. First, an SLA defines what levels of service are considered acceptable by users and are attainable by the service provider. This is particularly beneficial to the service provider. It guards against expectation creep.

There is a basic characteristic in human nature to always want more and better, regardless of the subject. If you receive a huge raise, it is likely that, even if the cost of living does not change, you will be hoping for (or even demanding) another raise a year later.

In the case of IT services, if the availability of a key application is increased dramatically (higher than ever requested before), clients will soon become used to that level of availability and begin to demand an even higher level of availability, and they will vilify IT if it is not provided. If the expectations are documented in an SLA, they become a reference point, an anchor, for client expectations. In other words, the SLA provides permanence for the agreements arrived at and documented in it. More specifically, a well-written Service Level Agreement will define not only the expectations (how good is good enough), but also a mutually acceptable and agreed upon set of indicators of the quality of service.

Principles of Expectation Creep
As expectations are met, expectations will rise. People are never satisfied.
People become upset when their expectations are violated.
In the absence of contradictory facts, expectations will be based on what is desirable, rather than what is possible.

Service providers and their clients are like beings from different planets when talking about service levels. They tend to speak very different languages. The result is that it is often quite difficult for them to understand each other. Ultimately, an SLA, through those service level indicators, will provide a common language for
communication between the two diverse communities. Documenting the mutual Instance, ensure I(. X ) uptiiiie to external customers by establishing firm in-house
understanding arrived at through the process of negotiating a Service Level SI As between IT and the various divisions of the organization. The cumulative
Agreement provides clarity. owult ot'strictly adhering to these agreements is an overall level of reliability that
an he used as a selling point to bank customers.
Note
There are six primary benefits that can be expected from Service Level Agreements. Those
benefits are 1xternal SLAs
The most rigorous type of agreement is the External SLA. Because it is usually a
Provides permanence
legally binding contract between two companies, it requires more care in crafting
Provides clarity it, Legal review of the External agreement is strongly advised. However, many
Serves as communications vehicle r n npanies overlook this step and, as a result, end up with an agreement that is
Guards against expectation creep ul . little value. Of course, another error, at least as serious, is to fail to have SLAs
with external service providers.The lack of SLAB has proven disastrous for many
Sets mutual standards for service
c on Upanies.
Defines how level of service will be measured

Caution
Have External SLAs reviewed by an attorney before signing.
Types of SLAs
Broadly, there are three types of SLAs.The one that is most common is an In- I ailing to have an SLA with an external service provider is unconscionable, and
yet countless companies do exactly that every year. The problem is not limited to
House SLA. An In-House SLA is one between a service provider and an in-house
small companies. Some of the world's largest companies have made this mistake.
client. An example of an In-House SLA would be the agreement between IT and
'I'he managers responsible for these contracts are guilty of gross negligence and
a user department. The second most common SLA is an External SLA. That is, an
SLA between a service provider and its client (another company).The third type perhaps a breech of their fiduciary responsibility to their employers. After a con-
of SLA is an Internal SLA. The Internal SLA is used by the service provider to tract for service without a service level guarantee has been signed, the client's
measure the performance of groups within the service provider's organization. An options are quite limited. To begin, they must hope that the services provided meet
example of the Internal SLA would be between the network services group in an their needs. If the services provided do not meet their requirements, for any rea-
IT organization and the overall organization, or perhaps the CIO. The Internal son, depending on the specific terms of their contract, they might be faced with
tough choices, including enduring, for the remainder of the contract, a level of ser-
SLA is typically tied to annual reviews of managers and provides a mechanism for
vice that is less than acceptable; terminating the contract prematurely, potentially
holding individuals and groups accountable for their portion of an overall service.
incurring large penalties for doing so; or attempting to renegotiate the contract.
The process for creating an SLA is fundamentally the same for each type of agree- (Of course, the service provider probably has little or no incentive to renegotiate
ment. Likewise, the contents that are found in each different type of agreement are the contract.) Renegotiating the contract might result in higher fees in order to
basically the same. The differences come largely in the formality that is attached to receive the desired level of service. The specific options available will depend on
the process of creating the agreement, the language that is used, and the conse- the terms of the contract with the service provider. Any company finding itself in
quences that will result if the service level commitments are not met. the unenviable position of receiving an unacceptable level of service and no con-
tractual guarantees of the level of service to be provided should seek legal counsel
to assist in assessing the available options.
In - House SLAs
When the service provider and client work for the same company, familiarity
Tip
should not be allowed to preclude establishing a detailed, legally binding contract. Always include a Service Level Agreement as part of any service contract with another company.
If the SLA is constructed in a considered, serious way, the results can benefit both
parties as well as the company itself. Most large banks and Financial institutions, for
58

Internal SLAs
The Internal SLA is a relatively simple matter. It typically is written in an informal manner. In fact, the Internal SLA might not exist as a separate agreement. Instead, its commitments and intent might be embodied in other documents, such as individual or departmental goals and objectives, or even in the criteria for the company's bonus plan. Frequently, the Internal SLA will specify service levels in very technical terms. The use of technical terminology, and even jargon, can be acceptable in this document because all the parties are familiar with the terms and understand them.

SLA Processes
Service level management is a process. While the SLA itself is a document, it is the product of process. Processes are required to create, maintain, and administer the Service Level Agreement.

Creation Process
The process of creating an SLA typically follows a series of predictable steps, which are summarized in the following section. Although this guide is applicable to most SLAs, keep in mind that every situation is different. In many instances, SLAs will need to be tweaked to accommodate special functions and measurements. It is crucial for IT managers to follow their instincts in tailoring each SLA to fit the requirements of particular constituents.

SLA creation begins with a serious commitment to negotiate an agreement. It would be easy to say the groups involved must make that commitment. However, in reality, commitments are not made by groups, but by individuals. In this case, the commitment needs to begin with senior management of the groups involved, that is, the management of the service provider organization and the user or client organization. Ideally, the commitment is made at the very highest levels in the respective organizations.

Assemble a Team
When there is executive commitment to creating an SLA, the next step is to assemble a team of people to actually negotiate the terms of the agreement. It is important that the members of the team be personally committed to the success of the process, that is, committed to creating a fair and reasonable Service Level Agreement.

In order to assemble a team to negotiate an agreement, it is necessary to determine the team size and membership. As with most questions in life, there is not a single, one-size-fits-all answer to this question. In part, the size of the team will be dictated by the culture of the company. However, some guidelines can be offered. The team needs to be large enough that each stakeholder group is represented on the team. (A stakeholder group is any organization that is either engaged in providing the service or is a user of the service.) In most cases, the agreement will have only two stakeholders, the service provider and the user. Although it is possible for the team to consist of just two individuals, that is not very common. The typical size of an SLA negotiating team in a medium to large company is 4-10 people. Every effort should be made to minimize the size of the team, although the realities of corporate politics might dictate otherwise.

Ideally, every team member should have something unique to contribute to the process, such as knowledge about how the service impacts the users, the limitations of the technology being used to deliver the service, and so on. The members need to be, in some respect, subject-matter experts on some aspect of the service delivery or consumption.

Note
In assembling a team to negotiate a service level agreement, there are four points to keep in mind. First, there should be equal representation from the service provider and their client. Second, the leaders of the team should be peers. Third, the members of the team should be stakeholders. That is, they should have a vested interest in the service being provided. Fourth, the team members need to be subject matter experts; for example, knowledgeable about the service and its business impacts.

There should be equal representation on the team from both the user group and the service provider. Too great a disparity in numbers will give an unfair psychological advantage to the larger team.

At a minimum, the leaders from each group need to be peers. It is difficult for effective negotiations to take place if there is a significant disparity in rank among team members. Some companies are more sensitive to this than others. If peer relationships are not considered, this can lead to one group being apt to dictate (possibly unintentionally) the terms of the SLA, rather than negotiating them through a process of exchange between peers. Another requirement for the team leaders is that they have sufficient authority to commit their organization to the SLA.

The negotiating team should have a charter, written by the leaders, that specifies its responsibilities, membership, leadership, structure, and functioning. In terms of functioning, the charter needs to include a schedule for the development of the SLA. It is advisable to make the schedule aggressive. A pitfall of some teams, especially large ones, is that some members almost make a career of the negotiation process. In most cases, depending on availability of the team members, it should be possible to negotiate an agreement in 6-8 weeks.

Tip
Negotiating team meetings should be brief and infrequent. Most of the work will be done outside the meetings.

Negotiate the SLA
Successfully negotiating an SLA (particularly Internal or In-House SLAs) requires that both parties approach the process seeking a win-win solution. That is, they should seek to craft a Service Level Agreement that is fair and reasonable to both parties. The result of the negotiation will be a contract. In the case of the In-House and Internal agreements, the contract might not be a legally binding agreement; however, the structure will be the same.

Negotiating an SLA is a process in which information is exchanged in order to seek a reasonable conclusion. The user group needs to be able to communicate their requirements clearly. They also need to be able to explain the business impact of various levels of service.

Similarly, IT (or another type of service provider) needs to be able to assess the potential impacts of delivering proposed levels of services. Those impacts might be financial (additions or upgrades in staff, computers, networks, and so on). Another possible impact of delivering a higher level of service to one group might be that IT would have to reduce the level of service provided to another group. There might also be technical limitations on IT's ability to provide the level of service.

Tip
Do your homework before negotiating. You should know the following:
■ Cost of delivering a given level of service
■ Benefits of the desired level of service
■ Service level metrics that are available

Before negotiations begin, it is important that benchmark data be collected. Ideally, both groups will be able to collect data. However, in most cases the service provider will have access to more data. The objective is to know as precisely as possible the level of service currently being provided. It is also important to know what metrics are available regarding the level of service being provided. It might be great to agree to provide an average end-to-end response time of 0.5 seconds. However, if it is not possible to measure end-to-end response time, the effort spent negotiating that item in the SLA has been wasted.

Document the Agreement
As noted previously, the SLA is a contract. When the negotiations have been completed, the next step is to document what was agreed upon. The basic components of a Service Level Agreement are as follows:
Parties to the agreement
Term
Scope
Limitations
Service level objectives
Service level indicators
Non-performance
Optional services
Exclusions
Reporting
Administration
Reviews
Revisions
Approvals

Parties to the Agreement
This will normally be the two groups that negotiated the agreement; that is, the service provider and the user group that is the consumer of the service.

Term
Typically, the term of the SLA will be two years. Creating an SLA is too much work to warrant an agreement term of much less than two years. On the other hand, technology and business conditions change too rapidly to be able to confidently expect the agreement to be valid beyond two years.

Scope
This section will define the services covered by the agreement. For example, an agreement might specify that it covers an online order entry system, the facilities where the users will be located, the volumes of transactions anticipated, and when the service will be available (days of the week and hours of the day). Note that this section does not specify the levels of services to be provided. In the preceding example, nothing is mentioned about the percent availability for the service.

Limitations
This section of the agreement can be thought of as the service provider's Caveat clause. This section basically qualifies the services defined in the Scope section of the agreement. The service provider is saying, "We will provide the services covered by this agreement as long as you don't exceed any of the limitations." Typical limitations are volume (for example, transactions per minute or per hour, number of concurrent users, and so on), topology (location of facilities to which the service is delivered, distribution of users, and so on), and adequate funding for the service provider. These types of limitations are quite reasonable.
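To make the list of components concrete, the following minimal sketch models a few of them (parties, term, scope, and limitations) as a Python data structure and checks observed workload against the agreed limitations. It is an illustration under stated assumptions, not a template from this book: the class names, field names, the "Order Processing" user group, and the sample limits (5,000 transactions per hour, 250 concurrent users) are all hypothetical, and the remaining components (objectives, indicators, non-performance, and so on) are omitted for brevity.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class Limitation:
    """One caveat on the service, such as a maximum transaction volume (hypothetical)."""
    name: str
    maximum: float
    unit: str


@dataclass
class ServiceLevelAgreement:
    """Hypothetical model of a subset of the SLA components listed above."""
    parties: tuple[str, str]  # (service provider, user group consuming the service)
    term_years: int           # typically two years
    scope: list[str]          # services covered; the scope states no service levels
    limitations: list[Limitation] = field(default_factory=list)

    def exceeded_limitations(self, observed: dict[str, float]) -> list[str]:
        """Names of limitations whose observed value is above the agreed maximum.

        A limitation with no observation is treated as not exceeded.
        """
        return [
            lim.name
            for lim in self.limitations
            if lim.name in observed and observed[lim.name] > lim.maximum
        ]


# Hypothetical agreement for the order entry example in the Scope section.
sla = ServiceLevelAgreement(
    parties=("IT", "Order Processing"),
    term_years=2,
    scope=["Online order entry system, Monday-Friday 07:00-19:00"],
    limitations=[
        Limitation("transactions_per_hour", 5000, "transactions/hour"),
        Limitation("concurrent_users", 250, "users"),
    ],
)

print(sla.exceeded_limitations({"transactions_per_hour": 6200, "concurrent_users": 180}))
# ['transactions_per_hour']
```

Keeping the limitations as data rather than prose is one way to make it easy to show, during a review, that a missed objective coincided with a workload beyond what the agreement covers.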

In order to enter into the Service Level Agreement, the service provider has to believe that they have adequate resources to meet the commitments of the agreement. Making this commitment without these limitations would be like someone agreeing to feed you and all the members of your household for a lump sum payment of $10,000. This might be a good deal for either party to this agreement. If your immediate family consists of just yourself and your very petite wife, the person agreeing to provide the food has struck a great bargain. However, one month into the term of your agreement, your five children (two of whom are training to become sumo wrestlers) move back into your house. Also, you decide to host a foreign exchange student. The student happens to be a 300-pound weight lifter. Suddenly the balance of the equation has shifted. Without limitations in the agreement to provide food for your current household, the other party to the agreement will start losing money after the second month of feeding your enlarged household. In business, equally dramatic changes can occur. Mergers and acquisitions can bring sudden increases in workload, as well as shifts in network traffic patterns. Closing or opening facilities will shift workloads and might require new links for your network. Consolidation of functions into fewer locations might change traffic patterns. Growth of the business is also a source of additional data to be handled.

Service Level Objectives
More than any other factor, the service level objectives are what most people think of when they refer to SLAs. The service level objectives are the agreed upon levels of service that are to be provided. These might include such things as response time, availability, and so on. For each aspect of the service covered by the agreement, there should be a target level defined. In fact, in some cases it can be desirable to define two levels for each factor. The first will be the minimum level of service that will be considered acceptable. The second will be a stretch objective. That is, the second number will reflect a higher level of service that is desirable, but not guaranteed. Clearly, the second category is optional, and if it is utilized in an SLA, it will normally have some type of incentive or reward associated with meeting it.

The most popular categories of service level objectives are availability, performance, and accuracy. Availability can be specified in terms of the days and hours that the service will be available or as a percentage of that time. It is generally best to specify the time period when the service is expected to be available and then define the minimum acceptable percentage of availability. Performance can include measurements of speed and/or volume. Volume (also referred to as throughput or workload) might be expressed in terms of transactions/hour, transactions/day, or gigabits of files transferred from one location to another. Speed includes the always-popular response time objective. However, speed is not limited to just response time. It could also include time required to transfer data, retrieve archived files, and so on. The objective for accuracy is basically centered on the question of whether the service is doing what it is supposed to do. For example, are email messages delivered to the intended recipient? Although availability, performance, and accuracy are the most popular categories for objectives, they are by no means the only objectives. Other categories include cost and security.

In any discussion of service level objectives, a question always raised is, "What is the right number of objectives?" Although there is not a specific number that is always the correct number to use, this is a case in which the principle of brevity has merit. Including more objectives does not automatically raise the quality of the SLA. In general terms, 5-10 service level objectives are usually sufficient to cover the most important aspects of the service. Including more objectives usually means that less important objectives are being introduced and drawing attention away from the more important ones. If there appears to be a large number of critical service level objectives that need to be included in the agreement, the SLA team should carefully consider the possibility that they are attempting to cover more than one service with the agreement. If that is the case, they should redefine their effort and write separate SLAs for each service.

Tip
Limit the number of service level objectives to 5-10 critical objectives.

Service level objectives cannot be any randomly chosen set of characteristics. They must meet certain criteria in order to qualify for inclusion in a service level agreement. First, a service level objective must be attainable. There are far too many cases in which, for a variety of reasons (none of which are valid), a service level objective is included in an SLA even though it cannot be met.

Consider the example of choosing user response time as one of the service level objectives to be included in an agreement. Assume that for this example it is possible to measure the response time. The parties agree upon an acceptable target for average response time. However, there is a problem. The site covered by this agreement is in a remote area of Indonesia. The server being accessed is located in Terre Haute, Indiana. The connection consists of a T1 link to an earth station, followed by two satellite links to reach an earth station in Jakarta, Indonesia. From the Indonesian earth station, there is a series of microwave links to the user location. In total, the propagation delay alone for this connection is greater than the total response time allowed in the Service Level Agreement. The result is that no matter how hard the IT organization tries, they will never be able to meet the response time commitment in this Service Level Agreement. The example in Figure 4.1 illustrates this connection.
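The arithmetic behind the Indonesia example can be sketched in a few lines. This is a rough, best-case estimate under stated assumptions: each of the two satellite links is taken to be a geostationary hop (earth station up to a satellite about 35,786 km above the equator and back down), signals are assumed to travel at the speed of light, and the T1 and microwave segments, queuing, and server processing are ignored. The function names and the 0.5-second target used below are illustrative only; the book does not state the target in this agreement.

```python
# Best-case propagation delay over geostationary satellite links (illustrative sketch).
SPEED_OF_LIGHT_KM_S = 299_792.458  # signal speed assumed equal to c in vacuum
GEO_ALTITUDE_KM = 35_786           # geostationary altitude above the equator


def satellite_hop_delay_s(slant_range_km: float = GEO_ALTITUDE_KM) -> float:
    """One-way delay of one link: earth station -> satellite -> earth station.

    Using the altitude as the slant range is the best case; real earth stations
    are not directly below the satellite, so real delays are somewhat longer.
    """
    return 2 * slant_range_km / SPEED_OF_LIGHT_KM_S


def min_round_trip_s(satellite_links: int) -> float:
    """Lower bound on request/response time with `satellite_links` links each way.

    Ignores terrestrial segments, queuing, and processing, which only add delay.
    """
    return 2 * satellite_links * satellite_hop_delay_s()


def response_target_is_attainable(target_s: float, satellite_links: int) -> bool:
    """Necessary (not sufficient) condition: the target must exceed the propagation floor."""
    return target_s > min_round_trip_s(satellite_links)


floor = min_round_trip_s(2)
print(f"propagation floor: {floor:.3f} s")     # propagation floor: 0.955 s
print(response_target_is_attainable(0.5, 2))  # False
```

Under these assumptions, roughly a second of delay per request/response is unavoidable, so any response time objective at or below that floor is unattainable on this connection regardless of what the IT organization does.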

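The availability objective described earlier in this section (a defined service window, a minimum acceptable percentage, and an optional stretch objective) can be expressed as a small check. This is a hypothetical sketch: the class name, the 264-hour monthly window (22 business days of 12 hours), and the 99.5% minimum and 99.9% stretch targets are illustrative assumptions, and downtime is assumed to be counted only inside the service window.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AvailabilityObjective:
    """Hypothetical availability objective: a service window plus percentage targets."""
    window_hours: float                  # hours in the reporting period the service must be up
    minimum_pct: float                   # minimum acceptable level (guaranteed)
    stretch_pct: Optional[float] = None  # stretch objective (desirable, not guaranteed)

    def allowed_downtime_hours(self, pct: float) -> float:
        """Downtime inside the window that still achieves `pct` availability."""
        return self.window_hours * (1 - pct / 100)

    def achieved_pct(self, downtime_hours: float) -> float:
        """Measured availability for the window, as a percentage."""
        return 100 * (1 - downtime_hours / self.window_hours)

    def evaluate(self, downtime_hours: float) -> str:
        """Classify a period's downtime against the stretch and minimum targets."""
        achieved = self.achieved_pct(downtime_hours)
        if self.stretch_pct is not None and achieved >= self.stretch_pct:
            return "stretch met"
        if achieved >= self.minimum_pct:
            return "minimum met"
        return "below minimum"


# 22 business days x 12 hours (07:00-19:00) = 264 hours in the service window.
objective = AvailabilityObjective(window_hours=264, minimum_pct=99.5, stretch_pct=99.9)
print(round(objective.allowed_downtime_hours(99.5), 2))  # 1.32
print(objective.evaluate(1.0))                           # minimum met
```

Stating the window explicitly matters: the same 99.5% permits 1.32 hours of downtime in this 264-hour window, but 3.6 hours over a full 720-hour month.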
[Map: the connection between Terre Haute, Indiana, and the user site in Indonesia]
Figure 4.1 Propagation delay makes service level objectives unattainable.

You might wonder why an IT organization would ever commit to a service level objective that it cannot meet. There are a variety of reasons for this. The IT representatives on the SLA team might have been poor negotiators. The user team might not have negotiated in good faith, approaching the process from a win-lose perspective. The IT negotiators might not have been peers of the more senior representatives on the user team. Another possibility is that the IT representatives failed to do their homework. Even if the team members, individually, lacked specific knowledge about the connection in question, they certainly should have been able to research it and make an informed response to the request for this level of service for user response time.

Note
In order for service level agreements to be successful, the criteria that they use to measure the level of service must be
Attainable
Meaningful
Measurable
Controllable
Understandable
Affordable
Mutually acceptable

A service level objective must be meaningful to all the parties to the agreement. Another way of stating this is to say that it must be relevant. An IT organization might consider an important metric to be CPU utilization for the servers used to deliver the service in question. However, from the users' perspective, the relevance of this to the service they receive is difficult to grasp.

Another requirement for a service level objective is closely related to the need to be meaningful. That is, the service level objective and its associated metrics must be understandable. Interviews of IT managers by Enterprise Management Associates have found that some of them are providing the users with statistics that are intended to reflect service levels. Unfortunately, those statistics tend to be ones that are easily captured, and mean little to anyone other than a network engineer or system administrator. Two of the more popular statistics reported were packet collisions and dropped packets. Although these might impact the level of service being delivered, they are not readily related to what the user is experiencing. In fact, these statistics meant little or nothing to nearly all the users receiving the reports. Thus, those statistics failed the tests for being understandable and for being meaningful.

The next requirement for a service level objective is that it must be measurable. Among users, a very popular service level objective is user (that is, end-to-end) response time. Certainly, this is one of the key factors shaping the users' opinions about the level of service they are receiving. Unfortunately, measuring user response time on an end-to-end basis is still a technical challenge today. At the other end of the feasibility spectrum is the availability of a service. This is relatively straightforward and can be measured with a minimum of effort and difficulty. If it is not possible (and affordable) to measure something to represent a service level objective, that objective is worthless and should not be included in an agreement.

A service level objective belongs in an SLA only if it represents something that is controllable. That is, the service provider must have the ability to exercise control over the factors that determine the level of service delivered. If unlimited resources are available, it is difficult to conceive of a common IT-provided service that is not controllable. However, faced with the limitations of the real world, uncontrollable conditions become much more plausible. Consider the IT manager in a Third World country. The manager's budget does not permit the purchase of a standby generator to prevent service interruptions during the frequent power failures. Strikes by union workers, poor service by a telco (with no reasonable alternative), and so on are just a few of the factors that can place certain service level objectives beyond the control of the service provider. When assessing whether an objective is controllable, consider providing exclusions (or waivers) for factors that are not controllable and that might impact the level of service provided.

As has been previously mentioned, no organization has unlimited resources. The amount that can be spent on delivering any service is limited. Therefore, in setting
service level objectives, it is also necessary to consider whether the desired level of potential trouble spots. Or the problem could he in one of the many network con-
service is affordable. (This might also be thought of as being cost effective.) The ions linking the client to the server: A router could be down in the network,
first way to look at this is by considering whether the desired level can be deliv- the communications server at the user site could be down, or the application could
ered within the existing budget of the service provider, without adversely impact- hr' running but not responding because it is waiting for some critical resource. Any
ing any other services. If it can, there is no question that it is. affordable. If it of these examples would prevent the users from being able to access the applica-
cannot, the question becomes more difficult to answer. It is necessary to consider tion. From the user's perspective, however, the truth is that the application is
the business value of the desired level of service compared with the current level. unavailable.
In one case, the client of an IT organization was adamant that for the service in
question (order entry system), they absolutely had to have 99.999999% availability.
Tip
The current availability for that system was 99.999%. Instead of digging in their
Remember that the user's perspective is the one that counts.
heels and insisting that the higher availability was impossible, the IT organization mcamm

did their homework. They researched what changes would be required in order to
l'herefore, it can be seen that careful thought must go into defining what indica-
deliver the requested availability and the cost of those changes. They returned to
tors will be used to provide metrics to represent each service objective. In some
the user organization and explained that they would be happy to provide the
ases, the service level indicators will be the same as the objective they represent.
desired level of service if (as was the company policy) the user organization would
lit other cases, the indicators are an indirect representation of the service level
provide the necessary funds. It was explained that the cost of the necessary changes
would be $87 million initially and $8—$10 million per year thereafter. Suddenly the objective.
user organization decided that a more modest increase would be acceptable Consider the case of the availability of an order entry system. Ideally, there will be
(99.9999%). Another aspect of affordability pertains to the cost of collecting the a single indicator for the service's availability; that is, an indicator which reflects the
data for service level reporting. Like so many things, this often becomes a tradeoff overall availability of the service to the end user. Unfortunately, in the case of the
between precision and cost. order entry system, there is not a way to directly measure the availability of the
service. However, it might be possible to develop an estimate of the service's avail-
Finally, the service level objectives that are included in an SLA must be mutually
ahility. Perhaps a special application can be constructed that will reside at the users'
acceptable to all the parties to the agreement. It is not possible for a viable, effec-
location and periodically test the service's availability (perhaps by submitting an
tive agreement to be arrived at if one of the parties to the agreement simply dic-
tates the terms of the agreement. Creating an SLA is a process of negotiation to arrive at a result that both parties consider acceptable and that they both feel they can live with for the term of the agreement.

Service Level Indicators
As noted previously in this chapter, every service level objective must be measurable. More precisely, it must be possible to measure something that is indicative of that service level objective. In a sense, a service is an elusive, intangible thing that cannot be directly measured. Instead, it is necessary to measure something that both parties agree reasonably represents the service level objective.

Consider the service level objective of the availability of a system (such as an order entry system). This might seem very simple and straightforward. However, look at it more carefully. Some IT managers tend to look at the problem too simplistically and feel that it is sufficient to monitor the application software. They think that if the application is running, it is available. However, there could be a variety of problems that prevent the user from accessing and using the application. The problem could originate in the client or the server, each of which has a series of

inquiry transaction). However, security and other concerns might preclude such an approach.

Continuing with our example, in the event that it is not possible to develop a single measurement that represents the overall availability of the service, it becomes necessary for the SLA to define what will provide an adequate approximation of the service's availability. One approach might be to track the availability of each of the components required for the delivery of the service (for example, client, server, network, application, and so on). Obviously, this is not a perfect solution, but it might be good enough. If more precision is necessary, it is possible to analyze and correlate the data to provide a better view of overall availability. However, greater precision will normally carry with it greater complexity, greater cost, and a higher likelihood of error. Remember that perfection is unlikely and compromise is an inherent part of the SLA process.

Whatever is chosen, the SLA needs to document each of the service level indicators that will be used to represent each of the service level objectives. It will also be necessary to specify the data source for each of the indicators.
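The component-based approximation described above can be sketched in Python. This is a minimal sketch, not a prescribed method: it assumes the required components (client, server, network, application) are in series with no redundancy, so the service is counted as unavailable whenever any one of them is down, and it assumes outage intervals come from some monitoring log whose layout here is hypothetical. Overlapping outages are merged so that shared downtime is counted only once, which is one simple form of the correlation the text mentions.

```python
from datetime import datetime, timedelta

# An outage is a (start, end) pair; this data layout is hypothetical.
Outage = tuple[datetime, datetime]


def merge_intervals(intervals: list[Outage]) -> list[Outage]:
    """Merge overlapping or touching intervals so shared downtime counts once."""
    merged: list[Outage] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Overlaps the previous interval: extend it instead of adding a new one.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def approximate_availability(
    outages: dict[str, list[Outage]], window_start: datetime, window_end: datetime
) -> float:
    """Approximate service availability from per-component outages.

    Assumes components are in series: the service is down whenever ANY
    required component is down.
    """
    # Clip each outage to the reporting window, skipping outages outside it.
    clipped = [
        (max(start, window_start), min(end, window_end))
        for intervals in outages.values()
        for start, end in intervals
        if end > window_start and start < window_end
    ]
    downtime = sum((end - start for start, end in merge_intervals(clipped)), timedelta())
    return 1 - downtime / (window_end - window_start)


if __name__ == "__main__":
    day = datetime(2024, 6, 3)  # hypothetical reporting day
    outages = {
        "server": [(day.replace(hour=10), day.replace(hour=12))],
        "network": [(day.replace(hour=11), day.replace(hour=13))],
        "application": [(day.replace(hour=20), day.replace(hour=20, minute=30))],
    }
    availability = approximate_availability(outages, day, day + timedelta(days=1))
    print(round(availability, 4))  # prints 0.8542
```

In this example the server outage (10:00 to 12:00) and the network outage (11:00 to 13:00) count as three hours of downtime, not four. Like any component-based indicator, this cannot detect a failure that occurs while every monitored component reports itself as up.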
Non-Performance
If the Limitations section of the agreement can be considered the service provider's Caveat section, the Non-Performance section can be considered the Consequences section. That is, this section spells out what will happen in the event that the service provider fails to meet the commitments that are spelled out in the SLA. Typically, if the service provider fails to meet their obligations, the agreement will detail the penalties that might be expected. The most obvious penalty is financial, particularly in the case of an external service provider. With external service providers, you should also include a clause that provides your company with the option of terminating the contract in the event of significant non-performance.

Be careful in dealing with external service providers. Some of them will propose a remedy for non-performance that consists of credits to be applied toward future services. However, if they are providing an unacceptable level of service now, why would you want a discount on future services? That is somewhat like going to a restaurant for dinner and having a horrible meal with service to match the quality of the food. At the end of the evening, you complain to the manager. As consolation for the poor meal that you have just had, he offers you a gift certificate so that you can come back another time and have another meal at no cost. However, if the food was truly terrible and the waiter incompetent, would you really want to go back and have another meal that might be equally bad? The same is true of services in the business world. Be sure that the compensation offered is really something that would have value to your company.

The purpose of the penalties (other than termination) is not to compensate your company for the poor service. Rather, the purpose is to provide sufficient incentive for the service provider to deliver the level of service for which you have contracted. This principle can be illustrated by one company's approach to software acquisitions. The company insists that any vendor contractually promise that if they do not respond to problems in their software on a timely basis, they will pay the company a small amount of money (a percentage of the annual maintenance contract). The maximum amount of money that could ever have to be paid, even in a worst-case scenario, is insignificant to both companies. Yet many vendors will do almost anything to keep such a clause out of the contract, and dramatic discounts have resulted from vendors trying to avoid this service level commitment for their software support service. The reason that this particular clause is so objectionable is that it is outside of the vendor's normal processes. If a vendor actually fails to meet the nominal requirements of the contract, it will have to issue a check to the customer. Issuing a check to a customer is not something that a vendor normally does. Although vendors have processes for taking money from customers, they don't have processes for giving them money because of poor service. Therefore, complying with this condition requires exception processing. It most likely would require an escalation of the problem within the vendor's organization to obtain the approvals necessary to issue the check. Although raising the visibility of a problem within their company might not be something that the sales organization or the support organization considers desirable, it is clearly in the customer's best interests. Another interesting aspect of this contract clause is that it has almost never actually been applied. The reason is twofold. First, and most importantly, the vendors will turn their organizations inside out to make sure that they don't have to make the penalty payment. Second, in all honesty, the terms that the customer specifies are so loose that almost any action on the vendor's part will satisfy the language of the contract. However, in their fear of incurring any penalty, the vendors seem not to recognize this and go far beyond what is required in the contract.

One last point about this example is warranted. The penalty is calculated as a percentage of the annual maintenance fee, so the potential penalty is minuscule. Consider a software product that costs $30,000 and has an annual maintenance fee of $4,500. The contractual penalty for not responding to a complaint from the customer about a problem would be calculated based on the amount of time in excess of that allowed in the contract. Assume that the agreement specifies that the vendor will respond to a serious problem within one business day. If the vendor actually takes three business days to respond, the violation of the agreement would be two days. Although the violation is measured in business days, the maintenance agreement is specified in calendar days, so the business-day violation is divided by the number of calendar days in the year: the 2-day violation is divided by 365. This result (0.005479) is then multiplied by the annual maintenance fee. Therefore, in this case the penalty would be $24.66! A ridiculously small amount, yet sufficient to make very large companies jump through hoops to avoid it. A simpler alternative for determining the amount of the penalty is to specify a dollar amount for each period of time (minutes, hours, days, and so on) of violation of the agreement.

Although in the previous example a token penalty was sufficient, you cannot always rely on a token payment to produce the desired effect. The key to a penalty for non-performance being effective is that it must cause pain or discomfort within the service provider's organization. The objective is to maximize the discomfort so that in the future the service provider will choose to ensure the proper level of service is delivered, rather than suffer the discomfort that will result from non-performance. The most obvious way to cause pain is through a large financial penalty. However, smart service providers won't agree to terms that can result in very large penalties. Also, although financial penalties are possible with internal service providers, they are more difficult to implement. Another drawback to the financial penalty is that a large penalty can cripple the service provider, making it even more difficult for them to meet their commitments. On the other hand, a small penalty applied to an unscrupulous service provider can become just another incidental cost of doing business, one that is less expensive than providing the level of service specified in the agreement.
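The penalty arithmetic in the maintenance example can be written out as a short Python sketch. It assumes the pro-rating rule described above (violation days divided by 365 calendar days, times the annual maintenance fee) and uses Decimal so currency amounts round predictably; computed exactly, 2/365 x $4,500 is about $24.657, which rounds to $24.66. The $500-per-day rate in the flat-rate function is a hypothetical figure illustrating the simpler alternative of a fixed dollar amount per period of violation.

```python
from decimal import Decimal, ROUND_HALF_UP


def violation_days(actual_days: int, allowed_days: int) -> int:
    """Days by which the response exceeded the contractual allowance (never negative)."""
    return max(0, actual_days - allowed_days)


def prorated_penalty(days_late: int, annual_fee: Decimal, days_per_year: int = 365) -> Decimal:
    """Pro-rated rule: (days late / calendar days per year) x annual maintenance fee."""
    raw = Decimal(days_late) / Decimal(days_per_year) * annual_fee
    return raw.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)  # round to cents


def flat_rate_penalty(periods_late: int, rate_per_period: Decimal) -> Decimal:
    """Simpler alternative: a fixed dollar amount per period of violation."""
    return periods_late * rate_per_period


if __name__ == "__main__":
    late = violation_days(actual_days=3, allowed_days=1)  # 2 business days late
    print(prorated_penalty(late, Decimal("4500")))  # prints 24.66
    print(flat_rate_penalty(late, Decimal("500")))  # hypothetical $500/day; prints 1000
```

Comparing the two results shows how much the choice of formula, rather than the length of the lapse, determines whether a penalty causes any real discomfort.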

Tip
Penalties for non-performance should be sufficiently large that they will cause pain within the vendor organization. However, even small penalties can be constructed in such a way as to make them painful.

Creating effective penalties for non-performance calls for creativity. Some of the best penalties do not involve money. For example, the agreement might specify that in a case of non-performance, the head of the service provider's organization must meet with the head of the client organization and provide an explanation. It might be even more effective to stipulate that the meeting has to be in person, at the client's office, and within 48 hours of the determination of the non-performance condition. You will be most successful in creating effective penalties if you know as much as possible about the service provider. This way, you can better understand which penalties will have the maximum effect within their organization. The bottom line is that you must be creative to be effective in defining penalties for non-performance. Also, never accept a service provider's claim that they never agree to penalty clauses. First, they probably already have done so with other clients. Second, they will do so if they want your business, particularly if you are creative in defining the penalties.

Tip
Do not accept a service provider's claim that they never agree to penalties for non-performance. This is almost certainly untrue, and even if true, it can be circumvented through persistence and creativity.

It is very important that both parties have a clear understanding of what constitutes non-performance. Consider the example of an agreement that specifies a response time of 2.2 seconds for the order entry application. Is that requirement an absolute threshold that must never be crossed? Or is the response time specification actually referring to an average? Alternatively, there might be some allowance for the threshold to be exceeded under certain circumstances (refer to the section "Limitations"), for a maximum number of transactions in a given time period, or for a maximum allowable period in the busiest part of the day. What is most important here is that both parties have a clear understanding of what constitutes a violation of the agreement and therefore warrants some consequential action.

Caution
Beware of non-performance remedies that provide the compensation in the form of future services. Bad service is never a bargain.

In constructing the non-performance section of the agreement, creativity and flexibility are important. They are particularly important when dealing with internal service providers. It does not make sense to negotiate a penalty for non-performance that consists of reducing IT's budget, because that would effectively reduce IT's ability to meet the users' requirements. Instead, non-financial penalties should be considered (for example, reductions in individual bonuses, and so on). Also, as an alternative to penalties, internal service providers can be motivated by rewarding them for meeting or exceeding the service level commitments in the SLA.

Optional Services
There might be additional service components that are not normally provided, or that are not provided at this time. However, if there is reason to anticipate that the user might want some of these options within the term of the SLA, it is wise to include a provision for them in this agreement. For example, a company might not currently be open for business on Sunday, allowing IT to perform batch processing and system administration work during the day. However, if it is anticipated that Sunday work, and hence the availability of the online systems, will be required during the Christmas holiday season, the possibility should be included in this section of the agreement.

Exclusions
In addition to spelling out the services that are covered by the agreement, the SLA should also specify what is not included in the agreement. Some common sense is warranted here. Obviously, if the agreement covers the online order entry system, it is not necessary to specify that the agreement does not include the payroll system. Instead, the exclusions specified in the SLA are those categories that might reasonably be assumed to be covered. For example, it might be appropriate to specify that the service encompassed by the order entry system's SLA does not cover the entry of orders by customers via the company's Web site. Clearly the e-commerce activity, although important to the company and a means by which orders can be received, is not part of the current order entry system. The e-commerce component is too distinct to be covered by the SLA for the online order entry system: it has different users, employs different software, is accessed differently, and so on. What might be appropriate to consider for inclusion in the agreement is the interface through which e-commerce orders are received by the order entry system.

Reporting
The reports generated for the Service Level Agreement are key components of the SLA process. Without reports, the agreement is left merely as a statement of good intentions, because it would never be possible to contrast actual performance against the stated objectives contained in the agreement.
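Contrasting actual performance against stated objectives depends on agreeing how an objective is evaluated. Returning to the 2.2-second response-time example from the non-performance discussion, the following Python sketch checks one set of measurements under three readings: an absolute ceiling, an average, and an allowance for a limited number of slow transactions per reporting period. The threshold comes from the example; the sample timings and the allowance of two slow transactions are hypothetical.

```python
from statistics import mean

THRESHOLD_SECONDS = 2.2  # response-time objective from the order entry example


def violates_absolute(samples: list[float], threshold: float = THRESHOLD_SECONDS) -> bool:
    """Reading 1: no single transaction may exceed the threshold."""
    return any(t > threshold for t in samples)


def violates_average(samples: list[float], threshold: float = THRESHOLD_SECONDS) -> bool:
    """Reading 2: the average over the period must not exceed the threshold.

    samples must be non-empty (statistics.mean raises on an empty list).
    """
    return mean(samples) > threshold


def violates_with_allowance(
    samples: list[float], max_slow: int, threshold: float = THRESHOLD_SECONDS
) -> bool:
    """Reading 3: up to max_slow transactions per period may exceed the threshold."""
    return sum(t > threshold for t in samples) > max_slow


if __name__ == "__main__":
    timings = [1.1, 1.4, 3.0, 1.2, 1.3]  # hypothetical measurements, in seconds
    print(violates_absolute(timings))  # prints True
    print(violates_average(timings))  # prints False (mean is about 1.6 s)
    print(violates_with_allowance(timings, max_slow=2))  # prints False
```

The same data is a violation under the first reading and compliant under the other two, which is exactly the ambiguity the agreement should remove before any report is produced.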
The reports must be relevant to the service level objectives and reflect the service level indicators. Like the service level objectives, the reports must be readily understood by users, even those who have no understanding of the underlying technical issues. In many cases, graphs are the best way to represent information about service level performance. However, remember that some users will want to look at the data more closely. Therefore, it is advisable to have the supporting data available in tabular form for those who want to review it. Another recommendation is to keep the reports simple and focused. Although it might be easy to distribute copies of an existing report that includes the required information (plus a lot of other information), it is unwise to use such a report. Instead, it is better to distribute reports that contain only the specific information required by the SLA. Additional information can be confusing or lead to misunderstandings. Reports might contain information about multiple service level indicators, but should not contain extraneous data.

Tip
Remember that graphs can convey more information and be more readily understood than tables. Therefore, whenever possible, use graphs to display information about actual service level performance.

The SLA should contain a list of each of the reports that will need to be produced in support of the agreement. For each report, the SLA should specify the name of the report and when it will be produced (frequency). It should also indicate which service level indicator(s) are reflected in the report. There should be a brief description of the content of the report and possibly even an example of the report itself. A description of the source of the data for the report should also be included in this section of the agreement. Although this might seem tedious, it does prevent misunderstandings later. It can also serve as a limited guard against unethical manipulation of the reports during the term of the agreement. For each report, specify the following:

• Report name
• Frequency
• Service level indicator(s)
• Content
• Data sources
• Responsibility
• Distribution

Who Can You Trust?
A certain degree of caution is appropriate with the SLA reports, particularly if the employees producing the reports stand to gain personally from the results reflected in the reports. A large company learned this lesson painfully. The company had an internally developed trouble ticketing system. The system was not terribly sophisticated, but it was adequate for the needs of that company. One day, an executive got the bright idea that he could motivate the IT department employees to provide better service (higher availability) by linking their quarterly bonuses to the level of service that was being delivered to their clients. On the face of it, this seemed like a reasonable idea. The question then became how to measure the service being delivered. Someone hit upon the idea of using the trouble ticketing system. This seemed reasonable because it did track every outage and, by implication, could then be used to calculate the remaining availability.

Data from the trouble ticketing system was analyzed, and it was agreed that this could be used to provide a reasonably accurate indication of service availability. The decision was made to implement the plan.

It should be noted that the trouble ticketing system was not perfect. Its greatest weakness was that it relied on individual IT employees (Help Desk) to manually enter information about service interruptions. The employees were quite good about opening trouble tickets when a problem occurred. However, when they were very busy, or if multiple people were involved in resolving the problem, a trouble ticket sometimes was not closed at the time the problem was resolved. This could result in some trouble tickets remaining open for several days, or even longer. This problem had been discovered a couple of years earlier, and the program was modified to allow anomalies like this to be corrected. At the end of each month, any apparent problems of this type were researched and new information was entered to reflect the correct duration of the problem. This facility became the source of a problem of a very different type.

The employees responsible for researching outages and, if appropriate, entering the correct information were part of the same group whose bonuses were tied to service availability. It did not take long for these individuals to figure out that the facility used to correct errors in outage durations could also be used to ensure that their group always met or exceeded the objectives that had been established for service availability. Within a couple of months of linking bonuses to service availability, availability had soared. The executive who had conceived the plan was congratulating himself and his team.

Executives from the user departments grumbled that they had not seen any improvement in service. However, this was initially dismissed with the thought that the users would never be satisfied no matter how much service improved. After about eight months of continued complaints from the user departments, supported by their own documentation of problems, the IT department decided to investigate the situation. The investigation revealed that the employees had been falsifying the records to meet availability objectives and thereby maximize their personal bonuses. As a consequence, the link between availability and bonuses was discontinued and accurate reporting returned. Amazingly, no one was ever disciplined for this scam.
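The sidebar's calculation, deriving availability from recorded outage durations, can be sketched in Python as follows. The ticket fields are hypothetical, and the sketch assumes outages do not overlap so that durations can simply be added. The second function is an illustrative safeguard suggested by the sidebar's lesson, not something the text says the company used: it lists every ticket whose duration was corrected after the fact so that someone outside the group being measured can verify it.

```python
from dataclasses import dataclass


@dataclass
class TroubleTicket:
    """Hypothetical ticket record; a real ticketing system will differ."""
    ticket_id: str
    outage_minutes: float
    corrected: bool = False  # True if the duration was edited during month-end research


def availability_from_tickets(tickets: list[TroubleTicket], period_minutes: float) -> float:
    """1 - (total recorded outage / period). Assumes outages do not overlap."""
    total_outage = sum(t.outage_minutes for t in tickets)
    return max(0.0, 1 - total_outage / period_minutes)


def tickets_needing_independent_review(tickets: list[TroubleTicket]) -> list[str]:
    """IDs of tickets whose durations were changed after the fact."""
    return [t.ticket_id for t in tickets if t.corrected]


if __name__ == "__main__":
    month = 30 * 24 * 60  # a 30-day month, in minutes
    tickets = [
        TroubleTicket("T1", 120),
        TroubleTicket("T2", 60, corrected=True),
        TroubleTicket("T3", 36),
    ]
    print(availability_from_tickets(tickets, month))  # prints 0.995
    print(tickets_needing_independent_review(tickets))  # prints ['T2']
```

The calculation is only as trustworthy as the recorded durations, which is why the people who verify corrections should not be the people whose bonuses depend on the result.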
The SLA needs to specify who will be responsible for producing the reports. The responsibility should be specified by position or group rather than by individual. It is also necessary to include specifications about the distribution of each report. As illustrated in the sidebar, care must be taken not to create a situation in which a conflict of interest might arise and lead to the reports being compromised. At a minimum, the SLA should list the groups, or positions, that are to receive the reports. However, it is also desirable to specify whether each report will be produced in hard copy or electronic form. If electronic copies are chosen, the SLA should specify how the report will be distributed (email, Web, and so on).

Administration
This section of an SLA describes the ongoing administration of the SLA and the processes that it specifies. It needs to describe the ongoing processes and define where in the organization responsibility for each process lies.

Reviews
Periodically, the SLA needs to be reviewed to verify that it is still valid and that its processes are working satisfactorily. A review can occur at any time if both parties agree to it. However, the SLA needs to specify when regular, periodic reviews will occur. In a typical agreement with a term of 24 months, three reviews should be scheduled. The first review should be held six months after the agreement is put in place. The other two reviews should occur in the twelfth month and the eighteenth month.

In a review of an SLA, some fundamental questions need to be addressed. The first question is whether the agreement and its associated processes are functioning as intended. Particularly in the first review, it is important to address the question of whether the agreement and its service levels are still acceptable. The reviews need to consider whether any changes are required. For example, it might be necessary to replace a service level indicator because data is no longer available for it. Or, it might be necessary to redefine responsibilities or report distributions because of an organizational restructuring.

SLA reviews can range from very informal to very formal. They can be little more than two department heads (for example, the former negotiating team leaders) discussing the SLA over a cup of coffee. At the other end of the spectrum, the review might consist of reconvening the entire SLA negotiating team. The method chosen will depend in large part on the culture of the company, the warmth or coolness of the relations between the departments involved, and the user department's satisfaction with the service levels being delivered.

Revisions
When an SLA is put into place, it should be expected that revisions to it will be necessary. The agreement is not set in concrete, nor are the organizations that it serves. Revisions are very common and tend to be driven by a variety of factors, including requirements, technology, workload, staffing, staff location, mergers and acquisitions, and so on. When revisions are necessary, a new agreement will need to be written and approved. As with the agreement reviews, the process can be quite informal or require a lengthy negotiation process.

Approvals
After all the details for an SLA have been defined, and all the parties are in agreement, the agreement needs to be signed. In the case of an SLA with an external service provider, this is obviously necessary. With internal service providers, the need to sign the agreement might be less obvious, but it is just as important. In signing the agreement, both parties formally acknowledge that they are in agreement with its terms and are committed to its success. The person signing the agreement for the service provider should be the person who has authority over all aspects of the services covered by the agreement. Likewise, the user signing the agreement should be the overall department head, that is, the person to whom all the users of the service report. However, regardless of level, the individuals signing the agreement must have the authority to sign it and an interest in its success.

Summary
Service Level Agreements are a key component of any service level management process. To begin with, they provide a basis for effective dialog between the client and the service provider. They can be beneficial to both the service provider and the client because SLAs hold both parties accountable. That is, the client is forced to define the level of service that will be considered acceptable. On the other hand, the service provider is held accountable for delivering the level of service to which they have agreed. To be effective, the SLA must be negotiated fairly and in good faith. When established, the SLA is one of the most effective vehicles available to the service provider for managing client satisfaction.
CHAPTER 5

Standards Efforts

Industry standards for service level management are not very mature at this time, and in general there is a lack of industry-accepted methodologies, practices, and standards. Most standards efforts have focused on infrastructure management rather than service management. This is at least partially because of the difficulty of setting standards for defining, measuring, and managing services, which is a more complex issue than standards for monitoring and configuring individual devices and components.

The most notable standards effort to date has been driven by the UK Government's Central Computing and Telecommunications Agency (CCTA). CCTA has delivered a documented methodology for managing services called the IT Infrastructure Library (ITIL). Other efforts include the Service Level Agreement (SLA) Working Group created by the Distributed Management Task Force (DMTF) and the Appl MIB developed by the Internet Engineering Task Force (IETF). A more focused effort, the Application Response Measurement (ARM) Working Group, is supported by several vendors, as well as by a special interest group sponsored by the Computer Measurement Group. The rest of this chapter looks at these efforts in more detail.
Tip
Standards efforts continually change over time: some efforts gain momentum while others lapse, and new initiatives are regularly introduced into the industry. It is worthwhile to check the following Web sites regularly to ensure that you are aware of new developments:

http://www.dmtf.org (Distributed Management Task Force)
http://www.ietf.org (Internet Engineering Task Force)
http://www.cmg.org (Computer Measurement Group)
http://www.exin.nl/itil/itinf/home (IT Infrastructure Library)

IT Infrastructure Library
The IT Infrastructure Library (ITIL) was initially developed for use within UK government IT departments by the Central Computing and Telecommunications Agency (CCTA). This library consists of 24 volumes available to interested parties. The use of the ITIL has spread outside the UK government and, in fact, has a significant amount of support throughout Europe. Awareness of and support for ITIL in the United States is very limited, although an organization has been established in the United States to try to increase its acceptance.

The ITIL has a number of service management modules that cover topics including help desk operations, problem management, change management, software control and distribution, service level management, cost management, capacity management, contingency planning, configuration management, and availability management. The volumes provide a methodology for defining, communicating, planning, implementing, and reviewing services to be delivered by the IT department. They include guidelines, process flowcharts, job descriptions, and discussions on benefits, costs, and potential problems.

The specific module on service level management refers to the relationship between service level managers, suppliers, and maintainers of services. ITIL sees service level management as being primarily concerned with the quality of IT services in the face of changing needs and demands. Figure 5.1 shows how IT users, service providers, and suppliers relate via the use of Service Level Agreements.

This module also outlines the following responsibilities of the service level manager:

• Creating a service catalog that describes provided services
• Identifying service level requirements relating to each service and user community
• Negotiating Service Level Agreements between service suppliers and IT users
• Reviewing support services with service suppliers
• Setting accounting policies for cost allocation
• Monitoring and reviewing services
• Reporting on achieved service levels

Figure 5.1 The relationship of users, providers, and maintainers of IT services.
[Diagram labels: Users; Service Level Agreements; IT Service Providers; Services; Systems; Contracts; Suppliers and Maintainers; Hardware; Software; Application; Telecomms]

ITIL specifies many benefits of service level management, including achieving a specific, consistent level of service, balancing service levels against the cost of providing them, increasing user productivity, and defining a more objective relationship between IT users and service providers. It also spells out potential problems, including resistance to change, the difficulty in formulating service level requirements, the problems associated with establishing costs, and the danger of agreeing to overly ambitious service level targets before an appropriate baseline is established.

The ITIL endorses the principle that IT services are there to support the business and help staff do their work well. Two concepts are embodied in all its modules:

• A lifecycle approach to service management
• Customer focus

ITIL advocates that IT service managers have appropriate input to development projects to ensure that operational requirements are taken into account during development, testing strategies are created, capacity requirements to support the new systems are understood, and service expectations are understood from the beginning. ITIL emphasizes the importance of service quality, and states that quality service comes from keeping close to the customer and communicating effectively with the customer.

ITIL views service management as a single discipline with multiple aspects and user. High priority should be given to those practices that will help improve the
advocates taking an integrated approach to implementing service management. quality and consistency of IT service delivery. Use the ITIL methodology as a
Hence ITIL recommends the use of a single repository for configuration data that starting point and alter it to better suit the size and maturity of your organization
is available to the help desk and used as a base for change management, problem and the scope of the services to be managed.
management, and contingency planning as an important implementation consider-
ation. The base ITIL modules don't specify which order to implement all aspects
of service management because they can be implemented either consecutively or Distributed Management Task Force (DMTF) SLA
simultaneously. Working Group
'l'he Service Level Agreement (SLA) Working Group is a task force of DMTF
Note members, who are focused on extending the DMTF's Common Information
ITIL does not cover all aspects of implementation management, and recommends the use of formal Model (CIM). The CIM's aim is to allow the definition and association of policies,
project management as well as complete procedure documentation, risk management, audits, and rules, and expressions that enable common industry communications with respect
regular reviews. to service management.
The Common Information Model (CIM) is an object-oriented information model
The ITIL approach gained the support of the British Computer Society, which has that describes details required to manage systems, software, users, and networks. A
validated the training and examinations associated with ITIL's Certificate in IT Infrastructure Management. A number of user groups have formed to support ITIL, including the IT Infrastructure Management Forum and the IT Service Management Forum. These user groups comprise IT departments of government and commercial organizations as well as academic bodies and vendor representatives. EXIN, the Dutch equivalent of CCTA, has become a partner in ITIL and is helping to fund the ongoing updating and re-issuing of the library.

Several thousand professionals in Europe have been trained and certified in the ITIL methodologies, and multiple authorized vendors provide ITIL certification training, almost all of them located in Europe. A large number of organizations in Europe are training their IT staff in ITIL methods; however, it is important to recognize that smaller environments should scale down the processes and methods appropriately.

Note
Training in ITIL methods does not mean that you are ready to implement the methodologies in your IT organization immediately and exactly as learned. ITIL starts with the presumption that no service management methodology, products, or processes are currently in place, and this will not be the case in most organizations. The goal is to adopt and adapt those methodologies that bring an appropriate level of discipline and the most benefit to the organization. These should be implemented in a phased approach that builds on and extends existing practices.

It is recommended that service managers become familiar with the concepts and methodologies provided by the ITIL and use this information as a framework to review current service management processes within the IT department. Then, select those areas in which no formal procedures exist and that appear to have the largest potential return on investment in terms of better support for the business and IT

conceptual management framework is provided that establishes object definitions and classes, and uses the following layers:

• Core Model: Applicable to all domains of management (domains include systems, applications, devices, users, and networks).
• Common Models: Common to particular management domains but independent of a particular technology or implementation.
• Extension Models: Technology-specific extensions of the Common Models.

An important aspect of CIM is the ability to define and represent relationships between objects. This is very useful when trying to show the resources used by various applications and the users who implement those applications.

Note
The Common Information Model began as a component of the Web-Based Enterprise Management (WBEM) initiative and has gained significant support from hardware, software, and management vendors in the industry.

The SLA Working Group is extending the syntax and metaschema of CIM to embrace the concepts of service management. The concept of a service spans multiple areas of the CIM schema, such as network support of the service, software used to deliver the service, and end users who consume the service. Core Model extensions are being created, and these will also allow for further subclassing within the Common Models for domain-specific usage. In addition, policies will be supported for representing management goals, desired system states, or the commitments of a Service Level Agreement.
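The idea of layered classes plus explicit relationships between objects can be illustrated with a minimal Python sketch. The class names (ManagedElement, Application, System, UnixServer, User, Association) and the "runs-on"/"uses" relationship kinds are invented for illustration; they are not the DMTF schema classes or properties. The sketch only shows how a Core base class, technology-neutral Common classes, and a technology-specific Extension subclass fit together, and how associations answer questions such as "which resources does this application use, and who uses it?"

```python
from dataclasses import dataclass

# Hypothetical, simplified classes illustrating CIM-style layering;
# these are NOT the real DMTF CIM schema classes.

@dataclass(eq=False)
class ManagedElement:
    """Core Model layer: a base class shared by every management domain."""
    name: str

@dataclass(eq=False)
class Application(ManagedElement):
    """Common Model layer (applications domain), independent of technology."""

@dataclass(eq=False)
class System(ManagedElement):
    """Common Model layer (systems domain)."""

@dataclass(eq=False)
class UnixServer(System):
    """Extension Model layer: a technology-specific subclass of System."""
    os_release: str = ""

@dataclass(eq=False)
class User(ManagedElement):
    """Common Model layer (users domain)."""

@dataclass
class Association:
    """A typed relationship between two managed objects."""
    kind: str
    source: ManagedElement
    target: ManagedElement

def targets(links, source, kind):
    """Objects reached from `source` through associations of type `kind`."""
    return [a.target for a in links if a.source is source and a.kind == kind]

def sources(links, target, kind):
    """Objects that reach `target` through associations of type `kind`."""
    return [a.source for a in links if a.target is target and a.kind == kind]

# Usage: map an application to the system it runs on and the users of it.
orders = Application("order-entry")
host = UnixServer("srv01", os_release="5.7")
alice, bob = User("alice"), User("bob")
links = [
    Association("runs-on", orders, host),
    Association("uses", alice, orders),
    Association("uses", bob, orders),
]
print([r.name for r in targets(links, orders, "runs-on")])  # ['srv01']
print([u.name for u in sources(links, orders, "uses")])     # ['alice', 'bob']
```

Keeping relationships as separate association objects, rather than as fields on each class, lets a new relationship type be added without changing the classes in any layer.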

The working group is seeking to address a number of issues, including

• Various types of policies and linkages with methods to allow detection of and response to policy violations
• Mechanisms for end-to-end management of policies and rules across multiple domains
• Specification of priority and ordering of rules and expressions, together with mechanisms for conflict resolution
• Interfacing with other working groups to allow use of policies and rules for systems, applications, network availability, and performance management

At this time the SLA Working Group is in a very early stage, and it is unclear whether significant progress will be made and, if so, whether the work will gain broad acceptance in the industry. As this work is likely to continue to evolve, interested parties should monitor the efforts through the DMTF Web site, at http://www.dmtf.org.

Internet Engineering Task Force (IETF) Application Management MIB
The IETF has issued RFC 2564, Application Management MIB, which, although not focused on service level management, does have a number of elements that can assist in measuring and managing service quality. The areas of most interest are the definition and measurement of units of work; response time monitoring; monitoring of resource usage by application, such as I/O statistics and application-layer network resource usage; and facilities for controlling applications, such as stopping, suspending, resuming, and reconfiguring them.

The Appl MIB is complementary to the SysAppl MIB, which focuses on system-level managed objects for applications. Both the SysAppl MIB and the Appl MIB have a significant limitation: they specifically exclude any applications running on multiple systems. This means that client/server applications and applications that use multitiered architectures are not covered by the Appl MIB. A number of tables are associated with the Appl MIB, including the following:

• Service-level tables that map services
• The Open Files table, which contains information on files currently open for the running application elements
• The Open Files Cross-reference table, which accesses information about open files using the names of the open files as an index
• The Open Connections table, which provides information on read and write activity by an application element across connections
• Transaction Statistics tables, which hold information about transaction streams, including the number of transactions processed and transaction throughput
• The Running Application Element Status table, which augments the information contained in the SysAppl MIB table with additional status, open connections, and error information
• The Running Application Element Control table, which provides the ability to exercise some control over the running application elements, including suspending, reconfiguring, or terminating a running element

Of these, the most interesting with respect to service level management are the Transaction Statistics table and the Running Application Element Control table. The requirement that the application run on a single system is a limitation that needs to be considered as part of any management solution implemented using the Appl MIB standard.

Note
As the Appl MIB is still a proposal in the RFC stage, it will be some time before it is finalized, and additional time before there is any significant support for it and implementation of software products that use this standard to provide management information.

Application Response Measurement Working Group
Hewlett-Packard and Tivoli (a subsidiary of IBM Corporation) cooperated to produce an API specification called Application Response Measurement (ARM). It is designed to measure business transactions from an end-user perspective, as well as to measure the contributing components of response time in distributed applications. Response times are one aspect of understanding the service being delivered to end users. The ARM Working Group has expanded to include representatives of BMC Software, Boeing, Candle, Citicorp, Compuware, Landmark, Novell, Oracle, SAS, SES, Sun, Unify, and Wells Fargo, along with Hewlett-Packard and IBM.

The actual API specification is relatively simple and requires placing API calls within the application code to designate the beginning and end of business transactions. Additional optional API calls can be used to indicate progress in long-running transactions. The ARM API defines six procedure calls:

arm_init: Initializes the ARM environment.
arm_getid: Names each transaction that will be monitored.
arm_start: Denotes the start of a transaction instance.
arm_update: Updates statistics for a long-running transaction.
arm_stop: Registers the end of a transaction instance.
arm_end: Cleans up the ARM environment prior to shutdown.
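To make the call sequence concrete, here is a minimal Python sketch of how an application might bracket a business transaction with the six calls. StubArmAgent and its simplified method signatures are hypothetical: the real ARM SDK is a C library whose functions take different arguments, and in practice a separate ARM agent, not the application, collects and forwards the measurements. The "GOOD" status string is likewise illustrative. A fake clock is injected so the example is deterministic.

```python
import time
from collections import defaultdict

class StubArmAgent:
    """Hypothetical stand-in for an ARM agent. Method names mirror the six ARM
    calls; the simplified signatures are assumptions, not the real ARM API."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock      # injectable so the example is deterministic
        self._next_id = 0
        self._apps = {}          # application id -> application name
        self._trans = {}         # transaction id -> (application id, name)
        self._open = {}          # start handle -> (transaction id, start time)
        self.progress = {}       # start handle -> latest progress note
        self.results = []        # (transaction name, status, elapsed seconds)

    def _new_id(self):
        self._next_id += 1
        return self._next_id

    def arm_init(self, app_name):
        """Register the application and return an application id."""
        app_id = self._new_id()
        self._apps[app_id] = app_name
        return app_id

    def arm_getid(self, app_id, tran_name):
        """Name a transaction type to be monitored; return a transaction id."""
        tran_id = self._new_id()
        self._trans[tran_id] = (app_id, tran_name)
        return tran_id

    def arm_start(self, tran_id):
        """Mark the start of one transaction instance; return a start handle."""
        handle = self._new_id()
        self._open[handle] = (tran_id, self._clock())
        return handle

    def arm_update(self, handle, note):
        """Record progress for a long-running instance that is still open."""
        if handle not in self._open:
            raise KeyError(f"no open transaction for handle {handle}")
        self.progress[handle] = note

    def arm_stop(self, handle, status="GOOD"):
        """Mark the end of the instance and record its response time."""
        tran_id, started = self._open.pop(handle)
        self.progress.pop(handle, None)
        elapsed = self._clock() - started
        self.results.append((self._trans[tran_id][1], status, elapsed))
        return elapsed

    def arm_end(self, app_id):
        """Forget the application's transaction definitions before shutdown."""
        self._trans = {t: v for t, v in self._trans.items() if v[0] != app_id}
        del self._apps[app_id]

    def summary(self):
        """Return {transaction name: (count, mean elapsed seconds)}."""
        totals = defaultdict(lambda: [0, 0.0])
        for name, _status, elapsed in self.results:
            totals[name][0] += 1
            totals[name][1] += elapsed
        return {name: (n, total / n) for name, (n, total) in totals.items()}


# Usage: instrument two instances of a business transaction with a fake clock.
ticks = iter([0.0, 0.25, 1.0, 1.5])          # successive clock readings (seconds)
agent = StubArmAgent(clock=lambda: next(ticks))
app = agent.arm_init("order-entry")
tid = agent.arm_getid(app, "submit-order")
for _ in range(2):
    handle = agent.arm_start(tid)
    agent.arm_update(handle, "validating order")   # optional progress call
    agent.arm_stop(handle)
agent.arm_end(app)
print(agent.summary())   # {'submit-order': (2, 0.375)}
```

The point the sketch makes is the one the specification relies on: only arm_start and arm_stop need to wrap the business logic, while arm_init, arm_getid, and arm_end are one-time setup and teardown.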
The ARM Software Developer Kit continues to be refined. Version 2 addressed the issue of client/server transactions that span multiple tiers by supporting the correlation of transactions and sub-transactions. This allows a better understanding of where bottlenecks and delays might be occurring in more complex transactions. Version 2 can also be used to provide information on the size of transactions, such as a count of bytes or transactions processed, which might be useful to show the status of long-running transactions. The new update API can also be useful in providing additional error code information or an indication of transaction progress, such as the account record currently being processed.
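Correlating sub-transactions with their parent is what lets a measurement tool point at the tier where a delay occurs. The minimal Python sketch below assumes a hypothetical Measurement record with a parent_id field standing in for a correlator; it is not the ARM 2.0 correlator format. It also assumes sub-transactions run one after another inside their parent, so a parent's own time is its elapsed time minus its children's; overlapping (parallel) sub-transactions would need a different calculation.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Measurement:
    """Hypothetical record of one completed (sub-)transaction."""
    tran_id: str
    parent_id: Optional[str]   # correlator linking to the calling transaction
    name: str
    elapsed: float             # seconds

def self_times(measurements):
    """Elapsed time of each (sub-)transaction excluding its direct children.

    Assumes children run sequentially within their parent."""
    child_total = {}
    for m in measurements:
        if m.parent_id is not None:
            child_total[m.parent_id] = child_total.get(m.parent_id, 0.0) + m.elapsed
    return {m.name: m.elapsed - child_total.get(m.tran_id, 0.0) for m in measurements}

def bottleneck(measurements):
    """Name of the (sub-)transaction contributing the most time of its own."""
    times = self_times(measurements)
    return max(times, key=times.get)

# Usage: one end-user transaction spanning client, application, and database tiers.
trace = [
    Measurement("t1", None, "submit-order (client)", 2.5),
    Measurement("t2", "t1", "validate (app server)", 0.5),
    Measurement("t3", "t1", "store (database)", 1.5),
    Measurement("t4", "t3", "commit (database disk)", 0.25),
]
print(self_times(trace))
print(bottleneck(trace))   # store (database)
```

Without the parent links, the only figure available would be the 2.5-second end-user response time; with them, most of that time can be attributed to the database tier.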

Caution
The ARM API is intrusive: calls must be added during application development or retrofitted if the application is already in service. Because many of the applications used within an IT department are produced by third-party software vendors, the IT department will not have access to the source code of these applications. Hence retrofitting the applications for ARM might be impossible unless the vendor agrees to make the modifications. It might be possible to use remote terminal emulation and embed ARM API calls within the scripts; however, there are other mechanisms for simulating transactions and gathering statistics that might be simpler to implement. As ARM does not yet have widespread acceptance, relying on it as the only way to measure end-user response times might be premature.

Summary
No single industry standard for service level management has broad acceptance. Although a number of initiatives are underway, the best approach is to keep apprised of developments in standards and use a best practice approach. The methodology of the ITIL, suitably modified for your particular organization and combined with suitable mechanisms for measuring aspects of service quality, can provide a base platform for successful implementation of service level management. As standards continually evolve and new initiatives appear frequently, it is wise to monitor the various standards organizations via their Web sites.

PART

Reality

Chapter
6 Service Level Management Practices
7 Service Level Management Products
CHAPTER 6

Service Level Management Practices

This chapter examines the practices in use today in typical corporations and organizations. Most of the information used to draw conclusions has come from the United States, although anecdotal evidence suggests that common practices in other countries are quite similar. In general, the current state of service management, particularly for newer applications and services across distributed enterprises, is somewhat immature. Although a number of organizations proactively manage the services they provide, the definition, understanding, and scope of service management vary tremendously from organization to organization.

Lack of Common Understanding


In late 1998, Enterprise Management Associates surveyed readers of Internet Week, and the report⁷ was published in November. It found that 21% of high-ranking IT executives could not define service management or identify the tools they would use to implement it, and 81% said they needed more information on the subject. Even among organizations that practice service management, the scope and understanding of this discipline varied. Of those who could define service level management, the most common answer (around 35% of those surveyed) associated the term with meeting or improving end-user perception of the service, which might be a specific application or network service⁶.

Several industry research firms have concluded that there is significant confusion in the industry and in the marketplace around service level management. For example, META Group titled a May 1999 research note "Service Level Mess," citing hype from vendors as helping to increase this level of confusion. META Group also indicated that the service level management market would begin to mature in 2001.

Many IT organizations look at service level management as simply a reporting function, or perhaps a mechanism for gaining some advantage by documenting Service Level Agreements (SLAs). An April 1999 report by Forrester Research found that most organizations with documented Service Level Agreements reap little more than bureaucracy from those agreements⁴. Forrester reported that most SLAs

• Arise from provincial objectives, such as to support chargeback or to defend in-house jobs from outsourcers
• Specify irrelevant metrics, such as component utilization measures
• Lack teeth, because no penalties exist for not meeting the agreement
• Don't drive improvement, because they don't assess customer satisfaction and are not revised when there are changes in business or technology best practices

In a research survey published in February 1998, Forrester found the results of management problems to be downtime (38%), poor performance (20%), slow problem resolution (18%), impact on revenues (15%), high IT costs (13%), and user dissatisfaction (13%). Forrester Research believes that corporations need better ways to measure IT department performance and must align the IT department with business goals. Effective service level management can help achieve these goals⁵.

Note
A number of industry research firms cover service level management, particularly since interest has grown within their client base. Although we quote from several of these firms, if you use the services of a different research firm, it is very likely to have a practice that covers service level management as well.

The industry research firm Gartner Group outlines common pitfalls of today's Service Level Agreements as being too complex, with no set baseline, leading to staff overcommitment in trying to meet unrealistic goals. These unrealistic objectives might be set by the IT organization in response to customer demands and, in many cases, the agreements are too one-sided and don't clearly specify the responsibilities of both parties.

Different industry analysts emphasize different aspects of service level management. Hurwitz Group advocates that a service level agreement needs to specify three service level objectives: user response time, application availability, and application recoverability. Hurwitz Group also sees service level management as an iterative process that extends beyond Service Level Agreements and must be managed as outlined in Figure 6.1.

[Figure 6.1: 1. Define the SLA; 2. Assign the SLA owner; 3. Monitor SLA compliance; 4. Collect & analyze data]
Figure 6.1 The SLA management process as defined by Hurwitz Group.

In contrast, Giga Group believes that the three critical areas of Service Level Agreements are response time, application availability, and cost of service delivery. Giga Group recognizes that cost analysis is a complex subject, but highlights the need to consider cost as it relates to achieving a specified set of service levels.

Forrester Research sees service level management as consisting of ensuring business application availability, performance planning to enhance the infrastructure to meet response time requirements, and administrative support to provide the day-to-day operations.

META Group advocates an approach it calls Service Value Agreements (SVAs), an evolution from a static SLM model to a dynamic one oriented around business-focused process management. In META Group's assessment, fewer than 25% of IT organizations would implement service management from a quality discipline wherein the IT department is aligned with the goals of the lines of business and has appropriate compensation programs to ensure SLA goals are met¹.

The increased attention from industry analysts, as well as more service level management articles appearing in trade publications, will help to educate IT
professionals. The general level of understanding and acceptance will also increase as more industry forums evolve to include the opportunity for the sharing of best practices around service management. At this time, there is no common agreement, even among the industry analyst community, regarding the definition, scope, and process for managing services effectively. As outlined in Chapter 5, "Standards Efforts," few standards have emerged and none have any significant support. We can expect the situation and the maturity of service management to continue to improve, particularly if accepted standards do emerge.

Tip
While waiting for standards, including basics such as common definitions, to emerge and evolve, you might want to become involved in groups such as the Distributed Management Task Force and the IT Service Management Forum. You could also attend selected, focused trade shows and conferences where you can share your experiences and listen to best practices in use at other organizations.

Current Service Level Management Practices
Most IT organizations today don't practice service management as a defined, continuous process of quality improvement. Various research studies show, however, that many organizations have implemented some aspects of service management, typically beginning with some form of documented service level agreement and expanding to include service level monitoring and reporting. A minority of IT organizations have also included a disciplined approach to continuous improvement for both achieved service levels and customer satisfaction.

A research report published in April 1999 by META Group shows that IT organizations implementing service management quality programs place emphasis on the following five areas²:

1. Management by fact using captured performance data. Some IT organizations are also using this information to determine the performance bonus component of IT management compensation.
2. Continuous improvement, which means meeting user expectations that continue to increase while balancing service levels, timeliness, and cost.
3. Customer satisfaction surveys to ensure end-user priorities and perceptions are clearly understood.
4. Design for quality, including a focus on change management, standardization, training, and end-user tools.
5. IT leadership in helping lines of business to create competitive advantages using new technology and services.

International Network Services conducts an annual online survey on service level management. The 1999 INS survey showed that, of those respondents who have implemented service level management, 63% were satisfied with their organization's SLM capabilities, versus only 17% in the previous year. Although satisfaction was increasing, the same survey indicated that 90% of the respondents felt improving their SLM capabilities was an important goal, the same proportion as the previous year. This is a good indication that service level management is, in fact, a continuous improvement process, with most IT organizations seeking better capabilities³.

The 1999 INS survey also found that organizational issues, including processes and procedures, were the most significant barriers to implementing or improving service level management. A number of other challenges related to the difficulties in defining, negotiating, and measuring Service Level Agreements. Also noted was the problem of justifying the costs and benefits to upper management.

The degree to which the IT department has implemented sophisticated service management varies with the perspective of the IT department. If the IT organization sees itself as a partner with the lines of business, responsible for helping those business units gain market advantage and improve profits, continuous service improvement comes more naturally. The nature of Service Level Agreements and management is also different for services the IT department provides internally and services it contracts for with external suppliers.

Management of Services Provided by the IT Department to the Corporation
Most IT departments evolve their service level management along two vectors:

• The IT department begins by setting internal goals and then extends them by negotiating and formalizing an agreement with the lines of business.
• The metrics used to determine quality of service and the scope of the service management processes increase and improve.

In all organizations, the users of services provided by the IT department have a set of expectations about the quality they want those services to deliver. Typically, managing service quality starts reactively: when users complain, the IT department attempts to fix the problem. When user complaints escalate and the IT department is continually in fire-fighting mode trying to meet user expectations, IT management typically seeks to put in place more proactive processes and procedures.

This generally starts with a review of current management practices and an attempt to establish a baseline of the quality of services being delivered by the IT
department. To understand service quality, a review is typically undertaken of reported problems, including the trend in the number of problem reports, the time to close problems, the number of backlogged problems, which organizations are most affected, and which IT functions are handling the greatest number of problems. Following the review and the establishment of a baseline, the IT department can then set a number of measurable, internal goals that will lead to service improvement.

Tip
To be most effective, the initial service quality goals should be simple, easy to understand, and clearly measurable. There should also be a link between achievement of the goals and incentives for the IT staff responsible for the service, such as a bonus component of their compensation.

Internal to the IT Department
The IT department typically begins the journey toward more disciplined service level management by examining internal management processes and setting some internal performance goals. In many cases, this starts with the procedure for reporting, documenting, tracking, and resolving problems.

Processes that become more formal include the method used to document and categorize problem priority and severity, how problem status is reported to affected users, and how problems are escalated. Associated goals include how quickly help desk calls are answered, how quickly the assigned support specialist will respond to the user reporting the problem, and how quickly the problem will be fixed (which normally varies according to the assigned priority or severity). Problem hand-off procedures between the various areas within the IT department are also formalized, together with project status reporting and processes for handling crisis situations. Along with these documented procedures and goals, the IT department might identify requirements for supporting infrastructure and tools, such as phone systems, a problem-tracking system, and an associated knowledge database.

The IT department might also set other internal goals around the delivery of new services to the organization, such as how quickly the department will respond to requests for new desktop equipment, equipment moves, or additional network connections.

Additionally, the IT department might implement certain system-performance-oriented goals that are based on observed behavior patterns. For example, in many organizations, the IT department attempts to keep utilization levels of distributed servers below 75% because the department has observed an increased frequency of user calls complaining of poor performance when system utilization goes beyond that level.

In most cases, these early steps establish a baseline of procedures and internal processes necessary to ensure consistency of approach to service delivery. This approach often helps the IT department understand the current level of service delivery and highlights which areas require the most improvement. Although these are an important first step, real service level management will not begin until the IT department establishes agreements with internal clients and external suppliers.

Note
Even after Service Level Agreements are established outside the IT department, internal agreements will still be required to define interfaces between the various areas within the IT organization, along with expected operating procedures and performance goals.

Agreements with the Lines of Business
There is a natural maturation of the types of agreements the IT department might enter into with the lines of business. Initially, these will be informal agreements with very ad hoc service quality measurement and reporting. This might result from a problem situation with the service delivered to a specific line of business or particular application. In order to examine the situation factually, the IT department must collect information relating to the problem, service availability, service degradation, notification procedures, and the problem resolution process.

Having captured this information, the IT department can establish and document a baseline of service quality and work from this baseline to improve service quality and user satisfaction. After this process is followed for one application supporting one line of business, it is natural to extend it to other lines of business and other applications and services. At this stage in the maturation process, a number of Service Level Agreements may be negotiated and documented. However, in many cases, these agreements address the most bothersome lines of business or the most visible applications from the perspective of the IT department.

The next stage in developing more effective service level management practices is to prioritize the lines of business, and the applications and services provided by the IT department, from a business value perspective.

Caution
Business-value-based Service Level Agreements might be a difficult concept for all lines of business to accept. They typically require senior-management-level and sometimes executive-level sponsorship to ensure complete buy-in by the lines of business.
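The problem review described under "Management of Services Provided by the IT Department to the Corporation" (trend in problem reports, time to close, backlog, most affected organizations, busiest IT functions), together with fix-time goals that vary by severity, can be sketched in Python. The Problem record, its field names, and the TARGET_HOURS values are hypothetical; a real baseline would draw on whatever problem-tracking system the IT department uses and on the goals it actually sets.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime
from statistics import mean
from typing import Optional

@dataclass
class Problem:
    """Hypothetical problem-tracking record."""
    opened: datetime
    closed: Optional[datetime]   # None while the problem is still open
    severity: int                # 1 = most severe
    org: str                     # affected line of business
    it_function: str             # IT group handling the problem

# Hypothetical internal goals: hours allowed to fix a problem, by severity.
TARGET_HOURS = {1: 4, 2: 24, 3: 72}

def baseline(problems):
    """Summarize a problem history into simple baseline measures."""
    closed = [p for p in problems if p.closed is not None]
    hours = [(p.closed - p.opened).total_seconds() / 3600 for p in closed]
    within = [h <= TARGET_HOURS[p.severity] for p, h in zip(closed, hours)]
    return {
        "reports_per_month": Counter(p.opened.strftime("%Y-%m") for p in problems),
        "mean_hours_to_close": mean(hours) if hours else None,
        "backlog": len(problems) - len(closed),
        "by_org": Counter(p.org for p in problems).most_common(),
        "by_function": Counter(p.it_function for p in problems).most_common(),
        "pct_within_target": 100 * sum(within) / len(within) if within else None,
    }

# Usage with a tiny, made-up history.
d = datetime
history = [
    Problem(d(1999, 3, 1, 9), d(1999, 3, 1, 11), 1, "Sales", "Network"),
    Problem(d(1999, 3, 5, 9), d(1999, 3, 7, 9), 2, "Sales", "Desktop"),
    Problem(d(1999, 4, 2, 9), d(1999, 4, 6, 9), 3, "Finance", "Network"),
    Problem(d(1999, 4, 9, 9), None, 2, "Sales", "Server"),
]
for key, value in baseline(history).items():
    print(key, value)
```

Recomputing the same measures each month turns the initial baseline into a trend, which is what the simple, clearly measurable internal goals recommended in the Tip can then be tracked against.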

Only half the participants in the 1999 INS survey had SLAs in place, versus 87% who stated that they had some form of service level management. This indicates that a large number of organizations were still in the ad hoc stage of managing service levels. The survey also indicates that the acceptance and implementation of Service Level Agreements will improve: 60% of the respondents planned to implement either initial or additional SLAs, at which point 65% of respondents will have at least one SLA in place. Interestingly, IT departments recognized the importance of mapping resources to the most critical applications and services, with 42% of respondents making this an objective for Service Level Agreements³.

The 1999 INS survey also examined the primary objectives for developing Service Level Agreements between the IT department and lines of business. The most prominent themes were setting and managing user expectations and assessing their satisfaction, understanding service priorities and mapping resources accordingly, and measuring the quality of the services provided by the IT department. Figure 6.2 provides additional detail on this aspect of the survey.

[Figure 6.2: bar chart "Primary Objectives for Developing SLAs"; horizontal axis "Respondents by Category" (%, 0 to 60)]
Set and manage expectations: 55
Define required performance levels: 48
Map resources to most critical services: 42
Measure customer satisfaction: 37
Measure quality of service: 36
Help justify and prioritize additional investment: 36
Relate technology to business objectives: 28
Prioritize services on basis of importance: 28
Measure impact of IT in business: 26
Measure efficacy of operational procedures: 25
Expand services: 15
Figure 6.2 The top objectives for developing SLAs between the IT department and internal organizations (1999 INS service level management survey).

In addition to negotiating Service Level Agreements, the IT department must measure service levels to understand the quality of services provided to lines of business. Most IT organizations begin by measuring the availability of services they provide, and then add capabilities to measure performance or the responsiveness of the service to end users. Again, there is a maturation process wherein the IT department typically begins by measuring and monitoring the availability of various technology components before it can measure the end-to-end availability of the service from the user's perspective.

This approach is also shown in the 1999 INS survey, where network availability was selected as very important by 90% of respondents. This is a technology view of availability. The second most important metric was customer satisfaction, which can be achieved only if the user experience is perceived to be acceptable. The third most important was network performance, followed by application availability and application response time. This supports the typical view of first addressing availability and then performance, while at the same time moving from a pure component view to one that centers on the application and the end-user experience.

Management of Services Provided by External Suppliers
The primary external services supplied to the IT department are the networking services provided by telecommunications companies or Internet service providers. External Service Level Agreements with these carriers are becoming important parts of the relationship between the IT department and the service providers. In a joint study conducted in 1999 by McConnell Associates and Renaissance Worldwide, 46% of IT managers said they had established external SLAs with their providers, and 80% of those agreements include penalty clauses for failure to deliver the required service quality⁶.

When managing the services provided by external suppliers, IT managers have a different set of objectives from the one they have when setting Service Level Agreements with lines of business. The 1999 INS survey shows the top three priorities of external Service Level Agreements to be

• Define required performance levels (58% of respondents)
• Measure quality of service provided by service providers (52%)
• Measure customer satisfaction (34%)

This is reflected in the results of the McConnell/Renaissance survey, which showed the most important service level metrics for wide area network providers to be

• Utilization rates (78% of respondents)
• Throughput (72%)
• Error rates (67%)
• Availability (66%)
• Response time (61%)
• Reliability (50%)
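The step from component availability to end-to-end availability described above can be illustrated with a short Python sketch. It assumes a hypothetical outage log, in minutes from the start of a 30-day month, and assumes the service is usable only when every listed component is up. Under that assumption, end-to-end downtime is the union of the component outages rather than their sum.

```python
def merge(intervals):
    """Merge overlapping (start, end) outage intervals, given in minutes."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged

def availability(outages, period):
    """Percentage of `period` minutes not covered by any outage interval."""
    down = sum(end - start for start, end in merge(outages))
    return 100.0 * (period - down) / period

PERIOD = 30 * 24 * 60          # a 30-day month, in minutes

# Hypothetical outage log: minutes from the start of the month.
outages = {
    "WAN link":   [(1000, 1060)],
    "app server": [(1030, 1090), (20000, 20030)],
    "database":   [(30000, 30045)],
}
for name, spans in outages.items():
    print(f"{name}: {availability(spans, PERIOD):.3f}%")

# End-to-end view: the service is down whenever ANY required component is down,
# so overlapping outages (WAN link and app server here) are counted only once.
all_spans = [span for spans in outages.values() for span in spans]
print(f"end-to-end: {availability(all_spans, PERIOD):.3f}%")   # end-to-end: 99.618%
```

Because the WAN and application server outages overlap by 30 minutes, end-to-end downtime is 165 minutes rather than the 195 minutes obtained by adding the component figures. This is one reason component reports alone do not describe the service the user actually experienced.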
These results are consistent with the 1999 INS survey that showed the top three UUNET Technologies offers SI,As fin• frame relay, dedicated circuits, and Internet
elements included in external network service provider SLAs to be access services. These cover network availability, latency, proactive outage notification,
and installation interval guarantees. Again, with each of these there are financial
• Network availability (77% of respondents) penalties if UUNET fails to meet the performance guarantees.
• Network performance (73%)
In summary, offering Service Level Agreements and managing the quality of the
• Network throughput (64%) services they provide is seen as a competitive necessity by the telecommunications
and Internet services providers.

Tip
Telecommunications companies and Internet service providers are becoming much more competitive Tip
IT departments should ensure that they have Service Level Agreements in place with external suppli-
and aggressive in trying to increase their respective market shares. If you don't have a formal service
ers, and that those agreements are monitored and regularly reviewed. Without appropriate service
level agreement with your supplier, you should be able to use the competitive pressures to negotiate
quality from these suppliers, it is extremely difficult, if not impossible, for the IT department to meet
one that includes penalty clauses for failure to deliver the required level of service.
its own Service Level Agreements with the lines of business.

For the most part, telecommunications service providers and Internet service providers are offering SLAs that guarantee high levels of network performance. To illustrate this, we will look at a sample of providers offering such agreements; however, note that this is meant to be only representative and not exhaustive.

AT&T offers SLAs for its domestic, international, and managed frame relay environments. AT&T provides SLAs in five areas: provisioning, service restoration time, latency, throughput, and network availability. Each of these areas has agreed-upon service levels, and if they are not met, AT&T credits customers for monthly charges and maintenance fees based on the terms outlined in each customer's contract.

GTE Internetworking offers SLAs for its Internet Advantage dedicated access customers. These SLAs include credits for network outages, the inability to reach specific Internet sites, and packet losses. GTE guarantees only its own backbone, but customers can test to identify packet losses or delays within that portion of the network. GTE also keeps performance statistics in a central database, which allows verification of customer claims of poor performance.

MCI WorldCom's networkMCI Enterprise Assurance SLA extends performance guarantees across all its data services. These include guarantees for availability, performance (such as transit delays), and network restoration time.

NaviSite Internet Services provides Internet outsourcing solutions and offers the SiteHarbor product family of service guarantees. These guarantees cover the database server, Web server, network infrastructure, and facility infrastructure. NaviSite includes penalties in the form of free service if the guarantees are not met.

Sprint's Frame Relay for LAN service is backed by performance guarantees for network availability and network response time. Sprint also offers performance guarantees for its Frame Relay for SNA service. In both cases, Sprint provides customers with financial credits if performance guarantees are not met.

Typical Agreements

Currently, most Service Level Agreements for services provided by the IT department are fairly simple and are more focused on specifying roles, responsibilities, and procedures. The 1999 INS survey found the top elements included in internal SLAs to be:

• Assignment of responsibilities and roles (64% of respondents)
• Goals and objectives (61%)
• Reporting policies and escalation procedures (61%)
• Help desk availability (59%)

Below these were more performance-oriented metrics, including network availability, network performance, application availability, and application response time.

The structure of most Service Level Agreements begins with a statement of intent, a description of the service, the approval process for changes to the SLA, a definition of terms, and identification of the primary users of the service. A number of procedures are described, including the problem-reporting procedures, the change management process, and how requests for new users will be processed. Typically, the schedule of normal service availability and the schedule of planned outages are specified.

Following this definition of roles and procedures, any specific performance objectives are specified. In most of today's SLAs, these goals tend to be limited to availability measures and to response and resolution times for reported problems. In some cases, additional measures and objectives are stated for application response times. Today, very few internal Service Level Agreements either specify the costs of services
or provide a cost allocation mechanism. Similarly, very few agreements specify penalties for the IT department if service level objectives are not met.

In general, these Service Level Agreements are early in the maturation cycle, but they establish a dialog between the IT department and the lines of business and catalog the services provided by the IT department. These Service Level Agreements also establish the procedures for the lines of business to interface with the IT department, and begin to set expectations as to the service levels that can be measured and delivered by the IT department.

Service Level Agreements with external suppliers tend to follow the standard offerings of the telecommunication service providers and Internet service providers. Although these are probably adequate for small and medium corporations, larger organizations might be better served by negotiating custom agreements.

Reporting Practices

Just as Service Level Agreements are somewhat immature at this time, so are the service level reporting practices of most IT departments. The majority of service level reports are very detailed, component-level availability and performance statistics that are incomprehensible to most recipients outside the IT department. Some organizations produce useful reports showing the number, severity, and type of problems reported by users of IT services, including response and resolution times. These help the IT department show its responsiveness to the lines of business and can be used to determine whether the problems are systematic, pointing to underlying technology or staffing issues that require attention.

Some organizations do manage to provide service level information by application, by location, and by user; however, unless the IT department has invested in a sophisticated toolset and employed a rigorous methodology, this information has to be generated manually.

Tip
Unless you can provide service level metrics in terms that users can relate to and that represent their experience, it might be best to disseminate the service level reports only within the IT department. Technology-oriented component reporting confuses the lines of business and reduces the credibility of the IT department.

Proactive reports of service difficulties and scheduled outages can be extremely effective in increasing user satisfaction and the credibility of the IT department. Some sophisticated IT departments are deploying technology solutions that allow them to notify users via Web-based applications, synthesized voice units, voice mail, and email.

Types of Products in Use

A Forrester Research 1998 report on Service Level Management included a survey on the use of management tools. The results indicated the following tool usage:

• Systems management tools (70% of respondents)
• Network management tools (60%)
• Applications/database management tools (35%)
• Management frameworks (28%)

The more widespread use of systems and network management tools, compared with application management tools, also explains some of the immaturity of service level management. Looking at the network or system in isolation does not provide the ability to measure or manage service from the user's perspective. The lines of business use applications that automate a business process or task. Hence, their concern is that the application be available and responsive to the users, who have no visibility into, or desire to know, the availability and performance of the underlying infrastructure, such as the network and the systems.

The 1999 INS survey examined the effectiveness of service level management tools. Figure 6.3 shows the survey respondents' assessment of tool effectiveness.

[Figure 6.3 bar chart, "Effectiveness of SLM Tools": average effectiveness rating by tool category (event correlation and aggregation; trouble ticketing/help desk; application management; fault notification/event handling; network performance management; network management platform). Rating scale: 1 = Not at all effective, 2 = Not so effective, 3 = Somewhat effective, 4 = Very effective.]
Figure 6.3 The effectiveness of SLM tools: 1999 INS service level management survey.

The perceived relative ineffectiveness of application management tools compared with network management tools also explains part of the difficulty IT departments have in implementing more mature service level management. As these products and the application management market mature, there will be a direct benefit to IT departments wishing to manage service levels from the application and end-user perspectives.
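The problem-report summaries described under Reporting Practices (number, severity, and type of problems, with response and resolution times) are easy to produce once tickets are recorded consistently. The following is a minimal sketch, assuming a hypothetical ticket record with opened, first-response, and resolved timestamps; real trouble-ticketing products store this data in their own formats.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import mean


@dataclass
class ProblemReport:
    """Hypothetical trouble-ticket record; real help-desk tools differ."""
    severity: int          # 1 = most severe
    category: str          # problem type, e.g., "network" or "application"
    opened: datetime
    responded: datetime    # first response to the user
    resolved: datetime


def _minutes(delta: timedelta) -> float:
    return delta.total_seconds() / 60.0


def summarize_by_severity(reports):
    """Count problems per severity, break them down by type, and average
    the response and resolution times (in minutes)."""
    groups = defaultdict(list)
    for r in reports:
        groups[r.severity].append(r)
    summary = {}
    for severity in sorted(groups):
        rs = groups[severity]
        summary[severity] = {
            "count": len(rs),
            "by_type": dict(Counter(r.category for r in rs)),
            "mean_response_min": mean(_minutes(r.responded - r.opened) for r in rs),
            "mean_resolution_min": mean(_minutes(r.resolved - r.opened) for r in rs),
        }
    return summary
```

Grouping by severity first keeps the report in terms the lines of business can follow; a per-type or per-location breakdown can be added the same way.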

A more detailed look at service level management products is provided in Chapter 7, "Service Level Management Products."

Summary

Today's service level management practices in most IT departments are still immature; however, many organizations are improving their abilities significantly. Because service level management is a continuous process, the maturation of the discipline and industry will not occur overnight, but will be a gradual process. Many organizations are investing in management tools, and are putting Service Level Agreements in place with the lines of business they serve, as well as with external providers of services to the IT department. These initiatives support a more sophisticated approach to service level management and improve the quality of services provided by the IT department.

References
1. META Group, Service Management Strategies Delta, 10 February 1999, File:754
2. META Group, Service Management Strategies Delta, 30 April 1999, File:778
3. Rick Blum, Jeffrey Kaplan/International Network Services, INS 1999 Survey Results - Service Level Management, 10 May 1999
4. The Forrester Report, IT Pacts Beyond SLAs, April 1999
5. The Forrester Report, Service Level Management, Volume 15, Number Four, February 1998
6. Tim Wilson/InternetWeek, "Service Level Management: Build Stronger External Bonds," 10 May 1999
7. Enterprise Management Associates, Service Level Management Market Research Study, 30 November 1998

CHAPTER 7

Service Level Management Products

The market for management tools displays an interesting phenomenon. Periodically, a new area of interest emerges and becomes the hot topic for several months or, in rare cases, even a few years. When a new hot topic emerges, there is usually a stampede among companies in the vendor community to address the new market opportunity. Some companies race to develop and deliver new products. Others will tweak and refine their existing products. Still others will simply change their marketing materials to slant them toward the new area. This approach is sometimes legitimate; that is, the company's product really does meet a need in this emerging market space. Unfortunately, there are other cases in which the vendor claims the product meets needs related to the hot topic even though it does not.

What makes a topic "hot"? Changes in technology are frequently responsible for moving a topic to the front burner. That is because new technologies are exciting, interesting, and usually make possible things that heretofore had been impossible.

New products can generate a "buzz" that leads to an area becoming a hot topic. Also, a shift in user interests will sometimes be the driver for interest in a particular arena. Sometimes press coverage can become a driver independently of any of these factors, or it can be fueled by these factors. Vendor publicity can also become a driving force in its own right, as well as lead to increased press coverage.

Today, SLM is one of the latest hot topics. Predictably, there has been a flood of products, from new and established companies, aimed at this market segment. However, with SLM, there is a fundamental problem: there is no clear definition of terminology. Therefore, vendors are free to create their own definitions, ones that include their products in the domain of service level management. Unfortunately, this plethora of definitions has created confusion within the user community.

In this chapter, we will provide a framework for classifying and assessing SLM products. This will enable managers to better decipher the confusing array of products offered for SLM. And hopefully, it will give managers the means to find SLM solutions that meet their organization's particular requirements.

We will use our own classification system to scope out SLM products. Keep in mind, however, that it's possible for SLM tools to fit into more than one category. And when given the chance, most vendors will insist that their products "do it all." Still, for our purposes, SLM products can be grouped into the following broad functional categories:

• Monitoring
• Reporting
• Analysis
• Administration

Monitoring Tools

When a Service Level Agreement has been negotiated, it is necessary to capture data about the actual quality, or level, of service delivered. To do this, managers need to use tools to monitor the performance of the service. These monitoring tools comprise software or hardware that retrieves data about the state of the underlying components driving the service. This data is stored in a database for future reference or interpreted and put into reports. (Reporting tools will be discussed in the next section.)

Basic Strategies

Monitoring tools collect data in two ways. In the first approach, primary data collectors capture data directly from the network elements underlying the service (bridges, routers, switches, hubs, and so forth). Some also gather input from software programs that affect overall service availability (applications, databases, middleware, and the like).

Most primary data collectors are not dedicated to SLM. Instead, they are typically management systems that gather data for a range of purposes, one of which is SLM. For example, Hewlett-Packard's OpenView Network Node Manager (NNM) monitors an enterprise network for a range of parameters, including network availability. Although this data can be used for SLM reporting, it also aids troubleshooting by tipping off network operators about degraded performance. HP provides a separate SLM reporting package that works with NNM. That product, called Information Technology Service Management (ITSM) Service Level Manager, also takes input from other HP applications.

Another class of product, secondary data collectors, has appeared (see Figure 7.1). These tools do not need to communicate directly with the managed environment (although some of them are able to do so, if necessary). Instead, they extract data from other products that are primary data collectors. Tools such as Luminate's Service Level Analyzer fit this category. Infovista's Vistaviews is another example. This product retrieves data from third-party management applications, including BMC Patrol and Compaq Insight. Also, it comes in versions capable of interacting directly with routers, Ethernet switches, and WAN gear. Secondary data collectors like Service Level Analyzer offer a means of extending management platforms from different vendors for SLM monitoring, while filling in where management systems might be absent. This approach offers a number of advantages. First, it eliminates the need for redundant agents throughout the distributed computing environment. Second, redundant management traffic is eliminated by relying on original sources. Third, this approach eliminates the need for redundant storage of large quantities of data.

[Figure 7.1 diagram. Labels: Primary data collector; Secondary data collector; Inventory; Fault; Network; Systems; SLM.]
Figure 7.1 Primary and secondary data collectors.
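The key property of a secondary data collector is that it reuses data the primary collectors already hold instead of deploying its own agents. A minimal sketch of that idea, assuming hypothetical export formats (a device up/down table from a network manager and a host response-time table from a systems manager) and a hand-built map from each service to the devices and hosts it depends on:

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class ServiceView:
    service: str
    network_ok: bool
    worst_response_ms: Optional[float]  # None if no server data was found


def build_service_view(
    network_status: Dict[str, bool],
    server_response_ms: Dict[str, float],
    service_map: Dict[str, Tuple[List[str], List[str]]],
) -> List[ServiceView]:
    """Combine data that primary collectors have already gathered.

    network_status:     device -> up/down, as exported by a network manager
    server_response_ms: host -> response time, as exported by a systems manager
    service_map:        service -> (devices it depends on, hosts it runs on)
    All three inputs are hypothetical export formats. No agents are polled
    here; the secondary collector only reuses existing data.
    """
    view = []
    for service, (devices, hosts) in service_map.items():
        # A device missing from the export is treated as down (conservative).
        network_ok = all(network_status.get(d, False) for d in devices)
        times = [server_response_ms[h] for h in hosts if h in server_response_ms]
        view.append(ServiceView(service, network_ok, max(times) if times else None))
    return view
```

Treating a missing device as down is a deliberate choice: it makes gaps in the primary collectors' coverage visible in the service view rather than silently reporting the service as healthy.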
Knowing whether a product is a primary or secondary data collector helps determine how an SLM monitoring tool fits a particular environment. To get a better sense of actual requirements, however, it's important to gauge how products fit the manager-agent model.

Both primary and secondary data collectors are designed according to this engineering scheme (see Figure 7.2), in which each device or software program uses an integral mechanism called an agent to collect data about its status. This information is forwarded to a central application called a manager, usually in response to a poll signal or request. Many agents in a network can be set up to communicate with one or more managers. For a comprehensive description of the manager-agent model and its implementation in various products, see Appendix F, "Selected Vendors of Service Level Management Products."

[Figure 7.2 diagram: a manager exchanging status information with agents (a) in servers and workstations.]
Figure 7.2 The manager-agent model.

The manager-agent model can help prospective buyers determine what they need to look for in an SLM monitoring tool. If a company already has an SNMP manager such as HP's OpenView NNM, for instance, all that might be needed is an SLM package capable of using NNM data. That's because most network devices today are shipped with integral SNMP agents, ready to send data to any vendor's standard SNMP manager on request.

In other cases, special agents will be needed to furnish additional information for SLM reports. Suppose, for example, that a catalog retailer needs to track how well its call center has performed in a given month. Data will be required about the functions of the CSU/DSUs, routers, and network connections that bring customer orders into the call center. SNMP agents embedded in those devices, plus standard RMON probes, can furnish this information. And if the retailer has HP OpenView NNM installed, the data can be easily captured.

But the retailer's IT department also needs to know how quickly orders are processed after they're taken over the phone. To obtain this input, software agents will need to be placed on the call center's database server. Because most database servers don't come with SNMP agents installed, the retailer will need to purchase an application that includes special agent software. In this example, another OpenView product, HP's IT/Operations, could be purchased to track the server database via agents bundled with the product. Data from IT/Operations could then be combined with NNM data for use in HP's ITSM Service Level Manager.

Our hypothetical retailer might take a different tack if OpenView NNM weren't available. If BMC Patrol were installed, for instance, server agents would already be in place. The problem then would be to purchase an SLM monitoring tool to capture data about the underlying network. The retailer could choose secondary SLM data collectors like Quallaby's Proviso to add data about routers and other gear to the system information from Patrol.

In some instances, IT will need to obtain data for SLM from a legacy application, device, or system that does not have its own standard SNMP agent. In this case, IT personnel might have to build agents that can report either directly or indirectly into existing management solutions. This requirement is not as difficult to meet as it might seem. It is relatively easy to construct an SNMP agent using object modeling via Visual Basic or Visual C++. If need be, reporting tools and alerting programs also can be constructed or augmented in a relatively straightforward fashion. Keep in mind that it will be easier to augment SLM tools that support open, well-documented databases and formats.

Data Capture

SLM monitoring tools use a range of methods to capture data. In the implementations previously described, agents are used to check on the devices and software underlying a network service. Other techniques include the use of probes and simulation. Take a look at each of these methods, along with their key benefits and drawbacks.

Agents

Most products classed as primary data collectors (including HP OpenView NNM and IT/Operations, or Tivoli Netview and Tivoli Management Framework) use agents to retrieve information about the hardware and software components that support a particular service. This data can then be forwarded to SLM tools from the platform vendor or third parties.
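The manager-agent pattern, including the kind of home-built agent suggested for legacy systems, can be sketched in a few classes. This is an in-process simulation of the poll-and-respond flow, not an SNMP implementation: a real agent answers SNMP requests for MIB objects over the network, and the class and method names here are hypothetical.

```python
from abc import ABC, abstractmethod
from typing import Callable, Dict


class Agent(ABC):
    """Collects status for one device or program and answers polls."""

    def __init__(self, name: str) -> None:
        self.name = name

    @abstractmethod
    def get_status(self) -> Dict[str, object]:
        """Return the agent's current view of its device or program."""


class DeviceAgent(Agent):
    """Simulated stand-in for an agent embedded in a router or switch."""

    def __init__(self, name: str, up: bool = True) -> None:
        super().__init__(name)
        self.up = up

    def get_status(self) -> Dict[str, object]:
        return {"up": self.up}


class LegacyAppAgent(Agent):
    """Hypothetical home-built agent for a legacy application without an
    SNMP agent: it wraps a check function written by IT staff."""

    def __init__(self, name: str, check: Callable[[], float]) -> None:
        super().__init__(name)
        self.check = check  # returns a response time in milliseconds

    def get_status(self) -> Dict[str, object]:
        try:
            return {"up": True, "response_ms": self.check()}
        except Exception:
            return {"up": False}  # the check failed, so report the app as down


class Manager:
    """Central application that polls every registered agent."""

    def __init__(self) -> None:
        self.agents: Dict[str, Agent] = {}

    def register(self, agent: Agent) -> None:
        self.agents[agent.name] = agent

    def poll(self) -> Dict[str, Dict[str, object]]:
        return {name: agent.get_status() for name, agent in self.agents.items()}
```

For example, registering a DeviceAgent and a LegacyAppAgent whose check function raises an error makes Manager.poll() report the router as up and the legacy application as down. Because the manager depends only on the Agent interface, new kinds of agents can be added without changing it, which mirrors the mix-and-match benefit of standard agents.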

Several types of agents can be used with SLM tools. Hardware agents comprise software or firmware embedded in network devices that retrieve status information via SNMP or proprietary commands. Nearly all devices in today's corporate environments ship with embedded SNMP agents. All devices from Cisco, for instance, ship with integral agents that use special commands to capture information about device status. This data is converted within the agent to SNMP for transmission to local or remote manager applications from Cisco and other vendors.

Another type of agent important to SLM products is the RMON agent, which consists of code installed at the network interface to analyze traffic and gauge overall network availability. Many RMON agents are packed into standalone boxes called probes (see the next section for more on these). Alternatively, RMON agents are sold as firmware embedded in switches, hubs, and network interface cards. All major hub and switch vendors include RMON agents in their wares.

Because SNMP agents aren't ubiquitously installed on servers or within software packages, many SLM products come with specially designed agents. These agents consist of code that resides on a server and taps log files for information on the performance of databases, network applications, middleware, or the operating system itself. BMC Software offers Patrol agents for a range of distributed databases as well as mainframe environments. These agents report back to BMC's Patrol manager, which in turn is accessible to a range of third-party applications from vendors who've partnered with BMC.

The chief benefits of agent technology are its flexibility and support for mixing and matching products from different vendors. Software agents also can be used to extract data from a range of sources, as previously noted. Agents also are versatile: Any standard SNMP agent works with any SNMP manager, and vice versa. Even proprietary agents can be integrated with third-party managers, as long as the vendors are willing to cooperate.

On the downside, agent technology can add a processing burden to networks and systems if it is not well planned. Communication between agents and managers is usually based on the client/server model, in which data is exchanged between the two entities over a network. When SNMP is used, this means that packets (normally UDP datagrams) are exchanged back and forth across the TCP/IP network. This traffic can tax bandwidth on network links set up to handle mission-critical applications. Congestion can result, especially in large networks, in which many devices are "talking to" a central manager console. One way to avoid congestion is to set up the manager console to poll agents only at specified intervals, or to retrieve only certain types of data from the agents, such as critical alarm information.

Poorly designed SNMP and proprietary agent software also can burden a host computer, causing slowdowns in response time. However, broad experience by IT organizations in many industries over several years, together with computer models, supports the conclusion that only in an extreme worst-case situation can the traffic between managers and agents be expected to exceed 1% of the available bandwidth.

The growth of the Internet has prompted many vendors to investigate Web-based techniques, such as Java applets and XML (eXtensible Markup Language), as an alternative to traditional manager-agent communications. Some products, including Trinity from Avesta (a company recently purchased by Visual Networks), e-Specto from Dirigo, and FrontLine e.M from Manage.com, put these techniques to work monitoring the availability and health of e-commerce services. Using the Web saves bandwidth and system resources and eliminates the need to set up multiple consoles for management from remote locations. Instead, managers can obtain SLM data from any location via Web browsers. Today, most Web-based management products rely on proprietary protocols and interfaces, but ongoing work by the Distributed Management Task Force (DMTF) is aimed at creating formal standards for Web-based management.

Note
Agent software embedded in hardware devices and network servers is used to gather status and configuration data for transmission to central management consoles. Agent technology has been standardized by the IETF using SNMP, which allows third-party platforms and applications to gather input from multiple sources in the network, regardless of vendor or brand.

Probes, Packet Monitors, and CSU/DSUs

Many SLM products rely on specialized applications or devices to retrieve data on network performance. In this category are probes, packet monitors, and CSU/DSUs equipped with specialized monitoring capabilities. Each of these products passively scrutinizes packets at the network interface and parses them in order to retrieve information on latency and throughput. Probes and packet monitors also analyze flows for insight into the specific applications traversing the net, as well as their overall quality.

Probes consist of standalone hardware devices containing RMON and RMON II agents along with packet parsing and filtering engines similar to those used in protocol analyzers. Apptitude and Netscout Systems are examples of probe vendors. Packet monitors are similar, although often they don't require dedicated hardware but are sold as management applications; the Ecotools product from Compuware and Application Expert from Optimal Networks are examples.

Probes and packet monitors obtain accurate, multi-layer performance data through direct contact with network traffic. On the downside, these products are usually limited in scope and scalability, and their focus is strictly on network traffic; they cannot capture data about the status of devices or specific databases or applications.
Probes are limited in other ways too. A probe designed for lower-speed WAN services, for instance, won't track traffic operating above rates of 2.048 megabits per second (Mbps). And the number of links a probe can handle is limited to its physical port capacity: As the number of monitored links increases, more probes need to be purchased.

Note
Probes and packet monitors use agents embedded in packet-filtering devices to track and report the status of network traffic as it moves over LAN or WAN connections. The RMON MIB standardizes this data for compatibility with any SNMP console.

A range of CSU/DSU vendors have entered the SLM market by adapting their equipment for use as SLM monitoring tools. ADC Kentrox, Adtran, Digital Link, Eastern Research, Paradyne, Sync Research, Verilink, and Visual Networks all fit this category. Each of these vendors offers a series of CSU/DSUs that keep track of physical-layer performance while divvying up WAN bandwidth among enterprise segments. These products are comparatively inexpensive, and they can be a convenient solution for organizations that want to press existing equipment into the service of SLM monitoring. On the downside, these units only track the performance of WAN links. They don't monitor routed segments. And they might not work on international networks, although most of the CSU/DSU vendors furnish standalone probe versions of their monitors for use overseas.

Note
Some WAN CSU/DSUs come with integral agents that track the physical-layer performance of WAN connections and apply this data to SLM reports.

Simulation

SLM vendors rely on a range of data capture techniques in addition to traditional agents and probes. Some vendors use simulated application flows, for instance, to test the fitness of network connections. FirstSense, Ganymede Software, Jyra Research, Mercury Interactive, and NextPoint Networks take this tack, in which simulated transactions are sent over an IP intranet in order to get consistent readings on response time and availability. In some instances, RMON and SNMP data is added to the mix to fill out the network performance profile.

Besides offering a consistent view of network application performance, simulated transactions offer a way to view end-to-end response time, something that can't be measured by tools that gauge latency alone. On the downside, these tools add traffic to the network, a fact that concerns many network managers. Simulation tools also are restricted to gauging client-server response time. They cannot furnish baseline information about the performance of specific transactions within applications. That type of granular information requires the use of software agents such as those from BMC Software, Candle, Landmark, or Luminate.

Note
Some vendors provide software that is capable of simulating specific types of network traffic or transactions over LAN and WAN links. The simulation tools furnish a way to test multiple connections in a uniform way.

SLM Domains

Effective use of SLM monitoring calls for a skillful application of the basic strategies and data capture techniques previously outlined. But just having the tools isn't enough; a manager needs to apply the tools at the right times in the right places. Just as a carpenter with wood and a hammer but no nails can't build anything, SLM tools won't deliver good information if they're not used in the proper combinations. And the right mix of tools differs with each organization.

One step toward success is to examine the portions of a network that need to be monitored, and then put tools in place to generate the needed SLM data. In general, networks can be described as having the following components or domains:

• Network devices and connections
• Servers and desktops
• Applications
• Databases
• Transactions

Taken together, these domains control the quality of network services. An accounting department, for instance, can't run effectively unless all personnel (including debit and credit professionals, tax accountants, the controller, and the CFO) are properly connected over the intranet, which in turn requires switches, hubs, and routers to be in working order. Likewise, the servers and workstations used by the staff need to be configured correctly. But no IT manager needs to be told that response time can slow to a crawl even if the underlying devices and servers are working. Applications can be awkwardly designed, databases clogged with useless entries, and transactions poorly structured.

To get the best SLM information, it is usually, but not always, necessary to install products to monitor each domain. Getting the best read on the quality of the accounting services in the previous example requires tools to deliver input on network availability and the response time of applications. If multiple sites are involved, a
probe might be used to track the quality of WAN links furnished by ;i crrier. number of management systems dedicated to performance monitoring and report-
Based on the network design and ongoing performance input, it night be impor- ing also support SLM, including Keystone VPNview from Bridgeway, ProactiveNet
tant to adjust the level of monitoring, increase the number and quality of tools in Watch from ProactiveNet, and Netvoyant from Redpoint Network Systems. Each
one domain, or consolidate tools across others. of these products can take the place of a primary data collector to feed its own
To know how to best coordinate a solution that fits a particular organization's SLM reporting tools. They can be used where no primary data collector is in
requirements, it's important to know the basic functions of each domain, the tools place, or where there is a primary data collector in a central location that needs
typically used to monitor those functions, and where and when they're applied. to be augmented at remote sites.
Take a closer look at each of the domains in turn (see Figure 7.3) and examine Deciding which SLM monitor to use depends in part on the size and design of
how the basic strategies and the data capture techniques that we've already covered are applied in each. Examples of currently available products will be furnished for each domain.

Figure 7.3 Interdependent SLM Domains.

Network Devices and Connections

The quality of the underlying network is key to SLM monitoring. After all, no networked service or e-business application can operate without reliable physical connectivity. Monitoring a network requires keeping track of whether each device is operating, and how well all components are working in concert. Getting this data calls for a two-pronged approach that includes tracking the availability of individual devices and monitoring the performance of network connections. Typically, performance data includes information about the throughput, or quantity of delivered packets, and the latency or delay between devices on a particular connection.

Availability and performance data can be obtained by tapping standard SNMP and RMON/RMON II agents located in hubs, switches, routers, and other gear. As previously noted, this can be done via primary data collectors such as HP OpenView or Tivoli Netview, both of which, like other platforms, support their own SLM tools as well as those of third-party vendors. Alternatively, a growing

the network. In large nets, it might be practical and economical to simply extend a platform like OpenView to include SLM monitoring by using tools from the platform vendor. It might also make sense to deploy the platform's scalability options. HP, Tivoli, and other vendors of SNMP management platforms furnish software called a midlevel manager that gathers data at specific segments or sites and sifts it for selective transmission to a central console, reducing the amount of bandwidth and processing required to monitor multiple sites.

Most mission-critical networks these days rely to some extent on carrier connectivity. To keep track of how well the carrier is contributing to service levels, probes may be deployed at specific WAN links (see Figure 7.4). Probes can be polled just like any other SNMP or RMON device. More in-depth data, however, can be obtained by using the application that is sold with the probe. In many instances, this app can be set up with a bit of tweaking to transmit data to OpenView or to a third-party SLM tool.

Figure 7.4 Typical configuration: Network monitoring for SLM data.
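The section on network devices and connections describes a two-pronged approach: tracking the availability of each device and the throughput and latency of each connection. It also describes the midlevel manager's job of sifting site data before forwarding it to a central console. Both ideas can be illustrated with a short Python sketch. The sketch assumes the poll results and byte counts have already been collected by an SNMP platform or probe, so no SNMP library is used. `PollSample`, the function names, and the 99.5 percent target are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class PollSample:
    """One status poll of one device (hypothetical record layout; real data
    would come from an SNMP platform or probe)."""
    device: str
    reachable: bool  # True if the device answered the poll


def availability_by_device(samples):
    """Return the percentage of polls each device answered."""
    answered = defaultdict(int)
    total = defaultdict(int)
    for s in samples:
        total[s.device] += 1
        if s.reachable:
            answered[s.device] += 1
    return {dev: 100.0 * answered[dev] / total[dev] for dev in total}


def connection_performance(delivered_bytes, interval_s, latencies_ms):
    """Throughput in bits per second and mean latency in ms for one connection."""
    throughput_bps = delivered_bytes * 8 / interval_s
    mean_latency_ms = sum(latencies_ms) / len(latencies_ms)
    return throughput_bps, mean_latency_ms


def sift_for_console(availability, target_pct):
    """Midlevel-manager-style filter: forward only devices below the target."""
    return {dev: pct for dev, pct in availability.items() if pct < target_pct}


if __name__ == "__main__":
    polls = [
        PollSample("router-1", True), PollSample("router-1", False),
        PollSample("router-1", True), PollSample("router-1", True),
        PollSample("hub-2", True), PollSample("hub-2", True),
    ]
    avail = availability_by_device(polls)
    print(avail)                          # {'router-1': 75.0, 'hub-2': 100.0}
    print(sift_for_console(avail, 99.5))  # {'router-1': 75.0}
    print(connection_performance(1_000_000, 10, [20, 30, 40]))  # (800000.0, 30.0)
```

Forwarding only the exceptions, rather than every sample, is what lets a midlevel manager reduce the bandwidth and processing needed at the central console. A real deployment might also forward periodic summaries for historical reporting.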
Servers and Desktops

As the user interface to mission-critical applications, servers and desktops are key to SLM success. If response time is poor or there is a failure of server transmission, service has failed, even if network devices are running properly.

But servers and desktops are often missing from the view of SLM monitoring tools. Although specific SNMP MIBs, such as the Host Resources MIB, have been defined to track some elements of computer systems in a standard way, these aren't typically used to track performance for SLM. Instead, gathering data on the relevant criterion, response time, usually requires proprietary agents or simulation tools. Both types of products measure the time it takes applications to traverse the net, either from server to desktop or vice versa, and compare this data to established service levels. Some products also compare response times for different types of applications and indicate how much network latency contributes to the overall measurement.

Most SLM monitoring tools for desktops and servers keep track of response time by issuing simulated transactions from desktop to server, or by using agents placed at either location to gauge server or desktop responses to live application requests. Examples of products in this category include FirstSense Enterprise from FirstSense Software, VitalSuite from Lucent, S3 from NextPoint Networks, and ResponseNet and ResponseWeb from Response Networks.

Products like these show how well service levels are being met from the end user's perspective. But many are restricted in the range of applications they simulate or monitor: Although most support Internet applications such as POP3, SMTP, Telnet, FTP, and Domain Name Service, not all track Web transactions via HTTP. Also, many are restricted to using SQL queries associated with particular applications like Oracle, PeopleSoft, or SAP R/3. And all of them require some software to be deployed at the desktop, which can make them ungainly to maintain in large networks. Products that ease this burden include NextPoint, which uses Java applets to control simulated transactions, and FirstSense and Response Networks, which allow agents to be distributed over the Internet.

Applications, Databases, and Transactions

Tools that furnish in-depth details on the performance of applications, databases, and transactions are required wherever SLAs depend on software performance. They also can be useful where applications are complex but underlying networks are stable. Because these software elements are all monitored using the same techniques—specifically, specialized agents or packet monitors—it makes sense to group them into a single category when considering product selection.

The agents used for this domain differ significantly from those used to monitor response time for servers and desktops. Although these agents still measure the time it takes a server to respond to a desktop request, they also monitor the health and functionality of the application's inner workings. They do this by residing inside the software itself, monitoring the keystrokes, commands, and transactions deployed by the service. They can identify applications that send too many requests to the server, or highlight those that use transactions that are awkwardly constructed.

Because they're so detailed, these agents are specially designed to keep tabs on specific brands of apps or databases. The Collaborative Service Level Suite from Envive Corp. and Luminate for SAP R/3, for instance, track SAP R/3 databases. ETEWatch from Candle Corp. monitors the response time of Lotus Notes, PeopleSoft, and SAP R/3 applications. Empirical Director from Empirical Software gathers performance data in Oracle databases as well as a range of operating systems. Smartwatch from Landmark Systems can be set up to track the performance of a variety of middleware packages, operating systems, and applications. And BMC Software furnishes a comprehensive framework suite encompassing Patrol, Best/1, and other packages for managing all these elements.

Application agents differ in their monitoring orientation: ETEWatch and Smartwatch, for instance, monitor the performance of applications from the workstation perspective, whereas Envive and Luminate take the response time view from the server. Which view is more valid is generally a matter of opinion. Proponents of the workstation approach claim their wares gauge end-to-end response times, whereas vendors of the server approach say their agents are easier to maintain because they don't have to be placed on desktops throughout the network.

Monitoring specific transactions within applications represents the most sophisticated type of application monitoring. It also demands the highest level of expertise from the user. That's because products like Smartwatch call for users to select the transactions they want to monitor. This calls for in-depth knowledge of how applications are structured, as well as a sense of the specific transactions that require most attention. For most organizations, a product like Smartwatch will need to be run by a programmer.

Packet monitors can furnish granular information about software performance by analyzing application traffic. Optimal's Application Expert, for example, depicts specific application threads using color-coded graphs; managers can visually pick out bulky command sequences that might be holding up response time.

A key consideration in choosing a software-monitoring tool is its ability to integrate with other vendors' wares, particularly vendors that offer other SLM solutions. BMC Software, for example, has integrated its tools with HP OpenView, Tivoli, and a range of other third-party management platforms and applications. And vendors such as Compuware and Envive also have made integration with platforms and frameworks a priority. No SLM shopping expedition is complete without a thorough check of a vendor's partnerships and integrated solutions.
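The servers-and-desktops discussion above says simulation tools time transactions and compare the results to established service levels. It also says some tools show how much network latency contributes to the total. A minimal Python sketch of that comparison follows. The `transaction` callable is a hypothetical stand-in for whatever scripted request a real tool would issue. The split of each response into total and network time is assumed to be supplied by an agent. All names are illustrative.

```python
import time


def time_synthetic_transaction(transaction, runs=5):
    """Run a scripted transaction several times; return elapsed seconds per run."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        transaction()
        timings.append(time.perf_counter() - start)
    return timings


def response_time_report(samples, target_s):
    """
    samples: (total_s, network_s) pairs for one application, where network_s is
    the portion of each response an agent attributed to network transit (assumed).
    Returns (percent of responses meeting the target, network share of total time).
    """
    if not samples:
        raise ValueError("no samples")
    within = sum(1 for total, _ in samples if total <= target_s)
    pct_within = 100.0 * within / len(samples)
    network_share = sum(net for _, net in samples) / sum(total for total, _ in samples)
    return pct_within, network_share


if __name__ == "__main__":
    # Stand-in workload; a real tool would drive a login, query, or web request.
    timings = time_synthetic_transaction(lambda: sum(range(10_000)))
    print(len(timings))  # 5
    samples = [(0.5, 0.1), (1.5, 0.9), (1.0, 0.5), (2.0, 0.5)]
    print(response_time_report(samples, target_s=1.0))  # (50.0, 0.4)
```

Here half the responses meet a one-second target, and network transit accounts for about 40 percent of the measured time. In this hypothetical case, that would point the investigation toward the servers as much as the network.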

Ultimately, a range of tools is available to support SLM for applications, databases, and transactions. But choosing solutions that fit requires in-depth scrutiny of a particular organization's SLA priorities, available expertise, and need for products that integrate with other parts of the overall SLM monitoring scheme.

Application Response Measurement (ARM): A Rising Phoenix?

It could be the best-kept secret of SLM: Application Response Measurement (ARM), a standard method of instrumenting applications for management. Initially created by Hewlett-Packard and Tivoli four years ago, ARM comprises APIs designed to be built into networked applications, enabling them to be monitored for a range of performance characteristics, including response time. HP and Tivoli provide free ARM software development kits, and BMC and Compuware support ARM in their application-management products.

Despite all this, ARM seemed, until recently, doomed to obscurity. Even the vendors that supported it weren't able to furnish customer testimonials. Some claimed ARM deployment was in the works but had taken a back seat to other projects, particularly Y2K updates. Other vendors said companies considered ARM so strategic that its use was often kept secret. In the meantime, no volunteers stepped forward to testify to successful ARM implementation.

But ARM's fortunes might be changing. This year, The Open Group, itself newly re-launched after suffering several years of second-class industry citizenship, has declared ARM an approved building block in its overall Open Group Architecture for making intranet services as reliable as dial-up voice networks. It remains to be seen whether the ringing endorsement of Open Group members like Compaq, Fujitsu, HP, Hitachi, IBM, NCR, Siemens, and Sun will make a difference in users' readiness to rewrite their applications with ARM. But coming under the Open Group umbrella will give ARM a boost by furnishing certification and testing for ARM implementations and promoting ARM use in large-scale enterprise software integration projects.

The FCAPS Approach

SLM can be approached not only by domain, but also by function. When doing this, the time-tested ISO model of management serves as a useful starting point. This theoretical approach, dubbed FCAPS for short, calls for five basic categories of tasks to be included in any comprehensive network management scheme:

• Fault management
• Configuration
• Accounting
• Performance management
• Security management

We will examine how each of these functions might incorporate SLM monitoring and reporting—and how commercial products can be used to fit the specific requirements.

Fault Management

These days, it's rare to find a network that isn't equipped with some form of fault-reporting software or hardware. The SNMP management systems of the early 1990s were focused primarily on reporting broken links and devices, and the descendants of these early OpenView and Netview systems remain in many organizations today.

Also in today's organizations are the techniques of fault reporting that originated ten years ago. The trouble is, yesterday's fault management systems are no longer able to meet the needs of today's burgeoning networks. The reason is sheer numbers: Larger and more complicated networks breed lots of alerts that can cause as many problems as they solve. When a router breaks, for instance, the management system will not only receive alerts from that device, but also from all the hubs, workstations, servers, and other gear that depend on that router for connectivity. Weeding through the resulting avalanche of alarms can delay troubleshooting and repair—resulting in missed service levels.

To cope with this, a new breed of product has emerged that works alongside standard SNMP managers, sifting their alerts and reporting only those the manager needs to see. Included in this category is Netcool/Omnibus from Micromuse, which lets managers gather and filter events from multiple management systems, including those supporting non-SNMP protocols. In effect, Netcool/Omnibus acts as a manager of managers, providing a single console in which selected events and alerts are displayed to streamline troubleshooting. Another group of products takes event filtering a step further, using built-in intelligence to identify the root cause of network problems from telltale patterns of alerts. The Incharge system from System Management Arts, Eye of the Storm from Prosum, and tsc/Eventwatch from Tavve Software fit this category.

Configuration

Ideally, successful SLM includes the ability to control as well as monitor network devices and connections. But this capability is only just starting to emerge, as vendors add traffic-shaping capabilities to their SLA monitoring tools. The Wise IP/Accelerator hardware/software product from Netreality, for example, combines a traffic monitor and shaper with SLA reporting tools. This lets managers assign bandwidth to applications according to priority. Mission-critical e-commerce applications, for instance, are run at high, guaranteed rates, whereas internal email might get "best effort" status if congestion occurs. There are other vendors with offerings in this space, although many do not have integral SLA reporting tools. Packeteyes from SBE, for example, combines an access router and firewall with software that assigns and controls application bandwidth. There are also software-only products for bandwidth management: The Enterprise Edition software suite from Orchestream, for instance, enforces prioritization of traffic across switches and routers from Cisco, Lucent, and Xedia. On the downside, a lack of standards for policy management has up to now kept products like Orchestream's limited to specific vendors' wares.

Accounting

A key aim of SLM is to keep costs in line. Ironically, products that track the usage of enterprise network services have only recently emerged. These tools, including Netcountant from Apogee Networks, IT Charge Manager from SAS, and Telemate.net from Telemate Software, tap RMON probes and log files in routers and applications in order to tally the amount of bandwidth consumed by a particular application, department, or individual. This data is matched up to a dollar value and placed in a bill. Alternatively, managers can use the data to populate financial reports or forecast the cost of upcoming additions to networking hardware and software.

Because these products are still so new, they haven't reached their full potential yet. It's conceivable, for instance, that by linking these accounting applications to Web load balancers, switches, and bandwidth prioritization gear, IT and network managers could include cost parameters along with network performance in future SLAs. An IT department might, for instance, be able to keep track of how much of a costly leased line or virtual private network a particular group has used in a given month. And if usage threatens to exceed budgeted funds, the department could be notified. Likewise, if more bandwidth is required, a manager can test out various configurations before signing on the dotted line.

Performance

If there is a place of honor among FCAPS functions, performance management can claim it. With few exceptions, most SLM monitoring tools discussed up to this point can be classified as performance management systems because their main purpose is to capture data about how well various portions of the network are performing in terms of uptime, response time, throughput, packet latency, and the like. Unfortunately, performance management has taken on the "hot topic" status we mentioned at the start of this chapter. Vendors whose products contribute only part of what's required to manage performance—such as event reporting, probes, or protocol analyzers—are "bellying up to the bar" with claims to do the whole job.

In general, products that do performance management share the following characteristics:

• SNMP device management—the capability to gather information from SNMP agents in network devices and systems
• RMON/RMON II or probe links for traffic monitoring—the capability to track the overall performance of network connections
• Response time measurement—the capability to gauge how well applications, databases, and transactions are performing over the intranet
• Real-time event filtering—the capability to generate warnings and alerts when devices break or traffic conditions deteriorate
• Historical trend analysis—storage of performance data over time in order to generate periodic graphical representations of network health and status

A growing number of management systems do provide all the previous features and functions, laying claim to being a new breed of performance management platform. Included are systems like Avesta's Trinity, Loran Kinnetics, Manage.com's Frontline, and NextPoint's S3.

Security

Increased use of the Internet and carrier services in corporate networks has made management of security a full-time job in many networks. Keeping passwords up-to-date, making sure that access is properly assigned, and monitoring software for viruses are just a few of the tasks required to ensure that today's larger, more public networks guard business secrets and avoid resource tampering. A compelling argument for considering security as part of the service level management equation is quite simple. If the security of an environment is compromised, the availability and/or performance of the service can be compromised. Some Service Level Agreements include specific metrics regarding the security of the environment and the data contained therein.

Several vendors of secondary data collectors furnish comprehensive security application suites along with SLM. Unfortunately, many of these products aren't directly integrated with the platform. Exceptions include BullSoft, which offers security management, authentication, monitoring, and documentation as an integral part of its OpenMaster platform.

Reporting Tools

We've spent the lion's share of this chapter describing a framework for selecting products that monitor and capture data for SLM. There's a good reason for this: Without the right input, any SLM project is doomed to failure. Even the best information won't guarantee a successful SLM strategy if the results can't be published effectively. An examination of reporting capabilities is a key part of any SLM product selection.
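The Fault Management section describes an alarm avalanche, in which one failed router also triggers alerts from every device that depends on it. That problem suggests a simple form of root-cause filtering, sketched below in Python. The sketch suppresses any alert that has another alerting device upstream in a dependency map. It is a hypothetical illustration of one dependency-based suppression technique. It does not describe how Netcool/Omnibus, Incharge, Eye of the Storm, or tsc/Eventwatch actually work, and the topology format is an assumption.

```python
def root_cause_alerts(alerting, depends_on):
    """
    alerting:   device names currently raising alerts.
    depends_on: hypothetical map from each device to the device it relies on
                for connectivity (missing or None at the top of the topology).
    Returns (root_causes, suppressed): alerts with no alerting device upstream,
    and the alerts explained by an upstream failure.
    """
    alerting = set(alerting)

    def upstream_is_alerting(device):
        seen = {device}  # guards against cycles in a misconfigured map
        parent = depends_on.get(device)
        while parent is not None and parent not in seen:
            if parent in alerting:
                return True
            seen.add(parent)
            parent = depends_on.get(parent)
        return False

    suppressed = {d for d in alerting if upstream_is_alerting(d)}
    return alerting - suppressed, suppressed


if __name__ == "__main__":
    topology = {
        "hub-a": "router-1", "ws-1": "hub-a", "server-1": "router-1",
        "hub-b": "router-2",
    }
    alerts = {"router-1", "hub-a", "ws-1", "server-1", "hub-b"}
    roots, suppressed = root_cause_alerts(alerts, topology)
    print(sorted(roots))       # ['hub-b', 'router-1']
    print(sorted(suppressed))  # ['hub-a', 'server-1', 'ws-1']
```

Only `router-1` and `hub-b` would reach the operator's console in this example. The other three alerts are explained by the router failure. Commercial correlation products use richer models than a single parent per device, but the principle of reporting only what the manager needs to see is the same.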
Unfortunately, when it comes to SLM products, there's a gap between monitoring and reporting—a gap that's not bridged by the vendors, who don't like to admit their weaknesses at either end. It's too often assumed that any product that performs SLM monitoring comes with SLM reporting to match. This is definitely not the case. Many primary data collectors like OpenView, for instance, furnish SLM performance data. But they don't ship off the lot with integral SLM report templates. Likewise, some products are designed primarily as reporting tools and rely—at least in part—on imported data from systems like OpenView or BMC Patrol.

In general, it is wisest to keep two things in mind when evaluating SLM reporting tools:

• The information source
• The destination of reports

Consider the Source

Information sources for SLM refer to the type of data capture already in place, as well as the quality of the data provided. An organization that's invested heavily in a platform like OpenView or Netview, for instance, might want to build on that investment by purchasing the add-on products required for SLM reporting. In contrast, products designed from the ground up for SLM come with their own integral report templates, and if they already hook into other data sources, there's no need to add extra software.

The choice of reporting tool can't be based solely on what's already in place. It's important to consider the type of data that's being captured as well. Management tools in general capture two types of data for SLM: real-time and historical.

Real-time data consists of events reported from the network directly as they occur. Broken routers, congested links, and malfunctioning adapters all generate SNMP alerts that show up as alarms or alerts in fault-management consoles such as OpenView or Netview. This is information that's typically required by network operators in the course of day-to-day troubleshooting and management. In fact, most of the time, management tools with real-time capabilities have the capability to automatically generate a page or dial a phone number to notify operations personnel in the event critical alarms occur. (Operators and other IT personnel can select the particular events ahead of time that will trigger the notification capability.) Still, keep in mind that although real-time data is an important gauge of overall availability and uptime, it can't give the perspective on overall performance required for SLM. Prompt response to an outage can reduce the impact on a particular SLA, or help operations personnel keep to the repair times stipulated in the SLA.

Overall, traditional primary data collectors and fault-management consoles like OpenView don't qualify as SLM tools without some adjustments. Conversely, many early SLM tools, like Infovista's Vistaviews, did not support real-time event handling, although that's now changed as vendors become aware that the ability to pinpoint the number of alarms and outages is a key factor in quantifying network availability.

Historical data comprises metrics of the overall health of specific network segments and connections over time. RMON/RMON II information collected by probes, CSU/DSUs, and packet monitors, for example, can be gathered at specific intervals—daily, weekly, or monthly—and placed in charts or graphs depicting how well service levels were met. The importance of historical data to the overall SLM effort made it easy for performance monitoring vendors like Concord Communications, Desktalk Systems, and Lucent (through their acquisition of International Network Services) to adapt their marketing strategies to fit the SLM trend when it first emerged. That's not to say these vendors didn't proceed to add SLM-specific capabilities to their wares, but getting on the bandwagon was undoubtedly easier for them than for vendors of fault-management applications.

The terms of a specific Service Level Agreement have an impact on the kinds of reports needed. In the case of mission-critical applications, some users will want to know about the occurrence and duration of every device failure. This requires real-time event reporting. In other cases, an SLA might specify that availability of specific devices, such as backbone routers and switches, be reported weekly or even daily, whereas availability for other gear is reported monthly. This makes it vital that an SLM tool be flexible when it comes to the increments of time for which data can be obtained.

Report Destination: Who Will See It?

The other key question to answer about SLM reporting tools is: Who will be viewing the information? Graphs of packet performance over time and host-by-host uptime charts, however useful to a network operator, won't give a technologically challenged executive the needed information about the bottom line. To meet the specific requirements of upper management, SLM vendors generally provide "executive reports" that depict SLM information in more general terms and with a keener eye to the presentation slide. Compuware, Concord, Desktalk, Lucent, Netscout, ProactiveNet, and Quallaby have invested significant effort in creating executive report templates. These vendors also furnish Web access to all reports in an effort to help managers easily distribute SLM data throughout an organization.

A newer trend in SLM reporting is the ability to calculate composite status measurements from a variety of data sources. In this case, data captured from probes, software and hardware agents, and simulation tools is statistically tallied in order to show the overall performance of a specific segment, application, or group. Apptitude, Concord, Desktalk, Lucent, Empirical Software, Netscout, Tavve, and Visual Networks offer composite measurements in their applications.

In some instances, business-level views are also achieved by matching up this performance data with information about the business purpose served. Thus, an executive might be able to see how well the accounting, human resources, and manufacturing divisions were served by information services during the past month. Avesta's Trinity is an example of one system that's geared to furnishing this type of business-level view.

In some instances, managers will need reports that can't be provided by the vendor. For cases like these, many vendors offer APIs and software development kits that allow their wares to be customized. This option might cost extra, however.

Caution
IT managers choosing to use vendor APIs and software development kits (SDKs) sometimes need to spend twice as much as they did to obtain the basic product. Even if APIs or SDKs seem reasonably priced, there might be a need to hire the vendor's professional services team to create customized software that works. In some cases, it might be more practical to simply export data to a third-party reporting package such as Crystal Reports from Seagate rather than going the made-to-order route.

SLM Analysis

It's characteristic of SLM that when it's properly in place in any organization, it starts to exceed its original function. When constituents see the benefits of SLM, they aren't content with a monthly report. Network operators want day-to-day downloads for proactive management. Executives want to see data cut and sifted in various ways to furnish better insight into how the technology they're purchasing is serving the business, and so on.

SLM tools vary widely in their capability to adapt to all these requirements. Some products, such as Desktalk's Trend series, were designed with built-in data analysis flexibility, whereas others, such as the Network Health series from Concord Communications, were advertised from the start as offering off-the-shelf reports that didn't require tweaking. But even if a product furnishes in-depth analytical capability, it might not have the data in hand to do the numbers required. Concord and Desktalk, for instance, have limited real-time data capture capabilities.

Generally speaking, serious data analysis will require the use of sophisticated third-party packages. SAS Institute, the statistical software vendor, now provides a range of data analysis and reporting tools tailored to fit SLM. Among these are the IT Service Vision series, which creates a data warehouse of network, system, and Web performance information. SAS also has added cost accounting, capacity planning, and high-end financial analysis to its suite.

Some organizations will need consulting help to properly analyze SLM data. Cases like these might be best served by reliance on a service from the likes of Lucent, Winterfold Datacomm, or X-Cel Communications. These vendors provide services that can help orchestrate data capture to fuel specially tailored reports and analyses. But as is the case with any kind of customization, extra costs might be involved.

Caution
Vendors usually consider a $2,000-per-day price tag for consulting help to be a bargain. The value of customized software must be weighed against the outlay beforehand in order to avoid disappointment.

Administration Tools

SLM calls for a new approach to the day-to-day tasks involved in managing and administering network services. After all, it's tough to analyze network costs or ensure ongoing performance if it's not clear what is installed. Making changes as required to improve service levels demands tools that enable network elements to be located and reconfigured quickly and efficiently.

One way to meet these requirements is adoption of better asset-management tools. Products such as N'telligence from Netsuite Development, AssetCenter from Peregrine Systems, and Remedy Asset Management from Remedy can help discover and document network devices, servers, and workstations and keep track of their licenses, service and support records, costs, and configurations. This helps speed up repairs, ultimately boosting service uptime.

Managing software assets is among the biggest challenges in any network. Software distribution tools are offered from vendors like Marimba, which OEMs its Castanet package to a range of framework and console vendors. Specialized packages also are offered to help track servers and desktops throughout multi-site networks and facilitate their upkeep. These include Landesk from Intel, SMS from Microsoft, EDM from Novadigm, Zenworks from Novell, and Netcensus from Tally Systems.

Most of these products deploy SNMP along with proprietary protocols to track and update network software. The Distributed Management Task Force (DMTF) also has created a Common Information Model that can be implemented by vendors to share inventory information across multi-vendor SLM applications. Directories also can help furnish user information to security and accounting applications. The DMTF's Directory Enabled Networks initiative is an effort to standardize the format for directory data, enabling it to be used easily across management applications.
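The Consider the Source discussion notes that an SLA may require some devices to be reported daily or weekly and others monthly. An SLM tool therefore needs flexible time increments. One way to picture that flexibility is to roll the same historical availability samples up into different periods, as in the Python sketch below. The `(timestamp, is_up)` sample format and the function names are hypothetical. Weekly periods use ISO week numbering, which is an assumption rather than anything the text prescribes.

```python
from collections import defaultdict
from datetime import datetime


def period_key(ts, increment):
    """Label a timestamp with its reporting period (daily, weekly, or monthly)."""
    if increment == "daily":
        return ts.strftime("%Y-%m-%d")
    if increment == "weekly":
        iso_year, iso_week, _ = ts.isocalendar()  # ISO week numbering (assumed)
        return f"{iso_year}-W{iso_week:02d}"
    if increment == "monthly":
        return ts.strftime("%Y-%m")
    raise ValueError(f"unsupported increment: {increment!r}")


def rollup_availability(samples, increment):
    """samples: (timestamp, is_up) pairs; returns {period: percent of samples up}."""
    up = defaultdict(int)
    total = defaultdict(int)
    for ts, is_up in samples:
        key = period_key(ts, increment)
        total[key] += 1
        up[key] += int(is_up)
    return {key: 100.0 * up[key] / total[key] for key in sorted(total)}


if __name__ == "__main__":
    history = [
        (datetime(2024, 1, 1, 9), True),
        (datetime(2024, 1, 1, 10), False),
        (datetime(2024, 1, 2, 9), True),
        (datetime(2024, 2, 1, 9), True),
    ]
    print(rollup_availability(history, "daily"))
    # {'2024-01-01': 50.0, '2024-01-02': 100.0, '2024-02-01': 100.0}
    print(rollup_availability(history, "weekly"))
    print(rollup_availability(history, "monthly"))
```

Because the raw samples are kept and only the grouping key changes, backbone gear can be reported daily and other devices monthly from the same store. This is one reason the database behind an SLM tool matters.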

Summary

SLM products span a broad range of functions and formats. What's more, vendors have jumped on the SLM bandwagon in order to promote products that weren't created with service level management in mind. Only by using a framework that keeps the primary purposes and goals of SLM at the forefront can managers hope to make sense of the many offerings crowding the market.

A workable approach is to first look at products according to the SLM functions of monitoring, reporting, analysis, and administration. Then it's important to scope out the monitoring issues—where data is captured and in what format. Careful planning is required to ensure that the right data is gathered in the right spots at the right times to create an adequate basis for service level reporting. When this is accomplished, an organization is ready to choose tools for publishing and analyzing SLM data in ways that meet its particular requirements.

When selecting SLM tools, it is especially important to keep in mind the database format supported by each product. You will need to be able to get data in and out of an SLM system easily. Selecting a system with a proprietary database for back-end functions will limit your ability to customize the software or augment it with third-party products. Many of today's SLM tools are based on open, well-documented databases like SQL Server, so it should not be difficult to select one that meets your requirements.

To be effective, any SLM strategy also needs to be flexible enough to accommodate ongoing information requests. In fact, the test of a successful SLM implementation will be the demands put on the IT department for more information once initial reports are generated. Managers need to be ready to "slice and dice" SLM data in order to meet these demands. Again, in constructing reports, it helps to have an integral database that is familiar to your staff.

Ultimately, SLM monitoring and reporting will lead to a more efficient approach to managing network services, one that calls for improved record-keeping and tighter centralized control over administrative parameters. The Distributed Management Task Force (DMTF) and other organizations are working to make this happen by creating interoperable schemas for management data in applications, databases, and directories.

Recommendations

Chapter
8 Business Case for Service Level Management
9 Implementing Service Level Management
10 Capturing Data for Service Level Agreements (SLAs)
11 Service Level Management as Service Enabler
12 Moving Forward
CHAPTER 8

Business Case for Service Level Management

The IT department must ensure that adequate, but not excessive, computing facilities are always available to handle the workloads required by the lines of business with acceptable quality of service. As discussed in previous chapters, this requires proactive service level management during the life cycle of important business applications. Typically, senior management requires a cost/benefit assessment that provides the justification for implementing a service level management strategy.

Cost Justifying Proactive Service Level Management


The cost justification should look across the corporation to determine the benefits
of proactive service level management. There are benefits within the IT depart-
ment, but greater benefits lie in increased productivity within the lines of business,
reduced lost opportunity costs, and enhanced customer satisfaction.
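Once each benefit is given a yearly dollar value, the cost/benefit assessment described above reduces to simple arithmetic. The Python sketch below totals benefits in the categories just named: lost-opportunity revenue, productivity within the lines of business, and savings inside the IT department. It subtracts running costs and reports a payback period for the up-front investment. The category names, the split into one-time and running costs, and every figure are hypothetical placeholders, not measured data.

```python
from dataclasses import dataclass


@dataclass
class AnnualBenefits:
    """Yearly benefit estimates in dollars (hypothetical categories)."""
    revenue_protected: float      # lost-opportunity revenue no longer lost to outages
    productivity_regained: float  # value of user time recovered in the lines of business
    it_savings: float             # savings within the IT department


def cost_justification(benefits, one_time_cost, annual_running_cost):
    """Return (total annual benefit, net annual benefit, payback period in months)."""
    total = (benefits.revenue_protected
             + benefits.productivity_regained
             + benefits.it_savings)
    net = total - annual_running_cost
    # Payback = up-front cost divided by the monthly net benefit.
    payback_months = 12 * one_time_cost / net if net > 0 else float("inf")
    return total, net, payback_months


if __name__ == "__main__":
    estimate = AnnualBenefits(revenue_protected=300_000,
                              productivity_regained=150_000,
                              it_savings=50_000)
    print(cost_justification(estimate, one_time_cost=200_000,
                             annual_running_cost=100_000))  # (500000, 400000, 6.0)
```

Listing the revenue and productivity figures ahead of IT savings keeps the emphasis on business value rather than on savings within IT. If the net benefit is not positive, the function reports an infinite payback period rather than a misleading number.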

Tip
The cost justification for service level management will be much more credible if true business value can be related directly to improved quality of service. This is more powerful than attempting to justify service level management based on cost or staff savings within the IT department.

There might be other, less-tangible benefits such as enhanced brand image and customer loyalty, but it is easier to justify quantifiable benefits that can be objectively measured or calculated.

Key benefits of proactive service level management include the ability to
• Understand the quality of service provided to end users and lines of business
• Optimize the service provided to users of services by automating and centralizing the control of business-critical applications and the underlying components such as data, databases, server operating systems, middleware, networks, and server hardware
• Increase business revenue by reducing outages that directly affect business operations
• Increase customer satisfaction and loyalty by ensuring that services used directly by consumers are responsive and available whenever required
• Increase productivity of users within the lines of business through better performance and availability of services
• Proactively plan to meet future business requirements, including workload volumes and required service levels
• Increase the return on investment in IT assets by balancing workloads and obtaining the highest levels of component utilization while still meeting service level requirements
• Increase IT staff productivity by implementing proactive planning and management rather than continuously operating in a reactive mode
• Reduce or eliminate penalties associated with contractual commitments to meet specified service levels
• Increase shareholder value by eliminating highly visible outages, which reduce investor confidence

The primary costs associated with implementing service level management are
• IT personnel to plan, implement, monitor, and report against service level agreements
• Software costs for purchasing or developing tools to monitor, diagnose, manage, and report service quality, including problem notification
• Hardware costs for additional servers, workstations, and specialist equipment for supporting the service management software tools
• IT management attention to justify, procure software and hardware, recruit and educate staff, and oversee the operation of a service level management function

During the remainder of this chapter, we will concentrate on methods for identifying and quantifying the benefits, because the costs are more easily determined after the service management strategy is defined and appropriate management tools are selected.

Quantifying the Benefits of Service Level Management
A good approach to quantifying service management benefits is to work from the outside inward. That is, look first at the external business impact, including the effect on revenue and customer satisfaction; then examine the impact on productivity within lines of business; and finally, look within the IT department and the effect on IT assets.

Impact on Business Revenue
When critical business applications are unavailable, there is normally an associated loss of business and a reduction in generated revenues. In all cases, this is associated with lost opportunity costs, and in some cases there are flow-on losses due to regulatory penalties and market share losses to competitors.

A recent study of 400 large corporations found that downtime costs an average of $1,400 per minute, or approximately $85,000 per hour. The study results are shown in Table 8.1.

Table 8.1 Costs Associated with System Downtime

System Availability    Downtime Costs per Year
99%                    $7,358,400
99.5%                  $3,679,200
99.9%                  $735,840
99.99%                 $73,584

Costs assume round-the-clock operation (525,600 minutes per year) at $1,400 per minute of downtime.

Note
The cost of downtime varies significantly by industry. Financial trading systems have extremely high costs associated with even minor service disruptions. As more corporations enter the age of e-business, opportunity costs as a result of outages of front office applications will continue to increase.
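
The annual cost for a given availability level can be computed directly from the study's average. The minimal sketch below assumes round-the-clock operation and a flat $1,400 per minute of downtime (the study figure quoted above, not a value for any particular business); the function and constant names are hypothetical.

```python
# Sketch: annual downtime cost implied by an availability level.
# Assumes 24x7 operation and a flat cost per minute of downtime.
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 in a non-leap year


def annual_downtime_cost(availability_pct: float, cost_per_minute: float = 1400.0) -> float:
    """Return the yearly cost of the downtime implied by availability_pct."""
    downtime_fraction = (100.0 - availability_pct) / 100.0
    return downtime_fraction * MINUTES_PER_YEAR * cost_per_minute


if __name__ == "__main__":
    for availability in (99.0, 99.5, 99.9, 99.99):
        print(f"{availability}%: ${annual_downtime_cost(availability):,.0f}")
```

Each additional "nine" of availability cuts the downtime, and therefore the cost, by a factor of ten, which is why the benefit of moving from 99.5% to 99.9% is far larger in absolute terms than moving from 99.9% to 99.99%.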

Business revenue can also be affected by performance degradations that impact the ability to handle the required workload volumes. If application responsiveness degrades, revenues can suffer, as best illustrated by financial trading systems, where additional seconds can lead to significant losses or decreased profits from trades. As e-commerce is used to sell goods and services directly to consumers across the Internet, slow responsiveness can also lead consumers to buy from a competitor.

Quantifying the impact on business revenue requires an understanding of the critical business systems and the revenue those systems generate on an annual basis. This information can be used to calculate an hourly rate, and by assessing the increased service availability due to proactive service management, an associated benefit can be calculated.

Caution
The lines of business should be consulted when calculating revenue impact because they might have manual backup systems that will allow processing to continue in a degraded mode. This produces quantifiable revenues, but at a reduced rate.

Quantifying the impact of slow response times will be more difficult and will require the cooperation of the lines of business. Revenue impact will include any penalties involved in not meeting critical deadlines, as well as the competitive disadvantage associated with reduced effectiveness of internal personnel or lost business due to customers shopping elsewhere.

Customer and Partner Satisfaction and Loyalty
Customer loyalty is becoming more important to most corporations as they attempt to build strong customer relationships, particularly with their best customers. The focus of information technology is shifting from improving the efficiencies within the corporation to improving the effectiveness of the corporation's supply chain as well as its sales channels and marketing efforts. This has led to many front office applications that engage the customer or partner in a dialog and add value to the relationship by providing information or conducting transactions directly with them.

There are a number of methodologies for calculating the value of a deeper relationship with a distributor or customer. One method is to look at the best-penetrated customer of a particular size in a specified industry. Using that as a guideline, the increased revenue from penetrating all similar customers to the same degree can be calculated. This provides a baseline that must then be scaled to reflect how much of this business potential is affected by improved service quality, or would be negatively impacted by failure to meet acceptable service levels, for those critical front office applications.

Similarly, the potential of improved relationships with the distribution channels and the effectiveness of supply chain transactions can be used as a basis for calculating the benefit of improved service quality or the negative impact of unacceptable service levels for the critical application services used by these business partners.

End User and Lines of Business Productivity
End user productivity suffers when service outages or degradations occur, even if other work can be performed in the meantime. On average, it takes 20 minutes for an end user to discover that an application has been restored and to get back to the point in the application where the failure occurred. Similarly, if service responsiveness degrades, it takes service users longer to complete their tasks and business transactions.

Quantifying the benefits of reducing outages by proactively managing service is a relatively simple matter of calculating the additional time the users will be productive based on increased service availability, and multiplying this by the number of users and the average loaded cost per user per hour.

Tip
When calculating employee costs for productivity calculations, remember to use fully loaded costs, which include salary, bonuses, benefits, equipment costs, real estate, and utilities.

Similar to the benefits of reduced outages, the benefits associated with improved and consistent responsiveness can be calculated by determining how much more work can be performed in a given time period. This can translate into cost avoidance by deferring the hiring of additional employees.

Note
User productivity is also affected by offline activities such as output distribution and information archival and retrieval. When determining the scope of service management in your environment, the service level agreement should extend to cover these offline requirements.

Proactive Business and Capacity Planning
By understanding future business applications and workloads as well as required service levels, it is possible to proactively plan the necessary IT architecture and assets to meet these requirements. This ensures that adequate capacity will be available, and it also supports a policy of just-in-time upgrades. This approach allows better use of capital, and the net present value of deferring hardware purchases can be calculated along with any associated costs for maintenance charges and upgrading software licenses.
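
The revenue and user-productivity calculations described under Impact on Business Revenue and End User and Lines of Business Productivity share one pattern: convert an availability gain into business hours recovered, then multiply by an hourly rate. The sketch below is one way to express that, under stated assumptions. All names are hypothetical; the degraded_mode_fraction parameter models the point that manual backup procedures may preserve part of the revenue during an outage; and each avoided outage adds the 20-minute average recovery time cited in the text. The sample inputs echo the Table 8.2 worksheet (500 users at $28.50 per hour, $100 million of income, 12-hour days, 6-day weeks), while the 0.4-point availability gain, the 25% degraded-mode revenue, and the 10 avoided outages are purely illustrative.

```python
# Sketch: annual benefit of an availability improvement using the
# "hourly rate x hours recovered" approach. Names and inputs are illustrative.
from dataclasses import dataclass


@dataclass
class BusinessCalendar:
    hours_per_day: float
    days_per_week: float
    weeks_per_year: float = 52

    @property
    def hours_per_year(self) -> float:
        return self.hours_per_day * self.days_per_week * self.weeks_per_year


def hours_recovered(cal: BusinessCalendar, old_pct: float, new_pct: float) -> float:
    """Business hours per year regained by raising availability from old_pct to new_pct."""
    return cal.hours_per_year * (new_pct - old_pct) / 100.0


def revenue_benefit(annual_revenue: float, cal: BusinessCalendar, old_pct: float,
                    new_pct: float, degraded_mode_fraction: float = 0.0) -> float:
    """Revenue protected per year.

    degraded_mode_fraction is the share of normal revenue still earned during an
    outage, for example through manual backup procedures.
    """
    hourly_revenue = annual_revenue / cal.hours_per_year
    lost_share = 1.0 - degraded_mode_fraction
    return hourly_revenue * lost_share * hours_recovered(cal, old_pct, new_pct)


def productivity_benefit(users: int, loaded_cost_per_hour: float, cal: BusinessCalendar,
                         old_pct: float, new_pct: float, outages_avoided: int = 0,
                         recovery_minutes: float = 20.0) -> float:
    """Value of user time regained per year, at fully loaded cost.

    Adds recovery_minutes of lost time for each avoided outage (the time users
    need to notice the service is back and return to where they were).
    """
    hours = hours_recovered(cal, old_pct, new_pct)
    hours += outages_avoided * recovery_minutes / 60.0
    return hours * users * loaded_cost_per_hour


if __name__ == "__main__":
    cal = BusinessCalendar(hours_per_day=12, days_per_week=6)  # 3,744 business hours a year
    print(round(revenue_benefit(100_000_000, cal, 99.5, 99.9, degraded_mode_fraction=0.25)))
    print(round(productivity_benefit(500, 28.50, cal, 99.5, 99.9, outages_avoided=10)))
```

Keeping the calendar separate from the rate calculations makes it easy to rerun the same figures for a 24x7 service or a shorter business day, which is where the hourly rate changes most.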

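The capacity planning discussion notes that the net present value of deferring hardware purchases can be calculated. One common way to do so, sketched here under stated assumptions, is to compare the cost of buying today with the discounted cost of buying later, less the discounted value of any extra charges (such as maintenance on existing equipment) incurred while waiting. The function name, the end-of-year timing of charges, the 8% discount rate, and the dollar figures are all hypothetical.

```python
# Sketch: present-value saving from deferring a hardware purchase.
# Assumes annual discounting and extra charges paid at the end of each year.
def deferral_benefit(purchase_cost: float, discount_rate: float, years_deferred: int,
                     extra_annual_costs: float = 0.0) -> float:
    """Present-value saving from buying `years_deferred` years later instead of now.

    extra_annual_costs covers charges incurred in each year of the deferral,
    for example maintenance or license fees on the equipment being kept.
    """
    pv_later_purchase = purchase_cost / (1 + discount_rate) ** years_deferred
    pv_extra_costs = sum(extra_annual_costs / (1 + discount_rate) ** year
                         for year in range(1, years_deferred + 1))
    return purchase_cost - pv_later_purchase - pv_extra_costs


if __name__ == "__main__":
    # Hypothetical: a $500,000 upgrade deferred two years at an 8% discount rate,
    # with $20,000 a year of additional maintenance on the existing servers.
    print(round(deferral_benefit(500_000, 0.08, 2, 20_000)))
```

A negative result means the extra charges outweigh the benefit of waiting, which is a signal that the upgrade should not be deferred.
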
Increased Return on Investment in IT Assets
Proactive management of service levels allows higher utilization levels of IT components, because more accurate measurement of service quality is possible and workloads can be better balanced across available resources. This in turn defers the need to upgrade hardware and software.

Similarly, a number of corporations deploy additional resources to provide redundant capacity that can be used in the event of an outage. This might not be a cost-effective approach, because redundant systems protect against only one thing: hardware failure, which is the fifth leading cause of downtime. Planned maintenance, application failure, operator error, and operating system failure occur more often. Proactively managing service levels can often meet the requirements of service level agreements without having redundant hardware systems in place, because the more frequent causes of outages are reduced or eliminated. This saves the costs of the hardware, software, maintenance, and operating personnel associated with the redundant systems.

Increased IT Personnel Productivity
Industry research firms have found that through the use of a proactive service management methodology and associated tools, IT operational support personnel can increase productivity by anywhere from 25% to over 300%. By automating tedious and repetitive tasks, human error and subsequent corrective actions are also significantly reduced.

Monitoring and managing a complex, distributed, heterogeneous environment from a service orientation, rather than managing individual components in isolation, significantly increases the quality of services as well as the productivity of network and systems management personnel.

Capturing historical information on the utilization and performance of the IT infrastructure supports proactive analysis of problem areas and enables corrective actions to be taken before problems occur. This results in better utilization of IT staff and reduces reactive fire fighting.

When examining service management tools, it is important to balance the productivity gains against the costs and resources needed to implement the solution. Chapter 7, "Service Level Management Products," discusses the various management solutions available. Management solutions that focus on specific application services can provide a more rapid implementation that results in cost savings, quicker time to value, and increased IT staff productivity. By utilizing event management tools and setting thresholds on critical items such as application errors, database space availability, CPU utilization, application response time, and other critical components, alerts and warnings can be sent to support personnel to prevent problems before they occur. This increases the effectiveness of IT personnel and allows them to be more proactive.

Tip
When implementing a service level management solution, it is best to start with the most critical or highly visible service provided by the IT department. By focusing on one service at a time, the probability of a successful implementation increases significantly, and the initial success leads to continued management support for service level management of additional services.

The decision of whether to develop or acquire service level management tools can also affect IT staff productivity. Acquiring solutions rather than developing and maintaining in-house tools will free up valuable resources for revenue-generating applications or other critical applications. Where the application is hosted on unique, old, or specialized hardware, there might be no commercial service level management solutions available. In this case, it is very important to keep the development costs of building the management tools to the minimum required to provide an acceptable level of monitoring and reporting. Remember to factor in ongoing maintenance when doing the cost/benefit analysis of adding functionality.

Quantifying the benefits associated with improved IT staff productivity requires an assessment of the deferred costs associated with being able to do more with the existing or fewer staff. After this assessment is made, calculating savings is easily accomplished using average fully loaded employee costs.

A Sample Cost Justification Worksheet
Table 8.2 is a sample worksheet that was used by one large corporation to calculate the benefits of proactive service level management. Some of the specific formulae used for the calculations follow the worksheet. The following benefit categories are used in the worksheet:
• Employee costs within lines of business: these are the end users of services provided by the IT department
• Average number of personnel using application servers
• IT support personnel costs
• Service downtime: the percentage of time the application service is unavailable during normal business hours
• Lost employee productivity due to service outages that could be prevented by service management
• Lost business due to service outages that could have been prevented by service management
• Cost of customer dissatisfaction due to service outages and degradation that could have been prevented by service management


• Costs associated with failure to meet service level agreements
• Increased IT personnel productivity due to implementing proactive service management and associated tools

Note
In the interests of simplicity, this worksheet uses only outages for determining the costs associated with not implementing proactive service management. No costs are calculated for lost revenue and productivity due to degradation in service responsiveness. Nor are the opportunity costs of better utilization of IT assets and deferring upgrades factored into the worksheet.

Table 8.2 Cost Justification Worksheet

NUMBERS ARE REPRESENTATIVE OF AN AVERAGE BUSINESS

Employee Costs
Annual salary of personnel using application services    $40,000
Annual salary of IT support personnel    60,000
Percentage of annual salary to add for benefits    30%
Facilities costs per employee per year    5,000
Number of personnel using application services    500
Percentage of their time using the applications    50%
Number of IT operations management personnel    15
Percentage of IT operations time connected to servers    50%

IT Infrastructure - Hardware
Number of application servers    50
Estimated percentage of growth of application servers    10%
Percentage of availability during business hours    99.5%

IT Infrastructure - Software
Number of databases    100
Estimated percentage of growth of databases    10%

Business and Income Numbers
Annual income related to application services    100,000,000
Number of business hours per day    12
Number of business days per week    6
Estimated value of acceptable applications service and related customer satisfaction    500,000
Estimated percentage of business lost due to downtime    3%

SLA Penalties
Penalties per hour outside SLA    2,000
Percentage of application downtime outside SLA agreement    10%

CALCULATED VALUES

Employee Costs
Cost per year of personnel on application servers    57,000
Cost per hour per individual    28.50
Cost per hour total    14,250
Cost per year of IT support personnel    83,000
Cost per hour per individual    41.50
Cost per hour total    1,038

Application Service Downtime
Percentage of application downtime    0.50%
Annual unscheduled downtime in hours for all servers    936
Annual unscheduled downtime during business hours    468

Lost Productivity for Application Downtime
Personnel on application servers    3,334,500
IT support personnel    242,775
Total    3,577,275

Lost Business
Hourly income related to server applications    26,709
Annual lost business due to application downtime    375,000

Customer Satisfaction
Estimated value of customer satisfaction impact    500,000

SLA Penalties Cost
Total SLA cost per year    93,600

Improved IT Staff Productivity
Ratio of servers to IT systems management support personnel    3
Ratio of databases to DBAs    10
Cost avoidance of tools allowing additional IT staff productivity    207,500

SUMMARY
Lost productivity for application downtime    $3,577,275
Lost business    $375,000
Customer satisfaction    $500,000
SLA penalties cost    $93,600
Improved IT productivity    $207,500
Total    $4,753,375

Calculations Used in Worksheet
Most of the items in Table 8.2 are easily calculated. We have included the calculations for representative items in the following sections. These formulae are useful for creating a spreadsheet that can be used to calculate the benefits under various scenarios.

Employee Costs
The costs of the employees should be calculated using fully loaded costs, including benefits and facilities costs. These costs will vary based on employee type. In this worksheet, we have considered only two categories of employees: IT personnel and others.

• Cost per year of personnel using application services:
(annual salary) + (benefits) + (facilities cost per employee)
• Cost per hour per individual:
(cost per year of personnel using application services) / (2000)
[40 hours/week x 50 weeks/year]
• Cost per hour total:
(cost per hour per individual) x (number of employees using application services)
• IT operations personnel calculations are similar to the calculations shown here.

Application Downtime
The costs associated with downtime include both unscheduled downtime due to failures and planned downtime for maintenance that extends into normal business hours.

• Percentage of application downtime:
(100) - (percentage of availability during business hours)
• Annual unscheduled downtime in hours for all servers:
(number of business hours per day) x (number of business days per week) x (52 weeks per year) x (number of servers) x (percentage of downtime during business hours)
• Annual unscheduled downtime during business hours:
(number of business hours per day) / (24) x (annual unscheduled downtime in hours for all servers)

Lost Productivity for Application Downtime
Application downtime affects the productivity of users within the lines of business as well as support personnel and IT staff.

• Personnel on application servers:
(cost per hour total) x (annual unscheduled downtime during business hours) x (percentage of time using applications)
• IT support personnel:
(cost per hour total) x (annual unscheduled downtime during business hours) x (percentage of time IT & DBAs connected to servers)

Lost Business
When calculating lost income due to unscheduled downtime, we must factor the revenue normally generated by those application services to account for the ability to operate manually in a degraded mode.

• Hourly income related to server applications:
(annual income related to server applications) / ((number of business hours per day) x (number of business days per week) x (52 weeks per year))
• Annual lost business due to application downtime:
(hourly income related to server applications) x (annual unscheduled downtime during business hours) x (estimated percentage of business lost due to downtime)
• Customer Satisfaction:
(estimated value of application availability to customers; this is an estimate of future business that will be affected by unacceptable quality of service)

SLA Penalty Cost per Year
Many service level agreements contain penalty clauses for failure to achieve required service levels. This is particularly important if services are provided to external users such as business partners. These penalties should be accounted for in the cost justification.

• Total SLA penalties cost per year:
(penalties per hour outside SLA) x (annual unscheduled downtime during business hours) x (percentage of application downtime outside of SLA agreement)
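
The formulae above are intended for a spreadsheet. The same arithmetic is sketched below in Python using the Table 8.2 inputs, as one possible starting point for trying other scenarios. It covers the rows whose formulae are given (employee costs, downtime, lost productivity, lost business, and SLA penalties) and takes the customer satisfaction estimate and the IT productivity cost avoidance as fixed inputs, since the worksheet estimates those separately. The dictionary keys and function names are our own. The IT support head count of 25 (15 operations staff plus 10 DBAs) is an assumption inferred from the worksheet's $1,038 hourly total and its ratio of 100 databases to 10 DBAs; it is not listed in the table.

```python
# Sketch of the Table 8.2 cost justification worksheet.
# "it_staff" = 25 (15 operations + 10 DBAs) is inferred, not listed in the table.
TABLE_8_2_INPUTS = {
    "user_salary": 40_000,
    "it_salary": 60_000,
    "benefits_fraction": 0.30,
    "facilities_per_employee": 5_000,
    "users": 500,
    "user_time_fraction": 0.50,            # share of time spent using the applications
    "it_staff": 25,                        # inferred head count
    "it_time_fraction": 0.50,              # share of IT time connected to servers
    "servers": 50,
    "availability_pct": 99.5,              # during business hours
    "annual_income": 100_000_000,
    "hours_per_day": 12,
    "days_per_week": 6,
    "business_lost_fraction": 0.03,
    "penalty_per_hour": 2_000,
    "outside_sla_fraction": 0.10,
    "customer_satisfaction": 500_000,      # estimated directly in the worksheet
    "it_productivity_avoidance": 207_500,  # estimated separately in the worksheet
}

HOURS_WORKED_PER_YEAR = 2000  # 40 hours/week x 50 weeks/year


def loaded_cost_per_year(salary: float, p: dict) -> float:
    """Fully loaded cost: salary plus benefits plus facilities."""
    return salary * (1 + p["benefits_fraction"]) + p["facilities_per_employee"]


def run_worksheet(p: dict) -> dict:
    r = {}
    # Employee costs per hour for each whole group
    r["user_cost_per_hour_total"] = (
        loaded_cost_per_year(p["user_salary"], p) / HOURS_WORKED_PER_YEAR * p["users"])
    r["it_cost_per_hour_total"] = (
        loaded_cost_per_year(p["it_salary"], p) / HOURS_WORKED_PER_YEAR * p["it_staff"])

    # Application downtime, following the worksheet's formulae
    business_hours_per_year = p["hours_per_day"] * p["days_per_week"] * 52
    downtime_pct = 100 - p["availability_pct"]
    downtime_all_servers = business_hours_per_year * p["servers"] * downtime_pct / 100
    downtime = p["hours_per_day"] / 24 * downtime_all_servers
    r["downtime_business_hours"] = downtime

    # Lost productivity for users and IT support staff
    r["lost_productivity"] = (
        r["user_cost_per_hour_total"] * downtime * p["user_time_fraction"]
        + r["it_cost_per_hour_total"] * downtime * p["it_time_fraction"])

    # Lost business and SLA penalties
    hourly_income = p["annual_income"] / business_hours_per_year
    r["lost_business"] = hourly_income * downtime * p["business_lost_fraction"]
    r["sla_penalties"] = p["penalty_per_hour"] * downtime * p["outside_sla_fraction"]

    r["total"] = (r["lost_productivity"] + r["lost_business"] + p["customer_satisfaction"]
                  + r["sla_penalties"] + p["it_productivity_avoidance"])
    return r


if __name__ == "__main__":
    for name, value in run_worksheet(TABLE_8_2_INPUTS).items():
        print(f"{name:28s} {value:>14,.2f}")
```

Changing a single input, such as availability_pct, and rerunning the sketch shows how sensitive the total is to each assumption, which is useful when presenting the justification to senior management.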

Cost Avoidance Due to Improved IT Staff Productivity
Using a proactive service management methodology along with automated management tools can significantly improve the productivity of IT staff, and thus avoid the costs of additional personnel.

• Ratio of servers to IT systems management support personnel:
(number of application servers) / (number of IT operations management personnel)
• Ratio of databases to DBAs:
(number of databases) / (number of DBAs)
• The preceding figures are calculated twice: first without tools, and a second time using automated management tools.
• The costs associated with managing the IT environment manually are
(number of application servers) x (estimated growth of application servers) / (ratio of servers to IT systems management support personnel) x (cost per year of IT support personnel) [plus DBA costs calculated in a similar fashion]
• Cost avoidance of tools allowing additional productivity:
The difference between the calculation using the ratio of IT support personnel to servers without management tools and the calculation using the ratio of IT support personnel to servers with management tools.

Note
Industry analysts have estimated that the practical limit of the number of databases a database administrator can manage manually is from five to ten. Through the use of tools that take proactive actions, this number rises to fifty or more. Similar ratios can be used for system support personnel.

Summary
It is possible to quantify the cost savings associated with implementing proactive service level management strategies and tools. When doing so, it is important to begin with the impact on business revenue, productivity, and customer satisfaction. Additional cost avoidance resulting from improved IT staff productivity, better return on investment in IT assets, and deferred system upgrades can also be used to justify service level management.

Appendix E provides an actual case study of qualitative value and quantitative return on investment for implementing service level management for an SAP application at a service provider.

CHAPTER 9

Implementing Service Level Management

Readers might well pause at the start of this chapter. After all, we've covered the fundamental concepts and parameters of SLAs and offered a framework for product selection. Isn't that what implementing service level management is all about? The answer is a resounding no. In fact, we've only laid the groundwork. Successfully implementing service level management (SLM) calls for more than buying some software and slapping a contract on the desk of the nearest department head. It requires a strategy: an organized, flexible plan for introducing SLAs and working with them day to day to achieve maximum efficiency and savings. Without this, projects can fail despite the best efforts to make them work.

Consider the following case: A couple of years back, a network manager working for a large Eastern retailer decided SLM would suit his firm. He hired a consultant to scope out the basics and evaluate products. The CIO signed off. After a large expenditure, software was installed and SLA templates prepared. The first of these was sent to the head of the customer service department, the largest in-house IT
user in the company, where it sat on her desk. Time passed. Other divisions were sent SLA forms with similar results. A meeting was called to explain the benefits of the new system, during which the head of customer service asked why she hadn't been given the opportunity to help shape the terms of her SLA. She did not, she pointed out, have time to help IT do its job. The other managers present at the meeting concurred. The next day, the network manager found himself summoned to the boss's office for a long talk about the high cost of his pet project. Two months later, the manager who'd instigated SLM resigned.

This anecdote illustrates that good intentions and products don't constitute an SLM strategy. Instead, what's needed is an in-depth analysis of a company's unique culture and requirements, with a clear sense of the potential pitfalls and opportunities. The network manager wasn't wrong to propose service level management. In fact, he could have been a trendsetter. His products and templates were state of the art. The trouble was, he hadn't bothered to consider how best to introduce SLM to his constituents. He had not focused on soliciting buy-in from all parts of the business, not just IT. He had mistakenly focused on the network layer alone, and he had not followed an inclusive strategy that incorporated all services that would be affected by SLM. Inevitably, the vacuum of unanswered questions soon filled with misunderstanding and political rivalry. In the end, our hero fell victim to his own initiative.

Unfortunately, it's a scenario that's repeated all too often in today's business world. But with proper planning, it can be avoided. In this chapter, we'll outline ways to construct an effective SLM strategy, thereby not only avoiding failure but also planning for the best results in real-world situations.

Planning the Rollout
Any business innovation requires some form of planned introduction to succeed. In effect, managers must sell a new technology or procedure to those members of the corporation who'll be charged with making it work. There's a simple reason for this: People resist change. From the shop floor to the boardroom, human nature tends to stick with the status quo, even with its problems and difficulties. Some of this originates in fear: Employees and executives might perceive that adopting a new procedure will threaten their usefulness or position in the company. Some think taking time out to make a change will hinder their ability to meet hectic schedules and deadlines. And of course, political pressures abound.

Any procedure for introducing new technology, including SLM, must be able to meet and overcome most arguments for resistance. Generally this can be achieved by following specific guidelines:

1. Explain and demonstrate the benefits clearly.
2. Define the goals.
3. Enlist support from significant constituents.
4. Set up resources to assist staff during the transition.
5. Put ongoing controls in place.

An effective SLM strategy will cover all these points. But beware: The procedure isn't as simple as following five steps to success. Read on.

First Things First
A key lesson of the anecdote related previously is that managers can't thrust SLAs on all departments (that is, clients) at once and expect success. Instead, SLM must be introduced on a client-by-client basis. This ensures that the IT manager, acting in the role of service provider to various lines of business, can give maximum attention to each client. It also guards against the confusion (and, ultimately, the political upheaval and rebellion) that can erupt when multiple departments clamor at once to grasp new technology. But taking the client-by-client approach means it's vital to choose a starting point that will get the project off on the right foot.

It might come as a surprise that the best place to start an SLM project is not with the first client. Instead, it is within the IT department itself. Make sure that support for SLM is consistent throughout the IT organization, from the CIO on down through the ranks of the operations folk who will be responsible for responding to problem calls and assembling the day-to-day reports the SLA requires.

The necessity for this can be illustrated by another anecdote. Not long ago, the IT manager of a large Midwestern bank decided to start a series of SLAs within her organization. All went well. She was able to enlist the support of top management, purchase the necessary software tools, and set an implementation schedule with her immediate staff. After several meetings, she also obtained the support of two key in-house constituents who seemed eager to start the process. SLAs were assembled, approved, and signed. The process officially began. A few days passed uneventfully. Then late one morning, the executive in charge of one of the constituent departments called the IT manager in a rage. What was the idea of breaking their SLA so soon into the cycle? he thundered. Didn't they have a deal? After some questioning, the IT manager learned that a server had crashed first thing that morning. A call to the IT help desk had failed to produce a fix within the agreed-on time period of three hours. In a panic, the IT manager checked with the help desk, only to learn that her top SLA constituent had been relegated to the bottom of the priority list for that morning, in keeping with usual procedure. Apparently, the help desk hadn't been informed about the existence of a special-case SLA for the department whose server failed.

This story shows clearly what can happen when no one takes the trouble up front to obtain top-down buy-in from all IT personnel, from the CIO down. The concepts of SLM cannot be effective unless all IT personnel are informed of their particular role in making the SLA work.

The First Client
After the IT department itself is fully briefed on its roles and responsibilities, it is time to choose a clientele. Who will be first?

In choosing a first client, it's best to pick according to need and visibility within the corporation. But there are no hard and fast rules, and in the end the best course of action will depend on the company's particular circumstances. The following selection criteria can help:

• The area most critical to the business: It's an IT rule of thumb that not all network applications are created equal. A financial services company measures its lifeblood in the uptime of its trading network. An online retailer wants nothing to halt the flow of orders. If SLM implementers succeed in pleasing these tough constituents, it's a safe bet others will fall into line.
• Where most improvement is needed: Specific departments might suffer more than others from poor response time and availability. Getting these clients on track first helps demonstrate SLM's capacity for improving network functionality. Also, making substantial improvements lends a sense of drama and achievement to the project.
• The most disgruntled group: Every company has departments that routinely require special handling. Perhaps they have the most demanding network requirements (not to be mistaken for the most important or mission-critical ones). Or they might routinely avoid IT, working around standard technologies. Putting the toughest group first might make the rest of the job a breeze.
• The most politically powerful group: These are the folks who can make or break any technology initiative. Often, they belong to the area most critical to the business, but there are exceptions. The office of the CEO or CFO, for instance, is often a bellwether for acceptable procedures.
• Areas of highest/lowest visibility: Sometimes it's a good plan to start an SLM rollout wherever it will show up the most, or the least. If you're confident of success and want only to make an eye-catching start, choose the shop floor, the trading floor, the customer service center, or another headquarters-based division whose work is widely seen and discussed. The in-house publicity will pave the way forward. Conversely, if you're facing difficulties, choosing an area of low visibility (remote training centers, building maintenance) might help minimize exposure during the start-up phase.

Making Contact
When a starting point for introducing SLM has been chosen, the next step is to initiate contact with the prospective client. This needs to be done from the top. SLM can't succeed without the endorsement of the folks who appear at the head of the client's org chart. Don't make the common mistake of assuming that the boss is too busy or doesn't care about the changes you're trying to make. Also, don't assume that those below him in the organization will fall into step on their own.

When you've decided whom to contact, it's time to make your pitch. SLM puts any IT manager in the position of a service provider who must sell the client on a proposal's benefits. Set up a formal meeting with your target executives and give a standard business presentation, complete with graphics (see Table 9.1). This might be your chance to become a corporate hero. Don't reduce your effectiveness with poor preparation.

Table 9.1 The SLM Presentation: Do's and Don'ts

DO: Contact the top-level personnel in the client department to ask for the meeting.
DON'T: Assume that the client boss is too busy, or that underlings will carry the message more effectively.

DO: Call to set an appointment for making your presentation.
DON'T: Use email alone to contact your target executives.

DO: Create a business-quality presentation for the initial meeting.
DON'T: Go into the first meeting empty-handed or with handwritten notes.

DO: Prepare a schedule for implementation to share with the client.
DON'T: Be vague about goals or timelines.

DO: Have a contract template ready for discussion.
DON'T: Go to the first meeting without a template, or thrust a template on the client later without asking for input.

DO: Field all questions calmly and pleasantly.
DON'T: Get argumentative or defensive when challenged.

After the presentation, be ready for confrontation. Don't expect the benefits of SLM to be immediately evident. Furthermore, the audience (that is, the client) is very apt to be antagonistic to the service provider, regardless of whether it is internal or external. Certainly in a majority of companies today, the IT department is viewed with a mixture of attitudes ranging from mild suspicion to open hostility.
As with any new technology, there will be plenty of questions. Field these pleas- Obtaining a Baseline
antly and with candor. Do not become defensive—if you do become defensive, No SLM strategy can begin without a baseline of performance. Baselining, or
your client will think you have something to defend. Similarly, any aggressive monitoring the network and systems to determine the present state of perfor-
behavior will work against you.
mance, is crucial in determining 1) how services need to be changed for more
satisfactory performance, and 2) how services will be maintained and guaranteed
over time. The operative principle is simple:You must know where you are before
Going Live with SLM you can proceed to a better place.
After SLM has been successfully introduced and a first client chosen, SLM deploy-
Taking a baseline doesn't mean racing to the nearest network connector with a
ment can begin in force. In general, this is a process of the following five steps:
portable monitor. Unless all parties agree to a set of fundamental parameters ahead
1. Setting up the service management team. of time—and clearly understand what they're agreeing to—the baseline report will
2. Obtaining a performance baseline. be worthless.

3. Negotiating service levels. Start by deciding what measurements will be needed to adequately identify exist-
4. Implementing service management tools. ing network and system performance. In most cases, these boil down to availability,
or uptime of all devices and system, and performance, defined in terms of response
5. Establishing reporting procedures.
time, network latency, or job turnaround. As ever, it's important to keep the focus
We will explore each step in detail. on how the end user perceives the service. The end user is the consumer, whereas
IT is the service provider. The end-user experience determines how the service is
actually meeting key business goals.
Setting Up the Team
Clearly explain all metrics as you suggest them, and make sure that you consult
After the client has committed to participating in the SLM rollout, it's important with colleagues in other parts of IT before suggesting anything. Clients are likely
to select a team as quickly as possible. Ideally, this should start out as a small group to be confused if metrics are explained inadequately or if multiple metrics are
comprised equally of top staffers from IT and the client department. Keeping the presented for the same service. Worse, they might feel IT is attempting to mystify
group small makes it easier to keep the focus, lay the groundwork, and make any them in order to gain control of the project. Perceptions like these can sound
adjustments. As time passes, members can be added if need be. Alternatively, folks the death knell for SLM.
might opt to drop off the team after things get going.
Next, determine who will be responsible for capturing the metrics, which
On the IT side, include representatives of all areas involved. If the service has been methodology will be used, and how the data will be captured. If application
properly mapped out from end to end, this will be easy to establish. For example, a response time and network availability are determined to be the key baseline
customer service department might rely on a LAN and WAN as well as hosted elements, two distinct measurements might need to be taken by two IT groups
applications. Three different groups within IT might control these elements. Each using two distinct types of instruments. The network group, for instance, might
group needs to be represented on the team, at least initially, to ensure that SLAs are gauge uptime via a performance monitor, and the systems group might use a
properly defined. software agent to measure application response time at the server. Choose a time
Although the appointment of a single coordinator is practical, in most instances, and place for coordinating input from multiple sources.
there is no need to designate an official leader for the SLM start-up team. The rea- It's also vital to determine the time values for the baseline. Careful consideration
son for this is that in most cases, members of the team need to perceive themselves must be given to the time interval over which samples or measurements will be
as partners in the SLM process, not as recipients of another division's policy. Also, taken, as well as the overall period of time allowed for baseline sampling. These
both sides represented on the team will play important roles. The client may values probably will end up in the SLA itself, so it's important to give this some
decide on the time and place for the meeting and kick things off, but IT will be thought, and perhaps even to run through a few trials before coming to a final
expected to take the lead in presenting choices and making recommendations. Be
decision.
careful: IT representatives should act as trusted advisers without forcing their will
on the group. In the final analysis, both client and IT staffers should play an equal
role in decision making.
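To make the difference between the sampling interval and the overall baseline period concrete, here is a minimal Python sketch. It assumes hourly response-time samples stored as (timestamp, seconds) pairs, which is a hypothetical format, and `rollup` is an illustrative name rather than any product's API. The sketch keeps the fine-grained samples and derives daily and ISO-week averages from them.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean


def rollup(samples):
    """Average (timestamp, value) samples by calendar day and by ISO week.

    samples: iterable of (datetime, float) pairs, e.g. hourly response times.
    Returns two dicts: {date: mean value} and {(iso_year, iso_week): mean value}.
    The original samples are not modified, so finer detail stays available.
    """
    by_day = defaultdict(list)
    by_week = defaultdict(list)
    for ts, value in samples:
        by_day[ts.date()].append(value)
        iso_year, iso_week, _ = ts.isocalendar()
        by_week[(iso_year, iso_week)].append(value)
    daily = {day: mean(vals) for day, vals in sorted(by_day.items())}
    weekly = {week: mean(vals) for week, vals in sorted(by_week.items())}
    return daily, weekly


# Example: three hourly samples spanning two days in the same ISO week.
daily, weekly = rollup([
    (datetime(2024, 3, 4, 9), 1.2),
    (datetime(2024, 3, 4, 10), 1.8),
    (datetime(2024, 3, 5, 9), 0.9),
])
```

Rolling up in code, rather than collecting coarse figures from the start, matches the advice to err on the side of granularity. A coarse average can always be recomputed from the hourly data, but the reverse is not possible.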
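Because the measurement window is likely to be written into the SLA, it also changes what a percentage target actually permits. Here is a worked illustration. It assumes a service expected to run 24 hours a day and treats "biweekly" as a 14-day window. Under those assumptions, the downtime D allowed by an availability target A over a window T is:

```latex
\begin{aligned}
D &= (1 - A)\,T \\
A = 0.99,\; T = 30\ \text{days} = 720\ \text{h}: \quad D &= 0.01 \times 720\ \text{h} = 7.2\ \text{h} \\
A = 0.99,\; T = 14\ \text{days} = 336\ \text{h}: \quad D &= 0.01 \times 336\ \text{h} = 3.36\ \text{h}
\end{aligned}
```

The same percentage over a shorter window caps how much downtime can pile up in any one window. That is why the window deserves as much negotiation as the percentage itself.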

Regarding time intervals for sampling, it's generally best to err on the side of granularity. If you start out with too much data, it can always be reduced to a significant and accurate figure. Too little information, on the other hand, defeats the purpose of baselining. Start by measuring at least on an hourly basis, then tally results into daily and weekly averages.

Let the duration of baseline sampling be determined by the business cycle itself. A payroll application might show the full range of possible variations in response time and availability over the two to six weeks it takes to complete a company payroll. In contrast, a customer service department specializing in seasonal equipment might peak in bandwidth and system requirements for three months of the year, and then show minor fluctuation for the remaining nine months. In that instance, a baseline might have to be taken twice in one year to establish reasonable performance expectations.

SLM team members must agree on the following parameters before baselining can begin:

• WHAT the specific metrics will be
• WHO will measure baseline performance
• HOW the measurements will be taken
• WHERE the baseline will be taken
• WHEN it will be taken (over what time period)

Negotiating Service Levels

After the baseline performance measurement is taken, the SLM team is ready to start finalizing the service levels for the actual SLA. If the baseline has been properly conducted, this step should take care of itself. In many instances, the metrics and sampling intervals chosen for the baseline project can be carried over into the ongoing contract. But lessons learned during baselining should be applied as much as possible. The baseline might reveal, for instance, that response time for a particular application hasn't been up to par. New equipment or network services might need to be procured. When that's done, expectations from IT and the client can be set and an SLA established.

Keep in mind that non-technical clients need help in setting realistic performance expectations. You might need to explain that the nature of technology itself, not a lack of interest on the part of IT, makes 100% uptime achievable only if considerable funds are shelled out for multiple redundancy. Clients might also need counsel in order to avoid shortchanging themselves. One company we worked with recently signed for 99% uptime per month on all WAN links ordered from a particular carrier. It soon found out that this metric allowed for several hours of downtime every 30 days (1% of 720 hours is 7.2 hours). It took some haggling, but adjusting the 99% figure to reflect biweekly rather than monthly performance resulted in significant savings for the company.

Implementing Service Management Tools

It might seem odd to describe SLM tool implementation following a section on baselining. In fact, it's a logical progression. Establishing baseline parameters helps IT managers set goals. That in turn helps them scope out the nature of the work to be done and assemble the right tools for the task.

Note
In Chapter 7, "Service Level Management Products," we created a framework for product selection, including in-depth coverage of specific product groups and their characteristics. Now it is time to apply that information to your company's specific SLM requirements. At this juncture, it is helpful to list your SLM requirements alongside the tools available to meet them. Table 9.2 illustrates how you might create a worksheet to match your requirements with potential sources of SLM data before going shopping:

Table 9.2 Charting SLM Tool Implementation
General Metrics | Specific Metrics | Sources
Availability | Network availability | Network management platforms; performance management applications; protocol analyzers; traffic monitors; RMON probes
Availability | System uptime | Systems management platforms; systems management applications; system log files; some network management systems and performance management applications
Table 9.2 Continued
General Metrics | Specific Metrics | Sources
Performance | Network latency | Performance management applications; protocol analyzers; traffic monitors; RMON probes
Performance | Network response time (roundtrip from end-user workstation; roundtrip from server) | Performance management applications; protocol analyzers; traffic monitors; RMON probes
Workload levels | Transaction rates | Log files; systems management platforms and applications
Workload levels | Batch job completions | Log files
Other | Recoverability (mean time to repair) | Asset management systems
Other | Security | Log files; RADIUS servers

Start by taking stock of what you already have. You might already have SLM tools on hand that you don't recognize as such. Management systems, application log files, and performance management applications all can be used to obtain SLM metrics. By creating a record-keeping database and customized reports, it might be possible to minimize the need to acquire additional products.

Another alternative to acquiring additional products is to upgrade existing tools. For example, most IT organizations use one or more protocol analyzers or network monitors. Nearly all these devices feature upgrades and add-ons for SLM implementation. Not only can these enhancements equip monitors and analyzers with SLA reporting functions, they also can extend their scope of functionality. Vendors like Concord Communications and Netscout now furnish basic application response-time measurement along with traffic monitoring. The same goes for products originally designed only to measure application performance. BMC Patrol, long known for app-management wares, now works with a range of network management platforms and performance monitoring tools. Upgrading existing products can usually be done simply by installing new releases of products to which you are entitled under current maintenance agreements for those products.

Alternatively, although upgrading a product to a new version (as opposed to a new release) will not normally be free, it will usually be much cheaper than acquiring new products.

Many SLM implementers will choose to augment incumbent products with new tools, introducing software from the likes of Quallaby to create databases and reports using output from OpenView or BMC Patrol, for instance. When shopping for new tools, however, it's important to build on what's in place. Keep the following checklist in mind during all new-tool evaluations:

• Will it run on the existing network?
• Will it run on existing hardware, or will it require new servers or workstations?
• Will it run under the current operating systems?
• Does it interoperate with or support network or systems management products now in place (for example, HP OpenView, BMC Patrol)?
• Does it support databases already installed?

In answering these questions, dig into details. Avoid disappointment and embarrassment by making sure that newly acquired SLM tools match specific releases of operating system, database, and management products in house. Nail down support contracts before officially introducing new tools: Ask all vendors for a commitment to furnish upgrades to ensure that these key parts of your SLM system keep working together.

Establishing Reporting Procedures

When the baseline is taken and tools are selected for ongoing SLM measurement, it is time to think carefully about SLM reporting procedures:

• WHO will generate the reports?
• WHERE will reports be generated?
• WHEN will reports be generated?
• WHO will be on the distribution list?
• HOW will the reports be distributed?

Don't assume that any of this will take care of itself. It is not a given that the same department that takes the SLM metrics will present them to the client. This might in fact be the best approach, but that decision should be reached only after consideration by the SLM team. The chief concern is whether all parties trust the source of SLM reports. If a particular division of IT, for instance, will be taking the metrics, are all team members comfortable enough with that division to ask questions,

request adjustments, and make changes as needed? Or will hidden mistrust and rivalries threaten the project? Will the group doing the monitoring have time to deal with all this? Follow your instincts here. Remember, anything swept under the carpet at this point will surface in one way or another later on. If you sense problems, it might be best to charge one or two team members with reviewing metrics and generating reports.

In many instances, complex SLAs will call for input from multiple departments. If this is the case, create a reporting team to coordinate results. This team also should be accountable for the results; don't enable "buck passing." Pick folks who have the time, the ability, and the diplomacy to get the job done properly.

The next choice is when to issue reports. Much depends on the terms of the SLA itself. If a contract stipulates that IT must live up to a monthly service level, reports should be delivered at a set time each month, preferably in time for the client to obtain credit against next month's bill. In some cases, clients might want more frequent reports, even if the terms of the contract call for once-a-month review. Encourage all parties to compromise in order to reach a frequency that is easy to meet within everyone's schedule, while allowing time for discussion and changes.

Next, establish a report distribution list. This can be tricky. If too many people receive reports, you might be faced with a periodic chorus of opinions and demands (depending, of course, on how well you've managed to field input up front). But too few recipients can lower the project's visibility and value. Each organization will have its own circumstances to consider, but in most instances, it is best to err on the side of having too many rather than too few included in the SLM report loop. If you've done your job ahead of time, report recipients shouldn't have much to complain about or change. And some folks, happy to have been included in the first place, will tend to drop out of active participation over time. An alternative that is growing in popularity is the use of a Web site with authenticated access to make the SLM report information available to clients. However, a study of IT managers by Enterprise Management Associates (see Figure 9.1) found that hard-copy reports are still the most favored method of distributing information about SLM performance.

The capabilities of the SLM tools chosen will help determine how reports are distributed. As noted, many SLM tools today are Web-enabled: Results can be emailed over the Internet or posted to a Web site for general browser access. Where Web distribution isn't possible, the time and trouble it takes to get reports to the right people should influence the size of the distribution list. Alternatively, someone might be designated to supervise the actual publishing and distribution of reports. Using administrative staff or part-time help might be an economical way to get the job done.

[Figure 9.1 The format of reports to end users. Bar chart (0% to 80%) comparing verbal, hard-copy, email, and Web-based reports.]

Following Through

If you've followed an orderly and well-thought-out strategy, your SLM rollout should proceed smoothly. But don't rest on your laurels. All SLM projects require continuous care and feeding to stay successful. Part of a winning strategy is a follow-up program of continual improvement. This doesn't mean that you must make changes just to keep up the appearance of flexibility. It does mean that you need to be open to suggestions and willing to make corrections to any aspect of the project as needed.

Sometimes this means parting graciously with pet products and plans. Consider the following case: One IT manager I know implemented SLM in his company using a management system he'd already installed. Working overtime, he prepared SLA templates, a database, and reports tailored to fit the incumbent system. The savings realized from this earned my friend praise and a bonus. Time passed, and the success of the SLM project caused other clients in the company to clamor for their own contracts, based on new parameters. It was clear new tools were needed to meet these requests. One day, one of the man's IT colleagues unexpectedly presented the SLM team with a sweeping proposal for a new suite of tools he'd evaluated. My friend felt slighted and argued publicly against the purchase. Eventually, this caused rivalries to surface, the boss took sides, and my friend felt compelled to take a back seat on the SLM team. By failing to recognize that following new suggestions did not detract from the value of his contribution, my friend stopped reaping the rewards of his success.

This story shows that the right attitude is an important first step in any SLM follow-up plan. But it's just a first step. To ensure continual improvement, you need to get input at the right time and in the right format. A good review process makes this happen. This includes 1) getting input from members of the SLM team in regularly scheduled evaluation meetings, 2) conducting client satisfaction surveys to get input that might not be put forward in a public meeting, and 3) finding

ways to keep ongoing communications open and constructive. We will examine these elements individually:

Meetings: Regular team meetings are key to the ongoing success of any SLM project. Plan to convene at the same time each month. Make sure that all members are present. Keep the focus positive: This is not a gripe session. Instead, use the format of a progress report where suggestions for improvement are welcomed. For instance, don't forget to keep clients informed about technology innovations that might help the cause. The latest traffic monitor or performance management application might offer a way to do the job better. Use your position to inform, not dictate. When changes need to be made, set guidelines for decision-making; use a consensus of opinion or majority vote to establish a plan of action.

Satisfaction surveys: This is a time-tested review method for large organizations. Start by creating a questionnaire. During the monthly team meeting, ask for input and make changes as needed. Then choose a time and place to distribute the survey. If the Web is used for SLM reports, that might be the place to post the questionnaire. Otherwise, you can print it and distribute it in sealed inter-office envelopes. Give participants the option to deliver their responses anonymously. Publish the results in a brief summary for distribution to team members. Then assemble a list of objectives to be met as a result of the survey and set a timetable for meeting those objectives.

Ongoing communications: Don't limit your review process to meetings and satisfaction surveys. Stay proactive and try to anticipate problems. Watch for organizational changes like the regrouping of a department or the departure or arrival of key personnel. When a new person enters the group, take time to stop by and offer to explain procedures or answer questions. Even when SLM procedures run smoothly, don't be lulled into a false sense of security. Maintain regular contact with clients, and when the procedures are in working order, keep up the contact at various levels. Talk to clients who are part of the SLM team, as well as those staffers responsible for using the service in the trenches.

Dealing with Difficulties

In some instances, SLM will meet resistance despite your best efforts. Maybe one department feels snubbed because it hasn't been chosen to pilot the new project. Or a new client who is working with an SLA thinks the terms of the contract need changing.

Here, as ever, start at the top. When there is dissatisfaction on the part of a client, it will usually float to the top of the group before it comes across to your department. Often, dissatisfaction in the ranks originates with messages from leading executives. So make sure that you're dealing with senior staff.

When exploring the reason for dissatisfaction, be proactive. If you've heard grumblings about SLAs, ask for input: "Is the agreement working for you?" "Can we meet to discuss any adjustments that need to be made?" "How can we help make this work better for your group?" Don't wait for the client to become unhappier. Don't think that by hiding in your hole you'll avoid confrontation. If anything, putting off contact will cause disgruntlement to fester and increase the chances of ultimate SLM rejection.

When complaints are voiced, try to defuse them before they become insurmountable obstacles or crises. If a client is unhappy with the time interval being monitored for service level performance, change it. Don't ask for more time or argue against it. Instead, give a simple response such as, "Yes, that sounds like a good idea; let's give it a try." Demonstrating your willingness to act as the client wants will dispel suspicions that you're using your technical expertise to rule the roost.

Sometimes, mistakes will be made. You might fail to be proactive or to initiate SLM contact with clients. If this happens, there is a risk that anything you say regarding SLM will be viewed with skepticism. You must accept this. You made a mistake by not being proactive or by responding inadequately to your clients, and you must pay the price. There is no magic fix. Candor and honesty, coupled with open communication, stand the best chance of healing the wound over time. So if you find yourself confronted by an unhappy client, be frank. Admit your mistakes, outline your plans to resolve the problems, and move forward. Invite the client to work with you to establish SLAs that will meet their requirements. And keep the lines of communication open. Where problems have occurred, it's important to exceed the minimum level of dialog that the SLA process requires.

Summary

Effective implementation of SLM requires more than good intentions and good products. It calls for a carefully considered strategy that emphasizes cooperation and planning. IT managers can ensure success by first analyzing their company's unique culture, then proceeding with an open mind and a willing attitude to create a plan that fits it. Putting the plan into action requires assembling a team of professionals who are committed to the rollout. The team must use a thorough, orderly process to create SLAs, track them, and distribute reports in agreed-upon formats. In addition, IT must follow up with ongoing checks on user satisfaction. At every juncture, the time and trouble invested in establishing trust, reliability, and orderly and open communication will determine an organization's success in putting SLM into practice.
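The requirements worksheet in Table 9.2 can also be kept as a small lookup while taking stock of tools already on hand. Here is a hypothetical Python sketch. `WORKSHEET` transcribes a few rows of the table (specific metric mapped to candidate sources), and `plan_sources` is an illustrative helper, not part of any product. For a set of required metrics, it reports which on-hand sources cover each one and which requirements still have no source.

```python
# A few rows transcribed from Table 9.2: specific metric -> candidate sources.
WORKSHEET = {
    "network availability": {"network management platforms",
                             "performance management applications",
                             "protocol analyzers", "traffic monitors",
                             "RMON probes"},
    "system uptime": {"systems management platforms",
                      "systems management applications", "system log files"},
    "transaction rates": {"log files",
                          "systems management platforms and applications"},
    "batch job completions": {"log files"},
    "security": {"log files", "RADIUS servers"},
}


def plan_sources(required, on_hand):
    """Match required metrics against the sources already in house.

    Returns (covered, gaps):
      covered -- {metric: sorted list of on-hand sources that can supply it}
      gaps    -- metrics with no on-hand source (candidates for upgrades
                 or new purchases)
    """
    on_hand = set(on_hand)
    covered, gaps = {}, []
    for metric in required:
        usable = sorted(WORKSHEET.get(metric, set()) & on_hand)
        if usable:
            covered[metric] = usable
        else:
            gaps.append(metric)
    return covered, gaps
```

Only the metrics that land in `gaps` need to go on the shopping list. Even those are worth checking first against upgrades of existing tools.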
CHAPTER 10

Capturing Data for Service Level Agreements (SLAs)

The primary objectives of service level management are to measure, monitor, and improve the quality of services provided to end users. The data collected to support these objectives must measure directly, or provide the ability to derive, the end-to-end service availability and responsiveness experienced by end users.

Metrics for Measuring Service Levels

Rather than measure end-to-end service levels, many IT departments measure the availability and performance of individual resources and components such as the network, servers, or individual devices. This approach does not align the IT department with the lines of business. The lines of business are concerned with end-to-end availability and responsiveness of the critical applications that support their automated business processes. The Service Level Agreements with the lines of business should therefore be based on application services.
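A short worked example shows why component-level figures can overstate what users experience. It rests on two simplifying assumptions: a transaction needs every component in its path, and components fail independently. Under those assumptions, end-to-end availability is the product of the component availabilities:

```latex
\begin{aligned}
A_{\text{e2e}} &= \prod_{i=1}^{n} A_i \\
A_{\text{network}} = A_{\text{server}} = A_{\text{database}} = 0.995
  \;\Rightarrow\; A_{\text{e2e}} &= 0.995^{3} \approx 0.985
\end{aligned}
```

Each component meets a 99.5% target, yet the application service the user depends on is available only about 98.5% of the time.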
Four broad parameters typically used in evaluating service levels are as II)llows: With a n ultitier server structure, end-to-end response time might not provide a
• Availability detailed enough picture of delays within the underlying components to pinpoint
performance problems. Another technique, inter-server response time measurement,
. Performance focuses on the response time between servers. Providing multitiered response time
. Reliability measurement allows the IT personnel to drill-down and discover the source of per-
• Recoverability formance problems. Undoubtedly the best approach to measuring response time is to
implement both end-to-end and inter-server response time measurements.
Availability refers to the percentage of time available for use, preferably of the end- Today, only a small percentage of IT departments set and measure Service Level
to-end service, but many times of a server, device, or the network. Performance Agreements for distributed application availability and performance. Many of these
basically indicates the rate (or speed) at which work is performed. The most popular IT departments do so using in-house developed tools and manual processes.These
indicator of performance today is response time. However, other indicators are also processes are typically based on analyzing end-user service problem calls and corre-
useful in specific contexts. For example, in a company that performs remote data lating the end-user locations with the components that are failing or performing
backups for its clients, an important performance indicator would be the file trans- poorly. New technologies are emerging that can assist the IT department to measure
fer rate. Reliability refers to how often a service, device, or network goes down or distributed application availability and performance in a more automated fashion.
how long it stays down, and recoverability is the time required to restore the service
following a failure. These metrics offer high-level views of service quality, whereas
response time is a way to directly measure how the end user's productivity and sat- Methods for Capturing Service Metrics
isfaction are affected by service performance. Five emerging methods for proactively measuring application availability and
Simply providing measurements of availability, performance, reliability, and recover- performance are as follows:
ability is not enough to perform service level monitoring. All aspects of service • Monitoring all components used by application transactions and aggregating
that affect end-user productivity and satisfaction should be covered by the Service these to derive overall availability and performance measures
Level Agreements. The characteristic that has the highest visibility among cus- • Inspecting network traffic to identify application transactions, which are then
tomers is response time. Other aspects to measure and monitor include workload tracked to completion and measured for propagation delay
volumes, help desk responsiveness, implementation times for configuration changes
• Using client agents that decode conversations to identify application transac-
and new services, as well as overall customer satisfaction.
tions and measure client-perceived availability and responsiveness
Although Service Level Agreements should align with the end-user perception of service quality, IT departments have been reluctant to agree to such SLAs for distributed application services because of the difficulty in measuring actual application availability and performance on an end-to-end basis.

For example, end-to-end response time measurement is an ill-defined concept. Management vendors have differing definitions that vary depending on what their particular product can do. In general, industry analysts agree that end-to-end response time should start and end at the desktop and should measure the time from when a command or transaction request is entered on the keyboard to the time when the resulting actions are completed and the results are displayed on the monitor.

Today's application architectures vary widely and typically use some variation of the client/server model. This results in some processing occurring on the desktop, some on the application server, and, in the case of a multitiered architecture, some on back-end database servers. This complicates capturing end-to-end response times because a single business or application transaction will span multiple interactions between the various client/server layers.

• Instrumenting the application code to define application transactions and collecting information on completed transactions and response times
• Generating synthetic transactions at regular intervals and collecting availability and performance measures based on tracking these transactions

Each of the methods determines application availability by measuring response times of multiple integrated applications used in a business process, a single application, or a transaction. Applications with response times recorded above a predefined threshold are considered unavailable. The last three methods provide the most accurate picture of end-to-end response times as perceived by the user community.

Tip
Selecting which method to use depends on a number of factors, including access to code for instrumentation purposes, willingness to proliferate and manage agents on desktops, ability to acquire sophisticated network traffic monitors, and the inherent inaccuracies of some of these approaches. In many cases, a combination of approaches deployed pragmatically will provide the best solution.
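The threshold rule above (a response time recorded above the predefined threshold counts as unavailable) can be sketched in a few lines of Python. This is a minimal illustration, not any product's method: it assumes each measurement is one end-to-end sample, that a transaction which never completed is also counted as unavailable, and the `Sample` type, the `availability` function, and the 3-second threshold are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Sample:
    """One end-to-end measurement for a transaction, in seconds.
    None means the transaction never completed."""
    response_time_s: Optional[float]

def availability(samples: List[Sample], threshold_s: float) -> float:
    """Percentage of samples that count as available.

    A sample above the predefined threshold, or one that never completed,
    is counted as unavailable.
    """
    if not samples:
        raise ValueError("no samples collected")
    available = sum(
        1 for s in samples
        if s.response_time_s is not None and s.response_time_s <= threshold_s
    )
    return 100.0 * available / len(samples)

samples = [Sample(1.2), Sample(0.8), Sample(4.5), Sample(None), Sample(2.0)]
print(f"{availability(samples, threshold_s=3.0):.1f}%")  # 60.0%
```

Because every sample is weighted equally, the resulting figure is only as representative as the schedule and locations from which the samples were taken.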

Use of these techniques does not eliminate the need for measuring the service levels of individual components. In many cases, these techniques will identify service problems based on end-to-end measurements, but this might not be enough to determine where the problem is located or how to correct it. However, by comparing response times by application across various locations, it might be possible to isolate the problem location. For example, if an application is performing poorly across all locations, the server or database is the likely cause. If an application is performing poorly in only one location, it is likely a location-specific problem such as the local server, the local area network, or the wide area network connection between that location and the application server.

Caution
These techniques for measuring end-to-end response times aren't able to detect outages of individual desktops. These methods measure availability and performance of application transactions between the user and the business process. This might be an issue for client/server applications in which a significant portion of the application code actually runs on the desktop itself. The IT department should continue to monitor help desk calls and the problem resolution system closely to determine the business impact of individual desktop problems.

We will now examine each of these methods in more detail. As these techniques require data to be collected continuously, we will also discuss some of the common architectures used by data monitoring solutions later in the chapter.

Monitoring Individual Components and Aggregating Results
Infrastructure monitoring involves measuring the availability and performance of individual components such as servers, networks, databases, and clients. To understand the service delivered by a specific application, the availability and performance data across all relevant infrastructure components must be consolidated and aggregated. Figure 10.1 shows the various infrastructure technology layers.

This approach to service level monitoring concentrates on the components in the application dependency stack. Software agents are installed on each technology level to gather information. The information from all the agents is consolidated to provide a single view of the application and its performance. This approach is often referred to as monitoring by footprint. It provides a more comprehensive view of the application than simply monitoring one component such as network traffic.

Caution
Monitoring across infrastructure components can use significant system resources to support the data collection agents. It can also be difficult to identify, maintain, and register all the components used by each application, and it can be difficult to consolidate this information and present it in a format that is easy to understand.

Figure 10.1 The infrastructure technology layers (top to bottom): Web Server and Internet Middleware; Applications including ERP, E-business...; Middleware (transaction, message, or object-oriented); Database, File System, Print Qs, Fax Qs...; Servers with Operating System, CPU, Memory, Disk...; Network connections and devices.

A consideration when adopting this approach is the ability to capture management information about the availability and performance of all the relevant components. The primary focus areas for data collection are the servers and the network. The desktops, databases, and middleware also impact availability and performance.

The performance data collected natively for servers varies depending on the operating system platform. OS/390 is a well-instrumented operating system with a significant amount of information available through IBM utilities including System Management Facilities (SMF) and Resource Measurement Facility (RMF), as well as a number of established products from third-party vendors. As these capabilities have been available for a number of years and are well understood, we will concentrate on the distributed systems platforms, including UNIX, Microsoft Windows NT, and Windows 2000.

Monitoring UNIX Systems
As mission-critical applications are moved to UNIX servers, the performance and availability of the UNIX server assume greater importance. However, the UNIX operating system is not constructed to provide complete and accurate performance information. One important performance metric is CPU utilization, and the quality of this metric is determined by the capture ratio, which is the proportion of CPU utilization that is accounted for. In most UNIX systems, the capture ratio is not sufficient for sophisticated performance analysis or capacity planning.
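The capture ratio reduces to simple arithmetic. The sketch below assumes that "accounted for" means CPU time the system can attribute to individual processes, compared against the total non-idle CPU time reported system-wide over the same interval; the function name, process names, and figures are hypothetical.

```python
from typing import Dict

def capture_ratio(per_process_cpu_s: Dict[str, float], total_busy_s: float) -> float:
    """Share of system-wide busy CPU time that is attributed to individual processes.

    per_process_cpu_s: CPU seconds charged to each process during the interval
    total_busy_s: non-idle CPU seconds reported for the whole system, same interval
    """
    if total_busy_s <= 0:
        raise ValueError("total busy time must be positive")
    return sum(per_process_cpu_s.values()) / total_busy_s

# Hypothetical interval: the system reports 72 s of busy CPU time,
# but only 54 s can be charged to named processes.
charged = {"db_server": 30.0, "app_server": 18.0, "web_server": 6.0}
print(f"capture ratio = {capture_ratio(charged, total_busy_s=72.0):.0%}")  # capture ratio = 75%
```

The remaining 25 percent in this example is CPU time that was observed as busy but could not be attributed to any process, which is the gap that makes such data unreliable for capacity planning.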

Note
UNIX systems come with a variety of performance measurement utilities. Unfortunately, these utilities were designed as standalone tools, and each addresses the particular problem the utility designer was trying to solve at the time of its design. The outputs of these utilities vary between UNIX variants. In addition, the underlying measurement procedure is not well documented or supported. As a result, it takes a large amount of effort to correctly collect, understand, and interpret UNIX performance data in consistent ways.

The utilities generally available with the UNIX operating system include the following:

• sar (system activity reporter) records and reports on system-wide resource utilization and performance information, including total CPU utilization. CPU utilization is measured using the tick-based sampling method: a counter accumulates the number of CPU ticks during which a non-idle process was running, and this counter is sampled at specified intervals to compute the average CPU utilization between samples. This method leads to the problem of a relatively low capture ratio.
• The accounting utility records the resources used by a process upon the process's termination. The principal drawback of this method is that no information is available for the process until it terminates. Accounting reports summarize these statistics by command or process name and by username.
• The ps utility provides a snapshot of the processes running on the system as an ASCII report. It reports the amount of CPU used by each process since its inception. When reporting information on all the processes, overhead is quite high.

As seen from this quick overview, these tools primarily provide resource utilization information and don't measure end-user response times or application transaction throughput. The output of these utilities differs among the assorted UNIX variants and doesn't provide historical or trend information.

A number of performance monitoring products are available from independent software vendors. Most of these collect data through a standard UNIX interface called the /dev/kmem kernel device driver. The advantages of the third-party products include the ability to normalize and compare the data across different UNIX variants, as well as greater productivity through enhanced user interfaces and reports, including trend analysis reports.

Monitoring Windows NT and Windows 2000
Microsoft ships performance management tools with Windows NT and Windows 2000. These include the following:

• Perfmon monitors performance and server resource usage (including CPU, memory, and disk I/O). It uses counters from the Windows registry, and the data can be logged and viewed online or charted in reports.
• Task Manager provides information on all the processes and services running and the amount of memory and CPU they are using.
• Process Explode monitors processes, threads, and committed mapped memory. This is primarily of use to developers.
• Quick Slice is a basic tool for viewing the CPU usage of each active process.

Similar to the standard UNIX utilities, these Windows facilities focus on resource utilization and don't directly measure or monitor the service levels experienced by users. The event logs can also provide a significant amount of information about activity on an NT system. The NT Resource Kit Utilities allow these logs to be dumped and imported into a database for easier manipulation and analysis.

Monitoring the Network
The Simple Network Management Protocol (SNMP) and associated Management Information Bases (MIBs) are used by most network device vendors to provide configuration, fault, and performance information about the network components. This information is useful for diagnosing failures and service degradation. The Remote Monitoring (RMON) MIB provides additional information that can be useful for determining traffic patterns and for understanding bandwidth utilization by protocol and traffic type.

Later in this chapter, we discuss in more detail how analyzing network traffic provides additional information about the quality of service delivered to end users.

Monitoring the Database
SNMP can be used for high-level database monitoring, as most database vendors support the RDBMS MIB. However, for more detailed monitoring and database management, additional tools are required, either from the database vendor or from third-party software vendors.

Aggregating the Component Information
To build the composite picture of availability and performance, the standard utilities will generally need to be augmented by third-party performance monitoring tools. Additionally, a management database and a mechanism for loading component data from servers and the network provide a central repository for answering queries and producing reports. Similarly, performance information should be captured from other components, such as middleware, databases, and Web servers, and loaded into the same repository.
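One way to picture the aggregation step is to treat the repository as a series of monitoring intervals, each holding the up/down status reported by the component agents, and to count the application as available only when every component in its footprint is up. This is a simplified sketch under those assumptions; the component names, the `application_availability` function, and the rule that a missing report counts as down are hypothetical choices, not a feature of any particular product.

```python
from typing import Dict, List

# One repository row per monitoring interval: component name -> reported up/down.
IntervalStatus = Dict[str, bool]

def application_availability(intervals: List[IntervalStatus], footprint: List[str]) -> float:
    """Percentage of intervals in which every component in the application's
    footprint (its dependency stack) was reported up.

    A component with no report for an interval is treated as down, a
    deliberately conservative assumption.
    """
    if not intervals:
        raise ValueError("no intervals in the repository")
    up = sum(
        1 for status in intervals
        if all(status.get(component, False) for component in footprint)
    )
    return 100.0 * up / len(intervals)

# Hypothetical data for an order-entry application spanning four components.
intervals = [
    {"web01": True, "app01": True, "db01": True, "wan-link": True},
    {"web01": True, "app01": True, "db01": False, "wan-link": True},
    {"web01": True, "app01": True, "db01": True, "wan-link": True},
    {"web01": True, "app01": True, "db01": True},  # no report from the WAN link agent
]
order_entry = ["web01", "app01", "db01", "wan-link"]
print(application_availability(intervals, order_entry))  # 50.0
```

Response times do not combine this simply, since the elapsed time of one transaction is spread across several components rather than being an all-or-nothing status.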

When the information has been placed in the repository, analysis tools that support a specific application service are required to correlate and aggregate information across all components. It is typically easier to use this method to determine end-to-end availability than it is to determine end-to-end response times.

Inspecting Network Traffic for Application Transactions
Two mechanisms for inspecting the network traffic are as follows:

• Decoding network packets
• Intercepting socket traffic

In both cases, the network traffic is examined to identify the end-points in the connection, including which application is participating in the conversation. This allows transaction times to be calculated by linking the request/response pairs in a dialogue and determining the elapsed times between requests and responses.

Decoding Network Packets
This approach, known as wire sniffing or network packet decoding, involves using a technology that intercepts and analyzes every network packet. Typical technologies are pure software approaches that do not require a dedicated hardware card. This technology manipulates the Network Interface Card (NIC) by placing it in promiscuous mode, which sends the packets through the software components that analyze the packet data.

The analysis of these packets is by no means a trivial task. Simple network packet data is relatively easy to gather. For example, mapping packets to sockets and port numbers can be useful in analyzing bandwidth usage by application. However, decoding the packets to find the application-level transaction start and finish points is more difficult. Openly documented protocols such as DNS, telnet, and HTTP are not hard to decode. However, proprietary protocols such as SQL*Net from Oracle, or those of ERP applications such as SAP R/3, are more difficult to decode. This difficulty is the main reason why such approaches are not amenable to in-house development and are available only from management solution vendors.

One advantage that arises from current LAN technology is that the network packet collector need not sit on the server or client machine. It can run on any machine within the network segment, thus allowing flexibility to perform analysis on the server or on a dedicated workstation.

Intercepting Socket Traffic
Another technology with similar promise to network packet decoding involves intercepting network traffic directly from the kernel stack. Network traffic on a system (server, workstation, or desktop) passes through a software layer before continuing into or out of the system. Typically, this collection method involves replacing some low-level network libraries or intercepting traffic from existing low-level operating system features.

This technology sees all incoming and outgoing network traffic. Analysis can be as simple as byte counts or as complex as full protocol decodes. The same comments about decoding traffic packets and protocols apply here as they do for network packet decodes from wire sniffers: openly documented protocols such as DNS and HTTP are easier to decode than proprietary vendor protocols.

The main advantage of this technology is the opportunity for actual optimization and compression, rather than just measurement. This idea can be implemented on either the client or server machines, or on both sides.

Caution
The primary drawbacks of analyzing network traffic are the inability to define transactions in user terms and the difficulty of matching all traffic. Additionally, these techniques do not capture response time stemming from desktop application components.

End-to-End Service Level Measurement
The problem of measuring end-to-end service levels, including availability and response times, is gaining significant interest from IT departments and the vendor community. There are multiple methods for addressing this issue, and new technology is rapidly emerging. Three basic approaches to this problem are as follows:

• Capturing information from the end-user's desktop to understand the application transactions as they occur
• Instrumenting the application to identify transactions with markers that can be monitored in real time
• Generating sample transactions that simulate the activities of the user community and that can be monitored

Using Client Agents to Decode Conversations
This approach involves loading every client machine with a small agent that nonintrusively watches events such as keystrokes or network events. The client agent
then attempts to detect the start and end of a transaction and measure the time between these events. Typically, the client agent then sends the measured data back to a central place where broader analysis occurs.

These client agents capture response time from the client perspective without having to instrument the application itself. For example, some capture information on Web browser interactions, such as the response time for page retrievals or downloads. Similarly, some can decode client transactions for popular ERP applications. The primary benefits of this method are the granularity of collection (for example, at an individual screen level), the lack of application instrumentation, and the ability to analyze user interaction from the detail data.

Tip
The primary drawback of this approach is the large volume of data captured. To mitigate this issue, place agents on representative desktops rather than on every desktop in the organization. Using a sampling mechanism can also reduce the volume of data while still providing reasonable availability and response time metrics.

These client capture agents might also be appropriate for user workflow analysis in addition to capturing the service quality from the end-user perspective.

Instrumenting Applications
The next approach involves building application programming interfaces (APIs) that provide monitoring directly into the application. These API calls allow a monitoring tool to query the application for end-to-end response times, as well as run application management actions on the application, for example backup and recovery routines. This approach is still developing, as an industry-accepted standard for these API calls has not emerged yet.

Even after a standard becomes widely accepted, many popular applications will likely go through several releases before they fully support the standard. This embedded API approach offers the best accuracy for measuring application response time. The Application Response Measurement API (ARM), discussed in Chapter 5, "Standards Efforts," is a good example of an API used to measure response time.

The instrumentation APIs define the start and end of business transactions and capture the total end-to-end response times as users process their transactions. This technology is invasive to the application itself. The strength of this approach is that transactions are defined in terms of business processes. The primary drawback is application invasiveness, which is an expense that most enterprises are willing to incur only for their most critical applications. Further, instrumentation adds overhead that could impact the runtime performance of the application.

Caution
The main issue with this approach is the high cost of modifying legacy application code and the lack of coordination between most IT operations staff and application development departments. The need for application modification makes this approach inapplicable to older, noninstrumented versions of an application. Hence, this intrusive instrumentation approach is best used in a situation in which a full revision and upgrade of the application is already required or under way.

Generating Synthetic Transactions
Generating synthetic transactions can be accomplished using scripts and intelligent agents or by using tools that capture transactions and then later play them back against an application service. These mechanisms allow a simulated response time to be measured, and they can be very effective provided sufficient thought is given to ensuring that the transactions are truly representative of typical user behavior and that all locations are also represented.

The implementation can be as simple as writing a small script that launches the application's client, command-line interface (CLI), or application programming interface (API) to perform a simple read or other controlled sample transaction. The length of time taken can be subject to alerting, notification, and further actions for deeper diagnosis or perhaps corrective attempts. The script can be scheduled to run at regular intervals using an intelligent agent or task scheduler.

Capture/playback tools have traditionally been used for application testing. These are now evolving to also report availability and performance metrics. A number of capture/playback products are available from testing tool vendors that can be retrofitted or customized to provide availability and performance metrics.

Capture/playback tools record user keystrokes and can play them back at regular intervals while measuring response time. By using distributed server resources or placing dedicated workstations at desired locations to submit transactions for critical applications, a continuous sampling of response times by location is captured and reported. The strength of this method is its ability to represent the end-user experience using samples rather than having to collect large volumes of data across all transactions from all end users.

There are two important considerations when implementing the synthetic transaction approach. The first is to use a broad enough sample base to capture the service quality across all critical applications and end-user locations, while ensuring that the number of transactions generated does not place too much overhead on the application environment. The second is that as the applications or the measurement criteria change, the scripts will need to be modified or the transactions will have to be re-captured for future playback.
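A synthetic transaction script of the kind described above can be sketched as follows. It is illustrative only: `sample_read` is a hypothetical stand-in for a controlled read against the application, the threshold and interval values are arbitrary, and in practice the scheduling would be handled by an intelligent agent or task scheduler rather than an in-process loop.

```python
import time
from dataclasses import dataclass
from typing import Callable, List, Optional

@dataclass
class ProbeResult:
    elapsed_s: float            # measured response time of the synthetic transaction
    ok: bool                    # completed without error and within the threshold
    error: Optional[str] = None

def run_probe(transaction: Callable[[], None], threshold_s: float) -> ProbeResult:
    """Execute one synthetic transaction and time it from start to finish."""
    start = time.perf_counter()
    error: Optional[str] = None
    try:
        transaction()
    except Exception as exc:  # a failed transaction is reported as unavailable
        error = repr(exc)
    elapsed = time.perf_counter() - start
    return ProbeResult(elapsed, error is None and elapsed <= threshold_s, error)

def probe_loop(transaction: Callable[[], None], threshold_s: float, interval_s: float,
               iterations: int, alert: Callable[[ProbeResult], None]) -> List[ProbeResult]:
    """Repeat the probe at a fixed interval and raise an alert for every bad result."""
    results: List[ProbeResult] = []
    for i in range(iterations):
        result = run_probe(transaction, threshold_s)
        if not result.ok:
            alert(result)  # hook for notification or deeper diagnosis
        results.append(result)
        if i < iterations - 1:
            time.sleep(interval_s)
    return results

def sample_read() -> None:
    """Hypothetical stand-in for a controlled read against the application."""
    time.sleep(0.01)

alerts: List[ProbeResult] = []
history = probe_loop(sample_read, threshold_s=0.5, interval_s=0.1, iterations=3,
                     alert=alerts.append)
print([round(r.elapsed_s, 2) for r in history], "alerts:", len(alerts))
```

Keeping the probe itself small and side-effect free, as with a simple read, limits the overhead the generated transactions place on the application environment.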

Commands like ping and traceroute are special cases of simulated transactions. They measure only the response time of the network round trip and do not include any information about the application server or database. These approaches can be useful in detecting network congestion, diagnosing whether a problem is network or server related, and separating measured transaction times into network and non-network times. As an example of the latter, transaction-level synthetic transactions correlated with network-level pings can provide a reasonable division of response time into network and server times.

Tip
Synthetic transaction generation, together with built-in sampling capabilities, offers the best approach to measuring availability and response time metrics for the widest variety of business transactions. This approach is not intrusive into the application, and it requires less technical skill to implement.

Common Architectures and Technologies for Data Capture Solutions
The most common approach to monitoring applications uses intelligent autonomous agents to gather information on the application as well as the underlying infrastructure components. These agents use a variety of measurement techniques to collect the data that is then used for event management and problem diagnosis and stored for trend analysis and reporting. In most cases, a management console is used to display the information for administrative purposes.

When deploying management agents for data collection, the captured information must be consistent and accurate. One issue to consider is the potential problem of managing the widespread proliferation of agents. This can be addressed by ensuring both an efficient management agent architecture and an efficient data collection mechanism.

Management Agent Characteristics
Management agents perform a number of functions, including scheduling the collection of data, determining event conditions, forwarding this event information to consoles, executing recovery actions, and storing metric and event information for historical purposes. Simple agents are normally slaves to a master console. They collect data and perform some event detection based on simple threshold analysis. They then pass the collected data and events to a management console that has built-in intelligence to know how to react to the event. As outlined in Chapter 7, "Service Level Management Products," there are a number of vendors that provide network, operating system, database, middleware, and application agents as part of their management solution. All these agents perform similar tasks, but the functionality differs based on certain agent characteristics.

True intelligent agents have the following characteristics:

• Autonomous: Operates independently of the management console, including the ability to start, collect data, and take actions.
• Social: Communicates with other agents, management consoles, and directly with users.
• Reactive: Detects events and initiates actions based on the event.
• Dynamic: Operates differently depending on time and the context of other activities that might be happening.

There are also a number of technical aspects of an intelligent agent, including the following:

• Asynchronous: Does not need a permanent link to the initiating event or console.
• Event-driven: Reacts to events and runs only when certain events occur.
• No active user interaction: Does not require constant user intervention to run.
• Self-executing: Has the ability to run itself.
• Self-contained: Has all required knowledge to perform its task.

To avoid excessive overhead on the servers, the number of agents should be limited. This might be best achieved by acquiring agents from as few different vendors as possible. When selecting agent vendors, agent-to-agent integration capabilities, agent intelligence, and agent security are important considerations.

Tip
Before deploying agents, implement a pilot to measure CPU, memory, and bandwidth consumption of agents and consoles under a variety of operating conditions. By estimating the number of events and the overhead required to manage that number of events, the agent impact on the system can be accurately planned.

Procedures to control, rationalize, and optimize the scope of agent execution in both function and execution time should be developed. For example, ensuring that only a fault detection agent is run continuously will limit the impact on overall system performance. Other data collection agents can be run only when needed, either at regular intervals or in response to a detected condition that requires additional data for diagnosis. Historical data should be held for trend analysis so that the root causes of problems can be identified.
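The procedure of running only a lightweight fault detection check continuously, collecting heavier diagnostic data in response to a detected condition, and keeping history for trend analysis can be sketched as a small reactive agent. The class, its parameters, and the 90 percent CPU threshold in the usage example are hypothetical; a real agent would also forward events to a console and run under its own scheduler.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Callable, Deque, Dict

@dataclass
class FaultDetectionAgent:
    """Hypothetical reactive agent: a cheap check runs every cycle; an expensive
    diagnostic collector runs only when the check detects a condition."""
    check: Callable[[], float]                 # lightweight metric, e.g. CPU busy %
    threshold: float                           # value above which diagnosis is triggered
    collect_diagnostics: Callable[[], Dict]    # heavier, on-demand data collection
    history: Deque[Dict] = field(default_factory=lambda: deque(maxlen=10_000))

    def cycle(self, timestamp: float) -> bool:
        """Run one detection cycle; return True if the condition was detected."""
        value = self.check()
        record: Dict = {"t": timestamp, "value": value}
        detected = value > self.threshold
        if detected:
            # Event-driven collection: extra data is gathered only when needed.
            record["diagnostics"] = self.collect_diagnostics()
        self.history.append(record)  # bounded history retained for trend analysis
        return detected

readings = iter([42.0, 55.0, 97.0, 60.0])   # hypothetical CPU busy % samples
diagnostic_runs = []

def collect() -> Dict:
    diagnostic_runs.append(1)
    return {"top_processes": ["hypothetical-batch-job"]}

agent = FaultDetectionAgent(check=lambda: next(readings), threshold=90.0,
                            collect_diagnostics=collect)
detections = [agent.cycle(float(t)) for t in range(4)]
print(detections)                                  # [False, False, True, False]
print(len(diagnostic_runs), len(agent.history))    # 1 4
```

The bounded `deque` is one way to keep history without letting an always-on agent grow without limit; a production agent would more likely ship its records to the central repository.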

Tip
Multiple agents that duplicate agent functionality on a server should be avoided wherever possible. This can be achieved by careful coordination across management disciplines such as network management, database administration, and systems management. Each management discipline should be responsible for controlling agent deployment within its functional area. Policies and procedures should be developed for deploying and managing distributed agents.

Measurement Techniques
Before using a specific performance metric, it is important to have a clear and unambiguous understanding of its semantics. This is particularly important when using multiple metrics in conjunction to derive end-to-end service quality or to solve a problem of service degradation. Almost all operating systems and management solutions have had some metrics with ambiguous meaning at some point in time. Measurement techniques for collecting performance data can be divided into two general categories: event-driven and sampling-based.

Event-Driven Measurement
Event-driven measurement means that the times at which certain events happen are recorded, and the desired statistics are then computed by analyzing the data. For example, when measuring CPU utilization, the events of interest are the scheduling of a process to run on a processor and the suspension of its execution. The elapsed time between the scheduling of a process to run and the suspension of its processing is added to the CPU's busy counter and the process's CPU use counter, which can be sampled and written periodically to a log file or repository. With this method, both the total CPU utilization and the CPU utilization for each process are measured.

The same method can be used for collecting other information, including end-to-end response times based on instrumentation APIs or synthetically generated transactions.

Sampling-Based Measurement
The sampling method of data collection involves taking a scheduled periodic look at certain counters or information access points. For example, when measuring CPU utilization by sampling, the measurement method periodically takes a sample to see if any process is running on the CPU, and if so, it increments the system busy counter as well as the CPU usage counter for the process. The data collector will typically sample these counters and record values in a log file.

The sampling method is generally more efficient because it places less overhead on the system under measurement.

Comparative Analysis
The event-driven collection method is generally the most accurate, but it does have some limitations. Its accuracy depends on the level to which the events are defined. There can also be discrepancies depending on the nature of the event interrupt and when the actual measurements are taken. Depending on the frequency of events, the overhead of event-driven measurement can be significantly larger than that of sampling-based measurement, and it can potentially distort the measurements significantly.

On the other hand, the sampling-based method is subject to errors when multiple activities occur or processes run between two samples. The activity occurring at the time of the sample will be allocated the entire length of the sample interval. Other activities or processes are not allocated any time during that sample. Similarly, if an activity takes place totally within a sample interval or if a process is created and terminated between two samples, it is not allocated any time at all.

Note
The amount of error in the sampling depends primarily on the sampling frequency. Longer time between the samples will result in larger potential errors. The trade-off is that more frequent sampling will increase the overhead of this technique.

Summary
There are multiple data collection techniques and agent architectures that are used to collect data that is useful for measuring and monitoring service levels. Careful consideration should be given to the nature of the technology used and the deployment of collection agents. The goal is to ensure that sufficient data is collected to accurately measure service quality, while not placing excessive overhead on the computing environment.

It is also very important to ensure that service levels are measured on an end-to-end basis so that the end-user experience is captured. A number of techniques can be used to measure end-to-end availability and response times. When selecting a method to use, considerations include access to code for instrumentation, agent proliferation, and the level of in-house expertise for implementing and supporting the management solution. In many cases, adopting a pragmatic approach utilizing synthetically generated transactions that are measured using sampling techniques will provide sufficient scope and accuracy of information.

Very critical and time-sensitive applications might require more sophisticated techniques such as intrusive application instrumentation or client agents to provide more comprehensive, accurate information.
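The difference between the event-driven and sampling-based CPU measurement methods described earlier in this chapter can be seen in a small simulation. This is a toy model under stated assumptions: a single CPU, a hypothetical timeline of when three processes held the CPU, and a sampler that charges the whole sample interval to whichever process is running at the sampling instant (idle time is charged to nobody). The process names and times are invented for illustration.

```python
from typing import Dict, List, Tuple

# (process, start_s, end_s): periods during which each process held the single CPU.
Timeline = List[Tuple[str, float, float]]

def event_driven(timeline: Timeline) -> Dict[str, float]:
    """Charge each process the exact time between being scheduled and suspended."""
    usage: Dict[str, float] = {}
    for proc, start, end in timeline:
        usage[proc] = usage.get(proc, 0.0) + (end - start)
    return usage

def sampled(timeline: Timeline, period_s: float, duration_s: float) -> Dict[str, float]:
    """At each sampling instant, charge the whole period to the running process."""
    usage: Dict[str, float] = {}
    for k in range(round(duration_s / period_s)):
        t = k * period_s
        for proc, start, end in timeline:
            if start <= t < end:
                usage[proc] = usage.get(proc, 0.0) + period_s
                break  # only one process can hold a single CPU
    return usage

# Hypothetical one-second timeline; "cron" starts and finishes between samples.
timeline = [("db", 0.00, 0.35), ("cron", 0.42, 0.48), ("app", 0.50, 0.72)]
exact = event_driven(timeline)
approx = sampled(timeline, period_s=0.1, duration_s=1.0)
for proc in ("db", "cron", "app"):
    print(f"{proc:<5} event-driven={exact.get(proc, 0.0):.2f}s sampled={approx.get(proc, 0.0):.2f}s")
# db    event-driven=0.35s sampled=0.40s
# cron  event-driven=0.06s sampled=0.00s
# app   event-driven=0.22s sampled=0.30s
```

With a 0.1-second sampling interval, the sampler overcharges `db` and `app` and never sees the short-lived `cron` process, which is the error pattern described under Comparative Analysis; shortening the interval reduces the error at the cost of taking more samples.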
CHAPTER

Service Level Management as Service Enabler

The benefits of service level management can be clearly delineated for any organization that takes the time to make it work. But SLM can be especially advantageous for those companies seeking to sell their IT services to outside users. In fact, the growing ranks of Internet service providers (ISPs), application service providers (ASPs), and outsourcers testify to the value of SLM.

By defining the parameters of acceptable service and setting clear goals and expectations for providers and users, SLM provides a framework in which providers can offer more and better services, while maximizing the potential of existing ones. SLM also is the key to helping users ensure that they get the most value from their growing investment in outside services. In most organizations, increased demands on IT are accompanied by staff shortages and budget constraints. IT managers are turning to external providers for help. SLM gives them a way to quantify and get the level of performance and capacity they require.

In this chapter, we'll take a closer look at how SLM is playing a role in the growing world of online services. In doing so, we will attempt to focus on SLM from two perspectives: the service provider's and the end user's.

The Ascendance of IP

First, we will take a look at the types of services that are most often used by corporate customers to extend their internal networks. These services vary, but there is a preponderance of demand for IP-based services. The reasons are clear: Internet access is inexpensive compared with the costs of dedicated, private networks. The Internet is a fast and easy way to extend the corporate network without adding new facilities. An Internet presence gives companies with limited geographic scope a way to market their wares to 55 million computers in 222 countries.

Improved security and performance on the Internet also make it a unique environment for .com businesses such as Amazon.com that exist solely in cyberspace. The market for firms like Amazon.com that conduct business-to-consumer electronic commerce over the Internet is expected to exceed $100 billion over the next three to four years. And business-to-business electronic commerce, in which companies use the Internet to support transactions with partners and suppliers, is even bigger. Estimated revenue for companies in this space is expected to top $1 trillion within the same timeframe.

Market opportunities like these are forcing SLM into the spotlight, as providers and their customers seek ways to establish and maintain ever-higher levels of network performance and availability in an increasingly service-oriented environment.

Note
The Advantages of Internet-based Services are as follows:
For end users: A quick, inexpensive way to extend in-house networks and interact with customers and suppliers
For service providers: Fast deployment of services at low cost; worldwide reach; and a unique environment for services like electronic commerce

A Spectrum of Providers

As demand rises for Internet-based services, the market is becoming increasingly segmented. Internet service providers (ISPs) tout a range of offerings IT professionals can use to increase their companies' online capabilities, including standard Web access and hosting, email, virtual private networks, electronic commerce networking services, remote access, and voice-over IP services. According to investment banker Credit Suisse First Boston (New York), projected worldwide revenue for ISPs will exceed $45 billion by 2002.

Meanwhile, an emerging segment of application service providers (ASPs) offers remote access over the Web to specific applications, such as enterprise resource planning (ERP) applications, corporate databases, and complex vertical applications. Market researchers such as Forrester Research (Cambridge, MA) and International Data Corp. (IDC, Framingham, MA) estimate this market will grow at rates over [?]% annually, reaching $2 billion to $[?] billion by the end of 2001.

Companies also are turning to outsourcers for assistance. Consultants and systems integrators often take over all or part of the duties of the data center, including supervision of local and wide area network services, maintenance and management of assets, network monitoring, and security. According to IDC, worldwide revenue for outsourcing services now exceeds $100 billion and is expected to reach $151 billion by 2003.

What's an ASP?
The demands of e-business are driving companies to sign on with service providers who offer them online access to mission-critical applications. This approach reduces the initial investment organizations must make, and it saves development and implementation time. It also eliminates the need to hire extra IT talent to run new systems.

But, like all technology "buzz phrases" (such as service level management), the term ASP seems to take on new meanings every month. And as this profitable services segment grows, the term is likely to become even more inclusive, at least to marketing experts. Outsourcers, integrators, and even consultants are jumping on the revenue bandwagon and are labeling themselves as ASPs.

On a more down-to-earth level, the question of who really qualifies as an ASP is more limiting. According to the ASP Consortium (Wakefield, MA), "An application service provider manages and delivers application capabilities to multiple entities from data centers across a wide area network."

Market research firm International Data Corp. (Framingham, MA) gives this definition: "Application service providers (ASPs) provide a contractual service offering to deploy, host, manage, and rent access to an application from a centrally managed facility. ASPs are responsible for either directly or indirectly providing all the specific activities and expertise aimed at managing a software application or set of applications."

These definitions leave room for two kinds of providers: those who offer applications from their own facilities, and those who rely on the cooperation of other carriers or Web hosting companies to furnish the necessary network services. In either case, the ASP is charged with the direct management of its own servers and holds ultimate responsibility to the customer for maintaining agreed-on levels of service.

The generally accepted definition of ASP does not include those companies that provide applications over a customer's own network, such as systems integrators. And in most instances (although there is some disagreement about this), it does not include network outsourcers. In some cases, however, these providers might host customer applications from their own servers, using the Web as the transport medium.

The ASP market has many helpers, as evidenced by the list of ASP Consortium members, which includes hardware vendors like Compaq and Cisco, which furnish the servers and network infrastructure gear for ASPs, as well as software suppliers like Citrix, Great Plains Software, and IBM. Still, these companies do not qualify as ASPs by themselves.

The Importance of SLAs in the Service Environment

Emerging e-commerce and e-business services rely on network availability. In the volatile environment of the Internet, where traffic levels vary dynamically and an error in one location can snarl traffic worldwide, it isn't as easy to guarantee performance levels as it is in the world of dedicated links. Thus, Service Level Agreements (SLAs) are playing a key role in the spread of today's IP-based services. Users demand them, and service providers are finding them to be a necessary differentiator as competition increases.

This is not to say that today's SLAs for IP-based services are without problems. On the user side, there is much to be desired. Users say most SLAs for standard Internet services from ISPs are too simplistic to be truly effective. What's more, if something goes wrong, many IT professionals report difficulties in getting reimbursement from their providers (see Figure 11.1).

[Figure 11.1 The role of SLAs in WAN services. Survey panels: "Do you have SLAs with your WAN service providers?"; "How do you measure SLA performance?"; "Are you satisfied with the terms of your SLAs?"; "How could your SLAs be improved?"; "If you do not have SLAs in place, why not?"]

For their part, service providers embrace SLAs as much as their customers. But many service providers believe that they can standardize their SLAs in order to simplify life for themselves. They cite the difficulties involved in negotiating individual SLAs for many customers. "We'd never get services out if we spent all that time creating SLAs," said one provider. "If we do our job correctly, we can keep customers satisfied and keep SLAs uniform."

There are exceptions: Large, powerful customers that enlist multiple services from one provider usually can negotiate their SLA terms as part of an overall service contract. Getting the best results, however, requires the input of a strong negotiator. It also calls for the customer to establish a reliable means of monitoring ongoing service performance (see the sidebar "Get It In Writing").

Get It In Writing
Don't try to sell corporate networkers on the merits of the honor system. What works well at West Point shows flaws outside the walls of the academy, especially when it comes to SLAs: Can customers really trust that carriers will meet the pledges they make, and make restitution when they come up short?

Ask David Giambruno, the global transition program manager for medical equipment manufacturer Datex-Ohmeda, a subsidiary of Instrumentarium Corp. (Helsinki, Finland). He suffered days of downtime on AT&T's international frame relay network last October, and while he was fixing the problem, he discovered AT&T had charged him for five years of service he never knew about.

"We had PVCs (permanent virtual circuits) in Canada that weren't even hooked up to routers," he says. To top it off, AT&T (Basking Ridge, NJ) refused to make compensation, pointing out that the SLA (Service Level Agreement) didn't cover either problem. AT&T declined to comment, but Giambruno is blunt: "I've learned that unless you can actually see what's going on in your network, it's going to cost you."

From the article "SLA Monitoring Tools, Heavyweight Help," Data Communications magazine, February 7, 1999.

Different Strokes
The burgeoning growth in IP services, and the variety of services now on offer, presents a challenge when it comes to SLM. New services such as e-commerce and ASP services will not always fit a single SLM mold, because they require an extension or modification of existing SLM parameters. And in some cases, it might be necessary to set new parameters.
When it comes to e-commerce, for instance, availability, performance, and other SLM parameters will be affected not only by the customer's own network but also by those of the business partners and suppliers who support the customer's online presence. Customers who place an online order with an e-commerce retailer, for example, might experience a response time delay if the supplier on whom the retailer depends suffers a network failure. Although the retailer is not technically responsible for the delay, it will affect his ability to deliver service to his customers. He is liable to compensate those customers if response times fail to meet promised service levels. The retailer is legally the provider of service, regardless of the components that service contains. Any SLAs set up with e-commerce providers, and/or SLAs made between providers, need to reflect these new facts of life in the world of online services.

Emerging ASP services also present special challenges. ASP services are still so new that users and providers have not determined the precise elements that will constitute service-level criteria. New types of services are changing the rules. Do ASPs, for instance, offer their customers SLAs based on user response time, server response time, overall network uptime, or a combination of all these? Questions like these are still in debate as the market for services develops.

There are other SLM challenges presented by emerging services: In many instances, a service might include interdependencies between Web hosting providers and carriers offering the underlying network facilities. SLAs will need to be established between the multiple providers as well as between providers and customers.

Smart Implementation
Regardless of the complexities of particular SLA criteria, all the suggestions and templates for creating and maintaining SLAs covered so far in this book can be successfully applied to the service environment. From the perspective of the user of services as well as the provider, it is important to set up a task force, define SLA parameters, and agree on continual methods of monitoring and follow-up.

The service environment also presents unique SLM implementation challenges, both from the user and service provider perspectives. It is important to be aware of these from the beginning in order to ensure success.

Fundamentally, these distinctions center on the fact that the service provider holds the advantage in relation to its customers. By offering the vital services on which the customer's business is run, the provider is virtually in control of the customer's business itself.

Advice for Users
The unequal nature of the user/provider relationship makes it vital for users to be thorough in establishing SLAs right from the start. Remember that any loopholes left will be sure to surface to the users' disadvantage later on. Ask lots of questions: As noted, most service providers have prefabricated SLAs they use as standard. They will not offer to extend these SLAs unless they are asked to do so.

Note
Remember: Like it or not, corporate customers are the underdogs in the service relationship. Controlling the business infrastructure gives the provider control over the customer's business. It is therefore vital to clarify all terms of the SLA and its implementation right from the start.

IT professionals can give themselves a better chance for success by keeping several things in mind when setting up an SLM relationship with service providers:

• Know what you're talking about: Enter negotiations armed with baseline measurements. Know what constitutes adequate performance and capacity for all business functions that the provider will be expected to support. Also know exactly how long you are willing to wait to have something fixed if it breaks.
• Establish a common frame of reference: Make certain up front that the terms used in the SLA match those the service provider uses. Also, agree with the provider on what methods and products you'll be using to monitor conformance with parameters. Some providers might not support your products, and vice versa, making it tough to make a case for compensation if something goes wrong.
• Document everything: Make sure that you ask your provider to endorse all SLA parameters, including reimbursement in the event of outage or failure. If your requests are not documented, the provider will be under no obligation to follow them. And the fact that the provider has so many customers will make it resistant to extending special unasked-for privileges.
• Be ready to pay for extras: Providers will often prove flexible when asked to extend the terms of their one-size-fits-all SLAs. But most will ask for additional payment beyond a certain point. This situation is normal; expect to pay for the terms you need.
• Get your act together: Make sure that your in-house SLM team is well prepared and unified. Recordkeeping is a key part of the SLA and needs full support on your end. If something does go wrong with the service you are contracting, it will be vital for team members to work as a well-informed team in order to get repairs and compensation.
• Keep an open mind: Our advice so far is based on the fact that users need to take extra precautions in setting up SLM with their providers. But don't maintain a defensive attitude: Remember that most service providers intend to do the best possible job for their customers in order to keep themselves in business. Encouraging an atmosphere of cooperation will serve you better than harboring an adversarial attitude.
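Entering negotiations armed with baseline measurements can be illustrated with a small calculation. The sketch below is a hypothetical example, not a method prescribed in this book. It assumes your team has already collected end-user response-time samples (in seconds) and a count of successful versus total availability probes, and it reduces those figures to the numbers you would bring to the table. The function names and the choice of a nearest-rank 95th percentile are illustrative assumptions.

```python
import math
import statistics

def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample such that at least pct% of samples are <= it."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    # Multiply before dividing so that integer pct values do not pick up rounding error from pct / 100.
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]

def baseline(response_times, probes_ok, probes_total):
    """Reduce in-house measurements to baseline figures for SLA negotiation.

    Hypothetical inputs: response_times are end-user response samples in seconds;
    probes_ok / probes_total come from periodic availability checks.
    """
    if probes_total <= 0:
        raise ValueError("probes_total must be positive")
    return {
        "availability_pct": 100.0 * probes_ok / probes_total,
        "median_response_s": statistics.median(response_times),
        "p95_response_s": percentile(response_times, 95),
    }

# Made-up example data: ten response-time samples and 30 days of 15-minute probes (2,880 probes).
samples = [0.8, 1.1, 0.9, 2.4, 1.0, 0.7, 1.3, 0.95, 1.05, 3.1]
summary = baseline(samples, probes_ok=2871, probes_total=2880)
print(summary)
```

With these made-up samples, measured availability is 99.6875% and the 95th-percentile response time is 3.1 seconds. The median, at roughly 1 second, hides that figure entirely. Reporting a high percentile alongside the median reflects the slow responses that users actually notice.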

Advice for Service Providers
SLM can work as well for service providers as it can for their customers, or even better. But there are unique considerations from the service provider perspective. Providers might start from a position of power relative to their customers, but this does not mean that they themselves are not vulnerable. In many ways, today's service providers are just as vulnerable as their customers. After all, their business is to offer reliable service. If they fail to do that, they cannot stay in business.

SLM does more for service providers than merely offer protection from liability. It helps them create a frame of reference for new and existing services. Knowing the level of performance they can guarantee allows them to offer their customers SLAs that differentiate them from other providers.

SLM also helps in the creation of differentiated services, in which different groups of users are offered different guarantees of service, based on their payment plan. "Gold" customers, for instance, might be offered continuous availability at an agreed-upon level of response time; "silver" customers would get response times within a certain range of measurement; and "bronze" customers would receive "best effort" service. SLM provides the input that enables the service provider to offer differentiated services, and it also gives the provider the ongoing framework for implementing them with customers. Table 11.1 illustrates a typical model used by providers of differentiated services.

Table 11.1 A Typical Model of Differentiated Services
Service     Rate           Response time   Availability
Gold        10Mbps         <1 second       99.9%
Silver      5 to 10Mbps    <3 seconds      Over 90%
Bronze      2 to 5Mbps     <5 seconds      Over 80%
Standard    Best effort    Best effort     Best effort

In general, service providers have the same problems their customers do, often on a grander scale. Also, service providers face a range of challenges that their customers do not. Specifically, they must ensure that other providers and suppliers on whom they depend can furnish a level of performance, availability, and capacity that enables them to pass along a single, consistent level of service to their customers. In effect, many of today's online services, such as e-commerce services, depend on a group of suppliers maintaining a chain of performance. A break in the chain will affect the ability of all participants to meet service-level expectations.

In some cases, service providers will need to take the initiative in establishing SLA parameters ahead of industry trends. Many ASPs, for instance, are breaking new ground when it comes to service models. They find themselves creating SLAs in a vacuum, where no precedents exist to guide them. That said, there are rules providers can follow to help them navigate the uncharted waters of today's service offerings:

• Make infrastructure serve your SLAs: Service providers can make the most of service level management by building it into their infrastructure through the use of technologies like Quality of Service (QoS), which uses intelligence built into routers and switches to control the flow of network traffic. It is worth the investment of time and effort to find and take advantage of these new resources for guaranteeing performance.
• Stay in tune with customers: Most providers are focused more on determining the kinds of services customers want than on ways to present new SLAs. Still, it pays to keep in touch with demands for SLA improvements, particularly as the quality and content of SLAs will increasingly differentiate services from multiple providers in emerging segments.
• Take the lead with other participants and suppliers: Many types of services today call for cooperation among multiple providers. E-commerce is one example. Leave nothing to chance if you have these kinds of interdependent relationships. Remember, your business depends on all the links in the chain performing consistently.
• Keep an open mind: Stay flexible with your customers and business suppliers. Rigidity will not serve you well in a market in which new competitors are ready to take business from you at a moment's notice.

Summary
The burgeoning services market is a proving ground for SLM. Users need to be firm and clear in negotiations with providers. Providers need to stay in touch with customer demand and remain flexible and open to new methods, while ensuring a consistent level of performance and availability in multiprovider situations. Over time, the demands of the service environment will no doubt help service level management develop beyond its present scope.
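The availability column of Table 11.1 is easier to negotiate against once it is translated into permitted downtime. The "chain of performance" that providers depend on can be expressed in numbers in the same way. The following is a minimal sketch, not a method from this book, and it rests on several assumptions. It uses a 30-day measurement period, although real SLAs may use other periods or exclude scheduled maintenance. For the chain, it assumes that every link must be up for the service to be up and that links fail independently. Under those conditions, end-to-end availability is the product of the individual availabilities. The function names and the three-link chain are illustrative.

```python
from math import prod

MINUTES_PER_PERIOD = 30 * 24 * 60  # assumed 30-day measurement period: 43,200 minutes

def allowed_downtime_minutes(availability_pct, period_minutes=MINUTES_PER_PERIOD):
    """Minutes of downtime an availability target permits over one period."""
    return period_minutes * (1 - availability_pct / 100)

def chain_availability_pct(*link_pcts):
    """End-to-end availability of links in series, assuming independent failures."""
    return 100 * prod(p / 100 for p in link_pcts)

# Availability floors from Table 11.1; the "Standard" tier is best effort, so it is left out.
tiers = {"Gold": 99.9, "Silver": 90.0, "Bronze": 80.0}
for name, pct in tiers.items():
    print(f"{name}: up to {allowed_downtime_minutes(pct):.1f} minutes of downtime per 30 days")

# Hypothetical chain: retailer site and supplier link at 99.9% each, carrier at 99.5%.
print(f"End-to-end: {chain_availability_pct(99.9, 99.9, 99.5):.2f}%")
```

Even the Gold tier's 99.9% permits 43.2 minutes of outage in a 30-day period. A chain of individually respectable links also ends up below its weakest link: about 99.3% in this hypothetical case. That is one reason a provider who depends on other suppliers needs SLAs with those suppliers too.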
CHAPTER 12

Moving Forward

We have covered a large amount of information on the state of service level management in the industry today. However, the story has only just begun. We anticipate many advances during the coming months and years. The continuing maturity of the understanding and best practices of service level management will happen more rapidly if IT managers share information, monitor the evolution of standards, and push vendors to provide more capable solutions.

This chapter recaps some of the more salient aspects of the current state of service level management and also suggests a mechanism for commencing and continuing a dialog to assist in the maturation process for service level management.

Establishing the Need for Service Level Management
As corporations and organizations move forward with using information technol-
ogy for business effectiveness, the quality of services delivered by the IT depart-
ment becomes more critical for business success. Most corporations are rapidly
moving to greater use of technology, particularly Web and Internet technologies, to communicate more effectively with employees, business partners, and consumers.

In general, the business goals for the IT department in the new millennium can be characterized as
• Improving internal business efficiency through intranets, knowledge management, data mining, and information sharing
• Improving the cost effectiveness of the supply chain by integrating enterprise applications, connecting to business partners via extranets, and outsourcing non-mission-critical elements
• Enhancing customer and distribution channel relationships and loyalty through one-to-one marketing, sales force automation, personalized Internet Web sites, and data mining

These goals cannot be fully realized unless the quality of the services delivered by the IT department is adequate. If internal systems are not available or responsive enough, employee and business productivity degrades. If supply chain applications are not delivering the right level of service, business partner linkages will not be as effective; in many cases, costs will rise, rework might be required, and upstream processes might be negatively impacted. If the problems are severe enough, the business partnership might be placed in jeopardy.

Similarly, many corporations are moving to a direct self-service model for interacting with their customers using the Web as the communication mechanism. Hundreds of millions of dollars are being spent to attract customers to those Web sites. If the site is not available, the Internet application is not responsive, or the customer feels vulnerable from a security or privacy perspective, he will not have a good experience. Not only will he be reluctant to buy something or conduct a business transaction on the initial visit, it is unlikely that he will return to the site, or it will take significant marketing dollars to attract him again.

Chapter 8, "Business Case for Service Level Management," provides a business case for service level management along with a sample cost justification worksheet. Hopefully, these won't be needed in order to convince senior management of the need to carefully manage service quality in the same way as they would monitor and manage other valuable business assets.

Defining the Services to Be Managed
The initial step in any service level management initiative is to clearly define and prioritize the services delivered by the IT department. This must be done in conjunction with the lines of business so that the true business value and importance of each service is clearly understood. Each service should have identifiable business owners as well as responsible IT management personnel assigned to it. The locations where each service is delivered, together with the user community served, should be identified and documented. This will help to assess the business impact in the event of a service outage or degradation. It also allows the IT department to identify who needs to be involved in negotiations for the Service Level Agreements.

Communicating with the Business
One of the most crucial aspects of successfully implementing a service level management strategy is the ongoing communication between the IT department and the lines of business. Service Level Agreements, if constructed appropriately, provide the IT department the ability to discuss goals, responsibilities, and issues with the lines of business in terms they will easily understand. Service management aligns the IT department with the business and will raise the credibility and value of the IT department in the eyes of the business managers.

Tip
If the IT department is seen as overhead and treated as a cost center, implementing a proactive service management strategy is an excellent initial step in changing that perception. As increased competition forces the lines of business to look for competitive and market advantages, the IT department can take an increasingly stronger leadership role in delivering capabilities to improve business and market effectiveness. However, when the IT department takes on this challenge, it will be even more important to ensure consistent, high-quality service delivery.

The dialog with the lines of business is important when defining services and negotiating Service Level Agreements. Reporting on service quality when problems occur, as well as when excellent service is delivered, is another important aspect of building trust and credibility. It will also be important to jointly conduct, with the lines of business, regular satisfaction surveys and reviews of the Service Level Agreements. This helps the IT department to stay in touch with changing business requirements and user perceptions of service quality.

Negotiating Service Level Agreements
Service Level Agreements can never be set by the IT department alone. They must be developed in conjunction with the lines of business and, where necessary, negotiated as with any other contract. There is an inherent balance between the service levels that can be achieved, the workload that can be supported, and the cost of delivering the service. When that is understood by all parties to the negotiation, there will typically be ample opportunity to formulate an agreement acceptable to the IT department as well as the lines of business.
All the agreements used by the IT department should follow a similar format, but the actual terms and conditions can vary from one agreement to another. The detail required within each agreement could also vary, particularly when the business importance and time criticality of the services vary. The Service Level Agreements should include the conditions under which the agreement should be renegotiated to remove any contention in the future.

Managing to the Service Level Agreement
When the Service Level Agreements are in place, the difficult task begins of ensuring that the required service levels specified in the agreements are met. Service level management is not simply reacting to problems and reporting the achieved service levels. Properly implemented, service level management includes proactively developing the right procedures, policies, organization structure, and personnel skills to improve service quality and to ensure that users and the business are not impacted by any service difficulties.

The starting point is to first capture the end-user experience of service quality, particularly the end-to-end availability and response times. There are a number of viable approaches to capturing and collecting metrics that provide this information. When the end-to-end service quality is being continuously monitored, service degradation or trends that indicate Service Level Agreements are in jeopardy can be detected. Upon this type of event detection, procedures should be followed to isolate and correct the problem, along with proactive notification to end users, alerting them to the issue together with an estimate of when normal service will be resumed. This proactive approach to service management increases the credibility of the IT department and also increases the willingness of lines of business to work with the IT department.

Skilled technicians are required to diagnose and correct problems when service degradation or potential degradation occurs. Wherever possible, these diagnostic and recovery routines should be automated so that if the same condition occurs again, it can be recovered at machine speed rather than having to wait for operator intervention.

Proactive service level management is a combination of structuring the right organization, ensuring that the staff have appropriate skills, defining and implementing the right methodology and procedures, and using an appropriate management solution to monitor and improve service quality.

Tip
The organization structure should align front-line operational teams with each service rather than following the traditional approach of aligning operations personnel by technology layers (for example, separate teams of network managers, database administrators, and system managers). This provides a much better interface to the lines of business, and the service teams become the advocates for the lines of business within the IT department.

It might also be beneficial to align some component of compensation for IT personnel with service quality and meeting Service Level Agreements. In some corporations, incentives have also changed to encourage more proactive automated approaches to ensuring that service level objectives are met, rather than providing incentives for reactive fire-fighting problem correction practices.

Using Commercial Management Solutions
Initially collecting information and monitoring service levels might be performed using standard utilities and manual techniques. This might be sufficient to establish some baselines of service quality, but it will very quickly be inadequate for proactive service level management. Although it might be tempting for the IT department to develop scripts and other automated mechanisms for capturing service level information, it is generally more efficient to use commercial solutions for service level management. Chapter 7, "Service Level Management Products," provides information on a variety of solutions that are generally available. It is unlikely that a single product or solution from a single vendor will meet all your service level management requirements.

Even having to integrate solutions from multiple vendors will generally be an easier proposition for most IT departments than attempting to develop solutions completely in-house. Applications change, as do the underlying database, middleware, operating system, and network hardware and software. Maintaining currency with the individual new versions as well as the various combinations of components can be an onerous task for a single IT department, whereas independent software vendors can distribute the development costs across a number of customers.

Continuously Improving Service Quality
One aspect of service level management that must be recognized is that user expectations and business requirements will continue to increase over time. This is why adopting a very proactive approach to service level management is very important: operating in a reactive mode will not support continuous improvement.

There is a natural maturation process associated with service level management that involves
• Monitoring the service quality by monitoring individual components and evolving to monitoring from the end-user perspective.
• Managing the service to reduce the impact of service degradations.
• Controlling the service in an automated fashion to proactively detect and correct problems. This also ensures consistency of management actions and removes the potential of human error.

• Delivering service continuity by predicting future business requirements and the associated resources that will be necessary to support the business with appropriate levels of service.
• Moving to a virtual environment where many of the supporting services provided by the IT department use online interfaces and capabilities to provide better service to the end users while reducing the overhead on the IT department. This allows the IT department to use freed staff resources for more proactive activities such as planning future capacity requirements.

Tip
Building on the experience of others is a very effective way of enhancing your service level management practices. Attending trade shows and conferences and using the opportunity to network with other attendees is one mechanism for achieving this. Later in this chapter, we suggest another mechanism using a Web site.

Keeping track of advancements in standards efforts and solution technology will assist your understanding of the state of the art in the industry. This knowledge will help you continue to enhance service level management procedures and practices within your IT department.

Evolution of Service Level Management Standards

A new breed of solution has evolved over the last few years. Typically referred to as application management, these solutions seek to manage from an application perspective and drill into the underlying technology layers where necessary to resolve problems. This provides a much better alignment between the IT department and the lines of business and is more attuned to supporting a service level management initiative. These solutions are being augmented to capture the end-user experience, which provides the basis for understanding and improving service quality. Several vendors are also providing more sophisticated service reporting capabilities based on the ability to either capture directly or derive the end-to-end availability and responsiveness of critical application services.

The future direction of management solutions will include advances in the following two important areas:
• Enhanced intelligence built into the solution as it is delivered from the vendor
• Broader solution scope to provide service management of entire business processes

Enhanced Management Solution Intelligence
Today's management solutions typically require significant customization before they are able to effectively monitor and manage service quality. Some vendors
Today there are no real industry-accepted standards for service level management,
provide off-the-shelf knowledge for various components in the environment.
Service Level Agreements, or metric definitions for capturing data to monitor ser-
However, relating the components to the supported service and to the end users
vice levels. We can expect standard definitions to evolve during the early years of
the new millennium. using the service is difficult and generally requires customization. We can expect
advances in this capability through the use of directories, repositories, and advances
These standards initiatives are most likely to evolve from the Distributed with standards such as the common information model (CIM).
Management Task Force (DMTF), IT Service Management Forum (ITSMF), and
the Internet Engineering Task Force (IETF). Chapter 5, "Standards Efforts," pro- The ability to capture the relationships between the various components also
vides the Web site addresses of these organizations, which contain status informa- provides the basis for more sophisticated event aggregation and correlation, as well
tion that makes tracking their progress very easy. as root cause analysis. This capability will increase the level of automated problem
diagnosis, allowing the IT department to concentrate on solving the specific
Although we can expect standards to evolve, the criticality of implementing proac- problem causing the service degradation. Reducing the time required to diagnose
tive service management means that most IT departments should adopt pragmatic problems will also allow the IT staff to spend more time on automating recovery
service management approaches prior to standards becoming available. actions, proactively planning future requirements, and working with the lines of
businesses on supporting strategic business initiatives.
Evolution of Management Solution Capabilities Reading relevant articles in trade publications as well as research reports from
The capabilities of commercial management solutions will also continue to evolve. industry analyst firms is one way of keeping abreast of technology and solutions
Most management vendors started with network management offerings and aug- advances.
mented these with system management capabilities. Additional capabilities for
managing other components such as databases and middleware are available and
are maturing rapidly.
186
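As a rough illustration of the relationship-based correlation and automated recovery described above, the sketch below assumes a hypothetical dependency map (`DEPENDS_ON`) and a hypothetical table of recovery routines (`RECOVERY_ROUTINES`); neither comes from any product. An alarming component is treated as a probable root cause when none of the components it depends on is also alarming, so alarms on its dependents are suppressed as symptoms. Each root cause is then matched to an automated recovery routine, and anything without a known routine is escalated to an operator.

```python
# Hypothetical dependency map: component -> components it directly depends on.
DEPENDS_ON: dict[str, set[str]] = {
    "order_app": {"app_server", "order_db"},
    "app_server": {"lan_segment_1"},
    "order_db": {"db_host"},
    "db_host": {"lan_segment_1"},
    "lan_segment_1": set(),
}

# Hypothetical automated recovery routines, keyed by component.
RECOVERY_ROUTINES: dict[str, str] = {
    "app_server": "restart application server",
    "order_db": "restart database listener",
}


def all_dependencies(component: str) -> set[str]:
    """Every component that `component` depends on, directly or transitively."""
    seen: set[str] = set()
    stack = list(DEPENDS_ON.get(component, set()))
    while stack:
        dep = stack.pop()
        if dep not in seen:
            seen.add(dep)
            stack.extend(DEPENDS_ON.get(dep, set()))
    return seen


def probable_root_causes(alarming: set[str]) -> set[str]:
    """Alarming components none of whose dependencies are also alarming."""
    return {c for c in alarming if not (all_dependencies(c) & alarming)}


def plan_actions(alarming: set[str]) -> dict[str, str]:
    """Map each probable root cause to an automated routine or operator escalation."""
    return {
        cause: RECOVERY_ROUTINES.get(cause, "escalate to operator")
        for cause in sorted(probable_root_causes(alarming))
    }


print(plan_actions({"order_app", "order_db"}))  # {'order_db': 'restart database listener'}
```

Here the alarms on `order_app` and `order_db` reduce to a single action because `order_app` depends on `order_db`. Real solutions also weigh event timing and severity; this sketch uses topology alone.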

Service Management of Business Processes


The most sophisticated management solutions today manage a single application
at a time. This provides service level management for users and business processes
that use a single application. However, as businesses seek greater effectiveness,
enterprise application integration is becoming more important as supply chains are
connected with manufacturing systems, and back-office applications are connected
with front-office systems. This means that managing service quality from a business
process perspective requires management solutions with a greater scope and the
ability to span multiple applications.
A number of vendors offer enterprise consoles that have the capability to provide
multiple views onto the business environment, and these are rapidly evolving to
deliver business process views.

Caution
As with any new initiative within the industry, vendor hype around business process management
will confuse the marketplace. When evaluating these claims, it is a good idea to go back to the
basics. A business process spanning multiple applications can't be effectively managed unless the
solution has visibility into, and can manage, the individual applications. Similarly, a single application can't be effectively managed unless the solution has visibility into, and can manage, all the supporting infrastructure layers.
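The layering the caution describes can be expressed as a status rollup, in which a business process is only as healthy as the worst of its applications, and an application only as healthy as the worst of its supporting layers. The sketch below is a minimal illustration under stated assumptions: the three-level `Status` scale and the `CHILDREN` model (`order_to_cash`, `billing_db`, and so on) are hypothetical, and commercial enterprise consoles use richer state models.

```python
from enum import IntEnum


class Status(IntEnum):
    """Higher values are worse, so the rollup keeps the maximum."""
    OK = 0
    DEGRADED = 1
    DOWN = 2


# Hypothetical model: business process -> applications -> infrastructure layers.
CHILDREN: dict[str, list[str]] = {
    "order_to_cash": ["order_entry", "billing"],
    "order_entry": ["web_tier", "order_db", "wan_link"],
    "billing": ["billing_app_server", "billing_db"],
}


def rollup(node: str, observed: dict[str, Status]) -> Status:
    """Worst of a node's own observed status and its children's rolled-up status.

    Nodes with no observation are treated as OK here; a real solution would
    more likely report them as unknown.
    """
    own = observed.get(node, Status.OK)
    children = [rollup(child, observed) for child in CHILDREN.get(node, [])]
    return max([own, *children])


observed = {"billing_db": Status.DEGRADED}
print(rollup("order_to_cash", observed).name)  # DEGRADED
print(rollup("order_entry", observed).name)    # OK
```

A degraded billing database surfaces at the business process level while the order entry application still reports OK, which is the kind of view that is only trustworthy when every layer beneath it is visible.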
Establishing and Continuing the Dialogue
The intent of this book has been to examine the state of service level management today and to provide some practical help in implementing a service level management initiative. We anticipate that the topic of service level management will continue to be of major interest to most IT professionals. As stated previously, we also expect that the capabilities and general understanding of service level management methodologies and technology will continue to evolve.

We would like to participate in that evolution and extend an invitation to you, the reader, to also be involved in the progress of service level management. To this end, we have set up a Web site at www.nextslm.org. On this site, you will find some of the templates provided in the appendixes to this book, as well as other material we felt would be beneficial to share. There are chat capabilities as well as instructions on how to post material to the site.

We hope the Web site will promote sharing of best practices and a continuing dialog between like-minded professionals seeking to advance service level management. We thank you for your interest in this book and, in advance, for sharing in the forthcoming dialog.

Appendixes
A Internal Service Level Agreement Template
B Simple Internal Service Level Agreement Template
C Sample Customer Satisfaction Survey
D Sample Reporting Schedule
E Sample Value Statement & Return on Investment (ROI) Analysis for a Service Provider Delivering an SAP Application
F Selected Vendors of Service Level Management Products

Appendix A
Internal Service Level Agreement Template

About the SLA


This section provides a general description of the intent of the service level agree-
ment (SLA) as well as the owners, approval and review process, and a definition of
the terms used in the document.

Statement of Intent
This service level agreement (SLA) documents the characteristics of an IS service
that is required by a business function as they are mutually understood and agreed
to by representatives of the owner groups. The purpose of the SLA is to ensure
that the proper elements and commitment are in place to provide optimal data
processing services for the business function. The owner groups use this SLA to
facilitate their planning process. This agreement is not meant to override current
procedures, but to complement them. Service levels specified within this agree-
ment are communicated on a monthly basis to the owner group representatives.
Approvals
Table A.1 shows which business groups and IS groups share ownership of the service, and which of their representatives have reviewed and approved this SLA.

Table A.1 Organization Representation
Ownership Type | Organizational Group | Representative
Business Function | Name of business unit supported by this service | Business unit representative
IS Service | Name of service | Service manager
Computing Services | Support team for service | Team leader

Review Dates
Last Review: Date of last SLA review
Next Review: Scheduled date for next SLA review

Time and Percent Conventions
This SLA uses the following conventions to refer to times and percents:
Times expressed in the format "hours:minutes" reflect a 24-hour clock in the central standard time zone.
Times expressed as a number of "business hours" include the hours from 8:30 to 17:30.
Times expressed as a number of "business days" include business hours, Monday through Friday, excluding designated holidays.
The symbol "---" indicates that no time applies in a category (for example, no outages are scheduled for a day).

About the Service
This section provides a description of the service and the user community, including their physical location.

Description
The service management group provides the following service:
• Ensures that the [specify name] application is available for users to log on and to [specify business purpose of the service]
• Responds to and resolves user questions about, problems with, and requests for enhancements to the application

User Environment
The business function is conducted in the data processing environment shown in Table A.2.

Table A.2 Service User Community Characteristics
Number of Users | Approximate number of service users
Geographic Location | Specify physical locations of users
Computer Platform | Specify actual systems and desktops used to support the service; include any prerequisites in terms of operating system, database, and so on

About Service Availability
This section provides information about the normal schedule of times when the service is available. It also describes the process for enhancing or changing the service.

Normal Service Availability Schedule
Table A.3 shows the times the service is available for customer use.

Table A.3 Service Availability
Times | Sunday | Monday | Tuesday | Wednesday | Thursday | Friday | Saturday
Start | 0:00 | 0:00 | 0:00 | 0:00 | 0:00 | 0:00 | 0:00**
Stop | 24:00 | 24:00 | 24:00 | 24:00 | 24:00 | 24:00 | 24:00
**Adjusted when necessary for scheduled outages and nonemergency enhancements
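The business-hour and business-day conventions above determine deadlines such as "at least two business days in advance" and "usually satisfied within two business days." A minimal sketch, assuming naive `datetime` values already expressed in central standard time and a caller-supplied set of designated holidays; the helper names are hypothetical.

```python
from datetime import date, datetime, time, timedelta

# Business hours per the SLA conventions (central standard time assumed).
BUSINESS_START = time(8, 30)
BUSINESS_END = time(17, 30)


def is_business_day(day: date, holidays: set[date]) -> bool:
    """Monday through Friday, excluding designated holidays."""
    return day.weekday() < 5 and day not in holidays


def add_business_days(start: date, days: int, holidays: set[date]) -> date:
    """The date `days` business days after `start` (`start` itself is not counted)."""
    current = start
    remaining = days
    while remaining > 0:
        current += timedelta(days=1)
        if is_business_day(current, holidays):
            remaining -= 1
    return current


def business_hours_between(start: datetime, end: datetime, holidays: set[date]) -> float:
    """Business hours elapsed between two naive datetimes on the CST clock."""
    total = timedelta()
    day = start.date()
    while day <= end.date():
        if is_business_day(day, holidays):
            window_start = max(start, datetime.combine(day, BUSINESS_START))
            window_end = min(end, datetime.combine(day, BUSINESS_END))
            if window_end > window_start:
                total += window_end - window_start
        day += timedelta(days=1)
    return total.total_seconds() / 3600


print(add_business_days(date(2024, 1, 5), 2, set()))  # 2024-01-09
```

Under these conventions, a new-user request received on Friday, January 5, 2024, would be due by Tuesday, January 9, absent holidays.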

Scheduled Events that Impact Service Availability
Regularly scheduled events can cause a service outage or have an impact on performance (such as slow response time). Table A.4 shows when these are scheduled to occur.

Table A.4 Scheduled Outages for the Weekly Server Reboot
Times | Sunday | Monday | Tuesday | Wednesday | Thursday | Friday | Saturday
Start | 3:00
Stop | 4:00

Nonemergency Enhancements
All changes that take more than four hours to implement or that impact user workflow are reviewed by the [service name] Advisory Board for approval and prioritization.

Enhancements and changes that do not require a service outage and that do not impact user workflow are implemented upon completion.

Enhancements and changes that require a service outage are scheduled on Saturday mornings. Users are notified at least two business days in advance when a nonemergency service outage is required to implement an enhancement or change.

To request an enhancement, submit a problem by [specify problem submittal process].

Change Process
Changes to any hardware or software affecting the application should be requested by [specify change request process].

Requests for New Users
Adding a new user to an existing team requires notifying the [specify appropriate representative], or submitting a completed User Request form, and specifying the team name and the user job role (or a pattern user). Requests are usually satisfied within two business days.

Setting up a new team requires notifying the [specify appropriate representative]. These requests are treated as enhancement requests and are prioritized by the [service name] Advisory Board.

About Service Measures
The [specify] service management team monitors and reports the service quality. Table A.5 shows the service measures that are reported along with the performance targets.

Service Quality Measurement

Table A.5
Measurement | Definition | Performance Target
Service Availability Percent | The percent of time that the application is available during the normal schedule minus the impact time from any events (scheduled or unexpected) other than loss of network or system availability | [Insert target percentage]
User Response Time | The time taken for the application service to complete a user request and return a response | [Insert targets: normally specified as X% of transactions of type Y to be completed within Z seconds]
Problem Response Time | The time required for a user to receive a response after reporting a problem to the Help Desk | 1-High Priority: [insert target time]; 2-Medium Priority: [insert target time]; 3-Low Priority: [insert target time]
Problem Circumvention or Resolution Time | The time required for a user to receive a circumvention or a solution after reporting a problem to the Help Desk | 1-High Priority: [insert target time]; 2-Medium Priority: [insert target time]; 3-Low Priority: [insert target time]

The Help Desk prioritizes requests for support according to the following priority-level guidelines:

1-High Priority
[Service name] is not operational for multiple users.
A major function of [service name] is not operational for multiple users.

2-Medium Priority
[Service name] is not operational for a single user.
A major function of [service name] is not operational for a single user.
A user needs to access a locked record.

3-Low Priority
A minor function of [service name] is not operational for one or more users (who can continue to use other application functions).
A user has questions about [service name] functionality.
A user needs administrative assistance.
Enhancement requests are logged as Priority 3-Low Priority, but are reviewed and scheduled by the [service name] Advisory Board.
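The first two measures in Table A.5 reduce to simple calculations. The sketch below reads the availability definition literally: both scheduled and unexpected events count against the service, while outages attributed to loss of network or system availability are excluded. The `Outage` record and its cause labels are hypothetical, and the response-time check implements the "X% of transactions of type Y within Z seconds" form of target.

```python
from dataclasses import dataclass


@dataclass
class Outage:
    minutes: float
    cause: str  # hypothetical labels, e.g. "application", "scheduled", "network", "system"


# Table A.5 excludes loss of network or system availability from this measure.
EXCLUDED_CAUSES = {"network", "system"}


def availability_percent(scheduled_minutes: float, outages: list[Outage]) -> float:
    """Percent of the normal schedule during which the application was available."""
    impact = sum(o.minutes for o in outages if o.cause not in EXCLUDED_CAUSES)
    return 100.0 * (scheduled_minutes - impact) / scheduled_minutes


def meets_response_target(
    response_seconds: list[float], target_fraction: float, limit_seconds: float
) -> bool:
    """True if at least `target_fraction` of the transactions took `limit_seconds` or less.

    The caller is assumed to pass only the response times for transaction type Y.
    """
    if not response_seconds:
        return True  # assumption: no transactions of type Y means no violation
    within = sum(1 for r in response_seconds if r <= limit_seconds)
    return within / len(response_seconds) >= target_fraction


outages = [Outage(50, "application"), Outage(50, "scheduled"), Outage(200, "network")]
print(availability_percent(10_000, outages))  # 99.0
```

With 10,000 scheduled minutes, the 200-minute network outage is excluded and the remaining 100 minutes of impact yield 99.0 percent. Whether scheduled reboots should count is a policy choice that each SLA should state explicitly.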

Appendix B
Simple Internal Service Level Agreement Template

The [insert service name] is used by [insert description of user community] to [insert description of the service capability]. The IT department guarantees the following:

1. The [service name] will be available [insert percentage] of the time from [insert normal hours of operation including hours and days of the week]. Any individual outage in excess of [insert time period] or sum of outages exceeding [insert time period] per month will constitute a violation.
2. [Insert percentage] of [service name] transactions will exhibit [insert value] seconds or less response time, defined as the interval from the time the user sends a transaction to the time a visual confirmation of transaction completion is received. Missing the metric for business transactions measured over any business week will constitute a violation.
3. The IT department will respond to service incidents that affect multiple users within [insert time period], resolve the problem within [insert time period], and update status every [insert time period]. Missing any of these metrics on an incident will constitute a violation.
4. The IT department will respond to service incidents that affect individual users within [insert time period], resolve the problem within [insert time period], and update status every [insert time period]. Missing any of these metrics on an incident will constitute a violation.
5. The IT department will respond to noncritical inquiries within [insert time period], deliver an answer within [insert time period], and update status within [insert time period]. Missing any of these metrics on an incident will constitute a violation.
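Clauses 1 and 3 through 5 above become mechanical checks once the bracketed values are filled in. A minimal sketch, assuming outage durations and incident timestamps are already collected; the `Incident` record is hypothetical, and reading "update status every [time period]" as "no gap longer than that period between first response, each update, and resolution" is an assumption the template leaves open. Clause 2 follows the same percentage-within-threshold pattern as the response-time target in Table A.5.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta


def outage_violation(
    durations: list[timedelta], max_single: timedelta, max_monthly_total: timedelta
) -> bool:
    """Clause 1: any single outage, or the month's total, exceeds its limit."""
    too_long = any(d > max_single for d in durations)
    too_much = sum(durations, timedelta()) > max_monthly_total
    return too_long or too_much


@dataclass
class Incident:
    """Hypothetical incident record with the timestamps the clauses need."""
    reported: datetime
    responded: datetime
    resolved: datetime
    status_updates: list[datetime] = field(default_factory=list)


def incident_violation(
    inc: Incident,
    respond_within: timedelta,
    resolve_within: timedelta,
    update_every: timedelta,
) -> bool:
    """Clauses 3 to 5: missing any one of the three metrics is a violation."""
    if inc.responded - inc.reported > respond_within:
        return True
    if inc.resolved - inc.reported > resolve_within:
        return True
    # Assumption: updates are due at least every `update_every` between the
    # first response and resolution.
    checkpoints = [inc.responded, *sorted(inc.status_updates), inc.resolved]
    return any(
        later - earlier > update_every
        for earlier, later in zip(checkpoints, checkpoints[1:])
    )


t0 = datetime(2024, 3, 4, 9, 0)
inc = Incident(t0, t0 + timedelta(minutes=10), t0 + timedelta(hours=3),
               [t0 + timedelta(hours=1), t0 + timedelta(hours=2)])
print(incident_violation(inc, timedelta(minutes=15), timedelta(hours=4), timedelta(hours=1)))  # False
```

Without the two hourly status updates, the same incident would be a violation because nearly three hours pass between the first response and resolution.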

Appendix C
Sample Customer Satisfaction Survey

Thank you for taking the time to provide feedback regarding the services provided by the IT department. There are three areas that you may evaluate:
• Customer Service Orientation
• Results Orientation
• Expertise of Staff

There is also an area for general comments and future IT requirements.

Rating Service Quality
The quality ratings to be used are

• Poor: Service was significantly below expectations
• Fair: Service was below expectations
• Good: Service met expectations
• Very Good: Service exceeded expectations
• Excellent: Service significantly exceeded expectations

If you enter a Fair or Poor rating, we ask that you provide additional comments.

Table C.1 shows the qualities and skills descriptions that should be used when making evaluations.

Table C.1 Quality and Skill Descriptions
Area | Qualities and Skills Evaluated
Customer Service Orientation | Courteous, congenial, responds in a timely manner, gets along with customers, cost-efficient, professional, enthusiastic
Results Orientation | Maintains focus, persistent, strong commitment, organized, "can-do" attitude, takes initiative, takes pride in work, achieves goals, takes responsibility, dependable
Expertise of Staff | Technical knowledge, effective oral and written skills, good listener, perceptive, objective, thorough, analytical, decisive, insightful, intuitive

Please complete Table C.2 by rating each of the services against the three attributes.

Table C.2 Service Ratings
Service | Customer Service Orientation | Results Orientation | Expertise of Staff
BUSINESS APPLICATIONS
Financial Application | | |
H/R Application | | |
Email | | |
Web Access | | |
DESKTOP SUPPORT
PC Hardware/Software | | |
UNIX, X-terms | | |
NETWORK SUPPORT
Local Network | | |
Remote Network | | |
Phones/Voice mail | | |
TECHNICAL SUPPORT
Mainframe | | |
UNIX Servers | | |
NT Servers | | |

General Comments
Please make general comments in the following areas:
Customer Service Orientation:
Results Orientation:
Expertise of Staff:

What things do you feel the IT department does well and what things could we do better? What works and what does not? Please be specific.

Current Usage
This section helps the IT department gain a better understanding of the service usage and support patterns of our customers. Please answer the following question.

How would you describe your reliance on information technology to perform your job?
• Extremely Heavy
• Heavy
• Moderate
• Light
• Very Light

Please indicate in Table C.3 the most frequent contact you have with the IT department in each of the designated areas.

Table C.3 IT Department Contacts
Contact Type | Daily | Weekly | Monthly | Quarterly | Annually
Reporting a service problem | | | | |
Requesting a new application project | | | | |
Requesting an application enhancement | | | | |
Adding a new user | | | | |
Requesting new network access | | | | |
Requesting service access | | | | |

Future Requirements
In your opinion, what specific areas should the IT department focus on during the next year? Please be specific.

Optional Information
Please provide the following information so that we can follow up with you:
Name:
Department:
Location:

Appendix D
Sample Reporting Schedule

The following outline recommends report content and frequencies. The reports are additive: at the end of a quarter, the monthly, weekly, and daily reports would be produced in addition to the quarterly report.

Daily Report
The daily report is a tactical report showing sufficient detail to allow the IT
department and IT management to have a good understanding of the service
quality of the previous day. These reports are typically kept online for two weeks.
The contents include

• Outage report by application by location
• Response time report by application by location, summarized at 15-minute intervals for the prime shift and at 30-minute intervals for the off-shift
• Problem reports by priority, including a brief description of the problem for critical and severe problems
• Average problem response time by priority
• Problems closed and outstanding by priority
• Security violations and attempted intrusions
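The daily response-time report summarizes measurements at 15-minute intervals in the prime shift and 30-minute intervals off-shift, and the weekly report lists response-time percentiles. A minimal sketch of both, assuming samples arrive as `(timestamp, seconds)` pairs. The prime shift is not defined in this schedule, so the 8:30 to 17:30 window is borrowed from the business-hours convention in Appendix A as an assumption, and the nearest-rank method is one of several common percentile definitions.

```python
import math
from collections import defaultdict
from datetime import datetime, time
from statistics import mean

# Assumption: prime shift borrowed from the Appendix A business hours.
PRIME_START = time(8, 30)
PRIME_END = time(17, 30)


def interval_start(ts: datetime) -> datetime:
    """Floor a timestamp to a 15-minute interval in the prime shift, 30 minutes otherwise."""
    width = 15 if PRIME_START <= ts.time() < PRIME_END else 30
    return ts.replace(minute=(ts.minute // width) * width, second=0, microsecond=0)


def summarize(samples: list[tuple[datetime, float]]) -> dict[datetime, float]:
    """Average response time in seconds for each reporting interval, in time order."""
    buckets: dict[datetime, list[float]] = defaultdict(list)
    for ts, seconds in samples:
        buckets[interval_start(ts)].append(seconds)
    return {start: mean(values) for start, values in sorted(buckets.items())}


def percentile(values: list[float], pct: float) -> float:
    """Nearest-rank percentile, for example pct=90 for a weekly 90th percentile."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]
```

Because 8:30 and 17:30 fall on both 15- and 30-minute boundaries, no interval straddles the change between shifts.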

Weekly Report
The weekly reports are used by both the IT department and the lines of business to review the service quality delivered by the IT department. These reports are kept online for eight weeks. The contents include

• Workload volumes by application summarized by shift by day
• Outage summary by application by shift by day
• Recovery analysis for all outages of significant duration
• Cumulative outage duration for the month by application
• Response time percentiles by application
• Security violations and attempted intrusions

Monthly Report
The monthly report is a management report that focuses on how well the IT department is servicing the lines of business. The monthly reports are kept online for six months. The contents include

• Report card summary
• Workload volumes by application
• Service level achievement summary by application service
• Highlighted problem areas and analysis

Quarterly Report
The quarterly report is a business report focused on identifying trends in service quality as well as overall satisfaction. It also provides information on future initiatives. The quarterly reports are kept online for four to six quarters. The contents include

• Workload trend report by application and user community
• Customer satisfaction survey results
• Service level achievement trends
• Cost allocation summary
• New IT initiatives

Appendix E
Sample Value Statement & Return on Investment (ROI) Analysis for a Service Provider Delivering an SAP Application

This appendix contains a case study of the justification for implementing service level management at an application service provider. The name of the company has been suppressed. In addition to a qualitative discussion of the value from implementing service level management at this company, there is a quantitative analysis of rates of return on investment.

The value proposition is explained by looking at two different categories of value: benefits and ROI. A benefit refers to the soft value that is typically harder to quantify in direct dollar revenue generation or cost savings. People, productivity, and perception frequently fall into this category. Return on Investment, or ROI, refers to hard real-dollar savings, direct revenue generation, or direct dollar cost savings.
Summary of Value
The nature of the service provider business is providing application availability. Inherent in this is a guarantee of a certain level of availability and performance. The Service Level Agreement (SLA) evidences this with our customers. We provide for a penalty when this level of availability is not met. The major service level management value comes from providing the exact methodology and tools needed to manage at the required level.

The ROI value areas are

• Avoid paying a financial penalty by meeting service level objectives for a customer
• Slower growth (hiring) of the support and operations staff
• Reduce the number of help desk calls
• Eventual possible elimination of the help desk operations
• Software licensing savings

The benefit value areas are

• Lost customer credibility with excessive downtime or poor response time
• Time savings of the operations staff
• Time savings of the shared group
• Greater credibility of the SLA numbers
• Sales competitive advantage
• Reduced time and manual effort for the billing staff

Return on Investment (ROI) Value Areas
These are the hard-dollar savings or increased revenue that can be generated as a result of implementing proactive service level management.

Avoid Paying a Financial Penalty by Meeting Service Level Objectives for a Customer
Service Level Agreements provide for giving our customers performance credits when a service level is not met in some defined area.

How Service Level Management Contributes Value
Proactive service level management monitors and manages the very thing that is in the Service Level Agreement with your customers. There is no stronger statement of value that service level management can make than this.

Metrics Needed to Quantify
• Past frequency of missing service levels
• Cost of a performance credit and the formula for determining when real money is lost due to performance credits given

Expected Service Level Management Benefit
A reduction in the number of times a service level is missed and a resulting reduction in performance credits given. This will result in fewer dollars lost due to performance credits.

Slower Growth (Hiring) of the Support and Operations Staff
Automation of certain routine tasks and recovery processes means that the support and operations staff has more time to spend on other nonroutine tasks and projects. That means as the workload increases, new staff will not have to be hired as fast as the growth of new customers.

How Service Level Management Contributes Value
Proactive service management provides the automation, notification, email, and paging capabilities to make this possible. Also, because the service level management methodology helps to better determine which roles are responsible for what service areas, problem detection and determination time are reduced as well.

Metrics Needed to Quantify
• Projected new customer and user frequency
• Projected support and operations staff increases

Expected Service Level Management Benefit
A slower than projected growth rate of support and operations staff.

Reduce the Number of Help Desk Calls
Quicker recovery of certain routine tasks along with real-time Web posting of application statuses will reduce the number of calls to the help desk. There is a direct cost associated with the resources needed to staff and operate a help desk.

How Service Level Management Contributes Value
Proactive service level management includes the Web posting and communication facilities to customers and end users so that they will not have to call the help desk

as often to report a problem or find out information. In other companies, this process has resulted in calls to the help desk being reduced by as much as 85%. Using certain assumptions based on cost of resources and time to complete a call, other companies have estimated the cost of a single help desk call to be in the $20 to $25 range. Multiplying this cost by the expected reduction in the number of calls demonstrates that this process can result in substantial savings.

Metrics Needed to Quantify
• Cost per hour of help desk and support resources
• Average time spent on a single help desk call

Expected Service Level Management Benefit
Substantial reduction in the number of help desk calls and a corresponding decrease in cost.

Eventual Potential Elimination of the Help Desk Operations
Elimination of the help desk function that involves actually answering the phone to take all problem and request calls can be accomplished over time through the implementation of proactive service management and can result in significant savings.

How Service Level Management Contributes Value
The service management center, through its Web posting and virtual help desk enabling capabilities, can help the IT staff to align systems statuses and problem reporting so that users and customers are presented with the same levels of drill down. This alignment also means that problems are directed to the proper support resource in the same way that the IT department alerts staff automatically of internal problems. This has been an observed and demonstrated benefit at other companies. Several companies have no help desk staff. All problems are logged through the internal Web page, system and application statuses are posted to the internal Web site, and a backup telephone number using automated voice recognition (AVR) technology directs callers automatically using the same prompts as on the internal Web site.

Metrics Needed to Quantify
• Annual cost of help desk staff
• Current help desk staffing levels and projected growth rates
• Cost of AVR technology

Expected Service Level Management Benefit
Eventual elimination of the help desk staff.

Software License Savings
By using information on active and inactive users of a software package in an environment where license purchase is based on concurrent usage, the number of licenses needed can be managed to be much less than the observed maximum number of concurrent users.

How Service Level Management Contributes Value
Proactive service management provides the capability to understand which users of an application are active and inactive, and how long inactive users have been in that state. Through a baselining process, you can determine how many concurrent licenses you would need if you automatically logged off users who have been inactive longer than a certain period of time. The service management center then performs the automatic logoff to keep you in license compliance. This has been an observed and demonstrated benefit at other companies. Hundreds of thousands of dollars have been saved in just a few years by using this information.

Metrics Needed to Quantify
• Type of licensing arrangements
• Cost of a license for specific application packages
• Expected growth rate of users
• Current license usage levels

Expected Service Level Management Benefit
Dollar savings in fewer concurrent licenses needed and growth containment in new licenses needed.

Benefit Areas
Benefits include soft-dollar areas such as people productivity, customer confidence, and brand perception that are harder to quantify and use as a justification for service level management, but are nonetheless important.

Lost Customer Credibility with Excessivo Downtime or Greater Credibility of the SLA Numbers
Poor Response Time Most IT environments that report service levels for availability and performance
A service provider will quickly lose credibility with customers who frequently collect numbers based on assumptions and memory recall, and few facts that can
experience what they perceive to be excessive down time or poor response time. he validated. PProactive service management reports on numbers that are based only
Even though the service provider will pay a penalty for not meeting service level oil true, measured availability, and response times from an application, as well as an
objectives with customers, the intangible loss of credibility can ultimately cost the end-user perspective.
service provider new and existing customers in the long run.
How Service Level Management Contributes Value
How Service Level Management Contributes Value Proactive service management collects availability information from all technology
Proactive service management manages both availability and performance to cus- components involved in an application service and records exact uptimes and
tomer needs. These are two of the items that can significantly contribute to loss of downtimes. There is no guesswork involved and everything is automated. When
credibility. the entire application service is covered through electronic means, the numbers
have credibility.

Time Savings of the Operations Staff
Automation of certain routine tasks and recovery processes means that the existing support and operations staff has more time to spend on other nonroutine tasks and projects. Having more time to spend on projects means performance improvements across the board. This also helps improve staff morale, because staff members would prefer to work on the more challenging and less routine aspects of their jobs.

How Service Level Management Contributes Value
Proactive service management provides automation, emailing, and paging capabilities, which free up time for the staff to perform nonroutine tasks and projects.

Time Savings of the Reporting Group
The reporting group has the responsibility of collecting the data and analyzing this information to prepare and publish SLA data. This is a manual effort today, which is time-consuming. As the company grows in number of customers, this effort will take even more time. Automating the collection and publishing of this information can save substantial amounts of time, as well as result in greater accuracy.

How Service Level Management Contributes Value
Proactive service management is based on collecting the information needed to produce SLA reports. This will eliminate the manual effort of the reporting group for collecting the information, comparing it to defined SLAs, and reporting it.

Sales Competitive Advantage
Besides access to applications, high availability and good performance of business transactions are the biggest benefits a service provider offers to potential customers. Having the infrastructure tools, processes, and resources aligned in the same way can be a selling point with potential customers.

How Service Level Management Contributes Value
Service level management contributes value in this area by providing the means to manage and improve availability and performance.

Reduced Time and Manual Effort for the Billing Staff
The billing staff must collect accurate information on availability by application and customer in order to calculate the bills and credits properly each month. This effort can be reduced with proactive service level management that automatically collects these numbers, thus saving the billing staff time and money.

How Service Level Management Contributes Value
Service level management contributes value in this area by providing accurate availability numbers electronically in whatever format is desirable. This comes from a service monitoring database where these numbers are consolidated from all technology components involved in the process.
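The billing calculation described above can be sketched as a simple lookup from measured availability to a service credit. The credit tiers, fees, and customer names below are invented for illustration; an actual credit schedule is whatever the provider's SLAs specify.

```python
# Hypothetical credit schedule: (availability floor in %, credit as a fraction of the
# monthly fee). Tiers are listed from the most severe to the least severe, so the first
# floor that the measured availability falls below determines the credit.
CREDIT_TIERS = [
    (98.0, 0.50),
    (99.0, 0.25),
    (99.5, 0.10),
]

def monthly_credit(fee, availability_pct):
    """Return the service credit owed for one month, given measured availability."""
    for floor, credit_fraction in CREDIT_TIERS:
        if availability_pct < floor:
            return round(fee * credit_fraction, 2)
    return 0.0  # Availability is at or above every floor: no credit.

# Hypothetical customers: monthly fee and availability measured for their application.
customers = {
    "Customer A": (20_000.00, 99.72),
    "Customer B": (15_000.00, 99.10),
    "Customer C": (8_000.00, 97.50),
}
for name, (fee, measured) in customers.items():
    credit = monthly_credit(fee, measured)
    print(f"{name}: availability {measured:.2f}%, credit ${credit:,.2f}, "
          f"amount billed ${fee - credit:,.2f}")
```

Because the availability figures come straight from the monitoring data, the same function can run unattended for every customer at month end instead of being reconciled by hand.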

Return on Investment Analysis


Along with the qualitative analysis, a quantitative analysis has been produced show-
ing three-year rates of return on the investment required to implement proactive
service level management. These results are shown in Figure E.1.
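The internal rate of return in Figure E.1 can be checked from the annual net cash flows alone. The sketch below treats Year 0 as time zero, discounts each later year by one full period, and finds the IRR by bisection on the net present value. The figure does not state the hurdle rate behind its "$ Above Hurdle Rate" value, so the npv function is included but no hurdle rate is assumed here.

```python
def npv(rate, cash_flows):
    """Net present value, with cash_flows[t] received at the end of year t."""
    return sum(cf / (1.0 + rate) ** t for t, cf in enumerate(cash_flows))

def irr(cash_flows, lo=0.0, hi=10.0, tol=1e-9):
    """Internal rate of return by bisection; assumes NPV changes sign once in [lo, hi]."""
    f_lo = npv(lo, cash_flows)
    if f_lo * npv(hi, cash_flows) > 0:
        raise ValueError("NPV does not change sign on [lo, hi]")
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        f_mid = npv(mid, cash_flows)
        if f_lo * f_mid <= 0:
            hi = mid  # Root lies in the lower half.
        else:
            lo, f_lo = mid, f_mid  # Root lies in the upper half.
    return (lo + hi) / 2.0

# Annual net cash flows for Years 0-3 from Figure E.1.
flows = [0, -314_000, 180_000, 944_000]
print(f"IRR: {irr(flows):.1%}")
```

Running this prints an IRR of about 104.4%, matching the figure. Bisection is used here only because it is short and robust when the cash flows change sign once, as they do in this project (a Year 1 outflow followed by inflows).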

Sample Service Provider
Project ROI Analysis
3-Year Utilization

Benefits/Costs                                 Year 0   Year 1       Year 2        Year 3
Slower Rate of Hiring - Systems Support        $0       $140,000     $210,000      $280,000
Reduce Help Desk Hiring Rate / Eliminate HD    $0       $140,000     $210,000      $280,000
                                               $0       $336,000     $840,000      $1,344,000
Total Benefits                                 $0       $616,000     $1,260,000    $1,904,000

Professional Services - SAP SLM                $0       $450,000     $0            $0
Product Cost - SAP                             $0       $480,000     $1,080,000    $960,000
Total Costs                                    $0       $930,000     $1,080,000    $960,000

Annual Net Cash Flow                           $0       ($314,000)   $180,000      $944,000

Internal Rate of Return                        104.4%
$ Above Hurdle Rate                            $341,358

Figure E.1 An ROI analysis for service level management of an SAP application at a service provider.

Summary
The implementation of proactive service level management at this sample service provider shows an excellent rate of return on the level of investment required for the implementation.

Appendix
Selected Vendors of Service Level Management Products

As this book went to print, the ever-expanding market for service level management included over 800 vendors, each with a claim to provide at least one SLM solution. In reality, many products cover just one aspect of SLM, such as event monitoring or historical reporting. But this limitation does not stop vendors from selling their wares as comprehensive SLM solutions. Given these claims, it is difficult, if not impossible, to assemble a complete list of SLM products that does full justice to the market and the prospective buyer. The information that follows is intended as a sampling of representative offerings that readers can use to start the evaluation process.

ServicePoint Series
The ServicePoint Service Delivery Unit (SDU) is a WAN access device that combines termination, monitoring, and control. It maps specific types of services, such as ATM, frame relay, or IP WAN service, to business applications, according to user-specified parameters. ServicePoint Explorer is a real-time software package that centrally collects data from multiple ServicePoint devices, displaying performance parameters such as utilization, congestion, and delay, and monitoring WAN use within organizations. ServicePoint Reporter is a non-intrusive data collection tool for monitoring frame relay service levels and overall network performance. It works with ServicePoint SDUs and with ADC Telecommunications' DataSMART Frame Monitoring DSU/CSUs. Reporter monitors frame relay performance and graphically displays key statistics on circuit availability, delay, and data delivery rates. It also exhibits individual and aggregate circuit performance on a day-to-day basis and over time, giving IT staff a basis for circuit troubleshooting and WAN bandwidth planning.
ADC Telecommunications Incorporated
Access Products Division
14375 NW Science Park Drive
Portland, OR 97209
+1-503-643-1681
800-733-5511
http://www.adc.com/access

IQ Series
The Adtran IQ series of intelligent performance monitoring devices provides detailed statistics on the overall health and performance of frame relay networks at rates from 56Kbps to 2.048Mbps. It is specifically targeted at Service Level Agreement verification for frame relay subscribers. In-depth diagnostics for circuit management and troubleshooting also are furnished. The IQ family features IQ View, an SNMP management program that runs under Windows NT. This software manages IQ devices while providing a database and trend analysis of the frame relay statistics gathered.
Adtran Incorporated
901 Explorer Boulevard
Huntsville, AL 35814-4000
+1-256-963-8000
800-923-8726
http://www.adtran.com

InView
InView software monitors service levels for distributed applications and alerts managers when objectives are not being met, so problems can be resolved proactively. The software runs under Windows NT and is designed to continuously provide response-time information from the end-user perspective. The software identifies trends in response time over long periods of time to spot potential problems before they interfere with business and to identify growth patterns for capacity planning. To facilitate this, detailed service-level data generated by InView can be stored in a central reporting repository. Thus, availability and end-user response time might be tracked historically by application and location, enabling service-level trending for IS management and the end-user community.
Amdahl Corporation
1250 East Arques Avenue
Sunnyvale, CA 94088
+1-408-746-7830
http://www.amdahl.com

Appvisor Application Management
Appvisor is a software package designed to work with Microsoft Exchange and Lotus Notes. It provides service level monitoring, performance reporting, help desk services, and ongoing usage analysis. Appvisor users can associate resource use with specific departments or individuals in order to manage user behavior, correct inefficiencies, and illustrate how resources are being consumed. Appvisor also supports real-time monitoring of application transactions and monitors the impact of application workloads on server performance and user response time. The product also creates illustrations of baseline performance.
Appliant Incorporated
3513 NE 45th Street
Seattle, WA 98105-5640
+1-206-523-9566
877-227-7542
http://www.appliant.com
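Devices such as ServicePoint Reporter and the Adtran IQ series report circuit statistics like availability, delay, and data delivery. As a hedged illustration of what SLA verification against such statistics might involve, the sketch below computes a frame delivery ratio per permanent virtual circuit from hypothetical ingress and egress frame counts and flags circuits that miss an assumed delivery target or delay ceiling. The counters, circuit names, and thresholds are all assumptions, not values from these products or from any carrier's SLA.

```python
from dataclasses import dataclass

@dataclass
class CircuitSample:
    """Hypothetical per-PVC counters for one reporting interval."""
    circuit_id: str
    frames_offered: int     # frames sent into the network at the ingress end
    frames_delivered: int   # frames counted at the egress end
    avg_delay_ms: float     # average one-way frame transfer delay

def check_sla(samples, min_delivery_ratio=0.999, max_delay_ms=60.0):
    """Return (circuit_id, delivery_ratio, avg_delay_ms) for circuits missing either target."""
    violations = []
    for s in samples:
        ratio = s.frames_delivered / s.frames_offered if s.frames_offered else 1.0
        if ratio < min_delivery_ratio or s.avg_delay_ms > max_delay_ms:
            violations.append((s.circuit_id, ratio, s.avg_delay_ms))
    return violations

samples = [
    CircuitSample("PVC-101", 1_200_000, 1_199_400, 42.0),  # ratio 0.9995, meets both targets
    CircuitSample("PVC-102", 800_000, 798_000, 38.5),      # ratio 0.9975, below delivery target
    CircuitSample("PVC-103", 500_000, 500_000, 71.2),      # delay above the assumed ceiling
]
for circuit_id, ratio, delay in check_sla(samples):
    print(f"{circuit_id}: delivery ratio {ratio:.4f}, avg delay {delay:.1f} ms")
```

Keeping the thresholds as parameters lets the same check be run against each customer's contracted values rather than a single network-wide target.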

Spectrum Service Level Management Solutions
Aprisma, formerly the Spectrum division of Cabletron, offers a range of service level management products, both independently and through partnership with selected third-party vendors. Products that have been integrated with the vendor's Spectrum management platform include Concord Network Health, Gecko SAMAN, ICS Continuity, Micromuse Netcool/Omnibus, Opticom EIS, and Optimal Application Expert. In addition, Spectrum itself offers a range of products that support SLM, including the SpectroWatch real-time alarm notification package for applications, hosts, and network devices; Spectrum Alarm Notification; and the SpectroRx inferencing solution. Spectrum also offers a range of applications designed to report performance and manage availability for specific types of networks. These include Spectrum ATM Services Manager, Remote Access Services Manager, and VLAN Services Manager. A series of functional applications for Spectrum enhance IT's ability to furnish comprehensive SLM measurements and reports. These include Spectrum Data Warehouse, Spectrum Data Mining, Spectrum Capacity Planning, and Spectrum Accounting and Billing. Spectrum software runs under a range of operating systems, including most versions of UNIX.
Aprisma Management Technology
121 Technology Drive
Durham, NH 03824
+1-603-337-7000
http://www.aprisma.com

Attention!
Attention! provides immediate notification of system, network, and environmental events via pagers, telephones, audio announcements, message boards, and custom notification techniques. The software filters events and activates alerts in support of user-specified escalation procedures. It also furnishes statistical reports detailing how well performance meets specified thresholds. The product runs in mainframe, UNIX, and Windows NT environments and supports most RS-232 devices.
Attention Software Incorporated
2175 N. Academy Circle, Suite 100
Colorado Springs, CO 80909
+1-719-591-9110
http://www.attentionsoftware.com

Trinity and eWatcher
Avesta Technologies, acquired in February 2000 by Visual Networks Inc., offers two SLM products. The first of these, Trinity, helps enterprises, service providers, and electronic commerce organizations to apply priorities and workflow policies to problems and report real-time and historical service levels to their customers. Trinity's Enterprise Service Model performs real-time root cause and impact analysis for diagnosing problems, allowing IT managers to resolve critical problems before service is disrupted and increasing availability. Trinity runs under UNIX and Windows NT.
Avesta Technologies' eWatcher is a Web management software solution that provides continuous monitoring of Internet-based applications and services. The package automatically discovers existing Web environments and tests service performance against established thresholds. eWatcher also locates bad links and scripts proactively. In the event of changes to home page content, eWatcher will alert IT personnel. It also delivers real-time availability and performance information. The product runs under UNIX and Windows NT.
Avesta Technologies Incorporated
2 Rector Street, 15th Floor
New York, NY 10006
+1-212-285-1500
800-822-9773
http://www.avesta.com

PILOT
PILOT is a performance tuning and capacity planning tool for mainframes. It features reporting, tracking, forecasting, and modeling. PILOT tracks response times, identifies peak periods, builds simulations of current and future systems for capacity planning and justification, and produces reports that facilitate timely problem diagnosis and resolution. Versions of PILOT are offered for MVS, CICS, and SMF environments.
Axios Products Incorporated
1373-10 Veterans Highway
Hauppauge, NY 11788
+1-631-979-0100
800-877-0990
http://www.axios.com
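Attention! is described as filtering events and driving user-specified escalation procedures. The following sketch shows one generic way such a procedure could be modeled: a severity filter decides which events raise an alert, and an escalation table lists who is notified, and how, once an alert has gone unacknowledged for a given number of minutes. The severity names, escalation steps, and timings are hypothetical and do not describe Attention!'s actual configuration.

```python
from dataclasses import dataclass

# Hypothetical escalation procedure:
# (minutes unacknowledged before this step, who to notify, notification method).
ESCALATION = [
    (0, "on-call operator", "pager"),
    (15, "network team lead", "telephone"),
    (45, "operations manager", "telephone"),
]

SEVERITY = {"info": 0, "warning": 1, "major": 2, "critical": 3}

@dataclass
class Event:
    source: str
    severity: str
    message: str

def should_alert(event, min_severity="major"):
    """Filter: only events at or above min_severity trigger the escalation procedure."""
    return SEVERITY[event.severity] >= SEVERITY[min_severity]

def notifications_due(minutes_unacknowledged):
    """All escalation steps whose waiting time has elapsed, in order."""
    return [(who, how) for wait, who, how in ESCALATION if minutes_unacknowledged >= wait]

events = [
    Event("router-7", "warning", "CPU utilization 85%"),
    Event("db-server-2", "critical", "Database not responding"),
]
for e in events:
    if should_alert(e):
        # Suppose this alert has gone unacknowledged for 20 minutes.
        for who, how in notifications_due(20):
            print(f"{e.source}: notify {who} by {how} - {e.message}")
```

Only the critical database event passes the filter here, and after 20 minutes both the first and second escalation steps are due.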

Patrol, Best/1, Command/Post, MainView, MAXM
BMC Software solutions are designed to ensure that businesses meet specified goals of availability, performance, recovery of business-critical applications, and service level management agreements. The solutions run on a range of operating platforms, including OS/390, most versions of UNIX, and Windows NT. Specific BMC Software solutions are available in four key areas. Application Service Management ensures that applications meet accepted service levels. Products in this area include Patrol, Command/Post, Best/1, and MainView. In the area of data management, products ensure enterprise data availability and integrity. Solutions in this area include MAXM.
BMC offers Incontrol software for IT process automation and Resolve for rapid recovery and storage management. In addition to these products, BMC offers service level management in customized Service Assurance Center solutions. These combine a management methodology with products and professional services for a variety of platforms and applications.
BMC Software Incorporated
2101 CityWest Boulevard
Houston, TX 77042
+1-713-918-8800
800-841-2031
http://www.bmc.com

Keystone VPNview and Keystone CNM
Keystone VPNview software allows customers to proactively monitor SLAs and manage performance of carrier services such as frame relay. It reports real-time and historical data on bandwidth utilization and availability of individual circuits and routed segments. IT professionals can partition data so that in-house customers view only the data specific to their portion of the network. The software supports SNMP and runs under UNIX and Windows NT. Keystone CNM (Customer Network Management) gives service providers the ability to furnish end users with access to their network information via the Web. An integral real-time repository ensures security and enables individual customers to further partition their data views by subsidiary, geography, division, or department. The software monitors performance on ATM, frame relay, IP, and Sonet networks based on switches from Ascend, Cisco, Lucent, and Newbridge. A topology application provides graphical representation of specific nodes and circuits.
Bridgeway Corporation
PO Box 229
Redmond, WA 98073-0229
+1-425-881-4270
http://www.bridgeway.com

OpenMaster
BullSoft, the worldwide software division of Groupe Bull SA (Paris), offers OpenMaster to manage multi-vendor IT networks, systems, and applications. OpenMaster, based on UNIX, incorporates an object-based repository and management services to allow IT staff to easily deploy software, manage assets and configurations, manage availability and performance of IT, and secure IT components. OpenMaster also furnishes service-level reporting on all IT elements across geographical, functional, or business process boundaries. Reports are provided on network devices, desktops, servers, and applications. Information is delivered on configuration, significant events, and security parameters. A range of report formats is offered for a variety of media, including the Web via graphical Java interfaces. In addition, multi-dimensional analysis tools are available for more complex tasks, such as return on investment evaluations or analyses of the overall performance of critical components over long periods of time.
BullSoft
300 Concord Road
Billerica, MA 01821
+1-978-294-6000
800-285-5727
http://www.bullsoft.com

eBA*ServiceMonitor and ServiceNetwork, ETEWatch, RTN, PMN
eBA (e-Business Assurance)*ServiceMonitor is a Windows NT- and UNIX-compatible software package that measures Web site performance from the end-user perspective, enabling IT and Web managers to set Service Level Agreements and ensure customer service thresholds. eBA*ServiceNetwork is a business information service that studies Web service levels, historical trends, and usage over time and automatically issues reports.
ETEWatch is software that runs under Windows NT and measures end-to-end application performance. Versions are offered to support Citrix MetaFrame, Lotus Notes, R/3 monitors, PeopleSoft, and custom applications.
Candle's Response Time Network (RTN) is a service that monitors applications from the end user's point of view. RTN is based on Candle's ETEWatch and lets users see how applications are performing for any site, user, server, or time period right at the desktop. An advanced online application process engine structures the data into information that can be customized. Candle's Performance Monitoring Network (PMN) automates the transformation of performance data into intelligent business analysis. The service provides daily, weekly, monthly, or quarterly information on service levels, capacity, and application monitoring.
Candle Corporation
201 N. Douglas Street
Los Angeles, CA 90245
+1-310-535-3600
http://www.candle.com

CiscoWorks 2000 with Service Level Management Suite
Cisco offers a service level management suite with XML-based interfaces for use on networks that deploy its routing and switching equipment. The suite relies on specially developed Service Assurance Agents (SA Agents) within routers and switches. These agents extend the integral capabilities of Cisco devices to measure Web, voice, and data services. The results obtained by SA Agents are used to monitor network Service Level Agreements. The Cisco Service Level Management Suite also furnishes business-oriented reporting for IT managers on services deployed from outside providers. By using XML, the agents can be extended across multiple partners' networks using the Internet delivery model.
The service level management suite runs under Windows NT and is also part of Cisco's Management Connection Service Management Program, which enables enterprise customers to choose their applications and to construct a service management solution consisting of multiple horizontally integrated partners. Vendors in the program have committed to deploying solutions based on Cisco's service management technology and open XML-based interfaces. The initial vendors in the program include Compuware, Concord Communications, DeskTalk Systems, FirstSense Software, Ganymede Software, Hewlett-Packard, InfoVista Corporation, Inverse Network Technology, Manage.Com, NetScout Systems, Network Associates, NextPoint Networks, ProactiveNET, Response Networks, TAVVE Software, Valencia Systems, Visionael, and Visual Networks.
Cisco Systems Incorporated
170 West Tasman Drive
San Jose, CA 95134
+1-408-526-4000
800-553-6387
http://www.cisco.com

Unicenter TNG Advanced Help Desk and ServiceIT Enterprise Edition
CA's Unicenter TNG products offer service level management according to user-definable rules that can be associated with business policies as well as network or system elements. Users can define service thresholds according to a specific condition or set of conditions. Actions can be set to be performed when service thresholds are exceeded, or when a condition is not present. Alerts, evaluations, or specific automated routines can be set for activation after an elapsed time interval. All actions can be associated with assigned service priority levels. The software runs under UNIX and a range of other computing platforms.
Computer Associates International Incorporated
One Computer Associates Plaza
Islandia, NY 11788-7000
+1-516-342-5224
800-225-5224
http://www.cai.com

EcoSCOPE and EcoTOOLS
EcoSCOPE uses a software probe technology to monitor the network nonintrusively. It automatically discovers applications, tracks application flows through the LAN/WAN infrastructure, and collects detailed performance metrics. EcoSCOPE correlates this information into a user interface with a scorecard format that automatically identifies poorly performing applications, the servers and users impacted, and the magnitude of the performance problem. Users can drill down to understand the root cause in order to solve problems quickly. EcoSCOPE also can be used to determine which applications are contending for network resources, who is using them and for how long, and whether there are any predictable patterns in application usage. EcoTOOLS enables an IT administrator to manage availability and service levels across e-commerce, messaging, and ERP applications running

under Windows NT, UNIX, and Novell NetWare. EcoTOOLS uses a single, consistent Windows NT interface to furnish at-a-glance scorecard reports for management and the general user population in addition to the in-depth operational reports required for the daily management of applications and servers. Customizable reports also are available.
Compuware Corporation
31440 Northwestern Highway
Farmington Hills, MI 48334-2564
+1-248-737-7300
800-521-9353
http://www.compuware.com

Network Health—Service Level Reports
Concord's Network Health—Service Level Reports allows service providers to optimize service quality and document Service Level Agreement compliance. It also enables them to add value to transport services by offering customized reports tailored to individual organizations. These reports can be a valuable bargaining tool in service contract negotiation, the vendor says. Network Health—Service Level Reports runs under UNIX and Windows NT. Concord says its approach leverages the vendor's ability to gather performance information from multiple enterprise resources and present it in a concise, single-page report that's easy to understand. The reports include the Executive Report, a high-level summary of quality of service across the enterprise and by business unit; the IT Manager Report, which offers a more detailed picture of enterprise trends and service performance by region and individual devices; and Service Customer Reports, which document the quality of service delivered by providers to their customers.
Concord Communications Incorporated
600 Nickerson Road
Marlboro, MA 01752
+1-508-460-4646
http://www.concord.com

CrossKeys Resolve
CrossKeys Resolve is a software suite designed to help service providers define, set, and measure service level goals. The product also includes performance reporting software. The Solaris-based package enables service providers to deliver sets of network and service performance reports to their customers and internal users via the Web. The application correlates network information with links to customer information and Quality of Service (QoS) objectives to provide an end-to-end service view for service provider customers.
CrossKeys Systems Incorporated
1593 Spring Hill Road, Suite 200
Vienna, VA 22182
+1-703-734-3706
http://www.crosskeys.com

TREND
DeskTalk's TREND product automates the collection and analysis of performance data and delivers business-critical reports out of the box. TREND collects performance data from industry-standard sources such as SNMP MIBs, as well as from application monitoring partners such as FirstSense Software and Ganymede Software. Utilizing these heterogeneous data sources, TREND reports deliver a cohesive view of network, system, and application performance, providing IT organizations with an end-to-end service level picture of the entire business process. TREND is built on a distributed architecture with a Web interface for report creation and viewing. TREND users can add new data sources, update polling policies, fine-tune threshold definitions, and create customized performance reports. A predictive analysis feature warns network managers in advance of impending slowdowns so they can prevent problems and quickly identify the root cause of any delay. TREND operates on and between AIX, HP-UX, Solaris, Windows 95, and Windows NT platforms.
DeskTalk Systems Incorporated
19191 South Vermont Avenue, Suite 900
Torrance, CA 90502
+1-310-630-1000
http://www.desktalk.com

RPM 3000 with WANwatcher
Eastern Research's Router/Performance Monitor (RPM) 3000 is a multifunctional branch office frame relay router with integral DSU/CSUs that includes trend analysis monitoring capabilities. Using interface cards, it can be upgraded to support a range of data rates up to T1. The RPM 3000 measures throughput, bandwidth utilization, and network delays on up to 32 frame relay PVCs (permanent virtual circuits) and assures that a carrier is delivering the promised SLA bandwidth. WANwatcher takes the data gathered by the RPM 3000 and provides IT

with statistical trending information on the utilization, link status, and interface detail as well as other parameters vital to the frame relay network. WANwatcher collects the network statistics in real time, on an hourly, daily, or weekly basis, or at preset intervals. It can handle data on up to 1,280 channels in the network. Statistics can be viewed in a range of report formats.
Eastern Research Incorporated
225 Executive Drive
Moorestown, NJ 08057
+1-856-273-6622
http://www.erinc.com

Empirical Suite
Empirical's flagship product, the Empirical Suite, covers the planning, measurement, and prediction functions associated with improving enterprise service levels. The suite consists of three products: Empirical Planner, Empirical Director, and Empirical Controller. The applications are sold either individually or as a bundled solution. Empirical Planner helps IT managers set baselines, define corporate service levels, and implement requirements. Empirical Director runs under Windows NT, UNIX, or VMS and tracks actual application service, sending alerts when performance falls below an optimum level and diagnosing the source of a problem. IT managers can also use the application to perform trend analysis for capacity planning and long-term troubleshooting purposes. Empirical Controller performs corrective actions to fix service level issues. The application promises to help administrators automate the tuning of application SQL and the database's physical structure.
Empirical Software Incorporated
1151 Williams Drive
Aiken, SC 29803
+1-803-648-5931
877-289-8100
http://www.empirical.com

Envive Service Level Suite
Envive's Service Level Suite (SLS) provides SAP service level and performance management based on real-time monitoring and analysis of end-user response time performance by business unit, department, or geography. SLS develops a baseline analysis to determine normal system performance. It then applies a series of knowledge-based rules against the baseline to identify abnormal behavior that can lead to performance problems. Engineers can drill down and learn more about the abnormal behavior and perform what-if analyses to see how changes in loading can improve system performance. Basis engineers can also enhance Envive's health check by adding knowledge rules of their own, based on their SAP expertise. SLS runs on a separate architecture from the database system itself, enabling it to perform analyses even when R/3 is down.
Envive Corporation
1975 El Camino Real, Suite 303
Mountain View, CA 94040
+1-650-934-4100
888-236-8483
http://www.envive.com

FirstSense Enterprise
FirstSense, which was acquired by Concord Communications on January 2, 2000, offers FirstSense Enterprise, software that continuously monitors application performance and availability from the end-user perspective. FirstSense says this approach provides IT organizations the information necessary to measure true application quality of service. FirstSense Enterprise uses patented lightweight intelligent autonomous agents on end-user client systems to continuously monitor and collect information on business transactions that affect the end user. The agents track end-to-end response times in real time, comparing actual availability and performance against service-level thresholds. When a transaction exceeds defined service-level thresholds, FirstSense Enterprise captures diagnostic information at the moment the exception occurs and at every tier involved with that specific application transaction. FirstSense sends notification of an alarm and compares values at exception time to normally observed behavior. These "normalcy profiles" provide a baseline of application behavior so that IT can determine what is typical for a particular environment. The baseline data and exception diagnostics provide IT with the context for resolving problems, whether on the client, network, or server.
FirstSense Software Incorporated
21 B Street
Burlington, MA 01803
+1-781-685-1000
http://www.firstsense.com
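Several of the products above, such as Envive SLS and FirstSense Enterprise, compare current measurements against a baseline of normal behavior as well as against fixed service-level thresholds. A minimal sketch of that general idea follows, assuming a per-hour baseline built from the mean and sample standard deviation of past response times and an arbitrary three-standard-deviation rule for "abnormal." It is not either vendor's algorithm; the data, threshold, and function names are illustrative.

```python
import statistics
from collections import defaultdict

def build_baseline(history):
    """history: iterable of (hour_of_day, response_time_seconds) from past observations.
    Returns {hour: (mean, stdev)}; hours with fewer than two samples are skipped."""
    by_hour = defaultdict(list)
    for hour, rt in history:
        by_hour[hour].append(rt)
    return {
        hour: (statistics.mean(rts), statistics.stdev(rts))
        for hour, rts in by_hour.items()
        if len(rts) >= 2
    }

def classify(hour, rt, baseline, sla_threshold, k=3.0):
    """Label a new measurement against the SLA threshold and the hour's normal behavior."""
    over_sla = rt > sla_threshold
    mean, sd = baseline.get(hour, (None, None))
    abnormal = mean is not None and rt > mean + k * sd
    if over_sla and abnormal:
        return "SLA exception, abnormal for this hour"
    if over_sla:
        return "SLA exception, but typical for this hour"
    if abnormal:
        return "within SLA, but abnormal for this hour"
    return "normal"

# Hypothetical history: morning responses near 1.2 s, afternoon responses near 3.0 s.
history = [(9, rt) for rt in (1.1, 1.3, 1.2, 1.4, 1.0)] + \
          [(14, rt) for rt in (2.8, 3.1, 3.0, 2.9, 3.2)]
baseline = build_baseline(history)
for hour, rt in [(9, 1.25), (9, 2.0), (14, 3.1), (14, 4.0)]:
    print(hour, rt, classify(hour, rt, baseline, sla_threshold=2.5))
```

The two checks answer different questions: the SLA threshold says whether a commitment was missed, while the baseline says whether the measurement is unusual for that time of day. An afternoon that routinely exceeds the SLA, for example, points toward capacity planning rather than a fault.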

Pegasus or

Ganymede Software's Pegasus monitoring solution is designed to minimize the 19925 Stevens Creek Boulevard
time and effort required to detect, diagnose, and trend network performance prob- Cupertino, CA 95014-2358
lems. The Pegasus Application Monitor component gives a user's view of applica- +1408-725-7105 (US)
tion performance. It passively monitors the performance of end-user transactions,
http://www.geckoware.com
so IT professionals can identify, prioritize, isolate, and diagnose application perfor-
mance problems. It tells staffers if an application on a particular desktop is being
constrained by the client, the network, or the server so that they can deal with HP OpenView ITSM Service Level Manager
these problems before end-users are aware of them. If the network is causing an ITSM Service Level Manager uses a configuration management database to iden-
application to slow down, Pegasus identifies which network segment is causing the tify the components that reside in the IT infrastructure and the corresponding IT
performance degradation by using active application flows of known transactions services the components provide. IT professionals can then use this information to
to determine where performance is being constrained. In addition, key system create a service catalog to formalize operational performance agreements between
statistics can be monitored to see how they are affecting application performance. IT groups and their customers. The HP application tracks actual service versus ser-
This information can be used to establish trends, set SLAs, and monitor vice level objectives. This makes it possible for organizations to set progress moni-
conformance to agreed-on criteria. tors and escalation rules to manage the incidents and ensure that Service Level

Ganymede Software Incorporated Agreements will not be violated.

1100 Perimeter Park Drive, Suite 104 Hewlett-Packard Company


Morrisville, NC 27560-9119 3000 Hanover Street
919-469-0997 Palo Alto, CA 94304-1185
http://www.ganymede.com
+1-650-857-1501

http://www.hp.com

Gecko Service Level Agreement Manager (SAMAN)
Gecko SAMAN provides service level management and reporting for mission-critical networks. It is designed to allow network managers to define, monitor, and report on the achievement of service level commitments, whether those commitments are made by in-house organizations or by external service providers. The product models Service Level Agreements and furnishes executive-quality business reports based on information from a wide variety of sources, including Spectrum, HP OpenView, and Tivoli NetView. Using this information, customers can build Service Level Agreements that include availability (uptime, mean time between failures, mean time to repair), network performance (bandwidth and latency), and people-related performance drawn from workflow management systems.

Gecko Software Limited
P.O. Box 5
PINNER, HA5 1US
Middlesex, UK
+44 700-004-3256 (UK)

Continuity
Continuity software is designed to help IT organizations manage service requirements in complex distributed environments. It gathers baseline information on normal network performance and then tracks availability, performance, response time, throughput, service levels, and operational risks, in terms that both IT operations managers and business managers can understand. Continuity provides real-time, correlated diagnostics to maximize availability and performance by helping managers correct and prevent service disruptions quickly. By monitoring business transactions as users experience them, the product aims to address problems before users are aware they exist.

Intelligent Communication Software GmbH
Kistlerhof Str. 111
81379 Munich, Germany
+49-89-748598-35
http://www.ics.de
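
The availability terms that Gecko SAMAN lets customers write into an SLA (uptime, mean time between failures, mean time to repair) are related by simple arithmetic. The Python sketch below is not taken from Gecko SAMAN or any other product listed here; it assumes the standard steady-state formula availability = MTBF / (MTBF + MTTR) and computes measured uptime as the fraction of a reporting period not lost to downtime. The function names and all sample figures are hypothetical.

```python
"""Availability arithmetic for SLA reporting (illustrative sketch)."""


def availability_from_mtbf(mtbf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability as a fraction: MTBF / (MTBF + MTTR)."""
    if mtbf_hours <= 0 or mttr_hours < 0:
        raise ValueError("MTBF must be positive and MTTR non-negative")
    return mtbf_hours / (mtbf_hours + mttr_hours)


def uptime_percentage(period_hours: float, downtime_hours: float) -> float:
    """Measured availability over a reporting period, as a percentage."""
    if period_hours <= 0 or not 0 <= downtime_hours <= period_hours:
        raise ValueError("downtime must lie within a positive reporting period")
    return 100.0 * (period_hours - downtime_hours) / period_hours


def allowed_downtime_hours(period_hours: float, target_pct: float) -> float:
    """Maximum downtime in the period that still meets an availability target."""
    return period_hours * (1.0 - target_pct / 100.0)


def meets_target(measured_pct: float, target_pct: float) -> bool:
    """True when the measured availability satisfies the SLA target."""
    return measured_pct >= target_pct


if __name__ == "__main__":
    # Hypothetical figures: a 30-day month (720 h) with 2 h of recorded outages.
    month = 720.0
    measured = uptime_percentage(month, 2.0)
    print(f"measured availability: {measured:.2f}%")
    print(f"meets 99.5% target: {meets_target(measured, 99.5)}")
    print(f"downtime budget at 99.5%: {allowed_downtime_hours(month, 99.5):.1f} h")
    # Hypothetical reliability figures: MTBF of 500 h, MTTR of 4 h.
    print(f"predicted availability: {availability_from_mtbf(500.0, 4.0):.2%}")
```

With these sample numbers, 2 hours of downtime in a 720-hour month gives about 99.72% availability, which satisfies a hypothetical 99.5% target; that target leaves a downtime budget of 3.6 hours for the month. Working from a downtime budget is often easier to negotiate than a bare percentage, because both parties can see what the number means in hours.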

InfoVista and InfoVista Web Access Server, VistaViews
InfoVista offers a suite of tools specially designed for SLM. Features include wizard-driven report creation, drill-down between reports, complex data structure handling, full-database query capabilities, and a complete developer kit (with C, Perl, and Visual Basic support) for customization.

The InfoVista Web Access Server is a Web application (Java-based and HTML-compliant) for report browsing. Web Access Server, like the InfoVista core product, is fully customizable. Key features include exporting reports by specifying filters or instances, batch or on-demand report distribution, and intelligent online updating of performance information. VistaViews are ready-to-use report templates for use with the InfoVista Report Builder. They include comprehensive reports covering networks, systems and applications, ATM and frame relay WANs, Ethernet switches, routers, and LAN segments, among other elements. The Vista Plug-in for NetFlow is an integrated package for InfoVista that provides an efficient means of managing high-volume NetFlow data from Cisco devices. All InfoVista products run on Windows or UNIX platforms.

InfoVista Corporation
12, avenue des Tropiques
91955 Courtaboeuf cedex
France
+33.1.46.21.87.87 (Europe)
or
5950 Symphony Woods Rd
Columbia, MD 21044
+1-410-997-4470 (United States)
http://www.infovista.com

Service Management Architecture
Jyra measures the quality of service and response times delivered to Web commerce customers, desktop users, and branch locations. Jyra's products are built around its Service Management Architecture (SMA), which continually monitors services to identify weaknesses in network configuration, hardware failures, congestion, or other issues that are having a negative impact on e-commerce performance.

Jyra's SMA uses a mid-level manager to collect and aggregate response time data from Service Level Monitors (SLMs), agents distributed throughout the networked environment. SLMs notify the mid-level manager when response times exceed a defined level. The mid-level manager can, in turn, forward that data to upstream management stations. SLMs can also be used for local reporting.

Jyra Research Incorporated
2880 Zanker Road, Suite 203
San Jose, CA 95134
+1-408-432-7235
http://www.jyra.com

NETClarity Suite
The NETClarity Suite of network performance management and diagnostic tools allows the network manager to monitor, measure, test, and diagnose performance across the entire network. The suite's six network performance tools are Network Checker+, Remote Analyzer Probe, Load Balancer, Service Level Manager, Capacity Planner, and NETClarity Complete. All the tools are based on technology and methodologies taken from LANQuest's independent LAN/WAN testing services.

LANQuest
47800 Westinghouse Drive
Fremont, CA 94539
+1-510-354-0940
800-487-7779
http://www.lanquest.com

PerformanceWorks for E-Business and PerformanceWorks WebWatcher
PerformanceWorks for E-Business software monitors the performance of end-user workstations, back-end servers, databases, and other system-level components of enterprise services. PerformanceWorks WebWatcher monitors the performance of Web servers and end users in e-commerce sites.

PerformanceWorks software runs under a range of platforms, including UNIX and Windows NT, as well as mainframes. Optional packages are offered for adding predefined alarms and reports, specialized agents for specific servers and databases, capacity planning, and application performance management.

Landmark Systems Corporation
12700 Sunrise Valley Drive
Reston, VA 20191
+1-703-464-1300
800-488-1111
http://www.landmark.com

VitalSuite
NetCare Professional Services offers a full suite of enterprise performance management solutions under its VitalSuite software brand. The suite includes VitalNet (formerly EnterprisePRO), a network performance reporting and SLA compliance management system; VitalAnalysis, a performance reporting system for mission-critical applications; VitalHelp, a proactive, real-time fault detection and troubleshooting solution; and the Business Transaction Management System, which manages network, application, and user activity. In addition, NetCare Professional Services provides comprehensive consulting services to assist clients, both enterprises and service providers, in designing, deploying, and administering effective, business-oriented SLAs.

Lucent Technologies NetCare Professional Services (formerly INS)
1213 Innsbruck Drive
Sunnyvale, CA 94089
+1-650-318-1000
1-888-4-NETCARE
http://www.lucent.com/netcare

ServiceDesk for SAP R/3, Service Level Analyzer for SAP R/3
Luminate's ServiceDesk for SAP R/3 generates end-to-end performance profiles according to user-defined parameters, including SAP R/3 SID, user ID, transaction code, and date/time. The Luminate software analyzes performance from several perspectives, from identifying general end-to-end problems to diagnosing very specific transaction code issues. The application breaks down response time into network response time, system queue time, application response time, and database response time. Luminate's Service Level Analyzer adds a business-user perspective to its analysis of service level issues, relating technical performance to its impact on corporate divisions, geographic sites, and individual users.

Luminate Software Corporation
2750 El Camino Real
Redwood City, CA 94061
+1-650-298-7000
http://www.luminate.com

Netcool
Micromuse's Netcool suite is designed to help telecommunications and Internet service providers ensure the uptime of network-based customer services and applications. The Netcool ObjectServer is the central component in the suite. The ObjectServer is an in-memory database optimized for collecting events, associating events with business services, and creating real-time reports that show the availability of services. The ObjectServer performs all formatting and filtering of this data, allowing operators to create customized EventLists and views of business services.

The suite also contains ObjectiveView, an object-based topographical front-end toolset that allows operators to build clickable maps, icons, and other graphical interfaces to ObjectServer data and EventLists. ObjectiveViews are used by managers in the network operations center because they supply a concise, global summary of event severities and service availability throughout the entire network.

Micromuse Incorporated
139 Townsend Street
San Francisco, CA 94107
+1-415-538-9090
http://www.micromuse.com

Do It Yourself (DIY) and Custom Network Analysis
NetOps offers products and services in the areas of network fault analysis and event correlation. The company's solutions focus on uncovering the root of a problem that might be diminishing network service levels. DIY (Do It Yourself), the company's Internet-based software, identifies network problems and offers IT managers possible solutions. DIY uses proprietary mid-level managers called Distributed Status Monitors (DSMs) that model what networked system behavior should be and then collect actual performance information and report problems. The DSMs speak SNMP. An SNMP polling hierarchy of monitors is configured for centralized aggregation of all threshold-crossing events in real time. Non-critical events that might be signs of future problems are collected for fault avoidance analysis. NetOps also provides a service called Custom Network Analysis in which the company integrates
the DSMs into the network and interprets the information from the agents. NetOps then offers suggestions to correct network deficiencies.

NetOps Corporation
501 Washington Avenue
2nd Floor
Pleasantville, NY 10570
+1-914-747-7600
http://www.operations.com

NetPredict
NetPredict software monitors the end-to-end performance of specific applications on user-selected paths through a network. To do this, the software collects key data from SNMP and distributed RMON sources. That information is then stored in a relational database for long-term trending and historical review. By comparing this data against measured traffic on the network, NetPredictor can make accurate predictions of the effects of changes in the network or the application. With this capability, IT personnel can accurately gauge their capacity requirements to improve the performance of both their applications and networks. NetPredict also supplies a tool for creating and tracking Service Level Agreements: IT managers can use it to estimate their actual requirements and then measure the application performance that end users experience on a day-to-day basis.

NetPredict Incorporated
1010 El Camino Real, Suite 300
Menlo Park, CA 94025
+1-650-853-8301
http://www.netpredict.com

Wise IP/Accelerator
The Wise IP/Accelerator enables carriers and ISPs to offer SLAs for IP-based virtual private networks (VPNs). The IP SLAs supported by Wise IP/Accelerator furnish point-to-point bandwidth availability guarantees for VPNs (similar to the committed information rate, or CIR, of a frame relay network). By using Wise IP/Accelerator to offer SLAs for IP VPNs, carriers and ISPs can attract additional subscribers among companies looking for an inexpensive alternative to dedicated network services.

NetReality
2350 Mission College Boulevard, Suite 900
Santa Clara, CA 95054
+1-408-988-8100
http://www.nreality.com

NetScout Manager Plus, NetScout Server, NetScout WebCast
NetScout Systems monitors the performance of enterprise applications for the purpose of tracking SLA compliance. NetScout Manager Plus integrates data from distributed RMON probes and embedded agents throughout the network. The software then analyzes that information to produce service level baseline and historical trend reports. Another component, NetScout Server, makes enterprise-scale network monitoring possible by logging RMON data from probes and switches, allowing more frequent polling while minimizing management traffic. NetScout Server shares data with NetScout Manager Plus to create enterprisewide reports. The server generates reports on demand or on a daily, weekly, or monthly basis.

NetScout WebCast works with NetScout Manager Plus and NetScout Server to give IT managers access to reports and alarms at any time via the World Wide Web.

NetScout Systems Incorporated
4 Technology Park Drive
Westford, MA 01886
+1-978-614-4000
http://www.netscout.com

NetSolve Services
NetSolve offers a range of remote network management and security services that allow companies to selectively outsource specific management tasks to increase the reliability and performance of their enterprise networks. The company essentially acts as an extension of its client's internal IT staff.

Besides supplying network implementation services, NetSolve provides security services and turnkey management services for both LANs and WANs. The company's WAN and LAN management services encompass network design verification, installation, 24x7 fault management, configuration management, performance management, and ongoing documentation.

NetSolve's performance management practice collects service level information from the customer's site and uploads that data to NetSolve's network management center. A NetSolve engineer analyzes that data and produces a summary of the statistics. If changes are necessary, the report includes recommendations.

NetSolve Incorporated
12331 Riata Trace Parkway
Austin, TX 78727
+1-512-340-3000
http://www.netsolve.com

Netuitive Service Level Monitor
Netuitive SLM uses Netuitive's patented Adaptive Correlation Engine (ACE) to identify network performance thresholds automatically and then predict problems before they occur. ACE technology analyzes performance data about critical online business systems from a range of data streams in real time. These data streams can be gathered by the software itself or through its access to the databases of products like DeskTalk TREND and Visual Network IPInsight. Netuitive uses this input to correlate performance variables, identify the correct baseline for normal network performance, and then predict abnormal performance up to four days in advance. Deviations between the SLM-predicted performance for each data stream and the baseline are reported to network managers as alerts. The sensitivity to these alert conditions is configurable globally across all inputs being predicted, or for groups of inputs.

Netuitive Incorporated
3460 Preston Ridge Rd., Suite 125
Alpharetta, GA 30005
+1-678-256-6100
http://www.netuitive.com

Bluebird
N*Manage supplies Service Level Agreement monitoring software for systems and networks. Bluebird, N*Manage's SLA tracking software, collects service and availability data for IP, email, FTP, HTTP, NFS, and other applications. Bluebird uses a distributed architecture and a Java client to present network health information. Bluebird issues real-time alerts when network performance exceeds acceptable thresholds or availability falls below an acceptable level.

N*Manage Company
Raleigh, NC 27606
+1-919-362-8866
http://www.nmanage.com

Optivity SLM and Preside Performance Reporting
Nortel produces service level management products for both the enterprise and the service provider. Optivity SLM, the centerpiece of its enterprise management offerings, gathers and aggregates application performance and availability data directly from the network to provide information on both a per-user and a per-application basis. Optivity SLM is designed to quickly isolate and respond to network faults that impact business-critical applications. It also features remote access support, allowing end users dialing in remotely to run a six-step diagnostics check before calling their help desk. Optional application modules enhance transaction visibility for Oracle database applications and IMAP email applications.

Preside Performance Reporting provides trending and historical reporting capabilities for networks based on a range of vendors' devices, including those from Cisco and Nortel Networks. The software comes with a range of graphical reports. It was added to Nortel's product line after Nortel's acquisition of X-Cel Communications in 1999.

Nortel Networks Corporation
8200 Dixie Road, Suite 100
Brampton, Ontario L6T 5P6
+1-905-863-0000
http://www.nortel.com

Executive Information System or iView
Opticom's Executive Information System consolidates reporting on all aspects of the service management process within the infrastructure: assets, services, availability, capacity, and performance. The product includes software modules that track specific metrics of applications, carrier services, and systems from the perspective of the end user. The EIS integrates into the existing management infrastructure. A component called ServiceView compares metrics based on the business impact of an outage. It also offers multifaceted service views. Users can define services of all types, ranging from network transport services to complex business processes.

Opticom Incorporated
One Riverside Drive
Andover, MA 01810
+1-978-946-6200
http://www.opticominc.com

Energizer PME
OptiSystems designs and sells products to manage the performance of SAP R/3 systems. The company also offers management products for R/2 applications. Energizer PME (Performance Management Environment) for R/3 dynamically analyzes system usage and reacts to events as they happen in order to improve system performance.

The data collection engine for the Energizer PME for R/3 products runs as an R/3 task and captures real-time interval data, as well as summary data, for all system components using SAP's own data collection routines. As a result, Energizer's overhead is negligible (less than 1%, according to the vendor) and R/3's own data collection is not needlessly duplicated. In addition, the data collected by the Energizer data collection engine is used as the basis of the Energizer PME for R/3 product modules. After one of the modules is installed, any of the other modules can make use of the same data.

OptiSystems Incorporated
1100 Fifth Avenue South, Suite 404
Naples, FL 34102
+1-941-263-3885
http://www.optisystems.com

PacketShaper
Packeteer supplies products to both enterprise customers and service providers for managing network bandwidth. PacketShaper detects and classifies network traffic, analyzes traffic behavior, offers policy-based bandwidth allocation for specific applications, and provides network reports. PacketShaper automatically detects over 150 types of traffic. It can categorize traffic by application, service, protocol, port number, URL or wildcard (for Web traffic), hostname, precedence bits, and IP or MAC address.

PacketShaper tracks average and peak traffic levels, calculates the percentage of bandwidth that's wasted on retransmissions, highlights top users and applications, and measures performance. PacketShaper's high-level network summaries record network trends. The product can also measure response times and compare those numbers to what is deemed acceptable response time performance.

Packeteer
10495 N. De Anza Boulevard
Cupertino, CA 95014
+1-408-873-4400
http://www.packeteer.com

OpenLane
Paradyne's OpenLane network management application features support for diagnostics, real-time performance, and SNMP-managed narrowband and broadband networks through Paradyne's access device product lines. OpenLane collects and reports performance against the terms of an SLA. Support is provided for Paradyne's FrameSaver Frame Relay Access Units as well as Paradyne's Hotwire xDSL and MVL products. OpenLane also supports Paradyne's 31xx, 7xxx, and NextEDGE 9xxx T1 and subrate access products.

Paradyne Corporation
8545 126th Avenue North
Largo, FL 33773
+1-727-530-2000
http://www.paradyne.com

Foglight
Foglight software ensures the reliability and performance of electronic commerce sites, enterprise resource planning (ERP) systems, and information technology infrastructures.

Foglight monitors business applications for availability and performance, alerting system managers to actual or potential application problems and allowing them to identify and correct those problems before end users are affected. Foglight keeps critical applications up and running properly, monitors and reports on application service levels, and helps scale e-business systems as they grow through accurate capacity planning.

Quest Software, Incorporated
8001 Irvine Center Drive
Irvine, CA 92618
+1-949-754-8000
http://www.quest.com

Solo DSU/CSUs with WANview
The WANview Network Management System is a complete SNMP-based system for managing wide area networks that includes Service Level Agreement (SLA) monitoring and reporting for the vendor's Digital Link Solo Select family of intelligent DSUs. Using industry-standard measurements based on the Frame Relay Forum's FRF.13 specification, IT managers can ensure they receive the levels of service they have contracted for. As a complement to its SLA features, WANview incorporates a customer database, accessible via a Web browser, that makes it easier for service providers to partition data on a per-customer basis and enables them to offer new services such as SLA verification and selectable quality-of-service (QoS) levels for their customers. WANview is a UNIX-based application that runs under HP OpenView. The Digital Link Solo series includes intelligent monitoring DSU/CSUs for use on frame relay networks at 56Kbps, T1, or fractional T1 rates and on leased lines at up to T1 rates.

Quick Eagle Networks
217 Humboldt Court
Sunnyvale, CA 94089-1300
+1-408-745-6200
http://www.digitallink.com

ResponseCenter
ResponseCenter is an active testing solution that provides comprehensive end-to-end transaction performance measurement and problem diagnosis for e-business and e-commerce sites. ResponseCenter diagnoses the response time of a complete e-transaction across networks, servers, databases, middleware objects, and application components, breaking total end-to-end performance down into its individual components. The product is designed to give e-businesses early warning of potential application brownouts or outages before e-commerce service is interrupted.

Response Networks Incorporated
2034 Eisenhower Avenue, Suite 290
Alexandria, VA 22314-4650
+1-703-739-7770
http://www.responsenetworks.com

Statscout
Statscout is a network performance monitoring package based on SNMP. It runs under FreeBSD 3.x, a little-known flavor of UNIX comparable to Linux. The vendor claims its software can monitor thousands of devices and ports simultaneously while requiring minimal disk space. The software measures network health statistics, including average response time (calculated from ping response times), utilization, and errors. Statscout also produces SLA summary reports that include information on SLA non-conformance, as well as detailed network management statistics.

Statscout
One World Trade Center, Suite 7967
New York, NY 10048
+1-212-321-9282
http://www.statscout.com

SOLVE Series
Sterling's SOLVE products monitor network performance and diagnose problems that could have a negative impact on enterprise service levels. The software supplies IT managers with utilization information so that they can allocate network resources adequately and control spending. Sterling claims its SOLVE product line can instantly determine the location of a problem and accelerate resolution. Sterling offers SOLVE products for a variety of platforms and environments, including SNA, TCP/IP, CICS, and MVS.

Sterling Software
300 Crescent Court, Suite 1200
Dallas, Texas 75201
+1-214-981-1000
http://www.sterlingsoftware.com

Frame Relay Access Probe and Sync Performance Manager
The Frame Relay Access Probe (FRAP) line of circuit management solutions from WAN access hardware vendor Sync Research is designed to deliver proactive service level management and troubleshooting capabilities to both enterprise and service provider users. These FRAPs, placed in strategic areas of the network, collect statistics and act as troubleshooting devices. An accompanying product called the Sync Performance Manager is designed to help companies establish and
maintain SLAs, plan for future growth, and manage network change by compiling statistics over time and analyzing that data for trend information. Sync also offers an SNMP-managed CSU/DSU.

Sync Research Incorporated
12 Morgan
Irvine, CA 92719
+1-949-588-2070
http://www.sync.com

Tivoli Service Desk
Tivoli Service Desk software works with the vendor's Tivoli Enterprise management framework to give customers comprehensive, centralized control over IT service levels. The product contains Asset Management, Problem Management, and Change Management modules, as well as a Service Level Agreement module that allows organizations to identify, define, configure, administer, and measure all aspects of IT service delivery. All components of Tivoli Service Desk are integrated with the Tivoli framework. The vendor says this approach enables IT managers not only to monitor SLAs but also to proactively maintain network and system performance and to fix problems automatically as they occur. Tivoli Service Desk provides bidirectional integration with both Tivoli NetView and the Tivoli Enterprise Console.

Tivoli Systems Incorporated
9442 Capital of Texas Highway North
Arboretum Plaza One
Austin, TX 78759
+1-512-436-8000
http://www.tivoli.com

Vantive Help Desk
The Vantive Corporation offers internal help desk software that promises a number of functions to help improve and then sustain enterprise service levels. The company's help desk solution includes asset management, enterprise tracking capabilities, and technical support functionality that can be customized to fit the needs of an individual enterprise.

Vantive says its Help Desk is designed for fast access to diagnostic information, streamlined problem resolution, robust change management, and inventory tracking. The company says its approach cuts support costs and improves employee productivity by reducing downtime. The software integrates with a number of network and system management consoles, including HP OpenView and Intel LANDesk management packages.

The Vantive Corporation
2525 Augustine Drive
Santa Clara, CA 95054
+1-408-982-5700
http://www.vantive.com

WANsuite and NetVoyant
Verilink Corporation's WANsuite series of intelligent, software-based integrated access devices (IADs) is designed to combine voice, data, and network traffic over a single transmission facility, and targets public and private line services at DDS, T1, E1, and HDSL2 delivery. An accompanying Windows NT-based element management system, NetVoyant, tracks performance data for use in WAN SLM. NetVoyant includes an ODBC-compliant database, CORBA IDL (Interface Definition Language) for customization and flexibility, real-time diagnostics, and extensive reporting and trending application support. NetVoyant gathers statistics from any SNMP-based networking device.

Verilink Corporation
127 Jetplex Circle
Madison, AL 35758
+1-256-772-3770
800-926-0085
http://www.verilink.com

Visual UpTime and Visual IP InSight
Visual UpTime integrates expert monitoring capabilities with access equipment to fully automate the collection, interpretation, and presentation of service level data across fast-packet IP, frame relay, and ATM networks. The product includes Analysis Service Elements (ASEs) that embed the functionality of a protocol analyzer and transmission monitor into a CSU/DSU or a passive-monitoring device. ASEs are available for DS3, T1/FT1, 56K DDS, V.35, EIA-530, RS-232, RS-449, and X.21 circuits. For transmitting network data back to the centralized management console, either an Ethernet or a Token Ring LAN interface is available, as well as a backup SLIP interface via a standard serial port.

Visual UpTime comes with a series of SLA monitoring and reporting tools that track performance of frame relay and ATM network services on a daily, monthly, or multimonth basis. A Visual UpTime Burst Advisor continuously measures one-second usage over each port and PVC. From this information, the system automatically makes recommendations on correct bandwidth allocations. A series of executive reports puts this data into a format suitable for presentation to CEOs and top-level executives.

Visual IP InSight leverages technology that the company picked up as part of its acquisition of Inverse Network Technology to give service providers and enterprises the tools required to manage IP connectivity and applications such as dedicated and remote access and Web sites from the perspective of the end user. Visual IP InSight comprises three application suites that let IP services managers provide and track service level agreements, offer new levels of end-user customer care, and monitor end-to-end network performance. The service level management suite includes Service Level Performance Reports: a series of programs that gather actual end-user performance information via the network operator's deployment of the Visual IP InSight client. A single Visual IP InSight installation can take feeds from as few as 500 clients or scale to millions, according to the vendor. Visual IP InSight service level management reports can be used with other suite applications to manage a user's end-to-end experience. The reports can be used, for instance, with Visual IP InSight Dial Care to furnish information about access functionality at end-user desktops, or with Visual IP InSight Dial Operations to proactively manage network access, whether in-house or outsourced. The suite can also be used to track and manage the performance of application services, such as virtual private networks, Web, email, and news.

Visual Networks Incorporated
2092 Gaither Road
Rockville, MD 20850
+1-301-296-2300
http://www.visualnetworks.com

Glossary

Like other specialized areas of information technology, service level management (SLM) has acquired a language of its own. For the most part, the terms used in SLM are derived from the fields of networking, general IT, enterprise management, and software development. Here is an alphabetical list of key terms you'll encounter in most SLM activities and interactions:

Access control: The process of defining and controlling which users can have access to which resources or services, and determining the nature of the authorized access.

Agent: Software designed to collect data about the status and functionality of a device, system, or application for reporting purposes. See the definition of Manager later on.

Application service provider (ASP): A company that provides applications remotely to user companies over external facilities, typically including the Internet.

ARM: Application Response Measurement. An industry-wide effort launched by Hewlett-Packard and other vendors to create a set of APIs (application programming interfaces) designed to be written into applications in order to measure business transactions from an end-user perspective.

Availability: The percentage of time that a service is available for use.

Baseline: The present state of performance, as monitored by an analyzer or other measuring tool. Baselines are obtained in order to determine how services need to be changed to obtain more satisfactory performance, and how services will be maintained and guaranteed over time. The operative principle is simple: you must know where you are before you can proceed to a better place.

Batch job concurrency: A measure of the number of background jobs that can be run on a computer system concurrently. The optimal number of jobs varies with the operating system and the characteristics of the jobs themselves.

Capacity planning: The process of calculating the amount of system resources and network bandwidth that will be required to support a service in the future.

Capture ratio: The proportion of CPU utilization that is actually used, compared with what is allocated for processing. In most UNIX systems, the capture ratio is not sufficient for sophisticated performance analysis or capacity planning.

Common Information Model (CIM): An object-oriented information model created by the DMTF to manage systems, software, users, and networks. The DMTF also provides a conceptual management framework that establishes object definitions and classes for use with CIM.

CPU utilization: The amount of time an application requires to process information in a computer's central processing unit (CPU). CPU usage governs the response time a computer can deliver.

Critical deadlines: The specified times at which certain jobs or tasks must be completed in order to satisfy external vendors or regulations.

Data currency: An indication that data is timely and up-to-date. A measure of data currency is particularly important when data is distributed across multiple data stores such as replicated databases, data warehouses, and data marts.

Data integrity: The accuracy and consistency of data and database structures.

Decode: The process of using special technology to intercept and analyze data packets as they traverse a network. Decoding is used to troubleshoot problems and to determine the overall quality of data transmission.

Differentiated services: The assignment of specific levels of service to different groups of users, based on cost or other criteria. Gold customers, for instance, might be offered continuous availability at an agreed-upon level of response time; silver customers would get response times within a certain range of measurement; and bronze customers would receive "best effort" service.

DMTF: Distributed Management Task Force, a consortium of users and vendors dedicated to leading the development of management standards for desktop, network, and system environments.

DMTF Service Level Agreement (SLA) Working Group: A task force of DMTF members focused on extending the DMTF's Common Information Model (CIM) to allow the definition and association of policies, rules, and expressions that enable common industry communications with respect to service management.

Downtime: The amount of time during which a system, a network element, or a service itself is not available because of technical failure.

End-to-end service: A view of IT service that includes each of the end users of a service and their locations, together with the path they take to access the business application providing the core part of the service.

End-user's perspective: The performance of a service as it is experienced by the user at the desktop. This perspective is the ultimate measure of service quality.

Expectation creep: The basic human tendency to always want more and better, regardless of the subject. In service level management, expectation creep describes how users will pressure IT to exceed agreed service levels. With SLAs in place, IT is assured that performance and capacity increases will be acknowledged, and perhaps paid for by the client, instead of being taken for granted.

External SLA: A Service Level Agreement between a service provider and a client in another organization.

FCAPS: The initials of the five basic categories of tasks included in any comprehensive network management scheme: Fault management, Configuration, Accounting, Performance management, and Security management.

Historical data: Measurements over time of the overall health of specific service elements. Examples include RMON/RMON 2 information collected by probes, CSU/DSUs, and packet monitors at specific intervals (daily, weekly, or monthly). This data is placed in charts or graphs depicting how well service levels were met.

IETF Application Management MIB: The Internet Engineering Task Force's Request for Comments (RFC) 2564, Application Management Information Base. This specification defines units of work in a system or application. It specifies ways of measuring response time, of monitoring resource usage by application (such as via I/O statistics and application-layer network resource usage), and of controlling applications (by stopping, suspending, resuming, and reconfiguring them as needed). While not focused on service level management, RFC 2564 can assist in measuring and managing service quality.

In-House SLA: A Service Level Agreement between a service provider and a client within the same organization.

Interactive responsiveness: The time taken to complete a request on behalf of a user. The quicker requests are completed, the more responsive the service.

Internal SLA: A Service Level Agreement used by the service provider to measure the performance of groups within the service provider's organization. An example might be the SLA between a network services group within IT and the overall organization, or perhaps the CIO.

Intrusion detection: The process of monitoring the IT environment to detect unauthorized access or attempts to access resources illegally.

ISP: An Internet Service Provider, or a carrier that offers dedicated or dial-up access to the Internet for consumers and business customers.

IT Infrastructure Library: A documented methodology for managing IT services, created by the UK Government's Central Computing and Telecommunications Agency (CCTA).

Kernel: The inner portion of a UNIX operating system that interacts directly with the hardware of a computer system in order to govern the order in which resources (files, data, and so on) are handled. The kernel is the source of multitasking capabilities in UNIX environments.

Latency: The amount of time during a service transaction that is consumed by the processing of network devices such as routers.

Layered monitoring: An approach to SLM in which data from agents installed on each network device, system, or application is consolidated in order to obtain a comprehensive view of performance.
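The Differentiated services entry describes gold, silver, and bronze tiers with progressively weaker commitments. Below is a minimal sketch of how a monitoring script might check one measured response time against such tiers. Only the response-time dimension is modeled. The numeric targets (2 and 5 seconds) are hypothetical placeholders, because the entry names the tiers but gives no figures. Bronze is modeled as best effort, with no target at all.

```python
from typing import Optional

# Hypothetical response-time targets in seconds. None marks a best-effort tier
# that carries no response-time commitment.
TIER_TARGETS: dict[str, Optional[float]] = {
    "gold": 2.0,
    "silver": 5.0,
    "bronze": None,
}


def meets_tier(tier: str, response_seconds: float) -> bool:
    """Return True if a measured response time satisfies the target for `tier`."""
    if tier not in TIER_TARGETS:
        raise KeyError(f"unknown service tier: {tier}")
    target = TIER_TARGETS[tier]
    # A best-effort tier has no target, so any measured response passes.
    return target is None or response_seconds <= target


for tier in ("gold", "silver", "bronze"):
    print(tier, meets_tier(tier, 3.1))
# gold False
# silver True
# bronze True
```

Keeping the targets in a single table means a renegotiated tier changes only one value, not the checking logic.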
Lines of business: Those parts of an organization that function as separate business entities when viewed from the highest level.

Manager: Software designed to gather, consolidate, and display management data about network devices, systems, and applications. See the definition of Agent earlier in this glossary.

Mean time to recover: The average amount of time taken to cease processing, restore a stable environment, recover corrupted data, and recreate lost transactions.

Performance: The responsiveness of an application or a network to interactive users. Performance is expressed as the response time of a service to end users, as the time required for a server to deliver a response to a user command, or as the time required for a request to be fulfilled or a batch job to be processed on the mainframe.

Physical layer performance: The uptime of cable links and device interfaces on a network.

Primary data collector: A management tool that captures data directly from the network elements underlying the service (bridges, routers, switches, hubs, and so forth). Some primary data collectors also gather input from software programs that affect overall service availability (applications, databases, middleware, and the like). Although not dedicated to service level management, these tools are often key to facilitating it.

Privilege class: A group of staffers or operations personnel who are awarded a particular type of access based on job function, job level, organization structure, physical location, or some combination of these.

Probes: Standalone hardware devices containing RMON and RMON 2 agents along with packet parsing and filtering engines similar to those used in protocol analyzers.

Real-time data: Events reported from the network directly as they occur. Examples include broken routers, congested links, and malfunctioning adapters.

Recoverability: The ability to resume processing after unplanned outages as rapidly as possible.

Registry: A database, directory, or file that is used to hold and maintain security information about users and resources. The use of a registry can simplify the administration of user groups and the assignment of access privileges to resources.

Replicated data: Data that is copied from one location to another in the course of completing specific business processes or transactions.

Resources: The services, data, applications, systems, and network elements involved in delivering a particular service to users.

Response time: The time required for a user to get a reaction from a server, mainframe, or other system entity after entering a command at the keyboard.

RMON and RMON 2: Remote network monitoring management information base; an SNMP MIB designed to track packet-level activity on network links and connections, as opposed to monitoring the status of specific devices, systems, or applications. RMON 2 is a later version of the MIB that identifies traffic on particular subnets.

Scheduled maintenance: The performance of IT functions such as backup during scheduled downtimes.

Secondary data collector: A management tool that does not communicate directly with the managed environment (although some secondary data collectors are able to do so, if necessary). Secondary data collectors extract data from other products that are primary data collectors.

Security: The actions involved in defining who can access a service, the nature of the access, and the mechanisms used to detect, prevent, and report unauthorized access.

Service Level Agreement (SLA): A contract between IT and its clients that specifies the parameters of system capacity, network performance, and overall response time required to meet business objectives. The SLA is the instrument for enforcing SLM.

Service level management (SLM): The continuous process of measuring, reporting, and improving the quality of service provided by the IT organization. SLM includes a proactive, disciplined methodology for establishing acceptable levels of service in keeping with business processes and costs.

Service subscription: An SLM model in which users agree to a level of service that adds up to a specified amount of IT resources during a given time period.

Simulated transactions: Software routines that mimic the activity of specific business tasks on a corporate or service provider network; often used to obtain consistent readings on response time and availability.

SLM domains: Those components of a network service that must be monitored and measured as part of a service level management strategy. SLM domains typically include network devices and connections, servers and desktops, applications, databases, and transactions.

SNMP: The Simple Network Management Protocol (SNMP), created by the Internet Engineering Task Force (IETF), is a protocol that uses Management Information Bases (MIBs) to retrieve configuration, fault, and performance information about network components.

Socket: A communication endpoint, created through a programming call, that links various portions of an application to one another in a networked environment; for example, a client portion of an application to the server portion.

Transaction: The performance of a business task by one or more users of a computer system that results in a change to the state of a business application or the data associated with it.
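The Simulated transactions entry notes that scripted business tasks are often used to obtain consistent readings on response time and availability. Below is a minimal sketch of such a probe. It assumes that the scripted task can be wrapped as a zero-argument Python callable and that any exception counts as a failed attempt. `run_simulated_transactions` and `fake_lookup` are hypothetical names. A real probe would drive an actual application rather than a sleep.

```python
import statistics
import time
from typing import Callable, Optional


def run_simulated_transactions(transaction: Callable[[], object], runs: int) -> dict:
    """Run a scripted business task repeatedly and summarize availability and response time.

    Hypothetical helper: any exception raised by `transaction` counts as a failed attempt.
    """
    if runs <= 0:
        raise ValueError("runs must be positive")
    timings = []  # elapsed seconds for successful attempts only
    failures = 0
    for _ in range(runs):
        start = time.perf_counter()
        try:
            transaction()
        except Exception:
            failures += 1
            continue
        timings.append(time.perf_counter() - start)
    mean_response: Optional[float] = statistics.mean(timings) if timings else None
    return {
        "availability_percent": 100.0 * (runs - failures) / runs,
        "mean_response_seconds": mean_response,
    }


def fake_lookup() -> None:
    # Hypothetical stand-in for a real business task such as a customer lookup.
    time.sleep(0.01)


summary = run_simulated_transactions(fake_lookup, runs=5)
print(summary["availability_percent"])  # 100.0
```

Only successful runs are timed, so failed attempts do not distort the response-time average. They show up in the availability figure instead.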
Turnaround time: The time required for completion of processing that does not require direct interaction with either the user or system operator. Turnaround time refers to actions that take place in batch mode or as background tasks.

WBEM: The DMTF's Web Based Enterprise Management (WBEM) specifications, which include CIM data descriptions, XML transport encoding, and HTTP access.

Workload level: The volume of processing performed by a particular service, typically measured in the number of transactions, client/server interactions, or batch jobs performed.

Index

A
access. See also security
  customers, to reports, 42
  users
    defining, 29
    group privileged access definition, 30
accounting utility (UNIX), 158
accuracy
  service level objectives, 63
  services
    batch job control, 32
    data currency, 31
    data integrity, 31
    maintenance issues, 32
ACE (Adaptive Correlation Engine), 232
ADC Telecommunications Incorporated
  ServicePoint series, 211-212
  Web site, 212
adding services, setting response goals, 92
administration of Service Level Agreements, 74
administration tools, 121
Adtran Incorporated, 212
affordability. See cost
agents, 105
  benefits, 106
  client, measuring end-to-end response times, 161-162
  drawbacks, 106-107
  hardware, 106
  management
    characteristics, 164-165
    drawbacks, 164
    limiting, 165
    manager-agent model, 104-105
    optimizing, 165-166
  monitoring tools, 104-105
  RMON, 106
  SNMP, 106
  standards, 107
  Web-based alternatives, 107
aggregating information (monitoring by footprint), 159
Amdahl Corporation, 213
APIs (application programming interfaces)
  ARM
    history, 114
    procedure calls, 83
    Software Developer Kit, 84
  measuring end-to-end response times, 162-163
Appliant Incorporated, 213
Application Management MIB, 82-83
Application Response Measurement. See ARM
application service providers. See ASPs
application transactions, 26-27
applications
  business, increased importance, 18
  monitoring, 112-114
applications domain (networks), monitoring, 112-114
approving Service Level Agreements, 75, 190
Appvisor Application Management, 213
Aprisma Management Technology, 214
ARM (Application Response Measurement)
  history, 114
  procedure calls, 83
  Software Developer Kit, 84
ASPs (application service providers), 170. See also service providers
  defined, 171
  service challenges, 174
assigning cost
  assigning IT costs to lines of business, 35-37
  business transaction volume allocation, 37
  complexity, 36
  reports, 45-46
  service subscription allocation, 37
  usage cost allocation, 36
AT&T Service Level Agreements, 96
Attention Software Incorporated, 214
Attention!, 214
attorneys (Service Level Agreements), 57
availability
  commercial SLM products, 19
  determining baselines, 143
  service level objectives, 62
  services, 22, 154
    availability schedule (Service Level Agreements), 191
    change request process (Service Level Agreements), 192
    component availability, 23
    end-to-end SLM, 22
    importance, 23
    maintenance, 32
    new user request process (Service Level Agreements), 192
    nonemergency enhancements guidelines (Service Level Agreements), 192
    reporting, 43
    scheduled events (Service Level Agreements), 192
    service availability, 23
    user-relevant availability, 22
Avesta Technologies Incorporated, 215
Axios Products Incorporated, 215

B
balancing workloads, 130
baseline performance, 143-144
batch jobs
  accuracy issues, 32
  concurrency, 27-28
  dependencies, 28
  processing, measuring service quality, 25
benefits (SLM value), 15-18, 203
  billing time-saving strategies, 209
  competitive sales advantage, 209
  cost control, 17-18
  customer credibility, 208
  documentation, 18
  IT profile improvement, 17
  operations time-saving strategies, 208
  reporting time-saving strategies, 208
  resource regulation authority, 17
  SLA numbers credibility, 209
  user expectation management, 16
  user satisfaction gains, 16
  value areas, 204, 207
billing time-saving strategies, 209
Bluebird, 232
BMC Software Incorporated, 216
breaches, security
  intrusion reports, 44
  intrusion detection, 30
  real-time alerts, 50-52
breaching Service Level Agreements, 68-71
Bridgeway Corporation
  Keystone CNM, 216
  Keystone VPNview, 216
  Web sites, 217
budgets (IT), 8
Bullsoft, 217
business applications, increased importance, 18
business process management, 186
business transactions, 26-27
business Web sites, importance, 180

C
calculating
  cost avoidance improvements, 136
  downtime cost, 134
  employee cost, 134
  lost business cost, 135
  productivity cost, 135
  Service Level Agreement penalties, 135
Candle Corporation, 217
CCTA (Central Computing and Telecommunications Agency), 78
charters (Service Level Agreement negotiating teams), 59
CIM (Common Information Model), 81
Cisco Systems Incorporated
  CiscoWorks 2000, 218
  Web site, 219
client-by-client SLM implementation, 139
clients. See also customers; users
  agents, measuring end-to-end response times, 161-162
  client/server interactions, 27
  SLM implementation
    client presentation tips, 141-142
    establishing client contacts, 141
    selecting client implementation order, 140
CMG (Computer Measurement Group), 78
commands (synthetic transactions), 164
commercial SLM products, 101-102. See also tools; utilities
  administration tools, 121
  agents, 104-105
    benefits, 106
    drawbacks, 106-107
    hardware, 106
    RMON, 106
    SNMP, 106
    standards, 107
    Web-based alternatives, 107
  application management, 185
  Appvisor Application Management, 213
  ARM, 83-84
    history, 114
    procedure calls, 83
    Software Developer Kit, 84
  Attention!, 214
  availability, 19
  benefits, 183
  Bluebird, 232
  BMC Software Incorporated, 216
  Candle Corporation, 217
  CiscoWorks 2000, 218
  Continuity, 225
  CrossKeys Resolve, 220
  CSU/DSUs, 108
  Custom Network Analysis, 230
  customizing, 185
  Do It Yourself, 229
  EcoSCOPE, 219
  EcoTOOLS, 220
  Empirical Suite, 222
  Energizer PME, 234
  EnView, 213
  evolving market, 184-185
  eWatcher, 215
  Executive Information System, 233
  FirstSense Enterprise, 223
  Foglight, 235
  Frame Relay Access Probe, 237
  Help Desk, 238
  HP OpenView ITSM Service Level Manager, 225
  Information Technology Service Management (ITSM), 103
  InfoVista Corporation, 226
  IQ series, 212
  Keystone CNM, 216
  Keystone VPNview, 216
  Luminate Software Corporation, 228
  manager-agent model, 104-105
  managers, 104-105
  monitoring tools, 102, 104-113
  NetClarity Suite, 227
  Netcool suite, 229
  NetPredict Incorporated, 230
  NetScout Systems Incorporated, 231
  NetSolve Incorporated, 231
  Netuitive Incorporated, 232
  NetVoyant, 239
  Network Health - Service Level Reports, 220
  OpenLane, 235
  OpenMaster, 217
  OpenView Network Node Manager (NNM), 103
  Optivity SLM, 233
  packet monitors, 107-108
  PacketShaper, 234
  Pegasus, 224
  PerformanceWorks software, 227
  PILOT, 215
  Preside Performance Reporting, 233
  primary data collectors, 102
  probes, 107-108
  reporting tools, 117
    customizing reporting information, 120
    historical data, 119
    information sources, 118
    real-time data, 118-119
    report presentation issues, 119-120
  ResponseCenter, 236
  RPM 3000, 221
  SAMAN, 224
  secondary data collectors, 103
  selecting implementation tools, 145-146
  Service Level Analyzer, 103
  Service Level Suite, 222
  Service Management Architecture, 226
  ServicePoint series, 211-212
  simulations, 108-109
  SLM analysis tools, 120-121
  software license savings, 207
  SOLVE Series, 237
  Spectrum SLM products, 214
  Statscout, 237
  Tivoli Service Desk, 238
  tool inventories, 146
  TREND, 221
  Trinity, 215
  Unicenter TNG products, 219
  upgrading existing tools, 146-147
  Vistaviews, 103
  Visual IP InSight, 240
  Visual Uptime, 239-240
  VitalSuite, 228
  WANsuite, 239
  WANview, 236
  Wise IP/Accelerator, 230
communication between IT and lines of business, 181
compensation (IT personnel), 183
complaints, negotiating, 150-151
Computer Associates International Incorporated, 219
Computer Measurement Group (CMG), 78
Compuware Corporation
  EcoSCOPE, 219
  EcoTOOLS, 220
  Web site, 220
Concord Communications Incorporated, 220
conferences, service improvement issues, 184
configuring networks, configuration monitoring, 115
connections (networks), monitoring, 110-111
consultants, Internet services, 171
Continuity, 225
contracts (Service Level Agreements), 13. See also services; SLM
  administration, 74
  approving, 75, 190
  AT&T, 96
  benefits, 55-56
  change request process, 192
  conventions, 190
  creating, 56-58
  documentation, 18, 61
  effectiveness, 14-15
  elements, 97-98
  exclusions, 71
  external, 56-57, 95-97
  formats, 182
  Gartner Group, 88
  Giga Group, 89
  GTE Internetworking, 96
  Hurwitz Group, 89
  incentives, 183
  in-house, 56
  internal, 56-58
  internal SLA template, 195-196
  Internet services, 172
  IT/service provider agreements, 15
  limitations, 61-62
  managing, 182-183
    managing user expectations, 55-56
    setting with lines of business, 93-95
  MCI WorldCom, 96
  meeting service level objectives, 204-205
  NaviSite, 96
  negative perception, 14
  negotiating, 60, 144-145, 181
  negotiation teams, 58-59
  new user request process, 192
  nonemergency enhancements guidelines, 192
  number credibility, 209
  optional services, 71
  parties to the agreement, 61
  penalties, 68-71, 135
  popularity, 14-15
  priorities, 93-94
  reporting
    reporting specifications, 71-72
    selecting report personnel, 73-74
  resource regulation authority, 17
  reviewing, 74, 190
  revising, 75
  scheduled events, 192
  scope, 61
  service availability schedule, 191
  service description, 191
  service level indicators, 66-67
  service level objectives
    accuracy, 63
    affordability, 65
    attainability, 63-64
    availability, 62
    controllability, 65
    criteria, 64
    measurability, 65
    mutual acceptability, 66
    number, 63
    performance, 62
    relevance, 65
    selecting, 62-66
    stretch objectives, 62
    understandability, 65
  service measures, 193-194
  Service Value Agreements (META Group), 89
  Sprint, 96
  stakeholder groups, 59
  standard setting, 53-55
  standards, 172
  statement of intent, 189
  structure, 97
  term, 61
  user abuses, 14
  user environment, 191
  UUNET Technologies, 97
conventions (Service Level Agreements), 190
cost
  customizing reporting information, 120
  IT cost pressures, 8
  network usage accounting, 116
  reports, 45-46
  service level objectives, 65
  services, 34
    allocation complexity, 36
    assigning, 35-37, 45
    business transaction volume allocation, 37
    cost avoidance improvements, 136
    cost justification worksheet, 131-133
    downtime cost, 127, 134-135
    e-commerce, 36
    employee cost, 134
    justifications, 125-126
    linking cost to value, 37-38, 126
    lost business cost, 135
    measuring, 34-36
    primary costs, 126-127
    productivity cost, 135
    Service Level Agreement penalties, 68-71, 135
    service subscription allocation, 37
    usage cost allocation, 36
  SLM cost control, 17-18
  software license savings, 207
CrossKeys Systems Incorporated
  CrossKeys Resolve, 220
  Web site, 221
CSU/DSUs, 108
Custom Network Analysis, 230
customers. See also clients; users
  access to reports, 42
  credibility perception, 208
  loyalty, 128-129
  surveys, 42, 150, 197
    future requirements section, 200
    general comment areas, 199
    IT contact frequency, 199
    optional information section, 200
    service quality ratings, 197-198
    service usage information, 199
customizing
  reporting information, cost issues, 120
  services, 185

D
daily reports, 47, 201-202
data currency/integrity, 31
databases
  monitoring, 112-114
  monitoring by footprint, 159
databases domain (networks), monitoring, 112-114
deadlines, 25
defining
  group privileged access, 30
  managed services, 180
  reporting specifications (Service Level Agreements), 71-72
  service level indicators, 66-67
  service levels, 13
  user access, 29
dependencies (batch jobs), 28
DeskTalk Systems Incorporated, 221
desktops, monitoring, 112
detecting intrusion, 30
  real-time alerts, 50-52
  reports, 44
developing (Service Level Agreements)
  elements, 97-98
  structure, 97
devices, network, 110-111
differentiated services, creating, 176
distributing reports, 148
DMTF (Distributed Management Task Force), 184
  CIM, 81
  SLA Working Group, 81-82
  Web site, 78
Do It Yourself (DIY), 229
documentation. See Service Level Agreements
domains
  applications, 112-114
  databases, 112-114
  network devices and connections, 110-111
  planning monitoring strategies, 109-110
  servers and desktops, 112
  transactions, 112-114
downtime
  cost, 127, 134-135
  planned, 50
  Service Level Agreement penalties, 135

E
Eastern Research Incorporated, 221
eBA*ServiceMonitor, 217
eBA*ServiceNetwork, 217
e-business
  expansion, 170
  SLM challenges, 173
e-commerce, 9
  expansion, 170
  IT costs, 36
  SLM challenges, 173
EcoSCOPE, 219
EcoTOOLS, 220
editing Service Level Agreements, 75
eliminating help desks, 206-207
Empirical Software Incorporated
employees
  cost, 134-135
  IT personnel
    communicating with lines of business, 181
    contact frequency, 199
    cost avoidance improvements, 136
    incentives, 183
    increasing business leadership role, 181
    operations time-saving strategies, 208
    organizing service teams, 182
    productivity, 130-131
    reporting time-saving strategies, 208
    SLM implementation issues, 139-140
    slowing staff growth, 205
end users. See users
end-to-end response times
  measuring, 153-154
  selecting measurement methods, 155-156
    APIs, 162-163
    client agents, 161-162
    generating synthetic transactions, 163-164
end-to-end SLM, measuring quality, 22-23, 182
Energizer PME, 234
EnView, 213
Envive Corporation
  Service Level Suite, 222
  Web site, 223
errors. See outages
establishing client contacts, 141
ETEWatch, 218
event-driven measurement, 166-167
eWatcher, 215
exclusions (Service Level Agreements), 71
Executive Information System, 233
executive summaries, 42
expectation creep, 55-56
external Service Level Agreements, 56-57, 95-97

F-G
failures. See outages
fault management monitoring, 115
FCAPS management model, 114
  configuration monitoring, 115
  fault management monitoring, 115
  network usage accounting, 116
  performance management, 116-117
  security management, 117
FirstSense Enterprise, 223
FirstSense Software Incorporated, 223
Foglight, 235
formalizing external Service Level Agreements, 96
formats
  reports, 46-48
  Service Level Agreements, 182
Forrester Research, 89
FRAP (Frame Relay Access Probe), 237
frequency (reports)
  daily, 47, 201-202
  monthly overviews, 48, 202
  quarterly summaries, 49, 202
  real-time reporting, 49-52
  weekly summaries, 48, 202
functions, monitoring, 114
  configuration, 115
  fault management, 115
  network usage accounting, 116
  performance management, 116-117
  security management, 117
Ganymede Software Incorporated, 224
Gartner Group (Service Level Agreements), 88
Gecko Software Limited
  SAMAN, 224
  Web site, 225
generating synthetic transactions
  benefits, 163-164
  drawbacks, 163
  ping command, 164
  traceroute command, 164
Giga Group (Service Level Agreements), 89
group privileged access definition, 30
GTE Internetworking (Service Level Agreements), 96

H-I
hardware agents, 106
help desks, 238
  eliminating, 206-207
  reducing calls, 205-206
Hewlett-Packard
  ARM, 83
  HP OpenView ITSM Service Level Manager, 225
  Information Technology Service Management (ITSM), 103
  OpenView Network Node Manager (NNM), 103
  Web site, 225
historical data, 119
HP OpenView ITSM Service Level Manager, 225
Hurwitz Group (Service Level Agreements), 89
IETF (Internet Engineering Task Force), 184
  Application Management MIB, 82-83
  Web site, 78
impact (services)
  balancing workloads, 130
  customer loyalty, 128-129
  IT personnel productivity, 130-131
  planning upgrades, 129
  revenue, 127-128
  user productivity, 129
implementing
  Internet services, 174
  SLM
    baseline sampling duration, 143-144
    client presentation tips, 141-142
    client-by-client, 139
    determining baselines, 143-144
    establishing client contacts, 141
    establishing reporting procedures, 147-148
    implementation follow-up procedures, 149-150
    IT personnel priority, 139-140
    negotiating complaints, 150-151
    ongoing open communication, 150
    ongoing service management team meetings, 150
    planning, 137-138
    rollout strategies, 138-139
    satisfaction surveys, 150
    selecting client implementation order, 140
    selecting necessary tools, 145-146
    service level negotiation, 144-145
    setting service management teams, 142
    tool inventories, 146
    upgrading existing tools, 146-147
importance. See impact
improving services
  research, 184
  strategies, 183-184
in-house Service Level Agreements, 56
incentives (IT personnel), 183
information sources, 118
information technology. See IT
Information Technology Service Management (ITSM), 103
InfoVista Corporation
  Vistaviews, 103
  Web site, 226
Intelligent Communication Software GmbH, 225
inter-server response time, 155-156
interactive responsiveness, 24
intercepting socket traffic (networks), 161
internal reports, 41-42
internal Service Level Agreements, 56-58
internal SLA template, 195-196
Internet
  ASPs, 170
    defined, 171
    service challenges, 174
  business Web sites, 180
  e-business
    expansion, 170
    SLM challenges, 173
  e-commerce, 9
    expansion, 170
    IT costs, 36
    SLM challenges, 173
  ISPs, 170
  services
    benefits, 170
    emerging service challenges, 173-174
    growing demand, 170
    implementing, 174
    outsourcing, 171
    Service Level Agreements, 172
Internet Engineering Task Force. See IETF
introducing. See implementing
intrusion
  detection, 30
  real-time alerts, 50-52
  reports, 44
inventories (SLM tools), 146
IQ series, 212
ISPs (Internet service providers), 170
IT (information technology). See also IT personnel
  cost pressures, 8
  critical business role, 9
  electronic commerce, 9
  improving profile, 17
  increased user dependence, 8-9, 18
  increased user knowledge, 8, 18
  negative SLM perception, 14
  Service Level Agreements. See Service Level Agreements
IT Infrastructure Library. See ITIL
IT Infrastructure Management Forum, 80
IT personnel. See also IT
  communicating with lines of business, 181
  contact frequency, 199
  cost avoidance improvements, 136
  incentives, 183
  increasing business leadership role, 181
  operations time-saving strategies, 208
  organizing service teams, 182
  productivity, 130-131
  reporting time-saving strategies, 208
  SLM implementation issues, 139-140
  slowing staff growth, 205
ITIL (IT Infrastructure Library), 81
  CCTA, 78
  modules, 78-80
  principles, 79-80
  support, 80
  training, 80
  Web site, 78
ITSM (Information Technology Service Management), 103
ITSMF (IT Service Management Forum), 80, 184
iView, 233

J-L
jobs, batch
  accuracy issues, 32
  concurrency, 27-28
  dependencies, 28
  measuring service quality, 25
justifying service costs, 125-126
  cost justification worksheets, 131-133
Jyra Research Incorporated
  Service Management Architecture, 226
  Web site, 227
Keystone CNM, 216
Keystone VPNview, 216
Landmark Systems Corporation
  PerformanceWorks software, 227
  Web site, 228
LANQuest, 227
lawyers (Service Level Agreements), 57
levels, service. See Service Level Agreements
limitations (Service Level Agreements), 61-62
lines of business
  communication with IT personnel, 181
  increasing leadership of IT personnel, 181
  measuring service revenue impact, 128
  reports
    relating services to customer satisfaction, 41
    relating services to performance, 40-41
  service teams, 182
  setting Service Level Agreements, 93-95
loyalty (customers), 128-129
Lucent Technologies NetCare Professional Services, 228
Luminate Software Corporation, 228
  Service Level Analyzer, 103
  Web site, 229

M
maintenance
  accuracy issues, 32
  availability issues, 32
  planned downtime, 50
management
  agents
    characteristics, 164-165
    drawbacks, 164
    limiting, 165
    manager-agent model, 104-105
    optimizing, 165-166
  monitoring tools, 104-105
  reports
    relating services to business performance, 40, 120
    reporting service difficulties, 40
Management Information Bases (MIBs), 159
managing
  Service Level Agreements, 182-183
  services
    application management, 185
    ASPs, 174
    balancing workloads, 130
    benefits, 15-18
    business process management, 186
    commercial products, 19, 183-185
    communication, 181
    cost control, 17-18
    current practices, 90-91
    current quality perception, 91
    current understanding, 87-88
    customizing solutions, 185
    defining managed services, 180
    documentation benefits, 18
    e-business, 173
    e-commerce, 173
    emerging research, 88-90
    external suppliers, 95-97
    focus areas, 90
    future goals, 179-180
    improvement strategies, 183-184
    IT profile improvement, 17
    resource regulation authority, 17
    service level reporting, 98
    setting initial goals, 91-93
    tools, 99
    user expectation management, 16
    user satisfaction gains, 16
MCI WorldCom (Service Level Agreements), 96
measuring. See also monitoring; monitoring tools
  cost, 34-36
  end-to-end response times, 153-154
    APIs, 162-163
    client agents, 161-162
    generating synthetic transactions, 163-164
    selecting measurement method, 155-156, 161
  event-driven measurement, 166-167
  inter-server response time, 155-156
  management agents, 164
    characteristics, 164-165
    drawbacks, 164
    limiting, 165
    manager-agent model, 104-105
    optimizing, 165-166
  network traffic
    drawbacks, 161
    intercepting socket traffic, 161
    wire sniffing, 160
  quality, 94
    customer surveys, 42, 197-200
    end-to-end SLM, 22-23, 182
    internal reports, 41
  sampling-based measurement
    benefits, 166
    drawbacks, 167
    sampling frequency, 167
  service level indicators, 66-67
  service measures (Service Level Agreements), 193-194
  services
    baseline sampling duration, 143-144
    customer loyalty impact, 128-129
    determining baselines, 143-144
    frames of reference, creating, 176
    individual components method, 9-11
    IT/user perspective gap, 11
    lines of business input, 128
    revenue impact, 127-128
    service characteristics measured, 153-155
    technical information, 11
    technical limitations, 11-12
    technological advancements, 19
    user productivity impact, 129
  value
    benefits, 203-204, 207-209
    ROI, 203-207, 210
  workload levels
    batch job concurrency, 27-28
    batch job dependencies, 28
    client/server interactions, 27
    reports, 44
    transaction rates, 26-27
meeting hardware, IWI NutScout. Systems Incorporntecl,
(It-Alines, 25 RAP )N, ilia 231 criteria, 64
service level objectives, 204-205 SNMI', 1(16 NetSolve Incorporated, 231-232 measurability, 65
meetings, negotiation teams standards, 107 Netuitive Incorporated, 232 mutual acceptability, 66
(Service Level Agreements), 60 Web-based alternatives, '107 NetVoyant, 239 number, 63
META Group ARM, 114 network devices and connections performance, 62
Service Value Agreements, 89 CSU/DSUs, 108 domain (networks), monitoring, relevance, 65
SLM locus areas, 90 manager-agent model, 104-105 110-111 selecting, 62-66
metrics, measuring service levels, managers, 104-105 Network Health—Service Level stretch objectives, 62
153-155 packet monitors, 107-108 Reports, 220 understandability, 65
MIBs (Management Information performance management, 116-117 network packet decoding, 160 OpenLane, 235
Bases), 159 primary data collectors, 102 networks OpenMaster, 217
Micromuse Incorporated, 229 probes, 107-108 AT&T, 96 OpenView Network Node
modules (ITIL), 78-80 secondary data collectors, 103 configuration, monitoring, 115 Manager (NNM), 103
monitoring. See also measuring; simulations, 108-109 domains operating systems. See OSs
monitoring tools monthly overviews (reports), 48, applications, 112-114 Opticom Incorporated
functions, 114 202 databases, 112-114
Executive Information System, 233
configuration, 115 network devices and connections, Web site, 234
fault management, 115 110-111
optimizing management agents,
network usage accounting, 116
N 165-166
planning monitoring strategies,
performance management, 116-117 N*Manage Company 109-110
OptiSystems Incorporated, 234
security management, 117 Bluebird, 232 servers and desktops, 112 Optivity SLM, 233
monitoring by footprint, 156 Web site, 233 transactions, 112-114 organizing IT personnel, 182
aggregating information, 159 NaviSite (Service Level external service management, 95-97 OSs (operating systems)
databases, 159 Agreements), 96 GTE Internetworking, 96 UNIX
drawbacks, 156 negotiating MCI WorldCom, 96 accounting utility, 158
measurement factors, 157 Service Level Agreements, 144-145, monitoring by footprint, 157-158
MIBs, 159
MIBs, 159 181 ps utility, 158
NaviSite, 96
networks, 159 documentation, 61 sar utility, 158
SNMP, 159
SNMP, 159 goals, 60 Sprint, 96 Windows NT/2000
UNIX systems, 157-158 negotiation teams, 58-59 monitoring by footprint, 158-159
traffic
Windows NT/2000 systems, nonperformance consequences, 68-71 intercepting socket traffic, 161 Perfmon utility, 159
158-159 preparation, 60 measurement drawbacks, 161 Process Explode utility, 159
network domains scheduling negotiation meetings, 60 Quick Slice utility, 159
wire sniffing, 160
applications, 112-114 stakeholder groups, 59 Taskmanager utility, 159
usage accounting, 116
databases, 112-114 user complaints, 150-151 UUNET Technologies, 97 outages. See also recovery
network devices and connections, NetClarity Suite, 227 Nortel Networks Corporation, 233 downtime cost, 127, 134
110-111 Netcool suite, 229 lost business cost, 135
planning strategies, 109-110
servers and desktops, 112
NetOps Corporation
Custom Network Analysis, 230
O outage alerts, 49
planned downtime, 50
transactions, 112-114 Do It Yourself, 229 objectives, service level Service Level Agreement penalties,
SLM analysis tools, 120-121 Web site, 230 accuracy, 63 135
monitoring tools, 102, 104-113. See NetPredict Incorporated, 230 affordability, 65 types, 32-33
also measuring; monitoring Netreality attainability, 63-64 outsourcing Internet services, 171
agents, 104-105 Web site, 231 availability, 62
benefits, 106 Wise IP/Accelerator, 230
drawbacks, 106-107
P determining baselines, 143-144 R
establishing client contacts, 141
Q
packet monitors, 107-108 real-time data, 118-119
establishing reporting procedures, qualifying Service Level
Packeteer real-time reporting
147-148 Agreements, 61-62
PacketShaper, 234 outage alerts, 49
Web site, 235 implementation follow-up procedures, quality
customer surveys, 42, 197 performance alerts, 50
packets (network), 160 149-150
future requirements section, 200 planned downtime, 50
PacketShaper, 234 IT personnel priority, 139-140
general comment areas, 199 security alerts, 50-52
Paradyne Corporation, 235 negotiating complaints, 150-151
IT contact frequency, 199 recovery (services), 32, 154. See also
Pegasus, 224 ongoing open communication, 150
optional information section, 200 outages
penalties (Service Level ongoing service management team
service quality ratings, 197-198 recovery reports, 45
Agreements), 68-71, 135 meetings, 150
service usage information, 199 recovery time necessary, 34
percent conventions (Service Level rollout strategies, 138-139
satisfaction surveys, 150 services stages, 33
Agreements), 190 time-specific recovery, 33
selecting client implementation order, accuracy, 31-32
Perfmon utility (Windows reducing help desk calls, 205-206
140 availability issues, 22-23, 32, 154
NT/2000), 159 redundant resources, 130
selecting necessary tools, 145-146 batch job processing, 25
performance reliability (services), 154
service level negotiation, 144-145 client/server interactions, 27
baseline, 143-144 report card format (reports), 46-48
setting service management teams, cost, 34-38, 126
frames of reference, creating, 176 reporting. See also reports
142 deadlines, 25
service level objectives, 62 establishing reporting procedures
tool inventories, 146 end-to-end SLM, 22-23, 182
services, 24-25, 154 report distribution, 148
upgrading existing tools, 146-147 improvement strategies, 183-184
batch job processing, 25 setting reporting schedule, 148
PMN (Performance Monitoring individual components measurement
deadlines, 25 source reliability issues, 147
Network), 218 method, 9-11
importance, 24 tools, 117
Preside Performance Reporting, interactive responsiveness, 24
interactive responsiveness, 24 customizing reporting information,
233 internal reports, 41
performance alerts, 50 120
primary data collectors, 102 measuring, 94
performance reporting, 43-44 historical data, 119
priorities (Service Level perception, 91
setting initial goals, 92 information sources, 118
Agreements), setting, 93-94 performance issues, 24-25, 154
user perception, 25 real-time data, 118-119
privacy (security issues), 30-31 recoverability, 154
SLM analysis tools, 120-121 report presentation issues, 119-120
proactive reporting, 39, 42, 98 recoverability strategies, 32-34
performance management tools, reports. See also reporting
116-117 probes, 107-108 reliability, 154
reporting technical information, 11 allowing outside customers report
problems, tracking, 92
Performance Monitoring Network access, 42
(PMN), 218 procedure calls (ARM), 83 security, 28-31
Process Explode utility (Windows technical limitations, 11-12 cost, 45-46
PerformanceWorks software, 227 daily, 47, 201-202
NT/2000), 159 transaction rates, 26-27
PILOT, 215 executive summaries, 42
processing batch jobs, 25 workload levels, 26-28
ping command, 164 internal, 41-42
productivity quarterly summaries (reports), 49,
planned downtime, 50 lines of business
planning cost, 135 202
Quest Software, Incorporated relating services to customer
network domain monitoring cost avoidance improvements, 136
IT personnel, 130-131 Foglight, 235 satisfaction, 41
strategies, 109-110 relating services to performance,
users, 129 Web site, 236
service upgrades, 129 40-41
products, SLM. See commercial Quick Eagle Networks, 236
SLM implementation, 137-138 management
SLM products Quick Slice utility (Windows
baseline sampling duration, 143-144 relating services to business perfor-
protocols, SNMP, 159 NT/2000), 159
client presentation tips, 141-142 mance, 40, 120
client-by-client, 139
ps utility (UNIX), 158
reporting service difficulties, 40
monthly overviews, 48, 202 RMON agents, 106 selecting AT&T, 96
performance reporting, 43-44 ROI (Return on Investment), 203 client SLM implementation order, benefits, 55-56
proactive reporting, 39, 42, 98 ROI analysis, 210
140 change request process, 192
quarterly summaries, 49, 202 value areas end-to-end response time measure- conventions, 190
real-time reporting
eliminating help desk, 206-207 ment methods, 155-156 creating, 56, 58
outage alerts, 49 meeting service level objectives, documentation, 18, 61
performance alerts, 50 APIs, 162-163
204-205 client agents, 161-162 effectiveness, 14-15
planned downtime, 50 reducing help desk calls, 205-206 elements, 97-98
generating synthetic transactions,
security alerts, 50-52 slowing staff growth, 205 163-164 exclusions, 71
report card format, 46-48 software license savings, 207 negotiation teams external, 56-57, 95-97
reporting technology advancements, Router/Performance Monitor charters, 59 formats, 182
19 3000, 221 equal representation, 59 Gartner Group, 88
reporting time-saving strategies, 208 RPM 3000, 221 Service Level Agreements, 58-59 Giga Group, 89
security intrusions, 44 RTN (Response Time Network), stakeholder groups, 59 GTE Internetworking, 96
selecting report personnel, 73-74 218 nonperformance penalties (Service Hurwitz Group, 89
service availability reporting, 43
Level Agreements), 68-71, 135 in-house, 56
Service Level Agreement
S report personnel, 73-74 incentives, 183
specifications, 71-72
service level objectives internal, 56-58
service level reporting, 98 SAMAN (Service Level Agreement internal SLA template, 195-196
technical information, 11
accuracy, 63
Manager), 224 Internet services, 172
technical reporting limitations,
affordability, 65
sampling-based measurement IT/service provider agreements, 15
11-12 attainability, 63-64
benefits, 166 limitations, 61-62
weekly summaries, 48, 202
availability, 62
drawbacks, 167 managing, 182-183
workload levels, 44
controllability, 65
sampling frequency, 167 managing user expectations, 55-56
requests, user, 24 criteria, 64
sar utility (UNIX), 158 MCI WorldCom, 96
resources, redundancy, 130
measurability, 65
satisfaction surveys, 150 meeting service level objectives,
mutual acceptability, 66
Response Networks Incorporated, scheduling
number, 63 204-205
236 reports, 148
performance, 62 NaviSite, 96
Response Time Network (RTN), Service Level Agreement
relevance, 65 negative perception, 14
218 negotiations, 60
stretch objectives, 62 negotiating, 60, 144-145, 181
response times scope (Service Level Agreements),
understandability, 65 negotiation teams, 58-59
end-to-end, 153 61
servers new user request process, 192
measuring, 154 secondary data collectors, 103
client/server interactions, 27 nonemergency enhancements
selecting measurement method, security. See also access
guidelines, 192
155-156, 161-164 monitoring, 112
intrusion reports, 44 number credibility, 209
servers and desktops domain (net-
inter-server, 155-156 real-time alerts, 50-52
works), monitoring, 112 optional services, 71
ResponseCenter, 236 security management, 117
service description (Service Level parties to the agreement, 61
responsiveness, interactive, 24 services, 28
Agreements), 191 penalties, 68-71, 135
Return on Investment. See ROI access control definition, 29
Service Level Agreement Manager popularity, 14-15
revenues, service impact business ownership, 31
(SAMAN), 224 priorities, 93-94
lines of business input, 128 group privileged access definition, 30
Service Level Agreement Working reporting
measuring, 127-128 intrusion detection, 30
Group (SLA Working Group), reporting specifications, 71-72
reviewing Service Level multiple security systems, 29
81-82 selecting report personnel, 73-74
Agreements, 74, 190 privacy issues, 30-31
Service Level Agreements (SLAs), resource regulation authority, 17
revising Service Level Agreements, resource definition, 29
13. See also services; SLM reviewing, 74, 190
75
administration, 74 revising, 75
approving, 75, 190
scheduled events, 192 mutual acceptability, 66 business process management, 186
scheduled events (Service Level
scope, 61 number, 63 commercial products, 19, 183-185
Agreements), 192
service availability schedule, 191 performance, 62 communication, 181
reporting, 43
service description, 191 relevance, 65 cost control, 17-18
service availability, 23
service level indicators, 66-67 selecting, 62-66 current practices, 90-91
user-relevant availability, 22
service level objectives stretch objectives, 62 current quality perception, 91
business applications, 18
accuracy, 63 understandability, 65 current understanding, 87-88
cost, 34
affordability, 65 Service Level Suite (SLS), 222 customizing solutions, 185
allocation complexity, 36
attainability, 63-64 Service Management Architecture defining managed services, 180
assigning, 35-37, 45
availability, 62 (SMA), 226 documentation benefits, 18
business transaction volume
controllability, 65 service management teams, 142 e-business, 173
allocation, 37
criteria, 64 service providers. See also ASPs e-commerce, 173
cost avoidance improvements, 136
measurability, 65 implementing Internet services, 174 emerging research, 88-90
cost justification worksheet, 131-133
mutual acceptability, 66 user/service provider relationship external suppliers, 95-97
downtime cost, 134
number, 63 service provider dominance, 174-175 focus areas, 90
ecommerce, 36
performance, 62 service provider preparation future goals, 179-180
employee cost, 134
relevance, 65 suggestions, 176-177 IT personnel productivity, 130-131
justifications, 125-126
selecting, 62-66 service provider vulnerabilities, 176 IT profile improvement, 17
linking cost to value, 37-38, 126
stretch objectives, 62 user preparation suggestions, resource regulation authority, 17
lost business cost, 135
understandability, 65 174-175 service level reporting, 98
measuring, 34-36
service measures, 193-194 Service Value Agreements (SVAs), tools, 99
primary costs, 126-127
Service Value Agreements (META 89 user expectation management, 16
productivity cost, 135
Group), 89 ServicePoint series, 211-212 user satisfaction gains, 16
Service Level Agreement penalties,
setting with lines of business, 93-95 services. See also Service Level measuring
135
Sprint, 96 Agreements; SLM baseline sampling duration, 143-144
service subscription allocation, 37
stakeholder groups, 59 accuracy determining baselines, 143-144
usage cost allocation, 36
standard setting, 53-55 batch job control, 32 end-to-end response time, 153-154
customer loyalty impact, 128-129
standards, 172 data currency, 31 end-to-end response time measure-
differentiated, creating, 176
statement of intent, 189 data integrity, 31 improvement strategies, 183-184 ment methods, 155-156, 161-164
structure, 97 maintenance issues, 32 event-driven measurement, 166-167
Internet
term, 61 adding, 92 frames of reference, creating, 176
ASPs, 170-171, 174
user abuses, 14 availability, 22, 154 individual components method, 9-11
benefits, 170
user environment, 191 availability schedule (Service Level inter-server response time, 155-156
ebusiness, 170, 173
UUNET Technologies, 97 Agreements), 191 ecommerce, 9, 36, 170, 173 IT/user perspective gap, 11
Service Level Analyzer, 103 change request process (Service Level emerging service challenges, 173-174 management agents, 164-166
service level indicators, 66-67 Agreements), 192 growing demand, 170 sampling-based measurement,
service level management. See SLM component availability, 23 166-167
implementing, 174
service level objectives end-to-end SLM, 22 service characteristics measured,
ISPs, 170
accuracy, 63 importance, 23 153-155
outsourcing, 171
affordability, 65 maintenance, 32 technical information, 11
Service Level Agreements, 172
attainability, 63-64 new user request process (Service technical limitations, 11-12
managing
availability, 62 Level Agreements), 192 technological advancements, 19
application management, 185
controllability, 65 nonemergency enhancements guide- user productivity impact, 129
ASPs, 174
criteria, 64 lines (Service Level Agreements), monitoring by footprint, 156
balancing workloads, 130
measurability, 65 192 aggregating information, 159
benefits, 15-18
databases, 159
drawbacks, 156 Compuware Corporation, 2211 Response Networks Incorporated,
service provider vulnerabilities, 176
measurement factors, 157 Concord Communications 236
user preparation suggestions,
MIBs, 159 174-175 Incorporated, 220 Statscout, 237
networks, 159 value CrossKeys Systems Incorporated, Sterling Software, 237
SNMP, 159 221 Sync Research Incorporated, 238
benefits, 203-204, 207-209
UNIX systems, 157-158 DeskTalk Systems Incorporated, 221 Tivoli Systems Incorporated, 238
linking value to cost, 37-38, 126
Windows NT/2000 systems, DMTF, 78, 82 Vantive Corporation, 239
ROI, 203-207, 210
158-159 Eastern Research Incorporated, 222 Verilink Corporation, 239
workload levels
performance, 24-25, 154 Empirical Software Incorporated, Visual Networks Incorporated, 240
batch job concurrency, 27-28
batch job processing, 25 222 SLA Working Group (Service Level
batch job dependencies, 28
deadlines, 25 Envive Corporation, 223 Agreement Working Group),
client/server interactions, 27
importance, 24 reports, 44 FirstSense Software Incorporated, 81-82
interactive responsiveness, 24 transaction rates, 26-27 223 SLAs. See Service Level
performance alerts, 50 setting Ganymede Software Incorporated, Agreements
performance reporting, 43-44 224 SLM (service level management),
initial service goals, 91-93
user perception, 25 Service Level Agreements Gecko Software Limited, 225 21. See also Service Level
problems, tracking, 92 Hewlett-Packard, 225 Agreements; services
lines of business, 93-95
quality. See quality IETF, 78 ASP service challenges, 174
priorities, 93-94
recovery, 32, 154 InfoVista Corporation, 226 benefits (SLM value), 15-18, 203
standards, 53-54
recovery reports, 45 Intelligent Communication Software billing time-saving strategies, 209
signing Service Level Agreements,
recovery time necessary, 34 75 GmbH, 225 competitive sales advantage, 209
stages, 33 ITIL, 78 cost control, 17-18
Simple Network Management
time-specific recovery, 33 Jyra Research Incorporated, 227 customer credibility, 208
Protocol (SNMP), 159
redundant resources, 130 Landmark Systems Corporation, 228 documentation, 18
simulation software, 108-109
reliability, 154 sites LANQuest, 227 IT profile improvement, 17
revenue impact Lucent Technologies NetCare operations time-saving strategies, 208
ADC Telecommunications
lines of business input, 128 Professional Services, 228 reporting time-saving strategies, 208
Incorporated, 212
measuring, 127-128 Luminate Software Corporation, resource regulation authority, 17
Adtran Incorporated, 212
security, 28 229 SLA numbers credibility, 209
Amdahl Corporation, 213
access control definition, 29 Micromuse Incorporated, 229 user expectation management, 16
Appliant Incorporated, 213
business ownership, 31 N*Manage Company, 233 user satisfaction gains, 16
Aprisma Management Technology,
group privileged access definition, 30 214 NetOps Corporation, 230 value areas, 204, 207
intrusion detection, 30 NetPredict Incorporated, 230 business process management, 186
Attention Software Incorporated,
multiple security systems, 29 Netreality, 231 commercial products, 101-102
214
privacy issues, 30-31 NetScout Systems Incorporated, 231 administration tools, 121
Avesta Technologies Incorporated,
resource definition, 29 215 NetSolve Incorporated, 232 agents, 104-107
service levels, defining, 13 Netuitive Incorporated, 232 application management, 185
Axios Products Incorporated, 215
service teams, 182 Nortel Networks Corporation, 233 Appvisor Application Management,
BMC Software Incorporated, 216
setting initial goals, 91-93 Opticom Incorporated, 234 213
Bridgeway Corporation, 217
upgrades, planning, 129 Bullsoft, 217 OptiSystems Incorporated, 234 ARM, 114
user/service provider relationship Packeteer, 235 availability, 19
business, importance, 180
service provider dominance, 174-175 Paradyne Corporation, 235 Attention!, 214
Cisco Systems Incorporated, 219
service provider preparation CMG, 78 Quest Software, Incorporated, 236 benefits, 183
suggestions, 176-177 Quick Eagle Networks, 236 Bluebird, 232
Computer Associates International
BMC Software Incorporated, 216
Incorporated, 219
Candle Corporation, 217 Optivity SLM, 233 service level negotiation, 144-145
current understanding, 87-88
Cisco Works 2000, 218 packet monitors, 107-108 setting service management teams,
defined, 13
Continuity, 225 PacketShaper, 234 142
defining managed services, 180
CrossKeys Resolve, 220 Pegasus, 224 tool inventories, 146
differentiated services, creating, 176
CSU/DSUs, 108 PerformanceWorks software, 227 upgrading existing tools, 146-147
e-business challenges, 173
Custom Network Analysis, 230 PILOT, 215 increased demand, 18-19
e-commerce challenges, 173
customizing, 185 META Group, 89
Preside Performance Reporting, 233 effectiveness, 14-15
Do It Yourself, 229 primary data collectors, 102 negative perception, 14
emerging research, 88-90
EcoSCOPE, 219 probes, 107-108 popularity, 14-15
emerging service challenges,
EcoTOOLS, 220 reporting tools, 117-120 service level reporting, 98
173-174
Empirical Suite, 222 ResponseCenter, 236 setting initial goals, 91-93
end-to-end, measuring quality,
Energizer PME, 234 RPM 3000, 221 standards, 77
22-23, 182
EnView, 213 SAMAN, 224 agents, 107
external service management, 95-97
evolving market, 184-185 secondary data collectors, 103 Application Management MIB,
FCAPS management model
eWatcher, 215 82-83
selecting implementation tools, configuration monitoring, 115
Executive Information System, 233 145-146 fault management monitoring, 115 ARM, 83-84
FirstSense Enterprise, 223 Service Level Analyzer, 103 CIM, 81
network usage accounting, 116
Foglight, 235 Service Level Suite, 222 evolving standard initiatives, 184
performance management, 116-117
Frame Relay Access Probe, 237 Service Management Architecture, ITIL, 78-81
security management, 117
Help Desk, 238 226 Service Level Agreements, 53-54,
Forrester Research, 89
HP OpenView ITSM Service Level ServicePoint series, 211-212 frames of reference, creating, 176 172
Manager, 225 simulations, 108-109 SLA Working Group, 81-82
future goals, 179-180
Information Technology Service SLM analysis tools, 120-121 tools, 99
Gartner Group, 88
Management (ITSM), 103 software license savings, 207 user abuses, 14
Hurwitz Group, 89
InfoVista Corporation, 226 SOLVE Series, 237 user/service provider relationship
implementation
IQ series, 212 service provider dominance, 174-175
Spectrum SLM products, 214 baseline sampling duration, 143-144
Keystone CNM, 216 Statscout, 237 service provider preparation
client presentation tips, 141-142
Keystone VPNview, 216 Tivoli Service Desk, 238 suggestions, 176-177
client-by-client, 139
Luminate Software Corporation, tool inventories, 146 service provider vulnerabilities, 176
determining baselines, 143-144
228 TREND, 221 user preparation suggestions,
establishing client contacts, 141
manager-agent model, 104-105 Trinity, 215 174-175
establishing reporting procedures,
managers, 104-105 slowing staff growth, 205
Unicenter TNG products, 219 147-148
monitoring tools, 102, 104-113 upgrading existing tools, 146-147 implementation follow-up procedures, SLS (Service Level Suite), 222
NetClarity Suite, 227 Vistaviews, 103 SMA (Service Management
149-150
Netcool suite, 229 Visual IP InSight, 240 Architecture), 226
IT personnel priority, 139-140
NetPredict Incorporated, 230 Visual Uptime, 239-240 SNMP (Simple Network
negotiating complaints, 150-151
NetScout Systems Incorporated, 231 VitalSuite, 228 ongoing open communication, 150 Management Protocol), 159
NetSolve Incorporated, 231 WANsuite, 239 SNMP agents, 106
ongoing service management team
Netuitive Incorporated, 232 WANview, 236 software license savings, 207
meetings, 150
NetVoyant, 239 Wise IP/Accelerator, 230 SOLVE Series, 237
planning, 137-138
Network Health—Service Level communication between IT sources, information, 118
rollout strategies, 138-139
Reports, 220 Spectrum SLM products, 214
personnel and lines of business, satisfaction surveys, 150
OpenLane, 235 181 Sprint (Service Level Agreements),
selecting client implementation order,
OpenMaster, 217 current practices, 90-91 96
140
OpenView Network Node Manager current quality perception, 91 stakeholder groups, 59
selecting necessary tools, 145-146
(NNM), 103
standards (SLM), 77 MCI WorldCom, 96 utilities. See also commercial SLM
generating, 163-164
agents, 107 Sprint, 96 products; tools
ping command, 164
Application Management MIB, UUNET Technologies, 97 accounting (UNIX), 158
traceroute command, 164
82-83 Perfmon (Windows NT/2000), 159
terms (Service Level Agreements), transaction rates, 26-27
ARM, 83-84 61 Process Explode (Windows
transactions domain (networks),
CIM, 81 NT/2000), 159
time conventions (Service Level monitoring, 112-114
evolving standard initiatives, 184 Agreements), 190 ps (UNIX), 158
TREND, 221
ITIL, 78-81 Tivoli Systems Incorporated Quick Slice (Windows NT/2000),
Trinity, 215
Service Level Agreements, 53-54, ARM, 83 159
172 sar (UNIX), 158
Tivoli Service Desk, 238
SLA Working Group, 81-82 U-V Taskmanager (Windows NT/2000),
Web site, 238
statement of intent (Service Level tools, 99. See also commercial SLM Unicenter TNG products, 219 159
Agreements), 189 products; utilities UNIX UUNET Technologies (Service
Statscout, 237 administration, 121 accounting utility, 158 Level Agreements), 97
Sterling Software, 237 monitoring, 102, 104-113 monitoring by footprint, 157-158
stretch objectives, 62 agents, 104-107 value (services)
ps utility, 158
surveys, customer, 42, 150, 197 ARM, 114 sar utility, 158 benefits, 203-204, 207-209
future requirements section, 200 CSU/DSUs, 108 upgrades (services), 129 linking value to cost, 37-38, 126
general comment areas, 199 manager-agent model, 104-105 user environment (Service Level ROI, 203-207, 210
IT contact frequency, 199 managers, 104-105 Agreements), 191 Vantive Corporation
optional information section, 200 packet monitors, 107-108 users. See also clients; customers Help Desk, 238
service quality ratings, 197-198 performance management, 116-117 access Web site, 239
service usage information, 199 primary data collectors, 102 defining, 29 Verilink Corporation, 239
SVAs (Service Value Agreements), probes, 107-108 group privileged access, 30 viruses, 50-52
89 Vistaviews, 103, 226
secondary data collectors, 103 complaints, negotiating, 150-151
Sync Research Incorporated simulations, 108-109 determining baselines, 143 Visual Networks Incorporated
Frame Relay Access Probe, 237 reporting, 117 expectation creep, 55-56 Visual IP InSight, 240
Web site, 238 increased IT dependence, 8-9, 18 Visual Uptime, 239-240
customizing reporting information,
synthetic transactions 120 increased IT knowledge, 8, 18 Web site, 240
benefits, 163-164 historical data, 119 new user request process (Service VitalSuite, 228
drawbacks, 163 information sources, 118 Level Agreements), 192
generating, 163-164 real-time data, 118-119 performance perception, 25
ping command, 164
W-Z
report presentation issues, 119-120 productivity, 129
traceroute command, 164 traceroute command, 164 requests, 24 WANs (wide area networks), 108
systems integrators (Internet tracking problems, 92 satisfaction surveys, 150 WANsuite, 239
services), 171 trade shows, 184 Service Level Agreement/SLM WANview, 236
traffic (network), measuring, abuses, 14 WBEM (Web Based Enterprise
T 160-161 user/service provider relationship Management) initiative, 81
transactions service provider dominance, 174-175 Web sites. See sites
Taskmanager utility (Windows weekly summaries (reports), 48,
application, 26-27 service provider preparation
NT/2000), 159 202
business, 26-27 suggestions, 176-177
teams, service, 182 Windows 2000
monitoring, 112-114 service provider vulnerabilities, 176
telecommunications monitoring by footprint, 158-159
synthetic user preparation suggestions,
AT&T, 96 Perfmon utility, 159
benefits, 163-164 174-175
external service management, 95-97 Process Explode utility, 159
drawbacks, 163
GTE Internetworking, 96
Quick Slice utility, 159
Taskmanager utility, 159
Windows NT
monitoring by footprint, 158-159
Perfmon utility, 159
Process Explode utility, 159
Quick Slice utility, 159
Taskmanager utility, 159
wire sniffing (network traffic), 160
Wise IP/Accelerator, 230
workload levels
batch job concurrency, 27-28
batch job dependencies, 28
client/server interactions, 27
reports, 44
transaction rates, 26-27
workloads, balancing, 130
worksheets, cost justification
cost avoidance improvements, 136
downtime cost, 134
employee cost, 134
example, 131-133
lost business cost, 135
productivity cost, 135
Service Level Agreement penalties,
135
Copyright ©2001 by Sams Publishing.