Foundations of Software and System Performance Engineering
André B. Bondi
All rights reserved. Printed in the United States of America. This publication is protected
by copyright, and permission must be obtained from the publisher prior to any prohib-
ited reproduction, storage in a retrieval system, or transmission in any form or by any
means, electronic, mechanical, photocopying, recording, or likewise. For information
regarding permissions, request forms and the appropriate contacts within the Pearson
Education Global Rights & Permissions Department, please visit www.pearsoned.com/
permissions/.
ISBN-13: 978-0-321-83382-2
ISBN-10: 0-321-83382-1
In memory of my father, Henry S. Bondi,
who liked eclectic solutions to problems,
and of my violin teacher, Fritz Rikko,
who taught me how to analyze and debug.
Contents
Preface
Acknowledgments
About the Author
5.5.3 Confidentiality
5.5.4 Performance Requirements and the Outsourcing of Software Development
5.5.5 Performance Requirements and the Outsourcing of Computing Services
5.6 Guidelines for Specifying Performance Requirements
5.6.1 Performance Requirements and Functional Requirements
5.6.2 Unambiguousness
5.6.3 Measurability
5.6.4 Verifiability
5.6.5 Completeness
5.6.6 Correctness
5.6.7 Mathematical Consistency
5.6.8 Testability
5.6.9 Traceability
5.6.10 Granularity and Time Scale
5.7 Summary
5.8 Exercises
6.8 Summary
6.9 Exercises
References
Index
Preface
• How can you test performance in a manner that tells you if the
system is functioning properly at all load levels and if it will
scale to the extent and in the dimensions necessary?
• What can poor performance tell you about how the system is
functioning?
• How do you architect a system to be scalable? How do you
specify the dimensions and extent of the scalability that will be
required now or in the future? What architecture and design
features undermine the scalability of a system?
• Are there common performance mistakes and misconceptions?
How do you avoid them?
• How do you incorporate performance engineering into an agile
development process?
• How do you tell the performance story to management?
Performance requirements and best practices for writing and managing them are
discussed in Chapters 5 through 7. To understand the performance that
has been attained and to verify that performance requirements have
been met, the system must be measured. Techniques for doing so are
given in Chapter 8. Performance tests should be structured to enable
the evaluation of the scalability of a system, to determine its capacity
and responsiveness, and to determine whether it is meeting through-
put and response time requirements. It is essential to test the perfor-
mance of all components of the system before they are integrated into
a whole, and then to test system performance from end to end before
the system is released. Methods for planning and executing perfor-
mance tests are discussed in Chapter 9. In Chapter 10 we discuss pro-
cedures for evaluating the performance of a system and the practice of
performance modeling with some examples. In Chapter 11 we discuss
ways of describing system scalability and examine ways in which scal-
ability is enhanced or undermined. Performance engineering pitfalls
are examined in Chapter 12, and performance engineering in an agile
context is discussed in Chapter 13. In Chapter 14 we consider ways of
communicating the performance story. Chapter 15 contains a discus-
sion of where to learn more about various aspects of performance
engineering.
This book does not contain a presentation of the elements of prob-
ability and statistics and how they are applied to performance engi-
neering. Nor does it go into detail about the mathematics underlying
some of the main tools of performance engineering, such as queueing
theory and queueing network models. There are several texts that do
this very well already. Some examples of these are mentioned in
Chapter 15, along with references on some detailed aspects of perfor-
mance engineering, such as database design. Instead, this book focuses
on various steps of the performance engineering process and the link
between these steps and those of a typical software lifecycle. For exam-
ple, the chapters on performance requirements engineering draw par-
allels with the engineering of functional requirements, and the chapter
on scalability explains how performance models can be used to evalu-
ate it and how architectural characteristics might affect it.
Audience
This book will be of interest to software and system architects, require-
ments engineers, designers and developers, performance testers, and
product managers, as well as their managers. While all stakeholders
should benefit from reading this book from cover to cover, the follow-
ing stakeholders may wish to focus on different subsets of the book to
begin with:
• Product owners and product managers who are reluctant to
make commitments to numerical descriptions of workloads
and requirements will benefit from the chapters on performance
metrics, workload characterization, and performance require-
ments engineering.
• Functional testers who are new to performance testing may
wish to read the chapters on performance metrics, performance
measurement, performance testing, basic modeling, and per-
formance requirements when planning the implementation of
performance tests and testing tools.
• Architects and developers who are new to performance engi-
neering could begin by reading the chapters on metrics, basic
performance modeling, performance requirements engineer-
ing, and scalability.
This book may be used as a text in a senior- or graduate-level course
on software performance engineering. It will give the students the
opportunity to learn that computer performance evaluation involves
integrating quantitative disciplines with many aspects of software
engineering and the software lifecycle. These include understanding
and being able to explain why performance is important to the system
being built, the commercial and engineering implications of system
performance, the architectural and software aspects of performance,
the impact of performance requirements on the success of the system,
and how the performance of the system will be tested.
Acknowledgments
Between them, Raj Varadarajan and Dr. Michael Golm read all of the
chapters of the book and made useful comments before submission to
the publisher.
Various Siemens operating units with whom I have worked on per-
formance issues kindly allowed me to use material I had prepared for
them in published work. Ruth Weitzenfeld, SC CT’s librarian, cheer-
fully obtained copies of many references. Patti Schmidt, SC CT’s in-
house counsel, arranged for permission to quote from published work
I had prepared while working at Siemens. Dr. Yoni Levi of AT&T Labs
kindly arranged for me to obtain AT&T’s permission to quote from a
paper I had written on scalability while working there. This paper
forms the basis for much of the content of Chapter 11.
I would like to thank my editors at Addison-Wesley, Bernard
Goodwin and Chris Zahn, and their assistant, Michelle Housley, for
their support in the preparation of this book. It has been a pleasure to
work with them. The copy editor, Barbara Wood, highlighted several
points that needed clarification. Finally, the perceptive comments of
the publisher’s reviewers, Nick Rozanski, Don Shafer, and Matthew
Scarpino, have done much to make this a better book.
About the Author
Chapter 1
Why Performance Engineering? Why Performance Engineers?
This chapter describes the importance of performance engineering in a
software project and explains the role of a performance engineer in
ensuring that the system has good performance upon delivery.
Overviews of different aspects of performance engineering are given.
1.1 Overview
The performance of a computer-based system is often characterized by
its ability to perform defined sets of activities at fast rates and with
quick response time. Quick response times, speed, and scalability are
highly desired attributes of any computer-based system. They are also
competitive differentiators. That is, they are attributes that distinguish
a system from other systems with like functionality and make it more
attractive to a prospective buyer or user.
total, and then being able to quantify the increased demand for
memory, processing power, I/O, and network bandwidth.
• Can the system meet customer expectations or engineering
needs if the average response time requirement is 2 seconds
rather than 1 second? If so, it might be possible to build the sys-
tem at a lower cost or with a simpler architecture. On the other
hand, the choice of a simpler architecture could adversely affect
the ability to scale up the offered load later, while still maintain-
ing the response time requirement.
• Can the system provide the required performance with a cost-
effective configuration? If it cannot, it will not fare well in the
marketplace.
Performance can have an effect on the system’s functionality, or its
perceived functionality. If the system does not respond to an action
before there is a timeout, it may be declared unresponsive or down if
timeouts occur in a sufficiently large number of consecutive attempts at
the action.
The performance measures of healthy systems tend to behave in a
predictable manner. Deviations from predictable performance are
signs of potential problems. Trends or wild oscillations in the perfor-
mance measurements may indicate that the system is unstable or that a
crash will shortly occur. For example, steadily increasing memory
occupancy indicates a leak that could bring the system down, while
oscillating CPU utilization and average response times may indicate
that the system is repeatedly entering deadlock and timing out.
technologies and the evolving set of problem domains mean that the
performance engineer should have an eclectic set of skills and analysis
methods at his or her disposal. In addition, it is useful for the perfor-
mance engineer to know how to analyze large amounts of data with
tools such as spreadsheets and scripting languages, because measure-
ment data from a wide variety of sources may be encountered.
Knowledge of statistical methods is useful for planning experiments
and for understanding the limits of inferences that can be drawn from
measurement data. Knowledge of queueing theory is useful for exam-
ining the limitations of design choices and the potential improvements
that might be gained by changing them.
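The data reduction mentioned above is often nothing more than a short script. The following minimal sketch, which is illustrative rather than anything from the book (the column heading and the sample values are assumptions), summarizes a set of response-time measurements read in CSV form.

```python
import csv
import io
import statistics

def summarize(csvfile):
    """Summarize a column of response times (in seconds) from CSV data."""
    samples = sorted(float(row["response_time_sec"]) for row in csv.DictReader(csvfile))
    n = len(samples)
    return {
        "count": n,
        "mean": statistics.mean(samples),
        "p95": samples[int(0.95 * (n - 1))],   # simple empirical 95th percentile
        "max": samples[-1],
    }

# In practice the data would come from a measurement log file;
# a small inline sample keeps the sketch self-contained.
data = io.StringIO("response_time_sec\n0.12\n0.30\n0.22\n0.18\n0.95\n")
print(summarize(data))
```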
While elementary queueing theory may be used to identify limits
on system capacity and to predict transaction loads at which response
times will suddenly increase [DenningBuzen1978], more complex
queueing theory may be required to examine the effects of service time
variability, interarrival time variability, and various scheduling rules
such as time slicing, preemptive priority, nonpreemptive priority, and
cyclic service [Kleinrock1975, Kleinrock1976].
Complicated scheduling rules, load balancing heuristics, protocols,
and other aspects of system design that are not susceptible to queueing
analysis may be examined using approximate queueing models and/
or discrete event simulations, whose outputs should be subjected to
statistical analysis [LawKelton1982].
Queueing models can also be used in sizing tools to predict system
performance and capacity under a variety of load scenarios, thus facili-
tating what-if analysis. This has been done with considerable commer-
cial success. Also, queueing theory can be used to determine the
maximum load to which a system should be subjected during perfor-
mance tests once data from initial load test runs is available.
The performance engineer should have some grasp of computer
science, software engineering, software development techniques, and
programming so that he or she can quickly recognize the root causes of
performance issues and negotiate design trade-offs between architects
and developers when proposing remedies. A knowledge of hardware
architectures, including processors, memory architectures, network
technologies, and secondary storage technologies, and the ability to
learn about new technologies as they emerge are very helpful to the
performance engineer as well.
Finally, the performance engineer will be working with a wide
variety of stakeholders. Interactions will be much more fruitful if the
performance engineer is acquainted with the requirements drafting
[Figure: performance engineering activities in the software lifecycle. Labels recovered from the original diagram: Functional Requirements (drives specification), Performance Requirements (informs specification), Performance Modeling, Architecture and Technology Choices, Design and Implementation, Functional Testing, Performance Testing, Delivery, Capacity Planning, and Performance Management and Monitoring.]
this book. In this chapter we will learn that priority scheduling does
not increase the processing capacity of a system. It can only reduce the
response times of jobs that are given higher priority than others and
hence reduce the times that these jobs hold resources. Doubling the
number of processors need not double processing capacity, because of
increased contention for the shared memory bus, the lock for the run
queue, and other system resources. In Chapter 12 we also explore pit-
falls in system measurement, performance requirements engineering,
and other performance-related topics.
The use of agile development processes in performance engineer-
ing is discussed in Chapter 13. We will explore how agile methods
might be used to develop a performance testing environment even if
agile methods have not been used in the development of the system as
a whole. We will also learn that performance engineering as part of an
agile process requires careful advance planning and the implementa-
tion of testing tools. This is because the time constraints imposed by
short sprints necessitate the ready availability of load drivers, measure-
ment tools, and data reduction tools.
In Chapter 14 we explore ways of learning, influencing, and telling
the performance story to different sets of stakeholders, including archi-
tects, product managers, business executives, and developers.
Finally, in Chapter 15 we point the reader to sources where more
can be learned about performance engineering and its evolution in
response to changing technologies.
1.10 Summary
Good performance is crucial to the success of a software system or a
system controlled by software. Poor performance can doom a system
to failure in the marketplace and, in the case of safety-related systems,
endanger life, the environment, or property. Performance engineering
practice contributes substantially to ensuring the performance of a
product and hence to the mitigation of the business risks associated
with software performance, especially when undertaken from the ear-
liest stages of the software lifecycle.
Chapter 2
Performance Metrics
In this chapter we explore the following topics:
• The essential role and desirable properties of performance metrics
• The distinction between metrics based on sample statistics and
metrics based on time averaging
• The need for performance metrics to inform us about the prob-
lem domain, and how they occur in different types of systems
• Desirable properties of performance metrics
2.1 General
It is essential to describe the performance of a system in terms that are
commonly understood and unambiguously defined. Performance
should be defined in terms that aid the understanding of the system
from both engineering and business perspectives. A great deal of time
can be wasted because of ambiguities in quantitative descriptions, or
because system performance is described in terms of quantities that
cannot be measured in the system of interest. If a quantitative descrip-
tion in a contract is ambiguous or if a quantity cannot be measured, the
contract cannot be enforced, and ill will between customer and sup-
plier may arise, leading to lost business or even to litigation.
Performance is described in terms of quantities known as metrics.
A metric is defined as a standard of measurement [Webster1988].
Both the utilizations of the individual processors (or, in the case of mul-
ticore processors, the individual cores) and the total overall utilizations
of the processors can be obtained in the Linux, Windows, and UNIX
environments.
It is also possible to specify a metric whose value might not be
obtainable, even though the formula for computing it is well known.
For example, the unbiased estimator of the variance of the response
time corresponding to the data used in equation (2.1) is given by
S^2_{N(T)-1} = \frac{\sum_{k=1}^{N(T)} R_k^2 - N(T)\,\bar{R}^2}{N(T) - 1}   (2.5)
This metric is obtainable only if N(T) ≥ 2 and the sum of squares of the
individual response times has been accumulated during the experi-
ment, or if the individual response times have been saved. The former
incurs the cost of N(T) multiplications and additions, while the latter
incurs a processing cost that is unacceptable if secondary storage (e.g.,
disk) is used, or a memory cost and processing cost that may be unac-
ceptable if main memory or flash memory is used. This example illus-
trates the point that collecting data incurs a cost that should not be so
high as to muddy the measurements by slowing down the system
under test, a phenomenon known as confounding. As we shall see in the
chapters on measurement and performance testing, load generators
should never be deployed on the system under test for this reason.
Similarly, the resource costs associated with measurements should
themselves be measured to ensure that their interference with the work
under test is minimal. Otherwise, there is a serious risk that resource
usage estimates will be spuriously increased by the costs of the meas-
urements themselves. Measurement tools must never be so intrusive as
to cause this type of problem.
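As a concrete illustration of the cheaper alternative described above, the following sketch (assumed code, not the book's) accumulates the count, the sum, and the sum of squares of response times as they are observed, so that the sample mean and the unbiased variance estimator of equation (2.5) can be computed at the end of a run without retaining the individual response times.

```python
class ResponseTimeAccumulator:
    """Accumulate the quantities needed for the mean and for equation (2.5)."""

    def __init__(self):
        self.n = 0           # N(T): completions observed so far
        self.total = 0.0     # running sum of R_k
        self.total_sq = 0.0  # running sum of R_k^2

    def record(self, r):
        self.n += 1
        self.total += r
        self.total_sq += r * r

    def mean(self):
        return self.total / self.n

    def unbiased_variance(self):
        # S^2 = (sum of R_k^2 - N * mean^2) / (N - 1), defined only for N >= 2
        if self.n < 2:
            raise ValueError("need at least two observations")
        return (self.total_sq - self.n * self.mean() ** 2) / (self.n - 1)

acc = ResponseTimeAccumulator()
for r in (0.21, 0.35, 0.17, 0.28, 0.44):   # example response times in seconds
    acc.record(r)
print(acc.mean(), acc.unbiased_variance())
```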
More than one metric may be needed to meaningfully describe the per-
formance of a system. For online transaction processing systems, such as
a brokerage system or an airline reservation system, the metrics of inter-
est are the response times and transaction rates for each type of transac-
tion, usually measured in the hour when the traffic is heaviest, sometimes
called the peak hour or the busy hour. The transaction loss rate, that is, the
fraction of submitted transactions that were not completed for whatever
reason, is another measure of performance. It should be very low. For
packet switching systems, one may require a packet loss rate of no more
than 10⁻⁶ or 10⁻⁸ to avoid retransmissions. For high-volume systems
whose transaction loss rates may be linked to loss of revenue or even loss
of life, the required transaction loss rate might be lower still.
We see immediately that one performance metric on its own is not
sufficient to tell us about the performance of a system or about the qual-
ity of service it provides. The tendency to fixate on a single number or to
focus too much on a single metric, sometimes termed mononumerosis, can
result in a poor design or purchasing decision, because the chosen metric
may not reflect critical system characteristics that are described by other
metrics. For a home office user, choosing a PC based on processor clock
speed alone may result in disappointing performance if the main mem-
ory size is inadequate or if the memory bus is too slow. If the memory is
too small, there is a substantial risk of too much paging when many
applications are open or when larger images are being downloaded, no
matter how fast the CPU. If the memory bus is too slow, memory accesses
and I/O will be too slow. The lessons we draw from this are:
1. One cannot rely on a single number to tell the whole story
about the performance of a system.
2. A description of the performance requirements of a system
requires context, such as the offered load (job arrival rate), a
description of what is to be done, and the desired response time.
3. A performance requirement based on an ill-chosen single
number is insufficiently specific to tell us how well the system
should perform.
2.4.4 Telephony
There is a long tradition of performance evaluation in the field of tele-
communications. Teletraffic engineers use the terms offered load and car-
ried load to distinguish between the rate at which calls or transactions
are offered to the network and the rate at which they are actually
accepted, or carried [Rey1983]. One would like these figures to be iden-
tical, but that is not always the case, particularly if the network is over-
loaded. In that case, the carried load may be less than the offered load.
This is illustrated in Figure 2.2. The line y = x corresponds to the entire
offered load being carried. In this illustration, the offered load could be
carried completely only at rates of 300 transactions per second or less.
The uncarried transactions are said to be lost. The loss rate is 1 – (carried
load/offered load). The carried load is sometimes referred to as the
throughput of the system. The rate at which successful completions
occur is sometimes called the goodput, especially in systems in which
retries occur such as packet networks on which TCP/IP is
transported.
Figure 2.2 Offered and carried loads (transactions per second, or TPS). The carried load follows the offered load (the line y = x) up to 300 TPS; at an offered load of 400 TPS, the carried load is only about 316 TPS.
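A small worked illustration of the loss rate defined above, using the data points plotted in Figure 2.2. The code itself is an illustrative sketch rather than anything from the text.

```python
def loss_rate(offered_tps, carried_tps):
    """Loss rate = 1 - (carried load / offered load)."""
    return 1.0 - carried_tps / offered_tps

print(loss_rate(300, 300))   # 0.0: the entire offered load is carried
print(loss_rate(400, 316))   # about 0.21: 21% of the offered transactions are lost
```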
bar after paying and (2) the number of customers actually served dur-
ing the intermission. If service expansion is desired, the bar manager
may also be interested in the number of customers who balked at queue-
ing for a long time at the bar, perhaps because they thought that they
would not have time to finish their drinks before the end of the inter-
mission. If each customer queues only once, the number of customers
requesting drinks is limited by the number of tickets that can be sold.
In some network management systems, large batches of status poll-
ing requests are issued at constant intervals, for example, every
5 minutes [Bondi1997b]. Status polls are used to determine whether a
remote node is operational. They are sometimes implemented with a
ping command. If the first polling attempt is not acknowledged, up to
N - 1 more attempts are made for each node, with the timeout interval
between unacknowledged attempts doubling from an initial value of
10 seconds. If the Nth message is unacknowledged, the target node is
declared to be down, and an alarm is generated in the form of a red icon
on the operator’s screen, a console log message, and perhaps an audi-
ble tone. If the nth message is acknowledged by the target (n ≤ N ), the
corresponding icon remains green. For N = 4 (as is standard for ICMP
pings), the time to determine that a node is down is at least
150 seconds.
In an early implementation of a network management system, no
more than three outstanding status polling messages could be unac-
knowledged at any one time. This limitation is not severe if all polled
nodes are responsive, but it causes the system to freeze for prolonged
periods if there is a fault in the unique path to three or more nodes,
such as a failed router in a tree-structured network. The rate at which
status polling messages can be transmitted is the maximum number of
outstanding polls divided by the average time to declare whether a
node is either responsive or unresponsive. If many nodes are unre-
sponsive, the maximum rate at which polling messages can be trans-
mitted will be extremely limited. To ensure timely monitoring of node
status, such freezes are to be avoided. A method patented by the author
[Bondi1998] allows an arbitrary number of status polling messages to
be unacknowledged at any instant. This allows the polling rate to be
increased markedly, while avoiding polling freezes.
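The timing claims above can be checked with a short calculation. The sketch below is hypothetical code; the parameter values (an initial 10-second timeout that doubles after each unacknowledged attempt, N = 4 attempts, and at most three outstanding polls) come from the text.

```python
def time_to_declare_down(attempts=4, initial_timeout=10.0):
    """Total waiting time before a node is declared down under doubling timeouts."""
    return sum(initial_timeout * 2 ** k for k in range(attempts))

def max_polling_rate(max_outstanding, declare_time):
    # polls per second = outstanding-poll limit / average time to resolve a poll
    return max_outstanding / declare_time

t_down = time_to_declare_down()        # 10 + 20 + 40 + 80 = 150 seconds
print(t_down)
print(max_polling_rate(3, t_down))     # 0.02 polls/sec if the polled nodes are unresponsive
```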
What metrics should be used to compare the performance of the
original polling mechanism with that of the proposed one? For each
polling request, the performance measure of interest is the time taken
from the scheduling of the batch to the time at which the first attempt
on this round is transmitted. If B nodes are being polled in a batch, one
2.8 Summary
In this chapter we have seen that performance should be quantified
unambiguously in terms of well-defined, obtainable metrics. We have
also described useful properties that these metrics should have, such as
repeatability and reliability. We have also briefly examined situations
where average values of measures are useful, and those in which it is
more meaningful to look at the total amount of time taken to complete
a specified amount of work. We shall explore these points further in the
chapters on performance requirements and performance testing.
2.9 Exercises
2.1. Why is the number of users logged into a system an unreliable
indicator of demand? Explain.
2.2. Identify the performance metrics of a refreshment service that
is open only during class breaks or breaks between sessions at
a conference.
2.3. A large corporation has introduced a web-based time sheet
entry system. The head of the accounts department has speci-
fied that the system should support 1 million entries per month.
There are 50,000 potential users of the system. Company policy
stipulates that time entries shall be made at the end of each
working day. Is the number of entries per month a metric of
interest to system developers? Why or why not? Explain.
Chapter 3
Basic Performance Analysis
This chapter contains an overview of basic performance laws and
queueing models of system performance. Among the topics dis-
cussed are
• Basic laws of system performance, including the Utilization Law
and Little’s Law
• Open and closed queueing network representations of systems,
and performance models to analyze them
• Product form queueing networks and Mean Value Analysis
• Loss systems
• The relationship between performance modeling principles
and the formulation of performance requirements
• The use of performance models to inform the interpretation of
performance measurements and performance test results
Figure 3.1 Evolution of number present with constant service and interarrival time
Figure 3.2 Evolution of number present with variable interarrival and constant service times
Figure 3.3 Evolution of a queue's length with constant interarrival and variable service times
Figure 3.4 Evolution of a queue's length with variable interarrival and service times
[Figures: a single-server queue (arrivals enter a queue, are served, and depart) and a queue whose arrivals may be served by either of two parallel servers.]
U = XS (3.1)
Changing the queueing discipline does not change the average server
utilization, because arriving jobs will always be served unless the
server is saturated. This is similar to the principle of conservation of
work in physics.
Example. We are told that three transactions per second arrive at a proces-
sor, and that the average service time is 100 msec per transaction. What is
the utilization?
Solution. The average utilization of this server is 3 × 0.1 = 0.3.
Example. The measured utilization of a device with a transaction rate of
100 jobs per second is 90%. What is the average service time?
Solution. Applying the Utilization Law, we have
S = U/X = 0.9/100 = 9 × 10⁻³ sec = 9 msec per job
Example. The average service time per transaction at a device is 10 msec.
What is the maximum transaction rate that the device should be expected
to handle?
Solution. We require that the utilization always be less than 100%. Thus,
we require that
U = XS < 1 (3.2)
We usually require that the average utilization of any device should not
exceed 80% to avoid saturation. In that case, we require that XS < 0.8, so that
X < 0.8/(0.010 sec) = 80 transactions per second.
As we shall see in Section 3.6, the reason for this requirement is that the
average response time is highly sensitive to changes in the offered load
when the utilization exceeds this level. A system should not be engi-
neered to operate at higher utilizations because users desire that
response times be stable.
When interpreting the utilizations reported by measurement tools,
it is important to consider the time scale over which their average val-
ues have been computed, and how the measurements were obtained.
The utilizations of central processing units are often obtained by meas-
uring the fraction of time that an idle loop is executing and subtracting
that value from 100%. An idle loop is a process or thread of lowest
priority that executes only when there is no other work for the proces-
sor to do. For sufficiently short measurement intervals, the resulting
n = XR (3.3)
[Figure 3.7 Queue length as a function of time over the observation interval [0, T]: the arrival instants A1 through A4 step the queue length up, the departure instants step it down, and the horizontal spans R1 through R4 are the response times of the four jobs.]
empty and the server initially idle. At arrival instant A1, the number of
jobs present increases from 0 to 1. Since the first job finds the system
empty and idle, it leaves after time R1. The second job to arrive arrives
at A2 and has response time R2. Its arrival causes the number of jobs
present to increase from 1 to 2, and job 3’s arrival causes the queue length
to increase from 2 to 3. Job 1’s departure causes the queue length to drop
by one. Now, the average queue length in the time interval [0,T] is equal
to the area under the graph divided by T. A step up on the graph corre-
sponds to the arrival of a job, while a step down corresponds to a depar-
ture. The time between a step up and a corresponding step down is the
response time of a job. Hence, the area under the graph is the sum of the
response times. The throughput is the number of jobs that both arrived
and completed during the time interval [0,T], which we denote by C.
Now, the average response time is given by
\bar{R} = \frac{1}{C} \sum_{i=1}^{C} R_i   (3.4)

and the throughput is

X = \frac{C}{T}   (3.5)
The average is the area under the queue length graph divided by the
length of the observation period. This is the sum of the response times
divided by the length of the observation period T. Hence,
\bar{n} = \frac{\sum_{i=1}^{C} R_i}{T} = \frac{C}{T} \cdot \frac{\sum_{i=1}^{C} R_i}{C} = X\bar{R}   (3.6)
as we desired to show.
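A quick numerical check of the argument just given, using assumed arrival and departure instants: the area under the queue-length graph divided by T equals the product of the throughput and the mean response time.

```python
arrivals   = [1.0, 2.0, 3.0, 5.0]   # A1..A4 (assumed example values)
departures = [4.0, 6.0, 6.5, 7.5]   # each job's departure instant
T = 8.0                             # length of the observation interval

response_times = [d - a for a, d in zip(arrivals, departures)]
area = sum(response_times)           # area under the queue-length graph
n_bar = area / T                     # mean number present
X = len(arrivals) / T                # throughput: all jobs completed in [0, T]
R_bar = sum(response_times) / len(response_times)
print(n_bar, X * R_bar)              # the two values agree, as Little's Law requires
```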
If the server is replaced by a faster one, we expect both the average
response time and the mean queue length to decrease. The reverse is
true if the server is replaced by a slower one. Intuitively, a faster server
should have both shorter response times and average queue lengths
given the same arrival rate. One can infer this linkage by an inspection
of Figure 3.7. The server utilization is the fraction of time that the server
was busy. In this example, the first arrival occurred at time A1, and the
last departure occurred at time D4. The server was idle from time 0 to
ρ = λ/μ

We also call ρ the traffic intensity. We require ρ < 1, since the server utili-
zation cannot exceed 100%. If ρ ≥ 1, the system is saturated, and a back-
log of jobs will build up over time ad infinitum because the rate of
customers entering the system exceeds the rate going out. We say that
R = \frac{\bar{n}}{\lambda} = \frac{1}{\mu - \lambda}, \qquad \lambda < \mu   (3.9)
[Figure: mean response time versus load for a single-server queue. The response time grows slowly at light loads and rises steeply as the load approaches the maximum sustainable load.]
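The following sketch evaluates equation (3.9) for an assumed service rate to show how sharply the mean response time rises as the load approaches saturation, which is the behavior the figure depicts.

```python
def mm1_response_time(lam, mu):
    """Mean response time of an M/M/1 queue, equation (3.9)."""
    if lam >= mu:
        raise ValueError("queue is saturated: require lambda < mu")
    return 1.0 / (mu - lam)

mu = 1.0                                  # assumed service rate: 1 job per second
for lam in (0.5, 0.8, 0.9, 0.95, 0.99):
    print(f"rho={lam/mu:.2f}  R={mm1_response_time(lam, mu):.1f} sec")
```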
\bar{n} = \rho + \frac{(1 + C^2)\rho^2}{2(1 - \rho)}   (3.10)
This equation shows that for a given traffic intensity ρ, the mean queue
length is least when the service time is constant, that is, when C² = 0.
When the service time is constant, we have
\bar{n} = \rho + \frac{\rho^2}{2(1 - \rho)}   (3.11)
When the service time is exponentially distributed, C² = 1 and

\bar{n} = \rho + \frac{2\rho^2}{2(1 - \rho)} = \frac{2\rho(1 - \rho) + 2\rho^2}{2(1 - \rho)} = \frac{\rho}{1 - \rho}   (3.12)
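A short sketch of equations (3.10) through (3.12), comparing the mean number in the system for constant and exponentially distributed service times at the same traffic intensity. The traffic intensities used are assumed example values.

```python
def mg1_mean_queue_length(rho, c_squared):
    """Mean number in an M/G/1 queue, equation (3.10)."""
    return rho + (1.0 + c_squared) * rho ** 2 / (2.0 * (1.0 - rho))

for rho in (0.5, 0.8, 0.9):
    n_const = mg1_mean_queue_length(rho, 0.0)   # constant service, equation (3.11)
    n_exp   = mg1_mean_queue_length(rho, 1.0)   # exponential service, equation (3.12)
    print(rho, round(n_const, 3), round(n_exp, 3))
```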
Figure 3.9 A central server model with a single CPU and three I/O devices
since
Also,
Xi = Vi X0 , i = 1, 2,..., K (3.16)

Multiplying both sides by Si gives

XiSi = X0ViSi   (3.17)

The left-hand side of this equation is the utilization of the ith device.
We define the demand on server i as
Di = ViSi (3.18)
U i = X0 Di , i = 1, 2,..., K (3.19)
This equation shows that the utilizations of all the devices in the sys-
tem rise in proportion to the global system throughput. If one plots the
utilizations of the servers as functions of the global system throughput,
they will appear as straight lines with slopes proportional to the cor-
responding demands Di. If one plots the utilizations as functions of the
global system throughput on a logarithmic scale, the resulting curves
will be constant distances apart. To see this, observe that
log Ui = log X0 + log Di
so that the difference log Ui − log Uj between any two servers equals log(Di/Dj), which does not depend on the throughput.
Figure 3.10 Example utilizations versus arrival rate, vertical scale linear
Figure 3.11 Example utilizations versus arrival rate, vertical scale logarithmic
Let b denote the index of the largest of the Dis. Since equation (3.22)
holds for all demands Di, it must hold for the largest of them. Therefore,
X0 < 1/ Db (3.23)
is not achievable unless the desired system throughput is less than the
reciprocal of the service demand made by a given task at the bottleneck
device.
If the job has the system to itself, there is no queueing delay for service,
and the time spent at the device on each visit is simply the service time.
This means that a job will take at least as long as the sum of the total
amounts of time being served at all the devices. Therefore, in general,
provided no job is receiving service from more than one server at a
time,
R_0 \ge \sum_{i=1}^{K} V_i S_i = \sum_{i=1}^{K} D_i   (3.25)
When U k < 1 ∀k , none of the servers is saturated, and the joint queue
length distribution is given by
\Pr(n_1, n_2, ..., n_K) = \prod_{k=1}^{K} (1 - U_k)\,U_k^{n_k}   (3.27)
provided that U k < 1 for k = 1, 2,..., K , that is, that none of the servers is
saturated. Notice that this expression has the same form as in equation
(3.12), as if each server were in isolation. The joint probability distribu-
tion of the queue lengths factorizes into the probability distributions of
the lengths of the individual queues. This shows that the probability
distribution at one queue is independent of those of all the others in the
network. Intuitively, this may be linked to the exponential distribution
of the service times, the Poisson nature of the arrival process from out-
side the network, and state-independent probabilistic routing of jobs
from one service center to the next.
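The factorization just described makes open product-form networks easy to evaluate numerically. The following sketch, with assumed parameter values, is one straightforward way to compute per-server utilizations, mean queue lengths, and response times, and the system response time, in the spirit of Exercise 3.2(d); it is not the book's own tool.

```python
def open_network(lam, visits, service_times):
    """Per-server metrics and system response time for an open product-form network."""
    results = []
    r0 = 0.0
    for v, s in zip(visits, service_times):
        u = lam * v * s                    # Utilization Law with X_i = V_i * lambda
        if u >= 1.0:
            raise ValueError("a server is saturated")
        n = u / (1.0 - u)                  # mean number at the server, as for M/M/1
        r = s / (1.0 - u)                  # mean time spent per visit
        r0 += v * r                        # response time accumulated over all visits
        results.append({"U": u, "n": n, "R": r})
    return results, r0

# Assumed example: a CPU and two disks
per_server, r0 = open_network(lam=2.0,
                              visits=[1.0, 0.6, 0.4],
                              service_times=[0.10, 0.20, 0.25])
print(per_server)
print("system response time:", r0)
```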
The task returns to the computer system once the user has finished
thinking and clicks once again. The average time the user spends think-
ing before relaunching the task is called the think time, usually denoted
by Z , while the time spent circulating among the devices in the com-
puter system is the global response time R0. This is illustrated in
Figure 3.12. In a pure batch processing system, there is no think time,
and Z = 0.
The computer system is modeled as a network of queues. The set of
processors and other devices that constitute the computer system is
often called the central subsystem. The think time is modeled as a queue
with infinite service (IS). Infinite service is so called because a server is
available for every job that returns to the terminal. It is sometimes
called a pure delay server because the instant availability of a server
means there is no queueing for service.
The think time, average response time, and system throughput are
related by the Response Time Law, which is a direct consequence of
Little’s Law. The Response Time Law is easily derived by observing
that the average total time between task launches is the sum of the
average response time and the think time. The number of circulating
tasks is equal to the number of terminals logged in, denoted by M. The
system throughput, X0, and the average response time depend on the
number of terminals logged in, since having a larger number logged in
Figure 3.12 Computer system with terminals, central processor, and three I/O devices
R(M) = \frac{M}{X_0(M)} - Z   (3.30)
Rearranging equation (3.29), we can see that a larger think time reduces
the system throughput, since
X_0(M) = \frac{M}{R_0(M) + Z}   (3.31)
If users spend more time thinking, they will not be sending transac-
tions to the central subsystem as frequently. Similarly, a higher through-
put is associated with a lower response time, while a lower throughput
is associated with a higher response time. As the think time is reduced
to zero, the response time of the central subsystem will increase, and
vice versa.
R_0(1) = \sum_{i=1}^{K} V_i S_i   (3.32)
provided that a job can receive service at only one server at a time, that
is, synchronously. Because this is the response time with only one job
present and no contention, the response time of the central subsystem
cannot be any better than this. When we study performance require-
ments in Chapters 5 through 7, we shall visit this notion once again.
Using Little’s Law or the Response Time Law with M = 1 and Z = 0,
the system throughput with the sole job having the central subsystem
to itself is given by
X_0(1) = 1 \bigg/ \sum_{i=1}^{K} V_i S_i   (3.33)
This shows that one must improve the bottleneck device if one wishes
to increase system capacity at heavy loads. For systems with terminals
and nonzero think times, inequality (3.34) becomes

X_0(M) \le \min\left\{ \frac{M}{R_0(1) + Z}, \; \frac{1}{V_b S_b} \right\}   (3.35)
The throughput bounds for the central subsystem (i.e., not considering
think time) resulting from these inequalities are shown in Figure 3.13.
Queueing at the bottleneck device ensues to the extent that it degrades
throughput when the number of circulating jobs in the central subsystem
is sufficiently large. It occurs when the two throughput bounds cross.
Thus, queueing ensues for the smallest population such that
\frac{N}{R_0(1)} = \frac{1}{V_b S_b}   (3.36)

N^* \ge \frac{R_0(1)}{V_b S_b} = \frac{\sum_{i=1}^{K} V_i S_i}{V_b S_b}   (3.37)
[Figure 3.13 Bounds on the system throughput X0 of the central subsystem as a function of the number of jobs N: the throughput rises with N up to the maximum attainable throughput 1/(Vb Sb), and the two bounds cross at N*.]
M^* \ge \frac{R_0(1) + Z}{V_b S_b} = \frac{\sum_{i=1}^{K} V_i S_i + Z}{V_b S_b}   (3.38)
R0 ( M) ≥ MVbSb − Z (3.39)
when the think time is nonzero. But the response time also has to be at
least as large as the time to circulate through it unimpeded. Therefore,
we have
[Figure: bounds on the central subsystem response time R0(M) as a function of the number of terminals M logged in. R0(M) is bounded below by R0 = \sum_{k=1}^{K} V_k S_k and by the asymptote R0 = M Vb Sb − Z; the two bounds cross at M*.]
time is reduced, it cannot be less than the time spent at the bottleneck
device, even if no other jobs are present. If only one job is present in a
closed system, we must have
D_b \le R_0(1) \le \sum_{i=1}^{K} D_i   (3.41)
and the system throughput with think time Z and M terminals logged
in is given by
X_0(M) = \frac{M}{R_0(M) + Z - \alpha(M)}   (3.43)
R_0 \le \sum_{i=1}^{K} V_i R_i   (3.44)
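The bottleneck analysis of this section reduces to a few arithmetic operations on the demands. The sketch below, with assumed demands and think time, computes the asymptotic throughput and response time bounds and the crossover population M* of equation (3.38); it is an illustration rather than the book's own procedure.

```python
def closed_bounds(demands, Z, M):
    """Asymptotic bounds for a single-class closed network with think time Z."""
    d_total = sum(demands)          # R_0(1): minimum possible response time
    d_max = max(demands)            # bottleneck demand, V_b * S_b
    x_bound = min(M / (d_total + Z), 1.0 / d_max)   # upper bound on throughput
    r_bound = max(d_total, M * d_max - Z)           # lower bound on response time
    m_star = (d_total + Z) / d_max                  # crossover population, equation (3.38)
    return x_bound, r_bound, m_star

demands = [0.10, 0.12, 0.20]        # assumed seconds per job at a CPU and two disks
for M in (1, 5, 10, 20):
    print(M, closed_bounds(demands, Z=1.0, M=M))
```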
Notice that unlike the case with open queueing networks, the joint queue
length distribution does not factorize into the marginal distributions at
the individual nodes. This is because the presence of n jobs at one node
implies that N − n nodes are present at all the other nodes combined.
According to the BCMP Theorem [BCMP1975], a system with open,
closed, or mixed networks and Poisson arrivals for the open networks
has product form if each server in it satisfies the following conditions:
1. The service time distribution has a rational Laplace transform.
2. The service time discipline is First Come First Served (FCFS),
Last Come First Served Preemptive Resume (LCFSPR),
Processor Sharing (PS), or Infinite Service (IS).
3. If the server is FCFS, the service time has an exponential distri-
bution with the same mean for all classes of jobs.
4. Routing is probabilistic and independent of the state of the
network.
think time Z. Suppose further that the mean service time at the ith
server is Si and that the visit ratio is Vi . Mean Value Analysis uses a
relationship between a server’s response time with N jobs present in
the system and the mean queue length at the server with one less circu-
lating job in the system. The mean queue length observed by an arriv-
ing customer is the mean queue length at the server with the arriving
customer removed. This is known as the Arrival Theorem or the
Sevcik-Mitrani Theorem [SevMit1981]. Using the Arrival Theorem, we
can write
Ri ( N ) = Si [1 + ni ( N − 1)] (3.47)
We can then obtain the mean queue length with N jobs present at each
of the K servers using Little’s Law once the throughput at each server is
known. The latter is easily obtained from the Forced Flow Law, since
Xi ( N ) = Vi X0 ( N ) (3.49)
This gives us the complete set of equations we need to solve the model
for the desired performance metrics using Mean Value Analysis. The
algorithm is depicted in Figure 3.15.
The analysis inputs and outputs would include the following:
• Input: visit ratios, think times, mean service times, number of ter-
minals logged in or multiprogramming level
• Output: utilizations, response times, throughputs, mean queue
lengths
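A compact rendering of the exact MVA recursion just described, combining equation (3.47) with the Response Time Law, the Forced Flow Law, and Little's Law, for a single job class with think time. The visit ratios, service times, and think time are assumed example values, and this sketch is one possible implementation of the kind of tool called for in Exercise 3.2, not the book's own.

```python
def mva(visits, service_times, Z, max_users):
    """Exact MVA for a single-class closed queueing network with think time Z."""
    K = len(visits)
    n = [0.0] * K                                   # mean queue lengths with 0 users
    results = []
    for N in range(1, max_users + 1):
        R = [service_times[i] * (1.0 + n[i]) for i in range(K)]    # equation (3.47)
        R0 = sum(visits[i] * R[i] for i in range(K))
        X0 = N / (Z + R0)                           # Response Time Law
        n = [X0 * visits[i] * R[i] for i in range(K)]              # Little's Law
        U = [X0 * visits[i] * service_times[i] for i in range(K)]  # Utilization Law
        results.append((N, X0, R0, U))
    return results

for N, X0, R0, U in mva(visits=[1.0, 0.6, 0.4],
                        service_times=[0.10, 0.20, 0.25],
                        Z=2.0, max_users=10):
    print(f"N={N:2d}  X0={X0:.3f}/sec  R0={R0:.3f} sec  U={[round(u, 3) for u in U]}")
```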
and so on. The different job types are represented in queueing network
models by different job classes. Here are some examples:
• A web-based online banking system has variable numbers of
users logging in concurrently, generating different types of trans-
actions, such as checking balances and initiating one-time and
repeating payments. These activities take place interactively and
may occur at the same time as background activities such as clean-
ing up databases or backing up databases and transaction histo-
ries. The online transactions enter the system and then leave once
completed. They arrive randomly. Therefore, they may be mod-
eled as a set of jobs traversing an open queueing network. The
background activities are executed by a fixed set of processes that
neither enter nor leave the system. Therefore, they may be mod-
eled as a set of jobs that circulate in a closed queueing network.
• Consider a network management system that monitors the sta-
tus of a collection of managed nodes consisting of routers, hubs,
switches, file servers, gateway servers, and all manner of hosts
performing various functions. The network management sys-
tem (NMS) is deployed on a single workstation. The NMS is
scheduled to send a status poll to each of the managed nodes
according to a defined timetable. Many of the managed nodes
are equipped with a program known as an agent. The agent can
respond to the poll and also supply information about the
node’s status. The agent can also generate messages of its own
accord if an alarm condition or other designated event occurs.
Such messages are known as traps. From the standpoint of the
NMS, the traps are random events, because they are not sched-
uled. In the NMS, the polling activity may be regarded as a set
of jobs that circulate within a closed queueing network. The
polling jobs alternate between sleeping for set periods and
awakening to initiate polls. They sleep until the polls are
acknowledged, or until a timeout occurs because the acknowl-
edgment did not arrive. The trap stream is modeled as an open
class of tasks, because the system performs a particular set of
actions (such as interrogating a database and raising an alarm)
for each trap that arrives and does not execute those actions
until the next trap arrives. Polling activity may be modeled as a
closed system, because the polling tasks never enter or leave
the system; they simply alternate between sleeping and
executing.
U_{ic} = \begin{cases} \lambda_c V_{ic} S_{ic} & \text{if } c \in W \\ X_{0c} V_{ic} S_{ic} & \text{if } c \in C \end{cases}   (3.51)
S^*_{ic} = \frac{S_{ic}}{1 - \sum_{r \in W} U_{ir}}, \quad c \in C   (3.52)
Notice that when there are no open job classes, W = ∅ and the sum in
the denominator of equation (3.52) is empty. In that case, we have
R_{ic} = \frac{S_{ic}\,[1 + \bar{n}_i(N)]}{1 - \sum_{c \in W} U_{ic}}   (3.55)
Thus, the delay of an arriving open job is caused by the delay due to the
average number of queued closed jobs plus the service of the arriving
open job, all inflated by the complement of the resource usage attribut-
able to the open jobs. The form of the response time of the open jobs is
similar to that for closed jobs, with the exception that an arriving closed
job sees the average queue length with itself removed, while the arriv-
ing open job sees the average queue length with no closed jobs removed.
The closed formula is a consequence of Jackson’s Theorem and the
Arrival Theorem, while the open formula is a consequence of the
Arrival Theorem and the theorem that Poisson arrivals see time aver-
ages (PASTA) [Wolff1982].
3.13 Finite Pool Sizes, Lost Calls, and Other Lost Work
In many types of systems, an arriving customer, task, or job will be
discarded if a server is not available or if there is no place for the cus-
tomer to wait. For example:
• In circuit-switched telephone systems, calls may be dropped or
rerouted if all of the circuits in a direct trunk group are busy. Calls
are not queued until a trunk becomes available. Calls that cannot
be routed along a particular trunk group are declared to be lost.
Teletraffic engineers often wish to size trunk groups so as to keep
the probability of a lost call below a small threshold. Sizing
depends on the anticipated call volume and call duration during
the busiest hour of the day.
• A barbershop has a fixed number of barber’s chairs and a fixed
number of chairs in the waiting area. If all the barbers are busy
and all the chairs in the waiting area are occupied, an arriving
customer will balk and go elsewhere. That customer is lost. A
sufficiently high customer loss rate may justify the addition of
more chairs in the waiting area and/or the addition of one or
more barbers. The decision to do either must be made carefully:
adding barbers may increase costs more than revenue, while
adding chairs in the waiting area may not increase costs but
could increase waiting times to the point that waiting custom-
ers will leave before being served.
• A multitiered computer system may consist of one or more web
servers in parallel, one or more application servers in parallel,
and a back-end database server. Communications between an
application server and the database server are mediated via a
pool of connections sometimes known as Java Database
Connections (JDBCs). This pool is known as the JDBC pool. It
has a configurable maximum size, M, say. If all of the M connec-
tors in a JDBC pool are occupied when a thread on the applica-
tion server needs to communicate with the database, the thread
will be queued until a JDBC becomes free. If the queueing buffer
itself overflows, the user’s transaction may be lost. Since both
delay and loss are undesirable, both the JDBC pool and the
memory allocated to waiting threads should be sized to keep
the probability of each occurrence below small thresholds.
ρ = λ H (3.56)
B(s, \rho) = \frac{\rho^s / s!}{\sum_{k=0}^{s} \rho^k / k!}   (3.57)
ρ ′ = ρ[1 − B( s , ρ )] (3.58)
B(0, ρ ) = 1 (3.59)
B(s, \rho) = \frac{\rho\, B(s-1, \rho)}{s + \rho\, B(s-1, \rho)}, \quad s = 1, 2, \ldots   (3.60)
The smallest value of the trunk group size s satisfying the loss require-
ment is found by iterating through equation (3.60) on s until B( s , ρ ) is
less than the desired value. Applying this recurrence relation is compu-
tationally cheaper than adding up partial series as in equation (3.57)
and may be numerically more stable as well, because the quantities in
the numerator and denominator of equation (3.60) are of the same
order of magnitude.
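The recurrence of equation (3.60) takes only a few lines of code. The following sketch, an illustrative implementation with an assumed offered load, computes B(s, ρ) iteratively and finds the smallest trunk group size that keeps the blocking probability below a target.

```python
def erlang_b(s, rho):
    """Erlang B blocking probability via the recurrence of equation (3.60)."""
    b = 1.0                                    # B(0, rho) = 1, equation (3.59)
    for k in range(1, s + 1):
        b = rho * b / (k + rho * b)            # B(k, rho) from B(k-1, rho)
    return b

def trunks_needed(rho, target_blocking):
    """Smallest s such that B(s, rho) is below the target blocking probability."""
    s = 0
    while erlang_b(s, rho) >= target_blocking:
        s += 1
    return s

rho = 10.0                                     # assumed offered load in erlangs (lambda * H)
print(erlang_b(12, rho))                       # blocking probability with 12 trunks
print(trunks_needed(rho, 0.01))                # smallest trunk group with blocking below 1%
```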
3.18 Summary
In this chapter we have presented the basic rules relating the perfor-
mance measures of queues occurring in computer systems to one
another, and then presented basic models of performance and their
properties. We have explored the differences between open and closed
representations of workloads and given an overview of differences
between the values of performance measures they predict. These dif-
ferences are inherent in the system structure: a closed system has a
fixed number of jobs in it, while the number of jobs in an open system
is unconstrained and potentially unbounded. Therefore, an open
3.19 Exercises
3.1. We are given the following observations for a single-server
queue. During the time interval [0, 8], 4 jobs were started and
completed. The observed service times were 2.56, 0.4, 1.5, and
1.5. The observed response times were 4, 2, 4, and 4. The first
arrival occurred at time 1, the last departure at time 7. There
were no periods of idleness between the departures of custom-
ers. Compute the following for this observation period:
(a) The average completion rate
(b) The average service time
(c) The average response time
(d) The average throughput
(e) The average utilization in two ways (Hint: No idle period
between service times in this example.)
(f) The mean queue length
3.2. An airline security checkpoint may be modeled as a system of
two queueing networks. The passengers arrive at a rack of trays
to hold items to be X-rayed, load the trays, and then queue to
walk through a metal detector while the trays go through an
X-ray machine. The network seen by the customers consists of
one or more guard stations at which identities and boarding
passes are checked, followed by a queue for trays, and another
queue to go through the metal detector. The trays are stacked
(queued) in racks, awaiting use by passengers and filled one at
a time. The trays are then queued up to go through the X-ray
machines. Once emptied by the passengers who loaded them,
the trays are stacked on rolling pallets. The pallets are rolled to
their positions at the benches before the X-ray machines to be
reused.
(a) Your output should show the global system throughput X0,
the global response time R0, and the throughputs, utiliza-
tions, mean queue lengths, and mean response times of the
individual servers. If you are using a spreadsheet tool with
plotting capabilities, plot the global system throughput, uti-
lizations of the service centers, and average response times
on separate sets of axes.
(b) Consider a closed queueing network with the parameters
depicted in the following table. Identify the bottleneck
device. Plot bounds on the system throughput when the
think time is 0 and when the average think time is 2 seconds
and 4 seconds. Plot bounds on the response time of the cen-
tral subsystem.
(c) Use your MVA tool to predict the performance of the queue-
ing network model with the same parameters with 1, 2, 3, . . .,
10 terminals logged in for think times of 0 and 4. Plot the
predicted throughputs and response times on the same axes
on which you plotted the performance bounds.
(d) Using a spreadsheet or otherwise, build a tool to predict the
performance of an open network consisting of the CPU and
two or three disks only, without thinking terminals, based
on Jackson’s Theorem.
(i) Using the global system throughputs predicted by the
closed queuing network model as inputs to the open
model, predict the utilizations, response times, and mean
queue lengths of the individual servers and the response
time of the system as a whole. Also, compute the sum of
the mean queue lengths of the individual servers.
(ii) Compare your results with those predicted by the
closed queueing network model when the think time
is zero. Are the predicted utilizations the same
Chapter 4
Workload Identification and Characterization
We describe the need to specify the functionalities of a system and to
identify the nature of the performance characteristics each functionality
must have to be effective. The notion of a reference workload will be
introduced as a vehicle for specifying a straw workload when several
workloads are possible. We shall discuss the impact of
time-varying behavior on system performance—for example, whether
the load offered to it is rhythmic and regular, whether it varies seasonally
or by time of day, whether it is growing over time, and whether it is
inherently subject to potentially disruptive bursts of activity. These work-
load characteristics must be understood for performance requirements
to be properly formulated and for the system to be architected in a cost-
effective manner to meet performance and functional needs. Numerical
examples of workloads from different application domains will be given.
Depending on the design of the system, the monitoring may take the
form of periodic polls to check status and/or the issuance of alarms
when certain operating parameters, such as temperature, are exceeded.
Excessive temperature could be a sign of bearing wear, and lack of con-
nectivity to a programmable control unit would mean that that piece of
the system could not be controlled at all. Finally, a monitoring system
is needed to detect jams and other alarm conditions, such as the pulling
of a red cord to stop the conveyor altogether. All of this monitoring
incurs network, processing, and logging costs. Performance engineers
must determine the permissible delays of these functionalities, espe-
cially the time to stop the conveyor after the cord is pulled, and ensure
that safety monitoring is not impeded by payload functionality, such as
moving luggage.
Combining this data with knowledge about the flow of information
through the control system, the topology of the conveyor system, and
the topology of the network that controls it, we can establish a baseline
workload characterization of the system that will eventually be used as
input to the preparation of performance requirements and may even
impact the system architecture. For this system, the following types of
workloads may be identified, each with its own demand characteristics
and performance requirements: luggage movement and delivery, sys-
tem monitoring, and quick response to alarm conditions such as the
pulling of a red cord to stop the system altogether. In some systems,
logging of luggage movement may also be required as a deterrent to
tampering and theft. This logging may also be useful to improve effi-
ciency, since luggage that is misrouted may be returned to the system
entry point. This is wasteful.
• The numbers of bar code queries per hour from each scanner.
• The number of diversion points.
• The lengths of the conveyor segments.
The monitoring workload is used to ensure the continuous function
and prompt repair of the various components of the conveyor system.
The workload might be described by the following quantities:
• The number of motor devices being monitored.
• The number of polls of motor status per motor device per hour.
• The size of each polling status message for each motor. The
message might include motor temperature, whether it is run-
ning, whether it is getting a clean power supply at the right
voltage and amperage, and so on.
• The number of other hardware devices being monitored, the
number of polling messages per hour associated with each one,
and the sizes of those messages.
• The number of program logic controllers and other networked
elements in the conveyor system, and the frequency and size of
each type of associated status message.
• The actions to be taken upon receipt of the response to a polling
message, or the absence of a response to a polling message for
whatever reason.
Quantity | Value
The number of login sessions initiated in the peak hour | 100
The number of login sessions ended in the peak hour | 100
The average duration of a login session | 10 minutes
Calculated average number of sessions (100/hour × 10 minutes) = (100/hour × 0.167 hours) | 16.7 sessions logged in
Maximum allowed number of login sessions | 100
If the system is already in production, the average and peak numbers of login sessions observed during the peak hour | Compare this with the calculated value
The average number of buy transactions in a session | 0.8
The average number of sell transactions in a session | 0.8
The average number of limit order transactions in a session | 0.05
The average number of balance inquiries in a session | 0.95
The average number of statement requests in a session | 0.05
Numbers of buys and sells during the peak hour | 160
Statements generated at the end of each month | 1,000,000
Logging of user interactions | 1 record per mouse click
Transaction history requests generated online per session | 0.3
The frequency with which transaction logging and audit functions are activated in the background, and the amount of work they have to do as a function of the rates at which the transactions and login sessions occur | 1/minute
Fraud monitoring activities | Unknown
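These figures can be cross-checked for mutual consistency with Little's Law:
the average number of sessions logged in equals the session arrival rate
multiplied by the average session duration. The short Python sketch below
recomputes the derived rows from the primary ones; the variable names are
ours and are not part of the table.

    # Cross-check of the tabulated workload figures using Little's Law.
    # The names and structure of this script are illustrative only.

    sessions_per_hour = 100              # login sessions initiated in the peak hour
    session_duration_hours = 10 / 60.0   # average session duration: 10 minutes

    # Little's Law: average sessions logged in = arrival rate x duration
    avg_sessions_logged_in = sessions_per_hour * session_duration_hours
    print(f"Average concurrent sessions: {avg_sessions_logged_in:.1f}")   # about 16.7

    buys_per_session = 0.8
    sells_per_session = 0.8

    # Buys and sells in the peak hour should agree with the tabulated 160.
    buys_and_sells_per_hour = sessions_per_hour * (buys_per_session + sells_per_session)
    print(f"Buys and sells in the peak hour: {buys_and_sells_per_hour:.0f}")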
The table does not tell us when or how often fraud monitoring
activities occur. Nevertheless, fraud monitoring is a critical activity
with performance requirements and demands of its own. Fraud detec-
tion logic might be invoked during or after each transaction. It could
also be an ongoing activity that occurs in the background. The cost of
background processing might increase with the transaction volume, or
it could be constant. Even if the security team does not wish anything
to be shared about fraud detection logic, including processing costs
and performance requirements, enough processing and storage
resources must be provided to ensure that fraud detection is timely and
that it does not interfere with the applications of interest. In the absence
of detailed information, performance engineering to support fraud
detection might be done by making assumptions about resource costs
and performance requirements, and flagging these assumptions in
requirements documents and the descriptions and parameterizations
of performance models and their predictions. The parameters may be
varied between their assumed best and worst cases so that a range of
impacts on overall performance can be determined.
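One way to carry out such a sensitivity exercise is to bracket the assumed
cost of the fraud check and compute the resulting range of resource usage.
The sketch below illustrates the idea; the transaction rate, the per-transaction
CPU costs, and the variable names are hypothetical and are not drawn from
the table or from any fraud detection product.

    # Bracketing the CPU impact of fraud checking under assumed best- and
    # worst-case costs. All numbers here are hypothetical.

    transactions_per_second = 50.0        # assumed peak transaction rate
    base_cpu_seconds_per_txn = 0.010      # assumed cost of the transaction itself

    fraud_check_cost_best = 0.001         # assumed best-case CPU seconds per check
    fraud_check_cost_worst = 0.005        # assumed worst-case CPU seconds per check

    for label, fraud_cost in [("best case", fraud_check_cost_best),
                              ("worst case", fraud_check_cost_worst)]:
        # Utilization Law: utilization = throughput x service demand
        utilization = transactions_per_second * (base_cpu_seconds_per_txn + fraud_cost)
        print(f"{label}: CPU utilization = {utilization:.0%}")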
Quantity | Value
Space between suitcase handles | 1.5 m
Conveyor speed | 1 m/sec
Bags checked in per hour | 3,000
Departures per hour | 15
Bags transferred between flights per hour | 1,000
Arriving flights per hour | 15
Bags claimed at this airport per hour | 3,000
Induction points (entry points for luggage) | 30
Bar code scanners | 50
Diversion points | 50
Programmable logic controllers | 25
Number of status monitors | 80
Number of status messages per monitor per hour | 60
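These figures can be combined to check whether the offered bag flow is
physically sustainable. The following sketch performs that rough cross-check;
it rests on our assumption that the tabulated handle spacing limits how
densely bags can be placed on any one belt segment.

    # Rough capacity cross-check for the conveyor workload in the table.
    # The interpretation that handle spacing limits belt throughput is an assumption.

    belt_speed_m_per_sec = 1.0       # conveyor speed
    bag_spacing_m = 1.5              # space between suitcase handles

    # Maximum bags per hour that a single belt segment can carry.
    bags_per_hour_per_segment = belt_speed_m_per_sec / bag_spacing_m * 3600
    print(f"Capacity of one segment: {bags_per_hour_per_segment:.0f} bags/hour")  # 2,400

    offered_bags_per_hour = 3000 + 1000   # checked-in plus transfer bags
    print(f"Offered flow: {offered_bags_per_hour} bags/hour")
    # The offered flow exceeds a single segment's capacity, so it must be
    # spread over several induction points and parallel segments.

    # Monitoring traffic implied by the table:
    status_messages_per_hour = 80 * 60    # monitors x messages per monitor per hour
    print(f"Status messages per hour: {status_messages_per_hour}")   # 4,800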
There are five enunciators (alarm bell, strobe light, or loudspeaker for
playing stored messages instructing evacuation) on every floor. There
is one alarm control panel for the entire building, near the main
entrance. When there is no emergency, each smoke detector sends a
status message to the alarm control panel every 5 minutes. Each smoke
detector that has “smelled” smoke sends a message to the alarm control
panel every 10 seconds. The alarm panel logs all messages from the
pull stations and smoke detectors and displays the ten most recent ones
to come in on a liquid crystal panel. This data is summarized in Table 4.3.
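These message rates translate directly into a load on the alarm control panel.
The sketch below computes that load in the quiescent and alarm states; because
the numbers of floors and of detectors per floor are not given in this excerpt,
the counts used here are placeholders rather than values from the example.

    # Message load on the alarm control panel in quiescent and alarm states.
    # The detector and floor counts are placeholders, not values from the text.

    floors = 10                      # assumed number of floors
    detectors_per_floor = 20         # assumed number of smoke detectors per floor
    detectors = floors * detectors_per_floor

    quiescent_interval_sec = 5 * 60  # one status message every 5 minutes
    alarm_interval_sec = 10          # one message every 10 seconds once smoke is smelled

    quiescent_msgs_per_sec = detectors / quiescent_interval_sec
    print(f"Quiescent load: {quiescent_msgs_per_sec:.2f} messages/second")

    detectors_in_alarm = 5           # assumed number of detectors reporting smoke
    alarm_msgs_per_sec = (detectors - detectors_in_alarm) / quiescent_interval_sec \
                         + detectors_in_alarm / alarm_interval_sec
    print(f"Load with {detectors_in_alarm} detectors in alarm: "
          f"{alarm_msgs_per_sec:.2f} messages/second")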
4.7 Summary
Workload identification proceeds naturally from an examination of the
sets of functionalities of a system and the time patterns of their invoca-
tion. The specification of the invocation patterns is complemented by a
description of the numbers of entities of various types involved in the
functionality and perhaps their expected sojourn time in the system.
Numerical specifications of the workloads must be mutually consist-
ent, so as to avoid confusion about the scale of the system and how
many user components the system must support. As we shall see in
Chapter 5, this is a prerequisite for the correct identification of system
performance requirements.
4.8 Exercises
4.1. A web-based news service allows viewing of the front page of
a newspaper, the display of stories shown on the home page,
and, for premium subscribers, access to all the news stories
posted on the site in the last ten years. Premium users must be
registered in the system and then may log in and out to access
the story database as much as they desire. Payment for access
may be by a period subscription or by the article viewed.
(a) Identify the activities a subscriber or an unsubscribed reader
may perform.
(b) Describe the set of activities a journalist may perform on
the site.
(c) Describe the set of activities an editor may perform on
the site.
From Workloads to
Business Aspects of
Performance
Requirements
We build a bridge from workload identification to performance require-
ments, explore how performance requirements relate to the software
lifecycle, and explore how performance requirements fit into a busi-
ness context, particularly as they relate to the mitigation of business
risk and commercial considerations. We also describe criteria for ensur-
ing that performance requirements are sound and meaningful, such as
unambiguousness, measurability, and testability.
5.1 Overview
Poor computer system performance has been called the single most
frequent cause of the failure of software projects [SmithWilliams2001]
and is perceived as the single biggest risk to them [Bass2007]. The prin-
cipal causes of poor performance are architectural choices that are
5.5.3 Confidentiality
A great deal can be inferred about the competitiveness of a product or
the commercial position of the intended customer by examining per-
formance requirements. For example:
• The ability of an online order entry system or call center to handle
transactions at a given rate in the busy hour may be an indicator
of the owner’s anticipated growth, with consequent impacts for
revenue and market share. This intelligence could be valuable to
a competitor or an investment analyst trying to forecast the future
earnings of both the buyer and the supplier.
• The ability of a network management system to handle traps at
a given peak rate, combined with knowledge of the number of
nodes to be managed and the peak polling rate, can tell us about
the intended market segment of the product while nourishing
speculation about the product’s feature set, or even about the
nature of the site the system is intended to support. This can
affect price negotiations between supplier and buyer, and per-
haps the supplier’s share price.
These examples illustrate why performance requirements and any con-
tractual negotiations related to them should be treated as confidential
and perhaps even covered by nondisclosure agreements (NDAs). The
release of performance requirements and performance data outside a
circle of individuals with a need to know should be handled with great
care. Engineering, marketing, legal, and intellectual property depart-
ments should all be involved in setting up a formal process to release
performance data to third parties under nondisclosure agreements or
to the general public.
5.6.2 Unambiguousness
First and foremost, a performance requirement must be unambiguous.
Ambiguity arises primarily from a poor choice of wording, but it can
also arise from a poor choice of metrics.
Example 1: “The response times shall be less than 5 seconds 95% of the time.”
This requirement is ambiguous. It opens the question of whether this
must be true during 95% of the busy hour, during 95% of the busiest 5
minutes of the busy hour (both of which may be hard to satisfy), or
during 95% of the year (which might be easy to satisfy if quiet periods
are included in the average). In any case, the response time is a sam-
pled discrete observation, not a quantity averaged over time.
Consider an alternative formulation:
Example 2: “The average response time shall be 2 seconds or less in
each 5-minute period beginning on the hour. Ninety-five percent of all
response times shall be less than 5 seconds.”
This requirement is very specific as to the periods in which averages
will be collected, as well as to the probability of a sampled response
time exceeding a specific value.
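Because Example 2 names both the averaging window and the percentile,
conformance can be checked mechanically against a set of timestamped response
time samples. The following Python sketch shows one possible check; the
function name, the sample data, and the nearest-rank percentile rule are our
own choices, not part of the requirement.

    # Checking the requirement of Example 2 against sampled response times.
    # Samples are (seconds since the start of the hour, response time in seconds).
    import math

    def meets_example_2(samples, window_sec=300, avg_limit=2.0,
                        pct_limit=5.0, percentile=0.95):
        # Group samples into consecutive 5-minute windows beginning on the hour.
        windows = {}
        for t, rt in samples:
            windows.setdefault(int(t // window_sec), []).append(rt)
        # The average response time in every window must be 2 seconds or less.
        if any(sum(rts) / len(rts) > avg_limit for rts in windows.values()):
            return False
        # 95% of all sampled response times must be below 5 seconds
        # (nearest-rank empirical percentile).
        all_rts = sorted(rt for _, rt in samples)
        index = max(math.ceil(percentile * len(all_rts)) - 1, 0)
        return all_rts[index] < pct_limit

    example_samples = [(12.0, 1.1), (140.0, 2.4), (310.0, 1.7), (615.0, 0.9)]
    print(meets_example_2(example_samples))   # True for this small sample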
5.6.3 Measurability
A well-specified performance requirement must be expressed in terms
of quantities that are measurable. If the source of the measurement is
not known or is not trustworthy, the requirement will be
unenforceable. Therefore, it must be possible to obtain the values of the metric(s)
in which the requirement is expressed. To ensure this, the source of the
data involved in the requirement should be specified alongside the
requirement itself. The source of the data could be a measurement tool
embedded in the operating system, a load generator, or a counter gen-
erated by the application or one of its supporting platforms, such as an
application server or database management system. A performance
requirement should not be adopted if it cannot be verified and enforced
by measurement.
Example 4: The average, minimum, and maximum response times during
an observation interval may be obtained from a commercial load generator,
together with a count of the number of attempted, successful, and failed
transactions of each type, but only if the load generator is set up to collect
them. A performance requirement expressed in terms of these quantities
should be written only if they are obtainable from the system under test
or from the performance measurement tools available in the load drivers.
5.6.4 Verifiability
According to [IEEE830], a requirement is verifiable “. . . if, and only if,
there exists some finite cost-effective process with which a person or
machine can check that the software product meets the requirement.
In general any ambiguous requirement is not verifiable.” For perfor-
mance requirements, this means that each requirement must be testable,
unambiguous, measurable, and consistent with all other per-
formance and functional requirements pertaining to the system of
interest.
Where a performance requirement is inherently untestable, such as
freedom from deadlock, a procedure should be specified for determin-
ing that the design fails to meet at least one of the three necessary con-
ditions for deadlock. These are circular waiting for a resource, mutual
exclusion from a resource, and nonpreemption of a resource
[CoffDenn1973]. On the other hand, if deadlock happens to occur dur-
ing performance testing, we know that the requirement for freedom
from it cannot be met. We also know that there is a possibility of a
throughput requirement not being met, since throughput is zero when
a system is in deadlock.
5.6.5 Completeness
A performance requirement is complete if its parameters are fully speci-
fied, if it is unambiguous, and if its context is fully specified. A require-
ment that specifies that a system shall be able to process 50,000
transactions per month is incomplete because the type of transaction
has not been specified, the parameters of the transaction have not been
specified, and the context has not been specified. In particular, to be able
to test the requirement, we have to know how many transactions are
requested in the peak hour, and then have some context for inferring
that the peak hourly transaction rate is functionally related to the num-
ber of transactions per month. We also have to define a performance
requirement for the acceptable time to complete the transaction.
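The following sketch illustrates the kind of derivation that a complete
requirement would make explicit. The number of business days per month and
the share of a day's traffic falling in the peak hour are assumptions chosen
for illustration; a real requirement would have to supply and justify these
figures.

    # Deriving a peak-hour transaction rate from a monthly volume.
    # The business-day count and peak-hour share are assumptions that a
    # complete requirement would have to state explicitly.

    transactions_per_month = 50_000
    business_days_per_month = 22          # assumption
    peak_hour_share_of_day = 0.25         # assumption: a quarter of daily traffic

    transactions_per_day = transactions_per_month / business_days_per_month
    peak_hour_transactions = transactions_per_day * peak_hour_share_of_day
    peak_rate_per_second = peak_hour_transactions / 3600

    print(f"Peak hour: {peak_hour_transactions:.0f} transactions")
    print(f"Peak rate: {peak_rate_per_second:.3f} transactions/second")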
5.6.6 Correctness
In addition to being correct within the context of the application to
which it refers, a performance requirement is correct only if it is speci-
fied in measurable terms, is unambiguous, and is mathematically con-
sistent with other requirements. In addition, it must be specified with
respect to the time scale for which engineering steps must be taken.
5.6.8 Testability
We desire that all performance requirements be testable. By testable,
we mean that a cost-effective, repeatable method exists for running an
experiment that enables us to obtain a designated set of performance
measurements under defined conditions in a controlled, observable
5.6.9 Traceability
Like functional requirements, performance requirements must be
traceable. Traceability addresses the following points:
• Why has this performance requirement been specified?
• To what business need does the performance requirement
respond?
• To what engineering needs does the performance requirement
respond?
• Does the performance requirement enable conformance to a
government or industrial regulation?
• Is the requirement consistent with industrial norms? Is it
derived from industrial norms?
• Who proposed the requirement?
• How were the quantities in this requirement derived? If this
requirement is based on a mathematical derivation or model,
the parameters should be listed and a reference to or a descrip-
tion of the model provided.
• If this requirement is based on the outputs of a load model, a
reference and pointer to the load model and its inputs should
be provided, together with the corresponding version number
and date of issue.
Traceability is inherently beneficial to cost containment. If all of the pre-
ceding points can be satisfactorily addressed, the risk will be reduced
that a performance requirement that is expensive to implement goes
beyond the stated needs of the product. Traceability also reduces the
5.7 Summary
Performance requirements define the performance expectations of the
system. Expectations about the volume and nature of the work to be
handled arise from identification of the workloads and sometimes from
5.8 Exercises
5.1. A university requires students to use a web-based tool to
upload essays on deadline throughout the term. Each essay-
based course has its own deadline for submission. In addition
to enabling teachers and graders to mark essays and give them
back to the students, the system will archive the essays so that
other essays may be compared with them to detect plagiarism.
To be usable, the system should not take long to upload essays,
even as the deadline approaches. The system should be able
to carry out plagiarism checks early enough for the teachers to
have enough time to grade the essays within a week of receipt
or, in the case of essays that are part of take-home exams, well
before the end of finals week.
(a) Explain how you might formulate the performance require-
ments of this system (i) if it is meant to be used at a small
private college with a maximum enrollment of 1,500 stu-
dents, (ii) if it is meant to support a large university with an
enrollment of 30,000 students, (iii) if it is meant to support
the university system of an entire state, such as New York
or California.
(b) Identify performance requirements for different parts of the
work, such as uploading by the students, downloading by
the teachers, uploading by the teachers after marking, and
checking for plagiarism.
Qualitative and
Quantitative Types
of Performance
Requirements
System performance requirements often state that a given load shall be
sustainable by the system, and that the system shall be scalable, with-
out specifying the meaning of sustainability and without specifying the
dimensions in which a system is to be scaled, and what successful scal-
ability means in terms of system performance and/or in terms of the
number of objects encompassed by the system. In this chapter we shall
describe the expression of performance requirements in quantitative,
measurable terms. We shall show how they can be used to reformulate
qualitative requirements in terms that meet the criteria for sound
performance requirements, such as being measurable, testable, and
unambiguous.
with the available bit rate of the transmission medium. This is essen-
tial for determining whether the performance requirement is
achievable.
logged in, the average footprint per user, and how often they will be
triggering activities of various kinds. Similarly, a performance require-
ment that a web site or an application be able to handle X user sessions
per month may be adequate for forecasting revenue, but it is uninform-
ative about how the web site has to be engineered for acceptable per-
formance. To remedy this, there should be an explicit statement about
the amount of user activity during the peak hour, or about how the
activity in the peak hour is related to the total monthly activity.
6.8 Summary
In this chapter we have seen examples of different forms of require-
ments and shown how some misleading or ambiguous statements of
requirements can be reworded to render them sound and unambigu-
ous. We have also seen that performance requirements that seem
implicit from their context may very well be ambiguous unless their
context is specified, and that performance requirements must be for-
mulated in terms of the time scale within which they are to be engi-
neered if they are to be useful. We have also seen that requirements
regarding the exhaustion probabilities of object pools must be speci-
fied, and how the desired sizes of these object pools can be derived
from the values of other metrics and various mathematical assump-
tions. Finally, we have illustrated the formulation of performance
requirements relating to bursts of activity, especially those that occur in
mission-critical systems such as alarm systems.
6.9 Exercises
6.1. A conference has five concurrent tutorials with 30 attend-
ees each. During a 20-minute break, one or more waiters will
pour tea into cups from pots at a buffet. Tea must be consumed
before the attendees return to class. Write performance require-
ments for the tea service using the following steps:
(a) Identify the constraining factors.
(b) Identify the demand variables.
(c) Identify criteria for satisfactory performance.
(d) Identify metrics
(i) Describing the work done
(ii) Describing the performance of the system
(e) Explain which objectives are served by
(i) Having all tutorials on break at the same time
(ii) Having tutorials on break at different times
Explain how your performance requirements are shaped by
your choice of which of these objectives to fulfill.
6.2. At a German beer festival, waitresses in dirndls circulate
between a beer dispensing point and tables of revelers. Each
waitress serves patrons at a predetermined set of tables. Beer
may be consumed only by people sitting at tables. Waitresses
take batches of orders from one table at a time. Each table seats
at most ten revelers. The waitresses may carry up to ten mugs
of beer at a time, each of which has a capacity of 1 liter and
weighs 1 kilogram when empty. The distance from a group of
tables to the beer dispensing point is about 50 meters.
Eliciting, Writing,
and Managing
Performance
Requirements
We explore the processes of eliciting, gathering, and documenting
performance requirements. We also examine some pitfalls that may
arise in documentation, such as the expression of requirements in forms
that are antipatterns because they can lead to ambiguity and/or cause
difficulty in measurement. We show how pitfalls can arise when
prescribing the performance requirements of a system or component
that is replacing a legacy system, because the functionality in the new
system may be different but hidden, and describe how circular
dependence should be avoided. This chapter also contains guidance on
the organization of a performance requirements document and the
contents of individual requirements.
1. Requirement number
2. Title
3. Statement of requirement
4. Supporting commentary
5. List of precedents, sources, standards
6. Derivation of quantities
7. List of dependent requirements
8. List of assumptions and precedent performance requirements
9. Sources of measurement data
10. Name of a subject matter expert on this requirement
11. Indicator if the requirement is independently modifiable, or if not, why not
12. Indicator that the requirement is traceable
13. Indicator that the requirement is unambiguous, or if not, why not
14. Indicator that the requirement is correct, or if not, why not
15. Indicator that the requirement is complete, or if not, why not
16. Indicator that the requirement has passed or failed review, and why
to be captured pretty much in the state seen by the user. In this case, the
subject was a walking cow with a bell hanging from its collar. The digi-
tal camera took so much time to capture the image that the resulting
photo included the cow’s udder, but not the bell. The difficulty was
that the shutter reaction time on the digital camera included autofocus
and exposure settings. With the vintage camera, these would have been
done manually in advance of the shutter being released. The problem
occurred because the author erroneously assumed that the digital cam-
era would have the same shutter reaction time as the vintage camera. It
does not, and the unexpected image was the result.
One might ask whether the comparison of the shutter reaction
times is fair, given that the digital camera does so much more when the
button is pressed. The answer is that a comparison should reflect expec-
tations of the functionality that will be implemented, and that the user
should plan the shot accordingly. With the vintage camera, planning
the shot would have included several preparatory steps:
1. Opening the light meter, aligning the settings pointer on the
light meter with the needle, noting the required combination
of aperture setting and shutter speed, and setting these on the
camera.
2. Composing the picture in the camera’s viewfinder, and setting
the camera’s focus using the focus ring on the lens.
3. Pressing the shutter release button. The image is captured in
the time it takes to open and close the shutter. This is known as
the shutter speed.
The combined time to perform all of these actions could be long enough
for the cow to walk out of view altogether. By contrast, pressing the
shutter release button on the digital camera causes all three steps to be
performed. The instant at which the image is captured may be later
than the instant at which the shutter button is pressed, but the subject
may still be close to the desired position by the time the image is
captured.
Only one functionality is triggered by pressing the shutter button
on the legacy camera: opening and closing the shutter to capture the
image. On many digital cameras, multiple actions occur when the
shutter button is pressed. The lesson we draw from this comparison is
that one must evaluate the set of functionalities to be performed by
both the legacy and the replacement system components when deter-
mining the performance requirements of the replacements. We must
also take into account any changes to the interface that are required
when integrating the replacement into the system, including timing
characteristics.
7.14 Summary
In this chapter we have given an overview of how performance require-
ments documents might be prepared and structured, and about some
of the pitfalls that can arise when writing them. In this and the two
preceding chapters we have repeatedly underscored the need to avoid
ambiguity in the formulation of performance requirements and the
need to provide a clear context for them. We also insist that quantita-
tive requirements not be expressed using colloquial phrases such as
“all the time” or “of the time,” to reduce the risk of confusion about
what is meant and to provide a clear inference about how the quantities
of interest are to be measured. This is a prerequisite for the formulation
of meaningful and informative performance tests to verify that the per-
formance requirements can indeed be met.
Chapter 8
System Measurement
Techniques and
Instrumentation
We describe the motivation for system measurement and explore tools
and techniques for doing so. Measurement pitfalls and instrumentation
flaws will be used to illustrate the need for the validation of measure-
ment tools. We will also examine the applicability and limitations of
profiling tools and measurements embedded in the applications.
8.1 General
In earlier chapters we underscored the need to ensure that performance
requirements be expressed in measurable terms so that they can be
verified. In this chapter we look at reasons for gathering measurements,
including verifying performance requirements. We will examine some
of the tools with which measurements can be gathered. Discussion of
the procedures for planning measurement exercises is deferred to
Chapter 9, where performance testing will be discussed.
Just as government statistics offices collect data to track social and
economic trends, performance engineers and system managers need to
measure system resource usage and system performance to track the
evolution of the load. This is done to ensure that the system is not over-
loaded, to verify the effects of system changes, and to ensure that the
performance of the system is meeting requirements, engineering needs,
and customer needs. Performance measurements can also be used to
anticipate the onset of system malfunctions.
Measurement is necessary to identify relationships between
resource usage measures, offered traffic, processed and lost traffic, and
response times. If the response time of a system is too long, one’s first
instinct should be to measure its resource usage to identify the cause
and fix the problem. This is true even of self-contained systems such as
laptops. However, there are many other reasons for gathering perfor-
mance measurements:
• A production system should be measured continuously so that
baseline patterns for system resource usage can be established for
different times of day and for different times of the year.
Continuous measurement and presentation of the measurements
by time of day can also reveal anomalies and trends in resource
usage. Moreover, continuous measurement of a system in pro-
duction is necessary to identify the time of day during which the
offered load is greatest.
• A production system should be measured before and after any
configuration change, so that the impact of the change on
resource usage can be determined.
• System measurement is necessary for fault detection, the trigger-
ing of alerts that undesirable events are about to take place or are
in progress, and the application of the correct measures to deal
with them. A system or network in production should be moni-
tored continuously so that a quick decision can be made to inter-
vene with a remedy if unexpected changes in performance and/
or resource usage occur. For example, sudden or otherwise unan-
ticipated increases in measured response time could be used to
trigger software rejuvenation and avert a system crash or detect
the presence of intruders [AvBonWey2005, AvColeWey2007].
• The performance of a system under development should be
measured whenever the development of new features has been
completed so that their impact on system performance can be
evaluated.
• Similarly, the performance characteristics of a subsystem or plat-
form, whether bought off the shelf or purpose built, should be
measured to ensure that it is suitable for the intended application.
version of the tool used at the time and is stated for the purpose of illustration
only, not as a warning about a particular tool. This chapter should not be
regarded as a complete listing of the measurement tools that are available, or
as a complete catalog of the characteristics and capabilities of these tools. It is
incumbent on every tool user to validate the accuracy and effectiveness of the
measurements produced, and of every counter used, as the results may vary
from release to release of the host operating system and from one system con-
figuration to another.
U_CPU = (1/n) Σ_{k=0}^{n−1} U_{CPU,k}    (8.1)
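Equation 8.1 is simply the average of the utilizations reported for the n
individual processors or cores over the same interval. The brief sketch below
performs that calculation on invented per-core figures and shows why the
per-core values should be examined as well.

    # Average CPU utilization across cores, as in Equation 8.1.
    # The per-core values here are invented sample data.

    per_core_utilization = [0.92, 0.15, 0.10, 0.08]   # one core nearly saturated

    average_utilization = sum(per_core_utilization) / len(per_core_utilization)
    print(f"Average CPU utilization: {average_utilization:.2f}")
    # The average (about 0.31) hides the fact that one core is nearly saturated,
    # which is why per-core figures should also be examined.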
Figure 8.1 Series of utilizations, all with average 0.5 during the period between
time = 0 and time = 20
other processes to the other processors. In that case, the processor utili-
zations could be unbalanced. The utilizations of individual processors
can also become unbalanced if a process is single-threaded and is
bound to a given processor by cache affinity. In some environments, the
child threads of a process must all be routed to the processor on which
the parent process is executing. If a process is multithreaded and sub-
ject to this constraint, it will not be able to exploit the other processors,
even if they are idle. Early Java virtual machines suffered from this
constraint [Killelea2000]. This inability to exploit parallelism under-
mines a system’s load scalability [Bondi2000] (also see Chapter 11).
Sometimes the problem is recognized only after measurements have
taken place.
Figure 8.2 shows the utilizations of the individual processors in a
two-processor system. The loads are clearly unbalanced. Were the
loads balanced, the utilizations of both processors would be quite
close to the average utilization shown. This plot is based on contrived
data, but it is very similar to a plot of actual data taken from a Solaris
server with two processors using the mpstat command.

Figure 8.2 Unbalanced processor utilization as a function of the offered load (synthetic
data)

The server
was running a version of UNIX supplied by Sun Microsystems that
used cache affinity. The utilizations of the individual processors and
the average utilizations are linear with respect to the offered load, and
thus they obey the Utilization Law. An inspection of the processor
utilizations attributable to individual processes showed that one
CPU-intensive process was bound to the more heavily used of the two
processors. The CPU utilization attributable to this process was
approximately equal to the larger of the two processor utilizations at
all load levels. Less-CPU-intensive processes ran on the other proces-
sor. We shall examine this further in Chapter 11, which deals with
scalability.
Sometimes it is necessary to conduct a pilot test with a contrived
process to clear up any questions about the interpretation of measure-
ments and the function of the operating system when the documenta-
tion is not entirely clear. When measuring a multiprocessor system, one
might be confronted with two questions:
1. How does the operating system balance threads or processes
among the processors?
2. What do the utilization counters generated by the operating
system actually mean?
One way to confront these two questions simultaneously is to run a
pilot test of a program that is known to generate at least as many
threads as there are processors and see how they are spread around.
The threads should be very CPU intensive and not contain any varia-
bles in common. Each thread will contain an infinite loop that executes
arithmetic operations, so that there will not be any I/O to cause the
thread to give up the CPU. Each thread consists solely of an executing
fragment of the form shown in Figure 8.3. Within the loop, a is repeat-
edly incremented. It is also repeatedly decremented to prevent integer
overflow.
If the number of threads is equal to the number of cores or proces-
sors, the utilization of each one will approach 100%. If there is one more
thread than there are cores or processors, a thread will be seen to
bounce from one processor to another. This is manifested by a rapidly
oscillating CPU utilization whose maximum value seems to move from
one processor to the other.
Thread AddAndSubtractRepeatedly()
{
    local int a;        // thread-local; no data shared with other threads
    while (TRUE)        // infinite loop with no I/O, so the thread never yields
    {
        a++;            // repeatedly increment ...
        a--;            // ... and decrement, to prevent integer overflow
    }
}
a log file using a script written in awk, Perl, Python, or some other
scripting language of the analyst’s choice.
[Figure: test configuration with Web Server 0 and 1, Application Server 0 and 1,
and Database Server 0 and 1]
Table B and then Table A before releasing both, a deadlock will occur if
Thread 1 locks Table A and Thread 2 locks Table B, because each will
be waiting for the table required by the other. The causes of deadlocks
can be very difficult to determine. Deadlock is one of the things one
should suspect if response times are increasing while processor and I/O
utilizations are down.
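The circular wait at the heart of this scenario can be illustrated, and
avoided, with a consistent lock-ordering discipline. The sketch below shows
the idea at the application level in Python; the two locks merely stand in
for the locks on Table A and Table B and are not tied to any particular
database.

    # Avoiding the circular wait described above by imposing a global lock order.
    # lock_a and lock_b stand in for the locks on Table A and Table B.

    import threading

    lock_a = threading.Lock()
    lock_b = threading.Lock()

    def update_both_tables(worker_name):
        # Every thread acquires the locks in the same order (A before B),
        # so no cycle of waiters can form and deadlock cannot occur.
        with lock_a:
            with lock_b:
                print(f"{worker_name} updated both tables")

    threads = [threading.Thread(target=update_both_tables, args=(f"thread-{i}",))
               for i in range(2)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()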
The following is a list of measurements that might be of interest in
examining database performance:
• The number of times each table is searched
• The number of times each table is locked, and for how long
• For databases with row-level locking, the number of times each
row in a table is locked, and for how long
• The response times of particular queries
This list is far from complete. Your database administrator and data-
base architect may have their own thoughts about what should be
measured.
8.19 Summary
The variety of performance quantities to be measured in the hardware,
operating system, middleware, and applications and the large selec-
tion of measurement tools available and needed to do so indicate that
there are many facets to computer and system performance measure-
ment. While the measurement tools are varied, the practices to which
one must adhere when using them are constant:
• The validity of all measurement tools and procedures must
always be scrutinized.
• Measured values that appear to be unrealistic or peculiar should
always be investigated, especially if they violate basic perfor-
mance laws, such as Little’s Law and the Utilization Law.
• Performance measurement and testing should always be done
in a clean environment in which only the system being meas-
ured is running. This is analogous to using clean test tubes in
chemistry experiments.
• The instrumentation used to collect measurements should not
interfere with the payload work the system is designed to do.
These principles apply whether the instrumentation is new or in estab-
lished use. It is essential to adhere to them when measuring new tech-
nology for the first time, so that one can determine whether
measurements truly reflect the behavior of the system under study.
They hold for every aspect of measurement considered in this book.
8.20 Exercises
8.1. A new programming language is introduced to implement
monitoring and control systems. Explain how you would
verify that executable code that is generated from the source
code of this language is capable of running on multiple proces-
sors simultaneously.
Performance Testing
Performance testing is essential for avoiding unpleasant surprises in
production, such as slow response time, inadequate throughput, and
dropped transactions. In this chapter we will learn how performance
tests can be structured to verify that the system has desirable scalability
properties, such as resource utilizations that are linear functions of the
offered load. We shall discuss performance testing practices and
procedures and review and interpret actual performance data. This data
illustrates how performance testing can be used to uncover undesirable
properties of the system, preferably before it goes into production. The
chapter concludes with a discussion of performance test automation and
the value of automating the analysis of performance measurements.
never been used before. Installation and configuration entail the use of
documentation and tools for the first time ever, as well as the creation
of data files and data streams intended to be like those that will be in
place when the system goes live.
The performance testing team should not be required to go about its
task in isolation from other stakeholders. To facilitate the timely
resolution of issues that arise, the performance team should have ready access
to the system architect, functional testers, the development teams, and
the teams that developed the integration and configuration tools needed
to set up the system. All of these stakeholders should be available to
provide support in dealing with any bugs that arise. Any necessary
changes to the system under test should be implemented while
following your organization's change management process, so that the changes
and their effects are clearly documented. A configuration change will be
less arduous than one involving a change to the code. If a change must
be implemented more quickly than a change management process
would allow, the lead performance engineer must ensure that all
changes and their observed impacts are carefully documented so that
the changes can be appropriately logged. This is essential for software
auditing purposes. Some customers for the software may require it.
Conversations with product managers and even users may help the
performance testers build and implement a performance test plan that
has broad credibility. Acquaintance with the domain of application of
the system under test is also useful. The performance testing team
should have access to the specifications for functional and
nonfunctional requirements so that they can determine if observed behaviors
are correct, or whether strange behaviors are due to ambiguities or
other defects in the requirements themselves. A defect in a requirement
or a poorly written specification could lead to unexpected behavior that
is sufficient to prevent any further testing from going forward. Systems
based on service-oriented architectures are vulnerable to this problem:
if one of the services has undesirable performance characteristics,
hangs, or goes into an infinite loop, nothing built on top of it will work.
Figure 9.1 Hypothetical load generator and the system under test
by bk, which could be zero. Then, when the transaction rate on the
system is λ, the utilization of resource k should take the form

Uk = Dk λ + bk
To verify that our system has this property, we should run performance
tests on it at different levels of λ to determine whether U k is indeed
linear in the offered transaction rate λ. One should choose the values of
λ so that they cause the utilization of the bottleneck resource to range
from 10% to 95%. Pilot tests at various levels of the arrival rate or
transaction rate λ should be used to determine those values. Running the
system with a single (possibly large) transaction rate λ tells us only
whether the system was able to function at that rate or not. It does not
tell us about trends in system loading.
The system should be measured for a nontrivial amount of time,
such as 5 minutes, without any applied load present, so that one can
determine if there is any background activity that consumes system
resources. Doing so enables the determination of values of the intercept
bk for each device. One should also investigate whether there is a
system-based reason for there to be a background load that runs even
when no payload is present. It is important to do this because
arbitrarily setting the intercept to zero implies an assumption that there is no
background load, while also resulting in a possible incorrect estimation
of the demand Dk.
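Once utilizations have been measured at several transaction rates, the demand
Dk and the intercept bk can be estimated with an ordinary least-squares fit of
utilization against λ. A minimal sketch follows; the measured pairs are
invented for illustration.

    # Least-squares fit of utilization against transaction rate, U = b + D * lam,
    # to estimate the demand D and the background intercept b.
    # The measurements below are invented sample data.

    rates = [10.0, 20.0, 40.0, 60.0, 80.0]          # transactions per second
    utilizations = [0.14, 0.23, 0.42, 0.61, 0.80]   # measured CPU utilization

    n = len(rates)
    mean_x = sum(rates) / n
    mean_y = sum(utilizations) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(rates, utilizations)) \
            / sum((x - mean_x) ** 2 for x in rates)
    intercept = mean_y - slope * mean_x

    print(f"Estimated demand D: {slope:.4f} CPU-seconds per transaction")
    print(f"Estimated background utilization b: {intercept:.3f}")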
We can evaluate the linearity of utilizations with respect to
transaction rate with the following steps:
1. Run the performance tests at increasing load levels, for the
same period of time for each load level, and measure the
average resource utilizations over each run. If the load is driven
asynchronously, the load levels may be chosen by varying the
arrival rate or, equivalently, the average time between arrivals.
This corresponds to an open queueing system. If, in the actual
system, the user thinks between receiving the response to
the previous transaction and launching the next one—that is,
synchronously—the throughput can be increased by
increasing the number of virtual users, and sometimes by reducing
the average think time. If the load is driven synchronously,
the throughput cannot be controlled, but the device
utilizations should be linear functions of the measured throughputs
nonetheless.
R = M / X − Z (9.2)
X = M /(R + Z) (9.3)
Thus, a longer think time or a longer response time could drive down
the maximum throughput that can be attained. If it is necessary to
increase the offered throughput without increasing the number of
virtual users M, one must reduce the think time.
There are a number of reasons why one might not be able to increase
the number of virtual users. If commercial load drivers are used, the
license cost per virtual user might be high. Even if there is no
additional monetary cost per license, the number of virtual users running
simultaneously may be constrained by the number of load-driving PCs
available for the test and by their individual capacities. Therefore, it
may be desirable to increase the load that can be offered in the testing
environment by reducing the value of the think time Z.
The maximum attainable throughput for a given number of virtual
users M is obtained by setting the think time to zero. Thus,
X ≤ M / R (9.4)
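A brief worked example of these relationships follows; the number of virtual
users, the think times, and the response time are chosen purely for
illustration.

    # Offered throughput under the interactive response time law X = M / (R + Z),
    # and the bound X <= M / R obtained with zero think time.
    # All parameter values are illustrative.

    M = 50          # virtual users
    R = 0.25        # average response time in seconds
    for Z in (8.0, 4.0, 0.0):   # think times in seconds
        X = M / (R + Z)
        print(f"Z = {Z:>3} s: throughput X = {X:6.1f} transactions/second")

    print(f"Upper bound M / R = {M / R:.1f} transactions/second")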
response time cannot be met. If that transaction rate is less than 100
per second, a remedy must be sought. If that transaction rate is
greater than 100 per second, there may be some room to grow the
system load or add functionality once the system is in production, or
else the system has been overengineered. Testing at a higher rate
than the required one allows one to determine whether the system
can cope with transient spikes when the transaction rate might
exceed the specified rate.
When testing performance, one should also ensure that functional
requirements have been met by verifying that outputs are as expected.
Unexpected outputs could be, but need not be, caused by concurrent
programming errors. The importance of this is underscored by the
need to verify that concurrently executing programs have interacted as
intended. That is the subject of the next section.
Figure 9.2 CPU utilization (left axis) and throughput (right axis) of a healthy system
versus offered throughput
Source: [AvBon2012] With kind permission from Springer Science+Business Media: Avritzer, A., and
Bondi, A. B. “Resilience Assessment Based on Performance.” In Resilience Assessment and Evaluation of
Computing Systems, edited by K. Wolter, A. Avritzer, M. Vieira, and A. van Moorsel, 305–322. New York:
Springer, 2012.
Figure 9.3 Transaction response time of the healthy system versus offered throughput
Source: [AvBon2012] With kind permission from Springer Science+Business Media: Avritzer, A.,
and Bondi, A. B. “Resilience Assessment Based on Performance.” In Resilience Assessment and
Evaluation of Computing Systems, edited by K. Wolter, A. Avritzer, M. Vieira, and A. van Moorsel,
305–322. New York: Springer, 2012.
Figure 9.4 CPU utilization versus time for the healthy system—offered throughput 300
transactions per second
Source: [AvBon2012] With kind permission from Springer Science+Business Media: Avritzer, A., and
Bondi, A. B. “Resilience Assessment Based on Performance.” In Resilience Assessment and Evaluation of
Computing Systems, edited by K. Wolter, A. Avritzer, M. Vieira, and A. van Moorsel, 305–322. New York:
Springer, 2012.
Figure 9.5 Average response time versus time for the healthy system—offered throughput 300
transactions per second
Source: [AvBon2012] With kind permission from Springer Science+Business Media: Avritzer, A., and Bondi,
A. B. “Resilience Assessment Based on Performance.” In Resilience Assessment and Evaluation of Computing
Systems, edited by K. Wolter, A. Avritzer, M. Vieira, and A. van Moorsel, 305–322. New York: Springer, 2012.
[Figure: CPU utilization (CPU Pct Busy) and throughput (Actual TPS) versus target TPS]
[Figure: average response time versus target TPS]
when 2,000 TPS are applied, we see that utilization is declining slightly
with time as shown in Figure 9.8, while the average response time
oscillates wildly with amplitude that increases over time, as shown in Figure
9.9. Moreover, the average CPU utilization for 2,000 TPS in Figure 9.8 is
lower than the measured utilization for 1,500 TPS shown in Figure 9.7.
Taken together, these observations are signs of a concurrent
programming problem that manifests itself once the arrival rate exceeds
1,500 TPS. Since the CPU is the bottleneck and its utilization is linear in
the arrival rate up to this level, with a value of approximately 40%, we
surmise that the service could be provided at 2,000 TPS on this
platform without difficulty in the absence of the concurrent programming
problem. The system would have to be restricted to 1,500 TPS for the
system to be operable as implemented, but that would be hazardous,
since concurrent programming bugs tend to occur nondeterministically.
On investigation, a segment of Java code was found in which the
wait, notify, and notifyAll operations were being used incorrectly.
Replacing this piece of code resulted in more regular performance.
Figure 9.8 CPU utilization versus time for the unhealthy system, with an offered throughput
of 2,000 TPS
Source: [AvBon2012] With kind permission from Springer Science+Business Media: Avritzer, A., and Bondi,
A. B. “Resilience Assessment Based on Performance.” In Resilience Assessment and Evaluation of Computing
Systems, edited by K. Wolter, A. Avritzer, M. Vieira, and A. van Moorsel, 305-322. New York: Springer, 2012.
same rate. The top plot in Figure 9.10 shows the number of load
generators (also known as virtual clients) as a function of time. In the bottom
figure, the upper plot shows the transaction completion rates and the
lower plot shows the transaction failure rates as functions of time. It
appears that this system is either grossly saturated or dysfunctional,
because the number of failed transactions is equal to the number of
completed transactions from time to time. Notice also that there is a spike in
the rate at which transactions are completed sometime after the
introduction of each load generator. A performance test with this sort of result
raises questions about whether the throughput requirements of the
system and hence the test cases were wisely chosen. It turns out that the
code also suffered from thread safety issues. This caused data to be
corrupted, contributing to the volume of failed transactions.
Figure 9.9 Average response time versus time for the unhealthy system, offered throughput
2,000 TPS
Source: [AvBon2012] With kind permission from Springer Science+Business Media: Avritzer, A., and Bondi,
A. B. “Resilience Assessment Based on Performance.” In Resilience Assessment and Evaluation of Computing
Systems, edited by K. Wolter, A. Avritzer, M. Vieira, and A. van Moorsel, 305-322. New York: Springer, 2012.
Figure 9.10 Transaction systems with high failure rate. Top: Number of virtual users
versus time. Bottom: Transaction pass rate is the thicker line and transaction rate
failure is the thinner line.
Figure 9.11 Computationally intense transactions: average processor, user I/O, and system
I/O utilizations with regression lines
Figure 9.12 System with computationally intense transactions: utilization of all devices,
including individual processor cores
Figure 9.13 Response times for the computationally intense transactions
Figure 9.14 Task Manager performance display showing signs of a memory leak and repeated
deadlocks
Source: [AvBon2012] With kind permission from Springer Science+Business Media: Avritzer, A., and Bondi,
A. B. “Resilience Assessment Based on Performance.” In Resilience Assessment and Evaluation of Computing
Systems, edited by K. Wolter, A. Avritzer, M. Vieira, and A. van Moorsel, 305–322. New York: Springer, 2012.
Figure 9.15 The same system as in Figure 9.14 after remedies were applied
Source: [AvBon2012] With kind permission from Springer Science+Business Media: Avritzer, A., and Bondi,
A. B. “Resilience Assessment Based on Performance.” In Resilience Assessment and Evaluation of Computing
Systems, edited by K. Wolter, A. Avritzer, M. Vieira, and A. van Moorsel, 305–322. New York: Springer, 2012.
9.20 Summary
In this chapter we have covered a wide range of topics concerning
performance testing. While some of the suggested performance testing
practices may appear to be mundane or obvious, we have recommended
them because our experience has shown that failure to adhere
to them seriously diminishes the value of the performance test results.
We have shown how performance testing that is structured to verify
the conformance of a system to properties predicted by rudimentary
performance models yields essential information about the ability
of the system to handle increased offered loads. Among these properties
are constant values of average performance measures and resource
usage measures under constant average offered loads and linearity of
the resource utilizations with respect to the average offered loads. We
have described how test beds should be structured to reflect the
architecture of the target production system and explained how
performance measurements should be collected in a clean environment to
ensure that they reflect resource usage by the system under test alone.
We have also illustrated how the results of performance tests can be
used to identify concurrent programming issues and software
bottlenecks and have related these results to the predictions of standard
performance models.
9.21 Exercises
9.1. As passengers’ luggage passes through an airport conveyor
system, the bar codes on luggage tags are read by a scanner
that sends the unique identifier of each tag to a database for
instructions on how to route the suitcase. The average suitcase
passes ten such bar code scanners during its passage through
the airport. The luggage belt moves at 2 meters per second,
and suitcases are positioned so that the handles bearing the
tags are 1 meter apart. There is a database of this kind in each
airport. A suitcase is registered in the database either when it is
checked in by a passenger departing from the airport or when
the manifest of an arriving plane is downloaded into the
database. An arriving suitcase may be routed straight to baggage
claim or to the loading bins destined for connecting flights.
The following is the set of use cases you are asked to assess by
(a) Suppose that there are no thinking terminals, and that these
are the parameters of an open system in which transactions
arrive asynchronously, without awaiting a system response.
What maximum throughput should be offered to the
system by the load drivers? Explain your results using Mean
Value Analysis or otherwise.
(b) Suppose instead that we wish to measure the effect of
varying the think time between 0 and 8 seconds. What is the
maximum number of logged-in terminals the system can
sustain with think times of 0, 4, and 8 seconds? If you wish
to double the number of sampled response times obtained
when there are four terminals logged in (e.g., so as to be
able to have narrower confidence bounds), what
combination of think time and number of terminals would you use?
Explain your results and justify your choice using Mean
Value Analysis or otherwise.
9.3. Performance measurements displayed by the Windows XP
Task Manager show that both processors in a dual-core system
are 50% busy even when the system is empty and idle before
load has been applied. Clicking on the Processes tab shows that
there is a polling process whose processor utilization is 100%.
The polling process is intended to check if an arriving
message queue is nonempty, and then forward any waiting
messages to an application process for handling. An examination
of the design document reveals that the polling process should
repeatedly check the queue to see if a message has arrived, and
then forward it for processing elsewhere. The specification
contains no statement about how often polling must occur or about
how the polling process should behave as long as the queue is
empty. Neither does the corresponding functional requirement.
(a) Explain why the displayed CPU utilization of a single
process is 100% while the utilizations of both processors are
only 50%. What feature of the operating system might make
this possible?
(b) What desirable attributes appear to be missing from the
functional requirement for the polling process?
(c) Propose a design change to the polling process that will fix
the CPU utilization problem and does not impose a
theoretical constraint on the maximum polling rate. That is,
System
Understanding,
Model Choice, and
Validation
To evaluate a system’s performance, one must first acquire an under-
standing of its desired function and of the paths followed by the
information that flows through it. The performance of a complex
system may best be understood by breaking it down into compo-
nents whose performance can be engineered separately, and then
combining the resulting models into a larger one so that the entire
system can be modeled as a whole. This entails understanding the
mission of the system, the system’s architecture, the information
flow through the system, and sometimes the flow of transported
entities. At the same time, one needs to obtain both a qualitative and
a quantitative feel for the traffic demands and the performance
requirements, because these are, or should be, the drivers of the
need for capacity and the design and implementation choices made
to support them. If a performance model turns out to be inaccurate,
the reasons for the inaccuracy should be investigated so that the
limitations of the model can be understood.
10.1 Overview
In this chapter we shall study how one goes about modeling a computer
system. This entails not only identifying the servers and the parameters
of a queueing model as suggested in Chapter 3. To begin with, one must
identify the questions to be answered by the model and identify the sali-
ent aspects of the system’s structure and function. One should identify
those portions of the system that could be modeled separately and
determine the level of detail to be captured in each component model.
It is not always advantageous to build a detailed model incorporat-
ing all facets of the system. A highly detailed model may not be needed
to answer basic questions about the capacity of the system or about the
effectiveness of a design choice from a performance standpoint. It is
often sufficient to focus one’s modeling efforts on the foci of load and
the principal factors determining capacity and response time. The more
detail is captured in the model, the more parameters are required to
evaluate it. The values of the parameters may not always be obtainable,
and using incorrect values may introduce errors in the performance
predictions that might not have occurred with a less detailed model.
Moreover, a detailed model may include a considerable amount of
information about the state of the system. In general, the more dimen-
sions are used to describe the state of the system, the more computa-
tionally expensive the evaluation of the system performance might be.
If a question arises about the influence of a detail of system design such
as buffer size or the use of a scheduling rule, it may be preferable to
address the question in isolation in a separate model. The results of that
model can then be used to address the impact of the detail on the sys-
tem as a whole. For example, if it is determined that a scheduling rule
could cause the system to go into deadlock and crash, it is best to
address the performance impact of that scheduling rule on its own, and
then determine what the performance of the system would be if a
deadlock-free rule were used instead.
The application of performance engineering techniques often
begins with questions about a system’s functionality and the perfor-
mance needs it must meet. The questions depend on the current status
of the system and/or how it might be changed. For example:
• If one is building or procuring a system for the first time,
there will be questions about the needed capacity, the desired
response times, the functionality to be supported, the tech-
nology that is available to implement functional and
measured ones. Despite that, the predicted response time plot and the
measured response time plot had similar shapes, reflecting the onset of
saturation as the offered load increased. The discrepancy was at least
partially explained by the use of asynchronous I/O to allow I/O and
processing of a transaction to occur simultaneously rather than serially.
In the projection phase, a validated model could be used as a base-
line for answering questions about what would happen to system per-
formance if transactions involved amounts of I/O activity or processing
time that differed from those that were measured, because of changes
in the nature of the work. It could also be used to model the effects of
adding more I/O devices to ease the load on the existing ones, or the
effects of adding faster processors.
Projecting the changed performance of the system is sometimes
called what-if analysis, because one is addressing questions like “What
if we add an I/O device?” or “What if we add a faster processor?”
Because the baseline model we described in Chapter 9 does not capture
the effect of asynchronous I/O, parallel CPUs, RAID (redundant array
of inexpensive disks) devices, or priority scheduling of any kind, it
should not be used to answer questions about the effects of changing
them. Of course, the model will tell us something about the values of
the response times in the absence of asynchronous I/O, since complete
serialization of I/O and processing was assumed. Moreover, because
utilizations depend only on processing time and arrival rate, and not
on scheduling policies, the baseline model will also tell us whether add-
ing load to the system will cause its capacity to be exceeded.
The inadequacies of the simple model illustrate a critical question
often faced by modelers and performance engineers: Is the performance
model that has been devised sufficient to address the concerns of the
system’s stakeholders, or are more accurate models that capture more
system details needed? The answer to this question could depend on
multiple factors, including the resources and time available to build
more sophisticated models, the level of expertise that is available within
the organization to address the associated modeling complexities and
interpret the results, and the availability of sophisticated modeling tools and of the data needed to compute the values of modeling
parameters. Purpose-built queueing network models of asynchronous
I/O are described in [HeidelbergerTrivedi1982]. If a queueing system is
not amenable to modeling with queueing network models, the use of
discrete event simulations may be appropriate [LawKelton1982].
In the remainder of this chapter we examine how one might go
about modeling the performance of a fictitious conveyor system. We
The PLCs use a local area network to communicate with one another
and to send queries to the parcel routing database. The responses to the
queries tell the PLCs whether a parcel should be diverted or moved
straight ahead.
The load drivers for the parcel routing database are parcel move-
ments, parcel location queries generated by people, and the occurrence
of parcel intake and parcel delivery events. The load drivers for the
monitoring station are status messages sent by the PLCs, including
alarms of any kind.
From the point of view of the PLCs and the systems for intake and
delivery, the parcel routing database is a black box whose operational
characteristics are the times taken to respond to queries and the con-
tents of the queries. From the point of view of the database, the queries,
intake registrations, and delivery registrations are streams of transac-
tions to be processed according to whatever business logic is required.
The operational characteristics of the local area network are the times
taken to deliver messages.
For the purpose of this exercise, let us assume that the response
time requirements of the parcel routing database have been specified,
and that the sum of the peak rates at which parcels pass the bar code
readers is known. Let us also assume that the message pattern between
PLCs is known and that the rates at which PLCs generate queries to the
parcel routing database at various times of day are also known. From
the standpoint of the PLCs, the query response time consists of the sum
of the database response time and the network delivery time of the
message carrying the response. The requirement for an upper bound
on the value of this sum is determined by the speed of the belt and the
distance a parcel must travel between the bar code reader and the next
point on the belt at which it might be diverted. The faster the conveyor,
the lower this sum can be. Similarly, the closer the bar code reader is to
the diversion point, the lower this sum can be. Our task is to determine
whether the sum of the database query response time, the network
delay, and the PLC processing time is less than the time it takes for the
parcel to travel from the bar code reader to the diversion point. It is
usually less costly to determine these delays individually in the lab
than it is to build an entire system and see how it performs once it is
switched on. It is also easier to model the PLC, database, and network
delays separately than to model them as one large system. The results
of the respective performance models might then be combined into a
whole, just as the system components are combined into a whole after
functional and integration testing.
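As a rough illustration of how this timing budget might be checked, the following sketch compares the time a parcel takes to travel from the bar code reader to the diversion point with the sum of the component delays. All of the numbers (belt speed, distance, and delays) are contrived for illustration; none come from an actual conveyor system.

# A minimal sketch with contrived values; not data from a real conveyor system.
belt_speed_m_per_s = 2.0        # assumed conveyor belt speed
reader_to_diverter_m = 3.0      # assumed distance from bar code reader to diversion point
time_budget_s = reader_to_diverter_m / belt_speed_m_per_s   # time before the parcel arrives

db_response_s = 0.40            # assumed database query response time
network_delay_s = 0.05          # assumed network delivery time of the response
plc_processing_s = 0.10         # assumed PLC processing time
total_delay_s = db_response_s + network_delay_s + plc_processing_s

print(f"time budget = {time_budget_s:.2f} s, total delay = {total_delay_s:.2f} s, "
      f"requirement met: {total_delay_s <= time_budget_s}")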
To summarize, we need to formulate the following models to eval-
uate the performance of the conveyor system:
The outputs of the first three models are inputs to the fourth model.
The outputs of the third model are inputs to the fifth model. The model
of the traffic attributable to the routing of parcels also describes the
demand made on the routing query database, that is, the sixth model.
All of these outputs cascade into the integrated model mentioned in
item 7. The network traffic model is described in [BSA2005].
Let us turn our attention to a model of the parcel routing database.
Because we have neither traffic data nor performance data for an actual system, nor even an actual architecture, we recommend the use of a
reference architecture and contrived model parameters to build an ini-
tial model of the system.
Figure: a job at the CPU issues synchronous I/O requests to device k with visit ratio V_{k,s}, and asynchronous I/O requests, whose start and completion are shown separately, with visit ratio V_{k,a}.
R_0 = Σ_{i≠k} V_i R_i + V_{k,a} d_k + V_{k,s} R_k    (10.2)
because return to the CPU is not delayed until completion of the I/O.
n_k = U_k / (1 − U_k),   0 ≤ U_k < 1    (10.4)
and X0 is the global system throughput. Using Little’s Law and the
expression for the average length of an M/M/1 queue, it can be
shown that the average amount of time to process I/O requests of
either kind is
R_k = S_k / (1 − U_k),   0 ≤ U_k < 1    (10.6)
R_{k,a} = V_{k,a} R_k    (10.7)
be less than or equal to the sum of these two delays. Hence, the total
time to complete all activity related to a job of this type, R_J, is governed
by the following inequality:
max(R_0, R_{k,a}) ≤ R_J ≤ R_0 + R_{k,a}    (10.8)
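To make the arithmetic of equations (10.2) through (10.8) concrete, the following sketch evaluates them for contrived visit ratios and service times. The numbers, and the assumptions that U_k = X_0 (V_{k,s} + V_{k,a}) S_k and that d_k is the per-request delay to initiate an asynchronous I/O, are illustrative only.

# A minimal sketch with contrived parameters; the formulas follow equations (10.2)-(10.8).
X0 = 5.0                                        # assumed system throughput (jobs per second)
other_devices = [(10.0, 0.004), (2.0, 0.010)]   # assumed (V_i, R_i) pairs for devices i != k
V_ks, V_ka = 3.0, 4.0         # assumed synchronous and asynchronous visit ratios to device k
S_k = 0.012                   # assumed service time per visit at device k (seconds)
d_k = 0.001                   # assumed delay to initiate an asynchronous I/O (seconds)

U_k = X0 * (V_ks + V_ka) * S_k   # utilization of device k (assumed Utilization Law form)
R_k = S_k / (1.0 - U_k)          # equation (10.6): per-visit response time at device k
R_ka = V_ka * R_k                # equation (10.7): total time spent on asynchronous I/O
R_0 = sum(V_i * R_i for V_i, R_i in other_devices) + V_ka * d_k + V_ks * R_k   # equation (10.2)

lower, upper = max(R_0, R_ka), R_0 + R_ka   # equation (10.8): bounds on the total job time R_J
print(f"R_0 = {R_0:.4f} s, R_ka = {R_ka:.4f} s, {lower:.4f} s <= R_J <= {upper:.4f} s")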
Let us now revisit the test data and initial modeling results we saw
in Chapter 9. Figure 10.2 shows the predicted and measured response
times for the overall response time, and the predicted response times
for the individual devices.
Here, we see that the predicted response time of the RAID device is
the driver of the predicted overall response time, because it is the bot-
tleneck device, while the predicted values of the combined response
time contributions of the CPU and system disks to the predicted
response times do not increase much with the load. Because we were
not able to capture the response times of the individual devices, we
cannot directly validate the predicted values of the response times.
Figure 10.2 Normalized average response time (seconds) versus normalized work units per second: the measured average response time, the predicted response times, the predicted response time of the RAID device, and the predicted combined response time of the CPU and system disk
Still, the shapes of the response time curves and their placement do
yield some insights and some caveats about both the model and the
system:
1. While the rise in the measured overall response time under the
asynchronous I/O regime is not as dramatic as the predicted
rise under the assumption that I/O is purely synchronous, it
is large enough for us to suppose that not all of the I/O at the
RAID device is asynchronous. If it were, the overall average
response time would hardly rise at all, because the combined
predicted average response time of the CPU and the system
disk does not increase much with the load.
2. Even if the response time perceived by the user does not
include the asynchronous portion of the I/O, the I/O will take
a great deal of time to be completed, so that any activity that
entails reading the information written to the I/O device will be
delayed until that I/O is completed. This implies that the over-
all system response time that should be measured and mod-
eled should include that delay. The performance measurements
do include that delay. The modified response time formula in
equation (10.2) may not do so, in part because it does not quan-
tify the extent of overlap between I/O and the activity at other
devices. Indeed, the extent of that overlap may be a function
of the load itself, as is indicated by the widening difference
between the overall measured response time and the predicted
response times of the CPU and system disks combined.
3. Modeling a RAID device as a single-server FCFS queue might
not be accurate. The predicted values of the response times
could be too high. Performance models of RAID devices are dis-
cussed in [LeeKatz1993], [CLGKP1994], and [TTH2012], among
others.
4. As we mentioned in Chapter 9, the system under study has
parallel CPUs and parallel cores. This was not captured in
our model. Some of the error in the predicted overall response
time may be due to this, but not all, because the CPU is lightly
loaded compared with the bottleneck device.
In summary, while the original coarse performance model correctly
predicts the qualitative nature of the performance of the system and the
load at which saturation occurs, and correctly tells us where the
10.6 Summary
The foregoing examples suggest that combining a coarse performance
model and a well-structured performance test is often sufficient for the
prediction of qualitative performance trends with respect to the drivers
of load and memory footprint. Examples of drivers of load include the
offered transaction rate in transaction systems and the rates at which
events occur in monitoring systems. Examples of drivers of the mem-
ory footprint include the sizes of the executable codes of all running
programs, the number of concurrently active transactions, the size of
the in-memory portion of any database, and, in the case of monitoring
systems, the number of devices being monitored and/or managed.
Investigating the performance impact of particular design choices,
scheduling rules, or hardware technologies might entail identifying the
areas where delays occur, and then building detailed models of these
aspects of the system in isolation. The results might then be incorpo-
rated into an integrated model of the system as a whole. Related tech-
niques include hierarchical decomposition [LZGS1984] and layered
queueing networks [FAWOD2009, XOWM2005].
For systems that have been measured while in production or while
being tested, a preliminary analysis of the measurements may provide
an indication of what part of the system may be causing performance
issues. For systems that are not yet in production, a traffic modeling
effort may be initiated because of early concerns about whether the
volume of work will exceed the capacity of one or more system compo-
nents. For example, in the case of the conveyor system, one may be able
to determine from design and protocol specifications that a given rate
of parcel movement could cause available bandwidth or I/O capacity
to be exceeded. If that is the case, investing in a detailed model captur-
ing a broad range of system functionalities would not be helpful.
Instead, one should work with the architects and other stakeholders to
determine methods to reduce the amount of message activity to the
point that the network bandwidth would be sufficient to handle the
anticipated parcel movement rate. Once concerns about network satu-
ration have been addressed, one may turn to concerns about any other
bottlenecks that might be unmasked as a result.
Deciding what we wish to learn from the model and how it will be
used in the future is a precondition for deciding what aspects of the
system must be modeled in detail and what aspects can be modeled
coarsely. If the model parameters are not available from measurements,
validation of the model by comparing it with measurement data will
not be possible. In that case, it will be necessary to use best-effort esti-
mates of the model parameters and run the model on various input
values to determine if the model’s predictions are highly sensitive to
them. Sensitivity to model parameters could be a sign that the intended
operating range of the system is near saturation, that the system is not
robust, or that the model is not robust. The robustness of the model and
of the system under study should always be examined carefully.
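A simple way to carry out such a sensitivity check is to sweep each uncertain parameter over a plausible range and compare the resulting predictions. The following sketch does this for a single service demand using an M/M/1-style response time formula; the arrival rate and the nominal demand are contrived.

# A minimal sketch: vary an uncertain service demand and observe the predicted response time.
arrival_rate = 40.0          # assumed transactions per second
nominal_demand = 0.020       # assumed best-effort estimate of the service demand (seconds)

for scale in (0.8, 0.9, 1.0, 1.1, 1.2):   # sweep the estimate by +/- 20%
    demand = nominal_demand * scale
    utilization = arrival_rate * demand
    if utilization >= 1.0:
        print(f"demand = {demand:.4f} s -> saturated (utilization {utilization:.2f})")
        continue
    response = demand / (1.0 - utilization)   # M/M/1 average response time
    print(f"demand = {demand:.4f} s, utilization = {utilization:.2f}, "
          f"response = {1000.0 * response:.1f} ms")

With these contrived numbers the prediction swings from roughly 44 ms to 600 ms over a ±20% change in the demand estimate, the kind of sensitivity that suggests the intended operating point lies near saturation.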
10.7 Exercises
10.1. In the computationally intense transaction processing system
discussed in this chapter and in Chapter 9, the definition of
the transaction response time has not been fully described.
The transaction response time could end when the last asyn-
chronous I/O has been initiated, or when the last I/O has been
completed.
(a) Give an expression for the transaction response time based
on the definition that the response time ends when the last
I/O to the asynchronous device has been initiated.
(b) Explain why this definition is unsatisfactory if the goal of
the transaction is to transform data for subsequent process-
ing by another computation that cannot proceed until the
transformed data is available on the asynchronous I/O
device.
(c) Give formulas for upper and lower bounds on the transac-
tion response time that include the final writing of data to
the asynchronous device.
(d) Explain the circumstances under which the bounds will hold.
10.2. The status of the conveyor system in this chapter is monitored
by a system that receives status messages at regular intervals
from the PLCs and from sensors that are connected to them.
Scalability and Performance
Scalability is a highly desirable and commercially necessary feature of
many systems. Despite that, there is no universally accepted definition
of it. There is no hard-and-fast rule about how to achieve it, although the
factors that might undermine it are often readily identifiable and easily
understood. In this chapter we shall explore some definitions of scalabil-
ity. We shall identify practices and system characteristics that are condu-
cive to it and patterns and characteristics that can undermine it. Scalability
pitfalls will be explored. We shall show how to plan performance tests to
verify scalability and interpret the test results accordingly.
CPU priority, heavy inbound packet traffic will delay I/O handling as
well. This delays information delivery from web servers.
A system may also have poor load scalability because one of the
resources it contains has a performance measure that is self-expanding,
that is, its expectation is an increasing function of itself. This may occur
in queueing systems in which a common FCFS work queue is used by
processes wishing to acquire resources or wishing to return them to a
free pool. This is because the holding time of a resource is increased by
contention for a like resource, whose holding time is increased by the
delay incurred by the customer wishing to free it. Self-expansion dimin-
ishes scalability by reducing the traffic volume at which saturation
occurs. In some cases, it might be detected when performance models of
the system in question based on fixed-point approximations predict that
performance measures will increase without bound, rather than con-
verging. In some cases, the presence of self-expansion may make the per-
formance of the system unpredictable when the system is heavily loaded.
Despite this, the operating region in which self-expansion is likely to
have the biggest impact may be readily identifiable: it is likely to be close
to the point at which the loading of an active or passive resource begins
to steeply increase delays, because it is close to saturation.
We have already seen that load scalability may be undermined by
inadequate parallelism. A quantitative method for describing parallel-
ism is given in [Latouche1981]. Parallelism may be regarded as inade-
quate if system structure prevents the use of multiple processors or
multiple cores for tasks that could be executed asynchronously. For
example, a transaction processing (TP) monitor might handle multiple
tasks that must all be executed within the context of a single process. If
the host operating system allows only one task within the TP monitor
to be executed at a time, only a single processor or core can be used.
Horizontal scaling across processors or cores is infeasible in such a sys-
tem. Similarly, single-threaded systems cannot make use of more than
one processor, either. In some cases, an application may perform multi-
ple activities. If the most CPU intensive of these can execute units of
work only serially, the load among the processors will be unbalanced.
Figure 11.1 %CPU utilization of CPU0 and CPU1, and their average utilization, versus offered load, based on contrived data
server with two processors running Solaris. The operating system uses
cache affinity to return a thread to the processor on which it last ran
before the most recent context switch suspending its execution.
Measurements of the system were taken at increasing loads, with each
load being run for a long period of time. The processor utilizations were
obtained using the mpstat command. The total processor utilizations by
individual threads could be computed by using the ps -eLF command
or corresponding system calls to obtain the differences between succes-
sive values of the cumulative processing times and dividing these by
the wall clock time between the observations. Since the observed utili-
zation of each processor is a linear function of the offered load, there is
no apparent software bottleneck. The processor utilizations computed
from ps -eLF observations corresponded to the utilizations obtained
from mpstat. From this and our knowledge of how the scheduling
works, we infer that particular threads were bound to particular pro-
cessors. Under this regime, the maximum achievable throughput is
lower than if the work could be evenly spread between the two proces-
sors, because the utilization of one CPU is much higher than that of the
other. Thus, the load scalability of this system is limited by the single-
threaded architecture of the application that executes the transactions
and the inherent seriality of the system [Gunther1998]. Figure 11.2
shows that with full load balancing, the maximum transaction through-
put could be increased from 8 to 11 transactions per second.
Note: The data in Figure 11.1 and Figure 11.2 has been contrived for
the purpose of illustration because the actual results are not publicly
available.
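The per-thread utilization calculation described above can be scripted. The sketch below assumes that two samples of cumulative per-thread CPU time, for example parsed from ps -eLF or from /proc, are already available; the thread names and the sample values are contrived.

# A minimal sketch: per-thread CPU utilization from two samples of cumulative CPU time.
wall_clock_interval_s = 60.0                     # seconds between the two observations
cpu_seconds_t1 = {"thread-1": 100.0, "thread-2": 40.0, "thread-3": 5.0}   # contrived samples
cpu_seconds_t2 = {"thread-1": 154.0, "thread-2": 52.0, "thread-3": 5.5}

for thread, first in cpu_seconds_t1.items():
    delta = cpu_seconds_t2[thread] - first       # CPU time consumed during the interval
    utilization = delta / wall_clock_interval_s
    print(f"{thread}: {100.0 * utilization:.1f}% of one processor")

A single thread whose utilization approaches 100% of one processor while the others remain nearly idle would be consistent with the unbalanced behavior shown in Figure 11.1.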
Figure 11.2 Parallel processors with balanced utilizations, based on contrived data
Figure: state diagrams of a process acquiring and freeing a memory lock over the memory bus (reading the lock, setting it if it is free, and trying again if it is already set), and of an Ethernet bus moving among idle, busy, collision, and backoff states as workstations transmit packets.
Both of these factors make the hanger holding time self-expanding. If the
holding time is self-expanding, the product of the customer arrival rate
and the hanger holding time—that is, the expected number of occupied
hangers—will increase to exceed the total number of hangers even if the
customer arrival rate does not increase. This is a sure sign of saturation.
Notice that the impediments to the scalability of this system vary
with the time of day. In the morning, when almost all visitors are leav-
ing coats, the impediments are the number of attendants and the some-
what confined space in which they work. The same is true at the end of
the day, when all visitors must pick up their coats by closing time. At
midday, the principal impediment is the FCFS queueing rule, which
leads to deadlock.
For this system, load scalability can be increased with the following
modifications:
can reduce memory cycle stealing and lock contention. In the second
example, we use simple expressions for delays to understand the driv-
ers of self-expansion when there is a common FCFS queue for the ele-
ments of a shared resource pool, such as hangers in the museum
checkroom.
f(p_L, p_S) = (aL / p_L) / (k + aL / p_S)    (11.1)
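The following sketch simply evaluates the cost ratio of equation (11.1) over a grid of the two lock success probabilities; the constants a, L, and k are assumed illustrative values, since their magnitudes are not given here.

# A minimal sketch: evaluate the cost ratio f(pL, pS) of equation (11.1) over a grid.
a, L, k = 1.0, 10.0, 5.0        # assumed illustrative constants, not measured values

def cost_ratio(p_lock, p_sem):
    return (a * L / p_lock) / (k + a * L / p_sem)

probabilities = (0.05, 0.25, 0.50, 0.75, 0.95)
for p_lock in probabilities:
    row = "  ".join(f"{cost_ratio(p_lock, p_sem):5.2f}" for p_sem in probabilities)
    print(f"pL = {p_lock:.2f}: {row}")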
Figure 11.5 Cost ratios of the expected number of lock attempts with straight locking and with semaphores, plotted against the lock success probability with straight locking and the lock success probability using semaphores with increasing levels of overhead (0, 5, 10 instruction cycles per attempt)
using the former is much more robust than that of a system using the
latter. To enhance scalability, we would choose the mechanism whose
performance is least sensitive to a change in the operating load. In this
instance, the semaphore mechanism is the better candidate, as prior
literature has led us to expect [DDB1981].
H = T + V
S = D + T + V
Since the museum is open for only a finite amount of time each day, M
say, we immediately have
S, H ≤ M
T = (c + a + 1)s
11.11 Summary
At the beginning of this chapter, we pointed out that the term scalability
is vague. Attempts that have been made to define it may differ in
semantics or scope, but they all relate to enabling a system or family of
systems to gracefully grow or shrink to accommodate changes in
demand and the number of objects within their scope. The various def-
initions have been attempted so that one can determine the extent to
which scalability is a desirable quality attribute that confers technical
and commercial advantages on a system’s stakeholders.
To place scalability on a solid footing, one should view it in the
context of performance requirements and performance metrics. As we
saw in earlier chapters, performance requirements should be linked to
business and engineering needs so that the cost of meeting them can be
justified. The performance requirements, whether present or antici-
pated, can guide us in determining the extent of scalability that is
required to meet stakeholders’ needs.
11.12 Exercises
11.1. In a performance test, transactions are sent to an online trans-
action processing system at regular intervals. The CPU utiliza-
tion of this system increases quadratically with time, while the
size of the swap space and the amount of occupied disk space
increase linearly with time.
(a) Explain why this system has poor load scalability in its present form.
(b) Identify a simple data structure and algorithm that might
be in use in this application. If the development team con-
firms that this is indeed what is being used, explain what
should be used in its place to prevent this problem.
11.2. You are participating in an architecture review of a computa-
tionally intense system in which successively arriving trans-
actions operate on completely disjoint sets of data. Upon
arrival, an arbitrary number of transactions may go through
some preprocessing concurrently. Once the preprocessing is
complete, each transaction joins a queue for a software com-
ponent that can process only one transaction at a time, because
it is single-threaded. The application is hosted by a system
Performance Engineering Pitfalls
Choices of scheduling policies or the use of new technologies are some-
times made with the intent of increasing capacity, increasing the sustain-
able load, shortening average response times, or pleasing one or more
stakeholders or constituencies. In many of these cases, the proposed
modification will not result in the achievement of the stated performance
goal. Indeed, some performance engineering choices may cause undesir-
able side effects or even worsen performance, while incurring consider-
able implementation and testing costs. The introduction of priority
scheduling can lead to the starvation of lower-priority tasks and have
other unintended side effects. Adding processors can sometimes worsen
performance. Spawning all tasks of a specific type as threads within a
single process or virtual machine can limit parallelism and diminish sys-
tem reliability. Physical limitations on the potential instruction rates of
individual processors will make it ever more necessary to use concur-
rently executing processes and threads to shorten the total execution
times of applications and to increase system throughput. Even then, the
individual threads of execution must be implemented with performance
consideration in mind if the greatest use is to be gained from available
processing, network, and I/O resources. Virtualized environments have
performance measurement and engineering issues of their own, which
we shall also explore. Finally, we consider organizational pitfalls in per-
formance engineering, including the failure to collect or review data
about the performance of systems in production.
12.1 Overview
As performance engineers, we are often confronted with stakeholders
who believe that the introduction of a particular scheduling policy or
the use of a new technology must inevitably bring about performance
improvements. Many performance practitioners have encountered or
read about cases in which the well-intentioned use of a scheduling rule
has had consequences for performance and service quality that might
not have been previously imagined. In previous chapters we described
cases in which the introduction of new capacity, such as adding proces-
sors, degraded performance or did not provide performance gains that
were commensurate with the cost of the new hardware or other technol-
ogy. In this chapter we shall explore some of these potential pitfalls.
We will see that while priority scheduling can provide performance
benefits in some situations, it can be detrimental or be of no benefit in
others. It cannot increase the capacity of system resources such as
processors, I/O devices, and network bandwidth. As we saw in
Chapter 11, it can have unintended consequences. As we saw
in Chapter 10, asynchronous activity can shorten total execution times,
but it cannot in and of itself increase system capacity. The use of multi-
ple processors and cores can degrade system performance by increas-
ing memory bus contention and lock contention, so the number of
processors competing for these resources must be chosen carefully.
While the automated garbage collection used in programming lan-
guages such as Java might relieve the programmer of the responsibility
to free up unused objects, its spontaneous occurrence with its associ-
ated processing cost can seriously degrade system performance when
capacity is most needed. Finally, we briefly examine the performance
engineering pitfalls of virtual machines. These are intended to provide
contained environments for execution in server farms. They can also be
used to provide contained, isolated environments for functional test-
ing. Their use for performance testing is questionable, because there is
no way of mapping resource consumption time in the virtual environ-
ment to corresponding values in a physical environment.
rather than with respect to physical (wall) clock time. The resource
usage may or may not reflect contention by other processes running in
different virtual machines.
• Pitfall: The resource utilizations indicated by virtual machines
may not be true indicators of resource utilization.
• Reason this is a problem: Based on this incorrect data, the system
could be modified in a way that makes matters worse.
• Mitigation: Measure the system in isolation on a real machine to
avoid confounding.
Virtual machines are programs that can mimic diverse operating
systems on a single host. For example, they can emulate UNIX, Linux,
and Windows environments while keeping each logically hidden from
the others. They are useful for functional testing because they provide
contained environments that prevent programs from running amok
and interfering with the memory address spaces of other programs. If
a virtual machine hangs, that is, ceases functioning, it does not disrupt
the operation of its host. It merely stops. Other virtual machines on the
host can continue to execute. Virtual machines are also seen as attrac-
tive because they can collectively make use of idle processing power in
a host, even if the applications running within each one are I/O bound.
At the same time, the processes within a virtual machine are invisible
to those in other virtual machines. This provides privacy and protec-
tion in a shared environment which might otherwise not be available.
Scheduling and synchronization of processes and threads can occur
only within the context of a virtual machine. Operating system con-
structs cannot be used to implement interprocess communication across
virtual machine boundaries. It might be possible for virtual machines to
communicate with one another via TCP sockets or other network
devices, but synchronization would be enforced entirely at the applica-
tion level in that case, and not at the operating system level. It may
therefore be difficult to detect bugs attributable to faulty or mismatched
communications between processes in different machines by referring
to the resource usage measurements of processes and threads alone.
A process might occupy 100% of the processing time available to a virtual machine, but not necessarily 100% of the processing time relative to the wall clock time. If a virtual machine is starved of access to the CPUs by other virtual machines operating on the same host, the physical processor utilization of a process would be the per-process utilization measured within the virtual machine multiplied by the physical processor utilization of the containing virtual machine itself. For example, a process that appears to use 100% of the CPU within its virtual machine uses only 40% of a physical processor if the virtual machine itself receives only 40% of the physical CPU time.
x_{i+1} − x_i ≠ v_i (t_{i+1} − t_i)
even if the speed is more or less constant and the difference between
the time stamps is small. If the motion of the vehicle is very steady
compared with the granularity of the observations, it may be best to
take the observed speeds and distances traveled at face value and treat
the time stamps as approximate at best. Analogous discrepancies may
arise in system measurements that seem to fail to satisfy Little’s Law or
the Utilization Law over short time intervals. Unless there are forensic
considerations such as a crime or crash investigation, it may be worth
smoothing the performance measurements by averaging them over
longer time periods to overcome the consequences of clock variability.
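Such smoothing can be as simple as averaging consecutive samples over longer windows, as in the following sketch; the utilization samples are contrived.

# A minimal sketch: smooth per-interval utilization samples by averaging over longer windows.
samples = [0.42, 0.95, 0.38, 0.41, 0.88, 0.40, 0.43, 0.39, 0.90, 0.44, 0.41, 0.37]  # contrived

def smooth(values, window):
    # average each consecutive group of `window` samples (the last group may be shorter)
    return [sum(values[i:i + window]) / len(values[i:i + window])
            for i in range(0, len(values), window)]

print("raw samples:", samples)
print("window of 3:", [round(v, 2) for v in smooth(samples, 3)])
print("window of 6:", [round(v, 2) for v in smooth(samples, 6)])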
U_i = X_0 D_i,   i = 1, 2, ..., K
The bottleneck device is the one with the largest D_i. Denote the sorted demands by D_{(1)} ≤ D_{(2)} ≤ ... ≤ D_{(K)}, so that the maximum attainable throughput is bounded above by 1/D_{(K)}. If the demand at the bottleneck device is reduced far enough, the device with the next largest demand becomes the new bottleneck, and the maximum attainable throughput can increase by a factor of at most

[1/D_{(K−1)}] / [1/D_{(K)}] = D_{(K)} / D_{(K−1)}
which is greater than one by construction. Hence, if the CPU is the bot-
tleneck device in the original system, doubling its speed (and hence
halving the demand for processing time) will not double the maximum
attainable system throughput unless D_{(K)} / D_{(K−1)} ≥ 2.
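The following sketch illustrates this bound numerically with contrived service demands: doubling the speed of the bottleneck device raises the maximum attainable throughput only until the device with the next largest demand becomes the new bottleneck.

# A minimal sketch with contrived service demands D_i (seconds of service per job at device i).
demands = {"CPU": 0.050, "disk1": 0.030, "disk2": 0.020}

def max_throughput(d):
    return 1.0 / max(d.values())     # upper bound on X_0 set by the bottleneck device

bottleneck = max(demands, key=demands.get)
before = max_throughput(demands)

demands_after = dict(demands)
demands_after[bottleneck] /= 2.0     # double the bottleneck device's speed
after = max_throughput(demands_after)

print(f"bottleneck = {bottleneck}, max throughput before = {before:.1f}/s, "
      f"after = {after:.1f}/s, speedup = {after / before:.2f}x")

With these numbers the speedup is D_{(K)} / D_{(K−1)} = 0.050/0.030 ≈ 1.67, not 2.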
12.12 Summary
Examples of performance pitfalls in this chapter and throughout this
book show that performance pitfalls can occur in many guises. They
can be inherent in memory management techniques. They can arise
because of organizational decisions or because of misconceptions about
the work conservation properties of scheduling rules. They can occur
because of inaccurate measurements or because of organizational anxi-
ety that leads to the misinterpretation of why performance measures
such as resource utilizations might have high values for a short amount
of time. There is no hard-and-fast rule for avoiding these pitfalls. Only
healthy vigilance and skepticism in the light of experience and clear
analysis can be used to prevent, mitigate, or remedy them.
12.13 Exercises
12.1. Consider a replicated database system in which copies are
stored on separate hosts. Strict consistency between the rep-
licates is required. That means that commitment on one copy
cannot take place unless it takes place on both copies.
(a) Should the update begin on the busier host or the less busy
host? Explain.
(b) Should the process or thread that is beginning updates at
the second host be given priority over threads that are doing
the first update of some other record? (Hint: Think of the
museum checkroom problem discussed in Chapter 11 (scal-
ability). For a discussion of this problem, see [BondiJin1996].)
12.2. Explain why giving high priority to a workload that dominates
the usage of any resource may not be helpful.
12.3. The average CPU utilization of a quad-processor system is only
25%, yet the utilization of one of the processors is 100% while
the other three processors are idle. Absent further information,
what do you suspect about the response time? What suspicions
do you have about the system architecture? What will you try
to find out about the executing processes?
12.4. For the system in Exercise 3.4 with device characteristics like
those in the following table, what is the maximum system
throughput that can be attained if the speed of the bottleneck
device is doubled?
13.1 Overview
Agile software development processes use sprints lasting a few weeks
at most to provide frequent opportunities for iteration between the con-
ception of the software’s purpose, design, implementation, and testing.
Each iteration occurs during the course of a sprint. The early testing that
the next sprint so that all test cases in the current sprint are executed at
least once. In the case of large projects, this is something that may have
to be discussed in “Scrum of Scrums” meetings, at which scrum masters
or other representatives of individual scrum teams meet to discuss
blocking issues that can be resolved only in cooperation with other
teams.
following business day. The team could not proceed with the execution
of test cases until they had identified a work-around to the most
obstructive problem and implemented it.
This experience taught the team a valuable lesson: obstacles encountered
in the execution of a test plan can provide useful insights into the way
the system behaves. It also showed the team that performance testing
is extremely valuable for provoking concurrency and scheduling prob-
lems that could not have shown up in unit testing.
13.5 Summary
The foregoing narrative shows that it is possible to use agile methods to
conduct performance engineering activities such as performance testing
even when the system under test is not developed under an agile process.
Interestingly, the clear separation of the stages of the software lifecycle
enables the development and execution of performance tests to be con-
ducted under whatever process works best for the performance testing
team. Our experience has been that agile development can be very effec-
tive in these circumstances when testing team members are under the
guidance of an experienced performance engineer with agile experience.
When performance engineering is part of the development sprints, con-
siderable advance preparation of testing tools and data analysis tools is
needed to enable timely performance testing under constraints imposed
by the short durations of sprints and the likelihood that completion of the
functional testing that must precede performance testing will be delayed.
13.6 Exercises
13.1. You are the lead performance engineer in a team that is cur-
rently planning the sprints of a large software development
effort. You must negotiate time for the preparation of measure-
ment instrumentation and load generators to verify that the
performance of the system is sound.
(a) At this stage, the performance requirements of the system
are likely to be unstable, assuming that anyone has speci-
fied them, which may be unlikely. The functional require-
ments may not be fully specified either. What instrumentation
can you procure or prepare now or in an early sprint to
Working with Stakeholders to Learn, Influence, and Tell the Performance Engineering Story
The effectiveness of performance engineering depends very heavily on
one’s ability to learn about the system and stakeholders’ concerns and to
relate one’s analysis and recommendations to them in terms they can
understand. The first step is to understand what aspects of performance
matter to which stakeholders so that their concerns can be documented
and articulated. At the same time, the performance engineer must assure
all stakeholders that one of the main goals of a performance engineering
effort is to ensure that performance concerns and customer expectations
are addressed, while providing them with the means and tools to meet
them. At every stage, the performance engineer needs to show that the
performance effort is adding value to the product, directly or by mitigat-
ing business and engineering risks. In this chapter, we describe which
aspects of performance may matter to whom and then explore where the
14.4.3.1 Architects
The architect is usually, but not always, the sole player in the software
project who interacts with all stakeholders and who has a global view
of the pieces of the system and how they relate to one another
[Paulish2002]. In the author’s experience as a performance engineer,
and in the experience of a number of fellow performance analysts with
whom he has worked, a performance engineer complements this role
by examining information flows throughout the system and by
identifying bottlenecks as well as performance requirements and other
quality attributes.
Usually, it is the architect who has the mandate from the project’s
customer to see to it that the implementation is performant and scalable, and to require that specific solutions be adopted to achieve that
end. The architect therefore has the responsibility for ensuring that
the performance characteristics of solutions and
technologies are well understood and, where they are not understood,
that an understanding be acquired through early testing to minimize
business risk [MBH2005] or through consultations with trusted
colleagues with related experience. This is essential to containing the
business and engineering risks inherent in the project.
At the same time, the architect must ensure that solutions are cost-
effective, and that they comply with regulatory requirements and
legacy constraints. An example of a regulatory requirement would be
14.4.3.2 Management
While managers may understand that inadequate performance poses a
risk to their systems, they may need to be convinced of the value of
efforts to mitigate that risk and of the benefits of remedying performance
deficiencies or proactively avoiding them altogether. They will also need
to be reassured that proposed remedies will be effective or, in some cases,
that all is well with the system and that only minor changes are needed
to ensure that performance needs are met. Where third-party subsys-
tems, platforms, and hardware are involved, the performance engineer
should take the initiative to provide recommendations about the perfor-
mance requirements of those subsystems and explain how they support
the overall performance needs of the system. The performance engineer
may also have to understand and explain the limitations of those third-
party elements, so as to avoid specifying requirements that are infeasible
with the technology available. Moreover, the performance engineer may
be asked to assist management in negotiations with third-party suppli-
ers to ensure that performance needs are met.
14.6 Examples
A performance engineer was once called in to evaluate the performance
of a system that was close to delivery. There were concerns about where
the bottlenecks were and whether the system would support the
desired transaction volume. Measurements that had already been sys-
tematically collected showed high resource utilizations under a load
close to the target load. Response times were higher than desired, yet
processor utilization was below saturation, while I/O utilizations
seemed high but not intolerable. The testing team had also collected
measurements showing that the system spent very small amounts of
time in large numbers of sections of the code, through both profiling
and observing processor utilizations by threads. Thus, the system had
poor locality of reference. The performance engineer and the testing
team put together a quick performance test plan to use the instrumen-
tation in the operating system to measure resource utilizations at loads
of n, 2n, 3n, 4n, and 5n transactions per second. Resource utilizations
were linear in the load, indicating that there was no software bottle-
neck. The cache hit ratio was low. Combining these observations with
the knowledge that the system had poor locality of reference, we came
to the conclusion that the architecture of the system was sound from a
performance standpoint, but that the memory bus and memory cycle
times were degrading performance. The test results were presented
with all stakeholders in the room. Each one was given the opportunity
to relate the observations about the measurements with the parts of the
system he or she knew best, and a course of action was then mapped
out to resolve the problem.
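The linearity check described in this example can be automated: fit a straight line to the utilization measurements taken at loads of n through 5n and examine how well it explains them. The sketch below uses contrived measurements; markedly sub- or super-linear growth would call for a closer look.

# A minimal sketch: check whether resource utilization grows linearly with the offered load.
loads = [1.0, 2.0, 3.0, 4.0, 5.0]              # offered load in multiples of n transactions/second
utilizations = [0.12, 0.23, 0.35, 0.47, 0.58]  # contrived measured utilizations

n = len(loads)
mean_x = sum(loads) / n
mean_y = sum(utilizations) / n
slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(loads, utilizations))
         / sum((x - mean_x) ** 2 for x in loads))
intercept = mean_y - slope * mean_x

ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(loads, utilizations))
ss_tot = sum((y - mean_y) ** 2 for y in utilizations)
r_squared = 1.0 - ss_res / ss_tot
print(f"slope = {slope:.3f} per unit load, intercept = {intercept:.3f}, R^2 = {r_squared:.4f}")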
A performance engineer on a prolonged engagement developed
automated testing and analysis tools to provide performance assurance
of a system with a very large capital cost and with a large number of
use cases. Tightly integrating the use of these tools into the testing and
development processes ensured that performance issues were identi-
fied well before the delivery of each release, thus reducing the risk of
user dissatisfaction while ensuring the timely remedy of performance
issues as they arose. In this case, the performance test results and their
analysis were immediately shared with test engineers, development
managers, and architects to assure timely solutions of the problems
encountered. Presentations containing condensed depictions of the
performance data and what they said about the system were given to
management from time to time, and the full detailed data was shared
with those who were more closely involved with responding to it as
needed.
Figure: percent processor time (all cores) over successive time points, plotted using the original measurement interval length, double the interval length, and ten intervals combined
The peaks are lower and appear to last longer when the intervals are
grouped. The peaks are lower with grouping because the utiliza-
tions were averaged over longer intervals, and they appear to be
longer because the groups of measurement intervals cover longer
time periods. Grouping the intervals masks the peaks. This can be a
disadvantage if we are trying to detect the presence of oscillations as
was discussed in Chapter 8. It can also be an advantage if one is try-
ing to determine the overall average utilizations for capacity plan-
ning purposes and if spikes in delays are not a concern. Figures 14.2,
14.3, and 14.4 show histograms of the distributions of the numbers
of intervals with utilizations at each level. The histograms show that
the distribution of the number of intervals in which the utilizations
achieve specific levels is highly sensitive to their lengths. The wide
variation in distributions and in the frequencies with which the
peaks occur suggests that peak utilizations should be a concern only
when tolerance for variability of the overall response time is very
small, especially when the peaks have short duration and when the
average utilization is a good deal smaller than the peak. It is
Figure 14.2 Histogram showing the distributions of the numbers of intervals with
different measured utilizations
Figure 14.3 Histogram showing the distributions of the numbers of pairs of intervals with
different average utilizations
Figure 14.4 Histogram showing the distributions of the numbers of groups of ten intervals
with different average utilizations
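Histograms like those in Figures 14.2 through 14.4 can be produced by first averaging the utilization samples over groups of intervals and then counting how many averaged values fall into each utilization band. The sketch below uses contrived samples and 5% bands.

# A minimal sketch: group interval utilizations and histogram them, as in Figures 14.2-14.4.
import random

random.seed(1)
samples = [min(100.0, max(0.0, random.gauss(35.0, 20.0))) for _ in range(240)]  # contrived %

def group_average(values, group_size):
    # average each consecutive group of `group_size` intervals (the last group may be shorter)
    return [sum(values[i:i + group_size]) / len(values[i:i + group_size])
            for i in range(0, len(values), group_size)]

def histogram(values, band_width=5.0):
    counts = {}
    for v in values:
        band = int(v // band_width) * band_width
        counts[band] = counts.get(band, 0) + 1
    return dict(sorted(counts.items()))

for group_size in (1, 2, 10):      # original intervals, pairs of intervals, groups of ten
    print(f"group size {group_size}: {histogram(group_average(samples, group_size))}")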
14.11 Summary
In many ways, the practice of performance engineering is like resolv-
ing the story of the three blind men and the elephant. One man touched
the elephant’s tail and said an elephant was like a rope. One touched
the elephant’s leg and said it was like a tree. Another touched the ele-
phant’s trunk and said it was like a snake. All three men were correct,
but none on his own could fully describe the elephant. One must com-
municate with multiple stakeholders to gain a solid idea of a system’s
14.12 Exercises
14.1. The state of Catawba has its own successful health insurance
exchange as authorized by the Affordable Care Act. Catawba
decides to offer a state-subsidized dental insurance scheme to
senior citizens who do not have other coverage, because pub-
licly insured dental care is not provided by their government-
run insurance plans. You are a performance engineer who has
been called in by the Catawba health department to advise
on ensuring the performance of the system with the added
workload.
(a) Describe a general high-level plan for a performance pro-
cess to ensure the smooth introduction of a web-based
scheme for applying for the new dental insurance scheme.
Your plan should include steps for baselining the load and
performance of the existing system, estimating the system
demand of the new service, and ensuring the performance
of the new service on the same platform as the existing ser-
vices. Do you need to conduct a full performance study of
the existing system before you proceed? Explain.
(b) Identify those parts of your plan that can be executed
regardless of changing rules, functional requirements, and
changing performance requirements.
(c) Identify those parts of your plan that depend on changing
functional and performance requirements.
15.1 Overview
Performance engineering is a rich area that uses techniques, tools, and
disciplines from many different fields, depending on the nature of the
problem at hand. In this book we have shown how basic performance
15.9 Summary
Software and system performance engineering encompass many disci-
plines. New performance issues arise as technology changes, yet the
underlying principles of measurement and analysis remain unchanged.
Performance issues that were identified and studied many years ago
will continue to recur in different guises. The literature we have cited
in this chapter and elsewhere in this book covers operating systems
principles, performance engineering, database design, queueing the-
ory, statistics, simulation, operations research, and requirements engi-
neering among many others. The author has used all of these disciplines
and more to assure the sound performance of the systems on which he
has worked. It is hoped that the reader will be able to do likewise.
References
[Bass2007] Bass, Len, Robert L. Nord, William Wood, and David Zubrow. Risk themes
discovered through architecture evaluations. WICSA 2007, Mumbai, India, January
2007.
[BCM2004] Blackburn, S. M., P. Cheng, and K. S. McKinley. Myths and realities: The
performance impact of garbage collection. Proc. ACM SIGMETRICS 2004, 25–36,
2004.
[BCMP1975] Baskett, F., K. M. Chandy, R. R. Muntz, and F. Palacios. Open, closed, and
mixed networks of queues with different classes of customers. JACM 22 (2),
248–260, 1975.
[BhatMiller2002] Bhat, U. N., and G. K. Miller. Elements of Applied Stochastic Processes.
Wiley-Interscience, 2002.
[Boehm1988] Boehm, B. A spiral model of software development and enhancement.
IEEE Computer 21 (5), 61–72, 1988.
[Bok2010] Bok, Derek. The Politics of Happiness: What Government Can Learn from
the New Research on Well-Being. Princeton University Press, 2010.
[Bondi1989] Bondi, A. B. An analysis of finite capacity queues with common or reserved
waiting areas. Computers and Operations Research 16 (3), 217–233, 1989.
[Bondi1992] Bondi, A. B. A study of a state-dependent job admission policy in a
computer system with restricted memory partitions. Performance Evaluation 15 (3),
133–153, 1992.
[Bondi1997a] Bondi, A. B. A model of the simultaneous possession of agents and trunks
with automated recorded announcement. In Proc. ITC15, edited by V. Ramaswami
and P. E. Wirth, 1347–1358. Elsevier, 1997.
[Bondi1997b] Bondi, A. B. A non-blocking mechanism for regulating the transmission
of network management polls. Proc. ISINM97, 565–580, San Diego, California,
May 1997.
[Bondi1998] Bondi, A. B. Network management system with improved node discovery
and monitoring. US Patent No. 5710885, issued January 20, 1998.
[Bondi2000] Bondi, A. B. Characteristics of scalability and their impact on performance.
Proc. 2nd International Workshop on Software and Performance (WOSP 2000), Ottawa,
Canada, 195–203, September 2000.
[Bondi2007a] Bondi, A. B. Automating the analysis of load test results to assess the scal-
ability and stability of a component-based SOA-based system. Proc. CMG 2007,
San Diego, California, December 2007.
[Bondi2007b] Bondi, A. B. Experience with incremental performance testing of a sys-
tem based on a modular or service-oriented architecture. Proc. ROSATEA, Medford,
Massachusetts, July 2007.
[BondiBuzen1984] Bondi, A. B., and J. P. Buzen. The response times of priority classes
under preemptive resume in M/G/m queues. Proc. ACM SIGMETRICS 1984,
195–201, 1984.
[BondiJin1996] Bondi, A. B., and V. Y. Jin. A performance model of a design for a mini-
mally replicated distributed database for database-driven telecommunications
services. Distributed and Parallel Databases 4, 295–317, 1996.
[BondiRos2009] Bondi, A. B., and J. Ros. Experience with training a remotely located
performance test team in a quasi-agile global environment. Proc. International
Conference on Global Software Engineering, Limerick, Ireland, July 2009.
[Killelea2000] Killelea, Patrick. Java threads may not use all your CPUs. Java World,
August 11, 2000. www.javaworld.com/article/2076147/java-web-development/
java-threads-may-not-use-all-your-cpus.html.
[Kleinrock1975] Kleinrock, L. Queueing Systems, Volume 1: Theory. Wiley, 1975.
[Kleinrock1976] Kleinrock, L. Queueing Systems, Volume 2: Applications. Wiley, 1976.
[Koenigsberg1958] Koenigsberg, E. Cyclic queues. Operations Research 9 (1), 22–35, 1958.
[KS2009] Kaeli, D., and Kai Sachs. Computer Performance Evaluation and Benchmarking:
SPEC Benchmark Workshop, Austin, Texas, January 2009. Lecture Notes in Computer
Science 5419. Springer, 2009.
[Latouche1981] Latouche, G. Algorithmic analysis of a multiprogramming-
multiprocessing computer system. JACM 28 (4), 662–679, 1981.
[LawKelton1982] Law, A. M., and W. David Kelton. Simulation Modeling and Analysis.
McGraw-Hill, 1982.
[LeeKatz1993] Lee, E. K., and R. H. Katz. An analytic model of disk arrays. Proc. ACM
SIGMETRICS 1993, 98–109, Santa Clara, California, 1993.
[Lilja2000] Lilja, David J. Measuring Computer Performance: A Practitioner’s Guide. Cam-
bridge University Press, 2000.
[Little1961] Little, J. D. C. A proof for the queuing formula L = λW. Operations Research
9 (3), 383–387, 1961.
[LZGS1984] Lazowska, E. D., J. Zahorjan, G. S. Graham, and K. C. Sevcik. Quantitative
System Performance. Prentice Hall, 1984. Also available online at www.cs.washington.edu/homes/lazowska/qsp/.
[MBH2005] Masticola, S., A. B. Bondi, and M. Hettish. Model-based scalability estima-
tion in inception-phase software architecture. In ACM/IEEE 8th International Confer-
ence on Model-Driven Engineering Languages and Systems, 2005. Lecture Notes in
Computer Science 3713, 355–366. Springer, 2005.
[MenasceAlmeida2000] Menasce, D. A., and V. A. F. Almeida. Scaling for E-Business:
Technologies, Models, Performance, and Capacity Planning. Prentice Hall, 2000.
[MenasceAlmeida2002] Menasce, D. A., and V. A. F. Almeida. Capacity Planning for Web
Services: Metrics, Models, and Methods. Prentice Hall, 2002.
[Microsoft2007] Microsoft Corporation. Performance Testing for Web Applications.
O’Reilly, 2007.
[MogRam1997] Mogul, J. C., and K. K. Ramakrishnan. Eliminating receive livelock in an
interrupt-driven kernel. ACM Transactions on Computer Systems 15 (3), 217–252, 1997.
[Mossburg2009] Mossburg, Marta. Happiness is no metric for a country’s success.
Washington Examiner, September 18, 2009. http://washingtonexaminer.com/article/33621#.UDvD2qC058E.
[MSOFTSUPPORT1] http://support.microsoft.com/kb/310067.
[Munin2008] Pohl, G., and M. Renner. Munin: Graphisches Netzwerk- und System-
Monitoring. Open Source Press, 2008.
[NagVaj2009] Nagarajan, S. N., and S. Vajravelu. Avoiding performance engineering
pitfalls. In Performance Engineering and Enhancement, SET Labs Briefings 7 (1), 9–14,
Infosys, 2009. Available online at www.infosys.com/infosys-labs/publications/
Documents/SETLabs-briefings-performance-engineering.pdf.
References 383
Architecture, continued
    skills needed by performance engineers, 8
    structuring tests to reflect scalability of, 228–229
    understanding before testing, 211–212
    understanding impact of existing, 346–347
Arrival rate
    characterizing queue performance, 42
    connection between models, requirements, and tests, 79
    formulating performance requirements to facilitate testing, 159
    modeling principles, 201
    quantifying device loadings and flow through computer systems, 56
Arrival Theorem (Sevcik-Mitrani Theorem), 70, 74
The Art of Computer Systems Performance Analysis (Jain), 371
Association for Computing Machinery (ACM), 370
Assumptions
    in modeling asynchronous I/O, 262
    in performance requirements documents, 152
Asynchronous activity
    impact on performance bounds, 66–67
    modeling asynchronous I/O, 260–266
    parallelism and, 294
    queueing models and, 255
Atomicity, consistency, isolation, and durability (ACID), 287
Audience, specifying in performance requirements document, 151–152
Automating
    data analysis, 244–245
    testing, 213, 244–245
Average device utilization
    definition of common metrics, 20
    formula for, 21
Average service time, in Utilization Law, 45–47
Average throughput, 20
Averaging time window, measuring utilization and, 175–177

B
Back-end databases, understanding architecture before testing, 211–212
Background activities
    identifying concerns and drivers in performance story, 344–345
    resource consumption by, 205
Bandwidth
    linking performance requirements to engineering needs, 108
    measuring utilization, 174–175
    sustainable load and, 127
“Bang the system as hard as you can” testing method
    example of wrong way to evaluate throughput, 208–209
    as provocative performance testing, 209–210
Banking systems
    example of multiple-class queueing networks, 72
    reference workload example, 88
    scheduling periodic loads and peaks, 267
Baseline models
    determining resource requirements, 7
    using validated model as baseline, 255
Batch processing, in single-class closed queueing network model, 60
BCMP Theorem, 68, 73
Bentham, Jeremy, 20
Bohr bug, 209
Bottlenecks
    contention and, 260
    eliminating unmasks new pitfall, 319–321
    improving load scalability, 294
    measuring processor utilization by individual processes, 171
    modeling principles, 201–202
    performance modeling and, 10
    in single-class closed queueing networks, 63
    software bottlenecks, 314
    upper bounds on system throughput and, 56–58
Interactions
    in performance engineering, 13–14
    in performance requirements documents, 151
Interarrival time, queueing and, 39–41. see also Arrival rate
International Conference on Performance Engineering (ICPE), 370
Interoperability, in performance requirements documents, 151
Interpreting measurements, in virtual environments, 195
Interpreting test results
    applying results and, 330–331
    service use cases example, 231–235
    system with computationally intense transactions, 237–241
    system with memory leak and deadlocks, 241–243
    transaction system with high failure rate, 235–237
Introduction to Queueing Theory (Cooper), 372
Investments, in performance testing and engineering, 6–7
I/O (input/output)
    asynchronous activity impacting performance bounds, 66–67
    benefits and pitfalls of priority scheduling, 310
    load scalability and scheduling rules and, 278–279
    measuring disk utilization, 173
    quantifying device loadings and flow through a computer system, 54–56
    single-server queues and, 42
    sustainable load and, 127
    where processing time increases per unit of work, 267
I/O devices
    modeling principles, 201
    in simple queueing networks, 53–54
iostat (Linux/UNIX OSs), measuring CPU utilization, 171
IS (Infinite Service), regularity conditions for computationally tractable queueing network models, 68–69
Isolation property, ACID properties, 287

J
Jackson’s Theorem
    multiple-class queueing networks and, 74
    single-class queueing networks and, 59–60
Java
    garbage collection and, 315
    performance tuning resources, 374
    virtual machines, 317
Journal of the Association for Computing Machinery (JACM), 370
Journals, learning resources for performance engineering, 369–370

K
Kernel mode (Linux/UNIX OSs), measuring CPU utilization, 171

L
Labs
    investing in lab time for measurement and testing, 7
    lab discipline, 217
Last Come First Served (LCFS), 44
Last Come First Served Preemptive Resume (LCFSPR)
    regularity conditions for computationally tractable queueing network models, 68–69
    types of queueing disciplines, 44
Layout, of performance requirements, 153–155
Learning resources, for performance engineering
    conferences and journals, 369–370
    discrete event simulation, 372–373
    overview, 367–369
    performance tuning, 374–375
    queueing theory, 372
    statistical methods, 374
    summary, 375
    system performance evaluation, 373
    texts on performance analysis, 370–371
Legacy system, pitfall in transition to new system, 156–158
Linear regression, 374
Linearity, properties of metrics, 24–25
Locks
    benefits of priority scheduling for releasing, 309
    busy waiting, 285–286
    coarse granularity of, 287
    comparing with semaphores, 296–298
    row-locking vs. table-level locking, 301
Loops, processor usage and, 169
Lost calls/lost work
    performance requirements related to, 134–135
    queues/queueing and, 75–77
Lost packets, 134–135
Lower bounds, on system response times, 58

M
Management
    of performance requirements, 155–156
    as stakeholder, 349–350
Management information bases (MIBs), 185
Mapping application domains, to workloads
    example of airport conveyor system, 92–94
    example of fire alarm system, 94–95
    example of online securities trading system, 91–92
Market segments, linking performance requirements to size, 107–109
Markov chains, 159, 231
Markup language, modeling systems in development environment with, 292
Mathematical analysis, of load scalability, 295–296
Mathematical consistency
    ensuring conformity of performance requirements to performance laws, 148–149
    of performance requirements, 120
Mean service time, in characterizing queue performance, 42
Mean Value Analysis (MVA), of single-class closed queueing networks, 69–71
Measurability, of performance requirements, 118–119
Measurement
    collecting data from performance test, 229–230
    comparing with performance testing, 167–168
    investing in lab time and tools for, 7
    metrics applied to. see Metrics
    in performance engineering, 10–11
    performance engineering pitfalls, 317–319
    performance modeling and, 11
    performance requirements and, 118–119
    of systems. see System measurement
Measurement intervals, explaining to stakeholders, 356–359
Measurement phase, of modeling studies, 254
Measuring Computer Performance (Lilja), 371
Memory leaks
    interpreting measurements of system with memory leak and deadlocks, 241–243
    measuring from within applications, 186
    provocative performance testing and, 210
    sustainable load and, 127
    testing system stability, 225–226
Memory management
    background activities and, 205
    diminishing returns from multiprocessors or multiple cores, 320
    garbage collection causing degraded performance, 315
    measuring memory-related activity, 180–181
    performance engineering pitfalls, 321
    space-time scalability and, 280
Memory occupancy
    formulating performance requirements to facilitate testing, 159
    measuring memory-related activity, 180–181
    qualitative attributes of system performance, 126–127
Metrics, 23–24. see also Measurement; System measurement
    ambiguity and, 117–118
    applying to conveyor system, 27–28