Measuring and Reporting Performance
The computer user is interested in reducing response time (the time
between the start and the completion of an event), also referred to as
execution time. The manager of a large data processing center may be
interested in increasing throughput (the total amount of work done in a
given time).
Even execution time can be defined in different ways, depending on what
we count. The most straightforward definition of time is called wall-clock time,
response time, or elapsed time: the latency to complete a task,
including disk accesses, memory accesses, input/output activities, and
operating system overhead.
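To make these definitions concrete, the sketch below times a single run two
ways, assuming a POSIX system for clock_gettime: once with the wall clock,
which charges the program for everything that happens while it runs, and once
with the processor clock, which counts only CPU time spent on the process.
The workload function is a placeholder chosen purely for illustration.

    /* A sketch of timing one run two ways (assumes a POSIX system for
       clock_gettime).  Wall-clock time charges the program for everything
       that happens while it runs; clock() counts only the CPU time spent
       on this process.  workload() is a placeholder for illustration. */
    #include <stdio.h>
    #include <time.h>

    static double workload(void)      /* stand-in for a real task */
    {
        double s = 0.0;
        for (long i = 0; i < 100000000L; i++)
            s += (double)i * 1e-9;
        return s;
    }

    int main(void)
    {
        struct timespec w0, w1;       /* wall-clock stamps */
        clock_t c0, c1;               /* CPU-time stamps   */

        clock_gettime(CLOCK_MONOTONIC, &w0);
        c0 = clock();
        double r = workload();
        c1 = clock();
        clock_gettime(CLOCK_MONOTONIC, &w1);

        double wall = (w1.tv_sec - w0.tv_sec)
                    + (w1.tv_nsec - w0.tv_nsec) / 1e9;
        double cpu  = (double)(c1 - c0) / CLOCKS_PER_SEC;

        printf("result %.3f: wall-clock %.3f s, CPU %.3f s\n", r, wall, cpu);
        return 0;
    }

On a loaded machine the two numbers can differ substantially, which is exactly
why the definition of time must be stated when reporting performance.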
Choosing Programs to Evaluate Performance
A computer user who runs the same programs day in and day out would
be the perfect candidate to evaluate a new computer. To evaluate a new system,
the user would simply compare the execution time of her workload, the
mixture of programs and operating system commands that users run on a
machine.
There are five levels of programs used in such circumstances, listed
below in decreasing order of accuracy of prediction:
1. Real applications—Although the buyer may not know what fraction of
time is spent on these programs, she knows that some users will run them to
solve real problems. Examples are compilers for C, text-processing software
like Word, and other applications like Photoshop. Real applications have input,
output, and options that a user can select when running the program. There is
one major downside to using real applications as benchmarks: Real applications
often encounter portability problems arising from dependences on the operating
system or compiler. Enhancing portability often means modifying the source
and sometimes eliminating some important activity, such as interactive
graphics, which tends to be more system-dependent.
2. Modified (or scripted) applications—In many cases, real applications
are used as the building block for a benchmark either with modifications to the
application or with a script that acts as a stimulus to the application. Applications
are modified for two primary reasons: to enhance portability or to focus on one
particular aspect of system performance. For example, to create a CPU-oriented
benchmark, I/O may be removed or restructured to minimize its impact on
execution time. Scripts are used to reproduce interactive behavior, which might
occur on a desktop system, or to simulate complex multiuser interaction, which
occurs in a server system.
3. Kernels—Several attempts have been made to extract small, key pieces from
real programs and use them to evaluate performance. Livermore Loops and
Linpack are the best-known examples. Unlike real programs, no user would run
kernel programs, for they exist solely to evaluate performance. Kernels are best
used to isolate the performance of individual features of a machine to explain
the reasons for differences in performance of real programs. (A sketch of a
Linpack-style kernel appears after this list.)
4. Toy benchmarks—Toy benchmarks are typically between 10 and 100
lines of code and produce a result the user already knows before running the toy
program. Programs like the Sieve of Eratosthenes, Puzzle, and Quicksort are
popular because they are small, easy to type, and run on almost any computer.
The best use of such programs is beginning programming assignments. (A
sketch of the Sieve appears after this list.)
5. Synthetic benchmarks—Similar in philosophy to kernels, synthetic
benchmarks try to match the average frequency of operations and operands of a
large set of programs. Whetstone and Dhrystone are the most popular synthetic
benchmarks.
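As an illustration of how small a kernel can be, the loop below is a sketch in
the spirit of Linpack, whose execution time is dominated by the DAXPY
operation y = a*x + y. The vector length and repetition count here are
arbitrary choices for this sketch, not values taken from any standard
benchmark.

    /* A sketch of a Linpack-style kernel: the DAXPY loop y = a*x + y,
       which dominates Linpack's execution time.  The vector length and
       repetition count are arbitrary choices for this illustration. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>

    #define N    1000000              /* vector length (arbitrary)    */
    #define REPS 100                  /* repeats, for measurable time */

    int main(void)
    {
        double *x = malloc(N * sizeof(double));
        double *y = malloc(N * sizeof(double));
        double  a = 3.0;
        if (x == NULL || y == NULL)
            return 1;

        for (long i = 0; i < N; i++) { x[i] = 1.0; y[i] = 2.0; }

        clock_t t0 = clock();
        for (int rep = 0; rep < REPS; rep++)
            for (long i = 0; i < N; i++)
                y[i] = a * x[i] + y[i];      /* the kernel itself */
        clock_t t1 = clock();

        /* roughly 2 floating-point operations per element per repetition */
        double secs = (double)(t1 - t0) / CLOCKS_PER_SEC;
        printf("DAXPY: %.3f s, about %.0f MFLOPS (y[0] = %.1f)\n",
               secs, 2.0 * N * REPS / secs / 1e6, y[0]);
        free(x);
        free(y);
        return 0;
    }

Because this kernel exercises little beyond floating-point arithmetic and
sequential memory access, a good DAXPY number says nothing about, say, I/O or
operating system performance; this is the sense in which kernels isolate
individual features of a machine.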
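Toy benchmarks are short enough to reproduce in full. The following sketch of
the Sieve of Eratosthenes counts the primes below an arbitrarily chosen limit;
the answer (1229 primes below 10,000) is known before the program runs, which
is the defining property of a toy benchmark.

    /* A sketch of the Sieve of Eratosthenes as a toy benchmark: count
       the primes below a fixed limit.  The limit is an arbitrary choice;
       the answer (1229 primes below 10,000) is known in advance. */
    #include <stdio.h>
    #include <string.h>

    #define LIMIT 10000

    int main(void)
    {
        static char composite[LIMIT]; /* nonzero once crossed out */
        int count = 0;

        memset(composite, 0, sizeof composite);
        for (int i = 2; i < LIMIT; i++) {
            if (composite[i])
                continue;
            count++;                             /* i is prime */
            for (int j = 2 * i; j < LIMIT; j += i)
                composite[j] = 1;                /* cross out multiples */
        }
        printf("%d primes below %d\n", count, LIMIT);
        return 0;
    }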