
1.6 Performance

purchasers and therefore to designers. The people selling computers know this as
well. Often, salespeople would like you to see their computer in the best possible
light, whether or not this light accurately reflects the needs of the purchaser’s
application. Hence, understanding how best to measure performance and the
limitations of performance measurements is important in selecting a computer.
The rest of this section describes different ways in which performance can be
determined; then, we describe the metrics for measuring performance from the
viewpoint of both a computer user and a designer. We also look at how these metrics
are related and present the classical processor performance equation, which we will
use throughout the text.

Defining Performance
When we say one computer has better performance than another, what do we
mean? Although this question might seem simple, an analogy with passenger
airplanes shows how subtle the question of performance can be. Figure 1.14
lists some typical passenger airplanes, together with their cruising speed, range,
and capacity. If we wanted to know which of the planes in this table had the best
performance, we would first need to define performance. For example, considering
different measures of performance, we see that the plane with the highest cruising
speed was the Concorde (retired from service in 2003), the plane with the longest
range is the DC-8, and the plane with the largest capacity is the 747.

Airplane            Passenger capacity   Cruising range (miles)   Cruising speed (m.p.h.)   Passenger throughput (passengers × m.p.h.)
Boeing 777          375                  4630                     610                       228,750
Boeing 747          470                  4150                     610                       286,700
BAC/Sud Concorde    132                  4000                     1350                      178,200
Douglas DC-8-50     146                  8720                     544                       79,424

FIGURE 1.14 The capacity, range, and speed for a number of commercial airplanes. The last
column shows the rate at which the airplane transports passengers, which is the capacity times the cruising
speed (ignoring range and takeoff and landing times).
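
To make the throughput column concrete, here is a small sketch (ours, not from the text) that recomputes it from the capacity and cruising-speed columns of Figure 1.14:

```python
# Passenger throughput = capacity x cruising speed, recomputed from Figure 1.14.
airplanes = {
    "Boeing 777": (375, 610),
    "Boeing 747": (470, 610),
    "BAC/Sud Concorde": (132, 1350),
    "Douglas DC-8-50": (146, 544),
}

for name, (capacity, speed_mph) in airplanes.items():
    print(f"{name}: {capacity * speed_mph:,} passengers x m.p.h.")
```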

Let’s suppose we define performance in terms of speed. This still leaves two
possible definitions. You could define the fastest plane as the one with the highest
cruising speed, taking a single passenger from one point to another in the least time.
If you were interested in transporting 450 passengers from one point to another, however, the 747 would clearly be the fastest, as the last column of the figure shows. Similarly, we can define computer performance in several different ways.

If you were running a program on two different desktop computers, you'd say that the faster one is the desktop computer that gets the job done first. If you were running a datacenter that had several servers running jobs submitted by many users, you'd say that the faster computer was the one that completed the most jobs during a day. As an individual computer user, you are interested in reducing response time (the time between the start and completion of a task), also referred to as execution time. Datacenter managers are often interested in increasing throughput or bandwidth, the total amount of work done in a given time. Hence, in most cases, we will need different performance metrics as well as different sets of applications to benchmark personal mobile devices, which are more focused on response time, versus servers, which are more focused on throughput.

response time Also called execution time. The total time required for the computer to complete a task, including disk accesses, memory accesses, I/O activities, operating system overhead, CPU execution time, and so on.

throughput Also called bandwidth. Another measure of performance, it is the number of tasks completed per unit time.

Throughput and Response Time

EXAMPLE

Do the following changes to a computer system increase throughput, decrease response time, or both?

1. Replacing the processor in a computer with a faster version
2. Adding additional processors to a system that uses multiple processors for separate tasks, for example, searching the web

ANSWER

Decreasing response time almost always improves throughput. Hence, in case 1, both response time and throughput are improved. In case 2, no one task gets work done faster, so only throughput increases.

If, however, the demand for processing in the second case was almost
as large as the throughput, the system might force requests to queue up. In
this case, increasing the throughput could also improve response time, since
it would reduce the waiting time in the queue. Thus, in many real computer
systems, changing either execution time or throughput often affects the other.

In discussing the performance of computers, we will be primarily concerned with response time for the first few chapters. To maximize performance, we want to minimize response time or execution time for some task. Thus, we can relate performance and execution time for a computer X:

\text{Performance}_X = \frac{1}{\text{Execution time}_X}

This means that for two computers X and Y, if the performance of X is greater than
the performance of Y, we have
\text{Performance}_X > \text{Performance}_Y

\frac{1}{\text{Execution time}_X} > \frac{1}{\text{Execution time}_Y}

\text{Execution time}_Y > \text{Execution time}_X

That is, the execution time on Y is longer than that on X, if X is faster than Y.

In discussing a computer design, we often want to relate the performance of two different computers quantitatively. We will use the phrase "X is n times faster than Y", or equivalently "X is n times as fast as Y", to mean

\frac{\text{Performance}_X}{\text{Performance}_Y} = n

If X is n times as fast as Y, then the execution time on Y is n times as long as it is on X:

\frac{\text{Performance}_X}{\text{Performance}_Y} = \frac{\text{Execution time}_Y}{\text{Execution time}_X} = n

Relative Performance

EXAMPLE

If computer A runs a program in 10 seconds and computer B runs the same program in 15 seconds, how much faster is A than B?

ANSWER

We know that A is n times as fast as B if

\frac{\text{Performance}_A}{\text{Performance}_B} = \frac{\text{Execution time}_B}{\text{Execution time}_A} = n

Thus the performance ratio is

\frac{15}{10} = 1.5

and A is therefore 1.5 times as fast as B.

In the above example, we could also say that computer B is 1.5 times slower than computer A, since

\frac{\text{Performance}_A}{\text{Performance}_B} = 1.5

means that

\frac{\text{Performance}_A}{1.5} = \text{Performance}_B
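
The same ratio can be checked in a couple of lines of code; this is just an illustrative sketch, and the function name relative_performance is ours.

```python
def relative_performance(exec_time_x, exec_time_y):
    """Return n such that X is n times as fast as Y (performance = 1 / execution time)."""
    return exec_time_y / exec_time_x

# Computer A runs the program in 10 s, computer B in 15 s (numbers from the example above).
n = relative_performance(10.0, 15.0)
print(f"A is {n:.1f} times as fast as B")  # prints: A is 1.5 times as fast as B
```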

For simplicity, we will normally use the terminology as fast as when we try to
compare computers quantitatively. Because performance and execution time are
reciprocals, increasing performance requires decreasing execution time. To avoid
the potential confusion between the terms increasing and decreasing, we usually
say “improve performance” or “improve execution time” when we mean “increase
performance” and “decrease execution time.”

Measuring Performance
Time is the measure of computer performance: the computer that performs the
same amount of work in the least time is the fastest. Program execution time is
measured in seconds per program. However, time can be defined in different ways,
depending on what we count. The most straightforward definition of time is called
wall clock time, response time, or elapsed time. These terms mean the total time
to complete a task, including disk accesses, memory accesses, input/output (I/O)
activities, operating system overhead—everything.
Computers are often shared, however, and a processor may work on several
programs simultaneously. In such cases, the system may try to optimize throughput
rather than attempt to minimize the elapsed time for one program. Hence, we
often want to distinguish between the elapsed time and the time over which the
processor is working on our behalf. CPU execution time or simply CPU time, which recognizes this distinction, is the time the CPU spends computing for this task and does not include time spent waiting for I/O or running other programs. (Remember, though, that the response time experienced by the user will be the elapsed time of the program, not the CPU time.) CPU time can be further divided into the CPU time spent in the program, called user CPU time, and the CPU time spent in the operating system performing tasks on behalf of the program, called system CPU time. Differentiating between system and user CPU time is difficult to do accurately, because it is often hard to assign responsibility for operating system activities to one user program rather than another and because of the functionality differences among operating systems.

CPU execution time Also called CPU time. The actual time the CPU spends computing for a specific task.

user CPU time The CPU time spent in a program itself.

system CPU time The CPU time spent in the operating system performing tasks on behalf of the program.

For consistency, we maintain a distinction between performance based on elapsed time and that based on CPU execution time. We will use the term system performance to refer to elapsed time on an unloaded system and CPU performance to refer to user CPU time. We will focus on CPU performance in this chapter, although our discussions of how to summarize performance can be applied to either elapsed time or CPU time measurements.
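
As a rough illustration of these distinctions, Python's standard library on a Unix-like system can report elapsed, CPU, user, and system time; the sketch below is ours (the busy_work function is invented), and the exact numbers depend on the machine and operating system.

```python
import resource  # Unix-only module for user/system CPU time
import time

def busy_work(n=2_000_000):
    # CPU-bound loop, so CPU time should track elapsed time closely.
    return sum(i * i for i in range(n))

wall_start = time.perf_counter()   # elapsed (wall clock) time
cpu_start = time.process_time()    # user + system CPU time of this process
busy_work()
elapsed = time.perf_counter() - wall_start
cpu = time.process_time() - cpu_start

usage = resource.getrusage(resource.RUSAGE_SELF)
print(f"elapsed time: {elapsed:.3f} s, CPU time: {cpu:.3f} s")
print(f"user CPU: {usage.ru_utime:.3f} s, system CPU: {usage.ru_stime:.3f} s")
```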

Understanding Program Performance

Different applications are sensitive to different aspects of the performance of a computer system. Many applications, especially those running on servers, depend as much on I/O performance, which, in turn, relies on both hardware and software. Total elapsed time measured by a wall clock is the measurement of interest. In some application environments, the user may care about throughput, response time, or a complex combination of the two (e.g., maximum throughput with a worst-case response time). To improve the performance of a program, one must have a clear definition of what performance metric matters and then proceed to look for performance bottlenecks by measuring program execution and looking for the likely bottlenecks. In the following chapters, we will describe how to search for bottlenecks and improve performance in various parts of the system.

Although as computer users we care about time, when we examine the details of a computer it's convenient to think about performance in other metrics. In particular, computer designers may want to think about a computer by using a measure that relates to how fast the hardware can perform basic functions. Almost all computers are constructed using a clock that determines when events take place in the hardware. These discrete time intervals are called clock cycles (or ticks, clock ticks, clock periods, clocks, cycles). Designers refer to the length of a clock period both as the time for a complete clock cycle (e.g., 250 picoseconds, or 250 ps) and as the clock rate (e.g., 4 gigahertz, or 4 GHz), which is the inverse of the clock period. In the next subsection, we will formalize the relationship between the clock cycles of the hardware designer and the seconds of the computer user.

clock cycle Also called tick, clock tick, clock period, clock, or cycle. The time for one clock period, usually of the processor clock, which runs at a constant rate.

clock period The length of each clock cycle.
Check Yourself

1. Suppose we know that an application that uses both personal mobile devices and the Cloud is limited by network performance. For the following changes, state whether only the throughput improves, both response time and throughput improve, or neither improves.
a. An extra network channel is added between the PMD and the Cloud, increasing the total network throughput and reducing the delay to obtain network access (since there are now two channels).
b. The networking software is improved, thereby reducing the network communication delay, but not increasing throughput.
c. More memory is added to the computer.

2. Computer C's performance is 4 times as fast as the performance of computer B, which runs a given application in 28 seconds. How long will computer C take to run that application?

CPU Performance and Its Factors


Users and designers often examine performance using different metrics. If we could
relate these different metrics, we could determine the effect of a design change
on the performance as experienced by the user. Since we are confining ourselves
to CPU performance at this point, the bottom-line performance measure is CPU execution time. A simple formula relates the most basic metrics (clock cycles and clock cycle time) to CPU time:

\text{CPU execution time for a program} = \text{CPU clock cycles for a program} \times \text{Clock cycle time}

Alternatively, because clock rate and clock cycle time are inverses,

\text{CPU execution time for a program} = \frac{\text{CPU clock cycles for a program}}{\text{Clock rate}}

This formula makes it clear that the hardware designer can improve performance
by reducing the number of clock cycles required for a program or the length of
the clock cycle. As we will see in later chapters, the designer often faces a trade-off
between the number of clock cycles needed for a program and the length of each
cycle. Many techniques that decrease the number of clock cycles may also increase
the clock cycle time.

Improving Performance

EXAMPLE

Our favorite program runs in 10 seconds on computer A, which has a 2 GHz clock. We are trying to help a computer designer build a computer, B, which will run this program in 6 seconds. The designer has determined that a substantial increase in the clock rate is possible, but this increase will affect the rest of the CPU design, causing computer B to require 1.2 times as many clock cycles as computer A for this program. What clock rate should we tell the designer to target?

ANSWER

Let's first find the number of clock cycles required for the program on A:

\text{CPU time}_A = \frac{\text{CPU clock cycles}_A}{\text{Clock rate}_A}

10 \text{ seconds} = \frac{\text{CPU clock cycles}_A}{2 \times 10^9 \frac{\text{cycles}}{\text{second}}}

\text{CPU clock cycles}_A = 10 \text{ seconds} \times 2 \times 10^9 \frac{\text{cycles}}{\text{second}} = 20 \times 10^9 \text{ cycles}

CPU time for B can be found using this equation:

\text{CPU time}_B = \frac{1.2 \times \text{CPU clock cycles}_A}{\text{Clock rate}_B}

6 \text{ seconds} = \frac{1.2 \times 20 \times 10^9 \text{ cycles}}{\text{Clock rate}_B}

\text{Clock rate}_B = \frac{1.2 \times 20 \times 10^9 \text{ cycles}}{6 \text{ seconds}} = \frac{0.2 \times 20 \times 10^9 \text{ cycles}}{\text{second}} = \frac{4 \times 10^9 \text{ cycles}}{\text{second}} = 4 \text{ GHz}
To run the program in 6 seconds, B must have twice the clock rate of A.
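
The arithmetic in this example is easy to reproduce programmatically; the sketch below is only illustrative, and the function name target_clock_rate is invented for the purpose.

```python
def target_clock_rate(time_a_s, clock_rate_a_hz, cycle_ratio, target_time_s):
    """Clock rate computer B needs to hit target_time_s, given that it uses
    cycle_ratio times as many clock cycles as computer A."""
    cycles_a = time_a_s * clock_rate_a_hz   # CPU clock cycles on A
    cycles_b = cycle_ratio * cycles_a       # CPU clock cycles on B
    return cycles_b / target_time_s         # required clock rate for B, in Hz

rate_b = target_clock_rate(10, 2e9, 1.2, 6)                # numbers from the example above
print(f"Computer B needs a {rate_b / 1e9:.0f} GHz clock")  # 4 GHz
```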

Instruction Performance
The performance equations above did not include any reference to the number of
instructions needed for the program. However, since the compiler clearly generated
instructions to execute, and the computer had to execute the instructions to run
the program, the execution time must depend on the number of instructions in a
program. One way to think about execution time is that it equals the number of
instructions executed multiplied by the average time per instruction. Therefore, the
number of clock cycles required for a program can be written as

\text{CPU clock cycles} = \text{Instructions for a program} \times \text{Average clock cycles per instruction}

The term clock cycles per instruction, which is the average number of clock cycles each instruction takes to execute, is often abbreviated as CPI. Since different instructions may take different amounts of time depending on what they do, CPI is an average of all the instructions executed in the program. CPI provides one way of comparing two different implementations of the same instruction set architecture, since the number of instructions executed for a program will, of course, be the same.

clock cycles per instruction (CPI) Average number of clock cycles per instruction for a program or program fragment.

Using the Performance Equation

EXAMPLE

Suppose we have two implementations of the same instruction set architecture. Computer A has a clock cycle time of 250 ps and a CPI of 2.0 for some program, and computer B has a clock cycle time of 500 ps and a CPI of 1.2 for the same program. Which computer is faster for this program and by how much?

ANSWER

We know that each computer executes the same number of instructions for the program; let's call this number I. First, find the number of processor clock cycles for each computer:

\text{CPU clock cycles}_A = I \times 2.0
\text{CPU clock cycles}_B = I \times 1.2

Now we can compute the CPU time for each computer:

\text{CPU time}_A = \text{CPU clock cycles}_A \times \text{Clock cycle time} = I \times 2.0 \times 250 \text{ ps} = 500 \times I \text{ ps}

Likewise, for B:

\text{CPU time}_B = I \times 1.2 \times 500 \text{ ps} = 600 \times I \text{ ps}

Clearly, computer A is faster. The amount faster is given by the ratio of the execution times:

\frac{\text{CPU performance}_A}{\text{CPU performance}_B} = \frac{\text{Execution time}_B}{\text{Execution time}_A} = \frac{600 \times I \text{ ps}}{500 \times I \text{ ps}} = 1.2

We can conclude that computer A is 1.2 times as fast as computer B for this program.

The Classic CPU Performance Equation


We can now write this basic performance equation in terms of instruction count (the number of instructions executed by the program), CPI, and clock cycle time:

\text{CPU time} = \text{Instruction count} \times \text{CPI} \times \text{Clock cycle time}

or, since the clock rate is the inverse of clock cycle time:

\text{CPU time} = \frac{\text{Instruction count} \times \text{CPI}}{\text{Clock rate}}

instruction count The number of instructions executed by the program.

These formulas are particularly useful because they separate the three key factors
that affect performance. We can use these formulas to compare two different
implementations or to evaluate a design alternative if we know its impact on these
three parameters.
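
The equation translates directly into a small calculator; the sketch below is illustrative only, with made-up values in the usage line (10 billion instructions, a CPI of 1.5, and a 3 GHz clock).

```python
def cpu_time(instruction_count, cpi, clock_rate_hz):
    """Classic CPU performance equation: time = instruction count x CPI / clock rate."""
    return instruction_count * cpi / clock_rate_hz

# Hypothetical values: 10 billion instructions, average CPI of 1.5, 3 GHz clock.
print(f"{cpu_time(10e9, 1.5, 3e9):.1f} seconds")  # 5.0 seconds
```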

Comparing Code Segments

EXAMPLE

A compiler designer is trying to decide between two code sequences for a particular computer. The hardware designers have supplied the following facts:

            CPI for each instruction class
            A      B      C
CPI         1      2      3

For a particular high-level language statement, the compiler writer is considering two code sequences that require the following instruction counts:

                  Instruction counts for each instruction class
Code sequence     A      B      C
1                 2      1      2
2                 4      1      1

Which code sequence executes the most instructions? Which will be faster? What is the CPI for each sequence?

ANSWER

Sequence 1 executes 2 + 1 + 2 = 5 instructions. Sequence 2 executes 4 + 1 + 1 = 6 instructions. Therefore, sequence 1 executes fewer instructions.

We can use the equation for CPU clock cycles based on instruction count and CPI to find the total number of clock cycles for each sequence:

\text{CPU clock cycles} = \sum_{i=1}^{n} (\text{CPI}_i \times C_i)

This yields

\text{CPU clock cycles}_1 = (2 \times 1) + (1 \times 2) + (2 \times 3) = 2 + 2 + 6 = 10 \text{ cycles}

\text{CPU clock cycles}_2 = (4 \times 1) + (1 \times 2) + (1 \times 3) = 4 + 2 + 3 = 9 \text{ cycles}

So code sequence 2 is faster, even though it executes one extra instruction. Since code sequence 2 takes fewer overall clock cycles but has more instructions, it must have a lower CPI. The CPI values can be computed by

\text{CPI} = \frac{\text{CPU clock cycles}}{\text{Instruction count}}

\text{CPI}_1 = \frac{\text{CPU clock cycles}_1}{\text{Instruction count}_1} = \frac{10}{5} = 2.0

\text{CPI}_2 = \frac{\text{CPU clock cycles}_2}{\text{Instruction count}_2} = \frac{9}{6} = 1.5
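
The same bookkeeping generalizes to any number of instruction classes; the sketch below is an illustration using the tables above (the dictionaries and function name are ours, not from the text).

```python
def cycles_and_cpi(instr_counts, class_cpi):
    """Total clock cycles and average CPI for a code sequence, given the
    instruction count and CPI for each instruction class."""
    cycles = sum(count * class_cpi[cls] for cls, count in instr_counts.items())
    instructions = sum(instr_counts.values())
    return cycles, cycles / instructions

class_cpi = {"A": 1, "B": 2, "C": 3}
seq1 = {"A": 2, "B": 1, "C": 2}
seq2 = {"A": 4, "B": 1, "C": 1}

print(cycles_and_cpi(seq1, class_cpi))  # (10, 2.0)
print(cycles_and_cpi(seq2, class_cpi))  # (9, 1.5)
```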

Figure 1.15 shows the basic measurements at different levels in the computer and what is being measured in each case. We can see how these factors are combined to yield execution time measured in seconds per program:

\text{Time} = \frac{\text{Seconds}}{\text{Program}} = \frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Clock cycles}}{\text{Instruction}} \times \frac{\text{Seconds}}{\text{Clock cycle}}

The BIG Picture

Always bear in mind that the only complete and reliable measure of computer performance is time. For example, changing the instruction set to lower the instruction count may lead to an organization with a slower clock cycle time or higher CPI that offsets the improvement in instruction count. Similarly, because CPI depends on type of instructions executed, the code that executes the fewest number of instructions may not be the fastest.

Components of performance              Units of measure
CPU execution time for a program       Seconds for the program
Instruction count                      Instructions executed for the program
Clock cycles per instruction (CPI)     Average number of clock cycles per instruction
Clock cycle time                       Seconds per clock cycle

FIGURE 1.15 The basic components of performance and how each is measured.

How can we determine the value of these factors in the performance equation?
We can measure the CPU execution time by running the program, and the clock
cycle time is usually published as part of the documentation for a computer. The
instruction count and CPI can be more difficult to obtain. Of course, if we know
the clock rate and CPU execution time, we need only one of the instruction count
or the CPI to determine the other.
We can measure the instruction count by using software tools that profile the
execution or by using a simulator of the architecture. Alternatively, we can use
hardware counters, which are included in most processors, to record a variety of
measurements, including the number of instructions executed, the average CPI,
and often, the sources of performance loss. Since the instruction count depends
on the architecture, but not on the exact implementation, we can measure the
instruction count without knowing all the details of the implementation. The CPI,
however, depends on a wide variety of design details in the computer, including
both the memory system and the processor structure (as we will see in Chapter 4
and Chapter 5), as well as on the mix of instruction types executed in an application.
Thus, CPI varies by application, as well as among implementations with the same
instruction set.

The above example shows the danger of using only one factor (instruction count)
to assess performance. When comparing two computers, you must look at all three
components, which combine to form execution time. If some of the factors are
identical, like the clock rate in the above example, performance can be determined
by comparing all the nonidentical factors. Since CPI varies by instruction mix, both instruction count and CPI must be compared, even if clock rates are identical. Several exercises at the end of this chapter ask you to evaluate a series of computer and compiler enhancements that affect clock rate, CPI, and instruction count. In Section 1.10, we'll examine a common performance measurement that does not incorporate all the terms and can thus be misleading.

instruction mix A measure of the dynamic frequency of instructions across one or many programs.

Understanding Program Performance

The performance of a program depends on the algorithm, the language, the compiler, the architecture, and the actual hardware. The following table summarizes how these components affect the factors in the CPU performance equation.

Hardware or software component: Algorithm
Affects what? Instruction count, possibly CPI
How? The algorithm determines the number of source program instructions executed and hence the number of processor instructions executed. The algorithm may also affect the CPI, by favoring slower or faster instructions. For example, if the algorithm uses more divides, it will tend to have a higher CPI.

Hardware or software component: Programming language
Affects what? Instruction count, CPI
How? The programming language certainly affects the instruction count, since statements in the language are translated to processor instructions, which determine instruction count. The language may also affect the CPI because of its features; for example, a language with heavy support for data abstraction (e.g., Java) will require indirect calls, which will use higher CPI instructions.

Hardware or software component: Compiler
Affects what? Instruction count, CPI
How? The efficiency of the compiler affects both the instruction count and average cycles per instruction, since the compiler determines the translation of the source language instructions into computer instructions. The compiler's role can be very complex and affect the CPI in complex ways.

Hardware or software component: Instruction set architecture
Affects what? Instruction count, clock rate, CPI
How? The instruction set architecture affects all three aspects of CPU performance, since it affects the instructions needed for a function, the cost in cycles of each instruction, and the overall clock rate of the processor.

Elaboration: Although you might expect that the minimum CPI is 1.0, as we’ll see in
Chapter 4, some processors fetch and execute multiple instructions per clock cycle. To
reflect that approach, some designers invert CPI to talk about IPC, or instructions per
clock cycle. If a processor executes on average 2 instructions per clock cycle, then it has
an IPC of 2 and hence a CPI of 0.5.

Elaboration: Although clock cycle time has traditionally been fixed, to save energy
or temporarily boost performance, today’s processors can vary their clock rates, so we
would need to use the average clock rate for a program. For example, the Intel Core i7
will temporarily increase clock rate by about 10% until the chip gets too warm. Intel calls
this Turbo mode.

Check Yourself

A given application written in Java runs 15 seconds on a desktop processor. A new Java compiler is released that requires only 0.6 as many instructions as the old compiler. Unfortunately, it increases the CPI by a factor of 1.1. How fast can we expect the application to run using this new compiler? Pick the right answer from the three choices below:

a. \frac{15 \times 0.6}{1.1} = 8.2 \text{ sec}

b. 15 \times 0.6 \times 1.1 = 9.9 \text{ sec}

c. \frac{15 \times 1.1}{0.6} = 27.5 \text{ sec}

1.7 The Power Wall

Figure 1.16 shows the increase in clock rate and power of eight generations of Intel
microprocessors over 30 years. Both clock rate and power increased rapidly for
decades, and then flattened off recently. The reason they grew together is that they
are correlated, and the reason for their recent slowing is that we have run into the
practical power limit for cooling commodity microprocessors.

[Figure 1.16 is a two-axis chart: clock rate in MHz (left axis) and power in watts (right axis) for Intel x86 processors from the 80286 (1982) through the Core i5 Ivy Bridge (2012). Clock rates climb from roughly 12.5 MHz to the 2000-3600 MHz range, and power climbs from a few watts to roughly 75-103 watts before both flatten in the most recent generations.]
FIGURE 1.16 Clock rate and Power for Intel x86 microprocessors over eight generations
and 25 years. The Pentium 4 made a dramatic jump in clock rate and power but less so in performance. The
Prescott thermal problems led to the abandonment of the Pentium 4 line. The Core 2 line reverts to a simpler
pipeline with lower clock rates and multiple processors per chip. The Core i5 pipelines follow in its footsteps.
