Performance
In this section, we will discuss various aspects of performance. Anyone buying a computer will
first ask: Which machine has the best performance? Which comparable machine has the lowest price?
Which one has the best price/performance ratio? But there is much more to measuring performance than
comparing MHz specs.
This section includes the following topics:
2. Speedup
3. Amdahl’s Law
4. Performance metrics of single-cycle and pipelined datapath
The perceived performance depends on what metric is selected. We will consider two metrics, time
and number of tasks completed in a unit of time.
How long does a task take? The length of a task is known as execution time, response time, or
latency.
How many tasks are completed in a unit of time? The number of tasks per unit of time is
called throughput or bandwidth.
We will refer to them as latency and throughput respectively.
Consider a single-cycle machine with a 2 ns cycle time. Its latency is 2 ns, because each instruction
completes in 2 ns. Its throughput is 1 instruction per 2 ns (0.5 instructions per ns), because one
instruction completes every 2 ns. But the relationship between latency and throughput isn’t as
clear-cut when a machine can perform multiple tasks concurrently.
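The single-cycle numbers above can be checked with a few lines of Python. This is only a minimal sketch: the function names are illustrative and not part of the course material, and it covers only the simple case where exactly one instruction completes per cycle.

```python
def latency_ns(cycle_time_ns: float) -> float:
    """On a single-cycle machine, each instruction takes exactly one cycle."""
    return cycle_time_ns


def throughput_per_ns(cycle_time_ns: float) -> float:
    """One instruction completes per cycle, so throughput is 1 / cycle time."""
    return 1.0 / cycle_time_ns


cycle = 2.0  # ns, the cycle time used in the text
print(f"latency:    {latency_ns(cycle):.1f} ns")
print(f"throughput: {throughput_per_ns(cycle):.1f} instructions/ns")
```

Here latency and throughput are simply reciprocals of each other; that stops being true once several instructions are in flight at the same time.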
Which performance metric is more important can also depend on perspective. For example, in a
client-server system, a client is concerned with the time required to accomplish a task (i.e. latency), but
the performance of the server is measured by the number of requests it can serve in a unit of time (i.e.
throughput).
Throughput is measured in units of things per unit of time, e.g. hamburgers per hour, passenger-
miles per hour. Bigger throughput means better performance.
Latency is measured in units of time (seconds, ns). Performance is inversely proportional to the
latency (or execution time) of a task.
\[
\text{Performance}(x) = \frac{1}{\text{execution time}(x)} \tag{1}
\]
So, smaller latency means better performance.
How do we compare the performance of two machines? To compare the performance of two machines X
and Y, run the same program (i.e. a benchmark) on both of them. This means that you can assume that
they have the same throughput and only the latency (execution time) needs to be compared. In such
cases a simple formula suffices:
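A common way to write this comparison, consistent with equation (1), is the ratio of the two performances. The symbol n and the statement "X is n times faster than Y" are the usual textbook convention and are assumed here rather than taken from this handout:

```latex
\[
\frac{\text{Performance}(X)}{\text{Performance}(Y)}
  = \frac{\text{execution time}(Y)}{\text{execution time}(X)}
  = n
\]
```

As a purely hypothetical illustration, if a benchmark takes 10 s on X and 15 s on Y, then n = 15/10 = 1.5, so X is 1.5 times faster than Y.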
CS232 Section 6: Performance
2 Amdahl’s Law
Measuring performance is only a means, not an end. One use of performance measurements is determining
what part of the system to optimize to maximize performance increase. Amdahl’s Law measures how
effective an optimization can be.
Amdahl’s Law states that optimizations are limited in their effectiveness: the performance enhancement
possible with a given improvement is limited by the fraction of time that the improved feature is used.
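In its usual textbook form, Amdahl’s Law says that the execution time after an improvement equals the time affected by the improvement divided by the improvement factor, plus the time unaffected. The sketch below works through that form; the function names and the numbers (40% of the time affected, a 2x improvement) are hypothetical and chosen only for illustration.

```python
def amdahl_new_time(old_time: float, fraction_affected: float,
                    improvement_factor: float) -> float:
    """Execution time after speeding up only part of the workload."""
    affected = old_time * fraction_affected
    unaffected = old_time * (1.0 - fraction_affected)
    return affected / improvement_factor + unaffected


def amdahl_speedup(fraction_affected: float, improvement_factor: float) -> float:
    """Overall speedup = old time / new time, independent of old_time."""
    return 1.0 / ((1.0 - fraction_affected)
                  + fraction_affected / improvement_factor)


# Hypothetical workload: 100 s total, 40% of it sped up by a factor of 2.
print(f"new time: {amdahl_new_time(100.0, 0.4, 2.0):.1f} s")   # 80.0 s
print(f"speedup:  {amdahl_speedup(0.4, 2.0):.2f}x")            # 1.25x

# Even an enormous improvement factor cannot beat 1 / (1 - fraction).
print(f"limit:    {amdahl_speedup(0.4, 1e12):.2f}x")           # about 1.67x
```

The last line shows the limit the law describes: with only 40% of the time affected, the overall speedup can never exceed 1/0.6, no matter how large the improvement factor is.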
3 Practice problems
1. Recall the execution time equation:
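The standard form of the execution time equation is: execution time = instruction count x CPI x cycle time. The sketch below assumes that form. The instruction mix in it is hypothetical and is not the machine table from this problem; the function names are likewise illustrative.

```python
def average_cpi(mix):
    """Weighted average CPI for a list of (frequency, cycles) pairs.

    Frequencies are fractions of the instruction count and should sum to 1.
    """
    return sum(freq * cycles for freq, cycles in mix)


def execution_time_ns(instruction_count, cpi, cycle_time_ns):
    """Execution time = instruction count * CPI * cycle time."""
    return instruction_count * cpi * cycle_time_ns


# Hypothetical mix (NOT the table from the problem):
# 50% 1-cycle, 30% 2-cycle, and 20% 3-cycle instructions.
mix = [(0.5, 1), (0.3, 2), (0.2, 3)]
cpi = average_cpi(mix)
print(f"average CPI:    {cpi:.2f}")
print(f"execution time: {execution_time_ns(1000, cpi, 2.0):.0f} ns")
```

Note that all three factors matter: a lower CPI only translates into a shorter execution time if the instruction count and cycle time are held fixed.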
2. Use the basic machine table from question 1, but assume that the frequency column indicates the
percentage of execution time spent in the corresponding instruction type. Use Amdahl’s Law to
answer the following: if you could reduce the number of cycles taken by one of the instruction types
by 1, which instruction type would you optimize? How would the new execution time compare to the
original one?
3. Intel’s Itanium (IA-64) ISA is designed to facilitate executing multiple instructions per cycle. If
an Itanium processor achieves an average CPI of 0.3 (roughly 3 instructions per cycle), how much faster is
it than a Pentium 4 (which uses the x86 ISA) with an average CPI of 1?
(a) Itanium is three times faster
(b) Itanium is one third as fast
(c) Not enough information
4. Assume the following delays for the main functional units of the single-cycle datapath shown below:
[Figure: Single-cycle datapath. Components: PC, instruction memory, register file, sign-extend unit,
shift-left-2 unit, ALU, two adders, data memory, and multiplexers. Instruction fields: I[25-21],
I[20-16], I[15-11], I[15-0]. Control signals: PCSrc, RegWrite, RegDst, ALUSrc, ALUOp, MemRead,
MemWrite, MemToReg.]
Assuming the instructions are executed on a single-cycle machine with 10ns cycle time, compute:
6. Assume that the code above is executed on the 5-stage pipelined machine discussed in lecture. You
might first draw the pipeline diagram in the space below.
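When checking answers to the last two problems, it can help to compute both timings side by side. The sketch below assumes an ideal pipeline in which a new instruction enters every cycle and any stalls are counted explicitly. The instruction count of 4 and the 2 ns pipelined cycle time are hypothetical placeholders, not values from the problems; only the 10 ns single-cycle time comes from the text.

```python
def single_cycle_time_ns(n_instructions: int, cycle_time_ns: float) -> float:
    """Every instruction takes one (long) cycle."""
    return n_instructions * cycle_time_ns


def pipelined_cycles(n_instructions: int, stages: int = 5,
                     stall_cycles: int = 0) -> int:
    """The first instruction needs `stages` cycles; each later one adds a cycle.

    Stalls from hazards, if any, are added on top.
    """
    return stages + (n_instructions - 1) + stall_cycles


n = 4                    # hypothetical instruction count
pipelined_cycle_ns = 2.0  # hypothetical pipelined cycle time

print(single_cycle_time_ns(n, 10.0))             # 40.0 (ns)
print(pipelined_cycles(n))                       # 8 (cycles)
print(pipelined_cycles(n) * pipelined_cycle_ns)  # 16.0 (ns)
```

The cycle count from `pipelined_cycles` should match the number of columns in a hand-drawn pipeline diagram for the same instruction sequence.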