Chapter 1 Lecture 2 & 3 - Performance
Response Time
Delay between the start and end time of a task
Throughput
Number of tasks completed per unit time
Power/Energy
Energy per task, power
Metrics for different applications
Desktop computing
Metrics: performance (latency), cost
Server computing
Examples: web servers, transaction servers, file servers
Metrics: performance (throughput), reliability
Embedded computing
Examples: printer, cell phone, video game console
Metrics: performance (real-time), cost, power consumption
Airline example
Plane | Speed (mph) | Range (miles) | Passengers | Time (hrs) | Throughput (passengers × mph)
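The table contrasts two views of performance: how long one trip takes (response time) and how much work is done per hour (throughput). A minimal sketch of how the two derived columns could be computed, assuming Time is trip distance divided by speed and Throughput is passengers × speed, as the header states. The plane names, speeds, passenger counts, and the 4000-mile trip length are hypothetical placeholders, not the lecture's data.

```python
# Hypothetical planes: (name, speed in mph, passengers).
# These values are placeholders for illustration, not the lecture's table.
PLANES = [
    ("Plane X", 600.0, 400),
    ("Plane Y", 1350.0, 130),
]

TRIP_MILES = 4000.0  # hypothetical trip length used for the "Time" column


def flight_time_hours(speed_mph: float, distance_miles: float) -> float:
    """Response-time view: hours needed to fly one trip."""
    return distance_miles / speed_mph


def throughput(speed_mph: float, passengers: int) -> float:
    """Throughput view: passengers x mph, as in the table header."""
    return passengers * speed_mph


if __name__ == "__main__":
    for name, speed, pax in PLANES:
        hours = flight_time_hours(speed, TRIP_MILES)
        print(f"{name}: time = {hours:.2f} h, throughput = {throughput(speed, pax):,.0f}")
```

With these made-up numbers, Plane Y has the shorter trip time (about 2.96 h versus 6.67 h) but Plane X has the higher throughput (240,000 versus 175,500), so the two metrics can rank the same machines differently.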
Measure of performance for a processor
Amdahl’s Law
Amdahl’s Law [contd…]
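$$\text{Speedup}_{\text{overall}} = \frac{\text{ExTime}_{\text{old}}}{\text{ExTime}_{\text{new}}} = \frac{1}{(1 - \text{Fraction}_{\text{enhanced}}) + \dfrac{\text{Fraction}_{\text{enhanced}}}{\text{Speedup}_{\text{enhanced}}}}$$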
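Amdahl's Law translates directly into a function. A small Python sketch; the function name and the input checks are illustrative choices, and the usage inputs (40% of the original time enhanced by a factor of 10) are hypothetical.

```python
def amdahl_speedup(fraction_enhanced: float, speedup_enhanced: float) -> float:
    """Speedup_overall = ExTime_old / ExTime_new under Amdahl's Law.

    fraction_enhanced: share of the ORIGINAL execution time that can use the enhancement.
    speedup_enhanced:  how many times faster that share runs once enhanced.
    """
    if not 0.0 <= fraction_enhanced <= 1.0:
        raise ValueError("fraction_enhanced must lie in [0, 1]")
    if speedup_enhanced <= 0.0:
        raise ValueError("speedup_enhanced must be positive")
    # New execution time, normalized so that ExTime_old = 1.
    ex_time_new = (1.0 - fraction_enhanced) + fraction_enhanced / speedup_enhanced
    return 1.0 / ex_time_new


if __name__ == "__main__":
    # Hypothetical case: 40% of the time sped up 10x gives 1 / (0.6 + 0.04).
    print(amdahl_speedup(0.4, 10.0))
```

Even a 10x enhancement yields only about 1.56x overall in this case, because the unenhanced 60% of the execution time is untouched.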
Example (1): Design comparison
We can compare these two alternatives by comparing
the speedups.
b) Overall speedup achieved if you modify the
compiler so that 75% of the computations can use
the floating-point processor.
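Part (b) fixes the enhanced fraction at 75%. A hedged sketch of how the overall speedup then depends on how much faster the floating-point processor is; the FP speedup factors in the loop are hypothetical. It also computes the limit 1 / (1 − 0.75) = 4, which follows from Amdahl's Law: no matter how fast the FP processor is, the remaining 25% of the time caps the overall speedup below 4x.

```python
FRACTION_FP = 0.75  # from part (b): 75% of the computations can use the FP processor


def overall_speedup(fraction: float, speedup_enhanced: float) -> float:
    """Amdahl's Law with the old execution time normalized to 1."""
    return 1.0 / ((1.0 - fraction) + fraction / speedup_enhanced)


def speedup_ceiling(fraction: float) -> float:
    """Limit of overall_speedup as speedup_enhanced grows without bound."""
    return 1.0 / (1.0 - fraction)


if __name__ == "__main__":
    for fp_speedup in (2.0, 5.0, 10.0, 100.0):  # hypothetical FP-processor speedups
        print(f"FP {fp_speedup:>5.0f}x -> overall {overall_speedup(FRACTION_FP, fp_speedup):.3f}x")
    print(f"ceiling: {speedup_ceiling(FRACTION_FP):.1f}x")
```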
Example (3):
b) To improve the speedup, consider two
options:
CPU time is the time between the start and the end of execution of a
given program.
CPU time depends on the program being executed, including:
• the types of instructions executed
• the number of instructions executed
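The dependence on both the types and the number of instructions is usually summarized by the CPU performance equation, CPU time = Σ (IC_i × CPI_i) × clock cycle time, where IC_i is the number of class-i instructions executed and CPI_i the clock cycles each one takes. A minimal Python sketch; the instruction classes, counts, CPIs, and the 1 GHz clock are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class InstrClass:
    name: str
    count: int   # IC_i: instructions of this class executed by the program
    cpi: float   # CPI_i: clock cycles per instruction of this class


def total_cycles(classes: list[InstrClass]) -> float:
    return sum(c.count * c.cpi for c in classes)


def average_cpi(classes: list[InstrClass]) -> float:
    """Weighted average CPI = total cycles / total instruction count."""
    return total_cycles(classes) / sum(c.count for c in classes)


def cpu_time_seconds(classes: list[InstrClass], clock_rate_hz: float) -> float:
    """CPU time = total cycles x clock cycle time, with cycle time = 1 / clock rate."""
    return total_cycles(classes) / clock_rate_hz


# Hypothetical program profile and clock rate.
PROGRAM = [
    InstrClass("ALU", 500_000, 1.0),
    InstrClass("load/store", 200_000, 2.0),
    InstrClass("branch", 300_000, 3.0),
]
CLOCK_HZ = 1e9

if __name__ == "__main__":
    print(f"average CPI = {average_cpi(PROGRAM):.2f}")  # 1.80
    print(f"CPU time    = {cpu_time_seconds(PROGRAM, CLOCK_HZ):.6f} s")
```

The number of cycles equals the number of instructions only when every CPI_i is 1; in this hypothetical profile, 1,000,000 instructions need 1,800,000 cycles.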
How many cycles are required for a program?
Assuming each instruction takes exactly one clock cycle:
# of cycles = # of instructions
[Figure: the 1st, 2nd, 3rd, 4th, 5th, 6th, ... instructions laid out one after another along a time axis]
CPU performance equation (cont’d)
CPU Time: Example 1
Example - 2
Suppose you have a load/store computer with the
following instruction mix:
b) We observe that 35% of the ALU ops are paired with a
load, and we propose to replace these ALU ops and their
loads with a new instruction. The new instruction takes 1
clock cycle. With the new instruction added, branches
take 5 clock cycles. Compute the CPI for the new version.
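The instruction-mix values below are hypothetical (ALU 50% at 1 cycle; loads 20%, stores 10%, and branches 20% at 2 cycles each) and should be replaced with the actual mix. The sketch shows the method: each fused instruction removes one ALU op and one load, so the instruction count shrinks, and the new CPI must be divided by the new instruction count, not the old one. The function names are illustrative.

```python
# Hypothetical instruction mix for the OLD version: class -> (fraction, CPI).
# Replace these with the real mix; they are placeholders for illustration.
OLD_MIX = {
    "alu": (0.50, 1.0),
    "load": (0.20, 2.0),
    "store": (0.10, 2.0),
    "branch": (0.20, 2.0),
}


def cpi_of(mix: dict[str, tuple[float, float]]) -> float:
    """Weighted CPI = cycles / instructions (works for unnormalized counts too)."""
    instructions = sum(freq for freq, _ in mix.values())
    cycles = sum(freq * cpi for freq, cpi in mix.values())
    return cycles / instructions


def fuse_alu_load(mix: dict[str, tuple[float, float]], paired_share_of_alu: float,
                  fused_cpi: float, new_branch_cpi: float) -> dict[str, tuple[float, float]]:
    """Replace paired ALU+load sequences with one fused instruction.

    Counts stay per ONE old instruction, so the returned mix no longer sums
    to 1; its total is the new instruction count relative to the old one.
    """
    alu_freq, alu_cpi = mix["alu"]
    load_freq, load_cpi = mix["load"]
    fused = paired_share_of_alu * alu_freq  # each fused op removes 1 ALU op and 1 load
    if fused > load_freq:
        raise ValueError("not enough loads to pair with the ALU ops")
    new_mix = dict(mix)
    new_mix["alu"] = (alu_freq - fused, alu_cpi)
    new_mix["load"] = (load_freq - fused, load_cpi)
    new_mix["branch"] = (mix["branch"][0], new_branch_cpi)
    new_mix["alu+load"] = (fused, fused_cpi)
    return new_mix


if __name__ == "__main__":
    new_mix = fuse_alu_load(OLD_MIX, paired_share_of_alu=0.35, fused_cpi=1.0, new_branch_cpi=5.0)
    relative_ic = sum(freq for freq, _ in new_mix.values())
    print(f"old CPI = {cpi_of(OLD_MIX):.3f}")
    print(f"new IC  = {relative_ic:.3f} x old IC")
    print(f"new CPI = {cpi_of(new_mix):.3f}")
```

With this hypothetical mix, the new version executes 82.5% of the old instructions, but its CPI rises from 1.5 to about 2.12 because branches now take 5 cycles. That is why part (c) compares total execution time rather than CPI alone.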
c) If the clock of the old version is 20% faster than that of
the new version, which version has the faster CPU
execution time, and by what percentage?
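Part (c) needs execution time, using CPU time = IC × CPI × clock cycle time. A hedged sketch, assuming "20% faster clock" means the old clock rate is 1.2 times the new one (so the old cycle time is the new cycle time divided by 1.2), and taking "X is n% faster than Y" to mean time_Y / time_X = 1 + n/100. The IC and CPI inputs are hypothetical: old CPI 1.5, new CPI 1.75 / 0.825 (about 2.12) at 82.5% of the old instruction count.

```python
def cpu_time(ic: float, cpi: float, cycle_time: float) -> float:
    """CPU performance equation: IC x CPI x clock cycle time."""
    return ic * cpi * cycle_time


def percent_faster(time_fast: float, time_slow: float) -> float:
    """n such that time_slow / time_fast = 1 + n / 100."""
    return (time_slow / time_fast - 1.0) * 100.0


# Hypothetical inputs, normalized to the old instruction count and the new cycle time.
IC_OLD, CPI_OLD = 1.0, 1.5
IC_NEW, CPI_NEW = 0.825, 1.75 / 0.825
CYCLE_NEW = 1.0
CYCLE_OLD = CYCLE_NEW / 1.2  # assumption: old clock rate is 1.2x the new one

t_old = cpu_time(IC_OLD, CPI_OLD, CYCLE_OLD)
t_new = cpu_time(IC_NEW, CPI_NEW, CYCLE_NEW)

if __name__ == "__main__":
    faster = "old" if t_old < t_new else "new"
    pct = percent_faster(min(t_old, t_new), max(t_old, t_new))
    print(f"t_old = {t_old:.3f}, t_new = {t_new:.3f}: {faster} version is {pct:.1f}% faster")
```

With these hypothetical inputs the old version comes out 40% faster (1.75 / 1.25 = 1.4). The same two functions apply to any "clock is n% faster" comparison once the real IC and CPI values are known.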
Example - 3
a) relative frequency of occurrence
c) Assuming that the clock on system B is 10% faster than
the clock on system A, which system is faster for the
given application problem, and by what percentage?
Example – 3
Consider our earlier example, here modified to use
measurements of the frequency of the instructions and of
the instruction CPI values, which, in practice, are obtained by
simulation or by hardware instrumentation.
Suppose we have made the following
measurements:
— Frequency of FP operations (other than FPSQR) = 25%
— Average CPI of FP operations = 4.0
— Average CPI of other instructions = 1.33
— Frequency of FPSQR = 2%; CPI of FPSQR = 20
Assume that the two design alternatives are to decrease the
CPI of FPSQR to 2 or to decrease the average CPI of all FP
operations to 2.5. Compare these two design alternatives
using the CPU performance equation.
CPU performance equation (cont’d)
First, observe that only the CPI changes; the clock rate and
instruction count remain identical. We start by finding the original
CPI with neither enhancement:
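Since only the CPI changes, each alternative's speedup is CPI_original / CPI_new. A minimal Python sketch using the measurements listed above, under one stated assumption: the three instruction classes (FP operations other than FPSQR at 25%, FPSQR at 2%, all other instructions at 73%) are disjoint, as the wording "other than FPSQR" reads. For alternative 2, both FP classes are given CPI 2.5, so the average CPI of all FP operations is 2.5.

```python
# Measurements from the example (assumption: the classes below are disjoint).
FREQ_FP_OTHER, CPI_FP_OTHER = 0.25, 4.0       # FP operations other than FPSQR
FREQ_FPSQR, CPI_FPSQR = 0.02, 20.0            # FP square root
FREQ_REST = 1.0 - FREQ_FP_OTHER - FREQ_FPSQR  # all other instructions (0.73)
CPI_REST = 1.33


def cpi(fp_other_cpi: float = CPI_FP_OTHER, fpsqr_cpi: float = CPI_FPSQR) -> float:
    """Weighted CPI = sum over classes of frequency x CPI."""
    return (FREQ_FP_OTHER * fp_other_cpi
            + FREQ_FPSQR * fpsqr_cpi
            + FREQ_REST * CPI_REST)


cpi_original = cpi()
cpi_alt1 = cpi(fpsqr_cpi=2.0)                    # alternative 1: FPSQR CPI -> 2
cpi_alt2 = cpi(fp_other_cpi=2.5, fpsqr_cpi=2.5)  # alternative 2: all FP ops -> 2.5

if __name__ == "__main__":
    # Clock rate and instruction count are unchanged, so speedup = CPI ratio.
    print(f"original CPI = {cpi_original:.4f}")
    print(f"alt 1: CPI = {cpi_alt1:.4f}, speedup = {cpi_original / cpi_alt1:.3f}")
    print(f"alt 2: CPI = {cpi_alt2:.4f}, speedup = {cpi_original / cpi_alt2:.3f}")
```

Under this reading the original CPI is about 2.37, and alternative 2 (CPI about 1.65) beats alternative 1 (CPI about 2.01). If FPSQR were instead counted as part of the 25% of FP operations, leaving 75% other instructions, the original CPI would be 4.0 × 0.25 + 1.33 × 0.75 ≈ 2.0.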
Example - 4
Suppose we are considering two alternatives for our
conditional branch instructions, as follows:
— CPU A: A condition code is set by a compare instruction and
followed by a branch that tests the condition code.
— CPU B: A compare is included in the branch.
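Since we are ignoring all systems issues, we can use the CPU
performance formula:
$$\text{CPI}_A = 0.20 \times 2 + 0.80 \times 1 = 1.2$$
since 20% are branches taking 2 clock cycles and the rest of the
instructions take 1 cycle each. The performance of CPU A is then
$$\text{CPU time}_A = \text{IC}_A \times 1.2 \times \text{Clock cycle time}_A$$
$\text{Clock cycle time}_B$ is $1.25 \times \text{Clock cycle time}_A$, since A has a clock
rate that is 1.25 times higher. Compares are not executed in CPU
B, so 20%/80%, or 25%, of the instructions are now branches taking
2 clock cycles, and the remaining 75% of the instructions take 1
cycle. Hence,
$$\text{CPI}_B = 0.25 \times 2 + 0.75 \times 1 = 1.25$$
Because CPU B doesn’t execute compares, $\text{IC}_B = 0.8 \times \text{IC}_A$. Hence, the performance of CPU B is
$$\text{CPU time}_B = \text{IC}_B \times \text{CPI}_B \times \text{Clock cycle time}_B = 0.8 \times \text{IC}_A \times 1.25 \times (1.25 \times \text{Clock cycle time}_A) = 1.25 \times \text{IC}_A \times \text{Clock cycle time}_A$$
Under these assumptions, CPU A, with the shorter clock cycle, is faster than CPU B, since $1.2 < 1.25$.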
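The comparison can be checked numerically. A small sketch that normalizes CPU A's instruction count and clock cycle time to 1 (an arbitrary choice, since only the ratio of the two CPU times matters) and applies the figures from the example: 20% branches at 2 cycles on CPU A, compares removed on CPU B so that IC_B = 0.8 × IC_A, and CPU B's cycle time 1.25 times CPU A's.

```python
IC_A = 1.0      # normalized instruction count of CPU A
CYCLE_A = 1.0   # normalized clock cycle time of CPU A

# CPU A: 20% branches (2 cycles); all other instructions, including compares, take 1 cycle.
cpi_a = 0.20 * 2 + 0.80 * 1
time_a = IC_A * cpi_a * CYCLE_A

# CPU B: compares are folded into branches, so 20% of A's instructions disappear.
ic_b = 0.8 * IC_A
branch_share_b = 0.20 / 0.80  # branches are now 25% of B's instructions
cpi_b = branch_share_b * 2 + (1 - branch_share_b) * 1
cycle_b = 1.25 * CYCLE_A      # A's clock rate is 1.25x higher
time_b = ic_b * cpi_b * cycle_b

if __name__ == "__main__":
    print(f"CPI_A = {cpi_a:.2f}, CPU time_A = {time_a:.3f}")
    print(f"CPI_B = {cpi_b:.2f}, CPU time_B = {time_b:.3f}")
    winner = "A" if time_a < time_b else "B"
    ratio = max(time_a, time_b) / min(time_a, time_b)
    print(f"CPU {winner} is faster by {(ratio - 1) * 100:.1f}%")
```

With these inputs CPU A comes out ahead by roughly 4% (1.25 / 1.2 ≈ 1.042): CPU B's smaller instruction count does not make up for its longer clock cycle and higher CPI.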