Chapter 1 Notes
Relative Performance
◼ Define Performance = 1/Execution Time
◼ “X is n times faster than Y”
Performance X / Performance Y = Execution time Y / Execution time X = n
DP = F x ½ CV^2
SP = ½ CV^2
CPI = Clock Cycles / Instruction Count = sum_{i=1}^{n} (CPI_i x Instruction Count_i / Instruction Count)
where Instruction Count_i / Instruction Count is the relative frequency of instruction class i
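The two definitions above (relative performance and frequency-weighted CPI) can be sketched in a few lines of Python. The function names and the sample numbers are hypothetical and chosen only for illustration; they do not come from the notes.

```python
def relative_performance(exec_time_x: float, exec_time_y: float) -> float:
    """Return n such that X is n times faster than Y.

    Performance = 1 / execution time, so
    Performance_X / Performance_Y = exec_time_y / exec_time_x.
    """
    return exec_time_y / exec_time_x


def weighted_cpi(classes) -> float:
    """Average CPI from a list of (cpi_i, instruction_count_i) pairs.

    Equivalent to sum(CPI_i x relative frequency_i), because the total
    clock cycles are divided by the total instruction count.
    """
    total_instructions = sum(count for _, count in classes)
    total_cycles = sum(cpi * count for cpi, count in classes)
    return total_cycles / total_instructions


# Hypothetical inputs: X takes 10 s, Y takes 15 s on the same program.
n = relative_performance(exec_time_x=10.0, exec_time_y=15.0)

# Hypothetical mix: three instruction classes with CPI 1, 2, and 3.
cpi = weighted_cpi([(1, 2_000), (2, 1_000), (3, 1_000)])

print(f"X is {n:.2f} times faster than Y")
print(f"Average CPI = {cpi:.2f}")
```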
Performance Summary
◼ Performance depends on
◼ Algorithm: affects IC, possibly CPI
◼ Programming language: affects IC, CPI
◼ Compiler: affects IC, CPI
◼ Instruction set architecture: affects IC, CPI, Tc
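IC, CPI, and Tc in the list above are the three factors of the classic CPU time equation, CPU time = IC x CPI x Tc, where Tc is the clock period. The sketch below shows how changing one factor changes execution time. The function name and all numbers are hypothetical, picked only to illustrate the relation.

```python
def cpu_time(instruction_count: float, cpi: float, clock_period_s: float) -> float:
    """CPU time in seconds = IC x CPI x Tc."""
    return instruction_count * cpi * clock_period_s


# Hypothetical baseline: 1 million instructions, CPI of 2.0, 1 ns clock period.
baseline = cpu_time(1_000_000, 2.0, 1e-9)

# Hypothetical case: a better compiler lowers IC by 20% with CPI and Tc unchanged.
fewer_instructions = cpu_time(800_000, 2.0, 1e-9)

print(f"baseline:           {baseline * 1e3:.2f} ms")
print(f"fewer instructions: {fewer_instructions * 1e3:.2f} ms")
```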
Example Problem 1
◼ Consider three different processors P1, P2, and P3 executing the
same instruction set. P1 has a 3 GHz clock rate and a CPI of 1.5. P2
has a 2.5 GHz clock rate and a CPI of 1.0. P3 has a 4.0 GHz clock
rate and has a CPI of 2.2.
a. Which processor has the highest performance expressed in instructions
per second?
b. If the processors each execute a program in 10 seconds, find the number
of cycles and the number of instructions.
c. We are trying to reduce the execution time by 30%, but this leads to an
increase of 20% in the CPI. What clock rate should we have to get this time
reduction?
1a
IPS = Cycles per Second/Cycles per Instruction
A measure of throughput – or rate of doing work
◼ Performance of P1: 3 GHz/1.5 = 2 x 10^9 Inst/sec
◼ Performance of P2: 2.5 GHz/1.0 = 2.5 x 10^9 Inst/sec
◼ Performance of P3: 4 GHz/2.2 = 1.82 x 10^9 Inst/sec
◼ CPI can be as relevant to processor performance as clock
frequency
◼ Faster clocks may not always be a good thing – higher power
dissipation, worse reliability, worse coupling noise… and not the
best rate of processing instructions!
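The part 1a arithmetic can be checked with a short script. This is a minimal sketch: the clock rates and CPI values come from the problem statement, while the variable names are illustrative.

```python
# (clock rate in Hz, CPI) for each processor, from the problem statement
processors = {
    "P1": (3.0e9, 1.5),
    "P2": (2.5e9, 1.0),
    "P3": (4.0e9, 2.2),
}

# IPS = cycles per second / cycles per instruction
ips = {name: clock_hz / cpi for name, (clock_hz, cpi) in processors.items()}

# The processor with the highest instruction throughput
fastest = max(ips, key=ips.get)

for name, rate in ips.items():
    print(f"{name}: {rate:.2e} instructions/s")
print(f"Highest performance: {fastest}")
```

P2 has the lowest clock rate of the three yet comes out on top, which is the point the bullets above make about CPI.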
1b
◼ # of cycles = Cycles per second x time (in seconds)
◼ Cycles of P1: 3 GHz x 10 s = 30 B cycles
◼ Cycles of P2: 2.5GHz x 10 s = 25 B cycles
◼ Cycles of P3: 4 GHz x 10s = 40 B cycles
Lower CPI translates into higher productivity and higher energy efficiency as a
result
1b contd.
◼ # of Instructions = Cycles / CPI
◼ # of Instructions of P1: 30 B cycles/1.5 Cycles per Instruction = 20B
◼ # of Instructions of P2: 25 B cycles/1 Cycles per Instruction = 25B
◼ # of Instructions of P3: 40 B cycles/2.2 Cycles per Instruction =
18.18B
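The two steps of part 1b (cycles first, then instructions) can be reproduced as follows. The processor data is taken from the problem statement; the names `TIME_S` and `results` are illustrative.

```python
TIME_S = 10.0  # each processor runs its program for 10 seconds

# (clock rate in Hz, CPI) for each processor, from the problem statement
processors = {
    "P1": (3.0e9, 1.5),
    "P2": (2.5e9, 1.0),
    "P3": (4.0e9, 2.2),
}

results = {}
for name, (clock_hz, cpi) in processors.items():
    cycles = clock_hz * TIME_S        # cycles = cycles per second x time
    instructions = cycles / cpi       # instructions = cycles / CPI
    results[name] = (cycles, instructions)
    print(f"{name}: {cycles / 1e9:.2f} B cycles, "
          f"{instructions / 1e9:.2f} B instructions")
```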
1c
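◼ Lower execution time trades off with higher CPI & higher Fclk
◼ The 30% reduction in execution time comes with a 20% higher CPI: ET_new = 0.7 x ET_old, CPI_new = 1.2 x CPI_old
# Instructions x CPI_new / ET_new = Fclk_new
With the instruction count unchanged, Fclk_new = (1.2 / 0.7) x Fclk_old, or about 1.71 x Fclk_old
High-CPI processors require even higher clock rates to get the same %
improvement in execution time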
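Applying the relation Fclk_new = # Instructions x CPI_new / ET_new to the three processors gives the required clock rates. The sketch assumes the instruction count stays the same, so only the CPI factor (1.2) and the time factor (0.7) matter. The constant and function names are illustrative.

```python
CPI_FACTOR = 1.2   # CPI increases by 20%
TIME_FACTOR = 0.7  # execution time is reduced by 30%

# Original clock rates in Hz, from the problem statement
old_clocks = {"P1": 3.0e9, "P2": 2.5e9, "P3": 4.0e9}


def new_clock_rate(old_clock_hz: float) -> float:
    """Clock rate needed after the change, assuming the instruction count is unchanged.

    Fclk = IC x CPI / ET, so scaling CPI by 1.2 and ET by 0.7
    scales Fclk by 1.2 / 0.7.
    """
    return old_clock_hz * CPI_FACTOR / TIME_FACTOR


new_clocks = {name: new_clock_rate(clock) for name, clock in old_clocks.items()}

for name, clock in new_clocks.items():
    print(f"{name}: {old_clocks[name] / 1e9:.2f} GHz -> {clock / 1e9:.2f} GHz")
```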
Example Problem 2
Compilers can have a profound impact on the performance of an
application. Assume that for a program, compiler A results in a
dynamic instruction count of 1.0E9 and has an execution time of 1.1
s, while compiler B results in a dynamic instruction count of 1.2E9
and an execution time of 1.5 s.
a. Find the average CPI for each program given that the processor has a
clock cycle time of 1 ns.
b. Assume the compiled programs run on two different processors. If the
execution times on the two processors are the same, how much faster is the
clock of the processor running compiler A’s code versus the clock of the
processor running compiler B’s code?
c. A new compiler is developed that uses only 6.0E8 instructions and has an
average CPI of 1.1. What is the speedup of using this new compiler versus
using compiler A or B on the original processor?
2a
◼ CPI = ETime x Fclk / Instr Count
◼ Compiler A CPI = 1.1s x 1GHz / 1 B = 1.1
◼ Compiler B CPI = 1.5s x 1GHz / 1.2 B = 1.25
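The part 2a calculation can be written as a small helper. A clock cycle time of 1 ns corresponds to the 1 GHz used above. The function name is illustrative.

```python
CLOCK_PERIOD_S = 1e-9  # 1 ns clock cycle time, i.e. a 1 GHz clock


def average_cpi(exec_time_s: float, instruction_count: float,
                clock_period_s: float = CLOCK_PERIOD_S) -> float:
    """CPI = execution time / (instruction count x clock period)."""
    return exec_time_s / (instruction_count * clock_period_s)


cpi_a = average_cpi(exec_time_s=1.1, instruction_count=1.0e9)  # compiler A
cpi_b = average_cpi(exec_time_s=1.5, instruction_count=1.2e9)  # compiler B

print(f"Compiler A CPI = {cpi_a:.2f}")
print(f"Compiler B CPI = {cpi_b:.2f}")
```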
Example Problem 6
a. If E1 can be used 20% of the time and E2 can be used 10% of the time, what would be
the overall speedup?
b. If the percentage of time that E1 can be used decreased to 15%, what percentage of the
time would the use of E2 have to be to get the same overall speedup as in part (a)?
c. Suppose we are free to choose between E1 or E2, whenever we want (the percentages
of time for using E1 or E2 can be varied as desired, but in total cannot be more than 100%
of the time). What would be the maximum achievable overall speedup?
6 a, b, c
a. Speedup = Te old / Te new
= Te / [20% x (Te/10) + 10% x (Te/5) + 70% x Te]
= 1 / [0.02 + 0.02 + 0.7] = 1/0.74 = 1.35
b. 1 / [0.15/10 + x/5 + (0.85 - x)] = 1 / 0.74
0.15/10 + x/5 + (0.85 - x) = 0.74
0.865 - 0.8x = 0.74
x = 0.125 / 0.8 = 0.15625
Enhancement 2 would need to increase its share of the time from 10% to
15.625% to make up for the decrease in Enhancement 1's share from 20% to
15%
c. Use the faster enhancement (E1, speedup of 10) 100% of the time:
speedup = Te / [100% x (Te/10)] = 10
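The three parts of this solution follow the same pattern: new time = unenhanced fraction + sum of (enhanced fraction / speedup). Below is a minimal sketch of that pattern, assuming speedups of 10 for E1 and 5 for E2 as the terms Te/10 and Te/5 imply. The function and variable names are illustrative.

```python
E1_SPEEDUP = 10  # assumed from the Te/10 term in the solution
E2_SPEEDUP = 5   # assumed from the Te/5 term in the solution


def overall_speedup(enhancements) -> float:
    """Overall speedup for a list of (fraction_of_time, speedup) pairs.

    The fractions must not overlap and must sum to at most 1; the
    remaining fraction of time runs unenhanced.
    """
    enhanced_fraction = sum(fraction for fraction, _ in enhancements)
    if enhanced_fraction > 1 + 1e-12:
        raise ValueError("enhanced fractions cannot exceed 100% of the time")
    new_time = (1 - enhanced_fraction) + sum(
        fraction / speedup for fraction, speedup in enhancements
    )
    return 1 / new_time


# Part a: E1 used 20% of the time, E2 used 10% of the time.
part_a = overall_speedup([(0.20, E1_SPEEDUP), (0.10, E2_SPEEDUP)])

# Part b: solve 0.15/10 + x/5 + (0.85 - x) = 1/part_a for x.
target_time = 1 / part_a
x = (0.15 / E1_SPEEDUP + 0.85 - target_time) / (1 - 1 / E2_SPEEDUP)

# Part c: use the faster enhancement (E1) 100% of the time.
part_c = overall_speedup([(1.0, E1_SPEEDUP)])

print(f"a. speedup = {part_a:.2f}")
print(f"b. x = {x:.5f}")
print(f"c. max speedup = {part_c:.1f}")
```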
Example Problem 7
◼ Suppose a program (or a program task) takes 1 billion instructions to
execute on a processor running at 2 GHz. Suppose also that 50% of the
instructions execute in 3 clock cycles, 30% execute in 4 clock cycles, and
20% execute in 5 clock cycles. What is the execution time for the program
or task?
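The question as stated can be answered by weighting each cycle count by its share of the instructions and then dividing the total cycles by the clock rate. The sketch below uses only the figures from the problem statement; the variable names are illustrative.

```python
INSTRUCTION_COUNT = 1e9  # 1 billion instructions
CLOCK_HZ = 2e9           # 2 GHz

# (fraction of instructions, clock cycles per instruction)
instruction_mix = [(0.50, 3), (0.30, 4), (0.20, 5)]

# Average CPI = sum of (relative frequency x cycles) over all classes
cpi = sum(fraction * cycles for fraction, cycles in instruction_mix)

total_cycles = INSTRUCTION_COUNT * cpi
exec_time_s = total_cycles / CLOCK_HZ

print(f"CPI = {cpi:.2f}")
print(f"Total cycles = {total_cycles:.3e}")
print(f"Execution time = {exec_time_s:.2f} s")
```

The average CPI works out to 0.5 x 3 + 0.3 x 4 + 0.2 x 5 = 3.7, giving 3.7 x 10^9 cycles and an execution time of 1.85 s at 2 GHz.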
So,
1 billion instructions x CPI = number of cycles required by program = 3.5 x 10^9
at 1.9 GHz,
one clock cycle consumes 1 / [1.9 x 10^9] seconds, or 0.526315 x 10^-9 seconds
So,
3.5 x 10^9 cycles consume 3.5 x 10^9 x 0.526315 x 10^-9 seconds = 1.8421025 sec
so the improvement over the original 1.85 sec is 1.85/1.8421025 = 1.0042872, or about 0.43%