CH 2
Which computer is fastest?
Not so simple
• scientific simulation - FP performance
• program development - Integer performance
• commercial work - I/O

Want to buy the fastest computer for what you want to do
• workload is important
Want to design the fastest computer for what they want to pay
• BUT cost is an important criterion
© 2000 by Mark D. Hill CS/ECE 552 Lecture Notes: Chapter 2
MIPS and MFLOPS
Which programs and how to average
Amdahl’s law

• minimize elapsed time for program = time_end - time_start
• called response time
Computer center manager
• maximize completion rate = #jobs/second
• called throughput
Response Time vs. Throughput
Is throughput = 1/avg. response time?
• only if NO overlap
• with overlap, throughput > 1/avg. response time
• e.g., a lunch buffet - assume 5 entrees
• each person takes 2 minutes at every entree
• throughput is 1 person every 2 minutes
• BUT time to fill up tray is 10 minutes
• why and what would the throughput be, otherwise?
because there are 5 people (each at 1 entree) simultaneously; if there is no such overlap, throughput = 1/10 (1 person every 10 minutes)

What is Performance for us?
For computer architects
• CPU execution time = time spent running a program
Because people like faster to be bigger to match intuition
• performance = 1/X time
• where X = response, CPU execution, etc.
Elapsed time = CPU execution time + I/O wait
We will concentrate mostly on CPU execution time
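The buffet numbers can be checked with a short Python sketch. The function name `buffet` and its parameters are illustrative, not part of the notes; the sketch assumes every entree takes the same time and that, with overlap, one person occupies each entree in steady state.

```python
def buffet(entrees, minutes_per_entree, overlap):
    """Return (response_time_minutes, throughput_people_per_minute)."""
    # Each person still visits every entree, so response time is unchanged.
    response_time = entrees * minutes_per_entree
    if overlap:
        # Steady state: one person leaves the last entree every service period.
        throughput = 1 / minutes_per_entree
    else:
        # Only one person is in the line at a time.
        throughput = 1 / response_time
    return response_time, throughput


rt, tp_overlap = buffet(5, 2, overlap=True)
_, tp_serial = buffet(5, 2, overlap=False)
print("response time (min):", rt)
print("throughput with overlap (people/min):", tp_overlap)
print("throughput without overlap (people/min):", tp_serial)
```

With overlap the throughput exceeds 1/response time by a factor equal to the number of entrees; without overlap the two coincide.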
Breaking down Performance
A program is broken into instructions
• H/W is aware of instructions, not programs
At lower level, H/W breaks instructions into cycles
• lower level state machines change state every cycle
E.g., 500 MHz PentiumIII runs 500M cycles/sec, 1 cycle = 2 ns
E.g., 2 GHz PentiumX will run 2G cycles/sec, 1 cycle = 0.5 ns

Iron law
Time/program = instrs/program x cycles/instr x sec/cycle
sec/cycle (a.k.a. cycle time, clock time) - ‘heartbeat’ of computer
• mostly determined by technology and CPU organization
cycles/instr (a.k.a. CPI)
• mostly determined by ISA and CPU organization
• overlap among instructions makes this smaller
instr/program (a.k.a. instruction count)
• instrs executed NOT static code
• mostly determined by program, compiler, ISA
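A minimal Python sketch of the iron law, for checking the clock examples. The helper names are illustrative, and the instruction count and CPI in the last call are made-up values, not measurements from the notes.

```python
def cycle_time_ns(clock_hz):
    """sec/cycle expressed in nanoseconds."""
    return 1e9 / clock_hz


def execution_time_s(instr_count, cpi, clock_hz):
    """Iron law: instrs/program x cycles/instr x sec/cycle."""
    return instr_count * cpi * (1.0 / clock_hz)


print(cycle_time_ns(500e6))  # 500 MHz clock
print(cycle_time_ns(2e9))    # 2 GHz clock

# Hypothetical program: 1e9 dynamic instructions at CPI 1.5 on a 500 MHz clock.
example_time = execution_time_s(1e9, 1.5, 500e6)
print(example_time)
```

The instruction count passed in is the dynamic count (instructions executed), not the static code size.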
Problems with MIPS
E.g., without FP H/W, an FP op may take 50 single-cycle instrs
with FP H/W only one 2-cycle instr
• CPI increases (why?) the FP op goes from 50/50 to 2/1
• but instrs/prog decreases more (why?) each FP op reduces from 50 instrs to 1, a factor of 50
• total execution time decreases
• For MIPS
• instrs/prog ignored
• MIPS gets worse!

Problems with MIPS
Ignore program
Usually used to quote peak performance
When is MIPS ok?
• same compiler and same ISA
• e.g., same binary running on Pentium Pro and Pentium
• why? instrs/prog is constant and may be ignored
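The FP example can be worked through numerically. This is a sketch under stated assumptions: the 500 MHz clock and the count of one million FP operations are hypothetical, and the program is assumed to consist only of FP operations so that the CPI values are exactly 1 and 2.

```python
CLOCK_HZ = 500e6    # assumed clock rate, identical for both machines
FP_OPS = 1_000_000  # assumed number of FP operations in the program


def time_and_mips(instrs, cpi, clock_hz):
    """Return (execution time in seconds, MIPS rating)."""
    seconds = instrs * cpi / clock_hz
    mips = instrs / (seconds * 1e6)
    return seconds, mips


# Without FP H/W: each FP op is 50 single-cycle instructions.
t_sw, mips_sw = time_and_mips(FP_OPS * 50, 1.0, CLOCK_HZ)
# With FP H/W: each FP op is one 2-cycle instruction.
t_hw, mips_hw = time_and_mips(FP_OPS * 1, 2.0, CLOCK_HZ)

print("time ratio (no FP H/W / FP H/W):", t_sw / t_hw)
print("MIPS without FP H/W:", mips_sw)
print("MIPS with FP H/W:", mips_hw)
```

Under these assumptions the machine with FP hardware finishes 25 times sooner yet reports half the MIPS, because MIPS ignores the drop in instruction count.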
Iron Law Example
Machine A: clock 1 ns, CPI 2.0, for a program
Machine B: clock 2 ns, CPI 1.2, for same program

Iron Law Example
Keep clock of A at 1 ns and clock of B at 2 ns
For equal performance, if CPI of B is 1.2, what is CPI of A?
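Both questions can be answered with the iron law. The sketch below assumes the two machines execute the same number of instructions N for this program, so N cancels in every ratio; the function and variable names are illustrative.

```python
def time_per_program_ns(instrs, cpi, clock_ns):
    """Iron law with the cycle time given in nanoseconds."""
    return instrs * cpi * clock_ns


N = 1  # instruction count; assumed equal on A and B, so it cancels in ratios

time_a = time_per_program_ns(N, 2.0, 1.0)
time_b = time_per_program_ns(N, 1.2, 2.0)
a_over_b_speedup = time_b / time_a  # how many times faster A is than B

# Equal performance with clocks fixed: CPI(A) x 1 ns = 1.2 x 2 ns.
cpi_a_for_equal = 1.2 * 2.0 / 1.0

print(time_a, time_b, a_over_b_speedup, cpi_a_for_equal)
```

Machine A comes out 1.2 times faster, and A could tolerate a CPI of up to 2.4 before B catches up.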
Time(A)/Time(B) = 1 = (N x 2.0 x clock(A))/(N x 1.2 x 2), so with both CPIs fixed, clock(A) = 1.2 ns
• port them and time the whole “workload”
How to average
Example (page 70)
            Machine A   Machine B
Program 1   1           10
Program 2   1000        100

How to average
Another: arithmetic mean (same result)
Arithmetic mean of times: $\frac{1}{n}\sum_{i=1}^{n} \text{time}(i)$ for n programs
Weighted arithmetic mean: $\frac{1}{n}\sum_{i=1}^{n} \text{weight}(i) \times \text{time}(i)$
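The two means can be sketched in Python using the two-program times that appear in these notes (Program 1: 1 and 10; Program 2: 1000 and 100). The weighted form follows the formula as written here, dividing by n, so a weight of 1 for every program reduces it to the plain arithmetic mean; the weights in the last lines are hypothetical.

```python
def arithmetic_mean(times):
    """Sum of time(i) over n programs, divided by n."""
    return sum(times) / len(times)


def weighted_arithmetic_mean(weights, times):
    """Sum of weight(i) * time(i), divided by n, as in the formula above."""
    return sum(w * t for w, t in zip(weights, times)) / len(times)


times_a = [1, 1000]  # Machine A: Program 1, Program 2
times_b = [10, 100]  # Machine B: Program 1, Program 2

am_a = arithmetic_mean(times_a)
am_b = arithmetic_mean(times_b)
print(am_a, am_b)

# Hypothetical weights: Program 1 counted three times as heavily as Program 2.
wam_a = weighted_arithmetic_mean([1.5, 0.5], times_a)
wam_b = weighted_arithmetic_mean([1.5, 0.5], times_b)
print(wam_a, wam_b)
```

The arithmetic mean ranks the machines the same way total execution time does, since it is the total divided by a constant.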
Average speed = total distance / total time
• e.g., 10 miles at 30 mph, then 10 miles at 90 mph: 20 / (10/30 + 10/90)
• 45 mph

Trick to do arithmetic mean of times but using rates and not times
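The speed example is a harmonic mean in disguise, which a few lines of Python make visible. The function names are illustrative; the distances and speeds are the ones in the expression above.

```python
def average_speed(distances, speeds):
    """Total distance divided by total time."""
    total_time = sum(d / s for d, s in zip(distances, speeds))
    return sum(distances) / total_time


def harmonic_mean(rates):
    """n divided by the sum of reciprocals of the rates."""
    return len(rates) / sum(1 / r for r in rates)


avg = average_speed([10, 10], [30, 90])
hm = harmonic_mean([30, 90])
naive = (30 + 90) / 2  # arithmetic mean of the rates, which overstates the speed

print(avg, hm, naive)
```

With equal distances, the harmonic mean of the two rates matches total distance over total time, while the arithmetic mean of the rates does not.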
Dealing with Ratios
            Machine A   Machine B
Program 1   1           10
Program 2   1000        100
If we take ratios, with respect to Machine A
            Machine A   Machine B
Program 1   1           10
Program 2   1           0.1

Dealing with Ratios
E.g., average for machine A is 1, average for machine B is 5.05
If we take ratios, with respect to Machine B
            Machine A   Machine B
Program 1   0.1         1
Program 2   10          1
average for machine A = 5.05, average for machine B = 1
can’t both be true!
Don’t use arithmetic mean on ratios (normalized numbers)
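The contradiction can be reproduced directly. This sketch normalizes the times to each machine in turn and takes the arithmetic mean of the resulting ratios; the helper names are illustrative.

```python
TIMES = {"A": [1, 1000], "B": [10, 100]}  # Program 1, Program 2


def normalize(times, reference):
    """Divide every machine's times by the reference machine's times."""
    ref = times[reference]
    return {m: [t / r for t, r in zip(ts, ref)] for m, ts in times.items()}


def arithmetic_mean(values):
    return sum(values) / len(values)


means_wrt_a = {m: arithmetic_mean(r) for m, r in normalize(TIMES, "A").items()}
means_wrt_b = {m: arithmetic_mean(r) for m, r in normalize(TIMES, "B").items()}

print("normalized to A:", means_wrt_a)
print("normalized to B:", means_wrt_b)
```

Normalized to A, machine B looks about five times slower; normalized to B, machine A looks about five times slower. The ranking depends only on the choice of reference machine.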
Use GM if forced to use ratios

If we took total execution time, A and B are equal only if
• program 1 is run 100 times more often than program 2
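A short sketch of both points, using the two-program times from these notes (A: 1 and 1000; B: 10 and 100). The geometric mean of the ratios gives the same verdict whichever machine is the reference, and the total-time comparison shows how lopsided the workload must be for that verdict to match real execution time. Function names are illustrative.

```python
import math

TIMES = {"A": [1, 1000], "B": [10, 100]}  # Program 1, Program 2


def geometric_mean(values):
    """n-th root of the product of n values."""
    return math.prod(values) ** (1 / len(values))


def gm_of_ratios(machine, reference):
    ratios = [t / r for t, r in zip(TIMES[machine], TIMES[reference])]
    return geometric_mean(ratios)


gm_b_wrt_a = gm_of_ratios("B", "A")
gm_a_wrt_b = gm_of_ratios("A", "B")
print(gm_b_wrt_a, gm_a_wrt_b)


def total_time(machine, runs_of_program_1, runs_of_program_2):
    t1, t2 = TIMES[machine]
    return runs_of_program_1 * t1 + runs_of_program_2 * t2


total_a = total_time("A", 100, 1)
total_b = total_time("B", 100, 1)
print(total_a, total_b)
```

The GM rates the machines as equal under either normalization, but total execution time agrees only when program 1 runs 100 times for each run of program 2.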
Summary
Use AM for times
Use HM if forced to use rates

Benchmarks: SPEC95
System Performance Evaluation Cooperative
Latest is SPEC2K but the text uses SPEC95
Amdahl’s Law
Why does the common case matter the most?
Speedup = old time/new time = new rate/old rate
Let an optimization speed f fraction of time by a factor of s
$\text{Speedup} = \frac{[(1-f)+f] \times \text{oldtime}}{(1-f) \times \text{oldtime} + (f/s) \times \text{oldtime}} = \frac{1}{1 - f + f/s}$

Amdahl’s Law Example
Your boss asks you to improve Pentium Posterior performance by
• improve the ALU used 95% of time, by 10%
• improve the memory pipeline used 5%, by a factor of 10
Let f = fraction sped up and s = the speedup on that fraction
• new_time = (1-f)*old_time + (f/s)*old_time
• Speedup = new_rate / old_rate = old_time / new_time
• Speedup = old_time / ((1-f)*old_time + (f/s)*old_time)
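The formula is easy to evaluate for the two options in the example. In this sketch the function name is illustrative, and "by 10%" is read as s = 1.10; an infinite s models making the memory pipeline take no time at all.

```python
import math


def amdahl_speedup(f, s):
    """Overall speedup when a fraction f of the time is sped up by a factor s."""
    return 1.0 / ((1.0 - f) + f / s)


alu = amdahl_speedup(0.95, 1.10)            # ALU: 95% of time, 10% faster
mem = amdahl_speedup(0.05, 10.0)            # memory pipeline: 5% of time, 10x
mem_limit = amdahl_speedup(0.05, math.inf)  # memory pipeline made free

print(f"ALU option:        {alu:.4f}")
print(f"Memory option:     {mem:.4f}")
print(f"Memory, s -> inf:  {mem_limit:.4f}")
```

The modest improvement to the common case beats even an unbounded improvement to the rare case, which is bounded above by 1/(1 - f).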
f       s       Speedup
95%     1.10    1.094
5%      10      1.047
5%      ∞       1.052

[Figure: Speedup plotted against f]
Summary of Chapter 2
Time and performance: Machine A n times faster than Machine B
• iff Time(B)/Time(A) = n
Benchmarks: SPEC95