CH 2

The document discusses computer performance metrics, emphasizing the importance of workload type and cost in determining the fastest computer for specific tasks. It covers key concepts such as response time, throughput, MIPS, and Amdahl's law, while also addressing the limitations of using MIPS as a performance metric. Additionally, it highlights the significance of benchmarks like SPEC95 for evaluating system performance.

Uploaded by

jonathanj302

Performance of Computers

Which computer is fastest?
Not so simple
• scientific simulation - FP performance
• program development - Integer performance
• commercial work - I/O

Performance of Computers

Want to buy the fastest computer for what you want to do
• workload is important
Want to design the fastest computer for what they want to pay
• BUT cost is an important criterion

© 2000 by Mark D. Hill CS/ECE 552 Lecture Notes: Chapter 2 1 © 2000 by Mark D. Hill CS/ECE 552 Lecture Notes: Chapter 2 2

Forecast

Time and performance
Iron law
MIPS and MFLOPS
Which programs and how to average
Amdahl’s law

Defining Performance

What is important to whom
Computer system user
• minimize elapsed time for program = time_end - time_start
• called response time
Computer center manager
• maximize completion rate = #jobs/second
• called throughput

Response Time vs. Throughput

Is throughput = 1/av. response time?
• only if NO overlap
• with overlap, throughput > 1/av. response time
• e.g., a lunch buffet - assume 5 entrees
• each person takes 2 minutes at every entree
• throughput is 1 person every 2 minutes
• BUT time to fill up tray is 10 minutes
• why and what would the throughput be, otherwise?
  because there are 5 people (each at 1 entree) simultaneously;
  if there is no such overlap throughput = 1/10

What is Performance for us?

For computer architects
• CPU execution time = time spent running a program
Because people like faster to be bigger to match intuition
• performance = 1/X time
• where X = response, CPU execution, etc.
Elapsed time = CPU execution time + I/O wait
We will concentrate mostly on CPU execution time
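The lunch buffet numbers from the Response Time vs. Throughput slide can be checked with a short sketch. The function names are made up for illustration, and the model assumes a full line in steady state with exactly one person per entree.

```python
# Lunch buffet from the slide: 5 entrees, 2 minutes per person at each entree.
ENTREES = 5
MINUTES_PER_ENTREE = 2.0


def response_time(entrees: int, minutes_per_entree: float) -> float:
    """Minutes for one person to fill a tray by visiting every entree once."""
    return entrees * minutes_per_entree


def throughput(entrees: int, minutes_per_entree: float, overlap: bool) -> float:
    """People served per minute once the line is full (steady state)."""
    if overlap:
        # One person stands at each entree, so someone finishes
        # every `minutes_per_entree` minutes.
        return 1.0 / minutes_per_entree
    # No overlap: the next person starts only after the previous one is done.
    return 1.0 / response_time(entrees, minutes_per_entree)


if __name__ == "__main__":
    print("response time (min):", response_time(ENTREES, MINUTES_PER_ENTREE))
    print("throughput, overlap (people/min):",
          throughput(ENTREES, MINUTES_PER_ENTREE, overlap=True))
    print("throughput, no overlap (people/min):",
          throughput(ENTREES, MINUTES_PER_ENTREE, overlap=False))
```

With overlap the throughput (1 person every 2 minutes) is five times 1/response time; without overlap the two coincide at 1/10 person per minute.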


Improve Performance

Improve (a) response time or (b) throughput?
• faster CPU
  • both (a) and (b)
• Add more CPUs
  • (b) but (a) may be improved due to less queueing

Performance Comparison

Machine A is n times faster than machine B iff
• perf(A)/perf(B) = time(B)/time(A) = n
Machine A is x% faster than machine B iff
• perf(A)/perf(B) = time(B)/time(A) = 1 + x/100
E.g., A 10s, B 15s
• 15/10 = 1.5 => A is 1.5 times faster than B
• 15/10 = 1 + 50/100 => A is 50% faster than B
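The two definitions on the Performance Comparison slide translate directly into code. This is a minimal sketch; the function names are illustrative and not from the notes.

```python
def times_faster(time_a: float, time_b: float) -> float:
    """n such that A is n times faster than B: n = time(B) / time(A)."""
    return time_b / time_a


def percent_faster(time_a: float, time_b: float) -> float:
    """x such that A is x% faster than B: time(B) / time(A) = 1 + x/100."""
    return (time_b / time_a - 1.0) * 100.0


if __name__ == "__main__":
    # Slide example: A takes 10 s, B takes 15 s.
    print(times_faster(10.0, 15.0))
    print(percent_faster(10.0, 15.0))
```

Note that both functions take execution times, not performance values, so the slower machine's time goes in the numerator.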

Breaking down Performance

A program is broken into instructions
• H/W is aware of instructions, not programs
At lower level, H/W breaks instructions into cycles
• lower level state machines change state every cycle
E.g., 500 MHz PentiumIII runs 500M cycles/sec, 1 cycle = 2 ns
E.g., 2 GHz PentiumX will run 2G cycles/sec, 1 cycle = 0.5 ns

Iron law

Time/program = instrs/program x cycles/instr x sec/cycle
sec/cycle (a.k.a. cycle time, clock time) - ‘heartbeat’ of computer
• mostly determined by technology and CPU organization
cycles/instr (a.k.a. CPI)
• mostly determined by ISA and CPU organization
• overlap among instructions makes this smaller
instr/program (a.k.a. instruction count)
• instrs executed NOT static code
• mostly determined by program, compiler, ISA
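As a rough sketch of the iron law, the snippet below converts a clock rate into a cycle time and multiplies the three terms. The instruction count and CPI in the usage lines are hypothetical values chosen only for illustration.

```python
def cycle_time_ns(clock_hz: float) -> float:
    """Cycle time in nanoseconds for a clock rate given in Hz."""
    return 1e9 / clock_hz


def execution_time_s(instr_count: float, cpi: float, clock_hz: float) -> float:
    """Iron law: time/program = instrs/program x cycles/instr x sec/cycle."""
    sec_per_cycle = 1.0 / clock_hz
    return instr_count * cpi * sec_per_cycle


if __name__ == "__main__":
    print(cycle_time_ns(500e6))  # 500 MHz clock from the slide
    print(cycle_time_ns(2e9))    # 2 GHz clock from the slide
    # Hypothetical program: 1e9 dynamic instructions at CPI 1.5 on a 500 MHz clock.
    print(execution_time_s(1e9, 1.5, 500e6))
```

Here `instr_count` is the dynamic count (instructions executed), matching the slide's point that static code size is not what matters.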


Our Goal

Minimize time which is the product, NOT isolated terms
Common error to miss terms while devising optimizations
• E.g., ISA change to decrease instruction count
• BUT leads to CPU organization which makes clock slower

Other Metrics

MIPS and MFLOPS
MIPS = instruction count/(execution time x 10^6)
     = clock rate/(CPI x 10^6)
BUT MIPS has problems

Problems with MIPS

E.g., without FP H/W, an FP op may take 50 single-cycle instrs
with FP H/W only one 2-cycle instr
Thus adding FP H/W
• CPI increases (why?) For the FP op, CPI goes from 50 cycles/50 instrs = 1 to 2 cycles/1 instr = 2
• but instrs/prog decreases more (why?) each FP op goes from 50 instrs to 1, a factor of 50
• total execution time decreases
• For MIPS
  • instrs/prog ignored
  • MIPS gets worse!

Problems with MIPS

Ignores the program
Usually used to quote peak performance
• ideal conditions => guarantee not to exceed!!
When is MIPS ok?
• same compiler and same ISA
• e.g., same binary running on Pentium Pro and Pentium
• why? instrs/prog is constant and may be ignored
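The FP hardware example can be worked numerically. The sketch below assumes a hypothetical program made up only of FP operations and a hypothetical 100 MHz clock; neither number comes from the notes, and only the 50-instruction and 2-cycle figures are taken from the slide.

```python
def exec_time_s(instr_count: float, cpi: float, clock_hz: float) -> float:
    """Iron law with sec/cycle written as 1/clock rate."""
    return instr_count * cpi / clock_hz


def mips(instr_count: float, time_s: float) -> float:
    """MIPS = instruction count / (execution time x 10^6)."""
    return instr_count / (time_s * 1e6)


FP_OPS = 1_000_000   # hypothetical number of FP operations in the program
CLOCK_HZ = 100e6     # hypothetical clock rate

# Without FP hardware: each FP op is 50 single-cycle instructions (CPI = 1).
instrs_sw = 50 * FP_OPS
time_sw = exec_time_s(instrs_sw, 1.0, CLOCK_HZ)

# With FP hardware: each FP op is one 2-cycle instruction (CPI = 2).
instrs_hw = FP_OPS
time_hw = exec_time_s(instrs_hw, 2.0, CLOCK_HZ)

if __name__ == "__main__":
    print("time without / with FP H/W (s):", time_sw, time_hw)
    print("MIPS without / with FP H/W:",
          mips(instrs_sw, time_sw), mips(instrs_hw, time_hw))
```

Under these assumptions the program runs 25 times faster with FP hardware, yet its MIPS rating is cut in half, which is the slide's point that MIPS ignores instrs/prog.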


Other Metrics

MFLOPS = FP ops in program/(execution time x 10^6)
Assuming FP ops independent of compiler and ISA
• Assumption not true
• may not have divide in ISA
• optimizing compilers
Relative MIPS and normalized MFLOPS
• adds to confusion! (see book)

Rules

Use ONLY Time
Beware when reading, especially if details are omitted
Beware of Peak

Iron Law Example

Machine A: clock 1 ns, CPI 2.0, for a program
Machine B: clock 2 ns, CPI 1.2, for same program
Which is faster and how much
Time/program = instrs/program x cycles/instr x sec/cycle
Time(A): N x 2.0 x 1 = 2N
Time(B): N x 1.2 x 2 = 2.4N
Compare: Time(B)/Time(A) = 2.4N/2N = 1.2
So, Machine A is 20% faster than Machine B for this program

Iron Law Example

Keep clock of A at 1 ns and clock of B at 2 ns
For equal performance, if CPI of B is 1.2, what is CPI of A?
Time(B)/Time(A) = 1 = (N x 2 x 1.2)/(N x 1 x CPI(A))
CPI(A) = 2.4
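The Machine A versus Machine B comparison can be reproduced with a few lines. Because both machines run the same N instructions, N cancels and the sketch works with time per instruction; the helper names are illustrative only.

```python
def time_per_instr_ns(cpi: float, cycle_ns: float) -> float:
    """Average time per instruction: cycles/instr x ns/cycle."""
    return cpi * cycle_ns


def equal_performance_cpi(cpi_other: float, cycle_other_ns: float,
                          cycle_self_ns: float) -> float:
    """CPI this machine needs to match the other machine's time per instruction."""
    return cpi_other * cycle_other_ns / cycle_self_ns


time_a = time_per_instr_ns(cpi=2.0, cycle_ns=1.0)  # Machine A
time_b = time_per_instr_ns(cpi=1.2, cycle_ns=2.0)  # Machine B
ratio = time_b / time_a                            # Time(B)/Time(A)

if __name__ == "__main__":
    print("Time(B)/Time(A):", ratio)
    print("CPI(A) for equal performance:",
          equal_performance_cpi(cpi_other=1.2, cycle_other_ns=2.0, cycle_self_ns=1.0))
```

The ratio of 1.2 means A is 20% faster, and A could tolerate a CPI of up to 2.4 before losing to B at these clock rates.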


Iron Law Example

Keep CPI of A 2.0 and CPI of B 1.2
For equal performance, if clock of B is 2 ns, what is clock of A?
Time(B)/Time(A) = 1 = (N x 1.2 x 2)/(N x 2.0 x clock(A))
clock(A) = 1.2 ns

Which Programs

Execution time of what
Best case - you run the same set of programs every day
• port them and time the whole “workload”
In reality, use benchmarks
• programs chosen to measure performance
• predict performance of actual workload (hopefully)
+ saves effort and money
– representative? honest?

How to average

Example (page 70)

             Machine A   Machine B
Program 1            1          10
Program 2         1000         100
Total             1001         110

One answer: total execution time, then B is how much faster than A? 9.1

How to average

Another: arithmetic mean (same result)
Arithmetic mean of times: $\left(\sum_{i=1}^{n} \text{time}(i)\right) / n$ for n programs
AM(A) = 1001/2 = 500.5
AM(B) = 110/2 = 55
500.5/55 = 9.1
Valid only if programs run equally often, so use “weight” factors
Weighted arithmetic mean: $\sum_{i=1}^{n} \text{weight}(i) \times \text{time}(i)$, with the weights summing to 1


Other Averages

E.g., 30 mph for first 10 miles
90 mph for next 10 miles. average speed?
Average speed = (30+90)/2 WRONG
Average speed = total distance / total time
• (20 / (10/30+10/90))
• 45 mph

Harmonic Mean

$\text{Harmonic mean of rates} = \dfrac{1}{\left(\sum_{i=1}^{n} \frac{1}{\text{rate}(i)}\right) / n}$
Use HM if forced to start and end with rates
Trick to do arithmetic mean of times but using rates and not times
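The speed example shows that the harmonic mean of the rates agrees with total distance over total time when each rate covers the same distance. A minimal sketch, with illustrative function names:

```python
def harmonic_mean(rates):
    """n divided by the sum of reciprocals, i.e. 1 / (mean of 1/rate)."""
    return len(rates) / sum(1.0 / r for r in rates)


def average_speed(distances, speeds):
    """Total distance / total time, for per-leg distances and speeds."""
    total_time = sum(d / s for d, s in zip(distances, speeds))
    return sum(distances) / total_time


if __name__ == "__main__":
    speeds = [30.0, 90.0]      # mph
    distances = [10.0, 10.0]   # miles per leg
    print("naive arithmetic mean:", sum(speeds) / len(speeds))
    print("harmonic mean:", harmonic_mean(speeds))
    print("total distance / total time:", average_speed(distances, speeds))
```

The naive arithmetic mean gives 60 mph, which overstates the speed because more time is spent on the slow leg; the harmonic mean and the direct calculation both give 45 mph.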

Dealing with Ratios

             Machine A   Machine B
Program 1            1          10
Program 2         1000         100

If we take ratios, with respect to Machine A

             Machine A   Machine B
Program 1            1          10
Program 2            1         0.1

Dealing with Ratios

E.g., average for machine A is 1, average for machine B is 5.05
If we take ratios, with respect to Machine B

             Machine A   Machine B
Program 1          0.1           1
Program 2           10           1

average for machine A = 5.05, average for machine B = 1
can’t both be true!
Don’t use arithmetic mean on ratios (normalized numbers)


Geometric Mean

Use geometric mean for ratios
$\text{geometric mean of ratios} = \sqrt[n]{\prod_{i=1}^{n} \text{ratio}(i)}$
Use GM if forced to use ratios
Independent of reference machine (math property)
In the example, GM for machine A is 1, for machine B is also 1
• normalized with respect to either machine

But..

Geometric mean of ratios is not proportional to total time
AM in example says machine B is 9.1 times faster
GM says they are equal
If we took total execution time, A and B are equal only if
• program 1 is run 100 times more often than program 2
Generally, GM will mispredict for three or more machines
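The ratio example can be checked numerically: the arithmetic mean of normalized times depends on the reference machine, while the geometric mean does not. The sketch also checks the 100-to-1 run mix under which total times match. All helper names are illustrative.

```python
import math


def arithmetic_mean(values):
    return sum(values) / len(values)


def geometric_mean(ratios):
    """n-th root of the product of n ratios."""
    return math.prod(ratios) ** (1.0 / len(ratios))


def normalize(times, reference):
    """Per-program ratios time(i) / reference(i)."""
    return [t / r for t, r in zip(times, reference)]


TIMES_A = [1.0, 1000.0]
TIMES_B = [10.0, 100.0]

if __name__ == "__main__":
    for name, ref in (("A", TIMES_A), ("B", TIMES_B)):
        ra, rb = normalize(TIMES_A, ref), normalize(TIMES_B, ref)
        print(f"normalized to {name}: AM(A)={arithmetic_mean(ra)}, "
              f"AM(B)={arithmetic_mean(rb)}, "
              f"GM(A)={geometric_mean(ra)}, GM(B)={geometric_mean(rb)}")
    # Run counts: program 1 runs 100 times for each run of program 2.
    runs = [100, 1]
    print("total A:", sum(n * t for n, t in zip(runs, TIMES_A)))
    print("total B:", sum(n * t for n, t in zip(runs, TIMES_B)))
```

The geometric mean is consistent across reference machines, but it reports A and B as equal even though their total times match only for that one specific run mix.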

Summary

Use AM for times
Use HM if forced to use rates
Use GM if forced to use ratios
Better yet, use unnormalized numbers to compute time

Benchmarks: SPEC95

System Performance Evaluation Cooperative
Latest is SPEC2K but Text uses SPEC95
8 integer and 10 floating point programs
• normalize run time with a SPARCstation 10/40
• GM of the normalized times


SPEC95

Benchmark   Description
go          AI, plays go
m88ksim     Motorola 88K chip simulator
gcc         Gnu compiler
compress    Unix utility compresses files
li          Lisp Interpreter
ijpeg       Graphic (de)compression
perl        Unix utility text processor
vortex      Database program

Some SPEC95 Programs

Benchmark   INT/FP    Description
m88ksim     Integer   Motorola 88K chip simulator
gcc         Integer   Gnu compiler
compress    Integer   Unix utility compresses files
vortex      Integer   Database program
su2cor      FP        Quantum physics; Monte carlo
hydro2d     FP        Navier Stokes equations
mgrid       FP        3-D potential field
wave5       FP        Electromagnetic particle simulation

Amdahl’s Law

Why does the common case matter the most?
Speedup = old time/new time = new rate/old rate
Let an optimization speed up a fraction f of the time by a factor of s
Speedup = [(1 – f) + f] × oldtime / ([(1 – f) × oldtime] + f/s × oldtime)
        = 1 / (1 – f + f/s)

Amdahl’s Law Example

Your boss asks you to improve Pentium Posterior performance by
• improve the ALU used 95% of time, by 10%
• improve the memory pipeline used 5%, by a factor of 10
Let f = fraction sped up and s = the speedup on that fraction
• new_time = (1-f)*old_time + (f/s)*old_time
• Speedup = new_rate / old_rate = old_time / new_time
• Speedup = old_time / ((1-f)*old_time + (f/s)*old_time)
Amdahl’s Law: Speedup = 1 / ((1-f) + (f/s))
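Amdahl's Law as stated above is a one-line function, sketched here and applied to the two options from the example. Reading "by 10%" as s = 1.10 is an interpretation, and the function name is illustrative.

```python
def amdahl_speedup(f: float, s: float) -> float:
    """Overall speedup when a fraction f of the time is sped up by a factor s."""
    return 1.0 / ((1.0 - f) + f / s)


if __name__ == "__main__":
    # Option 1: ALU, used 95% of the time, improved by 10% (taken as s = 1.10).
    alu = amdahl_speedup(f=0.95, s=1.10)
    # Option 2: memory pipeline, used 5% of the time, improved by a factor of 10.
    mem = amdahl_speedup(f=0.05, s=10.0)
    print(f"ALU option:    {alu:.4f}")
    print(f"memory option: {mem:.4f}")
```

The modest 10% improvement to the heavily used ALU gives a larger overall speedup than the tenfold improvement to the rarely used memory pipeline, which is why the common case matters most.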


Amdahl’s Law Example, cont.

Your boss asks you to improve Pentium Posterior performance by
• improve the ALU used 95% of time, by 10%
• improve the memory pipeline used 5%, by a factor of 10

f      s      Speedup
95%    1.10   1.094
5%     10     1.047
5%     ∞      1.052

Amdahl’s Law: Limit

$\lim_{s \to \infty} \dfrac{1}{1 - f + f/s} = \dfrac{1}{1 - f}$ => Make common case fast

[Plot: Speedup (axis from 0 to 10) versus f (axis from 0 to 1)]
Summary of Chapter 2
Time and performance: Machine A n times faster than Machine B
• iff Time(B)/Time(A) = n

Iron Law: Time/prog = Instr count x CPI x Cycle time

Other Metrics: MIPS and MFLOPS


• Beware of peak and omitted details

Benchmarks: SPEC95

Summarize performance: AM for time, HM for rate, GM for ratio

Amdahl’s Law: Speedup = 1 ⁄ ( 1 – f + f ⁄ s ) - common case fast
