Measuring Computer Performance

Uploaded by cse.20201016
Copyright © All Rights Reserved

Performance
• Purchasing perspective: given a collection of machines (or upgrade options), which has the
  – best performance?
  – least cost?
  – best performance/cost?
• Computer designer perspective: faced with design options, which has the
  – best performance improvement?
  – least cost?
  – best performance/cost?
• All require a basis for comparison and a metric for evaluation
  – Solid metrics lead to solid progress!
Two Notions of “Performance”
Plane | DC to Paris | Top Speed | Passengers | Throughput (pmph)
Boeing 747 | 6.5 hours | 610 mph | 470 | 286,700
BAC/Sud Concorde | 3 hours | 1350 mph | 132 | 178,200
(pmph = passenger-miles per hour = top speed × passengers)
• Which has higher performance?
  – Time to deliver 1 passenger?
  – Time to deliver 400 passengers?
• In a computer, the time for 1 job is called Response Time or Execution Time
• In a computer, the number of jobs per day (or per unit time) is called Throughput or Bandwidth
Definitions
Performance is in units of things per second: bigger is better
If we are primarily concerned with response time:
$$\text{performance}(x) = \frac{1}{\text{execution\_time}(x)}$$
"F(ast) is n times faster than S(low)" means:
$$n = \frac{\text{performance}(F)}{\text{performance}(S)} = \frac{\text{execution\_time}(S)}{\text{execution\_time}(F)}$$
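A minimal Python sketch of the two definitions above. The 10 s and 15 s run times are hypothetical, chosen only to exercise the formula.

```python
def performance(execution_time_s: float) -> float:
    """Performance as the reciprocal of execution time (bigger is better)."""
    return 1.0 / execution_time_s


def times_faster(time_fast_s: float, time_slow_s: float) -> float:
    """n such that F is n times faster than S: perf(F)/perf(S) = time(S)/time(F)."""
    return performance(time_fast_s) / performance(time_slow_s)


# Hypothetical run times: F finishes the job in 10 s, S needs 15 s.
n = times_faster(10.0, 15.0)
print(f"F is {n:.2f} times faster than S")  # F is 1.50 times faster than S
```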
Example of Response Time v. Throughput
• Time of Concorde vs. Boeing 747?
  – Concorde is 6.5 hours / 3 hours = 2.2 times faster
• Throughput of Boeing vs. Concorde?
  – Boeing 747: 286,700 pmph / 178,200 pmph = 1.6 times faster
• Boeing is 1.6 times (“60%”) faster in terms of throughput
• Concorde is 2.2 times (“120%”) faster in terms of flying time (response time)

We will focus primarily on execution time for a single job
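Both ratios in this example can be recomputed from the table's figures. A small Python sketch, assuming throughput is simply top speed × passengers (which reproduces both pmph values); it prints the unrounded ratios, which the slide rounds to 2.2 and 1.6.

```python
# Figures from the table: (flight time DC -> Paris in hours, top speed in mph, passengers)
planes = {
    "Boeing 747": (6.5, 610, 470),
    "Concorde": (3.0, 1350, 132),
}

# Throughput in passenger-miles per hour (pmph) = top speed x passengers
throughput = {name: mph * pax for name, (_, mph, pax) in planes.items()}

# Response time: slower flight time / faster flight time
response_ratio = planes["Boeing 747"][0] / planes["Concorde"][0]
# Throughput: higher pmph / lower pmph
throughput_ratio = throughput["Boeing 747"] / throughput["Concorde"]

print(f"Concorde is {response_ratio:.2f} times faster in flying time")
print(f"Boeing 747 is {throughput_ratio:.2f} times faster in throughput")
```

The same pair of machines ranks differently depending on which notion of performance is used, which is why the metric has to be chosen before comparing.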


Confusing Wording on Performance
• We will (try to) stick to “n times faster”; it's less confusing than “m % faster”
• Because “faster” means both increased performance and decreased execution time, to reduce confusion we will (and you should) use “improve performance” or “improve execution time”
What is Time?
• Straightforward definition of time:
  – Total time to complete a task, including disk accesses, memory accesses, I/O activities, operating system overhead, ...
  – “real time”, “response time” or “elapsed time”
• Alternative: just the time the processor (CPU) is working only on your program (since multiple processes run at the same time)
  – “CPU execution time” or “CPU time”
  – Often divided into system CPU time (in the OS) and user CPU time (in the user program)
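In Python, elapsed time and CPU time can be observed separately. This sketch uses `time.perf_counter()` for elapsed time, `time.process_time()` for the process's user + system CPU time, and `os.times()` for the user/system split (on Windows only those two fields are filled in, which is all it reads). `busy_work` is a hypothetical stand-in for "your program"; the `time.sleep` call adds elapsed time but almost no CPU time.

```python
import os
import time


def busy_work(n: int) -> int:
    """Hypothetical CPU-bound stand-in for 'your program'."""
    total = 0
    for i in range(n):
        total += i * i
    return total


wall_start = time.perf_counter()   # elapsed ("real") time
cpu_start = time.process_time()    # user + system CPU time of this process
times_start = os.times()           # separate user and system CPU times

busy_work(500_000)
time.sleep(0.2)                    # waiting: adds elapsed time, not CPU time

times_end = os.times()
wall = time.perf_counter() - wall_start
cpu = time.process_time() - cpu_start
user = times_end.user - times_start.user
system = times_end.system - times_start.system

print(f"elapsed time:      {wall:.3f} s")
print(f"CPU time:          {cpu:.3f} s")
print(f"  user CPU time:   {user:.3f} s")
print(f"  system CPU time: {system:.3f} s")
```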
How to Measure Time?
• User time ⇒ seconds
• CPU time: computers are constructed using a clock that runs at a constant rate and determines when events take place in the hardware
  – These discrete time intervals are called clock cycles (or informally clocks or cycles)
  – Length of clock period: clock cycle time (e.g., 2 nanoseconds or 2 ns); clock rate (e.g., 500 megahertz, or 500 MHz) is the inverse of the clock period; use these!
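Since clock rate and clock cycle time are reciprocals, converting between them is a single division. A minimal sketch using the 2 ns / 500 MHz figures from the slide:

```python
def clock_rate_hz(period_s: float) -> float:
    """Clock rate (Hz) is the inverse of the clock period (seconds)."""
    return 1.0 / period_s


def cycle_time_s(rate_hz: float) -> float:
    """Clock period (seconds) is the inverse of the clock rate (Hz)."""
    return 1.0 / rate_hz


print(f"{clock_rate_hz(2e-9) / 1e6:.0f} MHz")   # 2 ns period  -> 500 MHz
print(f"{cycle_time_s(500e6) * 1e9:.0f} ns")     # 500 MHz rate -> 2 ns
```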
Measuring Time using Clock Cycles
$$\text{CPU execution time for a program} = \text{Clock Cycles for a program} \times \text{Clock Cycle Time}$$
• Or:
$$\text{CPU execution time for a program} = \frac{\text{Clock Cycles for a program}}{\text{Clock Rate}}$$
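The two forms of the CPU-time equation give the same answer because multiplying by the cycle time is the same as dividing by the clock rate. A sketch with a hypothetical cycle count, using the 500 MHz (2 ns) clock from the earlier slide:

```python
def cpu_time_from_cycles(clock_cycles: int, clock_rate_hz: float) -> float:
    """CPU execution time = clock cycles / clock rate."""
    return clock_cycles / clock_rate_hz


def cpu_time_from_cycle_time(clock_cycles: int, cycle_time_s: float) -> float:
    """CPU execution time = clock cycles x clock cycle time."""
    return clock_cycles * cycle_time_s


cycles = 1_500_000_000  # hypothetical cycle count for some program
print(cpu_time_from_cycles(cycles, 500e6))      # 500 MHz clock
print(cpu_time_from_cycle_time(cycles, 2e-9))   # 2 ns cycle time, same clock
```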
Measuring Time using Clock Cycles
• One way to define clock cycles:
  Clock Cycles for program
    = Instructions for a program (called “Instruction Count”)
    × Average Clock cycles Per Instruction (abbreviated “CPI”)
• CPI is one way to compare two machines with the same instruction set, since the Instruction Count would be the same
Performance Calculation
• CPU execution time for program = Clock Cycles for program × Clock Cycle Time
• Substituting for clock cycles:
  CPU execution time for program
    = (Instruction Count × CPI) × Clock Cycle Time
    = Instruction Count × CPI × Clock Cycle Time
Performance Calculation (2/2)
$$\text{CPU time} = \frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Cycles}}{\text{Instruction}} \times \frac{\text{Seconds}}{\text{Cycle}} = \frac{\text{Seconds}}{\text{Program}}$$
• CPU time, the real measure of performance, is the product of all 3 terms: if any term is missing, you can't predict time
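A minimal sketch of the three-term product. The two machines are hypothetical: same clock rate, same program, different CPI, so the clock rate alone cannot tell which is faster.

```python
def cpu_time_s(instruction_count: int, cpi: float, clock_rate_hz: float) -> float:
    """CPU time = Instruction Count x CPI x Clock Cycle Time.

    Units: instructions x (cycles / instruction) x (seconds / cycle) = seconds.
    Clock cycle time is taken as 1 / clock rate.
    """
    clock_cycles = instruction_count * cpi   # instructions x cycles/instruction
    return clock_cycles / clock_rate_hz      # cycles x seconds/cycle


# Hypothetical machines: same 500 MHz clock, same 1e9-instruction program,
# different CPIs.
t_a = cpu_time_s(1_000_000_000, 2.0, 500e6)
t_b = cpu_time_s(1_000_000_000, 1.25, 500e6)
print(f"A: {t_a:.1f} s, B: {t_b:.1f} s")  # A: 4.0 s, B: 2.5 s
```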
How to Calculate the 3 Components?
• Clock Cycle Time: in the specification of the computer (Clock Rate in advertisements)
• Instruction Count:
  – Count instructions in the loop of a small program
  – Use a simulator to count instructions
  – Hardware counter in a special register (Pentium II, III, 4)
• CPI:
  – Calculate:
$$\text{CPI} = \frac{\text{Execution Time} / \text{Clock Cycle Time}}{\text{Instruction Count}}$$
  – Hardware counter in a special register (Pentium II, III, 4)
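The "Calculate" route for CPI, sketched in Python. The measurement values are hypothetical; in practice the instruction count would come from a simulator or a hardware counter as listed above.

```python
def measured_cpi(execution_time_s: float, clock_rate_hz: float, instruction_count: int) -> float:
    """CPI = (Execution Time / Clock Cycle Time) / Instruction Count."""
    clock_cycles = execution_time_s * clock_rate_hz  # dividing by cycle time == multiplying by rate
    return clock_cycles / instruction_count


# Hypothetical measurement: 3 s of CPU time on a 2 GHz clock, 4e9 instructions counted
print(measured_cpi(3.0, 2e9, 4_000_000_000))  # 1.5
```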
CPU Performance
Calculating CPI Another Way
• First calculate the CPI for each individual instruction (add, sub, and, etc.)
• Next calculate the frequency of each individual instruction
• Finally multiply these two for each instruction and add them up to get the final CPI (the weighted sum)
$$\text{CPI} = \sum_{j=1}^{n} \text{CPI}_j \times F_j \quad \text{where} \quad F_j = \frac{I_j}{\text{Instruction Count}}$$
Example (RISC processor)
Op | Freq_i (instruction mix) | CPI_i | Prod | % Time (where time is spent)
ALU | 50% | 1 | 0.5 | 23%
Load | 20% | 5 | 1.0 | 45%
Store | 10% | 3 | 0.3 | 14%
Branch | 20% | 2 | 0.4 | 18%
Total CPI | | | 2.2 |

How much faster would the machine be if a better data cache reduced the average load time to 2 cycles?
Load → 20% × 2 cycles = 0.4
Total CPI 2.2 → 1.6
Relative performance is 2.2 / 1.6 = 1.38

How does this compare with reducing branch instructions to 1 cycle?
Branch → 20% × 1 cycle = 0.2
Total CPI 2.2 → 2.0
Relative performance is 2.2 / 2.0 = 1.1
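The weighted-sum CPI and both what-if questions can be reproduced with a few lines of Python. This sketch uses the instruction mix from the table; each what-if simply swaps in a new CPI for one instruction class and compares the totals.

```python
# Instruction mix from the RISC example: op -> (frequency, CPI)
mix = {
    "ALU": (0.50, 1),
    "Load": (0.20, 5),
    "Store": (0.10, 3),
    "Branch": (0.20, 2),
}


def weighted_cpi(mix: dict[str, tuple[float, float]]) -> float:
    """CPI = sum over instruction classes of CPI_j x F_j."""
    return sum(freq * cpi for freq, cpi in mix.values())


base = weighted_cpi(mix)
for op, (freq, cpi) in mix.items():
    share = freq * cpi / base
    print(f"{op:6s} {freq * cpi:.1f} cycles/instr ({share:.0%} of time)")

# What-if 1: a better data cache cuts the average load time to 2 cycles
faster_loads = {**mix, "Load": (0.20, 2)}
# What-if 2: branches take 1 cycle
faster_branches = {**mix, "Branch": (0.20, 1)}

print(f"base CPI = {base:.1f}")
print(f"load improvement:   {base / weighted_cpi(faster_loads):.3f}x")
print(f"branch improvement: {base / weighted_cpi(faster_branches):.3f}x")
```

Because loads account for the largest share of time (45%), improving them pays off far more than improving branches.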
Amdahl's “Law”
• Speedup due to enhancement E:
$$\text{Speedup}(E) = \frac{\text{ExTime w/o } E}{\text{ExTime w/ } E} = \frac{\text{Performance w/ } E}{\text{Performance w/o } E}$$
• Suppose that enhancement E accelerates a fraction F of the task by a factor S, and the remainder of the task is unaffected. Then:
$$\text{ExTime(with } E) = \left((1 - F) + \frac{F}{S}\right) \times \text{ExTime(without } E)$$
$$\text{Speedup(with } E) = \frac{\text{ExTime(without } E)}{\left((1 - F) + \frac{F}{S}\right) \times \text{ExTime(without } E)} = \frac{1}{(1 - F) + \frac{F}{S}}$$
• Performance improvement is limited by how much the improved feature is used → invest resources where time is spent.
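Taking the limit of the speedup formula as the enhancement becomes arbitrarily fast shows why the unaffected fraction caps the gain:

$$\lim_{S \to \infty} \frac{1}{(1 - F) + \frac{F}{S}} = \frac{1}{1 - F}$$

For example, with F = 0.1 the overall speedup can never exceed 1/0.9 ≈ 1.11, no matter how large S becomes.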
Example of Amdahl’s Law
• Floating point instructions are improved to run twice as fast, but only 10% of the time was spent on these instructions originally. How much faster is the new machine?
$$\text{Speedup} = \frac{\text{executionTime}_{\text{old}}}{\text{executionTime}_{\text{new}}} = \frac{1}{(1 - \text{fraction}_{\text{enhanced}}) + \frac{\text{fraction}_{\text{enhanced}}}{\text{Speedup}_{\text{enhanced}}}}$$
$$\text{Speedup} = \frac{1}{(1 - 0.1) + \frac{0.1}{2}} = \frac{1.00}{0.95} \approx 1.053 \text{ times faster}$$
• The new machine is 1.053 times as fast, or 5.3% faster.
• How much faster would the new machine be if floating point instructions become 100 times faster?
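A direct translation of the formula into Python, evaluated for the 2× case from the slide and for the 100× question:

```python
def amdahl_speedup(fraction_enhanced: float, speedup_enhanced: float) -> float:
    """Overall speedup = 1 / ((1 - F) + F / S)."""
    return 1.0 / ((1.0 - fraction_enhanced) + fraction_enhanced / speedup_enhanced)


# Floating-point example: 10% of the original time is spent in FP instructions.
for s in (2, 100):
    print(f"FP {s:>3}x faster -> machine {amdahl_speedup(0.1, s):.3f} times faster")
```

Making the FP instructions 50 times faster again (from 2× to 100×) only moves the overall speedup from about 1.05 to about 1.11, because 90% of the time is untouched.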
Estimating Performance Improvements
• A state-of-the-art processor currently requires 10 seconds to execute a program.
• Processor performance improves by 50 percent per year.
• Assuming only processor performance is at issue, by what factor does processor performance improve in 5 years?
$$\text{newPerf} = (1 + \text{increase/year})^{\text{numYears}} = (1 + 0.5)^5 \approx 7.6$$
• How long will it take a processor to execute the program after 5 years?
$$\text{executionTime}_{\text{new}} = 10 / 7.59 \approx 1.32 \text{ seconds}$$
• How many years will it take until the program can be executed in 1 second?
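The compound-growth arithmetic above, sketched in Python. It assumes the 50%/year rate holds for every year and counts only whole years for the last question.

```python
import math

base_time_s = 10.0   # current execution time of the program
growth = 0.5         # 50% performance improvement per year
years = 5

factor = (1 + growth) ** years      # newPerf = (1 + increase/year)^numYears
new_time_s = base_time_s / factor   # execution time scales with 1 / performance

# Smallest whole number of years y with base_time_s / 1.5**y <= 1 second
years_to_1s = math.ceil(math.log(base_time_s / 1.0) / math.log(1 + growth))

print(f"improvement after {years} years: {factor:.2f}x")
print(f"execution time after {years} years: {new_time_s:.2f} s")
print(f"whole years until the program runs in 1 s: {years_to_1s}")
```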
Performance Example
• Computers M1 and M2 are two implementations of the same instruction set.
  – M1 has a clock rate of 1600 MHz and M2 has a clock rate of 2.4 GHz.
  – M1 has a CPI of 2.8 and M2 has a CPI of 3.2 for a given program.
• How many times faster is M2 than M1 for this program?
$$\frac{\text{executionTime(M1)}}{\text{executionTime(M2)}} = \frac{\text{numInstr(M1)} \times \text{CPI(M1)} / \text{clockRate(M1)}}{\text{numInstr(M2)} \times \text{CPI(M2)} / \text{clockRate(M2)}} = \frac{2.8 \text{ cyc} / (1600 \times 10^6 \text{ cyc/sec})}{3.2 \text{ cyc} / (2.4 \times 10^9 \text{ cyc/sec})} \approx 1.3$$
• What would the clock rate of M1 have to be for them to have the same execution time?
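A Python sketch of both questions. It assumes both machines execute the same instruction stream (same instruction set, same compiled program), so the instruction count cancels out of the ratio.

```python
cpi_m1, clock_m1 = 2.8, 1600e6   # cycles/instruction, Hz
cpi_m2, clock_m2 = 3.2, 2.4e9

# Time per instruction on each machine is CPI / clock rate; instruction count cancels.
ratio = (cpi_m1 / clock_m1) / (cpi_m2 / clock_m2)

# Clock rate M1 needs to match M2: cpi_m1 / f = cpi_m2 / clock_m2
clock_m1_needed = cpi_m1 * clock_m2 / cpi_m2

print(f"M2 is {ratio:.4f} times faster than M1")
print(f"M1 would need a {clock_m1_needed / 1e9:.2f} GHz clock")
```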
Marketing Metrics
• MIPS = Instruction Count / (Time × 10^6)
       = Clock Rate / (CPI × 10^6)
  – machines with different instruction sets?
  – programs with different instruction mixes?
  – dynamic frequency of instructions
  – uncorrelated with performance
• MFLOP/s = FP Operations / (Time × 10^6)
  – machine dependent
  – often not where time is spent
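Why MIPS can mislead across instruction sets, as a sketch with hypothetical numbers: the same program compiled for two instruction sets, both on a 1 GHz clock. Machine A needs more, simpler instructions; machine B needs fewer, more complex ones.

```python
def mips(instruction_count: float, time_s: float) -> float:
    """MIPS = Instruction Count / (Time x 10^6)."""
    return instruction_count / (time_s * 1e6)


def cpu_time_s(instruction_count: float, cpi: float, clock_rate_hz: float) -> float:
    """CPU time = Instruction Count x CPI / Clock Rate."""
    return instruction_count * cpi / clock_rate_hz


# Hypothetical machines, same 1 GHz clock: (instruction count, CPI)
clock_hz = 1e9
machines = {
    "A": (2.0e9, 1.0),   # many simple instructions
    "B": (0.8e9, 2.0),   # fewer, more complex instructions
}

results = {}
for name, (ic, cpi) in machines.items():
    t = cpu_time_s(ic, cpi, clock_hz)
    results[name] = (t, mips(ic, t))
    print(f"{name}: {t:.1f} s, {results[name][1]:.0f} MIPS")
```

Machine A reports twice the MIPS of machine B yet takes longer to run the program.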
Benchmarks
• Ideally, run typical programs with typical input before purchase, or even before building the machine
  – This is called a “workload”; for example:
    – An engineer uses a compiler and a spreadsheet
    – An author uses a word processor, a drawing program, and compression software
• In some situations this is hard to do:
  – You don't have access to the machine to “benchmark” before purchase
  – You don't know the future workload
Benchmarks
• Obviously, the apparent speed of a processor depends on the code used to test it
• Industry standards are needed so that different processors can be fairly compared
• Companies exist that create these benchmarks: “typical” code used to evaluate systems
• Benchmarks need to be changed every 2 or 3 years, since designers could (and do!) target these standard benchmarks
Example Standardized Benchmarks
• Standard Performance Evaluation Corporation (SPEC): SPEC CPU2000
  – CINT2000: 12 integer benchmarks (gzip, gcc, crafty, perl, ...)
  – CFP2000: 14 floating-point benchmarks (swim, mesa, art, ...)
  – All relative to a base machine, a Sun Ultra5_10 (300 MHz, 256 MB RAM), which gets a score of 100
  – www.spec.org/osg/cpu2000/
  – They measure
    • System speed (SPECint2000)
    • System throughput (SPECint_rate2000)
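How a score "relative to a base machine" can be computed, as a sketch. SPEC CPU2000 expresses each benchmark as 100 × (reference machine time / measured time), so the reference machine scores 100, and combines the per-benchmark ratios with a geometric mean. The benchmark names and times below are hypothetical.

```python
from statistics import geometric_mean

# Hypothetical (reference-machine time, measured time) in seconds per benchmark
runs = {
    "bench_a": (1400.0, 700.0),
    "bench_b": (1800.0, 450.0),
    "bench_c": (1000.0, 1000.0),
}

# Ratio to the reference machine, scaled so the reference machine scores 100
ratios = {name: 100.0 * ref / measured for name, (ref, measured) in runs.items()}

# Summarize the per-benchmark ratios with a geometric mean
score = geometric_mean(list(ratios.values()))

for name, r in ratios.items():
    print(f"{name}: {r:.0f}")
print(f"overall: {score:.0f}")
```

Because it is a geometric mean, doubling the speed on any one benchmark raises the overall score by the same factor regardless of which benchmark it is.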
Example Standardized Benchmarks
• SPEC
  – Benchmarks distributed in source code
  – Members of the consortium select the workload
    • 30+ companies, 40+ universities
  – Compiler and machine designers target the benchmarks, so SPEC tries to change them every 3 years
  – SPEC CPU2006 was the latest release when these slides were written; it has since been succeeded by SPEC CPU2017
“And in conclusion…”
• Good benchmarks, such as the SPEC benchmarks, can provide an accurate basis for evaluating and comparing computer performance.
• MIPS and MFLOPS are easy to use but inaccurate indicators of performance.
• Amdahl’s law provides an efficient method for determining the speedup due to an enhancement.
• Make the common case fast!
