Chapter 4 Assessing and Understanding Performance
Bo Cheng
Which One Is Good?
Airplane             Passengers   Range (mi)   Speed (mph)
Boeing 737-100              101          630           598
Boeing 747                  470         4150           610
BAC/Sud Concorde            132         4000          1350
Douglas DC-8-50             146         8720           544
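The answer depends on the measure of performance:
Cruising speed: BAC/Sud Concorde (1350 mph)
Longest range: Douglas DC-8-50 (8720 mi)
Largest capacity: Boeing 747 (470 passengers)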
Measuring Performance
Elapsed time (wall-clock time or response time):
Total time to complete a task
Including disk and memory accesses, I/O, etc.
A useful number, but often not good for comparison purposes
CPU (execution) time:
Doesn't count I/O or time spent running other programs
Can be broken up into system CPU time and user CPU time
CPU time = user CPU time + system CPU time
Our focus: user CPU time
Time spent executing the lines of code that are "in" our program
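The difference between elapsed time and CPU time can be observed directly. The following is a minimal sketch, assuming Python's standard `time.perf_counter` for wall-clock time and `os.times` for user and system CPU time; the helper names `measure` and `busy_loop` are hypothetical and exist only for this illustration.

```python
import os
import time


def measure(task):
    """Run task() once; return elapsed, user CPU, and system CPU time in seconds."""
    wall_start = time.perf_counter()
    cpu_start = os.times()
    task()
    cpu_end = os.times()
    wall_end = time.perf_counter()
    return {
        "elapsed": wall_end - wall_start,            # wall-clock (response) time
        "user": cpu_end.user - cpu_start.user,       # time in our own code
        "system": cpu_end.system - cpu_start.system  # time in the OS on our behalf
    }


def busy_loop():
    """A CPU-bound task: almost all elapsed time is user CPU time."""
    total = 0
    for i in range(200_000):
        total += i * i
    return total


if __name__ == "__main__":
    tasks = [("busy_loop", busy_loop), ("sleep", lambda: time.sleep(0.2))]
    for name, task in tasks:
        t = measure(task)
        print(f"{name}: elapsed={t['elapsed']:.3f}s "
              f"user={t['user']:.3f}s system={t['system']:.3f}s")
```

A sleeping task accumulates elapsed time but almost no CPU time, which is why elapsed time alone is a poor basis for comparing processors.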
CPU Performance Metrics
Response time: the time between the start and the completion of a task (in time units)
Throughput: the total amount of work done in a given time (in number of tasks per unit of time)
Performance
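$$\text{Performance}_x = \frac{1}{\text{execution\_time}_x}$$
$$\frac{\text{Performance}_x}{\text{Performance}_y} = \frac{\text{execution\_time}_y}{\text{execution\_time}_x} = n$$
Problem: Machine A runs a program in 10 sec. Machine B runs the same program in 15 sec. How much faster is A than B?
$$\frac{\text{Performance}_A}{\text{Performance}_B} = \frac{\text{execution\_time}_B}{\text{execution\_time}_A} = \frac{15}{10} = 1.5$$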
A is 1.5 times faster than B
Clock Rate Measurement
Clock cycle: the time for one clock period, running at a constant rate
Clock rate is given in Hz (= 1/sec)
clock_cycle_time = 1 / clock_rate (in sec)

Name          Measurement     Value (sec)
Millisecond   1 msec (ms)     1.E-03
Microsecond   1 usec (us)     1.E-06
Nanosecond    1 nsec (ns)     1.E-09
Picosecond    1 psec (ps)     1.E-12
Femtosecond   1 fsec (fs)     1.E-15

Example
10 nsec clock cycle    =>   100 MHz clock rate
1 nsec clock cycle     =>   1 GHz clock rate
500 psec clock cycle   =>   2 GHz clock rate
200 psec clock cycle   =>   5 GHz clock rate
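The reciprocal relationship between clock cycle time and clock rate can be checked with a few lines of code. This is a small sketch; the function names `cycle_time_seconds` and `clock_rate_hz` are hypothetical, chosen for this illustration.

```python
def cycle_time_seconds(rate_hz):
    """Clock cycle time in seconds for a clock rate in Hz."""
    return 1.0 / rate_hz


def clock_rate_hz(cycle_time_s):
    """Clock rate in Hz for a clock cycle time in seconds."""
    return 1.0 / cycle_time_s


if __name__ == "__main__":
    # The four cycle times from the example above, in seconds.
    for cycle_time in (10e-9, 1e-9, 500e-12, 200e-12):
        rate = clock_rate_hz(cycle_time)
        print(f"{cycle_time:.1e} s clock cycle => {rate / 1e6:.0f} MHz clock rate")
```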
MHz
http://www.webopedia.com/TERM/M/MHz.html
One MHz represents one million cycles per second. The speed of microprocessors, called the clock speed, is measured in megahertz.
For example, a microprocessor that runs at 200 MHz executes 200 million cycles per second.
One GHz represents 1 billion cycles per second.
CPU Time or CPU Execution Time
The actual time the CPU spends computing for a specific task
This is the time the CPU spends computing the given program, including operating system routines executed on the program's behalf. It does not include time spent waiting for I/O or running other programs.
Performance of processor/memory = 1 / CPU_time
CPU Execution Time Formula
$$E = N \times T = \frac{N}{R}$$
E = CPU execution time for a program
N = number of CPU clock cycles for a program
T = clock cycle time
R = clock rate
Example
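Computer A (4 GHz) runs the job in 10 seconds:
$$N = E \times R = 10 \times 4 \times 10^9 \text{ clock cycles}$$
Computer B (X GHz) runs the same job in 6 seconds, using 1.2 * N clock cycles:
$$\frac{1.2 \times N}{R} = 6 \quad\Rightarrow\quad R = \frac{1.2 \times 10 \times 4 \times 10^9}{6} = 8 \text{ GHz}$$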
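The arithmetic of this example follows from E = N / R and can be sketched as below. The function names are hypothetical; the numbers (10 seconds at 4 GHz, a 6-second target, and 1.2 times as many cycles on computer B) come from the example.

```python
def clock_cycles(exec_time_s, rate_hz):
    """N = E * R: clock cycles used by a program."""
    return exec_time_s * rate_hz


def required_clock_rate(cycles, target_time_s):
    """R = N / E: clock rate needed to finish `cycles` in `target_time_s`."""
    return cycles / target_time_s


if __name__ == "__main__":
    n_a = clock_cycles(10, 4e9)               # cycles on computer A
    r_b = required_clock_rate(1.2 * n_a, 6)   # computer B needs 1.2 * N cycles
    print(f"N = {n_a:.2e} cycles, R_B = {r_b / 1e9:.1f} GHz")
```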
Clock cycles Per Instruction (CPI)
The average number of clock cycles per instruction for a program or program fragment
N = Number of CPU clock cycles for a program I = total Instructions for a program C = CPI
$$N = I \times C$$
The Big Picture
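$$E = N \times T = \frac{N}{R}$$
$$N = I \times C$$
$$E = I \times C \times T = \frac{I \times C}{R}$$
$$\text{Time} = \frac{\text{Seconds}}{\text{Program}} = \frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Clock\_cycles}}{\text{Instruction}} \times \frac{\text{Seconds}}{\text{Clock\_cycle}}$$
Instruction count depends on the architecture, but not on the exact implementation
Average CPI depends on design details and on the mix of types of instructions executed in an application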
Understanding Program Performance
                    Algorithm   Programming Language   Compiler   ISA
Instruction Count   X           X                      X          X
CPI                 Possibly    X                      X          X
Clock Rate
Using Performance Equation
                   Computer A   Computer B
Clock Cycle Time   250 ps       500 ps
CPI                2            1.2
Which computer is faster for this program, and by how much?
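$$\text{CPU}_A = I \times 2 \times 250 \text{ ps} = 500I \text{ ps}$$
$$\text{CPU}_B = I \times 1.2 \times 500 \text{ ps} = 600I \text{ ps}$$
$$\frac{\text{Performance}_A}{\text{Performance}_B} = \frac{\text{CPU}_B}{\text{CPU}_A} = \frac{600I}{500I} = 1.2$$
Computer A is 1.2 times faster than Computer B for this program.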
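The same comparison can be sketched in code using E = I * C * T. The function name `cpu_time` is hypothetical, and the instruction count of one billion is an arbitrary assumption: it cancels in the ratio, so any positive value gives the same result.

```python
def cpu_time(instruction_count, cpi, cycle_time_s):
    """E = I * C * T: CPU execution time in seconds."""
    return instruction_count * cpi * cycle_time_s


if __name__ == "__main__":
    instructions = 1_000_000_000  # assumed value; it cancels in the ratio
    time_a = cpu_time(instructions, 2.0, 250e-12)  # Computer A
    time_b = cpu_time(instructions, 1.2, 500e-12)  # Computer B
    print(f"A: {time_a:.2f} s, B: {time_b:.2f} s, "
          f"A is {time_b / time_a:.2f}x faster")
```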
Computing CPI
Done by looking at the different types of instructions and using their individual cycle counts
$$\text{Clock\_Cycles} = \sum_{i=1}^{n} (\text{CPI}_i \times C_i)$$
C_i: the count of the number of instructions of class i executed
CPI_i: the average number of cycles per instruction for that instruction class
n: the number of instruction classes
Example
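CPI for each instruction class
Instruction class   A   B   C
CPI                 1   2   3

Instruction counts for each instruction class
Code sequence   A   B   C
1               2   1   2
2               4   1   1

$$\text{CC}_1 = (2 \times 1) + (1 \times 2) + (2 \times 3) = 10, \qquad \text{CPI}_1 = \frac{10}{5} = 2$$
$$\text{CC}_2 = (4 \times 1) + (1 \times 2) + (1 \times 3) = 9, \qquad \text{CPI}_2 = \frac{9}{6} = 1.5$$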
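The per-class sum can be sketched as a short program. The names `CPI_BY_CLASS`, `total_cycles`, and `average_cpi` are hypothetical; the CPI values and instruction counts are those of the two code sequences in the example.

```python
# CPI for each instruction class, taken from the example.
CPI_BY_CLASS = {"A": 1, "B": 2, "C": 3}


def total_cycles(counts, cpi_by_class=CPI_BY_CLASS):
    """Clock cycles = sum over classes of CPI_i * C_i."""
    return sum(cpi_by_class[cls] * n for cls, n in counts.items())


def average_cpi(counts, cpi_by_class=CPI_BY_CLASS):
    """Average CPI = total clock cycles / total instruction count."""
    return total_cycles(counts, cpi_by_class) / sum(counts.values())


if __name__ == "__main__":
    sequences = {
        1: {"A": 2, "B": 1, "C": 2},
        2: {"A": 4, "B": 1, "C": 1},
    }
    for seq_id, counts in sequences.items():
        print(f"Sequence {seq_id}: cycles={total_cycles(counts)}, "
              f"CPI={average_cpi(counts)}")
```

Sequence 2 executes more instructions (6 versus 5) yet needs fewer clock cycles, because it uses fewer of the expensive class C instructions.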
Workload
A set of programs used for evaluating a computer or a system
Benchmarks: programs specifically chosen to measure performance
SPEC 2000 benchmarks (12 integer, 14 floating-point programs)
Benchmark results may be misleading if the system (or its compiler) is optimized for the benchmarks
Benchmark
Programs specifically chosen to measure performance
Performance is best determined by running a real application
Use programs typical of the expected workload
e.g., compilers/editors, scientific applications, graphics...
Small benchmarks
Nice for architects and designers
SPEC (System Performance Evaluation Cooperative)
Companies have agreed on a set of real programs and inputs
Simplest Approach
                  Computer A   Computer B
Program 1 (sec)   1            10
Program 2 (sec)   1000         100
Total (sec)       1001         110
$$\frac{\text{Performance}_B}{\text{Performance}_A} = \frac{\text{Execution\_Time}_A}{\text{Execution\_Time}_B} = \frac{1001}{110} = 9.1$$
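Comparing total execution time can be sketched as follows. The dictionary layout and the function name `total_time` are hypothetical; the per-program times are those in the table. The sketch also prints the per-program ratios, which shows that the total is dominated by the longer program.

```python
# Execution times in seconds for each program on each computer.
TIMES = {
    "A": {"Program 1": 1, "Program 2": 1000},
    "B": {"Program 1": 10, "Program 2": 100},
}


def total_time(computer, times=TIMES):
    """Total execution time of the whole workload on one computer."""
    return sum(times[computer].values())


if __name__ == "__main__":
    total_a = total_time("A")
    total_b = total_time("B")
    print(f"Total A = {total_a} s, Total B = {total_b} s")
    print(f"B is {total_a / total_b:.1f} times faster than A on the whole workload")
    for program in TIMES["A"]:
        ratio = TIMES["A"][program] / TIMES["B"][program]
        print(f"{program}: time_A / time_B = {ratio}")
```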
Evaluating Performance
CPU Performance
Different classes and applications of computer require different types of benchmarks
Desktop
SPEC CPU benchmark to measure CPU performance and response time
Benchmarks focusing on a specific task: DVD playback or graphics performance of games
Server
Benchmarks depend on the nature of the intended application
Throughput
Requirements on response time to individual events: database queries and web page requests
SPECweb99
Embedded Computing
EEMBC
Reproducibility: list everything another experimenter needs to duplicate the results
SPEC CPU2000 Benchmark
SPEC: CINT2000 and CFP2000
Relative Performance in Three Different Modes
Relative Energy Efficiency Comparison
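Amdahl's Law
Execution Time After Improvement = (Execution Time Affected / Amount of Improvement) + Execution Time Unaffected
Principle: make the common case fast
Example: Suppose a program runs in 100 seconds on a machine, with multiply operations responsible for 80 seconds of this time. How much do we have to improve the speed of multiplication if we want the program to run 5 times faster?
$$\text{ET\_after} = \frac{80}{n} + (100 - 80)$$
$$20 \text{ sec} = \frac{80}{n} + 20 \quad\Rightarrow\quad \frac{80}{n} = 0$$
No finite n satisfies this: the 20 seconds unaffected by the improvement already use up the entire 20-second target, so speeding up multiplication alone cannot make the program 5 times faster.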
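Amdahl's Law is easy to explore numerically. The sketch below uses a hypothetical function `time_after_improvement` with the 80-second affected and 20-second unaffected portions from the example, and shows that the execution time approaches but never reaches the 20-second target as the improvement factor n grows.

```python
def time_after_improvement(affected_s, unaffected_s, improvement):
    """Amdahl's Law: affected time shrinks by `improvement`; the rest does not."""
    return affected_s / improvement + unaffected_s


if __name__ == "__main__":
    original = 100.0
    for n in (2, 4, 10, 100, 1_000_000):
        after = time_after_improvement(80.0, 20.0, n)
        print(f"n = {n:>9}: time = {after:.4f} s, "
              f"overall speedup = {original / after:.3f}")
```

The overall speedup stays below 100 / 20 = 5 for every finite n.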
MIPS (million instructions per second)
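Instruction class   A   B   C
CPI                 1   2   3

Instruction counts (in billions)
Code from    A    B   C
Compiler 1   5    1   1
Compiler 2   10   1   1

$$\text{MIPS} = \frac{\text{Instruction\_Count}}{\text{Execution\_Time} \times 10^6}$$
$$\text{CC}_1 = (5 \times 1 + 1 \times 2 + 1 \times 3) \times 10^9 = 10 \times 10^9$$
$$E_1 = \frac{10 \times 10^9}{4 \times 10^9} = 2.5 \text{ sec}$$
$$\text{MIPS}_1 = \frac{(5 + 1 + 1) \times 10^9}{2.5 \times 10^6} = 2800$$
$$\text{CC}_2 = (10 \times 1 + 1 \times 2 + 1 \times 3) \times 10^9 = 15 \times 10^9$$
$$E_2 = \frac{15 \times 10^9}{4 \times 10^9} = 3.75 \text{ sec}$$
$$\text{MIPS}_2 = \frac{(10 + 1 + 1) \times 10^9}{3.75 \times 10^6} = 3200$$
Compiler 2 has the higher MIPS rating, but Compiler 1 produces the faster code.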
Always trust execution time metric!
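The compiler comparison can be reproduced with a short sketch. It assumes a 4 GHz clock (the 4 * 10^9 divisor in the execution-time calculations) and a CPI of 3 for class C (the factor used in the cycle counts); the function and constant names are hypothetical.

```python
CLOCK_RATE_HZ = 4e9                      # assumed from the 4 * 10^9 divisor
CPI_BY_CLASS = {"A": 1, "B": 2, "C": 3}  # class C CPI assumed from the cycle counts


def execution_time(counts):
    """E = clock cycles / clock rate, in seconds."""
    cycles = sum(CPI_BY_CLASS[cls] * n for cls, n in counts.items())
    return cycles / CLOCK_RATE_HZ


def mips(counts):
    """MIPS = instruction count / (execution time * 10^6)."""
    return sum(counts.values()) / (execution_time(counts) * 1e6)


if __name__ == "__main__":
    code = {
        "Compiler 1": {"A": 5e9, "B": 1e9, "C": 1e9},
        "Compiler 2": {"A": 10e9, "B": 1e9, "C": 1e9},
    }
    for name, counts in code.items():
        print(f"{name}: time = {execution_time(counts):.2f} s, "
              f"MIPS = {mips(counts):.0f}")
```

The code with the higher MIPS rating takes longer to run, which is why execution time is the metric to trust.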
http://www.faculty.uaf.edu/ffdr/EE443/Handouts/Set5_Sp05_3pp.pdf
A Complete Example (I)
A Complete Example (II)
A Complete Example (III)
Three problems with using MIPS
MIPS specifies the instruction execution rate but does not take into account the capabilities of the instructions. We cannot compare computers with different instruction sets using MIPS, since the instruction counts will certainly differ.
MIPS varies between programs on the same computer; a computer cannot have a single MIPS rating for all programs.
MIPS can vary inversely with performance.