Chapter 1 Performance
Outline
Defining Performance
Measuring Performance
Performance
Execution Time
• Elapsed time
– counts everything (disk and memory accesses, waiting for I/O, running other programs, etc.) from start to finish
– a useful number, but often not good for comparison purposes
– elapsed time = CPU time + wait time (I/O, other programs, etc.)
• CPU time
– doesn't count waiting for I/O or time spent running other programs
– can be divided into user CPU time and system CPU time (OS calls)
– CPU time = user CPU time + system CPU time
– elapsed time = user CPU time + system CPU time + wait time
• Our focus: user CPU time (CPU execution time or, simply, execution time)
– time spent executing the lines of code that are in our program
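The split between elapsed, user, and system time can be observed from inside a program. The following is a minimal sketch, assuming Python's standard `time.perf_counter` and `os.times`; the helper names `busy_work`, `measure`, and `report` are hypothetical, and the measured values vary from run to run and machine to machine.

```python
import os
import time


def busy_work(n):
    """CPU-bound loop, so most of its cost should show up as user CPU time."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def measure(func, *args):
    """Return (elapsed, user, system) in seconds consumed while running func."""
    start_wall = time.perf_counter()
    start_cpu = os.times()
    func(*args)
    end_cpu = os.times()
    end_wall = time.perf_counter()
    elapsed = end_wall - start_wall
    user = end_cpu.user - start_cpu.user
    system = end_cpu.system - start_cpu.system
    return elapsed, user, system


def report():
    elapsed, user, system = measure(busy_work, 200_000)
    # elapsed time = user CPU time + system CPU time + wait time,
    # so the remainder approximates the wait time (clamped at zero
    # because the two clocks have different resolutions).
    wait = max(elapsed - (user + system), 0.0)
    print(f"elapsed={elapsed:.4f}s user={user:.4f}s "
          f"system={system:.4f}s wait~{wait:.4f}s")


if __name__ == "__main__":
    report()
```

On Unix-like systems the shell command `time` reports the same three quantities as real, user, and sys.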
System Time & User Time
• System time: the CPU time spent in the operating system performing
tasks on behalf of the program, such as executing system calls. Typical
examples are I/O operations, context switches, interprocess
communication, memory management, and interrupt handling.
• User time: the CPU time spent running your program's own code. It is
called user time because the CPU is executing instructions of a program
that a user has started.
Definition of Performance(1)
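For some program running on machine X:
$$\text{Performance}_X = \frac{1}{\text{Execution time}_X}$$
"X is n times faster than Y" means:
$$\frac{\text{Performance}_X}{\text{Performance}_Y} = \frac{\text{Execution time}_Y}{\text{Execution time}_X} = n$$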
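As a small illustration of the ratio, the sketch below assumes the usual definition that performance is the reciprocal of execution time. The function names and the execution times (10 s and 15 s) are hypothetical values chosen for the example, not taken from the slides.

```python
def performance(execution_time_s):
    """Performance as the reciprocal of execution time (assumed definition)."""
    return 1.0 / execution_time_s


def relative_performance(time_x_s, time_y_s):
    """Return n such that machine X is n times faster than machine Y.

    Performance_X / Performance_Y simplifies to time_Y / time_X.
    """
    return time_y_s / time_x_s


# Hypothetical measurements: X takes 10 s, Y takes 15 s for the same program.
n = relative_performance(10.0, 15.0)
print(f"X is {n} times faster than Y")
```

A ratio below 1 means that X is the slower machine for that program.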
Definition of Performance(2)
Relative Performance
Clock Cycles
Instead of reporting execution time in seconds, we often use clock
cycles. In modern computers, hardware events progress cycle by cycle: in
other words, each operation (e.g., a multiplication or an addition) takes
a sequence of one or more cycles.
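Converting between cycles and seconds only needs the clock rate, since the cycle time is the reciprocal of the clock rate. The sketch below is illustrative: the function names are hypothetical, and the 2 GHz clock and 1,000-cycle count are assumed values, not figures from the slides.

```python
def cycle_time_s(clock_rate_hz):
    """Seconds per cycle: the reciprocal of the clock rate (cycles per second)."""
    return 1.0 / clock_rate_hz


def cycles_to_seconds(cycles, clock_rate_hz):
    """Execution time in seconds for a given number of clock cycles."""
    return cycles * cycle_time_s(clock_rate_hz)


# Assumed example: a 2 GHz clock and an event that takes 1,000 cycles.
rate_hz = 2e9
print(f"cycle time: {cycle_time_s(rate_hz) * 1e9:.2f} ns")
print(f"1,000 cycles: {cycles_to_seconds(1_000, rate_hz) * 1e9:.0f} ns")
```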
How many cycles are required for a program?
Could assume that # of cycles = # of instructions
[Figure: the 1st, 2nd, 3rd, 4th, 5th, 6th, ... instructions laid out one after another along a time axis]
Example
Our favorite program runs in 10 seconds on computer A, which
has a 400 MHz clock.
We are trying to help a computer designer build a new machine
B that will run this program in 6 seconds. The designer can
use new (or perhaps more expensive) technology to
substantially increase the clock rate, but has informed us that
this increase will affect the rest of the CPU design, causing
machine B to require 1.2 times as many clock cycles as machine
A for the same program. What clock rate should the designer target?
Example (Sol.)
$$\frac{\text{seconds}}{\text{program}} = \frac{\text{cycles}}{\text{program}} \times \frac{\text{seconds}}{\text{cycle}}$$
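The numbers in the example can be worked through with the relation seconds/program = cycles/program × seconds/cycle. The sketch below assumes the goal is the clock rate that machine B needs; the function and variable names are hypothetical.

```python
def clock_cycles(exec_time_s, clock_rate_hz):
    """Cycles for a program = execution time x clock rate."""
    return exec_time_s * clock_rate_hz


def required_clock_rate(cycles, target_time_s):
    """Clock rate (Hz) needed to finish `cycles` cycles in `target_time_s`."""
    return cycles / target_time_s


# Machine A: 10 seconds at 400 MHz.
cycles_a = clock_cycles(10, 400e6)

# Machine B needs 1.2 times as many cycles and must finish in 6 seconds.
cycles_b = 1.2 * cycles_a
rate_b_hz = required_clock_rate(cycles_b, 6)

print(f"cycles on A: {cycles_a:.2e}")
print(f"cycles on B: {cycles_b:.2e}")
print(f"clock rate for B: {rate_b_hz / 1e6:.0f} MHz")
```

Under these assumptions machine B needs roughly twice the clock rate of machine A (about 800 MHz), even though the target is only a 10/6 ≈ 1.67 times reduction in execution time, because of the extra cycles.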
CPU clock cycles
Performance Equation II
$$\text{CPU execution time for a program} = \text{Instruction count for a program} \times \text{average CPI} \times \text{Clock cycle time}$$
CPI Example I (Using Performance Equation)
• Suppose we have two implementations of the same
instruction set architecture (ISA). For some program:
– machine A has a clock cycle time of 250 ps and a CPI of 2.0
– machine B has a clock cycle time of 500 ps and a CPI of 1.2
• Which machine is faster for this program, and by how much?
CPI Example I (Using Performance Equation)
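A sketch of one way to work this example, assuming the question is which machine is faster and by how much. Because both machines implement the same ISA and run the same program, the instruction count is the same on both and cancels out, so it is enough to compare the time per instruction (CPI × clock cycle time). The function and variable names are hypothetical.

```python
def time_per_instruction_ps(cpi, cycle_time_ps):
    """Average time per instruction in picoseconds: CPI x clock cycle time."""
    return cpi * cycle_time_ps


# Values from the example.
time_a = time_per_instruction_ps(2.0, 250)   # machine A
time_b = time_per_instruction_ps(1.2, 500)   # machine B

# CPU time = instruction count x time per instruction, and the instruction
# count is identical on both machines, so the ratio of CPU times equals
# the ratio of the per-instruction times.
faster, slower = ("A", "B") if time_a < time_b else ("B", "A")
ratio = max(time_a, time_b) / min(time_a, time_b)

print(f"A: {time_a:.0f} ps per instruction")
print(f"B: {time_b:.0f} ps per instruction")
print(f"machine {faster} is {ratio:.2f} times faster than machine {slower}")
```

Machine A comes out ahead by a factor of about 1.2: its higher CPI is more than offset by its shorter clock cycle.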
CPI Example II
A compiler designer is trying to decide between two code
sequences for a particular machine.
Based on the hardware implementation, there are three
different classes of instructions: Class A, Class B, and Class C,
and they require 1, 2 and 3 cycles (respectively).
The first code sequence has 5 instructions:
2 of A, 1 of B, and 2 of C
The second sequence has 6 instructions:
4 of A, 1 of B, and 1 of C.
Which sequence will be faster, and by how much? What is the CPI for
each sequence?
CPI Example II
Which sequence will be faster? How much?
A compiler designer is trying to decide between two code sequences for a
particular machine. Based on the hardware implementation, there are three
different classes of instructions: Class A, Class B, and Class C, and they
require one, two, and three cycles (respectively).
$$\frac{\text{seconds}}{\text{program}} = \frac{\text{cycles}}{\text{program}} \times \frac{\text{seconds}}{\text{cycle}}$$
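The cycle counts and CPIs for the two sequences can be tallied as in the sketch below. It assumes both sequences run on the same machine with the same clock, so the ratio of cycle counts is also the ratio of execution times. The dictionary and function names are hypothetical.

```python
# Cycles required by each instruction class, as given in the example.
CYCLES_PER_CLASS = {"A": 1, "B": 2, "C": 3}


def total_cycles(mix):
    """Total clock cycles for an instruction mix {class: count}."""
    return sum(CYCLES_PER_CLASS[cls] * count for cls, count in mix.items())


def cpi(mix):
    """Average cycles per instruction for the mix."""
    return total_cycles(mix) / sum(mix.values())


sequence_1 = {"A": 2, "B": 1, "C": 2}   # 5 instructions
sequence_2 = {"A": 4, "B": 1, "C": 1}   # 6 instructions

for name, mix in (("sequence 1", sequence_1), ("sequence 2", sequence_2)):
    print(f"{name}: {total_cycles(mix)} cycles, CPI = {cpi(mix):.2f}")

speedup = total_cycles(sequence_1) / total_cycles(sequence_2)
print(f"sequence 2 is {speedup:.2f} times faster than sequence 1")
```

Sequence 2 executes more instructions (6 versus 5) but needs fewer cycles (9 versus 10), so instruction count alone is not a reliable indicator of execution time.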
ECE369
MIPS Example
Amdahl's Law
• Example:
"Suppose a program runs in 100 seconds on a machine, with
multiply operations responsible for 80 seconds of this time. How
much do we have to improve the speed of multiplication if we want
the program to run 4 times faster?"
How about making it 5 times faster?
Amdahl’s Law
1. Speedup = 4
2. Old execution time = 100 s
3. New execution time = 100/4 = 25 s
4. Affected part (multiplication) = 80 s
5. Unaffected part = 100 - 80 = 20 s
6. New execution time = Unaffected execution time + Affected execution time / Improvement
7. 25 = 20 + 80/Improvement
8. Improvement = 80/5 = 16
How about making it 5 times faster?
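The same calculation can be repeated for any target speedup. The sketch below assumes the form of Amdahl's Law used in this example (new time = unaffected time + affected time / improvement) and solves it for the improvement factor. The function name is hypothetical, and returning `None` for an unreachable target is a choice made for this illustration.

```python
def required_improvement(total_s, affected_s, target_speedup):
    """Improvement factor needed on the affected part to reach target_speedup.

    Solves: total/target_speedup = unaffected + affected/improvement.
    Returns None when no finite improvement can reach the target.
    """
    unaffected_s = total_s - affected_s
    target_time_s = total_s / target_speedup
    # Time budget left for the affected part after the unaffected part runs.
    budget_s = target_time_s - unaffected_s
    if budget_s <= 0:
        return None
    return affected_s / budget_s


for speedup in (4, 5):
    factor = required_improvement(100, 80, speedup)
    if factor is None:
        print(f"{speedup}x faster: not achievable by speeding up multiplication alone")
    else:
        print(f"{speedup}x faster: multiplication must be {factor:.0f}x faster")
```

For a 5 times speedup the target time is 20 seconds, which is exactly the time of the unaffected part, so multiplication would have to take no time at all. The unaffected 20 seconds caps the achievable speedup below 5.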
Performance Measurement Overview
$$\text{CPU time} = \text{CPU clock cycles for the program} \times \text{Clock cycle time}$$
$$\text{CPU time} = \frac{\text{IC} \times \text{CPI}}{\text{Clock rate}}$$
Summary
• Performance is specific to a particular program
– total execution time is a consistent summary of performance
• For a given architecture, performance increases come from:
– increases in clock rate (without adverse CPI effects)
– improvements in processor organization that lower CPI
– compiler enhancements that lower CPI and/or instruction count
• Pitfall: expecting an improvement in one aspect of a machine’s
performance to improve total performance by a proportional amount
• You should not always believe everything you read! Read
carefully! See newspaper articles, e.g., Exercise 2.37.