CS104: Computer Organization: Lecture 08, 2 March 2020
• The CPU clock rate depends on the specific CPU organization (design) and hardware
implementation technology (VLSI) used.
• A computer machine (ISA) instruction consists of a number of elementary or
micro operations, which vary in number and complexity depending on the
instruction and the exact CPU organization (design).
– A micro operation is an elementary hardware operation that can be performed
during one CPU clock cycle.
– This corresponds to one micro-instruction in microprogrammed CPUs.
– Examples: register operations (shift, load, clear, increment) and ALU operations
(add, subtract, etc.).
• Thus: A single machine instruction may take one or more CPU cycles to complete;
this number is termed the Cycles Per Instruction (CPI). Its inverse is Instructions Per Cycle: IPC = 1/CPI
• Average (or effective) CPI of a program: The average CPI of all instructions executed
in the program on a given CPU design.
Cycles/sec = Hertz = Hz
MHz = 10^6 Hz, GHz = 10^9 Hz
Generic CPU Machine Instruction Processing Steps
• Instruction Fetch: Obtain instruction from program memory.
The Program Counter (PC) points to the instruction to be processed.
• How can one measure the performance of this machine (CPU) running this
program?
– Intuitively the machine (or CPU) is said to be faster or has better
performance running this program if the total execution time is
shorter.
– Thus the inverse of the total measured program execution time is a
possible performance measure or metric:
Performance_A (programs/second) = 1 / Execution Time_A (seconds/program)
How to compare performance of different machines?
What factors affect performance? How to improve performance?
Comparing Computer Performance Using Execution Time
• To compare the performance of two machines (or CPUs) "A", "B" running a
given specific program:
Performance_A = 1 / Execution Time_A
Performance_B = 1 / Execution Time_B
(The two CPUs may target different ISAs provided the program is written in a
high level language (HLL).)
• Machine A is n times faster than machine B (or slower, if n < 1) means:
Speedup = n = Performance_A / Performance_B = Execution Time_B / Execution Time_A
(i.e. Speedup is a ratio of performance, so it has no units)
• Example:
For a given program:
Execution time on machine A: Execution Time_A = 1 second
Execution time on machine B: Execution Time_B = 10 seconds
Speedup = Performance_A / Performance_B = Execution Time_B / Execution Time_A
= 10 / 1 = 10
The performance of machine A is 10 times the performance of
machine B when running this program, or: Machine A is said to be 10
times faster than machine B when running this program.
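The speedup calculation above can be sketched in a few lines of Python. This is only an illustration: the function name `speedup` and its parameter names are hypothetical, not part of the lecture.

```python
def speedup(exec_time_a: float, exec_time_b: float) -> float:
    """Return n such that machine A is n times faster than machine B.

    Speedup = Performance_A / Performance_B
            = Execution Time_B / Execution Time_A
    """
    if exec_time_a <= 0 or exec_time_b <= 0:
        raise ValueError("execution times must be positive")
    return exec_time_b / exec_time_a


# Values from the example: 1 second on machine A, 10 seconds on machine B.
n = speedup(exec_time_a=1.0, exec_time_b=10.0)
print(f"Machine A is {n:g} times faster than machine B")
```

A result below 1 would mean that machine A is slower than machine B on this program.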
CPU Execution Time: The CPU Equation
• A program consists of a number of executed instructions, I
– Measured in: instructions/program (AKA Dynamic Executed Instruction Count)
• The average instruction executed takes a number of cycles per
instruction (CPI) to be completed.
– Measured in: cycles/instruction, CPI
– Or Instructions Per Cycle (IPC): IPC = 1/CPI
• CPU has a fixed clock cycle time C = 1/clock rate = 1/f
– Measured in: seconds/cycle
T = I x CPI x C
where T = execution time per program in seconds, I = number of instructions executed,
CPI = average CPI for the program, C = CPU clock cycle
[Figure: what influences the terms of T = I x CPI x C: Compiler, Instruction Set Architecture (ISA), Organization (CPU Design), Technology (VLSI).]
• Instruction Count I (executed). Depends on: Program Used, Compiler, ISA
• CPI (Average CPI). Depends on: Program Used, Compiler, ISA, CPU Organization
• Clock Cycle C. Depends on: CPU Organization, Technology (VLSI)
Performance Comparison: Example
• From the previous example: A Program is running on a specific machine
(CPU) with the following parameters:
– Total executed instruction count, I: 10,000,000 instructions
– Average CPI for the program: 2.5 cycles/instruction.
– CPU clock rate: 200 MHz. Thus: C = 1/(200 x 10^6) = 5 x 10^-9 seconds
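As a quick check, the CPU equation T = I x CPI x C can be evaluated with the parameters listed above. The sketch below is illustrative only; the function name `cpu_time` and its parameter names are hypothetical.

```python
def cpu_time(instruction_count: float, cpi: float, clock_rate_hz: float) -> float:
    """CPU execution time in seconds: T = I x CPI x C, with C = 1 / clock rate."""
    cycle_time = 1.0 / clock_rate_hz  # C, in seconds per cycle
    return instruction_count * cpi * cycle_time


# Parameters from the slide: I = 10,000,000, CPI = 2.5, clock rate = 200 MHz.
t = cpu_time(instruction_count=10_000_000, cpi=2.5, clock_rate_hz=200e6)
print(f"T = {t:.3f} seconds")
```

With these numbers the program needs 25,000,000 cycles at 5 ns each, so T comes out at about 0.125 seconds.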
$$\text{CPU clock cycles} = \sum_{i=1}^{n} \left( CPI_i \times C_i \right)$$
Where: Executed Instruction Count $I = \sum_{i=1}^{n} C_i$
T = I x CPI x C
Instruction Types & CPI: An Example
• An instruction set has three instruction classes (e.g. ALU, Branch, etc.),
with the following CPI for a specific CPU design:
Instruction class    CPI
A                    1
B                    2
C                    3
• Two code sequences have the following instruction counts for each instruction class:
Code Sequence    A    B    C
1                2    1    2
2                4    1    1
• CPU cycles for sequence 1 = 2 x 1 + 1 x 2 + 2 x 3 = 10 cycles
CPI for sequence 1 = clock cycles / instruction count
= 10 / 5 = 2 (i.e. average or effective CPI)
• CPU cycles for sequence 2 = 4 x 1 + 1 x 2 + 1 x 3 = 9 cycles
CPI for sequence 2 = 9 / 6 = 1.5
$$\text{CPU clock cycles} = \sum_{i=1}^{n} \left( CPI_i \times C_i \right)$$
CPI = CPU Cycles / I
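The two code sequences can be checked with a short script. This is a minimal sketch using the class CPIs and instruction counts from the example; the names `CLASS_CPI`, `SEQUENCES` and `cycles_and_cpi` are hypothetical.

```python
# CPI of each instruction class (from the example).
CLASS_CPI = {"A": 1, "B": 2, "C": 3}

# Executed instruction counts per class for each code sequence.
SEQUENCES = {
    1: {"A": 2, "B": 1, "C": 2},
    2: {"A": 4, "B": 1, "C": 1},
}


def cycles_and_cpi(counts, class_cpi):
    """Return (total CPU cycles, effective CPI) for one code sequence."""
    total_cycles = sum(class_cpi[cls] * count for cls, count in counts.items())
    total_instructions = sum(counts.values())
    return total_cycles, total_cycles / total_instructions


for seq_id, counts in SEQUENCES.items():
    cycles, cpi = cycles_and_cpi(counts, CLASS_CPI)
    print(f"Sequence {seq_id}: {cycles} cycles, CPI = {cpi}")
```

Sequence 2 executes more instructions (6 versus 5) but needs fewer cycles (9 versus 10), because it uses more of the cheap class A instructions.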
Instruction Frequency & CPI
• Given a program with n types or classes of instructions
with the following characteristics:
i = 1, 2, ..., n
C_i = Count of instructions of type i executed
CPI_i = Average cycles per instruction of type i
F_i = Frequency or fraction of instruction type i executed
= C_i / total executed instruction count = C_i / I
Where: Executed Instruction Count $I = \sum_{i=1}^{n} C_i$
Then:
$$CPI = \sum_{i=1}^{n} \left( CPI_i \times F_i \right)$$
i.e. the average or effective CPI
T = I x CPI x C
Typical Mix:
$$CPI = \sum_{i=1}^{n} \left( CPI_i \times F_i \right)$$
CPI = .5 x 1 + .2 x 5 + .1 x 3 + .2 x 2
= .5 + 1 + .3 + .4
= 2.2
T = I x CPI x C
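The frequency-weighted CPI can be computed the same way in code. The sketch below uses only the (fraction, CPI) pairs visible in the calculation above; the names `mix` and `effective_cpi` are hypothetical.

```python
# (F_i, CPI_i) pairs taken from the typical-mix calculation.
mix = [(0.5, 1), (0.2, 5), (0.1, 3), (0.2, 2)]


def effective_cpi(instruction_mix):
    """Effective CPI = sum over instruction types of CPI_i x F_i."""
    total_fraction = sum(fraction for fraction, _ in instruction_mix)
    if abs(total_fraction - 1.0) > 1e-9:
        raise ValueError("instruction fractions must sum to 1")
    return sum(fraction * cpi for fraction, cpi in instruction_mix)


print(f"Effective CPI = {effective_cpi(mix):.1f}")
```

The per-type products also show where the cycles go: the type with F = .2 and CPI = 5 contributes 1 of the 2.2 cycles, more than any other type.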
[Figure: performance metrics at different system levels]
Levels: Compiler; ISA; Datapath, Control; Function Units; Transistors, Wires, Pins
Metrics: MIPS, (millions) of Instructions per second; MFLOP/s, (millions) of (F.P.) operations per second; Megabytes per second; Cycles per second (clock rate)
– Microbenchmarks:
• Small, specially written programs that isolate a specific aspect of
performance: integer processing, floating point processing, local
memory, input/output, etc.
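A microbenchmark in this sense can be very small. The following sketch times a loop of integer operations and a loop of floating point operations separately. It is illustrative only: the function names are hypothetical, and in Python the interpreter overhead dominates the measurement, so the numbers show the structure of a microbenchmark rather than real hardware characteristics.

```python
import time


def time_loop(operation, iterations=100_000):
    """Return the wall-clock seconds needed to call `operation` `iterations` times."""
    start = time.perf_counter()
    for i in range(iterations):
        operation(i)
    return time.perf_counter() - start


def integer_op(i):
    # Integer multiply and add only.
    return i * 3 + 7


def float_op(i):
    # Floating point multiply and add only.
    return i * 3.0 + 7.5


int_seconds = time_loop(integer_op)
float_seconds = time_loop(float_op)
print(f"integer loop: {int_seconds:.4f} s")
print(f"float loop:   {float_seconds:.4f} s")
```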
Types of Benchmarks
Actual Target Workload
– Pros: Representative.
– Cons: Very specific. Non-portable. Complex: difficult to run, or measure.
Full Application Benchmarks
– Pros: Portable. Widely used. Measurements useful in reality.
– Cons: Less representative than actual workload.
Microbenchmarks
– Pros: Identify peak performance and potential bottlenecks.
– Cons: Peak performance results may be a long way from real application performance.
SPEC: Standard Performance Evaluation Corporation
The most popular and industry-standard set of CPU benchmarks.
• SPECmarks, 1989: Programs application domain: Engineering and scientific computation
Example Problem
• A 1 GHz processor takes 100 seconds to execute a program,
while consuming 70 W of dynamic power and 30 W of
leakage power. Does the program consume less energy
in Turbo boost mode when the frequency is increased to
1.2 GHz?
Note:
Frequency only impacts dynamic power, not leakage power.
We assume that the program’s CPI is unchanged when
frequency is changed, i.e., exec time varies linearly
with cycle time.
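One way to work the problem is sketched below. It relies on an assumption that the slide does not state explicitly: dynamic power scales linearly with frequency (that is, the supply voltage is unchanged), while leakage power stays constant. The function and variable names are hypothetical.

```python
def energy_joules(dynamic_w: float, leakage_w: float, exec_time_s: float) -> float:
    """Energy = total power x execution time."""
    return (dynamic_w + leakage_w) * exec_time_s


base_freq_ghz = 1.0
turbo_freq_ghz = 1.2
scale = turbo_freq_ghz / base_freq_ghz

# Normal mode: 70 W dynamic + 30 W leakage for 100 seconds.
base_energy = energy_joules(70.0, 30.0, 100.0)

# Turbo mode. ASSUMPTION: dynamic power grows linearly with frequency and
# leakage is unchanged. CPI is unchanged, so execution time shrinks by the
# same factor as the cycle time.
turbo_energy = energy_joules(70.0 * scale, 30.0, 100.0 / scale)

print(f"Normal mode: {base_energy:.0f} J")
print(f"Turbo mode:  {turbo_energy:.0f} J")
```

Under these assumptions normal mode uses 100 W x 100 s = 10,000 J, and Turbo boost mode uses (84 W + 30 W) x 83.3 s, about 9,500 J. The program consumes less energy in Turbo boost mode because the constant leakage power is paid for a shorter time.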