Lec 03


LECTURE - 03

Amdahl's Law
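- Amdahl's law:
  - Diminishing returns
  - Limit on overall speedup

[Figure: execution time before the enhancement split into 1-F and F; after the enhancement, into 1-F and F/Speedup]

- Corollary: make the common case fast

$$\text{Overall speedup} = \frac{(1-F)+F}{(1-F)+\dfrac{F}{\text{Speedup}}}$$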
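A minimal Python sketch of the overall-speedup formula, meant only to make the diminishing returns concrete. The function name `amdahl_speedup` and the sample value F = 0.5 are illustrative assumptions, not taken from the lecture.

```python
def amdahl_speedup(f, s):
    """Overall speedup when a fraction f of execution time is sped up by a factor s."""
    if not 0.0 <= f <= 1.0:
        raise ValueError("f must lie in [0, 1]")
    if s <= 0:
        raise ValueError("s must be positive")
    # Unaffected part (1 - f) plus the enhanced part f / s.
    return 1.0 / ((1.0 - f) + f / s)


if __name__ == "__main__":
    f = 0.5  # illustrative fraction, not from the lecture
    for s in (2, 10, 100, 1000):
        print(f"F={f}, Speedup={s:>4}: overall = {amdahl_speedup(f, s):.3f}")
    # However large s becomes, the overall speedup stays below 1 / (1 - F).
    print(f"limit = {1.0 / (1.0 - f):.3f}")
```

Each tenfold increase in the local speedup buys less overall, and the result never reaches 1/(1-F), which is the limit named on the slide.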
Illustrating Amdahl's Law
- Example: implement cache, or faster ALU?
  - Cache improves performance by 10x
  - ALU improves performance by 3x
- Depends on fraction of instructions
  - Suppose $F_{mem} = 0.2$, $F_{alu} = 0.5$, $F_{other} = 0.3$

$$\text{Speedup with cache} = \frac{1}{0.8 + 0.2/10} = 1.22$$

$$\text{Speedup with faster ALU} = \frac{1}{0.5 + 0.5/3} = 1.5$$
Example continued...
- Fixing $F_{alu} = 0.5$, for what value of $F_{mem}$ is adding a cache better?

$$\frac{1}{1 - F_{mem} + F_{mem}/10} > 1.5 \;\Rightarrow\; F_{mem} > \frac{10}{27} \approx 0.37$$
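The cache-versus-ALU comparison can be checked numerically. This is a sketch under the lecture's numbers (10x cache, 3x ALU, F_alu = 0.5); the helper names `amdahl_speedup` and `break_even_fraction` are hypothetical and chosen for illustration.

```python
def amdahl_speedup(f, s):
    """Overall speedup when a fraction f of execution time is sped up by a factor s."""
    return 1.0 / ((1.0 - f) + f / s)


def break_even_fraction(target, s):
    """Fraction f at which a local speedup s gives exactly the overall speedup `target`.

    Obtained by solving 1 / (1 - f + f/s) = target for f.
    """
    return (1.0 - 1.0 / target) / (1.0 - 1.0 / s)


if __name__ == "__main__":
    cache = amdahl_speedup(0.2, 10)  # F_mem = 0.2, cache is 10x
    alu = amdahl_speedup(0.5, 3)     # F_alu = 0.5, ALU is 3x
    print(f"cache: {cache:.2f}, faster ALU: {alu:.2f}")
    # The cache wins only once F_mem exceeds this value (10/27).
    print(f"break-even F_mem = {break_even_fraction(alu, 10):.4f}")
```

With F_mem = 0.2 the faster ALU is the better investment; the cache only wins once memory instructions exceed 10/27 of the total.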
The CPU Performance Equation
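$$\text{CPU time} = \text{Num. of clock cycles} \times \text{Clock cycle time}$$

OR

$$\text{CPU time} = \frac{\text{Num. of clock cycles}}{\text{Clock rate}}$$

For a program,

$$\text{Num. of clock cycles} = \text{Instruction Count} \times \text{Cycles Per Instruction} = IC \times CPI$$

Putting these together,

$$\text{CPU time} = IC \times CPI \times \text{Cycle time}$$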
More on the Equation
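- This form is convenient
  - Involves many relevant parameters
- Remembering is easy

$$\text{CPU time} = \frac{\text{Seconds}}{\text{Program}} = \frac{\text{Seconds}}{\text{Clock cycle}} \times \frac{\text{Clock cycles}}{\text{Instruction}} \times \frac{\text{Instructions}}{\text{Program}}$$

- With CPI as the independent variable

$$CPI = \frac{\text{CPU time}}{\text{Clock cycle time} \times IC}$$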
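Other Convenient Forms of the Equation
- Number of clock cycles can be counted as:

$$\text{CPU clock cycles} = \sum_{i=1}^{n} CPI_i \times IC_i$$

Hence,

$$\text{CPU time} = \left( \sum_{i=1}^{n} CPI_i \times IC_i \right) \times \text{Clock cycle time}$$

- Calculating $CPI$ in terms of $CPI_i$:

$$CPI = \frac{\text{CPU time}}{\text{Clock cycle time} \times IC} = \sum_{i=1}^{n} CPI_i \times \left( \frac{IC_i}{IC} \right)$$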
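The per-class forms of the equation translate directly into code. The sketch below assumes each instruction class is given as a `(CPI_i, IC_i)` pair; the function names, the two-class mix, and the 2 ns cycle time are made-up illustrations, not values from the lecture.

```python
def cpu_clock_cycles(classes):
    """Total cycles: sum of CPI_i * IC_i over (cpi_i, ic_i) pairs."""
    return sum(cpi * ic for cpi, ic in classes)


def average_cpi(classes):
    """Overall CPI: sum of CPI_i weighted by the instruction fraction IC_i / IC."""
    total_ic = sum(ic for _, ic in classes)
    return sum(cpi * (ic / total_ic) for cpi, ic in classes)


def cpu_time(classes, cycle_time):
    """CPU time = (sum of CPI_i * IC_i) * clock cycle time."""
    return cpu_clock_cycles(classes) * cycle_time


if __name__ == "__main__":
    # Hypothetical mix: 600 one-cycle and 400 two-cycle instructions.
    mix = [(1, 600), (2, 400)]
    cycle_time = 2e-9  # hypothetical 2 ns clock cycle
    print("cycles      :", cpu_clock_cycles(mix))
    print("average CPI :", round(average_cpi(mix), 3))
    print("CPU time (s):", cpu_time(mix, cycle_time))
```

Both routes agree: multiplying the average CPI by the total instruction count and the cycle time gives the same CPU time as summing cycles per class.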
Usefulness of the Equation
- $IC_i$ easier to measure than $F_i$
  - Equivalently, $F_i$ is measured through $IC_i$
- Equation includes relevant parameters such as the cycle time
Measuring the Parameters for the Equation
- Clock cycle time:
  - Easy for existing architectures
  - Needs to be estimated in the design process
- Instruction Count:
  - Requires a compiler
  - And, simulator/interpreter, or instrumentation code
- CPI for each instruction type:
  - Easy for simple architectures
  - Pipelines, caches introduce complications
  - Need to simulate and measure average CPI

A Design Example
- A design choice for conditional branch instructions:
  - Choice 1: condition code is set by a compare instruction, checked by the next (branch) instruction
    - 20% instructions are branches, and another 20% are compares
    - 2 cycles per branch, 1 cycle for all others
    - Clock-rate is 25% faster
  - Choice 2: single instruction for compare and branch
- Which choice is better?
Solution for Design Example
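$$\text{CPU time}_1 = \frac{IC_1 \times [(0.8 \times 1) + (0.2 \times 2)]}{1.25\,C} = \frac{IC_1}{C} \times \frac{1.2}{1.25}$$

$$\text{CPU time}_2 = \frac{IC_1 \times [(0.6 \times 1) + (0.2 \times 2)]}{C} = \frac{IC_1}{C}$$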
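The two CPU times can be compared with a few lines of Python. This sketch assumes that C denotes the clock rate of Choice 2 (so Choice 1 runs at 1.25 C) and normalises both IC_1 and C to 1; the function name `cpu_time` and the normalisation are illustrative assumptions.

```python
def cpu_time(ic, mix, clock_rate):
    """CPU time for an instruction mix.

    mix is a list of (fraction_of_ic, cpi) pairs, with fractions relative to ic.
    """
    cycles = ic * sum(frac * cpi for frac, cpi in mix)
    return cycles / clock_rate


IC1 = 1.0  # normalised instruction count of Choice 1
C = 1.0    # normalised clock rate of Choice 2 (assumed reading of C)

# Choice 1: 20% branches at 2 cycles, 80% others (including compares) at 1 cycle.
t1 = cpu_time(IC1, [(0.8, 1), (0.2, 2)], 1.25 * C)
# Choice 2: the compares disappear, leaving 60% others and 20% branches of IC1.
t2 = cpu_time(IC1, [(0.6, 1), (0.2, 2)], C)

if __name__ == "__main__":
    print(f"Choice 1: {t1:.2f} x IC1/C")
    print(f"Choice 2: {t2:.2f} x IC1/C")
```

Under these numbers Choice 1 comes out slightly ahead (0.96 versus 1.00 in units of IC_1/C): the faster clock more than pays for the extra compare instructions.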
Detailed Example: Using Caches
- Rule of thumb in hardware design:
  - Smaller is faster
    - Signal propagation delay is lower
    - More power per memory cell
- Observation w.r.t. software:
  - Locality of reference
  - Spatial as well as temporal


The Memory Hierarchy
Level            Access time   Size
CPU registers    5 ns          200 B
Cache            10 ns         512 KB
Memory           100 ns        512 MB
I/O devices      O(10 ms)      O(10-100 GB)

Moving down the hierarchy: slower, larger, cheaper.
Modifying the CPU Performance Equation
- Caches involve hits and misses
- Cache miss ==> memory stalls

$$\text{CPU time} = (\text{CPU clock cycles} + \text{Memory stall cycles}) \times \text{Clock cycle time}$$

$$\text{Memory stall cycles} = \text{Num. misses} \times \text{Miss penalty}$$
$$= IC \times \text{Misses per instruction} \times \text{Miss penalty}$$
$$= IC \times \text{Mem. refs. per instruction} \times \text{Miss rate} \times \text{Miss penalty}$$

- Equation in the final form is useful: parameters can be measured
Some Numerics...
Fraction of memory access instructions = 0.4
CPI for memory instructions (hits) = 2
CPI for other instructions = 1
Choice 1: 0.04 miss rate, 25 cycle penalty
Choice 2: 0.02 miss rate, 50 cycle penalty
Which is a better choice?
What is the overall average CPI?

$$CPI_{avg} = (0.6 \times 1) + (0.4 \times 2) + (1 + 0.4) \times 0.02 \times 50 = 2.8$$
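The slide works out the average CPI for one choice; the sketch below evaluates both. It assumes, following the (1 + 0.4) term in the slide's formula, that each instruction makes one instruction-fetch reference plus 0.4 data references on average. The function name `average_cpi` and its parameter names are illustrative.

```python
def average_cpi(f_mem, cpi_mem_hit, cpi_other, miss_rate, miss_penalty):
    """Average CPI including memory stall cycles per instruction."""
    # Base CPI assuming every memory access hits in the cache.
    base = (1.0 - f_mem) * cpi_other + f_mem * cpi_mem_hit
    # Assumed reading of (1 + 0.4): one fetch plus f_mem data accesses.
    refs_per_instruction = 1.0 + f_mem
    stalls = refs_per_instruction * miss_rate * miss_penalty
    return base + stalls


if __name__ == "__main__":
    choice1 = average_cpi(0.4, 2, 1, miss_rate=0.04, miss_penalty=25)
    choice2 = average_cpi(0.4, 2, 1, miss_rate=0.02, miss_penalty=50)
    print(f"Choice 1: CPI_avg = {choice1:.2f}")
    print(f"Choice 2: CPI_avg = {choice2:.2f}")
```

With these parameters the product of miss rate and miss penalty is 1 cycle per reference in both cases, so the two choices give the same average CPI of 2.8.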
Next week...
- Instruction set architecture
- Pipelining
- Pipelining hazards
