
Data Hazards

Hazards in pipelines can be caused by data or control dependencies between instructions. Data hazards occur when an instruction needs a value that is not ready yet. Control hazards occur when the next instruction is unknown, such as after a branch. Pipelines can deal with hazards by stalling, bypassing/forwarding values, or predicting control flow. Stalling inserts bubbles but guarantees correctness. Bypassing provides values as soon as possible to avoid stalls. Prediction resolves control flow early but requires flushing if wrong.

Uploaded by

sivakumarb92

Data Hazards

Hazards: Key Points

Hazards cause imperfect pipelining
They prevent us from achieving CPI = 1
They are generally caused by counter-flow data dependences in the pipeline

Three kinds
Structural -- contention for hardware resources
Data -- a data value is not available when/where it is needed.
Control -- the next instruction to execute is not known.

Two ways to deal with hazards
Removal -- add hardware and/or complexity to work around the hazard so it does not exist
Bypassing/forwarding
Speculation
Stall -- Sacrifice performance to prevent the hazard from occurring
Stalling causes bubbles

Data Dependences
A data dependence occurs whenever one instruction needs a value produced by another.
Register values (for now)
Also memory accesses (more on this later)

add $s0, $t0, $t1
sub $t2, $s0, $t3
sw $t1, 0($t2)
ld $t3, 0($t2)
ld $t4, 16($s4)
add $t3, $s0, $t4
and $t3, $t2, $t4
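
The register dependences in the sequence above can be found mechanically. The following is a minimal Python sketch, taking the instructions in the order listed. It assumes a simplified parser in which the first register is the destination (except for sw, which only reads registers). The names parse and raw_dependences are made up for this illustration and do not come from the slides.

```python
import re

# Registers are written like $s0 or $t3.
REG = re.compile(r"\$\w+")

def parse(instr):
    """Return (dest, sources) for one MIPS-style instruction string.

    Assumption: the first register is the destination, except for
    stores (sw), which write memory and only read registers.
    """
    op = instr.split()[0]
    regs = REG.findall(instr)
    if op == "sw":
        return None, regs
    return regs[0], regs[1:]

def raw_dependences(program):
    """List (producer_index, consumer_index, register) triples.

    Each consumer is paired with the most recent earlier instruction
    that wrote the register it reads.
    """
    last_writer = {}
    deps = []
    for i, instr in enumerate(program):
        dest, sources = parse(instr)
        for reg in sources:
            if reg in last_writer:
                deps.append((last_writer[reg], i, reg))
        if dest is not None:
            last_writer[dest] = i
    return deps

program = [
    "add $s0, $t0, $t1",
    "sub $t2, $s0, $t3",
    "sw $t1, 0($t2)",
    "ld $t3, 0($t2)",
    "ld $t4, 16($s4)",
    "add $t3, $s0, $t4",
    "and $t3, $t2, $t4",
]

for producer, consumer, reg in raw_dependences(program):
    print(f"{program[consumer]!r} needs {reg} from {program[producer]!r}")
```

The sketch tracks register values only. The sw/ld pair that goes through memory at 0($t2) is a memory dependence and is not detected here.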

Dependences in the pipeline


In our simple pipeline, these instructions cause a hazard
Cycles
add $s0, $t0, $t1 -- Fetch, Decode, EX, Mem, Write back
sub $t2, $s0, $t3 -- Fetch, Decode, EX, Mem, Write back (one cycle behind the add)

How can we fix it?

Ideas?

Solution 1: Make the compiler deal with it.


Expose hazards to the big A architecture
A result is available N instructions after the instruction that generates it.
In the meantime, the register file has the old value.
delay slots
What is N?
Can it change?
What can the compiler do?
[Pipeline diagram: Fetch, Decode, EX, Mem, Write back]

Compiling for delay slots


The compiler must fill the delay slots with other instructions
What if it can't? No-ops

add $s0, $t0, $t1
sub $t2, $s0, $t3
add $t3, $s0, $t4
and $t7, $t5, $t4

Rearrange instructions

add $s0, $t0, $t1
and $t7, $t5, $t4
sub $t2, $s0, $t3
add $t3, $s0, $t4

Solution 2: Stall
When you need a value that is not ready, stall
Suspend the execution of the executing instruction and those that follow.
This introduces a pipeline bubble. A bubble is a lack of work to do. It moves through the pipeline like an instruction.
Cycles
add $s0, $t0, $t1 -- Fetch, Decode, EX, Mem, Write back
sub $t2, $s0, $t3 -- Fetch, Stall, Decode, EX, Mem, Write back

Stalling the pipeline


Freeze all pipeline stages before the stage where the hazard occurred.
Disable the PC update
Disable the pipeline registers
Insert nop control bits at stalled stage (decode in our example)
This is essentially equivalent to always inserting a nop when a hazard exists
How is this solution still potentially better than relying on the compiler?
The compiler can still act like there are delay slots to avoid stalls.
Implementation details are not exposed in the ISA

The Impact of Stalling On Performance

ET = I * CPI * CT
I and CT are constant
What is the impact of stalling on CPI?
What do we need to know to figure it out?

The Impact of Stalling On Performance


ET = I * CPI * CT
I and CT are constant
What is the impact of stalling on CPI?
Fraction of instructions that stall: 30%
Baseline CPI = 1
Stall CPI = 1 + 2 = 3
New CPI = 0.3*3 + 0.7*1 = 1.6
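
The CPI arithmetic above can be checked with a short Python sketch. The function and parameter names are chosen for this illustration. The numbers (30% of instructions stalling, 2 stall cycles, baseline CPI of 1) are the ones given on the slide.

```python
def cpi_with_stalls(stall_fraction, stall_cycles, base_cpi=1.0):
    """Average CPI when a fraction of instructions pay extra stall cycles."""
    stalled_cpi = base_cpi + stall_cycles
    return stall_fraction * stalled_cpi + (1 - stall_fraction) * base_cpi

def execution_time(instructions, cpi, cycle_time):
    """ET = I * CPI * CT."""
    return instructions * cpi * cycle_time

new_cpi = cpi_with_stalls(stall_fraction=0.3, stall_cycles=2)
print(round(new_cpi, 2))  # 1.6

# With I and CT held constant, ET grows by the same factor as CPI.
slowdown = execution_time(1, new_cpi, 1) / execution_time(1, 1.0, 1)
print(round(slowdown, 2))  # 1.6
```

Because I and CT are constant, the 60% increase in CPI is also a 60% increase in execution time.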

Solution 3: Bypassing/Forwarding
Data values are computed in EX and Mem but publicized in write back
The data exists! We should use it.
[Pipeline diagram: Fetch, Decode, EX, Mem, Write back, labeled with where inputs are needed, where results are known, and where results are "published" to registers]

Bypassing or Forwarding
Take the values, wherever they are
Cycles
add $s0, $t0, $t1 -- Fetch, Decode, EX, Mem, Write back
sub $t2, $s0, $t3 -- Fetch, Decode, EX, Mem, Write back (one cycle behind the add, no stall)

Forwarding Paths
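Cycles
add $s0, $t0, $t1 -- Fetch, Decode, EX, Mem, Write back
sub $t2, $s0, $t3 -- Fetch, Decode, EX, Mem, Write back (one cycle behind the add)
sub $t2, $s0, $t3 -- Fetch, Decode, EX, Mem, Write back (two cycles behind the add)
sub $t2, $s0, $t3 -- Fetch, Decode, EX, Mem, Write back (three cycles behind the add)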
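
One common way to choose among forwarding paths is to compare the source register needed in EX against the destination registers held in the later pipeline registers, and to prefer the most recent producer. The Python sketch below assumes that policy. The dictionary fields (dest, value) and the function name are hypothetical and are not taken from the slides.

```python
def select_operand(src_reg, regfile, exec_mem, mem_wb):
    """Pick the value for one ALU input.

    exec_mem and mem_wb describe the instructions one and two stages
    ahead: {"dest": register name or None, "value": result}.
    The newest matching result wins; otherwise read the register file.
    """
    if exec_mem["dest"] == src_reg:
        return exec_mem["value"], "Exec/Mem"
    if mem_wb["dest"] == src_reg:
        return mem_wb["value"], "Mem/WB"
    return regfile[src_reg], "register file"

# add $s0, $t0, $t1 has just left EX; sub $t2, $s0, $t3 is entering EX.
regfile = {"$s0": 0, "$t0": 5, "$t1": 7, "$t3": 2}  # $s0 is still stale
exec_mem = {"dest": "$s0", "value": 12}             # result of the add
mem_wb = {"dest": None, "value": None}

print(select_operand("$s0", regfile, exec_mem, mem_wb))  # (12, 'Exec/Mem')
print(select_operand("$t3", regfile, exec_mem, mem_wb))  # (2, 'register file')
```

Checking Exec/Mem before Mem/WB matters when two in-flight instructions write the same register: the consumer should see the younger result.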

Forwarding in Hardware
[Datapath diagram: PC, Instruction Memory, Register File, Sign Extend, ALU, and Data Memory, separated by the IFetch/Dec, Dec/Exec, Exec/Mem, and Mem/WB pipeline registers]

Forwarding for Loads


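Load values come from the Mem stage
Cycles
ld $s0, 0($t0) -- Fetch, Decode, EX, Mem, Write back
sub $t2, $s0, $t3 -- Fetch, Decode, EX, Mem (one cycle behind the ld)
The sub needs $s0 at the start of EX, in the same cycle in which the ld is still reading it in Mem.
Time travel presents significant implementation challenges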
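
Since the loaded value cannot be forwarded backward in time, the hardware has to notice this case and insert a bubble. A minimal Python sketch of such a check follows. It assumes each in-flight instruction is described by a small dictionary with op, dest, and sources fields; these names and the must_stall function are hypothetical.

```python
def must_stall(in_ex, in_decode):
    """True if the instruction in Decode needs a load that is still in EX.

    A load's value only exists after Mem, so a consumer directly behind
    it has to wait one cycle before forwarding can supply the value.
    """
    return in_ex["op"] == "ld" and in_ex["dest"] in in_decode["sources"]

ld = {"op": "ld", "dest": "$s0", "sources": ["$t0"]}
sub = {"op": "sub", "dest": "$t2", "sources": ["$s0", "$t3"]}
add = {"op": "add", "dest": "$s0", "sources": ["$t0", "$t1"]}

print(must_stall(ld, sub))   # True: ld $s0 followed directly by a use of $s0
print(must_stall(add, sub))  # False: an add result can be forwarded from EX
```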

What can we do?


Punt to the compiler
Easy enough.
Will work.
Same dangers apply as before.
Always stall.
Forward when possible, stall otherwise
Here the compiler still has leverage
If the compiler can't fix it, the hardware will stall

Hardware Cost of Forwarding


In our pipeline, adding forwarding required relatively little hardware.
For deeper pipelines it gets much more expensive
Roughly: ALUs * pipeline stages you need to forward over
Some modern processors have multiple ALUs (4-5)
And deeper pipelines (4-5 stages to forward across)
Not all forwarding paths need to be supported.
If a path does not exist, the processor will need to stall.

Key Points: Control Hazards


Control hazards occur when we don't know what the next instruction is
Mostly caused by branches
Strategies for dealing with them
Stall
Guess!
Leads to speculation
Flushing the pipeline
Strategies for making better guesses
Understand the difference between stall and flush

Control Hazards

add $s1, $s3, $s2
sub $s6, $s5, $s2
beq $s6, $s7, somewhere
and $s2, $s3, $s1
Computing the new PC
[Pipeline diagram: Fetch, Decode, EX, Mem, Write back]

Computing the PC
Non-branch instruction
PC = PC + 4
When is PC ready?
[Pipeline diagram: Fetch, Decode, EX, Mem, Write back]

Computing the PC
Branch instructions
bne $s1, $s2, offset
if ($s1 != $s2) { PC = PC + offset} else {PC = PC + 4;}
When is the value ready?
[Pipeline diagram: Fetch, Decode, EX, Mem, Write back]
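
The two PC rules (non-branch and branch) can be combined into one function. The Python sketch below follows the slide's simplified rule of PC + offset for a taken branch. The function name, the starting PC of 100, and the offset of 32 are arbitrary values picked for illustration.

```python
def next_pc(pc, is_branch=False, taken=False, offset=0):
    """Return the address of the next instruction to fetch."""
    if is_branch and taken:
        return pc + offset
    return pc + 4

# bne $s1, $s2, offset: the branch is taken when the registers differ.
s1, s2 = 3, 9
print(next_pc(100))                                               # 104
print(next_pc(100, is_branch=True, taken=(s1 != s2), offset=32))  # 132
print(next_pc(100, is_branch=True, taken=(s1 != s1), offset=32))  # 104
```

The non-branch case needs only the current PC, while the branch case also needs the comparison result, which is why the branch target is known later in the pipeline.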

Option 2: Simple Prediction


Can a processor tell the future?
For non-taken branches, the new PC is ready immediately.
Let's just assume the branch is not taken
Also called branch prediction or control speculation
What if we are wrong?

Predict Not-taken
Cycles
Not-taken: bne $t2, $s0, somewhere
Taken: bne $t2, $s4, else
add $s0, $t0, $t1 -- squashed
...
else:
sub $t2, $s0, $t3
[Pipeline diagram: both branches and the sub pass through Fetch, Decode, EX, Mem, Write back; the add is fetched after the taken branch and then squashed]

We start the add, and then, when we discover the branch outcome, we squash it.
We flush the pipeline.

Simple static Prediction


"static" means before run time
Many prediction schemes are possible
Predict taken
Pros? Loops are common
Predict not-taken
Pros? Not all branches are for loops.
Backward Taken/Forward not taken
Best of both worlds.
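
Backward taken/forward not taken needs only the sign of the branch offset. The following Python sketch applies the rule to a made-up branch trace. The trace, its offsets, and the function names are hypothetical and serve only to show how the rule behaves on a loop branch and a forward branch.

```python
def predict_taken(offset):
    """Backward taken / forward not taken: backward means a negative offset."""
    return offset < 0

def accuracy(branches):
    """branches: list of (offset, actually_taken) pairs."""
    correct = sum(predict_taken(off) == taken for off, taken in branches)
    return correct / len(branches)

# Hypothetical trace: a loop branch (offset -16) taken 9 times before
# falling through, plus a forward branch (offset +8) taken 1 time in 4.
trace = [(-16, True)] * 9 + [(-16, False)] + [(8, False)] * 3 + [(8, True)]
print(accuracy(trace))  # 12 of the 14 predictions are correct
```

On this trace the rule misses only the loop exit and the one taken forward branch.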

Implementing Backward taken/forward not taken

Changes in control
New inputs to the control unit
The sign of the offset
The result of the branch
New outputs from control
The flush signal.
Inserts noop bits in datapath and control

The Importance of Pipeline depth

There are two important parameters of the pipeline that determine the impact of branches on performance
Branch decode time -- how many cycles does it take to identify a branch (in our case, this is less than 1)
Branch resolution time -- cycles until the real branch outcome is known (in our case, this is 2 cycles)

Pentium 4 pipeline
1. Branches take 19 cycles to resolve
2. Identifying a branch takes 4 cycles.
3. Stalling is not an option.
4. Not quite as bad now, but BP is still very important.
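
A rough model shows why stalling is not an option with a resolution time this long. The Python sketch below reuses the CPI reasoning from the stalling slides. It assumes that 20% of instructions are branches and that a predictor is wrong 10% of the time; both figures are made up for illustration. It also treats the 19-cycle resolution time as the full penalty, which is a simplification.

```python
def cpi_with_branches(branch_fraction, penalty_cycles,
                      mispredict_rate=1.0, base_cpi=1.0):
    """Average CPI when some branches each cost penalty_cycles extra.

    mispredict_rate=1.0 models stalling on every branch.
    """
    return base_cpi + branch_fraction * mispredict_rate * penalty_cycles

# Assumed workload: 20% branches, 19-cycle penalty.
print(round(cpi_with_branches(0.2, 19), 2))                       # 4.8
print(round(cpi_with_branches(0.2, 19, mispredict_rate=0.1), 2))  # 1.38
```

Under these assumptions, stalling on every branch costs almost five cycles per instruction, while a predictor that is right 90% of the time keeps CPI near 1.4.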

Dynamic Branch Prediction


Long pipes demand higher accuracy than static schemes can deliver.
Instead of making the guess once, make it every time we see the branch.
Predict future behavior based on past behavior
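
One of the simplest ways to predict from past behavior is to guess that a branch will do whatever it did the last time it ran. The Python sketch below shows that scheme. It is one possible design and is not necessarily the one these slides go on to describe. The class name, the default guess of not taken, and the trace are all assumptions made for illustration.

```python
class LastOutcomePredictor:
    """Predict that each branch does what it did the last time it ran."""

    def __init__(self, default=False):
        self.default = default  # guess for branches never seen before
        self.history = {}       # branch PC -> last observed outcome

    def predict(self, pc):
        return self.history.get(pc, self.default)

    def update(self, pc, taken):
        self.history[pc] = taken

def run(predictor, trace):
    """Return the number of correct predictions over (pc, taken) pairs."""
    correct = 0
    for pc, taken in trace:
        if predictor.predict(pc) == taken:
            correct += 1
        predictor.update(pc, taken)
    return correct

# Hypothetical loop branch at PC 0x40: taken 9 times, then not taken.
trace = [(0x40, True)] * 9 + [(0x40, False)]
print(run(LastOutcomePredictor(), trace), "of", len(trace))  # 8 of 10
```

The predictor misses the first iteration (no history yet) and the loop exit, and gets everything in between right.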
