
Lec13 Data Hazards

The document discusses different types of pipeline hazards including data hazards that occur when an instruction depends on the result of a previous instruction still in the pipeline, control hazards caused by changes in the instruction flow, and structural hazards due to insufficient hardware resources. It describes how data hazards can be minimized through forwarding the results of an instruction directly to the inputs of dependent instructions in the pipeline rather than waiting for the results to be written to the register file.

Uploaded by

mayank p

VLSI Architecture

ES ZG642 / MEL ZG 642


Session 13
BITS Pilani
Pawan Sharma
Pilani Campus 04/11/2023
Last Lecture

• Pipelined Implementation
• Datapath and Control
Today’s Lecture

• Pipeline Hazards
• Data Hazards
• Control Hazards
PIPELINE HAZARDS

BITS Pilani, Pilani Campus


Hazards prevent the next instruction in the instruction stream from executing during its designated clock cycle. There are three classes
of hazards: structural, data, and control.

Structural Hazards: These arise from resource conflicts, when the hardware cannot support all possible combinations of
instructions in simultaneous overlapped execution. In the laundry-room analogy, a structural hazard would occur if we used a
washer-dryer combination instead of a separate washer and dryer.

E.g., suppose the pipeline has a single memory with one read port, rather than separate instruction and data memories. In
the same clock cycle, the first instruction is accessing data from memory while the fourth instruction is fetching an
instruction from that same memory. Without two memories, our pipeline could have a structural hazard.

[Figure: pipelined execution of four loads, lw $1, 100($0); lw $2, 200($0); lw $3, 300($0); lw $4, 400($0). Each instruction
passes through instruction fetch, Reg, ALU, data access, Reg, and a new instruction starts every 2 ns. Horizontal axis: time,
2 to 14 ns. Vertical axis: program execution order (in instructions).]



Another example is a machine with only one register-file write port that needs to
perform two writes in one clock cycle. When a sequence of instructions encounters this
hazard, the pipeline stalls one of the instructions until the required unit
is available.

A stall is commonly called a bubble.

- When an instruction is stalled, all instructions issued later than the stalled
instruction are also stalled.

- Instructions issued earlier than the stalled instruction must continue.

- No new instructions are fetched during the stall.

Data Hazards:

Data hazards arise when an instruction depends on the result of a previous instruction that is still in the pipeline.

For example, in the sequence below the last four instructions all depend on the result that the first
instruction, the ADD, places in register R1. If register R1 held the value 10 before the ADD instruction and −20
afterwards, the programmer intends that −20 will be used by the following instructions that refer to
register R1. However, the middle three instructions will read the value 10 if we do not intervene.

ADD R1, R2, R3


SUB R4, R1, R5
AND R6, R5, R1
OR R8, R1, R9
XOR R10, R1, R11
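
The dependences in this sequence can be found mechanically. The following is a minimal sketch, assuming each instruction is written as (opcode, destination, source 1, source 2); the tuple encoding and the name raw_dependences are illustrative and not part of the lecture.

```python
# Each instruction is (opcode, destination, source1, source2), following the
# three-operand form of the listing above. This encoding is an assumption.
program = [
    ("ADD", "R1", "R2", "R3"),
    ("SUB", "R4", "R1", "R5"),
    ("AND", "R6", "R5", "R1"),
    ("OR", "R8", "R1", "R9"),
    ("XOR", "R10", "R1", "R11"),
]


def raw_dependences(instructions):
    """Return (producer_index, consumer_index, register) triples.

    A consumer depends on the most recent earlier instruction that writes
    one of its source registers (a read-after-write dependence).
    """
    found = []
    last_writer = {}  # register name -> index of the latest instruction writing it
    for index, (_, dest, src1, src2) in enumerate(instructions):
        for src in dict.fromkeys((src1, src2)):  # drop a repeated source, keep order
            if src in last_writer:
                found.append((last_writer[src], index, src))
        last_writer[dest] = index
    return found


dependences = raw_dependences(program)
for producer, consumer, reg in dependences:
    print(f"{program[consumer][0]} depends on {program[producer][0]} through {reg}")
```

All four later instructions are reported as depending on the ADD through R1. Whether a dependence actually becomes a hazard depends on pipeline timing, which this sketch does not model.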



All the instructions after the ADD use the result of the ADD instruction.

CC1 CC2 CC3 CC4 CC5 CC6

ADD R1, R2, R3 IM REG ALU DM REG

SUB R4, R1, R5

The ADD instruction writes the value of R1 in its WB stage (CC5), but the SUB instruction
needs to read it in its ID stage (CC3): a data hazard (the wrong value is read).



CC1 CC2 CC3 CC4 CC5 CC6

ADD R1, R2, R3 IM REG ALU DM REG

SUB R4, R1, R5

AND R6, R1, R7

The AND instruction also suffers from a data hazard.


The ADD instruction is writing the value of R1 to the register file in the same cycle that the OR instruction is
reading the value of R1. We assume that the write operation takes place in the first half of the clock cycle,
while the read operation takes place in the second half (W|R). Therefore, the updated R1 value is available to the OR.

[Figure: pipeline diagram of the five instructions showing the IF/ID, ID/EX, EX/MEM and MEM/WB pipeline registers,
with dependency lines drawn from the ADD to the later instructions.]

So the only data hazards occur for instructions 2 and 3. In this style of representation, we can easily identify true data
hazards, as they are the only ones whose dependency lines go back in time. Instruction 2 reads R1 in cycle 3 and
instruction 3 reads R1 in cycle 4.
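
The cycle arithmetic behind this conclusion can be checked with a short sketch. It assumes a five-stage pipeline with one instruction fetched per cycle and no stalls, registers read in ID and written in WB, and the split-cycle register file described above; the helper names are illustrative.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def stage_cycle(instruction_number, stage):
    """Clock cycle (1-based) in which an instruction occupies a stage,
    assuming one instruction is fetched per cycle and nothing stalls."""
    return instruction_number + STAGES.index(stage)


def reads_stale_value(producer, consumer):
    """True if the consumer reads the register file before the producer writes it.

    A read in the same cycle as the write is safe, because the write is
    assumed to happen in the first half of the cycle and the read in the second.
    """
    return stage_cycle(consumer, "ID") < stage_cycle(producer, "WB")


# Instruction 1 is the ADD; instructions 2 to 5 all read R1.
hazards = [n for n in range(2, 6) if reads_stale_value(1, n)]
print(hazards)
```

The ADD writes R1 in cycle 5, so only the reads in cycles 3 and 4 (instructions 2 and 3) see the old value.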



Solution?
• The primary solution is based on the observation that we don’t need to wait for the instruction to complete
before trying to resolve the data hazard.
• If we could somehow bypass the writeback and register read stages when needed, we could eliminate
these data hazards.
• For the code sequence, as soon as the ALU creates the sum for the ADD, we can supply it as an input for
the SUB.
• Fortunately, the ADD instruction calculates the new value in cycle 3. If we simply forward the data as soon as it
is calculated, we will have it in time for the subsequent instructions to execute.

• Essentially, we need to pass the ALU output from the ADD directly to the SUB and AND instructions, without going
through the register file.
• It’s OK if we read the wrong values from the register file, but we need to make sure the right values are used as
inputs to the ALU!



Forwarding logic

[Figure: pipeline diagram of the ADD and the following instructions, each passing through IM, REG, ALU, DM, REG.]

The ADD instruction produces its result in its ALU (EX) stage, during cycle 3.
The SUB and AND need the new value of R1 in their EX stages, during clock cycles 4-5.



Minimizing Data hazard Stalls by Forwarding.

If the result can be moved from where the ADD produces it, the EX/MEM
register, to where the SUB needs it, the ALU input latches, then the stall
can be avoided.

The ALU result from the EX/MEM register is always fed back to the ALU input latches.

The forwarding logic detects whether the previous ALU operation has written the register
corresponding to a source of the current ALU operation. If so, the control logic
selects the forwarded result as the ALU input rather than the value read from the register
file.

Results are forwarded not only from the immediately previous instruction, but possibly
also from an instruction that started two clock cycles earlier, whose result sits in the MEM/WB register.



Forwarding hardware outline

• If there is no hazard, the ALU’s operands will come from the
register file, just like before.
• If there is a hazard, the operands will come from either the
EX/MEM or MEM/WB pipeline registers instead.
• If we can take the inputs to the ALU from any pipeline register
rather than just ID/EX, then we can forward the proper data.
• The ALU sources will be selected by two new multiplexers, with
control signals named ForwardA and ForwardB.
• By adding wider multiplexors to the input of the ALU, and with
the proper controls, we can run the pipeline at full speed in the
presence of these data dependences.



Wider MUXs



Hazard Detection
So how can the hardware determine if a hazard exists?
• We have a hazard only if one of the source registers, ID/EX.RegisterRs or ID/EX.RegisterRt,
depends on a destination register, EX/MEM.RegisterRd or MEM/WB.RegisterRd.
• An EX/MEM hazard occurs between the instruction currently in its EX stage and the previous
instruction if:
• The previous instruction will write to the register file, and
• The destination is one of the ALU source registers in the EX stage.
Example:
ADD R1, R2, R3
SUB R4, R1, R5

Hazard conditions: Let’s specify these conditions. To specify the name of a particular field in a particular
pipeline register, we use the following notation:
PipelineRegister.FieldName
For example, the number of the register corresponding to Read Data 2 from the register file in the ID/EX
register is identifiable as ID/EX.RegisterRt.

1a. EX/MEM.RegisterRd = ID/EX.RegisterRs
1b. EX/MEM.RegisterRd = ID/EX.RegisterRt

The first hazard is on register R1, between the result of ADD R1, R2, R3 and the first read operand of
SUB R4, R1, R5. This hazard can be detected when the SUB instruction is in the EX stage and the prior (ADD)
instruction is in the MEM stage, so this is hazard 1a:
EX/MEM.RegisterRd = ID/EX.RegisterRs = R1



• Because some instructions do not write registers, this policy of always forwarding is
inaccurate; sometimes it would forward when it shouldn’t.
• One solution is simply to check whether the RegWrite signal will be active: examining
the WB control field of the pipeline register during the EX and MEM stages
determines whether RegWrite is asserted.
• Whether to forward also depends on the destination register:
– if the destination register of the earlier instruction is $0, there is no
need to forward the value ($0 is always 0 and is never overwritten).

So we have to add EX/MEM.RegisterRd ≠ 0 to the first hazard condition (1a and 1b).



Data Hazard: Detection and Forwarding
Forwarding unit determines multiplexor control according to the
following rules:

1. EX hazard
if ( EX/MEM.RegWrite // if there is a write…
and ( EX/MEM.RegisterRd ≠ 0 ) // to a non-$0 register…
and ( EX/MEM.RegisterRd = ID/EX.RegisterRs ) ) // which matches, then…
ForwardA = 10

if ( EX/MEM.RegWrite // if there is a write…
and ( EX/MEM.RegisterRd ≠ 0 ) // to a non-$0 register…
and ( EX/MEM.RegisterRd = ID/EX.RegisterRt ) ) // which matches, then…
ForwardB = 10



• A MEM/WB hazard may occur between an instruction in the EX stage and the instruction from two cycles
ago.

2a. MEM/WB.RegisterRd = ID/EX.RegisterRs
2b. MEM/WB.RegisterRd = ID/EX.RegisterRt

The ADD-AND pair is a type 2 hazard (here 2b, since R1 is the second source of the AND):
ADD R1, R2, R3
SUB R4, R1, R5
AND R6, R5, R1
MEM/WB.RegisterRd = ID/EX.RegisterRt = R1

As for the EX hazard, add the MEM/WB.RegisterRd ≠ 0 condition as well:

MEM hazard:
if ( MEM/WB.RegWrite
and ( MEM/WB.RegisterRd ≠ 0 )
and ( MEM/WB.RegisterRd = ID/EX.RegisterRs ) )
ForwardA = 01

if ( MEM/WB.RegWrite
and ( MEM/WB.RegisterRd ≠ 0 )
and ( MEM/WB.RegisterRd = ID/EX.RegisterRt ) )
ForwardB = 01



Complication!

• When summing a vector of numbers in a single register, a sequence
of instructions will all read and write the same register:
• add $1,$1,$2
• add $1,$1,$3
• add $1,$1,$4
• ...

• There is a potential data hazard between the result of the instruction in the
WB stage, the result of the instruction in the MEM stage, and the
source operand of the instruction in the ALU stage.
• There is a conflict between the result of the WB-stage instruction and that of
the MEM-stage instruction: which should be forwarded?
• In this case, the result is forwarded from the MEM stage, because the
result in the MEM stage is the more recent one.



Data Hazard: Detection and Forwarding
Forward from MEM/WB only if the EX hazard condition is not true. Thus, the control for the MEM hazard becomes
(the addition is the "and not" clause):

MEM hazard
if ( MEM/WB.RegWrite // if there is a write…
and ( MEM/WB.RegisterRd ≠ 0 ) // to a non-$0 register…
and not ( EX/MEM.RegWrite and ( EX/MEM.RegisterRd ≠ 0 )
and ( EX/MEM.RegisterRd = ID/EX.RegisterRs ) ) // and not already a register match with earlier pipeline register…
and ( MEM/WB.RegisterRd = ID/EX.RegisterRs ) ) // but match with later pipeline register, then…
ForwardA = 01

if ( MEM/WB.RegWrite // if there is a write…
and ( MEM/WB.RegisterRd ≠ 0 ) // to a non-$0 register…
and not ( EX/MEM.RegWrite and ( EX/MEM.RegisterRd ≠ 0 )
and ( EX/MEM.RegisterRd = ID/EX.RegisterRt ) ) // and not already a register match with earlier pipeline register…
and ( MEM/WB.RegisterRd = ID/EX.RegisterRt ) ) // but match with later pipeline register, then…
ForwardB = 01
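
The EX and MEM hazard rules can be combined into one small function. This is a sketch rather than the hardware itself: register numbers are plain integers, the RegWrite signals are booleans, and the function and parameter names are illustrative. Checking the EX hazard first gives the same result as the "and not" clause in the MEM hazard rule.

```python
def forwarding_unit(id_ex_rs, id_ex_rt,
                    ex_mem_reg_write, ex_mem_rd,
                    mem_wb_reg_write, mem_wb_rd):
    """Return (ForwardA, ForwardB) as two-bit strings."""

    def select(source):
        ex_hazard = ex_mem_reg_write and ex_mem_rd != 0 and ex_mem_rd == source
        mem_hazard = mem_wb_reg_write and mem_wb_rd != 0 and mem_wb_rd == source
        if ex_hazard:    # the more recent result wins
            return "10"
        if mem_hazard:   # reached only when there is no EX hazard
            return "01"
        return "00"      # no hazard: use the value read from the register file

    return select(id_ex_rs), select(id_ex_rt)


# add $1,$1,$4 in EX while both earlier adds (destination $1) are in MEM and WB.
print(forwarding_unit(1, 4, True, 1, True, 1))
```

For the vector-sum sequence, the first operand is taken from EX/MEM (ForwardA = 10) even though MEM/WB also matches, and the second operand comes from the register file (ForwardB = 00).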



Forwarding Hardware: Multiplexor Control
Mux control      Source    Explanation
ForwardA = 00    ID/EX     The first ALU operand comes from the register file
ForwardA = 01    MEM/WB    The first ALU operand is forwarded from data memory or an earlier ALU result
ForwardA = 10    EX/MEM    The first ALU operand is forwarded from the prior ALU result

ForwardB = 00    ID/EX     The second ALU operand comes from the register file
ForwardB = 01    MEM/WB    The second ALU operand is forwarded from data memory or an earlier ALU result
ForwardB = 10    EX/MEM    The second ALU operand is forwarded from the prior ALU result
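
The table amounts to a three-way selection at each ALU input. A minimal sketch of one such multiplexer follows, using the R1 values from the earlier ADD example (10 in the register file, −20 newly computed); the function name and argument order are assumptions made for illustration.

```python
def alu_operand(forward, id_ex_value, ex_mem_value, mem_wb_value):
    """Select one ALU operand according to a ForwardA or ForwardB control value."""
    sources = {
        "00": id_ex_value,   # value read from the register file
        "01": mem_wb_value,  # data memory or an earlier ALU result
        "10": ex_mem_value,  # prior ALU result
    }
    return sources[forward]


# SUB R4, R1, R5 right after ADD R1, R2, R3: the register file still holds 10,
# while the new value -20 sits in EX/MEM.
print(alu_operand("10", id_ex_value=10, ex_mem_value=-20, mem_wb_value=0))
```

With ForwardA = 10 the SUB receives −20, the value the programmer intended, even though the register read returned 10.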



Note: this diagram does not include forwarding of a store value to a store instruction.



Forwarding Hardware with Control
Called forwarding unit, not hazard detection unit, because once data is forwarded there is no hazard!

[Figure: datapath with forwarding hardware and control wires. The forwarding unit receives the Rs and Rt register
numbers together with EX/MEM.RegisterRd and MEM/WB.RegisterRd, and controls the multiplexers at the ALU inputs.
Certain details, e.g., branching hardware, are omitted to simplify the drawing.]

Note: so far we have only handled forwarding to R-type instructions.



Can’t always avoid stalls by forwarding
– If value not computed when needed
– Can’t forward backward in time!

LW R2, 20(R1)

AND R4, R2, R5

OR R8, R2, R6

ADD R9, R8, R2

SLT R1, R6, R7



• The LW instruction does not have the data until the end of clock
cycle 4 (its MEM stage), while the AND instruction needs the data by
the beginning of that same cycle, when it enters its EX stage.

• For OR, the loaded value can be forwarded from the MEM/WB
pipeline register.

• The ADD instruction receives the value through the register file.



For the AND instruction the result arrives too late, so we need to
add hardware called a pipeline interlock to preserve the
correct execution pattern.

The interlock detects a hazard and stalls the pipeline until
the hazard is cleared.

The interlock stalls the pipeline, beginning with the instruction that
wants to use the data, until the source instruction produces it.



Hazard Detection Logic to Stall
The hazard detection unit implements the following check to decide whether to stall:

if ( ID/EX.MemRead // if the instruction in the EX stage is a load…
and ( ( ID/EX.RegisterRt = IF/ID.RegisterRs ) // and the destination register
or ( ID/EX.RegisterRt = IF/ID.RegisterRt ) ) ) // matches either source register
// of the instruction in the ID stage, then…
stall the pipeline
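
The load-use check can be written as a one-line predicate. This is a sketch with register numbers as integers and MemRead as a boolean; the name must_stall is illustrative.

```python
def must_stall(id_ex_mem_read, id_ex_rt, if_id_rs, if_id_rt):
    """True if the instruction in EX is a load whose destination (Rt) is a
    source register of the instruction in ID."""
    return bool(id_ex_mem_read and (id_ex_rt == if_id_rs or id_ex_rt == if_id_rt))


# LW R2, 20(R1) is in EX while AND R4, R2, R5 is in ID.
print(must_stall(True, 2, 2, 5))
# One cycle later the AND (not a load) is in EX while OR R8, R2, R6 is in ID.
print(must_stall(False, 4, 2, 6))
```

Only the first case stalls; once the load has moved on, the remaining dependences are handled by forwarding.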



Mechanics of Stalling
If the stall check succeeds, the pipeline needs to stall for only 1 clock cycle after the load, since
after that the forwarding unit can resolve the dependency.
What the hardware does to stall the pipeline by 1 cycle:
– It does not let the IF/ID register change (disable write!). This causes the instruction in
the ID stage to repeat, i.e., stall.
– The instruction just behind, in the IF stage, must therefore be stalled as well, so the
hardware does not let the PC change (disable write!). This causes the instruction in
the IF stage to repeat, i.e., stall.
– It changes all the EX, MEM and WB control fields in the ID/EX pipeline register to 0, so
the instruction just behind the load effectively becomes a nop. A bubble is said to have
been inserted into the pipeline.
– In this example, the hazard forces the AND and OR instructions to repeat in clock cycle 4
what they did in clock cycle 3: AND reads registers and decodes, and OR is re-fetched
from instruction memory. Such repeated work is what a stall looks like, but its effect is to
stretch the time of the AND and OR instructions and delay the fetch of the ADD
instruction.
• Note that we cannot turn the whole instruction into a nop by zeroing all the bits of the
instruction itself (recall nop = 00…0, 32 bits), because it has already been decoded and its
control signals generated. The instruction bits stay the same, but the control lines are
set to zero.
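
The three actions of a stall can be illustrated with a toy model of the pipeline front end. Everything here is an assumption made for illustration: the FrontEnd class, the representation of control fields as a dictionary, the control values used in the example, and the PC advancing by 4 each cycle.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class FrontEnd:
    pc: int                 # address of the instruction being fetched
    if_id_instruction: str  # instruction held in the IF/ID register
    id_ex_control: dict     # EX, MEM and WB control fields entering ID/EX


NOP_CONTROL = {"EX": 0, "MEM": 0, "WB": 0}


def clock_edge(state, stall, fetched_instruction, decoded_control):
    """Advance the front end by one clock cycle."""
    if stall:
        # PCWrite and IF/IDWrite are disabled, so PC and IF/ID keep their values,
        # and zeroed control fields enter ID/EX: this is the bubble.
        return replace(state, id_ex_control=dict(NOP_CONTROL))
    return FrontEnd(pc=state.pc + 4,
                    if_id_instruction=fetched_instruction,
                    id_ex_control=decoded_control)


# AND is in ID behind the load, and OR is being fetched from address 8.
before = FrontEnd(pc=8, if_id_instruction="AND R4, R2, R5",
                  id_ex_control={"EX": 1, "MEM": 1, "WB": 1})
stalled = clock_edge(before, True, "OR R8, R2, R6", {"EX": 1, "MEM": 0, "WB": 1})
resumed = clock_edge(stalled, False, "OR R8, R2, R6", {"EX": 1, "MEM": 0, "WB": 1})
print(stalled)
print(resumed)
```

During the stalled cycle the PC and IF/ID contents are unchanged and only the control fields are cleared; the following cycle proceeds normally.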



Hazard Detection Unit

[Figure: datapath with forwarding hardware, the hazard detection unit and control wires. The hazard detection unit
takes ID/EX.MemRead, ID/EX.RegisterRt, IF/ID.RegisterRs and IF/ID.RegisterRt as inputs and drives PCWrite, IF/IDWrite
and the multiplexer that zeroes the control fields. Certain details, e.g., branching hardware, are omitted to simplify
the drawing.]
• Control or Branch Hazard: occurs when the proper instruction
cannot execute in the proper pipeline clock cycle
because the instruction that was fetched is not the one
that is needed; that is, the flow of instruction addresses
is not what the pipeline expected.
• It arises from the need to make a decision based on the
result of a control-flow instruction while other instructions in the
pipeline are executing, because we must begin fetching the
instruction following the branch on the very next clock
cycle.



Control Dependence

• Question: What should the fetch-PC be in the next clock
cycle?
• If the instruction that is fetched is a control-flow instruction:
• How do we determine the next fetch PC?
• In fact, how do we determine whether or not the fetched
instruction is a control-flow instruction to begin with because
we have not even started decoding the instruction?
• It is critical to keep the pipeline full with correct sequence of
dynamic instructions.
• How do we guess the next fetch address?
• When do we come to know (in which pipeline stage) of the
branch outcome and its target address?

Control Hazards
Control hazard (branch hazard): When you need to make a decision based
on the result of a control flow instruction still executing in pipeline

beq $s0, $s1, label

• When a branch is executed, it may or may not change the PC (program
counter) to something other than its current value plus 4.
• If a branch changes the PC to its target address, it is a taken branch; if it
does not, it is not taken.
• If instruction i is a taken branch, then the PC is normally not changed
until the end of the MEM stage (after the completion of the address
calculation and comparison).



• In a pipelined architecture, we must begin fetching the instruction following the branch on the very
next clock cycle, yet in our design the decision about whether to branch doesn’t occur until
the MEM pipeline stage.
• Moreover, in the IF stage the pipeline cannot possibly know what the next instruction should
be, since it has only just received the branch instruction from memory!
• Let’s assume that we put in enough extra hardware so that we can test registers, calculate the
branch address, and update the PC during the second stage of the pipeline (ID stage).
• Even with this extra hardware, a pipeline executing conditional branches would still incur
one stall. The lw instruction, executed if the branch fails, is stalled one extra 2 ns
clock cycle before starting.
• Stalling immediately after we fetch a branch, and waiting until the pipeline determines
the outcome of the branch and knows which instruction address to fetch from, is not efficient: it
slows the pipeline significantly!



Control (or Branch) Hazards
• Another solution: predict the branch outcome.
– e.g., always predict branch-not-taken – continue with
next sequential instructions. This option does not slow
down the pipeline when you are correct and pipeline
works at full speed.
– if the prediction is wrong have to flush the pipeline
following the branch – discard instructions already
fetched or decoded (stalls!!) – and continue execution at
the branch target.
– This is done by setting their control lines to 0 when you
reach the EX stage. The next significant instruction will
be the branch target.
Predicting Branch-not-taken: Misprediction delay

[Figure: pipeline diagram over clock cycles CC 1 to CC 9 for the following instructions, listed in program execution
order with their addresses.]

40 beq $1, $3, 7
44 and $12, $2, $5
48 or $13, $6, $2
52 add $14, $2, $2
72 lw $4, 50($7)

The outcome of branch taken (prediction wrong) is decided only when beq is in the MEM stage, so the following
three sequential instructions already in the pipeline in their EX, ID, and IF stages have to be flushed and execution
resumes at lw
• The instructions fetched after the branch
instruction are ignored and the fetch is restarted
once the branch target is known.

• Three cycles wasted for every branch is a significant loss.

• With a 30% branch frequency and an ideal CPI of 1,
the machine with branch stalls achieves only about
half the ideal speedup from pipelining.
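
The "about half" figure follows from simple arithmetic. The sketch below assumes that every branch pays the full three-cycle penalty and that branches are the only source of stalls; the function name is illustrative.

```python
def effective_cpi(ideal_cpi, branch_frequency, branch_penalty_cycles):
    """CPI when each branch adds a fixed number of stall cycles."""
    return ideal_cpi + branch_frequency * branch_penalty_cycles


cpi = effective_cpi(ideal_cpi=1.0, branch_frequency=0.30, branch_penalty_cycles=3)
fraction_of_ideal = 1.0 / cpi  # achieved speedup relative to the ideal CPI of 1

print(round(cpi, 2), round(fraction_of_ideal, 2))
```

The effective CPI is 1 + 0.30 × 3 = 1.9, so the pipeline delivers roughly 1/1.9 ≈ 0.53 of its ideal speedup.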



• As we said earlier, one solution is to reduce the
potential delay, or branch resolution latency, of the branch
instruction. If we move the decision to an earlier stage,
we decrease the number of instructions we potentially
need to flush.
• Right now, because we only know the decision in the 4th
stage, we have to gamble on three instructions which might
have to be flushed if the branch is taken.
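
The relationship between the resolution stage and the number of instructions at risk can be stated as a small sketch. It assumes the five-stage pipeline used throughout, with one instruction fetched per cycle; the function name is illustrative.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def instructions_flushed(resolve_stage):
    """Number of younger instructions already fetched when a branch that
    resolves in the given stage turns out to be taken."""
    return STAGES.index(resolve_stage)


print(instructions_flushed("MEM"))  # decision in the 4th stage
print(instructions_flushed("ID"))   # decision moved to the 2nd stage
```

Resolving in MEM puts three instructions at risk, while resolving in ID leaves only one, which matches the single stall described for the design with extra branch hardware in the ID stage.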



To reduce the number of clock cycles in a branch stall, we need to:

- Find out whether the branch is taken or not taken earlier in the pipeline.

- Compute the taken PC (the address of the branch target) earlier.

Both steps should be taken as early as possible in the pipeline.

In MIPS, branches such as BEQZ require testing a register for equality to zero.

To complete this decision by the end of ID, move the zero test into
that cycle. Both PCs (taken and untaken) must be computed early.
