
Pipeline - Instr - Super Branch

This document discusses techniques for handling branches in pipelined processors. It describes three types of data hazards - RAW, WAW, and WAR. It also discusses instruction pipelining, the Tomasulo algorithm for out-of-order execution, branch prediction strategies like 1-bit and 2-bit prediction, and delayed branch scheduling. The key advantages of Tomasulo's scheme are distributed hazard detection logic and elimination of stalls for WAW and WAR hazards. Dynamic branch prediction aims to reduce penalties from mispredicted branches.

Uploaded by

SHEENA Y

COMPUTER SYSTEM ARCHITECTURE - CS 405

Module - 5 Part - 1
Instruction Pipeline Design
Data Hazard Classification - RAW
• Three types of data hazards
• Instruction i comes before instruction j
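– RAW : Read After Write, R(i) ∩ D(j) ≠ ∅
(flow dependence), where D(x) is the domain (set of operands read) and
R(x) is the range (set of operands written) of instruction x
• j tries to read a source before i writes it, so j incorrectly
gets the old value. Solve via forwarding.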
Data Hazard Classification - WAW
– WAW : Write After Write, R(i) ∩ R(j) ≠ ∅
(output dependence)
• j tries to write an operand before it is written by i, so
the writes end up being performed in the wrong order
• Only occurs if writes can happen in more than one pipeline stage
– Not a problem with single-cycle integer
instructions
Data Hazard Classification - WAR
• WAR : Write After Read, D(i) ∩ R(j) ≠ ∅
• (anti dependence)
– j tries to write a destination before it is read by i, so i incorrectly gets
the new value
– For this to happen we need a pipeline in which some instructions write results
early in the pipeline and other instructions read a source later in the pipeline

• RAR : Read After Read
– Is this a hazard? No: neither instruction writes the operand, so the order of the two reads does not matter.
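The three set conditions above can be checked mechanically. The following is a minimal sketch, assuming each instruction is described only by the set of registers it reads (its domain) and the set it writes (its range); the `Instr` class and the `hazards` function are illustrative names, not part of the original slides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    text: str
    reads: frozenset   # D(x): operands read (domain)
    writes: frozenset  # R(x): operands written (range)


def hazards(i: Instr, j: Instr) -> set:
    """Data hazards of the later instruction j with respect to the earlier i."""
    found = set()
    if i.writes & j.reads:    # R(i) and D(j) overlap
        found.add("RAW")
    if i.writes & j.writes:   # R(i) and R(j) overlap
        found.add("WAW")
    if i.reads & j.writes:    # D(i) and R(j) overlap
        found.add("WAR")
    return found


i = Instr("ADD R1, R2, R3", frozenset({"R2", "R3"}), frozenset({"R1"}))
j = Instr("SUB R2, R1, R4", frozenset({"R1", "R4"}), frozenset({"R2"}))
print(sorted(hazards(i, j)))  # ['RAW', 'WAR']
```

Here j reads R1, which i writes (RAW), and j writes R2, which i reads (WAR). Two instructions that only share a read operand produce an empty set, which matches the RAR case.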
Instruction / Code scheduling
• Code scheduling
– To reduce pipeline stalls
– To increase ILP (instruction level parallelism)
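As a rough illustration of how scheduling reduces stalls, the sketch below counts load-use stalls before and after reordering the code for a = b + c; d = e - f. It assumes a simple pipeline with forwarding in which an instruction stalls one cycle only when it immediately follows the load that produces one of its sources; the tuple encoding and the `load_use_stalls` helper are hypothetical.

```python
def load_use_stalls(program):
    """Count one-cycle stalls where an instruction directly follows
    the load that produces one of its source registers."""
    stalls = 0
    for prev, cur in zip(program, program[1:]):
        prev_op, prev_dest, _ = prev
        if prev_op == "LW" and prev_dest in cur[2]:
            stalls += 1
    return stalls


# Each instruction is (opcode, destination, sources); R0 is the base register.
original = [
    ("LW", "R1", ("R0",)),        # load b
    ("LW", "R2", ("R0",)),        # load c
    ("ADD", "R3", ("R1", "R2")),  # a = b + c, uses R2 right after its load
    ("SW", None, ("R3", "R0")),   # store a
    ("LW", "R4", ("R0",)),        # load e
    ("LW", "R5", ("R0",)),        # load f
    ("SUB", "R6", ("R4", "R5")),  # d = e - f, uses R5 right after its load
    ("SW", None, ("R6", "R0")),   # store d
]

# Same instructions, reordered so no consumer directly follows its load.
scheduled = [original[k] for k in (0, 1, 4, 2, 5, 3, 6, 7)]

print(load_use_stalls(original))   # 2
print(load_use_stalls(scheduled))  # 0
```

The reordering keeps every producer ahead of its consumer, so the program computes the same values while the independent loads fill what would otherwise be stall cycles.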
Tomasulo Organization
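[Figure: Tomasulo organization. The FP Op Queue and the FP Registers feed the reservation stations (Add1 to Add3 in front of the FP adders, Mult1 and Mult2 in front of the FP multipliers). Load Buffers (Load1 to Load6) receive data from memory and Store Buffers send data to memory. All results are broadcast on the Common Data Bus (CDB).]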


Tomasulo Algorithm
• Control & buffers distributed with Function Units
(FU)
– FU buffers are called “reservation stations”; they hold pending operands
• Registers in instructions are replaced by values or
pointers to reservation stations (RS)
– a form of register renaming
– avoids WAR, WAW hazards
– More reservation stations than registers, so the hardware can do optimizations
compilers can’t
• Results go to FUs from RSs, not through registers, over the
Common Data Bus, which broadcasts results to all FUs
• Loads and stores are treated as FUs with reservation stations
• Instructions can go past branches
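The renaming idea can be sketched in a few lines. The example below is a simplification, assuming every issued instruction gets a fresh tag (standing in for a reservation station name) and that sources with no pending producer are simply read from the register file; the `rename` function and the `RSn` tag names are illustrative only.

```python
from itertools import count


def rename(program):
    """Replace each destination with a fresh tag and each source with the
    tag of its most recent producer (or leave it as a register name)."""
    latest = {}      # architectural register -> tag of its latest producer
    fresh = count(1)
    renamed = []
    for op, dest, srcs in program:
        new_srcs = tuple(latest.get(s, s) for s in srcs)
        tag = f"RS{next(fresh)}"
        latest[dest] = tag
        renamed.append((op, tag, new_srcs))
    return renamed


program = [
    ("DIV", "F0", ("F2", "F4")),
    ("ADD", "F6", ("F0", "F8")),    # RAW on F0
    ("SUB", "F8", ("F10", "F14")),  # WAR on F8 with ADD
    ("MUL", "F6", ("F10", "F8")),   # WAW on F6 with ADD, RAW on F8
]

for instr in rename(program):
    print(instr)
```

After renaming, every destination is unique, so the WAW on F6 and the WAR on F8 disappear; only the true (RAW) dependences remain, expressed as tags (ADD waits on RS1, MUL waits on RS3).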
How Tomasulo overlaps loop iterations
• Register renaming
– Multiple iterations use different physical destinations for registers (dynamic
loop unrolling).

• Reservation stations
– Instructions advance past integer control flow operations
– Buffer old values of registers, avoiding the WAR stall that occurs with a scoreboard.
Tomasulo
• For IBM 360/91 (before caches!)
• Goal: High Performance without special compilers
• Small number of floating point registers (4 in 360)
prevented interesting compiler scheduling of
operations
– Tomasulo: how to get more effective registers? Renaming in hardware!
• Same idea used today
– HP 8000, MIPS 10000, Core xx, Power 4, 5, 6, 7…
Tomasulo’s scheme offers 2 major advantages
(1) The distribution of the hazard detection logic
– distributed reservation stations and the CDB
– If multiple instructions are waiting on a single result, and each instruction already has its other
operand, then the instructions can be released simultaneously by the broadcast on the
CDB
– If a centralized register file were used, the units would have to read their
results from the registers when register buses are available.
(2) The elimination of stalls for WAW and WAR hazards
Branch handling techniques
• The action of fetching a non-sequential or remote instruction after a branch
instruction is called branch taken
• The instruction to be executed after a branch taken is called the branch target
• The number of pipeline cycles wasted between a branch taken and the fetching of
its branch target is called the delay slot, b
0 <= b <= k-1, where k is the number of pipeline stages

When a branch is taken, all instructions after the branch in the pipeline are flushed, losing a number
of useful cycles.
p = probability of a conditional branch instruction in the instruction stream (typically 20%)
q = probability that a conditional branch is successfully executed, i.e. taken (typically 60%)
penalty = pqnbτ for a stream of n instructions with clock period τ (each taken branch costs b extra cycles, i.e. bτ)

If p = 0.2, q = 0.6 and b = k-1 = 7, then pqb = 0.84 and
pipeline performance can be degraded by pqb/(pqb + 1) = 46% with branching when the
instruction stream is sufficiently long.
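The 46% figure can be checked numerically. The sketch below assumes the usual linear-pipeline execution time of (k + n - 1)τ for n instructions, adds the branch penalty pqnbτ from above, and compares the resulting throughput with the ideal one instruction per cycle; the function names are illustrative.

```python
def effective_time(n, k, p, q, b, tau=1.0):
    """Time for n instructions on a k-stage pipeline, including branch penalty."""
    return (k + n - 1) * tau + p * q * n * b * tau


def degradation(p, q, b):
    """Fractional throughput loss for a very long stream: pqb / (pqb + 1)."""
    pqb = p * q * b
    return pqb / (pqb + 1)


p, q, k = 0.2, 0.6, 8
b = k - 1  # worst-case delay slot

print(round(degradation(p, q, b), 2))  # 0.46

# Finite-stream check: loss relative to the peak rate of one instruction per tau.
n = 1_000_000
finite_loss = 1 - n / effective_time(n, k, p, q, b)
print(round(finite_loss, 2))           # 0.46
```

With p = q = 0 the loss is zero, which is the branch-free pipeline; the loss grows with the delay slot b, which is why deeper pipelines need better branch handling.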
Branch Handling Techniques
Dynamic Hardware Prediction

Dynamic branch prediction is the ability of the hardware to make
an educated guess about which way a branch will go: will the
branch be taken or not?
The hardware can look for clues based on the instructions, or it
can use past history.
In the simple 5-stage MIPS pipeline, predict-not-taken is a simple
prediction strategy. This is acceptable since the misprediction penalty
is small.
If the penalty is large (as in many deeply pipelined machines or
superscalar processors), we cannot afford to make frequent
incorrect predictions.
The predictions have to be more sophisticated.
Some popular schemes are:
– 1-bit / 2-bit prediction using a branch prediction buffer or a
branch target buffer
Branch Prediction

The buffer is indexed by the low-order bits of the address of the branch
instruction.
The buffer is read in the “D” (decode) phase. The penalty for a wrong prediction depends on
when the PC is calculated.
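A minimal sketch of this indexing, assuming 4-byte word-aligned instructions (so the two lowest address bits carry no information) and a table of 1024 entries; both parameters and the `bht_index` name are assumptions made for illustration.

```python
def bht_index(pc: int, entries: int = 1024) -> int:
    """Index a branch prediction buffer with the low-order PC bits."""
    return (pc >> 2) % entries  # drop the byte offset, keep the low bits


branch_a = 0x0040_0010
branch_b = branch_a + 4 * 1024  # exactly one table length further on

print(bht_index(branch_a))  # 4
print(bht_index(branch_b))  # 4
```

Because only a few address bits are used and no tag is stored, two different branches can map to the same entry and share (and disturb) each other's prediction.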
Dynamic Branch Prediction
• Performance = ƒ(accuracy, cost of misprediction)
• Branch History Table (BHT): lower bits of the PC address index a table of 1-bit values
– Each bit says whether or not the branch was taken last time
• Problem: in a loop, a 1-bit BHT will cause two mispredictions:
– At the end of the loop, when the branch exits instead of looping as before
– The first time through the loop on the next execution of the code, when it predicts
exit instead of looping

[Figure: 1-bit branch history table. Bits 13 - 2 of the branch address index a table of prediction entries numbered 0 to 1023.]
Dynamic Branch Prediction
• A 1-bit scheme for dynamic branch prediction
for (i = 10; i > 0; i = i - 1)
x := x + 1

In a 1-bit BHT, a history bit is associated with the branch instruction.
The bit is changed as follows: it is set to 1 when the branch is taken and reset to 0 when it is not taken.
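A minimal simulation of the 1-bit scheme on this loop, assuming the loop branch sits at the bottom of the loop body, so each execution of the 10-iteration loop produces nine taken outcomes followed by one not-taken outcome; the function name and the initial bit value are illustrative choices.

```python
def simulate_1bit(outcomes, bit=True):
    """Count mispredictions of a 1-bit predictor.
    bit == True means 'predict taken'; it always records the last outcome."""
    misses = 0
    for taken in outcomes:
        if bit != taken:
            misses += 1
        bit = taken  # remember what the branch did this time
    return misses


one_loop = [True] * 9 + [False]  # 9 taken, then loop exit

print(simulate_1bit(one_loop))      # 1
print(simulate_1bit(one_loop * 3))  # 5
```

The first execution mispredicts only the exit, because the bit starts at "taken". Every later execution mispredicts twice: once on re-entry, since the bit still says "not taken" from the previous exit, and once at the exit itself.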
Dynamic Branch Prediction
• Solution: a 2-bit scheme in which the prediction changes only after
two mispredictions in a row
– Only wrong once for branches that go in an unusual direction
once (e.g. a loop exit)
Delayed Branches

Limitations on delayed-branch scheduling:
- restrictions on the instructions that can be scheduled into the
delay slots
- the ability to predict at compile time whether a branch is likely
to be taken or not.
