10 Branchprediction

The document discusses branch prediction in computer organization, focusing on control hazards and speculative execution to improve performance. It outlines the steps involved in branch prediction, including identifying branches, predicting their direction, and determining their target addresses using structures like the Branch Target Buffer (BTB). Various prediction strategies are explored, including dynamic branch prediction, bimodal predictors, and hybrid predictors that combine different approaches to enhance accuracy.

Uploaded by

qyx
Branch Prediction

CIS 5710
Computer Organization and Design
This Unit: Branch Prediction
• Control hazards
• Branch prediction

[Figure: system layers: App, App, App; System software; Mem, CPU, I/O]

CIS 5710 | Prof Joseph Devietti 2


Readings
• P&H
• Chapter 4

Control Dependences and
Branch Prediction



What About Branches?

[Figure: pipelined datapath with PC, +4 adder, Insn Mem, Register File, ALU, and Data Mem; the branch target is computed in the X stage]

• Wait for branch outcome (two-cycle penalty)


• Fetch past branch before outcome is known
• Default: assume “not-taken” (at fetch, can’t tell it’s a branch)



Big Idea: Speculative Execution
• Speculation: take risk on chance of profit
• Speculative execution
• Execute before all parameters known with certainty
• Correct speculation
+ Avoid stall, improve performance
• Incorrect speculation (mis-speculation)
– Must abort/flush/squash incorrect insns
– Must undo incorrect changes (recover pre-speculation
state)
• Control speculation: speculation aimed at
control hazards
• Are these the correct insns to execute next?
Control Speculation Mechanics
• Guess branch target, start fetching at guessed
position
• Doing nothing is implicitly guessing target is the next
sequential PC
• We were already speculating before!
• Can actively guess other targets: dynamic branch
prediction
• Execute branch to verify (check) our guess
• Correct speculation? keep going
• Mis-speculation? Flush mis-speculated insns
• Hopefully haven’t modified permanent state (Regfile,
DMem)
👍Happens naturally in our in-order 5-stage pipeline



Branch Prediction Components
[Figure: pipeline with a branch predictor (BP) alongside I$, regfile, and D$]

• Step #1: is it a branch?


• Easy after decode...
• Step #2: is the branch taken or not taken?
• Direction predictor (applies to conditional branches only)
• Predicts taken/not-taken
• Step #3: if the branch is taken, where does it go?
• Easy after decode…



Branch Prediction Steps
• Which insn’s behavior are we trying to predict?
• Where does PC come from?

[Figure: flowchart. Is insn a branch? No: next PC is PC+4. Yes: T or NT? (direction predictor). Not taken: PC+4. Taken: predicted target (hardware structure: branch target buffer).]


When to Perform Branch Prediction?
• Option #1: During Decode
• Look at instruction opcode to determine branch instructions
• Can calculate next PC from instruction (for PC-relative
branches)
– One cycle “mis-fetch” penalty even if branch predictor is
correct
• we can do better!
                      1  2  3  4  5  6  7  8  9
bne x3,x0,targ        F  D  X  M  W
targ: add x4,x5,x4          F  D  X  M  W

• Option #2: During Fetch?


• How do we do that?



Dynamic Branch Prediction

[Figure: pipelined datapath with a branch predictor (BP) at fetch, predicted-target (TG) pipeline latches, and nops injected into flushed stages]
• Dynamic branch prediction: hw guesses outcome
• Start fetching from guessed address
• Flush on mis-prediction
Identifying Branches



Branch Prediction Components
[Figure: pipeline with a branch predictor (BP) alongside I$, regfile, and D$]

• Step #1: is it a branch?
• Easy after decode; during fetch, this needs a predictor
• Step #2: is the branch taken or not taken?
• Direction predictor (later)
• Step #3: if the branch is taken, where does it go?
• Branch target predictor (BTB)
• Supplies target PC if branch is taken


Branch Target Buffer
• Learn from the past to predict the future
• Record the past in a hardware structure

• Branch target buffer (BTB)


• Record a list of branches we have seen
+ code doesn’t change
• PC indexes table of bits
• each entry is 1 bit: is there a branch here?
• set the bit if we see a branch at that index

[Figure: PC split into [31:10], [9:2], [1:0]; bits [9:2] index the BTB, a table of branch bits; the selected bit answers “is it a branch?”]
BTB Aliasing
• What if two PCs have the same bits 9:2...?
• BTB is just a prediction, processor will still work correctly
• these PCs alias
• Aliasing branches interfere with each other
• In our initial BTB design, we never clear BTB bits…
• If bits 9:2 used to index, there are 256 BTB entries
• A 4MB program has 1M insns
• 4K insns mapping to each BTB entry
• What are the odds that 1 out of 4K insns is a branch?
• BTB will become saturated
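
The saturation argument above can be checked with a small simulation. This is only a sketch under stated assumptions: the names `btb_index` and `BitBTB` are hypothetical, and the synthetic program (a branch at every 7th instruction) is invented for illustration, not taken from the slides.

```python
NUM_ENTRIES = 256  # PC bits [9:2] select one of 2**8 entries


def btb_index(pc):
    """Return PC bits [9:2]; bits [1:0] are always zero for 4-byte insns."""
    return (pc >> 2) & (NUM_ENTRIES - 1)


class BitBTB:
    """Untagged BTB: one 'is there a branch here?' bit per entry, never cleared."""

    def __init__(self):
        self.bits = [False] * NUM_ENTRIES

    def record_branch(self, pc):
        self.bits[btb_index(pc)] = True

    def is_branch(self, pc):
        return self.bits[btb_index(pc)]


# A 4 MB program holds 1M four-byte insns, so 4K PCs share each entry.
insns = (4 * 2**20) // 4
pcs_per_entry = insns // NUM_ENTRIES

# Assumed synthetic program: every 7th insn is a branch.
btb = BitBTB()
for n in range(0, insns, 7):
    btb.record_branch(4 * n)

saturated = all(btb.bits)
# PC 4 is not a branch in this synthetic program, yet the BTB says it is.
false_positive = btb.is_branch(4)
print(pcs_per_entry, saturated, false_positive)
```

Once every bit is set, the table answers "branch" for every PC and carries no information, which is the problem the tags on the next slide address.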



BTB Tags
• BTB entries are too coarse-grained
+ Record only taken branches
• a never-taken branch might as well be a NOP
– useful, but doesn’t help enough
• better idea: tag each BTB entry
• remember some things precisely, rather than everything
imprecisely
• record a subset of actual taken branches
• is_a_branch = (BTB[PC].branch && BTB[PC].tag == PC)
• How large is each tag?
[Figure: PC split into [31:10], [9:2], [1:0]; bits [9:2] index a BTB whose entries each hold a branch bit and a tag; output: is it a branch?]
BTB Tags
• Now that we have tags, branch bits are redundant
• tag comparison achieves the same goal
• let’s get rid of them!

[Figure: same BTB, but each entry holds only a tag; the selected tag is compared (==) with the PC to produce “is it a branch?”]


Branch Direction Prediction



Branch Prediction Components
[Figure: pipeline with a branch predictor (BP) alongside I$, regfile, and D$]

• Step #1: is it a branch?
• Easy after decode; during fetch, this needs a predictor
• Step #2: is the branch taken or not taken?
• Direction predictor
• Step #3: if the branch is taken, where does it go?
• Branch target predictor (BTB)
• Supplies target PC if branch is taken


Branch Direction Prediction
• Learn from past, predict the future
• Record the past in a hardware structure
• Direction predictor (DIRP)
• Map conditional-branch PC to taken/not-taken (T/N) decision
• Individual conditional branches often biased or weakly biased
• 90%+ one way or the other considered biased
• Why? Loop back edges, checking for uncommon conditions
• Bimodal predictor: simplest predictor
• PC indexes Branch History Table of bits (0 = N, 1 = T), no tags
• Essentially: branch will go same way it went last time

• What about aliasing?
• Two PCs with the same lower bits?
• No problem, just a prediction!

[Figure: PC split into [31:10], [9:2], [1:0]; bits [9:2] index the BHT; each entry holds T or NT, and the selected entry is the prediction (taken or not taken)]
Bimodal Branch Predictor
• simplest direction predictor
• PC indexes table of bits (0 = N, 1 = T), no tags
• Essentially: branch will go same way it went last time
• Problem: inner loop branch below
for (i=0;i<100;i++)
  for (j=0;j<3;j++)
    // whatever
– Two “built-in” mis-predictions per execution of the inner loop
– Branch predictor “changes its mind too quickly”

Time  State  Prediction  Outcome  Result?
1     N      N           T        Wrong
2     T      T           T        Correct
3     T      T           T        Correct
4     T      T           N        Wrong
5     N      N           T        Wrong
6     T      T           T        Correct
7     T      T           T        Correct
8     T      T           N        Wrong
9     N      N           T        Wrong
10    T      T           T        Correct
11    T      T           T        Correct
12    T      T           N        Wrong
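
The table above can be reproduced with a few lines of simulation. This is a minimal sketch, assuming a single branch with its own 1-bit entry (no aliasing) that starts in state N; the function name `simulate_one_bit` is hypothetical.

```python
def simulate_one_bit(outcomes, initial="N"):
    """Run a 1-bit predictor over a string of 'T'/'N' branch outcomes.

    Returns one (prediction, outcome, correct) tuple per branch execution.
    """
    state = initial
    trace = []
    for outcome in outcomes:
        prediction = state          # predict whatever happened last time
        trace.append((prediction, outcome, prediction == outcome))
        state = outcome             # remember only the most recent outcome
    return trace


# Inner-loop branch pattern from the slide: taken three times, then not taken.
trace = simulate_one_bit("TTTN" * 3)
results = ["Correct" if ok else "Wrong" for _, _, ok in trace]
mispredictions = results.count("Wrong")

for time, (prediction, outcome, ok) in enumerate(trace, start=1):
    print(time, prediction, outcome, "Correct" if ok else "Wrong")
```

In steady state the predictor is wrong twice per four branches: once on the loop exit and once more when the loop is re-entered.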


Two-Bit Saturating Counters (2bc)
• Two-bit saturating counters (2bc) [Smith 1981]
• Replace each single-bit prediction
• (0,1,2,3) = (N,n,t,T)
• Adds “hysteresis”
• Force predictor to mis-predict twice before “changing its mind”
• One mispredict each loop execution (rather than two)
+ Fixes this pathology (which is not contrived, by the way)
• Can we do even better?

Time  State  Prediction  Outcome  Result?
1     N      N           T        Wrong
2     n      N           T        Wrong
3     t      T           T        Correct
4     T      T           N        Wrong
5     t      T           T        Correct
6     T      T           T        Correct
7     T      T           T        Correct
8     T      T           N        Wrong
9     t      T           T        Correct
10    T      T           T        Correct
11    T      T           T        Correct
12    T      T           N        Wrong
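
A sketch of the counter's update rule, run on the same T,T,T,N pattern as the table. It assumes one private counter for the branch, starting at 0 (strongly not-taken); `simulate_2bc` and `STATES` are hypothetical names.

```python
STATES = "NntT"  # counter values 0..3; values 2 and 3 predict taken


def simulate_2bc(outcomes, counter=0):
    """Run one two-bit saturating counter over a string of 'T'/'N' outcomes.

    Returns (state, prediction, outcome, correct) per branch execution.
    """
    trace = []
    for outcome in outcomes:
        prediction = "T" if counter >= 2 else "N"
        trace.append((STATES[counter], prediction, outcome, prediction == outcome))
        if outcome == "T":
            counter = min(counter + 1, 3)   # saturate at strongly taken
        else:
            counter = max(counter - 1, 0)   # saturate at strongly not-taken
    return trace


trace = simulate_2bc("TTTN" * 3)
states = "".join(state for state, _, _, _ in trace)
wrong_times = [t for t, row in enumerate(trace, start=1) if not row[3]]
print(states, wrong_times)
```

After warm-up the only mis-prediction is the loop exit; the single N outcome moves the counter from T to t, which still predicts taken on re-entry.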


Branches may be correlated
• Consider:
for (i=0; i<1000000; i++) {        // Highly biased
  if (i % 3 == 0) {                // Locally correlated

  }
  if (random() % 2 == 0) {         // Unpredictable

  }
  if (i % 3 == 0) {                // Globally correlated
    …
  }
}


Gshare History-Based Predictor
• Exploits observation that branch outcomes are
correlated
• Maintains recent branch outcomes in Branch
History Register (BHR)
• In addition to BHT of counters (typically 2-bit sat. counters)
• How do we incorporate history into our
predictions?
• Use PC xor BHR to index into BHT. Why?
[Figure: PC and BHR are XORed to index the BHT, which outputs the direction prediction (T/NT)]
Gshare History-based Predictor
• Gshare working example
• assume program has one branch
• BHT: 1-bit DIRP entries; the State column shows the entry selected by the current BHR
• 3-bit BHR: last 3 branch outcomes
• train counter, and update BHR after each branch

Time  State  BHR  Prediction  Outcome  Result?
1     N      NNN  N           T        wrong
2     N      NNT  N           T        wrong
3     N      NTT  N           T        wrong
4     N      TTT  N           N        correct
5     N      TTN  N           T        wrong
6     N      TNT  N           T        wrong
7     T      NTT  T           T        correct
8     N      TTT  N           N        correct
9     T      TTN  T           T        correct
10    T      TNT  T           T        correct
11    T      NTT  T           T        correct
12    T      TTT  N           N        correct
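
The working example can be replayed in code. This is a sketch under the slide's assumptions (one branch, 3-bit BHR, 1-bit BHT entries, everything initially N), plus one of my own: the branch's PC index bits are zero, so PC xor BHR reduces to the BHR. The name `simulate_gshare` is hypothetical.

```python
def simulate_gshare(outcomes, pc=0, history_bits=3):
    """Gshare with 1-bit BHT entries, indexed by (PC bits) xor BHR.

    Returns (bhr_string, prediction, outcome, correct) per branch execution.
    """
    size = 1 << history_bits
    bht = [0] * size              # 1-bit entries: 0 = N, 1 = T
    bhr = 0                       # most recent outcome lives in the low bit
    trace = []
    for outcome in outcomes:
        taken = 1 if outcome == "T" else 0
        index = ((pc >> 2) ^ bhr) & (size - 1)
        prediction = bht[index]
        bits = format(bhr, "0{}b".format(history_bits))
        history = bits.replace("0", "N").replace("1", "T")
        trace.append((history, "NT"[prediction], outcome, prediction == taken))
        bht[index] = taken                       # train the selected entry
        bhr = ((bhr << 1) | taken) & (size - 1)  # shift in the new outcome
    return trace


trace = simulate_gshare("TTTN" * 3)
histories = [row[0] for row in trace]
correct = [row[3] for row in trace]
for time, row in enumerate(trace, start=1):
    print(time, *row)
```

Each of the four history values that occur in steady state is always followed by the same outcome, so once the four entries are trained the loop-exit branch is predicted correctly too.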


Hybrid Predictor
• Hybrid (tournament) predictor [McFarling 1993]
• Attacks correlated predictor BHT capacity problem
• Idea: combine two predictors
• Bimodal predictor for history-independent branches
• Correlated predictor for branches that need history
• Chooser assigns branches to one predictor or the other
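• Branches start in the simple BHT and move to the correlated predictor once they cross a mis-prediction threshold
+ Correlated predictor can be made smaller, handles fewer branches
+ 90–95% accuracy

[Figure: PC indexes a chooser and a bimodal BHT; PC combined with the BHR indexes a second BHT; the chooser selects between the two predictions]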
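
One possible way to realize the chooser, sketched below. The slide does not specify the mechanism, so this assumes a table of two-bit chooser counters that starts out trusting the bimodal side and is nudged toward whichever component was right whenever the two disagree. All class, method, and parameter names are hypothetical.

```python
class Counter2:
    """Two-bit saturating counter; values 2 and 3 count as 'high'."""

    def __init__(self):
        self.value = 0

    def high(self):
        return self.value >= 2

    def update(self, up):
        self.value = min(self.value + 1, 3) if up else max(self.value - 1, 0)


class Hybrid:
    def __init__(self, index_bits=4):
        n = 1 << index_bits
        self.mask = n - 1
        self.bimodal = [Counter2() for _ in range(n)]
        self.gshare = [Counter2() for _ in range(n)]
        self.chooser = [Counter2() for _ in range(n)]  # low = trust bimodal
        self.bhr = 0

    def predict_and_train(self, pc, taken):
        """Predict one branch, then train on its outcome; return True if correct."""
        i = (pc >> 2) & self.mask
        g = ((pc >> 2) ^ self.bhr) & self.mask
        p_bimodal = self.bimodal[i].high()
        p_gshare = self.gshare[g].high()
        prediction = p_gshare if self.chooser[i].high() else p_bimodal
        if p_bimodal != p_gshare:
            # Exactly one component was right: move the chooser toward it.
            self.chooser[i].update(p_gshare == taken)
        self.bimodal[i].update(taken)
        self.gshare[g].update(taken)
        self.bhr = ((self.bhr << 1) | int(taken)) & self.mask
        return prediction == taken


hybrid = Hybrid()
results = [hybrid.predict_and_train(0x40, o == "T") for o in "TTTN" * 10]
print(results.count(False), "mis-predictions in", len(results), "branches")
```

On the T,T,T,N loop pattern the bimodal side keeps missing the loop exit, so the chooser drifts to the history-based side and the mis-predictions stop after warm-up.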


Branch Target Prediction



Branch Prediction Components
[Figure: pipeline with a branch predictor (BP) alongside I$, regfile, and D$]

• Step #1: is it a branch?
• Easy after decode; during fetch, this needs a predictor
• Step #2: is the branch taken or not taken?
• Direction predictor
• Step #3: if the branch is taken, where does it go?
• Branch target predictor (BTB)
• Supplies target PC if branch is taken


Branch Target Buffer, Again
• Branch target buffer (BTB)
• guess the future PC based on past behavior
• “Last time the branch X was taken, it went to address Y”
• “So, if address X is fetched, fetch address Y next”
• Essentially: branch will go to same place it went last time
• PC indexes table of target addresses
• use tags to precisely remember a subset of targets
• What about aliasing?
• Two PCs with the same lower bits?
• No problem, just a prediction!
[Figure: PC split into [31:10], [9:2], [1:0]; bits [9:2] index a BTB whose entries each hold a target and a tag; output: predicted target]


Branch Target Buffer
• BTB predicts which insns are branches, and
their targets
• tag each entry with its corresponding PC
• Update BTB on every taken branch insn, record target PC:
• BTB[PC].tag = PC, BTB[PC].target = target of branch
• All insns access BTB at Fetch in parallel with Imem
• If tag matches, indicates insn at that PC is a branch
• otherwise, assume insn is not a branch
• Predicted PC = (BTB[PC].tag == PC) ? BTB[PC].target : PC+4

[Figure: PC indexes the BTB; the entry’s tag is compared (==) with the PC to select between the entry’s target and PC+4 as the predicted target]
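
The update and lookup rules above, as a small sketch. It assumes a direct-mapped table indexed by PC bits [9:2] with the remaining upper bits PC[31:10] stored as the tag, and `None` marking an empty entry; the class and method names are hypothetical.

```python
INDEX_BITS = 8  # PC[9:2]


class BTB:
    def __init__(self):
        self.tags = [None] * (1 << INDEX_BITS)   # None = entry never written
        self.targets = [0] * (1 << INDEX_BITS)

    @staticmethod
    def _split(pc):
        index = (pc >> 2) & ((1 << INDEX_BITS) - 1)
        tag = pc >> (2 + INDEX_BITS)              # PC[31:10]
        return index, tag

    def update(self, pc, target):
        """Called on every taken branch: remember where it went."""
        index, tag = self._split(pc)
        self.tags[index] = tag
        self.targets[index] = target

    def predict(self, pc):
        """Called at fetch for every insn: predicted next PC."""
        index, tag = self._split(pc)
        if self.tags[index] == tag:
            return self.targets[index]            # looks like a taken branch
        return pc + 4                             # assume not a branch


btb = BTB()
btb.update(0x1000, 0x2000)
hit = btb.predict(0x1000)        # tag matches: predicted target
miss = btb.predict(0x1004)       # different entry, empty: PC+4
alias = btb.predict(0x1400)      # same entry as 0x1000, different tag: PC+4
print(hex(hit), hex(miss), hex(alias))
```

The aliasing PC 0x1400 shares an entry with 0x1000 but is not mistaken for a branch, because its tag differs.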
Why Does a BTB Work?
• Because most control insns use direct targets
• Target encoded in insn itself → same “taken” target every time
• What about indirect targets?
• Target held in a register → can be different each time
• Two indirect call idioms
+ Dynamically linked functions (DLLs): target always the
same
• Dynamically dispatched (virtual) functions: hard but
uncommon
• Also two indirect unconditional jump idioms
• Switches: hard but uncommon
– Function returns: hard and common
Return Address Stack (RAS)

[Figure: BTB (tag compare, target) and PC+4 as before, with a return address stack (RAS) as an additional source for the predicted target]

• Return address stack (RAS)


• Call instruction? RAS[TopOfStack++] = PC+4
• Return instruction? Predicted-target = RAS[--TopOfStack]
• How can you tell if an insn is a call/return before decoding
it?
• mark some BTB entries as “returns”, or use another table
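
The push and pop rules above can be sketched as follows. The fixed depth and the wrap-around behaviour on overflow are assumptions for illustration (the slide does not say what happens when the stack is full); the class and method names are hypothetical.

```python
class ReturnAddressStack:
    def __init__(self, depth=8):
        self.entries = [0] * depth
        self.top = 0  # counts outstanding calls; indexes wrap modulo depth

    def on_call(self, pc):
        """Call insn at `pc`: RAS[TopOfStack++] = PC+4."""
        self.entries[self.top % len(self.entries)] = pc + 4
        self.top += 1

    def on_return(self):
        """Return insn: predicted target = RAS[--TopOfStack]."""
        self.top -= 1
        return self.entries[self.top % len(self.entries)]


ras = ReturnAddressStack()
ras.on_call(0x100)            # outer call
ras.on_call(0x200)            # nested call
inner = ras.on_return()       # returns to the insn after the nested call
outer = ras.on_return()       # returns to the insn after the outer call
print(hex(inner), hex(outer))
```

A BTB alone would predict that a return goes wherever it went last time; the stack gets nested calls from different call sites right.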



Misprediction Recovery



Branch Recovery

[Figure: pipelined datapath in which a taken branch resolved in X redirects the PC and nops replace the insns in F and D]

• Branch recovery: what to do when branch is actually taken


• Insns that are in F and D are wrong
• Flush them, i.e., replace them with NOPs
• They haven’t written permanent state yet (regfile, DMem)
– Two cycle penalty for taken branches
Branch Speculation and Recovery
                      1  2  3  4  5  6  7  8  9
Correct:
addi x3,x1,1          F  D  X  M  W
bne x3,x0,targ           F  D  X  M  W
sw x6,4(x7)                 F  D  X  M  W      (speculative)
mul x10,x8,x9                  F  D  X  M  W   (speculative)
• Mis-speculation recovery
• Not too painful in a short, in-order pipeline
• Branch resolves in X
+ Younger insns (in F, D) haven’t changed permanent state
• Flush insns currently in F and D (i.e., replace with nops)
                      1  2  3  4  5  6  7  8  9
Recovery:
addi x3,x1,1          F  D  X  M  W
bne x3,x0,targ           F  D  X  M  W
sw x6,4(x7)                 F  D  -- -- --
mul x10,x8,x9                  F  -- -- -- --
targ: add x4,x4,x5                F  D  X  M  W
Reducing Taken Branch Penalty



Reducing Penalty: Fast Branches

[Figure: pipelined datapath with the branch comparison and target adder moved into the D stage]

• Fast branch: can decide at D, not X


• Test must be comparison to zero or equality, no time for ALU
• beq/bne
+ New taken branch penalty is 1
– Additional insns (blt) for more complex tests, must bypass to
D too
Reducing Penalty: Fast Branches
• Fast branch: targets control-hazard penalty
• Basically, branch insns that can resolve at D, not X
• Test must be comparison to zero or equality, no time for
ALU
+ New taken branch penalty is 1
– Additional comparison insns (e.g., cmplt, slt) for complex
tests
– Must bypass into decode stage now, too
                      1  2  3  4  5  6  7  8  9
bnez r3,targ          F  D  X  M  W
st r6⟶[r7+4]             F  D  -- -- --
targ: add r4⟵r5,r4          F  D  X  M  W


Putting It All Together
• BTB & branch direction predictor during fetch

[Figure: at fetch, the PC accesses the BTB (tag compare, target), the RAS, and the BHT (taken/not-taken); together with PC+4 these produce the predicted target]

• If branch prediction correct, no taken branch penalty


Branch Prediction Performance
• Dynamic branch prediction
• 20% of instructions are branches
• Simple predictor: branches predicted with 75% accuracy
• CPI = 1 + (20% * 25% * 2) = 1.1
• More advanced predictor: 95% accuracy
• CPI = 1 + (20% * 5% * 2) = 1.02

• Branch mis-predictions still a big problem though


• Pipelines are long: typical mis-prediction penalty is 10+ cycles
• For cores that do more per cycle, mis-predictions are more costly (later)
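
The CPI figures on this slide follow from one formula, sketched here. The function name `cpi` and its parameter names are hypothetical, and the last case (the 95% predictor on a pipeline with a 10-cycle penalty) is an extra illustration built from the slide's own "10+ cycles" figure rather than a number the slide states.

```python
def cpi(branch_fraction, accuracy, penalty, base_cpi=1.0):
    """CPI = base + (fraction of insns that are branches)
                   * (mis-prediction rate) * (penalty in cycles)."""
    return base_cpi + branch_fraction * (1.0 - accuracy) * penalty


simple = cpi(0.20, 0.75, 2)      # 5-stage pipeline, simple predictor
advanced = cpi(0.20, 0.95, 2)    # 5-stage pipeline, advanced predictor
deep = cpi(0.20, 0.95, 10)       # assumed longer pipeline, same predictor

print(round(simple, 3), round(advanced, 3), round(deep, 3))
```

With a 10-cycle penalty, even the 95%-accurate predictor costs as much as the 75%-accurate one did in the short pipeline.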


Predication



Predication
• Instead of predicting which way we’re going, why
not go both ways?
• compute a predicate bit indicating a condition
• ISAs often include predicated instructions
• predicated insns either execute as normal or as NOPs,
depending on the predicate bit
• Examples
• x86 cmov performs a conditional move into a register (from a register or memory)
• 32b ARM allows almost all insns to be predicated
• 64b ARM has predicated reg-reg move, inc, dec, not
• Nvidia GPU ISA supports predication on most insns
• RV does not have predication



Predication Example
• Instead of predicting which way we’re going, why
not go both ways?
• compute a predicate bit indicating a condition
• ISA includes predicated instructions
• predicated insns either execute as normal or as NOPs,
depending on the predicate bits
// C code          ; original RV        ; imaginary predicated RV
if (a >= b) {        blt x1,x2,else       slt p1,x1,x2
  x += y;            add x3,x3,x4         add.!p1 x3,x3,x4
} else {             j after              sub.p1 x3,x3,x5
  x -= z;          else:
}                    sub x3,x3,x5
                   after:
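
To check that the predicated sequence computes the same result as the branching one, here is a small model of both. It is a sketch only: Python conditional expressions stand in for the imaginary `add.!p1` / `sub.p1` insns, and the function names are hypothetical.

```python
def branchy(a, b, x, y, z):
    """Original code: a branch selects which single arm executes."""
    if a >= b:
        x += y
    else:
        x -= z
    return x


def predicated(a, b, x, y, z):
    """Predicated code: both insns issue; each acts as a NOP unless enabled."""
    p1 = a < b                  # slt p1,x1,x2
    x = x if p1 else x + y      # add.!p1 x3,x3,x4 (takes effect when p1 is clear)
    x = x - z if p1 else x      # sub.p1  x3,x3,x5 (takes effect when p1 is set)
    return x


cases = [(a, b) for a in range(-2, 3) for b in range(-2, 3)]
same = all(branchy(a, b, 10, 3, 4) == predicated(a, b, 10, 3, 4) for a, b in cases)
print(same)
```

The predicated version has no control flow to predict, at the cost of always issuing both the add and the sub.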



Predication Performance
• Predication overhead is additional insns
• Sometimes overhead is zero
• for if-then statement where condition is true
– Most of the time it isn’t
• if-then-else statement, only one of the paths is useful
• For a given branch, predicate (vs speculate) if…
• Average number of additional insns < overall mis-prediction penalty (mis-prediction rate × penalty)
• For an individual branch
• Mis-prediction penalty in a 5-stage scalar pipeline = 2
• Mis-prediction rate is <50%, and often <20%
• Overall mis-prediction penalty <1 and often <0.4
• So when is predication ever worth it?
Predication Performance
• What does predication actually accomplish?
• In a scalar 5-stage pipeline (penalty = 2): nothing!
• In a 4-way superscalar 15-stage pipeline (penalty = 60)?
• Use when mis-predictions >10% and insn overhead <6
• In a 4-way out-of-order superscalar (penalty ~ 150)
• potentially useful in more situations
• typically only desirable for branches that mis-predict
frequently
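
The break-even rule from these two slides, written out as a sketch. The function name `should_predicate` and its parameters are hypothetical; penalties are in the same units the slides use (2 for the scalar 5-stage pipeline, 60 for the 4-way 15-stage one).

```python
def should_predicate(extra_insns, mispredict_rate, penalty):
    """Predicate when the added insns cost less than the expected
    mis-prediction cost (mis-prediction rate * penalty)."""
    return extra_insns < mispredict_rate * penalty


# Scalar 5-stage pipeline: expected cost is 0.2 * 2 = 0.4, under one insn.
scalar = should_predicate(extra_insns=1, mispredict_rate=0.20, penalty=2)

# 4-way superscalar, 15 stages: expected cost is 0.15 * 60 = 9 insn slots.
wide_cheap = should_predicate(extra_insns=5, mispredict_rate=0.15, penalty=60)
wide_costly = should_predicate(extra_insns=12, mispredict_rate=0.15, penalty=60)

print(scalar, wide_cheap, wide_costly)
```

The same branch that is never worth predicating on the short scalar pipeline can be worth it on the wide, deep one, as long as the overhead stays small.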



Predication Pros/Cons
• Other predication advantages
• Low-power: eliminates the need for a large branch predictor
• Real-time: predicated code has consistent latency
• Predication disadvantages
• wasted time/energy compared to correct prediction
• complex to implement
• doesn’t nest well



Summary
• Control hazards
• Branch target prediction
• Branch direction prediction

[Figure: system layers: App, App, App; System software; Mem, CPU, I/O]