10 Branch Prediction
CIS 5710
Computer Organization and Design
This Unit: Branch Prediction
• Control hazards
• Branch prediction
Control Dependences and Branch Prediction
[Figure: pipeline datapath (PC, I$, register file, D$) with a branch predictor (BP) at fetch]
hardware structure:
• predicted target: branch target buffer
• taken/not-taken direction: direction predictor
[Figure: pipeline datapath with branch predictor (BP), target (TG) latches, and nop insertion]
• Dynamic branch prediction: hardware guesses the branch outcome
• Start fetching from the guessed address
• Flush on mis-prediction
CIS 5710 | Prof Joseph Devietti 11
Identifying Branches
[Figure: branch predictor (BP) at fetch answering “is it a branch?”]
BTB Aliasing
• What if two PCs have the same bits 9:2...?
• These PCs alias
• Aliasing branches interfere with each other
• BTB is just a prediction, so the processor will still work correctly
• In our initial BTB design, we never clear BTB bits…
• If bits 9:2 are used to index, there are 2^8 = 256 BTB entries
• A 4MB program has 1M insns (4 bytes each)
• 1M / 256 = 4K insns map to each BTB entry
• What are the odds that at least 1 out of 4K insns is a branch?
• BTB will become saturated
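The arithmetic in the bullets above can be checked with a short sketch. It assumes fixed 4-byte instructions (implied by the 4MB / 1M figure); the helper name `btb_index` and the two example PCs are hypothetical, chosen only to show two addresses that share bits 9:2.

```python
INDEX_HI, INDEX_LO = 9, 2   # PC bits used to index the BTB
INSN_BYTES = 4              # assumption: fixed 4-byte instructions

def btb_index(pc: int) -> int:
    """Return PC[9:2], the BTB index."""
    width = INDEX_HI - INDEX_LO + 1
    return (pc >> INDEX_LO) & ((1 << width) - 1)

# 8 index bits give 256 entries.
num_entries = 1 << (INDEX_HI - INDEX_LO + 1)

# A 4MB program holds 1M four-byte instructions.
program_insns = (4 * 1024 * 1024) // INSN_BYTES

# Instructions that map to each BTB entry.
insns_per_entry = program_insns // num_entries

# Two made-up PCs that differ only above bit 9, so they alias.
pc_a = 0x0000_1040
pc_b = 0x0000_5040
aliases = btb_index(pc_a) == btb_index(pc_b)

print(num_entries, program_insns, insns_per_entry, aliases)
```

With an untagged BTB that is never cleared, any one branch among the instructions sharing an index is enough to mark that entry permanently.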
[Figure: tagged BTB. PC[9:2] indexes the BTB; PC[31:10] is compared (==) against the stored tag to answer “is it a branch?”]
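One way to picture the tag check is the sketch below. It is not the course's implementation: the class name `TaggedBTB`, its methods, and the example PCs are hypothetical, and it assumes a 32-bit PC split as PC[31:10] (tag), PC[9:2] (index), PC[1:0] (ignored).

```python
class TaggedBTB:
    """Minimal tagged BTB that only answers "is it a branch?"."""

    def __init__(self, index_bits: int = 8):
        self.index_bits = index_bits
        # One tag slot per entry; None means the entry is empty.
        self.tags = [None] * (1 << index_bits)

    def _split(self, pc: int):
        index = (pc >> 2) & ((1 << self.index_bits) - 1)  # PC[9:2]
        tag = pc >> (2 + self.index_bits)                 # PC[31:10]
        return index, tag

    def record_branch(self, pc: int) -> None:
        """Remember that the insn at pc is a branch."""
        index, tag = self._split(pc)
        self.tags[index] = tag

    def is_branch(self, pc: int) -> bool:
        """Hit only if the stored tag matches this PC's tag."""
        index, tag = self._split(pc)
        return self.tags[index] == tag


btb = TaggedBTB()
btb.record_branch(0x1040)
hit_same_pc = btb.is_branch(0x1040)    # same index, same tag
hit_alias_pc = btb.is_branch(0x5040)   # same index, different tag
print(hit_same_pc, hit_alias_pc)
```

The aliasing PC still competes for the same entry, but the tag mismatch keeps it from being misidentified as a branch.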
• PC indexes table of bits (0 = N, 1 = T), no tags
• Essentially: branch will go same way it went last time
• Problem: inner loop branch below
  for (i=0;i<100;i++)
    for (j=0;j<3;j++)
      // whatever
– Two “built-in” mis-predictions per inner loop execution
– Branch predictor “changes its mind too quickly”

Time  State  Prediction  Outcome  Result?
1     N      N           T        Wrong
2     T      T           T        Correct
3     T      T           T        Correct
4     T      T           N        Wrong
5     N      N           T        Wrong
6     T      T           T        Correct
7     T      T           T        Correct
8     T      T           N        Wrong
9     N      N           T        Wrong
10    T      T           T        Correct
11    T      T           T        Correct
12    T      T           N        Wrong
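The table can be reproduced with a small simulation. This is a sketch for a single branch with one 1-bit entry; the function name `simulate_one_bit` is hypothetical, and the outcome stream T T T N repeated three times is read off the table's Outcome column.

```python
def simulate_one_bit(outcomes, initial="N"):
    """Predict that the branch goes the same way as last time."""
    state = initial
    rows = []
    for time, outcome in enumerate(outcomes, start=1):
        prediction = state                      # the bit is the prediction
        rows.append((time, state, prediction, outcome, prediction == outcome))
        state = outcome                         # remember the latest outcome
    return rows


outcomes = list("TTTN" * 3)                     # outcome column from the table
rows = simulate_one_bit(outcomes)
wrong_times = [time for time, _, _, _, ok in rows if not ok]

for time, state, prediction, outcome, ok in rows:
    print(time, state, prediction, outcome, "Correct" if ok else "Wrong")
```

After the cold-start miss at time 1, the mis-predictions come in pairs: one at each loop exit and one when the loop is re-entered.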
Two-bit saturating counters (2bc) [Smith 1981]
• Replace each single-bit prediction with a two-bit counter
• (0,1,2,3) = (N,n,t,T)
• Adds “hysteresis”
• Force predictor to mis-predict twice before “changing its mind”
• One mispredict each loop execution (rather than two)
+ Fixes this pathology (which is not contrived, by the way)
• Can we do even better?

Time  State  Prediction  Outcome  Result?
1     N      N           T        Wrong
2     n      N           T        Wrong
3     t      T           T        Correct
4     T      T           N        Wrong
5     t      T           T        Correct
6     T      T           T        Correct
7     T      T           T        Correct
8     T      T           N        Wrong
9     t      T           T        Correct
10    T      T           T        Correct
11    T      T           T        Correct
12    T      T           N        Wrong
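A sketch of the two-bit counter for a single branch, reproducing the table. The function name `simulate_two_bit` is hypothetical; the encoding (0,1,2,3) = (N,n,t,T) and the T T T N outcome stream come from the slide, and the counter is assumed to start at 0 as in the first table row.

```python
STATES = "NntT"  # counter values 0..3; values 2 and 3 predict taken

def simulate_two_bit(outcomes, initial=0):
    """Two-bit saturating counter: increment on T, decrement on N."""
    counter = initial
    rows = []
    for time, outcome in enumerate(outcomes, start=1):
        prediction = "T" if counter >= 2 else "N"
        rows.append((time, STATES[counter], prediction, outcome,
                     prediction == outcome))
        if outcome == "T":
            counter = min(counter + 1, 3)   # saturate at strongly taken
        else:
            counter = max(counter - 1, 0)   # saturate at strongly not-taken
    return rows


outcomes = list("TTTN" * 3)
rows = simulate_two_bit(outcomes)
wrong_times = [time for time, _, _, _, ok in rows if not ok]

for time, state, prediction, outcome, ok in rows:
    print(time, state, prediction, outcome, "Correct" if ok else "Wrong")
```

Once warmed up, the single not-taken outcome at loop exit only moves the counter from T to t, so the next prediction is still taken.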
[Figure: BHT and BHR]
• assume program has one branch
• BHT: one 1-bit DIRP entry
• 3BHR: last 3 branch outcomes
• train counter, and update BHR after each branch

Time  State  BHR  Prediction  Outcome  Result?
1     N      NNN  N           T        wrong
2     N      NNT  N           T        wrong
3     N      NTT  N           T        wrong
4     N      TTT  N           N        correct
5     N      TTN  N           T        wrong
6     N      TNT  N           T        wrong
7     T      NTT  T           T        correct
8     N      TTT  N           N        correct
9     T      TTN  T           T        correct
10    T      TNT  T           T        correct
11    T      NTT  T           T        correct
12    N      TTT  N           N        correct
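The table is consistent with the following sketch, which assumes the BHT is indexed by the 3-outcome BHR and holds one 1-bit entry per history pattern, all starting at N. That indexing is an assumption made to match the table; the function name `simulate_bhr` is hypothetical.

```python
def simulate_bhr(outcomes, history_bits=3):
    """1-bit predictor entries selected by the last few branch outcomes."""
    bhr = "N" * history_bits   # branch history register, oldest outcome first
    bht = {}                   # history pattern -> 1-bit state (default N)
    rows = []
    for time, outcome in enumerate(outcomes, start=1):
        state = bht.get(bhr, "N")
        prediction = state
        rows.append((time, state, bhr, prediction, outcome,
                     prediction == outcome))
        bht[bhr] = outcome           # train the entry selected by this history
        bhr = bhr[1:] + outcome      # shift the new outcome into the BHR
    return rows


outcomes = list("TTTN" * 3)
rows = simulate_bhr(outcomes)
wrong_times = [time for time, _, _, _, _, ok in rows if not ok]

for time, state, bhr, prediction, outcome, ok in rows:
    print(time, state, bhr, prediction, outcome, "correct" if ok else "wrong")
```

After each history pattern has been seen once, the history TTT reliably selects the entry that predicts the loop exit, so the steady-state mis-predictions disappear.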
[Figure: a chooser selecting between two BHT-based predictors, one of which uses a BHR]
[Figure: BTB with tag and target fields, indexed by the PC; the predicted target comes from the BTB target or PC + 4]
Why Does a BTB Work?
• Because most control insns use direct targets
• Target encoded in insn itself → same “taken” target every time
• What about indirect targets?
• Target held in a register → can be different each time
• Two indirect call idioms
+ Dynamically linked functions (DLLs): target always the same
• Dynamically dispatched (virtual) functions: hard but uncommon
• Also two indirect unconditional jump idioms
• Switches: hard but uncommon
– Function returns: hard and common
Return Address Stack (RAS)
[Figure: BTB (tag, target) and PC + 4 feeding the predicted target, with a return address stack (RAS) added]
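The idea behind a RAS can be sketched as follows: push the return address when a call is fetched, and pop it to predict the target when a return is fetched. The class and method names, the depth of 8, the drop-oldest overflow policy, and the 4-byte instruction size are all assumptions for illustration, not details from the slides.

```python
class ReturnAddressStack:
    """Small hardware-style stack of predicted return addresses."""

    def __init__(self, depth: int = 8):
        self.depth = depth      # assumption: fixed, small capacity
        self.stack = []

    def on_call(self, call_pc: int, insn_bytes: int = 4) -> None:
        """Push the address of the insn after the call."""
        if len(self.stack) == self.depth:
            self.stack.pop(0)   # assumption: overwrite the oldest entry
        self.stack.append(call_pc + insn_bytes)

    def predict_return(self):
        """Pop the predicted return target, or None if the stack is empty."""
        return self.stack.pop() if self.stack else None


ras = ReturnAddressStack()
ras.on_call(0x100)                  # outer call
ras.on_call(0x200)                  # nested call
inner_target = ras.predict_return() # return from the nested call
outer_target = ras.predict_return() # return from the outer call
empty_target = ras.predict_return() # nothing left to predict
print(hex(inner_target), hex(outer_target), empty_target)
```

Because calls and returns nest, the most recent call's return address is the right prediction for the next return, which a BTB entry holding a single target per return insn cannot capture.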
[Figure: pipeline datapath diagrams, the first showing nop insertion]
[Figure: BTB (tag, target), RAS, and BHT (taken/not-taken) combined to produce the predicted target, with PC + 4 as the alternative]
• If branch prediction is correct, no taken-branch penalty