Branch Prediction
Unit I
Branch prediction – Static & Dynamic
FK.F02 1
4.2 Static Branch Prediction
• used where branch behavior is highly predictable at
compile time
• architectural feature to support static branch prediction
– delayed branch
Static Branch Prediction for Load Stall
Profile-based Static
Branch Prediction
Misprediction rate on SPEC92
• varies widely: 3% to 24%
• on average, 9% for FP programs and 15% for integer programs
CSE 7381
Computer Architecture FK.F02 8
2-bit Branch History Table
(BHT)
• Solution: 2-bit scheme where the prediction changes only if it mispredicts twice:
[State diagram: four states, two labeled Predict Taken and two labeled Predict Not Taken, with transitions labeled T (taken) and NT (not taken).]
• Red: stop, not taken
• Green: go, taken
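The 2-bit scheme can be sketched as a table of saturating counters. This is a minimal sketch, not the exact state machine drawn on the slide: it assumes the common saturating-counter encoding (0 and 1 predict not taken, 2 and 3 predict taken), 4-byte instructions, and a table indexed by low-order PC bits. The names `TwoBitBHT` and `mispredictions` are hypothetical.

```python
class TwoBitBHT:
    """Branch history table of 2-bit saturating counters (illustrative sketch)."""

    def __init__(self, entries=4096):
        # Assumption: every counter starts at 1 (weakly not taken).
        self.entries = entries
        self.counters = [1] * entries

    def _index(self, pc):
        # Assumption: 4-byte instructions, so the 2 low PC bits are dropped.
        return (pc >> 2) % self.entries

    def predict(self, pc):
        # Counter values 2 and 3 predict taken; 0 and 1 predict not taken.
        return self.counters[self._index(pc)] >= 2

    def update(self, pc, taken):
        # Move one step toward the actual outcome, saturating at 0 and 3.
        i = self._index(pc)
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)


def mispredictions(bht, pc, outcomes):
    """Feed a sequence of outcomes for one branch and count mispredictions."""
    wrong = 0
    for taken in outcomes:
        if bht.predict(pc) != taken:
            wrong += 1
        bht.update(pc, taken)
    return wrong
```

For a loop branch that is taken nine times and then falls through, this sketch mispredicts only the loop exit once it has warmed up, because a single not-taken outcome moves the counter from 3 to 2 and the prediction stays taken.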
Correlating Branches
B1: if (aa == 2)
        aa = 0;
B2: if (bb == 2)
        bb = 0;
B3: if (aa != bb) { …

        LD     R1, aa
        LD     R2, bb
        DSUBUI R3, R1, #2
        BNEZ   R3, L1        ; branch B1 (aa != 2)
        DADD   R1, R0, R0    ; aa = 0
L1:     DSUBUI R3, R2, #2
        BNEZ   R3, L2        ; branch B2 (bb != 2)
        DADD   R2, R0, R0    ; bb = 0
L2:     DSUBUI R3, R1, R2    ; R3 = aa - bb
        BEQZ   R3, L3        ; branch B3 (aa == bb)
Observation: B1 = NT B2 = T
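The dependence between these branches can be checked directly. The sketch below mirrors the compiled code: B1 and B2 are the BNEZ instructions, which are taken when the corresponding `if` body is skipped, and B3 is the BEQZ, which is taken when aa equals bb. The function name `branch_outcomes` is hypothetical.

```python
def branch_outcomes(aa, bb):
    """Return (B1, B2, B3) taken flags for the three branches in the code above."""
    b1_taken = aa != 2      # BNEZ R3, L1 skips "aa = 0"
    if not b1_taken:
        aa = 0
    b2_taken = bb != 2      # BNEZ R3, L2 skips "bb = 0"
    if not b2_taken:
        bb = 0
    b3_taken = aa == bb     # BEQZ R3, L3 is taken when aa - bb == 0
    return b1_taken, b2_taken, b3_taken
```

When B1 and B2 are both not taken, both variables have just been set to 0, so B3 is necessarily taken. A predictor that looks only at B3's own history cannot capture this.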
Correlating Branches
Idea: taken/not taken behavior of recently executed branches is related to the behavior of the next branch (as well as the history of that branch's own behavior)
– Then the behavior of recent branches selects between, say, 4 predictions of the next branch, updating just that prediction
• (2,2) predictor: 2-bit global, 2-bit local
[Figure: the branch address (4 bits) and the 2-bit global branch history (01 = not taken then taken) together select one of the 2-bit per-branch local predictors, which supplies the prediction.]
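A (2,2) predictor can be sketched as a two-dimensional table: the branch address picks a row and the global history picks one counter within it. This is a minimal sketch under stated assumptions: 4-byte instructions, counters initialized to 0, and the newest outcome kept in bit 0 of the history (so 01 means not taken then taken). The class name `CorrelatingPredictor` and its parameters are hypothetical.

```python
class CorrelatingPredictor:
    """(m, n) predictor: m global-history bits select one of 2**m n-bit counters per row."""

    def __init__(self, entries=16, m=2, n=2):
        self.entries = entries
        self.m = m
        self.max_count = (1 << n) - 1
        self.history = 0  # last m branch outcomes, newest in bit 0
        # One row per branch-address index, one counter per history pattern.
        self.table = [[0] * (1 << m) for _ in range(entries)]

    def _row(self, pc):
        # Assumption: 4-byte instructions, so the 2 low PC bits are dropped.
        return (pc >> 2) % self.entries

    def predict(self, pc):
        counter = self.table[self._row(pc)][self.history]
        return counter > self.max_count // 2  # upper half of the range predicts taken

    def update(self, pc, taken):
        # Update only the counter that the current history selected.
        row = self.table[self._row(pc)]
        c = row[self.history]
        row[self.history] = min(self.max_count, c + 1) if taken else max(0, c - 1)
        # Shift the outcome into the global history register.
        self.history = ((self.history << 1) | int(taken)) & ((1 << self.m) - 1)
```

On a branch that strictly alternates taken and not taken, this sketch stops mispredicting after a short warm-up, because each history pattern ends up with its own counter.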
Accuracy of Different Schemes
[Figure: bar chart of frequency of mispredictions (0% to 20%) comparing three schemes: 4,096 entries with 2 bits per entry, unlimited entries with 2 bits per entry, and 1,024 entries (2,2).]
Tournament Predictors:
An example of multilevel branch predictors
• Motivation for correlating branch predictors: the 2-bit predictor failed on important branches;
– by adding global information, performance improved
• Tournament predictors: use 2 predictors,
– 1 based on global information and
– 1 based on local information, and combine them with a selector
– A 2-bit saturating counter per branch chooses between the two predictors
The Alpha 21264 Branch Predictor
[Figure: 2-bit selector counter with states Use P1 and Use P2 (Pi: predictor i). The Use P1 state is labeled 0/0, 1/0, 1/1; the Use P2 state is labeled 0/0, 0/1, 1/1.]
• 4K 2-bit counters select between the predictors
• The counter is incremented whenever the “predicted” predictor is correct and decremented in the reverse situation.
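The selector can be sketched as one more table of 2-bit saturating counters. This is a minimal sketch, not the 21264's actual logic. It assumes that counter values 0 and 1 mean "use P1" and 2 and 3 mean "use P2", and that the counter moves only when exactly one predictor was right (the 1/0 and 0/1 cases), staying put on 0/0 and 1/1. The class name `TournamentSelector` is hypothetical.

```python
class TournamentSelector:
    """Per-branch 2-bit saturating counters that choose between predictors P1 and P2."""

    def __init__(self, entries=4096):
        # Assumption: counters start at 1, weakly preferring P1.
        self.entries = entries
        self.counters = [1] * entries

    def _index(self, pc):
        # Assumption: 4-byte instructions, so the 2 low PC bits are dropped.
        return (pc >> 2) % self.entries

    def choose(self, pc, p1_prediction, p2_prediction):
        use_p2 = self.counters[self._index(pc)] >= 2
        return p2_prediction if use_p2 else p1_prediction

    def update(self, pc, p1_correct, p2_correct):
        # Move toward the predictor that was right when the two disagree.
        i = self._index(pc)
        if p2_correct and not p1_correct:
            self.counters[i] = min(3, self.counters[i] + 1)
        elif p1_correct and not p2_correct:
            self.counters[i] = max(0, self.counters[i] - 1)
```

Because the counter is 2 bits wide, a single disagreement in the strong states does not switch the chosen predictor.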
Global Predictor
• The global predictor also has 4K entries and is indexed by the history of the last 12 branches; each entry in the global predictor is a standard 2-bit predictor (Ref. Slide #5)
– 12-bit pattern: ith bit 0 => ith prior branch not taken; ith bit 1 => ith prior branch taken
[Figure: branch address (4 bits) and 2-bit global branch history (01 = not taken then taken).]
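Indexing by history alone can be sketched as follows. This is a minimal sketch assuming standard 2-bit saturating counters initialized to 1 and a history register with the most recent branch in bit 0. The class name `GlobalPredictor` is hypothetical.

```python
class GlobalPredictor:
    """4K 2-bit counters indexed only by the outcomes of the last 12 branches."""

    HISTORY_BITS = 12

    def __init__(self):
        self.history = 0  # bit 0 is the most recent branch (1 = taken)
        self.counters = [1] * (1 << self.HISTORY_BITS)  # 4096 entries

    def predict(self):
        # The branch address is not used; only the global history selects the counter.
        return self.counters[self.history] >= 2

    def update(self, taken):
        c = self.counters[self.history]
        self.counters[self.history] = min(3, c + 1) if taken else max(0, c - 1)
        mask = (1 << self.HISTORY_BITS) - 1
        self.history = ((self.history << 1) | int(taken)) & mask
```

Since 2^12 = 4096, a 12-bit history addresses exactly the 4K entries mentioned above.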
2-Level Local Predictor
[Figure: the most recent 10 branch outcomes index a table of 1K 3-bit counters, which gives the local prediction.]
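The two levels can be sketched as a history table feeding a counter table. This is a minimal sketch under stated assumptions: a first-level table of 1024 per-branch histories indexed by PC bits, 4-byte instructions, and 3-bit saturating counters (range 0 to 7) that predict taken at 4 or above. The class name `LocalPredictor` and the initial counter value are hypothetical.

```python
class LocalPredictor:
    """Two-level local predictor: per-branch 10-bit histories index 1K 3-bit counters."""

    HISTORY_BITS = 10

    def __init__(self, history_entries=1024):
        # Level 1: one 10-bit outcome history per branch-address index (assumed size).
        self.history_entries = history_entries
        self.histories = [0] * history_entries
        # Level 2: 1K 3-bit saturating counters, one per possible 10-bit history.
        self.counters = [3] * (1 << self.HISTORY_BITS)

    def _index(self, pc):
        # Assumption: 4-byte instructions, so the 2 low PC bits are dropped.
        return (pc >> 2) % self.history_entries

    def predict(self, pc):
        history = self.histories[self._index(pc)]
        return self.counters[history] >= 4

    def update(self, pc, taken):
        i = self._index(pc)
        history = self.histories[i]
        c = self.counters[history]
        self.counters[history] = min(7, c + 1) if taken else max(0, c - 1)
        mask = (1 << self.HISTORY_BITS) - 1
        self.histories[i] = ((history << 1) | int(taken)) & mask
```

A branch with a short repeating pattern (for example taken three times, then not taken) ends up with a distinct history before each outcome, so this sketch predicts it without error after warm-up.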
% of predictions from local predictor in Tournament Prediction Scheme
Benchmark    % from local predictor
nasa7        98%
matrix300    100%
tomcatv      94%
doduc        90%
spice        55%
fpppp        76%
gcc          72%
espresso     63%
eqntott      37%
li           69%
Accuracy of Branch Prediction
Benchmark    Profile-based    2-bit counter    Tournament
tomcatv      99%              99%              100%
doduc        95%              84%              97%
fpppp        86%              82%              98%
li           88%              77%              98%
espresso     86%              82%              96%
gcc          88%              70%              94%
[Figure: misprediction rate (0% to 8%) versus total predictor size (0 to 128 Kbits) for local, correlating, and tournament predictors.]
Need Address
at Same Time as Prediction
• Branch Target Buffer/Cache (BTB): the address of the branch indexes the buffer to get the prediction AND the branch target address (if taken)
– Note: must check for a branch match now, since a wrong branch address can't be used
[Figure: the PC of the instruction in FETCH is compared (=?) against the stored Branch PC entries; each entry holds a Predicted PC and extra prediction state bits.]
– Yes: instruction is a branch; use the predicted PC as the next PC
– No: branch not predicted; proceed normally (Next PC = PC+4)
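The fetch-time lookup can be sketched as a small tagged table. This is a minimal sketch under stated assumptions: a direct-mapped buffer, 4-byte instructions, only taken branches stored, and no extra prediction state bits. The class name `BranchTargetBuffer` and its methods are hypothetical.

```python
class BranchTargetBuffer:
    """Direct-mapped BTB: each entry stores a branch PC (tag) and its predicted target PC."""

    def __init__(self, entries=64):
        self.entries = entries
        self.table = [None] * entries  # each slot: (branch_pc, predicted_pc) or None

    def _index(self, pc):
        # Assumption: 4-byte instructions, so the 2 low PC bits are dropped.
        return (pc >> 2) % self.entries

    def next_pc(self, pc):
        """Return the PC to fetch next, given the PC of the instruction being fetched."""
        entry = self.table[self._index(pc)]
        if entry is not None and entry[0] == pc:  # the "=?" branch-match check
            return entry[1]                        # hit: use the predicted PC
        return pc + 4                              # miss: proceed normally

    def update(self, pc, taken, target):
        """After the branch resolves: remember taken branches, evict not-taken ones."""
        i = self._index(pc)
        if taken:
            self.table[i] = (pc, target)
        elif self.table[i] is not None and self.table[i][0] == pc:
            self.table[i] = None
```

The tag comparison matters: two instructions that map to the same slot must not share a predicted target, which is the "must check for branch match" note above.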
Branch Folding
• In the BT-buffer, store one or more target
instructions instead of the predicted PC
– Obtain zero cycle unconditional branches
– Sometimes, zero cycle conditional branches
Return Address Predictors
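A return address predictor is commonly built as a small stack: calls push the address of the following instruction, and returns pop it as the predicted target. This is a minimal sketch under stated assumptions: 4-byte instructions, a fixed depth, and discarding the oldest entry on overflow. The class name `ReturnAddressStack` and its methods are hypothetical.

```python
class ReturnAddressStack:
    """Small fixed-depth stack of predicted return addresses."""

    def __init__(self, depth=8):
        self.depth = depth
        self.stack = []

    def on_call(self, call_pc):
        # Assumption: 4-byte instructions, so the return address is call_pc + 4.
        if len(self.stack) == self.depth:
            self.stack.pop(0)  # overflow: discard the oldest entry
        self.stack.append(call_pc + 4)

    def predict_return(self):
        # Returns None when the stack is empty (no prediction available).
        return self.stack.pop() if self.stack else None
```

With a depth of 2, three nested calls lose the outermost return address, so the third return gets no prediction in this sketch.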
Dynamic Branch Prediction
Summary
• Prediction is becoming an important part of scalar execution
• Branch History Table: 2 bits for loop accuracy
• Correlation: recently executed branches are correlated with the next branch.
– Either different branches
– Or different executions of the same branch
• Tournament Predictor: devotes more resources to competing solutions and picks between them
• Branch Target Buffer: includes branch address & prediction
• Predicated Execution can reduce number of
branches, number of mispredicted branches
• Return address stack for prediction of indirect jump