Anch Prediction

The document discusses advanced concepts in branch prediction, focusing on static and dynamic methods. It outlines various prediction schemes, including static predictions based on compile-time behavior and dynamic predictors that leverage historical data to improve accuracy. Key techniques such as the Branch History Table, correlating branches, and tournament predictors are highlighted for their roles in enhancing processor performance by minimizing control hazards.

Uploaded by

Herlin L.T.


CS2354 Advanced Computer Architecture

Unit I
Branch prediction – Static & Dynamic

FK.F02 1
4.2 Static Branch Prediction
• Used where branch behavior is highly predictable at compile time
• Architectural feature to support static branch prediction
– Delayed branch: the instruction in the branch delay slot is executed whether or not the branch is taken (for zero cycle penalty)

Static Branch Prediction for Load Stall

Original code:
        LD      R1, 0(R2)       ← Load Stall
        DSUBU   R1, R1, R3
        BEQZ    R1, L
        OR      R4, R5, R6
        DADDU   R10, R4, R3
L:      DADDU   R7, R8, R9

Branch almost always taken (the load delay is filled from the branch target):
        LD      R1, 0(R2)
        DADDU   R7, R8, R9
        DSUBU   R1, R1, R3
        BEQZ    R1, L
        OR      R4, R5, R6
        DADDU   R10, R4, R3
L:

Branch rarely taken (the load delay is filled from the fall-through path):
        LD      R1, 0(R2)
        OR      R4, R5, R6
        DSUBU   R1, R1, R3
        BEQZ    R1, L
        DADDU   R10, R4, R3
L:      DADDU   R7, R8, R9

In both cases, assume the moved instruction is safe to execute if the branch is mispredicted.

Static Branch Prediction Schemes
• Simplest scheme: predict branch as taken
– 34% average misprediction rate for SPEC programs (ranging from 9% to 59%)
• Direction-based scheme: predict backward-going branches as taken and forward-going branches as not taken
– Not good for SPEC programs
– Overall misprediction rate is unlikely to be less than 30% to 40%
• Profile-based scheme: predict on the basis of profile information collected from earlier runs
– An individual branch is often highly biased toward taken or untaken
– Changing the input so that the profile is for a different run leads to only a small change in the accuracy

Profile-based Static Branch Prediction
Misprediction rate on SPEC92
• Varies widely: 3% to 24%
• On average, 9% for FP programs and 15% for integer programs

Number of instructions executed between mispredicted branches

avg.        Taken   Profile
FP            30      173
INT           10       46
All           20      110
std dev       27       85

• Varies widely, depending on branch frequency and prediction precision

What we have done!
• Have described techniques to overcome data hazards
• Will describe techniques to overcome control hazards
– What limits the amount of ILP? Control dependence
– Prediction is helpful in single-issue processors
– Prediction is crucial to multi-issue processors
– WHY?
» Branches arrive up to n times faster for an n-issue processor
» The relative impact of the control stalls will be larger with the lower CPI (Amdahl’s Law)
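
The Amdahl's Law point can be checked with a little arithmetic. The sketch below is only an illustration: the function name and every number in it (20% branches, 10% mispredictions, a 3-cycle penalty) are hypothetical values chosen for the example, not figures from these slides.

```python
def control_stall_share(base_cpi, branch_freq, mispredict_rate, penalty_cycles):
    """Return (effective CPI, fraction of all cycles spent on control stalls)."""
    # Stall cycles added per instruction by mispredicted branches.
    stall_cpi = branch_freq * mispredict_rate * penalty_cycles
    effective_cpi = base_cpi + stall_cpi
    return effective_cpi, stall_cpi / effective_cpi


# Hypothetical workload, applied to a single-issue and a 4-issue ideal CPI.
single_issue = control_stall_share(1.0, 0.2, 0.1, 3)
four_issue = control_stall_share(0.25, 0.2, 0.1, 3)

for name, (cpi, share) in (("CPI 1.00", single_issue), ("CPI 0.25", four_issue)):
    print(f"base {name}: effective CPI {cpi:.2f}, control stalls {share:.0%} of cycles")
```

With these made-up numbers the same stall cycles cost roughly 6% of the time at a base CPI of 1.0 and roughly 19% at a base CPI of 0.25.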
CSE 7381
Computer Architecture FK.F02 6
Basic Predict Schemes
• Static schemes
– Predict not taken
– Predict taken
– Delayed branch
• Dynamic schemes
1. 1-bit Branch-Prediction Buffer
2. 2-bit Branch-Prediction Buffer
3. Correlating Branch Prediction Buffer
4. Tournament Branch Predictor
5. Branch Target Buffer
6. Integrated Instruction Fetch Units
7. Return Address Predictors
(5 to 7: for high-performance delivery)
• Goal: allowing the processor to resolve the outcome of a branch early

1-bit Branch History Table (BHT)
• Performance = ƒ(accuracy, cost of misprediction)
• Branch History Table: lower bits of the PC address index a table of 1-bit values
– Says whether or not the branch was taken last time
– No address check (saves HW, but the entry may belong to a different branch)
• Problem: in a loop, a 1-bit BHT will cause 2 mispredictions (avg is 9 iterations before exit):
– End of loop case, when it exits instead of looping as before
– First time through the loop on the next time through the code, when it predicts exit instead of looping
– Only 80% accuracy even if the branch loops (is taken) 90% of the time

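
The 80% figure for the 1-bit scheme can be reproduced with a tiny simulation. This is a minimal sketch, assuming a single loop branch that is taken 9 times and then falls through once per visit; the function name and the initial prediction are choices made for the example.

```python
def one_bit_accuracy(outcomes, initial_prediction=False):
    """Simulate one 1-bit BHT entry: predict whatever the branch did last time."""
    prediction = initial_prediction
    correct = 0
    for taken in outcomes:
        if prediction == taken:
            correct += 1
        prediction = taken  # the single bit remembers only the last outcome
    return correct / len(outcomes)


# One visit to the loop: taken 9 times, then not taken on exit.
loop_visit = [True] * 9 + [False]
accuracy = one_bit_accuracy(loop_visit * 100)
print(accuracy)
```

Each visit costs two mispredictions (the exit, then the first iteration of the next visit), so 8 of every 10 predictions are right.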
2-bit Branch History Table (BHT)
• Solution: 2-bit scheme where the prediction changes only after two mispredictions
[State diagram: four states, two labeled "Predict Taken" and two labeled "Predict Not Taken", connected by transitions on T (taken) and NT (not taken)]
• Red: stop, not taken
• Green: go, taken

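
One common way to realize a 2-bit scheme is a saturating counter. The sketch below assumes that variant (the exact transitions in the slide's diagram may differ) and reuses the same hypothetical loop branch: taken 9 times, then not taken once per visit.

```python
def two_bit_accuracy(outcomes, counter=3):
    """Simulate one 2-bit saturating counter (0..3); predict taken when counter >= 2."""
    correct = 0
    for taken in outcomes:
        predicted_taken = counter >= 2
        if predicted_taken == taken:
            correct += 1
        # Move one step toward the observed outcome, saturating at 0 and 3.
        counter = min(counter + 1, 3) if taken else max(counter - 1, 0)
    return correct / len(outcomes)


loop_visit = [True] * 9 + [False]
accuracy = two_bit_accuracy(loop_visit * 100)
print(accuracy)
```

The loop exit only weakens the counter, so the first iteration of the next visit is still predicted taken and only one prediction in ten is wrong.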
Correlating Branches
C code:
B1: if (aa == 2)
        aa = 0;
B2: if (bb == 2)
        bb = 0;
B3: if (aa != bb) { …

MIPS code:
        LD      R1, aa
        LD      R2, bb
        DSUBUI  R3, R1, #2
        BNEZ    R3, L1          ; B1
        DADD    R1, R0, R0
L1:     DSUBUI  R3, R2, #2
        BNEZ    R3, L2          ; B2
        DADD    R2, R0, R0
L2:     DSUBU   R3, R1, R2
        BEQZ    R3, L3          ; B3

Observation: the 3rd branch is correlated with the 1st and 2nd branches:
B1 = NT & B2 = NT => B3 = T

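
The correlation in the aa/bb example can be checked by mirroring the three assembly branches in a few lines. This is a sketch for illustration; the function name and the small range of test values are hypothetical.

```python
def branch_outcomes(aa, bb):
    """Return (B1, B2, B3) taken/not-taken flags for the aa/bb code sequence."""
    b1_taken = aa != 2      # BNEZ R3, L1 skips "aa = 0"
    if not b1_taken:
        aa = 0
    b2_taken = bb != 2      # BNEZ R3, L2 skips "bb = 0"
    if not b2_taken:
        bb = 0
    b3_taken = aa == bb     # BEQZ R3, L3 is taken when aa - bb == 0
    return b1_taken, b2_taken, b3_taken


# Whenever B1 and B2 both fall through, aa and bb are both 0, so B3 is taken.
for aa in range(4):
    for bb in range(4):
        b1, b2, b3 = branch_outcomes(aa, bb)
        if not b1 and not b2:
            assert b3
print(branch_outcomes(2, 2))
```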
Correlating Branches
C code:
B1: if (d == 0)
        d = 1;
B2: if (d == 1)

MIPS code:
        LD      R1, d
        BNEZ    R1, L1          ; B1
        DADDIU  R1, R0, #1
L1:     DADDIU  R3, R1, #-1
        BNEZ    R3, L2          ; B2
        …
L2:

Observation: B1 = NT => B2 = NT

Correlating Branches
Idea: taken/not taken of recently executed branches is related to behavior of next branch (as well as the history of that branch behavior)
– Then behavior of recent branches selects between, say, 4 predictions of next branch, updating just that prediction
• (2,2) predictor: 2-bit global, 2-bit local
[Figure: the branch address (4 bits) selects a row of 2-bit-per-branch local predictors; the 2-bit global branch history (01 = not taken then taken) selects which prediction in that row is used]

Accuracy of Different Schemes
[Bar chart: frequency of mispredictions (0% to 20%) for three predictors: 4,096 entries with 2 bits per entry, unlimited entries with 2 bits per entry, and 1,024 entries (2,2)]

Tournament Predictors:
An example of multilevel branch predictors
• Motivation for correlating branch predictors: the 2-bit predictor failed on important branches
– By adding global information, performance improved
• Tournament predictors: use 2 predictors,
– 1 based on global information and
– 1 based on local information, and combine with a selector
– A 2-bit saturating counter per branch to choose between the two different predictors
• Hopes to select the right predictor for the right branch

The Alpha 21264 Branch Predictor
[State diagram of the 2-bit selector counter, with labels "Local Branch Address" and "2-bit counter": two "Use P1" states and two "Use P2" states (Pi: predictor i). Edge labels: 0/0, 1/0, 1/1 and 0/0, 0/1, 1/1 on the first Use P1 / Use P2 pair; 0/0, 1/1 on the second pair; 1/0 and 0/1 on the edges between states]
• 4K 2-bit counters to choose from among a global predictor and a local predictor
• Counter is incremented whenever the “predicted” predictor is correct. Decremented in the reverse situation.

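
The selector can be sketched as one more 2-bit saturating counter. The code below is an assumption-laden illustration: it reads an edge label a/b as "P1 correct? / P2 correct?", moves the counter only when exactly one predictor was right, and uses hypothetical function names; it is not taken from Alpha documentation.

```python
def update_selector(counter, p1_correct, p2_correct):
    """counter is 0..3; values >= 2 mean "use P1", values < 2 mean "use P2"."""
    if p1_correct and not p2_correct:      # label 1/0: move toward P1
        return min(counter + 1, 3)
    if p2_correct and not p1_correct:      # label 0/1: move toward P2
        return max(counter - 1, 0)
    return counter                         # labels 0/0 and 1/1: no change


def choose(counter, p1_prediction, p2_prediction):
    """Pick the prediction of whichever predictor the selector currently trusts."""
    return p1_prediction if counter >= 2 else p2_prediction


# Starting from "strongly use P1", two rounds where only P2 is right flip the choice.
counter = 3
for _ in range(2):
    counter = update_selector(counter, p1_correct=False, p2_correct=True)
print(counter, choose(counter, p1_prediction=True, p2_prediction=False))
```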
Global Predictor
[Figure: the history of the last 12 branches indexes the 4K-entry global predictor; the selected 2-bit predictor supplies the prediction]
• Global predictor also has 4K entries and is indexed by the history of the last 12 branches; each entry in the global predictor is a standard 2-bit predictor (Ref. Slide #5)
– 12-bit pattern: ith bit 0 => ith prior branch not taken; ith bit 1 => ith prior branch taken

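
A global predictor of this shape can be sketched as 4K 2-bit counters indexed by a 12-bit shift register of outcomes. The names, the initial counter value, and the test pattern below are assumptions made for the illustration.

```python
HISTORY_BITS = 12


class GlobalPredictor:
    """4K 2-bit counters indexed by the outcomes of the last 12 branches."""

    def __init__(self):
        self.counters = [1] * (1 << HISTORY_BITS)
        self.history = 0  # bit 0 holds the most recent outcome (1 = taken)

    def predict(self):
        return self.counters[self.history] >= 2

    def update(self, taken):
        counter = self.counters[self.history]
        self.counters[self.history] = min(counter + 1, 3) if taken else max(counter - 1, 0)
        # Shift the new outcome in and keep only the last 12 outcomes.
        self.history = ((self.history << 1) | int(taken)) & ((1 << HISTORY_BITS) - 1)


def late_mispredictions(pattern, repeats=200):
    """Count mispredictions in the second half of a repeated outcome pattern."""
    predictor = GlobalPredictor()
    outcomes = pattern * repeats
    misses = 0
    for i, taken in enumerate(outcomes):
        if i >= len(outcomes) // 2 and predictor.predict() != taken:
            misses += 1
        predictor.update(taken)
    return misses


print(late_mispredictions([True, True, False]))
```

Once the history register holds only real outcomes, each 12-bit pattern is always followed by the same outcome, so the counters it selects settle and the repeating sequence is predicted without further misses.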
2-Level Local Predictor
• First level: 1024 10-bit entries, each holding the most recent 10 branch outcomes
• Second level: 1K 3-bit counters, indexed by the selected 10-bit history
• Output: local prediction

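
The two levels can be sketched as two small arrays. This is a minimal illustration under stated assumptions: the class name, the `pc >> 2` indexing (4-byte instructions), the zero initial state, and the "predict taken when the counter is at least 4" threshold are choices made for the example.

```python
class LocalPredictor:
    """Two-level local predictor: per-branch 10-bit histories select 3-bit counters."""

    def __init__(self):
        self.histories = [0] * 1024   # level 1: last 10 outcomes per entry
        self.counters = [0] * 1024    # level 2: 3-bit saturating counters (0..7)

    def _entry(self, pc):
        return (pc >> 2) & 1023       # assumed indexing by low PC bits

    def predict(self, pc):
        history = self.histories[self._entry(pc)]
        return self.counters[history] >= 4

    def update(self, pc, taken):
        entry = self._entry(pc)
        history = self.histories[entry]
        counter = self.counters[history]
        self.counters[history] = min(counter + 1, 7) if taken else max(counter - 1, 0)
        self.histories[entry] = ((history << 1) | int(taken)) & 1023


# A loop branch taken 4 times and then not taken, repeated many times.
predictor = LocalPredictor()
outcomes = ([True] * 4 + [False]) * 200
late_misses = 0
for i, taken in enumerate(outcomes):
    if i >= 500 and predictor.predict(0x1000) != taken:
        late_misses += 1
    predictor.update(0x1000, taken)
print(late_misses)
```

Because the 10-bit history is longer than the 5-iteration period, the loop exit gets its own history pattern and is predicted correctly after warm-up, which a single counter per branch cannot do.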
% of predictions from local predictor in Tournament Prediction Scheme

Benchmark    From local predictor
nasa7          98%
matrix300     100%
tomcatv        94%
doduc          90%
spice          55%
fpppp          76%
gcc            72%
espresso       63%
eqntott        37%
li             69%

Accuracy of Branch Prediction
Branch prediction accuracy

Benchmark   Profile-based   2-bit counter   Tournament
tomcatv          99%             99%           100%
doduc            95%             84%            97%
fpppp            86%             82%            98%
li               88%             77%            98%
espresso         86%             82%            96%
gcc              88%             70%            94%

• Profile: branch profile from last execution (static in that it is encoded in the instruction, but based on a profile)

Accuracy v. Size (SPEC89)
[Line chart: conditional branch misprediction rate (0% to 10%) versus total predictor size (0 to 128 Kbits) for local, correlating, and tournament predictors]

Need Address at Same Time as Prediction
• Branch Target Buffer/Cache (BTB): address of branch indexes the buffer to get prediction AND branch address (if taken)
– Note: must check for branch match now, since can’t use wrong branch address
[Figure: the PC of the instruction in FETCH is compared (=?) with the Branch PC field of the BTB; each entry holds Branch PC, Predicted PC, and extra prediction state bits]
– Yes: instruction is a branch; use the predicted PC as the next PC
– No: branch not predicted; proceed normally (Next PC = PC+4)

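
The lookup-with-match-check can be sketched as a small direct-mapped table. The class below is an illustration only: the names, the 16-entry size, and the assumption of 4-byte instructions are all hypothetical, and the extra prediction state bits are left out.

```python
class BranchTargetBuffer:
    """Direct-mapped BTB sketch: each slot stores (branch PC, predicted PC)."""

    def __init__(self, entries=16):
        self.entries = entries
        self.table = [None] * entries

    def _slot(self, pc):
        return (pc >> 2) % self.entries

    def next_pc(self, pc):
        entry = self.table[self._slot(pc)]
        # The "=?" check: only trust the entry if it belongs to this exact PC.
        if entry is not None and entry[0] == pc:
            return entry[1]
        return pc + 4  # not predicted as a branch: proceed normally

    def record_taken(self, branch_pc, target_pc):
        self.table[self._slot(branch_pc)] = (branch_pc, target_pc)


btb = BranchTargetBuffer()
print(hex(btb.next_pc(0x100)))        # miss: falls through to PC+4
btb.record_taken(0x100, 0x200)
print(hex(btb.next_pc(0x100)))        # hit: predicted target
print(hex(btb.next_pc(0x140)))        # same slot, different PC: match check fails
```

The last lookup shows why the address check matters here: 0x140 maps to the same slot as 0x100, and without the comparison a non-branch would be redirected to the wrong address.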
Branch Folding
• In the BT-buffer, store one or more target instructions instead of the predicted PC
– Obtain zero cycle unconditional branches
– Sometimes, zero cycle conditional branches
[Pipeline diagram: the unconditional branch and the decoded target instruction each pass through IF ID EX MEM WB]
– Multi-issue: BT-buffer needs to supply multiple instructions

Integrated Instruction Fetch Units
• IF unit in multi-issue processors integrates:
– Integrated Branch Prediction
– Instruction Prefetch
– Instruction Memory Access and Buffering

Return Address Predictors

• Predicting indirect jumps: jumps whose destination address varies at run time
• Register-indirect branches: the target address is hard to predict
• In SPEC89, 85% of such branches are procedure returns
• Since procedures follow a stack discipline, save the return address in a small buffer that acts like a stack: 8 to 16 entries gives a small miss rate

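
The small stack-like buffer can be sketched in a few lines. The class and method names below are hypothetical, and dropping the oldest entry on overflow is one possible policy chosen for the example.

```python
class ReturnAddressStack:
    """Fixed-size return address predictor that behaves like a small stack."""

    def __init__(self, capacity=8):
        self.capacity = capacity
        self.stack = []

    def on_call(self, return_address):
        if len(self.stack) == self.capacity:
            self.stack.pop(0)          # buffer full: drop the oldest entry
        self.stack.append(return_address)

    def predict_return(self):
        # Predict the most recent call's return address; None if the buffer is empty.
        return self.stack.pop() if self.stack else None


ras = ReturnAddressStack(capacity=2)
for address in (0x104, 0x208, 0x30C):  # three nested calls overflow a 2-entry buffer
    ras.on_call(address)
print([ras.predict_return() for _ in range(3)])
```

The innermost returns are still predicted correctly after the overflow; only the outermost return, whose entry was dropped, has no prediction.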
Dynamic Branch Prediction Summary
• Prediction is becoming an important part of scalar execution
• Branch History Table: 2 bits for loop accuracy
• Correlation: recently executed branches are correlated with the next branch
– Either different branches
– Or different executions of the same branch
• Tournament Predictor: devote more resources to competing solutions and pick between them
• Branch Target Buffer: includes branch address and prediction
• Predicated Execution can reduce the number of branches and the number of mispredicted branches
• Return address stack for prediction of indirect jumps
