Pipeline - Instr - Super Branch

This document discusses techniques for handling branches in pipelined processors. It describes three types of data hazards - RAW, WAW, and WAR. It also discusses instruction pipelining, the Tomasulo algorithm for out-of-order execution, branch prediction strategies like 1-bit and 2-bit prediction, and delayed branch scheduling. The key advantages of Tomasulo's scheme are distributed hazard detection logic and elimination of stalls for WAW and WAR hazards. Dynamic branch prediction aims to reduce penalties from mispredicted branches.

Uploaded by

SHEENA Y

COMPUTER SYSTEM ARCHITECTURE - CS 405

Module - 5 Part - 1
Instruction Pipeline Design
Data Hazard Classification - RAW
• Three types of data hazards
• Instruction i comes before instruction j
• D(I) is the domain (input set) of instruction I; R(I) is its range (output set)
– RAW (Read After Write): R(I) ∩ D(J) ≠ ∅
(flow dependence)
• j tries to read a source before i writes it, so j incorrectly
gets the old value. Solve via forwarding.
Data Hazard Classification - WAW
– WAW (Write After Write): R(I) ∩ R(J) ≠ ∅
(output dependence)
• j tries to write an operand before it is written by i, so
we end up writing values in the wrong order
• Only occurs if there are writes in multiple stages
– Not a problem with single-cycle integer
instructions
Data Hazard Classification - WAR
• WAR (Write After Read): D(I) ∩ R(J) ≠ ∅
(anti dependence)
– j tries to write a destination before it is read by i, so i incorrectly gets
the new value
– For this to happen we need a pipeline that writes results early in the
pipeline, while other instructions read a source later in the pipeline
• RAR : Read After Read
– Is this a hazard? No: two reads do not change the operand, so their order does not matter.
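The three set conditions can be checked mechanically. The following is a minimal sketch, assuming each instruction is described only by the set of registers it reads (its domain D) and the set it writes (its range R); the function name and the sample instructions are illustrative and do not come from the slides.

```python
def classify_hazards(d_i, r_i, d_j, r_j):
    """Return the data hazards between instruction i and a later instruction j.

    d_* is the domain (registers read); r_* is the range (registers written).
    """
    hazards = []
    if r_i & d_j:
        hazards.append("RAW")  # j reads what i writes (flow dependence)
    if r_i & r_j:
        hazards.append("WAW")  # both write the same register (output dependence)
    if d_i & r_j:
        hazards.append("WAR")  # j writes what i reads (anti dependence)
    return hazards


# i: ADD R1, R2, R3   (reads R2, R3; writes R1)
# j: SUB R4, R1, R5   (reads R1, R5; writes R4)
print(classify_hazards({"R2", "R3"}, {"R1"}, {"R1", "R5"}, {"R4"}))  # ['RAW']

# Two instructions that only share a source register (RAR) produce no hazard.
print(classify_hazards({"R1"}, {"R2"}, {"R1"}, {"R3"}))  # []
```

Whether a reported dependence actually causes a stall still depends on the pipeline, for example on which stages read and write registers.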
Instruction / Code scheduling
• Code scheduling
– To reduce pipeline stalls
– To increase ILP (instruction level parallelism)
Tomasulo Organization
[Figure: block diagram of the Tomasulo organization. Labels: From Mem, FP Op Queue, FP Registers, Load Buffers (Load1 to Load6), Store Buffers, To Mem, Reservation Stations (Add1 to Add3, Mult1 and Mult2), FP adders, FP multipliers, Common Data Bus (CDB).]

Tomasulo Algorithm
• Control & buffers distributed with Function Units (FU)
– FU buffers are called “reservation stations”; they hold pending operands
• Registers in instructions are replaced by values or
pointers to reservation stations (RS)
– a form of register renaming
– avoids WAR, WAW hazards
– More reservation stations than registers, so it can do optimizations compilers
can’t
• Results go to FUs from RS, not through registers, over a
Common Data Bus that broadcasts results to all FUs
• Loads and stores are treated as FUs with reservation stations
• Instructions can go past branches
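To see why replacing register names with reservation-station pointers removes WAR and WAW hazards, here is a minimal sketch of the renaming step alone. It is not a full Tomasulo simulator: the tag names (`RS0`, `RS1`, ...), the tuple encoding, and the sample program are assumptions made for illustration, and a real implementation would also copy operand values that are already available into the station at issue time.

```python
def rename(instructions):
    """Rename destinations to unique tags and point sources at their producers.

    Each instruction is (op, dest, src1, src2) using architectural registers.
    """
    latest = {}   # architectural register -> tag of its most recent producer
    renamed = []
    for n, (op, dest, src1, src2) in enumerate(instructions):
        # A source with a pending producer becomes a pointer to that producer;
        # otherwise it is read from the register file, so the name is kept.
        srcs = tuple(latest.get(s, s) for s in (src1, src2))
        tag = f"RS{n}"          # hypothetical tag for this instruction's result
        latest[dest] = tag
        renamed.append((op, tag, *srcs))
    return renamed


program = [
    ("DIV", "F0", "F2", "F4"),
    ("ADD", "F6", "F0", "F8"),    # RAW on F0 with DIV
    ("SUB", "F8", "F10", "F14"),  # WAR on F8 with ADD
    ("MUL", "F6", "F10", "F8"),   # WAW on F6 with ADD, RAW on F8 with SUB
]
for instr in rename(program):
    print(instr)
```

After renaming, every destination is a distinct tag, so the WAR on F8 and the WAW on F6 disappear, while the true RAW dependences remain visible as tag references (`RS0` in the ADD, `RS2` in the MUL).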
How Tomasulo overlaps loop iterations
• Register renaming
– Multiple iterations use different physical destinations for registers (dynamic
loop unrolling).
• Reservation stations
– Instructions advance past integer control flow operations
– Buffer old values of registers, avoiding the WAR stall seen in the scoreboard.
Tomasulo
• For IBM 360/91 (before caches!)
• Goal: High Performance without special compilers
• Small number of floating point registers (4 in 360)
prevented interesting compiler scheduling of
operations
– Tomasulo: how to get more effective registers: renaming in hardware!
• Same idea used today
– HP 8000, MIPS 10000, Core xx, Power 4,5,6, 7…
Tomasulo’s scheme offers 2 major advantages
(1) the distribution of the hazard detection logic
– distributed reservation stations and the CDB
– If multiple instructions are waiting on a single result, and each instruction has its other
operand, then the instructions can be released simultaneously by broadcast on the
CDB
– If a centralized register file were used, the units would have to read their
results from the registers when register buses are available.
(2) the elimination of stalls for WAW and WAR hazards
Branch handling techniques
• The action of fetching a non-sequential or remote instruction after a branch
instruction is called branch taken
• The instruction to be executed after a branch taken is called the branch target
• The number of pipeline cycles wasted between a branch taken and the fetching of
its branch target is called the delay slot (b)
0 ≤ b ≤ k-1, where k is the number of pipeline stages
All instructions after the branch in the pipeline are flushed, losing a number
of useful cycles.
p = probability of a conditional branch instruction (20%)
q = probability of a successfully executed (taken) branch (60%)
penalty = pqnbτ for a stream of n instructions, where τ is the pipeline cycle time (each taken branch costs bτ extra)
If b = k-1 = 7, pipeline performance can be degraded by about 46% by branching when the
instruction stream is sufficiently long.
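The 46% figure can be reproduced with a short calculation. This is a sketch under the usual assumption that, without branches, a k-stage pipeline needs (k + n - 1) cycles for n instructions, so with the branch penalty it needs (k + n - 1 + pqnb) cycles. For large n the time per instruction grows from 1 cycle to (1 + pqb) cycles, and the fractional throughput loss is pqb / (1 + pqb). The function name is illustrative.

```python
def branch_degradation(p, q, b):
    """Fractional throughput loss for a long instruction stream (n -> infinity).

    p: probability that an instruction is a conditional branch
    q: probability that such a branch is taken
    b: delay slot, in pipeline cycles lost per taken branch
    """
    extra_cycles_per_instr = p * q * b
    return extra_cycles_per_instr / (1 + extra_cycles_per_instr)


# Values from the text: p = 20%, q = 60%, b = k - 1 = 7
print(f"{branch_degradation(0.20, 0.60, 7):.1%}")  # 45.7%
```

With these values pqb = 0.84, so the loss is 0.84 / 1.84, which rounds to the 46% quoted.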
Branch Handling Techniques
Dynamic Hardware Prediction
Dynamic branch prediction is the ability of the hardware to make
an educated guess about which way a branch will go: will the
branch be taken or not?
The hardware can look for clues based on the instructions, or it
can use past history.
In the simple 5-stage MIPS pipeline, predict-not-taken is a simple
prediction strategy. This is acceptable since the misprediction penalty
is small.
If the penalty is large (as in many deeply pipelined machines or
superscalar processors), the processor cannot afford to make frequent
incorrect predictions.
The predictions have to be more sophisticated.
Some popular schemes are:
– 1-bit / 2-bit prediction using a Branch Prediction Buffer or
Branch Target Buffer
Branch Prediction
The buffer is indexed by the last few bits of the address of the branch
instruction.
The buffer is read in the “D” (decode) phase. The penalty for a wrong prediction depends on
when the PC is calculated.
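Indexing by the low-order address bits can be sketched as follows. The table size of 1024 entries and the 4-byte instruction alignment are assumptions chosen for illustration, as are the function name and the sample addresses.

```python
def bpb_index(pc, entries=1024):
    """Index a branch prediction buffer with low-order bits of the branch address.

    Assumes 4-byte aligned instructions (the two lowest bits carry no
    information and are dropped) and a power-of-two number of entries.
    """
    return (pc >> 2) & (entries - 1)


print(bpb_index(0x00400010))  # 4
print(bpb_index(0x00401010))  # 4
```

The two addresses differ only in a higher bit, so they map to the same entry. Because the buffer stores no tag, such branches share one prediction, which is a cost of using only the last few address bits.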
Dynamic Branch Prediction
• Performance = ƒ(accuracy, cost of misprediction)
• Branch History Table (BHT): lower bits of the PC address index a table of 1-bit values
– Says whether or not the branch was taken last time
• Problem: in a loop, a 1-bit BHT will cause two mispredictions:
– End of loop case, when it exits instead of looping as before
– First time through the loop on the next time through the code, when it predicts
exit instead of looping

[Figure: branch history table. Labels: Address (bits 31 to 0), Bits 13 - 2, table entries 0 to 1023, Prediction.]
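Dynamic Branch Prediction
• A 1-bit scheme for dynamic branch prediction
for (i = 10; i > 0; i = i - 1)
x := x + 1
In a 1-bit BHT, a history bit is associated with the branch instruction.
The bit is changed as follows: it is set when the branch is taken and cleared when it is not, so it always records the most recent outcome.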
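The behaviour of the 1-bit scheme on this loop can be simulated directly. This sketch assumes the loop is compiled with a single loop-closing branch at the bottom, which is taken 9 times and then falls through once; the function name and the initial state of the history bit are also assumptions.

```python
def one_bit_mispredictions(outcomes, bit=True):
    """Count mispredictions of a 1-bit predictor; True means 'taken'.

    Returns (number of mispredictions, final history bit).
    """
    misses = 0
    for taken in outcomes:
        if bit != taken:
            misses += 1
        bit = taken  # the bit always records the most recent outcome
    return misses, bit


# Loop-closing branch for the 10-iteration loop: taken 9 times, then not taken.
loop = [True] * 9 + [False]

bit = True  # assumed initial state: predict taken
for run in range(3):
    misses, bit = one_bit_mispredictions(loop, bit)
    print(f"run {run}: {misses} mispredictions")
# run 0: 1 mispredictions
# run 1: 2 mispredictions
# run 2: 2 mispredictions
```

From the second execution of the loop onward there are two mispredictions per run: one at the exit, and one on the first iteration because the bit still records the previous exit.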
Dynamic Branch Prediction
• Solution:
• 2-bit scheme where the prediction changes only after two mispredictions in a row:
• Only wrong once for branches that go in an unusual direction
once (e.g. loop)
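One common way to realise a 2-bit scheme is a saturating counter, sketched below. The counter encoding (0 to 3, predict taken when the counter is 2 or 3) is an assumption, since the slides do not give the state diagram; the loop outcome pattern assumes a loop-closing branch taken 9 times and then not taken.

```python
def two_bit_mispredictions(outcomes, counter=3):
    """Count mispredictions of a 2-bit saturating counter; True means 'taken'.

    counter ranges over 0..3; values 2 and 3 predict taken.
    Returns (number of mispredictions, final counter value).
    """
    misses = 0
    for taken in outcomes:
        if (counter >= 2) != taken:
            misses += 1
        # Move one step toward the observed outcome, saturating at 0 and 3.
        counter = min(counter + 1, 3) if taken else max(counter - 1, 0)
    return misses, counter


loop = [True] * 9 + [False]

counter = 3  # assumed initial state: strongly taken
for run in range(3):
    misses, counter = two_bit_mispredictions(loop, counter)
    print(f"run {run}: {misses} mispredictions")
# run 0: 1 mispredictions
# run 1: 1 mispredictions
# run 2: 1 mispredictions
```

The single not-taken outcome at loop exit only weakens the counter from 3 to 2, so the next run still predicts taken on its first iteration and only the exit is mispredicted.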
Delayed Branches

Limitations on delayed-branch scheduling:

- restrictions on the instructions that are scheduled into the
delay slots
- ability to predict at compile time whether a branch is likely
to be taken or not.
