CA L7 Unit4 Slides Updated


Computer Architecture

(EE 371 / CE 321 / CS 330)

Dr. Farhan Khan


Assistant Professor,
Dhanani School of Science & Engineering,
Habib University

1
Outline: Unit 4

• Pipelined Processor Implementation

• Pipeline Hazards

• Pipelined Datapath

• Control of Pipelined Datapath

2
Combination of Combinational and Sequential Logic
Elements

• Because only state elements can store a data value, any collection of
combinational logic must have its inputs come from a set of state
elements and its outputs written into a set of state elements

• When two state elements surround a block of combinational logic


– All signals must propagate from state element 1, through the combinational logic, and
to state element 2 in the time of one clock cycle.
– The time necessary for the signals to reach state element 2 defines the length of the
clock cycle

3
Datapath with Control

4
RISC-V Instruction Execution Steps

• RISC-V instructions classically take five steps:


1. IF: Fetch instruction from memory.
2. ID: Decode instruction and read registers
3. EX: Execute the operation or calculate an address.
4. MEM: Access an operand in data memory (if necessary).
5. WB: Write the result back into a register (if necessary).

5
Steps in RISC-V Instruction Execution

6
Steps in RISC-V Instruction Execution

ld x1, 100(x4)
ld x2, 200(x4)
ld x3, 300(x4)

7
Steps in RISC-V Instruction Execution

ld x1, 100(x4)
ld x2, 200(x4)
ld x3, 300(x4)

Clock Cycle: 1

8
Steps in RISC-V Instruction Execution

ld x1, 100(x4)
ld x2, 200(x4)
ld x3, 300(x4)

Clock Cycle: 2

9
Steps in RISC-V Instruction Execution

ld x1, 100(x4)
ld x2, 200(x4)
ld x3, 300(x4)

Clock Cycle: 3

10
Processor Implementation

• Classical performance equation (a quick numerical sketch follows the list below):

CPU Time = Instruction Count × CPI × Clock Cycle Time

• Instruction Count
– Determined by Compiler and Instruction Set Architecture
• CPI and Clock cycle time
– Determined by Processor Implementation
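
A minimal sketch of the equation in plain Python; the numbers are illustrative only, not taken from any particular processor:

# Classical performance equation: CPU time = instruction count x CPI x clock cycle time
def cpu_time_ps(instruction_count, cpi, clock_cycle_ps):
    return instruction_count * cpi * clock_cycle_ps

# Example: 1,000,000 instructions, CPI = 1, 800 ps clock cycle
print(cpu_time_ps(1_000_000, 1, 800))  # 800000000 ps = 0.8 ms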

11
Performance Issues of Single Cycle Processor Design

• CPI is 1.

• However, clock cycle time is determined by the longest possible path/delay in the datapath.
– Critical Path: Load Instruction
– Load instruction uses five functional units in series: instruction memory → register file → ALU → data memory → register file

12
Steps in RISC-V Instruction Execution

200 ps 100 ps 200 ps 200 ps 100 ps

13
RISC-V Performance: Single Cycle

• Assume that the operation times for the major functional units of the datapath are:
– 100 ps for register read or write
– 200 ps for memory access for instructions or data
– 200 ps for ALU operation

Single-Cycle Implementation
• Clock cycle time must support the longest instruction (i.e. ld)
• Clock Cycle Time: 800 ps
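
As a cross-check, the 800 ps clock period is just the sum of the functional-unit delays along the ld critical path. A minimal sketch, using the delays assumed above:

# Functional-unit delays (ps) along the load critical path, as assumed on this slide.
load_critical_path_ps = {
    "instruction memory": 200,
    "register file read": 100,
    "ALU": 200,
    "data memory": 200,
    "register file write": 100,
}
print(sum(load_critical_path_ps.values()))  # 800 ps -> single-cycle clock period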

14
Steps in RISC-V Instruction Execution

200 ps 100 ps 200 ps 200 ps 100 ps

15
Clock Cycle = 800 ps
RISC-V Performance: Single Cycle

Single-cycle

16
Performance Issues of Single Cycle Processor Design

• CPI is 1.

• However, clock cycle time is determined by the longest possible path/delay in the datapath.
– Critical Path: Load Instruction
– Load instruction uses five functional units in series: instruction memory → register file → ALU → data memory → register file

• Although CPI is 1, the overall performance of a single-cycle processor implementation is poor because the clock cycle is too long.

17
Performance Issues of Single Cycle Processor Design

• CPI is 1.

• However, clock cycle time is determined by the longest possible path/delay in the datapath.
– Critical Path: Load Instruction
– Load instruction uses five functional units in series: instruction memory → register file → ALU → data memory → register file

• Although CPI is 1, the overall performance of a single-cycle processor implementation is poor because the clock cycle is too long.

• We can improve performance by Pipelining

18
Pipelining Analogy

19
Pipelining Analogy

• Pipelining improves throughput of our laundry system.


• Pipelining would not decrease the time to complete one load of laundry, but when we have many loads of laundry to do, the improvement in throughput decreases the total time to complete the work.

20
RISC-V Pipeline

• RISC-V instructions classically take five steps:


1. IF: Fetch instruction from memory.
2. ID: Decode instruction and read registers
3. EX: Execute the operation or calculate an address.
4. MEM: Access an operand in data memory (if necessary).
5. WB: Write the result back into a register (if necessary).

• Hence, the first RISC-V pipeline we explore has five stages: one step per
stage

21
Steps in RISC-V Instruction Execution

22
Steps in RISC-V Instruction Execution

ld x1, 100(x4)
ld x2, 200(x4)
ld x3, 300(x4)

23
Steps in RISC-V Instruction Execution

ld x1, 100(x4)
ld x2, 200(x4)
ld x3, 300(x4)

Clock Cycle: 1

24
Steps in RISC-V Instruction Execution

ld x1, 100(x4)
ld x2, 200(x4)
ld x3, 300(x4)

Clock Cycle: 2

25
Steps in RISC-V Instruction Execution

ld x1, 100(x4)
ld x2, 200(x4)
ld x3, 300(x4)

Clock Cycle: 3

26
RISC-V Pipeline Performance

• Assume that the operation times for the major functional units of the datapath are:
– 100 ps for register read or write
– 200 ps for memory access for instructions or data
– 200 ps for ALU operation

27
Steps in RISC-V Instruction Execution

200 ps 100 ps 200 ps 200 ps 100 ps

28
Steps in RISC-V Instruction Execution

200 ps 100 ps 200 ps 200 ps 100 ps

29
Clock Cycle = 800 ps
Steps in RISC-V Instruction Execution: Pipelined

200 ps 100 ps 200 ps 200 ps 100 ps

30
Steps in RISC-V Instruction Execution: Pipelined

200 ps 100 ps 200 ps 200 ps 100 ps

Clock Cycle = 200 ps

31


RISC-V Pipeline Performance

• Assume that the operation times for the major functional units of the datapath are:
– 100 ps for register read or write
– 200 ps for memory access for instructions or data
– 200 ps for ALU operation

Single-Cycle Implementation
• Clock cycle time must support the longest instruction (i.e. ld)
• Clock Cycle Time: 800 ps

Pipelined Implementation
• Each clock must be able to support the slowest stage
• Clock Cycle Time: 200 ps

32
RISC-V Pipeline

Single-cycle

Pipelined

33
RISC-V Pipeline

Single-cycle

Pipelined

Single-Cycle Implementation: Time between instructions = 800 ps
Pipelined Implementation: Time between instructions = 200 ps

34
RISC-V Pipeline

Single-cycle

Pipelined

Single-Cycle Implementation
Total execution time for three instructions = 2400 ps

Pipelined Implementation
Total execution time for three instructions = 1400 ps

Improvement = 2400/1400 ≈ 1.71

35


RISC-V Pipeline

Single-cycle

Pipelined

Single-Cycle Implementation
Total execution time for 1,000,003 instructions
= (1,000,000 × 800) + 2400 ps = 800,002,400 ps

Pipelined Implementation
Total execution time for 1,000,003 instructions
= (1,000,000 × 200) + 1400 ps = 200,001,400 ps

Improvement = 800,002,400 / 200,001,400 ≈ 4.00

36
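
A minimal sketch that reproduces the arithmetic above (five pipeline stages, 800 ps single-cycle clock and 200 ps pipeline clock, as assumed on the earlier slides):

# Compare single-cycle and pipelined execution times for n instructions.
def single_cycle_ps(n, cycle_ps=800):
    return n * cycle_ps

def pipelined_ps(n, cycle_ps=200, stages=5):
    # The first instruction takes 'stages' cycles; each later one finishes one cycle apart.
    return (stages + (n - 1)) * cycle_ps

for n in (3, 1_000_003):
    sc, pl = single_cycle_ps(n), pipelined_ps(n)
    print(n, sc, pl, round(sc / pl, 2))
# 3         -> 2400 ps vs 1400 ps, improvement ~1.71
# 1,000,003 -> 800,002,400 ps vs 200,001,400 ps, improvement ~4.00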
RISC-V Pipeline

Single-cycle

Pipelined

• Pipelining improves performance by increasing instruction throughput, in contrast to decreasing the execution time of an individual instruction.

• Instruction throughput is the important metric because real programs execute millions of instructions.

37
Performance Improvement through Pipelining

• Under ideal conditions and if the pipeline stages are balanced (i.e. all the stages take the same operational time):

Time between instructions (pipelined) = Time between instructions (non-pipelined) / Number of pipeline stages

38
Performance Improvement through Pipelining

• Under ideal conditions and if the pipeline stages are balanced (i.e. all the stages take the same operational time):

Time between instructions (pipelined) = Time between instructions (non-pipelined) / Number of pipeline stages

• Due to unbalanced pipeline stages and pipelining overheads (discussed in the next lectures), this formula only provides an upper limit on the performance improvement (see the sketch below).
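
A small sketch contrasting the upper-limit speedup given by this formula with the speedup actually achieved above (stage times as assumed on the earlier slides):

# Ideal (balanced-stage) speedup vs. the speedup actually observed.
nonpipelined_instr_time_ps = 800        # single-cycle clock
stages = 5
ideal_instr_time_ps = nonpipelined_instr_time_ps / stages   # 160 ps if stages were balanced
actual_instr_time_ps = 200              # limited by the slowest (200 ps) stage

print(nonpipelined_instr_time_ps / ideal_instr_time_ps)     # 5.0 -> upper limit
print(nonpipelined_instr_time_ps / actual_instr_time_ps)    # 4.0 -> achieved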

39
Steps in RISC-V Instruction Execution

200 ps 100 ps 200 ps 200 ps 100 ps

40
Steps in RISC-V Instruction Execution: Unbalanced Pipeline

200 ps 100 ps 200 ps 200 ps 100 ps

41
Steps in RISC-V Instruction Execution

200 ps 200 ps 200 ps 200 ps 200 ps

42
Steps in RISC-V Instruction Execution: Balanced Pipeline

200 ps 200 ps 200 ps 200 ps 200 ps

43
Pipeline Hazards

• These are situations in pipelining when the next instruction cannot execute
in the following clock cycle.

44
Pipeline Hazards

• These are situations in pipelining when the next instruction cannot execute
in the following clock cycle.

• Consider the following instruction sequence:


add x19, x0, x1
sub x2, x19, x3

45
Pipeline Hazards

• These are situations in pipelining when the next instruction cannot execute
in the following clock cycle.

• Consider the following instruction sequence:


add x19, x0, x1
sub x2, x19, x3

46
Pipeline Hazards

• These are situations in pipelining when the next instruction cannot execute
in the following clock cycle.

• There are 3 types of pipeline hazards

1. Structural Hazard
– A required hardware resource is busy

2. Data Hazard
– Data that is needed to execute the next instruction has not yet become available

3. Control Hazard
– Deciding on control action (or instruction sequence) depends on previous instruction that
has not yet been completed

47
Structural Hazards

• A planned instruction cannot execute in the proper clock cycle because


a hardware resource is busy due to a previous instruction

• Example
– In RISC-V pipeline with a single memory, Load/store requires data access to this single
memory

– Instruction fetch (for a future instruction) would have to stall for that cycle

– Hence, pipelined datapaths require separate instruction/data memories or separate instruction/data caches

48
Steps in RISC-V Instruction Execution

ld x1, 100(x4)
ld x2, 200(x4)
ld x3, 300(x4)

Clock Cycle: 3

49
RISC-V Pipeline

Single-cycle

Pipelined

50
Data Hazards

• In a processor pipeline, data hazards arise from the dependence of one


instruction on data that is yet to be produced by an earlier instruction (still
in the pipeline).

51
Data Hazards

• In a processor pipeline, data hazards arise from the dependence of one


instruction on data that is yet to be produced by an earlier instruction (still
in the pipeline).

• Example
– add x19, x0, x1
sub x2, x19, x3

52
Solving Data Hazards: Forwarding (Bypassing)

• Use result when it is computed


– Don’t wait for it to be stored in a register
– Requires extra connections in the datapath

• Example
– add x19, x0, x1
sub x2, x19, x3

53
Solving Data Hazards: Limitations of Forwarding

• Forwarding cannot always avoid pipeline stalls.


– Value might not even be computed when it is needed

• Load-Use Data Hazard


– ld x1, 0(x2)
sub x4, x1, x5

54
Solving Data Hazards: Limitations of Forwarding

• Forwarding cannot always avoid pipeline stalls.


– Value might not even be computed when it is needed

• Load-Use Data Hazard


– ld x1, 0(x2)
sub x4, x1, x5

55
Solving Data Hazards: Code Scheduling

• Reorder code to avoid use of load result in the next instruction


• C code:
– a = b + e; c = b + f;

56
Solving Data Hazards: Code Scheduling

• Reorder code to avoid use of load result in the next instruction


• C code:
– a = b + e; c = b + f;

57
Solving Data Hazards: Code Scheduling

• Reorder code to avoid use of load result in the next instruction


• C code:
– a = b + e; c = b + f;
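
A minimal sketch of why scheduling helps: it counts load-use stalls, assuming full forwarding so a stall occurs only when an instruction reads a register loaded by the immediately preceding ld. The register and offset choices are illustrative, not taken from the slides:

# Instructions are (opcode, destination, sources).
def load_use_stalls(seq):
    stalls = 0
    for prev, curr in zip(seq, seq[1:]):
        if prev[0] == "ld" and prev[1] in curr[2]:
            stalls += 1
    return stalls

unscheduled = [
    ("ld",  "x1", []),            # x1 = b
    ("ld",  "x2", []),            # x2 = e
    ("add", "x3", ["x1", "x2"]),  # a = b + e  (uses x2 right after its load)
    ("sd",  None, ["x3"]),
    ("ld",  "x4", []),            # x4 = f
    ("add", "x5", ["x1", "x4"]),  # c = b + f  (uses x4 right after its load)
    ("sd",  None, ["x5"]),
]
# Move the load of f up so neither add immediately follows the load it depends on.
scheduled = [unscheduled[i] for i in (0, 1, 4, 2, 3, 5, 6)]

print(load_use_stalls(unscheduled))  # 2 stalls
print(load_use_stalls(scheduled))    # 0 stalls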

58
Control Hazards

• Deciding on next instruction (to be fetched) in the instruction sequence


depends on previous branch instruction that has not yet been completed

59
Control Hazards

• Deciding on next instruction (to be fetched) in the instruction sequence


depends on previous branch instruction that has not yet been completed

60
Steps in RISC-V Instruction Execution

ld x1, 100(x4)
ld x2, 200(x4)
ld x3, 300(x4)

61
Steps in RISC-V Instruction Execution

Let’s add extra hardware so we can test a register, calculate the branch address, and update the PC during the second stage of the pipeline.

62
Control Hazards

• Deciding on next instruction (to be fetched) in the instruction sequence


depends on previous branch instruction that has not yet been completed

Even in the presence of additional hardware (mentioned in the last slide), there is a pipeline stall.

63
Handling Control Hazards: Branch Prediction

• Predict the outcome of the conditional branch


– When the prediction turns out to be right, the pipeline proceeds at full speed.
– When the prediction turns out to be wrong, the pipeline control must ensure that the
instructions following the wrongly guessed conditional branch have no effect and must
restart the pipeline from the proper branch address

64
Steps in RISC-V Instruction Execution

beq x1,x4, 100


ld x2, 200(x4)
ld x3, 300(x4)

Clock Cycle: 3

65
Handling Control Hazards: Types of Branch Prediction

• Static Branch Prediction:


– Rigid branch prediction rules based on typical branch behavior
Examples
1. Predict that conditional branches will always be untaken
2. At the bottom of loops are conditional branches that branch back to the top of the loop.
Since they are likely to be taken and they branch backward, we could always predict
taken for conditional branches that branch to an earlier address.

• Dynamic Branch Prediction


– Hardware measures actual branch behavior. Then, recent history is used to predict the
outcome of a branch

66
Designing Instruction Set Architecture (ISA) for Pipelining

• The RISC-V ISA has been designed for pipelining, as can be seen from the following design choices:

1) RISC-V instructions are the same length.


– This restriction makes it much easier to fetch instructions in the first pipeline stage and to
decode them in the second stage.

– In an instruction set like the x86, where instructions vary from 1 byte to 15 bytes,
pipelining is considerably more challenging.

– Modern implementations of the x86 architecture actually translate x86 instructions into
simple operations that look like RISC-V instructions and then pipeline the simple
operations rather than the native x86 instructions!

67
Designing Instruction Set Architecture (ISA) for Pipelining

• The RISC-V ISA has been designed for pipelining, as can be seen from the following design choices:

2) RISC-V only has a few instruction formats, with the source and
destination register fields being located in the same place in each
instruction
– This design choice makes it much easier to decode instructions and read registers in one
step.

68
Designing Instruction Set Architecture (ISA) for Pipelining

• The RISC-V ISA has been designed for pipelining, as can be seen from the following design choices:

3) Memory operands only appear in loads or stores in RISC-V.


– This restriction means we can use the execute stage to calculate the memory address and
then access memory in the following stage.

– If we could operate on the operands in memory, as in the x86, stages 3 and 4 would
expand to an address stage, memory stage, and then execute stage.

– We will shortly see the challenges of longer pipelines

69
Pipelining: Overview

• Pipelining improves performance by increasing instruction throughput


– Executes multiple instructions in parallel
– Each instruction has the same latency

• Subject to hazards
– Structure, data, control

• Instruction set design affects complexity of pipeline implementation

70
Pipelining Hazards: Exercise
ld x10, 0(x10)
add x11, x10, x10

71
Pipelining Hazards: Exercise
add x11, x10, x10
addi x12, x10, 5
addi x14, x11, 5

72
Pipeline Diagrams

• Single Clock-Cycle Pipeline Diagrams


– Shows pipeline usage in a single cycle
– Highlights resources used

• Multi Clock-Cycle Pipeline Diagram


– Graph of operation over time

73
Single Clock-Cycle Pipeline Diagram: Example

• Focusing on a single instruction’s passage through pipeline stages

74
Single Clock-Cycle Pipeline Diagram: Example

• State of pipeline in a given clock cycle

75
Multiple Clock-Cycle Pipeline Diagram: Example

• The form showing resource usage

76
Multiple Clock-Cycle Pipeline Diagram: Example

• Traditional form

77
Pipeline Operation for Load Instruction: IF

78
Pipeline Operation for Load Instruction: ID

79
Pipeline Operation for Load Instruction: EX

80
Pipeline Operation for Load Instruction: MEM

81
Pipeline Operation for Load Instruction: WB

82
Flaw in Pipeline Operation for Load Instruction: WB

Wrong
register
number

83
Corrected Pipelined Datapath for Load Instruction

84
Pipeline Operation for Store Instruction: IF

85
Pipeline Operation for Store Instruction: ID

86
Pipeline Operation for Store Instruction: EX

87
Pipeline Operation for Store Instruction: MEM

88
Pipeline Operation for Store Instruction: WB

89
Datapath with Control

90
Implementation of Main Control Unit

• Control signals are derived from binary encoded instructions.

• The Main Control Unit can set all control signals, except PCSrc, based solely on the opcode and funct fields of the instruction.

• To generate the PCSrc signal, the Branch signal from the Main Control Unit is ANDed with the Zero signal from the ALU.

91
Implementation of Main Control Unit

Truth Table
• Control signals are derived from binary encoded instructions.

• The Main Control Unit can set all control signals, except PCSrc, based solely on the opcode field of the instruction.

• To generate the PCSrc signal, the Branch signal from the Main Control Unit is ANDed with the Zero signal from the ALU.
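
A sketch of the main control unit as an opcode-indexed lookup table. The signal values below follow the standard textbook settings for ld, sd, R-type, and beq; treat them as an assumed illustration rather than the exact table on the slide:

# opcode -> (ALUSrc, MemtoReg, RegWrite, MemRead, MemWrite, Branch, ALUOp)
MAIN_CONTROL = {
    0b0000011: (1, 1, 1, 1, 0, 0, 0b00),     # ld
    0b0100011: (1, None, 0, 0, 1, 0, 0b00),  # sd (MemtoReg is a don't-care)
    0b0110011: (0, 0, 1, 0, 0, 0, 0b10),     # R-type
    0b1100011: (0, None, 0, 0, 0, 1, 0b01),  # beq (MemtoReg is a don't-care)
}

def main_control(opcode):
    return MAIN_CONTROL[opcode]

# PCSrc is not set by the main control unit: it is Branch ANDed with the ALU Zero flag.
def pcsrc(branch, zero):
    return branch & zero

print(main_control(0b0000011))  # control settings for ld
print(pcsrc(1, 1))              # 1 -> select the branch target address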

92
Pipelined Datapath with Control

93
Grouping of Control Signals by Pipeline Stages: IF & ID

• No control signals are used in the IF and ID stages


94
Grouping of Control Signals by Pipeline Stages: EX

95
Grouping of Control Signals by Pipeline Stages: MEM

96
Grouping of Control Signals by Pipeline Stages: WB

97
Implementation of Main Control Unit

Truth Table
• Control signals are derived from binary encoded instructions.

• Since control lines are used in the EX, MEM, and WB stages, the Main Control Unit can create all the control information during the ID stage, which is then used by the later stages.

• The simplest way of passing these control signals to later stages is to extend the pipeline registers to include control information.
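
A minimal sketch of that idea: the control signals are produced once in ID, grouped by the stage that uses them, and simply copied forward through the pipeline registers (field names are illustrative; the ld values follow the standard textbook settings):

def decode_control_ld():
    # Control bundle for ld, grouped by the stage that consumes each signal.
    return {"EX":  {"ALUSrc": 1, "ALUOp": 0b00},
            "MEM": {"Branch": 0, "MemRead": 1, "MemWrite": 0},
            "WB":  {"MemtoReg": 1, "RegWrite": 1}}

ctrl   = decode_control_ld()                       # created during ID
id_ex  = {"EX": ctrl["EX"], "MEM": ctrl["MEM"], "WB": ctrl["WB"]}
ex_mem = {"MEM": id_ex["MEM"], "WB": id_ex["WB"]}  # EX controls consumed, drop them
mem_wb = {"WB": ex_mem["WB"]}                      # MEM controls consumed, drop them
print(mem_wb)  # only the write-back controls reach the final stage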
98
Pipelined Datapath with Control

99
Data Hazards and Forwarding: Implementation Issues

• Detecting the need to forward

• Implementation of forwarding

• Detecting the data hazard that cannot be solved by forwarding

• How to stall the pipeline

100
Data Hazards

• Consider this sequence:


sub x2, x1,x3
and x12,x2,x5
or x13,x6,x2
add x14,x2,x2
sd x15,100(x2)

101
Data Hazards

• Consider this sequence:


sub x2, x1,x3
and x12,x2,x5
or x13,x6,x2
add x14,x2,x2
sd x15,100(x2)

102
Data Hazards

• Consider this sequence:


sub x2, x1,x3
and x12,x2,x5
or x13,x6,x2
add x14,x2,x2
sd x15,100(x2)

103
Detecting the Need to Forward

• Pass source and destination register numbers along pipeline


– e.g., ID/EX.RegisterRs1 = register number for Rs1 sitting in ID/EX pipeline register

• ALU operand register numbers in EX stage are given by


– ID/EX.RegisterRs1, ID/EX.RegisterRs2

• Data hazards when:
– 1a. EX/MEM.RegisterRd == ID/EX.RegisterRs1 (forward from EX/MEM pipeline register)
– 1b. EX/MEM.RegisterRd == ID/EX.RegisterRs2 (forward from EX/MEM pipeline register)
– 2a. MEM/WB.RegisterRd == ID/EX.RegisterRs1 (forward from MEM/WB pipeline register)
– 2b. MEM/WB.RegisterRd == ID/EX.RegisterRs2 (forward from MEM/WB pipeline register)

104
Pipelined Datapath

105
Detecting the Need to Forward

• But only if the forwarding instruction will write to a register!


– EX/MEM.RegWrite, MEM/WB.RegWrite

• And only if Rd for that instruction is not x0


– EX/MEM.RegisterRd ≠ 0,
MEM/WB.RegisterRd ≠ 0

106
Data Hazards and Forwarding

• Consider this sequence:


sub x2, x1,x3
and x12,x2,x5
or x13,x6,x2
add x14,x2,x2
sd x15,100(x2)

107
Data Hazards and Forwarding

• Consider this sequence:


sub x2, x1,x3
and x12,x2,x5
or x13,x6,x2
add x14,x2,x2
sd x15,100(x2)

108
Forwarding Pathways

109
Forwarding Pathways

110
Forwarding Conditions

• EX Hazard (Forwarding from EX/MEM Pipeline Register)


– if (EX/MEM.RegWrite
and (EX/MEM.RegisterRd ≠ 0)
and (EX/MEM.RegisterRd == ID/EX.RegisterRs1)) ForwardA = 10

– if (EX/MEM.RegWrite
and (EX/MEM.RegisterRd ≠ 0)
and (EX/MEM.RegisterRd == ID/EX.RegisterRs2)) ForwardB = 10
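
A sketch of just this EX-hazard test (the MEM hazard follows on the next slide). Pipeline registers are modeled as plain dicts with illustrative field names:

def ex_hazard_forward(ex_mem, id_ex):
    forward_a = forward_b = 0b00
    if ex_mem["RegWrite"] and ex_mem["RegisterRd"] != 0:
        if ex_mem["RegisterRd"] == id_ex["RegisterRs1"]:
            forward_a = 0b10   # ALU operand A comes from the EX/MEM register
        if ex_mem["RegisterRd"] == id_ex["RegisterRs2"]:
            forward_b = 0b10   # ALU operand B comes from the EX/MEM register
    return forward_a, forward_b

# sub x2,x1,x3 followed by and x12,x2,x5: x2 must be forwarded from EX/MEM.
print(ex_hazard_forward({"RegWrite": 1, "RegisterRd": 2},
                        {"RegisterRs1": 2, "RegisterRs2": 5}))  # (2, 0) i.e. ForwardA=10, ForwardB=00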

111
Forwarding Conditions

• MEM hazard (Forwarding from MEM/WB Pipeline Register)


– if (MEM/WB.RegWrite
and (MEM/WB.RegisterRd ≠ 0)
and (MEM/WB.RegisterRd == ID/EX.RegisterRs1)) ForwardA = 01

– if (MEM/WB.RegWrite
and (MEM/WB.RegisterRd ≠ 0)
and (MEM/WB.RegisterRd == ID/EX.RegisterRs2)) ForwardB = 01

112
Double Data Hazard

• Consider this sequence:


sub x2, x1,x3
and x12,x2,x5
or x13,x6,x2
add x14,x2,x2
sd x15,100(x2)

• Consider the sequence:


add x1,x1,x2
add x1,x1,x3
add x1,x1,x4

113
Data Hazards and Forwarding

• Consider this sequence:


sub x2, x1,x3
and x12,x2,x5
or x13,x6,x2
add x14,x2,x2
sd x15,100(x2)

114
Data Hazards and Forwarding

• Consider this sequence:


sub x2, x1,x3
and x12,x2,x5
or x13,x6,x2
add x14,x2,x2
sd x15,100(x2)

115
Double Data Hazard

• Consider this sequence:


sub x2, x1,x3
and x12,x2,x5
or x13,x6,x2
add x14,x2,x2
sd x15,100(x2)

• Consider the sequence:


add x1,x1,x2
add x1,x1,x3
add x1,x1,x4

• Hazards occur from instructions in both the MEM and WB stages
– We want to use the most recent result, the one in the MEM stage
• Revise the forwarding condition from the MEM/WB pipeline register
– Only forward if the forwarding condition for the EX/MEM register isn’t true

116
Revised Forwarding Conditions

• MEM hazard (Forwarding from MEM/WB Pipeline Register)


– if (MEM/WB.RegWrite
and (MEM/WB.RegisterRd ≠ 0)
and not(EX/MEM.RegWrite and (EX/MEM.RegisterRd ≠ 0)
and (EX/MEM.RegisterRd == ID/EX.RegisterRs1))
and (MEM/WB.RegisterRd == ID/EX.RegisterRs1)) ForwardA = 01

– if (MEM/WB.RegWrite
and (MEM/WB.RegisterRd ≠ 0)
and not(EX/MEM.RegWrite and (EX/MEM.RegisterRd ≠ 0)
and (EX/MEM.RegisterRd == ID/EX.RegisterRs2))
and (MEM/WB.RegisterRd == ID/EX.RegisterRs2)) ForwardB = 01
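
Putting both cases together, a sketch of the complete forwarding-unit logic, including the revision above so the more recent EX/MEM result wins over MEM/WB (dict field names are illustrative):

def forwarding_unit(id_ex, ex_mem, mem_wb):
    def ex_hazard(rs):
        return (ex_mem["RegWrite"] and ex_mem["RegisterRd"] != 0
                and ex_mem["RegisterRd"] == rs)

    def mem_hazard(rs):
        return (mem_wb["RegWrite"] and mem_wb["RegisterRd"] != 0
                and not ex_hazard(rs)            # prefer the newer EX/MEM value
                and mem_wb["RegisterRd"] == rs)

    def select(rs):
        if ex_hazard(rs):
            return 0b10
        if mem_hazard(rs):
            return 0b01
        return 0b00

    return select(id_ex["RegisterRs1"]), select(id_ex["RegisterRs2"])

# add x1,x1,x2 ; add x1,x1,x3 ; add x1,x1,x4 (double data hazard):
# the third add must take x1 from EX/MEM (2 = 0b10), not from MEM/WB.
print(forwarding_unit({"RegisterRs1": 1, "RegisterRs2": 4},
                      {"RegWrite": 1, "RegisterRd": 1},
                      {"RegWrite": 1, "RegisterRd": 1}))  # (2, 0)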

117
Pipelined Datapath with Forwarding

118
Solving Data Hazards: Limitations of Forwarding

• Forwarding cannot always avoid pipeline stalls.


– Value might not even be computed when it is needed

• Load-Use Data Hazard


– ld x1, 0(x2)
sub x4, x1, x5

119
Load-Use Data Hazard

120
Load-Use Data Hazard: Need for Hazard Detection Unit

• In addition to a Forwarding Unit, we need a Hazard Detection Unit.


– It operates during the ID stage so that it can insert the stall between the load and the
instruction dependent on it.

121
Load-Use Data Hazard: Need for Hazard Detection Unit

• In addition to a Forwarding Unit, we need a Hazard Detection Unit.


– It operates during the ID stage so that it can insert the stall between the load and the
instruction dependent on it.

• In ID stage, Hazard Detection Unit uses the following condition to test for
the occurrence of load-use data hazard
– ID/EX.MemRead and
((ID/EX.RegisterRd = IF/ID.RegisterRs1) or
(ID/EX.RegisterRd = IF/ID.RegisterRs2))

• If load-use data hazard is detected, Hazard Detection Unit stalls the


pipeline (and inserts a bubble)
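
A sketch of that ID-stage test and the stall response (register fields are modeled as dict entries with illustrative names):

def load_use_hazard(id_ex, if_id):
    return (id_ex["MemRead"]
            and (id_ex["RegisterRd"] == if_id["RegisterRs1"]
                 or id_ex["RegisterRd"] == if_id["RegisterRs2"]))

# ld x1, 0(x2) followed by sub x4, x1, x5: a one-cycle stall is required.
print(load_use_hazard({"MemRead": 1, "RegisterRd": 1},
                      {"RegisterRs1": 1, "RegisterRs2": 5}))  # True

# On a hazard: hold the PC and the IF/ID register, and zero the control
# signals going into ID/EX so a bubble (nop) flows down the pipeline.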

122
How to Stall the Pipeline

• Force control values in ID/EX register to 0


– As a result, EX, MEM and WB stages do nop (no-operation)

• Prevent update of PC and IF/ID register


– The instruction that uses the load result is decoded again
– The following instruction is fetched again

123
Load-Use Data Hazard: Need for Hazard Detection Unit

• In addition to a Forwarding Unit, we need a Hazard Detection Unit.


– It operates during the ID stage so that it can insert the stall between the load and the
instruction dependent on it.

• In ID stage, Hazard Detection Unit uses the following condition to test for
the occurrence of load-use data hazard
– ID/EX.MemRead and
((ID/EX.RegisterRd = IF/ID.RegisterRs1) or
(ID/EX.RegisterRd = IF/ID.RegisterRs2))

• If load-use data hazard is detected, Hazard Detection Unit stalls the
pipeline (and inserts a bubble)
– The hazard detection unit controls the writing of the PC and IF/ID registers plus the
multiplexor that chooses between the real control values and all 0s

124
Load-Use Data Hazard: How to Stall the Pipeline

125
Load-Use Data Hazard: How to Stall the Pipeline

126
Load-Use Data Hazard: How to Stall the Pipeline

127
Load-Use Data Hazard: How to Stall the Pipeline

128
Pipeline Hazards

• These are situations in pipelining when the next instruction cannot execute
in the following clock cycle.

• There are 3 types of pipeline hazards

1. Structural Hazard
– A required hardware resource is busy

2. Data Hazard
– Data that is needed to execute the next instruction has not yet become available

3. Control Hazard
– Deciding on control action (or instruction sequence) depends on previous instruction that
has not yet been completed

129
Control Hazards

• Deciding on next instruction (to be fetched) in the instruction sequence


depends on previous branch instruction that has not yet been completed

130
Control Hazards

• Consider this sequence:


36: sub x10, x4, x8
40: beq x1, x3, 16 // PC-relative branch
// to 40+16*2=72
44: and x12, x2, x5
48: or x13, x2, x6
52: add x14, x4, x2
56: sub x15, x6, x7
...
72: ld x4, 50(x7)

131
Control Hazards

• If branch outcome determined in MEM stage

132
Handling Control Hazards: Reducing Branch Delay

• Move hardware to determine outcome to ID stage


– Target address adder
– Register comparator

133
Control Hazards
• Consider this sequence:
36: sub x10, x4, x8
40: beq x1, x3, 16 // PC-relative branch
// to 40+16*2=72
44: and x12, x2, x5
48: or x13, x2, x6
52: add x14, x4, x2
56: sub x15, x6, x7
...
72: ld x4, 50(x7)

• Simple strategy:
– Let’s always assume that the branch is not taken

134
Example: Wrong Assumption about Branch Outcome

135
Example: Wrong Assumption about Branch Outcome

136
How to Convert an Instruction in Pipeline to NOP(Bubble)

137
How to Convert an Instruction in Pipeline to NOP(Bubble)

• To flush instructions in the IF stage, we add a control line, called IF.Flush, that zeros the instruction field of the IF/ID pipeline register.

138
Control Hazards

• Deciding on next instruction (to be fetched) in the instruction sequence


depends on previous branch instruction that has not yet been completed

Even in the presence of additional hardware (mentioned in the last slide), there is a pipeline stall.

139
Handling Control Hazards: Branch Prediction

• Predict the outcome of the conditional branch


– When the prediction turns out to be right, the pipeline proceeds at full speed.
– When the prediction turns out to be wrong, the pipeline control must ensure that the
instructions following the wrongly guessed conditional branch have no effect and must
restart the pipeline from the proper branch address

140
Handling Control Hazards: Types of Branch Prediction

• Static Branch Prediction:


– Rigid branch prediction rules based on typical branch behavior
Examples
1. Predict that conditional branches will always be untaken
2. At the bottom of loops are conditional branches that branch back to the top of the loop.
Since they are likely to be taken and they branch backward, we could always predict
taken for conditional branches that branch to an earlier address.

• Dynamic Branch Prediction


– Hardware measures actual branch behavior. Then, recent history is used to predict the
outcome of a branch

141
Dynamic Branch Prediction: Implementation

• Branch Prediction Buffer (aka Branch History Table)


– A small memory indexed by the lower portion of the address of the branch instruction.
– Stores information about the recent outcomes (taken/not taken) of the branch
– While executing the branch, look up the address of the instruction in branch history table
to see if the conditional branch was taken/not taken the last time this instruction was
executed, and begin fetching new instructions accordingly.
– If the prediction turns out to be wrong, flush the pipeline and update the prediction
information in the branch history table.

142
Dynamic Branch Prediction: Implementation

• Two Variants of the Branch Prediction Buffer (aka Branch History Table)


– 1-bit Predictor
– 2-bit Predictor

• 1-bit Predictor
– Only 1 bit is used to keep the prediction information.
– At each misprediction, the prediction bit is inverted

• 2-bit Predictor
– 2 bits are used to keep the prediction information
– The 2 bits are used to encode 4 states in the system

143
Dynamic Branch Prediction: Implementation

• Ideally, the accuracy of the predictor would match the taken branch
frequency for highly regular branches
– Consider the inner loop branch that branches nine times in a row, and then is not taken
once

• Shortcoming of One-bit Predictor


– Mispredict as taken on last iteration of inner loop
– Then mispredict as not taken on first iteration of inner loop next time around
– Thus, the prediction accuracy for this branch that is taken 90% of the time is only 80%
(two incorrect predictions and eight correct ones)
– 2-bit prediction schemes try to address this shortcoming
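
A simulation sketch of the two predictors on the loop pattern just described (taken nine times, then not taken, repeated). The 2-bit predictor is modeled as the usual saturating counter; this is an illustration of the behavior, not the exact hardware:

def accuracy(predict, update, state, outcomes):
    correct = 0
    for taken in outcomes:
        correct += (predict(state) == taken)
        state = update(state, taken)
    return correct / len(outcomes)

# 1-bit predictor: remember only the last outcome.
one_bit = (lambda s: s, lambda s, taken: taken)

# 2-bit saturating counter: states 0,1 predict not taken; 2,3 predict taken.
two_bit = (lambda s: s >= 2,
           lambda s, taken: min(s + 1, 3) if taken else max(s - 1, 0))

loop = ([True] * 9 + [False]) * 10   # branch is taken 90% of the time

print(accuracy(*one_bit, state=True, outcomes=loop))  # 0.81 -> roughly 80% accurate
print(accuracy(*two_bit, state=3, outcomes=loop))     # 0.9  -> about 90% accurate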
144
