Pipelining Basic Concepts: Instruction Fetch (IF), Operand Fetch (OF), Execute (EX)
Typical Non-Pipelined Execution
EX            I0          I1          I2
OF        I0          I1          I2
IF    I0          I1          I2          I3
      1   2   3   4   5   6   7   8   9   10
Ideal Pipelined Execution
EX            I0  I1  I2  I3  I4  I5  I6  I7
OF        I0  I1  I2  I3  I4  I5  I6  I7  I8
IF    I0  I1  I2  I3  I4  I5  I6  I7  I8  I9
      1   2   3   4   5   6   7   8   9   10
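The two timing diagrams can be checked with a little arithmetic: with k one-cycle stages, n instructions need n*k cycles without pipelining and k + (n - 1) cycles with ideal pipelining. The sketch below is only illustrative. The function names are hypothetical, and it assumes the three one-cycle stages IF, OF, EX used on these slides.

```python
def non_pipelined_cycles(n_instructions: int, n_stages: int = 3) -> int:
    """Each instruction occupies the whole machine for n_stages cycles."""
    return n_instructions * n_stages


def pipelined_cycles(n_instructions: int, n_stages: int = 3) -> int:
    """The first instruction needs n_stages cycles; each later one finishes one cycle after."""
    if n_instructions == 0:
        return 0
    return n_stages + (n_instructions - 1)


def completed_by(cycle: int, n_stages: int = 3, pipelined: bool = True) -> int:
    """Number of instructions that have finished EX by the end of the given cycle."""
    if pipelined:
        return max(0, cycle - n_stages + 1)
    return cycle // n_stages


print(completed_by(10, pipelined=False))  # 3 (I0, I1, I2)
print(completed_by(10, pipelined=True))   # 8 (I0 through I7)
```

After ten cycles the non-pipelined machine has completed three instructions and the ideal pipeline eight, matching the EX rows of the two diagrams.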
Pipeline Turbulence
EX              I0    I1    I2    I3    Ibr               Ik
OF        I0    I1    I2    I3    Ibr               Ik    Ik+1
IF  I0    I1    I2    I3    Ibr               Ik    Ik+1  Ik+2
    1     2     3     4     5     6     7     8     9     10
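The turbulence caused by the branch Ibr can be reproduced with a small schedule generator. This is a sketch under one stated assumption that is not on the slide: the branch outcome is known only at the end of EX, so the target instruction Ik cannot be fetched until the following cycle. The function name `schedule` is hypothetical.

```python
STAGES = ("IF", "OF", "EX")


def schedule(instrs, branch_index):
    """Return {name: {stage: cycle}} for a 3-stage pipeline with one branch.

    Assumption: the instruction after the branch is fetched only in the
    cycle after the branch has finished EX.
    """
    table = {}
    next_fetch = 1
    for i, name in enumerate(instrs):
        table[name] = {stage: next_fetch + offset
                       for offset, stage in enumerate(STAGES)}
        if i == branch_index:
            next_fetch = table[name]["EX"] + 1   # wait for the branch to resolve
        else:
            next_fetch += 1
    return table


instrs = ["I0", "I1", "I2", "I3", "Ibr", "Ik", "Ik+1", "Ik+2"]
t = schedule(instrs, branch_index=4)
print(t["Ibr"])  # {'IF': 5, 'OF': 6, 'EX': 7}
print(t["Ik"])   # {'IF': 8, 'OF': 9, 'EX': 10}
```

Under this assumption the branch costs two fetch slots (cycles 6 and 7), which is the number of stages minus one.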
Multicycle Execution Units
[Pipeline diagram: IF and ID feed one of four EX units (integer unit, FP/integer multiply, FP adder, FP/integer divider), followed by MEM and WB.]
FIGURE 3.42 The DLX pipeline with three additional unpipelined, floating-point, functional units.
SuperScalar/Multiple Issue
[Block diagram: instruction buffer, issue unit, floating point unit, integer units 1 and 2, crossbar switch, memory modules 0 and 1.]
Hazards
Structural Hazards
Pipelined & Buffered Execution Units
[Pipeline diagram: IF and ID feed one of four execution paths, followed by MEM and WB.]
Integer unit: EX
FP/integer multiply: M1 M2 M3 M4 M5 M6 M7
FP adder: A1 A2 A3 A4
FP/integer divider: DIV
Data Hazards
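RAW: read after write (a later instruction reads a value before an earlier instruction has written it)
WAW: write after write (only present when writes occur at different stages in a pipeline)
WAR: write after read (only possible when writes may occur earlier than some reads)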
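The hazard categories can be expressed as set intersections over the registers each instruction reads and writes. The sketch below is illustrative only: the `Instr` type and `hazards` function are hypothetical, and RAW (read after write) is included as the standard third category alongside WAW and WAR. It reports potential hazards, that is, dependences. Whether one causes a real problem depends on the pipeline, as the parenthetical conditions above note.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    name: str
    writes: frozenset = frozenset()
    reads: frozenset = frozenset()


def hazards(first: Instr, second: Instr) -> set:
    """Potential data hazards between two instructions in program order."""
    found = set()
    if first.writes & second.reads:
        found.add("RAW")   # second reads what first writes
    if first.writes & second.writes:
        found.add("WAW")   # both write the same register
    if first.reads & second.writes:
        found.add("WAR")   # second overwrites what first still has to read
    return found


add = Instr("ADD R1, R2, R3", writes=frozenset({"R1"}), reads=frozenset({"R2", "R3"}))
sub = Instr("SUB R2, R1, R4", writes=frozenset({"R2"}), reads=frozenset({"R1", "R4"}))
print(sorted(hazards(add, sub)))  # ['RAW', 'WAR']
```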
Data Hazards: Potential Solutions
• Pipeline interlocking
• Forwarding
• Compiler optimizations
Pipeline Interlocking
EX I0 I1 I2 I3 I4 I5 I6
OF I0 I1 I2 I3 I4 I5 I6 I7
IF I0 I1 I2 I3 I4 I5 I6 I7 I8
1 2 3 4 5 6 7 8 9 10
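A one-cycle stall like the one in this diagram can be modelled with a simple in-order scheduler. This is a sketch under assumptions that are not stated on the slide: stages take one cycle, there is no forwarding, a result can be read in OF no earlier than the cycle after its producer's EX, and a stalled instruction holds back everything behind it. The function name and the program encoding are hypothetical.

```python
def interlocked_schedule(program):
    """program: list of (name, dest, sources).

    Returns {name: (if_cycle, of_cycle, ex_cycle)}, where if_cycle and
    of_cycle are the cycles in which the instruction enters IF and OF.
    """
    sched = {}
    last_write_ex = {}   # register -> EX cycle of its most recent producer
    prev = None          # (if, of, ex) of the previous instruction
    for name, dest, sources in program:
        if prev is None:
            if_c, of_c = 1, 2
        else:
            if_c = prev[1]   # enters IF when the predecessor moves on to OF
            of_c = prev[2]   # enters OF when the predecessor moves on to EX
        ex_c = of_c + 1
        for src in sources:
            if src in last_write_ex:
                # Interlock: operand is read in OF the cycle after the
                # producer's EX, so EX can start one cycle after that.
                ex_c = max(ex_c, last_write_ex[src] + 2)
        sched[name] = (if_c, of_c, ex_c)
        if dest is not None:
            last_write_ex[dest] = ex_c
        prev = (if_c, of_c, ex_c)
    return sched


program = [
    ("I0", "R1", ["R2", "R3"]),
    ("I1", "R4", ["R1", "R5"]),   # needs R1 from I0
    ("I2", "R6", ["R7", "R8"]),
]
s = interlocked_schedule(program)
print(s["I1"])  # (2, 3, 5): EX delayed by one cycle
print(s["I2"])  # (3, 5, 6): held in IF while I1 waits in OF
```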
Data Forwarding
• Organize data path with routes back from later pipe stages into
earlier pipe stages.
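The benefit of such routes can be illustrated by counting stall cycles for a dependent pair of instructions. This is a rough sketch, assuming the three-stage IF/OF/EX pipeline from the earlier slides, with results produced at the end of EX. The function `stall_cycles` and its parameters are hypothetical.

```python
def stall_cycles(distance: int, forwarding: bool) -> int:
    """Stall cycles for a consumer issued `distance` instructions after its producer.

    Without forwarding, the consumer's OF must come after the producer's EX,
    so the two must be at least 2 instructions apart.
    With forwarding, the EX result is routed straight into the next EX,
    so a distance of 1 is already enough.
    """
    required = 1 if forwarding else 2
    return max(0, required - distance)


print(stall_cycles(distance=1, forwarding=False))  # 1
print(stall_cycles(distance=1, forwarding=True))   # 0
```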
[Diagram: Operand fetch and Execute stages.]
Static Scheduling
Dynamic Scheduling
• Scoreboarding
• Tomasulo’s Algorithm
• VLIW
An Architecture for Scoreboarding
[Block diagram: registers and data buses connected to two FP multipliers, an FP divider, an FP adder, and an integer unit; the scoreboard is linked to them by control/status lines.]
An Architecture for Tomasulo
[Block diagram: floating-point operation queue (from instruction unit), FP registers, six load buffers (from memory), three store buffers (to memory), operand buses, operation bus, and reservation stations: three for the FP adders and two for the FP multipliers.]
FIGURE 4.8 The basic structure of a DLX FP unit using Tomasulo's algorithm.
Structural/Data Hazards and the Original Pentium
[Block diagram: instruction buffer, issue unit, floating point unit, integer units 1 and 2, crossbar switch, memory modules 0 and 1.]
Control Hazards
• Stalls/pipeline interlocks
• Delayed Branch
• Conditional Instructions
Delayed Branch
Predict Taken/Not-Taken, Statically or Dynamically
Correct Prediction:
EX            I0  I1  I2  I3  I4  I5  I6  I7
OF        I0  I1  I2  I3  I4  I5  I6  I7  I8
IF    I0  I1  I2  I3  I4  I5  I6  I7  I8  I9
      1   2   3   4   5   6   7   8   9   10
Incorrect Prediction:
EX              I0    I1    I2    I3                      Ik
OF        I0    I1    I2    I3    I4                Ik    Ik+1
IF  I0    I1    I2    I3    I4    I5          Ik    Ik+1  Ik+2
    1     2     3     4     5     6     7     8     9     10
Branch Prediction
[State diagram: predictor states linked by Taken / Not taken transitions.]
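One common way to realise a Taken / Not taken state machine of this kind is a 2-bit saturating counter. The sketch below assumes that scheme, which may differ in detail from the state diagram on the slide. The class and function names are hypothetical.

```python
class TwoBitPredictor:
    """2-bit saturating counter: states 0 and 1 predict not taken, 2 and 3 predict taken."""

    def __init__(self, state: int = 0):
        self.state = state

    def predict(self) -> bool:
        return self.state >= 2

    def update(self, taken: bool) -> None:
        if taken:
            self.state = min(3, self.state + 1)
        else:
            self.state = max(0, self.state - 1)


def accuracy(outcomes, predictor):
    """Fraction of branch outcomes predicted correctly, updating as we go."""
    correct = 0
    for taken in outcomes:
        if predictor.predict() == taken:
            correct += 1
        predictor.update(taken)
    return correct / len(outcomes)


# A loop branch: taken 9 times, then falls through; the loop runs twice.
outcomes = ([True] * 9 + [False]) * 2
print(accuracy(outcomes, TwoBitPredictor(state=3)))  # 0.9
```

Because a single loop exit only moves the counter from state 3 to state 2, the predictor still predicts taken when the loop is entered again, so it mispredicts once per loop execution rather than twice.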
Branch Prediction w/ History
[Figure: branch address (4 bits) indexing a table of XX prediction entries.]
Branch Target Buffers
[Figure: the PC of the instruction to fetch is looked up in the branch-target buffer. Each entry holds a predicted PC and a "branch predicted taken or untaken" field; the height of the table is the number of entries in the branch-target buffer.]
= ?
No: instruction is not predicted to be branch. Proceed normally.
Yes: then instruction is branch and predicted PC should be used as the next PC.
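The lookup described here can be sketched as a small direct-mapped table. This is a simplified illustration, not the exact structure in the figure: it assumes 4-byte instructions, stores only branches last seen taken (so there is no separate taken/untaken field), and all names are hypothetical.

```python
class BranchTargetBuffer:
    def __init__(self, n_entries: int = 16):
        self.n_entries = n_entries
        self.entries = [None] * n_entries   # each slot: (branch_pc, predicted_pc) or None

    def _index(self, pc: int) -> int:
        return (pc >> 2) % self.n_entries   # assumes 4-byte instructions

    def lookup(self, pc: int):
        """Return the predicted PC if pc matches a stored branch, else None."""
        entry = self.entries[self._index(pc)]
        if entry is not None and entry[0] == pc:
            return entry[1]
        return None

    def update(self, pc: int, taken: bool, target: int) -> None:
        idx = self._index(pc)
        if taken:
            self.entries[idx] = (pc, target)
        elif self.entries[idx] is not None and self.entries[idx][0] == pc:
            self.entries[idx] = None        # branch fell through: drop the entry


def next_fetch_pc(btb: BranchTargetBuffer, pc: int) -> int:
    predicted = btb.lookup(pc)
    return predicted if predicted is not None else pc + 4


btb = BranchTargetBuffer()
print(hex(next_fetch_pc(btb, 0x100)))   # 0x104 (no entry: proceed normally)
btb.update(0x100, taken=True, target=0x40)
print(hex(next_fetch_pc(btb, 0x100)))   # 0x40 (entry found: use predicted PC)
```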
[Flowchart, by pipeline stage]
IF: Send PC to memory and
    Send out predicted PC
ID: Is instruction a taken branch? (No / Yes)
    Taken branch? (No / Yes)
    Normal instruction execution