Pipelining Basic Concepts

Pipelining is a technique in which instruction execution is divided into discrete stages so that multiple instructions can be overlapped in execution. By partitioning instruction execution into stages such as fetch, decode, and execute, and providing independent hardware for each stage, multiple instructions can progress through the pipeline simultaneously, improving instruction throughput. However, pipelining introduces hazards: structural hazards from resource conflicts, data hazards from dependences between instructions, and control hazards from branches. Solutions include buffering, forwarding, interlocking, and static or dynamic scheduling.


Pipelining Basic Concepts

Partition the function of instruction execution into smaller functions (called stages) and allocate separate, concurrently operating hardware components, so that at any given time there exist several instructions in various stages of execution.

Instruction fetch (IF) -> Operand fetch (OF) -> Execute (EX)

1
Typical Non-Pipelined Execution

Cycle  1    2    3    4    5    6    7    8    9    10
EX               I0             I1             I2
OF          I0             I1             I2
IF     I0             I1             I2             I3

Time to execute n instructions: 3nt (t = time per stage)

2
Ideal Pipelined Execution

Cycle  1    2    3    4    5    6    7    8    9    10
EX               I0   I1   I2   I3   I4   I5   I6   I7
OF          I0   I1   I2   I3   I4   I5   I6   I7   I8
IF     I0   I1   I2   I3   I4   I5   I6   I7   I8   I9

Time to execute n instructions: (2 + n)t (2 cycles to fill the pipeline, then one completion per cycle)
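
As a quick check of these two formulas, here is a minimal Python sketch (not part of the original slides) comparing the two execution times; the speedup approaches the number of stages as n grows:

# Execution time for n instructions, t = time per stage (arbitrary units).
def non_pipelined(n, t=1):
    return 3 * n * t                    # each instruction uses all 3 stages serially

def pipelined(n, t=1, stages=3):
    return (stages - 1 + n) * t         # (stages - 1) cycles to fill, then 1 per cycle

for n in (1, 10, 100, 1000):
    print(n, non_pipelined(n) / pipelined(n))   # speedup -> 3 as n grows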

3
Pipeline Turbulence

Consider the presence of a branch instruction in the pipeline:

Cycle  1    2    3    4    5    6    7    8    9    10
EX               I0   I1   I2   I3   Ibr  -    -    Ik
OF          I0   I1   I2   I3   Ibr  -    -    Ik   Ik+1
IF     I0   I1   I2   I3   Ibr  -    -    Ik   Ik+1 Ik+2

(Ibr resolves in EX at cycle 7; the fetch of the target Ik cannot start until cycle 8, so two cycles are lost.)

Branch instructions introduce control hazards: until a branch is resolved, the pipeline cannot fetch the correct next instruction, so cycles are lost and performance suffers.

4
Multicycle Execution Units
[Figure: the DLX pipeline (IF, ID, MEM, WB) in which the single EX stage is replaced by four parallel functional units: an integer unit, an FP/integer multiplier, an FP adder, and an FP/integer divider.]

FIGURE 3.42 The DLX pipeline with three additional unpipelined floating-point functional units.

5
Superscalar/Multiple Issue

[Figure: a multiple-issue organization: an instruction buffer feeds an issue unit, which dispatches to a floating-point unit and two integer units; a crossbar switch connects these units to two memory modules.]
6
Hazards

Structural Hazards: arising from resource conflicts when the hardware cannot support all possible combinations of instructions in simultaneous overlapped execution.
Data Hazards: arising when an instruction in the pipeline depends on the results of a previous instruction still in the pipeline.
Control Hazards: arising from the pipelining of branches or other instructions that change the PC.

Hazards are easily solved by stalling, but stalling reduces pipeline efficiency.

Structural hazards can be solved by extra hardware resources, a cost/performance tradeoff.

Load/Store architectures substantially reduce the complexity of hazards.

7
Structural Hazards

• Split I/D caches
• Pipelined execute units
• Buffers on execute units
• Hardware replication

8
Pipelined & Buffered Execution Units

[Figure: IF and ID feed four execution paths: the integer unit (EX), a pipelined FP/integer multiplier (stages M1-M7), a pipelined FP adder (stages A1-A4), and an unpipelined FP/integer divider (DIV); all paths rejoin at MEM and WB.]
FIGURE 3.44 A pipeline that supports multiple outstanding FP operations.

9
Data Hazards

RAW: read after write

WAW: write after write (only present when writes occur at different
stages in a pipeline)

WAR: write after read (only possible when writes may occur earlier
than some reads)

Note: RAR (read after read) is not a hazard.
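
To make the three cases concrete, here is a small illustrative Python sketch (not from the slides; the instruction encoding is invented) that classifies the hazards between an earlier and a later instruction, each described by the register it writes and the registers it reads:

# Each instruction: the register it writes (dest) and the registers it reads (srcs).
def hazards(earlier, later):
    found = []
    if earlier["dest"] in later["srcs"]:
        found.append("RAW")             # later reads what earlier writes
    if earlier["dest"] == later["dest"]:
        found.append("WAW")             # both write the same register
    if later["dest"] in earlier["srcs"]:
        found.append("WAR")             # later writes what earlier reads
    return found                        # shared reads (RAR) are not checked: not a hazard

i1 = {"dest": "r1", "srcs": {"r2", "r3"}}   # r1 = r2 + r3
i2 = {"dest": "r4", "srcs": {"r1", "r5"}}   # r4 = r1 + r5
print(hazards(i1, i2))                      # ['RAW']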

10
Data Hazards: Potential Solutions

• Pipeline interlocking

• Forwarding

• Compiler optimizations

11
Pipeline Interlocking

• Stall pipeline until data hazard is eliminated

• Loses performance benefits of pipelining

12
Pipeline Interlocking

Assume RAW conflict between I3 and I4.

Cycle  1    2    3    4    5    6    7    8    9    10
EX               I0   I1   I2   I3   -    I4   I5   I6
OF          I0   I1   I2   I3   I4   I4   I5   I6   I7
IF     I0   I1   I2   I3   I4   I5   I5   I6   I7   I8

(I4 repeats OF at cycle 7 because I3's result is not written until the end of its EX; the bubble (-) passes through EX, and I5 is held in IF for one cycle.)
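
A minimal cycle-by-cycle model of this interlock in Python (illustrative; the instruction format and names are invented): the instruction in OF is held while the instruction producing one of its sources is still in EX:

# Toy 3-stage pipeline (IF/OF/EX) with a RAW interlock. Instructions are
# (name, dest_register, set_of_source_registers); writes land at the end of EX.
def simulate(prog):
    rows = {"IF": [], "OF": [], "EX": []}
    IF = OF = EX = None
    next_i = 0
    while IF or OF or EX or next_i < len(prog):
        if IF is None and next_i < len(prog):
            IF, next_i = prog[next_i], next_i + 1
        stall = OF is not None and EX is not None and EX[1] in OF[2]
        for name, stage in (("IF", IF), ("OF", OF), ("EX", EX)):
            rows[name].append(stage[0] if stage else "-")
        EX = None if stall else OF      # a stall injects a bubble into EX
        if not stall:
            OF, IF = IF, None
    for name in ("EX", "OF", "IF"):
        print(name, " ".join(f"{x:>2}" for x in rows[name]))

prog = [("I0", "r1", set()), ("I1", "r2", set()), ("I2", "r3", set()),
        ("I3", "r4", set()), ("I4", "r5", {"r4"}),   # I4 reads I3's result
        ("I5", "r6", set()), ("I6", "r7", set())]
simulate(prog)      # the EX row shows the bubble: ... I2 I3  - I4 I5 I6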

13
Data Forwarding

• Organize data path with routes back from later pipe stages into
earlier pipe stages.

• Forwarding works best when operand encoding is simple, so that source and destination register specifiers can be compared directly.

14
Data Forwarding

[Figure: the destination field (rd) of the instruction in Execute is compared against the source fields (rs1, rs2) of the instruction in Operand fetch; on a match, the Execute result is routed straight back as the operand.]
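
A minimal sketch of that selection in Python (names invented): the operand comes from the EX-stage result instead of the register file whenever the register specifiers match:

# Choose an operand from the register file or, if the instruction in EX
# is about to write that register, forward the EX result directly.
def read_operand(reg, regfile, ex_dest, ex_result):
    if ex_dest is not None and reg == ex_dest:
        return ex_result                # forward: bypass the register file
    return regfile[reg]                 # no match: ordinary operand fetch

regfile = {"r1": 10, "r2": 20}
# the instruction in EX is computing r1 = 99 this cycle
print(read_operand("r1", regfile, "r1", 99))    # 99 (forwarded)
print(read_operand("r2", regfile, "r1", 99))    # 20 (from the register file)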

15
Static Scheduling

• Compiler optimization that schedules instructions according to data-hazard considerations.

• Organize code into basic blocks (single entry/single exit).

• Within a block, schedule instructions to minimize data hazards between adjacent instructions (see the sketch below).
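
As an illustration, here is a toy greedy scheduler in Python (invented, far simpler than a real compiler pass): at each step it prefers a ready instruction with no RAW dependence on the instruction just emitted:

# Schedule one basic block: keep program order except prefer, at each step,
# an instruction that does not read the destination of the one just emitted.
# A toy: real schedulers also honor WAR/WAW hazards and operation latencies.
def schedule(block):                    # block: list of (text, dest, srcs)
    remaining, out = list(block), []
    while remaining:
        last = out[-1] if out else None
        pick = None
        for cand in remaining:          # scan in original program order
            producers = remaining[:remaining.index(cand)]
            if any(p[1] in cand[2] for p in producers):
                continue                # would move cand above its producer
            if last is None or last[1] not in cand[2]:
                pick = cand             # ready, and no RAW on the last slot
                break
        pick = pick or remaining[0]     # no better choice: accept the stall
        remaining.remove(pick)
        out.append(pick)
    return [i[0] for i in out]

block = [("load r1",  "r1", set()),          # r1 = mem[...]
         ("add r2",   "r2", {"r1"}),         # RAW on the load
         ("mul r3",   "r3", {"r4", "r5"})]   # independent
print(schedule(block))      # ['load r1', 'mul r3', 'add r2']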

16
Dynamic Scheduling

• Scoreboarding
• Tomasulo’s Algorithm
• VLIW (strictly a static-scheduling approach: the compiler, not hardware, packs independent operations)

17
An Architecture for Scoreboarding
[Figure: the registers connect over data buses to five functional units (two FP multipliers, an FP divider, an FP adder, and an integer unit); the scoreboard exchanges control/status information with the registers and with every unit.]
FIGURE 4.3 The basic structure of a DLX processor with a scoreboard.

18
An Architecture for Tomasulo
[Figure: a floating-point operation queue (fed from the instruction unit) and six load buffers (fed from memory) supply, over operand buses, the FP registers, three store buffers (to memory), and the reservation stations: three in front of the FP adders, two in front of the FP multipliers. An operation bus carries the queued operations to the stations, and every result is broadcast on the common data bus (CDB).]
FIGURE 4.8 The basic structure of a DLX FP unit using Tomasulo's algorithm.

19
Structural/Data Hazards and the Original Pentium

[Figure: as on slide 6: an instruction buffer feeds an issue unit, which dispatches to a floating-point unit and two integer units; a crossbar switch connects the units to two memory modules.]

20
Control Hazards

• Evaluate branches earlier

• Stalls/pipeline interlocks

• Delayed Branch

• Statically predict taken/not-taken & flush on a misprediction

• Canceling (or nullifying) branch: compiler prediction/flush

• Dynamic (hardware) prediction

• Conditional Instructions

21
Delayed Branch

Redefine the architecture so that a branch takes effect only after the n instructions that follow it (the branch delay slots).

• Where do the instructions to fill the branch delay slot come from?
– A noop
– From before the branch
– From after the branch (from both destinations or, if needed/possible, only one)
• Compiler effectiveness
– Fills about 60% of delay slots (when there is one slot)
– About 80% of the instructions placed in delay slots do useful work, so roughly half of all slots end up doing useful work
• Complications: deep pipelines & superscalar implementations

22
Predict Taken/Not-Taken, Statically or Dynamically

Assume I3 is a conditional branch and that branches are resolved in the EX stage.

Correct Prediction:

Cycle  1    2    3    4    5    6    7    8    9    10
EX               I0   I1   I2   I3   I4   I5   I6   I7
OF          I0   I1   I2   I3   I4   I5   I6   I7   I8
IF     I0   I1   I2   I3   I4   I5   I6   I7   I8   I9

Incorrect Prediction:

Cycle  1    2    3    4    5    6    7    8    9    10
EX               I0   I1   I2   I3   -    -    Ik   Ik+1
OF          I0   I1   I2   I3   I4   -    Ik   Ik+1 Ik+2
IF     I0   I1   I2   I3   I4   I5   Ik   Ik+1 Ik+2 Ik+3

(I4 and I5 were fetched down the mispredicted path and are squashed when I3 resolves in EX at cycle 6; two cycles are lost.)

23
Branch Prediction

[Figure: the four states of a two-bit saturating counter: two "predict taken" states and two "predict not taken" states. A taken branch moves the counter toward strongly taken, a not-taken branch toward strongly not taken, so the prediction flips only after two consecutive mispredictions from a strong state.]
FIGURE 4.13 The states in a two-bit prediction scheme.
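
A minimal Python sketch of that counter (assuming the standard encoding: states 0-1 predict not taken, states 2-3 predict taken):

# Two-bit saturating counter: increment on taken, decrement on not taken,
# saturating at 0 and 3; predict taken in the upper two states.
class TwoBitPredictor:
    def __init__(self, state=0):
        self.state = state              # 0 = strongly not taken

    def predict(self):
        return self.state >= 2          # True = predict taken

    def update(self, taken):
        if taken:
            self.state = min(3, self.state + 1)
        else:
            self.state = max(0, self.state - 1)

p = TwoBitPredictor()
for outcome in [True, True, False, True]:   # actual branch outcomes
    print(p.predict(), end=" ")             # False False True False
    p.update(outcome)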

24
Branch Prediction w/ History

[Figure: the low-order bits of the branch address index a row of four 2-bit per-branch predictors; a 2-bit global branch history selects which of the four supplies the prediction.]

FIGURE 4.20 A (2,2) branch-prediction buffer uses a two-bit global history to choose from among four predictors for each branch address.
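
A sketch of that (2,2) scheme in Python (table size and indexing invented for illustration), reusing the two-bit counter idea above:

# (2,2) correlating predictor: each branch-address index holds four 2-bit
# counters; a 2-bit global history of the last two outcomes picks one.
class CorrelatingPredictor:
    def __init__(self, rows=16):
        self.counters = [[0] * 4 for _ in range(rows)]
        self.rows = rows
        self.history = 0                # 2-bit global branch history

    def predict(self, pc):
        return self.counters[pc % self.rows][self.history] >= 2

    def update(self, pc, taken):
        row, h = self.counters[pc % self.rows], self.history
        row[h] = min(3, row[h] + 1) if taken else max(0, row[h] - 1)
        self.history = ((self.history << 1) | taken) & 0b11

bp = CorrelatingPredictor()
for t in [True, False] * 5:             # a branch that strictly alternates
    print(int(bp.predict(0x40)), end=" ")
    bp.update(0x40, t)
# prints: 0 0 0 0 0 0 1 0 1 0 (after warm-up the alternation is tracked)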

25
Branch Target Buffers

[Figure: the PC of the instruction being fetched is looked up in the branch-target buffer. On a miss, the instruction is not predicted to be a branch and fetch proceeds normally; on a hit, the stored predicted PC is used as the next PC, with the branch predicted taken or untaken.]

FIGURE 4.22 A branch-target buffer.
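
A dictionary-based sketch of such a buffer in Python (structure invented; a real BTB is a tagged hardware table indexed by the fetch PC):

# Branch-target buffer sketch: map fetch PC -> predicted target PC.
# A hit means "predicted taken, fetch from the target next cycle";
# a miss means "not predicted to be a branch, fall through".
btb = {}

def next_fetch_pc(pc):
    return btb.get(pc, pc + 4)          # hit: target; miss: sequential

def resolve(pc, taken, target):
    if taken:
        btb[pc] = target                # allocate or refresh the entry
    else:
        btb.pop(pc, None)               # wrongly predicted taken: delete

print(hex(next_fetch_pc(0x1000)))       # 0x1004 (miss: fall through)
resolve(0x1000, taken=True, target=0x2000)
print(hex(next_fetch_pc(0x1000)))       # 0x2000 (hit: predicted target)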

26
Step-by-step use of Branch Target Buffers

[Figure: flowchart. IF: send the PC to memory and to the branch-target buffer. If no entry is found, fetch proceeds normally; if, in ID, the instruction then turns out to be a taken branch, its PC and target are entered into the buffer. If an entry is found, the predicted PC is sent out as the next fetch address. In EX: a mispredicted branch kills the fetched instruction, restarts the fetch at the other target, and deletes the entry from the target buffer; a correctly predicted branch continues execution with no stalls.]

FIGURE 4.23 The steps involved in handling an instruction with a branch-target buffer.
27
Exceptions

1. Synchronous versus asynchronous
2. User requested versus coerced
3. User maskable (or not)
4. Within or between instructions
5. Resumption versus termination
6. Precise versus imprecise

28
