Pipelining - Modified1
Figure: sequential laundry timeline. Loads A to D appear in task order; each load takes 30 + 40 + 20 = 90 min, and the loads run back to back.
• This operator schedules loads to be delivered to the laundry every 90 minutes,
which is the time required to finish one load. In other words, he will not start a
new load until the previous one is done.
• The process is sequential. Sequential laundry takes 6 hours for 4 loads (4 × 90 min).
Efficiently scheduled laundry: Pipelined Laundry
Figure: pipelined laundry timeline from 6 PM to midnight. Loads A to D appear in task order; the time slots are 30, 40, 40, 40, 40, and 20 min.
• Another operator asks for loads to be delivered to the laundry every 40 minutes!
• Pipelined laundry takes 3.5 hours for 4 loads (30 + 4 × 40 + 20 = 210 min)
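The two laundry totals can be checked with a short calculation. This is a minimal sketch, assuming the three stage times 30, 40, and 20 minutes belong to a washer, dryer, and folder (the stage names and function names are illustrative, not from the slides) and that every load passes through the same stages.

```python
def sequential_minutes(stage_minutes, loads):
    """Each load finishes every stage before the next load starts."""
    return loads * sum(stage_minutes)


def pipelined_minutes(stage_minutes, loads):
    """Identical loads overlap; the slowest stage sets the pace.

    The first load needs the full sum of the stage times, and every
    later load finishes one slowest-stage interval after the previous one.
    """
    return sum(stage_minutes) + (loads - 1) * max(stage_minutes)


STAGES = [30, 40, 20]  # assumed: wash, dry, fold (minutes)

print(sequential_minutes(STAGES, 4) / 60)  # hours for 4 loads, sequential
print(pipelined_minutes(STAGES, 4) / 60)   # hours for 4 loads, pipelined
```

With four loads this gives 360 minutes sequentially and 210 minutes pipelined, matching the 6 hours and 3.5 hours quoted on the slides.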
Pipelining Facts
• Multiple tasks operating simultaneously
• Pipelining doesn’t help latency of single task, it helps throughput of entire workload
• Pipeline rate limited by slowest pipeline stage
• Potential speedup = Number of pipe stages
• Unbalanced lengths of pipe stages reduces speedup
• Time to “fill” pipeline and time to “drain” it reduces speedup
Figure: pipelined laundry timeline (6 PM to 9 PM; time slots 30, 40, 40, 40, 40, 20 min; loads A to D in task order), annotated "The washer waits for the dryer for 10 minutes".
Instruction pipeline versus sequential processing
Figure: sequential processing compared with an instruction pipeline.
Instruction pipeline (Contd.)
Sequential processing is faster for few instructions
Performance of Pipelining system
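With k pipeline stages, n instructions, and clock cycle time τ:

$$T_k = \bigl(k + (n - 1)\bigr)\,\tau$$

$$T_1 = n k \tau$$

$$\text{Speedup} = \frac{T_1}{T_k} = \frac{n k \tau}{\bigl(k + (n - 1)\bigr)\,\tau} \;\to\; k \quad \text{as } n \to \infty$$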
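To see how the speedup approaches the number of stages, the ratio can be evaluated for a few instruction counts. This is a minimal sketch under the idealized model on this slide: k equal-length stages, n instructions, k + (n - 1) cycles when pipelined versus n × k cycles sequentially, and no stalls. The function name is illustrative.

```python
def pipeline_speedup(k, n):
    """Ideal speedup of a k-stage pipeline over sequential execution for n instructions."""
    sequential_cycles = n * k
    pipelined_cycles = k + (n - 1)  # k cycles to fill, then one result per cycle
    return sequential_cycles / pipelined_cycles


for n in (1, 10, 100, 10_000):
    print(n, round(pipeline_speedup(5, n), 3))
```

For a 5-stage pipeline the value starts at 1 for a single instruction and climbs toward 5 as n grows, without ever reaching it.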
Pipeline Datapath for MIPS Instruction
Time Taken by each MIPS Instruction: Sequential Vs Pipeline Execution
Graphical Representation of ILP
Single Cycle Non Pipeline Datapath
Instruction Execution in Single Cycle Datapath Assuming Pipelining
Pipeline Version of Single Cycle Datapath for MIPS
Pipeline Control Issues and Hardware
• Here, the following stages perform work as specified:
– IF/ID: Initializes control by passing the rs, rd, and rt fields of the
instruction, together with the opcode and funct fields, to the control
circuitry.
– ID/EX: Buffers control for the EX, MEM, and WB stages, while executing
control for the EX stage. Control decides what operands will be input to
the ALU, what ALU operation will be performed, and whether or not a
branch is to be taken based on the ALU Zero output.
– EX/MEM: Buffers control for the MEM and WB stages, while executing
control for the MEM stage. The control lines are set for memory read or
write, as well as for data selection for memory write. This stage of
control also contains the branch control logic.
– MEM/WB: Buffers and executes control for the WB stage, and selects
the value to be written into the register file.
Control lines for the final three stages: four of the nine control lines are used in the EX stage. The remaining
five are passed on to the EX/MEM pipeline register, which is extended to hold them. Three of those are used
during the MEM stage, and the last two are passed to MEM/WB for use in the WB stage.
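The 4 + 3 + 2 split of the nine control lines can be sketched as data. The slide gives only the counts; the line names below follow the common textbook MIPS convention and are an assumption, as is the helper name `lines_held`.

```python
# Assumed control line names (textbook MIPS convention); the slide states
# only that 4 lines serve EX, 3 serve MEM, and 2 serve WB.
CONTROL_GROUPS = {
    "EX": ("RegDst", "ALUOp1", "ALUOp0", "ALUSrc"),
    "MEM": ("Branch", "MemRead", "MemWrite"),
    "WB": ("RegWrite", "MemtoReg"),
}

# Each pipeline register buffers control only for the stages still ahead.
STAGES_AHEAD = {
    "ID/EX": ("EX", "MEM", "WB"),
    "EX/MEM": ("MEM", "WB"),
    "MEM/WB": ("WB",),
}


def lines_held(pipeline_register):
    """Return the control lines buffered in the given pipeline register."""
    return [
        line
        for stage in STAGES_AHEAD[pipeline_register]
        for line in CONTROL_GROUPS[stage]
    ]


for register in ("ID/EX", "EX/MEM", "MEM/WB"):
    print(register, len(lines_held(register)), lines_held(register))
```

Each register drops the group its own stage has just consumed, so the buffered count shrinks from nine to five to two.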
Hazards in Pipelining System: Data & Branch
Overview of Hazards
• Pipelined processors face several problems in keeping instruction
execution on the pipeline smooth and efficient. These problems are
generally called hazards, and include the following three types:
• Structural Hazards occur when different instructions collide
while trying to access the same piece of hardware in the same
segment of a pipeline. This type of hazard can be alleviated by
providing redundant hardware for the segments in which the
collision occurs. Occasionally, it is possible to insert stalls or
reorder instructions to avoid this type of hazard.
Structural Hazard #1: in case of Single Memory
Figure: two pipeline diagrams, time in clock cycles across and instructions in program order down. First diagram: Inst 1 to Inst 4 pass through Mem, Reg, ALU, Mem, Reg with a single memory; Inst 4 is annotated "Reading instruction from memory". Second diagram: lw and Instr 1 to Instr 4 pass through I$, Reg, ALU, D$, Reg.
Structural Hazard #2: Registers (1/2)
Figure: pipeline diagram with lw and Instr 1 to Instr 4 in program order, each passing through I$, Reg, ALU, D$, Reg.
Overview of Hazards
• Data Hazards occur when an instruction depends on the result of a
previous instruction that is still in the pipeline and whose result has not
yet been computed. The simplest remedy inserts stalls in the execution
sequence, which reduces the pipeline's efficiency.
• The solution to data dependencies is twofold.
– First, one can forward the ALU result to the writeback or data fetch stages.
– Second, in selected instances, it is possible to restructure the code to eliminate some
data dependencies.
• Control Hazards can result from branch instructions. Here, the branch
target address might not be ready in time for the branch to be taken,
which results in stalls (dead segments) in the pipeline that have to be
inserted as local wait events, until processing can resume once the branch
has been resolved. Control hazards can be mitigated through accurate
branch prediction (which is difficult), and by delayed branch strategies.
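The cost of branch stalls can be estimated with a simple cycles-per-instruction calculation. This is a rough sketch, assuming a base CPI of 1, that every branch pays the full stall, and a hypothetical 20% branch frequency; none of these numbers come from the slides, and the function name is illustrative.

```python
def effective_cpi(base_cpi, branch_fraction, stall_cycles_per_branch):
    """Average cycles per instruction when every branch adds a fixed stall."""
    return base_cpi + branch_fraction * stall_cycles_per_branch


BRANCH_FRACTION = 0.20  # hypothetical workload: one instruction in five is a branch

# Two stall cycles per branch versus one stall cycle per branch.
print(effective_cpi(1.0, BRANCH_FRACTION, 2))
print(effective_cpi(1.0, BRANCH_FRACTION, 1))
```

Under these assumed numbers, halving the stall per branch cuts the overhead above the base CPI in half, which is the motivation for resolving branches earlier in the pipeline.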
Data Hazard
• Definition. A data hazard occurs when the current instruction
requires the result of a preceding instruction, but there are
insufficient segments in the pipeline to compute the result and
write it back to the register file in time for the current instruction to
read that result from the register file.
• We typically remedy this problem in one of three ways: forwarding,
code re-ordering, or stall insertion.
• Forwarding: To resolve a dependency, one adds special circuitry to
the pipeline, composed of wires and switches, that forwards or
transmits the desired value to the pipeline segment that needs it
for computation. Although this adds hardware and control circuitry,
the method works because it takes far less time for the required
value(s) to travel through a wire than it does for a pipeline segment
to compute its result.
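The forwarding decision for one ALU operand can be sketched as a selection among three sources. This is a minimal sketch following the usual textbook conditions for a 5-stage MIPS pipeline; the function and parameter names are illustrative assumptions, and register 0 is excluded because it always reads as zero in MIPS.

```python
def forward_source(src_reg, ex_mem_reg_write, ex_mem_rd, mem_wb_reg_write, mem_wb_rd):
    """Pick where an ALU source operand should come from.

    src_reg is the register number the instruction in EX wants to read.
    The EX/MEM result is checked first because it is the most recent value.
    """
    if ex_mem_reg_write and ex_mem_rd != 0 and ex_mem_rd == src_reg:
        return "EX/MEM"   # result of the immediately preceding instruction
    if mem_wb_reg_write and mem_wb_rd != 0 and mem_wb_rd == src_reg:
        return "MEM/WB"   # result of the instruction two places earlier
    return "REGFILE"      # no hazard: use the value read in ID


# Instruction in EX reads register 2; the previous instruction writes register 2.
print(forward_source(2, True, 2, False, 0))
# Only the instruction two places earlier writes register 2.
print(forward_source(2, True, 7, True, 2))
```

Checking EX/MEM before MEM/WB matters when both earlier instructions write the same register, since the later write is the one the program expects to see.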
Data Hazard
Example of data hazards in a sequence of MIPS instructions, where the red (blue) arrows indicate dependencies that are problematic (not problematic)
Operand Forwarding
Data Hazard
Data Hazard Solution using Operand Forwarding and Stall
Data Hazard Solution using Operand Forwarding and Stall (Cont.)
Data Hazard
• Code Re-Ordering: Here, the compiler reorders statements in the
source code, or the assembler reorders object code, to place one or
more statements between the current instruction and the instruction
that computes the required operand. This requires an "intelligent"
compiler or assembler, which must have detailed information about
the structure and timing of the pipeline on which the data hazard
would occur. We call this type of software a hardware-dependent
compiler.
• Stall Insertion: It is possible to insert one or more stalls (no-op
instructions) into the pipeline, which delays the execution of the
current instruction until the required operand is written to the register
file. This decreases pipeline efficiency and throughput, which is
contrary to the goals of pipeline processor design. Stalls are an
expedient of last resort, used when compiler action or forwarding
fails or is not supported by the hardware or software design.
• Problem: The first instruction (sub), starting on clock cycle 1 (CC1) completes on CC5, when
the result in Register 2 is written to the register file. If we did nothing to resolve data
dependencies, then no instruction that read Register 2 from the register file could read the
"new" value computed by the sub instruction until CC5. The dependencies in the other
instructions are illustrated by solid lines with arrowheads. If register read and write cannot
occur within the same clock cycle (we will see how this could happen in Section 5.3.4),
then only the fifth instruction (sw) can access the contents of register 2 in the manner
indicated by the flow of sequential execution in the MIPS code fragment shown previously.
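The timing argument above can be reproduced by comparing the cycle in which a register is written with the cycle in which later instructions read it. This is a minimal sketch, assuming the classic five-instruction textbook sequence (sub, and, or, add, sw, all reading register 2), since the slide's code fragment is not reproduced in the text. It also assumes a 5-stage pipeline with register read in stage 2 and write in stage 5, and it ignores later redefinitions of a register.

```python
# Assumed fragment: (opcode, destination register or None, source registers).
FRAGMENT = [
    ("sub", 2, (1, 3)),     # sub $2, $1, $3
    ("and", 12, (2, 5)),    # and $12, $2, $5
    ("or", 13, (6, 2)),     # or  $13, $6, $2
    ("add", 14, (2, 2)),    # add $14, $2, $2
    ("sw", None, (15, 2)),  # sw  $15, 100($2)
]


def raw_hazards(program, split_cycle_regfile):
    """List (producer, consumer, register) triples whose register read comes too early.

    Instruction i (0-based) starts in clock cycle i + 1, reads registers in
    cycle i + 2 (ID), and writes its result in cycle i + 5 (WB).
    """
    hazards = []
    for i, (producer, dest, _) in enumerate(program):
        if dest is None:
            continue
        write_cc = i + 5
        for j in range(i + 1, len(program)):
            consumer, _, sources = program[j]
            read_cc = j + 2
            if dest not in sources:
                continue
            # With a split-cycle register file, a read in the write cycle is safe.
            too_early = read_cc < write_cc if split_cycle_regfile else read_cc <= write_cc
            if too_early:
                hazards.append((producer, consumer, dest))
    return hazards


print(raw_hazards(FRAGMENT, split_cycle_regfile=False))
print(raw_hazards(FRAGMENT, split_cycle_regfile=True))
```

Without same-cycle read and write, only sw reads register 2 late enough; with it, add is also safe because it reads in CC5, the cycle in which sub writes.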
• Solution #1 - Forwarding: The result generated by the sub instruction can be forwarded to
the other stages of the pipeline using special control circuitry (data bus switchable to any
other segment, which can be implemented via a decoder or crossbar switch). This is
indicated notionally in Figure 5.7 by solid red lines with arrowheads. If the register file can
read in the first half of a cycle and write in the second half of a cycle, then the forwarding
in CC5 is not problematic. Otherwise, we would have to delay the execution of the add
instruction by one clock cycle (see Figure 5.9 for insertion of a stall).
• Solution #2 - Code Re-Ordering: Since Instructions 2 through 5 in the MIPS code
fragment all require Register 2 as an operand, we have no instructions in that particular
code fragment to put between Instruction 1 and Instruction 2. However, let us assume that
we have other instructions that (a) do not depend on the results of Instructions 1-5, and
(b) themselves induce no dependencies in Instructions 1-5 (for example, they must not
write to register 1, 2, 3, 5, or 6). In that case, we could insert two instructions between
Instructions 1 and 2, if register read and write could occur concurrently. Otherwise, we
would have to insert three such instructions. The latter case is illustrated in the following
figure, where the inserted instructions and their pipeline actions are colored dark green.
Example of code reordering to solve data hazards in a sequence of MIPS instructions
• Solution #3 - Stalls: Suppose that we had no instructions to
insert between Instructions 1 and 2. For example, there
might be data dependencies arising from the inserted
instructions that would themselves have to be repaired.
Alternatively, the program execution order (functional
dependencies) might not permit the reordering of code. In
such cases, we have to insert stalls, also called bubbles,
which are no-op instructions that merely delay the
pipeline execution until the dependencies are no longer
problematic with respect to pipeline timing. This is
illustrated in Figure 5.9 by inserting three stalls between
Instructions 1 and 2.
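The number of stalls (or independent filler instructions) needed between a producer and its consumer follows from the stage positions. This is a minimal sketch, assuming a 5-stage pipeline with register read in stage 2 and write in stage 5 and no forwarding; the function name and parameters are illustrative.

```python
def stalls_needed(distance, split_cycle_regfile, write_stage=5, read_stage=2):
    """Stalls required so a consumer reads a register after the producer wrote it.

    distance is how many instructions later the consumer issues
    (1 means it immediately follows the producer). No forwarding is assumed.
    """
    # The consumer's read cycle must reach the producer's write cycle, and
    # must come strictly after it if read and write cannot share a cycle.
    required_distance = (write_stage - read_stage) + (0 if split_cycle_regfile else 1)
    return max(0, required_distance - distance)


print(stalls_needed(1, split_cycle_regfile=True))
print(stalls_needed(1, split_cycle_regfile=False))
```

For back-to-back instructions this gives two stalls when the register file can write and read in the same cycle and three otherwise, the same counts quoted for code re-ordering and stall insertion.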
Example of stall insertion to solve data hazards in a sequence of MIPS instructions
Note: The first line tests to see if the instruction is a load: the only
instruction that reads data memory is a load. The next two lines
check to see if the destination register field of the load in the EX
stage matches either source register of the instruction in the ID
stage. If the condition holds, the instruction stalls 1 clock cycle.
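The three-line test described in the note can be sketched as a single predicate. This is a minimal sketch following the usual textbook load-use hazard condition; the function and parameter names mirror pipeline register fields but are assumptions, since the original condition is not reproduced in the text.

```python
def must_stall(id_ex_mem_read, id_ex_rt, if_id_rs, if_id_rt):
    """Return True when the instruction in ID must wait one cycle for a load in EX.

    id_ex_mem_read: True if the instruction in EX is a load.
    id_ex_rt:       destination register of that load.
    if_id_rs/rt:    source registers of the instruction in ID.
    """
    return bool(
        id_ex_mem_read
        and (id_ex_rt == if_id_rs or id_ex_rt == if_id_rt)
    )


# Load into register 2, followed by an instruction that reads registers 2 and 5.
print(must_stall(True, 2, 2, 5))
# Same registers, but the instruction in EX is not a load: forwarding suffices.
print(must_stall(False, 2, 2, 5))
```

Only a load needs this check because its data is not available until the end of the MEM stage, one cycle too late to forward into the next instruction's EX stage.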
Stall when a dependent R-format instruction follows a load instruction (pipeline stall)
Control Hazards
Control Hazards Simple Solution Option 1: two Stalls
Figure: pipeline diagram with beq, a nop row shown as bubbles in every stage, and the following instructions, each passing through I$, Reg, ALU, D$, Reg.
Special Branch Comparator with One Clock Cycle Stall
Figure: pipeline diagram with beq, one nop row shown as bubbles in every stage, and three following instructions, each passing through I$, Reg, ALU, D$, Reg; the diagram is labelled "Exit:".
Notes on Branch-Delay Slot
Control Hazards: Branch Prediction
Exercise 2
https://www.cise.ufl.edu/~mssz/CompOrg/CDA-pipe.html
Thank you