Pipelining
Topics To Be Covered
- Hazards in Pipelining
- Types of Hazards
- Performance of Pipelines with Hazards
- Structural Hazards
- Data Hazard Classification
- Data Hazard Resolving Techniques
  - Forwarding
  - Hardware Interlocks
  - Delayed Load
Pipeline Hazards
Hazards are the major hurdle of pipelining: they prevent the next instruction in the stream from executing during its designated clock cycle.
[Figure: pipeline diagram of successive instructions overlapping in the five stages IF, ID, EX, MEM, WB.]
Types Of Hazards
Structural Hazards
Data Hazards
Control Hazards
Performance of Pipelines with Hazards
$$\text{Speedup from pipelining} = \frac{\text{Average instruction time unpipelined}}{\text{Average instruction time pipelined}} = \frac{\text{CPI unpipelined} \times \text{Clock cycle time unpipelined}}{\text{CPI pipelined} \times \text{Clock cycle time pipelined}}$$
The ideal CPI on a pipelined machine is almost always 1. Hence, the pipelined CPI is
$$\text{CPI pipelined} = \text{Ideal CPI} + \text{Pipeline stall clock cycles per instruction} = 1 + \text{Pipeline stall clock cycles per instruction}$$
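If we ignore the cycle time overhead of pipelining and assume the stages are all perfectly balanced, then the cycle times of the two machines are equal and
$$\text{Speedup} = \frac{\text{CPI unpipelined}}{1 + \text{Pipeline stall clock cycles per instruction}}$$
If all instructions take the same number of cycles, equal to the number of pipeline stages (the pipeline depth), then the unpipelined CPI equals the pipeline depth and
$$\text{Speedup} = \frac{\text{Pipeline depth}}{1 + \text{Pipeline stall clock cycles per instruction}}$$
Thus, if there are no pipeline stalls, this leads to the intuitive result that speedup is equal to the number of pipeline stages.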
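A quick numeric check of the speedup relation may help. The sketch below assumes balanced stages, no cycle-time overhead, an ideal pipelined CPI of 1, and an unpipelined CPI equal to the pipeline depth, so that speedup = pipeline depth / (1 + stall cycles per instruction). The function name and the sample numbers are hypothetical.

```python
def pipeline_speedup(pipeline_depth, stall_cycles_per_instr):
    """Speedup over the unpipelined machine under the assumptions above."""
    if pipeline_depth < 1 or stall_cycles_per_instr < 0:
        raise ValueError("depth must be >= 1 and stalls must be >= 0")
    cpi_pipelined = 1.0 + stall_cycles_per_instr  # ideal CPI of 1 plus stalls
    return pipeline_depth / cpi_pipelined


# Five stages, no stalls: speedup equals the number of stages.
print(pipeline_speedup(5, 0.0))   # 5.0
# Five stages, one stall cycle every four instructions (illustrative value).
print(pipeline_speedup(5, 0.25))  # 4.0
```

Even a small stall rate pulls the speedup noticeably below the stage count.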
Structural Hazards
Common instances of structural hazards arise when:
- Some functional unit is not fully pipelined.
- Some resource has not been duplicated enough to allow all combinations of instructions in the pipeline to execute.
Instr     1       2       3       4       5       6       7       8       9
Load      IF      ID      EX      MEM     WB
Instr 1           IF      ID      EX      MEM     WB
Instr 2                   IF      ID      EX      MEM     WB
Stall                             bubble  bubble  bubble  bubble  bubble
Instr 3                                   IF      ID      EX      MEM     WB
Simplified Picture
          Clock cycle number
Instr     1       2       3       4       5       6       7       8       9
Load      IF      ID      EX      MEM     WB
Instr 1           IF      ID      EX      MEM     WB
Instr 2                   IF      ID      EX      MEM     WB
Instr 3                           stall   IF      ID      EX      MEM     WB
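Question: Why would a designer allow a structural hazard?
Answer: To reduce cost. Pipelining every functional unit, or duplicating every resource, may be too expensive for the performance gained.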
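The stall in the diagram can be reproduced with a small sketch. It assumes the shared resource is a single memory port used both by IF and by the MEM stage of a load (the usual textbook case; the slide does not name the resource), and that a load fetched in cycle c reaches MEM in cycle c + 3. The function name is hypothetical.

```python
def fetch_cycles(is_load):
    """Return the clock cycle in which each instruction is fetched (IF).

    Assumption: IF and a load's MEM stage compete for one memory port,
    and the load wins, so the fetch waits one cycle.
    """
    mem_busy = set()  # cycles in which some load occupies the memory port
    fetched = []
    cycle = 1
    for load in is_load:
        while cycle in mem_busy:
            cycle += 1  # structural hazard: delay the fetch by one cycle
        fetched.append(cycle)
        if load:
            mem_busy.add(cycle + 3)  # IF at c, ID at c+1, EX at c+2, MEM at c+3
        cycle += 1
    return fetched


# Load followed by three non-load instructions, as in the diagram.
print(fetch_cycles([True, False, False, False]))  # [1, 2, 3, 5]
```

Instr 3 is fetched in cycle 5 instead of cycle 4, matching the single stall shown above.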
Data Hazard Classification
1. RAW (read after write)
2. WAW (write after write)
3. WAR (write after read)
RAW (read after write): with instruction i preceding instruction j in program order, j tries to read a source before i writes it, so j incorrectly gets the old value. This corresponds to true data dependence.
Example:
                  1      2      3      4      5      6      7      8      9
ADD R1, R2, R3    IF     ID     EX     MEM    WB
SUB R4, R5, R1           IF     IDsub  EX     MEM    WB
AND R6, R1, R7                  IF     IDand  EX     MEM    WB
OR R8, R1, R9                          IF     IDor   EX     MEM    WB
XOR R10, R1, R11                              IF     IDxor  EX     MEM    WB
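WAW (write after write): j tries to write an operand before it is written by i, so the writes are performed in the wrong order and the destination is left with the value written by i rather than the value written by j. This corresponds to output dependence.
In the pipeline below, i takes two memory stages while j skips the memory stage, so j reaches WB one cycle before i:
     1     2     3     4     5     6
i    IF    ID    EX    MEM1  MEM2  WB
j          IF    ID    EX    WB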
WAR (write after read): j tries to write a destination before it is read by i, so i incorrectly gets the new value. This corresponds to anti-dependence.
For this to happen, we need a pipeline in which some instructions write results early in the pipeline and other instructions read a source late in the pipeline. This cannot occur in a linear five-stage instruction pipeline.
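The three definitions can be written as register comparisons. The sketch below is only an illustration: it assumes each instruction is described by a hypothetical (destination, sources) pair, and it reports the dependences between an earlier instruction i and a later instruction j. Whether a dependence becomes an actual hazard depends on the pipeline, as noted above for WAR.

```python
def classify_hazards(i, j):
    """Return the potential data hazards between earlier i and later j.

    Each instruction is a (dest, sources) pair; dest may be None
    for an instruction that writes no register.
    """
    i_dest, i_srcs = i
    j_dest, j_srcs = j
    hazards = set()
    if i_dest is not None and i_dest in j_srcs:
        hazards.add("RAW")  # j reads what i writes (true dependence)
    if i_dest is not None and i_dest == j_dest:
        hazards.add("WAW")  # both write the same register (output dependence)
    if j_dest is not None and j_dest in i_srcs:
        hazards.add("WAR")  # j writes what i reads (anti-dependence)
    return hazards


add = ("R1", ("R2", "R3"))   # ADD R1, R2, R3
sub = ("R4", ("R5", "R1"))   # SUB R4, R5, R1
print(classify_hazards(add, sub))  # {'RAW'}
```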
Forwarding
Forwarding can be generalized to include passing a result directly to the functional unit that requires it. Consider the following example:
                  1      2      3      4      5      6      7
ADD R1, R2, R3    IF     ID     EX     MEM    WB
SUB R4, R5, R1           IF     IDsub  EX     MEM    WB
AND R6, R1, R7                  IF     IDand  EX     MEM    WB
We notice that the result is not actually needed by SUB until after ADD actually produces it.
Thus forwarding works as follows:
- The ALU result is automatically fed back to the ALU input latches.
- Control logic is needed to detect whether the fed-back value or the normal input operand should be selected.
Without forwarding, SUB and AND must stall until ADD writes R1 in WB:
                  1      2      3      4      5      6      7      8      9
ADD R1, R2, R3    IF     ID     EX     MEM    WB
SUB R4, R5, R1           IF     stall  stall  IDsub  EX     MEM    WB
AND R6, R1, R7                  stall  stall  IF     IDand  EX     MEM    WB
Using the forwarding paths the code sequence can be executed without stalls:
                  1       2       3       4       5       6       7
ADD R1, R2, R3    IF      ID      EXadd   MEMadd  WB
SUB R4, R5, R1            IF      ID      EXsub   MEM     WB
AND R6, R1, R7                    IF      ID      EXand   MEM     WB
- The first forwarding passes the value of R1 from EXadd to EXsub.
- The second forwarding also passes the value of R1, from MEMadd to EXand.
- The code can now be executed without stalls.
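The control logic that chooses between the fed-back value and the normal operand can be sketched as a pair of register comparisons. This is an illustration under assumptions, not the hardware itself: the names EX/MEM and MEM/WB for the pipeline latches, and the function name, are not taken from the slides.

```python
def forward_source(src_reg, ex_mem_dest, mem_wb_dest):
    """Say where an ALU source operand should come from.

    ex_mem_dest: destination of the instruction one ahead (now in MEM).
    mem_wb_dest: destination of the instruction two ahead (now in WB).
    Either may be None if that instruction writes no register.
    """
    if ex_mem_dest is not None and src_reg == ex_mem_dest:
        return "EX/MEM"   # newest value: result of the previous instruction
    if mem_wb_dest is not None and src_reg == mem_wb_dest:
        return "MEM/WB"   # value produced two instructions earlier
    return "REGFILE"      # no hazard: use the normal register operand


# Cycle 4: SUB is in EX, ADD (dest R1) is in MEM.
print(forward_source("R1", ex_mem_dest="R1", mem_wb_dest=None))  # EX/MEM
# Cycle 5: AND is in EX, SUB (dest R4) is in MEM, ADD (dest R1) is in WB.
print(forward_source("R1", ex_mem_dest="R4", mem_wb_dest="R1"))  # MEM/WB
```

Checking the nearer latch first matters when both earlier instructions write the same register, since the most recent result is the one the program order requires.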
Interlocking
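Not all potential hazards can be handled by forwarding.
In such cases hardware called a pipeline interlock is added to preserve the correct execution pattern. A pipeline interlock detects a hazard and stalls the pipeline until the hazard is cleared.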
Instr         1      2      3      4      5      6      7      8      9
First instr.  IF     ID     EX     MEM    WB
SUB                  IF     ID     stall  EXsub  MEM    WB
AND                         IF     stall  ID     EX     MEM    WB
OR                                 stall  IF     ID     EX     MEM    WB
* The CPI for a stalled instruction increases by the length of the stall.
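A sketch of what an interlock has to detect, and of its effect on CPI, is given below. It assumes full forwarding, so that the only remaining hazard is a load followed immediately by an instruction that uses the loaded register, costing one stall cycle. The diagram above does not name its first instruction, so the LW and all operand registers in the sample program are hypothetical, as are the function names.

```python
def load_use_stalls(program):
    """Count one-cycle stalls for load-use pairs in adjacent instructions.

    program: list of (opcode, dest, sources) tuples.
    """
    stalls = 0
    for prev, curr in zip(program, program[1:]):
        prev_op, prev_dest, _ = prev
        _, _, curr_srcs = curr
        if prev_op == "LW" and prev_dest in curr_srcs:
            stalls += 1  # the interlock holds curr for one cycle
    return stalls


def effective_cpi(program):
    """Ideal CPI of 1 plus average stall cycles per instruction."""
    return 1.0 + load_use_stalls(program) / len(program)


program = [
    ("LW", "R1", ("R2",)),        # hypothetical load into R1
    ("SUB", "R4", ("R1", "R5")),  # uses R1 right after the load: stalls
    ("AND", "R6", ("R1", "R7")),
    ("OR", "R8", ("R1", "R9")),
]
print(load_use_stalls(program))  # 1
print(effective_cpi(program))    # 1.25
```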
Delayed Load
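The compiler helps by arranging instructions to avoid pipeline stalls; this is called instruction scheduling.
For a delayed load, the compiler tries to follow each load with an instruction that does not conflict with it.
If one can't be found, insert a NOP.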
A = B + C;
D = E + F;
LW  R1, B
LW  R2, C
LW  R5, F
ADD R6, R4, R5    <- Need to stall for R5
SW  D, R6
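The effect of scheduling can be illustrated with a small sketch that inserts a NOP after every load whose result is used by the very next instruction. The listing above is incomplete, so the full eight-instruction sequence below is an assumption (the ADD into R3, the store to A and the load of E into R4 are filled in hypothetically), as are the reordered sequence and the function name.

```python
NOP = ("NOP", None, ())


def fill_load_delay_slots(program):
    """Insert a NOP after each load whose result the next instruction uses.

    program: list of (opcode, dest, sources) tuples.
    """
    out = []
    for idx, instr in enumerate(program):
        out.append(instr)
        op, dest, _ = instr
        nxt = program[idx + 1] if idx + 1 < len(program) else None
        if op == "LW" and nxt is not None and dest in nxt[2]:
            out.append(NOP)  # no independent instruction here, so pad
    return out


# Assumed straightforward code for A = B + C; D = E + F.
naive = [
    ("LW", "R1", ()), ("LW", "R2", ()),
    ("ADD", "R3", ("R1", "R2")), ("SW", None, ("R3",)),
    ("LW", "R4", ()), ("LW", "R5", ()),
    ("ADD", "R6", ("R4", "R5")), ("SW", None, ("R6",)),
]
# Assumed scheduled order: each load is followed by an independent instruction.
scheduled = [
    ("LW", "R1", ()), ("LW", "R2", ()), ("LW", "R4", ()),
    ("ADD", "R3", ("R1", "R2")), ("LW", "R5", ()),
    ("SW", None, ("R3",)), ("ADD", "R6", ("R4", "R5")),
    ("SW", None, ("R6",)),
]
print(len(fill_load_delay_slots(naive)) - len(naive))          # 2
print(len(fill_load_delay_slots(scheduled)) - len(scheduled))  # 0
```

Under these assumptions the unscheduled order needs two NOPs (after the loads of R2 and R5), while the reordered code needs none.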
Thank You