Advanced Linux Programming

This document discusses pipelining in computer processors. It explains that pipelining allows overlapping execution of multiple instructions to improve throughput. Pipelining divides instruction execution into stages, like fetch, decode, execute, and writeback, so that a new instruction can begin execution each clock cycle. While pipelining improves throughput, it can introduce hazards like structural hazards from limited hardware resources, control hazards from branches, and data hazards from instructions dependent on earlier instructions. The document uses MIPS as an example pipeline and discusses how its design makes hazards easier to avoid and pipelining more feasible.

Uploaded by

DeepanshGoyal
Copyright © All Rights Reserved

Computer Organization

CS1403
Pipelined Data-Path

Mayank Pandey, MNNIT, Allahabad, India


Pipelining
• Start work ASAP!! Do not waste time!
[Figure: task order (A, B, ...) against time, 6 PM to 2 AM, not pipelined]
Assume 30 min. for each step of a task (wash, dry, fold, store) and that separate steps use separate hardware and so can be overlapped.
[Figure: the same tasks against the same time axis, pipelined]
Pipelined vs. Single-Cycle
[Figure: single-cycle execution of lw $1, 100($0); lw $2, 200($0); lw $3, 300($0). Each instruction goes through instruction fetch, Reg, ALU, data access, Reg and takes 8 ns, so successive instructions start 8 ns apart.]
Assume 2 ns for memory access and ALU operation, and 1 ns for register access; therefore the single-cycle clock is 8 ns and the pipelined clock cycle is 2 ns.
[Figure: pipelined execution of the same three lw instructions. Each stage takes 2 ns and successive instructions start 2 ns apart.]
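The clock periods quoted on this slide follow directly from the assumed stage latencies. A minimal sketch of the arithmetic, assuming the five stages are named IF, ID, EX, MEM and WB as on the later slides (the function names are illustrative, not from the slides):

```python
# Stage latencies in ns, from the slide's assumption:
# 2 ns for memory access and ALU operation, 1 ns for register access.
STAGE_NS = {"IF": 2, "ID": 1, "EX": 2, "MEM": 2, "WB": 1}

def single_cycle_clock(stages=STAGE_NS):
    """One clock must cover all five steps, so the period is their sum."""
    return sum(stages.values())

def pipelined_clock(stages=STAGE_NS):
    """Each stage gets its own clock cycle, so the period is the slowest stage."""
    return max(stages.values())

def single_cycle_time(n, stages=STAGE_NS):
    """Total time for n instructions executed one after another."""
    return n * single_cycle_clock(stages)

def pipelined_time(n, stages=STAGE_NS):
    """The first instruction needs len(stages) cycles; each later
    instruction finishes one cycle after its predecessor."""
    return (len(stages) + n - 1) * pipelined_clock(stages)

print(single_cycle_clock(), pipelined_clock())    # 8 2
print(single_cycle_time(3), pipelined_time(3))    # 24 14
```

For the three lw instructions in the figures this gives 24 ns without pipelining and 14 ns with it, which matches the extent of the two time axes.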
Pipelining: Keep in Mind
• Pipelining does not reduce the latency of a single task; it increases the throughput of the entire workload
• Pipeline rate is limited by the longest stage
– potential speedup = number of pipe stages
– unbalanced lengths of pipe stages reduce speedup
• Time to fill the pipeline and time to drain it, when there is slack in the pipeline, reduces speedup
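These three points can be checked numerically. The sketch below is only an illustration: `speedup` is a hypothetical helper, the balanced latencies are invented for comparison, and the unbalanced ones are the 2 ns / 1 ns values assumed on the previous slide.

```python
def speedup(n, stage_ns):
    """Speedup of a pipeline over a single-cycle design for n instructions."""
    unpipelined = n * sum(stage_ns)                       # one long cycle per instruction
    pipelined = (len(stage_ns) + n - 1) * max(stage_ns)   # fill time plus one cycle each
    return unpipelined / pipelined

balanced = [2, 2, 2, 2, 2]     # hypothetical perfectly balanced stages
unbalanced = [2, 1, 2, 2, 1]   # latencies assumed on the previous slide

for n in (3, 1000):
    print(n, round(speedup(n, balanced), 2), round(speedup(n, unbalanced), 2))
# 3 2.14 1.71
# 1000 4.98 3.98
```

With only three instructions the fill and drain time dominates. With many instructions the balanced pipeline approaches a speedup of 5, the number of stages, while the unbalanced one levels off near 4.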
Pipelining MIPS
• What makes it easy with MIPS?
– all instructions are same length
• so fetch and decode stages are similar for all instructions
– just a few instruction formats
• simplifies instruction decode and makes it possible in one stage
– memory operands appear only in load/stores
• so memory access can be deferred to exactly one later stage
– operands are aligned in memory
• one data transfer instruction requires one memory access stage
Pipelining MIPS
• What makes it hard?
– structural hazards: different instructions, at different stages in the pipeline, want to use the same hardware resource
– control hazards: the next instruction to put into the pipeline depends on the outcome of a previous branch instruction that is already in the pipeline
– data hazards: an instruction in the pipeline requires data to be computed by a previous instruction that is still in the pipeline

• Before actually building the pipelined datapath and control, we first briefly examine these potential hazards individually…
Structural Hazards
• Structural hazard: inadequate hardware to simultaneously support all instructions in the pipeline in the same clock cycle
• E.g., suppose a single (not separate) instruction and data memory with one read port in the pipeline below
– then there is a structural hazard between the first and fourth lw instructions: the first lw reads its data in the same cycle in which the fourth lw is fetched
[Figure: pipelined execution of lw $1, 100($0); lw $2, 200($0); lw $3, 300($0); lw $4, 400($0), each starting 2 ns after the previous one and going through instruction fetch, Reg, ALU, data access, Reg. Hazard if single memory.]

• MIPS was designed to be pipelined: structural hazards are easy to avoid!
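The conflict between the first and fourth lw can be found mechanically by listing which instructions touch memory in each cycle. This is a rough sketch, assuming every instruction is a lw that uses memory in its IF and MEM stages and that instruction i enters the pipeline in cycle i; `memory_conflicts` is a hypothetical helper name.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]
MEMORY_STAGES = {"IF", "MEM"}   # stages in which a lw reads memory

def memory_conflicts(num_instructions, ports=1):
    """Return {cycle: [instruction numbers]} for every cycle in which more
    instructions read the single shared memory than it has read ports."""
    users = {}
    for i in range(num_instructions):             # instruction i + 1 starts in cycle i + 1
        for s, stage in enumerate(STAGES):
            if stage in MEMORY_STAGES:
                users.setdefault(i + 1 + s, []).append(i + 1)
    return {c: who for c, who in sorted(users.items()) if len(who) > ports}

print(memory_conflicts(4))            # {4: [1, 4]}
print(memory_conflicts(4, ports=2))   # {}
```

Giving the memory a second port, or splitting it into separate instruction and data memories, removes the conflict.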
Control Hazards
• Control hazard: need to make a decision based on the result of a previous instruction still executing in the pipeline
• Solution 1: Stall the pipeline

[Figure: pipelined execution of add $4, $5, $6; beq $1, $2, 40; lw $3, 300($0). A bubble is inserted after beq, so lw starts 4 ns after beq instead of 2 ns: pipeline stall.]
Note that the branch outcome is computed in the ID stage with added hardware (later…)
Control Hazards
• Solution 2: Predict branch outcome
– e.g., predict branch-not-taken:
[Figure, prediction success: add $4, $5, $6; beq $1, $2, 40; lw $3, 300($0) flow through the pipeline 2 ns apart with no bubble.]
[Figure, prediction failure: after beq $1, $2, 40 a bubble passes through all five stages and or $7, $8, $9 starts 4 ns after beq. Prediction failure: undo (= flush) lw.]
Control Hazards
• Solution 3: Delayed branch: always execute the sequentially next instruction, with the branch taking effect after a one-instruction delay. It is the compiler's job to find an instruction that is independent of the branch outcome and can be put in the delay slot
• MIPS does this
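The three solutions can be compared by counting the bubble cycles each one adds. This is a simplified sketch: it assumes the branch outcome is known in the ID stage (as noted on the stall slide), so an unknown or mispredicted path costs exactly one cycle, and the branch counts in the usage example are invented.

```python
def extra_cycles(branches, taken, strategy, filled_slots=0):
    """Bubble cycles added by `branches` branch instructions, `taken` of which
    are taken, assuming each unknown or wrong path costs exactly one cycle."""
    if strategy == "stall":
        return branches                  # every branch waits one cycle
    if strategy == "predict_not_taken":
        return taken                     # only taken branches flush one instruction
    if strategy == "delayed":
        return branches - filled_slots   # unfilled delay slots hold a no-op
    raise ValueError(f"unknown strategy: {strategy}")

# Hypothetical workload: 100 branches, 60 taken, 50 delay slots usefully filled.
for strategy in ("stall", "predict_not_taken", "delayed"):
    print(strategy, extra_cycles(100, 60, strategy, filled_slots=50))
# stall 100
# predict_not_taken 60
# delayed 50
```

Which strategy comes out ahead depends entirely on how often branches are taken and how often the compiler can fill the slot.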

1/22/2019 Mayank Pandey, MNNIT, Allahabad, India


Data Hazards
• Data hazard: an instruction needs data from the result of a previous instruction still executing in the pipeline
• Solution: Forward data if possible…

[Figure: instruction pipeline diagram for add $s0, $t0, $t1 with stages IF, ID, EX, MEM, WB on a time axis from 2 to 10; shading indicates use: left = write, right = read.]
[Figure: add $s0, $t0, $t1 followed by sub $t2, $s0, $t3, each with stages IF, ID, EX, MEM, WB. Without forwarding (blue line) the data has to go back in time; with forwarding (red line) the data is available in time.]

Data Hazards
• Forwarding may not be enough
– e.g., if an R-type instruction following a load uses the result of the load; this is called a load-use data hazard
[Figure: lw $s0, 20($t1) followed immediately by sub $t2, $s0, $t3, each with stages IF, ID, EX, MEM, WB. Without a stall it is impossible to provide input to the sub instruction in time.]
[Figure: lw $s0, 20($t1), then a bubble through all five stages, then sub $t2, $s0, $t3. With a one-stage stall, forwarding can get the data to the sub instruction in time.]
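The cases on the two data-hazard slides can be reduced to a comparison of cycle numbers. The sketch below is a simplified model, not the actual hazard-detection hardware. It assumes the producer is fetched in cycle 1, that the register file is written in the first half of a cycle and read in the second half (as the shading in the diagram suggests), and that forwarded values go to the consumer's EX stage. `stalls_needed` is a hypothetical name.

```python
def stalls_needed(producer_is_load, distance=1, forwarding=True):
    """Bubbles needed between a producer and a consumer that follows it
    `distance` instructions later in an IF/ID/EX/MEM/WB pipeline."""
    consumer_id = 2 + distance          # cycle of the consumer's ID stage
    consumer_ex = 3 + distance          # cycle of the consumer's EX stage
    if forwarding:
        # Value is ready at the end of EX (cycle 3), or at the end of MEM
        # (cycle 4) for a load; the consumer's EX must come strictly later.
        ready = 4 if producer_is_load else 3
        return max(0, ready + 1 - consumer_ex)
    # Without forwarding the consumer's ID must not precede the producer's WB (cycle 5).
    return max(0, 5 - consumer_id)

print(stalls_needed(False, forwarding=False))   # add then sub, no forwarding: 2
print(stalls_needed(False))                     # add then sub, forwarding: 0
print(stalls_needed(True))                      # lw then sub, forwarding: 1
```

The last case is the load-use hazard: even with forwarding, one bubble remains.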
Reordering Code to Avoid Pipeline Stall
• Example:
    lw $t0, 0($t1)
    lw $t2, 4($t1)
    sw $t2, 0($t1)    # data hazard: $t2 is used right after it is loaded
    sw $t0, 4($t1)

• Reordered code:
    lw $t0, 0($t1)
    lw $t2, 4($t1)
    sw $t0, 4($t1)    # the two sw instructions are interchanged
    sw $t2, 0($t1)
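A short script can confirm that the reordering removes the stall. This is a sketch under a simplifying assumption taken from the slide: any instruction that reads a register in the slot directly after the lw that loads it costs one stall. The tiny parser handles only the lw, sw and R-type forms used here, and all function names are illustrative.

```python
import re

def parse(line):
    """Split 'lw $t0, 0($t1)' into (op, [register operands in order])."""
    op, rest = line.split(None, 1)
    return op, re.findall(r"\$\w+", rest)

def reads(op, regs):
    """Registers an instruction reads (only lw, sw and R-type are handled)."""
    if op == "sw":
        return regs          # value to store and base address register
    return regs[1:]          # lw: base register; R-type: the two sources

def load_use_stalls(program):
    """Count instructions that read a register loaded by the lw just before them."""
    stalls = 0
    for prev, cur in zip(program, program[1:]):
        p_op, p_regs = parse(prev)
        c_op, c_regs = parse(cur)
        if p_op == "lw" and p_regs[0] in reads(c_op, c_regs):
            stalls += 1
    return stalls

original = ["lw $t0, 0($t1)", "lw $t2, 4($t1)", "sw $t2, 0($t1)", "sw $t0, 4($t1)"]
reordered = ["lw $t0, 0($t1)", "lw $t2, 4($t1)", "sw $t0, 4($t1)", "sw $t2, 0($t1)"]
print(load_use_stalls(original), load_use_stalls(reordered))   # 1 0
```

The two stores write to different addresses, so swapping them does not change the result of the program.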
Pipelined Datapath
• We now move to actually building a pipelined datapath
• First recall the 5 steps in instruction execution
1. Instruction Fetch & PC Increment (IF)
2. Instruction Decode and Register Read (ID)
3. Execute operation or calculate address (EX)
4. Memory access (MEM)
5. Write result into register (WB)
• Review: single-cycle processor
– all 5 steps done in a single clock cycle
– dedicated hardware required for each step

• What happens if we break the execution into multiple cycles, but keep
the extra hardware?
Review - Single-Cycle Data-path “Steps”

[Figure: single-cycle datapath (PC, instruction memory, register file, sign extend, shift left 2, ALU, data memory, adders and multiplexers) divided into five steps: IF (Instruction Fetch), ID (Instruction Decode), EX (Execute / Address Calc.), MEM (Memory Access), WB (Write Back).]
Pipelined Datapath – Key Idea
• What happens if we break the execution into multiple cycles, but keep the extra hardware?
– Answer: We may be able to start executing a new instruction at each clock cycle: pipelining
• …but we shall need extra registers to hold data between cycles: pipeline registers
Pipelined Datapath
Pipeline registers wide enough to hold data coming in
[Figure: the single-cycle datapath with pipeline registers inserted between the steps. Register widths: IF/ID 64 bits, ID/EX 128 bits, EX/MEM 97 bits, MEM/WB 64 bits.]

Bug in the Datapath

[Figure: the pipelined datapath with the IF/ID, ID/EX, EX/MEM and MEM/WB registers, highlighting the write register number (WN) input of the register file.]
The write register number comes from another, later instruction!


Corrected Datapath
[Figure: corrected pipelined datapath. Register widths: IF/ID 64 bits, ID/EX 133 bits, EX/MEM 102 bits, MEM/WB 69 bits. A 5-bit destination register number is carried along to the WN input of the register file.]
The destination register number is also passed through the ID/EX, EX/MEM and MEM/WB registers, which are now wider by 5 bits
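The register widths in the figures can be reproduced by adding up the fields each pipeline register has to hold. The slides give only the totals, so the breakdown below is an assumption: it is one plausible set of fields for this datapath that happens to sum to the stated widths.

```python
# Assumed contents of each pipeline register, in bits (the slides give only totals).
BASE_FIELDS = {
    "IF/ID":  {"PC+4": 32, "instruction": 32},
    "ID/EX":  {"PC+4": 32, "RD1": 32, "RD2": 32, "sign-extended immediate": 32},
    "EX/MEM": {"branch target": 32, "Zero": 1, "ALU result": 32, "RD2": 32},
    "MEM/WB": {"memory read data": 32, "ALU result": 32},
}

def widths(carry_write_register=False):
    """Total width of each pipeline register, optionally with the 5-bit
    destination register number carried through the last three registers."""
    out = {}
    for name, fields in BASE_FIELDS.items():
        bits = sum(fields.values())
        if carry_write_register and name != "IF/ID":
            bits += 5    # the register number travels with its instruction
        out[name] = bits
    return out

print(widths())       # {'IF/ID': 64, 'ID/EX': 128, 'EX/MEM': 97, 'MEM/WB': 64}
print(widths(True))   # {'IF/ID': 64, 'ID/EX': 133, 'EX/MEM': 102, 'MEM/WB': 69}
```

IF/ID does not grow because the destination register number is still part of the 32-bit instruction at that point.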
Pipelined Example
• Consider the following instruction sequence:
lw $t0, 10($t1)
sw $t3, 20($t4)
add $t5, $t6, $t7
sub $t8, $t9, $t10
Single-Clock-Cycle Diagrams: Clock Cycles 1 to 8
Clock cycle   IF    ID    EX    MEM   WB
1             LW    -     -     -     -
2             SW    LW    -     -     -
3             ADD   SW    LW    -     -
4             SUB   ADD   SW    LW    -
5             -     SUB   ADD   SW    LW
6             -     -     SUB   ADD   SW
7             -     -     -     SUB   ADD
8             -     -     -     -     SUB
Alternative View – Multiple-Clock-Cycle Diagram

Time axis             CC 1  CC 2  CC 3  CC 4  CC 5  CC 6  CC 7  CC 8
lw $t0, 10($t1)       IM    REG   ALU   DM    REG   -     -     -
sw $t3, 20($t4)       -     IM    REG   ALU   DM    REG   -     -
add $t5, $t6, $t7     -     -     IM    REG   ALU   DM    REG   -
sub $t8, $t9, $t10    -     -     -     IM    REG   ALU   DM    REG
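Both views of the example can be generated from one rule: with no stalls, instruction i enters IF in cycle i and moves one stage to the right every cycle. A minimal sketch, using the stage names IF, ID, EX, MEM and WB from the earlier slide (`occupancy` is a hypothetical helper):

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]

def occupancy(program):
    """Return one dict per clock cycle mapping stage name -> instruction."""
    total_cycles = len(program) + len(STAGES) - 1
    table = []
    for cycle in range(1, total_cycles + 1):
        row = {}
        for i, instr in enumerate(program):     # instruction i enters IF in cycle i + 1
            stage_index = cycle - 1 - i
            if 0 <= stage_index < len(STAGES):
                row[STAGES[stage_index]] = instr
        table.append(row)
    return table

program = ["lw", "sw", "add", "sub"]
for cycle, row in enumerate(occupancy(program), start=1):
    # One line per clock cycle, '-' marking an empty stage.
    print(cycle, " ".join(f"{s}={row.get(s, '-')}" for s in STAGES))
```

Each printed row corresponds to one single-clock-cycle diagram, and reading one instruction down the rows gives its line in the multiple-clock-cycle diagram.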


Notes
• One significant difference in the execution of an R-type instruction between the multi-cycle and pipelined implementations:
– register write-back for the R-type instruction happens in the 5th (last, write-back) pipeline stage vs. the 4th stage in the multi-cycle implementation. Why?
– think of structural hazards when writing to the register file…
• Worth repeating: the essential difference between the pipelined and multi-cycle implementations is the insertion of pipeline registers to decouple the 5 stages
• The CPI of an ideal pipeline (no stalls) is 1. Why?
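One way to approach the last question is to compute the CPI for growing instruction counts. The sketch assumes a 5-stage pipeline in which the first instruction takes 5 cycles and every later one completes one cycle after its predecessor. `cpi` and its `stall_cycles` parameter are illustrative names.

```python
def cpi(n, stages=5, stall_cycles=0):
    """Cycles per instruction for n instructions, counting pipeline fill time."""
    return (stages - 1 + n + stall_cycles) / n

print(cpi(4))           # 2.0 (the four-instruction example takes 8 cycles)
print(cpi(1_000_000))   # 1.000004
```

The fixed fill cost of 4 cycles is spread over more and more instructions, so the CPI tends to 1 when there are no stalls.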
