Lecture 08 – Pipelining
Joseph Zambreno
Electrical and Computer Engineering
Iowa State University
www.ece.iastate.edu/~zambreno
rcl.ece.iastate.edu
"As always, there's a couple of things in the pipeline - but that pipeline is a strange and ambiguous place" – Hugh Dancy
This Week’s Topic
• Multi-cycle processor datapath and control
– P&H D.3 (to some extent)
– Will move past fairly quickly
• Introduction to pipelined processor design
– P&H 4.5-4.6
– Pipelined datapath
– Pipelined control
[Figure: single-cycle datapath organized as Next PC, Instruction Fetch, Register Fetch, Ext, ALU, Mem Access, and Result Store (Reg. Wrt), with the signals nPC_sel, RegDst, RegWr, ExtOp, ALUSrc, ALUctr, MemRd, MemWr, and Equal]
• Looks like a FSM with PC as state
Zambreno, Fall 2019 © ISU CprE 381 (Pipelining) Lect-08.3
What’s wrong with our CPI=1 CPU?
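• Arithmetic & Logical: PC → Inst Memory → Reg File → mux → ALU → mux → setup
• Load (critical path): PC → Inst Memory → Reg File → mux → ALU → Data Mem → mux → setup
• Store: PC → Inst Memory → Reg File → mux → ALU → Data Mem
• Branch: PC → Inst Memory → Reg File → cmp → mux
[Figure: single-cycle timing over Cycle 1 and Cycle 2; lw fills its clock cycle, while sw finishes early and the remainder of its cycle is marked "Waste"]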
[Figure: multicycle datapath with a single Memory (Instr. or Data), Register File, and ALU, plus the internal registers IR, MDR, A, B, and ALUout; MemWrite and RegWrite are shown against the system clock cycle]
[Figure: complete multicycle datapath: PC, shared Memory (Instr. or Data), IR, MDR, Register File, A and B registers, ALU with ALU control (Instr[5-0]) and zero output, ALUout, sign extension and shift-left-2 of Instr[15-0], the jump address formed from Instr[25-0] shifted left 2 and PC[31-28], and the Instr[31-26] field]
• Reading from the Register File takes ~50% of a clock cycle since it
has additional control and access overhead (but reading can be
done in parallel with decode)
[Figure: complete multicycle datapath, as above]
A = Reg[IR[25-21]];
B = Reg[IR[20-16]];
ALUOut = PC + (sign-extend(IR[15-0]) << 2);
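The three register transfers above can be sketched in Python to make the bit-field arithmetic concrete. This is a minimal illustration, not the course's implementation: the function names `sign_extend16` and `decode_step`, the list-based register file, and the assumption that `pc` already holds the incremented PC are all hypothetical choices made for the example.

```python
def sign_extend16(value):
    """Interpret the low 16 bits of value as a signed two's-complement number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def decode_step(ir, pc, reg):
    """Model of the step above: returns (A, B, ALUOut).

    ir  -- 32-bit instruction word (contents of IR)
    pc  -- current PC value (assumed already incremented by the fetch step)
    reg -- register file, modeled as a list of 32 integers
    """
    rs = (ir >> 21) & 0x1F                      # IR[25-21]
    rt = (ir >> 16) & 0x1F                      # IR[20-16]
    a = reg[rs]                                 # A = Reg[IR[25-21]]
    b = reg[rt]                                 # B = Reg[IR[20-16]]
    # ALUOut = PC + (sign-extend(IR[15-0]) << 2), wrapped to 32 bits
    alu_out = (pc + (sign_extend16(ir) << 2)) & 0xFFFFFFFF
    return a, b, alu_out


# Example: rs = 1, rt = 2, 16-bit offset = -1 (0xFFFF)
regs = [0] * 32
regs[1], regs[2] = 7, 9
word = (4 << 26) | (1 << 21) | (2 << 16) | 0xFFFF
result = decode_step(word, 0x00400008, regs)
```

Note that the branch target is computed speculatively for every instruction, before the control knows whether the instruction is a branch; the value is simply ignored if it is not needed.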
[Figure: complete multicycle datapath, as above]
• R-type:
ALUOut = A op B;
• Branch:
if (A==B) PC = ALUOut;
• Jump:
PC = PC[31-28] || (IR[25-0] << 2);
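The three cases listed above can be written out as a small Python sketch. The function name `execute_step`, the string tags for the instruction class, and passing the ALU operation as a callable are assumptions made for illustration only; values are kept to 32 bits with explicit masks.

```python
def execute_step(kind, a, b, alu_out, pc, ir, op=None):
    """Return (ALUOut, PC) after the step above for one instruction class.

    kind -- "rtype", "branch", or "jump" (hypothetical tags)
    op   -- two-argument callable standing in for the ALU operation (R-type only)
    """
    if kind == "rtype":
        # ALUOut = A op B
        return op(a, b) & 0xFFFFFFFF, pc
    if kind == "branch":
        # if (A == B) PC = ALUOut
        return alu_out, (alu_out if a == b else pc)
    if kind == "jump":
        # PC = PC[31-28] || (IR[25-0] << 2)
        target = (pc & 0xF0000000) | ((ir & 0x03FFFFFF) << 2)
        return alu_out, target
    raise ValueError("unknown instruction class: " + kind)


rtype_result = execute_step("rtype", 3, 4, 0, 100, 0, op=lambda x, y: x + y)
taken = execute_step("branch", 5, 5, 0x40, 0x80, 0)
not_taken = execute_step("branch", 5, 6, 0x40, 0x80, 0)
jump = execute_step("jump", 0, 0, 0, 0x10000004, (2 << 26) | 0x100)
```

The `||` in the jump case is bit concatenation: the upper 4 bits of the PC are kept and the lower 28 bits come from the shifted 26-bit field.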
[Figure: complete multicycle datapath, as above]
Control FSM states (excerpt):
• Memory Access
– Load: MemRead, IorD=1, PCWriteCond=0
– Store: MemWrite, IorD=1, PCWriteCond=0
– R-type: RegDst=1, RegWrite, MemtoReg=0, PCWriteCond=0
• Write Back
– Load: RegDst=0, RegWrite, MemtoReg=1, PCWriteCond=0
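– Next-state function (determined by current state and the input)
– Output function (determined by current state)
[Figure: control FSM built from combinational control logic and a state register (State Reg); the inputs are the instruction opcode and the current state, and the outputs are the datapath control points and the next state]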
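A rough Python sketch of such a controller follows, with the next-state function and the output function kept separate. The state names, the instruction-class tags, and the transitions out of Fetch, Decode, and Exec are assumptions made for illustration; only the signal values of the four later states are taken from the state diagram excerpt, and the outputs of the earlier states are left empty because they are not shown here.

```python
# Output function: depends only on the current state.
# Signal values for these four states come from the state diagram excerpt;
# the state names themselves are hypothetical.
OUTPUTS = {
    "MemRead":    {"MemRead": 1, "IorD": 1, "PCWriteCond": 0},
    "MemWrite":   {"MemWrite": 1, "IorD": 1, "PCWriteCond": 0},
    "RTypeWrite": {"RegDst": 1, "RegWrite": 1, "MemtoReg": 0, "PCWriteCond": 0},
    "WriteBack":  {"RegDst": 0, "RegWrite": 1, "MemtoReg": 1, "PCWriteCond": 0},
}


def output(state):
    """Control signals asserted in a state (empty where the excerpt shows none)."""
    return OUTPUTS.get(state, {})


def next_state(state, instr):
    """Next-state function: depends on the current state and the input."""
    if state == "Fetch":
        return "Decode"
    if state == "Decode":
        return "Exec"
    if state == "Exec":
        return {"lw": "MemRead", "sw": "MemWrite", "rtype": "RTypeWrite"}[instr]
    if state == "MemRead":
        return "WriteBack"
    return "Fetch"  # MemWrite, RTypeWrite, and WriteBack all finish the instruction


def trace(instr):
    """List the states one instruction passes through before returning to Fetch."""
    state, seen = "Fetch", []
    while True:
        seen.append(state)
        state = next_state(state, instr)
        if state == "Fetch":
            return seen
```

Under these assumptions, `trace("lw")` visits five states and `trace("sw")` visits four, which is why different instructions take different numbers of cycles on the multicycle design.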
Microinstructions
[Figure: single-cycle timing over Cycle 1 and Cycle 2; lw fills its clock cycle, while sw leaves part of its cycle marked "Waste"]
Multiple Cycle Implementation:
• The multicycle clock period is longer than 1/5th of the single-cycle clock period due to state register overhead
– lw (Cycles 1-5): IFetch, Dec, Exec, Mem, WB
– sw (Cycles 6-9): IFetch, Dec, Exec, Mem
– R-type (from Cycle 10): IFetch, ...
Pipeline Implementation:
• Pipeline clock is the same as the multi-cycle clock
– lw (Cycles 1-5): IFetch, Dec, Exec, Mem, WB
– sw (Cycles 2-6): IFetch, Dec, Exec, Mem, WB
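The cycle counts in the two timing diagrams can be reproduced with a short Python sketch. It assumes an ideal pipeline with no stalls or hazards, and the function names are hypothetical; the per-instruction step counts (5 for lw, 4 for sw) are read off the multicycle diagram.

```python
STAGES = ["IFetch", "Dec", "Exec", "Mem", "WB"]


def multicycle_cycles(steps_per_instr):
    """Multicycle: instructions run back to back, each using only the steps it needs."""
    return sum(steps_per_instr)


def pipelined_cycles(n_instr, n_stages=len(STAGES)):
    """Ideal pipeline: the first instruction takes n_stages cycles,
    and each later instruction completes one cycle after the previous one."""
    return n_stages + n_instr - 1


def pipeline_chart(names):
    """Build one row per instruction; instruction i starts in cycle i + 1."""
    return [(name, [""] * i + STAGES) for i, name in enumerate(names)]


multi = multicycle_cycles([5, 4])        # lw then sw on the multicycle design
piped = pipelined_cycles(2)              # lw then sw on the ideal pipeline
chart = pipeline_chart(["lw", "sw"])
```

With the same clock period for both designs, the lw/sw pair takes 9 cycles on the multicycle processor and 6 on the pipeline; the gap widens as more instructions are overlapped.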
[Figure: pipelined datapath: PC, Instruction Memory, Register File, sign extend, ALU, and Data Memory, separated by the pipeline registers IFetch/Dec, Dec/Exec, Exec/Mem, and Mem/WB, all driven by the system clock]
[Figure: pipelined datapath with a Control unit and the pipeline registers IF/ID, ID/EX, EX/MEM, and MEM/WB]
• Nonpipelined Execution:
– lw: IF + Read Reg + ALU + Memory + Write Reg
= 2 + 1 + 2 + 2 + 1 = 8 ns
– add: IF + Read Reg + ALU + Write Reg
= 2 + 1 + 2 + 1 = 6 ns
(recall 8 ns for the single-cycle processor)
• Pipelined Execution:
– Clock cycle = Max(IF, Read Reg, ALU, Memory, Write Reg) = 2 ns
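The arithmetic above can be checked with a few lines of Python. The stage latencies are the ones given in the bullets; the dictionary layout, the function names, and the three-instruction comparison at the end are assumptions added for illustration, and the pipelined total assumes an ideal pipeline with no stalls.

```python
# Stage latencies in ns, as given above.
LATENCY_NS = {"IF": 2, "Read Reg": 1, "ALU": 2, "Memory": 2, "Write Reg": 1}

# Which stages each instruction actually uses.
USES = {
    "lw":  ["IF", "Read Reg", "ALU", "Memory", "Write Reg"],
    "add": ["IF", "Read Reg", "ALU", "Write Reg"],
}


def nonpipelined_latency(instr):
    """Sum of the latencies of the stages the instruction uses."""
    return sum(LATENCY_NS[stage] for stage in USES[instr])


def pipelined_clock():
    """The pipeline clock must accommodate the slowest stage."""
    return max(LATENCY_NS.values())


def pipelined_total(n_instr):
    """Ideal pipeline: (stages + n - 1) cycles at the pipelined clock period."""
    return (len(LATENCY_NS) + n_instr - 1) * pipelined_clock()


lw_ns = nonpipelined_latency("lw")       # 8
add_ns = nonpipelined_latency("add")     # 6
clock_ns = pipelined_clock()             # 2
three_lw_single_cycle = 3 * lw_ns        # three lw at 8 ns each
three_lw_pipelined = pipelined_total(3)  # (5 + 3 - 1) * 2
```

Note that pipelining does not shorten an individual instruction: a lw now takes 5 × 2 = 10 ns from fetch to write-back, longer than the 8 ns nonpipelined latency. The gain is in throughput, since one instruction can complete every 2 ns once the pipeline is full.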
[Figure: pipelined datapath, as above, alongside the per-instruction resource view IM, Reg, ALU, DM, Reg]