Week 10 Part 1 Pipelined Processor


COM1031 Computer Logic

Week 10
Pipelined Processor
Week 10

• Pipelined Processor

Digital Design and Computer Architecture: ARM® Edition © 2015


Review: Single-Cycle ARM Processor
[Figure: single-cycle ARM processor. The Control Unit takes the instruction fields Cond (31:28), Op (27:26), Funct (25:20), and Rd (15:12) plus the ALUFlags, and produces PCSrc, MemtoReg, MemWrite, ALUControl, ALUSrc, ImmSrc, RegWrite, and RegSrc. The datapath contains the PC, Instruction Memory, Register File, Extend unit, ALU, and Data Memory.]



Pipelined ARM Processor
• Aim: improve performance by increasing instruction throughput
• Use temporal parallelism
• Divide single-cycle processor into 5 stages:
– Fetch
– Decode
– Execute
– Memory
– Writeback
• Add pipeline registers between stages



Single-Cycle vs. Pipelined
[Figure: timing diagrams on a 0 to 1500 ps time axis. Single-cycle: each instruction completes Fetch (Instruction), Dec (Read Reg), Execute (ALU), Memory (Read / Write), and Wr (Reg) before the next instruction starts. Pipelined (b): instructions 1, 2, and 3 overlap, each starting one stage after the previous one.]



Pipelined Processor Abstraction
[Figure: pipeline diagram over cycles 1 to 10. Each instruction passes through IM, RF (read), ALU, DM, and RF (write), starting one cycle after the previous instruction.]
LDR R2, [R0, #40]
ADD R3, R9, R10
SUB R4, R1, R5
AND R5, R12, R13
STR R6, [R1, #20]
ORR R7, R11, #42
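
The 10-cycle span of this diagram can be reproduced with a minimal Python sketch, assuming an ideal 5-stage pipeline with no hazards or stalls. The names `ideal_schedule` and `total_cycles` are hypothetical helpers introduced here for illustration only.

```python
STAGES = ["Fetch", "Decode", "Execute", "Memory", "Writeback"]

def ideal_schedule(program):
    """Map each instruction to {cycle: stage}, assuming no hazards or stalls."""
    schedule = []
    for i, instr in enumerate(program):
        # Instruction i (0-based) enters Fetch in cycle i + 1 and then
        # advances one stage per cycle.
        stages_by_cycle = {i + 1 + s: stage for s, stage in enumerate(STAGES)}
        schedule.append((instr, stages_by_cycle))
    return schedule

def total_cycles(n_instructions, n_stages=len(STAGES)):
    """Cycles needed to finish n instructions in an ideal pipeline."""
    if n_instructions == 0:
        return 0
    return n_instructions + n_stages - 1

program = [
    "LDR R2, [R0, #40]",
    "ADD R3, R9, R10",
    "SUB R4, R1, R5",
    "AND R5, R12, R13",
    "STR R6, [R1, #20]",
    "ORR R7, R11, #42",
]
sched = ideal_schedule(program)
print(total_cycles(len(program)))  # six instructions, five stages
```

Under these assumptions the last instruction (ORR) is fetched in cycle 6 and writes back in cycle 10, which matches the cycle axis of the diagram.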



Single-Cycle & Pipelined Datapath
[Figure: the single-cycle datapath and the pipelined datapath. The pipelined version adds clocked pipeline registers between the Fetch, Decode, Execute, Memory, and Writeback stages. Signal names carry a stage suffix: PCF, PCPlus4F, InstrF, InstrD, RA1D, RA2D, WA3D, SrcAE, SrcBE, ExtImmE, ALUResultE, WriteDataE, ALUOutM, ALUOutW, ReadDataW, ResultW.]



Pipeline Hazards
• A hazard occurs when an instruction depends on the result of an instruction that has not yet completed
• Types:
– Data hazard: register value not yet written back to the register file
– Control hazard: next instruction not yet decided (caused by a branch)



Data Hazard
[Figure: pipeline diagram over cycles 1 to 8. ADD writes R1; the following AND, ORR, and SUB all read R1.]
ADD R1, R4, R5
AND R8, R1, R3
ORR R9, R6, R1
SUB R10, R1, R7
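
A minimal Python sketch of how such read-after-write dependences could be found in this sequence. It assumes a hazard window of two instructions, that is, that the register file is written in the first half of a cycle and read in the second half, so the third instruction after the producer already sees the new value. The helpers `parse` and `raw_hazards` are hypothetical and only handle the simple `OP Rd, Rn, Rm` form used on this slide.

```python
import re

def parse(instr):
    """Split 'OP Rd, Rn, Rm' into (dest, sources). Data-processing form only."""
    _, operands = instr.split(None, 1)
    regs = re.findall(r"R\d+", operands)
    return regs[0], regs[1:]

def raw_hazards(program, window=2):
    """Return (producer_index, consumer_index, register) for every
    read-after-write dependence within `window` instructions."""
    hazards = []
    for i, instr in enumerate(program):
        dest, _ = parse(instr)
        for j in range(i + 1, min(i + 1 + window, len(program))):
            dest_j, srcs_j = parse(program[j])
            if dest in srcs_j:
                hazards.append((i, j, dest))
            if dest_j == dest:
                break  # a later write to the same register supersedes this one
    return hazards

program = [
    "ADD R1, R4, R5",
    "AND R8, R1, R3",
    "ORR R9, R6, R1",
    "SUB R10, R1, R7",
]
print(raw_hazards(program))
```

Under the stated assumption, AND and ORR are flagged because they read R1 too early, while SUB is not.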



Handling Data Hazards
• Insert NOPs in code at compile time
• Rearrange code at compile time
• Forward data at run time
• Stall the processor at run time



Compile-Time Hazard Elimination
• Insert enough NOPs for result to be ready
• Or move independent useful instructions forward
[Figure: pipeline diagram over cycles 1 to 10. Two NOPs separate ADD from AND, so R1 has been written before AND reads it.]
ADD R1, R4, R5
NOP
NOP
AND R8, R1, R3
ORR R9, R6, R1
SUB R10, R1, R7
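
NOP insertion can be sketched in Python as follows. This is an illustrative sketch, not a real compiler pass: it assumes a consumer must be separated from its producer by at least two other instructions (the gap shown in the diagram), and it only understands `NOP` and the `OP Rd, Rn, Rm` form. The names `dest_and_sources` and `insert_nops` are hypothetical.

```python
import re

def dest_and_sources(instr):
    """Return (dest, sources) for 'OP Rd, Rn, Rm'; a NOP has neither."""
    if instr == "NOP":
        return None, []
    regs = re.findall(r"R\d+", instr.split(None, 1)[1])
    return regs[0], regs[1:]

def insert_nops(program, gap=2):
    """Insert NOPs so each consumer is at least `gap` instructions
    behind the instruction that produces its operand."""
    out = []
    for instr in program:
        _, srcs = dest_and_sources(instr)
        needed = 0
        # Inspect the last `gap` instructions already emitted.
        for back in range(1, gap + 1):
            if back > len(out):
                break
            dest, _ = dest_and_sources(out[-back])
            if dest in srcs:
                needed = max(needed, gap - back + 1)
        out.extend(["NOP"] * needed)
        out.append(instr)
    return out

program = [
    "ADD R1, R4, R5",
    "AND R8, R1, R3",
    "ORR R9, R6, R1",
    "SUB R10, R1, R7",
]
for line in insert_nops(program):
    print(line)
```

For this program the sketch places two NOPs after ADD and none elsewhere, the same sequence as in the diagram. Moving independent instructions into those slots instead of NOPs would avoid the wasted cycles.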




Data Forwarding
[Figure: pipeline diagram over cycles 1 to 8. The result of ADD (R1) is forwarded to the Execute stage of AND and ORR.]
ADD R1, R4, R5
AND R8, R1, R3
ORR R9, R6, R1
SUB R10, R1, R7

• Check whether a register read in the Execute stage matches the register written in the Memory or Writeback stage
• If so, forward the result
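
The check described in these bullets can be sketched as a small Python function for one ALU operand. The function name, parameter names, and return labels are hypothetical. Giving the Memory stage priority over Writeback is an assumption made here because the Memory stage holds the more recently computed value.

```python
def forward_select(src_reg_e, wa3_m, reg_write_m, wa3_w, reg_write_w):
    """Choose where an Execute-stage source operand should come from.

    src_reg_e   -- register number read by the instruction in Execute
    wa3_m/wa3_w -- destination register of the instruction in Memory/Writeback
    reg_write_* -- True if that instruction actually writes a register
    """
    if reg_write_m and src_reg_e == wa3_m:
        return "memory"      # forward the ALU result held in the Memory stage
    if reg_write_w and src_reg_e == wa3_w:
        return "writeback"   # forward the value about to be written back
    return "regfile"         # no match: use the value read in Decode

# AND R8, R1, R3 in Execute while ADD R1, ... is in Memory
print(forward_select(1, 1, True, 0, False))
# ORR R9, R6, R1 in Execute while ADD R1, ... is in Writeback (AND in Memory)
print(forward_select(1, 8, True, 1, True))
```

The same check would be applied independently to each of the two source operands.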



Stalling
[Figure: pipeline diagram over cycles 1 to 8. LDR loads R1 and the next instruction, AND, reads R1. The dependence is marked "Trouble!".]
LDR R1, [R4, #40]
AND R8, R1, R3
ORR R9, R6, R1
SUB R10, R1, R7

Stalling
[Figure: the same sequence over cycles 1 to 9 with a one-cycle stall. AND repeats its RF (Decode) stage and ORR repeats its IM (Fetch) stage, so SUB starts one cycle later.]
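
A minimal Python sketch of the condition that would trigger this stall: the instruction in Execute is a load, and its destination matches a source register of the instruction in Decode. The function and parameter names are hypothetical, and register numbers are plain integers.

```python
def ldr_stall(ra1_d, ra2_d, wa3_e, mem_to_reg_e):
    """Return True if the instruction in Decode must wait one cycle.

    ra1_d, ra2_d -- source registers of the instruction in Decode
    wa3_e        -- destination register of the instruction in Execute
    mem_to_reg_e -- True if the instruction in Execute is a load
    """
    return bool(mem_to_reg_e) and (ra1_d == wa3_e or ra2_d == wa3_e)

# LDR R1, [R4, #40] in Execute, AND R8, R1, R3 in Decode
print(ldr_stall(1, 3, 1, True))
# ADD R1, R4, R5 in Execute instead of a load: forwarding is enough
print(ldr_stall(1, 3, 1, False))
```

When the condition holds, the Fetch and Decode stages hold their contents for one cycle, which is why the four-instruction sequence takes 9 cycles instead of 8.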



Control Hazards
• B (branch):
– Branch not determined until the Writeback stage of the pipeline
– Instructions after the branch are fetched before the branch occurs
– These 4 instructions must be flushed if the branch happens
• Writes to PC (R15) are similar



Control Hazards
[Figure: pipeline diagram over cycles 1 to 10. The four instructions fetched after the branch at address 20 are marked "Flush these instructions"; the next instruction shown is the ADD at address 64.]
20  B 3C
24  AND R8, R1, R3
28  ORR R9, R6, R1
2C  SUB R10, R1, R7
30  SUB R11, R1, R8
34  ...
64  ADD R12, R3, R4

Branch misprediction penalty
• Number of instructions flushed when the branch is taken (4)
• May be reduced by determining the BTA (branch target address) earlier
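
As a small illustration, the addresses of the flushed instructions can be listed in Python. This sketch assumes 4-byte instructions and the 4-instruction penalty stated above. `flushed_addresses` is a hypothetical helper.

```python
def flushed_addresses(branch_addr, penalty=4, instr_bytes=4):
    """Addresses of the instructions fetched after a taken branch
    that must be flushed, assuming sequential fetch."""
    return [branch_addr + instr_bytes * k for k in range(1, penalty + 1)]

print([hex(a) for a in flushed_addresses(0x20)])
```

Passing a smaller `penalty` models the effect of determining the branch earlier in the pipeline.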



Pipelined Performance Example
• SPECINT2000 benchmark:
– 25% loads
– 10% stores
– 13% branches
– 52% data processing
• Suppose:
– 40% of loads used by next instruction
– 50% of branches mispredicted
• What is the average CPI?

– Load CPI = 1 when not stalling, 2 when stalling
So, CPI_load = 1(0.6) + 2(0.4) = 1.4
– Branch CPI = 1 when predicted correctly, 3 when mispredicted
So, CPI_branch = 1(0.5) + 3(0.5) = 2

Average CPI = (0.25)(1.4) + (0.10)(1) + (0.13)(2) + (0.52)(1) = 1.23
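
The same calculation as a short Python sketch. The function name and parameters are hypothetical. The default `branch_penalty=2` is inferred from the branch CPI of 3 used above (1 + 2), and `load_stall=1` from the load CPI of 2.

```python
def average_cpi(mix, load_use_frac, mispredict_frac,
                load_stall=1, branch_penalty=2):
    """Weighted average CPI over the instruction mix."""
    cpi = {
        "load": 1 + load_use_frac * load_stall,
        "store": 1,
        "branch": 1 + mispredict_frac * branch_penalty,
        "data": 1,
    }
    return sum(mix[kind] * cpi[kind] for kind in mix)

mix = {"load": 0.25, "store": 0.10, "branch": 0.13, "data": 0.52}
print(round(average_cpi(mix, load_use_frac=0.4, mispredict_frac=0.5), 2))
```

With a 4-cycle branch penalty (the flush count given for branches resolved in Writeback), the same function gives an average CPI of about 1.36, which shows how much resolving branches earlier helps.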
