
Pipeline Hazards

Computer Science and Artificial Intelligence Laboratory

Arvind M.I.T.

Based on the material prepared by Arvind and Krste Asanovic

6.823 L6- 2 Arvind

Technology Assumptions
- A small amount of very fast memory (caches) backed up by a large, slower memory
- Fast ALU (at least for integers)
- Multiported register files (slower!)

These make the following timing assumption valid:

t_IM ≈ t_RF ≈ t_ALU ≈ t_DM ≈ t_RW

A 5-stage pipelined Harvard architecture will be the focus of our detailed design

September 28, 2005

5-Stage Pipelined Execution

[Figure: 5-stage pipelined datapath with a 0x4 adder, Inst. Memory, GPRs, Imm Ext, ALU and Data Memory. Stages: I-Fetch (IF), Decode and Reg. Fetch (ID), Execute (EX), Memory (MA), Write-Back (WB).]

time          t0   t1   t2   t3   t4   t5   t6   t7   ....
instruction1  IF1  ID1  EX1  MA1  WB1
instruction2       IF2  ID2  EX2  MA2  WB2
instruction3            IF3  ID3  EX3  MA3  WB3
instruction4                 IF4  ID4  EX4  MA4  WB4
instruction5                      IF5  ID5  EX5  MA5  WB5
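The staggered schedule in this diagram follows one rule: instruction i enters IF at cycle t(i-1) and then advances one stage per cycle. A minimal Python sketch of that rule is given here; the function names `pipeline_schedule` and `render` are illustrative and not part of the lecture, and the sketch assumes an ideal pipeline with no stalls or kills.

```python
STAGES = ["IF", "ID", "EX", "MA", "WB"]


def pipeline_schedule(num_instructions):
    """Map each instruction number to {cycle: stage label} for an ideal pipeline."""
    schedule = {}
    for i in range(1, num_instructions + 1):
        start = i - 1  # instruction i is fetched at cycle t(i-1)
        schedule[i] = {start + s: f"{stage}{i}" for s, stage in enumerate(STAGES)}
    return schedule


def render(schedule):
    """Format the schedule as a plain-text pipeline diagram."""
    last_cycle = max(max(row) for row in schedule.values())
    lines = ["time          " + "".join(f"t{c:<4}" for c in range(last_cycle + 1))]
    for i, row in schedule.items():
        cells = "".join(f"{row.get(c, ''):<5}" for c in range(last_cycle + 1))
        lines.append(f"instruction{i}  " + cells.rstrip())
    return "\n".join(lines)


if __name__ == "__main__":
    print(render(pipeline_schedule(5)))
```

With no hazards, n instructions finish in n + 4 cycles, so the pipeline approaches one instruction per cycle as n grows.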

5-Stage Pipelined Execution: Resource Usage Diagram

[Figure: the same 5-stage datapath (0x4 adder, Inst. Memory, GPRs, Imm Ext, ALU, Data Memory), with each stage treated as a resource.]

Resources versus time:

time  t0   t1   t2   t3   t4   t5   t6   t7   ....
IF    I1   I2   I3   I4   I5
ID         I1   I2   I3   I4   I5
EX              I1   I2   I3   I4   I5
MA                   I1   I2   I3   I4   I5
WB                        I1   I2   I3   I4   I5

Pipelined Execution: ALU Instructions

[Figure: pipelined datapath with a 0x4 adder, Inst Memory, a single IR, GPRs, Imm Ext, ALU and Data Memory, plus pipeline registers A, MD1 and MD2.]

Not quite correct! We need an Instruction Reg (IR) for each stage.

IRs and Control points

[Figure: the pipelined datapath (0x4 adder, Inst Memory, GPRs, Imm Ext, ALU, Data Memory, registers A, MD1, MD2) with an IR added at each stage.]

Are control points connected properly?
- ALU instructions
- Load/Store instructions
- Write back

Pipelined MIPS Datapath (without jumps)

[Figure: pipelined MIPS datapath with stages F, D, E, M, W and an IR per stage. Control signals shown: RegDst, RegWrite, ExtSel, OpSel, BSrc, MemWrite, WBSrc. Pipeline registers: A, Y, MD1, MD2.]

How Instructions can Interact with each other in a pipeline

- An instruction in the pipeline may need a resource being used by another instruction in the pipeline → structural hazard
- An instruction may produce data that is needed by a later instruction → data hazard
- In the extreme case, an instruction may determine the next instruction to be executed → control hazard (branches, interrupts, ...)

Data Hazards

[Figure: pipelined datapath with "r4 ← r1 ..." in the decode stage while "r1 ← ..." is still in the execute stage.]

...
r1 ← r0 + 10
r4 ← r1 + 17
...

r1 is stale. Oops!

Resolving Data Hazards

- Freeze earlier pipeline stages until the data becomes available → interlocks
- If data is available somewhere in the datapath, provide a bypass to get it to the right stage
- Speculate about the hazard resolution and kill the instruction later if the speculation is wrong

Feedback to Resolve Hazards

[Figure: four pipeline stages (stage 1 to stage 4) with feedback signals FB1 to FB4 running back to earlier stages.]

- Detect a hazard and provide feedback to previous stages to stall or kill instructions
- Controlling a pipeline in this manner works provided the instruction at stage i+1 can complete without any interference from instructions in stages 1 to i (otherwise deadlocks may occur)

Interlocks to resolve Data Hazards

[Figure: pipelined datapath with a "Stall Condition" block in the decode stage and a mux that inserts a nop into the execute-stage IR.]

...
r1 ← r0 + 10
r4 ← r1 + 17
...

Stalled Stages and Pipeline Bubbles

time                  t0   t1   t2   t3   t4   t5   t6   t7   ....
(I1) r1 ← (r0) + 10   IF1  ID1  EX1  MA1  WB1
(I2) r4 ← (r1) + 17        IF2  ID2  ID2  ID2  ID2  EX2  MA2  WB2
(I3)                            IF3  IF3  IF3  IF3  ID3  EX3  MA3  WB3
(I4)                                                IF4  ID4  EX4  MA4  WB4
(I5)                                                     IF5  ID5  EX5  MA5  WB5

The repeated ID2 and IF3 entries are the stalled stages.

Resource Usage

time  t0   t1   t2   t3   t4   t5   t6   t7   ....
IF    I1   I2   I3   I3   I3   I3   I4   I5
ID         I1   I2   I2   I2   I2   I3   I4   I5
EX              I1   nop  nop  nop  I2   I3   I4   I5
MA                   I1   nop  nop  nop  I2   I3   I4   I5
WB                        I1   nop  nop  nop  I2   I3   I4   I5

nop → pipeline bubble

Interlock Control Logic

[Figure: pipelined datapath with a Cstall block that takes rs and rt from the decode stage and ws from a later stage, and produces the stall signal.]

Compare the source registers of the instruction in the decode stage with the destination register of the uncommitted instructions.

Interlocks Control Logic (ignoring jumps & branches)

[Figure: pipelined datapath where Cstall takes rs, rt, re1 and re2 from the decode stage (re1, re2 produced by Cre) and ws, we from each later stage (produced by a Cdest block per stage).]

Should we always stall if the rs field matches some rd?
- not every instruction writes a register → we
- not every instruction reads a register → re

Source & Destination Registers

R-type:  op | rs | rt | rd | func
I-type:  op | rs | rt | immediate16
J-type:  op | immediate26

instr  semantics                             source(s)  destination
ALU    rd ← (rs) func (rt)                   rs, rt     rd
ALUi   rt ← (rs) op imm                      rs         rt
LW     rt ← M[(rs) + imm]                    rs         rt
SW     M[(rs) + imm] ← (rt)                  rs, rt
BZ     cond (rs)                             rs
         true:  PC ← (PC) + imm
         false: PC ← (PC) + 4
J      PC ← (PC) + imm
JAL    r31 ← (PC), PC ← (PC) + imm                      31
JR     PC ← (rs)                             rs
JALR   r31 ← (PC), PC ← (rs)                 rs         31

Deriving the Stall Signal

Cdest
ws = Case opcode
       ALU        → rd
       ALUi, LW   → rt
       JAL, JALR  → R31

we = Case opcode
       ALU, ALUi, LW  → (ws ≠ 0)
       JAL, JALR      → on
       ...            → off

Cre
re1 = Case opcode
        ALU, ALUi, LW, SW, BZ, JR, JALR  → on
        J, JAL                           → off

re2 = Case opcode
        ALU, SW  → on
        ...      → off

Cstall
stall = ((rsD = wsE).weE + (rsD = wsM).weM + (rsD = wsW).weW).re1D
      + ((rtD = wsE).weE + (rtD = wsM).weM + (rtD = wsW).weW).re2D

This is not the full story!
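The Cdest, Cre and Cstall equations on this slide can be sketched in Python as follows. This is an illustration only: the `Instr` record, the string opcode classes and the `"NOP"` bubble are assumptions made for the example, not part of the lecture's datapath. As on the slide, jumps and branches killing instructions are ignored.

```python
from dataclasses import dataclass


@dataclass
class Instr:
    opcode: str  # opcode class: "ALU", "ALUi", "LW", "SW", "BZ", "J", "JAL", "JR", "JALR", "NOP"
    rs: int = 0
    rt: int = 0
    rd: int = 0


NOP = Instr("NOP")  # hypothetical bubble: reads and writes nothing


def ws(i):
    """Cdest: destination register specifier."""
    if i.opcode == "ALU":
        return i.rd
    if i.opcode in ("ALUi", "LW"):
        return i.rt
    if i.opcode in ("JAL", "JALR"):
        return 31
    return 0


def we(i):
    """Cdest: does the instruction write a register?"""
    if i.opcode in ("ALU", "ALUi", "LW"):
        return ws(i) != 0
    return i.opcode in ("JAL", "JALR")


def re1(i):
    """Cre: does the instruction read rs?"""
    return i.opcode in ("ALU", "ALUi", "LW", "SW", "BZ", "JR", "JALR")


def re2(i):
    """Cre: does the instruction read rt?"""
    return i.opcode in ("ALU", "SW")


def stall(d, e, m, w):
    """Cstall: d is the decode-stage instruction; e, m, w are in EX, MA, WB."""
    rs_match = any(d.rs == ws(x) and we(x) for x in (e, m, w))
    rt_match = any(d.rt == ws(x) and we(x) for x in (e, m, w))
    return (rs_match and re1(d)) or (rt_match and re2(d))


if __name__ == "__main__":
    i1 = Instr("ALUi", rs=0, rt=1)  # r1 <- r0 + 10
    i2 = Instr("ALUi", rs=1, rt=4)  # r4 <- r1 + 17
    print(stall(i2, i1, NOP, NOP))
```

For the running example, I2 stalls in decode while I1 is in EX, MA or WB, which matches the four ID2 entries in the stalled pipeline diagram.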

Hazards due to Loads & Stores

[Figure: pipelined datapath with the Stall Condition block and the nop mux ahead of the execute-stage IR.]

...
M[(r1)+7] ← (r2)
r4 ← M[(r3)+5]
...

What if (r1)+7 = (r3)+5?

Is there any possible data hazard in this instruction sequence?

Load & Store Hazards

...
M[(r1)+7] ← (r2)
r4 ← M[(r3)+5]
...

(r1)+7 = (r3)+5 ⇒ data hazard

However, the hazard is avoided because our memory system completes writes in one cycle!

Load/Store hazards, even when they do exist, are often resolved in the memory system itself. More on this later in the course.

Five-minute break to stretch your legs

Complications due to Jumps

[Figure: fetch and decode stages with a PCSrc mux (pc+4 / jabs / rind / br) in front of the PC, the stall signal, and a "Jump?" test on the decode-stage IR. Snapshot: I1 in execute, I2 in decode, PC = 104.]

I1  096  ADD
I2  100  J 200
I3  104  ADD    ← kill
I4  304  ADD

Note: fetching the next instruction before decode is speculation ⇒ kill.

A jump instruction kills (not stalls) the following instruction. How?

Pipelining Jumps

[Figure: the fetch and decode stages as before, with a mux controlled by IRSrcD in front of the decode-stage IR. When I2 (the jump) is in decode, the PC changes from 104 to 304 and the decode-stage IR receives a nop instead of I3.]

To kill a fetched instruction, insert a mux before IR.

IRSrcD = Case opcodeD
           J, JAL  → nop
           ...     → IM

I1  096  ADD
I2  100  J 200
I3  104  ADD    ← kill
I4  304  ADD

Any interaction between stall and jump?

Jump Pipeline Diagrams

time             t0   t1   t2   t3   t4   t5   t6   t7   ....
(I1) 096: ADD    IF1  ID1  EX1  MA1  WB1
(I2) 100: J 200       IF2  ID2  EX2  MA2  WB2
(I3) 104: ADD              IF3  nop  nop  nop  nop
(I4) 304: ADD                   IF4  ID4  EX4  MA4  WB4

Resource Usage

time  t0   t1   t2   t3   t4   t5   t6   t7   ....
IF    I1   I2   I3   I4   I5
ID         I1   I2   nop  I4   I5
EX              I1   I2   nop  I4   I5
MA                   I1   I2   nop  I4   I5
WB                        I1   I2   nop  I4   I5

nop → pipeline bubble

Pipelining Conditional Branches

[Figure: fetch, decode and execute stages with the PCSrc mux (pc+4 / jabs / rind / br), the stall signal, the IRSrcD mux, a "BEQZ?" test on the decode-stage IR and a "zero?" test in the execute stage. Snapshot: I1 in execute, I2 in decode, PC = 104.]

I1  096  ADD
I2  100  BEQZ r1 200
I3  104  ADD
I4  304  ADD

Branch condition is not known until the execute stage. What action should be taken in the decode stage?

Pipelining Conditional Branches

[Figure: the same datapath one cycle later. I1 is in the memory stage, I2 (BEQZ) is in execute where "zero?" is evaluated, I3 is in decode, and PC = 108.]

I1  096  ADD
I2  100  BEQZ r1 200
I3  104  ADD
I4  304  ADD

If the branch is taken:
- kill the two following instructions
- the instruction at the decode stage is not valid ⇒ stall signal is not valid

[Figure: the same snapshot with a second mux, IRSrcE, in front of the execute-stage IR, so that the instruction in decode can be replaced by a nop as well as the one being fetched.]

New Stall Signal

stall = ( ((rsD = wsE).weE + (rsD = wsM).weM + (rsD = wsW).weW).re1D
        + ((rtD = wsE).weE + (rtD = wsM).weM + (rtD = wsW).weW).re2D )
        . !((opcodeE = BEQZ).z + (opcodeE = BNEZ).!z)

Don't stall if the branch is taken. Why? The instruction at the decode stage is invalid.

Control Equations for PC and IR Muxes

PCSrc = Case opcodeE
          BEQZ.z, BNEZ.!z  → br
          ...              → Case opcodeD
                               J, JAL    → jabs
                               JR, JALR  → rind
                               ...       → pc+4

IRSrcD = Case opcodeE
           BEQZ.z, BNEZ.!z  → nop
           ...              → Case opcodeD
                                J, JAL, JR, JALR  → nop
                                ...               → IM

IRSrcE = Case opcodeE
           BEQZ.z, BNEZ.!z  → nop
           ...              → stall.nop + !stall.IRD

Give priority to the older instruction, i.e., execute stage instruction over decode stage instruction.
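The priority rule (execute-stage instruction first, then decode-stage instruction) can be sketched as three small Python functions. The function names and the string encodings of the mux selections are assumptions made for illustration; the sketch mirrors the case equations on this slide and does not model any further interaction between stall and jump.

```python
def branch_taken(opcode_e, z):
    """True when the execute-stage instruction is a taken conditional branch."""
    return (opcode_e == "BEQZ" and z) or (opcode_e == "BNEZ" and not z)


def pc_src(opcode_d, opcode_e, z):
    """Select the next-PC source; the older (execute-stage) instruction wins."""
    if branch_taken(opcode_e, z):
        return "br"
    if opcode_d in ("J", "JAL"):
        return "jabs"
    if opcode_d in ("JR", "JALR"):
        return "rind"
    return "pc+4"


def ir_src_d(opcode_d, opcode_e, z):
    """Select what is latched into the decode-stage IR."""
    if branch_taken(opcode_e, z):
        return "nop"
    if opcode_d in ("J", "JAL", "JR", "JALR"):
        return "nop"
    return "IM"


def ir_src_e(opcode_e, z, stall):
    """Select what is latched into the execute-stage IR."""
    if branch_taken(opcode_e, z):
        return "nop"
    return "nop" if stall else "IRD"


if __name__ == "__main__":
    # A taken BEQZ in execute overrides a jump sitting in decode.
    print(pc_src("J", "BEQZ", True), ir_src_d("J", "BEQZ", True))
```

Checking the execute-stage branch before the decode-stage opcode is what encodes the priority: a jump in decode that sits behind a taken branch is itself on the wrong path and must not redirect the PC.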

Branch Pipeline Diagrams (resolved in execute stage)

time                t0   t1   t2   t3   t4   t5   t6   t7   ....
(I1) 096: ADD       IF1  ID1  EX1  MA1  WB1
(I2) 100: BEQZ 200       IF2  ID2  EX2  MA2  WB2
(I3) 104: ADD                 IF3  ID3  nop  nop  nop
(I4) 108:                          IF4  nop  nop  nop  nop
(I5) 304: ADD                           IF5  ID5  EX5  MA5  WB5

Resource Usage

time  t0   t1   t2   t3   t4   t5   t6   t7   ....
IF    I1   I2   I3   I4   I5
ID         I1   I2   I3   nop  I5
EX              I1   I2   nop  nop  I5
MA                   I1   I2   nop  nop  I5
WB                        I1   I2   nop  nop  I5

nop → pipeline bubble

Reducing Branch Penalty (resolve in decode stage)

One pipeline bubble can be removed if an extra comparator is used in the Decode stage.

[Figure: fetch and decode stages with the PCSrc mux (pc+4 / jabs / rind / br), nop muxes, Inst Memory, the decode-stage IR and GPRs, with zero detect on the register file output.]

Pipeline diagram now same as for jumps.

Branch Delay Slots (expose control hazard to software)

Change the ISA semantics so that the instruction that follows a jump or branch is always executed.
- gives the compiler the flexibility to put in a useful instruction where normally a pipeline bubble would have resulted

I1  096  ADD
I2  100  BEQZ r1 200
I3  104  ADD    ← delay slot instruction, executed regardless of branch outcome
I4  304  ADD

Other techniques include branch prediction, which can dramatically reduce the branch penalty... to come later.

Bypassing

time               t0   t1   t2   t3   t4   t5   t6   t7   ....
(I1) r1 ← r0 + 10  IF1  ID1  EX1  MA1  WB1
(I2) r4 ← r1 + 17       IF2  ID2  ID2  ID2  ID2  EX2  MA2  WB2
(I3)                         IF3  IF3  IF3  IF3  ID3  EX3  MA3
(I4)                                             IF4  ID4  EX4
(I5)                                                  IF5  ID5

The repeated ID2 and IF3 entries are the stalled stages.

Each stall or kill introduces a bubble in the pipeline ⇒ CPI > 1

A new datapath, i.e., a bypass, can get the data from the output of the ALU to its input:

time               t0   t1   t2   t3   t4   t5   t6   t7   ....
(I1) r1 ← r0 + 10  IF1  ID1  EX1  MA1  WB1
(I2) r4 ← r1 + 17       IF2  ID2  EX2  MA2  WB2
(I3)                         IF3  ID3  EX3  MA3  WB3
(I4)                              IF4  ID4  EX4  MA4  WB4
(I5)                                   IF5  ID5  EX5  MA5  WB5

Adding a Bypass

[Figure: pipelined datapath with "r4 ← r1 ..." in decode and "r1 ← ..." in execute. A mux controlled by ASrc in front of register A selects between the register file output and the ALU output. The stall signal and nop mux are unchanged.]

When does this bypass help?

(I1) r1 ← r0 + 10
(I2) r4 ← r1 + 17        yes

r1 ← M[r0 + 10]
r4 ← r1 + 17             no

JAL 500
r4 ← r31 + 17            no

The Bypass Signal: Deriving it from the Stall Signal

stall = ( ((rsD = wsE).weE + (rsD = wsM).weM + (rsD = wsW).weW).re1D
        + ((rtD = wsE).weE + (rtD = wsM).weM + (rtD = wsW).weW).re2D )

ws = Case opcode
       ALU        → rd
       ALUi, LW   → rt
       JAL, JALR  → R31

we = Case opcode
       ALU, ALUi, LW  → (ws ≠ 0)
       JAL, JALR      → on
       ...            → off

ASrc = (rsD = wsE).weE.re1D

Is this correct?

No, because only ALU and ALUi instructions can benefit from this bypass.

Split weE into two components: we-bypass, we-stall

Bypass and Stall Signals

Split weE into two components: we-bypass, we-stall

we-bypassE = Case opcodeE
               ALU, ALUi  → (ws ≠ 0)
               ...        → off

we-stallE = Case opcodeE
              LW         → (ws ≠ 0)
              JAL, JALR  → on
              ...        → off

ASrc  = (rsD = wsE).we-bypassE.re1D

stall = ((rsD = wsE).we-stallE + (rsD = wsM).weM + (rsD = wsW).weW).re1D
      + ((rtD = wsE).weE + (rtD = wsM).weM + (rtD = wsW).weW).re2D
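A minimal Python sketch of the split signals, written to mirror the equations on this slide term by term. The `Instr` tuple, the opcode class strings and the `NOP` bubble are assumptions made for the example. It models only the single ALU-output bypass into the A input, so a match against the memory or write-back stage still stalls.

```python
from collections import namedtuple

# Hypothetical instruction record; rs, rt, rd default to 0.
Instr = namedtuple("Instr", "opcode rs rt rd", defaults=(0, 0, 0))
NOP = Instr("NOP")  # bubble: reads and writes nothing


def ws(i):
    if i.opcode == "ALU":
        return i.rd
    if i.opcode in ("ALUi", "LW"):
        return i.rt
    return 31 if i.opcode in ("JAL", "JALR") else 0


def we(i):
    if i.opcode in ("ALU", "ALUi", "LW"):
        return ws(i) != 0
    return i.opcode in ("JAL", "JALR")


def we_bypass(i):
    """Result appears at the ALU output, so the bypass can supply it."""
    return i.opcode in ("ALU", "ALUi") and ws(i) != 0


def we_stall(i):
    """Result is not at the ALU output (load data, link address): must stall."""
    if i.opcode == "LW":
        return ws(i) != 0
    return i.opcode in ("JAL", "JALR")


def re1(i):
    return i.opcode in ("ALU", "ALUi", "LW", "SW", "BZ", "JR", "JALR")


def re2(i):
    return i.opcode in ("ALU", "SW")


def a_src(d, e):
    """ASrc: take the A operand from the ALU output instead of the register file."""
    return d.rs == ws(e) and we_bypass(e) and re1(d)


def stall(d, e, m, w):
    rs_term = ((d.rs == ws(e) and we_stall(e))
               or (d.rs == ws(m) and we(m))
               or (d.rs == ws(w) and we(w)))
    rt_term = ((d.rt == ws(e) and we(e))
               or (d.rt == ws(m) and we(m))
               or (d.rt == ws(w) and we(w)))
    return (rs_term and re1(d)) or (rt_term and re2(d))


if __name__ == "__main__":
    i1 = Instr("ALUi", rs=0, rt=1)  # r1 <- r0 + 10
    i2 = Instr("ALUi", rs=1, rt=4)  # r4 <- r1 + 17
    print(a_src(i2, i1), stall(i2, i1, NOP, NOP))
```

Note that the rt term still uses the full weE: this slide adds a bypass only on the A input, so any rt match against the execute stage continues to stall.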

Fully Bypassed Datapath

[Figure: pipelined datapath with bypass muxes ASrc and BSrc on both ALU inputs, fed from later stages, plus a path carrying the PC for JAL, .... The stall signal and nop mux remain.]

Is there still a need for the stall signal?

stall = (rsD = wsE).(opcodeE = LWE).(wsE ≠ 0).re1D
      + (rtD = wsE).(opcodeE = LWE).(wsE ≠ 0).re2D
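With full bypassing, the only remaining stall is the load-use case in this equation. A small Python sketch, where the function name and the flat argument list are assumptions chosen for illustration:

```python
def load_use_stall(rs_d, rt_d, re1_d, re2_d, opcode_e, ws_e):
    """Stall only when a load in EX writes a register that decode reads."""
    load_in_e = opcode_e == "LW" and ws_e != 0
    return load_in_e and ((rs_d == ws_e and re1_d) or (rt_d == ws_e and re2_d))


if __name__ == "__main__":
    # r1 <- M[r0 + 10] in execute, r4 <- r1 + 17 in decode: must stall
    print(load_use_stall(rs_d=1, rt_d=4, re1_d=True, re2_d=False,
                         opcode_e="LW", ws_e=1))
    # r1 <- r0 + 10 in execute instead: the bypass covers it, no stall
    print(load_use_stall(rs_d=1, rt_d=4, re1_d=True, re2_d=False,
                         opcode_e="ALUi", ws_e=1))
```

The load data is not available until the end of the memory stage, so no bypass from the ALU output can supply it one cycle earlier.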

Why an Instruction may not be dispatched every cycle (CPI>1)

- Full bypassing may be too expensive to implement
  - typically all frequently used paths are provided
  - some infrequently used bypass paths may increase cycle time and counteract the benefit of reducing CPI
- Loads have two-cycle latency
  - Instruction after load cannot use load result
  - MIPS-I ISA defined load delay slots, a software-visible pipeline hazard (compiler schedules independent instruction or inserts NOP to avoid hazard). Removed in MIPS-II.
- Conditional branches may cause bubbles
  - kill following instruction(s) if no delay slots

Machines with software-visible delay slots may execute a significant number of NOP instructions inserted by the compiler.

Thank you !
