Chapter Six
Pipelining
Improve performance by increasing instruction throughput
[Figure: Nonpipelined vs. pipelined execution of a sequence of loads — each instruction passes through Instruction fetch, Reg, ALU, Data access, and Reg. Nonpipelined, each instruction takes 800 ps; pipelined, a new instruction starts every 200 ps.]
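A quick worked example using the times in the figure (an 800 ps single-cycle instruction and 200 ps pipeline stages): three instructions take 3 × 800 ps = 2400 ps without pipelining, but only 5 × 200 ps + 2 × 200 ps = 1400 ps pipelined. For long instruction sequences the speedup approaches the ratio of instruction time to stage time, 800/200 = 4×.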
Pipelining
What makes it easy?
  all instructions are the same length
  just a few instruction formats
  memory operands appear only in loads and stores
What makes it hard?
  structural hazards: suppose we had only one memory (an instruction fetch and a load's data access would then need it in the same cycle)
  control hazards: need to worry about branch instructions
  data hazards: an instruction depends on a previous instruction
We'll build a simple pipeline and look at these issues
We'll talk about modern processors and what really makes it hard:
  exception handling
  trying to improve performance with out-of-order execution, etc.
Basic Idea
IF: Instruction fetch
ID: Instruction decode / register file read
EX: Execute / address calculation
MEM: Memory access
WB: Write back
[Figure: Single-cycle datapath divided into the five stages — PC, instruction memory, register file, sign extend, ALU, and data memory.]
Pipelined Datapath
[Figure: Pipelined datapath — pipeline registers IF/ID, ID/EX, EX/MEM, and MEM/WB separate the instruction fetch, decode, execute, memory, and write-back hardware.]
Can you find a problem even if there are no dependencies? What instructions can we execute to manifest the problem?
Corrected Datapath
[Figure: Corrected pipelined datapath — the write-register number is now carried forward through the ID/EX, EX/MEM, and MEM/WB pipeline registers, so it arrives at the register file's Write register input in the same cycle as the data being written back.]
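To make the fix concrete, here is a minimal C sketch of the four pipeline registers (field names are illustrative assumptions, and many signals are omitted). The point of the correction is that the destination-register number rides along through ID/EX, EX/MEM, and MEM/WB so it reaches the register file together with the data to be written:

#include <stdint.h>

/* One struct per pipeline register; fields are illustrative, not exhaustive. */
struct IF_ID  { uint32_t pc_plus4, instruction; };
struct ID_EX  { uint32_t pc_plus4, read_data1, read_data2, sign_ext_imm;
                uint8_t  rt, rd; };              /* candidate destination fields        */
struct EX_MEM { uint32_t branch_target, alu_result, write_data;
                uint8_t  zero, write_reg; };     /* destination number carried forward  */
struct MEM_WB { uint32_t mem_read_data, alu_result;
                uint8_t  write_reg; };           /* arrives at write-back with the data */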
[Figure: Multiple-clock-cycle pipeline diagram for three loads (the second and third are lw $2, 200($0) and lw $3, 300($0)) — each instruction occupies the IM, Reg, ALU, DM, and Reg stages in successive cycles, starting one cycle after the previous instruction.]
Can help with answering questions like:
  how many cycles does it take to execute this code?
  what is the ALU doing during cycle 4?
Use this representation to help understand datapaths
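As a worked answer to the first two questions (for this simple 5-stage pipeline with one instruction issued per cycle and no stalls): n instructions finish in 5 + (n − 1) cycles, so the three loads above take 7 cycles, and in cycle 4 the ALU is calculating the address for the second lw. A minimal C sketch (stage names only; hazards ignored) that prints this kind of multiple-clock-cycle diagram:

#include <stdio.h>

int main(void) {
    const char *stage[] = { "IM", "Reg", "ALU", "DM", "Reg" };
    const int n_instr = 3, n_stages = 5;
    const int total_cycles = n_stages + n_instr - 1;     /* 5 + (3 - 1) = 7 */

    for (int i = 0; i < n_instr; i++) {                  /* one row per instruction */
        printf("instr %d:", i + 1);
        for (int c = 1; c <= total_cycles; c++) {
            int s = c - 1 - i;                           /* stage occupied in cycle c */
            printf(" %4s", (s >= 0 && s < n_stages) ? stage[s] : ".");
        }
        printf("\n");
    }
    printf("total cycles: %d\n", total_cycles);
    return 0;
}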
Pipeline Control
[Figure: Pipelined datapath with the control signals attached — PCSrc, RegWrite, ALUSrc, ALUOp, MemRead, MemWrite, MemtoReg, Branch, and RegDst, plus the ALU control fed by instruction bits [15-0] and the destination field in bits [20-16].]
We have 5 stages. What needs to be controlled in each stage?
  Instruction fetch and PC increment
  Instruction decode / register fetch
  Execution
  Memory stage
  Write back
How would control be handled in an automobile plant?
  a fancy control center telling everyone what to do?
  should we use a finite state machine?
Pipeline Control
Pass control signals along just like the data
              Execution/address calculation   Memory access stage      Write-back stage
              stage control lines             control lines            control lines
Instruction   RegDst  ALUOp1  ALUOp0  ALUSrc  Branch  MemRead MemWrite RegWrite  MemtoReg
R-format        1       1       0       0       0       0       0        1         0
lw              0       0       0       1       0       1       0        1         1
sw              X       0       0       1       0       0       1        0         X
beq             X       0       1       0       1       0       0        0         X
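A minimal C sketch of how a main control unit could generate these nine signals from the opcode (the table's values are encoded directly; the struct and field names simply reuse the signal names, and "don't care" entries default to 0):

#include <stdint.h>

struct Ctrl {
    /* EX stage  */ uint8_t RegDst, ALUOp1, ALUOp0, ALUSrc;
    /* MEM stage */ uint8_t Branch, MemRead, MemWrite;
    /* WB stage  */ uint8_t RegWrite, MemtoReg;
};

struct Ctrl decode(uint8_t opcode) {
    struct Ctrl c = {0};                        /* X (don't care) entries stay 0 */
    switch (opcode) {
    case 0x00: c.RegDst = 1; c.ALUOp1 = 1; c.RegWrite = 1;                  break; /* R-format */
    case 0x23: c.ALUSrc = 1; c.MemRead = 1; c.RegWrite = 1; c.MemtoReg = 1; break; /* lw */
    case 0x2B: c.ALUSrc = 1; c.MemWrite = 1;                                break; /* sw */
    case 0x04: c.ALUOp0 = 1; c.Branch = 1;                                  break; /* beq */
    }
    return c;
}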
[Figure: The control values for the EX, MEM, and WB stages are generated during decode and placed in the ID/EX pipeline register; they then travel with the instruction, the MEM and WB groups moving into EX/MEM and the WB group into MEM/WB.]
[Figure: Pipelined datapath with the control unit attached — the extended pipeline registers carry the control lines (Branch, MemRead, MemWrite, RegDst, RegWrite, etc.) to the stage that uses them.]
Dependencies
Problem with starting the next instruction before the first is finished: dependencies that go backward in time are data hazards.
[Figure: Pipeline diagram for sub $2, $1, $3 followed by and $12, $2, $5, or $13, $6, $2, add $14, $2, $2, and sw $15, 100($2). Register $2 is not written until clock cycle 5 (its value changes from 10 to 20), but the and and or read $2 in cycles 3 and 4 and would get the old value — dependences that go backward in time.]
Software Solution
Have compiler guarantee no hazards. Where do we insert the nops?

  sub $2, $1, $3
  and $12, $2, $5
  or  $13, $6, $2
  add $14, $2, $2
  sw  $15, 100($2)
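One possible answer (assuming, as is usual for this pipeline, that the register file writes in the first half of a clock cycle and reads in the second half, and that there is no forwarding): two nops inserted immediately after the sub are enough, since the and then does not read $2 until the cycle in which sub writes it back, and every later use of $2 comes later still.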
Forwarding
Use temporary results, don't wait for them to be written
  register file forwarding to handle read/write to same register
  ALU forwarding
[Figure: Pipeline diagram tracking the values of register $2, EX/MEM, and MEM/WB each clock cycle for the sub / and / or / add / sw sequence. The sub result (20) sits in EX/MEM during cycle 4 and in MEM/WB during cycle 5, so it can be forwarded to the ALU for the and and or instructions instead of waiting for $2 to be written at the end of cycle 5.]
Forwarding
The main idea (some details not shown)
[Figure: Forwarding hardware — a forwarding unit compares EX/MEM.RegisterRd and MEM/WB.RegisterRd against the source registers (Rs, Rt) of the instruction in the EX stage and drives the multiplexors on the ALU inputs (ForwardB and its counterpart for the other operand).]
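The forwarding decision itself is just a pair of register-number comparisons. A minimal C sketch of the selection logic for the upper ALU input (ForwardA; ForwardB is the same test with ID/EX.RegisterRt), with the EX/MEM result checked first because it is the more recent one:

/* Returns which value the ALU's first operand mux should select:
 * 0 = register file, 2 = forward from EX/MEM, 1 = forward from MEM/WB. */
int forward_a(int ex_mem_regwrite, int ex_mem_rd,
              int mem_wb_regwrite, int mem_wb_rd,
              int id_ex_rs) {
    if (ex_mem_regwrite && ex_mem_rd != 0 && ex_mem_rd == id_ex_rs)
        return 2;                    /* result just computed, still in EX/MEM    */
    if (mem_wb_regwrite && mem_wb_rd != 0 && mem_wb_rd == id_ex_rs)
        return 1;                    /* older result (or load data) in MEM/WB    */
    return 0;                        /* no hazard: use the register file value   */
}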
[Figure: A load-use hazard — when an instruction needs a register loaded by the immediately preceding lw (here in the sequence leading to or $8, $2, $6), the dependence still goes backward in time, and forwarding alone cannot resolve it.]
Stalling
We can stall the pipeline by keeping an instruction in the same stage
[Figure: Stalling — lw $2, 20($1) is followed by a dependent instruction; a bubble is inserted (the stalled slot becomes a nop) and the dependent instruction repeats its decode stage one cycle later, after which forwarding can supply $2 (e.g., to the following or $8, $2, $6).]
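The load-use case has to be caught one stage earlier, in decode. A minimal C sketch of the hazard detection test, assuming the usual stall mechanism (hold the PC and the IF/ID register, and zero the control signals entering ID/EX so a bubble flows down the pipeline):

/* Stall when the instruction in EX is a load whose destination ($rt)
 * matches either source register of the instruction being decoded.   */
int load_use_stall(int id_ex_memread, int id_ex_rt,
                   int if_id_rs, int if_id_rt) {
    return id_ex_memread &&
           (id_ex_rt == if_id_rs || id_ex_rt == if_id_rt);
}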
Branch Hazards
When we decide to branch, other instructions are in the pipeline!
[Figure: Branch hazard — the branch at address 40 (beq $1, $3, 28) is not resolved until late in the pipeline, so the sequential instructions at 44, 48 (or $13, $6, $2), and 52 have already been fetched before the target at 72 (lw $4, 50($7)) can be fetched.]
We are predicting "branch not taken"
  need to add hardware for flushing instructions if we are wrong
Flushing Instructions
[Figure: Datapath with branch hazard handling — a hazard detection unit, an IF.Flush signal that zeroes the instruction in IF/ID when the branch is taken, and the forwarding unit.]
Branches
If the branch is taken, we have a penalty of one cycle
For our simple design, this is reasonable
With deeper pipelines, the penalty increases and static branch prediction drastically hurts performance
Solution: dynamic branch prediction
[Figure: 2-bit branch prediction state machine — two "predict taken" states and two "predict not taken" states; the prediction changes only after it has been wrong twice in a row.]
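A minimal C sketch of one entry of such a predictor, implemented as a 2-bit saturating counter (counter values 2 and 3 predict taken; the prediction only flips after two consecutive mispredictions). A real processor keeps a table of these counters indexed by branch-address bits:

#include <stdbool.h>

static unsigned counter = 2;                 /* 0,1 = predict not taken; 2,3 = predict taken */

bool predict_taken(void) { return counter >= 2; }

void update(bool actually_taken) {           /* saturating increment / decrement */
    if (actually_taken) { if (counter < 3) counter++; }
    else                { if (counter > 0) counter--; }
}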
Branch Prediction
Sophisticated Techniques:
Tournament predictors that use different types of prediction strategies and keep track of which one is performing best
A branch delay slot, which the compiler tries to fill with a useful instruction (make the one-cycle delay part of the ISA)
Branch prediction is especially important because it enables other, more advanced pipelining techniques to be effective!
Modern processors predict correctly 95% of the time!
Improving Performance
Try and avoid stalls! E.g., reorder these instructions (one possible reordering is given below):

  lw $t0, 0($t1)
  lw $t2, 4($t1)
  sw $t2, 0($t1)
  sw $t0, 4($t1)
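One possible reordering (both loads first, then both stores, which is safe here because the two addresses are distinct and both loads complete before either store): lw $t0, 0($t1); lw $t2, 4($t1); sw $t0, 4($t1); sw $t2, 0($t1). Now no store immediately consumes the result of the preceding load, so the load-use stall disappears.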
Dynamic Pipeline Scheduling
  Hardware chooses which instructions to execute next
  Will execute instructions out of order (e.g., it doesn't wait for a dependency to be resolved, but rather keeps going!)
  Speculates on branches and keeps the pipeline full (may need to roll back if a prediction is incorrect)
Advanced Pipelining
Increase the depth of the pipeline
Start more than one instruction each cycle (multiple issue)
Loop unrolling to expose more ILP (better scheduling)
Superscalar processors
  DEC Alpha 21264: 9-stage pipeline, 6-instruction issue
  All modern processors are superscalar and issue multiple instructions, usually with some limitations (e.g., different pipes)
VLIW: very long instruction word, static multiple issue (relies more on compiler technology)
This class has given you the background you need to learn more!
Chapter 6 Summary
Pipelining does not improve latency, but does improve throughput
[Figure: Summary chart comparing single-cycle (Section 5.4), pipelined, deeply pipelined, multiple-issue pipelined (Section 6.9), and multiple issue with deep pipeline (Section 6.10) designs, arranged from slower to faster and from one to several instructions completed per cycle.]