
Pipeline: Hazards

Fall, 2017

These slides are adapted from notes by Dr. David Patterson (UCB)

1
Single-Cycle vs. Pipelined Execution

Non-Pipelined
[Figure: timing diagram for three loads — lw $1, 100($0); lw $2, 200($0);
lw $3, 300($0) — each passing through Instruction Fetch, REG RD, ALU, MEM,
and REG WR. Each instruction occupies the datapath for 800 ps, so a new
instruction can start only every 800 ps.]

Pipelined
[Figure: the same three loads overlapped in a pipeline. Each stage takes
200 ps, so a new instruction starts every 200 ps while earlier instructions
continue through later stages.]

2
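To make the figure concrete, the sketch below (an illustrative calculation, not part of the original slides) computes the total time to execute N loads using the 800 ps single-cycle instruction time and the 200 ps pipeline stage time shown above; for large N the speedup approaches 800/200 = 4.

```python
# Illustrative timing from the figure: 800 ps per instruction unpipelined,
# five 200 ps stages when pipelined.

def nonpipelined_time_ps(n, instr_time_ps=800):
    # One instruction must finish before the next begins.
    return n * instr_time_ps

def pipelined_time_ps(n, stage_time_ps=200, n_stages=5):
    # The first instruction fills the pipeline; each later one adds one stage time.
    return n_stages * stage_time_ps + (n - 1) * stage_time_ps

for n in (3, 1000):
    ratio = nonpipelined_time_ps(n) / pipelined_time_ps(n)
    print(f"{n} loads: {nonpipelined_time_ps(n)} ps vs {pipelined_time_ps(n)} ps "
          f"(speedup {ratio:.2f})")
```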
Speedup
• Consider the unpipelined processor introduced previously. Assume that it has
a 1 ns clock cycle, that it uses 4 cycles for ALU operations and branches and
5 cycles for memory operations, and that the relative frequencies of these
operations are 40%, 20%, and 40%, respectively. Suppose that, due to clock
skew and setup, pipelining adds 0.2 ns of overhead to the clock cycle.
Ignoring any latency impact, how much speedup in the instruction execution
rate will we gain from a pipeline?

Average instruction execution time (unpipelined)
= 1 ns × ((40% + 20%) × 4 + 40% × 5)
= 4.4 ns

Average instruction execution time (pipelined)
= 1 ns + 0.2 ns = 1.2 ns

Speedup from pipelining
= Average instruction time unpipelined / Average instruction time pipelined
= 4.4 ns / 1.2 ns ≈ 3.7

3
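The same arithmetic, expressed as a short Python sketch for checking (not from the original slides; the cycle counts and instruction mix come from the example above):

```python
# Speedup example: 1 ns unpipelined clock, 0.2 ns pipelining overhead per cycle.
CLOCK_NS = 1.0
OVERHEAD_NS = 0.2

# (fraction of instructions, cycles on the unpipelined machine)
mix = [(0.40, 4),   # ALU operations
       (0.20, 4),   # branches
       (0.40, 5)]   # memory operations

unpipelined = CLOCK_NS * sum(f * c for f, c in mix)   # 4.4 ns
pipelined = CLOCK_NS + OVERHEAD_NS                    # 1.2 ns per instruction

print(f"speedup = {unpipelined / pipelined:.2f}")     # ~3.67
```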
Comments about Pipelining

• The good news


– Multiple instructions are being processed at same time
– This works because stages are isolated by registers
– Best case speedup of N
• The bad news
– Instructions interfere with each other - hazards
• Example: different instructions may need the same piece of
hardware (e.g., memory) in same clock cycle
• Example: instruction may require a result produced by an
earlier instruction that is not yet complete

4
Pipeline Hazards
• Limits to pipelining: Hazards prevent next
instruction from executing during its
designated clock cycle
– Structural hazards: two different instructions use
same h/w in same cycle
– Data hazards: Instruction depends on result of
prior instruction still in the pipeline
– Control hazards: Pipelining of branches & other
instructions that change the PC

5
Structural Hazards
• Attempt to use same resource twice at same time
• Example: Single Memory for instructions, data
– Accessed by IF stage
– Accessed at same time by MEM stage
• Solutions?
– Delay second access by one clock cycle
– Provide separate memories for instructions, data
• This is what the book does
• This is called a “Harvard Architecture”
• Real pipelined processors have separate caches
6
Pipelined Example -
Executing Multiple Instructions
• Consider the following instruction
sequence:
lw $r0, 10($r1)
sw $r3, 20($r4)
add $r5, $r6, $r7
sub $r8, $r9, $r10

7
Executing Multiple Instructions
Clock Cycle 1
LW

8
Executing Multiple Instructions
Clock Cycle 2
SW LW

9
Executing Multiple Instructions
Clock Cycle 3
ADD SW LW

10
Executing Multiple Instructions
Clock Cycle 4
SUB ADD SW LW

11
Executing Multiple Instructions
Clock Cycle 5
SUB ADD SW LW

12
Executing Multiple Instructions
Clock Cycle 6
SUB ADD SW

13
Executing Multiple Instructions
Clock Cycle 7
SUB ADD

14
Executing Multiple Instructions
Clock Cycle 8
SUB

15
Alternative View - Multicycle Diagram
CC 1 CC 2 CC 3 CC 4 CC 5 CC 6 CC 7 CC 8

lw $r0, 10($r1) IM REG ALU DM REG

sw $r3, 20($r4) IM REG ALU DM REG

add $r5, $r6, $r7 IM REG ALU DM REG

sub $r8, $r9, $r10 IM REG ALU DM REG

16
Alternative View - Multicycle Diagram
CC 1 CC 2 CC 3 CC 4 CC 5 CC 6 CC 7 CC 8

lw $r0, 10($r1) IM REG ALU DM REG

Memory Conflict

sw $r3, 20($r4) IM REG ALU DM REG

add $r5, $r6, $r7 IM REG ALU DM REG

sub $r8, $r9, $r10 IM REG ALU DM REG

17
One Memory Port Structural Hazards
Time (clock cycles): Cycle 1 through Cycle 7

[Figure: four instructions flowing through Ifetch, Reg, ALU, DMem, Reg with a
single memory port, shown in program (instruction) order:
Load    – Ifetch in cycle 1, DMem access in cycle 4
Instr 1 – Ifetch in cycle 2
Instr 2 – Ifetch in cycle 3
Stall   – a bubble is inserted in cycle 4, because Instr 3’s Ifetch would
          conflict with the Load’s DMem access on the single memory port
Instr 3 – Ifetch delayed to cycle 5]
18
Structural Hazards
Some Common Structural Hazards:
• Memory:
– we’ve already mentioned this one.
• Floating point:
– Since many floating point instructions require many
cycles, it’s easy for them to interfere with each other.
• Starting up more of one type of instruction than
there are resources.
– For instance, the PA-8600 can support two ALU + two
load/store instructions per cycle - that’s how much
hardware it has available.
19
Dealing with Structural Hazards
Stall
– low cost, simple
– Increases CPI
– use for rare cases, since stalling hurts performance
Pipeline hardware resource
– useful for multi-cycle resources
– good performance
– sometimes complex e.g., RAM
Replicate resource
– good performance
– increases cost (+ maybe interconnect delay)
– useful for cheap or divisible resources
20
Structural Hazards
• Structural hazards are reduced with these rules:
– Each instruction uses a resource at most once
– Always use the resource in the same pipeline stage
– Use the resource for one cycle only
• Many RISC ISAs were designed with this in mind
• Sometimes very complex to do this.
– For example, memory of necessity is used in the IF and
MEM stages.

21
Structural Hazards
We want to compare the performance of two machines.
Which machine is faster?
– Machine A: Dual ported memory - so there are no memory
stalls
– Machine B: Single ported memory, but its pipelined
implementation has a 1.05 times faster clock rate
Assume:
– Ideal CPI = 1 for both
– Loads are 40% of instructions executed

22
Speedup from Pipelining
Speedup from pipelining
= Average instruction time unpipelined / Average instruction time pipelined
= (CPI unpipelined × Clock cycle unpipelined) / (CPI pipelined × Clock cycle pipelined)

CPI pipelined = Ideal CPI + Pipeline stall clock cycles per instruction
CPI unpipelined = Ideal CPI × Pipeline depth
23
Speed Up Equations for Pipelining

CPI pipelined = Ideal CPI + Average stall cycles per instruction

Speedup = (Ideal CPI × Pipeline depth) / (Ideal CPI + Pipeline stall CPI)
          × (Cycle time unpipelined / Cycle time pipelined)

For the simple RISC pipeline, the ideal CPI on a pipelined processor = 1:

Speedup = (Pipeline depth / (1 + Pipeline stall CPI))
          × (Cycle time unpipelined / Cycle time pipelined)
24
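For reference, a minimal sketch of the final formula as a Python function (not from the original slides); the numbers in the example call are made up for illustration:

```python
def pipeline_speedup(depth, stall_cpi, cycle_unpipelined, cycle_pipelined):
    # Speedup = depth / (1 + stall CPI) * (unpipelined cycle / pipelined cycle),
    # assuming the ideal CPI of the pipelined machine is 1.
    return (depth / (1.0 + stall_cpi)) * (cycle_unpipelined / cycle_pipelined)

# Hypothetical example: 5-stage pipeline, 0.4 stall cycles per instruction,
# equal cycle times on both machines.
print(pipeline_speedup(depth=5, stall_cpi=0.4,
                       cycle_unpipelined=1.0, cycle_pipelined=1.0))   # ~3.57
```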
Structural Hazards
We want to compare the performance of two machines. Which machine is faster?
• Machine A: Dual ported memory - so there are no memory stalls
• Machine B: Single ported memory, but its pipelined implementation has a 1.05 times
faster clock rate
Assume:
• Ideal CPI = 1 for both
• Loads are 40% of instructions executed

25
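The slide poses the question without showing the arithmetic. Below is a sketch of the usual calculation, under the common assumption that each load on Machine B loses one clock cycle to the structural hazard on its single memory port:

```python
# Machine A: dual-ported memory, so no structural stalls.
# Machine B: single-ported memory (assume one stall cycle per load),
#            but a 1.05x faster clock.
ideal_cpi = 1.0
load_fraction = 0.40

cpi_a = ideal_cpi                        # 1.0
cpi_b = ideal_cpi + load_fraction * 1.0  # 1.4

clock_a = 1.0                            # arbitrary time units
clock_b = clock_a / 1.05                 # shorter cycle on Machine B

time_a = cpi_a * clock_a                 # 1.00
time_b = cpi_b * clock_b                 # ~1.33

print(f"Machine A is {time_b / time_a:.2f}x faster")   # ~1.33x
```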
Summary - Structural Hazards
• Speedup ≤ Pipeline depth; if ideal CPI is 1, then:

Speedup = (Pipeline Depth / (1 + Pipeline stall CPI))
          × (Clock Cycle Unpipelined / Clock Cycle Pipelined)

• Hazards limit performance on computers:
– Structural: need more HW resources
– Data (RAW, WAR, WAW)
– Control

26
Data Hazards
• Data hazards occur when an instruction uses data
before an earlier instruction has written it
Time (in clock cycles): CC 1 – CC 9
Value of register $2: 10, 10, 10, 10, 10/–20, –20, –20, –20, –20
(the new value –20 is written at the end of CC 5)

Program execution order (in instructions), each flowing through IM, Reg, ALU, DM, Reg:
sub $2, $1, $3
and $12, $2, $5
or $13, $6, $2
add $14, $2, $2
sw $15, 100($2)

The use of the result of the SUB instruction in the next three instructions causes a
data hazard, since the register is not written until after those instructions read it.
27
Data Hazards
Execution Order is: Read After Write (RAW)
InstrI
InstrJ InstrJ tries to read operand before InstrI writes it
I: add r1,r2,r3
J: sub r4,r1,r3

• Caused by a “Dependence” (in compiler nomenclature). This


hazard results from an actual need for communication.

28
Data Hazards
Execution Order is: Write After Read (WAR)
Execution order: InstrI, then InstrJ
InstrJ tries to write an operand before InstrI reads it
– Gets the wrong operand

I: sub r4,r1,r3
J: add r1,r2,r3
K: mul r6,r1,r7

– Called an “anti-dependence” by compiler writers.
This results from reuse of the name “r1”.

• Can’t happen in the MIPS 5-stage pipeline because:
– All instructions take 5 stages, and
– Reads are always in stage 2, and
– Writes are always in stage 5
29
Data Hazards
Execution Order is: Write After Write (WAW)
Execution order: InstrI, then InstrJ
InstrJ tries to write an operand before InstrI writes it
– Leaves the wrong result (InstrI’s value, not InstrJ’s)

I: sub r1,r4,r3
J: add r1,r2,r3
K: mul r6,r1,r7

• Called an “output dependence” by compiler writers.
This also results from the reuse of the name “r1”.
• Can’t happen in the MIPS 5-stage pipeline because:
– All instructions take 5 stages, and
– Writes are always in stage 5

• We will see WAR and WAW hazards in later, more complicated pipelines
30
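As a compact illustration of the three dependence types above, here is a small, hypothetical Python sketch (not from the original slides) that classifies the hazard between two instructions given their destination and source registers; the dictionary encoding and two-instruction window are assumptions made for illustration.

```python
# Classify the hazard between two instructions, where instr_i comes first in
# program order and instr_j follows it. Each instruction is described by the
# register it writes (dest) and the registers it reads (srcs).

def classify_hazards(instr_i, instr_j):
    hazards = []
    if instr_i["dest"] and instr_i["dest"] in instr_j["srcs"]:
        hazards.append("RAW")   # J reads what I writes (true dependence)
    if instr_j["dest"] and instr_j["dest"] in instr_i["srcs"]:
        hazards.append("WAR")   # J writes what I reads (anti-dependence)
    if instr_i["dest"] and instr_i["dest"] == instr_j["dest"]:
        hazards.append("WAW")   # both write the same register (output dependence)
    return hazards

# RAW example from the slides: add r1,r2,r3 followed by sub r4,r1,r3
i = {"dest": "r1", "srcs": ["r2", "r3"]}
j = {"dest": "r4", "srcs": ["r1", "r3"]}
print(classify_hazards(i, j))   # ['RAW']
```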
