Chapter 4
Implementing Loads/Stores
Source: H&P textbook
Implementing J-type Instructions
Source: H&P textbook
View from 5,000 Feet
Source: H&P textbook
Latches and Clocks in a Single-Cycle Design
[Figure: single-cycle design showing latches, clock, and Reg File]
The Assembly Line
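Unpipelined: start and finish a job before moving to the next.
Pipelined: break the job into smaller stages (A, B, C), so that a new job can enter stage A while earlier jobs are still in stages B and C.
[Figure: jobs vs. time, each job passing through stages A, B, C; unpipelined vs. pipelined]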
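The assembly-line picture can be turned into step counts. A minimal sketch, assuming every stage takes exactly one time step: unpipelined, n jobs through k stages take n·k steps; pipelined, the first job takes k steps and one more job completes every step after that. The function names are illustrative, not from the slides.

```python
def unpipelined_steps(num_jobs: int, num_stages: int) -> int:
    # Each job finishes all of its stages before the next job starts.
    return num_jobs * num_stages


def pipelined_steps(num_jobs: int, num_stages: int) -> int:
    # The first job needs num_stages steps; each later job finishes one step after the previous one.
    return 0 if num_jobs == 0 else num_stages + num_jobs - 1


def pipelined_schedule(num_jobs: int, stages: list) -> list:
    # Row t lists the job occupying each stage during step t + 1 ("-" means empty).
    rows = []
    for t in range(pipelined_steps(num_jobs, len(stages))):
        row = []
        for s in range(len(stages)):
            job = t - s  # job j (0-based) is in stage s during step j + s + 1
            row.append(f"J{job + 1}" if 0 <= job < num_jobs else "-")
        rows.append(row)
    return rows


stages = ["A", "B", "C"]
print(unpipelined_steps(3, len(stages)))  # 9
print(pipelined_steps(3, len(stages)))    # 5
for step, row in enumerate(pipelined_schedule(3, stages), start=1):
    print(step, " ".join(row))
```

For three jobs, step 3 shows J3, J2, and J1 in stages A, B, and C at the same time, which is the overlap the pipelined figure illustrates.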
Performance Improvements?
A 5-Stage Pipeline
register write in the first half of the clock cycle (dotted part)
register read in the second half of the clock cycle (solid part)
Source: H&P textbook
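The write-first, read-second convention can be modeled as a toy register file. This is a sketch assuming one optional write and any number of reads per cycle; `RegisterFile` and `clock_cycle` are hypothetical names. It shows why an instruction in the register-read stage can see a value that an instruction in RW writes during the same cycle.

```python
from typing import Iterable, List, Optional, Tuple


class RegisterFile:
    """Toy register file: writes take effect in the first half of a cycle,
    reads happen in the second half."""

    def __init__(self, num_regs: int = 32) -> None:
        self.regs = [0] * num_regs

    def clock_cycle(self, write: Optional[Tuple[int, int]] = None,
                    reads: Iterable[int] = ()) -> List[int]:
        # First half of the cycle: perform the (optional) register write.
        if write is not None:
            reg, value = write
            self.regs[reg] = value
        # Second half of the cycle: reads observe the value written above.
        return [self.regs[reg] for reg in reads]


rf = RegisterFile()
# The instruction in RW writes R3 while the one reading registers reads R3 and R4.
print(rf.clock_cycle(write=(3, 42), reads=(3, 4)))  # [42, 0]
```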
A 5-Stage Pipeline
For example, an add instruction does not need DM, but it still takes 5 clock cycles: it spends the DM cycle idle rather than skipping that stage.
DM: data memory
A 5-Stage Pipeline
Improved throughput: ideally, throughput becomes 5 times that of the unpipelined design.
Read registers, compare registers, compute branch target; for now, assume
branches take 2 cycles (there is enough work that branches can easily take more).
Pipeline Summary
Note: no instruction skips a stage, so there is no overtaking (a faster instruction overtaking a slower one).

Instruction      RR        ALU     DM   RW
ADD R1, R2, R3   Rd R1,R2  R1+R2   --   Wr R3

ADD still takes 5 cycles, even though its DM stage does nothing. IM (instruction fetch) is not shown; only the last 4 cycles are shown.
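Because every instruction passes through every stage in order, the occupancy of a hazard-free 5-stage pipeline is fully determined: instruction k (counting from 0) is in stage s during cycle k + s + 1. A sketch, assuming no stalls; `occupancy` is an illustrative name and the stage names follow the slides.

```python
STAGES = ["IF", "D/R", "ALU", "DM", "RW"]


def occupancy(num_instrs: int) -> dict:
    """Map cycle -> {stage: instruction} for a pipeline with no stalls.

    Every instruction visits all five stages (an ADD still spends a cycle in DM),
    so no instruction can overtake an earlier one.
    """
    table = {}
    for k in range(num_instrs):
        for s, stage in enumerate(STAGES):
            table.setdefault(k + s + 1, {})[stage] = f"I{k + 1}"
    return table


for cycle, busy in sorted(occupancy(3).items()):
    print(f"CYC-{cycle}", " ".join(busy.get(st, "-").ljust(3) for st in STAGES))
```

Three instructions need 7 cycles: 5 for the first, then one more completion per cycle.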
Quantitative Effects
• As a result of pipelining:
Time in ns per instruction goes up
Each instruction takes more cycles to execute
But… average CPI remains roughly the same
Clock speed goes up (ideally 5 times for a 5-stage pipeline)
Total execution time goes down, resulting in lower
average time per instruction
Under ideal conditions, speedup
= ratio of elapsed times between successive instruction
completions
= number of pipeline stages = increase in clock speed
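The bullets above can be checked with a small calculation. This is a sketch under stated assumptions: the single-cycle design takes `t_ns` per instruction, the work splits evenly into `stages` stages, and each pipeline latch adds a hypothetical overhead `latch_ns`. All numbers in the usage lines are made up for illustration.

```python
def pipeline_effects(t_ns: float, stages: int, latch_ns: float, n_instrs: int) -> dict:
    cycle_ns = t_ns / stages + latch_ns   # shorter clock period -> higher clock speed
    latency_ns = stages * cycle_ns        # one instruction, start to finish (>= t_ns)
    cycles = stages + n_instrs - 1        # fill the pipeline, then one completion per cycle
    return {
        "clock_speedup": t_ns / cycle_ns,
        "latency_ns": latency_ns,
        "cpi": cycles / n_instrs,         # approaches 1, same as the single-cycle design
        "speedup": (n_instrs * t_ns) / (cycles * cycle_ns),
    }


print(pipeline_effects(t_ns=10.0, stages=5, latch_ns=0.0, n_instrs=1_000_000))  # speedup close to 5
print(pipeline_effects(t_ns=10.0, stages=5, latch_ns=0.2, n_instrs=1_000_000))  # latency about 11 ns, speedup about 4.5
```

With no latch overhead, the speedup approaches the number of stages. Any overhead makes each instruction's latency exceed the single-cycle time while total execution time still drops.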
Conflicts/Problems
Hazards
Structural Hazards
Data Hazards
Example 1 – No Bypassing (I1 and I2 have a data hazard)
• Show the instruction occupying each stage in each cycle (no bypassing)
if I1 is R1+R2→R3, I2 is R3+R4→R5, and I3 is R7+R8→R9.
Latches: L2 (IF→D/R), L3 (D/R→ALU), L4 (ALU→DM), L5 (DM→RW).

Stage  CYC-1 CYC-2 CYC-3 CYC-4 CYC-5 CYC-6 CYC-7 CYC-8
IF     I1    I2    I3    I3    I3    I4    I5
D/R          I1    I2    I2    I2    I3    I4
ALU                I1    -     -     I2    I3
DM                       I1    -     -     I2    I3
RW                             I1    -     -     I2

I3 waits in IF during CYC-3 to CYC-5 for I2 to proceed out of D/R.
I2 stays in D/R during CYC-3 to CYC-5; its register read finally concludes in the second half of CYC-5, after I1 writes R3 in the first half of that cycle.
The "-" slots in ALU, DM, and RW are bubbles.
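The stall pattern in Example 1 follows from a few rules, sketched below under stated assumptions: one instruction per stage, a stalled instruction holds its stage, and, without bypassing, operands are read in the second half of the last D/R cycle, so that cycle must be no earlier than the producer's RW cycle. `Instr`, `schedule_no_bypass`, and `occupancy_grid` are hypothetical names for this sketch.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

STAGES = ["IF", "D/R", "ALU", "DM", "RW"]
IF, DR, ALU, DM, RW = range(len(STAGES))


@dataclass
class Instr:
    name: str
    srcs: Tuple[str, ...]
    dest: Optional[str]


def schedule_no_bypass(instrs: List[Instr]) -> List[List[int]]:
    """Cycle in which each instruction enters each stage (no bypassing)."""
    enter: List[List[int]] = []
    last_writer = {}  # register -> index of the most recent instruction writing it
    for k, ins in enumerate(instrs):
        prev = enter[-1] if enter else None
        # Enter IF in cycle 1, or once the previous instruction has moved into D/R.
        e = [1 if prev is None else prev[DR]] + [0] * (len(STAGES) - 1)
        for s in range(DR, len(STAGES)):
            earliest = e[s - 1] + 1
            if prev is not None:
                # Stage s is free once the previous instruction has moved on.
                earliest = max(earliest, prev[s + 1] if s < RW else prev[RW] + 1)
            if s == ALU:
                # The last D/R cycle (earliest - 1) must not precede the producer's RW cycle.
                for reg in ins.srcs:
                    if reg in last_writer:
                        earliest = max(earliest, enter[last_writer[reg]][RW] + 1)
            e[s] = earliest
        enter.append(e)
        if ins.dest is not None:
            last_writer[ins.dest] = k
    return enter


def occupancy_grid(instrs: List[Instr], enter: List[List[int]]) -> dict:
    """stage -> list indexed by cycle - 1, holding the instruction name or '-'."""
    last_cycle = max(e[RW] for e in enter)
    grid = {stage: ["-"] * last_cycle for stage in STAGES}
    for ins, e in zip(instrs, enter):
        for s, stage in enumerate(STAGES):
            leave = e[s + 1] - 1 if s < RW else e[RW]  # last cycle spent in stage s
            for cyc in range(e[s], leave + 1):
                grid[stage][cyc - 1] = ins.name
    return grid


example1 = [
    Instr("I1", ("R1", "R2"), "R3"),
    Instr("I2", ("R3", "R4"), "R5"),  # needs R3 from I1
    Instr("I3", ("R7", "R8"), "R9"),
]
sched = schedule_no_bypass(example1)
grid = occupancy_grid(example1, sched)
for stage in STAGES:
    print(f"{stage:<4}", " ".join(f"{x:<3}" for x in grid[stage]))
```

Only I1 to I3 are modeled, so IF and D/R show "-" once I3 moves on, whereas the slide also shows I4 and I5 entering.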
Example 2 – Bypassing
• Show the instruction occupying each stage in each cycle (with bypassing)
if I1 is R1+R2→R3, I2 is R3+R4→R5, and I3 is R3+R8→R9.
Identify the input latch for each input operand.

Stage  CYC-1 CYC-2 CYC-3 CYC-4 CYC-5 CYC-6 CYC-7 CYC-8
IF     I1    I2    I3    I4    I5
D/R          I1    I2    I3
ALU                I1    I2    I3
DM                       I1    I2    I3
RW                             I1    I2    I3

I2 (ALU in CYC-4): R3 from L4, R4 from L3. I1's result is latched into L4 at the end of CYC-3, so I2 can use it directly (no need to wait for all 5 cycles).
I3 (ALU in CYC-5): R3 from L5, R8 from L3. By the end of CYC-4, L4 has been overwritten by I2's ALU result, so I1's value of R3 has moved on to L5.
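The latch choices above follow a simple rule: while the consumer is in stage c, a producer that is `distance` instructions earlier is `distance` stages further down the pipe, and its result sits in the latch just in front of that stage. A sketch, assuming no earlier stalls and the slides' latch names; `operand_source` and its parameters are hypothetical.

```python
STAGES = ["IF", "D/R", "ALU", "DM", "RW"]
# Latch written at the end of each stage, named as in the slides.
LATCH_AFTER = {"IF": "L2", "D/R": "L3", "ALU": "L4", "DM": "L5"}


def operand_source(distance: int, produced_in: str, needed_in: str) -> str:
    """Where a dependent instruction gets its operand when bypassing is available.

    distance    -- how many instructions after the producer the consumer is (1 = next)
    produced_in -- stage that computes the value ("ALU" for add, "DM" for lw)
    needed_in   -- stage whose input needs it ("ALU", or "DM" for store data)
    Assumes no earlier stalls, so instruction i is in stage s during cycle i + s.
    """
    p, c = STAGES.index(produced_in), STAGES.index(needed_in)
    if c < STAGES.index("ALU"):
        raise ValueError("this sketch only models bypassing into ALU or DM")
    producer_stage = c + distance  # producer's stage while the consumer is in stage c
    if producer_stage <= p:
        return "stall"  # the value has not been computed yet
    if producer_stage >= len(STAGES):
        return "register file (via L3)"  # producer already wrote it back
    # The value sits in the latch in front of the producer's current stage.
    return LATCH_AFTER[STAGES[producer_stage - 1]]


print(operand_source(1, "ALU", "ALU"))  # L4 (I2's R3 in Example 2)
print(operand_source(2, "ALU", "ALU"))  # L5 (I3's R3 in Example 2)
print(operand_source(1, "DM", "ALU"))   # stall (a load followed by a dependent ALU op)
```

The same rule covers Problem 1 (add then lw: `operand_source(1, "ALU", "ALU")` gives L4 for lw's base register) and Problem 3 (lw then sw: `operand_source(1, "DM", "DM")` gives L5 for the store data).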
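Problem 1
add $1, $2, $3
lw $4, 8($1)

Instr  CYC-1 CYC-2 CYC-3 CYC-4 CYC-5 CYC-6
add    IF    D/R   ALU   DM    RW
lw           IF    D/R   ALU   DM    RW

lw computes its address, 8($1), in the ALU in CYC-4. With bypassing, it takes $1 from L4, where add's ALU result was latched at the end of CYC-3; add's own ALU inputs come from L3.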
Problem 2
Latches: L2, L3, L4, L5, between IF, D/R, ALU, DM, and RW
[Figure: pipeline diagram of instructions i1 and i2 through IF, D/R, ALU, DM, RW]
Problem 3
lw $1, 8($2)
sw $1, 8($3)

Instr  CYC-1 CYC-2 CYC-3 CYC-4 CYC-5 CYC-6
lw     IF    D/R   ALU   DM    RW
sw           IF    D/R   ALU   DM    RW

1) sw needs $1 (the value to store) as an input to DM in CYC-5; it reads it from L5, where lw's loaded value was latched at the end of CYC-4.
2) Writing happens in the first half of the cycle, hence DM can access the written value in the second half.
Problem 4
lw $1, 8($2)
[Figure: deeper pipeline for lw $1, 8($2) with stages IF, IF (instruction fetch), Dec, Dec (decode), RR, ALU, DM, DM, RW]