Lect06 2up
Admin
• HW #2
• Project Selection by October 2
– Your own ideas?
• Short proposal due October 2
– Content: problem definition, goal of project, metric for success
– 3-5 page document
– 5-10 minute presentation
• Status report due November 1
– document only
• Final report due December 6
– 8-10 page document
– 15-20 minute presentation
Review: ILP
• Dependencies are a property of the program (SW); they become
hazards when the HW cannot resolve them
• SW dependencies and compiler sophistication determine
whether the compiler can unroll loops
– Memory dependencies are the hardest to determine
Today
Review: Unrolled Loop That Minimizes Stalls
    Loop: LD    F0,0(R1)
          LD    F6,-8(R1)
          LD    F10,-16(R1)
          LD    F14,-24(R1)
          ADDD  F4,F0,F2
          ADDD  F8,F6,F2
          ADDD  F12,F10,F2
          ADDD  F16,F14,F2
          SD    0(R1),F4
          SD    -8(R1),F8
          SD    -16(R1),F12
          SUBI  R1,R1,#32
          BNEZ  R1,Loop
          SD    8(R1),F16    ; 8-32 = -24
• What assumptions were made when the code was moved?
– OK to move the store past SUBI even though SUBI changes the register
– OK to move loads before stores: do they get the right data?
– When is it safe for the compiler to make such changes?
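The comment on the last store (8-32 = -24) records an offset adjustment: once the SD sits below SUBI R1,R1,#32, its displacement has to grow by 32 to reach the same word. A minimal sketch of that bookkeeping follows; adjust_offset is a hypothetical helper and the value of R1 is arbitrary.

```python
def adjust_offset(offset, decrement):
    """Displacement to use once a store is moved below `SUBI R1,R1,#decrement`.

    Before the move the store addresses R1 + offset. After SUBI the register
    holds R1 - decrement, so the displacement must grow by `decrement` to
    reach the same address.
    """
    return offset + decrement


r1 = 1000                              # arbitrary example value of R1
old_address = r1 + (-24)               # SD -24(R1),F16 placed before SUBI
new_offset = adjust_offset(-24, 32)    # displacement after the move
new_address = (r1 - 32) + new_offset   # SD 8(R1),F16 placed after SUBI
```

Both addresses agree, which is the condition the compiler must preserve when it moves a memory access past an instruction that changes its base register.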
Can we do better?
HW Schemes: Instruction Parallelism
Scoreboard Implications
Four Stages of Scoreboard Control
1. Issue: decode instructions & check for structural
hazards (ID1)
If a functional unit for the instruction is free and no other active
instruction has the same destination register (WAW), the scoreboard
issues the instruction to the functional unit and updates its internal
data structure. If a structural or WAW hazard exists, then the
instruction issue stalls, and no further instructions will issue until
these hazards are cleared.
2. Read operands: wait until no data hazards, then
read operands (ID2)
A source operand is available if no earlier issued active instruction is
going to write it, or if the register containing the operand is being
written by a currently active functional unit. When the source
operands are available, the scoreboard tells the functional unit to
proceed to read the operands from the registers and begin execution.
The scoreboard resolves RAW hazards dynamically in this step, and
instructions may be sent into execution out of order.
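The two checks described above can be sketched in a few lines of Python. This is a simplified model, not the full scoreboard: it tracks only which units are busy and which unit will write each register, and the unit names "Integer" and "Mult1" are assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Scoreboard:
    # functional unit name -> True while an instruction occupies it
    busy: dict = field(default_factory=dict)
    # destination register -> functional unit that will write it
    pending_write: dict = field(default_factory=dict)

    def can_issue(self, unit, dest):
        """Stage 1: no structural hazard (unit free) and no WAW hazard."""
        return not self.busy.get(unit, False) and dest not in self.pending_write

    def issue(self, unit, dest):
        self.busy[unit] = True
        self.pending_write[dest] = unit

    def can_read_operands(self, unit, sources):
        """Stage 2: no source register is awaiting a write by another unit (RAW)."""
        return all(self.pending_write.get(s, unit) == unit for s in sources)

    def write_result(self, unit, dest):
        self.busy[unit] = False
        del self.pending_write[dest]


sb = Scoreboard()
sb.issue("Integer", "F2")                        # LD F2,45(R3)
load_blocked = sb.can_issue("Integer", "F6")     # another load: unit busy, so False
sb.issue("Mult1", "F0")                          # MULT F0,F2,F4
mult_ready = sb.can_read_operands("Mult1", ["F2", "F4"])  # False until the load writes F2
```

Once the load calls write_result("Integer", "F2"), the same can_read_operands call returns True and the multiply may begin execution.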
Three Parts of the Scoreboard
Scoreboard Example Cycle 2
Instruction status
Instruction  j    k    Issue  Read operands  Execution complete  Write result
LD    F6     34+  R2   1      2
LD    F2     45+  R3
MULT  F0     F2   F4
SUBD  F8     F6   F2
DIVD  F10    F0   F6
ADDD  F6     F8   F2
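The stalls in the cycles that follow come from the register dependences among these six instructions. The sketch below lists them pairwise. It is a conservative illustration that ignores intervening redefinitions (harmless for this sequence), and find_hazards is a hypothetical helper.

```python
# (opcode, destination, source j, source k); a load has only a base register as source
program = [
    ("LD",   "F6",  None, "R2"),
    ("LD",   "F2",  None, "R3"),
    ("MULT", "F0",  "F2", "F4"),
    ("SUBD", "F8",  "F6", "F2"),
    ("DIVD", "F10", "F0", "F6"),
    ("ADDD", "F6",  "F8", "F2"),
]


def find_hazards(instrs):
    """Return (kind, earlier_index, later_index, register) for each dependent pair."""
    found = []
    for i, (_, dest_i, *srcs_i) in enumerate(instrs):
        for j in range(i + 1, len(instrs)):
            _, dest_j, *srcs_j = instrs[j]
            if dest_i in srcs_j:       # later instruction reads what the earlier one writes
                found.append(("RAW", i, j, dest_i))
            if dest_j in srcs_i:       # later instruction overwrites an earlier source
                found.append(("WAR", i, j, dest_j))
            if dest_i == dest_j:       # both write the same register
                found.append(("WAW", i, j, dest_i))
    return found


hazards = find_hazards(program)
for kind, i, j, reg in hazards:
    print(f"{kind} on {reg}: {program[i][0]} -> {program[j][0]}")
```

Among the results is a WAR hazard on F6 between DIVD and ADDD: ADDD must not write F6 until DIVD has read it.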
Scoreboard Example Cycle 4
Instruction status
Instruction  j    k    Issue  Read operands  Execution complete  Write result
LD    F6     34+  R2   1      2              3                   4
LD    F2     45+  R3
MULT  F0     F2   F4
SUBD  F8     F6   F2
DIVD  F10    F0   F6
ADDD  F6     F8   F2
Scoreboard Example Cycle 6
Instruction status
Instruction  j    k    Issue  Read operands  Execution complete  Write result
LD    F6     34+  R2   1      2              3                   4
LD    F2     45+  R3   5      6
MULT  F0     F2   F4   6
SUBD  F8     F6   F2
DIVD  F10    F0   F6
ADDD  F6     F8   F2
Scoreboard Example Cycle 8a
Instruction status
Instruction  j    k    Issue  Read operands  Execution complete  Write result
LD    F6     34+  R2   1      2              3                   4
LD    F2     45+  R3   5      6              7
MULT  F0     F2   F4   6
SUBD  F8     F6   F2   7
DIVD  F10    F0   F6   8
ADDD  F6     F8   F2
Scoreboard Example Cycle 9
Instruction status
Instruction  j    k    Issue  Read operands  Execution complete  Write result
LD    F6     34+  R2   1      2              3                   4
LD    F2     45+  R3   5      6              7                   8
MULT  F0     F2   F4   6      9
SUBD  F8     F6   F2   7      9
DIVD  F10    F0   F6   8
ADDD  F6     F8   F2
Scoreboard Example Cycle 12
Instruction status
Instruction  j    k    Issue  Read operands  Execution complete  Write result
LD    F6     34+  R2   1      2              3                   4
LD    F2     45+  R3   5      6              7                   8
MULT  F0     F2   F4   6      9
SUBD  F8     F6   F2   7      9              11                  12
DIVD  F10    F0   F6   8
ADDD  F6     F8   F2
Scoreboard Example Cycle 14
Instruction status
Instruction  j    k    Issue  Read operands  Execution complete  Write result
LD    F6     34+  R2   1      2              3                   4
LD    F2     45+  R3   5      6              7                   8
MULT  F0     F2   F4   6      9
SUBD  F8     F6   F2   7      9              11                  12
DIVD  F10    F0   F6   8
ADDD  F6     F8   F2   13     14
Scoreboard Example Cycle 16
Instruction status
Instruction  j    k    Issue  Read operands  Execution complete  Write result
LD    F6     34+  R2   1      2              3                   4
LD    F2     45+  R3   5      6              7                   8
MULT  F0     F2   F4   6      9
SUBD  F8     F6   F2   7      9              11                  12
DIVD  F10    F0   F6   8
ADDD  F6     F8   F2   13     14             16
Scoreboard Example Cycle 18
Instruction status
Instruction  j    k    Issue  Read operands  Execution complete  Write result
LD    F6     34+  R2   1      2              3                   4
LD    F2     45+  R3   5      6              7                   8
MULT  F0     F2   F4   6      9
SUBD  F8     F6   F2   7      9              11                  12
DIVD  F10    F0   F6   8
ADDD  F6     F8   F2   13     14             16
Scoreboard Example Cycle 20
Instruction status
Instruction  j    k    Issue  Read operands  Execution complete  Write result
LD    F6     34+  R2   1      2              3                   4
LD    F2     45+  R3   5      6              7                   8
MULT  F0     F2   F4   6      9              19                  20
SUBD  F8     F6   F2   7      9              11                  12
DIVD  F10    F0   F6   8
ADDD  F6     F8   F2   13     14             16
Scoreboard Example Cycle 22
Instruction status
Instruction  j    k    Issue  Read operands  Execution complete  Write result
LD    F6     34+  R2   1      2              3                   4
LD    F2     45+  R3   5      6              7                   8
MULT  F0     F2   F4   6      9              19                  20
SUBD  F8     F6   F2   7      9              11                  12
DIVD  F10    F0   F6   8      21
ADDD  F6     F8   F2   13     14             16                  22
Note: 40-cycle divide (DIVD)
Scoreboard Summary
Tomasulo Organization
[Figure: Tomasulo organization. Labeled components: FP op queue (from instruction unit), FP registers, load buffers (from memory), store buffers (to memory), operand bus, FP adders, FP multipliers.]
Three Stages of Tomasulo Algorithm
Tomasulo Example Cycle 1
Tomasulo Example Cycle 3
Tomasulo Example Cycle 5
Tomasulo Example Cycle 7
Tomasulo Example Cycle 9
Tomasulo Example Cycle 11
Tomasulo Example Cycle 13
Tomasulo Example Cycle 15
Tomasulo Example Cycle 17
Tomasulo Example Cycle 57
Tomasulo Example Cycle 59
• Is Tomasulo better?
• Finishes in 59 cycles vs. 61 for the scoreboard. Why?
• We do reach the divide 3 cycles earlier…
– Simultaneous read of operands for SUBD and MULT
Tomasulo Loop Example
    Loop: LD     F0,0(R1)
          MULTD  F4,F0,F2
          SD     0(R1),F4
          SUBI   R1,R1,#8
          BNEZ   R1,Loop
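Tomasulo's algorithm can overlap iterations of this loop because each issued instruction's sources are replaced by the tag of the station that will produce the value, so the F0 and F4 of one iteration are distinct from those of the next. The sketch below illustrates only that renaming step under simplifying assumptions: tags RS1, RS2, ... are hypothetical and never reused, address operands are omitted, and the integer instructions SUBI and BNEZ are left out.

```python
from itertools import count


def rename(instrs, iterations):
    """Issue `instrs` repeatedly, replacing FP source registers by producer tags."""
    producer = {}      # register -> tag of the station that will write it
    tags = count(1)
    issued = []
    for _ in range(iterations):
        for op, dest, srcs in instrs:
            # A source becomes the pending producer's tag, or stays a register name
            operands = [producer.get(s, s) for s in srcs]
            tag = f"RS{next(tags)}"
            if dest is not None:
                producer[dest] = tag
            issued.append((tag, op, operands))
    return issued


loop_body = [
    ("LD",    "F0", []),             # address operand omitted for brevity
    ("MULTD", "F4", ["F0", "F2"]),
    ("SD",    None, ["F4"]),
]
trace = rename(loop_body, iterations=2)
for tag, op, operands in trace:
    print(tag, op, operands)
```

In the trace, the first MULTD waits on the first load's tag and the second MULTD on the second load's tag, so the two iterations no longer conflict on F0 or F4.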
Loop Example Cycle 1
Loop Example Cycle 3
Loop Example Cycle 5
Loop Example Cycle 7
Loop Example Cycle 9
Loop Example Cycle 11
Loop Example Cycle 13
Loop Example Cycle 15
Loop Example Cycle 17
Loop Example Cycle 19
Loop Example Cycle 21
Tomasulo Summary
Next Time