Lecture 6: Tomasulo Algorithm (II)

Explanation of each stage; a big example
Three Stages of Tomasulo Algorithm
1. Issue—get instruction from FP Op Queue
If reservation station free (no structural hazard),
control issues instr & sends operands (renames
registers).
2. Execution—operate on operands (EX)
When both operands ready then execute;
if not ready, watch Common Data Bus for result
3. Write result—finish execution (WB)
Write on Common Data Bus to all awaiting units;
mark reservation station available

Issue: build dependence for new inst


Writeback: Wakeup dependent instructions

Adapted from UCB CS252 S98
Issue Stage and Renaming Table
For each newly issued instruction, the issue stage:
Renames its two source registers (source renaming)
Assigns it to a free RS
Updates the renaming table (dest renaming)
Also decodes the inst and reads register values in parallel

How would the following inst be renamed?


ADD $16, $8, $9
ADD $17, $16, $16
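
One way to work through the question above is a minimal sketch of the issue stage, assuming three free stations with the hypothetical names RS1 to RS3 and a register file shown symbolically as R(reg). It is an illustration of source and destination renaming, not the exact hardware organization.

```python
from dataclasses import dataclass

@dataclass
class Station:
    tag: str    # reservation-station name, doubles as the rename tag
    op: str
    src1: str   # either a producer tag ("RS1") or a register read ("R($8)")
    src2: str

def issue(program, station_names):
    """Rename sources through the table, then map the destination to the new RS."""
    rename = {}                    # architectural register -> producing RS tag
    free = list(station_names)     # hypothetical station names, assumed sufficient
    issued = []
    for op, dest, s1, s2 in program:
        tag = free.pop(0)
        # Source renaming: use the pending producer's tag if there is one,
        # otherwise read the register file (shown symbolically).
        a = rename.get(s1, f"R({s1})")
        b = rename.get(s2, f"R({s2})")
        issued.append(Station(tag, op, a, b))
        rename[dest] = tag         # destination renaming
    return issued, rename

program = [("ADD", "$16", "$8", "$9"), ("ADD", "$17", "$16", "$16")]
stations, table = issue(program, ["RS1", "RS2", "RS3"])
for s in stations:
    print(s.tag, s.op, s.src1, s.src2)
```

Under these assumptions the second ADD holds the tag RS1 for both sources, so it waits for the first ADD's result instead of reading a stale $16.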
Execute Stage
Only “ready” instructions can join the competition
Select logic chooses which ready instructions go to the FUs
  - Some policy may be used, e.g. age-based
Non-ready instructions can be “woken up” during writeback of their parent inst
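
An age-based select policy can be sketched as follows. The dictionary layout, the `issued` field, and the Add3 entry are assumptions made for illustration; real select logic is a hardware arbiter, not a sort.

```python
def select_oldest_ready(stations, free_units):
    """Pick up to free_units ready entries, oldest (smallest issue cycle) first."""
    # An entry is ready when neither operand is still waiting on a tag.
    ready = [s for s in stations if s["qj"] is None and s["qk"] is None]
    ready.sort(key=lambda s: s["issued"])
    return [s["name"] for s in ready[:free_units]]

stations = [
    {"name": "Add1", "issued": 4, "qj": None, "qk": None},
    {"name": "Add2", "issued": 6, "qj": "Add1", "qk": None},   # waits on Add1
    {"name": "Add3", "issued": 7, "qj": None, "qk": None},     # hypothetical entry
]
picked = select_oldest_ready(stations, free_units=1)
print(picked)
```

With one free unit the oldest ready entry (Add1) wins; Add2 is excluded because it is not ready, whatever its age.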

Writeback and Common Data Bus
Normal data bus: data + destination (“go to” bus)
Common data bus: data + source (“come from” bus)
  - 64 bits of data + 4 bits of source index (tag)
  - Broadcasts to every instruction in flight
Child instructions do tag matching and update their ready bits and value fields (if the tag matches theirs)
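
The tag matching described above can be sketched as a broadcast over all waiting entries. The field names (vj, vk, qj, qk) follow the reservation-station tables used in the worked example of this lecture; the dictionary representation is an assumption for illustration.

```python
def broadcast(stations, tag, value):
    """Deliver (tag, value) to every waiting entry; return names that became ready."""
    woken = []
    for s in stations:
        hit = False
        for q, v in (("qj", "vj"), ("qk", "vk")):
            if s[q] == tag:        # tag match: this operand was waiting on `tag`
                s[v] = value       # capture the broadcast value
                s[q] = None        # operand is now ready
                hit = True
        if hit and s["qj"] is None and s["qk"] is None:
            woken.append(s["name"])
    return woken

# State modeled on the worked example just before Load2 writes its result.
stations = [
    {"name": "Add1", "vj": "M(34+R2)", "vk": None, "qj": None, "qk": "Load2"},
    {"name": "Mult1", "vj": None, "vk": "R(F4)", "qj": "Load2", "qk": None},
]
woken = broadcast(stations, "Load2", "M(45+R3)")
print(woken)
```

Both entries match the Load2 tag, capture M(45+R3), and become ready in the same broadcast.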

Adapted from UCB CS252 S98, Copyright 1998 USB
Code Example
LD    F6,34(R2)
LD    F2,45(R3)
MULTD F0,F2,F4
SUBD  F8,F6,F2
DIVD  F10,F0,F6
ADDD  F6,F8,F2

[Figure: dependence graph with nodes LD1, LD2, MULTD, SUBD, ADDD, DIVD]

Operation latencies: load/store 2 cycles, add/sub 2 cycles, mult 10 cycles, divide 40 cycles

Tomasulo Example Cycle 0

Instruction status Execution Write


Instruction j k Issue complete Result Busy Address
LD F6 34+ R2 Load1 No
LD F2 45+ R3 Load2 No
MULT F0 F2 F4 Load3 No
SUBD F8 F6 F2
DIVD F10 F0 F6
ADDD F6 F8 F2
Reservation Stations S1 S2 RS for j RS for k
Time Name Busy Op Vj Vk Qj Qk
0 Add1 No
0 Add2 No
0 Add3 No
0 Mult1 No
0 Mult2 No
Register result status
Clock F0 F2 F4 F6 F8 F10 F12 ... F30
0 FU

Tomasulo Example Cycle 1

Instruction status Execution Write


Instruction j k Issue complete Result Busy Address
LD F6 34+ R2 1 Load1 Yes 34+R2
LD F2 45+ R3 Load2 No
MULT F0 F2 F4 Load3 No
SUBD F8 F6 F2
DIVD F10 F0 F6
ADDD F6 F8 F2
Reservation Stations S1 S2 RS for j RS for k
Time Name Busy Op Vj Vk Qj Qk
0 Add1 No
0 Add2 No
Add3 No
0 Mult1 No
0 Mult2 No
Register result status
Clock F0 F2 F4 F6 F8 F10 F12 ... F30
1 FU Load1

Tomasulo Example Cycle 2

Instruction status Execution Write


Instruction j k Issue complete Result Busy Address
LD F6 34+ R2 1 Load1 Yes 34+R2
LD F2 45+ R3 2 Load2 Yes 45+R3
MULT F0 F2 F4 Load3 No
SUBD F8 F6 F2
DIVD F10 F0 F6
ADDD F6 F8 F2
Reservation Stations S1 S2 RS for j RS for k
Time Name Busy Op Vj Vk Qj Qk
0 Add1 No
0 Add2 No
Add3 No
0 Mult1 No
0 Mult2 No
Register result status
Clock F0 F2 F4 F6 F8 F10 F12 ... F30
2 FU Load2 Load1

Tomasulo Example Cycle 3

Instruction status Execution Write


Instruction j k Issue complete Result Busy Address
LD F6 34+ R2 1 3 Load1 Yes 34+R2
LD F2 45+ R3 2 Load2 Yes 45+R3
MULT F0 F2 F4 3 Load3 No
SUBD F8 F6 F2
DIVD F10 F0 F6
ADDD F6 F8 F2
Reservation Stations S1 S2 RS for j RS for k
Time Name Busy Op Vj Vk Qj Qk
0 Add1 No
0 Add2 No
Add3 No
0 Mult1 Yes MULTD R(F4) Load2
0 Mult2 No
Register result status
Clock F0 F2 F4 F6 F8 F10 F12 ... F30
3 FU Mult1 Load2 Load1
• Note: register names are removed (“renamed”) in the reservation stations
• Load1 completing; what is waiting for Load1?
Tomasulo Example Cycle 4

Instruction status Execution Write


Instruction j k Issue complete Result Busy Address
LD F6 34+ R2 1 3 4 Load1 No
LD F2 45+ R3 2 4 Load2 Yes 45+R3
MULT F0 F2 F4 3 Load3 No
SUBD F8 F6 F2 4
DIVD F10 F0 F6
ADDD F6 F8 F2
Reservation Stations S1 S2 RS for j RS for k
Time Name Busy Op Vj Vk Qj Qk
0 Add1 Yes SUBD M(34+R2) Load2
0 Add2 No
Add3 No
0 Mult1 Yes MULTD R(F4) Load2
0 Mult2 No
Register result status
Clock F0 F2 F4 F6 F8 F10 F12 ... F30
4 FU Mult1 Load2 M(34+R2) Add1

• Load2 completing; what is waiting for it?


Tomasulo Example Cycle 5

Instruction status Execution Write


Instruction j k Issue complete Result Busy Address
LD F6 34+ R2 1 3 4 Load1 No
LD F2 45+ R3 2 4 5 Load2 No
MULT F0 F2 F4 3 Load3 No
SUBD F8 F6 F2 4
DIVD F10 F0 F6 5
ADDD F6 F8 F2
Reservation Stations S1 S2 RS for j RS for k
Time Name Busy Op Vj Vk Qj Qk
2 Add1 Yes SUBD M(34+R2) M(45+R3)
0 Add2 No
Add3 No
10 Mult1 Yes MULTD M(45+R3) R(F4)
0 Mult2 Yes DIVD M(34+R2) Mult1
Register result status
Clock F0 F2 F4 F6 F8 F10 F12 ... F30
5 FU Mult1 M(45+R3) M(34+R2) Add1 Mult2

Tomasulo Example Cycle 6

Instruction status Execution Write


Instruction j k Issue complete Result Busy Address
LD F6 34+ R2 1 3 4 Load1 No
LD F2 45+ R3 2 4 5 Load2 No
MULT F0 F2 F4 3 Load3 No
SUBD F8 F6 F2 4
DIVD F10 F0 F6 5
ADDD F6 F8 F2 6
Reservation Stations S1 S2 RS for j RS for k
Time Name Busy Op Vj Vk Qj Qk
1 Add1 Yes SUBD M(34+R2) M(45+R3)
0 Add2 Yes ADDD M(45+R3) Add1
Add3 No
9 Mult1 Yes MULTD M(45+R3) R(F4)
0 Mult2 Yes DIVD M(34+R2) Mult1
Register result status
Clock F0 F2 F4 F6 F8 F10 F12 ... F30
6 FU Mult1 M(45+R3) Add2 Add1 Mult2

Tomasulo Example Cycle 7

Instruction status Execution Write


Instruction j k Issue complete Result Busy Address
LD F6 34+ R2 1 3 4 Load1 No
LD F2 45+ R3 2 4 5 Load2 No
MULT F0 F2 F4 3 Load3 No
SUBD F8 F6 F2 4 7
DIVD F10 F0 F6 5
ADDD F6 F8 F2 6
Reservation Stations S1 S2 RS for j RS for k
Time Name Busy Op Vj Vk Qj Qk
0 Add1 Yes SUBD M(34+R2) M(45+R3)
0 Add2 Yes ADDD M(45+R3) Add1
Add3 No
8 Mult1 Yes MULTD M(45+R3) R(F4)
0 Mult2 Yes DIVD M(34+R2) Mult1
Register result status
Clock F0 F2 F4 F6 F8 F10 F12 ... F30
7 FU Mult1 M(45+R3) Add2 Add1 Mult2

• Add1 completing; what is waiting for it?


Tomasulo Example Cycle 8

Instruction status Execution Write


Instruction j k Issue complete Result Busy Address
LD F6 34+ R2 1 3 4 Load1 No
LD F2 45+ R3 2 4 5 Load2 No
MULT F0 F2 F4 3 Load3 No
SUBD F8 F6 F2 4 7 8
DIVD F10 F0 F6 5
ADDD F6 F8 F2 6
Reservation Stations S1 S2 RS for j RS for k
Time Name Busy Op Vj Vk Qj Qk
0 Add1 No
2 Add2 Yes ADDD M()-M() M(45+R3)
0 Add3 No
7 Mult1 Yes MULTD M(45+R3) R(F4)
0 Mult2 Yes DIVD M(34+R2) Mult1
Register result status
Clock F0 F2 F4 F6 F8 F10 F12 ... F30
8 FU Mult1 M(45+R3) Add2 M()-M() Mult2

Tomasulo Example Cycle 9

Instruction status Execution Write


Instruction j k Issue complete Result Busy Address
LD F6 34+ R2 1 3 4 Load1 No
LD F2 45+ R3 2 4 5 Load2 No
MULT F0 F2 F4 3 Load3 No
SUBD F8 F6 F2 4 7 8
DIVD F10 F0 F6 5
ADDD F6 F8 F2 6
Reservation Stations S1 S2 RS for j RS for k
Time Name Busy Op Vj Vk Qj Qk
0 Add1 No
1 Add2 Yes ADDD M()–M() M(45+R3)
0 Add3 No
6 Mult1 Yes MULTD M(45+R3) R(F4)
0 Mult2 Yes DIVD M(34+R2) Mult1
Register result status
Clock F0 F2 F4 F6 F8 F10 F12 ... F30
9 FU Mult1 M(45+R3) Add2 M()–M() Mult2

Tomasulo Example Cycle 10

Instruction status Execution Write


Instruction j k Issue complete Result Busy Address
LD F6 34+ R2 1 3 4 Load1 No
LD F2 45+ R3 2 4 5 Load2 No
MULT F0 F2 F4 3 Load3 No
SUBD F8 F6 F2 4 7 8
DIVD F10 F0 F6 5
ADDD F6 F8 F2 6 10
Reservation Stations S1 S2 RS for j RS for k
Time Name Busy Op Vj Vk Qj Qk
0 Add1 No
0 Add2 Yes ADDD M()–M() M(45+R3)
0 Add3 No
5 Mult1 Yes MULTD M(45+R3) R(F4)
0 Mult2 Yes DIVD M(34+R2) Mult1
Register result status
Clock F0 F2 F4 F6 F8 F10 F12 ... F30
10 FU Mult1 M(45+R3) Add2 M()–M() Mult2

• Add2 completing; what is waiting for it?


Tomasulo Example Cycle 11

Instruction status Execution Write


Instruction j k Issue complete Result Busy Address
LD F6 34+ R2 1 3 4 Load1 No
LD F2 45+ R3 2 4 5 Load2 No
MULT F0 F2 F4 3 Load3 No
SUBD F8 F6 F2 4 7 8
DIVD F10 F0 F6 5
ADDD F6 F8 F2 6 10 11
Reservation Stations S1 S2 RS for j RS for k
Time Name Busy Op Vj Vk Qj Qk
0 Add1 No
0 Add2 No
0 Add3 No
4 Mult1 Yes MULTD M(45+R3) R(F4)
0 Mult2 Yes DIVD M(34+R2) Mult1
Register result status
Clock F0 F2 F4 F6 F8 F10 F12 ... F30
11 FU Mult1 M(45+R3) (M–M)+M() M()–M() Mult2

• Write result of ADDD here vs. scoreboard?


Tomasulo Example Cycle 12

Instruction status Execution Write


Instruction j k Issue complete Result Busy Address
LD F6 34+ R2 1 3 4 Load1 No
LD F2 45+ R3 2 4 5 Load2 No
MULT F0 F2 F4 3 Load3 No
SUBD F8 F6 F2 4 7 8
DIVD F10 F0 F6 5
ADDD F6 F8 F2 6 10 11
Reservation Stations S1 S2 RS for j RS for k
Time Name Busy Op Vj Vk Qj Qk
0 Add1 No
0 Add2 No
0 Add3 No
3 Mult1 Yes MULTD M(45+R3) R(F4)
0 Mult2 Yes DIVD M(34+R2) Mult1
Register result status
Clock F0 F2 F4 F6 F8 F10 F12 ... F30
12 FU Mult1 M(45+R3) (M–M)+M() M()–M() Mult2

Tomasulo Example Cycle 13

Instruction status Execution Write


Instruction j k Issue complete Result Busy Address
LD F6 34+ R2 1 3 4 Load1 No
LD F2 45+ R3 2 4 5 Load2 No
MULT F0 F2 F4 3 Load3 No
SUBD F8 F6 F2 4 7 8
DIVD F10 F0 F6 5
ADDD F6 F8 F2 6 10 11
Reservation Stations S1 S2 RS for j RS for k
Time Name Busy Op Vj Vk Qj Qk
0 Add1 No
0 Add2 No
Add3 No
2 Mult1 Yes MULTD M(45+R3) R(F4)
0 Mult2 Yes DIVD M(34+R2) Mult1
Register result status
Clock F0 F2 F4 F6 F8 F10 F12 ... F30
13 FU Mult1 M(45+R3) (M–M)+M() M()–M() Mult2

Tomasulo Example Cycle 14

Instruction status Execution Write


Instruction j k Issue complete Result Busy Address
LD F6 34+ R2 1 3 4 Load1 No
LD F2 45+ R3 2 4 5 Load2 No
MULT F0 F2 F4 3 Load3 No
SUBD F8 F6 F2 4 7 8
DIVD F10 F0 F6 5
ADDD F6 F8 F2 6 10 11
Reservation Stations S1 S2 RS for j RS for k
Time Name Busy Op Vj Vk Qj Qk
0 Add1 No
0 Add2 No
0 Add3 No
1 Mult1 Yes MULTD M(45+R3) R(F4)
0 Mult2 Yes DIVD M(34+R2) Mult1
Register result status
Clock F0 F2 F4 F6 F8 F10 F12 ... F30
14 FU Mult1 M(45+R3) (M–M)+M() M()–M() Mult2

Tomasulo Example Cycle 15

Instruction status Execution Write


Instruction j k Issue complete Result Busy Address
LD F6 34+ R2 1 3 4 Load1 No
LD F2 45+ R3 2 4 5 Load2 No
MULT F0 F2 F4 3 15 Load3 No
SUBD F8 F6 F2 4 7 8
DIVD F10 F0 F6 5
ADDD F6 F8 F2 6 10 11
Reservation Stations S1 S2 RS for j RS for k
Time Name Busy Op Vj Vk Qj Qk
0 Add1 No
0 Add2 No
Add3 No
0 Mult1 Yes MULTD M(45+R3) R(F4)
0 Mult2 Yes DIVD M(34+R2) Mult1
Register result status
Clock F0 F2 F4 F6 F8 F10 F12 ... F30
15 FU Mult1 M(45+R3) (M–M)+M() M()–M() Mult2

• Mult1 completing; what is waiting for it?


Tomasulo Example Cycle 16

Instruction status Execution Write


Instruction j k Issue complete Result Busy Address
LD F6 34+ R2 1 3 4 Load1 No
LD F2 45+ R3 2 4 5 Load2 No
MULT F0 F2 F4 3 15 16 Load3 No
SUBD F8 F6 F2 4 7 8
DIVD F10 F0 F6 5
ADDD F6 F8 F2 6 10 11
Reservation Stations S1 S2 RS for j RS for k
Time Name Busy Op Vj Vk Qj Qk
0 Add1 No
0 Add2 No
Add3 No
0 Mult1 No
40 Mult2 Yes DIVD M*F4 M(34+R2)
Register result status
Clock F0 F2 F4 F6 F8 F10 F12 ... F30
16 FU M*F4 M(45+R3) (M–M)+M() M()–M() Mult2

• Note: Just waiting for divide


Tomasulo Example Cycle 55

Instruction status Execution Write


Instruction j k Issue complete Result Busy Address
LD F6 34+ R2 1 3 4 Load1 No
LD F2 45+ R3 2 4 5 Load2 No
MULT F0 F2 F4 3 15 16 Load3 No
SUBD F8 F6 F2 4 7 8
DIVD F10 F0 F6 5
ADDD F6 F8 F2 6 10 11
Reservation Stations S1 S2 RS for j RS for k
Time Name Busy Op Vj Vk Qj Qk
0 Add1 No
0 Add2 No
Add3 No
0 Mult1 No
1 Mult2 Yes DIVD M*F4 M(34+R2)
Register result status
Clock F0 F2 F4 F6 F8 F10 F12 ... F30
55 FU M*F4 M(45+R3) (M–M)+M() M()–M() Mult2

Tomasulo Example Cycle 56

Instruction status Execution Write


Instruction j k Issue complete Result Busy Address
LD F6 34+ R2 1 3 4 Load1 No
LD F2 45+ R3 2 4 5 Load2 No
MULT F0 F2 F4 3 15 16 Load3 No
SUBD F8 F6 F2 4 7 8
DIVD F10 F0 F6 5 56
ADDD F6 F8 F2 6 10 11
Reservation Stations S1 S2 RS for j RS for k
Time Name Busy Op Vj Vk Qj Qk
0 Add1 No
0 Add2 No
Add3 No
0 Mult1 No
0 Mult2 Yes DIVD M*F4 M(34+R2)
Register result status
Clock F0 F2 F4 F6 F8 F10 F12 ... F30
56 FU M*F4 M(45+R3) (M–M)+M() M()–M() Mult2

• Mult 2 completing; what is waiting for it?


Tomasulo Example Cycle 57

Instruction status Execution Write


Instruction j k Issue complete Result Busy Address
LD F6 34+ R2 1 3 4 Load1 No
LD F2 45+ R3 2 4 5 Load2 No
MULT F0 F2 F4 3 15 16 Load3 No
SUBD F8 F6 F2 4 7 8
DIVD F10 F0 F6 5 56 57
ADDD F6 F8 F2 6 10 11
Reservation Stations S1 S2 RS for j RS for k
Time Name Busy Op Vj Vk Qj Qk
0 Add1 No
0 Add2 No
Add3 No
0 Mult1 No
0 Mult2 No
Register result status
Clock F0 F2 F4 F6 F8 F10 F12 ... F30
57 FU M*F4 M(45+R3) (M–M)+M() M()–M() M*F4/M

• Again: in-order issue, out-of-order execution and completion
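
The issue, execution-complete, and write-result columns of the example can be cross-checked with a small timing model. This is a simplified sketch, not a full Tomasulo simulator: it assumes one instruction issues per cycle, there are no structural hazards or CDB conflicts, an operand becomes usable the cycle after its producer writes the CDB, and the integer base registers of the loads are always ready.

```python
LATENCY = {"LD": 2, "ADDD": 2, "SUBD": 2, "MULTD": 10, "DIVD": 40}

# (op, destination, FP source registers) for the six-instruction example.
PROGRAM = [
    ("LD",    "F6",  []),
    ("LD",    "F2",  []),
    ("MULTD", "F0",  ["F2", "F4"]),
    ("SUBD",  "F8",  ["F6", "F2"]),
    ("DIVD",  "F10", ["F0", "F6"]),
    ("ADDD",  "F6",  ["F8", "F2"]),
]

def schedule(program):
    last_write = {}   # register -> write-result cycle of its most recent producer
    rows = []
    for i, (op, dest, srcs) in enumerate(program):
        issue = i + 1                                   # in-order, one per cycle
        ready = max([last_write.get(r, 0) for r in srcs], default=0)
        start = max(issue, ready) + 1                   # cycle after issue and wakeup
        complete = start + LATENCY[op] - 1
        write = complete + 1
        rows.append((op, issue, complete, write))
        last_write[dest] = write     # later readers are renamed to this producer
    return rows

rows = schedule(PROGRAM)
for row in rows:
    print(row)

issues = [r[1] for r in rows]
writes = [r[3] for r in rows]
print("in-order issue:", issues == sorted(issues))
print("in-order completion:", writes == sorted(writes))
```

Under these assumptions the model reproduces the final table (for example DIVD: issue 5, complete 56, write 57), and the write-result cycles 4, 5, 16, 8, 57, 11 are visibly not in program order.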
Review Dependences
How are dependences enforced or
removed in the Tomasulo algorithm?

Data dependences (RAW)


Antidependence (WAR)
Output Dependence (WAW)
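
To make the three categories concrete, the sketch below lists the register dependences in the six-instruction example. It is a naive pairwise check that ignores intervening writes (which does not change the answer for this program) and considers only the FP registers; the tuple encoding is an assumption for illustration.

```python
# (op, destination, FP source registers); indices 0..5 follow program order.
PROGRAM = [
    ("LD",    "F6",  []),
    ("LD",    "F2",  []),
    ("MULTD", "F0",  ["F2", "F4"]),
    ("SUBD",  "F8",  ["F6", "F2"]),
    ("DIVD",  "F10", ["F0", "F6"]),
    ("ADDD",  "F6",  ["F8", "F2"]),
]

def dependences(program):
    """Return (kind, earlier_index, later_index, register) for every pair."""
    found = []
    for j, (_, dest_j, srcs_j) in enumerate(program):
        for i in range(j):
            _, dest_i, srcs_i = program[i]
            if dest_i in srcs_j:
                found.append(("RAW", i, j, dest_i))   # later reads earlier's result
            if dest_j in srcs_i:
                found.append(("WAR", i, j, dest_j))   # later overwrites earlier's source
            if dest_i == dest_j:
                found.append(("WAW", i, j, dest_j))   # both write the same register
    return found

deps = dependences(PROGRAM)
for d in deps:
    print(d)
```

The only WAR and WAW hazards involve F6: ADDD overwrites a register that SUBD and DIVD read and that the first LD wrote. In the worked example SUBD and DIVD already hold the value M(34+R2) in their reservation stations, so ADDD can write F6 at cycle 11 without waiting for DIVD.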

The Use of Tag
In Tomasulo, the RS and load/store buffer index is used as the tag.
Renaming assigns each new inst a unique tag
RS stores tags to preserve dependences
CDB broadcasts the tag with data for data passing and wakeup

Why is the tag so chosen?
What does the tag really represent?
What else can be used as a tag?

Tomasulo Summary
Reservation stations:
  - Increase the effective number of registers
  - Distribute the scheduling logic
Register renaming: avoids WAR and WAW dependences
Tag + data broadcasting for waking up child instructions
Pros: can be effectively combined with speculative execution
Cons: CDB broadcasting adds a one-cycle delay (addressed in modern instruction scheduling)
