CA Classes-136-140

The document discusses the Tomasulo algorithm for dynamic scheduling and its key aspects. It describes how the algorithm avoids data hazards through register renaming and using reservation stations and load/store buffers. It also compares Tomasulo's algorithm to scoreboarding.

Uploaded by

SrinivasaRao
Copyright © All Rights Reserved

Computer Architecture Unit 6

6.4 Dynamic Scheduling Algorithm – The Tomasulo Approach


The dynamic scheduling algorithm described here was proposed by Robert
Tomasulo. Tomasulo’s scheme combines the key elements of the scoreboard
approach with the introduction of register renaming. The scheme has many
variants, but the basic idea is always the same: “Avoiding WAR and WAW
data hazards by use of renaming registers”.
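
The renaming idea can be illustrated with a small Python sketch. It is not the hardware mechanism itself: the function `rename`, the physical names `P0`, `P1`, ... and the four-instruction program are all hypothetical, chosen only to show that once every destination gets a fresh name, WAR and WAW conflicts disappear while true (RAW) dependences are preserved.

```python
from itertools import count

def rename(instrs, arch_regs):
    """Give every destination a fresh name; sources read the latest mapping."""
    fresh = (f"P{i}" for i in count())          # hypothetical pool of new names
    mapping = {r: r for r in arch_regs}         # each register starts as itself
    out = []
    for op, dst, src1, src2 in instrs:
        s1, s2 = mapping[src1], mapping[src2]   # read sources before remapping dst
        new_dst = next(fresh)
        mapping[dst] = new_dst
        out.append((op, new_dst, s1, s2))
    return out

program = [
    ("DIV", "F0", "F2", "F4"),
    ("ADD", "F6", "F0", "F8"),    # RAW on F0 (true dependence, must be kept)
    ("SUB", "F8", "F10", "F14"),  # WAR on F8 with the ADD above
    ("MUL", "F6", "F10", "F8"),   # WAW on F6 with the ADD above
]
renamed = rename(program, ["F0", "F2", "F4", "F6", "F8", "F10", "F14"])
for instr in renamed:
    print(instr)
```

After renaming, SUB writes `P2` instead of `F8`, so it no longer conflicts with the ADD that reads `F8`, and MUL writes `P3` instead of `F6`, so the two writes to `F6` no longer collide.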
The Tomasulo algorithm
It was developed for the IBM 360/91 in 1967, about three years after the
CDC 6600. The discussion here concentrates on the floating-point units
(FPUs), in the context of a pipelined FPU for DLX. The key difference
between DLX and the IBM 360 is that the IBM 360 has register-memory
instructions. Because Tomasulo’s algorithm uses a load functional unit
(FU), no major changes are needed to add register-memory addressing
modes; the most significant addition is an extra bus. The IBM 360/91 also
had pipelined FUs rather than multiple FUs. The only difference is that a
pipelined FU can start at most one operation per clock cycle; otherwise
the two arrangements do not differ in any major way. The IBM 360/91 could
hold 3 operations for the FP (floating-point) adder and 2 for the FP
multiplier. In addition, up to 6 FP loads (memory references) and 3 FP
stores could be outstanding. Load data buffers and store data buffers are
used for this purpose.
Tomasulo’s scheme differs from scoreboarding in several ways:
• In Tomasulo’s scheme, control and buffers are distributed among the
functional units (FUs), whereas in scoreboarding they are centralised.
• Tomasulo’s scheme uses register renaming to avoid WAR and WAW hazards,
whereas scoreboarding does no register renaming.
• In Tomasulo’s scheme, the CDB (Common Data Bus) broadcasts results to
all FUs, whereas scoreboarding writes results into the registers.
• The Tomasulo algorithm can read operands from the registers and from
the CDB, and writes results only to the CDB, whereas scoreboarding reads
operands from and writes results to the registers.

Manipal University of Jaipur B1648 Page No. 136


• In Tomasulo’s scheme, an instruction can issue only when a reservation
station (RS) is free, whereas in scoreboarding it can issue only when the
FU is free.
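
The issue rule in the last point can be sketched as a simple capacity check. This is a minimal illustration, not the 360/91 control logic: the station counts (3 for the adder, 2 for the multiplier) come from the text, while `CAPACITY`, `can_issue`, the instruction stream, and the in-order stall behaviour are assumptions made for the example.

```python
from collections import Counter

# Station counts from the text: 3 for the FP adder, 2 for the FP multiplier.
CAPACITY = {"add": 3, "mul": 2}

def can_issue(unit, in_use):
    """Tomasulo-style rule: issue only if a reservation station for `unit` is free."""
    return in_use[unit] < CAPACITY[unit]

in_use = Counter()      # stations currently occupied, per unit type
issued = []
for unit in ["mul", "mul", "mul", "add"]:   # hypothetical instruction stream
    if can_issue(unit, in_use):
        in_use[unit] += 1
        issued.append(unit)
    else:
        break   # assumed in-order issue: a stalled instruction blocks later ones

print(issued)
```

The third multiply stalls because both multiplier stations are occupied, even though an adder station is still free; whether the multiplier FU itself is busy does not enter the check.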
Figure 6.4 shows the basic structure of a Tomasulo-based floating-point
unit for DLX.

Figure 6.4: Basic Structure of a DLX Floating-Point Unit using Tomasulo’s Algorithm

The reservation stations hold the following:
• Instructions that have been issued and are waiting for execution at
the FU
• The operands for those instructions if they have already been computed,
or otherwise the source of the operands
• The information needed to control the instruction once it has started
execution
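
One way to picture a reservation-station entry is as a small record. The sketch below is an assumption-laden illustration: the field names `vj`, `vk`, `qj`, `qk` follow a common textbook convention rather than anything given in this unit, and `issue`, `reg_values` and `reg_status` are hypothetical helpers. Each operand is stored either as a value (already computed) or as the tag of the station that will produce it.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ReservationStation:
    name: str
    busy: bool = False
    op: Optional[str] = None
    vj: Optional[float] = None   # value of operand 1, if already available
    vk: Optional[float] = None   # value of operand 2, if already available
    qj: Optional[str] = None     # tag of the producer of operand 1, if pending
    qk: Optional[str] = None     # tag of the producer of operand 2, if pending

    def ready(self) -> bool:
        """An instruction may start executing once no operand is pending."""
        return self.busy and self.qj is None and self.qk is None

def issue(rs, op, src1, src2, reg_values, reg_status):
    """Fill a free station: copy each operand's value or record its producer's tag."""
    rs.busy, rs.op = True, op
    rs.qj = reg_status.get(src1)
    rs.vj = None if rs.qj else reg_values[src1]
    rs.qk = reg_status.get(src2)
    rs.vk = None if rs.qk else reg_values[src2]

reg_values = {"F2": 1.5, "F4": 2.0}
reg_status = {"F0": "Mult1"}     # F0 is still being computed by station Mult1
add1 = ReservationStation("Add1")
issue(add1, "ADD", "F0", "F2", reg_values, reg_status)
print(add1, add1.ready())
```

Here `Add1` holds the value of F2 directly but only a tag for F0, so it is not yet ready to execute.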


The load buffers and store buffers hold the data and addresses that come
from or go to memory. A pair of buses connects the FP registers to the
FUs, and a single bus connects the FP registers to the store buffers. The
common data bus (CDB) carries the results from the FUs and from memory
everywhere except to the load buffers. The buffers and reservation
stations (RS) contain tag fields that are used for hazard control.
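
The role of the tag fields can be sketched as a broadcast on the common bus: a result is announced together with the tag of the unit that produced it, and every waiting entry compares that tag with its own. The code below is a simplified model under stated assumptions; the dictionary layout, the field names `qj`/`qk`/`vj`/`vk`, and the helper `broadcast` are hypothetical and do not describe the actual 360/91 hardware.

```python
def broadcast(tag, value, stations, reg_status, reg_values):
    """Deliver (tag, value) to every station and register waiting on `tag`."""
    for rs in stations:
        if rs["qj"] == tag:                 # operand 1 was waiting for this result
            rs["vj"], rs["qj"] = value, None
        if rs["qk"] == tag:                 # operand 2 was waiting for this result
            rs["vk"], rs["qk"] = value, None
    for reg, producer in list(reg_status.items()):
        if producer == tag:                 # register was waiting for this result
            reg_values[reg] = value
            del reg_status[reg]

stations = [
    {"name": "Add1", "vj": None, "qj": "Mult1", "vk": 1.5, "qk": None},
    {"name": "Add2", "vj": 4.0, "qj": None, "vk": None, "qk": "Load1"},
]
reg_status = {"F0": "Mult1"}    # F0 will be written by station Mult1
reg_values = {}

broadcast("Mult1", 3.0, stations, reg_status, reg_values)
print(stations, reg_values, reg_status)
```

After the broadcast, `Add1` has both operands and F0 holds the value, while `Add2` is still waiting for `Load1`.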
Tomasulo’s scheme is appealing when designers are compelled to pipeline
an architecture for which it is hard to schedule code or which has a
shortage of registers. When cost is taken into account, however, the
benefit of the Tomasulo approach over compiler scheduling for an
efficient single-issue pipeline is small. Even so, with the increasing
demand for issue capability and for better performance on
difficult-to-schedule code, dynamic scheduling and register renaming are
becoming more widespread.
Self Assessment Questions
9. Tomasulo’s scheme was invented by ______________.
10. The ________________ could hold 3 operations for the FP adder and
2 for the FP multiplier.
11. The ____________ and ______________ are used to store the data/
addresses that come from or go to memory.

Activity 1:
Imagine yourself as a computer architect. Explain the measures you will
take to overcome data hazards with dynamic scheduling.

6.5 High Performance Instruction Delivery


In the MIPS five-stage pipeline, the address of the next instruction to
fetch must be known before the current Instruction Fetch (IF) cycle
completes. Consequently, for a zero branch penalty, the processor must
determine whether the fetched (as yet undecoded) instruction is a branch
and, if it is, what the next PC (Program Counter) should be. This is
accomplished by introducing a cache that holds the address of the
following instruction for a branch, whether it is taken or not taken.
This cache is known as the Branch-Target Cache or Branch-Target Buffer
(BTB).
The branch-prediction buffer is accessed during the ID phase, i.e., the
branch-target address is known at the end of the ID stage and can then be
used to fetch the next predicted instruction. This is shown in figure 6.5.

Figure 6.5: Branch Prediction

6.5.1 Branch target buffer


The Branch Target Buffer has three fields:
• Lookup: addresses of the known branch instructions (predicted as
taken)
• Predicted PC: the PC of the instruction to fetch next when the branch
is predicted taken
• Prediction state (optional): extra prediction-state bits
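
The three fields can be sketched as a small lookup structure. This is a simplified model under stated assumptions: a Python dictionary stands in for the finite, tagged hardware table, and the class names, the 4-byte instruction size, and the update policy (insert on a taken branch, remove on a not-taken one) are illustrative choices rather than details given in this unit.

```python
from dataclasses import dataclass

@dataclass
class BTBEntry:
    predicted_pc: int     # "Predicted PC" field
    state: int = 0        # optional extra prediction-state bits (unused here)

class BranchTargetBuffer:
    def __init__(self):
        # "Lookup" field: address of a known branch -> its entry
        self.entries = {}

    def next_pc(self, pc, instr_size=4):
        """Called during IF: a hit predicts taken, a miss falls through."""
        entry = self.entries.get(pc)
        return entry.predicted_pc if entry else pc + instr_size

    def update(self, pc, taken, target):
        """Called once the branch outcome is known."""
        if taken:
            self.entries[pc] = BTBEntry(predicted_pc=target)
        else:
            self.entries.pop(pc, None)

btb = BranchTargetBuffer()
first = btb.next_pc(0x100)                    # unknown branch: fall through
btb.update(0x100, taken=True, target=0x200)   # branch resolved as taken
second = btb.next_pc(0x100)                   # now predicted taken
print(hex(first), hex(second))
```

Because the lookup uses only the fetch address, the prediction is available before the instruction has been decoded.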
The Branch Target Buffer has the following complications:
• A complication arises when a 2-bit predictor is used, because it needs
information for both taken and not-taken branches
• This complication is resolved in PowerPC processors by using both a
target buffer and a prediction buffer
The branch penalty can be calculated from the probabilities of two events:
(i) Branch is predicted taken but ends up not taken
= % buffer hit rate x % incorrect prediction
= 0.95 x 0.1 = 0.095
(ii) Branch is taken but is not found in the buffer
= % incorrect prediction
= 0.1
The penalty in both cases is 2 cycles; therefore,
Branch Penalty = (0.095 + 0.1) x 2 = 0.195 x 2 = 0.39 cycles
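
The arithmetic above can be checked with a few lines of Python. The numbers are those used in the calculation; the variable names are only labels chosen for this sketch.

```python
hit_rate = 0.95                 # buffer hit rate used in case (i)
mispredict_rate = 0.10          # fraction of incorrect predictions
p_taken_not_in_buffer = 0.10    # probability used for case (ii)
penalty_cycles = 2              # penalty in both cases

p_predicted_taken_but_not = hit_rate * mispredict_rate
branch_penalty = (p_predicted_taken_but_not + p_taken_not_in_buffer) * penalty_cycles

print(round(p_predicted_taken_but_not, 3))  # 0.095
print(round(branch_penalty, 2))             # 0.39
```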
Example:
Consider a branch-target buffer implemented for conditional branches only,
in a pipelined processor.
Assuming that:
• Misprediction penalty = 4 cycles
• Buffer miss penalty = 3 cycles
• Hit rate and accuracy = 90% each
• Branch frequency = 15%
Solution:
The speedup with a Branch Target Buffer versus no BTB is expressed as:
Speedup = CPI_noBTB / CPI_BTB
= (CPI_base + Stalls_noBTB) / (CPI_base + Stalls_BTB)
The stalls are determined as:
Stalls = Σ (Frequency x Penalty)
that is, the sum over all stall cases of the product of the frequency of
the case and its stall penalty.
i) Stalls_noBTB = 0.15 x 2 = 0.30
ii) To find Stalls_BTB, we have to consider each possible outcome from the
BTB. There exist three possibilities:
a) Branch misses the BTB:
Frequency = 15% x 0.1 = 1.5% = 0.015
Penalty = 3
Stalls = 0.045
b) Branch hits the BTB and is correctly predicted:
Frequency = 15% x 0.9 (hit) x 0.9 (prediction) = 12.15% = 0.1215
Penalty = 0
Stalls = 0
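
The whole calculation can be sketched in Python using the stated assumptions. Two things in this sketch are not given in the text and are labelled as such: the third case (branch hits the BTB but is mispredicted) is inferred from the stated hit rate, accuracy and misprediction penalty, and `cpi_base = 1.0` is an assumed value, since the base CPI is not specified here.

```python
branch_freq = 0.15
hit_rate = 0.90
accuracy = 0.90
miss_penalty = 3          # cycles, buffer miss
mispredict_penalty = 4    # cycles, hit but wrong prediction
no_btb_penalty = 2        # cycles, from Stalls_noBTB = 0.15 x 2
cpi_base = 1.0            # ASSUMPTION: base CPI is not given in the text

# (label, frequency, penalty) for each BTB outcome
cases = [
    ("miss",         branch_freq * (1 - hit_rate),            miss_penalty),
    ("hit, correct", branch_freq * hit_rate * accuracy,       0),
    # INFERRED case, not worked out in the text above:
    ("hit, wrong",   branch_freq * hit_rate * (1 - accuracy), mispredict_penalty),
]

stalls_btb = sum(freq * penalty for _, freq, penalty in cases)
stalls_no_btb = branch_freq * no_btb_penalty
speedup = (cpi_base + stalls_no_btb) / (cpi_base + stalls_btb)

for label, freq, penalty in cases:
    print(f"{label}: frequency={freq:.4f}, stalls={freq * penalty:.4f}")
print(f"Stalls_BTB={stalls_btb:.3f}, Stalls_noBTB={stalls_no_btb:.2f}, speedup={speedup:.3f}")
```

Under these assumptions the stalls with the BTB come to about 0.099 cycles per instruction and the speedup to roughly 1.18; a different base CPI would change the speedup but not the stall terms.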
