Android-Based Simulator To Support Tomasulo Algorithm Teaching and Learning

This document describes an Android-based simulator that was created to help students learn Tomasulo's algorithm for dynamic instruction scheduling. Tomasulo's algorithm allows out-of-order execution to reduce data hazards. The simulator models the key components of a Tomasulo-based processor and allows step-by-step simulation and animation to demonstrate how dynamic scheduling is achieved through register renaming and tracking instruction dependencies. The goals are to help students better understand hardware scheduling techniques, reservation stations, and how Tomasulo's algorithm eliminates write-after-write and write-after-read hazards.


Article in International Journal of Computer Applications · July 2017
DOI: 10.5120/ijca2017914703



International Journal of Computer Applications (0975 – 8887)
Volume 170 – No.2, July 2017

Android-based Simulator to Support Tomasulo Algorithm Teaching and Learning

Dimitris Kehagias
Department of Informatics, T.E.I. of Athens, Greece

V. Douskas-Bertlviser
Department of Informatics, T.E.I. of Athens, Greece

ABSTRACT
Tomasulo’s algorithm is a dynamic instruction scheduling algorithm that allows out-of-order execution to minimize “Read-After-Write” (RAW) hazards and uses register renaming to reduce “Write-After-Read” (WAR) and “Write-After-Write” (WAW) hazards. This paper describes an Android-based simulator that shows how dynamic scheduling is achieved using Tomasulo's algorithm. The simulator is configurable, and the simulation can be run step by step and with animation, to help students comprehend the concepts of dynamic scheduling anytime, anywhere.

General Terms
Computer Simulation, Algorithms, Hardware, Applied Computing.

Keywords
Tomasulo’s algorithm, Simulator, Computer architecture, Interactive animation.

1. INTRODUCTION
Pipelining is used extensively in modern processors to achieve instruction-level parallelism and improve performance. A conventional pipelined processor has five pipeline stages: Instruction Fetch (IF), Instruction Decode (ID), Execute (EX), Memory (MEM) and Write Back (WB) [9]. In the IF stage, the program counter is used to fetch the instruction from instruction memory and place it in the Instruction Register (IR). In the ID stage, the instruction sent from the IR is decoded. Instructions are executed in the EX stage. Load/store instructions access memory during the MEM stage, and in the last stage (WB) the results coming from data memory or the ALU are written into the register file.

However, it is not always possible to run the pipeline at full capacity because of control, structural or data hazards. Data hazards (RAW, WAR or WAW) exist when reads and writes of data occur in a different order in the pipeline than in the program code.

Rearranging the execution sequence of instructions that belong to the same code segment can reduce data hazards and improve performance. Dynamic scheduling algorithms such as Tomasulo's algorithm and scoreboarding perform this rearrangement in hardware.

Tomasulo’s algorithm was developed by R. Tomasulo at IBM in 1967 [3] and first used in the floating-point unit of the IBM System/360 Model 91. Modern processors use many variations of this algorithm, but they share its key concepts: tracking instruction dependences so that execution can begin as soon as operands are available, and renaming registers to avoid WAR and WAW hazards [1, 11].

1.1 Motivation
For students in an undergraduate advanced computer architecture course, implementing dynamic scheduling using Tomasulo’s algorithm is often confusing, because how the algorithm works is not immediately clear. That is why Tomasulo simulation tools are used to support learning [13, 14].

Our decision to build a Tomasulo simulator was motivated by the fact that many students in the undergraduate advanced computer architecture course offered by the Informatics department of the Technological Educational Institute (T.E.I.) of Athens [12] have difficulty fully understanding what Tomasulo’s algorithm does clock cycle by clock cycle and how it is used to minimize data hazards.

1.2 Objectives
The main objective of the work presented in this paper was to create a suitable tool to support the teaching and learning of “dynamic scheduling”, especially in the “Computer Architecture” and “Advanced Computer Architecture” courses offered by the Informatics department of the Technological Educational Institute (T.E.I.) of Athens. The simulator is intended to clarify some important issues in these courses:

• How dynamic scheduling allows independent instructions behind a stall to proceed
• How an instruction can begin execution as soon as its operands are available
• How dynamic scheduling allows instructions to execute and complete out of order
• Register renaming

It also seeks to help students:

• Better understand hardware scheduling techniques for exploiting instruction-level parallelism
• See how Tomasulo’s algorithm implements dynamic scheduling
• Better understand the concept of a reservation station
• See a visual representation of each stage of the algorithm (issue, execute and write back)
• See how instructions are completed out of order
• See how the algorithm eliminates WAW/WAR hazards
• See how register renaming is provided by reservation stations
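As a concrete illustration of the RAW, WAR and WAW hazards discussed in this section, the following Python sketch classifies the hazards between pairs of instructions by comparing their destination and source registers. The instruction sequence, the register names and the helper names (`Instr`, `hazards`, `all_hazards`) are illustrative assumptions and are not taken from the simulator. The check is deliberately simplified: it compares every earlier/later pair and ignores intervening writes.

```python
from typing import List, NamedTuple, Tuple


class Instr(NamedTuple):
    op: str
    dest: str
    srcs: Tuple[str, ...]


def hazards(earlier: Instr, later: Instr) -> List[str]:
    """Return the data hazards the later instruction has on the earlier one."""
    found = []
    if earlier.dest in later.srcs:   # later reads what earlier writes
        found.append("RAW")
    if later.dest in earlier.srcs:   # later overwrites what earlier reads
        found.append("WAR")
    if later.dest == earlier.dest:   # both write the same register
        found.append("WAW")
    return found


def all_hazards(program: List[Instr]) -> List[Tuple[int, int, str]]:
    """List (earlier index, later index, hazard kind) for every pair."""
    result = []
    for i, earlier in enumerate(program):
        for j in range(i + 1, len(program)):
            for kind in hazards(earlier, program[j]):
                result.append((i, j, kind))
    return result


# Hypothetical code segment using the instruction names from this paper.
PROGRAM = [
    Instr("DIVD", "F0", ("F2", "F4")),
    Instr("ADDD", "F6", ("F0", "F8")),
    Instr("SUBD", "F8", ("F10", "F14")),
    Instr("MULD", "F6", ("F10", "F8")),
]

for i, j, kind in all_hazards(PROGRAM):
    print(f"{kind}: instruction {j} on instruction {i}")
```

In this sequence only the two RAW hazards are true data dependences. The WAR hazard on F8 and the WAW hazard on F6 arise purely from register reuse, which is why register renaming can remove them.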


The rest of the paper is organized as follows. Section 2 provides a brief overview of the most relevant educational simulators. Section 3 explains Tomasulo’s algorithm. Section 4 presents an overview of the simulator implementation, its operation and its features. Section 5 concludes the paper.

2. RELATED WORK
Various standalone tools exist to explain how dynamic scheduling is achieved using Tomasulo's algorithm. The most relevant ones are presented below.

In [5], a HASE simulation model that closely follows the design of the IBM System/360 Model 91 floating-point unit has been built to demonstrate Tomasulo's algorithm dynamically.

The simulator in [6] simulates Tomasulo's algorithm for a floating-point MIPS-like instruction pipeline, demonstrating out-of-order execution.

[7] and [4] present two web-based tools developed to help students understand the concepts of Tomasulo's algorithm for dynamic scheduling.

However, none of them includes all the features that our proposal offers. These features include operation in a step-by-step mode, animation, written explanations in every animation step, a configurable execution core, variable issue rate and variable latency per instruction class. The simulator also allows the user (i) to see memory contents during simulation, (ii) to show or hide animations, (iii) to move to the cycle in which some visible action occurs, and to get help during simulation. In addition, the Android version of our simulator makes it a useful and unique tool, considering how popular Android applications are becoming [8].

3. TOMASULO’S ALGORITHM
Figure 1 [1] shows the basic structure of a Tomasulo-based processor. The major components of the processor are as follows [10]:

Reservation stations: these units receive an instruction from the instruction unit, wait for source operand data to be ready before starting the execution of the instruction, and broadcast the result of the instruction on the Common Data Bus (CDB) when the result is ready.

Functional units: these are the circuits that perform the execution steps for an instruction. Example functional units are FP adders, FP multipliers, integer ALUs, shifters, and so on.

Register file: contains the data produced by the functional units.

The CDB: connects the output of the functional units to all components expecting those results.

Load and store buffers: hold data and addresses for memory access.

Each instruction in Tomasulo’s algorithm goes through three main stages: issue, execute and write back. In the issue stage, the next instruction from the head of the instruction queue is sent to an appropriate free reservation station together with its operand values, if they are available in the register file. If an operand is not available in the register file, the instruction keeps track of the functional unit that is going to produce it. In effect, this stage renames registers. When all operands of an instruction are available, it proceeds to the execute stage; otherwise, it waits for the operands to become available. This means that instructions may execute out of order. Once an instruction has finished executing, it enters the write back stage, where it writes its result to the CDB. Any instructions and registers waiting for this result collect it from the CDB.

Figure 1[1]: The basic structure of a MIPS floating-point unit using Tomasulo’s algorithm.
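The three stages described in this section can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the simulator's Java implementation: it assumes an unlimited number of reservation stations, one issue and one CDB broadcast per cycle (oldest finished instruction first), no loads or stores, and illustrative latencies. All names (`Station`, `simulate`, `LATENCY`, the `RS0`... tags) are hypothetical.

```python
import operator
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

# Illustrative latencies in cycles (assumed values).
LATENCY = {"ADDD": 2, "SUBD": 2, "MULD": 4, "DIVD": 6}
OPS = {"ADDD": operator.add, "SUBD": operator.sub,
       "MULD": operator.mul, "DIVD": operator.truediv}


@dataclass
class Station:
    name: str                        # tag broadcast on the CDB
    op: str
    dest: str
    vj: Optional[float] = None       # operand values, once known
    vk: Optional[float] = None
    qj: Optional[str] = None         # tags of the stations producing operands
    qk: Optional[str] = None
    remaining: Optional[int] = None  # cycles left once execution has started


def simulate(program, regs) -> Tuple[Dict[str, float], List[str]]:
    regs = dict(regs)
    rat: Dict[str, str] = {}         # register -> tag of its pending producer
    stations: List[Station] = []     # in-flight instructions, in issue order
    completed: List[str] = []        # tags in write-back order
    pc = 0
    while pc < len(program) or stations:
        # Write back: one finished instruction broadcasts on the CDB.
        done = next((s for s in stations if s.remaining == 0), None)
        if done is not None:
            result = OPS[done.op](done.vj, done.vk)
            for s in stations:       # waiting stations capture matching tags
                if s.qj == done.name:
                    s.vj, s.qj = result, None
                if s.qk == done.name:
                    s.vk, s.qk = result, None
            if rat.get(done.dest) == done.name:  # still the latest writer?
                regs[done.dest] = result
                del rat[done.dest]
            stations.remove(done)
            completed.append(done.name)
        # Execute: stations with both operands available make progress.
        for s in stations:
            if s.qj is None and s.qk is None:
                if s.remaining is None:
                    s.remaining = LATENCY[s.op]
                if s.remaining > 0:
                    s.remaining -= 1
        # Issue: next instruction in program order; sources are renamed to tags.
        if pc < len(program):
            op, dest, src1, src2 = program[pc]
            s = Station(f"RS{pc}", op, dest)
            if src1 in rat:
                s.qj = rat[src1]
            else:
                s.vj = regs[src1]
            if src2 in rat:
                s.qk = rat[src2]
            else:
                s.vk = regs[src2]
            rat[dest] = s.name       # later readers of dest wait on this tag
            stations.append(s)
            pc += 1
    return regs, completed


PROGRAM = [("DIVD", "F0", "F2", "F4"), ("ADDD", "F6", "F0", "F8"),
           ("SUBD", "F8", "F10", "F14"), ("MULD", "F6", "F10", "F8")]
REGS = {"F0": 0.0, "F2": 8.0, "F4": 2.0, "F6": 0.0,
        "F8": 3.0, "F10": 10.0, "F14": 4.0}

final_regs, order = simulate(PROGRAM, REGS)
print(order, final_regs["F0"], final_regs["F6"], final_regs["F8"])
```

In this run the SUBD (tag RS2) writes back before the older DIVD and ADDD, yet the final register values match sequential execution. The ADDD captured the old value of F8 at issue, so the WAR hazard is harmless, and its late result is not written to F6 because the RAT already points to the MULD, which removes the WAW hazard.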


4. TOMASULO SIMULATOR

4.1 Functional Description
The code segment to be simulated can be changed by adding or deleting instructions. For each instruction, the destination and the source registers must be specified. The supported instructions are ADDD, SUBD, MULD, DIVD, LD and SD. Each type of instruction can have its own latency, ranging from 1 to 50 cycles. The ADDD and SUBD instructions are executed in the adder units, while MULD and DIVD are executed in the multi-cycle multiplier units. There are also load and store buffers to hold data and addresses for memory accesses. The number of units is also configurable and can be set to 1, 2 or 3 units of each class. The simulation can be run all at once or step by step.

The instructions to be processed reside in an instruction queue, in the order entered by the user, waiting to be issued in first-in, first-out order. An instruction goes through four stages to complete its execution: issue, dispatch, execute and broadcast.

Issue: During the issue stage, the next instruction in program order is taken from the instruction queue and placed into a reservation station of the correct kind. Load/store instructions are placed in a load/store queue. No instruction is issued if all the reservation stations or the load/store queues are occupied. An instruction issued to a reservation station is accompanied by its operand values if available, or by tags indicating the reservation stations that will produce the operands. While waiting in its reservation station, an instruction monitors the CDB for the values it is waiting for, by comparing the tags it is waiting on with the tag of the instruction producing each broadcast result.

Dispatch: An instruction can be dispatched to a functional unit to start execution when its source operands are ready and the corresponding functional unit is free. When an instruction is dispatched, its reservation station is freed.

Execute: A dispatched instruction completes execution after a number of cycles determined by the delay of the specific functional unit, which is defined during initialization.

Broadcast: Once a functional unit has finished executing an instruction, it outputs the result with the associated tag to the CDB for broadcasting. When a load instruction returns from memory, the value that has been read is also broadcast on the CDB. During broadcast: (a) the waiting instructions in reservation stations and in the store buffers capture the result only if their operand entries match the tag of the instruction producing it, (b) the appropriate register is updated in the register file, and (c) the Register Alias Table (RAT) entry that matches the broadcast tag is cleared. If more than one functional unit requests the CDB in the same cycle, priority is given to the one that has completed the instruction with the highest execution latency. If the completed instructions have the same execution latency, they are broadcast in arbitrary order.

4.2 Overview of Simulator Implementation
The simulator is implemented in Java using the Eclipse (Kepler) development environment. The Genymotion [2] Android emulator was used for testing during development. As a starting point for our work, we made the following assumptions:

• Each instruction completes execution after passing successively through the stages of issue, dispatch, execute and broadcast. In a single cycle, under normal circumstances, an instruction may pass through the issue and dispatch stages or the dispatch and execute stages, but not through the issue and execute stages.
• In each cycle, only one instruction can pass through the broadcast stage. If multiple instructions are ready for broadcast, priority is given to the one with the highest latency.
• A load instruction must wait before entering the execute stage if an older store instruction with the same data memory address is also ready to enter the execute stage.
• A store instruction must wait before entering the execute stage if an older load or store instruction with the same data memory address is also ready to enter the execute stage.

Classes used in simulator:
Each screen of the application is backed by a class that extends the Android Activity class, which supports the development of interfaces and activities. For each of these classes, the screen layout is defined by a corresponding XML file that includes all the elements needed to describe the appearance of the interface on the user's mobile device. In addition, every activity has several auxiliary classes that support user interaction with the simulation. Figure 2 shows the interconnection of the application's classes.

Figure 2: Interconnection of classes

Verification:
During development we tested the application extensively to verify its correctness, functional behavior and appearance.

Initially, the application was installed on many mobile devices with different screen sizes, in order to adapt the appearance of the various components so that there are no deviations from one device to another. Consistent appearance across devices was achieved by appropriately sharing the space required by each interface component together with the animated parts shown during simulation.

The execution of the application was tested to confirm that it implements the simulation of the algorithm properly, does not crash, handles exceptions that may arise during execution of the code, and is backward compatible as a mobile application.

4.3 The User Interface
As shown in Figure 3, the application includes multiple interconnected screens. To ensure a consistent graphical layout across the application, landscape orientation was chosen. The individual screens are:

(a) Language selection screen: upon starting the simulator, the user can select either Greek or English.
(b) Help screen: displays instructions on how to use the simulator, including a description of the various components of the simulation screen and the implementation assumptions made during the design phase.
(c) Main screen: on this screen the user can choose among several options, including entering code to be processed, starting the simulation, configuring the hardware, going back to the starting screen, and reading the help text.
(d) Code entering screen: this screen enables the user to enter the instructions to be processed and the initial values of registers and memory locations. A drop-down list is provided to select the required instruction. Each instruction is followed by three fields for choosing the registers or memory location relevant to the selected instruction. The initial values given for registers and memory locations are checked for validity.
(e) Hardware configuration screen: on this screen the user defines the simulated execution environment, including the size of the load/store buffers, the number of reservation stations, the number of execution cycles (latencies) taken by the functional units, and the number of functional units.
(f) Memory contents screen: on this screen the user can view the contents of memory locations as they change during the execution of the algorithm.
(g) Simulation screen (Figure 4): this screen is where the simulation takes place. It is described in the next section.

Figure 3: Arrangement of various screens

4.3.1 The simulation screen
The simulation screen (Figure 4) has a rich and user-friendly visual interface. It illustrates the movement of instructions to the reservation stations and the movement of results from the functional units. It consists of the following components:

RAT: The Register Alias Table is a structure for performing register renaming. It maintains the mappings between reservation stations and the destination registers of instructions.

LOAD Q / STORE Q: Load and store buffers for LD and SD instructions. They hold data and addresses for memory access.

INST Q: The “INST Q” component is a queue that contains the instructions in the order entered by the user. The instructions are issued into the reservation stations in first-in, first-out order.

REGS: The “REGS” component implements the floating-point (F) and integer (R) register file. The registers contain values entered by the user during the configuration process, or broadcast as instructions complete their execution. Values that are already present in the registers, and therefore ready for execution, are copied into the reservation stations.

Figure 4: Simulation screen

ADD RS / MUL RS: There are two types of reservation stations, “ADD RS” and “MUL RS”. The first is for ADDD and SUBD instructions, while the second is for MULTD and DIVD instructions. Each reservation station is made up of three fields. The first field in a row holds the opcode of the pending instruction in the form of an arithmetic symbol (+, -, *, / for ADDD, SUBD, MULTD and DIVD instructions respectively), and the other two fields hold either operand values or the names of the reservation stations or load/store buffers that will provide them.

ALU ADD / ALU MUL: Functional Units (FUs) that carry out the execution step of instructions. The “ALU ADD” FUs are floating-point adders, which execute ADDD and SUBD instructions, while the “ALU MUL” FUs are floating-point multipliers, which execute MULTD and DIVD instructions. The FUs receive instruction and operand packets from the RSs and send result packets to the common data bus. The number of clock cycles required to execute an instruction is a parameter read from the hardware configuration activity at the start of a simulation.

All the above-mentioned components are interconnected by a common data bus (CDB), which is used to broadcast results from the adder, the multiplier and the load buffer to the reservation stations, the register file and the store buffers.

The simulation screen provides the user with several choices, including:

ISSUE: During the issue process, the next instruction in program order is taken from the instruction queue and put into a free reservation station of the correct kind (ADD RS or MUL RS).

DISPATCH: The process of sending an instruction for execution from a reservation station to a functional unit (ADD RS to ALU ADD or MUL RS to ALU MUL).

EXECUTE: The phase during which a functional unit (ALU ADD or ALU MUL) operates on the ready operands of an instruction.

BROADCAST: When an instruction finishes execution, it broadcasts its result on the common data bus, and from there the result goes into registers and reservation stations.

NEXT EVENT: Allows the user to move to the cycle in which some visible action occurs.

MEMORY CONTENTS: Memory contents can be seen during simulation.

ANIMS: Shows or hides animations.

5. CONCLUSION
A tool to aid students and teachers in an undergraduate advanced computer architecture course was presented. This tool, an Android-based simulator, shows how dynamic scheduling is achieved using Tomasulo's algorithm. Each stage of the simulation is represented with animation and accompanied by flying information messages in order to give a clear and detailed picture of the whole process. Different configurations of the simulator can be created, each with a different performance/resource ratio. Initial use of the simulator suggests that it is effective for learning: students were helped to better understand the process of register renaming. In the near future, the simulator will be evaluated in the classroom through student surveys.

6. REFERENCES
[1] Hennessy J. L. and Patterson D. A., “Computer Architecture: A Quantitative Approach”. 5th ed., Morgan Kaufmann, 2012.

[2] Genymotion Android Emulator. Available at: https://www.genymotion.com/account/login. Accessed on Oct. 2016.


[3] Tomasulo R. M., “An efficient algorithm for exploiting multiple arithmetic units”. IBM Journal of Research and Development, 11(1):25–33, 1967.

[4] “Tomasulo’s Algorithm for Dynamic Scheduling”. Available at: http://dark.eit.lth.se/darklab/tomasulo/script/tomasulo.htm. Accessed on Feb. 2017.

[5] “Tomasulo’s Algorithm. University of Edinburgh”. Available at: http://www.icsa.inf.ed.ac.uk/research/groups/hase/models/tomasulo/index.html. Accessed on Feb. 2017.

[6] Typanski N., “Tomasulo algorithm simulator (prototype)”. Available at: http://nathantypanski.github.io/tomasulo-simulator/. Accessed on Feb. 2017.

[7] University of Massachusetts at Amherst. “Dynamic Scheduling Using Tomasulo's Algorithm”. Available at: http://www.ecs.umass.edu/ece/koren/architecture/. Accessed on Feb. 2017.

[8] Butler M., “Android: Changing the Mobile Landscape”. IEEE Pervasive Computing, vol. 10, no. 1, pp. 4–7, January-March 2011.

[9] Patterson D. A. and Hennessy J. L., “Computer Organization and Design - The Hardware/Software Interface”. 5th ed., Morgan Kaufmann, 2014.

[10] "CSE P548 - Tomasulo", washington.edu. Washington University. 2006. Accessed on Feb. 2017.

[11] Hwang K. and Jotwani N., “Advanced Computer Architecture-Parallelism, Scalability, Programmability”. 3rd ed., McGraw Hill, 2016.

[12] “Advanced Computer Architecture”. Available at: http://www.cs.teiath.gr/?page_id=6450.

[13] Hatfield B. and Rieker M., “Incorporating simulation and implementation into teaching computer organization and architecture”. 35th ASEE/IEEE Frontiers in Education Conf, Indianapolis, USA, pp: FIG-18, 2005.

[14] Carpinelli J. D., and Jaramillo F., “Simulation tools for digital design and computer organization and architecture”. Paper presented at the 31st ASEE/IEEE Frontiers in Education Conference, Reno, NV, 2001.

IJCATM : www.ijcaonline.org
