A VISUAL SIMULATION FRAMEWORK FOR SIMULTANEOUS MULTITHREADING ARCHITECTURES

Adrian Florea (1), Alexandru Ratiu (1), Arpad Gellert (1) and Lucian N. Vinţan (1,2)
(1) Computer Engineering Department, “Lucian Blaga” University of Sibiu, Emil Cioran Street, No. 4, 550025 Sibiu, Romania
(2) Academy of Technical Sciences from Romania

E-mail:{adrian.florea, arpad.gellert, lucian.vintan}@ulbsibiu.ro, [email protected]

KEYWORDS

Simulation, Education, Computer Architecture, Simultaneous Multithreading, Benchmarking.

ABSTRACT

Computing systems, and microarchitectures in particular, are continuously expanding and are reaching a complexity that the human mind can no longer manage unaided. To understand and control this expansion, researchers need to design and implement larger and more complex system simulators. In the current paradigm simulators play the key role in making further progress, by translating complex processing mechanisms into relevant and easy-to-understand information. This paper aims to give an expressive description of the concepts and principles implemented in a Simultaneous Multithreading (SMT) architecture. We introduce the SMTAHSim framework, an educational tool that interactively simulates the important aspects of this particular microarchitecture. The graphical simulation and the results reporting techniques provide easy-to-understand information that outlines an expressive image of SMT processing mechanisms. The tool facilitates the understanding of theoretical questions, thus allowing students to feel more confident when studying SMT-related issues.

1. INTRODUCTION

The computer science (CS) domain is a very complex one, the result of one of the largest and fastest scientific developments known to mankind. During the last six decades this gradual evolution has engaged hundreds of bright minds from different fields (mathematics, physics, electronics, automation and informatics), giving birth to a new science (CS) which has revolutionized people's everyday lives. However, microprocessors are mainly responsible for the progress of computers. The continuous expansion of microarchitectures has led to a complexity that is hard to control and understand, and that is explored with the help of larger and more sophisticated software simulators.

Also, in today’s world there is an ever-increasing need for intelligent systems, especially in the educational domain. Unless we modernize our teaching tools in computer architecture, based on the latest research achievements and also on commercial developments, we risk losing contact with the development of computer engineering. Therefore, it is a stringent necessity to develop teaching resources (software simulators) for the core fundamental disciplines of computer engineering, such as computer architecture, compilers, operating systems and computer networks. Developing effective learning tools targeting these disciplines is a continuous challenge.

In this paper we try to give a better understanding of SMT microprocessor architectures by developing a visual simulation framework. Given the level of complexity, we make the learning steps easier through expressive simulations which start from the general picture of the system and provide a detailed one (a top-down approach). But why do SMT architectures deserve attention? Current microarchitectures have three major limiters (the so-called “brick wall” concept):
• Memory wall: the increasing gap between the processor clock cycle time and the main memory access time;
• Instruction Level Parallelism (ILP) wall: generated by the present-day impossibility of issuing an ever higher number of instructions in parallel;
• Power wall: aggravated by frequency scaling as the number of transistors on chip increases.

SMT architectures come as a solution to the first two limitations by combining superscalar instruction issue with the multithreading approach. Thus, instructions from multiple threads can be issued simultaneously in a single clock cycle. Latencies that occur in the execution of a single thread are bridged by issuing operations of the remaining threads. A further argument is that, although SMT architectures have been on the market from 2002 (the single-core Intel Pentium 4 Northwood with Hyper-Threading) to the present (in 2010 Intel released the Core™ i3, i5 and i7 with Hyper-Threading technology on each core (Intel 2010)), in the authors’ opinion there are no efficient pedagogical tools dedicated to teaching SMT concepts more easily and intuitively through interactive animation.

Proceedings 25th European Conference on Modelling and Simulation ©ECMS
Tadeusz Burczynski, Joanna Kolodziej, Aleksander Byrski, Marco Carvalho (Editors)
ISBN: 978-0-9564944-2-9 / ISBN: 978-0-9564944-3-6 (CD)
Because of the fast development of computer science, and of computer architecture especially, many software tools that were used not long ago in research have been enhanced with an interactive graphical interface and are now taught in Computer Architecture courses. The lack of simulators dedicated to simultaneous multithreading architectures for didactical purposes, despite their heavy use in research, is the starting point of this paper. To address it, we develop a compact hybrid simulator which integrates microprocessor instruction stream, branch prediction and cache memory simulation. From an educational standpoint, this work proposes a few new ideas:
• Hybrid simulation (trace- and execution-driven) of an SMT architecture using interactive animation.
• Introducing real branch predictors dedicated to each simulated thread (branch prediction was only statistically generated in other similar simulators (Smullen and Taha 2006)). For example, we implemented gshare (a two-level adaptive branch predictor (Yeh and Patt 1992)) and two state-of-the-art dynamic predictors: FPBNP (a fast path-based neural branch predictor (Jiménez 2003)) and OGEHL (Optimized GEometric History Length branch predictor (Seznec 2005)). The last one ranked second at the Championship Branch Prediction competition (CBP 2004) and received the best practice award for “the predictor the closest to a possible hardware implementation”. The branch predictors can also be used as a third-party lesson or application.
• Introducing a parameterized instruction cache shared between threads (both instruction and data caches were only statistically generated in other similar simulators (Smullen and Taha 2006)).

From a didactical point of view, the developed tool (SMTAHSim) benefits the learning process because it helps students observe the influence of each parameter on the simulation model. The SMTAHSim simulator provides a wider variety of configuration options. Thus, students can determine how branch prediction accuracy or resource usage varies with the input parameters (number of entries in the prediction tables, history length, number of bits used to represent the weights, etc.). The execution-driven simulation allows SMTAHSim to give fine-grained results regarding every microarchitectural unit, both during and at the end of the benchmarks’ simulation. All final simulation results are stored in a database and can be used further to generate a wide range of reports on the units’ performance in correlation with almost every parameter. The SMTAHSim simulator provides three of the features specific to almost all high-performance academic simulators: free availability, extensibility and portability. Inheritance and polymorphism are used throughout the simulator’s source code, which makes it easier to extend in the future with new functionality.

We developed the SMTAHSim simulator using the Microsoft .NET Framework 3.5, writing over 7K lines of code. The simulator runs on Windows 2k/XP/Vista/7 and is currently used in undergraduate and graduate courses and laboratories in (Advanced) Computer Architecture at “Lucian Blaga” University of Sibiu. The simulator can be found at http://webspace.ulbsibiu.ro/adrian.florea/html/simulatoare/SMTAHSim.html

The rest of this paper is organized as follows. In section 2 we review related work on software simulators dedicated to microarchitectures. Section 3 describes the theoretical background related to SMT, whereas section 4 presents the benchmarks used and the simulation methodology. Section 5 illustrates the simulator software architecture, the simulator kernel from the hardware viewpoint and the SMTAHSim user interface. Based on a short interactive animated example, we explain the SMT functionality. Finally, section 6 suggests directions for future work and concludes the paper.

2. RELATED WORK

After almost four decades of work on microprocessor design, implementation and exploitation, computer science researchers have concluded that simulators have become an integral part of the computer architecture research and design process (Yi and Lilja 2006) and that simulation technology and methodology represent the crux of computer architecture research and development (De Bosschere et al. 2007).

Besides their proven importance in computer architecture research, simulators have recently been employed extensively as a valuable pedagogical tool, as they enable students to better understand the theoretical concepts and to visualize how microarchitectural components work and interact with each other (Yi and Lilja 2006).

In the domain of microprocessor systems, as microarchitectural complexity increases (moving from instruction-level parallelism to thread-level parallelism and toward multi- and many-core architectures), it becomes more difficult to explain concepts like caches, out-of-order and speculative execution, power consumption, and the interactions among the architecture components without visual aids. Graphical simulations of these architectures allow students to easily grasp the architecture concepts by observing the flow of instructions in time, and by exploring the impact of different processor configurations on performance, dissipated energy and temperature. Static visual office tools (such as graphical charts, diagrams, slides, etc.) are limited in efficiency: they cannot simultaneously exhibit both the structural relationships between microarchitectural components and the temporal dependences between executed instructions that are in flight in the pipeline structures, and they cannot explain the functionality of coherence mechanisms in
multicore architectures, etc. Some of the most widely used present-day didactical simulators are:
• WinDLX was developed for the Windows operating system by Herbert Grünbacher (Grünbacher 1998) and simulates Hennessy and Patterson’s DLX (DeLuXe) architecture (Hennessy and Patterson 2007). The DLX is a didactic microprocessor designed in accordance with the most popular RISC microprocessors (SPARC, MIPS, etc.). The simulation expressively exposes the principle of in-order pipelined execution (execution steps, data hazards, forwarding) and the performance penalty caused by high-latency instructions (delay slots) but, because it is modeled at the architecture level, little information is given about the processor.
• VLIW-DLX extends the WinDLX simulator to a VLIW model, using the same DLX ISA. It is implemented in Java and allows modifications of the architecture, including the ISA (Bečvář and Kahánek 2007).
• PCSpim-Cache is an execution-driven simulator intended for undergraduate courses that teach cache memories within the MIPS architecture. The tool allows the user to run selected code step by step on a proposed cache organization while observing the dynamic changes in its structure (Petit et al. 2006).
• PSATSim is a powerful graphical simulator which helps students better understand the tradeoff between processor performance and power consumption. The simulated microarchitecture is a configurable superscalar architecture with speculative out-of-order execution. The GUI makes it easy to simulate different microarchitectural configurations interactively and provides quick feedback (Smullen and Taha 2006).

However, unlike SMTAHSim, some of the existing simulators (Hostetler and Mirtich 1996; Burger and Austin 1997; Skadron et al. 2003; Sharkey et al. 2005; August et al. 2007) were designed primarily for research, with the emphasis on modeling the effects of architectural mechanisms. Most of these simulators do not try to express visually the behavior of architectural mechanisms and the interaction between them. They are often designed to model a specific architecture and are also too complex to be studied by students who are beginners in concepts such as SMT. On the other hand, most of the didactic simulators used in Computer Architecture simulate only simplistic toy benchmarks. As will be presented further, our simulator can also process complex benchmarks that are intensively used in research activities. The interactivity of the SMTAHSim simulator allows the user both to know, in every machine cycle, the content of the CPU resources (reservation stations, functional units, reorder buffer, rename buffer, pipeline structure) and to experiment with unforeseen circumstances such as forcing a miss in the D-Cache (this cache module is modeled statistically, based on benchmark characteristics).

3. THEORETICAL BACKGROUND

It is well known that superscalar architectures exploit Instruction Level Parallelism (ILP) by fetching and executing more than one independent instruction per cycle. Despite that, the instructions-per-cycle (IPC) rate is limited to relatively low values by many factors (Hennessy and Patterson 2007).

The SMT architecture comes as a solution to this limitation by combining the superscalar mechanism with the multithreading approach, which allows the exploitation of both thread-level parallelism (TLP) and ILP. To achieve this performance, the processor keeps separate context information (program counter, stack pointer, etc.) for each active thread. Latencies which normally occur in single-thread execution are, in this case, (partially) hidden by switching to another thread. This architecture represents the mapping of the explicit and implicit concurrency of high-level languages (threads and/or micro-threads) onto a processor that implements multiple contexts. At the hardware level a thread can be a task or a software thread within a task, but it can also be made of software entities of smaller granularity such as loops, routines or code blocks (micro-threads), which may be executed in parallel (Eggers et al. 1997; Vintan and Florea 2000).

SMT architectures inherit the superscalar processing mechanism and extend it with components specific to multithreading architectures. Mechanisms such as out-of-order speculative execution, register renaming and in-order completion are also found in SMT architectures. To provide separate contexts, some hardware resources are private to each thread (branch predictors, renaming tables, logical register files, ROBs, Load/Store Queues, commit units) and others are shared among threads (fetch unit, decode unit, issue queue, physical register files, execution units and cache memory), using a tag in the instruction encoding to tell the threads apart.

To ensure a high throughput, SMTs need a scheduling policy that arbitrates between threads in order to optimize the utilization of shared resources. The most common scheme is the very simple Round-Robin policy, which switches between threads in a circular way, regardless of their behavior. A better strategy is implemented by the ICOUNT policy, which gives higher priority to the threads with the fewest instructions in the decode, rename and instruction queues. The motivation is to give higher priority to fast-moving threads and, at the same time, to prevent starvation. ICOUNT tries to balance the number of instructions in the pipeline among the various threads so that all threads have an approximately equal number of instructions in the front-end pipeline and instruction queues (Manadhata and Sekar 2003; Eyerman and Eeckhout 2009).

SMTAHSim offers both of the fetch policies mentioned above and lets the user understand, with the help of the simulation monitoring tool, how they influence the IPC rate and other parameters.
4. SIMULATION METHODOLOGY

The SMTAHSim tool is intended to help students learn superscalar and SMT architectures by simulating a wide range of hardware configurations in step-by-step or full-trace simulation mode. To obtain fine-grained results, a hybrid simulation is performed. The results are collected at the end of each processing cycle by the Monitoring Tool and reported according to user preferences (see Figure 1).

SMTAHSim's execution-driven simulation is supported by a GUI which interactively exposes the architectural structure of the SMT and execution-time information. The step-by-step simulation gives a better view of the instruction stream as it passes through the processing architecture and enables the user to visualize how the basic superscalar and SMT mechanisms work.

For result validation, a set of benchmarks is used as simulator input, and the user chooses which file is used as input for each hardware thread. The benchmarks are a selection from the SPEC ’95 (applu, compress, fpppp, ijpeg, perl (SPEC 1995)) and MediaBench 1.0 (epic, mpeg2d, mpeg2e, pegwitd, toast (Lee et al. 1997)) benchmark suites, compiled for the SimpleScalar Portable ISA (PISA). These benchmarks cover many applications, ranging from compression to word processing, from compilers and architectures to games enhanced with artificial intelligence, etc. We chose to use different benchmarks in order to discover how these different test programs influence the processing performance.

5. THE SMTAHSim FRAMEWORK

The simulator must support students both in learning the SMT microarchitecture and in searching for possible changes (architectural or optimization techniques) that improve it. Because every microarchitectural instance is modeled in a highly parameterized way, the performance obtained by simulation acts as a quick feedback mechanism for the proposed changes, thus permitting an efficient design space exploration process. The simulator’s execution consists of the following sequential steps:
1) Initialization phase (configuring the microarchitecture with the input parameters, including the benchmarks)
2) Simulation and monitoring phase
3) Results reporting

For the initialization phase SMTAHSim provides help through a quick and easy-to-use Configuration Manager. This internal tool lets users load preconfigured or saved configurations from the Configuration Repository, or guides them through the configuration process. The last simulated configuration is loaded by default.

Some important architectural modules (suggestively named ISA, Branch Predictor, I-Cache, Fetch Policy) are implemented as interfaces and can be loaded by the Add-Ins Manager as precompiled libraries. The framework is easily extendable with independent modules that implement the provided interface. The Add-Ins can also come with their own configuration and simulation GUIs.

Figure 1: SMTAHSim Architecture

The SMTAHSim framework provides two simulation modes: a step-by-step simulation and a full non-animated simulation. The user can easily switch between these two modes by interacting with the Simulation Control module. Depending on the running simulation mode, the Monitoring Tool filters the results stored in the Simulator Kernel’s Results Buffer. The simulation process is carried out by the Simulation Machine, which runs independently of the user interfacing tools. At the end of every processing cycle the Results Buffer is updated with relevant performance information and with a copy of the current context, which are later processed by the Monitoring Tool. This mechanism speeds up the simulation because the Simulation Machine is not interrupted by the operations of the graphical tools, only by a buffer overflow. The producer-consumer design pattern is implemented: as the Simulation Machine produces data, the Monitoring Tool uses it to update the Presentation layer (GUI). When the buffer is full, the simulation is suspended until the data are consumed. All final results are stored in the Results Repository and can be used to generate fine-grained reports with the Results Reporting tool. The user is able to obtain relevant graphs of the SMT’s performance indices in correlation with almost every architectural parameter.

5.1. The SMTAHSim Software Architecture

As Figure 1 shows, the framework is structured in four main software packages:
• GUI (Graphical User Interface) plays an important role as the highest level (Presentation Layer) of the framework, which manages all of the user’s interactions. This package is developed around two basic principles: ACTION and REACTION. Every user action receives quick feedback from the system, and all these reactions are managed carefully by the GUI, which presents the results in an interactive and easily understandable manner. Overall, this package makes the framework a friendly and easy-to-use application.
• Input/Output package is the low-level management of all the simulation inputs and outputs, and gives the framework its extensibility and accessibility dimensions. The aim of this approach is to let the user easily access the final results and architecture configurations and, eventually, develop his/her own configurations and extensions to the basic architecture. The framework comes with some basic configurations which allow a proper evaluation of the SMT architecture’s performance. For other configurations, a wizard guides the user step by step through the process of defining the new configuration. New simulated configurations are stored in the Configuration Repository if the user so decides. The simulation results of these configurations are also stored, at the user’s decision, in the Results Repository, and are linked to the simulated configuration. Thanks to this software architecture, the results can be used to generate fine-grained reports on performance indices in correlation with almost every parameter, directly from the Results Repository. The Results Reporting tool supports users through this process and allows a large diversity of figures to be generated. The Add-Ins Repository plays a very important role because it stores all third-party modules added by developers. This collection is managed by the Add-Ins Manager.
• Application Kernel is the middle level (middleware), which manages all user communication with the application. A GUI is provided for each middle-level manager module in order to give the user access to the low-level packages. The simulation is initialized via the Configuration Manager and is run via the Simulation Control module (step by step or full-trace simulation). The Monitoring Tool manages the feedback information and supplies the user with interactive animation by updating the GUI. Another important tool is the Add-Ins Manager, which is responsible for managing all third-party components added by developers. This module gives SMTAHSim its “framework” dimension by allowing developers to extend the basic SMT architecture with other modules (ISA, branch predictor, data cache, etc.). The Add-Ins can provide their own configuration panel, which is loaded by the Configuration Manager in the configuration phase, and their parameter set is then stored in the Configuration Repository together with the basic one. The developer only has to implement the interfaces provided by the Add-Ins Manager, compile the implementation into a library and then load it into the SMTAHSim Add-Ins Repository.
• Simulation Machine is the most important package, situated at the low application level, which performs the actual simulation.

5.2. SMTAHSim Framework: Simulation Machine

SMTAHSim models a configurable SMT architecture (Figure 2) designed in accordance with the M-SIM architecture (Sharkey et al. 2005), which is based on a superscalar architecture with speculative and out-of-order execution. The pipeline structure of SMTAHSim is based on that of the PowerPC 5+ commercial processor (Sinharoy et al. 2005). M-SIM itself extends the SimpleScalar toolset (Burger and Austin 1997) with accurate models of the pipeline structures, including explicit register renaming, and with support for the concurrent execution of multiple threads. The basic superscalar units are shared among micro-threads (Cache, Fetch Unit, Decode Unit, Dispatch Queue, Execution Units, Physical Registers), but in order to provide separate contexts some resources are private to each micro-thread (Branch Predictors, Rename Tables, Reorder Buffers, Commit Units, Logic Registers).

Figure 2: Simulated architecture

Simulation involves getting instructions from the benchmarks and passing them step by step through the pipeline stages (Figure 3). There are three sections in the pipeline: the in-order frontend (fetching the instructions from memory, making the branch prediction, decoding, renaming registers and dispatching), the out-of-order execution (the number of execution cycles is distinct for each instruction type) and the in-order backend (which retrieves finished instructions and updates the branch predictor). All essential architectural parameters (superscalar factor, number of micro-threads, number of execution units and their execution cycles, etc.) are configurable through the Configuration Manager.

Figure 3: Simulated pipeline
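
The three pipeline sections can be illustrated with a deliberately simplified, single-threaded Python sketch. It is not the SMTAHSim kernel: data dependences, branches and resource limits are ignored, and the latency table, the `simulate` function and the rule that an instruction commits no earlier than the cycle after it finishes are assumptions chosen only to show in-order dispatch, out-of-order completion and in-order commit.

```python
from collections import deque

# Hypothetical execution latencies in cycles per instruction type.
LATENCY = {"alu": 1, "mul": 3, "load": 2}


def simulate(program, width=2):
    """Return (finish, commit) cycle lists, one entry per instruction.

    program: instruction types in program order (no dependences modeled).
    width:   instructions dispatched and committed per cycle at most
             (the superscalar factor).
    """
    n = len(program)
    finish = [None] * n
    commit = [None] * n
    rob = deque()  # reorder buffer: instruction indices in program order
    next_instr = 0
    cycle = 0
    while next_instr < n or rob:
        cycle += 1
        # In-order backend: only the oldest instructions may commit,
        # and only once their execution finished in an earlier cycle.
        committed = 0
        while rob and committed < width and finish[rob[0]] < cycle:
            commit[rob.popleft()] = cycle
            committed += 1
        # In-order frontend: dispatch up to `width` instructions.
        for _ in range(width):
            if next_instr == n:
                break
            # Out-of-order execution: the finish cycle depends on the type.
            finish[next_instr] = cycle + LATENCY[program[next_instr]]
            rob.append(next_instr)
            next_instr += 1
    return finish, commit


if __name__ == "__main__":
    finish, commit = simulate(["mul", "alu", "alu", "load"])
    for i, (f, c) in enumerate(zip(finish, commit)):
        print(f"instr {i}: finishes in cycle {f}, commits in cycle {c}")
```

In this toy run the short `alu` instructions finish before the older `mul`, yet they cannot commit until the `mul` at the head of the reorder buffer has committed, which is the behavior the in-order backend enforces.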
Due to the benchmarks’ characteristics, the actual execution cannot be simulated accurately, because the register values are not known at all times. As a result of this limitation, the only feasible D-Cache implementation is one based on an analytical model. Another abstraction is that branch prediction is made in a single pipeline stage (Instruction Fetch), even if in reality it could take more cycles.

5.3. SMTAHSim Framework: GUI

Projects supported by the SMTAHSim simulator are dedicated to teaching students concepts related to superscalar and SMT architectures (processing mechanisms, constraints, limitation of the ILP rate, etc.), and are well supported by the GUI. Being the closest to the user, this level of the application has received most of our attention, in order to give easy and interactive access to all its features. Therefore, the user can easily configure, simulate and track the step-by-step results. To give the big picture of the SMT architecture’s performance, the GUI also provides the user with a reporting tool.

Figure 4: Configuration Manager Interface

The Configuration Manager Interface (Figure 4) makes it possible to configure the simulated architecture anywhere from a classic superscalar one to a 4-threaded SMT one. Each micro-thread input can be set independently. After configuring the architecture, the user can control the simulation through the Simulation Control Interface and either run a step-by-step simulation, with one simulated CPU cycle per step (“Next” button), or simulate the input traces entirely (“Go To End” button). In both cases the IPC rate is updated in every CPU cycle (Figure 5).

Figure 5: Part of Simulation Control Interface

When fine-step simulation is chosen, the Monitoring Tool helps the user track the instruction flow from fetch to commit through animated visualization of each architectural unit. Each instruction has a thread identification number and a unique per-thread identifier, both distinctively colored, which makes it easy to follow the pipelined execution process (Figure 6). After the prediction of each branch instruction, the subsequent instructions are marked as speculative and struck through until the branch execution ends and the prediction turns out to be correct. In the case of a mispredicted branch, after its execution all speculative instructions from the corresponding thread are squashed and the correct fetch path is taken.

Figure 6: Monitoring Simulation

After each full-trace simulation a summary of the simulation results is shown.

Figure 7: Results

[Bar chart: Prediction Accuracy [%] (axis from 91.0 to 95.5) for the Gshare, FPBNP and OGEHL predictors, with two series: HardwareBudget = 8KB and HardwareBudget = 16KB]

Figure 8: Average branch prediction accuracies

As a concrete example, Figure 8 compares the simulation results obtained with SMTAHSim using three prediction structures: the gshare (Yeh and Patt 1992), FPBNP (Jiménez 2003) and OGEHL (Seznec 2005) predictors. The statistics are collected after running the benchmarks described in section 4 on two configurations (one of them imposed by the hardware constraints of the Championship Branch Prediction (CBP 2004)) and represent the average branch prediction accuracies.
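
To make the notion of prediction accuracy concrete, the following Python sketch shows gshare as it is commonly described: a table of 2-bit saturating counters indexed by the branch address XOR-ed with a global history register. It is only an illustration and not SMTAHSim's implementation; the 10-bit index, the dropped low address bits, the initial counter value and the helper `accuracy` are assumptions of this sketch.

```python
class Gshare:
    """Minimal gshare sketch: 2-bit counters indexed by PC XOR global history."""

    def __init__(self, index_bits=10):
        self.mask = (1 << index_bits) - 1
        # Counters range over 0..3; values 2 and 3 predict "taken".
        # Starting at 1 (weakly not-taken) is an arbitrary choice.
        self.table = [1] * (1 << index_bits)
        self.history = 0  # global history of recent branch outcomes

    def _index(self, pc):
        # Drop the two low address bits (word-aligned instructions assumed).
        return ((pc >> 2) ^ self.history) & self.mask

    def predict(self, pc):
        return self.table[self._index(pc)] >= 2

    def update(self, pc, taken):
        i = self._index(pc)
        if taken:
            self.table[i] = min(3, self.table[i] + 1)
        else:
            self.table[i] = max(0, self.table[i] - 1)
        # Shift the real outcome into the global history register.
        self.history = ((self.history << 1) | int(taken)) & self.mask


def accuracy(predictor, trace):
    """Fraction of correctly predicted branches in a list of (pc, taken) pairs."""
    correct = 0
    for pc, taken in trace:
        correct += predictor.predict(pc) == taken
        predictor.update(pc, taken)
    return correct / len(trace)


if __name__ == "__main__":
    # Hypothetical trace: one loop branch that is always taken.
    trace = [(0x400, True)] * 100
    print(f"accuracy: {accuracy(Gshare(), trace):.2%}")
```

Changing `index_bits` changes the table size, which is the kind of hardware-budget parameter whose effect on accuracy Figure 8 reports for the real predictors.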
6. CONCLUSIONS AND FURTHER WORK

The classical approach to teaching SMT concepts is based largely on oral communication by professors. They spend a lot of time on computer architecture research, or use paper and pencil to follow the execution of the instruction flow. Although they make efforts to emphasize the activities of the processor kernel, they often ignore branch prediction and cache memory simulation. Our approach answers a formative necessity, since computer architectures are mainly approached in a descriptive manner. Through our approach, students have the opportunity to be creative and innovative in computer architecture or in other research and didactical domains of computer science, even in countries that are not highly developed from the economic and technological points of view. With highly parameterized simulation tools, students can understand in more depth, and in an integrated way, the theoretical concepts related to SMT, branch prediction constraints, limits of instruction level parallelism, TLP benefits, cache memories, etc.

Although SMT architectures outperform their predecessors, the evolution trend remains vertical, through growing technological complexity. Therefore a more aggressive approach (many micro-threads) is heavily limited by the growth in complexity of the management logic. It is clear that a new, horizontal evolution trend is needed, through the decentralization of processing power (multi-core). For further work we are mainly concerned with the following issues:
• Simulating on benchmark sets which allow a real implementation of the data cache.
• Implementing a module for power consumption calculation; this can help to evaluate SMT architectures with respect to this objective, too. It is well known that SMTs are energy-intensive due to their complex and concentrated control logic. This module is also necessary for evaluating a hardware branch predictor within a given chip area budget, from both the power consumption and the performance points of view.
• Adding modules to improve the processing rate, such as value prediction, dynamic instruction reuse and an execution trace cache.

REFERENCES

August D., Chang J., Girbal S., Gracia Perez D., Mouchard G., Penry D., Temam O., Vachharajani N., 2007, “UNISIM: An Open Simulation Environment and Library for Complex Architecture Design and Collaborative Development”, IEEE Computer Architecture Letters, 20.
Bečvář M., Kahánek S., 2007, “VLIW-DLX Simulator for Educational Purposes”, Proceedings of the 2007 Workshop on Computer Architecture Education.
Burger D., Austin T., June 1997, “The SimpleScalar Tool Set, Version 2.0”, University of Wisconsin Madison, USA, CSD TR #1342.
CBP: The 1st Journal of Instruction Level Parallelism Championship Branch Prediction Competition (CBP-1), Oregon, USA, 2004.
De Bosschere K. et al., 2007, “High-Performance Embedded Architecture and Compilation Roadmap”, Transactions on HiPEAC I, Lecture Notes in Computer Science 4050, Springer-Verlag, pp. 5-29.
Eggers S., Emer J., Levy H., Lo J., Stamm R., Tullsen D., 1997, “Simultaneous Multithreading: A Platform for Next-Generation Processors”, IEEE Micro, Vol. 17, Issue 5, pp. 12-19.
Eyerman S., Eeckhout L., March 2009, “Memory-Level Parallelism Aware Fetch Policies for Simultaneous Multithreading Processors”, ACM Transactions on Architecture and Code Optimization, Vol. 6, No. 1.
Grünbacher H., 1998, “Teaching Computer Architecture / Organisation using simulators”, Proceedings of the 28th Frontiers in Education, IEEE Computer Society, Vol. 03.
Hennessy J., Patterson D., 2007, “Computer Architecture: A Quantitative Approach”, Morgan Kaufmann, 4th Edition.
Hostetler L.B., Mirtich B., 1996, “DLXsim - A Simulator for DLX”.
©Intel Corporation, 2010, http://www.intel.com/
Jiménez D., 2003, “Fast Path-Based Neural Branch Prediction”, Proceedings of the 36th International Symposium on Microarchitecture.
Lee C., Potkonjak M., Mangione-Smith W., 1997, “MediaBench: A Tool for Evaluating and Synthesizing Multimedia and Communications Systems”.
Manadhata P., Sekar V., 2003, “Evaluating Throughput and Fairness of Thread Fetch Policies for SMT Processors”, www.cs.cmu.edu/~vyass/Fall03/15740/class-project/project_report.pdf
Petit S., Tomás N., Sahuquillo J., Pont A., 2006, “An Execution-Driven Simulation Tool for Teaching Cache Memories in Introductory Computer Organization Courses”, Proceedings of the 2006 Workshop on Computer Architecture Education.
Seznec A., 2005, “Analysis of the OGEHL predictor”, Proceedings of the 32nd International Symposium on Computer Architecture (IEEE-ACM), Madison.
Sharkey J., Ponomarev D., Ghose K., 2005, “M-SIM: A Flexible, Multithreaded Architectural Simulation Environment”, Technical Report CSTR-05-DP01, Department of Computer Science, State University of New York at Binghamton.
Sinharoy B., Kalla R.N., Tendler J.M., Eickemeyer R.J., Joyner J.B., 2005, “POWER5 System Microarchitecture”, IBM Journal of Research and Development, Vol. 49, Num. 4/5, pp. 505-521.
Skadron K., Stan M.R., Huang W., Velusamy S., Sankaranarayanan K., Tarjan D., 2003, “Temperature-Aware Microarchitecture”, Proceedings of the 30th International Symposium on Computer Architecture.
Smullen W., Taha T., 2006, “PSATSim: An Interactive Graphical Superscalar Architecture Simulator for Power and Performance Analysis”, Proceedings of the 2006 Workshop on Computer Architecture Education.
SPEC 1995, The SPEC benchmark programs, http://www.spec.org/cpu95/
Vintan L., Florea A., 2000, “Microarhitecturi de procesare a informaţiei” (in Romanian), Editura Tehnică, Bucureşti.
Yeh T., Patt Y., 1992, “Alternative Implementations of Two-Level Adaptive Branch Prediction”, Proceedings of the 19th International Symposium on Computer Architecture.
Yi J.J., Lilja D.J., 2006, “Simulation of Computer Architectures: Simulators, Benchmarks, Methodologies, and Recommendations”, IEEE Transactions on Computers, Vol. 55, No. 3.
