prj1 Specs2010
Submission Deadlines
Part A: Monday, Oct. 5, 11:59 PM
Part B: Monday, Oct. 5, 11:59 PM
Ground rules
This is the first project for the course, so let me begin by discussing some ground rules:
1. All students must work alone. The project scope is reduced (but still substantial) for ECE 463 students, as
detailed in this specification.
2. Sharing of code between students is considered cheating and will receive appropriate action in accordance
with University policy. The TAs will scan source code (from current and past semesters) through various tools
available to us for detecting cheating. Source code that is flagged by these tools will be dealt with severely.
3. A Wolfware message board is provided for posting questions, discussing and debating issues, and making
clarifications. It is an essential aspect of the project and communal learning. Students must not abuse the
message board, whether inadvertently or intentionally. Above all, never post actual code on the message board
(unless permitted by the TAs/instructor). When in doubt about posting something of a potentially sensitive
nature, email the TAs and instructor to clear the doubt.
4. You must do all your work in the C/C++ or Java languages. Exceptions must be approved. The C language is fine
to use, as that is what many students are trained in. Basic C++ extensions to the C language (e.g., classes
instead of structs) are encouraged (but by no means required) because they enable more straightforward
code reuse.
5. Use of the EOS Linux environment is required. This is the platform where the TA will compile and test your
simulator.
CAUTION: If you develop your simulator on another platform, get it working on that platform, and then try to
port it over to EOS Linux at the last minute, you may encounter major problems. Porting is not as quick and
easy as you think unless you are an excellent programmer. Worse, malicious bugs can be hidden until you port
the code to a different platform, which is an unpleasant surprise close to the deadline. So, keep this in mind.
1. Project Overview
In this project, you will implement a flexible cache and memory hierarchy simulator and use it
to study the performance of memory hierarchies using the SPEC benchmarks.
This project is divided into two parts. Part A and Part B are to be submitted separately.
In Part A, you will design a generic cache simulator module with some configurable parameters.
This cache module can be instantiated (used) as an L1 cache, an L2 cache, an L3 cache, and
so on. Since it can be used at any level of the memory hierarchy, it will be referred to
generically as CACHE throughout this specification. In Part B, you will design a flexible two-level
memory hierarchy simulator using the CACHE module designed in Part A.
Both simulators will take an input in a standard format which describes the read/write requests
from the processor. Simulator output is also expected to be in a standard format as explained in
further sections.
2. Part A: The generic CACHE module
Design a generic CACHE module that can be used at any level in a memory hierarchy.
This generic CACHE can be configured using different design parameters. It takes read/write
requests as input and, when needed, generates the appropriate read/write requests for the next level of
the memory hierarchy. In Part A, you will design a one-level memory hierarchy. Hence, all the
read/write requests come from the CPU and the next level of the memory hierarchy is always
main memory.
[Figure: the CACHE module issues read/write requests to the next-level CACHE or to main memory.]
There are a few assumptions for the above parameters. BLOCKSIZE is a power of two. The
number of sets in a cache is also a power of two. ASSOC and SIZE need not be powers of two.
As you know, the number of sets is determined by the following equation:
number of sets = SIZE / (BLOCKSIZE × ASSOC)
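As a rough illustration of how these parameters fit together, the sketch below derives the number of sets and splits an address into a tag and a set index. It assumes 32-bit addresses (the trace addresses are 8 hex digits) and the usual relation sets = SIZE / (BLOCKSIZE × ASSOC). The names CacheGeometry, makeGeometry, setIndex and tagOf are hypothetical and are not part of this specification.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper type; the field names are illustrative only.
struct CacheGeometry {
    uint32_t blockSize;   // BLOCKSIZE in bytes (power of two)
    uint32_t size;        // SIZE in bytes
    uint32_t assoc;       // ASSOC (need not be a power of two)
    uint32_t numSets;     // SIZE / (BLOCKSIZE * ASSOC), a power of two
    uint32_t offsetBits;  // log2(BLOCKSIZE)
    uint32_t indexBits;   // log2(numSets)
};

static bool isPowerOfTwo(uint32_t x) { return x != 0 && (x & (x - 1)) == 0; }

// log2 of a value that is already known to be a power of two.
static uint32_t log2Exact(uint32_t x) {
    uint32_t bits = 0;
    while (x > 1) { x >>= 1; ++bits; }
    return bits;
}

CacheGeometry makeGeometry(uint32_t blockSize, uint32_t size, uint32_t assoc) {
    assert(isPowerOfTwo(blockSize));
    assert(assoc > 0 && size % (blockSize * assoc) == 0);
    uint32_t numSets = size / (blockSize * assoc);
    assert(isPowerOfTwo(numSets));
    return CacheGeometry{blockSize, size, assoc, numSets,
                         log2Exact(blockSize), log2Exact(numSets)};
}

// Set index: the address bits just above the block offset.
uint32_t setIndex(const CacheGeometry& g, uint32_t addr) {
    return (addr >> g.offsetBits) & (g.numSets - 1);
}

// Tag: the remaining high-order address bits.
uint32_t tagOf(const CacheGeometry& g, uint32_t addr) {
    return addr >> (g.offsetBits + g.indexBits);
}
```

For example, makeGeometry(32, 8192, 4) describes an 8 KB, 4-way cache with 32B blocks, which has 64 sets, 5 offset bits and 6 index bits.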
2.2. Configurable Policies
Apart from the configurable parameters, the CACHE can be configured in terms of policies.
Corresponding policies will be specified at the beginning of the simulation.
Replacement Policy
All students (ECE 463 and ECE 521) need to implement the LRU (Least Recently Used) replacement
policy. ECE 521 students will implement one additional policy, the LFU (Least Frequently
Used) replacement policy. Replacement policy is a configurable parameter for the CACHE
simulator.
Write Policy
All students (ECE 463 and ECE 521) need to implement two write policies for the CACHE. CACHE
should support the WBWA (write-back + write-allocate) and WTNA (write-through +
write-not-allocate) write policies.
CACHE receives a read or write request from the higher level (CPU). The only situation in which
CACHE must interact with the next level below it (main memory) is when the read or write
request misses in the CACHE. CACHE always allocates a new block of data when a read request
misses. A write miss, however, may or may not cause a new block to be allocated in the CACHE,
depending on the write policy.
Allocation of a new block
Think of one of the above scenarios in which CACHE needs to allocate a new block X. The
allocation of requested block X is actually a two-step process. The two steps must be performed
in the following order.
1. Make space for the requested block X. If there is at least one invalid (free) block in the set,
then there is already space for the requested block X and no further action is required (go
to step 2). To be consistent with the TA’s simulation, place the requested block X in place of
the first invalid block if there is more than one invalid block. On the other hand, if all
blocks in the set are valid, then a victim block V must be singled out for eviction, according
to the replacement policy (Section 2.2). For the WBWA policy, if this victim block V is dirty, then
a write-back of the victim block V must be issued to the next level of the memory hierarchy.
2. Bring in the requested block X. Issue a read of the requested block X to the next level of the
memory hierarchy and put the requested block X in the appropriate place in the set
(determined in step 1).
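The two steps above can be sketched for a single set as follows. This is only one possible arrangement, assuming the WBWA policy and LRU replacement tracked with a per-block "last used" counter. The names Block, AllocResult and allocateBlock are hypothetical, and the requests to the next level are represented by a returned flag and a comment rather than real calls.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical per-block bookkeeping (tags only, no data).
struct Block {
    bool valid = false;
    bool dirty = false;
    uint32_t tag = 0;
    uint64_t lastUsed = 0;  // larger value = more recently used
};

struct AllocResult {
    std::size_t way;   // way in which block X was placed
    bool wroteBack;    // true if a dirty victim had to be written back
};

// Allocates block X (identified by its tag) in one set.
AllocResult allocateBlock(std::vector<Block>& set, uint32_t tag, uint64_t now) {
    assert(!set.empty());

    // Step 1: make space. Prefer the first invalid block, if any.
    std::size_t way = set.size();
    for (std::size_t i = 0; i < set.size(); ++i) {
        if (!set[i].valid) { way = i; break; }
    }

    bool wroteBack = false;
    if (way == set.size()) {
        // All blocks are valid: pick the least recently used block as victim V.
        way = 0;
        for (std::size_t i = 1; i < set.size(); ++i) {
            if (set[i].lastUsed < set[way].lastUsed) way = i;
        }
        // WBWA: a dirty victim requires a write-back to the next level.
        wroteBack = set[way].dirty;
    }

    // Step 2: bring in X. A read of X would be issued to the next level here.
    set[way].valid = true;
    set[way].dirty = false;
    set[way].tag = tag;
    set[way].lastUsed = now;
    return AllocResult{way, wroteBack};
}
```

The caller would set the dirty bit afterwards if the allocation was triggered by a write under WBWA.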
Here r indicates a “load” and w indicates a “store” from the processor. The simulator must parse
the trace file and issue the corresponding read/write request to the highest level in the
memory hierarchy.
r ffe04540
r ffe04544
w 0eff2340
r ffe04548
r ffe04544
w 0eff2340
r ffe04548
...
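A trace in this format could be parsed along the following lines. This is a minimal sketch, assuming one request per line and hexadecimal addresses without a 0x prefix, as in the sample above. TraceRecord and parseTrace are hypothetical names.

```cpp
#include <cstdint>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// One request from the trace: 'r' (load) or 'w' (store) plus an address.
struct TraceRecord {
    char op;
    uint32_t addr;
};

// Reads "<r|w> <hex address>" lines; blank or malformed lines are skipped.
std::vector<TraceRecord> parseTrace(std::istream& in) {
    std::vector<TraceRecord> records;
    std::string line;
    while (std::getline(in, line)) {
        std::istringstream fields(line);
        char op = 0;
        uint32_t addr = 0;
        if (!(fields >> op >> std::hex >> addr)) continue;
        if (op != 'r' && op != 'w') continue;
        records.push_back(TraceRecord{op, addr});
    }
    return records;
}
```

In a simulator, the stream would typically be a std::ifstream opened on the trace file, and each record would be turned into a read or write request to the first-level CACHE.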
2.5. Raw Measurements
This simulator aims at collecting the data needed to calculate certain statistics for the CACHE. The
simulator should be able to compute the following raw statistics at the end of simulation for a given
configuration of CACHE. In Part A, L1 is the only CACHE in the single-level memory hierarchy.
a. number of L1 reads
b. number of L1 read misses
c. number of L1 writes
d. number of L1 write misses
e. L1 miss rate = MRL1 = (b + d)/(a + c)
f. number of write-backs from L1 to memory
g. total memory traffic = number of blocks transferred to/from memory
Note: g should match (b + d + f) if L1 uses WBWA policy.
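The derived statistics e and g can be computed from the raw counters as sketched below. The struct and function names are hypothetical. The traffic function covers only the WBWA case stated in the note above, since the WTNA formula is not given here.

```cpp
#include <cstdint>

// Hypothetical counter set; the letters refer to the list above.
struct L1Stats {
    uint64_t reads = 0;        // a
    uint64_t readMisses = 0;   // b
    uint64_t writes = 0;       // c
    uint64_t writeMisses = 0;  // d
    uint64_t writeBacks = 0;   // f
};

// e. L1 miss rate = (b + d) / (a + c); defined as 0 when there were no accesses.
double missRate(const L1Stats& s) {
    uint64_t accesses = s.reads + s.writes;
    if (accesses == 0) return 0.0;
    return static_cast<double>(s.readMisses + s.writeMisses) /
           static_cast<double>(accesses);
}

// g. Total memory traffic for the WBWA policy only: b + d + f.
uint64_t memoryTrafficWBWA(const L1Stats& s) {
    return s.readMisses + s.writeMisses + s.writeBacks;
}
```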
The simulator will print the CACHE configuration, the statistics, and the final state of the CACHE
to the console in a specified format at the end of the simulation.
Average access time (AAT) = (L1 hit-time) + MRL1 × (L1 Miss Penalty)
Note: (L1 hit-time) and the L1 Miss Penalty can be obtained from the course website for this
project.
2.7. Validation Requirements
Your simulator code will be tested electronically. Two trace files will be used as simulator inputs
named gcc_trace and perl_trace. Various configurations of CACHE consisting of cache size,
associativity, block size, replacement policy and write policy will be tested on the trace files.
Sample outputs from the simulator called “Validation Runs” will be posted on the course
website.
1. You must be able to compile and run your simulator on EOS Linux machines. This is
required so that the TA can compile and run your simulator. If you are logging into an
EOS machine remotely and do not know whether or not it is Linux (as opposed to
SunOS), use the uname command to determine the operating system.
2. Along with your source code, you must provide a Makefile that automatically compiles
the simulator. This Makefile must create a simulator named sim_cache. The TA should
be able to build the simulator using a single make command. The TA should be able to
remove object and executable files using the make clean command. To make your life easy,
an example Makefile will be posted on the course website, which you can copy and
modify according to your needs.
3. Your simulator must accept exactly 6 command-line arguments in the following order:
sim_cache <L1_BLOCKSIZE> <L1_SIZE> <L1_ASSOC>
<L1_REPLACEMENT_POLICY> <L1_WRITE_POLICY>
<trace_file>
<L1_WRITE_POLICY>    0 for WBWA, 1 for WTNA
<trace_file>         Character string. Full name of the trace file, including any extensions.
Example: An 8 KB 4-way set-associative L1 cache with 32B block size, LRU replacement
policy and WTNA write policy will be simulated for “gcc_trace” with the following
command.
$ ./sim_cache 32 8192 4 0 1 gcc_trace
4. Your simulator must print outputs to the console (i.e., to the screen). This way, when
a TA runs your simulator, the TA can simply redirect the output of your simulator to a file
for validating the results.
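One way to read the six arguments is sketched below. The argument order (block size, size, associativity, replacement policy, write policy, trace file) is an assumption inferred from the example command "./sim_cache 32 8192 4 0 1 gcc_trace" and its description. SimConfig and parseArgs are hypothetical names.

```cpp
#include <exception>
#include <string>
#include <vector>

// Hypothetical container for the Part A configuration.
struct SimConfig {
    unsigned long blockSize = 0;
    unsigned long size = 0;
    unsigned long assoc = 0;
    unsigned long replacementPolicy = 0;  // 0 = LRU in the example command
    unsigned long writePolicy = 0;        // 0 = WBWA, 1 = WTNA
    std::string traceFile;
};

// args holds the command-line arguments without the program name.
// Returns false if the count is wrong or a numeric field does not parse.
bool parseArgs(const std::vector<std::string>& args, SimConfig& cfg) {
    if (args.size() != 6) return false;
    try {
        cfg.blockSize = std::stoul(args[0]);
        cfg.size = std::stoul(args[1]);
        cfg.assoc = std::stoul(args[2]);
        cfg.replacementPolicy = std::stoul(args[3]);
        cfg.writePolicy = std::stoul(args[4]);
    } catch (const std::exception&) {
        return false;  // std::stoul throws on non-numeric or out-of-range input
    }
    cfg.traceFile = args[5];
    return cfg.writePolicy <= 1;
}
```

From main, this could be called as parseArgs(std::vector<std::string>(argv + 1, argv + argc), cfg).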
Validation
Your output must match both numerically and in terms of formatting when compared to
the validation runs. The TA will literally diff your output with the correct output. You must
confirm correctness of your simulator by following these two steps for each validation run.
1. Redirect the console output of your simulator to a temporary file. This
can be achieved using the “>” operator. For example,
$ ./sim_cache 32 8192 4 0 1 gcc_trace > my_output
2. Test whether or not your output matches, by running this Linux
command. This command must print nothing, which indicates a correct
match.
$ diff -iw my_output validation_run
The -iw flags tell diff to treat upper-case and lower-case as equivalent and to ignore the
amount of whitespace between words. Therefore, you do not need to worry about the
exact number of spaces or tabs as long as there is some whitespace where the validation
runs have whitespace.
For guidelines on how to get started with designing your simulator, see the Appendix.
First, the TA needs to test every student’s simulator. Therefore, we are placing the constraint
that your simulator must finish a single run in 2 minutes or less. If your simulator takes longer
than 2 minutes to finish a single run, please see the TA as they may be able to help you speed
up your simulator.
Second, you will be running many experiments. Therefore, you will benefit from implementing
a simulator that is reasonably fast.
One simple thing you can do to make your simulator run faster is to compile it with a high
optimization level. The example Makefile posted on the web page includes the -O3
optimization flag.
Note that, when you are debugging your simulator in a debugger (such as gdb), it is
recommended that you compile without -O3 and with -g. Optimization includes register
allocation. Often, register-allocated variables are not displayed properly in debuggers, which is
why you want to disable optimization when using a debugger. The -g flag tells the compiler to
include symbols (variable names, etc.) in the compiled binary. The debugger needs this
information to recognize variable names, function names, line numbers in the source code, etc.
When you are done debugging, recompile with -O3 and without -g to get the most efficient
simulator again.
Your project1A.zip must contain only the following (any deviation from the following
requirements may delay grading your project and may result in point deductions, late penalties,
etc.):
Below is an example showing how to create project1A.zip from an EOS Linux machine. Suppose
you have a bunch of source code files (*.cc, *.h), the Makefile, and your project report
(report.doc).
$ zip project1A *.cc *.h Makefile report.doc
Because the TA will unzip and grade your code electronically using scripts, your zip file must have all files
directly at its top level (not in a folder inside the zip). Keep this in mind if you create the zip with a GUI
program in Windows.
3. Part B: Two-level memory hierarchy simulator
In this part you will design a memory hierarchy simulator with L1 and L2 caches. You will use
your generic CACHE designed in Part A and instantiate it as the L1 and L2 caches. Simulation
requirements will be the same as in Part A, with some modifications to accommodate the L2 cache.
[Figure: read/write requests flow from the CPU to the L1 CACHE, from L1 to the L2 CACHE, and from L2 to main memory.]
3.2. Validation Requirements
Your simulator code will be tested electronically. Two trace files will be used as simulator inputs
named gcc_trace and perl_trace. Various configurations of CACHE consisting of cache size,
associativity, block size, replacement policy and write policy will be tested on the trace files.
Sample outputs from the simulator called “Validation Runs” will be posted on the course
website.
1. You must be able to compile and run your simulator on EOS Linux machines. This is
required so that the TA can compile and run your simulator. If you are logging into an
EOS machine remotely and do not know whether or not it is Linux (as opposed to
SunOS), use the uname command to determine the operating system.
2. Along with your source code, you must provide a Makefile that automatically compiles
the simulator. This Makefile must create a simulator named sim_cache. The TA should
be able to build the simulator using a single make command. The TA should be able to
remove object and executable files using the make clean command. To make your life easy,
an example Makefile will be posted on the course website, which you can copy and
modify according to your needs.
3. Your simulator must accept exactly 11 command-line arguments in the following order:
sim_cache <L1_BLOCKSIZE> <L1_SIZE> <L1_ASSOC>
<L1_REPLACEMENT_POLICY> <L1_WRITE_POLICY>
<L2_BLOCKSIZE> <L2_SIZE> <L2_ASSOC>
<L2_REPLACEMENT_POLICY> <L2_WRITE_POLICY>
<trace_file>
Example: An 8 KB 4-way set-associative L1 cache with 32B block size, LRU replacement
policy and WTNA write policy, and a 32 KB 8-way set-associative L2 cache with 64B block
size, LFU replacement policy and WBWA write policy, will be simulated for
“gcc_trace” with the following command.
4. Your simulator must print outputs to the console (i.e., to the screen). This way, when
a TA runs your simulator, the TA can simply redirect the output of your simulator to a file
for validating the results.
Validation
Your output must match both numerically and in terms of formatting when compared to
the validation runs. The TA will literally diff your output with the correct output. You must
confirm correctness of your simulator by following these two steps for each validation run.
1. Redirect the console output of your simulator to a temporary file. This
can be achieved using the “>” operator. For example,
$ ./sim_cache 32 8192 4 0 1 gcc_trace > my_output
2. Test whether or not your output matches, by running this Linux
command. This command must print nothing, which indicates a correct
match.
$ diff -iw my_output validation_run
The -iw flags tell diff to treat upper-case and lower-case as equivalent and to ignore the
amount of whitespace between words. Therefore, you do not need to worry about the
exact number of spaces or tabs as long as there is some whitespace where the validation
runs have whitespace.
For guidelines on how to get started with designing your simulator, see the Appendix.
Appendix
Design Guidelines
Here are some guidelines to get you started with the design of your CACHE simulator. In this
simulator you need to simulate only the tags for each cache block, and you don’t need to
simulate any kind of data transfer to and from CACHE. You can use C/C++ or Java for
your code. In this guide, use of C/C++ is assumed.
The guidelines provided here are just for your reference. It’s not mandatory to follow these
guidelines in your project. As long as your simulator output matches with the validation runs,
you are good.
In your program, CACHE can be represented as a data structure implemented with a C struct or
a C++ class. Use of C++ is not necessary but is recommended as it simplifies the development
process. As we are designing a generic CACHE which can be used at any level in memory
hierarchy, the implementation of CACHE should be independent of the position in memory
hierarchy.
For this, the CACHE data structure needs to be a node in a linked list. Each CACHE will have
access to its immediate next level in memory hierarchy. This can be implemented as a pointer
to the CACHE data structure inside CACHE. For CACHE just above the main memory, this pointer
will simply be NULL.
In this way, your simulator will only give read/write requests to the first level CACHE (just below
the CPU) using its member functions. This CACHE will forward the requests to next level if
needed using the nextLevel pointer.
CACHE will keep track of number of reads, writes, misses, write-backs, memory accesses etc.
using some counter variables. These counter variables will be incremented during the
simulation as needed.
class CACHE
{
private:
    /* Add CACHE data members here:
       variables for parameters such as size, block size, associativity,
       and the write and replacement policies;
       a dynamic array for tag storage;
       all counter variables;
       and any other variables needed.
    */

    // Pointer to the next CACHE in the memory hierarchy
    // (NULL for the CACHE just above main memory)
    CACHE *nextLevel;

public:
    // CACHE member functions
    bool readFromAddress(unsigned int add);
    bool writeToAddress(unsigned int add);
};
In a simulator run, the simulator first initializes the CACHE data structures using the command-line
arguments. It then reads the trace file and issues the read/write requests to the first
level (L1 CACHE). L1 CACHE increments its counters, updates its tags and, if needed,
issues read/write requests to L2 CACHE (if it exists). Similarly, L2 CACHE increments its
counters, updates its tags, and so on. At the end of the simulation, when all requests from the
trace file are completed, the simulator reads the counter values and displays the raw measurements
and contents of each CACHE.
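The forwarding of requests through the nextLevel pointer can be illustrated with a deliberately simplified stand-in for the CACHE class. The sketch below is not the required design: it assumes a direct-mapped cache that handles reads only, and the class name SimpleCache and its members are hypothetical. It shows only how a miss at one level becomes a read at the next level and how each level keeps its own counters.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Simplified stand-in for CACHE: direct-mapped, reads only, tags only.
class SimpleCache {
public:
    SimpleCache(uint32_t blockSize, uint32_t numSets, SimpleCache* next)
        : blockSize_(blockSize), numSets_(numSets),
          valid_(numSets, false), tags_(numSets, 0), nextLevel_(next) {
        assert(blockSize > 0 && numSets > 0);
    }

    // Returns true on a hit. On a miss, forwards the read to the next
    // level (if any) and then installs the block at this level.
    bool readFromAddress(uint32_t addr) {
        ++reads_;
        uint32_t block = addr / blockSize_;
        uint32_t index = block % numSets_;
        uint32_t tag = block / numSets_;
        if (valid_[index] && tags_[index] == tag) return true;
        ++readMisses_;
        if (nextLevel_ != nullptr) nextLevel_->readFromAddress(addr);
        valid_[index] = true;
        tags_[index] = tag;
        return false;
    }

    uint64_t reads() const { return reads_; }
    uint64_t readMisses() const { return readMisses_; }

private:
    uint32_t blockSize_;
    uint32_t numSets_;
    std::vector<bool> valid_;
    std::vector<uint32_t> tags_;
    SimpleCache* nextLevel_;  // nullptr for the level just above main memory
    uint64_t reads_ = 0;
    uint64_t readMisses_ = 0;
};
```

A two-level hierarchy is then built by constructing the L2 object first with a null next pointer and passing its address to the L1 object. The simulator only ever calls the L1 object.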