I. Extending Project 2: Designs Over The Budget Will Get 0 Points

This document provides instructions for Project 3 in the EE557 Fall 2016 course. Students are tasked with iteratively redesigning a baseline processor's microarchitectural blocks within given transistor count and area budgets to maximize performance across four benchmarks. Acceptable blocks to modify include branch predictors, caches, queues, and functional units. Students must submit their final configuration file, area/transistor reports, and a project report detailing their design process and intermediate/final results. Performance and report quality will be graded against budgets and other student submissions.

Uploaded by

nikhilnarang
Copyright
© All Rights Reserved

University of Southern California

Department of Electrical Engineering


EE557 Fall 2K16
Instructor: Michel Dubois
Sections: 30630R and 30628D
Project #3, Due: 5 PM, Tuesday, November 29th
TOTAL SCORE: / 10

I. Extending Project 2
Project 3 builds on the experience you gained in Project 2 configuring architectural simulators. In this
project your goal is to redesign the baseline processor by changing several micro-architectural blocks,
such as branch predictors and register update units (RUUs), to improve its performance. You will
iteratively search for an optimal design choice for all the micro-architectural blocks by exploring the
design space through simulation. Again, this task requires no changes to the simulator code: you only
need to change the simulation parameters in the configuration file (intelligently), as you already did in
Project 2.
Unless otherwise stated, every detail in Project 3 stays the same as in Project 2. In particular, the
simulator and the benchmark locations, baseline configuration, and all other project environments are
identical to Project 2.

II. Project Description
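In this project you are given a MAXIMUM transistor and area budget. Your goal is to change any
combination of the micro-architectural blocks listed below to achieve the best performance on four
benchmark programs. We will measure performance (in MIPS) as:

$$\text{Performance (in MIPS)} = \frac{\sum_{\text{benchmark}} (\text{\# of committed instructions})_{\text{benchmark}}}{\left[\sum_{\text{benchmark}} (\text{\# of cycles} \times \text{clock cycle period})_{\text{benchmark}}\right] \times 10^{6}}$$

with the clock cycle period expressed in seconds.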

The four benchmark programs are bitcnt, equake, bzip2, and art (see Section III, Project Environment).
For instance, if 1 million instructions are committed in each benchmark, the four benchmarks take 1, 2, 3,
and 4 million cycles, and the clock cycle time is 1 ns, then performance is computed as follows:

$$\frac{(1 + 1 + 1 + 1)\ \text{million instructions}}{(1 + 2 + 3 + 4)\ \text{million cycles} \times 1\ \text{ns}} = \frac{4\ \text{million instructions}}{10 \times 10^{-3}\ \text{seconds}} = 400\ \text{MIPS}$$
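A minimal Python sketch of the metric above, assuming the per-benchmark committed-instruction and cycle counts have already been read from the SimpleScalar output and the clock period is given in nanoseconds (the function name `aggregate_mips` is hypothetical). It reproduces the worked example:

```python
from typing import Sequence


def aggregate_mips(committed_insts: Sequence[int],
                   cycles: Sequence[int],
                   clock_period_ns: float) -> float:
    """Total committed instructions divided by total execution time, in MIPS.

    Execution time is sum(cycles) * clock_period_ns * 1e-9 seconds, so
    MIPS = insts / (time_s * 1e6) = insts * 1e3 / (sum(cycles) * clock_period_ns).
    """
    if len(committed_insts) != len(cycles):
        raise ValueError("need exactly one cycle count per benchmark")
    if clock_period_ns <= 0:
        raise ValueError("clock period must be positive")
    total_insts = sum(committed_insts)
    total_cycles = sum(cycles)
    return total_insts * 1e3 / (total_cycles * clock_period_ns)


# Worked example from the text: 1 million instructions per benchmark,
# 1, 2, 3 and 4 million cycles, 1 ns clock.
insts = [1_000_000] * 4
cycles = [1_000_000, 2_000_000, 3_000_000, 4_000_000]
print(aggregate_mips(insts, cycles, clock_period_ns=1.0))  # prints 400.0
```

Because the formula divides total instructions by total time, it is not the mean of the four per-benchmark MIPS values: in this example those are 1000, 500, about 333, and 250 MIPS, whose mean (about 521) differs from 400.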

The transistor count (including every component) and the area budget are given below. Your design is
NOT allowed to exceed either of them. Both will be measured with the Real Estate Estimator tool, and
designs over the budget will get 0 points.

Transistor count: 200 million
Area: 25 mm²
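Because a design that exceeds either limit scores zero, it can help to check the Real Estate Estimator totals against both budgets before spending time on simulations. A minimal sketch, assuming you copy the estimator's total transistor count and total area (in mm²) by hand; the names below are hypothetical:

```python
from typing import Tuple

TRANSISTOR_BUDGET = 200_000_000  # 200 million transistors
AREA_BUDGET_MM2 = 25.0           # 25 mm^2


def within_budget(transistors: int, area_mm2: float) -> bool:
    """True if the design exceeds neither budget (meeting a limit exactly is allowed)."""
    return transistors <= TRANSISTOR_BUDGET and area_mm2 <= AREA_BUDGET_MM2


def headroom(transistors: int, area_mm2: float) -> Tuple[int, float]:
    """Remaining transistors and area; a negative value means over budget."""
    return TRANSISTOR_BUDGET - transistors, AREA_BUDGET_MM2 - area_mm2


# Hypothetical estimator totals for one candidate design.
print(within_budget(180_000_000, 24.3))  # prints True
```

Checking both limits with a single `and` mirrors the rule that exceeding either budget alone is enough to get 0 points.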
You are allowed to change only the following micro-architectural blocks. For instance, you can increase
or decrease component sizes, change cache associativities, and change cache replacement policies.

Dynamic Branch Predictor¹
Branch Target Buffer
Size of Return Address Stack
Machine Width (issue/decode/commit per cycle)
Instruction Fetch Queue Size
Register Update Unit Size²,³ (must be 32 entries or larger)
Load/Store Queue Size
Number of Integer ALUs and Multiplier/Divider Units
Number of Floating-point ALUs and Multiplier/Divider Units
Number of Memory Ports
Caches (Size, Associativity, Replacement Algorithm, Block Size)²,⁴

¹ The perfect branch predictor is not allowed.
² Remember that when you change your RUU or cache structures, the number of read and write ports will be affected. So each time you change one of them, check the estimator tool for any change in the number of ports, and then use CACTI to compute access times and latencies.
³ The RUU size must be 32 entries or larger. Any size under 32 is NOT allowed.
⁴ The address space is assumed to be 42 bits, and the number of bits per tag (Nr. Of Bits per Tag in CACTI) should be calculated from the cache size and structure.
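The tag-width calculation in footnote 4 can be sketched as follows, assuming a byte-addressed 42-bit address, power-of-two block sizes and set counts, and the usual split of an address into tag, set index, and block offset. The function name is hypothetical, and whether your CACTI version expects anything beyond this raw tag width is not covered here:

```python
def is_power_of_two(n: int) -> bool:
    return n > 0 and (n & (n - 1)) == 0


def tag_bits(cache_bytes: int, block_bytes: int, assoc: int,
             address_bits: int = 42) -> int:
    """Tag width = address bits - set-index bits - block-offset bits.

    Assumes byte addressing and a power-of-two block size and set count.
    """
    if not is_power_of_two(block_bytes) or assoc <= 0:
        raise ValueError("block size must be a power of two and assoc positive")
    if cache_bytes % (block_bytes * assoc) != 0:
        raise ValueError("cache size must be a multiple of block size * assoc")
    num_sets = cache_bytes // (block_bytes * assoc)
    if not is_power_of_two(num_sets):
        raise ValueError("number of sets must be a power of two")
    offset_bits = block_bytes.bit_length() - 1  # log2(block size)
    index_bits = num_sets.bit_length() - 1      # log2(number of sets)
    return address_bits - index_bits - offset_bits


print(tag_bits(16 * 1024, 32, 4))    # 16KB, 32B blocks, 4-way: prints 30
print(tag_bits(1024 * 1024, 64, 8))  # 1MB, 64B blocks, 8-way: prints 25
```

For a fully associative cache there is a single set, so the index contributes no bits and the tag is 42 minus the block-offset bits.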

Please keep in mind that as you increase or decrease some of the sizes, your CPU clock period and the
access times of your memory structures will be affected. Obviously, accessing a 16KB L1 cache should be
much faster than accessing a 1MB L1 cache! So you should appropriately adjust the latency of every
affected structure, again using the CACTI tool to obtain latency estimates.
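CACTI reports access times in nanoseconds, while SimpleScalar latency parameters such as the cache hit latencies are given in cycles. A minimal sketch of the conversion, assuming it simply rounds the access time up to whole cycles of your current clock period; if Project 2 defined a different rounding rule, use that instead. The function name is hypothetical and the example times are illustrative, not CACTI output:

```python
import math


def latency_cycles(access_time_ns: float, clock_period_ns: float) -> int:
    """Number of whole clock cycles needed to cover an access time.

    A small tolerance keeps exact multiples (e.g. 0.9 ns at a 0.3 ns clock)
    from being pushed up one cycle by floating-point rounding.
    """
    if access_time_ns <= 0 or clock_period_ns <= 0:
        raise ValueError("times must be positive")
    return max(1, math.ceil(access_time_ns / clock_period_ns - 1e-9))


print(latency_cycles(0.9, 0.3))  # prints 3
print(latency_cycles(1.2, 0.5))  # prints 3
```

Rounding up (rather than to the nearest cycle) keeps the simulated latency from claiming a structure is faster than its estimated access time.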
We will use CACTI, SimpleScalar, and the Real Estate Estimator, which we already used in Project 2.

Basic Project Steps
Here are the new steps for this project:
First, repeat steps 1-6 of Project 2.
1. Look at the result files generated by the SimpleScalar simulation tool and decide which of the
allowed micro-architectural blocks you want to change. Keep in mind that you cannot exceed the area
and transistor count limits specified above when you increase structure sizes. Also, make sure that the
clock cycle latency is adjusted to reflect the new structure sizes. So be clever about which structure to
change and by how much. Since the SimpleScalar result file contains various block access counts,
cache misses, hits, etc., there is no need to change the code.
2. Once you change one or more micro-architectural parameters, redo steps 1 through 6 of Project 2
as necessary. Look at the new MIPS rating of your enhanced processor configuration and compare it
with all prior configurations. Iterate these steps until you think you have the world's best processor.
3. Finally, write a report that shows how you iterated through the design space and why you made
those design choices. Support your arguments with charts and compelling evidence.
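The iterate-until-satisfied loop in steps 1-3 can be sketched as below. This is a hypothetical helper, assuming you supply an `evaluate` function that runs your own Project 2 scripts for one candidate configuration and returns its MIPS, transistor count, and area; candidates are just labels such as config file names. It discards over-budget designs and keeps every result for the intermediate-results section of the report:

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional, Tuple


@dataclass
class Result:
    name: str          # label for the configuration (e.g. a config file name)
    mips: float        # aggregate MIPS over the four benchmarks
    transistors: int   # total from the Real Estate Estimator
    area_mm2: float    # total area from the Real Estate Estimator


def explore(candidates: Iterable[str],
            evaluate: Callable[[str], Result],
            max_transistors: int = 200_000_000,
            max_area_mm2: float = 25.0) -> Tuple[Optional[Result], List[Result]]:
    """Evaluate each candidate, keep the full history, and return the best
    in-budget design (None if no candidate fits the budgets)."""
    best: Optional[Result] = None
    history: List[Result] = []
    for cand in candidates:
        r = evaluate(cand)
        history.append(r)
        if r.transistors > max_transistors or r.area_mm2 > max_area_mm2:
            continue  # over budget: would receive 0 points
        if best is None or r.mips > best.mips:
            best = r
    return best, history


# Stand-in for "write config, run SimpleScalar, run the estimator";
# these numbers are made up.
fake = {
    "baseline": Result("baseline", 300.0, 120_000_000, 18.0),
    "big_l2": Result("big_l2", 420.0, 230_000_000, 24.0),  # over transistor budget
    "ruu64": Result("ruu64", 350.0, 150_000_000, 20.5),
}
best, history = explore(fake, fake.__getitem__)
print(best.name)  # prints ruu64
```

The fastest candidate here is rejected because it exceeds the transistor budget, which is exactly the trade-off step 1 asks you to reason about.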

III. Project Environment
The project environment is the same as in Project 2, except that you need to copy additional
benchmarks and inputs from the class directory. In addition to bzip2 and art, which were used in
Project 2, copy the following benchmarks with all other necessary files into your directory:

Executable (directory)      Input File   Command
bitcnt (/ee557d/mibench)    --           bitcnts 1125000
equake (/ee557d/spec2k)     equake.in    equake < equake.in

For all benchmarks, we will limit simulations to only 50 million instructions, after fast-forwarding
through the first 300 million instructions. To do this, set the following parameters in your configuration
files or pass them as command-line parameters.
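A hedged sketch, assuming the parameters in question are the standard sim-outorder options `-fastfwd` (instructions skipped before timing starts) and `-max:inst` (maximum number of instructions to execute). The hypothetical helper below writes a configuration file in the `-option value` format and rejects the two settings the handout forbids (an RUU under 32 entries and the perfect predictor). The predictor, RUU, LSQ, and cache values are illustrative placeholders, not recommendations:

```python
from typing import Dict, Union

Value = Union[int, str]


def config_text(params: Dict[str, Value]) -> str:
    """Render sim-outorder options as config-file text, one '-option value'
    per line, after checking two of the project rules."""
    ruu = params.get("-ruu:size")
    if ruu is not None and int(ruu) < 32:
        raise ValueError("RUU must have at least 32 entries")
    if params.get("-bpred") == "perfect":
        raise ValueError("the perfect branch predictor is not allowed")
    return "".join(f"{opt} {val}\n" for opt, val in params.items())


params = {
    "-fastfwd": 300_000_000,         # skip the first 300M instructions
    "-max:inst": 50_000_000,         # then simulate 50M instructions
    "-bpred": "2lev",                # placeholder predictor choice
    "-ruu:size": 64,                 # placeholder, must be >= 32
    "-lsq:size": 32,                 # placeholder
    "-cache:dl1": "dl1:128:32:4:l",  # nsets:bsize:assoc:repl -> 16KB, 4-way, LRU
    "-cache:dl1lat": 2,              # hit latency in cycles (placeholder)
}
print(config_text(params), end="")
```

Generating the file from one dictionary makes it easier to keep the exact configuration behind every intermediate result, which the TA may ask you to rerun.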

IV. Project Submission
By the due date, you must submit on the DEN class page your final configuration file
(FirstnameLastname_Proj3.conf), your Excel sheet from the Real Estate Estimator tool
(FirstnameLastname_Proj3.xls), and an electronic copy of your project report (FirstnameLastname_Proj3.pdf)
that includes the following:
1. Front page
a) Title: EE557 Fall 2016 Project #3 Report; b) Name: <your name>; c) your email address; d) affiliation (optional). 1 pt.
2. Section 1. Design Process
Describe and discuss your design and iteration process: for example, what design changes and
iterations you made to approach your final design, what results you observed, and how each
observation affected your next design iteration. At least half a page, 2 pts.
3. Section 2. Intermediate Results
a) Intermediate average MIPS rates in a graph; b) RUU access times estimated from CACTI, with the
converted cycle times, in a table; c) transistor count and area estimates from the Real Estate Estimator
in two graphs; d) cache miss rates for all caches in a table, for 3 intermediate iterations. 2 pts.
4. Section 3. Final Design
a) MIPS rate; b) cycle time; c) area from Real Estate Estimator; d) transistor count from Real Estate Estimator; e) cache latencies; f) cache miss rates in a table. 1 pt.
Please keep all of your shell scripts and SimpleScalar config files, as the TA may ask you to submit or
run them.
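For the miss-rate tables in Sections 2 and 3, and for checking that the MIPS you report matches what your config file produces, it can help to pull numbers out of the simulator output automatically. A minimal sketch, assuming the `name value # description` line format that sim-outorder uses for its statistics (e.g. `sim_num_insn`, `sim_cycle`, `dl1.miss_rate`); the helper names are hypothetical and the sample numbers are made up:

```python
import re
from typing import Dict

# Assumed format of a statistics line: "name   value # description".
STAT_LINE = re.compile(
    r"^(?P<name>[A-Za-z_][\w.]*)\s+"
    r"(?P<value>[-+]?\d+(?:\.\d+)?(?:[eE][-+]?\d+)?)\s+#")


def parse_stats(text: str) -> Dict[str, float]:
    """Collect every numeric statistic; non-matching lines are skipped."""
    stats: Dict[str, float] = {}
    for line in text.splitlines():
        m = STAT_LINE.match(line.strip())
        if m:
            stats[m.group("name")] = float(m.group("value"))
    return stats


def miss_rates(stats: Dict[str, float]) -> Dict[str, float]:
    """Pick out '<cache>.miss_rate' entries, keyed by cache name."""
    suffix = ".miss_rate"
    return {k[:-len(suffix)]: v for k, v in stats.items() if k.endswith(suffix)}


sample = """\
sim_num_insn               50000000 # total number of instructions committed
sim_cycle                  80000000 # total simulation time in cycles
il1.miss_rate                0.0100 # miss rate (i.e., misses/ref)
dl1.miss_rate                0.0500 # miss rate (i.e., misses/ref)
"""
stats = parse_stats(sample)
print(miss_rates(stats))  # prints {'il1': 0.01, 'dl1': 0.05}
```

The same `stats` dictionary also holds `sim_num_insn` and `sim_cycle`, which are the per-benchmark inputs to the MIPS formula in Section II.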

V. Grading
Your final design will be evaluated based on the following criteria:
1. Report (6 pts.)
From the report pdf.
2. Performance (4 pts.)
This part is evaluated by ranking the overall MIPS of all students. 0 points will be given if your
reported MIPS does not match the MIPS obtained by running your config file.

0 points will be given to designs that exceed the transistor count or the area budget.

Like other assignments, this project must be done INDIVIDUALLY!
Similar designs will be scrutinized.
