I. Extending Project 2: Designs Over The Budget Will Get 0 Point
This document provides instructions for Project 3 in the EE557 Fall 2016 course. Students are tasked with iteratively redesigning a baseline processor's microarchitectural blocks within given transistor count and area budgets to maximize performance across four benchmarks. Acceptable blocks to modify include branch predictors, caches, queues, and functional units. Students must submit their final configuration file, area/transistor reports, and a project report detailing their design process and intermediate/final results. Performance and report quality will be graded against budgets and other student submissions.
University of Southern California
Department of Electrical Engineering
EE557 Fall 2K16
Instructor: Michel Dubois
Section: 30630R and 30628D
Project #3, Due: 5 PM, Tuesday, November 29th
TOTAL SCORE: / 10
I. Extending Project 2
Project 3 builds on the experience you gained in Project 2 configuring architectural simulators. Your goal is to redesign the baseline processor by changing several micro-architectural blocks, such as the branch predictor and the register update unit, to improve its performance. You will iteratively search for the best design choice for all of the micro-architectural blocks by exploring the design space using simulations. As before, this task requires no modification of the code; you only need to change (intelligently) the simulation parameters in the configuration file, as you already did in Project 2. Unless otherwise stated, every detail in Project 3 stays the same as in Project 2. In particular, the simulator and benchmark locations, the baseline configuration, and all other aspects of the project environment are identical to Project 2.
II. Project Description
In this project you are given a MAXIMUM transistor and area budget. Your goal is to change any combination of the micro-architectural blocks listed below to achieve the best performance for four benchmark programs. We will measure the performance as:

\[
\text{Performance} = \frac{\sum_{\text{benchmarks}} (\text{\# of committed instructions})}{\sum_{\text{benchmarks}} (\text{\# of cycles} \times \text{clock cycle period})} \quad \text{(in MIPS)}
\]
The four benchmark programs are bitcnt, equake, bzip2, and art, as described in the project environment below. For instance, if 1 million instructions are committed for each benchmark, the four benchmarks take 1, 2, 3, and 4 million cycles respectively, and the clock cycle time is 1 ns, then the performance is computed as follows:

\[
\frac{(1 + 1 + 1 + 1)\ \text{million instructions}}{(1 + 2 + 3 + 4)\ \text{million cycles} \times 1\ \text{ns}} = \frac{4\ \text{million instructions}}{10 \times 10^{-3}\ \text{seconds}} = 400\ \text{MIPS}
\]
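The metric above can be checked with a short script. This is only a minimal sketch and not part of the course tools; the function name `aggregate_mips` and its argument names are illustrative assumptions. It reproduces the worked example from the text.

```python
def aggregate_mips(instructions, cycles, clock_period_ns):
    """Aggregate MIPS over several benchmarks.

    instructions: committed instruction count per benchmark
    cycles: simulated cycle count per benchmark (same order)
    clock_period_ns: clock cycle period in nanoseconds
    """
    if len(instructions) != len(cycles):
        raise ValueError("need one cycle count per benchmark")
    total_instructions = sum(instructions)
    # Total execution time in seconds: all cycles times the clock period.
    total_seconds = sum(cycles) * clock_period_ns * 1e-9
    # MIPS = millions of instructions per second.
    return total_instructions / total_seconds / 1e6


# Worked example from the text: 1M instructions per benchmark,
# 1/2/3/4 million cycles, 1 ns clock period.
example = aggregate_mips(
    [1_000_000] * 4,
    [1_000_000, 2_000_000, 3_000_000, 4_000_000],
    clock_period_ns=1.0,
)
print(round(example, 3))  # 400.0
```

Note that the totals are summed before dividing, so a benchmark that runs for more cycles weighs more heavily than it would in a plain average of per-benchmark MIPS values.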
The budgets for the total transistor count (including every component) and for the area are given below. Your design is NOT allowed to exceed either of them. Both will be measured with the Real Estate Estimator tool. Designs over the budget will get 0 points.
Transistor count: 200 million
Area: 25 mm^2

You are allowed to change only the following micro-architectural blocks. For instance, you can increase or decrease the sizes of the components, change the cache associativities, or change the cache replacement policies.
Dynamic Branch Predictor [1]
Branch Target Buffer
Size of Return Address Stack
Machine Width (issue/decode/commit per cycle)
Instruction Fetch Queue Size
Register Update Unit Size [2][3] (must be equal to or larger than 32 entries)
Load/Store Queue Size
Number of Integer ALUs and Multiplier/Divider Units
Number of Floating-point ALUs and Multiplier/Divider Units
Number of Memory Ports
Caches (Size, Associativity, Replacement Algorithm, Block Size) [2][4]

[1] The perfect branch predictor is not allowed.
[2] Remember that when you change your RUU or cache structures, the number of read and write ports will be affected. So, each time you change one of those, you need to check the estimator tool for any change in the number of ports, and then use CACTI to compute access times and latencies.
[3] The RUU size must be equal to or larger than 32 entries. Any number under 32 is NOT allowed.
[4] The address space is assumed to be 42 bits, and the number of bits per tag (Nr. Of Bits per Tag in CACTI) should be calculated based on the cache size and structure.
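The tag-width calculation mentioned in the notes can be sketched as follows. This assumes the textbook split of an address into tag, set index, and block offset, with power-of-two cache geometry; the function name `tag_bits` is hypothetical, and the Project 2 instructions remain the authority on exactly what to enter in CACTI.

```python
ADDRESS_BITS = 42  # address space width stated in the project notes


def tag_bits(cache_bytes, block_bytes, associativity, address_bits=ADDRESS_BITS):
    """Tag width of a set-associative cache (textbook decomposition)."""
    num_sets = cache_bytes // (block_bytes * associativity)
    for name, value in (("block size", block_bytes), ("number of sets", num_sets)):
        # A power of two has exactly one bit set.
        if value < 1 or value & (value - 1):
            raise ValueError(f"{name} must be a power of two, got {value}")
    offset_bits = block_bytes.bit_length() - 1  # log2(block size)
    index_bits = num_sets.bit_length() - 1      # log2(number of sets)
    return address_bits - index_bits - offset_bits


# 16 KB, 32-byte blocks, 4-way: 128 sets -> 7 index bits, 5 offset bits.
print(tag_bits(16 * 1024, 32, 4))    # 30
# 1 MB, 64-byte blocks, 8-way: 2048 sets -> 11 index bits, 6 offset bits.
print(tag_bits(1024 * 1024, 64, 8))  # 25
```

Raising the associativity at a fixed cache size reduces the number of sets, so the tag grows by one bit each time the associativity doubles.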
Please keep in mind that as you increase or decrease some of the sizes, your CPU clock period and the access times of your memory structures will be affected. Obviously, accessing a 16KB L1 cache should be much faster than accessing a 1MB L1 cache! So you should appropriately adjust the latency of any structure that is affected. Again, use the CACTI tool to come up with latency estimates. We will use CACTI, SimpleScalar, and the Real Estate Estimator, the same tools that we used in Project 2.
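As an illustration of the latency adjustment, the sketch below converts an access time reported by CACTI into a whole number of clock cycles. Rounding up to the next full cycle is an assumption made here; follow the conversion rule from Project 2 if it differs. The function name `latency_in_cycles` is hypothetical.

```python
import math


def latency_in_cycles(access_time_ns, clock_period_ns):
    """Smallest whole number of cycles that covers the access time."""
    if access_time_ns <= 0 or clock_period_ns <= 0:
        raise ValueError("times must be positive")
    ratio = access_time_ns / clock_period_ns
    # The small epsilon keeps floating-point noise (e.g. 0.9 / 0.3)
    # from pushing an exact multiple up by one extra cycle.
    return max(1, math.ceil(ratio - 1e-9))


print(latency_in_cycles(1.2, 0.5))  # 3
print(latency_in_cycles(2.0, 0.5))  # 4
print(latency_in_cycles(0.3, 1.0))  # 1
```

The same conversion has to be redone for every structure whenever the clock period changes, because a shorter clock period turns an unchanged access time into more cycles.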
Basic Project Steps
Here are the new steps for doing this project. First, repeat steps 1-6 of Project 2.
1. In this step you will look at the result files generated by the SimpleScalar simulation tool and decide which of the allowed micro-architectural blocks you want to change. Keep in mind that you cannot exceed the area and transistor count limits specified above when you increase the structure sizes. Also, make sure that the clock cycle latency is adjusted appropriately to reflect the new structure sizes. So be clever about which structure to change and by how much. Since the SimpleScalar result file contains various block access counts, cache misses, hits, etc., there is no need to change the code.
2. Once you have changed one or more micro-architectural parameters, redo steps 1 through 6 of Project 2, as necessary. Look at the new MIPS rating of your enhanced processor configuration and compare it with all prior configurations. Iterate these steps until you think you have the world's best processor.
3. Finally, you will generate a report that shows how you iterated through the design space and why you made those design choices. Support your arguments with charts and compelling reasoning.
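One way to keep track of the iterations described in these steps is to record every configuration you try and compare only those that satisfy the constraints. The following is a minimal sketch under stated assumptions: the `DesignPoint` record, its field names, and the numbers in `history` are made up for illustration and are not measured results. The limits come from the project description.

```python
from dataclasses import dataclass

MAX_TRANSISTORS = 200_000_000  # transistor budget from the project description
MAX_AREA_MM2 = 25.0            # area budget from the project description
MIN_RUU_ENTRIES = 32           # minimum allowed RUU size


@dataclass
class DesignPoint:
    name: str          # hypothetical label for one design iteration
    mips: float        # aggregate MIPS over the four benchmarks
    transistors: int   # reported by the Real Estate Estimator
    area_mm2: float    # reported by the Real Estate Estimator
    ruu_entries: int

    def within_budget(self):
        return (self.transistors <= MAX_TRANSISTORS
                and self.area_mm2 <= MAX_AREA_MM2
                and self.ruu_entries >= MIN_RUU_ENTRIES)


def best_feasible(points):
    """Highest-MIPS design that satisfies every constraint, or None."""
    feasible = [p for p in points if p.within_budget()]
    return max(feasible, key=lambda p: p.mips, default=None)


# Placeholder values only, to show how the comparison works.
history = [
    DesignPoint("baseline", 350.0, 120_000_000, 15.0, 32),
    DesignPoint("bigger-L2", 410.0, 190_000_000, 24.5, 64),
    DesignPoint("huge-L2", 450.0, 260_000_000, 31.0, 64),  # exceeds both budgets
]
print(best_feasible(history).name)  # bigger-L2
```

Keeping the rejected points in the history is useful for the report, since they document why a promising change was abandoned.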
III. Project Environment
The project environment is the same as that of Project 2, except that you need to copy the additional benchmarks and inputs from the class directory. Please copy the two additional benchmarks, bitcnt and equake, with all other necessary files, into your directory in addition to bzip2 and art, which were used in Project 2.
For all benchmarks, we will limit our simulations to only 50 million instructions, after fast-forwarding through the first 300 million instructions. To do this, set the corresponding parameters in your configuration files or include them in your command line parameters.
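The exact parameter listing is not reproduced here. As a hedged sketch, assuming the simulator is SimpleScalar's `sim-outorder` with its stock `-fastfwd` and `-max:inst` options, the two limits could be assembled as command-line arguments like this. The helper name `instruction_window_args` is hypothetical; check the option names against your Project 2 configuration file before relying on them.

```python
FAST_FORWARD_INSTS = 300_000_000  # instructions skipped before timing starts
MAX_SIMULATED_INSTS = 50_000_000  # instruction limit for the timed simulation


def instruction_window_args(fast_forward=FAST_FORWARD_INSTS,
                            max_insts=MAX_SIMULATED_INSTS):
    """Argument list for the fast-forward and instruction-limit options.

    Assumes the stock sim-outorder option names; verify them locally.
    """
    return ["-fastfwd", str(fast_forward), "-max:inst", str(max_insts)]


print(" ".join(instruction_window_args()))
# -fastfwd 300000000 -max:inst 50000000
```

The same two option/value pairs can instead be placed in the configuration file, one per line, so that every benchmark run uses identical limits.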
IV. Project Submission
You must submit your final configuration file (FirstnameLastname_Proj3.conf), your Excel sheet from the Real Estate Estimator tool (FirstnameLastname_Proj3.xls), and an electronic copy of your project report (FirstnameLastname_Proj3.pdf) on the Den class page by the due date. The report must include the following:
1. Front page: a) Title: EE557 Fall 2016 Project #3 Report; b) Name: <your name>; c) your email address; d) affiliation (optional). 1 pt.
2. Section 1. Design Process: Description and discussion of your design and iteration process: for example, what design progress and iterations you made to approach your final design, what results you observed, and how those observations affected your next design iteration - at least half a page. 2 pts.
3. Section 2. Intermediate Results: a) intermediate average MIPS rates in a graph; b) RUU access times estimated from CACTI with converted cycle times in a table; c) transistor count and area estimates from the Real Estate Estimator in two graphs; d) cache miss rates for all caches in a table for 3 intermediate iterations. 2 pts.
4. Section 3. Final Design: a) MIPS rate; b) cycle time; c) area from the Real Estate Estimator; d) transistor count from the Real Estate Estimator; e) cache latencies; f) cache miss rates in a table. 1 pt.
Please keep all of your shell scripts and SimpleScalar config files, as the TA might require you to submit or run them.
V. Grading
Your final design will be evaluated based on the following criteria:
1. Report (6 pts.): graded from the report PDF.
2. Performance (4 pts.): evaluated by ranking the overall MIPS of all students. 0 points will be given if the reported MIPS does not match the MIPS obtained by running the submitted config file.
0 points will be given to designs that exceed the transistor count or the area budget.
Like other assignments, this project must be done INDIVIDUALLY! Similar designs will be scrutinized.