The document is the instructions for a final exam for a computer architecture course. It contains 7 questions related to topics like processor performance, cache design, binary conversion, and disk array sizing for an early Internet search engine. The exam tests students on their understanding of concepts like CPI calculation, pipeline stalling, memory addressing, cache hit rate and more.
Name (please print):_________________________________ Total points: ___/120
Instructions: This is a CLOSED BOOK and CLOSED NOTES exam. However, you may use calculators, scratch paper, and the green MIPS reference card from your textbook. Ask the instructor if you have any questions. Good luck, and have a good summer!
1. (10 points) Intel is considering two different enhancements to their newest i8 line of processors. They only have enough time to implement one of these enhancements before the scheduled release date of the processor.
The instruction set of their original processor can be divided into three different types of instructions having the following CPIs:
Instruction type    CPI
A                   1.4
B                   2.4
C                   2
The benchmarks that are used to evaluate processor performance consist of the following instruction mixture:
Instruction type    Percentage
A                   30%
B                   60%
C                   10%
Intel must choose one of the following options:
   1. decrease the CPI of instruction type A by a factor of 15 (divide by 15), or
   2. decrease the CPI of instruction type B by a factor of 2 (divide by 2).
These enhancements will not affect the number of instructions or the clock rate. Which enhancement is preferable? You must justify your answer numerically (with calculations).
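Because the instruction count and clock rate are unchanged, execution time is proportional to the average CPI, so the comparison reduces to a weighted sum over the instruction mix. The following is a minimal sketch of that calculation; the names `average_cpi`, `option1`, and `option2` are illustrative and not part of the exam.

```python
# Weighted-average CPI for the baseline and for each proposed enhancement.
# All numbers come from the two tables in the question.

def average_cpi(cpi, mix):
    """Return the sum over instruction types of CPI * fraction of instructions."""
    return sum(cpi[kind] * mix[kind] for kind in cpi)

base_cpi = {"A": 1.4, "B": 2.4, "C": 2.0}
mix = {"A": 0.30, "B": 0.60, "C": 0.10}

# Option 1: divide the CPI of type A by 15. Option 2: divide the CPI of type B by 2.
option1 = dict(base_cpi, A=base_cpi["A"] / 15)
option2 = dict(base_cpi, B=base_cpi["B"] / 2)

for label, table in (("baseline", base_cpi), ("option 1", option1), ("option 2", option2)):
    print(f"{label}: average CPI = {average_cpi(table, mix):.3f}")
```

The option with the lower average CPI is the preferable one, since every other factor in the performance equation is held constant.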
2. Consider the following program, which converts an ASCII-encoded string representation of a decimal number into binary.
        .data
str:    .asciiz "891"
        .text
main:   li   $t0, 0            # initialize index register to 0
        li   $s0, 0            # initialize running sum to 0
loop:   lb   $s1, str($t0)     # load a character from the string
        beqz $s1, exit         # if we read the null character, then see ya!

        sll  $t2, $s0, 3       # this instruction, along with the next two
        sll  $s0, $s0, 1       # instructions are used for:
        add  $s0, $s0, $t2     # multiplying accumulator ($s0) by ten

        addi $s1, $s1, -48     # convert from ASCII to binary
        add  $s0, $s0, $s1     # accumulate!
        addi $t0, $t0, 1       # increment index
        j    loop              # here we go again!
exit:   jr   $31               # end proggie
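The loop can be mirrored in a few lines of Python to see why the two shifts and the add amount to a multiplication by ten (8x + 2x = 10x). This is only a sketch of the same algorithm: the function name `ascii_to_int` is illustrative, and Python integers do not wrap at 32 bits the way MIPS registers do.

```python
def ascii_to_int(text: str) -> int:
    """Convert a string of ASCII decimal digits to an integer, one character at a time."""
    acc = 0                      # plays the role of $s0, the running sum
    for ch in text.encode("ascii"):
        if ch == 0:              # null terminator: stop, as the beqz does
            break
        t2 = acc << 3            # sll $t2, $s0, 3   -> 8 * acc
        acc = acc << 1           # sll $s0, $s0, 1   -> 2 * acc
        acc = acc + t2           # add $s0, $s0, $t2 -> 10 * acc
        acc = acc + (ch - 48)    # subtract ASCII '0' and accumulate the digit
    return acc

print(ascii_to_int("891"))
```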
a. (10 points) In the program, fill in the blank comment line to describe what computation the program is performing through the use of the corresponding instruction sequence (two sll instructions followed by an add instruction).
b. (10 points) Suppose we need to run this program on an architecture that has a branch delay slot. Would we need to make any changes to the program to ensure that it behaves properly? Why or why not? If so, what change(s) must be made?
No. The instruction after the branch (the first sll, which only writes $t2) would execute in the delay slot, and it won't hurt anything if the branch is taken.
c. (10 points) In the program code, show where all the data dependences exist, and indicate if any pipeline stalls would need to be inserted to deal with any of these dependencies (assuming the 5-stage MIPS pipeline).
d. (10 points) Write a short sequence of instructions to be inserted before the first instruction (li $t0,0), as well as a short sequence of instructions that could be inserted immediately before the return instruction (jr $31), which would save and later restore the caller's $s registers using the program's stack. You only need to save the $s registers that are changed by the program.
Header code for saving the caller's registers to the stack:

        addi $sp, $sp, -8
        sw   $s0, 0($sp)
        sw   $s1, 4($sp)

Footer code for restoring the caller's registers from the stack:

        lw   $s0, 0($sp)
        lw   $s1, 4($sp)
        addi $sp, $sp, 8

3. This problem compares the performance of the single-cycle MIPS architecture, the multi-cycle MIPS architecture, and the pipelined MIPS architecture.
Assume it requires 10 ns to perform any of the following operations: main memory access, register file access, and arithmetic operation. Assume the delay for multiplexors, registers (i.e. setup/hold time), and lookup tables is negligible.
a. (10 points) What is the minimum clock period for each of the three implementations?
single=>50 ns, multi=>10 ns, pipelined=>10 ns
b. (10 points) What is the CPI for each of the implementations, assuming a program execution with the following instruction mix:
Instruction                                Execution frequency
r-type arithmetic, logical, comparison     60%
branch                                     15%
load                                       15%
store                                      10%
Assume the pipelined CPU performs forwarding, that the compiler schedules the code to avoid load hazards, branches have a fixed 3-cycle latency (i.e. requires 2 trailing no-ops), and you may disregard the time required to fill the pipeline.
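The CPI values can be worked out as a weighted sum over the instruction mix. The sketch below assumes the usual textbook cycle counts for the multi-cycle MIPS datapath (4 for r-type, 3 for branch, 5 for load, 4 for store); those counts are not stated in the exam and are an assumption here, as are the variable names.

```python
# Instruction mix from the table in part b.
mix = {"r-type": 0.60, "branch": 0.15, "load": 0.15, "store": 0.10}

# ASSUMPTION: standard textbook cycle counts for the multi-cycle datapath.
multi_cycles = {"r-type": 4, "branch": 3, "load": 5, "store": 4}

# Single-cycle: every instruction completes in exactly one (long) cycle.
single_cpi = 1.0

# Multi-cycle: weighted average of the per-class cycle counts.
multi_cpi = sum(mix[kind] * multi_cycles[kind] for kind in mix)

# Pipelined: ideal CPI of 1, plus 2 trailing no-ops after every branch.
# Forwarding and compiler scheduling remove the other stalls, per the question.
pipelined_cpi = 1.0 + mix["branch"] * 2

print(f"single-cycle CPI = {single_cpi:.2f}")
print(f"multi-cycle CPI  = {multi_cpi:.2f}")
print(f"pipelined CPI    = {pipelined_cpi:.2f}")
```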
c. (10 points) What is the speedup of the pipelined processor relative to the single-cycle processor? What is the speedup of the pipelined processor relative to the multi-cycle processor?
pipe vs. single = (50 ns * 1) / (10 ns * 1.3) = 3.85
pipe vs. multi = (10 ns * 4) / (10 ns * 1.3) = 3.08

4. (10 points) Suppose we want to improve the MISS RATE of a cache. List three different design enhancements that we could reasonably consider to accomplish this. Also, describe any possible side-effects, in terms of possible reductions in other aspects of cache performance, that could result from implementing each of these enhancements (and why).
more associativity => higher hit time (comparing more tags), higher miss penalty (replacement decisions)
more lines => higher hit time (bigger mux)
wider lines (bigger blocks) => higher miss penalty (more data to bring in)
5. (10 points) What is the minimum number of bits that are required to represent the decimal value 11.0625 in binary without introducing any rounding error? Provide the minimum number of bits only; you do not need to consider any particular representation format (i.e. you do not need to use the IEEE floating-point format, just the raw value in base 2).
Fraction: 0.0625 => 0.125 => 0.25 => 0.5 => 1 (four doublings), so 0.0625 = 0.0001 in base 2

Integer: 11 = 1011 in base 2

11.0625 = 1011.0001 in base 2, so 8 bits!
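The same conversion can be checked mechanically by repeated doubling of the fractional part. The sketch below uses exact rational arithmetic so that no rounding creeps in; the helper name `to_binary` and the `max_frac_bits` guard are illustrative choices, and the loop only terminates early for values whose fraction is a sum of powers of 1/2.

```python
from fractions import Fraction

def to_binary(value: Fraction, max_frac_bits: int = 64):
    """Return (integer bits, fraction bits) of a non-negative value in base 2."""
    whole = int(value)                 # integer part
    frac = value - whole               # fractional part, kept exact
    int_bits = bin(whole)[2:]          # strip the '0b' prefix
    frac_bits = ""
    while frac != 0 and len(frac_bits) < max_frac_bits:
        frac *= 2                      # doubling shifts the next bit into the ones place
        bit = int(frac)
        frac_bits += str(bit)
        frac -= bit
    return int_bits, frac_bits

int_bits, frac_bits = to_binary(Fraction("11.0625"))
print(f"{int_bits}.{frac_bits}", "uses", len(int_bits) + len(frac_bits), "bits")
```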
6. (10 points) Assume there's a 2-way set associative cache with LRU replacement where memory addresses are interpreted in the following way:
tag | index (2 bits) | word offset (2 bits) | byte offset (2 bits)
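The field layout above translates directly into shifts and masks, and a small simulator can be used to check a reference sequence against the LRU policy. This is a sketch only: `decode` and `simulate` are hypothetical helper names, and the example sequence is the column of byte addresses from the table in this question.

```python
from collections import OrderedDict

INDEX_BITS = 2   # from the address layout
WORD_BITS = 2
BYTE_BITS = 2

def decode(addr: int):
    """Split a byte address into (tag, index) using the layout above."""
    offset_bits = WORD_BITS + BYTE_BITS
    index = (addr >> offset_bits) & ((1 << INDEX_BITS) - 1)
    tag = addr >> (offset_bits + INDEX_BITS)
    return tag, index

def simulate(addresses, ways: int = 2):
    """Simulate a set-associative cache with LRU replacement."""
    sets = {}        # index -> OrderedDict of tags, least recently used first
    results = []
    for addr in addresses:
        tag, index = decode(addr)
        lines = sets.setdefault(index, OrderedDict())
        if tag in lines:
            lines.move_to_end(tag)         # mark as most recently used
            outcome = "hit"
        else:
            if len(lines) >= ways:
                lines.popitem(last=False)  # evict the least recently used tag
            lines[tag] = True
            outcome = "miss"
        results.append((addr, index, tag, outcome))
    return results

for row in simulate([541, 662, 533, 607, 542, 656]):
    print(row)
```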
Fill in the following table to simulate the behavior of this cache, based on the sequence of memory references shown in column 1 of the table.
BYTE ADDRESS   INDEX   TAG   HIT OR MISS?
541            1       8     miss (set 1 now holds 8)
662            1       10    miss (set 1 now holds 8, 10)
533            1       8     hit
607            1       9     miss (LRU tag 10 replaced; set 1 now holds 8, 9)
542            1       8     hit
656            1       10    miss (LRU tag 9 replaced; set 1 now holds 8, 10)

7. (10 points) It's 1997 and you're a graduate student at Stanford named Larry Page. You're trying to build a new Internet search engine, and your strategy is to optimize its performance by ensuring that neither the CPU nor its disk array is idle during a search.
The search database is logically divided into 100 MB contiguous (sequential) blocks. After the first block is read, the engine reads subsequent blocks while using the CPU to search the previously read block. It takes 100 ms for the CPU to search each block.
You decide to use disks that rotate at 170 revolutions/sec (about 10,000 RPM), have an average seek time of 8 ms, have a transfer rate of 50 MB/sec, and have a controller overhead of 2 ms. How many disks do you need in your disk array? You do not need to include check (redundant) disks.
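With n disks each transferring 1/n of a block in parallel, one block read (rotational latency + seek + controller overhead + transfer) must fit in the 100 ms the CPU spends searching the previous block:

100 ms = ((1/170) s/rev * (1/2) rev) + 8 ms + 2 ms + ((100 MB) / (50 MB/s)) / n
100 ms = 3 ms + 8 ms + 2 ms + (2000 ms) / n
87 ms = (2000 ms) / n
n = 22.99, so 23 disks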
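The sizing can also be written out as a short calculation. The sketch below rests on stated assumptions that are not spelled out in the exam: the block is striped evenly across the disks, all disks seek and rotate in parallel, each block read pays the seek, average rotational latency (half a revolution), and controller overhead once, and the read must finish within the CPU's 100 ms search time. The function name `disks_needed` and its parameter names are illustrative.

```python
import math

def disks_needed(block_mb=100.0, cpu_ms=100.0, rev_per_s=170.0,
                 seek_ms=8.0, xfer_mb_per_s=50.0, controller_ms=2.0):
    """Smallest number of striped disks that reads one block within cpu_ms."""
    rotation_ms = 0.5 / rev_per_s * 1000.0               # average: half a revolution
    overhead_ms = rotation_ms + seek_ms + controller_ms   # paid once per block read
    single_xfer_ms = block_mb / xfer_mb_per_s * 1000.0    # transfer time on one disk
    budget_ms = cpu_ms - overhead_ms                      # time left for the transfer
    return math.ceil(single_xfer_ms / budget_ms)

print(disks_needed())
```

Note the units: 100 MB at 50 MB/s is 2 seconds (2000 ms) on a single disk, which is why the transfer term is divided by the number of disks rather than multiplied by it.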