Sample Questions
HW# 6 Solution
Q.1. We wish to compare the performance of two different computers: M1 and M2. The
following measurements have been made on these computers:
(i) Which computer is faster for each program, and how many times as fast is it?
Instruction execution rate for M1 = 5 × 10^9 / 2 = 2.5 × 10^9 instructions per second
Instruction execution rate for M2 = 6 × 10^9 / 1.5 = 4 × 10^9 instructions per second
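The two rates above are the instruction count divided by the execution time. A minimal sketch of that arithmetic, using only the counts and times that appear in the two lines above; the function name is illustrative and not part of the original solution:

```python
def instruction_rate(instruction_count, exec_time_s):
    """Return the average number of instructions executed per second."""
    return instruction_count / exec_time_s

# Values taken from the two lines above.
rate_m1 = instruction_rate(5e9, 2.0)   # M1: 5 x 10^9 instructions in 2 s
rate_m2 = instruction_rate(6e9, 1.5)   # M2: 6 x 10^9 instructions in 1.5 s

print(f"M1: {rate_m1:.2e} instructions/s")
print(f"M2: {rate_m2:.2e} instructions/s")
```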
(iii) The clock rates for M1 and M2 are 3 GHz and 5 GHz respectively. Find the CPI for
program 1 on both machines.
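One way to approach this part is CPI = (clock rate × execution time) / instruction count. The sketch below assumes that the instruction counts and times used in the rate calculation (5 × 10^9 instructions in 2 s on M1, 6 × 10^9 instructions in 1.5 s on M2) belong to program 1; the measurement table is not reproduced here, so treat that pairing as an assumption.

```python
def cpi(clock_hz, exec_time_s, instruction_count):
    """Average cycles per instruction = total cycles / instructions."""
    total_cycles = clock_hz * exec_time_s
    return total_cycles / instruction_count

# Assumption: these counts and times are the program 1 measurements.
cpi_m1 = cpi(3e9, 2.0, 5e9)   # M1 clock rate: 3 GHz
cpi_m2 = cpi(5e9, 1.5, 6e9)   # M2 clock rate: 5 GHz

print(f"CPI on M1: {cpi_m1:.2f}")
print(f"CPI on M2: {cpi_m2:.2f}")
```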
Q.3. Consider two different implementations, M1 and M2, of the same instruction set.
There are five classes of instructions (A, B, C, D and E) in the instruction set. M1 has a
clock rate of 4 GHz and M2 has a clock rate of 6 GHz.
(i) Assume that peak performance is defined as the fastest rate that a computer can
execute any instruction sequence. What are the peak performances of M1 and M2
expressed in instructions per second?
M2 is faster than M1 by a factor of (IC × 2.33 × 6 × 10^9) / (IC × 2.67 × 4 × 10^9) = 1.31
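The ratio above is the execution time on M1 divided by the execution time on M2, where time = IC × CPI / clock rate. A small sketch of that calculation follows. Reading 2.33 as the average CPI on M1 and 2.67 as the average CPI on M2 is inferred from the expression itself, and the instruction count is an arbitrary placeholder because it cancels.

```python
def exec_time(instruction_count, cpi, clock_hz):
    """CPU time in seconds = instructions x cycles per instruction / clock rate."""
    return instruction_count * cpi / clock_hz

ic = 1e9  # placeholder: the same IC on both machines, so it cancels in the ratio

# Assumption: 2.33 is M1's average CPI and 2.67 is M2's.
time_m1 = exec_time(ic, 2.33, 4e9)   # M1 at 4 GHz
time_m2 = exec_time(ic, 2.67, 6e9)   # M2 at 6 GHz

speedup_m2_over_m1 = time_m1 / time_m2
print(f"M2 is faster than M1 by a factor of {speedup_m2_over_m1:.2f}")
```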
Q.4. Consider two different implementations, M1 and M2, of the same instruction set. There
are three classes of instructions (A, B, and C) in the instruction set. M1 has a clock rate of
6 GHz and M2 has a clock rate of 3 GHz. The CPI for each instruction class on M1 and
M2 is given in the following table:
The above table also contains a summary of the usage of instruction classes generated by
three different compilers: C1, C2, and C3. Assume that each compiler generates the same
number of instructions for a given program.
(i) Using C1 compiler on both M1 and M2, how much faster is M1 than M2?
(v) Which computer and compiler combination gives the best performance?
Speedup = 1 / (f/s + (1 - f))
=> 3 = 1 / (f/5 + (1 - f))
=> f/5 + 1 - f = 1/3
=> f + 5 - 5f = 5/3
=> 4f = 10/3 ≈ 3.33
=> f ≈ 0.833
Thus, floating-point instructions must account for 83.3% of the initial execution time to
show an overall speedup of 3 on this benchmark.
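The same result can be checked numerically. Rearranging Amdahl's law for f gives f = (1 - 1/S) / (1 - 1/s), where S is the overall speedup and s is the speedup of the enhanced part. The function names below are illustrative only.

```python
def amdahl_speedup(f, s):
    """Overall speedup when a fraction f of the time is sped up by a factor s."""
    return 1.0 / (f / s + (1.0 - f))

def required_fraction(overall_speedup, s):
    """Fraction of the original time that must be enhanced to reach overall_speedup."""
    return (1.0 - 1.0 / overall_speedup) / (1.0 - 1.0 / s)

f = required_fraction(3.0, 5.0)   # overall speedup 3, enhancement factor 5
print(f"f = {f:.3f}")
print(f"check: speedup = {amdahl_speedup(f, 5.0):.3f}")
```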
Q.6. Consider the following fragment of MIPS code. Assume that a and b are arrays of
words and the base address of a is in $a0 and the base address of b is in $a1. How
many instructions are executed during the running of this code? If ALU instructions
(addu and addiu) take 1 cycle to execute, load/store (lw and sw) take 5 cycles to execute,
and the branch (bne) instruction takes 3 cycles to execute, how many cycles are needed to
execute the following code (all iterations)? What is the average CPI?
The loop body will be executed 101 times. Thus, the total number of instructions
executed per class is:
Thus, the total number of instructions executed = 408 + 202 + 101 = 711 instructions.
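The cycle count and average CPI follow from weighting each class count by its cycle cost. The sketch below assumes the three counts map to the classes as 408 ALU, 202 load/store, and 101 branch instructions; the per-class table is not reproduced here, so that mapping is inferred from 202 = 2 × 101 (one lw and one sw per iteration) and 101 (one bne per iteration).

```python
# Assumed mapping of the counts to instruction classes (see note above).
counts = {"alu": 408, "load_store": 202, "branch": 101}

# Cycle costs given in the question.
cycles_per_class = {"alu": 1, "load_store": 5, "branch": 3}

total_instructions = sum(counts.values())
total_cycles = sum(counts[c] * cycles_per_class[c] for c in counts)
average_cpi = total_cycles / total_instructions

print(f"instructions = {total_instructions}")
print(f"cycles       = {total_cycles}")
print(f"average CPI  = {average_cpi:.2f}")
```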
Instruction memory access time = 190 ps, Data memory access time = 190 ps
Register file read access time = 150 ps, Register file write access = 150 ps
ALU delay for basic instructions = 190 ps, Delay for multiply or divide = 550 ps
Ignore the other delays in the multiplexers, control unit, sign-extension, etc.
Assume the following instruction mix: 30% ALU, 15% multiply & divide, 30% load &
store, 15% branch, and 10% jump.
(i) What is the total delay for each instruction class and the clock cycle for the single-
cycle CPU design?
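A sketch of how part (i) can be worked out: sum the delays of the units each instruction class passes through, then take the slowest class as the single-cycle clock period. Which units each class uses is an assumption based on the usual MIPS single-cycle datapath (for example, a jump needs only the instruction fetch); it is not stated in the question.

```python
# Unit delays in picoseconds, from the question.
IM, DM = 190, 190            # instruction memory, data memory
RF_READ, RF_WRITE = 150, 150 # register file read and write
ALU, MULDIV = 190, 550       # basic ALU operation, multiply or divide

# Assumed units on the critical path of each instruction class.
path = {
    "ALU":     [IM, RF_READ, ALU, RF_WRITE],
    "Load":    [IM, RF_READ, ALU, DM, RF_WRITE],
    "Store":   [IM, RF_READ, ALU, DM],
    "Branch":  [IM, RF_READ, ALU],
    "Jump":    [IM],
    "Mul/div": [IM, RF_READ, MULDIV, RF_WRITE],
}

delay_ps = {name: sum(units) for name, units in path.items()}
single_cycle_clock_ps = max(delay_ps.values())  # slowest class sets the clock

for name, d in delay_ps.items():
    print(f"{name:8s} {d} ps")
print(f"single-cycle clock period = {single_cycle_clock_ps} ps")
```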
(ii) Assume we fix the clock cycle to 200 ps for a multi-cycle CPU, what is the CPI for
each instruction class and the speedup over a fixed-length clock cycle? Note that this
implies that multiply and divide operations will be performed in multiple cycles.
Instruction Class    CPI
ALU                  4
Load                 5
Store                4
Branch               3
Jump                 2
Mul/div              6
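The CPI values in the table can be reproduced by rounding each step's delay up to a whole number of 200 ps cycles and adding the steps. The list of steps per class below is an assumption based on the usual multi-cycle MIPS organization (fetch, decode/register read, execute, memory, write-back, with a jump finishing in the decode cycle); only the delays and the 200 ps clock come from the question.

```python
import math

CLOCK_PS = 200  # fixed multi-cycle clock period from part (ii)

# Assumed steps per class, each given by its delay in picoseconds.
steps = {
    "ALU":     [190, 150, 190, 150],       # fetch, reg read, ALU, reg write
    "Load":    [190, 150, 190, 190, 150],  # fetch, reg read, ALU, data mem, reg write
    "Store":   [190, 150, 190, 190],       # fetch, reg read, ALU, data mem
    "Branch":  [190, 150, 190],            # fetch, reg read, ALU compare
    "Jump":    [190, 150],                 # fetch, decode
    "Mul/div": [190, 150, 550, 150],       # fetch, reg read, multi-cycle mul/div, reg write
}

# Each step takes as many whole clock cycles as its delay requires.
cpi = {name: sum(math.ceil(d / CLOCK_PS) for d in delays)
       for name, delays in steps.items()}

for name, c in cpi.items():
    print(f"{name:8s} {c}")
```

The 550 ps multiply or divide step needs three 200 ps cycles, which is why that class takes 6 cycles in total.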