Sample Questions

The document provides solutions to various computer architecture and assembly language problems, comparing the performance of different computers (M1 and M2) based on execution time, instruction execution rates, and CPI calculations. It also discusses the impact of compiler choices on performance and examines the speedup achievable through hardware enhancements. Additionally, it analyzes instruction execution in a MIPS code fragment and compares single-cycle and multi-cycle CPU designs.

Uploaded by

nazatmohammed
Copyright
© © All Rights Reserved

ICS 233, Term 172

Computer Architecture & Assembly Language

HW# 6 Solution

Q.1. We wish to compare the performance of two different computers: M1 and M2. The
following measurements have been made on these computers:

Program    Time on M1      Time on M2
1          2.0 seconds     1.5 seconds
2          5.0 seconds     10.0 seconds

Program    Instructions executed on M1    Instructions executed on M2
1          5 × 10^9                       6 × 10^9

(i) Which computer is faster for each program, and how many times as fast is it?

Computer M2 is faster for program 1, by a factor of 2/1.5 = 1.33.
Computer M1 is faster for program 2, by a factor of 10/5 = 2.
(ii) Find the instruction execution rate (instructions per second) for each computer when
running program 1.

Instruction execution rate for M1 = (5 × 10^9)/2 = 2.5 × 10^9 instructions per second
Instruction execution rate for M2 = (6 × 10^9)/1.5 = 4 × 10^9 instructions per second
(iii) The clock rates for M1 and M2 are 3 GHz and 5 GHz respectively. Find the CPI for
program 1 on both machines.

CPI for program 1 on M1 = (3 × 10^9 x 2)/(5 × 10^9) = 1.2
CPI for program 1 on M2 = (5 × 10^9 x 1.5)/(6 × 10^9) = 1.25
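
The rate and CPI calculations for program 1 can be checked with a short Python sketch. The dictionary and function names are illustrative only and are not part of the original solution; the numbers are the measurements given in the question.

```python
# Measurements for program 1, taken from the question.
time_s = {"M1": 2.0, "M2": 1.5}          # execution time in seconds
instructions = {"M1": 5e9, "M2": 6e9}    # instructions executed
clock_hz = {"M1": 3e9, "M2": 5e9}        # clock rate in Hz


def instruction_rate(machine):
    """Instructions executed per second."""
    return instructions[machine] / time_s[machine]


def cpi(machine):
    """Cycles per instruction = (clock rate x time) / instruction count."""
    return clock_hz[machine] * time_s[machine] / instructions[machine]


for m in ("M1", "M2"):
    print(f"{m}: rate = {instruction_rate(m):.2e} instr/s, CPI = {cpi(m):.2f}")
```

Note that M2 has the higher CPI but still finishes program 1 sooner, because its faster clock more than compensates.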
(iv) Suppose that program 1 must be executed 1600 times each hour. Any remaining time
should be used to run program 2. Which computer is faster for this workload?
Performance is measured here by the throughput of program 2.

Executing program 1 on M1 1600 times each hour consumes 1600 x 2 = 3200 seconds.
The remaining time for running program 2 on M1 is 3600 - 3200 = 400 seconds.
Thus, program 2 can be run on M1 400/5 = 80 times.

Executing program 1 on M2 1600 times each hour consumes 1600 x 1.5 = 2400 seconds.
The remaining time for running program 2 on M2 is 3600 - 2400 = 1200 seconds.
Thus, program 2 can be run on M2 1200/10 = 120 times.

Thus, for this workload computer M2 is faster.
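
The same workload calculation, written as a small Python sketch. The constant and function names are assumptions made for illustration; the times are those measured in the question.

```python
SECONDS_PER_HOUR = 3600
RUNS_OF_PROGRAM_1 = 1600

# (time of program 1, time of program 2) in seconds, from the question.
times = {"M1": (2.0, 5.0), "M2": (1.5, 10.0)}


def program2_runs_per_hour(machine):
    """Runs of program 2 that fit in the time left after 1600 runs of program 1."""
    t1, t2 = times[machine]
    remaining = SECONDS_PER_HOUR - RUNS_OF_PROGRAM_1 * t1
    return remaining / t2


for m in ("M1", "M2"):
    print(m, program2_runs_per_hour(m))
```

The machine with the larger result has the higher throughput for program 2 under this workload.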
Q.2. Suppose you wish to run a program P with 7.5 × 10^9 instructions on a 5 GHz machine
with a CPI of 1.2.
(i) What is the CPU execution time?

CPU execution time = (7.5 × 10^9 x 1.2)/(5 × 10^9) = 1.8 seconds.


(ii) When you run program P, it takes 3 seconds of wall time to complete. What is the
percentage of the CPU time program P received?

The percentage of the CPU time program P received = 1.8/3 = 0.6 = 60%
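
A minimal Python sketch of the two steps in Q.2, using the values from the question. The variable names are illustrative.

```python
instruction_count = 7.5e9   # instructions in program P
cpi = 1.2                   # average cycles per instruction
clock_hz = 5e9              # 5 GHz clock
wall_time_s = 3.0           # elapsed (wall-clock) time in seconds

# CPU time = instruction count x CPI / clock rate
cpu_time_s = instruction_count * cpi / clock_hz

# Share of the elapsed time that the CPU actually spent on P
cpu_share = cpu_time_s / wall_time_s

print(f"CPU time = {cpu_time_s:.2f} s, CPU share = {cpu_share:.0%}")
```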

Q.3. Consider two different implementations, M1 and M2, of the same instruction set.
There are five classes of instructions (A, B, C, D and E) in the instruction set. M1 has a
clock rate of 4 GHz and M2 has a clock rate of 6 GHz.

Class    CPI on M1    CPI on M2
A        1            2
B        2            2
C        3            2
D        4            4
E        3            4

(i) Assume that peak performance is defined as the fastest rate that a computer can
execute any instruction sequence. What are the peak performances of M1 and M2
expressed in instructions per second?

The peak performance of M1 is reached when it executes only class A instructions
(CPI = 1): 4 x 10^9/1 = 4 x 10^9 instructions per second.
The peak performance of M2 is reached when it executes only class A, B or C
instructions (CPI = 2): 6 x 10^9/2 = 3 x 10^9 instructions per second.
(ii) If the number of instructions executed in a certain program is divided equally among
the classes of instructions, except that for class A, which occurs twice as often as
each of the others, how much faster is M2 than M1?

CPI for M1 = (2x1 + 1x2 + 1x3 + 1x4 + 1x3)/(2+1+1+1+1) = 14/6 = 2.33
CPI for M2 = (2x2 + 1x2 + 1x2 + 1x4 + 1x4)/(2+1+1+1+1) = 16/6 = 2.67

M2 is faster than M1 by a factor = (IC x 2.33 x 6 x 10^9)/(IC x 2.67 x 4 x 10^9) = 1.31
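
The peak rate, the weighted CPI and the speedup in Q.3 can be reproduced with the sketch below. The names are illustrative; the weights encode "class A occurs twice as often as each of the others". The speedup is computed from the unrounded CPIs, so it comes out as 1.3125 rather than the rounded 1.31.

```python
# CPI per class and clock rates, taken from the Q.3 table.
cpi_table = {
    "M1": {"A": 1, "B": 2, "C": 3, "D": 4, "E": 3},
    "M2": {"A": 2, "B": 2, "C": 2, "D": 4, "E": 4},
}
clock_hz = {"M1": 4e9, "M2": 6e9}

# Relative frequency of each class: A occurs twice as often as the others.
weights = {"A": 2, "B": 1, "C": 1, "D": 1, "E": 1}


def peak_rate(machine):
    """Best case: every instruction belongs to the class with the lowest CPI."""
    return clock_hz[machine] / min(cpi_table[machine].values())


def average_cpi(machine):
    """CPI weighted by how often each class occurs."""
    total = sum(weights.values())
    return sum(cpi_table[machine][c] * w for c, w in weights.items()) / total


def time_per_instruction(machine):
    return average_cpi(machine) / clock_hz[machine]


# The instruction count is the same on both machines, so it cancels out.
speedup_m2_over_m1 = time_per_instruction("M1") / time_per_instruction("M2")
print(peak_rate("M1"), peak_rate("M2"), round(speedup_m2_over_m1, 4))
```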
Q.4. Consider two different implementations, M1 and M2, of the same instruction set. There
are three classes of instructions (A, B, and C) in the instruction set. M1 has a clock rate of
6 GHz and M2 has a clock rate of 3 GHz. The CPI for each instruction class on M1 and
M2 is given in the following table:

Class    CPI on M1    CPI on M2    C1 Usage    C2 Usage    C3 Usage
A        2            1            40%         40%         60%
B        3            2            40%         20%         15%
C        5            2            20%         40%         25%

The above table also contains a summary of the usage of instruction classes generated by
three different compilers: C1, C2, and C3. Assume that each compiler generates the same
number of instructions for a given program.
(i) Using C1 compiler on both M1 and M2, how much faster is M1 than M2?

CPI for M1 using C1 compiler = 2x0.4 + 3x0.4 + 5x0.2 = 3
CPI for M2 using C1 compiler = 1x0.4 + 2x0.4 + 2x0.2 = 1.6
M1 is faster than M2 using C1 compiler by a factor =
(IC x 1.6 x 6 x 10^9)/(IC x 3 x 3 x 10^9) = 1.07
(ii) Using C2 compiler on both M1 and M2, how much faster is M2 than M1?

CPI for M1 using C2 compiler = 2x0.4 + 3x0.2 + 5x0.4 = 3.4
CPI for M2 using C2 compiler = 1x0.4 + 2x0.2 + 2x0.4 = 1.6
M2 is faster than M1 using C2 compiler by a factor =
(IC x 3.4 x 3 x 10^9)/(IC x 1.6 x 6 x 10^9) = 1.06

(iii) If you purchase M1, which compiler would you use?

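CPI for M1 using C3 compiler = 2x0.6 + 3x0.15 + 5x0.25 = 2.9
Since every compiler generates the same number of instructions, the compiler with
the lowest CPI gives the shortest execution time. On M1, C3 (CPI 2.9) beats
C1 (CPI 3) and C2 (CPI 3.4). Thus, compiler C3 will be used.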
(iv) If you purchase M2, which compiler would you use?

CPI for M2 using C3 compiler = 1x0.6 + 2x0.15 + 2x0.25 = 1.4
This is lower than the CPI of 1.6 obtained on M2 with either C1 or C2.
Thus, compiler C3 will be used.

(v) Which computer and compiler combination give the best performance?

Computer M2 with compiler C3 gives the best performance. It is faster than M1 with C3
by a factor = (IC x 2.9 x 3 x 10^9)/(IC x 1.4 x 6 x 10^9) = 1.04
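
All six machine and compiler combinations in Q.4 can be compared in one pass. The sketch below is illustrative (the names are not from the original solution) and relies on the stated assumption that every compiler generates the same number of instructions, so only the time per instruction matters.

```python
# CPI per instruction class and clock rates, taken from the Q.4 table.
CPI = {"M1": {"A": 2, "B": 3, "C": 5}, "M2": {"A": 1, "B": 2, "C": 2}}
CLOCK_HZ = {"M1": 6e9, "M2": 3e9}

# Fraction of instructions in each class generated by each compiler.
MIX = {
    "C1": {"A": 0.40, "B": 0.40, "C": 0.20},
    "C2": {"A": 0.40, "B": 0.20, "C": 0.40},
    "C3": {"A": 0.60, "B": 0.15, "C": 0.25},
}


def average_cpi(machine, compiler):
    return sum(CPI[machine][c] * MIX[compiler][c] for c in "ABC")


def time_per_instruction(machine, compiler):
    """Seconds per instruction; the common instruction count cancels out."""
    return average_cpi(machine, compiler) / CLOCK_HZ[machine]


combos = [(m, c) for m in CPI for c in MIX]
best = min(combos, key=lambda mc: time_per_instruction(*mc))

for m, c in combos:
    ns = time_per_instruction(m, c) * 1e9
    print(f"{m} + {c}: CPI = {average_cpi(m, c):.2f}, {ns:.4f} ns/instruction")
print("best combination:", best)
```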
Q.5. A benchmark program runs for 100 seconds. We want to speed up the benchmark by a
factor of 3. We enhance the floating-point hardware to make floating-point
instructions run 5 times faster. How much of the initial execution time would
floating-point instructions have to account for to show an overall speedup of 3 on this
benchmark?

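Let f be the fraction of the initial execution time spent in floating-point
instructions and s = 5 the speedup of those instructions. By Amdahl's law:

Speedup = 1 / (f/s + (1 - f))
3 = 1 / (f/5 + (1 - f))
f/5 + 1 - f = 1/3
f + 5 - 5f = 5/3          (multiplying both sides by 5)
4f = 5 - 5/3 = 3.33
f = 0.833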

Thus, floating-point instructions must account for 83.3% of the initial execution time to
show an overall speedup of 3 on this benchmark.
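
Solving Amdahl's law for f gives the closed form f = (1 - 1/Speedup) / (1 - 1/s). The sketch below (function names are illustrative) evaluates it and then plugs the result back into the speedup formula as a check.

```python
def amdahl_speedup(f, s):
    """Overall speedup when a fraction f of the time is sped up by a factor s."""
    return 1 / (f / s + (1 - f))


def required_fraction(overall_speedup, s):
    """Fraction of the original time that must be enhanced to reach overall_speedup."""
    return (1 - 1 / overall_speedup) / (1 - 1 / s)


f = required_fraction(3, 5)
print(f"f = {f:.3f}")
print(f"check: speedup = {amdahl_speedup(f, 5):.3f}")
```

For the 100-second benchmark this corresponds to about 83.3 seconds of floating-point work in the original run.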

Q.6. Consider the following fragment of MIPS code. Assume that a and b are arrays of
words and the base address of a is in $a0 and the base address of b is in $a1. How
many instructions are executed during the running of this code? If ALU instructions
(addu and addiu) take 1 cycle to execute, load/store (lw and sw) take 5 cycles to execute,
and the branch (bne) instruction takes 3 cycles to execute, how many cycles are needed to
execute the following code (all iterations). What is the average CPI?

      addu  $t0, $zero, $zero   # i = 0
      addu  $t1, $a0, $zero     # $t1 = address of a[i]
      addu  $t2, $a1, $zero     # $t2 = address of b[i]
      addiu $t3, $zero, 101     # $t3 = 101 (max i)
loop: lw    $t4, 0($t2)         # $t4 = b[i]
      addu  $t5, $t4, $s0       # $t5 = b[i] + c
      sw    $t5, 0($t1)         # a[i] = b[i] + c
      addiu $t0, $t0, 1         # i++
      addiu $t1, $t1, 4         # address of next a[i]
      addiu $t2, $t2, 4         # address of next b[i]
      bne   $t0, $t3, loop      # loop if (i != 101)

The loop body will be executed 101 times. Thus, the total number of instructions
executed per class is:

Class             Instruction Count
addu and addiu    4 + 101x4 = 408
lw and sw         101x2 = 202
bne               101

Thus, the total number of instructions executed = 408 + 202 + 101 = 711 instructions.

Total number of cycles needed to execute the code = 408x1 + 202x5 + 101x3 = 1721 cycles.

The average CPI = 1721/711 = 2.42
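
The counts can be cross-checked by stepping through the loop in Python instead of multiplying by hand. This is only a counting sketch under the cycle costs given in the question; it does not simulate the MIPS instructions themselves, and the names are illustrative.

```python
# Cycles per instruction class, as given in the question.
CYCLES = {"alu": 1, "mem": 5, "branch": 3}

# The four setup instructions before the loop are all ALU instructions.
counts = {"alu": 4, "mem": 0, "branch": 0}

i = 0
while True:
    counts["mem"] += 2      # lw and sw
    counts["alu"] += 4      # one addu and three addiu
    counts["branch"] += 1   # bne
    i += 1                  # mirrors addiu $t0, $t0, 1
    if i == 101:            # bne falls through when i == 101
        break

total_instructions = sum(counts.values())
total_cycles = sum(counts[k] * CYCLES[k] for k in counts)
average_cpi = total_cycles / total_instructions

print(counts, total_instructions, total_cycles, round(average_cpi, 2))
```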


Q.7. We want to compare the performance of a single-cycle CPU design with a multicycle
CPU. Suppose we add the multiply and divide instructions. The operation times are as
follows:

Instruction memory access time = 190 ps, Data memory access time = 190 ps
Register file read access time = 150 ps, Register file write access = 150 ps
ALU delay for basic instructions = 190 ps, Delay for multiply or divide = 550 ps

Ignore the other delays in the multiplexers, control unit, sign-extension, etc.

Assume the following instruction mix: 30% ALU, 15% multiply & divide, 30% load &
store, 15% branch, and 10% jump.
(i) What is the total delay for each instruction class and the clock cycle for the single-
cycle CPU design?

Instruction   Instruction   Register   ALU         Data     Register   Total
Class         Memory        Read       Operation   Memory   Write
ALU           190           150        190                  150        680 ps
Load          190           150        190         190      150        870 ps
Store         190           150        190         190                 720 ps
Branch        190           150        190                             530 ps
Jump          190                                                      190 ps
Mul/div       190           150        550                  150        1040 ps

Clock cycle = 1040 ps determined by the longest delay.

(ii) Assume we fix the clock cycle to 200 ps for a multi-cycle CPU, what is the CPI for
each instruction class and the speedup over a fixed-length clock cycle? Note that this
implies that multiply and divide operations will be performed in multiple cycles.

Instruction Class    CPI
ALU                  4
Load                 5
Store                4
Branch               3
Jump                 2
Mul/div              6

Average CPI = 4*0.3 (ALU) + 5*0.15 (load) + 4*0.15 (store) + 3*0.15 (branch)
+ 2*0.1 (jump) + 6*0.15 (mul/div) = 4.1

Note that the 30% load & store share is assumed to be split equally: 15% loads and
15% stores.

Speedup = 1040 ps / (4.1*200 ps) = 1.268
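
The comparison in Q.7 can be reproduced with the sketch below. The stage delays, instruction mix and multi-cycle CPI values are taken from the solution above; the names are illustrative, and the equal 15%/15% split of loads and stores is the same assumption the solution makes.

```python
# Component delays in picoseconds, from the problem statement.
IM, RR, ALU, DM, RW, MULDIV = 190, 150, 190, 190, 150, 550

# Components used by each instruction class (single-cycle datapath).
STAGES = {
    "alu": [IM, RR, ALU, RW],
    "load": [IM, RR, ALU, DM, RW],
    "store": [IM, RR, ALU, DM],
    "branch": [IM, RR, ALU],
    "jump": [IM],
    "muldiv": [IM, RR, MULDIV, RW],
}

# Multi-cycle CPI per class, as listed in the solution table.
MULTICYCLE_CPI = {"alu": 4, "load": 5, "store": 4, "branch": 3, "jump": 2, "muldiv": 6}

# Instruction mix; loads and stores are assumed to split the 30% equally.
MIX = {"alu": 0.30, "load": 0.15, "store": 0.15, "branch": 0.15, "jump": 0.10, "muldiv": 0.15}

MULTICYCLE_CLOCK_PS = 200

# The single-cycle clock must fit the slowest instruction class.
single_cycle_clock_ps = max(sum(delays) for delays in STAGES.values())

average_cpi = sum(MULTICYCLE_CPI[k] * MIX[k] for k in MIX)
speedup = single_cycle_clock_ps / (average_cpi * MULTICYCLE_CLOCK_PS)

print(single_cycle_clock_ps, round(average_cpi, 2), round(speedup, 3))
```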
