Chap 2 Exercises With Solutions
Ex. 1.4: Assume a color display using 8 bits for each of the primary colors (red, green, blue) per
pixel and a frame size of 1280 × 1024.
a. What is the minimum size in bytes of the frame buffer to store a frame?
b. How long would it take, at a minimum, for the frame to be sent over a 100 Mbit/s network?
Solution:
a. 1280 * 1024 pixels = 1,310,720 pixels => 1,310,720 * 3 = 3,932,160 bytes/frame.
b. 3,932,160 bytes * (8 bits/byte) /100E6 bits/second = 0.31 seconds
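The two steps above can be checked with a short Python sketch. It assumes 3 bytes per pixel (one byte each for red, green, and blue), treats 100 Mbit/s as 100 * 10^6 bit/s as the solution does, and ignores any network protocol overhead. The function names are illustrative only.

```python
def frame_buffer_bytes(width: int, height: int, bytes_per_pixel: int = 3) -> int:
    """Minimum frame-buffer size: one byte per primary color per pixel."""
    return width * height * bytes_per_pixel


def transfer_time_s(num_bytes: int, link_bits_per_s: float) -> float:
    """Time to send num_bytes over a link, ignoring protocol overhead."""
    return num_bytes * 8 / link_bits_per_s


size = frame_buffer_bytes(1280, 1024)
time_s = transfer_time_s(size, 100e6)
print(size, round(time_s, 2))  # 3932160 0.31
```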
Ex. 1.5
Consider three different processors P1, P2, and P3 executing the same instruction set. P1 has a
3 GHz clock rate and a CPI of 1.5. P2 has a 2.5 GHz clock rate and a CPI of 1.0. P3 has a 4.0 GHz
clock rate and has a CPI of 2.2.
a. Which processor has the highest performance expressed in instructions per second?
b. If the processors each execute a program in 10 seconds, find the number of cycles and the
number of instructions.
c. We are trying to reduce the execution time by 30% but this leads to an increase of 20% in the
CPI. What clock rate should we have to get this time reduction?
a) Instructions per second = clock rate / CPI
P1: clock rate = 3 GHz = 3*10^9 cycles/s, CPI = 1.5
=> 3*10^9 / 1.5 = 2*10^9 instructions per second
The same calculation applies to P2 and P3:
P2: 2.5*10^9 / 1.0 = 2.5*10^9 instructions per second
P3: 4*10^9 / 2.2 ≈ 1.82*10^9 instructions per second
P2 has the highest performance.
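The comparison in part a can be sketched in a few lines of Python, using the clock rates and CPIs from the exercise statement; the dictionary names are illustrative.

```python
# Clock rate (Hz) and CPI for each processor, from the exercise statement.
processors = {"P1": (3.0e9, 1.5), "P2": (2.5e9, 1.0), "P3": (4.0e9, 2.2)}

# Instructions per second = cycles per second / cycles per instruction.
ips = {name: rate / cpi for name, (rate, cpi) in processors.items()}
best = max(ips, key=ips.get)

for name, value in ips.items():
    print(f"{name}: {value:.3e} instructions/s")
print("Highest performance:", best)  # P2
```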
b)
For P1:
From a), P1 executes 2*10^9 instructions per second.
If the execution time is 10 s, the total number of instructions is
2*10^9 * 10 = 2*10^10.
The CPI of P1 is 1.5, so the total number of cycles is 1.5 * 2*10^10 = 3*10^10.
Second method:
Number of cycles for P1:
The clock rate is 3 GHz = 3*10^9 Hz = 3*10^9 cycles per second.
If the program runs for 10 s, the total number of cycles is 3*10^9 * 10 = 30*10^9 cycles.
The total number of instructions is the total number of cycles divided by the number of cycles per instruction:
nb of cycles / CPI = 30*10^9 / 1.5 = 2*10^10 instructions.
The same calculation applies to P2 and P3:
Number of cycles for P2 = 25*10^9
Number of cycles for P3 = 40*10^9
For the number of instructions, or instruction count (IC), with CR the clock rate:
CPU time = (IC * CPI) / CR => IC = (CPU time * CR) / CPI, the same formula for P1, P2, and P3.
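The formula IC = (CPU time * CR) / CPI can be applied to all three processors with a small Python sketch; the 10 s run time and the clock rates and CPIs come from the exercise, and the variable names are illustrative.

```python
# Clock rate (Hz) and CPI for each processor, from the exercise statement.
processors = {"P1": (3.0e9, 1.5), "P2": (2.5e9, 1.0), "P3": (4.0e9, 2.2)}
cpu_time_s = 10.0

results = {}
for name, (rate, cpi) in processors.items():
    cycles = cpu_time_s * rate   # cycles per second x seconds
    instructions = cycles / cpi  # IC = (CPU time x CR) / CPI
    results[name] = (cycles, instructions)
    print(f"{name}: {cycles:.3e} cycles, {instructions:.3e} instructions")
```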
c) Clock rate = (instruction count * CPI) / CPU time
For P1, the instruction count is unchanged: 2*10^10.
The CPI increases by 20% => new CPI = 1.5 * 1.2 = 1.8.
The CPU time decreases by 30% => new CPU time = 10 * 0.7 = 7 s.
Clock rate = (2*10^10 * 1.8) / 7 ≈ 5.14 GHz
The same calculation applies to P2 and P3:
CR for P2 ≈ 4.29 GHz
CR for P3 ≈ 6.86 GHz
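A minimal Python sketch of part c, assuming (as the solution does) that the instruction count from part b stays the same and only the CPI and the target time change; the names are illustrative.

```python
# Clock rate (Hz) and CPI for each processor, from the exercise statement.
processors = {"P1": (3.0e9, 1.5), "P2": (2.5e9, 1.0), "P3": (4.0e9, 2.2)}
old_time_s = 10.0
new_time_s = old_time_s * (1 - 0.30)  # 30% shorter execution time

new_rates = {}
for name, (rate, cpi) in processors.items():
    ic = old_time_s * rate / cpi      # instruction count is unchanged
    new_cpi = cpi * 1.20              # CPI grows by 20%
    new_rates[name] = ic * new_cpi / new_time_s

for name, rate in new_rates.items():
    print(f"{name}: {rate / 1e9:.2f} GHz")  # P1 5.14, P2 4.29, P3 6.86
```

Because the instruction count cancels, every processor needs 1.2 / 0.7 ≈ 1.71 times its original clock rate.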
Ex. 1.6
Consider two different implementations of the same instruction set architecture. The
instructions can be divided into four classes according to their CPI (class A, B, C, and D). P1 with
a clock rate of 2.5 GHz and CPIs of 1, 2, 3, and 3, and P2 with a clock rate of 3 GHz and CPIs of 2,
2, 2, and 2.
Given a program with a dynamic instruction count of 1.0E6 instructions divided into classes as
follows: 10% class A, 20% class B, 50% class C, and 20% class D.
a. Which implementation is faster?
Sol: for P1: CPU time = (instruction count * CPI) / clock rate
CPU time for P1 = (IC for class A * CPI class A + IC for class B * CPI class B + ...) / (2.5*10^9)
= (10^6*0.1*1 + 10^6*0.2*2 + 10^6*0.5*3 + 10^6*0.2*3) / (2.5*10^9)
= 2.6*10^6 / (2.5*10^9)
CPU time for P1 = 10.4*10^-4 s
The same calculation for P2 gives: CPU time for P2 = 2*10^6 / (3*10^9) ≈ 6.67*10^-4 s
So P2 is faster.
b. What is the global CPI for each implementation?
Global CPI for P1:
CPU time for P1 = (IC for P1 * global CPI for P1) / clock rate for P1
Global CPI = (CPU time * clock rate) / IC for P1 = (10.4*10^-4 * 2.5*10^9) / 10^6 = 2.6
Global CPI for P2 = 2.0
c. Find the clock cycles required in both cases.
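For P1:
General formula: total number of cycles = sum over classes of (IC of the class * CPI of the class)
= 10^6*0.1*1 + 10^6*0.2*2 + 10^6*0.5*3 + 10^6*0.2*3 = 2.6*10^6 cycles
Equivalently, there are 10^6 instructions and the global CPI of P1 is 2.6, so each instruction needs 2.6 cycles on average => 10^6 * 2.6 = 2.6*10^6 cycles.
For P2: clock cycles needed = 10^6 * 2.0 = 20*10^5 cycles.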
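All three parts of Ex. 1.6 follow from the weighted cycle count, as in this small Python sketch; the instruction mix, per-class CPIs, and clock rates are those of the exercise, and the names are illustrative.

```python
# Instruction mix and per-class CPIs from the exercise (classes A, B, C, D).
ic = 1.0e6
mix = [0.10, 0.20, 0.50, 0.20]
machines = {
    "P1": (2.5e9, [1, 2, 3, 3]),
    "P2": (3.0e9, [2, 2, 2, 2]),
}

results = {}
for name, (rate, cpis) in machines.items():
    cycles = sum(ic * f * c for f, c in zip(mix, cpis))  # total clock cycles
    global_cpi = cycles / ic                              # weighted average CPI
    cpu_time = cycles / rate                              # seconds
    results[name] = (cycles, global_cpi, cpu_time)
    print(f"{name}: {cycles:.3g} cycles, CPI {global_cpi:.2f}, {cpu_time:.3g} s")
```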
Ex. 1.7
Compilers can have a profound impact on the performance of an application. Assume that for a
program, compiler A results in a dynamic instruction count of 1.0E9 and has an execution time
of 1.1 s, while compiler B results in a dynamic instruction count of 1.2E9 and an execution time
of 1.5 s.
a. Find the average CPI for each program given that the processor has a clock cycle time of 1 ns.
For compiler A:
CPU time = instruction count * CPI * cycle time => CPI = CPU time / (instruction count * cycle time) = 1.1 / (10^9 * 10^-9) = 1.1
The same calculation for compiler B gives CPI = 1.5 / (1.2*10^9 * 10^-9) = 1.25.
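The rearranged formula CPI = CPU time / (IC * cycle time) can be checked with this short Python sketch; the numbers are the exercise's, the names are illustrative.

```python
cycle_time_s = 1e-9  # 1 ns clock cycle
# (dynamic instruction count, execution time in seconds) for each compiler
compilers = {"A": (1.0e9, 1.1), "B": (1.2e9, 1.5)}

cpi = {name: t / (ic * cycle_time_s) for name, (ic, t) in compilers.items()}
for name, value in cpi.items():
    print(f"compiler {name}: CPI = {value:.2f}")  # A 1.10, B 1.25
```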
b. Assume the compiled programs run on two different processors. If the execution times on
the two processors are the same, how much faster is the clock of the processor running
compiler A’s code versus the clock of the processor running compiler B’s code?
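CPU time for A = CPU time for B
(IC for A * CPI for A) / CR for A = (IC for B * CPI for B) / CR for B
Clock rate of B / clock rate of A = (IC for B * CPI for B) / (IC for A * CPI for A) = (1.2*10^9 * 1.25) / (10^9 * 1.1) ≈ 1.36
So the clock of the processor running compiler A's code only needs to run at 1/1.36 ≈ 0.73 times the clock rate of the processor running compiler B's code; equivalently, B's clock must be about 1.36 times faster.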
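Setting the two CPU times equal means the clock-rate ratio is just the ratio of total cycle counts. A minimal Python check, with illustrative names:

```python
ic_a, cpi_a = 1.0e9, 1.1
ic_b, cpi_b = 1.2e9, 1.25

# Equal CPU times: IC_A*CPI_A/CR_A == IC_B*CPI_B/CR_B, so the clock-rate
# ratio equals the ratio of total cycle counts.
ratio_b_over_a = (ic_b * cpi_b) / (ic_a * cpi_a)
print(f"CR_B / CR_A = {ratio_b_over_a:.2f}")      # 1.36
print(f"CR_A / CR_B = {1 / ratio_b_over_a:.2f}")  # 0.73
```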
c. A new compiler is developed that uses only 6.0E8 instructions and has an average CPI of 1.1.
What is the speedup of using this new compiler versus using compiler A or B on the original
processor?
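T_A / T_new = (IC for A * CPI for A * cycle time) / (IC for new * CPI for new * cycle time) = (10^9 * 1.1) / (6*10^8 * 1.1) ≈ 1.67
T_B / T_new = (1.2*10^9 * 1.25) / (6*10^8 * 1.1) ≈ 2.27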
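Since all three programs run on the same processor, the cycle time cancels and each speedup is a ratio of cycle counts. A short Python sketch with illustrative names:

```python
cycles_a = 1.0e9 * 1.1     # compiler A: IC x CPI
cycles_b = 1.2e9 * 1.25    # compiler B
cycles_new = 6.0e8 * 1.1   # new compiler

# Same processor, so the cycle time cancels in each ratio.
speedup_vs_a = cycles_a / cycles_new
speedup_vs_b = cycles_b / cycles_new
print(f"{speedup_vs_a:.2f} {speedup_vs_b:.2f}")  # 1.67 2.27
```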
Ex. 1.12
Consider the following two processors: P1 has a clock rate of 4 GHz, average CPI of 0.9, and
requires the execution of 5.0E9 instructions; P2 has a clock rate of 3 GHz, an average CPI of
0.75, and requires the execution of 1.0E9 instructions.
a- One usual fallacy is to consider the computer with the largest clock rate as having the largest
performance. Check if this is true for P1 and P2.
Sol.: CPU time for P1 = (IC * CPI) / CR = (5*10^9 * 0.9) / (4*10^9) = 1.125 s
CPU time for P2 = (10^9 * 0.75) / (3*10^9) = 0.25 s. So the claim is not true: P2 has the lower clock rate but is faster than P1.
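A minimal Python check of part a, using CPU time = IC * CPI / clock rate; the helper name `cpu_time` is illustrative.

```python
def cpu_time(ic: float, cpi: float, clock_hz: float) -> float:
    """CPU time in seconds = instruction count x CPI / clock rate."""
    return ic * cpi / clock_hz


t1 = cpu_time(5.0e9, 0.9, 4e9)   # P1: higher clock rate
t2 = cpu_time(1.0e9, 0.75, 3e9)  # P2: lower clock rate
print(t1, t2)  # 1.125 0.25
```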
b- Another fallacy is to consider that the processor executing the largest number of instructions
will need a larger CPU time. Considering that processor P1 is executing a sequence of 1.0E9
instructions and that the CPI of processors P1 and P2 do not change, determine the number of
instructions that P2 can execute in the same time that P1 needs to execute 1.0E9 instructions.
Sol.: To execute 1*10^9 instructions, P1 needs (1*10^9 * 0.9) / (4*10^9) = 0.225 s.
P2 gets the same CPU time => 0.225 = (IC * 0.75) / (3*10^9) => IC = 0.9*10^9 instructions
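Part b inverts the CPU-time formula for P2. A short Python sketch, with illustrative variable names:

```python
# P1 runs 1e9 instructions at CPI 0.9 on a 4 GHz clock.
t_p1 = 1.0e9 * 0.9 / 4e9

# P2 gets the same time budget: IC = CPU time x clock rate / CPI.
ic_p2 = t_p1 * 3e9 / 0.75
print(f"P1 time = {t_p1:.3f} s, P2 instructions = {ic_p2:.2e}")
```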
c- A common fallacy is to use MIPS (millions of instructions per second) to compare the
performance of two different processors, and consider that the processor with the largest MIPS
has the largest performance.
Check if this is true for P1 and P2.
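Part c can be checked numerically. This is a minimal sketch, assuming MIPS = clock rate / (CPI * 10^6), which equals instruction count / (execution time * 10^6), and the instruction counts of part a; the dictionary names are illustrative.

```python
# (clock rate in Hz, CPI, instruction count) from the exercise statement.
procs = {"P1": (4e9, 0.9, 5.0e9), "P2": (3e9, 0.75, 1.0e9)}

mips, times = {}, {}
for name, (rate, cpi, ic) in procs.items():
    mips[name] = rate / (cpi * 1e6)  # millions of instructions per second
    times[name] = ic * cpi / rate    # CPU time in seconds
    print(f"{name}: {mips[name]:.1f} MIPS, {times[name]:.3f} s")
```

With these inputs P1 shows the higher MIPS figure (about 4444 vs 4000) yet needs more CPU time, so the processor with the larger MIPS value is not the faster one here.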
d- Another common performance figure is MFLOPS (millions of floating-point operations per
second), defined as MFLOPS = No. FP operations / (execution time × 1E6) but this figure has the
same problems as MIPS. Assume that 40% of the instructions executed on both P1 and P2 are
floating-point instructions. Find the MFLOPS figures for the programs.
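A matching sketch for part d, applying the MFLOPS definition above; it assumes the programs of part a (5.0E9 instructions on P1, 1.0E9 on P2) and the stated 40% floating-point share. Names are illustrative.

```python
fp_fraction = 0.40
# (clock rate in Hz, CPI, instruction count) from the exercise statement.
procs = {"P1": (4e9, 0.9, 5.0e9), "P2": (3e9, 0.75, 1.0e9)}

mflops = {}
for name, (rate, cpi, ic) in procs.items():
    time_s = ic * cpi / rate
    # MFLOPS = No. FP operations / (execution time x 1E6)
    mflops[name] = fp_fraction * ic / (time_s * 1e6)
    print(f"{name}: {mflops[name]:.1f} MFLOPS")
```

This gives about 1778 MFLOPS for P1 and 1600 for P2, so P1 again looks better by this metric even though it takes longer to run its program.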
Ex. 1.14
Assume a program requires the execution of 50 × 10^6 FP instructions, 110 × 10^6 INT
instructions, 80 × 10^6 L/S instructions, and 16 × 10^6 branch instructions. The CPI for each type
of instruction is 1, 1, 4, and 2, respectively.
Assume that the processor has a 2 GHz clock rate.
1.14.1 By how much must we improve the CPI of FP instructions if we want the program to run
two times faster?
1.14.2 By how much must we improve the CPI of L/S instructions if we want the program to run
two times faster?
1.14.3 By how much is the execution time of the program improved if the CPI of INT and FP
instructions is reduced by 40% and the CPI of L/S and Branch is reduced by 30%?
Solution:
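The arithmetic for all three parts can be sketched in Python using the counts, CPIs, and 2 GHz clock given above. "Two times faster" is taken to mean half the original cycle count, and the helper names (`total_cycles`, `cpi_needed`) are illustrative.

```python
# Instruction counts and CPIs from the exercise statement.
counts = {"FP": 50e6, "INT": 110e6, "LS": 80e6, "BR": 16e6}
cpis = {"FP": 1, "INT": 1, "LS": 4, "BR": 2}
clock_hz = 2e9


def total_cycles(cpi_table):
    """Sum of instruction count x CPI over all instruction classes."""
    return sum(counts[k] * cpi_table[k] for k in counts)


base_cycles = total_cycles(cpis)    # baseline cycle count
base_time = base_cycles / clock_hz  # baseline execution time in seconds
target = base_cycles / 2            # cycle budget for a 2x faster run


def cpi_needed(cls):
    """CPI that class `cls` would need for the whole program to hit the target."""
    other = base_cycles - counts[cls] * cpis[cls]
    return (target - other) / counts[cls]


cpi_fp_needed = cpi_needed("FP")  # 1.14.1: a negative value means impossible
cpi_ls_needed = cpi_needed("LS")  # 1.14.2

# 1.14.3: INT/FP CPIs cut by 40%, L/S and branch CPIs cut by 30%.
reduced = {k: cpis[k] * (0.6 if k in ("FP", "INT") else 0.7) for k in cpis}
new_time = total_cycles(reduced) / clock_hz
speedup = base_time / new_time

print(f"base time = {base_time:.4f} s")
print(f"required FP CPI = {cpi_fp_needed:.2f}, required L/S CPI = {cpi_ls_needed:.2f}")
print(f"new time = {new_time:.4f} s, speedup = {speedup:.2f}")
```

The baseline is 512 × 10^6 cycles (0.256 s). The required FP CPI comes out negative (about -4.12) because the non-FP instructions alone need 462 × 10^6 cycles, more than the 256 × 10^6 cycle budget, so improving FP instructions alone cannot make the program twice as fast. The L/S CPI would have to drop from 4 to 0.8. With the reductions of 1.14.3 the program takes 342.4 × 10^6 cycles (0.1712 s), a speedup of about 1.5.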
Ex. 1.9
Assume for arithmetic, load/store, and branch instructions, a processor has CPIs of 1, 12, and 5,
respectively. Also assume that on a single processor a program requires the execution of 2.56E9
arithmetic instructions, 1.28E9 load/store instructions, and 256 million branch instructions.
Assume that each processor has a 2 GHz clock frequency.
Assume that, as the program is parallelized to run over multiple cores, the number of
arithmetic and load/store instructions per processor is divided by 0.7 x p (where p is the
number of processors) but the number of branch instructions per processor remains the same.
1.9.1 Find the total execution time for this program on 1, 2, 4, and 8 processors, and show the
relative speedup of the 2, 4, and 8 processor result relative to the single processor result.
1.9.2 If the CPI of the arithmetic instructions was doubled, what would the impact be on the
execution time of the program on 1, 2, 4, or 8 processors?
1.9.3 To what should the CPI of load/store instructions be reduced in order for a single
processor to match the performance of four processors using the original CPI values?
Solution:
1.9.1
P   Arith. inst.   L/S inst.   Branch inst.   Cycles     Exec. time   Speedup
1   2.56E9         1.28E9      2.56E8         19.2E9     9.6 s        1
2   1.83E9         9.14E8      2.56E8         14.08E9    7.04 s       1.36
4   9.14E8         4.57E8      2.56E8         7.68E9     3.84 s       2.5
8   4.57E8         2.29E8      2.56E8         4.48E9     2.24 s       4.29
(For p = 1 the original counts are used; for p = 2, 4, 8 the arithmetic and L/S counts are divided by 0.7 × p. Speedup = 9.6 s / exec. time.)
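A short Python sketch that regenerates the table. It assumes, as the table's first row does, that p = 1 uses the undivided counts and only p > 1 applies the 0.7 × p divisor; `per_core_counts` and `exec_time` are illustrative names.

```python
CPI = {"arith": 1, "ls": 12, "branch": 5}
COUNTS_1P = {"arith": 2.56e9, "ls": 1.28e9, "branch": 0.256e9}
CLOCK_HZ = 2e9


def per_core_counts(p):
    """Arithmetic and L/S counts divided by 0.7*p for p > 1; branches unchanged."""
    scale = 1.0 if p == 1 else 0.7 * p
    return {
        "arith": COUNTS_1P["arith"] / scale,
        "ls": COUNTS_1P["ls"] / scale,
        "branch": COUNTS_1P["branch"],
    }


def exec_time(p):
    """Execution time in seconds on p processors."""
    counts = per_core_counts(p)
    cycles = sum(counts[k] * CPI[k] for k in CPI)
    return cycles / CLOCK_HZ


times = {p: exec_time(p) for p in (1, 2, 4, 8)}
for p, t in times.items():
    print(f"p={p}: {t:.3f} s, speedup {times[1] / t:.2f}")
```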
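1.9.2
With the CPI of the arithmetic instructions doubled (from 1 to 2), the arithmetic and L/S cycles are still divided by 0.7 × p for p > 1, while the branch cycles are not:
P   Exec. time
1   (2.56E9*2 + 1.28E9*12 + 0.256E9*5)/2E9 = 10.88 s
2   ((2.56E9*2 + 1.28E9*12)/(0.7*2) + 0.256E9*5)/2E9 ≈ 7.95 s
4   ((2.56E9*2 + 1.28E9*12)/(0.7*4) + 0.256E9*5)/2E9 ≈ 4.30 s
8   ((2.56E9*2 + 1.28E9*12)/(0.7*8) + 0.256E9*5)/2E9 ≈ 2.47 s
Compared with 1.9.1, the execution time grows by about 13% on 1 processor but only by about 10% on 8 processors, because the unparallelized branch cycles make up a larger share of the total as p grows.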
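The effect of doubling the arithmetic CPI can be sketched with a parameterized version of the execution-time formula. As in the 1.9.1 table, it assumes p = 1 uses the undivided counts; the function name is illustrative.

```python
def exec_time(p, cpi_arith, cpi_ls=12, cpi_branch=5, clock_hz=2e9):
    """Per-processor execution time in seconds.

    Assumes p == 1 uses the undivided counts and p > 1 divides the
    arithmetic and L/S counts by 0.7 * p; branch counts never change.
    """
    scale = 1.0 if p == 1 else 0.7 * p
    cycles = (2.56e9 * cpi_arith + 1.28e9 * cpi_ls) / scale + 0.256e9 * cpi_branch
    return cycles / clock_hz


for p in (1, 2, 4, 8):
    before, after = exec_time(p, 1), exec_time(p, 2)
    print(f"p={p}: {before:.2f} s -> {after:.2f} s ({after / before:.2f}x)")
```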
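1.9.3
The execution time on 1 processor with the new L/S CPI must equal the execution time on 4 processors, ≈ 3.84 s, i.e. 3.84 s * 2E9 cycles/s ≈ 7.68E9 cycles.
So 2.56E9*1 + 1.28E9*new CPI + 0.256E9*5 = 7.68E9 => new CPI = (7.68E9 - 2.56E9*1 - 0.256E9*5)/1.28E9 = 3
The CPI of the load/store instructions must therefore be reduced from 12 to 3.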
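The same equation can be solved directly in cycles, since the 2 GHz clock appears on both sides and cancels. A minimal Python sketch with illustrative names:

```python
cpi_arith, cpi_ls, cpi_branch = 1, 12, 5
arith, ls, branch = 2.56e9, 1.28e9, 0.256e9

# Cycles per core on 4 processors with the original CPIs (as in 1.9.1).
scale = 0.7 * 4
target_cycles = (arith * cpi_arith + ls * cpi_ls) / scale + branch * cpi_branch

# Single processor: arith*1 + ls*CPI_new + branch*5 == target_cycles.
cpi_ls_new = (target_cycles - arith * cpi_arith - branch * cpi_branch) / ls
print(f"target = {target_cycles:.3e} cycles, new L/S CPI = {cpi_ls_new:.2f}")
```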