Solution
1.7
a. Class A: $10^5$ instr. Class B: $2 \times 10^5$ instr. Class C: $5 \times 10^5$ instr. Class D: $2 \times 10^5$ instr.
Time = No. instr. × CPI/clock rate
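Total time P1 = $(10^5 \times 1 + 2 \times 10^5 \times 2 + 5 \times 10^5 \times 3 + 2 \times 10^5 \times 3)/(2.5 \times 10^9) = 10.4 \times 10^{-4}$ s
Total time P2 = $(10^5 \times 2 + 2 \times 10^5 \times 2 + 5 \times 10^5 \times 2 + 2 \times 10^5 \times 2)/(3 \times 10^9) = 6.67 \times 10^{-4}$ s
CPI(P1) = $10.4 \times 10^{-4} \times 2.5 \times 10^9/10^6 = 2.6$
CPI(P2) = $6.67 \times 10^{-4} \times 3 \times 10^9/10^6 = 2.0$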
b. clock cycles(P1) = $10^5 \times 1 + 2 \times 10^5 \times 2 + 5 \times 10^5 \times 3 + 2 \times 10^5 \times 3 = 26 \times 10^5$
clock cycles(P2) = $10^5 \times 2 + 2 \times 10^5 \times 2 + 5 \times 10^5 \times 2 + 2 \times 10^5 \times 2 = 20 \times 10^5$
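The arithmetic for 1.7 can be checked with a short script. It is a sketch that uses only the class counts, per-class CPIs, and clock rates already shown in the formulas above.

```python
# Exercise 1.7: instruction counts for classes A, B, C, D
counts = [1e5, 2e5, 5e5, 2e5]
cpi_p1 = [1, 2, 3, 3]   # per-class CPI on P1
cpi_p2 = [2, 2, 2, 2]   # per-class CPI on P2
clock_p1 = 2.5e9        # Hz
clock_p2 = 3e9          # Hz

def cycles(counts, cpis):
    """Total clock cycles = sum over classes of (instruction count x CPI)."""
    return sum(n * c for n, c in zip(counts, cpis))

cycles_p1 = cycles(counts, cpi_p1)
cycles_p2 = cycles(counts, cpi_p2)
time_p1 = cycles_p1 / clock_p1          # seconds
time_p2 = cycles_p2 / clock_p2
total_instr = sum(counts)
avg_cpi_p1 = cycles_p1 / total_instr    # overall CPI = cycles / instructions
avg_cpi_p2 = cycles_p2 / total_instr

print(f"P1: {cycles_p1:.3g} cycles, {time_p1:.3e} s, CPI {avg_cpi_p1:.2f}")
print(f"P2: {cycles_p2:.3g} cycles, {time_p2:.3e} s, CPI {avg_cpi_p2:.2f}")
```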
1.8
a. CPI = $T_{exec} \times f/\text{No. instr.}$
Compiler A CPI = 1.1
Compiler B CPI = 1.25
b. $f_B/f_A = (\text{No. instr.}(B) \times \text{CPI}(B))/(\text{No. instr.}(A) \times \text{CPI}(A)) \approx 1.36$
c. $T_A/T_{new} = 1.67$
$T_B/T_{new} = 2.27$
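A sketch that reproduces the 1.8 figures. The inputs are assumptions taken from the textbook exercise, which this page does not restate: a 1 GHz clock, compiler A emitting 1.0E9 instructions that run in 1.1 s, compiler B emitting 1.2E9 instructions that run in 1.5 s, and a new compiler emitting 6.0E8 instructions with CPI 1.1. These values are consistent with every answer above.

```python
# ASSUMED inputs (exercise statement, not reproduced on this page)
f = 1e9                          # 1 GHz clock, i.e. 1 ns cycle time
instr_a, time_a = 1.0e9, 1.1     # compiler A: instructions, seconds
instr_b, time_b = 1.2e9, 1.5     # compiler B: instructions, seconds
instr_new, cpi_new = 6.0e8, 1.1  # new compiler

# a. CPI = T_exec * f / instruction count
cpi_a = time_a * f / instr_a
cpi_b = time_b * f / instr_b

# b. equal execution times  =>  f_B / f_A = (I_B * CPI_B) / (I_A * CPI_A)
freq_ratio = (instr_b * cpi_b) / (instr_a * cpi_a)

# c. speedup of the new compiler's code over A's and B's on the same clock
time_new = instr_new * cpi_new / f
speedup_a = time_a / time_new
speedup_b = time_b / time_new

print(f"CPI_A={cpi_a:.2f} CPI_B={cpi_b:.2f} fB/fA={freq_ratio:.2f}")
print(f"T_A/T_new={speedup_a:.2f} T_B/T_new={speedup_b:.2f}")
```

The frequency ratio is simply $T_B/T_A$ at a common clock, $1.5/1.1 \approx 1.36$.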
1.9
1.9.1 $C = 2 \times DP/(V^2 \times F)$, where DP is the dynamic power
Pentium 4: $C = 3.2 \times 10^{-8}$ F
Core i5 Ivy Bridge: $C = 2.9 \times 10^{-8}$ F
1.9.2 Pentium 4: 10/100 = 10%
Core i5 Ivy Bridge: 30/70 = 42.9%
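The capacitance and static-power share can be computed directly. This sketch uses the clock rates, voltages, and power figures that appear in this solution (3.6 GHz, 1.25 V, 10 W static, 90 W dynamic for the Pentium 4; 3.4 GHz, 0.9 V, 30 W static, 40 W dynamic for the Core i5).

```python
# name: (clock Hz, supply V, static W, dynamic W)
processors = {
    "Pentium 4": (3.6e9, 1.25, 10.0, 90.0),
    "Core i5 Ivy Bridge": (3.4e9, 0.9, 30.0, 40.0),
}
capacitance = {}
static_share = {}

for name, (freq, volts, static_w, dynamic_w) in processors.items():
    # Dynamic power = 1/2 * C * V^2 * F  =>  C = 2 * DP / (V^2 * F)
    capacitance[name] = 2 * dynamic_w / (volts ** 2 * freq)
    # Fraction of total power that is static (leakage) power
    static_share[name] = static_w / (static_w + dynamic_w)
    print(f"{name}: C = {capacitance[name]:.2e} F, "
          f"static share = {static_share[name]:.1%}")
```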
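1.9.3 $(S_{new} + D_{new})/(S_{old} + D_{old}) = 0.90$
$D_{new} = \frac{1}{2} \times C \times V_{new}^2 \times F$
$S_{old} = V_{old} \times I$
$S_{new} = V_{new} \times I$
Therefore:
$V_{new} = [2D_{new}/(C \times F)]^{1/2}$
$D_{new} = 0.90 \times (S_{old} + D_{old}) - S_{new}$
$S_{new} = V_{new} \times (S_{old}/V_{old})$
Pentium 4:
$S_{new} = V_{new} \times (10/1.25) = V_{new} \times 8$
$D_{new} = 0.90 \times 100 - V_{new} \times 8 = 90 - V_{new} \times 8$
$V_{new} = [2(90 - V_{new} \times 8)/(3.2 \times 10^{-8} \times 3.6 \times 10^9)]^{1/2}$, i.e. $57.6V_{new}^2 + 8V_{new} - 90 = 0$
$V_{new} = 1.18$ V
Core i5:
$S_{new} = V_{new} \times (30/0.9) = V_{new} \times 33.3$
$D_{new} = 0.90 \times 70 - V_{new} \times 33.3 = 63 - V_{new} \times 33.3$
$V_{new} = [2(63 - V_{new} \times 33.3)/(2.9 \times 10^{-8} \times 3.4 \times 10^9)]^{1/2}$, i.e. $49.3V_{new}^2 + 33.3V_{new} - 63 = 0$
$V_{new} = 0.84$ V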
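The 1.9.3 relations reduce to a quadratic in $V_{new}$. The sketch below solves it in closed form. It assumes, as the derivation does, that the leakage current $I = S_{old}/V_{old}$ and the clock rate stay fixed, and it uses the same dynamic-power model $\frac{1}{2}CV^2F$ that defines $C$ in 1.9.1.

```python
import math

def new_voltage(freq, v_old, static_old, dynamic_old, reduction=0.10):
    """Supply voltage that cuts total power by `reduction`.

    Assumes static power = V * I with constant leakage current I, and
    dynamic power = 1/2 * C * V^2 * F with C and F unchanged.
    """
    cap = 2 * dynamic_old / (v_old ** 2 * freq)   # C from 1.9.1
    leak_current = static_old / v_old             # I = S_old / V_old
    target = (1 - reduction) * (static_old + dynamic_old)
    # 1/2*C*F*V^2 + I*V - target = 0  ->  keep the positive root
    a = 0.5 * cap * freq
    b = leak_current
    c = -target
    return (-b + math.sqrt(b * b - 4 * a * c)) / (2 * a)

v_p4 = new_voltage(3.6e9, 1.25, 10.0, 90.0)
v_i5 = new_voltage(3.4e9, 0.9, 30.0, 40.0)
print(f"Pentium 4: {v_p4:.2f} V, Core i5: {v_i5:.2f} V")
```

As a sanity check, plugging the Pentium 4 result back in gives about 9.5 W static plus 80.2 W dynamic, close to the 90 W target.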
1.11
1.11.1 $\text{die area}_{15cm}$ = wafer area/dies per wafer = $\pi \times 7.5^2/84 = 2.10\ \text{cm}^2$
$\text{yield}_{15cm} = 1/(1 + (0.020 \times 2.10/2))^2 = 0.9593$
$\text{die area}_{20cm}$ = wafer area/dies per wafer = $\pi \times 10^2/100 = 3.14\ \text{cm}^2$
$\text{yield}_{20cm} = 1/(1 + (0.031 \times 3.14/2))^2 = 0.9093$
1.11.2 $\text{cost/die}_{15cm} = 12/(84 \times 0.9593) = 0.1489$
$\text{cost/die}_{20cm} = 15/(100 \times 0.9093) = 0.1650$
1.11.3 $\text{die area}_{15cm}$ = wafer area/dies per wafer = $\pi \times 7.5^2/(84 \times 1.1) = 1.91\ \text{cm}^2$
$\text{yield}_{15cm} = 1/(1 + (0.020 \times 1.15 \times 1.91/2))^2 = 0.9575$
$\text{die area}_{20cm}$ = wafer area/dies per wafer = $\pi \times 10^2/(100 \times 1.1) = 2.86\ \text{cm}^2$
$\text{yield}_{20cm} = 1/(1 + (0.031 \times 1.15 \times 2.86/2))^2 = 0.9053$
1.11.4 $\text{defects per area}_{0.92} = (1 - y^{0.5})/(y^{0.5} \times \text{die area}/2) = (1 - 0.92^{0.5})/(0.92^{0.5} \times 2/2) = 0.043$ defects/cm²
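The 1.11 yield and cost figures follow from one formula. This sketch recomputes them from the wafer data used above (15 cm wafer: $12, 84 dies, 0.020 defects/cm²; 20 cm wafer: $15, 100 dies, 0.031 defects/cm²). Because it does not round the die area first, the fourth decimal can differ slightly from the hand-computed values.

```python
import math

def die_yield(defects_per_cm2: float, die_area_cm2: float) -> float:
    """Yield model used above: 1 / (1 + defects * area / 2)^2."""
    return 1 / (1 + defects_per_cm2 * die_area_cm2 / 2) ** 2

# diameter (cm): (wafer cost $, dies per wafer, defects per cm^2)
wafers = {15: (12, 84, 0.020), 20: (15, 100, 0.031)}
results = {}

for diameter, (cost, dies, defects) in wafers.items():
    wafer_area = math.pi * (diameter / 2) ** 2
    area = wafer_area / dies                  # 1.11.1 die area
    die_y = die_yield(defects, area)          # 1.11.1 yield
    cost_per_die = cost / (dies * die_y)      # 1.11.2 cost per good die
    # 1.11.3: 10% more dies per wafer and 15% more defects per area
    die_y3 = die_yield(defects * 1.15, wafer_area / (dies * 1.1))
    results[diameter] = (area, die_y, cost_per_die, die_y3)
    print(f"{diameter} cm: area {area:.2f} cm^2, yield {die_y:.4f}, "
          f"cost/die {cost_per_die:.4f}, 1.11.3 yield {die_y3:.4f}")

# 1.11.4: invert the model for defect density at yield 0.92, 2 cm^2 die
y, die_area = 0.92, 2.0
defect_density = (1 - math.sqrt(y)) / (math.sqrt(y) * die_area / 2)
print(f"defect density: {defect_density:.3f} defects/cm^2")
```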
$\text{Clock rate}_{new} = (\text{No. instr.} \times 0.85 \times \text{CPI})/(0.80 \times \text{CPU time}) = (0.85/0.80) \times \text{clock rate}_{old} = 3.18$ GHz
1.13
1.13.1 $T(P1) = 5 \times 10^9 \times 0.9/(4 \times 10^9) = 1.125$ s
$T(P2) = 10^9 \times 0.75/(3 \times 10^9) = 0.25$ s
clock rate(P1) > clock rate(P2), but performance(P1) < performance(P2)
1.13.2 T(P1) = No. instr. × CPI/clock rate
$T(P1) = 10^9 \times 0.9/(4 \times 10^9) = 2.25 \times 10^{-1}$ s
$T(P2) = N \times 0.75/(3 \times 10^9) = 2.25 \times 10^{-1}$ s, so $N = 9 \times 10^8$
1.13.3 MIPS = clock rate × $10^{-6}$/CPI
MIPS(P1) = $4 \times 10^9 \times 10^{-6}/0.9 = 4.44 \times 10^3$
MIPS(P2) = $3 \times 10^9 \times 10^{-6}/0.75 = 4.0 \times 10^3$
MIPS(P1) > MIPS(P2), but performance(P1) < performance(P2) (from 1.13.1)
1.13.4 MFLOPS = No. FP operations × $10^{-6}$/T
MFLOPS(P1) = 0.4 × 5E9 × 1E−6/1.125 = 1.78E3
MFLOPS(P2) = 0.4 × 1E9 × 1E−6/0.25 = 1.60E3
MFLOPS(P1) > MFLOPS(P2), but performance(P1) < performance(P2) (from 1.13.1)
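All four parts of 1.13 use the same inputs, so one sketch covers them. The values (clock rates, CPIs, instruction counts, and the 0.4 FP fraction) are the ones that appear in the formulas above.

```python
clock = {"P1": 4e9, "P2": 3e9}    # Hz
cpi = {"P1": 0.9, "P2": 0.75}
instr = {"P1": 5e9, "P2": 1e9}    # instruction counts used in 1.13.1
fp_fraction = 0.4                 # share of FP operations used in 1.13.4

def cpu_time(n, cpi_value, clock_hz):
    """CPU time = instruction count x CPI / clock rate."""
    return n * cpi_value / clock_hz

t = {p: cpu_time(instr[p], cpi[p], clock[p]) for p in clock}

# 1.13.2: instructions P2 completes in the time P1 needs for 1e9 instructions
t_p1_1e9 = cpu_time(1e9, cpi["P1"], clock["P1"])
n_p2 = t_p1_1e9 * clock["P2"] / cpi["P2"]

# 1.13.3 and 1.13.4: the misleading rate metrics
mips = {p: clock[p] * 1e-6 / cpi[p] for p in clock}
mflops = {p: fp_fraction * instr[p] * 1e-6 / t[p] for p in clock}

print(t, n_p2, mips, mflops)
```

Both MIPS and MFLOPS rank P1 above P2 even though P1 takes longer, which is the point the exercise makes.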
1.14
1.14.1 $T_{fp} = 70 \times 0.8 = 56$ s. $T_{new} = 56 + 85 + 55 + 40 = 236$ s. Reduction: 5.6%
1.14.2 $T_{new} = 250 \times 0.8 = 200$ s, $T_{fp} + T_{l/s} + T_{branch} = 165$ s, $T_{int} = 35$ s. Reduction in INT time: 58.8%
1.14.3 $T_{new} = 250 \times 0.8 = 200$ s, $T_{fp} + T_{int} + T_{l/s} = 210$ s. NO: even with $T_{branch} = 0$, the total exceeds 200 s.
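A sketch of the 1.14 reasoning, using the per-class times from the sums above (FP 70 s, INT 85 s, L/S 55 s, branch 40 s; 250 s total).

```python
times = {"fp": 70, "int": 85, "ls": 55, "branch": 40}   # seconds per class
total = sum(times.values())                            # 250 s

# 1.14.1: FP time reduced by 20%
t_new = total - times["fp"] + 0.8 * times["fp"]
reduction_total = 1 - t_new / total

# 1.14.2: INT time required for a 20% cut in total time
target = 0.8 * total
t_int_needed = target - (times["fp"] + times["ls"] + times["branch"])
reduction_int = 1 - t_int_needed / times["int"]

# 1.14.3: branch time alone can reach the target only if the rest fits
rest = times["fp"] + times["int"] + times["ls"]
possible = rest <= target

print(t_new, reduction_total, t_int_needed, reduction_int, possible)
```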
1.15
1.15.1 Clock cycles = $CPI_{fp} \times \text{No. FP instr.} + CPI_{int} \times \text{No. INT instr.} + CPI_{l/s} \times \text{No. L/S instr.} + CPI_{branch} \times \text{No. branch instr.}$
$T_{CPU} = \text{clock cycles}/\text{clock rate} = \text{clock cycles}/(2 \times 10^9)$
clock cycles = $512 \times 10^6$; $T_{CPU} = 0.256$ s
To halve the number of clock cycles by improving the CPI of FP instructions:
$CPI_{improved\ fp} \times \text{No. FP instr.} + CPI_{int} \times \text{No. INT instr.} + CPI_{l/s} \times \text{No. L/S instr.} + CPI_{branch} \times \text{No. branch instr.} = \text{clock cycles}/2$
$CPI_{improved\ fp} = (\text{clock cycles}/2 - (CPI_{int} \times \text{No. INT instr.} + CPI_{l/s} \times \text{No. L/S instr.} + CPI_{branch} \times \text{No. branch instr.}))/\text{No. FP instr.}$
$CPI_{improved\ fp} = (256 - 462)/50 < 0$ (counts in millions) ⇒ not possible
1.15.2 Using the clock cycle data from 1.15.1.
To halve the number of clock cycles by improving the CPI of L/S instructions:
$CPI_{fp} \times \text{No. FP instr.} + CPI_{int} \times \text{No. INT instr.} + CPI_{improved\ l/s} \times \text{No. L/S instr.} + CPI_{branch} \times \text{No. branch instr.} = \text{clock cycles}/2$
$CPI_{improved\ l/s} = (\text{clock cycles}/2 - (CPI_{fp} \times \text{No. FP instr.} + CPI_{int} \times \text{No. INT instr.} + CPI_{branch} \times \text{No. branch instr.}))/\text{No. L/S instr.}$
$CPI_{improved\ l/s} = (256 - (512 - 4 \times 80))/80 = (256 - 192)/80 = 0.8$
1.15.3 Clock cycles = $CPI_{fp} \times \text{No. FP instr.} + CPI_{int} \times \text{No. INT instr.} + CPI_{l/s} \times \text{No. L/S instr.} + CPI_{branch} \times \text{No. branch instr.}$
$T_{CPU} = \text{clock cycles}/\text{clock rate} = \text{clock cycles}/(2 \times 10^9)$
$CPI_{int} = 0.6 \times 1 = 0.6$; $CPI_{fp} = 0.6 \times 1 = 0.6$; $CPI_{l/s} = 0.7 \times 4 = 2.8$; $CPI_{branch} = 0.7 \times 2 = 1.4$
$T_{CPU}$ (before improv.) = 0.256 s; $T_{CPU}$ (after improv.) = 0.171 s
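A sketch of 1.15. The FP and L/S counts (50 and 80 million) and the CPIs (1, 1, 4, 2) appear above. The INT and branch counts (110 and 16 million) are assumptions taken from the textbook exercise; they reproduce the 512 million total and the 462 million figure used above.

```python
# Counts in millions; INT=110 and branch=16 are ASSUMED (exercise statement)
counts = {"fp": 50, "int": 110, "ls": 80, "branch": 16}
cpi = {"fp": 1, "int": 1, "ls": 4, "branch": 2}
clock_hz = 2e9

def total_cycles(cpi_table):
    """Sum of CPI x count over all classes, in cycles."""
    return sum(cpi_table[k] * counts[k] for k in counts) * 1e6

base = total_cycles(cpi)
target = base / 2   # running twice as fast halves the cycle count

def required_cpi(cls):
    """CPI class `cls` would need so that total cycles equal `target`."""
    others = sum(cpi[k] * counts[k] for k in counts if k != cls) * 1e6
    return (target - others) / (counts[cls] * 1e6)

cpi_fp_needed = required_cpi("fp")   # negative means impossible
cpi_ls_needed = required_cpi("ls")

# 1.15.3: FP/INT CPI cut by 40%, L/S and branch CPI cut by 30%
improved = {"fp": 0.6 * cpi["fp"], "int": 0.6 * cpi["int"],
            "ls": 0.7 * cpi["ls"], "branch": 0.7 * cpi["branch"]}
t_before = base / clock_hz
t_after = total_cycles(improved) / clock_hz

print(cpi_fp_needed, cpi_ls_needed, t_before, t_after)
```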
A 2 GHz processor implements an instruction set consisting of five classes of instructions, A, B, C, D, and E, with CPIs of 2, 3, 4, 6, and 10, respectively. The computer is executing a program in which instruction classes A, B, C, D, and E account for 30%, 30%, 15%, 15%, and 10% of the total instruction count, respectively. If the execution time of this program is 20 seconds, find the total instruction count.
Given the MIPS assembly program below:
        addi $s0, $zero, 66
        slti $at, $s0, 55
        beq  $at, $zero, else
        addi $t2, $zero, 1
        j    endif
else:   subi $t2, $zero, 1
endif:  nop
Assume that the first instruction of this program starts at address 0x00400000.
b) Show the value of the 16-bit immediate field of the instruction beq $1, $zero, else
a: 0x00400014
b: 0x0002
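The two answers follow from word-aligned addresses and the PC-relative branch rule. This sketch assumes each instruction before `else` assembles to exactly one 4-byte word. The pseudo-instruction `subi` may expand to more than one word, but it sits at the label, so it does not move `else` itself.

```python
BASE = 0x00400000   # address of the first instruction (from the question)

# Word index of each instruction; ASSUMES one word per instruction before `else`
BEQ_INDEX = 2       # addi, slti, beq
ELSE_INDEX = 5      # addi, slti, beq, addi, j, then `else:`

def address(index: int) -> int:
    return BASE + 4 * index

else_addr = address(ELSE_INDEX)
beq_pc = address(BEQ_INDEX)
# Branch target = (PC + 4) + sign_extend(imm) * 4  =>  imm = (target - PC - 4) / 4
imm = (else_addr - (beq_pc + 4)) // 4
imm_field = imm & 0xFFFF   # 16-bit two's-complement encoding of the offset
print(f"else = 0x{else_addr:08X}, beq immediate = 0x{imm_field:04X}")
```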
What is the output of this program?
        .text
        li   $s4, 2
        li   $s1, 0
        li   $s3, 0x0F
        la   $s2, A
        li   $s5, 0
loop:   sll  $t1, $s1, 2
        add  $t1, $t1, $s2
        lw   $t0, 0($t1)
        add  $s5, $s5, $t0
        add  $s1, $s1, $s4
        blt  $s1, $s3, loop
        # print out result
        li   $v0, 1
        add  $a0, $s5, $zero
        syscall
        .data
A:      .word 1,2,3,4,5,6,7,8,9,10,11,12,13,14,15
Answer: 64
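The loop can be traced with a direct translation. In this sketch `A` is a Python list standing in for the `.word` array, and list indexing replaces the `sll`/`add`/`lw` address computation (byte offset = index × 4).

```python
A = list(range(1, 16))   # .word 1, 2, ..., 15

s1, s3, s4, s5 = 0, 0x0F, 2, 0
while True:                # the MIPS loop body runs before the test (do-while)
    s5 += A[s1]            # sll/add/lw: load the word at byte offset s1 * 4
    s1 += s4               # step the index by 2
    if not s1 < s3:        # blt $s1, $s3, loop
        break
print(s5)                  # value printed by syscall 1 (print integer)
```

The loop reads indexes 0, 2, ..., 14, i.e. the odd values 1 + 3 + ... + 15, which sum to 64.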
        .data
A:      .word 0x5555aaaa
        .text
        la   $t0, A
        lb   $s1, 1($t0)
Find the value stored in the register $s2 after executing the above MIPS assembly
program. Assume that the MIPS CPU in this machine is a little-endian processor.
Address      Byte
0x10010000   0xaa
0x10010001   0xaa
0x10010002   0x55
0x10010003   0x55
Answer: 0x0000000A
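The byte layout in the table and the effect of `lb` can be checked as below. This sketch covers only the two instructions shown. It computes the little-endian layout and the sign-extended value `lb` leaves in $s1; whatever instructions derive $s2 from it are not reproduced on this page.

```python
import struct

word = 0x5555AAAA
base = 0x10010000                 # address of A, as in the table
memory = struct.pack("<I", word)  # little-endian: least significant byte first
for offset, byte in enumerate(memory):
    print(f"0x{base + offset:08X}  0x{byte:02x}")

def lb(mem: bytes, offset: int) -> int:
    """Load one byte and sign-extend it to 32 bits, as MIPS `lb` does."""
    value = mem[offset]
    if value & 0x80:
        value |= 0xFFFFFF00
    return value

s1 = lb(memory, 1)
print(f"$s1 = 0x{s1:08X}")
```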
Given the MIPS assembly code below, show the value of $s2 in hexadecimal after the code is executed.
li $s0, 4095
Answer: 0xFFFF2000
li $s0, 0x1234BCDE
sw $s0, 1($s1)
Answer: 0x12350000
Given the single-cycle processor design shown in the figure below, assume that the CPU executes the following instruction: slt $at, $t2, $t3. The instruction is located at address 0x0040FF00 in instruction memory, and the register values are $t1 = 0x78, $t2 = 0x95, $t3 = 0x10. Show the values at the marked points (①–④), including the register file read ports, the ALU result output, and the branch address. Values should be shown in hexadecimal.
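The figure is not reproduced here, so this sketch assumes the standard textbook single-cycle datapath. In that design the branch adder always computes PC + 4 + (sign-extended instruction bits 15:0 << 2), even for an R-type instruction whose Branch control signal is 0. Under that assumption it encodes `slt $at, $t2, $t3` and derives the register-read, ALU, and branch-adder values.

```python
# Register file contents from the question ($t1 = r9, $t2 = r10, $t3 = r11)
regs = {9: 0x78, 10: 0x95, 11: 0x10}
pc = 0x0040FF00

# R-type encoding: op=0, rs=$t2, rt=$t3, rd=$at (r1), shamt=0, funct=0x2A (slt)
rs, rt, rd, shamt, funct = 10, 11, 1, 0, 0x2A
instr = (rs << 21) | (rt << 16) | (rd << 11) | (shamt << 6) | funct

def signed32(x: int) -> int:
    """Interpret a 32-bit pattern as a two's-complement integer."""
    x &= 0xFFFFFFFF
    return x - (1 << 32) if x & 0x80000000 else x

read_data1 = regs[rs]   # read port 1 ($t2)
read_data2 = regs[rt]   # read port 2 ($t3)
alu_result = 1 if signed32(read_data1) < signed32(read_data2) else 0

# Branch adder output (ignored by the PC mux because Branch = 0 for slt)
imm16 = instr & 0xFFFF
offset = imm16 - 0x10000 if imm16 & 0x8000 else imm16
branch_addr = (pc + 4 + (offset << 2)) & 0xFFFFFFFF

for name, value in [("instruction", instr), ("read data 1", read_data1),
                    ("read data 2", read_data2), ("ALU result", alu_result),
                    ("branch address", branch_addr)]:
    print(f"{name}: 0x{value:08X}")
```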