Practice 1
1.
a)
Instructions per second = Clock rate / CPI
             P1           P2            P3
Clock rate   3 * 10^9     2.5 * 10^9    4 * 10^9
CPI          1.5          1             2.2
IPS          2 * 10^9     2.5 * 10^9    1.82 * 10^9
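The table in part a) can be checked with a short Python sketch. The dictionary names are illustrative only; the clock rates and CPIs are the ones tabulated above.

```python
# Clock rates (Hz) and CPIs as tabulated for P1, P2, P3.
clock_rate = {"P1": 3e9, "P2": 2.5e9, "P3": 4e9}
cpi = {"P1": 1.5, "P2": 1.0, "P3": 2.2}

# Instructions per second = clock rate / CPI.
ips = {name: clock_rate[name] / cpi[name] for name in clock_rate}

for name, value in ips.items():
    print(f"{name}: {value:.3e} instructions/s")

# The processor with the highest IPS has the highest performance here.
fastest = max(ips, key=ips.get)
print("Highest IPS:", fastest)
```

Under these inputs P2 comes out ahead even though P3 has the highest clock rate.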
b)
Cycles = Clock rate * Time
Instructions = Cycles / CPI
               P1           P2             P3
Cycles         3 * 10^10    2.5 * 10^10    4 * 10^10
Instructions   2 * 10^10    2.5 * 10^10    1.818 * 10^10
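A minimal sketch of part b) follows. The run time of 10 s is an assumption: it is not written out above, but it is the value implied by dividing the tabulated cycle counts by the clock rates from part a).

```python
clock_rate = {"P1": 3e9, "P2": 2.5e9, "P3": 4e9}
cpi = {"P1": 1.5, "P2": 1.0, "P3": 2.2}
run_time = 10.0  # seconds; ASSUMPTION inferred from the tabulated cycle counts

# Cycles = clock rate * time; instructions = cycles / CPI.
cycles = {name: f * run_time for name, f in clock_rate.items()}
instructions = {name: cycles[name] / cpi[name] for name in cycles}

for name in cycles:
    print(f"{name}: {cycles[name]:.3e} cycles, {instructions[name]:.3e} instructions")
```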
c)
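T_new = 0.7 * T_old
T = (Instructions * CPI) / Clock rate
With the instruction count unchanged:
CPI_new / f_new = 0.7 * CPI_old / f_old
The values below correspond to CPI_new = 1.2 * CPI_old, so f_new = (1.2 / 0.7) * f_old = 1.714 * f_old
         P1             P2             P3
f_old    3 * 10^9       2.5 * 10^9     4 * 10^9
f_new    5.14 * 10^9    4.29 * 10^9    6.86 * 10^9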
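The new clock rates in part c) can be reproduced as below. The CPI factor of 1.2 is an assumption inferred from the tabulated results (each new rate is 1.714 times the old one); the time factor of 0.7 and the old clock rates of 3, 2.5 and 4 GHz come from the solution.

```python
f_old = {"P1": 3e9, "P2": 2.5e9, "P3": 4e9}  # Hz, from part a)
time_factor = 0.7  # T_new / T_old
cpi_factor = 1.2   # CPI_new / CPI_old; ASSUMPTION inferred from the results

# T = IC * CPI / f with IC fixed  =>  f_new = f_old * cpi_factor / time_factor
f_new = {name: f * cpi_factor / time_factor for name, f in f_old.items()}

for name, f in f_new.items():
    print(f"{name}: {f / 1e9:.2f} GHz")
```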
2.
a)
CPI = sum over i in {A, B, C, D} of (f_i * CPI_i)
CPI_P1 = 0.10 * 1 + 0.20 * 2 + 0.50 * 3 + 0.20 * 3 = 2.60
CPI_P2 = 0.10 * 2 + 0.20 * 2 + 0.50 * 2 + 0.20 * 2 = 2.00
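The weighted-average CPI can be written as a small helper. The function name `weighted_cpi` is hypothetical; the fractions and per-class CPIs are the ones used in the two sums above.

```python
def weighted_cpi(fractions, class_cpis):
    """Return the sum of f_i * CPI_i over the instruction classes."""
    return sum(f * c for f, c in zip(fractions, class_cpis))

fractions = [0.10, 0.20, 0.50, 0.20]  # classes A, B, C, D

cpi_p1 = weighted_cpi(fractions, [1, 2, 3, 3])
cpi_p2 = weighted_cpi(fractions, [2, 2, 2, 2])

print(f"CPI P1 = {cpi_p1:.2f}, CPI P2 = {cpi_p2:.2f}")
```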
b)
Cycles = Instruction count × Global CPI.
Instruction count = 1 * 10^6
              P1            P2
Global CPI    2.6           2
Cycles        2.6 * 10^6    2 * 10^6
3.
a)
CPI = (Execution time * Clock rate) / Instruction count
Clock rate = 1 * 10^9 Hz (the value implied by the table)
                      P1 (A)      P2 (B)
Execution time (s)    1.1         1.5
Instruction count     1 * 10^9    1.2 * 10^9
CPI                   1.1         1.25
b)
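Execution time = (Instruction count * CPI) / Clock rate
For equal execution times on the two processors:
(IC_A * CPI_A) / f_A = (IC_B * CPI_B) / f_B
f_A / f_B = (IC_A * CPI_A) / (IC_B * CPI_B) = 0.7333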
c)
T_new = (IC_new * CPI_new) / f = 0.66 s
Speedup new/A = T_A / T_new = 1.67x
Speedup new/B = T_B / T_new = 2.27x
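Parts b) and c) of this problem can be checked together. In this sketch the instruction count for B, 1.2 * 10^9, is an assumption: it is the value consistent with the CPI of 1.25 and the ratio 0.7333 reported above. The variable names are illustrative.

```python
# Instruction count, CPI and execution time (s) for A and B.
ic_a, cpi_a, t_a = 1.0e9, 1.1, 1.1
ic_b, cpi_b, t_b = 1.2e9, 1.25, 1.5  # ic_b is an ASSUMPTION (see lead-in)
t_new = 0.66                          # execution time with the new compiler (s)

# Equal execution times: IC_A*CPI_A/f_A = IC_B*CPI_B/f_B.
clock_ratio = (ic_a * cpi_a) / (ic_b * cpi_b)

speedup_vs_a = t_a / t_new
speedup_vs_b = t_b / t_new

print(f"f_A / f_B = {clock_ratio:.4f}")
print(f"speedup vs A = {speedup_vs_a:.2f}x, vs B = {speedup_vs_b:.2f}x")
```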
4.
a)
p   Arith cycles   Load cycles   Branch cycles   Total cycles   Time (s)   Speedup
    (×10^9)        (×10^9)       (×10^9)         (×10^9)
1   3.657          21.943        1.280           26.880         13.44      1.00×
2   1.8286         10.9714       1.280           14.080         7.04       1.91×
4   0.9143         5.4857        1.280           7.680          3.84       3.50×
8   0.4571         2.7429        1.280           4.480          2.24       6.00×
b)
p   Original Tp (s)   New Tp (s)   Slowdown (Tnew/Told)
1   13.44             15.27        1.14× (≈14 % slower)
2   7.04              7.95         1.13× (≈13 % slower)
4   3.84              4.30         1.12× (≈12 % slower)
8   2.24              2.47         1.10× (≈10 % slower)
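Both tables for this problem can be regenerated with the sketch below. Several inputs are assumptions inferred from the tabulated numbers rather than stated in the solution: a load/store CPI of 12, the division of arithmetic and load/store counts by 0.7 * p in every row (including p = 1), and a doubled arithmetic CPI for part b). The instruction counts, the branch CPI of 5 and the 2 GHz clock appear in part c).

```python
ARITH, LOAD, BRANCH = 2.56e9, 1.28e9, 0.256e9  # instruction counts
CLOCK = 2e9                                     # Hz

def exec_time(p, cpi_arith=1, cpi_load=12, cpi_branch=5):
    """Execution time on p processors (cpi_load=12 is an ASSUMPTION)."""
    scale = 0.7 * p  # ASSUMPTION: applied for every p, as the table does
    cycles = (ARITH / scale) * cpi_arith \
        + (LOAD / scale) * cpi_load \
        + BRANCH * cpi_branch
    return cycles / CLOCK

base = exec_time(1)
for p in (1, 2, 4, 8):
    t_old = exec_time(p)
    t_new = exec_time(p, cpi_arith=2)  # ASSUMPTION: arithmetic CPI doubled
    print(f"p={p}: T={t_old:.2f} s, speedup={base / t_old:.2f}x, "
          f"new T={t_new:.2f} s, slowdown={t_new / t_old:.2f}x")
```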
c)
Arithmetic cycles: 2.56 × 10^9 × 1
Load/store cycles: 1.28 × 10^9 × L
Branch cycles: 0.256 × 10^9 × 5
Total cycles = 3.84 × 10^9 + 1.28 × 10^9 × L
T_1core = Total cycles / (2 × 10^9) = 1.92 + 0.64 × L
Setting this equal to the 4-core time gives
1.92 + 0.64 × L = 3.84
=> L = 3
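The equation for L can also be solved numerically. This sketch only rearranges the total-cycle expression above; the variable names are illustrative.

```python
arith_cycles = 2.56e9 * 1    # arithmetic instructions at CPI 1
branch_cycles = 0.256e9 * 5  # branch instructions at CPI 5
load_count = 1.28e9          # load/store instructions, CPI L unknown
clock = 2e9                  # Hz
target_time = 3.84           # 4-core execution time (s)

# (arith + branch + load_count * L) / clock = target_time, solved for L.
load_cpi = (target_time * clock - arith_cycles - branch_cycles) / load_count

print(f"L = {load_cpi:.2f}")
```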
5.
a)
I_old = 2.389 * 10^12
After a 15% reduction => I_new = 2.389 * 10^12 * 0.85 = 2.03065 * 10^12
CPI = (T * f) / I_new = (700 * 4 * 10^9) / (2.03065 * 10^12) = 1.38
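A quick numerical check of part a), using only the values in the lines above (variable names are illustrative):

```python
i_old = 2.389e12      # original instruction count
i_new = 0.85 * i_old  # after the 15% reduction
cpu_time = 700        # seconds
clock = 4e9           # Hz

# CPI = (T * f) / I
cpi = cpu_time * clock / i_new

print(f"I_new = {i_new:.5e}, CPI = {cpi:.2f}")
```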
b)
The clock rate rose by 33% (3 → 4 GHz), but CPI rose by about 47% (0.94 → 1.38).
The two are not proportional: the richer instructions and the deeper pipeline
needed to reach 4 GHz each cost extra cycles, so the CPI penalty is larger than
the clock-rate gain.
c)
Reference time R = SPECratio * T = 13.7 * 700 = 9590 s
The CPU time is reduced from 9590 s to 700 s, so the reduction is
(9590 – 700) / 9590 = 92.7%
d)
T = (I * CPI) / f
I = (T * f)/ CPI
=> I = (0.9 * 960 * 10^-9 * 4 * 10^9) / 1.61 ≈ 2146 instructions
e)
T = (IC * CPI) / f
Reducing T by 10% with IC and CPI unchanged:
f' = f / 0.9 = 3 * 10^9 / 0.9 = 3.33 * 10^9
With CPI_new = 0.85 * CPI and T_new = 0.8 * T:
f_new = (IC * CPI_new) / T_new = (0.85 / 0.8) * f = 1.0625 * f = 1.0625 * 3 * 10^9 = 3.19 * 10^9
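Parts d) and e) both rearrange T = (IC * CPI) / f. The sketch below reuses the numbers from those parts; the variable names are illustrative.

```python
# Part d): instruction count from I = (T * f) / CPI.
time_d = 0.9 * 960e-9  # seconds
clock_d = 4e9          # Hz
cpi_d = 1.61
instructions_d = time_d * clock_d / cpi_d

# Part e): required clock rates, starting from f = 3 GHz.
f = 3e9
f_ten_percent = f / 0.9  # T reduced by 10%, IC and CPI unchanged
f_new = f * 0.85 / 0.8   # CPI reduced by 15%, T reduced by 20%

print(f"d) I = {instructions_d:.1f} instructions")
print(f"e) f' = {f_ten_percent / 1e9:.2f} GHz, f_new = {f_new / 1e9:.4f} GHz")
```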
6.
a)
CPU time = (Instruction count * CPI) / Clock rate
We have
P1: CPU time = (5 * 10^9 * 0.9) / (4 * 10^9) = 1.125 s
P2: CPU time = (1 * 10^9 * 0.75) / (3 * 10^9) = 0.25 s
So "higher clock rate -> higher performance" does not hold in this case
b)
P1: CPU time new = (1 * 10^9 * 0.9) / (4 * 10^9) = 0.225 s
So the number of instructions P2 can execute in the same time is
IC = (0.225 * 3 * 10^9) / 0.75 = 0.9 * 10^9
c)
MIPS = Clock rate (in MHz) / CPI
P1 = 4000 / 0.9 = 4444.4 MIPS
P2 = 3000 / 0.75 = 4000 MIPS
P1 has the higher MIPS rating, but P2 has the shorter CPU time (part a), so "higher MIPS -> higher performance" does not hold in this case
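Parts a) to c) can be reproduced with two small helpers. The function and key names are hypothetical; the clock rates, CPIs and instruction counts are the ones used in the calculations above.

```python
processors = {
    "P1": {"clock": 4e9, "cpi": 0.9, "ic": 5e9},
    "P2": {"clock": 3e9, "cpi": 0.75, "ic": 1e9},
}

def cpu_time(proc, ic=None):
    """CPU time = IC * CPI / clock rate; ic overrides the stored count."""
    count = proc["ic"] if ic is None else ic
    return count * proc["cpi"] / proc["clock"]

def mips(proc):
    """MIPS = clock rate / (CPI * 10^6)."""
    return proc["clock"] / (proc["cpi"] * 1e6)

p1, p2 = processors["P1"], processors["P2"]

# Part b): instructions P2 executes in the time P1 needs for 1e9 instructions.
t_p1_b = cpu_time(p1, ic=1e9)
ic_p2_b = t_p1_b * p2["clock"] / p2["cpi"]

for name, proc in processors.items():
    print(f"{name}: CPU time = {cpu_time(proc):.3f} s, MIPS = {mips(proc):.1f}")
print(f"b) P1 time = {t_p1_b:.3f} s, P2 instruction count = {ic_p2_b:.2e}")
```

The output makes the two fallacies concrete: P1 has both the higher clock rate and the higher MIPS rating, yet P2 finishes its program sooner.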
d)
FP ops = 0.4 * IC
MFLOPS = FP ops / (Execution time * 10^6)
P1
MFLOPS = (0.4 * 5 * 10^9) / (1.125 * 10^6) = 1777.8
P2
MFLOPS = (0.4 * 1 * 10^9) / (0.25 * 10^6) = 1600
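The floating-point rate in part d) is a single expression. The helper name `fp_rate_millions` is hypothetical; the 40% floating-point fraction, instruction counts and execution times are taken from the solution.

```python
def fp_rate_millions(instruction_count, exec_time, fp_fraction=0.4):
    """Millions of floating-point operations per second."""
    fp_ops = fp_fraction * instruction_count
    return fp_ops / (exec_time * 1e6)

rate_p1 = fp_rate_millions(5e9, 1.125)
rate_p2 = fp_rate_millions(1e9, 0.25)

print(f"P1: {rate_p1:.1f}, P2: {rate_p2:.1f}")
```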
9.
Tp = (100 / p) + 4
Speed up = 100 / Tp
Ratio = Speed up / p
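The three formulas can be tabulated for a few processor counts. The counts chosen here (2, 4, 8, 16) are illustrative assumptions, since the solution does not list the values of p it uses.

```python
def parallel_time(p):
    """Tp = 100 / p + 4, as given above."""
    return 100 / p + 4

def speedup(p):
    """Speedup relative to the 100-unit single-processor time."""
    return 100 / parallel_time(p)

def ratio(p):
    """Actual speedup divided by the ideal speedup p."""
    return speedup(p) / p

for p in (2, 4, 8, 16):  # illustrative values only
    print(f"p={p}: Tp={parallel_time(p):.2f}, "
          f"speedup={speedup(p):.2f}, ratio={ratio(p):.2f}")
```

Because of the constant term 4 in Tp, the speedup stays below 100 / 4 = 25 however large p becomes.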
7.
INT operation is 250 – (70 + 85 + 40) = 55 s
a)
After the reduction:
FP time = 70 * 0.8 = 56 s
T = 56 + 85 + 40 + (250 – (70 + 85 + 40)) = 236 s
The total time is reduced by 250 – 236 = 14 s
b)
The new total time is
T = 250 * 0.8 = 200 s
With the other classes at their original times, we have
70 + 85 + 40 + T_new = 200
=> T_new = 5 s
So the INT time must drop from 55 s to 5 s
c)
No: cutting the total time by 20% requires saving 50 s, but branch instructions
take only 40 s in total
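All three parts of this problem follow from the same time breakdown. In this sketch `other_time` stands for the 85 s class, which the solution does not name; the remaining values come from the text. For part b) the FP time is kept at its original 70 s, which is the reading that yields the stated result of 5 s.

```python
total = 250.0
fp_time, other_time, branch_time = 70.0, 85.0, 40.0
int_time = total - (fp_time + other_time + branch_time)

# a) FP time reduced by 20%.
total_a = 0.8 * fp_time + other_time + branch_time + int_time
saved_a = total - total_a

# b) INT time needed for a 20% lower total, other classes unchanged.
target = 0.8 * total
int_needed = target - (fp_time + other_time + branch_time)

# c) Branch time needed for the same target; negative means impossible.
branch_needed = target - (fp_time + other_time + int_time)

print(f"INT time = {int_time:.0f} s")
print(f"a) new total = {total_a:.0f} s, saved = {saved_a:.0f} s")
print(f"b) INT time must drop to {int_needed:.0f} s")
print(f"c) branch time would have to be {branch_needed:.0f} s")
```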
8.
a)
CPU time = (sum over i of IC_i * CPI_i) / Clock rate = (50 * 1 + 110 * 1 + 80 * 4 + 16 * 2) / 2000 = 0.256 s
T_new = 0.256 / 2 = 0.128 s
Required total of IC * CPI = 0.128 * 2000 = 256
The CPI x that FP instructions would need is
50 * x + 462 = 256
x = -4.12
A negative CPI is not possible. Even with x = 0, the minimum total of IC * CPI is 462:
T = 462 / 2000 = 0.231 s
0.256 / 0.231 = 1.11x faster
So a 2x speedup is impossible
b)
The CPI x that L/S instructions would need is
80 * x + 192 = 256
=> x = 0.8
So the L/S CPI must drop from 4 to 0.8
c)
CPU time = ( 50 * 0.6 + 110 * 0.6 + 80 * 4 *0.7 + 16 * 2 * 0.7 ) / 2000 = 0.1712 s
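The three parts of this problem can be checked with one sketch. Counts are in millions of instructions and the clock rate in MHz, matching the division by 2000 above. The class labels "INT" and "BR" for the 110 and 16 counts are assumptions; the solution names only the FP and L/S classes.

```python
counts = {"FP": 50, "INT": 110, "LS": 80, "BR": 16}  # millions; INT/BR labels assumed
base_cpi = {"FP": 1, "INT": 1, "LS": 4, "BR": 2}
CLOCK_MHZ = 2000

def cpu_time(cpis):
    """CPU time = sum(IC_i * CPI_i) / clock rate."""
    return sum(counts[k] * cpis[k] for k in counts) / CLOCK_MHZ

def required_cpi(cls, target_time):
    """CPI one class needs so the program meets target_time, others fixed."""
    others = sum(counts[k] * base_cpi[k] for k in counts if k != cls)
    return (target_time * CLOCK_MHZ - others) / counts[cls]

t_base = cpu_time(base_cpi)
target = t_base / 2

fp_cpi = required_cpi("FP", target)  # negative: cannot be achieved
ls_cpi = required_cpi("LS", target)

# c) FP and INT CPIs cut by 40%, L/S and BR CPIs cut by 30%.
improved = {"FP": 0.6, "INT": 0.6, "LS": 4 * 0.7, "BR": 2 * 0.7}
t_improved = cpu_time(improved)

print(f"base = {t_base:.3f} s, FP CPI = {fp_cpi:.2f}, L/S CPI = {ls_cpi:.2f}")
print(f"c) improved time = {t_improved:.4f} s")
```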
10.
a)
CPI = 0.7 * 2 + 0.1 * 6 + 0.2 * 3 = 2.6
b)
CPI new = 2.6 /1.25 = 2.08
The arithmetic CPI x needed to reach this performance satisfies
0.7 * x + 0.1 * 6 + 0.2 * 3 = 2.08
=> x = 1.26
c)
CPI new = 2.6 / 1.5 = 1.73
The arithmetic CPI x needed to reach this performance satisfies
0.7 * x + 0.1 * 6 + 0.2 * 3 = 1.73
=> x = 0.76
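Parts b) and c) solve the same equation for two different speedups. A minimal sketch, with the hypothetical helper `required_arith_cpi` and the instruction mix from part a):

```python
ARITH_FRACTION = 0.7
OTHER_CYCLES = 0.1 * 6 + 0.2 * 3  # contribution of the two non-arithmetic classes

base_cpi = ARITH_FRACTION * 2 + OTHER_CYCLES

def required_arith_cpi(speedup):
    """Arithmetic CPI x such that 0.7 * x + other cycles = base CPI / speedup."""
    target_cpi = base_cpi / speedup
    return (target_cpi - OTHER_CYCLES) / ARITH_FRACTION

x_b = required_arith_cpi(1.25)
x_c = required_arith_cpi(1.5)

print(f"base CPI = {base_cpi:.2f}")
print(f"b) x = {x_b:.2f}, c) x = {x_c:.2f}")
```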