Computer Performance
Performance
❑ Floating-Point
[Figure: a clock generator drives the CPU; τ is the clock cycle time.]
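Instruction count and average cycles per instruction over n instruction types:
$I_c = \sum_{k=1}^{n} I_k$
$CPI = \frac{\sum_{k=1}^{n} CPI_k \times I_k}{I_c}$
Clock frequency and MIPS rate:
$f(\text{MHz}) = CPI \times MIPS$
Execution time:
$T = I_c \times CPI \times \tau$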
A refinement yields
$T = I_c \times [p + (m \times K)] \times \tau$
where
p is the number of processor cycles needed to decode and execute the instruction
m is the number of memory references needed
K is the ratio between memory cycle time and processor cycle time.
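As a rough illustration of the refined model above, the following Python sketch evaluates T = Ic × [p + (m × K)] × τ. The function name and the sample numbers are assumptions chosen for illustration, not values from the slides.

```python
def execution_time(instruction_count, p, m, k, tau):
    """Return T = Ic * (p + m * K) * tau, in seconds."""
    return instruction_count * (p + m * k) * tau


# Hypothetical workload (not from the slides): 2 million instructions,
# 2 processor cycles each to decode and execute, 0.5 memory references
# per instruction, memory cycle 4x the processor cycle, 400 MHz clock.
example_time = execution_time(2_000_000, p=2, m=0.5, k=4, tau=1 / 400e6)
print(f"T = {example_time * 1e3:.2f} ms")
```

With these assumed numbers each instruction costs 2 + 0.5 × 4 = 4 cycles on average, so the memory term contributes as much as the decode and execute term.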
                               Ic   p    m    K    τ
Instruction set architecture   ✓    ✓    !
Compiler technology            ✓    ✓    ✓
Processor implementation            ✓              ✓
b) Which compiler produces the most efficient code and by which factor?
Compiler 2 is 1.4/1.2 ≈ 1.17 times more efficient than compiler 1.
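[Figure: on a single processor the execution time T splits into a serial part T(1 − f) and a parallelizable part Tf; with N parallel processors the parallelizable part shrinks to Tf/N.]
$\text{Speedup} = \frac{T(1 - f) + Tf}{T(1 - f) + \frac{Tf}{N}} = \frac{1}{(1 - f) + \frac{f}{N}}$
For an enhancement that speeds up a fraction f of the execution time by a factor $SU_f$:
$\text{Speedup} = \frac{1}{(1 - f) + \frac{f}{SU_f}}$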
Example:
Suppose that a task consumes 40% of the time with
floating-point operations. A new FPU has speedup
K. Then the overall speedup is
$\text{Speedup} = \frac{1}{(1 - 0.4) + \frac{0.4}{K}}$
As K grows without bound, the term 0.4/K vanishes, so the maximum speedup is 1/0.6 ≈ 1.67.
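The floating-point example above can be checked numerically. This is a minimal Python sketch; the function names `amdahl_speedup` and `max_speedup` and the sample values of K are assumptions made for illustration.

```python
def amdahl_speedup(f, su_f):
    """Overall speedup when a fraction f of the time is sped up by su_f."""
    return 1.0 / ((1.0 - f) + f / su_f)


def max_speedup(f):
    """Limit of the overall speedup as su_f grows without bound."""
    return 1.0 / (1.0 - f)


# 40% of the time is floating-point work; try a few FPU speedups K.
for k in (2, 10, 100):
    print(f"K = {k:>3}: speedup = {amdahl_speedup(0.4, k):.3f}")
print(f"upper bound: {max_speedup(0.4):.2f}")
```

Even a very fast FPU cannot push the overall speedup past the bound set by the 60% of the time that is not floating-point work.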
Assume that both machines take the same time to run the same
high-level code.
So, if $MIPS_{CISC} = 1$, then $MIPS_{RISC} = 4$.
Designing for Performance 22
Benchmarks
Definition: programs designed to test performance
◼ Written in a high-level language → portable
◼ Representative of a particular application or system programming area (scientific, commercial)
◼ Easily measured and widely distributed
The best-known collection of benchmark suites is maintained by the
Standard Performance Evaluation Corporation (SPEC)
The best-known of the SPEC suites is SPEC CPU2017:
◼ contains 43 benchmarks organized into four suites
◼ includes an optional metric for measuring energy consumption
Standard Performance Evaluation Corporation (SPEC)
For the SPECspeed, the selected ratios are averaged using the
Geometric Mean, which is reported as the overall metric.
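The averaging step can be sketched in Python as follows. The function name `geometric_mean` and the sample ratios are hypothetical; they are not SPEC results.

```python
import math


def geometric_mean(ratios):
    """n-th root of the product of n positive ratios, computed via logs."""
    if not ratios or any(r <= 0 for r in ratios):
        raise ValueError("ratios must be a non-empty sequence of positive numbers")
    return math.exp(sum(math.log(r) for r in ratios) / len(ratios))


# Hypothetical per-benchmark ratios (reference time / measured time).
sample_ratios = [2.0, 8.0, 4.0]
overall_metric = geometric_mean(sample_ratios)
print(f"overall metric = {overall_metric:.2f}")
```

Summing logarithms instead of multiplying the ratios directly avoids overflow when many ratios are combined.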
Answer:
The average CPI is $CPI = 0.6 + (2 \times 0.18) + (4 \times 0.12) + (8 \times 0.1) = 2.24$
Instruction type             CPI   Proportion   Proportion
                                   compiler 1   compiler 2
Floating-point arithmetic    10    20 %         30 %
Integer arithmetic           5     30 %         10 %
Non-arithmetic               2     50 %         60 %
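As a sketch of how the table can be used, the following Python snippet computes the average CPI of the code produced by each compiler as a weighted sum of the per-type CPIs. The dictionary layout and the name `average_cpi` are assumptions made for illustration.

```python
# Cycles per instruction for each instruction type (from the table).
CPI_BY_TYPE = {"floating point": 10, "integer": 5, "non-arithmetic": 2}

# Instruction mix generated by each compiler (from the table).
MIX = {
    "compiler 1": {"floating point": 0.20, "integer": 0.30, "non-arithmetic": 0.50},
    "compiler 2": {"floating point": 0.30, "integer": 0.10, "non-arithmetic": 0.60},
}


def average_cpi(mix):
    """Weighted average CPI: sum of CPI_k times the share of type k."""
    return sum(CPI_BY_TYPE[kind] * share for kind, share in mix.items())


for name, mix in MIX.items():
    print(f"{name}: average CPI = {average_cpi(mix):.2f}")
```

The average CPI alone does not settle which compiler produces faster code, because the total instruction count of each version also enters the execution time.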
Compute the ratio f1/f2 for which the processing time in P1 executing code 1
equals the processing time of P2 executing code 2.
a) Determine the limiting ratio r between the CPI of type-P and type-S instructions
($r = CPI_P / CPI_S$) for which configuration A is faster than configuration B.
b) Compute the upper limit on the speedup that can be achieved using multiple processors
without changing the operating frequency.