Chapter 1 Lecture 2 & 3 - Performance

The document discusses various metrics for measuring computer performance including response time, throughput, and power/energy consumption. It discusses performance metrics for different types of computing like desktop, server, and embedded computing. It provides examples of how performance can be measured for different applications and systems using benchmarks. It discusses factors that influence CPU performance like the processor, memory, compilers, and operating system. The number of cycles required to run a program depends on the types and number of instructions as well as factors like whether instructions are memory or register based and whether they involve simpler operations or more complex ones like multiplication.

Uploaded by

Seid Degu

PERFORMANCE OF A COMPUTER

Computer Performance Metrics

• Response time
  The delay between the start and the end of a task.

• Throughput
  The number of tasks completed per unit of time.

• Power/Energy
  Power drawn, and energy consumed per task.

Metrics for different applications

Desktop computing
Metrics: performance (latency), cost

Server computing
Examples: web servers, transaction servers, file servers
Metrics: performance (throughput), reliability

Embedded computing
Examples: printer, cell phone, video console
Metrics: performance (real-time), cost, power consumption

Airline example

Plane        Speed (mph)   Range (miles)   Passengers   Time (hrs)   Throughput (passengers x mph)
Boeing 777   610           4630            375          6.5          228,750
Boeing 747   610           4150            470          6.5          286,700
Concorde     1350          4000            132          3.0          178,200
DC-8-50      544           8720            146          7.4          79,424

• How much faster is the Concorde than the 747?
• How much bigger is the Boeing 747 than the Douglas DC-8?
• So which of these airplanes has the best performance?
  Performance depends on the measure of interest: speed, range, capacity, or throughput.

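The first two questions reduce to ratios of columns in the airline table. The following is a minimal Python sketch using the figures from that table; the dictionary layout and variable names are illustrative assumptions, not part of the original slides.

```python
# Figures copied from the airline table: cruising speed (mph) and passenger capacity.
planes = {
    "Boeing 777": {"speed_mph": 610, "passengers": 375},
    "Boeing 747": {"speed_mph": 610, "passengers": 470},
    "Concorde": {"speed_mph": 1350, "passengers": 132},
    "DC-8-50": {"speed_mph": 544, "passengers": 146},
}

# Response-time view: ratio of cruising speeds.
speed_ratio = planes["Concorde"]["speed_mph"] / planes["Boeing 747"]["speed_mph"]

# Capacity view: ratio of passenger counts.
capacity_ratio = planes["Boeing 747"]["passengers"] / planes["DC-8-50"]["passengers"]

# Throughput view: passengers x mph, as in the last column of the table.
throughput = {name: p["passengers"] * p["speed_mph"] for name, p in planes.items()}
best_throughput = max(throughput, key=throughput.get)

print(f"Concorde vs 747 speed: {speed_ratio:.2f}x")
print(f"747 vs DC-8 capacity: {capacity_ratio:.2f}x")
print(f"Highest throughput: {best_throughput}")
```

The Concorde comes out roughly 2.2 times faster, the 747 carries roughly 3.2 times as many passengers as the DC-8, and the 747 has the highest throughput, so the "best" plane changes with the metric chosen.
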
How Performance Varies

• The same program can be written in several ways that differ in time and space efficiency.

– Space efficiency: using a small amount of memory
  • Give each variable the smallest type that fits its values
    » Use char or short instead of int where applicable
  • Share data where possible
    » Use pointers, so that several references can point to the same data
  • Re-use code
    » Write functions wherever possible

Performance ... cont’d

– Time efficiency: running in a small amount of time
  • Avoid multiply and divide instructions where a cheaper operation will do
    » Multiply and divide stall the processor until the result is available, several clock cycles later.
  • Make the program shorter, with fewer steps
    » Fewer executed instructions generally mean a shorter execution time.

Measure of performance for a processor

• The rate at which instructions are executed
  – Expressed as millions of instructions per second (MIPS), referred to as the MIPS rate.
  – Measures such as MIPS and MFLOPS have proven inadequate for evaluating the performance of processors.

• Interest has shifted to measuring the performance of systems using a set of benchmark programs.
  – The same set of programs can be run on different machines and the execution times compared.

How computer design increases performance

• Make the common case fast
  The most important and general principle of computer design is to make the common case fast:
  – In making a design trade-off, favor the frequent case over the infrequent case.
  – This principle also applies when determining how to spend resources.
  – The opportunity for improvement depends on how much time the event consumes.
  E.g., the instruction fetch and decode unit is used more frequently than the multiplier, so optimize it first.

Amdahl’s Law

• Amdahl’s Law states that the system performance gain realized from speeding up one component depends not only on the speedup of the component itself, but also on the fraction of work done by that component.

Speedup due to enhancement E:

$$\text{Speedup}(E) = \frac{\text{ExTime without } E}{\text{ExTime with } E} = \frac{\text{Performance with } E}{\text{Performance without } E}$$

Amdahl’s Law [contd…]

$$\text{ExTime}_{new} = \text{ExTime}_{old} \times \left[ (1 - \text{Fraction}_{enhanced}) + \frac{\text{Fraction}_{enhanced}}{\text{Speedup}_{enhanced}} \right]$$

$$\text{Speedup}_{overall} = \frac{\text{ExTime}_{old}}{\text{ExTime}_{new}} = \frac{1}{(1 - \text{Fraction}_{enhanced}) + \dfrac{\text{Fraction}_{enhanced}}{\text{Speedup}_{enhanced}}}$$

• Amdahl’s Law can serve as a guide to how much an enhancement will improve performance.
• It is useful for comparing:
  – the system performance of two alternatives
  – two CPU design alternatives

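Amdahl’s Law can be written as a one-line function. The sketch below assumes the enhanced fraction is given as a value between 0 and 1; the function name `amdahl_speedup` and the sample inputs are illustrative choices, not taken from the slides.

```python
def amdahl_speedup(fraction_enhanced: float, speedup_enhanced: float) -> float:
    """Overall speedup when a fraction of the execution time is sped up by a given factor."""
    if not 0.0 <= fraction_enhanced <= 1.0:
        raise ValueError("fraction_enhanced must be between 0 and 1")
    if speedup_enhanced <= 0:
        raise ValueError("speedup_enhanced must be positive")
    return 1.0 / ((1.0 - fraction_enhanced) + fraction_enhanced / speedup_enhanced)


# Hypothetical inputs chosen only for illustration:
# half of the execution time is enhanced, and that half becomes twice as fast.
print(round(amdahl_speedup(0.5, 2.0), 3))  # prints 1.333
```

The unenhanced fraction bounds the gain: however large `speedup_enhanced` becomes, the result cannot exceed 1 / (1 - fraction_enhanced).
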
Example (1): Design comparison

• Suppose FP square root (FPSQR) is responsible for 20% of the execution time of a critical graphics benchmark.
  – One proposal is to enhance the FPSQR hardware and speed up this operation by a factor of 10.
  – The other alternative is to make all FP instructions in the graphics processor run faster by a factor of 1.6; FP instructions are responsible for a total of 50% of the execution time for the application.

Compare these two alternatives.

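We can compare these two alternatives by comparing their speedups:

$$\text{Speedup}_{FPSQR} = \frac{1}{(1 - 0.2) + \dfrac{0.2}{10}} = \frac{1}{0.82} \approx 1.22$$

$$\text{Speedup}_{FP} = \frac{1}{(1 - 0.5) + \dfrac{0.5}{1.6}} = \frac{1}{0.8125} \approx 1.23$$

• Improving the performance of the FP operations overall is slightly better because of their higher frequency.
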
Example (2)

• You have a system that contains a special processor for floating-point operations. You have determined that 50% of your computations can use the floating-point processor. The speedup of the floating-point processor is 15.

  a) Find the overall speedup achieved by using the floating-point processor.

  b) Find the overall speedup achieved if you modify the compiler so that 75% of the computations can use the floating-point processor.

  c) What fraction of the computations should be able to use the floating-point processor in order to achieve an overall speedup of 2.25?

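Parts a) and b) apply Amdahl’s Law directly, and part c) needs it solved for the fraction. The following is a minimal sketch; the helper names `amdahl_speedup` and `fraction_needed` are assumptions introduced here, and the inversion follows from rearranging 1/Speedup_overall = 1 - F × (1 - 1/Speedup_enhanced).

```python
def amdahl_speedup(fraction: float, speedup: float) -> float:
    """Overall speedup from Amdahl's Law."""
    return 1.0 / ((1.0 - fraction) + fraction / speedup)


def fraction_needed(target_overall: float, speedup: float) -> float:
    """Amdahl's Law solved for the fraction that must use the enhancement."""
    return (1.0 - 1.0 / target_overall) / (1.0 - 1.0 / speedup)


FP_SPEEDUP = 15.0  # speedup of the floating-point processor, from the problem statement

part_a = amdahl_speedup(0.50, FP_SPEEDUP)
part_b = amdahl_speedup(0.75, FP_SPEEDUP)
part_c = fraction_needed(2.25, FP_SPEEDUP)

print(f"a) overall speedup at 50%: {part_a:.3f}")
print(f"b) overall speedup at 75%: {part_b:.3f}")
print(f"c) fraction needed for 2.25x: {part_c:.1%}")
```

Under these inputs the results are about 1.875 for a), about 3.33 for b), and a fraction of about 59.5% for c).
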
Example (3)

• You have a system that contains a special processor for floating-point operations. You have determined that 60% of your computations can use the floating-point processor. When a program uses the floating-point processor, it runs 40% faster than when it does not (an enhanced speedup of 1.4).

  a) Find the overall speedup achieved by using the floating-point processor.

  b) To improve the speedup, consider two options:

  • Option 1: Modify the compiler so that 70% of the computations can use the floating-point processor. The cost of this option is $50K.

  • Option 2: Modify the floating-point processor so that a program using it runs 100% faster than when it does not (an enhanced speedup of 2). Assume in this case that 50% of the computations can use the floating-point processor. The cost of this option is $60K.

  Which option would you recommend? Justify your answer quantitatively.

Therefore, Option 1 is better because it has a smaller Cost/Speedup ratio.

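The cost/speedup comparison can be checked numerically. This sketch assumes that "40% faster" means an enhanced speedup of 1.4 and "100% faster" means an enhanced speedup of 2.0; the dictionary layout and names are illustrative, not taken from the slides.

```python
def amdahl_speedup(fraction: float, speedup: float) -> float:
    """Overall speedup from Amdahl's Law."""
    return 1.0 / ((1.0 - fraction) + fraction / speedup)


# Part a): 60% of computations use the FP processor, assumed enhanced speedup 1.4.
baseline = amdahl_speedup(0.60, 1.4)

# Part b): the two options, with cost in thousands of dollars.
options = {
    "Option 1": {"fraction": 0.70, "speedup": 1.4, "cost_k": 50},
    "Option 2": {"fraction": 0.50, "speedup": 2.0, "cost_k": 60},
}

results = {}
for name, o in options.items():
    overall = amdahl_speedup(o["fraction"], o["speedup"])
    results[name] = {"overall": overall, "cost_per_speedup": o["cost_k"] / overall}

print(f"a) baseline overall speedup: {baseline:.3f}")
for name, r in results.items():
    print(f"{name}: speedup {r['overall']:.3f}, cost/speedup {r['cost_per_speedup']:.1f}K")
```

With these assumptions, Option 1 gives a speedup of 1.25 at $40K per unit of speedup, and Option 2 gives about 1.33 at $45K per unit. Option 2 is faster in absolute terms, but Option 1 has the lower Cost/Speedup ratio.
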
Example (4)

• A program runs in 100 seconds on a machine, with multiply operations responsible for 80 seconds of this time.
• How much must the speed of multiplication improve for the program to run five times faster?

Execution time after improvement =
  (execution time affected by improvement / amount of improvement) + execution time unaffected

Execution time after improvement = (80 seconds / n) + (100 - 80) seconds

For the program to run 5 times faster, it must finish in 100 / 5 = 20 seconds:

20 seconds = 80/n seconds + 20 seconds
0 = 80/n

No finite n satisfies this equation. The 20 seconds that multiplication does not affect already use up the whole target time, so a fivefold speedup cannot be reached by improving multiplication alone.

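The dead end in this example is easier to see numerically. In the sketch below, the values of n are arbitrary illustrations and the names are assumptions; only the 100-second and 80-second figures come from the problem.

```python
TOTAL_S = 100.0     # execution time of the whole program, in seconds
MULTIPLY_S = 80.0   # part of that time spent in multiply operations


def time_after(n: float) -> float:
    """Execution time when multiplication is made n times faster."""
    return MULTIPLY_S / n + (TOTAL_S - MULTIPLY_S)


# Arbitrary improvement factors, chosen only to show the trend.
for n in (2, 4, 16, 1000):
    t = time_after(n)
    print(f"n = {n:>4}: {t:6.2f} s, overall speedup {TOTAL_S / t:.2f}x")

# Upper bound on the overall speedup: only the unaffected 20 seconds remain.
max_speedup = TOTAL_S / (TOTAL_S - MULTIPLY_S)
print(f"limit as n grows: {max_speedup:.1f}x")
```

The overall speedup approaches 5 as n grows but does not reach it for any finite n, which matches the result 0 = 80/n.
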
CPU PERFORMANCE

• The main factors influencing the performance of a computer system are:
  – processor and memory,
  – compilers, and
  – operating system.

• CPU time is the time between the start and the end of execution of a given program.
  – CPU time depends on the program being executed, including:
    • the types of instructions executed
    • the number of instructions executed

How many cycles are required for a program?

• A first assumption: # of cycles = # of instructions
  [Figure: the 1st, 2nd, 3rd, 4th, 5th, 6th, ... instructions laid out one after another along a time axis]

• This assumption is incorrect, because different instructions take different amounts of time (cycles):
  – Multiplication takes more time than addition
  – Floating-point operations take longer than integer ones
  – Accessing memory takes more time than accessing registers

CPU performance equation ... cont’d

• CPU performance depends on three characteristics: clock cycle time (or clock rate), clock cycles per instruction (CPI), and instruction count (IC).

• For a given architecture, performance increases come from:
  – increases in clock rate (without adverse effects on CPI)
  – improvements in processor organization that lower CPI
  – compiler enhancements that lower CPI and/or instruction count

Because CPU time is the product IC × CPI × clock cycle time, a 10% reduction in any one of them leads to a 10% reduction in CPU time.

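The effect of each factor can be checked with the product form CPU time = IC × CPI × clock cycle time. The baseline numbers in this sketch are hypothetical values chosen only for illustration, and `cpu_time` is an assumed helper name.

```python
def cpu_time(instruction_count: float, cpi: float, clock_cycle_time_s: float) -> float:
    """CPU time = instruction count x cycles per instruction x clock cycle time."""
    return instruction_count * cpi * clock_cycle_time_s


# Hypothetical baseline: 1 million instructions, CPI of 2.0, 1 ns clock cycle.
base = cpu_time(1_000_000, 2.0, 1e-9)

# Reduce each factor by 10% in turn, leaving the other two unchanged.
variants = {
    "10% fewer instructions": cpu_time(900_000, 2.0, 1e-9),
    "10% lower CPI": cpu_time(1_000_000, 1.8, 1e-9),
    "10% shorter clock cycle": cpu_time(1_000_000, 2.0, 0.9e-9),
}

for label, t in variants.items():
    print(f"{label}: {t / base:.2f} of the baseline CPU time")
```

Each variant runs in 0.90 of the baseline time. Note that this is stated for the clock cycle time; a 10% higher clock rate shortens the cycle by slightly less than 10%.
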
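CPU performance equation ... cont’d

• To calculate the total number of CPU clock cycles:

$$\text{CPU clock cycles} = \sum_{i=1}^{n} IC_i \times CPI_i$$

  – where IC_i is the number of times instruction i is executed in a program
  – and CPI_i is the average number of clock cycles per instruction for instruction i.

• This form can be used to express CPU time as

$$\text{CPU time} = \left( \sum_{i=1}^{n} IC_i \times CPI_i \right) \times \text{Clock cycle time}$$
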
CPU Time: Example 1

Consider an implementation of the MIPS ISA with a 500 MHz clock, in which
– each ALU instruction takes 3 clock cycles,
– each branch/jump instruction takes 2 clock cycles,
– each sw instruction takes 4 clock cycles,
– each lw instruction takes 5 clock cycles.

Also, consider a program that during its execution executes:
– x = 200 million ALU instructions
– y = 55 million branch/jump instructions
– z = 25 million sw instructions
– w = 20 million lw instructions

Find the CPU time.

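One way to work this example is to sum the cycles per instruction class and divide by the clock rate. The sketch below uses only the figures given in the problem; the dictionary layout and variable names are illustrative assumptions.

```python
CLOCK_RATE_HZ = 500e6  # 500 MHz

# (instructions executed, clock cycles per instruction) for each class
mix = {
    "ALU": (200e6, 3),
    "branch/jump": (55e6, 2),
    "sw": (25e6, 4),
    "lw": (20e6, 5),
}

# Total clock cycles = sum over classes of IC_i x CPI_i.
total_cycles = sum(count * cycles for count, cycles in mix.values())
total_instructions = sum(count for count, _ in mix.values())

cpu_time_s = total_cycles / CLOCK_RATE_HZ
average_cpi = total_cycles / total_instructions

print(f"total cycles: {total_cycles / 1e6:.0f} million")
print(f"CPU time: {cpu_time_s:.2f} s")
print(f"average CPI: {average_cpi:.2f}")
```

The program needs 910 million cycles, which at 500 MHz gives a CPU time of 1.82 seconds and an average CPI of about 3.03.
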
Example - 2

• Suppose you have a load/store computer with the following instruction mix:

  a) Compute the CPI.

  b) We observe that 35% of the ALU ops are paired with a load, and we propose to replace these ALU ops and their loads with a new instruction. The new instruction takes 1 clock cycle. With the new instruction added, branches take 5 clock cycles. Compute the CPI for the new version.

  c) If the clock of the old version is 20% faster than that of the new version, which version has the faster CPU execution time, and by what percentage?

Example - 3

• For the purpose of solving a given application problem, you benchmark a program on two computer systems.
  – On system A, the object code executed 80 million Arithmetic Logic Unit operations (ALU ops), 40 million load instructions, and 25 million branch instructions.
  – On system B, the object code executed 50 million ALU ops, 50 million loads, and 40 million branch instructions.
  – In both systems, each ALU op takes 1 clock cycle, each load takes 3 clock cycles, and each branch takes 5 clock cycles.

  a) Compute the relative frequency of occurrence of each type of instruction executed in both systems.

  b) Find the CPI for each system.

  c) Assuming that the clock on system B is 10% faster than the clock on system A, which system is faster for the given application problem, and by what percentage?

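All three parts follow from the instruction counts and the cycle costs. The sketch below assumes that "10% faster" means system B's clock rate is 1.1 times system A's; the function and variable names are illustrative.

```python
CYCLES = {"ALU": 1, "load": 3, "branch": 5}  # clock cycles per instruction type

counts = {
    "A": {"ALU": 80e6, "load": 40e6, "branch": 25e6},
    "B": {"ALU": 50e6, "load": 50e6, "branch": 40e6},
}


def frequencies(mix):
    """Relative frequency of each instruction type."""
    total = sum(mix.values())
    return {kind: n / total for kind, n in mix.items()}


def total_cycles(mix):
    """Total clock cycles = sum of IC_i x CPI_i."""
    return sum(n * CYCLES[kind] for kind, n in mix.items())


def cpi(mix):
    """Average clock cycles per instruction."""
    return total_cycles(mix) / sum(mix.values())


for name, mix in counts.items():
    freq = ", ".join(f"{k} {v:.1%}" for k, v in frequencies(mix).items())
    print(f"System {name}: {freq}; CPI = {cpi(mix):.3f}")

# Part c): times in units of system A's clock cycle time.
# Assumption: B's clock rate is 1.1x A's, so B's cycle time is 1/1.1 of A's.
time_a = total_cycles(counts["A"])
time_b = total_cycles(counts["B"]) / 1.1
faster = "A" if time_a < time_b else "B"
print(f"System {faster} is faster by {max(time_a, time_b) / min(time_a, time_b) - 1:.1%}")
```

System A has a CPI of about 2.24 against about 2.86 for system B. Even with its faster clock, B needs more time, so under this assumption A is faster by roughly 11.9%.
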
Example – 3

• Consider our earlier example, here modified to use measurements of the frequency of the instructions and of the instruction CPI values, which, in practice, are obtained by simulation or by hardware instrumentation.

• Suppose we have made the following measurements:
  – Frequency of FP operations (other than FPSQR) = 25%
  – Average CPI of FP operations = 4.0
  – Average CPI of other instructions = 1.33
  – Frequency of FPSQR = 2%
  – CPI of FPSQR = 20

• Assume that the two design alternatives are to decrease the CPI of FPSQR to 2 or to decrease the average CPI of all FP operations to 2.5. Compare these two design alternatives using the CPU performance equation.

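CPU performance ... cont’d

First, observe that only the CPI changes; the clock rate and instruction count remain identical. We start by finding the original CPI with neither enhancement:

$$CPI_{\text{original}} = (4.0 \times 25\%) + (1.33 \times 75\%) \approx 2.0$$

We can compute the CPI for the enhanced FPSQR by subtracting the cycles saved from the original CPI:

$$CPI_{\text{with new FPSQR}} = CPI_{\text{original}} - 2\% \times (CPI_{\text{old FPSQR}} - CPI_{\text{new FPSQR}}) = 2.0 - 2\% \times (20 - 2) = 1.64$$

We can compute the CPI for the enhancement of all FP instructions the same way or by summing the FP and non-FP CPIs. Using the latter gives us

$$CPI_{\text{new FP}} = (75\% \times 1.33) + (25\% \times 2.5) \approx 1.62$$

Since the CPI of the overall FP enhancement is slightly lower, its performance will be marginally better. Specifically, the speedup for the overall FP enhancement is

$$\text{Speedup}_{\text{new FP}} = \frac{IC \times \text{Clock cycle time} \times CPI_{\text{original}}}{IC \times \text{Clock cycle time} \times CPI_{\text{new FP}}} = \frac{2.0}{1.62} \approx 1.23$$
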
Example - 4

• Suppose we are considering two alternatives for our conditional branch instructions, as follows:
  – CPU A: A condition code is set by a compare instruction and followed by a branch that tests the condition code.
  – CPU B: A compare is included in the branch.

• On both CPUs, the conditional branch instruction takes 2 cycles, and all other instructions take 1 clock cycle. On CPU A, 20% of all instructions executed are conditional branches; since every branch needs a compare, another 20% of the instructions are compares. Because CPU A does not have the compare included in the branch, assume that its clock is 1.25 times faster than that of CPU B. Which CPU is faster? What if CPU A’s clock were only 1.1 times faster?

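• Since we are ignoring all systems issues, we can use the CPU performance formula:

$$CPI_A = 0.20 \times 2 + 0.80 \times 1 = 1.2$$

since 20% are branches taking 2 clock cycles and the rest of the instructions take 1 cycle each. The performance of CPU A is then

$$\text{CPU time}_A = IC_A \times 1.2 \times \text{Clock cycle time}_A$$

• Clock cycle time_B is 1.25 × Clock cycle time_A, since A has a clock rate that is 1.25 times higher. Compares are not executed in CPU B, so 20%/80% or 25% of the instructions are now branches taking 2 clock cycles, and the remaining 75% of the instructions take 1 cycle. Hence,

$$CPI_B = 0.25 \times 2 + 0.75 \times 1 = 1.25$$

• Because CPU B doesn’t execute compares, IC_B = 0.8 × IC_A. Hence, the performance of CPU B is

$$\text{CPU time}_B = IC_B \times CPI_B \times \text{Clock cycle time}_B = 0.8 \times IC_A \times 1.25 \times (1.25 \times \text{Clock cycle time}_A) = 1.25 \times IC_A \times \text{Clock cycle time}_A$$

• Under these assumptions, CPU A, with the shorter clock cycle time, is faster than CPU B, which executes fewer instructions. If CPU A were only 1.1 times faster, then Clock cycle time_B is 1.10 × Clock cycle time_A, and the performance of CPU B is

$$\text{CPU time}_B = 0.8 \times IC_A \times 1.25 \times (1.10 \times \text{Clock cycle time}_A) = 1.10 \times IC_A \times \text{Clock cycle time}_A$$

With this change CPU B, which executes fewer instructions, is now faster.
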
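The comparison between the two CPUs can be parameterized by the clock ratio. In this sketch, times are expressed in units of IC_A × Clock cycle time_A, and `clock_ratio` (Clock cycle time_B divided by Clock cycle time_A) is a name introduced here for illustration.

```python
def relative_times(clock_ratio: float):
    """Return (time_A, time_B) in units of IC_A x clock cycle time_A.

    clock_ratio is clock cycle time_B / clock cycle time_A.
    """
    # CPU A: 20% branches at 2 cycles, everything else at 1 cycle.
    cpi_a = 0.20 * 2 + 0.80 * 1
    time_a = 1.0 * cpi_a * 1.0

    # CPU B: the compares disappear, so it executes 80% of A's instructions,
    # and branches make up 20% / 80% = 25% of what remains.
    ic_b = 0.80
    branch_share_b = 0.20 / 0.80
    cpi_b = branch_share_b * 2 + (1.0 - branch_share_b) * 1
    time_b = ic_b * cpi_b * clock_ratio
    return time_a, time_b


for ratio in (1.25, 1.10):
    time_a, time_b = relative_times(ratio)
    winner = "A" if time_a < time_b else "B"
    print(f"clock ratio {ratio:.2f}: A = {time_a:.2f}, B = {time_b:.2f}, faster: CPU {winner}")
```

CPU A wins at a ratio of 1.25 and CPU B wins at 1.10. Under these assumptions the two designs tie when the ratio is 1.2, since CPU B's time equals the ratio itself and CPU A's time is 1.2.
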
