
4 Performance

Computer performance is affected by many factors including technology, organization, software, and number of processors. Performance can be measured in execution time, throughput, and CPU execution time. Execution time includes all time spent on a task while CPU execution time only includes time spent computing. Performance is defined as the inverse of execution time. Comparing performance between machines requires using the same program or benchmark. The clock rate, clock cycle time, instructions per program, and cycles per instruction all factor into calculating a computer's execution time and performance.

Uploaded by

3sfr3sfr

Computer Performance

Performance of Computer Systems

• Performance depends on many different factors, including:
  - Technology
    - Raw speed of the circuits (clock, switching time)
    - Process technology (how many transistors on a chip)
  - Organization
    - What type of processor (e.g., RISC vs. CISC)
    - What type of memory hierarchy
    - What types of I/O devices
    - How many processors in the system
  - Software
    - O.S., compilers, database drivers, etc.
Definitions of Time
• Time can be defined in different ways, depending on what we are measuring:
  - Execution time: The time between the start and completion of a task. It includes time spent executing on the CPU, accessing disk and memory, waiting for I/O and other processes, and operating system overhead.
  - Throughput: The total amount of work done in a given time.
  - CPU execution time: The total time a CPU spends computing on a given task (excludes time for I/O or running other programs). This is also referred to simply as CPU time.
Performance Definition
• For some program running on machine X,
$$\text{Performance}_X = \frac{1}{\text{Execution time}_X}$$
• "X is n times faster than Y" means
$$\frac{\text{Performance}_X}{\text{Performance}_Y} = n$$
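Problem:
• Machine A runs a program in 20 seconds.
• Machine B runs the same program in 25 seconds.
• How many times faster is machine A?

$$\frac{\text{Performance}_A}{\text{Performance}_B} = \frac{\text{Execution time}_B}{\text{Execution time}_A} = \frac{25}{20} = 1.25$$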
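
The comparison above can be sketched in a few lines of Python. This is only an illustration under the slide's definition of performance; the function name `times_faster` is an assumption introduced here.

```python
def times_faster(exec_time_x: float, exec_time_y: float) -> float:
    """Return n such that machine X is n times faster than machine Y."""
    # Performance is the inverse of execution time, so the ratio of
    # performances equals the inverse ratio of execution times.
    performance_x = 1.0 / exec_time_x
    performance_y = 1.0 / exec_time_y
    return performance_x / performance_y


# Machine A takes 20 seconds, machine B takes 25 seconds.
n = times_faster(20.0, 25.0)
print(f"Machine A is {n:.2f} times faster than machine B")
```

Note that the faster machine's time ends up in the denominator, which is an easy place to slip when doing this by hand.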
Basic Measurement Metrics
• Comparing Machines
  - Metrics
    - Execution time
    - Throughput
    - CPU time
    - MIPS – millions of instructions per second
    - MFLOPS – millions of floating point operations per second
  - Comparing Machines Using Sets of Programs
    - Arithmetic mean, weighted arithmetic mean
  - Benchmarks
• When discussing processor performance, we will focus primarily on execution time for a single job. Why?
  - Because different programs have different characteristics (tasks).
  - Only compare processors using the same task.
Computer Clock
• A computer clock runs at a constant rate and determines when events take place in hardware.
[Figure: clock signal (Clk) waveform with one clock period marked]
• The clock cycle time is the amount of time for one clock period to elapse (e.g., 5 ns).
• The clock rate is the inverse of the clock cycle time.
• For example, if a computer has a clock cycle time of 5 ns, the clock rate is:
$$\frac{1}{5 \times 10^{-9}\ \text{sec}} = 200\ \text{MHz}$$
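
As a quick check of the reciprocal relationship above, here is a minimal Python sketch. The helper names `clock_rate_hz` and `cycle_time_s` are assumptions made for this illustration.

```python
def clock_rate_hz(cycle_time_seconds: float) -> float:
    """Clock rate (cycles per second) is the inverse of the clock cycle time."""
    return 1.0 / cycle_time_seconds


def cycle_time_s(rate_hz: float) -> float:
    """Clock cycle time (seconds per cycle) is the inverse of the clock rate."""
    return 1.0 / rate_hz


rate = clock_rate_hz(5e-9)  # 5 ns cycle time
print(f"{rate / 1e6:.0f} MHz")
```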
How Many Cycles are Required for a Program?
• Could assume that # of cycles = # of instructions
[Figure: timeline with one clock cycle per instruction (1st instruction, 2nd instruction, 3rd instruction, 4th, 5th, 6th, ...) along a time axis]
• This assumption is incorrect: different instructions take different amounts of time on different machines.
Different Numbers of Cycles for Different Instructions
[Figure: timeline of instructions occupying different numbers of clock cycles along a time axis]
• Division takes more time than addition
• Floating point operations take longer than integer ones
• Accessing memory takes more time than accessing registers
Now That We Understand Cycles
• A given program will require
  - some number of instructions (machine instructions)
  - some number of clock cycles
  - some number of seconds
• We have a vocabulary that relates these quantities:
  - clock cycle time (seconds per cycle)
  - clock rate (cycles per second)
  - CPI (cycles per instruction)
    - a floating point intensive application might have a higher CPI
Computing CPU Time
• The time to execute a given program can be computed as
CPU time = CPU clock cycles x clock cycle time
• Since clock cycle time and clock rate are reciprocals,
CPU time = CPU clock cycles / clock rate
• The number of CPU clock cycles can be determined by
CPU clock cycles = (instructions/program) x (clock cycles/instruction)
= Instruction count x CPI
which gives
CPU time = Instruction count x CPI x clock cycle time
CPU time = Instruction count x CPI / clock rate
• The units for CPU time are
$$\text{CPU time} = \frac{\text{instructions}}{\text{program}} \times \frac{\text{clock cycles}}{\text{instruction}} \times \frac{\text{seconds}}{\text{clock cycle}}$$
Which factors are affected by each of the following?

                    Instr. count   CPI   Clock rate
Program                  X
Compiler                 X          X
Instr. Set Arch.         X          X
Organization                        X        X
Technology                                   X

$$\text{CPU time} = \frac{\text{Seconds}}{\text{Program}} = \frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Cycles}}{\text{Instruction}} \times \frac{\text{Seconds}}{\text{Cycle}}$$
CPU Time Example
• Example 1:
  - CPU clock rate is 1 MHz
  - Program takes 45 million cycles to execute
  - What’s the CPU time?

45,000,000 * (1 / 1,000,000) = 45 seconds

• Example 2:
  - CPU clock rate is 500 MHz
  - Program takes 45 million cycles to execute
  - What’s the CPU time?

45,000,000 * (1 / 500,000,000) = 0.09 seconds
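
Both examples apply CPU time = CPU clock cycles / clock rate. A minimal Python sketch of that calculation follows; the function name `cpu_time_seconds` is an assumption chosen for this illustration.

```python
def cpu_time_seconds(clock_cycles: float, clock_rate_hz: float) -> float:
    """CPU time = CPU clock cycles / clock rate."""
    return clock_cycles / clock_rate_hz


# Example 1: 45 million cycles on a 1 MHz clock.
example_1 = cpu_time_seconds(45_000_000, 1_000_000)
# Example 2: the same 45 million cycles on a 500 MHz clock.
example_2 = cpu_time_seconds(45_000_000, 500_000_000)

print(f"Example 1: {example_1:.2f} s")
print(f"Example 2: {example_2:.2f} s")
```

With the cycle count held fixed, the CPU time scales inversely with the clock rate.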


Example
• Example: Let us assume that a benchmark has 100 instructions:
  - 25 instructions are loads/stores (each takes 2 cycles)
  - 50 instructions are adds (each takes 1 cycle)
  - 25 instructions are square roots (each takes 50 cycles)

The total number of cycles executed is 2 * 25 + 50 * 1 + 25 * 50 = 1350 cycles.

Or we can compute the average CPI, which is

Average CPI = ((0.25 * 2) + (0.50 * 1) + (0.25 * 50)) = 13.5

Then the total number of cycles is Instruction count * CPI:

13.5 * 100 = 1350 cycles.

If the clock rate is 1 kHz, then the execution time is 1350 / 1000 = 1.35 seconds.
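
The benchmark arithmetic above can be reproduced with a short Python sketch. The variable names and the tuple layout of `instruction_mix` are assumptions made for this illustration.

```python
# Each entry: (instruction type, number of instructions, cycles per instruction)
instruction_mix = [
    ("load/store", 25, 2),
    ("add", 50, 1),
    ("square root", 25, 50),
]

CLOCK_RATE_HZ = 1_000  # 1 kHz, as in the example

instruction_count = sum(count for _, count, _ in instruction_mix)
total_cycles = sum(count * cycles for _, count, cycles in instruction_mix)
average_cpi = total_cycles / instruction_count
execution_time_s = total_cycles / CLOCK_RATE_HZ

print(f"Total cycles: {total_cycles}")
print(f"Average CPI:  {average_cpi}")
print(f"Execution time: {execution_time_s:.2f} s")
```

Although square roots are only a quarter of the instructions, they account for 1250 of the 1350 cycles.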
Computing CPI
• The CPI is the average number of cycles per instruction.
• If for each instruction type we know its frequency and the number of cycles needed to execute it, we can compute the overall average CPI as follows:

$$\text{Average CPI} = \sum_i \text{CPI}_i \times F_i$$

• For example:

Op       F      CPI   CPI x F   % Time
ALU      50%    1     0.5       23%
Load     20%    5     1.0       45%
Store    10%    3     0.3       14%
Branch   20%    2     0.4       18%
Total    100%         2.2       100%
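
The table can be recomputed with a few lines of Python. In this sketch the "% Time" column is taken to be each operation's CPI x F divided by the average CPI, which is an assumption that matches the values shown; the names `operations` and `time_share` are introduced here for illustration.

```python
# Each entry: operation -> (frequency F, cycles per instruction)
operations = {
    "ALU": (0.50, 1),
    "Load": (0.20, 5),
    "Store": (0.10, 3),
    "Branch": (0.20, 2),
}

# Average CPI is the frequency-weighted sum of the per-type CPIs.
average_cpi = sum(freq * cpi for freq, cpi in operations.values())

# Share of execution time spent in each operation type.
time_share = {
    name: (freq * cpi) / average_cpi
    for name, (freq, cpi) in operations.items()
}

print(f"Average CPI: {average_cpi:.1f}")
for name, share in time_share.items():
    print(f"{name:<7} {share:.0%}")
```

Loads are only 20% of the instructions but take the largest share of the time, which is the kind of observation the "% Time" column is meant to surface.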
Performance
• Performance is determined by execution time.
• Do you think any one of these variables is sufficient to determine computer performance?
  - # of cycles to execute program?
  - # of instructions in program?
  - # of cycles per second?
  - average # of cycles per instruction?
  - average # of instructions per second?
• No single one of these variables is, on its own, indicative of performance.
Performance Example
• Suppose we have two implementations of the same instruction set architecture (ISA). For some program,
  - Machine A has a clock cycle time of 10 ns and a CPI of 2.0
  - Machine B has a clock cycle time of 20 ns and a CPI of 1.2
• Which machine is faster for this program, and by how much?

Assume that the number of instructions in the program is 1,000,000,000.

$$\text{CPU Time}_A = 10^9 \times 2.0 \times 10 \times 10^{-9} = 20\ \text{seconds}$$
$$\text{CPU Time}_B = 10^9 \times 1.2 \times 20 \times 10^{-9} = 24\ \text{seconds}$$

Machine A is faster, by a factor of
$$\frac{24}{20} = 1.2\ \text{times}$$
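
A small Python sketch of this comparison, using CPU time = Instruction count x CPI x clock cycle time. The function name `cpu_time_seconds` and the constant name are assumptions for this illustration; the instruction count of one billion is the one assumed in the example.

```python
def cpu_time_seconds(instruction_count: float, cpi: float,
                     cycle_time_s: float) -> float:
    """CPU time = Instruction count x CPI x clock cycle time."""
    return instruction_count * cpi * cycle_time_s


INSTRUCTION_COUNT = 1_000_000_000  # assumed in the example

time_a = cpu_time_seconds(INSTRUCTION_COUNT, 2.0, 10e-9)  # machine A
time_b = cpu_time_seconds(INSTRUCTION_COUNT, 1.2, 20e-9)  # machine B

print(f"Machine A: {time_a:.1f} s, machine B: {time_b:.1f} s")
print(f"Machine A is {time_b / time_a:.1f} times faster")
```

Because both machines run the same instruction count, the ratio does not depend on the assumed value of one billion; only CPI x cycle time matters.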
Number of Instructions Example
• A compiler designer is trying to decide between two code sequences for a particular machine. Based on the hardware implementation, there are three different classes of instructions: Class A, Class B, and Class C, and they require one, two, and three cycles (respectively).
  - The first code sequence has 5 instructions: 2 of A, 1 of B, and 2 of C.
  - The second sequence has 6 instructions: 4 of A, 1 of B, and 1 of C.
• Which sequence will be faster? By how much?
• What is the CPI for each sequence?

# of cycles for first code = (2 * 1) + (1 * 2) + (2 * 3) = 10 cycles
# of cycles for second code = (4 * 1) + (1 * 2) + (1 * 3) = 9 cycles
The second sequence is faster, by 10 / 9 = 1.11 times.
CPI for first code = 10 / 5 = 2
CPI for second code = 9 / 6 = 1.5
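
The cycle counts and CPIs for the two sequences can be checked with a short Python sketch. The names `CYCLES_PER_CLASS` and `cycles_and_cpi` are assumptions made for this illustration.

```python
# Cycles required by each instruction class, as given in the example.
CYCLES_PER_CLASS = {"A": 1, "B": 2, "C": 3}


def cycles_and_cpi(sequence: dict) -> tuple:
    """Return (total cycles, CPI) for a mapping of class -> instruction count."""
    instructions = sum(sequence.values())
    cycles = sum(count * CYCLES_PER_CLASS[cls] for cls, count in sequence.items())
    return cycles, cycles / instructions


first = {"A": 2, "B": 1, "C": 2}   # 5 instructions
second = {"A": 4, "B": 1, "C": 1}  # 6 instructions

cycles_1, cpi_1 = cycles_and_cpi(first)
cycles_2, cpi_2 = cycles_and_cpi(second)

print(f"First:  {cycles_1} cycles, CPI {cpi_1}")
print(f"Second: {cycles_2} cycles, CPI {cpi_2}")
print(f"Second is {cycles_1 / cycles_2:.2f} times faster")
```

The second sequence executes more instructions yet finishes in fewer cycles, so instruction count alone does not predict which sequence is faster.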
Poor Performance Metrics
• Marketing metrics for computer performance included MIPS and MFLOPS.
• MIPS: millions of instructions per second
  - $\text{MIPS} = \dfrac{\text{instruction count}}{\text{execution time} \times 10^6} = \dfrac{\text{clock rate}}{\text{CPI} \times 10^6}$
  - For example, a program that executes 3 million instructions in 2 seconds has a MIPS rating of 1.5.
  - Advantage: Easy to understand and measure
  - Disadvantage: May not reflect actual performance, since code made of many simple instructions can earn a higher rating even when it takes longer to run.
• MFLOPS: millions of floating point operations per second
  - $\text{MFLOPS} = \dfrac{\text{floating point operations}}{\text{execution time} \times 10^6}$
  - For example, a program that executes 4 million floating point instructions in 5 seconds has an MFLOPS rating of 0.8.
  - Advantage: Easy to understand and measure
  - Disadvantage: Same as MIPS, and it only measures floating point operations.
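
The two ratings defined above can be computed as follows. This is a minimal Python sketch; the function names `mips_rating`, `mips_from_clock`, and `mflops_rating` are assumptions introduced for illustration.

```python
def mips_rating(instruction_count: float, execution_time_s: float) -> float:
    """MIPS = instruction count / (execution time x 10^6)."""
    return instruction_count / (execution_time_s * 1e6)


def mips_from_clock(clock_rate_hz: float, cpi: float) -> float:
    """Equivalent form: MIPS = clock rate / (CPI x 10^6)."""
    return clock_rate_hz / (cpi * 1e6)


def mflops_rating(fp_operations: float, execution_time_s: float) -> float:
    """MFLOPS = floating point operations / (execution time x 10^6)."""
    return fp_operations / (execution_time_s * 1e6)


print(mips_rating(3_000_000, 2.0))    # 3 million instructions in 2 seconds
print(mflops_rating(4_000_000, 5.0))  # 4 million fp operations in 5 seconds
```

Neither function looks at how much useful work each instruction does, which is why these ratings can diverge from execution time.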


MIPS Example
• Two different compilers are being tested for a 500 MHz machine with three different classes of instructions: Class A, Class B, and Class C, which require one, two, and three cycles (respectively). Both compilers are used to produce code for a large piece of software.
  - The first compiler's code uses 5 billion Class A instructions, 1 billion Class B instructions, and 1 billion Class C instructions.
  - The second compiler's code uses 10 billion Class A instructions, 1 billion Class B instructions, and 1 billion Class C instructions.
• Which sequence will be faster according to MIPS?
• Which sequence will be faster according to execution time?
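MIPS Example (Con’t)

Instruction counts (in billions) for each instruction class

Code from     A    B    C
Compiler 1    5    1    1
Compiler 2    10   1    1

$$\text{CPU clock cycles}_1 = (5 \times 1 + 1 \times 2 + 1 \times 3) \times 10^9 = 10 \times 10^9$$
$$\text{CPU clock cycles}_2 = (10 \times 1 + 1 \times 2 + 1 \times 3) \times 10^9 = 15 \times 10^9$$

$$\text{CPU time}_1 = \frac{10 \times 10^9}{500 \times 10^6} = 20\ \text{seconds}$$
$$\text{CPU time}_2 = \frac{15 \times 10^9}{500 \times 10^6} = 30\ \text{seconds}$$

$$\text{MIPS}_1 = \frac{(5 + 1 + 1) \times 10^9}{20 \times 10^6} = 350$$
$$\text{MIPS}_2 = \frac{(10 + 1 + 1) \times 10^9}{30 \times 10^6} = 400$$

Compiler 2's code has the higher MIPS rating, but compiler 1's code has the shorter execution time.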
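
The following Python sketch reruns the compiler comparison and shows the MIPS rating and the execution time side by side. The names `CLOCK_RATE_HZ`, `CYCLES_PER_CLASS`, and `evaluate` are assumptions made for this illustration.

```python
CLOCK_RATE_HZ = 500e6  # 500 MHz machine
CYCLES_PER_CLASS = {"A": 1, "B": 2, "C": 3}


def evaluate(counts: dict) -> tuple:
    """Return (CPU time in seconds, MIPS rating) for class -> instruction count."""
    instructions = sum(counts.values())
    cycles = sum(n * CYCLES_PER_CLASS[cls] for cls, n in counts.items())
    cpu_time_s = cycles / CLOCK_RATE_HZ
    mips = instructions / (cpu_time_s * 1e6)
    return cpu_time_s, mips


compiler_1 = {"A": 5_000_000_000, "B": 1_000_000_000, "C": 1_000_000_000}
compiler_2 = {"A": 10_000_000_000, "B": 1_000_000_000, "C": 1_000_000_000}

time_1, mips_1 = evaluate(compiler_1)
time_2, mips_2 = evaluate(compiler_2)

print(f"Compiler 1: {time_1:.0f} s, {mips_1:.0f} MIPS")
print(f"Compiler 2: {time_2:.0f} s, {mips_2:.0f} MIPS")
```

The two metrics rank the compilers in opposite order here, since the extra Class A instructions raise the instruction rate while also adding cycles.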
Another Performance Example
• Computers M1 and M2 are two implementations of the same instruction set.
• M1 has a clock rate of 50 MHz and M2 has a clock rate of 100 MHz.
• M1 has a CPI of 2.8 and M2 has a CPI of 3.2 for a given program.
• How many times faster is M2 than M1 for this program?

$$\frac{\text{ExTime}_{M1}}{\text{ExTime}_{M2}} = \frac{IC_{M1} \times CPI_{M1} / \text{Clock Rate}_{M1}}{IC_{M2} \times CPI_{M2} / \text{Clock Rate}_{M2}} = \frac{2.8/50}{3.2/100} = 1.75$$

(The instruction counts cancel because both machines run the same program on the same instruction set.)

• What would the clock rate of M1 have to be for them to have the same execution time?

$$\frac{2.8}{\text{Clock Rate}_{M1}} = \frac{3.2}{100} \quad\Rightarrow\quad \text{Clock Rate}_{M1} = 87.5\ \text{MHz}$$
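
A brief Python sketch of both answers. It assumes, as the example does, that the instruction count is the same on both machines, so only CPI / clock rate needs to be compared; the variable names are introduced here for illustration.

```python
CPI_M1, CLOCK_MHZ_M1 = 2.8, 50.0
CPI_M2, CLOCK_MHZ_M2 = 3.2, 100.0

# Time per instruction is CPI / clock rate (microseconds when the rate is in MHz).
time_per_instr_m1 = CPI_M1 / CLOCK_MHZ_M1
time_per_instr_m2 = CPI_M2 / CLOCK_MHZ_M2

# How many times faster M2 is than M1.
speedup_m2_over_m1 = time_per_instr_m1 / time_per_instr_m2

# Clock rate M1 would need so that CPI_M1 / rate equals M2's time per instruction.
required_clock_mhz_m1 = CPI_M1 / time_per_instr_m2

print(f"M2 is {speedup_m2_over_m1:.2f} times faster than M1")
print(f"M1 would need a {required_clock_mhz_m1:.1f} MHz clock to match M2")
```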


Performance Summary
• The two main measures of performance are
  - execution time: time to do the task
  - throughput: number of tasks completed per unit time
• Performance and execution time are reciprocals. Increasing performance decreases execution time.
• The time to execute a given program can be computed as:
CPU time = Instruction count x CPI x clock cycle time
CPU time = Instruction count x CPI / clock rate
• These factors are affected by compiler technology, the instruction set architecture, the machine organization, and the underlying technology.
• When trying to improve performance, look at what occurs frequently => make the common case fast.
Computer Benchmarks
• A benchmark is a program or set of programs used to evaluate computer performance.
• Benchmarks allow us to make performance comparisons based on execution times.
• Benchmarks should
  - Be representative of the type of applications run on the computer
  - Not be overly dependent on one or two features of a computer
• Benchmarks can vary greatly in terms of their complexity and their usefulness.
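Amdahl's Law
• Speedup due to an enhancement is defined as:
$$\text{Speedup} = \frac{\text{ExTime}_{old}}{\text{ExTime}_{new}} = \frac{\text{Performance}_{new}}{\text{Performance}_{old}}$$
• Suppose that an enhancement accelerates a fraction $\text{Fraction}_{enhanced}$ of the task by a factor $\text{Speedup}_{enhanced}$:
$$\text{ExTime}_{new} = \text{ExTime}_{old} \times \left[ (1 - \text{Fraction}_{enhanced}) + \frac{\text{Fraction}_{enhanced}}{\text{Speedup}_{enhanced}} \right]$$
$$\text{Speedup} = \frac{\text{ExTime}_{old}}{\text{ExTime}_{new}} = \frac{1}{(1 - \text{Fraction}_{enhanced}) + \dfrac{\text{Fraction}_{enhanced}}{\text{Speedup}_{enhanced}}}$$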
Example of Amdahl’s Law
• Floating point instructions are improved to run twice as fast, but only 10% of the time was spent on these instructions originally. How much faster is the new machine?

$$\text{Speedup} = \frac{\text{ExTime}_{old}}{\text{ExTime}_{new}} = \frac{1}{(1 - \text{Fraction}_{enhanced}) + \dfrac{\text{Fraction}_{enhanced}}{\text{Speedup}_{enhanced}}}$$
$$\text{Speedup} = \frac{1}{(1 - 0.1) + 0.1/2} = 1.053$$
• The new machine is 1.053 times as fast, or 5.3% faster.
• How much faster would the new machine be if floating point instructions became 100 times faster?
$$\text{Speedup} = \frac{1}{(1 - 0.1) + 0.1/100} = 1.110$$
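
Amdahl's law as stated above translates directly into a one-line function. This Python sketch is illustrative only; the function name `amdahl_speedup` and its parameter names are assumptions that mirror the symbols in the formula.

```python
def amdahl_speedup(fraction_enhanced: float, speedup_enhanced: float) -> float:
    """Overall speedup = 1 / ((1 - F) + F / S) for enhanced fraction F and factor S."""
    return 1.0 / ((1.0 - fraction_enhanced) + fraction_enhanced / speedup_enhanced)


# Floating point work is 10% of the original execution time.
for factor in (2, 100):
    overall = amdahl_speedup(0.1, factor)
    print(f"FP {factor}x faster -> overall speedup {overall:.3f}")
```

Even a very large improvement to the floating point instructions yields little overall gain here, because 90% of the execution time is unaffected.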
Another Example
Execution time after improvement =
Execution time unaffected + (Execution time affected / Amount of improvement)

Example:
A program runs for 100 seconds on a machine, with multiply responsible for 80 seconds of this time. If the multiply unit is made 4 times faster, what is the overall speedup?

new execution time = 20 + 80/4 = 40 seconds

speedup = 100/40 = 2.5 times.

How fast must the multiply be to get 4, 5, and 6 times speedup?
The maximum speedup that can be achieved is 1 / (fraction not affected). In this case 1/0.2 = 5 times. That is only possible with infinite speedup for the multiply (it takes 0 time to execute).
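
The calculation above, together with the upper bound on speedup, can be sketched in Python. The function names `speedup_after_improvement` and `max_speedup` are assumptions made for this illustration.

```python
def speedup_after_improvement(total_s: float, affected_s: float,
                              improvement: float) -> float:
    """Overall speedup when the affected part runs `improvement` times faster."""
    unaffected_s = total_s - affected_s
    new_time_s = unaffected_s + affected_s / improvement
    return total_s / new_time_s


def max_speedup(total_s: float, affected_s: float) -> float:
    """Upper bound: 1 / (fraction of time not affected by the improvement)."""
    unaffected_fraction = (total_s - affected_s) / total_s
    return 1.0 / unaffected_fraction


overall = speedup_after_improvement(100.0, 80.0, 4.0)
limit = max_speedup(100.0, 80.0)

print(f"Multiply 4x faster -> overall speedup {overall:.1f}")
print(f"Maximum achievable speedup: {limit:.1f}")
```

Making the improvement factor very large moves the overall speedup toward the bound but never past it.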
Example Continues
• Assume that a program runs in 100 seconds on a machine, with multiply operations responsible for 80 seconds. How much do I have to improve the speed of multiplication if I want my program to run 2 times faster?

$$\text{Execution time after improvement} = \frac{\text{Execution time affected by improvement}}{\text{Amount of improvement}} + \text{Execution time unaffected}$$
$$50\ \text{seconds} = \frac{80\ \text{seconds}}{n} + (100 - 80)\ \text{seconds}$$
$$n = \frac{80\ \text{seconds}}{30\ \text{seconds}} = 2.67$$
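
Solving the same equation for n can be automated for any target speedup. In this Python sketch the function name `required_improvement` is an assumption, and returning `None` for targets that no finite improvement can reach is a design choice made here, not something stated in the slides.

```python
from typing import Optional


def required_improvement(total_s: float, affected_s: float,
                         target_speedup: float) -> Optional[float]:
    """Solve total/target = affected/n + unaffected for n.

    Returns None when the target cannot be reached with any finite n.
    """
    unaffected_s = total_s - affected_s
    target_time_s = total_s / target_speedup
    time_left_for_affected_s = target_time_s - unaffected_s
    if time_left_for_affected_s <= 0:
        return None
    return affected_s / time_left_for_affected_s


for target in (2, 4, 5, 6):
    n = required_improvement(100.0, 80.0, target)
    answer = "not reachable" if n is None else f"multiply must be {n:.2f}x faster"
    print(f"Target speedup {target}: {answer}")
```

For this program, targets of 5 and above come back as unreachable because the 20 seconds of unaffected time already uses up the whole time budget.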
Summary of Performance Evaluation
• Good benchmarks, such as the SPEC benchmarks, can provide an accurate method for evaluating and comparing computer performance.
• MIPS and MFLOPS are easy to use, but inaccurate indicators of performance.
• Amdahl’s law provides an efficient method for determining speedup due to an enhancement.
• Make the common case fast!
