Performance Evaluation of
Parallel Computers
NARENDRA KUMAR
Basics of
Performance Evaluation
A sequential algorithm is evaluated in terms of its execution time which is
expressed as a function of its input size.
For a parallel algorithm, the execution time depends not only on input size but
also on factors such as parallel architecture, no. of processors, etc.
Performance Metrics
Parallel Run Time, Speedup, Efficiency
Standard Performance Measures
Peak Performance, Sustained Performance
Instruction Execution Rate (in MIPS), Floating Point Capability (in MFLOPS)
Performance Metrics
Parallel Runtime
The parallel run time T(n) of a program or application is the time required to
run the program on an n-processor parallel computer.
When n = 1, T(1) denotes the sequential run time of the program on a single
processor.
Speedup
Speedup S(n) is defined as the ratio of the time taken to run a program on a single
processor to the time taken to run the program on a parallel computer with
n identical processors:
$S(n) = \frac{T(1)}{T(n)}$
It measures how much faster the program runs on a parallel computer than on
a single processor.
Performance Metrics
Efficiency
The efficiency E(n) of a program on n processors is defined as the ratio of the
speedup achieved to the number of processors used to achieve it:
$E(n) = \frac{S(n)}{n} = \frac{T(1)}{n \, T(n)}$
The relationship between execution time, speedup, efficiency and the
number of processors used is depicted in the graphs on the next slides.
In the ideal case:
Speedup is expected to be linear, i.e. it grows linearly with the number of
processors, but in most cases it falls short of this due to parallel overhead.
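The two metrics can be computed directly from measured run times. The sketch below is only illustrative: the run times in `run_times` are hypothetical values, not measurements from any real machine.

```python
def speedup(t1: float, tn: float) -> float:
    """S(n) = T(1) / T(n)."""
    if t1 <= 0 or tn <= 0:
        raise ValueError("run times must be positive")
    return t1 / tn


def efficiency(t1: float, tn: float, n: int) -> float:
    """E(n) = S(n) / n."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return speedup(t1, tn) / n


# Hypothetical run times in seconds, keyed by number of processors.
run_times = {1: 100.0, 2: 52.0, 4: 28.0, 8: 16.0}

for n, tn in run_times.items():
    s = speedup(run_times[1], tn)
    e = efficiency(run_times[1], tn, n)
    print(f"n={n}: S(n)={s:.2f}, E(n)={e:.2f}")
```

With these assumed timings the speedup grows with n while the efficiency drops below 1, which is the typical non-ideal behaviour.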
Performance Metrics
[Graphs showing the relationship between T(n) and the no. of processors]
Performance Metrics
[Graphs showing the relationship between S(n) and the no. of processors]
Performance Metrics
[Graphs showing the relationship between E(n) and the no. of processors]
Performance Measures
Standard Performance Measures
Most of the standard measures adopted by the industry to compare the
performance of various parallel computers are based on the concepts of:
Peak Performance
[Theoretical maximum based on best possible utilization of all resources]
Sustained Performance
[based on running application-oriented benchmarks]
Generally measured in units of:
MIPS [to reflect instruction execution rate]
MFLOPS [to reflect the floating-point capability]
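As a rough illustration of these units, the sketch below converts an operation count and a run time into MIPS or MFLOPS and compares a sustained figure with a peak figure. All numbers (the per-processor peak, the operation count, the run time) are hypothetical.

```python
def mips(instruction_count: float, seconds: float) -> float:
    """Millions of instructions executed per second."""
    return instruction_count / (seconds * 1e6)


def mflops(flop_count: float, seconds: float) -> float:
    """Millions of floating-point operations executed per second."""
    return flop_count / (seconds * 1e6)


# Hypothetical machine: 4 processors, each with a peak of 500 MFLOPS.
peak_mflops = 4 * 500.0

# Hypothetical benchmark run: 3e9 floating-point operations in 2.5 seconds.
sustained_mflops = mflops(3_000_000_000, 2.5)

print(f"peak      = {peak_mflops:.0f} MFLOPS")
print(f"sustained = {sustained_mflops:.0f} MFLOPS")
print(f"fraction of peak = {sustained_mflops / peak_mflops:.2f}")
```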
Performance Measures
Benchmarks
Benchmarks are a set of programs or program fragments used to compare the
performance of various machines.
Machines are run against these benchmark tests and their performance is measured.
When it is not possible to test the actual applications on the different machines, the
results of the benchmark programs that most resemble those applications
are used to evaluate the performance of each machine.
Performance Measures
Benchmarks
Kernel Benchmarks
[Program fragments which are extracted from real programs]
[Heavily used core and responsible for most execution time]
Synthetic Benchmarks
[Small programs created especially for benchmarking purposes]
[These benchmarks do not perform any useful computation]
EXAMPLES
LINPACK, LAPACK, Livermore Loops, SPECmarks,
NAS Parallel Benchmarks, Perfect Club Parallel Benchmarks
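A minimal sketch of timing a kernel, assuming a DAXPY-style loop (a·x + y) as the kernel and a best-of-several-runs timing policy. Both choices are illustrative assumptions and are not taken from any of the benchmark suites listed above.

```python
import time


def daxpy(a, x, y):
    """Kernel: element-wise a*x + y (2 floating-point operations per element)."""
    return [a * xi + yi for xi, yi in zip(x, y)]


def benchmark(kernel, *args, repeats=5):
    """Run the kernel several times; return (best wall-clock seconds, last result)."""
    best = float("inf")
    result = None
    for _ in range(repeats):
        start = time.perf_counter()
        result = kernel(*args)
        best = min(best, time.perf_counter() - start)
    return best, result


n = 100_000
x = [1.0] * n
y = [2.0] * n
seconds, out = benchmark(daxpy, 3.0, x, y)

if seconds > 0:
    # 2 floating-point operations per element, reported in MFLOPS.
    print(f"sustained rate on this kernel: {2 * n / (seconds * 1e6):.1f} MFLOPS")
```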
Parallel Overhead
Sources of Parallel Overhead
In practice, parallel computers do not achieve linear speedup or an efficiency of
1 because of parallel overhead. The major sources of this overhead are:
• Inter-processor Communication
• Load Imbalance
• Inter-Task Dependency
• Extra Computation
• Parallel Balance Point
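Two of these sources can be illustrated with a simple model, assuming that the parallel run time is set by the busiest processor plus a lump communication time. The model and the work figures are hypothetical simplifications.

```python
def parallel_run_time(work_per_processor, comm_time=0.0):
    """T(n): the busiest processor determines the finish time, plus communication."""
    return max(work_per_processor) + comm_time


def modelled_efficiency(work_per_processor, comm_time=0.0):
    """E(n) = T(1) / (n * T(n)), with T(1) taken as the total work."""
    n = len(work_per_processor)
    t1 = sum(work_per_processor)  # one processor does all the work, no communication
    return t1 / (n * parallel_run_time(work_per_processor, comm_time))


balanced = [25.0, 25.0, 25.0, 25.0]    # hypothetical work units per processor
imbalanced = [40.0, 20.0, 20.0, 20.0]  # same total work, unevenly distributed

print(f"balanced, no communication:   {modelled_efficiency(balanced):.3f}")
print(f"load imbalance:               {modelled_efficiency(imbalanced):.3f}")
print(f"balanced, 5 units of comm.:   {modelled_efficiency(balanced, comm_time=5.0):.3f}")
```

In this model only the perfectly balanced, communication-free case reaches an efficiency of 1.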
Speedup
Performance Laws
Speedup Performance Laws
Amdahl’s Law
[based on fixed problem size or fixed work load]
Gustafson’s Law
[for scaled problems, where problem size increases with machine size, i.e. the number of processors]
Sun & Ni’s Law
[applied to scaled problems bounded by memory capacity]
Speedup
Performance Laws
Amdahl’s Law (1967)
For a given problem size, the speedup does not increase linearly as the number
of processors increases. In fact, the speedup tends to become saturated.
This is a consequence of Amdahl’s Law.
According to Amdahl’s Law, a program contains two types of operations:
Completely sequential
Completely parallel
Let the time Ts taken to perform the sequential operations be a fraction α (0 < α ≤ 1)
of the total execution time T(1) of the program; then the time Tp to perform the
parallel operations is (1 − α) of T(1).
Speedup
Performance Laws
Amdahl’s Law
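Thus, Ts = α·T(1) and Tp = (1 − α)·T(1)
Assuming that the parallel operations achieve linear speedup
(i.e. on n processors they take 1/n of the time they take on a single processor),
then
$T(n) = T_s + \frac{T_p}{n} = \alpha \, T(1) + \frac{(1 - \alpha) \, T(1)}{n}$
Thus, the speedup with n processors will be:
$S(n) = \frac{T(1)}{T(n)} = \frac{1}{\alpha + \frac{1 - \alpha}{n}} = \frac{n}{1 + (n - 1)\alpha}$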
Speedup
Performance Laws
Amdahl’s Law
Sequential operations will tend to dominate the speedup as n becomes very
large.
As n → ∞, S(n) → 1/α
This means that, no matter how many processors are employed, the speedup for
this problem is limited to 1/α.
This is known as the sequential bottleneck of the problem.
Note: Sequential bottleneck cannot be removed just by increasing the no. of
processors.
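The saturation can be seen numerically. The sketch below evaluates S(n) = 1 / (α + (1 − α)/n), which follows from T(n) = Ts + Tp/n, for an assumed sequential fraction α = 0.1. The value of α is only an example.

```python
def amdahl_speedup(alpha: float, n: int) -> float:
    """Fixed-size speedup S(n) = 1 / (alpha + (1 - alpha) / n)."""
    if not 0 < alpha <= 1:
        raise ValueError("alpha must satisfy 0 < alpha <= 1")
    if n < 1:
        raise ValueError("n must be at least 1")
    return 1.0 / (alpha + (1.0 - alpha) / n)


alpha = 0.1  # assumed sequential fraction, for illustration only

for n in (1, 2, 4, 16, 64, 1024, 1_000_000):
    print(f"n={n:>9}: S(n)={amdahl_speedup(alpha, n):.3f}")

print(f"upper bound 1/alpha = {1 / alpha:.3f}")
```

However large n is made, the printed speedup stays below 1/α.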
Speedup
Performance Laws
Amdahl’s Law
A major shortcoming in applying Amdahl’s Law is inherent in the law itself:
The total work load (the problem size) is fixed.
Thus, the execution time decreases as the no. of processors increases.
A successful method of overcoming this shortcoming is therefore to increase the
problem size!
Speedup
Performance Laws
Amdahl’s Law
[Graph for Amdahl’s Law]
Speedup
Performance Laws
Gustafson’s Law (1988)
Gustafson’s Law relaxes the restriction of a fixed problem size and uses the notion of
fixed execution time to get over the sequential bottleneck.
According to Gustafson’s Law, if the number of parallel operations in the problem is
increased (or scaled up) sufficiently, then the sequential operations will no longer
be a bottleneck.
In accuracy-critical applications, it is desirable to solve the largest possible problem
on a larger machine rather than a smaller problem on a smaller
machine, in almost the same execution time.
Speedup
Performance Laws
Gustafson’s Law
As the machine size increases, the work load (or problem size) is also increased
so as to keep the execution time for the problem fixed.
Let Ts be the constant time taken to perform the sequential operations, and
Tp(n,W) be the time taken to perform the parallel operations of a problem of size or
workload W using n processors.
Then the speedup with n processors is:
$S'(n) = \frac{T_s + T_p(1, W)}{T_s + T_p(n, W)}$
Speedup
Performance Laws
Gustafson’s Law
[Figure for Gustafson’s Law]
Speedup
Performance Laws
Gustafson’s Law
Assuming that the parallel operations achieve a linear speedup
(i.e. on n processors they take 1/n of the time they take on one processor),
then Tp(1,W) = n·Tp(n,W)
Let α be the fraction of sequential work load in the problem, i.e.
$\alpha = \frac{T_s}{T_s + T_p(n, W)}$
Then the speedup with n processors can be expressed as:
$S'(n) = \frac{T_s + n \, T_p(n, W)}{T_s + T_p(n, W)} = \alpha + n(1 - \alpha) = n - \alpha(n - 1)$
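A small sketch of the scaled speedup, assuming linear speedup of the parallel part, i.e. Tp(1,W) = n·Tp(n,W). It computes the speedup once from the times Ts and Tp(n,W) and once from the sequential fraction α = Ts / (Ts + Tp(n,W)). The timing values are hypothetical.

```python
def scaled_speedup_from_times(ts: float, tp_n: float, n: int) -> float:
    """S'(n) = (Ts + n*Tp(n,W)) / (Ts + Tp(n,W)), assuming Tp(1,W) = n*Tp(n,W)."""
    return (ts + n * tp_n) / (ts + tp_n)


def gustafson_speedup(alpha: float, n: int) -> float:
    """S'(n) = n - alpha*(n - 1), with alpha the sequential fraction on n processors."""
    return n - alpha * (n - 1)


# Hypothetical timings on n = 10 processors: 1 s sequential, 9 s parallel.
ts, tp_n, n = 1.0, 9.0, 10
alpha = ts / (ts + tp_n)

print(f"alpha = {alpha:.2f}")
print(f"S'(n) from times = {scaled_speedup_from_times(ts, tp_n, n):.2f}")
print(f"S'(n) from alpha = {gustafson_speedup(alpha, n):.2f}")
```

Unlike the fixed-size case, this speedup keeps growing linearly in n for a fixed α.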
Speedup
Performance Laws
Sun & Ni’s Law (1993)
This law defines a memory bounded speedup model which generalizes both
Amdahl’s Law and Gustafson’s Law to maximize the use of both processor and
memory capacities.
The idea is to solve maximum possible size of problem, limited by memory capacity
This inherently demands an increased or scaled work load,
providing higher speedup,
Higher efficiency, and
Better resource (processor & memory) utilization
But this may result in a slight increase in execution time to achieve this scalable
speedup performance!
Speedup
Performance Laws
Sun & Ni’s Law
According to this law, the speedup S*(n) can be defined by:
$S^*(n) = \frac{\alpha + (1 - \alpha) \, G(n)}{\alpha + (1 - \alpha) \, G(n) / n}$
where G(n) reflects the increase in work load as the memory capacity increases n times.
Assumptions made while deriving the above expression:
• A global address space is formed from all individual memory spaces, i.e. there is
a distributed shared memory space.
• All available memory capacity is used up for solving the scaled problem.
Speedup
Performance Laws
Sun & Ni’s Law
Special Cases:
• G(n) = 1
Corresponds to a fixed problem size, i.e. Amdahl’s Law
• G(n) = n
Corresponds to a work load that increases n times when memory is increased n
times, i.e. a scaled problem or Gustafson’s Law
• G(n) ≥ n
Corresponds to a computational workload (time) that increases faster than the memory
requirement.
Comparing the speedup factors S*(n), S’(n) and S(n) in this last case, we find S*(n) ≥ S’(n) ≥ S(n)
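The special cases can be checked numerically. The sketch assumes the usual memory-bounded form S*(n) = (α + (1 − α)·G(n)) / (α + (1 − α)·G(n)/n). The choice G(n) = n^1.5 for the third case and the value α = 0.1 are illustrative assumptions.

```python
def sun_ni_speedup(alpha: float, n: int, g) -> float:
    """Memory-bounded speedup; g is a function giving the work-load growth G(n)."""
    gn = g(n)
    return (alpha + (1.0 - alpha) * gn) / (alpha + (1.0 - alpha) * gn / n)


def amdahl_speedup(alpha: float, n: int) -> float:
    return 1.0 / (alpha + (1.0 - alpha) / n)


def gustafson_speedup(alpha: float, n: int) -> float:
    return n - alpha * (n - 1)


alpha, n = 0.1, 16  # assumed values, for illustration only

fixed = sun_ni_speedup(alpha, n, lambda m: 1)          # G(n) = 1
scaled = sun_ni_speedup(alpha, n, lambda m: m)         # G(n) = n
memory = sun_ni_speedup(alpha, n, lambda m: m ** 1.5)  # G(n) > n (hypothetical)

print(f"G(n) = 1     : {fixed:.2f}  (Amdahl: {amdahl_speedup(alpha, n):.2f})")
print(f"G(n) = n     : {scaled:.2f}  (Gustafson: {gustafson_speedup(alpha, n):.2f})")
print(f"G(n) = n^1.5 : {memory:.2f}")
```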
Speedup
Performance Laws
Sun & Ni’s Law
[Figure for Sun & Ni’s Law]
Scalability Metric
Scalability
– For a fixed problem size, increasing the no. of processors decreases the efficiency!
+ Increasing the amount of computation per processor increases the efficiency!
To keep the efficiency fixed, both the size of the problem and the no. of processors
must be increased simultaneously.
A parallel computing system is said to be scalable if its efficiency can be
kept fixed by simultaneously increasing the machine size and the problem size.
Scalability of a parallel system is the measure of its capacity to increase
speedup in proportion to the machine size.
Scalability Metric
Isoefficiency Function
The isoefficiency function can be used to measure scalability of the parallel
computing systems.
It shows how the size of problem must grow as a function of the number of
processors used in order to maintain some constant efficiency.
The general form of the function is derived using an equivalent definition of
efficiency as follows:
$E = \frac{U}{U + O}$
where U is the time taken to do the useful computation (essential work), and
O is the parallel overhead. (Note: O is zero for sequential execution.)
Scalability Metric
Isoefficiency Function
If the efficiency is fixed at some constant value K, then
$K = \frac{U}{U + O} \;\Rightarrow\; U = \frac{K}{1 - K} \, O = K' \, O$
where K’ = K / (1 − K) is a constant for fixed efficiency K.
This function is known as the isoefficiency function of the parallel computing system.
A small isoefficiency function means that small increments in the problem size (U)
are sufficient for efficient utilization of an increasing no. of processors, indicating
high scalability.
A large isoefficiency function indicates a poorly scalable system.
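A minimal sketch of the isoefficiency relation U = K/(1 − K)·O, using E = U / (U + O). The overhead model O(n) = n·log2(n) is a hypothetical assumption chosen only to show how the required useful work grows with n. Real systems have their own overhead functions.

```python
import math


def efficiency(useful: float, overhead: float) -> float:
    """E = U / (U + O)."""
    return useful / (useful + overhead)


def required_useful_work(k: float, overhead: float) -> float:
    """Useful work U needed to hold efficiency at K: U = K / (1 - K) * O."""
    if not 0 < k < 1:
        raise ValueError("K must satisfy 0 < K < 1")
    return k / (1.0 - k) * overhead


def assumed_overhead(n: int) -> float:
    """Hypothetical overhead model O(n) = n * log2(n); not taken from the source."""
    return n * math.log2(n)


K = 0.8  # target efficiency, for illustration only
for n in (2, 4, 8, 16, 32):
    o = assumed_overhead(n)
    u = required_useful_work(K, o)
    print(f"n={n:2d}: O={o:6.1f}, required U={u:7.1f}, E={efficiency(u, o):.2f}")
```

Under this assumed overhead the problem size must grow like n·log2(n) to keep E at 0.8.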
Performance
Measurement Tools
Performance Analysis
Search Based Tools
Visualization
Utilization Displays
[Processor (utilization) count, Utilization Summary, Gantt charts, Concurrency
Profile, Kiviat Diagrams]
Communication Displays
[Message Queues, Communication Matrix, Communication Traffic, Hypercube]
Task Displays
[Task Gantt, Task Summary]
Performance
Measurement Tools
Instrumentation
A way to collect data about an application is to instrument the application
executable so that when it executes, it generates the required information as a
side-effect.
Ways to do instrumentation:
By inserting it into the application source code directly, or
By placing it into the runtime libraries, or
By modifying the linked executable, etc.
Doing this causes some perturbation of the application program
(i.e. the intrusion problem).
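Source-level instrumentation can be sketched with a decorator that records call counts and elapsed time as a side effect of normal execution. The names `instrument`, `profile` and `dot` are hypothetical and do not belong to any particular tool.

```python
import functools
import time
from collections import defaultdict

# Data collected as a side effect of running the instrumented program.
profile = defaultdict(lambda: {"calls": 0, "seconds": 0.0})


def instrument(func):
    """Wrap func so that every call updates its entry in `profile`."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            record = profile[func.__name__]
            record["calls"] += 1
            record["seconds"] += time.perf_counter() - start
    return wrapper


@instrument
def dot(xs, ys):
    """Example application routine: dot product of two vectors."""
    return sum(x * y for x, y in zip(xs, ys))


result = dot([1.0, 2.0, 3.0], [4.0, 5.0, 6.0])
print(result, dict(profile["dot"]))
```

The two timer calls and the dictionary update in the wrapper are themselves a small example of intrusion.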
Performance
Measurement Tools
Instrumentation
Intrusion includes both:
Direct contention for resources (e.g. CPU, memory, communication links, etc.)
Secondary interference with resources (e.g. interaction with cache replacements or
virtual memory, etc.)
To address such effects, you may adopt the following approaches:
Realizing that intrusion affects measurement, treat the resulting data as an
approximation
Leave the added instrumentation in the final implementation.
Try to minimize the intrusion.
Quantify the intrusion and compensate for it!
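One simple way to quantify and compensate, sketched under the assumption that the intrusion is dominated by the cost of the timer probes themselves: estimate the average cost of one probe, then subtract the estimated total from the measured time. This ignores secondary effects such as cache interference, so the result is still an approximation.

```python
import time


def probe_cost(samples: int = 100_000) -> float:
    """Estimate the average cost in seconds of one timer probe.

    The loop overhead is included, so this slightly overestimates the cost.
    """
    start = time.perf_counter()
    for _ in range(samples):
        time.perf_counter()
    end = time.perf_counter()
    return (end - start) / samples


def compensated(measured_seconds: float, probes: int, cost_per_probe: float) -> float:
    """Subtract the estimated probe overhead; never report a negative time."""
    return max(0.0, measured_seconds - probes * cost_per_probe)


cost = probe_cost()
# Hypothetical measurement: 0.250 s recorded with 2000 probes executed.
print(f"estimated cost per probe: {cost:.2e} s")
print(f"compensated time: {compensated(0.250, 2000, cost):.6f} s")
```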