
Parallel Computing

Chapter 7
Performance and Scalability
Jun Zhang
Department of Computer Science
University of Kentucky
7.1 Parallel Systems

• Definition: A parallel system consists of an algorithm and the parallel architecture on which the algorithm is implemented.
• Note that an algorithm may have different performance on different parallel architectures.
• For example, an algorithm may perform differently on a linear array of processors than on a hypercube of processors.
7.2 Performance Metrics for Parallel Systems

• Run Time: The parallel run time is defined as the time that elapses from the moment a parallel computation starts to the moment the last processor finishes execution.

• Notation: serial run time T_S, parallel run time T_P.


7.3 Speedup

• The speedup is defined as the ratio of the serial runtime of the best sequential algorithm for solving a problem to the time taken by the parallel algorithm to solve the same problem on p processors:

S = T_S / T_P

• Example: Adding n numbers on an n-processor hypercube, using a reduction algorithm:

T_S = Θ(n),   T_P = Θ(log n),   S = Θ(n / log n)
7.4 Efficiency

• The efficiency is defined as the ratio of speedup to the number of processors. Efficiency measures the fraction of time for which a processor is usefully utilized:

E = S / p = T_S / (p T_P)

• Example: Efficiency of adding n numbers on an n-processor hypercube:

E = Θ((n / log n) · (1/n)) = Θ(1 / log n)
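For concreteness, a minimal Python sketch of these two metrics (the helper names speedup and efficiency are my own, not from the slides):

```python
import math

def speedup(t_serial, t_parallel):
    """Speedup S = T_S / T_P."""
    return t_serial / t_parallel

def efficiency(t_serial, t_parallel, p):
    """Efficiency E = S / p = T_S / (p * T_P)."""
    return speedup(t_serial, t_parallel) / p

# Adding n numbers on an n-processor hypercube (reduction):
# T_S ~ n additions, T_P ~ log2(n) parallel steps.
n = 1024
t_s, t_p = n, math.log2(n)
print(speedup(t_s, t_p))        # ~ n / log n = 102.4
print(efficiency(t_s, t_p, n))  # ~ 1 / log n = 0.1
```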
7.5 Cost

• The cost of solving a problem on a parallel system is defined as the product of the parallel run time and the number of processors.
• A cost-optimal parallel system solves a problem with a cost proportional to the execution time of the fastest known sequential algorithm on a single processor.

• Example: Adding n numbers on an n-processor hypercube.
Cost is Θ(n log n) for the parallel system and Θ(n) for the sequential algorithm. The system is not cost-optimal.
7.6 Granularity and Performance

• Use fewer than the maximum number of processors.
• Increase performance by increasing the granularity of computation in each processor.
• Example: Adding n numbers cost-optimally on a hypercube (see the sketch below).

Use p processors, each holding n/p numbers. First add the n/p numbers locally. The problem then reduces to adding p numbers on a p-processor hypercube. Parallel run time and cost:

T_P = Θ(n/p + log p),   Cost = Θ(n + p log p)
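A small Python sketch of this time/cost model (my own illustration, using the T_P = n/p + 2 log p form from Section 7.7, which counts one communication and one addition per reduction level):

```python
import math

def parallel_time(n, p):
    """Model T_P = n/p + 2*log2(p): ~n/p local additions,
    then log2(p) rounds of (communicate + add) on the hypercube."""
    return n / p + 2 * math.log2(p)

def cost(n, p):
    """Cost = p * T_P = n + 2*p*log2(p)."""
    return p * parallel_time(n, p)

n = 4096
for p in (4, 16, 64, 256):
    # cost/n stays bounded by a constant as long as p*log2(p) = O(n)
    print(p, parallel_time(n, p), cost(n, p) / n)
```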
7.7 Scalability

• Scalability is a measure of a parallel system's capacity to increase speedup in proportion to the number of processors.
• Example: Adding n numbers cost-optimally:

T_P = n/p + 2 log p

S = n p / (n + 2 p log p)

E = S / p = n / (n + 2 p log p)

The well-known Amdahl's law dictates the achievable speedup and efficiency.
7.8 Amdahl’s Law (1967)

• The speedup of a program using multiple processors in parallel computing is limited by the time needed for the serial fraction of the problem.
• If a problem of size W has a serial component W_S, the parallel run time and speedup are

T_P = (W - W_S)/p + W_S

S = W / ((W - W_S)/p + W_S)

S → W / W_S   as p → ∞
7.9 Amdahl’s Law

• If W_S = 20% and W - W_S = 80% of the work, then

S = 1 / (0.8/p + 0.2)

S → 1/0.2 = 5   as p → ∞

• So no matter how many processors are used, the speedup cannot be greater than 5.
• Amdahl's law implies that parallel computing is only useful when the number of processors is small, or when the problem is perfectly parallel, i.e., embarrassingly parallel.
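A quick numeric check of the 20%/80% example (a minimal sketch; the helper name amdahl_speedup is mine):

```python
def amdahl_speedup(p, serial_fraction):
    """S = 1 / (serial_fraction + (1 - serial_fraction)/p)."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / p)

# With a 20% serial fraction, speedup saturates at 1/0.2 = 5.
for p in (2, 8, 64, 1024, 10**6):
    print(p, round(amdahl_speedup(p, 0.2), 3))
```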
7.10 Gustafson’s Law (1988)

• Also known as the Gustafson-Barsis law.
• Any sufficiently large problem can be efficiently parallelized, with a speedup of

S = p - α(p - 1)

where p is the number of processors and α is the serial portion of the problem.
• Gustafson proposed a fixed-time concept, which leads to scaled speedup for larger problem sizes.
• Basically, we use larger systems with more processors to solve larger problems.
7.10 Gustafson's Law (cont.)

• The execution time of the program on the parallel computer is a + b, where a is the sequential time and b is the parallel time.
• The total amount of work to be done in parallel varies linearly with the number of processors, so b is held fixed as p is varied. The corresponding run time on a serial machine is a + p*b.
• The speedup is therefore S = (a + p*b) / (a + b).
• Defining α = a/(a + b), the sequential fraction of the execution time, gives

S = p - α(p - 1)     (Gustafson's law)
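A minimal sketch checking that the run-time derivation and the closed form agree (the values a = 2, b = 8, p = 64 are arbitrary illustrations, not from the slides):

```python
def gustafson_speedup(p, alpha):
    """Scaled speedup S = p - alpha*(p - 1), with alpha = a/(a + b)."""
    return p - alpha * (p - 1)

# a = sequential time, b = parallel time on the parallel machine.
a, b, p = 2.0, 8.0, 64
alpha = a / (a + b)                  # 0.2
print((a + p * b) / (a + b))         # 51.4, from the run-time ratio
print(gustafson_speedup(p, alpha))   # 51.4, identical
```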
7.11 Scalability (cont.)

• Increasing the number of processors --> decreases efficiency
• Increasing the problem size --> increases efficiency

• Can a parallel system keep its efficiency by increasing the number of processors and the problem size simultaneously?
Yes: --> scalable parallel system
No: --> non-scalable parallel system

A scalable parallel system can always be made cost-optimal by adjusting the number of processors and the problem size.
7.12 Scalability (cont.)

• E.g. 7.7: Adding n numbers on an n-processor hypercube, the efficiency is

E = Θ(1 / log n)

There is no way to keep E fixed, i.e., no way to make the system scalable.
• Adding n numbers on a p-processor hypercube cost-optimally, the efficiency is

E = n / (n + 2 p log p)

Choosing n = Θ(p log p) makes E a constant (see the check below).
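A quick numeric check (my own illustration) that E = n/(n + 2 p log p) decays for fixed n but stays constant when n grows as c·p·log p:

```python
import math

def efficiency(n, p):
    """E = n / (n + 2*p*log2(p)) for cost-optimal addition."""
    return n / (n + 2 * p * math.log2(p))

# Fixed n = 4096: efficiency decays as p grows.
print([round(efficiency(4096, p), 3) for p in (16, 64, 256, 1024)])

# n grown as 8*p*log2(p): efficiency stays constant at 8/(8+2) = 0.8.
print([round(efficiency(8 * p * math.log2(p), p), 3) for p in (16, 64, 256, 1024)])
```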
7.13 Isoefficiency Metric of Scalability

• Degree of scalability: the rate at which the problem size must increase to maintain efficiency as the number of processors changes.
• Problem size: the number of basic computation steps in the best sequential algorithm to solve the problem on a single processor (W = T_S).

• Overhead function: the part of the parallel system cost (processor-time product) that is not incurred by the fastest known serial algorithm on a serial computer:

T_O = p T_P - W
7.14 Isoefficiency Function
W  T O (W , p )
TP 
p
1
E 
1  T O (W , p ) / W
• Solve the above equation for W
E
W  TO (W , p )  KT O (W , p )
1 E
• The isoefficiency function determines the growth rate of W 
required to keep the efficiency fixed as p increases.
• Highly scalable systems have small isoefficiency function.
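When T_O depends on W, the relation W = K·T_O(W, p) can be solved numerically. A minimal fixed-point sketch (my own; it uses the overhead function from the complex-overhead example later in these notes):

```python
def isoefficiency_W(T_O, p, K, W0=1.0, iters=200):
    """Fixed-point iteration for W = K * T_O(W, p)."""
    W = W0
    for _ in range(iters):
        W = K * T_O(W, p)
    return W

# Overhead function with a W-dependent term (from the later example).
T_O = lambda W, p: p**1.5 + p**0.75 * W**0.75

for p in (16, 64, 256):
    W = isoefficiency_W(T_O, p, K=1.0)
    print(p, round(W), round(W / p**3, 2))   # W grows like Theta(p^3)
```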
7.15 Sources of Parallel Overhead

• Interprocessor communication: increase data locality to minimize communication.
• Load imbalance: the distribution of work (load) is not uniform, or the inherent parallelism of the algorithm is not sufficient.
• Extra computation: modifying the best sequential algorithm may result in extra computation. Extra computation may also be done to avoid communication.
7.16 Minimum Execution Time

For adding n numbers cost-optimally,

T_P = n/p + 2 log p

Treating p as a continuous variable and setting dT_P/dp = -n/p^2 + 2/p = 0 gives p = n/2, so the minimum execution time is T_P^min = 2 log n.
A Few Examples

• Example: Overhead function for adding n numbers on a p-processor hypercube.

Parallel run time: T_P = n/p + 2 log p
Parallel system cost: p T_P = n + 2 p log p
Serial cost (and problem size): T_S = W = n

The overhead function is

T_O = p T_P - W = (n + 2 p log p) - n = 2 p log p

What can we learn from this? (See the check below.)
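One thing we can check numerically (a small sketch of my own) is that the model indeed gives T_O = 2 p log p:

```python
import math

def T_parallel(n, p):      # T_P = n/p + 2*log2(p)
    return n / p + 2 * math.log2(p)

def overhead(n, p):        # T_O = p*T_P - W, with W = T_S = n
    return p * T_parallel(n, p) - n

for n, p in ((1024, 16), (4096, 64), (65536, 256)):
    print(overhead(n, p), 2 * p * math.log2(p))   # the two columns agree
```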
Example
• Example: Isoefficiency function for adding n numbers on a p-processor hypercube.

The overhead function is T_O = 2 p log p, hence the isoefficiency function is

W = 2 K p log p

Increasing the number of processors from p_0 to p_1, the problem size has to be increased by a factor of (p_1 log p_1) / (p_0 log p_0).

If p = 4 and n = 64, then E = 0.8 (here K = E/(1 - E) = 4, so W = 8 p log p). If p = 16, then to keep E = 0.8 we have to have n = 512 (see the check below).
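A quick check of these numbers (my own sketch):

```python
import math

def efficiency(n, p):
    return n / (n + 2 * p * math.log2(p))

print(efficiency(64, 4))       # 0.8
print(efficiency(512, 16))     # 0.8 again

# Required growth factor in problem size: (p1*log p1)/(p0*log p0)
print((16 * math.log2(16)) / (4 * math.log2(4)))   # 8, and indeed 512/64 == 8
```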
Example
• Example: Isoefficiency function with a complex overhead function.

Suppose the overhead function is

T_O = p^(3/2) + p^(3/4) W^(3/4)

For the first term: W = K p^(3/2).
For the second term: W = K p^(3/4) W^(3/4), so W^(1/4) = K p^(3/4) and W = K^4 p^3.

The second term dominates, so the overall asymptotic isoefficiency function is Θ(p^3).
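A small numeric sanity check (my own sketch; K = 2 is an arbitrary constant) that W = K^4 p^3 balances the second term and dominates the first:

```python
K = 2.0
for p in (4.0, 16.0, 64.0):
    W = K**4 * p**3
    # Second-term balance: W should equal K * p^(3/4) * W^(3/4)
    balanced = abs(W - K * p**0.75 * W**0.75) < 1e-6 * W
    # First term K * p^(3/2) is asymptotically smaller (ratio -> 0)
    ratio = (K * p**1.5) / W
    print(p, balanced, ratio)
```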
Example
• Example: Minimum cost-optimal execution time for adding n numbers.

The minimum execution time is attained at

p = n/2,   T_P^min = 2 log n

For cost-optimality, the isoefficiency function requires

W = n = f(p) = p log p     (1)

so log n = log p + log log p ≈ log p.

Solving (1) for p then gives

p = f^(-1)(n) ≈ n / log p ≈ n / log n

and the resulting cost-optimal parallel run time is

T_P^(cost-optimal) = n/p + 2 log p ≈ 3 log n - 2 log log n
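A small numeric check (my own sketch) comparing the minimum time 2 log n with the cost-optimal time 3 log n - 2 log log n:

```python
import math

def T_parallel(n, p):
    return n / p + 2 * math.log2(p)

for k in (10, 16, 20):                  # n = 2**k
    n = 2**k
    t_min = T_parallel(n, n / 2)        # p = n/2  ->  2*log2(n)
    p_opt = n / math.log2(n)            # cost-optimal processor count
    t_opt = T_parallel(n, p_opt)        # 3*log2(n) - 2*log2(log2(n))
    print(n, t_min, round(t_opt, 2), round(3 * k - 2 * math.log2(k), 2))
```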
