
Parallel Computing

Chapter 7
Performance and Scalability
Jun Zhang
Department of Computer Science
University of Kentucky
7.1 Parallel Systems

• Definition: A parallel system consists of an algorithm and the parallel architecture on which the algorithm is implemented.
• Note that an algorithm may have different performance on different parallel architectures.
• For example, an algorithm may perform differently on a linear array of processors than on a hypercube of processors.
7.2 Performance Metrics for Parallel Systems

• Run Time: The parallel run time is defined as the time that elapses from the moment a parallel computation starts to the moment the last processor finishes execution.

• Notation: serial run time T_S, parallel run time T_P.


7.3 Speedup

• The speedup is defined as the ratio of the serial runtime of the best sequential algorithm for solving a problem to the time taken by the parallel algorithm to solve the same problem on p processors:

S = T_S / T_P

• Example: Adding n numbers on an n-processor hypercube, using a reduction algorithm:

T_S = Θ(n),   T_P = Θ(log n),   S = Θ(n / log n)
7.4 Efficiency

• The efficiency is defined as the ratio of speedup to the number of processors. Efficiency measures the fraction of time for which a processor is usefully utilized:

E = S / p = T_S / (p T_P)

• Example: Efficiency of adding n numbers on an n-processor hypercube:

E = Θ((n / log n) · (1/n)) = Θ(1 / log n)
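For concreteness, a minimal Python sketch of these two metrics (the helper names speedup and efficiency are my own, not from the slides):

```python
import math

def speedup(t_serial, t_parallel):
    """Speedup S = T_S / T_P."""
    return t_serial / t_parallel

def efficiency(t_serial, t_parallel, p):
    """Efficiency E = S / p = T_S / (p * T_P)."""
    return speedup(t_serial, t_parallel) / p

# Adding n numbers on an n-processor hypercube (reduction):
# T_S ~ n additions, T_P ~ log2(n) parallel steps.
n = 1024
t_s, t_p = n, math.log2(n)
print(speedup(t_s, t_p))        # ~ n / log n = 102.4
print(efficiency(t_s, t_p, n))  # ~ 1 / log n = 0.1
```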
7.5 Cost

• The cost of solving a problem on a parallel system is defined as the product of the parallel run time and the number of processors.
• A cost-optimal parallel system solves a problem with a cost proportional to the execution time of the fastest known sequential algorithm on a single processor.

• Example: Adding n numbers on an n-processor hypercube.
Cost is Θ(n log n) for the parallel system and Θ(n) for the sequential algorithm. The system is not cost-optimal.
7.6 Granularity and Performance

• Use fewer than the maximum number of processors.
• Increase performance by increasing the granularity of computation in each processor.
• Example: Adding n numbers cost-optimally on a hypercube (see the sketch below).

Use p processors, each holding n/p numbers. First add the n/p numbers locally. The problem then reduces to adding p numbers on a p-processor hypercube. Parallel run time and cost:

T_P = Θ(n/p + log p),   Cost = Θ(n + p log p)
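A small Python sketch of this time/cost model (my own illustration, using the T_P = n/p + 2 log p form from Section 7.7, which counts one communication and one addition per reduction level):

```python
import math

def parallel_time(n, p):
    """Model T_P = n/p + 2*log2(p): ~n/p local additions,
    then log2(p) rounds of (communicate + add) on the hypercube."""
    return n / p + 2 * math.log2(p)

def cost(n, p):
    """Cost = p * T_P = n + 2*p*log2(p)."""
    return p * parallel_time(n, p)

n = 4096
for p in (4, 16, 64, 256):
    # cost/n stays bounded by a constant as long as p*log2(p) = O(n)
    print(p, parallel_time(n, p), cost(n, p) / n)
```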
7.7 Scalability

• Scalability is a measure of a parallel system's capacity to increase speedup in proportion to the number of processors.
• Example: Adding n numbers cost-optimally:

T_P = n/p + 2 log p

S = n p / (n + 2 p log p)

E = S / p = n / (n + 2 p log p)

The well-known Amdahl's law dictates the achievable speedup and efficiency.
7.8 Amdahl’s Law (1967)

• The speedup of a program using multiple processors in parallel computing is limited by the time needed for the serial fraction of the problem.
• If a problem of size W has a serial component W_S, the parallel run time and speedup are

T_P = (W - W_S)/p + W_S

S = W / ((W - W_S)/p + W_S)

S → W / W_S   as p → ∞
7.9 Amdahl’s Law

• If W_S = 20% and W - W_S = 80% of the work, then

S = 1 / (0.8/p + 0.2)

S → 1/0.2 = 5   as p → ∞

• So no matter how many processors are used, the speedup cannot be greater than 5.
• Amdahl's law implies that parallel computing is only useful when the number of processors is small, or when the problem is perfectly parallel, i.e., embarrassingly parallel.
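A quick numeric check of the 20%/80% example (a minimal sketch; the helper name amdahl_speedup is mine):

```python
def amdahl_speedup(p, serial_fraction):
    """S = 1 / (serial_fraction + (1 - serial_fraction)/p)."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / p)

# With a 20% serial fraction, speedup saturates at 1/0.2 = 5.
for p in (2, 8, 64, 1024, 10**6):
    print(p, round(amdahl_speedup(p, 0.2), 3))
```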
7.10 Gustafson’s Law (1988)

• Also known as the Gustafson-Barsis law.
• Any sufficiently large problem can be efficiently parallelized, with a speedup of

S = p - α(p - 1)

where p is the number of processors and α is the serial portion of the problem.
• Gustafson proposed a fixed-time concept, which leads to scaled speedup for larger problem sizes.
• Basically, we use larger systems with more processors to solve larger problems.
7.10 Gustafson's Law (cont.)

• The execution time of the program on the parallel computer is a + b, where a is the sequential time and b is the parallel time.
• The total amount of work to be done in parallel varies linearly with the number of processors, so b is held fixed as p is varied. The corresponding run time on a serial machine is a + p*b.
• The speedup is therefore S = (a + p*b) / (a + b).
• Defining α = a/(a + b), the sequential fraction of the execution time, gives

S = p - α(p - 1)     (Gustafson's law)
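A minimal sketch checking that the run-time derivation and the closed form agree (the values a = 2, b = 8, p = 64 are arbitrary illustrations, not from the slides):

```python
def gustafson_speedup(p, alpha):
    """Scaled speedup S = p - alpha*(p - 1), with alpha = a/(a + b)."""
    return p - alpha * (p - 1)

# a = sequential time, b = parallel time on the parallel machine.
a, b, p = 2.0, 8.0, 64
alpha = a / (a + b)                  # 0.2
print((a + p * b) / (a + b))         # 51.4, from the run-time ratio
print(gustafson_speedup(p, alpha))   # 51.4, identical
```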
7.11 Scalability (cont.)

• Increasing the number of processors --> decreases efficiency
• Increasing the problem size --> increases efficiency

• Can a parallel system keep its efficiency by increasing the number of processors and the problem size simultaneously?
Yes: --> scalable parallel system
No: --> non-scalable parallel system

A scalable parallel system can always be made cost-optimal by adjusting the number of processors and the problem size.
7.12 Scalability (cont.)

• E.g. 7.7: Adding n numbers on an n-processor hypercube, the efficiency is

E = Θ(1 / log n)

There is no way to keep E fixed, i.e., no way to make the system scalable.
• Adding n numbers on a p-processor hypercube cost-optimally, the efficiency is

E = n / (n + 2 p log p)

Choosing n = Θ(p log p) makes E a constant (see the check below).
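A quick numeric check (my own illustration) that E = n/(n + 2 p log p) decays for fixed n but stays constant when n grows as c·p·log p:

```python
import math

def efficiency(n, p):
    """E = n / (n + 2*p*log2(p)) for cost-optimal addition."""
    return n / (n + 2 * p * math.log2(p))

# Fixed n = 4096: efficiency decays as p grows.
print([round(efficiency(4096, p), 3) for p in (16, 64, 256, 1024)])

# n grown as 8*p*log2(p): efficiency stays constant at 8/(8+2) = 0.8.
print([round(efficiency(8 * p * math.log2(p), p), 3) for p in (16, 64, 256, 1024)])
```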
7.13 Isoefficiency Metric of Scalability

• Degree of scalability: the rate at which the problem size must increase to maintain efficiency as the number of processors changes.
• Problem size: the number of basic computation steps in the best sequential algorithm to solve the problem on a single processor (W = T_S).

• Overhead function: the part of the parallel system cost (processor-time product) that is not incurred by the fastest known serial algorithm on a serial computer:

T_O = p T_P - W
7.14 Isoefficiency Function
W  T O (W , p )
TP 
p
1
E 
1  T O (W , p ) / W
• Solve the above equation for W
E
W  TO (W , p )  KT O (W , p )
1 E
• The isoefficiency function determines the growth rate of W 
required to keep the efficiency fixed as p increases.
• Highly scalable systems have small isoefficiency function.
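When T_O depends on W, the relation W = K·T_O(W, p) can be solved numerically. A minimal fixed-point sketch (my own; it uses the overhead function from the complex-overhead example later in these notes):

```python
def isoefficiency_W(T_O, p, K, W0=1.0, iters=200):
    """Fixed-point iteration for W = K * T_O(W, p)."""
    W = W0
    for _ in range(iters):
        W = K * T_O(W, p)
    return W

# Overhead function with a W-dependent term (from the later example).
T_O = lambda W, p: p**1.5 + p**0.75 * W**0.75

for p in (16, 64, 256):
    W = isoefficiency_W(T_O, p, K=1.0)
    print(p, round(W), round(W / p**3, 2))   # W grows like Theta(p^3)
```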
7.15 Sources of Parallel Overhead

• Interprocessor communication: increase data locality to minimize communication.
• Load imbalance: the distribution of work (load) is not uniform, or the inherent parallelism of the algorithm is not sufficient.
• Extra computation: modifying the best sequential algorithm may result in extra computation. Extra computation may also be done to avoid communication.
7.16 Minimum Execution Time

For adding n numbers cost-optimally,

T_P = n/p + 2 log p

Treating p as a continuous variable and setting dT_P/dp = -n/p^2 + 2/p = 0 gives p = n/2, so the minimum execution time is T_P^min = 2 log n.
A Few Examples

• Example: Overhead function for adding n numbers on a p-processor hypercube.

Parallel run time: T_P = n/p + 2 log p
Parallel system cost: p T_P = n + 2 p log p
Serial cost (and problem size): T_S = W = n

The overhead function is

T_O = p T_P - W = (n + 2 p log p) - n = 2 p log p

What can we learn from this? (See the check below.)
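One thing we can check numerically (a small sketch of my own) is that the model indeed gives T_O = 2 p log p:

```python
import math

def T_parallel(n, p):      # T_P = n/p + 2*log2(p)
    return n / p + 2 * math.log2(p)

def overhead(n, p):        # T_O = p*T_P - W, with W = T_S = n
    return p * T_parallel(n, p) - n

for n, p in ((1024, 16), (4096, 64), (65536, 256)):
    print(overhead(n, p), 2 * p * math.log2(p))   # the two columns agree
```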
Example
• Example: Isoefficiency function for adding n numbers on a p-processor hypercube.

The overhead function is T_O = 2 p log p, hence the isoefficiency function is

W = 2 K p log p

Increasing the number of processors from p_0 to p_1, the problem size has to be increased by a factor of (p_1 log p_1) / (p_0 log p_0).

If p = 4 and n = 64, then E = 0.8 (here K = E/(1 - E) = 4, so W = 8 p log p). If p = 16, then to keep E = 0.8 we have to have n = 512 (see the check below).
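A quick check of these numbers (my own sketch):

```python
import math

def efficiency(n, p):
    return n / (n + 2 * p * math.log2(p))

print(efficiency(64, 4))       # 0.8
print(efficiency(512, 16))     # 0.8 again

# Required growth factor in problem size: (p1*log p1)/(p0*log p0)
print((16 * math.log2(16)) / (4 * math.log2(4)))   # 8, and indeed 512/64 == 8
```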
Example
• Example: Isoefficiency function with a complex overhead function.

Suppose the overhead function is

T_O = p^(3/2) + p^(3/4) W^(3/4)

For the first term: W = K p^(3/2).
For the second term: W = K p^(3/4) W^(3/4), so W^(1/4) = K p^(3/4) and W = K^4 p^3.

The second term dominates, so the overall asymptotic isoefficiency function is Θ(p^3).
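A small numeric sanity check (my own sketch; K = 2 is an arbitrary constant) that W = K^4 p^3 balances the second term and dominates the first:

```python
K = 2.0
for p in (4.0, 16.0, 64.0):
    W = K**4 * p**3
    # Second-term balance: W should equal K * p^(3/4) * W^(3/4)
    balanced = abs(W - K * p**0.75 * W**0.75) < 1e-6 * W
    # First term K * p^(3/2) is asymptotically smaller (ratio -> 0)
    ratio = (K * p**1.5) / W
    print(p, balanced, ratio)
```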
Example
• Example: Minimum cost-optimal execution time for adding n numbers.

The minimum execution time is attained at

p = n/2,   T_P^min = 2 log n

For cost-optimality, the isoefficiency function requires

W = n = f(p) = p log p     (1)

so log n = log p + log log p ≈ log p.

Solving (1) for p then gives

p = f^(-1)(n) ≈ n / log p ≈ n / log n

and the resulting cost-optimal parallel run time is

T_P^(cost-optimal) = n/p + 2 log p ≈ 3 log n - 2 log log n
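A small numeric check (my own sketch) comparing the minimum time 2 log n with the cost-optimal time 3 log n - 2 log log n:

```python
import math

def T_parallel(n, p):
    return n / p + 2 * math.log2(p)

for k in (10, 16, 20):                  # n = 2**k
    n = 2**k
    t_min = T_parallel(n, n / 2)        # p = n/2  ->  2*log2(n)
    p_opt = n / math.log2(n)            # cost-optimal processor count
    t_opt = T_parallel(n, p_opt)        # 3*log2(n) - 2*log2(log2(n))
    print(n, t_min, round(t_opt, 2), round(3 * k - 2 * math.log2(k), 2))
```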
