Slides
Module 5
Analytical Modeling of Parallel Systems
Contents
1. Effect of Granularity on Performance
2. Scalability of Parallel Systems
3. Minimum Execution Time and Minimum Cost-Optimal Execution Time
4. Asymptotic Analysis of Parallel Programs
5. Other Scalability Metrics
Weekly Learning Outcomes
Required Reading
Chapter 5, Analytical Modeling of Parallel Systems: Ananth Grama, Anshul Gupta, George Karypis, and Vipin Kumar, "Introduction to Parallel Computing", Addison Wesley, 2003.
Recommended Reading
Granularity in Parallel Computing
https://www.youtube.com/watch?v=AlzOErpaXE8
Effect of Granularity on Performance
• Often, using fewer processors improves the performance of parallel systems.
• Using fewer than the maximum possible number of processing elements to execute a parallel algorithm is called scaling down a parallel system.
• A naive way of scaling down is to think of each processor in the original case as a virtual processor and to assign virtual processors equally to scaled-down processors.
• Since the number of processing elements decreases by a factor of n / p, the computation at each processing element increases by a factor of n / p.
• The communication cost should not increase by this factor, since some of the virtual processors assigned to a physical processor might talk to each other. This is the basic reason for the improvement from building granularity.
Building Granularity: Example
• Consider the problem of adding n numbers on p processing elements such that p
< n and both n and p are powers of 2.
• Use the parallel algorithm for n processors, except, in this case, we think of them
as virtual processors.
• The first log p of the log n steps of the original algorithm are simulated in (n / p) log p steps on p processing elements; the overall parallel execution time of this formulation is therefore Θ((n / p) log p).
• The cost is Θ(n log p), which is asymptotically higher than the Θ(n) cost of adding n numbers sequentially. Therefore, the parallel system is not cost-optimal.
Building Granularity: Example (continued)
Can we build granularity in the example in a cost-optimal fashion?
• Each processing element locally adds its n / p numbers in time Θ(n / p).
• The p partial sums on p processing elements can be added in time Θ(log p), giving a parallel runtime of

TP = Θ(n / p + log p)    (3)

• The cost is p · Θ(n / p + log p) = Θ(n + p log p), which is cost-optimal as long as n = Ω(p log p).
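A rough numerical sketch of the two scaled-down formulations, assuming a unit-cost model (one time unit per addition or per simulated communication step) that is illustrative rather than taken from the text:

import math

def naive_time(n, p):
    # Simulate the n-processor algorithm on p processors: the first log p
    # steps each take about n/p time, giving roughly (n/p) * log p overall.
    return (n / p) * math.log2(p)

def cost_optimal_time(n, p):
    # Add n/p numbers locally, then combine p partial sums in log p steps.
    return n / p + math.log2(p)

n, p = 2 ** 16, 2 ** 6
for name, t in (("naive", naive_time(n, p)), ("cost-optimal", cost_optimal_time(n, p))):
    print(f"{name:>12}: T_P ~ {t:8.1f}   cost p*T_P ~ {p * t:9.1f}   (serial work ~ {n})")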
Figure: A comparison of the speedups obtained by the binary-exchange, 2-D transpose and 3-D transpose algorithms on 64 processing elements with tc = 2, tw = 4, ts = 25, and th = 2.
Clearly, it is difficult to infer scaling characteristics from observations on small
datasets on small machines.
Scaling Characteristics of Parallel Programs
• The efficiency of a parallel program can be written as:

E = S / p = TS / (p TP)    (4)

where:
S is the speedup,
p is the number of processing elements,
TS is the execution time of the sequential (serial) program,
TP is the execution time of the parallel program,

or, in terms of the total overhead To = p TP - TS:

E = 1 / (1 + To / TS)    (7)

• For a fixed problem size, efficiency decreases as the number of processing elements increases, because the total overhead To grows with p and reduces the benefit of parallelization.
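A small sketch of Equation (4) for the running example of adding n numbers, assuming the cost model TP = n/p + 2 log p that is used later in these slides:

import math

def efficiency(n, p):
    ts = n                          # serial time: about n additions
    tp = n / p + 2 * math.log2(p)   # parallel time model for adding n numbers
    s = ts / tp                     # speedup S = TS / TP
    return s / p                    # efficiency E = S / p, Equation (4)

n = 4096
for p in (1, 4, 16, 64, 256):
    print(f"p = {p:3d}   E = {efficiency(n, p):.3f}")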
Scaling Characteristics of Parallel Programs: Example
(continued)
Plotting the speedup for various input sizes gives us:
Figure: Speedup versus the number of processing elements for adding a list of numbers.
Speedup tends to saturate and efficiency drops as a consequence of Amdahl's law.
Scaling Characteristics of Parallel Programs
• The total overhead function To is a function of both the problem size TS and the number of processing elements p.
• In many cases, To grows sublinearly with respect to TS.
• In such cases, the efficiency increases if the problem size is increased while keeping the number of processing elements constant, as illustrated below.
• For such systems, we can simultaneously increase the problem size and the number of processors to keep the efficiency constant.
• We call such systems scalable parallel systems.
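For instance, with the overhead To = 2p log p of the adding-n-numbers example (used again later in this module), efficiency rises with problem size at a fixed processor count; a minimal sketch:

import math

def efficiency(n, p):
    # E = 1 / (1 + To/TS) with To = 2 p log p for adding n numbers
    return 1.0 / (1.0 + 2 * p * math.log2(p) / n)

p = 64
for n in (256, 1024, 4096, 16384, 65536):
    print(f"n = {n:6d}   E = {efficiency(n, p):.3f}")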
Scaling Characteristics of Parallel Programs
• In terms of the problem size W (the serial runtime TS) and the overhead function To(W, p), the parallel runtime can be written as:

TP = (W + To(W, p)) / p    (8)

• The resulting expression for speedup is:

S = W / TP = W p / (W + To(W, p))    (9)

• The efficiency is then:

E = S / p = 1 / (1 + To(W, p) / W)    (11)

• Efficiency can be kept at a fixed value if the ratio To / W is held constant; for a desired efficiency E this gives:

W = K To(W, p), where K = E / (1 - E) is a constant    (12)
Isoefficiency Metric of Scalability
• The problem size W can usually be obtained as a function of p by algebraic manipulations to keep efficiency constant.
• This function determines the ease with which a parallel system can maintain a constant efficiency and hence achieve speedups increasing in proportion to the number of processing elements.
Isoefficiency Metric: Example
• The overhead function for the problem of adding n numbers on p processing elements is approximately 2p log p.
• Substituting To by 2p log p in Equation 12, we get:

W = 2K p log p    (13)

• Thus, the asymptotic isoefficiency function for this parallel system is Θ(p log p).
• If the number of processing elements is increased from p to p', the problem size (in this case, n) must be increased by a factor of (p' log p') / (p log p) to get the same efficiency as on p processing elements, as the sketch below illustrates.
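A minimal check of this scaling rule, confirming that efficiency stays constant when n grows by the stated factor (the starting point p = 8, n = 512 is arbitrary):

import math

def efficiency(n, p):
    # E = 1 / (1 + 2 p log p / n) for adding n numbers
    return 1.0 / (1.0 + 2 * p * math.log2(p) / n)

p, n = 8, 512
for p_new in (16, 32, 64):
    factor = (p_new * math.log2(p_new)) / (p * math.log2(p))
    n_new = n * factor
    print(f"p: {p} -> {p_new:2d}   n: {n} -> {n_new:6.0f}   "
          f"E: {efficiency(n, p):.3f} -> {efficiency(n_new, p_new):.3f}")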
Isoefficiency Metric: Example
• Consider a more complex example where To = p^(3/2) + p^(3/4) W^(3/4).
• Using only the first term of To in Equation 12, we get:

W = K p^(3/2)    (14)

• Using only the second term of To yields the following relation between W and p:

W = K p^(3/4) W^(3/4)    (15)
W^(1/4) = K p^(3/4)    (16)

• From this, we have:

W = K^4 p^3    (17)

• The larger of these two asymptotic rates determines the isoefficiency function of the system:

Isoefficiency = Θ(p^3)    (18)
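A quick numerical check of which term dominates, taking K = 1 purely for illustration:

for p in (16, 64, 256, 1024):
    # With To = p^(3/2) + p^(3/4) W^(3/4) and K = 1, compare the problem
    # size demanded by each term of the relation W = K * To(W, p).
    w_first = p ** 1.5     # from W = K p^(3/2), Equation (14)
    w_second = p ** 3      # from W = K^4 p^3, Equation (17)
    print(f"p = {p:5d}   first term: W ~ {w_first:12.0f}   second term: W ~ {w_second:15.0f}")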
• Consider the problem of solving a system of n linear equations in n variables by Gaussian elimination, for which W = Θ(n^3).
• The n variables must be eliminated one after the other, and eliminating each variable requires Θ(n^2) computations.
• At most Θ(n^2) processing elements can be kept busy at any time.
• Since W = Θ(n^3) for this problem, the degree of concurrency C(W) is Θ(W^(2/3)).
• Given p processing elements, the problem size should be at least Ω(p^(3/2)) to use them all.
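A tiny sketch of this lower bound (asymptotic constants are dropped):

def min_problem_size(p):
    # Degree of concurrency C(W) = W^(2/3) must be at least p,
    # so the problem size must satisfy W >= p^(3/2).
    return p ** 1.5

for p in (4, 16, 64, 256):
    print(f"p = {p:3d}   smallest usable W ~ {min_problem_size(p):9.0f}")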
Minimum Execution Time and Minimum Cost-Optimal Execution Time
Often, we are interested in the minimum time to solution.
• We can determine the minimum parallel runtime TP_min for a given W by differentiating the expression for TP with respect to p and setting the derivative to zero:

dTP / dp = 0    (19)

• For the problem of adding n numbers, TP = n/p + 2 log p; solving dTP / dp = 0 gives p = n/2, and the corresponding minimum parallel runtime is:

TP_min = 2 log n    (21)
• If the isoefficiency function of a parallel system is Θ(f(p)), then a problem of size W can be solved cost-optimally if and only if W = Ω(f(p)).
• In other words, cost-optimality requires p = O(f^(-1)(W)); since TP = Θ(W / p) for a cost-optimal system, the minimum cost-optimal parallel runtime is:

TP_cost_opt = Θ(W / f^(-1)(W))    (22)
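A small sketch that checks Equation (21) numerically and also evaluates an assumed cost-optimal operating point p ~ n / log n, which follows from the Θ(p log p) isoefficiency of this example:

import math

def tp(n, p):
    # TP = n/p + 2 log p; the natural logarithm is used here so that the
    # minimizer matches the closed form p0 = n/2 from Equation (21).
    return n / p + 2 * math.log(p)

n = 4096
p0 = min(range(1, n + 1), key=lambda p: tp(n, p))   # brute-force minimizer
print(f"argmin p = {p0}   (closed form n/2 = {n // 2})")
print(f"TP_min = {tp(n, p0):.2f}   (approximately 2 log n = {2 * math.log(n):.2f})")

# Assumed cost-optimal operating point, p ~ n / log n, from the
# Theta(p log p) isoefficiency of this example.
p_c = round(n / math.log(n))
print(f"cost-optimal p ~ {p_c}: TP = {tp(n, p_c):.2f}, cost p*TP = {p_c * tp(n, p_c):.0f} vs W = {n}")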
Asymptotic Analysis of Parallel Programs
• Consider the problem of sorting a list of n numbers. The fastest serial
programs for this problem run in time Θ(n log n). Consider four parallel
algorithms, A1, A2, A3, and A4 as follows:
• Comparison of four different algorithms for sorting a given list of numbers. The table shows the number of processing elements, the parallel runtime, the speedup, the efficiency and the pTP product:

           A1           A2          A3                A4
p          n^2          log n       n                 sqrt(n)
TP         1            n           sqrt(n)           sqrt(n) log n
S          n log n      log n       sqrt(n) log n     sqrt(n)
E          (log n)/n    1           (log n)/sqrt(n)   1
pTP        n^2          n log n     n^1.5             n log n
Asymptotic Analysis of Parallel Programs
• If the metric is speed, algorithm A1 is the best, followed by A3, A4, and A2 (in order of increasing TP).
• In terms of efficiency, A2 and A4 are the best, followed by A3 and A1.
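A short sketch that evaluates the table for a concrete input size (n = 1024 is an arbitrary choice):

import math

n = 1024
ts = n * math.log2(n)   # best serial sorting time, Theta(n log n)

algorithms = {
    "A1": (n ** 2,            1.0),
    "A2": (math.log2(n),      float(n)),
    "A3": (float(n),          math.sqrt(n)),
    "A4": (math.sqrt(n),      math.sqrt(n) * math.log2(n)),
}

print(f"{'alg':>3} {'p':>9} {'TP':>8} {'S':>9} {'E':>7} {'p*TP':>10}")
for name, (p, tp) in algorithms.items():
    s = ts / tp
    print(f"{name:>3} {p:9.0f} {tp:8.1f} {s:9.1f} {s / p:7.3f} {p * tp:10.0f}")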
Scaled Speedup: Example
• Scaled speedup is the speedup obtained when the problem size is increased linearly with the number of processing elements.
• The serial runtime of multiplying an n x n matrix with a vector is tc n^2. The parallel runtime on p processing elements is TP = tc n^2 / p + ts log p + tw n, so the speedup is:

S = tc n^2 / (tc n^2 / p + ts log p + tw n)    (24)

Where:
tc : computation time per element (one multiply-add),
ts : message startup (setup) time,
tw : per-word transfer (communication) time,
p : number of processing elements,
c : a constant relating the problem size to the number of processing elements.

• Under memory-constrained scaling, the memory requirement Θ(n^2) grows linearly with p, i.e. n^2 = c p, and the resulting speedup grows only as O(sqrt(p)).
• This is not a particularly scalable system.
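A minimal sketch of Equation (24) under memory-constrained scaling (n^2 = c p); the constant values below are illustrative assumptions:

import math

tc, ts, tw, c = 1.0, 25.0, 4.0, 4096.0   # illustrative constants, not from the text

def scaled_speedup(p):
    n_sq = c * p                              # memory-constrained scaling: n^2 = c * p
    n = math.sqrt(n_sq)
    t_par = tc * n_sq / p + ts * math.log2(p) + tw * n
    return tc * n_sq / t_par                  # Equation (24)

for p in (4, 16, 64, 256, 1024):
    s = scaled_speedup(p)
    print(f"p = {p:4d}   S = {s:7.1f}   S/p = {s / p:.2f}")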
Scaled Speedup: Example (continued)
• Consider the case of time-constrained scaling, in which the parallel runtime TP is held constant as p increases.
• For this problem, keeping TP constant again requires n^2 = O(p), so the resulting speedup behaves just as in the memory-constrained case.
• This is not surprising, since the memory and time complexity of the operation are identical, Θ(n^2).
Scaled Speedup: Example
• The serial runtime of multiplying two matrices of dimension n x n is tc n^3.
• The parallel runtime of a given parallel formulation of this problem is:

TP = tc n^3 / p + ts log p + tw n^2 / sqrt(p)    (25)

• The speedup S is given by:

S = tc n^3 / (tc n^3 / p + ts log p + tw n^2 / sqrt(p))    (26)
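A small numerical sketch of Equation (26) under the two scaling disciplines discussed above, memory-constrained (n^2 proportional to p) and time-constrained (n^3 proportional to p); all constants are illustrative assumptions:

import math

tc, ts, tw = 1.0, 25.0, 4.0   # illustrative constants, not from the text

def speedup(n, p):
    # Equation (26) for multiplying two n x n matrices on p processing elements
    t_par = tc * n ** 3 / p + ts * math.log2(p) + tw * n ** 2 / math.sqrt(p)
    return tc * n ** 3 / t_par

for p in (16, 64, 256, 1024):
    n_mem = math.sqrt(64 * p)        # memory-constrained scaling: n^2 = c p (c = 64 assumed)
    n_time = (64 * p) ** (1 / 3)     # time-constrained scaling: n^3 = c p (c = 64 assumed)
    print(f"p = {p:5d}   memory-constrained S = {speedup(n_mem, p):8.1f}"
          f"   time-constrained S = {speedup(n_time, p):7.1f}")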
Serial Fraction f
• If the serial runtime W = TS is divided into a totally serial component Tser and a perfectly parallelizable component, the serial fraction f of a parallel program is defined as:

f = Tser / W

• Therefore, we have:

TP = f W + (1 - f) W / p

Serial Fraction
• Since S = W / TP, we have 1 / S = f + (1 - f) / p. Solving for f gives:

f = (1/S - 1/p) / (1 - 1/p)    (27)
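A small sketch of Equation (27), estimating f from observed speedups; the measurements below are made up for illustration:

def serial_fraction(speedup, p):
    # Equation (27): f = (1/S - 1/p) / (1 - 1/p)
    return (1.0 / speedup - 1.0 / p) / (1.0 - 1.0 / p)

# Illustrative (made-up) measurements: (number of processors, observed speedup)
measurements = [(2, 1.9), (4, 3.5), (8, 6.0), (16, 9.0)]
for p, s in measurements:
    print(f"p = {p:2d}   S = {s:4.1f}   estimated serial fraction f = {serial_fraction(s, p):.3f}")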