
Module #2

Performance Analysis of Multiprocessor Architectures


Professor Mostafa Abd-El-Barr

Term 2024-2025

Friday, October 4, 2024


Outline
❑ Computational Models
❑ An Argument for Parallel Architectures
❑ Interconnection Networks Performance Issues
❑ Scalability Of Parallel Architectures

Computational Models
1. Equal Duration Model
✓ It is assumed that a given task can be divided into n equal subtasks, each of which can be executed by one processor.
✓ If ts is the execution time of the whole task using a single processor, then the time taken by each processor to execute its subtask is tm = ts/n.
✓ Definition: The speedup factor S(n) of a parallel system is the ratio between the time taken by a single processor to solve a given problem instance and the time taken by a parallel system consisting of n processors to solve the same problem instance:

S(n) = ts/tm = ts/(ts/n) = n

✓ Effect of the communication overhead: assume that tc is the communication overhead, resulting from the time needed for processors to communicate and possibly exchange data while executing their subtasks.
✓ The actual time taken by each processor to execute its subtask is then given by tm = ts/n + tc.
✓ The speedup factor with communication overhead is:

S(n) = ts/tm = ts/(ts/n + tc) = n/(1 + n·tc/ts)
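The two speedup expressions above can be checked with a short Python sketch (the function name and sample values are illustrative, not from the slides):

```python
def speedup(n, ts, tc=0.0):
    """Speedup S(n) under the equal-duration model.

    n:  number of processors
    ts: execution time of the whole task on a single processor
    tc: communication overhead per processor (tc = 0 gives the ideal model)
    """
    tm = ts / n + tc           # time taken by each processor
    return ts / tm             # equals n / (1 + n * tc / ts)

print(speedup(8, ts=100.0))                 # ideal model: S(8) = 8.0
print(speedup(8, ts=100.0, tc=1.0))         # with overhead: 8 / 1.08 = 7.407...
```

Note how even a small overhead (tc/ts = 1%) pulls the speedup below the ideal n.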
Computational Models
✓ Definition: the efficiency ξ is a measure of the speedup achieved per processor, ξ = S(n)/n.
✓ If the communication overhead is taken into consideration, the efficiency can be expressed as:

ξ = 1/(1 + n·tc/ts)
✓ The equal duration model is, however, unrealistic.
✓ This is because it is based on the assumption that a given task can be divided into a number
of equal subtasks that can be executed by a number of processors in parallel.
✓ Real algorithms contain some (serial) parts that cannot be divided among processors. These
(serial) parts must be executed on a single processor.
✓ Example:

For i ← 1, n
    c(i) ← a(i) + b(i)          done in parallel; each processor does one addition

sum ← 0                          only one processor can do this (serial section)

For j ← 1, n
    sum ← sum + c(j)             done in parallel; each processor does one addition

average ← sum/n                  only one processor can do this (serial section)

For k ← 1, n
    a(k) ← a(k) - average
    b(k) ← b(k) - average        done in parallel; each processor performs both subtractions for one index
✓ This illustrative example shows that a realistic computational model should assume the existence of serial parts of the task that cannot be divided among processors.
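A sequential Python rendition of the example above (function and variable names are illustrative); the comments mark which sections the slides treat as parallelizable and which as serial:

```python
def normalize(a, b):
    """Sequential rendition of the slide's example.  Comments mark which
    sections could run in parallel and which are inherently serial."""
    n = len(a)
    c = [a[i] + b[i] for i in range(n)]     # parallel: one addition per processor
    total = 0                               # serial: one processor initializes
    for x in c:
        total += x                          # accumulate the sum of c
    average = total / n                     # serial: one processor divides
    a = [x - average for x in a]            # parallel: per-element subtraction
    b = [x - average for x in b]
    return a, b, average

a, b, avg = normalize([1, 2, 3, 4], [4, 3, 2, 1])
print(avg)   # -> 5.0
print(a)     # -> [-4.0, -3.0, -2.0, -1.0]
```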
Computational Models
2. Parallel Computation with Serial Sections Model
✓ It is assumed that a fraction f of the given task (computation) cannot be divided into concurrent subtasks.
✓ The remaining fraction (1 - f) is assumed to be divisible into concurrent subtasks.
✓ Performing derivations similar to those done for the equal duration model, the time required to execute the task on n processors is:

tm = f·ts + (1 - f)·ts/n

✓ The speedup factor is therefore given by:

S(n) = ts/tm = ts/(f·ts + (1 - f)·ts/n) = n/(1 + (n - 1)f)
✓ Result: The potential speedup due to the use of n processors is determined primarily by the
fraction of code that cannot be divided.
✓ If the task is completely serial, i.e. f = 1, then no speedup can be achieved regardless of the
number of processors used.
✓ This principle is known as Amdahl's law.
✓ According to this law, the maximum speedup factor is given by:

lim (n→∞) S(n) = 1/f
✓ According to Amdahl’s law the improvement in performance (speedup) of a parallel
algorithm over a sequential one is limited not by the number of processors employed but
rather by the fraction of the algorithm that cannot be parallelized.
✓ For some time, owing to Amdahl's law, researchers were led to believe that a substantial increase in the speedup factor would not be possible using parallel architectures.
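Amdahl's law can be sketched numerically; the function name and the sample fraction f = 0.1 below are illustrative:

```python
def amdahl_speedup(n, f):
    """Amdahl's law: speedup on n processors when a fraction f is serial."""
    return n / (1 + (n - 1) * f)

# With f = 0.1 the speedup approaches 1/f = 10 no matter how large n gets.
for n in (10, 100, 10_000):
    print(n, amdahl_speedup(n, f=0.1))
```

The loop makes the limit visible: speedup climbs quickly at first, then saturates near 1/f.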
Computational Models
✓ As stated earlier, the communication overhead should be included in the processing time.
✓ Considering the time incurred due to this communication overhead, the speedup factor is given by:

S(n) = ts/(f·ts + (1 - f)·ts/n + tc) = n/(f(n - 1) + 1 + n·tc/ts)

✓ The maximum speedup factor under such conditions is given by:

lim (n→∞) S(n) = lim (n→∞) n/(f(n - 1) + 1 + n·tc/ts) = 1/(f + tc/ts)
✓ The above equation indicates that the maximum speedup factor is determined not by the number of parallel processors employed but by the fraction of the computation that is not parallelized and by the communication overhead.
✓ Recall that the efficiency is defined as the ratio between the speedup factor and the number of processors, n. The efficiency can be computed as follows:

ξ (no communication overhead) = 1/(1 + (n - 1)f)

ξ (with communication overhead) = 1/(f(n - 1) + 1 + n·tc/ts)
✓ As the number of processors increases, it may become difficult to use those processors efficiently.
✓ In order to maintain a certain level of processor efficiency, there should exist a relationship between the fraction of serial computation, f, and the number of processors employed.
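That relationship can be made concrete by solving the efficiency expression above for f; the sketch below (illustrative names, 50% target chosen as an example) computes the largest serial fraction that still meets a target efficiency:

```python
def max_serial_fraction(n, efficiency, tc_over_ts=0.0):
    """Largest serial fraction f meeting a target efficiency on n > 1
    processors, from  efficiency = 1 / (f*(n - 1) + 1 + n*tc/ts)."""
    return (1.0 / efficiency - 1.0 - n * tc_over_ts) / (n - 1)

# To hold 50% efficiency with no overhead, f must shrink as n grows:
for n in (2, 8, 32, 128):
    print(n, max_serial_fraction(n, efficiency=0.5))
```

The output shows f falling roughly as 1/(n - 1): the more processors, the smaller the serial fraction a program can tolerate.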
Interconnection Networks Performance Issues
o Definition:
Channel Bisection Width of a network (B): is the minimum number of wires that, when cut, divide the network into
equal halves with respect to the number of nodes.
o Definition: The wire bisection width is the number of wires crossing this cut of the network.
o Example: the bisection width of a 4-cube is B = 8.

o The k-ary n-cube network is a radix-k cube with n dimensions; the number of nodes is N = k^n.
o Example (k = 8, n = 1): [Figure (a): 8-ary 1-cube (8-node ring) network]

The Table provides some numerical values of the above topological characteristics for sample static networks.
Network Configuration Bisection Width (B) Node Degree (d) Diameter (D)
8-ary 1-cube 2 2 4
4-cube 8 4 4
3 × 3 × 2 Mesh 9 3 5
8-ary 2-cube 16 4 8
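The k-ary n-cube rows of the table can be reproduced from the standard formulas for tori and hypercubes (assumed here, not derived in the slides; the 3 × 3 × 2 mesh is a different topology and is not covered by this function):

```python
def kary_ncube(k, n):
    """(nodes, bisection width B, node degree d, diameter D) of a k-ary
    n-cube, using the standard torus formulas; k = 2 is the hypercube
    special case (wraparound links coincide with regular links)."""
    nodes = k ** n
    bisection = 2 * k ** (n - 1) if k > 2 else 2 ** (n - 1)
    degree = 2 * n if k > 2 else n
    diameter = n * (k // 2)
    return nodes, bisection, degree, diameter

print(kary_ncube(8, 1))   # 8-ary 1-cube -> (8, 2, 2, 4)
print(kary_ncube(2, 4))   # 4-cube       -> (16, 8, 4, 4)
print(kary_ncube(8, 2))   # 8-ary 2-cube -> (64, 16, 4, 8)
```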
Interconnection Networks Performance Issues
✓ Bandwidth of a crossbar
o Define the bandwidth for the crossbar as the average number of requests that can be accepted by a crossbar
in a given cycle.
o As processors make requests for memory modules in a crossbar, contention can take place when two or
more processors request access to the same memory module.
o Example: the case of a crossbar consisting of three processors p1, p2 , and p3 and three memory modules M1, M 2 , and M 3.
o As processors make requests for accessing memory modules, the following cases may take place.
1. All three processors request access to the same memory module: In this case, only one request can be accepted.
Since there are three memory modules, then there are three ways (three accepted requests) in which such a case
can arise.
2. All three processors request access to two different memory modules: In this case, two requests can be granted. There are 18 ways (find why) in which such a case can arise, giving 36 accepted requests in total.
3. All three processors request access to three different memory modules: In this case, all three requests can be granted. There are six ways (find why) in which such a case can arise, giving 18 accepted requests in total.
o Out of the twenty-seven combinations of three requests taken from three possible requests, 57 requests can be accepted (causing no memory contention).
o Assuming that all processors make requests for memory-module access in every cycle, the bandwidth of such a crossbar is BW = 57/27 ≈ 2.11.
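The 57/27 figure can be verified by brute force over all 27 request combinations (illustrative sketch; the model grants exactly one request per distinct requested module):

```python
from itertools import product

def crossbar_bandwidth(p, m):
    """Average number of accepted requests per cycle for a p-processor,
    m-module crossbar, assuming every processor requests a module each
    cycle and each distinct requested module grants exactly one request."""
    accepted = sum(len(set(req)) for req in product(range(m), repeat=p))
    return accepted / m ** p

print(crossbar_bandwidth(3, 3))   # -> 2.111... (= 57/27)
```

Enumerating the tuples also answers the "find why" counts: 3 all-same tuples, 18 two-module tuples, and 6 all-distinct tuples.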
Scalability Of Parallel Architectures
• Definition: A parallel architecture is said to be scalable if it can be expanded
(reduced) to a larger (smaller) system with a linear increase (decrease) in its
performance (cost).
• This general definition indicates the desirability for providing equal chance for
scaling up a system for improved performance and for scaling down a system for
greater cost-effectiveness and/or affordability.
• Scalability is used as a measure of the system's ability to provide increased performance (e.g., speed) as its size is increased.
• Scalability is a reflection of the system’s ability to efficiently utilize the increased
processing resources.
• Scalability of a system can be manifested in a number of forms. These forms
include speed, efficiency, size, applications, generation, and heterogeneity.
Scalability Of Parallel Architectures
✓ Speed
o A scalable system is capable of increasing its speed in proportion to the increase in the number of processors.
o Example:
▪ Consider the case of adding m numbers on a 4-cube (n = 16 processors) parallel system.
▪ Assume for simplicity that m is a multiple of n, e.g., 32, 64, ....
▪ Assume also that originally each processor has m/n numbers stored in its local memory.
▪ The addition can then proceed as follows:
▪ First: each processor adds its own m/n numbers sequentially in m/n steps. The addition operation is performed simultaneously in all processors.
▪ Second: each pair of neighboring processors communicates its result to one of them, whereby the communicated result is added to the local result.
▪ The second step is repeated log2 n times, until the final result of the addition process is stored in one of the processors.
▪ Assuming that each computation and each communication takes one unit of time, the time needed to perform the addition of these m numbers is:

Tp = m/n + 2·log2 n

▪ Recall that the time required to perform the same operation on a single processor is Ts = m.
▪ Therefore, the speedup is given by:

S = Ts/Tp = m/(m/n + 2·log2 n)
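The speedup S = m/(m/n + 2·log2 n) can be evaluated directly (illustrative sketch):

```python
import math

def cube_add_speedup(m, n):
    """Speedup of adding m numbers on n processors of a hypercube:
    Tp = m/n local additions + 2*log2(n) units for the combining phase
    (one communication and one addition per tree level)."""
    tp = m / n + 2 * math.log2(n)
    return m / tp

print(round(cube_add_speedup(64, 16), 2))    # -> 5.33
print(round(cube_add_speedup(1024, 32), 2))  # -> 24.38
```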
Scalability Of Parallel Architectures
The possible speedup for different m and n
m n = 2 n = 4 n = 8 n = 16 n = 32
64 1.88 3.2 4.57 5.33 5.33
128 1.94 3.55 5.82 8.00 9.14
256 1.97 3.76 6.74 10.67 14.23
512 1.98 3.88 7.31 12.8 19.70
1024 1.99 3.94 7.64 14.23 24.38
✓ Efficiency
o Consider, for example, the above problem of adding m numbers on an n-cube. The efficiency of such system is
defined as the ratio between the actual speedup, S, and the ideal speedup, n. Therefore,
ξ = S/n = m/(m + 2n·log2 n)
Efficiency for different values of m and n


m n = 2 n = 4 n = 8 n = 16 n = 32
64 0.94 0.8 0.57 0.33 0.167
128 0.97 0.888 0.73 0.5 0.285
256 0.985 0.94 0.84 0.67 0.444
512 0.99 0.97 0.91 0.8 0.62
1024 0.995 0.985 0.955 0.89 0.76
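The efficiency values above follow from ξ = m/(m + 2n·log2 n); the sketch below also computes the m needed to hold a fixed efficiency (the 0.8 target is an illustrative choice, not from the slides):

```python
import math

def cube_add_efficiency(m, n):
    """Efficiency of adding m numbers on n processors:
    epsilon = S/n = m / (m + 2*n*log2(n))."""
    return m / (m + 2 * n * math.log2(n))

# Solving m/(m + 2n*log2(n)) = 0.8 gives m = 8*n*log2(n): to hold the
# efficiency fixed, m must grow slightly faster than linearly in n.
for n in (2, 4, 8, 16, 32):
    m = 8 * n * math.log2(n)
    print(n, int(m), cube_add_efficiency(m, n))
```

The required growth rate of m with n, called the isoefficiency behavior, is exactly the scalability measure discussed next: near-linear growth indicates a highly scalable system.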
Scalability Of Parallel Architectures
o The values in the table indicate that for the same number of processors, n, higher efficiency is achieved as the size
of the problem, m, is increased.
o Also, as the number of processors, n, increases, the efficiency continues to decrease.
o Given these two observations, it should be possible to keep the efficiency fixed by increasing simultaneously both
the size of the problem, m, and the number of processors, n.
o This is a property of a scalable parallel system.
o The degree of scalability of a parallel system is determined by the rate at which the problem size must increase
with respect to n in order to maintain a fixed efficiency as the number of processors increases.
o In a highly scalable parallel system, the size of the problem needs to grow linearly with respect to n to maintain a
fixed efficiency.
o In a poorly scalable system, the size of the problem needs to grow exponentially with respect to n to maintain a
fixed efficiency.
References
▪ Textbook Chapter 3.

