
Module #2

Performance Analysis of Multiprocessor Architectures


Professor Mostafa Abd-El-Barr

Term 2024-2025

Friday, October 4, 2024


Outline
❑ Computational Models
❑ An Argument for Parallel Architectures
❑ Interconnection Networks Performance Issues
❑ Scalability Of Parallel Architectures

Computational Models
1. Equal Duration Model
✓ It is assumed that a given task can be divided into n equal subtasks, each of which can be executed by one processor.
✓ If ts is the execution time of the whole task using a single processor, then the time taken by each processor to execute its subtask is tm = ts/n.
✓ Definition: The speedup factor S(n) of a parallel system is the ratio between the time taken by a single processor to solve a given problem instance and the time taken by a parallel system consisting of n processors to solve the same problem instance:

S(n) = ts/tm = ts/(ts/n) = n

✓ Effect of the communication overhead: assume that tc is the communication overhead, resulting from the time needed for processors to communicate and possibly exchange data while executing their subtasks.
✓ The actual time taken by each processor to execute its subtask is then given by tm = ts/n + tc.
✓ The speedup factor with communication overhead is:

S(n) = ts/tm = ts/(ts/n + tc) = n/(1 + n·tc/ts)
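The two speedup expressions above can be checked with a short Python sketch (the function name and sample values are illustrative, not from the slides):

```python
def speedup(n, ts, tc=0.0):
    """Speedup S(n) under the equal-duration model.

    n:  number of processors
    ts: execution time of the whole task on a single processor
    tc: communication overhead per processor (tc = 0 gives the ideal model)
    """
    tm = ts / n + tc           # time taken by each processor
    return ts / tm             # equals n / (1 + n * tc / ts)

print(speedup(8, ts=100.0))                 # ideal model: S(8) = 8.0
print(speedup(8, ts=100.0, tc=1.0))         # with overhead: 8 / 1.08 = 7.407...
```

Note how even a small overhead (tc/ts = 1%) pulls the speedup below the ideal n.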
Computational Models
✓ Definition: the efficiency ξ is a measure of the speedup achieved per processor, ξ = S(n)/n.
✓ If the communication overhead is taken into consideration, the efficiency can be expressed as:

ξ = 1/(1 + n·tc/ts)
✓ The equal duration model is, however, unrealistic.
✓ This is because it is based on the assumption that a given task can be divided into a number
of equal subtasks that can be executed by a number of processors in parallel.
✓ Real algorithms contain some (serial) parts that cannot be divided among processors. These
(serial) parts must be executed on a single processor.
✓ Example:

For i ← 1, n
    c(i) ← a(i) + b(i)          done in parallel; each processor does one addition

sum ← 0                          only one processor can do this (serial section)

For j ← 1, n
    sum ← sum + c(j)             done in parallel; each processor does one addition

average ← sum/n                  only one processor can do this (serial section)

For k ← 1, n
    a(k) ← a(k) - average
    b(k) ← b(k) - average        done in parallel; each processor performs both subtractions for one index
✓ This illustrative example shows that a realistic computational model should assume the existence of serial parts of the task that cannot be divided among processors.
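A sequential Python rendition of the example above (function and variable names are illustrative); the comments mark which sections the slides treat as parallelizable and which as serial:

```python
def normalize(a, b):
    """Sequential rendition of the slide's example.  Comments mark which
    sections could run in parallel and which are inherently serial."""
    n = len(a)
    c = [a[i] + b[i] for i in range(n)]     # parallel: one addition per processor
    total = 0                               # serial: one processor initializes
    for x in c:
        total += x                          # accumulate the sum of c
    average = total / n                     # serial: one processor divides
    a = [x - average for x in a]            # parallel: per-element subtraction
    b = [x - average for x in b]
    return a, b, average

a, b, avg = normalize([1, 2, 3, 4], [4, 3, 2, 1])
print(avg)   # -> 5.0
print(a)     # -> [-4.0, -3.0, -2.0, -1.0]
```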
Computational Models
2. Parallel Computation with Serial Sections Model
✓ It is assumed that a fraction f of the given task (computation) cannot be divided into concurrent subtasks.
✓ The remaining fraction (1 - f) is assumed to be divisible into concurrent subtasks.
✓ Performing derivations similar to those done for the equal duration model, the time required to execute the task on n processors is:

tm = f·ts + (1 - f)·ts/n

✓ The speedup factor is therefore given by:

S(n) = ts/tm = ts/(f·ts + (1 - f)·ts/n) = n/(1 + (n - 1)f)
✓ Result: The potential speedup due to the use of n processors is determined primarily by the
fraction of code that cannot be divided.
✓ If the task is completely serial, i.e. f = 1, then no speedup can be achieved regardless of the
number of processors used.
✓ This principle is known as Amdahl's law.
✓ According to this law, the maximum speedup factor is given by:

lim (n→∞) S(n) = 1/f
✓ According to Amdahl’s law the improvement in performance (speedup) of a parallel
algorithm over a sequential one is limited not by the number of processors employed but
rather by the fraction of the algorithm that cannot be parallelized.
✓ For some time, owing to Amdahl's law, researchers were led to believe that a substantial increase in the speedup factor would not be possible using parallel architectures.
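Amdahl's law can be sketched numerically; the function name and the sample fraction f = 0.1 below are illustrative:

```python
def amdahl_speedup(n, f):
    """Amdahl's law: speedup on n processors when a fraction f is serial."""
    return n / (1 + (n - 1) * f)

# With f = 0.1 the speedup approaches 1/f = 10 no matter how large n gets.
for n in (10, 100, 10_000):
    print(n, amdahl_speedup(n, f=0.1))
```

The loop makes the limit visible: speedup climbs quickly at first, then saturates near 1/f.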
Computational Models
✓ As stated earlier, the communication overhead should be included in the processing time.
✓ Considering the time incurred due to this communication overhead, the speedup factor is given by:

S(n) = ts/(f·ts + (1 - f)·ts/n + tc) = n/(f(n - 1) + 1 + n·tc/ts)

✓ The maximum speedup factor under such conditions is given by:

lim (n→∞) S(n) = lim (n→∞) n/(f(n - 1) + 1 + n·tc/ts) = 1/(f + tc/ts)
✓ The above equation indicates that the maximum speedup factor is determined not by the number of parallel processors employed but by the fraction of the computation that is not parallelized and by the communication overhead.
✓ Recall that the efficiency is defined as the ratio between the speedup factor and the number of processors, n. The efficiency can be computed as follows:

ξ (no communication overhead) = 1/(1 + (n - 1)f)

ξ (with communication overhead) = 1/(f(n - 1) + 1 + n·tc/ts)
✓ As the number of processors increases, it may become difficult to use those processors efficiently.
✓ In order to maintain a certain level of processor efficiency, there should exist a relationship between the fraction of serial computation, f, and the number of processors employed.
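That relationship can be made concrete by solving the efficiency expression above for f; the sketch below (illustrative names, 50% target chosen as an example) computes the largest serial fraction that still meets a target efficiency:

```python
def max_serial_fraction(n, efficiency, tc_over_ts=0.0):
    """Largest serial fraction f meeting a target efficiency on n > 1
    processors, from  efficiency = 1 / (f*(n - 1) + 1 + n*tc/ts)."""
    return (1.0 / efficiency - 1.0 - n * tc_over_ts) / (n - 1)

# To hold 50% efficiency with no overhead, f must shrink as n grows:
for n in (2, 8, 32, 128):
    print(n, max_serial_fraction(n, efficiency=0.5))
```

The output shows f falling roughly as 1/(n - 1): the more processors, the smaller the serial fraction a program can tolerate.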
Interconnection Networks Performance Issues
o Definition:
Channel Bisection Width of a network (B): is the minimum number of wires that, when cut, divide the network into
equal halves with respect to the number of nodes.
o Definition: The wire bisection width is the number of wires crossing this cut of the network.
o Example: the bisection width of a 4-cube is B = 8.

o The k-ary n-cube network is a radix-k cube with n dimensions; the number of nodes is N = k^n.
o Example (k = 8, n = 1): [Figure (a): 8-ary 1-cube (8-node ring) network]

The Table provides some numerical values of the above topological characteristics for sample static networks.
Network Configuration Bisection Width (B) Node Degree (d) Diameter (D)
8-ary 1-cube 2 2 4
4-cube 8 4 4
3 × 3 × 2 Mesh 9 3 5
8-ary 2-cube 16 4 8
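The k-ary n-cube rows of the table can be reproduced from the standard formulas for tori and hypercubes (assumed here, not derived in the slides; the 3 × 3 × 2 mesh is a different topology and is not covered by this function):

```python
def kary_ncube(k, n):
    """(nodes, bisection width B, node degree d, diameter D) of a k-ary
    n-cube, using the standard torus formulas; k = 2 is the hypercube
    special case (wraparound links coincide with regular links)."""
    nodes = k ** n
    bisection = 2 * k ** (n - 1) if k > 2 else 2 ** (n - 1)
    degree = 2 * n if k > 2 else n
    diameter = n * (k // 2)
    return nodes, bisection, degree, diameter

print(kary_ncube(8, 1))   # 8-ary 1-cube -> (8, 2, 2, 4)
print(kary_ncube(2, 4))   # 4-cube       -> (16, 8, 4, 4)
print(kary_ncube(8, 2))   # 8-ary 2-cube -> (64, 16, 4, 8)
```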
Interconnection Networks Performance Issues
✓ Bandwidth of a crossbar
o Define the bandwidth for the crossbar as the average number of requests that can be accepted by a crossbar
in a given cycle.
o As processors make requests for memory modules in a crossbar, contention can take place when two or
more processors request access to the same memory module.
o Example: the case of a crossbar consisting of three processors p1, p2 , and p3 and three memory modules M1, M 2 , and M 3.
o As processors make requests for accessing memory modules, the following cases may take place.
1. All three processors request access to the same memory module: In this case, only one request can be accepted.
Since there are three memory modules, then there are three ways (three accepted requests) in which such a case
can arise.
2. All three processors request access to two different memory modules: In this case, two requests can be granted. There are 18 ways (find why) in which such a case can arise, giving 36 accepted requests in total.
3. All three processors request access to three different memory modules: In this case, all three requests can be granted. There are six ways (find why) in which such a case can arise, giving 18 accepted requests in total.
o Out of the twenty-seven combinations of three requests taken from three possible requests, 57 requests can be accepted (causing no memory contention).
o Assuming that all processors make requests for memory-module access in every cycle, the bandwidth of such a crossbar is BW = 57/27 ≈ 2.11.
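The 57/27 figure can be verified by brute force over all 27 request combinations (illustrative sketch; the model grants exactly one request per distinct requested module):

```python
from itertools import product

def crossbar_bandwidth(p, m):
    """Average number of accepted requests per cycle for a p-processor,
    m-module crossbar, assuming every processor requests a module each
    cycle and each distinct requested module grants exactly one request."""
    accepted = sum(len(set(req)) for req in product(range(m), repeat=p))
    return accepted / m ** p

print(crossbar_bandwidth(3, 3))   # -> 2.111... (= 57/27)
```

Enumerating the tuples also answers the "find why" counts: 3 all-same tuples, 18 two-module tuples, and 6 all-distinct tuples.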
Scalability Of Parallel Architectures
• Definition: A parallel architecture is said to be scalable if it can be expanded
(reduced) to a larger (smaller) system with a linear increase (decrease) in its
performance (cost).
• This general definition indicates the desirability for providing equal chance for
scaling up a system for improved performance and for scaling down a system for
greater cost-effectiveness and/or affordability.
• Scalability is used as a measure of the system's ability to provide increased performance (e.g., speed) as its size is increased.
• Scalability is a reflection of the system’s ability to efficiently utilize the increased
processing resources.
• Scalability of a system can be manifested in a number of forms. These forms
include speed, efficiency, size, applications, generation, and heterogeneity.
Scalability Of Parallel Architectures
✓ Speed
o A scalable system is capable of increasing its speed in proportion to the increase in the number of processors.
o Example:
▪ Consider the case of adding m numbers on a 4-cube (n = 16 processors) parallel system.
▪ Assume for simplicity that m is a multiple of n, e.g., 32, 64, ....
▪ Assume also that originally each processor has m/n numbers stored in its local memory.
▪ The addition can then proceed as follows:
▪ First: each processor adds its own m/n numbers sequentially in m/n steps. The addition operation is performed simultaneously in all processors.
▪ Second: each pair of neighboring processors communicates its result to one of them, whereby the communicated result is added to the local result.
▪ The second step is repeated log2 n times, until the final result of the addition process is stored in one of the processors.
▪ Assuming that each computation and each communication takes one unit of time, the time needed to perform the addition of these m numbers is:

Tp = m/n + 2·log2 n

▪ Recall that the time required to perform the same operation on a single processor is Ts = m.
▪ Therefore, the speedup is given by:

S = Ts/Tp = m/(m/n + 2·log2 n)
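The speedup S = m/(m/n + 2·log2 n) can be evaluated directly (illustrative sketch):

```python
import math

def cube_add_speedup(m, n):
    """Speedup of adding m numbers on n processors of a hypercube:
    Tp = m/n local additions + 2*log2(n) units for the combining phase
    (one communication and one addition per tree level)."""
    tp = m / n + 2 * math.log2(n)
    return m / tp

print(round(cube_add_speedup(64, 16), 2))    # -> 5.33
print(round(cube_add_speedup(1024, 32), 2))  # -> 24.38
```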
Scalability Of Parallel Architectures
The possible speedup for different m and n
m n = 2 n = 4 n = 8 n = 16 n = 32
64 1.88 3.2 4.57 5.33 5.33
128 1.94 3.55 5.82 8.00 9.14
256 1.97 3.76 6.74 10.67 14.23
512 1.98 3.88 7.31 12.8 19.70
1024 1.99 3.94 7.64 14.23 24.38
✓ Efficiency
o Consider, for example, the above problem of adding m numbers on an n-cube. The efficiency of such system is
defined as the ratio between the actual speedup, S, and the ideal speedup, n. Therefore,
ξ = S/n = m/(m + 2n·log2 n)
Efficiency for different values of m and n


m n = 2 n = 4 n = 8 n = 16 n = 32
64 0.94 0.8 0.57 0.33 0.167
128 0.97 0.888 0.73 0.5 0.285
256 0.985 0.94 0.84 0.67 0.444
512 0.99 0.97 0.91 0.8 0.62
1024 0.995 0.985 0.955 0.89 0.76
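The efficiency values above follow from ξ = m/(m + 2n·log2 n); the sketch below also computes the m needed to hold a fixed efficiency (the 0.8 target is an illustrative choice, not from the slides):

```python
import math

def cube_add_efficiency(m, n):
    """Efficiency of adding m numbers on n processors:
    epsilon = S/n = m / (m + 2*n*log2(n))."""
    return m / (m + 2 * n * math.log2(n))

# Solving m/(m + 2n*log2(n)) = 0.8 gives m = 8*n*log2(n): to hold the
# efficiency fixed, m must grow slightly faster than linearly in n.
for n in (2, 4, 8, 16, 32):
    m = 8 * n * math.log2(n)
    print(n, int(m), cube_add_efficiency(m, n))
```

The required growth rate of m with n, called the isoefficiency behavior, is exactly the scalability measure discussed next: near-linear growth indicates a highly scalable system.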
Scalability Of Parallel Architectures
o The values in the table indicate that for the same number of processors, n, higher efficiency is achieved as the size
of the problem, m, is increased.
o Also, as the number of processors, n, increases, the efficiency continues to decrease.
o Given these two observations, it should be possible to keep the efficiency fixed by increasing simultaneously both
the size of the problem, m, and the number of processors, n.
o This is a property of a scalable parallel system.
o The degree of scalability of a parallel system is determined by the rate at which the problem size must increase
with respect to n in order to maintain a fixed efficiency as the number of processors increases.
o In a highly scalable parallel system, the size of the problem needs to grow linearly with respect to n to maintain a
fixed efficiency.
o In a poorly scalable system, the size of the problem needs to grow exponentially with respect to n to maintain a
fixed efficiency.
References
▪ Textbook Chapter 3.

