Module #5: Parallel Algorithms (October 30, 2024)
Example 2: Construct an algorithm (in Pseudo-code) for Searching (finding out) if a given integer, x, exists in a set
of integers. Illustration: {5, -1, 10, 35, -22, 15, -12, 23, 1, -7} The targeted integer is, x= -12
Example 3: Construct an algorithm (in Pseudo-code) for finding out if a given integer, x, exists in a set of sorted
integers. Illustration: {-22, -12 , -7 , -1 , 1 , 5 , 10, 15, 23, 35} The targeted integer is, x= -12
Example 4: Construct an algorithm (in Pseudo-code) for sorting a set of integers in increasing order using the
Bubble sort technique. Illustration: {5, -1, 10, 35, -22, 15, -12, 23, 1, -7}
Example 5: Construct an algorithm (in Pseudo-code) for sorting a set of integers in increasing order using the
insertion sort technique. Illustration {8, 2, 4, 9, 3, 6}
Example 6: Construct an algorithm (in Pseudo-code) for sorting a set of integers in increasing order using the
selection sort technique. Illustration {8, 2, 4, 9, 3, 6}
Introduction (Conventional Algorithms Processing)
Pseudo-Code
• Pseudo-Code: a high-level abstraction of code, used to outline the general steps of an algorithm
without writing actual code (done for the reader's or programmer's benefit).
Example Algorithms in Pseudo-code
Example 1: Construct an algorithm (in Pseudo-code) for finding the minimum (smallest) value in a finite set of
integers.
• Illustration: {5, -1, 10, 35, -22, 15, -12, 23, 1, -7}
• Algorithm 1 Find-Min
1. Input: set of integers a1, a2, ..., an
2. Output: min, the minimum value in the set
3. Steps:
3.1. min := a1
3.2. for i := 2 to n
3.3. if min > ai then min := ai
3.4. end for
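The Find-Min pseudocode above can be sketched as a small Python function (illustrative, not part of the original slides):

```python
def find_min(a):
    """Scan the list once, keeping the smallest value seen so far."""
    minimum = a[0]          # step 3.1: min := a1
    for x in a[1:]:         # step 3.2: for i := 2 to n
        if minimum > x:     # step 3.3
            minimum = x
    return minimum

print(find_min([5, -1, 10, 35, -22, 15, -12, 23, 1, -7]))  # -22
```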
◼ Example 2: Construct an algorithm (in Pseudo-code) for Searching (finding out) if a given
integer, x, exists in a set of integers.
◼ Illustration: {5, -1, 10, 35, -22, 15, -12, 23, 1, -7} The targeted integer is, x= -12
◼ Algorithm 2 Linear-Search
1. Input: set of integers a1, a2, ..., an and integer x
2. Output: location = i if x = ai; otherwise, location = 0
3. Steps:
3.1. i := 1
3.2. while (i ≤ n and x ≠ ai)
i := i + 1
3.3. end while
3.4. if i ≤ n then location := i else location := 0
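The Linear-Search algorithm, rendered as a Python sketch (illustrative; it keeps the slides' 1-based location convention):

```python
def linear_search(a, x):
    """Return the 1-based location of x in a, or 0 if x is absent (Algorithm 2)."""
    i = 1
    while i <= len(a) and a[i - 1] != x:   # step 3.2
        i += 1
    return i if i <= len(a) else 0         # step 3.4

print(linear_search([5, -1, 10, 35, -22, 15, -12, 23, 1, -7], -12))  # 7
```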
◼ Example 3: Construct an algorithm (in Pseudo-code) for finding out if a given integer, x,
exists in a set of sorted integers.
◼ Illustration: {-22, -12 , -7 , -1 , 1 , 5 , 10, 15, 23, 35}
The targeted integer is, x= -12
◼ Algorithm 3 Binary-Search
1. Input: set of n sorted integers a1, a2, ..., an and integer x
2. Output: location = k if x = ak; otherwise location = 0
3. Steps:
3.1. i := 1, j := n, k := 0
3.2. while ((i ≤ j) and (k = 0))
3.3. m := ⌊(i + j)/2⌋
3.4. if x = am then k := m
3.5. else if x < am then j := m-1
3.6. else i := m+1
3.7. end while
Trace the Algorithm.
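One way to trace it is to run a Python sketch of the same algorithm (illustrative, 1-based locations as in the slides):

```python
def binary_search(a, x):
    """1-based binary search over a sorted list; returns 0 if x is absent."""
    i, j, k = 1, len(a), 0
    while i <= j and k == 0:
        m = (i + j) // 2        # step 3.3: midpoint
        if x == a[m - 1]:
            k = m
        elif x < a[m - 1]:
            j = m - 1           # discard the upper half
        else:
            i = m + 1           # discard the lower half
    return k

print(binary_search([-22, -12, -7, -1, 1, 5, 10, 15, 23, 35], -12))  # 2
```

On the illustration, the first midpoint is m = 5 (value 1); since -12 < 1 the search narrows to the lower half and finds -12 at location 2.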
• Example 4: Construct an algorithm (in Pseudo-code) for sorting a set of integers in increasing
order.
• Illustration: {5, -1, 10, 35, -22, 15, -12, 23, 1, -7}
• Algorithm 4 Bubble-Sort
1. Input: set of integers a1, a2, ..., an
2. Output: a1, a2, ..., an sorted in increasing order
3. Steps:
3.1. for i := 1 to n-1
3.2. for j := 1 to n-i
3.3. if aj > aj+1 then interchange aj and aj+1
3.4. end for
▪ The size is measured in terms of the number of inputs processed by the algorithm.
▪ The number of basic operations to process an input of certain size is important in the analysis.
▪ The time taken to complete a basic operation is considered to be independent of the particular values of its operands.
n          1    2    4    8     16    32
C = 1      1    1    1    1     1     1
log2 n     0    1    2    3     4     5
n·log2 n   0    2    8    24    64    160
n²         1    4    16   64    256   1K
n³         1    8    64   512   4K    32K
2^n        2    4    16   256   64K   4G
Introduction (Conventional Sorting Algorithms Processing)
1. Bubble Sort
▪ Basic Idea: the idea is to repeatedly move the largest element to the highest index
position of the array.
▪ Each iteration reduces the effective size of the array by one.
▪ The focus is on successive adjacent pairs of elements in the array: each pair is compared and
swapped if out of order. In either case, after such a step, the larger of the two
elements will be in the higher index position.
▪ The focus then moves to the next higher position, and the process is repeated.
▪ When the focus reaches the end of the effective array, the largest element will have
``bubbled'' from whatever its original position to the highest index position in the
effective array.
Bubble Sort (Cont’d)
◼ Example: Consider the array A = {45, 67, 12, 34, 25, 39} (indices 0-5). One bubble step proceeds:
45 67 12 34 25 39   (compare 45, 67: no swap)
45 12 67 34 25 39   (compare 67, 12: swap)
45 12 34 67 25 39   (compare 67, 34: swap)
45 12 34 25 67 39   (compare 67, 25: swap)
45 12 34 25 39 67   (compare 67, 39: swap — 67 has bubbled to the highest index)
Bubble Sort (Cont’d)
• A bubble step is done by the following loops:
for j ← 0 to n-2
    for i ← 0 to n-2-j
        if A[i] > A[i+1] then
            Temp ← A[i]
            A[i] ← A[i+1]
            A[i+1] ← Temp
Worst-case Analysis: O(n²)
▪ The loop compares all adjacent elements at index i and i + 1. If they are not in the correct order, they are
swapped.
▪ One complete bubble step moves the largest element to the last position, which is the correct position for that
element in the final sorted array.
▪ The effective size of the array is reduced by one and the process is repeated until the effective size becomes one.
▪ Each bubble step moves the largest element in the effective array to the highest index of the effective array.
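The nested loops above can be sketched in Python (an illustrative helper, not from the slides):

```python
def bubble_sort(a):
    """In-place bubble sort: each outer pass bubbles the largest remaining element up."""
    n = len(a)
    for j in range(n - 1):            # n-1 bubble steps
        for i in range(n - 1 - j):    # effective size shrinks by one each pass
            if a[i] > a[i + 1]:
                a[i], a[i + 1] = a[i + 1], a[i]
    return a

print(bubble_sort([45, 67, 12, 34, 25, 39]))  # [12, 25, 34, 39, 45, 67]
```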
2. Insertion Sort
• Basic Idea: For each element in the list of elements, find the proper slot where it should belong, and insert it.
• One element by itself is already sorted.
• Two elements are then considered and sorted, i.e., swapped if needed.
• With three elements, the third element is swapped leftward until it is in its proper order with the first two.
• With four elements, the fourth element is swapped leftward until it is in its proper order with the first three.
• Continue in this manner with the fifth element, the sixth element, and so on until the whole list is sorted.
Insertion Sort
Algorithm InsertionSort(A):
Input: An Array A of n elements
Output: The array A with its n elements sorted in a non-decreasing order
for i ← 1 to n-1 do
Temp ← A[i]
j ← i-1
while j ≥ 0 and A[j] > Temp do
A[j+1] ← A[j]
j ← j-1
end while
A[j+1] ← Temp
Trace the algorithm.
end for
• Best-case analysis: the elements are already sorted.
The inner loop is never executed; the outer loop is executed n-1 times, i.e., O(n).
• Worst-case analysis: the elements are in reverse order.
The inner loop is executed the maximum number of times; the outer loop is executed n-1 times, i.e., O(n²).
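Algorithm InsertionSort above, rendered as a Python sketch (illustrative):

```python
def insertion_sort(a):
    """In-place insertion sort: grow a sorted prefix one element at a time."""
    for i in range(1, len(a)):
        temp = a[i]
        j = i - 1
        while j >= 0 and a[j] > temp:   # shift larger elements right
            a[j + 1] = a[j]
            j -= 1
        a[j + 1] = temp                 # drop the key into its slot
    return a

print(insertion_sort([8, 2, 4, 9, 3, 6]))  # [2, 3, 4, 6, 8, 9]
```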
INSERTION-SORT(A)
1 for j ← 2 to length[A]
2     do key ← A[j]
3        ▷ Insert A[j] into the sorted sequence A[1 .. j-1].
4        i ← j-1
5        while i > 0 and A[i] > key
6            do A[i+1] ← A[i]
7               i ← i-1
8        A[i+1] ← key
[Figure: the current key is compared against the sorted prefix A[1 .. j-1] and inserted into its proper slot.]
Example of insertion sort on {8, 2, 4, 9, 3, 6}:
8 2 4 9 3 6
2 8 4 9 3 6   (insert 2)
2 4 8 9 3 6   (insert 4)
2 4 8 9 3 6   (9 already in place)
2 3 4 8 9 6   (insert 3)
2 3 4 6 8 9   (insert 6)
Cost/times entries for lines 6-8 of INSERTION-SORT:
6  do A[i+1] ← A[i]   cost c6   times Σ_{j=2..n} (t_j − 1)
7     i ← i-1         cost c7   times Σ_{j=2..n} (t_j − 1)
8  A[i+1] ← key       cost c8   times n − 1
Analysis of INSERTION-SORT
The total running time is
T(n) = c1·n + c2·(n−1) + c4·(n−1) + c5·Σ_{j=2..n} t_j + c6·Σ_{j=2..n} (t_j − 1) + c7·Σ_{j=2..n} (t_j − 1) + c8·(n−1).
• The best case: The array is already sorted. (tj =1 for j=2,3, ...,n)
T(n) = c1·n + c2·(n−1) + c4·(n−1) + c5·(n−1) + c8·(n−1)
     = (c1 + c2 + c4 + c5 + c8)·n − (c2 + c4 + c5 + c8).
• The worst case: The array is reverse sorted (tj =j for j=2,3, ...,n).
T(n) = c1·n + c2·(n−1) + c4·(n−1) + c5·(n(n+1)/2 − 1) + c6·(n(n−1)/2) + c7·(n(n−1)/2) + c8·(n−1)
     = (c5/2 + c6/2 + c7/2)·n² + (c1 + c2 + c4 + c5/2 − c6/2 − c7/2 + c8)·n − (c2 + c4 + c5 + c8)
T(n) = a·n² + b·n + c, i.e., O(n²).
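The best/worst-case values of t_j can be checked empirically by counting executions of the while-loop test (an illustrative sketch, not from the slides):

```python
def insertion_sort_comparisons(a):
    """Count the while-loop tests (the t_j values summed) during insertion sort."""
    a = list(a)
    tests = 0
    for i in range(1, len(a)):
        temp, j = a[i], i - 1
        while True:
            tests += 1                  # one execution of the while-test
            if j >= 0 and a[j] > temp:
                a[j + 1] = a[j]
                j -= 1
            else:
                break
        a[j + 1] = temp
    return tests

n = 6
print(insertion_sort_comparisons(range(n)))         # sorted input: each t_j = 1, total n-1 = 5
print(insertion_sort_comparisons(range(n, 0, -1)))  # reversed input: t_j = j, total n(n+1)/2 - 1 = 20
```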
3. Selection Sort
• Basic Idea: repeatedly select the smallest element of the unsorted part and swap it into the next position of the sorted prefix.
for I ← 0 to n-2
    Temp ← A[I]
    Location ← I
    for J ← I+1 to n-1
        if A[J] < A[Location] then Location ← J
    A[I] ← A[Location]
    A[Location] ← Temp
Trace the algorithm on {126, 43, 26, 1, 113}.
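The selection-sort loops above, sketched in Python (illustrative):

```python
def selection_sort(a):
    """In-place selection sort: swap the minimum of the unsorted suffix into place."""
    n = len(a)
    for i in range(n - 1):
        location = i                      # index of the smallest element seen so far
        for j in range(i + 1, n):
            if a[j] < a[location]:
                location = j
        a[i], a[location] = a[location], a[i]
    return a

print(selection_sort([126, 43, 26, 1, 113]))  # [1, 26, 43, 113, 126]
```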
Introduction (Conventional Matrix Multiplication Processing)
Example 1: Matrix-Vector Multiplication
• A x b = y
• Allocate tasks to the rows of A: y[i] = Σ_j A[i,j]·b[j]
• Dependencies: computing each element of y can be done independently.
• Speedup?
For matrix-matrix multiplication, A x B = C, each entry is a row-by-column inner product: A[i,:] · B[:,j] = C[i,j].
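Because each y[i] depends only on row i of A and on b, the rows can be computed in parallel. A minimal sketch using a thread pool (illustrative; names are not from the slides):

```python
from concurrent.futures import ThreadPoolExecutor

def matvec_row(A, b, i):
    """Compute one element of y = A x b; rows are independent of one another."""
    return sum(A[i][j] * b[j] for j in range(len(b)))

def matvec(A, b):
    """Map each row computation to a worker; no synchronization is needed."""
    with ThreadPoolExecutor() as pool:
        return list(pool.map(lambda i: matvec_row(A, b, i), range(len(A))))

print(matvec([[1, 2], [3, 4]], [5, 6]))  # [17, 39]
```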
Amdahl’s law: if a fraction f of the computation is unaffected by an enhancement and the rest is sped up by a factor p, the overall speedup is
s = 1 / (f + (1 − f)/p), which is bounded by min(p, 1/f).
[Figure: speedup s versus enhancement factor p for f = 0, 0.01, 0.02, 0.05, and 0.1; each curve with f > 0 saturates at 1/f.]
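The formula is easy to evaluate directly (an illustrative helper, not from the slides):

```python
def amdahl_speedup(f, p):
    """Amdahl's law: f is the unaffected (serial) fraction, p the enhancement factor."""
    return 1.0 / (f + (1.0 - f) / p)

print(amdahl_speedup(0.0, 10))   # 10.0 (no serial part: speedup equals p)
print(amdahl_speedup(0.1, 50))   # ~8.47, bounded above by 1/f = 10
```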
Introduction to Parallel Architectures and Algorithms
Types of Parallelism: the (Flynn and Johnson) Taxonomy
Flynn’s categories classify machines by single vs. multiple instruction streams and data streams; Johnson’s expansion further divides the multiple-stream class by global vs. distributed memory.
[Figure: taxonomy grid. Recoverable entries: SISD — uniprocessors; SIMD — array or vector processors; GMSV — shared-memory multiprocessors; GMMP — rarely used.]
✓ Common Parallel Architecture Models
Message Passing
• Mapping technique: sending and receiving messages over an interconnection network.
• The most widely used model for programming parallel computers (clusters of workstations).
• Communication primitives: send(buff, size, destination) and receive(buff, size, source); blocking vs. non-blocking, buffered vs. non-buffered.
• Key attributes: partitioned address space; explicit parallelization; process interactions by sending and receiving data.
• The Message Passing Interface (MPI) is a popular message-passing library (~125 functions).
Work Pool
• Mapping of work/data: no desired pre-mapping; any task may be performed by any process; tasks are mapped dynamically to processes.
• Computation: processes take work from an input queue as data becomes available (or requests arrive) and place results on an output queue.
• Synchronization: adding/removing work from the input queue.
• Example: a web server.
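The work-pool model can be sketched with a thread pool whose workers pull tasks dynamically (illustrative; the handler is a hypothetical stand-in for, e.g., serving a web request):

```python
from multiprocessing.pool import ThreadPool

def work_pool_demo(tasks, workers=4):
    """Work-pool sketch: tasks go to whichever worker is free, with no pre-mapping."""
    def handle(request):          # hypothetical request handler
        return request * request
    with ThreadPool(workers) as pool:
        return pool.map(handle, tasks)

print(work_pool_demo([1, 2, 3, 4, 5]))  # [1, 4, 9, 16, 25]
```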
o The speed at which sequential computers operate has been improving at an exponential rate for many years, but the improvement now comes at greater and greater cost.
o This motivates designing algorithms that specify multiple operations on each step, i.e., parallel algorithms.
o Example: computing the sum of a sequence A of n numbers.
o Example: computing the sum of a sequence A of n numbers.
o It is not difficult, however, to devise an algorithm for computing the sum that performs many operations in parallel. For example,
o suppose that, in parallel, each element of A with an even index is paired and summed with the next element of A, which has an odd index,
o A[0] is paired with A[1], A[2] with A[3], and so on.
o The result is a new sequence of ⌈n/2⌉ numbers that sum to the same value as the sum that we wish to compute.
o This pairing and summing step can be repeated until, after ⌈log2 n⌉ steps, a sequence consisting of a single value is produced, and this value
is equal to the final sum.
o It is important to make a distinction between the parallelism in an algorithm and the ability of any particular computer to perform multiple
operations in parallel.
o In order for a parallel algorithm to run efficiently on any type of computer, the algorithm must contain at least as much parallelism as the
computer.
o The converse does not always hold: some parallel computers cannot efficiently execute all algorithms, even if the algorithms contain a great
deal of parallelism.
o Experience has shown that it is more difficult to build a general-purpose parallel machine than a general-purpose sequential machine.
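The pairing-and-summing scheme described above can be sketched as follows (illustrative; each list comprehension stands for one parallel round):

```python
def pairwise_sum(a):
    """Tree summation: each round pairs adjacent elements, halving the sequence."""
    a = list(a)
    rounds = 0
    while len(a) > 1:
        # In parallel: each even-indexed element is summed with its odd-indexed neighbour;
        # a leftover element (odd length) passes through unchanged.
        a = [a[i] + a[i + 1] if i + 1 < len(a) else a[i] for i in range(0, len(a), 2)]
        rounds += 1
    return a[0], rounds

print(pairwise_sum(range(1, 9)))  # (36, 3): the sum of 1..8 in log2(8) = 3 rounds
```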
[Figure: pairwise summation tree — 8 additions in the first round, then 4, 2, and 1.]
[Figure: memory modules M1, M2, ..., Mn connected to processors P1, P2, ..., Pn.]
▪ In all three types of models, there may be differences in the operations that the processors and networks are
allowed to perform.
✓ Network topology
Common topologies include the bus, mesh, hypercube, and multistage network.
[Figure: (a) a bus; (b) a mesh of switches numbered 1-12; (c) a hypercube with nodes labeled 000-111; (d) a multistage network.]
The Bus:
The simplest network topology is a bus.
This network can be used in both local memory machine models and modular memory machine models. In either
case, all processors and memory modules are typically connected to a single bus. In each step, at most one piece of
data can be written onto the bus. This data might be a request from a processor to read or write a memory value, or
it might be the response from the processor or memory module that holds the value.
In practice, the advantage of using a bus is that it is simple to build and, because all processors and memory modules
can observe the traffic on the bus, it is relatively easy to develop protocols that allow processors to cache memory
values locally.
The disadvantage of using a bus is that the processors have to take turns accessing the bus. Hence, as more
processors are added to a bus, the average time to perform a memory access grows proportionately.
Mesh Topology
Several variations on meshes are also popular, including 3-dimensional meshes, toruses, and hypercubes. A torus is
a mesh in which the switches on the sides have connections to the switches on the opposite sides. Thus, every switch
(x, y) is connected to four other switches: (x, y+1 mod Y), (x, y−1 mod Y), (x+1 mod X, y), and (x−1 mod X, y). The figure
shows an example of a 2-dimensional mesh.
Multistage network
A multistage network is used to connect one set of switches called the input switches to another set called the output
switches through a sequence of stages of switches.
The stages of a multistage network are numbered 1 through L, where L is the depth of the network. The switches on
stage 1 are the input switches, and those on stage L are the output switches. In most multistage networks, it is possible
to send a message from any input switch to any output switch along a path that traverses the stages of the network in
order from 1 to L.
Multistage networks are frequently used in modular memory computers; typically, processors are attached to input
switches, and memory modules to output switches.
A processor accesses a word of memory by injecting a memory access request message into the network.
This message then travels through the network to the appropriate memory module.
If the request is to read a word of memory, then the memory module sends the data back through the network to the
requesting processor.
Routing of Networks
An alternative to modeling the topology of a network is to summarize its routing capabilities in terms of two
parameters, its latency and bandwidth.
The latency, L, of a network is the time it takes for a message to traverse the network. In actual networks this will
depend on the topology of the network, which particular ports the message is passing between, and the congestion of
messages in the network. The latency, is often modeled by considering the worst-case time assuming that the network
is not heavily congested.
The bandwidth at each port of the network is the rate at which a processor can inject data into the network. In actual
networks this will depend on the topology of the network, the bandwidths of the network’s individual communication
channels, and, again, the congestion of messages in the network. The bandwidth can often be usefully modeled as the
maximum rate at which processors can inject messages into the network without causing it to become heavily
congested, assuming a uniform distribution of message destinations.
✓ Primitive operations
We assume that all processors are allowed to perform the same local instructions as the single processor in the standard
sequential RAM model. (This issue will be discussed in detail in the Abstract Model Module.)
Work-depth models (focusing on the algorithm not the machine)
In a work-depth model, the cost of an algorithm is determined by examining the total number of operations that it
performs, and the dependencies among those operations.
An algorithm’s work W is the total number of operations that it performs; its depth D is the longest chain of
dependencies among its operations.
We call the ratio P = W/D the parallelism of the algorithm.
The advantage of using a work-depth model is that there are no machine-dependent details to complicate the
design and analysis of algorithms.
Figure: Summing 16 numbers on a tree. The total depth (longest chain of dependencies) is 4 and the total
work (number of operations) is 15. For this family of circuits, W(n) = n − 1 and D(n) = log2 n.
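The W(n) = n − 1 and D(n) = log2 n claims can be checked with a small counter that simulates the rounds of the summation tree (an illustrative sketch, not from the slides):

```python
def summation_work_depth(n):
    """Count the additions (work W) and parallel rounds (depth D) of tree summation."""
    work, depth, m = 0, 0, n
    while m > 1:
        pairs = m // 2      # additions performed in parallel this round
        work += pairs
        m -= pairs          # pair results, plus a possible leftover element
        depth += 1
    return work, depth

print(summation_work_depth(16))  # (15, 4): W = 16 - 1, D = log2(16)
```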