Parallel Algorithms
TOPICS:
Sorting
Merging
List ranking in PRAMs
Applications
What is a parallel approach?
Imagine you needed to find a lost
child in the woods. Even in a small
area, searching by yourself would be
very time consuming. But if you
gathered some friends and family to
help you, you could cover the woods
much faster…
Parallel Architectures
Single Instruction Stream, Multiple
Data Stream (SIMD)
One global control unit connected
to each processor
Multiple Instruction Stream,
Multiple Data Stream (MIMD)
Each processor has a local control
unit
Parallel Architectures
Shared-Address-Space
Each processor has access to main memory
Processors may be given a small private memory for local
variables
Message-Passing
Each processor is given its own block of memory
Processors communicate by passing messages directly
instead of modifying memory locations
Interconnection Networks
Static
Processors are hard-wired to one another
in a fixed topology
Examples: completely connected,
star-connected, bounded-degree (degree 4)
Dynamic
Processors are connected to a series of
switches
Parallel Algorithms
A parallel algorithm is an algorithm that has been
specifically written for execution on a computer with
two or more processing units.
Can be run on computers with single
processor
(multiple functional units, pipelined
functional units, pipelined memory
systems).
When designing a parallel algorithm, take
into account the cost of communication and
the number of processors (efficiency)
Parallel Algorithms
Parallel: perform more than one operation at a time.
PRAM model: Parallel Random Access Machine.
Multiple processors p0, p1, …, pn-1 connected to a
shared memory. Each processor can access any
memory location in unit time. All processors can
access memory in parallel, and all processors
can perform operations in parallel.
The PRAM Model
Parallel Random Access Machine
Number of processors is not limited
All processors have local memory
One global memory accessible to all
processors
Processors communicate by reading and
writing the global memory
The PRAM Model
Parallel Random Access Machine
Theoretical model for parallel
machines
P processors with uniform access to a
large memory bank
MIMD
UMA (uniform memory access) – Equal
memory access time for any processor
to any address
Memory access protocols
Exclusive-Read Exclusive-Write (EREW)
Exclusive-Read Concurrent-Write (ERCW)
Concurrent-Read Exclusive-Write (CREW)
Concurrent-Read Concurrent-Write (CRCW)
Example: Finding the Largest key in an array
In order to find the largest key in an
array of size n, at least n-1
comparisons must be done.
A parallel version of this algorithm
still performs the same number of
comparisons, but by doing them in
parallel it finishes sooner.
Example: Finding the Largest key in an array
Assume that n is a power of 2 and
we have n/2 processors executing
the algorithm in parallel.
Each Processor reads two array
elements into local variables called
first and second
It then writes the larger value into
the first of the array slots that it
has to read.
It takes log2(n) steps for the largest
key to be placed into the first slot
of the array.
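The procedure just described can be sketched as follows. This is a minimal sequential simulation (an assumption of this sketch: the inner loop stands in for the n/2 processors, so each pass of the while loop corresponds to one parallel step):

```python
def parallel_max(a):
    """Largest key of a (length a power of 2) in log2(n) parallel steps."""
    a = list(a)                        # work on a copy of the array
    gap = 1                            # distance between 'first' and 'second'
    while gap < len(a):
        # one PRAM step: every iteration below runs on its own processor
        for i in range(0, len(a), 2 * gap):
            first, second = a[i], a[i + gap]
            a[i] = max(first, second)  # write the larger key into the first slot
        gap *= 2
    return a[0]                        # largest key ends up in the first slot
```

For the array used on the next slides, `parallel_max([3, 12, 7, 6, 8, 11, 19, 13])` returns 19 after log2(8) = 3 passes.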
Example: Finding the Largest key in an array
Array: 3 12 7 6 8 11 19 13
Read (each processor loads a pair into local variables first and second):
P1: first 3, second 12   P2: first 7, second 6   P3: first 8, second 11   P4: first 19, second 13
Write (larger value into the first slot):
12 12 7 6 11 11 19 13
Example: Finding the Largest key in an array
Array: 12 12 7 6 11 11 19 13
Read: P1: first 12, second 7   P3: first 11, second 19
Write:
12 12 7 6 19 11 19 13
Example: Finding the Largest key in an array
Array: 12 12 7 6 19 11 19 13
Read: P1: first 12, second 19
Write:
19 12 7 6 19 11 19 13
The largest key, 19, is now in the first slot.
Example: Merge Sort
8 1 4 5 2 7 3 6
P1 P2 P3 P4
1 8 4 5 2 7 3 6
P1 P2
1 4 5 8 2 3 6 7
P1
1 2 3 4 5 6 7 8
Merge Sort Analysis
Number of comparisons (executed as parallel steps)
= 1 + 3 + … + (2^i − 1) + … + (n − 1)
= Σ i=1..log2(n) (2^i − 1) < 2n − log2(n) = Θ(n)
We have improved from Θ(n log2(n))
to Θ(n) simply by applying the old
algorithm to parallel computing;
by altering the algorithm we can
further improve merge sort to
Θ((log2(n))^2)
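The Θ(n) bound can be checked with a small simulation. The sketch below (an illustration, not the slides' PRAM program) performs the merges level by level; every merge within a level is independent and could run on its own processor, so the parallel time is dominated by the final merge:

```python
from heapq import merge  # standard-library merge of sorted iterables

def parallel_merge_sort(a):
    """Level-by-level merge sort; each level is one parallel round."""
    runs = [[x] for x in a]      # level 0: n sorted runs of length 1
    while len(runs) > 1:
        nxt = []
        for i in range(0, len(runs) - 1, 2):   # each pair -> one processor
            nxt.append(list(merge(runs[i], runs[i + 1])))
        if len(runs) % 2:        # odd run out: carried up unmerged
            nxt.append(runs[-1])
        runs = nxt
    return runs[0]
```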
O(1) Sorting Algorithm
We assume a CRCW PRAM where concurrent
writes to the same location are combined by addition:
for(int i=1; i<=n; i++)
{
  for(int j=1; j<=n; j++)   // all n*n iterations run in parallel, one per processor Pij
  {
    if(X[i] > X[j])
      Processor Pij stores 1 in memory location m[i]
    else
      Processor Pij stores 0 in memory location m[i]
  }
}
All n^2 processors run in one step; the combined writes leave the rank of X[i] in m[i].
O(1) Sorting Algorithm
Unsorted list: {4, 2, 1, 3}
P11(4,4) P12(4,2) P13(4,1) P14(4,3): 0+1+1+1 = 3
P21(2,4) P22(2,2) P23(2,1) P24(2,3): 0+0+1+0 = 1
P31(1,4) P32(1,2) P33(1,1) P34(1,3): 0+0+0+0 = 0
P41(3,4) P42(3,2) P43(3,1) P44(3,3): 0+1+1+0 = 2
Each key's sum is its rank, i.e. its final position: {1, 2, 3, 4}
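The example can be reproduced with the sketch below, which simulates the n² processors with a double loop (0-indexed, and assuming distinct keys); the `+=` stands in for the combining concurrent write:

```python
def crcw_rank_sort(x):
    """Simulate the O(1) CRCW sort: m[i] collects the summed concurrent
    writes of processors Pij, which equals the rank of x[i]."""
    n = len(x)
    m = [0] * n
    for i in range(n):
        for j in range(n):       # each (i, j) pair is one processor Pij
            if x[i] > x[j]:
                m[i] += 1        # concurrent writes combined by addition
    out = [0] * n
    for i in range(n):
        out[m[i]] = x[i]         # rank = final position (keys distinct)
    return out
```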
Applications
Computer Graphics Processing
Video Encoding
Accurate weather forecasting
Potential Speedup
O(n logn) optimal sequential sorting algorithm
Best we can expect based upon a sequential sorting algorithm
using n processors is:
optimal parallel time complexity = O((n log n) / n) = O(log n)
Odd-Even Transposition Sort - example
Parallel time complexity: Tpar = O(n) (for P=n)
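Since the example figure did not survive extraction, here is a sketch of the algorithm itself. The inner loop simulates the processors of one phase (all compare-exchanges within a phase are independent); starting with the odd phase is a convention assumed in this sketch:

```python
def odd_even_transposition_sort(a):
    """n phases of alternating compare-exchanges; one phase = one
    parallel step, giving Tpar = O(n) with P = n processors."""
    a = list(a)
    n = len(a)
    for phase in range(n):
        start = 1 if phase % 2 == 0 else 0     # odd phase, then even phase
        for i in range(start, n - 1, 2):       # independent pairs, in parallel
            if a[i] > a[i + 1]:
                a[i], a[i + 1] = a[i + 1], a[i]
    return a
```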
Odd-Even Transposition Sort – Example (N >> P)
Each PE gets n/p numbers. First, PEs sort n/p locally, then they run
odd-even trans. algorithm each time doing a merge-split for 2n/p
numbers.
P0 | P1 | P2 | P3
13 7 12 8 5 4 6 1 3 9 2 10
Local sort
7 12 13 4 5 8 1 3 6 2 9 10
O-E
4 5 7 8 12 13 1 2 3 6 9 10
E-O
4 5 7 1 2 3 8 12 13 6 9 10
O-E
1 2 3 4 5 7 6 8 9 10 12 13
E-O
SORTED: 1 2 3 4 5 6 7 8 9 10 12 13
Time complexity: Tpar = (Local Sort) + (p merge-splits) +(p exchanges)
Tpar = (n/p)log(n/p) + p*(n/p) + p*(n/p) = (n/p)log(n/p) + 2n
Parallelizing Merge sort
Merge sort - Time complexity
Sequential:
Each of the log2(n) merge levels costs O(n) in total
(n/2 merges of 1-element lists, n/4 merges of 2-element lists, …,
one merge of two n/2-element lists):
Tseq = n + n + … + n  (log2(n) times)
Tseq = O(n log n)
Parallel:
With one processor per merge, step i produces lists of length 2^i in at most
2 · 2^i comparisons, and all merges of a step run in parallel:
Tpar = 2·2 + 2·4 + … + 2·n = 2n(2^0 + 2^-1 + … + 2^-(log n - 1)) < 4n
Tpar = O(4n) = O(n)
Bitonic Merge sort
Bitonic Sequence
A bitonic sequence is defined as a list with no more than one
LOCAL MAXIMUM and no more than one LOCAL
MINIMUM. (Endpoints must be considered with wraparound.)
Binary Split
1. Divide the bitonic list into two equal halves.
2. Compare-Exchange each item on the first half
with the corresponding item in the second half.
Result:
Two bitonic sequences where the numbers in one sequence are all less
than the numbers in the other sequence.
Repeated application of binary split
Bitonic list:
24 20 15 9 4 2 5 8 | 10 11 12 13 22 30 32 45
Result after Binary-split:
10 11 12 9 4 2 5 8 | 24 20 15 13 22 30 32 45
If you keep applying the BINARY-SPLIT to each half repeatedly, you
will get a SORTED LIST!
10 11 12 9 . 4 2 5 8 | 24 20 15 13 . 22 30 32 45
4 2 . 5 8 10 11 . 12 9 | 22 20 . 15 13 24 30 . 32 45
4 . 2 5 . 8 10 . 9 12 . 11 15 . 13 22 . 20 24 . 30 32 . 45
2 4 5 8 9 10 11 12 13 15 20 22 24 30 32 45
Sorting a bitonic sequence
Compare-and-exchange moves smaller numbers of each pair to left
and larger numbers of pair to right.
Given a bitonic sequence,
recursively performing ‘binary split’ will sort the list.
Sorting an arbitrary sequence
To sort an unordered sequence, sequences are merged into larger
bitonic sequences, starting with pairs of adjacent numbers.
By a compare-and-exchange operation, pairs of adjacent numbers
are formed into increasing sequences and decreasing sequences;
each pair then forms a bitonic sequence of twice the size of the
original sequences.
By repeating this process, bitonic sequences of larger and larger
lengths are obtained.
In the final step, a single bitonic sequence is sorted into a single
increasing sequence.
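The whole scheme can be sketched as follows: `binary_split` is the compare-exchange pass from the earlier slides, and the recursive calls stand in for the parallel stages of the sorting network (power-of-two length assumed):

```python
def binary_split(a, ascending=True):
    """One compare-exchange pass over a bitonic list (in place)."""
    half = len(a) // 2
    for i in range(half):                     # all pairs are independent
        if (a[i] > a[i + half]) == ascending:
            a[i], a[i + half] = a[i + half], a[i]

def bitonic_merge(a, ascending=True):
    """Sort a bitonic list by repeated binary splits (in place)."""
    if len(a) <= 1:
        return
    binary_split(a, ascending)
    half = len(a) // 2
    left, right = a[:half], a[half:]
    bitonic_merge(left, ascending)
    bitonic_merge(right, ascending)
    a[:half], a[half:] = left, right

def bitonic_sort(a, ascending=True):
    """Sort an arbitrary list of power-of-two length."""
    if len(a) <= 1:
        return list(a)
    # build a bitonic sequence: increasing first half, decreasing second half
    merged = (bitonic_sort(a[: len(a) // 2], True)
              + bitonic_sort(a[len(a) // 2:], False))
    bitonic_merge(merged, ascending)
    return merged
```

Applying `bitonic_merge` to the slides' bitonic list (24 20 15 9 4 2 5 8 | 10 11 12 13 22 30 32 45) performs exactly the repeated binary splits traced above.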
Number of steps (P=n)
In general, with n = 2^k, there are k phases, each of 1, 2, 3,
…, k steps. Hence the total number of steps is:
Tpar(bitonic) = Σ i=1..log n of i = log n (log n + 1) / 2 = O(log^2 n)
Bitonic sort (for N >> P)
[Figure: bitonic sort network for N >> P]
Parallel sorting - summary
Computational time complexity using P=n processors
• Odd-even transposition sort - O(n)
• Parallel merge sort - O(n); unbalanced processor load and communication
• Bitonic merge sort - O(log^2 n) (** BEST! **)
Pointer Jumping –list ranking
Given a single linked list L with n objects, compute, for each
object in L, its distance from the end of the list.
Formally: suppose next is the pointer field:
d[i] = 0                if next[i] = nil
d[i] = d[next[i]] + 1   if next[i] ≠ nil
Serial algorithm: Θ(n).
List ranking –EREW algorithm
LIST-RANK(L)  (in O(log2 n) time)
1. for each processor i, in parallel
2.   do if next[i] = nil
3.     then d[i] ← 0
4.     else d[i] ← 1
5. while there exists an object i such that next[i] ≠ nil
6.   do for each processor i, in parallel
7.     do if next[i] ≠ nil
8.       then d[i] ← d[i] + d[next[i]]
9.            next[i] ← next[next[i]]
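LIST-RANK can be simulated as below. The list is represented as an array of successor indices with None for nil (a representation chosen for this sketch); computing all new d and next values before installing them models the synchronous PRAM step where the reads precede the writes:

```python
def list_rank(next_ptr):
    """EREW pointer jumping: returns d[i] = distance of object i
    from the end of the list, in O(log n) simulated parallel steps."""
    n = len(next_ptr)
    nxt = list(next_ptr)
    d = [0 if nxt[i] is None else 1 for i in range(n)]
    while any(p is not None for p in nxt):
        # all processors read first (synchronously), then write
        d = [d[i] if nxt[i] is None else d[i] + d[nxt[i]] for i in range(n)]
        nxt = [None if nxt[i] is None else nxt[nxt[i]] for i in range(n)]
    return d
```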
List-ranking –EREW algorithm
List: 3 → 4 → 6 → 1 → 0 → 5
(a) d: 1 1 1 1 1 0
(b) d: 2 2 2 2 1 0
(c) d: 4 4 3 2 1 0
(d) d: 5 4 3 2 1 0
List ranking –correctness of EREW algorithm
Loop invariant: for each i, the sum of d
values in the sub list headed by i is the
correct distance from i to the end of the
original list L.
Parallel memory must be synchronized:
the reads on the right-hand side must occur
before the writes on the left-hand side.
Moreover, d[i] is read first, and then d[next[i]].
An EREW algorithm: every read and write
is exclusive. For an object i, its own
processor reads d[i], and then the processor
of its predecessor reads d[i].
Writes are all to distinct locations.
LIST ranking EREW algorithm running time
O(log2 n):
The initialization for loop runs in O(1).
Each iteration of while loop runs in O(1).
There are exactly ⌈log2 n⌉ iterations:
each iteration transforms each list into two
interleaved lists: one consisting of the objects
in even positions, and the other of the objects in odd
positions. Thus, each iteration doubles the
number of lists but halves their lengths.
The termination test in line 5 runs in O(1).
Define work = #processors × running time.
Work = O(n log2 n).
Sorting on Specific Networks
• Two network structures have received special attention:
mesh and hypercube
Parallel computers have been built with these networks.
• However, it is of less interest nowadays because networks got
faster and clusters became a viable option.
• Besides, network architecture is often hidden from the user.
• MPI provides libraries for mapping algorithms onto meshes,
and one can always use a mesh or hypercube algorithm even
if the underlying architecture is not one of them.
Shear-sort
Alternate row and column sorting until list is fully sorted.
Alternate row directions to get snake-like sorting:
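A sequential sketch of shear-sort on an r × c grid: each row sort and each column sort stands for one parallel phase, and the round count ⌈log₂ r⌉ + 1 with a final row pass follows the standard analysis (an assumption of this sketch; the slides give no pseudocode):

```python
import math

def shear_sort(grid):
    """Shear sort an r x c grid into snake-like order: even rows
    ascend left-to-right, odd rows descend, columns sort downward."""
    r = len(grid)
    rounds = math.ceil(math.log2(r)) + 1 if r > 1 else 1
    for _ in range(rounds):
        for i in range(r):                    # row phase (alternating direction)
            grid[i].sort(reverse=(i % 2 == 1))
        for j in range(len(grid[0])):         # column phase (top-to-bottom)
            col = sorted(grid[i][j] for i in range(r))
            for i in range(r):
                grid[i][j] = col[i]
    for i in range(r):                        # final row phase
        grid[i].sort(reverse=(i % 2 == 1))
    return grid
```

Reading the result row by row, reversing every odd row, yields the fully sorted sequence.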