
Lectures on Parallel

The document discusses various sorting algorithms designed for different parallel computing models, specifically focusing on CRCW, CREW, and EREW models. It presents algorithms that aim to optimize processor usage, running time, and cost while addressing write conflicts and adapting to the number of processors. The analysis includes examples and performance metrics for each algorithm, demonstrating their efficiency and optimality under certain conditions.

Uploaded by

Shubam gupta


SORTING ON THE CREW MODEL

We attempt to deal with two of the objections raised with regard to procedure CRCW SORT: its excessive use of processors and its tolerance of write conflicts.
Our purpose is to design an algorithm that is free of write conflicts and
uses a reasonable number of processors. In addition, we shall require the
algorithm to also satisfy our usual desired properties for shared-memory
SIMD algorithms.
Thus the algorithm should have
(i) a sublinear and adaptive number of processors,
(ii) a running time that is small and adaptive, and
(iii) a cost that is optimal.
Basic Idea
The idea is quite simple. Assume that a CREW SM SIMD computer with N processors P1, P2, ..., PN is to be used to sort the sequence S = {s1, s2, ..., sn}, where N ≤ n.

We begin by distributing the elements of S evenly among the N processors. Each processor sorts its allocated subsequence sequentially using procedure QUICKSORT.

The N sorted subsequences are now merged pairwise, simultaneously, using procedure CREW MERGE for each pair.

The resulting subsequences are again merged pairwise, and the process continues until one sorted sequence of length n is obtained.
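The strategy just described can be simulated sequentially as a rough check of its structure. This is a minimal sketch, not the procedure itself: the function name `crew_sort` is hypothetical, Python's `sorted()` stands in for each processor's QUICKSORT, and `heapq.merge` stands in for one parallel CREW MERGE. Only the shape of the computation (N sorted runs followed by repeated pairwise merging) is modeled, not the parallel timing.

```python
import heapq
from typing import List, Tuple


def crew_sort(s: List[int], n_procs: int) -> Tuple[List[int], int]:
    """Sequentially simulate the CREW SORT strategy (sketch).

    Returns the sorted list and the number of pairwise-merge rounds.
    sorted() stands in for each processor's QUICKSORT, and heapq.merge
    stands in for a parallel CREW MERGE of one pair of runs.
    """
    n = len(s)
    if n == 0:
        return [], 0
    n_procs = max(1, min(n_procs, n))
    # Step 1: P_i receives a contiguous block of roughly n/N elements
    # and sorts it on its own.
    cuts = [i * n // n_procs for i in range(n_procs + 1)]
    runs = [sorted(s[cuts[i]:cuts[i + 1]]) for i in range(n_procs)]
    # Step 2: merge runs pairwise until one remains; an unpaired last
    # run is carried into the next round unchanged.
    rounds = 0
    while len(runs) > 1:
        merged = [list(heapq.merge(runs[m], runs[m + 1]))
                  for m in range(0, len(runs) - 1, 2)]
        if len(runs) % 2 == 1:
            merged.append(runs[-1])
        runs = merged
        rounds += 1
    return runs[0], rounds
```

The returned round count is ⌈log N⌉; with N = 5 runs the merge tree has three levels (5 → 3 → 2 → 1 runs).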

The algorithm is given in what follows as procedure CREW SORT.

We denote the initial subsequence of S allocated to processor Pi by Si. Subsequently, S_k^j is used to denote a subsequence obtained by merging two subsequences, and P_k^j the set of processors that performed the merge.
Example:

Analysis. The dominating operation in step 1 is the call to QUICKSORT, which requires O((n/N)log(n/N)) time. During each iteration of step 2.3, where v is the number of subsequences remaining at the start of the iteration, ⌊v/2⌋ pairs of subsequences with n/⌊v/2⌋ elements per pair are to be merged simultaneously using N/⌊v/2⌋ processors per pair. Procedure CREW MERGE thus requires O([(n/⌊v/2⌋)/(N/⌊v/2⌋)] + log(n/⌊v/2⌋)), that is, O((n/N) + log n) time. Since step 2.3 is iterated ⌈log N⌉ times, the total running time of procedure CREW SORT is

$$t(n) = O\left(\frac{n}{N}\log\frac{n}{N}\right) + O\left(\frac{n}{N}\log N + \log n \log N\right) = O\left(\frac{n}{N}\log n + \log^2 n\right).$$

Since p(n) = N, the procedure's cost is given by

$$c(n) = p(n) \times t(n) = O(n \log n + N \log^2 n),$$

which is optimal for N ≤ n/log n.

SORTING ON THE EREW MODEL

Two of the criticisms expressed with regard to procedure CRCW SORT were addressed by procedure CREW SORT, which adapts to the number of existing processors and disallows multiple-write operations into the same memory location.

Still, procedure CREW SORT tolerates multiple-read operations. Our purpose in this section is to deal with this third difficulty.
Basic Idea

The idea is to adapt the sequential procedure QUICKSORT to run on a parallel computer. We begin by noting that, since N < n, we can write N = n^{1−x}, where 0 < x < 1.

Let k = 2^⌈1/x⌉. The algorithm is given as procedure EREW SORT:

Example:
Let S = {5, 9, 12, 16, 18, 2, 6, 13, 17, 4, 7, 18, 18, 11, 3, 17, 20, 19, 14, 8, 5, 17, 1, 11, 15, 10, 10} (i.e., n = 27), and let five processors P1, P2, P3, P4, P5 be available on an EREW SM SIMD computer (i.e., N = 5).
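The first level of the recursion on this input can be computed directly. The sketch below assumes k = 2^⌈1/x⌉, takes the i-th splitter m_i to be the ⌈i·n/k⌉-th smallest element, and uses `sorted()` in place of procedure PARALLEL SELECT. The function name `erew_first_split` and the rule for placing elements equal to a splitter are illustrative assumptions, not taken from the procedure.

```python
import math
from typing import List, Tuple


def erew_first_split(s: List[int], n_procs: int) -> Tuple[int, List[int], List[List[int]]]:
    """Compute k, the splitters and the first-level partition (sketch)."""
    n = len(s)
    # N = n^(1 - x)  =>  x = 1 - log N / log n
    x = 1 - math.log(n_procs) / math.log(n)
    k = 2 ** math.ceil(1 / x)
    # Assumed: m_i is the ceil(i*n/k)-th smallest element; sorted() stands
    # in for procedure PARALLEL SELECT.
    ordered = sorted(s)
    m = [ordered[(i * n + k - 1) // k - 1] for i in range(1, k)]
    # Assumed tie rule: S_1 = {s <= m_1}, S_i = {m_(i-1) < s <= m_i},
    # S_k = {s > m_(k-1)}, so every element lands in exactly one part.
    lows = [-math.inf] + m
    highs = m + [math.inf]
    parts = [[v for v in s if lows[i] < v <= highs[i]] for i in range(k)]
    return k, m, parts
```

With N = 5 and n = 27, x = 1 − log 5/log 27 ≈ 0.51, so k = 4. The splitters come out as 6, 11 and 17, and the four subsequences have 7, 7, 8 and 5 elements, each holding only values smaller than those of the next.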

Fig: Procedure EREW SORT applied to S (the final recursive calls sort S3 and S4).
Analysis. The call to QUICKSORT takes constant time, since it is applied only to subsequences of at most k elements and k is a constant. From the analysis of procedure PARALLEL SELECT, steps 1-4 require cn^x time units for some constant c. The running time of procedure EREW SORT is therefore

$$t(n) = cn^x + 2t(n/k) = O(n^x \log n).$$

Since p(n) = n^{1−x}, the procedure's cost is given by

$$c(n) = p(n) \times t(n) = O(n \log n),$$

which is optimal. Note, however, that since n^{1−x} < n/log n, cost optimality is restricted to the range N < n/log n.
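The recurrence for t(n) can be unrolled to see where the log n factor comes from. This is a sketch that assumes k = 2^⌈1/x⌉, so that k^x = 2^{x⌈1/x⌉} ≥ 2, and that the recursion bottoms out at subsequences of constant size:

$$
\begin{aligned}
t(n) &= \sum_{j=0}^{\log_k n - 1} 2^j\, c\left(\frac{n}{k^j}\right)^{x} + 2^{\log_k n}\, t(1)
      = cn^x \sum_{j=0}^{\log_k n - 1} \left(\frac{2}{k^x}\right)^{j} + n^{\log_k 2}\, t(1) \\
     &\le cn^x \log_k n + n^{x}\, t(1) = O(n^x \log n),
\end{aligned}
$$

where the last line uses 2/k^x ≤ 1 and log_k 2 = 1/⌈1/x⌉ ≤ x.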

SORTING ON THE CRCW MODEL

Whenever an algorithm is to be designed for the CRCW model of
computation, one must specify how write conflicts, that is, multiple
attempts to write into the same memory location, can be resolved. For the
purposes of the sorting algorithm to be described, we shall assume that
write conflicts are created whenever several processors attempt to write
potentially different integers into the same address. The conflict is
resolved by storing the sum of these integers in that address.

Basic Idea

Assume that n² processors are available on such a CRCW computer to sort the sequence S = {s1, s2, ..., sn}.

The sorting algorithm to be used is based on the idea of sorting by enumeration: the position of each element si of S in the sorted sequence is determined by computing ci, the number of elements smaller than it.

If two elements si and sj are equal, then si is taken to be the larger of the two if i > j; otherwise sj is the larger.

Once all the ci have been computed, si is placed in position 1 + ci of the sorted sequence.
We assume that the processors are arranged into n rows of n processors each and are numbered P(i, j), where 1 ≤ i, j ≤ n.

The shared memory contains two arrays: The input sequence is stored in
array S, while the counts ci are stored in array C. The sorted sequence is
returned in array S.

The ith row of processors is "in charge" of element si: processors P(i, 1), P(i, 2), ..., P(i, n) compute ci and store si in position 1 + ci of S.

The algorithm is given as procedure CRCW SORT:

Example:

Let S = {5, 2, 4, 5}.

Fig: The two elements of S that each of the 16 processors compares, and the contents of arrays S and C after each step of procedure CRCW SORT.
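The two steps of procedure CRCW SORT can be mimicked sequentially on this example. In the sketch below (the function name `crcw_sort` is hypothetical), the double loop plays the role of the n² processors, and adding into `c[i]` models the stated rule that conflicting writes to the same address are summed.

```python
from typing import List, Tuple


def crcw_sort(s: List[int]) -> Tuple[List[int], List[int]]:
    """Sequentially simulate CRCW SORT; returns (sorted list, counts C)."""
    n = len(s)
    # Step 1: processor P(i, j) writes 1 into C[i] when s_j counts as
    # smaller than s_i: s_j < s_i, or s_j == s_i and j < i (tie rule).
    # On the CRCW model all n*n writes happen in one step and conflicting
    # writes to C[i] are summed; the += below models that sum.
    c = [0] * n
    for i in range(n):
        for j in range(n):
            if s[j] < s[i] or (s[j] == s[i] and j < i):
                c[i] += 1
    # Step 2: row i stores s_i at position 1 + c_i (index c_i, 0-based).
    out = [0] * n
    for i in range(n):
        out[c[i]] = s[i]
    return out, c
```

For S = {5, 2, 4, 5} the counts are C = {2, 0, 1, 3}, so the first 5 goes to position 3 and the second to position 4, giving {2, 4, 5, 5}.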
(ii) the write conflict resolution process is itself very powerful: all numbers to be stored in a memory location are added and stored in constant time; and
3. uses a very large number of processors; that is, the number of processors grows quadratically with the size of the input.
For these reasons, particularly the last one, the algorithm is unlikely to be of great practical value. Nevertheless, procedure CRCW SORT is interesting in its own right: it demonstrates how sorting can be accomplished in constant time on the CRCW model.

MERGING ON THE CREW MODEL

A CREW SM SIMD computer consists of N processors P1, P2, ..., PN. It is required to design a parallel algorithm for this computer that takes the two sorted sequences A = {a1, a2, ..., ar} and B = {b1, b2, ..., bs} as input and produces their merged sequence C as output, as defined earlier. Without loss of generality, we assume that r ≤ s.

It is desired that the parallel algorithm satisfy the properties:

(i) the number of processors used by the algorithm be sublinear and
adaptive,
(ii) the running time of the algorithm be adaptive and significantly smaller
than the best sequential algorithm, and
(iii) the cost be optimal.

We now describe an algorithm that satisfies these properties. It uses N processors, where N ≤ r, and in the worst case, when r = s = n, it runs in O((n/N) + log n) time. The algorithm is therefore cost optimal for N ≤ n/log n. In addition to the basic arithmetic and logic functions usually available, each of the N processors is assumed capable of performing the following two sequential procedures:
1. Procedure SEQUENTIAL MERGE
2. Procedure BINARY SEARCH

Basic Idea
Procedure BINARY SEARCH takes as input a sequence S = {s1, s2, ..., sn} of numbers sorted in nondecreasing order and a number x. If x belongs to S, the procedure returns the index k of an element sk in S such that x = sk. Otherwise, the procedure returns a zero. Binary search is based on the divide-and-conquer principle. At each stage, a comparison is performed between x and an element of S. Either the two are equal and the procedure terminates, or half of the elements of the sequence under consideration are discarded. The process continues until the number of elements left is 0 or 1, and after at most one additional comparison the procedure terminates.
Since the number of elements under consideration is reduced by one-half at each step, the procedure requires O(log n) time in the worst case.
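A direct Python rendering of the procedure, following the conventions stated above (1-based index on success, 0 on failure), might look as follows. The function name `binary_search` is illustrative; in practice Python's standard `bisect` module provides the same O(log n) search.

```python
from typing import Sequence


def binary_search(s: Sequence[int], x: int) -> int:
    """Return a 1-based index k with s_k == x, or 0 if x is not in s.

    s must be sorted in nondecreasing order.
    """
    lo, hi = 1, len(s)  # current search range, 1-based and inclusive
    while lo <= hi:
        mid = (lo + hi) // 2
        if s[mid - 1] == x:
            return mid
        if s[mid - 1] < x:
            lo = mid + 1  # x can only be in the upper half
        else:
            hi = mid - 1  # x can only be in the lower half
    return 0
```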
We are now ready to describe our first parallel merging algorithm for a shared-memory computer. The algorithm is presented as procedure CREW MERGE.
Example:

In steps 3.1 and 3.2, Q(1) = (1, 1), Q(2) = (5, 3), Q(3) = (6, 7), and Q(4) = (10, 9) are
determined.

In step 3.3 processor P1 begins at elements a1 = 2 and b1 = 1 and merges all elements of A and B smaller than 7, thus creating the subsequence {1, 2, 3, 4, 5, 6} of C. Similarly, processor P2 begins at a5 = 11 and b3 = 7 and merges all elements smaller than 12, thus creating {7, 8, 9, 10, 11}.

Processor P3 begins at a6 = 12 and b7 = 14 and creates {12, 13, 14, 15, 16, 17}.

Finally, P4 begins at a10 = 20 and b9 = 18 and creates {18, 19, 20, 21, 22, 23, 24}.

The resulting sequence C is therefore {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24}.
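The Q values and starting elements of this example can be reproduced with a sequential simulation. The inputs A and B are an assumption: the surviving text does not list them, so the sketch uses two sequences chosen to agree with every element it quotes (a1 = 2, b1 = 1, a5 = 11, b3 = 7, a6 = 12, b7 = 14, a10 = 20, b9 = 18) and with the result {1, 2, ..., 24}. Also assumed: A' and B' consist of every ⌈r/N⌉-th element of A and every ⌈s/N⌉-th element of B, equal elements are ordered A before B, and each processor stops where the next one starts rather than at a value threshold (for distinct elements the two rules coincide). The names `crew_merge` and `sequential_merge` are hypothetical.

```python
import bisect
from typing import List, Tuple


def sequential_merge(x: List[int], y: List[int]) -> List[int]:
    """Two-pointer merge of two sorted lists (procedure SEQUENTIAL MERGE)."""
    out: List[int] = []
    i = j = 0
    while i < len(x) and j < len(y):
        if x[i] <= y[j]:
            out.append(x[i])
            i += 1
        else:
            out.append(y[j])
            j += 1
    return out + x[i:] + y[j:]


def crew_merge(a: List[int], b: List[int], n: int) -> Tuple[List[Tuple[int, int]], List[int]]:
    """Simulate CREW MERGE with n processors; returns (Q, C), Q 1-based."""
    r, s = len(a), len(b)
    ra, sb = -(-r // n), -(-s // n)  # ceil(r/N) and ceil(s/N)
    # Step 1: A' and B' are every ceil(r/N)-th element of A and every
    # ceil(s/N)-th element of B (positions clamped to the length).
    pa = [min(i * ra, r) for i in range(1, n)]
    pb = [min(i * sb, s) for i in range(1, n)]
    a_p = [a[p - 1] for p in pa]
    b_p = [b[p - 1] for p in pb]
    # Step 2: each P_i ranks a'_i in B' and b'_i in A' by binary search and
    # writes a triple (value, i, source) into V; ties put A' before B'.
    v: list = [None] * (2 * (n - 1))
    for i in range(n - 1):
        v[i + bisect.bisect_left(b_p, a_p[i])] = (a_p[i], i + 1, "A")
        v[i + bisect.bisect_right(a_p, b_p[i])] = (b_p[i], i + 1, "B")
    # Steps 3.1-3.2: Q(i) = (x, y) is where P_i starts in A and in B.
    q = [(1, 1)]
    for i in range(2, n + 1):
        val, k, src = v[2 * i - 3]  # element v_(2i-2), 1-based
        if src == "A":
            q.append((pa[k - 1], bisect.bisect_left(b, val) + 1))
        else:
            q.append((bisect.bisect_right(a, val) + 1, pb[k - 1]))
    # Step 3.3: P_i merges up to where P_(i+1) starts and writes the
    # result into C from position x + y - 1.
    c = [0] * (r + s)
    ends = q[1:] + [(r + 1, s + 1)]
    for (x, y), (x_end, y_end) in zip(q, ends):
        part = sequential_merge(a[x - 1:x_end - 1], b[y - 1:y_end - 1])
        c[x + y - 2:x + y - 2 + len(part)] = part
    return q, c
```

On the assumed inputs A = {2, 3, 4, 6, 11, 12, 13, 15, 16, 20, 22, 24} and B = {1, 5, 7, 8, 9, 10, 14, 17, 18, 19, 21, 23} with N = 4, this reproduces Q(1) = (1, 1), Q(2) = (5, 3), Q(3) = (6, 7) and Q(4) = (10, 9), and C = {1, 2, ..., 24}.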

Analysis. A step-by-step analysis of CREW MERGE follows:

Step 1: With all processors operating in parallel, each processor computes two
subscripts. Therefore this step requires constant time.

Step 2: This step consists of two applications of procedure BINARY SEARCH to a sequence of length N − 1, each followed by an assignment statement. This takes O(log N) time.

Step 3: Step 3.1 consists of a constant-time assignment, and step 3.2 requires at most O(log s) time. To analyze step 3.3, we first observe that V contains 2N − 2 elements that divide C into 2N − 1 subsequences with maximum size equal to (⌈r/N⌉ + ⌈s/N⌉). This maximum size occurs if, for example, one element $a'_i$ of A' equals an element $b'_j$ of B'; then the ⌈r/N⌉ elements smaller than or equal to $a'_i$ (and larger than or equal to $a'_{i-1}$) are also smaller than or equal to $b'_j$, and similarly, the ⌈s/N⌉ elements smaller than or equal to $b'_j$ (and larger than or equal to $b'_{j-1}$) are also smaller than or equal to $a'_i$. In step 3 each processor creates two such subsequences of C whose total size is therefore no larger than 2(⌈r/N⌉ + ⌈s/N⌉), except PN, which creates only one subsequence of C. It follows that procedure SEQUENTIAL MERGE takes at most O((r + s)/N) time.

In the worst case, r = s = n, and since n ≥ N, the algorithm's running time is dominated by the time required by step 3. Thus

$$t(2n) = O\left(\frac{n}{N} + \log n\right).$$

Since p(2n) = N, c(2n) = p(2n) × t(2n) = O(n + N log n), and the algorithm is cost optimal when N ≤ n/log n.

UNIT IV: SEARCHING
Searching is one of the most fundamental operations in the field of computing. It is used in any application where we need to find out whether an element belongs to a list or, more generally, retrieve from a file information associated with that element. In its most basic form the searching problem is stated as follows: Given a sequence S = {s1, s2, ..., sn} of integers and an integer x, it is required to determine whether x = sk for some sk in S.
In sequential computing, the problem is solved by scanning the sequence S and comparing x with its successive elements until either an integer equal to x is found or the sequence is exhausted without success. This is given in what follows as procedure SEQUENTIAL SEARCH. As soon as an sk in S is found such that x = sk, the procedure returns k; otherwise 0 is returned.
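A minimal Python version of procedure SEQUENTIAL SEARCH, returning the 1-based index k of the first match or 0, is sketched below; the function name is illustrative.

```python
from typing import Sequence


def sequential_search(s: Sequence[int], x: int) -> int:
    """Return the 1-based index k of the first s_k equal to x, or 0."""
    for k, value in enumerate(s, start=1):
        if value == x:
            return k  # stop as soon as a match is found
    return 0  # every element was examined without success
```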

In the worst case, the procedure takes O(n) time. This is clearly optimal, since every element of S must be examined (when x is not in S) before declaring failure. Alternatively, if S is sorted in nondecreasing order, then procedure BINARY SEARCH can return the index of an element of S equal to x (or 0 if no such element exists) in O(log n) time.

EREW Searching
Assume that an N-processor EREW SM SIMD computer is available to search a sorted sequence S for a given element x, where 1 < N ≤ n.

First, the value of x must be made known to all processors. This can be done using procedure BROADCAST in O(log N) time.

The sequence S is then subdivided into N subsequences of length n/N each, and processor Pi is assigned $\{s_{(i-1)(n/N)+1}, s_{(i-1)(n/N)+2}, \ldots, s_{i(n/N)}\}$.

All processors now perform procedure BINARY SEARCH on their assigned subsequences. This requires O(log(n/N)) time in the worst case.

Since the elements of S are all distinct, at most one processor finds an sk equal to x and returns k.

The total time required by this EREW searching algorithm is therefore O(log N) + O(log(n/N)), which is O(log n).
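The algorithm can be simulated sequentially to check the bookkeeping. This sketch assumes that S is sorted (BINARY SEARCH requires it), that its elements are distinct (as the text states), and additionally that N divides n; the function name `erew_search` is hypothetical. The broadcast is modeled only by counting recursive-doubling steps (the number of processors holding x doubles each step, giving ⌈log N⌉ steps), and `bisect.bisect_left`, restricted to each processor's block through its `lo`/`hi` arguments, stands in for procedure BINARY SEARCH.

```python
from bisect import bisect_left
from typing import List, Tuple


def erew_search(s: List[int], x: int, n_procs: int) -> Tuple[int, int]:
    """Simulate the EREW search; returns (k, broadcast_steps).

    k is the 1-based index of an element equal to x, or 0 if none exists.
    Assumes s is sorted, its elements are distinct, and N divides n.
    """
    n = len(s)
    assert n % n_procs == 0, "sketch assumes N divides n"
    # Broadcast by recursive doubling: only the step count is modeled;
    # each step, every processor that knows x passes it to one that does not.
    holders, steps = 1, 0
    while holders < n_procs:
        holders = min(2 * holders, n_procs)
        steps += 1
    # P_i binary-searches its own block s[(i-1)n/N : in/N] (0-based slice).
    block = n // n_procs
    for i in range(1, n_procs + 1):  # these N searches run in parallel
        lo, hi = (i - 1) * block, i * block
        j = bisect_left(s, x, lo, hi)
        if j < hi and s[j] == x:
            return j + 1, steps
    return 0, steps
```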

Fig: Format of record in file to be searched.

CREW Searching
Assume that an N-processor CREW SM SIMD computer is available to search a sorted sequence S for a given element x, where 1 < N ≤ n.

The value of x must be made known to all processors. This can be done in constant time, since all N processors may read x from the shared memory simultaneously.
Basic Idea
There are N processors and hence an (N + 1)-ary search can be used. At
each stage, the sequence is split into N + 1 subsequences of equal length
and the N processors simultaneously probe the elements at the boundary
between successive subsequences.

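Let g be the smallest integer such that

$$n \le (N+1)^g - 1, \quad \text{that is,} \quad g = \left\lceil \frac{\log(n+1)}{\log(N+1)} \right\rceil.$$

Then g stages are sufficient to search a sequence of length n for an element equal to an input x.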
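The (N + 1)-ary search can be simulated to check the bound on the number of stages. In this sketch (the function names `stages_needed` and `crew_search` are hypothetical), S is assumed sorted, and processor Pj probes position ⌊j(L + 1)/(N + 1)⌋ of the current subsequence of length L; this spacing is an assumption chosen so that a sequence of length (N + 1)^g − 1 splits into N + 1 gaps of length (N + 1)^(g−1) − 1.

```python
from typing import List, Tuple


def stages_needed(n: int, n_procs: int) -> int:
    """Smallest g with n <= (N + 1)**g - 1, using exact integer arithmetic."""
    g = 0
    while (n_procs + 1) ** g - 1 < n:
        g += 1
    return g


def crew_search(s: List[int], x: int, n_procs: int) -> Tuple[int, int]:
    """(N + 1)-ary search on sorted s; returns (1-based index or 0, stages)."""
    lo, hi, stages = 0, len(s), 0  # current subsequence is s[lo:hi]
    while lo < hi:
        stages += 1
        length = hi - lo
        # P_j probes index lo + floor(j * (length + 1) / (N + 1)) - 1 for
        # j = 1..N; all probes are concurrent reads of the shared sequence.
        # Probes falling before lo (possible when length < N) stay idle.
        probes = sorted({lo + j * (length + 1) // (n_procs + 1) - 1
                         for j in range(1, n_procs + 1)})
        probes = [p for p in probes if p >= lo]
        for p in probes:
            if s[p] == x:
                return p + 1, stages
        # Keep only the gap between the last probe below x and the first above.
        below = [p for p in probes if s[p] < x]
        above = [p for p in probes if s[p] > x]
        if below:
            lo = below[-1] + 1
        if above:
            hi = above[0]
    return 0, stages
```

`stages_needed` finds g with an integer loop rather than evaluating ⌈log(n + 1)/log(N + 1)⌉ in floating point, which can round the wrong way when n + 1 is an exact power of N + 1.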

To search a
ALGO CREW SEARCH
