
Lectures on Parallel

The document discusses various sorting algorithms designed for different parallel computing models, specifically focusing on CRCW, CREW, and EREW models. It presents algorithms that aim to optimize processor usage, running time, and cost while addressing write conflicts and adapting to the number of processors. The analysis includes examples and performance metrics for each algorithm, demonstrating their efficiency and optimality under certain conditions.

Uploaded by

Shubam gupta
Copyright
© All Rights Reserved

SORTING ON THE CREW MODEL

We attempt to deal with two of the objections raised with regard to
procedure CRCW SORT: its excessive use of processors and its tolerance
of write conflicts.
Our purpose is to design an algorithm that is free of write conflicts and
uses a reasonable number of processors. In addition, we shall require the
algorithm to satisfy our usual desired properties for shared-memory
SIMD algorithms.
Thus the algorithm should have
(i) a sublinear and adaptive number of processors,
(ii) a running time that is small and adaptive, and
(iii) a cost that is optimal.
Basic Idea
The idea is quite simple. Assume that a CREW SM SIMD computer with N
processors P1, P2, ..., PN is to be used to sort the sequence
S = {s1, s2, ..., sn}.

We begin by distributing the elements of S evenly among the N
processors. Each processor sorts its allocated subsequence sequentially
using procedure QUICKSORT.

The N sorted subsequences are now merged pairwise, simultaneously,
using procedure CREW MERGE for each pair.

The resulting subsequences are again merged pairwise and the process
continues until one sorted sequence of length n is obtained.

The algorithm is given in what follows as procedure CREW SORT.

We denote the initial subsequence of S allocated to processor Pi by Si.
Subsequently, S_j^k is used to denote a subsequence obtained by merging
two subsequences and P_j^k the set of processors that performed the merge.
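
The sort-then-merge-pairwise scheme described above can be sketched sequentially. This is a minimal Python illustration, not procedure CREW SORT itself: the built-in `sorted` stands in for QUICKSORT, `heapq.merge` stands in for CREW MERGE, and each list comprehension models work that the processors would do simultaneously. The name `crew_sort` and the parameter `n_proc` are assumptions made for this example.

```python
from heapq import merge


def crew_sort(s, n_proc):
    """Sequential sketch of the CREW SORT idea with n_proc simulated processors."""
    n = len(s)
    if n == 0:
        return []
    size = -(-n // n_proc)  # ceil(n / n_proc) elements per processor
    # Step 1: every processor sorts its own block (sorted() stands in for QUICKSORT).
    runs = [sorted(s[i:i + size]) for i in range(0, n, size)]
    # Step 2: merge neighbouring runs pairwise until one run is left.
    while len(runs) > 1:
        merged = [list(merge(runs[i], runs[i + 1]))
                  for i in range(0, len(runs) - 1, 2)]
        if len(runs) % 2:  # an unpaired run is carried to the next round
            merged.append(runs[-1])
        runs = merged
    return runs[0]
```

With N initial runs the `while` loop executes about log N rounds, which is the iteration count used in the running-time analysis.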
Example:

Analysis. The dominating operation in step 1 is the call to QUICKSORT,
which requires O((n/N) log(n/N)) time. During each iteration of step 2.3,
⌊v/2⌋ pairs of subsequences with n/⌊v/2⌋ elements per pair are to be
merged simultaneously using N/⌊v/2⌋ processors per pair. Procedure CREW
MERGE thus requires O([(n/⌊v/2⌋)/(N/⌊v/2⌋)] + log(n/⌊v/2⌋)), that is,
O((n/N) + log n) time. Since step 2.3 is iterated ⌈log N⌉ times, the total
running time of procedure CREW SORT is

t(n) = O((n/N) log(n/N)) + O((n/N) log N + log n log N)
     = O((n/N) log n + log^2 n).

Since p(n) = N, the procedure's cost is given by

c(n) = O(n log n + N log^2 n), which is optimal for N < n/log n.
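
The simplification of the running time and the condition for optimal cost can be spelled out as follows. This is a worked derivation that uses only the quantities already defined (n, N, t, c) and the fact that N ≤ n.

```latex
\begin{align*}
t(n) &= O\!\left(\tfrac{n}{N}\log\tfrac{n}{N}\right)
      + O\!\left(\tfrac{n}{N}\log N + \log n \log N\right) \\
     &= O\!\left(\tfrac{n}{N}(\log n - \log N) + \tfrac{n}{N}\log N
      + \log n \log N\right) \\
     &= O\!\left(\tfrac{n}{N}\log n + \log n \log N\right)
      = O\!\left(\tfrac{n}{N}\log n + \log^2 n\right)
      \quad (\text{since } N \le n), \\
c(n) &= N \cdot t(n) = O\!\left(n \log n + N \log^2 n\right).
\end{align*}
% The cost is O(n log n) exactly when N log^2 n = O(n log n),
% that is, when N = O(n / log n).
```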

SORTING ON THE EREW MODEL


Two of the criticisms expressed with regard to procedure CRCW SORT
were addressed by procedure CREW SORT, which adapts to the number of
existing processors and disallows multiple-write operations into the same
memory location.

Still, procedure CREW SORT tolerates multiple-read operations. Our
purpose in this section is to deal with this third difficulty.
Basic Idea

The idea is to adapt the sequential procedure QUICKSORT to run on a
parallel computer. We begin by noting that, since N < n, we can write
N = n^(1-x), where 0 < x < 1.

each.

The
Let k = 2^⌈1/x⌉. The algorithm is given as procedure EREW SORT:

Example:
Let S = {5, 9, 12, 16, 18, 2, 6, 13, 17, 4, 7, 18, 18, 11, 3, 17, 20, 19, 14,
8, 5, 17, 1, 11, 15, 10, 10} (i.e., n = 27) and let five processors P1, P2, P3,
P4, P5 be available on an EREW SM SIMD computer (i.e., N = 5).
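
For the values in this example, the parameters x and k can be computed directly. The sketch below assumes the relations N = n^(1-x) and k = 2^⌈1/x⌉ from the Basic Idea; the function name `erew_parameters` is hypothetical.

```python
from math import ceil, log


def erew_parameters(n, n_proc):
    """Return (x, k) with n_proc = n**(1 - x) and k = 2**ceil(1 / x)."""
    x = 1 - log(n_proc) / log(n)
    # Floating-point rounding could matter if 1/x is an exact integer;
    # for n = 27 and N = 5 it is about 1.95, so the ceiling is safely 2.
    k = 2 ** ceil(1 / x)
    return x, k


x, k = erew_parameters(27, 5)
print(round(x, 3), k)
```

With n = 27 and N = 5 this gives x ≈ 0.512 and k = 4, which matches the four subsequences S1 to S4 that the example refers to.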


are in sorted
order.

are in sorted
order.

to sort S3 and S4.

shown below:
Analysis. The call to QUICKSORT takes constant time. From the analysis
of procedure PARALLEL SELECT, we know that steps 1-4 require c n^x time
units for some constant c. The running time of procedure EREW SORT is
therefore

t(n) = c n^x + 2t(n/k)
     = O(n^x log n).

Since p(n) = n^(1-x), the procedure's cost is given by
c(n) = p(n) × t(n) = O(n log n), which is optimal. Note, however, that since
n^(1-x) < n/log n, cost optimality is restricted to the range N < n/log n.
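
The step from the recurrence to the bound O(n^x log n) can be made explicit by unfolding. The derivation below assumes k = 2^⌈1/x⌉ as in the Basic Idea, so that k^x ≥ 2, and a constant-time base case once a subproblem has at most k elements.

```latex
\begin{align*}
t(n) &= c\,n^{x} + 2\,t(n/k) \\
     &= c\,n^{x}\sum_{i=0}^{j-1}\left(\frac{2}{k^{x}}\right)^{i}
        + 2^{j}\,t\!\left(n/k^{j}\right)
        && \text{(unfolding } j \text{ levels)} \\
     &\le j\,c\,n^{x} + 2^{j}\,t\!\left(n/k^{j}\right)
        && \text{(since } k^{x} = 2^{x\lceil 1/x\rceil} \ge 2\text{)} \\
     &\le c\,n^{x}\log_{k} n + n^{x}\cdot O(1)
        && \text{(taking } j = \log_{k} n,\ 2^{j} \le k^{xj} = n^{x}\text{)} \\
     &= O\!\left(n^{x}\log n\right).
\end{align*}
```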

SORTING ON CRCW MODEL


Whenever an algorithm is to be designed for the CRCW model of
computation, one must specify how write conflicts, that is, multiple
attempts to write into the same memory location, can be resolved. For the
purposes of the sorting algorithm to be described, we shall assume that
write conflicts are created whenever several processors attempt to write
potentially different integers into the same address. The conflict is
resolved by storing the sum of these integers in that address.

Basic Idea

Assume that n^2 processors are available on such a CRCW computer to
sort the sequence S = {s1, s2, ..., sn}.

The sorting algorithm to be used is based on the idea of sorting by
enumeration: The position of each element si of S in the sorted sequence
is determined by computing ci, the number of elements smaller than it.

If two elements si and sj are equal, then si is taken to be the larger of the
two if i > j; otherwise sj is the larger.

Once all the ci have been computed, si is placed in position 1 + ci of the
sorted sequence.
We assume that the processors are arranged into n rows of n processors
each and are numbered P(i, j), where i is the row and j the position
within the row.

The shared memory contains two arrays: The input sequence is stored in
array S, while the counts ci are stored in array C. The sorted sequence is
returned in array S.

The ith row of processors is "in charge" of element si: Processors P(i, 1),
P(i, 2), ..., P(i, n) compute ci and store si in position 1 + ci of S.
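
Sorting by enumeration can be illustrated with a short sequential sketch in Python. It is not procedure CRCW SORT itself: the nested loops stand in for the n^2 processors P(i, j), and the `+= 1` stands in for the concurrent write whose conflict is resolved by summation. The function name `crcw_sort` and the separate output list `out` (used here instead of overwriting S) are assumptions made for the example.

```python
def crcw_sort(s):
    """Sequential sketch of sorting by enumeration; returns (sorted list, counts)."""
    n = len(s)
    c = [0] * n
    for i in range(n):          # row i is in charge of s[i]
        for j in range(n):      # P(i, j) compares s[i] with s[j]
            # s[j] counts as smaller if it is strictly smaller, or if it is
            # equal and has the lower index (the tie rule stated above).
            if s[j] < s[i] or (s[j] == s[i] and j < i):
                c[i] += 1       # models the sum-resolved concurrent write
    out = [None] * n
    for i in range(n):
        out[c[i]] = s[i]        # position 1 + c_i in 1-based terms
    return out, c


print(crcw_sort([5, 2, 4, 5]))
```

For S = {5, 2, 4, 5} the counts are C = {2, 0, 1, 3} and the result is {2, 4, 5, 5}. The tie rule guarantees that all counts are distinct, so no two elements are sent to the same position.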

The algorithm is given as procedure CRCW SORT:

Example:

Let S = {5, 2, 4, 5}. The example follows the two elements of S that each
of the 16 processors compares and the contents of arrays S and C after
each step of procedure CRCW SORT.

(ii) the write conflict resolution process is itself very powerful: all
numbers to be stored in a memory location are added and stored in
constant time; and
3. uses a very large number of processors; that is, the number of
processors grows quadratically with the size of the input.
For these reasons, particularly the last one, the algorithm is most likely to
be of no great practical value. Nevertheless, procedure CRCW SORT is
interesting in its own right: It demonstrates how sorting can be
accomplished in constant time on such a model.

MERGING ON THE CREW MODEL


A CREW SM SIMD computer consists of N processors P1, P2, ..., PN. It is
required to design a parallel algorithm for this computer that takes the
two sequences A and B as input and produces the sequence C as output,
as defined earlier. Without loss of generality, we assume that r ≤ s,
where r and s are the lengths of A and B, respectively.

It is desired that the parallel algorithm satisfy the properties:


(i) the number of processors used by the algorithm be sublinear and
adaptive,
(ii) the running time of the algorithm be adaptive and significantly smaller
than the best sequential algorithm, and
(iii) the cost be optimal.

We now describe an algorithm that satisfies these properties. It uses N
processors, where N < r, and in the worst case, when r = s = n, runs in
O((n/N) + log n) time. The algorithm is therefore cost optimal for
N < n/log n. In addition to the basic arithmetic and logic functions usually
available, each of the N processors is assumed capable of performing the
following two sequential procedures:
1. Procedure SEQUENTIAL MERGE
2. Procedure BINARY SEARCH

Basic Idea
Procedure BINARY SEARCH takes as input a sequence S = {s1, s2, ..., sn}
of numbers sorted in nondecreasing order and a number x. If x belongs to
S, the procedure returns the index k of an element sk in S such that
x = sk. Otherwise, the procedure returns a zero. Binary search is based on
the divide-and-conquer principle. At each stage, a comparison is performed
between x and an element of S. Either the two are equal and the
procedure terminates or half of the elements of the sequence under
consideration are discarded. The process continues until the number of
elements left is 0 or 1, and after at most one additional comparison the
procedure terminates.
Since the number of elements under consideration is reduced by one-half
at each step, the procedure requires O(log n) time in the worst case.
We are now ready to describe our first parallel merging algorithm for a
shared-memory computer. The algorithm is presented as procedure CREW
MERGE.
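
The behavior just described can be written as a short Python sketch. The 1-based return value with 0 for "not found" follows the convention stated in the text; the function name `binary_search` and the Python list representation are assumptions made for this illustration.

```python
def binary_search(s, x):
    """Return a 1-based index k with s[k-1] == x, or 0 if x is not in sorted s."""
    lo, hi = 0, len(s) - 1          # window of candidates, inclusive
    while lo <= hi:
        mid = (lo + hi) // 2
        if s[mid] == x:
            return mid + 1          # found: report the 1-based position
        if s[mid] < x:
            lo = mid + 1            # discard the lower half
        else:
            hi = mid - 1            # discard the upper half
    return 0
```

Each pass through the loop discards at least half of the remaining window, which is where the O(log n) bound comes from.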
Example:

In steps 3.1 and 3.2, Q(1) = (1, 1), Q(2) = (5, 3), Q(3) = (6, 7), and
Q(4) = (10, 9) are determined.

In step 3.3 processor P1 begins at elements a1 = 2 and b1 = 1 and merges all
elements of A and B smaller than 7, thus creating the subsequence {1, 2, 3, 4,
5, 6} of C. Similarly, processor P2 begins at a5 = 11 and b3 = 7 and merges all
elements smaller than 12, thus creating {7, 8, 9, 10, 11}.

Processor P3 begins at a6 = 12 and b7 = 14 and creates {12, 13, 14, 15, 16, 17}.

Finally P4 begins at a10 = 20 and b9 = 18 and creates {18, 19, 20, 21, 22, 23, 24}.

The resulting sequence C is therefore {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13,
14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24}.
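
The partitioning behind this example can be reproduced with a sequential Python sketch. Several things here are assumptions rather than the text of procedure CREW MERGE: the function name `crew_merge`, the 0-based indexing, the rule that an element of A precedes an equal element of B, the clamping of sample indices for short inputs, and the input sequences `a` and `b`, which are one pair consistent with the values quoted in the example (the example's own inputs are not reproduced above). Each iteration of the final loop models the work of one processor.

```python
from bisect import bisect_left, bisect_right
from math import ceil


def crew_merge(a, b, n_proc):
    """Sketch of merging by partition; a and b are non-empty sorted lists."""
    r, s = len(a), len(b)
    step_a, step_b = ceil(r / n_proc), ceil(s / n_proc)
    # Step 1: sample n_proc - 1 splitters from each input (0-based indices).
    picks_a = [min(i * step_a, r) - 1 for i in range(1, n_proc)]
    picks_b = [min(i * step_b, s) - 1 for i in range(1, n_proc)]
    # Step 2: merge the samples into V as (value, origin, index); 0 = A, 1 = B.
    v = sorted([(a[p], 0, p) for p in picks_a] + [(b[q], 1, q) for q in picks_b])
    # Steps 3.1-3.2: starting pair Q(i) for each processor.
    starts = [(0, 0)]
    for i in range(2, n_proc + 1):
        value, origin, idx = v[2 * i - 3]     # v_(2i-2) in 1-based terms
        if origin == 0:
            starts.append((idx, bisect_left(b, value)))
        else:
            starts.append((bisect_right(a, value), idx))
    bounds = starts + [(r, s)]
    # Step 3.3: each processor merges its own slices into its own part of C.
    c = [None] * (r + s)
    for (x, y), (x_end, y_end) in zip(bounds, bounds[1:]):
        pos = x + y
        while x < x_end or y < y_end:
            if y >= y_end or (x < x_end and a[x] <= b[y]):
                c[pos] = a[x]
                x += 1
            else:
                c[pos] = b[y]
                y += 1
            pos += 1
    return c, starts


a = [2, 3, 4, 6, 11, 12, 13, 15, 16, 20, 22, 24]   # assumed input A
b = [1, 5, 7, 8, 9, 10, 14, 17, 18, 19, 21, 23]    # assumed input B
c, starts = crew_merge(a, b, 4)
print([(x + 1, y + 1) for x, y in starts])          # 1-based Q(1)..Q(4)
```

For these inputs the 1-based starting pairs are (1, 1), (5, 3), (6, 7), and (10, 9), and `c` holds the integers 1 through 24 in order. Because the processors write to disjoint ranges of C, no write conflict arises; only the binary searches read shared data concurrently.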

Analysis. A step-by-step analysis of CREW MERGE follows:


Step 1: With all processors operating in parallel, each processor computes two
subscripts. Therefore this step requires constant time.

Step 2: This step consists of two applications of procedure BINARY SEARCH to a
sequence of length N - 1, each followed by an assignment statement. This takes
O(log N) time.

Step 3: Step 3.1 consists of a constant-time assignment, and step 3.2 requires at
most O(log s) time. To analyze step 3.3, we first observe that V contains 2N - 2
elements that divide C into 2N - 1 subsequences with maximum size equal to
(⌈r/N⌉ + ⌈s/N⌉). This maximum size occurs if, for example, one element a'_i of A'
equals an element b'_j of B'; then the ⌈r/N⌉ elements smaller than or equal to
a'_i (and larger than or equal to a'_(i-1)) are also smaller than or equal to
b'_j, and similarly, the ⌈s/N⌉ elements smaller than or equal to b'_j (and larger
than or equal to b'_(j-1)) are also smaller than or equal to a'_i. In step 3 each
processor creates two such subsequences of C whose total size is therefore no
larger than 2(⌈r/N⌉ + ⌈s/N⌉), except PN, which creates only one subsequence of C.
It follows that procedure SEQUENTIAL MERGE takes at most O((r + s)/N) time.

In the worst case, r = s = n, and since n > N, the algorithm's running time is
dominated by the time required by step 3. Thus t(2n) = O((n/N) + log n).
Since p(2n) = N, c(2n) = p(2n) × t(2n) = O(n + N log n), and the algorithm is
cost optimal when N < n/log n.

unit-IV-Searching
Searching is one of the most fundamental operations in the field of
computing. It is used in any application where we need to find out
whether an element belongs to a list or, more generally, retrieve from a
file information associated with that element. In its most basic form the
searching problem is stated as follows: Given a sequence S = {s1, s2, ...,
sn} of integers and an integer x, it is required to determine whether
x = sk for some sk in S.
In sequential computing, the problem is solved by scanning the sequence
S and comparing x with its successive elements until either an integer
equal to x is found or the sequence is exhausted without success. This is
given in what follows as procedure SEQUENTIAL SEARCH. As soon as an sk
in S is found such that x = sk, the procedure returns k; otherwise 0 is
returned.
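
The scan described above can be sketched in a few lines of Python. The 1-based return value and the 0 for failure follow the description; the function name `sequential_search` is an assumption made for this illustration.

```python
def sequential_search(s, x):
    """Return the 1-based index of the first element equal to x, or 0."""
    for k, value in enumerate(s, start=1):
        if value == x:
            return k        # stop at the first match
    return 0                # sequence exhausted without success
```

The sequence does not need to be sorted for this procedure, which is why every element may have to be examined.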

In the worst case, the procedure takes O(n) time. This is clearly optimal,
since every element of S must be examined (when x is not in S) before
declaring failure. Alternatively, if S is sorted in nondecreasing order, then
procedure BINARY SEARCH can return the index of an element of S equal
to x (or 0 if no such element exists) in O(log n) time.

EREW Searching
An N-processor EREW SM SIMD computer is available to search S for a
given element x, where 1 < N < n. The sequence S is assumed to be sorted,
as procedure BINARY SEARCH requires.

First, the value of x must be made known to all processors. This can be
done using procedure BROADCAST in O(log N) time.

The sequence S is then subdivided into N subsequences of length n/N
each, and processor Pi is assigned {s_((i-1)(n/N)+1), s_((i-1)(n/N)+2), ...,
s_(i(n/N))}.

All processors now perform procedure BINARY SEARCH on their assigned
subsequences. This requires O(log(n/N)) time in the worst case.

Since the elements of S are all distinct, at most one processor finds an sk
equal to x and returns k.

The total time required by this EREW searching algorithm is therefore
O(log N) + O(log(n/N)), which is O(log n).
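
The block decomposition can be illustrated with a sequential Python sketch in which each loop iteration models one processor searching its own block. It assumes S is sorted with distinct elements; the function name `erew_search`, the use of `bisect_left` as the per-block binary search, and the rounding of the block length up to ⌈n/N⌉ are choices made for this example. The broadcast of x is not modeled.

```python
from bisect import bisect_left


def erew_search(s, x, n_proc):
    """Return the 1-based index of x in sorted s, or 0; one block per processor."""
    n = len(s)
    block = -(-n // n_proc)                 # ceil(n / n_proc) elements per block
    for i in range(n_proc):                 # iteration i models processor P(i+1)
        lo, hi = i * block, min((i + 1) * block, n)
        k = bisect_left(s, x, lo, hi)       # binary search restricted to s[lo:hi]
        if k < hi and s[k] == x:
            return k + 1
    return 0
```

Since every block is searched independently and no two processors touch the same element, no concurrent read of S is needed. The total O(log n) is no better than a single sequential binary search, because the O(log N) broadcast offsets the shorter per-block search.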

Fig: Format of record in file to be searched.

CREW Searching
An N-processor CREW SM SIMD computer is available to search S for a
given element x, where 1 < N < n.

The value of x must be made known to all processors. Since concurrent
reads are allowed, this can be done in constant time.
Basic Idea
There are N processors and hence an (N + 1)-ary search can be used. At
each stage, the sequence is split into N + 1 subsequences of equal length
and the N processors simultaneously probe the elements at the boundary
between successive subsequences.

Let g be the smallest integer such that n ≤ (N + 1)^g - 1, that is,
g = ⌈log(n + 1)/log(N + 1)⌉. Then g stages are sufficient to search a
sequence of length n for an element equal to an input x.
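
The staged probing can be illustrated with a sequential Python sketch of an (N + 1)-ary search. It is not the text of procedure CREW SEARCH: the function name `crew_search`, the choice of probe positions at multiples of (N + 1)^(g-1), (N + 1)^(g-2), ..., 1, and the returned stage count are assumptions made for this example. S is assumed sorted, and the inner loop models the N simultaneous probes of one stage.

```python
def crew_search(s, x, n_proc):
    """Return (1-based index of x or 0, number of stages used)."""
    n = len(s)
    g = 0
    while (n_proc + 1) ** g - 1 < n:        # smallest g with n <= (N+1)^g - 1
        g += 1
    q = 0          # positions 1..q (1-based) are known to hold values < x
    stages = 0
    for level in range(g - 1, -1, -1):
        step = (n_proc + 1) ** level
        stages += 1
        next_q = q
        for j in range(1, n_proc + 1):      # probe made by processor P(j)
            pos = q + j * step
            if pos > n:                     # probes past the end find nothing
                break
            if s[pos - 1] == x:
                return pos, stages
            if s[pos - 1] < x:
                next_q = pos                # x can only lie to the right
        q = next_q
    return 0, stages
```

After a stage with spacing (N + 1)^level, the window that can still contain x holds at most (N + 1)^level - 1 elements, so after g stages it is empty. For example, with n = 26 and N = 2 the bound gives g = 3.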

To search a
ALGO CREW SEARCH
