Lectures on Parallel
The resulting subsequences are again merged pairwise and the process
continues until one sorted sequence of length n is obtained.
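The pairwise merging scheme described above can be sketched sequentially as follows. This is only an illustration under stated assumptions: the names pairwise_merge_sort and num_chunks are hypothetical, each chunk stands in for the subsequence one processor would sort, and the merges that a parallel machine would carry out simultaneously are performed here one after another.

```python
def merge(left, right):
    """Merge two sorted lists into one sorted list."""
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            out.append(left[i])
            i += 1
        else:
            out.append(right[j])
            j += 1
    out.extend(left[i:])
    out.extend(right[j:])
    return out


def pairwise_merge_sort(seq, num_chunks):
    """Sort num_chunks pieces of seq, then merge the pieces pairwise.

    Assumes num_chunks >= 1 (hypothetical parameter, one chunk per processor).
    """
    n = len(seq)
    size = max(1, -(-n // num_chunks))  # ceil(n / num_chunks), at least 1
    runs = [sorted(seq[i:i + size]) for i in range(0, n, size)]
    while len(runs) > 1:
        # One round: merge neighbouring runs two at a time.
        merged = [merge(runs[i], runs[i + 1])
                  for i in range(0, len(runs) - 1, 2)]
        if len(runs) % 2:
            merged.append(runs[-1])  # odd run out waits for the next round
        runs = merged
    return runs[0] if runs else []
```

Each pass of the while loop halves the number of sorted runs, so roughly log2(num_chunks) rounds of merging are needed before a single sorted sequence remains.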
c(n) = O(n log n + N log^2 n), which is optimal for N < n/log n.
each.
The
Let k = 2^⌈1/x⌉. The algorithm is given as procedure EREW SORT:
Example:
Let S = {5, 9, 12, 16, 18, 2, 6, 13, 17, 4, 7, 18, 18, 11, 3, 17, 20, 19, 14, 8, 5, 17, 1, 11, 15, 10, 10}
(i.e., n = 27) and let five processors P1, P2, P3, P4, P5 be available on an EREW
SM SIMD computer (i.e., N = 5).
are in sorted
order.
are in sorted
order.
shown below:
Analysis. The call to QUICKSORT takes constant time. From the analysis
of procedure PARALLEL SELECT, steps 1-4 require cn^x time units for
some constant c. The running time of procedure EREW SORT is therefore
Basic Idea
If two elements si and sj are equal, then si is taken to be the larger of the
two if i > j; otherwise sj is the larger.
The shared memory contains two arrays: The input sequence is stored in
array S, while the counts ci are stored in array C. The sorted sequence is
returned in array S.
The ith row of processors is "in charge" of element si: Processors P(i, 1),
P(i, 2), ..., P(i, n) compute ci and store si in position 1 + ci of S.
Example:
Let S = {5, 2, 4, 5}. The two elements of S that each of the 16 processors
compares and the contents of arrays S and C after each step of procedure
CRCW SORT.
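The counting scheme described above can be simulated sequentially as in the following sketch. It is not the parallel procedure itself: the double loop stands in for the n^2 processors P(i, j), the running sum stands in for the additive write-conflict resolution, and the function name crcw_sort is an assumption made for illustration. Indices are 0-based in the code, so si lands in position ci rather than 1 + ci.

```python
def crcw_sort(s):
    """Sequential simulation of sorting by counting, as described for CRCW SORT."""
    n = len(s)
    c = [0] * n
    for i in range(n):          # row i is "in charge" of s[i]
        for j in range(n):      # P(i, j) compares s[i] with s[j]
            # Ties are broken by index: s[i] counts as larger if i > j.
            if s[i] > s[j] or (s[i] == s[j] and i > j):
                c[i] += 1       # simulates the summed concurrent write
    out = [None] * n
    for i in range(n):
        out[c[i]] = s[i]        # position 1 + ci in the 1-based text
    return out, c


sorted_s, counts = crcw_sort([5, 2, 4, 5])
print(counts)    # [2, 0, 1, 3]
print(sorted_s)  # [2, 4, 5, 5]
```

Because of the tie-breaking rule, the two copies of 5 receive different counts (2 and 3), so every element is written to a distinct position.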
(ii) the write conflict resolution process is itself very powerful: all numbers to
be stored in a memory location are added and stored in constant time; and
3. uses a very large number of processors; that is, the number of
processors grows quadratically with the size of the input.
For these reasons, particularly the last one, the algorithm is unlikely to
be of great practical value. Nevertheless, procedure CRCW SORT is
interesting in its own right: it demonstrates how sorting can be
accomplished in constant time on a sufficiently powerful model of computation.
Basic Idea
The procedure takes as input a sequence S = {s1, s2, ..., sn} of numbers
sorted in nondecreasing order and a number x. If x belongs to S, the
procedure returns the index k of an element sk in S such that x = sk.
Otherwise, the procedure returns a zero. Binary search is based on the
divide-and-conquer principle. At each stage, a comparison is performed
between x and an element of S. Either the two are equal and the
procedure terminates or half of the elements of the sequence under
consideration are discarded. The process continues until the number of
elements left is 0 or 1, and after at most one additional comparison the
procedure terminates.
Since the number of elements under consideration is reduced by one-half
at each step, the procedure requires O(log n) time in the worst case.
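A minimal sketch of the binary search just described is given below. The function name binary_search and the variable names are illustrative assumptions; the return convention (a 1-based index k, or 0 if x is absent) follows the description above.

```python
def binary_search(s, x):
    """Return a 1-based index k with s[k-1] == x, or 0 if x is not in s.

    s must be sorted in nondecreasing order.
    """
    lo, hi = 1, len(s)               # 1-based bounds of the part still considered
    while lo <= hi:
        mid = (lo + hi) // 2
        if x == s[mid - 1]:
            return mid               # found: x equals the middle element
        if x < s[mid - 1]:
            hi = mid - 1             # discard the upper half
        else:
            lo = mid + 1             # discard the lower half
    return 0                         # nothing left to examine
```

Each iteration either terminates or discards about half of the remaining elements, which is where the O(log n) bound comes from.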
We are now ready to describe our first parallel merging algorithm for a
shared-memory computer. The algorithm is presented as procedure CREW
MERGE.
Example:
In steps 3.1 and 3.2, Q(1) = (1, 1), Q(2) = (5, 3), Q(3) = (6, 7), and Q(4) = (10, 9) are
determined.
Processor P3 begins at a6 = 12 and b7 = 14 and creates {12, 13, 14, 15, 16, 17}.
Finally P4 begins at a10 = 20 and b9 = 18 and creates {18, 19, 20, 21, 22, 23, 24}.
Step 2: This step consists of two applications of procedure BINARY SEARCH to a sequence
of length N - 1, each followed by an assignment statement. This takes O(log N) time.
Step 3: Step 3.1 consists of a constant-time assignment, and step 3.2 requires at most
O(log s) time. To analyze step 3.3, we first observe that V contains 2N - 2 elements that
divide C into 2N - 1 subsequences with maximum size equal to (⌈r/N⌉ + ⌈s/N⌉). This
maximum size occurs if, for example, one element a'_i of A' equals an element b'_j of B';
then the ⌈r/N⌉ elements smaller than or equal to a'_i (and larger than or equal to a'_{i-1}) are
also smaller than or equal to b'_j, and similarly, the ⌈s/N⌉ elements smaller than or equal to
b'_j (and larger than or equal to b'_{j-1}) are also smaller than or equal to a'_i. In step 3 each
processor creates two such subsequences of C whose total size is therefore no larger
than 2(⌈r/N⌉ + ⌈s/N⌉), except PN, which creates only one subsequence of C. It follows that
procedure SEQUENTIAL MERGE takes at most O((r + s)/N) time.
In the worst case, r = s = n, and since n > N, the algorithm's running time is dominated
by the time required by step 3. Thus t(2n) = O((n/N) + log n).
Since p(2n) = N, c(2n) = p(2n) × t(2n) = O(n + N log n), and the algorithm is
cost optimal when N < n/log n.
unit-IV-Searching
Searching is one of the most fundamental operations in the field of
computing. It is used in any application where we need to find out
whether an element belongs to a list or, more generally, retrieve from a
file information associated with that element. In its most basic form the
searching problem is stated as follows: Given a sequence S = {s1, s2, ...,
sn} of integers and an integer x, it is required to determine whether x =
sk for some sk in S.
In sequential computing, the problem is solved by scanning the sequence
S and comparing x with its successive elements until either an integer
equal to x is found or the sequence is exhausted without success. This is
given in what follows as procedure SEQUENTIAL SEARCH. As soon as an sk
in S is found such that x = sk, the procedure returns k; otherwise 0 is
returned.
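The scan described above might look like the following sketch. The function name sequential_search is an assumption for illustration; the 1-based index and the 0 return value follow the text.

```python
def sequential_search(s, x):
    """Scan s from left to right; return the 1-based index of the first
    element equal to x, or 0 if the sequence is exhausted without success."""
    for k, value in enumerate(s, start=1):
        if value == x:
            return k
    return 0
```

Unlike binary search, this version makes no assumption about the order of the elements in S.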
In the worst case, the procedure takes O(n) time. This is clearly optimal,
since every element of S must be examined (when x is not in S) before
declaring failure. Alternatively, if S is sorted in nondecreasing order, then
procedure BINARY SEARCH can return the index of an element of S equal
to x (or 0 if no such element exists) in O(log n) time.
EREW Searching
Assume that an N-processor EREW SM SIMD computer is available to search S for a
given element x, where 1 < N < n.
First, the value of x must be made known to all processors. This can be done
using procedure BROADCAST in O(log N) time.
Since the elements of S are all distinct, at most one processor finds an sk
equal to x and returns k.
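One way to picture the EREW search is the sequential simulation below. Several details are assumptions not stated in the text above: S is taken to be sorted, it is divided into N contiguous blocks of about n/N elements, and each simulated processor runs a binary search on its own block. The names erew_search and num_procs are hypothetical, and the BROADCAST step is represented simply by passing x to every loop iteration.

```python
from bisect import bisect_left


def erew_search(s, x, num_procs):
    """Simulate num_procs processors, each searching its own block of sorted s.

    Returns the 1-based index of x in s, or 0 if x is absent.
    Assumes the elements of s are distinct and num_procs >= 1.
    """
    n = len(s)
    block = -(-n // num_procs)            # ceil(n / num_procs)
    results = []
    for p in range(num_procs):            # in parallel on a real machine
        lo = p * block
        hi = min(lo + block, n)
        # Binary search restricted to the block s[lo:hi].
        pos = bisect_left(s, x, lo, hi) if lo < hi else hi
        results.append(pos + 1 if pos < hi and s[pos] == x else 0)
    found = [k for k in results if k]     # at most one entry when s is distinct
    return found[0] if found else 0
```

Because the blocks are disjoint, no two simulated processors ever read the same element of S, which is what the exclusive-read restriction requires.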
CREW Searching
Assume that an N-processor CREW SM SIMD computer is available to search S for a
given element x, where 1 < N < n.
The value of x must be made known to all processors. This can be done in
constant time.
Basic Idea
There are N processors and hence an (N + 1)-ary search can be used. At
each stage, the sequence is split into N + 1 subsequences of equal length
and the N processors simultaneously probe the elements at the boundary
between successive subsequences.
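The (N + 1)-ary idea can be illustrated with the sequential simulation below. It is a sketch under stated assumptions rather than the procedure itself: the names n_plus_1_ary_search and num_procs are hypothetical, the probe positions are one reasonable way of splitting the current range into N + 1 nearly equal parts, and the inner loop plays the role of the N processors probing simultaneously.

```python
def n_plus_1_ary_search(s, x, num_procs):
    """Search sorted s for x using num_procs probes per stage.

    Returns the 1-based index of an element equal to x, or 0 if none exists.
    Assumes num_procs >= 1.
    """
    lo, hi = 0, len(s)                    # candidates are s[lo:hi] (0-based)
    while lo < hi:
        width = hi - lo
        # Boundaries between the N + 1 subsequences of the current range.
        probes = sorted({lo + (j * width) // (num_procs + 1)
                         for j in range(1, num_procs + 1)})
        new_lo, new_hi = lo, hi
        for p in probes:                  # simultaneous on a CREW machine
            if s[p] == x:
                return p + 1
            if s[p] < x:
                new_lo = max(new_lo, p + 1)   # x lies to the right of p
            else:
                new_hi = min(new_hi, p)       # x lies to the left of p
        lo, hi = new_lo, new_hi           # keep only one subsequence
    return 0
```

Each stage keeps only the subsequence between two adjacent probes, so the range shrinks by a factor of about N + 1 per stage instead of 2 as in ordinary binary search.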
To search a
ALGO CREW SEARCH