PRAM Model
for
Parallel Computation
References
1. Selim Akl, Parallel Computation: Models and Methods, Prentice
Hall, 1997, Updated online version available through website.
2. Selim Akl, The Design of Efficient Parallel Algorithms, Chapter 2 in
“Handbook on Parallel and Distributed Processing” edited by J.
Blazewicz, K. Ecker, B. Plateau, and D. Trystram, Springer Verlag,
2000.
3. Selim Akl, Design & Analysis of Parallel Algorithms, Prentice Hall,
1989.
4. Henri Casanova, Arnaud Legrand, and Yves Robert, Parallel
Algorithms, CRC Press, 2009.
5. Cormen, Leiserson, and Rivest, Introduction to Algorithms, 1st
edition (i.e., older), 1990, McGraw Hill and MIT Press, Chapter 30
on parallel algorithms.
6. Phillip Gibbons, Asynchronous PRAM Algorithms, Ch 22 in
Synthesis of Parallel Algorithms, edited by John Reif, Morgan
Kaufmann Publishers, 1993.
7. Joseph JaJa, An Introduction to Parallel Algorithms, Addison
Wesley, 1992.
8. Michael Quinn, Parallel Computing: Theory and Practice, McGraw
Hill, 1994.
9. Michael Quinn, Designing Efficient Algorithms for Parallel
Computers, McGraw Hill, 1987.
Outline
• Computational Models
• Definition and Properties of the PRAM Model
• Parallel Prefix Computation
• The Array Packing Problem
• Cole’s Merge Sort for PRAM
• PRAM Convex Hull algorithm using divide &
conquer
• Issues regarding implementation of PRAM
model
Concept of “Model”
• An abstract description of a real world entity
• Attempts to capture the essential features while
suppressing the less important details.
• Important to have a model that is both precise
and as simple as possible to support theoretical
studies of the entity modeled.
• If experiments or theoretical studies show the
model does not capture some important aspects
of the physical entity, then the model should be
refined.
• Some people will not accept an abstract model of
reality, but instead insist on reality itself.
• They sometimes reject a model as invalid if it does
not capture every tiny detail of the physical entity.
Parallel Models of Computation
• Describes a class of parallel computers
• Allows algorithms to be written for a
general model rather than for a specific
computer.
• Allows the advantages and disadvantages
of various models to be studied and
compared.
• Important, since the life-time of specific
computers is quite short (e.g., 10 years).
Controversy over Parallel Models
• Some professionals (often engineers) will not accept a
parallel model if
– It does not capture every detail of reality
– It cannot currently be built
• Engineers often insist that a model must be valid for any
number of processors
d[i] = 0                  if next[i] = nil
d[i] = d[next[i]] + 1     if next[i] ≠ nil
Backup of Previous Diagram
Potential Problems?
• Consider following steps:
7. d[i] = d[i] + d[next[i]]
8. next[i] = next[next[i]]
• Casanova et al. pose the following problem in Step 7:
– Pi reads d[i+1] and uses this value to update d[i]
– Pi-1 must read d[i] to update d[i-1]
– The computation fails if Pi changes the value of d[i] before
Pi-1 can read it.
• This problem should not occur, as all PEs in
PRAM should execute algorithm synchronously.
– The same problem is avoided in Step 8 for the same
reason
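The synchronous execution argued for above can be made concrete with a small sketch (my own illustration, not code from the slides): pointer jumping is simulated sequentially, with every PE's reads for a step completed before any PE writes, so the race described by Casanova et al. cannot occur.

```python
import math

def list_rank(next_ptr):
    """Pointer-jumping rank computation (Steps 7-8 above).

    next_ptr[i] is the successor of node i, or None for the tail.
    Returns d, where d[i] is the distance from node i to the tail.
    """
    n = len(next_ptr)
    d = [0 if next_ptr[i] is None else 1 for i in range(n)]
    nxt = list(next_ptr)
    for _ in range(max(1, math.ceil(math.log2(n)))):
        # Synchronous read phase: snapshot everything the PEs would read.
        read_d = [None if nxt[i] is None else d[nxt[i]] for i in range(n)]
        read_next = [None if nxt[i] is None else nxt[nxt[i]] for i in range(n)]
        # Synchronous write phase.
        for i in range(n):
            if nxt[i] is not None:
                d[i] += read_d[i]        # Step 7: d[i] = d[i] + d[next[i]]
                nxt[i] = read_next[i]    # Step 8: next[i] = next[next[i]]
    return d

# List 0 -> 1 -> 2 -> 3 (node 3 is the tail):
print(list_rank([1, 2, 3, None]))  # [3, 2, 1, 0]
```

Separating the read and write phases is exactly what lockstep PRAM execution guarantees for free.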
Potential Problems? (cont.)
• Does Step 7 (& Step 8) require a CR PRAM?
d[i] = d[i] + d[next[i]]
– Let j = next[i]
– Casanova et al. suggest that Pi and Pj may try to read
d[j] concurrently, requiring a CR PRAM model
– Again, if PEs are stepping through the computations
synchronously, EREW PRAM is sufficient here
• In Step 4, the PRAM must determine whether there is
a node i with next[i] ≠ nil. A CRCW solution is:
– In Step 4a, set done to false
– In Step 4b, all PEs write the boolean value of “next[i] = nil”
using a common CW write.
• An EREW solution for Step 7 is given next
Rank-Computation using EREW
• Theorem: The Rank-Computation
algorithm only requires EREW PRAM
– Replace Step 4 with
• For step = 1 to log n do,
• Akl raises the question of what to do if an
unknown number of processors Pi, each of
which is in charge of node i (see pg 236).
– In this case, it would be necessary to go back
to the CRCW solution suggested earlier.
PRAM Model Separation
• We next consider the following two
questions
– Is CRCW strictly more powerful than CREW
– Is CREW strictly more powerful than EREW
• We can answer each of the above questions by
finding a problem that the first (stronger) PRAM
model can solve faster than the second
CRCW Maximum Array Value Algorithm
CRCW Compute_Maximum (A,n)
• Algorithm requires O(n²) PEs, Pi,j.
1. forall i ∈ {0, 1, … , n-1} in parallel do
• Pi,0 sets m[i] = True
2. forall (i, j) ∈ {0, 1, … , n-1}², i ≠ j, in parallel do
• if A[i] < A[j] then Pi,j sets m[i] = False
3. forall i ∈ {0, 1, … , n-1} in parallel do
• If m[i] = True, then Pi,0 sets max = A[i]
4. Return max
• Note that the n PEs do an EW in Steps 1 and 3
• The write in Step 2 can be a common CW
• Cost is O(1) × O(n²), which is O(n²)
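A sequential sketch of Compute_Maximum (my own illustration; the nested loops stand in for the O(n²) PEs acting within a single O(1) parallel step):

```python
def crcw_maximum(A):
    """Simulate Compute_Maximum on a common-CRCW PRAM with n^2 PEs P_{i,j}."""
    n = len(A)
    m = [True] * n                   # Step 1: P_{i,0} does an exclusive write
    for i in range(n):               # Step 2: all pairs (i, j), i != j, in parallel
        for j in range(n):
            if i != j and A[i] < A[j]:
                m[i] = False         # common CW: every writer writes the same value
    for i in range(n):               # Step 3: surviving PE(s) write the maximum
        if m[i]:
            maximum = A[i]
    return maximum

print(crcw_maximum([3, 7, 2, 9, 5]))  # 9
```

Note that Step 2 satisfies the common-CW rule: all PEs writing m[i] write the identical value False, and if the maximum value is duplicated, all survivors in Step 3 write the same value to max.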
CRCW More Powerful Than
CREW
• The previous algorithm establishes that
CRCW can calculate the maximum of an
array in O(1) time
• Using CREW, only two values can be
merged into a single value by one PE in a
single step.
– Therefore the number of values that need to be
merged can be halved at each step.
– So the fastest possible time for CREW is Ω(log n)
CREW More Powerful Than
EREW
• Determine if a given element e belongs to a set {e1,
e2, … , en} of n distinct elements
• CREW can solve this in O(1) using n PEs
– One PE initializes a variable result to false
– All PEs compare e to one ei.
– If any PE finds a match, it writes “true” to result.
• On EREW, it takes Ω(log n) steps to broadcast the
value of e to all PEs.
– The number of PEs with the value of e can be doubled at
each step.
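The doubling broadcast can be sketched as follows (a hypothetical simulation of my own; `cells` plays the role of shared memory, and within each round every copy of e is read by exactly one new PE, so all reads are exclusive):

```python
def erew_broadcast(e, n):
    """Broadcast e to n PEs on an EREW PRAM by doubling; returns (cells, rounds)."""
    cells = [None] * n
    cells[0] = e
    have = 1        # number of PEs that already hold e
    rounds = 0
    while have < n:
        # PE k (have <= k < 2*have) reads cells[k - have]: the source indices
        # are distinct, so no concurrent reads occur in this round.
        for k in range(have, min(2 * have, n)):
            cells[k] = cells[k - have]
        have = min(2 * have, n)
        rounds += 1
    return cells, rounds

cells, rounds = erew_broadcast(42, 8)  # 3 rounds = log2(8)
```

Since the number of holders at most doubles per round, ⌈log₂ n⌉ rounds are both sufficient and necessary, which is the Ω(log n) gap exploited above.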
Simulating CRCW with EREW
Theorem: An EREW PRAM with p PEs can simulate
a common CRCW PRAM with p PEs in O(log p)
steps using O(p) extra memory.
• See pg 14 of Casanova et al.
• The only additional capabilities of CRCW that
EREW PRAM has to simulate are CR and CW.
• Consider a CW first, and initially assume all PE
participate.
• The EREW PRAM simulates this CW by creating a
p × 2 array A, with one row per PE
Simulating Common CRCW with EREW
• When a CW write is simulated, EREW PRAM PE j
writes
– The memory cell address it wishes to write to in A(j,0)
– The value it wishes to write to memory in A(j,1)
– If PE j does not participate in the CW, it writes -1 to
A(j,0).
• Next, sort A by its first column. This brings all of
the CWs to the same location together.
• If the memory address in A(0,0) is not -1, then PE 0
writes the data value in A(0,1) to the memory
address stored in A(0,0).
PRAM Simulations (cont)
• All PEs j with j > 0 read the memory addresses in A(j,0)
and A(j-1,0)
– If the address in A(j,0) is -1, PE j does not write.
– Also, if the two memory addresses are the same, PE j does
not write to memory.
– Otherwise, PE j writes the data value in A(j,1) to the
memory address in A(j,0).
• Cole’s result that an EREW PRAM can sort n items in
O(log n) time is needed to complete this proof. It is
discussed next in Casanova et al. for CREW.
Problem:
• This proof is invalid for CRCW versions stronger than
common CRCW, such as combining.
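The simulation of one common-CW step described above can be sketched sequentially (a hypothetical helper of my own; Python's built-in sort stands in for Cole's O(log p) EREW sort):

```python
def simulate_common_cw(p, writes, memory):
    """Simulate one common-CW step of a CRCW PRAM on an EREW PRAM.

    writes[j] = (addr, value) for PE j, or None if PE j does not participate.
    """
    # Each PE j fills its row of the p x 2 array A.
    A = [(-1, None) if writes[j] is None else writes[j] for j in range(p)]
    # Sort A by target address (Cole's EREW sort in O(log p) on a real PRAM).
    A.sort(key=lambda row: row[0])
    # PE 0 writes if its address is valid.
    if A[0][0] != -1:
        memory[A[0][0]] = A[0][1]
    # PE j > 0 writes only if its address is valid and differs from PE j-1's,
    # leaving exactly one writer per address: an exclusive write.
    for j in range(1, p):
        if A[j][0] != -1 and A[j][0] != A[j - 1][0]:
            memory[A[j][0]] = A[j][1]
    return memory
```

For example, if PEs 0 and 1 both write value 5 to address 2 (legal under common CW, since the values agree) and PE 3 writes 9 to address 0, only one writer per address survives the sort-and-compare step.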
Cole’s Merge Sort for PRAM
• Cole’s Merge Sort runs on EREW PRAM in O(lg n) using
O(n) processors, so it is cost optimal.
– The Cole sort is significantly more efficient than most
other PRAM sorts.
• Akl calls this sort “PRAM SORT” in his book & book chapter (pg 54)
– A high level presentation of EREW version is given in
Ch. 4 of Akl’s online text and also in his book chapter
• A complete presentation for CREW PRAM is in JaJa.
– JaJa states that the algorithm he presents can be
modified to run on EREW, but that the details are
non-trivial.
• Currently, this sort is the best-known PRAM sort & is
usually the one cited when a cost-optimal PRAM sort
using O(n) PEs is needed.
References for Cole’s EREW Sort
Two references are listed below.
• Richard Cole, Parallel Merge Sort, SIAM Journal
on Computing, Vol. 17, 1988, pp. 770-785.
• Richard Cole, Parallel Merge Sort, Book-chapter
in “Synthesis of Parallel Algorithms”, Edited by
John Reif, Morgan Kaufmann, 1993, pg.453-496
Comments on Sorting
• A much simpler CREW PRAM algorithm that runs in
O((lg n) lg lg n) time
and uses O(n) processors is given in JaJa’s book
(pg 158-160).
– This algorithm is shown to be work optimal.
• Also, JaJa gives an O(lg n) time randomized sort for
CREW PRAM on pages 465-473.
– With high probability, this algorithm terminates in O(lg
n) time and requires O(n lg n) operations
• i.e., with high-probability, this algorithm is work-
optimal.
• Sorting is often called the “queen of the algorithms”:
• A speedup in the best-known sort for a parallel
model usually results in a similar speedup in other
algorithms that use sorting.
Cole’s CREW Sort
• Given in 1986 by Cole [43 in Casanova]
• An EREW sort is also given in the same paper, but it is
even more difficult.
• The general idea of algorithm technique follows:
– Based on classical merge sort, represented as a binary tree.
– All merging steps at a given level of the tree must be done in
parallel
– At each level, two sequences each of arbitrary size must be
merged in O(1) time.
• Partial information from previous merges is used to merge in constant
time, using a very clever technique.
– Since there are log n levels, this yields a log n running time.
Cole’s EREW Sort (cont)
• Defn: A sequence L is called a good sampler (GS) of
sequence J if, for any k ≥ 1, there are at most 2k+1
elements of J between k+1 consecutive elements of
{-∞} ∪ L ∪ {+∞}
– Intuitively, the elements of L are almost uniformly
distributed among the elements of J.
The key is to use the sorting tree of Fig 1.6 in a pipelined fashion. A
good sampler sequence is built at each level for the next level.
Divide & Conquer PRAM Algorithms
(Reference: Akl, Chapter 5)
• Three Fundamental Operations
– Divide is the partitioning process
– Conquer is the process of solving the base problem
(without further division)
– Combine is the process of combining the solutions to
the subproblems
• Merge Sort Example
– Divide repeatedly partitions the sequence into halves.
– Conquer sorts the base set of one element
– Combine does most of the work. It repeatedly merges
two sorted halves
• Quicksort Example
– The divide stage does most of the work.
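The merge sort example above can be sketched directly (an illustration of my own; written sequentially, with a comment marking where a PRAM would run the recursive calls in parallel):

```python
def merge_sort(seq):
    """Merge sort organized around the three fundamental operations."""
    if len(seq) <= 1:                 # conquer: base problem, no further division
        return list(seq)
    mid = len(seq) // 2               # divide: partition the sequence into halves
    left = merge_sort(seq[:mid])      # on a PRAM, these two recursive calls
    right = merge_sort(seq[mid:])     # would execute in parallel
    # combine: merging the two sorted halves does most of the work
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i]); i += 1
        else:
            merged.append(right[j]); j += 1
    return merged + left[i:] + right[j:]
```

The combine step here is the sequential O(n) merge; Cole's contribution, discussed above, is replacing it with an O(1)-per-level pipelined merge.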
An Optimal CRCW PRAM
Convex Hull Algorithm
• Let Q = {q1, q2, . . . , qn} be a set of points in the
Euclidean plane (i.e., E2-space).
• The convex hull of Q is denoted by CH(Q) and
is the smallest convex polygon containing Q.
– It is specified by listing convex hull corner points
(which are from Q) in order (e.g., clockwise order).
• Usual Computational Geometry Assumptions:
– No three points lie on the same straight line.
– No two points have the same x or y coordinate.
– There are at least 4 points, as CH(Q) = Q for n ≤ 3.
PRAM CONVEX HULL(n,Q, CH(Q))
1. Sort the points of Q by x-coordinate.
2. Partition Q into k = √n subsets Q1, Q2, . . . , Qk of √n
points each such that a vertical line can separate Qi
from Qj
– Also, if i < j, then Qi is left of Qj.
3. For i = 1 to k, compute the convex hulls of Qi in
parallel, as follows:
– if |Qi| ≤ 3, then CH(Qi) = Qi
– else (using k = √n PEs) call PRAM CONVEX HULL(k, Qi,
CH(Qi))
4. Merge the convex hulls in {CH(Q1),CH(Q2), . . .
,CH(Qk)} into a convex hull for Q.
Merging √n Convex Hulls
Details for Last Step of Algorithm
• The last step is somewhat tedious.
• The upper hull is found first; then the lower hull is
found using the same method.
– Only finding the upper hull is described here
– Upper & lower convex hull points merged into ordered
set
• Each CH(Qi) has √n PEs assigned to it.
• The PEs assigned to CH(Qi) (in parallel)
compute the upper tangent from CH(Qi) to
another CH(Qj) .
– A total of √n - 1 tangents are computed for each CH(Qi)
– Details for computing the upper tangents will be
discussed separately
The Upper and Lower Hull
Last Step of Algorithm (cont)
• Among the tangent lines from CH(Qi) to the polygons to
the left of CH(Qi), let Li be the one with the smallest slope.
– Use a MIN CW to a shared memory location
• Among the tangent lines from CH(Qi) to the polygons to
the right, let Ri be the one with the largest slope.
– Use a MAX CW to a shared memory location
• If the angle between Li and Ri is less than 180 degrees,
no point of CH(Qi) is in CH(Q).
– See Figure 5.13 on next slide (from Akl’s Online text)
• Otherwise, all points of CH(Qi) between where Li touches
CH(Qi) and where Ri touches CH(Qi) are in CH(Q).
• Array Packing is used to combine all convex hull points
of CH(Q) after they are identified.
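Array packing reduces to a prefix sum over the keep/discard flags (a sketch of my own; `accumulate` stands in for the O(log n) parallel prefix computation mentioned in the outline):

```python
from itertools import accumulate

def pack(items, keep):
    """Pack the items whose keep flag is True into a contiguous array.

    prefix[i] counts kept items among positions 0..i, so kept item i
    goes to slot prefix[i] - 1; each PE then writes to a distinct slot.
    """
    prefix = list(accumulate(1 if k else 0 for k in keep))
    packed = [None] * prefix[-1] if prefix else []
    for i, k in enumerate(keep):     # exclusive writes: distinct destinations
        if k:
            packed[prefix[i] - 1] = items[i]
    return packed

print(pack(['a', 'b', 'c', 'd'], [True, False, True, True]))  # ['a', 'c', 'd']
```

On a PRAM the prefix sum dominates, so packing the identified hull points takes O(log n) time, consistent with the overall bound.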
Algorithm for Upper Tangents
• Requires finding a straight line segment tangent
to CH(Qi) and CH(Qj), as given by line sw, using
a binary search technique
– See Fig 5.14(a) on next slide
• Let s be the mid-point of the ordered sequence
of corner points in CH(Qi) .
• Similarly, let w be the mid-point of the ordered
sequence of convex hull points in CH(Qj).
• Two cases arise:
– sw is the upper tangent of CH(Qi) and CH(Qj), and we
are done.
– Otherwise, one-half of the remaining corner points of
CH(Qi) and/or CH(Qj) can be removed from
consideration.
• Preceding process is now repeated with the mid-
points of two remaining sequences.
PRAM Convex Hull Complexity Analysis
• Step 1: The sort takes O(lg n) time.
• Step 2: Partition of Q into subsets takes O(1) time.
– Here, Qi consists of the points qk where k = (i-1)√n + r for 1 ≤ r ≤ √n
• Step 3: The recursive calculations of CH(Qi) for 1 ≤ i ≤ √n
in parallel take t(√n) time (using √n PEs for each Qi).
• Step 4: The big steps here require O(lg n) time and are
– Finding the upper tangent from CH(Qi) to CH(Qj) for
each i, j pair takes O(lg √n) = O(lg n)
– Array packing used to form the ordered sequence of
upper convex hull points for Q.
• Above steps find the upper convex hull. The lower
convex hull is found similarly.
– Upper & lower hulls can be merged in O(1) time to be
an (counter)/clockwise ordered set of hull points.
Complexity Analysis (Cont)
• Cost for Step 3: Solving the recurrence relation
t(n) = t(√n) + O(lg n)
yields
t(n) = O(lg n)
• Running time for PRAM Convex Hull is O(lg n)
since this is maximum cost for each step.
• Then the cost for PRAM Convex Hull is
C(n) = O(n lg n).
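A sketch of why the recurrence for Step 3 solves to O(lg n): each square-root halves the logarithm, since lg √n = (1/2) lg n, so the per-level costs form a geometric series.

```latex
t(n) = t(n^{1/2}) + c\lg n
     = c\lg n + \tfrac{c}{2}\lg n + \tfrac{c}{4}\lg n + \cdots
     \le 2c\,\lg n = O(\lg n)
```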
Optimality of PRAM Convex Hull
Theorem: A lower bound for the number of
sequential steps required to find the convex hull
of a set of planar points is Ω(n lg n)