
Chapter 2: PRAM Algorithms

• PRAM
• A Model of Serial Computation (RAM)
• Time & Space Complexities
• The PRAM Model of Parallel Computation
• Various Models of PRAM
• PRAM Algorithms
• Parallel Reduction
• Parallel Prefix Sums
• Parallel Merge Sort
• Reducing the Number of Processors
• Brent's Theorem
• Complexity Theory & PRAM
PRAM
• PRAM stands for Parallel Random Access Machine.
• It is unrealistically simple: it ignores the complexity of interprocessor communication. Why?
• It allows parallel-algorithm designers to treat processing power as an unlimited resource.
RAM
• The RAM is a theoretical model of serial computation; it is a one-address computer.
• It consists of:
  • A memory
  • A read-only input tape
  • A write-only output tape
  • A program
[Figure: the RAM model, consisting of a read-only input tape (x1 x2 … xn), a memory with registers r0 (the accumulator), r1, r2, r3, …, a program with a location counter, and a write-only output tape (y1 y2 … yn).]
Time & Space Complexities
• Time complexity: the maximum time taken by the program to execute over all inputs of size n.
• Space complexity: the maximum space taken by the program to execute over all inputs of size n.
• Two criteria for measuring time & space:
  • Uniform cost
  • Logarithmic cost
The PRAM Model of Parallel Computation
• The PRAM consists of:
  • A control unit
  • A global memory
  • An unbounded set of processors, each with its own private memory and a unique index, which is needed to enable and disable the processor or to influence which memory location it accesses.
[Figure: the PRAM model, consisting of a control unit driving processors P1, P2, …, Pp, each with its own private memory, connected through an interconnection network to a global memory.]
How a PRAM Works
• A PRAM begins with:
  • The input stored in global memory
  • A single active processing element
• The computation terminates when the last processor halts.
• During each step of the computation, an active processor may:
  • Read a value from a single private (local) or global memory location
  • Write a value to a single private (local) or global memory location
  • Perform a single RAM operation
• During a computation step, a processor may also activate another processor.
Cost
• The cost of a PRAM computation is the product of the parallel time complexity and the number of processors used. For example, a PRAM algorithm with time complexity Θ(log P) using P processors has cost Θ(P log P).
Various Models of PRAM
• EREW (Exclusive Read, Exclusive Write)
• CREW (Concurrent Read, Exclusive Write), which is the default
• CRCW (Concurrent Read, Concurrent Write)
The following policies can be used to handle concurrent writes:
• Common: all processors concurrently writing into the same global address must write the same value.
• Arbitrary: if multiple processors write to the same address, an arbitrarily chosen one of them succeeds.
• Priority: if multiple processors write to the same address, the processor with the lowest index succeeds (it has the highest priority).
• Exercise: compare these policies (what are the strengths and weaknesses of each, and why?).
PRAM Algorithms
• PRAM algorithms have two phases:
  a) A sufficient number of processors are activated.
  b) The activated processors perform the computation in parallel.
• To activate P processors we need ⌈log P⌉ steps, because each already-active processor can activate one more, doubling the number of active processors at every step.
• The meta-instruction Spawn (<processor names>) is used to activate processors.
• To denote a code segment to be executed in parallel by the specified processors, we use a parallel construct.
• The general format is:
    for all <processor list> do
      <statement list>
    end for
• We can also use the other usual control structures, such as:
  • if-then-else-endif
  • for-endfor
  • while-endwhile
  • repeat-until
Parallel Reduction
• Binary trees are important in designing parallel algorithms:
  • Starting from the root (fan-out):
    • broadcasting
    • divide and conquer
  • Starting from the leaves (fan-in):
    • reduction
• Given a set of n values a1, a2, …, an and an associative binary operation ⊕, reduction is the process of computing a1 ⊕ a2 ⊕ … ⊕ an.
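As a serial point of reference (an illustration added here, not part of the slides), the same reduction can be written in a line of Python; the array literal is the one used in the example below:

from functools import reduce
from operator import add

# Serial reduction with ⊕ = addition, over the array of the example below.
values = [4, 3, 8, 2, 9, 1, 0, 5]
total = reduce(add, values)   # ((4 + 3) + 8) + ... + 5 = 32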
Ex:
Array A (indices 0-7):  4  3  8  2  9  1  0  5
j = 0:  7  10  10  5   (computed by P0, P1, P2, P3)
j = 1:  17  15         (computed by P0, P2)
j = 2:  32             (computed by P0)
Algorithm SUM (EREW PRAM)
Initial condition: list of n ≥ 1 elements stored in A[0 … n-1]
Final condition: sum of the elements stored in A[0]
Global variables: n, A[0 … n-1], j
begin
  spawn (P0, P1, P2, …, P⌊n/2⌋-1)
  for all Pi where 0 ≤ i ≤ ⌊n/2⌋-1 do
    for j ← 0 to ⌈log n⌉ - 1 do
      if i mod 2^j = 0 and 2i + 2^j < n then
        A[2i] ← A[2i] + A[2i + 2^j]
      end if
    end for
  end for
end
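A minimal Python sketch of the SUM algorithm (a simulation added for illustration, not from the slides): each pass of the inner loop plays one parallel step, iterating over processors P0 … P⌊n/2⌋-1. Within a step the cells written (indices divisible by 2^(j+1)) never coincide with the cells read (indices ≡ 2^j mod 2^(j+1)), so a plain sequential loop is a faithful simulation.

import math

def pram_sum(A):
    # Simulate the EREW SUM algorithm: ceil(log2 n) parallel steps.
    n = len(A)
    for j in range(math.ceil(math.log2(n))):
        for i in range(n // 2):                    # processors P0 ... P(n/2 - 1)
            if i % 2 ** j == 0 and 2 * i + 2 ** j < n:
                A[2 * i] += A[2 * i + 2 ** j]
    return A[0]

print(pram_sum([4, 3, 8, 2, 9, 1, 0, 5]))          # 32, as in the trace below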
Tracing
• When j = 0:
  0 mod 2^0 = 0 → A[0] ← A[0] + A[1]
  1 mod 2^0 = 0 → A[2] ← A[2] + A[3]
  2 mod 2^0 = 0 → A[4] ← A[4] + A[5]
  3 mod 2^0 = 0 → A[6] ← A[6] + A[7]
• When j = 1:
  0 mod 2^1 = 0 → A[0] ← A[0] + A[2]
  1 mod 2^1 = 1 (idle)
  2 mod 2^1 = 0 → A[4] ← A[4] + A[6]
  3 mod 2^1 = 1 (idle)
• When j = 2:
  0 mod 2^2 = 0 → A[0] ← A[0] + A[4]
  1 mod 2^2 = 1 (idle)
  2 mod 2^2 = 2 (idle)
  3 mod 2^2 = 3 (idle)
Analysis
• Time complexity is Θ(log n), given ⌊n/2⌋ processors; spawning them takes ⌈log ⌊n/2⌋⌉ steps.
• Cost is Θ(log n) · ⌊n/2⌋ = Θ(n log n).
Prefix Sums
Given a set of n values a1, a2, …, an and an associative binary operation ⊕, the prefix-sums problem is to compute the n quantities:
a1
a1 ⊕ a2
a1 ⊕ a2 ⊕ a3
…
a1 ⊕ a2 ⊕ a3 ⊕ … ⊕ an
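As an aside (not from the slides), Python's standard library computes exactly these n quantities serially:

from itertools import accumulate

# Prefix sums of the array used in the example further below.
print(list(accumulate([4, 3, 8, 2, 9, 1, 0, 5])))
# [4, 7, 15, 17, 26, 27, 27, 32]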
Analysis
• Time complexity is Θ(log n), given n-1 processors; spawning them takes ⌈log (n-1)⌉ steps.
• Cost is Θ(log n) · (n-1) = Θ(n log n).
Algorithm PREFIX-SUMS (CREW PRAM)
Initial condition: list of n ≥ 1 elements stored in A[0 … n-1]
Final condition: each element A[i] contains A[0] ⊕ A[1] ⊕ … ⊕ A[i]
Global variables: n, A[0 … n-1], j
begin
  spawn (P1, P2, …, Pn-1)
  for all Pi where 1 ≤ i ≤ n-1 do
    for j ← 0 to ⌈log n⌉ - 1 do
      if i - 2^j ≥ 0 then
        A[i] ← A[i] + A[i - 2^j]
      end if
    end for
  end for
end
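A minimal Python simulation of PREFIX-SUMS (an added sketch, not from the slides). Unlike in SUM, a processor here may read a cell that another processor writes during the same step, so each simulated step must read from a snapshot of A, mirroring the PRAM convention that all reads of a step happen before any write.

import math

def pram_prefix_sums(A):
    # Simulate the CREW PREFIX-SUMS algorithm: ceil(log2 n) parallel steps.
    n = len(A)
    for j in range(math.ceil(math.log2(n))):
        old = A[:]                     # values as they were when the step began
        for i in range(1, n):          # processors P1 ... P(n-1)
            if i - 2 ** j >= 0:
                A[i] += old[i - 2 ** j]
    return A

print(pram_prefix_sums([4, 3, 8, 2, 9, 1, 0, 5]))
# [4, 7, 15, 17, 26, 27, 27, 32], matching the trace below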
Ex:
Array A (indices 0-7):   4  3  8  2  9  1  0  5
After j = 0 (P1 … P7):   4  7  11 10 11 10 1  5
After j = 1 (P2 … P7):   4  7  15 17 22 20 12 15
After j = 2 (P4 … P7):   4  7  15 17 26 27 27 32
Tracing
• When j = 0:
  1 - 2^0 ≥ 0 → A[1] ← A[1] + A[0]
  2 - 2^0 ≥ 0 → A[2] ← A[2] + A[1]
  3 - 2^0 ≥ 0 → A[3] ← A[3] + A[2]
  4 - 2^0 ≥ 0 → A[4] ← A[4] + A[3]
  5 - 2^0 ≥ 0 → A[5] ← A[5] + A[4]
  6 - 2^0 ≥ 0 → A[6] ← A[6] + A[5]
  7 - 2^0 ≥ 0 → A[7] ← A[7] + A[6]
• When j = 1:
  1 - 2^1 < 0 (idle)
  2 - 2^1 ≥ 0 → A[2] ← A[2] + A[0]
  3 - 2^1 ≥ 0 → A[3] ← A[3] + A[1]
  4 - 2^1 ≥ 0 → A[4] ← A[4] + A[2]
  5 - 2^1 ≥ 0 → A[5] ← A[5] + A[3]
  6 - 2^1 ≥ 0 → A[6] ← A[6] + A[4]
  7 - 2^1 ≥ 0 → A[7] ← A[7] + A[5]
• When j = 2:
  1 - 2^2 < 0 (idle)
  2 - 2^2 < 0 (idle)
  3 - 2^2 < 0 (idle)
  4 - 2^2 ≥ 0 → A[4] ← A[4] + A[0]
  5 - 2^2 ≥ 0 → A[5] ← A[5] + A[1]
  6 - 2^2 ≥ 0 → A[6] ← A[6] + A[2]
  7 - 2^2 ≥ 0 → A[7] ← A[7] + A[3]
Merging Two Sorted Lists
• An optimal RAM algorithm creates the merged list one element at a time. It requires at most n-1 comparisons to merge two sorted lists of n/2 elements each.
• The complexity of the RAM algorithm is therefore Θ(n).
• Each processor performs a binary search on the half of the array that does not contain its own element: it finds how many elements of that half precede its element, and the final position of its element is its own index plus that count.
• For example, P4 is in the lower half and is associated with the element 9. It searches the upper half and finds that 9 belongs after index 3 (three upper-half elements are smaller). Therefore its final position is 4 + 3 = 7.
Algorithm MERGE-LISTS (CREW PRAM)
Initial condition: two sorted lists of n/2 elements each, stored in A[1] … A[n/2] and A[(n/2)+1] … A[n]
Final condition: merged list in locations A[1] … A[n]
Global variables: A[1 … n]
Local variables: x, low, high, index
begin
  spawn (P1, P2, …, Pn)
  for all Pi where 1 ≤ i ≤ n do
    if i ≤ n/2 then
      low ← (n/2) + 1
      high ← n
    else
      low ← 1
      high ← n/2
    end if
    x ← A[i]
    repeat
      index ← ⌊(low + high)/2⌋
      if x < A[index] then
        high ← index - 1
      else
        low ← index + 1
      end if
    until low > high
    A[high + i - n/2] ← x
  end for
end
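A Python sketch of MERGE-LISTS (an added illustration; it assumes all n elements are distinct, as in the example below, and uses a 1-indexed array with A[0] unused). Writes go into a copy of the array, since on the PRAM every processor reads A before any processor overwrites it.

def pram_merge(A):
    # Simulate the CREW MERGE-LISTS algorithm: every processor Pi
    # binary-searches the opposite half and writes its element x = A[i]
    # directly into its final position.
    n = len(A) - 1                     # A[1..n] holds the two sorted halves
    out = A[:]                         # writes target a copy of the array
    for i in range(1, n + 1):          # processors P1 ... Pn
        if i <= n // 2:
            low, high = n // 2 + 1, n  # lower-half elements search the upper half
        else:
            low, high = 1, n // 2      # upper-half elements search the lower half
        x = A[i]
        while low <= high:             # ordinary binary search
            index = (low + high) // 2
            if x < A[index]:
                high = index - 1
            else:
                low = index + 1
        out[high + i - n // 2] = x     # high + i - n/2 is x's final position
    return out

A = [None, 1, 5, 7, 9, 13, 17, 19, 23, 2, 4, 8, 11, 12, 21, 22, 24]
print(pram_merge(A)[1:])
# [1, 2, 4, 5, 7, 8, 9, 11, 12, 13, 17, 19, 21, 22, 23, 24]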
Ex:
Lower half (P1 … P8),  A[1 … 8]:   1  5  7  9  13 17 19 23
Upper half (P9 … P16), A[9 … 16]:  2  4  8  11 12 21 22 24
Merged result, A[1 … 16]:          1 2 4 5 7 8 9 11 12 13 17 19 21 22 23 24
Analysis
• It requires n-1 comparisons to merge two sorted lists of n/2 elements each, so the time complexity on a RAM is Θ(n).
• On a CREW PRAM the binary searches run in parallel, so the time complexity is Θ(log n).
Reducing the Number of Processors
• A cost-optimal parallel algorithm is an algorithm whose cost is in the same complexity class as an optimal sequential algorithm.
• For example, parallel reduction is not cost optimal:
  PRAM cost = time complexity × number of processors
            = Θ(log n) · Θ(n)
            = Θ(n log n),
  which is greater than the Θ(n) complexity of the optimal sequential algorithm.
• A cost-optimal parallel algorithm exists if the total number of operations performed by the parallel algorithm is in the same complexity class as an optimal sequential algorithm.
• For example, the number of operations performed by parallel reduction is
  n/2 + n/4 + n/8 + … + 1 = n - 1 ⟹ Θ(n),
  while the number of operations performed sequentially is
  n - 1 ⟹ Θ(n).
• Since both the parallel algorithm and the sequential algorithm perform n-1 operations, a cost-optimal variant exists.
• The number of processors required to perform the n-1 operations in Θ(log n) time is
  P = ⌈(n-1) / ⌈log n⌉⌉ = Θ(n / log n)
Ex: n = 16 ⟹ P = 16/4 = 4
Data: 9 2 1 3 | 5 4 7 8 | 0 1 8 2 | 3 5 8 1
Steps 1-3: each processor sequentially sums its block of four elements: 15, 24, 11, 17
Step 4:    pairwise sums: 39, 28
Step 5:    final sum: 67
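A Python sketch of the resulting cost-optimal reduction (an added illustration): each of the p ≈ n/log n processors first sums a block of about log n elements sequentially, then the p partial sums are combined by the tree reduction of Algorithm SUM.

import math

def cost_optimal_sum(values):
    # Sum n values with ~n/log n processors in Theta(log n) time.
    n = len(values)
    block = max(1, math.ceil(math.log2(n)))   # ~log n elements per processor
    # Steps 1..log n: each processor sums its own block sequentially.
    partial = [sum(values[b:b + block]) for b in range(0, n, block)]
    # Remaining ~log p steps: pairwise tree reduction of the partial sums.
    while len(partial) > 1:
        partial = [partial[k] + (partial[k + 1] if k + 1 < len(partial) else 0)
                   for k in range(0, len(partial), 2)]
    return partial[0]

print(cost_optimal_sum([9, 2, 1, 3, 5, 4, 7, 8, 0, 1, 8, 2, 3, 5, 8, 1]))
# partial sums 15, 24, 11, 17 -> 39, 28 -> 67, as in the example above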
Brent's Theorem
• Given a parallel algorithm A with computation time t, if A performs m computational operations, then p processors can execute A in time
  t + (m - t)/p
Ex: Parallel Reduction
With m = n - 1, t = ⌈log n⌉, and p = ⌈n / ⌈log n⌉⌉:

t + (m - t)/p = ⌈log n⌉ + ((n - 1) - ⌈log n⌉) / ⌈n / ⌈log n⌉⌉
             ≤ log n + log n - (log² n)/n
             = Θ(log n)
Complexity Theory & PRAM
• P: the class of problems solvable by a deterministic algorithm in polynomial time.
• NP: the class of problems solvable by a nondeterministic algorithm in polynomial time.
• P-complete: a problem L ∈ P is P-complete if every other problem in P can be transformed to L (under a suitably restricted reduction, e.g. in logarithmic space).
• NP-complete: a problem L ∈ NP is NP-complete if every other problem in NP can be transformed to L in polynomial time.
• NC: the class of problems solvable on a PRAM in polylogarithmic time using a polynomial number of processors.

[Figure: nesting of the complexity classes: NC inside P, P inside NP, with the P-complete problems drawn inside P and the NP-complete problems inside NP.]
