Fundamental Algorithms: Chapter 3: Parallel Algorithms - The PRAM Model
Technische Universität München
Fundamental Algorithms
Chapter 3: Parallel Algorithms The PRAM Model
Michael Bader
Winter 2014/15
Definition
Sorting is required to order a given sequence of elements, or more
precisely:
Input: a sequence of n elements $a_1, a_2, \ldots, a_n$
Output: a permutation (reordering) $a'_1, a'_2, \ldots, a'_n$ of the input
sequence, such that $a'_1 \le a'_2 \le \cdots \le a'_n$.
A naive(?) solution:
pairwise comparison of all elements
count wins for each element to obtain its position
use one processor for each comparison!
AccumulateSort(A : Array[1..n]) {
   Create Array P[1..n] of Integer;
   // all P[i] = 0 at start
   for 1 <= i, j <= n and i < j do in parallel {
      if A[i] > A[j]
         then P[i] := P[i] + 1;
         else P[j] := P[j] + 1;
      end if;
   }
   for i from 1 to n do in parallel {
      A[P[i]+1] := A[i];
   }
}
AccumulateSort Discussion
Implementation:
do all $n^2$ comparisons at once and in parallel
use $n^2$ processors
Assumptions:
all read accesses to A can be done in parallel
increments of P[i] and P[j] can be done in parallel
second for-loop is executed after the first one (on all processors)
all moves A[P[i]+1] := A[i] happen in one atomic step
(no overwrites due to sequential execution)
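The two phases of AccumulateSort can be sketched sequentially in Python. This is only an illustration under stated assumptions: the name `accumulate_sort` is hypothetical, indices are 0-based (so A[P[i]+1] becomes `result[p[i]]`), and a separate output list stands in for the atomic move step.

```python
from itertools import combinations


def accumulate_sort(a):
    """Sequential simulation of AccumulateSort (sorting by counting wins)."""
    n = len(a)
    p = [0] * n  # p[i] counts the comparisons "won" by a[i]
    # Phase 1: one comparison per pair (i, j) with i < j; on a PRAM each
    # pair would be handled by its own processor.
    for i, j in combinations(range(n), 2):
        if a[i] > a[j]:
            p[i] += 1
        else:
            p[j] += 1  # ties go to the later element, so all ranks differ
    # Phase 2: every move reads the old array and writes into a fresh one,
    # which mimics the single atomic step assumed in the discussion.
    result = [None] * n
    for i in range(n):
        result[p[i]] = a[i]
    return result


print(accumulate_sort([4, 7, 3, 9, 5, 6, 10, 8]))  # [3, 4, 5, 6, 7, 8, 9, 10]
```

Writing into a fresh list avoids the overwrite problem that a sequential in-place version of the second loop would have.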
An immediate solution for searching an element x in an array A[1..n]:
use n processors
on each processor: compare x with A[i]
return matching index/indices i
Discussion:
Can all n processors access x simultaneously?
exclusive or concurrent read
What happens if more than one processor finds an x?
exclusive or concurrent write (of multiple returns)
general approach: parallelisation by competition
[Figure: PRAM architecture: processors P1, P2, P3, ..., Pn under central control, all connected to a shared memory]
SIMD
Underlying principle for parallel hardware architecture:
strict single instruction, multiple data (SIMD)
All parallel instructions of a parallelized loop are performed
synchronously (applies even to simple if-statements)
Lockstep Example:
for i from 1 to n do in parallel {
   if U[i] > 0
      then F[i] := (U[i] - U[i-1]) / dx
      else F[i] := (U[i+1] - U[i]) / dx
   end if
}
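One common way to picture lockstep execution of this if-statement is as three synchronous steps: all processors evaluate the condition, then the then-branch is issued while the other processors idle, then the else-branch. The Python sketch below simulates that picture sequentially. The name `lockstep_upwind` is hypothetical, and restricting the loop to interior indices (so that U[i-1] and U[i+1] exist) is an assumption not stated in the example.

```python
def lockstep_upwind(u, dx):
    """Simulate the lockstep loop for the interior indices of u (0-based)."""
    idx = range(1, len(u) - 1)
    # Step 1: every processor evaluates the condition at the same time.
    mask = {i: u[i] > 0 for i in idx}
    f = {}
    # Step 2: the then-branch is issued; processors whose condition is
    # false idle during this step.
    for i in idx:
        if mask[i]:
            f[i] = (u[i] - u[i - 1]) / dx
    # Step 3: the else-branch is issued; the remaining processors idle.
    for i in idx:
        if not mask[i]:
            f[i] = (u[i + 1] - u[i]) / dx
    return [f[i] for i in idx]


print(lockstep_upwind([1.0, 2.0, -1.0, -3.0, 0.5], 0.5))  # [2.0, -4.0, 7.0]
```

In this picture the loop costs the time of both branches, even though each processor performs only one of them.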
[Figure: broadcast of the value 5; the number of copies doubles in each step: 2, 4, 8]
BroadcastPRAM(x, A : Array[1..n]) {
   // n assumed to be 2^k
   A[1] := x;
   for i from 0 to k-1 do
      for j from 2^i+1 to 2^(i+1) do in parallel {
         A[j] := A[j-2^i];
      }
}
Complexity:
$T(n) = \Theta(\log n)$ on $n/2$ processors
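The doubling scheme of the broadcast can be checked with a short sequential simulation. The sketch below is illustrative only: the function name `broadcast_pram` and the returned step counter are assumptions, and a snapshot of the array stands in for the rule that all reads of a parallel step happen before its writes.

```python
def broadcast_pram(x, n):
    """Simulate the broadcast for n = 2**k; returns (array, parallel steps)."""
    k = n.bit_length() - 1
    if n < 1 or n != 2 ** k:
        raise ValueError("n is assumed to be a power of two")
    a = [None] * (n + 1)  # a[0] is unused to keep the 1-based indexing
    a[1] = x
    steps = 0
    for i in range(k):
        old = a[:]  # snapshot: all reads precede all writes in this step
        # Conceptually parallel: copy the first 2**i cells to the next 2**i.
        for j in range(2 ** i + 1, 2 ** (i + 1) + 1):
            a[j] = old[j - 2 ** i]
        steps += 1
    return a[1:], steps


print(broadcast_pram(5, 8))  # ([5, 5, 5, 5, 5, 5, 5, 5], 3)
```

Each cell is read by at most one processor per step, and the last step copies n/2 cells, which matches the stated processor count.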
Example (minimum of 8 elements; each step keeps the smaller element of every pair):
4 7 3 9 5 6 10 8
4 3 5 8
3 5
3
MinimumPRAM(A : Array[1..n]) : Integer {
   // n assumed to be 2^k
   // Model: EREW PRAM
   for i from 1 to k do
      for j from 1 to n/(2^i) do in parallel {
         if A[2j-1] < A[2j]
            then A[2j] := A[2j-1];
         end if;
         A[j] := A[2j];
      }
   return A[1];
}
Complexity: $T(n) = \Theta(\log n)$ on $n/2$ processors
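A sequential Python sketch of the minimum search makes the halving explicit. It is a simplification under stated assumptions: the name `minimum_pram` and the returned list of intermediate rows are hypothetical, and the two assignments of the pseudocode are collapsed into a single `min` on a snapshot of the array.

```python
def minimum_pram(values):
    """Simulate MinimumPRAM; returns (minimum, list of intermediate rows)."""
    n = len(values)
    k = n.bit_length() - 1
    if n < 1 or n != 2 ** k:
        raise ValueError("n is assumed to be a power of two")
    a = [None] + list(values)  # a[0] is unused to keep the 1-based indexing
    rows = []
    for i in range(1, k + 1):
        width = n // 2 ** i  # number of active processors in this step
        old = a[:]  # snapshot: lockstep means all reads precede all writes
        for j in range(1, width + 1):  # conceptually in parallel
            a[j] = min(old[2 * j - 1], old[2 * j])
        rows.append(a[1:width + 1])
    return a[1], rows


print(minimum_pram([4, 7, 3, 9, 5, 6, 10, 8]))
# (3, [[4, 3, 5, 8], [3, 5], [3]])
```

The k = log2(n) iterations of the outer loop correspond to the logarithmic running time, and the first iteration uses the maximum of n/2 processors.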
// returns the smallest index i with A[i] = x, or n+1 if x does not occur in A
BroadcastPRAM(x, X[1..n]);
for i from 1 to n do in parallel {
   if A[i] = X[i]
      then X[i] := i;
      else X[i] := n+1;   // (invalid index)
   end if;
}
return MinimumPRAM(X[1..n]);
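The combination of broadcast, comparison, and minimum search can be sketched in Python as follows. The function name `search_pram` is hypothetical, and the broadcast and the minimum search are replaced by simple stand-ins (a list of copies and a pairwise reduction), so this is a sketch of the idea rather than a faithful PRAM program.

```python
def search_pram(a, x):
    """Return the smallest 1-based index i with a[i-1] == x, else n + 1."""
    n = len(a)
    if n < 1 or n & (n - 1) != 0:
        raise ValueError("n is assumed to be a power of two")
    # Stand-in for the broadcast: every processor gets its own copy of x,
    # so the comparisons below need no concurrent read.
    xs = [x] * n
    # One parallel step: processor i compares A[i] with its copy of x and
    # stores either its index or the invalid index n + 1.
    xs = [i if a[i - 1] == xs[i - 1] else n + 1 for i in range(1, n + 1)]
    # Stand-in for the minimum search: pairwise reduction, halving the
    # number of candidates in every step.
    while len(xs) > 1:
        xs = [min(xs[2 * j], xs[2 * j + 1]) for j in range(len(xs) // 2)]
    return xs[0]


print(search_pram([4, 7, 3, 9, 5, 6, 10, 8], 9))   # 4
print(search_pram([4, 7, 3, 9, 5, 6, 10, 8], 11))  # 9 (invalid index n + 1)
```

Returning the minimum index resolves the question of multiple matches without requiring concurrent writes.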
Idea (here $A_{i:j}$ denotes the product $A_i A_{i+1} \cdots A_j$):
1. compute prefix problem for $A_1, \ldots, A_{n/2}$
   gives $A_{1:1}, \ldots, A_{1:n/2}$
2. compute prefix problem for $A_{n/2+1}, \ldots, A_n$
   gives $A_{n/2+1:n/2+1}, \ldots, A_{n/2+1:n}$
3. multiply $A_{1:n/2}$ with all $A_{n/2+1:n/2+1}, \ldots, A_{n/2+1:n}$
   gives $A_{1:n/2+1}, \ldots, A_{1:n}$
Parallelism:
steps 1 and 2 can be computed in parallel (divide)
all multiplications in step 3 can be computed in parallel
recursive extension leads to parallel prefix scheme
[Figure: parallel prefix scheme on A1, A2, ..., A8]
PrefixPRAM(A : Array[1..n]) {
   // n assumed to be 2^k
   // Model: CREW PRAM (n/2 processors)
   for l from 0 to k-1 do
      for p from 2^l by 2^(l+1) to n do in parallel
         for j from 1 to 2^l do in parallel {
            A[p+j] := A[p] * A[p+j];
         }
}
Comments:
p- and j-loop together: n/2 multiplications per iteration of the l-loop
concurrent read access to A[p] in the innermost loop
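The CREW prefix scheme can be simulated sequentially as a check of the index ranges. In this sketch the name `prefix_crew` and the `op` parameter are assumptions; the operation defaults to multiplication, and a snapshot of the array plays the role of the synchronous parallel step.

```python
import operator


def prefix_crew(values, op=operator.mul):
    """Simulate the CREW prefix scheme for n = 2**k elements."""
    n = len(values)
    k = n.bit_length() - 1
    if n < 1 or n != 2 ** k:
        raise ValueError("n is assumed to be a power of two")
    a = [None] + list(values)  # a[0] is unused to keep the 1-based indexing
    for l in range(k):
        old = a[:]  # snapshot: all reads precede all writes in this step
        # p marks the last element of each left block of length 2**l.
        for p in range(2 ** l, n + 1, 2 ** (l + 1)):
            # old[p] is read by 2**l processors at once (concurrent read).
            for j in range(1, 2 ** l + 1):
                a[p + j] = op(old[p], old[p + j])
    return a[1:]


print(prefix_crew([1, 2, 3, 4, 5, 6, 7, 8]))
# [1, 2, 6, 24, 120, 720, 5040, 40320]
```

Because the left operand is always the earlier partial product, the sketch also works for associative operations that are not commutative, for example `prefix_crew(["a", "b", "c", "d"], operator.add)`.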
[Figure: parallel prefix scheme on A1, A2, ..., A8, EREW variant]
PrefixPRAM(A : Array[1..n]) {
   // n assumed to be 2^k
   // Model: EREW PRAM (n-1 processors)
   for l from 0 to k-1 do
      for j from 2^l+1 to n do in parallel {
         tmp[j] := A[j-2^l];
         A[j] := tmp[j] * A[j];
      }
}
Comment:
all processors execute tmp[j] := A[j-2^l] before multiplication!
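The two-phase structure of the EREW variant (first all fetches, then all multiplications) can be made explicit in a sequential Python sketch. The name `prefix_erew` and the `op` parameter are hypothetical; a dictionary `tmp` stands in for the per-processor temporaries.

```python
import operator


def prefix_erew(values, op=operator.mul):
    """Simulate the EREW prefix scheme for n = 2**k elements."""
    n = len(values)
    k = n.bit_length() - 1
    if n < 1 or n != 2 ** k:
        raise ValueError("n is assumed to be a power of two")
    a = [None] + list(values)  # a[0] is unused to keep the 1-based indexing
    for l in range(k):
        # Phase 1: every processor j fetches A[j - 2**l]; each cell is
        # read by exactly one processor, so the reads are exclusive.
        tmp = {j: a[j - 2 ** l] for j in range(2 ** l + 1, n + 1)}
        # Phase 2: every processor combines its fetched value with its own
        # cell; this starts only after all fetches have finished.
        for j in tmp:
            a[j] = op(tmp[j], a[j])
    return a[1:]


print(prefix_erew([1, 2, 3, 4, 5, 6, 7, 8]))
# [1, 2, 6, 24, 120, 720, 5040, 40320]
```

The first iteration keeps n-1 processors busy, which matches the processor count in the pseudocode, whereas the CREW variant needs only n/2.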