Lec6 PRAMalgs
(cont.)
Parallel Computing
Fall 2022
PRAM Algorithm:
Matrix Multiplication
A simple algorithm for multiplying two n × n matrices on a CREW
PRAM with time complexity T = O(lg n) and $P = n^3$ processors follows. For
convenience, processors are indexed as triples (i, j, k), where i, j, k =
1, . . . , n. In the first step, processor (i, j, k) concurrently reads $a_{ij}$ and
$b_{jk}$ and computes the product $a_{ij} b_{jk}$. In the following steps, for all i,
k, the results held by processors (i, ∗, k) are combined using the parallel
sum algorithm to form $c_{ik} = \sum_{j=1}^{n} a_{ij} b_{jk}$. After lg n further
steps, the result $c_{ik}$ is thus computed.
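The two phases above can be sketched as a sequential Python simulation. This is only an illustration under stated assumptions: the name pram_matmul, the list-of-lists layout, and the stride-doubling form of the parallel sum are choices made here, not taken from the lecture. Each pass of the while loop stands for one synchronous parallel step.

```python
def pram_matmul(a, b):
    """Simulate the n^3-processor CREW algorithm; returns (c, rounds).

    Hypothetical helper: `rounds` counts the parallel-sum steps only.
    """
    n = len(a)
    # Step 1: processor (i, j, k) reads a[i][j] and b[j][k] and multiplies.
    # prod[i][k][j] is the term held by processor (i, j, k).
    prod = [[[a[i][j] * b[j][k] for j in range(n)]
             for k in range(n)] for i in range(n)]

    # Following steps: for every (i, k), sum the n terms pairwise.
    stride, rounds = 1, 0
    while stride < n:
        for i in range(n):
            for k in range(n):
                terms = prod[i][k]
                # Within one round every cell is touched by at most one
                # addition, so the sequential order does not matter.
                for j in range(0, n - stride, 2 * stride):
                    terms[j] += terms[j + stride]
        stride *= 2
        rounds += 1

    c = [[prod[i][k][0] for k in range(n)] for i in range(n)]
    return c, rounds


c, rounds = pram_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]])
print(c, rounds)
```

The round count grows as the ceiling of lg n, matching the O(lg n) bound for the summation phase.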
The same algorithm also works on the EREW PRAM with the same time
and processor complexity. Only the first step of the CREW algorithm needs
to change. We avoid concurrent reads by broadcasting element $a_{ij}$
to processors (i, j, ∗) using the broadcasting algorithm of the EREW
PRAM in O(lg n) steps. Similarly, $b_{jk}$ is broadcast to processors (∗, j, k).
The above algorithm also shows how an n-processor EREW PRAM can
simulate an n-processor CREW PRAM with an O(lg n) slowdown.
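The broadcast step can be illustrated with a short simulation. The lecture does not spell out the broadcasting algorithm here, so this sketch assumes the usual doubling scheme, in which the number of copies doubles each round; the name erew_broadcast and the cell layout are hypothetical.

```python
def erew_broadcast(value, n):
    """Copy `value` into n cells (n >= 1) by doubling; returns (cells, rounds)."""
    cells = [None] * n
    cells[0] = value
    filled, rounds = 1, 0
    while filled < n:
        # One parallel round: processor p reads cell p and writes cell
        # filled + p. Every cell is read or written by at most one
        # processor, so no concurrent access occurs.
        for p in range(min(filled, n - filled)):
            cells[filled + p] = cells[p]
        filled = min(2 * filled, n)
        rounds += 1
    return cells, rounds


print(erew_broadcast(7, 8))
```

With n copies needed, the loop runs for the ceiling of lg n rounds, which is where the O(lg n) cost of the first EREW step comes from.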
Matrix Multiplication
Step                                             CREW      EREW
1. a_ij to all (i,j,*) procs                     O(1)      O(lg n)
   b_jk to all (*,j,k) procs                     O(1)      O(lg n)
2. a_ij * b_jk at (i,j,k) proc                   O(1)      O(1)
3. parallel sum_j a_ij * b_jk at (i,*,k) procs   O(lg n)   O(lg n)   (n procs participate)
4. c_ik = sum_j a_ij * b_jk                      O(1)      O(1)
PRAM Algorithm:
Logical AND operation
Problem. Let $X_1, \ldots, X_n$ be binary/boolean values. Find
$X = X_1 \wedge X_2 \wedge \cdots \wedge X_n$.
The problem admits a direct sequential solution with P = 1, T = O(n),
W = O(n).
An EREW PRAM algorithm for this problem works the same way as the
PARALLEL SUM algorithm, with addition replaced by AND. Its performance
is P = O(n), T = O(lg n), W = O(n lg n), and the improvements in P and W
mentioned for the PARALLEL SUM algorithm apply here as well.
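A minimal sketch of this reduction, assuming the PARALLEL SUM algorithm is the usual pairwise tree reduction; the function name erew_and and the stride-doubling layout are illustrative choices, not taken from the lecture.

```python
def erew_and(xs):
    """AND of a non-empty list of 0/1 values; returns (result, rounds)."""
    vals = list(xs)
    n = len(vals)
    stride, rounds = 1, 0
    while stride < n:
        # One parallel round: the processor at position i combines cells i
        # and i + stride. The cell pairs of one round are disjoint, so all
        # reads and writes are exclusive.
        for i in range(0, n - stride, 2 * stride):
            vals[i] = vals[i] & vals[i + stride]
        stride *= 2
        rounds += 1
    return vals[0], rounds


print(erew_and([1, 1, 1, 1]))
print(erew_and([1, 0, 1, 1, 1]))
```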
In the remainder we investigate a CRCW PRAM algorithm. Let
binary value $X_i$ reside in shared memory location i. We can find
$X = X_1 \wedge X_2 \wedge \cdots \wedge X_n$ in constant time on a CRCW PRAM.
Processor 1 first writes a 1 in shared memory cell 0. Then, if $X_i = 0$,
processor i writes a 0 in memory cell 0. The result X is then stored in
this memory cell.
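The result stored in cell 0 is 1 (TRUE) unless some processor writes a 0 in
cell 0. In that case at least one of the $X_i$ is 0 (FALSE), so the result X
should be FALSE, which is exactly what cell 0 holds. All concurrent writers
write the same value 0, so the write conflict is harmless.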
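The constant-time algorithm can be mimicked sequentially as follows. This is a sketch, not a parallel implementation: the function name crcw_and is hypothetical, and the concurrent write of step 2 is modeled by collecting all attempted writes and checking that they agree.

```python
def crcw_and(xs):
    """AND of 0/1 values using the two-step CRCW scheme (simulated)."""
    n = len(xs)
    # memory[0] is the result cell; memory[i] holds X_i for i = 1..n.
    memory = [None] + list(xs)

    # Step 1: processor 1 writes a 1 into cell 0.
    memory[0] = 1

    # Step 2: every processor i with X_i = 0 writes a 0 into cell 0.
    # On a real CRCW PRAM these writes happen in the same time step.
    attempted = [0 for i in range(1, n + 1) if memory[i] == 0]
    if attempted:
        # All writers agree on the value, so any conflict rule gives 0.
        assert all(w == 0 for w in attempted)
        memory[0] = 0

    return memory[0]


print(crcw_and([1, 1, 1]))
print(crcw_and([1, 0, 1, 0]))
```

Both steps take O(1) time regardless of n, in contrast to the O(lg n) rounds of the EREW reduction.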
End
Thank you!