Notes 04
1 Overview
In the last lecture we covered the Circuit Model, defined runtime and work, and proved Brent's Theorem.
In this lecture we will elaborate on runtime, work, and cost and define them without a dependence on the number of processors, p. We will also define parallelism and work efficiency.
We want to define runtime, work, and cost of an algorithm independently of the number of proces-
sors, p, that will be used.
We’ll use the following summation algorithm to illustrate the next few points:
sum(A[1 ... n]) {
    if n == 1
        return A[1]
    else
        in parallel do {
            l = sum(A[1 ... n/2])
            r = sum(A[n/2 + 1 ... n])
        }
        return l + r
}
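As an illustration, here is a minimal sketch of this recursive parallel sum in Go; the function name, the use of goroutines for the fork/join, and the assumption that n ≥ 1 are our own choices, not part of the lecture.

package main

import (
    "fmt"
    "sync"
)

// sum splits the slice in half and sums the two halves in parallel,
// mirroring the pseudocode above (base case: a single element; assumes n >= 1).
func sum(a []int) int {
    if len(a) == 1 {
        return a[0]
    }
    mid := len(a) / 2
    var left, right int
    var wg sync.WaitGroup
    wg.Add(1)
    go func() {
        defer wg.Done()
        left = sum(a[:mid]) // sum(A[1 ... n/2]) in parallel
    }()
    right = sum(a[mid:]) // sum(A[n/2 + 1 ... n]) on the current goroutine
    wg.Wait()
    return left + right
}

func main() {
    a := []int{3, 1, 4, 1, 5, 9, 2, 6}
    fmt.Println(sum(a)) // 31
}

The goroutine per split plays the role of the "in parallel do" block; the WaitGroup is the join before returning l + r.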
Let’s say that the number of processors, p, is equal to the number of elements, n, being summed.
Now, we’ll calculate the runtime, work and cost for this algorithm. Figure 1 is a depiction of the
work done on each level of the recursive Sum algorithm.
Similarly to solving a recurrence, we can define the runtime as follows:

t_{p=n}(n) = t_{p/2}(n/2) + Θ(1)   if n > 1
           = Θ(1)                  otherwise
           = Θ(log n)
By looking at Figure 1, we can see that the equation given above for runtime is true. We know that constant work is done at each node. The array is split in half on each level, which gives us the t_{p/2}(n/2) term when n > 1. We only need to follow one path down the tree since we assume that the nodes on each level are done in parallel. When n = 1, we hit the base case and constant work is done at each node.
It may seem odd that the number of processors has been halved in this equation:

t_{p=n}(n) → t_{p/2}(n/2) + Θ(1)
We can see that this works out because of our assumption that p = n. Writing the runtime as t(n, p):

t(n, p) = t(n/2, p/2) + Θ(1)
t(n, n) = t(n/2, n/2) + Θ(1)

Let g(n) = t(n, n); then g(n) = g(n/2) + Θ(1), which can then be solved:

g(n) = Θ(log n)
t(n, n) = Θ(log n)
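As a quick sanity check (our own unrolling, not from the lecture), assume n is a power of two and write the constant hidden in the Θ(1) term as c:

\[
g(n) = g(n/2) + c = g(n/4) + 2c = \cdots = g(1) + c \log_2 n = \Theta(\log n).
\]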
We can now determine the work done in the algorithm:

w(n) = 2w(n/2) + Θ(1)   if n > 1
     = Θ(1)             otherwise
     = Θ(n)
Looking at the tree in Figure 1, the right column shows the amount of work done on each level. We can see that there are log n + 1 levels and each level has n/2^i work, where i is equal to the level. We see that the work could then be described as

Σ_{i=0}^{log n} n/2^i = 2n − 1 ≤ 2n
Thus, we get the idea that work = Θ(n), which can be proved using substitution or the Master
method. We can see that work is the same as it would be for the sequential sum algorithm.
We previously defined cost to be p · t_p(n). Thus,

cost = n · Θ(log n) = Θ(n log n)
We can see that cost is not equal to work. In many cases cost and work are the same, but not always, as we have seen here. Essentially, work is a measure of the total number of operations used in the algorithm, i.e., the number of operations that one processor would need to perform when p = 1. Cost, on the other hand, depends on the number of processors being used. When p is large (as here, with p = n), some of the processors are not utilized on most of the levels, so cost is greater than the actual number of operations needed by the algorithm. Regardless of p, at least w(n) operations must be performed, so the following must hold: cost(n) ≥ w(n).
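To make the gap between cost and work concrete, here is a hypothetical instance (our own numbers, charging one unit of time per tree level): take n = p = 1024. Then

\[
w(n) = 2n - 1 = 2047, \qquad t_p(n) \approx \log_2 n + 1 = 11, \qquad
\mathrm{cost} = p \cdot t_p(n) \approx 1024 \cdot 11 = 11264 \gg w(n).
\]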
We can reduce the cost by modifying the algorithm so that the recursion stops once a subarray has at most log N elements, where N is the original input size; such a subarray is then summed sequentially by a single processor:

sum(A[1 ... n], N) {
    if n ≤ log N {
        sum = 0
        for i = 1 to n { sum = sum + A[i] }
        return sum
    }
    else
        in parallel do {
            l = sum(A[1 ... n/2], N)
            r = sum(A[n/2 + 1 ... n], N)
        }
        return l + r
}
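A corresponding Go sketch of this variant (our own code, with hypothetical names; the cutoff log N is computed from the original input size n, which plays the role of the extra parameter N in the pseudocode):

package main

import (
    "fmt"
    "math"
    "sync"
)

// sumCutoff mirrors the modified pseudocode: once a subarray has at most
// log2(N) elements, one goroutine sums it sequentially; above the cutoff it
// recurses on the two halves in parallel.
func sumCutoff(a []int, n int) int {
    cutoff := int(math.Log2(float64(n)))
    if cutoff < 1 {
        cutoff = 1
    }
    if len(a) <= cutoff {
        s := 0
        for _, v := range a { // sequential base case: Θ(log N) work per leaf
            s += v
        }
        return s
    }
    mid := len(a) / 2
    var left, right int
    var wg sync.WaitGroup
    wg.Add(1)
    go func() {
        defer wg.Done()
        left = sumCutoff(a[:mid], n)
    }()
    right = sumCutoff(a[mid:], n)
    wg.Wait()
    return left + right
}

func main() {
    a := make([]int, 1024)
    for i := range a {
        a[i] = 1
    }
    fmt.Println(sumCutoff(a, len(a))) // 1024
}

The sequential loop at the base case is exactly the Θ(log N) leaf work discussed below.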
We can see in Figure 2, below, that the tree for this algorithm is slightly different from the earlier one.
We will now determine the runtime of this algorithm:

t_p(n) = t_{p/2}(n/2) + Θ(1)   if n > log N
       = Θ(n)                  otherwise
       = Θ(log N)
We see that the depth of the tree above the leaves is now log(N/log N). We obtain this by taking the depth of the original tree, log N, and subtracting the depth of the part below the level where n = log N:

log(N/log N) = log N − log log N = log p
Each level above the leaves requires constant time because there are p processors that work in parallel. At the last level, each leaf has log N elements. These elements are summed by one processor, which takes linear time, so each leaf takes Θ(log N) time to process. By this reasoning, we get:

t_p(n) = Θ(log N − log log N + log N)
       = Θ(log N)
In order to achieve this runtime, we need a processor for each leaf, so

p = 2^{log(N/log N)} = N/log N
The work of this algorithm satisfies essentially the same recurrence as before, w(n) = 2w(n/2) + Θ(1), now with a Θ(log N)-work base case at each leaf, so

w(n) = Θ(n)

We can use the Master Method to verify that this is true: compare Θ(1) to n^{log_2 2} = n. Since Θ(1) = O(n^{1−ε}), by Case 1,

w(n) = Θ(n)
Using the equation stated earlier, we can determine the cost:

cost(N) = t_p(N) · p
        = Θ(log N) · N/log N
        = Θ(N)
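As a hypothetical numeric check (our own numbers, charging one unit per parallel step): take N = 2^16 = 65536, so log N = 16 and log log N = 4. Then

\[
p = \frac{N}{\log N} = 4096, \qquad \log\frac{N}{\log N} = \log N - \log\log N = 12,
\]
\[
t_p \approx 12 + 16 = 28 = \Theta(\log N), \qquad
\mathrm{cost} = p \cdot t_p \approx 4096 \cdot 28 = 114688 = \Theta(N).
\]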
If we didn't know about Brent's Theorem, then we would instead design this algorithm so that the base case has size N/p. We can see the tree for this algorithm in Figure 3. Its runtime is

t_p(n) = t_{p/2}(n/2) + Θ(1)   if n > N/p
       = Θ(n)                  otherwise
       = Θ(log p + N/p)
We would want cost to equal work. We know that cost is equal to Θ(p log p + N ), and work
is equal to Θ(N ). In order to make cost equal to Θ(N ), we need the term p log p to be dominated
by N. If we choose p ≤ N/log N, then we can see that p log p < N; substituting p = N/log N:

(N/log N) · log(N/log N) < N
(N/log N) · (log N − log log N) < N
N · (1 − (log log N)/log N) < N
N − N(log log N)/log N < N

For N > 2 we have log log N > 0, so the term N(log log N)/log N is positive and the inequality holds. So, we say that p ≤ N/log N.
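Checking the inequality on the same hypothetical instance N = 2^16 = 65536:

\[
p = \frac{N}{\log N} = 4096, \qquad p \log p = 4096 \cdot 12 = 49152 < 65536 = N.
\]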
In our summation example earlier, w(N) = Θ(N) and t∞(N) = Θ(log N), since the sum cannot be completed in less time than the critical path.
Theorem 4. If p ≤ w(n)/t∞(n), then cost = Θ(w(n)).

Proof. t_p(n) = w(n)/p + t∞(n) (by Brent's Theorem; note that t∞(n) = T(n) for circuits).

cost = p · t_p(n)
     = p(w(n)/p + t∞(n))
     = w(n) + p · t∞(n)
     ≤ w(n) + w(n)        (since p ≤ w(n)/t∞(n))
     = 2w(n)

Combined with cost(n) ≥ w(n), this gives cost = Θ(w(n)).
Thus, if we create an algorithm so that it has maximum parallelism and use no more than w(n)/t∞(n) processors, then cost will be equal to work.
We achieve this by keeping the algorithm work efficient while making the critical path, t∞(n), as short as possible. Then

t_p(n) = w(n)/p + t∞(n),

and minimizing t∞(n) maximizes w(n)/t∞(n), which lets us use a larger p; a larger p makes the w(n)/p term, and hence the runtime, smaller. Since t∞(n) cannot be greater than w(n)/p as long as p ≤ w(n)/t∞(n), we need to minimize t∞(n).
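Using the usual definition of parallelism as the ratio w(n)/t∞(n) (the same quantity that bounds p in Theorem 4), the summation example has

\[
\mathrm{parallelism} = \frac{w(n)}{t_\infty(n)} = \frac{\Theta(n)}{\Theta(\log n)} = \Theta\!\left(\frac{n}{\log n}\right),
\]

so up to Θ(n/log n) processors can be used productively while keeping cost = Θ(w(n)).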
Theorem 5. An algorithm A that runs in time T_A(n) on a p-processor CRCW PRAM can be implemented on a p-processor EREW PRAM in time t′_p = Θ(T_A(n) · log p).