
ICS 643: Advanced Parallel Algorithms Fall 2016

Lecture 4 — September 14, 2016


Prof. Nodari Sitchinava Scribe: Tiffany Eulalio

1 Overview

In the last lecture we covered the Circuit Model, defined runtime and work, and proved Brent’s
Theorem. We previously defined these as:

• Runtime: t_p(n) = t(n, p), where p is the number of processors

• Cost: cost(n) = t_p(n) · p

• Work: w(n) = total number of operations

In this lecture we will elaborate on runtime, work, and cost, and define them without a dependence on the number of processors, p. We will also define parallelism and work efficiency.

2 Runtime, Work & Cost

We want to define the runtime, work, and cost of an algorithm independently of the number of processors, p, that will be used.

2.1 Cost vs. Work when p = n

We’ll use the following summation algorithm to illustrate the next few points:
sum(A[1...n]) {
    if n == 1
        { return A[1] }
    else
        in parallel do {
            l = sum(A[1...n/2])
            r = sum(A[n/2 + 1...n])
        }
    return l + r
}
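
For concreteness, here is a minimal Python sketch of the same recursion (the function name parallel_sum and the example call are illustrative, not part of the lecture). It executes sequentially; the comments mark where a PRAM with p = n processors would run the two recursive calls in parallel.

def parallel_sum(A, lo, hi):
    # Sum A[lo:hi] (hi exclusive) by recursive halving.
    n = hi - lo
    if n == 1:
        return A[lo]                     # base case: a single element
    mid = lo + n // 2
    # On a PRAM with p = n processors, the next two calls would execute in
    # parallel, each with half the processors, giving the recurrence
    # t_{p/2}(n/2) + Theta(1).
    l = parallel_sum(A, lo, mid)
    r = parallel_sum(A, mid, hi)
    return l + r                         # Theta(1) combine step

# Example: parallel_sum(list(range(1, 9)), 0, 8) == 36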

Let’s say that the number of processors, p, is equal to the number of elements, n, being summed.
Now, we’ll calculate the runtime, work and cost for this algorithm. Figure 1 is a depiction of the
work done on each level of the recursive Sum algorithm.

Similarly to solving a recurrence, we can define the runtime as follows:

    t_{p=n}(n) = t_{p/2}(n/2) + Θ(1)   if n > 1
                 Θ(1)                   otherwise
               = Θ(log n)
By looking at Figure 1, we can see that the equation given above for the runtime holds. We know that constant work is done at each node. The array is split in half on each level, which gives us the t_{p/2}(n/2) term when n > 1. We only need to follow one path down the tree since we assume that the nodes on each level are processed in parallel. When n = 1, we hit the base case and constant work is done at each node.

It may seem odd that the number of processors has been halved in this recurrence:

    t_{p=n}(n) → t_{p/2}(n/2) + Θ(1)

We can see that this works out because of our assumption that p = n:

    t_p(n) = t(n, p) = t(n/2, p/2) + Θ(1)
    t(n, n) = t(n/2, n/2) + Θ(1)

Let g(n) = t(n, n); then g(n) = g(n/2) + Θ(1), which can then be solved:

    g(n) = Θ(log n)
    t(n, n) = Θ(log n)
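
For completeness, we can unroll this recurrence (writing c for the hidden constant in the Θ(1) term) to see where the logarithm comes from:

    g(n) = g(n/2) + c = g(n/4) + 2c = ... = g(1) + c·log n = Θ(log n)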
We can now determine the work done in the algorithm:

    w(n) = 2·w(n/2) + Θ(1)   if n > 1
           Θ(1)               otherwise
         = Θ(n)

Looking at the tree in Figure 1, the right column shows the amount of work done on each level. We can see that there are log n + 1 levels and level i has n/2^i work (counting level i = 0 at the leaves). The total work can then be described as

    Σ_{i=0}^{log n} n/2^i = 2n − 1 ≤ 2n

Thus, we get the idea that work = Θ(n), which can be proved using substitution or the Master method. We can see that the work is the same as it would be for the sequential sum algorithm.
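
As a quick sanity check (an illustration, not part of the original notes), the unit-cost version of the recurrence w(n) = 2·w(n/2) + Θ(1) gives exactly 2n − 1 operations when n is a power of two, matching the sum above:

def work(n):
    # Unit-cost version of w(n) = 2*w(n/2) + Theta(1), with w(1) = 1.
    if n == 1:
        return 1
    return 2 * work(n // 2) + 1

# work(2**k) == 2 * 2**k - 1 for every k >= 0; for example, work(8) == 15.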
We previously defined cost to be p · t_p(n). Thus,

    cost = n · Θ(log n) = Θ(n log n)

2.1.1 Work vs. Cost

We can see that cost is not equal to work here. In many cases cost is the same as work, but not always, as this example shows. Essentially, work is a measure of the total number of operations used in the algorithm, or the number of operations that one processor would need to perform when p = 1. Cost, on the other hand, depends on the number of processors being used. If p ≥ n, some of the processors will not be utilized on most of the levels, meaning that cost will be greater than the actual number of operations needed by the algorithm. If p < n, then w(n) work will still be done. We can see that the following must then be true: cost(n) ≥ w(n).
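
For instance, with n = 8 and p = 8 processors, only about 2·8 − 1 = 15 additions of work are ever performed, but the cost charges all 8 processors for each of the Θ(log 8) parallel time steps, on the order of 8 · 3 = 24 processor-steps, because most processors sit idle on the levels near the root of the tree.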

2.2 Summation with log(n) base case

Let’s rewrite the earlier sum algorithm as follows:

sum(A[1...n], N) {
    if n ≤ log N {
        sum = 0
        for i = 1 to n { sum = sum + A[i] }
        return sum
    }
    else
        in parallel do {
            l = sum(A[1...n/2], N)
            r = sum(A[n/2 + 1...n], N)
        }
    return l + r
}
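
The following Python sketch mirrors this variant (the names and the exact cutoff computation are illustrative, not from the notes): subarrays of at most log N elements are summed sequentially, as a single leaf processor would, and above that the halving recursion is unchanged.

import math

def sum_logn_base(A, lo, hi, N):
    n = hi - lo
    if n <= max(1, int(math.log2(N))):   # base case: at most log N elements
        total = 0
        for i in range(lo, hi):          # one processor sums sequentially,
            total += A[i]                # taking Theta(log N) time
        return total
    mid = lo + n // 2
    # On the PRAM, the next two calls would execute in parallel.
    l = sum_logn_base(A, lo, mid, N)
    r = sum_logn_base(A, mid, hi, N)
    return l + r

# Example: sum_logn_base(list(range(1, 65)), 0, 64, 64) == 2080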

We can see in Figure 2, below, that the tree for this algorithm is slightly different from the earlier one.

We will now determine the runtime of this algorithm:

    t_p(n) = t_{p/2}(n/2) + Θ(1)   if n > log N
             Θ(n)                   otherwise
           = Θ(log N)
 
We see that the depth of the tree is now log(N/log N). We obtain this by taking the depth of the original tree, log N, and subtracting the depth of the subtrees below the level where the subproblem size reaches log N:

    log(N/log N) = log N − log log N = log p
Each level above the leaves requires constant time because there are p processors that work in parallel. Each leaf holds log N elements, which need to be summed up by one processor; this takes time linear in the number of elements, so the leaves take Θ(log N) time to process. By this reasoning, we get:

    t_p(n) = Θ(log N − log log N + log N)
           = O(log N)

In order to achieve this runtime, we need a processor for each leaf, so

    p = 2^{log(N/log N)} = N/log N
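
As a concrete illustration, with N = 2^20 = 1,048,576 elements this gives p = N/log N = 2^20/20 ≈ 52,429 processors, far fewer than the N processors used by the first version, while keeping the O(log N) runtime.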

The work done in the algorithm is

    w(n) = 2·w(n/2) + Θ(1)   if n > log N
           Θ(log N)           otherwise
         = Θ(n)
We can use the Master Method to verify that this is true: compare Θ(1) to n^{log_2 2} = n. Since Θ(1) = O(n^{1−ε}) for some ε > 0, Case 1 applies and

    w(n) = Θ(n)
Using the equation stated earlier, we can determine the cost:

    cost(n) = t_p(n) · p
            = Θ(log N) · N/log N
            = Θ(N)

2.3 Without Brent’s Theorem

If we didn’t know about Brent’s Theorem, then we would instead design this algorithm so that the base case has size N/p. We can see the tree for this algorithm in Figure 3.

The runtime will now be

    t_p(n) = t_{p/2}(n/2) + Θ(1)   if n > N/p
             Θ(n)                   otherwise
           = Θ(log p + N/p)

Work would still be Θ(N), and cost will be Θ(p log p + N).
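
A Python sketch of this variant (illustrative names; the block parameter plays the role of N/p): each leaf sums a block of about N/p elements sequentially, and the p partial results are combined up a tree of depth log p.

def sum_blocked(A, lo, hi, block):
    n = hi - lo
    if n <= block:                       # base case: one block of ~N/p elements,
        total = 0                        # summed by one processor in Theta(N/p)
        for i in range(lo, hi):
            total += A[i]
        return total
    mid = lo + n // 2
    # On the PRAM, the next two calls would execute in parallel; the combine
    # steps above the blocks form a tree of depth log p.
    l = sum_blocked(A, lo, mid, block)
    r = sum_blocked(A, mid, hi, block)
    return l + r

# Example with N = 64 elements and p = 8 processors (block = N/p = 8):
# sum_blocked(list(range(1, 65)), 0, 64, 8) == 2080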

We would want cost to equal work. We know that cost is equal to Θ(p log p + N), and work is equal to Θ(N). In order to make cost equal to Θ(N), we need the term p log p to be dominated by N. If we choose p where p < N/log N, then we can see that p log p < N:

    (N/log N) · log(N/log N) < N
    (N/log N) · (log N − log log N) < N
    N · (1 − (log log N)/log N) < N
    N − N·(log log N)/log N < N

For N > 2, the term N·(log log N)/log N is positive, so the inequality holds.

So, we say that p < N/log N.
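
A quick numeric check of this bound (an illustration, not from the notes): for p = N/log N, the product p · log p stays below N over a wide range of input sizes.

import math

for k in range(4, 25):
    N = 2 ** k
    p = N / math.log2(N)                 # choose p = N / log N
    assert p * math.log2(p) < N          # hence cost = Theta(p log p + N) = Theta(N)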

3 Efficiency & Parallelism

Definition 1. Efficiency is defined as t_1(n)/w(n), where t_1(n) is the runtime of the best sequential algorithm and w(n) is the work of a parallel algorithm. Efficiency is always less than or equal to one since w(n) ≥ t_1(n).

Definition 2. A parallel algorithm is work-efficient if efficiency is equal to 1.


Definition 3. We define parallelism to be w(n)/t_∞(n), the work divided by the time needed to complete the algorithm with the maximum number of processors.

In our summation example earlier, w(n) = Θ(N) and t_∞(n) = Θ(log N), since the algorithm could not be completed in less time.
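
Plugging the summation numbers into Definition 3 gives

    parallelism = w(n)/t_∞(n) = Θ(N)/Θ(log N) = Θ(N/log N),

which matches the processor bound p < N/log N derived in Section 2.3.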
Theorem 4. If p ≤ w(n)/t_∞(n), then cost = Θ(w(n)).

Proof. t_p(n) = w(n)/p + t_∞(n) (by Brent’s theorem; note that t_∞(n) = T(n) for circuits).

    cost = p · t_p(n)
         = p · (w(n)/p + t_∞(n))
         = w(n) + p · t_∞(n)
         ≤ w(n) + w(n)
         = 2·w(n)
         = Θ(w(n))

Thus, if we run an algorithm with no more processors than its parallelism, i.e., p ≤ w(n)/t_∞(n), then its cost will be on the order of its work.

Looking back at the equation for parallelism, w(n)/t_∞(n), we can think of parallelism as the maximum number of processors to use so that we’re not wasting parallel resources. If we stop the recursion so that the number of leaves is equal to the number of processors, p, that we have, then the work done at the leaves will dominate the work on the internal nodes. We know that w(n) is fixed since it is equal to the best sequential algorithm’s runtime, so we want to try to minimize t_∞(n). We do this by keeping the algorithm work-efficient while making the critical path, t_∞(n), as short as possible. Then

    t_p(n) = w(n)/p + t_∞(n)

and minimizing t_∞(n) maximizes p, which minimizes w(n)/p and the runtime. Since t_∞(n) cannot be greater than w(n)/p (because of our bound on p), we need to minimize t_∞(n).

3.1 Simulation of CRCW PRAM algorithms on EREW PRAM

Theorem 5. An algorithm A that runs in time T_A(n) on a p-processor CRCW PRAM can be implemented on a p-processor EREW PRAM in time t′_p = Θ(T_A(n) · log p).

The proof will be shown in the next lecture.
