CONSTRUCTING TREES IN PARALLEL
M. J. Atallah
S. R. Kosaraju
L. L. Larmore
G. L. Miller
S-H. Teng
CSD-TR-883
May 1989
Constructing Trees in Parallel

M. J. Atallah*    S. R. Kosaraju†    L. L. Larmore‡    G. L. Miller§    S-H. Teng§
Abstract
O(log^2 n) time, n^2/log n processor, as well as O(log n) time, n^3/log n processor, CREW deterministic parallel algorithms are presented for constructing Huffman codes from a given list of frequencies. The time can be reduced to O(log n (log log n)^2) on a CRCW model, using only n^2/(log log n)^2 processors. Also presented is an optimal O(log n) time, O(n/log n) processor EREW parallel algorithm for constructing a tree given a list of leaf depths, when the depths are monotonic. An O(log^2 n) time, n processor parallel algorithm is given for the general tree construction problem. We also give an O(log^2 n) time, n^2/log^2 n processor algorithm which finds a nearly optimal binary search tree. An O(log^2 n) time, n^{2.36} processor algorithm for recognizing linear context-free languages is given. A crucial ingredient in achieving those bounds is a formulation of these problems as multiplications of special matrices, which we call concave matrices. The structure of these matrices makes their parallel multiplication dramatically more efficient than that of arbitrary matrices.
*Department of Computer Science, Purdue University. Supported by the Office of Naval Research under Grants N00014-84-K-0502 and N00014-86-K-0689, and the National Science Foundation under Grant DCR-8451393, with matching funds from AT&T.

†Department of Computer Science, Johns Hopkins University. Supported by the National Science Foundation through grant CCR-88-04284.

‡ICS, UC Irvine.

§School of Computer Science, CMU, and Department of Computer Science, USC. Supported by the National Science Foundation through grant CCR-87-13489.

1 Introduction
In this paper we present several new parallel algorithms. Each algorithm uses substantially fewer processors than previously known algorithms. The
four problems considered are: The Tree Construction
Problem, The Huffman Code Problem, The Linear
Context Free Language Recognition Problem, and The
Optimal Binary Search Tree Problem. In each of these
problems the computationally expensive part of the
problem is finding the associated tree. We shall show
that these trees are not arbitrary trees but are special.
We take advantage of the special form of these trees
to decrease the number of processors used.
All of the problems we consider in this paper,
as well as many other problems, can be performed in
sequential polynomial time using Dynamic Program-
ming. NC algorithms for each of these problems can be obtained by parallelization of Dynamic Programming. Unfortunately, this approach produces parallel algorithms which use O(n^6) or more processors. An algorithm which increases the work performed from O(n) or O(n^2) to O(n^6) is not of much practical value.
In this paper we present several new paradigms for im-
proving the processor efficiency for dynamic program-
ming problems. For all the problems considered a tree
or class of trees is given implicitly and the algorithm
must find one such tree.
The construction of optimal codes is a classical problem in communication. Let Σ = {0, 1, ..., σ − 1} be an alphabet. A code C = {c_1, ..., c_n} over Σ is a finite nonempty set of distinct finite sequences over Σ. Each sequence c_i is called a code word. A code C is a prefix code if no code word in C is a prefix of another code word. A message over C is a word resulting from the concatenation of code words from C.

We assume the words over a source alphabet a_1, ..., a_n are to be transmitted over a communication channel which can transfer one symbol of Σ per unit of time, and that the probability of appearance of a_i is p_i. The Huffman Coding Problem is to construct a prefix code C = {c_1, ..., c_n ∈ Σ*} such that the average word length Σ_{i=1}^{n} p_i |c_i| is minimized, where |c_i| is the length of c_i.
It is easy to see that prefix codes have the nice property that a message can be decomposed into code words in only one way: they are uniquely decipherable. It is interesting to point out that Kraft and McMillan proved that for any code which is uniquely decipherable there is always a prefix code with the same average word length [13]. In 1952, Huffman [9] gave an elegant sequential algorithm which generates an optimal prefix code in O(n log n) time. If the probabilities are presorted, his algorithm actually runs in linear time [11]. Using parallel dynamic programming, Kosaraju and Teng [18] independently gave the first NC algorithms for the Huffman Coding Problem. However, both constructions use n^6 processors. In this paper, we first show how to reduce the processor count to n^3, while using O(log n) time, by showing that we may assume that the tree associated with the prefix code is left-justified (to be defined in Section 2).
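For concreteness, the sequential algorithm being parallelized can be sketched in a few lines; this reference implementation (ours, not the paper's) merges the two smallest weights with a heap and returns the optimal average word length:

    import heapq

    def huffman_average_word_length(freqs):
        """Sequential Huffman construction (Huffman [9]).

        The sum of all merge weights equals sum_i p_i * |c_i|,
        the average word length of the optimal prefix code."""
        heap = list(freqs)
        heapq.heapify(heap)
        total = 0.0
        while len(heap) > 1:
            a = heapq.heappop(heap)
            b = heapq.heappop(heap)
            total += a + b        # this merge deepens both subtrees by one
            heapq.heappush(heap, a + b)
        return total

    # Example: frequencies (0.1, 0.1, 0.2, 0.6) give average length 1.6.
    assert abs(huffman_average_word_length([0.1, 0.1, 0.2, 0.6]) - 1.6) < 1e-9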
The n^3 processor count arises from the fact that we are multiplying n × n matrices over a closed semiring. We reduce the processor count still further, to n^2/log n, by showing that, after suitable modification, the matrices which are multiplied are concave (to be defined later). The structure of these matrices makes their parallel multiplication dramatically more efficient than that of arbitrary matrices. An O(log n log log n) time, n^2/log n processor CREW algorithm is presented for multiplying them. Also given is an O((log log n)^2) time, n^2/log log n processor CRCW algorithm for multiplying two concave matrices.¹
The algorithm for construction of a Huffman code still uses n^2 processors, which is probably too large for practical consideration, since Huffman's algorithm takes only O(n log n) sequential time. Shannon and Fano gave a code, the Shannon-Fano Code, which is only one bit off from optimal. That is, the expected length of a Shannon-Fano code word is at most one bit longer than that of the Huffman code word.

The construction of the Shannon-Fano Code reduces to the following Tree Construction Problem:
Definition 1.1 (Tree Construction Problem) Given n integer values l_1, ..., l_n, construct an ordered binary tree with n leaves whose levels, when read from left to right, are l_1, ..., l_n.
¹Independently, [1] and [2] improved the CREW algorithm results by showing that two concave matrices can be multiplied in O(log n) time, using n^2/log n CREW PRAM processors. Also, [2] improved the CRCW algorithm by reducing the number of CRCW PRAM processors required to n^2/(log log n)^2.
We give an O(log^2 n) time, n processor EREW PRAM parallel algorithm for the tree construction problem. In the case when l_1, ..., l_n are monotonic, we give an O(log n) time, n/log n processor EREW PRAM parallel algorithm. In fact, trees where the levels of the leaves are monotone will be used for constructing both Huffman Codes and Shannon-Fano Codes.

Using our solution of the tree construction problem, we get an O(log n) time, n/log n processor EREW PRAM algorithm for constructing Shannon-Fano Codes.
We also consider the problem of constructing optimal binary search trees in parallel, as defined by Knuth [10]. The best known NC algorithm for this problem is the parallelization of dynamic programming, which uses n^6 processors. In this paper, using the new concave matrix multiplication algorithm, we show how to compute a nearly optimal binary search tree in O(log^2 n) time using n^2/log n processors. Our search trees are off from optimal only by an additive amount of 1/n^k, for any fixed k.
Finally, we consider recognition of linear context-free languages. A CFL is said to be linear if all productions are of the form A → bB, A → Bb, or A → a, where A and B are nonterminal variables and a and b are terminal variables. It is well known from Ruzzo [17] that the general CFL recognition problem can be performed on a CRCW PRAM in O(log n) time using n^6 processors, again by parallelization of dynamic programming. By observing that the parse tree of a linear context-free language is of a very restricted form, we construct an O(n^3) processor, O(log^2 n) time CREW PRAM algorithm for it. Using the fact that we are doing boolean matrix multiplication, we can reduce the processor count to n^{2.36}.
2 Preliminaries
Throughout this paper a tree will be a rooted tree. It is ordered if the children of each node are ordered from left to right. The level of a node in a tree is its distance from the root. A binary tree T is complete at level l if there are 2^l nodes in T at level l. A binary tree is empty at level l if there is no vertex at level l.

A binary tree T is a left-justified tree if it satisfies the following properties:

1. if a vertex has only one child, then it is a left child;

2. if u and v are sibling nodes of T, where u is to the left of v, then if T_v is not empty at some level l, then T_u is complete at level l, where T_u and T_v denote the subtrees rooted at u and v, respectively.
Right-justified trees can be defined similarly.

Let RAKE be an operation that removes all leaves from a tree. We shall consider a restricted form of RAKE in which a leaf is removed only when its sibling is also a leaf.
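A minimal sketch of this restricted RAKE on a child-list encoding of a tree (the encoding is ours; the paper does not fix one):

    def rake(children):
        """One restricted RAKE step: leaves are removed only when
        every sibling is also a leaf; their parent becomes a leaf."""
        for v in list(children):
            if v not in children:          # already removed in this step
                continue
            kids = children[v]
            if kids and all(children[c] == [] for c in kids):
                for c in kids:             # all children are leaves
                    del children[c]
                children[v] = []

    # A complete binary tree on 7 nodes is reduced to its root by 2 RAKEs.
    t = {1: [2, 3], 2: [4, 5], 3: [6, 7], 4: [], 5: [], 6: [], 7: []}
    rake(t); rake(t)
    assert t == {1: []}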
Proposition 2.1 The set of left-justified trees (right-justified trees) is closed under the RAKE operation.
Lemma 2.1 For any left-justified tree T of n vertices, ⌈log_2 n⌉ applications of RAKE will reduce T to a single chain. Moreover, the resulting chain comes from the leftmost path in T.

Proof: We need only show that any vertex not on the leftmost path of T is removed by ⌈log n⌉ iterations of RAKE.

Let v be a vertex of T not on the leftmost path, and let h be the height of v, the maximum length of a path from v to a leaf in T_v. Since T is left-justified, there exists a vertex u of T at the same level as v and to the left of v, and since T_v is not empty at level h, T_u is complete at level h and hence has at least 2^h leaves. Since T has n leaves altogether, h < log n. Each RAKE decreases the height of every non-empty subtree by 1, thus log n iterations of RAKE completely eliminate T_v. □

Let the height of a tree T be the height of its root.

Corollary 2.1 If T is a left-justified tree of n vertices, then for all v not on the leftmost path of T, the height of T_v is bounded by O(⌈log n⌉).

3 Parallel Tree Contraction and Dynamic Programming

In this section, we present a parallel algorithm for finding an optimal Huffman tree for a given monotonic frequency vector (p_1, ..., p_n). The general Huffman Coding Problem is reducible to this special case after applying one sort (see Teng [18]).

For 1 ≤ i ≤ j ≤ n let H_{i,j} be defined to be the minimum average word length of a Huffman code over (p_i, ..., p_j). Let P_{i,j} = Σ_{k=i}^{j} p_k. The values of H may be obtained recursively as follows: for all 1 ≤ i ≤ j ≤ n,

    H_{i,j} = 0                                                   if i = j
    H_{i,j} = min_{i < k ≤ j} (H_{i,k-1} + H_{k,j}) + P_{i,j}     if i < j        (1)

The values of all H_{i,j}, including the desired output value H_{1,n}, may be obtained by the following algorithm, which simulates the RAKE operation:

1. Estimate H_{i,j} to be 0 if i = j, +∞ otherwise.

2. Iterate this step until all H_{i,j} are stable: use relation (1) to re-estimate H_{i,j} for all i < j, using the values of H obtained during the previous estimation step.

3. Output the value of H_{1,n}.

Each iteration of the second step can be done in O(log n) time using n^3/log n processors, if a CREW PRAM model of computation is used. Unfortunately, the best upper bound on the number of iterations needed is O(n), since each iteration simulates just one RAKE operation.

The algorithm can be improved by introducing a step which simulates the COMPRESS operation as well. The COMPRESS operation halves each chain in a tree by doubling. For any 1 ≤ i < j ≤ n, define F_{i,j} to be that quantity such that H_{1,i} + F_{i,j} is the minimum average word length of a binary tree over (p_1, ..., p_j), where the only trees considered are those which contain a subtree which is a binary tree over (p_1, ..., p_i). If the values of all H_{i,j} are already known, F_{i,j} can be defined by the following:

    F_{i,j} = min( H_{i+1,j} + P_{1,j},  min_{i < k < j} (F_{i,k} + F_{k,j}) )    (2)

We now describe the modified algorithm, which makes use of relations (1) and (2), and which simulates log n iterations of RAKE followed by log n iterations of COMPRESS:

1. For 1 ≤ i ≤ j ≤ n, estimate H_{i,j} to be 0 if i = j, +∞ otherwise.

2. Iterate this step ⌈log n⌉ times: for all 1 ≤ i < j ≤ n, re-estimate H_{i,j} using relation (1) and the values of H computed during the previous estimation step.

3. For 1 ≤ i < j ≤ n, estimate F_{i,j} to be H_{i+1,j} + P_{1,j}, using the last estimate of H_{i+1,j}.

4. Iterate this step ⌈log n⌉ times: for all 1 ≤ i < j ≤ n, re-estimate F_{i,j} using relation (2) and the values of F computed during the previous estimation step.
5. Output the value F_{1,n}, which will be the minimum average word length of any Huffman code.
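As a correctness check of the recurrences (with relation (2) as reconstructed above), the five steps can be mirrored sequentially; this sketch uses our own 1-indexed tables and is a reference for the recurrences, not for the parallel schedule:

    import math

    def huffman_cost_rake_compress(p):
        """Steps 1-5 for a nondecreasing frequency vector p.

        H is iterated ceil(log n) times with relation (1) (RAKE);
        F is initialized by step 3 and iterated ceil(log n) times
        with relation (2) (COMPRESS). Index 0 is unused."""
        n = len(p)
        INF = float("inf")
        rounds = max(1, math.ceil(math.log2(n)))
        pre = [0.0] * (n + 1)
        for i in range(1, n + 1):
            pre[i] = pre[i - 1] + p[i - 1]
        P = lambda i, j: pre[j] - pre[i - 1]          # P_{i,j}

        H = [[0.0 if i == j else INF for j in range(n + 1)]
             for i in range(n + 1)]
        for _ in range(rounds):                       # relation (1)
            Hn = [row[:] for row in H]
            for i in range(1, n):
                for j in range(i + 1, n + 1):
                    Hn[i][j] = min(H[i][k - 1] + H[k][j]
                                   for k in range(i + 1, j + 1)) + P(i, j)
            H = Hn

        F = [[INF] * (n + 1) for _ in range(n + 1)]
        for i in range(1, n):                         # step 3
            for j in range(i + 1, n + 1):
                F[i][j] = H[i + 1][j] + P(1, j)
        for _ in range(rounds):                       # relation (2)
            Fn = [row[:] for row in F]
            for i in range(1, n):
                for j in range(i + 1, n + 1):
                    Fn[i][j] = min([F[i][j]] + [F[i][k] + F[k][j]
                                                for k in range(i + 1, j)])
            F = Fn
        return F[1][n]                                # step 5

    assert abs(huffman_cost_rake_compress([0.1, 0.1, 0.2, 0.6]) - 1.6) < 1e-9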
Intuitively, each re-estimation of the values of H simulates one RAKE step, while each re-estimation of the values of F simulates one COMPRESS step. Correctness of the algorithm follows from the fact that any left-justified binary tree can be reduced to the empty tree by ⌈log n⌉ iterations of RAKE followed by ⌈log n⌉ iterations of COMPRESS, and

Lemma 3.1 For each monotonically increasing frequency vector (p_1, ..., p_n), there is an optimal positional tree (Huffman tree) that is left-justified.

Proof: This lemma can be proven by a simple induction on n. In fact, the procedure given in the proof of Lemma 3.1 (Teng [18]) transforms any Huffman tree into a left-justified one. □
Theorem 3.1 The Huffman Coding Problem can be solved in O(log n) time, using O(n^3/log n) processors on a CRCW PRAM.
4 Multiplication of Concave Matrices

In this section we introduce a new subclass of matrices which we call concave matrices. A concave matrix is a rectangular matrix M which satisfies the quadrangle condition [19], that is, M_{i,j} + M_{k,l} ≤ M_{i,l} + M_{k,j} for all i < k, j < l in the range of the indices of M.

Matrix multiplication shall be defined over the closed semiring (min, +), where the domain is the set of rational numbers extended with +∞. For example, if M is the n × n matrix giving the weights of the edges of a complete digraph of size n, then M^k is the matrix giving the minimum weight of any path of length exactly k between any given pair of vertices.

We give a recursive concave matrix multiplication algorithm which takes O(log n log log n) time, using n^2/log n processors on a CREW machine, and O((log log n)^2) time, using n^2/(log log n) processors on a CRCW machine. Our algorithm is very simple and has a very small constant.

Theorem 4.1 Two concave matrices can be multiplied in O(log n log log n) time, using n^2/log n processors on a CREW machine, and in O((log log n)^2) time, using n^2/(log log n) processors on a CRCW machine.

In the absence of the concavity assumption, the best known algorithm for computing AB requires O(n^3) comparisons.

4.1 The Matrix Cut(A, B)

Let A be a concave matrix of size p × q and B be a concave matrix of size q × r. By the definition of matrix multiplication above, (AB)_{i,j} = min{A_{i,k} + B_{k,j} | 1 ≤ k ≤ q}. We can define a matrix Cut(A, B) taking values in [1, q] as follows: Cut(A, B)_{i,j} = that value of k such that A_{i,k} + B_{k,j} is minimized. (If there is more than one value of k for which that sum is minimized, take the smallest.)

To compute AB it is clearly sufficient to compute Cut(A, B), since we can construct AB from Cut(A, B) in O(1) time using pr processors. In the algorithm below, we just indicate how to compute Cut(A, B).

Define A_even to be the submatrix of A consisting of all the entries of A whose row index is even, while (by an abuse of notation) we define B_even to be the submatrix of B consisting of all entries of B whose column index is even.

MULTIPLICATION ALGORITHM:

Procedure: Cut(A, B)
if A has just one row, or if B has just one column then
    compute Cut(A, B) by examining all possible choices
else
    compute Cut(A_even, B_even) by recursion
    compute Cut(A_even, B) by interpolation
    compute Cut(A, B_even) by interpolation
    compute Cut(A, B) by interpolation
fi

Interpolation:

The concavity property of A guarantees the following inequality:

    Cut(A, B)_{i,j} ≤ Cut(A, B)_{i+1,j}

while concavity of B guarantees a similar inequality:

    Cut(A, B)_{i,j} ≤ Cut(A, B)_{i,j+1}

The combination of these two properties we call the monotonicity property. By the monotonicity property, the total number of comparisons needed to compute Cut(A, B), given Cut(A_even, B), cannot exceed (q − 1)r. To see this, fix a particular column index j. For a particular odd row value of i, q − 1 comparisons could be needed to decide the value of Cut(A, B)_{i,j}, since every k is a candidate. But monotonicity allows us to decide that value with only Cut(A, B)_{i+1,j} − Cut(A, B)_{i−1,j} comparisons. Summed over all odd values of i, the total number of comparisons needed (for the fixed value of j) is thus only q − 1. For all j together, (q − 1)r comparisons are enough. Similarly, monotonicity allows us to compute Cut(A_even, B) given Cut(A_even, B_even) using at most p(q − 1)/2 comparisons, and Cut(A, B) given Cut(A_even, B) and Cut(A, B_even) using at most qr/2 operations.

Time and Work Analysis:

Except for the recursion, the time to execute the multiplication algorithm is O(log q) or O(log log q) on a CREW and a CRCW machine, respectively. Since the depth of the recursion is min{log p, log r}, the total time is O(log q · min{log p, log r}) on a CREW machine and O(log log q · min{log p, log r}) on a CRCW machine.
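The interpolation idea can be seen in a sequential simplification that halves only the rows of A (the parallel algorithm above halves the rows of A and the columns of B simultaneously); the function names are ours:

    def cut(A, B):
        """Cut(A, B)[i][j]: smallest k minimizing A[i][k] + B[k][j].

        Recurses on the even-indexed rows of A, then fills each odd
        row i, column j, scanning only k in [C[i-1][j], C[i+1][j]],
        which is valid by the monotonicity property."""
        p, q, r = len(A), len(B), len(B[0])
        if p == 1:
            return [[min(range(q), key=lambda k: A[0][k] + B[k][j])
                     for j in range(r)]]
        even = cut(A[0::2], B)
        C = [None] * p
        for i in range(0, p, 2):
            C[i] = even[i // 2]
        for i in range(1, p, 2):
            row = []
            for j in range(r):
                lo = C[i - 1][j]
                hi = C[i + 1][j] if i + 1 < p else q - 1
                row.append(min(range(lo, hi + 1),
                               key=lambda k: A[i][k] + B[k][j]))
            C[i] = row
        return C

    # Check against brute force on a small concave (Monge) instance.
    A = [[0, 1, 3], [1, 0, 1], [3, 1, 0]]
    B = [[0, 2, 4], [1, 1, 2], [3, 1, 1]]
    brute = [[min(range(3), key=lambda k: A[i][k] + B[k][j])
              for j in range(3)] for i in range(3)]
    assert cut(A, B) == brute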
4.2 A More Efficient Concave Matrix Algorithm
In the MULTIPLICATION ALGORITHM given in the above subsection, the size of the matrices decreases during the recursion, while the number of processors available is still n^2. Hence at a certain stage we have enough processors to run the general matrix multiplication algorithm and compute the Cut matrix in one step. This implies that we can stop the recursion whenever the matrices are small enough to run the general matrix multiplication algorithm. Thus, the parallel (concave matrix) multiplication algorithm can be sped up.

For each integer m, let A mod m be the submatrix of A consisting of all the entries of A whose row index is a multiple of m, while (by an abuse of notation) let B mod m be the submatrix of B consisting of all the entries of B whose column index is a multiple of m. Clearly, Cut(A mod ⌊√n⌋, B mod ⌊√n⌋) requires n^2 comparisons, and can be computed in O(log n) time and O(log log n) time on a CREW machine and a CRCW machine, respectively.
The following is a bottom-up procedure for computing Cut(A, B).

for m = 1 to ⌈log log n⌉ + 1 do
    1. Compute Cut(A mod ⌊n^{1/2^m}⌋, B mod ⌊n^{1/2^m}⌋);
    2. Compute Cut(A mod ⌊n^{1/2^m}⌋, B);
    3. Compute Cut(A, B mod ⌊n^{1/2^m}⌋);

We now show that each step of the loop can be computed with n^2 comparisons.

Clearly, when m = 1, step (1) requires n^2 comparisons. It follows from the monotonicity properties that in step (2) each row requires √n · n comparisons; since there are √n rows, n^2 comparisons are sufficient. Similarly, step (3) takes n^2 comparisons.

For m > 1, it follows from the monotonicity properties that each row (column) takes n^{1/2^m} · n comparisons. Since there are n^{1−1/2^{m−1}} · n^{1/2^m} rows (columns), n^2 comparisons are sufficient.

Therefore, the above algorithm takes O(log n log log n) time, using n^2/log n processors on a CREW machine, or O((log log n)^2) time, using n^2/(log log n) processors on a CRCW machine.
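Numerically, the stride ⌊n^{1/2^m}⌋ shrinks doubly exponentially, so after about log log n iterations the submatrices are all of A and B; a quick illustration (variable names ours):

    def stride_schedule(n):
        """Strides floor(n^(1/2^m)) for m = 1, 2, ...; the loop stops
        once the stride is 1, i.e. the submatrix is the whole matrix."""
        out, m = [], 1
        while True:
            s = int(n ** (1.0 / 2 ** m))
            out.append(s)
            if s <= 1:
                return out
            m += 1

    print(stride_schedule(10 ** 6))   # [1000, 31, 5, 2, 1]: ~log log n rounds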
5 The Parallel Generation of Huffman Code

Presented in this section is an efficient parallel algorithm for the Huffman Coding Problem. The algorithm runs in O(log^2 n) time, using n^2/log n processors on a CRCW PRAM. This algorithm improves significantly upon the previously known NC algorithms in processor count. It hinges on the use of concave matrix multiplication.

By Lemma 3.1, for each nondecreasing vector (p_1, ..., p_n), there exists an optimal ordered tree for (p_1, ..., p_n) which is left-justified. From Corollary 2.1, it follows that there exists an optimal tree for (p_1, ..., p_n) such that the heights of all subtrees induced by nodes not on the leftmost path are bounded by ⌈log n⌉.

This observation suggests the following paradigm for the Huffman Coding Problem.

1. Constructing Height Bounded Subtrees: for all i ≤ j, compute T_{i,j}, an optimal tree for (p_i, ..., p_j) whose height is bounded by ⌈log n⌉.

2. Constructing the Optimal Tree: use the information provided in the first step to construct an optimal Huffman tree for (p_1, ..., p_n).

Assume that the weights p_1, ..., p_n are given in monotonically increasing order. Define a matrix S by S[i, j] = Σ_{l=i+1}^{j} p_l for i < j, and S[i, j] = +∞ for i ≥ j. It follows easily that S is a concave matrix.

For each h ≥ 0, define a matrix A_h as follows. For 0 ≤ i < j ≤ n, let A_h[i, j] be the average word length of the optimal Huffman tree for the weights (p_{i+1}, ..., p_j), restricted to height h, i.e., the minimum over all trees whose height does not exceed h. If no such tree exists, i.e., if i ≥ j or (j − i) > 2^h, define A_h[i, j] = +∞.

Note that A_0 is trivial to compute, while A_h = min(A_{h−1}, A_{h−1} * A_{h−1} + S), where * stands for matrix multiplication over the closed semiring (min, +).
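A direct sequential transcription of this recurrence (the (min,+) product here is the cubic textbook one, standing in for the concave multiplication of Section 4; function names are ours):

    INF = float("inf")

    def minplus(A, B):
        """(A * B)[i][j] = min_k (A[i][k] + B[k][j])."""
        n = len(A)
        return [[min(A[i][k] + B[k][j] for k in range(n))
                 for j in range(n)] for i in range(n)]

    def height_bounded_table(p, h):
        """A_h[i][j]: optimal average word length over (p_{i+1},...,p_j)
        among trees of height at most h, computed by
        A_h = min(A_{h-1}, A_{h-1} * A_{h-1} + S)."""
        n = len(p)
        pre = [0.0] * (n + 1)
        for i in range(n):
            pre[i + 1] = pre[i] + p[i]
        S = [[pre[j] - pre[i] if i < j else INF for j in range(n + 1)]
             for i in range(n + 1)]
        # A_0: only single leaves (j = i + 1), with cost 0
        A = [[0.0 if j == i + 1 else INF for j in range(n + 1)]
             for i in range(n + 1)]
        for _ in range(h):
            sq = minplus(A, A)
            A = [[min(A[i][j], sq[i][j] + S[i][j]) for j in range(n + 1)]
                 for i in range(n + 1)]
        return A

    A = height_bounded_table([0.1, 0.1, 0.2, 0.6], h=3)
    assert abs(A[0][4] - 1.6) < 1e-9   # height-3 optimum over all 4 weights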
The following lemma was first proven by Garey [6] and is known as the Quadrangle Lemma. For a simpler proof see Larmore [11].

Lemma 5.1 For each h, A_h is a concave matrix.

Note also that A_{⌈log n⌉}[i, j] is equal to the average word length of the optimal tree for (p_{i+1}, ..., p_j) of height bounded by ⌈log n⌉.

Therefore, A_{⌈log n⌉} can be computed in O(log^2 n) time, using n^2/log n processors. This is the first step in the paradigm proposed above. We now show that the second step in the paradigm can also be reduced to multiplying concave matrices.

A square matrix can be identified with a weighted directed graph. It is well known [3] that if M is the matrix for a weighted digraph with (n + 1) vertices, then min(M, I)^n contains the solutions to the all-pairs minimum path problem for that digraph, where I is the identity matrix over the closed semiring (min, +).

We now show how to reduce the Huffman problem to a minimum weight path problem for a directed graph. The matrix M, defined below, will be the weight matrix for a directed graph, which is also called M, whose vertices are {0, 1, ..., n}:

    M[i, j] = +∞                            if i = j = 0
    M[i, j] = 0                             if i = 0 and j = 1
    M[i, j] = A_{⌈log n⌉}[i, j] + S[0, j]   if 0 < i < j ≤ n
    M[i, j] = +∞                            otherwise.

It is easy to verify that M is a concave matrix.

The entries of M^k contain the minimum weights of paths in the digraph M of length exactly k. For i > 0, M^k[i, j] has no simple meaning in terms of Huffman trees. But M^k[0, j] contains the minimum average word length over all Huffman trees on the weights (p_1, ..., p_j) which satisfy the following two properties:

1. There are exactly k − 1 internal nodes on the left edge of the tree, i.e., the leftmost leaf is at depth k.

2. The tree is left-justified.

By Lemma 3.1, there is an optimal Huffman tree that satisfies the above two conditions for some k. Thus computing M^k for all k up to n would give us the optimal Huffman tree. Unfortunately, the amount of computation involved is too great.

This problem can be overcome by a very slight modification of the graph of the matrix M. Define a matrix M' as follows: M'[0, 0] = 0 and M'[i, j] = M[i, j] otherwise.

Think of M' as a digraph derived from M by adding a self-loop of weight 0 at vertex 0. It is easy to verify that M' satisfies the quadrangle condition [19]. Hence, M' is a concave matrix. Note that any path of length k or less from 0 to j in M corresponds to a path of length exactly k from 0 to j in M'. The left edge of the optimal Huffman tree has length less than n, therefore (M')^k[0, n] equals the weighted path length of the optimal Huffman tree for any k > n.

Note that M' is a concave matrix; moreover, (M')^r is also a concave matrix. Hence (M')^{2^{⌈log n⌉}} can be computed by starting with M' and then squaring ⌈log n⌉ times. Using the parallel concave matrix multiplication algorithm, each squaring can be performed in O(log n) time, using n^2/log n processors. Therefore:

Theorem 5.1 The Huffman Coding Problem can be solved in O(log^2 n) time, using n^2/log n processors.

6 Constructing Almost Optimal Binary Search Trees in Parallel

In this section, the parallel construction of optimal binary search trees, an important data structure for data maintenance and information retrieval [10], is considered. An O(log^2 n) time, n^2/log^2 n processor parallel algorithm is given for constructing an approximate binary search tree whose weighted path length is within ε of the optimal, where ε = n^{−k}. Note that the best known sequential optimal search tree construction algorithm, due to Knuth, takes O(n^2) time. Hence, our algorithm is optimal up to approximation. This algorithm too hinges on the judicious use of concave matrix multiplication.

The sequential version of the optimal search tree problem was first studied by Knuth [10], who used monotonicity to give an O(n^2) time algorithm.

Suppose we are given n names A_1, ..., A_n and 2n + 1 frequencies p_0, p_1, ..., p_n, q_1, ..., q_n, where q_i is the probability of accessing A_i, and p_i is the probability of accessing a word which is not in the dictionary and is between A_i and A_{i+1}.

A labeled proper binary tree T of n internal nodes and (n + 1) leaves is a binary search tree for A_1, ..., A_n iff there is a one-to-one onto mapping from A_1, ..., A_n to the internal nodes of T such that the inorder traversal of T gives the vector (A_1, ..., A_n).

Let b_i be the depth of the i-th internal node and a_i be the depth of the i-th leaf. Then P(T), the weighted path length of T, is defined to be P(T) = Σ_i q_i (b_i + 1) + Σ_i p_i a_i.

T is an optimal binary search tree for (A_1, ..., A_n) if P(T) is minimized over all possible search trees.

The optimal binary search tree problem is reducible to a dynamic programming problem over the closed semiring (min, +). Hence, it lies in NC. However, the best known NC parallel algorithm requires n^6 processors.

Let the weight of a subtree be the sum of the p_i and q_i for the nodes and leaves in that subtree. Let the depth of a subtree be the depth of its root in the whole tree.

Our parallel algorithm utilizes the following approximating lemma due to Güttler, Mehlhorn, and Schneider [7].

Lemma 6.1 If S is a subtree of an optimal tree, and if w and d are the weight of S and the depth of the root of S, then d ≤ C + log(1/w)/log(φ), where φ = 1.618... is the golden ratio, and C is some small constant.

The following is the outline of our parallel approximate optimal binary search tree construction algorithm.

1. Let 0 < ε < n^{−1} and let δ = ε/(2n log n).

2. Define a p_i or q_i to be small if it is less than δ. Define a run of small frequencies to be a sublist starting and ending with a p value, where every p value and every q value in that sublist is small. Collapse every maximal run of small frequencies to a single frequency, which will then still be less than ε.

3. Let H = O(log(1/ε)) be the maximum height, given in the sense of Güttler, Mehlhorn, and Schneider [7], of any optimal tree which has no subtrees (other than a single leaf) of weight less than δ.

4. Let T' be an optimal tree for the collapsed list of frequencies. Note that height(T') ≤ H.

5. Let T be the tree of n + 1 leaves obtained from T' as follows. If L is a leaf of T' whose frequency is one of the "collapsed" values obtained in step 2, replace L by an arbitrary binary tree of height no more than log n which contains all the low frequency nodes involved in the collapse.

The correctness of the algorithm is guaranteed by the following lemma due to Larmore [12].

Lemma 6.2 The weighted path length of T will differ from that of the optimal tree by at most ε.

Clearly, steps (1)-(3) and (5) can be performed optimally in O(log n) time. The bottleneck of the algorithm is step (4), which computes optimal binary search trees of height bounded by H = O(log n) for all pairs. Like the problem of constructing an optimal Huffman tree of height bounded by O(log n), this problem can also be reduced to multiplication of concave matrices. Moreover, the number of concave matrix multiplications is bounded by O(log n). The formal description of the method is given in the full paper.

Theorem 6.1 For any 0 < ε < 1/n, a binary search tree T can be found whose weighted path length is within ε of that of the optimal tree, in O(log(1/ε) log n) time, using n^2/log^2 n processors.

7 Constructing Trees from Given Leaf-Patterns

In this section, an optimal O(log n) time, n/log n processor EREW parallel algorithm is given for the tree construction problem when the leaf pattern is monotone or bitonic. Also presented is an O(log^2 n) time, n/log n processor EREW PRAM parallel algorithm for the tree construction problem with general leaf patterns. This involves an NC reduction from the general tree construction problem to the tree construction problem with bitonic leaf patterns. Consequently, an optimal O(log n) time EREW parallel algorithm is obtained for constructing Shannon-Fano codes.
7.1 Monotonic Leaf Patterns and Bitonic Leaf Patterns
There is an elegant characteristic function, due to Kraft [5], to determine whether there is a solution to the tree construction problem with a monotone leaf pattern.

Lemma 7.1 (Kraft [5]) There is a solution to the tree construction problem for a monotone leaf pattern (l_1, ..., l_n) iff Σ_{i=1}^{n} 1/2^{l_i} ≤ 1.

In using the Kraft sum one has to be careful that the numbers added have only O(log n) bits in their representations, and not O(n) bits as they naively appear to have in the Kraft sum.
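The bit-length caveat can be handled by comparing integers instead of fractions: with L = max l_i, the test Σ 2^{−l_i} ≤ 1 becomes the integer test Σ 2^{L−l_i} ≤ 2^L (a sketch; exact integer arithmetic stands in for the paper's O(log n)-bit argument):

    def kraft_ok(levels):
        """Kraft test sum_i 2^(-l_i) <= 1, done exactly in integers."""
        L = max(levels)
        return sum(1 << (L - l) for l in levels) <= (1 << L)

    assert kraft_ok([2, 2, 2, 2])      # complete tree of depth 2
    assert kraft_ok([1, 2, 3, 3])      # skewed tree
    assert not kraft_ok([1, 1, 2])     # 1/2 + 1/2 + 1/4 > 1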
Suppose (l_1, ..., l_n) is a monotone leaf pattern. Since it is sorted, we can construct, optimally in O(log n) time, a vector a = (a_1, ..., a_m) such that a_i is the number of leaves at level i and m = l_1. In the case when l_1 > n we must store a as a linked list of nonzero entries. For simplicity of the exposition assume that m ≤ n. We first show how to reduce a to a vector such that a_i ≤ 2 for 1 ≤ i ≤ n. We compute a vector a' from a by setting a'_{i−1} = ⌊a_i/2⌋ + (a_{i−1} mod 2). It follows by the Kraft sum that the tree for a exists iff it does for a'. Further, from the tree for a' we can construct one for a in unit time. This reduction from a to a' is closely analogous to the RAKE step in the Huffman code algorithm for left-justified trees. We apply this reduction until all a_i ≤ 2, at most O(log n) times. To see that we only need n/log n processors for the log n reductions, observe that the total work is O(Σ_{a_i > 2} log a_i) ≤ n. To balance the work, any processor that computes an entry at a given stage will be required to compute at the next stage; thus we distribute the entries a_i ≥ 2 based on the work of a_i, which is log a_i. Constructing a tree for a vector a with all a_i ≤ 2 reduces to computing the sum of two n-bit numbers and their intermediate carries. This can all be done optimally using prefix sums.

This gives the following theorem:

Theorem 7.1 Trees with monotone leaf patterns can be constructed in O(log n) time, using n/log n processors on an EREW PRAM.
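One concrete sequential realization of Theorem 7.1's output is the canonical-code construction: scanning the (monotone) levels in order, each leaf gets the next free node at its level. The codeword view below is our own illustration; the paper instead builds the tree via the a-vector reduction above.

    def codewords_from_monotone_levels(levels):
        """Given nondecreasing depths satisfying Kraft's inequality,
        return binary codewords; codeword i addresses leaf i (left to
        right) in an ordered binary tree with the prescribed levels."""
        code, words = 0, []
        for i, l in enumerate(levels):
            words.append(format(code, "0%db" % l))
            if i + 1 < len(levels):
                # next free slot, shifted down to the next leaf's level
                code = (code + 1) << (levels[i + 1] - l)
        return words

    print(codewords_from_monotone_levels([1, 2, 3, 3]))
    # ['0', '10', '110', '111']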
A pattern (l_1, ..., l_n) is a bitonic pattern if there exists i such that (l_1, ..., l_i) is monotone increasing and (l_i, ..., l_n) is monotone decreasing.

Lemma 7.2 The tree construction problem for a bitonic leaf pattern (l_1, ..., l_n) has a solution iff Σ_{j=1}^{n} 2^{−l_j} ≤ 1.

Using the methods presented for monotone leaf patterns and the above lemma, we get the following theorem:

Theorem 7.2 A tree for a bitonic leaf pattern can be constructed in O(log n) time, using n/log n processors on an EREW PRAM, if it exists. In general, the minimum number of trees (in order) will be generated with the prescribed leaf pattern.
7.2 General Leaf-Patterns
Presented in this subsection is an O(log n) time reduction from the tree construction problem with a general leaf pattern to the one with bitonic leaf patterns. Moreover, the reduction can be performed with n/log n processors. Therefore, an O(log^2 n) time, n/log n processor EREW PRAM parallel algorithm results.

A segment representation of a pattern (l_1, ..., l_n) is ((l'_1, n_1), ..., (l'_m, n_m)), where Σ_{j=1}^{m} n_j = n, l'_i ≠ l'_{i+1}, and the pattern consists of n_1 copies of l'_1 followed by n_2 copies of l'_2, and so on.

For simplicity, ((l_1, n_1), ..., (l_m, n_m)) is also called a pattern. In a pattern ((l_1, n_1), ..., (l_m, n_m)), l_i is a min-point if l_{i−1} > l_i < l_{i+1}; l_i is a max-point if l_{i−1} < l_i > l_{i+1}.
In a pattern ((l_1, n_1), ..., (l_m, n_m)), (l_i, ..., l_j) is a right-finger if (1) l_{i−1} is a min-point and for no i ≤ k ≤ j is l_k a min-point, and (2) l_{j+1} ≤ l_{i−1} < l_j. A left-finger is defined similarly, except that l_{j+1} is a min-point. Note that a finger may be both a left and a right finger. We next show how to "remove" every finger from a leaf pattern using the tree construction for bitonic patterns. Finally, we observe that the new pattern will have at most half as many fingers as before. Thus we need only remove fingers O(log n) times.

Finger-Reduction applied to one finger (l_i, ..., l_j) in a pattern ((l_1, n_1), ..., (l_m, n_m)) is defined as follows. Without loss of generality assume that it is a right-finger and l_{j+1} < l_{i−1}. Set

    n' = ⌈ Σ_{k=i}^{j} n_k / 2^{l_k − l_{i−1}} ⌉.

Finger-Reduction returns the pattern

    ((l_1, n_1), ..., (l_{i−1}, n_{i−1} + n'), (l_{j+1}, n_{j+1}), ..., (l_m, n_m)).

We have just replaced the finger with the number (from Lemma 7.2) of leaves at level l_{i−1} that are needed to generate it. In general, Finger-Reduction will simultaneously remove all fingers, both left and right fingers. It will return with a pattern.

To see that Finger-Reduction reduces the number of fingers by at least one half, observe that Finger-Reduction removes all max-points. It is not hard to see that the only candidates for new max-points are l_i which were previously min-points and also adjacent to a left and a right finger. Thus the worst case for reducing the number of fingers of a pattern is when the pattern consists of consecutive pairs of left and right fingers that share a min-point. The next lemma summarizes this:
Lemma 7.3 (Finger Cut Lemma) If a pattern (l'_1, ..., l'_{m'}) is obtained by Finger-Reduction from a pattern (l_1, ..., l_n), then the tree construction problem with pattern (l_1, ..., l_n) has a solution iff there is a solution to the tree construction problem with pattern (l'_1, ..., l'_{m'}).

To obtain the tree for a pattern we apply Finger-Reduction until the pattern is reduced to a single finger. We then construct the root tree for the finger. In an expansion phase we attach the trees constructed while removing the fingers during Finger-Reduction to the root tree.

Theorem 7.3 A tree can be constructed for a pattern (l_1, ..., l_n) with m fingers in O(log n log m) time, using n/log n processors.
7.3 Constructing Approximate Optimal Trees

The Shannon-Fano coding method can be specified as: upon input (p_1, ..., p_n), compute (l_1, ..., l_n) such that log(1/p_i) ≤ l_i ≤ log(1/p_i) + 1, then construct a prefix code C = (c_1, ..., c_n) such that |c_i| = l_i.

The proof of the following claim can be found in [8].

Claim 7.1 Let SF(A) be the average word length of the Shannon-Fano code of A = {a_1, ..., a_n} and HUFF(A) be that of the Huffman code. Then HUFF(A) ≤ SF(A) ≤ HUFF(A) + 1.
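The first part of the method is only a length computation; the lengths always satisfy Kraft's inequality, so the Section 7.1 construction applies. A sketch (with ⌈log_2(1/p_i)⌉ realizing log(1/p_i) ≤ l_i ≤ log(1/p_i) + 1):

    import math

    def shannon_fano_lengths(p):
        """l_i = ceil(log2(1/p_i)); since sum_i p_i = 1, these lengths
        satisfy sum_i 2^(-l_i) <= 1, i.e. Kraft's inequality."""
        return [math.ceil(math.log2(1.0 / pi)) for pi in p]

    print(shannon_fano_lengths([0.4, 0.3, 0.2, 0.1]))
    # [2, 2, 3, 4]; 1/4 + 1/4 + 1/8 + 1/16 <= 1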
The second part of the Shannon-Fano method can be implemented by the parallel tree construction algorithm presented in Section 7.1. Therefore,

Theorem 7.4 In O(log n) time, using n/log n processors, a prefix code can be constructed with average word length bounded by that of the corresponding Huffman code plus one.
8 Parallel Linear Context-free Language Recognition

In this section, the parallel complexity of linear context-free language recognition is considered. The linear CFL recognition problem is reduced to a path problem in a graph which has a family of small separators. An O(log^2 n) time, M(n) processor parallel algorithm is obtained for linear CFL recognition by using the parallel nested dissection of Pan and Reif [16]. Here M(n) is the number of processors needed to multiply two n × n boolean matrices in O(log n) time in the CRCW PRAM model.
Definition 8.1 (Context-free Language) A context-free grammar is a 4-tuple G = (V, Σ, P, S) where:

V is a finite nonempty set called the total vocabulary;

Σ ⊆ V is a finite nonempty set called the terminal alphabet;

S ∈ V − Σ = N is called the start symbol;

P is a finite set of rules of the form A → α, where A ∈ N, α ∈ V*.

A context-free grammar G = (V, Σ, P, S) is linear if each rule is of the form A → uBv, where A, B ∈ N and u, v ∈ Σ*.

Let G = (V, Σ, P, S) be a context-free grammar, and let w, w' ∈ V*. We say w directly generates w', written w ⇒ w', if there exist α, β, u, v ∈ V* such that w = αAβ, w' = αuBvβ, and A → uBv ∈ P. ⇒* stands for the reflexive-transitive closure of ⇒.

The language generated by G, written L(G), is the set

    L(G) = {w ∈ Σ* | S ⇒* w}.

The CFL recognition problem is defined as: given a context-free grammar G and a finite sequence w = w_1 ... w_n ∈ Σ*, decide whether w ∈ L(G) (and generate a parse tree).
Each linear context-free grammar G' = (V_1, Σ, P_1, S) can be normalized by constructing another linear context-free grammar G = (V, Σ, P, S) such that (i) L(G) = L(G'), and (ii) P is a finite set of rules of the form

    A → bB,  or  A → a,  or  A → Cc,

where a, b, c ∈ Σ and A, B, C ∈ N.

A linear context-free grammar G' can easily be normalized by finding a G such that (i) and (ii) are satisfied and, moreover, the size of G is within a constant factor of that of G'. Throughout this section, it is assumed that the input linear context-free grammar is normal and that its size is a constant with respect to n, the length of the input finite sequence.
Given a (normal) linear context-free grammar G and a finite sequence w = w_1 ... w_n ∈ Σ*, a graph IG(G, w) = (IV, IE), called the induced graph of G and w, can be defined, which has |IV| = O(n^2) nodes. More specifically,

    IV   = {v_{i,j,P} | 1 ≤ i ≤ j ≤ n, P ∈ N}
    IE   = IE_l ∪ IE_r, where
    IE_l = {(v_{i,j,P}, v_{i,j−1,Q}) | i < j, P → Q w_j ∈ P}
    IE_r = {(v_{i,j,P}, v_{i+1,j,Q}) | i < j, P → w_i Q ∈ P}
We have the following observation.

Claim 8.1 Let G be a linear context-free grammar and w ∈ Σ*. Let IG(G, w) be the induced graph of G and w. Then w ∈ L(G) iff there exists a path in IG(G, w) from v_{1,n,S} to v_{i,i,Q} for some i, 1 ≤ i ≤ n, where Q → w_i ∈ P.

The above observation reduces the linear context-free recognition problem to a path problem (a reachability problem) in the induced graph IG(G, w).

Let cluster i,j refer to the set of |N| vertices of the form v_{i,j,P} (see Figure 1). Note that if all vertices of each cluster i,j are "collapsed" into one vertex (call it v_{i,j}), then a planar grid graph is obtained (see Figure 2a), which we schematically draw as a triangle (see Figure 2b). Although IG(G, w) itself is typically not planar, we shall take the liberty of talking about its external face, to refer to the subset of its nodes that map into the external face of the collapsed version.

Figure 1: In IG(G, w) the only edges leaving cluster i,j go to clusters i,j−1 and i+1,j.

Figure 2: A grid graph (a) and its schematic representation (b).

Let m = O(n^2) denote the number of vertices in IG(G, w), and let p_IG = O(n) be the edge size of IG(G, w), i.e., the perimeter of the external face. The subset C, shown in Figure 3, is a separator of size O(√m) = O(n) which partitions IG(G, w) into four approximately equal components (in that figure the triangle is meant to depict IG(G, w) itself rather than its collapsed version). Moreover, such a small separator can be found uniformly in each component recursively.

Figure 3: Illustrating the small separator C in IG(G, w).

The outline of the parallel algorithm now becomes clear. Let U, M, L, R be the four pieces of IG(G, w) induced by the separator C (see Figure 3). First, the reachability matrix Reach_U between all pairs of vertices on the external face of U is recursively computed. The same is done for each of M, L, R, resulting in the matrices Reach_M, Reach_L, Reach_R, respectively. Using the four boolean matrices returned by these four recursive calls, the reachability matrix Reach_G between all pairs of vertices on the external face of IG(G, w) is computed. This can be done simply by boolean matrix multiplication (actually three such multiplications), taking O(log n) time with M(n) processors (it is known that M(n) = O(n^{2+ε}) where 0 < ε < 1). Hence the time complexity of the algorithm is O(log^2 n). The processor count can be obtained from the following recurrence:

    P(n) = max(4P(n/2), M(n))

which implies P(n) = O(M(n)).

Theorem 8.1 Linear context-free languages can be recognized in O(log^2 n) time, using M(n) processors, where M(n) is the number of processors needed to do boolean matrix multiplication in O(log n) time.
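As a sequential reference for Claim 8.1 (breadth-first reachability on IG(G, w), not the separator-based parallel algorithm), with a grammar encoding of our own devising:

    from collections import deque

    def linear_cfl_member(rules, start, w):
        """rules: (A, u, B, v) encodes the normal rule A -> uBv, where
        exactly one of u, v is a terminal; (A, a, None, "") encodes
        A -> a.  BFS from v_{1,n,S} over the nodes (i, j, P) of
        IG(G, w), 0-indexed, following the edge sets IE_r and IE_l."""
        n = len(w)
        src = (0, n - 1, start)
        seen, queue = {src}, deque([src])
        while queue:
            i, j, P = queue.popleft()
            if i == j:   # accept iff some rule P -> w_i exists (Claim 8.1)
                if any(A == P and B is None and u == w[i]
                       for (A, u, B, v) in rules):
                    return True
                continue
            for (A, u, B, v) in rules:
                if A != P or B is None:
                    continue
                if u == w[i] and (i + 1, j, B) not in seen:   # P -> w_i B
                    seen.add((i + 1, j, B)); queue.append((i + 1, j, B))
                if v == w[j] and (i, j - 1, B) not in seen:   # P -> B w_j
                    seen.add((i, j - 1, B)); queue.append((i, j - 1, B))
        return False

    # L = { a^n c b^n }:  S -> aA | c,  A -> Sb
    rules = [("S", "a", "A", ""), ("A", "", "S", "b"), ("S", "c", None, "")]
    assert linear_cfl_member(rules, "S", "aacbb")
    assert not linear_cfl_member(rules, "S", "acbb")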
9 Open Questions

Can the Huffman Coding Problem be solved in polylogarithmic time, using O(n^{2−ε}) processors?

Given a positional tree, can we test whether it is a Huffman tree in polylogarithmic time, using O(n^{2−ε}) processors?

Can general context-free languages be recognized in polylogarithmic time, using O(n^{6−ε}) or o(M(n^2)) processors?

Can a linear context-free language be recognized in polylogarithmic time, using n^2 or o(M(n)) processors?

Can the Optimal Binary Search Tree Construction problem be solved in polylogarithmic time, using fewer than O(n^{6−ε}) processors?
Acknowledgements We would like to thank
Manuela Veloso of CMU for carefully reading drafts of
the paper and many helpful comments. We would also
like to thank Alok Aggarwal for helpful discussions.
References
[1] A. Apostolico, M. J. Atallah, L. L. Larmore, and H. S. McFaddin. Efficient parallel algorithms for string editing and related problems. In Proc. 26th Annual Allerton Conf. on Communication, Control, and Computing, pages 253-263, Monticello, Illinois, September 1988.

[2] A. Aggarwal and J. Park. Notes on searching in multidimensional monotone arrays. In 29th Annual Symposium on Foundations of Computer Science, IEEE, 1988.

[3] A. Aho, J. Hopcroft, and J. Ullman. The Design and Analysis of Computer Algorithms. Addison-Wesley, 1974.

[4] R. Cole. Parallel merge sort. In 27th Annual Symposium on Foundations of Computer Science, pages 511-516, IEEE, Toronto, October 1986.

[5] S. Even. Graph Algorithms. Computer Science Press, Potomac, Maryland, 1979.

[6] M. R. Garey. Optimal binary search trees with restricted maximal depth. SIAM Journal on Computing, 3:101-110, 1974.

[7] R. Güttler, K. Mehlhorn, and W. Schneider. Binary search trees: average and worst case behavior. Elektron. Informationsverarb. Kybernet., 16:579-591, 1980.

[8] R. W. Hamming. Coding and Information Theory. Prentice-Hall, Inc., 1980.

[9] D. A. Huffman. A method for the construction of minimum redundancy codes. Proc. IRE, 40:1098-1101, 1952.

[10] D. E. Knuth. Optimal binary search trees. Acta Informatica, 1:14-25, 1971.

[11] L. L. Larmore. Height restricted optimal binary trees. SIAM Journal on Computing, 16:1115-1123, 1987.

[12] L. L. Larmore. A subquadratic algorithm for constructing approximately optimal binary search trees. J. of Algorithms, 8:579-591, 1987.

[13] B. McMillan. Two inequalities implied by unique decipherability. IRE Transactions on Information Theory, 2:185-189, 1956.

[14] G. L. Miller and J. H. Reif. Parallel tree contraction and its applications. In 26th Symposium on Foundations of Computer Science, pages 478-489, IEEE, Portland, Oregon, 1985.

[15] G. L. Miller and S.-H. Teng. Systematic methods for tree based parallel algorithm development. In Second International Conference on Supercomputing, pages 392-403, Santa Clara, May 1987.

[16] V. Pan and J. H. Reif. Fast and efficient parallel solution of linear systems. SIAM Journal on Computing, to appear, 1988.

[17] W. L. Ruzzo. On uniform circuit complexity. Journal of Computer and System Sciences, 22(3):365-383, June 1981.

[18] S.-H. Teng. The construction of Huffman-equivalent prefix code in NC. ACM SIGACT News, 18(4):54-61, 1987.

[19] F. F. Yao. Efficient dynamic programming using quadrangle inequalities. In Proceedings of the 12th Annual ACM Symposium on Theory of Computing, pages 429-435, ACM, 1980.