Optimal Binary Search Tree
V. Balasubramanian
Contd…
• The cause of inefficiency in divide-and-conquer
• After division …
– Smaller instances are unrelated, e.g., mergesort
– Smaller instances are related, e.g., Fibonacci
• the same instances are then solved repeatedly
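The repeated work can be made visible with a small sketch. The counter `fib_calls` and the name `fib_naive` are illustrative assumptions, not part of the slides; the function is the plain divide-and-conquer recursion with no table.

```c
#include <assert.h>

/* Hypothetical counter, used only to make the repeated work visible. */
static long fib_calls = 0;

/* Naive divide-and-conquer Fibonacci: the subproblems fib(n-1) and
   fib(n-2) overlap, so the same instances are recomputed many times. */
long fib_naive(int n) {
    assert(n >= 0);
    fib_calls++;
    if (n <= 1)
        return n;
    return fib_naive(n - 1) + fib_naive(n - 2);
}
```

Calling `fib_naive(10)` returns 55 but makes 177 calls, although only 11 distinct instances (0 through 10) exist.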
Contd…
• Dynamic programming
– bottom-up approach
– use an array (table) to save solutions to small
instances
Fibonacci
• The Fibonacci algorithm in dynamic programming form is as follows:
• Dynamic programming algorithm: nth Fibonacci term (iterative)
– Problem: Determine the nth term in the Fibonacci sequence.
– Inputs: a nonnegative integer n.
– Outputs: fib2, the nth term in the Fibonacci sequence.
int fib2(int n) {
    int i;
    int f[n + 1];   // array f[0..n] to store Fibonacci values (C99 variable-length array)
    f[0] = 0;
    if (n > 0) {
        f[1] = 1;
        for (i = 2; i <= n; i++)
            f[i] = f[i - 1] + f[i - 2];   // each term is computed exactly once
    }
    return f[n];
}
Binomial coefficient
• Definition
The binomial coefficient
$$\binom{n}{k} = \frac{n!}{k!\,(n-k)!} \quad \text{for } 0 \le k \le n$$
• Recursive definition
$$\binom{n}{k} = \begin{cases} \dbinom{n-1}{k-1} + \dbinom{n-1}{k} & 0 < k < n \\ 1 & k = 0 \text{ or } k = n \end{cases}$$
The algorithm
• Algorithm 3.1: Binomial Coefficient Using Divide-and-Conquer
– Problem: Compute the binomial coefficient.
– Inputs: nonnegative integers n and k, where k ≤ n.
– Outputs: bin, the binomial coefficient C(n, k).
int bin(int n, int k) {
    if (k == 0 || n == k)
        return 1;   // base cases of the recursive definition
    else
        return bin(n - 1, k - 1) + bin(n - 1, k);
}
Using dynamic programming
Compute sequence of rows
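A minimal sketch of the row-by-row computation, assuming a table `B` in which `B[i][j]` holds C(i, j). The names `bin2` and `B` are illustrative assumptions; the slides themselves do not show the code.

```c
#include <assert.h>

/* Bottom-up binomial coefficient: row i of the table is computed from
   row i-1 using C(i,j) = C(i-1,j-1) + C(i-1,j). Only columns 0..k are kept. */
unsigned long long bin2(int n, int k) {
    assert(0 <= k && k <= n);
    unsigned long long B[n + 1][k + 1];   /* C99 variable-length array */
    for (int i = 0; i <= n; i++) {
        int top = i < k ? i : k;          /* row i has entries 0..min(i, k) */
        for (int j = 0; j <= top; j++) {
            if (j == 0 || j == i)
                B[i][j] = 1;              /* edges of Pascal's triangle */
            else
                B[i][j] = B[i - 1][j - 1] + B[i - 1][j];
        }
    }
    return B[n][k];
}
```

Each entry is computed once, so the work is proportional to n·k rather than to the number of recursive calls made by the divide-and-conquer version.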
DYNAMIC PROGRAMMING
(Contd..)
• Substituting $i = n-1$ and using $g_n(y) = 0$ (for $y \ge 0$) in relation (1), we obtain $g_{n-1}(y)$.
• Then, using $g_{n-1}(y)$, we obtain $g_{n-2}(y)$, and so on until we get $g_0(M)$ (with $i = 0$), which is the solution of the knapsack problem.
• There are two approaches to solving recurrence relation (1):
• (1) the forward approach and (2) the backward approach.
DYNAMIC PROGRAMMING
(Contd..)
• In the forward approach, decision $x_i$ is made in terms of optimal decision sequences for $x_{i+1}, \ldots, x_n$ (i.e., we look ahead).
• In the backward approach, decision $x_i$ is made in terms of optimal decision sequences for $x_1, \ldots, x_{i-1}$ (i.e., we look backward).
DYNAMIC PROGRAMMING
(Contd..)
• For the 0/1 knapsack problem,
$g_i(y) = \max\{\, g_{i+1}(y),\; g_{i+1}(y - w_{i+1}) + p_{i+1} \,\}$ ……… (1)
• is the forward approach, as $g_{n-1}(y)$ is obtained using $g_n(y)$.
• $f_j(y) = \max\{\, f_{j-1}(y),\; f_{j-1}(y - w_j) + p_j \,\}$ ……… (2)
• is the backward approach; $f_j(y)$ is the value of an optimal solution to Knap(1, j, y). (2) may be solved by beginning with $f_0(y) = 0$ for all $y \ge 0$ and $f_0(y) = -\infty$ for $y < 0$.
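Relation (2) can be sketched as a table filled for j = 1, …, n. The function name `knap_backward` and the 0-based arrays `w` and `p` (so `w[j-1]` stands for $w_j$) are assumptions made for this illustration; integer weights are also assumed so that capacities can index the table.

```c
#include <assert.h>

/* Backward approach: f[j][y] is the best profit using objects 1..j with
   capacity y. f[0][y] = 0 for y >= 0; the -infinity case (y < 0) is
   handled by never taking an object that does not fit. */
int knap_backward(int n, const int w[], const int p[], int M) {
    assert(n >= 0 && M >= 0);
    int f[n + 1][M + 1];                  /* C99 variable-length array */
    for (int y = 0; y <= M; y++)
        f[0][y] = 0;
    for (int j = 1; j <= n; j++) {
        for (int y = 0; y <= M; y++) {
            f[j][y] = f[j - 1][y];                         /* x_j = 0 */
            if (y >= w[j - 1]) {                           /* x_j = 1 fits */
                int take = f[j - 1][y - w[j - 1]] + p[j - 1];
                if (take > f[j][y])
                    f[j][y] = take;
            }
        }
    }
    return f[n][M];
}
```

The table has (n+1)(M+1) entries, so the sketch runs in time proportional to n·M.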
Example
• Consider a 0/1 knapsack problem with $n = 3$ objects whose weights are $w_1 = 2$, $w_2 = 3$, $w_3 = 4$, whose profits are $p_1 = 1$, $p_2 = 2$, $p_3 = 5$, and whose knapsack capacity is $M = 6$. Compute $g_0(6)$.
Solution
$g_0(6) = \max\{\, g_1(6),\; g_1(6 - w_1) + p_1 \,\}$
$\phantom{g_0(6)} = \max\{\, g_1(6),\; g_1(6 - 2) + 1 \,\}$
$\phantom{g_0(6)} = \max\{\, g_1(6),\; g_1(4) + 1 \,\}$
Contd…
$g_1(4) = \max\{\, g_2(4),\; g_2(4 - w_2) + p_2 \,\}$
$\phantom{g_1(4)} = \max\{\, g_2(4),\; g_2(4 - 3) + 2 \,\}$
$\phantom{g_1(4)} = \max\{\, g_2(4),\; g_2(1) + 2 \,\}$
Contd…
$g_2(1) = \max\{\, g_3(1),\; g_3(1 - w_3) + p_3 \,\}$
$\phantom{g_2(1)} = \max\{\, 0,\; g_3(1 - 4) + 5 \,\} = \max\{\, 0,\; -\infty + 5 \,\} = 0$.
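The remaining terms follow the same pattern. As a check, relation (1) can be evaluated directly with a short recursive sketch; the names `W`, `P`, `N`, `NEG_INF` and `g` are assumptions for this illustration, with `W[i]` and `P[i]` holding $w_{i+1}$ and $p_{i+1}$ of the example.

```c
#include <assert.h>
#include <limits.h>

/* Data of the example, stored 0-based: W[i] is w_{i+1}, P[i] is p_{i+1}. */
static const int W[] = {2, 3, 4};
static const int P[] = {1, 2, 5};
enum { N = 3 };

/* Stand-in for -infinity; halved so that adding a profit cannot overflow. */
#define NEG_INF (INT_MIN / 2)

/* g(i, y) = max{ g(i+1, y), g(i+1, y - w_{i+1}) + p_{i+1} },
   with g(N, y) = 0 for y >= 0 and -infinity for y < 0. */
int g(int i, int y) {
    assert(0 <= i && i <= N);
    if (y < 0)
        return NEG_INF;
    if (i == N)
        return 0;
    int skip = g(i + 1, y);                 /* x_{i+1} = 0 */
    int take = g(i + 1, y - W[i]) + P[i];   /* x_{i+1} = 1 */
    return skip > take ? skip : take;
}
```

With this data `g(2, 1)` is 0 and `g(1, 4)` is 5, and `g(0, 6)` evaluates to 6, obtained by taking objects 1 and 3 (weight 2 + 4 = 6, profit 1 + 5 = 6).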
FEATURES OF DYNAMIC
PROGRAMMING SOLUTIONS
OPTIMAL BINARY SEARCH
TREES
• Definition: binary search tree (BST). A binary search tree T is a binary tree; either it is empty, or each node contains an identifier and
(i) all identifiers in the left subtree of T are less than the identifier in the root node of T;
(ii) all identifiers in the right subtree of T are greater than the identifier in the root node of T;
(iii) the left and right subtrees are also BSTs.
Optimal Binary Search Trees
Problem: Given n keys $a_1 < \ldots < a_n$ and probabilities $p_1, \ldots, p_n$ of searching for them, find a BST with a minimum average number of comparisons in a successful search.
Since the total number of BSTs with n nodes is given by C(2n,n)/(n+1), which grows exponentially, brute force is hopeless.
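To get a feel for how fast C(2n,n)/(n+1) grows, the count can be computed with the sketch below. The function name `count_bsts` is an assumption, and the code uses the standard Catalan recurrence $c_{m+1} = c_m \cdot 2(2m+1)/(m+2)$ instead of evaluating factorials.

```c
#include <assert.h>

/* Number of distinct BSTs on n keys: the Catalan number C(2n, n)/(n + 1).
   Each step applies c_{m+1} = c_m * 2(2m + 1) / (m + 2); the division is exact. */
unsigned long long count_bsts(int n) {
    assert(n >= 0 && n <= 30);   /* keeps the intermediate product within 64 bits */
    unsigned long long c = 1;    /* c_0 = 1 (the empty tree) */
    for (int m = 0; m < n; m++)
        c = c * 2 * (2 * m + 1) / (m + 2);
    return c;
}
```

For example, `count_bsts(3)` is 5 and `count_bsts(10)` is 16796.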
DP for Optimal BST Problem
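Let C[i,j] be the minimum average number of comparisons made in T[i,j], the optimal BST for keys $a_i < \ldots < a_j$, where $1 \le i \le j \le n$.
Consider the optimal BST among all BSTs with some $a_k$ ($i \le k \le j$) as their root; T[i,j] is the best among them.
$$C[i,j] = \min_{i \le k \le j} \Big\{\, p_k \cdot 1 + \sum_{s=i}^{k-1} p_s \,(\text{level of } a_s \text{ in } T[i,k-1] + 1) + \sum_{s=k+1}^{j} p_s \,(\text{level of } a_s \text{ in } T[k+1,j] + 1) \Big\}$$
[Figure: BST with root $a_k$; the left subtree is the optimal BST for $a_i, \ldots, a_{k-1}$ and the right subtree is the optimal BST for $a_{k+1}, \ldots, a_j$]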
Example:
key          A    B    C    D
probability  0.1  0.2  0.4  0.3

The tables below are filled diagonal by diagonal: the main table is filled using the recurrence
$$C[i,j] = \min_{i \le k \le j} \{\, C[i,k-1] + C[k+1,j] \,\} + \sum_{s=i}^{j} p_s, \qquad C[i,i] = p_i;$$
the root table records the values of k giving the minima.

Main table C[i,j]:
i\j   0    1    2    3    4
1     0   0.1  0.4  1.1  1.7
2          0   0.2  0.8  1.4
3               0   0.4  1.0
4                    0   0.3
5                         0

Root table R[i,j]:
i\j   1  2  3  4
1     1  2  3  3
2        2  3  3
3           3  3
4              4

Optimal BST: C is the root, with left child B and right child D; A is the left child of B.
Optimal Binary Search Trees
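The diagonal-by-diagonal filling with the recurrence C[i,j] = min {C[i,k-1] + C[k+1,j]} + Σ p_s can be sketched as follows. The function name `optimal_bst`, the bound `MAXN` and the 1-based array `p` (with `p[0]` unused) are assumptions made for this illustration.

```c
#include <assert.h>

#define MAXN 32   /* hypothetical upper bound on the number of keys */

/* p[1..n] are the search probabilities (p[0] is unused).
   On return, R[i][j] is the root index of the optimal BST for keys i..j.
   Returns C[1][n], the minimum average number of comparisons. */
double optimal_bst(int n, const double p[], int R[MAXN + 2][MAXN + 2]) {
    assert(n >= 1 && n <= MAXN);
    double C[MAXN + 2][MAXN + 2];
    for (int i = 1; i <= n; i++) {
        C[i][i - 1] = 0.0;                  /* empty tree */
        C[i][i] = p[i];                     /* single key */
        R[i][i] = i;
    }
    C[n + 1][n] = 0.0;
    for (int d = 1; d <= n - 1; d++) {      /* d = j - i, one diagonal at a time */
        for (int i = 1; i <= n - d; i++) {
            int j = i + d;
            double sum = 0.0, best = -1.0;
            for (int s = i; s <= j; s++)
                sum += p[s];
            for (int k = i; k <= j; k++) {  /* try every key as the root */
                double c = C[i][k - 1] + C[k + 1][j];
                if (best < 0.0 || c < best) {
                    best = c;
                    R[i][j] = k;
                }
            }
            C[i][j] = best + sum;
        }
    }
    return C[1][n];
}
```

For the four keys A, B, C, D with probabilities 0.1, 0.2, 0.4, 0.3, the sketch returns approximately 1.7 and sets `R[1][4]` to 3, i.e. key C at the root.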
Analysis of DP for the Optimal BST Problem
Time efficiency: $\Theta(n^3)$, but it can be reduced to $\Theta(n^2)$ by taking advantage of the monotonicity of entries in the root table, i.e., R[i,j] is always in the range between R[i,j-1] and R[i+1,j].
Space efficiency: $\Theta(n^2)$
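A hedged sketch of the faster variant: the only change from the cubic algorithm is that the candidate roots for T[i,j] are restricted to R[i,j-1] … R[i+1,j]. The names `optimal_bst_fast`, `MAXK` and `prefix` are assumptions; the prefix sums are an extra convenience so that Σ p_s costs constant time.

```c
#include <assert.h>

#define MAXK 32   /* hypothetical upper bound on the number of keys */

/* p[1..n] are the search probabilities (p[0] is unused).
   Returns the minimum average number of comparisons. */
double optimal_bst_fast(int n, const double p[]) {
    assert(n >= 1 && n <= MAXK);
    double C[MAXK + 2][MAXK + 2];
    int R[MAXK + 2][MAXK + 2];
    double prefix[MAXK + 1];                /* prefix[i] = p[1] + ... + p[i] */
    prefix[0] = 0.0;
    for (int i = 1; i <= n; i++) {
        prefix[i] = prefix[i - 1] + p[i];
        C[i][i - 1] = 0.0;
        C[i][i] = p[i];
        R[i][i] = i;
    }
    C[n + 1][n] = 0.0;
    for (int d = 1; d <= n - 1; d++) {
        for (int i = 1; i <= n - d; i++) {
            int j = i + d;
            int lo = R[i][j - 1], hi = R[i + 1][j];   /* monotonicity window */
            assert(lo <= hi);
            double best = -1.0;
            for (int k = lo; k <= hi; k++) {
                double c = C[i][k - 1] + C[k + 1][j];
                if (best < 0.0 || c < best) {
                    best = c;
                    R[i][j] = k;
                }
            }
            C[i][j] = best + (prefix[j] - prefix[i - 1]);
        }
    }
    return C[1][n];
}
```

Along one diagonal the windows overlap only at their endpoints, so the total number of candidates examined per diagonal is proportional to n, which gives the quadratic bound.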
ALGORITHM TO SEARCH FOR AN
IDENTIFIER IN THE TREE ‘T’.
procedure SEARCH(T, X, i)
// Search T for X; each node has the fields LCHILD, IDENT, RCHILD //
// Return address i pointing to the identifier X //
// Initially T points to the root of the tree //
// On return, IDENT(i) = X or i = 0 //
i ← T
while i ≠ 0 do
  case
    : X < IDENT(i) : i ← LCHILD(i)
    : X = IDENT(i) : return i
    : X > IDENT(i) : i ← RCHILD(i)
  endcase
repeat
end SEARCH
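The same search can be sketched in C. The `struct node` layout and the function name `search` are assumptions that mirror the LCHILD, IDENT and RCHILD fields; identifiers are assumed to be C strings compared with `strcmp`, and a null pointer plays the role of i = 0.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Node layout mirroring the LCHILD, IDENT, RCHILD fields of the pseudocode. */
struct node {
    struct node *lchild;
    const char  *ident;
    struct node *rchild;
};

/* Returns the node whose identifier equals x, or NULL (the i = 0 case). */
const struct node *search(const struct node *t, const char *x) {
    assert(x != NULL);
    const struct node *i = t;
    while (i != NULL) {
        int cmp = strcmp(x, i->ident);
        if (cmp < 0)
            i = i->lchild;        /* X < IDENT(i) */
        else if (cmp == 0)
            return i;             /* X = IDENT(i) */
        else
            i = i->rchild;        /* X > IDENT(i) */
    }
    return NULL;                  /* unsuccessful search */
}
```

Each iteration of the loop moves one level down the tree, so the number of iterations for a successful search equals the level of the node found.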
Optimal Binary Search trees -
Example
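[Figure: binary search tree with root "if" at level 1; "for" and "while" at level 2; "repeat" (left child of "while") at level 3; "loop" (left child of "repeat") at level 4]
• If each identifier is searched for with equal probability, the average number of comparisons for the above tree is (1 + 2 + 2 + 3 + 4)/5 = 12/5.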
OPTIMAL BINARY SEARCH
TREES (Contd..)
• Let us assume that the given set of identifiers is $\{a_1, a_2, \ldots, a_n\}$ with $a_1 < a_2 < \ldots < a_n$.
• Let $P(i)$ be the probability with which we search for $a_i$.
• Let $Q(i)$ be the probability that the identifier x being searched for satisfies $a_i < x < a_{i+1}$, $0 \le i \le n$, where $a_0 = -\infty$ and $a_{n+1} = +\infty$.
OPTIMAL BINARY SEARCH
TREES (Contd..)
• Then $\sum_{0 \le i \le n} Q(i)$ is the probability of an unsuccessful search, and
$$\sum_{1 \le i \le n} P(i) + \sum_{0 \le i \le n} Q(i) = 1.$$
Given this data, let us construct an optimal binary search tree for $(a_1, \ldots, a_n)$.
• In place of each empty subtree, we add an external node, drawn as a square.
• Internal nodes are drawn as circles.
OPTIMAL BINARY SEARCH
TREES (Contd..)
[Figure: binary search tree on the identifiers if, for, while, repeat, loop]
Construction of optimal binary search
trees
• A BST with n identifiers will have n internal nodes and n+1 external nodes.
• A successful search terminates at an internal node; an unsuccessful search terminates at an external node.
• If a successful search terminates at an internal node at level L, then L iterations of the loop in the algorithm are needed.
• Hence the expected cost contribution from the internal node for $a_i$ is $P(i) \cdot \text{level}(a_i)$.
OPTIMAL BINARY SEARCH
TREES (Contd..)
• An unsuccessful search terminates at an external node, i.e., with i = 0.
• The identifiers not in the binary search tree may be partitioned into n+1 equivalence classes $E_i$, $0 \le i \le n$:
$E_0$ contains all X such that $X < a_1$
$E_i$ contains all X such that $a_i < X < a_{i+1}$, $1 \le i < n$
$E_n$ contains all X such that $X > a_n$
OPTIMAL BINARY SEARCH
TREES (Contd..)
• For identifiers in the same class $E_i$, the search terminates at the same external node.
• If the failure node for $E_i$ is at level L, then only L−1 iterations of the while loop are made.
∴ The cost contribution of the failure node for $E_i$ is $Q(i) \cdot (\text{level}(E_i) - 1)$.
OPTIMAL BINARY SEARCH
TREES (Contd..)
• Thus the expected cost of a binary search tree is
$$\sum_{1 \le i \le n} P(i) \cdot \text{level}(a_i) + \sum_{0 \le i \le n} Q(i) \cdot (\text{level}(E_i) - 1) \qquad (2)$$
• An optimal binary search tree for $\{a_1, \ldots, a_n\}$ is a BST for which (2) is minimum.
• Example: Let $\{a_1, a_2, a_3\}$ = {do, if, stop}.
OPTIMAL BINARY SEARCH
TREES (Contd..)
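[Figure: the five possible binary search trees for $\{a_1, a_2, a_3\}$ = {do, if, stop}, drawn with external nodes such as Q(0) and Q(1):
(a) root stop; if is the left child of stop; do is the left child of if
(b) root if; do is the left child and stop is the right child
(c) root do; if is the right child of do; stop is the right child of if
(d) root stop; do is the left child of stop; if is the right child of do
(e) root do; stop is the right child of do; if is the left child of stop]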
OPTIMAL BINARY SEARCH
TREES (Contd..)
• With equal probabilities, P(i) = Q(i) = 1/7.
• Let us find an OBST among these trees.
• Cost(tree a) = $\sum_{1 \le i \le n} P(i) \cdot \text{level}(a_i) + \sum_{0 \le i \le n} Q(i) \cdot (\text{level}(E_i) - 1)$
= 1/7 [1 + 2 + 3 + (2−1) + (3−1) + (4−1) + (4−1)] = 15/7
• Cost(tree b) = 1/7 [1 + 2 + 2 + 2 + 2 + 2 + 2] = 13/7
• Cost(tree c) = Cost(tree d) = Cost(tree e) = 15/7
∴ tree b is optimal.
OPTIMAL BINARY SEARCH
TREES (Contd..)
• If P(1) = 0.5, P(2) = 0.1, P(3) = 0.05, Q(0) = 0.15, Q(1) = 0.1, Q(2) = 0.05 and Q(3) = 0.05, find the OBST.
• Cost(tree a) = 0.5 × 3 + 0.1 × 2 + 0.05 × 1 + 0.15 × 3 + 0.1 × 3 + 0.05 × 2 + 0.05 × 1 = 2.65
• Cost(tree b) = 1.9, Cost(tree c) = 1.5, Cost(tree d) = 2.15,
• Cost(tree e) = 1.6. Hence tree c is optimal.
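The cost expression (2) is easy to evaluate mechanically once the level of every internal and external node is known. In the sketch below the function name `expected_cost` and the 0-based arrays are assumptions: `P[i-1]` and `la[i-1]` hold P(i) and level($a_i$), while `Q[i]` and `le[i]` hold Q(i) and level($E_i$).

```c
#include <assert.h>

/* Expected cost (2): sum of P(i)*level(a_i) over the n internal nodes
   plus sum of Q(i)*(level(E_i) - 1) over the n+1 external nodes. */
double expected_cost(int n, const double P[], const double Q[],
                     const int la[], const int le[]) {
    assert(n >= 0);
    double cost = 0.0;
    for (int i = 0; i < n; i++)
        cost += P[i] * la[i];            /* successful searches */
    for (int i = 0; i <= n; i++)
        cost += Q[i] * (le[i] - 1);      /* unsuccessful searches */
    return cost;
}
```

For tree a (stop at the root, then if, then do) the internal levels are {3, 2, 1} and the external levels are {4, 4, 3, 2}, which gives 2.65; for tree c (do at the root, then if, then stop) the levels are {1, 2, 3} and {2, 3, 4, 4}, which gives 1.5.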
OPTIMAL BINARY SEARCH
TREES (Contd..)
• To obtain an OBST using dynamic programming, we need to take a sequence of decisions regarding the construction of the tree.
• The first decision is which of the $a_i$ is to be the root.
• Let us choose $a_k$ as the root. Then the internal nodes for $a_1, \ldots, a_{k-1}$ and the external nodes for classes $E_0, E_1, \ldots, E_{k-1}$ will lie in the left subtree L of the root.
• The remaining nodes will be in the right subtree R.
OPTIMAL BINARY SEARCH
TREES (Contd..)
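Define
$$\text{Cost}(L) = \sum_{1 \le i < k} P(i) \cdot \text{level}(a_i) + \sum_{0 \le i < k} Q(i) \cdot (\text{level}(E_i) - 1)$$
$$\text{Cost}(R) = \sum_{k < i \le n} P(i) \cdot \text{level}(a_i) + \sum_{k \le i \le n} Q(i) \cdot (\text{level}(E_i) - 1)$$
where levels are measured within the subtrees L and R respectively.
• Let $T_{ij}$ be the tree with nodes $a_{i+1}, \ldots, a_j$ and nodes corresponding to $E_i, E_{i+1}, \ldots, E_j$.
• Let W(i,j) represent the weight of tree $T_{ij}$.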
OPTIMAL BINARY SEARCH
TREES (Contd..)
$$W(i,j) = P(i+1) + \cdots + P(j) + Q(i) + Q(i+1) + \cdots + Q(j) = Q(i) + \sum_{l=i+1}^{j} [Q(l) + P(l)]$$
• The expected cost of the search tree in (a) (let us call it T) is
P(k) + Cost(L) + Cost(R) + W(0,k-1) + W(k,n)
W(0,k-1) is the sum of the probabilities corresponding to the nodes and equivalence classes to the left of $a_k$.
W(k,n) is the sum of the probabilities corresponding to those to the right of $a_k$.
[Figure (a): OBST with root $a_k$, left subtree L and right subtree R]
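As a sketch of how this decomposition can be used, the code below tries every $a_k$ as the root and takes the cheapest choice. It assumes the cost expression applies to any subtree $T_{ij}$ (with W(i,k-1) and W(k,j) in place of W(0,k-1) and W(k,n)), which is an extension of the slide's statement for the whole tree. The names `weight` and `min_cost` are illustrative, `P[0]` is unused, and the recursion is deliberately plain rather than tabulated.

```c
#include <assert.h>

/* W(i, j) = Q(i) + sum over l = i+1..j of [Q(l) + P(l)]; P[0] is unused. */
double weight(const double P[], const double Q[], int i, int j) {
    double w = Q[i];
    for (int l = i + 1; l <= j; l++)
        w += Q[l] + P[l];
    return w;
}

/* Minimum expected cost of a tree T_ij, trying every a_k (i < k <= j) as root:
   P(k) + Cost(L) + Cost(R) + W(i, k-1) + W(k, j). An empty tree costs 0. */
double min_cost(const double P[], const double Q[], int i, int j) {
    assert(0 <= i && i <= j);
    if (i == j)
        return 0.0;                       /* only the external node E_i */
    double best = -1.0;
    for (int k = i + 1; k <= j; k++) {
        double c = P[k]
                 + min_cost(P, Q, i, k - 1) + min_cost(P, Q, k, j)
                 + weight(P, Q, i, k - 1) + weight(P, Q, k, j);
        if (best < 0.0 || c < best)
            best = c;
    }
    return best;
}
```

With P(1) = 0.5, P(2) = 0.1, P(3) = 0.05 and Q(0) = 0.15, Q(1) = 0.1, Q(2) = 0.05, Q(3) = 0.05, `min_cost(P, Q, 0, 3)` evaluates to approximately 1.5. Storing the results for each pair (i, j) in a table would avoid the repeated subproblems that this plain recursion recomputes.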