optimal binary search tree
Let us assume that the given set of identifiers is {a1, . . . , an} with a1 < a2 < . . . < an.
Let Q (i) be the probability that the identifier x being searched for is such that ai < x < ai+1,
0 ≤ i ≤ n (assume a0 = - ∞ and an+1 = +∞).
We have to arrange the identifiers in a binary search tree in a way that minimizes the
expected total access time.
In a binary search tree, the number of comparisons needed to access an element at depth
'd' is d + 1, so if 'ai' is placed at depth 'di', then we want to minimize:
$$\sum_{i=1}^{n} p_i (1 + d_i)$$
Here pi = P (i) is the probability with which we shall be searching for 'ai'.
Let Q (i) be the probability of an unsuccessful search for an x with ai < x < ai+1. Every
internal node represents a point where a successful search may terminate. Every external
node represents a point where an unsuccessful search may terminate.
The expected cost contribution for the internal node for 'ai' is:
P (i) * level (ai),
where the root is at level 1, so that level (ai) = di + 1.
Unsuccessful searches terminate with a null pointer (i.e., at an external node). Reaching the
external node Ei takes one comparison fewer than its level, hence the cost contribution for
this node is:
Q (i) * (level (Ei) - 1)
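The expected cost of the binary search tree is the sum of these contributions over all nodes:
$$\sum_{i=1}^{n} P(i) \cdot \mathrm{level}(a_i) + \sum_{i=0}^{n} Q(i) \cdot (\mathrm{level}(E_i) - 1)$$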
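The two cost contributions above can be checked numerically for a fixed tree. The following is a minimal sketch, assuming the levels of the internal and external nodes are already known; the function name `expected_cost` and its parameter names are illustrative and do not come from the text.

```python
from fractions import Fraction

def expected_cost(p, q, key_levels, ext_levels):
    """Expected cost of one fixed binary search tree (root at level 1).

    p[i-1] is P(i) and key_levels[i-1] is level(a_i), for i = 1..n.
    q[i] is Q(i) and ext_levels[i] is level(E_i), for i = 0..n.
    """
    # Successful searches: P(i) * level(a_i)
    successful = sum(pi * lvl for pi, lvl in zip(p, key_levels))
    # Unsuccessful searches: Q(i) * (level(E_i) - 1)
    unsuccessful = sum(qi * (lvl - 1) for qi, lvl in zip(q, ext_levels))
    return successful + unsuccessful

# Assumed example: three keys in a right-leaning chain
# (a1 at the root, a2 its right child, a3 the right child of a2).
p = [Fraction(1, 7)] * 3
q = [Fraction(1, 7)] * 4
cost = expected_cost(p, q, key_levels=[1, 2, 3], ext_levels=[2, 3, 4, 4])
print(cost)  # 15/7
```

Exact fractions are used instead of floats so that the result can be compared directly with hand calculations.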
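Here c(i, j) denotes the cost of an optimal binary search tree containing ai+1, . . . , aj, and
r(i, j) denotes its root. The computation of each c(i, j) with j - i = m requires us to find the
minimum of m quantities. Hence, each such c(i, j) can be computed in time O(m). There are
n - m + 1 such c(i, j)’s, so the total time for all c(i, j)’s with j - i = m is O(nm - m^2).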
The total time to evaluate all the c(i, j)’s and r(i, j)’s is therefore:
$$\sum_{1 \le m \le n} (nm - m^2) = O(n^3)$$
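The counting argument above can be made concrete with a short dynamic-programming sketch. The recurrence is not reproduced in this section, so the code assumes the standard one: w(i, i) = Q (i), c(i, i) = 0, w(i, j) = P (j) + Q (j) + w(i, j-1), and c(i, j) = min over i < k ≤ j of {c(i, k-1) + c(k, j)} + w(i, j). The function name `optimal_bst` and the table `w` are illustrative choices, not names from the text.

```python
from fractions import Fraction

def optimal_bst(p, q):
    """Fill the cost table c and root table r for an optimal BST.

    p[1..n] are the success probabilities P(i) (p[0] is unused);
    q[0..n] are the failure probabilities Q(i).
    """
    n = len(q) - 1
    w = [[0] * (n + 1) for _ in range(n + 1)]
    c = [[0] * (n + 1) for _ in range(n + 1)]
    r = [[0] * (n + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        w[i][i] = q[i]                      # empty tree: only E_i, cost 0
    for m in range(1, n + 1):               # m = j - i
        for i in range(n - m + 1):          # n - m + 1 entries per diagonal
            j = i + m
            w[i][j] = w[i][j - 1] + p[j] + q[j]
            # Minimum of m candidate roots k = i+1 .. j, hence O(m) per entry.
            k = min(range(i + 1, j + 1),
                    key=lambda k: c[i][k - 1] + c[k][j])
            c[i][j] = c[i][k - 1] + c[k][j] + w[i][j]
            r[i][j] = k
    return c, r

# Assumed example: three keys with all probabilities equal to 1/7.
p = [0] + [Fraction(1, 7)] * 3
q = [Fraction(1, 7)] * 4
c, r = optimal_bst(p, q)
print(c[0][3], r[0][3])  # 13/7 2
```

The outer loop runs over m, the middle loop over the n - m + 1 entries with j - i = m, and the inner minimum inspects m candidates, which matches the O(nm - m^2) work per value of m.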
Example 1: The possible binary search trees for the identifier set (a1, a2, a3) = (do, if,
stop) are as follows. Given the equal probabilities P (i) = Q (i) = 1/7 for all i, we have
Cost (tree #1) = (1/7 * 1 + 1/7 * 2 + 1/7 * 3) + (1/7 * 1 + 1/7 * 2 + 1/7 * 3 + 1/7 * 3)
= 6/7 + 9/7
= 15/7
Cost (tree #2) = 5/7 + 8/7
= 13/7
Cost (tree #3) = 6/7 + 9/7
= 15/7
Cost (tree #4) = 6/7 + 9/7
= 15/7
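The costs in this example can be cross-checked by brute force. The sketch below enumerates every binary search tree on three keys (there are five of them) and sums P (i) * level (ai) and Q (i) * (level (Ei) - 1) for each; the function name `all_tree_costs` is hypothetical, and the order in which trees are generated need not match the tree numbering used above.

```python
from fractions import Fraction

def all_tree_costs(i, j, p, q, level=1):
    """Costs of every BST on keys a_{i+1}..a_j whose root sits at `level`.

    p[1..n] are the P(i) (p[0] unused); q[0..n] are the Q(i).
    """
    if i == j:
        # No keys left: this position is the external node E_i.
        return [q[i] * (level - 1)]
    costs = []
    for k in range(i + 1, j + 1):           # try a_k as the subtree root
        for left in all_tree_costs(i, k - 1, p, q, level + 1):
            for right in all_tree_costs(k, j, p, q, level + 1):
                costs.append(p[k] * level + left + right)
    return costs

p = [0] + [Fraction(1, 7)] * 3
q = [Fraction(1, 7)] * 4
costs = all_tree_costs(0, 3, p, q)
print(len(costs), min(costs), max(costs))  # 5 13/7 15/7
```

Under these equal probabilities the only tree that reaches the minimum of 13/7 is the balanced one with 'if' at the root; every chain-shaped tree costs 15/7.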