BCS401 Module 5
LECTURE 39
The number of comparisons made by the algorithm on such a run is equal to the length of this path. Hence, the number of comparisons in the worst case
is equal to the height of the algorithm’s decision tree. The central idea behind this model lies
in the observation that a tree with a given number of leaves, which is dictated by the number
of possible outcomes, has to be tall enough to have that many leaves. Specifically, it is not
difficult to prove that for any binary tree with l leaves and height h,
h ≥ ⌈log2 l⌉.
Indeed, a binary tree of height h with the largest number of leaves has all its leaves on the last level (why?). Hence, the largest number of leaves in such a tree is 2^h. In other words, 2^h ≥ l, which immediately implies h ≥ ⌈log2 l⌉. This inequality puts a lower bound on the heights of binary decision trees and hence on the worst-case number of comparisons made by any comparison-based algorithm for the problem in question. Such a bound is called the information-theoretic lower bound. We illustrate this technique below on two important problems:
sorting and searching in a sorted array.
This inequality implies that the height of a binary decision tree for any comparison-based sorting algorithm, and hence the worst-case number of comparisons made by such an algorithm, cannot be less than ⌈log2 n!⌉:

Cworst(n) ≥ ⌈log2 n!⌉ ≈ n log2 n.
In other words, about n log2 n comparisons are necessary in the worst case to sort an
arbitrary n-element list by any comparison-based sorting algorithm. Note that merge sort
makes about this number of comparisons in its worst case and hence is asymptotically
optimal. This also implies that the asymptotic lower bound n log n is tight and therefore
cannot be substantially improved. We should point out, however, that the lower bound of
⌈log2 n!⌉ can be improved for some values of n. For example, ⌈log2 12!⌉ = 29, but it has been proved that 30 comparisons are necessary (and sufficient) to sort an array of 12 elements.
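To see how tight this bound is, a minimal Python sketch (illustrative only) can compare ⌈log2 n!⌉ with the worst-case comparison count of merge sort, C(n) = C(⌊n/2⌋) + C(⌈n/2⌉) + n − 1, for a few values of n:

```python
import math

# Information-theoretic lower bound for comparison-based sorting:
# any such algorithm needs at least ceil(log2(n!)) comparisons in the worst case.
def sorting_lower_bound(n):
    return math.ceil(math.log2(math.factorial(n)))

# Worst-case comparisons of merge sort: C(n) = C(floor(n/2)) + C(ceil(n/2)) + n - 1.
def mergesort_worst(n):
    if n <= 1:
        return 0
    return mergesort_worst(n // 2) + mergesort_worst(n - n // 2) + n - 1

for n in [4, 8, 12, 16]:
    print(n, sorting_lower_bound(n), mergesort_worst(n))
# n = 12 gives the lower bound 29 while merge sort makes 33 comparisons:
# close, but not equal, which is why the bound can sometimes be improved.
```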
We can also use decision trees for analyzing the average-case efficiencies of comparison-
based sorting algorithms. We can compute the average number of comparisons for a
particular algorithm as the average depth of its decision tree’s leaves, i.e., as the average path
length from the root to the leaves. For example, for the three-element insertion sort whose
decision tree is given in Figure 11.3, this number is (2 + 3 + 3 + 2 + 3 + 3)/6 = 16/6 = 2⅔.
Under the standard assumption that all n! outcomes of sorting are equally likely, the
following lower bound on the average number of comparisons Cavg made by any
comparison-based algorithm in sorting an n-element list has been proved:
Cavg(n) ≥ log2 n!.
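This average is easy to verify by brute force. A minimal Python sketch counts the key comparisons insertion sort makes on all 3! = 6 input orders; the multiset of counts matches the leaf depths of the decision tree, and the average 2⅔ indeed exceeds log2 3! ≈ 2.585:

```python
import itertools, math

# Count the three-way key comparisons insertion sort makes on one input list.
def insertion_sort_comparisons(a):
    a = list(a)
    comps = 0
    for i in range(1, len(a)):
        key, j = a[i], i - 1
        while j >= 0:
            comps += 1                 # one comparison of A[j] with the key
            if a[j] > key:
                a[j + 1] = a[j]
                j -= 1
            else:
                break
        a[j + 1] = key
    return comps

counts = [insertion_sort_comparisons(p) for p in itertools.permutations([1, 2, 3])]
print(counts)                          # multiset matches leaf depths {2, 3, 3, 2, 3, 3}
print(sum(counts) / len(counts))       # average 2.666...
print(math.log2(math.factorial(3)))    # lower bound log2 3! ≈ 2.585
```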
Review Questions
1. What is a decision tree in the context of algorithms?
2. Why is the height of a decision tree important in analyzing the performance of an
algorithm?
3. What is meant by the information-theoretic lower bound in decision trees?
4. List the types of problems that can be analyzed using decision trees.
5. List two sorting algorithms that are considered asymptotically optimal based on
decision tree analysis.
We will use decision trees to determine whether this is the smallest possible number of comparisons. Since we are dealing here with three-way comparisons in which a search key K is compared with some element A[i] to see whether K < A[i], K = A[i], or K > A[i], it is natural to try using ternary decision trees. Figure 11.4 presents such a tree for the case of n = 4. The
internal nodes of that tree indicate the array’s elements being compared with the search key.
The leaves indicate either a matching element in the case of a successful search or a found
interval that the search key belongs to in the case of an unsuccessful search. We can
represent any algorithm for searching a sorted array by three-way comparisons with a
ternary decision tree similar to that in Figure 11.4. For an array of n elements, all such
decision trees will have 2n + 1 leaves (n for successful searches and n + 1 for unsuccessful
ones). Since the minimum height h of a ternary tree with l leaves is ⌈log3 l⌉, we get the following lower bound on the number of worst-case comparisons:

Cworst(n) ≥ ⌈log3(2n + 1)⌉.
This lower bound is smaller than ⌈log2(n + 1)⌉, the number of worst-case comparisons for binary search, at least for large values of n (and smaller than or equal to ⌈log2(n + 1)⌉ for every positive integer n; see Problem 7 in this section's exercises). Can we prove a better lower bound, or is binary search far from being optimal? The answer turns out to be the former. To
obtain a better lower bound, we should consider binary rather than ternary decision trees,
such as the one in Figure 11.5. Internal nodes in such a tree correspond to the same three-way comparisons as before, but they also serve as terminal nodes for successful searches. Leaves therefore represent only unsuccessful searches, and there are n + 1 of them for searching an n-element array. Applying the inequality h ≥ ⌈log2 l⌉ to such binary decision trees immediately yields Cworst(n) ≥ ⌈log2(n + 1)⌉, which proves that binary search is an optimal worst-case algorithm for searching a sorted array.
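Comparing the two bounds numerically shows why the ternary argument is too weak; a small Python sketch:

```python
import math

# Ternary-tree lower bound: ceil(log3(2n + 1)); worst case of binary
# search (and the binary-tree lower bound): ceil(log2(n + 1)).
def ternary_bound(n):
    return math.ceil(math.log(2 * n + 1, 3))

def binary_search_worst(n):
    return math.ceil(math.log2(n + 1))

for n in [4, 15, 100, 10**6]:
    print(n, ternary_bound(n), binary_search_worst(n))
# The ternary bound lags further and further behind binary search's
# actual comparison count, while the binary-tree bound matches it exactly.
```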
Now, one more concept: a decision problem Q is said to be polynomial-time reducible (or simply reducible) to a decision problem P if there is a polynomial-time algorithm that transforms any instance of Q into an instance of P whose answer is the same as the answer to the original instance.
Review Questions
1. Why are non-polynomial bounded problems significant in computing?
2. What is the class P?
3. Why is P considered a subset of NP?
4. What are the two conditions for a problem to be NP-complete?
5. Why are all NP-complete problems also NP-hard?
Backtracking
Some problems can be solved by exhaustive search. The exhaustive search technique suggests generating all candidate solutions and then identifying the one (or the ones) with a desired property.
Backtracking is a more intelligent variation of this approach. The principal idea is to
construct solutions one component at a time and evaluate such partially constructed
candidates as follows. If a partially constructed solution can be developed further without
violating the problem’s constraints, it is done by taking the first remaining legitimate option
for the next component. If there is no legitimate option for the next component, no
alternatives for any remaining component need to be considered. In this case, the algorithm
backtracks to replace the last component of the partially constructed solution with its next
option.
It is convenient to implement this kind of processing by constructing a tree of choices being
made, called the state-space tree. Its root represents an initial state before the search for a
solution begins. The nodes of the first level in the tree represent the choices made for the first
component of a solution; the nodes of the second level represent the choices for the second
component, and so on. A node in a state-space tree is said to be promising if it corresponds to
a partially constructed solution that may still lead to a complete solution; otherwise, it is
called non-promising. Leaves represent either non-promising dead ends or complete
solutions found by the algorithm.
In the majority of cases, a state-space tree for a backtracking algorithm is constructed in the
manner of depth-first search. If the current node is promising, its child is generated by adding
the first remaining legitimate option for the next component of a solution, and the processing
moves to this child. If the current node turns out to be non-promising, the algorithm
backtracks to the node’s parent to consider the next possible option for its last component; if
there is no such option, it backtracks one more level up the tree, and so on. Finally, if the
algorithm reaches a complete solution to the problem, it either stops (if just one solution is
required) or continues searching for other possible solutions.
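The pattern just described can be captured in a short generic sketch. In the Python below, the callables options, is_complete, and report stand for problem-specific details; the names are placeholders, not library functions:

```python
# Generic backtracking over a state-space tree, explored depth-first.
# options(partial) yields the legitimate choices for the next component;
# is_complete(partial) tests whether a partial solution is a full one.
def backtrack(partial, options, is_complete, report):
    if is_complete(partial):
        report(partial)                # complete solution: report it, keep searching
        return
    for choice in options(partial):    # promising extensions, in order
        partial.append(choice)         # take the next legitimate option
        backtrack(partial, options, is_complete, report)
        partial.pop()                  # backtrack: undo the last component

# Example use: build all permutations of {1, 2, 3} one component at a time.
backtrack([], lambda p: [x for x in (1, 2, 3) if x not in p],
          lambda p: len(p) == 3, print)
```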
N-Queens problem
The problem is to place n queens on an n × n chessboard so that no two queens attack each
other by being in the same row or in the same column or on the same diagonal.
So let us consider the four-queens problem and solve it by the backtracking technique.
Since each of the four queens has to be placed in its own row, all we need to do is to assign a
column for each queen on the board presented in the figure.
If other solutions need to be found, the algorithm can simply resume its operations at the leaf
at which it stopped. Alternatively, we can use the board’s symmetry for this purpose.
Finally, it should be pointed out that a single solution to the n-queens problem for any n ≥ 4
can be found in linear time.
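One straightforward Python sketch of this backtracking process, with queens placed row by row (queens[i] records the column of the queen in row i):

```python
# Backtracking for the n-queens problem. A placement is legitimate if it
# shares no column and no diagonal with any queen already on the board.
def solve_n_queens(n):
    solutions = []

    def is_safe(queens, col):
        row = len(queens)
        return all(c != col and abs(c - col) != row - r
                   for r, c in enumerate(queens))

    def place(queens):
        if len(queens) == n:
            solutions.append(list(queens))   # complete solution found
            return
        for col in range(n):
            if is_safe(queens, col):         # promising: extend and recurse
                queens.append(col)
                place(queens)
                queens.pop()                 # dead end beyond here: backtrack

    place([])
    return solutions

print(solve_n_queens(4))   # [[1, 3, 0, 2], [2, 0, 3, 1]]
```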
Subset-sum problem
Consider the problem of finding a subset of a given set A = {a1, . . . , an} of n positive integers whose sum is equal to a given positive integer d. The state-space tree can be constructed as a binary tree. The root of the tree represents the starting point, with no decisions about the given elements made as yet. Its left and right children represent, respectively, inclusion and exclusion of a1 in a set being sought.
Similarly, going to the left from a node of the first level corresponds to inclusion of a2 while
going to the right corresponds to its exclusion, and so on. Thus, a path from the root to a node
on the ith level of the tree indicates which of the first i numbers have been included in the subsets represented by that node.
We record the value of s, the sum of these numbers, in the node. If s is equal to d, we have a solution to the problem. We can either report this result and stop or, if all the solutions need to be found, continue by backtracking to the node's parent. If s is not equal to d, we can terminate the node as non-promising if either of the following two inequalities holds (assuming the elements are sorted in increasing order):

s + a_{i+1} > d (the sum s is too large even after adding the smallest remaining element),
s + a_{i+1} + . . . + a_n < d (the sum s is too small even after adding all the remaining elements).
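A minimal Python sketch of this algorithm; the elements are sorted in increasing order so that the two pruning tests above apply, and the instance {3, 5, 6, 7} with d = 15 is chosen just for illustration:

```python
# Backtracking for the subset-sum problem: find subsets of `a` summing to d.
# `chosen` holds the indices included so far; `s` is their sum; `rest` is the
# sum of a[i:], used to prune nodes that cannot reach d.
def subset_sum(a, d):
    a = sorted(a)
    solutions = []

    def explore(i, s, chosen, rest):
        if s == d:
            solutions.append([a[j] for j in chosen])   # complete solution
            return
        if i == len(a):
            return
        # non-promising: even the smallest remaining element overshoots d,
        # or all remaining elements together still fall short of d
        if s + a[i] > d or s + rest < d:
            return
        chosen.append(i)                               # include a[i]
        explore(i + 1, s + a[i], chosen, rest - a[i])
        chosen.pop()                                   # exclude a[i]
        explore(i + 1, s, chosen, rest - a[i])

    explore(0, 0, [], sum(a))
    return solutions

print(subset_sum([3, 5, 6, 7], 15))   # [[3, 5, 7]]
```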
Knapsack problem (branch-and-bound)
Consider the knapsack problem: given n items of known weights wi and values vi, i = 1, . . . , n, and a knapsack of capacity W, find the most valuable subset of the items that fits in the knapsack. It is convenient to order the items of a given instance in descending order by their value-to-weight ratios.
Each node on the ith level of the state-space tree, 0 ≤ i ≤ n, represents all the subsets of n items
that include a particular selection made from the first i ordered items. This particular
selection is uniquely determined by the path from the root to the node: a branch going to the
left indicates the inclusion of the next item, and a branch going to the right indicates its
exclusion.
We record the total weight w and the total value v of this selection in the node, along with
some upper bound ub on the value of any subset that can be obtained by adding zero or more
items to this selection. A simple way to compute the upper bound ub is to add to v, the total
value of the items already selected, the product of the remaining capacity of the knapsack W
– w and the best per-unit payoff among the remaining items, which is v_{i+1}/w_{i+1}:

ub = v + (W − w)(v_{i+1}/w_{i+1}).
Example: Consider the following problem. The items are already ordered in descending order
of their value-to-weight ratios.
Let us apply the branch-and-bound algorithm. At the root of the state-space tree (see Figure
12.8), no items have been selected as yet. Hence, both the total weight of the items already
selected w and their total value v are equal to 0. The value of the upper bound is 100.
Node 1, the left child of the root, represents the subsets that include item 1. The total weight
and value of the items already included are 4 and 40, respectively; the value of the upper
bound is 40 + (10 − 4) * 6 = 76.
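The whole search can be sketched as a best-first branch-and-bound in Python. The weights 4, 7, 5, 3 and capacity W = 10 follow from the numbers above, and v1 = 40 and v2 = 42 (ratio 6) are given; the values 25 and 12 for items 3 and 4 are assumed for illustration, since the original table is not reproduced here:

```python
import heapq

# Best-first branch-and-bound for the 0/1 knapsack problem. Items must be
# sorted in descending order of value-to-weight ratio, and the bound is
# ub = v + (W - w)(v_{i+1}/w_{i+1}) as in the text.
def knapsack_bb(weights, values, W):
    n = len(weights)

    def upper_bound(i, w, v):
        return v + (W - w) * (values[i] / weights[i]) if i < n else v

    best_value, best_set = 0, []
    # max-heap keyed on -ub; entries: (-ub, next item index, weight, value, chosen)
    heap = [(-upper_bound(0, 0, 0), 0, 0, 0, [])]
    while heap:
        neg_ub, i, w, v, chosen = heapq.heappop(heap)
        if -neg_ub <= best_value or i == n:
            continue                       # node cannot beat the best solution
        if w + weights[i] <= W:            # left child: include item i
            w2, v2 = w + weights[i], v + values[i]
            if v2 > best_value:
                best_value, best_set = v2, chosen + [i]
            heapq.heappush(heap, (-upper_bound(i + 1, w2, v2), i + 1, w2, v2, chosen + [i]))
        # right child: exclude item i
        heapq.heappush(heap, (-upper_bound(i + 1, w, v), i + 1, w, v, chosen))
    return best_value, best_set

weights, values, W = [4, 7, 5, 3], [40, 42, 25, 12], 10   # values 25, 12 assumed
print(knapsack_bb(weights, values, W))   # (65, [0, 2]): items 1 and 3, value 65
```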
Review Questions
1. Why is branch-and-bound considered an optimization problem technique?
2. What are the two additional items required in branch-and-bound compared to
backtracking?
3. What is the objective of the knapsack problem in the context of branch-and-bound?
4. Why is it convenient to order the items by their value-to-weight ratios in the knapsack
problem?
LECTURES 45 and 46
Greedy algorithm for the discrete knapsack problem: for the four-item instance above, the greedy algorithm will select the first item of weight 4, skip the next item of weight 7, select
the next item of weight 5, and skip the last item of weight 3. The solution obtained happens to
be optimal for this instance.
Greedy algorithm for the continuous knapsack problem
Step 1 Compute the value-to-weight ratios v_i/w_i, i = 1, . . . , n, for the items given.
Step 2 Sort the items in non-increasing order of the ratios computed in Step 1.
Step 3 Repeat the following operation until the knapsack is filled to its full capacity or no item
is left in the sorted list: if the current item on the list fits into the knapsack in its entirety, take it
and proceed to the next item; otherwise, take its largest fraction to fill the knapsack to its full
capacity and stop.
For example, for the four-item instance used in Example 5 to illustrate the greedy algorithm for
the discrete version, the algorithm will take the first item of weight 4 and then 6/7 of the next
item on the sorted list to fill the knapsack to its full capacity.
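A minimal Python sketch of the three steps above, run on the same four-item instance (with the same partly assumed values as in the branch-and-bound sketch):

```python
# Greedy algorithm for the continuous (fractional) knapsack problem.
# Returns (total value, list of (item index, fraction taken)).
def fractional_knapsack(weights, values, W):
    # Steps 1-2: sort item indices in non-increasing order of value-to-weight ratio
    order = sorted(range(len(weights)),
                   key=lambda i: values[i] / weights[i], reverse=True)
    total, taken, capacity = 0.0, [], W
    for i in order:                        # Step 3
        if capacity <= 0:
            break
        if weights[i] <= capacity:         # item fits in its entirety: take it
            taken.append((i, 1.0))
            total += values[i]
            capacity -= weights[i]
        else:                              # take the largest fraction that fits
            fraction = capacity / weights[i]
            taken.append((i, fraction))
            total += fraction * values[i]
            capacity = 0
    return total, taken

# Takes item 1 whole, then 6/7 of item 2, for a total value of 76.
print(fractional_knapsack([4, 7, 5, 3], [40, 42, 25, 12], 10))
```

Note that the value 76 returned for this instance equals the upper bound computed at node 1 of the branch-and-bound example, which anticipates the observation made below.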
It should come as no surprise that this algorithm always yields an optimal solution to the
continuous knapsack problem. Indeed, the items are ordered according to their efficiency in
using the knapsack’s capacity. If the first item on the sorted list has weight w1 and value v1, no
solution can use w1 units of capacity with a higher payoff than v1. If we cannot fill the knapsack
with the first item or its fraction, we should continue by taking as much as we can of the second
most efficient item, and so on. A formal rendering of this proof idea is somewhat involved, and
we will leave it for the exercises.
Note also that the optimal value of the solution to an instance of the continuous knapsack
problem can serve as an upper bound on the optimal value of the discrete version of the same
instance. This observation provides a more sophisticated way of computing upper bounds for
solving the discrete knapsack problem by the branch-and-bound method.
Approximation Schemes: We now return to the discrete version of the knapsack problem. For
this problem, unlike the traveling salesman problem, there exist polynomial-time approximation
schemes, which are parametric families of algorithms that allow us to get approximations s_a^{(k)} with any predefined accuracy level:

f(s*)/f(s_a^{(k)}) ≤ 1 + 1/k for any instance of size n,

where f is the total value of the items in a solution, s* is an optimal solution, and k is an integer parameter in the range 0 ≤ k < n. The first approximation scheme was
suggested by S. Sahni in 1975 [Sah75]. This algorithm generates all subsets of k items or less,
and for each one that fits into the knapsack, it adds the remaining items as the greedy algorithm
would do (i.e., in nonincreasing order of their value-to-weight ratios). The subset of the highest
value obtained in this fashion is returned as the algorithm’s output.
Example: A small example of an approximation scheme with k = 2 is provided in the following
figure. The algorithm yields {1, 3, 4}, which is the optimal solution for this instance.
You can be excused for not being overly impressed by this example. And, indeed, the
importance of this scheme is mostly theoretical rather than practical. It lies in the fact that, in
addition to approximating the optimal solution with any predefined accuracy level, the time
efficiency of this algorithm is polynomial in n. Indeed, the total number of subsets the algorithm
generates before adding extra elements is

Σ_{j=0}^{k} C(n, j) ≤ Σ_{j=0}^{k} n^j ≤ (k + 1)n^k.
For each of those subsets, it needs O(n) time to determine the subset's possible extension. Thus, the algorithm's efficiency is in O(kn^{k+1}). Note that although it is polynomial in n, the time
efficiency of Sahni’s scheme is exponential in k. More sophisticated approximation schemes,
called fully polynomial schemes, do not have this shortcoming.
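Sahni's scheme is short to sketch in Python; here it is run, for illustration only, on the earlier four-item instance (values partly assumed) rather than on the instance from the figure:

```python
from itertools import combinations

# Sketch of Sahni's approximation scheme for the 0/1 knapsack problem.
# For every subset of at most k items that fits, greedily add the remaining
# items in non-increasing order of value-to-weight ratio, and keep the best.
def sahni_knapsack(weights, values, W, k):
    n = len(weights)
    order = sorted(range(n), key=lambda i: values[i] / weights[i], reverse=True)
    best_value, best_set = 0, []
    for size in range(k + 1):
        for subset in combinations(range(n), size):
            w = sum(weights[i] for i in subset)
            if w > W:
                continue                      # seed subset does not fit
            v = sum(values[i] for i in subset)
            chosen = list(subset)
            for i in order:                   # greedy completion
                if i not in subset and w + weights[i] <= W:
                    w += weights[i]
                    v += values[i]
                    chosen.append(i)
            if v > best_value:
                best_value, best_set = v, sorted(chosen)
    return best_value, best_set

# Four-item instance from earlier (values 25 and 12 assumed), with k = 2.
print(sahni_knapsack([4, 7, 5, 3], [40, 42, 25, 12], 10, 2))   # (65, [0, 2])
```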
Review Questions
1. Why does the greedy algorithm for the discrete knapsack problem sometimes yield an
optimal solution?
2. Why does the greedy algorithm always yield an optimal solution for the continuous
knapsack problem?
3. How can the solution to the continuous knapsack problem be used in the branch-and-
bound method for the discrete knapsack problem?
4. What is a polynomial-time approximation scheme (PTAS) for the knapsack problem?
5. Why is the importance of Sahni's approximation scheme considered more theoretical
than practical?