Using Induction To Design Algorithms
UDI MANBER
This article presents a methodology, based on mathematical induction, for approaching the design and the
teaching of combinatorial algorithms. While this methodology does not cover all possible ways of designing
algorithms, it does cover many known techniques. It
also provides an elegant intuitive framework for explaining the design of algorithms in more depth. The
heart of the methodology lies in an analogy between
the intellectual process of proving mathematical theorems and that of designing combinatorial algorithms.
We claim that although these two processes serve different purposes and achieve different types of results,
they are more similar than it seems. This claim is established here by a series of examples of algorithms,
each developed and explained by the use of the methodology. We believe that this methodology gives students more motivation, greater depth, and a better understanding of algorithms.
Mathematical induction is a very powerful proof
technique. It usually works as follows. Let T be a theorem that we want to prove. Suppose that T includes a
parameter n whose value can be any natural number.
Instead of proving directly that T holds for all values of
This research was supported in part by an NSF grant MCS-8303134, and an NSF Presidential Young Investigator Award (grant DCR-8451397), with matching funds from Tektronix, Digital Equipment Corporation, and Hewlett Packard.
November 1988
Volume 31
Number 11
Articles
EXAMPLES
Evaluating Polynomials
The problem: Given a sequence of real numbers a_n, a_{n-1}, ..., a_1, a_0, and a real number x, compute the
value of the polynomial P_n(x) = a_n x^n + a_{n-1} x^{n-1} + ... + a_1 x + a_0.
This is the same problem, except for the size. Therefore, we can solve it by induction using the following
hypothesis:
Induction hypothesis: We know how to evaluate a polynomial represented by the input a_{n-1}, ..., a_1, a_0, at the
point x (i.e., we know how to compute P_{n-1}(x)).
Complexity: The algorithm requires only n multiplications, n additions, and one extra memory location. Even
though the previous solutions seemed very simple and
efficient, it was worthwhile to pursue a better algorithm. This algorithm is not only faster, its corresponding program is simpler.
Summary:
Mappings
... + a_1 x + a_0
P := a_n;
for i := 1 to n do
    P := x * P + a_{n-i}
end;
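The loop above is Horner's rule. As a minimal Python sketch of the same loop (the function name `evaluate_polynomial` and the convention that the coefficients are listed from a_n down to a_0 are assumptions made for this illustration):

```python
from typing import Sequence


def evaluate_polynomial(coeffs: Sequence[float], x: float) -> float:
    """Evaluate a_n*x^n + ... + a_1*x + a_0 at x.

    coeffs lists the coefficients from a_n down to a_0
    (an ordering chosen for this sketch).
    """
    if not coeffs:
        raise ValueError("need at least one coefficient")
    p = coeffs[0]            # P := a_n
    for a in coeffs[1:]:     # a_{n-1}, ..., a_0 in turn
        p = x * p + a        # one multiplication and one addition per step
    return p


# 2x^2 - 3x + 1 evaluated at x = 3
print(evaluate_polynomial([2, -3, 1], 3))
```

Each pass through the loop extends the value of the polynomial on the leading coefficients by one more coefficient, which is where the count of n multiplications and n additions comes from.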
Communications
of the ACM
If f is originally one-to-one then the whole set A satisfies the conditions of the problem, and it is definitely
maximal. If, on the other hand, f(i) = f(j) for some i ≠ j,
then S cannot contain both i and j. For example, in
Figure 1, S cannot contain both 2 and 3. The choice of
which one of them to eliminate cannot be arbitrary.
Suppose, for example, that we decide to eliminate 3.
Since 1 is mapped to 3 we must eliminate 1 as well (the
Implementation:
Algorithm Mapping
begin
I_n, on a line. Each interval I_j is given by its two endpoints L_j (left) and R_j (right). We want to mark all intervals that are contained in other intervals. In other
words, an interval I_j should be marked if there exists
another interval I_k (k ≠ j) such that L_k ≤ L_j and R_k ≥ R_j.
For simplicity we assume that the intervals are distinct
(i.e., no two intervals have the same left and right endpoints, but they may have one of them in common).
Figure 2 shows such a set of intervals. (They are shown
one on top of the other instead of all on one line for
better illustration.)
hypothesis:
in another
begin
    Sort the intervals in increasing order according to left
    endpoints;
    Intervals with equal left endpoints are sorted in
    decreasing order according to right endpoint;
    {for all j < k, either L_j < L_k or L_j = L_k and R_j > R_k}
    MaxR := R_1;
    for j := 2 to n do
        if R_j <= MaxR then
            Mark[j] := true
        else begin
            Mark[j] := false;
            MaxR := R_j
        end
end;
Complexity: Except for the sorting, the algorithm contains one loop involving O(n) operations. Since sorting
requires O(n log n) steps, it dominates the running time
of the algorithm.
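The sort-then-scan procedure can be sketched in Python as follows. This is an illustration under stated assumptions: the name `mark_contained`, the representation of an interval as a `(left, right)` tuple, and returning the marks in input order are choices made here, not part of the original algorithm text.

```python
from typing import List, Tuple


def mark_contained(intervals: List[Tuple[float, float]]) -> List[bool]:
    """marks[j] is True iff intervals[j] is contained in another interval.

    Assumes the intervals are distinct, as in the text.
    """
    n = len(intervals)
    marks = [False] * n
    # Increasing left endpoint; ties broken by decreasing right endpoint.
    order = sorted(range(n), key=lambda j: (intervals[j][0], -intervals[j][1]))
    max_r = float("-inf")  # largest right endpoint seen so far
    for j in order:
        right = intervals[j][1]
        if right <= max_r:
            # Some earlier interval starts no later and ends no earlier.
            marks[j] = True
        else:
            max_r = right
    return marks


print(mark_contained([(1, 10), (2, 3), (2, 12), (5, 11), (13, 14)]))
```

Because of the sort order, every interval seen earlier has a left endpoint that is no larger, so comparing against the single value `max_r` is enough to decide containment.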
Summary: This example illustrates a less straightforward use of induction. First, we select the order under
which the induction will take place. Then, we design
the induction hypothesis so that (1) it implies the desired result, and (2) it is easy to extend. Concentrating
on these steps makes the design of many algorithms
simpler.
CHOOSING THE INDUCTION SEQUENCE WISELY
In the previous examples, the emphasis in the search
for an algorithm was on reducing the size of the problem. This is the essence of the inductive approach.
There are, however, many ways to achieve this goal.
First, the problem may include several parameters (e.g.,
left endpoints and right endpoints, vertices and edges in
a graph), and we must decide which of those should be
reduced. Second, we may be able to eliminate many
possible elements, and we want to choose the easiest
one (e.g., the leftmost endpoint, the smallest number).
Third, we may want to impose additional constraints
on the problem (e.g., the intervals are in a sorted order).
There are many other variations. For example, we can
assume that we know how to solve the problem for
some other values < n instead of just n - 1. This is a
valid assumption. Anything that reduces the size of the
problem can be considered since it leads back to the
base case which we solve directly. Going back to the
sorting example discussed in the introduction, we can
reduce the sorting of n elements to sorting two subsets
of n/2 elements. The two sorted subsets can then be
merged (leading to an algorithm called merge sort). Dividing the problem (inductively) into several equal
parts is a very useful technique (which we will discuss
later) called divide and conquer.
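The reduction from n elements to two subsets of about n/2 elements can be sketched as follows. This is a minimal Python illustration; the function name `merge_sort` and the choice to return a new list are assumptions of the sketch.

```python
from typing import List


def merge_sort(items: List[float]) -> List[float]:
    """Sort by solving two half-size problems and merging the results."""
    if len(items) <= 1:          # base case, solved directly
        return list(items)
    mid = len(items) // 2
    left = merge_sort(items[:mid])    # induction on the first half
    right = merge_sort(items[mid:])   # induction on the second half
    merged: List[float] = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])      # at most one of these two is non-empty
    merged.extend(right[j:])
    return merged


print(merge_sort([5, 2, 9, 1, 5, 6]))
```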
Some reductions are easy to make, some are hard.
Some lead to good algorithms, some do not. In many
cases this is the only hard part of the problem, and
once the right choice is made the rest is easy (e.g., the
choice of the element i in the mapping problem). This
algorithm
is given below.
Algorithm Topological_Sorting
(G = (V, E): a directed acyclic graph);
begin
    Initialize v.indegree for all vertices;
    {e.g., by Depth-First Search}
    G_label := 0;
    for i := 1 to n do
        if v_i.indegree = 0 then put v_i in Queue;
    repeat
        remove vertex v from Queue;
        G_label := G_label + 1;
        v.label := G_label;
        for all edges (v, w) do
        begin
            w.indegree := w.indegree - 1;
            if w.indegree = 0 then put w in Queue
        end
    until Queue is empty
end;
Complexity: Initializing the indegree counters requires
O(|V| + |E|) time (using depth-first search, for example). Finding a vertex with indegree 0 takes constant
time (accessing a queue). Each edge (v, w) is considered
once (when v is taken from the queue). Thus, the number of times the counters need to be updated is exactly
the number of edges in the graph. The running time of
the algorithm is therefore O(|V| + |E|), which is linear
in the size of the input.
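A Python sketch of the queue-based labeling is given here for illustration. Several details are assumptions of the sketch rather than of the algorithm above: the graph is a dictionary mapping every vertex to its list of successors, the indegrees are initialized by a direct scan of the edges instead of depth-first search, and a cycle check is added at the end.

```python
from collections import deque
from typing import Dict, Hashable, List


def topological_labels(graph: Dict[Hashable, List[Hashable]]) -> Dict[Hashable, int]:
    """Label vertices 1..n so that every edge goes from a lower to a higher label.

    graph maps each vertex to its successors; every vertex must be a key.
    """
    indegree = {v: 0 for v in graph}
    for v in graph:
        for w in graph[v]:
            indegree[w] += 1
    queue = deque(v for v in graph if indegree[v] == 0)
    labels: Dict[Hashable, int] = {}
    next_label = 0
    while queue:
        v = queue.popleft()
        next_label += 1
        labels[v] = next_label
        for w in graph[v]:           # each edge is handled exactly once
            indegree[w] -= 1
            if indegree[w] == 0:
                queue.append(w)
    if len(labels) != len(graph):    # not part of the original: input had a cycle
        raise ValueError("graph contains a cycle")
    return labels


dag = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}
print(topological_labels(dag))
```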
Summary: This is another example in which the inductive approach leads almost directly to an algorithm. The
trick here was to choose the order of the induction
wisely. We did not reduce the problem arbitrarily, but
chose a special vertex to remove. The size of any given
problem can be reduced in many possible ways. The
idea is to explore a variety of options and test the re-
Problem
Celebrity
i := 1;
j := 2;
next := 2;
{in the first phase we eliminate all but one candidate}
while next <= n do
begin
    next := next + 1;
    if Know[i, j] then i := next
    else j := next
    {one of either i or j is eliminated}
end;
if i = n + 1 then candidate := j else candidate := i;
{now we check that the candidate is indeed the celebrity}
wrong := false; k := 1;
Know[candidate, candidate] := false;
{just a dummy variable to pass the test}
while not wrong and k <= n do
begin
    if Know[candidate, k] then wrong := true;
    if not Know[k, candidate] then
        if candidate <> k then wrong := true;
    k := k + 1
end;
if not wrong then print "candidate is a celebrity!"
end;
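The two phases can be sketched in Python as follows. This is a simplified illustration, not a transcription: it keeps a single running candidate instead of the variables i, j, and next, uses 0-based indices, and assumes that `know[a][b]` is True when person a knows person b. The name `find_celebrity` is hypothetical.

```python
from typing import List, Optional


def find_celebrity(know: List[List[bool]]) -> Optional[int]:
    """Return the index of the celebrity, or None if there is none.

    A celebrity is known by everyone else and knows no one else.
    """
    n = len(know)
    if n == 0:
        return None
    # Phase 1: each question eliminates one person, n - 1 questions in all.
    candidate = 0
    for other in range(1, n):
        if know[candidate][other]:
            candidate = other   # the old candidate knows someone, so it is out
        # otherwise `other` is not known by the candidate, so `other` is out
    # Phase 2: verify the single remaining candidate.
    for k in range(n):
        if k == candidate:
            continue
        if know[candidate][k] or not know[k][candidate]:
            return None
    return candidate


know = [
    [False, True, True],
    [False, False, True],
    [False, False, False],
]
print(find_celebrity(know))
```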
Complexity: At most 3(n - 1) questions will be asked:
n - 1 questions in the first phase so that n - 1 persons
THE INDUCTION HYPOTHESIS
nodes.
Again, the base case is trivial. Now, when the root is
considered, its balance factor can be easily determined
by the difference between the heights of its children.
Furthermore, the height of the root can also be easily
determined: it is the maximal height of the two children plus 1.
The key to the algorithm is to solve a slightly extended problem. Instead of just computing balance factors, we also compute heights. The extended problem
turns out to be an easier problem since the heights are
easy to compute. In many cases, solving a stronger
problem is easier. This is especially true for induction.
With induction, we only need to extend a solution of a
small problem to a solution of a larger problem. If the
solution is broader (because the problem is extended)
then the induction step may be easier since we have
more to work with. It is a very common error to forget
that there are two different parameters in this problem,
and that each one should be computed separately.
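A short Python sketch of the extended problem, computing heights and balance factors in one traversal, may make this concrete. The `Node` class, the function name `annotate`, and the convention that an empty subtree has height -1 (so a leaf has height 0) are assumptions made for this illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    height: int = 0
    balance: int = 0


def annotate(node: Optional[Node]) -> int:
    """Store height and balance factor in every node; return the subtree height.

    An empty subtree is given height -1 (a convention chosen for this sketch).
    """
    if node is None:
        return -1
    left_height = annotate(node.left)      # stronger hypothesis: heights are known
    right_height = annotate(node.right)
    node.height = max(left_height, right_height) + 1
    node.balance = left_height - right_height
    return node.height


leaf = Node()
inner = Node(left=leaf)
root = Node(left=inner, right=Node())
print(annotate(root), root.balance, inner.balance)
```

Returning the height from each recursive call is what makes the balance factor of the parent a constant-time computation.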
Closest Pair
The problem:
The straightforward solution using induction would
proceed by removing a point, solving the problem for
n - 1 points, and considering the extra point. However,
if the only information obtained from the solution of
the n - 1 case is the minimum distance, then the distances from the additional point to all other n - 1
points must be checked. As a result, the total number
of distance computations is (n - 1) + (n - 2) + ... + 1 =
n(n - 1)/2. (This is, in fact, the straightforward
algorithm consisting of comparing every pair of points.) We
want to find a faster solution.
hypothesis
is the following:
Closest_Pair {first attempt}
Algorithm Closest_Pair {An improved version}
(p_1, p_2, ..., p_n: points in the plane);
begin
The key idea here is to strengthen the induction hypothesis. We have to spend O(n log n) time in the combining step because of the sorting. Although we know
how to solve the sorting problem directly, it takes too
long. Can we somehow solve the sorting problem at the
same time we are solving the closest pair problem? In
other words, we would like to strengthen the induction
hypothesis to include the sorting as part of the closest
pair problem to obtain a better solution.
Induction hypothesis: Given a set of < n points in the
plane, we know how to find the closest distance and how to
output the set sorted according to y coordinates.
T(n) = 2T(n/2) + O(n), T(2) = 1,
which implies that T(n) = O(n log n). The only difference between this algorithm and the previous one is
that the sorting according to the y coordinates is not
done every time from scratch. We use the stronger induction hypothesis to perform the sorting as we go
along. The revised algorithm is given below. This algorithm was developed by Shamos and Hoey [16] (see
also [15]).
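The strengthened hypothesis can be sketched in Python as follows. This is an illustrative reconstruction under stated assumptions, not the listing from the article: the names `closest_pair_distance`, `_solve`, and `_merge_by_y` are hypothetical, points are `(x, y)` tuples, the input size need not be a power of 2, and the strip is scanned with a y-distance cutoff. Each recursive call returns both the closest distance and its points sorted by y, so the combining step merges two sorted lists instead of sorting from scratch.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def _merge_by_y(a: List[Point], b: List[Point]) -> List[Point]:
    """Merge two lists that are each sorted by y coordinate."""
    out: List[Point] = []
    i = j = 0
    while i < len(a) and j < len(b):
        if a[i][1] <= b[j][1]:
            out.append(a[i])
            i += 1
        else:
            out.append(b[j])
            j += 1
    out.extend(a[i:])
    out.extend(b[j:])
    return out


def _solve(pts: List[Point]) -> Tuple[float, List[Point]]:
    """pts is sorted by x. Return (closest distance, pts sorted by y)."""
    if len(pts) <= 1:
        return math.inf, list(pts)
    mid = len(pts) // 2
    mid_x = pts[mid][0]                      # dividing vertical line
    d_left, left_y = _solve(pts[:mid])
    d_right, right_y = _solve(pts[mid:])
    d = min(d_left, d_right)
    by_y = _merge_by_y(left_y, right_y)      # linear-time merge, no re-sorting
    # Only points closer than d to the dividing line can improve on d.
    strip = [p for p in by_y if abs(p[0] - mid_x) < d]
    for i in range(len(strip)):
        j = i + 1
        while j < len(strip) and strip[j][1] - strip[i][1] < d:
            d = min(d, math.dist(strip[i], strip[j]))
            j += 1
    return d, by_y


def closest_pair_distance(points: List[Point]) -> float:
    """Smallest distance between two of the points (inf for fewer than two)."""
    best, _ = _solve(sorted(points))         # sort once by x coordinate
    return best


print(closest_pair_distance([(0, 0), (5, 5), (1, 1), (9, 0), (5, 6)]))
```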
INDUCTION
The idea of strong induction (sometimes called structured induction) is to use not only the assumption that
the theorem is true for n - 1 (or any other value < n),
but the stronger assumption that the theorem is true for
all k, 1 ≤ k < n. Translating this technique to algorithm
design requires maintaining a data structure with all
the solutions of the smaller problems. Therefore, this
technique usually leads to more space usage. We present one example of the use of this technique.
The Knapsack Problem
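As a hedged illustration of keeping the solutions of all smaller problems in a table, the following Python sketch assumes the subset-sum form of the problem: given positive integer item sizes and a non-negative integer capacity, find items whose sizes sum to exactly the capacity, or report that none exist. The problem statement, the function name `knapsack_subset`, and the one-dimensional table are assumptions of this sketch.

```python
from typing import List, Optional


def knapsack_subset(sizes: List[int], capacity: int) -> Optional[List[int]]:
    """Return indices of items whose sizes sum to exactly capacity, or None.

    Assumes positive integer sizes and capacity >= 0.
    """
    # last_item[k] is the index of the item that first made total k reachable
    # (-1 for the empty subset, None while k is still unreachable).
    last_item: List[Optional[int]] = [None] * (capacity + 1)
    last_item[0] = -1
    for i, size in enumerate(sizes):
        # Go downward so that each item is used at most once.
        for k in range(capacity, size - 1, -1):
            if last_item[k] is None and last_item[k - size] is not None:
                last_item[k] = i
    if last_item[capacity] is None:
        return None
    chosen: List[int] = []
    k = capacity
    while k > 0:                 # walk back through the stored solutions
        i = last_item[k]
        chosen.append(i)
        k -= sizes[i]
    return chosen[::-1]


print(knapsack_subset([2, 3, 5, 6], 9))
```

The table uses space proportional to the capacity, which reflects the extra space usage that strong induction tends to require.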
hypothesis:
PROOF TECHNIQUES
sional space, we may want to reduce the number of
objects and/or the number of dimensions depending
on the phase of the algorithm (see for example [4]).
Reversed Induction
This is a little known technique that is not often used
in mathematics but is often used in computer science.
Regular induction covers all natural numbers by
starting from a base case (n = 1) and advancing. Suppose that we want to go backwards. We want to prove
that the theorem is true for n - 1 assuming that it is
true for n. We call this type of proof a reversed induction.
But what would be the base case? We can start by
proving a base case of n = M, where M is a very large
number. If we prove it for n = M and then use reversed
induction, then we have a proof for all numbers ≤ M.
Although this is usually unsatisfactory, in some cases it
is sufficient. For example, suppose we apply double
induction on two parameters (e.g., number of vertices
and number of edges in a graph). We can apply regular
induction to one parameter, and reversed induction to
the second parameter if the second parameter can be
bounded in terms of the first one. For example, there
are at most n(n - 1) edges in directed graphs with n
vertices. We can use regular induction on n with the
assumption that all edges are present (namely, we consider only complete graphs), and then reversed induction on the number of edges.
A more common use of reversed induction is the
following. Proving a base case for only one value of n
limits the proof to those numbers less than the value.
Suppose that we can prove the theorem directly for an
infinite set of values of n. For example, the infinite set
can consist of all powers of 2. Then we can use reversed induction and cover all values of n. This is a
valid proof technique since for each value of n there is
a value larger than it in the base case set (since the set
is infinite).
A very good example of the use of this technique is
the elegant proof (due to Cauchy) of the arithmetic
mean versus geometric mean inequality (see for example [3]). When proving mathematical theorems, it is
usually not easier to go from n to n - 1 than it is to go
from n - 1 to n, and it is much harder to prove an
infinite base case rather than a simple one. When designing algorithms, on the other hand, it is almost always easy to go from n to n - 1, that is, to solve the
problem for smaller inputs. For example, one can introduce dummy inputs that do not affect the outcome.
As a result, it is sufficient in many cases to design the
algorithm not for inputs of all sizes, but only for sizes
taken from an infinite set. The most common use of
this principle is designing algorithms only for inputs of
size n which is a power of 2. It makes the design much
cleaner and eliminates many messy details. Obviously these details will have to be resolved eventually, but they are usually easy to handle. We used the
assumption that n is a power of 2, for example, in the
closest pair problem.
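The dummy-input idea can be sketched in Python with a deliberately simple task, finding the maximum of a list. The example is hypothetical and all function names are assumptions: the core routine is written only for lengths that are powers of 2, and a wrapper pads any other input with negative infinity, which cannot affect the outcome.

```python
import math
from typing import List


def next_power_of_two(n: int) -> int:
    """Smallest power of 2 that is >= n (for n >= 1)."""
    p = 1
    while p < n:
        p *= 2
    return p


def max_power_of_two(values: List[float]) -> float:
    """Maximum of a list whose length is a power of 2, by repeated halving."""
    if len(values) == 1:
        return values[0]
    half = len(values) // 2      # both halves again have power-of-2 length
    return max(max_power_of_two(values[:half]), max_power_of_two(values[half:]))


def maximum(values: List[float]) -> float:
    """Maximum of a non-empty list of any length, via dummy padding."""
    if not values:
        raise ValueError("need at least one value")
    padding = next_power_of_two(len(values)) - len(values)
    return max_power_of_two(list(values) + [-math.inf] * padding)


print(maximum([3, 1, 4, 1, 5]))
```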