
David Liu

Introduction to the Theory of Computation

Lecture Notes and Exercises for CSC236

Department of Computer Science
University of Toronto
Contents

1 Introduction
    But Why Do I Care?
    Overview of this Course
    Prerequisite Knowledge

2 Induction
    The Induction Idea
    When Simple Induction Isn't Enough
    Complete Induction
    Beyond Numbers
    Recursive Definitions of Sets
    Structural Induction
    A Larger Example
    Exercises

3 Recursion
    Measuring Runtime
    A Simple Recursive Function
    Recursively Defined Functions
    Aside: List Slicing vs. Indexing
    A Special Recurrence Form
    Divide-and-Conquer Algorithms
    Quicksort
    Exercises

4 Program Correctness
    What is Correctness?
    Correctness of Recursive Programs
    Iterative Programs
    Termination
    Exercises

5 Regular Languages & Finite Automata
    Definitions
    Regular Languages
    A Suggestive Flowchart
    Deterministic Finite Automata
    Correctness of DFAs
    Limitations of DFAs
    Nondeterminism
    Equivalence of Definitions
    Exercises

6 In Which We Say Goodbye

1 Introduction

There is a common misconception held by our students, the students of other disciplines, and the public at large about the nature of computer science. So
before we begin our study this semester, let us clear up exactly what we will
be learning: computer science is not the study of programming, any more than
chemistry is the study of test tubes or math the study of calculators. To be sure,
programming ability is a vital tool in any computer scientist’s repertoire, but it
is still a tool in service to a higher goal.

Computer science is the study of problem-solving. Unlike other disciplines, where researchers use their skill, experience, and luck to solve problems at the frontier of human knowledge, computer science asks: What is problem-solving? How are problems solved? Why are some problems easier to solve than others? How can we measure the quality of a problem's solution? (Even the many who go into industry confront these questions on a daily basis in their work!)

It should come as no surprise that the field of computer science predates the invention of computers, since humans have been solving problems for millennia. Our English word algorithm, a sequence of steps taken to solve a problem, is named after the Persian mathematician Muhammad ibn Musa al-Khwarizmi, whose mathematics texts were compendia of computational procedures. (The word algebra is derived from the word al-jabr, appearing in the title of one of his books, describing the operation of subtracting a number from both sides of an equation.) In 1936, Alan Turing, one of the fathers of modern computer science, developed the Turing Machine, a theoretical model of computation which is widely believed to be just as powerful as all programming languages in existence today. (A little earlier, Alonzo Church, who would later supervise Turing during the latter's graduate studies, developed the lambda calculus, an alternative model of computation that forms the philosophical basis for functional programming languages like Scheme, Haskell, and ML.) In one of the earliest and most fundamental results in computer science, Turing proved that there are some problems that cannot be solved by any computer that has ever or will ever be built – before computers had been invented at all!

But Why Do I Care?

A programmer's value lies not in her ability to write code, but to understand problems and design solutions – a much harder task. Beginning programmers often write code by trial and error (“Does this compile? What if I add this line?”), which indicates not a lack of programming experience, but a lack of design experience. When presented with a problem, many students often jump straight to the computer, even if they have no idea what they are going to write! And when the code is complete, they are at a loss when asked the two fundamental questions: Why is your code correct, and is it a good solution? (“My code is correct because it passed all of the tests” is reasonable but unsatisfying. What I really want to know is how your code works.)

In this course, you will learn the skills necessary to answer both of these ques-
tions, improving both your ability to reason about the code you write and your
ability to communicate your thinking with others. These skills will help you
design cleaner and more efficient programs, and clearly document and present
your code. Of course, like all skills, you will practice and refine these throughout
your university education and in your careers.

Overview of this Course

The first section of the course introduces the powerful proof technique of induc-
tion. We will see how inductive arguments can be used in many different math-
ematical settings; you will master the structure and style of inductive proofs, so
that later in the course you will not even blink when asked to read or write a
“proof by induction.”

From induction, we turn our attention to the runtime analysis of recursive programs. You have done this already for non-recursive programs, but did not have the tools necessary to handle recursion. We will see that (mathematical) induction and (programming) recursion are two sides of the same coin, so we use induction to make analysing recursive programs easy as cake. (Some might even say, chocolate cake.) After these lessons, you will always be able to evaluate your recursive code based on its runtime, a very important consideration!

We next turn our attention to the correctness of both recursive and non-recursive programs. You already have some intuition about why your programs are correct; we will teach you how to formalize this intuition into mathematically rigorous arguments, so that you may reason about the code you write and determine errors without the use of testing. (This is not to say tests are unnecessary! The methods we'll teach you in this course are quite tricky for larger software systems. However, a more mature understanding of your own code certainly facilitates finding and debugging errors.)

Finally, we will turn our attention to the simplest model of computation, the finite automaton. This serves as an introduction both to more complex computational models like Turing Machines and to formal language theory, through the intimate connection between finite automata and regular languages. Regular languages and automata have many other applications in computer science, from text-based pattern matching to modelling biological processes.

Prerequisite Knowledge

CSC236 is mainly a theoretical course, the successor to MAT102. This is fundamentally a computer science course, though, so while mathematics will play an
important role in our thinking, we will mainly draw motivation from computer
science concepts. Also, you will be expected to both read and write Python code
at the level of CSC148. Here are some brief reminders about things you need to
know – and if you find that you don’t remember something, review early!

Concepts from MAT102

In MAT102, you learned how to write proofs. This is the main object of interest in CSC236, so you should be comfortable with this style of writing. However, one key difference is that we will not expect (nor award marks for) a particular proof structure – indentation is no longer required, and your proofs can be mixtures of mathematics, English paragraphs, pseudocode, and diagrams! Of course, we will still greatly value clear, well-justified arguments, especially since the content will be more complex. (So a technically correct solution that is extremely difficult to understand will not receive full marks. Conversely, an incomplete solution which clearly explains partial results – and possibly even what is left to do to complete the solution – will be marked more generously.)

Concepts from CSC148

Recursion, recursion, recursion. If you liked using recursion in CSC148, you're in luck: induction, the central proof structure in this course, is the abstract thinking behind designing recursive functions. And if you didn't like recursion or found it confusing, don't worry! This course will give you a great opportunity to develop a better feel for recursive functions in general, and even give you programming opportunities to get practical experience.

This is not to say you should forget everything you have done with iterative programs; loops will be present in our code throughout this course, and will be the central object of study for a week or two when we discuss program correctness. In particular, you should be very comfortable with the central design pattern of first-year Python: computing on a list by processing its elements one at a time using a for or while loop. (A design pattern is a common coding template which can be used to solve a variety of different problems. “Looping through a list” is arguably the simplest one.)
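As a refresher, here is a minimal Python example of this pattern (ours, not from the notes):

def sum_list(lst):
    """Compute the sum of the numbers in lst using the standard pattern."""
    total = 0
    for item in lst:          # process the elements one at a time
        total = total + item
    return total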

You should also be comfortable with terminology associated with trees, which
will come up occasionally throughout the course when we discuss induction
proofs.

You will also have to remember the fundamentals of Big-O algorithm analysis,
and how to determine tight asymptotic bounds for common functions.

Finally, the last part of the course deals with regular languages; you should be
familiar with the terminology associated with strings, including length, reversal,
concatenation, and the empty string.
2 Induction

What is the sum of the numbers from 0 to n? This is a well-known identity you've probably seen before:

    ∑_{i=0}^{n} i = n(n+1)/2.

A “proof” of this is attributed to Gauss:

    1 + 2 + 3 + · · · + (n − 1) + n = (1 + n) + (2 + (n − 1)) + (3 + (n − 2)) + · · ·
                                    = (n + 1) + (n + 1) + (n + 1) + · · ·
                                    = (n/2)(n + 1)    (since there are n/2 pairs)

This isn't exactly a formal proof – what if n is odd? – and although it could be made into one, this proof is based on a mathematical “trick” that doesn't work for, say, ∑_{i=0}^{n} i². (We ignore the 0 in the summation, since this doesn't change the sum.) And while mathematical tricks are often useful, they're hard to come up with in the first place! Induction gives us a different way to tackle this problem that is astonishingly straightforward.

A predicate is a parametrized logical statement. Another way to view a predicate is as a function that takes in one or more arguments, and outputs either True or False. Some examples of predicates are:

    EV(n): n is even
    GR(x, y): x > y
    FROSH(a): a is a first-year university student

Every predicate has a domain, the set of its possible input values. For example, the above predicates could have domains N, R, and “the set of all UofT students,” respectively. (We will always use the convention that 0 ∈ N unless otherwise specified.) Predicates give us a precise way of formulating English problems; the predicate that is relevant to our example is

    P(n): ∑_{i=0}^{n} i = n(n+1)/2.

(A common mistake: defining the predicate to be something like P(n): n(n+1)/2. Such an expression is wrong and misleading because it isn't a True/False value, and so fails to capture precisely what we want to prove.)

You might be thinking right now: “Okay, now we're going to prove that P(n) is true.” But this is wrong, because we haven't yet defined n! So in fact we want to prove that P(n) is true for all natural numbers n, or written symbolically, ∀n ∈ N, P(n). Here is how a formal proof might go if we were not using mathematical induction:
Proof of ∀n ∈ N, ∑_{i=0}^{n} i = n(n+1)/2:

    Let n ∈ N.
    # Want to prove that P(n) is true.
    Case 1: Assume n is even.
        # Gauss' trick
        ...
        Then P(n) is true.
    Case 2: Assume n is odd.
        # Gauss' trick, with a twist?
        ...
        Then P(n) is true.
    Then in all cases, P(n) is true.
Then ∀n ∈ N, P(n). ∎

Instead, we’re going to see how induction gives us a different, easier way of
proving the same thing.

The Induction Idea

Suppose we want to create a viral Youtube video featuring “The World’s Longest
Domino Chain!!! (like plz)".

Of course, a static image like the one featured on the right is no good for video;
instead, once we have set it up we plan on recording all of the dominoes falling
in one continuous, epic take. It took a lot of effort to set up the chain, so we
would like to make sure that it will work; that is, that once we tip over the
first domino, all the rest will fall. Of course, with dominoes the idea is rather
straightforward, since we have arranged the dominoes precisely enough that
any one falling will trigger the next one to fall. We can express this thinking a
bit more formally:

(1) The first domino will fall (when we push it).


(2) For each domino, if it falls, then the next one in the chain will fall (because
each domino is close enough to the next one).

From these two thoughts, we can conclude that

(3) Every domino in the chain will fall.

We can apply the same reasoning to the set of natural numbers. Instead of
“every domino in the chain will fall,” suppose we want to prove that “for all
n ∈ N, P(n) is true”, where P(n) is some predicate. The analogues of the above
statements in the context of natural numbers are

(1) P(0) is true (0 is the “first” natural number)

(2) ∀k ∈ N, P(k) ⇒ P(k + 1)

(3) ∀n ∈ N, P(n) is true

(The “is true” is redundant, but we will often include these words for clarity.)

Putting these together yields the Principle of Simple Induction (also known as Mathematical Induction):

    [P(0) ∧ (∀k ∈ N, P(k) ⇒ P(k + 1))] ⇒ ∀n ∈ N, P(n)

A different, slightly more mathematical intuition for what induction says is that “P(0) is true, and P(1) is true because P(0) is true, and P(2) is true because P(1) is true, and P(3) is true because. . . ” However, it turns out that a more rigorous proof of simple induction doesn't exist from the basic arithmetic properties of the natural numbers alone. Therefore mathematicians accept the principle of induction as an axiom, a statement as fundamentally true as 1 + 1 = 2. (It certainly makes sense intuitively, and turns out to be equivalent to another fundamental math fact called the Well-Ordering Principle.)

This gives us a new way of proving a statement is true for all natural numbers: instead of proving P(n) for an arbitrary n, just prove P(0), and then prove the link P(k) ⇒ P(k + 1) for an arbitrary k. The former step is called the base case, while the latter is called the induction step. We'll see exactly how such a proof goes by illustrating it with the opening example.
Example 2.1. Prove that for every natural number n, ∑_{i=0}^{n} i = n(n+1)/2.

Proof. First, we define the predicate associated with this question. This lets us determine exactly what it is we're going to use in the induction proof. (The first few induction examples in this chapter have a great deal of structure; this is only to help you learn the necessary ingredients of induction proofs. We will not be marking for a particular structure in this course, but you will probably find it helpful to use our keywords to organize your proofs.)

Step 1 (Define the Predicate): P(n): ∑_{i=0}^{n} i = n(n+1)/2

It's easy to miss this step, but without it, often you'll have trouble deciding precisely what to write in your proofs.

Step 2 (Base Case): n = 0. We would like to prove that P(0) is true. Recall the meaning of P:

    P(0): ∑_{i=0}^{0} i = 0(0+1)/2.

This statement is trivially true, because both sides of the equation are equal to 0. (For induction proofs, the base case is usually a very straightforward proof. In fact, if you find yourself stuck on the base case, then it is likely that you've misunderstood the question and/or are trying to prove the wrong predicate.)

Step 3 (Induction Step): the goal is to prove that ∀k ∈ N, P(k) ⇒ P(k + 1). Let k ∈ N be some arbitrary natural number, and assume P(k) is true. This antecedent assumption has a special name: the Induction Hypothesis. Explicitly, we assume that

    ∑_{i=0}^{k} i = k(k+1)/2.

Now, we want to prove that P(k + 1) is true, i.e., that ∑_{i=0}^{k+1} i = (k+1)(k+2)/2. This can be done with a simple calculation, in which we break up the sum by removing the last element:

    ∑_{i=0}^{k+1} i = (∑_{i=0}^{k} i) + (k + 1)
                    = k(k+1)/2 + (k + 1)      (by the Induction Hypothesis)
                    = (k + 1)(k/2 + 1)
                    = (k+1)(k+2)/2

(The one structural requirement we do have for this course is that you must always state exactly where you use the induction hypothesis. We expect to see the words “by the induction hypothesis” at least once in each of your proofs.)

Therefore P(k + 1) holds. This completes the proof of the induction step: ∀k ∈ N, P(k) ⇒ P(k + 1).

Finally, by the Principle of Simple Induction, we can conclude that ∀n ∈ N, P(n). ∎
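Although testing is no substitute for the proof above, here is a minimal Python sanity check (ours, not from the notes) of the identity for small values of n:

total = 0
for n in range(101):
    total += n                             # total is now 0 + 1 + ... + n
    assert total == n * (n + 1) // 2, f"P({n}) failed"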




In our next example, we look at a geometric problem – notice how our proof
will use no algebra at all, but instead constructs an argument from English state-
ments and diagrams. This example is also interesting because it shows how to
apply simple induction starting at a number other than 0.
Example 2.2. A triomino is a three-square L-shaped figure. To the right, we show a 4-by-4 chessboard with one corner missing that has been tiled with triominoes.

Prove that for all n ≥ 1, any 2^n-by-2^n chessboard with one corner missing can be tiled with triominoes.

Proof. Predicate: P(n): Any 2^n-by-2^n chessboard with one corner missing can be tiled with triominoes.

Base Case: This is slightly different, because we only want to prove the claim for n ≥ 1 (and ignore n = 0). Therefore our base case is n = 1, i.e., this is the “start” of our induction chain. When n = 1, we consider a 2-by-2 chessboard with one corner missing. But such a chessboard is exactly the same shape as a triomino, so of course it can be tiled by triominoes! (Again, a rather trivial base case. Keep in mind that even though it was simple, the proof would have been incomplete without it!)

Induction Step: Let k ≥ 1 and suppose that P(k) holds; that is, that every 2^k-by-2^k chessboard with one corner missing can be tiled by triominoes. (This is the Induction Hypothesis.) The goal is to show that any 2^{k+1}-by-2^{k+1} chessboard with one corner missing can be tiled by triominoes.

Consider an arbitrary 2^{k+1}-by-2^{k+1} chessboard with one corner missing. Divide it into quarters, each quarter a 2^k-by-2^k chessboard.

Exactly one of these has one corner missing; by the Induction Hypothesis, this quarter can be tiled by triominoes. Next, place a single triomino in the middle that covers one corner in each of the three remaining quarters.

Each of these quarters now has one corner covered, and by the I.H. again, they can each be tiled by triominoes. This completes the tiling of the 2^{k+1}-by-2^{k+1} chessboard. ∎ (Note that in this proof, we used the induction hypothesis twice! Or technically, four times, once for each 2^k-by-2^k quarter.)

Before moving on, here is some intuition behind what we did in the previous two examples. Given a problem of a 2^n-by-2^n chessboard, we repeatedly broke it up into smaller and smaller parts, until we reached the 2-by-2 size, which we could tile using just a single triomino. This idea of breaking down the problem into smaller ones “again and again” was a clear sign that a formal proof by induction was the way to go. Be on the lookout for phrases like “repeat over and over” in your own thinking to signal that you should be using induction. (In your programming, this is the same sign that points to using recursive solutions as the easiest approach.) In the opening example, we used an even more specific approach: in the induction step, we took the sum of size k + 1 and reduced it to a sum of size k, and evaluated that using the induction hypothesis. The cornerstone of simple induction is this link between problem instances of size k and size k + 1, and this ability to break down a problem into something exactly one size smaller.

Example 2.3. Consider the sequence of natural numbers satisfying the following properties: a_0 = 1, and for all n ≥ 1, a_n = 2a_{n−1} + 1. Prove that for all n ∈ N, a_n = 2^{n+1} − 1. (We will see in the next chapter one way of discovering this expression for a_n.)

Proof. The predicate we will prove is

    P(n): a_n = 2^{n+1} − 1.

The base case is n = 0. By the definition of the sequence, a_0 = 1, and 2^{0+1} − 1 = 2 − 1 = 1, so P(0) holds.

For the induction step, let k ∈ N and suppose a_k = 2^{k+1} − 1. Our goal is to prove that P(k + 1) holds. By the recursive property of the sequence,

    a_{k+1} = 2a_k + 1
            = 2(2^{k+1} − 1) + 1    (by the I.H.)
            = 2^{k+2} − 2 + 1
            = 2^{k+2} − 1    ∎
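As a quick sanity check (checking is not proving!), we can compare the recursive definition against the closed form in Python (sketch ours):

a = 1  # a_0 = 1
for n in range(1, 50):
    a = 2 * a + 1                   # a_n = 2*a_{n-1} + 1
    assert a == 2 ** (n + 1) - 1    # closed form: a_n = 2^(n+1) - 1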

When Simple Induction Isn’t Enough

By this point, you have done several examples using simple induction. Recall that the intuition behind this proof technique is to reduce problems of size k + 1 to problems of size k (where “size” might mean the value of a number, or the size of a set, or the length of a string, etc.). However, for many problems there is no natural way to reduce problem sizes just by 1. Consider, for example, the following problem:

    Prove that every natural number greater than 1 has a prime factorization, i.e., can be written as a product of primes. (Every prime can be written as a product of just one number: itself!)

How would you go about proving the induction step, using the method we've used so far? That is, how would you prove P(k) ⇒ P(k + 1)? This is a very hard question to answer, because even the prime factorizations of consecutive numbers can be completely different! (E.g., 210 = 2 · 3 · 5 · 7, but 211 is prime.)

But if I asked you to solve this question by “breaking the problem down,” you would come up with the idea that if k + 1 is not prime, then we can write k + 1 = a · b, where a, b < k + 1, and we can “do this recursively” until we're left with a product of primes. Since we always identify recursion with induction, this hints at a more general form of induction that we can use to prove this statement.

Complete Induction

Recall the intuitive “chain of reasoning” that we do with simple induction: first we prove P(0), and then use P(0) to prove P(1), then use P(1) to prove P(2), etc. So when we get to k + 1, we try to prove P(k + 1) using P(k), but we have already gone through proving P(0), P(1), . . . , and P(k − 1), in addition to P(k)! In some sense, in Simple Induction we're throwing away all of our previous work except for P(k). In Complete Induction, we keep this work and use it in our proof of the induction step. Here is the formal statement of the Principle of Complete Induction:

    [P(0) ∧ (∀k, (P(0) ∧ P(1) ∧ · · · ∧ P(k)) ⇒ P(k + 1))] ⇒ ∀n, P(n)

The only difference between Complete and Simple Induction is in the antecedent of the inductive part: instead of assuming just P(k), we now assume all of P(0), P(1), . . . , P(k). Since these are assumptions we get to make in our proofs, Complete Induction proofs are often more flexible than Simple Induction proofs — intuitively, because we have “more to work with.” (Somewhat surprisingly, everything we can prove with Complete Induction we can also prove with Simple Induction, and vice versa. So these proof techniques are equally powerful. Azadeh Farzan gave a great analogy for the two types of induction: Simple Induction is like a person climbing stairs step by step; Complete Induction is like a robot with multiple giant legs, capable of jumping from any lower step to a higher step.)

Let's illustrate this (slightly different) technique by proving the earlier claim about prime factorizations.
Example 2.4. Prove that every natural number greater than 1 has a prime factorization.

Proof. Predicate: P(n): “There are primes p_1, p_2, . . . , p_m (for some m ≥ 1) such that n = p_1 p_2 · · · p_m.” We will show that ∀n ≥ 2, P(n).

Base Case: n = 2. Since 2 is prime, we can let p_1 = 2 and say that n = p_1, so P(2) holds.

Induction Step: Here is the only structural difference for Complete Induction proofs. We let k ≥ 2, and our induction hypothesis is now to assume that for all 2 ≤ i ≤ k, P(i) holds. (That is, we're assuming P(2), P(3), P(4), . . . , P(k) are all true.) The goal is still the same: prove that P(k + 1) is true.

There are two cases. In the first case, assume k + 1 is prime. Then of course k + 1 can be written as a product of primes, so P(k + 1) is true. (The product contains a single number, k + 1.)

In the second case, k + 1 is composite. But then by the definition of compositeness, there exist a, b ∈ N such that k + 1 = ab and 2 ≤ a, b ≤ k; that is, k + 1 has factors other than 1 and itself. This is the intuition from earlier. And here is the “recursive thinking”: by the induction hypothesis, P(a) and P(b) hold. (We can only use the induction hypothesis because a and b are at least 2 and less than k + 1.) Therefore we can write

    a = q_1 · · · q_{l₁} and b = r_1 · · · r_{l₂},

where each of the q's and r's is prime. But then

    k + 1 = ab = q_1 · · · q_{l₁} r_1 · · · r_{l₂},

and this is the prime factorization of k + 1. So P(k + 1) holds. ∎

Note that we used inductive thinking to break down the problem; but unlike
Simple Induction where the size of the subproblem is one less than the current
problem size, we didn’t know much about the sizes of the resulting problems
(only that they were smaller than the original problem). Complete Induction
allows us to handle this sort of structure.
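The recursive idea in this proof translates directly into code. Here is a Python sketch (ours, not from the notes) that factors a number exactly the way the induction step breaks it down:

def prime_factorization(n):
    """Return a list of primes whose product is n (requires n >= 2)."""
    for a in range(2, int(n ** 0.5) + 1):
        if n % a == 0:
            # n = a * b with both factors smaller than n: recurse on each,
            # just as the complete induction step uses P(a) and P(b).
            return prime_factorization(a) + prime_factorization(n // a)
    return [n]  # no smaller factor found: n is prime (the "prime" case)

assert prime_factorization(210) == [2, 3, 5, 7]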

Example 2.5. The Fibonacci sequence is a sequence of natural numbers defined recursively as f_1 = f_2 = 1, and for all n ≥ 3, f_n = f_{n−1} + f_{n−2}. Prove that for all n ≥ 1,

    f_n = (φ^n − ψ^n)/√5,    where we write φ = (1 + √5)/2 and ψ = (1 − √5)/2 for brevity.

Proof. Note that we really need complete induction here (and not just simple induction) because f_n is defined in terms of both f_{n−1} and f_{n−2}, and not just f_{n−1} only.

The predicate we will prove is P(n): f_n = (φ^n − ψ^n)/√5. We require two base cases: one for n = 1, and one for n = 2. These can be checked by simple calculations:

    (φ¹ − ψ¹)/√5 = ((1+√5)/2 − (1−√5)/2)/√5 = √5/√5 = 1 = f_1

    (φ² − ψ²)/√5 = ((6+2√5)/4 − (6−2√5)/4)/√5 = √5/√5 = 1 = f_2

For the induction step, let k ≥ 2 and assume P(1), P(2), . . . , P(k) hold. Consider f_{k+1}. By the recursive definition, and noting that φ + 1 = (6+2√5)/4 = φ² and ψ + 1 = (6−2√5)/4 = ψ², we have

    f_{k+1} = f_k + f_{k−1}
            = (φ^k − ψ^k)/√5 + (φ^{k−1} − ψ^{k−1})/√5    (by I.H.)
            = (φ^k + φ^{k−1})/√5 − (ψ^k + ψ^{k−1})/√5
            = φ^{k−1}(φ + 1)/√5 − ψ^{k−1}(ψ + 1)/√5
            = φ^{k−1} · φ²/√5 − ψ^{k−1} · ψ²/√5
            = (φ^{k+1} − ψ^{k+1})/√5    ∎
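Since the closed form involves irrational numbers, a numerical check in Python has to contend with floating-point error; rounding suffices for small n. (A sanity check only, sketch ours, not part of the proof.)

from math import sqrt

phi, psi = (1 + sqrt(5)) / 2, (1 - sqrt(5)) / 2
f_prev, f_curr = 1, 1  # f_1 and f_2
for n in range(1, 30):
    assert round((phi ** n - psi ** n) / sqrt(5)) == f_prev
    f_prev, f_curr = f_curr, f_prev + f_curr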

Beyond Numbers

So far, our proofs have all been centred on natural numbers. Even in situations
where we have proved statements about other objects — like sets and chess-
boards — our proofs have always required associating these objects with natural
numbers. Consider the following problem:

Prove that any non-empty binary tree has exactly one more node
than edge.

We could use either simple or complete induction on this problem by associating every tree with a natural number (height and number of nodes are two of the most common). But this is not the most “natural” way of approaching this problem (though it's perfectly valid!) because binary trees already have a lot of nice recursive structure that we should be able to use directly, without shoehorning in natural numbers. What we want is a way of proving statements about objects other than numbers. Thus we move away from N (the set of natural numbers), to more general sets (such as the set of all non-empty binary trees).

Recursive Definitions of Sets


You are already familiar with many descriptions of sets: {2, π, 10}, { x ∈ R |
x ≥ 4}, and “the set of all non-empty binary trees” are all perfectly valid de-
scriptions of sets. Unfortunately, these set descriptions don’t lend themselves
very well to induction, because induction is recursion and it isn’t clear how to
apply recursive thinking to any of these descriptions. However, for some objects
– like binary trees – it is relatively straightforward to define them recursively.
Here’s a warm-up.

Example 2.6. Suppose we want to construct a recursive definition of N. Here is one way. Define N to be the (smallest) set such that:

• 0 ∈ N
• If k ∈ N, then k + 1 ∈ N

(“Smallest” means that nothing else is in N. This is an important point to make; for example, the set of integers Z also satisfies the given properties, but includes more than N. In the recursive definitions below, we omit “smallest” but it is always implicitly there.)

Notice how similar this definition looks to the Principle of Simple Induction! This isn't a coincidence: induction fundamentally makes use of this recursive structure of N. We'll refer to the first rule as the base of the definition, and the second as the recursive rule. In general, a recursive definition can have multiple base and recursive rules!

Example 2.7. Construct a recursive definition of “the set of all non-empty binary
trees.”

Intuitively, the base rule(s) always capture the smallest or simplest elements of
a set. Certainly the smallest non-empty binary tree is a single node.

What about larger trees? This is where “breaking down” problems into smaller
subproblems makes the most sense. You should know from CSC148 that we
really store binary trees in a recursive manner: every tree has a root node and
links to the roots of the left and right subtrees (the suggestive word here is
“subtree.”) One slight subtlety is that one or both of these subtrees could be
empty. Here is a formal recursive definition (before you read it, try coming up
with one yourself!):

• A single node is a non-empty binary tree.


• If T1 , T2 are two non-empty binary trees, then the tree with a new root r
connected to the roots of T1 and T2 is a non-empty binary tree.
• If T1 is a non-empty binary tree, then the tree with a new root r connected to
the root of T1 to the left or to the right is a non-empty binary tree.

Notice that this definition has two recursive rules, not one!
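This definition translates naturally into code; here is a minimal Python sketch (class and names ours, not from the notes), where a missing subtree is represented by None:

class BTree:
    """A non-empty binary tree, following the recursive definition."""
    def __init__(self, left=None, right=None):
        self.left = left      # None, or a BTree attached to the left
        self.right = right    # None, or a BTree attached to the right

leaf = BTree()                  # base rule: a single node
full = BTree(BTree(), BTree())  # rule: new root joining two subtrees
lopsided = BTree(left=BTree())  # rule: new root with only a left subtree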

Structural Induction

Now, we mimic the format of our induction proofs, but with the recursive definition of non-empty binary trees rather than natural numbers. The similarity of form is why this type of proof is called structural induction. In particular, notice the identical terminology.

Example 2.8. Prove that every non-empty binary tree has one more node than edge.

Proof. As before, we need to define a predicate to nail down exactly what it is we'd like to prove. However, unlike all of the previous predicates we've seen, which have been boolean functions on natural numbers, now the domain of the predicate is the set of all non-empty binary trees.

Predicate: P(T): T has one more node than edge. (Note that here the domain of the predicate is NOT N, but instead the set of non-empty binary trees.)

We will use structural induction to prove that for every non-empty binary tree T, P(T) holds.

Base Case: Our base case is determined by the first rule. Suppose T is a single node. Then it has one node and no edges, so P(T) holds.

Induction Step: We'll divide our proof into two parts, one for each recursive rule.

• Let T1 and T2 be two non-empty binary trees, and assume P(T1) and P(T2) hold. (This is the induction hypothesis.) Let T be the tree constructed by attaching a node r to the roots of T1 and T2. Let V(G) and E(G) denote the number of nodes and edges in a tree G, respectively. Then we have the equations

    V(T) = V(T1) + V(T2) + 1
    E(T) = E(T1) + E(T2) + 2

since one extra node (the new root r) and two extra edges (from r to the roots of T1 and T2) were added to form T. By the induction hypothesis, V(T1) = E(T1) + 1 and V(T2) = E(T2) + 1, and so

    V(T) = E(T1) + 1 + E(T2) + 1 + 1
         = E(T1) + E(T2) + 2 + 1
         = E(T) + 1

Therefore P(T) holds.

• Let T1 be a non-empty binary tree, and suppose P(T1) holds. Let T be the tree formed by taking a new node r and adding an edge to the root of T1. Then V(T) = V(T1) + 1 and E(T) = E(T1) + 1, and since V(T1) = E(T1) + 1 (by the induction hypothesis), we have

    V(T) = E(T1) + 2 = E(T) + 1.    ∎
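We can spot-check the claim on concrete trees; the following standalone sketch (ours) represents a tree as a pair (left, right), with None for a missing subtree:

def nodes(t):
    """Count the nodes of a tree t = (left, right)."""
    left, right = t
    return 1 + sum(nodes(c) for c in (left, right) if c is not None)

def edges(t):
    """Count the edges of a tree t = (left, right)."""
    left, right = t
    return sum(1 + edges(c) for c in (left, right) if c is not None)

leaf = (None, None)
t = ((leaf, None), (leaf, leaf))
assert nodes(t) == edges(t) + 1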

In structural induction, we identify some property that is satisfied by the simplest (base) elements of the set, and then show that the property is preserved under each of the recursive construction rules. (We say that such a property is invariant under the recursive rules, meaning it isn't affected when the rules are applied. The term “invariant” will reappear throughout this course in different contexts.)

Here is some intuition: imagine you have a set of Lego blocks. Starting with individual Lego pieces, there are certain “rules” that you can use to combine Lego objects to build larger and larger structures, corresponding to (say) different ways of attaching Lego pieces together. This is a recursive way of describing the (infinite!) set of all possible Lego creations.

Now suppose you'd like to make a perfectly spherical object, like a soccer ball or the Death Star. Unfortunately, you look in your Lego kit and all you see are rectangular pieces! Naturally, you complain to your mother (who bought the kit for you) that you'll be unable to make a perfect sphere using the kit. But she remains unconvinced: maybe you should try doing it, she suggests, and if you're lucky you'll come up with a clever way of arranging the pieces to make a sphere. Aha! This is impossible, since you're starting with non-spherical pieces, and you (being a Lego expert) know that no matter which way you combine Lego objects together, starting with rectangular objects yields only other rectangular objects as results. So even though there are many, many different rectangular structures you could build, none of them could ever be perfect spheres.

A Larger Example

Let us turn our attention to another useful example of induction: proving the equivalence of recursive and non-recursive definitions. We know from our study of Python that often problems can be solved using either recursive or iterative programs (although often a particular problem lends itself more to one technique than the other), but we've taken it for granted that these programs really can accomplish the same task. We'll look later in this course at proving things about what programs do, but for a warm-up in this section, we'll step back from programs and prove a similar type of mathematical result.

Example 2.9. Consider the following recursively defined set S ⊆ N × N:

• (0, 0) ∈ S
• If (a, b) ∈ S, then both (a + 1, b + 1) ∈ S and (a + 3, b) ∈ S

(Again, there are two recursive rules here.)

Also, define the set S′ = {(x, y) ∈ N × N | x ≥ y ∧ 3 | x − y}, where 3 | x − y means that x − y is divisible by 3. Prove that these two definitions are equivalent, i.e., S = S′.

Proof. We divide our solution into two parts. First, we show using structural induction that S ⊆ S′; that is, every element of S satisfies the property of S′. Then, we prove using complete induction that S′ ⊆ S; that is, every element of S′ can be constructed from the base and recursive rules of S.

Part 1: S ⊆ S′. In this part, we show that the base case of S is in S′, and that all elements generated using the recursive rules of S are also in S′. For clarity, we define the predicate

    P(x, y): x ≥ y ∧ 3 | x − y

The only base element of S is (0, 0). Clearly, P(0, 0) is true, as 0 ≥ 0 and 3 | 0.

Now for the induction step. There are two recursive rules for S. Let (a, b) ∈ S, and suppose P(a, b) holds. Consider (a + 1, b + 1). By the induction hypothesis, a ≥ b, and so a + 1 ≥ b + 1. Also, (a + 1) − (b + 1) = a − b, which is divisible by 3 (again by the I.H.). So P(a + 1, b + 1) also holds.

Finally, consider (a + 3, b). Since a ≥ b (by the I.H.), a + 3 ≥ b. Also, since 3 | a − b (again by the I.H.), we can write a − b = 3k. Then (a + 3) − b = 3(k + 1), so 3 | (a + 3) − b. Therefore, P(a + 3, b) holds.

Part 2: S′ ⊆ S. We would like to use complete induction, but we can only apply that technique to natural numbers, and not pairs of natural numbers. So we need to associate each pair (a, b) with a single natural number. We can do this by considering the sum of the pair. We define the following predicate:

    P(n): for every (x, y) ∈ S′ such that x + y = n, (x, y) ∈ S.

It should be clear that proving ∀n ∈ N, P(n) is equivalent to proving that S′ ⊆ S. We will prove the former using complete induction.

The base case is n = 0. The only element of S′ whose (x, y) sums to 0 is (0, 0), which is certainly in S by the base rule of the recursive definition.

Now let k ∈ N, and suppose P(0), P(1), . . . , P(k) all hold. Let (x, y) ∈ S′ such that x + y = k + 1. We will prove that (x, y) ∈ S. There are two cases to consider:

• y > 0. Then since x ≥ y, x > 0. (These checks ensure that x − 1, y − 1 ∈ N.) Then (x − 1, y − 1) ∈ S′, and (x − 1) + (y − 1) = k − 1. By the Induction Hypothesis (in particular, P(k − 1)), (x − 1, y − 1) ∈ S. Then (x, y) ∈ S by applying the first recursive rule in the definition of S.

• y = 0. Since k + 1 > 0, it must be the case that x > 0. Then since x − y = x, x must be divisible by 3, and so x ≥ 3. Then (x − 3, y) ∈ S′ and (x − 3) + y = k − 2, so by the Induction Hypothesis (in particular, P(k − 2)), (x − 3, y) ∈ S. Applying the second recursive rule in the definition of S shows that (x, y) ∈ S. ∎
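As an experiment (sketch ours, not part of the proof), we can generate elements of S in Python by applying the two recursive rules repeatedly, and confirm that each one satisfies the defining property of S′:

def generate_S(depth):
    """Apply the two recursive rules of S 'depth' times, from (0, 0)."""
    elems = {(0, 0)}
    for _ in range(depth):
        elems |= {(a + 1, b + 1) for (a, b) in elems}
        elems |= {(a + 3, b) for (a, b) in elems}
    return elems

for (x, y) in generate_S(5):
    assert x >= y and (x - y) % 3 == 0   # the property defining S'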

Exercises

1. Prove that for all n ∈ N, ∑_{i=0}^{n} i² = n(n+1)(2n+1)/6.

2. Let a ∈ R, a ≠ 1. Prove that for all n ∈ N, ∑_{i=0}^{n} a^i = (a^{n+1} − 1)/(a − 1).

3. Prove that for all n ≥ 1, ∑_{k=1}^{n} 1/(k(k+1)) = n/(n+1).

4. Prove that ∀n ∈ N, the units digit of 4^n is either 1, 4, or 6.

5. Prove that ∀n ∈ N, 3 | 4^n − 1, where “m | n” means that m divides n, or equivalently that n is a multiple of m. This can be expressed algebraically as ∃k ∈ N, n = mk.

6. Prove that for all n ≥ 2, 2^n + 3^n < 4^n.

7. Let m ∈ N. Prove that for all n ∈ N, m | (m + 1)^n − 1.

8. Prove that ∀n ∈ N, n² ≤ 2^n + 1. Hint: first prove, without using induction, that 2n + 1 ≤ n² − 1 for n ≥ 3.

9. Find a natural number k ∈ N such that for all n ≥ k, n³ + n < 3^n. Then, prove this result using simple induction.

10. Prove that 3^n < n! for all n > 6.

11. Prove that for every n ∈ N, every set of size n has exactly 2^n subsets.

12. Find formulas for the number of even-sized subsets and odd-sized subsets of a set of size n. Prove that your formulas are correct in a single induction argument. (So your predicate should be something like “every set of size n has ... even-sized subsets and ... odd-sized subsets.”)

13. Prove, using either simple or complete induction, that any binary string begins and ends with the same character if and only if it contains an even number of occurrences of substrings from {01, 10}. (A binary string is a string containing only 0's and 1's.)

14. A ternary tree is a tree where each node has at most 3 children. Prove that for every n ≥ 1, every non-empty ternary tree of height n has at most (3^n − 1)/2 nodes.

15. Let a ∈ R, a ≠ 1. Prove that for all n ∈ N,

    ∑_{i=0}^{n} i · a^i = (n · a^{n+2} − (n + 1) · a^{n+1} + a)/(a − 1)².

Challenge: can you mathematically derive this formula by starting from the standard geometric identity?
16. Recall two standard trigonometric identities:

    cos(x + y) = cos(x) cos(y) − sin(x) sin(y)
    sin(x + y) = sin(x) cos(y) + cos(x) sin(y)

Also recall the definition of the imaginary number i = √−1. Prove, using induction, that

    (cos(x) + i sin(x))^n = cos(nx) + i sin(nx).

17. The Fibonacci sequence is an infinite sequence of natural numbers f_1, f_2, . . . with the following recursive definition:

    f_i = 1                   if i = 1, 2
    f_i = f_{i−1} + f_{i−2}   if i > 2

(a) Prove that for all n ≥ 1, ∑_{i=1}^{n} f_i = f_{n+2} − 1.
(b) Prove that for all n ≥ 1, ∑_{i=1}^{n} f_{2i−1} = f_{2n}.
(c) Prove that for all n ≥ 2, f_n² − f_{n+1} f_{n−1} = (−1)^{n−1}.
(d) Prove that for all n ≥ 1, gcd(f_n, f_{n+1}) = 1. (You may use the fact that for all a < b, gcd(a, b) = gcd(a, b − a).)
(e) Prove that for all n ≥ 1, ∑_{i=1}^{n} f_i² = f_n f_{n+1}.

18. A full binary tree is a non-empty binary tree where every node has exactly 0 or 2 children. Equivalently, every internal node (non-leaf) has exactly two children.

(a) Prove using complete induction that every full binary tree has an odd number of nodes. (You can choose to do induction on either the height or the number of nodes in the tree. A solution with simple induction is also possible, but less natural.)
(b) Prove using complete induction that every full binary tree has exactly one more leaf than internal nodes.
(c) Give a recursive definition for the set of all full binary trees.
(d) Reprove parts (a) & (b) using structural induction instead of complete induction.

19. Consider the set of binary trees with the following property: for each node, the heights of its left and right children differ by at most 1. Prove that every binary tree with this property of height n has at least (1.5)^n − 1 nodes.

20. Let k > 1. Prove that for all n ∈ N, (1 − 1/k)^n ≥ 1 − n/k.
21. Consider the following recursively defined function f : N → N.

    f(n) = 2                         if n = 0
    f(n) = 7                         if n = 1
    f(n) = (f(n − 1))² − f(n − 2)    if n ≥ 2

Prove that for all n ∈ N, 3 | f(n) − 2. (It will be helpful to phrase your predicate here as ∃k ∈ N, f(n) = 3k + 2.)
22. Prove that every natural number greater than 1 can be written as the sum of
prime numbers.
23. We define the set S of strings over the alphabet {[, ]} recursively by

• e, the empty string, is in S


• If w ∈ S, then so is [w]
• If x, y ∈ S, then so is xy

Prove that every string in S is balanced, i.e., the number of left brackets equals
the number of right brackets.
24. The Fibonacci trees T_n are a special set of binary trees defined recursively as follows.

• T_1 and T_2 are binary trees with only a single node.
• For n > 2, T_n consists of a root node whose left subtree is T_{n−1}, and whose right subtree is T_{n−2}.

(a) Prove that for all n ≥ 2, the height of T_n is n − 2.
(b) Prove that for all n ≥ 1, T_n has f_n leaves, where f_n is the n-th Fibonacci number.

25. Consider the following recursively defined set S ⊆ N.

• 2 ∈ S
• If k ∈ S, then k² ∈ S
• If k ∈ S and k ≥ 2, then k/2 ∈ S

(a) Prove that every element of S is a power of 2, i.e., can be written in the form 2^m for some m ∈ N.
(b) Prove that every power of 2 (including 2^0) is in S.

26. Consider the set S ⊂ N² of ordered pairs of integers defined by the following recursive definition:

• (3, 2) ∈ S
• If (x, y) ∈ S, then (3x − 2y, x) ∈ S

Also consider the set S′ ⊂ N² with the following non-recursive definition:

    S′ = {(2^{k+1} + 1, 2^k + 1) | k ∈ N}.

Prove that S = S′, or in other words, that the recursive and non-recursive definitions agree.
27. We define the set of propositional formulas PF as follows:

• Any proposition P is a propositional formula.
• If F is a propositional formula, then so is ¬F.
• If F_1 and F_2 are propositional formulas, then so are F_1 ∧ F_2, F_1 ∨ F_2, F_1 ⇒ F_2, and F_1 ⇔ F_2.

Prove that for all propositional formulas F, F has a logically equivalent formula G such that G only has negations applied to propositions. For example, we have the equivalence

    ¬(¬(P ∧ Q) ⇒ R) ⇔ (¬P ∨ ¬Q) ∧ ¬R

Hint: you won't have much luck applying induction directly to the statement in the question. (Try it!) Instead, prove the stronger statement: “F and ¬F have equivalent formulas that only have negations applied to propositions.”
28. It is well-known that Facebook friendships are the most important relation-
ships you will have in your lifetime. For a person x on Facebook, let f x denote
the number of friends x has. Find a relationship between the total number of
Facebook friendships in the world, and the sum of all of the f x ’s (over every
person on Facebook). Prove your relationship using induction.
29. Consider the following 1-player game. We start with n pebbles in a pile,
where n ≥ 1. A valid move is the following: pick a pile with more than 1
pebble, and divide it into two smaller piles. When this happens, add to your
score the product of the sizes of the two new piles. Continue making moves
until no more can be made, i.e., there are n piles each containing a single
pebble.
Prove using complete induction that no matter how the player makes her moves, she will always score n(n − 1)/2 points when playing this game with n pebbles. (So this game is completely determined by the starting conditions, and not at all by the player's choices. Sounds fun.)
30. A certain summer game is played with n people, each carrying one water
balloon. The players walk around randomly on a field until a buzzer sounds,
at which point they stop. You may assume that when the buzzer sounds,
each player has a unique closest neighbour. After stopping, each player then
throws his water balloon at their closest neighbour. The winners of the game
are the players who are dry after the water balloons have been thrown (as-
sume everyone has perfect aim).
Prove that for every odd n, this game always has at least one winner.

The following problems are for the more mathematically-inclined students.

1. The Principle of Double Induction is as follows. Suppose that P(x, y) is a predicate with domain N² satisfying the following properties:

(1) P(0, y) holds for all y ∈ N

(2) For all (x, y) ∈ N², if P(x, y) holds, then so does P(x + 1, y).

Then we may conclude that for all x, y ∈ N, P(x, y).

Prove that the Principle of Double Induction is equivalent to the Principle of Simple Induction.
2. Prove that for all n ≥ 1 and positive real numbers x_1, . . . , x_n ∈ R⁺,

    (1 − x_1)/(1 + x_1) × (1 − x_2)/(1 + x_2) × · · · × (1 − x_n)/(1 + x_n) ≥ (1 − S)/(1 + S),

where S = ∑_{i=1}^{n} x_i.

3. A unit fraction is a fraction of the form 1/n, n ∈ Z⁺. Prove that every rational number 0 < p/q < 1 can be written as the sum of distinct unit fractions.
3 Recursion

Now, programming! In this chapter, we will apply what we've learned about induction to study recursive algorithms. In particular, we will learn how to analyse the time complexity of recursive programs, for which the runtime on an input of size n depends on the runtime on smaller inputs. Unsurprisingly, this is tightly connected to the study of recursively defined (mathematical) functions; we will discuss how to go from a recurrence relation like f(n + 1) = f(n) + f(n − 1) to a closed form expression like f(n) = 2^n + n². For recurrences of a special form, we will see how the Master Theorem gives us immediate, tight asymptotic bounds. (Recall that asymptotic bounds involve Big-O, and are less precise than exact expressions.) These recurrences will be used for divide-and-conquer algorithms; you will gain experience with this common algorithmic paradigm and even design algorithms of your own.

Measuring Runtime

Recall that one of the most important properties of an algorithm is how long it takes to run. We can use the number of steps as a measurement of running time; but reporting an absolute number like “10 steps” or “1 000 000 steps” an algorithm takes is pretty meaningless unless we know how “big” the input was, since of course we'd expect algorithms to take longer on larger inputs. So a more meaningful measure of runtime is “10 steps when the input has size 2” or “1 000 000 steps when the input has size 300” or even better, “n² + 2 steps when the input has size n.” But as you probably remember from CSC148, counting an exact number of steps is often tedious and arbitrary, so we care more about the Big-O (asymptotic) analysis of an algorithm. (In this course, we will mainly care about the upper bound on the worst-case runtime of algorithms; that is, the absolute longest an algorithm could run on a given input size n.)

In CSC148 and earlier in this course, you analysed the runtime of iterative algorithms. As we've mentioned several times by now, induction is very similar to recursion; since induction has been the key idea of the course so far, it should come as no surprise that we'll turn our attention to recursive algorithms now!

A Simple Recursive Function

Consider the following simple recursive function, which you probably saw in CSC148. (All code in this course will be in Python-like pseudocode. The syntax and methods will be mostly Python, with some English making the code more readable and/or intuitive. We'll expect you to follow a similar style.)
def fact(n):
    if n == 1:
        return 1
    else:
        return n * fact(n-1)

(In CSC148, it was surely claimed that this function computes n!, and we'll see later in this course how to formally prove that this is what the function does.)

How would you informally analyse the runtime of this algorithm? One might say that the recursion depth is n, and at each call there is just one step other than the recursive call, so the runtime is O(n), i.e., linear time. But in performing a more thorough step-by-step analysis, we reach a stumbling block with the recursive call fact(n-1): the runtime of fact on input n depends on its runtime on input n − 1! Let's see how to deal with this relationship, using mathematics and induction.

Recursively Defined Functions

You should all be familiar with standard function notation: f(n) = n², f(n) = n log n, or the slightly more unusual (but no less meaningful) f(n) = “the number of distinct prime factors of n.” There is a second way of defining functions using recursion, e.g.,

    f(n) = 0                    if n = 0
    f(n) = f(n − 1) + 2n − 1    if n ≥ 1

Recursive definitions allow us to capture marginal or relative differences between function values, even when we don't know their exact values. But recursive definitions have a significant downside: we can only calculate large values of f by computing smaller values first. For calculating extremely large, or symbolic, values of f (like f(n² + 3n)), a recursive definition is inadequate; what we would really like is a closed form expression for f, one that doesn't depend on other values of f. In this case, the closed form expression for f is f(n) = n². (You will prove this in the Exercises.)
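A quick Python check of that claim (ours; the actual proof is the exercise):

def f(n):
    """The recursively defined function from above."""
    return 0 if n == 0 else f(n - 1) + 2 * n - 1

assert all(f(n) == n ** 2 for n in range(200))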

Before returning to our earlier factorial example, let us see how to apply this to
a more concrete example.

Example 3.1. There are exactly two ways of tiling a 3-by-2 grid using triominoes,
shown to the right:

Develop a recursive definition for f (n), the number of ways of tiling a 3-by-n
grid using triominoes for n ≥ 1. Then, find a closed form expression for f .

Solution:
Note that if n = 1, there are no possible tilings, since no triomino will fit in a 3-by-1 board. We have already observed that there are 2 tilings for n = 2. Suppose n > 2. The key idea to get a recurrence is that for a 3-by-n block, first consider the upper-left square. In any tiling, there are only two possible triomino placements that can cover it (these orientations are shown in the diagram above). Once we have fixed one of these orientations, there is only one possible triomino orientation that can cover the bottom-left square (again, these are the two orientations shown in the figure).

So there are exactly two possibilities for covering both the bottom-left and top-left squares. But once we've put these two down, we've tiled the leftmost 3-by-2 part of the grid, and the remainder of the tiling really just tiles the remaining 3-by-(n − 2) part of the grid; there are f(n − 2) such tilings. Since these two parts are independent of each other, we get the total number of tilings by multiplying the number of possibilities for each. Therefore the recurrence relation is:

    f(n) = 0             if n = 1
    f(n) = 2             if n = 2
    f(n) = 2 f(n − 2)    if n > 2

(Because we've expressed f(n) in terms of f(n − 2), we need two base cases – otherwise, at n = 2 we would be stuck, as f(0) is undefined.)

Now that we have the recursive definition of f, we would like to find its closed form expression. The first step is to guess the closed form expression, by a “brute force” approach known as repeated substitution. Intuitively, we'll expand out the recursive definition until we find a pattern. (So much of mathematics is finding patterns.)

    f(n) = 2 f(n − 2)
         = 4 f(n − 4)
         = 8 f(n − 6)
         ...
         = 2^k f(n − 2k)

There are two possibilities. If n is odd, say n = 2m + 1, then we have f(n) = 2^m f(1) = 0, since f(1) = 0. If n is even, say n = 2m, then f(n) = 2^{m−1} f(2) = 2^{m−1} · 2 = 2^m. Writing our final answer in terms of n only:

    f(n) = 0          if n is odd
    f(n) = 2^{n/2}    if n is even

Thus we've obtained a closed form formula for f(n) – except the “...” in our repeated substitution does not constitute a formal proof! When you saw the “...”, you probably interpreted it as “repeat over and over again until. . . ” and we already know how to make this thinking formal: induction! That is, given the recursive definition of f, we can prove using complete induction that f(n) has the closed form given above. (Why complete and not simple induction? We need the induction hypothesis to work for n − 2, and not just n − 1.) This is a rather straightforward argument, and we leave it for the Exercises.
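Before proving it, we can at least check the guessed closed form against the recurrence in Python (a sanity check of ours, not the requested induction proof):

def f(n):
    """Number of triomino tilings of a 3-by-n grid, by the recurrence."""
    if n == 1:
        return 0
    elif n == 2:
        return 2
    else:
        return 2 * f(n - 2)

for n in range(1, 30):
    assert f(n) == (0 if n % 2 == 1 else 2 ** (n // 2))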

We will now apply this technique to our earlier example.


Example 3.2. Analyse the asymptotic worst-case running time of fact(n), in
terms of n.

Solution:
Let T (n) denote the worst-case running time of fact on input n. In this course,
we will ignore exact step counts entirely, replacing these counts with constants.
The base case of this method is when n = 1; in this case, the if block executes and the method returns 1. This is done in constant time, and so we can say that T(1) = c for some constant c. (“Constant” always means “independent of input size.”)

What if n > 1? Then fact makes a recursive call, and to analyse the runtime we consider the recursive and non-recursive parts separately. The non-recursive part is simple: only a constant number of steps occur (the if check, multiplication by n, and the return), so let's say the non-recursive part takes d steps. What about the recursive part? The recursive call is fact(n-1), which has worst-case runtime T(n − 1), by definition! Therefore when n > 1 we get the recurrence relation T(n) = T(n − 1) + d. Putting this together with the base case, we get the full recursive definition of T:

    T(n) = c               if n = 1
    T(n) = T(n − 1) + d    if n > 1

Now, we would like to say that T(n) = O(??), but to do so, we really need a closed form definition of T. Once again, we use repeated substitution.

    T(n) = T(n − 1) + d
         = (T(n − 2) + d) + d = T(n − 2) + 2d
         = T(n − 3) + 3d
         ...
         = T(1) + (n − 1)d
         = c + (n − 1)d    (since T(1) = c)

Thus we've obtained the closed form formula T(n) = c + (n − 1)d, modulo the “...”. As in the previous example, we leave proving this closed form as an exercise. After proving this closed form, the final step is simply to convert this closed form into an asymptotic bound on T. Since c and d are constants with respect to n, we have that T(n) = O(n).
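The closed form can also be checked numerically by unfolding the recurrence for some sample constants (sketch ours; a check of the algebra, not a proof):

def T(n, c=1, d=2):
    """Unfold T(n) = T(n-1) + d with T(1) = c."""
    return c if n == 1 else T(n - 1, c, d) + d

assert all(T(n) == 1 + (n - 1) * 2 for n in range(1, 500))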

Now let’s see a more complicated recursive function.

Example 3.3. Consider the following code for binary search.

def bin_search(A, x):
    '''
    Pre: A is a sorted list (non-decreasing order).
    Post: Returns True if and only if x is in A.
    '''
    if len(A) == 0:
        return False
    else if len(A) == 1:
        return A[0] == x
    else:
        m = len(A) // 2  # Rounds down, like floor
        if x <= A[m-1]:
            return bin_search(A[0..m-1], x)
        else:
            return bin_search(A[m..len(A)-1], x)

(One notable difference from Python is how we'll denote sublists. Here, we use the notation A[i..j] to mean the slice of the list A from index i to index j, including A[i] and A[j].)

We analyse the runtime of bin_search in terms of n, the length of the input list A. If n = 0 or n = 1, bin_search(A,x) takes constant time (note that it doesn't matter whether the constant is the same or different for 0 and 1).

What about when n > 1? Then some recursive calls are made, and we again look at the recursive and non-recursive steps separately. We include the computation of A[0..m-1] and A[m..len(A)-1] in the non-recursive part, since argument evaluation happens before the recursive call begins. (Interestingly, this is not the case in some programming languages – an alternative is “lazy evaluation.”)

IMPORTANT ANNOUNCEMENT 1: We will interpret all list slic-


ing operations A[i..j] as constant time, even when i and j depend on
the length of the list. See the discussion in the following section.

Interpreting list slicing as constant time, the non-recursive cost of bin_search is


constant time. What about the recursive calls? In all possible cases, only one recursive call occurs. What is the size of the list in the recursive call? When either recursive call happens, m = ⌊n/2⌋, meaning the recursive call is either on a list of size ⌊n/2⌋ or ⌈n/2⌉.

IMPORTANT ANNOUNCEMENT 2: In this course, we won't care about floors and ceilings. We'll always assume that the input sizes are "nice" so that the recursive calls always divide the list evenly. In the case of binary search, we'll assume that n is a power of 2.

(You may look in Vassos Hadzilacos' course notes for a complete handling of floors and ceilings. The algebra is a little more involved, but the bottom line is that the asymptotic analysis is unchanged.)

With this in mind, we conclude that the recurrence relation for T(n) is T(n) = T(n/2) + d. Therefore the full recursive definition of T is

T(n) = { c,             if n ≤ 1
       { T(n/2) + d,    if n > 1
Again, we omit floors and ceilings.

Let us use repeated substitution to guess a closed form. Assume that n = 2^k for some natural number k.


T(n) = T(n/2) + d
     = (T(n/4) + d) + d = T(n/4) + 2d
     = T(n/8) + 3d
     ⋮
     = T(n/2^k) + kd
     = T(1) + kd        (since n = 2^k)
     = c + kd           (since T(1) = c)

Once again, we'll leave proving this closed form to the Exercises. So T(n) = c + kd. This expression is quite misleading, because it seems to not involve an n, and hence be constant time – which we know is not the case for binary search! The key is to remember that n = 2^k, so k = log2 n. Therefore we have T(n) = c + d · log2 n, and so T(n) = O(log n).
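As a quick illustrative check (ours, not from the notes), evaluating the recurrence directly agrees with c + d · log2 n whenever n is a power of 2:

def T(n, c=3, d=5):
    # Direct evaluation of T(1) = c, T(n) = T(n // 2) + d.
    return c if n <= 1 else T(n // 2, c, d) + d

# For n = 2^k, T(n) == c + d * k:
assert all(T(2 ** k) == 3 + 5 * k for k in range(12))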

Aside: List Slicing vs. Indexing

In our analysis of binary search, we assumed that the list slicing operation
A[0..m-1] took constant time. However, this is not the case in Python and many
other programming languages, which implement this operation by copying the
sliced elements into a new list. Depending on the scale of your application, this
can be undesirable for two reasons: this copying takes time linear in the size of
the slice, and uses linear additional memory.

While we are not so concerned in this course about the second issue, the first can
drastically change our runtime analysis (e.g., in our analysis of binary search).
However, there is always another way to implement these algorithms without
this sort of slicing that can be done in constant time, and without creating new
lists. The key idea is to use variables to keep track of the start and end points
of the section of the list we are interested in, but keep the whole list all the way
through the computation. We illustrate this technique in our modified binary
search:

1 def indexed_bin_search(A, x, first, last):


2 if first > last:
3 return False
4 else if first == last:
5 return A[first] == x
6 else:
7 m = (first + last + 1) // 2
8 if x <= A[m-1]:
9 return indexed_bin_search(A, x, first, m - 1)
10 else:
11 return indexed_bin_search(A, x, m, last)

In this code, the same list A is passed to each recursive call; the range of searched
values, on the other hand, indeed gets smaller, as the first and last parameters
change. More technically, the size of the range, last - first + 1, decreases by
a (multiplicative) factor of two at each recursive call.
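To search an entire list, you pass the full index range; a thin wrapper (ours, hypothetical, and assuming indexed_bin_search has been transcribed into real Python) hides the extra parameters:

def bin_search_v2(A, x):
    # Behaves like bin_search, but without creating any new lists.
    return indexed_bin_search(A, x, 0, len(A) - 1)

# bin_search_v2([1, 3, 5, 8], 5) returns True;
# bin_search_v2([1, 3, 5, 8], 4) returns False.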

Passing indices as arguments works well for recursive functions that work on
smaller and smaller segments of a list. We’ve just introduced the most basic
version of this technique. However, many other algorithms involve making new
lists in more complex ways, and it is usually possible to make these algorithms
in-place, i.e., to use a constant amount of extra memory, and do operations by
changing the elements of the original list.

Because we aren’t very concerned with this level of implementation detail in


this course, we’ll use the shortcut of interpreting list slicing as taking constant
time, keeping in mind actual naïve implementations take linear time. This will al-
low us to perform our runtime analyses without getting bogged down in clever
implementation tricks.

A Special Recurrence Form

In general, finding exact closed forms for recurrences can be tedious or even
impossible. Luckily, we are often not really looking for a closed form solution to
a recurrence, but an asymptotic bound. But even for this relaxed goal, the only
method we have so far is to find a closed form and turn it into an asymptotic
bound. In this section, we’ll look at a powerful technique with the caveat that it
works only for a special recurrence form.

We can motivate this recurrence form by considering a style of recursive algo-


rithm called divide-and-conquer. We’ll discuss this in detail in the next section,
but for now consider the mergesort algorithm, which can roughly be outlined
in three steps:

1. Divide the list into two equal halves.


2. Sort each half separately, using recursive calls to mergesort.
3. Merge each of the sorted halves.

1 def mergesort(A):
2 if len(A) == 1:
3 return A
4 else:
5 m = len(A) // 2
6 L1 = mergesort(A[0..m-1])
7 L2 = mergesort(A[m..len(A)-1])
8 return merge(L1, L2)
9
10 def merge(A, B):
11 i = 0

12 j = 0
13 C = []
14 while i < len(A) and j < len(B):
15 if A[i] <= B[j]:
16 C.append(A[i])
17 i += 1
18 else:
19 C.append(B[j])
20 j += 1
21 return C + A[i..len(A)-1] + B[j..len(B)-1] # List concatenation

Consider the analysis of T (n), the runtime of mergesort(A) where len(A) = n.


Steps 1 and 3 together take linear time. (Careful implementations of mergesort can do step 1 in constant time, but merging always takes linear time.) What about Step 2? Since this is the recursive step, we'd expect T to appear here. There are two fundamental questions:

• What is the number of recursive calls?


• What is the size of the lists passed to the recursive calls?

From the written description, you should be able to intuit that there are two recursive calls, each on a list of size n/2. So the cost of Step 2 is 2T(n/2). (For the last time, we'll point out that we ignore floors and ceilings.) Putting all three steps together, we get a recurrence relation

T(n) = 2T(n/2) + cn.

This is an example of our special recurrence form:


T(n) = aT(n/b) + f(n),

where a, b ∈ Z+ are constants and f : N → R+ is some arbitrary function. (Though we'll soon restrict f to ease the analysis of this recursive form.)
Before we get to the Master Theorem, which gives us an immediate asymptotic
bound for recurrences of this form, let’s discuss some intuition. The special
recurrence form has three parameters: a, b, and f . Changing how big they are
affects the overall runtime:

• a is the “number of recursive calls”; the bigger a is, the more recursive calls,
and the bigger we expect T (n) to be.
• b determines the rate of decrease of the problem size; the larger b is, the faster
the problem size goes down to 1, and the smaller T (n) is.
• f (n) is the cost of the non-recursive part; the bigger f (n) is, the bigger T (n)
is.

We can further quantify this relationship by considering the following even more
specific form:

f(n) = { c,                 if n = 1
       { a · f(n/b) + n^k,  if n > 1

Suppose n is a power of b, say n = b^r. Using repeated substitution,

f(n) = a · f(n/b) + n^k
     = a(a · f(n/b^2) + (n/b)^k) + n^k = a^2 · f(n/b^2) + n^k (1 + a/b^k)
     = a^3 · f(n/b^3) + n^k (1 + a/b^k + (a/b^k)^2)
     ⋮
     = a^r · f(n/b^r) + n^k ∑_{i=0}^{r−1} (a/b^k)^i
     = a^r · f(1) + n^k ∑_{i=0}^{r−1} (a/b^k)^i        (since n = b^r)
     = c · a^r + n^k ∑_{i=0}^{r−1} (a/b^k)^i
     = c · n^(log_b a) + n^k ∑_{i=0}^{r−1} (a/b^k)^i

(Note that r = log_b n, and so a^r = a^(log_b n) = b^(log_b a · log_b n) = n^(log_b a).)

The latter term looks like a geometric series, for which we may use our geometric series formula. However, this only applies when the common ratio a/b^k is not equal to 1. Therefore, there are two cases.

• Case 1: a/b^k = 1, so a = b^k. Taking logs, we have log_b a = k. In this case, the expression becomes

  f(n) = c · n^k + n^k ∑_{i=0}^{r−1} 1^i
       = c · n^k + n^k · r
       = c · n^k + n^k · log_b n
       = O(n^k log n)

• Case 2: a ≠ b^k. Then by the geometric series formula,

  f(n) = c · n^(log_b a) + n^k · (1 − (a/b^k)^r) / (1 − a/b^k)
       = c · n^(log_b a) + n^k · (1 − n^(log_b a)/n^k) / (1 − a/b^k)
       = (c − 1/(1 − a/b^k)) · n^(log_b a) + (1/(1 − a/b^k)) · n^k

  There are two occurrences of n in this expression: n^(log_b a) and n^k. Asymptotically, the higher exponent dominates; so if log_b a > k, then f(n) = O(n^(log_b a)), and if log_b a < k, then f(n) = O(n^k).

With this intuition in mind, let us now state the Master Theorem.

Theorem 3.1 (Master Theorem). Let T : N → R+ be a recursively defined function with recurrence relation T(n) = aT(n/b) + f(n), for some constants a, b ∈ Z+, b > 1, and f : N → R+. Furthermore, suppose f(n) = Θ(n^k) for some k ≥ 0. Then we can conclude the following about the asymptotic complexity of T:

(1) If k = log_b a, then T(n) = O(n^k log n).
(2) If k < log_b a, then T(n) = O(n^(log_b a)).
(3) If k > log_b a, then T(n) = O(n^k).

Let’s see some examples of the Master Theorem in action.

Example 3.4. Consider the recurrence for mergesort: T(n) = 2T(n/2) + dn. Here a = b = 2, so log2 2 = 1, while dn = Θ(n^1). Therefore Case 1 of the Master Theorem applies, and T(n) = O(n log n), as we expected.

Example 3.5. Consider the recurrence T(n) = 49T(n/7) + 50n^π. Here, log7 49 = 2 and 2 < π, so by Case 3 of the Master Theorem, T(n) = O(n^π).

Even though the Master Theorem is useful in a lot of situations, be sure you
understand the statement of the theorem to see exactly when it applies (see
Exercises for some questions investigating this).
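The three cases are easy enough to mechanize. Here is a small sketch of ours (the helper name master_bound is hypothetical, not from the notes) that reports the bound for a recurrence T(n) = aT(n/b) + Θ(n^k):

import math

def master_bound(a, b, k):
    # Asymptotic bound for T(n) = a*T(n/b) + Theta(n^k), per Theorem 3.1.
    crit = math.log(a, b)            # the critical exponent log_b(a)
    if math.isclose(k, crit):
        return f"O(n^{k} log n)"     # Case 1
    elif k < crit:
        return f"O(n^{crit:g})"      # Case 2
    else:
        return f"O(n^{k})"           # Case 3

# master_bound(2, 2, 1) == 'O(n^1 log n)'                  (mergesort)
# master_bound(49, 7, math.pi) == 'O(n^3.141592653589793)', i.e. O(n^pi)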

Divide-and-Conquer Algorithms

Now that we have seen the Master Theorem, let’s discuss some algorithms for
which it can help us analyse the runtime! A key feature of the recurrence form
aT (n/b) + f (n) is that each of the recursive calls has the same size. This naturally
leads to the divide-and-conquer paradigm, which can be summarized as follows. (An algorithmic paradigm is a general strategy for designing algorithms to solve problems. You will see many more such strategies in CSC373.)

1 def divide_and_conquer(P):
2 if P has "small enough" size:

3 return solve_directly(P)
4 else:
5 divide P into smaller problems P_1, ..., P_k (same size)
6 for i from 1 to k:
7 # Solve each subproblem recursively
8 s_i = divide_and_conquer(P_i)
9 # combine the s_1 ... s_k to solve P
10 return combine(s_1 ... s_k)

This is a very general template — in fact, it may seem exactly like your mental
model of recursion so far, and certainly it is a recursive strategy. What distin-
guishes divide-and-conquer algorithms from a lot of other recursive procedures
is that we divide the problem into two or more parts and solve the subproblems
for each part, whereas recursive functions in general may make only a single recursive call, like in fact or bin_search. (Another common non-divide-and-conquer recursive design pattern is taking a list, processing the first element, then recursively processing the rest of the list, and combining the results.)
This introduction to the divide-and-conquer paradigm was deliberately abstract.
However, we have already discussed one divide-and-conquer algorithm: merge-
sort! Let us now see two more examples of divide-and-conquer algorithms: fast
multiplication and quicksort.

Fast Multiplication

Consider an algorithm for multiplying two numbers: 1234 × 5678. We might


start by writing this as

(1 ∗ 1000 + 2 ∗ 100 + 3 ∗ 10 + 4) ∗ (5 ∗ 1000 + 6 ∗ 100 + 7 ∗ 10 + 8)

Expanding this product requires 16 one-digit multiplications (1 ∗ 5, 2 ∗ 5, 3 ∗ 5, 4 ∗


5, 1 ∗ 6, . . . , 4 ∗ 8), and then some one-digit additions to add everything up. In
general, multiplying two n-digit numbers this way requires O(n2 ) one-digit op-
erations.

Now, let’s see a different way of making this faster. Using a divide-and-conquer
approach, we want to split 1234 and 5678 into smaller numbers:

1234 = 12 · 100 + 34, 5678 = 56 · 100 + 78.

Now we use some algebra to write the product 1234 · 5678 as the combination of
some smaller products:

1234 · 5678 = (12 · 100 + 34)(56 · 100 + 78)


= (12 · 56) · 10000 + (12 · 78 + 34 · 56) · 100 + 34 · 78

So now instead of multiplying 4-digit numbers, we have shown how to find the solution by multiplying some 2-digit numbers, a much easier problem! Note that we aren't counting multiplication by powers of 10, since that amounts to just adding some zeroes to the end of the numbers. (On a computer, we would use base-2 instead of base-10 to take advantage of the "adding zeros," which corresponds to very fast bit-shift machine operations.)
Reducing 4-digit multiplication to 2-digit multiplication may not seem that im-
pressive; but now, we’ll generalize this to arbitrary n-digit numbers (the differ-
ence in multiplying 100-digit vs. 50-digit numbers may be more impressive).

Let x and y be n-digit numbers. For simplicity, assume n is a power of 2. Then


we can divide x and y each into two halves:

x = 10^(n/2) · a + b
y = 10^(n/2) · c + d

where a, b, c, d are (n/2)-digit numbers. Then

x · y = (ac) · 10^n + (ad + bc) · 10^(n/2) + bd.

We have found a mathematical identity that seems useful, and we can use this
to develop a multiplication algorithm. Let’s see some pseudocode:

(The length of a number here refers to the number of digits in its decimal representation.)

1 def rec_mult(x,y):
2 n = length of x # Assume x and y have the same length
3 if n == 1:
4 return x * y
5 else:
6 a = x // 10^(n//2)
7 b = x % 10^(n//2)
8 c = y // 10^(n//2)
9 d = y % 10^(n//2)
10
11 r = rec_mult(a, c)
12 s = rec_mult(a, d)
13 t = rec_mult(b, c)
14 u = rec_mult(b, d)
15
16 return r * 10^n + (s + t) * 10^(n//2) + u

Now, let’s talk about the running time of this algorithm, in terms of the size n
of the two numbers. Note that there are four recursive calls; each call multiplies two numbers of size n/2, so the cost of the recursive calls is 4T(n/2). What about the non-recursive parts? Note that the final return step involves addition of 2n-digit numbers, which takes Θ(n) time. Therefore we have the recurrence

T(n) = 4T(n/2) + cn.

By the Master Theorem, we have T(n) = O(n^2).

So, this approach didn't help! We had an arguably more complicated algorithm that achieved the same asymptotic runtime as what we learned in elementary school! Moral of the story: divide-and-conquer, like all algorithmic paradigms, doesn't always lead to "better" solutions! (This is a serious lesson. It is not the case that everything we teach you works for every situation. It is up to you to carefully put together your knowledge to figure out how to approach problems!)
In the case of fast multiplication, though, we can use more math to improve the
running time. Note that the “cross term” ad + bc in the algorithm required two
multiplications to compute naïvely; however, it is correlated with the values of
ac and bd with the following straightforward identity:

( a + b)(c + d) = ac + ( ad + bc) + bd
( a + b)(c + d) − ac − bd = ad + bc

So we can compute ad + bc by calculating just one additional product ( a + b)(c +


d) (together with ac and bd that we are calculating anyway). This trick underlies
the multiplication algorithm of Karatsuba (1960).

1 def fast_rec_mult(x,y):
2 n = length of x # Assume x and y have the same length
3 if n == 1:
4 return x * y

5 else:
6 a = x // 10^(n//2)
7 b = x % 10^(n//2)
8 c = y // 10^(n//2)
9 d = y % 10^(n//2)
10
11 p = fast_rec_mult(a + b, c + d)
12 r = fast_rec_mult(a, c)
13 u = fast_rec_mult(b, d)
14
15 return r * 10^n + (p - r - u) * 10^(n//2) + u

You can study the (improved!) runtime of this algorithm in the exercises.
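For reference, here is a runnable Python version of this trick (a sketch of ours, not from the notes; it works on integers directly rather than digit strings, and the name karatsuba is our own):

def karatsuba(x, y):
    # Multiply natural numbers x and y using three recursive products.
    if x < 10 or y < 10:
        return x * y
    half = max(len(str(x)), len(str(y))) // 2
    a, b = divmod(x, 10 ** half)    # x = a * 10^half + b
    c, d = divmod(y, 10 ** half)    # y = c * 10^half + d
    r = karatsuba(a, c)
    u = karatsuba(b, d)
    p = karatsuba(a + b, c + d)     # (a+b)(c+d) = ac + (ad+bc) + bd
    return r * 10 ** (2 * half) + (p - r - u) * 10 ** half + u

assert karatsuba(1234, 5678) == 1234 * 5678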

Quicksort

In this section, we explore the divide-and-conquer sorting algorithm known as


quicksort, which in practice is one of the most commonly used sorting algorithms.
First, we give the pseudocode for this algorithm; note that this follows a very
clear divide-and-conquer pattern. Unlike fast_rec_mult and mergesort, the
hard work is done in the divide (partition) step, not the combine step. The
combine step for quicksort can be made to take a constant amount of work.
(Our naïve implementation below does list concatenation in linear time because of list slicing, but in fact a more clever implementation using indexing accomplishes this in constant time.)

1 def quicksort(A):
2 if len(A) <= 1:
3 # do nothing (A is already sorted)
4 else:
5 choose some element x of A (the "pivot")
6 partition (divide) the rest of the elements of A into two lists:
7 - L, the elements of A <= x
8 - G, the elements of A > x
9 sort L and G recursively
10 combine the sorted lists in the order L + [x] + G
11 set A equal to the new list

Before moving on, an excellent exercise is to take the above pseudocode and im-
plement quicksort yourself. As we will discuss again and again, implementing
algorithms yourself is the best way to understand them. Remember that the only
way to improve your coding abilities is to code lots — even something as simple
and common as sorting algorithms offers great practice. See the Exercises for
more examples.

1 def quicksort(A):
2 if len(A) <= 1:

3 pass
4 else:
5 # Choose the final element as the pivot
6 pivot = A[-1]
7
8 # Partition the rest of A with respect to the pivot
9 L, G = partition(A[0:-1], pivot)
10 # Sort each list recursively
11 quicksort(L)
12 quicksort(G)
13
14 # Combine
15 sorted = L + [pivot] + G
16 # Set A equal to the sorted list
17 for i in range(len(A)):
18 A[i] = sorted[i]
19
20 def partition(A, pivot):
21 L = []
22 G = []
23 for x in A:
24 if x <= pivot:
25 L.append(x)
26 else:
27 G.append(x)
28 return L, G
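Since this version of quicksort sorts its argument in place, a quick usage check (our own illustration) looks like this:

A = [3, 1, 4, 1, 5, 9, 2, 6]
quicksort(A)
print(A)  # [1, 1, 2, 3, 4, 5, 6, 9]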

Let us try to analyse the running time T (n) of this algorithm, where n is the
length of the input list A. First, the base case n = 1 takes constant time. The
partition method takes linear time, since it is called on a list of length n − 1
and contains a for loop that loops through all n − 1 elements. The Python list
methods in the rest of the code also take linear time, though a more careful
implementation could reduce this. But because partitioning the list always takes
linear time, the non-recursive cost of quicksort is linear.

What about the costs of the recursive steps? There are two of them: quicksort(L)
and quicksort(G), so the recursive cost in terms of L and G is T(|L|) and T(|G|), where |A| denotes the length of the list A. Therefore a potential recurrence is:

T(n) = { c,                       if n ≤ 1
       { T(|L|) + T(|G|) + dn,    if n > 1

What’s the problem with this recurrence? It depends on what L and G are,
which in turn depends on the input array and the chosen pivot! In particular,
we can’t use either repeated substitution or the Master Theorem to analyse this
function. In fact, the asymptotic running time of this algorithm can range from
Θ(n log n) to Θ(n^2), the latter of which is just as bad as bubblesort! (See the Exercises for details.)

This begs the question: why is quicksort so widely used in practice? Two reasons:

quicksort takes Θ(n log n) time "on average", and careful implementations of quicksort yield better constants than other Θ(n log n) algorithms like mergesort. These two facts together imply that quicksort often outperforms other sorting algorithms in practice! (Average-case analysis is slightly more sophisticated than what we do in this course, but you can take this to mean that most of the time, on randomly selected inputs, quicksort takes Θ(n log n) time.)

Exercises

1. Let f : N → N be defined as
f(n) = { 0,                   if n = 0
       { f(n − 1) + 2n − 1,   if n ≥ 1

Prove using induction that the closed form for f is f(n) = n^2.


2. Recall the recursively defined function


f(n) = { 0,            if n = 1
       { 2,            if n = 2
       { 2 f(n − 2),   if n > 2

Prove that the closed form for f is

f(n) = { 0,         if n is odd
       { 2^(n/2),   if n is even

3. Prove that the closed form expression for the runtime of fact is T (n) = c +
(n − 1)d.
4. Prove that the closed form expression for the runtime of bin_search is T (n) =
c + d log2 n.
5. Let T (n) be the number of binary strings of length n in which there are no
consecutive 1’s. So T (0) = 1, T (1) = 2, T (2) = 3, etc.

(a) Develop a recurrence for T (n). Hint: think about the two possible cases for
the last character.
(b) Find a closed form expression for T (n).
(c) Prove that your closed form expression is correct using induction.

6. Repeat the steps of the previous question, except with binary strings where
every 1 is immediately preceded by a 0.
7. It is known that every full binary tree has an odd number of nodes. (A full binary tree is a binary tree where every node has either 0 or 2 children.) Let T(n) denote the number of distinct full binary trees with n nodes. For example, T(1) = 1, T(3) = 1, and T(7) = 5. Give a recurrence for T(n), justifying why it is correct. Then, use induction to prove that T(n) ≥ (1/n) · 2^((n−1)/2).
8. Consider the following recursively defined function


f(n) = { 3,                          if n = 0
       { 7,                          if n = 1
       { 3 f(n − 1) − 2 f(n − 2),    if n ≥ 2

Find a closed form expression for f , and prove that it is correct using induc-
tion.

9. Consider the following recursively defined function:


1
 ,
 if n = 0
5
f (n) =
 1 + f (n − 1) , if n ≥ 1

2
(a) Prove that for all n ≥ 1, f (n + 1) − f (n) < f (n) − f (n − 1).
4
(b) Prove that for all n ∈ N, f (n) = 1 − .
5 · 2n
10. A block in a binary string is a maximal substring consisting of the same sym-
bol. For example, the string 0100011 has four blocks: 0, 1, 000, and 11. Let
H (n) denote the number of binary strings of length n that have no odd length
blocks of 1’s. For example, H (4) = 5:

0000 1100 0110 0011 1111

Develop a recursive definition for H (n), and justify why it is correct. Then
find a closed form for H using repeated substitution.
11. Consider the following recursively defined function.

T(n) = { 1,                   if n = 1
       { 4T(n/2) + log2 n,    otherwise

Use repeated substitution to come up with a closed form expression for T(n) when n = 2^k, i.e., n is a power of 2. You will need to use the following identity:

∑_{i=0}^{n} i · a^i = (n · a^(n+2) − (n + 1) · a^(n+1) + a) / (a − 1)^2.

12. Analyse the worst-case runtime of fast_rec_mult.


13. Analyse the runtime of each of the following recursive algorithms. It’s up to
you to decide whether you should use repeated substitution or the Master
Theorem to find the asymptotic bound.

(a)

1 def sum(A):
2 if len(A) == 0:
3 return 0
4 else:
5 return A[0] + sum(A[1..len(A)-1])

(b)

1 def fun(A):
2 if len(A) < 2:
3 return len(A) == 0
4 else:
5 return fun(A[2..len(A)-1])

(c)

1 def double_fun(A):
2 n = len(A)
3 if n < 2:
4 return n
5 else:
6 return double_fun(A[0..n-2]) + double_fun(A[1..n-1])

(d)

1 def mystery(A):
2 if len(A) <= 1:
3 return 1
4 else:
5 d = len(A) // 4
6 s = mystery(A[0..d-1])
7 i = d
8 while i < 3 * d:
9 s += A[i]
10 i += 1
11 s += mystery(A[3*d..len(A)-1])
12 return s

14. Recall the recurrence for the worst-case runtime of quicksort:


T(n) = { c,                       if n ≤ 1
       { T(|L|) + T(|G|) + dn,    if n > 1

where L and G are the partitions of the list. Clearly, how the list is partitioned
matters a great deal for the runtime of quicksort.
(a) Suppose the lists are always evenly split; that is, |L| = |G| = n/2 at each recursive call. Find a tight asymptotic bound on the runtime of quicksort using this assumption. (For simplicity, we'll ignore the fact that each list really would have size (n − 1)/2.)
(b) Now suppose that the lists are always very unevenly split: | L| = n − 2
and | G | = 1 at each recursive call. Find a tight asymptotic bound on the
runtime of quicksort using this assumption.
4 Program Correctness

In our study of algorithms so far, we have mainly been concerned with their
worst-case running time. While this is an important consideration of any pro-
gram, there is arguably a much larger one: program correctness! That is, while
it is important for our algorithms to run quickly, it is more important that they
work! You are used to testing your programs to demonstrate their correctness, but your confidence depends on the quality of your testing. (Frankly, developing high-quality tests takes a huge amount of time – much longer than you probably spent on it in CSC148!)

In this chapter, we'll discuss methods of formally proving program correctness,
without writing any tests at all. We cannot overstate the importance of this
technique: a test suite cannot possibly test a program on all possible inputs
(unless it is a very restricted program), and so a proof is the only way we can
ensure that our programs are actually correct on all inputs. Even for larger
software systems, which are far too complex to formally prove their correctness,
the skills you will learn in this chapter will enable you to reason more effectively about your code; essentially, what we will teach you is the art of semantically tracing through code. (By "semantically" we mean your ability to derive meaning from code, i.e., identify exactly what the program does. This contrasts with program syntax, things like punctuation and, in Python, indentation.)
What is Correctness?

You may be familiar with the most common tools used to specify program cor-
rectness: preconditions and postconditions. Formally, a precondition of a function is a property that an input to the function must satisfy in order to guarantee that the function will work properly. A postcondition of a function is a property that must be satisfied after the function completes. Most commonly, this refers to properties of a return value, though it can also refer to changes to the variables passed in, as with the implementation of quicksort from the previous chapter (which didn't return anything but instead changed the input list A). (As a program designer, it is up to you to specify preconditions. This often balances the desire of flexibility, i.e., allowing a broad range of inputs/usages, with feasibility, i.e., how much code you want or are able to write. A single function can have several pre- and postconditions.)

Example 4.1. Consider the following code for calculating the greatest common
divisor of two natural numbers. Its pre- and postconditions are shown.

1 def gcd_rec(a, b):


2 '''
3 Pre: a and b are positive integers, and a >= b
4 Post: returns the greatest common divisor of a and b
5 '''
6 if a == 1 or b == 1:
7 return 1
8 else if a mod b == 0:
9 return b
10 else:
11 return gcd_rec(b, a mod b)

(We'll use mod rather than % in our code, for clarity.)

So preconditions tell us what must be true before the program starts, and post-
conditions tell us what must be true after the program terminates (assuming it
ends at all). We have the following formal statement of correctness. Though it
is written in more formal language, note that it really captures what we mean
when we say that a program is “correct.”

Definition 4.1 (Program Correctness). Let f be a function with a set of preconditions and postconditions. Then f is correct (with respect to the pre- and
postconditions) if the following holds:

For every input I to f, if I satisfies the preconditions, then f(I)


terminates, and all the postconditions hold after it terminates.

Correctness of Recursive Programs

Consider the code for gcd_rec shown in the previous example. Here is its state-
ment of correctness:

For all a, b ∈ Z+ such that a ≥ b, gcd_rec(a,b) terminates and


returns gcd( a, b).

How do we prove that gcd_rec is correct? It should come as no surprise that we


use induction, since you know by now that induction and recursion are closely
related concepts. At this point in the course, you should be extremely comfort-
able with the following informal reasoning: “If a and b are ’small’ then we can
prove directly using the base case of gcd_rec that it is correct. Otherwise, the
recursive call happens on ’smaller’ inputs, and by induction, the recursive call is
correct, and hence because of some {math, logic, etc.}, the program also returns
the correct value.”

Writing full induction proofs that formalise the above logic is tedious, so instead
we use the fundamental idea in a looser template. For each program path from
the first line to a return statement, we show that it terminates and that, when it
does, the postconditions are satisfied. We do this as follows:

• If the path contains no recursive calls or loops, analyse the code line by line
until the return statement.
• For each recursive call on the path (if there are any), argue why the preconditions are satisfied at the time of the recursive call, and that the recursive call occurs on a "smaller" input than the original call. (There is some ambiguity around what is meant by "smaller." We will discuss this shortly.) Then you may assume that the postconditions for the recursive call are satisfied when the recursive call terminates. Finally, argue from the last recursive call to the end of the function why the postconditions of the original function call will hold.
• For each loop, use a "loop invariant." We will deal with this in the next section.
We have described the content of an induction proof; by focusing on tracing values in the code, we are isolating the most important thinking involved. But recall that the core of induction is proving a predicate for all natural numbers; thus the final ingredient we need to account for is associating each input with a natural number: its "size." Usually, this will be very simple: the length of a list, or the value of an input number. (So in essence, our "predicate" would be "for all inputs I of size n that satisfy the precondition of f, f is correct on I.") Let us illustrate this technique using gcd_rec as our example.

Example 4.2. We will show that gcd_rec is correct. There are three program
paths (this is easy to see, because of the if statements). Let’s look at each one
separately.

• Path 1: the program terminates at line 7. If the program goes into this block,
then a = 1 or b = 1. But in these cases, gcd( a, b) = 1, because gcd( x, 1) = 1
for all x. Then the postcondition holds, since at line 7 the program returns 1.
• Path 2: the program terminates at line 9. If the program goes into this block,
b divides a. Since b is the greatest possible divisor of itself, this means that
gcd( a, b) = b, and b is returned at line 9.
• Path 3: the program terminates at line 11. We need to check that the recursive
call satisfies its preconditions and is called on a smaller instance. Note that b
and ( a mod b) are both at least 1, and ( a mod b) < b, so the preconditions
are satisfied. Since a + b > ( a mod b) + b, the sum of the inputs decreases, and
so the recursive call is made on a smaller instance.
(If you recall the example of using complete induction on ordered pairs, taking the sum of the two components was the size measure we used there, too.)

Therefore when the call completes, it returns gcd(b, a mod b). Now we use the identity that gcd(a, b) = gcd(b, a mod b) to conclude that the original call returns the correct answer; a quick empirical check of this identity appears below.
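For intuition (a sketch of ours, not from the notes, and of course no substitute for the proof), Python's math.gcd lets us spot-check the identity on many inputs:

from math import gcd

# Spot-check gcd(a, b) == gcd(b, a mod b) for all 1 <= b <= a <= 50.
assert all(gcd(a, b) == gcd(b, a % b)
           for a in range(1, 51)
           for b in range(1, a + 1))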

Example 4.3. We now look at a recursive example on lists. Here we consider a


randomized binary search – this is worse than regular binary search in practice, but is useful for our purposes because it shows that the correctness of binary search doesn't depend on the size of the recursive calls. (How is it similar to quicksort?)

1 def rand_bin_search(A, x):


2 '''
3 Pre: A is a sorted list of numbers, and x is a number
4 Post: Returns true if and only if x is an element of A
5 '''
6 if len(A) == 0:
7 return false
8 else if len(A) == 1:
9 return A[0] == x
10 else:
11 guess = a random number from 0 to len(A) - 1, inclusive
12 if A[guess] == x:
13 return true
14 else if A[guess] > x:
15 return rand_bin_search(A[0..guess-1], x)
16 else:
17 return rand_bin_search(A[guess+1..len(A)-1], x)

Proof of correctness. Here there are five different program paths. We’ll check
three of them, and leave the others as an exercise:

• Path 1: the program terminates at line 7. This happens when A is empty; if


this happens, x is certainly not in A, so the program returns the correct value
(false).

• Path 2: the program terminates at line 9. We compare A[0] to x, returning


true if they are equal, and false otherwise. Note that if they aren’t equal,
then x cannot be in A, since A has only one element (its length is one).

• Path 4: the program terminates at line 15. This happens when len(A) > 1,
and A[guess] > x. Because A is sorted, and A[guess] > x, for every index
i ≥ guess, A[i] > x. Therefore the only way x could appear in A is if it
appeared at an index smaller than guess.

Now, let us handle the recursive call. Since guess ≤ len( A) − 1, we have
that guess − 1 ≤ len( A) − 2, and so the length of the list in the recursive
call is at most len( A) − 1; so the recursive call happens on a smaller instance.
Therefore, when the recursive call returns, the postcondition is satisfied: it
returns true if and only if x appears in A[0..guess-1]. The original function
call then returns this value; by the discussion in the previous paragraph, this
is the correct value to return, so the postcondition is satisfied.



Iterative Programs

In this section, we'll discuss how to handle loops in our code. So far, we have been able to determine the exact sequence of steps in each program path (e.g., "Lines 1, 2, 4, and 6 execute, and then the program returns"). (We've treated recursive calls as "black boxes" that behave nicely, i.e., can be treated as a single step, as long as their preconditions are satisfied and they are called on smaller inputs.) However, this is not the case when we are presented with a loop, because the sequence of steps depends on the number of times the loop iterates, which in turn depends on the input (e.g., the length of an input list). Thus our argument for correctness
the input (e.g., the length of an input list). Thus our argument for correctness
cannot possibly go step by step!

Instead, we treat the entire loop as a single unit, and give a correctness argument
specifically for it separately. But what do we mean for a loop to be “correct”?
Consider the following function.

1 def avg(A):
2 '''
3 Pre: A is a non-empty list of numbers
4 Post: Returns the average of the numbers in A
5 '''
6 sum = 0
7 i = 0
8 while i < len(A):
9 sum += A[i]
10 i += 1
11 return sum / len(A)

This is the sort of program you wrote in CSC108. Intuitively, it is certainly


correct — why? The “hard” work is done by the loop, which calculates the sum
of the elements in A. The key is to prove that this is what the loop does. Coming
out of first-year programming, most of you would be comfortable saying that
the variable sum starts with value 0 before the loop, and after the loop ends it
contains the value of the sum of all elements in A. But what can we say about
the value of sum while the loop is running?

Clearly, the loop calculates the sum of the elements in A one element at a time.
After some thought, we determine that the variable sum starts with value 0 and in
the loop takes on the values A[0], then A[0] + A[1], then A[0] + A[1] + A[2],
etc. We formalize this by defining a loop invariant for this loop. A loop invariant
is a predicate that is true every time the loop-condition is checked (including the
check that terminates the loop). Usually, the predicate will depend on which
iteration the loop is on, or more generally, the value(s) of the program variable(s)
associated with the loop. For example, in avg, the loop invariant corresponding to
our previous intuition is
P(i, sum) : sum = ∑_{k=0}^{i−1} A[k]

(By convention, the empty sum ∑_{k=0}^{−1} A[k] evaluates to 0.)
The i and sum in the predicate really correspond to the values of those variables
in the code for avg. That is, this predicate is stating a property of these variables

in the code.

Unfortunately, this loop invariant isn’t quite right; what if i > len( A)? Then the
sum is not well-defined, since, for example, A[len(A)] is undefined. This can be
solved with a common technique for loop invariants: putting bounds on “loop
counter” variables, as follows:
i −1
Inv(i, sum) : 0 ≤ i ≤ len( A) ∧ sum = ∑ A [ k ].
k =0

This ensures that the sum is always well-defined, and has the added benefit of explicitly defining a possible range of values on i. (In general, the fewer possible values a variable takes on, the fewer cases you have to worry about in your code.)
A loop invariant is correct if it is always true at the beginning of every loop
iteration, including the loop check that fails, causing the loop to terminate. This is
why we allowed i ≤ len( A) rather than just i < len( A) in the invariant.

How do we prove that loop invariants are correct? The argument is yet another
application of induction:

• First, we argue that the loop invariant is satisfied when the loop is reached. (This is arrow (1) in the diagram.)
• Then, we argue that if the loop invariant is satisfied at the beginning of an iteration, then after the loop body executes once (i.e., one loop iteration occurs), the loop invariant still holds. (Arrow (2))
• Finally, after proving that the loop invariant is correct, we show that if the invariant holds when the loop ends, then the postcondition will be satisfied when the program returns. (Arrow (3))

(The diagram here is a margin figure with nodes Pre, Inv, and Post: arrow (1) runs from Pre to Inv, arrow (2) loops from Inv back to itself, and arrow (3) runs from Inv to Post.)

Though this is basically an inductive proof, as was the case for recursive programs, we won't hold to the formal induction structure here. (If we wanted to be precise, we would do induction on the number of loop iterations executed.)
Example 4.4. Let us formally prove that avg is correct. The main portion of the
proof will be the proof that our loop invariant Inv(i, sum) is correct.

Proof. When the program first reaches the loop, i = 0 and sum = 0. Plugging
this into the predicate yields

Inv(0, 0) : 0 ≤ 0 ≤ len(A) ∧ 0 = ∑_{k=0}^{−1} A[k],

which is true (recall the note about the empty sum from earlier).

Now suppose the loop invariant holds when i = i0 , at the beginning of a loop it-
eration. Let sum0 be the value of the variable sum at this time. The loop invariant
we are assuming is the following:

Inv(i0, sum0) : 0 ≤ i0 ≤ len(A) ∧ sum0 = ∑_{k=0}^{i0−1} A[k].

What happens next? The obvious answer is “the loop body runs,” but this misses
one subtle point: if i0 = len( A) (which is allowed by the loop invariant), the

body of the loop doesn’t run, and we don’t need to worry about this case, since we
only care about checking what happens to the invariant when the loop actually
runs.

Assume that i0 < len( A), so that the loop body runs. What happens in one
iteration? Two things: sum increases by A[i0 ], and i increases by 1. Let sum1
and i1 be the values of sum and i at the end of the loop iteration. We have
sum1 = sum0 + A[i0 ] and i1 = i0 + 1. Our goal is to prove that the loop invariant
holds for i1 and sum1 , i.e.,
Inv(i1, sum1) : 0 ≤ i1 ≤ len(A) ∧ sum1 = ∑_{k=0}^{i1−1} A[k].

Let us check that the loop invariant is still satisfied by sum1 and i1 . First, 0 ≤
i0 < i0 + 1 = i1 ≤ len( A), where the first inequality came from the loop invariant
holding at the beginning of the loop, and the last inequality came from the
assumption that i0 < len( A). The second part of Inv can be checked by a simple
calculation:

sum1 = sum0 + A[i0]
     = (∑_{k=0}^{i0−1} A[k]) + A[i0]       (by Inv(i0, sum0))
     = ∑_{k=0}^{i0} A[k]
     = ∑_{k=0}^{i1−1} A[k]                 (since i1 = i0 + 1)

Therefore the loop invariant always holds.

The next key idea is that when the loop ends, variable i has value len( A), since
by the loop invariant it always has value ≤ len( A), and if it were strictly less
than len( A), another iteration of the loop would run. Then by the loop invariant,
the value of sum is ∑_{k=0}^{len(A)−1} A[k], i.e., the sum of all the elements in A! The final step is to continue tracing until the program returns, which in this case takes just a single step: the program returns sum / len(A). But this is exactly the average of the numbers in A, because the variable sum is equal to their sum! Therefore the postcondition is satisfied. ∎

(That might seem like a lot of writing to get to what we said paragraphs ago, but this is a formal argument that confirms our intuition. We also implicitly use here the mathematical definition of "average" as the sum of the numbers divided by how many there are.)
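One practical payoff of this analysis: a loop invariant can be turned into runtime assertions. Here is a sketch of ours (hypothetical, not part of the notes) instrumenting avg with its invariant; we rename the variable sum to total to avoid shadowing Python's built-in sum:

def avg_checked(A):
    total = 0
    i = 0
    while i < len(A):
        # Inv(i, total): 0 <= i <= len(A) and total == A[0] + ... + A[i-1]
        assert 0 <= i <= len(A) and total == sum(A[:i])
        total += A[i]
        i += 1
    # On exit i == len(A), so the invariant gives total == sum(A).
    assert i == len(A) and total == sum(A)
    return total / len(A)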

Note the deep connection between the loop invariant and the postcondition.
There are many other loop invariants we could have tried to prove: for example,
Inv(i, sum) : i + sum ≥ 0. But this wouldn’t have helped at all in proving
the postcondition! When confronting more problems on your own, it will be
up to you to determine the right loop invariants for the job. Keep in mind
that choosing loop invariants can usually be done by either taking a high-level
approach and mimicking something in the postcondition or taking a low-level
approach by carefully tracing through the code on test inputs to try to find
patterns in the variable values.

One final warning before we move on: loop invariants describe relationships
between variables at a specific moment in time (specifically, at the beginning of a particular loop check). Students often try to use loop invariants to capture how variables change over time (e.g., "i will increase by 1
when the loop runs”), which creates massive headaches because determining
how the code works is the meat of a proof, and shouldn’t be shoehorned into
a single predicate! When working with more complex code, we take the view
that loop invariants are properties that are preserved, even if they don’t describe
exactly how the code works or exactly what happens! This flexibility is what
makes correctness proofs manageable.

Example 4.5. Here we consider a numerical example: a low-level implementa-


tion of multiplication.

1 def mult(a,b):
2 '''
3 Pre: a and b are natural numbers
4 Post: returns a * b
5 '''
6 m = 0
7 count = 0
8 while count < b:
9 m += a
10 count += 1
11 return m

Proof of correctness. The key thing to figure out here is how the loop accom-
plishes the multiplication. It’s clear what it’s supposed to do: the variable m
changes from 0 at the beginning of the loop to a*b at the end. How does this
change happen? One simple thing we could do is make a table of values of the
variables m and count as the loop progresses:

m     count
0     0
a     1
2a    2
3a    3
⋮     ⋮

Aha: m seems to always contain the product of a and count; and, when the loop
ends, count = b! This leads directly to the following loop invariant (including
the bound on count):

Inv(m, count) : m = a × count ∧ count ≤ b

Consider an execution of the code, with the preconditions satisfied by the inputs.
Then when the loop is first encountered, m = 0 and count = 0, so m = a × count,
and count ≤ b (since b ∈ N).

Now suppose the loop invariant holds at the beginning of some iteration, with
m = m0 and count = count0 . Furthermore, suppose count0 < b, so the loop runs.
Explicitly, we assume that Inv(m0, count0) holds. When the loop runs, m increases by a and count increases by 1. Let m1 and count1
denote the new values of m and count; so m1 = m0 + a and count1 = count0 + 1.
Since Inv(m0 , count0 ) holds, we have

m1 = m0 + a
= a × count0 + a (by invariant)
= a(count0 + 1)
= a × count1

Moreover, since we’ve assumed count0 < b, we have that count1 = count0 + 1 ≤
b. So Inv(m1 , count1 ) holds.

Finally, when the loop terminates, we must have count = b, since by the loop
invariant count ≤ b, and if count < b another iteration would occur. Then by
the loop invariant again, when the loop terminates, m = ab, and the function
returns m, satisfying the postcondition. ∎

Termination

Unfortunately, there is a slight problem with all the correctness proofs we have
done so far. We’ve used phrases like “when the recursive call ends” and “when
the loop terminates”. But how do we know that the recursive calls and loops
end at all? That is, how do we know that a program does not contain an infinite
loop or infinite recursion? (This is a serious issue: what beginning programmer hasn't been foiled by either of these errors?)

The case for recursion is actually already handled by our implicit induction
proof structure. Recall that the predicate in our induction proofs is that f is cor-
rect on inputs of size n; part of the definition of correctness is that the program
terminates. Therefore as long as the induction structure holds — i.e., that the
recursive calls are getting smaller and smaller — termination comes for free.

The case is a little trickier for loops. As an example, consider the loop in avg. Be-
cause the loop invariant Inv(i, sum) doesn’t say anything about how i changes,
we can’t use it to prove that the loop terminates. But for most loops, including
this one, it is "obvious" why they terminate, because they typically have counter variables that iterate through a fixed range of values. (Here the counter role is played by the variable i, which goes through the range {0, 1, . . . , len(A)}.)
This argument would certainly convince us that the avg loop terminates, and
in general all loops with this form of loop counter terminate. However, not all
loops you will see or write will have such an obvious loop counter. Here’s an
example:

1 def collatz(n):
2 ''' Pre: n is a natural number '''
3 curr = n
4 while curr > 1:
5 if curr is even:
6 curr = curr // 2
7 else:
8 curr = 3 * curr + 1

(In fact, it is an open question in mathematics whether this function halts on all inputs or not. If only we had a computer program that could tell us whether it does!)

Therefore we'll now introduce a formal way of proving loop termination. Recall
that our correctness proofs of recursive functions hinged on the fact that the
recursive calls were made on smaller and smaller inputs, until some base case
was reached. Our strategy for loops will draw inspiration from this: we associate
with the loop a loop variant v that has the following two properties:

(1) v decreases with each iteration of the loop
(2) v is always a natural number at the beginning of each loop iteration

(While this is not the only strategy for proving termination, it turns out to be suitable for most loops involving numbers and lists. When you study more advanced data structures and algorithms, you will discuss more complex arguments for both correctness and termination.)

any more iterations. This is analogous to inputs to recursive calls getting smaller
and smaller until a base case is reached.

Let us illustrate this technique on our avg loop.

Example 4.6. Proof of termination of avg. Even though we’ve already observed that
the loop has a natural loop counter variable i, this variable increases with each
iteration. Instead, our loop variant will be v = len( A) − i. Let us check that v
satisfies the properties (1) and (2):

(1) Since at each iteration i increases by 1, and len( A) stays the same, v =
len( A) − i decreases by 1 on each iteration.

(2) Note that i and len( A) are both always natural numbers. But this alone is
not enough to conclude that v ∈ N; for example, 3, 5 ∈ N but (3 − 5) ∉ N. But the loop invariant we proved included the predicate 0 ≤ i ≤ len(A), and because of this we can conclude that len(A) − i ≥ 0, so len(A) − i ∈ N. (This is a major reason we include such loop counter bounds in the loop invariant.)

Since we have established that v is a decreasing, bounded variant for the loop,
this loop terminates, and therefore avg terminates (since every other line of code
is a simple step that certainly terminates). ∎

Notice that the above termination proof relied on i increasing by 1 on each itera-
tion, and that i never exceeds len(A). That is, we basically just used the fact that
i was a standard loop counter. Here is a more complex example where there is
no obvious loop counter.

Example 4.7. Prove that the following function terminates:



1 def term_ex(x,y):
2 ''' Pre: x and y are natural numbers. '''
3 a = x
4 b = y
5 while a > 0 or b > 0:
6 if a > 0:
7 a -= 1
8 else:
9 b -= 1
10 return x * y

Proof. Intuitively, the loop terminates because when the loop runs, either a or b
decreases, and will stop when a and b reach 0. To make this argument formal,
we need the following loop invariant: a, b ≥ 0, whose proof we leave as an
exercise.

The loop variant we define is v = a + b. Let us prove the necessary properties for
v:

• In the loop, either a decreases by 1, or b decreases by 1. In either case, v =


a + b decreases by 1. Therefore v is decreasing.
• Since our loop invariant says that a, b ≥ 0, we have that v ≥ 0 as well. There-
fore v ∈ N. ∎
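Like invariants, a loop variant can also be checked at runtime. Here is a hypothetical sketch of ours instrumenting term_ex with the variant v = a + b:

def term_ex_checked(x, y):
    a, b = x, y
    v = a + b
    while a > 0 or b > 0:
        assert v >= 0                # property (2): v is a natural number
        if a > 0:
            a -= 1
        else:
            b -= 1
        assert a + b < v             # property (1): v strictly decreased
        v = a + b
    return x * y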

Exercises

1. Here is some code that recursively determines the smallest element of a list.
Give pre- and postconditions for this function, then prove it is correct accord-
ing to your specifications.

1 def recmin(A):
2 if len(A) == 1:
3 return A[0]
4 else:
5 m = len(A) // 2
6 min1 = recmin(A[0..m-1])
7 min2 = recmin(A[m..len(A)-1])
8 return min(min1, min2)

2. Prove that the following code is correct, according to its specifications.

1 def sort_colours(A):
2 '''
3 Pre: A is a list whose elements are either 'red' or 'blue'

4 Post: All red elements in A appear before all blue ones


5 '''
6 i = 0
7 j = 0
8 while i < len(A):
9 if A[i] is red:
10 swap A[i], A[j]
11 j += 1
12 i += 1

3. Prove the following loop invariant for the loop in term_ex: Inv( a, b) : a, b ≥ 0.
4. Consider the following modification of the term_ex example.

1 def term_ex_2(x,y):
2 ''' Pre: x and y are natural numbers '''
3 a = x
4 b = y
5 while a >= 0 or b >= 0:
6 if a > 0:
7 a -= 1
8 else:
9 b -= 1
10 return x * y

(a) Demonstrate via example that this doesn’t always terminate.


(b) Show why the proof of termination given for term_ex fails.

5. For each of the following, state pre- and postconditions that capture what
the program is designed to do, then prove that it is correct according to your
specifications.
Don’t forget to prove termination (even though this is pretty simple). It’s easy
to forget about this if you aren’t paying attention.

(a)

1 def mod(n, d):


2 r = n
3 while r >= d:
4 r -= d
5 return r

(b)

1 def div(n, d):


2 r = n
3 q = 0

4 while r >= d:
5 r -= d
6 q += 1
7 return q

(c)

1 def lcm(a,b):
2 x = a
3 y = b
4 while x != y:
5 if x < y:
6 x += a
7 else:
8 y += b
9 return x

(d)

1 def div3(s):
2 sum = 0
3 i = 0
4 while i < len(s):
5 sum += s[i]
6 i += 1
7 return sum mod 3 == 0

(e)

1 def count_zeroes(L):
2 z = 0
3 i = 0
4 while i < len(L):
5 if L[i] == 0:
6 z += 1
7 i += 1
8 return z

(f)

1 def f(n):
2 r = 2
3 i = n
4 while i > 0:
5 r = 3*r - 2
6 i -= 1
7 return r

6. Consider the following code.

1 def f(x):
2 ''' Pre: x is a natural number '''
3 a = x
4 y = 10
5 while a > 0:
6 a -= y
7 y -= 1
8 return a * y

(a) Give a loop invariant that characterizes the values of a and y.


(b) Show that sometimes this code fails to terminate.

7. In this question, we study two different algorithms for exponentiation: a re-


cursive and iterative algorithm. First, state pre- and postconditions that the
algorithms must satisfy (they’re the same for the two). Then, prove that each
algorithm is correct according to the specifications.

(a)

1 def exp_rec(a, b):


2 if b == 0:
3 return 1
4 else if b mod 2 == 0:
5 x = exp_rec(a, b / 2)
6 return x * x
7 else:
8 x = exp_rec(a, (b - 1)/2)
9 return x * x * a

(b)

1 def exp_iter(a, b):


2 ans = 1
3 mult = a
4 exp = b
5 while exp > 0:
6 if exp mod 2 == 1:
7 ans *= mult
8 mult = mult * mult
9 exp = exp // 2
10 return ans

8. Prove that the following function is correct. Warning: this one is probably the
most difficult of these exercises. But, it runs in linear time – pretty amazing!

1 def majority(A):
2 '''
3 Pre: A is a list with more than half its entries equal to x
4 Post: Returns the majority element x
5 '''
6 c = 1
7 m = A[0]
8 i = 1
9 while i <= len(A) - 1:
10 if c == 0:
11 m = A[i]
12 c = 1
13 else if A[i] == m:
14 c += 1
15 else:
16 c -= 1
17 i += 1
18 return m

9. Here we study yet another sorting algorithm, bubblesort.

1 def bubblesort(L):
2 '''
3 Pre: L is a list of numbers
4 Post: L is sorted
5 '''
6 k = 0
7 while k < len(L):
8 i = 0
9 while i < len(L) - k - 1:
10 if L[i] > L[i+1]:
11 swap L[i] and L[i+1]
12 i += 1
13 k += 1

(a) State and prove an invariant for the inner loop.


(b) State and prove an invariant for the outer loop.
(c) Prove that bubblesort is correct, according to its specifications.

10. Consider the following generalization of the min function.

1 def extract(A, k):


2 pivot = A[0]
3 # Use partition from quicksort
4 L, G = partition(A[1..len(A) - 1], pivot)
5 if len(L) == k - 1:

6 return pivot
7 else if len(L) >= k:
8 return extract(L, k)
9 else:
10 return extract(G, k - len(L) - 1)

(a) Prove that this algorithm is correct.


(b) Analyse the worst-case running time of this algorithm. Hint: this algorithm
is known as quickselect, and is pretty obviously related to quicksort.
5 Regular Languages & Finite Automata

In this final chapter, we turn our attention to the study of finite automata, a
simple model of computation with surprisingly deep applications ranging from
vending machines to neurological systems. We focus on one particular applica-
tion: matching regular languages, which are the foundation of natural language
processing, including text searching and parsing. This application alone makes
automata an invaluable computational tool, one with which you are probably
already familiar in the guise of regular expressions.

Definitions

We open with some definitions related to strings. An alphabet Σ is a finite set of symbols, e.g., {0, 1}, {a, b, . . . , z}, or {0, 1, . . . , 9, +, −, ×, ÷}. A string over an alphabet Σ is a finite sequence of symbols from Σ. Therefore "0110" is a string over {0, 1}, and "abba" and "cdbaaaa" are strings over {a, b, c, d}. The empty string "", denoted by ε, consists of a sequence of zero symbols from the alphabet. (Σ: the Greek letter "Sigma"; ε: "epsilon".) We use the notation Σ* to denote the set of all strings over the alphabet Σ.

The length of a string w ∈ Σ* is the number of symbols appearing in the string, and is denoted |w|. For example, |ε| = 0, |aab| = 3, and |11101010101| = 11. We use Σ^n to denote the set of strings over Σ of length n. For example, if Σ = {0, 1}, then Σ^0 = {ε} and Σ^2 = {00, 01, 10, 11}. So Σ* = Σ^0 ∪ Σ^1 ∪ Σ^2 ∪ · · ·
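Since alphabets are finite, Σ^n is easy to enumerate; here is a small sketch of ours (not from the notes) using Python's itertools:

from itertools import product

def strings_of_length(sigma, n):
    # All strings over alphabet sigma of length n, i.e. the set Σ^n.
    return {"".join(t) for t in product(sorted(sigma), repeat=n)}

# strings_of_length({'0', '1'}, 2) == {'00', '01', '10', '11'}
# strings_of_length({'0', '1'}, 0) == {''}, i.e. {ε}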

A language L over an alphabet Σ is a subset of strings over Σ: L ⊆ Σ*. Unlike alphabets and strings, languages may be either finite or infinite. (The "English language" is a subset of the strings that can be formed by all possible combinations of the usual 26 letters.) Here are some examples of languages over the alphabet {a, b, c}:

{ε, a, b, ccc}
{w ∈ {a, b, c}* | |w| ≤ 3}
{w ∈ {a, b, c}* | w has the same number of a's and c's}
{w ∈ {a, b, c}* | w can be found in an English dictionary}

These are pretty mundane examples. Somewhat surprisingly, however, this no-
tion of languages also captures solutions to computational problems. Consider the
following languages over the alphabet of all standard ASCII characters.

(E.g., "[1, 2, 76]" ∈ L1.)



L1 = { A | A is a string representation of a sorted list of numbers}


L2 = {( A, x ) | A is a list of numbers, x is the minimum of A}
L3 = {( a, b, c) | a, b, c ∈ N and gcd( a, b) = c}
L4 = {( P, x ) | P is a Python program that halts when given input x }

So we can interpret many computer programs as deciding membership in a partic-
ular language. For example, a program that decides whether an input string is in
L1 is essentially a program that decides whether an input list is sorted. (Member-
ship in L4 cannot be computed by any program at all — this is the famous Halting
Problem!)
Solutions to computational problems are just one face of the coin; what about the
programs we create to solve them? One of the great achievements of the early
computer scientist Alan Turing was the development of the Turing Machine, an
abstract model of computation that is just as powerful as the physical computers
we use today. Unfortunately, Turing Machines are a far more powerful and
complex model than is appropriate for this course. (You will study Turing
Machines to your heart's content in CSC363/CSC463.) Instead, we will study the
simpler computational model of finite automata, and the class of languages they
compute: regular languages.

Regular Languages

Regular languages are the most basic kind of languages, and are derived from
rather simple language operations. In particular, we define the following three
operations for languages L, M ⊆ Σ∗:

• Union: The union of L and M is the language L ∪ M = {x ∈ Σ∗ | x ∈ L or x ∈ M}.
• Concatenation: The concatenation of L and M is the language LM = {xy ∈ Σ∗ | x ∈ L, y ∈ M}.
• Kleene Star: The (Kleene) star of L is the language L∗ = {e} ∪ {x ∈ Σ∗ |
  ∃w1, w2, . . . , wn ∈ L such that x = w1w2 . . . wn, for some n}. That is, L∗ con-
  tains the strings that can be broken down into 0 or more smaller strings, each
  of which is in L. The star operation can be thought of in terms of union and
  concatenation as L∗ = {e} ∪ L ∪ LL ∪ LLL ∪ · · ·

Example 5.1. Consider the languages L = {a, bb} and M = {a, c}. Then we have

L ∪ M = {a, bb, c}
LM = {aa, ac, bba, bbc}
L∗ = {e, a, aa, bb, aaa, abb, bba, . . . }
M∗ = {e, a, c, aa, ac, ca, cc, aaa, aac, . . . }

Note that M∗ = {a, c}∗ is exactly the set of strings made up of only a's and c's.
This explains the notation Σ∗ for the set of all strings over the alphabet Σ.
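To make these operations concrete, here is a small Python sketch (ours, not part
of the original notes) that computes the union, concatenation, and a truncated
star of finite languages represented as sets of strings.

def union(L, M):
    '''Union of two finite languages.'''
    return L | M

def concat(L, M):
    '''Concatenation: every string of L followed by every string of M.'''
    return {x + y for x in L for y in M}

def star(L, max_reps):
    '''Approximate L* by concatenating at most max_reps strings of L.
    (The true Kleene star is infinite whenever L contains a non-empty string.)'''
    result, current = {""}, {""}   # "" plays the role of e
    for _ in range(max_reps):
        current = concat(current, L)
        result |= current
    return result

L, M = {"a", "bb"}, {"a", "c"}
print(union(L, M))         # {'a', 'bb', 'c'}
print(concat(L, M))        # {'aa', 'ac', 'bba', 'bbc'}
print(sorted(star(M, 2)))  # ['', 'a', 'aa', 'ac', 'c', 'ca', 'cc']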

We can now give the following recursive definition of regular languages:

Definition 5.1 (Regular Language). The set of regular languages over an alphabet
Σ is defined recursively as follows:

• ∅, the empty set, is a regular language.
• {e}, the language consisting of only the empty string, is a regular language.
• For any symbol a ∈ Σ, {a} is a regular language.
• If L, M are regular languages, then so are L ∪ M, LM, and L∗.

Students often confuse the notation ∅, e, and {e}. First, e is a string, while ∅
and {e} are sets of strings. The set ∅ contains no strings (and so has size 0),
while {e} contains the single string e (and so has size 1).

Regular languages are sets of strings, and are often infinite. As humans, we
are able to leverage our language processing and logical abilities to represent
languages; for example, "strings that start and end with the same character"
and "strings that have an even number of zeroes" are both simple descriptions
of regular languages. What about computers? We could certainly write simple
programs that compute either of the above languages (exercise: write these
programs), but we have grander ambitions. Specifically, we would like a simple,
computer-friendly representation of regular languages, so that we could input an
arbitrary regular language and a string, and the computer would determine
whether the string is in the language or not.

This is precisely the idea behind the regular expression (regex), a pattern-based
string representation of a regular language. Given a regular expression r, we use
L(r) to denote the language matched (represented) by r. Here are the elements
of regular expressions; note the almost identical structure to the definition of
regular languages themselves:

• ∅ is a regex, with L(∅) = ∅ (matches no string)
• e is a regex, with L(e) = {e} (matches only the empty string)
• For all symbols a ∈ Σ, a is a regex, with L(a) = {a} (matches only the single string a)
• Let r1, r2 be regexes. Then r1 + r2, r1r2, and r1∗ are regexes, with L(r1 + r2) =
  L(r1) ∪ L(r2), L(r1r2) = L(r1)L(r2), and L(r1∗) = (L(r1))∗ (matching union,
  concatenation, and star, respectively)

We follow the unfortunate overloading of symbols of previous course instructors:
when we write, for example, L(∅) = ∅, the first ∅ is a string with the single
character ∅, which is a regular expression, while the second ∅ is the standard
mathematical notation representing the empty set.

It’s an easy exercise to prove by structural induction that every regular language
can be matched by a regular expression, and every regular expression matches a language
that is regular. Another way to put it is that a language L is regular if and only if
there is a regular expression r such that L = L(r ).

Example 5.2. Let Σ = {0, 1}. Describe the language of the regex 01 + 1(0 + 1)∗.

To interpret this regex, we need to understand precedence rules. By convention,
these are identical to the standard arithmetic precedence rules: star is like power,
concatenation is like multiplication, and union is like addition. Thus star has
the highest precedence, followed by concatenation, and finally union. Therefore
the complete bracketing of this regex is (01) + (1((0 + 1)∗)), but with these
precedence rules in place, we only need the brackets around the (0 + 1).

Let us proceed part by part. The “01” component matches the string 01. The
(0 + 1)∗ matches all binary strings, because it contains all strings resulting from
adding a 0 or 1 at each step. This means that 1(0 + 1)∗ matches a 1, followed by
any binary string. Finally, we take the union of these two: L(01 + 1(0 + 1)∗ ) is
the set of strings that are either 01 or start with a 1.
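As a sanity check, we can translate this regex into Python's re syntax, where our
union + becomes | and fullmatch requires the entire string to match; this sketch
is ours, not part of the original notes.

import re

# Our regex 01 + 1(0+1)* in Python syntax: (0 + 1) becomes the class [01].
pattern = re.compile(r"01|1[01]*")

for s in ["01", "1", "1101", "0", "001"]:
    # fullmatch checks the whole string, matching our definition of L(r).
    print(s, bool(pattern.fullmatch(s)))
# 01, 1, and 1101 are matched; 0 and 001 are not.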

Example 5.3. Let’s go the other way, and develop a regular expression given a
description of the following regular language:

L = {w ∈ { a, b}∗ | w has length at most 2}.



Solution:

Note that L is finite, so in fact we can simply list out all of the strings in our
regex:
e + a + b + aa + ab + ba + bb
Another strategy is to divide up the regex into cases depending on length:

e + ( a + b) + ( a + b)( a + b)

The three parts capture the strings of length 0, 1, and 2, respectively. A final
representation would be to say that we’ll match two characters, each of which
could be empty, a, or b:
(e + a + b)(e + a + b).
All three regexes we gave are correct! In general, there is more than one regular
expression that matches any given regular language.
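In fact, we can test that the three regexes agree using Python's re module (a
quick sketch of ours; the string e becomes an empty alternative in Python's
syntax).

import re
from itertools import product

# The three candidate regexes for strings over {a, b} of length at most 2.
patterns = [r"|a|b|aa|ab|ba|bb", r"|a|b|(a|b)(a|b)", r"(|a|b)(|a|b)"]

strings = [""] + ["".join(p) for n in (1, 2, 3) for p in product("ab", repeat=n)]
for s in strings:
    results = [bool(re.fullmatch(p, s)) for p in patterns]
    # All three agree, and match exactly the strings of length <= 2.
    assert results[0] == results[1] == results[2] == (len(s) <= 2)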

Example 5.4. A regular expression matching the language L = {w ∈ {0, 1}∗ |
w has 11 as a substring} is rather straightforward:

(0 + 1)∗ 11(0 + 1)∗ .

But what about the complement of L, i.e., the language L̄ = {w ∈ {0, 1}∗ |
w does not have 11 as a substring}? It is more difficult to find a regex for this
language because regular expressions specify patterns that should be matched,
not avoided. Let’s approach this problem by interpreting the definition as “every
1 must be preceded by a 0.”

Here is our first attempt:


(00∗1)∗.

Inside the brackets, 00∗1 matches a non-empty block of 0's followed by a 1,
and this certainly ensures that there are no consecutive 1’s. Unfortunately, this
regular expression matches only a subset of L̄, and not L̄ entirely. For example,
the string 1001 is in L̄, but can’t be matched by this regular expression because
the first 1 isn’t preceded by any 0’s. We can fix this by saying that the first
character could possibly be a 1:

(e + 1)(00∗1)∗.

There is one last problem: this fails to match any string that ends with 0’s, e.g.,
0100. We can apply a similar fix as the previous one to allow a block of 0’s to be
matched at the end:
(e + 1)(00∗1)∗0∗.
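Proving that a regex matches a language is hard in general, but for small lengths
we can at least test it exhaustively; here is a quick Python check (our own
sketch) comparing the final regex against the defining property of L̄.

import re
from itertools import product

# (e + 1)(00*1)*0* in Python syntax; the leading e + 1 becomes an optional 1.
pattern = re.compile(r"1?(00*1)*0*")

# Compare against the defining property on all strings of length up to 10.
for n in range(11):
    for bits in product("01", repeat=n):
        s = "".join(bits)
        assert bool(pattern.fullmatch(s)) == ("11" not in s), s
print("regex agrees with L-bar on all strings of length <= 10")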

Some interesting meta-remarks arise naturally from the last example. One is a
rather straightforward way of showing that a language and a regex are inequiv-
alent: simply produce a string that is in one but not the other. (Keep this in
mind when you and your friends are arguing about whose regular expressions
are correct.) On the other hand, convincing ourselves that a regular expression
correctly matches a language can be quite difficult; how did we know that
(e + 1)(00∗1)∗0∗ correctly matches L̄? Proving that a regex matches a particular
language L is much harder, because we need to show that every string in L is
matched by the regex, and every string not in L is not. Note that this is a
universally quantified statement, while its negation (the regex doesn't match L)
is existentially quantified; this explains the difference in difficulty.

Our intuition only takes us so far — it is precisely this gap between our gut
and true understanding that proofs were created to fill. This is, however, just a
little beyond the scope of the course; you have all the necessary ingredients —
surprise: induction — but the arguments for all but the simplest languages are
quite involved. (We will see later that reasoning about the correctness of
deterministic finite automata is just as powerful, and a little simpler.)

Example 5.5. We finish off this section with a few corner cases to illustrate some
of the subtleties in our definition, and extend the arithmetic metaphor we hinted
at earlier. First, the following equalities show that ∅ plays the role of the "zero"
in regular expressions. Let r be an arbitrary regular expression.

∅ + r = r + ∅ = r
∅r = r∅ = ∅

(By equality between regular expressions we mean that they match the same
language; that is, r1 = r2 ⇔ L(r1) = L(r2).)
The first is obvious because taking the union of any set with the empty set
doesn’t change the set. What about concatenation? Recall that concatenation
of languages involves taking combinations of strings from the first language
and strings from the second; if one of the two languages is empty, then no
combinations are possible.

We can use similar arguments to show that e plays the role of the “one”:

er = re = r
e∗ = e

A Suggestive Flowchart

You might be wondering how computers actually match strings to regular ex-
pressions. It turns out that regular languages are rather easy to match because
of the following (non-obvious!) property:

You can determine membership in a regular language by reading
symbols of the string one at a time, left to right.

[Figure: a two-state flowchart with states q0 and q1; each state has a 0-loop, and
the 1-transitions go between q0 and q1 in both directions; q1 is drawn with a
double border.]

Consider the flowchart-type object shown in the figure. Suppose we start at the
state marked q0. Now consider the string 0110, reading the symbols one at a
time, left to right, and following the arrows marked by the symbols we read in.
It is not hard to see that we end up at state q0. On the other hand, if the string
is 111, we end up at state q1. The state q1 is marked as special by a double
border; we say that a string ending up at q1 is accepted, while a string ending
at q0 is rejected.

After some thought you may realise that the accepted strings are exactly the
ones with an odd number of 1's. A more suggestive way of saying this is that
the language accepted by this flowchart is the set of strings with an odd number
of 1's.

Deterministic Finite Automata

We now use the notion of “flowcharts” to define a simple model of computation.


Each one acts as a simple computer program: starting at a particular point, it
receives inputs from the user, updates its internal memory, and when the inputs
are finished, it outputs True or False. The following definition is likely one of
the most technical you’ve encountered to date, in this course and others. Make
sure you fully understand the prior example, and try to match it to the formal
definition as you read it.

Definition 5.2 (Deterministic Finite Automaton). A deterministic finite automa-
ton (DFA) is a quintuple D = (Q, Σ, δ, s, F) where the components define the
various parts of the automaton:

• Q is the (finite) set of states in D
• Σ is the alphabet of symbols used by D
• δ : Q × Σ → Q is the transition function (represents the "arrows")
• s ∈ Q is the initial state of D
• F ⊆ Q is the set of accepting (final) states of D

Example 5.6. In the introductory example, the state set is Q = {q0, q1}, the
alphabet is {0, 1}, the initial state is q0, and the set of final states is {q1}. (Note
that there's only one final state in this example, but in general there may be
several.) We can represent the transition function as a table of values:

Old State   Symbol   New State
q0          0        q0
q0          1        q1
q1          0        q1
q1          1        q0

In general, the number of rows in the transition table is |Q| · |Σ|; each state must
have exactly |Σ| transitions leading out of it, each labelled with a unique symbol
in Σ.
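Since δ is just a finite table, DFAs are easy to simulate in code. Here is a small
Python sketch (ours, not from the notes), run on the odd-number-of-1's DFA
from the introductory example.

def run_dfa(delta, start, accepting, w):
    '''Simulate a DFA: follow delta one symbol at a time, left to right.'''
    state = start
    for symbol in w:
        state = delta[(state, symbol)]
    return state in accepting

# Transition table from Example 5.6 (odd number of 1's).
delta = {
    ("q0", "0"): "q0", ("q0", "1"): "q1",
    ("q1", "0"): "q1", ("q1", "1"): "q0",
}

print(run_dfa(delta, "q0", {"q1"}, "0110"))  # False: two 1's
print(run_dfa(delta, "q0", {"q1"}, "111"))   # True: three 1's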

Before proceeding, make sure you understand each of the following statements
about how DFAs work:

• DFAs read strings one symbol at a time, from left to right


• DFAs cannot “go back” and reread previous symbols
• At a particular state, once you have read a symbol there is only one arrow
(transition) you can follow
• DFAs have a finite amount of memory, since they have a finite number of
states
• Inputs to DFAs can be any length
• Because of these last two points, it is impossible for DFAs to always remember
every symbol they have read so far

A quick note about notation before proceeding with a few more examples. Tech-
nically, δ takes as its second argument a single symbol; e.g., δ(q0 , 1) = q1
(from the previous example). But we can just as easily extend this definition
to arbitrary-length strings in the second argument. For example, we can say
δ(q0 , 11) = q0 , δ(q1 , 1000111) = q1 , and δ(q0 , e) = q0 . For every state q, δ(q, e) = q.

With this “extended” transition function, it is very easy to symbolically represent


the language L(D) accepted by the DFA D . Recall that the language accepted by a
DFA is the set of strings accepted by it.
L(D) = {w ∈ Σ∗ | δ(s, w) ∈ F }
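In code, the extended transition function is just a left-to-right fold of δ over the
string; here is a sketch (ours) that works with the dictionary-based transition
tables used above. With it, L(D) is exactly the set of strings w for which
delta_star(delta, s, w) lands in F.

from functools import reduce

def delta_star(delta, q, w):
    '''Extended transition function: follow delta for each symbol of w in turn.
    In particular, delta_star(delta, q, "") == q for every state q.'''
    return reduce(lambda state, symbol: delta[(state, symbol)], w, q)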

Example 5.7. Let us design a DFA that accepts the following language:

L = {w ∈ { a, b}∗ | w starts with a and ends with b}.



Solution:

Consider starting at an initial state q0. What happens when you read in an a or
a b? Reading in a b should lead to a state q1 from which it's impossible to accept,
so q1 loops to itself on both a and b. On the other hand, reading in an a should
lead to a state q2 where we need to "end with a b." The simple way to achieve
this is to have q2 loop on a's, then move to an accepting state q3 on a b. What
about the transitions for q3? As long as it's reading b's, it can accept, so it should
loop to itself. On the other hand, if it reads an a, it should go back to q2, and
continue reading until it sees another b.

[Figure: the finished DFA. The initial state q0 goes to q1 on b and to q2 on a; q1
loops on a, b; q2 loops on a and goes to q3 on b; the accepting state q3 loops on b
and returns to q2 on a.]
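Using the run_dfa helper sketched earlier, we can encode this DFA directly and
spot-check the design (again, this snippet is ours, not part of the notes).

# Transition table for the "starts with a, ends with b" DFA described above.
delta = {
    ("q0", "a"): "q2", ("q0", "b"): "q1",
    ("q1", "a"): "q1", ("q1", "b"): "q1",  # q1 is the garbage state
    ("q2", "a"): "q2", ("q2", "b"): "q3",
    ("q3", "a"): "q2", ("q3", "b"): "q3",
}

for w in ["ab", "abab", "b", "aba", "a"]:
    print(w, run_dfa(delta, "q0", {"q3"}, w))
# ab and abab are accepted; b, aba, and a are rejected.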
Correctness of DFAs

Like regular expressions, arguing that DFAs are incorrect is generally easier than
arguing that they are correct. However, because of the rather restricted form
DFAs must take, reasoning about their behaviour is a little more amenable than
for regular expressions.
b
The simple strategy of "pick an arbitrary string in L, and show that it is accepted
by the DFA" is hard to accomplish, because the paths taken through the DFA by
each accepted string can be quite different; i.e., different strings will probably
require substantially different proofs. Therefore we adopt a different strategy.
We know that DFAs consist of states and transitions between states; the term
state suggests that if a string reaches that point, the DFA “knows” something
about that string, or it is “expecting” what will come next. We formalize this
notion by characterizing for each state precisely what must be true about the
strings that reach it.

Definition 5.3 (State invariant). Let D = (Q, Σ, δ, s, F) be a DFA, and let q ∈ Q be
a state of the DFA. We define a state invariant for q as a predicate P(x) (over
domain Σ∗) such that for every string w ∈ Σ∗, δ(s, w) = q if and only if P(w) is
true.

Note that the definition of state invariant uses an if and only if. We aren’t
just giving properties that the strings reaching q must satisfy; we are defining
precisely which strings reach q. Let us see how state invariants can help us prove
that DFAs are correct.

Example 5.8. Consider the following language over the alphabet {0, 1}: L =
{w | w has an odd number of 1's}, and the two-state DFA from the introductory
example (0-loops at q0 and q1, with 1-transitions between them, and q1 accepting).
Prove that the DFA accepts precisely the language L.

Proof. It is fairly intuitive why this DFA is correct: strings with an even number
of 1's go to q0, and transition to q1 upon reading a 1 (where the string now has
an odd number of 1's). Here are some state invariants for the two states:

δ(q0, w) = q0 ⇔ w has an even number of 1's
δ(q0, w) = q1 ⇔ w has an odd number of 1's

Here are two important properties to keep in mind when designing state invari-
ants:

• The state invariants should be mutually exclusive. That is, there should be no
overlap between them; no string should satisfy two different state invariants.
Otherwise, to which state would the string go?
• The state invariants should be exhaustive. That is, they should cover all pos-
sible cases; every string in Σ∗ , including e, should satisfy one of the state
invariants. Otherwise, the string goes nowhere.

These conditions are definitely satisfied by our two invariants above, since every
string has either an even or odd number of 1’s.

Next, we want to prove that the state invariants are correct. We do this in two
steps.

• Show that the empty string e satisfies the state invariant of the initial state.
  In our case, the initial state is q0; e has zero 1's, which is even; therefore the
  state invariant is satisfied by e.

• For each transition q → r on symbol a, show that if a string w satisfies the
  invariant of state q, then the string wa satisfies the invariant of r. (That is, each
  transition respects the invariants.) There are four transitions in our DFA. For
  the two 0-loops, appending a 0 to a string doesn't change the number of 1's in
  the string, and hence if w contains an even (odd) number of 1's, then w0
  contains an even (odd) number of 1's as well, so these two transitions are
  correct. On the other hand, appending a 1 increases the number of 1's in a
  string by one. So if w contains an even (odd) number of 1's, w1 contains an
  odd (even) number of 1's, so the transitions between q0 and q1 labelled 1
  preserve the invariants.

(The astute reader will note that we are basically doing a proof by induction on
the length of the strings. Like our proofs of program correctness, we "hide" the
formalities of induction proofs and focus only on the content.)

Thus we have proved that the state invariants are correct. The final step is to
show that the state invariants of the accepting state(s) precisely describe the
target language. (Remember that in general, there can be more than one
accepting state!) This is very obvious in this case, because the only accepting
state is q1, and its state invariant is exactly the defining characteristic of the
target language L. ∎
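State invariants can also be spot-checked mechanically. The following sketch
(ours) verifies both invariants on every string of length at most 8, using a
dictionary transition table for this DFA.

from itertools import product

delta = {
    ("q0", "0"): "q0", ("q0", "1"): "q1",
    ("q1", "0"): "q1", ("q1", "1"): "q0",
}

def delta_star(q, w):
    for symbol in w:
        q = delta[(q, symbol)]
    return q

for n in range(9):
    for bits in product("01", repeat=n):
        w = "".join(bits)
        # The invariants: reach q1 iff w has an odd number of 1's.
        expected = "q1" if w.count("1") % 2 == 1 else "q0"
        assert delta_star("q0", w) == expected
print("both state invariants hold for all strings of length <= 8")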

Limitations of DFAs

The simplicity of the DFA model enables proofs of correctness, as shown above.
This simplicity is also useful for reasoning about the model’s limitations. In this
section, we’ll cover two examples. First, we prove a lower bound on the number
of states required in a DFA accepting a particular language. Then, we’ll show
that certain languages cannot be accepted by DFAs of any size!
Example 5.9. Consider the language

L = {w ∈ {0, 1}∗ | w has at least three 1's}.

We’ll prove that any DFA accepting this language has at least 4 states.

Proof. Suppose there exists a DFA D = (Q, Σ, δ, s, F) that accepts L and has
fewer than four states. Consider the four strings w0 = e, w1 = 1, w2 = 11, and
w3 = 111. By the Pigeonhole Principle, two of these strings reach the same state
in D from the initial state. (The Pigeonhole Principle states that if m > n, and
you are putting m pigeons into n holes, then two pigeons will go into the same
hole.) Suppose 0 ≤ i < j ≤ 3, and δ(s, wi) = δ(s, wj) = q for some state q ∈ Q.
(That is, wi and wj both reach the same state q.)

Now, since wi and wj reach the same state, they are indistinguishable prefixes to
the DFA; this means that any strings of the form wi x and wj x will end up at the
same state in the DFA, and hence are both accepted or both rejected. However,
suppose x = w3−j. Then wi x = wi+3−j and wj x = wj+3−j = w3. Then wi x
contains 3 − j + i 1's, and wj x contains three 1's. But since i < j, 3 − j + i < 3, so
wi x ∉ L, while wj x ∈ L. Therefore wi x and wj x cannot end up at the same state
in D, a contradiction! ∎

The key idea in the above proof was that the four different strings e, 1, 11, 111 all
had to reach different states, because there were suffixes (the x = w3−j in the
proof) that could distinguish any pair of them. In general, to prove that a
language L requires at least k states in a DFA to accept it, it suffices to give a set
of k strings, each of which is distinguishable from the others with respect to L.
(Two strings w1 and w2 are "distinguishable with respect to L" if there is a suffix
x such that w1 x ∈ L and w2 x ∉ L, or vice versa.)

Now we turn to a harder problem: proving that some languages cannot be
accepted by DFAs of any size.

Example 5.10. Consider the language L = {0ⁿ1ⁿ | n ∈ N}. Prove that no DFA
accepts L.

Proof. We'll prove this by contradiction again. Suppose there is a DFA D =
(Q, Σ, δ, s, F) that accepts L. Let k = |Q|, the number of states in D. Now here is
the key idea: D has only k states of "memory," whereas to accept L, you really
have to remember the exact number of 0's you've read in so far, so that you can
match that number of 1's later. So we should be able to "overload" the memory
of D.

Here's how: consider the string w = 0ᵏ⁺¹, i.e., the string consisting of k + 1 0's.
Since there are only k states in D, the path that w takes through D must involve
a loop starting at some state q. That is, we can break up w into three parts:
w = 0ᵃ0ᵇ0ᶜ, where b ≥ 1, δ(s, 0ᵃ) = q, and δ(q, 0ᵇ) = q.

This loop is dangerous for the DFA! Because reading 0ᵇ causes a loop that begins
and ends at q, the DFA forgets whether it has read 0ᵇ or not (it has lost track of
the number of 0's); thus the strings 0ᵃ0ᶜ and 0ᵃ0ᵇ0ᶜ reach the same state, and are
now indistinguishable to D. But of course these two strings are distinguishable
with respect to L: 0ᵃ0ᶜ1ᵃ⁺ᶜ ∈ L, but 0ᵃ0ᵇ0ᶜ1ᵃ⁺ᶜ ∉ L. ∎

Nondeterminism

Consider the following language over the alphabet {0, 1}: L = {w | the third last
character of w is 1}. You'll see in the Exercises that a DFA needs at least 8 states
to accept L, using the techniques we developed in the previous section. Yet
there is a very short regular expression that matches this language:
(0 + 1)∗1(0 + 1)(0 + 1). Contrast this with the regular expression
(0 + 1)(0 + 1)1(0 + 1)∗, matching strings whose third character is 1; this language
has a simple 5-state DFA, as you can prove in the Exercises.

Why is it hard for DFAs to "implement" the former regex, but easy to imple-
ment the latter? The fundamental problem is the uncertainty associated with the
Kleene star. In the former case, a DFA cannot tell how many characters to match
with the initial (0 + 1)∗ segment before moving on to the 1! This is not an issue
in the latter case, because DFAs read left to right, and so have no problem
reading the first three characters and then matching the rest of the string to the
(0 + 1)∗.

[Figure: an automaton with states q0, q1, q2, q3; q0 loops on 0, 1 and also moves
to q1 on 1; q1 moves to q2 on 0, 1; q2 moves to the accepting state q3 on 0, 1.]

On the other hand, consider the automaton in the figure. This is not deterministic
because it has a "choice": reading in a 1 at q0 can loop back to q0 or move to q1.
Moreover, reading in a 0 or a 1 at q3 leads nowhere! But consider this: for any
string whose third last character is indeed a 1, there is a "correct path" that leads
to the final state q3. For example, for the string 0101110, the correct path would
continuously loop at q0 for the prefix 0101, then read the next 1, transition to q1,
then read the remaining 10 to end at q3, and accept.

BUT WAIT, you say, isn't this like cheating? How did the automaton "know" to
loop at the first two 1's, then transition to q1 on the third 1? We will define our
model in this way first, so bear with us. The remarkable fact we'll show later is
that, while this model seems more powerful than plain old DFAs, in fact every
language that we can accept in this model can also be accepted with a DFA.
Cheating doesn't help.

A Nondeterministic Finite Automaton (NFA) is defined in a similar fashion to
a DFA: it is a quintuple N = (Q, Σ, δ, s, F), with Q, Σ, s, and F playing the same
roles as before. The transition function δ now maps to sets of states rather than
individual states; that is, δ : Q × Σ → 2^Q, where 2^Q represents the set of all
subsets of Q. For instance, in the previous example, δ(q0, 1) = {q0, q1}, and
δ(q3, 0) = ∅.

We think of δ(q, a) here as representing the set of possible states reachable from q
by reading in the symbol a. This is extended in the natural way to δ(q, w) for
arbitrary length strings w to mean all states reachable from q by reading in the
string w. Note that for state q and next symbol a, if δ(q, a) = ∅ then this path
“aborts,” i.e., the NFA does not continue reading more characters for this path.

We say that a string w is accepted by NFA N if δ(s, w) ∩ F ≠ ∅; that is, if there
exists a final state reachable from the initial state by reading the string w. Note
that the existential quantifier used in the definition of acceptance is an integral
part of the definition: we don't require that every path leading out of s along w
reach a final state, only one. This formalizes the notion that it be possible to choose
the correct path, even from among many rejecting or aborted paths. (On the other
hand, if a string is rejected by an NFA, this means all possible paths that string
could take either aborted or ended at a rejecting state.)
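Simulating an NFA is only slightly more work than simulating a DFA: instead of
one current state, we track the set of all states the NFA could be in. Here is a
Python sketch (ours) for the third-last-character example.

def run_nfa(delta, start, accepting, w):
    '''Simulate an NFA by tracking the set of currently reachable states.'''
    states = {start}
    for symbol in w:
        # Union of all states reachable from any current state on this symbol.
        states = set().union(*(delta.get((q, symbol), set()) for q in states))
    return bool(states & accepting)

# NFA for "third last character is 1": guess when the third last symbol arrives.
delta = {
    ("q0", "0"): {"q0"}, ("q0", "1"): {"q0", "q1"},
    ("q1", "0"): {"q2"}, ("q1", "1"): {"q2"},
    ("q2", "0"): {"q3"}, ("q2", "1"): {"q3"},
    # q3 has no outgoing transitions: paths that reach it too early abort.
}

print(run_nfa(delta, "q0", {"q3"}, "0101110"))  # True: third last character is 1
print(run_nfa(delta, "q0", {"q3"}, "0010"))     # False: third last character is 0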

e-transitions

We can augment NFAs even further through the use of e-transitions. These are
nondeterministic transitions that do not require reading in a symbol to activate.
That is, if you are currently at state q of an NFA, and there is an e-transition
from q to another state r, then you can transition to r without reading the next
symbol in the string.

[Figure: an NFA with start state q0 and e-transitions from q0 to q1 and to q4; q1
reads 0 to reach q2, which has transitions labelled e, 0 to the accepting state q3;
q4 reads 1 to reach q5.]

For instance, the NFA in the figure accepts the string 0 by taking the e-transition
to q1, reading the 0 to reach q2, then taking another e-transition to q3.

As was the case for nondeterminism, e-transitions do not actually add any power
to the model, although they can be useful in certain constructions that we'll use
in the next section.
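Simulating an NFA with e-transitions needs just one extra step: after reading
each symbol (and initially), close the current state set under e-moves. A sketch
(ours), where eps maps each state to the set of states reachable by a single
e-transition:

def eps_closure(eps, states):
    '''All states reachable from `states` using only e-transitions.'''
    stack, closed = list(states), set(states)
    while stack:
        q = stack.pop()
        for r in eps.get(q, set()):
            if r not in closed:
                closed.add(r)
                stack.append(r)
    return closed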

Equivalence of Definitions

So far in this chapter, we have used both DFAs and regular expressions to rep-
resent the class of regular languages. We have taken for granted that DFAs are
sufficient to represent regular languages; in this section, we will prove this for-
mally. There is also the question of nondeterminism: do NFAs accept a larger
class of languages than DFAs? In this section, we’ll show that the answer, some-
what surprisingly, is no. Specifically, we will sketch a proof of the following
theorem.

Theorem 5.1 (Equivalence of Representations of Regular Languages). Let L be a
language over an alphabet Σ. Then the following are equivalent:

(1) There is a regular expression that matches L.
(2) There is a deterministic finite automaton that accepts L.
(3) There is a nondeterministic finite automaton (possibly with e-transitions) that
accepts L.

Proof. If you aren't familiar with theorems asserting the equivalence of multiple
statements, what we need to prove is that any one of the statements being true
implies that all of the others must also be true. We are going to prove this by
showing the following chain of implications: (3) ⇒ (2) ⇒ (1) ⇒ (3). (We only
sketch the main ideas of the proof; for a more formal treatment, see Sections 7.4.2
and 7.6 of Vassos Hadzilacos' course notes.)

(3) ⇒ (2). Given an NFA, we'll show how to construct a DFA that accepts
the same language. Here is the high-level idea: nondeterminism allows you to
"choose" different paths to take through an automaton. After reading in some
characters, the possible path choices can be interpreted as the NFA being simul-
taneously in some set of states. Therefore we can model the NFA as transitioning
between sets of states each time a symbol is read. Rather than formally defining
this construction, we'll illustrate it on an example NFA.

[Figure: an example three-state NFA (states 0, 1, 2, with transitions on a and b
and an e-transition between 1 and 2), followed by the successive stages of the
subset construction: all eight subset-states ∅, 0, 1, 2, 01, 02, 12, 012; the six states
left after discarding 01 and 1; and finally the transitions between them.]

There are 2³ = 8 possible subsets of states in this NFA; to construct a DFA,
we start with one state for each subset (notice that they are labelled according
to which states of the NFA they contain, so state 02 corresponds to the subset
{0, 2}).

But we notice that the e-transition between 1 and 2 ensures that every time we
could be in state 1 of the NFA, we could also be in state 2. Therefore we can
ignore states 01 and 1 in our construction, leaving us with just six states.

Next, we put in transitions between the states of the DFA. Consider the subset
{0, 2} of states in the NFA. Upon reading the symbol a, we could end up at
all three states, {0, 1, 2} (notice that to reach 2, we must transition from 2 to 1
by reading the a, then use the e-transition from 1 back to 2). So in our constructed
DFA, there is a transition from {0, 2} to {0, 1, 2} on symbol a. In general, we
look at all possible outcomes starting from a state in subset S and reading
symbol a. Repeating this for all subsets yields the transitions shown in the figure.

Next, we need to identify initial and final states. The initial state of the NFA is
0, and since no e-transitions leave it, the initial state of the constructed DFA is
{0}. The final states of the DFA are exactly the subsets containing the final state
1 of the NFA. Finally, we can simplify the DFA considerably by removing the
states ∅ and {2}, which cannot be reached from the initial state.
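The subset construction is mechanical enough to code directly. Below is a
compact Python sketch (ours; for brevity it ignores e-transitions, which would
require an extra closure step) converting an NFA transition table into a DFA
whose states are frozensets of NFA states. Applied to the third-last-character
NFA from earlier, it produces exactly the 8 reachable subset-states promised by
the lower bound in the previous section.

def subset_construction(nfa_delta, alphabet, start, accepting):
    '''Convert an NFA (without e-transitions) into an equivalent DFA.
    Only subsets reachable from the initial subset are constructed.'''
    dfa_start = frozenset({start})
    dfa_delta = {}
    seen = {dfa_start}
    todo = [dfa_start]
    while todo:
        S = todo.pop()
        for a in alphabet:
            # All NFA states reachable from some state of S on symbol a.
            T = frozenset().union(*(nfa_delta.get((q, a), set()) for q in S))
            dfa_delta[(S, a)] = T
            if T not in seen:
                seen.add(T)
                todo.append(T)
    dfa_accepting = {S for S in seen if S & accepting}
    return dfa_delta, dfa_start, dfa_accepting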

(1) ⇒ (3). In this part, we show how to construct NFAs from regular expres-
sions. Note that regular expressions have a recursive definition, so we can ac-
tually prove this part using structural induction. First, we show standard NFAs
for accepting ∅, {e}, and {a} (for a generic symbol a).

[Figure: three small NFAs accepting ∅, {e}, and {a}.]

Next, we show how to construct NFAs for union, concatenation, and star, the
three recursive operations used to define regular expressions. Note that because
we're using structural induction, it suffices to show how to perform these oper-
ations on NFAs; that is, given two NFAs N1 and N2, construct NFAs accepting
L(N1) ∪ L(N2), L(N1)L(N2), and (L(N1))∗. In the figures that follow, we draw
generic NFAs N1 and N2 as boxes; the accepting states on the right side of each
box symbolize all accepting states of that NFA, and the start states are s1 and s2,
respectively.
First consider union. Essentially, the idea is that starting in a new start state s,
we "guess" whether the word will be accepted by N1 or N2 by e-transitioning to
either s1 or s2, and then see if the word is actually accepted by running the
corresponding NFA. [Figure: a new start state s with e-transitions to s1 and s2.]

For concatenation, we start with the first NFA N1, and then every time we reach
a final state, we "guess" that the matched string from L(N1) is complete, and
e-transition to the start state of N2. [Figure: N1 followed by N2, with an
e-transition from each accepting state of N1 to s2.]

Finally, for the Kleene star we perform a similar construction, except that the
final states of N1 e-transition to s1 rather than s2. To possibly accept e, we add a
new initial state that is also accepting. [Figure: a new accepting initial state with
an e-transition to s1, and e-transitions from the accepting states of N1 back to s1.]

(2) ⇒ (1). Finally, we show how to construct regular expressions from DFAs.
This is the hardest construction to prove, so our sketch here will be especially
vague. The key idea is the following:

For any two states q, r in a DFA, there is a regular expression that
matches precisely the strings w such that δ(q, w) = r; i.e., the strings
that induce a path from q to r.

Let D = (Q, Σ, δ, s, F) be a DFA with n states, Q = {1, . . . , n}. For each i, j ∈ Q,
we define the sets Lij = {w | δ(i, w) = j}; our ultimate goal is to show that every
Lij can be matched by a regular expression. (This part was first proved by
Stephen Kleene, the inventor of regular expressions and after whom the Kleene
star is named.) To do this, we use a clever induction argument. For every
0 ≤ k ≤ n, let

Lij(k) = {w | δ(i, w) = j, and only states ≤ k are passed between i and j}

Note that Lij(0) is the set of strings where there must be no intermediate states,
i.e., w is a symbol labelling a transition directly from i to j. Also, Lij(n) = Lij:
no restrictions are placed on the states that can be passed. We will show how
to inductively build up regular expressions matching each of the Lij(k), where
the induction is done on k. (Formally, our predicate is P(k): "For all states i and
j, the set Lij(k) can be matched by a regex.") First, the base case, which we've
already described intuitively:

Lij(0) = {a ∈ Σ | δ(i, a) = j}           if i ≠ j
Lij(0) = {a ∈ Σ | δ(i, a) = j} ∪ {e}     if i = j

Note that when i = j, we need to include e, as this indicates the trivial act of
following no transition at all. Since the Lij(0) are finite sets (of symbols), we can
write regular expressions for them; e.g., if Lij(0) = {a, c, f}, the regex would be
a + c + f. (You can prove this fact in the Exercises.)

Finally, here is the recursive definition of the sets that will allow us to construct
regular expressions: it defines Lij(k + 1) in terms of some L...(k), using only the
operations of union, concatenation, and star:

Lij(k + 1) = Lij(k) ∪ Li,k+1(k) Lk+1,k+1(k)∗ Lk+1,j(k)

Intuitively, a path from i to j passing through states ≤ k + 1 either avoids state
k + 1 entirely (the first term), or visits it one or more times: travel from i to
k + 1, loop from k + 1 back to itself any number of times, then travel from k + 1
to j (the second term). Therefore, given regular expressions for Lij(k), Li,k+1(k),
Lk+1,k+1(k), and Lk+1,j(k), we can construct a regular expression for Lij(k + 1).
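Kleene's construction can be carried out mechanically; here is a rough Python
sketch (ours) that builds the regexes Lij(k) bottom-up as strings. It produces
correct but wildly unsimplified regexes, which is why the construction is rarely
used by hand.

def dfa_to_regex(n, delta, start, accepting):
    '''Kleene's construction. States are 1..n; delta maps (i, symbol) -> j.
    Returns a regex string over +, concatenation, and *, or None for the empty set.'''
    def union(x, y):
        if x is None: return y
        if y is None: return x
        return x + "+" + y

    def cat(x, y):
        if x is None or y is None: return None
        if x == "e": return y        # e is the identity for concatenation
        if y == "e": return x
        return "(" + x + ")(" + y + ")"

    # Base case: L_ij(0) is the set of symbols on direct transitions (plus e if i = j).
    L = [[None] * (n + 1) for _ in range(n + 1)]
    for (i, a), j in delta.items():
        L[i][j] = union(L[i][j], a)
    for i in range(1, n + 1):
        L[i][i] = union(L[i][i], "e")

    # Inductive step: allow state k as an intermediate state.
    for k in range(1, n + 1):
        loop = "(" + L[k][k] + ")*" if L[k][k] is not None else "e"
        newL = [[None] * (n + 1) for _ in range(n + 1)]
        for i in range(1, n + 1):
            for j in range(1, n + 1):
                newL[i][j] = union(L[i][j], cat(L[i][k], cat(loop, L[k][j])))
        L = newL

    result = None
    for f in accepting:
        result = union(result, L[start][f])
    return result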

Exercises

1. For each of the following regular languages over the alphabet Σ = {0, 1},
design a regular expression and DFA which accepts that language. For which
languages can you design an NFA that is substantially smaller than your
DFA?

(a) {w | w contains an odd number of 1’s}


(b) {w | w contains exactly two 1’s}
(c) {w | w contains 011}
(d) {w | w ends with 00}
(e) {w | |w| is a multiple of 3}
(f) {w | every 0 in w is eventually followed by a 1}
(g) {w | w does not begin with 10}
(h) {w | w is the binary representation of a multiple of three}
(i) {w | w contains both 00 and 11 as substrings}
(j) {0n 1m | m, n ∈ N, m + n is even}
(k) {w | w contains a substring of length at most 5 that has at least three 1’s}
(l) {w | w has an even number of zeroes, or exactly 2 ones}

2. Let L = {w ∈ {0, 1}∗ | the third character of w is a 1}. Prove that every DFA
   accepting L has at least 5 states.
3. Let L = {w ∈ {0, 1}∗ | the third last character of w is a 1}. Prove that every
   DFA accepting L has at least 8 states. Hint: Consider the 8 binary strings of
   length 3. (For a bonus, what is the smallest DFA you can find that accepts L?
   It will have at least 8 states!)
4. Prove by induction that every finite language can be represented by a regular
   expression. (This shows that all finite languages are regular.)
5. Prove that the following languages are not regular.

(a) {aⁿ² | n ∈ N}
(b) {xx | x ∈ {0, 1}∗}
(c) {w ∈ {a, b}∗ | w has more a's than b's}
(d) {w ∈ {0, 1}∗ | w has two blocks of 0's with the same length} (A block is a
    maximal substring containing the same character; for example, the string
    00111000001 has four blocks: 00, 111, 00000, and 1.)
(e) {aⁿbᵐcⁿ⁻ᵐ | n ≥ m ≥ 0}

w∈/ L }.

(a) Given a DFA D = ( Q, Σ, δ, s, F ) that accepts a language L, describe how


you can construct a DFA that accepts L.
(b) Let L be a language over an alphabet Σ. Prove that if L is not regular, then
the complement of L is not regular.

7. A regular expression r is star-free if it doesn't contain any star operations.
   Prove that for every star-free regex r, L(r) is finite.
8. The suffix operator Suff takes as input a regular language, and outputs all
   possible suffixes of strings in the language. For example, if L = {aa, baab}
   then

   Suff(L) = {e, a, aa, b, ab, aab, baab}.

   Prove that if L is a regular language, then so is Suff(L). (Hint: recall the
   definition of regular languages, and use structural induction!)
9. The prefix operator Pre takes as input a regular language, and outputs all pos-
sible prefixes of strings in the language. For example, if L = { aa, baab} then

Pre( L) = {e, a, aa, b, ba, baa, baab}.

Prove that if L is a regular language, then so is Pre( L). (Hint: recall the
definition of regular languages, and use structural induction!)
6 In Which We Say Goodbye

With CSC236 complete, you have now mastered the basic concepts and reason-
ing techniques vital to your computer science career both at this university and
beyond. You have learned how to analyse the efficiency of your programs, both
iterative and recursive. You have also learned how to argue formally that they
are correct by using program specifications (pre- and postconditions) and loop
invariants. You studied the finite automaton, a simple model of computation
with far-reaching consequences.

Where to from here? Most obviously, you will use your skills in CSC263 and
CSC373, where you will study more complex data structures and algorithms.
You will see first-hand the real tools with which computers store and compute
with large amounts of data, facing real-world problems as ubiquitous as sorting
– but whose solutions are not nearly as straightforward. If you were intrigued
by the idea of provably correct programs, you may want to check out CSC410; if
you liked the formal logic you studied in CSC165, and are interested in more of
its computer science applications (of which there are many!), CSC330, CSC438,
CSC465, and CSC486 would be good courses to consider. If you’d like to learn
about more powerful kinds of automata and more complex languages, CSC448
is the course for you. Finally, CSC463 tackles computability and complexity theory,
the fascinating study of the inherent hardness of problems.

For more information on the above, or other courses, or any other matter aca-
demic, professional, or personal, come talk to any one of us in the Department
of Computer Science! Our {doors, ears, minds} are always open.

Be the person your dog thinks you are.
