ECS 120
Lecture Notes
David Doty
Copyright © May 31, 2020, David Doty
No part of this document may be reproduced without the expressed written consent of
the author. All rights reserved.
Contents

1 Introduction
  1.1 What this course is about
  1.2 Automata, computability, and complexity
  1.3 Mathematical background
    1.3.1 Implication statements
    1.3.2 Sets
    1.3.3 Sequences and tuples
    1.3.4 Functions and relations
    1.3.5 The pigeonhole principle
    1.3.6 Combinatorics
    1.3.7 Graphs
    1.3.8 Boolean logic
  1.4 Proof by induction
    1.4.1 Proof by induction on natural numbers
    1.4.2 Induction on other structures

I Automata Theory

2 String theory
  2.1 Why to study automata theory
  2.2 Definitions
  2.3 Binary numbers

6 Equivalence of models
  6.1 Equivalence of DFAs and NFAs (subset construction)
  6.2 Equivalence of RGs and NFAs
    6.2.1 Left-regular grammars
  6.3 Equivalence of regex's and NFAs
    6.3.1 Every regex-decidable language is NFA-decidable
    6.3.2 Every NFA-decidable language is regex-decidable
  6.4 Optional: Equivalence of DFAs and constant memory programs

8 Turing machines
  8.1 Intuitive idea of Turing machines
  8.2 Formal definition of a TM (syntax)
  8.3 Formal definition of computation by a TM (infinite semantics)
  8.4 Optional: Formal definition of computation by a TM (finite semantics)
  8.5 Languages recognized/decided by TMs
  8.6 Variants of TMs
    8.6.1 Multitape TMs
    8.6.2 Other variants
  8.7 TMs versus code
  8.8 Optional: The Church-Turing Thesis

11 Undecidability
  11.1 The Halting Problem
    11.1.1 Turing-recognizable but not decidable
    11.1.2 Reducibility
  11.2 Optional: Source code versus programs
  11.3 Undecidable problems about algorithm behavior
Chapter 1

Introduction
• In ECS 20, you mathematically studied things that are not computers.
• In ECS 120, you will use the tools of ECS 20 to mathematically study computers.
The fundamental premise of the theory of computation is that the computer on your
desk—or in your pocket—obeys certain laws, and therefore, certain unbreakable limitations.
We can reason by analogy with the laws of physics. Newton’s equations of motion tell us
that each object with mass obeys certain rules that cannot be broken. For instance, an
object cannot accelerate in the direction opposite to the force being applied to it. Of
course, nothing in the world is a rigid frictionless body of mass obeying all the idealized
assumptions of classical mechanics... in other words, “the map is not the territory”. So in
reality, Newton’s equations of motion do not exactly predict anything. But they are a useful
abstraction of what real matter is like, and many things in the world are close enough to
this abstraction that Newtonian predictions are reasonably accurate.
I often think of the field of computer science outside of theory as being about proving
what can be done with a computer... simply by doing it! Much of research in theoretical
computer science is about proving what cannot be done with a computer. This can be more
difficult, since you cannot simply cite your failure to invent an algorithm for a problem as
proof that there is no algorithm. But certain important problems cannot be solved with
any algorithm, as we will see.
We will draw no distinction between the idea of “formal proof” and more nebulous
instructions such as “show your work”/“justify your answer”/“explain”. A “proof” of a
claim is an argument that convinces an intelligent person who has never seen the claim
before and cannot see why it is true without having it explained. It does not matter if the
argument uses formal mathematical notation or not (though formal notation is briefer and
more straightforward to make precise than English), or if it uses proof by induction or proof
by contradiction or just a direct proof (though it is often easier to think in terms of induction
or contradiction). What matters is that there are no holes or counter-arguments that can
be thrown at the argument, and that every statement is precise and unambiguous.
However, one effective technique to prove theorems (I do this in nearly all of my research
papers) is to first give an informal “proof sketch”, intuitively explaining how the proof will
go, but shorter than the proof and lacking in potentially confusing (but necessary) details.
The proof is easier to read if one first reads the proof sketch, but the proof sketch by itself is
not a proof. In fact, I would go so far as to say that the proof by itself is not a very effective
proof either, since bare naked details and mathematical notation, without any intuition to
help someone understand it, do not communicate why the theorem is true any better than
the hand-waving proof sketch. Both are usually necessary to accomplish the goal of the
proof: to help the reader understand why the theorem is true.
Here's an example of a formal theorem, an informal proof sketch, and a formal proof:
Theorem 1.1.1. There are infinitely many prime numbers.
Proof sketch. Intuitively, we show that for each finite set of prime numbers, we can find
a prime number not in the set. We do this by multiplying together the existing primes and
adding 1. No existing prime divides the result, because the result would then be only one
greater than a different multiple of that prime. This is impossible because multiples of
integers bigger than 1 are more widely spaced. (For example, multiples of 3 are at least 3
apart: 3, 6, 9, 12, 15, . . .) So every prime factor of the result is a new prime.
Proof. Let S = {p1, p2, . . . , pk} ⊂ N be any finite set of primes. It suffices to show that we
can find a prime not in S, implying that S cannot be all the primes. Since S is an arbitrary
finite set of primes, this shows that the set of all prime numbers cannot be finite.

Let m = p1 · p2 · . . . · pk and let n = m + 1. Since every integer greater than 1 has at
least one prime factor, let q be a prime factor of n (possibly q = n itself, if n is prime). It
suffices to show that q ∉ S. Suppose for the sake of contradiction that q = pi for some
i ∈ {1, . . . , k}, i.e., pi | n (pi divides n). Note that pi | m as well. Therefore n and m are
different multiples of pi, so their difference n − m must be at least pi. But n − m = 1, which
is smaller than any prime, a contradiction.
Note that the proof sketch is a bit shorter, uses less formal notation, and skips some
details. As a result, it is easy to read, but incomplete. The proof gives all the details, but
the proof sketch is a “roadmap” helping to guide the reader through it. Sometimes when I
write a paper, I don’t necessarily keep the proof sketch separate from the proof. Sometimes
the first paragraph or two of the proof is the sketch. Other times, the sketch is more “woven”
into the proof, especially for long proofs. This helps the reader stop to “come up for air”
occasionally and remember the big picture that the proof is trying to show. (However, in
this course, most proofs will not be very long, so they won't require this level of care.)
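The construction in the proof is concrete enough to play with in code. Here is a short Python sketch that, given a finite set S of primes (the particular S below is just an illustrative choice), factors n = m + 1. Note that n itself need not be prime (for S = {2, 3, 5, 7, 11, 13}, n = 30031 = 59 · 509), but every prime factor of n lies outside S, exactly as the proof guarantees.

def prime_factors(n):
    # Trial division; returns the set of prime factors of n.
    factors = set()
    d = 2
    while d * d <= n:
        while n % d == 0:
            factors.add(d)
            n //= d
        d += 1
    if n > 1:
        factors.add(n)
    return factors

S = {2, 3, 5, 7, 11, 13}
m = 1
for p in S:
    m *= p
n = m + 1                    # 30031 = 59 * 509, not itself prime
print(prime_factors(n))      # {59, 509}: new primes
print(prime_factors(n) & S)  # set(): no prime in S divides n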
In this course, in the interest of time, I will often give the intuitive proof sketch only
verbally, and write only the formal details on the board. On your homework, however, you
should explicitly write both, to make it easy for the TA to understand your proof and give
you full credit.
Counting
A regular expression (regex) is an expression that matches some strings and not others. For
example
(0(0 ∪ 1 ∪ 2)∗ ) ∪ ((0 ∪ 2)∗ 1)
matches any string of digits that starts with a 0, followed by any number of 0’s, 1’s, and 2’s,
or ends with a 1, preceded by any number of 0’s and 2’s.
1. We could imagine trying "all" possible integer solutions, but if there is no integer solution, then we will be trying forever and the algorithm will not halt.
Figure 1.1: "Königsberg graph". Licensed under CC BY-SA 3.0 via Commons – https://upload.wikimedia.org/wikipedia/commons/5/5d/Konigsberg_bridges.png
If x is a binary string and a a symbol, let #(a, x) be the number of a’s in x. For example,
#(0, 001200121) = 4 and #(1, 001200121) = 3.
Task C: write a regex that matches a ternary string x exactly when #(0, x)+#(1, x) = 3.
Answer: 2∗ (0 ∪ 1)2∗ (0 ∪ 1)2∗ (0 ∪ 1)2∗
Task C′: write a regex that matches a ternary string x exactly when #(0, x) − #(1, x) = 3.

Fact: Task C′ is impossible.
Computability theory (unit 3): What problems can algorithms solve? (finding real roots of multivariate polynomials, but not integer roots)

Computational complexity theory (unit 2): What problems can algorithms solve efficiently? (finding paths visiting every edge, but not every node)

Automata theory (unit 1): What problems can algorithms solve with "optimal" efficiency (constant space and "real time", i.e., time = size of input)? (finding whether the sum of # of 0's and # of 1's equals some constant, but not the difference)
2. In algorithms and theory courses and research, to "demonstrate correctness" usually means a proof. In software engineering, it usually means unit tests and code reviews. For critical stuff like spacecraft software, it involves both.

3. The Diophantine equation problem above is a notable exception, but it took 70 years to prove it is unsolvable, and even then it was done by creating a Diophantine equation that essentially "mimics" the behavior of a program so as to connect the existence of its roots to a question about the behavior of certain programs.

4. Now, this sort of idea, of programs receiving and producing other programs, is not crazy in principle. The C compiler, for example, is itself a program, which takes as input the source code of a program (written in C) and outputs the code of another program (machine instructions, written in the "language" of the host machine's instruction set). The C preprocessor (which rewrites macros, for instance) takes as input C programs and produces C programs as output.
1.3.2 Sets
A set is a group of objects, called elements, with no duplicates.8 The cardinality of a set A is
the number of elements it contains, written |A|. For example, {7, 21, 57} is the set consisting
of the integers 7, 21, and 57, with cardinality 3.
For two sets A and B, we write A ⊆ B, and say that A is a subset of B, if every element
of A is also an element of B. A is a proper subset of B, written A ⊊ B, if A ⊆ B and A ≠ B.
We use the following sets throughout the course: the natural numbers N = {0, 1, 2, . . .}, the integers Z = {. . . , −2, −1, 0, 1, 2, . . .}, the rational numbers Q, and the real numbers R.
The unique set with no elements is called the empty set, written ∅.
To define sets symbolically,9 we use set-builder notation: for instance, { x ∈ N | x is odd }
is the set of all odd natural numbers.
5. e.g., "Hawaii is west of California", or "The stoplight is green."

6. e.g., "If the stoplight is green, then my car can go."

7. The contrapositive of a statement is logically equivalent to the statement itself. For example, it is equivalent to state "If someone is allowed to drink alcohol, then they are at least 21" and "If someone is under 21, then they are not allowed to drink alcohol". Hence a statement's converse and inverse are logically equivalent to each other, though not equivalent to the statement itself.

8. Think of std::set.

9. In other words, to express them without listing all of their elements explicitly, which is convenient for large finite sets and necessary for infinite sets.
We write ∀x ∈ A as a shorthand for “for all elements x in the set A ...”, and ∃x ∈ A
as a shorthand for “there exists an element x in the set A such that ...”. For example,
(∃n ∈ N) n > 10 means “there exists a natural number n such that n is greater than 10”.
Given two sets A and B, A ∪ B = { x | x ∈ A or x ∈ B } is the union of A and B, A ∩ B =
{ x | x ∈ A and x ∈ B } is the intersection of A and B, and A \ B = { x ∈ A | x ∉ B } is
the difference between A and B (also written A − B). Ā = { x | x ∉ A } is the complement
of A.10
Given a set A, P(A) = { S | S ⊆ A } is the power set of A, the set of all subsets of A.
For example,
P({2, 3, 5}) = {∅, {2}, {3}, {5}, {2, 3}, {2, 5}, {3, 5}, {2, 3, 5}}.
Given any set A, it always holds that ∅, A ∈ P(A), and that |P(A)| = 2^|A| if |A| < ∞.11 12
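These definitions map directly onto Python's built-in set type; the following sketch mirrors them, using itertools to enumerate the power set.

from itertools import combinations

A = {2, 3, 5}
B = {3, 5, 7}
print(A | B)  # union: {2, 3, 5, 7}
print(A & B)  # intersection: {3, 5}
print(A - B)  # difference: {2}

def power_set(s):
    # All subsets of s, smallest first, as frozensets (so they can
    # themselves be elements of a set).
    items = list(s)
    return [frozenset(c)
            for r in range(len(items) + 1)
            for c in combinations(items, r)]

print(len(power_set(A)))  # 8 == 2 ** len(A)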
1.3.3 Sequences and tuples

A sequence is an ordered list of objects.13 For example, (7, 21, 57, 21) is the sequence of
integers 7, then 21, then 57, then 21.
A tuple is a finite sequence.14 (7, 21, 57) is a 3-tuple. A 2-tuple is called a pair.
For two sets A and B, the cross product of A and B is A × B = { (a, b) | a ∈ A and b ∈ B }.
Note that |A × B| = |A| · |B|. For k ∈ N, we write A^k = A × A × . . . × A (k times) and
A^≤k = ⋃_{i=0}^{k} A^i. For example, N^2 = N × N is the set of all ordered pairs of natural numbers.
10. Usually, if A is understood to be a subset of some larger set U, the "universe" of possible elements, then Ā is understood to be U \ A. For example, if we are dealing only with N, and A ⊆ N, then Ā = { n ∈ N | n ∉ A }. In other words, we use "typed" sets, in which case each set we use has some unique superset – such as {0, 1}∗, N, R, Q, the set of all finite automata, etc. – that is considered to contain all the elements of the same type as the elements of the set we are discussing. Otherwise, we would have the awkward situation that for A ⊆ N, Ā would contain not only nonnegative integers that are not in A, but also negative integers, real numbers, strings, functions, stuffed animals, and other objects that are not elements of A.

11. Why?

12. Actually, Cantor's theory of infinite set cardinalities makes sense of the claim that |P(A)| = 2^|A| even if A is an infinite set. The furthest we will study this theory in this course is to observe that there are at least two infinite set cardinalities: that of the set of natural numbers, and that of the set of real numbers, which is bigger than that of the natural numbers according to this theory.

13. Think of std::vector.

14. The closest C++ analogy to a tuple, as we will use them in this course, is an object. Each member variable of an object is like an element of the tuple, although C++ is different in that each member variable of an object has a name, whereas the only way to distinguish one element of a tuple from another is their position. But when we use tuples, for instance to define a finite automaton as a 5-tuple, we intuitively think of the 5 elements as being like 5 member variables that would be used to define a finite automaton object. And of course, the natural way to implement such an object in C++ is by defining a FiniteAutomaton class with 5 member variables, which is an easier way to keep track of what each of the 5 elements is supposed to represent than, for instance, using a void* array of length 5.
1.3.4 Functions and relations

A function f : D → R associates to each element of its domain D an element of its range R.15 A partial function may be undefined on some inputs;16 a total function is defined on every input in D.17

If

(∀d1, d2 ∈ D) d1 ≠ d2 =⇒ f(d1) ≠ f(d2),

then we say f is 1-1 (one-to-one or injective).18
If
(∀r ∈ R)(∃d ∈ D) f (d) = r,
then we say f is onto (surjective). Intuitively, f “covers” the range R, in the sense that no
element of R is left un-mapped-to by f .
f is a bijection (a.k.a. a 1-1 correspondence) if f is both 1-1 and onto.
A predicate is a function whose output is boolean.
Given a set A, a relation R on A is a subset of A × A. Intuitively, the elements in R are
the ones related to each other. Relations are often written with an operator; for instance,
the relation ≤ on N is the set R = { (n, m) ∈ N × N | (∃k ∈ N) n + k = m }.
15. Most statically typed programming languages like C++ have direct support for functions with declared types for input and output. In Java, these are like static methods; Integer.parseInt, which takes a String and returns the int that the String represents (if it indeed represents an integer) is like a function with domain String and range int. Math.max is like a function with domain int × int (since it accepts a pair of ints as input) and range int. The main difference between functions in programming languages and those in mathematics is that in a programming language, a function is really an algorithm for computing the output, given the input, whereas in mathematics the function is just the abstract relationship between input and output, and there may not be any algorithm computing it.

16. For instance, Integer.parseInt is (strictly) partial, because not all Strings look like integers, and such Strings will cause the method to throw a NumberFormatException.

17. Every total function is a partial function, but the converse does not hold for any function that is undefined for at least one value. We will usually assume that functions are total unless explicitly stated otherwise.

18. Intuitively, f does not map any two points in D to the same point in R. It does not lose information; knowing an output r ∈ R suffices to identify the input d ∈ D that produced it (through f).
1.3.6 Combinatorics
Counting sizes of sets is a tricky art to learn. It’s a whole subfield of mathematics called
combinatorics, with very deep theorems, but we won’t need anything particularly deep in
this course. For simple counting problems, a few basic principles apply.
The first is sometimes called “The Product Rule”: If a set is defined by cross-product
(i.e., each element of a set is a tuple), then often multiplying the sizes of the smaller sets
works. For example, there are 4 integers in the set A = {1, 2, 3, 4} and 3 integers in the set
B = {1, 2, 3}. How many ways are there to pick one element from A and one from B? There
are 12 pairs of integers in the set A × B, because to choose a pair (a, b) ∈ A × B, there are
4 ways to choose a and 3 ways to choose b, so 4 · 3 ways to choose both. Here they all are:
1 2 3
1 (1, 1) (1, 2) (1, 3)
2 (2, 1) (2, 2) (2, 3)
3 (3, 1) (3, 2) (3, 3)
4 (4, 1) (4, 2) (4, 3)
If the tuple is bigger, you just keep multiplying: in the triple (a, b, c), if there are 4 ways
to choose a, 3 ways to choose b, and 7 ways to choose c, then there are 4 · 3 · 7 = 84 possible
triples (a, b, c).
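In Python, itertools.product enumerates cross products directly, so the Product Rule is easy to check by brute force; a quick sketch:

from itertools import product

A = [1, 2, 3, 4]
B = [1, 2, 3]
C = list(range(7))  # any set of 7 elements

print(len(list(product(A, B))))     # 12 == 4 * 3
print(len(list(product(A, B, C))))  # 84 == 4 * 3 * 7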
19. In fact, we will eventually see (Section 11.6.3) that such reasoning applies even to infinite sets. If O and B are both infinite, but there is no 1-1 function f : O → B, then any way of assigning objects from O to "boxes" in B must assign two objects to the same box. For example, we will see that if O is the set of all decision problems (defined formally in Section 2.2) we want to solve with algorithms (defined formally as Turing machines in Chapter 8), and B is the set of all algorithms, then |O| > |B|, so there is no 1-1 function f : O → B. So any way of assigning each decision problem uniquely to an algorithm that solves it must fail (by assigning two different problems to the same algorithm, which can't very well solve both of them). Thus some decision problems have no algorithm solving them.
A slightly trickier example is (where we analyze two different sub-cases): how many
4 digit numbers have their first digit within 1 of their last digit? (for example: 1230,
1231, 1232, 9009, 9008) Well, there’s still 900 ways to choose the first three digits, but
there’s now 3 ways to choose the last digit, unless the first digit is 9, and then there’s
only 2 ways to choose the last digit. This gives two sub-cases: the first digit is 9 or it
isn’t.
There are 800 ways to choose the first three digits with the first not equal to 9, times
3 ways to choose the last digit (800 · 3 = 2400). There are 100 ways to choose the first
three digits with the first equal to 9, times 2 ways to choose the last digit (100·2 = 200).
Combining the sub-cases, there are 2400 + 200 = 2600 such numbers.
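Case analyses like this are easy to get wrong, so it is worth checking the count by brute force; a short Python sketch:

count = sum(1 for n in range(1000, 10000)
            if abs(n // 1000 - n % 10) <= 1)  # first digit within 1 of last
print(count)  # 2600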
1.3.7 Graphs

See slides.

1.3.8 Boolean logic

See slides.
1.4 Proof by induction

1.4.1 Proof by induction on natural numbers

Theorem 1.4.1. For all n ∈ N, |{0, 1}^n| = 2^n.20

Proof.

Base case: {0, 1}^0 = {ε}.21 |{ε}| = 1 = 2^0, so the base case holds.

Inductive case: Assume |{0, 1}^{n−1}| = 2^{n−1}.22 We must prove that |{0, 1}^n| = 2^n. Note
that every x ∈ {0, 1}^{n−1} appears as a prefix of exactly two unique strings in {0, 1}^n,
namely x0 and x1. Thus |{0, 1}^n| = 2 · |{0, 1}^{n−1}| = 2 · 2^{n−1} = 2^n, so the inductive
case holds.
20. To start, state in English what the theorem is saying: For every string length n, there are 2^n strings of length n.

21. Note that {0, 1}^0 is not ∅; there is always one string of length 0, so the set of such strings is not empty.

22. Call this the inductive hypothesis, the fact we get to assume is true in proving the inductive case.
Of course, there are other (non-induction) ways to see that |{0, 1}^n| = 2^n. For example,
using the Product Rule for counting, we can say that there are 2 ways to choose the first
bit, times 2 ways to choose the second bit, ..., times 2 ways to choose the last bit, so
2 · 2 · . . . · 2 (n times) = 2^n ways to choose all of them. This is a sort of "iterative" reasoning that is
more cleanly (but also more verbosely and pedantically) captured by the inductive argument
above.
Theorem 1.4.2. For every n ∈ N+, ∑_{i=1}^{n} 1/(i(i+1)) = n/(n+1).

Proof. Base case (n = 1): ∑_{i=1}^{1} 1/(i(i+1)) = 1/(1·(1+1)) = 1/2 = n/(n+1), so the base case holds.
Inductive case: Let n ∈ N+ and suppose the theorem holds for n. Then

∑_{i=1}^{n+1} 1/(i(i+1))
  = 1/((n+1)(n+2)) + ∑_{i=1}^{n} 1/(i(i+1))    (pull out last term)
  = 1/((n+1)(n+2)) + n/(n+1)                   (inductive hypothesis)
  = (1 + n(n+2)) / ((n+1)(n+2))
  = (n^2 + 2n + 1) / ((n+1)(n+2))
  = (n+1)^2 / ((n+1)(n+2))
  = (n+1)/(n+2),

so the inductive case holds.
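The induction tells us the identity holds for all n, but it never hurts to sanity-check a few values; a short Python sketch using exact rational arithmetic:

from fractions import Fraction

for n in range(1, 20):
    total = sum(Fraction(1, i * (i + 1)) for i in range(1, n + 1))
    assert total == Fraction(n, n + 1)
print("identity verified for n = 1..19")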
24. We will do lots of proofs involving induction on strings, but for now we will just give an inductive definition. Get used to breaking down strings and other structures in this way.

25. In the case of strings, this is the empty string. In the case of trees, this could be the empty tree, or the tree with just one node: the root (just like with natural numbers, the base case might be 0 or 1, depending on the theorem).

26. The inductive step should then employ the truth of the theorem on some "smaller" object than the target object. In the case of strings, this is typically a substring, often a prefix, of the target string. In the case of trees, a subtree, typically a subtree of the root. Using smaller subtrees than the immediate subtrees of the root, or shorter substrings than a one-bit-shorter prefix, is like using a number smaller than n − 1 to prove the inductive case for n; this is the difference between weak induction (using the truth of the theorem on n − 1 to prove it for n) and strong induction (using the truth of the theorem on all m < n to prove it for n).
Part I
Automata Theory
Chapter 2
String theory
No, not that string theory. In this chapter we cover the basic theoretical definitions and
terminology used to talk about the kind of strings found as a data type in programming
languages: finite sequences of symbols.
Some people also call this language theory, which is an odd name. It was developed
at a time when connections were being discovered between linguistics and the theory of
computation. As such, many of the terms sound strange to a computer scientist, but make
sense if one considers using the theory to model natural languages. Modern-day natural
language processing is as much machine learning techniques as linguistics. However, the
most elegant application of this theory has been to the development of parsers and compilers
for artificial programming languages.
It is also sometimes the case that “language theory” refers more broadly to “automata
theory”, which is the subject of this whole part of the book, including finite automata, regular
expressions, and context-free grammars. But the data that those automata are designed to
process is strings, so we necessarily start the study with strings.
2.2 Definitions
An alphabet is any non-empty finite set, whose elements we call symbols (a.k.a., characters).
For example, {0, 1} is the binary alphabet, and
{a, b, c, d, e, f, g, h, i, j, k, l, m, n, o, p, q, r, s, t, u, v, w, x, y, z}
is the Roman alphabet. Symbols in the alphabet are usually single-character identifiers.
A string (a.k.a., word ) over an alphabet is a finite sequence of symbols taken from the
alphabet. We write strings such as 010001, without the parentheses and commas.2 If x is a
string, |x| denotes the length of x.
If Σ is an alphabet, the set of all strings over Σ is denoted Σ∗. For n ∈ N, Σ^n =
{ x ∈ Σ∗ | |x| = n } is the set of strings in Σ∗ of length n. Similarly Σ^≤n = { x ∈ Σ∗ | |x| ≤ n }
and Σ^<n = { x ∈ Σ∗ | |x| < n }.
The string of length 0 is written ε, and in some textbooks, λ. In most programming
languages it is written "".
Note in particular the difference between ε, ∅, and {ε}.3
1. Cryptography is arguably equally impactful on applications where security is a concern. However, compilers and interpreters are needed for all software, regardless of security. If it weren't for the compilers that enable high-level languages, we'd still be implementing cryptographic software—and all other software—in machine code.

2. If we used the standard sequence notation, 010001 would be written (0, 1, 0, 0, 0, 1).

3. ε is a string, ∅ is a set with no elements, and {ε} is a set with one element. The following Python code defines these three objects:

epsilon = ""
Given n, m ∈ N, x[n . . m] is the substring consisting of the nth through mth symbols of
x, and x[n] = x[n . . n] is the nth symbol in x, where x[1] is the first symbol.
We write xy (or x ◦ y when we would like an explicit operator symbol) to denote the
concatenation of x and y, and given k ∈ N, we write x^k = xx . . . x (k times).4 The reverse of x,
where |x| = n, is the string x^R = x[n]x[n−1] . . . x[2]x[1].
Given two strings x, y ∈ Σ∗ for some alphabet Σ, x is a prefix of y, written x ⊑ y, if x is
a substring that occurs at the start of y. x is a suffix of y if x is a substring that occurs at
the end of y.
The length-lexicographic ordering (a.k.a. military ordering) of strings over an alphabet
is the standard dictionary ordering, except that shorter strings precede longer strings.5 For
example, the length-lexicographical ordering of {0, 1}∗ is
ε, 0, 1, 00, 01, 10, 11, 000, 001, 010, 011, 100, 101, 110, 111, 0000, 0001, . . .
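Length-lexicographic order is easy to enumerate programmatically; here is a Python sketch that generates it for any alphabet whose symbols are given in dictionary order:

from itertools import count, product

def length_lex(alphabet):
    # Yield all strings over alphabet, shorter strings first,
    # strings of equal length in dictionary order.
    for n in count(0):
        for symbols in product(alphabet, repeat=n):
            yield "".join(symbols)

gen = length_lex("01")
print([next(gen) for _ in range(8)])
# ['', '0', '1', '00', '01', '10', '11', '000']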
A language (a.k.a. a decision problem) is a set of strings. A class is a set of languages.
We’ve been speeding through terminology, but it’s worth pausing on these definitions for
a moment. We already explained that “language” is an archaic term left over from linguistic
theory. But why do we say that a set of strings is also a “decision problem”? A decision
problem is a computational problem with a yes/no answer. In other words, it’s the kind of
problem you are solving when you write a programming language function with a Boolean
return type. Given some input to the function, the function answers “yes” or “no”. Is a
given integer prime or not? Is a given string a properly formatted HTML file? Is a given
list of integers x, y, z, n a counterexample x^n + y^n = z^n to Fermat's Last Theorem? These
are all decision problems you can solve by writing an appropriate function in your favorite
programming language.
But what’s the input to the function? Depending on the programming language, it could
be many arguments, and they could all be different types such as integers, floating point
numbers, lists of strings, etc. However, every one of these objects, somewhere in memory, is
just a sequence of bits. In other words, at a lower level of abstraction, every one of those
empty_set = set()
set_with_epsilon = {""}

In Java, this is

String epsilon = "";
Set emptySet = new HashSet();
Set<String> setWithEpsilon = new HashSet<String>();
setWithEpsilon.add(epsilon);

In C++, this corresponds to (assuming we've imported everything from the standard library std)

string epsilon("");
set<string> empty_set;
set<string> set_with_epsilon;
set_with_epsilon.insert(epsilon);
4. Alternatively, define x^k inductively as x^0 = ε and x^k = x x^{k−1}.

5. It is also common to call this simply the "lexicographical ordering". However, this term is ambiguous and is also used to mean the standard alphabetical ordering. In length-lexicographical order, 1 < 00, but in alphabetical order, 00 < 1. We use the term length-lexicographical to avoid confusion.
objects is simply a finite binary string, whose bits are interpreted in a special way. And even
if there are many arguments, when they are passed into the function as input, through some
mechanism, they are combined into a single string (for instance, with delimiters between
them to mark the boundaries).6 So even though it’s not as convenient for programming, it is
more convenient for mathematical simplicity to simply say that the input to every decision
problem is a single finite string. And since there’s only two possible answers, if we know the
subset of strings for which the correct answer is “yes”, then we also know the correct answer
for every string.
Thus, the set of input strings (i.e., the language) for which the answer is “yes” is the
mathematical object defining the decision problem itself. More formally, L ⊆ Σ∗ defines the
decision problem: on input x, if x ∈ L then output “yes”, and if x 6∈ L then output “no”.
For example, the decision problem of determining whether an integer is prime is represented
by the subset P ⊂ {0, 1}∗ of binary strings that encode prime integers in binary:

P = { x ∈ {0, 1}∗ | x is the binary representation of a prime number }.
Now, there are other kinds of computational problems worth solving, where the output is not
Boolean. A solution to a search problem, rather than being merely one bit, is a whole string.
For example, given a set of linear equations, find a solution. An optimization problem is a
special kind of search problem where different solutions have different quantitative “values”
and we want to minimize or maximize the value. For example, given a list of airline ticket
prices between cities, find the cheapest sequence of flights that visits each city once. It seems
like a major restriction to study only decision problems and not these more general types of
problems. We will talk in more detail about these kinds of problems in Chapter 9, and we
will see that in some senses, we can learn a lot about computational problems generally just
by restricting attention to decision problems.
We said that a class is a set of languages. This term “class” sounds like just another
noun. However, these terms are useful because, without them, we would just call everything
a “set”, and easily forget whether it is a set of strings, a set of sets of strings, or even the
dreaded set of sets of sets of strings (they are out there; the arithmetical and polynomial
hierarchies are sets of classes).
Why would we want to think about sets of decision problems, instead of just one at
a time? Generally, we will define some model of computation, such as finite automata, or
Turing machines, or polynomial-time Turing machines. These are sets of “types of computing
machines” or “programs” in some programming language. Each machine has some notion
of giving a Boolean output for every input, so each machine defines some decision problem
that it solves. So we consider classes of decision problems when we want to talk about a
6. Technically this would only be true for pass-by-value. If the language is pass-by-reference, the references/pointers would be concatenated into a single string because they are passed into the function by being pushed sequentially onto the call stack. However, we would still think of the input to the function as including the data in memory to which those pointers are pointing, which may not be contiguous. So it would be more accurate to say that we can conceptually think of the input as something that could easily be concatenated into a single string, even if it is not.
concept such as the decision problems solvable by finite automata (these are also called the
class of regular languages) or the decision problems solvable by Turing machines (these are
also called the class of decidable languages) or the decision problems solvable in polynomial
time by Turing machines (this famous class has a name: P).
Given two languages A, B ⊆ Σ∗, let AB = { ab | a ∈ A and b ∈ B } (also denoted
A ◦ B).7 Similarly, for all n ∈ N, A^n = AA . . . A (n times),8 A^≤n = A^0 ∪ A^1 ∪ . . . ∪ A^n, and
A^<n = A^≤n \ A^n.9

Given a language A, let A∗ = ⋃_{n=0}^{∞} A^n.10 Note that A = A^1 (hence A ⊆ A∗).
Examples. Define the languages A, B ⊆ {0, 1, 2}∗ as follows: A = {0, 11, 222} and B =
{000, 11, 2}. Then
AB = {0000, 011, 02, 11000, 1111, 112, 222000, 22211, 2222}
A^2 = {00, 011, 0222, 110, 1111, 11222, 2220, 22211, 222222}
A∗ = {ε, 0, 11, 222, 00, 011, 0222, 110, 1111, 11222, 2220, 22211, 222222, 000, 0011, . . .},
where ε comes from A^0; the strings 0, 11, 222 from A^1; the next nine strings from A^2; and
000, 0011, . . . begin the part from A^3.
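For finite languages these operations are directly computable; a short Python sketch reproducing the example above:

def concat(A, B):
    # Concatenation AB of two finite languages.
    return {a + b for a in A for b in B}

A = {"0", "11", "222"}
B = {"000", "11", "2"}
print(sorted(concat(A, B)))  # the 9 strings of AB listed above
print(sorted(concat(A, A)))  # the 9 strings of A^2 listed above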
end of lecture 1a
We start with a very restricted model of computation, which (like many models we study)
takes a string input and produces a Boolean output, called (deterministic) finite automata.
This will be our first example of how to formally model computation with rigorously defined
mathematical objects such as tuples, sets, and functions. But, it is not merely a toy model
for educational purposes; it is among the models most frequently encountered outside of
theoretical computer science. String matching algorithms that implement the “find” feature
in text editors and displays are based on finite automata. Syntax highlighting engines in text
editors are often specified using finite automata. Many common compression algorithms are
implemented as finite automata. A common use is as a lexer: the first step of source code
compilation that transforms the raw text into “tokens” such as variable identifiers, numeric
literals, comments, string literals, etc.
• accept states: b
• reject states: a, c
1. Start in state a
D1 also accepts 1, 01, and 010101. In fact all transitions labeled 1 go to b, so it accepts
any string ending in a 1.
Are those all the strings it accepts?
D1 also accepts 100, 0100, 110000, and any string that has at least one 1 and ends with
an even number of 0’s following the last 1.
For example, I can say that a C++ program is any ASCII string that the gcc compiler can
compile with no errors: that’s the definition of the syntax of C++.1 But knowing that gcc
successfully compiled a program does little to help you understand what will happen when
you actually run the program. The definition of the semantics of C++ is more involved,
telling you what each line of code does: what effect does it have on the memory of the
computer?
And C++ semantics are complex indeed. For example, a = b+c; has the following
semantics:
add the integer stored in b to the integer stored in c and store the result in a...
unless one of b or c is a float or a double, in which case use floating point
arithmetic instead, and if the other argument is an int it is first converted to a
double... unless b is actually an instance of a class that overloads the + operator,
in which case call the member function associated with that operator...
And so on. Hopefully this makes it clear why, to learn how to reason formally about computer
programs, we start with a much simpler model of computation than C++ or Python, one
whose syntax and semantics definitions fit on a page.
• Q = {a, b, c},
1. Of course, that doesn't do much to help you write syntactically valid C++ programs. Another more useful way to define it is that it is any ASCII string that is produced by the C++ grammar (http://www.nongnu.org/hcb/). We will cover grammars briefly later in the course, and you will understand better what it means for a string to be produced by a grammar.
• Σ = {0, 1},
• s = a,
• F = {b}, and
• δ is defined
δ(a, 0) = a,
δ(a, 1) = b,
δ(b, 0) = c,
δ(b, 1) = b,
δ(c, 0) = b,
δ(c, 1) = b,
or more succinctly, we represent δ by the transition table
δ 0 1
a a b
b c b
c b b
The state transition diagram and this formal description contain exactly the same informa-
tion. The diagram is easy for humans to read, and the formal description is easy to work
with mathematically, and to program.
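To make "easy to program" concrete, here is a minimal Python sketch of D1, storing δ as a dictionary keyed by (state, symbol) pairs:

delta = {('a', '0'): 'a', ('a', '1'): 'b',
         ('b', '0'): 'c', ('b', '1'): 'b',
         ('c', '0'): 'b', ('c', '1'): 'b'}
start, accepting = 'a', {'b'}

def accepts(x):
    # Run D1 on input string x; return True iff D1 accepts x.
    q = start
    for symbol in x:
        q = delta[(q, symbol)]
    return q in accepting

print([w for w in ['1', '01', '100', '0100', '110000', '10'] if accepts(w)])
# ['1', '01', '100', '0100', '110000']: 10 is rejected, since an odd
# number of 0's follow its last 1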
If M = (Q, Σ, δ, s, F ) is a DFA, how many transitions are in δ; in other words, how many
entries are in the transition table?
If A ⊆ Σ∗ is the set of all strings that M accepts, we say that M decides A, and we write
L(M ) = A.
Every DFA decides a language. If it accepts no strings, what language does it decide? ∅
D1 decides the language

L(D1) = { w ∈ {0, 1}∗ | w contains at least one 1 and an even number of 0's follow the last 1 }.
Note the terminology here: M decides a single language L(M ), but for each string x ∈ Σ∗ ,
M either accepts or rejects x. M does not decide a string, nor does it accept or reject a
language; this misuse of terms mixes up the types.
Example 3.3.3. See Figure 3.2.
Formally, D2 = (Q, Σ, δ, s, F), where Q = {a, b}, Σ = {0, 1}, s = a, F = {b}, and δ is
defined
δ 0 1
a a b
b a b
What does D2 do on input 1101?
Figure 3.2: A DFA D2. L(D2) = { w ∈ {0, 1}∗ | w ends in a 1 } = {1, 01, 11, 001, 011, 101, 111, 0001, . . .}
Example 3.3.4. Unless we want to talk about the individual states, when specifying a state
diagram, it is not necessary to actually give names to the states in the diagram; the start
arrow and transition arrows tell us the whole structure of the DFA. Figure 3.3 shows such
a state diagram.

Figure 3.3: A DFA D3. L(D3) = { w ∈ {0, 1}∗ | w does not end in a 1 }
end of lecture 1b
Figure 3.4: A DFA deciding whether its input string's length is a multiple of 3. Here we give state
names that are meaningful: each represents the remainder left when dividing by 3 the number of
symbols read so far.
Figure 3.5: Design of a DFA to decide multiples of 3 in binary. If a natural number n has b ∈ {0, 1} appended
to its binary expansion, then this number is 2n + b. Top: a DFA with transitions to implement "when in state k
after reading bits representing n, if n ≡ k mod 3, then after reading bit b, so that the bits read so far now represent
2n + b, change to the state k′ such that 2n + b ≡ k′ mod 3". This assumes ε represents 0, and that leading 0's are
allowed, e.g., 5 is represented by 101, 0101, 00101, 000101, . . . Bottom: a modification of the top DFA to reject ε and
any positive integers with leading 0's.
Intuitively, we can keep track of whether the bits read so far represent an integer n ∈ N
that is of the form n = 3k for some k ∈ N (i.e., n is a multiple of 3, or n ≡ 0 mod 3),
n = 3k + 1 (n ≡ 1 mod 3), or n = 3k + 2 (n ≡ 2 mod 3). Appending the bit 0 to the end of
n’s binary expansion multiplies n by 2, resulting in 2n, whereas appending the bit 1 results
in 2n + 1.
For example, consider appending a 1 to a multiple of 3. Since n = 3k, then 2n + 1 =
2(3k) + 1 = 3(2k) + 1 ≡ 1 mod 3, so the transition function δ obeys δ(0, 1) = 1.
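The rule "appending bit b sends n to 2n + b" means the whole DFA collapses to the update k ← (2k + b) mod 3; a Python sketch of the top DFA of Figure 3.5 (the version that allows leading 0's):

def multiple_of_3(bits):
    # Decide whether the binary string `bits` represents a multiple of 3.
    k = 0  # state: the value read so far, mod 3 (ε represents 0)
    for b in bits:
        k = (2 * k + int(b)) % 3
    return k == 0

for x in ['0', '11', '110', '1001', '101']:
    print(x, multiple_of_3(x))  # True for 0, 3, 6, 9; False for 5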
end of lecture 1c
2. This is a bit inexact, as I haven't said what "memory" or "steps" are for general algorithms. But the meaning is clear for DFAs: memory is the set of states and steps correspond to individual transitions. Other reasonable ways of defining space (memory) and time (number of steps) for other models of computation lead to models with equivalent computational power. In fact, the time restriction doesn't even matter: if you use a constant amount of memory and any amount of time, only regular decision problems can be solved, although proving that is beyond the scope of this course. So for example, if you write a Python function returning a Boolean, and it uses no lists or sets or other unbounded data structures, and if it uses no recursion (a way to use unbounded memory from the function call stack), then the decision problem it is solving is regular, i.e., also solvable by a DFA.
Chapter 4

Declarative models of computation
Using programming language terminology, DFAs are an imperative language, much like C++
or Python: they represent a sequence of instructions telling the machine what to do, based
on what input is read. This chapter introduces three models of computation: regular expres-
sions, grammars, and non-deterministic finite automata. In contrast to DFAs, these models
are declarative: each describes some language, but the model itself gives no indication of
instructions for how to “execute” an instance of the model.1 It’s not obvious how to create
an algorithm deciding a language described by a grammar or regular expression. (Although
such algorithms do exist.) Non-deterministic finite automata are closer to DFAs in the sense
that they are interpreted to start in some state and transition from state to state based on
reading input symbols, but there is a special ability to make non-deterministic choices, and
it’s not clear how one would implement this ability in a real machine (but we will see how).
4.1.3 Conventions
We sometimes abuse notation and write R to mean L(R), relying on context to interpret the
meaning.
Optionally, we may omit the parentheses in some cases. The operators have precedence
∗ > ◦ > ∪. For example, 11 ∪ 01∗ is equivalent to (1)(1) ∪ ((0)(1)∗ ).3
3. This is similar to why the arithmetic expression 3 × 7 + 4 × 5^6 is equivalent to (3) × (7) + ((4) × (5)^6), but not to ((3 × 7) + 4) × (5^6) or (3 × ((7 + 4) × 5))^6.
For convenience, define R⁺ = RR∗; for each k ∈ N, let R^k = RR . . . R (k times); and given an
alphabet Σ = {a1, a2, . . . , ak}, Σ is shorthand for the regex a1 ∪ a2 ∪ . . . ∪ ak.
The cases ε and ∅. Base case (1c) R = ∅ is required for the technical reason that, if instead
it were omitted, then the easiest decision problem there is, “just say no” (i.e., the language
∅) would not be regex-decidable. However, the case R = ∅ is rarely used in practice. The
regex simulator (http://web.cs.ucdavis.edu/~doty/automata/) has no way to specify it.
Similarly, base case (1b) R = ε is rarely used in practice. However, for many of the proofs
we do in this course, it will be convenient to have ε as its own special case. It is specified in
the simulator by the simple absence of symbols. For example, typing nothing corresponds
to the regex ε, |ab is the regex ε ∪ ab, and cd*||ab is the regex cd∗ ∪ ε ∪ ab.
Regex trees. Although a regex is merely a string, a more structured way to think of a
regex, based on the inductive definition, is as having a tree structure, similar to the parse
tree for an arithmetic expression such as (7 + 3) · 9 + 11 · 5. The base cases are leaves, and the
recursive cases are internal nodes, with one child in the case of the unary operation ∗ and
two children in the case of the binary operations ◦ and ∪. See Figure 4.1 for an example.
Figure 4.1: Recursive structure of the regular expression ab∗ ∪ (a ∪ b)(a∗b ∪ bbab) viewed as a tree.
4.1.4 Examples
Let Σ be an alphabet.
• 0 (0∪1)∗ 0 ∪ 1 (0∪1)∗ 1 ∪ 0 ∪ 1 = { w ∈ {0, 1}∗ | w starts and ends with the same symbol }
• (0 ∪ ε)1∗ = 01∗ ∪ 1∗ .
skip in lecture

• ∅∗ = {ε}.4
There is an algebra of regular expression operations that can be helpful to thinking about
when manipulating or simplifying them. One can think of ∪ analogously to addition and
◦ analogously to multiplication. The distributive law holds: A(B ∪ C) is equivalent to
AB ∪ AC.5 In other words, having “a string matching A, followed by one matching either B
or C” is equivalent to having “either a string matching A followed by one matching B, or a
string matching A followed by one matching C”. One can think of ∅ as the additive identity
(which is why R ∪ ∅ = R and R∅ = ∅, analogously to x + 0 = x and x · 0 = 0), and one can
think of ε as the multiplicative identity (which is why Rε = R, analogously to x · 1 = x).
4. The identity ∅∗ = {ε} is more a convention than something that can be derived from the definitions. This is similar to the convention for 0^0 (the number zero raised to the power zero). Is 0^0 = 0, because 0 times any number is 0? Or is 0^0 = 1, because anything raised to the power 0 is 1? It's not particularly well defined, so we just say 0^0 = 1 by convention. Similarly, it's not clear whether ∅∗ = ∅ because ∅ concatenated to anything is ∅, or whether ∅∗ contains ε because ε ∈ A∗ for all A. Like in the case of 0^0, we just choose a convention and say that ∅∗ = {ε}. In practice, this doesn't come up much.

5. As with addition and multiplication, the distributive law fails if we swap the operations: A ∪ BC is not equivalent to (A ∪ B)(A ∪ C).
Example 4.1.2. Design a regex to match C++ double literals. For example, the following
are valid:
2.0, 3.14, .02, 3., 0.02, -.02, +4.5, 0.00
The following are not valid:
., +, +-2.0, 00.0, 2 (that’s an int, not a double)
Let P = 1 ∪ . . . ∪ 9 be a regex deciding a single positive decimal digit. Let D = 0 ∪ P .
(+ ∪ − ∪ ε)(.D⁺ ∪ (PD∗ ∪ 0).D∗)
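This regex translates almost symbol-for-symbol into Python's re syntax (∪ becomes |, and the literal dot must be escaped); a sketch checking it against the examples above:

import re

# (+ ∪ − ∪ ε)(.D+ ∪ (PD* ∪ 0).D*), with P = [1-9] and D = [0-9]
double_literal = re.compile(r'[+-]?(\.[0-9]+|([1-9][0-9]*|0)\.[0-9]*)')

valid = ['2.0', '3.14', '.02', '3.', '0.02', '-.02', '+4.5', '0.00']
invalid = ['.', '+', '+-2.0', '00.0', '2']
assert all(double_literal.fullmatch(s) for s in valid)
assert not any(double_literal.fullmatch(s) for s in invalid)
print("all examples classified correctly")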
end of lecture 2a
4.2 Context-free grammars

Consider the following example of a context-free grammar (CFG):

S → AB
A → 0A
A → ε
B → 1B
B → ε
A grammar consists of rules (a.k.a., productions), one on each line. The single symbol on the
left is a variable, and the string on the right consists of variables and other symbols called
terminals. One variable is designated as the start variable, on the left of the topmost rule.
The fact that the left side of each production has a single variable (instead of a multiple-
symbol string) means the grammar is context-free. We abbreviate two rules with the same
left-hand variable as follows: A → 0A | ε as a shorthand for the two rules A → 0A and
A → ε.
S, A, and B are variables, and 0 and 1 are terminals.
The idea is that we start with a single copy of the start variable. We pick a rule with
the start variable and replace the variable with the string on the right side of the rule. This
gives a string mixing variables and terminals. We pick some variable in it, pick a rule with
that variable on the left side, and again replace the variable with the right side of the rule.
Do this until the string is all terminal symbols.
What is the set of strings producible with the rules of the example grammar above?
There’s no choice how to start: S becomes AB, and we write this as S ⇒ AB.
36 CHAPTER 4. DECLARATIVE MODELS OF COMPUTATION
Now, we can replace either A or B; let’s pick A. We can replace it with either 0A or ε; let’s
pick 0A. The B remains unchanged as we do this substitution. Then AB ⇒ 0AB. We could
do this rule again: 0AB ⇒ 00AB. We could now replace B with 1B, so 00AB ⇒ 00A1B.
Do that a few more times: 00A1B ⇒ 00A11B ⇒ 00A111B ⇒ 00A1111B ⇒ 00A11111B.
We could go back to substituting for A: 00A11111B ⇒ 000A11111B. Now, we could
substitute A with the other rule A → ε, so 000A11111B ⇒ 00011111B. We could do the
same for B, and now there’s no more variables left to substitute: 00011111B ⇒ 00011111.
So we’ve produced string 00011111. What other strings are producible?
Example 4.2.1. Write a grammar generating the language {0n 1n | n ∈ N}.
S → 0S1 | ε
Example 4.2.2. Write a grammar generating the language of properly nested parentheses.
Examples of properly nested parentheses, where each opening ( is matched together with a
closing ) to its right:

• ()

• ()()

• (())

• ()(()())(()(()))
Examples of not properly nested parentheses:

• )(: cannot close parentheses without an opening one to the left.

• ((): the first opening parenthesis is never closed.

• ()): the last closing parenthesis doesn't match one that was opened.
Hint: the following rule describes properly nested parentheses: if two strings have prop-
erly nested parentheses, then so does their concatenation. Also, if x has properly nested
parentheses, then surrounding it with parentheses keeps it properly nested.
S → (S) | SS | ε
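The grammar is declarative, but for this particular language there is also a simple imperative test, handy for checking examples; a Python sketch (track nesting depth, which must never go negative and must end at zero):

def properly_nested(s):
    depth = 0
    for c in s:
        depth += 1 if c == '(' else -1
        if depth < 0:      # a ) with no matching ( to its left, as in )(
            return False
    return depth == 0      # False if some ( was never closed, as in (()

for s in ['()', '()()', '(())', ')(', '(()', '())']:
    print(s, properly_nested(s))  # True, True, True, False, False, False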
Example 4.2.3. Consider the grammar G with start variable S, terminals 0, 1, and :, and the rules S → 0S1 | :. One derivation is S ⇒ 0S1 ⇒ 00S11 ⇒ 000S111 ⇒ 000:111. We also say that the grammar accepts or produces the string 000:111.
For the balanced parentheses grammar S → (S) | SS | ε, a derivation of (()(())) is

S ⇒ (S) ⇒ (SS) ⇒ ((S)S) ⇒ (()S) ⇒ (()(S)) ⇒ (()((S))) ⇒ (()(()))

We also may represent this derivation with a parse tree, shown in Figure 4.2.
The set of strings accepted by a CFG G is denoted L(G), and we say G decides (a.k.a.,
generates) L(G).
Given the example CFG G above, its language is L(G) = { 0^n:1^n | n ∈ N }. A language
generated by a CFG is called context-free or CFG-decidable.6
Example 4.2.4. This is a small portion of the grammar for the Python programming
language (https://docs.python.org/3/reference/grammar.html). The rules use a : in-
stead of →, and they allow some syntactic sugar such as regex operators, but the grammar
could in principle be specified using only the syntax as defined above.
6. The common term is "context-free". We use the phrase "CFG-decidable" to be consistent with our convention of naming language classes after a model of computation defining them.
Figure 4.2: A parse tree for a derivation of (()(())). Each internal node represents a variable
to which a production rule was applied; the children represent the variable(s) and terminal(s) on
the right side of the rule. Concatenating the leaves in order from left to right gives the produced
string.
end of lecture 2b
4.3 Nondeterministic finite automata (NFAs)

Figure 4.3: A nondeterministic finite automaton (NFA) called N1. Differences with a DFA are
highlighted in red: 1) There are two 1-transitions leaving state q0, 2) There is an "ε-transition"
leaving state s, 3) There is no 1-transition leaving states s, r0, or q2 and no 0-transition leaving
states r1 or q2. Normally "missing" transitions are not shown in NFAs, but we highlight them here
to emphasize the difference with a DFA.
• An NFA state may have any number (including 0) of transition arrows out of a state,
for the same input symbol.7
• An NFA may change states without reading any input (an ε-transition).
• If there is a series of choices (when there is a choice) to reach an accept state after
reading the whole string, then the NFA accepts. Otherwise, the NFA rejects.8
Example 4.3.1. Design an NFA to decide the language { x ∈ {0, 1}∗ | x[|x| − 2] = 1 }.

See Figure 4.4. This uses nondeterministic transitions (but no ε-transitions) to "guess"
when the string is 3 bits from the end.

Figure 4.4: An NFA N2 deciding the language { x ∈ {0, 1}∗ | x[|x| − 2] = 1 }.
When defining ∆, we assume that if for some q ∈ Q and b ∈ Σε , ∆(q, b) is not explicitly
defined, then ∆(q, b) = ∅.
Example 4.3.3. Recall the NFA N1 in Figure 4.3. Formally, N1 = (Q, Σ, ∆, s, F ), where
• Q = {s, r0 , r1 , q0 , q1 , q2 },
• Σ = {0, 1},
• the start state is s,
• F = {q2, r0, r1}, and
• ∆ is defined
∆ 0 1 ε
s {q0 } ∅ {r0 }
r0 {r1 } ∅ ∅
r1 ∅ {r0 } ∅
q0 {q0 } {q0 , q1 } ∅
q1 {q2 } {q2 } ∅
q2 ∅ ∅ ∅
The set of states N1 could possibly be in after reading each of the following strings is:

ε: s, r0          10: ∅
0: q0, r1         11: ∅
1: ∅              000: q0
00: q0            010: q0, q2, r1
01: q0, q1, r0    011: q0, q1, q2
N1 accepts a string if any of the states it could be in after reading the string are accepting;
it’s okay for some of them to be rejecting as long as at least one is accepting. Thus, N1
accepts ε, 0, 01, 010, 011 and rejects 1, 00, 10, 11, 000.
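This "set of possible states" bookkeeping is exactly how one simulates an NFA deterministically, and it previews the subset construction of Chapter 6. A Python sketch of N1, with ∆ as a dictionary mapping (state, symbol) pairs to sets of states (the empty string '' plays the role of ε):

DELTA = {('s', '0'): {'q0'}, ('s', ''): {'r0'},
         ('r0', '0'): {'r1'}, ('r1', '1'): {'r0'},
         ('q0', '0'): {'q0'}, ('q0', '1'): {'q0', 'q1'},
         ('q1', '0'): {'q2'}, ('q1', '1'): {'q2'}}
ACCEPT = {'q2', 'r0', 'r1'}

def eps_closure(states):
    # All states reachable from `states` by following ε-transitions.
    stack, closure = list(states), set(states)
    while stack:
        q = stack.pop()
        for r in DELTA.get((q, ''), set()) - closure:
            closure.add(r)
            stack.append(r)
    return closure

def accepts(x):
    current = eps_closure({'s'})  # set of states N1 could be in
    for symbol in x:
        moved = set()
        for q in current:
            moved |= DELTA.get((q, symbol), set())
        current = eps_closure(moved)
    return bool(current & ACCEPT)  # accept iff some possible state accepts

print([w for w in ['', '0', '01', '010', '011', '1', '00', '10', '11', '000']
       if accepts(w)])
# ['', '0', '01', '010', '011'], matching the table above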
Formally, an NFA N = (Q, Σ, ∆, s, F) accepts a string x ∈ Σ∗ if there exist m ∈ N, a
sequence y1, y2, . . . , ym of symbols in Σ ∪ {ε}, and a sequence q0, q1, . . . , qm of states such that

1. x = y1 y2 . . . ym,

2. q0 = s,

3. qi ∈ ∆(qi−1, yi) for each i ∈ {1, . . . , m}, and

4. qm ∈ F.

The two sequences together uniquely identify which transition arrows were followed. We refer
to either the sequence of transition arrows, or sometimes the sequence of states followed, as
a computation sequence of N on x.9
Figure 4.5: An NFA N3 deciding the language { x ∈ {0, 1}∗ | x[|x| − 2] = 1 }.
9. Unlike a DFA, even if we know x and q0, . . . , qm, we cannot uniquely identify which transitions were followed, since it may be possible to take ε-transitions at different points while reading x, yet follow the same sequence of states. (Example?)
Example 4.3.4. Figure 4.5 shows another NFA deciding if the third-to-last bit is 1, but using
an ε-transition. This example shows that unlike a DFA, the sequence of states q0 , q1 , . . . , qm
in the definition of NFA acceptance of a string of length n can be longer than n + 1. For
example, the string 0100 has y1 = 0, y2 = ε, y3 = 1, y4 = 0, y5 = 0 and q0 = a, q1 = a, q2 =
b, q3 = c, q4 = d, q5 = e.
Note that if N is actually deterministic (i.e., no ε-transitions and |∆(q, b)| = 1 for all
q ∈ Q and b ∈ Σ), then the sequence of states leading to an accept state on a given input
is unique, and it is the same sequence of states as in the definition of DFA acceptance, with
n + 1 states to accept a string of length n.
Compared to DFAs, with NFAs, strings appear
• easier to accept, but
• harder to reject.10
We won’t spend time practicing NFA design right now as we did with DFAs. The utility
of the non-determinism “feature” of NFAs will become more apparent in Chapter 5.
end of lecture 2c
10. The trick with NFAs is that it becomes (apparently) easier to accept a string, since multiple paths through the NFA could lead to an accept state, and only one must do so in order to accept. But NFAs aren't magic; you can't simply put accept states and ε-transitions everywhere and claim that there exist paths doing what you want. By the same token that makes acceptance easier, rejection becomes more difficult, because you must ensure that, if the string ought to be rejected, then no path leads to an accept state. Therefore, the more transitions and accept states you throw in to make accepting easier, that much more difficult does it become to design the NFA to properly reject. The key difference is that the condition "there exists a path to an accept state" becomes, when we negate it to define rejection, "all paths lead to a reject state". It is more difficult to verify a "for all" claim than a "there exists" claim.
Chapter 5

Closure of language classes under set operations
So far we have read and “programmed” DFAs, CFGs, regex’s, and NFAs that decide par-
ticular languages. We now study fundamental properties shared by all languages decided
by these models. This will involve thinking at a higher level of abstraction than we’ve done
so far. Rather than starting with a concrete problem we want to solve, and designing, for
example, a DFA for it, we start with an abstract DFA D given to us by someone else. We
don’t know what D looks like, other than it is a valid DFA deciding some language (we don’t
know the language, but we know it’s called L(D)). We have to change it into a second DFA
D′, for instance deciding the complement of L(D).
This might be awkward to think about, but all we are doing is describing an algorithm
that operates on DFAs. Just as with a sorting algorithm that sorts lists, you can’t make
any assumptions about the list. It could be any list, with any number of elements, in any
order, and the algorithm has to work on all of them. That’s the same idea we will apply
in this chapter: we will describe algorithms that take an automaton as input and produce
another automaton as output. We sometimes call these algorithms “constructions” because
the output of the algorithm is an automaton like a DFA or a regex that has been constructed.
We’re also not going to implement the algorithms in code, although you could. In fact, the
autograders implement all of them in code!
At this point you may understand the meaning of the above observation, but not under-
stand why it’s worth thinking about. So what if the DFA-decidable languages are closed
under complement? Who cares? If we think about it from the point of view of solving
problems, understanding these closure properties is often the key to breaking the problem
down and solving it.
The problem you are trying to solve, i.e., the language L you are trying to decide, may
not be obviously decidable by any DFA. But if you can spot that L is simply the complement
of another DFA-decidable language, then you know immediately that L is DFA-decidable.
It goes the other way too: in Chapter 7 we will show that certain languages are not DFA-
decidable. The techniques for proving such impossibility results are quite different from
showing that a language is DFA-decidable. So you may be asked to show L is not DFA-decidable, and it's not obvious how to do it. But if you have already shown that the complement of L is not DFA-decidable, then you can immediately conclude that L cannot be DFA-decidable either, because otherwise its complement would be also, by closure under complement.
Let’s think about the other common set-theoretic operations we have for languages:
union, intersection (which are applicable to any kind of set, including languages), concatenation, and Kleene star (which are only applicable to languages). Let A and B be languages.
Union: A ∪ B = { x | x ∈ A or x ∈ B }
Intersection: A ∩ B = { x | x ∈ A and x ∈ B }
Concatenation: A ◦ B = { xy | x ∈ A and y ∈ B }
(Kleene) Star: A∗ = ⋃_{n=0}^{∞} A^n = { x1 x2 . . . xk | k ∈ N and x1, x2, . . . , xk ∈ A }
By the end of this chapter and the next, we will have shown not only that the DFA-decidable languages are closed under these operations, but that so are languages decided by a regex, NFA, or RRG, because all of these models define exactly the same class of languages.
Let’s think about other models of computation besides DFAs. Is it obvious that NFA-
decidable languages are closed under complement? No it isn’t. You might be tempted to
swap the accept and reject states, but that won’t work! (Homework exercise.)
What about regex-decidability? Recall in the section on regex’s, we wrote a regex match-
ing strings that represent C++ double literals (for P = 1 ∪ 2 ∪ . . . ∪ 9 and D = 0 ∪ P ):
(+ ∪ − ∪ ε)(.D+ ∪ (P D∗ ∪ 0).D∗ )
Now consider this task: write a regex matching the complement of this language, i.e., letting
D = {0, 1, . . . , 9}, all strings over D ∪ {+, −, .} that are not C++ double literals.
It’s not obvious how to do it, right? It is true, in fact: the complement of any regex-
decidable language is also regex-decidable. But it’s not clear why, right now. For some
models of computation it is not even true. For example, the CFG-decidable languages are
not closed under complement.4
Or, consider these two regex’s over alphabet Σ = {0, 1}: R1 = Σ(ΣΣ)∗ , matching any
odd-length binary string, and R2 = Σ∗ 0011Σ∗ , matching any string containing the substring
0011. The regex R1 ∪ R2 matches their union: any string that is odd length or contains the
substring 0011. What about their intersection, odd length strings that contain the substring
0011?
We can consider that wherever 0011 appears, since it is even length, for the whole string
to be odd length, we must either have an even number of bits before 0011 and an odd number
of bits after, or vice versa. Recall that (ΣΣ)∗ decides even-length strings and Σ(ΣΣ)∗ decides
odd-length strings. Thus, this regex works:
R3 = (ΣΣ)∗ 0011 Σ(ΣΣ)∗
∪ Σ(ΣΣ)∗ 0011 (ΣΣ)∗
But this is ad-hoc, requiring us to think about the details of the two languages L(R1 )
and L(R2 ). Consider changing to R1 = ΣΣΣ(ΣΣΣΣΣ)∗ matching all strings whose length
is congruent to 3 mod 5. We could do a similar case analysis of all the combinations of
numbers of bits before and after 0011 that would lead the whole length to be congruent to
3 mod 5. But this would be tedious, and it similarly requires us to understand the details
of the two languages.
Could we devise a procedure to automatically process R1 and R2 , based solely on their
structure, not requiring any special reasoning specific to their languages, which would tell
us how to construct R3 with L(R3 ) = L(R1 ) ∩ L(R2 )?
Similarly, can we devise a way to automatically convert a regex R into one deciding the
complement L(R)? This task is easy enough with DFAs, but it’s not clear for regex’s.
Conversely, it is not obvious how to show the DFA-decidable languages are closed under
∪, ∩, ◦ or ∗ .
Footnote 4: The language {0^i 1^j 2^k | i ≠ j or j ≠ k} is CFG-decidable, but its complement {0^i 1^i 2^i | i ∈ N} is not.
So to recap: a DFA can easily be modified to decide the complement. Regex’s can trivially
be combined for certain operations: ∪, ◦, ∗ , because those operations are built right into
the definition. NFAs, by contrast, do not seem to admit any easy modification to decide the complement. Some closure properties that are easy to prove for one model seem difficult in another.
But we made the claim that in fact all of these models define the exact same set of
decision problems: the regular languages, which are claimed to be closed under all of these
operations. Thus, if there are DFAs or regex’s or NFAs or RRGs for languages A, B, then
there are DFAs and regex's and NFAs and RRGs for all of A, B, A ∪ B, A ∩ B, A ◦ B, A∗, and the complement Ā.
Roadmap for next two chapters. In this chapter, we will show that we can prove some
of these closure properties by giving constructions (algorithms) to transform one or two
instances of a model into another instance of that same model, deciding a (potentially)
different language. For instance, the next section shows how to combine two DFAs D1 and
D2 into a third DFA D such that L(D) = L(D1 ) ∪ L(D2 ). These constructions are more
involved than the very simple “swap the accept/reject” states construction to show DFA-
decidability is closed under complement, but they follow the same principle: the constructed
automaton “simulates” the given automata in some way. At the end of this chapter, we will
have shown that the DFA-decidable languages are closed under complement, ∪, and ∩, the
NFA-decidable languages are closed under ∪, ◦, and ∗ , and the regex-decidable languages
are closed under ∪, ◦, and ∗ . The last part about regex’s follows directly by the definition
of regex’s, so we won’t actually discuss it again in this chapter.
In Chapter 6, we do something similar, but instead, show constructions that transform
an instance of a model into an instance of a different model, deciding the same language.
For instance, we will show how to take an arbitrary NFA N , which decides language L(N )
by definition, and produce a DFA also deciding L(N ). This will be done between all four
models of DFA, NFA, regex, and RRG, showing that all of these models have the same
computational power: if one of them decides a language, so do all the others. At that
point it will make sense to call the class of languages they decide simply “regular”, without
referring specifically to one of those models.
It will follow that, for example, the DFA-decidable languages are closed under ∗ , although
the construction is somewhat indirect:
1. Start with a DFA D1 deciding language A. Note that D1 is also an NFA.
2. Convert D1 to an NFA N deciding A∗ .
3. Convert N to an equivalent DFA D2 deciding A∗ .
See Figure 5.1. Informally, we know how to create a DFA D1 for L1 and a DFA D2 for
L2 . To design D, we want to simulate the actions of both D1 and D2 , simultaneously, on the
input. This is done by giving each state of D two “fields”, one field to track the state of D1 ,
and the second to track the state of D2 . Each field is updated independently, depending only
on the input symbol and the previous value of that field, but not depending on the other
field. At any time (including after reading all the input symbols), D knows both states that
D1 and D2 would be in, so it knows whether each is accepting. It should accept if either (or
both) are accepting.
Figure 5.1: Bottom left: a DFA deciding the language L3 = {an | n ≡ 2 mod 3}. Top right: a
DFA deciding the language L5 = {an | n ≡ 1 mod 5 or n ≡ 4 mod 5}. Bottom right: a DFA
deciding the language L3 ∪ L5 . For readability, transitions are not labeled, but they should all be
labeled with a. The accept states are row 2 and columns 1 and 4. To decide L3 ∩ L5 , we choose
the accept states to be {(2, 1), (2, 4)} instead, i.e., where row 2 meets columns 1 and 4.
Formally,
• Q = {0, 1, 2} × {0, 1, 2, 3, 4}
• δ((i, j), a) = ((i + 1) mod 3, (j + 1) mod 5)
• s = (0, 0)
• F = { (i, j) | i = 2 or j = 1 or j = 4 }
D is essentially simulating two DFAs at once: one that computes congruence mod 3
and the other that computes congruence mod 5. We call this general idea of one DFA
simultaneously simulating two others, by having two parts of its state set, one to remember
the state of one DFA and the other to remember the state of the other DFA, the product
construction, because the larger DFA’s state set is the cross product of the smaller two.
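To make this concrete, here is a minimal Python sketch of the product construction; the tuple representation of a DFA and all names here are illustrative choices of ours, not definitions from these notes.

# A sketch of the product construction. A DFA is represented as
# (Q, Sigma, delta, s, F), with delta a dict mapping (state, symbol) -> state.
def product_dfa(d1, d2, mode="union"):
    (q1, sigma, delta1, s1, f1) = d1
    (q2, _, delta2, s2, f2) = d2
    states = {(r1, r2) for r1 in q1 for r2 in q2}
    delta = {((r1, r2), b): (delta1[(r1, b)], delta2[(r2, b)])
             for (r1, r2) in states for b in sigma}
    if mode == "union":       # accept if either component accepts
        f = {(r1, r2) for (r1, r2) in states if r1 in f1 or r2 in f2}
    else:                     # "intersection": accept if both accept
        f = {(r1, r2) for (r1, r2) in states if r1 in f1 and r2 in f2}
    return (states, sigma, delta, (s1, s2), f)

def dfa_accepts(dfa, x):
    (_, _, delta, q, f) = dfa
    for b in x:
        q = delta[(q, b)]
    return q in f

# The DFAs for L3 and L5 from Figure 5.1, and a DFA for L3 ∪ L5:
d3 = ({0, 1, 2}, {"a"}, {(i, "a"): (i + 1) % 3 for i in range(3)}, 0, {2})
d5 = ({0, 1, 2, 3, 4}, {"a"}, {(i, "a"): (i + 1) % 5 for i in range(5)}, 0, {1, 4})
d = product_dfa(d3, d5)
assert dfa_accepts(d, "aa") and dfa_accepts(d, "aaaaaa") and not dfa_accepts(d, "aaa")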
Figure 5.2: The product construction applied to the DFAs D1 and D2 from Figures 3.1 and 3.2, to
obtain a DFA D with L(D) = L(D1 ) ∪ L(D2 ). We can think of the states of D as being grouped
into three groups representing q1 , q2 , and q3 respectively, and within each of those groups, there’s
a state representing r1 and a state representing r2 . In this figure, accept states have bold outlines.
To decide L(D1 ) ∩ L(D2 ) instead, we should choose only (q2 , r2 ) to be the accept state.
For an example of the product construction on DFAs with a larger alphabet, as well as
a different way to visualize the product construction, see Figure 5.2.
We now generalize this idea.
• Q = Q1 × Q2 (= { (r1, r2) | r1 ∈ Q1 and r2 ∈ Q2 })
Footnote 5: Proof Idea (The DFA Product Construction for ∪): We must show that if A1 and A2 are regular, then so is A1 ∪ A2. Since A1 and A2 are regular, some DFA D1 decides A1, and some DFA D2 decides A2. It suffices to show that some DFA D decides A1 ∪ A2; i.e., it accepts a string x if and only if at least one of D1 or D2 accepts x. D will simulate D1 and D2. If either accepts the input string, then so will D. D must simulate them simultaneously, because if it tried to simulate D1 first and then D2, it could not remember the input to supply it to D2.
• δ simulates moving both D1 and D2 one step forward in response to the input symbol.
Define δ for all (r1 , r2 ) ∈ Q and all b ∈ Σ as
δ( (r1 , r2 ) , b ) = ( δ1 (r1 , b) , δ2 (r2 , b) )
• F must be accepting exactly when either one, or the other, or both, of D1 and D2 are
in an accepting state:
F = { (r1 , r2 ) | r1 ∈ F1 or r2 ∈ F2 } .
Then, after reading an input string x that puts D1 in state r1 and D2 in state r2 , we have
that the state q = (r1 , r2 ) is in F if and only if r1 ∈ F1 or r2 ∈ F2 , which is true if and only
if x ∈ L(D1 ) or x ∈ L(D2 ), i.e., x ∈ L(D1 ) ∪ L(D2 ).
end of lecture 3a
Theorem 5.2.3. The class of DFA-decidable languages is closed under ∩.
Multiple ∪ or ∩. We have only proved that the DFA-decidable languages are closed under
a single application of ∪ or ∩ to two DFA-decidable languages A1 and A2 . But, we can use
induction to prove that they are closed under finite union and intersection; for instance, for any k ∈ N, if A1, . . . , Ak are DFA-decidable, then ⋃_{i=1}^{k} Ai is DFA-decidable.
What about infinite union or intersection?
Footnote 6: This is difficult to read! Be sure to read it carefully and ensure that the type of each object is what you expect.
Q has states that are pairs of states from Q1 × Q2 . Thus, δ’s first input, a state from Q, is a pair of states (r1 , r2 ),
where r1 ∈ Q1 and r2 ∈ Q2 . Similarly, the output of δ must also be a pair from Q1 × Q2 . However, δ1 ’s first input,
and its output, are just single states from Q1 , and similarly for δ2 .
Footnote 7: This is not the same as F = F1 × F2. What would the automaton do in that case?
Closure only goes one way. We now know if A and B are DFA-decidable, then A ∪ B is
DFA-decidable. What about the reverse claim? Can we say that if A ∪ B is DFA-decidable,
then both A and B must be DFA-decidable? No: let A be any language that is not DFA-decidable (we will see in Chapter 7 that such languages exist), and let B = Ā. Then A ∪ B = Σ∗, which is DFA-decidable, but A is not (neither is B; otherwise A would also be, by closure under complement).
Difficulty of showing DFA closure under ◦ or ∗ . So now we know that the DFA-decidable
languages are closed under complement, ∪, and ∩. What about ◦ or ∗ ? Given two DFAs D1
and D2 , it seems difficult to create a DFA D deciding L(D1 ) ◦ L(D2 ) using the same ideas
as we used in the product construction.
The obvious thing to try is to let D have all the states of both D1 and D2 (i.e., take the union of their states), and on input w, to start simulating D1 reading some prefix x ⊑ w, and then at some point to switch from D1 to D2, and have D2 process the remaining suffix y of w, so that w = xy and D accepts if and only if D1 accepts x and D2 accepts y. But how does D know when to switch? If w is the concatenation xy of x ∈ L(D1) and y ∈ L(D2), there's no delimiter to tell us where x ends and y begins. In fact, it might be possible to split w into x and y in multiple ways, for example: if L(D1) = {0, 00} and L(D2) = {0, 00}, then w = 000 could be obtained by letting x = 0 and y = 00, or x = 00 and y = 0.
For a concrete example, let A = { xy | x, y ∈ {0, 1}∗, n_x ≡ 1 mod 3 and n_y ≡ 2 mod 3 }, where n_u denotes the value of u interpreted as a binary number. For example, 1000101 ∈ A since 100₂ = 4 ≡ 1 mod 3 and 0101₂ = 5 ≡ 2 mod 3. However, 01 ∉ A since the possible values for x and y are (x = ε, y = 01), (x = 0, y = 1), or (x = 01, y = ε), and in all cases n_y ≢ 2 mod 3.
Figure 5.3: Example of the NFA union construction. This NFA decides the language L(N ) =
L(N1 ) ∪ L(N2 ) = {x ∈ {a}∗ | |x| is a multiple of 3 or 5 }, using ε-transitions to guess which of the
two NFAs to simulate.
See Figure 5.4. This simulates the transitions of the DFA from Figure 3.5 twice, once on
x and once on y: it uses ε-transitions to guess where x ends and y begins (and it will only make that jump if the DFA is currently accepting x).
Figure 5.4: Example of the NFA concatenation construction. An NFA N deciding the language
L(N1 ) ◦ L(N2 ). N essentially consists of N1 and N2 , with ε-transitions to “guess” when to switch
from N1 to N2 (but only when N1 is accepting).
Example 5.3.3. Let L = {x ∈ {a, b}∗ | x has an odd number of b’s followed by an a}.
Design an NFA to decide the language L∗ .
L is decided by the NFA N1 shown in Figure 5.5. The NFA N simulates N1 over and over in a loop, using ε-transitions to guess when to start over (but only starting over if N1 is currently in an accept state).
It is important that we did not make the old start state accepting. That would change the semantics of the underlying NFA and could result in a mistake, if positive-length strings reach back to the start state.
end of lecture 3b
Figure 5.5: Example of the NFA Kleene star construction on an NFA. N simulates N1 over and
over in a loop, using ε-transitions to guess when to start over (but only when N1 is accepting). The
NFA N1 decides the language described by the regular expression b(bb)∗ a, i.e., an odd number of
b’s followed by an a. L(N ) = L(N1 )∗ is then the language containing ε (as any starred language
should) and positive-length strings in which all occurrences of an a are preceded by an odd number of b's, and the last symbol is an a. It would be a mistake simply to make the old start state accepting. The incorrect NFA N shown accepts the strings bb and bbbb, which do not end in an a.
5.3.2 Proofs
The above examples show the basic ideas for how to prove that the NFA-decidable languages
are closed under union, concatenation, and Kleene star. Now, we actually prove those facts,
by showing how to make the ideas work on any NFA.
Theorem 5.3.4. The class of NFA-decidable languages is closed under ∪.
Proof. Let N1 = (Q1 , Σ, ∆1 , s1 , F1 ) and N2 = (Q2 , Σ, ∆2 , s2 , F2 ) be NFAs, where Q1 ∩Q2 = ∅.
We define an NFA N deciding L(N1) ∪ L(N2). See Figure 5.3 for an example. Intuitively, on input w ∈ Σ∗, N nondeterministically guesses whether to simulate N1 or N2 by taking an ε-transition from N's start state to the start state of either N1 or N2. N accepts if the NFA it guessed accepts.
To fully define N = (Q, Σ, ∆, s, F ):
• N has all the states of N1 and N2 , and one extra new state s, so Q = Q1 ∪ Q2 ∪ {s},
where s 6∈ Q1 ∪ Q2 .
• N accepts if the NFA it guessed accepts: F = F1 ∪ F2 .
• N simulates N1 or N2, depending on the initial guess. To do this, for all q ∈ Q and b ∈ Σε, define
∆(q, b) = ∆1(q, b) if q ∈ Q1;  ∆2(q, b) if q ∈ Q2;  {s1, s2} if q = s and b = ε;  and ∅ if q = s and b ∈ Σ.
From now on, we will not define the full transition function ∆ in NFA constructions such
as this. Instead, we will use the equivalent (but easier) terminology of sets of transitions.
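As a concrete illustration of that terminology, here is a minimal Python sketch of the union construction (the representation and names are illustrative, not from these notes); a transition is a triple (q, b, r), with b = "" playing the role of ε.

# A sketch of the NFA union construction of Theorem 5.3.4. An NFA is
# (Q, Sigma, Delta, s, F), with Delta a set of transitions (q, b, r),
# where b is an alphabet symbol or "" (representing an epsilon-transition).
def nfa_union(n1, n2, new_start="s_new"):
    (q1, sigma, d1, s1, f1) = n1
    (q2, _, d2, s2, f2) = n2
    assert not (q1 & q2) and new_start not in (q1 | q2)  # states disjoint
    # keep all old transitions, and add epsilon-transitions from the new
    # start state that "guess" which of N1, N2 to simulate
    delta = d1 | d2 | {(new_start, "", s1), (new_start, "", s2)}
    return (q1 | q2 | {new_start}, sigma, delta, new_start, f1 | f2)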
Theorem 5.3.5. The class of NFA-decidable languages is closed under ◦.
Proof. Let N1 = (Q1, Σ, ∆1, s1, F1) and N2 = (Q2, Σ, ∆2, s2, F2) be NFAs, where Q1 ∩ Q2 = ∅. We define an NFA N = (Q, Σ, ∆, s, F) deciding L(N1) ◦ L(N2); see Figure 5.4 for an example.
• N has all the states of N1 and N2: Q = Q1 ∪ Q2.
• N starts by simulating N1, so s = s1.
• N accepts if N2 accepts: F = F2.
• ∆ has all the transitions of ∆1 and ∆2, plus an ε-transition from each state in F1 to s2, to guess when to switch from N1 to N2.
Detailed proof of correctness: First we show L(N1) ◦ L(N2) ⊆ L(N). Let w ∈ Σ∗. If there are x ∈ L(N1) and y ∈ L(N2) such that w = xy (i.e., w ∈ L(N1) ◦ L(N2)), then there is a sequence of choices of N such that N accepts w (i.e., w ∈ L(N)): follow the choices N1 makes to accept x, ending in a state in F1, then execute the ε-transition to state s2 defined above, then follow the choices N2 makes to accept y. This shows L(N1) ◦ L(N2) ⊆ L(N).
To see the reverse containment L(N) ⊆ L(N1) ◦ L(N2), suppose w ∈ L(N). Then there is a sequence of choices such that N accepts w. By construction, all paths from s = s1 to some state in F = F2 pass through s2, so N must reach s2 after reading some prefix x ⊑ w, and the remaining suffix y of w takes N from s2 to a state in F2, i.e., y ∈ L(N2). By construction, all paths from s = s1 to s2 go through a state in F1, and those states are connected to s2 only by an ε-transition, so x takes N from s1 to a state in F1, i.e., x ∈ L(N1). Since w = xy, this shows that L(N) ⊆ L(N1) ◦ L(N2).
Thus N decides L(N1 ) ◦ L(N2 ).
where each qj,|xj| ∈ F and has an ε-transition to s (see Footnote 9). For each j ∈ {1, . . . , k} and i ∈ {0, . . . , |xj| − 1}, let yj,i be the corresponding symbol in Σε causing the transition from qj,i to qj,i+1, and let xj = yj,0 yj,1 . . . yj,|xj|−1. Then the sequences s, qj,1, qj,2, . . . , qj,|xj| and yj,0, yj,1, . . . , yj,|xj|−1 testify that D accepts xj, thus xj ∈ L(D). Thus w = x1 x2 . . . xk, where each xj ∈ L(D), so w ∈ L(D)∗, showing L(N) ⊆ L(D)∗.
Thus N decides L(D)∗ .
end of lecture 3c
Footnote 9: Perhaps there are no such ε-transitions, in which case k = 1.
Chapter 6
Equivalence of models
The previous chapter showed how to convert instances of one model into other instances of
the same model, deciding a different language. In this chapter, we show how to convert an
instance of one model into an instance of a different model, deciding the same language.
This is our primary tool for comparing the power of different models of computation.
For example, if any regex can be simulated by an NFA, then any regex-decidable language
is NFA-decidable, so NFAs are at least as powerful as regex’s. Conversely, if any NFA can be
simulated by a regex, then regex’s are at least as powerful as NFAs. If both are true, then
NFAs and regex’s have equivalent computational power. By the end of this chapter, we will
have shown that DFAs, NFAs, regex’s, and RRGs all have the same computational power:
they all decide the same class of languages, which we call “regular”.
Figure 6.1: Example of the subset construction. We transform an NFA N into a DFA D. Each
state in D represents a subset of states in N . The blue dashed arrows in D represent transitions and
start state if the ε-transition in N were absent, showing the special case of the subset construction
when there are no ε-transitions. The blue dashed arrows should be removed from D, and the red
arrows should be added, to account for the ε-transition in N . The D states {1} and {1, 2} are
unreachable from the start state {1, 3}, so we could remove them without altering the behavior of
D. Either version of the DFA would work.
If N is in state q ∈ R after reading some portion of the input, then the states it could be in after reading the next symbol b are all the states in ∆(q, b); since N could be in any state q ∈ R before reading b, we must take the union over all q ∈ R.
Now we show how to handle the ε-transitions. For any R ⊆ QN, define E(R) to be the set of states reachable from some state in R by following zero or more ε-transitions.
For example, in Figure 6.1, if we let R = {1, 2}, then E(R) = {1, 2, 3}. An example in
which multiple ε-transitions can be followed is in Figure 6.2; see the caption for details.
To account for the ε-transitions, D must be able to simulate:
1. N following ε-transitions after each symbol-consuming transition, i.e., define δ(R, b) = E(⋃_{q∈R} ∆(q, b)); and
Figure 6.2: Example NFA to demonstrate E(R) for different subsets of nodes R ⊆ {1, 2, 3, 4, 5, 6}. E({1}) = {1, 3, 4, 5, 6}, E({2}) = {2}, E({3}) = {3, 4, 5}, E({4}) = {4, 5}, E({5}) = {4, 5}, E({6}) = {6}. For other sets, E(R) = ⋃_{q∈R} E({q}), e.g., E({2, 5}) = E({2}) ∪ E({5}) = {2, 4, 5}.
2. N following ε-transitions before the first non-ε-transition, i.e., define sD = E({sN }).
We have constructed a DFA D such that the state D is in after reading string x is the subset
of states that N could be in after reading x. By the definition of FD , D accepts x if and
only if at least one of the states N could be in is accepting, i.e., if and only if N accepts x.
Thus L(D) = L(N ).
So from now on, when we call a language “regular”, which up to this point has been a
synonym for DFA-decidable, we can also interpret it to mean NFA-decidable.
end of lecture 4a
Alternate choices that also work. One thing to note about some of the choices we made about when to simulate ε-transitions: we could have chosen differently and constructed a different, but still correct, DFA. For example, we could simulate ε-transitions before each input-consuming transition:
δ(R, b) = ⋃_{q∈R} ∆(E({q}), b),
and then we could simply let sD = {sN }. However, we’d have to remember to also do ε-
transitions after the last symbol has been read, by asking not only whether any of the states
N could be in is accepting, but also if any accept states are reachable from those states by
ε-transitions:
FD = {A ⊆ QN | E(A) ∩ FN ≠ ∅}.
There’s nothing wrong with doing the above; it also creates a DFA that decides L(N ). For
example, if we did this in Fig. 6.1, then the a-transition from {1, 2}, instead of going to
{2, 3}, would go to {1, 2, 3}. Although this would be a different DFA, it would be equivalent
to that of Fig. 6.1 in the sense of deciding the same language.
We will take the proof of Theorem 6.1.2 to be the “official” subset construction. Partic-
ularly for auto-grading, we need to define something to be the “official” construction, and
this will be it.
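For concreteness, here is a Python sketch of that official construction, with the ε-closure E applied to the start state and after each consumed symbol. The representation and names are illustrative; to keep the output small, the sketch builds only the subsets reachable from the start state, which, as observed for Figure 6.1, does not alter the behavior.

# A sketch of the subset construction. NFAs are represented as in the
# earlier sketches: (Q, Sigma, Delta, s, F), with Delta a set of triples
# (q, b, r) and b == "" meaning an epsilon-transition. The states of the
# constructed DFA are frozensets of the NFA's states.
def eclose(delta, states):
    # E(R): all states reachable by zero or more epsilon-transitions
    stack, seen = list(states), set(states)
    while stack:
        q = stack.pop()
        for (p, b, r) in delta:
            if p == q and b == "" and r not in seen:
                seen.add(r)
                stack.append(r)
    return frozenset(seen)

def subset_construction(nfa):
    (qn, sigma, delta, sn, fn) = nfa
    start = eclose(delta, {sn})                  # s_D = E({s_N})
    states, worklist, dd = {start}, [start], {}
    while worklist:
        r = worklist.pop()
        for b in sigma:
            step = {t for (p, c, t) in delta if p in r and c == b}
            nxt = eclose(delta, step)            # epsilon-close after reading b
            dd[(r, b)] = nxt
            if nxt not in states:
                states.add(nxt)
                worklist.append(nxt)
    accept = {r for r in states if r & set(fn)}  # some NFA state is accepting
    return (states, sigma, dd, start, accept)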
Exponential blowup of states in subset construction. Note that the subset construc-
tion uses the power set of QN , which is exponentially larger than QN : |QD | = |P(QN )| =
2|QN | . It can be shown (see Kozen’s textbook) that there are languages for which this
is necessary; the smallest NFA deciding the language has n states, while the smallest
DFA deciding the language has 2n states. For example, for any n ∈ N, the language
{ x ∈ {0, 1}∗ | x[|x| − n] = 0 } has this property.
Thus, if we only care that the number of states is finite, and consider all finite numbers
“equally finite”, then finite automata have equivalent power whether or not they are nonde-
terministic. But if we consider the number of states as a resource (requiring more memory
to implement, for example), then NFAs are more powerful in the sense that for certain prob-
lems, they use far fewer resources than the best DFA. This theme will be revisited in the
unit on computational complexity.
[Figure: a DFA over {0, 1} with states P, T, and R, whose transitions correspond to the grammar rules below via δ(A, b) = C ↔ A → bC, with accept states P and T:]
P → 0T | 1R
T → 0R | 1P
R → 0P | 1R
P → ε
T → ε
Intuitively, any derived string has exactly 1 or 0 variables. G has one production rule for
each transition in D, which adds a new symbol to the produced string and possibly changes
the variable. G also has rules to change the variable to ε if it represents an accept state.
This ensures that any possible string of terminals can be produced, but it is followed by a
variable. The variable can be erased only if it represents an accept state.
Formally, the set of variables is Γ = Q, the alphabet Σ is the same for both, and the start variable is S = s. We have a clash of conventions,
lowercase for DFA state names and uppercase for CFG variable names, so we choose upper-
case. There is one rule for each transition in δ: if δ(A, b) = C, then G has a rule A → bC.
There is also one rule for each accept state: if A ∈ F , then G has a rule A → ε.
Detailed proof of correctness: First we show L(G) ⊆ L(D). Each rule has at most one
variable on the right, so every derived string has at most one variable A. To eliminate A
and result in an all-terminal string, A must represent an accepting state, so the string is
accepted by D.
Conversely, to show L(D) ⊆ L(G), any string x accepted by D, which visits states
s = A0 , A1 , . . . , An ∈ F , can be produced by G by applying these rules in order:
A0 → x[1]A1
A1 → x[2]A2
A2 → x[3]A3
...
An−1 → x[n]An
An → ε,
For example, mixing right-regular rules with left-regular rules in a single grammar can produce a non-regular language: the grammar
A → 0B
B → A1
A → ε
decides the language {0^n 1^n | n ∈ N}, which we stated above (and will prove in Chapter 7) is not regular.
There’s one more variation to consider: also allowing transitions of the form A → B,
i.e., just changing one variable to another, without introducing a new terminal. How would
that affect the construction of Theorem 6.2.2? How would such a rule be represented in the
NFA?
end of lecture 4b
[Figure panels, bottom-up: NFAs for a, b, ab, ab ∪ a, (ab ∪ a)*, and (ab ∪ a)*b, built from 2-state base NFAs joined by ε-transitions.]
Figure 6.4: Example of converting the regex (ab ∪ a)∗ b to an NFA. The recursion is shown “bottom-
up”, starting with the base cases for a and b and building larger NFAs using the union construction,
concatenation construction, and Kleene star construction, as in Figures 5.3, 5.4, and 5.5.
Thus, to convert a regex into an NFA, we replace the base cases in the regex with the
simple 1- or 2-state NFAs above, and then we connect them using the constructions for NFA
union, concatenation, and Kleene star shown in Section 5.3, shown by example in Figures 5.3,
5.4, and 5.5.
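Here is a compact Python sketch of that recursive conversion, with a regex represented as a nested tuple. The representation and names are ours, and the ∅ base case is omitted for brevity.

# A sketch of the regex-to-NFA conversion. A regex is "" (epsilon), a
# one-character string like "a", or a tuple ("union", r1, r2),
# ("concat", r1, r2), or ("star", r1). NFAs are (Q, Delta, s, F) with
# fresh integer states; "" labels an epsilon-transition.
import itertools
_fresh = itertools.count()

def regex_to_nfa(r):
    if isinstance(r, str):      # base case: a 2-state NFA
        s, a = next(_fresh), next(_fresh)
        return ({s, a}, {(s, r, a)}, s, {a})
    if r[0] == "union":         # new start guesses which sub-NFA to run
        (q1, d1, s1, f1), (q2, d2, s2, f2) = map(regex_to_nfa, r[1:])
        s = next(_fresh)
        return (q1 | q2 | {s}, d1 | d2 | {(s, "", s1), (s, "", s2)}, s, f1 | f2)
    if r[0] == "concat":        # epsilon-moves from N1's accepts to N2's start
        (q1, d1, s1, f1), (q2, d2, s2, f2) = map(regex_to_nfa, r[1:])
        return (q1 | q2, d1 | d2 | {(a, "", s2) for a in f1}, s1, f2)
    if r[0] == "star":          # new accepting start; loop back from accepts
        (q1, d1, s1, f1) = regex_to_nfa(r[1])
        s = next(_fresh)
        return (q1 | {s}, d1 | {(s, "", s1)} | {(a, "", s) for a in f1}, s, {s} | f1)

# The regex (ab ∪ a)*b from Figure 6.4:
n = regex_to_nfa(("concat", ("star", ("union", ("concat", "a", "b"), "a")), "b"))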
Figure 6.5: Modifying an NFA to ensure it has no transitions entering the start state s, and it has
a single accept state a with no transitions leaving a.
• F = {a}, and
• ∆ has all the transitions of ∆′, and also s −ε→ s′, and for each a′ ∈ F′, a′ −ε→ a.
Now we show how to convert any NFA N into a regex R so that L(N ) = L(R). First we
introduce a generalization of NFAs that will be helpful in the conversion, called expression
automata (EA). Intuitively, an expression automaton can have its transition arrows labeled
with arbitrary regular expressions. A transition from state a to b labeled with regex X is written a −X→ b. Since ε and individual alphabet symbols are regular expressions, every NFA
is an EA. The idea is that an EA may read any number of symbols from the input while
following a transition, as long as the substring read matches the regular expression labeling
the transition.
Figure 6.6: Example of an expression automaton (EA). It generalizes NFAs to allow arbitrary
regex’s on the transitions. This EA accepts the strings bba, bbaa, and bbaaababbba, and it rejects
the strings ε, a, b, baaba, and bbaabab.
The construction will work like this. We start with an NFA (which is a special type of
EA). Then we repeatedly remove one state i at a time from the EA. When we do, we stitch
together the transition arrows coming into and out of i, combining their regex’s so that the
EA still decides the same language. When we are done, we will have exactly two states:
a rejecting start state, with a transition going to an accept state. The regex labeling that
transition is the output of the construction.
Now, we show the construction that converts any NFA into an equivalent regex.
[Figure: the final two-state EA, s −R→ a.]
Intuition: Last step going from 3 states to final 2 states. Suppose we have an EA E1
of the following form.
[Figure: EA E1 with states s, i, a and transitions s −X→ i, a self-loop i −Y→ i, i −Z→ a, and s −W→ a,]
where W, X, Y, Z are regex’s. Then the following EA E2 decides the same language and
matches the form we seek:
s −(W ∪ XY∗Z)→ a
because in both, the strings leading to a are either of the form w ∈ L(W ), or of the form
xy k z for some k ∈ N, where x ∈ L(X), y ∈ L(Y ), and z ∈ L(Z).
How to remove a state when there are > 3 states remaining. In general, we pick a state
i other than s or a arbitrarily and remove it. If there are more than 3 states remaining, we
may have many transitions entering and leaving the intermediate state i. We iterate over
each pair of states q, r such that there are transitions q −X→ i and i −Z→ r. (We do this even if q = r, such as t in Fig. 6.7.) Suppose there are also transitions i −Y→ i and q −W→ r. We replace these 4 transitions with q −(W ∪ XY∗Z)→ r. If i −Y→ i or q −W→ r or both were not there, then omit Y or W or both (i.e., the regex could be W ∪ XZ, XY∗Z, or XZ). Figure 6.7 shows an example.
shows an example.
Repeat this for all states i ∈ Q \ {s, a}, and at that point the EA matches the form
described above, with a single regex R such that L(R) = L(N ).
Now we have done everything we said we’d do in this chapter: we have shown that DFAs
and NFAs are equivalent, that regex’s are equivalent to both, and that RGs are equivalent
to all. We say a language is regular if it is decided by any of these automata, knowing that an
automaton of one type can be converted to an equivalent automaton (meaning, it decides
the same language) of any of the other types. In the next chapter, we move on to studying
languages that cannot be decided by any of these automata.
Figure 6.7: Modifying EA E1 to remove state i, resulting in EA E2. We show only one state removal, to convey the intuitive idea, because this construction makes very large regexes when all states are removed. The pairs of states between which we add transitions are (q, r), (q, t), (t, r), (t, t). Note that since E1 has transitions between t and i in both directions, E2 has a self-loop on t. Also note that since there was already a transition q −110→ r, we must take the union of the regex 110 with the new regex 0∗1(00)∗ on the new transition q −(110 ∪ 0∗1(00)∗)→ r.
end of lecture 4c
14 }
15 q1:
16     if (it == input_string.end())
17         return false;
18     if (*it == '0') {
19         it++;
20         goto q2;
21     } else {
22         it++;
23         goto q0;
24     }
25 q2:
26     if (it == input_string.end())
27         return false;
28     if (*it == '0') {
29         it++;
30         goto q1;
31     } else {
32         it++;
33         goto q2;
34     }
35 }
Now, the semantics of DFA execution take care of some of the logic above, which is why
the DFA has only 3 states, yet the C++ program has 35 lines of code. But it is still one
block of 10 lines per state.
It is also the case that it is somewhat “cheaty” to say the program has no memory, and
not just because one iterator variable is needed. Even if no variables were needed, usually
the “line of code” is translated, when the program is compiled, to machine code instructions,
and the “line” of code becomes an index into these instructions, i.e., an integer, implemented
in memory/cache as something called a program counter.
The above example illustrates that the following implication is true: if a language is
DFA-decidable, then it is solvable by a C++ bool function with only a single variable to
iterate over the input. The converse implication is true as well, though we will not attempt
to prove it here, as the complex semantics of a general-purpose programming language such
as C++ would make for a tedious proof.
Similar statements are true about other programming languages such as Python, but
not exactly the same statement, since Python lacks a goto statement. Like C/C++, the
language of DFAs allows unstructured programming, where one can potentially jump from
any state (line of code) to any other. So to truly implement a DFA in a structured language
such as Python would require some memory to keep track of the state. For example, the
following Python function implements the same DFA as above, using an enumerated type
to represent the state:
from enum import Enum, auto

class State(Enum):
    Q0 = auto()
    Q1 = auto()
    Q2 = auto()

def dfa(input_string):
    state = State.Q0
    for symbol in input_string:
        if state is State.Q0:
            if symbol == '0':
                state = State.Q0
            else:
                state = State.Q1
        elif state is State.Q1:   # elif: take only one transition per symbol
            if symbol == '0':
                state = State.Q2
            else:
                state = State.Q0
        elif state is State.Q2:
            if symbol == '0':
                state = State.Q1
            else:
                state = State.Q2
    if state is State.Q1:
        return True
    else:
        return False
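As a quick check of this function: these transitions happen to track the value of the input read so far as a binary number, mod 3 (state Qi meaning the value so far is ≡ i mod 3), so the function as written accepts exactly the strings whose binary value is ≡ 1 mod 3.

print(dfa("100"))  # True:  100 in binary is 4, and 4 ≡ 1 (mod 3)
print(dfa("11"))   # False: 11 in binary is 3, and 3 ≡ 0 (mod 3)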
Chapter 7
Proving problems are not solvable in a model of computation
Theorem 7.1.1. Let C = {x ∈ {0, 1}∗ | #(0, x) = #(1, x)}. Then C is not regular.
Figure 7.1: Three different DFAs with p = 6 states, processing input w = 0p 1p = 000000111111,
showing how each DFA partitions w into three substrings w = xyz, where x leads to the first
occurrence of a repeated state, y loops leading to the second occurrence, and z leads to an accept
state. Transitions followed by x are shown in green, by y in red, and by z in blue. Some transitions are followed by more than one of x, y, z, for instance a −0→ b and b −0→ c in the leftmost DFA, followed by both y and z.
Footnote 2: In other words, the string y takes D from state qi back to itself. Thus, another copy of y at this point would
return to state qi again.
When programming, when you find yourself tempted to copy and paste code from one
part of a project to another, it is helpful to factor out the common code into a single function
that can be called repeatedly. Similarly, when in the course of proving two theorems, you
find that they have very similar proofs, it can be helpful to factor out the common parts into
a single general-purpose lemma. This guides future proofs and removes redundant parts. We
factor out the common parts of the two proofs into a statement called the Pumping Lemma.
end of lecture 5a
Pumping Lemma. If A is a regular language, then there is a p ∈ N (the pumping length) such that every w ∈ A with |w| ≥ p can be written as w = xyz, where:
1. for all k ∈ N, xy^k z ∈ A,
2. |y| > 0, and
3. |xy| ≤ p.
The Pumping Lemma is proved formally in Section 7.4. The proof generalizes the two
proofs above, essentially stripping out the parts that are specific to the languages and re-
placing them with an argument that the three conditions hold.
Theorem 7.3.1. The language B = {0^n 1^n | n ∈ N} is not regular.
Proof. Assume for the sake of contradiction that B is regular, with pumping length p. Let w = 0^p 1^p. Since w ∈ B and |w| ≥ p, by the Pumping Lemma, w = xyz, where |y| > 0, |xy| ≤ p, and for all k ∈ N, xy^k z ∈ B.
Since |xy| ≤ p and w starts with p 0’s, y is all 0’s. Since |y| > 0, xyyz has more 0’s than
1’s.
Thus xyyz ∉ B, contradicting condition (1) of the Pumping Lemma.
Theorem 7.3.2. The language C = {w ∈ {0, 1}∗ | #(0, w) = #(1, w)} is not regular.
To help avoid mistakes that are often made when using the Pumping Lemma, let’s first
see what a common flawed proof of this theorem looks like.
Bad proof. Assume for the sake of contradiction that C is regular, with pumping length p.
Let w = 0^⌈p/2⌉ 1^⌈p/2⌉. Since w ∈ C and |w| ≥ p, by the Pumping Lemma, w = xyz, where |y| > 0, |xy| ≤ p, and for all k ∈ N, xy^k z ∈ C.
Let x = ε, y = 0^⌈p/2⌉, and z = 1^⌈p/2⌉. Then xyyz = 0^(2⌈p/2⌉) 1^⌈p/2⌉, which has more 0's than 1's.
Thus xyyz ∉ C, contradicting condition (1) of the Pumping Lemma.
What’s wrong with this proof? It’s highlighted in red. We wanted y to be all 0’s so that
when we pump a second copy into w = xyz to create xyyz, we end up with more 0’s than
1’s. But just because we want y to be all 0’s doesn’t mean we get to simply declare that it
is all 0's. The source of the error is the sentence, "Let x = ε, y = 0^⌈p/2⌉, and z = 1^⌈p/2⌉." What if those aren't the x, y, and z that the Pumping Lemma finds? The Pumping Lemma is based on reasoning about how a p-state DFA would process w; Fig. 7.1 reminds us that the DFA dictates to us how w is split into xyz. We don't get to choose how w is split.
What if x = z = ε and y = 0^⌈p/2⌉ 1^⌈p/2⌉? This is totally consistent with the Pumping Lemma: |y| > 0 and |xy| ≤ p, just as promised. But for this choice of x and y, xy^k z = (0^⌈p/2⌉ 1^⌈p/2⌉)^k, which has an equal number of 0's and 1's. So xy^k z ∈ C for all k, and we don't get a contradiction.
Think of the Pumping Lemma as a contract: we give it w of length at least p, and it
gives back x, y, and z. It is guaranteed to fulfill the exact terms of the contract: |y| > 0, |xy| ≤ p, and xy^k z ∈ C for all k ∈ N. Nothing more, nothing less.
If we want y to have some extra property, such as being all 0’s, we have to prove that
y has that property, that this property follows from the conditions |y| > 0 and |xy| ≤ p
guaranteed by the Pumping Lemma. We must take care in choosing w so that every choice
of x, y, and z satisfying |y| > 0 and |xy| ≤ p will give us the property we want.
In this case, if we choose w to be a bit longer, we can use condition (3) to conclude that
y is indeed all 0’s. We highlight in blue the parts that are different from other proofs using
the Pumping Lemma.
Proof of Theorem 7.3.2. Assume for the sake of contradiction that C is regular, with pump-
ing length p. Let w = 0^p 1^p. Since w ∈ C and |w| ≥ p, by the Pumping Lemma, w = xyz, where |y| > 0, |xy| ≤ p, and for all k ∈ N, xy^k z ∈ C.
Since |xy| ≤ p and w starts with p 0’s, y is all 0’s. Then xyyz has more 0’s than 1’s.
Thus xyyz ∉ C, contradicting condition (1) of the Pumping Lemma.
Here’s an alternate proof that does not directly use the Pumping Lemma. It uses closure
properties and appeals to the fact that we already proved that B = {0^n 1^n | n ∈ N} is not
regular.
Alternate proof of Theorem 7.3.2. If C were regular, then by closure of regular languages
under ∩, C ∩ L(0∗1∗) = {0^n 1^n | n ∈ N} would be regular (see Footnote 4), contradicting Theorem 7.3.1.
Footnote 4: An analogy: if I tell you I have a number x and an integer n, and I tell you x + n = 3.14, what do you know about x? We don't know the exact value of x, but we know it can't be an integer. The integers are closed under addition (the sum of two integers is always an integer), so if I add x to an integer and get 3.14, x must be a non-integer.
Theorem 7.3.4. The language D = {1^(n²) | n ∈ N} is not regular.
Proof. Assume for the sake of contradiction that D is regular, with pumping length p. Let w = 1^(p²). Since w ∈ D and |w| ≥ p, by the Pumping Lemma, w = xyz, where |y| > 0, |xy| ≤ p, and for all k ∈ N, xy^k z ∈ D.
Let w′ = 1^((p+1)²) be the next longest string in D after w; then no string u where |w| < |u| < |w′| is in D. Then |w′| − |w| = (p + 1)² − p² = 2p + 1 > p.
Since |y| ≤ p, |xyyz| − |w| ≤ p, so |xyyz| < |w′|. Since |y| > 0, |w| < |xyyz| < |w′|.
Thus xyyz ∉ D, contradicting condition (1) of the Pumping Lemma.
The next example shows that “pumping down” (replacing xyz with xz) can be useful.
Theorem 7.3.5. The language E = {0^i 1^j | i > j} is not regular.
Proof. Assume for the sake of contradiction that E is regular, with pumping length p. Let w = 0^(p+1) 1^p. Since w ∈ E and |w| ≥ p, by the Pumping Lemma, w = xyz, where |y| > 0, |xy| ≤ p, and for all k ∈ N, xy^k z ∈ E.
Since |xy| ≤ p, y is all 0's. Let m = |y| > 0. Then xz = 0^(p+1−m) 1^p, noting p + 1 − m ≤ p.
Thus xz ∉ E, contradicting condition (1) of the Pumping Lemma.
Finally, let’s see a proof where we can’t choose the first p symbols to be the same, since
the language disallows it. We have more cases to consider, since we cannot guarantee that,
as in the above proofs, y will be a unary string. Depending on the exact form of y, different
arguments are needed.
Theorem 7.3.6. The language H = {(01)^n (10)^n | n ∈ N} is not regular.
For example, H contains ε, 0110, 01011010, 010101101010 but not 0011 or 0101.
Proof. Assume for the sake of contradiction that H is regular, with pumping length p. Let
w = (01)^p (10)^p. Since w ∈ H and |w| ≥ p, by the Pumping Lemma, w = xyz, where |y| > 0, |xy| ≤ p, and for all k ∈ N, xy^k z ∈ H.
Since |xy| ≤ p and w starts with p consecutive 01 substrings, x and y both occur in this region. Say that a block in a string is a substring of length 2 starting at an odd index. For example, the blocks of 01011011 are 01, 01, 10, 11. We have two cases:
|y| is even: Then either y = (01)^m or y = (10)^m for some m. In either case, xyyz has more 01 blocks than 10 blocks, so xyyz ∉ H.
|y| is odd: Then y has a different number of 0's and 1's, so xyyz does also, so xyyz ∉ H.
end of lecture 5b
Figure 7.2: A DFA D = (Q, Σ, δ, s, F ) to help illustrate the idea of the pumping lemma. Reading
x reaches a, then reading y returns to a, then reading z reaches accepting state e. The strings xz,
xyz, xyyz, xyyyz, . . . are all accepted too, since they also end up in state e. Their paths through
D differ only in how many times they follow the cycle (a, b, c, d, a).
We have that whatever D's answer is on w = xyz, it is the same on xy^k z for any k ∈ N, since all of those strings take D to the same state. (In Fig. 7.2, for instance, an accepted string might visit the state sequence s, a, b, c, d, a, b, f, e, e, repeating state a.)
Since δ̂(qi, y) = qj = qi, for all k ∈ N, δ̂(qj, y^k) = qj.
Therefore δ̂(q0, xy^k z) = qn ∈ F, so xy^k z ∈ L(D), satisfying (1). Since i ≠ j, |y| > 0, satisfying (2). Finally, j ≤ p, so |xy| ≤ p, satisfying (3).
(Unlike the Pumping Lemma, the Myhill-Nerode Theorem can even be used to prove that a language is regular.)
Proof. Here, letting S = D itself works as the infinite set of distinguishable strings. All strings 1^(k²), 1^(n²) ∈ S with k < n have distinguishing extension z = 1^((k+1)² − k²), since 1^(k²) z = 1^((k+1)²) ∈ D but 1^(n²) z = 1^(n² + (k+1)² − k²) = 1^(n² + 2k + 1) ∉ D.
The last claim follows because k < n implies 2k + 1 < 2n + 1, but 2n + 1 is the distance from n² to the next perfect square (n + 1)². Thus n² + 2k + 1 lies strictly between the two adjacent perfect squares n² and (n + 1)², so it is not itself a perfect square, so 1^(n² + 2k + 1) ∉ D. So D is not regular by the Myhill-Nerode Theorem.
Next we re-prove Theorem 7.3.5, this time using the Myhill-Nerode Theorem.
Theorem 7.5.5. The language E = {0^i 1^j | i > j} is not regular.
Proof. Let S = {0^n | n ∈ N} = {ε, 0, 00, 000, 0000, . . .}. All strings 0^i, 0^j ∈ S with i < j have distinguishing extension 1^i, since 0^i 1^i ∉ E but 0^j 1^i ∈ E. Since S is infinite, E is not regular by the Myhill-Nerode Theorem.
Theorem 7.5.6. The language H = {(01)^n (10)^n | n ∈ N} is not regular.
For example, H contains ε, 0110, 01011010, 010101101010 but not 0011 or 0101.
Proof. Let S = {(01)^n | n ∈ N} = {ε, 01, 0101, 010101, . . .}. All strings (01)^i, (01)^j ∈ S with i < j have distinguishing extension (10)^i, since (01)^i (10)^i ∈ H but (01)^j (10)^i ∉ H. Since S is infinite, H is not regular by the Myhill-Nerode Theorem.
Then we can replace the subtree under the upper A (whose leaves are y) with the smaller
subtree (whose leaves are x), and this remains a valid parse tree, whose leaves are the string
z.
We have three cases:
y has no 0's: Then x also has no 0's, so by replacing y with x, we have decreased the number of 1's and/or 2's without decreasing the number of 0's, so z is not of the form 0^n 1^n 2^n for any n ∈ N. But since z has a parse tree in G, z ∈ L(G), a contradiction.
y has no 2’s: This is symmetric to the previous case.
y has both 0’s and 2’s: There are three sub-cases:
x has no 0’s: Then replacing y with x decreases the number of 1’s and/or 2’s without
decreasing the number of 0’s.
x has no 2’s: This is symmetric to the previous sub-case.
x has both 0’s and 2’s: Then x contains all the 1’s in w. Thus, replacing y with x
decreases the number of 0’s and/or 2’s without decreasing the number of 1’s.
If a parse tree repeats a variable on a root-leaf path, then this can be pumped.
Part II
Chapter 8
Turing machines
[Figure: a Turing machine (TM), consisting of a finite-state control with a read/write head on an unbounded tape, shown containing 0 1 0 followed by blanks.]
One can think of a finite automaton as taking its input string from a “tape”, in which
the finite automaton starts reading the leftmost tape cell, moves to the right by one tape cell
each time step, and halts after moving off of the rightmost tape cell. From this perspective,
the differences between finite automata and TMs are:
• A TM can write on its tape; furthermore, it can write symbols that are not part of the input alphabet (in particular, the blank symbol ⊔ is never part of the input alphabet, but appears on the tape).
• The read-write tape head can move right, but also can move left or stay still. This
means it can read the same input symbol more than once.
• The tape is unbounded: if the tape head moves off the rightmost tape cell, a new tape cell appears, with a ⊔ on it. (See Footnote 1.)
Footnote 1: It is common to describe a TM as having an infinite tape, which makes it appear unrealistic. But there is no point in time at which infinitely much tape has actually been used: the tape contents are always finite, merely unbounded in how far they can grow.
• Most states do not accept or reject; there is exactly one accept state and one reject
state, and the TM immediately halts upon entering either state. Conversely, the TM
will not halt until it reaches one of these states, which may never happen.
Recall that w^R represents the reverse of w.
Example 8.1.1. Design a Turing machine to test membership in the palindrome language P = { w ∈ {0, 1}∗ | w = w^R }.
1. Zig-zag to either side of the string, checking if the leftmost symbol equals the rightmost
symbol. If not, reject.
2. “Cross off” symbols as they are checked (i.e., replace the symbol with a symbol x not
in the input alphabet).
3. If we make it to the end without rejecting, accept.
Figure 8.2: TM that decides {w ∈ {0, 1}∗ | w = w^R}. Although state qr has no explicit incoming transition, every transition not shown (e.g., reading a ⊔ in state r10) implicitly goes to state qr.
A Turing machine (TM) is a 7-tuple M = (Q, Σ, Γ, δ, s, qa, qr), where
• Q is a finite set of states,
• Σ is the input alphabet, where ⊔ ∉ Σ,
• Γ is the tape alphabet, where Σ ⊊ Γ and ⊔ ∈ Γ \ Σ,
• s ∈ Q is the start state,
• qa ∈ Q is the accept state,
• qr ∈ Q is the reject state, where qa ≠ qr, and
• δ : (Q \ {qa, qr}) × Γ → Q × Γ × {L, R, S} is the transition function.
Example 8.2.2. We formally define the TM M1 = (Q, Σ, Γ, δ, s, qa, qr) described earlier, which decides the language P = {w ∈ {0, 1}∗ | w = w^R}:
• Q = {s, r00, r01, r10, r11, lx, l, qa, qr},
• Σ = {0, 1} and Γ = {0, 1, x, ⊔},
• s is the start state, qa is the accept state, qr is the reject state, and
• δ is shown in Figure 8.2.
2. C1 = (s, 1, x⊔^∞) (the initial/start configuration: the TM starts with its input x written at the leftmost part of the tape, with ⊔ written everywhere else),
If M neither accepts nor rejects x, then we say M loops (a.k.a., does not halt) on x.
end of lecture 5c
• w ∈ Γ∗ is the tape content, the string consisting of the symbols starting at the leftmost
position of the tape, until the rightmost non-blank symbol, or the largest position the
tape head has scanned, whichever is larger.
• p′ = max(1, p + m) (move the tape head, but don't move left if it's already at position 1)
• w′[i] = w[i] for all i ∈ {1, . . . , |w|} \ {p} (tape unchanged away from the tape head)
• if m = L or S, then |w′| = |w| (no need to grow the tape on a left/stay move)
• if m = R, then |w′| = |w| if p′ < |w| (don't grow the tape if the tape head was not already on the rightmost cell); otherwise |w′| = |w| + 1 and w′[p′] = ⊔ (grow the tape and put a ⊔ in the new position)
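To make these rules concrete, here is a Python sketch of a single step of this semantics, with a configuration represented as (q, p, w): the state, the 1-indexed head position, and the tape as a list of symbols. The dict-based δ and all names are illustrative; "_" stands in for the blank symbol ⊔.

# One step of the single-tape TM semantics above. delta maps
# (state, read symbol) -> (new state, written symbol, move).
BLANK = "_"
MOVE = {"L": -1, "S": 0, "R": +1}

def tm_step(delta, config):
    q, p, w = config                    # assumes w has at least one cell
    new_q, write, move = delta[(q, w[p - 1])]
    w = list(w)
    w[p - 1] = write                    # tape changes only under the head
    p2 = max(1, p + MOVE[move])         # don't move left of position 1
    if p2 > len(w):                     # grow the tape on a right move off
        w.append(BLANK)                 # the end, putting a blank there
    return (new_q, p2, w)

Iterating tm_step from the start configuration until the state is qa or qr reproduces the accept/reject/loop behavior described above.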
Another important reason is this: Many decision problems we study later will be of the
form, “given a Turing machine M , does L(M ) have some property?” (e.g., is nonempty, is
infinite, etc.) Every DFA, regex, NFA, and CFG has some unique language that it decides.
With Turing machines, this is true of recognizing but not deciding. That is, every Turing
machine M defines a unique language L(M ) that it recognizes. But if M is not total, it does
not decide any language. So Turing-recognizability gives us a way of making a one-to-one
equivalence between machines and languages, in a way that Turing decidability does not.
However, in the case of TMs that always halt, the TM decides the same language L(M ) that
it recognizes, so asking the question about L(M ) is not a restriction. It simply gives a way
for the question to be well-defined on all possible TMs, rather than only on those TMs that
always halt.
tapes, it is often convenient to move only some tape heads but let others stay put, so we
make more use of the S move.
Transition function type signature. We omit a detailed definition of the syntax and
semantics of multitape TMs. But it is worth pointing out that the key part of that definition,
the transition function, would have a type signature δ : Q × Γ^k → Q × Γ^k × {L, R, S}^k for a k-tape TM. For example, for a 3-tape TM,
δ(qi, 0, 1, 0) = (qj, 1, 1, 0, S, S, L)
means that, if the 3-tape TM is in state qi and heads 1 through 3 are reading symbols 0, 1, 0,
the machine goes to state qj , writes a 1 on the first tape, and moves the third tape head left.
Theorem 8.6.1. For every multitape TM M , there is a single-tape TM S such that L(M ) =
L(S). Furthermore, if M is total, then S is total.
Figure 8.3: Single-tape TM simulating a multitape TM. According to the formalization of S's tape alphabet in the proof of Theorem 8.6.1, the symbols on S's tape from left to right are ((0, N), (x, N)), ((1, H), (x, N)), ((0, N), (x, N)), ((1, N), (y, H)), ((1, N), (x, N)), ((0, N), (x, N)), ((⊔, N), (y, N)), ((⊔, N), (⊔, N)), but we write them in the figure in an easier-to-read fashion as a string of symbols with possible dots indicating the presence of one of M's tape heads.
3. If CM → C′M, how S can update CS to represent C′M.
There will not be a 1-1 correspondence of configurations between M and S: S will take
many transitions to simulate a single transition of M , so many consecutive configurations of
S will represent the same configuration of M .
The idea is shown in Figure 8.3. Say that M has k tapes. Each symbol in the tape
alphabet of S actually represents k symbols from the tape alphabet of M , as well as k bits
representing “is the i’th tape head of M located here?”, which will be true for exactly one
cell.
S scans its tape to find the symbols under M's simulated tape heads (marked with •); this information can be stored in the finite states of S since there are a fixed number of tapes and therefore a fixed number of •'s. Then, S will reset its tape head to the start, moving each • appropriately (either left, right, or stay), and writing new symbols on the tape at the old location of the •, according to M's transition function.
This allows S to simulate the computation done by M , accepting, rejecting, or looping
if and only if M does, so L(S) = L(M ). Since S halts if and only if M halts, S is total if M
is total.
Footnote 5: H means "head" and N means "no head".
end of lecture 6a
midterm exam
end of lecture 6b
This is why we study Turing machines. No one ever programmed Turing machines except as a math-
ematical exercise; they are a model of computation whose simplicity makes them easy to
handle mathematically (and whose definition is intended to model a mathematician sitting
at a desk with paper and a pencil), though this same simplicity makes them difficult to pro-
gram. We generally use Turing machines when we want to prove limitations on algorithms.
When we want to design algorithms, there is rarely a reason to use Turing machines instead
of pseudocode or a regular programming language.
Structured data and flat strings. It is common to write algorithms in terms of the data
structures they are operating on, even though these data structures must be encoded in
binary before delivering them to a computer (or in some alphabet Σ before delivering them
to a TM). Given any "discrete" object O, such as a string, graph, or tuple (or even a Turing machine itself), we use the notation ⟨O⟩ to denote the encoding of O as a string in the input alphabet of the Turing machine we are using. To encode multiple objects, we use the notation ⟨O1, O2, . . . , Ok⟩ to denote the encoding of the objects O1 through Ok as a single
string.
For example, consider the graph G with vertices {1, 2, 3, 4} and edges {1, 2}, {1, 3}, {2, 3}, {3, 4}. To encode G we can use its adjacency matrix
0 1 1 0
1 0 1 0
1 1 0 1
0 0 1 0
encoded as a binary string by concatenating the rows: ⟨G⟩ = 0110101011010010.
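A small Python sketch of this encoding (the function name and representation are illustrative):

# Encode a graph as a binary string by concatenating the rows of its
# adjacency matrix, as in the example above. Vertices are named 1..n.
def encode_graph(n, edges):
    edges = {frozenset(e) for e in edges}
    rows = ("".join("1" if frozenset((u, v)) in edges else "0"
                    for v in range(1, n + 1))
            for u in range(1, n + 1))
    return "".join(rows)

assert encode_graph(4, [(1, 2), (1, 3), (2, 3), (3, 4)]) == "0110101011010010"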
• 1931 - Kurt Gödel proves the incompleteness theorem: for logical systems such as
Peano arithmetic or ZFC, there are theorems which are true but cannot be proven in
the system.6 This leaves open the possibility of an algorithm that decides whether
the statement is true or false, even though the correctness of the algorithm cannot be
proven.7
• At this point in history, it remains the case that no one in the world knows exactly
what they mean by the word “algorithm”, or “computable function”.
Footnote 6: Essentially, any sufficiently powerful logical system can express the statement, "This statement is unprovable.",
which is either true, hence exhibiting a statement whose truth cannot be proven in the system, or false, meaning the
false theorem can be proved, and the system is contradictory.
Footnote 7: Lest that would provide a proof of the truth or falsehood of the statement.
• 1936 - Alonzo Church proposes λ-calculus (the basis of modern functional languages
such as LISP and Haskell) as a candidate for the class of computable functions. He
shows that it can compute a large variety of known computable functions, but his
arguments are questionable and researchers are not convinced.8
• 1936 - Alan Turing, as a first-year graduate student at the University of Cambridge in England, hears of the Entscheidungsproblem while taking a graduate class. He submits a
paper, “On computable numbers, with an application to the Entscheidungsproblem”,
to the London Mathematical Society, describing the Turing machine (he called them
a-machines) as a model of computation that captures all the computable functions
and formally defines what an algorithm is. He also shows that as a consequence of
various undecidability results concerning Turing machines, there is no solution to the
Entscheidungsproblem; no algorithm can indicate whether a given theorem is true or
false, in sufficiently powerful logical systems.
• Before the paper is accepted, Church’s paper reaches Turing from across the Atlantic.
Before final publication, Turing adds an appendix proving that Turing machines com-
pute exactly the same class of functions as λ-calculus.
• Turing's paper is accepted, and researchers in the field – Church included – are imme-
diately convinced by Turing’s physical arguments: he essentially argued that his model
was powerful enough to capture anything any person could do as they are sitting at a
desk with a pencil and paper, calculating according to fixed instructions.
The Church-Turing Thesis. All functions that can be computed in a finite amount of time
by a physical machine in the universe, can be computed by a Turing machine.
The statement known as the Church-Turing thesis is not a mathematical statement that
can be formalized in the same way as a theorem. It is a physical law, much like the Laws of
Thermodynamics or Maxwell’s equations. It is something observed to hold in practice, but
in principle it is refutable. In practice, however, attempts to refute it (a crank field known
as “hypercomputation”) tend to fail, in the same way that attempts to build a perpetual
motion machine (violating the second law of thermodynamics) inevitably fail.
Footnote 8: For instance, Emil Post accused Church of attempting to "mask this identification [of computable functions]
under a definition.” (Emil L. Post, Finite combinatory processes, Formulation I, The Journal of Symbolic Logic, vol.
1 (1936), pp. 103–105, reprinted in [27], pp. 289–303.)
Chapter 9
Efficient solution of problems: The class P
Standardized time using TMs. To do this we need a “standard” environment for comparison. This is one elegant usage of the model of Turing machines: they give a simple and clean mathematical way to measure the “number of steps” needed for an algorithm.
Bounds of the form $n^c$ for some constant $c$ are called polynomial bounds ($n$, $n^2$, $n^3$, $n^{2.2}$, $n^{1000}$, etc.). So when we say a function $f$ is polynomially bounded, this means that there exists a $c$ such that $f = O(n^c)$.
Bounds of the form $2^{n^\delta}$ for some real constant $\delta > 0$ are called exponential bounds ($2^n$, $2^{2n}$, $2^{100n}$, $2^{0.01n}$, $2^{n^2}$, $2^{n^{100}}$, $2^{\sqrt{n}}$, etc.).
So-called “big-O” notation, saying that $f = O(g)$, is analogous to saying that $f$ is “at most” (or “grows no faster than”) $g$. “Little-o” notation is a way to capture that $f$ is “strictly smaller than” (or “grows slower than”) $g$.
The formal definitions above are “official”, but some shortcuts are handy to remember, and
in practice, you will use these shortcuts far more often than you will need to resort to using
the above definitions directly.
2 Often one sees the slightly more complex definition that $f = O(g)$ if there exist $c, n_0 \in \mathbb{N}$ such that, for all $n \ge n_0$, $f(n) \le c \cdot g(n)$ (i.e., we require that $c \cdot g(n)$ exceed $f$ only on sufficiently large $n$, instead of all $n$). Why are these equivalent? Hint: $f$ and $g$ are both positive.
Figure 9.1: Example functions $f$ and $g$ obeying $f = O(g)$ and $g \neq O(f)$, but not $f = o(g)$. Define $g(n) = n$, and $f(n) = c!$ when $c! < n \le (c+1)!$, where $c \in \mathbb{N}^+$. In other words, $g$'s graph is the line of slope 1, and $f$ is constant between any two inputs of the form $c!$ (1, 2, 6, 24, 120, 720, ...), jumping up to the $g$ line upon reaching the next factorial. Since $f(n) \le g(n)$ for all $n$, $f = O(g)$. Also $g \neq O(f)$, because for any constant $c \in \mathbb{N}^+$, when $n = c! - 1$ (the value of $n$ just before $f$ jumps from $(c-1)!$ to $c!$, e.g., $n = 23$ or $n = 119$), then $g(n) = n > c! = c \cdot (c-1)! = c \cdot f(n)$. However, $\lim_{n\to\infty} \frac{f(n)}{g(n)} \neq 0$. The limit does not exist, since $\frac{f(n)}{g(n)}$ oscillates between 1 (when $n = c!$ for some $c \in \mathbb{N}^+$) and values that get arbitrarily close to 0 (when $n + 1 = c!$ for some $c \in \mathbb{N}^+$).
Write one function as the other times something unbounded. One way to see that
f (n) = o(g(n)) is to find h(n) so that f (n)·h(n) = g(n) for some h(n) that grows unboundedly.
For example, suppose we want to compare n and n log n: letting f (n) = n, g(n) = n log n,
and h(n) = log n, since f (n)·h(n) = g(n), and h(n) grows unboundedly, then f (n) = o(g(n)),
i.e., n = o(n log n).
Of course, this is simply a different way of saying that $\lim_{n\to\infty} \frac{f(n)}{g(n)} = 0$, since $\frac{f(n)}{g(n)} = \frac{f(n)}{h(n) \cdot f(n)} = \frac{1}{h(n)} \to 0$ if $h(n) \to \infty$.
Remove constants and lower-order terms. Another useful shortcut is to ignore all but the largest terms, and to remove all constants. For example, $10n^7 + 100n^4 + n^2 + 10n = O(n^7)$, and $2^n + n^{100} + 2n = O(2^n)$. This is useful when analyzing an algorithm, where you are counting steps from many portions of the algorithm, but only the “slowest part” (contributing the greatest number of steps) really makes a difference in the final analysis.
Taking logs and square roots makes growth rates smaller. Taking the log of anything, or taking the square root of anything (or more generally, raising it to a power smaller than 1), makes it strictly smaller, and raising it to a power larger than 1 makes it bigger. So $\log n = o(n)$, and $\sqrt{n} = o(n)$. Also, $\log n^4 = o(n^4)$ (actually, $\log n^4 = 4 \log n = O(\log n)$).
Memorize these. The following are used often enough that they are worth memorizing:
• $1 = o(\log n)$,
• $\log n = o(\sqrt{n})$ (or more generally $\log n = o(n^\alpha)$ for any $\alpha > 0$, e.g., $n^{1/100}$),
• $\sqrt{n} = o(n)$,
• $n^c = o(n^k)$ if $c < k$. For example, $n = o(n^2)$, $n^2 = o(n^3)$, and $n^3 = o(n^{3.01})$. $\sqrt{n} = o(n)$ is the special case for $c = 0.5$, $k = 1$.
• $n^k = o(2^{n^\delta})$ for any $k > 0$ and any $\delta > 0$ (even if $k$ is huge and $\delta$ is tiny, e.g., $n^{100} = o(2^{n^{0.001}})$). Also, $n^k = o(2^{\delta n})$ for any $k, \delta > 0$.
Take logs of two functions to see if they have different growth rates. If $\log f(n) = o(\log g(n))$, then $f(n) = o(g(n))$. This is useful for simplifying expressions. For example, how to compare $2^{n^2}$ and $n^n$? One has a larger exponent, and the other a larger base, so it's not obvious which grows faster. But taking the log of both, we get $n^2$ and $\log n^n = n \log n$. Since $n \log n = o(n^2)$, we know immediately that $n^n = o(2^{n^2})$. However, this doesn't work to go the other way: just because $\log f = \Theta(\log g)$ doesn't imply that $f = \Theta(g)$. For example, $n = \Theta(2n)$, but $2^n = o(2^{2n})$, since $2^{2n} = (2^n)^2$.
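To illustrate this shortcut numerically, here is a small sanity check (a hypothetical snippet, not from the notes' notebook): the values of $2^{n^2}$ and $n^n$ overflow floating point quickly, but their logs do not, and the ratio of the logs tends to 0, consistent with $n^n = o(2^{n^2})$.

import math

def log_f(n):
    return n**2 * math.log(2)  # log(2^(n^2)) = n^2 * log 2

def log_g(n):
    return n * math.log(n)     # log(n^n) = n * log n

for n in [10, 100, 1000, 10000]:
    # ratio tends to 0, so log(n^n) = o(log(2^(n^2))), hence n^n = o(2^(n^2))
    print(n, log_g(n) / log_f(n))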
end of lecture 6c
Theorem 9.2.3 (Time Hierarchy Theorem). Let $t_1, t_2 : \mathbb{N} \to \mathbb{N}$ be time bounds such that $t_1(n) \log t_1(n) = o(t_2(n))$. Then $\mathrm{TIME}(t_1) \subsetneq \mathrm{TIME}(t_2)$.
For instance, there is a language decidable in time $n^2$ that is not decidable in time $n$, and another language decidable in time $2^n$ that is not decidable in time $n^2$, etc.
Theorem 9.3.1. Let $t : \mathbb{N} \to \mathbb{N}$, where $t(n) \ge n$. Then every $t(n)$ time multitape TM can be simulated in time $O(t(n)^2)$ by a single-tape TM.
Proof. Recall the proof of Theorem 8.6.1. M's tape heads move right by at most $t(n)$ positions, so S's tape contents have length at most $t(n)$. Simulating one step of M requires moving S's tape by this length and back, which is $O(t(n))$ time. Since M takes $t(n)$ steps total, and S takes $O(t(n))$ steps to simulate each step of M, S takes $O(t(n)^2)$ total steps.
Note that the single-tape TM S is a little slower: it takes $O(t(n)^2)$ steps to simulate $t(n)$ steps of M. Could this be improved to $O(t(n))$? It turns out the answer is no: for example, the palindrome language $\{w \mid w = w^R\}$ can be solved in time $O(n)$ on a two-tape TM, but it provably requires $\Omega(n^2)$ time on a one-tape TM.
The lesson is that although all reasonable models of computation have running time
within a polynomial factor of each other, they sometimes are not within a constant factor of
each other.
9.4 Definition of P
9.4.1 The complexity class P.
Definition 9.4.1. Let $\mathrm{P} = \bigcup_{k=1}^{\infty} \mathrm{TIME}(n^k)$. In other words, P is the class of languages decidable in polynomial time on a deterministic, one-tape TM. But, for reasons we outline below, the easiest way to think of P is as the class of languages decidable using only a polynomial number of steps in your favorite programming language.
The list above can be encoded as 001100 01 11000011 01 1111, where bits 0 and 1 of the encoded string are represented as 00 and 11, respectively, and 01 is a delimiter to mark the boundary between strings. This increases the length of the string by at most a factor of 4 (the worst case is a list of single-bit strings).
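As a concrete sketch of this scheme (hypothetical code, not from the notes' notebook; decoding the example string above under this scheme yields the list 010, 1001, 11):

def encode_list(strings):
    """Encode a list of binary strings: 0 -> 00, 1 -> 11,
    with 01 as a delimiter between consecutive strings."""
    doubled = ["".join(2 * bit for bit in s) for s in strings]
    return "01".join(doubled)

print(encode_list(["010", "1001", "11"]))
# prints 0011000111000011011111, i.e., 001100 01 11000011 01 1111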
This frees us to talk about the “size” of an input without worrying too much about
whether we mean the number of bits in an encoding of the input, or whether we mean some
more intuitive notion of size, such as the number of nodes or edges (or both) in a graph, the
number of strings in a list of strings, etc.
We will analyze the time complexity of algorithms at a high level (rather than at the level
of individual transitions of a TM), assuming, as in ECS 36C and ECS 122A, that individual
lines of code in C++, Python, or pseudocode take constant time, so long as those lines are
not masking loops or recursion or something that would not take constant time (such as a
Python list comprehension, which is shorthand for a loop).
The code below can also be found on this GitHub page:
https://github.com/dave-doty/UC-Davis-ECS120/blob/master/120.ipynb
Directed versus undirected graphs. The simplest way to write code is to assume that all graphs are directed. In this case, an undirected graph G = (V, E) is the special case of a directed graph obeying the constraint that (u, v) ∈ E ⟺ (v, u) ∈ E for all u, v ∈ V. The Python notebook provides some utility methods for converting between this and the representation of an undirected edge as a two-element set. Also, the function add_reverse_edges takes a directed graph and adds the reverse edge of every directed edge, if not already present. This is used in many examples to make it easier to specify an undirected graph.
1  import collections
2  def path(G, s, t):
3      """G: graph
4      s, t: nodes in G
5      Check if there is a path from node s to node t in graph G."""
6      V, E = G
7      visited = []
8      queue = collections.deque([s])
9      while queue:
10         node = queue.popleft()
11         if node == t:
12             return True
13         if node not in visited:
14             visited.append(node)
15             node_neighbors = [v for (u, v) in E if u == node]
16             for neighbor in node_neighbors:
17                 if neighbor not in visited:
18                     queue.append(neighbor)
19     return False
So the outer loop (line 9) executes $O(n^2)$ iterations. Putting this all together, it takes time $O(n^2 \cdot (n + m + n \cdot n)) = O(n^4)$, which is polynomial time.
We can test this out on a few graphs, shown in Fig. 9.2 and in the code samples below.
Figure 9.2: Directed (left) and undirected (right) versions of a graph. In the directed graph, there’s
a path from 1 to 4 but not from 4 to 1. In the undirected graph, there are paths in both directions.
In both graphs, there is no path between any of {1, 2, 3, 4, 5} and any of {6, 7, 8}.
V = [1,2,3,4,5,6,7,8]
E = [(1,2), (2,3), (3,4), (4,5), (5,2), (6,7), (7,8), (8,6)]

G = (V, E)
print(f"path from 4 to 1? {path(G, 4, 1)}")
print(f"path from 1 to 4? {path(G, 1, 4)}")
Now let’s try it on an undirected graph (we make the graph undirected by adding a
reverse edge for each existing edge):
def add_reverse_edges(G):
    """Makes directed graph undirected by adding reverse edges."""
    V, E = G
    reverse_edges = []
    for (u, v) in E:
        if (v, u) not in E and (v, u) not in reverse_edges:
            reverse_edges.append((v, u))
    return (V, E + reverse_edges)

G = add_reverse_edges(G)
print(f"path from 4 to 1? {path(G, 4, 1)}")
print(f"path from 1 to 4? {path(G, 1, 4)}")
print(f"path from 4 to 7? {path(G, 4, 7)}")
There are an exponential number of simple paths from s to t in the worst case ((n − 2)!
paths in the complete directed graph with n vertices), but we do not examine them all
in a breadth-first search. The BFS takes a shortcut to zero in on one particular path in
polynomial time.
Of course, we know from studying algorithms that with an appropriate graph representa-
tion, such as an adjacency list, and with a more careful analysis that doesn’t make so many
worst-case assumptions, this time can be reduced to O(n + m). But in computational com-
plexity theory, since we have decided in advance to consider any polynomial running time to
be “efficient”, we can often be quite lazy in our analysis and choice of data structures and
still obtain a polynomial running time.
Time lower bounds. Typically we study upper bounds on running time. The running time upper bound of $O(n^4)$ leaves open the possibility that path could take time $n^3$ or $n^2$, since these are both $O(n^4)$. But sometimes we want to know how tight the analysis is, meaning we want a lower bound on the running time of the algorithm.
If the running time is the maximum number of steps over all inputs of length $n$, then showing an upper bound of $u(n)$ means showing that all inputs of length $n$ take time $O(u(n))$. But since it is defined as a maximum, showing a lower bound of $\ell(n)$ requires only showing that, for each input length $n$, there exists an input of length $n$ requiring $\Omega(\ell(n))$ time.
By analogy, if I want to prove that the maximum number in a set A is at most u, then
I need to show that all numbers x ∈ A obey x ≤ u. However, if I want to prove that the
maximum number in a set A is at least `, then I need only show that some number x ∈ A
obeys x ≥ `. x may not be the maximum itself, but if not, then the maximum m is even
larger, so m ≥ ` as well.
Let's do such an analysis on the path algorithm. In this case, we want to show that for each $n$, there is a graph $G = (V, E)$ with $n = |V|$ requiring time $\Omega(n)$. This won't tell us the exact running time, but we now know it is some function between $n$ and $n^4$ (perhaps $n^2$, $n^2 \log n$, $n$, or $n^4$).
One nice thing about showing a lower bound on an algorithm’s running time is that
we can ignore certain steps. Since ignoring them means we are under-counting the actual
number of steps, the lower bound still holds. We will simply exhibit a graph that requires
every node other than t to be visited before t will be visited. We don’t have to calculate the
exact time required to visit a node to know that it is at least one step. So if we need to visit
every other node before visiting t, then on that graph, path takes at least n steps, since it
end of lecture 7a
Proof. Since the input size is $|\langle x, y \rangle| = O(\log x + \log y)$ (the number of bits needed to represent $x$ and $y$), we must be careful to use an algorithm that is polynomial in $n = |\langle x, y \rangle|$, not polynomial in $x$ and $y$ themselves, which would be exponential in the input size.
Euclid’s algorithm for finding the greatest common divisor of two integers works.
def gcd(x, y):
    """x, y: positive integers
    Euclid's algorithm for greatest common divisor."""
    while y > 0:
        x = x % y
        x, y = y, x
    return x

def rel_prime(x, y):
    return gcd(x, y) == 1

print(gcd(24, 60), rel_prime(24, 60))  # gcd(24,60) = 12
print(gcd(25, 63), rel_prime(25, 63))  # gcd(25,63) = 1
Each loop iteration cuts the value of $x$ in half, because $x \bmod y < y$, so if $y \le x/2$, then $x \bmod y < y \le x/2$; otherwise $y > x/2$, and then $x \bmod y = x - y < x/2$. Therefore, at most $\log x < |\langle x, y \rangle|$ iterations of the loop execute, and each iteration requires $O(1)$ arithmetic operations, each polynomial-time computable, whence rel_prime is polynomial time.
Note that there are an exponential (in $|\langle x, y \rangle|$) number of integers that could potentially be common divisors of $x$ and $y$ (namely, all the integers less than $\min\{x, y\}$), but Euclid's algorithm does not check all of them to see if they divide $x$ or $y$; it uses a shortcut to skip most of them.
Figure 9.3: Graphs with/without Eulerian cycles. Among the connected graphs, G1 has even degree at every node, so it has an Eulerian cycle (a, b, c, e, b, f, e, d, a), but G2 has no Eulerian cycle since c and f have odd degree. Among the disconnected graphs, G3 has an Eulerian cycle, since one connected component is G1 and all other components are isolated nodes; but G4 has no Eulerian cycle: even though each of its connected components has an Eulerian cycle, there is no single cycle through the whole graph.
Let $n = |V|$ and $m = |E|$. The loop executes $n = |V|$ iterations, each of which executes degree, which iterates over each of the $m$ edges, taking time $O(m)$. So the loop takes $O(nm)$ time, which is polynomial in $|\langle G \rangle|$. It also calls connected on a potentially smaller graph than G, which we showed also takes polynomial time.
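For concreteness, here is a minimal self-contained sketch of the test just described (hypothetical code; the notes' own degree and connected helpers are defined elsewhere), reusing the path function from earlier:

def has_eulerian_cycle(G):
    """G: undirected graph given as directed pairs in both directions.
    Test: every node has even degree, and all nodes touching an edge
    lie in a single connected component."""
    V, E = G
    # each undirected edge {u,v} appears as (u,v) and (v,u),
    # so the degree of v is the number of pairs leaving v
    for v in V:
        if sum(1 for (u, w) in E if u == v) % 2 != 0:
            return False
    non_isolated = [v for v in V if any(u == v for (u, w) in E)]
    if not non_isolated:
        return True  # no edges at all: trivially has an (empty) Eulerian cycle
    s = non_isolated[0]
    return all(path(G, s, t) for t in non_isolated)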
• In ignoring polynomial differences, we can make conclusions that apply to any model
of computation, since they are polynomially equivalent, rather than having to choose
one such model, such as TM’s or C++, and stick with it. Our goal is to understand
computation in general, rather than an individual programming language.
• One objection is that some polynomial running times are not feasible, for instance, $n^{1000}$. In practice, there are few algorithms with such running times. Nearly every algorithm known to have a polynomial running time has a running time less than $n^{10}$. Also, when the first polynomial-time algorithm for a problem is discovered, such as the $O(n^{12})$-time algorithm for Primes discovered in 2002,6 it is usually brought down within a few years to a polynomial running time with a smaller degree, once the initial insight that gets it down to polynomial inspires further research. Primes is currently known to have an $O(n^6)$-time algorithm, and this will likely be improved in the future.
6 Here, $n \approx \log p$ is the size of the input, i.e., the number of bits to represent an integer $p$ to be tested for primality.
1. Although TIME(t) is different for different models of computation, P is the same class
of languages, in any model of computation polynomially equivalent to single-tape TM’s
(which is all of them worth studying, except possibly for quantum computers, whose
status is unknown).
2. P roughly corresponds to the problems feasibly solvable by a deterministic algorithm.7
7 Here, “deterministic” is intended both to emphasize that P does not take into account nondeterminism, which is an unrealistic model of computation, but also that it does not take into account randomized algorithms, which is a realistic model of computation. BPP, the class of languages decidable by polynomial-time randomized algorithms, is actually conjectured to be equal to P, though this has not been proven.
Chapter 10
Efficient verification of solutions: The class NP
HamPath is not known to have a polynomial-time algorithm (and it is generally believed not
to have one), but, a related problem, that of verifying whether a given path is a Hamiltonian
path in a given graph, does have a polynomial-time algorithm:
The algorithm simply verifies that each adjacent pair of nodes in p is connected by an edge
in G (so that p is a valid path in G), and that each node of G appears exactly once in p (so
that p is Hamiltonian).
def ham_path_verify(G, p):
    """G: graph
    p: list of nodes
    Verify that p is a Hamiltonian path in G."""
    V, E = G
    # verify each pair of adjacent nodes in p shares an edge
    for i in range(len(p) - 1):
        if (p[i], p[i+1]) not in E:
            return False
    # verify p and V have same number of nodes
    if len(p) != len(V):
        return False
    # verify each node appears at most once in p
    if len(set(p)) != len(p):
        return False
    return True
Why is this polynomial-time? Let $n = |V|$. It executes a for loop for $|p| - 1$ iterations, where $|p| \le n$, each of which checks a pair of nodes for membership in E, which takes time $O(|E|) = O(n^2)$. The comparison of $|p|$ to $|V|$ takes $O(1)$ time, and converting p from a list to a set in the final if statement takes time $O(n \log n)$ for common set data structures.
Let’s try out the verifier code on some candidate paths on the graph shown in Fig. 10.1.
V = [1,2,3,4,5,6]
E = [(1,2), (2,4), (2,5), (4,3), (3,5), (3,1), (5,6)]
G = (V, E)

p_bad = [1,2,3,4,5,6]  # not a path
print(ham_path_verify(G, p_bad))

p_bad2 = [1,2,4,3,1,2,4,3,5,6]  # not simple
print(ham_path_verify(G, p_bad2))

p_bad3 = [1,2,4]  # not enough nodes
print(ham_path_verify(G, p_bad3))

p_good = [1,2,4,3,5,6]  # is a Hamiltonian path
print(ham_path_verify(G, p_good))
end of lecture 7b
That is, $x \in A$ if and only if there is a “short” string $w$ where $\langle x, w \rangle \in L(V)$ (where “short” means bounded by a polynomial in $|x|$). We call such a string $w$ a witness (or a proof or certificate) that testifies that $x \in A$.
Example 10.2.3. For a graph $G$ with a Hamiltonian path $p$, $\langle p \rangle$ is a witness testifying that $G$ has a Hamiltonian path.
Example 10.2.4. For a composite integer $n$ with a divisor $1 < d < n$, $\langle d \rangle$ is a witness testifying that $n$ is composite. Note that $n$ may have more than one such divisor; this shows that a witness for an element of a polynomially-verifiable language need not be unique.
• NP: given a potential solution, we can quickly verify that it’s correct.
NP is, in fact, a very natural notion, because it is far more common than not that real
problems we want to solve have this character that, whether or not we know how to find a
solution efficiently, we know what a correct solution looks like, i.e., we can verify efficiently
whether a purported solution is, in fact, correct.
For instance, it may not be obvious to me, given an encrypted file C, how to find the
decrypted version of it. But if you give me the decrypted version P and the encryption key
k, I can run the encryption algorithm E to verify that E(P, k) = C.
Optimization. Suppose I want to find the cheapest flights from Sacramento to Rome.
This is an optimization problem. To cast this as a decision problem, we add a new input,
a threshold value, and ask whether there are flights costing under the threshold: e.g., given
two cities C1 and C2 and a budget of b dollars, we ask whether there is a sequence of flights
from C1 to C2 costing at most b dollars. It may not be obvious to me how to find an airline
route from Sacramento to Rome that costs less than $1200. But if you give me a sequence of
flights with their ticket prices, I can easily check that their total cost is ≤ $1200, that they
start at Sacramento and end at Rome, and that the destination airport of each intermediate
leg is the starting airport of the next leg.
Search. Not all problems in NP are variants of optimization problems. For example, the
problem of determining whether a number is prime or not seems to be “inherently Boolean”:
the number is prime or it isn’t. Even looking at it like a search problem—given n, find
the prime factorization of n—doesn’t seem to involve any optimization. All integers have a
unique prime factorization.
However, there is a sense in which all NP problems are search problems. Namely, if a problem is in NP, then it has a polynomial-time verifier that verifies purported witnesses $w$ for instances $x$. The equivalent search problem is then: given $x$, find a witness $w$, or report that none exists.
For example, for the decision problem of determining if a graph $G$ has a Hamiltonian path, the equivalent search problem is to find a Hamiltonian path of $G$ if it exists. For the Boolean satisfiability decision problem of determining whether a formula $\varphi$ has a satisfying assignment, the equivalent search problem is to find a satisfying assignment if it exists.
More generally, if a decision problem $A$ is in NP, then it has a polynomial-time verifier $V$ such that, for all $x \in \{0,1\}^*$, $x \in A \iff (\exists w \in \{0,1\}^{p(|x|)})\ V(x, w)$ accepts. The equivalent search problem is, given $x$, find a $w$ such that $V(x, w)$ accepts.2
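The examples below call clique_verifier, whose definition falls outside this excerpt; a minimal sketch consistent with how it is called here (hypothetical, not necessarily the notes' exact code) is:

def clique_verifier(G, k, C):
    """G: graph, k: integer, C: list of nodes.
    Verify that C consists of k distinct nodes that are pairwise adjacent."""
    V, E = G
    if len(C) != k or len(set(C)) != k:
        return False
    for u in C:
        for v in C:
            if u != v and (u, v) not in E:
                return False
    return True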
V = [1,2,3,4,5,6]
E = [(1,2), (1,3), (1,4), (2,3), (2,4), (3,4), (4,5), (5,6), (4,6)]
G = (V, E)
G = add_reverse_edges(G)

C = [1,2,3,4]
k = len(C)
print(f"{C} is a {k}-clique in G? {clique_verifier(G, k, C)}")
# true

C = [3,4,5]
print(f"{C} is a {k}-clique in G? {clique_verifier(G, k, C)}")
# false

C = [4,5,6]
print(f"{C} is a {k}-clique in G? {clique_verifier(G, k, C)}")
# false

C = [1,3,4,5]
print(f"{C} is a {k}-clique in G? {clique_verifier(G, k, C)}")
# false
For example, $\langle \{4, 11, 16, 21, 27\}, 25 \rangle \in$ SubsetSum because $4 + 21 = 25$, but $\langle \{4, 11, 16\}, 13 \rangle \notin$ SubsetSum.
Note that the complements of these languages, $\overline{\text{Clique}}$ and $\overline{\text{SubsetSum}}$, are not obviously members of NP (and are believed not to be). The class coNP is the class of languages whose complements are in NP, so $\overline{\text{Clique}}, \overline{\text{SubsetSum}} \in \text{coNP}$. It is not known whether coNP = NP, but this is believed to be false, although proving that is at least as difficult as proving that P ≠ NP: since P = coP, if coNP ≠ NP, then P ≠ NP.
Figure 10.3: We know that P ⊆ NP, so either P ⊊ NP, or P = NP. We don't know which is true, but many conjecture P ⊊ NP.
The best known method for solving NP problems in general is the one employed in
the proof of Theorem 10.4.1 in the next subsection: a brute-force search over all possible
witnesses, giving each as input to the verifier.
end of lecture 7c
import itertools  # k (the witness-length exponent) and V_A (the verifier) are assumed defined earlier

def binary_strings_of_length(length):
    """Generate all strings of a given length"""
    return map(lambda lst: "".join(lst),
               itertools.product(["0", "1"], repeat=length))

def A_decider(x):
    """Exponential-time algorithm for finding witnesses."""
    n = len(x)
    for m in range(n ** k + 1):  # check lengths m in [0,1,...,n^k]
        for w in binary_strings_of_length(m):
            if V_A(x, w):
                return True
    return False
We now analyze the running time. The two loops iterate over all binary strings of length at most $n^k$, of which there are $\sum_{m=0}^{n^k} 2^m = 2^{n^k + 1} - 1 = O(2^{n^k})$. Each iteration calls $V_A(x, w)$, which takes time $O(n^c)$. Then the total time is $O(n^c \cdot 2^{n^k}) = O(2^{2n^k})$.
This is shown by example for HamPath below. It is a bit more efficient than above: rather than searching all possible witness binary strings up to some length, it uses the fact that any valid witness will encode a list of nodes, that the list will have length $n = |V|$, and that it will be a permutation of the nodes (each node in V will appear exactly once). (ham_path_verify is the same as the verification algorithm we used above, except that the checks intended to ensure that p is a permutation of V have been removed for simplicity, since now we are calling ham_path_verify only with p a permutation of V.)
import itertools

def ham_path_verify(G, p):
    """G: graph (V, E)
    p: permutation of V (list of unique nodes of length len(V))
    Verify that p is a Hamiltonian path in G."""
    V, E = G
    # verify each pair of adjacent nodes in p shares an edge
    for i in range(len(p) - 1):
        if (p[i], p[i+1]) not in E:
            return False
    return True

def ham_path(G):
    """G: graph
    Exponential-time algorithm for finding Hamiltonian paths,
    which calls the verifier on all potential witnesses."""
    V, E = G
    for p in itertools.permutations(V):
        if ham_path_verify(G, p):
            return True
    return False
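As a quick (hypothetical) usage check on the six-node graph from the earlier ham_path_verify examples:

V = [1,2,3,4,5,6]
E = [(1,2), (2,4), (2,5), (4,3), (3,5), (3,1), (5,6)]
print(ham_path((V, E)))  # True: (1,2,4,3,5,6) is a Hamiltonian path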
No one has proven that NP ≠ EXP. It is known that P ⊆ NP ⊆ EXP, and since the Time Hierarchy Theorem tells us that P ⊊ EXP, it is known that at least one of the inclusions P ⊆ NP or NP ⊆ EXP is proper, though it is not known which one. It is suspected that they both are; i.e., that P ⊊ NP ⊊ EXP.
0 ∨ 0 = 0    0 ∧ 0 = 0    ¬0 = 1
0 ∨ 1 = 1    0 ∧ 1 = 0    ¬1 = 0
1 ∨ 0 = 1    1 ∧ 0 = 0
1 ∨ 1 = 1    1 ∧ 1 = 1
A Boolean formula is an expression involving Boolean variables and the three operations ∧, ∨, and ¬ (negation). For example,
$$\varphi = (x \wedge y) \vee (z \wedge \neg y),$$
which may also be written with overbar notation for negation as $\varphi = (x \wedge y) \vee (z \wedge \bar{y})$.
More formally, given a finite set V of variables (we write variables as single letters such as a and b, sometimes subscripted, e.g., $x_1, \ldots, x_n$ for n variables), a Boolean formula over V is either 0) (base case) a variable x ∈ V, or 1) (recursive case 1) ¬φ, where φ is a Boolean formula over V (called the negation of φ), 2) (recursive case 2) φ ∧ ψ, where φ and ψ are Boolean formulas over V (called the conjunction of φ and ψ), or 3) (recursive case 3) φ ∨ ψ, where φ and ψ are Boolean formulas over V (called the disjunction of φ and ψ).
3 Statements such as “(Some NP-complete problem is in P) =⇒ P = NP” are often called “If pigs could whistle, then donkeys could fly” theorems.
¬ takes precedence over both ∧ and ∨, and ∧ takes precedence over ∨. Parentheses may
be used to override default precedence.
        elif self.op == 'or':
            return int(self.left.evaluate(assignment) or self.right.evaluate(assignment))
        else:
            raise ValueError("This shouldn't be reachable")

    def __repr__(self):
        if self.variable:
            return self.variable
        elif self.op == 'not':
            return '(not {})'.format(self.right)
        else:
            return '({} {} {})'.format(self.left, self.op, self.right)

    def __str__(self):
        return repr(self)

    @staticmethod
    def from_string(text):
        """Convert string that looks like a Python Boolean expression with
        variables, e.g. "x and y or not z and (a or b)", to Boolean_formula."""
        # add plenty of whitespace to make it easy to tokenize with string.split()
        for token in ['and', 'or', 'not', '(', ')']:
            text = text.replace(token, ' ' + token + ' ')
        tokens = text.split()
        val_stack = []
        op_stack = []
        for token in tokens:
            if token in ['and', 'or', 'not']:
                cur_op = token
                while len(op_stack) > 0 and not precedence_greater(cur_op, op_stack[-1]):
                    process_top_op(op_stack, val_stack)
                op_stack.append(cur_op)
            elif token == '(':
                op_stack.append('(')
            elif token == ')':
                while op_stack[-1] != '(':
                    process_top_op(op_stack, val_stack)
                op_stack.pop()
            else:
                val_stack.append(Boolean_formula(variable=token))
        while len(op_stack) > 0 and not precedence_greater(cur_op, op_stack[-1]):
            process_top_op(op_stack, val_stack)
        return val_stack.pop()

def process_top_op(op_stack, val_stack):
    """Processes top operator from op_stack, popping one or two values as needed
    from val_stack, and pushing the result back on the value stack."""
    op = op_stack.pop()
    right = val_stack.pop()
    if op == 'not':
        val_stack.append(Boolean_formula(op='not', right=right))
    elif op in ['and', 'or']:
The following code builds the formula $\varphi = (x \wedge y) \vee (z \wedge \neg y)$ and then evaluates it on all $2^3 = 8$ assignments to its three variables.
formula = Boolean_formula.from_string("((x and y) or (z and (not y)))")
import itertools
num_variables = len(formula.variables)
for assignment in itertools.product(["0", "1"], repeat=num_variables):
    assignment = "".join(assignment)
    value = formula.evaluate(assignment)
    print("formula value = {} on assignment {}".format(value, assignment))
we mean when we say a problem is at least as hard as another. Let's discuss a bit what we might mean intuitively by this. First, by “hard”, we mean with respect to the running time required to solve the problem. If problem A can be decided in time $O(n^3)$, whereas problem B requires time $\Omega(n^6)$, this means that B is harder than A: the fastest algorithm for A is faster than the fastest algorithm for B.
But, we will also do what we have been doing so far and relax our standards of comparison to ignore polynomial differences. So, suppose that B is actually decidable in time $O(n^6)$ and no smaller: there is an $O(n^6)$ time algorithm for B and every algorithm deciding B requires $\Omega(n^6)$ time. Then, since $n^6$ is within a polynomial factor of $n^3$, because $(n^3)^2 = n^6$ (i.e., $n^6$ is “only” quadratically larger than $n^3$), we won't consider this difference significant, and we will say that the hardness of A and B are close enough to be “equivalent”, since they are within a polynomial factor of each other.4
However, suppose there is a third problem C, and we could prove it requires time $\Omega(2^n)$ to decide. (We have difficulty proving this for particular natural problems, but the Time Hierarchy Theorem assures us that such problems must exist.) Then we will say that C really is harder than A or B, since $2^n$ is not within a polynomial factor of either $n^3$ or $n^6$. Even composing $n^6$ with a huge polynomial like $n^{1000}$ gives the polynomial $(n^6)^{1000} = n^{6000} = o(2^n)$.
But, as we said, it is often difficult to take a particular natural problem that we care
about and prove that no algorithm with a certain running time can solve it. Obtaining
techniques for doing this sort of thing may well be the most important open problem in
theoretical computer science, and after decades we still have very few tools to do so. That
is to say, it is difficult to pin down the absolute hardness of a problem, the running time of
the most efficient algorithm for the problem.
What we do have, on the other hand, is a way to compare the relative hardness of two problems. What reducibility allows us to do is to say: okay, fine, I don't know the best algorithm for A, and I also don't know the best algorithm for B, but what I do know is that, if A is “reducible” to B, then any algorithm for B can be used to solve A with only a polynomial amount of extra time. Therefore, whatever is the fastest algorithm for B (even if I don't know what it is, or how fast it is), I know that the fastest algorithm for A is “almost” as fast (perhaps a polynomial factor slower). So in particular, if B is decidable in polynomial time, then so is A (perhaps with a larger exponent in the polynomial).
end of lecture 8a
4 By this metric, all polynomial-time algorithms are within a polynomial factor of each other. But there are other polynomially-equivalent running times that are larger than polynomial. For instance, a $2^n$-time algorithm is within a polynomial factor of an $n^5 \cdot 2^n$-time algorithm, or even a $4^n$-time algorithm, since $4^n = (2^n)^2$. However, a $2^n$-time algorithm is not within a polynomial factor of a $2^{n^2}$-time algorithm, since $2^{n^2}$ is not bounded by a polynomial function of $2^n$; i.e., there is no polynomial $p$ such that $2^{n^2} = O(p(2^n))$. Even if $p(n) = n^{1000}$, we still get $p(2^n) = (2^n)^{1000} = 2^{1000n}$, which is $o(2^{n^2})$ because $1000n = o(n^2)$.
Figure 10.4: A way to “reduce” the problem of finding an independent set of some size in a graph $G_i$ to finding a clique of the same size. We transform the graph $G_i$ on the left to $G_c$ on the right by taking the complement of the set of edges. Each independent set in $G_i$ (such as the four shaded nodes) becomes a clique in $G_c$.
The idea is shown in Figure 10.4. To decide whether $\langle G_i, k \rangle \in$ IndSet, we map the graph $G_i = (V, E)$ to the graph $G_c = (V, \overline{E})$ (i.e., for each pair of nodes in V, add an edge if there wasn't one, and otherwise remove the edge if there was one), and then we give $\langle G_c, k \rangle$ to the decider for Clique. Since each independent set in $G_i$ becomes a clique in $G_c$, for each $k$, $G_i$ has a $k$-independent set if and only if $G_c$ has a $k$-clique. Thus, the decider for Clique can be used as a subroutine to decide IndSet, by transforming the input $\langle G_i, k \rangle$ into $\langle G_c, k \rangle$, passing the transformed input to the decider for Clique, and then returning its answer.
Figure 10.5: A mapping reduction f that reduces decision problem A ⊆ {0, 1}∗ to B ⊆ {0, 1}∗ .
The key visual property above is that no arrow points from inside A to outside B, nor from outside
A to inside B. Either both endpoints are in the shaded regions, or both are not.
    return clique_algorithm(Gp, kp)

def clique_algorithm(G, k):
    raise NotImplementedError()
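The fragment above shows only the final lines; a hypothetical completion of the reduction (our sketch, not necessarily the notes' exact code) is:

def indset_algorithm(G, k):
    """Decide IndSet using a (hypothetical) decider for Clique: complement
    the edge set, so that k-independent sets of G correspond exactly to
    k-cliques of the complemented graph."""
    V, E = G
    Gp = (V, [(u, v) for u in V for v in V if u != v and (u, v) not in E])
    kp = k
    return clique_algorithm(Gp, kp)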
More formally, defining IndSet $= \{\langle G, k \rangle \mid G$ has a $k$-independent set$\}$, we have just shown that IndSet $\le_P$ Clique. In this special case, the reduction above also shows that Clique $\le_P$ IndSet, since one can use an IndSet-decider as a subroutine to decide Clique in the exact same way. But this is a special case that does not hold in general: a reduction from A to B is not in general also a reduction from B to A, and it may be that $A \le_P B$ but $B \not\le_P A$.5
General formulation. The following theorem captures one way to formalize the claim “$A \le_P B$ means that A is no harder than B”:
Theorem 10.6.4. Suppose A ≤P B. If B ∈ P, then A ∈ P.
Proof. The idea of how to use a polynomial-time mapping reduction is shown in Figure 10.6.
Figure 10.6: How to compose a mapping reduction f : {0, 1}∗ → {0, 1}∗ that reduces decision
problem A to decision problem B, with an algorithm MB deciding B, to create an algorithm MA
deciding A. If MB and f are both computable in polynomial time, then their composition MA is
computable in polynomial time as well.
Let $M_B$ be the algorithm deciding B in time $n^k$ for some constant $k$, and let $f$ be the reduction from A to B, computable in time $n^c$ for some constant $c$. Define the algorithm
def f(x):
    raise NotImplementedError()
    # TODO: code for f, reducing A to B, goes here

def M_B(y):
    raise NotImplementedError()
    # TODO: code for M_B, deciding B, goes here
5 For instance, we believe this is true of NP-complete problems: The statement that P ≠ NP is equivalent to the statement that Path $\le_P$ Clique but Clique $\not\le_P$ Path, since Path ∈ P and Clique is NP-complete.
def M_A(x):
    """Compose reduction f from A to B, with algorithm M_B deciding B,
    to get algorithm M_A deciding A."""
    y = f(x)
    output = M_B(y)
    return output
$M_A$ correctly decides A because $x \in A \iff f(x) \in B$. Furthermore, on input x of length n, the line y = f(x) runs in time $n^c$, and assuming in the worst case that the length of y is $n^c$, the line output = M_B(y) runs in time $(n^c)^k = n^{ck}$. Thus $M_A$ runs in time at most $n^c + n^{ck} \le n^{c(k+1)}$ for sufficiently large $n$.
Theorem 10.6.4 tells us that if the fastest algorithm for B takes time t(n), then the fastest
algorithm for A takes no more than time p(n)+t(p(n)), where p is the running time of f ; i.e.,
p is the “polynomial factor” when claiming that “A is no harder than B within a polynomial
factor”. Since we ignore polynomial differences when defining P, if it is also true that t is
a polynomial (i.e., B ∈ P), then we conclude that A ∈ P as well, since p(n) + t(p(n)) is
bounded by a polynomial in n.
This is where things begin to get a bit abstract compared to previous definitions. The
main way that we use reductions is to show that efficient algorithms do not exist for a
problem, by invoking the contrapositive of Theorem 10.6.4:
then this will remind you that you should start with an instance x of A, and end with an
instance f (x) of B.
Unlike the previous examples of code implementing these ideas, we cannot give concrete
examples of MA and MB above, because for the types of problems we consider using these
reductions, we believe that no efficient algorithms MA and MB exist for the two problems
A and B. There is, however, a concrete, efficiently computable reduction f from A to B for
many important choices of A and B. We cover one of these choices next.
end of lecture 8b
10.6.7 Reduction between problems with different data types: 3Sat ≤P IndSet
We now use ≤P -reductions to show that IndSet is “at least as hard” (to within a polynomial
factor) as a restricted version of the Sat problem known as 3Sat.
To construct a polynomial-time reduction from 3Sat to another language, we transform
the variables and clauses in 3Sat into structures in the other language. These structures
are called gadgets. For example, to reduce 3Sat to IndSet, nodes “simulate” variables and
triples of nodes “simulate” clauses. 3Sat is not the only NP-complete language that can
be used to show other problems are NP-complete, but its regularity and structure make it
convenient for this purpose.
6 The obvious dual of CNF is disjunctive normal form (DNF), which is an Or of conjunctions, such as the formula one would derive applying the sum-of-products rule to the truth table of a Boolean function, but 3DNF formulas do not have the same nice properties that 3CNF formulas have, so we do not discuss them further.
Figure 10.7: Example of the ≤P -reduction from 3Sat to IndSet when the input is the 3CNF
formula φ = (x ∨ x ∨ y) ∧ (x ∨ y ∨ y) ∧ (x ∨ y ∨ y). Observe that x = 0, y = 1 is a satisfying
assignment. Furthermore, if we pick exactly one node in each triple, corresponding to a literal that
satisfies the clause associated to that triple, it is a k-independent set: since we are picking the
literal that satisfies each clause, we never pick both a literal and its negation, so every node we pick
lacks an edge to every other node. In this example, the formula is satisfied by x = 0, y = 1, and
picking nodes y on the left, one of the y nodes on the right, and x on the top is a 3-independent
set.
Proof. Given a 3CNF formula φ, the reduction must output a pair hG, ki, a graph G and
integer k, so that
( =⇒ ): Suppose φ has a satisfying assignment w, i.e., φ(w) = 1. Then at least one literal is true in every clause. To construct a k-independent set S, select exactly one node from each clause labeled by a true literal (breaking ties arbitrarily). For every u, v ∈ S where u ≠ v, condition (1) is false, and since $x$ and $\bar{x}$ cannot both be true, condition (2) is false. Therefore S is a k-independent set.
( ⇐= ): Suppose there is a k-independent set S in G. For every u, v ∈ S where u ≠ v, u and v are in different triples by condition (1). Since there are k triples and |S| = k, S contains exactly one node from each triple. If a node in S is labeled with literal $x$, assign x = 1. If the node is labeled with literal $\bar{x}$, assign x = 0. Assign other variables arbitrarily (e.g., set them to 0).
Since no pair of nodes in S are labeled with $x$ and $\bar{x}$ by condition (2), this assignment is well-defined (we will not attempt to assign x to be both 0 and 1). The assignment makes every clause true, thus satisfies φ.
            E.append({u, v})
    k = len(cnf_formula.clauses)
    return ((V, E), k)
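The lines above are only the tail of the reduction code; a self-contained sketch of the whole construction (hypothetical code, with literals represented as (variable, negated) pairs rather than the notes' formula objects) is:

def reduce_3sat_to_indset(clauses):
    """clauses: list of 3-tuples of literals (variable_name, negated).
    Output: (graph, k) such that the formula is satisfiable iff
    the graph has a k-independent set."""
    V = [(i, j) for i in range(len(clauses)) for j in range(3)]
    E = []
    for (i, j) in V:
        for (i2, j2) in V:
            if (i, j) >= (i2, j2):
                continue
            x, neg = clauses[i][j]
            x2, neg2 = clauses[i2][j2]
            if i == i2:                    # condition (1): nodes in the same triple
                E.append({(i, j), (i2, j2)})
            elif x == x2 and neg != neg2:  # condition (2): a literal and its negation
                E.append({(i, j), (i2, j2)})
    k = len(clauses)
    return ((V, E), k)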
Theorems 10.6.4 and 10.6.6 tell us that if IndSet is decidable in polynomial time, then
so is 3Sat. In terms of what we believe is actually true, Corollary 10.6.5 and Theorem 10.6.6
tell us that if 3Sat is not decidable in polynomial time, then neither is IndSet.
Since IndSet is equivalent to Clique, this also shows that if 3Sat is not decidable in
polynomial time, then neither is Clique.
10.6.8 Reductions are algorithms, but they don’t solve either problem.
A reduction is just an algorithm. What makes them difficult to understand when first
learning the concept is that they are used to relate the difficulty of two problems, but not to
solve either problem. The reduction above, showing that 3Sat ≤P IndSet, is an algorithm
that takes an instance of 3Sat as input, but it does not solve the problem 3Sat. It also
does not solve the problem IndSet. It transforms an instance of 3Sat into an instance of
IndSet, while preserving the correct answer, but without ever knowing what that answer is.
The job of the reduction is to translate the question about a formula into a question about
a graph, not to answer either question.
A language B is NP-complete if:
1. B ∈ NP, and
2. B is NP-hard.
Proof. Assume the hypothesis. Since P ⊆ NP, it suffices to show NP ⊆ P. Let A ∈ NP.
Since B is NP-hard, A ≤P B. Since B ∈ P, by Theorem 10.6.4 (the closure of P under
≤P -reductions), A ∈ P. Since A was arbitrary, NP ⊆ P.
Since it is generally believed that P 6= NP, Corollary 10.7.4 implies that showing a problem
is NP-complete is evidence of its intractability.
The following property of polynomial-time reductions will be useful:
Figure 10.8: Polynomial-time reductions are transitive: simply compose the algorithms.
Proof. The idea is shown in Fig. 10.8. Let A, B, C ⊆ {0, 1}∗ where A ≤P B via reduction f
and B ≤P C via reduction g. Then h = g ◦ f (i.e., for all x ∈ {0, 1}∗ , define h(x) = g(f (x)))
is a polynomial-time reduction of A to C: h can be computed in polynomial time since both
f and g can, and for all x ∈ {0, 1}∗ , x ∈ A ⇐⇒ f (x) ∈ B, and f (x) ∈ B ⇐⇒ g(f (x)) ∈ C,
so x ∈ A ⇐⇒ h(x) ∈ C.
The following theorem, using the transitivity of reductions proven in Observation 10.7.5,
is generally how one shows that a problem C is NP-hard: find some other NP-hard problem
B and show B ≤P C.
The following Python code shows how one would implement this idea.
def reduce_A_to_B(x):
    """reduction from some NP language A to NP-hard language B"""
    raise NotImplementedError()

def reduce_B_to_C(x):
    """reduction showing B reduces to C"""
    raise NotImplementedError()

def C_decider(x):
    """hypothetical polynomial-time decider for C"""
    raise NotImplementedError()

def A_decider(x):
    """composition of two reductions to show how a decider for C
    can be called to decide A"""
    y = reduce_A_to_B(x)
    z = reduce_B_to_C(y)
    return C_decider(z)
Figure 10.9: If an NP-complete problem reduces to C, and C is also in NP, then C is NP-complete
as well. This is because of transitivity: all the NP problems reduce to B, and B reduces to C, so
by composing these, all the NP problems reduce to C through B.
Recall that we also showed IndSet $\le_P$ Clique via the reduction $\langle (V, E), k \rangle \mapsto \langle (V, \overline{E}), k \rangle$. So Clique is NP-hard as well, since the NP-hard problem IndSet reduces to it. We also showed both Clique and IndSet are in NP, so they are NP-complete.
8 By Theorem 10.7.3, Theorem 10.7.8 implies Theorem 10.5.1. In other words, because Sat is NP-complete, if Sat ∈ P, then since all A ∈ NP are reducible to Sat, A ∈ P as well, i.e., P = NP. Conversely, if P = NP, then Sat ∈ P (and so is every other problem in NP).
Figure 10.10: An example of the reduction from 3Sat to VertexCover for the 3CNF formula
φ = (x ∨ x ∨ y) ∧ (x ∨ y ∨ y) ∧ (x ∨ y ∨ z) ∧ (x ∨ y ∨ z), with m = 3 variables and l = 4 clauses.
φ is satisfied by x = 0, y = 1, z = 0, and we use this assignment to find a vertex cover C with
|C| = m + 2l = 11 as described in the proof; nodes in C are shown with a bold outline.
end of lecture 8c
Memorial Day
end of lecture 9a
Note that adding nodes to a vertex cover cannot remove its ability to touch every edge; hence if G = (V, E) has a vertex cover of size $k$, then it has a vertex cover of each size $k'$ where $k \le k' \le |V|$. Therefore it does not matter whether we say “of size k” or “of size at most k” in the definition of VertexCover.
Proof. We must show that VertexCover is in NP, and that some NP-complete problem
reduces to it.
of the clause gadget are in C, the external edge of the third clause gadget node is covered by a node from a variable gadget, whence the assignment satisfies the corresponding clause.
Reduction from IndSet: Instead of reducing from 3Sat, an elegant way to see that VertexCover is NP-hard is to observe that IndSet $\le_P$ VertexCover by a particularly simple reduction $\langle G, k \rangle \mapsto \langle G, n - k \rangle$, where $n$ is the number of nodes (a code sketch follows this paragraph).
This works because S is an independent set in G = (V, E) if and only if V \ S is a
vertex cover, and if |S| = k then |V \ S| = n − k. To see the forward direction, note
that no pair of nodes in S has an edge that needs to be covered. Thus, if we pick
all nodes in V \ S, they cover all edges between nodes in V \ S, and since S is an
independent set, all remaining edges must be between a node in S and a node in V \ S,
so the set V \ S covers these edges as well. Conversely, if V \ S is a vertex cover, then
no pair of nodes in S can have an edge between them (since it would be uncovered by
V \ S), so S must be an independent set. In fact, this very same reduction also shows
VertexCover ≤P IndSet; they are nearly equivalent problems, in the same way
that Clique is nearly equivalent to IndSet (recall that the reduction from IndSet
to Clique simply takes the complement of E).
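In code, the reduction just described is a one-liner (a hypothetical sketch mirroring the notes' style):

def reduce_indset_to_vertexcover(G, k):
    """The graph is unchanged; only the parameter changes from k to n - k."""
    V, E = G
    return (G, len(V) - k)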
We gave two reductions in the proof; there are many ways to prove a problem is NP-hard,
and you just have to find one. The first one reducing 3Sat to VertexCover was much
more complex than the second that used a reduction from IndSet. Why did we go through
all that work then, if there was an easier way?
Sometimes the simplest reduction isn’t so simple as the second one above. It’s good
to see an example of a more complex reduction between two objects of different “types”
(in this case reducing a Boolean formula question to a graph question), to see how to do
them when the correspondence between two problems is not so obvious. But, when you are
trying to show a problem is NP-complete, generally the most efficient strategy is to start by
looking for existing NP-complete problems about the same type of object. Often you find
that some existing NP-complete problem is very close to your problem (such as IndSet and
VertexCover), and the reduction is very straightforward. If that doesn’t work, then move
on to other NP-complete problems. For some reason, 3Sat often works well.
      x1  x2  x3  x4  c1  c2  c3
y1     1   0   0   0   1   0   1
z1     1   0   0   0   0   1   0
y2         1   0   0   0   1   0
z2         1   0   0   1   0   0
y3             1   0   1   1   0
z3             1   0   0   0   1
y4                 1   0   0   1
z4                 1   0   0   0
g1                     1   0   0
h1                     1   0   0
g2                         1   0
h2                         1   0
g3                             1
h3                             1
t      1   1   1   1   3   3   3
The upper-left and bottom-right of the table contain exactly one 1 per row as shown, the bottom-left is all empty (leading 0's), and $t$ is $m$ 1's followed by $l$ 3's. The upper-right of the table has 1's to indicate which literals ($y_i$ for $x_i$ and $z_i$ for $\bar{x}_i$) are in which clause. Thus each column in the upper-right has exactly three 1's.
The table has size $O((m + l)^2)$, so the reduction can be computed in time $O(n^2)$, since $m, l \le n$.
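A hypothetical sketch of building the table as actual numbers (our own code, with clauses given as 3-tuples of (variable_index, negated) pairs, variable indices in 1..m; digits are read off in base 10):

def reduce_3sat_to_subsetsum(clauses, m):
    """Return (S, t): the rows y_i, z_i, g_j, h_j as integers, and target t."""
    l = len(clauses)
    S = []
    for i in range(1, m + 1):
        for negated in (False, True):          # y_i row, then z_i row
            digits = [0] * (m + l)
            digits[i - 1] = 1                  # leading 1 in column x_i
            for j, clause in enumerate(clauses):
                if (i, negated) in clause:
                    digits[m + j] = 1          # literal occurs in clause c_j
            S.append(int("".join(map(str, digits))))
    for j in range(l):                         # slack rows g_j and h_j
        digits = [0] * (m + l)
        digits[m + j] = 1
        num = int("".join(map(str, digits)))
        S.extend([num, num])
    t = int("1" * m + "3" * l)                 # target: m 1's then l 3's
    return S, t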
We now show that
$$\varphi \text{ is satisfiable} \iff (\exists S' \subseteq S)\ t = \sum_{n \in S'} n.$$
3 in the bottom row, $S'$ contains at least one row with a 1 in column $j$ in the upper-right, since only two 1's are in the lower-right. Hence $c_j$ is satisfied for every $j$, whence $\varphi(x_1 x_2 \ldots x_m) = 1$.
Finally, we don’t prove it in this course, but HamPath is NP-complete. So are the
following variants:
each $x_i$, and the value of each gate is determined by computing the associated Boolean function of its inputs. A circuit γ is satisfiable if there is an input string x that satisfies γ; i.e., such that γ(x) = 1.11
Define
$$\text{CircuitSat} = \{ \langle \gamma \rangle \mid \gamma \text{ is a satisfiable Boolean circuit} \}.$$
To show that $A \le_P$ CircuitSat, we transform an input $x$ to A into a circuit $\gamma_x^V$ such that $\gamma_x^V$ is satisfiable if and only if there is a $w \in \{0,1\}^{q(n)}$ such that $V(x, w)$ accepts.13
Let $V = (Q, \Sigma, \Gamma, \delta, s, q_a, q_r)$ be the Turing machine computing the verifier for A. V takes two inputs, $x \in \{0,1\}^n$ and the witness $w \in \{0,1\}^{q(n)}$. $\gamma_x^V$ contains constant gates representing $x$, and its $q(n)$ input variables represent a potential witness $w$. We design $\gamma_x^V$ so that $\gamma_x^V(w) = 1$ if and only if $V(x, w)$ accepts.
We build a subcircuit $\gamma_{\text{local}}$.14 $\gamma_{\text{local}}$ has $3m$ inputs and $m$ outputs, where $m$ depends on V – but not on $x$ or $w$ – as described below. Assume that each state $q \in Q$ and each symbol $a \in \Gamma$ is represented by a binary string,15 called $\sigma_q$ and $\sigma_a$, respectively.
11 The only difference between a Boolean formula and a Boolean circuit is the unbounded out-degree of the gates. A Boolean formula has out-degree one, so that when expressed as a circuit, its gates form a tree (although note that even in a formula the input variables can appear multiple times and hence have larger out-degree), whereas a Boolean circuit, by allowing unbounded fanout from its gates, allows the use of shared subformulas. (For example, “Let ϕ = (x1 ∧ x2) in the formula φ = x1 ∨ ϕ ∧ (x4 ∨ ϕ).”) This is a technicality that will be the main obstacle to proving that CircuitSat $\le_P$ 3Sat, but not a difficult one.
12 To show that CircuitSat is NP-hard, we show how any verification algorithm can be simulated by a circuit, in such a way that the verification algorithm accepts a string if and only if the circuit is satisfiable. The input to the circuit will not be the first input to the verification algorithm, but rather, the witness.
13 In fact, $w$ will be the satisfying assignment for $\gamma_x^V$. The subscript $x$ is intended to emphasize that, while $x$ is an input to V, it is hard-coded into $\gamma_x^V$; choosing a different input $y$ for the same verification algorithm V would result in a different circuit $\gamma_y^V$. The key idea will be that circuits can simulate algorithms. We prove this by showing that any Turing machine can be simulated by a circuit, as long as the circuit is large enough to accommodate the running time and space used by the Turing machine.
14 Many copies of $\gamma_{\text{local}}$ will be hooked together to create $\gamma_x^V$.
15 This is done so that a circuit may process them as inputs.
Proof. 3Sat ∈ NP for the same reason that Sat ∈ NP: the language
Observe that, for example, an ∧ gate with inputs $a$ and $b$ and output $c$ is operating correctly if
$$(a \wedge b \implies c) \wedge (a \wedge \bar{b} \implies \bar{c}) \wedge (\bar{a} \wedge b \implies \bar{c}) \wedge (\bar{a} \wedge \bar{b} \implies \bar{c}).$$
Applying the fact that the statement $p \implies q$ is equivalent to $\bar{p} \vee q$ and DeMorgan's laws gives the expressions in equation (10.9.1).
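For instance (our own expansion; equation (10.9.1) itself is not reproduced in this excerpt), the four implications above become the 3CNF clauses
$$(\bar{a} \vee \bar{b} \vee c) \wedge (\bar{a} \vee b \vee \bar{c}) \wedge (a \vee \bar{b} \vee \bar{c}) \wedge (a \vee b \vee \bar{c}).$$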
22 The main obstacle to simulating a Boolean circuit with a Boolean formula is that circuits allow unbounded fan-out and formulas do not. The naïve way to handle this would be to make a separate copy of the subformula representing a non-output gate of the circuit, one copy for each output wire. The problem is that this could lead to an exponential increase in the number of copies, as subformulas could be copied an exponential number of times if they are part of larger subformulas that are also copied. Our trick to get around this will actually lead to a formula in 3CNF form.
23 φ is not equivalent to γ: φ has more input bits than γ. But it will be the case that φ is satisfiable if and only if γ is satisfiable; it will simply require specifying more bits to exhibit a satisfying assignment for φ than for γ.
24 $w_j$ being the only input if $g_j$ is a ¬ gate, and each input being either a γ-input variable $x_i$ or a φ-input variable $y_i$ representing an internal wire in γ.
To express that γ is satisfied, we express that the output gate outputs 1, and that all gates are properly functioning:
$$\varphi = (y_1 \vee y_1 \vee y_1) \wedge \bigwedge_{j=1}^{s} \psi_j.$$
The only way to assign values to the various $y_j$'s to satisfy φ is for γ to be satisfiable, and furthermore for the value of $y_j$ to actually equal the value of the output wire of gate $g_j$ in γ, for each $1 \le j \le s$.25 Thus φ is satisfiable if and only if γ is satisfiable.
Some of these problems, such as sorting and searching, have obvious efficient algorithms,28
although clever tricks like divide-and-conquer recursion could reduce the time even further
for certain problems such as sorting. Some problems, such as linear programming and max-
imum flow, seemed to have exponential brute-force algorithms as their only obvious solu-
tion, although clever tricks (Dantzig’s simplex algorithm in the case of linear programming,
Ford-Fulkerson in the case of network flow)29 would allow for a much faster algorithm than
brute-force search. Still other problems, such as the traveling salesman problem and vertex
cover, resisted all efforts to find an efficient exact algorithm.
After decades of effort by hundreds of researchers to solve such problems failed to produce
efficient algorithms, many suspected that these problems had no efficient algorithms. The
theory of NP-completeness provides an explanation for why these problems are not feasibly
solvable. They are important problems regardless of whether they are contained in NP; what
makes them most likely intractable is the fact that they are NP-hard. In other words, they
are intractable because they are at least as hard as every problem in a class that apparently
(though not yet provably) defines a larger class of problems than P. In better-understood
models of computation such as finite automata, a simulation of a nondeterministic machine
by a deterministic machine necessitates an exponential blowup (in states in the case of finite
automata). Our intuition is that, though this has not been proven, the same exponential
blowup (in running time now instead of states) is necessary when simulating an NTM with a deterministic TM. The NP-complete problems are believed to be difficult to the extent that this intuition is correct. While not a proof that those problems are intractable, it does imply that if those problems were tractable, then something would be seriously wrong with our current understanding of computation: magical nondeterministic computers could be simulated by real computers for negligible extra cost.30
Prior to the theory of NP-completeness, algorithms researchers – and engineers and scien-
tists in need of those algorithms – were lost in the dark, only able to conclude that a decidable
problem was probably too difficult to have a fast algorithm if considerable cost invested in
attempting to solve it had failed to produce results. The theory of NP-completeness provides
a systematic method of, if not proving, at least providing evidence for the intractability of
a problem: show that the problem is NP-complete by reducing an existing NP-complete
such as: “If transportation with trucks costs $10/soldier, $4/ton of food, subject to the constraint that each truck
can carry weight subject to the linear tradeoff of 1 ton of food for every 10 soldiers, and each soldier requires 20
pounds of food/week, how can we move the maximum number of soldiers in 30 trucks, for less than $50,000?”
28
Though it would not be until the late 1960’s that anyone would even think of polynomial time as the default
definition of “efficient”.
29
Although technically the original versions of both of those methods were exponential time in the worst case, they
nonetheless avoided brute-force search in ingenious ways, and led eventually to provably polynomial-time algorithms.
30
We have deeper reasons for thinking that P ≠ NP besides the hand-waving argument, “C’mon! How could they be
equal?!” For example, if P = NP, then cryptography as we know it would be impossible, and the discovery of proofs
of mathematical theorems could be automated. In a depressing philosophical sense, creativity itself could be automated: as Scott
Aaronson said, “Everyone who could appreciate a symphony would be Mozart; everyone who could follow a step-by-
step argument would be Gauss; everyone who could recognize a good investment strategy would be Warren Buffett.”
http://www.scottaaronson.com/blog/?p=122
problem to it.31
Scott Aaronson’s paper NP-complete Problems and Physical Reality 32 contains convincing
arguments justifying the intuition that NP contains fundamentally hard problems. Anyone
who would consider taking the position that “The lack of a proof that P ≠ NP implies
that there is a non-negligible chance that NP-complete problems are tractable” owes it to
themselves to read that paper.
31
For this we probably have Richard Karp to thank more than Cook or Levin: one year after the publication
of Cook’s 1971 paper showing Sat is NP-complete, Karp published a paper listing 21 famous, important, and practical
decision problems, involving graphs, job scheduling, circuits, and other combinatorial objects, showing them all to
be NP-complete by reducing Sat to them (or reducing them to each other). After that, the floodgates opened, and
by the mid-1970’s, hundreds of natural problems had been shown to be NP-complete.
32
http://arxiv.org/abs/quant-ph/0502072
Chapter 11
Undecidability
But before getting to it, we will first use the machinery of reductions, together with
Theorem 11.1.1, to show that other problems are undecidable, by showing that if they could
be decided, then so could Halts, contradicting Theorem 11.1.1.
11.1.2 Reducibility
Although it is standard to use reductions as a formal tool to prove problems are undecidable,
we don’t formally use them in this chapter. For the curious, they are similar to the
(mapping) reductions used for NP-completeness, but without the polynomial-time constraint.
We also allow a more general type of reduction (known as a Turing reduction), which is closer
to what most people would think of as “using one algorithm as a subroutine to write another”.
mapping (many-one) reducibility. These are the sorts of reductions we used in showing
problems are NP-complete. A mapping reduction f reducing A to B is the special case when
the algorithm MA is of the form “on input x, compute y = f (x) and return MB (y).” In
other words, MB is called as a subroutine exactly once, at the end of the algorithm, and its
answer is returned as the answer for the whole algorithm. With more general reductions, we
allow the reduction to change the answer, or to call MB more than once.
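In Python pseudocode, a mapping reduction has the following shape (f and M_B are hypothetical placeholders, not functions defined in these notes):

def M_A(x):
    """Sketch of a mapping reduction from A to B: f is a computable function
    with x in A if and only if f(x) in B, and M_B is a decider for B."""
    y = f(x)       # compute the reduction function
    return M_B(y)  # call M_B exactly once and return its answer unchanged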
From now on, for convenience we will mostly deal with Python functions, rather than
strings that compile into Python functions, knowing that if we had source code instead, we
could simply use exec to convert it to a Python function.
First, suppose we knew already that Haltsε was undecidable, but not whether Halts
was undecidable. Then we could easily use the undecidability of Haltsε to prove that
Halts is undecidable, by showing a reduction from Haltsε to Halts: suppose for the sake
of contradiction that Halts is decidable by TM H. Then to decide whether M (ε) halts, we
call H with input ⟨M, ε⟩.
That is, an instance ⟨M⟩ of Haltsε is a simple special case of an instance ⟨M, w⟩ of
Halts (where w = ε), so Halts is at least as difficult to decide as Haltsε . Proving that
Haltsε is undecidable means showing the other direction: Haltsε is at least as difficult to
decide as Halts.
Proof. Suppose for the sake of contradiction that Haltsε is decidable by algorithm Hε .
def H_e(M):
    """Function that supposedly decides whether M("") halts."""
    raise NotImplementedError()
Define the algorithm H deciding Halts as
def H(M, w):
    """Decider for HALTS that works assuming H_e works, giving
    a contradiction and proving H_e cannot be implemented."""
    def T_Mw(x):
        """Halts on every string if M halts on w, and loops on
        every string if M loops on w."""
        M(w)
    return H_e(T_Mw)
If M (w) halts, then TM,w halts on all inputs (including ε), and if M (w) does not halt,
then TM,w halts on no inputs (including ε). Then H decides Halts, a contradiction.
Note that TM,w depends on M and w, so the program will behave differently if different
M or w are passed to H. M and w are “hardcoded constants” in TM,w .
Pay particular attention to what H is doing: importantly, it is not running TM,w . It
defines the algorithm TM,w . However, it is possible to define an algorithm, meaning, to state
what steps would be executed if the algorithm were to run, without actually running it. Every
time you are programming, you do this. You write code, but the code doesn’t execute until
you run it. Similarly, when a Python function like H executes the def statement defining the
“local function” T_Mw, this creates the function, but it does not execute the function. The
next line return H_e(T_Mw) similarly does not execute T_Mw. It merely passes the function
object as input to another function.
For example, the following code defines f:
def f(x):
    print("Returning {} + 1".format(x))
    return x + 1
Run it. Nothing happens. It defined f, but did not run f. Now run this code:
def f(x):
    print("Returning {} + 1".format(x))
    return x + 1

print("Returned value: {}".format(f(3)))
It should print
Returning 3 + 1
Returned value: 4
because this time, after defining f, the code actually runs f.
Intuitively, think of the difference between library code, versus a program with a main
function. Library code generally doesn’t run when you import the library. For instance,
executing import math in Python gives you access to all the functions (such as math.exp)
defined in the math library, but doesn’t run any of them.
end of lecture 9b
The following questions about the behavior of a given program P on a given input x are all undecidable:
• Does P accept x?
• Does P reject x?
• Does P loop on x?
• Does P (x) ever execute line 320?
Similar questions about general program behavior over all strings are also undecidable,
i.e., given a program P :
• Does P accept at least one string?
• Does P halt on every string?
However, there are certainly questions one can ask about a program P that are decidable.
For example, if the question is about the syntax instead of the behavior, then the problem
can be decidable:
• Does P have at least 100 lines of code? (Turing machine equivalent of this question
would be, “Does M have at least 100 states?”)
• (Interpreting P as a TM) Does P have 2 and 3 in its input alphabet?
• Does P halt immediately because its first line is a return statement not calling an-
other function? (Turing machine equivalent of this question would be, “Does M halt
immediately because its start state is equal to the accept or reject state?”)
• Does P have a subroutine named f ?
• (Interpreting P as a TM) Do P ’s transitions always move the tape head right? (If
so, then we could decide whether P halts on a given input, because it either halts
after moving the tape head beyond the input upon reading enough blanks, or repeats
a non-halting state while reading blanks and will therefore move the tape head right
forever.)
• Does P “clock itself” to halt after n³ steps on inputs of length n, guaranteeing that
its running time is O(n³)? One way to do this is, before doing anything else, to write
the string 0^{n³} on a worktape, then reset the tape head to the left, then move that tape
head right once per step (in addition to the computation being done with all the other
tapes), and immediately halt when that worktape head reaches the blank to the right of the
last 0.1
Furthermore, many questions about the behavior of P on input x actually are decidable,
as long as the question contains a “time bound” that restricts the search, for example, given
P and x:
that don’t move the tape head beyond n³. If neither of these happens, P (x) will halt
and the answer is “no”.
end of lecture 9c
The following Python code implements the “convert enumerator to acceptor” direction:
def create_acceptor_from_enumerator(enumerator):
    def acceptor(input_to_acceptor):
        for enumerated_output in enumerator():
            if enumerated_output == input_to_acceptor:
                return True
        return False  # reached only if the enumerator halts (finite language)
    return acceptor
So, if we give it the primes enumerator above, it creates a Python function that recognizes
prime numbers (but doesn’t halt on composites!):
primes_acceptor = create_acceptor_from_enumerator(primes_enumerator)
print(primes_acceptor(2))  # prints True
print(primes_acceptor(3))  # prints True
print(primes_acceptor(5))  # prints True
print(primes_acceptor(7))  # prints True
print(primes_acceptor(9))  # runs forever
Python doesn’t have a way to call a function and run it for only a certain number of
steps, so there is no direct Python implementation of the idea of the other direction in the
proof of Theorem 11.5.1. We can do something similar, however, using the Python threading
library. To take an algorithm M recognizing A, and create an enumerator E enumerating
A, we can run an infinite loop that starts M on each possible input x ∈ Σ∗ in a separate
thread. If M (x) does not halt then the thread will never terminate, but for any M (x) that
does halt, E can check whether it accepted and print x if so.
import sys, itertools, threading

# for python 3 include the next line
from queue import Queue

# for python 2 include the next line
# from Queue import Queue

def binary_strings():
    """Enumerates all binary strings in length-lexicographical order."""
    for length in itertools.count():
        for s in itertools.product(["0", "1"], repeat=length):
            yield "".join(s)

# WARNING: this runs forever, consuming processor and memory, and has to be killed
def create_enumerator_from_recognizer(recognizer, inputs=None):
    """Given recognizer, a function that defines a language by returning True if
    its input string is in the language and otherwise either returning False
    or looping, this function enumerates over strings in the language.

    inputs is an iterator yielding valid inputs to recognizer. Default is binary
    strings in length-lexicographical order."""
    if inputs is None:  # avoid a shared default generator across calls
        inputs = binary_strings()

    # define queue to put outputs (and the associated input) into
    outputs_queue = Queue()

    def compute_output_and_add_to_queue(input_string):
        output = recognizer(input_string)
        if output == True:
            outputs_queue.put((input_string, output))

    # run recognizer on all possible inputs and add outputs to outputs_queue
    recognizer_threads = []

    def start_all_recognizer_threads():
        for input_string in inputs:
            thread = threading.Thread(target=compute_output_and_add_to_queue,
                                      args=[input_string])
            thread.daemon = True
            thread.start()
            recognizer_threads.append(thread)
        # if finite number of inputs, wait until all threads have stopped
        # and then add a new item to the queue to indicate all inputs are done
        for thread in recognizer_threads:
            thread.join()
        outputs_queue.put((None, None))

    # (a sketch of the remainder: start the threads in the background,
    # then yield each accepted input as it appears on the queue)
    def enumerator():
        starter = threading.Thread(target=start_all_recognizer_threads)
        starter.daemon = True
        starter.start()
        while True:
            input_string, output = outputs_queue.get()
            if input_string is None:  # all (finitely many) inputs were processed
                return
            yield input_string

    return enumerator
We can test it out, but if you actually run this, be sure to kill the process manually. It
will run forever and continue to consume resources.
A more reasonable testing option is to set inputs to iterate over a bounded number of
inputs:
import itertools

enumerate_primes = create_enumerator_from_recognizer(is_prime, inputs=range(100))

for i, p in zip(itertools.count(), enumerate_primes()):
    print("{}'th prime is {}".format(i, p))
        if w == x_no:
            return False
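To make this concrete, here is one possible sketch of create_decider_from_recognizers, using the same threading and Queue approach as above (a plausible reconstruction, not necessarily the full listing from which the fragment above is taken): it runs both recognizers on w in parallel daemon threads; exactly one of them eventually returns True, and the decider returns the corresponding answer.

import threading
from queue import Queue

def create_decider_from_recognizers(recognizer_in, recognizer_out):
    """Sketch (a reconstruction): given a recognizer for a language A and a
    recognizer for its complement, return a decider for A."""
    def decider(w):
        answers = Queue()
        def run(recognizer, answer):
            if recognizer(w) == True:  # may loop forever; that's okay
                answers.put(answer)
        for recognizer, answer in [(recognizer_in, True), (recognizer_out, False)]:
            thread = threading.Thread(target=run, args=[recognizer, answer])
            thread.daemon = True
            thread.start()
        return answers.get()  # blocks until whichever recognizer accepts w answers
    return decider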
We can test it as follows. Here, we make recognizers for the problem A = “is a given
number prime?”, and its complement A = “is a given number composite?”, and we inten-
tionally let the recognizers run forever when the answer is no, in order to demonstrate that
the decider returned by create_decider_from_recognizers still halts.5
def loop():
    """Loop forever."""
    while True:
        pass

def prime_recognizer(n):
    """Return True if n is prime; loop forever if it is not."""
    if n < 2:  # 0 and 1 are not primes
        loop()
    for x in range(2, int(n**0.5) + 1):
        if n % x == 0:
            loop()
    return True

def composite_recognizer(n):
    """Return True if n is not prime; loop forever if it is prime."""
    if n < 2:  # 0 and 1 are not primes
        return True
    for x in range(2, int(n**0.5) + 1):
        if n % x == 0:
            return True
    loop()

# WARNING: will continue to consume resources after it halts
# because it starts up threads that run forever
prime_decider = create_decider_from_recognizers(prime_recognizer,
                                                composite_recognizer)
for n in range(2, 20):
    print("{:2} is prime? {}".format(n, prime_decider(n)))
Figure 11.1: All the functions f : {1, 2} → {3, 4, 5}. None of them are onto.
Figure 11.1 shows that there is no onto function f : {1, 2} → {3, 4, 5}: there are only two
values of f , f (1) and f (2), but there are three values 3,4,5 that must be mapped to, so at
least one of 3, 4, or 5 will be left out (will not be an output of f ), so f will not be onto. We
conclude the obvious fact that |{1, 2}| < |{3, 4, 5}|.
Conversely, if there is an onto function f : A → B, then we can say that |A| ≥ |B|. For
instance, f (3) = f (4) = 1 and f (5) = 2 is an onto function f : {3, 4, 5} → {1, 2}, so we
conclude the obvious fact that |{3, 4, 5}| ≥ |{1, 2}|.
N vs. Z. Let’s consider two infinite sets: N = {0, 1, 2, . . .} and Z = {. . . , −2, −1, 0, 1, 2, . . .}.
What’s an onto function from Z to N? The absolute value function works: f (n) = |n|.
In fact, for any sets A, B where A ⊆ B, we have |A| ≤ |B|,6 i.e., there is an onto function
f : B → A. What is it? One choice is this: pick some fixed element a0 ∈ A. Then define f
for all x ∈ B by f (x) = x if x ∈ A, and f (x) = a0 otherwise. For f : Z → N, one possibility
is to let f (n) = n if n ≥ 0 and let f (n) = 0 if n < 0.
6
For instance, we think of the integers as being at least as numerous as the even integers.
How about going the other way? What’s an onto function from N to Z? This doesn’t
look quite as easy; it seems like there are “more” integers than nonnegative integers, since
N ( Z. But, there is an onto function:
f (0) = 0
f (1) = 1
f (2) = −1
f (3) = 2
f (4) = −2
f (5) = 3
f (6) = −3
...
Symbolically, we can express f as f (n) = −n/2 if n is even and ⌈n/2⌉ if n is odd. But
functions don’t always need to be expressed symbolically. The partial enumeration above is
a perfectly clear way to express the same function.
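In Python, this function might be written as follows (the name f_N_to_Z is just for illustration):

def f_N_to_Z(n):
    """Maps N onto Z in the order 0, 1, -1, 2, -2, 3, -3, ..."""
    if n % 2 == 0:
        return -(n // 2)
    else:
        return (n + 1) // 2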
N vs. Q+ . Is there an onto function f : N → Q+ ?
With N, there’s one nice way to think about onto functions f with N as the domain. We
can think of defining f : N → Q+ via r0 = f (0), r1 = f (1), . . ., where each rn ∈ Q+ . In
other words, we can describe f by describing a way to enumerate all the rational numbers
in order r0 , r1 , . . .. What order should we pick?
Each positive rational number r is defined by a pair of integers n, d ∈ N+ , where r = n/d.
The following order doesn’t work: r0 = 1/1, r1 = 1/2, r2 = 1/3, r3 = 1/4, . . . , r? = 2/1, r? = 2/2,
r? = 2/3, . . . There are infinitely many possible denominators d, so we can’t set n = 1 and iterate
through all possible d before changing n. We need some way of changing both n and d to
make sure all pairs of positive integers appear in this list.
Here’s one way to do it: 1/1, 1/2, 2/1, 1/3, 2/2, 3/1, 1/4, 2/3, 3/2, 4/1, 1/5, 2/4, 3/3, 4/2, 5/1, 1/6, . . . In other
words, enumerate all n, d ∈ N+ where n + d = 2 (which is just n = d = 1), then all
n, d ∈ N+ where n + d = 3 (of which there are 2), then all n, d ∈ N+ where n + d = 4 (of
which there are 3), then all n, d ∈ N+ where n + d = 5 (of which there are 4), etc. This
enumeration shows that |N| ≥ |Q+ |.
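In Python, this enumeration might look like the following sketch (using the standard fractions and itertools modules; the function name is just for illustration):

from fractions import Fraction
from itertools import count

def positive_rationals():
    """Yields n/d for all n, d >= 1, grouped by n + d: first n + d = 2,
    then n + d = 3, etc. Every positive rational appears at least once
    (some values repeat, e.g., 1/1 and 2/2)."""
    for total in count(2):          # total = n + d
        for n in range(1, total):
            yield Fraction(n, total - n)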
N vs. Q. To see that |N| ≥ |Q|, i.e., there’s an onto function f : N → Q, we can use the
same trick as above with Z. Let g : N → Q+ be the onto function we just defined, and let
f : N → Q be defined as
f (0) = 0
f (1) = g(0)
f (2) = −g(0)
f (3) = g(1)
f (4) = −g(1)
f (5) = g(2)
f (6) = −g(2)
...
Since g maps onto every positive rational number, f maps onto every rational number.
N vs. {0, 1}∗ . What about infinite sets without numbers? Recall that showing |{0, 1}∗ | ≤
|N| amounts to showing that {0, 1}∗ can be “enumerated” in order s0 , s1 , s2 , . . .. The length-
lexicographical enumeration of {0, 1}∗ is such an enumeration: ε, 0, 1, 00, 01, 10, 11, 000, 001, . . ..
In fact, because this enumeration repeats no strings, the function n ↦ sn is a bijection (1-1 and onto)
from N to {0, 1}∗ . Bijections are invertible, so its inverse is an onto function from {0, 1}∗ to N
as well, whence |N| = |{0, 1}∗ |.
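One way to compute this bijection in Python (a sketch; the trick, worth verifying by hand, is that dropping the leading 1 from the binary representation of n + 1 gives the n’th string in length-lexicographical order):

def nth_binary_string(n):
    """0 -> '', 1 -> '0', 2 -> '1', 3 -> '00', 4 -> '01', ..."""
    return bin(n + 1)[3:]  # bin(n+1) is '0b1...'; drop '0b' and the leading 1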
R vs. the unit interval. What about R, the set of real numbers, and (0, 1), the set of real
numbers strictly between 0 and 1? (a.k.a., the unit interval ) Since (0, 1) ⊂ R, we know that
|(0, 1)| ≤ |R|.
First, define g : (0, 1) → R+ as g(x) = 1/x − 1. To see that g is onto, let y ∈ R+ and note
that setting x = 1/(y + 1) means g(x) = y; note that 1/(y + 1) ∈ (0, 1) for any positive real y. This
shows that |(0, 1)| ≥ |R+ |.
Now we must show |R+ | ≥ |R|. Define f : R+ → R as f (x) = log2 x. To see that f is
onto, let y ∈ R. Letting x = 2y makes f (x) = y, noting 2y > 0 for any real y.
Thus |(0, 1)| ≥ |R+ | ≥ |R|.
So it seems with many of these infinite sets, we can find an onto function from one to the
other. Perhaps |A| = |B| for all infinite sets? Cantor discovered in 1874 that the answer is
no.7
11.6.3 Diagonalization
The following theorem changed the course of science.
7
He used a different technique in 1874, and in 1891 discovered the technique known as “diagonalization” that we
present here.
Theorem 11.6.1. Let X be any set. Then there is no onto function f : X → P(X).
Proof. Let X be any set, and let f : X → P(X). It suffices to show that f is not onto.
Define the set
D = { a ∈ X | a ∉ f (a) } .
Let a ∈ X be arbitrary. Since D ∈ P(X), it suffices to show that D ≠ f (a). By the
definition of D,
a ∈ D ⇐⇒ a ∉ f (a),
so D and f (a) differ on whether they contain the element a; hence D ≠ f (a).
The interpretation is that |X| < |P(X)|, even if X is infinite.
This proof works for any set X at all. In the special case where X = N, we can visualize
why this technique is called “diagonalization”. Suppose for the sake of contradiction that
there is a onto function f : N → P(N); then we can enumerate the subsets of N in order
S0 = f (0), S1 = f (1), S2 = f (2), . . .. Each set S ⊆ N can be represented by an infinite
binary sequence χS , where the n’th bit χS [n] = 1 ⇐⇒ n ∈ S. Those sequences are the
rows of the following infinite matrix:
                         0  1  2  3  ...  k
S0 = {1, 3, . . .}       0  1  0  1  ...
S1 = {0, 1, 2, 3, . . .} 1  1  1  1  ...
S2 = {2, . . .}          0  0  1  0  ...
S3 = {0, 2, . . .}       1  0  1  0  ...
..                                   ..
Sk = D = {0, 3, . . .}   1  0  0  1  ...  ?
If D is in the range of f , then Sk = D for some k ∈ N. But D is defined so that χD is the
bitwise negation of the diagonal of the above matrix. So if D appears as row k, this gives a
contradiction when we ask what is the bit at row k and column k.
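Although the sets involved are infinite, the diagonal construction itself can be expressed in Python by representing a subset of N as a membership predicate (a sketch, with hypothetical names):

def diagonal_set(f):
    """Given f, mapping each natural number a to a set f(a) of naturals
    (represented as a membership predicate), return the membership predicate
    of D = { a | a not in f(a) }. D differs from f(a) on the input a, for
    every a, so D is not in the range of f."""
    return lambda a: not f(a)(a)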
Stating that a set X is countable is equivalent to saying that its elements can be listed;
i.e., that it can be written X = {x0 , x1 , x2 , . . .}, where every element of X will appear
somewhere in the list.12
Observation 11.6.2. |N| < |R|; i.e., R is uncountable.
Proof. By Theorem 11.6.1, |N| < |P(N)|, so it suffices to prove that |P(N)| ≤ |R|;13 i.e., that
there is an onto function f : R → P(N).
Define f : R → P(N) as follows. Each real number r ∈ R has an infinite decimal
expansion.14 For all n ∈ N, let rn ∈ {0, 1, . . . , 9} be the nth digit of the decimal expansion
of r. Define f (r) ⊆ N as follows. For all n ∈ N,
n ∈ f (r) ⇐⇒ rn = 0.
That is, if the nth digit of r’s decimal expansion is 0, then n is in the set f (r), and n is not in
the set otherwise. Given any set A ⊆ N, there is some number rA ∈ R whose decimal
expansion has 0’s exactly at the positions n ∈ A, so f (rA ) = A, whence f is onto.
Continuum Hypothesis. There is no set A such that |N| < |A| < |P(N)|.
More concretely, this is stating that for every set A, either there is an onto function
f : N → A, or there is an onto function g : A → P(N).
Interesting fact: Remember earlier when we stated that Gödel proved that there are
true statements that are not provable? The Continuum Hypothesis is a concrete example of
a statement that is not provable, nor is its negation.15 So it will forever remain a hypothesis;
we can never hope to prove it either true or false.
Theorem 11.6.1 has immediate consequences for the theory of computing.
Observation 11.6.3. There is an undecidable language L ⊆ {0, 1}∗ .
Proof. {0, 1}∗ is countable, as is the set of all TMs. By Theorem 11.6.1, P({0, 1}∗ ), the set
of all binary languages, is uncountable. Since the countably many TMs decide only countably
many languages, some language is not decided by any TM.
Theorem 11.6.4. The language Halts = { ⟨M, w⟩ | M is a TM and M halts on input w }
is undecidable.
Proof. Assume for the sake of contradiction that Halts is decidable, by the algorithm H.
def H(M, w):
    """Function that supposedly decides whether M(w) halts."""
    raise NotImplementedError()
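Define the algorithm D, which does the opposite of whatever H predicts about a program run on its own description (a sketch, using the loop function from earlier):

def D(M):
    """Halt if H says M(M) loops; loop forever if H says M(M) halts."""
    if H(M, M):
        loop()
    # otherwise: return, i.e., halt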
In other words, on input ⟨M⟩, a TM, D runs H(⟨M, M ⟩),16 and does the opposite.17
Now consider running D(⟨D⟩). Does it halt or not? We have
D(⟨D⟩) halts ⇐⇒ H(⟨D, D⟩) rejects ⇐⇒ D(⟨D⟩) does not halt,
a contradiction.
D(D)  # behavior not well-defined! So H cannot be implemented.
It is worth examining the proof of Theorem 11.6.4 to see the diagonalization explicitly.
We can give any string as input to a program. Some of those strings represent other
programs, and some don’t. We just want to imagine what happens when we run a program
Mi on an input ⟨Mj ⟩ that represents another program Mj . If Mi is not even intended to
handle inputs that are programs (for instance if it tests integers for primality, or graphs for
Hamiltonian paths), then the behavior of Mi on inputs representing programs may not be
interesting. Nonetheless, Mi either halts on input ⟨Mj ⟩, or it doesn’t.
We then get an infinite matrix describing all these possible behaviors.
16
In more detail, D runs H to determine if M halts on the string that is the binary description of M itself.
17
In other words, halt if H rejects, and loop if H accepts. Since H is a decider, D will do one of these.
If the halting problem were decidable, then the program D would be implementable, and
its behavior on each input, described by the row with D on the left, would be the element-
wise opposite of the diagonal of this matrix. But this gives a contradiction since the entry
in the diagonal corresponding to D would then be undefined. Since D can be implemented
if Halts is decidable, this establishes that Halts is not decidable.
The title of Section 11.7 claims that Gödel’s Incompleteness Theorem is a consequence
of undecidability. However, Gödel’s Incompleteness Theorem preceded Turing’s proof of the
undecidability of the Halting Problem by 5 years. Gödel glimpsed the first outlines of the full
theory of undecidability, which was opened wide with Turing’s proof of the undecidability of
the Halting Problem. So despite Gödel’s Incompleteness Theorem being chronologically first,
there’s a certain sense in which undecidability is conceptually at the center, with Gödel’s
Incompleteness Theorem being one application of these ideas (in fact, the first application,
discovered before the full weight of the ideas was truly understood). Also, one can use
Turing’s ideas to give a far simpler proof of Gödel’s Theorem than one tends to find in
textbooks on mathematical logic (and certainly simpler than Gödel’s original proof).
We prove a slightly weaker statement than what is usually called Gödel’s Incomplete-
ness Theorem; see http://www.scottaaronson.com/blog/?p=710 for a discussion of the
technicalities. What we prove is that there is a mathematical statement T such that either:
1. T and ¬T are both unprovable (thus one is true but not provable), or
2. T is false but provable.
Before going on, it is worth considering the importance of the second statement. If it were
true, it would mean that our formalization of mathematics is useless: if false statements
are provable, then proving a statement doesn’t actually inform us whether it is true.19 The
mathematical statement T will be of the form “a certain Turing machine halts on a certain
input.”
We show that if the above were false, then the Halting Problem would be decidable, a
contradiction. If the above is false, then for all mathematical statements T , both of the
following hold: (1) at least one of T or ¬T is provable, and (2) every provable statement is true.
happens that renders the whole discussion irrelevant. The most common objection to Gödel’s Theorem is: “Wait, if
Gödel’s Theorem proves that a certain statement T is true but unprovable, isn’t Gödel’s Theorem itself a proof
of T ? So how can T be unprovable??” But Gödel’s Theorem doesn’t actually prove T is true: it proves the logical
disjunction “T is true OR that other crazy thing is true”, so Gödel’s Theorem is not actually a proof of T ; instead
it is a proof of (T ∨ crazy-thing). Sneak preview: the crazy thing is “some false statement is provable”.
19
Upon hearing that a mathematical system is “useless” if it gets a statement wrong, one might object, “That’s
a bit harsh... sure, there’s this one false statement T that is provable, so the mathematical system misleads us on
T , but maybe the system gets every other statement T 0 6= T correct, in the sense of being able to prove T 0 only if
T 0 is actually true, so the mathematical system is mostly useful despite getting one statement wrong.” It turns out
that, with a slight adjustment of the definition of “wrong”, this rosy scenario is impossible: if the system gets one
statement wrong, then it gets them all wrong.
The stronger statement of Gödel’s Incompleteness Theorem (actually proven by someone named Rosser) replaces
the second statement “T is false but provable” (a condition called unsoundness) with “T and ¬T are both provable”
(a condition called inconsistency). Then, if the system were inconsistent, it could prove every statement S! (including
every statement S and its negation ¬S, just like T and ¬T ). In other words, if there’s even one statement T where
the mathematical theory “can’t decide” whether T or ¬T holds, then in fact it can’t make a decision on any other
statement either. To see this, first convince yourself that ¬T =⇒ (T =⇒ S) is a logical tautology, i.e., holds for
every pair of statements T and S. (Just write out the truth table for the Boolean formula, for all 4 possible settings
of true and false to T and S.) Then, let S be an arbitrary statement and suppose T and ¬T are both provable. Apply
modus ponens twice: since the hypothesis ¬T is provable, then the conclusion (T =⇒ S) is provable, and since the
hypothesis T is provable, then the conclusion S is provable.
Reading Python
Strings. Python strings can be created with double quotes as in C++ or Java:
1 s = " This is a string . "
There is no separate char type. When you would use a char in C/C++/Java, in Python
you simply use a string of length 1.
Strings can also be created with single-quotes, which is convenient because double quotes
can be used inside of single-quoted strings, and vice versa, without escaping them with a
backslash:
1 s = ’ These " double quotes " do not need backslash escaping . ’
2 s2 = " These \" double quotes \" need backslash escaping . "
3 s3 = " It ’ goes the other way ’ too . "
Finally, Python has multiline strings, which start and end with three quotes instead of just
one, that allow newlines. Both types of quote can be used in them without escapes:
1 s = """ This is the " first " line .
2 This is the ’ second ’ line . """
3 s2 = ’’’ This is the " first " line .
4 This is the ’ second ’ line . ’’’
Without triple quotes, you would need to use "\n" to insert newlines as in C/C++/Java,
and would need to escape some quotes:
1 s3 = " This is the \" first \" line .\ nThis is the ’ second ’ line . "
Indentation. Like pseudocode, blocks of Python code are denoted by indentation, not curly
braces as in C, C++, or Java. Consider the following code, which prints only c to the screen:
if 3 == 4:
    print("a")
    print("b")
print("c")
However, the following code prints b followed by c.
if 3 == 4:
    print("a")
print("b")
print("c")
Documentation. Often the first line of a function will be a multiline string not assigned
to any variable. This is called a docstring and is interpreted as a comment documenting the
function:
def square(x):
    """Compute the square of x.

    For example, square(5) == 25 and square(3) == 9."""
    return x * x
Since the docstring is not assigned to any variable, it has no effect on the program. Docstrings
are used by certain tools to automatically generate documentation for Python libraries:
https://wiki.python.org/moin/DocumentationTools
for loops. The code for i in range(n): in Python is similar to for (int i=0; i<n; i++)
in C/C++/Java: the loop has n iterations, letting i take on the values 0, 1, 2, . . ., n − 1.
So the following code prints the numbers from 0 to 9:
for i in range(10):
    print(i)
But, many times in C/C++ when you would use a loop like for (int i=0; i<n; i++),
it would be to iterate over elements of an array or other data structure, i.e.,
int arr[] = {2,3,5,7,11};
for (int i=0; i<5; i++) {
    int num = arr[i];
    printf("The current integer is %d", num);
}
Python has a special syntax for this sort of loop, discussed below, which executes one
iteration for every element of a container. This is better to use than to loop over the indices,
because it is more readable, and because it eliminates the possible error that the index i is
out of bounds. For instance, if the above loop instead were for (int i=0; i<=5; i++), i would
go off the end of the array.
lists and sets. v = [1,2,3,4,5] makes a list with 5 elements. A Python list is like an
array in C/C++/Java, but actually more like std::vector in C++ or java.util.ArrayList
in Java, because Python lists can grow and shrink. A list can have duplicates such as
d = [1,2,3,2,3], but a set cannot.
s = set(d) creates the set {1, 2, 3}, eliminating the duplicates in d. Another way to
create this set is the line s = {1,2,3}. Tuples such as (2,3,4) are like lists, but cannot be
modified (for example, l[0] = 5 works for a list l, but t[0] = 5 fails for a tuple t).
You can iterate over all elements of a list/tuple/set with a for loop using the in keyword:1
for s in ["a", "b", "cd"]:
    print(4*s)
prints "aaaa", "bbbb", and "cdcdcdcd" to the screen.
list/set comprehensions. These are one of the most useful features and one of the best
reasons to familiarize yourself with mathematical set builder notation. Many clunky lines of
code can be replaced by a simple line that expresses the same idea. Recall that range(n) is
(something like) a list with the integers from 0 to n − 1.2 The following code creates various
other lists/sets from it:
ints = range(10)                      # range with elements 0,1,2,3,4,5,6,7,8,9
a = [3*n for n in ints]               # [0,3,6,9,12,15,18,21,24,27]
b = [n for n in ints if n % 2 == 0]   # [0,2,4,6,8]
c = [n//3 for n in ints]              # [0,0,0,1,1,1,2,2,2,3]
d = {n//3 for n in ints}              # {0,1,2,3}
e = {3*n for n in ints if n % 2 == 0} # {0,6,12,18,24}
a, b, and c are lists, but d and e are sets, so cannot have duplicates.
In general, if we have some expression <expr> (such as n//3) and optionally some Boolean
expression <phi> (such as n%2==0), the following code:
new_lst = [<expr> for n in lst if <phi>]
is equivalent to
new_lst = []
for n in lst:
    if <phi>:
        new_lst.append(<expr>)
1
The latest C++ standard now supports a similar idea called a “range-based for loop”: http://en.cppreference.
com/w/cpp/language/range-for.
2
Technically it returns something called a “range” type in Python 3. The main difference with a list is that the
numbers don’t get stored in memory, but we can iterate over a range just like a list, and the numbers will be generated
as we need them. In some languages this is called “lazy evaluation”.
Omitting the if expression means all of the elements are added, i.e.,
new_lst = [<expr> for n in lst]
is equivalent to
new_lst = []
for n in lst:
    new_lst.append(<expr>)
Sets may be similarly constructed:
new_set = {<expr> for n in lst if <phi>}
is equivalent to
new_set = set()  # need to use the set() constructor; {} makes an empty dict, not a set
for n in lst:
    if <phi>:
        new_set.add(<expr>)
For example, s = {3*n for n in range(10) if n%2==0} is equivalent to
s = set()
for n in range(10):
    if n % 2 == 0:
        s.add(3*n)
They both create the set {0, 6, 12, 18, 24}. Note the similarity to the mathematical
set-builder notation s = {3n | n ∈ {0, 1, . . . , 9}, n is even}.
The big advantage of list/set comprehension notation isn’t so much that you have to
type fewer keystrokes (although you do save a few). The main advantage is readability. The
notation clearly communicates to anyone fluent in Python what is the purpose of the line of
code: to take items from a list/set3 , perhaps filter some of them out with the if keyword,
and process the rest using the expression at the beginning.
You may have experience with functional programming, with map and filter keywords
to transform lists. The expression <expr> above is like an anonymous function used with
map, and the Boolean expression <phi> is like an anonymous Boolean function used with
filter. In fact, Python also has the keywords map and filter, and they work the same way.
However, it is conventional to prefer a list comprehension to map and/or filter, since it is
usually more readable. Similarly, reduce (also called foldl/foldr in functional languages)
exists in Python (in functools in Python 3), but an explicit for loop is usually more readable.
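For example, the following two lines compute the same list; most Python programmers find the comprehension more readable than the map/filter version:

evens_tripled  = [3*n for n in range(10) if n % 2 == 0]
evens_tripled2 = list(map(lambda n: 3*n, filter(lambda n: n % 2 == 0, range(10))))
# both equal [0, 6, 12, 18, 24]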
See
• http://www.artima.com/weblogs/viewpost.jsp?thread=98196
• https://google.github.io/styleguide/pyguide.html
• https://stackoverflow.com/questions/5426754/google-python-style-guide
3
More generally, any object that can be “iterated over”: lists, sets, tuples, strings, and some other objects.
itertools. The itertools package is very useful for checking the various kinds of sub-
structures of data structures that are common in algorithms. For example, itertools.
combinations(x,k) takes all subsequences of length k from x. For example, this code:
import itertools
x = [2,3,5]
for t in itertools.combinations(x, 2):
    print(t)
prints all the ordered pairs of elements from x, in the order they originally appear:
(2, 3)
(2, 5)
(3, 5)
To make the code easier to read, we can also import only the function combinations, and
change its name to something else (since we use it to get subsets of a fixed size, even though
for convenience we often use Python lists to represent sets):
from itertools import combinations as subsets
x = [2,3,5]
for t in subsets(x, 2):
    print(t)
The above code is more “literate”; it’s straightforward to read the for loop in English as
“for all t that are subsets of x of size 2.”
The following code:
from itertools import combinations as subsets
x = [1,2,3,4,5]
for t in subsets(x, 3):
    print(t)
prints all ordered triples:
(1, 2, 3)
(1, 2, 4)
(1, 2, 5)
(1, 3, 4)
(1, 3, 5)
(1, 4, 5)
(2, 3, 4)
(2, 3, 5)
(2, 4, 5)
(3, 4, 5)
Strings such as "abc" are not technically lists of characters (in fact Python has no char
data type; individual characters are the same thing as strings of length 1), but many of the
same functions that work on a list also work on a string, as though it were a list of length-1
strings. For example: