
Theory of Computation

ECS 120
Lecture Notes

David Doty

Copyright © May 31, 2020, David Doty

No part of this document may be reproduced without the expressed written consent of
the author. All rights reserved.
Contents

1 Introduction 1
1.1 What this course is about . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.2 Automata, computability, and complexity . . . . . . . . . . . . . . . . . . . 3
1.3 Mathematical background . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
1.3.1 Implication statements . . . . . . . . . . . . . . . . . . . . . . . . . . 6
1.3.2 Sets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
1.3.3 Sequences and tuples . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
1.3.4 Functions and relations . . . . . . . . . . . . . . . . . . . . . . . . . . 8
1.3.5 The pigeonhole principle . . . . . . . . . . . . . . . . . . . . . . . . . 9
1.3.6 Combinatorics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
1.3.7 Graphs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
1.3.8 Boolean logic . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
1.4 Proof by induction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
1.4.1 Proof by induction on natural numbers . . . . . . . . . . . . . . . . . 11
1.4.2 Induction on other structures . . . . . . . . . . . . . . . . . . . . . . 12

I Automata Theory 15

2 String theory 17
2.1 Why to study automata theory . . . . . . . . . . . . . . . . . . . . . . . . . 17
2.2 Definitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
2.3 Binary numbers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21

3 Deterministic finite automata 23


3.1 Intuitive overview of deterministic finite automata (DFA) . . . . . . . . . . . 23
3.2 Formal models of computation . . . . . . . . . . . . . . . . . . . . . . . . . . 24
3.3 Formal definition of a DFA (syntax) . . . . . . . . . . . . . . . . . . . . . . . 25
3.4 More examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
3.5 Formal definition of computation by a DFA (semantics) . . . . . . . . . . . . 29


4 Declarative models of computation 31


4.1 Regular expressions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
4.1.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
4.1.2 Formal definition of a regex (syntax and semantics) . . . . . . . . . . 32
4.1.3 Conventions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
4.1.4 Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
4.2 Context-free grammars . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
4.2.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
4.2.2 Formal definition of a CFG (syntax) . . . . . . . . . . . . . . . . . . 37
4.2.3 Formal definition of computation by a CFG (semantics) . . . . . . . . 37
4.2.4 Right-regular grammars (RRG) . . . . . . . . . . . . . . . . . . . . . 38
4.3 Nondeterministic finite automata (NFA) . . . . . . . . . . . . . . . . . . . . 38
4.3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
4.3.2 Formal definition of an NFA (syntax) . . . . . . . . . . . . . . . . . . 39
4.3.3 Set of states reachable by an NFA . . . . . . . . . . . . . . . . . . . . 40
4.3.4 Transition function versus set of transitions . . . . . . . . . . . . . . 41
4.3.5 Formal definition of computation by an NFA (semantics) . . . . . . . 41
4.3.6 Example NFA using ε-transitions . . . . . . . . . . . . . . . . . . . . 41

5 Closure of language classes under set operations 43


5.1 Automatic transformation of regex’s, NFAs, DFAs . . . . . . . . . . . . . . . 43
5.2 DFA union and intersection (product construction) . . . . . . . . . . . . . . 46
5.3 NFA union, concatenation, and star constructions . . . . . . . . . . . . . . . 50
5.3.1 Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
5.3.2 Proofs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52

6 Equivalence of models 57
6.1 Equivalence of DFAs and NFAs (subset construction) . . . . . . . . . . . . . 57
6.2 Equivalence of RGs and NFAs . . . . . . . . . . . . . . . . . . . . . . . . . . 60
6.2.1 Left-regular grammars . . . . . . . . . . . . . . . . . . . . . . . . . . 62
6.3 Equivalence of regex’s and NFAs . . . . . . . . . . . . . . . . . . . . . . . . 62
6.3.1 Every regex-decidable language is NFA-decidable . . . . . . . . . . . 62
6.3.2 Every NFA-decidable language is regex-decidable . . . . . . . . . . . 64
6.4 Optional: Equivalence of DFAs and constant memory programs . . . . . . . 67

7 Proving problems are not solvable in a model of computation 71


7.1 Some languages are not regular . . . . . . . . . . . . . . . . . . . . . . . . . 71
7.1.1 Why we need rigor . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71
7.1.2 A non-regular language . . . . . . . . . . . . . . . . . . . . . . . . . . 71
7.1.3 Another non-regular language . . . . . . . . . . . . . . . . . . . . . . 73
7.2 The pumping lemma for regular languages . . . . . . . . . . . . . . . . . . . 73
7.3 Using the Pumping Lemma . . . . . . . . . . . . . . . . . . . . . . . . . . . 74

7.4 Optional: Proof of the Pumping Lemma for regular languages . . . . . . . . 77


7.5 Optional: The Myhill-Nerode Theorem . . . . . . . . . . . . . . . . . . . . . 78
7.5.1 Distinguishing extensions and statement of Myhill-Nerode Theorem . 79
7.5.2 Examples of using Myhill-Nerode Theorem . . . . . . . . . . . . . . . 79
7.5.3 Optional: Proof of Myhill-Nerode Theorem . . . . . . . . . . . . . . . 80
7.6 Optional: The Pumping Lemma for context-free languages . . . . . . . . . . 81

II Computational Complexity and Computability Theory 83

8 Turing machines 85
8.1 Intuitive idea of Turing machines . . . . . . . . . . . . . . . . . . . . . . . . 85
8.2 Formal definition of a TM (syntax) . . . . . . . . . . . . . . . . . . . . . . . 87
8.3 Formal definition of computation by a TM (infinite semantics) . . . . . . . . 88
8.4 Optional: Formal definition of computation by a TM (finite semantics) . . . 89
8.5 Languages recognized/decided by TMs . . . . . . . . . . . . . . . . . . . . . 90
8.6 Variants of TMs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91
8.6.1 Multitape TMs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91
8.6.2 Other variants . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 94
8.7 TMs versus code . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95
8.8 Optional: The Church-Turing Thesis . . . . . . . . . . . . . . . . . . . . . . 96

9 Efficient solution of problems: The class P 99


9.1 Asymptotic analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99
9.1.1 Defining running time . . . . . . . . . . . . . . . . . . . . . . . . . . 99
9.1.2 Asymptotic Analysis. . . . . . . . . . . . . . . . . . . . . . . . . . . . 100
9.1.3 Rules of thumb for comparing growth rates . . . . . . . . . . . . . . . 101
9.1.4 Other asymptotic notations . . . . . . . . . . . . . . . . . . . . . . . 104
9.2 Time complexity classes and the Time Hierarchy Theorem . . . . . . . . . . 104
9.3 Optional: Time complexity of simulation of multitape TM with a one-tape TM . 105
9.4 Definition of P . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 106
9.4.1 The complexity class P. . . . . . . . . . . . . . . . . . . . . . . . . . 106
9.4.2 “Reasonable” encodings. . . . . . . . . . . . . . . . . . . . . . . . . . 106
9.4.3 Input size. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 106
9.4.4 P is the same for most encodings and programming languages. . . . . 107
9.5 Examples of problems in P . . . . . . . . . . . . . . . . . . . . . . . . . . . . 107
9.5.1 Paths in graphs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 108
9.5.2 Relatively prime integers . . . . . . . . . . . . . . . . . . . . . . . . . 111
9.5.3 Connected graphs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 112
9.5.4 Optional: Eulerian cycles in graphs . . . . . . . . . . . . . . . . . . . 112
9.6 Optional: Why identify P with “efficient”? . . . . . . . . . . . . . . . . . . . 114

10 Efficient verification of solutions: The class NP 117


10.1 Polynomial-time verifiers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 117
10.1.1 Hamiltonian path . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 117
10.1.2 Composite numbers . . . . . . . . . . . . . . . . . . . . . . . . . . . . 118
10.2 The class NP . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 119
10.2.1 Definition of NP . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 119
10.2.2 What sort of computation is NP capturing? . . . . . . . . . . . . . . 120
10.2.3 Decision vs. Search vs. Optimization . . . . . . . . . . . . . . . . . . 120
10.3 Examples of problems in NP . . . . . . . . . . . . . . . . . . . . . . . . . . . 121
10.3.1 Finding cliques . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 121
10.3.2 Finding subset of integers with a given sum . . . . . . . . . . . . . . 123
10.3.3 The P versus NP question . . . . . . . . . . . . . . . . . . . . . . . . 123
10.4 NP problems are decidable in exponential time . . . . . . . . . . . . . . . . . 124
10.5 Introduction to NP-Completeness . . . . . . . . . . . . . . . . . . . . . . . . 126
10.5.1 Boolean formulas . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 126
10.5.2 Implementation of Boolean formula data structure in Python . . . . . 127
10.5.3 Satisfiability of Boolean formulas . . . . . . . . . . . . . . . . . . . . 129
10.6 Polynomial-time reducibility . . . . . . . . . . . . . . . . . . . . . . . . . . . 129
10.6.1 Ranking the hardness of problems with reductions . . . . . . . . . . . 129
10.6.2 Simple example: Reduction of IndSet to Clique . . . . . . . . . . . 131
10.6.3 Definition of polynomial-time reducibility . . . . . . . . . . . . . . . . 131
10.6.4 Using reductions to bound “hardness” of problems . . . . . . . . . . . 132
10.6.5 How to remember which direction reductions go . . . . . . . . . . . . 134
10.6.6 Definition of the 3Sat problem . . . . . . . . . . . . . . . . . . . . . 135
10.6.7 Reduction between problems with different data types: 3Sat ≤P IndSet . . 135
10.6.8 Reductions are algorithms, but they don’t solve either problem. . . . 138
10.7 Definition of NP-completeness . . . . . . . . . . . . . . . . . . . . . . . . . . 138
10.7.1 The Cook-Levin Theorem . . . . . . . . . . . . . . . . . . . . . . . . 140
10.8 Optional: Additional NP-Complete problems . . . . . . . . . . . . . . . . . . 141
10.8.1 Vertex Cover . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 141
10.8.2 Subset Sum . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 143
10.8.3 Hamiltonian path . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 145
10.9 Optional: Proof of the Cook-Levin Theorem . . . . . . . . . . . . . . . . . . 145
10.10 Optional: A brief history of the P versus NP problem . . . . . . . . . . . . . 149

11 Undecidability 153
11.1 The Halting Problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 153
11.1.1 Turing-recognizable but not decidable . . . . . . . . . . . . . . . . . . 153
11.1.2 Reducibility . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 154
11.2 Optional: Source code versus programs . . . . . . . . . . . . . . . . . . . . . 155
11.3 Undecidable problems about algorithm behavior . . . . . . . . . . . . . . . . 156

11.3.1 No-input halting problem . . . . . . . . . . . . . . . . . . . . . . . . 156


11.3.2 Recipe for showing undecidability via reduction from Haltsε . . . . 158
11.3.3 Accepting a given string . . . . . . . . . . . . . . . . . . . . . . . . . 158
11.3.4 Accepting no inputs . . . . . . . . . . . . . . . . . . . . . . . . . . . 159
11.3.5 Rejecting at least one string . . . . . . . . . . . . . . . . . . . . . . . 160
11.4 How to spot undecidability at a glance . . . . . . . . . . . . . . . . . . . . . 160
11.5 Optional: Enumerators as an alternative definition of Turing-recognizable . . 163
11.5.1 A non-Turing-recognizable language . . . . . . . . . . . . . . . . . . . 167
11.6 The Halting Problem is undecidable . . . . . . . . . . . . . . . . . . . . . . . 170
11.6.1 Comparing sizes of sets using onto functions . . . . . . . . . . . . . . 170
11.6.2 Can one infinite set be larger than another? . . . . . . . . . . . . . . 170
11.6.3 Diagonalization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 172
11.6.4 Countable versus uncountable sets . . . . . . . . . . . . . . . . . . . 173
11.6.5 Using diagonalization to show the halting problem is undecidable . . 174
11.7 Optional: The far-reaching consequences of undecidability . . . . . . . . . . 176
11.7.1 Gödel’s Incompleteness Theorem . . . . . . . . . . . . . . . . . . . . 176
11.7.2 Prediction of physical systems . . . . . . . . . . . . . . . . . . . . . . 178

A Reading Python 179


A.1 Installing Python . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 179
A.2 Code from this book . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 179
A.3 Tutorial on reading Python . . . . . . . . . . . . . . . . . . . . . . . . . . . 180
Chapter 1

Introduction

1.1 What this course is about


What are the fundamental capabilities and limitations of computers?
The following is a rough sketch of what to expect from this course:

• In ECS 36a/b/c, you programmed computers without studying them mathematically.

• In ECS 20, you mathematically studied things that are not computers.

• In ECS 120, you will use the tools of ECS 20 to mathematically study computers.

The fundamental premise of the theory of computation is that the computer on your
desk—or in your pocket—obeys certain laws, and therefore, certain unbreakable limitations.
We can reason by analogy with the laws of physics. Newton’s equations of motion tell us
that each object with mass obeys certain rules that cannot be broken. For instance, an
object cannot accelerate in the direction opposite to the force applied to it. Of
course, nothing in the world is a rigid frictionless body of mass obeying all the idealized
assumptions of classical mechanics... in other words, “the map is not the territory”. So in
reality, Newton’s equations of motion do not exactly predict anything. But they are a useful
abstraction of what real matter is like, and many things in the world are close enough to
this abstraction that Newtonian predictions are reasonably accurate.
I often think of the field of computer science outside of theory as being about proving
what can be done with a computer... simply by doing it! Much of research in theoretical
computer science is about proving what cannot be done with a computer. This can be more
difficult, since you cannot simply cite your failure to invent an algorithm for a problem as
proof that there is no algorithm. But certain important problems cannot be solved with
any algorithm, as we will see.
We will draw no distinction between the idea of “formal proof” and more nebulous
instructions such as “show your work”/“justify your answer”/“explain”. A “proof” of a
claim is an argument that convinces an intelligent person who has never seen the claim


before and cannot see why it is true without having it explained. It does not matter if the
argument uses formal mathematical notation or not (though formal notation is briefer and
more straightforward to make precise than English), or if it uses proof by induction or proof
by contradiction or just a direct proof (though it is often easier to think in terms of induction
or contradiction). What matters is that there are no holes or counter-arguments that can
be thrown at the argument, and that every statement is precise and unambiguous.
However, one effective technique to prove theorems (I do this in nearly all of my research
papers) is to first give an informal “proof sketch”, intuitively explaining how the proof will
go, but shorter than the proof and lacking in potentially confusing (but necessary) details.
The proof is easier to read if one first reads the proof sketch, but the proof sketch by itself is
not a proof. In fact, I would go so far as to say that the proof by itself is not a very effective
proof either, since bare naked details and mathematical notation, without any intuition to
help someone understand it, do not communicate why the theorem is true any better than
the hand-waving proof sketch. Both are usually necessary to accomplish the goal of the
proof: to help the reader understand why the theorem is true.
Here’s an example of a formal theorem, an informal proof sketch, and a formal proof:
Theorem 1.1.1. There are infinitely many prime numbers.
Proof sketch. Intuitively, we show that for each finite set of prime numbers, we can find
a larger prime number. We do this by multiplying together the existing primes and
adding 1. This is a new prime because, if it were a multiple of an existing prime, it would
be only one greater than a different multiple of that prime. This is a contradiction because
multiples of integers bigger than 1 are more widely spaced. (For example, multiples of 3 are
at least 3 apart: 3, 6, 9, 12, 15, . . .)
Proof. Let S = {p1, p2, . . . , pk} ⊂ N be any finite set of primes. It suffices to show that we
can find a prime not in S, implying that S cannot be all the primes. Since S is an arbitrary
finite set of primes, this shows that the set of all prime numbers cannot be finite.
Let m = p1 · p2 · . . . · pk and let n = m + 1. It suffices to show that no prime in S divides
n: then every prime factor of n (and n ≥ 2 has at least one) is a prime not in S. Suppose
for the sake of contradiction that pi | n (i.e., pi divides n) for some pi ∈ S. Note that
pi | m as well. Therefore n and m are different multiples of pi, so their difference n − m
must be at least pi. But n − m = 1, which is smaller than any prime, a contradiction.
Note that the proof sketch is a bit shorter, uses less formal notation, and skips some
details. As a result, it is easy to read, but incomplete. The proof gives all the details, but
the proof sketch is a “roadmap” helping to guide the reader through it. Sometimes when I
write a paper, I don’t necessarily keep the proof sketch separate from the proof. Sometimes
the first paragraph or two of the proof is the sketch. Other times, the sketch is more “woven”
into the proof, especially for long proofs. This helps the reader stop to “come up for air”
occasionally and remember the big picture that the proof is trying to show. (However, in
this course, most proofs will not be very long, so won’t require this level of care.)

In this course, in the interest of time, I will often give the intuitive proof sketch only
verbally, and write only the formal details on the board. On your homework, however, you
should explicitly write both, to make it easy for the TA to understand your proof and give
you full credit.

1.2 Automata, computability, and complexity


Multivariate polynomials

Consider the polynomial equation x^2 − y^2 = 2. Does this have a solution? Yes: x = √2, y = 0.
What about an integer solution? No.
x^2 + 2xy − y^3 = 13. Does this have an integer solution? Yes: x = 3, y = 2.
Equations involving multivariate polynomials for which we ask for an integer solution are
called Diophantine equations.
Task A: write an algorithm that indicates whether a given multivariable polynomial
equation has a real solution.
Fact: Task A is possible.
Task A′: write an algorithm that indicates whether a given multivariable polynomial
equation has an integer solution.
Fact: Task A′ is impossible.1

Paths touring a graph


Does the graph G = (V, E) given in Figure 1.1 have a path that contains each edge exactly
once? (Eulerian path)
Task B: write an “efficient” algorithm that indicates if a graph G has a path visiting
each edge exactly once.
Algorithm (sketched in code below): Check whether G is connected (by breadth-first
search) and every node has even degree. (This tests for a closed tour; a non-closed path
allows exactly two nodes of odd degree.)
Task B′: write an “efficient” algorithm that indicates if a graph G has a path visiting
each node exactly once.
Fact: Task B′ is impossible (assuming P ≠ NP).
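To make the Task B check concrete, here is a minimal Python sketch (the function name and the adjacency-set graph representation are illustrative, not from the notes; adjacency sets cannot represent the parallel edges of the Königsberg graph). It tests connectivity by breadth-first search and then tests that every node has even degree:

from collections import deque

def has_eulerian_cycle(adj):
    # adj: dict mapping each node to the set of its neighbors.
    # Breadth-first search from an arbitrary node to test connectivity.
    start = next(iter(adj))
    seen, frontier = {start}, deque([start])
    while frontier:
        u = frontier.popleft()
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                frontier.append(v)
    connected = len(seen) == len(adj)
    # Every node must have even degree for a closed tour of all edges.
    return connected and all(len(nbrs) % 2 == 0 for nbrs in adj.values())

square = {1: {2, 4}, 2: {1, 3}, 3: {2, 4}, 4: {1, 3}}
print(has_eulerian_cycle(square))  # True: e.g. the tour 1, 2, 3, 4, 1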

Counting
A regular expression (regex) is an expression that matches some strings and not others. For
example
(0(0 ∪ 1 ∪ 2)∗ ) ∪ ((0 ∪ 2)∗ 1)
matches any string of digits that starts with a 0, followed by any number of 0’s, 1’s, and 2’s,
or ends with a 1, preceded by any number of 0’s and 2’s.
1. We could imagine trying “all” possible integer solutions, but if there is no integer solution, then we will be trying forever and the algorithm will not halt.

Figure 1.1: “Königsberg graph”. Licensed under CC BY-SA 3.0 via Commons – https://upload.wikimedia.org/wikipedia/commons/5/5d/Konigsberg_bridges.png

If x is a string and a a symbol, let #(a, x) be the number of a’s in x. For example,
#(0, 001200121) = 4 and #(1, 001200121) = 3.
Task C: write a regex that matches a ternary string x exactly when #(0, x)+#(1, x) = 3.
Answer: 2∗ (0 ∪ 1)2∗ (0 ∪ 1)2∗ (0 ∪ 1)2∗
Task C′: write a regex that matches a ternary string x exactly when #(0, x) − #(1, x) = 3.
Fact: Task C′ is impossible.
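The Task C regex carries over almost verbatim to Python’s re module (here ∪ becomes the character class [01]); a quick check:

import re

# 2*(0 ∪ 1)2*(0 ∪ 1)2*(0 ∪ 1)2* in Python regex syntax
pattern = re.compile(r"2*[01]2*[01]2*[01]2*")

def task_c(x):
    # fullmatch requires the whole string to match, as in the formal definition
    return pattern.fullmatch(x) is not None

print(task_c("21012"))  # True:  #(0,x) + #(1,x) = 3
print(task_c("2102"))   # False: #(0,x) + #(1,x) = 2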

Rough layout of this course


The following are the units of this course:

Computability theory (unit 3): What problems can algorithms solve? (finding real roots
of multivariate polynomials, but not integer roots)
Computational complexity theory (unit 2): What problems can algorithms solve effi-
ciently? (finding paths visiting every edge, but not every node)
Automata theory (unit 1): What problems can algorithms solve with “optimal” effi-
ciency (constant space and “real time”, i.e., time = size of input)? (finding whether
the sum of # of 0’s and # of 1’s equals some constant, but not the difference)

Sorting these by increasing order of the power of the algorithms studied: 1, 2, 3.


Historically, these were discovered in the order 3, 1, 2.

It is most common in a theory course to cover them in order 1, 3, 2.


We are going to cover them in order 1, 2, 3.
The reason for swapping the traditional order of units 3 (computability) and 2 (compu-
tational complexity) is this: Most of computer science is about writing programs to solve
problems. You think about a problem, you write a program, and you demonstrate that the
program solves the problem.2
Now, in computability theory, unit 3, most of the problems studied are questions about
the behavior of programs themselves.3 But what sort of objects might answer these questions
about programs? Other programs! So we imagine writing a program P that takes as input
the source code of another program Q. This isn’t such a crazy idea yet. . . compilers, inter-
preters (such as the Javascript engine in a web browser), and virtual machines are programs
that take as input the source code of other programs.
But in computability theory, the situation complicates quickly. A typical problem involves
proving that a certain problem is not solvable by any program. Usually this done by writing
a program P , which takes as input another program Q, and P outputs a third program R.4
It can be quite taxing to keep track of which program is supposed to be doing what. So in
the interest of showing no program exists to solve a certain problem, we introduce not one
but three new programs, all of which are not solving that problem. It’s quite remarkable
that anyone was able to create such a chain of reasoning in the first place to prove limits
on the ability of programs. It adds many conceptual layers of abstraction onto what most
computer science students are accustomed to doing.
On the other hand, computational complexity theory, unit 2, has the virtue that most
problems are about more mundane objects such as lists of integers, graphs, and Boolean
formulas. In complexity theory, we think about programs that take these objects as input
and produce them as output, and there is less danger of getting the program confused with
a graph, for instance, because they are two different types of objects.
The ideas in units 2 and 3 are more similar to each other than they are to unit 1. Also,
we will spend about half the course on unit 1, and the other half on units 2 and 3. So, these
notes are divided into two “parts”: part 1 is unit 1, automata theory, and part 2 is units 2
and 3, computational complexity and computability theory.

2. In algorithms and theory courses and research, to “demonstrate correctness” usually means a proof. In software engineering, it usually means unit tests and code reviews. For critical stuff like spacecraft software, it involves both.
3. The Diophantine equation problem above is a notable exception, but it took 70 years to prove it is unsolvable, and even then it was done by creating a Diophantine equation that essentially “mimics” the behavior of a program so as to connect the existence of its roots to a question about the behavior of certain programs.
4. Now, this sort of idea, of programs receiving and producing other programs, is not crazy in principle. The C compiler, for example, is itself a program, which takes as input the source code of a program (written in C) and outputs the code of another program (machine instructions, written in the “language” of the host machine’s instruction set). The C preprocessor (which rewrites macros, for instance) takes as input C programs and produces C programs as output.

1.3 Mathematical background


1.3.1 Implication statements
Given two boolean statements p and q,5 the implication p =⇒ q is shorthand for “p
implies q”, or “If p is true, then q is true”.6 Here p is the hypothesis, and q is the conclusion.
The following statements are related to p =⇒ q:
• the inverse: ¬p =⇒ ¬q
• the converse: q =⇒ p
• the contrapositive: ¬q =⇒ ¬p7
If an implication statement p =⇒ q and its converse q =⇒ p are both true, then we say
p if and only if (iff) q, written p ⇐⇒ q. Proving a “p ⇐⇒ q” theorem usually involves
proving p =⇒ q and q =⇒ p separately.

1.3.2 Sets
A set is a group of objects, called elements, with no duplicates.8 The cardinality of a set A is
the number of elements it contains, written |A|. For example, {7, 21, 57} is the set consisting
of the integers 7, 21, and 57, with cardinality 3.
For two sets A and B, we write A ⊆ B, and say that A is a subset of B, if every element
of A is also an element of B. A is a proper subset of B, written A ⊊ B, if A ⊆ B and A ≠ B.
We use the following sets throughout the course

• the natural numbers N = {0, 1, 2, . . .}


• the integers Z = {. . . , −2, −1, 0, 1, 2, . . .}

• the rational numbers Q = { p/q | p ∈ Z, q ∈ N, q ≠ 0 }
• the real numbers R

The unique set with no elements is called the empty set, written ∅.
To define sets symbolically,9 we use set-builder notation: for instance, { x ∈ N | x is odd }
is the set of all odd natural numbers.
5. e.g., “Hawaii is west of California”, or “The stoplight is green.”
6. e.g., “If the stoplight is green, then my car can go.”
7. The contrapositive of a statement is logically equivalent to the statement itself. For example, it is equivalent to state “If someone is allowed to drink alcohol, then they are at least 21” and “If someone is under 21, then they are not allowed to drink alcohol”. Hence a statement’s converse and inverse are logically equivalent to each other, though not equivalent to the statement itself.
8. Think of std::set.
9. In other words, to express them without listing all of their elements explicitly, which is convenient for large finite sets and necessary for infinite sets.

We write ∀x ∈ A as a shorthand for “for all elements x in the set A ...”, and ∃x ∈ A
as a shorthand for “there exists an element x in the set A such that ...”. For example,
(∃n ∈ N) n > 10 means “there exists a natural number n such that n is greater than 10”.
Given two sets A and B, A ∪ B = { x | x ∈ A or x ∈ B } is the union of A and B,
A ∩ B = { x | x ∈ A and x ∈ B } is the intersection of A and B, and A \ B = { x ∈ A | x ∉ B }
is the difference between A and B (also written A − B). Ā = { x | x ∉ A } is the complement
of A.10
Given a set A, P(A) = { S | S ⊆ A } is the power set of A, the set of all subsets of A.
For example,

P({2, 3, 5}) = {∅, {2}, {3}, {5}, {2, 3}, {2, 5}, {3, 5}, {2, 3, 5}}.

Given any set A, it always holds that ∅, A ∈ P(A), and that |P(A)| = 2^|A| if |A| < ∞.11 12
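These operations all have direct Python counterparts; the following sketch (the powerset helper is illustrative, built from itertools, not from the notes) reproduces the example above:

from itertools import chain, combinations

A = {2, 3, 5}
print(A | {3, 7})   # union: {2, 3, 5, 7}
print(A & {3, 7})   # intersection: {3}
print(A - {3, 7})   # difference: {2, 5}

def powerset(s):
    # P(s) as a set of frozensets (Python sets cannot contain ordinary sets)
    items = list(s)
    subsets = chain.from_iterable(combinations(items, k) for k in range(len(items) + 1))
    return {frozenset(c) for c in subsets}

print(len(powerset(A)))  # 8 subsets, matching |P(A)| = 2^|A| = 2^3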

1.3.3 Sequences and tuples

A sequence is an ordered list of objects.13 For example, (7, 21, 57, 21) is the sequence of
integers 7, then 21, then 57, then 21.
A tuple is a finite sequence.14 (7, 21, 57) is a 3-tuple. A 2-tuple is called a pair.
For two sets A and B, the cross product of A and B is A × B = { (a, b) | a ∈ A and b ∈ B }.
Note that |A × B| = |A| · |B|. For k ∈ N, we write A^k = A × A × . . . × A (k times) and
A^{≤k} = A^0 ∪ A^1 ∪ . . . ∪ A^k.
For example, N^2 = N × N is the set of all ordered pairs of natural numbers.
10. Usually, if A is understood to be a subset of some larger set U, the “universe” of possible elements, then Ā is understood to be U \ A. For example if we are dealing only with N, and A ⊆ N, then Ā = { n ∈ N | n ∉ A }. In other words, we use “typed” sets, in which case each set we use has some unique superset – such as {0, 1}∗, N, R, Q, the set of all finite automata, etc. – that is considered to contain all the elements of the same type as the elements of the set we are discussing. Otherwise, we would have the awkward situation that for A ⊆ N, Ā would contain not only nonnegative integers that are not in A, but also negative integers, real numbers, strings, functions, stuffed animals, and other objects that are not elements of A.
11. Why?
12. Actually, Cantor’s theory of infinite set cardinalities makes sense of the claim that |P(A)| = 2^|A| even if A is an infinite set. The furthest we will study this theory in this course is to observe that there are at least two infinite set cardinalities: that of the set of natural numbers, and that of the set of real numbers, which is bigger than the set of natural numbers according to this theory.
13. Think of std::vector.
14. The closest C++ analogy to a tuple, as we will use them in this course, is an object. Each member variable of an object is like an element of the tuple, although C++ is different in that each member variable of an object has a name, whereas the only way to distinguish one element of a tuple from another is their position. But when we use tuples, for instance to define a finite automaton as a 5-tuple, we intuitively think of the 5 elements as being like 5 member variables that would be used to define a finite automaton object. And of course, the natural way to implement such an object in C++ is by defining a FiniteAutomaton class with 5 member variables, which is an easier way to keep track of what each of the 5 elements is supposed to represent than, for instance, using a void* array of length 5.

1.3.4 Functions and relations


A function f that takes an input from set D (the domain) and produces an output in set
R (the range) is written f : D → R. 15 Given A ⊆ D, define f (A) = { f (x) | x ∈ A }; call
this the image of A under f .
Given f : D → D, k ∈ N and d ∈ D, define f^k : D → D by f^k(d) = f(f(. . . f(d) . . .)) (f
applied k times); that is, f composed with itself k times.
If f might not be defined for some values in the domain, we say f is a partial function.16
If f is defined on all values, it is a total function.17
A function f with a finite domain can be represented with a table. For example, the
function f : {0, 1, 2, 3} → Q defined by f(n) = n/2 is represented by the table

n    f(n)
0    0
1    1/2
2    1
3    3/2

If
(∀d1, d2 ∈ D) d1 ≠ d2 =⇒ f(d1) ≠ f(d2),
then we say f is 1-1 (one-to-one or injective).18
If
(∀r ∈ R)(∃d ∈ D) f (d) = r,
then we say f is onto (surjective). Intuitively, f “covers” the range R, in the sense that no
element of R is left un-mapped-to by f .
f is a bijection (a.k.a. a 1-1 correspondence) if f is both 1-1 and onto.
A predicate is a function whose output is boolean.
Given a set A, a relation R on A is a subset of A × A. Intuitively, the elements in R are
the ones related to each other. Relations are often written with an operator; for instance,
the relation ≤ on N is the set R = { (n, m) ∈ N × N | (∃k ∈ N) n + k = m }.
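For a function with a finite domain, the 1-1 and onto properties can be checked by brute force. A minimal Python sketch (helper names are illustrative), storing the function as a dict:

def is_one_to_one(f):
    # 1-1: no two inputs map to the same output
    outputs = list(f.values())
    return len(outputs) == len(set(outputs))

def is_onto(f, R):
    # onto: every element of the range R is the output of some input
    return set(f.values()) == set(R)

f = {0: 0, 1: 0.5, 2: 1, 3: 1.5}        # the table above: f(n) = n/2
print(is_one_to_one(f))                  # True
print(is_onto(f, {0, 0.5, 1, 1.5}))      # True, so f is a bijection onto this range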
15. Most statically typed programming languages like C++ have direct support for functions with declared types for input and output. In Java, these are like static methods; Integer.parseInt, which takes a String and returns the int that the String represents (if it indeed represents an integer) is like a function with domain String and range int. Math.max is like a function with domain int × int (since it accepts a pair of ints as input) and range int. The main difference between functions in programming languages and those in mathematics is that in a programming language, a function is really an algorithm for computing the output, given the input, whereas in mathematics the function is just the abstract relationship between input and output, and there may not be any algorithm computing it.
16. For instance, Integer.parseInt is (strictly) partial, because not all Strings look like integers, and such Strings will cause the method to throw a NumberFormatException.
17. Every total function is a partial function, but the converse does not hold for any function that is undefined for at least one value. We will usually assume that functions are total unless explicitly stated otherwise.
18. Intuitively, f does not map any two points in D to the same point in R. It does not lose information; knowing an output r ∈ R suffices to identify the input d ∈ D that produced it (through f).

1.3.5 The pigeonhole principle


The pigeonhole principle is a very simple concept, but it is surprisingly powerful. It says that
if we put n objects a1 , a2 , . . . , an into fewer than n boxes, then there is a box with at least
two objects. Usually, we think of the boxes as being “properties” or “labels” the objects
have: if there are n objects but fewer than n labels, then two objects must get the same
label.
For example, all people have fewer than 200,000 hairs on their head. There are about
500,000 people in Sacramento. If we think of the people as the objects a1 , a2 , . . . , a500,000 ,
and we think of the count of hairs as the labels (perhaps person a1 has 107,235 hairs, person
a2 has 95,300 hairs, etc.) then since there are only 200,000 labels 0, 1, 2, . . . ,199,999, at least
two people must have the exact same number of hairs on their head.
Here’s a way of stating the pigeonhole principle using functions. If I have a set O of
objects and a set B of boxes, and if |O| > |B|, then every function f : O → B is not 1-1. In
other words, for every function f : O → B, there are two different objects o1 , o2 ∈ O such
that f (o1 ) = f (o2 ), i.e., f puts o1 and o2 in the same box.19
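As a sanity check of this functional phrasing, here is a short Python sketch (an illustration, not from the notes) that, given an assignment of more objects than boxes, finds two objects sharing a box:

def find_collision(assignment):
    # assignment: dict mapping each object to its box.
    # If there are more objects than boxes, a collision must exist.
    first_in_box = {}
    for obj, box in assignment.items():
        if box in first_in_box:
            return first_in_box[box], obj  # two objects in the same box
        first_in_box[box] = obj
    return None  # possible only when there are at least as many boxes as objects

# 4 objects, 3 boxes: the pigeonhole principle guarantees a collision.
print(find_collision({"a": 1, "b": 2, "c": 3, "d": 2}))  # ('b', 'd')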

1.3.6 Combinatorics
Counting sizes of sets is a tricky art to learn. It’s a whole subfield of mathematics called
combinatorics, with very deep theorems, but we won’t need anything particularly deep in
this course. For simple counting problems, a few basic principles apply.
The first is sometimes called “The Product Rule”: If a set is defined by cross-product
(i.e., each element of a set is a tuple), then often multiplying the sizes of the smaller sets
works. For example, there are 4 integers in the set A = {1, 2, 3, 4} and 3 integers in the set
B = {1, 2, 3}. How many ways are there to pick one element from A and one from B? There
are 12 pairs of integers in the set A × B, because to choose a pair (a, b) ∈ A × B, there are
4 ways to choose a and 3 ways to choose b, so 4 · 3 ways to choose both. Here they all are:
       b=1     b=2     b=3
a=1   (1, 1)  (1, 2)  (1, 3)
a=2   (2, 1)  (2, 2)  (2, 3)
a=3   (3, 1)  (3, 2)  (3, 3)
a=4   (4, 1)  (4, 2)  (4, 3)
If the tuple is bigger, you just keep multiplying: in the triple (a, b, c), if there are 4 ways
to choose a, 3 ways to choose b, and 7 ways to choose c, then there are 4 · 3 · 7 = 84 possible
triples (a, b, c).
19. In fact, we will eventually see (Section 11.6.3) that such reasoning applies even to infinite sets. If O and B are both infinite, but there is no 1-1 function f : O → B, then any way of assigning objects from O to “boxes” in B must assign two objects to the same box. For example, we will see that if O is the set of all decision problems (defined formally in Section 2.2) we want to solve with algorithms (defined formally as Turing machines in Chapter 8), and B is the set of all algorithms, then |O| > |B|, so there is no 1-1 function f : O → B. So any way of assigning each decision problem uniquely to an algorithm that solves it must fail (by assigning two different problems to the same algorithm, which can’t very well solve both of them). Thus some decision problems have no algorithm solving them.

As another example, what’s the number of functions f : {1, 2, . . . , n} → {1, 2, . . . , k}?
There are k ways to choose what f(1) is, times k ways to choose what f(2) is, etc., and there
are n such values to choose. So there are k · k · . . . · k (n times) = k^n ways to choose f.
The key here is that each element is “chosen independently” from its set, meaning it’s possible
to have any combination of them together, with no extra constraints. Sometimes this doesn’t
hold. Suppose I define a new set C, not equal to A × B, but defined as the set of all pairs
(a, b), where a ∈ A, b ∈ B, and a ≤ b. Then C is not equal to A × B, but instead C is a
strict subset of A × B (meaning there are extra constraints applied so that not all possible
pairs in A × B are allowed in C), because of the last condition that a ≤ b, so you cannot
choose a and b independently. Here is that set (with 6 elements) for A = {1, 2, 3, 4} and
B = {1, 2, 3}:
       b=1     b=2     b=3
a=1   (1, 1)  (1, 2)  (1, 3)
a=2           (2, 2)  (2, 3)
a=3                   (3, 3)
a=4
In the function example, if we had asked what’s the number of non-decreasing functions
f : {1, 2, . . . , n} → {1, 2, . . . , k}, then the choices are not independent: If we choose f (1) = 5,
then we could not choose f (2) = 3.
The other basic tools that get used are:
• There are n! = n · (n − 1) · (n − 2) · . . . · 2 · 1 permutations of a sequence of n elements.
For example, the 3! = 3 · 2 · 1 = 6 permutations of {1, 2, 3} are 123, 132, 213, 231, 312,
321.
• There are (n choose k) = n! / (k!(n−k)!) = (n·(n−1)·. . .·(k+2)·(k+1)) / ((n−k)·(n−k−1)·. . .·2·1)
ways to choose a set of k elements from a larger set of n elements. For example, the
number of binary strings of length 5 with exactly two 1’s is (5 choose 2) = (5·(5−1))/2 = 10:
that’s the number of ways to pick 2 positions in the string to be 1, out of 5 possible
positions: 00011, 00101, 01001, 10001, 00110, 01010, 10010, 01100, 10100, 11000.
• Split up into sub-cases that can be added together. The next part uses this idea.
• Watch carefully for parts that are dependent, to try to phrase how to choose them in
a way that involves only independent choices. For example, suppose I ask how many
4-digit integers have the same first and last digit (e.g., 1471, 5015, 3223). If I say
“there are 9 ways to choose the first digit, times 10 ways to choose the second, times 10
ways to choose the third, times 10 ways to choose the fourth, but there is some tricky
dependence with the first and fourth”, it’s not clear what to do. But we can phrase it
like this: there are 9 · 10 · 10 = 900 ways to choose the first three digits, and once I’ve
picked the first digit, there is no choice with the fourth. So there are 900 total such
numbers.

A slightly trickier example is (where we analyze two different sub-cases): how many
4 digit numbers have their first digit within 1 of their last digit? (for example: 1230,
1231, 1232, 9009, 9008) Well, there’s still 900 ways to choose the first three digits, but
there’s now 3 ways to choose the last digit, unless the first digit is 9, and then there’s
only 2 ways to choose the last digit. This gives two sub-cases: the first digit is 9 or it
isn’t.
There are 800 ways to choose the first three digits with the first not equal to 9, times
3 ways to choose the last digit (800 · 3 = 2400). There are 100 ways to choose the first
three digits with the first equal to 9, times 2 ways to choose the last digit (100·2 = 200).
Combining the sub-cases, there are 2400 + 200 = 2600 such numbers.
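Counts this small can be confirmed by brute force; a quick Python check (an illustration, not from the notes) of the 900 and 2600 figures:

same_first_last = sum(1 for n in range(1000, 10000)
                      if str(n)[0] == str(n)[-1])
print(same_first_last)  # 900

within_one = sum(1 for n in range(1000, 10000)
                 if abs(int(str(n)[0]) - int(str(n)[-1])) <= 1)
print(within_one)       # 2600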

1.3.7 Graphs

See slides.

1.3.8 Boolean logic

See slides.

1.4 Proof by induction


Proof by induction is a potentially confusing concept, seeming a bit mysterious compared to
direct proofs or proofs by contradiction. Luckily, you already understand proof by induction
better than most people, since it is merely the “proof” version of the technique of recursion
you learned in programming courses.

1.4.1 Proof by induction on natural numbers

Theorem 1.4.1. For every n ∈ N, |{0, 1}^n| = 2^n.


Proof. (by induction on n)20

Base case: {0, 1}^0 = {ε}.21 |{ε}| = 1 = 2^0, so the base case holds.

Inductive case: Assume |{0, 1}^{n−1}| = 2^{n−1}.22 We must prove that |{0, 1}^n| = 2^n. Note
that every x ∈ {0, 1}^{n−1} appears as a prefix of exactly two unique strings in {0, 1}^n,
20. To start, state in English what the theorem is saying: For every string length n, there are 2^n strings of length n.
21. Note that {0, 1}^0 is not ∅; there is always one string of length 0, so the set of such strings is not empty.
22. Call this the inductive hypothesis, the fact we get to assume is true in proving the inductive case.

namely x0 and x1.23 Then

|{0, 1}^n| = 2 · |{0, 1}^{n−1}|
           = 2 · 2^{n−1}        (inductive hypothesis)
           = 2^n.

Of course, there are other (non-induction) ways to see that |{0, 1}^n| = 2^n. For example,
using the Product Rule for counting, we can say that there are 2 ways to choose the first
bit, times 2 ways to choose the second bit, ..., times 2 ways to choose the last bit, so
2 · 2 · . . . · 2 (n times) = 2^n ways to choose all of them. This is a sort of “iterative” reasoning
that is more cleanly (but also more verbosely and pedantically) captured by the inductive
argument above.
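For small n, Theorem 1.4.1 is also easy to confirm by enumeration, e.g. with itertools.product:

from itertools import product

for n in range(6):
    strings = [''.join(bits) for bits in product('01', repeat=n)]
    assert len(strings) == 2 ** n  # |{0,1}^n| = 2^n
print("verified for n = 0..5")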
Theorem 1.4.2. For every n ∈ N^+, ∑_{i=1}^{n} 1/(i(i+1)) = n/(n+1).

Proof. Base case (n = 1): ∑_{i=1}^{1} 1/(i(i+1)) = 1/(1(1+1)) = 1/2 = n/(n+1), so the base case holds.

Inductive case: Let n ∈ N^+ and suppose the theorem holds for n. Then

∑_{i=1}^{n+1} 1/(i(i+1)) = 1/((n+1)(n+2)) + ∑_{i=1}^{n} 1/(i(i+1))    (pull out last term)
                        = 1/((n+1)(n+2)) + n/(n+1)                    (inductive hypothesis)
                        = (1 + n(n+2)) / ((n+1)(n+2))
                        = (n^2 + 2n + 1) / ((n+1)(n+2))
                        = (n+1)^2 / ((n+1)(n+2))
                        = (n+1)/(n+2),

so the inductive case holds.
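A quick exact-arithmetic check of the identity (using Python’s fractions to avoid floating-point error):

from fractions import Fraction

for n in range(1, 20):
    total = sum(Fraction(1, i * (i + 1)) for i in range(1, n + 1))
    assert total == Fraction(n, n + 1)
print("identity verified for n = 1..19")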

1.4.2 Induction on other structures


Induction is often taught as something that applies only to natural numbers, but one can
write recursive algorithms that operate on data structures other than natural numbers.
23. The fact that they are unique means that if we count two strings in {0, 1}^n for every one string in {0, 1}^{n−1}, we won’t double-count any strings. Hence |{0, 1}^n| = 2 · |{0, 1}^{n−1}|.

Similarly, it is possible to prove something by induction on something other than a natural


number.
Here is an inductive definition of the number of 0’s in a binary string x, denoted #(0, x).24

#(0, x) = 0,            if x = ε;                          (base case)
#(0, x) = #(0, w) + 1,  if x = w0 for some w ∈ {0, 1}∗;    (inductive case)
#(0, x) = #(0, w),      if x = w1 for some w ∈ {0, 1}∗.    (inductive case)
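The inductive definition translates directly into a recursive Python function (the name count_zeros is illustrative):

def count_zeros(x):
    # #(0, x) for a binary string x, following the inductive definition
    if x == "":                 # base case: x = ε
        return 0
    w, last = x[:-1], x[-1]     # x = w0 or x = w1
    return count_zeros(w) + (1 if last == "0" else 0)

print(count_zeros("001001"))    # 4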
To prove a theorem by induction, identify the base case as the “smallest” object25 for
which the theorem holds.26
Theorem 1.4.3. Every binary tree T of depth d has at most 2^d leaves.
Proof. (by induction on a binary tree T) For T a tree, let d(T) be the depth of T, and l(T)
the number of leaves in T.
Base case: Let T be the tree with one node. Then d(T) = 0, and 2^0 = 1 = l(T).
Inductive case: Let T’s root have subtrees T0 and T1, at least one of them non-empty. If
only one is non-empty (say Ti), then

l(T) = l(Ti)
     ≤ 2^{d(Ti)}       (inductive hypothesis)
     = 2^{d(T)−1}      (definition of depth)
     < 2^{d(T)}.

If both subtrees are non-empty, then

l(T) = l(T0) + l(T1)
     ≤ 2^{d(T0)} + 2^{d(T1)}                                (ind. hyp.)
     ≤ max{2^{d(T0)} + 2^{d(T0)}, 2^{d(T1)} + 2^{d(T1)}}
     = max{2^{d(T0)+1}, 2^{d(T1)+1}}
     = 2^{max{d(T0)+1, d(T1)+1}}                            (2^n is monotone increasing)
     = 2^{d(T)}.                                            (definition of depth)
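The same recursive structure as the proof can be used to spot-check the theorem; in this Python sketch (an illustration, with a hypothetical tuple representation) a tree is either a leaf (None, None) or a pair of subtrees, with None marking an absent child:

def depth(t):
    if t == (None, None):       # a single leaf has depth 0
        return 0
    return 1 + max(depth(s) for s in t if s is not None)

def leaves(t):
    if t == (None, None):
        return 1
    return sum(leaves(s) for s in t if s is not None)

t = ((None, None), ((None, None), (None, None)))  # depth 2, 3 leaves
assert leaves(t) <= 2 ** depth(t)                 # 3 <= 4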

24. We will do lots of proofs involving induction on strings, but for now we will just give an inductive definition. Get used to breaking down strings and other structures in this way.
25. In the case of strings, this is the empty string. In the case of trees, this could be the empty tree, or the tree with just one node: the root (just like with natural numbers, the base case might be 0 or 1, depending on the theorem).
26. The inductive step should then employ the truth of the theorem on some “smaller” object than the target object. In the case of strings, this is typically a substring, often a prefix, of the target string. In the case of trees, a subtree, typically a subtree of the root. Using smaller subtrees than the immediate subtrees of the root, or shorter substrings than a one-bit-shorter prefix, is like using a number smaller than n − 1 to prove the inductive case for n; this is the difference between weak induction (using the truth of the theorem on n − 1 to prove it for n) and strong induction (using the truth of the theorem on all m < n to prove it for n).
Part I

Automata Theory

Chapter 2

String theory

No, not that string theory. In this chapter we cover the basic theoretical definitions and
terminology used to talk about the kind of strings found as a data type in programming
languages: finite sequences of symbols.
Some people also call this language theory, which is an odd name. It was developed
at a time when connections were being discovered between linguistics and the theory of
computation. As such, many of the terms sound strange to a computer scientist, but make
sense if one considers using the theory to model natural languages. Modern-day natural
language processing is as much machine learning techniques as linguistics. However, the
most elegant application of this theory has been to the development of parsers and compilers
for artificial programming languages.
It is also sometimes the case that “language theory” refers more broadly to “automata
theory”, which is the subject of this whole part of the book, including finite automata, regular
expressions, and context-free grammars. But the data that those automata are designed to
process is strings, so we necessarily start the study with strings.

2.1 Why to study automata theory


It is common—and unfortunate—for modern textbooks on the theory of computation to
dismiss automata theory as passé or to skip it entirely. Ironically, automata theory is a
victim of its own success. A research field is interesting when there are many questions
whose answers are unknown. The P vs. NP question (covered in Chapter 10), the biggest
open question in theoretical computer science, still open over 40 years after first being posed,
has ignited a huge number of areas of research. It is interesting precisely because it has raised
so many questions that seem important, but that we don’t know how to answer.
Automata theory, on the other hand, has fewer fundamental open questions. Precisely
because it succeeded in giving clean, elegant answers to questions, it is not as active an area of
current research. Its well-understood—and often easier to understand—results, which make
it suitable for an undergrad course, are precisely what make it less appealing for modern
research. Many modern theoretical computer scientists find that they don’t need to use their

17

knowledge of automata theory to make progress in their subfield of theoretical computer


science. Perhaps they tire of teaching automata theory because (at the undergraduate
level) it doesn’t require the use of deep, difficult mathematics and lengthy, complex proofs.
However, automata theory has had one enormously impactful application outside of the-
ory: the design of source code parsers for compilers and interpreters. The centrality of
automata theory to parsing, and its ubiquity, means that automata theory single-handedly
dwarfs the impact of nearly any other subfield of theoretical computer science, including the
very impactful notion of NP-completeness.1
Even to those with no interest whatsoever in the theory of computation, this one applica-
tion alone is reason enough for every computer scientist to develop a solid grasp of automata
theory. New programming languages are developed every year. Even if you never work on
writing a new fully general-purpose programming language yourself, one fundamental tool,
which should be in the toolbox of every software developer, is the ability to create a domain-
specific language (https://en.wikipedia.org/wiki/Domain-specific_language). This
requires an understanding of parsers, and parsers are built out of the finite automata, regu-
lar expressions, and context-free grammars that we will study in the next few chapters.
And before we study those, we need a common vocabulary to talk about the data they
process: strings.

2.2 Definitions
An alphabet is any non-empty finite set, whose elements we call symbols (a.k.a., characters).
For example, {0, 1} is the binary alphabet, and

{a, b, c, d, e, f, g, h, i, j, k, l, m, n, o, p, q, r, s, t, u, v, w, x, y, z}

is the Roman alphabet. Symbols in the alphabet are usually single-character identifiers.
A string (a.k.a., word ) over an alphabet is a finite sequence of symbols taken from the
alphabet. We write strings such as 010001, without the parentheses and commas.2 If x is a
string, |x| denotes the length of x.
If Σ is an alphabet, the set of all strings over Σ is denoted Σ∗. For n ∈ N, Σ^n =
{ x ∈ Σ∗ | |x| = n } is the set of strings in Σ∗ of length n. Similarly Σ^{≤n} = { x ∈ Σ∗ | |x| ≤ n }
and Σ^{<n} = { x ∈ Σ∗ | |x| < n }.
The string of length 0 is written ε, and in some textbooks, λ. In most programming
languages it is written "".
Note in particular the difference between ε, ∅, and {ε}.3
1. Cryptography is arguably equally impactful on applications where security is a concern. However, compilers and interpreters are needed for all software, regardless of security. If it weren’t for the compilers that enable high-level languages, we’d still be implementing cryptographic software—and all other software—in machine code.
2. If we used the standard sequence notation, 010001 would be written (0, 1, 0, 0, 0, 1).
3. ε is a string, ∅ is a set with no elements, and {ε} is a set with one element. The following Python code defines these three objects
epsilon = ""
empty_set = set()
set_with_epsilon = {""}
In Java, this is
String epsilon = "";
Set empty_set = new HashSet();
Set<String> set_with_epsilon = new HashSet<String>();
set_with_epsilon.add(epsilon);
In C++, this corresponds to (assuming we’ve imported everything from the standard library std)
string epsilon("");
set<string> empty_set;
set<string> set_with_epsilon;
set_with_epsilon.insert(epsilon);

Given n, m ∈ N, x[n . . m] is the substring consisting of the nth through mth symbols of
x, and x[n] = x[n . . n] is the nth symbol in x, where x[1] is the first symbol.
We write xy (or x ◦ y when we would like an explicit operator symbol) to denote the
concatenation of x and y, and given k ∈ N, we write x^k = xx . . . x (k copies of x).4 The
reverse of x, where |x| = n, is the string x^R = x[n]x[n − 1] . . . x[2]x[1].
Given two strings x, y ∈ Σ∗ for some alphabet Σ, x is a prefix of y, written x ⊑ y, if x is
a substring that occurs at the start of y. x is a suffix of y if x is a substring that occurs at
the end of y.
The length-lexicographic ordering (a.k.a. military ordering) of strings over an alphabet
is the standard dictionary ordering, except that shorter strings precede longer strings.5 For
example, the length-lexicographical ordering of {0, 1}∗ is
ε, 0, 1, 00, 01, 10, 11, 000, 001, 010, 011, 100, 101, 110, 111, 0000, 0001, . . .
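In Python, length-lexicographic order is exactly what sorted produces with the key (len(s), s), as in this sketch:

from itertools import product

strings = [''.join(bits) for n in range(3) for bits in product('01', repeat=n)]
print(sorted(strings, key=lambda s: (len(s), s)))
# ['', '0', '1', '00', '01', '10', '11']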
A language (a.k.a. a decision problem) is a set of strings. A class is a set of languages.
We’ve been speeding through terminology, but it’s worth pausing on these definitions for
a moment. We already explained that “language” is an archaic term left over from linguistic
theory. But why do we say that a set of strings is also a “decision problem”? A decision
problem is a computational problem with a yes/no answer. In other words, it’s the kind of
problem you are solving when you write a programming language function with a Boolean
return type. Given some input to the function, the function answers “yes” or “no”. Is a
given integer prime or not? Is a given string a properly formatted HTML file? Is a given
list of integers x, y, z, n a counter-example x^n + y^n = z^n to Fermat’s Last Theorem? These
are all decision problems you can solve by writing an appropriate function in your favorite
programming language.
But what’s the input to the function? Depending on the programming language, it could
be many arguments, and they could all be different types such as integers, floating point
numbers, lists of strings, etc. However, every one of these objects, somewhere in memory, is
just a sequence of bits. In other words, at a lower level of abstraction, every one of those
4. Alternatively, define x^k inductively as x^0 = ε and x^k = x x^{k−1}.
5. It is also common to call this simply the “lexicographical ordering”. However, this term is ambiguous and is also used to mean the standard alphabetical ordering. In length-lexicographical order, 1 < 00, but in alphabetical order, 00 < 1. We use the term length-lexicographical to avoid confusion.

objects is simply a finite binary string, whose bits are interpreted in a special way. And even
if there are many arguments, when they are passed into the function as input, through some
mechanism, they are combined into a single string (for instance, with delimiters between
them to mark the boundaries).6 So even though it’s not as convenient for programming, it is
more convenient for mathematical simplicity to simply say that the input to every decision
problem is a single finite string. And since there’s only two possible answers, if we know the
subset of strings for which the correct answer is “yes”, then we also know the correct answer
for every string.
Thus, the set of input strings (i.e., the language) for which the answer is “yes” is the
mathematical object defining the decision problem itself. More formally, L ⊆ Σ∗ defines the
decision problem: on input x, if x ∈ L then output “yes”, and if x ∉ L then output “no”.
For example, the decision problem of determining whether an integer is prime is represented
by the subset P ⊂ {0, 1}∗ of binary strings that encode prime integers in binary:

P = { 10, 11, 101, 111, 1011, 1101, 10001, . . . },

the binary encodings of the primes { 2, 3, 5, 7, 11, 13, 17, . . . }.
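Viewed this way, deciding the language P is just a Boolean function on binary strings. A minimal Python sketch using trial division (an illustration, not an efficient primality test):

def in_P(x):
    # Decide: is the binary string x the encoding of a prime?
    if not x or x[0] == "0":    # require a valid encoding with no leading zeros
        return False
    n = int(x, 2)
    if n < 2:
        return False
    return all(n % d != 0 for d in range(2, int(n ** 0.5) + 1))

print([x for x in ["10", "11", "100", "101", "111"] if in_P(x)])
# ['10', '11', '101', '111']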

Now, there’s other kinds of computational problems worth solving, where the output is not
Boolean. A solution to a search problem, rather than being merely one bit, is a whole string.
For example, given a set of linear equations, find a solution. An optimization problem is a
special kind of search problem where different solutions have different quantitative “values”
and we want to minimize or maximize the value. For example, given a list of airline ticket
prices between cities, find the cheapest sequence of flights that visits each city once. It seems
like a major restriction to study only decision problems and not these more general types of
problems. We will talk in more detail about these kinds of problems in Chapter 9, and we
will see that in some senses, we can learn a lot about computational problems generally just
by restricting attention to decision problems.
We said that a class is a set of languages. This term “class” sounds like just another
noun. However, these terms are useful because, without them, we would just call everything
a “set”, and easily forget whether it is a set of strings, a set of sets of strings, or even the
dreaded set of sets of sets of strings (they are out there; the arithmetical and polynomial
hierarchies are sets of classes).
Why would we want to think about sets of decision problems, instead of just one at
a time? Generally, we will define some model of computation, such as finite automata, or
Turing machines, or polynomial-time Turing machines. These are sets of “types of computing
machines” or “programs” in some programming language. Each machine has some notion
of giving a Boolean output for every input, so each machine defines some decision problem
that it solves. So we consider classes of decision problems when we want to talk about a
⁶ Technically this would only be true for pass-by-value. If the language is pass-by-reference, the references/pointers
would be concatenated into a single string because they are passed into the function by being pushed sequentially
onto the call stack. However, we would still think of the input to the function as including the data in memory to
which those pointers are pointing, which may not be contiguous. So it would be more accurate to say that we can
conceptually think of the input as something that could easily be concatenated into a single string, even if it is not.
concept such as the decision problems solvable by finite automata (these are also called the
class of regular languages) or the decision problems solvable by Turing machines (these are
also called the class of decidable languages) or the decision problems solvable in polynomial
time by Turing machines (this famous class has a name: P).
Given two languages A, B ⊆ Σ∗, let AB = { ab | a ∈ A and b ∈ B } (also denoted A ◦ B).⁷ Similarly, for all n ∈ N, A^n = AA . . . A (n times),⁸ A^{≤n} = A^0 ∪ A^1 ∪ . . . ∪ A^n, and A^{<n} = A^{≤n} \ A^n.⁹ Given a language A, let A∗ = ⋃_{n=0}^{∞} A^n.¹⁰ Note that A = A^1 (hence A ⊆ A∗).
Examples. Define the languages A, B ⊆ {0, 1, 2}∗ as follows: A = {0, 11, 222} and B =
{000, 11, 2}. Then
AB = {0000, 011, 02, 11000, 1111, 112, 222000, 22211, 2222}
A2 = {00, 011, 0222, 110, 1111, 11222, 2220, 22211, 222222}
A∗ = { ε, 0, 11, 222, 00, 011, 0222, 110, 1111, 11222, 2220, 22211, 222222, 000, 0011, . . . },
where ε is A^0, the next three strings form A^1, the next nine form A^2, and 000, 0011, . . . begin A^3.
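These operations are easy to experiment with on finite languages. The following is a small illustrative sketch in Python (our code; the names concat, power, and star_up_to are not standard library functions), representing languages as sets of strings:

def concat(A, B):
    # AB = { ab | a in A and b in B }, for finite languages A and B
    return {a + b for a in A for b in B}

def power(A, n):
    # A^n: n-fold concatenation of A, with A^0 = {epsilon}
    result = {""}                      # "" plays the role of epsilon
    for _ in range(n):
        result = concat(result, A)
    return result

def star_up_to(A, k):
    # the strings of A* having length at most k
    A = {x for x in A if x != ""}      # removing epsilon from A does not change A*
    result = {""}
    for n in range(1, k + 1):          # every string in A^n now has length >= n
        result |= {x for x in power(A, n) if len(x) <= k}
    return result

A = {"0", "11", "222"}
B = {"000", "11", "2"}
print(sorted(concat(A, B)))    # the 9 strings of AB listed above
print(sorted(power(A, 2)))     # the 9 strings of A^2 listed above
print(sorted(star_up_to(A, 3), key=len))   # epsilon, A^1, and the short strings of A^2 and A^3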
end of lecture 1a
2.3 Binary numbers
Since algorithms doing integer arithmetic are generally doing it on binary strings that represent the integers, it is worth recalling how binary representation of integers works. The binary expansion of a natural number n ∈ N is s_n = 0 if n = 0, and otherwise, letting k = ⌊log₂ n⌋ + 1, the binary string s_n ∈ {0, 1}^k such that (recall s_n^R is the reverse of s_n)

n = ∑_{i=1}^{k} s_n^R[i] · 2^{i−1}.

For example, 10011 is the binary expansion of 19, since
1 · 2^4 + 0 · 2^3 + 0 · 2^2 + 1 · 2^1 + 1 · 2^0 = 16 + 2 + 1 = 19.
Sometimes, when we want to emphasize that a string is supposed to represent a natural number, we will write the base in a subscript, such as 10011₂ or 19₁₀.
It is worth knowing how altering the binary string sn affects the number n it represents.
Suppose we append a 0 to the end. What happens when we do this in decimal? It multiplies
⁷ The set of all strings formed by concatenating one string from A to one string from B.
⁸ The set of all strings formed by concatenating n strings from A.
⁹ Note that there is ambiguity, since A^n could also mean the set of all n-tuples of strings from A, which is a different set. We assume that A^n means n-fold concatenation whenever A is a language. The difference is that in
concatenation of strings, boundaries between strings are lost, whereas tuples always have the various elements of the
tuple delimited explicitly from the others.
¹⁰ The set of all strings formed by concatenating 0 or more strings from A.
the number by 10. It's the same thing here, but it multiplies it by 2. To see why, note how it alters the sum n = ∑_{i=1}^{k} s_n^R[i] · 2^{i−1}. Let's use the example since it's easier to trace through: it changes
1 · 2^4 + 0 · 2^3 + 0 · 2^2 + 1 · 2^1 + 1 · 2^0
to
1 · 2^5 + 0 · 2^4 + 0 · 2^3 + 1 · 2^2 + 1 · 2^1 + 0 · 2^0
In other words, it added a 0 term to the end (which doesn’t change n) and it multiplied all
other terms by 2 (since each exponent of 2 was increased by 1). More succinctly, if s_m = s_n 0 then m = 2n. What about s_m = s_n 1? (i.e., what if we append a 1 to the end?) Then it's just like above, but the final term is 1 instead of 0. So m = 2n + 1. What if we remove the final bit of s_n to get the string s_m? This divides by 2, dropping the remainder. (In binary
the possible remainders are just 0 and 1.)
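All three facts can be checked mechanically. A small sketch in Python (ours; bin(n)[2:] is Python's built-in binary expansion, which conveniently writes 0 as “0”, matching s_0 = 0):

for n in range(20):
    s = bin(n)[2:]                           # binary expansion s_n
    assert int(s + "0", 2) == 2 * n          # appending 0: m = 2n
    assert int(s + "1", 2) == 2 * n + 1      # appending 1: m = 2n + 1
    if len(s) > 1:
        assert int(s[:-1], 2) == n // 2      # removing the final bit: m = n // 2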
Chapter 3

Deterministic finite automata
We start with a very restricted model of computation, which (like many models we study) takes a string input and produces a Boolean output, called a (deterministic) finite automaton.
This will be our first example of how to formally model computation with rigorously defined
mathematical objects such as tuples, sets, and functions. But, it is not merely a toy model
for educational purposes; it is among the models most frequently encountered outside of
theoretical computer science. String matching algorithms that implement the “find” feature
in text editors and displays are based on finite automata. Syntax highlighting engines in text
editors are often specified using finite automata. Many common compression algorithms are
implemented as finite automata. A common use is as a lexer: the first step of source code
compilation that transforms the raw text into “tokens” such as variable identifiers, numeric
literals, comments, string literals, etc.
3.1 Intuitive overview of deterministic finite automata (DFA)
A finite automaton is a mathematical model of an idealized computational device with essentially no “memory” beyond a finite set of states. A finite automaton starts in a certain state, then reads one input symbol at a time in order from left to right. A list of rules called transitions indicates, given the current state and symbol, what is the next state the device enters. Each
state is labeled with a Boolean output, so at any time the automaton is currently reporting
a “yes” or “no” output. The output it reports for the whole string is simply the output of
the state the automaton reaches after reading the last symbol of the string.
See Figure 3.1 for an example of one way to represent such a finite automaton D1, called a state transition diagram.

• states: a, b, c
• input alphabet: 0, 1
• start state: a
• accept states: b
• reject states: a, c
• transitions: arrows labeled with input symbols

Figure 3.1: A finite automaton D1.

D1 accepts or rejects based on the state it is in after reading the input.
If we give the input string 1101 to D1, the following happens.
1. Start in state a.
2. Read 1, follow transition from a to b.
3. Read 1, follow transition from b to b.
4. Read 0, follow transition from b to c.
5. Read 1, follow transition from c to b.
6. Accept the input because D1 is in state b at the end of the input.
D1 also accepts 1, 01, and 010101. In fact all transitions labeled 1 go to b, so it accepts
any string ending in a 1.
Are those all the strings it accepts?
D1 also accepts 100, 0100, 110000, and any string that has at least one 1 and ends with
an even number of 0’s following the last 1.
3.2 Formal models of computation
That example gives some intuition of what we mean by “finite automaton”, but to study
them formally, we need a formal mathematical definition, saying what a finite automaton is
using only basic discrete mathematical definitions such as integers, strings, sets, sequences,
functions, etc. Finite automata are a model of computation. In most circumstances, you can
take the phrase “model of computation” to be synonymous with programming language. Of
course, some of them look much different from the programming languages you are used to!
But they all involve some way of specifying some list of rules governing what the program is
supposed to do. So we can say that C++ is a model of computation, and each C++ program
is an instance of this model.
There are two parts to defining any formal model of computation: syntax and semantics.
• The syntax tells us what an instance of the model is.

• The semantics tell us what an instance of the model does.
For example, I can say that a C++ program is any ASCII string that the gcc compiler can
compile with no errors: that’s the definition of the syntax of C++.¹ But knowing that gcc
successfully compiled a program does little to help you understand what will happen when
you actually run the program. The definition of the semantics of C++ is more involved,
telling you what each line of code does: what effect does it have on the memory of the
computer?
And C++ semantics are complex indeed. For example, a = b+c; has the following
semantics:
add the integer stored in b to the integer stored in c and store the result in a...
unless one of b or c is a float or a double, in which case use floating point
arithmetic instead, and if the other argument is an int it is first converted to a
double... unless b is actually an instance of a class that overloads the + operator,
in which case call the member function associated with that operator...
And so on. Hopefully this makes it clear why, to learn how to reason formally about computer
programs, we start with a much simpler model of computation than C++ or Python, one
whose syntax and semantics definitions fit on a page.
3.3 Formal definition of a DFA (syntax)
To describe transitions between states, we introduce a transition function δ. The goal is
to express that if an automaton is in state q, and it reads the symbol 1 (for example), and
transitions to state q′, then this means δ(q, 1) = q′.
Definition 3.3.1. A (deterministic) finite automaton (DFA) is a 5-tuple (Q, Σ, δ, s, F ),
where
• Q is a non-empty, finite set of states,
• Σ is the input alphabet,
• s ∈ Q is the start state,
• F ⊆ Q is the set of accepting states, and
• δ : Q × Σ → Q is the transition function.
Example 3.3.2. The DFA D1 of Figure 3.1 is defined D1 = (Q, Σ, δ, s, F ), where

• Q = {a, b, c},
¹ Of course, that doesn’t do much to help you write syntactically valid C++ programs. Another more useful way
to define it is that it is any ASCII string that is produced by the C++ grammar (http://www.nongnu.org/hcb/).
We will cover grammars briefly later in the course, and you will understand better what it means for a string to be
produced by a grammar.
• Σ = {0, 1},
• s = a,
• F = {b}, and
• δ is defined
δ(a, 0) = a,
δ(a, 1) = b,
δ(b, 0) = c,
δ(b, 1) = b,
δ(c, 0) = b,
δ(c, 1) = b,
or more succinctly, we represent δ by the transition table
δ 0 1
a a b
b c b
c b b
The state transition diagram and this formal description contain exactly the same informa-
tion. The diagram is easy for humans to read, and the formal description is easy to work
with mathematically, and to program.
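For instance, here is one way D1 might be programmed in Python (an illustrative sketch of ours, not code from any library); the dictionary delta is a direct transcription of the transition table:

Q = {"a", "b", "c"}
Sigma = {"0", "1"}
delta = {("a", "0"): "a", ("a", "1"): "b",
         ("b", "0"): "c", ("b", "1"): "b",
         ("c", "0"): "b", ("c", "1"): "b"}
s = "a"
F = {"b"}

def accepts(x):
    # run D1 on input string x; True iff D1 ends in an accept state
    q = s
    for b in x:
        q = delta[(q, b)]
    return q in F

assert accepts("1101")         # the trace from Section 3.1 ends in state b
assert not accepts("10")       # one 0 (an odd number) follows the last 1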
If M = (Q, Σ, δ, s, F ) is a DFA, how many transitions are in δ; in other words, how many
entries are in the transition table?
If A ⊆ Σ∗ is the set of all strings that M accepts, we say that M decides A, and we write
L(M ) = A.
Every DFA decides a language. If it accepts no strings, what language does it decide? ∅
D1 decides the language

L(D1) = { w ∈ {0, 1}∗ | w contains at least one 1 and an even number of 0’s follow the last 1 }.
Note the terminology here: M decides a single language L(M ), but for each string x ∈ Σ∗ ,
M either accepts or rejects x. M does not decide a string, nor does it accept or reject a
language; this misuse of terms mixes up the types.
Example 3.3.3. See Figure 3.2.
Formally, D2 = (Q, Σ, δ, s, F ), where Q = {a, b}, Σ = {0, 1}, s = a, F = {b}, and δ is
defined
δ 0 1
a a b
b a b
What does D2 do on input 1101?
Figure 3.2: A DFA D2. L(D2) = { w ∈ {0, 1}∗ | w ends in a 1 } = {1, 01, 11, 001, 011, 101, 111, 0001, . . .}.

Example 3.3.4. Unless we want to talk about the individual states, when specifying a state diagram, it is not necessary to actually give names to the states in the diagram; the start arrow and transition arrows tell us the whole structure of the DFA. Figure 3.3 shows such a state diagram.

Figure 3.3: A DFA D3. L(D3) = { w ∈ {0, 1}∗ | w does not end in a 1 }.

end of lecture 1b

3.4 More examples
Example 3.4.1. Design a DFA that decides the language
{ a^{3n} | n ∈ N } = { w ∈ {a}∗ | |w| is a multiple of 3 } = {ε, aaa, aaaaaa, aaaaaaaaa, . . .}.
See Figure 3.4.
Example 3.4.2. Design a DFA that decides the language
{ a^{3n+2} | n ∈ N } = { w ∈ {a}∗ | |w| ≡ 2 mod 3 } = {aa, aaaaa, aaaaaaaa, . . .}.
Just switch the accept state in Figure 3.4 from 0 to 2.
Example 3.4.3. Design a DFA that decides the language
{ w ∈ {0, 1}∗ | w represents a multiple of 2 in binary } .
For example it should accept 0, 1010, and 11100 and reject 1, 101, and 111.
This is actually D3 from Figure 3.3.
Figure 3.4: A DFA deciding whether its input string’s length is a multiple of 3. Here we give state names that are meaningful: each represents the remainder left when dividing by 3 the number of symbols read so far.
Example 3.4.4. Design a DFA that decides the language

{ w ∈ {0, 1}∗ | w represents a multiple of 3 in binary } .

See Figure 3.5.
Figure 3.5: Design of a DFA to decide multiples of 3 in binary. If a natural number n has b ∈ {0, 1} appended to its binary expansion, then this number is 2n + b. Top: a DFA with transitions to implement “when in state k after reading bits representing n, if n ≡ k mod 3, then after reading bit b, so that the bits read so far now represent 2n + b, change to the state k′ such that 2n + b ≡ k′ mod 3”. This assumes ε represents 0, and that leading 0’s are allowed, e.g., 5 is represented by 101, 0101, 00101, 000101, . . . Bottom: a modification of the top DFA to reject ε and any positive integers with leading 0’s.
Intuitively, we can keep track of whether the bits read so far represent an integer n ∈ N
that is of the form n = 3k for some k ∈ N (i.e., n is a multiple of 3, or n ≡ 0 mod 3),
n = 3k + 1 (n ≡ 1 mod 3), or n = 3k + 2 (n ≡ 2 mod 3). Appending the bit 0 to the end of
n’s binary expansion multiplies n by 2, resulting in 2n, whereas appending the bit 1 results
in 2n + 1.
For example, consider appending a 1 to a multiple of 3. Since n = 3k, then 2n + 1 = 2(3k) + 1 = 3(2k) + 1 ≡ 1 mod 3, so the transition function δ obeys δ(0, 1) = 1. Consider appending a 0 to n where n ≡ 2 mod 3. Since n = 3k + 2, then 2n = 2(3k + 2) = 6k + 4 = 3(2k + 1) + 1, so 2n ≡ 1 mod 3, so δ(2, 0) = 1. The other four cases are similar.
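In fact the whole transition function is δ(k, b) = (2k + b) mod 3, so all six cases can be checked at once. A sketch in Python (ours), testing the top DFA of Figure 3.5 on the first 100 natural numbers:

delta = {(k, b): (2 * k + b) % 3 for k in range(3) for b in (0, 1)}

def state_after(x):
    q = 0                          # start state 0: epsilon represents 0, and 0 mod 3 = 0
    for bit in x:
        q = delta[(q, int(bit))]
    return q

for n in range(100):
    assert state_after(bin(n)[2:]) == n % 3    # the state always equals n mod 3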
3.5 Formal definition of computation by a DFA (semantics)
We say D = (Q, Σ, δ, s, F ) accepts a string x ∈ Σ^n if there is a sequence of states q0, q1, q2, . . . , qn ∈
Q, called a computation sequence of D on x, such that
1. q0 = s,
2. qn ∈ F , and
3. δ(qi , x[i + 1]) = qi+1 for all i ∈ {0, 1, . . . , n − 1}.
Otherwise, D rejects x.
For example, for the DFA in Figure 3.4, for the input x = aaaaaa, the sequence of states
(q0 , q1 , q2 , q3 , q4 , q5 , q6 ) testifying that the DFA accepts x is (0, 1, 2, 0, 1, 2, 0).
Define the language decided by D as L(D) = { x ∈ Σ∗ | D accepts x } . We say a lan-
guage L is DFA-decidable if some DFA decides L.
Using more “computational terminology”, a decision problem is DFA-decidable if it can
be solved by an algorithm that uses a constant amount of memory and that runs in exactly
n steps on input strings of length n.²
One major goal of this part of the book is to demonstrate that many different models
of computation all define exactly the DFA-decidable languages. In fact, the more common
term for this class is regular. The term “regular language” actually seems a bit inconsistent
at this point; it is named for a model studied later called “regular expressions”. But, before
we prove that they all define the same class of languages, we need some way to refer to those
classes in a way that evokes the model defining the class.
In other words, the class of regular languages is the set of all languages decided by some
DFA. We will use the term “DFA-decidable” until we actually finish proving that DFA-
decidable is the same thing as NFA-decidable, regex-decidable, and RG-decidable, after
which we will simply use the term regular instead.

end of lecture 1c

² This is a bit inexact, as I haven’t said what “memory” or “steps” are for general algorithms. But the meaning is
clear for DFAs: memory is the set of states and steps correspond to individual transitions. Other reasonable ways of
defining space (memory) and time (number of steps) for other models of computation lead to models with equivalent
computational power. In fact, the time restriction doesn’t even matter: if you use a constant amount of memory
and any amount of time, only regular decision problems can be solved, although proving that is beyond the scope
of this course. So for example, if you write a Python function returning a Boolean, and it uses no lists or sets or
other unbounded data structures, and if it uses no recursion (a way to use unbounded memory from the function call
stack), then the decision problem it is solving is regular, i.e., also solvable by a DFA.
Chapter 4

Declarative models of computation
Using programming language terminology, DFAs are an imperative language, much like C++
or Python: they represent a sequence of instructions telling the machine what to do, based
on what input is read. This chapter introduces three models of computation: regular expres-
sions, grammars, and non-deterministic finite automata. In contrast to DFAs, these models
are declarative: each describes some language, but the model itself gives no indication of
instructions for how to “execute” an instance of the model.¹ It’s not obvious how to create
an algorithm deciding a language described by a grammar or regular expression. (Although
such algorithms do exist.) Non-deterministic finite automata are closer to DFAs in the sense
that they are interpreted to start in some state and transition from state to state based on
reading input symbols, but there is a special ability to make non-deterministic choices, and
it’s not clear how one would implement this ability in a real machine (but we will see how).
4.1 Regular expressions

4.1.1 Introduction
A regular expression is a string encoding a “pattern” that matches some set of strings, used
in many programming languages and applications such as grep. For example, the regular
expression
(0 ∪ 1)0∗
matches any string starting with a 0 or 1, followed by zero or more 0’s.²
In fact, with a small substitution of symbols, the above expression is quite literally an
expression of the set of strings it represents, using set notation and string operations we
defined in Chapter 2. Simply replace each bit with a singleton set: ({0} ∪ {1}){0}∗ , which
equals {0, 1}{0}∗ = {0, 1, 00, 10, 000, 100, 0000, 1000, 00000, 10000 . . .}
¹ In this sense these models are similar to declarative programming languages such as Prolog or SQL. One programs
by “declaring” what the computation result should be, without specifying (quite as directly) how to compute it.
² The notation (0|1)0∗ is more common, especially in programming language regex libraries. We use the ∪ symbol
in place of | to emphasize the connection to set theory, and because | looks too much like 1 when hand-written.

The next regular expression
(0 ∪ 1)∗
matches any string consisting of zero or more 0’s and 1’s; i.e., any binary string.
Let Σ = {0, 1, 2}. Then
(0Σ∗ ) ∪ (Σ∗ 1)
matches any ternary string that either starts with a 0 or ends with a 1. The above is
shorthand for
(0(0 ∪ 1 ∪ 2)∗ ) ∪ ((0 ∪ 1 ∪ 2)∗ 1)
since Σ = {0} ∪ {1} ∪ {2}.
Each regular expression R defines (or decides, to use consistent terminology with other
automata) a language L(R).
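These examples can be checked against a practical regex library. A sketch using Python's re module (which, as noted in the footnote above, writes | for ∪; fullmatch requires the whole string to match):

import re

assert re.fullmatch(r"(0|1)0*", "1000")         # matches (0 ∪ 1)0*
assert not re.fullmatch(r"(0|1)0*", "1010")
assert re.fullmatch(r"(0|1)*", "011010")        # matches (0 ∪ 1)*
starts0_or_ends1 = r"0[012]*|[012]*1"           # (0Σ*) ∪ (Σ*1) over Σ = {0, 1, 2}
assert re.fullmatch(starts0_or_ends1, "0221")
assert re.fullmatch(starts0_or_ends1, "2201")
assert not re.fullmatch(starts0_or_ends1, "120")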
We now give a formal inductive definition of a regular expression. It has three base cases
and three inductive cases.
4.1.2 Formal definition of a regex (syntax and semantics)
Definition 4.1.1. Let ∆ = {∪, (, ),∗ , ∅, ε} be the (regex) control alphabet. Let Σ be an
alphabet such that Σ ∩ ∆ = ∅, called the input alphabet. Let Γ = Σ ∪ ∆.
We define regular expressions inductively. R ∈ Γ∗ is a regular expression (regex), deciding
language L(R) ⊆ Σ∗ , if one of the following cases holds.
1. base cases:
(a) R = b for some b ∈ Σ. Then L(R) = {b}.
(b) R = ε. Then L(R) = {ε}.
(c) R = ∅. Then L(R) = {}.
2. inductive cases (let R1 , R2 be regex’s):
(a) R = (R1 ) ∪ (R2 ). Then L(R) = L(R1 ) ∪ L(R2 ).
(b) R = (R1 )(R2 ). Then L(R) = L(R1 ) ◦ L(R2 ).
(c) R = (R1 )∗ . Then L(R) = L(R1 )∗ .
A language is regex-decidable if some regex decides it.

4.1.3 Conventions
We sometimes abuse notation and write R to mean L(R), relying on context to interpret the
meaning.
Optionally, we may omit the parentheses in some cases. The operators have precedence
∗ > ◦ > ∪. For example, 11 ∪ 01∗ is equivalent to (1)(1) ∪ ((0)(1)∗).³
³ This is similar to why the arithmetic expression 3 × 7 + 4 × 5^6 is equivalent to (3) × (7) + ((4) × (5)^6), but not to ((3 × 7) + 4) × (5^6) or (3 × ((7 + 4) × 5))^6.
For convenience, define R+ = RR∗; for each k ∈ N, let R^k = RR . . . R (k times); and given an alphabet Σ = {a1, a2, . . . , ak}, Σ is shorthand for the regex a1 ∪ a2 ∪ . . . ∪ ak.

The cases ε and ∅. Base case (1c) R = ∅ is required for the technical reason that, if instead
it were omitted, then the easiest decision problem there is, “just say no” (i.e., the language
∅) would not be regex-decidable. However, the case R = ∅ is rarely used in practice. The
regex simulator (http://web.cs.ucdavis.edu/~doty/automata/) has no way to specify it.
Similarly, base case (1b) R = ε is rarely used in practice. However, for many of the proofs
we do in this course, it will be convenient to have ε as its own special case. It is specified in
the simulator by the simple absence of symbols. For example, typing nothing corresponds
to the regex ε, |ab is the regex ε ∪ ab, and cd*||ab is the regex cd∗ ∪ ε ∪ ab.

Regex trees. Although a regex is merely a string, a more structured way to think of a
regex, based on the inductive definition, is as having a tree structure, similar to the parse
tree for an arithmetic expression such as (7 + 3) · 9 + 11 · 5. The base cases are leaves, and the
recursive cases are internal nodes, with one child in the case of the unary operation ∗ and
two children in the case of the binary operations ◦ and ∪. See Figure 4.1 for an example.
Figure 4.1: Recursive structure of the regular expression ab∗ ∪ (a ∪ b)(a∗ b ∪ bbab) viewed as a tree.

4.1.4 Examples
Let Σ be an alphabet.
• 0∗10∗ = { w | w contains a single 1 }.
• Σ∗1Σ∗ = { w | w has at least one 1 }.
• Σ∗1Σ∗1Σ∗ = { w | w has at least two 1’s }.
• 0∗10∗10∗ = { w ∈ {0, 1}∗ | w has exactly two 1’s }.
• Σ∗001Σ∗ = { w | w contains the substring 001 }.
• 1∗(01+)∗ = { w | every 0 in w is followed by at least one 1 }.
• (ΣΣ)∗ = { w | |w| is even }.
• 0(0 ∪ 1)∗0 ∪ 1(0 ∪ 1)∗1 ∪ 0 ∪ 1 = { w ∈ {0, 1}∗ | w starts and ends with the same symbol }.
• (0 ∪ ε)1∗ = 01∗ ∪ 1∗.
• (0 ∪ ε)(1 ∪ ε) = {ε, 0, 1, 01}.
• R ∪ ∅ = R, where R is any regex.
• Rε = R, where R is any regex.
• R∅ = ∅, where R is any regex.
skip in lecture

∅∗ = {ε}.⁴

There is an algebra of regular expression operations that can be helpful to thinking about
when manipulating or simplifying them. One can think of ∪ analogously to addition and
◦ analogously to multiplication. The distributive law holds: A(B ∪ C) is equivalent to
AB ∪ AC.⁵ In other words, having “a string matching A, followed by one matching either B
or C” is equivalent to having “either a string matching A followed by one matching B, or a
string matching A followed by one matching C”. One can think of ∅ as the additive identity
(which is why R ∪ ∅ = R and R∅ = ∅, analogously to x + 0 = x and x · 0 = 0), and one can
think of ε as the multiplicative identity (which is why Rε = R, analogously to x · 1 = x).
⁴ The identity ∅∗ = {ε} is more a convention than something that can be derived from the definitions. This is
similar to the convention for 00 (the number zero raised to the power zero). Is 00 = 0, because 0 times any number
is 0? Or is 00 = 1, because anything raised to the power 0 is 1? It’s not particularly well defined, so we just say
00 = 1 by convention. Similarly, it’s not clear whether ∅∗ = ∅ because ∅ concatenated to anything is ∅, or whether
∅∗ contains ε because ε ∈ A∗ for all A. Like in the case of 00 , we just choose a convention and say that ∅∗ = {ε}. In
practice, this doesn’t come up much.
⁵ As with addition and multiplication, the distributive law fails if we swap the operations: A ∪ BC is not equivalent
to (A ∪ B)(A ∪ C).
Example 4.1.2. Design a regex to match C++ double literals. For example, the following
are valid:
2.0, 3.14, .02, 3., 0.02, -.02, +4.5, 0.00
The following are not valid:
., +, +-2.0, 00.0, 2 (that’s an int, not a double)
Let P = 1 ∪ . . . ∪ 9 be a regex deciding a single positive decimal digit. Let D = 0 ∪ P .
(+ ∪ − ∪ ε)(.D+ ∪ (P D∗ ∪ 0).D∗ )
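As a sanity check, the same regex can be transcribed into Python's re syntax (our transcription: the literal dot must be escaped as \., P becomes [1-9], D becomes [0-9], and + ∪ − ∪ ε becomes the optional sign [+-]?):

import re

double_literal = re.compile(r"[+-]?(\.[0-9]+|([1-9][0-9]*|0)\.[0-9]*)")
valid = ["2.0", "3.14", ".02", "3.", "0.02", "-.02", "+4.5", "0.00"]
invalid = [".", "+", "+-2.0", "00.0", "2"]
assert all(double_literal.fullmatch(x) for x in valid)
assert not any(double_literal.fullmatch(x) for x in invalid)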
end of lecture 2a

4.2 Context-free grammars
4.2.1 Introduction
Here is an example of a grammar:

S → AB
A → 0A
A → ε
B → 1B
B → ε

A grammar consists of rules (a.k.a., productions), one on each line. The single symbol on the
left is a variable, and the string on the right consists of variables and other symbols called
terminals. One variable is designated as the start variable, on the left of the topmost rule.
The fact that the left side of each production has a single variable (instead of a multiple-
symbol string) means the grammar is context-free. We abbreviate two rules with the same
left-hand variable as follows: A → 0A | ε is shorthand for the two rules A → 0A and
A → ε.
S, A, and B are variables, and 0 and 1 are terminals.
The idea is that we start with a single copy of the start variable. We pick a rule with
the start variable and replace the variable with the string on the right side of the rule. This
gives a string mixing variables and terminals. We pick some variable in it, pick a rule with
that variable on the left side, and again replace the variable with the right side of the rule.
Do this until the string is all terminal symbols.

What is the set of strings producible with the rules of the example grammar above?
There’s no choice how to start: S becomes AB, and we write this as S ⇒ AB.
Now, we can replace either A or B; let’s pick A. We can replace it with either 0A or ε; let’s
pick 0A. The B remains unchanged as we do this substitution. Then AB ⇒ 0AB. We could
do this rule again: 0AB ⇒ 00AB. We could now replace B with 1B, so 00AB ⇒ 00A1B.
Do that a few more times: 00A1B ⇒ 00A11B ⇒ 00A111B ⇒ 00A1111B ⇒ 00A11111B.
We could go back to substituting for A: 00A11111B ⇒ 000A11111B. Now, we could
substitute A with the other rule A → ε, so 000A11111B ⇒ 00011111B. We could do the
same for B, and now there’s no more variables left to substitute: 00011111B ⇒ 00011111.
So we’ve produced string 00011111. What other strings are producible?
Example 4.2.1. Write a grammar generating the language { 0^n1^n | n ∈ N }.
S → 0S1 | ε
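One way to get a feel for which strings a grammar produces is to sample random derivations. A small sketch of ours (not a standard tool), storing the rules as a dictionary from each variable to its possible right-hand sides and always replacing the leftmost variable:

import random

rules = {"S": ["0S1", ""]}      # Example 4.2.1: S -> 0S1 | epsilon

def derive(start="S"):
    # repeatedly replace the leftmost variable with a random right-hand side
    s = start
    while any(v in s for v in rules):
        i = min(s.index(v) for v in rules if v in s)   # position of leftmost variable
        s = s[:i] + random.choice(rules[s[i]]) + s[i + 1:]
    return s

for _ in range(5):
    w = derive()
    n = len(w) // 2
    assert w == "0" * n + "1" * n    # every produced string has the form 0^n 1^n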
Example 4.2.2. Write a grammar generating the language of properly nested parentheses.
Examples of properly nested parentheses (each opening ( pairs with a matching closing ) to its right):
• ()
• ()()
• (())
• ()(()())(()(()))
Examples of not properly nested parentheses:
• )(: a parenthesis cannot close without an opening one to its left.
• ((): the first open parenthesis is never closed.
• ()): the last closing parenthesis doesn’t match one that was opened.
Hint: the following rule describes properly nested parentheses: if two strings have prop-
erly nested parentheses, then so does their concatenation. Also, if x has properly nested
parentheses, then surrounding it with parentheses keeps it properly nested.

S → (S) | SS | ε
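The hint also suggests a direct membership test, far simpler than general grammar parsing: scan left to right, counting how many parentheses are currently open. A sketch (ours):

def balanced(w):
    depth = 0
    for c in w:
        depth += 1 if c == "(" else -1
        if depth < 0:               # a ) with no opening ( to its left
            return False
    return depth == 0               # every ( was eventually closed

assert balanced("()") and balanced("()()") and balanced("(())")
assert not balanced(")(") and not balanced("(()") and not balanced("())")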
Context-free versus context-sensitive grammars. In theoretical computer science, “context-free grammar” is the common term, whereas in programming languages, “grammar” is more
common. The adjective “context-free” distinguishes them from “context-sensitive” gram-
mars, in which the right side of a production rule can have multiple symbols, such as
AB → Axy, meaning “replace B with xy, but only if there is an A to the left of B”.
We won’t study context-sensitive grammars, but it is worth noting that they have greater
computational power than context-free grammars.
4.2.2 Formal definition of a CFG (syntax)
Definition 4.2.3. A context-free grammar (CFG) is a 4-tuple (Γ, Σ, S, ρ), where
• Γ is a finite alphabet of variables,
• Σ is a finite alphabet, disjoint from Γ, called the terminals,
• S ∈ Γ is the start symbol, and
• ρ ⊆ Γ × (Γ ∪ Σ)∗ is a finite set of rules.
By convention, if only the rules are listed, the variable on the left-hand side of the first rule
is the start symbol. Variables are often uppercase letters, but not necessarily: the variables
are whatever appears on the left-hand side of rules. It is common in grammars for defining
programming languages for the variables to have long names to help readability, such as
AssignmentOperator or while_statement. Terminals are symbols other than uppercase
letters, such as lowercase letters, numbers, and other symbols such as parentheses.
4.2.3 Formal definition of computation by a CFG (semantics)
A string in (Σ ∪ Γ)∗ is called a derived string. Let x, y, z ∈ (Σ ∪ Γ)∗ and A ∈ Γ, and let
A → z be a production rule. Then we write xAy ⇒ xzy to denote that the rule A → z,
applied to derived string xAy, results in another derived string xzy. A sequence of derived
strings x1 , . . . , xk , where x1 = S and xk ∈ Σ∗ , where xi ⇒ xi+1 for 1 ≤ i < k, is called a
derivation. For example, for the grammar G with rules A → 0A1 | B and B → :, a derivation of 000:111 is

A ⇒ 0A1 ⇒ 00A11 ⇒ 000A111 ⇒ 000B111 ⇒ 000:111.
We also say that the grammar accepts or produces the string 000:111.
For the balanced parentheses grammar S → (S) | SS | ε, a derivation of (()(())) is

S ⇒ (S) ⇒ (SS) ⇒ ((S)S) ⇒ (()S) ⇒ (()(S)) ⇒ (()((S))) ⇒ (()(()))

We also may represent this derivation with a parse tree, shown in Figure 4.2.
The set of strings accepted by a CFG G is denoted L(G), and we say G decides (a.k.a.,
generates) L(G).
Given the example CFG G above, its language is L(G) = { 0^n:1^n | n ∈ N }. A language generated by a CFG is called context-free or CFG-decidable.⁶
Example 4.2.4. This is a small portion of the grammar for the Python programming
language (https://docs.python.org/3/reference/grammar.html). The rules use a : in-
stead of →, and they allow some syntactic sugar such as regex operators, but the grammar
could in principle be specified using only the syntax as defined above.
⁶ The common term is “context-free”. We use the phrase “CFG-decidable” to be consistent with our convention
of naming language classes after a model of computation defining them.
Figure 4.2: A parse tree for a derivation of (()(())). Each internal node represents a variable to which a production rule was applied; the children represent the variable(s) and terminal(s) on the right side of the rule. Concatenating the leaves in order from left to right gives the produced string.

compound_stmt: if_stmt | while_stmt | ...
if_stmt: ’if’ test ’:’ suite (’elif’ test ’:’ suite)* [’else’ ’:’ suite]
while_stmt: ’while’ test ’:’ suite [’else’ ’:’ suite]
test: or_test [’if’ or_test ’else’ test] | lambdef
or_test: and_test (’or’ and_test)*
and_test: not_test (’and’ not_test)*
not_test: ’not’ not_test | comparison
4.2.4 Right-regular grammars (RRG)
There is a special kind of grammar that will be significant in Section 6.2. A right-regular
grammar (RRG) is one whose rules are all of the form A → ε or A → bC for some terminal
b and variable C.

end of lecture 2b

4.3 Nondeterministic finite automata (NFA)
4.3.1 Introduction
DFA: realistic, but difficult to program.
NFA: (apparently) unrealistic, but easy to program.

Differences between DFA’s and NFA’s:


Figure 4.3: A nondeterministic finite automaton (NFA) called N1. Differences with a DFA are highlighted in red: 1) There are two 1-transitions leaving state q0, 2) There is an “ε-transition” leaving state s, 3) There is no 1-transition leaving states s, r0, or q2 and no 0-transition leaving states r1 or q2. Normally “missing” transitions are not shown in NFAs, but we highlight them here to emphasize the difference with a DFA.
• An NFA state may have any number (including 0) of transition arrows leaving it for the same input symbol.⁷
• An NFA may change states without reading any input (an ε-transition).
• If there is a series of choices (when there is a choice) to reach an accept state after
reading the whole string, then the NFA accepts. Otherwise, the NFA rejects.⁸
Example 4.3.1. Design an NFA to decide the language {x ∈ {0, 1}∗ | x[|x| − 2] = 1}.
See Figure 4.4. This uses nondeterministic transitions (but no ε-transitions) to “guess”
when the string is 3 bits from the end.

Figure 4.4: An NFA N2 deciding the language {x ∈ {0, 1}∗ | x[|x| − 2] = 1}.
4.3.2 Formal definition of an NFA (syntax)
For any alphabet Σ, let Σε = Σ ∪ {ε}.
Definition 4.3.2. A nondeterministic finite automaton (NFA) is a 5-tuple (Q, Σ, ∆, s, F ),
where
⁷ Unlike a DFA, an NFA may attempt to read a symbol a in a state q that has no transition arrow for a; in this
case the NFA is interpreted as immediately rejecting the string without reading any more symbols.
⁸ Note that accepting and rejecting are treated asymmetrically, with acceptance “favored”; if there is a series of
choices that leads to accept and there is another series of choices that leads to reject, then the NFA accepts.
• Q is a non-empty, finite set of states,

• Σ is the input alphabet,

• ∆ : Q × Σε → P(Q) is the transition function,

• s ∈ Q is the start state, and

• F ⊆ Q is the set of accepting states.

When defining ∆, we assume that if for some q ∈ Q and b ∈ Σε , ∆(q, b) is not explicitly
defined, then ∆(q, b) = ∅.

Example 4.3.3. Recall the NFA N1 in Figure 4.3. Formally, N1 = (Q, Σ, ∆, s, F ), where

• Q = {s, r0 , r1 , q0 , q1 , q2 },

• Σ = {0, 1},

• F = {q2 , r0 , r1 }, and

• ∆ is defined
∆ 0 1 ε
s {q0 } ∅ {r0 }
r0 {r1 } ∅ ∅
r1 ∅ {r0 } ∅
q0 {q0 } {q0 , q1 } ∅
q1 {q2 } {q2 } ∅
q2 ∅ ∅ ∅

4.3.3 Set of states reachable by an NFA
A DFA, after reading a string, has exactly one state it can be in. An NFA, since it has many
possible paths, could be in many states (or none).
What are all the states N1 could be in after reading the following strings? (States in F = {q2, r0, r1} are accepting.)

ε: s, r0
0: q0, r1
1: ∅
00: q0
01: q0, q1, r0
10: ∅
11: ∅
000: q0
010: q0, q2, r1
011: q0, q1, q2
N1 accepts a string if any of the states it could be in after reading the string is accepting; it’s okay for some of them to be rejecting as long as at least one is accepting. Thus, N1 accepts ε, 0, 01, 010, 011 and rejects 1, 00, 10, 11, 000.
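This set-of-possible-states view is directly computable. A sketch of ours for N1, transcribing the transition table of Example 4.3.3 (missing entries default to ∅, and the empty string "" plays the role of ε):

Delta = {("s", "0"): {"q0"}, ("s", ""): {"r0"},
         ("r0", "0"): {"r1"}, ("r1", "1"): {"r0"},
         ("q0", "0"): {"q0"}, ("q0", "1"): {"q0", "q1"},
         ("q1", "0"): {"q2"}, ("q1", "1"): {"q2"}}
F = {"q2", "r0", "r1"}

def eps_closure(states):
    # all states reachable from `states` via 0 or more epsilon-transitions
    stack, closure = list(states), set(states)
    while stack:
        q = stack.pop()
        for r in Delta.get((q, ""), set()):
            if r not in closure:
                closure.add(r)
                stack.append(r)
    return closure

def reachable(x):
    current = eps_closure({"s"})
    for b in x:
        step = set().union(*(Delta.get((q, b), set()) for q in current))
        current = eps_closure(step)
    return current

assert reachable("") == {"s", "r0"}
assert reachable("010") == {"q0", "q2", "r1"}
assert reachable("010") & F        # some reachable state is accepting: N1 accepts 010
assert not (reachable("00") & F)   # no reachable state is accepting: N1 rejects 00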
4.3.4 Transition function versus set of transitions
Sometimes it is easier with NFAs to talk about sets of transitions, where each transition (represented by an arrow in the state diagram) is a triple (qf, a, qt), where qf ∈ Q is the from-state, qt ∈ Q is the to-state, and a ∈ Σε is the transition label. We say this is an a-transition from qf to qt, and we write qf −a→ qt to denote the transition.
In the transition table above, the from-states are listed on the left column, the transition labels are on the top row, and the to-states are in the table entries. For example, N1 has transitions s −0→ q0, s −ε→ r0, r0 −0→ r1, r1 −1→ r0, q0 −0→ q0, q0 −1→ q0, q0 −1→ q1, q1 −0→ q2, and q1 −1→ q2. When doing “constructions” in Chapter 5 (algorithms that process an input automaton to produce an output automaton), we will use the language of sets of transitions to talk about adding or removing transitions from the input to produce the output. This is a shorthand for adding or removing to-states from the entries of the table above.
4.3.5 Formal definition of computation by an NFA (semantics)
An NFA N = (Q, Σ, ∆, s, F ) accepts a string x ∈ Σ∗ if there are sequences y1 , y2 , . . . , ym ∈ Σε
and q0 , q1 , . . . , qm ∈ Q such that

1. x = y1 y2 . . . ym ,

2. q0 = s,

3. qi+1 ∈ ∆(qi , yi+1 ) for i ∈ {0, 1, . . . , m − 1}, and

4. qm ∈ F .

The two sequences together uniquely identify which transition arrows were followed. We refer
to either the sequence of transition arrows, or sometimes the sequence of states followed, as
a computation sequence of N on x.⁹

4.3.6 Example NFA using ε-transitions
Figure 4.5: An NFA N3 (states a, b, c, d, e from left to right) deciding the language {x ∈ {0, 1}∗ | x[|x| − 2] = 1}.

⁹ Unlike a DFA, even if we know x and q0, . . . , qm, we cannot uniquely identify which transitions were followed,
since it may be possible to take ε-transitions at different points while reading x, yet follow the same sequence of
states. (Example?)
Example 4.3.4. Figure 4.5 shows another NFA deciding if the third-to-last bit is 1, but using
an ε-transition. This example shows that unlike a DFA, the sequence of states q0 , q1 , . . . , qm
in the definition of NFA acceptance of a string of length n can be longer than n + 1. For
example, the string 0100 has y1 = 0, y2 = ε, y3 = 1, y4 = 0, y5 = 0 and q0 = a, q1 = a, q2 =
b, q3 = c, q4 = d, q5 = e.
Note that if N is actually deterministic (i.e., no ε-transitions and |∆(q, b)| = 1 for all
q ∈ Q and b ∈ Σ), then the sequence of states leading to an accept state on a given input
is unique, and it is the same sequence of states as in the definition of DFA acceptance, with
n + 1 states to accept a string of length n.
Compared to DFAs, with NFAs, strings appear
• easier to accept, but
• harder to reject.¹⁰
We won’t spend time practicing NFA design right now as we did with DFAs. The utility
of the non-determinism “feature” of NFAs will become more apparent in Chapter 5.

end of lecture 2c

¹⁰ The trick with NFAs is that it becomes (apparently) easier to accept a string, since multiple paths through the
NFA could lead to an accept state, and only one must do so in order to accept. But NFAs aren’t magic; you can’t
simply put accept states and ε-transitions everywhere and claim that there exist paths doing what you want.
By the same token that makes acceptance easier, rejection becomes more difficult, because you must ensure that,
if the string ought to be rejected, then no path leads to an accept state. Therefore, the more transitions and accept
states you throw in to make accepting easier, that much more difficult does it become to design the NFA to properly
reject. The key difference is that the condition “there exists a path to an accept state” becomes, when we negate
it to define rejection, “all paths lead to a reject state”. It is more difficult to verify a “for all” claim than a “there
exists” claim.
Chapter 5

Closure of language classes under set operations

So far we have read and “programmed” DFAs, CFGs, regex’s, and NFAs that decide par-
ticular languages. We now study fundamental properties shared by all languages decided
by these models. This will involve thinking at a higher level of abstraction than we’ve done
so far. Rather than starting with a concrete problem we want to solve, and designing, for
example, a DFA for it, we start with an abstract DFA D given to us by someone else. We
don’t know what D looks like, other than it is a valid DFA deciding some language (we don’t
know the language, but we know it’s called L(D)). We have to change it into a second DFA
D′, for instance deciding the complement of L(D).
This might be awkward to think about, but all we are doing is describing an algorithm
that operates on DFAs. Just as with a sorting algorithm that sorts lists, you can’t make
any assumptions about the list. It could be any list, with any number of elements, in any
order, and the algorithm has to work on all of them. That’s the same idea we will apply
in this chapter: we will describe algorithms that take an automaton as input and produce
another automaton as output. We sometimes call these algorithms “constructions” because
the output of the algorithm is an automaton like a DFA or a regex that has been constructed.
We’re also not going to implement the algorithms in code, although you could. In fact, the
autograders implement all of them in code!
5.1 Automatic transformation of regex’s, NFAs, DFAs
Let’s start with a simple idea. Suppose we have a DFA D, and we swap the accept and reject states to get DFA D′. How are L(D) and L(D′) related? They behave exactly the same on every input string, except that whenever D is accepting, D′ is rejecting, and vice versa. D and D′ give the opposite answer on every single input. Therefore L(D′) is the complement of L(D).
This idea works for every single DFA. You take any DFA, defining a DFA-decidable language L, swap the accept/reject states, and now you have a DFA deciding the complement of L. In other words, if a language L is DFA-decidable, then so is its complement. There’s a word for this concept: the DFA-decidable languages are closed under complement.¹

Observation 5.1.1. The class of DFA-decidable languages is closed under complement.

At this point you may understand the meaning of the above observation, but not under-
stand why it’s worth thinking about. So what if the DFA-decidable languages are closed
under complement? Who cares? If we think about it from the point of view of solving
problems, understanding these closure properties is often the key to breaking the problem
down and solving it.
The problem you are trying to solve, i.e., the language L you are trying to decide, may
not be obviously decidable by any DFA. But if you can spot that L is simply the complement
of another DFA-decidable language, then you know immediately that L is DFA-decidable.
It goes the other way too: in Chapter 7 we will show that certain languages are not DFA-
decidable. The techniques for proving such impossibility results are quite different from
showing that a language is DFA-decidable. So you may be asked to show L is not DFA-decidable, and it’s not obvious how to do it. But, if you have already shown that the complement of L is not DFA-decidable, then you can immediately conclude that L cannot be DFA-decidable either, because otherwise its complement would be also.
Let’s think about the other common set-theoretic operations we have for languages:
union, intersection (which are applicable for any kind of set, including languages), concate-
nation, and Kleene star (which only are applicable to languages). Let A and B be languages.

Union: A ∪ B = { x | x ∈ A or x ∈ B }

Intersection: A ∩ B = { x | x ∈ A and x ∈ B }

Concatenation: AB = { xy | x ∈ A and y ∈ B }² (also written A ◦ B)

(Kleene) Star: A∗ = ⋃_{n=0}^{∞} A^n = { x1x2 . . . xk | k ∈ N and x1, x2, . . . , xk ∈ A }³

Each is an operator on one or two languages, with another language as output.


Are the DFA-decidable languages closed under any of these operations? It’s not obvious.
It turns out they are, but proving this will occupy this chapter and the next. When we
are done, we will know not only that DFA-decidable languages are closed under all of these
¹ Note that this is not the same as the property of a subset A ⊆ R^d being closed in the sense studied in a course
on real analysis, in that it contains all of its limit points. In particular, one typically does not talk about any class
of languages being merely “closed,” but rather, closed with respect to a certain set operation such as complement,
union, concatenation, or Kleene star. The usage of the word “closed” in real analysis can be seen as a special case
of being closed with respect to the operation of taking limits of infinite sequences from the set. The intuition for the
terminology is that if you think of the class of DFA-decidable languages as being like a room, and doing an operation
as being like moving from one language (or languages) to another, then moving via complement won’t get you out of
the room; the room is “closed” with respect to that type of move.
² The set of all strings formed by concatenating one string from A to one string from B; only makes sense for
languages because not all types of objects can be concatenated
³ The set of all strings formed by concatenating 0 or more strings from A. Just like with Σ∗, except now whole
strings may be concatenated instead of individual symbols.
operations, but that so are languages decided by a regex, NFA, or RRG, because all of these
models define exactly the same class of languages.
Let’s think about other models of computation besides DFAs. Is it obvious that NFA-
decidable languages are closed under complement? No it isn’t. You might be tempted to
swap the accept and reject states, but that won’t work! (Homework exercise.)
What about regex-decidability? Recall in the section on regex’s, we wrote a regex match-
ing strings that represent C++ double literals (for P = 1 ∪ 2 ∪ . . . ∪ 9 and D = 0 ∪ P ):
(+ ∪ − ∪ ε)(.D+ ∪ (P D∗ ∪ 0).D∗ )
Now consider this task: write a regex matching the complement of this language, i.e., letting
D = {0, 1, . . . , 9}, all strings over D ∪ {+, −, .} that are not C++ double literals.
It’s not obvious how to do it, right? It is true, in fact: the complement of any regex-
decidable language is also regex-decidable. But it’s not clear why, right now. For some
models of computation it is not even true. For example, the CFG-decidable languages are
not closed under complement.⁴
Or, consider these two regex’s over alphabet Σ = {0, 1}: R1 = Σ(ΣΣ)∗ , matching any
odd-length binary string, and R2 = Σ∗ 0011Σ∗ , matching any string containing the substring
0011. The regex R1 ∪ R2 matches their union: any string that is odd length or contains the
substring 0011. What about their intersection, odd length strings that contain the substring
0011?
We can consider that wherever 0011 appears, since it is even length, for the whole string
to be odd length, we must either have an even number of bits before 0011 and an odd number
of bits after, or vice versa. Recall that (ΣΣ)∗ decides even-length strings and Σ(ΣΣ)∗ decides
odd-length strings. Thus, this regex works:
R3 = (ΣΣ)∗ 0011 Σ(ΣΣ)∗
∪ Σ(ΣΣ)∗ 0011 (ΣΣ)∗

But this is ad-hoc, requiring us to think about the details of the two languages L(R1 )
and L(R2 ). Consider changing to R1 = ΣΣΣ(ΣΣΣΣΣ)∗ matching all strings whose length
is congruent to 3 mod 5. We could do a similar case analysis of all the combinations of
numbers of bits before and after 0011 that would lead the whole length to be congruent to
3 mod 5. But this would be tedious, and it similarly requires us to understand the details
of the two languages.
Could we devise a procedure to automatically process R1 and R2 , based solely on their
structure, not requiring any special reasoning specific to their languages, which would tell
us how to construct R3 with L(R3 ) = L(R1 ) ∩ L(R2 )?
Similarly, can we devise a way to automatically convert a regex R into one deciding the complement of L(R)? This task is easy enough with DFAs, but it’s not clear for regex’s.
Conversely, it is not obvious how to show the DFA-decidable languages are closed under
∪, ∩, ◦ or ∗ .
⁴ The language { 0^i 1^j 2^k | i ≠ j or j ≠ k } is CFG-decidable, but its complement { 0^i 1^i 2^i | i ∈ N } is not.
So to recap: a DFA can easily be modified to decide the complement. Regex’s can trivially
be combined for certain operations: ∪, ◦, ∗ , because those operations are built right into
the definition. NFAs can’t even seem to be modified easily to decide the complement. Some
closure properties that are easy to prove for one model seem difficult in the other.
But we made the claim that in fact all of these models define the exact same set of
decision problems: the regular languages, which are claimed to be closed under all of these
operations. Thus, if there are DFAs or regex’s or NFAs or RRGs for languages A, B, then
there are DFAs and regex’s and NFA’s and RRG’s for all of A, B, A ∪ B, A ∩ B, A ◦ B, A∗ ,
and A.

Roadmap for next two chapters. In this chapter, we will show that we can prove some
of these closure properties by giving constructions (algorithms) to transform one or two
instances of a model into another instance of that same model, deciding a (potentially)
different language. For instance, the next section shows how to combine two DFAs D1 and
D2 into a third DFA D such that L(D) = L(D1 ) ∪ L(D2 ). These constructions are more
involved than the very simple “swap the accept/reject” states construction to show DFA-
decidability is closed under complement, but they follow the same principle: the constructed
automaton “simulates” the given automata in some way. At the end of this chapter, we will
have shown that the DFA-decidable languages are closed under complement, ∪, and ∩, the
NFA-decidable languages are closed under ∪, ◦, and ∗ , and the regex-decidable languages
are closed under ∪, ◦, and ∗ . The last part about regex’s follows directly by the definition
of regex’s, so we won’t actually discuss it again in this chapter.
In Chapter 6, we do something similar, but instead, show constructions that transform
an instance of a model into an instance of a different model, deciding the same language.
For instance, we will show how to take an arbitrary NFA N , which decides language L(N )
by definition, and produce a DFA also deciding L(N ). This will be done between all four
models of DFA, NFA, regex, and RRG, showing that all of these models have the same
computational power: if one of them decides a language, so do all the others. At that
point it will make sense to call the class of languages they decide simply “regular”, without
referring specifically to one of those models.
It will follow that, for example, the DFA-decidable languages are closed under ∗ , although
the construction is somewhat indirect:
1. Start with a DFA D1 deciding language A. Note that D1 is also an NFA.
2. Convert D1 to an NFA N deciding A∗ .
3. Convert N to an equivalent DFA D2 deciding A∗ .

5.2 DFA union and intersection (product construction)
Example 5.2.1. Let L1 = { a^n | n ≡ 2 mod 3 } and L2 = { a^n | n ≡ 1 mod 5 or n ≡ 4 mod 5 }. Design a DFA D = (Q, {a}, δ, s, F ) to decide the language L1 ∪ L2.
See Figure 5.1. Informally, we know how to create a DFA D1 for L1 and a DFA D2 for
L2 . To design D, we want to simulate the actions of both D1 and D2 , simultaneously, on the
input. This is done by giving each state of D two “fields”, one field to track the state of D1 ,
and the second to track the state of D2 . Each field is updated independently, depending only
on the input symbol and the previous value of that field, but not depending on the other
field. At any time (including after reading all the input symbols), D knows both states that
D1 and D2 would be in, so it knows whether each is accepting. It should accept if either (or
both) are accepting.

Figure 5.1: Bottom left: a DFA deciding the language L3 = { a^n | n ≡ 2 mod 3 }. Top right: a DFA deciding the language L5 = { a^n | n ≡ 1 mod 5 or n ≡ 4 mod 5 }. Bottom right: a DFA deciding the language L3 ∪ L5. For readability, transitions are not labeled, but they should all be labeled with a. The accept states are row 2 and columns 1 and 4. To decide L3 ∩ L5, we choose the accept states to be {(2, 1), (2, 4)} instead, i.e., where row 2 meets columns 1 and 4.
Formally,

• Q = { (i, j) | 0 ≤ i < 3, 0 ≤ j < 5 } = {0, 1, 2} × {0, 1, 2, 3, 4}

• s = (0, 0)

• δ((i, j), a) = (i + 1 mod 3, j + 1 mod 5)

• F = { (i, j) | i = 2 or j = 1 or j = 4 }

D is essentially simulating two DFAs at once: one that computes congruence mod 3
and the other that computes congruence mod 5. We call this general idea of one DFA
simultaneously simulating two others, by having two parts of its state set, one to remember
the state of one DFA and the other to remember the state of the other DFA, the product
construction, because the larger DFA’s state set is the cross product of the smaller two.
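The product construction is short enough to state as executable code. A sketch of ours (the names are not standard), which takes the accept condition as a parameter so that the same product serves for ∪ and, as we will see shortly, ∩:

def product(D1, D2, accept):
    # product of two DFAs over the same input alphabet;
    # accept(in_F1, in_F2) says whether a pair state is accepting
    (Q1, Sigma, d1, s1, F1) = D1
    (Q2, _, d2, s2, F2) = D2
    Q = {(r1, r2) for r1 in Q1 for r2 in Q2}
    d = {((r1, r2), b): (d1[(r1, b)], d2[(r2, b)])
         for (r1, r2) in Q for b in Sigma}
    F = {(r1, r2) for (r1, r2) in Q if accept(r1 in F1, r2 in F2)}
    return (Q, Sigma, d, (s1, s2), F)

# the two DFAs of Example 5.2.1, over the one-symbol alphabet {a}:
D3 = ({0, 1, 2}, {"a"}, {(i, "a"): (i + 1) % 3 for i in range(3)}, 0, {2})
D5 = ({0, 1, 2, 3, 4}, {"a"}, {(j, "a"): (j + 1) % 5 for j in range(5)}, 0, {1, 4})
union = product(D3, D5, lambda x, y: x or y)     # decides L3 ∪ L5
inter = product(D3, D5, lambda x, y: x and y)    # decides L3 ∩ L5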
Figure 5.2: The product construction applied to the DFAs D1 and D2 from Figures 3.1 and 3.2, to obtain a DFA D with L(D) = L(D1) ∪ L(D2). We can think of the states of D as being grouped into three groups representing q1, q2, and q3 respectively, and within each of those groups, there’s a state representing r1 and a state representing r2. In this figure, accept states have bold outlines. To decide L(D1) ∩ L(D2) instead, we should choose only (q2, r2) to be the accept state.
For an example of the product construction on DFAs with a larger alphabet, as well as
a different way to visualize the product construction, see Figure 5.2.
We now generalize this idea.

Theorem 5.2.2. The class of DFA-decidable languages is closed under ∪.

Proof. (DFA Product Construction for ∪)⁵
Let D1 = (Q1 , Σ, δ1 , s1 , F1 ) and D2 = (Q2 , Σ, δ2 , s2 , F2 ) be DFAs. We construct the DFA
D = (Q, Σ, δ, s, F ) to decide L(D1 ) ∪ L(D2 ) by simulating both D1 and D2 in parallel, where

• Q keeps track of the states of both D1 and D2 :

Q = Q1 × Q2 (= { (r1 , r2 ) | r1 ∈ Q1 and r2 ∈ Q2 })
⁵ Proof Idea: (The DFA Product Construction for ∪)
We must show that if A1 and A2 are regular, then so is A1 ∪ A2 . Since A1 and A2 are regular, some DFA D1
decides A1 , and some DFA D2 decides A2 . It suffices to show that some DFA D decides A1 ∪ A2 ; i.e., it accepts a
string x if and only if at least one of D1 or D2 accepts x.
D will simulate D1 and D2 . If either accepts the input string, then so will D. D must simulate them simultaneously,
because if it tried to simulate D1, then D2, it could not remember the input to supply it to D2.
• δ simulates moving both D1 and D2 one step forward in response to the input symbol.
Define δ for all (r1 , r2 ) ∈ Q and all b ∈ Σ as
δ((r1, r2), b) = (δ1(r1, b), δ2(r2, b))⁶

• s ensures both D1 and D2 start in their respective start states:


s = (s1 , s2 )

• F must be accepting exactly when either one, or the other, or both, of D1 and D2 are
in an accepting state:
F = { (r1, r2) | r1 ∈ F1 or r2 ∈ F2 }.⁷

Then, after reading an input string x that puts D1 in state r1 and D2 in state r2 , we have
that the state q = (r1 , r2 ) is in F if and only if r1 ∈ F1 or r2 ∈ F2 , which is true if and only
if x ∈ L(D1 ) or x ∈ L(D2 ), i.e., x ∈ L(D1 ) ∪ L(D2 ).

end of lecture 3a
Theorem 5.2.3. The class of DFA-decidable languages is closed under ∩.

Proof Idea: De Morgan’s laws.

Proof. Let A, B be DFA-decidable languages. By De Morgan’s laws, A ∩ B is the complement of the union of the complements of A and B. By closure under union and complement, this language is also DFA-decidable.
Alternate proof. (DFA Product Construction for ∩). We can modify the product con-
struction for ∪ and define F = F1 × F2 = {(r1 , r2 ) | r1 ∈ F1 and r2 ∈ F2 }.

Multiple ∪ or ∩. We have only proved that the DFA-decidable languages are closed under a single application of ∪ or ∩ to two DFA-decidable languages A1 and A2. But, we can use induction to prove that they are closed under finite union and intersection; for instance, for any k ∈ N, if A1, . . . , Ak are DFA-decidable, then ⋃_{i=1}^{k} A_i is DFA-decidable.
What about infinite union or intersection?

Closure only goes one way. We now know if A and B are DFA-decidable, then A ∪ B is
DFA-decidable. What about the reverse claim? Can we say that if A ∪ B is DFA-decidable,
then both A and B must be DFA-decidable? No: let A be any language that is not DFA-decidable,
and let B = ¬A be its complement. Then A ∪ B = Σ∗, which is DFA-decidable, but A is not
(neither is B; otherwise A would also be DFA-decidable, by closure under complement).

Difficulty of showing DFA closure under ◦ or ∗ . So now we know that the DFA-decidable
languages are closed under complement, ∪, and ∩. What about ◦ or ∗ ? Given two DFAs D1
and D2 , it seems difficult to create a DFA D deciding L(D1 ) ◦ L(D2 ) using the same ideas
as we used in the product construction.
The obvious thing to try is to let D have all the states of both D1 and D2 (i.e., take the
union of their states), and on input w, to start simulating D1 reading some prefix x of w,
and then at some point to switch from D1 to D2, and have D2 process the remaining suffix
y of w, so that w = xy and D accepts if and only if D1 accepts x and D2 accepts y. But
how does D know when to switch? If w is the concatenation xy of x ∈ L(D1) and y ∈ L(D2),
there's no delimiter to tell us where x ends and y begins. In fact, it might be possible to
split w into x and y in multiple ways, for example: if L(D1) = {0, 00} and L(D2) = {0, 00},
then w = 000 could be obtained by letting x = 0 and y = 00, or x = 00 and y = 0.

5.3 NFA union, concatenation, and star constructions


5.3.1 Examples
We begin by observing that NFAs can be combined to decide union using a simpler method
than the product construction for DFAs. (Unfortunately, unlike the product construction,
the NFA union construction only works for union, not intersection.)
Example 5.3.1. Design an NFA deciding the language {x ∈ {a}∗ | |x| is a multiple of 3 or
5 }.
See Figure 5.3. This uses ε-transitions to guess which of the two DFAs to simulate: one
that decides multiples of 3, or another that decides multiples of 5.
Unlike DFAs, there is a simple way to implement concatenation and Kleene star with
NFAs as well. We start with an example of concatenation.
Example 5.3.2. If x ∈ {0, 1}∗ , let nx ∈ N be the integer that x represents in binary (with
nε = 0).
Design an NFA to decide the language

A = {xy ∈ {0, 1}∗ | (nx ≡ 0 mod 3 or nx ≡ 1 mod 3) and ny ≡ 2 mod 3}.

For example, 1000101 ∈ A, since 100 in binary is 4 ≡ 1 mod 3 and 0101 in binary is 5 ≡ 2 mod 3.
However, 01 ∉ A, since the possible values for x and y are (x = ε, y = 01), (x = 0, y = 1), or
(x = 01, y = ε), and in all cases ny ≢ 2 mod 3.

[Diagrams omitted: NFA N1 (a cycle of 3 states r0, r1, r2) and NFA N2 (a cycle of 5 states t0, . . . , t4), joined into N by a new start state s with ε-transitions to r0 and t0.]

Figure 5.3: Example of the NFA union construction. This NFA decides the language L(N ) =
L(N1 ) ∪ L(N2 ) = {x ∈ {a}∗ | |x| is a multiple of 3 or 5 }, using ε-transitions to guess which of the
two NFAs to simulate.

See Figure 5.4. This simulates the transitions of the DFA from Figure 3.5 twice, once on
x and once on y: it uses ε-transitions to guess where x ends and y begins (and it will only
make that jump if the DFA is currently accepting x).

[Diagrams omitted: NFAs N1 (states q0, q1, q2) and N2 (states r0, r1, r2), each a copy of the DFA from Figure 3.5, joined in N by an ε-transition from the accept state of N1 to the start state of N2.]

Figure 5.4: Example of the NFA concatenation construction. An NFA N deciding the language
L(N1 ) ◦ L(N2 ). N essentially consists of N1 and N2 , with ε-transitions to “guess” when to switch
from N1 to N2 (but only when N1 is accepting).

Example 5.3.3. Let L = {x ∈ {a, b}∗ | x has an odd number of b’s followed by an a}.
Design an NFA to decide the language L∗ .
L is decided by the NFA N1 in Figure 5.5. The NFA N simulates N1 over and over in a loop,
using ε-transitions to guess when to start over (but only starting over if N1 is currently
in an accept state).
It is important that we did not simply make the old start state accepting. That would change
the semantics of the underlying NFA and could result in a mistake, if positive-length strings
reach back to the start state.

end of lecture 3b

[Diagrams omitted: NFA N1 (start state s1); an incorrect N that simply makes the old start state accepting; and the correct N, with a new accepting start state s and an ε-transition from s to s1.]

Figure 5.5: Example of the NFA Kleene star construction on an NFA. N simulates N1 over and
over in a loop, using ε-transitions to guess when to start over (but only when N1 is accepting). The
NFA N1 decides the language described by the regular expression b(bb)∗ a, i.e., an odd number of
b’s followed by an a. L(N ) = L(N1 )∗ is then the language containing ε (as any starred language
should) and positive-length strings in which all occurrences of an a are preceded by an odd number
of b's, and the last symbol is an a. It would be a mistake simply to make the old start state accepting.
The incorrect NFA N shown accepts the strings bb and bbbb, which do not end in an a.

5.3.2 Proofs
The above examples show the basic ideas for how to prove that the NFA-decidable languages
are closed under union, concatenation, and Kleene star. Now, we actually prove those facts,
by showing how to make the ideas work on any NFA.
Theorem 5.3.4. The class of NFA-decidable languages is closed under ∪.
Proof. Let N1 = (Q1 , Σ, ∆1 , s1 , F1 ) and N2 = (Q2 , Σ, ∆2 , s2 , F2 ) be NFAs, where Q1 ∩Q2 = ∅.
Define the NFA N deciding L(N1 ) ∪ L(N2 ).
See Figure 5.3 for an example. Intuitively, on input w ∈ Σ∗ , N nondeterministically
guesses whether to simulate N1 or N2 by taking an ε-transition from N 's start state to the
start state for either N1 or N2 . N accepts if the NFA it guessed accepts.
To fully define N = (Q, Σ, ∆, s, F ):
• N has all the states of N1 and N2 , and one extra new state s, so Q = Q1 ∪ Q2 ∪ {s},
where s 6∈ Q1 ∪ Q2 .
• N accepts if the NFA it guessed accepts: F = F1 ∪ F2 .
• N simulates N1 or N2, depending on the initial guess. To do this, for all q ∈ Q1 ∪ Q2
and b ∈ Σε,
∆(q, b) = ∆1(q, b) if q ∈ Q1, and ∆(q, b) = ∆2(q, b) if q ∈ Q2.

Finally, N must make the initial guess: ∆(s, ε) = {s1 , s2 }.


Equivalently, we can use the language of sets of transitions to describe ∆: ∆ has all
transitions of ∆1 and ∆2, in addition to two new transitions s −ε→ s1 and s −ε→ s2.

From now on, we will not define the full transition function ∆ in NFA constructions such
as this. Instead, we will use the equivalent (but easier) terminology of sets of transitions.
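In the same spirit as the earlier code sketch for the product construction, here is a possible Python sketch of the NFA union construction. We store ∆ as a dictionary mapping (state, symbol) to a set of states, with the symbol '' standing for ε and missing keys meaning the empty set; this encoding is our own, for illustration only.

def nfa_union(n1, n2, new_start="s_new"):
    # An NFA is a tuple (states, alphabet, Delta, start, accepts).
    q1, sigma, delta1, s1, f1 = n1
    q2, _, delta2, s2, f2 = n2
    assert not (q1 & q2) and new_start not in (q1 | q2)  # states must be disjoint
    delta = {**delta1, **delta2}        # keep all transitions of both NFAs
    delta[(new_start, '')] = {s1, s2}   # the initial guess: two ε-transitions
    return q1 | q2 | {new_start}, sigma, delta, new_start, f1 | f2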

Theorem 5.3.5. The class of NFA-decidable languages is closed under ◦.

Proof. Let N1 = (Q1 , Σ, ∆1 , s1 , F1 ) and N2 = (Q2 , Σ, ∆2 , s2 , F2 ) be NFAs. Define an NFA N


deciding L(N1 ) ◦ L(N2 ) = { xy ∈ Σ∗ | x ∈ L(N1 ) and y ∈ L(N2 ) } .
See Figure 5.4 for an example. Intuitively, on input w ∈ Σ∗ , N first simulates N1 for
some prefix x of w, then N2 on the remaining suffix y (so w = xy), nondeterministically
guessing when to switch, but only when N1 is accepting. N accepts if N2 does, i.e., if N1
accepts x and N2 accepts y.
To fully define N = (Q, Σ, ∆, s, F ):

• N has all the states of N1 and N2 , so Q = Q1 ∪ Q2 .

• N starts by simulating N1 , so s = s1 .

• N simulates N1 initially, and at some point when N1 is accepting, guesses where to switch
to simulating N2. To do this, ∆ has all transitions of ∆1 and ∆2. In addition, for each
q ∈ F1, ∆ has q −ε→ s2.

• N accepts if N2 accepts after N has switched to simulating N2 , so F = F2 . (Note that


no states in F1 are accepting in N .)

Detailed proof of correctness: To see that L(N1) ◦ L(N2) ⊆ L(N ), let w ∈ Σ∗. If there
are x ∈ L(N1) and y ∈ L(N2) such that w = xy (i.e., w ∈ L(N1) ◦ L(N2)), then there is
a sequence of choices of N such that N accepts w (i.e., w ∈ L(N )): follow the choices N1
makes to accept x, ending in a state in F1, then execute the ε-transition to state s2 defined
above, then follow the choices N2 makes to accept y. This shows L(N1) ◦ L(N2) ⊆ L(N ).
To see the reverse containment L(N ) ⊆ L(N1) ◦ L(N2), suppose w ∈ L(N ). Then there is
a sequence of choices such that N accepts w. By construction, all paths from s = s1 to some
state in F = F2 pass through s2, so N must reach s2 after reading some prefix x of w, and
the remaining suffix y of w takes N from s2 to a state in F2, i.e., y ∈ L(N2). By construction,
all paths from s = s1 to s2 go through a state in F1, and those states are connected to s2
only by an ε-transition, so x takes N from s1 to a state in F1, i.e., x ∈ L(N1). Since w = xy,
this shows that L(N ) ⊆ L(N1) ◦ L(N2).
Thus N decides L(N1 ) ◦ L(N2 ).
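A sketch of the concatenation construction in the same dictionary encoding as the previous sketch (with '' standing for ε; again our own encoding, for illustration):

def nfa_concat(n1, n2):
    q1, sigma, delta1, s1, f1 = n1
    q2, _, delta2, s2, f2 = n2
    assert not (q1 & q2)                    # states must be disjoint
    delta = {**delta1, **delta2}
    for q in f1:                            # whenever N1 is accepting, allow an
        delta[(q, '')] = delta.get((q, ''), set()) | {s2}   # ε-jump into N2
    return q1 | q2, sigma, delta, s1, f2    # F = F2 only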

Theorem 5.3.6. The class of NFA-decidable languages is closed under ∗ .



Proof. Let N1 = (Q1 , Σ, ∆1 , s1 , F1 ) be an NFA. Define an NFA N deciding L(N1 )∗ .


See Figure 5.5 for an example. N should accept w ∈ Σ∗ if w can be broken into several
pieces x1, . . . , xk ∈ Σ∗ such that N1 accepts each piece. N will simulate N1 repeatedly,
nondeterministically guessing when to ε-transition from an accept state to the start state
(signifying a switch from xi to xi+1). Since N must also accept ε, we add one new accepting
state to N, which is N 's start state, with an ε-transition to the original start state.
(It is tempting simply to make N1's original start state s1 accepting. However, if s1 was
rejecting before, and if there are existing transitions into s1, then this would add new
strings to the language accepted by N1, which we don't want. Adding a new start state just
to handle ε solves this problem.)
To fully define N = (Q, Σ, ∆, s, F ):
• N has the states of N1 plus the new start state s, so Q = Q1 ∪ {s}.
• N accepts whenever N1 does, and to ensure we accept ε, the new start state is also
accepting: F = F1 ∪ {s}.
• N simulates N1 repeatedly, guessing when to reset when N1 is accepting. To do this,
∆ has all transitions of ∆1, in addition to s −ε→ s1 and, for each q ∈ F1, q −ε→ s1.

Detailed proof of correctness: Clearly N accepts ε ∈ L(N1)∗, so we consider only nonempty
strings.
To see that L(N1)∗ ⊆ L(N ), suppose w ∈ L(N1)∗ \ {ε}. Then w = x1 x2 . . . xk, where
each xj ∈ L(N1) \ {ε}. Thus for each j ∈ {1, . . . , k}, there is a sequence of states s1 =
qj,0 , qj,1 , . . . , qj,|xj | ∈ Q1, where qj,|xj | ∈ F1, and a sequence yj,0 , . . . , yj,|xj |−1 ∈ Σε, such that
xj = yj,0 . . . yj,|xj |−1, and for each i ∈ {0, . . . , |xj | − 1}, qj,i+1 ∈ ∆1(qj,i , yj,i). The following
sequence of states in Q testifies that N accepts w:
(s, s1 , q1,1 , . . . , q1,|x1 | ,
s1 , q2,1 , . . . , q2,|x2 | ,
...
s1 , qk,1 , . . . , qk,|xk | ).
Each transition between adjacent states in the sequence is either one of the transitions
qj,i+1 ∈ ∆1(qj,i , yj,i) listed above, or is the ε-transition from s to s1 or from some qj,|xj | to
s1. Since qk,|xk | ∈ F1 ⊆ F, N accepts w, i.e., L(N1)∗ ⊆ L(N ).
To see that L(N ) ⊆ L(N1)∗, let w ∈ L(N ) \ {ε}. Then there are sequences s =
q0 , q1 , . . . , qm ∈ Q and y0 , . . . , ym−1 ∈ Σε such that qm ∈ F, w = y0 y1 . . . ym−1, and, for
all i ∈ {0, . . . , m − 1}, qi+1 ∈ ∆(qi , yi). Since w ≠ ε, qm ≠ s, so qm ∈ F1. Since the start
state s has no outgoing non-ε-transition, the first transition, from q0 to q1, is the ε-transition
from s to s1. Suppose there are k − 1 ε-transitions from a state in F1 to s1 among
q1 , . . . , qm. (Perhaps there are no such ε-transitions, in which case k = 1.) Then we can
write q1 , . . . , qm as
(s1 , q1,1 , . . . , q1,|x1 | ,
s1 , q2,1 , . . . , q2,|x2 | ,
...
s1 , qk,1 , . . . , qk,|xk | ),
where each qj,|xj | ∈ F1 and has an ε-transition to s1. For each j ∈ {1, . . . , k} and
i ∈ {0, . . . , |xj | − 1}, let yj,i be the symbol in Σε causing the transition from qj,i to qj,i+1, and
let xj = yj,0 yj,1 . . . yj,|xj |−1. Then the sequences s1 , qj,1 , . . . , qj,|xj | and yj,0 yj,1 . . . yj,|xj |−1
testify that N1 accepts xj, thus xj ∈ L(N1). Thus w = x1 x2 . . . xk, where each xj ∈ L(N1),
so w ∈ L(N1)∗, showing L(N ) ⊆ L(N1)∗.
Thus N decides L(N1)∗.
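A sketch of the star construction in the same dictionary encoding as the previous sketches:

def nfa_star(n1, new_start="s_new"):
    q1, sigma, delta1, s1, f1 = n1
    assert new_start not in q1
    delta = dict(delta1)
    delta[(new_start, '')] = {s1}           # new accepting start state, ε to s1
    for q in f1:                            # restart N1 whenever it accepts
        delta[(q, '')] = delta.get((q, ''), set()) | {s1}
    return q1 | {new_start}, sigma, delta, new_start, f1 | {new_start}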

end of lecture 3c

Chapter 6

Equivalence of models

The previous chapter showed how to convert instances of one model into other instances of
the same model, deciding a different language. In this chapter, we show how to convert an
instance of one model into an instance of a different model, deciding the same language.
This is our primary tool for comparing the power of different models of computation.
For example, if any regex can be simulated by an NFA, then any regex-decidable language
is NFA-decidable, so NFAs are at least as powerful as regex’s. Conversely, if any NFA can be
simulated by a regex, then regex’s are at least as powerful as NFAs. If both are true, then
NFAs and regex’s have equivalent computational power. By the end of this chapter, we will
have shown that DFAs, NFAs, regex’s, and RRGs all have the same computational power:
they all decide the same class of languages, which we call “regular”.

6.1 Equivalence of DFAs and NFAs (subset construction)


Observation 6.1.1. If a language is DFA-decidable, then it is NFA-decidable.
Proof. A DFA D = (Q, Σ, δ, s, F ) is an NFA N = (Q, Σ, ∆, s, F ) with no ε-transitions and,
for every q ∈ Q and a ∈ Σ, ∆(q, a) = {δ(q, a)}.
The converse is less obvious.
Theorem 6.1.2. If a language is NFA-decidable, then it is DFA-decidable.

Proof. (Subset Construction) Let N = (QN , Σ, ∆, sN , FN ) be an NFA with no ε-transitions.
(At the end of the proof we explain how to modify the construction to handle them.)
Define the DFA D = (QD , Σ, δ, sD , FD ) as follows:
• QD = P(QN ). Each state of D keeps track of a set of states in N , representing the set
of all states N could be in after reading some portion of the input.
• sD = {sN }. After reading no input, N can only be in state sN .


[Diagrams omitted: an NFA N with states 1, 2, 3 and an ε-transition, and the DFA D whose eight states are the subsets Ø, {1}, {2}, {1,2}, {3}, {1,3}, {2,3}, {1,2,3}.]

Figure 6.1: Example of the subset construction. We transform an NFA N into a DFA D. Each
state in D represents a subset of states in N . The blue dashed arrows in D represent the transitions
and start state D would have if the ε-transition in N were absent, showing the special case of the subset construction
when there are no ε-transitions. The blue dashed arrows should be removed from D, and the red
arrows should be added, to account for the ε-transition in N . The D states {1} and {1, 2} are
unreachable from the start state {1, 3}, so we could remove them without altering the behavior of
D. Either version of the DFA would work.

• FD = {A ⊆ QN | A ∩ FN ≠ ∅}. Recall the asymmetric acceptance criterion; we want


to accept if there is a way to reach an accept state, i.e., if the set of states N could be
in after reading the whole input contains any accept states.

• For all R ∈ QD (i.e., all R ⊆ QN ) and b ∈ Σ,
δ(R, b) = ⋃_{q∈R} ∆(q, b).
If N is in state q ∈ R after reading some portion of the input, then the states it could
be in after reading the next symbol b are all the states in ∆(q, b); since N could be in
any state q ∈ R before reading b, we must take the union over all q ∈ R.

Now we show how to handle the ε-transitions. For any R ⊆ QN , define

E(R) = { q ∈ QN | q is reachable from some state in R by following 0 or more ε-transitions } .

For example, in Figure 6.1, if we let R = {1, 2}, then E(R) = {1, 2, 3}. An example in
which multiple ε-transitions can be followed is in Figure 6.2; see the caption for details.
To account for the ε-transitions, D must be able to simulate

1. N following ε-transitions after each input-consuming transition, i.e., define
δ(R, b) = E( ⋃_{q∈R} ∆(q, b) ).

[Diagram omitted: an NFA with states 1 through 6 and several ε-transitions, used to illustrate E(R).]

Figure 6.2: Example NFA to demonstrate E(R) for different subsets of nodes R ⊆ {1, 2, 3, 4, 5, 6}.
E({1}) = {1, 3, 4, 5, 6}, E({2}) = {2}, E({3}) = {3, 4, 5}, E({4}) = {4, 5}, E({5}) = {4, 5},
E({6}) = {6}. For other sets, E(R) = ⋃_{q∈R} E({q}), e.g., E({2, 5}) = E({2}) ∪ E({5}) = {2, 4, 5}.

2. N following ε-transitions before the first non-ε-transition, i.e., define sD = E({sN }).

We have constructed a DFA D such that the state D is in after reading string x is the subset
of states that N could be in after reading x. By the definition of FD , D accepts x if and
only if at least one of the states N could be in is accepting, i.e., if and only if N accepts x.
Thus L(D) = L(N ).
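A Python sketch of the subset construction, including the ε-closure function E, in the same dictionary encoding as the earlier NFA sketches. One caveat: to keep the output small, this sketch builds only the subsets reachable from the start state (the optimization mentioned in the caption of Figure 6.1); the official construction uses all of P(QN ).

from itertools import chain

def eclose(Delta, R):
    # E(R): all states reachable from R by 0 or more ε-transitions
    stack, seen = list(R), set(R)
    while stack:
        q = stack.pop()
        for r in Delta.get((q, ''), set()):
            if r not in seen:
                seen.add(r)
                stack.append(r)
    return frozenset(seen)

def subset_construction(nfa):
    QN, sigma, Delta, sN, FN = nfa
    sD = eclose(Delta, {sN})
    QD, deltaD, todo = {sD}, {}, [sD]
    while todo:
        R = todo.pop()
        for b in sigma:
            # move on b from every state of R, then follow ε-transitions
            T = eclose(Delta, set(chain.from_iterable(
                    Delta.get((q, b), set()) for q in R)))
            deltaD[(R, b)] = T
            if T not in QD:
                QD.add(T)
                todo.append(T)
    FD = {R for R in QD if R & FN}
    return QD, sigma, deltaD, sD, FD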

Corollary 6.1.3. A language is DFA-decidable if and only if it is NFA-decidable.

So from now on, when we call a language “regular”, which up to this point has been a
synonym for DFA-decidable, we can also interpret it to mean NFA-decidable.

end of lecture 4a

Alternate choices that also work. One thing to note about some of the choices we made
about when to simulate ε-transitions: we could have chosen differently and construct a
different, but still correct, DFA. For example, we could simulate ε-transitions before each
input-consuming transition:
δ(R, b) = ⋃_{q∈R} ∆(E({q}), b),
where ∆ applied to the set E({q}) denotes the union of ∆(r, b) over all r ∈ E({q}),

and then we could simply let sD = {sN }. However, we’d have to remember to also do ε-
transitions after the last symbol has been read, by asking not only whether any of the states
N could be in is accepting, but also if any accept states are reachable from those states by
ε-transitions:
FD = {A ⊆ QN | E(A) ∩ FN ≠ ∅}.
There’s nothing wrong with doing the above; it also creates a DFA that decides L(N ). For
example, if we did this in Fig. 6.1, then the a-transition from {1, 2}, instead of going to
{2, 3}, would go to {1, 2, 3}. Although this would be a different DFA, it would be equivalent
to that of Fig. 6.1 in the sense of deciding the same language.

We will take the proof of Theorem 6.1.2 to be the “official” subset construction. Partic-
ularly for auto-grading, we need to define something to be the “official” construction, and
this will be it.

Exponential blowup of states in subset construction. Note that the subset construction
uses the power set of QN , which is exponentially larger than QN : |QD | = |P(QN )| = 2^|QN |.
It can be shown (see Kozen's textbook) that there are languages for which this is necessary:
the smallest NFA deciding the language has n states, while the smallest DFA deciding the
language has 2^n states. For example, for any n ∈ N, the language
{ x ∈ {0, 1}∗ | x[|x| − n] = 0 } has this property.
Thus, if we only care that the number of states is finite, and consider all finite numbers
“equally finite”, then finite automata have equivalent power whether or not they are nonde-
terministic. But if we consider the number of states as a resource (requiring more memory
to implement, for example), then NFAs are more powerful in the sense that for certain prob-
lems, they use vastly less resources than the best DFA. This theme will be revisited in the
unit on computational complexity.

6.2 Equivalence of RGs and NFAs


Perhaps you’re thinking that context-free grammars also define the regular languages. This
turns out not to be true. In Chapter 7, we show that some CFG-decidable languages, such
as {0n 1n | n ∈ N}, are not regular.
However, there is a one-way implication that is fairly simple to prove:
Theorem 6.2.1. Every DFA-decidable language is RRG-decidable (thus also CFG-decidable).
Proof. Let D = (Q, Σ, δ, s, F ) be a DFA. We construct an RRG G = (Γ, Σ, S, ρ) deciding
L(D). See Figure 6.3 for an example.

[DFA diagram omitted. The right-regular grammar is: P → 0T | 1R, T → 0R | 1P, R → 0P | 1R, P → ε, T → ε.]

Figure 6.3: Left: A DFA D. Right: A right-regular grammar generating L(D).

Intuitively, any derived string has exactly 1 or 0 variables. G has one production rule for
each transition in D, which adds a new symbol to the produced string and possibly changes
the variable. G also has rules to change the variable to ε if it represents an accept state.

This ensures that any possible string of terminals can be produced, but it is followed by a
variable. The variable can be erased only if it represents an accept state.
Formally, Γ = Q, Σ is the same for each, and S = s. We have a clash of conventions,
lowercase for DFA state names and uppercase for CFG variable names, so we choose upper-
case. There is one rule for each transition in δ: if δ(A, b) = C, then G has a rule A → bC.
There is also one rule for each accept state: if A ∈ F , then G has a rule A → ε.

Detailed proof of correctness: First we show L(G) ⊆ L(D). Each rule has at most one
variable on the right, so every derived string has at most one variable A. To eliminate A
and result in an all-terminal string, A must represent an accepting state, so the string is
accepted by D.
Conversely, to show L(D) ⊆ L(G), any string x accepted by D, which visits states
s = A0 , A1 , . . . , An ∈ F , can be produced by G by applying these rules in order:

A0 → x[1]A1
A1 → x[2]A2
A2 → x[3]A3
...
An−1 → x[n]An
An → ε,

resulting in the derivation A0 ⇒ x[1]A1 ⇒ x[1..2]A2 ⇒ x[1..3]A3 ⇒ · · · ⇒ x[1..n − 1]An−1 ⇒
xAn ⇒ x. Thus L(D) = L(G).
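In code, the conversion is a direct transcription of the two kinds of rules. A small Python sketch; the rule encoding, a pair (A, (b, C)) for a rule A → bC and (A, None) for A → ε, is our own choice for illustration:

def dfa_to_rrg(dfa):
    states, sigma, delta, start, accept = dfa
    rules = [(q, (b, delta[(q, b)])) for q in states for b in sigma]
    rules += [(q, None) for q in accept]   # A -> ε for each accept state
    return start, rules                    # the start state doubles as start variable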

In fact, every RRG can be interpreted as implementing an NFA with no ε-transitions.


Thus, RRGs are yet another way to characterize the regular languages, along with DFAs,
NFAs, and regex’s.
Theorem 6.2.2. Every RRG-decidable language is NFA-decidable.
Proof. Let G = (Γ, Σ, S, ρ) be an RRG. We construct an NFA N = (Q, Σ, ∆, s, F ) deciding
L(G). Let
• Q = Γ,
• s = S,
• F = {A ∈ Γ | A → ε is a rule in ρ}, and
• for all A ∈ Q and b ∈ Σ, ∆(A, b) = {C | A → bC is a rule in ρ}.
Then paths from s to some state in F in N correspond exactly to a sequence of rules in G,
all of which produce a new terminal except the last. Thus for all x ∈ Σ∗ , x ∈ L(N ) ⇐⇒
x ∈ L(G).
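The construction in this proof is just as short in code, using the same rule encoding as in the previous sketch:

def rrg_to_nfa(start, rules, sigma):
    states, delta, accept = set(), {}, set()
    for a, rhs in rules:
        states.add(a)
        if rhs is None:                    # rule A -> ε: A is accepting
            accept.add(a)
        else:                              # rule A -> bC: a b-transition A -> C
            b, c = rhs
            states.add(c)
            delta.setdefault((a, b), set()).add(c)
    return states, sigma, delta, start, accept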

6.2.1 Left-regular grammars


There’s an obvious twist on this definition to consider: left-regular grammars (LRG), where
the non-ε rules are of the form A → Cb for C a variable and b a terminal. As you might
suspect, these also define the regular languages. Here’s one easy way to see that: if one
reverses all the strings on the right side of production rules of a CFG G, the resulting CFG
G′ defines the reverse language, i.e., L(G′ ) = L(G)R . If G is right-regular (respectively,
left-regular), then G′ is left-regular (respectively, right-regular). Since the regular languages
are closed under reverse (this hasn't been shown, but it is true), this means L(G′ ) is regular
if and only if L(G) is regular.
A CFG is a regular grammar (RG) if it is either left-regular or right-regular. Thus, we
have the following:
Theorem 6.2.3. A language is regular if and only if it is RG-decidable.
Be careful here with the definition of RG: either all non-ε rules are of the form A → Cb, or
all non-ε rules are of the form A → bC. It is not saying, for each non-ε rule in the grammar,
it is either of the form A → Cb or A → bC. In other words you cannot mix left-regular and
right-regular rules in a single grammar. For example, the grammar

A → 0B
B → A1
A → ε

decides the language {0n 1n | n ∈ N}, which we stated above (and will prove in Chapter 7),
is not regular.
There’s one more variation to consider: also allowing transitions of the form A → B,
i.e., just changing one variable to another, without introducing a new terminal. How would
that affect the construction of Theorem 6.2.2? How would such a rule be represented in the
NFA?

end of lecture 4b

6.3 Equivalence of regex’s and NFAs


Theorem 6.3.1. A language is regular if and only if it is regex-decidable.
We prove each direction separately via two lemmas.

6.3.1 Every regex-decidable language is NFA-decidable


Lemma 6.3.2. Every regex-decidable language is NFA-decidable.

[Diagrams omitted: NFAs for a, b, ab, ab ∪ a, (ab ∪ a)∗, and (ab ∪ a)∗b, built up bottom-up.]

Figure 6.4: Example of converting the regex (ab ∪ a)∗ b to an NFA. The recursion is shown “bottom-
up”, starting with the base cases for a and b and building larger NFAs using the union construction,
concatenation construction, and Kleene star construction, as in Figures 5.3, 5.4, and 5.5.

Proof. Let R be a regex with input alphabet Σ. It suffices to construct an NFA N =


(Q, Σ, ∆, s, F ) such that L(R) = L(N ). An example of this construction is shown in Fig-
ure 6.4.
The definition of regex gives us six cases:
1. R = b, where b ∈ Σ, so L(R) = {b}, decided by a two-state NFA with a single b-transition
from its start state to its lone accept state.
2. R = ε, so L(R) = {ε}, decided by a one-state NFA whose start state is accepting, with
no transitions.
3. R = ∅, so L(R) = ∅, decided by a one-state NFA with no accept states.

4. R = R1 ∪ R2 , so L(R) = L(R1 ) ∪ L(R2 ).

5. R = R1 R2 , so L(R) = L(R1 ) ◦ L(R2 ).


64 CHAPTER 6. EQUIVALENCE OF MODELS

6. R = R1∗ , so L(R) = L(R1 )∗ .


For the last three cases, assume inductively that L(R1 ) and L(R2 ) are NFA-decidable. Since
the NFA-decidable languages are closed under the operations of ∪, ◦, and ∗ , L(R) is NFA-
decidable.

Thus, to convert a regex into an NFA, we replace the base cases in the regex with the
simple 1- or 2-state NFAs above, and then we connect them using the constructions for NFA
union, concatenation, and Kleene star shown in Section 5.3, shown by example in Figures 5.3,
5.4, and 5.5.
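The whole conversion can be phrased as a recursion over the structure of the regex. Below is a Python sketch, assuming the regex is given as a small syntax tree of tuples (an encoding we invent here purely for illustration); the three recursive cases reuse the union, concatenation, and star constructions of Section 5.3.

import itertools

_fresh = itertools.count()

def _new_state():
    # globally fresh state names, so sub-NFAs never share states
    return f"q{next(_fresh)}"

def regex_to_nfa(r):
    # r is ('sym', b), ('eps',), ('empty',), ('union', r1, r2),
    # ('concat', r1, r2), or ('star', r1). Returns (states, Delta, start,
    # accepts), with Delta a dict to sets of states and '' standing for ε.
    kind = r[0]
    if kind == 'sym':                      # base case: L(R) = {b}
        s, a = _new_state(), _new_state()
        return {s, a}, {(s, r[1]): {a}}, s, {a}
    if kind == 'eps':                      # base case: L(R) = {ε}
        s = _new_state()
        return {s}, {}, s, {s}
    if kind == 'empty':                    # base case: L(R) = ∅
        s = _new_state()
        return {s}, {}, s, set()
    q1, d1, s1, f1 = regex_to_nfa(r[1])
    if kind == 'star':                     # Kleene star construction
        s = _new_state()
        d = dict(d1)
        d[(s, '')] = {s1}
        for q in f1:
            d[(q, '')] = d.get((q, ''), set()) | {s1}
        return q1 | {s}, d, s, f1 | {s}
    q2, d2, s2, f2 = regex_to_nfa(r[2])
    d = {**d1, **d2}
    if kind == 'union':                    # union construction
        s = _new_state()
        d[(s, '')] = {s1, s2}
        return q1 | q2 | {s}, d, s, f1 | f2
    if kind == 'concat':                   # concatenation construction
        for q in f1:
            d[(q, '')] = d.get((q, ''), set()) | {s2}
        return q1 | q2, d, s1, f2
    raise ValueError(f"unknown regex node {kind!r}")

For example, the regex (ab ∪ a)∗b of Figure 6.4 would be encoded as ('concat', ('star', ('union', ('concat', ('sym','a'), ('sym','b')), ('sym','a'))), ('sym','b')).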

6.3.2 Every NFA-decidable language is regex-decidable


First, we observe that NFAs can be converted easily to have a special form that will be useful
in the construction.
Lemma 6.3.3. Every NFA-decidable language is decided by an NFA N = (Q, Σ, ∆, s, F ),
such that
1. s has no transitions entering it,
2. F = {a} where a ≠ s, and
3. a has no transitions leaving it.
Proof idea: Add a new start state s and a new accept state a, with new ε-transitions to
mimic the original behavior: from s to the old start state, and from the old accept states to
a. See Fig. 6.5 for an example.

[Diagrams omitted: the original NFA N ′ (start state s′, states q1, q2) and the modified NFA N with new start state s, new lone accept state a, and connecting ε-transitions.]

Figure 6.5: Modifying an NFA to ensure it has no transitions entering the start state s, and it has
a single accept state a with no transitions leaving a.

Proof. Let N ′ = (Q′ , Σ, ∆′ , s′ , F ′ ) be an NFA. Define the NFA N = (Q, Σ, ∆, s, F ) as follows.
• Q = Q′ ∪ {s, a}, where s, a ∉ Q′ ,
• F = {a}, and
• ∆ has all the transitions of ∆′ , and also the ε-transition s −ε→ s′ , and for each a′ ∈ F ′ ,
an ε-transition a′ −ε→ a.
If a string x is accepted by N ′ , ending in accept state a′ ∈ F ′ , then the same computation
sequence, with s −ε→ s′ added to the start, and a′ −ε→ a added to the end, is a computation
sequence showing that N accepts x as well. Thus x ∈ L(N ′ ) =⇒ x ∈ L(N ), i.e., L(N ′ ) ⊆
L(N ).
Conversely, every accepting computation sequence of N on x must start with s −ε→ s′
(since that's the only transition leaving the start state) and end with a′ −ε→ a for some
a′ ∈ F ′ (since those are the only transitions entering a). Removing these two transitions
results in a path from s′ to some a′ ∈ F ′ , i.e., an accepting computation sequence of N ′ on
x, showing L(N ) ⊆ L(N ′ ).

Now we show how to convert any NFA N into a regex R so that L(N ) = L(R). First we
introduce a generalization of NFAs that will be helpful in the conversion, called expression
automata (EA). Intuitively, an expression automaton can have its transition arrows labeled
with arbitrary regular expressions. A transition from state q to state r labeled with regex X is
written q −X→ r. Since ε and individual alphabet symbols are regular expressions, every NFA
is an EA. The idea is that an EA may read any number of symbols from the input while
following a transition, as long as the substring read matches the regular expression labeling
the transition.
[Diagram omitted: a two-state EA whose transitions are labeled with regexes.]

Figure 6.6: Example of an expression automaton (EA). It generalizes NFAs to allow arbitrary
regex’s on the transitions. This EA accepts the strings bba, bbaa, and bbaaababbba, and it rejects
the strings ε, a, b, baaba, and bbaabab.

The construction will work like this. We start with an NFA (which is a special type of
EA). Then we repeatedly remove one state i at a time from the EA. When we do, we stitch
together the transition arrows coming into and out of i, combining their regex’s so that the
EA still decides the same language. When we are done, we will have exactly two states:
a rejecting start state, with a transition going to an accept state. The regex labeling that
transition is the output of the construction.
Now, we show the construction that converts any NFA into an equivalent regex.

Theorem 6.3.4. Every NFA-decidable language is regex-decidable.


66 CHAPTER 6. EQUIVALENCE OF MODELS

Proof. Let N = (Q, Σ, ∆, s, F ) be an NFA. By Lemma 6.3.3 we may assume N has no


transitions entering s and that F = {a} with no transitions leaving a. We convert N to an
EA E such that L(N ) = L(E), where E is of the following form:

s R a

Then L(E) = L(R).

Intuition: Last step going from 3 states to final 2 states. Suppose we have an EA E1
of the following form: besides s and a, there is one intermediate state i, with transitions
s −X→ i, a self-loop i −Y→ i, i −Z→ a, and a direct transition s −W→ a, where W, X, Y, Z
are regex's. Then the following EA E2 decides the same language and matches the form
we seek: s −(W ∪ XY∗Z)→ a,

because in both, the strings leading to a are either of the form w ∈ L(W ), or of the form
xy k z for some k ∈ N, where x ∈ L(X), y ∈ L(Y ), and z ∈ L(Z).

How to remove a state when there are > 3 states remaining. In general, we pick a state
i other than s or a arbitrarily and remove it. If there are more than 3 states remaining, we
may have many transitions entering and leaving the intermediate state i. We iterate over
each pair of states q, r such that there are transitions q −X→ i and i −Z→ r. (We do this even
if q = r, such as t in Fig. 6.7.) Suppose there are also transitions i −Y→ i and q −W→ r. We
replace these 4 transitions with q −(W ∪ XY∗Z)→ r. If i −Y→ i or q −W→ r or both were not there,
then omit Y or W or both (i.e., the regex could be W ∪ XZ, XY∗Z, or XZ). Figure 6.7
shows an example.
Repeat this for all states i ∈ Q \ {s, a}, and at that point the EA matches the form
described above, with a single regex R such that L(R) = L(N ).
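A sketch of the state-elimination step in Python. Here regexes are built as plain strings, with '|' for union (the output is correct but heavily parenthesized and far from minimal), and the EA's transitions are stored in a dictionary mapping (q, r) to the regex labeling that transition; both choices are our own, for illustration.

def eliminate_state(edges, i):
    # stitch each path q -> i -> r into a single regex W ∪ X Y* Z
    Y = edges.pop((i, i), None)                      # self-loop on i, if any
    ins = {q: X for (q, r), X in edges.items() if r == i}
    outs = {r: Z for (q, r), Z in edges.items() if q == i}
    for key in [k for k in edges if i in k]:         # delete arrows touching i
        del edges[key]
    for q, X in ins.items():
        for r, Z in outs.items():
            mid = f"({X})" + (f"({Y})*" if Y is not None else "") + f"({Z})"
            W = edges.get((q, r))
            edges[(q, r)] = mid if W is None else f"({W})|{mid}"

def nfa_to_regex(edges, states, start, accept):
    # eliminate every state except start and accept; the surviving label
    # is a regex deciding the same language
    for i in states:
        if i not in (start, accept):
            eliminate_state(edges, i)
    return edges.get((start, accept), "∅")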
Now we have done everything we said we’d do in this chapter: we have shown that DFAs
and NFAs are equivalent, that regex’s are equivalent to both, and that RGs are equivalent
to all. We say a language is regular if it is decided by any of these automata, knowing that an
automaton of one type can be converted to an equivalent automaton (meaning, it decides
the same language) of any of the other types. In the next chapter, we move on to studying
languages that cannot be decided by any of these automata.

[Diagrams omitted: EA E1 with states s, q, r, t, i, a and regex-labeled transitions (including 110, 0∗1, 01, 1∗, 1+, 00), and EA E2 after removing state i, with new labels such as 110 ∪ 0∗1(00)∗, 0∗1(00)∗1+, 01(00)∗, and 01(00)∗1+.]

Figure 6.7: Modifying EA E1 to remove state i, resulting in EA E2. We show only one state removal,
to convey the intuitive idea, because this construction makes very large regexes when all states are
removed. The pairs of states between which we add transitions are (q, r), (q, t), (t, r), (t, t). Note
that since E1 has transitions between t and i in both directions, E2 has a self-loop on t. Also note
that since there was already a transition q −110→ r, we must take the union of the regex 110 with the
new regex 0∗1(00)∗ on the new transition q −(110 ∪ 0∗1(00)∗)→ r.

end of lecture 4c

6.4 Optional: Equivalence of DFAs and constant memory programs
We have alluded to the idea that a DFA is a simplified model of a program with constant
memory. Actually, we can even think of the program as having (almost) no memory. The
states in a DFA are analogous to lines of code in a program. if statements and while/for
loops transition the program to one line of code or another based on a Boolean test, in the
same way that a test on the “current” input symbol transitions the DFA to one state to
another.
For example, the DFA in Figure 3.5 is implemented by the following C++ function (the
state names have been changed to start with “q” to be valid label identifiers in C++):
1  #include <string>
2  bool dfa(const std::string& input_string) {
3      // this is the only variable we will declare
4      std::string::const_iterator it = input_string.begin();
5  q0:
6      if (it == input_string.end())
7          return true;
8      if (*it == '0') {
9          it++;
10         goto q0;
11     } else {
12         it++;
13         goto q1;
14     }
15 q1:
16     if (it == input_string.end())
17         return false;
18     if (*it == '0') {
19         it++;
20         goto q2;
21     } else {
22         it++;
23         goto q0;
24     }
25 q2:
26     if (it == input_string.end())
27         return false;
28     if (*it == '0') {
29         it++;
30         goto q1;
31     } else {
32         it++;
33         goto q2;
34     }
35 }
Now, the semantics of DFA execution take care of some of the logic above, which is why
the DFA has only 3 states, yet the C++ program has 35 lines of code. But it is still one
block of 10 lines per state.
It is also the case that it is somewhat “cheaty” to say the program has no memory, and
not just because one iterator variable is needed. Even if no variables were needed, usually
the “line of code” is translated, when the program is compiled, to machine code instructions,
and the “line” of code becomes an index into these instructions, i.e., an integer, implemented
in memory/cache as something called a program counter.
The above example illustrates that the following implication is true: if a language is
DFA-decidable, then it is solvable by a C++ bool function with only a single variable to
iterate over the input. The converse implication is true as well, though we will not attempt
to prove it here, as the complex semantics of a general-purpose programming language such
as C++ would make for a tedious proof.
Similar statements are true about other programming languages such as Python, but
not exactly the same statement, since Python lacks a goto statement. Like C/C++, the
language of DFAs allows unstructured programming, where one can potentially jump from
any state (line of code) to any other. So to truly implement a DFA in a structured language
such as Python would require some memory to keep track of the state. For example, the
following Python function implements the same DFA as above, using an enumerated type
to represent the state:
from enum import Enum, auto

class State(Enum):
    Q0 = auto()
    Q1 = auto()
    Q2 = auto()

def dfa(input_string):
    state = State.Q0
    for symbol in input_string:
        if state is State.Q0:
            if symbol == '0':
                state = State.Q0
            else:
                state = State.Q1
        elif state is State.Q1:
            if symbol == '0':
                state = State.Q2
            else:
                state = State.Q0
        elif state is State.Q2:
            if symbol == '0':
                state = State.Q1
            else:
                state = State.Q2
    # accept in state Q0, matching the C++ function above
    return state is State.Q0
Chapter 7

Proving problems are not solvable in a model of computation

7.1 Some languages are not regular


7.1.1 Why we need rigor
Recall that given strings a, b, #(a, b) denotes the number of times that a appears as a
substring of b. Consider the languages
C = {x ∈ {0, 1}∗ | #(0, x) = #(1, x)}
C ′ = {x ∈ {0, 1}∗ | #(01, x) = #(10, x)}
C is not regular. (Intuitively, C appears to require unlimited memory to count all the 0's.
But we must prove this formally; the example C ′ shows that a language can appear intuitively
to require unbounded memory while actually being regular.) C ′ looks like almost the same
language, and it appears as though it might not be regular either, since it appears to require
counting how many 01's and how many 10's appear in the string, and DFAs cannot count
arbitrarily high. However, C ′ = {x ∈ {0, 1}∗ | x = ε or x[1] = x[|x|]} (a regular language
decided by the regex ε ∪ 0 ∪ 1 ∪ 0Σ∗0 ∪ 1Σ∗1 for Σ = {0, 1}), because a string begins and
ends with the same bit if and only if it changes bits (switching from 0 to 1, corresponding
to the substring 01, or from 1 to 0, corresponding to the substring 10) the same number of
times in each direction.
These examples show that, to demonstrate no DFA can solve a decision problem, it is
insufficient to make noises such as "you need to count arbitrarily high, so you can't do it
with finite states." Without a formal definition of what it means to "need to count arbitrarily
high", you could make mistakes and think you "need" to count when you really don't, as
with language C ′ above.

7.1.2 A non-regular language


But we claimed the language C above is not regular, meaning no DFA decides it. Let’s prove
it formally.


For q ∈ Q and x ∈ Σ∗, let δ̂(q, x) be the state D is in after reading x, if starting from
state q. Formally, we can define it recursively: δ̂(q, ε) = q and δ̂(q, xb) = δ(δ̂(q, x), b) for
x ∈ Σ∗ and b ∈ Σ. Note that if δ̂(q1 , x) = q2 and δ̂(q2 , y) = q3 , then δ̂(q1 , xy) = q3 .
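In code, δ̂ is just a loop over the input string; a two-line Python sketch, assuming the dictionary DFA representation used in the earlier sketches:

def delta_hat(delta, q, x):
    # the state reached from q after reading the string x
    for b in x:
        q = delta[(q, b)]
    return q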

Theorem 7.1.1. Let C = {x ∈ {0, 1}∗ | #(0, x) = #(1, x)}. Then C is not regular.

Proof. Let D = (Q, Σ, δ, s, F ) be a DFA; we show L(D) ≠ C. Let p = |Q| and w = 0p 1p ;
note w ∈ C.
Let q0 , q1 , . . . , qn ∈ Q be the computation sequence of D on w, with q0 = s. If qn ∉ F ,
then w ∉ L(D), so L(D) ≠ C and we are done, so assume qn ∈ F . Since there are more than
p states in q0 , q1 , . . . , qp , by the pigeonhole principle, there are i, j with 0 ≤ i < j ≤ p where
qi = qj . Let m = j − i. Let y = w[i + 1..j], noting |y| = m > 0. Since qi = qj , δ̂(qi , y) = qi .
(In other words, the string y takes D from state qi back to itself; another copy of y at this
point would return to state qi again.) Let x = w[1..i], so δ̂(q0 , x) = qi , and let z = w[j + 1..|w|],
so δ̂(qi , z) = qn . (See Fig. 7.1 for some examples.) Since δ̂(qi , y) = qi , δ̂(qi , yy) = qi . Thus
δ̂(q0 , xyyz) = qn . Since qn ∈ F , xyyz ∈ L(D).
Since w starts with p 0's and j ≤ p, x = 0i and y = 0m . But then #(0, xyyz) = p + m > p
while #(1, xyyz) = p, so xyyz ∉ C. Since xyyz ∈ L(D), D does not decide C.

[Diagrams omitted: three DFAs processing w = 000000111111.]

Figure 7.1: Three different DFAs with p = 6 states, processing input w = 0p 1p = 000000111111,
showing how each DFA partitions w into three substrings w = xyz, where x leads to the first
occurrence of a repeated state, y loops back to its second occurrence, and z leads to an accept
state. In the original figure, transitions followed by x are shown in green, by y in red, and by z in
blue; some transitions are followed by more than one of x, y, z. In the leftmost DFA, x = 0,
y = 000, z = 00111111, the first repeated state is a, and the final state is s. In the middle DFA,
x = ε, y = 000000, z = 111111, the first repeated state is s, and the final state is d. In the
rightmost DFA, x = 00000, y = 0, z = 111111, the first repeated state is e, and the final state is a.


7.1.3 Another non-regular language


Here’s another non-regular language. The proof is almost identical; we highlight differences
in red.
Theorem 7.1.2. The language G = { uu | u ∈ {0, 1}∗ } is not regular.
Proof. Let D = (Q, Σ, δ, s, F ) be a DFA; we show L(D) ≠ G. Let p = |Q| and w = 1p 01p 0;
note w ∈ G.
Let q0 , q1 , . . . , qn ∈ Q be the computation sequence of D on w, with q0 = s. If qn ∉ F ,
then w ∉ L(D), so L(D) ≠ G and we are done, so assume qn ∈ F . Since there are more than
p states in q0 , q1 , . . . , qp , by the pigeonhole principle, there are i, j with 0 ≤ i < j ≤ p where
qi = qj . Let m = j − i. Let y = w[i + 1..j], noting |y| = m > 0. Since qi = qj , δ̂(qi , y) = qi .
Let x = w[1..i], so δ̂(q0 , x) = qi , and let z = w[j + 1..|w|], so δ̂(qi , z) = qn . Since δ̂(qi , y) = qi ,
δ̂(qi , yy) = qi . Thus δ̂(q0 , xyyz) = qn . Since qn ∈ F , xyyz ∈ L(D).
Since w starts with p 1's and j ≤ p, x = 1i and y = 1m . The string xyz has a single 0 in
each half, but in xyyz, both 0's are in the right half. Thus xyyz does not have equal left and
right halves, so xyyz ∉ G. Since xyyz ∈ L(D), D does not decide G.

When programming, when you find yourself tempted to copy and paste code from one
part of a project to another, it is helpful to factor out the common code into a single function
that can be called repeatedly. Similarly, when in the course of proving two theorems, you
find that they have very similar proofs, it can be helpful to factor out the common parts into
a single general-purpose lemma. This guides future proofs and removes redundant parts. We
factor out the common parts of the two proofs into a statement called the Pumping Lemma.

end of lecture 5a

7.2 The pumping lemma for regular languages


The pumping lemma is based on the pigeonhole principle. The proof informally states, "If an
input w to a DFA is long enough, then some state must be visited twice, and the substring y
between the two visitations can be repeated (or omitted) without changing the DFA's final
answer." That is, we "pump" more copies of y into the full string w. The goal is to pump
until we change the membership of the string in the language, at which point we know the
DFA cannot decide the language, since it gives the same answer on two strings, one of which
is in the language and the other of which is not.
This idea is formalized next. It gives names to the substring x processed before the first
instance of the repeated state, and the substring z processed after the second instance of
the repeated state. It also observes that no matter how long the string w, if the DFA has p
states then some state must be repeated within the first p symbols of w.

Pumping Lemma. If A is a regular language, there is a number p (the pumping length) so


that if w ∈ A and |w| ≥ p, then w may be divided into three substrings w = xyz, satisfying
the conditions:

1. for all k ∈ N, xy k z ∈ A,

2. |y| > 0, and

3. |xy| ≤ p.

The Pumping Lemma is proved formally in Section 7.4. The proof generalizes the two
proofs above, essentially stripping out the parts that are specific to the languages and re-
placing them with an argument that the three conditions hold.

7.3 Using the Pumping Lemma


We prove the Pumping Lemma in Section 7.4; it closely resembles the proofs above. First
we show how to use it to prove languages are not regular.
The strategy for employing the Pumping Lemma to show a language L is not regular
is: assuming L is regular with pumping length p, find a long string in L, long enough that
it can be pumped (it has length at least p, and often longer, so that we can make the first
p symbols be equal, which often helps), then prove that pumping it "moves it out of the
language". Thus the hypothetical DFA D deciding L cannot exist, since by the Pumping
Lemma D gives the same answer on two strings, one of which is in the language and one
of which is not.
Some of the following proof is “boilerplate text” that appears in every proof using the
Pumping Lemma in this section. We highlight the non-boilerplate parts that are specific to
the language in blue. Briefly, these are the parts where we choose w based on p, and where
we prove that xyyz is not in the language.

Theorem 7.3.1. The language B = { 0n 1n | n ∈ N } is not regular.

Proof. Assume for the sake of contradiction that B is regular, with pumping length p. Let
w = 0p 1p . Since w ∈ B and |w| ≥ p, by the Pumping Lemma, w = xyz, where |y| > 0,
|xy| ≤ p, and for all k ∈ N, xy k z ∈ B.
Since |xy| ≤ p and w starts with p 0’s, y is all 0’s. Since |y| > 0, xyyz has more 0’s than
1’s.
Thus xyyz ∉ B, contradicting condition (1) of the Pumping Lemma.

In the previous subsection, we gave a direct proof of the following theorem.

Theorem 7.3.2. The language C = {w ∈ {0, 1}∗ | #(0, w) = #(1, w)} is not regular.

To help avoid mistakes that are often made when using the Pumping Lemma, let’s first
see what a common flawed proof of this theorem looks like.
7.3. USING THE PUMPING LEMMA 75

Bad proof. Assume for the sake of contradiction that C is regular, with pumping length p.
Let w = 0⌈p/2⌉ 1⌈p/2⌉ . Since w ∈ C and |w| ≥ p, by the Pumping Lemma, w = xyz, where
|y| > 0, |xy| ≤ p, and for all k ∈ N, xy k z ∈ C.
Let x = ε, y = 0⌈p/2⌉ , and z = 1⌈p/2⌉ . Then xyyz has 2⌈p/2⌉ 0's but only ⌈p/2⌉ 1's, i.e.,
more 0's than 1's.
Thus xyyz ∉ C, contradicting condition (1) of the Pumping Lemma.

What's wrong with this proof? It's highlighted in red. We wanted y to be all 0's so that
when we pump a second copy into w = xyz to create xyyz, we end up with more 0's than
1's. But just because we want y to be all 0's doesn't mean we get to simply declare that it
is all 0's. The source of the error is the sentence, "Let x = ε, y = 0⌈p/2⌉ , and z = 1⌈p/2⌉ ."
What if those aren't the x, y, and z that the Pumping Lemma finds? The Pumping Lemma
is based on reasoning about how a p-state DFA would process w; Fig. 7.1 reminds us that
the DFA dictates to us how w is split into xyz. We don't get to choose how w is split.
What if x = z = ε and y = 0⌈p/2⌉ 1⌈p/2⌉ ? This is totally consistent with the Pumping
Lemma: |y| > 0 and |xy| ≤ p, just as promised. But for this choice of x and y, xy k z
= (0⌈p/2⌉ 1⌈p/2⌉ )k , which has an equal number of 0's and 1's. So xy k z ∈ C for all k, and we
don't get a contradiction.
Think of the Pumping Lemma as a contract: we give it w of length at least p, and it
gives back x, y, and z. It is guaranteed to fulfill the exact terms of the contract: |y| > 0,
|xy| ≤ p, and xy k z ∈ C for all k ∈ N. Nothing more, nothing less.
If we want y to have some extra property, such as being all 0’s, we have to prove that
y has that property, that this property follows from the conditions |y| > 0 and |xy| ≤ p
guaranteed by the Pumping Lemma. We must take care in choosing w so that every choice
of x, y, and z satisfying |y| > 0 and |xy| ≤ p will give us the property we want.
In this case, if we choose w to be a bit longer, we can use condition (3) to conclude that
y is indeed all 0’s. We highlight in blue the parts that are different from other proofs using
the Pumping Lemma.

Proof of Theorem 7.3.2. Assume for the sake of contradiction that C is regular, with pump-
ing length p. Let w = 0p 1p . Since w ∈ C and |w| ≥ p, by the Pumping Lemma, w = xyz,
where |y| > 0, |xy| ≤ p, and for all k ∈ N, xy k z ∈ C.
Since |xy| ≤ p and w starts with p 0’s, y is all 0’s. Then xyyz has more 0’s than 1’s.
Thus xyyz ∉ C, contradicting condition (1) of the Pumping Lemma.

Here’s an alternate proof that does not directly use the Pumping Lemma. It uses closure
properties and appeals to the fact that we already proved that B = {0n 1n | n ∈ N} is not
regular.

Alternate proof of Theorem 7.3.2. If C were regular, then by closure of regular languages
under ∩, C ∩ L(0∗1∗) = {0n 1n | n ∈ N} would be regular, contradicting Theorem 7.3.1.
(An analogy: if I tell you I have a number x and an integer n, and I tell you x + n = 3.14,
what do you know about x? We don't know the exact value of x, but we know it can't be
an integer. The integers are closed under addition (the sum of two integers is always an
integer), so if I add x to an integer and get 3.14, x must be a non-integer.)

Theorem 7.3.3. The language G = { uu | u ∈ {0, 1}∗ } is not regular.


Proof. Assume for the sake of contradiction that G is regular, with pumping length p. Let
w = 1p 01p 0. Since w ∈ G and |w| ≥ p, by the Pumping Lemma, w = xyz, where |y| > 0,
|xy| ≤ p, and for all k ∈ N, xy k z ∈ G.
Since |xy| ≤ p, x and y are all 1's. Let m = |y| > 0. Then xyyz = 1^(p+m) 0 1^p 0. Since
p + m > p, xyyz ≠ uu for any u.
Thus xyyz ∉ G, contradicting condition (1) of the Pumping Lemma.
Here is a nonregular unary language.
Theorem 7.3.4. The language D = { 1^(n²) | n ∈ N } is not regular.

Proof. Assume for the sake of contradiction that D is regular, with pumping length p. Let
w = 1^(p²) . Since w ∈ D and |w| ≥ p, by the Pumping Lemma, w = xyz, where |y| > 0,
|xy| ≤ p, and for all k ∈ N, xy k z ∈ D.
Let w′ = 1^((p+1)²) be the next biggest string in D after w; then no string u with |w| <
|u| < |w′ | is in D. Then

|w′ | − |w| = (p + 1)² − p² = p² + 2p + 1 − p² = 2p + 1 > p.

Since |y| ≤ p, |xyyz| − |w| ≤ p, so |xyyz| < |w′ |. Since |y| > 0, |w| < |xyyz| < |w′ |.
Thus xyyz ∉ D, contradicting condition (1) of the Pumping Lemma.
The next example shows that “pumping down” (replacing xyz with xz) can be useful.
Theorem 7.3.5. The language E = {0i 1j | i > j} is not regular.
Proof. Assume for the sake of contradiction that E is regular, with pumping length p. Let
w = 0^(p+1) 1^p (that is, p + 1 0's followed by p 1's). Since w ∈ E and |w| ≥ p, by the
Pumping Lemma, w = xyz, where |y| > 0, |xy| ≤ p, and for all k ∈ N, xy k z ∈ E.
Since |xy| ≤ p, y is all 0's. Let m = |y| > 0. Then xz = 0^(p+1−m) 1^p , and p + 1 − m ≤ p.
Thus xz ∉ E, contradicting condition (1) of the Pumping Lemma.
Finally, let’s see a proof where we can’t choose the first p symbols to be the same, since
the language disallows it. We have more cases to consider, since we cannot guarantee that,
as in the above proofs, y will be a unary string. Depending on the exact form of y, different
arguments are needed.
Theorem 7.3.6. The language H = {(01)n (10)n | n ∈ N} is not regular.

For example, H contains ε, 0110, 01011010, 010101101010 but not 0011 or 0101.

Proof. Assume for the sake of contradiction that H is regular, with pumping length p. Let
w = (01)p (10)p . Since w ∈ H and |w| ≥ p, by the Pumping Lemma, w = xyz, where |y| > 0,
|xy| ≤ p, and for all k ∈ N, xy k z ∈ H.
Since |xy| ≤ p and w starts with p consecutive 01 substrings, x and y both occur in this
region. (If y were of the form (01)∗, the rest of the proof could be similar to the previous
proofs. The complication is that, although y appears in the first half of the string w, which
is of the form (01)∗, y is merely a substring of it, and so it may not start with 0 or end
with 1 or have even length.) Say that a block in a string is a substring of length 2 starting
at an odd index. For example, the blocks of 01011011 are 01, 01, 10, 11. We have two cases:

|y| is even: Then either y = (01)m or y = (10)m for some m. In either case, xyyz has more 01
blocks than 10 blocks, so xyyz ∉ H. (This becomes apparent by trying out a few examples
of where y could be.)

|y| is odd: Then y has a different number of 0's and 1's, so xyyz does also, so xyyz ∉ H.

In both cases xyyz ∉ H, contradicting condition (1) of the Pumping Lemma.

end of lecture 5b

7.4 Optional: Proof of the Pumping Lemma for regular languages


Finally, we prove the Pumping Lemma. This section is optional because the main ideas are
already present in the more concrete proof of Theorem 7.1.1, but we provide the full abstract
proof for completeness.
As a running example to help understand the ideas, see the DFA D in Figure 7.2.
Note that if we reach, for example, a, then read the bits 1001, we will return to a.
Therefore, for any string x such that δ̂(s, x) = a, such as x = 0, it follows that δ̂(s, x1001) =
a, and δ̂(s, x10011001) = a, etc., i.e., for all k ∈ N, defining y = 1001, δ̂(s, xy k ) = a.
Also notice that if we are in state a, and we read the string z = 110, we end up in
accepting state e. Therefore, δ̂(s, xz) = e, thus D accepts xz. But combined with the
previous reasoning, we also have that D accepts xy k z for all k ∈ N. In these examples,
x = 0, y = 1001, z = 110.
Note that the cycle in which a string gets trapped may depend on the string. Any long
enough string starting with 1 will repeatedly visit the state f , but will never visit a.
More generally, for any DFA D, any string w with |w| ≥ |Q| has to cause D to traverse
a cycle. The substring y read along this cycle can either be removed, or more copies added,
and D will end in the same state. Calling x the part of w before y and z the part after, we

[DFA diagram omitted: states s, a, b, c, d, e, f, g, containing the cycle (a, b, c, d, a); e is an accept state.]
Figure 7.2: A DFA D = (Q, Σ, δ, s, F ) to help illustrate the idea of the pumping lemma. Reading
x reaches a, then reading y returns to a, then reading z reaches accepting state e. The strings xz,
xyz, xyyz, xyyyz, . . . are all accepted too, since they also end up in state e. Their paths through
D differ only in how many times they follow the cycle (a, b, c, d, a).

have that whatever is D's answer on w = xyz, it is the same on xy k z for any k ∈ N, since
all of those strings take D to the same state.

Proof of Pumping Lemma. Let A be a regular language, decided by a DFA D = (Q, Σ, δ, s, F ).


Let p = |Q|.
Let w ∈ A be a string with |w| = n ≥ p. Let q0 , q1 , . . . , qn be the computation sequence
of n + 1 states of D on w. Note q0 = s and qn ∈ F . Since n + 1 ≥ p + 1 > |Q|, by the
pigeonhole principle, two states in the prefix q0 , q1 , . . . , qp must be equal. We call them qi
and qj , with i < j ≤ p. Let x = w[1 . . i], y = w[i + 1 . . j], and z = w[j + 1 . . n]. For example,
with i = 1, j = 5, and qi = qj = a, the string w = 010011100 on the DFA in Figure 7.2 gives
x = w[1] = 0, y = w[2..5] = 1001, and z = w[6..9] = 1100, with computation sequence
s, a, b, c, d, a, b, f, e, e,
where x leads from s to the first occurrence of a, y loops from a back to a, and z leads from
a to the accept state e.
Since δ̂(qi , y) = qj = qi , for all k ∈ N, δ̂(qj , y k ) = qj .
Therefore δ̂(q0 , xy k z) = qn ∈ F , so xy k z ∈ L(D), satisfying (1). Since i ≠ j, |y| > 0,
satisfying (2). Finally, j ≤ p, so |xy| ≤ p, satisfying (3).
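The proof is constructive: given the DFA's transition table and any sufficiently long w, we can compute the split w = xyz. A Python sketch (dictionary DFA representation assumed, as in the earlier sketches):

def pump_decomposition(delta, start, w):
    # run the DFA on w, stopping at the first repeated state among q0..qp
    seen = {start: 0}              # state -> index where it first occurred
    q = start
    for j in range(1, len(w) + 1):
        q = delta[(q, w[j - 1])]   # q is now q_j, the state after w[1..j]
        if q in seen:              # pigeonhole: a repeat yields the loop y
            i = seen[q]
            return w[:i], w[i:j], w[j:]
        seen[q] = j
    return None                    # only possible if |w| < number of states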

7.5 Optional: The Myhill-Nerode Theorem


The Pumping Lemma is a useful way to prove that a certain language is not regular. How-
ever, for some languages, it is not straightforward to prove it is nonregular using the Pumping
Lemma. There is a more powerful result, called the Myhill-Nerode Theorem, that can essen-
tially be used to resolve the question of whether any language is regular or not. (In fact,

unlike the Pumping Lemma, it can even be used to prove that a language is regular.)

7.5.1 Distinguishing extensions and statement of Myhill-Nerode Theorem


For any language L ⊆ Σ∗, say two strings x, y ∈ Σ∗ are L-distinguishable if they have a
distinguishing extension z, meaning that xz ∈ L ⇐⇒ yz ∉ L, i.e., one of xz and yz is in
L, and the other is not.
We say x and y are L-equivalent, written x ∼L y, if they are not L-distinguishable: for
all z ∈ Σ∗, xz ∈ L ⇐⇒ yz ∈ L.
For example, for the language L = {0n 1n | n ∈ N}, the strings x = 00 and y = 000 have
a distinguishing extension. Both x and y are not in L, but setting z = 11, we have xz ∈ L
and yz ∉ L, so z is a distinguishing extension. Thus x ≁L y.
Note that 0, 00, 000, 0000, 00000, 000000, . . . are all pairwise ∼L-inequivalent: for any i < j,
0i and 0j have distinguishing extension 1i , so L has an infinite number of equivalence classes.
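For intuition (this is not part of any proof), distinguishing extensions can be hunted by brute force over short strings. A Python sketch, using B = {0n 1n | n ∈ N} as the example language:

from itertools import product

def distinguishing_extension(x, y, in_lang, sigma, max_len=8):
    # search for a short z with (xz in L) != (yz in L); in_lang is a
    # membership test for L; returns None if no z of length <= max_len works
    for n in range(max_len + 1):
        for tup in product(sigma, repeat=n):
            z = "".join(tup)
            if in_lang(x + z) != in_lang(y + z):
                return z
    return None

def in_B(w):
    n = len(w) // 2
    return len(w) % 2 == 0 and w == "0" * n + "1" * n

# distinguishing_extension("00", "000", in_B, "01") returns "11".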

Theorem 7.5.1 (Myhill-Nerode Theorem). A language L is regular if and only if ∼L defines


a finite number of equivalence classes. Furthermore, the number of equivalence classes equals
the number of states in the smallest DFA deciding L.
So, if a language is not regular, this can always be proven by finding an infinite number
of strings that are all pairwise distinguishable.

7.5.2 Examples of using Myhill-Nerode Theorem


Theorem 7.5.2. The language B = {0n 1n | n ∈ N} is not regular.
Proof. Let S = {0n | n ∈ N} = {ε, 0, 00, 000, 0000, . . .}. All strings 0i , 0j ∈ S with i < j
have distinguishing extension 1i , since 0i 1i ∈ B but 0j 1i ∉ B. Since S is infinite, B is not
regular by the Myhill-Nerode Theorem.
The same proof actually works for this language as well:
Theorem 7.5.3. The language C = {x ∈ {0, 1}∗ | #(0, x) = #(1, x)} is not regular.
Proof. Let S = {0n | n ∈ N} = {ε, 0, 00, 000, 0000, . . .}. All strings 0i , 0j ∈ S with i < j
have distinguishing extension 1i , since 0i 1i ∈ C but 0j 1i ∉ C. Since S is infinite, C is not
regular by the Myhill-Nerode Theorem.
Here’s an alternate proof that does not directly use the Myhill-Nerode Theorem. It uses
closure properties and appeals to the fact that we already proved that B = {0n 1n | n ∈ N}
is not regular.
Alternate proof of Theorem 7.5.3. If C were regular, then by closure of regular languages
under ∩, C ∩ L(0∗1∗) = {0n 1n | n ∈ N} would be regular (recall the integer analogy from
the alternate proof of Theorem 7.3.2), contradicting Theorem 7.5.2.

Here is a nonregular unary language.


Theorem 7.5.4. The language D = { 1^(n^2) | n ∈ N } is not regular.

Proof. Here, letting S = D itself works as the infinite set of distinguishable strings. All
strings 1^(k^2), 1^(n^2) ∈ S with k < n have distinguishing extension z = 1^((k+1)^2−k^2), since
1^(k^2) z = 1^((k+1)^2) ∈ D but 1^(n^2) z = 1^(n^2+(k+1)^2−k^2) = 1^(n^2+2k+1) ∉ D.
The last follows because k < n =⇒ 2k + 1 < 2n + 1, but 2n + 1 is the distance from
n^2 to the next perfect square (n + 1)^2. Thus n^2 + 2k + 1 lies strictly between two adjacent
perfect squares n^2 and (n + 1)^2, so is not itself a perfect square, so 1^(n^2+2k+1) ∉ D. So D is
not regular by the Myhill-Nerode Theorem.
The next example shows that “pumping down” (replacing xyz with xz) can be useful.
Theorem 7.5.5. The language E = {0i 1j | i > j} is not regular.
Proof. Let S = {0n | n ∈ N} = {ε, 0, 00, 000, 0000, . . .}. All strings 0i , 0j ∈ S with i < j
have distinguishing extension 1i since 0i 1i ∉ E but 0j 1i ∈ E. Since S is infinite, E is not
regular by the Myhill-Nerode Theorem.
Theorem 7.5.6. The language H = {(01)n (10)n | n ∈ N} is not regular.
For example, H contains ε, 0110, 01011010, 010101101010 but not 0011 or 0101.
Proof. Let S = {(01)n | n ∈ N} = {ε, 01, 0101, 010101, . . .}. All strings (01)i , (01)j ∈ S with
i < j have distinguishing extension (10)i since (01)i (10)i ∈ H but (01)j (10)i ∉ H. Since S
is infinite, H is not regular by the Myhill-Nerode Theorem.

7.5.3 Optional: Proof of Myhill-Nerode Theorem


Myhill-Nerode Theorem. A language L is regular if and only if ∼L defines a finite
number of equivalence classes.
Proof. ( =⇒ :) Since L is regular, some DFA D decides L. Any two strings x and y such
that δ̂(x) = δ̂(y) also obey δ̂(xz) = δ̂(yz) for all strings z, so xz ∈ L ⇐⇒ yz ∈ L.
Thus x ∼L y. So each state in D defines an equivalence class of ∼L .
( ⇐= :) Suppose L ⊆ Σ∗ has equivalence classes C1 , . . . , Cn . Define the DFA D = (Q, Σ, δ, s, F ),
where
• Q = {q1 , . . . , qn },
• s = qi such that ε ∈ Ci ,
• F = {qi | Ci ⊆ L},
• for all qi ∈ Q and b ∈ Σ, δ(qi , b) = qj , where j is defined by picking any x ∈ Ci
and letting j be such that xb ∈ Cj .
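To see this construction in action, here is a small Python sketch for the concrete regular language L = {x ∈ {0, 1}∗ | x ends in 01}, which has three equivalence classes. The classify function (the by-hand encoding of ∼L) and the choice of representatives are assumptions made for this example, not part of the theorem:

def in_L(x):
    return x.endswith("01")

def classify(x):
    """Name the equivalence class of x under ~L (worked out by hand)."""
    if x.endswith("01"): return "C1"  # in L
    if x.endswith("0"):  return "C2"  # one symbol (a 1) away from L
    return "C3"                       # no progress toward the suffix 01

reps = {"C1": "01", "C2": "0", "C3": ""}   # one representative per class
start = classify("")                       # the class containing epsilon
accept = {c for c, r in reps.items() if in_L(r)}
# delta(q_i, b) = q_j where, picking any x in C_i, xb is in C_j:
delta = {(c, b): classify(reps[c] + b) for c in reps for b in "01"}

def run_dfa(x):
    q = start
    for b in x:
        q = delta[(q, b)]
    return q in accept

print(run_dfa("1101"), run_dfa("10"))  # True False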

7.6 Optional: The Pumping Lemma for context-free languages


We showed that the language {0n 1n | n ∈ N} is not regular, but it is context-free (CFG-
decidable, by the CFG S → 0S1|ε). Is every language context-free? It turns out the answer
is no; for example {0n 1n 2n | n ∈ N} is not context-free.
There is a version of the pumping lemma for context-free languages, which helps us prove
that certain languages are not context-free, in the same way the original pumping lemma is
used to show certain languages are not regular.
As with the pumping lemma for regular languages, we first prove that some languages
are not context-free directly, in order to see the proof technique in a more direct and less
abstract way, before stating the full lemma.
Intuitively, given any CFG G with k variables, any sufficiently large string must have a
parse tree of G with a root-to-leaf path of length > k. By the pigeonhole principle, along
this path, some variable A must repeat. But what does it mean for a subtree with A at the
root to also have A at a lower node? It means that we could replace the smaller subtree
with the larger subtree.
TODO: Create figure showing this.

Let’s use this idea to prove a simple lemma.


Lemma 7.6.1. If G = (Γ, Σ, S, ρ) is a CFG with at most b ∈ N symbols on the right-hand
side of any rule, then for every string x ∈ L(G) with |x| > b^(|Γ|+1), every parse tree of x in G
has a root-to-leaf path that repeats some variable A ∈ Γ.
Proof. Parse trees of G have branching factor at most b (i.e., each node has at most b
children). Any tree with branching factor b and height h has at most b^h leaves, so if the tree
has > b^h leaves, then the tree has height > h.
Let p = b^(|Γ|+1) and let x ∈ L(G) such that |x| > p. Then any parse tree of x in G has
height > |Γ| + 1, so the longest root-to-leaf path has > |Γ| non-leaf nodes. By the pigeonhole
principle, two nodes on this path must be labeled with the same variable A ∈ Γ.
Theorem 7.6.2. The language {0n 1n 2n | n ∈ N} is not context-free.
Proof. Suppose for the sake of contradiction that the language is context-free, generated
by CFG G = (Γ, Σ, S, ρ) with at most b symbols on the right-hand side of any rule. Let
p = b^(|Γ|+1) and let w = 0^p 1^p 2^p. By Lemma 7.6.1, any parse tree of w in G must repeat a
variable A on some root-to-leaf path. Let x and y be the substrings of w on the
leaves under the lower and upper occurrences of A, respectively. (Note that x is a substring
of y.)
TODO: this is easier to visualize with a figure showing the parse tree

Then we can replace the subtree under the upper A (whose leaves are y) with the smaller
subtree (whose leaves are x), and this remains a valid parse tree, whose leaves are the string
z.
We have three cases:

y has no 0’s: Then x also has no 0’s, so by replacing y with x, we have decreased the
number of 1’s and/or 2’s without decreasing the number of 0’s, so z is not of the form
0n 1n 2n for any n ∈ N. But since z has a parse tree in G, z ∈ L(G), a contradiction.
y has no 2’s: This is symmetric to the previous case.
y has both 0’s and 2’s: There are three sub-cases:
x has no 0’s: Then replacing y with x decreases the number of 1’s and/or 2’s without
decreasing the number of 0’s.
x has no 2’s: This is symmetric to the previous sub-case.
x has both 0’s and 2’s: Then x contains all the 1’s in w. Thus, replacing y with x
decreases the number of 0’s and/or 2’s without decreasing the number of 1’s.
TODO: State the formal pumping lemma for context-free languages at the end, but do mostly
examples. If a parse tree repeats a variable on a root-to-leaf path, then the string can be pumped.
Part II

Computational Complexity and Computability Theory

Chapter 8

Turing machines

8.1 Intuitive idea of Turing machines


A Turing machine (TM) is a finite automaton with an unbounded read/write tape memory.

Figure 8.1: Intuitive idea of a Turing machine (TM): a finite-state control attached to an
unbounded read/write tape.

One can think of a finite automaton as taking its input string from a “tape”, in which
the finite automaton starts reading the leftmost tape cell, moves to the right by one tape cell
each time step, and halts after moving off of the rightmost tape cell. From this perspective,
the differences between finite automata and TMs are:
• A TM can write on its tape; furthermore, it can write symbols that are not part of the
input alphabet (in particular, the blank symbol ␣ is never part of the input alphabet,
but appears on the tape).
• The read-write tape head can move right, but also can move left or stay still. This
means it can read the same input symbol more than once.
• The tape is unbounded: if the tape head moves off the rightmost tape cell, a new tape
cell appears, with a ␣ on it.1
1 It is common to describe a TM as having an infinite tape, which makes it appear unrealistic. But there is no
need to posit anything infinite. Like a list in Python/Java or a vector in C++, the tape is finite, but it can always
grow to a larger (yet still finite) size.


• Most states do not accept or reject; there is exactly one accept state and one reject
state, and the TM immediately halts upon entering either state. Conversely, the TM
will not halt until it reaches one of these states, which may never happen.
Recall that wR represents the reverse of w.
Example 8.1.1. Design a Turing machine to test membership in the palindrome language

P = { w ∈ {0, 1}∗ | w = wR } .


1. Zig-zag to either side of the string, checking if the leftmost symbol equals the rightmost
symbol. If not, reject.
2. “Cross off” symbols as they are checked (i.e., replace the symbol with a symbol x not
in the input alphabet).
3. If we make it to the end without rejecting, accept.

This can be implemented in the Turing machine simulator with

// This TM decides the language { w in {0,1}* | w = w^R }


states = {s,r00,r11,r01,r10,l,lx,qA,qR}
input_alphabet = {0,1}
tape_alphabet_extra = {x,_}
start_state = s
accept_state = qA
reject_state = qR
num_tapes = 1
delta =
s, 0 -> r00, x, R;
s, 1 -> r11, x, R;
r00, 0 -> r00, 0, R;
r00, 1 -> r01, 1, R;
r01, 0 -> r00, 0, R;
r01, 1 -> r01, 1, R;
r10, 0 -> r10, 0, R;
r10, 1 -> r11, 1, R;
r11, 0 -> r10, 0, R;
r11, 1 -> r11, 1, R;
r00, _ -> lx, _, L;
r11, _ -> lx, _, L;
r00, x -> lx, x, L;

r11, x -> lx, x, L;


lx, 0 -> l, x, L;
lx, 1 -> l, x, L;
lx, x -> qA, x, S;
l, 0 -> l, 0, L;
l, 1 -> l, 1, L;
l, x -> s, x, R;
s, x -> qA, x, S;

Figure 8.2: TM that decides {w ∈ {0, 1}∗ | w = wR }. Although state qr has no explicit incoming
transition, every transition not shown (e.g., reading a ␣ in state r10 ) implicitly goes to state qr .

8.2 Formal definition of a TM (syntax)


Definition 8.2.1. A Turing machine (TM ) is a 7-tuple (Q, Σ, Γ, δ, s, qa , qr ), where

• Q is a finite set of states,

• Σ is the input alphabet,


88 CHAPTER 8. TURING MACHINES

• Γ is the tape alphabet, where Σ ⊊ Γ and ␣ ∈ Γ \ Σ,2
• s ∈ Q the start state,
• qa ∈ Q the accept state,
• qr ∈ Q the reject state, where qa 6= qr , and
• δ : (Q \ {qa , qr }) × Γ → Q × Γ × {L, R, S} is the transition function.
Example 8.2.2. We formally define the TM M1 = (Q, Σ, Γ, δ, s, qa , qr ) described earlier,
which decides the language P = {w ∈ {0, 1}∗ | w = wR }:
• Q = {s, r00 , r01 , r10 , r11 , lx , l, qa , qr },
• Σ = {0, 1} and Γ = {0, 1, x, ␣},
• s is the start state, qa is the accept state, qr is the reject state, and
• δ is shown in Figure 8.2.

8.3 Formal definition of computation by a TM (infinite semantics)


This section shows a simple definition of TM semantics that uses an infinite tape. Section 8.4
has a more complex definition that shows TMs don’t actually require anything to be infinite:
a finite tape suffices, but it requires many ugly special cases to define when to grow the tape.
A configuration of a TM M = (Q, Σ, Γ, δ, s, qa , qr ) is a triple C = (q, p, w), where
• q ∈ Q is the current state,
• p ∈ N+ is the tape head position,
• w ∈ Γ∞ (a one-way infinite sequence of symbols) is the tape content.
Let C = (q, p, w) and C′ = (q′, p′, w′) be configurations.
Informally, we want to talk about C′ being the configuration that the TM will enter
immediately after C.
We identify the moves L, R, S with integers −1, +1, 0, respectively. We say C yields C′,
and we write C → C′, if and only if δ(q, w[p]) = (q′, w′[p], m), where
• p′ = max(1, p + m) (move tape head position, but don’t move left if it’s already 1)
• w[i] = w′[i] for all i ∈ N+ \ {p} (tape unchanged away from the tape head)
2 Since Σ ⊊ Γ, in the simulator, rather than writing down all of Γ, which would be redundant, you only write
Γ \ Σ, i.e., the symbols in the tape alphabet that are not in the input alphabet, calling this tape alphabet extra.
The union of tape alphabet extra and Σ = input alphabet is the tape alphabet Γ. An underscore represents ␣.

A configuration (q, p, w) is accepting if q = qa , rejecting if q = qr , and halting if it is


accepting or rejecting. M accepts (respectively, rejects) input x ∈ Σ∗ if there is a finite
sequence of configurations C1 , C2 , . . . , Ck such that

1. Ck is accepting (respectively, rejecting),

2. C1 = (s, 1, x␣∞ ) (initial/start configuration: the TM starts with its input x written at
the leftmost part of the tape, with ␣ written everywhere else.),

3. for all i ∈ {1, . . . , k − 1}, Ci → Ci+1 .

If M neither accepts nor rejects x, then we say M loops (a.k.a., does not halt) on x.
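The definition above translates almost line-for-line into code. Here is a minimal Python sketch of these semantics (an illustration, not the course simulator): the tape is a dict from positions to symbols, with missing positions implicitly holding the blank, so a finite data structure stands in for the one-way infinite tape. The toy transition function at the end is an assumption chosen for the example.

BLANK = "_"
MOVE = {"L": -1, "R": 1, "S": 0}

def run_tm(delta, s, qa, qr, x):
    """delta: dict mapping (state, symbol) -> (state, symbol, move).
    Returns True if the TM accepts x, False if it rejects;
    runs forever if the TM loops."""
    q, p = s, 1                                  # start state, head at cell 1
    tape = {i + 1: c for i, c in enumerate(x)}   # C1 = (s, 1, x blank^infinity)
    while q not in (qa, qr):
        q2, write, move = delta[(q, tape.get(p, BLANK))]
        tape[p] = write                          # w'[p] is the written symbol
        q, p = q2, max(1, p + MOVE[move])        # p' = max(1, p + m)
    return q == qa

# Toy TM (an assumption, for illustration): accept iff the input has a 1.
delta = {("s", "0"):   ("s", "0", "R"),
         ("s", "1"):   ("qa", "1", "S"),
         ("s", BLANK): ("qr", BLANK, "S")}
print(run_tm(delta, "s", "qa", "qr", "0010"))  # True
print(run_tm(delta, "s", "qa", "qr", "000"))   # False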

end of lecture 5c

8.4 Optional: Formal definition of computation by a TM (finite semantics)
A configuration of a TM M = (Q, Σ, Γ, δ, s, qa , qr ) is a triple C = (q, p, w), where

• q ∈ Q is the current state,

• p ∈ N+ is the tape head position,

• w ∈ Γ∗ is the tape content, the string consisting of the symbols starting at the leftmost
position of the tape, until the rightmost non-blank symbol, or the largest position the
tape head has scanned, whichever is larger.

Let C = (q, p, w) and C′ = (q′, p′, w′) be configurations.
Informally, we want to talk about C′ being the configuration that the TM will enter
immediately after C.
We identify the moves L, R, S with integers −1, +1, 0, respectively. We say C yields C′,
and we write C → C′, if and only if δ(q, w[p]) = (q′, w′[p], m), where

• p′ = max(1, p + m) (move tape head position, but don’t move left if it’s already 1)

• w[i] = w′[i] for all i ∈ {1, . . . , |w|} \ {p} (tape unchanged away from the tape head)

• if m = L or S, then |w′| = |w| (no need to grow the tape on a left/stay move)

• if m = R, then |w′| = |w| if p′ ≤ |w| (don’t grow the tape if the tape head was not already
on the rightmost cell), otherwise |w′| = |w| + 1 and w′[p′] = ␣ (grow the tape and put
a ␣ in the new position)

A configuration (q, p, w) is accepting if q = qa , rejecting if q = qr , and halting if it is


accepting or rejecting. M accepts (respectively, rejects) input x ∈ Σ∗ if there is a finite
sequence of configurations C1 , C2 , . . . , Ck such that
1. Ck is accepting (respectively, rejecting),
2. C1 = (s, 1, x) if x ≠ ε and C1 = (s, 1, ␣) if x = ε (initial/start configuration: the TM
starts with only its input on the tape, or ␣ if the input is empty.),
3. for all i ∈ {1, . . . , k − 1}, Ci → Ci+1 .
If M neither accepts nor rejects x, then we say M loops (a.k.a., does not halt) on x.

8.5 Languages recognized/decided by TMs


The language recognized by M is L(M ) = { x ∈ Σ∗ | M accepts x } .
Definition 8.5.1. A language is Turing-recognizable (a.k.a., Turing-acceptable, computably
enumerable, recursively enumerable, c.e., or r.e.) if some TM recognizes it.
A language is co-Turing-recognizable (a.k.a., co-c.e., co-r.e.) if its complement is Turing-
recognizable.
On any input, a TM may accept, reject, or loop (run forever without entering a halting
state).
If a language L is Turing-recognizable, then some Turing machine M exists so that, for
any x ∈ L, M accepts x, and for any x 6∈ L, M either rejects x or does not halt on input x.
Of course, the last is no good if we want to use M to actually solve a problem. If M halts
on every input string, we say it is total.3 A total TM M is also called a decider, and we say
it decides the language L(M ).
Definition 8.5.2. A language is called Turing-decidable (recursive), or simply decidable, if
some TM decides it.
We take the formal statement “a problem is decidable” to coincide with the intuitive
notion of “the problem is solvable by some algorithm”.
One might ask why we introduce the concept of Turing-recognizability, when Turing-
decidability is what we really want from algorithms (they always halt and give an answer).
There are a few reasons. First, there is a lot of deep mathematical structure in the notions
of Turing-recognizable and co-Turing-recognizable, which has attracted the attention of a lot
of mathematicians and logicians who find similarities with classical notions of mathematical
logic.
3 The terminology total comes from the fact that a function defined on every input in its domain is called a total
function. We can think of the decision problem solved by any model of computation equivalently as a predicate: a
function with range {0, 1}, φ : Σ∗ → {0, 1}. If a TM M halts on all inputs, then φ is defined on all inputs, so it is a
total function. φ is a partial function if M does not halt on certain inputs, since for those inputs, φ is undefined.

Another important reason is this: Many decision problems we study later will be of the
form, “given a Turing machine M , does L(M ) have some property?” (e.g., is nonempty, is
infinite, etc.) Every DFA, regex, NFA, and CFG has some unique language that it decides.
With Turing machines, this is true of recognizing but not deciding. That is, every Turing
machine M defines a unique language L(M ) that it recognizes. But if M is not total, it does
not decide any language. So Turing-recognizability gives us a way of making a one-to-one
equivalence between machines and languages, in a way that Turing decidability does not.
However, in the case of TMs that always halt, the TM decides the same language L(M ) that
it recognizes, so asking the question about L(M ) is not a restriction. It simply gives a way
for the question to be well-defined on all possible TMs, rather than only on those TMs that
always halt.

8.6 Variants of TMs


Why do we believe Turing machines are an appropriate model of computation?
One reason is that the definition is robust: we can make any number of changes, including
apparent enhancements, without actually adding any computational power to the model. We
describe some of these, and show that they have the same capabilities as the Turing machines
described in the previous section.4
In a sense, the equivalence of these models is unsurprising to programmers. Programmers
are well aware that all general-purpose programming languages can accomplish the same
tasks, since an interpreter for a given language can be implemented in any of the other
languages.

8.6.1 Multitape TMs


A multitape Turing machine has k ≥ 2 tapes, each with its own read/write head. The first
is the input tape and starts with the input string written on it as with a single-tape TM.
Each remaining tape is called a worktape and starts with all ␣’s.
With a single-tape TM, there is rarely any reason to let a tape head stay where it is: a
sequence of multiple state transitions that occur while the tape head stays where it is could
be replaced with a single transition to the final state in the sequence. However, with multiple
4 This does not mean that any change whatsoever to the Turing machine model will preserve its abilities. By re-
placing the tape with a stack, for instance, we would reduce the power of the Turing machine to that of a deterministic
pushdown automaton, which cannot even recognize languages such as {x | x = xR }.
Conversely, allowing the start configuration to contain an arbitrary infinite sequence of symbols already written on
the tape (instead of ␣ on all but finitely many positions) would add power to the model: by writing an encoding of
an undecidable language, the machine would be able to decide that language. But this would not mean the machine
had magic powers; it just means that we cheated by providing the machine (without the machine having to “work
for it”) an infinite amount of information before the computation even begins. In other words, it would not be the
machine with the powerful computational ability, but the programmer of the machine who is assumed to be able to
write down an uncomputable infinite sequence of symbols.

tapes, it is often convenient to move only some tape heads but let others stay put, so we
make more use of the S move.

Transition function type signature. We omit a detailed definition of the syntax and
semantics of multitape TMs. But it is worth pointing out that the key part of that definition,
the transition function, would have a type signature

δ : (Q \ {qa , qr }) × Γ^k → Q × Γ^k × {L, R, S}^k

For example, the expression

δ(qi , 0, 1, 0) = (qj , 1, 1, 0, S, S, L)

means that, if the 3-tape TM is in state qi and heads 1 through 3 are reading symbols 0, 1, 0,
the machine goes to state qj , writes a 1 on the first tape, and moves the third tape head left.
Theorem 8.6.1. For every multitape TM M , there is a single-tape TM S such that L(M ) =
L(S). Furthermore, if M is total, then S is total.

Figure 8.3: Single-tape TM S simulating a multitape TM M . According to the formalization of S’s
tape alphabet in the proof of Theorem 8.6.1, the symbols on S’s tape from left to right are
((0, N ), (x, N )), ((1, H), (x, N )), ((0, N ), (x, N )), ((1, N ), (y, H)), ((1, N ), (x, N )), ((0, N ), (x, N )),
((␣, N ), (y, N )), ((␣, N ), (␣, N )), but we write them in the figure in an easier-to-read fashion as a
string of symbols with possible dots indicating the presence of one of M ’s tape heads.

Proof. The main task is to describe three things:


1. How any configuration CM of M can be represented as a configuration CS of S.
2. How S can alter its initial configuration to represent the initial configuration of M .

3. If CM → C′M , how S can update CS to represent C′M .

There will not be a 1-1 correspondence of configurations between M and S: S will take
many transitions to simulate a single transition of M , so many consecutive configurations of
S will represent the same configuration of M .
The idea is shown in Figure 8.3. Say that M has k tapes. Each symbol in the tape
alphabet of S actually represents k symbols from the tape alphabet of M , as well as k bits
representing “is the i’th tape head of M located here?”, which will be true for exactly one
cell.

Representing a configuration of M with a configuration of S. Formally, if M has tape
alphabet ΓM , S has tape alphabet ΓS = (ΓM × {H, N })^k,5 so that a symbol ((a1 , b1 ), (a2 , b2 ),
. . ., (ak , bk )) ∈ ΓS on the j’th tape cell of S means that the j’th tape cell of the i’th tape
of M has symbol ai , with bi = H if the head of the i’th tape is at position j and bi = N
otherwise. Figure 8.3 shows an easier-to-read graphical representation, where each symbol
of ΓS is displayed as a length-k string of symbols from ΓM , with a dot over any symbol being
scanned by a tape head of M .

Altering the initial configuration of S to represent the initial configuration of M . On input
w = w[1]w[2] . . . w[n], S begins by replacing its input string w[1]w[2] . . . w[n] as follows: the
first tape cell gets the symbol representing ẇ[1] followed by k − 1 dotted ␣’s (the dots mark
that every one of M ’s k tape heads starts at cell 1), and the i’th tape cell, for i ∈ {2, . . . , n},
gets the symbol representing w[i] followed by k − 1 undotted ␣’s.

Simulating a transition of M with many transitions of S. Whenever a single transition


of M occurs, S must scan the entire tape, using its state to store all the symbols under each
• (since there are only a constant k of them, this can be done in its finite state set). This

information can be stored in the finite states of S since there are a fixed number of tapes
and therefore a fixed number of •’s. Then, S will reset its tape to the start, moving each •
appropriately (either left, right, or stay), and writing new symbols on the tape at the old
location of the •, according to M ’s transition function.
This allows S to simulate the computation done by M , accepting, rejecting, or looping
if and only if M does, so L(S) = L(M ). Since S halts if and only if M halts, S is total if M
is total.
5 H means “head” and N means “no head”.

end of lecture 6a
midterm exam

end of lecture 6b

8.6.2 Other variants


One can make all manner of other changes to the Turing machine model, without changing
the fundamental computational power of the model:
• two-way infinite tapes instead of one-way infinite
• multiple tape heads on one tape
• 2D tape instead of 1D tape, i.e., the position is an integer (x, y)-coordinate, not simply
a single integer
• random access memory: the machine has a special “address” worktape, where it can
write an integer in binary, and then enter a special state, and the machine’s tape head
on the main tape will immediately jump to that position without having to visit all the
positions in between
The above modifications give extra capabilities to the machine. We can also consider certain
ways of reducing the capabilities of a Turing machine. Surprisingly, all of the following
modifications of the Turing machine definition do not alter the fundamental computational
power of the model (all of them decide precisely the same class of languages as a Turing
machine):
• move-right-or-reset: instead of a single left move, the tape head can be reset to the
leftmost position.
• two stacks: instead of a linked list memory (which is a single list that the finite state
machine can iterate over in both directions), the finite state machine has two stacks:
lists in which only the rightmost element can be accessed, and the only way to change
the stack is to pop (remove) a symbol from the rightmost end, or to push a new symbol
to the rightmost end.
• one queue: instead of a linked list memory, the finite state machine can control a queue:
in which only the rightmost symbol can be read and removed, and the only way to add
a new symbol is to push it on the leftmost end.
• three counters: instead of a linked list memory, the finite state machine can control
three counters: each counter is a nonnegative integer whose value can be incremented,
decremented, or tested to check whether it is equal to 0. In this case there is no
longer any notion of an alphabet of input or tape symbols: the only way to give input
to the machine is to put a certain integer in one of its counters initially, and the
only information the finite state machine has available, other than its current state, is
three bits indicating whether each of its three counters is currently equal to 0 or not.
Nonetheless, we can still make sense of doing computation on integers, for example,
deciding whether an integer is prime, or replacing an initial n in a counter with the value
f (n) = n^2 or some other computable function of n. And surprisingly, any function on
integers that a TM can compute can also be computed by a three-counter machine.
There are, however, ways to reduce the capability of a Turing machine sufficiently that
it actually changes the computational power:
• one stack: this is equivalent to a deterministic pushdown automaton, which can decide
only context-free languages (and not even all of those)
• two counters: there is a certain clever way that inputs can be encoded so that Turing-
powerful computation can be done with only two counters; nonetheless, these machines
are less powerful in the strict sense than three-counter machines. For example, although
a three-counter machine can start with a natural number n in its first counter and halt
with the value 2^n in its first counter (i.e., it can compute the function f (n) = 2^n ), no
two-counter machine can do this.
One counter is even less powerful: a single counter is like a stack with a unary alphabet,
so it’s no more powerful than a one-stack machine! A one-counter machine can decide
some nonregular languages such as {0n 1n | n ∈ N}, but not something like {wwR |
w ∈ {0, 1}∗ }, even though a binary stack suffices for the latter.
• no memory: this is just a finite-state machine, which can decide only regular languages!

8.7 TMs versus code


Some of the homework problems involve programming multitape Turing machines. A goal of
these exercises is to instill a sense that, although Turing machines look different from popular
programming languages, they are in fact equally as powerful, if tedious to program.
We take the phrase “Turing machine” to be synonymous with “algorithm”, and from
this point on, we will describe Turing machines in terms of algorithms in actual code, in the
programming language Python. We know that if we can write an algorithm in Python to
recognize or decide a language, then we can write a Turing machine as well. We will only
refer to low-level details of Turing machines when it is convenient to do so; for instance,
when simulating an algorithm with another model of computation, it is easier to simulate a
Turing machine than a Python program.
A common misconception is that computer programmers used to program Turing ma-
chines before better programming languages came along. This is a misunderstanding of why

we study Turing machines. No one ever programmed Turing machines except as a math-
ematical exercise; they are a model of computation whose simplicity makes them easy to
handle mathematically (and whose definition is intended to model a mathematician sitting
at a desk with paper and a pencil), though this same simplicity makes them difficult to pro-
gram. We generally use Turing machines when we want to prove limitations on algorithms.
When we want to design algorithms, there is rarely a reason to use Turing machines instead
of pseudocode or a regular programming language.

Structured data and flat strings. It is common to write algorithms in terms of the data
structures they are operating on, even though these data structures must be encoded in
binary before delivering them to a computer (or in some alphabet Σ before delivering them
to a TM). Given any “discrete” object O, such as a string, graph, tuple, (or even a Turing
machine itself), we use the notation hOi to denote the encoding of O as a string in the
input alphabet of the Turing machine we are using. To encode multiple objects, we use the
notation hO1 , O2 , . . . , Ok i to denote the encoding of the objects O1 through Ok as a single
string.
For example, consider the graph G with nodes {1, 2, 3, 4} and undirected edges {1, 2}, {1, 3},
{2, 3}, {3, 4}. We can use its adjacency matrix

0 1 1 0
1 0 1 0
1 1 0 1
0 0 1 0

encoded as a binary string by concatenating the rows: ⟨G⟩ = 0110101011010010.
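A quick Python sketch of this encoding (the function name encode_graph is ours, for illustration):

def encode_graph(n, edges):
    """Encode an undirected graph on nodes 1..n as a binary string by
    concatenating the rows of its adjacency matrix."""
    bits = []
    for i in range(1, n + 1):
        for j in range(1, n + 1):
            bits.append("1" if (i, j) in edges or (j, i) in edges else "0")
    return "".join(bits)

edges = {(1, 2), (1, 3), (2, 3), (3, 4)}
print(encode_graph(4, edges))  # 0110101011010010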

8.8 Optional: The Church-Turing Thesis


• 1928 - David Hilbert puts forth the Entscheidungsproblem, asking for an algorithm
that, given a mathematical theorem (stated in some formal language, with formal
rules of deduction and a set of axioms, such as Peano arithmetic or ZFC), will indicate
whether it is true or false.

• 1931 - Kurt Gödel proves the incompleteness theorem: for logical systems such as
Peano arithmetic or ZFC, there are theorems which are true but cannot be proven in
the system.6 This leaves open the possibility of an algorithm that decides whether
the statement is true or false, even though the correctness of the algorithm cannot be
proven.7

• At this point in history, it remains the case that no one in the world knows exactly
what they mean by the word “algorithm”, or “computable function”.
6 Essentially, any sufficiently powerful logical system can express the statement, “This statement is unprovable.”,
which is either true, hence exhibiting a statement whose truth cannot be proven in the system, or false, meaning the
false theorem can be proved, and the system is contradictory.
7 Lest that would provide a proof of the truth or falsehood of the statement.

• 1936 - Alonzo Church proposes λ-calculus (the basis of modern functional languages
such as LISP and Haskell) as a candidate for the class of computable functions. He
shows that it can compute a large variety of known computable functions, but his
arguments are questionable and researchers are not convinced.8
• 1936 - Alan Turing, as a first-year graduate student at the University of Cambridge
in England, hears of the Entscheidungsproblem while taking a graduate class. He submits a
paper, “On computable numbers, with an application to the Entscheidungsproblem”,
to the London Mathematical Society, describing the Turing machine (he called them
a-machines) as a model of computation that captures all the computable functions
and formally defines what an algorithm is. He also shows that as a consequence of
various undecidability results concerning Turing machines, there is no solution to the
Entscheidungsproblem; no algorithm can indicate whether a given theorem is true or
false, in sufficiently powerful logical systems.
• Before the paper is accepted, Church’s paper reaches Turing from across the Atlantic.
Before final publication, Turing adds an appendix proving that Turing machines com-
pute exactly the same class of functions as λ-calculus.
• Turing’s paper is accepted, and researchers in the field – Church included – were imme-
diately convinced by Turing’s physical arguments: he essentially argued that his model
was powerful enough to capture anything any person could do as they are sitting at a
desk with a pencil and paper, calculating according to fixed instructions.

The Church-Turing Thesis. All functions that can be computed in a finite amount of time
by a physical machine in the universe, can be computed by a Turing machine.
The statement known as the Church-Turing thesis is not a mathematical statement that
can be formalized in the same way as a theorem. It is a physical law, much like the Laws of
Thermodynamics or Maxwell’s equations. It is something observed to hold in practice, but
in principle it is refutable. In practice, however, attempts to refute it (a crank field known
as “hypercomputation”) tend to fail, in the same way that attempts to build a perpetual
motion machine (violating the second law of thermodynamics) inevitably fail.

8 For instance, Emil Post accused Church of attempting to “mask this identification [of computable functions]
under a definition.” (Emil L. Post, Finite combinatory processes, Formulation I, The Journal of Symbolic Logic, vol.
1 (1936), pp. 103–105, reprinted in [27], pp. 289–303.)
Chapter 9

Efficient solution of problems: The class P

Computability focuses on which problems are computationally solvable in principle. Com-


putational complexity focuses on which problems are solvable in practice.

9.1 Asymptotic analysis


Our first concept involves a question: how do we measure the running time of an algorithm?
Faster algorithms are better, we can agree, but what does it mean to say one algorithm is
faster than another?

9.1.1 Defining running time


“Clock-on-the-wall” time If I run algorithms A and B on the same input, and A gives
an answer in 3 seconds, but B gives an answer in 4 seconds, does that mean A is better?
What if I run them again, and this time A takes 5 seconds instead? Perhaps the times are
different because other processes were running on my computer the second time, making A
appear slower. What if I run A on an older, slower computer and A now takes 30 seconds?
This way of measuring algorithm efficiency, informally known as “clock-on-the-wall” time,
is inherently flawed, because an execution of an algorithm may take different amounts of
real time depending on external environmental factors, such as other programs running on
the same computer, or running the algorithm on a different computer. However, we want to
understand whether the algorithm itself is faster than another algorithm. What we really
want to know is: if we run two algorithms in the exact same environment, which will be
faster?

Standardized time using TMs To do this we need a “standard” environment for com-
parison. This is one elegant usage of the model of Turing machines: they give a simple
and clean mathematical way to measure the “number of steps” needed for an algorithm,


uncorrupted by complicating environmental factors such as processor clock frequencies or


competing operating system processes.
Definition 9.1.1. Let M be a TM, and x ∈ {0, 1}∗ . Define timeM (x) to be the number of
configurations M visits on input x (so a TM that immediately halts takes 1 step, not 0).
This resolves one issue, that of standardizing the definition of “time”. However, note
that timeM (x) has two variables in it: the Turing machine M and the input x. We said
we want to measure the running time of M , but instead we have measured the running
time of M on a particular input x. Suppose we have two Turing machines A and B solving
some problem, and we want to determine which is faster. What if for two inputs x and y,
timeA (x) < timeB (x) but timeA (y) > timeB (y), i.e., A is faster than B on x but slower than
B on y? Which algorithm is “better”?
The way we resolve this issue (and it is not the only way to resolve it, but it is a standard
simplifying approach) is to ask not merely how fast an algorithm runs on one input, but to
ask how quickly its running time grows as we increase the size of the inputs.
There are two important points we’ve just made. First, we consider only the size of the
input and not the exact symbols of the input. For each possible input length n, we consider
the worst-case running time over all inputs of length n. Second, rather than having a single
number called the “running time”, we will have a function t : N → N of n. Given any
possible input length n, t(n) is the most time the algorithm takes on any input of length n.
Definition 9.1.2. If M is total, define the (worst-case) running time (or time complexity)
of M to be the function t : N → N defined for all n ∈ N by

t(n) = max_{x∈{0,1}^n} timeM (x).

We call such a function t a time bound.
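Definition 9.1.2 can be spelled out in a few lines of Python (illustration only; the step-counting function time_of is a hypothetical stand-in for timeM, not a real machine):

import itertools

def worst_case_time(time_of, n):
    """t(n) = max over x in {0,1}^n of time_M(x), by brute force over
    all 2^n inputs (feasible only for small n)."""
    return max(time_of("".join(bits))
               for bits in itertools.product("01", repeat=n))

# Hypothetical step count: a machine taking one step per symbol,
# one extra step per 1 read, plus one step to halt.
time_of = lambda x: len(x) + x.count("1") + 1
print([worst_case_time(time_of, n) for n in range(5)])  # [1, 3, 5, 7, 9]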


So we can assign to each total Turing machine M a unique function t called its running
time. But now we have a new problem. We know how to compare two numbers to say which
is larger. But how to compare two functions? What if t1 (n) > t2 (n) but t1 (n+1) < t2 (n+1)?

9.1.2 Asymptotic Analysis.


To resolve this problem, we use a tool called asymptotic analysis. This lets us compare two
running time functions and say whether the growth rate of one is larger than the growth rate
of the other, or whether they have the same growth rate.1
The main way we compare such functions involves ignoring multiplicative constants. This
will have two effects:
1. some functions will be equivalent but not equal (different functions can have the same
growth rate), and
1 A word of warning: the word “rate” here is a metaphor. There’s no number that is the rate of growth.

2. it will greatly simplify analysis of algorithms.

By giving ourselves permission to ignore constants, we will be able to quickly glance at an


algorithm and determine its growth rate, without worrying about the precise number of steps
it will take.

Definition 9.1.3. Given nondecreasing f, g : N → N+ , we write f = O(g) (or f (n) =


O(g(n))), if there exists c ∈ N such that, for all n ∈ N, f (n) ≤ c · g(n). We say g is an
asymptotic upper bound for f .2

Bounds of the form n^c for some constant c are called polynomial bounds (n, n^2, n^3, n^2.2,
n^1000, etc.). So when we say a function f is polynomially bounded, this means that there
exists a c such that f = O(n^c).
Bounds of the form 2^(n^δ) for some real constant δ > 0 are called exponential bounds
(2^n, 2^(2n), 2^(100n), 2^(0.01n), 2^(n^2), 2^(n^100), 2^(√n), etc.).
So-called “big-O” notation, saying that f = O(g), is analogous to saying that f is “at
most” (or “grows no faster than”) g. “Little-O” notation is a way to capture that f is
“strictly smaller” (or “grows slower than”) g.

Definition 9.1.4. Given nondecreasing f, g : N → R+ , we write f = o(g) (or f (n) =
o(g(n))), if lim_{n→∞} f (n)/g(n) = 0.
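A quick numerical illustration (a sanity check, not a proof) of this definition: the ratio f (n)/g(n) for f (n) = n log2 n and g(n) = n^2 shrinks toward 0 as n grows, consistent with n log n = o(n^2):

import math

for n in (10, 100, 1000, 10**6):
    print(n, (n * math.log2(n)) / n**2)
# ratios: 0.332..., 0.0664..., 0.00997..., 0.0000199...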

Difference between f = o(g) and g ≠ O(f ) when f ≤ g. It would be convenient if the
definitions of O() and o() matched our intuition that for two numbers a, b, a < b if and
only if a ≤ b and b ≰ a. For most common functions we deal with, functions that actually
describe the running time of an algorithm, the analogous statement holds: f = o(g) if and
only if f = O(g) and g ≠ O(f ).
However, certain strange functions obey f = O(g) and g ≠ O(f ), but not lim_{n→∞} f (n)/g(n) = 0,
if the ratio f (n)/g(n) oscillates between a positive value and values arbitrarily close to 0. One
example (due to Xianzhe Ma, a student who took this class) is shown in Fig. 9.1.

9.1.3 Rules of thumb for comparing growth rates

The formal definitions above are “official”, but some shortcuts are handy to remember, and
in practice, you will use these shortcuts far more often than you will need to resort to using
the above definitions directly.
2 Often one sees the slightly more complex definition that f = O(g) if there exist c, n0 ∈ N such that, for all
n ≥ n0 , f (n) ≤ c · g(n) (i.e., we require that c · g(n) exceed f only on sufficiently large n, instead of all n). Why are
these equivalent? Hint: f and g are both positive.

Figure 9.1: Example functions f and g obeying f = O(g) and g ≠ O(f ), but not f = o(g). Define
g(n) = n, and f (n) = c! when c! < n ≤ (c + 1)!, where c ∈ N+ . In other words, g’s graph is the
line of slope 1, and f is constant between any two inputs of the form c! (1, 2, 6, 24, 120, 720, . . .),
jumping up to the g line just after passing each factorial. Since f (n) ≤ g(n) for all n, f = O(g).
Also g ≠ O(f ), because for any constant c ∈ N+ , when n = (c + 1)! (e.g., n = 24 or n = 120),
then g(n) = (c + 1)! = (c + 1) · c! > c · c! = c · f (n). However, lim_{n→∞} f (n)/g(n) ≠ 0. The limit
does not exist, since f (n)/g(n) oscillates between values arbitrarily close to 1 (when n = c! + 1 for
some c ∈ N+ ) and values arbitrarily close to 0 (when n = (c + 1)! for some c ∈ N+ ).

Write one function as the other times something unbounded. One way to see that
f (n) = o(g(n)) is to find h(n) so that f (n)·h(n) = g(n) for some h(n) that grows unboundedly.
For example, suppose we want to compare n and n log n: letting f (n) = n, g(n) = n log n,
and h(n) = log n, since f (n)·h(n) = g(n), and h(n) grows unboundedly, then f (n) = o(g(n)),
i.e., n = o(n log n).
Of course, this is simply a different way of saying that lim_{n→∞} f (n)/g(n) = 0, since
f (n)/g(n) = f (n)/(h(n) · f (n)) = 1/h(n) → 0 if h(n) → ∞.

Remove constants and lower-order terms. Another useful shortcut is to ignore all but the
largest terms, and to remove all constants. For example, 10n^7 + 100n^4 + n^2 + 10n = O(n^7), and
2^n + n^100 + 2n = O(2^n). This is useful when analyzing an algorithm, where you are counting
steps from many portions of the algorithm, but only the “slowest part” (contributing the
most number of steps) really makes a difference in the final analysis.

Taking logs and square roots make growth rates smaller. Taking the log of anything,
or taking the square root of anything (or more generally, raising it to a power smaller
than 1) makes it strictly smaller, and raising it to a power larger than 1 makes it bigger. So
log n = o(n), and √n = o(n). Also, log n^4 = o(n^4) (in fact, log n^4 = 4 log n = O(log n)).

Memorize these. The following are used often enough that they are worth memorizing:

• 1 = o(log n),

• log n = o(√n) (or more generally log n = o(n^α) for any α > 0, e.g., α = 1/100),

• √n = o(n),

• n^c = o(n^k) if c < k. For example, n = o(n^2), n^2 = o(n^3), and n^3 = o(n^3.01). √n = o(n)
is the special case for c = 0.5, k = 1.

• n^k = o(2^(n^δ)) for any k > 0 and any δ > 0 (even if k is huge and δ is tiny, e.g.,
n^100 = o(2^(n^0.001))). Also, n^k = o(2^(δn)) for any k, δ > 0.

Take logs of two functions to see if they have different growth rates. If log f (n) =
o(log g(n)), then f (n) = o(g(n)). This is useful for simplifying expressions. For example,
how do we compare 2^(n^2) and n^n? One has a larger exponent, and the other a larger base, so it’s
not obvious which grows faster. But taking the log of both, we get n^2 and log n^n = n log n.
Since n log n = o(n^2), we know immediately that n^n = o(2^(n^2)). However, this doesn’t work
in the other direction: just because log f = Θ(log g) doesn’t imply that f = Θ(g). For example,
n = Θ(2n), but 2^n = o(2^(2n)), since 2^(2n) = (2^n)^2.

9.1.4 Other asymptotic notations


f = O(g) is like saying f “≤” g. f = o(g) is like saying f “<” g.
Based on this analogy, we introduce notations that are analogous to the three other
comparison operators ≥, >, and =:
• f = Ω(g) if and only if g = O(f ) (like saying f “≥” g),
• f = ω(g) if and only if g = o(f ) (like saying f “>” g), and
• f = Θ(g) if and only if f = O(g) and f = Ω(g). (like saying f “=” g; note it is not
the same as saying they are actually the same function! For example, n^2 = Θ(3n^2) but
n^2 ≠ 3n^2; it’s saying they grow at the same asymptotic rate.)3

end of lecture 6c

9.2 Time complexity classes and the Time Hierarchy Theorem


Definition 9.2.1. Let t : N → N be a time bound.4 Define TIME(t) ⊆ P({0, 1}∗ ) to be the
set of decision problems
TIME(t) = {L ⊆ {0, 1}∗ | there is a TM with running time O(t) that decides L}.
Observation 9.2.2. For all time bounds t1 , t2 : N → N such that t1 = O(t2 ), TIME(t1 ) ⊆
TIME(t2 ).
The following result is the basis for all complexity theory. We will not prove it in this
course (take 220 for that), but it is important because it tells us that complexity classes are
worth studying. We aren’t just wasting time making up different names for the same class
of languages. Informally, it says that given more time, one can decide more languages.
3 Obviously, f = Θ(g) implies that f = O(g). But it is very common to see people write f = O(g) when they
really mean f = Θ(g). They are not equivalent, however: n = O(n^2) but n ≠ Θ(n^2).
4 We will prove many time complexity results that actually do not hold for all functions t : N → N. Many results
hold only for time bounds that are what is known as time-constructible, which means, briefly, that t : N → N has
the property that the related function ft : {0}∗ → {0}∗ defined by ft(0^n) = 0^t(n) is computable in time t(n). This is
equivalent to requiring that there is a TM that, on input 0^n, halts in exactly t(n) steps. The reason is to require that
for every time bound used, there is a TM that can be used as a “clock” to time the number of steps that another
TM is using, and to halt that TM if it exceeds the time bound. If the time bound itself is very difficult to compute,
then this cannot be done.
All “natural” time bounds we will study, such as n, n log n, n^2, 2^n, etc., are time-constructible. Time-
constructibility is an advanced (and boring) issue that we will not dwell on, but it is worth noting that it is possible to
define unnatural time bounds relative to which unnatural theorems can be proven, such as a time bound t such that
TIME(t(n)) = TIME(2^(2^(t(n)))). This is an example of what I like to call a “false theorem”: it is true, of course, but
its truth tells us that our model of reality is incorrect and should be adjusted. Non-time-constructible time bounds
do not model anything found in reality.

Theorem 9.2.3 (Time Hierarchy Theorem). Let t1 , t2 : N → N be time bounds such that
t1 (n) log t1 (n) = o(t2 (n)). Then TIME(t1 ) ⊊ TIME(t2 ).

For instance, there is a language decidable in time n^2 that is not decidable in time n, and
another language decidable in time 2^n that is not decidable in time n^2, etc.

9.3 Optional: Time complexity of simulation of multitape TM with a one-tape TM
Recall the single-tape TM from Section 8.1, deciding whether its input is a palindrome.
What is its running time? Recall that it moves the tape head from the left side of the string
to the right side, repeatedly, although it replaces the leftmost and rightmost symbols with an x and
only moves over the bits that have not been replaced. So each bit is visited twice each time
the tape head makes a pass. Once a bit is replaced with x, that tape cell is only visited
once more. So the i’th bit in the left half is visited 2i + 1 times. The time spent in
the left half is ∑_{i=0}^{n/2} (2i + 1) = n/2 + 1 + 2 · ∑_{i=0}^{n/2} i = n/2 + 1 + 2 · (1/2) · (n/2) · (n/2 + 1) = O(n^2).
By symmetry, the time spent in the right half of the input is the same, so the whole running
time is O(n^2).
Observe that a 2-tape TM can decide the palindrome language in O(n) time: it just needs
to copy the input to the second tape in n steps, put tape head 1 on the left and tape head
2 on the right in n steps, and then move them in opposite directions for n steps, comparing
the symbols they read. Recall also that we showed every multitape TM can be simulated by
a single-tape TM. Why does this not prove that single-tape TMs can decide the palindrome
language in O(n) time? It’s because that simulation incurs a quadratic slowdown, as the
next theorem shows.

Theorem 9.3.1. Let t : N → N, where t(n) ≥ n. Then every t(n)-time multitape TM can
be simulated in time O(t(n)^2) by a single-tape TM.

Proof. Recall the proof of Theorem 8.6.1. M ’s tape heads move right by at most t(n)
positions, so S’s tape contents have length at most t(n). Simulating one step of M requires
moving S’s tape head across this length and back, which is O(t(n)) time. Since M takes t(n) steps
total, and S takes O(t(n)) steps to simulate each step of M , S takes O(t(n)^2) total steps.

Note that the single-tape TM S is a little slower: it takes O(t(n)^2) steps to simulate t(n) steps
of M . Could this be improved to O(t(n))? It turns out the answer is no: for example, the
palindrome language {w | w = wR } can be solved in time O(n) on a two-tape TM, but it
provably requires Ω(n^2) time on a one-tape TM.
The lesson is that although all reasonable models of computation have running time
within a polynomial factor of each other, they sometimes are not within a constant factor of
each other.

9.4 Definition of P
9.4.1 The complexity class P.
Definition 9.4.1. Let P = ⋃_{k=1}^∞ TIME(n^k). In other words, P is the class of languages
decidable in polynomial time on a deterministic, one-tape TM. But, for reasons we outline
below, the easiest way to think of P is as the class of languages decidable using only a
polynomial number of steps in your favorite programming language.

9.4.2 “Reasonable” encodings.


We must be careful to use reasonable encodings with the encoding function ⟨·⟩ : D → {0, 1}∗ ,
which maps elements of a discrete set D (such as N, Σ∗ , or the set of all Java programs) to
binary strings. For instance, for n ∈ N, two possibilities are ⟨n⟩1 = 1^n (the unary expansion
of n) and ⟨n⟩2 = the binary expansion of n. |⟨n⟩1 | = Ω(2^|⟨n⟩2 |), so ⟨n⟩1 is a bad choice. Even
doing simple arithmetic would take time exponential in the length of the binary expansion,
if we choose the wrong encoding. Alternately, exponential-time algorithms might appear
mistakenly to be polynomial time, since the running time is a function of the input size,
and exponentially expanding the input lowers the running time artificially, even though the
algorithm still takes the same (very large) number of steps. Hence, an analysis showing the
very slow algorithm to be technically polynomial would be misleading.
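The exponential gap between the two encodings is easy to see empirically:

# |<n>_1| = n grows exponentially in |<n>_2|, which has length about log2(n).
for n in (5, 50, 500):
    unary = "1" * n
    binary = bin(n)[2:]
    print(n, len(unary), len(binary))
# 5 5 3 / 50 50 6 / 500 500 9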
As another example, a reasonable encoding of a directed graph G = (V, E) with V =
{0, 1, . . . , n − 1}, is via its adjacency matrix, where for i, j ∈ {0, . . . , n − 1}, the (n · i + j)’th bit of
⟨G⟩ is 1 if and only if (i, j) ∈ E. For an algorithms course, we would care about the difference
between this encoding and an adjacency list, since sparse graphs (those with |E| ≪ |V |^2)
are more efficiently encoded by an adjacency list than an adjacency matrix. But since these
two representations differ by at most a polynomial factor, an algorithm could convert one to the
other in polynomial time. Thus, any polynomial-time algorithm using one representation
could be converted to a polynomial-time algorithm using the other representation.

9.4.3 Input size.


To be very technical and precise, the “input size” is the number of bits needed to encode an
input. For a graph G = (V, E), this means |⟨G⟩|, where ⟨G⟩ ∈ {0, 1}∗ . For an adjacency
matrix encoding, for example, this would be exactly n^2 if n = |V |. However, for many data
structures there is another concept meant by “size”. For example, the “size” of G could refer
to the number of nodes |V |.
The “size” of a list of strings could mean the number of strings in it, which would be
smaller than the number of bits needed to write the whole list down. For example, the
list (010, 1001, 11) has 3 elements, but they have 9 bits in total. Furthermore, there’s no
obvious way to encode the list with only 9 bits; simply concatenating them to 010100111
could encode other lists, such as (0101001, 11) or (0, 1, 0, 1, 0, 0, 1, 1, 1). So even more bits are
needed to encode them in a way that they can be delimited; for instance, by “bit-doubling”:

The list above can be encoded as 001100 01 11000011 01 1111, where bits 0 and 1 of the
original strings are represented as 00 and 11, respectively, and 01 is a delimiter to mark the
boundary between strings. This increases the length of the string by at most a factor of 4 (the
worst case is a list of single-bit strings).
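Here is a minimal Python sketch of this encoding and its (unambiguous) decoding; the function names are ours, for illustration:

def encode_list(strings):
    """Bit-double each string (0 -> 00, 1 -> 11), join with delimiter 01."""
    return "01".join("".join(b + b for b in s) for s in strings)

def decode_list(enc):
    """Read aligned pairs: 00 -> 0, 11 -> 1, 01 -> string boundary."""
    strings, cur = [], []
    for i in range(0, len(enc), 2):
        pair = enc[i:i+2]
        if pair == "01":
            strings.append("".join(cur)); cur = []
        else:
            cur.append(pair[0])
    strings.append("".join(cur))
    return strings

enc = encode_list(["010", "1001", "11"])
print(enc)                # 0011000111000011011111
print(decode_list(enc))   # ['010', '1001', '11']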
This frees us to talk about the “size” of an input without worrying too much about
whether we mean the number of bits in an encoding of the input, or whether we mean some
more intuitive notion of size, such as the number of nodes or edges (or both) in a graph, the
number of strings in a list of strings, etc.

9.4.4 P is the same for most encodings and programming languages.


In an algorithms course, we would care very much about the differences between these
representations, because they could make a larger difference in the analysis of the running
time. But in this course, we observe that all of these differences are within a polynomial
factor of each other.5 Therefore, if we can find an algorithm that is polynomial time under
one encoding, it will be polynomial time under the others as well.
We use Python to describe algorithms, knowing that the running time will be polynomi-
ally close to the running time on a single-tape TM. That is, although the definition of, for
example, TIME(n2 ) can depend on the choice of programming language or how exactly data
structures are encoded, for most reasonable choices, the class P is unchanged.
So to summarize, we choose to study the class P for two main reasons:
1. Historically, when an “efficient” algorithm is discovered for a problem, it tends to be
polynomial time.
2. By ignoring polynomial differences, we are able to cast aside detailed differences be-
tween models of computation and particular encodings. This gives us a general theory
that applies to all computers and programming languages.

9.5 Examples of problems in P


Analyzing asymptotic running time of algorithms. A major benefit of big-O notation is
to make it easier to analyze the running time of algorithms. Rather than counting the exact
number of steps, which would be tedious, we can “throw away constants”. It is often much
easier to quickly glance at an algorithm and get a rough estimate of its running time in big-O
notation, compared to the much more arduous task of attempting to compute the precise
number of steps the algorithm takes on a certain input. Furthermore, simply identifying
that an algorithm is polynomial-time, without worrying about the exact exponent, allows
an even easier analysis.
5 To say that two time bounds t and s are “within a polynomial factor” of each other means that (assuming s is
the smaller of the two) t = O(s), or t = O(s^2), or t = O(s^3), etc. For example, all polynomial time bounds are within
a polynomial factor of each other. Letting s(n) = 2^n and t(n) = 2^(3n), t and s are within a polynomial factor because
t(n) = 2^(3n) = (2^n)^3 = O(s^3).

We will analyze the time complexity of algorithms at a high level (rather than at the level
of individual transitions of a TM), assuming, as in ECS 36C and ECS 122A, that individual
lines of code in C++, Python, or pseudocode take constant time, so long as those lines are
not masking loops or recursion or something that would not take constant time (such as a
Python list comprehension, which is shorthand for a loop).
The code below can also be found on this GitHub page:
https://github.com/dave-doty/UC-Davis-ECS120/blob/master/120.ipynb

Directed versus undirected graphs. The simplest way to write code is to assume that all
graphs are directed. In this case, an undirected graph G = (V, E) is the special case of a
directed graph obeying the constraint that (u, v) ∈ E ⇐⇒ (v, u) ∈ E for all u, v ∈ V. The
Python notebook provides some utility methods for converting between this and the representation
of an undirected edge as a two-element set. Also, the function add_reverse_edges
takes a directed graph and adds the reverse edge of every directed edge, if not already present.
This is used in many examples to make it easier to specify an undirected graph.

9.5.1 Paths in graphs


Define
Path = { ⟨G, s, t⟩ | G is a directed graph with a path from node s to node t } .
This is also known as the directed reachability problem. (And the reachability problem is the
same one, but for undirected graphs.)
Theorem 9.5.1. Path ∈ P.
Proof. Breadth-first search works.
High-level description of algorithm. (See Python code below.) It starts the search
at node s, and iterates over the other nodes in the following order: first it visits all nodes
distance 1 from s (neighbors), then it visits all nodes distance 2 from s (neighbors of neighbors
of s), then it visits all nodes distance 3 from s (neighbors of neighbors of neighbors of s),
etc. It works by starting a first-in, first-out queue with only s, and while nodes remain in
the queue, pop a node from the queue, and if it hasn’t been visited yet, mark it as visited
(add it to the list of visited nodes), and add all of its unvisited neighbors to the queue. Each
visited node is checked to see if it is t, and if so, return True. If we exhaust all nodes
reachable from s without finding t, return False.
Running time analysis: Assume we have n nodes and m = O(n^2) edges. When a node
comes out of the queue, we add it to visited if it is not already in there. Thus the Boolean
condition at line 13 is True at most once per node in V . It takes time O(n) to check whether
a node is in the list visited. It takes time O(m) for the list comprehension at line (15) to
find all the neighbors of a given node. For each iteration of the outer loop, the inner loop
(line (16)) executes at most n times. Since this inner loop is executed at most once per node,
at most n^2 nodes are ever put in the queue.

1  import collections
2  def path(G, s, t):
3      """G: graph
4      s, t: nodes in G
5      Check if there is a path from node s to node t in graph G."""
6      V, E = G
7      visited = []
8      queue = collections.deque([s])
9      while queue:
10         node = queue.popleft()
11         if node == t:
12             return True
13         if node not in visited:
14             visited.append(node)
15             node_neighbors = [v for (u, v) in E if u == node]
16             for neighbor in node_neighbors:
17                 if neighbor not in visited:
18                     queue.append(neighbor)
19     return False

So the outer loop (line (9)) executes O(n^2) iterations. Putting this all together, it takes
time O(n^2 · (n + m + n · n)) = O(n^4), which is polynomial time.
We can test this out on a few graphs, shown in Fig. 9.2 and in the code samples below.

[Figure 9.2 diagram appears here.]

Figure 9.2: Directed (left) and undirected (right) versions of a graph. In the directed graph, there’s
a path from 1 to 4 but not from 4 to 1. In the undirected graph, there are paths in both directions.
In both graphs, there is no path between any of {1, 2, 3, 4, 5} and any of {6, 7, 8}.

V = [1, 2, 3, 4, 5, 6, 7, 8]
E = [(1,2), (2,3), (3,4), (4,5), (5,2), (6,7), (7,8), (8,6)]

G = (V, E)
print(f"path from 4 to 1? {path(G, 4, 1)}")
print(f"path from 1 to 4? {path(G, 1, 4)}")
Now let’s try it on an undirected graph (we make the graph undirected by adding a
reverse edge for each existing edge):
def add_reverse_edges(G):
    """Makes directed graph undirected by adding reverse edges."""
    V, E = G
    reverse_edges = []
    for (u, v) in E:
        if (v, u) not in E and (v, u) not in reverse_edges:
            reverse_edges.append((v, u))
    return (V, E + reverse_edges)

G = add_reverse_edges(G)
print(f"path from 4 to 1? {path(G, 4, 1)}")
print(f"path from 1 to 4? {path(G, 1, 4)}")
print(f"path from 4 to 7? {path(G, 4, 7)}")
There are an exponential number of simple paths from s to t in the worst case ((n − 2)!
paths in the complete directed graph with n vertices), but we do not examine them all
in a breadth-first search. The BFS takes a shortcut to zero in on one particular path in
polynomial time.
Of course, we know from studying algorithms that with an appropriate graph representa-
tion, such as an adjacency list, and with a more careful analysis that doesn’t make so many
worst-case assumptions, this time can be reduced to O(n + m). But in computational com-
plexity theory, since we have decided in advance to consider any polynomial running time to
be “efficient”, we can often be quite lazy in our analysis and choice of data structures and
still obtain a polynomial running time.

Time lower bounds. Typically we study upper bounds on running time. The running time
upper bound of O(n^4) leaves open the possibility that path could take time n^3 or n^2, since
these are both O(n^4). But sometimes we want to know how tight the analysis is, meaning
we want a lower bound on the running time of the algorithm.
If the running time is the maximum number of steps over all inputs of length n, then
showing an upper bound of u(n) means showing that all inputs of length n take time O(u(n)).
But since it is defined as a maximum, showing a lower bound of ℓ(n) requires only showing
that, for each input length n, there exists an input of length n requiring Ω(ℓ(n)) time.
By analogy, if I want to prove that the maximum number in a set A is at most u, then
I need to show that all numbers x ∈ A obey x ≤ u. However, if I want to prove that the
maximum number in a set A is at least ℓ, then I need only show that some number x ∈ A
obeys x ≥ ℓ. x may not be the maximum itself, but if not, then the maximum m is even
larger, so m ≥ ℓ as well.
Let's do such an analysis on the path algorithm. In this case, we want to show that for
each n, there is a graph G = (V, E) with n = |V | requiring time Ω(n). This won't tell us
the exact running time, but we now know it is some function between n and n^4 (perhaps n^2,
n^2 log n, n, or n^4).
One nice thing about showing a lower bound on an algorithm’s running time is that
we can ignore certain steps. Since ignoring them means we are under-counting the actual
number of steps, the lower bound still holds. We will simply exhibit a graph that requires
every node other than t to be visited before t will be visited. We don’t have to calculate the
exact time required to visit a node to know that it is at least one step. So if we need to visit
every other node before visiting t, then on that graph, path takes at least n steps, since it
9.5. EXAMPLES OF PROBLEMS IN P 111

visits all n nodes before finding t.


The graph that will make this happen is the line graph G described by s–u1–u2–. . .–u_{n−2}–t.
s and t are connected, so the algorithm will run until t is visited. But a breadth-first search
(or a depth-first search) will visit every other node before it visits t, thus requiring at least
n steps. This shows the running time of path is Ω(n).
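To make the lower-bound argument concrete, here is a small sketch that builds such a line graph and runs path on it; the helper make_line_graph is our own illustration, reusing path and add_reverse_edges from above.

def make_line_graph(n):
    """Build the line graph s - u1 - u2 - ... - u_{n-2} - t on n nodes,
    a worst case forcing BFS to visit every node before reaching t."""
    V = ["s"] + [f"u{i}" for i in range(1, n - 1)] + ["t"]
    E = [(V[i], V[i + 1]) for i in range(n - 1)]
    return add_reverse_edges((V, E))

G = make_line_graph(8)
print(path(G, "s", "t"))  # True, but only after all 8 nodes are visited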
It is worth mentioning one other twist: this shows a lower bound of Ω(n) time for this
particular algorithm. It does not imply that every algorithm solving the Path problem
requires time Ω(n). (That is true, but requires more intricate reasoning.) Showing a lower
bound on time for one algorithm doesn’t imply that there is not a more efficient algorithm
somewhere out there.

end of lecture 7a

9.5.2 Relatively prime integers


Given x, y ∈ N+ , we say x and y are relatively prime if the largest integer dividing both of
them is 1. Define

RelPrime = { ⟨x, y⟩ | x and y are relatively prime } .

Theorem 9.5.2. RelPrime ∈ P.

Proof. Since the input size is |⟨x, y⟩| = O(log x + log y) (the number of bits needed to
represent x and y), we must be careful to use an algorithm that is polynomial in n = |⟨x, y⟩|,
not polynomial in x and y themselves, which would be exponential in the input size.
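For contrast, the following naive sketch (our own illustration, not from the notes) checks every candidate divisor up to min{x, y}; it runs in time polynomial in x and y, but exponential in the input length |⟨x, y⟩|.

def rel_prime_naive(x, y):
    """Check all candidate common divisors 2, ..., min(x, y).
    The loop runs about min(x, y) times, which is exponential
    in the number of bits needed to write down x and y."""
    for d in range(2, min(x, y) + 1):
        if x % d == 0 and y % d == 0:
            return False
    return True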
Euclid’s algorithm for finding the greatest common divisor of two integers works.
def gcd(x, y):
    """x, y: positive integers
    Euclid's algorithm for greatest common divisor."""
    while y > 0:
        x = x % y
        x, y = y, x
    return x

def rel_prime(x, y):
    return gcd(x, y) == 1

print(gcd(24, 60), rel_prime(24, 60))  # gcd(24,60) = 12
print(gcd(25, 63), rel_prime(25, 63))  # gcd(25,63) = 1
Each loop iteration cuts the value of x at least in half, because x mod y < x/2: if y ≤ x/2, then
x mod y < y ≤ x/2; otherwise y > x/2, and then x mod y = x − y < x/2. Therefore, at most
log x < |⟨x, y⟩| iterations of the loop execute, and each iteration requires O(1) arithmetic
operations, each polynomial-time computable, whence rel_prime is polynomial time.
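To see the halving in action, here is a small tracing variant (gcd_trace is our own illustrative helper, not from the notes' notebook):

def gcd_trace(x, y):
    """Like gcd, but print (x, y) at each iteration to watch x shrink."""
    while y > 0:
        print(x, y)
        x, y = y, x % y
    return x

print(gcd_trace(1071, 462))  # prints 1071 462, then 462 147, then 147 21; returns 21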

Note that there are an exponential (in |⟨x, y⟩|) number of integers that could potentially
be common divisors of x and y (namely, all the integers less than min{x, y}), but Euclid's
algorithm does not check all of them to see if they divide x or y; it uses a shortcut to skip
most of them.

9.5.3 Connected graphs


Recall that an undirected graph is connected if every node has a path to every other node.
Define
Connected = { ⟨G⟩ | G is a connected undirected graph } .
Theorem 9.5.3. Connected ∈ P.
Proof. G = (V, E) is connected if and only if, for all s, t ∈ V , ⟨G, s, t⟩ ∈ Path:
from itertools import combinations as subsets

def connected(G):
    """G: graph
    Check if G is connected."""
    V, E = G
    for (s, t) in subsets(V, 2):
        if not path(G, s, t):
            return False
    return True
Let n = |V |. Since Path ∈ P, and connected calls path O(n^2) times, connected takes
polynomial time.
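For example, here is a quick check on the two-component graph from Section 9.5.1, reusing add_reverse_edges from above:

V = [1, 2, 3, 4, 5, 6, 7, 8]
E = [(1,2), (2,3), (3,4), (4,5), (5,2), (6,7), (7,8), (8,6)]
G = add_reverse_edges((V, E))
print(connected(G))  # False: no path between {1,...,5} and {6,7,8}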
TODO: add example of program whose running time depends on length of strings created while it is running

9.5.4 Optional: Eulerian cycles in graphs


Recall that an Eulerian cycle in an undirected graph visits each edge exactly once. Define

EulerianCycle = { ⟨G⟩ | G is an undirected graph with an Eulerian cycle } .

Theorem 9.5.4. EulerianCycle ∈ P.


Proof. Euler showed that an undirected graph has an Eulerian cycle if and only if every node
has even degree, and all of its edges belong to a single connected component.
Figure 9.3 shows some examples of why the latter condition is necessary.
Based on this theorem of Euler, we can check the degree of each node to verify the first
part, and make a new graph of all the nodes with edges, ensuring it is connected, to check
the second part.
[Figure 9.3 diagram appears here.]
Figure 9.3: Graphs with/without Eulerian cycles. In the connected graphs, G1 has even degree
on all nodes, so it has an Eulerian cycle (a, b, c, e, b, f, e, d, a), but G2 has no Eulerian cycle since c
and f have odd degree. In the disconnected graphs, G3 has an Eulerian cycle since one connected
component is G1, and all other components are isolated nodes; but in G4, even though each connected
component has an Eulerian cycle, there is no single cycle through the whole graph.

def degree(node, E):
    """node: node in a graph
    E: set of edges in a graph
    Return degree of node."""
    return sum(1 for (u, v) in E if u == node)

def eulerian_cycle(G):
    """G: graph
    Check if G has an Eulerian cycle."""
    V, E = G
    V_pos = []  # nodes with positive degree
    for u in V:
        deg = degree(u, E)
        if deg % 2 == 1:
            return False
        if deg > 0:
            V_pos.append(u)
    G_pos = (V_pos, E)
    return connected(G_pos)

Let n = |V | and m = |E|. The loop executes n = |V | iterations, each of which executes
degree, which iterates over each of the m edges, taking time O(m). So the loop takes O(nm)
time, which is polynomial in |⟨G⟩|. It also calls connected on a potentially smaller graph
than G, which we showed also takes polynomial time.
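To try it out, here is G1 from Figure 9.3 (a sketch; the edge list below is read off the cycle (a, b, c, e, b, f, e, d, a) given in the caption):

V = ['a', 'b', 'c', 'd', 'e', 'f']
E = [('a','b'), ('b','c'), ('c','e'), ('e','b'),
     ('b','f'), ('f','e'), ('e','d'), ('d','a')]
G1 = add_reverse_edges((V, E))
print(eulerian_cycle(G1))  # True: every node has even degree, and G1 is connected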

9.6 Optional: Why identify P with “efficient”?


• We consider polynomial running times to be “small”, and exponential running times
to be “large”.

• For n = 1000, n^3 = 1 billion, whereas 2^n > the number of atoms in the universe.

• Usually, exponential time algorithms are encountered when a problem is solved by


brute force search (e.g., searching all paths in a graph, looking for a Hamiltonian
cycle). Polynomial time algorithms are due to someone finding a shortcut that allows
the solution to be found without searching the whole space.

• All “reasonable” models of computation are polynomially equivalent: M1 (e.g., a TM)


can simulate a t(n)-time M2 (e.g., a C++ program) in time p(t(n)), for some polynomial
p (usually not much bigger than O(n2 ) or O(n3 )).

• Most “reasonable” encodings of inputs as binary strings are polynomially equivalent.

• In this course, we generally ignore polynomial differences in running time. Polynomial
differences are important, but we simply focus our lens farther out, to see where in
complexity theory the really big differences lie. In ECS 122A, for instance, a difference
of a log n or √n factor is considered more significant. In practice, even a constant
factor can be significant; for example, the fastest known algorithm for matrix multiplication
takes time O(n^2.373), but the O() hides a rather large constant factor. The naïve
O(n^3) algorithm for matrix multiplication, properly implemented to take advantage of
pipelining, vectorized instructions, and modern memory caches, tends to outperform
the asymptotically faster algorithms even on rather large matrices. So this is a lesson:
computational complexity theory should be the first place you check for an efficient
algorithm for a problem, but it should not be the last place you check.

• In ignoring polynomial differences, we can make conclusions that apply to any model
of computation, since they are polynomially equivalent, rather than having to choose
one such model, such as TM’s or C++, and stick with it. Our goal is to understand
computation in general, rather than an individual programming language.

• One objection is that some polynomial running times are not feasible, for instance,
n^1000. In practice, there are few algorithms with such running times. Nearly every
algorithm known to have a polynomial running time has a running time less than n^10.
Also, when the first polynomial-time algorithm for a problem is discovered, such as
the O(n^12)-time algorithm for Primes discovered in 2002,6 it is usually brought down
within a few years to a polynomial running time with a smaller degree, once the initial
insight that gets it down to polynomial inspires further research. Primes is currently
known to have an O(n^6)-time algorithm, and this will likely be improved in the future.
6: Here, n ≈ log p is the size of the input, i.e., the number of bits to represent an integer p to be tested for primality.

1. Although TIME(t) is different for different models of computation, P is the same class
of languages, in any model of computation polynomially equivalent to single-tape TM’s
(which is all of them worth studying, except possibly for quantum computers, whose
status is unknown).
2. P roughly corresponds to the problems feasibly solvable by a deterministic algorithm.7

7: Here, "deterministic" is intended both to emphasize that P does not take into account nondeterminism, which is
an unrealistic model of computation, and also that it does not take into account randomized algorithms, which is a
realistic model of computation. BPP, the class of languages decidable by polynomial-time randomized algorithms,
is actually conjectured to be equal to P, though this has not been proven.
Chapter 10

Efficient verification of solutions: The class NP

10.1 Polynomial-time verifiers


Some problems have polynomial-time deciders. Other problems are not known to have
polynomial-time deciders, but given a candidate solution to the problem (an informal notion
that we will make formal later), the solution's validity can be verified in polynomial time.

10.1.1 Hamiltonian path


A Hamiltonian path in a directed graph G is a path that visits each node exactly once.
Define

HamPath = { ⟨G⟩ | G is a directed graph with a Hamiltonian path } .

HamPath is not known to have a polynomial-time algorithm (and it is generally believed not
to have one), but a related problem, that of verifying whether a given path is a Hamiltonian
path in a given graph, does have a polynomial-time algorithm:

HamPathV = { ⟨G, p⟩ | G is a directed graph with the Hamiltonian path p } .

The algorithm simply verifies that each adjacent pair of nodes in p is connected by an edge
in G (so that p is a valid path in G), and that each node of G appears exactly once in p (so
that p is Hamiltonian).
def ham_path_verify(G, p):
    """G: graph
    p: list of nodes
    Verify that p is a Hamiltonian path in G."""
    V, E = G
    # verify each pair of adjacent nodes in p shares an edge
    for i in range(len(p) - 1):
        if (p[i], p[i+1]) not in E:
            return False
    # verify p and V have same number of nodes
    if len(p) != len(V):
        return False
    # verify each node appears at most once in p
    if len(set(p)) != len(p):
        return False
    return True
Why is this polynomial-time? Let n = |V |. It executes a for loop for |p| − 1 iterations,
where |p| ≤ n, each of which checks a pair of nodes for membership in E, which takes time
O(|E|) = O(n^2). The comparison of |p| to |V | takes O(1) time, and converting p from a list
to a set in the final if statement takes time O(n log n) for common set data structures.
Let’s try out the verifier code on some candidate paths on the graph shown in Fig. 10.1.

[Figure 10.1 diagram appears here.]

Figure 10.1: Directed graph with a Hamiltonian path (1, 2, 4, 3, 5, 6).

V = [1, 2, 3, 4, 5, 6]
E = [(1,2), (2,4), (2,5), (4,3), (3,5), (3,1), (5,6)]
G = (V, E)

p_bad = [1, 2, 3, 4, 5, 6]  # not a path
print(ham_path_verify(G, p_bad))

p_bad2 = [1, 2, 4, 3, 1, 2, 4, 3, 5, 6]  # not simple
print(ham_path_verify(G, p_bad2))

p_bad3 = [1, 2, 4]  # not enough nodes
print(ham_path_verify(G, p_bad3))

p_good = [1, 2, 4, 3, 5, 6]  # is a Hamiltonian path
print(ham_path_verify(G, p_good))

10.1.2 Composite numbers


Another problem with a related polynomial-time verification language is

Composites = { ⟨n⟩ | n ∈ N+ and n = pq for some integers p, q > 1 } ,

with the related verification language

CompositesV = { ⟨n, d⟩ | n, d ∈ N+, d divides n, and 1 < d < n } .


Actually, Composites ∈ P, so sometimes a problem can be decided and verified in polynomial
time. The algorithm deciding Composites in polynomial time is a bit more complex
(not terribly so; see https://en.wikipedia.org/wiki/AKS_primality_test#The_algorithm).
Here's the verifier, which makes sure that the potential divisor d 1) is of the right size (strictly
between 1 and n), and 2) is actually a divisor (has remainder 0 when dividing n by d):
def composite_verify(n, d):
    """n, d: positive integers
    Verify that d is a nontrivial divisor of n."""
    if not 1 < d < n:
        return False
    return n % d == 0
The first check is O(n) time (each bit of the integers needs to get compared to check for
1 < d < n). The next check involving the remainder is O(n^2) time to do the division by the
standard grade-school division algorithm.
Let’s try it out on some integers:
n = 15
print(composite_verify(n, 3))  # 3 is a divisor of 15, returns True
print(composite_verify(n, 4))  # 4 not a divisor of 15, returns False

n = 17
print(f"checking all possible divisors of {n}")
for d in range(2, n):
    print(composite_verify(n, d), end=" ")

n = 15
print(f"\nchecking all possible divisors of {n}")
for d in range(2, n):
    print(composite_verify(n, d), end=" ")

end of lecture 7b

10.2 The class NP


10.2.1 Definition of NP
We now formalize these notions.
Definition 10.2.1. A polynomial-time verifier for a language A is an algorithm V , where
there are constants c, k such that V runs in time O(n^c) and

A = { x ∈ {0, 1}∗ | (∃w ∈ {0, 1}^{≤|x|^k}) V accepts ⟨x, w⟩ } .

That is, x ∈ A if and only if there is a "short" string w where ⟨x, w⟩ ∈ L(V ) (where "short"
means bounded by a polynomial in |x|). We call such a string w a witness (or a proof or
certificate) that testifies that x ∈ A.

Definition 10.2.2. NP is the class of languages that have polynomial-time verifiers. We


call the language decided by the verifier the verification language of the NP language.

We note one technicality: sometimes we measure the running time of V as a function of
|x|, and sometimes we measure it as a function of its entire input length, which is |⟨x, w⟩|.
Since |w| is bounded by a polynomial in |x|, one of these running times is a polynomial if
and only if the other is. The exponent in the polynomial might change. For instance, if
we have n = |x| and m = |⟨x, w⟩| = n^3, and V (x, w) runs in time m^5, then it runs in time
(n^3)^5 = n^15.

Example 10.2.3. For a graph G with a Hamiltonian path p, ⟨p⟩ is a witness testifying that
G has a Hamiltonian path.

Example 10.2.4. For a composite integer n with a divisor 1 < d < n, ⟨d⟩ is a witness
testifying that n is composite. Note that n may have more than one such divisor; this shows
that a witness for an element of a polynomially-verifiable language need not be unique.

For instance, HamPathV is a verification language for HamPath.

10.2.2 What sort of computation is NP capturing?


What, intuitively, is this class NP? And why should anyone care about it? It is a very
natural notion, even if its mathematical definition looks unnatural.
In summary:

• P: we can quickly decide whether there’s a solution.1

• NP: given a potential solution, we can quickly verify that it’s correct.

NP is, in fact, a very natural notion, because it is far more common than not that real
problems we want to solve have this character that, whether or not we know how to find a
solution efficiently, we know what a correct solution looks like, i.e., we can verify efficiently
whether a purported solution is, in fact, correct.
For instance, it may not be obvious to me, given an encrypted file C, how to find the
decrypted version of it. But if you give me the decrypted version P and the encryption key
k, I can run the encryption algorithm E to verify that E(P, k) = C.

10.2.3 Decision vs. Search vs. Optimization


Even problems that don’t look like decision problems (such as optimization problems) can
be re-cast as decision problems, and when we do this, they tend also to have this feature of
being efficiently checkable.
1: There's a connection to actually finding a solution (rather than merely knowing one exists) explained in more
detail in Section 10.2.3.

Optimization. Suppose I want to find the cheapest flights from Sacramento to Rome.
This is an optimization problem. To cast this as a decision problem, we add a new input,
a threshold value, and ask whether there are flights costing under the threshold: e.g., given
two cities C1 and C2 and a budget of b dollars, we ask whether there is a sequence of flights
from C1 to C2 costing at most b dollars. It may not be obvious to me how to find an airline
route from Sacramento to Rome that costs less than $1200. But if you give me a sequence of
flights with their ticket prices, I can easily check that their total cost is ≤ $1200, that they
start at Sacramento and end at Rome, and that the destination airport of each intermediate
leg is the starting airport of the next leg.
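As a sketch of what a verifier for this decision version might look like (the function flights_verify and its (origin, destination, price) leg format are our own illustration, not from the notes):

def flights_verify(C1, C2, b, legs):
    """legs: list of (origin, destination, price) triples.
    Verify that legs is a sequence of flights from C1 to C2
    costing at most b dollars total."""
    if not legs or legs[0][0] != C1 or legs[-1][1] != C2:
        return False
    # each leg must depart from where the previous leg landed
    for (_, dest, _), (orig, _, _) in zip(legs, legs[1:]):
        if dest != orig:
            return False
    return sum(price for (_, _, price) in legs) <= b

print(flights_verify("Sacramento", "Rome", 1200,
      [("Sacramento", "Chicago", 280), ("Chicago", "Rome", 850)]))  # True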

Search. Not all problems in NP are variants of optimization problems. For example, the
problem of determining whether a number is prime or not seems to be “inherently Boolean”:
the number is prime or it isn’t. Even looking at it like a search problem—given n, find
the prime factorization of n—doesn’t seem to involve any optimization. All integers have a
unique prime factorization.
However, there is a sense in which all NP problems are search problems. Namely, if a
problem is in NP, then it has a polynomial-time verifier that verifies purported witnesses w
for instances x. The equivalent search problem is then: given x, find a witness w, or report
that none exists.
For example for the decision problem of determining if a graph G has a Hamiltonian
path, the equivalent search problem is to find a Hamiltonian path of G if it exists. For the
Boolean satisfiability decision problem of determining whether a formula φ has a satisfying
assignment, the equivalent search problem is to find a satisfying assignment if it exists.
More generally, if a decision problem A is in NP, then it has a polynomial-time verifier V
such that, for all x ∈ {0, 1}∗, x ∈ A ⇐⇒ (∃w ∈ {0, 1}^{p(|x|)}) V (x, w) accepts, for some
polynomial p. The equivalent search problem is, given x, find a w such that V (x, w) accepts.2

10.3 Examples of problems in NP


10.3.1 Finding cliques
A clique in a graph is a subgraph in which every pair of nodes in the clique has an edge. A
k-clique is a clique with k nodes. Define

Clique = { ⟨G, k⟩ | G is an undirected graph with a k-clique } .

Theorem 10.3.1. Clique ∈ NP.


Proof. The following is a verifier for Clique, taking a set C of nodes as the witness:
2: We won't prove the following in this course, but it is worth knowing: if P = NP (i.e., if every NP decision problem
can be solved in polynomial time), then every NP search problem can also be solved in polynomial time (though
perhaps requiring a larger polynomial time bound). In other words, although the decision problem looks easier, in
fact, the search problem of actually finding a witness is almost as easy as telling whether one exists.

from itertools import combinations as subsets

def clique_verifier(G, k, C):
    """G: graph
    k: positive integer
    C: set of nodes in G
    Verify that C is a k-clique in G."""
    V, E = G
    # verify C is the correct size
    if len(C) != k:
        return False
    # verify each pair of nodes in C shares an edge
    for (u, v) in subsets(C, 2):
        if (u, v) not in E:
            return False
    return True
The check for |C| = k takes O(1) time. The loop executes O(n2 ) iterations, and each check
for (u,v) not in E takes time O(|E|) = O(n2 ). So the verifier is polynomial-time.
Try it out on some examples for the graph in Fig. 10.2.

[Figure 10.2 diagram appears here.]

Figure 10.2: A graph with a 4-clique {1, 2, 3, 4}.

V = [1, 2, 3, 4, 5, 6]
E = [(1,2), (1,3), (1,4), (2,3), (2,4), (3,4), (4,5), (5,6), (4,6)]
G = (V, E)
G = add_reverse_edges(G)

C = [1, 2, 3, 4]
k = len(C)
print(f"{C} is a {k}-clique in G? {clique_verifier(G, k, C)}")
# true

C = [3, 4, 5]
print(f"{C} is a {k}-clique in G? {clique_verifier(G, k, C)}")
# false

C = [4, 5, 6]
print(f"{C} is a {k}-clique in G? {clique_verifier(G, k, C)}")
# false

C = [1, 3, 4, 5]
print(f"{C} is a {k}-clique in G? {clique_verifier(G, k, C)}")
# false

10.3.2 Finding subset of integers with a given sum


The SubsetSum problem is: given a collection (set with repetition) of integers x1 , . . . , xk
and a target integer t, is there a subcollection that sums to t?
SubsetSum = { ⟨C, t⟩ | C = {x1, . . . , xk} is a set of integers and (∃S ⊆ C) t = Σ_{y∈S} y } .

For example, ⟨{4, 11, 16, 21, 27}, 25⟩ ∈ SubsetSum because 4 + 21 = 25, but ⟨{4, 11, 16}, 13⟩ ∉
SubsetSum.

Theorem 10.3.2. SubsetSum ∈ NP.

Proof. The following is a verifier for SubsetSum:


def subset_sum_verify(C, t, S):
    """C, S: sets of integers
    t: integer
    Verify that S is a subset of C summing to t."""
    if sum(S) != t:  # check sum
        return False
    return S.issubset(C)  # check that S really is a subcollection of C

Let n = |⟨C, t⟩| be the input length. We have |S| ≤ |C| ≤ n, and each number in C can have
at most n bits, so it takes time O(n) to add them. Thus checking the sum of all numbers
in S takes O(n^2) time. The final issubset check is done by Python library code, but to
analyze a possible implementation of it, simply looping over each element of S (O(n)
iterations) and doing a linear search of C (O(n) time per iteration) would take O(n^2) time.
Thus the verifier is polynomial-time.
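Trying it on the example instances above:

C = {4, 11, 16, 21, 27}
print(subset_sum_verify(C, 25, {4, 21}))  # True: 4 + 21 = 25 and {4, 21} is a subset of C
print(subset_sum_verify(C, 25, {4, 11}))  # False: 4 + 11 = 15, not 25
print(subset_sum_verify(C, 25, {4, 22}))  # False: sums to 26, and 22 is not in C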

Note that the complements of these languages (the sets of strings not in Clique and not in
SubsetSum) are not obviously members of NP (and are believed not to be). The class coNP
is the class of languages whose complements are in NP, so the complements of Clique and
SubsetSum are in coNP. It is not known whether coNP = NP, but this is believed to be
false, although proving that is at least as difficult as proving that P ≠ NP: since P = coP,
if coNP ≠ NP, then P ≠ NP.

10.3.3 The P versus NP question


Exactly one of the two possibilities shown in Figure 10.3 is correct.
The “P = NP?” question is asking, informally, “Is finding solutions as easy as verifying
that they work? ” Intuitively, it seems the answer is no. It took people thousands of years
of studying the motion of planets to devise Kepler’s laws, but it’s easy to verify that they
fit the data. It seems harder to devise a proof of a mathematical theorem than to verify its
correctness. The conjecture P 6= NP formalizes this intuition, but so far, no one has been
able to prove it.

[Figure 10.3 diagram appears here.]

Figure 10.3: We know that P ⊆ NP, so either P ( NP, or P = NP. We don’t know which is true,
but many conjecture P ( NP.

The best known method for solving NP problems in general is the one employed in
the proof of Theorem 10.4.1 in the next subsection: a brute-force search over all possible
witnesses, giving each as input to the verifier.

end of lecture 7c

10.4 NP problems are decidable in exponential time


Although a problem being in NP doesn’t give any hint how to decide it efficiently, it does give
a way to decide it inefficiently: brute-force search over the (exponentially many) possible
witnesses. Much of the focus of this chapter is in providing evidence that this exponential
blowup is fundamental, and not an artifact of the next proof that can be avoided through a
clever trick.
Let EXP = ⋃_{k=1}^{∞} TIME(2^{n^k}) be the set of decision problems decidable in exponential time.

Theorem 10.4.1. NP ⊆ EXP.


In other words, every problem verifiable in polynomial time is decidable in exponential time.
Proof idea: To solve a problem in NP, iterate over all possible witnesses, and run the verifier
on each of them.
Proof. Let A ∈ NP. Then there are constants c, k and an O(n^c)-time verifier V_A such that,
for all x ∈ {0, 1}∗ (letting n = |x|), x ∈ A ⇐⇒ (∃w ∈ {0, 1}^{≤n^k}) V_A(x, w) accepts. For
example, if k = 7, then the following algorithm decides if the latter condition is true:
import itertools

k = 7  # k is a constant hard-coded into the program

def binary_strings_of_length(length):
    """Generate all strings of a given length"""
    return map(lambda lst: "".join(lst),
               itertools.product(["0", "1"], repeat=length))

def A_decider(x):
    """Exponential-time algorithm for finding witnesses."""
    n = len(x)
    for m in range(n ** k + 1):  # check lengths m in [0, 1, ..., n^k]
        for w in binary_strings_of_length(m):
            if V_A(x, w):
                return True
    return False
We now analyze the running time. The two loops iterate over all binary strings of length at
most n^k, of which there are Σ_{m=0}^{n^k} 2^m = 2^{n^k + 1} − 1 = O(2^{n^k}). Each iteration calls V_A(x, w),
which takes time O(n^c). Then the total time is O(n^c · 2^{n^k}) = O(2^{2n^k}).

This is shown by example for HamPath below. It is a bit more efficient than above:
rather than searching all possible witness binary strings up to some length, it uses the fact
that any valid witness will encode a list of nodes, that list will have length n = |V |, and it will
be a permutation of the nodes (each node in V will appear exactly once). (ham_path_verify
is the same as the verification algorithm we used above, except that the checks intended to
ensure that p is a permutation of V have been removed for simplicity, since now we are
calling ham_path_verify only with p equal to a permutation of V.)
import itertools

def ham_path_verify(G, p):
    """G: graph (V, E)
    p: permutation of V (list of unique nodes of length len(V))
    Verify that p is a Hamiltonian path in G."""
    V, E = G
    # verify each pair of adjacent nodes in p shares an edge
    for i in range(len(p) - 1):
        if (p[i], p[i+1]) not in E:
            return False
    return True

def ham_path(G):
    """G: graph
    Exponential-time algorithm for finding Hamiltonian paths,
    which calls the verifier on all potential witnesses."""
    V, E = G
    for p in itertools.permutations(V):
        if ham_path_verify(G, p):
            return True
    return False
No one has proven that NP ≠ EXP. It is known that P ⊆ NP ⊆ EXP, and since the Time
Hierarchy Theorem tells us that P ⊊ EXP, it is known that at least one of the inclusions
P ⊆ NP or NP ⊆ EXP is proper, though it is not known which one. It is suspected that they
both are; i.e., that P ⊊ NP ⊊ EXP.

10.5 Introduction to NP-Completeness


We now come to the only reason that any computer scientist is concerned with the class NP:
the NP-complete problems. (More justification of that claim appears in Section 10.10.)
Intuitively, a problem is NP-complete if it is in NP, and every problem in NP is reducible to
it in polynomial time. This implies that the NP-complete problems are, “within a polynomial
factor, the hardest problems” in NP. If any NP-complete problem has a polynomial-time
algorithm, then all problems in NP also have polynomial-time algorithms, including all the
other NP-complete problems. In other words, if any NP-complete problem is in P, then
P = NP. The contrapositive is more interesting because it is stated in terms of two claims
we think are true, rather than two things we think are false:3 if P 6= NP, then no NP-complete
problem is in P.
This gives us a tool by which to prove that a problem is “probably” (so long as P 6= NP)
intractable: show that it is NP-complete.

10.5.1 Boolean formulas


The first problem concerns Boolean formulas. We represent True with 1, False with 0,
And with ∧, Or with ∨, and Not with ¬ (or with an overbar such as 0̄ to mean ¬0):

0∨0=0 0 ∧ 0 = 0 ¬0 = 1
0∨1=1 0 ∧ 1 = 0 ¬1 = 0
1∨0=1 1∧0=0
1∨1=1 1∧1=1

A Boolean formula is an expression involving Boolean variables and the three operations ∧,
∨, and ¬ (negation). For example,

φ = (x ∧ y) ∨ (z ∧ ¬y)

is a Boolean formula. We may also write negation with an overbar, such as

φ = (x ∧ y) ∨ (z ∧ ȳ)

More formally, given a finite set V of variables (we write variables as single letters such as
a and b, sometimes subscripted, e.g., x1, . . . , xn for n variables), a Boolean formula over V
is either 0) (base case) a variable x ∈ V , or 1) (recursive case 1) ¬φ, where φ is a Boolean
formula over V (called the negation of φ), 2) (recursive case 2) φ ∧ ψ, where φ and ψ are
Boolean formulas over V (called the conjunction of φ and ψ), or 3) (recursive case 3) φ ∨ ψ,
where φ and ψ are Boolean formulas over V (called the disjunction of φ and ψ).

3: Statements such as "(Some NP-complete problem is in P) =⇒ P = NP" are often called "If pigs could whistle,
then donkeys could fly" theorems.
¬ takes precedence over both ∧ and ∨, and ∧ takes precedence over ∨. Parentheses may
be used to override default precedence.

10.5.2 Implementation of Boolean formula data structure in Python


The following is a simple implementation of Boolean formulas as a Python class using the
recursive definition. (Don’t be scared of the next chunk of code; most of it is parsing code to
support the ability to create a Boolean formula object from a string representing it, which
is simpler than manually creating the objects recursively.)
class Boolean_formula(object):
    """Represents a Boolean formula with AND, OR, and NOT operations."""
    def __init__(self, variable=None, op=None, left=None, right=None):
        if not ((variable is None and op == 'not' and left is None and right is not None)
                or (variable is None and op == 'and' and left is not None and right is not None)
                or (variable is None and op == 'or' and left is not None and right is not None)
                or (variable is not None and op is None and left is None and right is None)):
            raise ValueError("Must either set variable for base case "
                             "or must set op to 'not', 'and', or 'or' "
                             "and recursive formulas left and right")
        self.variable = variable
        self.op = op
        self.left = left
        self.right = right
        if self.variable:
            self.variables = [variable]
        elif op == 'not':
            self.variables = right.variables
        elif op in ['and', 'or']:
            self.variables = list(left.variables)
            self.variables.extend(x for x in right.variables if x not in left.variables)
            self.variables.sort()

    def evaluate(self, assignment):
        """Value of this formula with given assignment, a dict mapping variable
        names to Python booleans.

        Assignment can also be a string of bits, which will be mapped to variables
        in alphabetical order. Boolean values are interconverted with integers to
        make for nicer printing (0 and 1 versus True and False)."""
        if type(assignment) is str:
            assignment = dict(zip(self.variables, map(int, assignment)))
        if self.op is None:
            return assignment[self.variable]
        elif self.op == 'not':
            return int(not self.right.evaluate(assignment))
        elif self.op == 'and':
            return int(self.left.evaluate(assignment) and self.right.evaluate(assignment))
        elif self.op == 'or':
            return int(self.left.evaluate(assignment) or self.right.evaluate(assignment))
        else:
            raise ValueError("This shouldn't be reachable")

    def __repr__(self):
        if self.variable:
            return self.variable
        elif self.op == 'not':
            return '(not {})'.format(self.right)
        else:
            return '({} {} {})'.format(self.left, self.op, self.right)

    def __str__(self):
        return repr(self)

    @staticmethod
    def from_string(text):
        """Convert string that looks like a Python Boolean expression with
        variables, e.g. "x and y or not z and (a or b)", to Boolean_formula."""
        # add plenty of whitespace to make it easy to tokenize with string.split()
        for token in ['and', 'or', 'not', '(', ')']:
            text = text.replace(token, ' ' + token + ' ')
        tokens = text.split()
        val_stack = []
        op_stack = []
        for token in tokens:
            if token in ['and', 'or', 'not']:
                cur_op = token
                while len(op_stack) > 0 and not precedence_greater(cur_op, op_stack[-1]):
                    process_top_op(op_stack, val_stack)
                op_stack.append(cur_op)
            elif token == '(':
                op_stack.append('(')
            elif token == ')':
                while op_stack[-1] != '(':
                    process_top_op(op_stack, val_stack)
                op_stack.pop()
            else:
                val_stack.append(Boolean_formula(variable=token))
        while len(op_stack) > 0 and not precedence_greater(cur_op, op_stack[-1]):
            process_top_op(op_stack, val_stack)
        return val_stack.pop()

def process_top_op(op_stack, val_stack):
    """Processes top operator from op_stack, popping one or two values as needed
    from val_stack, and pushing the result back on the value stack."""
    op = op_stack.pop()
    right = val_stack.pop()
    if op == 'not':
        val_stack.append(Boolean_formula(op='not', right=right))
    elif op in ['and', 'or']:
        left = val_stack.pop()
        val_stack.append(Boolean_formula(op=op, left=left, right=right))

def precedence_greater(op1, op2):
    return (op2 == '(') or (op1 == 'not' and op2 in ['and', 'or']) \
        or (op1 == 'and' and op2 == 'or')

The following code builds the formula φ = (x ∧ y) ∨ (z ∧ ȳ) and then evaluates it on all
2^3 = 8 assignments to its three variables.

formula = Boolean_formula.from_string("((x and y) or (z and (not y)))")
import itertools
num_variables = len(formula.variables)
for assignment in itertools.product(["0", "1"], repeat=num_variables):
    assignment = "".join(assignment)
    value = formula.evaluate(assignment)
    print("formula value = {} on assignment {}".format(value, assignment))

10.5.3 Satisfiability of Boolean formulas


A Boolean formula is satisfiable if some assignment w of 0's and 1's to its variables causes
the formula to have the value 1. φ = (x ∧ y) ∨ (z ∧ ȳ) is satisfiable because assigning
x = 0, y = 0, z = 1 causes φ to evaluate to 1, written φ(001) = 1. We say the assignment
satisfies φ. The Boolean satisfiability problem is to test whether a given Boolean formula is
satisfiable:
Sat = { ⟨φ⟩ | φ is a satisfiable Boolean formula } .
Note that Sat ∈ NP, since the language
SatV = { ⟨φ, w⟩ | φ is a Boolean formula and φ(w) = 1 }
has a polynomial-time decider, i.e., there is a polynomial-time verifier for Sat: if φ has
n input variables, then ⟨φ⟩ ∈ Sat ⇐⇒ (∃w ∈ {0, 1}^n) ⟨φ, w⟩ ∈ SatV. The verifier is
essentially the method Boolean_formula.evaluate in the code above.
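Spelled out as a function (a small sketch on top of the class above; sat_verify is our own name for it):

def sat_verify(formula, assignment):
    """formula: Boolean_formula
    assignment: bit string, one bit per variable in alphabetical order
    Verify that the assignment satisfies the formula."""
    return formula.evaluate(assignment) == 1

formula = Boolean_formula.from_string("((x and y) or (z and (not y)))")
print(sat_verify(formula, "001"))  # x=0, y=0, z=1 satisfies φ: prints True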
Finally, we state the reason for the importance of Sat (as well as Clique, HamPath,
SubsetSum, and hundreds of other problems that share this characteristic):
Theorem 10.5.1 (Cook-Levin Theorem). Sat ∈ P if and only if P = NP.
This is considered evidence (though not proof) that Sat ∉ P. The reason the Cook-Levin
Theorem is true is that Sat is NP-complete. In the next subsection, we develop the technical
machinery needed to prove that a problem is NP-complete: reductions.

10.6 Polynomial-time reducibility


10.6.1 Ranking the hardness of problems with reductions
Theorem 10.5.1 is proven by showing, in a certain technical sense, that Sat is “at least as
hard” as every other problem in NP. The concept of reductions is used to formalize what

we mean when we say a problem is at least as hard as another. Let’s discuss a bit what
we might mean intuitively by this. First, by “hard”, we mean with respect to the running
time required to solve the problem. If problem A can be decided in time O(n^3), whereas
problem B requires time Ω(n^6), this means that B is harder than A: the fastest
algorithm for A is faster than the fastest algorithm for B.
But we will also do what we have been doing so far and relax our standards of comparison
to ignore polynomial differences. So, suppose that B is actually decidable in time O(n^6) and
no smaller: there is an O(n^6)-time algorithm for B and every algorithm deciding B requires
Ω(n^6) time. Then, since n^6 is within a polynomial factor of n^3, because (n^3)^2 = n^6 (i.e., n^6
is "only" quadratically larger than n^3), we won't consider this difference significant, and we
will say that the hardness of A and B are close enough to be "equivalent", since they are
within a polynomial factor of each other.4
However, suppose there is a third problem C, and we could prove it requires time Ω(2^n)
to decide. (We have difficulty proving this for particular natural problems, but the Time
Hierarchy Theorem assures us that such problems must exist.) Then we will say that C really
is harder than A or B, since 2^n is not within a polynomial factor of either n^3 or n^6. Even
composing n^6 with a huge polynomial like n^1000 gives the polynomial (n^6)^1000 = n^6000 = o(2^n).
But, as we said, it is often difficult to take a particular natural problem that we care
about and prove that no algorithm with a certain running time can solve it. Obtaining
techniques for doing this sort of thing may well be the most important open problem in
theoretical computer science, and after decades we still have very few tools to do so. That
is to say, it is difficult to pin down the absolute hardness of a problem, the running time of
the most efficient algorithm for the problem.
What we do have, on the other hand, is a way to compare the relative hardness of two
problems. What reducibility allows us to do is to say: okay, fine, although I don't know
what's the best algorithm for A, and I also don't know what's the best algorithm for B,
but what I do know is that, if A is "reducible" to B, this means that any algorithm for B
can be used to solve A with only a polynomial amount of extra time. Therefore, whatever is
the fastest algorithm for B (even if I don’t know what it is, or how fast it is), I know that
the fastest algorithm for A is “almost” as fast (perhaps a polynomial factor slower). So in
particular, if B is decidable in polynomial time, then so is A (perhaps with a larger exponent
in the polynomial).

end of lecture 8a
4: By this metric, all polynomial-time algorithms are within a polynomial factor of each other. But there are
other polynomially-equivalent running times that are larger than polynomial. For instance, a 2^n-time algorithm is
within a polynomial factor of an n^5 · 2^n-time algorithm, or even a 4^n-time algorithm, since 4^n = (2^n)^2. However,
a 2^n-time algorithm is not within a polynomial factor of a 2^{n^2}-time algorithm, since 2^{n^2} is not bounded by a
polynomial function of 2^n; i.e., there is no polynomial p such that 2^{n^2} = O(p(2^n)). Even if p(n) = n^1000, we still get
p(2^n) = (2^n)^1000 = 2^{1000n}, which is o(2^{n^2}) because 1000n = o(n^2).

10.6.2 Simple example: Reduction of IndSet to Clique


Example 10.6.1. Recall a clique in a graph G = (V, E) is a subset C ⊆ V such that, for
all pairs u, v ∈ C of distinct nodes in C, {u, v} ∈ E. An independent set in G is a similar
concept: S ⊆ V is an independent set if, for all u, v ∈ S, {u, v} ∉ E, and we say it is a
k-independent set if it has k nodes. Suppose we have an algorithm that can decide, on input
⟨Gc, k⟩, whether Gc has a k-clique. How can we use this to decide the problem, given input
⟨Gi, k⟩, whether Gi has a k-independent set?

[Figure 10.4 diagram appears here.]

Figure 10.4: A way to "reduce" the problem of finding an independent set of some size in a graph
Gi to finding a clique of the same size. We transform the graph Gi on the left to Gc on the right
by taking the complement of the set of edges. Each independent set in Gi (such as the four shaded
nodes) becomes a clique in Gc.

The idea is shown in Figure 10.4. To decide whether ⟨Gi, k⟩ ∈ IndSet, we map the
graph Gi = (V, E) to the complement graph Gc = (V, Ē) (i.e., for each pair of nodes in V ,
add an edge if there wasn't one, and otherwise remove the edge if there was one), and then
we give ⟨Gc, k⟩ to the decider for Clique. Since each independent set in Gi becomes a clique
in Gc, for each k, Gi has a k-independent set if and only if Gc has a k-clique. Thus, the
decider for Clique can be used as a subroutine to decide IndSet, by transforming the input
⟨Gi, k⟩ into ⟨Gc, k⟩, passing the transformed input to the decider for Clique, and then
returning its answer.

10.6.3 Definition of polynomial-time reducibility


The next definition formalizes the idea shown in the previous example.
Definition 10.6.2. A function f : {0, 1}∗ → {0, 1}∗ is polynomial-time computable if there
is a polynomial-time algorithm that, on input x, returns f (x).
Definition 10.6.3. Let A, B ⊆ {0, 1}∗ . We say A is polynomial-time reducible to B (a.k.a.,
polynomial-time many-one reducible, polynomial-time mapping reducible), written A ≤P B,
if there is a polynomial-time computable function f : {0, 1}∗ → {0, 1}∗ such that, for all
x ∈ {0, 1}∗ ,
x ∈ A ⇐⇒ f (x) ∈ B.
f is called a (polynomial-time) reduction of A to B.

We interpret A ≤P B to mean that “B is at least as hard as A, to within a polynomial-


time factor.”

[Figure 10.5 diagram appears here.]

Figure 10.5: A mapping reduction f that reduces decision problem A ⊆ {0, 1}∗ to B ⊆ {0, 1}∗ .
The key visual property above is that no arrow points from inside A to outside B, nor from outside
A to inside B. Either both endpoints are in the shaded regions, or both are not.

A pictorial representation of a mapping reduction is shown in Figure 10.5.


Note that for any reduction f computable in time n^c for some constant c, and all x ∈
{0, 1}∗, |f (x)| ≤ |x|^c, because if the TM runs for |x|^c steps, it can write down at most one
output bit of f (x) per step.

10.6.4 Using reductions to bound “hardness” of problems


Example using IndSet and Clique. To implement the mapping reduction of Example 10.6.1,
we can use this:
import itertools

def reduction_from_independent_set_to_clique(G, k):
    """Map an IndSet instance (G, k) to the equivalent Clique instance (Gc, k).
    Assumes undirected edges represented as two-element sets."""
    V, E = G
    Ec = [{u, v} for (u, v) in itertools.combinations(V, 2)
          if {u, v} not in E and u != v]
    Gc = (V, Ec)
    return Gc, k
If we had a hypothetical polynomial-time algorithm for Clique, then it could be used to
make a polynomial-time algorithm for IndSet, calling the reduction above, in the following
way:
# hypothetical polynomial-time algorithm for IndSet, which
# calls another hypothetical polynomial-time algorithm for Clique
def independent_set_algorithm(G, k):
    Gc, kc = reduction_from_independent_set_to_clique(G, k)
    return clique_algorithm(Gc, kc)

def clique_algorithm(G, k):
    raise NotImplementedError()
More formally, defining IndSet = { ⟨G, k⟩ | G has a k-independent set }, we have just
shown that IndSet ≤P Clique. In this special case, the reduction above also shows that
Clique ≤P IndSet, since one can use an IndSet-decider as a subroutine to decide Clique
in the exact same way. But this is a special case that does not hold in general: a reduction
from A to B is not in general also a reduction from B to A, and it may be that A ≤P B but
B ≰P A.5

5: For instance, we believe this is true of NP-complete problems: the statement that P ≠ NP is equivalent to the
statement that Path ≤P Clique but Clique ≰P Path, since Path ∈ P and Clique is NP-complete.

General formulation. The following theorem captures one way to formalize the claim
"A ≤P B means that A is no harder than B":
Theorem 10.6.4. Suppose A ≤P B. If B ∈ P, then A ∈ P.
Proof. The idea of how to use a polynomial-time mapping reduction is shown in Figure 10.6.

[Figure 10.6 diagram appears here.]

Figure 10.6: How to compose a mapping reduction f : {0, 1}∗ → {0, 1}∗ that reduces decision
problem A to decision problem B, with an algorithm MB deciding B, to create an algorithm MA
deciding A. If MB and f are both computable in polynomial time, then their composition MA is
computable in polynomial time as well.

Let MB be the algorithm deciding B in time n^k for some constant k, and let f be the
reduction from A to B, computable in time n^c for some constant c. Define the algorithm
def f(x):
    raise NotImplementedError()
    # TODO: code for f, reducing A to B, goes here

def M_B(y):
    raise NotImplementedError()
    # TODO: code for M_B, deciding B, goes here

def M_A(x):
    """Compose reduction f from A to B, with algorithm M_B for B,
    to get algorithm M_A for A."""
    y = f(x)
    output = M_B(y)
    return output
M_A correctly decides A because x ∈ A ⇐⇒ f (x) ∈ B. Furthermore, on input x of length
n, the line y = f(x) runs in time n^c, and assuming in the worst case that the length of y
is n^c, the line output = M_B(y) runs in time (n^c)^k = n^{ck}. Thus M_A runs in time at most
n^c + n^{ck} ≤ n^{c(k+1)}.

Theorem 10.6.4 tells us that if the fastest algorithm for B takes time t(n), then the fastest
algorithm for A takes no more than time p(n)+t(p(n)), where p is the running time of f ; i.e.,
p is the “polynomial factor” when claiming that “A is no harder than B within a polynomial
factor”. Since we ignore polynomial differences when defining P, if it is also true that t is
a polynomial (i.e., B ∈ P), then we conclude that A ∈ P as well, since p(n) + t(p(n)) is
bounded by a polynomial in n.
This is where things begin to get a bit abstract compared to previous definitions. The
main way that we use reductions is to show that efficient algorithms do not exist for a
problem, by invoking the contrapositive of Theorem 10.6.4:

Corollary 10.6.5. Suppose A ≤P B. If A ∉ P, then B ∉ P.

In other words, if A ≤P B, and if A is hard to decide, then B is also hard to decide.

10.6.5 How to remember which direction reductions go


The most common mistake made when using a reduction to show that a problem is hard is
to switch the order of the problems in the reduction. To show that HamPath ≤P Sat, for
instance, we must transform a graph for HamPath into a formula for Sat. It is a mistake to
go the other direction by transforming a formula for Sat into a graph for HamPath.
The ≤ sign is intended to be evocative of "comparing hardness" of the problems. So if
we write A ≤P B, this means B is at least as hard as A, or A is at least as easy as B. How do
you show a problem is easy? You give an algorithm for solving it! So the way to remember
which problem the reduction is helping you solve is to ask, "which is the easier problem
here?" That's the one you are writing an algorithm for. The easier problem is the one on
the left of the ≤P, A. So if the reduction is intended to be used with an existing algorithm
for one problem B, in order to show that there is an algorithm for the other problem A, then
the problem A for which you are showing an algorithm exists is the easier one.
Of course, there's a simpler mnemonic, which doesn't really convey the why of reductions,
but that is easy to remember. We always write reductions in the same left-to-right
order A ≤P B. If you picture the computation of the reduction itself starting with x on the
left, then processing through f to end up with f (x) on the right (x → f (x) as in Figure 10.6),

then this will remind you that you should start with an instance x of A, and end with an
instance f (x) of B.
Unlike the previous examples of code implementing these ideas, we cannot give concrete
examples of MA and MB above, because for the types of problems we consider using these
reductions, we believe that no efficient algorithms MA and MB exist for the two problems
A and B. There is, however, a concrete, efficiently computable reduction f from A to B for
many important choices of A and B. We cover one of these choices next.

10.6.6 Definition of the 3Sat problem


A literal is a Boolean variable or negated Boolean variable, such as x or x̄. A clause is several
literals connected with ∨'s, such as (x ∨ y ∨ z ∨ x̄). A conjunction is several subformulas
connected with ∧’s, such as (x ∧ (y ∨ z) ∧ z ∧ w). A Boolean formula φ is in conjunctive
normal form, called a CNF-formula, if it consists of a conjunction of clauses,6 such as
φ = (x ∨ y ∨ z ∨ w) ∧ (z ∨ w ∨ x) ∧ (y ∨ x).
φ is a 3CNF-formula if all of its clauses have exactly three literals, such as
φ = (x ∨ y ∨ z) ∧ (z ∨ y ∨ x).
Note that any CNF formula with at most 3 literals per clause can be converted easily to
3CNF by duplicating literals; for example, (x ∨ y) is equivalent to (x ∨ y ∨ y).
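A quick sketch of this padding trick on clauses represented as lists of literal strings (the helper pad_to_3cnf and this representation are our own illustration):

def pad_to_3cnf(clauses):
    """clauses: list of clauses, each a nonempty list of at most 3 literals.
    Pad each clause to exactly three literals by duplicating its last one."""
    return [clause + [clause[-1]] * (3 - len(clause)) for clause in clauses]

print(pad_to_3cnf([["x", "not y"], ["z"]]))
# [['x', 'not y', 'not y'], ['z', 'z', 'z']]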
Define
3Sat = { hφi | φ is a satisfiable 3CNF-formula } .
Theorem 10.9.3 shows that 3Sat is NP-complete. For now we will simply show that it is
reducible to IndSet.

end of lecture 8b

10.6.7 Reduction between problems with different data types: 3Sat ≤P IndSet
We now use ≤P -reductions to show that IndSet is “at least as hard” (to within a polynomial
factor) as a restricted version of the Sat problem known as 3Sat.
To construct a polynomial-time reduction from 3Sat to another language, we transform
the variables and clauses in 3Sat into structures in the other language. These structures
are called gadgets. For example, to reduce 3Sat to IndSet, nodes “simulate” variables and
triples of nodes “simulate” clauses. 3Sat is not the only NP-complete language that can
be used to show other problems are NP-complete, but its regularity and structure make it
convenient for this purpose.
6 The obvious dual of CNF is disjunctive normal form (DNF), which is an Or of conjunctions, such as the formula one would derive applying the sum-of-products rule to the truth table of a Boolean function, but 3DNF formulas do not have the same nice properties that 3CNF formulas have, so we do not discuss them further.

Figure 10.7: Example of the ≤P -reduction from 3Sat to IndSet when the input is the 3CNF
formula φ = (x ∨ x ∨ y) ∧ (x ∨ y ∨ y) ∧ (x ∨ y ∨ y). Observe that x = 0, y = 1 is a satisfying
assignment. Furthermore, if we pick exactly one node in each triple, corresponding to a literal that
satisfies the clause associated to that triple, it is a k-independent set: since we are picking the
literal that satisfies each clause, we never pick both a literal and its negation, so every node we pick
lacks an edge to every other node. In this example, the formula is satisfied by x = 0, y = 1, and
picking nodes y on the left, one of the y nodes on the right, and x on the top is a 3-independent
set.

Theorem 10.6.6. 3Sat ≤P IndSet.

Proof. Given a 3CNF formula φ, the reduction must output a pair hG, ki, a graph G and
integer k, so that

φ is satisfiable ⇐⇒ G has a k-independent set.

Let k = # of clauses in φ. Write φ as

φ = (a1 ∨ b1 ∨ c1 ) ∧ (a2 ∨ b2 ∨ c2 ) ∧ . . . ∧ (ak ∨ bk ∨ ck ).

G will have 3k nodes, each labeled with a literal appearing in φ.7


See Figure 10.7 for an example of the reduction.
The nodes of G are organized into k groups of three nodes each, called triples. Each
triple corresponds to a clause in φ.
G has edge {u, v} if and only if at least one of the following holds:
1. u and v are in the same triple, or
2. u is labeled x and v is labeled x̄ for some variable x, or vice versa.

7 Note that φ could have many more (or fewer) literals than variables, so G may have many more nodes than φ has variables. But G will have exactly as many nodes as φ has literals. Note also that literals can appear more than once, e.g., (x ∨ x ∨ y) has two copies of the literal x.


Let n = |V|. Each of these conditions can be checked in polynomial time, and there are
n(n − 1)/2 = O(n²) such conditions, so this is computable in polynomial time.
We now show that

φ is satisfiable ⇐⇒ G has a k-independent set.

( =⇒ ): Suppose φ has a satisfying assignment w, i.e., φ(w) = 1. Then at least one literal is true in
every clause. To construct a k-independent set S, select exactly one node from each
clause labeled by a true literal (breaking ties arbitrarily). For every u, v ∈ S where
u ≠ v, condition (1) is false, and since x and x̄ cannot both be true, condition (2) is
false. Therefore S is a k-independent set.
( ⇐= ): Suppose there is a k-independent set S in G. For every u, v ∈ S where u ≠ v, u
and v are in different triples by condition (1). Since there are k triples and |S| = k,
S contains exactly one node from each triple. If a node in S is labeled with literal x,
assign x = 1. If the node is labeled with literal x̄, assign x = 0. Assign other variables
arbitrarily (e.g., set them to 0).
Since no pair of nodes in S are labeled with x and x̄ by condition (2), this assignment
is well-defined (we will not attempt to assign x to be both 0 and 1). The assignment
makes every clause true, thus satisfying φ.

The reduction can be implemented in Python as


import itertools

def extract_variables(clauses):
    """Return the set of variable names appearing in the clauses, stripping
    the "!" negation prefix from negated literals. (One possible implementation
    of the helper assumed by the CNF class.)"""
    return {lit.lstrip("!") for clause in clauses for lit in clause}

class CNF(object):
    """Represents a CNF formula. Each variable is a string (e.g., "x1"),
    and each clause is a tuple of strings, each either a variable
    or its negation, e.g., ("!x1", "x3", "!x4")."""
    def __init__(self, clauses):
        self.variables = extract_variables(clauses)
        self.clauses = clauses

def reduce_3sat_to_indset(cnf_formula):
    nodes_grouped = [(('a', idx, clause[0]), ('b', idx, clause[1]), ('c', idx, clause[2]))
                     for (idx, clause) in enumerate(cnf_formula.clauses)]
    V = [node for node_group in nodes_grouped for node in node_group]
    E = []
    for (u, v) in itertools.combinations(V, 2):
        (_, clause_idx1, lit1), (_, clause_idx2, lit2) = u, v
        # add edge if nodes are in the same triple (i.e., the same clause)
        add_edge = clause_idx1 == clause_idx2
        # add edge if the nodes are labeled with a literal and its negation
        if lit1 == "!" + lit2 or lit2 == "!" + lit1:
            add_edge = True
        if add_edge:
            E.append({u, v})
    k = len(cnf_formula.clauses)
    return ((V, E), k)
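As a quick sanity check, here is a hypothetical example run of the code above (the formula is illustrative only, not necessarily the one in Figure 10.7):

phi = CNF([("x", "x", "y"), ("!x", "!y", "!y"), ("!x", "y", "y")])
((V, E), k) = reduce_3sat_to_indset(phi)
print(len(V), k)  # 9 nodes (one per literal occurrence) and k = 3 (one per clause)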

Theorems 10.6.4 and 10.6.6 tell us that if IndSet is decidable in polynomial time, then
so is 3Sat. In terms of what we believe is actually true, Corollary 10.6.5 and Theorem 10.6.6
tell us that if 3Sat is not decidable in polynomial time, then neither is IndSet.
Since IndSet is equivalent to Clique, this also shows that if 3Sat is not decidable in
polynomial time, then neither is Clique.

10.6.8 Reductions are algorithms, but they don’t solve either problem.
A reduction is just an algorithm. What makes them difficult to understand when first
learning the concept is that they are used to relate the difficulty of two problems, but not to
solve either problem. The reduction above, showing that 3Sat ≤P IndSet, is an algorithm
that takes an instance of 3Sat as input, but it does not solve the problem 3Sat. It also
does not solve the problem IndSet. It transforms an instance of 3Sat into an instance of
IndSet, while preserving the correct answer, but without ever knowing what that answer is.
The job of the reduction is to translate the question about a formula into a question about
a graph, not to answer either question.

10.7 Definition of NP-completeness


Definition 10.7.1. A language B is NP-hard if, for every A ∈ NP, A ≤P B.

Definition 10.7.2. A language B is NP-complete if

1. B ∈ NP, and

2. B is NP-hard.

Theorem 10.7.3. If B is NP-complete and B ∈ P, then P = NP.

Proof. Assume the hypothesis. Since P ⊆ NP, it suffices to show NP ⊆ P. Let A ∈ NP.
Since B is NP-hard, A ≤P B. Since B ∈ P, by Theorem 10.6.4 (the closure of P under
≤P -reductions), A ∈ P. Since A was arbitrary, NP ⊆ P.

Corollary 10.7.4. If P ≠ NP, then no NP-complete problem is in P.

Since it is generally believed that P ≠ NP, Corollary 10.7.4 implies that showing a problem
is NP-complete is evidence of its intractability.
The following property of polynomial-time reductions will be useful:

Observation 10.7.5. ≤P is transitive.


10.7. DEFINITION OF NP-COMPLETENESS 139

[Figure 10.8 diagram: x → f → f(x) → g → g(f(x)); f is a reduction of A to B, g is a reduction of B to C, and h = g ∘ f is a reduction of A to C.]

Figure 10.8: Polynomial-time reductions are transitive: simply compose the algorithms.

Proof. The idea is shown in Fig. 10.8. Let A, B, C ⊆ {0, 1}∗ where A ≤P B via reduction f
and B ≤P C via reduction g. Then h = g ◦ f (i.e., for all x ∈ {0, 1}∗ , define h(x) = g(f (x)))
is a polynomial-time reduction of A to C: h can be computed in polynomial time since both
f and g can, and for all x ∈ {0, 1}∗ , x ∈ A ⇐⇒ f (x) ∈ B, and f (x) ∈ B ⇐⇒ g(f (x)) ∈ C,
so x ∈ A ⇐⇒ h(x) ∈ C.

The following theorem, using the transitivity of reductions proven in Observation 10.7.5,
is generally how one shows that a problem C is NP-hard: find some other NP-hard problem
B and show B ≤P C.

Theorem 10.7.6. If B is NP-hard and B ≤P C, then C is NP-hard.

Proof. Let A ∈ NP. Then A ≤P B since B is NP-hard. Since ≤P is transitive and B ≤P C,


it follows that A ≤P C. Since A ∈ NP was arbitrary, C is NP-hard.

The following Python code shows how one would implement this idea.
def reduce_A_to_B(x):
    """reduction from some NP language A to NP-hard language B"""
    raise NotImplementedError()

def reduce_B_to_C(x):
    """reduction showing B reduces to C"""
    raise NotImplementedError()

def C_decider(x):
    """hypothetical polynomial-time decider for C"""
    raise NotImplementedError()

def A_decider(x):
    """composition of two reductions to show how a decider for C
    can be called to decide A"""
    y = reduce_A_to_B(x)
    z = reduce_B_to_C(y)
    return C_decider(z)

Corollary 10.7.7. If B is NP-complete, C ∈ NP, and B ≤P C, then C is NP-complete.


140 CHAPTER 10. EFFICIENT VERIFICATION OF SOLUTIONS: THE CLASS NP

Figure 10.9: If an NP-complete problem B reduces to C, and C is also in NP, then C is NP-complete
as well. This is because of transitivity: all the NP problems reduce to B, and B reduces to C, so
by composing these, all the NP problems reduce to C through B.

Corollary 10.7.7 is shown pictorially in Figure 10.9.


That is, NP-complete problems can, in many ways, act as “representatives” of the hard-
ness of NP, in the sense that black-box access to an algorithm for solving an NP-complete
problem is as good as access to an algorithm for any other problem in NP.
Corollary 10.7.7 is our primary tool for proving a problem is NP-complete: show that
some existing NP-complete problem reduces to it.

10.7.1 The Cook-Levin Theorem


Using the terminology we have developed concerning NP-completeness, we can now restate
the Cook-Levin Theorem (that if P ≠ NP, then neither Sat nor any NP-complete problem
is in P).
Theorem 10.7.8. Sat and 3Sat are NP-complete.8
We won’t prove Theorem 10.7.8 in this course (take ECS 220 for that).
Theorem 10.7.9. Clique and IndSet are NP-complete.
Proof. 3Sat ≤P IndSet by Theorem 10.6.6, and 3Sat is NP-complete, so IndSet is NP-
hard.

Recall that we also showed IndSet ≤P Clique via the reduction ⟨(V, E), k⟩ ↦ ⟨(V, Ē), k⟩,
where Ē is the complement of the edge set E. So Clique is NP-hard as well, since the
NP-hard problem IndSet reduces to it.
We also showed both Clique and IndSet are in NP, so they are NP-complete.
8 By Theorem 10.7.3, Theorem 10.7.8 implies Theorem 10.5.1. In other words, because Sat is NP-complete, if Sat ∈ P, then since all A ∈ NP are reducible to Sat, A ∈ P as well, i.e., P = NP. Conversely, if P = NP, then Sat ∈ P (and so is every other problem in NP).
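In code, the complementation reduction ⟨(V, E), k⟩ ↦ ⟨(V, Ē), k⟩ is short; the following is a sketch (the function name is ours) using the same (graph, k) instance representation as the reduction code earlier in the chapter:

import itertools

def reduce_indset_to_clique(instance):
    """Map an IndSet instance ((V,E), k) to a Clique instance ((V, E_bar), k),
    where E_bar contains exactly the non-edges of G. A set S is independent
    in G if and only if S is a clique in the complement graph."""
    ((V, E), k) = instance
    E_bar = [{u, v} for (u, v) in itertools.combinations(V, 2) if {u, v} not in E]
    return ((V, E_bar), k)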

[Figure 10.10 diagram: a top row V1 of variable gadgets above rows V2 of clause gadgets; node labels omitted.]

Figure 10.10: An example of the reduction from 3Sat to VertexCover for the 3CNF formula
φ = (x ∨ x ∨ y) ∧ (x ∨ y ∨ y) ∧ (x ∨ y ∨ z) ∧ (x ∨ y ∨ z), with m = 3 variables and l = 4 clauses.
φ is satisfied by x = 0, y = 1, z = 0, and we use this assignment to find a vertex cover C with
|C| = m + 2l = 11 as described in the proof; nodes in C are shown with a bold outline.

end of lecture 8c
Memorial Day

end of lecture 9a

10.8 Optional: Additional NP-Complete problems


10.8.1 Vertex Cover
If G is an undirected graph, a vertex cover of G is a subset of nodes, where each edge in G
is connected to at least one node in the vertex cover. Define

VertexCover = { ⟨G, k⟩ | G is an undirected graph with a vertex cover of size k }.

Note that adding nodes to a vertex cover cannot remove its ability to touch every edge;
hence if G = (V, E) has a vertex cover of size k, then it has a vertex cover of each size k′
where k ≤ k′ ≤ |V|. Therefore it does not matter whether we say “of size k” or “of size at
most k” in the definition of VertexCover.

Theorem 10.8.1. VertexCover is NP-complete.

Proof. We must show that VertexCover is in NP, and that some NP-complete problem
reduces to it.

(in NP): The language

VertexCoverV = { ⟨⟨G, k⟩, C⟩ | G is an undirected graph and C is a vertex cover for G of size k }

is a verification language for VertexCover, and it is in P: if G = (V, E), the verifier
tests whether C ⊆ V, |C| = k, and for each {u, v} ∈ E, either u ∈ C or v ∈ C.
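As a sketch (the function name is ours), this verifier can be implemented as follows, representing edges as 2-element sets of nodes as in the earlier reduction code:

def verify_vertex_cover(G, k, C):
    """Polynomial-time check that C is a vertex cover of G = (V, E) of size k.
    Edges are represented as 2-element sets of nodes."""
    (V, E) = G
    C = set(C)
    if not (C <= set(V) and len(C) == k):
        return False
    return all(len(e & C) > 0 for e in E)  # every edge touches some node in C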
(NP-hard): Reduction from 3Sat: Given a 3CNF-formula φ, we show how to (efficiently)
transform φ into a pair ⟨G, k⟩, where G = (V, E) is an undirected graph and k ∈ N,
such that φ is satisfiable if and only if G has a vertex cover of size k.
See Figure 10.10 for an example of the reduction.
For each variable xi in φ, add two nodes labeled xi and x̄i to V, and connect them by
an edge; call this set of gadget nodes V1. For each literal in each clause, we add a node
labeled with the literal’s value; call this set of gadget nodes V2.
Connect nodes u and v by an edge if they are
1. in the same variable gadget,
2. in the same clause gadget, or
3. labeled with the same literal.
If φ has m input variables and l clauses, then V has |V1| + |V2| = 2m + 3l nodes. Let
k = m + 2l.
Since |V| = 2m + 3l and |E| ≤ |V|², G can be computed in O(|⟨φ⟩|²) time.
Now we show that φ is satisfiable ⇐⇒ G has a vertex cover of size k:

( =⇒ ): Suppose (∃x1x2 . . . xm ∈ {0, 1}^m) φ(x1x2 . . . xm) = 1. If xi = 1 (resp. 0), put
the node in V1 labeled xi (resp. x̄i) in the vertex cover C; then every variable
gadget edge is covered. In each clause gadget, pick one node labeled with a true
literal (arbitrarily, if more than one literal of the clause is true) and put the other two
nodes of the gadget in C; then all clause gadget edges are covered.9 All edges
between variable and clause gadgets are covered, by the variable node if it was
included in C, and by a clause node otherwise, since some other literal satisfies
the clause. Since |C| = m + 2l = k, G has a vertex cover of size k.
( ⇐= ): Suppose G has a vertex cover C with k nodes. Then C contains at least one
node from each variable gadget to cover the variable gadget edges, and at least two
nodes from each clause gadget to cover the clause gadget edges. This is ≥ k nodes, so C must have exactly
one node per variable gadget and exactly two nodes per clause gadget to have
exactly k nodes. Let xi = 1 ⇐⇒ the variable node labeled xi ∈ C. Each node
in a clause gadget has an external edge to a variable node; since only two nodes
of the clause gadget are in C, the external edge of the third clause gadget node
is covered by a node from a variable gadget, whence the assignment satisfies the
corresponding clause.

9 Any two clause gadget nodes will cover all three clause gadget edges.

Reduction from IndSet: Instead of reducing from 3Sat, an elegant way to see
that VertexCover is NP-hard is to observe that IndSet ≤P VertexCover by a
particularly simple reduction ⟨G, k⟩ ↦ ⟨G, n − k⟩, where n is the number of nodes.
This works because S is an independent set in G = (V, E) if and only if V \ S is a
vertex cover, and if |S| = k then |V \ S| = n − k. To see the forward direction, note
that no pair of nodes in S has an edge that needs to be covered. Thus, if we pick
all nodes in V \ S, they cover all edges between nodes in V \ S, and since S is an
independent set, all remaining edges must be between a node in S and a node in V \ S,
so the set V \ S covers these edges as well. Conversely, if V \ S is a vertex cover, then
no pair of nodes in S can have an edge between them (since it would be uncovered by
V \ S), so S must be an independent set. In fact, this very same reduction also shows
VertexCover ≤P IndSet; they are nearly equivalent problems, in the same way
that Clique is nearly equivalent to IndSet (recall that the reduction from IndSet
to Clique simply takes the complement of E).
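In code, this reduction is nearly a one-liner; a sketch (function name ours) using the same instance representation as above:

def reduce_indset_to_vertexcover(instance):
    """Map an IndSet instance (G, k) to a VertexCover instance (G, n - k),
    where n is the number of nodes of G."""
    ((V, E), k) = instance
    return ((V, E), len(V) - k)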
We gave two reductions in the proof; there are many ways to prove a problem is NP-hard,
and you just have to find one. The first one reducing 3Sat to VertexCover was much
more complex than the second that used a reduction from IndSet. Why did we go through
all that work then, if there was an easier way?
Sometimes the simplest reduction isn’t so simple as the second one above. It’s good
to see an example of a more complex reduction between two objects of different “types”
(in this case reducing a Boolean formula question to a graph question), to see how to do
them when the correspondence between two problems is not so obvious. But, when you are
trying to show a problem is NP-complete, generally the most efficient strategy is to start by
looking for existing NP-complete problems about the same type of object. Often you find
that some existing NP-complete problem is very close to your problem (such as IndSet and
VertexCover), and the reduction is very straightforward. If that doesn’t work, then move
on to other NP-complete problems. For some reason, 3Sat often works well.

10.8.2 Subset Sum


Theorem 10.8.2. SubsetSum is NP-complete.
Proof. Theorem 10.3.2 tells us that SubsetSum ∈ NP. We show SubsetSum is NP-hard
by reduction from 3Sat.
Let φ be a 3CNF formula with variables x1 , . . . , xm and clauses c1 , . . . , cl . Construct the
pair hS, ti, where S = {y1 , z1 , . . . , ym , zm , g1 , h1 , . . . , gl , hl } is a collection of 2(m + l) integers
and t is an integer, whose decimal expansions are based on φ as shown by example:
(x1 ∨ x̄2 ∨ x3) ∧ (x̄1 ∨ x2 ∨ x3) ∧ (x1 ∨ x̄3 ∨ x4)

        x1  x2  x3  x4   c1  c2  c3
y1       1   0   0   0    1   0   1
z1       1   0   0   0    0   1   0
y2           1   0   0    0   1   0
z2           1   0   0    1   0   0
y3               1   0    1   1   0
z3               1   0    0   0   1
y4                   1    0   0   1
z4                   1    0   0   0
g1                        1   0   0
h1                        1   0   0
g2                            1   0
h2                            1   0
g3                                1
h3                                1
t        1   1   1   1    3   3   3
The upper-left and bottom-right of the table contain exactly one 1 per row as shown, the
bottom-left is all empty (leading 0’s), and t is m 1’s followed by l 3’s. The upper-right of
the table has 1’s to indicate which literals (yi for xi and zi for x̄i) are in which clause. Thus
each column in the upper-right has exactly three 1’s.
The table has size O((m + l)²), so the reduction can be computed in time O(n²), since
m, l ≤ n.
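Here is a sketch of this table construction in Python, reusing the CNF representation from earlier in the chapter (clauses as tuples of strings, with "!" marking negation); the helper names are ours:

def reduce_3sat_to_subsetsum(cnf):
    """Build (S, t) from a 3CNF formula, following the table construction."""
    variables = sorted(cnf.variables)
    m, l = len(variables), len(cnf.clauses)
    digits = m + l  # one decimal digit per variable column plus one per clause column

    def number(var_idx, clause_cols):
        """Integer with digit 1 in the variable's column and in each listed clause column."""
        ds = [0] * digits
        ds[var_idx] = 1
        for j in clause_cols:
            ds[m + j] = 1
        return int(''.join(map(str, ds)))

    S = []
    for i, x in enumerate(variables):
        y_cols = [j for j, clause in enumerate(cnf.clauses) if x in clause]
        z_cols = [j for j, clause in enumerate(cnf.clauses) if "!" + x in clause]
        S.append(number(i, y_cols))  # y_i, for the positive literal x_i
        S.append(number(i, z_cols))  # z_i, for the negated literal
    for j in range(l):
        g = 10 ** (l - 1 - j)  # a single 1 in clause column j
        S.extend([g, g])       # g_j and h_j
    t = int('1' * m + '3' * l)
    return (S, t)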
We now show that

φ is satisfiable ⇐⇒ (∃S′ ⊆ S) t = Σ_{n∈S′} n

( =⇒ ): Suppose (∃x1x2 . . . xm ∈ {0, 1}^m) φ(x1x2 . . . xm) = 1. Select elements S′ ⊆ S as
follows. For each 1 ≤ i ≤ m, yi ∈ S′ if xi = 1, and zi ∈ S′ if xi = 0. Since every clause
cj of φ is satisfied, for each column cj, at least one row with a 1 in column cj is selected
in the upper-right. For each 1 ≤ j ≤ l, if needed, gj and/or hj are placed in S′ to make
column cj on the right side of the table sum to 3.10 Then S′ sums to t.

( ⇐= ): Suppose (∃S′ ⊆ S) t = Σ_{n∈S′} n. All the digits in elements of S are either 0 or
1, and each column in the table contains at most five 1’s; hence there are no carries
when summing S′. To get a 1 in the first m columns, S′ must contain exactly one of
each pair yi and zi. The assignment x1x2 . . . xm ∈ {0, 1}^m is given by letting xi = 1
if yi ∈ S′, and letting xi = 0 if zi ∈ S′. Let 1 ≤ j ≤ l; in each column cj, to sum to
3 in the bottom row, S′ contains at least one row with a 1 in column cj in the upper-right,
since only two 1’s are in the lower-right. Hence cj is satisfied for every j, whence
φ(x1x2 . . . xm) = 1.

10 One or both will be needed if the corresponding clause has some false literals, but at least one true literal must be present to satisfy the clause, so no more than two extra 1’s are needed from the bottom-right to sum the whole column to 3.

10.8.3 Hamiltonian path


Theorem 10.8.3. HamPath is NP-complete.
Proof. We showed HamPath is in NP earlier (before we had formally defined NP, but we
did give a polynomial-time verifier).
To see that HamPath is NP-hard, we show 3Sat ≤P HamPath. Let φ be a 3CNF
formula with n variables x1, . . . , xn and k clauses C1, . . . , Ck. Define a graph Gφ = (V, E)
with 3k + 3 nodes.
finish this

Finally, though we don’t prove them in this course, the following variants of HamPath are
also NP-complete:

• HamPathst = {⟨G, s, t⟩ | G is a directed graph with a Hamiltonian path from s to t}

• UHamPath = {⟨G⟩ | G is an undirected graph with a Hamiltonian path}

• UHamPathst = {⟨G, s, t⟩ | G is an undirected graph with a Hamiltonian path from s to t}

10.9 Optional: Proof of the Cook-Levin Theorem


We prove Theorem 10.7.8 as follows. We first show that the language CircuitSat is NP-
complete, where CircuitSat consists of the satisfiable Boolean circuits. We then show that
CircuitSat ≤P 3Sat. Since every 3CNF formula is a special case of a Boolean formula, it
is easy to check that 3Sat ≤P Sat. Thus, when we are done, we will have shown all three of
these languages are NP-hard. Since they are all in NP, they are NP-complete.
Definition 10.9.1. A Boolean circuit is a collection of gates and inputs (i.e., nodes in a
graph) connected by wires (directed edges). Cycles are not permitted (it is a directed acyclic
graph). Gates are labeled And, Or (each with in-degree 2), or Not (with in-degree 1), and
have unbounded out-degree. One gate is designated the output gate.
What is the difference between a Boolean circuit and a Boolean formula? A formula is a
special type of circuit in which all the gates have fan-out 1. A circuit is more general and
allows a subexpression to be calculated once, then shared among several parts of the circuit
by fanning out its value to several downstream gates.
Given an n-input Boolean circuit γ and a binary input string x = x1 x2 . . . xn , the value
γ(x) ∈ {0, 1} is the value of the output gate when evaluating γ with the inputs given by

each xi , and the values of the gates are determined by computing the associated Boolean
function of its inputs. A circuit γ is satisfiable if there is an input string x that satisfies γ;
i.e., such that γ(x) = 1.11
Define
CircuitSat = { ⟨γ⟩ | γ is a satisfiable Boolean circuit }.

Theorem 10.9.2. CircuitSat is NP-complete.

Proof Sketch. CircuitSat ∈ NP because the language

CircuitSatV = { ⟨γ, w⟩ | γ is a Boolean circuit and γ(w) = 1 }

is a polynomial-time verification language for CircuitSat.12


Let A ∈ NP, and let V be an p(n)-time-bounded verifier for A with witness length q(n),
so that, for all x ∈ {0, 1}∗ ,

∃w ∈ {0, 1}q(n) V (x, w) accepts.



x ∈ A ⇐⇒

To show that A ≤P CircuitSat, we transform an input x to A into a circuit γxV such that
γxV is satisfiable if and only if there is a w ∈ {0, 1}^q(n) such that V (x, w) accepts.13
Let V = (Q, Σ, Γ, δ, s, qa, qr) be the Turing machine deciding A’s verification language. V takes two
inputs, x ∈ {0, 1}^n and the witness w ∈ {0, 1}^q(n). γxV contains constant gates representing x,
and its q(n) input variables represent a potential witness w. We design γxV so that γxV(w) = 1
if and only if V (x, w) accepts.
We build a subcircuit γlocal.14 γlocal has 3m inputs and m outputs, where m depends on
V – but not on x or w – as described below. Assume that each state q ∈ Q and each symbol
a ∈ Γ is represented as a binary string,15 called σq and σa, respectively.
11 The only difference between a Boolean formula and a Boolean circuit is the unbounded out-degree of the gates. A Boolean formula has out-degree one, so that when expressed as a circuit, its gates form a tree (although note that even in a formula the input variables can appear multiple times and hence have larger out-degree), whereas a Boolean circuit, by allowing unbounded fanout from its gates, allows the use of shared subformulas. (For example, “Let ϕ = (x1 ∧ x2) in the formula φ = x1 ∨ ϕ ∧ (x4 ∨ ϕ).”) This is a technicality that will be the main obstacle to proving that CircuitSat ≤P 3Sat, but not a difficult one.
12 To show that CircuitSat is NP-hard, we show how any verification algorithm can be simulated by a circuit, in such a way that the verification algorithm accepts a string if and only if the circuit is satisfiable. The input to the circuit will not be the first input to the verification algorithm, but rather, the witness.
13 In fact, w will be the satisfying assignment for γxV. The subscript x is intended to emphasize that, while x is an input to V, it is hard-coded into γxV; choosing a different input y for the same verification algorithm V would result in a different circuit γyV. The key idea will be that circuits can simulate algorithms. We prove this by showing that any Turing machine can be simulated by a circuit, as long as the circuit is large enough to accommodate the running time and space used by the Turing machine.
14 Many copies of γlocal will be hooked together to create γxV.
15 This is done so that a circuit may process them as inputs.

Let m = 1 + ⌈log |Q|⌉ + ⌈log |Γ|⌉.16 Represent V’s configuration C = (q, p, y)17 as an
element of {0, 1}^(p(n)·m) as follows. The pth tape cell with symbol a is represented as the
string σ(p) = 1σqσa. Every tape cell at position p′ ≠ p with symbol b is represented as
σ(p′) = 0σsσb.18 Represent the configuration C by the string σ(C) = σ(0)σ(1) . . . σ(n^k − 1).
γlocal : {0, 1}^(3m) → {0, 1}^m is defined to take as input σ(k − 1)σ(k)σ(k + 1),19 and output
the next configuration string for tape cell k. γlocal can be implemented by a Boolean circuit
whose size depends only on δ, so γlocal’s size is a constant depending on V but independent
of n.20
To construct γxV,21 attach copies of γlocal in a p(n) × p(n) square array, where the tth
horizontal row of wires input to a row of γlocal’s represents the configuration of V at time
step t. The input x to V is provided as input to the first n copies of γlocal by constant gates,
and the input w to V is provided as input to the next q(n) copies of γlocal by the input
gates to γxV. Finally, the output wires from the final row are collated together into a
single output gate that indicates whether the gate representing the tape head position was
in state qa, so that the circuit will output 1 if and only if the state of the final configuration
is accepting.
Since V (x) runs in time ≤ p(n) and therefore uses space ≤ p(n), γxV contains enough
gates in the horizontal direction to represent the entire non-blank portion of the tape of V (x) at
every step, and contains enough gates in the vertical direction to simulate V (x) long enough
to get an answer. Since the size of γlocal is constant (say, c), the size of the array is at most
cp(n)². n additional constant gates are needed to represent x, and the answer on the top
row can be collected into a single output gate using at most O(p(n)) gates. Therefore, |γxV|
is polynomial in n, whence γxV is computable from x in polynomial time.
Since γxV(w) simulates V (x, w), w satisfies γxV if and only if V (x, w) accepts, whence γxV
is satisfiable if and only if there is a witness w testifying that x ∈ A.

Theorem 10.9.3. 3Sat is NP-complete.

Proof. 3Sat ∈ NP for the same reason that Sat ∈ NP: the language

3SatV = { ⟨φ, w⟩ | φ is a 3CNF Boolean formula and φ(w) = 1 }


16 m is the number of bits required to represent a state and a symbol together, plus the Boolean value “the tape head is currently here”.
17 where q ∈ Q is the current state, p ∈ N is the position of the tape head, and y ∈ {0, 1}^(n^k) is the string on the tape from position 0 to n^k − 1. We may assume that y contains all of the non-blank symbols that are on the tape, since V runs in time n^k and cannot move the tape head right by more than one tape cell per time step.
18 That is, only the tape cell string with the tape head actually contains a representation of the current state, and the remaining tape cell strings have filler bits for the space reserved for the state; we have chosen state s for the filler bits, but this choice is arbitrary. The first bit indicates whether the tape head is on that tape cell or not.
19 the configuration strings for the three tape cells surrounding tape cell k
20 It will be the number of copies of γlocal that are needed to simulate V (x, w) that will depend on n, but we will show that the number of copies needed is polynomial in n.
21 This is where the proof gets sketchy; to specify the proof in full detail, handling every technicality, takes pages and is not much more informative than the sketch we outline below.

is a polynomial-time verification language for 3Sat.


To show that 3Sat is NP-hard, we show that CircuitSat ≤P 3Sat.22
Let γ be a Boolean circuit with s gates; we design a 3CNF formula φ, computable from
γ in polynomial time, which is satisfiable if and only if γ is.23 φ has all the input variables
x1, . . . , xn of γ, and in addition, for each gate gj in γ, φ has a variable yj representing
the value of the output wire of gate gj. Assume that y1 is the variable for the output gate of γ.
For each gate gj, φ has a subformula ψj that expresses the fact that the gate is “functioning
properly” in relating its inputs to its outputs. For each gate gj, with output yj and
inputs wj and zj,24 define

(wj ∨ yj ∨ yj )
 , if gj is a Not gate;
∧ (wj ∨ yj ∨ yj )








(wj ∨ zj ∨ yj )




∧ (wj ∨ zj ∨ yj )


, if gj is an Or gate;


∧ (wj ∨ zj ∨ yj )

ψj = (10.9.1)

 ∧ (wj ∨ zj ∨ yj )







 (wj ∨ zj ∨ yj )
 ∧ (wj ∨ zj ∨ yj ) , if g is an And gate.


j
∧ (wj ∨ zj ∨ yj )




∧ (wj ∨ zj ∨ yj )

Observe that, for example, an And gate with inputs a and b and output c is operating correctly
if and only if

(a ∧ b =⇒ c)
∧ (a ∧ b̄ =⇒ c̄)
∧ (ā ∧ b =⇒ c̄)
∧ (ā ∧ b̄ =⇒ c̄)

Applying the fact that the statement p =⇒ q is equivalent to p̄ ∨ q, together with De Morgan’s laws,
gives the expressions in equation (10.9.1).
22 The main obstacle to simulating a Boolean circuit with a Boolean formula is that circuits allow unbounded fan-out and formulas do not. The naïve way to handle this would be to make a separate copy of the subformula representing a non-output gate of the circuit, one copy for each output wire. The problem is that this could lead to an exponential increase in the number of copies, as subformulas could be copied an exponential number of times if they are part of larger subformulas that are also copied. Our trick to get around this will actually lead to a formula in 3CNF form.
23 φ is not equivalent to γ: φ has more input bits than γ. But it will be the case that φ is satisfiable if and only if γ is satisfiable; it will simply require specifying more bits to exhibit a satisfying assignment for φ than for γ.
24 wj being the only input if gj is a Not gate, and each input being either a γ-input variable xi or a φ-input variable yi representing an internal wire in γ

To express that γ is satisfied, we express that the output gate outputs 1, and that all
gates are properly functioning:

φ = (y1 ∨ y1 ∨ y1) ∧ ψ1 ∧ ψ2 ∧ · · · ∧ ψs.

The only way to assign values to the various yj ’s to satisfy φ is for γ to be satisfiable, and
furthermore for the value of yj to actually equal the value of the output wire of gate gj in
γ, for each 1 ≤ j ≤ s.25 Thus φ is satisfiable if and only if γ is satisfiable.
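The following is a sketch of this construction in Python; the dict-based circuit representation and function names are ours, invented for illustration, and literals use the "!"-prefix string convention from the earlier code:

def neg(lit):
    """Negate a literal in the "!"-prefix string representation."""
    return lit[1:] if lit.startswith("!") else "!" + lit

def circuit_to_3cnf(gates, output_gate):
    """gates maps each gate name y_j to ("NOT", w), ("AND", w, z), or
    ("OR", w, z), where w, z name gates or circuit inputs. Returns the
    clauses of phi as tuples of three literal strings."""
    clauses = [(output_gate, output_gate, output_gate)]  # (y1 v y1 v y1)
    for y, gate in gates.items():
        if gate[0] == "NOT":
            (_, w) = gate
            clauses += [(w, y, y), (neg(w), neg(y), neg(y))]
        else:
            (op, w, z) = gate
            # one clause per row of the gate's truth table, as in (10.9.1)
            for a in (0, 1):
                for b in (0, 1):
                    out = (a or b) if op == "OR" else (a and b)
                    lw = neg(w) if a else w  # lw is false exactly when w = a
                    lz = neg(z) if b else z  # lz is false exactly when z = b
                    ly = y if out else neg(y)
                    clauses.append((lw, lz, ly))
    return clauses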

10.10 Optional: A brief history of the P versus NP problem


There is a false folklore of the history of the P versus NP problem that is not uncommon
to encounter, and the story goes like this: First, computer scientists defined the class P
(that part is true) and the class NP (that part is false), and raised this profound question
asking whether P = NP. Then Stephen Cook and Leonid Levin discovered the NP-complete
problems (that part is true), and this gave everyone the tools to now finally attack the
P 6= NP problem, since by showing that one of the NP-complete problems is in P, this would
show that P = NP.
There is a minor flaw and a major flaw to this story. The minor flaw is that it is widely
believed that in fact P ≠ NP. Furthermore, the NP-complete problems are not needed to
show this. The existence of a single problem in NP that is not in P suffices to show that
P ≠ NP. The problem does not need to be NP-complete for this to work.26 Cook and Levin
were not after a way to resolve P versus NP, which leads to the major flaw in the story:
the NP-complete problems came first, not the class NP. The class NP was only defined to
capture a large class of problems that Sat is complete for (and hopefully large enough to
be bigger than P).
No one would care about the class NP if not for the existence of the NP-complete problems.
NP is a totally unrealistic model of computation. Showing that a problem is in NP does not
automatically lead to a better way to decide it, in any realistic sense.
Beginning in the late 1940s, researchers in the new field of computer science spent a
lot of time searching for algorithms for important optimization problems encountered in
engineering, physics, and other fields that wanted to make use of the new invention of the
computer to automate some of their calculation needs.27
25 That is to say, even if x1x2 . . . xn is a satisfying assignment to γ, a satisfying assignment to φ includes not only x1x2 . . . xn, but also the correct values of yj for each gate j; getting these wrong will fail to satisfy φ even if x1x2 . . . xn is correct.
26 In fact, Andrei Kolmogorov suggested that perhaps the problems that are now known to be NP-complete might contain too much structure to facilitate a proof of intractability, and suggested less natural, but more “random-looking” languages, as candidates for provably intractable problems, whose intractability would imply the intractability of the more natural problems.
27 Inspired in part by the algorithmic wartime work of some of those same researchers in World War II; Dantzig’s simplex algorithm for linear programming was the result of thinking about how to solve problems with linear constraints, such as: “If transportation with trucks costs $10/soldier, $4/ton of food, subject to the constraint that each truck can carry weight subject to the linear tradeoff of 1 ton of food for every 10 soldiers, and each soldier requires 20 pounds of food/week, how can we move the maximum number of soldiers in 30 trucks, for less than $50,000?”

Some of these problems, such as sorting and searching, have obvious efficient algorithms,28
although clever tricks like divide-and-conquer recursion could reduce the time even further
for certain problems such as sorting. Some problems, such as linear programming and max-
imum flow, seemed to have exponential brute-force algorithms as their only obvious solu-
tion, although clever tricks (Dantzig’s simplex algorithm in the case of linear programming,
Ford-Fulkerson in the case of network flow)29 would allow for a much faster algorithm than
brute-force search. Still other problems, such as the traveling salesman problem and vertex
cover, resisted all efforts to find an efficient exact algorithm.
After decades of effort by hundreds of researchers to solve such problems failed to produce
efficient algorithms, many suspected that these problems had no efficient algorithms. The
theory of NP-completeness provides an explanation for why these problems are not feasibly
solvable. They are important problems regardless of whether they are contained in NP; what
makes them most likely intractable is the fact that they are NP-hard. In other words, they
are intractable because they are at least as hard as every problem in a class that apparently
(though not yet provably) is larger than P. In better-understood
models of computation such as finite automata, a simulation of a nondeterministic machine
by a deterministic machine necessitates an exponential blowup (in states, in the case of finite
automata). Our intuition is that, though this has not been proven, the same exponential
blowup (in running time now instead of states) is necessary when simulating an NTM with
a deterministic TM. The NP-complete problems are believed to be difficult to the extent
that this intuition is correct. While not a proof that those problems are intractable, it
does imply that if those problems are tractable, then something is seriously wrong with our
current understanding of computation: it would mean that magical nondeterministic computers
can always be simulated by real computers at negligible extra cost.30
Prior to the theory of NP-completeness, algorithms researchers – and engineers and scien-
tists in need of those algorithms – were lost in the dark, only able to conclude that a decidable
problem was probably too difficult to have a fast algorithm if considerable cost invested in
attempting to solve it had failed to produce results. The theory of NP-completeness provides
a systematic method of, if not proving, at least providing evidence for the intractability of
a problem: show that the problem is NP-complete by reducing an existing NP-complete
problem to it.31

28 Though it would not be until the late 1960’s that anyone would even think of polynomial time as the default definition of “efficient”.
29 Although technically the original versions of both of those methods were exponential time in the worst case, they nonetheless avoided brute-force search in ingenious ways, and led eventually to provably polynomial-time algorithms.
30 We have deeper reasons for thinking that P ≠ NP besides the hand-waving argument, “C’mon! How could they be equal?!” For example, cryptography as we know it would be impossible, and the discovery of proofs of mathematical theorems could be automated. In a depressing philosophical sense, creativity itself could be automated: as Scott Aaronson said, “Everyone who could appreciate a symphony would be Mozart; everyone who could follow a step-by-step argument would be Gauss; everyone who could recognize a good investment strategy would be Warren Buffett.” http://www.scottaaronson.com/blog/?p=122
Scott Aaronson’s paper NP-complete Problems and Physical Reality32 contains convincing
arguments justifying the intuition that NP contains fundamentally hard problems. Anyone
who would consider taking the position that “the lack of a proof that P ≠ NP implies
that there is a non-negligible chance that NP-complete problems are tractable” owes it to
themselves to read that paper.

31 For this we probably have Richard Karp to thank more than Cook or Levin. One year after the publication of Cook’s 1971 paper showing Sat is NP-complete, Karp published a paper listing 21 famous, important and practical decision problems, involving graphs, job scheduling, circuits, and other combinatorial objects, showing them all to be NP-complete by reducing Sat to them (or reducing them to each other). After that, the floodgates opened, and by the mid-1970’s, hundreds of natural problems had been shown to be NP-complete.
32 http://arxiv.org/abs/quant-ph/0502072
Chapter 11

Undecidability

This chapter is devoted to proving absolute limitations on the fundamental capabilities of


algorithms. Certain problems are not solvable by any algorithm, no matter how much time
is allowed. These problems are undecidable. As we mentioned at the end of the chapter on
Turing machines, we will use the terms algorithm and Turing machine interchangeably.

11.1 The Halting Problem


11.1.1 Turing-recognizable but not decidable
Define the halting problem (or Halting Language)

Halts = { ⟨M, w⟩ | M is a TM and M (w) halts }.

Halts is sometimes denoted K, HTM, or 0′.


Note that Halts is Turing-recognizable, via the following algorithm:
def halt_recognizer(M, w):
    """M is a Python function, and w is an input to M."""
    M(w)         # this executes the function M on input w
    return True  # this is only reached if M(w) halts
Here is halt_recognizer in action on a few Python functions.
def M(w):
    """Indicates if w has > 5 occurrences of the symbol "a"."""
    count = 0
    for c in w:
        if c == "a":
            count += 1
    return count > 5

w = "abcabcaaabcaa"
halt_recognizer(M, w)  # will halt since M always halts

def M(w):
    """Has a potential infinite loop."""
    count = 0
    while count < 5:
        c = w[0]
        if c == "a":
            count += 1
    return count > 5

w = "abcabcaaabcaa"
halt_recognizer(M, w)  # will halt since w[0] == "a"

# WARNING: will not halt!
# After running this, you'll need to shut down the interpreter
w = "bcabcaaabcaa"
halt_recognizer(M, w)  # will not halt since w[0] != "a"

We will eventually show that Halts is not decidable:

Theorem 11.1.1. Halts is not decidable.

But before getting to it, we will first use the machinery of reductions, together with
Theorem 11.1.1, to show that other problems are undecidable, by showing that if they could
be decided, then so could Halts, contradicting Theorem 11.1.1.

11.1.2 Reducibility

Although it is standard to use reductions as a formal tool to prove problems are undecidable,
we don’t formally use them in this chapter. For the curious, they are similar to the
(mapping) reductions used for NP-completeness, but without the polynomial-time constraint.
We also allow a more general type of reduction (known as a Turing reduction), which is closer
to what most people would think of as “using one algorithm as a subroutine to write another”.

general (Turing) reducibility. The problem A is (Turing) reducible to problem B if “an


algorithm MB for B can be used as a subroutine to create an algorithm MA for A”. This is
formally defined with something called oracle Turing machines, but the concept is intuitively
clear enough that we won’t go into the details.

mapping (many-one) reducibility. These are the sorts of reductions we used in showing
problems are NP-complete. A mapping reduction f reducing A to B is the special case when
the algorithm MA is of the form “on input x, compute y = f (x) and return MB (y).” In
other words, MB is called as a subroutine exactly once, at the end of the algorithm, and its
answer is returned as the answer for the whole algorithm. With more general reductions, we
allow the reduction to change the answer, or to call MB more than once.
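In code, a mapping reduction always has the following shape (these are hypothetical stubs, in the same style as the stub code from the NP-completeness chapter):

def f(x):
    """Hypothetical mapping reduction from A to B."""
    raise NotImplementedError()

def M_B(y):
    """Hypothetical decider for B."""
    raise NotImplementedError()

def M_A(x):
    """A mapping reduction calls M_B as a subroutine exactly once,
    at the end, and returns its answer unchanged."""
    y = f(x)
    return M_B(y)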

11.2 Optional: Source code versus programs


We have taken the convention of representing “programs/algorithms” as Python functions.
Python has “functions as first-class objects”, meaning that once you define a function, it is
an object just like any other object, which can be passed as a parameter into other functions,
returned from other functions, and executed.
However, this hides a bit of what is going on under the hood when we talk about programs
that process other programs as input. Often, the first access we have to a program is to its
source code, which is a string.
In this section, we indicate how one would write a Python function halt_recognizer
that recognizes Halts taking as input a string M_src (not a Python function) and another
string w. The built-in Python function exec gives a way to start from Python source code
defining a function and convert it into an actual Python function that can be called. It is
worth noting that these are two different types of objects, similarly to the fact that a string
⟨G⟩ representing a graph G = (V, E) is a different type of object than G itself. G is a pair
consisting of a set V and a set E of pairs of elements from V, whereas ⟨G⟩ is a sequence of
symbols from the alphabet {0, 1}. However, the encoding function we defined allows us to
interconvert between these two, and it is sometimes convenient to speak of G as a graph,
and other times as though it is a string representing a graph.
Similarly, Python gives an easy way to interconvert between Python functions that can
be called, and strings that contain properly formatted Python source code describing those
functions. To convert source code to a function, use exec as above. To convert a function to
source code, use the inspect library. The following code snippet shows how to do this, and
even how to do a simple modification on the source code that changes its behavior, which is
reflected if the code is “re-compiled” using exec.
def M(w):
    """Indicates if |w| > 3"""
    return len(w) > 3

print('M("0101") = {}'.format(M("0101")))

# use inspect.getsource to convert function to source code
import inspect
M_src = inspect.getsource(M)
print("source code of M:\n{}".format(M_src))

M_src = M_src.replace("len(w) > 3", "len(w) > 5")

# use exec to convert source code to function
namespace = {}
exec(M_src, namespace)
M = namespace['M']

print('M("0101") = {}'.format(M("0101")))
print("source code of M:\n{}".format(M_src))
Here is how one would write halt_recognizer to take the source code of M as input.

def halt_recognizer(M_src, w):
    """M_src is the source code of a Python function named M,
    and w is an input to M."""
    namespace = {}
    exec(M_src, namespace)  # this defines the function M
    M = namespace["M"]      # find function M defined when
                            # code in M_src executes
    M(w)                    # this actually executes the function on input w
    return True             # this is only reached if M(w) halts

Here is halt_recognizer in action on a few strings representing Python source code.


M_src = '''
def M(w):
    """Indicates if w has > 5 occurrences of the symbol "a"."""
    count = 0
    for c in w:
        if c == "a":
            count += 1
    return count > 5
'''
w = "abcabcaaabcaa"
print(halt_recognizer(M_src, w))  # prints True since M always halts

M_src = '''
def M(w):
    """Has a potential infinite loop, depending on input w."""
    count = 0
    while count < 5:
        c = w[0]
        if c == "a":
            count += 1
    return count > 5
'''
w = "abcabcaaabcaa"
print(halt_recognizer(M_src, w))  # prints True since w[0] == "a"

# WARNING: will not halt!
# After running this, you'll need to kill the interpreter
w = "bcabcaaabcaa"
print(halt_recognizer(M_src, w))  # won't halt since w[0] != "a"

From now on, for convenience we will mostly deal with Python functions, rather than
strings that compile into Python functions, knowing that if we had source code instead, we
could simply use exec to convert it to a Python function.

11.3 Undecidable problems about algorithm behavior


11.3.1 No-input halting problem
Define the no-input halting problem Haltsε = { ⟨M⟩ | M is a TM and M (ε) halts }.

First, suppose we knew already that Haltsε was undecidable, but not whether Halts
was undecidable. Then we could easily use the undecidability of Haltsε to prove that
Halts is undecidable, by showing a reduction from Haltsε to Halts: suppose for the sake
of contradiction that Halts is decidable by TM H. Then to decide whether M (ε) halts, we
call H with input ⟨M, ε⟩.
That is, an instance ⟨M⟩ of Haltsε is a simple special case of an instance ⟨M, w⟩ of
Halts (where w = ε), so Halts is at least as difficult to decide as Haltsε. Proving that
Haltsε is undecidable means showing the other direction: Haltsε is at least as difficult to
decide as Halts.

Theorem 11.3.1. Haltsε is undecidable.

Proof. Suppose for the sake of contradiction that Haltsε is decidable by algorithm Hε .
def H_e(M):
    """Function that supposedly decides whether M("") halts."""
    raise NotImplementedError()

Define the algorithm H deciding Halts as

def H(M, w):
    """Decider for HALTS that works assuming H_e works, giving
    a contradiction and proving H_e cannot be implemented."""
    def T_Mw(x):
        """Halts on every string if M halts on w, and loops on
        every string if M loops on w."""
        M(w)
    return H_e(T_Mw)
If M (w) halts, then TM,w halts on all inputs (including ε), and if M (w) does not halt,
then TM,w halts on no inputs (including ε). Then H decides Halts, a contradiction.

Note that TM,w depends on M and w, so the program will behave differently if different
M or w are passed to H. M and w are “hardcoded constants” in TM,w.
Pay particular attention to what H is doing: importantly, it is not running TM,w. It
defines the algorithm TM,w. However, it is possible to define an algorithm, meaning, to state
what steps would be executed if the algorithm were to run, without actually running it. Every
time you are programming, you do this. You write code, but the code doesn’t execute until
you run it. Similarly, when a Python function like H executes the def statement defining the
“local function” T_Mw, this creates the function, but it does not execute the function. The
next line return H_e(T_Mw) similarly does not execute T_Mw. It merely passes the function
object as input to another function.
For example, the following code defines f:
def f(x):
    print("Returning {} + 1".format(x))
    return x + 1
Run it. Nothing happens. It defined f, but did not run f. Now run this code:

def f(x):
    print("Returning {} + 1".format(x))
    return x + 1

print("Returned value: {}".format(f(3)))
It should print

Returning 3 + 1
Returned value: 4

because after defining f, it actually runs f.
Intuitively, think of the difference between library code, versus a program with a main
function. Library code generally doesn’t run when you import the library. For instance,
executing import math in Python gives you access to all the functions (such as math.exp)
defined in the math library, but doesn’t run any of them.

11.3.2 Recipe for showing undecidability via reduction from Haltsε


We now use Haltsε to show several other questions about the behavior of algorithms are
undecidable. All the proofs reduce Haltsε to the problem of interest, following the same
basic recipe.
To determine if M (ε) halts, assuming we have an algorithm A for deciding some other
“behavior” of an algorithm TM (such as “TM accepts a given input” or “TM rejects at least
one input”), design a new algorithm TM that
1) runs M (ε), and
2) if M (ε) ever halts, makes TM show the behavior that A can supposedly decide (i.e.,
the behavior should be different than if M (ε) never halts).
Then TM has that behavior if and only if M (ε) halts. This means that essentially any
question about the “eventual behavior” of an algorithm is undecidable.
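As a sketch of the recipe in Python (make_T and show_behavior are hypothetical placeholder names; show_behavior stands for whatever behavior the purported decider detects):

def make_T(M):
    """Given M, define (but do not run) the algorithm T_M of the recipe."""
    def T_M(x):
        M("")                   # step 1: run M on the empty string
        return show_behavior()  # step 2: reached only if M("") halts
    return T_M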

end of lecture 9b

11.3.3 Accepting a given string


Define
Accepts = {⟨M, w⟩ | M is a TM that accepts w}.
Theorem 11.3.2. Accepts is undecidable.
Proof. Suppose for the sake of contradiction that Accepts is decidable by algorithm A.
def A(M, w):
    """Function that supposedly decides whether M(w) accepts,
    where M is a function and w an input to M."""
    raise NotImplementedError()

Define the algorithm Hε deciding Haltsε as:


1 def H_eps(M):
2     """Decider for HALTS_epsilon that works assuming A works,
3     giving a contradiction and proving A cannot be implemented."""
4     def T_M(x):
5         M("")
6         return True
7     return A(T_M, "011")
If M (ε) loops, then TM loops on every input, so it does not accept 011. If M (ε) halts,
then TM accepts every input, so it accepts 011.
Then for any algorithm M,

M (ε) halts ⇐⇒ TM accepts 011          defn of TM
            ⇐⇒ A accepts ⟨TM, 011⟩     since A decides Accepts
            ⇐⇒ Hε accepts ⟨M⟩.         line (7) of Hε

Since Hε always halts, it decides Haltsε, a contradiction.

11.3.4 Accepting no inputs


Define
Empty = {⟨M⟩ | M is a TM and L(M) = ∅}.
Here is an example of an undecidability proof that shows how to use a purported decider
E for Empty to decide Haltsε , but that must negate the Boolean answer given by E.
Theorem 11.3.3. Empty is undecidable.
Proof. Suppose for the sake of contradiction that Empty is decidable by algorithm E.
def E(M):
    """Function that supposedly decides whether L(M) is empty."""
    raise NotImplementedError()
Define the algorithm Hε deciding Haltsε as:
1 def H_eps(M):
2     """Decider for HALTS_epsilon that works assuming E works,
3     giving a contradiction and proving E cannot be implemented."""
4     def T_M(x):
5         """Accepts every string if M halts on "", and loops on
6         every string if M loops on ""."""
7         M("")
8         return True
9     return not E(T_M)
Then for any algorithm M,

M (ε) halts ⇐⇒ L(TM) ≠ ∅           defn of TM
            ⇐⇒ E rejects ⟨TM⟩      since E decides Empty
            ⇐⇒ Hε accepts ⟨M⟩.     line (9) of Hε

Since Hε always halts, it decides Haltsε, a contradiction.

11.3.5 Rejecting at least one string


Define
Rej = {⟨M⟩ | M is a TM that rejects at least one input}.
Here is an example of an undecidability proof that shows how to use a purported decider
R for Rej to decide Haltsε, but that must define an algorithm TM that has a different
behavior after running M (ε) than in the previous proofs (which all had TM accept).
Theorem 11.3.4. Rej is undecidable.
Proof. Suppose for the sake of contradiction that Rej is decidable by algorithm R.
def R(M):
    """Function that supposedly decides whether M rejects at least one input."""
    raise NotImplementedError()
Define the algorithm Hε deciding Haltsε as:
1 def H_eps(M):
2     """Decider for HALTS_epsilon that works assuming R works,
3     giving a contradiction and proving R cannot be implemented."""
4     def T_M(x):
5         """Rejects every string if M halts on "", and loops on
6         every string if M loops on ""."""
7         M("")
8         return False
9     return R(T_M)
Then for any algorithm M,

M (ε) halts ⇐⇒ TM rejects at least one input   defn of TM
            ⇐⇒ R accepts ⟨TM⟩                  since R decides Rej
            ⇐⇒ Hε accepts ⟨M⟩.                 line (9) of Hε

Since Hε always halts, it decides Haltsε, a contradiction.

11.4 How to spot undecidability at a glance


The following are all questions about the eventual behavior of an algorithm, which can
be proven undecidable by showing that the halting problem can be reduced to them. This
includes questions of the form, given a program P and input x:

• Does P accept x?
• Does P reject x?
• Does P loop on x?
• Does P (x) ever execute line 320?

• Does P (x) ever call subroutine f ?


• Does P (x) return the value 011?
• Does P (x) ever raise an exception?

Similar questions about general program behavior over all strings are also undecidable,
i.e., given a program P :

• Does P accept at least one input?


• Does P accept all inputs?
• Is P total? (i.e., does it halt on all inputs?)
• Is there an input x such that P (x) returns the value 42? (This is undecidable even
when we restrict to total programs P; so even if we knew in advance that P halts on
all inputs, there is no algorithm that can find an input x such that P (x) returns the
value 42.)
• Does some input cause P to call subroutine f ?
• Do all inputs cause P to call subroutine f ?
• Does P decide the same language as some other program Q?
• Does P accept any inputs that represent a prime number?
• Does P decide the language {hni | n is prime}?
• Is P a polynomial-time machine? (this is not decidable even if we are promised that P
is total)
• Is P ’s running time O(n3 )?

However, there are certainly questions one can ask about a program P that are decidable.
For example, if the question is about the syntax instead of the behavior, then the problem
can be decidable:

• Does P have at least 100 lines of code? (Turing machine equivalent of this question
would be, “Does M have at least 100 states?”)
• (Interpreting P as a TM) Does P have 2 and 3 in its input alphabet?
• Does P halt immediately because its first line is a return statement not calling another
function? (Turing machine equivalent of this question would be, “Does P halt
immediately because its start state is equal to the accept or reject state?”)
• Does P have a subroutine named f ?

• (Interpreting P as a TM) Do P ’s transitions always move the tape head right? (If
so, then we could decide whether P halts on a given input, because it either halts
after moving the tape head beyond the input upon reading enough blanks, or repeats
a non-halting state while reading blanks and will therefore move the tape head right
forever.)
• Does P “clock itself” to halt after n³ steps on inputs of length n, guaranteeing that
its running time is O(n³)? One way to do this is, before doing anything else, to write
the string 0^(n³) on a worktape, then reset the tape head to the left, then move that tape
head right once per step (in addition to the computation being done with all the other
tapes), and immediately halt when that worktape reaches the blank to the right of the
last 0.1

Furthermore, many questions about the behavior of P on input x actually are decidable,
as long as the question contains a “time bound” that restricts the search, for example, given
P and x:

• Does P (x) halt within 20 steps?


• Does P (x) halt within n³ steps, where n = |x|? Note this is not the same as asking
whether P is an O(n³)-time program, which is a question about all inputs x. P could
take time n³ on this x, but time 2^|y| on some other input y, so not have worst-case
running time of O(n³).
• Is 21 the fifth line of code P (x) executes? (Turing machine equivalent: Is q21 the fifth
state P (x) visits?2 )
• Does P(x) (interpreted as a single-tape TM) ever move its tape head beyond position n³ (where n = |x|)? This one takes a bit more work to see.
  Suppose P(x) does not move its tape head beyond position n³. Then P(x) has a finite number of reachable configurations, specifically |Q| · n³ · |Γ|^(n³), since there are |Q| states, n³ possible tape head positions, and |Γ|^(n³) possible values for the tape content. If one of these configurations repeats, then P runs forever. Otherwise, by the pigeonhole principle, P(x) can visit no more than |Q| · n³ · |Γ|^(n³) configurations before halting. So to decide if P(x) ever moves its tape head beyond position n³, run it until it either moves its tape head beyond position n³, so the answer is “yes”, or repeats a configuration, so the answer is “no”, since it will now repeat forever the same configurations that don’t move the tape head beyond n³. If neither of these happens, P(x) will halt and the answer is “no”.

¹Note this is not the same as asking if P has running time O(n³); it is asking whether this one particular method is employed to guarantee a running time of O(n³). If P simply had three nested for loops, each counting up to n, but not “self-clocking” as described, then the answer to this question would be “no” even though P is an O(n³) program.

²Note that even the following question is decidable: given P, is q21 the fifth state visited on every input? Although there are an infinite number of inputs, the fifth transition can only read some bounded portion of the input, so by searching over the 2⁵ = 32 possible prefixes of length 5 (the only portion of the input that can be scanned in the first 5 steps), we can answer the question without needing to know the input beyond that prefix.
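
The reason such time-bounded questions are decidable is that a bounded simulation always terminates. Here is a minimal Python sketch of the first question above, under the loud assumption that the program is written as a generator yielding once per basic step (ordinary Python functions cannot be run for a bounded number of steps):

    def halts_within(P_gen, x, k):
        """Return True if the generator-style program P_gen halts on x within k steps."""
        stepper = P_gen(x)
        for _ in range(k):
            try:
                next(stepper)
            except StopIteration:
                return True    # P halted within the step budget
        return False           # still running after k steps

    def count_to_ten(x):       # example generator-style program: 10 steps, then halt
        for _ in range(10):
            yield

    print(halts_within(count_to_ten, "", 20))   # True
    print(halts_within(count_to_ten, "", 5))    # False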

end of lecture 9c

11.5 Optional: Enumerators as an alternative definition of Turing-recognizable
An enumerator is a TM E with a “printer” attached. It takes no input, and runs forever. Every so often it prints a string.³ The set of strings printed is the language enumerated by E.
This is a Python implementation of an enumerator that enumerates the prime numbers, which uses a Python trick called generators that lets one iterate over values without explicitly creating a list of those values (so they can potentially iterate over infinitely many values):
    def is_prime(n):
        """Check if n is prime."""
        if n < 2:  # 0 and 1 are not primes
            return False
        for d in range(2, int(n**0.5) + 1):
            if n % d == 0:
                return False
        return True

    import itertools
    def primes_enumerator():
        """Iterate over all prime numbers."""
        for n in itertools.count():  # iterates over all natural numbers
            if is_prime(n):
                yield n
Running the following code prints all the prime numbers:

    for n in primes_enumerator():
        print(n)
Of course, don’t run that! If we just want to assure ourselves it works, let’s print the first 100 prime numbers:

    # print first 100 numbers returned from primes_enumerator()
    for (p, _) in zip(primes_enumerator(), range(100)):
        print(p, end=" ")
³We could imagine implementing these semantics by having a special “print” worktape. Whenever the TM needs to print a string, it writes the string on the tape, with a blank symbol ␣ immediately to the right of the string, and places the tape head on the first symbol of the string (so that one could print ε by placing the tape head on a ␣ symbol), and enters a special state qprint. The set of strings printed is then the set of all strings that were between the tape head and the first ␣ to the right of the tape head on the printing tape whenever the TM was in the state qprint. In the Python code, we implement enumerators with generators using the yield statement: https://wiki.python.org/moin/Generators.

Recall that a language is Turing-recognizable if there is a TM recognizing it. In other words, A is Turing-recognizable if there is a TM M with input alphabet Σ such that, for all x ∈ Σ∗, if x ∈ A, then M accepts x, and if x ∉ A, then M either rejects or loops on x.

Theorem 11.5.1. A language A is Turing-recognizable if and only if it is enumerated by


some enumerator.

This is why such languages are sometimes called computably enumerable.

Proof. ( ⇐= ): Let E be an enumerator that enumerates A. Define M (w) as follows: run


E. Whenever E prints a string x, accept if x = w, and keep running E otherwise.
Note that if E prints w eventually, then M accepts w, and that otherwise M will not
halt. Thus M accepts w if and only if w is in the language enumerated by E.

( =⇒ ): Let M be a TM that recognizes A. We use a standard trick in computability theory


known as a dovetailing computation. Let s1 = ε, s2 = 0, s3 = 1, s4 = 00, . . . be the
standard length-lexicographical enumeration of {0, 1}∗ .
Define E as follows:
for i = 1, 2, 3, . . .:
for j = 1, 2, 3, . . . , i:
run M (sj ) for i steps; if it accepts in the first i steps, print sj .
Note that for every i, j ∈ N, M will eventually be allowed to run for i steps on input sj. Therefore, if M(sj) ever halts and accepts, E will detect this and print sj. Furthermore, E prints only strings accepted by M, so E enumerates exactly the strings accepted by M.
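
Here is a hedged Python sketch of this dovetailing loop, under the nonstandard assumption that the recognizer is written as a generator M_gen that yields once per step and returns True on acceptance (this is only a stand-in for “run M(sj) for i steps”; the thread-based implementation later in this section avoids the assumption):

    from itertools import count, islice

    def dovetail_enumerator(M_gen, strings):
        """strings() returns a fresh iterator over all inputs, e.g. the
        binary_strings generator defined below."""
        for i in count(1):
            for s in islice(strings(), i):      # s_1, ..., s_i
                stepper = M_gen(s)
                try:
                    for _ in range(i):          # run M(s) for i steps
                        next(stepper)
                except StopIteration as halted:
                    if halted.value is True:    # M accepted s within i steps
                        print(s)                # may print s many times; harmless

Repeated printing of the same string is harmless: the language enumerated is just the set of strings printed.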

The following Python code implements the “convert enumerator to acceptor” direction:
    def create_acceptor_from_enumerator(enumerator):
        def acceptor(input_to_acceptor):
            for enumerated_output in enumerator():
                if enumerated_output == input_to_acceptor:
                    return True
            return False  # if it has a finite language it might halt
        return acceptor
So, if we give it the primes enumerator above, it creates a Python function that recognizes prime numbers (but doesn’t halt on composites!):

    primes_acceptor = create_acceptor_from_enumerator(primes_enumerator)
    print(primes_acceptor(2))   # prints True
    print(primes_acceptor(3))   # prints True
    print(primes_acceptor(5))   # prints True
    print(primes_acceptor(7))   # prints True
    print(primes_acceptor(9))   # runs forever

Python doesn’t have a way to call a function and run it for only a certain number of steps, so there is no direct Python implementation of the idea of the other direction in the proof of Theorem 11.5.1. We can do something similar, however, using the Python threading library. To take an algorithm M recognizing A and create an enumerator E enumerating A, we can run an infinite loop that starts M on each possible input x ∈ Σ∗ in a separate thread. If M(x) does not halt then the thread will never terminate, but for any M(x) that does halt, E can check whether it accepted and print x if so.
    import sys, itertools, threading

    # for python 3 include the next line
    from queue import Queue

    # for python 2 include the next line
    # from Queue import Queue

    def binary_strings():
        """Enumerates all binary strings in length-lexicographical order."""
        for length in itertools.count():
            for s in itertools.product(["0", "1"], repeat=length):
                yield "".join(s)

    # WARNING: this runs forever, consuming processor and memory, and has to be killed
    def enumerator_from_recognizer(recognizer, inputs=binary_strings()):
        """Given recognizer, a function that defines a language by returning True if
        its input string is in the language and otherwise either returning False
        or looping, this function enumerates over strings in the language.

        inputs is an iterator yielding valid inputs to recognizer. Default is binary
        strings in lexicographical order."""
        # define queue to put outputs (and its associated input) into
        outputs_queue = Queue()
        def compute_output_and_add_to_queue(input_string):
            output = recognizer(input_string)
            if output == True:
                outputs_queue.put((input_string, output))

        # run recognizer on all possible inputs and add outputs to outputs_queue
        recognizer_threads = []
        def start_all_recognizer_threads():
            for input_string in inputs:
                thread = threading.Thread(target=compute_output_and_add_to_queue,
                                          args=[input_string])
                thread.daemon = True
                thread.start()
                recognizer_threads.append(thread)
            # if finite number of inputs, wait until all threads have stopped
            # and then add a new item to the queue to indicate all inputs are done
            for thread in recognizer_threads:
                thread.join()
            outputs_queue.put((None, None))

        starter_thread = threading.Thread(target=start_all_recognizer_threads)
        starter_thread.daemon = True  # ensures thread is killed when main thread stops
        starter_thread.start()

        # search for True output in outputs_queue
        while True:
            (input_string, output) = outputs_queue.get()
            if output == None:
                break
            yield input_string


    # WARNING: this runs forever, consuming processor and memory, and has to be killed
    def create_enumerator_from_recognizer(recognizer, inputs=binary_strings()):
        """Given recognizer, a function that defines a language by returning True
        if its input string is in the language and otherwise either returning False
        or looping, it returns an iterator that iterates over strings in the language.

        inputs is an iterator yielding valid inputs to recognizer. Default is binary
        strings in lexicographical order."""
        def enumerator():
            return enumerator_from_recognizer(recognizer, inputs)
        return enumerator

We can test it out, but if you actually run this, be sure to kill the process manually. It
will run forever and continue to consume resources.

    # for python 2 include the next line
    # from future_builtins import zip

    import itertools
    enumerate_primes_created_from_recognizer = create_enumerator_from_recognizer(
        is_prime, inputs=itertools.count())

    # WARNING: this will run forever and consume processor and memory and have to be killed
    for i, p in zip(itertools.count(), enumerate_primes_created_from_recognizer()):
        print("{}'th prime is {}".format(i, p))
        if i > 10:
            break

A more reasonable testing option is to set inputs to iterate over a bounded number of
inputs:

    import itertools
    enumerate_primes = create_enumerator_from_recognizer(is_prime, inputs=range(100))

    for i, p in zip(itertools.count(), enumerate_primes()):
        print("{}'th prime is {}".format(i, p))

11.5.1 A non-Turing-recognizable language


The language Halts is Turing-recognizable, but not decidable. Since it is not decidable, its complement is not decidable. Is its complement Turing-recognizable? In other words, is Halts co-Turing-recognizable?
We will show it is not, using the following theorem.
Theorem 11.5.2. A language is decidable if and only if it is Turing-recognizable and co-Turing-recognizable (i.e., its complement is Turing-recognizable).
Proof. We prove each direction separately.
(decidable =⇒ Turing-recognizable and co-Turing-recognizable): Any decidable language is Turing-recognizable, and the complement of any decidable language is decidable, hence also Turing-recognizable.
(Turing-recognizable and co-Turing-recognizable =⇒ decidable): Let A be Turing-recognizable and co-Turing-recognizable, let MA be a TM recognizing A, and let MĀ be a TM recognizing the complement Ā. Define the TM M as follows. On input w, M runs MA(w) and MĀ(w) in parallel. One of them will accept, since either w ∈ A or w ∈ Ā. If MA accepts first, then M accepts w, and if MĀ accepts first, then M(w) rejects. So M decides A.
The following is an implementation using the original definition of Turing-recognizable, running two TMs in parallel using Python’s threading library.
    # for python 3 include the next line
    from queue import Queue

    # for python 2 include the next line
    # from Queue import Queue

    import threading

    def decider_from_recognizers(recognizer, comp_recognizer, w):
        """recognizer is a function recognizing some language A, and
        comp_recognizer is a function recognizing the complement of A."""
        outputs_queue = Queue()

        def recognizer_add_to_queue():
            output = recognizer(w)
            if output == True:
                outputs_queue.put("w in A")

        def comp_recognizer_add_to_queue():
            output = comp_recognizer(w)
            if output == True:
                outputs_queue.put("w not in A")

        t1 = threading.Thread(target=recognizer_add_to_queue)
        t2 = threading.Thread(target=comp_recognizer_add_to_queue)
        t1.daemon = t2.daemon = True
        t1.start()
        t2.start()

        # exactly one of the threads will put a message in the queue
        message = outputs_queue.get()
        if message == "w in A":
            return True
        elif message == "w not in A":
            return False
        else:
            raise AssertionError("should not be reachable")

    def create_decider_from_recognizers(recognizer, comp_recognizer):
        def decider(w):
            return decider_from_recognizers(recognizer, comp_recognizer, w)
        return decider
We can also use the enumerator characterization of Turing-recognizable languages to get
simpler code for the direction (Turing-recognizable and co-Turing-recognizable =⇒
decidable). Essentially, run two enumerators in parallel (the syntax for this is very simple
in Python), and each time one produces a string, test for equality with w. Which enumerator
produces w indicates whether to accept or reject. Python actually gives a nice way using
the zip function to alternate a string produced by one enumerator with a string produced
by the other (but only the version of zip in Python 3, which returns an iterator; Python 2’s
zip will attempt to build an infinite list and won’t halt):
    from future_builtins import zip  # only Python 3 version of zip works

    def decider_from_enumerators(enumerator, comp_enumerator, w):
        """enumerator is an iterator that enumerates some language A, and
        comp_enumerator is an iterator enumerating the complement of A."""
        for (x_yes, x_no) in zip(enumerator, comp_enumerator):
            if w == x_yes:
                return True
            if w == x_no:
                return False
The above is shorthand for the following code, which is a more verbose way to use iterators (it also assumes that each enumerator produces an infinite number of strings; otherwise, calling next() raises a StopIteration exception when there is no “next” string):⁴

    def decider_from_enumerators(enumerator, comp_enumerator, w):
        """enumerator is an iterator that enumerates some language A, and
        comp_enumerator is an iterator enumerating the complement of A."""
        while True:
            x_yes = next(enumerator)
            if w == x_yes:
                return True
            x_no = next(comp_enumerator)
            if w == x_no:
                return False

⁴Of course, if that ever happens, for example if enumerator runs out of strings, then it means that the language has only a finite number of strings, so if none of them were w, then w is not in the language and we should reject.
We can test it as follows. Here, we make recognizers for the problem A = “is a given number prime?” and its complement Ā = “is a given number composite?”, and we intentionally let the recognizers run forever when the answer is no, in order to demonstrate that the decider returned by create_decider_from_recognizers still halts.⁵
    def loop():
        """Loop forever."""
        while True:
            pass

    def prime_recognizer(n):
        """Check if n is prime."""
        if n < 2:  # 0 and 1 are not primes
            loop()
        for x in range(2, int(n**0.5) + 1):
            if n % x == 0:
                loop()
        return True

    def composite_recognizer(n):
        if n < 2:  # 0 and 1 are not primes
            return True
        for x in range(2, int(n**0.5) + 1):
            if n % x == 0:
                return True
        loop()

    # WARNING: will continue to consume resources after it halts
    # because it starts up threads that run forever
    prime_decider = create_decider_from_recognizers(prime_recognizer, composite_recognizer)
    for n in range(2, 20):
        print("{:2} is prime? {}".format(n, prime_decider(n)))

Corollary 11.5.3. The complement of Halts is not Turing-recognizable.

Proof. Halts is Turing-recognizable. If its complement were also Turing-recognizable, then Halts would be decidable by Theorem 11.5.2, a contradiction.
Note the key fact used in the proof of Corollary 11.5.3 is that the class of Turing-recognizable languages is not closed under complement. Hence any language that is Turing-recognizable but not decidable (such as Halts) has a complement that is not Turing-recognizable.
⁵If you run this from the command line, it will halt as expected. Unfortunately, if you run this code within an ipython notebook, the threads continue to run even after the notebook cell has been executed, since the threads run as long as the program that created them is still going, and the “program” is considered the whole notebook, not an individual cell. To shut down these threads and keep them from continuing to run and consume resources, you will have to shut down the ipython notebook.

11.6 The Halting Problem is undecidable


We have used the fact that the Halting Problem is undecidable to prove that several other
problems are also undecidable. But how do we prove a problem is undecidable in the first
place, if we don’t already know of one that we can use with reductions? We now move on to
proving directly that the Halting Problem is undecidable, as Turing originally did in 1936, using a clever technique that was already several decades old at the time, called diagonalization.

11.6.1 Comparing sizes of sets using onto functions


Cantor considered the question: given two infinite sets, how can we decide which one is “larger”? We might say that if A ⊆ B, then |A| ≤ |B|.⁶ But this is a weak notion, as the converse does not hold even for finite sets: {1, 2} is smaller than {3, 4, 5}, but {1, 2} ⊈ {3, 4, 5}.
Cantor noted that two finite sets have equal size if and only if there is a bijection between
them. Extending this slightly, one can say that a finite set A is strictly smaller than B if
and only if there is no onto function f : A → B.

Figure 11.1: All the functions f : {1, 2} → {3, 4, 5}. None of them are onto.

Figure 11.1 shows that there is no onto function f : {1, 2} → {3, 4, 5}: there are only two
values of f , f (1) and f (2), but there are three values 3,4,5 that must be mapped to, so at
least one of 3, 4, or 5 will be left out (will not be an output of f ), so f will not be onto. We
conclude the obvious fact that |{1, 2}| < |{3, 4, 5}|.
Conversely, if there is an onto function f : A → B, then we can say that |A| ≥ |B|. For
instance, f (3) = f (4) = 1 and f (5) = 2 is an onto function f : {3, 4, 5} → {1, 2}, so we
conclude the obvious fact that |{3, 4, 5}| ≥ |{1, 2}|.

11.6.2 Can one infinite set be larger than another?


Since the notion of onto functions is just as well defined for infinite sets as for finite sets,
this gives a reasonable notion of how to compare the cardinality of infinite sets.

N vs. Z. Let’s consider two infinite sets: N = {0, 1, 2, . . .} and Z = {. . . , −2, −1, 0, 1, 2, . . .}. What’s an onto function from Z to N? The absolute value function works: f(n) = |n|. In fact, for any sets A, B where A ⊆ B, we have |A| ≤ |B|, i.e., there is an onto function

⁶For instance, we think of the integers as being at least as numerous as the even integers.

f : B → A. What is it? One choice is this: pick some fixed element a0 ∈ A. Then define f
for all x ∈ B by f (x) = x if x ∈ A, and f (x) = a0 otherwise. For f : Z → N, one possibility
is to let f (n) = n if n ≥ 0 and let f (n) = 0 if n < 0.
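For concreteness, here is that Z-to-N example in Python (a small illustration, not from the original proof):

    def onto_Z_to_N(n):
        # identity on N; every other element collapses to the fixed element 0
        return n if n >= 0 else 0

    print([onto_Z_to_N(n) for n in range(-3, 4)])   # [0, 0, 0, 0, 1, 2, 3]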
How about going the other way? What’s an onto function from N to Z? This doesn’t look quite as easy; it seems like there are “more” integers than nonnegative integers, since N ⊊ Z. But there is an onto function:

f (0) = 0
f (1) = 1
f (2) = −1
f (3) = 2
f (4) = −2
f (5) = 3
f (6) = −3
...

Symbolically, we can express f as f(n) = −n/2 if n is even and ⌈n/2⌉ if n is odd. But functions don’t always need to be expressed symbolically. The partial enumeration above is a perfectly clear way to express the same function.
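
As a quick sanity check (again just an illustration), the symbolic form is easy to test in Python:

    def f(n):
        # -n/2 for even n, ceiling of n/2 for odd n
        return -n // 2 if n % 2 == 0 else (n + 1) // 2

    print([f(n) for n in range(7)])   # [0, 1, -1, 2, -2, 3, -3]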

N vs. Q+. What about N and Q+ (the positive rational numbers)? Since N+ ⊂ Q+, |N| ≤ |N+| ≤ |Q+|. (Why is the first inequality true?) What about the other way? Is there an onto function f : N → Q+?
With N, there’s one nice way to think about onto functions f with N as the domain. We
can think of defining f : N → Q+ via r0 = f (0), r1 = f (1), . . ., where each rn ∈ Q+ . In
other words, we can describe f by describing a way to enumerate all the rational numbers
in order r0 , r1 , . . .. What order should we pick?
Each positive rational number r is defined by a pair of integers n, d ∈ N+, where r = n/d. The following order doesn’t work: r0 = 1/1, r1 = 1/2, r2 = 1/3, r3 = 1/4, . . . , r? = 2/1, r? = 2/2, r? = 2/3, . . . There are infinitely many possible denominators d, so we can’t set n = 1 and iterate through all possible d before changing n. We need some way of changing both n and d to make sure all pairs of positive integers appear in this list.
Here’s one way to do it: 1/1, 1/2, 2/1, 1/3, 2/2, 3/1, 1/4, 2/3, 3/2, 4/1, 1/5, 2/4, 3/3, 4/2, 5/1, 1/6, . . . In other words, enumerate all n, d ∈ N+ where n + d = 2 (which is just n = d = 1), then all n, d ∈ N+ where n + d = 3 (of which there are 2), then all n, d ∈ N+ where n + d = 4 (of which there are 3), then all n, d ∈ N+ where n + d = 5 (of which there are 4), etc. This enumeration shows that |N| ≥ |Q+|.
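
This enumeration is easy to realize as a Python generator in the style of Section 11.5 (an illustrative sketch, representing the fraction n/d as the pair (n, d)):

    from itertools import count

    def positive_rationals():
        for total in count(2):            # total = n + d
            for n in range(1, total):
                yield (n, total - n)      # the fraction n / (total - n)

    # first few pairs: (1,1), (1,2), (2,1), (1,3), (2,2), (3,1), ...

Note that each rational is produced many times (e.g., (1,1), (2,2), (3,3) all represent 1), but repeats do no harm to an onto function.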

N vs. Q. To see that |N| ≥ |Q|, i.e., there’s an onto function f : N → Q, we can use the
same trick as above with Z. Let g : N → Q+ be the onto function we just defined, and let

f : N → Q be defined as

f (0) = 0
f (1) = g(0)
f (2) = −g(0)
f (3) = g(1)
f (4) = −g(1)
f (5) = g(2)
f (6) = −g(2)
...

Since g maps onto every positive rational number, f maps onto every rational number.

N vs. {0, 1}∗. What about infinite sets without numbers? Recall that showing |{0, 1}∗| ≤ |N| amounts to showing that {0, 1}∗ can be “enumerated” in order s0, s1, s2, . . .. The length-lexicographical enumeration of {0, 1}∗ is such an enumeration: ε, 0, 1, 00, 01, 10, 11, 000, 001, . . .. In fact, because this repeats no strings in {0, 1}∗, it is a bijection (1-1 and onto). Bijections are invertible, so its inverse is an onto function g : N → {0, 1}∗, so |N| = |{0, 1}∗|.
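
This is exactly the order produced by the binary_strings() generator from Section 11.5, which we can spot-check (illustration only):

    from itertools import islice
    print(list(islice(binary_strings(), 8)))
    # ['', '0', '1', '00', '01', '10', '11', '000']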

R vs. the unit interval. What about R, the set of real numbers, and (0, 1), the set of real numbers strictly between 0 and 1 (a.k.a., the unit interval)? Since (0, 1) ⊂ R, we know that |(0, 1)| ≤ |R|.
First, define g : (0, 1) → R+ as g(x) = 1/x − 1. To see that g is onto, let y ∈ R+ and note that setting x = 1/(y + 1) means g(x) = y; note that 1/(y + 1) ∈ (0, 1) for any positive real y. This shows that |(0, 1)| ≥ |R+|.
Now we must show |R+| ≥ |R|. Define f : R+ → R as f(x) = log2 x. To see that f is onto, let y ∈ R. Letting x = 2^y makes f(x) = y, noting 2^y > 0 for any real y.
Thus |(0, 1)| ≥ |R+| ≥ |R|.
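
Both onto maps are easy to sanity-check numerically (illustration only):

    import math

    def g(x):                  # onto map (0,1) -> R+, as defined above
        return 1/x - 1

    def f(x):                  # onto map R+ -> R, as defined above
        return math.log2(x)

    y = 3.7
    print(g(1/(y + 1)))        # ≈ 3.7: g hits any positive real y
    print(f(2**(-1.25)))       # -1.25: f hits any real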
So it seems with many of these infinite sets, we can find an onto function from one to the other. Perhaps |A| = |B| for all infinite sets? Cantor discovered in 1874 that the answer is no.⁷

end of lecture 10a

11.6.3 Diagonalization
The following theorem changed the course of science.
⁷He used a different technique in 1874, and in 1891 discovered the technique known as “diagonalization” that we present here.

Theorem 11.6.1. Let X be any set. Then there is no onto function f : X → P(X).
Proof. Let X be any set, and let f : X → P(X). It suffices to show that f is not onto. Define the set

    D = { a ∈ X | a ∉ f(a) }.

Let a ∈ X be arbitrary. Since D ∈ P(X), it suffices to show that D ≠ f(a). By the definition of D,

    a ∈ D ⇐⇒ a ∉ f(a),

so D and f(a) differ on the element a, whence D ≠ f(a).
The interpretation is that |X| < |P(X)|, even if X is infinite.
This proof works for any set X at all. In the special case where X = N, we can visualize why this technique is called “diagonalization”. Suppose for the sake of contradiction that there is an onto function f : N → P(N); then we can enumerate the subsets of N in order S0 = f(0), S1 = f(1), S2 = f(2), . . .. Each set S ⊆ N can be represented by an infinite binary sequence χS, where the n’th bit χS[n] = 1 ⇐⇒ n ∈ S. Those sequences are the rows of the following infinite matrix:

                                  0  1  2  3  ...  k
    S0 = {1, 3, . . .}            0  1  0  1  ...
    S1 = {0, 1, 2, 3, . . .}      1  1  1  1  ...
    S2 = {2, . . .}               0  0  1  0  ...
    S3 = {0, 2, . . .}            1  0  1  0  ...
    ..                                             ...
    Sk = D = {0, 3, . . .}        1  0  0  1  ...   ?

If D is in the range of f, then Sk = D for some k ∈ N. But D is defined so that χD is the bitwise negation of the diagonal of the above matrix. So if D appears as row k, this gives a contradiction when we ask what is the bit at row k and column k.
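
The heart of the proof is small enough to watch in action on a finite example, using the same sets as the matrix above (an illustration only; no finite example is a proof):

    X = {0, 1, 2, 3}
    f = {0: {1, 3}, 1: {0, 1, 2, 3}, 2: {2}, 3: {0, 2}}

    D = {a for a in X if a not in f[a]}
    print(D)                              # {0, 3}
    print(all(D != f[a] for a in X))      # True: D is not in the range of f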

11.6.4 Countable versus uncountable sets

For two sets X, Y, we write |X| ≥ |Y| if there is an onto function f : X → Y, and |X| < |Y| otherwise. We write |X| = |Y| if there is a bijection (a 1-1 and onto function) f : X → Y.⁸
We say X is countable if |X| ≤ |N|;⁹ i.e., if X is a finite set or if |X| = |N|.¹⁰ We say X is uncountable if it is not countable; i.e., if |X| > |N|.¹¹
⁸By a deep result known as the Cantor-Bernstein Theorem, this is equivalent to saying that there is an onto function f : X → Y and another onto function g : Y → X; i.e., |X| = |Y| if and only if |X| ≤ |Y| and |X| ≥ |Y|.
⁹Some textbooks define countable only for infinite sets, but here we consider finite sets to be countable, so that uncountable will actually be the negation of countable.
¹⁰It is not difficult to show that every infinite countable set has the same cardinality as N; i.e., there is no infinite countable set X with |X| < |N|.
¹¹The relations <, >, ≤, ≥, and = are transitive: for instance, (|A| ≤ |B| and |B| ≤ |C|) =⇒ |A| ≤ |C|.

Stating that a set X is countable is equivalent to saying that its elements can be listed; i.e., that it can be written X = {x0, x1, x2, . . .}, where every element of X will appear somewhere in the list.¹²
Observation 11.6.2. |N| < |R|; i.e., R is uncountable.
Proof. By Theorem 11.6.1, |N| < |P(N)|, so it suffices to prove that |P(N)| ≤ |R|;¹³ i.e., that there is an onto function f : R → P(N).
Define f : R → P(N) as follows. Each real number r ∈ R has an infinite decimal expansion.¹⁴ For all n ∈ N, let rn ∈ {0, 1, . . . , 9} be the nth digit of the decimal expansion of r. Define f(r) ⊆ N as follows. For all n ∈ N,

    n ∈ f(r) ⇐⇒ rn = 0.

That is, if the nth digit of r’s decimal expansion is 0, then n is in the set f(r), and n is not in the set otherwise. Given any set A ⊆ N, there is some number rA ∈ R whose decimal expansion has 0’s exactly at the positions n ∈ A, so f(rA) = A, whence f is onto.

Continuum Hypothesis. There is no set A such that |N| < |A| < |P(N)|.
More concretely, this is stating that for every set A, either there is an onto function
f : N → A, or there is an onto function g : A → P(N).
Interesting fact: Remember earlier when we stated that Gödel proved that there are
true statements that are not provable? The Continuum Hypothesis is a concrete example of
a statement that is not provable, nor is its negation.¹⁵ So it will forever remain a hypothesis;
we can never hope to prove it either true or false.
Theorem 11.6.1 has immediate consequences for the theory of computing.
Observation 11.6.3. There is an undecidable language L ⊆ {0, 1}∗ .
Proof. {0, 1}∗ is countable, as is the set of all TM’s. By Theorem 11.6.1, P({0, 1}∗ ), the set
of all binary languages, is uncountable. So some language is not decided by any TM.

11.6.5 Using diagonalization to show the halting problem is undecidable


Observation 11.6.3 shows that some undecidable language must exist. However, it is more
challenging to exhibit a particular undecidable language. In the next theorem, we use the
technique of diagonalization directly to show that
Halts = { ⟨M, w⟩ | M is a TM and M(w) halts }
¹²This is because the order in which we list the elements implicitly gives us the bijection between N and X: f(0) = x0, f(1) = x1, etc.
¹³Actually they are equal, but we need not show this for the present observation.
¹⁴This expansion need not be unique, as expansions such as 0.03000000 . . . and 0.02999999 . . . both represent the number 3/100. But whenever this happens, exactly one representation will end in an infinite sequence of 0’s, so take this as the “standard” decimal representation of r.
¹⁵Technically, this isn’t quite true. What is true is that either the Continuum Hypothesis and its negation are both unprovable, or both are provable because all mathematical statements are provable, rendering our system of mathematics useless for discovering true theorems.

is undecidable.

Theorem 11.6.4. Halts is undecidable.

Proof. Assume for the sake of contradiction that Halts is decidable, by the algorithm H.

    def H(M, w):
        """Function that supposedly decides whether M(w) halts."""
        raise NotImplementedError()

Define the algorithm D as follows.

    def D(M):
        """ "Diagonalizing" function; given a function M, if M halts
        when given itself as input, run forever; otherwise halt."""
        if H(M, M):
            while True:  # loop
                pass
        else:
            return  # halt

In other words, on input ⟨M⟩, a TM, D runs H(⟨M, M⟩),¹⁶ and does the opposite.¹⁷
Now consider running D(⟨D⟩). Does it halt or not? We have

    D(⟨D⟩) halts ⇐⇒ H accepts ⟨D, D⟩        (defn of H)
                 ⇐⇒ D(⟨D⟩) does not halt,    (defn of D)

a contradiction.

    D(D)  # behavior not well-defined! So H cannot be implemented.

Therefore no such algorithm H exists.

It is worth examining the proof of Theorem 11.6.4 to see the diagonalization explicitly. We can give any string as input to a program. Some of those strings represent other programs, and some don’t. We just want to imagine what happens when we run a program Mi on an input ⟨Mj⟩ that represents another program Mj. If Mi is not even intended to handle inputs that are programs (for instance if it tests integers for primality, or graphs for Hamiltonian paths), then the behavior of Mi on inputs representing programs may not be interesting. Nonetheless, Mi either halts on input ⟨Mj⟩, or it doesn’t.

¹⁶In more detail, D runs H to determine if M halts on the string that is the binary description of M itself.
¹⁷In other words, halt if H rejects, and loop if H accepts. Since H is a decider, D will do one of these.

          ⟨M0⟩  ⟨M1⟩  ⟨M2⟩  ⟨M3⟩  ...  ⟨D⟩
    M0    loop  halt  loop  halt
    M1    halt  halt  halt  halt
    M2    loop  loop  halt  loop
    M3    halt  loop  halt  loop
    ..                             ...
    D     halt  loop  loop  halt  ...   ?

If the halting problem were decidable, then the program D would be implementable, and
its behavior on each input, described by the row with D on the left, would be the element-
wise opposite of the diagonal of this matrix. But this gives a contradiction since the entry
in the diagonal corresponding to D would then be undefined. Since D can be implemented
if Halts is decidable, this establishes that Halts is not decidable.

end of lecture 10b

11.7 Optional: The far-reaching consequences of undecidability


We have seen above that because the Halting Problem is undecidable, any other question
about the “long-term behavior” of a program is also undecidable. But undecidability has
farther-reaching consequences than this. The next is a theorem from mathematical logic,
which says something about the limits of knowledge itself.

11.7.1 Gödel’s Incompleteness Theorem


Gödel’s Incompleteness Theorem is among the few mathematical theorems to have captured the imagination of a large number of non-mathematicians (the Pythagorean Theorem being another notable example). It has a formal statement that almost no one other than logicians manages to state correctly. There’s a whole book dedicated to all the ways people misunderstand Gödel’s Incompleteness Theorem: https://www.crcpress.com/Godels-Theorem-An-Incomplete-Guide-to-Its-Use-and-Abuse/Franzen/p/book/9781568812380
For the sake of simplicity, we will continue this tradition of incorrectly stating Gödel’s Incompleteness Theorem: very roughly, it says that there is a mathematical statement that is true, but that cannot be proved to be true.¹⁸
¹⁸More precisely, it states that either there is a true statement that cannot be proved, OR some other crazy thing happens that renders the whole discussion irrelevant. The most common objection to Gödel’s Theorem is: “Wait, if Gödel’s Theorem proves that a certain statement T is true but unprovable, isn’t Gödel’s Theorem itself a proof of T? So how can T be unprovable??” But Gödel’s Theorem doesn’t actually prove T is true: it proves the logical disjunction “T is true OR that other crazy thing is true”, so Gödel’s Theorem is not actually a proof of T; instead it is a proof of (T ∨ crazy-thing). Sneak preview: the crazy thing is “some false statement is provable”.

The title of Section 11.7 claims that Gödel’s Incompleteness Theorem is a consequence
of undecidability. However, Gödel’s Incompleteness Theorem preceded Turing’s proof of the
undecidability of the Halting Problem by 5 years. Gödel glimpsed the first outlines of the full
theory of undecidability, which was opened wide with Turing’s proof of the undecidability of
the Halting Problem. So despite Gödel’s Incompleteness Theorem being chronologically first,
there’s a certain sense in which undecidability is conceptually at the center, with Gödel’s
Incompleteness Theorem being one application of these ideas (in fact, the first application,
discovered before the full weight of the ideas was truly understood). Also, one can use
Turing’s ideas to give a far simpler proof of Gödel’s Theorem than one tends to find in
textbooks on mathematical logic (and certainly simpler than Gödel’s original proof).
We prove a slightly weaker statement than what is usually called Gödel’s Incomplete-
ness Theorem; see http://www.scottaaronson.com/blog/?p=710 for a discussion of the
technicalities. What we prove is that there is a mathematical statement T such that either:

1. T and ¬T are both unprovable (thus one is true but not provable), or

2. T is false but provable.

Before going on, it is worth considering the importance of the second statement. If it were
true, it would mean that our formalization of mathematics is useless: if false statements
are provable, then proving a statement doesn’t actually inform us whether it is true.¹⁹ The
mathematical statement T will be of the form “a certain Turing machine halts on a certain
input.”
We show that if the above were false, then the Halting Problem would be decidable, a
contradiction. If the above is false, then for all mathematical statements T , both of the
following hold:
1. at least one of T or ¬T is provable, and

2. if T is provable, then it is true.

¹⁹Upon hearing that a mathematical system is “useless” if it gets a statement wrong, one might object, “That’s a bit harsh... sure, there’s this one false statement T that is provable, so the mathematical system misleads us on T, but maybe the system gets every other statement T′ ≠ T correct, in the sense of being able to prove T′ only if T′ is actually true, so the mathematical system is mostly useful despite getting one statement wrong.” It turns out that, with a slight adjustment of the definition of “wrong”, this rosy scenario is impossible: if the system gets one statement wrong, then it gets them all wrong.
The stronger statement of Gödel’s Incompleteness Theorem (actually proven by someone named Rosser) replaces the second statement “T is false but provable” (a condition called unsoundness) with “T and ¬T are both provable” (a condition called inconsistency). Then, if the system were inconsistent, it could prove every statement S! (including every statement S and its negation ¬S, just like T and ¬T). In other words, if there’s even one statement T where the mathematical theory “can’t decide” whether T or ¬T holds, then in fact it can’t make a decision on any other statement either. To see this, first convince yourself that ¬T =⇒ (T =⇒ S) is a logical tautology, i.e., holds for every pair of statements T and S. (Just write out the truth table for the Boolean formula, for all 4 possible settings of true and false to T and S.) Then, let S be an arbitrary statement and suppose T and ¬T are both provable. Apply modus ponens twice: since the hypothesis ¬T is provable, then the conclusion (T =⇒ S) is provable, and since the hypothesis T is provable, then the conclusion S is provable.
To solve the halting problem on instance ⟨M, w⟩, we begin enumerating all the proofs there are. This is a key property of any reasonable mathematical system: one must be able to enumerate the proofs, for instance in length-lexicographical order (formal proofs are, after all, simply finite strings). Upon encountering each new proof, check whether it is a proof of the statement T = “M(w) halts”, or a proof of the statement ¬T = “M(w) does not halt”. (Most of the enumerated proofs will be neither, because they prove something else, such as “3 < 4” or “P ≠ NP”.)
By (1) at least one of T or ¬T has a proof, so the search for a proof of either T or ¬T will eventually terminate. By (2), whichever proof is found is of a true statement, so if we accept or reject ⟨M, w⟩ according to which statement it proved, the answer will be correct. But this would decide the halting problem, a contradiction.
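
The shape of this argument can be sketched in Python, with the loud caveat that both helpers are hypothetical: all_proofs() would enumerate all formal proofs (say, in length-lexicographical order) and proves(p, s) would check whether proof p proves statement s; neither is real code here:

    def halting_decider(M, w):
        # would decide halting if the hypothetical helpers existed
        # and conditions (1) and (2) above held
        T = "M(w) halts"              # informal stand-in for the formal statement
        not_T = "M(w) does not halt"
        for proof in all_proofs():    # hypothetical proof enumerator
            if proves(proof, T):      # hypothetical proof checker
                return True
            if proves(proof, not_T):
                return False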
So there is Gödel’s Incompleteness Theorem (slightly weakened from the full version, and
still a bit informal since we didn’t get into formal definitions of proof or formal mathematical
system): so long as it is impossible to prove false statements—certainly a requirement of any
reasonable system of mathematics—unfortunately it is also impossible to prove some true
statements. In other words, our system for proving mathematical statements is necessarily
incomplete and unable to help us determine the truth of all possible statements.

11.7.2 Prediction of physical systems


TODO: write up some self-assembly/chemical reaction network prediction problems that are undecidable

end of lecture 10c


Appendix A

Reading Python

A.1 Installing Python


To install basic Python, go here: https://www.python.org/downloads/. I prefer a distri-
bution of Python called Anaconda, which comes with extra nice tools such as the “notebook”
discussed below: https://www.anaconda.com/download/.

A.2 Code from this book


The Python code in this book can be found in a Jupyter notebook:
https://github.com/dave-doty/UC-Davis-ECS120/blob/master/120.ipynb
If you just want to read the code without running it, you can look at it directly on GitHub
by clicking on the link above.
If you want to try running it yourself, first download the 120.ipynb file above. (Go to
https://github.com/dave-doty/UC-Davis-ECS120, right/option-click the file 120.ipynb,
and select “Save link as...”/“Download linked file as...”)
If you have Jupyter installed (there are a few ways to do this; I recommend Ana-
conda: https://www.continuum.io/downloads), open a command line in the directory
where 120.ipynb is stored and type jupyter notebook 120.ipynb
If you don’t have Jupyter installed, go to https://try.jupyter.org/, click the “upload”
button, and choose the 120.ipynb file from your local computer. Once it’s uploaded, click
on it in your browser.
The code is formatted for Python 3, which has slightly different syntax and semantics
than Python 2. If you have Python 2, a few things have to be changed for it to run; these
are marked with comments of the form
# for python 2 include the next line
above a line that needs to be uncommented. Similarly, some lines preceded by
# for python 3 include the next line
should be commented out.


A.3 Tutorial on reading Python


This section provides a short tutorial for experienced programmers on how to read the Python
code presented in this book. It is not a tutorial on how to program, or on how to write
Python. Instead, it assumes that the reader already has some familiarity with an imperative
programming language such as C, C++, C#, Java, Javascript, Go, Rust, Ruby, etc. (i.e., has
taken a programming course using such a language), and that the reader is able to follow
basic pseudocode conventions such as those used in any undergraduate algorithms/data
structures course, using generic programming constructs such as if statements, while/for
loops, assignment statements, function calls, basic data structures such as lists, arrays, sets,
dictionaries (a.k.a. key-value maps, hash tables).
The point of this section is to explain those specific aspects of Python that may be
unfamiliar, just well enough to be able to read the code examples given in this book. But
Python is called “executable pseudocode” for a reason: it is generally quite easy to read.
If you don’t already have Python installed, you can use https://repl.it/languages/
python3 or https://jupyter.org/try to try out snippets of Python code to see what it
does. If you aren’t sure what some code does, don’t guess! Try running it to see.

Variables. Python has no explicit declaration of variables. A variable is created by assign-


ing a value to it for the first time:
    a = 3     # creates variable a and assigns it value 3
    print(a)
    print(b)  # causes error because variable b does not yet exist
    b = 4

Strings. Python strings can be created with double quotes as in C++ or Java:
1 s = " This is a string . "
There is no separate char type. When you would use a char in C/C++/Java, in Python
you simply use a string of length 1.
Strings can also be created with single-quotes, which is convenient because double quotes
can be used inside of single-quoted strings, and vice versa, without escaping them with a
backslash:
    s = 'These "double quotes" do not need backslash escaping.'
    s2 = "These \"double quotes\" need backslash escaping."
    s3 = "It 'goes the other way' too."
Finally, Python has multiline strings, which start and end with three quotes instead of just
one, that allow newlines. Both types of quote can be used in them without escapes:
1 s = """ This is the " first " line .
2 This is the ’ second ’ line . """
3 s2 = ’’’ This is the " first " line .
4 This is the ’ second ’ line . ’’’

Without triple quotes, you would need to use "\n" to insert newlines as in C/C++/Java,
and would need to escape some quotes:
1 s3 = " This is the \" first \" line .\ nThis is the ’ second ’ line . "

Indentation. Like pseudocode, blocks of Python code are denoted by indentation, not curly
braces as in C, C++, or Java. Consider the following code, which prints only c to the screen:
    if 3 == 4:
        print("a")
        print("b")
    print("c")

However, the following code prints b followed by c.

    if 3 == 4:
        print("a")
    print("b")
    print("c")

Documentation. Often the first line of a function will be a multiline string not assigned
to any variable. This is called a docstring and is interpreted as a comment documenting the
function:
    def square(x):
        """Compute the square of x.

        For example, square(5) == 25 and square(3) == 9."""
        return x * x
Since the docstring is not assigned to any variable, it has no effect on the program. Docstrings
are used by certain tools to automatically generate documentation for Python libraries:
https://wiki.python.org/moin/DocumentationTools

for loops. The code for i in range(n): in Python is similar to for (int i=0; i<n; i++)
in C/C++/Java: the loop has n iterations, letting i take on the values 0, 1, 2, . . ., n − 1.
So the following code prints the numbers from 0 to 9:
    for i in range(10):
        print(i)
But, many times in C/C++ when you would use a loop like for (int i=0; i<n; i++),
it would be to iterate over elements of an array or other data structure, i.e.,
    int arr[] = {2,3,5,7,11};
    for (int i=0; i<5; i++) {
        int num = arr[i];
        printf("The current integer is %d", num);
    }

Python has a special syntax for this sort of loop, discussed below, which executes one
iteration for every element of a container. This is better to use than to loop over the indices,
because it is more readable, and because it eliminates the possible error that the index i is
out of bounds. For instance, if the above loop instead were for (int i=0; i<=5; i++), i would go off the end of the array.

lists and sets. v = [1,2,3,4,5] makes a list with 5 elements. A Python list is like an
array in C/C++/Java, but actually more like std::vector in C++ or java.util.ArrayList
in Java, because Python lists can grow and shrink. A list can have duplicates such as
d = [1,2,3,2,3], but a set cannot.
s = set(d) creates the set {1, 2, 3}, eliminating the duplicates in d. Another way to
create this set is the line s = {1,2,3}. Tuples such as (2,3,4) are lists that cannot be
modified (for example, l[0] = 5 works for a list l, but t[0] = 5 fails for a tuple t).
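For instance (a small illustration):

    v = [1, 2, 3, 4, 5]    # a list of 5 elements
    d = [1, 2, 3, 2, 3]    # lists may contain duplicates
    s = set(d)             # the set {1, 2, 3}; duplicates removed
    t = (2, 3, 4)          # a tuple: like a list, but cannot be modified
    v[0] = 5               # fine: lists are mutable
    # t[0] = 5             # raises TypeError: tuples are immutable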
You can iterate over all elements of a list/tuple/set with a for loop using the in keyword:1
1 for s in [ " a " ," b " ," cd " ]:
2 print (4* s )
prints "aaaa", "bbbb", and "cdcdcdcd" to the screen.

list/set comprehensions. These are one of the most useful features and one of the best
reasons to familiarize yourself with mathematical set builder notation. Many clunky lines of
code can be replaced by a simple line that expresses the same idea. Recall that range(n) is
(something like) a list with the integers from 0 to n − 1.2 The following code creates various
other lists/sets from it:
    ints = range(10)                        # range with elements 0,1,2,3,4,5,6,7,8,9
    a = [3*n for n in ints]                 # [0,3,6,9,12,15,18,21,24,27]
    b = [n for n in ints if n % 2 == 0]     # [0,2,4,6,8]
    c = [n//3 for n in ints]                # [0,0,0,1,1,1,2,2,2,3]
    d = {n//3 for n in ints}                # {0,1,2,3}
    e = {3*n for n in ints if n % 2 == 0}   # {0,6,12,18,24}
a, b, and c are lists, but d and e are sets, so cannot have duplicates.
In general, if we have some expression <expr> (such as n//3) and optionally some Boolean
expression <phi> (such as n%2==0), the following code:
    new_lst = [<expr> for n in lst if <phi>]

is equivalent to

    new_lst = []
    for n in lst:
        if <phi>:
            new_lst.append(<expr>)
¹The latest C++ standard now supports a similar idea called a “range-based for loop”: http://en.cppreference.com/w/cpp/language/range-for.
²Technically it returns something called a “range” type in Python 3. The main difference with a list is that the numbers don’t get stored in memory, but we can iterate over a range just like a list, and the numbers will be generated as we need them. In some languages this is called “lazy evaluation”.

Omitting the if expression means all of the elements are added, i.e.,

    new_lst = [<expr> for n in lst]

is equivalent to

    new_lst = []
    for n in lst:
        new_lst.append(<expr>)

Sets may be similarly constructed:

    new_set = {<expr> for n in lst if <phi>}

is equivalent to

    new_set = set()  # need to use set() constructor; {} makes an empty dict, not a set
    for n in lst:
        if <phi>:
            new_set.add(<expr>)

For example, s = {3*n for n in range(10) if n%2==0} is equivalent to

    s = set()
    for n in range(10):
        if n%2==0:
            s.add(3*n)
They both create the set {0, 6, 12, 18, 24}. Note the similarity to the mathematical
set-builder notation s = {3n | n ∈ {0, 1, . . . , 9}, n is even}.
The big advantage of list/set comprehension notation isn’t so much that you have to
type fewer keystrokes (although you do save a few). The main advantage is readability. The
notation clearly communicates to anyone fluent in Python what is the purpose of the line of
code: to take items from a list/set,³ perhaps filter some of them out with the if keyword,
and process the rest using the expression at the beginning.
You may have experience with functional programming, with map and filter keywords
to transform lists. The expression <expr> above is like an anonymous function used with
map, and the Boolean expression <phi> is like an anonymous Boolean function used with
filter. In fact, Python also has the keywords map and filter, and they work the same way.
However, it is conventional to prefer a list comprehension to map and/or filter, since it is
usually more readable. Similarly, reduce (also called foldl/foldr in functional languages)
exists in Python (in functools in Python 3), but an explicit for loop is usually more readable.
See

• http://www.artima.com/weblogs/viewpost.jsp?thread=98196

• https://google.github.io/styleguide/pyguide.html

• https://stackoverflow.com/questions/5426754/google-python-style-guide
³More generally, any object that can be “iterated over”: lists, sets, tuples, strings, and some other objects.

itertools. The itertools package is very useful for checking the various kinds of substructures of data structures that are common in algorithms. For example, itertools.combinations(x,k) takes all subsequences of length k from x. For example, this code:

    import itertools
    x = [2,3,5]
    for t in itertools.combinations(x, 2):
        print(t)
prints all the ordered pairs of elements from x, in the order they originally appear:
(2, 3)
(2, 5)
(3, 5)
To make the code easier to read, we can also import only the function combinations, and
change its name to something else (since we use it to get subsets of a fixed size, even though
for convenience we often use Python lists to represent sets):
    from itertools import combinations as subsets
    x = [2,3,5]
    for t in subsets(x, 2):
        print(t)
The above code is more “literate”; it’s straightforward to read the for loop in English as
“for all t that are subsets of x of size 2.”
The following code:
    from itertools import combinations as subsets
    x = [1,2,3,4,5]
    for t in subsets(x, 3):
        print(t)
prints all ordered triples:
(1, 2, 3)
(1, 2, 4)
(1, 2, 5)
(1, 3, 4)
(1, 3, 5)
(1, 4, 5)
(2, 3, 4)
(2, 3, 5)
(2, 4, 5)
(3, 4, 5)
Strings such as "abc" are not technically lists of characters (in fact Python has no char
data type; individual characters are the same thing as strings of length 1), but many of the
same functions that work on a list also work on a string, as though it were a list of length-1
strings. For example:

    from itertools import combinations as subsets
    x = "abc"
    for t in subsets(x, 2):
        print(t)
prints
(’a’, ’b’)
(’a’, ’c’)
(’b’, ’c’)
