Computation Book
Making Connections
Jim Hefferon
https://hefferon.net/computation
Notation summary
Notation Description
P (𝑆) power set, collection of all subsets of 𝑆
𝑆c complement of the set 𝑆
1𝑆 characteristic function of the set 𝑆
⟨𝑎 0, 𝑎 1, ... ⟩ sequence
N, Z, Q, R natural numbers { 0, 1, ... }, integers, rationals, reals
a, b, . . . 0, 1 character (note the typeface)
Σ alphabet, set of characters
B alphabet of bits characters { 0, 1 }, or set of bits { 0, 1 }
𝜎, 𝜏 strings (any lower-case Greek letter except 𝜙 )
𝜀 empty string
Σ∗ set of all strings over the alphabet
L language, a subset of Σ∗
P Turing machine
𝜙 function computed by a Turing machine
𝜙 (𝑥)↓, 𝜙 (𝑥)↑ function converges on that input, or diverges
G graph
M Finite State machine
O (𝑓 ) order of growth of the function
C complexity class
Prob problem
V verifier for an NP language
Research into learning shows that content is best learned within context
. . . , when the learner is active, and that above all, when the learner can
actively construct knowledge by developing meaning and ‘layered’
understanding.
– A W (Tony) Bates, TEACHING IN A DIGITAL AGE
Jim Hefferon
Jericho, VT USA
University of Vermont
hefferon.net
Version 1.11, 2024-Oct-12
Contents
I Mechanical Computation 3
1 Turing machines . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
Computable functions . . . . . . . . . . . . . . . . . . . . . . . . 9
2 Church’s Thesis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
History . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
Evidence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
What it does not say . . . . . . . . . . . . . . . . . . . . . . . . . 17
An empirical question? . . . . . . . . . . . . . . . . . . . . . . . 17
Using Church’s Thesis . . . . . . . . . . . . . . . . . . . . . . . . 18
3 Recursion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
Primitive recursion . . . . . . . . . . . . . . . . . . . . . . . . . 21
4 General recursion . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
Ackermann functions . . . . . . . . . . . . . . . . . . . . . . . . 31
𝜇 recursion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
A Turing machine simulator . . . . . . . . . . . . . . . . . . . . . . . 38
B Hardware . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
C Game of Life . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
D Ackermann’s function is not primitive recursive . . . . . . . . . . . . 47
E LOOP programs . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
II Background 59
1 Infinity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
Cardinality . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
2 Cantor’s correspondence . . . . . . . . . . . . . . . . . . . . . . . . 66
3 Diagonalization . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74
Diagonalization . . . . . . . . . . . . . . . . . . . . . . . . . . . 74
4 Universality . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80
Universal Turing machine . . . . . . . . . . . . . . . . . . . . . . 81
Uniformity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83
Parametrization . . . . . . . . . . . . . . . . . . . . . . . . . . . 84
5 The Halting problem . . . . . . . . . . . . . . . . . . . . . . . . . . 89
Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89
General unsolvability . . . . . . . . . . . . . . . . . . . . . . . . 91
Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 94
6 Rice’s Theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 100
7 Computably enumerable sets . . . . . . . . . . . . . . . . . . . . . . 106
8 Oracles . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111
Jumping . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 115
9 Fixed point theorem . . . . . . . . . . . . . . . . . . . . . . . . . . 119
When diagonalization fails . . . . . . . . . . . . . . . . . . . . . 120
Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 122
A Hilbert’s Hotel . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 126
B Unsolvability in intellectual culture . . . . . . . . . . . . . . . . . . 127
C Self Reproduction . . . . . . . . . . . . . . . . . . . . . . . . . . . 129
D Busy Beaver . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 133
E Cantor in code . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 136
IV Automata 179
1 Finite State machines . . . . . . . . . . . . . . . . . . . . . . . . . 179
Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 179
2 Nondeterminism . . . . . . . . . . . . . . . . . . . . . . . . . . . . 189
Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 190
Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 192
𝜀 transitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 194
Equivalence of the machine types . . . . . . . . . . . . . . . . . . 198
3 Regular expressions . . . . . . . . . . . . . . . . . . . . . . . . . . 204
Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 204
Kleene’s Theorem . . . . . . . . . . . . . . . . . . . . . . . . . . 206
4 Regular languages . . . . . . . . . . . . . . . . . . . . . . . . . . . 214
Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 214
Closure properties . . . . . . . . . . . . . . . . . . . . . . . . . . 214
5 Non-regular languages . . . . . . . . . . . . . . . . . . . . . . . . . 220
6 Pushdown machines . . . . . . . . . . . . . . . . . . . . . . . . . . 226
Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 227
A Regular expressions in the wild . . . . . . . . . . . . . . . . . . . . 234
B The Myhill-Nerode theorem . . . . . . . . . . . . . . . . . . . . . . 242
C Machine minimization . . . . . . . . . . . . . . . . . . . . . . . . . 249
Appendix 369
A Strings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 370
B Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 372
C Propositional logic . . . . . . . . . . . . . . . . . . . . . . . . . . . 377
Notes 382
Bibliography 417
Part One
Classical Computability
Chapter
I Mechanical Computation
What can be computed? For instance, the function that doubles its input, that
takes in 𝑥 and puts out 2𝑥 , is intuitively mechanically computable. We shall call
such functions effective.
The question asks what things can be computed more than it asks how to
compute them. In this Part we will be more interested in the function, in
the input-output behavior, than in the details of implementing that behavior.
Section
I.1 Turing machines
Despite this desire to downplay implementation, we follow the approach of
A Turing that the first step toward defining the set of computable
functions is to reflect on the details of what mechanisms can do.
The context of Turing’s thinking was the Entscheidungsproblem,†
proposed in 1928 by D Hilbert and W Ackermann, which asks for an
algorithm that decides, after taking as input a mathematical state-
ment, whether that statement is true or false. So he considered the
kind of symbol-manipulating computation familiar in mathematics,
such as when we expand nested brackets or verify a step in a plane
geometry proof.
After reflecting on it for a while, one day after a run‡ Turing lay
down in the grass and imagined a clerk doing by-hand multiplication
with a sheet of paper that gradually becomes covered with columns
of numbers. With this as a prototype, Turing posited conditions for
the computing agent.
First, it (or he or she) has a memory facility, such as the clerk’s
paper, where it can put information for later retrieval.
[Photo: Alan Turing, 1912–1954]
Second, the computing agent must follow a definite procedure, a
precise set of instructions with no room for creative leaps. Part of what makes the
procedure definite is that the instructions don’t involve random methods, such as
counting clicks from radioactive decay to determine which of two possibilities to
perform.
The other thing making the procedure definite is that the agent is discrete —
it does not use continuous methods or analog devices. Thus there is no question
about the precision of operations as there might be when reading results off of a
[Image: copyright Kevin Twomey, http://kevintwomey.com/lowtech.html]
† German for “decision problem.” Pronounced en-SHY-dungs-prob-lem.
‡ He was a serious candidate for the 1948 British Olympic marathon team.
slide rule or an instrument dial. In line with this, the agent works in a step-by-step
fashion. If needed they could pause between steps, note where they are (“about to
carry a 1”), and pick up again later. We say that at each moment the clerk is in one
of a finite set of possible states, which we denote 𝑞 0 , 𝑞 1 , . . .
Turing’s third condition arose because he wanted to investigate what is com-
putable in principle. He therefore imposed no upper bound on the amount of
available memory. More precisely, he imposed no finite upper bound — should
a calculation threaten to run out of storage space then more is provided. This
includes imposing no upper bound on the amount of memory available for inputs or
for outputs and no bound on the amount of extra storage, scratch memory, needed
in addition to that for inputs and outputs.† He similarly put no upper bound on
the number of instructions. And, he left unbounded the number of steps that a
computation performs before it finishes.‡
The final question Turing faced is: how smart is the computing agent? For
instance, can it multiply? We don’t need to include a special facility for multiplica-
tion because we can in principle multiply via repeated addition. We don’t even
need addition because we can repeat the add-one operation. In this way Turing
pared the computing agent down until it is quite basic, quite easy to understand,
until the operations are so elementary that we cannot easily imagine them further
divided, while still keeping that agent powerful enough to do anything that can in
principle be done.
The tape is the memory, sometimes called the ‘store’. The box can read from it
and write to it, one character at a time, as well as move a read/write head relative
to the tape in either direction. Thus, to multiply, the computing agent can start by
reading the two input multiplicands from the tape (the drawing shows 74 and 72
in binary, separated by a blank), can use the tape for scratch work, and can halt
with the output written on the tape.
The box is the computing agent, the CPU, sometimes called the ‘control’.
† True, every existing physical computer has bounded memory, putting aside storing things in the Cloud.
However, that space is extremely large. In this Part, when working with the model devices, imposing
a bound on memory is a hindrance or at best irrelevant.
‡ Some authors describe the availability of resources such as the amount of memory as ‘infinite’.
Turing himself does this. A reader may object
that this violates the goal of the definition, to model in-principle-physically-realizable computations,
and so the development here instead says that the resources have no finite upper bound. But really, it
doesn’t matter. In both cases the point is that if something cannot be computed when there are no
bounds then it cannot be computed on any real-world device.
The Start button sets the computation going. When the computation is finished the
Halt light comes on. The engineering inside the box is not important — perhaps
like the machines that we are used to it has integrated circuits, or perhaps it has
gears and levers, or perhaps LEGO’s — what matters is that each of its finitely many
parts can only be in finitely many states. If it has chips then each register has a
finite number of possible values, while if it is made with gears or bricks then each
settles in only a finite number of possible positions. Thus, however it is made, in
total the box has only finitely many states.
While executing a calculation, the mechanism steps from state to state. For
instance, an agent doing multiplication may determine, because of what state it is
in now and because of what it is reading on the tape, that they next need to carry
a 1. The agent transitions to a new state, one whose intuitive meaning is that it is
where carries take place.
Consequently, machine steps involve four pieces of information. Call the present
state 𝑞𝑝 and the next state 𝑞𝑛 . The symbol that the read/write head is presently
pointing to is 𝑇𝑝 . Finally, the next tape action is 𝑇𝑛 . Possible actions are: moving the
tape head left or right without writing, which we denote with 𝑇𝑛 = L or 𝑇𝑛 = R,†
or writing a symbol to the tape without moving the head, which we denote with
that symbol, so that 𝑇𝑛 = 1 means the machine will write a 1 to the tape. As to the
set of characters that can go on the tape, we will choose whatever is convenient
for the job we are doing. However, every tape is blank in all but finitely many
places, and so blank must be one of the symbols. (We denote blank with B when an
empty space could cause confusion.)
The four-tuple 𝑞𝑝𝑇𝑝𝑇𝑛𝑞𝑛 is an instruction. For example, the instruction 𝑞 3 1B𝑞 5
is executed only if the machine is now in state 𝑞 3 and is reading a 1 on the tape. If
so, the machine writes a blank to the tape, replacing the 1, and passes to state 𝑞 5 .
1.1 Example This Turing machine with the tape symbol set Σ = { B, 1 } has six
instructions.
Ppred = {𝑞 0 BL𝑞 1, 𝑞 0 1R𝑞 0, 𝑞 1 BL𝑞 2, 𝑞 1 1B𝑞 1, 𝑞 2 BR𝑞 3, 𝑞 2 1L𝑞 2 }
We adopt the convention that when we press Start the machine is in state 𝑞 0 . The
picture above shows the machine reading 1, so instruction 𝑞 0 1R𝑞 0 applies. Thus the
first step is that the machine moves its tape head right and stays in state 𝑞 0 . The
first line of the following table shows this and later lines show the configurations
after later steps. Briefly, the head slides to the right, blanks out the final 1, and
slides back to the start.
† Whether we move the tape or the head doesn’t matter; what matters is their relative motion. Thus
𝑇𝑛 = L means that either the tape or the head moves so that the head now points one place to the left.
In drawings we hold the tape steady and move the head because the graphics are easier to read.
[Trace table: the graphics for steps 2 through 9 show the head sliding right past the 1’s, blanking the final 1, and sliding back, halting in state 𝑞3.]
Interpreted on natural numbers represented in unary, the machine Ppred computes the predecessor function.
pred(𝑥) = 𝑥 − 1 if 𝑥 > 0, and pred(𝑥) = 0 otherwise
If the machine’s initial tape is entirely blank except for 𝑛 -many consecutive 1’s and
the read/write head points to the leftmost of those 1’s, then when the machine
halts the tape will have 𝑛 − 1-many 1’s. The only exception is where the tape starts
with 0-many 1’s, and there the tape will end with 0-many 1’s.
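The trace just described can be reproduced in a few lines of code. This is a sketch of our own (the book presents its own simulator in the Part One appendix), storing the six instructions of Ppred as a Python dictionary and the tape as a map from cell index to character:

```python
# A small sketch (not the book's code) simulating the machine Ppred.
# Instructions q_p T_p T_n q_n are stored as {(q_p, T_p): (T_n, q_n)}.
PRED = {
    ('q0', 'B'): ('L', 'q1'), ('q0', '1'): ('R', 'q0'),
    ('q1', 'B'): ('L', 'q2'), ('q1', '1'): ('B', 'q1'),
    ('q2', 'B'): ('R', 'q3'), ('q2', '1'): ('L', 'q2'),
}

def run(machine, tape_string):
    """Load the input, run until no instruction applies, return the 1's left."""
    tape = {i: c for i, c in enumerate(tape_string)}
    state, head = 'q0', 0
    while (state, tape.get(head, 'B')) in machine:
        action, state = machine[(state, tape.get(head, 'B'))]
        if action == 'L':
            head -= 1
        elif action == 'R':
            head += 1
        else:                      # write a symbol; the head stays put
            tape[head] = action
    return ''.join(c for c in tape.values() if c == '1')

print(run(PRED, '111'))   # three 1's in, two 1's out: pred(3) = 2
print(run(PRED, ''))      # pred(0) = 0: the tape stays blank
```

Running it on a tape of three 1’s reproduces the trace: the head slides right, blanks the final 1, and returns, leaving two 1’s.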
1.2 Example We can think of this machine with tape alphabet Σ = { B, 1 } as adding
two natural numbers.
The input numbers are represented by two strings of 1’s, separated with a blank.
The read/write head starts under the first symbol in the first number. This shows
the machine ready to compute 2 + 3.
[Tape graphic: 11 B 111 with the read/write head under the first 1, in state 𝑞0.]
We adopt the convention that this is the configuration at step 0. Now the machine
scans right, looking for the blank separator. It changes that into a 1, then scans
left until it finds the start. Finally, it trims off a 1 and halts with the read/write
head pointing to the start of the string. Here are the steps.
[Trace table: the graphics for steps 2 through 12 show the head scanning right, converting the separator blank to a 1, scanning back to the left end, blanking the first 1, and halting in state 𝑞5 at the start of the five remaining 1’s.]
We can also present a machine with a table for its transition function, or with a
graph whose edge labels give the symbol read and the action taken. Here are the
tables for the two machines; a dash marks a pair where no instruction applies.

Δpred   B      1
𝑞0      L𝑞1    R𝑞0
𝑞1      L𝑞2    B𝑞1
𝑞2      R𝑞3    L𝑞2
𝑞3      –      –

Δadd    B      1
𝑞0      B𝑞1    R𝑞0
𝑞1      1𝑞1    1𝑞2
𝑞2      B𝑞3    L𝑞2
𝑞3      R𝑞3    B𝑞4
𝑞4      R𝑞5    1𝑞5
𝑞5      –      –

[State-transition graphs for the two machines, with states 𝑞0–𝑞3 and 𝑞0–𝑞5 and edges labeled like B, L for read-blank-then-move-left, are omitted.]
A graph is how we will most often present a machine that is small, but if there
are lots of states then it can be visually confusing.
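As a check, the Δadd table can be run directly. This sketch (our code, not the book’s) simulates it on the 2 + 3 tape of Example 1.2 and confirms that it halts in state 𝑞5 after twelve steps with five 1’s on the tape:

```python
# Sketch (not from the book): run the Delta_add transition table on 2 + 3.
DELTA_ADD = {
    ('q0', 'B'): ('B', 'q1'), ('q0', '1'): ('R', 'q0'),
    ('q1', 'B'): ('1', 'q1'), ('q1', '1'): ('1', 'q2'),
    ('q2', 'B'): ('B', 'q3'), ('q2', '1'): ('L', 'q2'),
    ('q3', 'B'): ('R', 'q3'), ('q3', '1'): ('B', 'q4'),
    ('q4', 'B'): ('R', 'q5'), ('q4', '1'): ('1', 'q5'),
}

def simulate(delta, tape_string):
    """Run from state q0 until no instruction applies; report state, steps, 1's."""
    tape = {i: c for i, c in enumerate(tape_string)}
    state, head, steps = 'q0', 0, 0
    while (state, tape.get(head, 'B')) in delta:
        action, state = delta[(state, tape.get(head, 'B'))]
        if action == 'L':
            head -= 1
        elif action == 'R':
            head += 1
        else:
            tape[head] = action
        steps += 1
    ones = sum(1 for c in tape.values() if c == '1')
    return state, steps, ones

print(simulate(DELTA_ADD, '11B111'))   # ('q5', 12, 5): 2 + 3 = 5
```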
Next, a crucial observation. Some Turing machines, for at least some starting
configurations, never halt.
1.3 Example The machine Pinf loop = {𝑞 0 BB𝑞 0, 𝑞 0 11𝑞 0 } never halts, regardless of the
input.
[State graph: a single state 𝑞0 with two self-loops, labeled B, B and 1, 1.]
The exercises ask for examples of Turing machines that halt on some inputs and
not on others.
High time for definitions. We take a symbol to be something that the device
can write and read, for storage and retrieval.†
1.4 Definition A Turing machine P is a finite set of four-tuple instructions 𝑞𝑝𝑇𝑝𝑇𝑛𝑞𝑛 .‡
In an instruction, the present state 𝑞𝑝 and next state 𝑞𝑛 are elements of a set
of states 𝑄 . The input symbol or current symbol 𝑇𝑝 is an element of the tape
alphabet set Σ, which contains at least two members including one called blank
(and does not contain L or R). The action symbol 𝑇𝑛 is an element of the action
set Σ ∪ { L, R }.
The set P must be deterministic: different four-tuples cannot begin with the
same 𝑞𝑝𝑇𝑝 . Thus, over the set of instructions 𝑞𝑝𝑇𝑝𝑇𝑛𝑞𝑛 ∈ P, the association of
present pair 𝑞𝑝𝑇𝑝 with next pair 𝑇𝑛𝑞𝑛 defines a function, the transition function
or next-state function Δ : 𝑄 × Σ → (Σ ∪ { L, R }) × 𝑄 .
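The determinism condition is a purely syntactic check: no two four-tuples may share the same present pair 𝑞𝑝𝑇𝑝. A sketch in Python (the names are our own, not the book’s):

```python
def is_deterministic(instructions):
    """True when no two four-tuples (q_p, T_p, T_n, q_n) share a present pair."""
    present_pairs = [(qp, tp) for qp, tp, tn, qn in instructions]
    return len(present_pairs) == len(set(present_pairs))

def transition_function(instructions):
    """Turn a deterministic instruction set into the map (q_p, T_p) -> (T_n, q_n)."""
    assert is_deterministic(instructions)
    return {(qp, tp): (tn, qn) for qp, tp, tn, qn in instructions}

# The six instructions of Ppred from Example 1.1.
P_PRED = [('q0', 'B', 'L', 'q1'), ('q0', '1', 'R', 'q0'),
          ('q1', 'B', 'L', 'q2'), ('q1', '1', 'B', 'q1'),
          ('q2', 'B', 'R', 'q3'), ('q2', '1', 'L', 'q2')]
print(is_deterministic(P_PRED))                             # True
print(is_deterministic(P_PRED + [('q0', '1', 'L', 'q2')]))  # clash on (q0, 1): False
```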
Of course, the point of these machines is what they do. To finish the formaliza-
tion we now give a complete description of a machine’s action.
In tracing through Example 1.1 and Example 1.2 we saw that a Turing machine
acts by governing the transitions as that machine moves step by step. A configuration
of a Turing machine is a four-tuple ⟨𝑞, 𝑠, 𝜏𝐿 , 𝜏𝑅 ⟩ , where 𝑞 is a state, a member
of 𝑄 , 𝑠 is a character from the tape alphabet Σ, and 𝜏𝐿 and 𝜏𝑅 are strings from Σ∗,
including possibly the empty string 𝜀 . These signify the current state, the character
under the read/write head, and the tape contents to the left and right of the head.
For instance, in the trace table of Example 1.2, the ‘Step 2’ line shows that after
two transitions the state is 𝑞 = 𝑞 0 , the character under the head is the blank 𝑠 = B,
to the left of the head is 𝜏𝐿 = 11, and to the right is 𝜏𝑅 = 111. Thus the graphic
on that line pictures the configuration ⟨𝑞 0, B, 11, 111⟩ . That is, a configuration is a
snapshot, an instant in a computation.
We write C (𝑡) for the machine’s configuration after the 𝑡 -th transition and say
that this is the configuration at step 𝑡 . We extend that to step 0 by saying that the
initial configuration C ( 0) is the machine’s configuration before we press Start.
Then to define the action: suppose that at step 𝑡 the machine P is in configuration
C (𝑡) = ⟨𝑞, 𝑠, 𝜏𝐿 , 𝜏𝑅 ⟩ . To make the next transition, look for an instruction 𝑞𝑝𝑇𝑝𝑇𝑛 𝑞𝑛 ∈
P with 𝑞𝑝 = 𝑞 and 𝑇𝑝 = 𝑠 . The condition of determinism ensures that the set P
has at most one such instruction. If there is no such instruction then at step 𝑡 + 1
the machine P halts.
Otherwise, there are three possibilities. (1) If 𝑇𝑛 is a symbol in the tape alphabet
set Σ then the machine writes that symbol to the tape, so that the next configuration
† How the device does this depends on its construction details. It could read and write marks on a
paper tape, align magnetic particles on a plastic tape, twiddle bits on a solid state drive, or it could
push LEGO bricks to the left or right side of a slot. Discreteness ensures that the machine can cleanly
distinguish between the symbols, in contrast with the trouble that can happen, for instance, in reading
an instrument dial near a boundary.
‡ We denote a Turing machine with a P because although these
machines are hardware, the things from everyday experience that they are most like are programs.
is C (𝑡 + 1) = ⟨𝑞𝑛 ,𝑇𝑛 , 𝜏𝐿 , 𝜏𝑅 ⟩ . (2) If 𝑇𝑛 = L then the machine moves the tape head
to the left. So the next configuration is C (𝑡 + 1) = ⟨𝑞𝑛 , 𝑠ˆ, 𝜏ˆ𝐿 , 𝜏ˆ𝑅 ⟩ where 𝜏ˆ𝑅 is the
concatenation of the one-character string ⟨𝑠⟩ with 𝜏𝑅 , where if 𝜏𝐿 = 𝜀 then 𝑠ˆ is
the blank and 𝜏ˆ𝐿 = 𝜀 , and otherwise where 𝑠ˆ = 𝜏𝐿 [−1] and 𝜏ˆ𝐿 = 𝜏𝐿 [ : −1] . (3) If
𝑇𝑛 = R then the machine moves the tape head to the right. This is like (2) so we
omit the details.
If two configurations are related by being a step apart then we write C (𝑖) ⊢
C (𝑖 + 1) .† A computation is a sequence C ( 0) ⊢ C ( 1) ⊢ C ( 2) ⊢ · · · . We abbreviate a
sequence of ⊢’s with ⊢∗ .‡ If the computation halts then the sequence has a final
configuration C (ℎ) so we could write a halting computation as C ( 0) ⊢∗ C (ℎ) .
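The case analysis above translates almost line for line into code. This sketch (ours; the names are assumptions, not the book’s) represents a configuration as the four-tuple ⟨𝑞, 𝑠, 𝜏𝐿, 𝜏𝑅⟩ and computes one application of the yields relation ⊢, using the same 𝜏𝐿[−1] and 𝜏𝐿[:−1] slices as the text:

```python
def step(delta, config):
    """One application of the yields relation: C(t) |- C(t+1), or None on halt."""
    q, s, left, right = config
    if (q, s) not in delta:
        return None                      # no instruction applies: the machine halts
    action, q_next = delta[(q, s)]
    if action == 'L':                    # head moves left; s joins the right string
        new_right = s + right
        if left == '':
            return (q_next, 'B', '', new_right)
        return (q_next, left[-1], left[:-1], new_right)
    if action == 'R':                    # mirror image of the L case
        new_left = left + s
        if right == '':
            return (q_next, 'B', new_left, '')
        return (q_next, right[0], new_left, right[1:])
    return (q_next, action, left, right)  # write a symbol; head stays put

DELTA_PRED = {('q0', 'B'): ('L', 'q1'), ('q0', '1'): ('R', 'q0'),
              ('q1', 'B'): ('L', 'q2'), ('q1', '1'): ('B', 'q1'),
              ('q2', 'B'): ('R', 'q3'), ('q2', '1'): ('L', 'q2')}
c = ('q0', '1', '', '11')        # C(0): Ppred loaded with 111
while (n := step(DELTA_PRED, c)) is not None:
    c = n
print(c)                         # halting configuration, in state q3
```

Iterating `step` until it returns `None` produces exactly the halting computation C(0) ⊢∗ C(ℎ).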
1.5 Example In Example 1.1’s table tracing the machine’s steps, the graphics illustrate
the successive configurations. Here is the same sequence as a computation.
[Computation graphic: the tape pictures from that table, linked by ⊢, running from state 𝑞0 to the halting state 𝑞ℎ.]
Finally, as in that example, observe that our description of the action of a Turing
machine emphasizes that it is a state machine — a computation is a sequence of
discrete transitions.
But there are a couple of things that the definition must take care with. First, a
Turing machine may fail to halt on some input strings. Second, just specifying
the input string is not enough since the initial position of the head can change the
computation.
1.6 Definition Let P be a Turing machine with tape alphabet Σ. For input 𝜎 ∈ Σ∗,
placing that on an otherwise blank tape and pointing P ’s read/write head to
𝜎 ’s left-most symbol is loading that input. If we start P with 𝜎 loaded and it
eventually halts then we denote the associated output string as 𝜙 P (𝜎) . If the
machine never halts then 𝜎 has no associated output. The function computed by
the machine P is the set of associations 𝜎 ↦→ 𝜙 P (𝜎) .
† Read ‘⊢’ aloud as “yields.” ‡ Read ‘⊢∗’ aloud as “yields eventually.”
1.7 Definition For 𝜎 ∈ Σ∗, if the value of a Turing machine computation is not
defined on 𝜎 then we say that the function computed by the machine diverges on
that input, written 𝜙 P (𝜎)↑ (or 𝜙 P (𝜎) = ⊥ ). Otherwise we say that it converges,
𝜙 P (𝜎)↓.
Note the difference between the machine P and the function computed by
that machine, 𝜙 P . For example, the machine Ppred is a set of four-tuples but
the predecessor function is a set of input-output pairs, which we might denote
𝑥 ↦→ pred (𝑥) . Another example of the difference is that machines halt or fail to
halt, while functions converge or diverge.
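Divergence is what keeps a simulator honest: it can report that 𝜙P(𝜎)↓ by running the machine to a halt, but it can never positively report 𝜙P(𝜎)↑; the best it can do is give up after a step budget. A sketch under our own conventions (start state 𝑞0, blanks stripped from the output):

```python
def phi(delta, tape_string, budget=10_000):
    """Run at most `budget` steps; return the output string if the machine
    halts (phi converges), or None for 'no answer yet' (maybe divergent)."""
    tape = {i: c for i, c in enumerate(tape_string)}
    state, head = 'q0', 0
    for _ in range(budget):
        symbol = tape.get(head, 'B')
        if (state, symbol) not in delta:          # halted: phi converges
            return ''.join(c for c in tape.values() if c != 'B')
        action, state = delta[(state, symbol)]
        if action == 'L':
            head -= 1
        elif action == 'R':
            head += 1
        else:
            tape[head] = action
    return None                                   # budget exhausted

# Example 1.3's machine never halts, so the budget always runs out.
INF_LOOP = {('q0', 'B'): ('B', 'q0'), ('q0', '1'): ('1', 'q0')}
PRED = {('q0', 'B'): ('L', 'q1'), ('q0', '1'): ('R', 'q0'),
        ('q1', 'B'): ('L', 'q2'), ('q1', '1'): ('B', 'q1'),
        ('q2', 'B'): ('R', 'q3'), ('q2', '1'): ('L', 'q2')}
print(phi(PRED, '111'))      # '11': the machine halts, phi converges
print(phi(INF_LOOP, '1'))    # None: no verdict after the budget
```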
More points: (1) When there is only one machine under discussion then we
write 𝜙 instead of 𝜙 P . (2) In this book we like to write machines so that they
also finish with the head under the first character of the output string, which isn’t
strictly necessary but it makes composing machines easier. (3) In other fields of
mathematics a function comes with a domain, the set of inputs on which it is
defined. In this field the convention is to write 𝜙 : Σ∗ → Σ∗ and describe it as a
partial function, where some 𝑊 ⊆ Σ∗ is the set of input strings 𝜎 such that 𝜙 (𝜎)↓.
If 𝑊 = Σ∗ then 𝜙 is said to be a total function. (Every 𝜙 is partial but saying
‘partial’ usually connotes that the function is not total.)
There is one more point to raise about the definition. We will often consider
a function that isn’t an association of string input and output, and describe it as
computed by a machine. For this we must impose an interpretation on the strings.
For instance, with the predecessor machine in Example 1.1 we took the strings
to represent natural numbers in unary. The same holds for computations with
non-numbers, such as directed graphs, where we also just fix some encoding of
the input and output strings. (We could worry that our interpretation might be so
involved that, as with a horoscope, the work happens in the interpretation. But
we will stick to cases such as the unary representation of numbers where this is
not an issue.) Of course, the same thing happens on physical computers, where
the machine twiddles bitstrings and then we interpret them as characters in a
document, or notes in a quartet, or however we please.
When we describe the function computed by a machine, we typically omit the
part about interpreting the strings. We say, “this shows that 𝜙 ( 3) = 5” rather than,
“this shows that 𝜙 takes a string representing 3 to a string representing 5.” The
details of the representation are usually not of interest in this chapter (in the fifth
chapter we will sometimes worry about the time or space that they consume).
1.8 Remark Early researchers, working before actual machines were widely available,
needed airtight proofs that for instance there is a mechanical computation of the
function that takes in a number and returns the power of 5 in that number’s prime
factorization. So they did the details, building up a large body of work which
could be quite low level.
As an example of low-level detail, in the addition machine Example 1.2 we
took the separator blank to be significant. Allowing significant blanks raises the
issue of ambiguity: which of the blanks on the tape count as input and output and
which do not? We could handle this by adding a character to the alphabet to use
exclusively as a begin/end marker. Or we could enforce that strings come in the
form 𝜎 = 𝛼 B𝜏 where 𝜏 consists of |𝛼 | many 1’s. Or we could code everything with
integers, such as coding the triple ⟨7, 8, 9⟩ as 2⁷3⁸5⁹.
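The prime-power coding in that last convention is easy to make concrete. A sketch (ours): the triple is recovered from the code number by counting how many times each prime divides it.

```python
def encode(triple):
    """Code <a, b, c> as 2^a * 3^b * 5^c."""
    a, b, c = triple
    return 2**a * 3**b * 5**c

def decode(n):
    """Recover the triple by counting how many times each prime divides n."""
    triple = []
    for p in (2, 3, 5):
        k = 0
        while n % p == 0:
            n //= p
            k += 1
        triple.append(k)
    return tuple(triple)

print(encode((7, 8, 9)))          # 2**7 * 3**8 * 5**9
print(decode(encode((7, 8, 9))))  # (7, 8, 9)
```

Because prime factorizations are unique, `decode` is a true inverse of `encode`, so no information is lost in the coding.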
In this book we typically don’t work through these details. Our everyday
experience convinces us that machines can use their alphabet to reasonably
represent anything computable. Besides, spending a great deal of time on these
details risks hiding the underlying ideas, and we want to get to more interesting
material. The next section will say more.
1.9 Definition A computable function, or recursive function,† is one computed by
some Turing machine (it may be a total function or partial). A computable set,
or recursive set, is one whose characteristic function is computable. A Turing
machine decides a set if it computes the characteristic function of that set. A
relation is computable if it is computable as a set.‡
There is a terminology focused on Boolean functions that we will emphasize in
Chapter Five.
1.10 Definition A Turing machine decides a language if for all strings in the language
it halts and accepts (perhaps signaled by ending with just a 1 on the tape) and
for all strings not in the language it halts and rejects (perhaps ending with all
blanks). A Turing machine recognizes a language if for all members of the
language it halts and accepts, while for nonmembers it does not accept (it may
halt and reject, or it may fail to halt).
We close with a summary. We have defined mechanical computation. We
view it as a process whereby a physical system evolves through a sequence of
discrete steps that are local, meaning that all the action takes place within one cell
of the head. This gives us a precise characterization of which functions can be
mechanically computed. The next subsection discusses why this characterization
is widely accepted.
I.1 Exercises
Unless the exercise says otherwise, assume that Σ = { B, 1 }. Also assume that any
machine must start with its head under the leftmost input character and arrange for
it to end with the head under the leftmost output character.
1.11 How is a Turing machine like a program? How is it unlike a program? How
is it like the kind of computer we have on our desks? Unlike?
1.12 Why does the definition of a Turing machine, Definition 1.4, not include a
definition of the tape?
1.13 Your study partner asks you, “The opening paragraphs talk about the Entschei-
dungsproblem, to mechanically determine whether a mathematical statement is
† The term ‘recursive’ used to be universal but is now old-fashioned. ‡ For instance, the relation ‘less
than’ is recursive because there is a recursive function that inputs two integers 𝑎 and 𝑏 and returns 1 if
𝑎 < 𝑏 but otherwise returns 0.
true or false. I write programs with bits like if (x>3) all the time. What’s the
problem?” Help your friend out.
✓ 1.14 Trace each computation, as in Example 1.5. (a) The machine Ppred
from Example 1.1 when starting on a tape with two 1’s. (b) The machine Padd
from Example 1.2 when the addends are 2 and 2. (c) Give the two computations as
configuration sequences, as on page 8.
✓ 1.15 For each of these false statements about Turing machines, briefly explain
the fallacy. (a) Turing machines are not a complete model of computation
because they can’t do negative numbers. (b) The problem with Example 1.3 is
that the instructions don’t have any extra states where the machine goes to halt.
(c) For a machine to reach state 𝑞50 it must run for at least fifty-one steps.
1.16 We often have some states that are halting states, where we send the machine
solely to make it halt. In this case the others are working states. For instance,
Example 1.1 uses 𝑞 3 as a halting state and its working states are 𝑞 0 , 𝑞 1 , and 𝑞 2 .
Name Example 1.2’s halting and working states.
✓ 1.17 Trace the execution of Example 1.3’s Pinf loop for ten steps, from a blank
tape.
1.18 Trace the execution on each input of this Turing machine with alphabet
Σ = { B, 0, 1 } for ten steps, or fewer if it halts.
move to the end of the 1’s, past a blank, and put down two 1’s. Then move
left until you are at the start of the first sequence of 1’s. Repeat.
(b) Instead assume that the alphabet is Σ = { B, 0, 1 } and the input is represented
in binary.
✓ 1.26 Produce a Turing machine that takes as input a number 𝑛 written in unary,
represented as 𝑛 -many 1’s, and if 𝑛 is odd then it gives as output the number 1 in
unary, with the head under that 1, while if 𝑛 is even it gives the number 0 (which
in a unary representation means the tape is blank).
1.27 Write a machine P with tape alphabet Σ consisting of blank B, stroke 1, and
the comma ‘,’ character. Where Σ0 = Σ − { B }, if we interpret the input 𝜎 ∈ Σ0 as
a comma-separated list of natural numbers represented in unary, then this machine
should return the sum, also in unary. Thus, 𝜙 P ( 1111,111,1) = 11111111.
1.28 Is there a Turing machine configuration without any predecessor? Restated,
is there a configuration C = ⟨𝑞, 𝑠, 𝜏𝐿, 𝜏𝑅⟩ for which there does not exist any
configuration Ĉ = ⟨𝑞̂, 𝑠̂, 𝜏̂𝐿, 𝜏̂𝑅⟩ and instruction I = 𝑞̂ 𝑠̂𝑇𝑛𝑞𝑛 such that if a machine
is in configuration Ĉ then instruction I applies and Ĉ ⊢ C?
1.29 One way to argue that Turing machines can do anything that a modern
CPU can do involves showing how to do all of the CPU’s operations on a Turing
machine. For each, describe a Turing machine that will perform that operation.
You need not produce the machine, just outline the steps. Use the alphabet
Σ = { 0, 1, B }. (a) Take as input a 4-bit string and do a bitwise NOT, so that
each 0 becomes a 1 and each 1 becomes a 0. (b) Take as input a 4-bit string
and do a bitwise circular left shift, so that from 𝑏 3𝑏 2𝑏 1𝑏 0 you end with 𝑏 2𝑏 1𝑏 0𝑏 3 .
(c) Take as input two 4-bit strings and perform a bitwise AND.
✓ 1.30 For each, produce a machine meeting the condition. (a) It halts on exactly
one input. (b) It fails to halt on exactly one input. (c) It halts on infinitely many
inputs and fails to halt on infinitely many.
1.31 Definition 1.9 says that a set is computable if there is a Turing machine that
acts as its characteristic function. That is, the machine is started with the tape blank
except for the input string 𝜎 , and with the head under the leftmost input character.
This machine halts on all inputs, and when it halts, the tape is blank except for a
single character, and the head points to that character. That character is either 1
(meaning that the string 𝜎 is in the set) or 0 (meaning it is not). For the next three
exercises, produce a Turing machine that acts as the characteristic function of the set.
1.32 See the note above. Produce a Turing machine that acts as the characteristic
function of the set {𝜎 ∈ B∗ | 𝜎[0] = 0 } of bitstrings that start with 0.
1.33 Produce a Turing machine that acts as the characteristic function of the set
{𝜎 ∈ B∗ | 𝜎[0 : 1] = 01 } of bitstrings that start with 01.
1.34 See the note before Exercise 1.32. Produce a Turing machine that acts as the
characteristic function of the set of bitstrings that start with some number of 0’s,
including possibly zero-many of them, followed by a 1.
14 Chapter I. Mechanical Computation
1.35 Definition 1.9 talks about computable relations. Consider the ‘less than or
equal’ relation between two natural numbers. Produce a Turing machine with
Σ = { 0, 1, B } that takes in two numbers represented in unary and outputs 𝜏 = 1 if
the first number is less than or equal to the second, and 𝜏 = 0 if not.
1.36 Write a Turing machine that decides if its input is a palindrome, a string that
is the same backward as forward. Use Σ = { B, 0, 1 }. Have the machine end with a
single 1 on the tape if the input was a palindrome, and with a blank tape if not.
1.37 Turing machines tend to have many instructions and to be hard to understand.
So rather than exhibit a machine, people often give an overview. Do that for a
machine that replicates the input: if it is started with the tape blank except for a
contiguous sequence of 𝑛 -many 1’s, then it will halt with the tape containing two
sequences of 𝑛 -many 1’s separated by a single blank.
1.38 Show that if a Turing machine has the same configuration at two different
steps then it will never halt. Is that sufficient condition also necessary?
1.39 Show that the steps in the execution of a Turing machine are not necessarily
invertible. That is, produce a Turing machine and a configuration such that if you
are told the machine was brought to that configuration after some number of steps,
and you were asked what was the prior configuration, you couldn’t tell.
Section I.2 Church's Thesis
History Algorithms have always played a central role in mathematics. The simplest
example is a formula such as the one giving the height of a ball dropped from the
Leaning Tower of Pisa, ℎ(𝑡) = −4.9𝑡² + 56. This is a kind of program: get the
height output by squaring the time input, multiplying by −4.9, and adding 56.
In the 1670’s the co-creator of Calculus, G Leibniz, constructed
the first machine that could do addition, subtraction, multiplication,
division, and square roots as well. This led him to speculate on
the possibility of a machine that manipulates not just numbers but
also symbols, and could thereby determine the truth of scientific
statements. To settle any dispute, Leibniz wrote, scholars could say,
“Let us calculate!” This is a version of the Entscheidungsproblem.
[photo: Leibniz's Stepped Reckoner]

The real push to understand computation arose in 1931 from
the Incompleteness Theorem of K Gödel. This says that for any
(sufficiently powerful) axiom system there are statements that,
while true in any model of the axioms, are not provable from those
axioms. Gödel gave an algorithm that inputs the axioms and outputs
the statement. This made evident the need to precisely define what
is ‘algorithmic’ or ‘mechanically computable’ or ‘effective’.
A number of mathematicians proposed formalizations. One was A Church,†
† After producing his machine model in 1935, Turing got a PhD in 1938 under Church at Princeton.
who developed a system called the 𝜆 -calculus. Church and his students used it to
derive many intuitively computable functions such as number theoretic functions
for divisibility and prime factorization. Church suggested to the most prominent
expert in the area, Gödel, defining the set of effective functions as the set of
functions that are 𝜆 -computable. But Gödel, who was notoriously careful, was
unconvinced.
Everyone agreed that the doubler function 𝑓 (𝑥) = 2𝑥 is effective: we
can go from input to output in a way that is typographic, that pushes
symbols without any need for intuition or insight. Church and his students
had exhibited a wide class of functions that they argued are effective by
proving that they are 𝜆 calculable. But the question is: where is the far
end of this collection? Arguing that ‘derivable with the 𝜆 calculus’ implies
effective does not give the converse.
Everything changed when Gödel read Turing’s masterful analysis, outlined
in the prior section. He wrote, “That this really is the correct definition
of mechanical computability was established beyond any doubt by Turing.”

[photo: Alonzo Church, 1903–1995]
2.1 Church’s Thesis The set of things that can be computed by a discrete and
deterministic mechanism is the same as the set of things that can be computed by
a Turing machine.‡
This is central to the Theory of Computation. It says that our technical results
have a larger importance — they describe the devices that are on our desks and in
our pockets.
Evidence We cannot give a mathematical proof of Church’s Thesis. The definition
of a Turing machine, or of 𝜆 calculus or other equivalent schemes, formalizes
‘intuitively mechanically computable’. When a researcher consents to work within
this formalization they are then free to reason about computation mathematically.
So in a sense Church’s Thesis comes before the mathematics, or at any rate sits
outside its usual derivation and verification work. Turing wrote, “All arguments
which can be given are bound to be, fundamentally, appeals to intuition, and for
this reason rather unsatisfactory mathematically.”
Despite not being the conclusion of a deductive system, Church’s Thesis
is generally accepted. We will give four points in its favor that persuaded
Gödel, Church, and others at the time, and that still persuade researchers
today: coverage, convergence, consistency, and clarity.
First, coverage. Everything that is intuitively computable has proven
to be computable by a Turing machine. This includes not just the number
theoretic functions investigated by researchers in the 1930’s but also
everything ever computed by every program written for every existing
computer, because all of them can be compiled to run on a Turing machine.

[photo: Kurt Gödel, 1906–1978]
Despite this weight of evidence, the argument by coverage would collapse if
someone exhibited even one counterexample, one operation that can be done in
‡ In recent years this has come to be often called the Church-Turing Thesis. Here we figure that because Turing has the machine, we can give Church nominal possession of the thesis.
arbitrary choice, making a different choice leads to the same set of computable
functions. This is persuasive in that any proper definition of what is computable
should possess this property. For instance, if two-tape machines computed more
functions than one-tape machines and three-tape machines more than that, then
identifying the set of computable functions with those computable by single-tape
machines would be foolish. But as with the coverage and convergence arguments,
while this means that the class of Turing machine-computable functions is natural
and wide-ranging, it still leaves open a small crack of a possibility that the class
does not exhaust the list of functions that are mechanically computable.
The most persuasive single argument for Church’s Thesis — what caused Gödel
to change his mind and what still convinces scholars today — is clarity: Turing’s
analysis is compelling. Gödel noted this in the quote given earlier and Church felt
the same way, writing that Turing machines have, “the advantage of making the
identification with effectiveness . . . evident immediately.”
What it does not say Church’s Thesis does not say that in all circumstances the
best way to understand a discrete and deterministic computation is via the Turing
machine model. For example, a numerical analyst studying the performance of a
floating point algorithm should use a computer model that has registers. Church’s
Thesis says that the calculation could in principle be done by a Turing machine but
for this use registers are better because the researcher wants results that apply to
in-practice machines.†
Church’s Thesis also does not say that Turing machines are all there is to any
computation in the sense that if, say, you are working on an automobile antilock
braking system then while the Turing machine model can account for the logical
and arithmetic computations, it cannot do the entire system including sensor inputs
and actuator outputs. S Aaronson has made this point, “Suppose I . . . [argued] that
. . . [Church’s] Thesis fails to capture all of computation, because Turing machines
can’t toast bread. . . . No one ever claimed that a Turing machine could handle
every possible interaction with the external world, without first hooking it up to
suitable peripherals. If you want a Turing machine to toast bread, you need to
connect it to a toaster; then the [Turing machine] can easily handle the toaster’s
internal logic.”
In the same vein, we can get physical devices that supply a stream of random
bits. These are not pseudorandom bits that are computed by a method that
is deterministic; instead, well-established physics says these are truly random.
Turing machines are not lacking because they cannot produce the bits. Rather,
Church’s Thesis asserts that we can use Turing machines to model the discrete and
deterministic computations that we can do after we get the bits.
An empirical question? This discussion raises a big question: even if we accept
Church’s Thesis, can we do more by going beyond discrete and deterministic?
† Scientists who study the brain also find Turing machines to be not the most suitable model. Note however that saying that another model is a better fit is different than saying that there are brain operations that could not in principle be done using a Turing machine as a substrate.
Would analog methods such as passing lasers through a gas, say, or some kind of
subatomic magic allow us to compute things that no Turing machine can compute?
Or are Turing machines an ultimate in physically-possible machines? Did Turing,
on that day, lying on that grassy river bank after his run, intuit everything that
experiments with reality would ever find to be possible?
For a sense of the conversation, we know that the wave equation† can have
computable initial conditions (for these real numbers 𝑥 , there is a program that
inputs 𝑖 ∈ N and outputs 𝑥 ’s 𝑖 -th decimal place) but the solution is not computable.
So does the wave tank modeled by this equation compute something that Turing
machines cannot? Stated for rhetorical effect, do the planets in their orbits compute
an exact solution to the Three-Body Problem but our machines fail at it?
In this case we can object that an experimental apparatus can have noise
and measurement problems, including a finite number of decimal places in the
instruments, etc. But even if careful analysis of the physics of a wave tank leads us
to discount it as a reliable computer of a function, we can still wonder whether
there might be another apparatus that would work.
This big question remains open. No one has produced a generally accepted
example of a non-discrete mechanism that computes a function that no Turing
machine computes. However, there is also not yet an analysis of physically-possible
mechanical computation in the non-discrete case which has the support enjoyed
by Turing’s analysis in its more narrow domain.
We will not pursue this further, instead only observing that the mainstream
community of researchers takes Church’s Thesis as the basis for its work. For us,
‘computation’ will refer to the kind of work that Turing analyzed. That’s because
we are interested in thinking about symbol-pushing, not toasting bread.
Using Church’s Thesis Church’s Thesis asserts that the models of computation
(Turing machines, 𝜆 calculus, the general recursive functions that we will see
in the next section, and others that we won’t describe) are maximally capable. By
that we mean that these models all compute the same things: the set of functions
that each model computes equals the set that we earlier named the set of
computable functions. So we can fix one of these models as our preferred
formalization and get on with the analysis. Here we choose Turing machines.
One reason that we emphasize Church’s Thesis is that it imbues our results
with a larger importance. When, for instance, we later describe a function that
no Turing machine can compute then, with the thesis in mind, we will interpret
the technical statement to mean that this function cannot be computed by any
discrete and deterministic device.
But there is one more thing that we will do with Church’s Thesis. We will
leverage it to make life easier. As the exercises above illustrate, while writing a
few Turing machines gives some insight, after a while you find that doing more
machines does not give more illumination. Worse, focusing too much on machine
details risks obscuring larger points. So if we can be clear and rigorous without
exhibiting the machines themselves, then that is what we will do.
† A partial differential equation that describes the propagation of waves.
I.2 Exercises
2.2 Why is it Church’s Thesis instead of Church’s Theorem?
✓ 2.3 We’ve said that the thing from our everyday experience that Turing Machines
are most like is programs. What is the difference between: (a) a Turing Machine
and an algorithm? (b) a Turing Machine and a computer? (c) a program and a
computer? (d) a Turing Machine and a program?
2.4 Your study partner is struggling with a point. “I don’t get the excitement about
computing with a mechanism. I mean, the Stepped Reckoner is like an old-timey
calculator device: it can do some very limited computations, with numbers only.
But I’m interested in a modern computer that is vastly more flexible in that it can
also work with strings, for instance. I mean, a slide rule is not programmable, is
it?” Help them understand.
✓ 2.5 Each of these is often given as a counterargument to Church’s Thesis. Explain
why each is mistaken. (a) Turing machines have an infinite tape so it is not
a realistic model. (b) The universe is finite so there are only finitely many
configurations possible for any computing device, whereas a Turing machine has
infinitely many configurations, so it is not realistic.
✓ 2.6 One of these is a correct statement of Church’s Thesis and the others are not.
Which one is right? (a) Anything that can be computed by any mechanism can be
computed by a Turing machine. (b) No human computer, or machine that mimics
a human computer, can out-compute a Turing machine. (c) The set of things that
are computable by a discrete and deterministic mechanism is the same as the set of
things that are computable by a Turing machine. (d) Every product of a person’s
mind, or product of a mechanism that mimics the activity of a person’s mind, can
be produced by some Turing machine.
2.7 List two benefits from adopting Church’s Thesis.
Section I.3 Recursion
We will outline an approach to defining computability that is different than Turing’s,
both to give a sense of another way to do this and because it is useful.† We will
list some initial functions that are intuitively computable. We will also describe
ways to combine existing functions to make new ones, where if the existing ones
are intuitively computable then so is the new one. An example of an intuitively
computable initial function is successor S : N → N, described by S (𝑥) = 𝑥 + 1, and
a combiner that preserves effectiveness is function composition. Using those, the
plus-two operation S ◦ S (𝑥) = 𝑥 + 2 is also intuitively mechanically computable.
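These two ingredients are easy to simulate. Here is a minimal sketch in Python (Python rather than the Racket used later in this section; the function names are our own) of the successor initial function and the composition combiner.

```python
def successor(x):
    # The initial function S(x) = x + 1 on the natural numbers.
    return x + 1

def compose(f, g):
    # Combining effective functions preserves effectiveness:
    # if f and g are mechanically computable then so is their composition.
    return lambda x: f(g(x))

# The plus-two operation S ∘ S from the text.
plus_two = compose(successor, successor)
```

Calling plus_two(5) steps through the two successors to give 7.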
Primitive recursion We now introduce another effectiveness-preserving
combiner, after beginning with some motivation.
Grade school students learn addition and multiplication as mildly
involved algorithms. They multiply, for example, by arranging the digits
into a table, doing partial products, and then adding. In 1861, H Grassmann
produced a more elegant definition. Here is the formula for addition,
plus : N² → N, which takes as given the successor map.

    plus(𝑥, 𝑦) = 𝑥                 – if 𝑦 = 0
    plus(𝑥, 𝑦) = S(plus(𝑥, 𝑧))     – if 𝑦 = S(𝑧) for 𝑧 ∈ N

[photo: Hermann Grassmann, 1809–1877]
Besides being compact, this approach has a very interesting feature: ‘plus’ recurs
in its own definition.† This is definition by recursion. Whereas the grade school
definition of addition is prescriptive in that it gives a procedure, this recursive
definition is descriptive because it specifies the meaning, the semantics, of the
operation.
On first seeing recursion, many people wonder whether it might be logically
problematic — isn’t defining something in terms of itself a fallacy? However, in the
example above plus ( 3, 2) is not defined in terms of itself, it is defined in terms of
plus ( 3, 1) (and the successor function). Similarly, plus ( 3, 1) is defined in terms
of plus ( 3, 0) . And, clearly the definition of plus ( 3, 0) is not a problem. The key
here is to define the function on higher-numbered inputs using only its values on
lower-numbered ones.‡
A marvelous feature of Grassmann’s approach is that it extends naturally to
other operations. Multiplication has the same form.
    product(𝑥, 𝑦) = 0                          – if 𝑦 = 0
    product(𝑥, 𝑦) = plus(product(𝑥, 𝑧), 𝑥)     – if 𝑦 = S(𝑧)

    power(𝑥, 𝑦) = 1                            – if 𝑦 = 0
    power(𝑥, 𝑦) = product(power(𝑥, 𝑧), 𝑥)      – if 𝑦 = S(𝑧)
3.3 Example Similarly, the expansion of power ( 2, 3) gives a product of three 2’s.
† That is, this is a discrete form of feedback. ‡ So the idea behind this recursion is that addition of larger numbers reduces to addition of smaller ones.
(The (let ..) creates the local variable z, and sets it to 𝑦 − 1.) The same is true
for product and power.
(define (product x y)
  (let ((z (- y 1)))
    (if (= y 0)
        0
        (plus (product x z) x))))

(define (power x y)
  (let ((z (- y 1)))
    (if (= y 0)
        1
        (product (power x z) x))))
3.4 Definition The function 𝑓 is defined by primitive recursion from the functions
𝑔 and ℎ via this schema.‡

    𝑓(𝑥0, ... 𝑥𝑘−1, 𝑦) = 𝑔(𝑥0, ... 𝑥𝑘−1)                          – if 𝑦 = 0
    𝑓(𝑥0, ... 𝑥𝑘−1, 𝑦) = ℎ(𝑓(𝑥0, ... 𝑥𝑘−1, 𝑧), 𝑥0, ... 𝑥𝑘−1, 𝑧)    – if 𝑦 = S(𝑧)
Here the bookkeeping is that the arity of 𝑓 , the number of inputs, is one more than
the arity of 𝑔 and one less than the arity of ℎ .
3.5 Example The function plus is defined by primitive recursion from 𝑔(𝑥 0 ) = 𝑥 0
and ℎ(𝑤, 𝑥 0, 𝑧) = S (𝑤) . The function product is defined by primitive recursion
from 𝑔(𝑥 0 ) = 0 and ℎ(𝑤, 𝑥 0, 𝑧) = plus (𝑤, 𝑥 0 ) . The function power is defined by
primitive recursion from 𝑔(𝑥 0 ) = 1 and ℎ(𝑤, 𝑥 0, 𝑧) = product (𝑤, 𝑥 0 ) .
Primitive recursion, along with function composition, suffices to define many
familiar functions.
3.6 Example The predecessor function is like an inverse to successor except that we
are using the natural numbers and so we can’t allow the predecessor of zero to
be negative. We instead take the special case that if the input is zero then the
output is zero also. We can define this function pred : N → N using the primitive
recursive schema.
    pred(𝑦) = 0    – if 𝑦 = 0
    pred(𝑦) = 𝑧    – if 𝑦 = S(𝑧)
Comparing this with Definition 3.4, pred has no 𝑥𝑖 ’s. Thus the bookkeeping is that
𝑔 has an arity of zero and, having no inputs, it is therefore the constant function
𝑔( ) = 0. As to ℎ , its arity is two although it ignores its first input, ℎ(𝑎, 𝑏) = 𝑏 .
† Obviously Racket, like every general purpose programming language, comes with a built-in addition operator, as in (+ 3 2), along with a multiplication operator, as in (* 3 2), and with many other arithmetic operators. ‡ A schema is an underlying organizational pattern or structure.
3.7 Example For subtraction we must also special-case negatives. We take proper
subtraction, denoted 𝑥 ∸ 𝑦, to equal 𝑥 − 𝑦 unless that is negative, in which case it
equals 0. This defines the function via primitive recursion.

    propersub(𝑥, 𝑦) = 𝑥                         – if 𝑦 = 0
    propersub(𝑥, 𝑦) = pred(propersub(𝑥, 𝑧))     – if 𝑦 = S(𝑧)
In the terms of Definition 3.4, 𝑓 is of arity two. That makes 𝑔 of arity one,
𝑔(𝑥 0 ) = 𝑥 0 . And the arity of ℎ is three so ℎ(𝑤, 𝑥 0, 𝑧) = pred (𝑤) , with two dummy
inputs.
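A Python sketch of these two examples (the names are our own, a hypothetical rendering rather than the book's code):

```python
def pred(y):
    # Schema: g() = 0 and h(a, b) = b, so pred ignores the
    # recursively computed value and just returns z where y = S(z).
    if y == 0:
        return 0      # special case: pred(0) = 0 on the naturals
    z = y - 1
    return z

def propersub(x, y):
    # Schema: g(x0) = x0 and h(w, x0, z) = pred(w),
    # so subtracting y applies pred to x a total of y-many times.
    if y == 0:
        return x
    z = y - 1
    return pred(propersub(x, z))
```

Here propersub(7, 3) gives 4, while propersub(3, 7) bottoms out at 0.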
Here is the promised collection of initial functions and function combiners.
3.8 Definition The set of primitive recursive functions consists of those that can be
derived from the initial operations of the zero function Z(𝑥⃗) = 0,† the successor
function S(𝑥) = 𝑥 + 1, and the projection functions I𝑖(𝑥⃗) = I𝑖(𝑥0, ... 𝑥𝑘−1) = 𝑥𝑖, by
a finite number of applications of the combining operations of function composition
and primitive recursion.
The initial functions are all clearly effective. Note also that the combiners are
such that if the parts are effective then so is their combination. In particular, the
computer code above makes evident that primitive recursion preserves effectiveness.
Hence every function in that set is of interest to us as it is intuitively mechanically
computable.
Function composition covers not just the simple case of two functions 𝑓 and 𝑔
that combine as 𝑓 ∘ 𝑔(𝑥⃗) = 𝑓(𝑔(𝑥⃗)). It also covers simultaneous substitution,
where from 𝑓(𝑥0, ... 𝑥𝑛) and ℎ0(𝑦0,0, ... 𝑦0,𝑚0), . . . and ℎ𝑛(𝑦𝑛,0, ... 𝑦𝑛,𝑚𝑛) we get
𝑓(ℎ0(𝑦0,0, ... 𝑦0,𝑚0), ... ℎ𝑛(𝑦𝑛,0, ... 𝑦𝑛,𝑚𝑛)).
3.9 Example The function defined by the recurrence

    𝑓(𝑦) = 2                     – if 𝑦 = 0
    𝑓(𝑦) = 𝑓(𝑦 − 1) + 3𝑦 + 2     – otherwise

is primitive recursive. In the schema of Definition 3.4 it has no 𝑥𝑖’s, the base case
is the arity-zero function 𝑔( ) = 2, and the recursive case is

    ℎ(𝑎, 𝑏) = plus(plus(𝑎, product(S(S(S(Z(𝑎)))), plus(𝑏, 1))), S(S(Z(𝑎))))
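We can verify numerically (a Python sketch with names of our choosing) that this ℎ, together with the base value 2, reproduces Example 3.9's recurrence: with 𝑎 = 𝑓(𝑧) and 𝑏 = 𝑧 the formula computes 𝑎 + 3(𝑏 + 1) + 2 = 𝑓(𝑧) + 3𝑦 + 2.

```python
def f_rec(y):
    # The recurrence taken directly: f(0) = 2, f(y) = f(y-1) + 3y + 2.
    return 2 if y == 0 else f_rec(y - 1) + 3 * y + 2

def h(a, b):
    # plus(plus(a, product(3, plus(b, 1))), 2), in ordinary notation.
    return (a + 3 * (b + 1)) + 2

def f_schema(y):
    # Primitive recursion with the arity-zero g() = 2 and the h above.
    return 2 if y == 0 else h(f_schema(y - 1), y - 1)
```

The two agree: f_rec(3) and f_schema(3) are both 26.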
Besides the ones in the above examples, many other familiar mathematical
operations are in the set of primitive recursive functions. They include the boolean
function that tests whether one number is less than another, the elementary
arithmetic function that finds the remainder left when one number is divided by
another, and the number-theoretic function that inputs a number and a prime and
returns the largest power of the prime that divides the number.
We have noted that every primitive recursive function is mechanically com-
putable. The list of primitive recursive functions given above and in the exercises is
so extensive that we may wonder whether every mechanically computable function
is in the set of primitive recursive functions. The next section shows that the answer
is no: although primitive recursion is powerful, nonetheless there are intuitively
mechanically computable functions that are not primitive recursive.
I.3 Exercises
✓ 3.10 What is the difference between primitive recursion and primitive recursive?
3.11 In defining 00 there is a conflict between the desire to have that every power
of 0 is 0 and the desire to have that every number to the 0 power is 1. What does
the definition of power given above do?
✓ 3.12 As the section body describes, recursion doesn’t have to be logically problem-
atic. But some recursions are ill-defined; consider this one.
    𝑓(𝑛) = 0            – if 𝑛 = 0
    𝑓(𝑛) = 𝑓(2𝑛 − 2)    – otherwise

    𝐹(𝑦) = 42           – if 𝑦 = 0
    𝐹(𝑦) = 𝐹(𝑦 − 1)     – otherwise
3.15 The Boolean function is_zero inputs a natural number and returns 𝑇 if the
input is zero, and 𝐹 otherwise. Give a definition by primitive recursion, representing
𝑇 with 1 and 𝐹 with 0.
✓ 3.16 This is the first sequence of numbers ever computed on an electronic
computer.
    𝑠(𝑦) = 0                     – if 𝑦 = 0
    𝑠(𝑦) = 𝑠(𝑦 − 1) + 2𝑦 − 1     – otherwise
(a) Find 𝑠 ( 0) , . . . 𝑠 ( 10) .
(b) Verify that 𝑠 is primitive recursive by putting it in the form given in Defini-
tion 3.4, giving suitable functions 𝑔 and ℎ . You can use functions already
shown in this section to be primitive recursive.
✓ 3.17 Start with a square array of dots that is 𝑛 dots on a side. Consider those
dots that are below or on the diagonal (the upper left to lower right diagonal).
This triangle has one dot in row 1, two in row 2, etc. The total number of dots in
𝑛 rows is the 𝑛 -th triangular number 𝑡 (𝑛) .
(a) Find 𝑡 ( 0) , . . . 𝑡 ( 10) .
(b) Show that 𝑡 is primitive recursive by describing it in the form given in
Definition 3.4. For 𝑔 and ℎ you can use functions already verified in this
section to be primitive recursive.
3.18 Consider this recurrence.
    𝑑(𝑦) = 0                             – if 𝑦 = 0
    𝑑(𝑦) = 𝑑(𝑦 − 1) + 3𝑦² + 3𝑦 + 1       – otherwise
(a) Find 𝑑 ( 0) , . . . 𝑑 ( 5) .
(b) Verify that 𝑑 is primitive recursive by putting it in the form given in Defini-
tion 3.4. You can use functions already shown in this section to be primitive
recursive.
✓ 3.19 The Towers of Hanoi is a famous puzzle: In the great temple at Benares . . .
beneath the dome which marks the center of the world, rests a brass plate in which
are fixed three diamond needles, each a cubit high and as thick as the body of a bee.
On one of these needles, at the creation, God placed sixty-four discs of pure gold, the
largest disc resting on the brass plate, and the others getting smaller and smaller up
to the top one. This is the Tower of Brahma. Day and night unceasingly the priests
transfer the discs from one diamond needle to another according to the fixed and
immutable laws of Brahma, which require that the priest on duty must not move more
than one disc at a time and that he must place this disc on a needle so that there is no
smaller disc below it. When the sixty-four discs shall have been thus transferred from
the needle on which at the creation God placed them to one of the other needles, tower,
temple, and Brahmans alike will crumble into dust, and with a thunderclap the world
will vanish. It gives the recurrence below because to move a pile of discs you first
move to one side all but the bottom, which takes 𝐻 (𝑛 − 1) steps, then move that
Section 3. Recursion 27
bottom one, which takes one step, then re-move the other disks into place on top
of it, taking another 𝐻 (𝑛 − 1) steps.
    𝐻(𝑛) = 1                    – if 𝑛 = 1
    𝐻(𝑛) = 2 · 𝐻(𝑛 − 1) + 1     – if 𝑛 > 1
    gcd(𝑎, 𝑏) = 𝑎                      – if 𝑏 = 0
    gcd(𝑎, 𝑏) = gcd(𝑏, rem(𝑎, 𝑏))      – if 𝑏 > 0

where rem(𝑎, 𝑏) is the remainder when 𝑎 is divided by 𝑏. Note that it has the
form of the schema of primitive recursion (however, this does not show that
it is primitive recursive because we have not yet verified that the remainder
function is primitive recursive). Use this method to compute gcd(28, 12),
gcd(104, 20), and gcd(300009, 25).
3.22 The following four exercises list functions and predicates. (A predicate is a
truth-valued function; we take an output of 1 to mean ‘true’ while 0 is ‘false’.) Show
that each is primitive recursive. For each, you may use functions already shown to be
primitive recursive in this section body, or in a prior exercise item or subitem.
✓ 3.23 See the note above.
(a) Constant function: C𝑘(𝑥⃗) = C𝑘(𝑥0, ... 𝑥𝑛−1) = 𝑘 for a fixed 𝑘 ∈ N.
(b) Maximum and minimum of two numbers: max (𝑥, 𝑦) and min (𝑥, 𝑦) . Hint: use
addition and proper subtraction.
(c) Absolute difference function: absdiff (𝑥, 𝑦) = |𝑥 − 𝑦| .
3.24 See the note before Exercise 3.23.
(a) Sign predicate: sgn (𝑦) , which gives 0 if 𝑦 = 0 and gives 1 if 𝑦 is greater than
zero.
(b) Negation of the sign predicate: negsign (𝑦) , which gives 0 if 𝑦 is greater than
zero and 1 if 𝑦 = 0.
(c) Less-than predicate: lessthan (𝑥, 𝑦) = 1 if 𝑥 is less than 𝑦 , and 0 otherwise.
The greater-than predicate is similar.
✓ 3.25 See the note before Exercise 3.23.
(a) Boolean functions: we have the convention that we represent ‘true’ with 1 and
‘false’ with 0, and that holds for the outputs here. But for inputs, while we
still take 0 for ‘false’, we take any positive input to mean ‘true’. There is the
standard one-input function
    not(𝑥) = 1    – if 𝑥 = 0
    not(𝑥) = 0    – otherwise

    and(𝑥, 𝑦) = 1    – if 𝑥 ≥ 1 and 𝑦 ≥ 1
    and(𝑥, 𝑦) = 0    – otherwise

    or(𝑥, 𝑦) = 0    – if 𝑥 = 𝑦 = 0
    or(𝑥, 𝑦) = 1    – otherwise

(To avoid being tedious, in the output of the clauses we write 0 to abbreviate
Z( ) while 1 abbreviates S(Z( )).)
(b) Equality predicate: equal (𝑥, 𝑦) = 1 if 𝑥 = 𝑦 and 0 otherwise.
✓ 3.26 See the note before Exercise 3.23.
(a) Inequality predicate: notequal (𝑥, 𝑦) = 0 if 𝑥 = 𝑦 and 1 otherwise.
(b) Functions defined by a finite and fixed number of cases, as with these.

    𝑚(𝑥) = 7    – if 𝑥 = 1
    𝑚(𝑥) = 9    – if 𝑥 = 5
    𝑚(𝑥) = 2    – otherwise

    𝑛(𝑥, 𝑦) = 7    – if 𝑥 = 1 and 𝑦 = 2
    𝑛(𝑥, 𝑦) = 9    – if 𝑥 = 5 and 𝑦 = 5
    𝑛(𝑥, 𝑦) = 0    – otherwise
3.27 Show that each of these is primitive recursive. You may use any function
shown to be primitive recursive in the section body, in the prior exercise, or in a
prior item.
(a) Bounded sum function: the partial sums of a series where the terms 𝑔(𝑖)
are specified by a single primitive recursive function 𝑔, so that 𝑆𝑔(𝑦) =
Σ_{0≤𝑖<𝑦} 𝑔(𝑖) = 𝑔(0) + 𝑔(1) + · · · + 𝑔(𝑦 − 1) (the sum of zero-many terms is
𝑆𝑔(0) = 0). In comparison with the final item of the prior question, while the
number of summands is also finite, here it varies with 𝑦.
(b) Bounded product function: the partial products of a series whose terms
𝑔(𝑖) are given by a primitive recursive function, 𝑃𝑔(𝑦) = Π_{0≤𝑖<𝑦} 𝑔(𝑖) =
𝑔(0) · 𝑔(1) · · · 𝑔(𝑦 − 1) (the product of zero-many terms is 𝑃𝑔(0) = 1).
(c) Bounded minimization: let 𝑚 ∈ N and let 𝑝(𝑥⃗, 𝑖) be a predicate for all
𝑖 < 𝑚. The minimization operator 𝑀(𝑥⃗, 𝑚), often written min_{𝑖<𝑚}[𝑝(𝑥⃗, 𝑖)]
or 𝜇𝑖<𝑚[𝑝(𝑥⃗, 𝑖)], returns the smallest 𝑖 < 𝑚 such that 𝑝(𝑥⃗, 𝑖) = 0, or else
returns 𝑚. Hint: Consider the bounded sum of the bounded products of the
predicates.
3.28 Show that each is a primitive recursive function. You can use functions
shown to be primitive recursive in this section, or in a prior exercise, or a prior
item.
(a) Bounded universal quantification: where 𝑚 ∈ N, for each 𝑖 < 𝑚 let 𝑝(𝑥⃗, 𝑖)
be a predicate. Then 𝑈(𝑥⃗, 𝑚), typically written ∀𝑖 < 𝑚 [𝑝(𝑥⃗, 𝑖)], has value 1
if 𝑝(𝑥⃗, 0) = 1 and . . . and 𝑝(𝑥⃗, 𝑚 − 1) = 1. Otherwise, if even one 𝑝(𝑥⃗, 𝑖) is
non-1 for 0 ≤ 𝑖 < 𝑚 then 𝑈(𝑥⃗, 𝑚) = 0.
(b) Bounded existential quantification: where 𝑚 ∈ N, for each 𝑖 < 𝑚 let 𝑝(𝑥⃗, 𝑖)
be a predicate. Then 𝐸(𝑥⃗, 𝑚), typically written ∃𝑖 < 𝑚 [𝑝(𝑥⃗, 𝑖)], has value 1
if 𝑝(𝑥⃗, 0) = 1 or . . . or 𝑝(𝑥⃗, 𝑚 − 1) = 1, and has value 0 otherwise.
(c) Divides predicate: where 𝑥, 𝑦 ∈ N we have divides (𝑥, 𝑦) if there is some 𝑘 ∈ N
with 𝑦 = 𝑥 · 𝑘 .
(d) Primality predicate: prime (𝑦) if 𝑦 has no nontrivial divisor.
3.29 We will show that the function rem (𝑎, 𝑏) giving the remainder when 𝑎 is
divided by 𝑏 is primitive recursive.
(a) Fill in this table.

    𝑎           0  1  2  3  4  5  6  7
    rem(𝑎, 3)

(b) Observe that rem(𝑎 + 1, 3) = rem(𝑎, 3) + 1 for many of the entries. When is
this relationship not true?
(c) Fill in the blanks.

    rem(𝑎, 3) = (1)    – if 𝑎 = 0
    rem(𝑎, 3) = (2)    – if 𝑎 = S(𝑧) and rem(𝑧, 3) + 1 = 3
    rem(𝑎, 3) = (3)    – otherwise
3.31 The floor function 𝑓 (𝑥, 𝑦) = ⌊𝑥/𝑦⌋ returns the largest natural number
less than or equal to 𝑥/𝑦 . Show that it is primitive recursive. Hint: bounded
minimization from Exercise 3.27 is a good place to start.
3.32 The examples of primitive recursion in this section and earlier exercises all
have 𝑓 (𝑦) use only one prior value, 𝑓 (𝑧) = 𝑓 (𝑦 − 1) . But some recursions use more
than one, such as the Fibonacci recursion 𝐹 (𝑦) = 𝐹 (𝑦 − 1) + 𝐹 (𝑦 − 2) that uses two
(for Fibonacci, we get the recursion started by defining 𝐹 ( 0) = 1 and 𝐹 ( 1) = 1).
In a ‘course-of-values recursion’, the next value 𝑓 (𝑦) depends on some or all of the
prior values 𝑓 (𝑦 − 1) , . . . 𝑓 ( 0) . To do these in a primitive recursive way, we get
access to the sequence of all prior values by encoding them into a single number.
Consider a finite sequence of natural numbers 𝐴 = ⟨𝑎0, ... 𝑎𝑘−1⟩. Gödel’s multiplica-
tive encoding of 𝐴 is the natural number 𝐺(𝐴) computed by multiplying factors, where
each factor is the 𝑖-th prime number raised to the successor of the 𝑖-th sequence
element.
For the empty sequence, 𝐺 (⟨ ⟩) = 1. We will sketch how to include all the prior
values.
(a) Find 𝐺 (𝐴) for 𝐴0 = ⟨3, 1⟩ and 𝐴1 = ⟨2, 2, 2⟩ .
(b) For each number 𝑛 , find the sequence 𝐴 where 𝑛 = 𝐺 (𝐴) , or find that no such
sequence exists: 𝑛 0 = 10800, 𝑛 1 = 12, and 𝑛 2 = 343.
(c) Why does the encoding use the successor function?
(d) Where 𝑓 : N^(𝑘+1) → N, the course of values function 𝑓̄ : N^(𝑘+1) → N is defined
as here for any 𝑥⃗ ∈ N^𝑘.

    𝑓̄(𝑥⃗, 𝑦) = 𝐺(⟨ ⟩)                          – if 𝑦 = 0
    𝑓̄(𝑥⃗, 𝑦) = 𝐺(⟨𝑓(𝑥⃗, 0), ... 𝑓(𝑥⃗, 𝑧)⟩)      – if 𝑦 = S(𝑧)
Let 𝐹 (𝑛) be the 𝑛 -th Fibonacci number. Find 𝐹¯( 0) , 𝐹¯( 1) , 𝐹¯( 2) , and 𝐹¯( 3) .
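The multiplicative encoding described in this exercise is simple to experiment with. Here is a Python sketch (the function names are ours) that computes 𝐺 for a finite sequence.

```python
def primes(n):
    # The first n primes, by trial division; fine for short sequences.
    found = []
    candidate = 2
    while len(found) < n:
        if all(candidate % p != 0 for p in found):
            found.append(candidate)
        candidate += 1
    return found

def godel_encode(seq):
    # G(<a0, ..., a_{k-1}>) multiplies, for each i, the i-th prime
    # raised to the successor of a_i.  G of the empty sequence is 1.
    result = 1
    for p, a in zip(primes(len(seq)), seq):
        result *= p ** (a + 1)
    return result
```

The successor in the exponent is what lets a decoder tell a sequence ending in 0 apart from a shorter sequence, since every prime that is used appears to at least the first power.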
✓ 3.33 This is McCarthy’s 91 function.

  M (𝑥) =  M ( M (𝑥 + 11))   – if 𝑥 ≤ 100
           𝑥 − 10            – if 𝑥 > 100
(a) What is the output for inputs 𝑥 ∈ { 0, ... 101 }? For larger inputs? (You may
want to write a small script.) (b) Show that this function is primitive recursive.
You may cite the results from this section or from prior exercises.
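For part (a), the suggested small script can be a direct transcription of the definition; this Python sketch (ours, not from the book's source) tabulates the outputs for the first inputs.

```python
def M(x):
    """McCarthy's 91 function, transcribed clause by clause."""
    if x > 100:
        return x - 10
    return M(M(x + 11))

# Collect the distinct outputs over the inputs 0..101.
outputs = {M(x) for x in range(102)}
```

Running it shows why the function has its name: the nested recursion collapses to a constant on every input up to 101.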
3.34 Show that every primitive recursive function is total.
Section I.4 General recursion
Every primitive recursive function is intuitively mechanically computable. What
about the converse: is every mechanically computable function primitive recursive?
Here we will answer ‘no’.†
Ackermann functions We will give a function that is intuitively mechanically
computable but that is not primitive recursive. An important feature of this function
is that it arises naturally so we will introduce it using familiar operations. Recall
that the addition operation is repeated successor, that multiplication is repeated
addition, and that exponentiation is repeated multiplication.
𝑥 + 𝑦 = S ( S ( · · · S (𝑥)))      𝑥 · 𝑦 = 𝑥 + 𝑥 + · · · + 𝑥      𝑥^𝑦 = 𝑥 · 𝑥 · · · · · 𝑥
      (𝑦-many S’s)                     (𝑦-many terms)                 (𝑦-many factors)
product (𝑥, 𝑦) = H2 (𝑥, 𝑦) =  0                       – if 𝑦 = 0
                              H1 (𝑥, H2 (𝑥, 𝑦 − 1))   – otherwise

power (𝑥, 𝑦) = H3 (𝑥, 𝑦) =  1                       – if 𝑦 = 0
                            H2 (𝑥, H3 (𝑥, 𝑦 − 1))   – otherwise
The pattern is in the ‘otherwise’ lines. Each one is H𝑛 (𝑥, 𝑦) = H𝑛− 1 (𝑥, H𝑛 (𝑥, 𝑦 − 1)) .
Because of this pattern we call each H𝑛 the level 𝑛 function, so that successor is
the level 0 operation, addition is level 1, multiplication is level 2, and exponentiation
is level 3. The definition below writes H (𝑛, 𝑥, 𝑦) in place of H𝑛 (𝑥, 𝑦) to bring all of
the levels into one formula.
4.1 Definition This is the hyperoperation H : N3 → N.
H (𝑛, 𝑥, 𝑦) =  𝑦 + 1                            – if 𝑛 = 0
               𝑥                                – if 𝑛 = 1 and 𝑦 = 0
               0                                – if 𝑛 = 2 and 𝑦 = 0
               1                                – if 𝑛 > 2 and 𝑦 = 0
               H (𝑛 − 1, 𝑥, H (𝑛, 𝑥, 𝑦 − 1))    – otherwise
† That’s why the diminutive ‘primitive’ is in the name — while the class is interesting and important, it
isn’t big enough to contain every effective function.
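Definition 4.1 transcribes directly into code. This Python sketch (ours, not from the book's source) follows the five clauses; the recursion limit is raised because H recurses deeply even on small inputs.

```python
import sys
sys.setrecursionlimit(20000)  # H recurses deeply even for small arguments

def H(n, x, y):
    """The hyperoperation of Definition 4.1, clause by clause."""
    if n == 0:
        return y + 1
    if y == 0:
        return x if n == 1 else (0 if n == 2 else 1)
    return H(n - 1, x, H(n, x, y - 1))
```

So H(1, ·, ·) is addition, H(2, ·, ·) is multiplication, H(3, ·, ·) is exponentiation, and H(4, 2, 3) = 2^(2^2) = 16 is the level 4 operation.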
𝜇 recursion The prior section’s Exercise 3.27 suggests the right direction. Primitive
recursion does operations that are bounded, such as the bounded sum Σ0≤𝑖<𝑦 𝑔(𝑖) =
𝑔(0) + · · · + 𝑔(𝑦 − 1) and bounded minimization min𝑖<𝑚 [𝑝 (𝑥®, 𝑖)] = 𝜇𝑖<𝑚 [𝑝 (𝑥®, 𝑖)],
which returns the smallest 𝑖 < 𝑚 such that 𝑝 (𝑥®, 𝑖) = 0. We can show that
a programming language having only bounded loops computes the primitive
recursive functions (see Extra E). To include all of the functions that are intuitively
mechanically computable we must add an operation that is unbounded.
4.5 Definition Suppose that 𝑔 : N^(𝑛+1) → N is total, so that for every input tuple
there is an output number. Then 𝑓 : N^𝑛 → N is defined from 𝑔 by minimization or
𝜇-recursion, written 𝑓 (𝑥®) = 𝜇𝑦 [𝑔(𝑥®, 𝑦) = 0], if 𝑓 (𝑥®) is the minimum number 𝑦
such that 𝑔(𝑥®, 𝑦) = 0.
This is unbounded search. Think of it as examining 𝑔(𝑥®, 0), then 𝑔(𝑥®, 1),
etc., looking for one of them to give the output 0. If that ever happens, so that
𝑔(𝑥®, 𝑦) = 0 for some least 𝑦, then 𝑓 (𝑥®) = 𝑦. If there is no such number then 𝑓 (𝑥®)
is undefined.
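That search is just a while loop. In this Python sketch (names ours), mu builds 𝑓 from a total 𝑔; exactly as in the definition, the loop runs forever on inputs with no witness. The 𝑔 below is a toy example chosen so the search halts.

```python
def mu(g):
    """Return f where f(*x) is the least y with g(*x, y) == 0.
    The while loop runs forever if no such y exists."""
    def f(*x):
        y = 0
        while g(*x, y) != 0:
            y += 1
        return y
    return f

# A toy total g: the search f(x) finds the least y with 3 * y >= x.
g = lambda x, y: 0 if 3 * y >= x else 1
f = mu(g)
```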
4.6 Example Euler noticed that the polynomial 𝑝 (𝑦) = 𝑦² + 𝑦 + 41 outputs only
primes, at least at first. Does the pattern continue forever?
𝑦 0 1 2 3 4 5 6 7 8 9 ...
𝑝 (𝑦) 41 43 47 53 61 71 83 97 113 131 ...
𝑔(𝑥0, 𝑥1, 𝑥2, 𝑦) =  1   – if 𝑥0𝑦² + 𝑥1𝑦 + 𝑥2 is prime
                    0   – otherwise

This Racket function does the unbounded search.

;; g is the function above, with prime? from math/number-theory
(define (f x0 x1 x2)
  (define (f-helper y)
    (if (zero? (g x0 x1 x2 y))
        y
        (f-helper (add1 y))))
  (f-helper 0))

It calls (f-helper 0), then (f-helper 1), etc. It finds an input for which Euler’s
quadratic 𝑝 returns a non-prime.
> (f 1 1 41)
40
All primitive recursive functions are total. But by using the minimization
operator we can get functions whose output value is undefined for some or all
inputs. For instance, if 𝑔(𝑥, 𝑦) = 1 for all 𝑥, 𝑦 ∈ N then 𝑓 (𝑥) = 𝜇𝑦 [𝑔(𝑥, 𝑦) = 0]
is undefined for all 𝑥 . In the next example no one currently knows whether the
search will end.
4.7 Example Goldbach’s conjecture is that every even number greater than two is the
sum of two primes. The first few instances are 4 = 2 + 2, 6 = 3 + 3, 8 = 3 + 5,
and 10 = 3 + 7. This conjecture is not known to be true, although researchers have
confirmed it for all evens up to 𝑦 = 4 × 1018.
Here we do an unbounded search for a counterexample. This auxiliary function
looks for a Goldbach decomposition of 𝑛.

;; Returns minimal i <= n such that i and n-i are prime; returns #f if no such i
(define (gb-check n)
  (for/first ([i (in-range 2 (add1 n))]
              #:when (and (prime? i)
                          (prime? (- n i))))
    i))

The next function returns 0 when 𝑦 is a counterexample, an even number greater
than two with no such decomposition, and otherwise returns 1.

(define (gb-g y)
  (if (or (odd? y) (<= y 2))
      1
      (if (gb-check y)
          1
          0)))

The search itself is the unbounded minimization 𝜇𝑦 [gb-g(𝑦) = 0].

(define (gb-f)
  (define (gb-f-helper y)
    (if (zero? (gb-g y))
        y
        (gb-f-helper (add1 y))))
  (gb-f-helper 0))

If Goldbach’s conjecture holds then this search runs forever.
4.8 Example We can expand on that approach. Suppose that we want to settle
Legendre’s conjecture, that for every natural number 𝑛 > 0 there is a prime
number 𝑝 with 𝑛 2 < 𝑝 < (𝑛 + 1) 2. Start an unbounded search for a counterexample,
but at the same time also run an unbounded search for a proof. After all, a proof
is a sequence of statements in a suitable formal language where each statement
is either an axiom or follows logically from the prior statements, and where the
final statement is the desired theorem. You could use a computer to search for
a proof as: for each 𝑛 , interpret it as a string (perhaps convert 𝑛 to binary and
interpret that binary as a string) and check whether that string is a proof of the
theorem. Now wait for one or the other of these searches to halt. Obviously this
relates unbounded search to the Entscheidungsproblem.
The above discussion makes clear that unbounded search via the 𝜇 operator is
intuitively mechanically computable. We now define a superset of the primitive
recursive functions by adding this function operation.
4.9 Definition A function is general recursive or partial recursive, or 𝜇-recursive,
or just recursive, if it can be derived from the initial operations of the zero
function Z (𝑥®) = 0, the successor function S (𝑥) = 𝑥 + 1, and the projection
functions I𝑖 (𝑥®) = 𝑥𝑖 by a finite number of applications of function composition,
the schema of primitive recursion, and minimization.
S Kleene showed that this set of functions is the same as the Turing machine-based
set of computable functions.
We have seen that unbounded search is a natural computational construct. It
is also a theme in this book. For instance, we will later consider the question of
which programs halt and a natural way to think about this is as a search for a
halting step.
I.4 Exercises
Some of these have answers that are tedious to compute. It may help to use a computer,
for instance by writing a Racket program or using Sage.
4.10 What is the difference between total recursive and primitive recursive?
✓ 4.11 Find each: H4 ( 2, 0) , H4 ( 2, 1) , H4 ( 2, 2) , H4 ( 2, 3) , and H4 ( 2, 4) .
4.23 Finish the proof of Lemma 4.2 by verifying that H2 (𝑥, 𝑦) = 𝑥 · 𝑦 and
H3 (𝑥, 𝑦) = 𝑥^𝑦.
4.24 Prove that the computation of H (𝑛, 𝑥, 𝑦) always terminates.
✓ 4.25
(a) Prove that the function remtwo : N → { 0, 1 } giving the remainder on division
by two is primitive recursive.
(b) Use that to prove that this function is 𝜇 -recursive: 𝑓 (𝑛) = 0 if 𝑛 is even, and
𝑓 (𝑛)↑ if 𝑛 is odd.
✓ 4.26 Consider the Turing machine P = {𝑞 0 B1𝑞 1, 𝑞 0 1R𝑞 0, 𝑞 1 BR𝑞 2, 𝑞 1 1L𝑞 1 }. De-
fine 𝑔(𝑥, 𝑦) = 0 if the machine P , when started on a tape that is blank except for
𝑥 -many consecutive 1’s and with the head under the leftmost 1, has halted after
step 𝑦. Otherwise, 𝑔(𝑥, 𝑦) = 1. Find 𝑓 (𝑥) = 𝜇𝑦 [𝑔(𝑥, 𝑦) = 0] for these. (a) 𝑓 (0)
(b) 𝑓 (1) (c) 𝑓 (2) (d) 𝑓 (3) (e) 𝑓 (4) (f) 𝑓 (5)
4.27 Define 𝑔(𝑥, 𝑦) by: start P = {𝑞 0 B1𝑞 2, 𝑞 0 1L𝑞 1, 𝑞 1 B1𝑞 2, 𝑞 1 11𝑞 2 } on a tape
that is blank except for 𝑥 -many consecutive 1’s and with the head under the
leftmost 1. If P has halted after step 𝑦 then 𝑔(𝑥, 𝑦) = 0 and otherwise 𝑔(𝑥, 𝑦) = 1.
Let 𝑓 (𝑥) = 𝜇𝑦 [𝑔(𝑥, 𝑦) = 0]. Find 𝑓 (𝑥) for these. (a) 𝑓 (0) (b) 𝑓 (1) (c) 𝑓 (2)
(d) 𝑓 (3) (e) 𝑓 (4) (f) 𝑓 (5)
4.28 Consider this Turing machine.
Let 𝑔(𝑥, 𝑦) = 0 if this machine, when started on a tape that is all blank except for
𝑥 -many consecutive 1’s and with the head under the leftmost 1, has halted after
𝑦 steps. Otherwise, 𝑔(𝑥, 𝑦) = 1. Let 𝑓 (𝑥) = 𝜇𝑦 [𝑔(𝑥, 𝑦) = 0]. Find: (a) 𝑓 (0)
(b) 𝑓 (1) (c) 𝑓 (2) (d) 𝑓 (𝑥).
✓ 4.29 Define ℎ : N+ → N by: ℎ(𝑛) = 𝑛/2 if 𝑛 is even, and otherwise ℎ(𝑛) = 3𝑛 + 1.
Let 𝐻 (𝑛, 𝑘) be the 𝑘 -fold composition of ℎ with itself, so 𝐻 (𝑛, 1) = ℎ(𝑛) , 𝐻 (𝑛, 2) =
ℎ ◦ ℎ (𝑛) , 𝐻 (𝑛, 3) = ℎ ◦ ℎ ◦ ℎ (𝑛) , etc. (We can take 𝐻 (𝑛, 0) = 0, although its
value isn’t interesting.) Let 𝐶 (𝑛) = 𝜇𝑘 [𝐻 (𝑛, 𝑘) = 1]. (a) Compute 𝐻 (4, 1),
𝐻 ( 4, 2) , and 𝐻 ( 4, 3) . (b) Find 𝐶 ( 4) , if it is defined. (c) Find 𝐶 ( 5) , if it is defined.
(d) Find 𝐶 ( 11) , if it is defined. (e) Find 𝐶 (𝑛) for all 𝑛 ∈ [ 1 .. 20) , where defined.
The Collatz conjecture is that 𝐶 (𝑛) is defined for all 𝑛 . No one knows if it is true.
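A short script makes part (e) painless. This Python sketch (helper names ours) implements h and the unbounded search C.

```python
def h(n):
    """One step: halve an even number, send an odd n to 3n + 1."""
    return n // 2 if n % 2 == 0 else 3 * n + 1

def C(n):
    """Least k >= 1 with the k-fold iterate of h carrying n to 1.
    This is unbounded search; it halts only if the trajectory reaches 1."""
    k = 0
    while True:
        n, k = h(n), k + 1
        if n == 1:
            return k
```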
4.30 The Ackermann function is intuitively mechanically computable (and total)
but is not primitive recursive. Here is an alternative such function. Assume that all
partial recursive functions take one natural number input and yield one natural
number output. (We can simulate input pairs, etc., with Gödel’s multiplicative
encoding; see Exercise 3.32.) Let 𝑓0 , 𝑓1 , . . . be an effective list of all primitive
recursive functions. That is, there is a primitive recursive function that inputs the
index 𝑖 and returns some way of computing 𝑓𝑖 . (Remark: this is an interpreter for
the primitive recursive functions, which given the specification 𝑖 of the function,
can do the computation of 𝑓𝑖 (𝑥) for any input 𝑥 .)
Now consider 𝐷 (𝑛) = 𝑓𝑛 (𝑛) + 1. Show that 𝐷 , while intuitively computable, is not
primitive recursive.
Extra I.A Turing machine simulator
The source repository for this book includes a program, written in Racket, to
simulate a Turing machine. It is in the directory src/scheme/prologue. Here we will
show how to run this simulator. (The implementation tracks closely the description
of the action of a Turing machine given on page 8.)
Example 1.1 gives a Turing machine that computes the predecessor function.
% pred.tm
% Compute predecessor fcn: pred(0)=0 and pred(n)=n-1
0 B L 1
0 1 R 0
1 B L 2
1 1 B 1
2 B R 3
2 1 L 2
Thus the simulator for any particular Turing machine is really the pair consisting
of the Racket code along with the machine’s description, as above.
Below is a run of the simulator, including its command line invocation. The
machine starts with a current symbol of 1 and the tape to the right of the current
symbol is 11 (the tape to the left is empty). Thus, the entire tape input is 𝜏 = 111.
Since the predecessor of 3 is 2, we expect that when it finishes the tape will contain
11, with the rest blank.
computing/src/scheme/prologue$ ./turing-machine.rkt -f machines/pred.tm -c "1" -r "11"
step 0: q0: *1*11
step 1: q0: 1*1*1
step 2: q0: 11*1*
step 3: q0: 111*B*
step 4: q1: 11*1*B
step 5: q1: 11*B*B
step 6: q2: 1*1*BB
step 7: q2: *1*1BB
step 8: q2: *B*11BB
step 9: q3: B*1*1BB
step 10: HALT
The output is crude but good enough for small experiments. The command line
turing-machine.rkt --help gives the simulator’s options.
I.A Exercises
A.1 Run the simulator on Ppred starting with 11111. Also start with an empty
tape.
A.2 Run the simulator on Example 1.2’s Padd to do 1 + 2. Also simulate 0 + 2 and
0 + 0.
A.3 Write a Turing machine to perform the operation of adding 3, so that given
as input a tape containing only a string of 𝑛 consecutive 1’s, it returns a tape with
a string of 𝑛 + 3 consecutive 1’s. Follow our convention that when the program
starts and ends the head is under the first 1. Run it on the simulator, with an input
of 4 consecutive 1’s, and also with an empty tape.
A.4 Write a machine to decide if the input contains the substring 010. Fix
Σ = { 0, 1, B }. The machine starts with the tape blank except for a contiguous
string of 0’s and 1’s, and with the head under the first non-blank symbol. When
it finishes, the tape will have either just a 1 if the input contained the desired
substring, or otherwise just a 0. We will do this in stages, building a few of what
amount to subroutines.
(a) Write instructions, starting in state 𝑞 10 , so that if initially the machine’s head
is under the first of a sequence of non-blank entries then at the end the head
will be to the right of the final such entry.
(b) Write a sequence of instructions, starting in state 𝑞 20 , so that if initially the
head is just to the right of a sequence of non-blank entries, then at the end all
entries are blank.
(c) Write the full machine, including linking in the prior items.
Extra I.B Hardware
not 𝑃             𝑃 and 𝑄, 𝑃 or 𝑄
𝑃  ¬𝑃             𝑃  𝑄  𝑃 ∧ 𝑄  𝑃 ∨ 𝑄
0   1             0  0    0      0
1   0             0  1    0      1
                  1  0    0      1
                  1  1    1      1
Those three logic operators are all we need. We will show how to go from
a specified input-output behavior, a desired truth table, to a propositional logic
expression having that behavior that uses only ‘¬’, ‘∧’, and ‘∨’. Then we will sketch
how to implement that with electronic components.
The two tables below show how. Start with the one on the left and focus on the
row with output 1. The expression ¬𝑃 ∧ ¬𝑄 makes this row take on value 1 and
every other row take on value 0.
𝑃  𝑄               𝑃  𝑄  𝑅
0  0  1            0  0  0  0
0  1  0            0  0  1  1
1  0  0            0  1  0  1
1  1  0            0  1  1  0
                   1  0  0  1
                   1  0  1  0
                   1  1  0  0
                   1  1  1  0
For the table on the right, again focus on the rows ending in 1’s. For the second
row the clause is ¬𝑃 ∧ ¬𝑄 ∧ 𝑅 . Target the third row with ¬𝑃 ∧ 𝑄 ∧ ¬𝑅 and the
fifth row with 𝑃 ∧ ¬𝑄 ∧ ¬𝑅 . Now put these clauses together with ∨’s to get the
statement with the given table. (A statement consisting of clauses using ∧’s that
are joined with ∨’s is in Disjunctive Normal Form, DNF. See Section C.)
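The row-by-row construction is mechanical, so it is easy to automate. This Python sketch (our own helper, not part of the book's source) produces the DNF expression for any truth table given as a map from input tuples to output bits; the sample table is the three-input one above.

```python
from itertools import product

def dnf(names, table):
    """Build a Disjunctive Normal Form expression for a truth table.
    `names` lists the variables; `table` maps bit tuples to 0 or 1."""
    clauses = []
    for row in product((0, 1), repeat=len(names)):
        if table[row] == 1:
            # One conjunctive clause per row that has output 1.
            lits = [n if bit else '¬' + n for n, bit in zip(names, row)]
            clauses.append('(' + ' ∧ '.join(lits) + ')')
    return ' ∨ '.join(clauses) if clauses else '0'

# The three-input table from the text: output 1 on rows 001, 010, 100.
table = {(0, 0, 0): 0, (0, 0, 1): 1, (0, 1, 0): 1, (0, 1, 1): 0,
         (1, 0, 0): 1, (1, 0, 1): 0, (1, 1, 0): 0, (1, 1, 1): 0}
```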
(Figures: gates for the inputs 𝑃, 𝑄, 𝑅, and a not-gate circuit having a 5-volt
battery, a resistor, and a transistor with terminals G, D, and S, with input 𝑉in
and output 𝑉out.)
On the right is a battery, which as we shall see supplies the extra voltage. On the
top left, shown as a wiggle, is a resistor. When current is flowing around the circuit,
this resistor regulates the power output from the battery.
On the bottom left, shown with the circle, is a transistor. If there is enough
voltage between G and S then this component allows current from the battery
to flow between D and S. (Because it is sometimes open and sometimes closed
it is depicted as a switch, although it has no moving parts.) This transistor is
manufactured such that an input voltage 𝑉in of 5 volts will trigger this event.
To verify that this circuit inverts the signal, assume first that 𝑉in = 0. Then
there is no current flow between D and S. With no current the resistor provides no
voltage drop, and consequently the output voltage 𝑉out across the gap is all of the
voltage supplied by the battery, 5 volts. So 𝑉in = 0 results in 𝑉out = 5.
Conversely, assume that 𝑉in = 5. Then current flows between D and S, and so
the resistor drops the voltage, meaning that the output is 𝑉out = 0.
Thus, for this device the voltage out 𝑉out is the opposite of the voltage in 𝑉in .
I.B Exercises
B.1 A propositional logic operator that is often used is Exclusive Or, XOR. It is
defined by: 𝑃 XOR 𝑄 = 1 if and only if 𝑃 ≠ 𝑄 . (a) Specify a truth table and from it
construct a DNF propositional logic expression. (b) Use that to make a circuit.
B.2 The propositional logic operator Implication, →, is given by: 𝑃 → 𝑄 is 1
except when 𝑃 is 1 and 𝑄 is 0.
(a) Make a truth table and from it construct a Disjunctive Normal Form expression.
(b) Use that to make a circuit.
B.3 For the table below, construct a DNF propositional logic expression and use
that to make a circuit.
𝑃 𝑄
0 0 0
0 1 1
1 0 0
1 1 1
B.5 Make a table with inputs 𝑃 , 𝑄 , and 𝑅 for the behavior that the output is 1 if 𝑃
equals 𝑅 . Produce the associated DNF expression. Draw the circuit.
B.6 Make a three-input table for the behavior: the output is 1 if a majority of the
inputs are 1’s. Produce the associated DNF expression. Draw the circuit.
B.7 Consider the input/output behavior that the output is 1 if a strict majority of
the inputs are 1’s (so a tie does not count as a majority).
(a) Make a four-input table for the behavior. Produce the associated Disjunctive
Normal Form expression.
(b) Also produce the DNF expression for this behavior with five inputs.
B.8 To add two binary numbers the most natural approach works like the grade
school decimal addition algorithm. Start at the right with the one’s column. Add
those two bits and possibly carry a 1 to the next column. Then add down the next
column, including any carry. Repeat this from right to left.
(a) Use this method to add the two binary numbers 1011 and 1001.
(b) Make a truth table giving the desired behavior in adding the numbers in one
column. It must have three inputs because of the possibility of a carry. It must
also have two output columns, one for the least significant bit of the sum along
with one for any carry.
(c) Draw the circuits.
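The column-by-column method sketches naturally in Python (names ours): full_adder is exactly the three-input, two-output behavior that part (b) asks for, and add_binary ripples it across the columns.

```python
def full_adder(a, b, carry_in):
    """Add one column: two bits plus a carry give a sum bit and a carry out."""
    total = a + b + carry_in
    return total % 2, total // 2

def add_binary(x, y):
    """Grade-school addition of two bit strings, right to left."""
    width = max(len(x), len(y))
    x, y = x.zfill(width), y.zfill(width)
    carry, bits = 0, []
    for a, b in zip(reversed(x), reversed(y)):
        bit, carry = full_adder(int(a), int(b), carry)
        bits.append(str(bit))
    if carry:
        bits.append('1')
    return ''.join(reversed(bits))
```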
Extra I.C Game of Life
J von Neumann was one of the twentieth century’s most prolific and
influential mathematicians. Just in computing, his contributions to
developments in hardware are significant enough that the single-memory
stored-program architecture is commonly called the von Neumann archi-
tecture, and in software he was also an important innovator including
inventing merge sort.
One of the many things he studied was the problem of humans living
on Mars. He thought that to colonize Mars we should first terraform
it with robots. Mars is red because it is full of rust, iron oxide. Robots
could mine that rust, break it into iron and oxygen, and release the
oxygen into the atmosphere. With all of that iron, the robots could make
more robots. So von Neumann was thinking about making machines that could
self-reproduce.† A suggestion from his best friend S Ulam led him to explore the
topic by computing on a grid, a cellular automaton.

John von Neumann 1903–1957
Widespread interest in cellular automata greatly increased with
the appearance of the Game of Life, by J Conway. It was featured
in M Gardner’s celebrated Mathematical Games column of Scientific
American in October 1970. The rules are simple enough that a person
could immediately start experimenting. Lots of people did. When
personal computers appeared, Life became a computer craze since
it is easy for a beginner to program.

John Conway 1937–2020

Start by drawing a two-dimensional grid of square cells, as with
graph paper. Each cell has eight neighbors, four that are horizontally
or vertically adjacent and four more that are diagonally adjacent. The game
proceeds in stages, or generations. At each generation each cell is in one of two
states, alive or dead. For the next generation the next state is determined by: (1) a
live cell with two or three live neighbors will again be live at the next generation but
any other live cell dies, (2) a dead cell with exactly three live neighbors becomes
alive at the next generation but other dead cells stay dead. (The backstory goes
that for (1) live cells will die if they are either isolated or overcrowded while for (2),
if the environment is just right then the neighbors can reproduce to spread life
† There is a later Extra on self-reproduction.
into this cell.) We begin by seeding the board with some initial pattern, and then
watch what develops.
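The two rules translate into a few lines of code. This Python sketch (our own toy, separate from the book's Racket simulator) represents a board as the set of live cell coordinates.

```python
from collections import Counter

def step(live):
    """One Life generation; `live` is a set of (row, col) cells."""
    # Count, for every cell, how many live neighbors it has.
    counts = Counter((r + dr, c + dc)
                     for r, c in live
                     for dr in (-1, 0, 1) for dc in (-1, 0, 1)
                     if (dr, dc) != (0, 0))
    # A cell is live next generation if it has three live neighbors,
    # or if it is live now and has two.
    return {cell for cell, n in counts.items()
            if n == 3 or (n == 2 and cell in live)}

blinker = {(0, 0), (0, 1), (0, 2)}
block = {(0, 0), (0, 1), (1, 0), (1, 1)}
```

Stepping the blinker twice returns it to its start, while the block does not change at all.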
As Gardner noted, the rules of the game balance tedious simplicity against
impenetrable complexity.
Conway chose his rules carefully, after a long period of experimentation, to meet
three desiderata:
1. There should be no initial pattern for which there is a simple proof that the
population can grow without limit.
2. There should be initial patterns that apparently do grow without limit.
3. There should be simple initial patterns that grow and change for a considerable
period of time before coming to end in three possible ways: fading away completely
(from overcrowding or becoming too sparse), settling into a stable configuration
that remains unchanged thereafter, or entering an oscillating phase in which they
repeat an endless cycle of two or more periods.
In brief, the rules should be such as to make the behavior of the population unpredictable.
The result, as Conway says, is a mathematical recreation that is a “zero-player
game.”
The simplest nontrivial pattern, a single cell, immediately dies.†
Generation 0 Generation 1
Some other patterns don’t die but don’t do anything else, either. This 2 × 2
collection is a block. It is stable from generation to generation.
Generation 0 Generation 1
Because it doesn’t change, a block is a ‘still life’. Another still life is the beehive.
Generation 0 Generation 1
But many patterns are not still. This three-cell pattern, the blinker, does a
simple oscillation.
There are other patterns that move. This is a glider, the most famous pattern in
Life.
It moves one cell vertically and one horizontally every four generations, crawling
across the screen.
When Conway came up with the Life rules he was not sure whether there is a
pattern where the total number of live cells keeps on growing. B Gosper showed
that there is, by building the glider gun, which produces a new glider every thirty
generations.
The glider pattern is an example of a spaceship, a pattern that reappears, displaced,
after a number of generations. Here is another, the medium weight spaceship.
Another important pattern is the eater, which consumes gliders and other
spaceships.
I.C Exercises
For some of these a program to simulate the game will be a help. This book’s source
has a Life simulator written in Racket under the src/scheme directory. You can also
find simulators using a search engine.
C.4 On the left is the tub and on the right is the toad. One is a still life and one
an oscillator. Which is which?
C.5 It is easy to run the clock forward. Can you run the clock back?
Extra I.D Ackermann’s function is not primitive recursive
H (𝑛, 𝑥, 𝑦) =  𝑦 + 1                            – if 𝑛 = 0
               𝑥                                – if 𝑛 = 1 and 𝑦 = 0
               0                                – if 𝑛 = 2 and 𝑦 = 0
               1                                – if 𝑛 > 2 and 𝑦 = 0
               H (𝑛 − 1, 𝑥, H (𝑛, 𝑥, 𝑦 − 1))    – otherwise
We have cited that this function is not primitive recursive. Here we will produce a
simplified variant and then show that it is not primitive recursive.
In H’s definition, the variable 𝑥 does not play an active role. R Péter
noted this and got a function with a simpler definition, by considering
H (𝑛, 𝑦, 𝑦) . That, and tweaking the initial value of each level, gives this.
A (𝑘, 𝑦) =  𝑦 + 1                       – if 𝑘 = 0
            A (𝑘 − 1, 1)                – if 𝑘 > 0 and 𝑦 = 0
            A (𝑘 − 1, A (𝑘, 𝑦 − 1))     – if 𝑘 > 0 and 𝑦 > 0
Including the next two entries gives the sense that this function grows very fast
indeed.

A (4, 2) = 2^65536 − 3        A (4, 3) = 2^(2^65536) − 3
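Péter's definition transcribes directly. This Python sketch (ours) memoizes with lru_cache and raises the recursion limit, since even modest arguments recurse deeply.

```python
import sys
from functools import lru_cache

sys.setrecursionlimit(100000)  # the recursion is deep even for small k, y

@lru_cache(maxsize=None)
def A(k, y):
    """Peter's two-argument Ackermann variant, clause by clause."""
    if k == 0:
        return y + 1
    if y == 0:
        return A(k - 1, 1)
    return A(k - 1, A(k, y - 1))
```

The first levels are familiar: A(1, 𝑦) = 𝑦 + 2, A(2, 𝑦) = 2𝑦 + 3, and A(3, 𝑦) = 2^(𝑦+3) − 3. Do not try A(4, 2).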
We will prove that A is not primitive recursive. The intuition is that for any
𝑓 : N𝑛 → N that is primitive recursive, A grows faster than 𝑓 .
Recall that if a function has multiple inputs 𝑥0, ... 𝑥𝑛−1 then we sometimes
abbreviate that sequence with the vector 𝑥®. And, we will write max (𝑥®) for
max ({𝑥0, ... 𝑥𝑛−1 }). To compare the growth of the two-input function A with an
𝑛-input 𝑓, we will look at A (𝑘, max (𝑥®)).
† Although some authors mean the one-input version 𝑓 (𝑥) = A (𝑥, 𝑥).
The proof ’s strategy is to show that each primitive recursive function has a
natural number level, but A does not — it grows faster than any fixed-level function.
4.1 Definition Where 𝑘 ∈ N, a function 𝑓 is level 𝑘 if A (𝑘, max (𝑥®)) > 𝑓 (𝑥®) for
all 𝑥®.
By item e of the following result, if a function is level 𝑘 then it is also level 𝑘ˆ
for any 𝑘ˆ > 𝑘 .
4.2 Lemma (Monotonicity properties) (a) A (𝑘, 𝑦) > 𝑦
(b) A (𝑘, 𝑦 + 1) > A (𝑘, 𝑦), and in general if 𝑦̂ > 𝑦 then A (𝑘, 𝑦̂) > A (𝑘, 𝑦)
(c) A (𝑘 + 1, 𝑦) ≥ A (𝑘, 𝑦 + 1)
(d) A (𝑘, 𝑦) > 𝑘
(e) A (𝑘 + 1, 𝑦) > A (𝑘, 𝑦), and in general if 𝑘̂ > 𝑘 then A (𝑘̂, 𝑦) > A (𝑘, 𝑦)
(f) A (𝑘 + 2, 𝑦) > A (𝑘, 2𝑦)
Proof Here we will verify the first item, that A (𝑘, 𝑦) > 𝑦 for all 𝑘 and for all 𝑦 ,
leaving the others as Exercise D.12. We will do induction on 𝑘 . The 𝑘 = 0 base
step holds because A ( 0, 𝑦) = 𝑦 + 1, and so A ( 0, 𝑦) > 𝑦 .
For the inductive step, assume that this holds for 𝑘 = 0, ... 𝑛,

∀𝑦 A (𝑘, 𝑦) > 𝑦   (∗)

and consider the statement for 𝑛 + 1.

∀𝑦 A (𝑛 + 1, 𝑦) > 𝑦   (∗∗)
4.4 Lemma Each of these initial functions has a level: (1) the zero functions
Z (𝑥®) = 0, (2) the successor function S (𝑥) = 𝑥 + 1, and (3) the projection functions
I𝑖 (𝑥®) = I𝑖 (𝑥0, ... 𝑥𝑘−1 ) = 𝑥𝑖.
Proof For the first, 𝑘 = 0 suffices by the first clause of the definition of A since
A (0, 𝑦) = 𝑦 + 1 > Z (𝑦) = 0. For item (2), 𝑘 = 1 works because by Lemma 4.2.e
A (1, 𝑦) > A (0, 𝑦) = 𝑦 + 1. For (3) take 𝑘 = 0 because by the definition’s first
clause A (0, max (𝑥®)) = max (𝑥®) + 1 and that is larger than the projection I𝑖 (𝑥®), as
we are taking a maximum.
4.5 Lemma Let each primitive recursive function 𝑔0, ... 𝑔𝑚−1, ℎ have a level,
𝑘0, ... 𝑘𝑚−1, 𝑘𝑚. Let 𝑓 be the composition 𝑓 (𝑥®) = ℎ(𝑔0 (𝑥®), ... 𝑔𝑚−1 (𝑥®)). Then 𝑓
is level max ({𝑘0, ... 𝑘𝑚−1, 𝑘𝑚 }) + 2.
Proof Take 𝑘 = max ({𝑘0, ... 𝑘𝑚−1, 𝑘𝑚 }). Then all of the functions 𝑔0, ... 𝑔𝑚−1, ℎ
are level 𝑘 by Lemma 4.2.e.
Lemma 4.2’s item c and then the third clause in A’s definition gives this.

A (𝑘 + 2, max (𝑥®)) ≥ A (𝑘 + 1, max (𝑥®) + 1) = A (𝑘, A (𝑘 + 1, max (𝑥®)))   (∗)

Focusing on the second argument of the right-hand expression, Lemma 4.2.e and
the assumption that each function 𝑔0, ... 𝑔𝑚−1 is level 𝑘 show that for each function
index 𝑖 ∈ { 0, ... 𝑚 − 1 } we have A (𝑘 + 1, max (𝑥®)) > A (𝑘, max (𝑥®)) > 𝑔𝑖 (𝑥®).
Hence A (𝑘 + 1, max (𝑥®)) > max ({ 𝑔0 (𝑥®), ... 𝑔𝑚−1 (𝑥®) }).
Lemma 4.2.b says that A is monotone in the second argument, so returning to
equation (∗) and swapping out A (𝑘 + 1, max (𝑥®)) gives the first inequality here.

A (𝑘 + 2, max (𝑥®)) ≥ A (𝑘, max ({ 𝑔0 (𝑥®), ... 𝑔𝑚−1 (𝑥®) }))
                    > ℎ(𝑔0 (𝑥®), ... 𝑔𝑚−1 (𝑥®)) = 𝑓 (𝑥®)
A (𝑘 + 1, max (𝑥®) + 𝑦) > 𝑓 (𝑥®, 𝑦)   (∗)

A (𝑘 + 1, max (𝑥®) + 𝑛) > max ({ 𝑓 (𝑥®, 𝑛), 𝑥0, ... 𝑥𝑛−1, 𝑛 })
With that, the first inequality below follows from Lemma 4.2.b, monotonicity of A
in its second argument. The second holds because ℎ is a level 𝑘 function.

A (𝑘 + 1, max (𝑥®) + 𝑛 + 1) = A (𝑘, A (𝑘 + 1, max (𝑥®) + 𝑛))
                            > A (𝑘, max ({ 𝑓 (𝑥®, 𝑛), 𝑥0, ... 𝑥𝑛−1, 𝑛 }))
                            > ℎ(𝑓 (𝑥®, 𝑛), 𝑥®, 𝑛) = 𝑓 (𝑥®, 𝑛 + 1)
A (𝑘 + 3, max ({𝑥0, ... 𝑥𝑚−1, 𝑦 })) > A (𝑘 + 1, 2 · max ({𝑥0, ... 𝑥𝑚−1, 𝑦 }))
                                     ≥ A (𝑘 + 1, max (𝑥®) + 𝑦)
                                     > 𝑓 (𝑥®, 𝑦)

The second inequality follows from 2 · max ({𝑥0, ... 𝑥𝑚−1, 𝑦 }) ≥ max (𝑥®) + 𝑦, and
the third is (∗).
4.7 Corollary The function A is not primitive recursive.
Proof If A were primitive recursive then it would be of some level, 𝑘 . That
means A (𝑘, max ({𝑥, 𝑦 })) > A (𝑥, 𝑦) for all 𝑥, 𝑦 . Taking 𝑥 and 𝑦 to be 𝑘 gives a
contradiction.
I.D Exercises
D.8 In base 10, how many digits are in A (4, 2) = 2^65536 − 3?
D.9 A classmate asks you, “How does it work that all the levels of A are primitive
recursive but as a whole it is not? Isn’t that like saying you have a cake and all the
parts are delicious but the cake as a whole is not?”
D.10 Trace through the argument to find a level number 𝑘 for these primitive
recursive functions (it needn’t be the least level).
(a) 𝑓 (𝑦) = 𝑦 + 2
(b) pred (𝑦) = 𝑦 − 1 if 𝑦 > 0 and pred ( 0) = 0.
D.11 Show that for any 𝑘, 𝑦 the evaluation of A (𝑘, 𝑦) terminates.
D.12 Verify these parts of Lemma 4.2. (a) Item b, A (𝑘, 𝑦 + 1) > A (𝑘, 𝑦) and
in general if 𝑦̂ > 𝑦 then A (𝑘, 𝑦̂) > A (𝑘, 𝑦) (b) Item c, A (𝑘 + 1, 𝑦) ≥ A (𝑘, 𝑦 + 1)
(c) Item d, A (𝑘, 𝑦) > 𝑘 (d) Item e, A (𝑘 + 1, 𝑦) > A (𝑘, 𝑦) and in general if 𝑘̂ > 𝑘
then A (𝑘̂, 𝑦) > A (𝑘, 𝑦) (e) Item f, A (𝑘 + 2, 𝑦) > A (𝑘, 2𝑦)
Extra I.E LOOP programs
The primitive recursive functions are a proper subset of the general recursive
functions. The latter set consists of all functions that are mechanically computable
(under Church’s Thesis), so that collection is easy to understand. We will now give
a concrete way to understand the primitive recursive functions.
Here is a Racket for loop,
(define (show-numbers)
(for ([i '(1 2 3)])
(display i)))
The difference is that in a for loop we know in advance the number of times that
the machine will go through the code inside the loop (above it is three times) —
as long as we don’t change the value of the loop variable — but a do allows the
machine to go through its code an unbounded number of times. For instance, this
procedure loops until the user enters the right string.

(define (wait-until-yes)
  (displayln "Please enter 'yes'")
  (do ()
      ((equal? (read-line) "yes"))
    (displayln "Enter exactly the string 'yes'"))
  (displayln "Thanks"))
> (wait-until-yes)
Please enter 'yes'
yse
Enter exactly the string 'yes'
yes
Thanks
The next result says that a function is primitive recursive if and only if it can be
computed using only for loops.
E.1 Theorem (Meyer and Ritchie, 1967) A function is primitive recursive if and
only if it can be computed without using unbounded loops. More precisely, it is
limited to loops where we can compute in advance, using only primitive recursive
functions, how many iterations will occur.
We will show half of this, that if a function is
primitive recursive then we can compute it using only
bounded loops. We will do it by programming the
primitive recursive functions in a language, called
LOOP, that does not have unbounded loops. (Proof of
the converse is outside our scope.)

Albert Meyer b 1941 and Dennis Ritchie 1941–2011 (inventor of C)

Programs in LOOP execute on a machine model
with registers r0, r1, ... that hold natural numbers.
There are four kinds of instructions, which we describe using r0 and r1: (i) r0 = 0
sets the contents of the register to zero, (ii) r0 = r0 + 1 increments the register,
(iii) r0 = r1 copies the contents of r1 into r0, and (iv) loop r0 ... end repeats
the enclosed instructions as many times as the value that r0 holds when the loop
is entered.
Very important: changing the contents of the loop register inside of the loop
does not change the number of times that the machine steps through that loop.
Thus, what’s below is not an infinite loop.
loop r0
r0 = r0 + 1
end
Instead, when the loop ends the value in r0 will be twice what it was when the
loop began.
To interpret LOOP programs as computing functions, we need a convention for
input and output. Where the function takes 𝑛 inputs, we will preload those inputs
into the machine’s first 𝑛 registers. Similarly, where the function has 𝑚 outputs
we take those to be the final values of the first 𝑚 registers.
With that convention, this LOOP program computes the two-input, one output
addition function plus (𝑥, 𝑦) = 𝑥 + 𝑦 .
# plus.loop Return r0 + r1
loop r1
r0 = r0 + 1
end
This book’s source distribution comes with loop . rkt , a Racket program that
interprets LOOP code. Here is an invocation running that code.†
jim@millstone:src/scheme/prologue$ ./loop.rkt -f machines/plus.loop -p "3 2" -s
r0=3 r1=2
--start loop of 2 repetitions --
r0=4 r1=2
r0=5 r1=2
--end loop--
5
The program options are: -p preloads the registers r0 and r1 with 3 and 2, while
-s shows the registers for each step of the computation. By default the simulator
returns the value of the first register, here 5.
† Racket version 8.2.
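An interpreter for this fragment of LOOP is itself a short program. This Python sketch (our own toy run_loop, independent of the book's loop.rkt) handles the four instruction kinds; note that it reads a loop's repetition count once, on entry, matching the language's semantics.

```python
import re

def run_loop(program, regs):
    """Interpret a LOOP program; `regs` maps register names to naturals."""
    lines = [ln.strip() for ln in program.strip().splitlines()
             if ln.strip() and not ln.strip().startswith('#')]

    def matching_end(start):
        """Index just past the `end` matching the `loop` at start - 1."""
        depth, i = 1, start
        while depth > 0:
            if lines[i].startswith('loop'):
                depth += 1
            elif lines[i] == 'end':
                depth -= 1
            i += 1
        return i

    def run_block(i):
        """Run lines from index i until the `end` that closes this block."""
        while i < len(lines):
            ln = lines[i]
            if ln == 'end':
                return
            m = re.fullmatch(r'loop (r\d+)', ln)
            if m:
                # The repetition count is read once, on entry to the loop.
                for _ in range(regs.get(m.group(1), 0)):
                    run_block(i + 1)
                i = matching_end(i + 1)
            else:
                tgt, expr = (s.strip() for s in ln.split('='))
                if expr == '0':
                    regs[tgt] = 0                                   # ri = 0
                elif expr.endswith('+ 1'):
                    regs[tgt] = regs.get(expr[:-3].strip(), 0) + 1  # ri = rj + 1
                else:
                    regs[tgt] = regs.get(expr, 0)                   # ri = rj
                i += 1

    run_block(0)
    return regs

plus = "loop r1\nr0 = r0 + 1\nend"                    # the plus.loop program
pred = "loop r0\nr2 = r1\nr1 = r1 + 1\nend\nr0 = r2"
```

Because the count is fixed on entry, incrementing the loop register inside its own loop doubles it rather than looping forever.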
Two more examples. This computes the predecessor function pred (𝑥),

# pred.loop Return r0 - 1 (or 0)
loop r0
r2 = r1
r1 = r1 + 1
end
r0 = r2

and this rotates the contents of the first three registers one place to the right.

# rotate-shift-right.loop Send r0 r1 r2 to r2 r0 r1
r3 = r2
r2 = r1
r1 = r0
r0 = r3
The program’s -o option lets us show three registers instead of the default one
jim@millstone$ ./loop.rkt -f machines/rotate-shift-right.loop -p "1 2 3" -o 3
3 1 2
(we’ve avoided showing the computation’s steps by not using the s option).
We are now ready to prove that for each primitive recursive function there is a
LOOP program that computes it. The strategy is to first show how to compute the
initial functions and then show how to do the combining operations of function
composition and primitive recursion.
The zero function Z (𝑥) = 0 is computed by the LOOP program whose single
line is r0 = 0. The successor function S (𝑥) = 𝑥 + 1 is computed by the one-line
r0 = r0 + 1. Projection I 𝑖 (𝑥 0, ... 𝑥𝑖 , ... 𝑥𝑛−1 ) = 𝑥𝑖 is computed by r0 = r𝑖 .
Composition of two functions is easy. Let 𝑔(𝑥 0, ... 𝑥𝑛 ) and 𝑓 (𝑦0, ... 𝑦𝑚 ) be
computed by LOOP programs 𝑃𝑔 and 𝑃 𝑓 . Suppose that the bookkeeping of the
composition 𝑓 ◦ 𝑔 is right, that 𝑔 is an 𝑚 -output function to match the number of
𝑓 ’s inputs. Then concatenating the two programs, so that the instructions of 𝑃𝑔
are just followed by the instructions of 𝑃 𝑓 , gives the desired LOOP program for
composition, since it uses the output of 𝑔 as input to compute the action of 𝑓 .
General composition starts with the functions
𝑓 (𝑥 0, ... 𝑥𝑛 ), ℎ 0 (𝑦0,0, ... 𝑦0,𝑚0 ), ... ℎ𝑛 (𝑦𝑛,0, ... 𝑦𝑛,𝑚𝑛 )
and produces 𝑓 (ℎ 0 (𝑦0,0, ... 𝑦0,𝑚0 ), ... ℎ𝑛 (𝑦𝑛,0, ... 𝑦𝑛,𝑚𝑛 )) . This needs a little more
thought than the two-function case. The issue is that were we to load the inputs
𝑦0,0 , . . . 𝑦𝑛,𝑚𝑛 into the registers r0 , r1 , . . . and then immediately begin computing
ℎ 0 , there would be a danger of overwriting the inputs for later functions such as
ℎ 1 . For instance, rotate-shift-right.loop above used an extra register, r3 ,
beyond those used to store inputs.
So we must move those inputs out of the way. Let 𝑃 𝑓 , 𝑃ℎ0 , . . . 𝑃ℎ𝑛 be LOOP
programs to compute the functions. Each uses a limited number of registers and
thus there is a number 𝑗 so large that no program uses register 𝑗 . By definition, the
program 𝑃 to compute the composition gets the sequence of inputs starting in the
register numbered 0. The first step is to copy these inputs to start in the register 𝑗 .
Next, zero out the registers below register 𝑗 , copy ℎ 0 ’s arguments down to begin
at r0 , and run the program 𝑃ℎ0 . When it finishes, copy its output to the register
numbered 𝑗 + 𝑚 0 + · · · + 𝑚𝑛 + 1. Do a similar thing for the other ℎ𝑖 ’s. Finish
by copying these outputs down to the initial registers, zeroing out the remaining
registers, and running 𝑃 𝑓 .
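The register discipline just described can be sketched in Python, modeling each program as a function and the registers as a dictionary (the name run_composition and the choice of 𝑗 are ours, for illustration):

```python
def run_composition(Pf, Phs, arities, inputs):
    """Sketch of general composition f(h0(...), ..., hn(...)) using the
    register strategy described above."""
    j = len(inputs) + 1                # a register index that no program uses
    regs = {}
    for i, v in enumerate(inputs):     # copy the inputs to start at register j
        regs[j + i] = v
    outputs, pos = [], 0
    for Ph, m in zip(Phs, arities):    # run each P_h on its own arguments
        args = [regs[j + pos + t] for t in range(m)]
        outputs.append(Ph(*args))      # stash its output out of the way
        pos += m
    return Pf(*outputs)                # finish by running P_f on those outputs
```

For instance, with 𝑓 (𝑎, 𝑏) = 𝑎 · 𝑏 , ℎ 0 (𝑥) = 𝑥 + 1, ℎ 1 (𝑥, 𝑦) = 𝑥 + 𝑦 , and inputs 2, 3, 4, this computes (2 + 1) · (3 + 4) = 21.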
The other combiner operation is primitive recursion.
𝑓 (𝑥 0, ... 𝑥𝑘−1, 𝑦) = 𝑔(𝑥 0, ... 𝑥𝑘−1 )   – if 𝑦 = 0
𝑓 (𝑥 0, ... 𝑥𝑘−1, 𝑦) = ℎ(𝑓 (𝑥 0, ... 𝑥𝑘−1, 𝑧), 𝑥 0, ... 𝑥𝑘−1, 𝑧)   – if 𝑦 = S (𝑧)
Suppose that we have LOOP programs 𝑃𝑔 and 𝑃ℎ . The register swapping needed
is similar to what happens for composition so we won’t go through it. The
program 𝑃 𝑓 starts by running 𝑃𝑔 . Then it sets a fresh register to 0; call that
register t. Now it enters a loop based on the register y (that is, successive times
through the loop count down as 𝑦 , 𝑦 − 1, etc.). The body of the loop computes
𝑓 (𝑥 0, ... 𝑥𝑘 −1, 𝑡 + 1) = ℎ(𝑓 (𝑥 0, ... 𝑥𝑘 −1, 𝑡), 𝑥 0, ... 𝑥𝑘 −1, 𝑡) by running 𝑃ℎ , and then
incrementing t. That ends the argument.
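The scheme just argued can be sketched in Python, with 𝑃𝑔 and 𝑃ℎ modeled as ordinary functions and the accumulator acc standing in for the register holding 𝑓 ’s running value (names ours):

```python
def prim_rec(g, h):
    """Return the f defined by primitive recursion from g and h."""
    def f(xs, y):
        acc = g(*xs)                 # f(xs, 0) = g(xs): run P_g
        t = 0                        # the fresh register t, set to 0
        for _ in range(y):           # loop based on the register holding y
            acc = h(acc, *xs, t)     # f(xs, t+1) = h(f(xs, t), xs, t): run P_h
            t = t + 1                # increment t
        return acc
    return f

# addition from the successor: add(x, 0) = x, add(x, y + 1) = S(add(x, y))
add = prim_rec(lambda x: x, lambda acc, x, t: acc + 1)
```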
We close with a remark on an interesting aspect of loop.rkt, the interpreter
for LOOP. It works by replacing the C-like syntax used above with a LISP-ish one.
For instance, the interpreter converts the string input on the left to the string on
the right.
The advantage of this switch is that the parentheses automatically match the
beginning of each loop with its end and thus we don’t have to write into the
interpreter some code, including a stack, to keep track of loop nesting. With the
string on the right, loop.rkt computes the answer by running it through the
eval command.
56 Chapter I. Mechanical Computation
I.E Exercises
E.2 Write a LOOP program that inputs two numbers and swaps them, so that 𝑥, 𝑦
becomes 𝑦, 𝑥 .
E.3 Argue that the LOOP language would not gain strength if it were to allow
statements like r0 = r0 + 2, or statements like r0 = 1.
E.4 The program rotate-shift-right.loop inputs three numbers, outputs
three, and shifts the inputs right (with the third number ending in the first register).
Write a three-input/three-output program that does a rotate shift left. Also write
the program that composes the two. What does it compute?
E.5 In Ackermann’s function, after the operations plus (𝑥, 𝑦) and product (𝑥, 𝑦)
comes power (𝑥, 𝑦) . Write a LOOP program for it.
E.6 What happens when you try to change a Racket loop variable inside of the
loop? For example, what is the behavior of these two?
We want to understand the set of functions that are effective, that are mechani-
cally computable, which we have defined as computable by a Turing machine. The
major result of this chapter and the single most important result in the book is
that there are functions not computed by any machine — there are jobs that no
machine can do. We will first prove this with a counting argument, and later in
the chapter we will give specific problems that are unsolvable.
Section
II.1 Infinity
We will show that there are more functions 𝑓 : N → N than Turing machines and
that therefore there are functions with no associated machine.
Cardinality The set of functions and the set of Turing machines are both
infinite. We will begin with two paradoxes that dramatize the challenge
to our intuition posed by comparing the sizes of infinite sets. We will then
produce the mathematics to resolve these puzzles, and apply it to the sets
of functions and Turing machines.
The first puzzle is Galileo’s Paradox. It compares the size of the set
of perfect squares with the size of the set of natural numbers. The first
is a proper subset of the second and so it may seem somehow smaller.
(Galileo Galilei, 1564–1642.)
However, the figure below shows that the two sets can be made to match
element-to-element, to correspond, so in this sense there are exactly as
many squares as there are natural numbers.
1.1 Animation: Correspondence 𝑛 ↔ 𝑛 2 between the natural numbers and the squares.
The second puzzle is Aristotle’s Paradox. On the left below are two circles. If
we roll them through one revolution then the trail left by the smaller one is shorter.
But if we put the smaller inside the larger and roll them, as in a train wheel, then
they appear to leave equal-length trails.
Image: This is the Hubble Deep Field image. It came from pointing the Hubble telescope at the darkest
part of the sky, the very background, for eleven days. It covers an area of the sky about the same width
as a dime viewed seventy-five feet away. Every speck is a galaxy. There are thousands of them — there is
a lot in the background. Credit: Robert Williams and the Hubble Deep Field Team (STScI) and NASA.
(Also see the Deep Field movie.)
60 Chapter II. Background
As with Galileo’s Paradox, a person might think that the smaller circle’s points
make a set that is in some way smaller. But point-for-point, the smaller circle
matches the larger. The correct view is that the two sets of points have the same
number of elements, because they correspond.
The animations below illustrate matching the points in two ways. On the
left they are shown as nested circles, with points on the inside corresponding
to points on the outside. The second animation straightens that out so that the
circumferences make segments, and then for every point on the top there is a
matching point on the bottom.
1.5 Lemma For a function with a finite domain, the number of elements in its domain
is greater than or equal to the number of elements in its range. If the function is
one-to-one then its domain has the same number of elements as its range, while if
it is not one-to-one then its domain has more elements. Consequently, two finite
sets have the same number of elements if and only if they correspond, that is, if
and only if there is a function from one to the other that is a correspondence.
Proof Exercise 1.49.
1.6 Lemma The relation between two sets of ‘there is a correspondence from one to
the other’ is an equivalence.
Section 1. Infinity 61
Proof Reflexivity, that any set is related to itself, is clear since a set corresponds to
itself via the identity function. For symmetry suppose that 𝑆 0 is related to 𝑆 1 , so
that there is a correspondence 𝑓 : 𝑆 0 → 𝑆 1 , and recall that its inverse 𝑓 ⁻¹ : 𝑆 1 → 𝑆 0
exists and is a correspondence in the other direction. For transitivity, assume
that 𝑆 0 is related to 𝑆 1 and 𝑆 1 is related to 𝑆 2 , so that there are correspondences
𝑓 : 𝑆 0 → 𝑆 1 and 𝑔 : 𝑆 1 → 𝑆 2 . Recall also that the composition 𝑔 ◦ 𝑓 : 𝑆 0 → 𝑆 2 is a
correspondence.
We now give that relation a name. This carries from the finite to the infinite
the observation of Lemma 1.5 about same-sized sets.
1.7 Definition Two sets have the same cardinality or are equinumerous, denoted
|𝑆 0 | = |𝑆 1 | , if there is a correspondence between them.
1.8 Example Galileo’s Paradox is that the set of squares 𝑆 = {𝑛² | 𝑛 ∈ N } has the same
cardinality as N, written |𝑆 | = | N | . The function 𝑓 : N → 𝑆 given by 𝑓 (𝑛) = 𝑛²
is one-to-one because if 𝑓 (𝑥 0 ) = 𝑓 (𝑥 1 ) then 𝑥 0² = 𝑥 1² and thus, since these are
nonnegative, 𝑥 0 = 𝑥 1 . It is onto because any element of the codomain 𝑦 ∈ 𝑆 is the
square of some 𝑛 from the domain N, by the definition of 𝑆 .
1.9 Example Aristotle’s Paradox is that for 𝑟 0, 𝑟 1 ∈ R+, the interval [ 0 .. 2𝜋𝑟 0 ) has the
same cardinality as the interval [ 0 .. 2𝜋𝑟 1 ) . The map 𝑔(𝑥) = ( 2𝜋𝑟 1 /2𝜋𝑟 0 ) · 𝑥 is a
correspondence; verification is Exercise 1.43.
1.10 Example The sets 𝑆 0 = { 0, 1, 2, 3 } and 𝑆 1 = { 10, 11, 12, 13 } have the same
cardinality, |𝑆 0 | = |𝑆 1 | . One correspondence, from 𝑆 0 to 𝑆 1 , is 𝑥 ↦→ 𝑥 + 10.
1.11 Example The set of natural numbers greater than zero, N+ = { 1, 2, ... }, has the
same cardinality as N. A correspondence is 𝑓 : N → N+ given by 𝑛 ↦→ 𝑛 + 1.
Comparing the sizes of sets in this way was proposed by G Cantor in
the 1870’s. As the paradoxes above dramatize, Definition 1.7 introduces
a deep idea. We should convince ourselves that it captures what we
mean by sets having the ‘same number’ of elements. One supporting
argument is that it is the natural generalization of Lemma 1.5. A
second is Lemma 1.6, that it partitions sets into classes so that inside
of a class all of the sets have the same cardinality. That is, it justifies
the “equi” in equinumerous. The most important supporting argument
is that, as with Turing’s definition of his machine, Cantor’s definition
Georg Cantor is persuasive in itself. Gödel noted this, writing “Whatever ‘number’
1845–1918 as applied to infinite sets may mean, we certainly want it to have the
property that the number of objects belonging to some class does not
change if, leaving the objects the same, one changes in any way . . . e.g., their
colors or their distribution in space . . . From this, however, it follows at once that
two sets will have the same [cardinality] if their elements can be brought into
one-to-one correspondence, which is Cantor’s definition.”
1.12 Definition A set is finite if it has the same cardinality as { 0, 1, ... 𝑛 } for some
𝑛 ∈ N, or if it is empty. Otherwise it is infinite.
For us the most important infinite set is the natural numbers, N = { 0, 1, 2, ... }.
1.13 Definition A set with the same cardinality as the natural numbers is countably
infinite. A set that is either finite or countably infinite is countable. If a set is
the range of a function whose domain is the natural numbers then we say the
function enumerates, or is an enumeration of, that set.†
The idea behind the term ‘enumeration’ is that 𝑓 : N → 𝑆 lists its range: first
𝑓 ( 0) , then 𝑓 ( 1) , etc. (This listing might have repeats, where 𝑓 (𝑛 0 ) = 𝑓 (𝑛 1 ) but
𝑛 0 ≠ 𝑛 1 .) We are often interested in enumerations that are computable.
1.14 Example The set of multiples of three, 3N = { 3𝑘 | 𝑘 ∈ N }, is countable. The
natural map 𝑔 : N → 3N is 𝑔(𝑛) = 3𝑛 .
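This enumeration is computable; in Python (a sketch) the listing 𝑔( 0) , 𝑔( 1) , . . . is a generator:

```python
from itertools import count, islice

def g():
    """Computable enumeration of 3N = { 3k | k in N }."""
    for n in count(0):
        yield 3 * n
```

For instance, list(islice(g(), 5)) produces the first five multiples of three.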
1.15 Example The set N − { 2, 5 } = { 0, 1, 3, 4, 6, 7, ... } is countable. The function below,
both formally defined and illustrated with a table, closes up the gaps.
𝑛 – if 𝑛 < 2
0 1 2 3 4 5 6 ...
𝑛
𝑓 (𝑛) = 𝑛 + 1 – if 𝑛 ∈ { 2, 3 }
𝑓 (𝑛) 0 1 3 4 6 7 8 ...
𝑛 + 2 – if 𝑛 ≥ 4
This function is clearly both one-to-one and onto.
1.16 Example The set of prime numbers 𝑃 is countable. There is a function 𝑝 : N → 𝑃
where 𝑝 (𝑛) is the 𝑛 -th prime, so that 𝑝 ( 0) = 2, 𝑝 ( 1) = 3, etc.
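This 𝑝 is also computable; here is a Python sketch by trial division:

```python
def p(n):
    """The n-th prime, with p(0) = 2, by trial division."""
    primes = []
    candidate = 2
    while len(primes) <= n:
        if all(candidate % q != 0 for q in primes):  # no smaller prime divides it
            primes.append(candidate)
        candidate += 1
    return primes[n]
```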
1.17 Example Fix the set of symbols Σ = { a, ... z }. Consider the set of strings made of
those symbols, such as az and abba. The set of all such strings, Σ∗, is countable. This
table illustrates one correspondence, the one that puts the strings in lexicographic
order, where shorter strings come before longer ones and equal-length strings
come in alphabetical order. (The first entry is the empty string, 𝜀 = ‘ ’.)
𝑛 ∈ N       0 1 2 3 ... 26 27 28 ...
𝑓 (𝑛) ∈ Σ∗  𝜀 a b c ... z  aa ab ...
1.18 Example The set of integers Z is countable: alternate between the
nonnegative and the negative integers.
𝑛 ∈ N      0 1  2  3  4  5  6 ...
𝑓 (𝑛) ∈ Z  0 +1 −1 +2 −2 +3 −3 ...
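The lexicographic listing of Example 1.17 is computable. A Python sketch (names ours), using the standard library to run through the length- 𝑛 strings in alphabetical order:

```python
from itertools import count, islice, product
from string import ascii_lowercase

def strings():
    """Enumerate all strings over {a, ..., z}, shorter before longer and
    equal lengths alphabetically: '', 'a', ..., 'z', 'aa', 'ab', ..."""
    for n in count(0):
        for tup in product(ascii_lowercase, repeat=n):
            yield ''.join(tup)
```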
Zeno’s Paradox has Achilles racing a tortoise that gets a head start, at 𝑥 0 . By the
time Achilles reaches the tortoise’s start, the tortoise will have moved on to some 𝑥 1 . On reaching 𝑥 1 , Achilles will
find that the tortoise is ahead at 𝑥 2 . For any 𝑥𝑖 , Achilles will always be behind and
so, the tortoise reasons, Achilles can never get ahead. The heart of this argument
is that while the distances 𝑥𝑖+1 − 𝑥𝑖 shrink toward zero, there is always further to
go because of the open-endedness at the left of the interval ( 0 .. ∞) .
1.19 Figure: Zeno of Elea shows Youths the Doors to Truth and Falsehood, by covering half
the distance to the door, and then half of that, etc. (By either B Carducci (1560–
1608) or P Tibaldi (1527–1596).)
Zeno’s Paradox is not directly connected to the material of this section. But in
this chapter we will often give arguments that use the unboundedness of the
natural numbers, that is, that leverage the open-endedness of N at infinity.
II.1 Exercises
1.25 Decide if each set is finite or infinite and justify your answer. (a) { 1, 2, 3 }
(b) { 0, 1, 4, 9, 16, ... } (c) the set of prime numbers (d) the set of real roots of
𝑥 5 − 5𝑥 4 + 3𝑥 2 + 7
1.26 Show that each pair of sets has the same cardinality by producing a one-to-
one and onto function from one to the other. You must verify that the function is a
correspondence. (a) { 0, 1, 2 }, { 3, 4, 5 } (b) Z, {𝑖³ | 𝑖 ∈ Z }
✓ 1.27 Show that each pair of sets has the same cardinality by producing a corre-
spondence (you must verify that the function is a correspondence): (a) { 0, 1, 3, 7 }
and { 𝜋, 𝜋 + 1, 𝜋 + 2, 𝜋 + 3 } (b) the even natural numbers and the perfect squares
(c) the real intervals ( 1 .. 4) and (−1 .. 1) .
✓ 1.28 Verify that the function 𝑓 (𝑥) = 1/𝑥 is a correspondence between the subsets
( 0 .. 1) and ( 1 .. ∞) of R.
1.29 Give a formula for a correspondence between the sets { 1, 2, 3, 4, ... } and
{ 7, 10, 13, 16 ... }.
✓ 1.30 Consider the set of characters 𝐶 = { 0, 1, ... 9 } and the set of integers
𝐴 = { 48, 49, ... 57 }.
(a) Produce a correspondence 𝑓 : 𝐶 → 𝐴.
(b) Verify that the inverse 𝑓 ⁻¹ : 𝐴 → 𝐶 is also a correspondence.
✓ 1.31 Show that each pair of sets have the same cardinality. You must give a
suitable function and also verify that it is one-to-one and onto. (a) N and the set
of even numbers (b) N and the odd numbers (c) the even numbers and the odd
numbers
✓ 1.32 Although sometimes there is a correspondence that is natural, correspon-
dences need not be unique. Produce the natural correspondence from ( 0 .. 1) to
( 0 .. 2) , and then produce a different one, and then another different one.
1.33 Example 1.8 gives one correspondence between the natural numbers and
the perfect squares. Give another.
1.34 Fix 𝑐 ∈ R such that 𝑐 > 1. Show that 𝑓 : R → ( 0 .. ∞) given by 𝑥 ↦→ 𝑐 𝑥 is a
correspondence.
1.35 Show that the set of powers of two { 2^𝑘 | 𝑘 ∈ N } and the set of powers of
three { 3^𝑘 | 𝑘 ∈ N } have the same cardinality. Generalize.
1.36 For each, give functions from N to itself. You must justify your claims.
(a) Give two examples of functions that are one-to-one but not onto. (b) Give two
examples of functions that are onto but not one-to-one. (c) Give two that are neither.
(d) Give two that are both.
1.37 Show that the intervals ( 3 .. 5) and (−1 .. 10) of real numbers have the same
cardinality by producing a correspondence. Then produce a second one.
1.38 Show that the sets have the same cardinality. (a) { 4𝑘 | 𝑘 ∈ N }, { 5𝑘 | 𝑘 ∈ N }
(b) { 0, 1, ... 99 }, {𝑚 ∈ N | 𝑚² < 10 000 } (c) { 0, 1, 3, 6, 10, 15, ... }, N
✓ 1.39 Produce a correspondence between each pair of open intervals of reals.
(a) ( 0 .. 1) , ( 0 .. 2)
Section
II.2 Cantor’s correspondence
Countability is a property of sets so we can ask how it interacts with set operations.
We start with the Cartesian product operation, in part because we will want to
count Turing machines, which are sets of four-tuples.
2.1 Example The set 𝑆 = { 0, 1 } × N consists of ordered pairs ⟨𝑖, 𝑗⟩ where 𝑖 ∈ { 0, 1 }
and 𝑗 ∈ N. The diagram below shows two columns, each of which looks like
the natural numbers in that it is discrete and unbounded in one direction. So
informally, 𝑆 is twice the natural numbers. As in Galileo’s Paradox this might lead
to a mistaken guess that it has more members than N. But 𝑆 is countable.
To count it, alternate between columns.
  ⋮       ⋮
⟨0, 3⟩  ⟨1, 3⟩
⟨0, 2⟩  ⟨1, 2⟩
⟨0, 1⟩  ⟨1, 1⟩
⟨0, 0⟩  ⟨1, 0⟩
𝑛∈N 0 1 2 3 4 5 ...
⟨𝑖, 𝑗⟩ ∈ 𝑆 ⟨0, 0⟩ ⟨1, 0⟩ ⟨0, 1⟩ ⟨1, 1⟩ ⟨0, 2⟩ ⟨1, 2⟩ ...
Section 2. Cantor’s correspondence 67
The map from the table’s top row to the bottom is a pairing function. Its inverse,
from bottom to top, is an unpairing function. This counting technique extends to
three copies, { 0, 1, 2 } × N, to four copies, etc.
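The alternation for { 0, 1 } × N has a simple closed form; in Python (a sketch, with names ours):

```python
def pair(n):
    """The n-th element of {0,1} x N under the alternating count."""
    return (n % 2, n // 2)     # which column, then height in that column

def unpair(i, j):
    """Inverse: the position of <i, j> in the count."""
    return 2 * j + i
```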
2.3 Lemma The Cartesian product of two finite sets is finite, and therefore countable.
The Cartesian product of a finite set and a countably infinite set, or of a countably
infinite set and a finite set, is countably infinite.
Proof Exercise 2.42; use the above example as a model.
2.4 Example The obvious next set to consider is the Cartesian product of two
countably infinite sets, N × N. In the informal language of the prior example we
can describe it as infinitely many copies of the natural numbers.
.. .. .. ..
. . . .
⟨0, 3⟩ ⟨1, 3⟩ ⟨2, 3⟩ ⟨3, 3⟩ ···
⟨0, 2⟩ ⟨1, 2⟩ ⟨2, 2⟩ ⟨3, 2⟩ ···
⟨0, 1⟩ ⟨1, 1⟩ ⟨2, 1⟩ ⟨3, 1⟩ ···
⟨0, 0⟩ ⟨1, 0⟩ ⟨2, 0⟩ ⟨3, 0⟩ ···
Sticking to a single column or row won’t work so here also we need to alternate.
Starting from the lower left, do a breadth-first traversal: after ⟨0, 0⟩ , next take pairs
that are one away, ⟨1, 0⟩ and ⟨0, 1⟩ , then those that are two away, ⟨2, 0⟩ , ⟨1, 1⟩
and ⟨0, 2⟩ , etc.
  ⋮       ⋮       ⋮
⟨0, 3⟩  ⟨1, 3⟩  ⟨2, 3⟩  ⟨3, 3⟩
⟨0, 2⟩  ⟨1, 2⟩  ⟨2, 2⟩  ⟨3, 2⟩  ...
⟨0, 1⟩  ⟨1, 1⟩  ⟨2, 1⟩  ⟨3, 1⟩  ...
⟨0, 0⟩  ⟨1, 0⟩  ⟨2, 0⟩  ⟨3, 0⟩  ...
𝑛∈N 0 1 2 3 4 5 6 ...
⟨𝑥, 𝑦⟩ ∈ N2 ⟨0, 0⟩ ⟨0, 1⟩ ⟨1, 0⟩ ⟨0, 2⟩ ⟨1, 1⟩ ⟨2, 0⟩ ⟨0, 3⟩ . . .
  ⋮       ⋮       ⋮
⟨0, 3⟩  ⟨1, 3⟩  ⟨2, 3⟩  ⟨3, 3⟩
⟨0, 2⟩  ⟨1, 2⟩  ⟨2, 2⟩  ⟨3, 2⟩  ...
⟨0, 1⟩  ⟨1, 1⟩  ⟨2, 1⟩  ⟨3, 1⟩  ...
⟨0, 0⟩  ⟨1, 0⟩  ⟨2, 0⟩  ⟨3, 0⟩  ...
Diagonal:  0      1      2      3
The pair ⟨1, 2⟩ is on diagonal 3. Prior to that diagonal come six pairs: diagonal 0
has a single entry, diagonal 1 has two entries, and diagonal 2 has three entries.
Thus, because the counting starts at zero, diagonal 3’s initial pair ⟨0, 3⟩ is number 6
in Cantor’s correspondence. With that, ⟨1, 2⟩ is number 7.
To find the number corresponding to ⟨𝑥, 𝑦⟩ , observe that it lies on diagonal
𝑑 = 𝑥 + 𝑦 . Prior to diagonal 𝑑 come 1 + 2 + · · · + 𝑑 pairs, which is an arithmetic
series with total 𝑑 (𝑑 + 1)/2. So on diagonal 𝑑 the first pair, ⟨0, 𝑥 + 𝑦⟩ , has
number (𝑥 + 𝑦) (𝑥 + 𝑦 + 1)/2 in Cantor’s correspondence. Next on that diagonal,
⟨1, 𝑥 + 𝑦 − 1⟩ gets the number 1 + [(𝑥 + 𝑦) (𝑥 + 𝑦 + 1)/2] , etc. In general,
cantor (𝑥, 𝑦) = 𝑥 + [(𝑥 + 𝑦) (𝑥 + 𝑦 + 1)/2] .
2.8 Example Two early examples are cantor ( 1, 2) = 7 and cantor ( 6, 2) = 42. A later
one is cantor ( 0, 36) = 666.
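Both cantor and its inverse are easy to compute. A Python sketch (the helper name uncantor is ours; the book’s Exercise 2.27 calls the inverse pair):

```python
def cantor(x, y):
    """Cantor's correspondence N x N -> N."""
    return x + (x + y) * (x + y + 1) // 2

def uncantor(n):
    """Inverse of cantor: recover <x, y> from n."""
    d = 0                                  # find the diagonal d = x + y:
    while (d + 1) * (d + 2) // 2 <= n:     # the largest d with d(d+1)/2 <= n
        d += 1
    x = n - d * (d + 1) // 2               # the position along that diagonal
    return (x, d - x)
```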
2.9 Lemma The Cartesian product N × N is countable, for instance under Cantor’s
correspondence, cantor : N2 → N. As well, the sets N3 = N × N × N, and N4 , . . .
are all countable.
Proof The function cantor : N × N → N is one-to-one and onto by construction,
meaning that the construction ensures that each output natural number is associated
with one and only one input pair.
The prior paragraph with domain N2 forms the base step of an induction
argument. To do N3 the idea is to take a triple ⟨𝑥, 𝑦, 𝑧⟩ to be a pair whose first entry
is a pair, ⟨⟨𝑥, 𝑦⟩, 𝑧⟩ . More formally, define cantor3 : N3 → N by cantor3 (𝑥, 𝑦, 𝑧) =
cantor ( cantor (𝑥, 𝑦), 𝑧) . Exercise 2.35 shows that this function is a correspondence.
With that, the details of the full induction are routine.
2.10 Corollary The Cartesian product of finitely many countable sets is countable.
Proof Suppose that 𝑆 0, ... 𝑆𝑛− 1 are countable and that each function 𝑓𝑖 : N → 𝑆𝑖 is
a correspondence. By the prior result, the tuple-ing function cantor𝑛− 1 : N → N𝑛
is a correspondence. Write cantor𝑛− 1 (𝑘) = ⟨𝑘 0, 𝑘 1, ... 𝑘𝑛− 1 ⟩ . Then the composition
𝑘 ↦→ ⟨𝑓0 (𝑘 0 ), 𝑓1 (𝑘 1 ), ... 𝑓𝑛−1 (𝑘𝑛−1 )⟩ from N to 𝑆 0 × · · · 𝑆𝑛−1 is a correspondence.
Thus 𝑆 0 × 𝑆 1 × · · · 𝑆𝑛− 1 is countable.
2.11 Example Also countable is the set of rational numbers, Q. We have already used
the technique of counting by alternating between positives and negatives. So
it suffices to count the nonnegative rationals with some 𝑓 : N → Q+ ∪ { 0 }. A
nonnegative rational number is a numerator-denominator pair ⟨𝑛, 𝑑⟩ ∈ N × N+ .
The complication is that some pairs collapse, such as that 𝑛 = 10 and 𝑑 = 5 is the
same number as 𝑛 = 2 and 𝑑 = 1. So go through the pairs in the order given by
Cantor’s correspondence, skipping any pair that names an already-listed number.
2.13 Lemma A set 𝑆 is countable if and only if it is empty or there is an onto
function 𝑓 : N → 𝑆 .
Proof First suppose that 𝑆 is countable and nonempty. If 𝑆 is finite then write
𝑆 = { 𝑠 0, ... 𝑠𝑛−1 } and this map 𝑓 : N → 𝑆 is onto.
𝑓 (𝑖) = 𝑠𝑖   – if 𝑖 < 𝑛
𝑓 (𝑖) = 𝑠 0   – otherwise
If 𝑆 is infinite and countable then it has the same cardinality as N so there is a
correspondence 𝑓 : N → 𝑆 . Correspondences are onto.
For the converse assume that either 𝑆 is empty or there is an onto map from N
to 𝑆 . Definition 1.13 says that an empty set is countable so what’s left is to consider
an onto map 𝑓 : N → 𝑆 . If 𝑆 is finite then it is countable so we are down to the
case where 𝑆 is infinite. Define 𝑓ˆ: N → 𝑆 by 𝑓ˆ(𝑛) = 𝑓 (𝑘) where 𝑘 is the least
natural number such that 𝑓 (𝑘) ∉ { 𝑓ˆ( 0), ... 𝑓ˆ(𝑛 − 1) }. Such a 𝑘 exists because 𝑆 is
infinite and 𝑓 is onto. This 𝑓ˆ is both one-to-one and onto, by construction.
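The proof’s construction of 𝑓ˆ can be sketched in Python, listing 𝑓 ( 0) , 𝑓 ( 1) , . . . and keeping only first appearances (the function name and the finite cutoff are ours, for illustration):

```python
def distinct_values(f, how_many):
    """First how_many values of the one-to-one f-hat built from an onto f."""
    seen, out, k = set(), [], 0
    while len(out) < how_many:
        v = f(k)
        if v not in seen:        # the least k giving a not-yet-listed value
            seen.add(v)
            out.append(v)
        k += 1
    return out
```

For instance, with 𝑓 (𝑘) = ⌊𝑘/2⌋ , which is onto N but not one-to-one, the first four values listed are 0, 1, 2, 3.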
2.14 Corollary (1) Any subset of a countable set is countable. (2) The intersection of
two countable sets is countable. More generally, the intersection of any number of
countable sets is countable. (3) The union of two countable sets is countable. The
union of any finite number of countable sets is countable. The union of countably
many countable sets is countable.
Proof For (1), suppose that 𝑆 is countable and that 𝑆ˆ ⊆ 𝑆 . If 𝑆 is empty then so is 𝑆ˆ,
and thus it is countable. Otherwise by the prior lemma there is an onto 𝑓 : N → 𝑆 .
If 𝑆ˆ is empty then it is countable, and otherwise fix some 𝑠ˆ ∈ 𝑆ˆ. Then this map
𝑓ˆ: N → 𝑆ˆ is onto.
𝑓ˆ(𝑛) = 𝑓 (𝑛)   – if 𝑓 (𝑛) ∈ 𝑆ˆ
𝑓ˆ(𝑛) = 𝑠ˆ   – otherwise
Item (2) is immediate from (1) since the intersection is a subset of both sets.
Now item (3). In the two-set case suppose that 𝑆 0 and 𝑆 1 are countable.
If either set is empty, or both, then the result is trivial because for instance
𝑆 0 ∪ ∅ = 𝑆 0 . Otherwise, suppose that 𝑓0 : N → 𝑆 0 and 𝑓1 : N → 𝑆 1 are onto.
Then count by alternating between the two sets. More precisely, Lemma 2.3
gives a correspondence 𝑔 : N → { 0, 1 } × N and this is a function that is onto the
set 𝑆 0 ∪ 𝑆 1 .
𝑓2 (𝑛) = 𝑓0 ( 𝑗)   – if 𝑔(𝑛) = ⟨0, 𝑗⟩
𝑓2 (𝑛) = 𝑓1 ( 𝑗)   – if 𝑔(𝑛) = ⟨1, 𝑗⟩
We next assign numbers to Turing machines, in an effective way, so that we can for
instance round-trip from the number to the machine and back to the number.
The exact numbering scheme that we use doesn’t matter much as long as it is
has the properties in the definition below. But for illustration here is an outline
of a specific way: starting with a Turing machine P, effectively convert each of
its instructions to a number, giving a set {𝑖 0, 𝑖 1, ... 𝑖𝑛 }. Then define the number 𝑒
associated with P to be the one that when written in binary has 1 in bits 𝑖 0 , . . . 𝑖𝑛 ,
that is, 𝑒 = 2^𝑖 0 + 2^𝑖 1 + · · · + 2^𝑖𝑛 . For the inverse, given 𝑒 ∈ N, expand it into binary
as 𝑒 = 2^𝑗 0 + · · · + 2^𝑗𝑘 and the set of instructions corresponding to the numbers 𝑗0 ,
. . . 𝑗𝑘 is the Turing machine. (Except that we must check that the instruction set is
deterministic, that no two instructions begin with the same 𝑞𝑝𝑇𝑝 . If this is not true
then let the machine associated with 𝑒 be the empty machine, P = { }.)
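The bits-of- 𝑒 scheme round-trips mechanically. In Python (a sketch of the outline above, ignoring the determinism check; names ours):

```python
def number_of(instruction_numbers):
    """e has binary bit i equal to 1 exactly when i is an instruction number."""
    return sum(2 ** i for i in set(instruction_numbers))

def instructions_of(e):
    """Inverse: the set of positions of the 1 bits in e's binary expansion."""
    return {i for i in range(e.bit_length()) if (e >> i) & 1}
```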
2.17 Definition A numbering is a function that assigns to each Turing machine a
natural number. A numbering is acceptable if: (1) there is an effective function
that takes as input the set of instructions and gives as output the associated number,
(2) the set of numbers for which there is an associated machine is computable,
and (3) there is an effective inverse that takes as input a natural number and gives
as output the associated machine.
For the rest of the book we will just fix a numbering and cite its properties
rather than deal with its details. We call this the machine’s index number or
Gödel number. For the machine with index 𝑒 ∈ N we write P𝑒 . For the function
computed by P𝑒 we write 𝜙𝑒 .
Think of the machine’s index as its name. We will refer to the index frequently,
for instance by saying “the 𝑒 -th Turing machine.” The takeaway point is that
because the numbering is acceptable there is a program to go from the machine’s
index to its source, the set of four-tuple instructions, and a program going from the
source to the index. Briefly, the index is computationally equivalent to the source.†
2.18 Lemma (Padding lemma) Every computable function has infinitely many indices: if
𝑓 is computable then there are infinitely many distinct 𝑒𝑖 ∈ N with 𝑓 = 𝜙𝑒0 =
𝜙𝑒1 = · · · . We can effectively produce a list of such indices.
2.19 Remark In programming terms, the lemma says that for any compiled behavior
there are infinitely many different source codes. One way to get them is by starting
with a single source code and padding it by adding to the bottom a comment line
that contains the number 0, or the number 1, etc.
Proof Let 𝑓 = 𝜙𝑒 . Let 𝑞 𝑗 be the highest-numbered state in P𝑒 . For each 𝑘 ∈ N+
consider the Turing machine obtained from P𝑒 by adding the instruction 𝑞 𝑗+𝑘 BB𝑞 𝑗+𝑘 .
This gives an effective sequence of Turing machines P𝑒1 , P𝑒2 , . . . with distinct
indices, all having the same behavior, 𝜙𝑒𝑘 = 𝑓 .
†
Here is an informal alternative index-source correspondence that can give some intuition about
numbering. On a computer, a program’s source code is saved as a bitstring, which we can interpret as a
binary number. In the other direction, given a number we take it to be a bitstring, and disassemble it
into machine code source. (One problem with this approach is that if the first character in the source is
represented by binary 0 then in passing to a binary number that information is lost. There are patches
for the problems but they reduce the intuitive appeal so while this idea is helpful, it is best left informal.)
With the ability to number machines, we are set up for this book’s most
important result. The next section shows that while the set of Turing machines is
countable, the set of natural number functions 𝑓 : N → N is not. This will establish
that there are functions that are not computable.
II.2 Exercises
✓ 2.20 Extend the table of Example 2.1 through 𝑛 = 12. Where 𝑓 (𝑛) = ⟨𝑥, 𝑦⟩ , give
formulas for 𝑥 and 𝑦 .
✓ 2.21 For each pair ⟨𝑎, 𝑏⟩ find the pair before it and the pair after it in Cantor’s
correspondence. That is, where cantor (𝑎, 𝑏) = 𝑛 , find the pair associated with
𝑛 + 1 and the pair with 𝑛 − 1. (a) ⟨50, 50⟩ (b) ⟨100, 4⟩ (c) ⟨4, 100⟩ (d) ⟨0, 200⟩
(e) ⟨200, 0⟩
✓ 2.22 Corollary 2.14 says that the union of two countable sets is countable.
(a) For the sets 𝑇 = { 2𝑘 | 𝑘 ∈ N } and 𝐹 = { 5𝑚 | 𝑚 ∈ N } produce correspon-
dences 𝑓𝑇 : N → 𝑇 and 𝑓𝐹 : N → 𝐹 . Give a table listing the values of 𝑓𝑇 ( 0) , . . .
𝑓𝑇 ( 9) and give another table listing 𝑓𝐹 ( 0) , . . . 𝑓𝐹 ( 9) .
(b) Give a table listing the first ten values for a correspondence 𝑓 : N → 𝑇 ∪ 𝐹 .
2.23 Give an enumeration of N × { 0, 1 }. Find the pair matching 0, 10, 100, and
101. Find the number corresponding to ⟨2, 1⟩ , ⟨20, 1⟩ , and ⟨200, 1⟩ .
✓ 2.24 Example 2.1 says that the method for two columns extends to three. Give a
function enumerating { 0, 1, 2 } × N. That is, where 𝑓 (𝑛) = ⟨𝑥, 𝑦⟩ give a formula
for 𝑥 and 𝑦 as functions of 𝑛 . Find the pair corresponding to 0, 10, 100, and 1 000.
Find the number corresponding to ⟨1, 2, 3⟩ , ⟨1, 20, 300⟩ , and ⟨1, 200, 3000⟩ .
2.25 Give an enumeration 𝑓 of { 0, 1, 2, 3 } × N. That is, where 𝑓 (𝑛) = ⟨𝑥, 𝑦⟩ , give
a formula for 𝑥 and 𝑦 . Also give the formula for the general case of an enumeration of
{ 0, 1, 2, ... 𝑘 } × N.
✓ 2.26 Extend the table of Example 2.4 to cover correspondences up to 16.
✓ 2.27 Definition 2.6’s function cantor (𝑥, 𝑦) = 𝑥 + [(𝑥 + 𝑦) (𝑥 + 𝑦 + 1)/2] is clearly
effective since it is given as a formula. Show that its inverse, pair : N → N2 , is also
effective by sketching a way to compute it with a program.
2.28 Prove that if 𝐴 and 𝐵 are countable sets then their symmetric difference
𝐴Δ𝐵 = (𝐴 − 𝐵) ∪ (𝐵 − 𝐴) is countable.
2.29 Show that the subset 𝑆 = {𝑎 + 𝑏𝑖 | 𝑎, 𝑏 ∈ Z } of the complex numbers is
countable.
2.30 List the first dozen nonnegative rational numbers enumerated by the method
described in Example 2.11.
2.31 Let 𝑆 be countably infinite and let 𝑇 ⊂ 𝑆 be finite.
(a) Show that 𝑆 − 𝑇 is countable.
(b) Show that 𝑆 − 𝑇 is countably infinite.
(c) Can there be an infinite subset 𝑇 so that 𝑆 − 𝑇 is infinite?
2.32 Show that every infinite set contains a countably infinite subset.
2.45 Use Lemma 2.13 to give a much slicker, and shorter, proof that the rational
numbers are countable than the one in Example 2.11.
2.46 The formula for Cantor’s unpairing function cantor (𝑥, 𝑦) = 𝑥 + [(𝑥 + 𝑦) (𝑥 +
𝑦 + 1)/2] gives a correspondence for natural number input. What about for real
number input? (a) Find cantor ( 2, 1) . (b) Fix 𝑥 = 1 and find two different 𝑦 ∈ R
so that cantor ( 1, 𝑦) = cantor ( 2, 1) .
Section
II.3 Diagonalization
Following Cantor’s definition of cardinality, we produced a number of correspon-
dences between sets. After working through these example maps, a person could
come to think that for any two infinite sets there is some sufficiently clever way to
give a matching between them.
This impression is wrong. There are pairs of infinite sets that do not correspond.
To demonstrate this we now introduce a very powerful technique. Our interest in
it goes far beyond this result — it is central to the entire subject.
Diagonalization There are sets so large that they are not countable. That is, there
are infinite sets 𝑆 for which no correspondence exists between 𝑆 and N. One such
set is R.
3.1 Theorem There is no onto map 𝑓 : N → R. Hence, the set of reals is not
countable.
We start by illustrating the proof ’s technique. The table below shows a function
𝑓 : N → R, listing some inputs and outputs, with the outputs aligned on the
decimal point.
Input 𝑛 Decimal expansion of output 𝑓 (𝑛)
0 42 . 3 1 2 7 7 0 4 ...
1 2 . 0 1 0 0 0 0 0 ...
2 1 . 4 1 4 1 5 9 2 ...
3 −20 . 9 1 9 5 9 1 9 ...
4 0 . 1 0 1 0 0 1 0 ...
5 −0 . 6 2 5 5 4 1 8 ...
.. ..
. .
We will show that this function is not onto by producing a number 𝑧 ∈ R that does
not equal any of the 𝑓 (𝑛) ’s.
Ignore what is to the left of the decimal point. To its right go down the diagonal,
taking the digits 3, 1, 4, 5, 0, 1 . . . Construct the desired 𝑧 by making its first
decimal place something other than 3, making its second decimal place something
other than 1, etc. Specifically, if the diagonal digit is a 1 then in that decimal place 𝑧
gets a 2, while otherwise 𝑧 gets a 1 there. Thus, in this example 𝑧 = 0.121112 ...
By construction, 𝑧 differs from what’s in the first row, 𝑧 ≠ 𝑓 ( 0) , because they
differ in the first decimal place. Similarly, 𝑧 ≠ 𝑓 ( 1) because they differ in the
Section 3. Diagonalization 75
second place. In this way 𝑧 does not equal any of the 𝑓 (𝑛) . Therefore 𝑓 is not
onto. This technique is diagonalization.
(We have skirted a technicality, that some real numbers have two different
decimal representations. For instance, 1.000 ... = 0.999 ... because the two differ
by less than 0.1, less than 0.01, etc. This is a potential snag to the argument because
it means that even though we have constructed a representation that is different
than all the representations on the list, it still might not be that the number 𝑧 is
different than all the numbers 𝑓 (𝑛) on the list. However, dual representation only
happens for decimals when one of the representations ends in 0’s while the other
ends in 9’s. That’s why we build 𝑧 using 1’s and 2’s.)
Proof We will show that no map 𝑓 : N → R is onto.
Denote the 𝑖 -th decimal digit of 𝑓 (𝑛) as 𝑓 (𝑛) [𝑖] (if 𝑓 (𝑛) is a number with two
decimal representations then use the one ending in 0’s). Let 𝑔 be the map on the
decimal digits { 0, ... , 9 } given by: 𝑔( 𝑗) = 2 if 𝑗 is 1 and 𝑔( 𝑗) = 1 otherwise.
Now let 𝑧 be the real number that has 0 to the left of its decimal point, and whose
𝑖 -th decimal digit is 𝑔(𝑓 (𝑖) [𝑖]) . Then for all 𝑖 , 𝑧 ≠ 𝑓 (𝑖) because 𝑧 [𝑖] ≠ 𝑓 (𝑖) [𝑖] . So
𝑓 is not onto.
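As an aside, the construction can be carried out in Racket. This is our illustration, not part of the proof; the row digits below are made up, except that the diagonal is 3, 1, 4, 5, 0, 1 as in the example above.

```
;; g sends the digit 1 to 2 and every other digit to 1.
(define (g j)
  (if (= j 1) 2 1))

;; Row n lists the first decimal digits of f(n); only the diagonal matters.
(define rows
  '((3 1 4 1 5 9)
    (0 1 0 0 0 0)
    (4 1 4 1 5 9)
    (9 1 9 5 9 1)
    (1 0 1 0 0 1)
    (6 2 5 5 4 1)))

;; The i-th decimal digit of z is g applied to the i-th digit of f(i).
(define (z-digit i)
  (g (list-ref (list-ref rows i) i)))

(map z-digit '(0 1 2 3 4 5))   ; gives (1 2 1 1 1 2), so z = 0.121112 ...
```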
3.2 Definition A set that is infinite but not countable is uncountable.
3.3 Remark Before going on, we pause to reflect that the work we have seen so far
in this chapter, especially the prior theorem, is both startling and profound: some
infinite sets have more elements than other infinite sets. In particular, the reals
have more elements than the naturals. As dramatized by Galileo’s Paradox, it is
not just that the reals are a superset of the naturals. Instead, the set of naturals
cannot be made to correspond with the set of reals.
We can make an analogy with the children’s game of Musical Chairs. We have
countably many chairs 𝑃 0, 𝑃 1, ... but there are so many children — so many reals —
that at least one is left without a chair.
We next define when one set has fewer, or more, elements than another. The
intuition comes from the picture below, trying to make a correspondence between
the two finite sets { 0, 1, 2 } and { 0, 1, 2, 3 }. There are too many elements in the
codomain for any function to cover them all. The best that we can do is to cover as
many codomain elements as possible, with a function that is one-to-one but not
onto.
[Diagram: a one-to-one map from { 0, 1, 2 } into { 0, 1, 2, 3 }; each domain element gets its own partner but one codomain element is left uncovered.]
3.4 Definition The set 𝑆 has cardinality less than or equal to that of the set 𝑇 ,
denoted |𝑆 | ≤ |𝑇 | , if there is a one-to-one function from 𝑆 to 𝑇 .
3.5 Example The inclusion map 𝜄 : N → R that sends 𝑛 ∈ N to itself, 𝑛 ∈ R, is
one-to-one and so | N | ≤ | R | . By Theorem 3.1 the cardinality is strictly less.
76 Chapter II. Background
3.6 Remark The wording of that definition suggests that if both |𝑆 | ≤ |𝑇 | and |𝑇 | ≤ |𝑆 |
then |𝑆 | = |𝑇 | . That is true but the proof is beyond our scope; see Exercise 3.32.
For the next result, recall that for a set 𝑆 the characteristic function 1𝑆 is the
Boolean function determining membership: 1𝑆 (𝑠) = 𝑇 if 𝑠 ∈ 𝑆 and 1𝑆 (𝑠) = 𝐹 if
𝑠 ∉ 𝑆 . (We sometimes instead use the bits 1 for 𝑇 and 0 for 𝐹 .) Thus for the set of
two characters 𝑆 = { a, c }, the characteristic function with domain Σ = { a, ... , z }
is 1𝑆 ( a) = 𝑇 , 1𝑆 ( b) = 𝐹 , 1𝑆 ( c) = 𝑇 , 1𝑆 ( d) = 𝐹 , ... 1𝑆 ( z) = 𝐹 .
Recall also that the power set P (𝑆) is the collection of subsets of 𝑆 . For
instance, if 𝑆 = { a, c } then P (𝑆) = { ∅, { a }, { c }, { a, c } }.
3.7 Theorem (Cantor’s Theorem) A set’s cardinality is strictly less than that of its
power set.
We first illustrate the proof. One half is easy: to start with a set 𝑆 and produce
a function to P (𝑆) that is one-to-one, just map 𝑠 ∈ 𝑆 to the set {𝑠 }.
The harder half is showing that no map from 𝑆 to P (𝑆) is a correspondence.
For an example of this half consider the set 𝑆 = { a, b, c }. We will walk through
how we prove that this function 𝑓 : 𝑆 → P (𝑆) is not onto.
𝑓 : a ↦→ { b, c }    b ↦→ { b }    c ↦→ { a, b, c }    (∗)
Below, the first row, the a row, lists the values of the characteristic function
1 𝑓 ( a ) = 1{ b, c } on the inputs a, b, and c. The second row lists the values for
1 𝑓 ( b ) = 1{ b } . And, the third row lists 1 𝑓 ( c ) = 1{ a, b, c } .
𝑠 ∈𝑆 𝑓 (𝑠) 1 𝑓 (𝑠 ) ( a) 1 𝑓 (𝑠 ) ( b) 1 𝑓 (𝑠 ) ( c)
a { b, c } 𝐹 𝑇 𝑇
b {b} 𝐹 𝑇 𝐹
c { a, b, c } 𝑇 𝑇 𝑇
We show that 𝑓 is not onto by producing a member of P (𝑆) that is not any of the
three sets in (∗). For that, diagonalize. Take the table’s diagonal 𝐹𝑇𝑇 and flip the
values to get 𝑇 𝐹 𝐹. That describes the characteristic function of the set 𝑅 = { a }.
This set is not equal to the set 𝑓 ( a) because their characteristic functions differ
on a. Similarly, 𝑅 is not the set 𝑓 ( b) because the characteristic functions differ
on b, and 𝑅 is not 𝑓 ( c) because they differ on c. So 𝑅 is not in the range of 𝑓 , so 𝑓
is not onto.
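The finite example invites a computation. Here is a sketch in Racket (our illustration; subsets are represented as lists of symbols):

```
;; The function f from (*), with subsets as lists.
(define S '(a b c))
(define (f s)
  (case s
    [(a) '(b c)]
    [(b) '(b)]
    [(c) '(a b c)]))

;; The diagonal set R = { s | s is not in f(s) }.
(define R
  (filter (lambda (s) (not (member s (f s)))) S))

R                                          ; => '(a)
(for/and ([s S]) (not (equal? (f s) R)))   ; => #t, so R is not in the range
```

(Comparing the sets with equal? works here because all of the lists happen to be in the same order.)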
Proof First, |𝑆 | ≤ | P (𝑆)| because the inclusion map 𝜄 : 𝑆 → P (𝑆) given by
𝜄 (𝑠) = {𝑠} is one-to-one. For the ‘strictly’ half we will show that no map from a set
to its power set is onto. Fix 𝑓 : 𝑆 → P (𝑆) and consider this element of P (𝑆) .
𝑅 = {𝑠 | 𝑠 ∉ 𝑓 (𝑠) }
We will demonstrate that no member of the domain maps to 𝑅 , and thus 𝑓 is not
onto. Suppose that there exists 𝑠ˆ ∈ 𝑆 such that 𝑓 (𝑠ˆ) = 𝑅 . Consider whether 𝑠ˆ
is an element of 𝑅 . We have that 𝑠ˆ ∈ 𝑅 if and only if 𝑠ˆ ∈ {𝑠 | 𝑠 ∉ 𝑓 (𝑠) }. By the
definition of 𝑅 , that holds if and only if 𝑠ˆ ∉ 𝑓 (𝑠ˆ) , which holds if and only if 𝑠ˆ ∉ 𝑅 .
The contradiction means that no such 𝑠ˆ exists.
3.8 Corollary The cardinality of the set N is strictly less than the cardinality of the
set of functions 𝑓 : N → N.
Proof Let the set of functions be 𝐺 . The map that associates each subset 𝑆 ⊆ N
with its characteristic function 1𝑆 : N → N is one-to-one from P ( N) into 𝐺 .
Therefore | N | < | P ( N)| ≤ |𝐺 | .
3.9 Corollary (Existence of uncomputable functions) There is a function
𝑓 : N → N that is not computable: 𝑓 ≠ 𝜙𝑒 for all 𝑒 .
Proof Lemma 2.9 shows that the cardinality of the set of Turing machines equals
the cardinality of the set N. The prior result shows that the cardinality of the set
of functions from N to itself is strictly greater than the cardinality of N. So the
cardinality of the set of functions from N to itself is greater than the cardinality
of the set of Turing machines — no association of Turing machines with natural
number functions is onto. In particular, when we associate each Turing machine
with the function that it computes, that association is not onto. There is a natural
number function that is without a Turing machine to compute it.
This is an epochal result. In the light of Church’s Thesis, we take it to prove
that there are jobs that no computer can do.
To a person trained in programming, where students learn to go from a task to a
program that does that task, the existence of things that cannot be done can be a
surprise, perhaps even a shock. One point that these results make is that
the work here on sizes of infinities, which can at first seem impracticably abstract,
leads to interesting and useful conclusions.
II.3 Exercises
3.10 Your study partner is confused about the diagonal argument. “If you had an
infinite list of numbers, it would clearly contain every number, right? I mean, if
you had a list that was truly INFINITE, then you simply couldn’t find a number
that is not on the list!” Help them out.
3.11 Your classmate says, “Professor, I’m confused. The set of numbers with one
decimal place, such as 25.4 and 0.1, is clearly countable — just take the integers
and shift all the decimal places by one. The set with two decimal places, such
as 2.54 and 6.02 is likewise countable, etc. This is countably many sets, each of
which is countable, and so the union is countable. The union is the whole reals, so
I think that the reals are countable.” Where is their mistake?
3.12 Verify Cantor’s Theorem, Theorem 3.7, for these finite sets. (a) { 0, 1, 2 }
(b) { 0, 1 } (c) { 0 } (d) { }
✓ 3.13 Use Definition 3.4 to prove that the first set has cardinality less than or equal
to the second.
(a) 𝑆 = { 1, 2, 3 } , 𝑆ˆ = { 11, 12, 13 }
✓ 3.26 Example 2.11 shows that the rational numbers are countable. What happens
when we apply the diagonal argument given in Theorem 3.1 to an enumeration
of the rationals? Consider a sequence 𝑞 0, 𝑞 1, ... that contains all of the rationals.
Represent each of those numbers with a decimal expansion 𝑞𝑖 = 𝑑𝑖 .𝑑𝑖,0𝑑𝑖,1𝑑𝑖,2 ...
(where 𝑑𝑖 ∈ Z and 𝑑𝑖,𝑗 ∈ { 0, ... , 9 }) that does not end in all 9’s, so that the decimal
expansion is unique.
(a) Let 𝑔 be the map on the decimal digits 0, 1, ... , 9 given by 𝑔( 1) = 2, and
𝑔(𝑖) = 1 if 𝑖 ≠ 1. Consider the number down the diagonal, 𝑑 = ∑𝑛∈N 𝑑𝑛,𝑛 · 10^−(𝑛+1).
Transform its digits using 𝑔, that is, define 𝑧 = ∑𝑛∈N 𝑔(𝑑𝑛,𝑛 ) · 10^−(𝑛+1). Show
that 𝑧 is irrational.
(b) Use the prior item to conclude that the diagonal number 𝑑 = ∑𝑛∈N 𝑑𝑛,𝑛 · 10^−(𝑛+1)
is irrational. Hint: show that it has no repeating pattern in its decimal
expansion.
(c) Why is the fact that the diagonal is not rational not a contradiction to the fact
that we can enumerate all of the rationals?
3.27 Verify Cantor’s Theorem in the finite case by showing that if 𝑆 is finite then
the cardinality of its power set is | P (𝑆)| = 2^|𝑆| .
3.28 The key to the proof of Cantor’s Theorem, Theorem 3.7, is the definition of
𝑅 = {𝑠 | 𝑠 ∉ 𝑓 (𝑠) }. This story illustrates the idea: a high school yearbook asks
each graduating student 𝑠𝑖 to make a list 𝑓 (𝑠𝑖 ) of class members that they predict
will someday be famous. Define the set of humble students 𝐻 to consist of those
who are not on their own list. Show that no student’s list equals 𝐻 .
3.29 Show that there is no set of all sets. Hint: use Theorem 3.7.
3.30 The proof of Theorem 3.1 must work around the fact that some numbers
have more than one base ten representation. Base two also has the property that
some numbers have more than one representation; an example is 0.01000 ... and
0.00111 .... But in a base two argument, when building 𝑧 there is no way to avoid
the digits 0 and 1. How could you make the argument work in base two?
3.31 The discussion after the statement of Theorem 3.1 includes that the real
number 1 has two different decimal representations, 1.000 ... = 0.999 ...
(a) Verify this equality by using the formula for an infinite geometric series,
𝑎 + 𝑎𝑟² + 𝑎𝑟³ + · · · + 𝑎𝑟 + · · · = 𝑎/( 1 − 𝑟 ) for |𝑟 | < 1.
(b) Show that if a number has two different decimal representations then in the
leftmost decimal place where they differ, they differ by 1. Hint: that is the
biggest difference that the remaining decimal places can make up.
(c) In addition show that for the one with the larger digit in that first differing
place, all of the digits to its right are 0, while the other representation has that
all of the remaining digits are 9’s.
3.32 Definition 3.4 extends the definition of equal cardinality to say that |𝐴| ≤ |𝐵|
if there is a one-to-one function from 𝐴 to 𝐵 . The Schröder–Bernstein theorem is
that if both |𝑆 | ≤ |𝑇 | and |𝑇 | ≤ |𝑆 | then |𝑆 | = |𝑇 | . We will walk through the proof.
It depends on finding chains of images: for any 𝑠 ∈ 𝑆 we form the associated chain
by iterating application of the two functions, both to the right and the left, as here.
Section
II.4 Universality
We have seen a number of Turing machines, such as one whose output is the
successor of its input, one that adds two input numbers, and others. Each is a
single-purpose device, doing its one job.
Section 4. Universality 81
Weaving by hand, as the loom operator on the left is doing, is intricate and slow.
We can make a machine to reproduce her pattern. But what if we want a different
pattern; do we need another machine? In 1801 J Jacquard built a loom like the
one on the right, controlled by cards. Getting a different pattern does not require
a new loom, it only requires swapping cards.
Turing introduced the analog to this for computers. He produced a Turing
machine UP that can be fed a tape containing a description of a Turing machine M,
along with input for that machine. Then UP will have the same input-output
behavior as would M. If M halts on the input then UP will halt and give the same
output, while if M does not halt on that input then UP also does not halt.
This single machine can be made to have any desired computable behavior. So
we don’t need infinitely many different machines, we can just use UP . This was
what we meant by saying that a good first take on Turing machines is that they are
[Flowchart: Simulate P𝑒 on input 𝑥 → Print result → End]
Universal machines are familiar from everyday computing. For one thing, we
can compare this flowchart with the behavior of a computer operating
system. An operating system is given a program to run and some data
to feed to that program. Think of the program as P𝑒 and the data as
a bitstring that we can interpret as a number, 𝑥 . The operating system
arranges that the underlying hardware will behave like machine 𝑒 , with
input 𝑥 . In short, as with an operating system, Universal Turing machines change
their behavior in software. No patch cords.
Another everyday computing experience that is like a universal machine is a
language interpreter. Below is an interaction with the Racket interpreter. At the
first prompt we type in a routine that takes x and returns the sum of the first
x numbers. At the second prompt we specify the input to that routine, 𝑥 = 4.
$ racket
Welcome to Racket v8.2 [cs].
> (define (triangular x)
(if (= x 0)
0
(+ x (triangular (sub1 x)))))
> (triangular 4)
10
† We could also define a Universal Turing machine to take the single-number input cantor (𝑒, 𝑥 ) . ‡ This
is a flowchart, which gives a high level sketch of a routine. We use three types of boxes. Boxes with
rounded corners are for Start and End. Rectangles are for ordinary operations on data. In later charts
we will also see diamond boxes, which are for decisions, if statements.
The most direct example of computing systems that act as universal machines
is a language’s eval statement. At the first prompt below we define a routine
that has the interpreter evaluate the expression that is input. In the next prompt
we define a list (quoted so that it is not interpreted). This list, lambda (i) ... ,
describes a function of one input.† In the third and fourth prompts, the interpreter
evaluates the routine that is described in that list and applies it to the numbers 5
and 0. That is, as with the loom’s punched cards, we can make utm behave as
different routines, by giving it a description of whatever routine is desired.
> (define (utm s)
(eval s))
> (define test '(lambda (i) (if (= i 0) 1 0)))
> ((utm test) 5)
0
> ((utm test) 0)
1
Finally, as to the proof of the theorem, the simplest way to prove that something
exists is to produce it. We have already exhibited what amounts to a Universal
Turing machine. At the end of Chapter One, on page 38, we gave code for a Turing
machine simulator, which reads a Turing machine from a file and then runs it. The
code is in Racket but Church’s Thesis asserts that we could write a Turing machine
with the same behavior.
Uniformity Consider this job: given a real number 𝑟 ∈ R, write a program to
output its digits. More precisely, produce a Turing machine P𝑟 such that when
given 𝑛 ∈ N as input, P𝑟 outputs the 𝑛 -th decimal place of 𝑟 (for 𝑛 = 0, it outputs
the integer to the left of the decimal point).
We know that this is not possible for all 𝑟 because while there are uncountably
many real numbers, there are only countably many Turing machines. But what
stops us? One of the enjoyable things about coding is the feeling of being able
to get the machine to do anything — why can’t we write a routine that will output
whatever digits we like?
There certainly are real numbers for which there is such a routine. One is 11.25.
(define (one-quarter-decimal-places n)
  (cond
    [(= n 0) 11]
    [(= n 1) 2]
    [(= n 2) 5]
    [else 0]))
For a more generic number, say, some 𝑟 = 0.703 ... , we might momentarily imagine
brute-forcing it.
(define (r-decimal-place n)
  (cond
    [(= n 0) 0]
    [(= n 1) 7]
    [(= n 2) 0]
    [(= n 3) 3]
    ...
    ))
† It uses ‘lambda’ to start the definition of a function because that’s the word Church used.
But that’s silly. Programs have finite length and so can’t have infinitely many cases.
That is, because of the if, what the following program does on 𝑛 = 7 is
unconnected to what it does on other inputs.
(define (foo n)
(if (= n 7)
42
(* 2 n)))
But a program can only have finitely many such differently-behaving branches. The
fact that a Turing machine has only finitely many instructions imposes a condition
of uniformity on its behavior.
4.3 Example Connecting in this way the idea that ‘something is computable’ with ‘it
is uniformly computable’ has some surprising consequences. Consider the problem
of producing a program that inputs a number 𝑛 and decides whether somewhere
in the decimal expansion of 𝜋 = 3.14159 ... there are 𝑛 consecutive nines.
There are two possibilities. Either for all 𝑛 such a sequence exists, or else there
is some 𝑁 where a sequence of nines exists for lengths less than 𝑁 and no sequence
exists when 𝑛 ≥ 𝑁 . Consequently the problem is solved: one of the two below is
the right program (for illustration here we take 𝑁 = 1234).
(define (sequence-of-nines-0 n)
  1)

(define (sequence-of-nines-1 n)
  (if (< n 1234)
      1
      0))
One surprising aspect of this argument is that neither of the two routines
appears to have much to do with 𝜋 . Also surprising, and perhaps unsettling, is that
we have shown that the problem is solvable without showing how to solve it. That
is, there is a difference between showing that this function is computable
and possessing an algorithm to compute it. This shows that the assertion “something
is computable if you can write a program for it” at the very least suppresses some
important subtleties.
In contrast, imagine that we have a routine pi_decimals that inputs 𝑖 ∈ N
and outputs the 𝑖 -th decimal place of 𝜋 . Using it, we can write a program that takes
in 𝑛 and steps through 𝜋 ’s digits, looking for 𝑛 consecutive nines. This approach
has the advantage that it doesn’t just say whether the answer exists, it constructs
that answer. This approach is also uniform in the sense that we could modify
it to use other routines such as e_decimals and so look for strings of nines in
other numbers. However this approach has the disadvantage that if there is an
𝑁 where 𝜋 does not have 𝑛 consecutive nines for 𝑛 ≥ 𝑁 then this program will
search without bound, and never discover that.
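The searching approach can be sketched in Racket. The routine pi-decimals is the assumed digit routine; here it is stubbed with only the first few decimal places of 𝜋, so this is an illustration, not a computation about 𝜋.

```
;; Stub for the assumed digit routine: (pi-decimals i) is the i-th
;; decimal place of pi, counting from 0.  Only a few places are given.
(define pi-digits (vector 1 4 1 5 9 2 6 5 3 5 8 9 7 9))
(define (pi-decimals i) (vector-ref pi-digits i))

;; Scan for n consecutive nines, returning where the run starts.  If no
;; such run exists the search never ends (with the finite stub it would
;; instead run off the end of the vector).
(define (find-nines n)
  (let loop ([i 0] [run 0])
    (cond
      [(= run n) (- i n)]
      [(= (pi-decimals i) 9) (loop (+ i 1) (+ run 1))]
      [else (loop (+ i 1) 0)])))

(find-nines 1)   ; => 4: the first nine is decimal place 4, counting from 0
```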
input 𝑥 , including not halting if the machine does not halt on that input. That
is, there is a computable function 𝜙 : N² → N such that 𝜙 (𝑒, 𝑥) = 𝜙𝑒 (𝑥) if 𝜙𝑒 (𝑥)↓
and 𝜙 (𝑒, 𝑥)↑ if 𝜙𝑒 (𝑥)↑.
There, the 𝑒 travels from the function’s argument to the index. We now
generalize. Start with a program that takes two inputs such as this one.
(define (P x y)
(+ x y))
Freeze the first argument. The result is a one-input program. Here we freeze 𝑥
at 7 and at 8.
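For instance, freezing 𝑥 at 7 and at 8 could be written like this (the names P7 and P8 are ours):

```
;; The two-input starting program.
(define (P x y)
  (+ x y))

;; One-input programs with x frozen at 7 and at 8.
(define (P7 y)
  (P 7 y))
(define (P8 y)
  (P 8 y))

(P7 2)   ; => 9
(P8 2)   ; => 10
```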
This is partial application because we are not freezing all of the input variables.
Instead, we are parametrizing the variable 𝑥 .
The programs in the family are related to the starting one, obviously. Denoting
the function computed by the above starting program P as 𝜓 (𝑥, 𝑦) = 𝑥 + 𝑦 , partial
application gives a family of functions: 𝜓 0 (𝑦) = 𝑦 , 𝜓 1 (𝑦) = 1 +𝑦 , 𝜓 2 (𝑦) = 2 +𝑦 , . . .
The next result says that in general, from the index of a starting Turing machine
or computable function and from the values that are frozen, we can compute the
family members.
4.4 Theorem (s-m-n theorem, or Parameter theorem) For every 𝑚, 𝑛 ∈ N there
is a computable total function 𝑠𝑚,𝑛 : N^(1+𝑚) → N such that for an 𝑚 +𝑛 -ary function
𝜙𝑒 (𝑥 0, ... 𝑥𝑚−1, 𝑥𝑚 , ... 𝑥𝑚+𝑛−1 ) , freezing the initial 𝑚 variables at 𝑎 0, ... 𝑎𝑚−1 ∈ N
gives the 𝑛 -ary computable function 𝜙𝑠 (𝑒,𝑎0,...𝑎𝑚−1 ) (𝑥𝑚 , ... 𝑥𝑚+𝑛−1 ) .
Proof We will produce the function 𝑠 to satisfy three requirements: it must be
effective, it must input an index 𝑒 and an 𝑚 -tuple 𝑎 0, ... 𝑎𝑚− 1 , and it must output
the index of a machine P̂ that, when given the input 𝑥𝑚 , ... 𝑥𝑚+𝑛− 1 , will return the
value 𝜙𝑒 (𝑎 0, ... 𝑎𝑚− 1, 𝑥𝑚 , ... 𝑥𝑚+𝑛− 1 ) , or fail to halt if that function diverges.
The idea is that the machine that computes 𝑠 will construct the instructions
for P̂ . We can get effectively from the instruction set to the index, so with that we
will be done.
Below on the left is the flowchart for the machine that computes the function 𝑠 .
In its third box it creates the set of four-tuple instructions, P̂, sketched on the
right. The machine on the left needs 𝑎 0, ... 𝑎𝑚− 1 for the right side’s second, third,
and fourth boxes, and it needs 𝑒 for P̂ ’s fifth box. (In this book we try to avoid
getting entangled in the detail of the convention for representations for inputs and
outputs of Turing machines. However in this proof, to be as clear as possible in the
right side’s flowchart, we assume that its input is encoded in unary, that inputs
are separated with a single blank, and that when the machine is started the head
should be under the input’s left-most 1.)
[Flowchart on the left, computing 𝑠 : Start → Read 𝑒, 𝑎0 , ... , 𝑎𝑚−1 → Create instructions for P̂ → ...]
[Flowchart on the right, P̂ : Start → Move left 𝑎0 + · · · + 𝑎𝑚−1 + 𝑚 cells → Put 𝑎0 , ... , 𝑎𝑚−1 on the tape, separated by blanks → ...]
The Turing machine P̂ does not first read its inputs 𝑥𝑚 , ... 𝑥𝑚+𝑛− 1 . Instead,
it first moves left and writes 𝑎 0, ... 𝑎𝑚− 1 on the tape, in unary and separated by
blanks, and with a blank between 𝑎𝑚− 1 and 𝑥𝑚 . (Recall that the 𝑎𝑖 are parameters,
not variables. They are fixed. They are, so to speak, hard-coded into P̂.) Then,
using universality, 𝑃ˆ simulates Turing machine 𝑃𝑒 and lets it run on the entire list
of inputs now on the tape, 𝑎 0, ... 𝑎𝑚− 1, 𝑥𝑚 , ... 𝑥𝑚+𝑛− 1 .
In the notation 𝑠𝑚,𝑛 , the subscript 𝑚 is the number of inputs being frozen
while 𝑛 is the number of inputs left free. These subscripts can be a bother and we
often omit them.
The key point about the s-m-n theorem is that it gives not just one computable
function but instead a family.
4.5 Example Consider the two-input routine sketched by this flowchart.
Start
Read 𝑥 , 𝑦
(∗)
Print 𝑥 · 𝑦
End
Start
Read 𝑦
(∗∗)
Print 𝑥 · 𝑦
End
Compare (∗∗) to (∗). The difference is that the machine in (∗∗) does not read 𝑥 ;
rather, thinking of these as programs instead of Turing machines, 𝑥 is hard-coded
into the source body.
In summary, the s-m-n Theorem gives a sequence of computable functions such
as 𝜙𝑠 (𝑒0,𝑥 ) that is a family in that the indices are given by a computable function.
This family is parametrized by 𝑥 , since 𝑒 0 is fixed.
Restated, this family is uniformly computable — there is a computable function 𝑠
(more precisely, 𝑠 1,1 ) going from the index 𝑒 and the parameter value 𝑥 to the index
of the result in (∗∗). So the s-m-n Theorem is about uniformity.
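In the eval style of the earlier interpreter session we can sketch the function 𝑠 itself: it inputs the description of a two-input routine along with the value to freeze, and outputs the description of a one-input routine. This is only an analogy; the quoted program descriptions stand in for Turing machine indices, and the names s-1-1 and psi are ours.

```
> (define (s-1-1 description a)
    (list 'lambda '(y) (list description a 'y)))
> (define psi '(lambda (x y) (+ x y)))
> (s-1-1 psi 2)
'(lambda (y) ((lambda (x y) (+ x y)) 2 y))
> ((eval (s-1-1 psi 2)) 5)
7
```

Note that s-1-1 is total and never runs the routine it is handed; it only edits the description. Each value of the parameter gives another member of the family, computed uniformly, which is the content of the theorem.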
II.4 Exercises
✓ 4.6 Someone in your study group asks, “What can a Universal Turing machine do
that a regular Turing machine cannot?” Help them out.
4.7 Has anyone ever built a Universal Turing machine or a device equivalent to
one, or is it a theory-only thing?
4.8 Can a Universal Turing machine simulate another Universal Turing machine,
or for that matter can it simulate itself?
✓ 4.9 Your class has someone who says, “Universal Turing machines make no sense
to me. How could a machine simulate another machine that has more states?”
Correct their misimpression.
4.10 Is there more than one Universal Turing machine?
✓ 4.11 Consider the function 𝑓 (𝑥 0, 𝑥 1 ) = 3𝑥 0 + 𝑥 0 · 𝑥 1 .
(a) Freeze 𝑥 0 to have the value 4. What is the resulting one-variable function?
(b) Freeze 𝑥 0 at 5. What is the resulting one-variable function?
(c) Freeze 𝑥 1 to be 0. What is the resulting function?
4.12 Consider 𝑓 (𝑥 0, 𝑥 1, 𝑥 2 ) = 𝑥 0 + 2𝑥 1 + 3𝑥 2 .
(a) Freeze 𝑥 0 to have the value 1. What is the resulting two-variable function?
(b) What two-variable function results from fixing 𝑥 0 to be 2?
(c) Let 𝑎 be a natural number. What two-variable function results from fixing 𝑥 0
to be 𝑎 ?
(d) Freeze 𝑥 0 at 5 and 𝑥 1 at 3. What is the resulting one-variable function?
(e) What one-variable function results from fixing 𝑥 0 to be 𝑎 and 𝑥 1 to be 𝑏 , for
𝑎, 𝑏 ∈ N?
✓ 4.13 Suppose that the Turing machine sketched by this flowchart has index 𝑒 0 .
[Flowchart: Start → Read 𝑥 0 , 𝑥 1 → Print 𝑥 0 + 𝑥 1 → End]
[Flowchart: Start → Read 𝑥 0 , 𝑥 1 , 𝑥 2 → Print 𝑥 0 + 𝑥 1 · 𝑥 2 → End]
[Flowchart: Start → Read 𝑥 0 , 𝑥 1 → 𝑥 0 > 1? → if Y, Infinite loop; if N, Print 𝑥 1 → End]
(a) Describe 𝜙𝑠1,1 (𝑒0,0 ) . (b) Find 𝜙𝑠1,1 (𝑒0,0 ) ( 5) . (c) Describe 𝜙𝑠1,1 (𝑒0,1 ) . (d) Find
𝜙𝑠1,1 (𝑒0,1 ) ( 5) . (e) Describe 𝜙𝑠1,1 (𝑒0,2 ) . (f) Find 𝜙𝑠1,1 (𝑒0,2 ) ( 5) .
✓ 4.16 Let the Turing machine sketched by this flowchart have index 𝑒 0 .
[Flowchart: Start → Read 𝑥 0 , 𝑥 1 , 𝑦 → 𝑥 0 even? → if Y, Print 𝑥 1 · 𝑦 ; if N, Print 𝑥 1 + 𝑦 → End]
Section 5. The Halting problem 89
Section
II.5 The Halting problem
We’ve showed that there are functions that are not mechanically computable. We
gave a counting argument, that there are countably many Turing machines but
uncountably many functions and so there are functions with no associated machine.
While knowing what’s true is great, even better is to exhibit a specific function that
is unsolvable. We will now do that.
                    Input
              0 1 2 3 4 5 6 ...
          𝜙0  3 1 2 7 7 0 4 ...
          𝜙1  0 5 0 0 0 0 0 ...
Function  𝜙2  1 4 1 5 9 2 6 ...
          𝜙3  9 1 9 1 9 1 9 ...
          𝜙4  1 0 1 0 0 1 0 ...
          𝜙5  6 2 5 5 4 1 8 ...
              ⋮
[Flowchart, on the right: Start → Read 𝑒 → Compute table entry for index 𝑒 , input 𝑒 → Print result + 1 → End]
Diagonalizing means considering the machine on the right. It moves down the
array’s diagonal, changing the 3, changing the 5, etc. Thus, when 𝑒 = 0 then the
output is 4, when 𝑒 = 1 then the output is 6, etc. Our goal with this machine
is to ensure that no computable function, none of the table’s rows, has the same
input-output relationship as this machine.
But that’s a puzzle. The flowchart outlines an effective procedure — we can
implement this using a Universal Turing machine in its third box — and thus it
seems that its output should be one of the rows.
What’s the puzzle’s resolution? The flowchart’s first, second, fourth, and fifth
boxes are trivial so the answer must involve the third one. There must be an 𝑒 ∈ N
so that 𝜙𝑒 (𝑒) ↑, so that for that number the machine in the flowchart never gets
through its middle box, and consequently never gives any output. That is, to avoid
a contradiction the above table must contain ↑ ’s.
So this puzzle has led to a key insight: the fact that some computations fail to
halt on some inputs is very important.
5.1 Problem (Halting problem) † Given 𝑒 ∈ N, determine whether 𝜙𝑒 (𝑒) ↓, that is,
whether Turing machine P𝑒 halts on input 𝑒 .
Suppose, to get a contradiction, that the function below is mechanically computable.

    1𝐾 (𝑒) = 𝐾 (𝑒) = halt_decider (𝑒) =  1 – if 𝜙𝑒 (𝑒)↓
                                          0 – if 𝜙𝑒 (𝑒)↑
That assumption implies that the function 𝑓 below is also mechanically computable.
(In the top case the particular output value 42 doesn’t matter, all that matters is
that 𝑓 converges.) The flowchart illustrates how 𝑓 is constructed; it uses the above
function halt_decider in its decision box.
† We use a distinct typeface for problem names, as in ‘Halting’.
    𝑓 (𝑒) =  42 – if 𝜙𝑒 (𝑒)↑
              ↑ – if 𝜙𝑒 (𝑒)↓

[Flowchart: Start → Read 𝑒 → 𝐾 (𝑒) = 0? → if Y, Print 42 → End; if N, Infinite loop]
Since this is mechanically computable, it has a Turing machine index. Let that
index be 𝑒 0 , so that 𝑓 (𝑥) = 𝜙𝑒0 (𝑥) for all inputs 𝑥 .
Now consider 𝑓 (𝑒 0 ) = 𝜙𝑒0 (𝑒 0 ) (that is, feed the machine P𝑒0 its own index).
If it diverges then the first clause in the definition of 𝑓 means that 𝑓 (𝑒 0 )↓, which
contradicts the assumption of divergence. If it converges then 𝑓 ’s second clause
means that 𝑓 (𝑒 0 )↑, also a contradiction. So there are two possibilities and both lead
to a contradiction. Since assuming that halt_decider is mechanically computable
gives a contradiction, that function is not mechanically computable.
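In program form the construction of 𝑓 looks like this. The routine halt-decider is hypothetical; no such routine can exist, which is the point, so below it is only a stub that makes the sketch self-contained.

```
;; Stub standing in for the supposed decider; it cannot really be written.
(define (halt-decider e) 0)

;; The diagonal function f: output 42 where phi_e(e) diverges, and
;; diverge (loop forever) where phi_e(e) converges.
(define (f e)
  (if (= (halt-decider e) 0)
      42
      (f e)))

(f 5)   ; => 42, with this stub
```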
We say that a problem is unsolvable if no Turing machine has the specified
input-output behavior. If the problem is to compute the answers to ‘yes’ or ‘no’
questions, that is, to decide membership in a set, then we say that the set is
undecidable. With Church’s Thesis in mind, we interpret these to mean that the
problem or set is unsolvable by any discrete mechanism.
General unsolvability We have named one task, the Halting problem, that no
mechanical device can solve. We will next leverage that one to produce many jobs
that cannot be done. That is, the Halting problem is part of a larger phenomenon
of unsolvability.
5.4 Example Consider this problem: we want an algorithm that tells us whether a
given Turing machine halts on the input 3. That is: given 𝑒 , does 𝜙𝑒 ( 3)↓?
We will show that if this

    halts_on_three_decider (𝑒) =  1 – if 𝜙𝑒 ( 3)↓
                                   0 – otherwise
were a computable function then we could compute the solution of the Halting
problem. That’s impossible, so we will then know that halts_on_three_decider
is also not computable.
Our strategy is to create a scheme where being able to determine whether
an arbitrary machine halts on 3 allows us to settle questions about the Halting
problem. Imagine that we have a particular 𝑥 and want to know whether 𝜙𝑥 (𝑥)↓.
Consider the machine outlined on the right below. It reads the input 𝑦 and ignores
it, and also gives a nominal output. Its action is in the middle box, where the code
uses a universal Turing machine to simulate running P𝑥 on input 𝑥 . If that halts
then the machine on the right as a whole halts, for any input. If not then it never
gets through its middle box and so the machine as a whole does not halt.
[Flowchart on the left: Start → Read 𝑥, 𝑦 → Run P𝑥 on 𝑥 → Print 42 → End]
[Flowchart on the right: Start → Read 𝑦 → Run P𝑥 on 𝑥 → Print 42 → End]
As just one case, the machine on the right halts on input 𝑦 = 3 if and only if P𝑥
halts on 𝑥 (having P𝑥 halt on 𝑥 implies the same for all other input 𝑦 ’s, but that
is not relevant to our strategy). So with the machine on the right, if we were able
to answer questions about halting on 3 then we could leverage that ability, making
ourselves able to determine whether P𝑥 halts on 𝑥 .
We are ready for the argument. Consider this function.
    𝜓 (𝑥, 𝑦) =  42 – if 𝜙𝑥 (𝑥)↓
                 ↑ – otherwise
5.6 Example We will show that this problem is not mechanically solvable: given 𝑒 ,
determine whether P𝑒 outputs 7 for some input.
The argument is much like the one in the prior example. Consider this.
    𝜓 (𝑥, 𝑦) =  7 – if 𝜙𝑥 (𝑥)↓
                 ↑ – otherwise
The flowchart on the left below sketches how to compute 𝜓. Thus 𝜓 is intuitively
mechanically computable and by Church’s Thesis there is a Turing machine whose
input-output behavior is 𝜓. That Turing machine has an index, 𝑒 0 , so that 𝜓 = 𝜙𝑒0 .
[Flowchart on the left: Start → Read 𝑥, 𝑦 → Run P𝑥 on 𝑥 → Print 7 → End]
[Flowchart on the right: Start → Read 𝑦 → Run P𝑥 on 𝑥 → Print 7 → End]
The function on the left below is intuitively computable by the flowchart in the
middle.
    𝜓 (𝑥, 𝑦) =  2𝑦 – if 𝜙𝑥 (𝑥)↓
                 ↑ – otherwise

[Flowchart in the middle: Start → Read 𝑥, 𝑦 → Run P𝑥 on 𝑥 → Print 2𝑦 → End]
[Flowchart on the right: Start → Read 𝑦 → Run P𝑥 on 𝑥 → Print 2𝑦 → End]
So Church’s Thesis says that there is a Turing machine that computes it. Let that
machine’s index be 𝑒 0 . Apply the s-m-n theorem to get a family of functions 𝜙𝑠 (𝑒0,0 ) ,
𝜙𝑠 (𝑒0,1 ) , . . . The generic member of this family P𝑠 (𝑒0,𝑥 ) is sketched by the flowchart
on the right. It illustrates that 𝜙𝑥 (𝑥)↓ if and only if doubler_decider (𝑠 (𝑒 0, 𝑥)) = 1.
So the supposition that doubler_decider is computable implies that the Halting
problem is computably solvable, which is false.
These examples show that the Halting problem serves as a touchstone for
unsolvability: often we prove that something is unsolvable by demonstrating that if
we could solve it then we could solve the Halting problem. We say that the Halting
problem reduces to the given problem.† Thus for instance the Halting problem
reduces to the problem of determining whether a given Turing machine halts on
input 3.
Discussion The unsolvability of the Halting problem is one of the most important
results in the Theory of Computation. We will close with a few points.
First, to reiterate, saying that a problem is unsolvable means that it is unsolvable
by a mechanism, that no Turing machine computes the solution to the problem.
There is a function that solves it, but that function is not effectively computable.
Second, the fact that the Halting problem is unsolvable does not mean that for
all computations, we cannot tell if that computation halts. Obviously this program
halts for every input.
> (define (successor i)
(+ 1 i))
Nor does it mean that we cannot tell if a computation does not halt. This one,
> (define (f x)
(displayln x)
(f (+ 1 x)))
once started, just keeps going (below, control-C interrupted the run).
> (f 0)
0
1
...
97806
97807
; user break [,bt for context]
†
Often newcomers get this terminology backwards. We are using ‘reduces to’ in the same sense
that we would in saying in Calculus, “finding the area under the graph of a polynomial reduces to
antidifferentiating that polynomial.” We can find the area if we can antidifferentiate. Similarly here,
we can solve the Halting problem if we can solve the halts on 3 problem.
Section 5. The Halting problem 95
Instead, the unsolvability of the Halting problem says: there is no single program
that for all 𝑒 correctly decides in a finite time whether P𝑒 halts on input 𝑒 .
That sentence contains the qualifier ‘single program’ because for any index 𝑒 ,
either P𝑒 halts on 𝑒 or else it does not. Consequently, for any 𝑒 one of these two
programs produces the right answer.
Of course, guessing which one of the two applies is not what we have in mind
when we think about solving the Halting problem. We want uniformity. We want a
single effective procedure, one program, that inputs 𝑒 and that outputs the right
answer.
The sentence above also includes the qualifier ‘finite time’. We could write code
that reads an input 𝑒 and simulates P𝑒 on input 𝑒 . This is a uniform approach
because it is a single program. If P𝑒 on input 𝑒 halts then our code would discover
that. But if it does not halt then our code would not get that result in a finite time.
In short, the second point is that the unsolvability of the Halting Problem is about
the non-existence of a single program that works across all indices. Theorem 5.3
speaks to uniformity — specifically, it says that uniformity is impossible.
Our third point is about why unsolvability of the Halting problem is so important
in the subject. A beginning programming class could leave the impression that if a
program doesn’t halt then it just has a bug, something fixable. So it could seem to
a student in that course that the Halting problem is not interesting.
That impression is wrong. Imagine that we could somehow write a utility
always_halt that inputs any source P and adjusts it so that for any input where
P does not halt, the modified program will halt (with some nominal output) but
the utility does not change any outputs where P does halt. That would give a
list of total functions like the one on page 90, and diagonalization would give a
contradiction. Thus, in any general computational scheme there must be some
computations that halt on all inputs, some that halt on no inputs, and some that
halt on a proper subset of inputs but not on the rest. Unsolvability of the Halting
problem is inherent in the nature of computation.
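The diagonalization step in that argument can be tried on a toy scale. In this Python sketch (the names are ours), any finite list standing in for a supposed complete list of total functions is escaped by the diagonal function 𝑑 (𝑛) = 𝑓𝑛 (𝑛) + 1, which disagrees with the 𝑛-th function at input 𝑛 and so cannot appear anywhere on the list.

```python
# Toy diagonalization sketch: f_list stands in for a supposed complete
# list of total functions.  The diagonal function d differs from the
# n-th listed function at input n, so it is not on the list.
f_list = [
    lambda n: n,        # identity
    lambda n: 2 * n,    # doubling
    lambda n: n * n,    # squaring
]

def d(n):
    return f_list[n](n) + 1

for n, f in enumerate(f_list):
    assert d(n) != f(n)   # d escapes every function on the list
```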
This alone is enough to justify study of the problem, but our fourth point is
that there is another reason for our interest. With a computable halt_decider in
hand, we could solve many other problems. Some we saw above in this section,
but there are others, involving unbounded search, that we currently don’t know
how to solve.
For instance, a perfect number is a natural number that is the sum of its proper
positive divisors. An example is that 6 is perfect because 6 = 1 + 2 + 3. Another is
28 = 1 + 2 + 4 + 7 + 14. The next two perfect numbers are 496 and 8128. These
numbers have been studied since Euclid and today we understand the form of all
even perfect numbers. But no one knows if there are any odd perfect numbers.†
†
People have done computer checks up to 10¹⁵⁰⁰ and not found any.
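To see concretely how a halting decider would settle such questions, consider this Python sketch (the function names are ours). The search below halts exactly when an odd perfect number exists, so asking a hypothetical halt_decider whether the search halts would, in principle, answer the open question.

```python
def is_perfect(n):
    """True when n equals the sum of its proper positive divisors."""
    return n > 1 and sum(d for d in range(1, n) if n % d == 0) == n

def search_odd_perfect():
    """Unbounded search over the odd numbers; it halts exactly when an
    odd perfect number exists (no one knows whether it ever does)."""
    n = 1
    while True:
        if is_perfect(n):
            return n
        n += 2

# The known small perfect numbers are all even.
print([n for n in range(1, 500) if is_perfect(n)])   # → [6, 28, 496]
```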
II.5 Exercises
5.8 Someone asks the professor, “I don’t get the point of the Halting problem.
If you want programs to halt then just watch them and when they exceed a set
number of cycles, send a kill signal.” How to respond?
5.9 Is this statement right or wrong: there is no function that solves the Halting
Problem, that is, there is no 𝑓 such that 𝑓 (𝑒) = 1 if 𝜙𝑒 (𝑒)↓ and 𝑓 (𝑒) = 0 if 𝜙𝑒 (𝑒)↑?
†
This program takes an input 𝑥 but ignores it; in this book we like to have the machines that we use
take an input and also give an output.
ℎ( 3𝑛 + 1) – else
The Collatz conjecture is that ℎ(𝑛) = 1 for all 𝑛 ∈ N, that is, ℎ(𝑛) halts in that it
does not keep expanding forever. No one knows whether the Collatz conjecture is
true. Is it an unsolvable problem to determine whether ℎ halts on all input?
✓ 5.15 For each of these, is it true or false?
(a) The problem of determining, given 𝑒 , whether 𝜙𝑒 ( 3)↓ is unsolvable because
no function halts_on_three_decider exists.
(b) The existence of unsolvable problems indicates weaknesses in the models of
computation, and we need stronger models.
5.16 A set is computable if its characteristic function is a computable function.
Consider the set consisting of the single number 1 if in 1924 G Mallory reached the
summit of Everest, and otherwise consisting of 0. Is that set computable?
5.17 Describe the family of computable functions that you get by using the
s-m-n Theorem to parametrize 𝑥 in each function. Also give flowcharts sketching
the associated machines for 𝑥 = 0, 𝑥 = 1, and 𝑥 = 2. (a) 𝑓 (𝑥, 𝑦) = 3𝑥 + 𝑦
(b) 𝑓 (𝑥, 𝑦) = 𝑥𝑦² (c) 𝑓 (𝑥, 𝑦) = { 𝑥 – if 𝑥 is odd; 0 – otherwise }
5.18 Show that each of these is a solvable problem. (a) Given an index 𝑒 ,
determine whether Turing machine P𝑒 runs for at least 42 steps on input 3.
(b) Given an index 𝑒 , determine whether P𝑒 runs for at least 42 steps on input 𝑒 .
(c) Given 𝑒 , decide whether P𝑒 runs for at least 𝑒 steps on input 𝑒 .
Each exercise from 5.19 through 5.25 states a problem. Show that the problem is
unsolvable by reducing the Halting problem to it.
✓ 5.19 See the instructions above. Given an index 𝑒 , determine if 𝜙𝑒 is a total function,
that is, if it converges on every input.
✓ 5.20 See the instructions before Exercise 5.19. Given an index 𝑒 , decide if the
Turing machine P𝑒 squares its input. That is, decide if 𝜙𝑒 associates 𝑦 ↦→ 𝑦 2.
5.21 See the instructions above. Given 𝑒 , determine if the function 𝜙𝑒 halts and
returns the same value on two consecutive inputs, so that 𝜙𝑒 (𝑦) = 𝜙𝑒 (𝑦 + 1) for
some 𝑦 ∈ N.
✓ 5.22 See the instructions above. Given 𝑒 , decide whether 𝜙𝑒 fails to converge on
input 5.
5.23 See the instructions above. Given an index, determine if the computable
function with that index fails to converge on all odd numbers.
5.24 See the instructions above. Given 𝑒 , decide if the function 𝜙𝑒 has the action
𝑥 ↦→ 𝑥 + 1.
5.25 See the instructions above. Given 𝑒 , decide if the function 𝜙𝑒 fails to converge
on both inputs 𝑥 and 2𝑥 , for some 𝑥 .
5.26 One of these problems is solvable and one is not. Which is which? (a) Given
an index 𝑒 , decide if P𝑒 halts on input 153. (b) Given an index 𝑒 , decide if P𝑒
halts in fewer than 1000 steps on input 153.
5.27 Fix integers 𝑎, 𝑏, 𝑐 ∈ N and consider the problem L𝑎,𝑏,𝑐 of determining
whether there is a single-number input cantor (𝑥, 𝑦) such that 𝑎𝑥 + 𝑏𝑦 = 𝑐 . Is this
problem solvable or unsolvable?
5.28 For each problem, state whether it is solvable, unsolvable, or you cannot
tell. You needn’t give a proof, just decide. (a) Given 𝑒 , decide if P𝑒 halts on
all even numbers 𝑦 . (b) Given 𝑒 , decide if P𝑒 halts on three or fewer inputs 𝑦 .
(c) Given 𝑒 , decide if P4 halts on input 𝑒 . (d) Given 𝑒 , decide if P𝑒 contains an
instruction with state 𝑞𝑒 .
✓ 5.29 For each problem, fill in the blanks to show that the problem is unsolvable.
We will show that this is not mechanically computable.
(1)_decider (𝑒) = { 1 – if (2); 0 – otherwise }

𝜓 (𝑥, 𝑦) = { (3) – if 𝜙𝑥 (𝑥)↓; 0 – otherwise }

Left flowchart: Read 𝑥 , 𝑦 → Run P𝑥 on 𝑥 → __(4)__ → End
Right flowchart: Read 𝑦 → Run P𝑥 on 𝑥 → __(4)__ → End
By Church’s Thesis there is a Turing machine with that behavior. Let that machine have
index 𝑒 0 , so that 𝜓 (𝑥, 𝑦) = 𝜙𝑒0 (𝑥, 𝑦) . Apply the s-m-n Theorem to parametrize 𝑥 . A member
of the resulting family of Turing machines is sketched above on the right. Observe that
𝜙𝑥 (𝑥)↓ if and only if (1) _decider (𝑠 (𝑒 0, 𝑥)) = 1. Because the function 𝑠 is mechanically
computable, if (1) _decider were mechanically computable then the Halting problem
would be mechanically solvable. But the Halting problem is not mechanically solvable. Therefore
(1) _decider is not mechanically computable.
(a) Given machine index 𝑒 , decide if there is a 𝑦 ∈ N so that P𝑒 outputs 𝑦 on
input 𝑦 .
(b) Given 𝑒 , decide if there is a 𝑦 so that 𝜙𝑒 (𝑦) = 42.
(c) Given 𝑒 , decide if there is a 𝑦 so that 𝜙𝑒 (𝑦) = 𝑦 + 2.
5.30 In some ways a more natural set than 𝐾 = {𝑒 ∈ N 𝜙𝑒 (𝑒)↓} is 𝐾0 =
{ ⟨𝑒, 𝑥⟩ ∈ N2 𝜙𝑒 (𝑥)↓}. Use the fact that 𝐾 is not computable to prove that 𝐾0 is
also not computable.
5.31 The Halting problem of determining membership in the set 𝐾 = {𝑒 𝜙𝑒 (𝑒)↓}
appears to be an aggregate, or to cut across all Turing machines, in that for every
Turing machine a piece of information about that machine forms part of 𝐾 .
(a) Produce a single Turing machine, P𝑒 , such that the question of determining
membership in {𝑦 𝜙𝑒 (𝑦)↓} is undecidable.
(b) Fix a number 𝑦 . Show that the question of whether P𝑒 halts on 𝑦 is decidable.
✓ 5.32 For each, if it is mechanically solvable then sketch an algorithm to solve it. If
it is unsolvable then show that.
(a) Given 𝑒 ∈ N, determine the number of states in P𝑒 .
(b) Given 𝑒 , determine whether P𝑒 halts when the input is the empty string.
(c) Given 𝑒 , determine if P𝑒 halts on input 𝑛 within one hundred steps.
5.33 Is 𝐾 infinite?
5.34 True or false: the number of unsolvable problems is countably infinite.
5.35 Show that for any Turing machine, the problem of determining whether it
halts on all inputs is solvable.
5.36 Goldbach’s conjecture is that every even natural number greater than two is
the sum of two primes. It is one of the oldest and best-known unsolved problems
in mathematics. Show that if we could solve the Halting problem then we could in
principle settle Goldbach’s conjecture.
5.37 Brocard’s problem asks whether there are any numbers besides 4, 5, and 7
for which 𝑛! + 1 is a perfect square (computer searches up to a quadrillion, 1 × 10¹⁵,
have not found any other solutions). Show that if we could solve the Halting
problem then we could in principle settle this problem.
5.38 Show that most problems are unsolvable by showing that there are uncount-
ably many functions 𝑓 : N → N that are not computed by any Turing machine,
while the number of functions that are computable is countable.
5.39 Give an example of a computable function that is total, meaning that it
converges on all inputs, but whose range is not computable.
5.40 A set of bitstrings is a decidable language if its characteristic function is
computable. Prove each. (a) The union of two decidable languages is a decidable
language. (b) The intersection of two decidable languages is a decidable language
(c) The complement of a decidable language is a decidable language.
Section
II.6 Rice’s Theorem
Our finishing point in the prior section was that the results and examples there give
the intuition that we cannot mechanically analyze the behavior of Turing machines.
In this section we will make this intuition precise.
Mechanical analysis does apply to some properties of Turing machines. We
can write a routine that, given 𝑒 , determines whether or not P𝑒 has a four-tuple
instruction whose first entry is the state 𝑞 5 . The analogue in ordinary programming
is that we can write a program to parse source code for a variable named x1. But
these are not what we mean by “behavior.” Instead, they are properties of the
implementation.
6.1 Definition Two computable functions have the same behavior, 𝜙𝑒 ≃ 𝜙𝑒ˆ, if they
converge on the same inputs 𝑥 ∈ N and when they do converge, they have the
same outputs.‡
‡
Strictly speaking, we don’t need the symbol ≃. By definition, a function is a set of ordered pairs. If
𝜙𝑒 ( 0 ) ↓ while 𝜙𝑒 ( 1 ) ↑ then the set 𝜙𝑒 contains a pair with first entry 0 but no pair starting with 1.
Thus for partial functions, if they converge on the same inputs and when they do converge they have
the same outputs, then we can simply say that the two are equal, 𝜙 = 𝜙ˆ. But we use ≃ as a reminder
that the functions may be partial.
6.2 Definition A set I of natural numbers is an index set† when for all 𝑒, 𝑒ˆ ∈ N, if
𝑒 ∈ I and 𝜙𝑒 ≃ 𝜙𝑒ˆ then also 𝑒ˆ ∈ I .
6.3 Example If we fix a behavior and consider the indices of all of the Tur-
ing machines with that behavior then we get an index set. Thus the set
I = {𝑒 ∈ N 𝜙𝑒 (𝑥) = 2𝑥 for all 𝑥 } is an index set. To verify, suppose that 𝑒 ∈ I
and that 𝑒ˆ ∈ N is such that 𝜙𝑒 ≃ 𝜙𝑒ˆ. Then 𝜙𝑒ˆ also doubles its input: 𝜙𝑒ˆ (𝑥) = 2𝑥 for
all 𝑥 . Thus 𝑒ˆ ∈ I also.
6.4 Example We can also get an index set by collecting multiple behaviors together.
The set J = {𝑒 ∈ N 𝜙𝑒 (𝑥) = 3𝑥 for all 𝑥 , or 𝜙𝑒 (𝑥) = 𝑥 3 for all 𝑥 } is an index set.
For, suppose that 𝑒 ∈ J and that 𝜙𝑒 ≃ 𝜙𝑒ˆ where 𝑒ˆ ∈ N. Because 𝑒 ∈ J, either
𝜙𝑒 (𝑥) = 3𝑥 for all 𝑥 or 𝜙𝑒 (𝑥) = 𝑥 3 for all 𝑥 . From 𝜙𝑒 ≃ 𝜙𝑒ˆ we know that either
𝜙𝑒 (𝑥) = 3𝑥 for all 𝑥 or 𝜙𝑒 (𝑥) = 𝑥 3 for all 𝑥 , and consequently 𝑒ˆ ∈ J.
6.5 Example The set {𝑒 ∈ N P𝑒 contains an instruction starting with 𝑞 10 } is not an
index set. We can easily produce two Turing machines having the same behavior
where one machine contains such an instruction while the other does not.
6.6 Theorem (Rice’s theorem) Every index set that is not trivial, that is not empty
and not all of N, is not computable.
Proof Let I be a nontrivial index set. Choose an 𝑒 ∈ N so that 𝜙𝑒 (𝑦) ↑ for all 𝑦 .
Then either 𝑒 ∈ I or 𝑒 ∉ I . We shall show that in the second case I is not
computable. The first case is similar and is Exercise 6.36.
So assume 𝑒 ∉ I . Since I is not empty it contains an index 𝑒ˆ ∈ I . Because I is
an index set and 𝑒 ∉ I , the two functions have different behaviors, 𝜙𝑒 ≄ 𝜙𝑒ˆ.
Since 𝜙𝑒 diverges everywhere, there is a 𝑦 such that 𝜙𝑒ˆ (𝑦)↓.
Consider the flowchart on the left below. By Church’s Thesis there is a
Turing machine with that behavior. Let it be P𝑒0 . Apply the s-m-n theorem to
parametrize 𝑥 , resulting in the uniformly computable family of functions 𝜙𝑠 (𝑒0,𝑥 )
whose computation is outlined on the right.
Left flowchart: Start → Read 𝑥, 𝑦 → Run P𝑥 on 𝑥 → Run P𝑒ˆ on 𝑦 → End
Right flowchart: Start → Read 𝑦 → Run P𝑥 on 𝑥 → Run P𝑒ˆ on 𝑦 → End
We’ve constructed the machine on the right so that if 𝜙𝑥 (𝑥)↑ then 𝜙𝑠 (𝑒0,𝑥 ) ≃ 𝜙𝑒 and
thus 𝑠 (𝑒 0, 𝑥) ∉ I . As well, if 𝜙𝑥 (𝑥) ↓ then 𝜙𝑠 (𝑒0,𝑥 ) ≃ 𝜙𝑒ˆ, and thus 𝑠 (𝑒 0, 𝑥) ∈ I . It
follows that if I were mechanically computable, so that we could effectively check
whether 𝑠 (𝑒 0, 𝑥) ∈ I , then we could solve the Halting problem.
6.7 Example We will use Rice’s Theorem to show that this problem is unsolvable: given
𝑒 , decide if 𝜙𝑒 ( 3)↓. We must define an appropriate set I and then verify that it is
not empty, that it is not all of N, and that it is an index set.
†
It is called an index set because it is a set of indices.
Let I = {𝑒 ∈ N 𝜙𝑒 ( 3)↓}. The simplest way to verify that this set is not empty
is to exhibit a member. The routine sketched on the left below is intuitively
computable and so Church’s Thesis says there is a Turing machine with that
behavior. That machine’s index is a member of I and thus I ≠ ∅.
Left flowchart: Start → Read 𝑥 → Print 42 → End
Right flowchart: Start → Read 𝑥 → Infinite loop
Likewise, to verify that I does not contain every number, consider the routine on
the right. Church’s Thesis gives that there is a Turing machine with that behavior.
That machine’s index is not a member of I and so I ≠ N.
We finish by verifying that I is an index set. Assume that 𝑒 ∈ I and let 𝑒ˆ ∈ N
be such that 𝜙𝑒 ≃ 𝜙𝑒ˆ. Because 𝑒 ∈ I , we have that 𝜙𝑒 ( 3) ↓. Because 𝜙𝑒 ≃ 𝜙𝑒ˆ, we
have that 𝜙𝑒ˆ ( 3)↓ also, and thus 𝑒ˆ ∈ I . Hence, I is an index set.
The above example is the same problem as in the first example of the prior
subsection. Note that Rice’s Theorem makes the answer considerably simpler. (Of
course, our development of the theorem requires the prior section’s work.)
6.8 Example We can use Rice’s Theorem to show that the prior section’s second
problem is unsolvable: given 𝑒 , decide if 𝜙𝑒 (𝑥) = 7 for some 𝑥 . Rice’s Theorem
asks us to produce an appropriate I and verify that it is a nontrivial index set.
Let I = {𝑒 ∈ N 𝜙𝑒 (𝑥) = 7 for some 𝑥 }. This set is not empty because there
is a Turing machine that acts as the identity function, so that 𝜙 (𝑥) = 𝑥 , and the
index of that machine is a member of I . This set is not all of N because there is a
Turing Machine that never halts, 𝜙 (𝑥)↑ for all 𝑥 , and that machine’s index is not a
member of I . Hence I is nontrivial.
To show that I is an index set assume that 𝑒 ∈ I , and let 𝑒ˆ ∈ N be such that
𝜙𝑒 ≃ 𝜙𝑒ˆ. By the assumption, 𝜙𝑒 (𝑥 0 ) = 7 for some input 𝑥 0 . Since the two have the
same behavior, the same input gives 𝜙𝑒ˆ (𝑥 0 ) = 7. Consequently, 𝑒ˆ ∈ I .
6.9 Example This problem is also unsolvable: given 𝑒 , decide whether 𝜙𝑒 equals this.
𝑓 (𝑥) = { 4 – if 𝑥 is prime; 𝑥 + 1 – otherwise }
(two pictures omitted: on the left, the box of all indices split into behavior
tiles; on the right, the same box with three tiles selected)
As in Example 6.4, an index set joins together the indices from a number of
behaviors; in the picture above on the right it is three of them. The property
of being in this I is extensional because we are taking entire behavior tiles.
In summary, although Rice’s Theorem does not apply to all problems, never-
theless it is especially significant for understanding what can be done through
mechanical analysis alone. Rice’s Theorem is about those properties of machines
that extend to be properties of the functions that those machines compute. For
instance, the problem of deciding whether a program computes the squaring
function is unsolvable, but the problem of deciding whether the code uses the letter
‘k’ is not.
II.6 Exercises
6.10 Your friend is confused, “According to Rice’s Theorem, everything is impos-
sible. Every property of a computer program is non-computable. But I do this
supposedly impossible stuff all the time!” Help them out.
6.11 Is I = {𝑒 P𝑒 runs for at least 100 steps on input 5 } an index set?
6.12 Why does Rice’s theorem not show that this problem is unsolvable: given 𝑒 ,
decide whether ∅ ⊆ {𝑥 𝜙𝑒 (𝑥)↓}?
6.13 True or false: the given property of machines is extensional. Briefly justify.
(a) The machine halts on input 5. (b) It has exactly four instructions. (c) It
computes twice its input.
6.14 Briefly describe why these machine properties, listed in the section, are
non-extensional. (a) The property of halting within 100 steps on every input,
(b) of visiting fewer than 100 tape cells on input 0, and (c) of containing the
state 𝑞 10 .
6.15 Give a trivial index set: fill in the blank I = {𝑒 P𝑒 ____ } so that
the set I is empty.
6.16 Give a trivial index set: fill in the blank I = {𝑒 P𝑒 ____ } so that
the set I is all of N.
6.17 For each problem, produce an index set suitable for applying Rice’s Theorem.
You needn’t give the entire argument, just produce the set.
(a) Given 𝑒 , determine if P𝑒 halts on input 7 with output 7.
(b) Given 𝑒 , determine if P𝑒 halts on input 𝑒 and returns output 𝑒 .
(c) Given 𝑒 , determine if P2𝑒 returns output 7 for any input 𝑦 .
(d) Given 𝑒 , determine if P𝑒 halts on 7 and gives a prime number.
For each of the problems from Exercise 6.18 to Exercise 6.24, show that it is unsolvable
by applying Rice’s theorem. (These repeat the problems from Exercise 5.19 to
Exercise 5.25.)
✓ 6.18 See the instructions above. Given an index 𝑒 , determine if 𝜙𝑒 is total, that is,
if it converges on every input.
✓ 6.19 See the instructions above. Given an index 𝑒 , decide if the Turing machine P𝑒
squares its input. That is, decide if 𝜙𝑒 performs 𝑦 ↦→ 𝑦 2 .
6.20 See the instructions above. Given 𝑒 , determine if the function 𝜙𝑒 returns the
same value on two consecutive inputs, so that 𝜙𝑒 (𝑦) = 𝜙𝑒 (𝑦 + 1) for some 𝑦 ∈ N.
6.21 See the instructions above. Given an index 𝑒 , determine whether 𝜙𝑒 fails to
converge on input 5.
6.22 See the instructions above. Given an index, determine whether the Turing
machine P𝑒 fails to halt on all odd numbers.
6.23 See the instructions above. Given an index 𝑒 , decide if the function 𝜙𝑒
computed by machine P𝑒 performs 𝑥 ↦→ 𝑥 + 1.
6.24 See the instructions above. Given an index 𝑒 , decide if the function 𝜙𝑒 fails
to converge on both inputs 𝑥 and 2𝑥 , for some 𝑥 .
✓ 6.25 Show that each of these is an unsolvable problem by applying Rice’s Theorem.
(a) The problem of determining whether a function is partial, that is, whether it
fails to converge on some input.
(b) The problem of deciding whether a function ever converges, on any input.
✓ 6.26 For each problem, fill in the blanks to prove that it is unsolvable.
We will show that I = {𝑒 ∈ N (1) } is a nontrivial index set. Then Rice’s theorem will
give that the problem of determining membership in I is algorithmically unsolvable.
First we argue that I ≠ ∅. The routine sketched here: (2) is intuitively computable so
by Church’s Thesis there is such a Turing machine. That machine’s index is an element of I .
Next we argue that I ≠ N. The other sketch: (3) is intuitively computable so by
Church’s Thesis there is such a Turing machine. Its index is not an element of I .
Finally, we show that I is an index set. Suppose that 𝑒 ∈ I and that 𝑒ˆ is such that 𝜙𝑒 ≃ 𝜙𝑒ˆ.
Because 𝑒 ∈ I , (4) . Because 𝜙𝑒 ≃ 𝜙𝑒ˆ we have that (5) . Thus, 𝑒ˆ ∈ I . Consequently I
is an index set.
(a) Given 𝑒 , determine if Turing machine 𝑒 halts on all inputs 𝑥 that are multiples
of five.
(b) Given 𝑒 , decide if Turing machine 𝑒 ever outputs a seven.
6.27 Prove that any set that is not computable is infinite.
6.28 Define that a Turing machine accepts a set of bit strings L ⊆ B∗ if that machine
inputs bit strings, and it halts on all inputs, and it outputs 1 if and only if the input
is a member of L. Show that each problem is unsolvable, using Rice’s Theorem.
(a) The problem of deciding, given 𝑒 ∈ N, whether P𝑒 accepts an infinite language.
(b) The problem of deciding, given 𝑒 ∈ N, whether P𝑒 accepts the string 101.
6.29 As in the prior exercise, a Turing machine accepts a set of bit strings L ⊆ B∗
if it inputs bit strings, halts on all inputs, and it outputs 1 if input is a member
of L, and 0 otherwise. Show that this problem is unsolvable: given 𝑒 , determine if
P𝑒 accepts B∗ itself. Show that this problem is mechanically unsolvable: given 𝑒 ,
determine whether there is an input 𝑥 so that 𝜙𝑒 (𝑥)↓.
6.30 We say that a Turing machine has an unreachable state if for all inputs,
during the course of the computation the machine never enters that state. Show
that I = {𝑒 P𝑒 has an unreachable state } is not an index set.
6.31 Your classmate says, “The section ends with ‘Rice’s Theorem is about those
properties of machines that extend to be properties of the functions that those
machines compute.’ But here is a problem that is about the properties of machines
but is also solvable: given 𝑒 , determine whether P𝑒 only halts on an empty input
tape. To solve this problem, give machine P𝑒 an empty input and see whether it
halts or it goes on.” Where are they mistaken?
6.32 Show that no finite set that is nonempty is an index set.
6.33 Show that each of these is an index set.
(a) {𝑒 ∈ N machine P𝑒 halts on at least five inputs }
(b) {𝑒 ∈ N the function 𝜙𝑒 is one-to-one }
(c) {𝑒 ∈ N the function 𝜙𝑒 is either total or else 𝜙𝑒 ( 3)↑}
6.34 In the section we characterized index sets as in the picture below. We start
with the set of all integers, which is the rectangular box, and group them together
when they are indices of equal computable functions. Then to get an index set,
select a few parts such as the three shown, and take their union.
(picture omitted: the box of all indices grouped into parts, with three parts selected)
Here we justify that picture. (a) Consider the relation ≃ between natural numbers
given by 𝑒 ≃ 𝑒ˆ if 𝜙𝑒 ≃𝜙𝑒ˆ. Show that this is an equivalence relation. (b) Describe the
parts, the equivalence classes. (c) Show that each index set is the union of some
of the equivalence classes. Hint: show that if an index set contains one element of
a class then it contains them all.
6.35 Because being an index set is a property of a set, we naturally consider how
it interacts with set operations. (a) Show that the complement of an index set is
also an index set. (b) Show that the collection of index sets is closed under union.
(c) Is it closed under intersection? If so prove that and if not then give a
counterexample.
6.36 Do the 𝑒 ∈ I case in the proof of Rice’s Theorem, Theorem 6.6.
Section
II.7 Computably enumerable sets
The natural way to attack the Halting problem is to start by simulating P0 on
input 0 for one step. Next, simulate P0 on input 0 for a second step and also
simulate P1 on input 1 for one step. After that, run P0 on 0 for a third step, and P1
on 1 for a second step, and then P2 on 2 for one step. In this way, cycle among the
P𝑒 on 𝑒 simulations, running each for a step.† Eventually some of these halt and
†
That is, run a loop that at iteration 𝑖 runs the 𝑠 -th step of simulating P𝑒 on input 𝑒 , where 𝑖 = 𝑒 + 𝑠 .
the elements of 𝐾 start to fill in. On computer systems this interleaving is called
time-slicing but in theory discussions it is called dovetailing.
We are imagining a computable 𝑓 such that 𝑓 ( 0) = 𝑒 , where it happens that
P𝑒 on input 𝑒 is the first of these to halt in the dovetailing, etc. The stream of
numbers 𝑓 ( 0) , 𝑓 ( 1) , . . . gives the elements of 𝐾 .
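Here is a Python sketch of dovetailing (the names are ours), with each machine simulation stood in for by a generator that we can advance one step at a time. A real dovetailer runs forever; this sketch bounds the number of rounds.

```python
def dovetail(machines, rounds):
    """Run each 'machine' (a generator standing in for a step-by-step
    simulation) in interleaved fashion; yield the index of each one
    that halts.  Real dovetailing never stops; rounds bounds this sketch."""
    sims = {e: m() for e, m in enumerate(machines)}
    for _ in range(rounds):
        for e in list(sims):
            try:
                next(sims[e])          # one more simulated step
            except StopIteration:      # machine e halted
                del sims[e]
                yield e

def halts_after(k):
    """A toy 'machine' that runs k steps and then halts."""
    def run():
        for _ in range(k):
            yield
    return run

def runs_forever():
    def run():
        while True:
            yield
    return run

machines = [halts_after(3), runs_forever(), halts_after(1)]
print(list(dovetail(machines, 10)))    # → [2, 0]
```

Machine 1 never halts, so its index never appears; that is exactly why dovetailing reveals membership in 𝐾 but never reveals non-membership.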
Why won’t this process solve the Halting problem? If 𝑒 ∈ 𝐾 then dovetailing will
eventually find that out. But if 𝑒 ∉ 𝐾 then it will never reveal the non-membership.
Recall that a set of natural numbers is computable if its characteristic function
is computable. We are seeing another way to describe a set, listing its members.
Definition 1.13 gives the terminology: a function 𝑓 with domain N ‘enumerates’ its
range.
7.1 Definition A set of natural numbers is computably enumerable (or c.e.) if it
is effectively listable, that is, if it is the range of a computable function. (That
function may be total or it may be partial.) Alternate terms for the same thing
are: recursively enumerable (or r.e.), or semicomputable, or semidecidable.
Picture the stream 𝜙 ( 0) , 𝜙 ( 1) , 𝜙 ( 2) , . . . gradually filling out the set. (The
stream may contain repeats, the numbers may appear in some willy-nilly order,
not necessarily ascending, and perhaps some of the 𝜙 (𝑖) ’s diverge.)
7.2 Remark Here is a particularly interesting stream. Fix a mathematical topic such
as elementary number theory. Statements in that topic are strings of symbols and
we can give each a number (perhaps by writing that statement in Unicode and the
number is its binary encoding, prefixed with a 1 to avoid any ambiguity of leading
0’s in the binary). Set up a process that starts with the axioms for this topic and
does a breadth-first traversal of all logical derivations from those axioms. It might
first combine axiom 0 with axiom 1, and then next combine axiom 0 with axiom 2,
etc. In this way it generates a list of all of this theory’s possible proofs. Whenever
it finishes a proof, the process outputs the number of the final statement in the
derivation, the proved statement.
Suppose that we have a statement from this topic that we are interested in, such
as Goldbach’s conjecture that every even number is the sum of at most two primes.
We could watch the process as it enumerates the theorems. If the statement is
provable then its number will eventually appear.
7.3 Lemma The following are equivalent for a set of natural numbers.
(a) It is computably enumerable, that is, it is the range of a computable function.
(b) It is the range of a total computable function, or it is empty.
(c) It is the domain of a computable function.
Proof We will show that the first and second are equivalent. That the second and
third are equivalent is Exercise 7.32.
As usual, one of the two directions is easy. Here it is (b) implies (a). If the set 𝑆
is the range of a total computable function then it is the range of a computable
function. If 𝑆 is empty then it is the range of the computable function that never
converges.
Now for (a) implies (b). Assume that 𝑆 is computably enumerable so that it is
the range of a computable function 𝜙𝑒 (which may be non-total). If 𝜙𝑒 diverges for
all inputs then 𝑆 = ∅, which is one of the cases in (b).
In the other case, where 𝜙𝑒 does not diverge for all inputs, we will produce
a total computable 𝑓 whose range is 𝑆 . In this case there is an input 𝑦 where
𝜙𝑒 (𝑦)↓; let 𝑠 0 be 𝜙𝑒 (𝑦) . Define 𝑓 (𝑛) by: given 𝑛 ∈ N, run the computations of P𝑒
on inputs 0, 1, . . . 𝑛 , each for 𝑛 -many steps. Possibly some of these halt. Let 𝑓 (𝑛)
be the least 𝑘 where P𝑒 halts on some input 𝑖 within 𝑛 steps and outputs 𝑘 , and also
where 𝑘 ∉ { 𝑓 ( 0), 𝑓 ( 1), ... 𝑓 (𝑛 − 1) }. If no such 𝑘 exists then define 𝑓 (𝑛) = 𝑠 0 .
By the prior paragraph’s final sentence, 𝑓 is total. We must verify that the range
of 𝑓 is 𝑆 . For 𝑡 ∈ N, if 𝑡 ∉ 𝑆 then P𝑒 never outputs it and so 𝑡 is never defined
as 𝑓 (𝑛) . If 𝑡 ∈ 𝑆 then there must be an 𝑛 large enough that P𝑒 halts on some
input 𝑖 ≤ 𝑛 within 𝑛 steps and outputs 𝑡 . The number 𝑡 is then queued for output
by 𝑓 in the sense that it will be enumerated as, at most, 𝑓 (𝑛 + 𝑡) .
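The construction in this proof can be sketched in Python (the names are ours, and the partial function 𝜙𝑒 is modeled by a table giving, for each input where the machine halts, its step count and output, with None marking divergence).

```python
# Sketch of the proof's construction.  The partial enumerating function
# is modeled as a table: phi[i] = (steps, value) when the machine halts
# on input i after that many steps, and None when it diverges.
phi = {0: None, 1: (2, 7), 2: (5, 7), 3: (1, 4), 4: None}
s0 = 7   # a value known to be in the range, here phi(1)

def halts_within(i, n):
    return phi.get(i) is not None and phi[i][0] <= n

def f(n):
    """Total function with the same range as phi: simulate inputs
    0..n for n steps each, output the least value not yet produced,
    and fall back on s0 so that f is defined everywhere."""
    seen = {f(m) for m in range(n)}          # f(0), ..., f(n-1)
    fresh = sorted(phi[i][1] for i in range(n + 1)
                   if halts_within(i, n) and phi[i][1] not in seen)
    return fresh[0] if fresh else s0

# f is total, and its range fills out to the range of phi.
print(sorted({f(n) for n in range(10)}))     # → [4, 7]
```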
Thus, the collection of effectively listable sets is the same as the collection of
domains of computable functions. There is a standard notation for the latter.
7.4 Definition 𝑊𝑒 = {𝑥 𝜙𝑒 (𝑥)↓}
The contrast between computable and computably enumerable is that a set 𝑆 is
computable if there is a Turing machine that decides its membership, that inputs
a number 𝑥 and decides either ‘yes’ or ‘no’ whether 𝑥 ∈ 𝑆 . But with computably
enumerable, given some 𝑥 we can set up a machine to monitor the number stream
and if 𝑥 appears then this machine decides ‘yes’. However, it might never discover
‘no’. Restated, a set is computable if there is a Turing machine that recognizes
both members and nonmembers, while a set is computably enumerable if there is
a Turing machine that recognizes members.
7.5 Lemma (a) If a set is computable then it is computably enumerable.
(b) A set is computable if and only if both it and its complement are computably
enumerable.
Proof For (a) let 𝑆 ⊆ N be computable so that its characteristic function is the
computable function 𝜙 . We will enumerate the elements of 𝑆 . Begin by using 𝜙 to
test whether 0 ∈ 𝑆 , that is, whether 𝜙 ( 0) = 1. Then test whether 1 ∈ 𝑆 , etc. If this
sequence of tests ever finds a 𝑘 0 so that 𝜙 (𝑘 0 ) = 1, then set 𝑓 ( 0) = 𝑘 0 . After that,
iterate: find the next element of 𝑆 by testing whether 𝑘 0 + 1 ∈ 𝑆 , 𝑘 0 + 2 ∈ 𝑆 , . . .
and if this testing sequence ever halts with a 𝑘 1 then set 𝑓 ( 1) = 𝑘 1 . Clearly 𝑓 is a
computable function whose range is 𝑆 .
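That procedure, as a bounded Python sketch (the names are ours; a true enumeration is unending):

```python
def enumerate_set(chi, limit):
    """List the members of a computable set by testing 0, 1, 2, ...
    against its characteristic function chi.  A real enumeration never
    stops looking; limit bounds this sketch."""
    return [k for k in range(limit) if chi(k) == 1]

chi_even = lambda k: 1 if k % 2 == 0 else 0
print(enumerate_set(chi_even, 10))   # → [0, 2, 4, 6, 8]
```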
As to (b), suppose first that 𝑆 is computable. The complement 𝑆 c is also
computable because its characteristic function is 1𝑆 c = 1 − 1𝑆 . With that, item (a)
gives that both 𝑆 and 𝑆 c are computably enumerable.
For the converse, suppose that both 𝑆 and 𝑆 c are computably enumerable. Let
𝑆 be enumerated by the function 𝑔 and let 𝑆 c be enumerated by 𝑔ˆ. To show that
𝑆 is computable we will give an effective procedure that acts as its characteristic
II.7 Exercises
✓ 7.7 A question on the quiz asked you to define computably enumerable. A friend
says that they answered, “A set that can be enumerated by a Turing machine but
that is not computable.” Is that right?
7.8 Your study partner asks the group, “One computably enumerable set is the
empty set. But the empty set is not effectively listable, because you can’t list
nothing.” Where are they mis-thinking?
✓ 7.9 For each set, produce a function that enumerates it (a) N (b) the even numbers
(c) the perfect squares (d) the set { 5, 7, 11 }.
7.10 For each, produce a function that enumerates it (a) the prime numbers
(b) the natural numbers whose digits are in non-increasing order (e.g., 531 or
5331 but not 513).
7.11 Are there computably enumerable sets that are infinite? Finite? Empty? All
of the natural numbers?
7.12 One of these two is computable and the other is computably enumerable but
not computable. Which is which?
(a) {𝑒 P𝑒 halts on input 4 in less than twenty steps }
(b) {𝑒 P𝑒 halts on input 4 in more than twenty steps }
7.13 Which of these sets are decidable, which are semidecidable but not decidable,
and which are neither? Justify in one sentence. (a) The set of indices 𝑒 such that
P𝑒 takes more than 100 steps on input 7. (b) The set of indices 𝑒 such that P𝑒
takes less than 100 steps on input 7.
7.14 (IIS, IIT 2022) One of these statements is true. Which? (a) Every proper sub-
set of a computably enumerable set is computable. (b) If a set and its complement
are both computably enumerable then both are computable.
✓ 7.15 Someone online says, “every countable set 𝑆 is computably enumerable
because if 𝑓 : N → N has range 𝑆 then you have the enumeration of 𝑆 as 𝑓 ( 0) , 𝑓 ( 1) ,
. . .” Explain why this is wrong.
✓ 7.16 The set 𝐴5 = {𝑒 | 𝜙𝑒 ( 5)↓} is not computable. Show that it is computably
enumerable.
7.17 Show that the collection of computably enumerable sets is countable.
7.18 Every uncomputable set is infinite, since every finite set is computable. Is
every computably enumerable set infinite?
7.19 Short answer: for each set, state whether it is computable, computably
enumerable but not computable, or neither. (a) The set of indices 𝑒 of Turing
machines that contain an instruction starting with state 𝑞 4 . (b) The set of indices
of Turing machines that halt on input 4. (c) The set of indices of Turing machines
that halt on input 4 in fewer than 100 steps.
7.20 Show that the set {𝑒 | 𝜙𝑒 ( 2) = 4 } is computably enumerable.
7.21 Name three sets that are computably enumerable but not computable.
✓ 7.22 Let 𝐾0 = { ⟨𝑒, 𝑥⟩ | P𝑒 halts on input 𝑥 }. (a) Show that it is computably enu-
merable. (b) Show that its columns, the sets 𝐶𝑒 = {𝑥 | P𝑒 halts on input 𝑥 },
make up all of the computably enumerable sets.
7.23 We know that there are subsets of N that are not computable. Do the
computably enumerable sets make up the subsets that are not computable?
✓ 7.24 Show that the set Tot = {𝑒 | 𝜙𝑒 (𝑥)↓ for all 𝑥 } is not computable and not
computably enumerable. Hint: if this collection is computably enumerable then we
can get a table like the one that starts Section II.5 on the Halting problem.
7.25 Prove that the set {𝑒 | 𝜙𝑒 ( 3)↑} is not computably enumerable.
✓ 7.26 Prove that every infinite computably enumerable set has an infinite
computable subset.
7.27 Define the function steps by: steps (𝑒) is the minimal number of steps so
that Turing machine P𝑒 halts if started with input 𝑒 on its tape, or is undefined
if the machine never halts. (a) Argue that this function is partial computable.
(b) Argue that it is not total. (c) Prove that it has no total computable extension,
no total computable 𝑓 : N → N so that if steps (𝑒)↓ then steps (𝑒) = 𝑓 (𝑒) .
7.28 A set is computably enumerable in increasing order if there is a computable 𝑓
that is increasing, so that 𝑛 < 𝑚 implies 𝑓 (𝑛) < 𝑓 (𝑚) , and whose range is the
set. Assume that the set 𝑆 is infinite. Prove that 𝑆 is computable if and only if it is
computably enumerable in increasing order.
7.29 A set is co-computably enumerable if its complement is computably enu-
merable. Produce a set that is neither computably enumerable nor co-computably
enumerable.
7.30 (Compare this with the next exercise.) Computability is a property of sets so we
can consider its interaction with set operations. (a) Must a subset of a computable
set be computable? (b) Must the union of two computable sets be computable?
(c) Intersection? (d) Complement?
7.31 (Compare this with the prior exercise.) We can consider the interaction of
computable enumerability with set operations. (a) Must a subset of a computably
enumerable set be computably enumerable? (b) Must the union of two computably
enumerable sets be computably enumerable? (c) Intersection? (d) Complement?
7.32 Finish the proof of Lemma 7.3 by showing that the second and third items
are equivalent.
Section
II.8 Oracles
The problem of deciding whether a given machine halts is so hard that
it is unsolvable. Is this the absolutely hardest problem, or are there
ones that are even harder?
What does it mean to say that one problem is harder than another?
We have compared problem hardness already, for instance when we
considered the problem of whether a Turing machine halts on input 3.
There we proved that if we could solve the halts-on-3 problem then we
could solve the Halting problem. That is, we proved that halts-on-3 is
at least as hard as the Halting problem. So, the idea is that one problem
is at least as hard as a second one if solving the first would also give
a solution to the second.†
Under Church’s Thesis we interpret the unsolvability of the Halting
problem to say that no mechanism can answer all questions about
membership in 𝐾 . So if we want to answer questions about problems
that are related to 𝐾 then we need the answers to be supplied in some
way that isn’t an in-principle physically realizable discrete machine.‡
[Picture: Priestess of Delphi (Collier, 1891)]
Consequently, we posit an oracle that we attach to the Turing
machine and that acts as the characteristic function of a set. Thus, to see what we
could compute if we somehow were able to solve the Halting problem, we attach
† We can instead conceptualize that the first problem is at least as general as the second. An example is
that the problem of inputting a natural number and outputting its prime factors is at least as general as
the problem of inputting a natural and determining whether it is divisible by seven.
‡ Turing introduced oracles in his PhD thesis. He said, “We shall not go any further into the nature of
this oracle apart from saying that it cannot be a machine.”
a 𝐾 -oracle that answers questions of the form “Is 𝑥 ∈ 𝐾 ?” for any 𝑥 . This oracle is
a black box, meaning that we can’t open it to see how it works.†
[Diagram: the oracle machine. A white CPU box is attached to a black-box oracle;
inside the CPU a decision step asks “𝑥 ∈ oracle?” with Y and N branches, which in
code form is:
(if (oracle? x)
    (displayln "It is in the set")
    (displayln "It is not in the set")) ]
We can change the oracle without changing the program code: in the picture if
we swap out black boxes, exchanging an 𝑋 oracle for a 𝑌 oracle, then the white
CPU box is unchanged. Of course, the values returned by the oracle queries may
change, which may change the tape output when we run the two-box system. But
such a swap leaves the white hardware unchanged.
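The two-box picture can be mimicked in code. This is a Python sketch with invented names, where an oracle is simply a pluggable characteristic function.

```python
def machine(x, oracle):
    # The "white box": fixed program code that consults whatever
    # black-box oracle it happens to be connected to.
    if oracle(x):
        return "It is in the set"
    else:
        return "It is not in the set"

# Two interchangeable black boxes.
evens = lambda n: n % 2 == 0
perfect_squares = lambda n: round(n ** 0.5) ** 2 == n
```

Calling machine(6, evens) and machine(6, perfect_squares) gives different answers, but the body of machine is untouched by the swap.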
The rest of what we have already developed about Turing machines carries
over. In particular, each such machine — each CPU box — has an index. That index
is source-equivalent, meaning that from an index we can compute the machine
source and from the source we can find the index.
Therefore to fully specify such a computation, we must specify which machine
we are using, along with specifying which oracle. That explains the notations
for the white box, the oracle-ready Turing machine, P𝑒𝑋, and for the associated
functions, 𝜙𝑒𝑋.
8.1 Definition Let 𝑋 be a set. If the characteristic function of a set 𝑆 can be
computed relative to 𝑋 , that is, if 1𝑆 = 𝜙𝑒𝑋 for some 𝑒 , then we say that 𝑆 is
computable from the oracle 𝑋 , or is 𝑋 -computable, or is computable relative to 𝑋 ,
or that 𝑆 reduces to 𝑋 , or is Turing reducible to 𝑋 , denoted 𝑆 ≤𝑇 𝑋 .
† Opening it would let out the magic smoke, the stuff inside of an electronic component that makes it
work. After all, once the smoke gets out, the component no longer works.
‡ A particular computation relative to an oracle might use one such query, or more than one, or none
at all.
𝜓 (𝑥, 𝑦) = 42 if 𝜙𝑥 (𝑥)↓, and ↑ otherwise
[Flowcharts. Left: Start → Read 𝑥 , 𝑦 → Run P𝑥 on 𝑥 → Print 42 → End.
Right: Start → Read 𝑦 → Run P𝑥 on 𝑥 → Print 42 → End.]
With that we can build the oracle machine. The machine charted below is
P𝑒𝑋. It uses 𝑒 0 from the prior paragraph. If we connect it to an 𝐴 oracle then it
computes the characteristic function of 𝐾 , by the prior paragraph’s final sentence.
[Flowchart: Start → Read 𝑘 → “𝑠 (𝑒0, 𝑘 ) ∈ oracle?” → Y: Print 1, N: Print 0 → End]
We have a kind of ordering, where some sets precede others in the sense that
when 𝑆 ≤𝑇 𝑋 then 𝑆 is before 𝑋 .† The intuition is that sets that are larger in this
ordering “contain more information” or are “computationally harder” than the
ones that precede them. We next show that there are sets that come at the very
beginning of this ordering, sets that are less than or equal to every other set.
8.4 Lemma If a set 𝑌 ⊆ N is computable then for any 𝑋 ⊆ N at all, 𝑌 ≤𝑇 𝑋 . In
particular, ∅ ≤𝑇 𝑋 and N ≤𝑇 𝑋 . Further, a set 𝑍 ⊆ N is computable if and only if
it is reducible to the empty set, 𝑍 ≤𝑇 ∅, or to any computable set.
Proof Assume that 𝑌 is computable, so that its characteristic function is computable.
That characteristic function can be computed relative to 𝑋 using an oracle Turing
machine, simply by never referring to the oracle, never asking it any questions.
As to the second statement, the prior paragraph proves that if a set is computable
then it is reducible to a computable set. For the other half of the double implication,
suppose that the characteristic function of 𝑍 can be computed by reference to a
computable oracle, so that 1𝑍 = 𝜙𝑒𝑋 where 1𝑋 is computable. Then replacing
oracle calls in the machine P𝑒𝑋 with direct computations of 1𝑋 will compute 1𝑍
without reference to an oracle. Hence, 𝑍 is computable.
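The proof's two moves can be mirrored in code. In this Python sketch (all names are illustrative) an oracle machine witnesses 𝑌 ≤𝑇 𝑋 by never querying its oracle, and a computable oracle is inlined away.

```python
def never_queries(x, oracle):
    # Computes the characteristic function of Y = the even numbers
    # while ignoring the attached oracle, so Y <=_T X for every X.
    return 1 if x % 2 == 0 else 0

def inline_oracle(oracle_machine, char_fn):
    # When the oracle set X is computable, replace every oracle call
    # with a direct computation of 1_X, yielding an oracle-free machine.
    return lambda x: oracle_machine(x, char_fn)

# Inlining a computable oracle (here, the multiples of three) leaves a
# machine that needs no oracle at all.
plain_machine = inline_oracle(never_queries, lambda n: 1 if n % 3 == 0 else 0)
```

Here plain_machine still computes the characteristic function of the even numbers, with no oracle attached.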
8.5 Definition Two sets 𝐴, 𝐵 are Turing equivalent or 𝑇 -equivalent, denoted 𝐴 ≡𝑇 𝐵 ,
if both 𝐴 ≤𝑇 𝐵 and 𝐵 ≤𝑇 𝐴.
Showing that two sets are 𝑇 -equivalent, that they are inter-computable, shows
that there is an underlying unity between the two seemingly-different problems,
that they are restatements of the same problem.
8.6 Example Any two computable sets are Turing equivalent, by Lemma 8.4.
8.7 Example In Example 8.2 we proved that 𝐾 ≤𝑇 𝐴 where 𝐴 = {𝑒 | P𝑒 halts on 3 }.
Exercise 8.22 uses a very similar argument to show that 𝐴 ≤𝑇 𝐾 . So 𝐴 ≡𝑇 𝐾 .
Of course, the Halting problem asks whether P𝑒 halts on input 𝑒 . A person may
perceive that a more natural problem is deciding whether P𝑒 halts on input 𝑥 .
8.8 Definition 𝐾0 = { ⟨𝑒, 𝑥⟩ | P𝑒 halts on input 𝑥 }
We will show that the two are Turing equivalent, that with access to solutions
to one problem we can compute solutions to the other.‡
8.9 Theorem 𝐾 ≡𝑇 𝐾0 .
Proof For 𝐾 ≤𝑇 𝐾0 , suppose that we have access to a 𝐾0 -oracle. Then this
machine, if connected to that oracle, will have the input/output behavior that is
the characteristic function of 𝐾 .
†
This ordering between sets turns out not to be linear. It is more like the ‘divides’ relation between
integers, where 2 divides 6 and 2 divides 10, but 6 and 10 are not related. ‡ Thus our choice of 𝐾 as
our touchstone is just a matter of convenience and convention. We use it because it is the standard in the
literature and because it has some technical advantages, including that it falls out of the diagonalization
development that we did at the start of the Halting problem section.
[Flowchart: Start → Read 𝑥 → “⟨𝑥, 𝑥⟩ ∈ oracle?” → Y: Print 1, N: Print 0 → End]
What remains is the 𝐾0 ≤𝑇 𝐾 half. Consider the flowchart on the left below.
It halts for the input triple ⟨𝑒, 𝑥, 𝑦⟩ if and only if ⟨𝑒, 𝑥⟩ ∈ 𝐾0 . By Church’s Thesis
there is a Turing machine implementing it; let that machine have index 𝑒 0 .
[Flowcharts. Left: Start → Read 𝑒, 𝑥, 𝑦 → Simulate P𝑒 on input 𝑥 → Print 42 → End.
Right: Start → Read 𝑦 → Simulate P𝑒 on input 𝑥 → Print 42 → End.]
Get the flowchart on the right by applying the s-m-n theorem to parametrize 𝑒
and 𝑥 . That is, on the right is a sketch of P𝑠 (𝑒0,𝑒,𝑥 ) .
Now for the oracle Turing machine. Given a pair ⟨𝑒, 𝑥⟩ , the right-side machine
above, P𝑠 (𝑒0,𝑒,𝑥 ) , behaves the same on all inputs 𝑦 , namely, it either halts on all
inputs or fails to halt on all inputs, depending on whether 𝜙𝑒 (𝑥)↓. In particular,
P𝑠 (𝑒0,𝑒,𝑥 ) halts on input 𝑠 (𝑒 0, 𝑒, 𝑥) , so that 𝑠 (𝑒 0, 𝑒, 𝑥) ∈ 𝐾 , if and only if 𝜙𝑒 (𝑥)↓.
[Flowchart: Start → Read 𝑒 , 𝑥 → “𝑠 (𝑒0, 𝑒, 𝑥 ) ∈ oracle?” → Y: Print 1, N: Print 0 → End]
Thus, given oracle 𝐾 the machine above acts as the characteristic function of 𝐾0 .
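The first half of the proof is a single oracle query. This Python sketch runs that query over an invented toy universe; the predicate standing in for the 𝐾0 oracle is made up purely for illustration.

```python
def decide_K(e, K0_oracle):
    # K = { e : P_e halts on input e }, so ask the K0 oracle about <e, e>.
    return 1 if K0_oracle((e, e)) else 0

# Toy stand-in for the K0 oracle: pretend P_e halts on input x exactly
# when e * x > 3.  (Invented; a real K0 oracle is not computable.)
toy_K0 = lambda pair: pair[0] * pair[1] > 3
```

In the toy universe decide_K(2, toy_K0) is 1 and decide_K(1, toy_K0) is 0; against a genuine 𝐾0 oracle the same one-query program would decide 𝐾.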
8.10 Corollary The Halting problem is at least as hard as any computably enumerable
problem: 𝑊𝑒 ≤𝑇 𝐾 for all 𝑒 ∈ N.
Proof By Lemma 7.3 the computably enumerable sets are the columns of 𝐾0 , as
𝑊𝑒 = {𝑦 | 𝜙𝑒 (𝑦)↓} = {𝑦 | ⟨𝑒, 𝑦⟩ ∈ 𝐾0 }. So 𝑊𝑒 ≤𝑇 𝐾0 ≡𝑇 𝐾 .
Because every computably enumerable set is Turing-computable from 𝐾 , we
say that 𝐾 is complete among the computably enumerable sets.
Jumping We are ranking sets by how hard they are to compute. This illustrates.
The dots are sets 𝑆 ⊂ N, where in the 𝑆 ≤𝑇 𝑋 ordering, 𝑋 is drawn higher than 𝑆 .
The computable sets are at the bottom, grouped in with the empty set. The sets
that are Turing equivalent to 𝐾 are grouped together at the top of the computably
enumerable sets.
[Diagram: dots for the sets, with the computable sets at the bottom, the c.e. sets
above them topped by the sets Turing equivalent to 𝐾 , and 𝐾 𝐾 higher still.]
We finish this section by describing a way, given a set 𝑆 , to jump further up the
order than 𝑆 . (The set 𝐾 𝐾 illustrates the jump of 𝐾 .)
8.11 Theorem Where the relativized Halting problem is to determine membership
in 𝐾 𝐾 = {𝑥 | 𝜙𝑥𝐾 (𝑥)↓}, its solution is not computable from a 𝐾 oracle. That is,
there is no index 𝑒 ∈ N such that 𝜙𝑒𝐾 is the characteristic function of 𝐾 𝐾.
Proof We will adapt the proof that the Halting problem is unsolvable. Assume
otherwise, that there is a computation relative to a 𝐾 oracle, P𝑒𝐾0, that acts as the
characteristic function of 𝐾 𝐾.
𝜙𝑒0𝐾 (𝑥) = 1𝐾 𝐾 (𝑥) = 1 if 𝜙𝑥𝐾 (𝑥)↓, and 0 otherwise (∗)
Consider the function defined below, along with its flowchart. Inside
the decision box the computation uses a 𝐾 oracle. (Rather than describe it as an
oracle-ready chart with a general oracle 𝑋 and then later say that we give it 𝐾 as 𝑋 ,
we’ve just written in the 𝐾 .) The first equality in (∗) makes 𝜙𝑒0𝐾 a total function.
𝑓 𝐾(𝑥) = 42 if 𝜙𝑥𝐾 (𝑥)↑, and ↑ if 𝜙𝑥𝐾 (𝑥)↓
[Flowchart: Start → Read 𝑥 → “𝜙𝑒0𝐾 (𝑥 ) = 0 ?” → Y: Print 42 → End, N: Infinite loop]
Since 𝑓 is computable, it has an index. Let that index be 𝑒ˆ, so that 𝑓 𝐾 = 𝜙𝑒ˆ𝐾.
Now feed 𝑓 its own index — that is, consider 𝑓 𝐾(𝑒ˆ) = 𝜙𝑒ˆ𝐾(𝑒ˆ) . If that diverges
then we follow the first clause in the definition of 𝑓 , which gives that 𝑓 𝐾 (𝑒ˆ) ↓,
which is a contradiction. If instead 𝑓 𝐾 (𝑒ˆ) converges then the second clause in
the definition of 𝑓 gives that 𝑓 𝐾 (𝑒ˆ)↑, also a contradiction. Either way, assuming
that (∗) can be computed relative to a 𝐾 oracle gives an impossibility.
8.12 Theorem For any set 𝑆 , the relativized Halting problem for 𝑆 is to determine
membership in 𝐾 𝑆 = {𝑥 | 𝜙𝑥𝑆 (𝑥)↓}. Every set is reducible to its relativized Halting
problem: 𝑆 ≤𝑇 𝐾 𝑆.
[Flowcharts. Left: Start → Read 𝑥, 𝑦 → “𝑥 ∈ oracle?” → Y: Print 42 → End, N: Loop.
Right, the machine P𝑠 (𝑒0,𝑥 )𝑋: Start → Read 𝑦 → “𝑥 ∈ oracle?” → Y: Print 42 → End, N: Loop.]
The machine P𝑠 (𝑒0,𝑥 )𝑋 halts for any input 𝑦 if and only if 𝑥 is a member of the
oracle. Taking the oracle to be 𝑆 and 𝑦 to be 𝑠 (𝑒 0, 𝑥) gives that 𝑥 ∈ 𝑆 if and only if
𝜙𝑠𝑆(𝑒0,𝑥 ) (𝑠 (𝑒 0, 𝑥))↓, which in turn holds if and only if 𝑠 (𝑒 0, 𝑥) ∈ 𝐾 𝑆. So if 𝐾 𝑆 is the
oracle for the following machine then the machine acts as the characteristic
function of 𝑆 , giving 𝑆 ≤𝑇 𝐾 𝑆.
[Flowchart: Start → Read 𝑥 → “𝑠 (𝑒0 , 𝑥 ) ∈ oracle?” → Y: Print 1, N: Print 0 → End]
II.8 Exercises
Recall that a Turing machine is a decider for a set if it computes the characteristic
function of that set.
8.14 How do you answer your friend? “An oracle machine is a Turing machine with
a black box oracle that is able to decide certain problems in a single operation. As
the oracle you can even use undecidable problems, such as the Halting problem.
But isn’t assuming the existence of a machine which can decide the Halting problem
. . . problematic?”
8.15 Both oracles and deciders take in a number and return 0 or 1, giving whether
that number is in the set. What’s the difference?
✓ 8.16 Your friend says to the professor, “Oracle machines are not real so why talk
about them?” What should the professor say?
8.17 Your classmate says they answered a quiz question to define an oracle with,
“A set to solve unsolvable problems.” Give them a gentle critique.
✓ 8.18 Is there an oracle for every problem? For every problem, is there an oracle?
8.19 A person in your class asks, “Oracles can solve unsolvable problems, right?
And 𝐾 𝐾 is unsolvable. So an oracle like the 𝐾 oracle should solve it.” Help your
prof out here; suggest a response.
✓ 8.20 Suppose that the set 𝐴 is Turing-reducible to the set 𝐵 . Which of these are
true?
(a) A decider for 𝐴 can be used to decide 𝐵 .
(b) If 𝐴 is computable then 𝐵 is computable also.
(c) If 𝐴 is uncomputable then 𝐵 is uncomputable too.
8.21 Where 𝐵 ⊆ N is a set, let 2𝐵 = { 2𝑏 | 𝑏 ∈ 𝐵 }. We will show that 𝐵 ≡𝑇 2𝐵 .
(a) Give a flowchart sketching a machine that, given access to oracle 2𝐵 , will
act as the characteristic function of 𝐵 . That is, this machine witnesses that
𝐵 ≤𝑇 2𝐵 .
(b) Sketch a machine that, given access to oracle 𝐵 , will act as the characteristic
function of 2𝐵 . This machine witnesses that 2𝐵 ≤𝑇 𝐵 .
✓ 8.22 Where 𝐴 = {𝑒 | P𝑒 halts on 3 }, show that 𝐴 ≤𝑇 𝐾 . Hint: this machine
satisfies that 𝜙𝑖 (𝑖)↓ if and only if 𝜙𝑥 ( 3)↓.
[Flowchart: Start → Read 𝑦 → Run P𝑥 on 3 → End]
✓ 8.23 The set 𝑆 = {𝑒 | 𝜙𝑒 ( 3)↓ and 𝜙𝑒 ( 4)↓} is not computable. Sketch how to
compute it using a 𝐾 oracle. That is, sketch an oracle machine that shows 𝑆 ≤𝑇 𝐾 .
Hint: follow Example 8.2.
✓ 8.24 For the set 𝑆 = {𝑒 | 𝜙𝑒 ( 3)↓}, show that 𝑆 ≤𝑇 𝐾0 .
✓ 8.25 Show that 𝐾 ≤𝑇 {𝑥 | 𝜙𝑥 (𝑦) = 2𝑦 for all input 𝑦 }.
8.26 Consider the set {𝑥 | 𝜙𝑥 ( 𝑗) = 7 for some 𝑗 }.
(a) Show that it is not computable using Rice’s theorem.
(b) Sketch how to compute it using a 𝐾 oracle.
8.27 Let 𝑆 = {𝑥 ∈ N | 𝜙𝑥 ( 3)↓ and 𝜙 2𝑥 ( 3)↓ and 𝜙𝑥 ( 3) = 𝜙 2𝑥 ( 3) }. Show 𝑆 ≤𝑇 𝐾
by producing a way to answer questions about membership in 𝑆 from a 𝐾 oracle.
8.28 Recall that a computable function 𝜙 is total if 𝜙 (𝑦)↓ for all 𝑦 ∈ N. The set of
indices of total computable functions is Tot. Show that 𝐾 ≤𝑇 Tot.
Section
II.9 Fixed point theorem
Recall our first example of diagonalization, the proof that the set of real numbers is
not countable, on page 74. We assumed that there is an onto function 𝑓 : N → R
and considered its inputs and outputs, as illustrated in this table.
[Table: the outputs 𝑓 ( 0) , 𝑓 ( 1) , ... listed with their decimal expansions]
Let row 𝑛 ’s decimal representation be 𝑑𝑛 = 𝑑ˆ𝑛 .𝑑𝑛,0 𝑑𝑛,1 𝑑𝑛,2 ... Go down the diagonal
to the right of the decimal point to get the sequence of digits 𝑑 0,0, 𝑑 1,1, 𝑑 2,2, ...,
which in the illustration above is 3, 1, 4, 5, ... Using that, construct a number 𝑧 =
0.𝑧 0 𝑧 1 𝑧 2 ... by making its 𝑛 -th decimal place 𝑧𝑛 be something other than 𝑑𝑛,𝑛 .
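The construction of 𝑧 is effective, as this Python sketch shows over a finite table of digit rows. The table is invented for illustration, and choosing digits only from {5, 6} sidesteps the 0.4999... = 0.5000... representation issue.

```python
def diagonal_digits(rows):
    # rows[n] holds the digits of f(n) after the decimal point.
    # Make the n-th digit of z differ from the n-th digit of row n.
    return [5 if rows[n][n] != 5 else 6 for n in range(len(rows))]

# The diagonal digits here are 3, 1, 4, so z begins 0.555...
table = [[3, 7, 1],
         [0, 1, 8],
         [2, 2, 4]]
```

By construction the resulting number differs from every row at some decimal place, so it is not in the table.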
When diagonalization fails What if the transformation is such that the diagonal
is a row, that 𝑧 = 𝑓 (𝑛 0 ) ? Then the array member where the diagonal crosses
that row is unchanged by the transformation, 𝑑𝑛0,𝑛0 = 𝑡 (𝑑𝑛0,𝑛0 ) . Conclusion: if
diagonalization fails then the transformation has a fixed point.
We will apply this to sequences of computable functions, 𝜙𝑖 0 , 𝜙𝑖 1 , 𝜙𝑖 2 ... We
are interested in effectiveness so we take the indices 𝑖 0, 𝑖 1, 𝑖 2 ... to be computable,
meaning that for some 𝑒 we have 𝑖 0 = 𝜙𝑒 ( 0) , 𝑖 1 = 𝜙𝑒 ( 1) , 𝑖 2 = 𝜙𝑒 ( 2) , etc. In short,
a computable sequence of computable functions has this form.
Sequence term
              𝑛 = 0       𝑛 = 1       𝑛 = 2       𝑛 = 3     ...
  𝑒 = 0    𝜙𝜙 0 ( 0 )   𝜙𝜙 0 ( 1 )   𝜙𝜙 0 ( 2 )   𝜙𝜙 0 ( 3 )   ...
  𝑒 = 1    𝜙𝜙 1 ( 0 )   𝜙𝜙 1 ( 1 )   𝜙𝜙 1 ( 2 )   𝜙𝜙 1 ( 3 )   ...     (∗)
  𝑒 = 2    𝜙𝜙 2 ( 0 )   𝜙𝜙 2 ( 1 )   𝜙𝜙 2 ( 2 )   𝜙𝜙 2 ( 3 )   ...
  𝑒 = 3    𝜙𝜙 3 ( 0 )   𝜙𝜙 3 ( 1 )   𝜙𝜙 3 ( 2 )   𝜙𝜙 3 ( 3 )   ...
Each entry 𝜙𝜙𝑒 (𝑛) is a computable function. If the index computation 𝜙𝑒 (𝑛)
diverges then the function as a whole diverges.
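A toy model can make this concrete. In this Python sketch, which is only an illustration, an "index" is a string of Python source for a one-argument function and phi plays the role of 𝑒 ↦→ 𝜙𝑒; real indices are numbers coding Turing machine instruction sets, not source strings.

```python
def phi(e):
    # Interpret the "index" e, a string of source defining a function f,
    # and return the function it denotes.
    env = {}
    exec(e, env)
    return env["f"]

# phi_e computes indices: phi_e(n) is an index of the function x -> x + n,
# so phi_{phi_e(0)}, phi_{phi_e(1)}, ... is a computable sequence of
# computable functions.
e = "def f(n): return 'def f(x): return x + %d' % n"
sequence_of_indices = phi(e)
third_function = phi(sequence_of_indices(3))   # behaves as x -> x + 3
```

The point of the model is the two levels: sequence_of_indices outputs names of functions, while phi turns a name into the function it names.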
As to the transformation, the natural one is this, for the computable function 𝑓 :
𝜙𝑥 ↦−→ 𝜙 𝑓 (𝑥 ) under 𝑡𝑓 , so that 𝜙𝜙𝑖 ( 𝑗 ) ↦−→ 𝜙 𝑓 (𝜙𝑖 ( 𝑗 ) )
family, 𝜙𝑠 (𝑒0,𝑛) , computes the 𝑛 -th function on the diagonal of the array (∗) above,
𝜙𝜙 0 ( 0 ) , 𝜙𝜙 1 ( 1 ) , 𝜙𝜙 2 ( 2 ) ...
𝜙 𝑓 𝑔 (𝑛) (𝑥) = 𝜙 𝑓 (𝜙𝑛 (𝑛) ) (𝑥) if 𝜙𝑛 (𝑛)↓, and ↑ otherwise
[Flowchart: Start → Read 𝑥 → Run P𝑛 on 𝑛 → With the result 𝑤 , run P 𝑓 (𝑤) on 𝑥 → End]
𝜙𝑠 (𝑒0,𝑚) (𝑥) = 42 if 𝑥 = 𝑚, and ↑ otherwise
[Flowcharts. Left: Start → Read 𝑚, 𝑥 → “𝑥 = 𝑚 ?” → Y: Print 42 → End, N: Loop.
Right: Start → Read 𝑥 → “𝑥 = 𝑚 ?” → Y: Print 42 → End, N: Loop.]
[Flowcharts. Left: Start → Read 𝑥 , 𝑦 → Print 𝑥 → End.
Right: Start → Read 𝑦 → Print 𝑥 → End.]
Discussion The Fixed Point Theorem and its proof are often considered mysterious,
or at any rate obscure. Here we will develop a point about the role of naming in
the result.
Compare the sentence Atlantis is a mythical city with There are two t’s in ‘Atlantis’.
In the first we say that ‘Atlantis’ is used because it points to something, it has a value,
it names something. In the second ‘Atlantis’ is not referring to something — its value
is itself — so we say that it is mentioned.† This is the use-mention distinction, that
we are using the word on two different levels.
A version of this happens in computer programming. See the C language code
below. There, x and y are variables. If these were ordinary variables then the
compiler would associate them with a particular memory cell. For instance, if an
ordinary variable a were associated with cell 122 then the statement a = 5 would
result in the value 5 being stored in that cell. Thus a is a name for the cell.
But the second line’s asterisk means that x and y are not
ordinary variables, they are pointers, which are associated with
a cell but have some additional implications. The four vertical
arrays illustrate by showing a machine’s memory cells over time.
They imagine that the compiler associates x with register 123
and y with 124. The first array has that cell 123 holds the
number 901 and cell 124 is 902.
Because these are pointers, we have declared to the compiler
that we are interested in the contents of the memory cells that
they point to: cell 123 is itself a name for location 901, and
124 names 902.
[Comic: courtesy xkcd.com]
The second vertical array illustrates, showing the effect of running
the *x = 42 statement. The system does not put 42 into 123, rather it puts 42
into 901. Next, with y = x the system sets the cell named by y to point to the
same address as x’s cell, address 901. Finally, the last line puts 13 where y points,
which is at this moment the same cell to which x points.
void main() {
    int *x, *y;
    x = malloc(sizeof(int));
    y = malloc(sizeof(int));
    *x = 42;
    y = x;
    *y = 13;
}

Address   initially   after *x = 42   after y = x   after *y = 13
  123        901           901            901            901
  124        902           902            901            901
  901         -             42             42             13
  902         -             -              -              -
Here, as with ‘Atlantis’, x and y are being used on two different levels. One is
that x refers to the contents of register 123, so it names 123. The other level is
that the system is set up to refer to the contents of the contents, that is, to what’s
in address 901. On this level, x and y are names for names.
As to the role played by names in the Fixed Point Theorem, recall the Padding
Lemma, Lemma 2.18, that every computable function has infinitely many indices.
† We see this distinction in programming books. In the sentence, “The number of players is players”
the first ‘players’ refers to people while the second is a program variable. The typewriter font helps
with the distinction. Similarly in this book we use italic for variables such as 𝑎 , which have a value, and
typewriter for characters such as a, which are a value.
So it is easy for a computable function to have two different names. We see this
in Theorem 9.1, where the conclusion that 𝜙𝑘 = 𝜙 𝑓 (𝑘 ) does not say that the two
indices are equal. Rather it says that they describe machines that give rise to the
same input/output relationship.
Another example is that in the proof 𝑔(𝑛) is this.
[Display: the definition of 𝑔(𝑛) from the proof]
So 𝑔(𝑛) , 𝑠 (𝑒 0, 𝑛) , and 𝜙𝑛 (𝑛) are names for the same function. Again, equality of
the named functions does not imply that the names are equal.
Informally, what 𝑔(𝑛) names is the procedure, “Given input 𝑥 , run P𝑛 on input 𝑛
and if it halts with output 𝑤 then run P𝑤 on input 𝑥 .” Shorter: “Produce 𝜙𝑛 (𝑛)
and then do 𝜙𝑛 (𝑛) .” So here also we see the use-mention distinction.
One way in which this distinction between what is named and the name itself
is important is that regardless of whether 𝜙𝑛 (𝑛) converges, we can nonetheless
compute the index 𝑔(𝑛) and from it the instruction set P𝑔 (𝑛) . There is an analogy
here with ‘Atlantis’ — even though the referred-to city doesn’t exist we can still
sensibly assert things about its name.
In summary, the Fixed Point Theorem is deep, showing that surprising and
interesting behaviors occur in any sufficiently powerful computation system.
II.9 Exercises
9.6 Your friend asks you about the proof of the Fixed Point Theorem, Theorem 9.1.
“The last line says 𝜙𝑔 (𝑣) = 𝜙𝜙 𝑣 (𝑣) ; isn’t this just saying that 𝑔(𝑣) = 𝜙 𝑣 (𝑣) ? Why the
circumlocution?” What can you say?
✓ 9.7 Show each. (a) There is an index 𝑒 such that 𝜙𝑒 = 𝜙𝑒+7 . (b) There is an 𝑒
such that 𝜙𝑒 = 𝜙 2𝑒 .
9.8 What conclusion can you draw by applying the Fixed Point Theorem to the
adds-five function 𝑥 ↦→ 𝑥 + 5? Generalize.
9.9 What conclusion can you draw about acceptable enumerations of Turing
machines by applying the Fixed Point Theorem to each of these? (a) The tripling
function 𝑥 ↦→ 3𝑥 . (b) The squaring function 𝑥 ↦→ 𝑥 2 . (c) The function that gives
0 except for 𝑥 = 5, when it gives 1. (d) The constant function 𝑥 ↦→ 42.
✓ 9.10 We will prove that there is an 𝑚 such that 𝑊𝑚 = {𝑥 | 𝜙𝑚 (𝑥)↓} = {𝑚 2 }.
(a) Produce this uniformly computable family of functions.
𝜙𝑠 (𝑒0,𝑥 ) (𝑦) = 42 if 𝑦 = 𝑥 2, and ↑ otherwise
(b) Observe that 𝑒 0 is fixed so that 𝑠 (𝑒 0, 𝑥) is a function of one variable only, and
call that function 𝑔 : N → N.
ℎ(𝑥) = 𝑒 0 if 𝑥 ∈ 𝐹 , and 𝑓 (𝑥) otherwise
Show that ℎ has no fixed point, contradicting the Fixed Point theorem.
Extra
II.A Hilbert’s Hotel
II.A Exercises
A.1 Imagine that the hotel is empty. A hundred buses arrive, where bus 𝐵𝑖
contains passengers 𝑏𝑖,0 , 𝑏𝑖,1 , etc. Give a scheme for putting them in rooms.
A.2 Give a formula assigning a room to each person from the infinite bus convoy.
A.3 The hotel builds a parking lot. Each floor 𝐹𝑖 has infinitely many spaces 𝑓𝑖,0 ,
𝑓𝑖,1 , . . . And, no surprise, there are infinitely many floors 𝐹 0 , 𝐹 1 , . . . One day
when the hotel is empty a fleet of buses arrives, one per parking space, each with
infinitely many people. Give a way to accommodate all these people.
A.4 The management is irked that this hotel cannot fit all of the real numbers. So
they announce plans for a new hotel, with a room for each 𝑟 ∈ R. Can they now
cover every possible set of guests?
†
Alas, the infinite hotel does not now exist. The guest in room 0 said that the guest from room 1 would
cover both of their bills. The guest from room 1 said yes, but in addition the guest from room 2 had
agreed to pay for all three rooms. Room 2 said that room 3 would pay, etc. So Hilbert’s Hotel made no
money despite having infinitely many rooms, or perhaps because of it.
Extra
II.B Unsolvability in intellectual culture
Unsolvability results such as the Halting problem are about limits. Interpreted
in the light of Church’s Thesis, they say that there are things that we cannot do.
These results had an impact on the culture of mathematics but they also had an
impact on the wider intellectual world.
The discussion here is in the context of the history of European intellectual
culture, the context in which early Theory of Computation results appeared. A
broader view is beyond our scope.
With Napoleon’s downfall in the early 1800’s, many
people in Europe felt a swing back to a sense of order,
optimism, and progress. For example, in the history
of Turing’s native England, Queen Victoria’s reign from
1837 to 1901 seemed to many English commentators to
be an extended period of prosperity and peace. Across
Europe, many people perceived that the natural world
was being tamed with science and engineering — witness
the introduction of steam railways in 1825, the opening
of the Suez Canal in 1869, and the invention of the
electric light in 1879.†
[Picture: Queen Victoria opens the Great Exhibition, 1851]
In science this optimism was captured by the physicist
A A Michelson, who wrote in 1899, “The more important fundamental laws and
facts of physical science have all been discovered, and these are now so firmly
established that the possibility of their ever being supplanted in consequence of
new discoveries is exceedingly remote.”
The twentieth century physicist R Feynman likened science to
working out nature’s rules, “to try to understand nature is to imagine
that the gods are playing some great game like chess. . . . And you
don’t know the rules of the game, but you’re allowed to look at the
board from time to time, in a little corner, perhaps. And from these
observations, you try to figure out what the rules are of the game.”
Around the year 1900 many observers thought that we basically had
got the rules and that although there might remain a couple of obscure
things like castling, soon enough those would be done also.
[Portrait: David Hilbert, 1862–1943]
In Mathematics, this view was most famously voiced in an address
given by Hilbert in 1930, “We must not believe those, who today, with philosophical
bearing and deliberative tone, prophesy the fall of culture and accept the
ignorabimus. For us there is no ignorabimus, and in my opinion none whatever in
natural science. In opposition to the foolish ignorabimus our slogan shall be: We
†
This is not to say that the perception was justified. Disease and poverty were rampant, imperialism
ruined millions of lives around the world, for much of the time the horrors of industrial slavery in the
US south went unchecked, and Europe was hardly an oasis of calm, with for instance the revolutions of
1848. Nonetheless the general feeling included a sense of progress, of winning.
must know — we will know.” (‘Ignorabimus’ means ‘that which we must be forever
ignorant of ’ or ‘that thing which we will never fully penetrate’.) There was of
course a range of opinion but the zeitgeist was that we could expect that any
question would be settled, and perhaps soon.†
But starting in the early 1900’s, that changed. Exhibit A is the
picture to the right. That the modern mastery of mechanisms can have
terrible effect on human bodies became apparent to everyone during
World War I, 1914–1918. Ten million military men died. Overall,
seventeen million people died. With universal conscription, probably
the men in this picture did not want to be here. Probably they were
killed by someone who also did not want to be there, who never knew
that he killed them, and who simply entered coordinates into a firing
mechanism. For people at those coordinates, it didn’t matter how brave
they were, or how strong, or how right was their cause — they died.
[Picture: World War I trench dead]
The zeitgeist shifted: Pandora’s box was now opened and the world
had become not at all ordered, reasoned, or sensible.
At something like the same time in science, Michelson’s assertion that physics
was a solved problem was destroyed by the discovery of radiation. This brought in
quantum theory, that has at its heart randomness, that included the uncertainty
principle, and that led to the atom bomb.
With Einstein we see the cultural shift directly. After experiments during a solar
eclipse in 1919 provided strong support for his theories, he became an overnight
celebrity. He was seen as having changed our view of the universe from Newtonian
clockwork to one where “everything is relative.” His work showed that the universe
has limits and that old certainties break down: nothing can travel faster than light
and even the commonsense idea of two things happening at the same instant falls
apart.
There were many reflections of this loss of certainty. For
example, the generation of writers and artists who came of age
in World War I — including Fitzgerald, Hemingway, and Stein —
became known as the Lost Generation. They expressed their
experience through themes of alienation, isolation, and dismay.
In music, composers such as Debussy and Mahler broke with
the traditional forms in ways that were often hard for listeners —
Stravinsky’s Rite of Spring caused a near riot at its premiere in
1913. As for visual arts, the painting here shows the same themes.
[Painting: S Dali’s 1931 Persistence of Memory]
In mathematics, much the same inversion of the standing
order happened in 1930 with K Gödel’s announcement of the Incompleteness
† Below we will cite some things as turning points that occur before 1930; how can that be? For one
thing, it is typical for cultural shifts to have muddled timelines. For another, this is Hilbert’s
retirement address, so we can reasonably take his as a lagging view. Finally, in mathematics the shift
occurred later than in the general culture. We mark that shift with the announcement of Gödel’s
Incompleteness Theorem, discussed below. That announcement came at the same meeting as Hilbert’s
speech, on the day before it. Gödel was in the audience for Hilbert’s address and during it whispered to
O Taussky-Todd, “He doesn’t get it.”
Theorem. This says that if we fix a sufficiently strong formal system such as the
elementary theory of N with addition and multiplication then there are statements
that, while true in the system, cannot be proved in that system.†
This statement of hard limits seemed to many to be especially
striking in mathematics, which traditionally held the place as
the most solid of knowledge. For example, I Kant said, “I assert
that in any particular natural science, one encounters genuine
scientific substance only to the extent that mathematics is present.”
This is all the more impactful as Gödel’s results are not about a
specialized area of only technical interest but instead are about
statements in the natural numbers and about proof itself, and so
the hole that Gödel finds lies at the very foundation of rational thought.
Gödel and friend, 1947
To be a mathematical proof, each step in an argument must be verifiable as
either an axiom or as a valid deduction from the prior steps. So proving a
mathematical theorem is a kind of computation.‡ Thus, Gödel’s Theorem and other
uncomputability results are in the same vein. In fact, from a proof of the Halting
problem, we can get to a proof of Gödel’s Theorem in a way that is reasonably
straightforward. (Of course, while part of the battle is the technical steps, a larger
part is the genius of envisioning the statement at all.)
To people at the time these results were deeply shocking, revolutionary. And
while we work in an intellectual culture that has absorbed this shock, we must
nevertheless recognize them as bedrock.
Extra
II.C Self Reproduction
† Gödel produces a statement that asserts, in a coded way, “This statement cannot be proved.” If it
were false then it could be proved, but false statements cannot be proved in the natural numbers. So
it must be true. But then it, indeed, is true but cannot be proved to be so. ‡ This implies that you
could start with all of the axioms and apply all of the logic rules to get a set of theorems. Then
applying all of the logic rules to those will give all the second-rank theorems, etc. In this way, by
dovetailing from the axioms you can in principle computably enumerate the theorems.
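The rank-by-rank closure in that second footnote can be sketched in code. This is only a toy (Python, for concreteness): the “axioms” and string-rewriting “rules” below are hypothetical stand-ins, not a real logic, but the dovetailed enumeration is the point.

```python
def enumerate_theorems(axioms, rules, ranks):
    # Rank 0 is the axioms themselves; each pass applies every rule to
    # everything derived so far, producing the next rank of theorems.
    known = set(axioms)
    for _ in range(ranks):
        known |= {rule(t) for rule in rules for t in known}
    return known

# A toy formal system: one axiom and two string-rewriting rules.
axioms = {"a"}
rules = [lambda t: t + "b", lambda t: "c" + t]
print(sorted(enumerate_theorems(axioms, rules, 2)))
# → ['a', 'ab', 'abb', 'ca', 'cab', 'cca']
```

Every theorem of the toy system appears at some finite rank, which is all that computable enumerability asks.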
Paley’s watch In 1802, W Paley argued for the existence of a god from a
perception of unexplained order in the natural world.
In crossing a heath, . . . suppose I had found a watch upon the ground . . . [W]hen
we come to inspect the watch we perceive . . . that its several parts are framed and put
together for a purpose, e.g., that they are so formed and adjusted as to produce motion,
and that motion so regulated as to point out the hour of the day . . . the inference we
think is inevitable, that the watch must have a maker — that there must have existed,
at some time and at some place or other, an artificer or artificers who formed it for the
purpose which we find it actually to answer, who comprehended its construction and
designed its use.
The marks of design are too strong to be got over. Design must have had a designer.
That designer must have been a person. That person is GOD.
This essay was very influential before the development by Darwin and Wallace
of the theory of differential reproduction through natural selection.
Paley then gives his strongest argument, that the most incredible
thing in the natural world, that which distinguishes living things from
stones or machines, is that they can, if given a chance, self-reproduce.
Suppose, in the next place, that the person, who found the watch,
would, after some time, discover, that, in addition to all the properties
which he had hitherto observed in it, it possessed the unexpected property
of producing, in the course of its movement, another watch like itself . . . If
that construction without this property, or which is the same thing, before
this property had been noticed, proved intention and art to have been
employed about it; still more strong would the proof appear, when he
came to the knowledge of this further property, the crown and perfection
of all the rest.
William Paley, 1743–1805
This captures that for many pre-evolution thinkers, from among all the things
in the world to marvel at — the graceful shell of a nautilus, the precision of an
eagle’s eye, or consciousness — the greatest wonder was self-reproduction. It may
seem, for example, that making a machine to weave a rug is possible only because
the rug is less complex than the machine. In this mindset, having something that
assembles a copy of itself appears to be an impossibility, a kind of magic. But that’s
wrong. The Fixed Point Theorem gives self-reproducing mechanisms.
A person might think to include the source as a string within the source. Below
is a start at that,† which we can call try0.c. But this is naive. The string would
have to contain another string, etc. Like the homunculus theory, this leads to an
infinite regress. Instead, we need a program that somehow contains instructions
for computing a part of itself.
main() {
printf("main(){\n ... }");
}
This is close. Escaping some newlines and quotation marks# leads to this program,
try3.c, which works.
char*e="char*e=%c%s%c;%cmain(){printf(e,34,e,34,10,10);}%c";
main(){printf(e,34,e,34,10,10);}
The verb ‘to quine’ means to write a sentence fragment a first time, and then
to write it a second time, but with quotation marks around it. For example, from
‘say’ we get “say ‘say’.” Another is “quine ‘quine’.” This is a linguistic analog of the
self-reproducing programs where the second word plays the part of the data in a
traditional program/data split, the same part as is played by try3.c’s first line
string. That part is also played by ‘produce’ in “Produce the machine, and then do
the machine.”
We can express that in code. First consider quoting. To perform some action we
ordinarily define a function and then call it as with (f 1 2) . As a consequence, if
we want to produce a list of three strings then this
> (Boro is reading)
gives the error Boro : undefined . We must tell Racket not to evaluate these
things, in this case not to evaluate the list in the usual way of taking the first entry
to be a function and then applying it to the evaluation of the other entries.
> (quote (Boro is reading))
'(Boro is reading)
> (P 'reading)
'(Boro is reading)
For a version that does not depend on the definitions use this.
> ((lambda (x) (list x (list 'quote x)))
'(lambda (x) (list x (list 'quote x))))
'((lambda (x) (list x (list 'quote x))) '(lambda (x) (list x (list 'quote x))))
The (lambda (x) ...) construct is how Racket defines a function of one input
without giving it a name (the term ‘lambda’ comes from Church’s Lambda Calculus).
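The same trick carries over to other languages. Here is a Python version (my own illustration, not from the text): as in the C program, the string s plays the data role, and the print both uses s as a template and splices s into itself.

```python
# A self-reproducing Python program: running these two lines as a script
# prints exactly these two lines.  The %r conversion inserts repr(s),
# quotation marks included, and %% becomes a literal percent sign.
s = 's = %r\nprint(s %% s)'
print(s % s)
```

The %r conversion plays the role that the %c arguments with value 34 (the quotation mark) play in the C version.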
Extra
II.D Busy Beaver
For any 𝑛 ∈ N, the set of Turing machines having no more than 𝑛 states
is finite. There are machines P𝑒 in this set that halt (on a blank tape)
and machines that do not. As the set is finite, we can think to start all
of the machines and wait until no more of them will ever converge. At
that point we will know which 𝑛-state machine runs for the most steps,
which produces the most output, which visits the most tape squares, etc.
Define the function BB : N → N to give the minimal number of steps
after which all of these size-𝑛 machines that will ever halt on a blank
tape have done so. Also let Σ : N → N give the largest number of 1’s left
on the tape, after halting, by any 𝑛-state Turing machine that is started
on a blank tape.
Tibor Radó, 1895–1965
D.1 Theorem (Radó, 1962) The functions BB and Σ are not computable.
Proof For BB, assume otherwise. To compute whether some P𝑒 halts on input 𝑒 ,
build a machine that, started on a blank tape, first writes 𝑒 and then behaves as
P𝑒 does on that input, and let 𝑛 be its number of states. Run that machine for
BB (𝑛)-many steps. If P𝑒 (𝑒) has not halted by then, it never will.
So computability of BB would contradict the unsolvability of the Halting problem.
(The function Σ is similar and is Exercise D.7.)
This BB may seem to be just one more uncomputable function among many.
However, it has the interesting property that any function 𝑓 that grows faster than
it — where 𝑓 (𝑛) ≥ BB (𝑛) for all sufficiently large 𝑛 — is also not computable, by
the same argument as in the proof. This gives us an insight about what makes a
function uncomputable: one way is to grow faster than any computable function.†
The Busy Beaver problem is: which 𝑛-state Turing Machine
does the most computational work before halting?
Think of this as a competition, to produce the machine that
sets the limit BB (𝑛) or Σ(𝑛).‡ A competition needs rules, and
here tradition fixes a definition of Turing machines where there
is a single tape that is unbounded at one end, there are two tape
symbols 1 and B, there is a separate halt state that is not counted in
the number of machine states, the machine is started on a blank
tape, and transitions are of the form Δ(state, tape symbol) =
⟨state, tape symbol, head shift⟩.
Rare moment of rest
What is known In the 1962 paper Radó covered the 𝑛 = 0, 𝑛 = 1, and 𝑛 = 2 cases
(𝑛 = 0 is trivial since it refers to a machine consisting only of a halting state). In
1964 Radó and Lin showed that Σ( 3) = 6.
D.2 Example This is the three state Busy Beaver machine, with halting state 𝑞 3 .
†
Note the connection with the Ackermann function; we showed that it is not primitive recursive because
it grows faster than any primitive recursive function. ‡ For many years after the problem was originally
stated by T Radó, the competition was centered on Σ. However, recently it has become more common
to discuss BB. In any event, the two are very closely related.
Δ      B            1
𝑞0    𝑞 1, 1, 𝑅    𝑞 3, 1, 𝑅
𝑞1    𝑞 2, B, 𝑅    𝑞 1, 1, 𝑅
𝑞2    𝑞 2, 1, 𝐿    𝑞 0, 1, 𝐿
Halt – otherwise
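We can check this machine by simulating it. The sketch below (Python, for concreteness) uses a dictionary as a two-way-infinite tape, which is harmless here since the head strays only one square to the left of the start square.

```python
# Transition table for the three-state Busy Beaver machine above:
# (state, read symbol) -> (new state, written symbol, head shift).
DELTA = {
    ("q0", "B"): ("q1", "1", +1), ("q0", "1"): ("q3", "1", +1),
    ("q1", "B"): ("q2", "B", +1), ("q1", "1"): ("q1", "1", +1),
    ("q2", "B"): ("q2", "1", -1), ("q2", "1"): ("q0", "1", -1),
}

def run(delta, start="q0", halt="q3", max_steps=10_000):
    tape, pos, state, steps = {}, 0, start, 0
    while state != halt and steps < max_steps:
        state, written, shift = delta[(state, tape.get(pos, "B"))]
        tape[pos] = written
        pos += shift
        steps += 1
    return steps, sum(1 for symbol in tape.values() if symbol == "1")

steps, ones = run(DELTA)
print(steps, ones)  # 14 6 -- it halts leaving six 1's, matching Σ(3) = 6
```

Counting the final transition into the halt state, this machine makes 14 steps; the step-count record BB (3) = 21 in the table below is set by a different three-state machine.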
This summarizes the current world records.
𝑛        1   2   3    4     5            6
BB (𝑛)   1   6   21   107   47 176 870   ≥ 10 ↑↑ 15
Σ(𝑛)     1   4   6    13    4 098        ≥ 10 ↑↑ 15
The notation 10 ↑↑ 15 means 10ˆ(10ˆ(· · · ˆ10)), a tower of fifteen 10’s.
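For a sense of how such towers grow, here is the tower operation in code (Python; only tiny heights are feasible to evaluate):

```python
def tower(base, height):
    # height-many copies of base: tower(10, 3) is 10**(10**10)
    value = 1
    for _ in range(height):
        value = base ** value
    return value

print(tower(2, 3), tower(2, 4))  # 16 65536
```

Already tower(10, 3) has ten billion digits, and 10 ↑↑ 15 is tower(10, 15).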
How we find these After 𝑛 = 2 the obvious place to start an attack on this problem
is with a breadth-first search: there are finitely many 𝑛 -state machines so run them
all on a blank tape, dovetail, and await developments. That will quickly settle the
question for a large number of machines. Of course, some of them won’t halt or
will run longer than our patience lasts. For some of these their action will be easy
to determine from the source, and we can hope to quickly reduce to a relatively
few machines which we can study in depth, and so by exhaustion find the answer
for this 𝑛 .†
But what if for some 𝑛 we find a machine that computes something that we
don’t know? For instance, what if 𝑛 is big enough to allow a machine that halts
if and only if it finds an odd perfect number? The 𝑛 = 6 case seems to have a
machine similar to this.
For more, there are a number of websites that cover the topic, including the
latest results. Besides the Wikipedia page, the canonical site is bbchallenge.org.
Some cover variations on machine standards such as considering machines with
three or more symbols.
Not only are Busy Beaver numbers very hard to find, at some point they become
impossible. In 2016, A Yedidia and S Aaronson obtained an 𝑛 for which BB (𝑛) is
unknowable. To do that, they created a programming language where programs
compile down to Turing machines. With this, they constructed a 7918-state
Turing machine that halts if there is a contradiction within the standard axioms
for Mathematics, and never halts if those axioms are consistent. We believe that
†
Brady (Brady 1983) reports 5 280 such machines for 𝑛 = 4.
these axioms are consistent, so we believe that this machine doesn’t halt. However,
Gödel’s Second Incompleteness Theorem shows that there is no way to prove
that the axioms are consistent using the axioms themselves. So in this case the
solution to the Busy Beaver problem is unknowable in that even if we were given
the number BB (𝑛) we could not use our axioms to prove that it is right, to prove
that this machine halts.
In summary, one way for a function to fail to be computable is if it grows faster
than any computable function. Note, however, that this is not the only way. There
are functions that grow slower than some computable function but are nonetheless
not computable.
II.D Exercises
D.3 How many Turing machines are there of the style used in this discussion?
✓ D.4 Write and run a routine to compute 𝑔( 0) , 𝑔(𝑔( 0)) , . . ..
✓ D.5 Give a diagonal construction of a function that is eventually greater than any
computable function.
✓ D.6 Show that there are uncomputable functions with the property that they
grow no faster than the computable function 𝑓 (𝑥) = 1. Hint: An argument by
countability works.
D.7 This is a proof that Σ is not computable. Let 𝑓 : N → N be any total
computable function. We will show that Σ(𝑛) > 𝑓 (𝑛) for infinitely many 𝑛 , and so
Σ ≠ 𝑓.
(a) Show that there is a Turing Machine M 𝑗 having 𝑗 many states that writes
𝑗 -many 1’s to a blank tape.
(b) Let 𝐹 : N → N be this function.
𝐹 (𝑚) = (𝑓 (0) + 0²) + (𝑓 (1) + 1²) + (𝑓 (2) + 2²) + · · · + (𝑓 (𝑚) + 𝑚²)
Argue that it has these three properties: if 0 < 𝑚 then 𝑓 (𝑚) < 𝐹 (𝑚), and
𝑚² ≤ 𝐹 (𝑚), and 𝐹 (𝑚) < 𝐹 (𝑚 + 1).
(c) The illustration below shows the composition of two Turing machines. On the
right, we have combined the final states of the first machine from the left with
the start state of the second.
(diagram: the first machine’s states . . . 𝑚𝑖 , 𝑚 𝑗 lead to Halt, which on the right is
replaced by the second machine’s start state 𝑛 0 , followed by 𝑛 1 , . . .)
Consider the Turing machine P that performs M 𝑗 , followed by the machine
M𝐹 , followed by another copy of the machine M𝐹 . Show that its
productivity is 𝐹 (𝐹 ( 𝑗)) and that it has 𝑗 + 2𝑛 𝐹 many states.
(d) Finish by comparing that with the 𝑗 + 2𝑛 𝐹 -state Busy Beaver machine. By
definition 𝐹 (𝐹 ( 𝑗)) ≤ Σ( 𝑗 + 2𝑛 𝐹 ). Because 𝑛 𝐹 is constant, being the number
of states in the machine M𝐹 , the relation 𝑗 + 2𝑛 𝐹 ≤ 𝑗² < 𝐹 ( 𝑗) holds for
sufficiently large 𝑗 . Argue that 𝑓 ( 𝑗 + 2𝑛 𝐹 ) ≤ Σ( 𝑗 + 2𝑛 𝐹 ).
Extra
II.E Cantor in code
𝑛 ∈ N            0       1       2       3       4       5      ...
⟨𝑖, 𝑗⟩ ∈ N × N    ⟨0, 0⟩  ⟨0, 1⟩  ⟨1, 0⟩  ⟨0, 2⟩  ⟨1, 1⟩  ⟨2, 0⟩  ...
The map from the top row to the bottom is Cantor’s pairing function because it
outputs pairs, while its inverse from the bottom to the top is the unpairing function.
First, unpairing. Given ⟨𝑥, 𝑦⟩ , it lies on diagonal number 𝑑 = 𝑥 + 𝑦 , and the
earlier diagonals together contain 1 + 2 + · · · + 𝑑 = 𝑑 (𝑑 + 1)/2-many pairs.
;; triangle-num  return 1+2+3+...+n
;; natural number -> natural number
(define (triangle-num n)
  (/ (* (+ n 1) n)
     2))
The pair ⟨𝑥, 𝑦⟩ sits 𝑥-many places along its diagonal, so its cantor number is
that triangle number plus 𝑥 .
;; cantor-unpairing  Given the pair x,y return its cantor number
;; natural number, natural number -> natural number
(define (cantor-unpairing x y)
  (+ (triangle-num (+ x y)) x))
Next, the pairing function. Given a natural number 𝑐 , to find the associated
⟨𝑥, 𝑦⟩ we first find the diagonal on which it falls. Where the diagonal is 𝑑 (𝑥, 𝑦) =
𝑥 + 𝑦 , the associated triangle number is 𝑡 (𝑥, 𝑦) = 𝑑 (𝑑 + 1)/2 = (𝑑² + 𝑑)/2.
Then 0 = 𝑑² + 𝑑 − 2𝑡 . Applying the familiar formula (−𝑏 ± √(𝑏² − 4𝑎𝑐))/(2𝑎) gives
this.
𝑑 = (−1 + √(1 − 4 · 1 · (−2𝑡)))/(2 · 1) = (−1 + √(1 + 8𝑡))/2
(We kept only the ‘+’ of the ‘±’ because the other root is negative.) Given a pairing
function input 𝑐 , to find the number of the diagonal containing the associated
⟨𝑥, 𝑦⟩ , take the floor, 𝑑 = ⌊(−1 + √(1 + 8𝑐))/2⌋ .
;; diag-num  Give number of diagonal containing Cantor pair numbered c
;; natural number -> natural number
(define (diag-num c)
  (let ([s (integer-sqrt (+ 1 (* 8 c)))])
    (quotient (- s 1)
              2)))
and then we get ⟨𝑥, 𝑦⟩ by seeing how far 𝑐 is along that diagonal.
;; cantor-pairing  Given the cantor number, return the pair with that number
;; natural number -> (natural number natural number)
(define (cantor-pairing c)
  (let* ([d (diag-num c)]
         [t (triangle-num d)])
    (list (- c t)
          (- d (- c t)))))
With those we can reproduce the table from the section’s start.
> (for ([i '(0 1 2 3 4 5)])
(displayln (cantor-pairing i)))
(0 0)
(0 1)
(1 0)
(0 2)
(1 1)
(2 0)
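As a cross-check on the Racket routines, the same computation fits in a few lines of Python (an illustration only; the names triangle, unpair, and pair here are mine, not the book’s):

```python
from math import isqrt

def triangle(n):              # 1 + 2 + ... + n
    return n * (n + 1) // 2

def unpair(x, y):             # pair -> cantor number
    return triangle(x + y) + x

def pair(c):                  # cantor number -> pair
    d = (isqrt(8 * c + 1) - 1) // 2    # the diagonal containing c
    x = c - triangle(d)
    return (x, d - x)

print([pair(c) for c in range(6)])
# → [(0, 0), (0, 1), (1, 0), (0, 2), (1, 1), (2, 0)]
```

Because the two functions are mutually inverse, pair(unpair(x, y)) returns (x, y) for every pair.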
The routines for triples and four-tuples show that there is a general pattern.
What the heck, just for fun we can extend to tuples of any size.
For the function unpair : N𝑘 → N, which we also call cantor, we can determine
𝑘 by peeking at the number of inputs. Thus cantor-unpairing-n generalizes
cantor-unpairing, cantor-unpairing-3, etc., by taking a tuple of any
length.
;; cantor-unpairing-n  Take a tuple of any length n and return its cantor number
;; (natural ...) of n elements -> natural
(define (cantor-unpairing-n . args)
  (cond
    [(null? args) 0]
    [(= 1 (length args)) (car args)]
    [(= 2 (length args)) (cantor-unpairing (car args) (cadr args))]
    [else
     (cantor-unpairing (car args) (apply cantor-unpairing-n (cdr args)))]))
> (cantor-unpairing-n 0 0 1 0)
6
> (cantor-unpairing-n 1 2 3 4)
159331
To generalize to the function pair : N → N𝑘, the awkwardness is that the routine
can’t know the intended arity 𝑘 and we must specify it separately.
;; cantor-pairing-arity  return the list of the given arity making the cantor number c
;; If arity=0 then only c=0 is valid (others return #f)
;; natural natural -> (natural .. natural) with arity-many elements
(define (cantor-pairing-arity arity c)
  (cond
    [(= 0 arity)
     (if (= 0 c)
         '()
         (begin
           (display "ERROR: cantor-pairing-arity with arity=0 requires c=0")
           (newline)
           #f))]
    [(= 1 arity) (list c)]
    [else (cons (car (cantor-pairing c))
                (cantor-pairing-arity (- arity 1) (cadr (cantor-pairing c))))]))
The cantor-pairing-arity routine is not uniform because it covers only
one arity at a time. Said another way, cantor-pairing-arity is not a full
inverse of cantor-unpairing-n, in that we must tell it the tuple’s arity.
> (cantor-unpairing-n 3 4 5)
1381
> (cantor-pairing-arity 3 1381)
'(3 4 5)
The idea of cantor-pairing-omega is to interpret its input c as a pair ⟨𝑥, 𝑦⟩ ,
that is, 𝑐 = pair (𝑥, 𝑦) . It then returns a tuple of length 𝑥 + 1, where 𝑦 is the tuple’s
cantor number. (The reason for the +1 in 𝑥 + 1 is that the empty tuple is associated
with 𝑐 = 0. Then rather than have all later pairs ⟨0, 𝑦⟩ not be associated with any
number, we next use the one-tuple ⟨0⟩ , and after that we use ⟨1⟩ , etc.)
II.E Exercises
E.1 What is the pair with Cantor number 42?
E.2 What is the pair with the number 666?
E.3 What is the first number matched by cantor-pairing-omega with a
four-tuple?
Part Two
Automata
Chapter III. Languages, Grammars, and Graphs
This chapter covers three topics we will use as a foundation for later work.
Section III.1 Languages
Our machines input and output strings of symbols. We take a symbol, sometimes
called a token, to be an atomic unit that a machine can read and write.† On
everyday binary computers the symbols are the bits 0 and 1. An alphabet is a
nonempty and finite set of symbols. We usually denote an alphabet with the upper
case Greek letter Σ, although an exception is the alphabet of bits, B = { 0, 1 }. A
string over an alphabet is a sequence of symbols from that alphabet. We use lower
case Greek letters such as 𝜎 and 𝜏 to denote strings. We use 𝜀 to denote the empty
string, the length zero sequence of symbols. The set of all strings over Σ is Σ∗ .‡
1.1 Definition A language L over an alphabet Σ is a set of strings drawn from that
alphabet. That is, L ⊆ Σ∗.
1.2 Example The set of bitstrings that begin with 1 is L = { 1, 10, 11, 100, ... }.
1.3 Example Another language over B is the finite set { 1000001, 1100001 }.
1.4 Example Let Σ = { a, b }. The language consisting of strings where the number of
a’s is twice the number of b’s is L = {𝜀, aab, aba, baa, aaaabb, ... }.
1.5 Example Let Σ = { a, b, c }. The language of length-two strings over that alphabet
is L2 = Σ2 = { aa, ab, ba ... , cc }. Over the same alphabet, this language consists
of length-three strings whose characters are in ascending order.
L3 = { aaa, bbb, ccc, aab, aac, abb, abc, acc, bbc, bcc }
1.6 Definition A palindrome is a string that reads the same forwards as backwards.
Some palindromes in English are kayak, noon, and racecar.
1.7 Example The language of palindromes over Σ = { a, b } is L = {𝜎 ∈ Σ∗ | 𝜎 = 𝜎 R }.
A few members are abba, aaabaaa, a, and 𝜀 .
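Languages like these are easy to experiment with by writing membership tests. Here are sketches for Example 1.4 and Example 1.7 (Python, purely as an illustration; the function names are mine):

```python
def in_twice_language(s):
    # Example 1.4: strings over {a, b} where the number of a's is
    # twice the number of b's
    return set(s) <= {"a", "b"} and s.count("a") == 2 * s.count("b")

def is_palindrome(s):
    # Example 1.7: the string equals its own reversal
    return s == s[::-1]

print(in_twice_language("aab"), in_twice_language("aabb"))  # True False
print(is_palindrome("abba"), is_palindrome("ab"))           # True False
```

Note that both tests accept the empty string ε, agreeing with the examples.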
1.8 Example Let Σ = { a, b, c }. Recall that a Pythagorean triple of integers has the
sum of the squares of the first two equal to the square of the third, as with 3,
4, and 5, or 5, 12, and 13. One way to describe Pythagorean triples is with the
Image: The Tower of Babel, by Pieter Bruegel the Elder (1563)
† We can imagine Turing’s clerk calculating without reading and writing symbols, for instance by
keeping track of information by having elephants move to the left side of a road or to the right. But we
could translate any such procedure into one using marks that our mechanism’s read/write head can
handle. So readability and writeability are not essential but we require them in the definition of
symbols as a convenience; after all, elephants are inconvenient. ‡ For more on strings see the
Appendix on page 370.
1.16 Remark For the above definition of the operation L𝑘 of repeatedly choosing
strings, there are two ways that we could go. We could choose a string 𝜎 and then
†
Don’t confuse this with the Cartesian product operation for sets. ‡ We take 𝜎 0 = 𝜀 since 𝜀 is the
identity element for string concatenation. (We saw the same reasoning when we defined the sum of
zero-many numbers to be 0 and the product of zero-many numbers to be 1, on page 21.)
repeat it, and so get the set of all 𝜎 𝑘. Or we could repeatedly choose strings, getting
the set of all 𝜎0 ⌢ 𝜎1 ⌢ · · · ⌢ 𝜎𝑘−1 . The second is more useful so that’s what we use.
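The repeated-choice reading of L𝑘 can be prototyped directly for finite languages (a Python sketch; the names concat and power are mine):

```python
def concat(L0, L1):
    # the concatenation language: every string sigma0 followed by sigma1
    return {s + t for s in L0 for t in L1}

def power(L, k):
    # L^k by repeatedly choosing strings from L; L^0 is {empty string}
    result = {""}
    for _ in range(k):
        result = concat(result, L)
    return result

print(sorted(power({"a", "ab"}, 2)))  # ['aa', 'aab', 'aba', 'abab']
```

Note that the output contains aab and aba, strings that are not the square of any single member, which is exactly the difference between the two readings.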
We finish by describing two ways that a machine can relate to a language. We
have already defined that a machine decides a language if it computes whether or
not a given input is a member of that language. The other way relates to languages
that are computably enumerable but not computable. For these there is a machine
that determines whether a given input is a member of the language but it is not
able to determine whether the input is not in the language. For instance, there is a
Turing machine that, given input 𝑒 , can determine whether 𝑒 ∈ 𝐾 , but no machine
can determine whether 𝑒 ∉ 𝐾 .
We will say that a machine recognizes (or accepts, or semidecides) a language
when, given an input, the machine computes in a finite time whether the input is
in the language, and further, if the input is not an element of the language then
the machine will never incorrectly report that it is an element. (The machine may
determine that it is not, or it may simply not report a conclusion by failing to halt.)
In short, ‘deciding’ means that on any input the machine correctly computes
both yes and no answers, while ‘recognizing’ requires only that it correctly computes
yes answers.
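The contrast can be mimicked in ordinary code. The set of perfect squares is of course decidable, so the pair below is only meant to illustrate the shape of the two definitions (Python): the decider always answers, while the recognizer is deliberately written in a pure search style and simply never returns on a non-member.

```python
def decide_square(n):
    # decider: halts on every input, answering yes or no
    k = 0
    while k * k < n:
        k += 1
    return k * k == n

def recognize_square(n):
    # recognizer: halts (with True) exactly on members; on a
    # non-member the search below runs forever
    k = 0
    while True:
        if k * k == n:
            return True
        k += 1

print(decide_square(50), recognize_square(49))  # False True
```

For a genuinely semidecidable set such as 𝐾 no amount of cleverness can turn the second style into the first.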
III.1 Exercises
1.17 List five of the shortest strings in each language, if there are five.
(a) {𝜎 ∈ B∗ | the number of 0’s plus the number of 1’s equals 3 }
(b) {𝜎 ∈ B∗ | 𝜎 ’s first and last characters are equal }
✓ 1.18 Is the set of decimal representations of real numbers a language?
1.19 Which of these is a palindrome: ()() or )(()? (a) Only the first (b) Only
the second (c) Both (d) Neither
✓ 1.20 Show that if 𝛽 is a string then 𝛽 ⌢ 𝛽 R is a palindrome. Do all palindromes
have that form?
✓ 1.21 Let L0 = {𝜀, a, aa, aaa } and L1 = {𝜀, b, bb, bbb }. (a) List all the members
of L0 ⌢ L1 . (b) List all the members of L1 ⌢ L0 . (c) List all the members of L0².
(d) List ten members, if there are ten, of L0∗.
✓ 1.22 List five members of each language, if there are five, and if not then list all
of them.
(a) {𝜎 ∈ { a, b }∗ | 𝜎 = aⁿb for 𝑛 ∈ N }
(c) { 1ⁿ0ⁿ⁺¹ ∈ B∗ | 𝑛 ∈ N }
(d) { 1ⁿ0²ⁿ1 ∈ B∗ | 𝑛 ∈ N }
✓ 1.23 Where L = { a, ab }, list each. (a) L² (b) L³ (c) L¹ (d) L⁰
1.24 Where L0 = { a, ab } and L1 = { b, bb } find each. (a) L0 ⌢ L1 (b) L1 ⌢ L0
(c) L0² (d) L1² (e) L0² ⌢ L1²
1.25 Suppose that the language L0 has three elements and L1 has two. Knowing
only that information, for each of these find the least number of elements possible
and the greatest number possible. (a) L0 ∪ L1 (b) L0 ∩ L1 (c) L0 ⌢ L1 (d) L1²
(e) L1 R (f) L0∗ ∩ L1∗
1.26 What is the language that is the Kleene star of the empty set, ∅∗ ?
✓ 1.27 Is the 𝑘 -th power of a language the same as the language of 𝑘 -th powers?
1.28 Does L∗ differ from ( L ∪ {𝜀 }) ∗ ?
1.29 We can ask how many elements are in the set L2.
(a) Prove that if two strings are unequal then their squares are also unequal.
Conclude that if L has 𝑘 -many elements then L2 has at least 𝑘 -many elements.
(b) Provide an example of a nonempty language that achieves this lower bound.
(c) Prove that where L has 𝑘 -many elements, L2 has at most 𝑘 2 -many.
(d) Provide an example, for each 𝑘 ∈ N, of a language that achieves this upper
bound.
1.30 Prove that L∗ = L0 ∪ L1 ∪ L2 ∪ · · · .
1.31 Consider the empty language L0 = ∅. For any language L1 , describe
L1 ⌢ L0 .
1.32 Languages are sets and so the operations of union and intersection apply.
(a) Name the shortest five strings in the union of 𝐴 = { a }∗ with 𝐵 = { b }∗.
(b) Suppose that Σ0 and Σ1 are disjoint, and that L0 and L1 are finite languages
over those alphabets respectively. What is the number of elements in their
union?
(c) Fill in the blank: the union of a language over Σ0 with a language over Σ1 is a
language over .
(d) Formulate the similar statement for intersection.
1.33 Let the language L over some Σ be finite, that is, suppose that | L | < ∞.
(a) With the language finite, must the alphabet be finite?
(b) Show that there is some bound 𝐵 ∈ N where |𝜎 | ≤ 𝐵 for all 𝜎 ∈ L.
(c) Show that the class of finite languages is closed under finite union. That is,
show that if L0, ... L𝑘 − 1 are finite languages over a shared alphabet for some
𝑘 ∈ N then their union is also finite.
(d) Show also that the class of finite languages is closed under finite intersection
and finite concatenation.
(e) Show that the class of finite languages is not closed under complementation
or Kleene star. (For an alphabet Σ, a language is a subset, L ⊆ Σ∗ . So its
complement is Lc = Σ∗ − L, also a language over Σ.)
1.34 What is the difference between the languages L = {𝜎 ∈ Σ∗ | 𝜎 = 𝜎 R } and
L̂ = {𝜎 ⌢ 𝜎 R | 𝜎 ∈ Σ∗ }?
1.35 For any language L ⊆ Σ∗ we can form the set of prefixes.
Pref ( L) = {𝜏 ∈ Σ∗ | 𝜏 is a prefix of some 𝜎 ∈ L }
Where Σ = { a, b } and L = { abaaba, bba }, find Pref ( L) .
Section III.2 Grammars
We have defined a ‘language’ as a set of strings. But this allows for any willy-nilly
set. In practice usually a language is governed by rules.
Here is an example. Native English speakers will say that the noun phrase
“the big red barn” sounds fine but that “the red big barn” sounds wrong. That is,
sentences in natural languages are constructed in patterns and the second of those
does not follow the pattern for English. Artificial languages such as programming
languages also have syntax rules, usually very strict rules.
⟨expr⟩ ⇒ ⟨term⟩
       ⇒ ⟨term⟩ * ⟨factor⟩
       ⇒ ⟨factor⟩ * ⟨factor⟩
       ⇒ x * ⟨factor⟩
       ⇒ x * ( ⟨expr⟩ )
       ⇒ x * ( ⟨term⟩ + ⟨expr⟩ )
       ⇒ x * ( ⟨term⟩ + ⟨term⟩ )
       ⇒ x * ( ⟨factor⟩ + ⟨term⟩ )
       ⇒ x * ( ⟨factor⟩ + ⟨factor⟩ )
       ⇒ x * ( y + ⟨factor⟩ )
       ⇒ x * ( y + z )
(The accompanying parse tree has root ⟨expr⟩ and leaves spelling out x * ( y + z ).)
In that example the rules for ⟨expr⟩ and ⟨term⟩ are recursive. But we don’t get
stuck in an infinite regress because the question is not whether we could perversely
keep expanding ⟨expr⟩ forever. Instead, the question is whether, given a string
such as x*(y+z), we can find a terminating derivation.
In the prior example the nonterminals such as ⟨expr⟩ or ⟨term⟩ describe the
role of those components in the language, as did the English grammar fragment’s
⟨noun phrase⟩ and ⟨article⟩ . That is why nonterminals are sometimes called
‘syntactic categories’. But for examples and exercises we often use small grammars
whose terminals and nonterminals do not have any particular meaning. For these,
a common convention is to write productions using single letters, with nonterminals
in upper case and terminals in lower case.
2.4 Example This two-rule grammar has one nonterminal, S.
S → aSb | 𝜀
Here is a derivation of the string a²b² = aabb.
S ⇒ aSb ⇒ aaSbb ⇒ aabb
That is, if there is a match for the rule’s head then we can replace it with the body.
Where 𝜎0, 𝜎1 are strings of terminals and nonterminals, if they are related by a
sequence of derivation steps then we write 𝜎0 ⇒∗ 𝜎1 . Where 𝜎0 = 𝑆 is the start
symbol, if there is a sequence 𝜎0 ⇒∗ 𝜎1 that finishes with a string of terminals
𝜎1 ∈ Σ∗ then we say that 𝜎1 has a derivation from the grammar.
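Finding such a derivation can be automated, at least for small cases. Below is a breadth-first sketch (Python, as an illustration; the rules dictionary and the length-based pruning bound are my own choices, tuned to grammars such as S → aSb | 𝜀, where a rule never removes a terminal once written):

```python
from collections import deque

def derivable(rules, start, target):
    # Breadth-first search over sentential forms.  A form is dropped once
    # its terminals alone outnumber the target's characters, or it grows
    # too long, so for grammars like this one the search terminates.
    seen, queue = {start}, deque([start])
    while queue:
        form = queue.popleft()
        if form == target:
            return True
        for i, symbol in enumerate(form):
            for body in rules.get(symbol, []):
                new = form[:i] + body + form[i + 1:]
                terminals = sum(1 for ch in new if ch not in rules)
                if (terminals <= len(target)
                        and len(new) <= 2 * len(target) + 2
                        and new not in seen):
                    seen.add(new)
                    queue.append(new)
    return False

rules = {"S": ["aSb", ""]}   # the grammar S -> aSb | empty string
print(derivable(rules, "S", "aabb"), derivable(rules, "S", "aab"))  # True False
```

This is exactly the point made above: we never get stuck expanding forever, because the search asks only whether some terminating derivation reaches the given string.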
This description is like the one detailing how a Turing machine’s instructions
determine the evolution of the sequence of configurations that is a computation,
on page 8. That is, production rules are like a program, directing a derivation.
However, one difference is that Turing machines are deterministic, so that from
a given input string there is a determined sequence of configurations, while here
the sequence of derivation steps is nondeterministic, in that from a given start
symbol a derivation can branch out to go to many different ending strings.
2.5 Definition The language derived from a grammar is the set of strings of
terminals having derivations that begin with the start symbol.
2.6 Example This grammar’s language is the set of representations of natural numbers.
⟨natural⟩ → ⟨digit⟩ | ⟨digit⟩⟨natural⟩
⟨digit⟩ → 0 | . . . | 9
This is a derivation for the string 321, along with its parse tree.
⟨natural⟩ ⇒ ⟨digit⟩⟨natural⟩
          ⇒ 3 ⟨natural⟩
          ⇒ 3 ⟨digit⟩⟨natural⟩
          ⇒ 32 ⟨natural⟩
          ⇒ 32 ⟨digit⟩
          ⇒ 321
(The parse tree has root ⟨natural⟩ and leaves 3, 2, and 1.)
S → aSb | aS | a | Sb | b
generates the language L = { a𝑖 b 𝑗 ∈ { a, b }∗ | 𝑖 ≠ 0 or 𝑗 ≠ 0 }.
This is the first grammar that we have seen where the generated language is
not clear, so we will do a verification. We will show mutual containment, first that
the generated language is a subset of L and then that it is also a superset.
The rules show that any derivation step 𝜏0 ⌢ head ⌢ 𝜏1 ⇒ 𝜏0 ⌢ body ⌢ 𝜏1 only
adds a’s on the left and b’s on the right, so every string in the language has the form
a𝑖 b 𝑗 . Those same rules show that in any terminating derivation S must eventually
be replaced by either a or b. Together these two give that the generated language
is a subset of L.
For containment the other way, we will prove that every 𝜎 ∈ L has a derivation.
We will use induction on the length |𝜎 | . By the definition of L the base case is
|𝜎 | = 1. In this case either 𝜎 = a or 𝜎 = b, each of which obviously has a derivation.
For the inductive step, fix 𝑛 ≥ 1 such that every string from L of length 𝑘 = 1, . . . ,
𝑘 = 𝑛 has a derivation, and let 𝜎 have length 𝑛 + 1. By the definition of L it has
the form 𝜎 = a𝑖 b 𝑗 . There are three cases: either 𝑖 = 𝑗 = 1, or 𝑖 > 1, or 𝑗 > 1.
The 𝜎 = a¹b¹ case is easy. For the 𝑖 > 1 case, 𝜎ˆ = a𝑖−1 b 𝑗 is a string of length 𝑛 ,
so by the inductive hypothesis it has a derivation S ⇒ · · · ⇒ 𝜎ˆ . Prefixing that
derivation with a S ⇒ aS step will put an additional a on the left. The 𝑗 > 1 case
works the same way.
2.10 Example The fact that derivations can go more than one way leads to an important
issue with grammars, that they can be ambiguous. Consider this fragment of a
grammar for if statements in a C-like language
⟨stmt⟩ → if ⟨bool⟩ ⟨stmt⟩
⟨stmt⟩ → if ⟨bool⟩ ⟨stmt⟩ else ⟨stmt⟩
and this code string.
if enrolled(s) if studied(s) grade='P' else grade='F'
This string can be parsed in two ways, as dramatized by these two copies of the
C-like language code string, indented to show the two associations.
if enrolled(s)
    if studied(s)
        grade='P'
    else
        grade='F'

if enrolled(s)
    if studied(s)
        grade='P'
else
    grade='F'
Obviously, those programs behave differently. This is known as a dangling else. (In
a language such as C a programmer makes clear which of the two possibilities is
the intended one by using curly braces.)
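The two associations really do compute different grades. As an illustration we can transcribe the two readings into Python, where the indentation forces the choice; the predicates here are stand-ins for the enrolled and studied tests.

```python
def grade_inner(enrolled, studied):
    # First reading: the else goes with the nearest if.
    if enrolled:
        if studied:
            return 'P'
        else:
            return 'F'

def grade_outer(enrolled, studied):
    # Second reading: the else goes with the outer if.
    if enrolled:
        if studied:
            return 'P'
    else:
        return 'F'

# An enrolled student who did not study fails under the first reading
# but gets no grade at all under the second.
print(grade_inner(True, False), grade_outer(True, False))   # F None
```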
A grammar is ambiguous if there is a string in its language with more than one
leftmost derivation.
2.11 Example This grammar for elementary algebra expressions
⟨expr⟩ → ⟨expr⟩ + ⟨expr⟩
| ⟨expr⟩ * ⟨expr⟩
| ( ⟨expr⟩ ) | a | b | . . . z
is ambiguous because a+b*c has two leftmost derivations.
⟨expr⟩ ⇒ ⟨expr⟩ + ⟨expr⟩ ⇒ a + ⟨expr⟩ ⇒ a + ⟨expr⟩ * ⟨expr⟩ ⇒ a + b * ⟨expr⟩ ⇒ a + b * c
⟨expr⟩ ⇒ ⟨expr⟩ * ⟨expr⟩ ⇒ ⟨expr⟩ + ⟨expr⟩ * ⟨expr⟩ ⇒ a + ⟨expr⟩ * ⟨expr⟩ ⇒ a + b * ⟨expr⟩ ⇒ a + b * c
Again, the issue is that we get two different behaviors. For instance, take 1 for a,
and 2 for b, and 3 for c. The first derivation gives 1 + ( 2 · 3) = 7 while the second
one gives ( 1 + 2) · 3 = 9.
In contrast, this grammar for the same language is unambiguous.
⟨expr⟩ → ⟨expr⟩ + ⟨term⟩
| ⟨term⟩
⟨term⟩ → ⟨term⟩ * ⟨factor⟩
| ⟨factor⟩
⟨factor⟩ → ( ⟨expr⟩ )
| a | b | ... | z
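One payoff of the unambiguous form is that it maps directly onto a parser. This sketch (our own, not from the text) is a recursive-descent evaluator in which each rule becomes a function and the left recursion becomes a loop, so that * automatically binds tighter than +.

```python
def evaluate(src, env):
    """Evaluate an expression string by the unambiguous grammar:
    expr -> expr + term | term;  term -> term * factor | factor;
    factor -> ( expr ) | a | b | ... | z."""
    tokens = list(src)
    pos = 0

    def factor():
        nonlocal pos
        if tokens[pos] == '(':
            pos += 1
            val = expr()
            pos += 1            # step past ')'
            return val
        val = env[tokens[pos]]  # a single letter, looked up in env
        pos += 1
        return val

    def term():
        nonlocal pos
        val = factor()
        while pos < len(tokens) and tokens[pos] == '*':
            pos += 1
            val = val * factor()
        return val

    def expr():
        nonlocal pos
        val = term()
        while pos < len(tokens) and tokens[pos] == '+':
            pos += 1
            val = val + term()
        return val

    return expr()

env = {'a': 1, 'b': 2, 'c': 3}
print(evaluate('a+b*c', env), evaluate('(a+b)*c', env))   # 7 9
```

The two results match the 1 + (2 · 3) = 7 and (1 + 2) · 3 = 9 computations above.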
154 Chapter III. Languages, Grammars, and Graphs
III.2 Exercises
✓ 2.12 Use the grammar of Example 2.3. (a) What is the start symbol? (b) What
are the terminals? (c) What are the nonterminals? (d) How many rewrite rules
does it have? (e) Give three strings derived from the grammar, besides the string
in the example. (f) Give three strings in the language { +, *, ), (, a ... , z }∗ that
cannot be derived.
2.13 Use the grammar of Example 2.1. (a) What is the start symbol? (b) What
are the terminals? (c) What are the nonterminals? (d) How many rewrite rules
does it have? (e) Give three strings derived from the grammar besides the ones in
the exercise, or show that there are not three such strings. (f) Give three strings
in the language that cannot be derived from this grammar, or show that there are
not three such strings.
2.14 Use this grammar.
⟨natural⟩ → ⟨digit⟩ | ⟨digit⟩⟨natural⟩
⟨digit⟩ → 0 | 1 | . . . | 9
(a) What is the alphabet? What are the terminals? The nonterminals? What is the
start symbol? (b) For each production, name the head and the body. (c) Which
metacharacters are used? (d) Derive 42. Also give the associated parse tree.
(e) Derive 993 and give its parse tree. (f) How can ⟨natural⟩ be defined in terms
of ⟨natural⟩ ? Doesn’t that lead to infinite regress? (g) Extend this grammar to
cover the integers. (h) With your grammar, can you derive +0? -0?
✓ 2.15 From this grammar
⟨sentence⟩ → ⟨subject⟩ ⟨predicate⟩
⟨subject⟩ → ⟨article⟩ ⟨noun⟩
⟨predicate⟩ → ⟨verb⟩ ⟨direct object⟩
⟨direct object⟩ → ⟨article⟩ ⟨noun⟩
⟨article⟩ → the | a
⟨noun⟩ → car | wall
⟨verb⟩ → hit
derive each of these: (a) the car hit a wall (b) the car hit the wall
(c) the wall hit a car.
2.16 Consider this grammar.
⟨sentence⟩ → ⟨subject⟩ ⟨predicate⟩
⟨subject⟩ → ⟨article⟩ ⟨noun1⟩
⟨predicate⟩ → ⟨verb⟩ ⟨direct-object⟩
⟨direct-object⟩ → ⟨article⟩ ⟨noun2⟩
⟨article⟩ → the | a | 𝜀
⟨noun1⟩ → dog | flea
⟨noun2⟩ → man | dog
⟨verb⟩ → bites | licks
(a) Give a derivation for dog bites man.
(b) Show that there is no derivation for man bites dog.
Section 2. Grammars 155
✓ 2.17 Your friend tries the prior exercise and you see their work so far.
⟨sentence⟩ ⇒ ⟨subject⟩ ⟨predicate⟩
⇒ ⟨article⟩ ⟨noun1⟩ ⟨predicate⟩
⇒ ⟨article⟩ ⟨noun1⟩ ⟨verb⟩ ⟨direct object⟩
⇒ ⟨article⟩ ⟨dog|flea⟩ ⟨verb⟩ ⟨article⟩ ⟨noun2⟩
⇒ ⟨article⟩ ⟨dog|flea⟩ ⟨verb⟩ ⟨article⟩ ⟨man|dog⟩
Stop them and explain what they are doing wrong.
2.18 With the grammar of Example 2.3, derive (a+b)*c.
✓ 2.19 Use this grammar
S → TbU
T → aT | 𝜀
U → aU | bU | 𝜀
for each part. (a) Give both a leftmost derivation and rightmost derivation of aabab.
(b) Do the same for baab. (c) Show that there is no derivation of aa.
2.20 Use this grammar.
S → aABb
A → aA | a
B → Bb | b
(a) Derive three strings.
(b) Name three strings over Σ = { a, b } that are not derivable.
(c) Describe the language generated by this grammar.
2.21 Give a grammar for the language { a𝑛 b𝑛+𝑚 a𝑚 | 𝑛, 𝑚 ∈ N }.
✓ 2.22 Give the parse tree for the derivation of aabb in Example 2.4.
2.23 Verify that the language derived from the grammar in Example 2.4 is
L = { a𝑛 b𝑛 | 𝑛 ∈ N }.
2.24 What is the language generated by this grammar?
A → aA | B
B → bB | cA
✓ 2.25 In many programming languages identifier names consist of a string of
letters or digits, with the restriction that the first character must be a letter. Create
a grammar for this, using ASCII letters.
2.26 Early programming languages had strong restrictions on what could be a
variable name. Create a grammar for a language that consists of strings of at most
four characters, upper case ASCII letters or digits, where the first character must
be a letter.
2.27 What is the language generated by a grammar with a set of production rules
that is empty?
2.28 Here is a grammar for propositional logic expressions in Conjunctive Normal
form.
⟨CNF⟩ → ( ⟨Disjunction⟩ ) ∧ ⟨CNF⟩ | ( ⟨Disjunction⟩ )
✓ 2.30 This is a grammar for postal addresses. Note the use of the empty string 𝜀 to
make some components optional, such as ⟨opt suffix⟩ and ⟨apt num⟩ .
⟨postal address⟩ → ⟨name⟩ ⟨EOL⟩ ⟨street address⟩ ⟨EOL⟩ ⟨town⟩
⟨name⟩ → ⟨personal part⟩ ⟨last name⟩ ⟨opt suffix⟩
⟨street address⟩ → ⟨house num⟩ ⟨street name⟩ ⟨apt num⟩
⟨town⟩ → ⟨town name⟩ , ⟨state or region⟩
⟨personal part⟩ → ⟨initial⟩ . | ⟨first name⟩
⟨last name⟩ → ⟨char string⟩
⟨opt suffix⟩ → Sr. | Jr. | 𝜀
⟨house num⟩ → ⟨digit string⟩
⟨street name⟩ → ⟨char string⟩
⟨apt num⟩ → ⟨char string⟩ | 𝜀
⟨town name⟩ → ⟨char string⟩
⟨state or region⟩ → ⟨char string⟩
⟨initial⟩ → ⟨char⟩
⟨first name⟩ → ⟨char string⟩ | 𝜀
⟨char string⟩ → ⟨char⟩ | ⟨char⟩ ⟨char string⟩ | 𝜀
⟨char⟩ → A | B | . . . z | 0 | . . . 9 | (space)
⟨digit string⟩ → ⟨digit⟩ | ⟨digit⟩ ⟨digit string⟩ | 𝜀
⟨digit⟩ → 0 | . . . 9
The nonterminal ⟨EOL⟩ expands to an end of line such as ASCII 10, while (space)
signifies a whitespace character such as ASCII 9 or ASCII 32, or even more exotic
characters such as en-space or em-space.
(a) Give a derivation for this address.
President
1600 Pennsylvania Avenue
Washington, DC
(b) Why is there no derivation for this address?
Sherlock Holmes
221B Baker Street
London, UK
Suggest a modification of the grammar so that this address is in the language.
(c) Give three reasons why this grammar is inadequate.
2.31 Recall Turing’s prototype computer, a clerk doing the symbolic manipulations
to multiply two large numbers. Deriving a string from a grammar has a similar
feel and we can write grammars to do computations. Fix the alphabet Σ = { 1 }, so
that we can interpret derived strings as numbers represented in unary.
(a) Produce a grammar whose language is the even numbers, { 12𝑛 | 𝑛 ∈ N } .
(b) Do the same for the multiples of three, { 13𝑛 | 𝑛 ∈ N } .
✓ 2.32 Here is a grammar that is notable for having a small alphabet, while
producing an infinite set of valid English sentences.
⟨sentence⟩ → buffalo ⟨sentence⟩ | 𝜀
(a) Derive a sentence of length one, one of length two, and one of length three.
(b) Give those sentences semantics, that is, make sense of them.
2.33 Here is a grammar for LISP.
⟨s expression⟩ → ⟨atomic symbol⟩
| ( ⟨s expression⟩ . ⟨s expression⟩ )
| ⟨list⟩
⟨list⟩ → ( ⟨list-entries⟩ )
⟨list-entries⟩ → ⟨s expression⟩
| ⟨s expression⟩ ⟨list-entries⟩
⟨atomic symbol⟩ → ⟨letter⟩ ⟨atom part⟩
⟨atom part⟩ → 𝜀
| ⟨letter⟩ ⟨atom part⟩
| ⟨number⟩ ⟨atom part⟩
⟨letter⟩ → a | . . . z
⟨number⟩ → 0 | . . . 9
Give a derivation for each string. (a) (a . b) (b) (a . (b . c))
2.34 Using the Example 2.11’s unambiguous grammar, produce a derivation for
a+(b*c).
2.35 The simplest example of an ambiguous grammar is
S → S | 𝜀
(a) What is the language generated by this grammar?
(b) Produce two different derivations of the empty string.
2.36 This is a grammar for the language of bitstrings L = B∗.
⟨bit-string⟩ → 0 | 1 | ⟨bit-string⟩ ⟨bit-string⟩
Show that it is ambiguous.
2.37
(a) Show that this grammar is ambiguous by producing two different leftmost
derivations for a-b-a.
E → E-E | a | b
(b) Derive a-b-a from this grammar, which is unambiguous.
E → E-T | T
T → a | b
Section
III.3 Graphs
In the Theory of Computation we often state problems using the language of Graph
Theory. Here are two examples we have already seen. Both have vertices, and
those vertices are connected by edges that represent a relationship between the
vertices.
(Two pictures: a parse tree whose interior nodes are ⟨expr⟩, ⟨term⟩, and ⟨factor⟩
and whose leaves spell an expression in x and y, and the state diagram of a Turing
machine with states 𝑞0 –𝑞3 , its edges labeled with instructions such as 1,R and B,L.)
(Picture: a graph with vertices 𝑣0 –𝑣4 .) Its vertex and edge sets are
N = {𝑣 0, ... 𝑣 4 }
E = { {𝑣 0, 𝑣 1 }, {𝑣 0, 𝑣 2 }, ... {𝑣 3, 𝑣 4 } }
Important: a graph is not its picture. Both of the pictures below show the same
graph as above because they show the same vertices connected with the same
edges.
(Two pictures: the same five vertices 𝑣0 –𝑣4 , placed differently on the page but
joined by the same edges.)
Instead of writing 𝑒 = {𝑣, 𝑣ˆ } we often write 𝑒 = 𝑣 𝑣ˆ. Since edges are sets and
sets are unordered we could write the same edge as 𝑒 = 𝑣ˆ𝑣 .
There are many extensions of that definition for modeling different circum-
stances. One is to allow some vertices to connect to themselves, forming a loop.†
Another variant is a multigraph, which allows two vertices to share more than
one edge. Still another is a weighted graph, which gives each edge a real number
weight, perhaps signifying the distance or the cost in money or time to traverse
that edge.
A very often-used variation is a directed graph or digraph, where edges have a
direction, as in a road map that includes one-way streets. If an edge is directed
from 𝑣 to 𝑣ˆ then we can write it as 𝑣 𝑣ˆ but not in the other order. The Turing
machine graph above is a digraph and also has loops.
Some important variations involve whether the graph has cycles. A cycle is a
closed path around the graph; see the complete definition just below. A tree is an
undirected connected graph with no cycles (often one vertex is singled out as the
tree’s root). A directed acyclic graph or DAG is a directed graph with no directed
cycles.
Paths Many problems that we shall consider involve moving through a graph.
3.3 Definition Two graph edges are adjacent if they share a vertex, so that they
have the form 𝑒 0 = 𝑢𝑣 and 𝑒 1 = 𝑣𝑤 . A walk is a sequence of adjacent edges
⟨𝑣 0𝑣 1, 𝑣 1𝑣 2, ... 𝑣𝑛−1𝑣𝑛 ⟩ . Its length is the number of edges, 𝑛 . If the initial vertex 𝑣 0
equals the final vertex 𝑣𝑛 then the walk is closed, otherwise it is open. A trail is a
walk where no edge occurs twice. A circuit is a closed trail. A path is a walk with
no repeated edges or vertices, except that it may be closed, so that its
first and last vertices are equal. A closed path with at least one edge is a cycle.‡
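A small routine can apply these definitions mechanically. This sketch (the function name and edge representation are our own) takes a sequence of edges, each written as an ordered pair so that consecutive edges chain, and reports which labels from Definition 3.3 apply.

```python
def classify(edges):
    """Classify a sequence of edges, each an ordered pair, so that each
    edge ends where the next one starts."""
    if not edges or any(e[1] != f[0] for e, f in zip(edges, edges[1:])):
        return set()                       # not a walk at all
    labels = {"walk"}
    closed = edges[0][0] == edges[-1][1]
    labels.add("closed" if closed else "open")
    undirected = [frozenset(e) for e in edges]
    if len(set(undirected)) == len(undirected):     # no edge repeats
        labels.add("trail")
        if closed:
            labels.add("circuit")
        vertices = [edges[0][0]] + [e[1] for e in edges]
        inner = vertices[:-1] if closed else vertices
        if len(set(inner)) == len(inner):           # no vertex repeats
            labels.add("path")
            if closed:
                labels.add("cycle")
    return labels

print(classify([("u0", "u1"), ("u1", "u3")]))   # an open walk, a trail, a path
```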
3.4 Example On the left, highlighted is a path from 𝑢 0 to 𝑢 3 , 𝑝 = ⟨𝑢 0𝑢 1, 𝑢 1𝑢 3 ⟩ . On
the right the highlighted walk is a cycle.
(Two pictures: on the left a graph with vertices 𝑢0 –𝑢3 , on the right a graph with
vertices 𝑣0 –𝑣7 , each with the named walk highlighted.)
3.5 Definition If a circuit contains all of a graph’s edges then it is an Euler circuit.
If it contains all of the vertices then it is a Hamiltonian circuit.
3.6 Example In Example 3.4 the path in the graph on the left is not a circuit because
it is not closed. The path in the graph on the right is a Hamiltonian circuit but it is
not an Euler circuit.
3.7 Definition Where G = ⟨N , E ⟩ is a graph, a subgraph Ĝ = ⟨N̂ , Ê ⟩ satisfies
N̂ ⊆ N and Ê ⊆ E . A subgraph with every possible edge, so that 𝑣𝑖 , 𝑣 𝑗 ∈ N̂ and
𝑒 = 𝑣𝑖 𝑣 𝑗 ∈ E implies that 𝑒 ∈ Ê also, is an induced subgraph.
† Formally, we might extend the definition to allow some edges in E to be single-element sets. We will
not specify how each variant is described.
‡ These terms are not completely standardized so you may see them used in other ways, especially in
older work.
3.8 Example In the graph G on the left of Example 3.4, consider the edges in the
highlighted path, Ê = {𝑢 0𝑢 1, 𝑢 1𝑢 3 }. Taking those edges along with the vertices
that they contain, N̂ = {𝑢 0, 𝑢 1, 𝑢 3 }, gives a subgraph Ĝ .
With the same set of vertices, N̂ = {𝑢 0, 𝑢 1, 𝑢 3 }, the induced subgraph is the
triangle that adds the outer edge, Ê ∪ {𝑢 0𝑢 3 }.
3.9 Definition A vertex 𝑣 1 is reachable from the vertex 𝑣 0 if there is a path from 𝑣 0
to 𝑣 1 . A graph is connected if between any two vertices there is a path.
In Chapter Five we will consider the graph of the possible branchings of a
computation by a machine. Such a graph may have infinitely many nodes, as when
there is a branch that does not halt. There, we will need the next result.
3.10 Lemma (König’s lemma) Suppose that in a connected graph each vertex is
adjacent to only finitely many other vertices. If the graph has infinitely many
vertices then it has an infinite path, one with infinitely many vertices.
Proof Fix a vertex 𝑣 0 . The graph is connected, so for every other vertex there is
a path starting at 𝑣 0 that reaches it. For each of 𝑣 0 ’s neighbors, there is a set of
vertices that can be reached from 𝑣 0 via a path through that neighbor. There are
infinitely many vertices so there must be a neighbor (unequal to 𝑣 0 ) where the set
of vertices that are reachable in that way is infinite. Pick such a neighbor and call
it 𝑣 1 .
Now iterate: by choice of 𝑣 1 there are infinitely many vertices reachable by a
path starting with the edge 𝑣 0𝑣 1 . Because 𝑣 1 has finitely many neighbors, there is
a 𝑣 2 adjacent to 𝑣 1 (and unequal to either 𝑣 0 or 𝑣 1 ), through which there are paths
to infinitely many of the graph’s vertices. In this way we get a path containing
infinitely many vertices.
We can represent a graph with an array of numbers. For instance, this array
represents a graph with five vertices.

              𝑣0 𝑣1 𝑣2 𝑣3 𝑣4
          𝑣0 ⎛ 0  1  1  0  0 ⎞
          𝑣1 ⎜ 1  0  1  1  1 ⎟
M( G ) =  𝑣2 ⎜ 1  1  0  1  1 ⎟                                  (∗)
          𝑣3 ⎜ 0  1  1  0  1 ⎟
          𝑣4 ⎝ 0  1  1  1  0 ⎠
We can extend this to cover other graph variants that were listed earlier. For
instance, the graph represented in (∗) is a simple graph because the matrix has
only 0 and 1 entries, because all the diagonal entries are 0, and because the matrix
is symmetric, meaning that the 𝑖, 𝑗 entry has a 1 if and only if the 𝑗, 𝑖 entry is also 1.
If the graph is directed and has a one-way edge from 𝑣𝑖 to 𝑣 𝑗 but none from 𝑣 𝑗 to 𝑣𝑖
then the matrix is not symmetric because the 𝑖, 𝑗 entry will be 1 but the 𝑗, 𝑖 entry
will be 0. For a multigraph, where there can be multiple edges from one vertex to
another, the associated entry can be larger than 1. And, if the graph has a loop
then the matrix has a diagonal entry that is a natural number larger than zero.
3.11 Definition For a graph G , the adjacency matrix M ( G ) has that the 𝑖, 𝑗 entry
equals the number of edges from 𝑣𝑖 to 𝑣 𝑗 .
3.12 Lemma Let the matrix M ( G ) represent the graph G . Then in its matrix multi-
plicative 𝑛 -th power the 𝑖, 𝑗 entry is the number of walks of length 𝑛 from vertex 𝑣𝑖
to vertex 𝑣 𝑗 .
Proof Exercise 3.41.
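The lemma is easy to check numerically. This sketch (the helper names are our own) squares the five-vertex adjacency array discussed above and compares each entry with a brute-force count of the length-two walks.

```python
from itertools import product

# Adjacency array of the five-vertex graph discussed above.
M = [[0, 1, 1, 0, 0],
     [1, 0, 1, 1, 1],
     [1, 1, 0, 1, 1],
     [0, 1, 1, 0, 1],
     [0, 1, 1, 1, 0]]

def mat_mul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def count_walks(M, length, i, j):
    """Brute force: try every choice of intermediate vertices."""
    n = len(M)
    return sum(all(M[a][b] for a, b in zip((i, *mid, j), (*mid, j)))
               for mid in product(range(n), repeat=length - 1))

M2 = mat_mul(M, M)
print(all(M2[i][j] == count_walks(M, 2, i, j)
          for i in range(5) for j in range(5)))    # True
```

The same comparison holds for the cube of the matrix and length-three walks.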
In contrast, the graph on the right has no 3-coloring. The four vertices are
completely connected so if two got the same color then they would be adjacent.
3.14 Example This table gives five committees. How many time slots must we use
so that no one has two meetings at once?
A B C D E
Armis Crump Burke India Burke
Jones Edwards Frank Harris Jones
Smith Robinson Ke Smith Robinson
Model this with a graph by taking each vertex to be a committee and if committees
are related by sharing a member then put an edge between them.
(Picture: the committee graph, whose edges are 𝐴𝐷 , 𝐴𝐸 , 𝐵𝐸 , and 𝐶𝐸 , shown
with a three-coloring.)
The picture shows that three colors is enough, that is, three time slots suffice. But
there is also a two-coloring, C0 = {𝐴, 𝐵, 𝐶 } and C1 = { 𝐷, 𝐸 }.
A graph’s chromatic number is the minimum number 𝑘 where the graph has a
𝑘 -coloring.
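For a graph as small as the committee graph we can find the chromatic number by exhaustive search. In this sketch (our own illustration) the edge list transcribes the shared-member relation from the committee table.

```python
from itertools import product

def chromatic_number(vertices, edges):
    """Least k for which some assignment of k colors gives every edge
    differently-colored endpoints.  Brute force: k^n candidate colorings."""
    for k in range(1, len(vertices) + 1):
        for colors in product(range(k), repeat=len(vertices)):
            color = dict(zip(vertices, colors))
            if all(color[u] != color[v] for u, v in edges):
                return k

# Edges of the committee graph: committees that share a member.
committee_edges = [("A", "D"), ("A", "E"), ("B", "E"), ("C", "E")]
print(chromatic_number(list("ABCDE"), committee_edges))    # 2
```

The answer 2 agrees with the two-coloring C0 = {𝐴, 𝐵, 𝐶} and C1 = {𝐷, 𝐸}.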
Graph isomorphism We sometimes want to know when two graphs are essentially
identical. Consider these two.
(Two pictures: on the left a graph with vertices 𝑣0 –𝑣5 drawn in two rows of three,
on the right a graph with vertices 𝑤0 –𝑤5 drawn around a hexagon.)
They have the same number of vertices and the same number of edges. Further,
on the right as well as on the left there are two classes of vertices where all the
vertices in the first class connect to all the vertices in the second class: on the left
the two classes are the top and bottom rows while on the right they are the even-
and odd-numbered vertices. A person may suspect that, as in Example 3.2, these
are two ways to draw the same graph, with the vertex names changed for further
obfuscation.
That’s true; if we define this correspondence between the vertices
Vertex on left   𝑣0 𝑣1 𝑣2 𝑣3 𝑣4 𝑣5
Vertex on right  𝑤0 𝑤2 𝑤4 𝑤1 𝑤3 𝑤5
then the edges also correspond.
Edge on left   {𝑣0, 𝑣3} {𝑣0, 𝑣4} {𝑣0, 𝑣5} {𝑣1, 𝑣3} {𝑣1, 𝑣4} {𝑣1, 𝑣5} {𝑣2, 𝑣3} {𝑣2, 𝑣4} {𝑣2, 𝑣5}
Edge on right  {𝑤0, 𝑤1} {𝑤0, 𝑤3} {𝑤0, 𝑤5} {𝑤2, 𝑤1} {𝑤2, 𝑤3} {𝑤2, 𝑤5} {𝑤4, 𝑤1} {𝑤4, 𝑤3} {𝑤4, 𝑤5}
3.15 Definition Two graphs G and Ĝ are isomorphic if there is a one-to-one and onto
map 𝑓 : N → N̂ such that G has an edge {𝑣𝑖 , 𝑣 𝑗 } ∈ E if and only if Ĝ has the
associated edge { 𝑓 (𝑣𝑖 ), 𝑓 (𝑣 𝑗 ) } ∈ Ê .
To verify that two graphs are isomorphic the most natural thing is to
produce the map 𝑓 and then verify that in consequence the edges also
correspond. The exercises have examples.
Showing that graphs are not isomorphic usually entails finding some
graph-theoretic way in which they differ. A useful such property to
consider is the degree of a vertex, the total number of edges touching
that vertex with the proviso that a loop from the vertex to itself counts as
two. The degree sequence of a graph is the non-increasing sequence of
its vertex degrees. Thus, the graph in Example 3.14 has degree sequence
⟨3, 2, 1, 1, 1⟩.
Exercise 3.39 shows that if graphs are isomorphic then associated
vertices have the same degree and thus graphs with different degree
sequences are not isomorphic. Also, given two graphs, we
can use the degrees of the vertices to help us construct an isomorphism, if there is
one; examples are in the exercises. (Note, though, that there are graphs with the
same degree sequence that are not isomorphic.)
Determining whether two given graphs are isomorphic is in general a hard
problem. We could use brute force, checking every possible correspondence
between the two sets of vertices, but that would be slow. We do not currently know
whether there is a quick way. More on algorithm speed, including the speed of a
number of graph algorithms, is in the final chapter.
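That brute-force check is simple to write down, which makes plain where the cost lies: with 𝑛 vertices there are 𝑛! correspondences to try. A sketch (our own illustration, not from the text):

```python
from itertools import permutations

def isomorphic(n, edges0, edges1):
    """Brute-force test for simple graphs on vertices 0..n-1: check
    every one-to-one and onto correspondence between the vertex sets."""
    e0 = {frozenset(e) for e in edges0}
    e1 = {frozenset(e) for e in edges1}
    if len(e0) != len(e1):
        return False
    return any({frozenset((f[u], f[v])) for u, v in e0} == e1
               for f in permutations(range(n)))

# The two six-vertex graphs above: rows {0,1,2} and {3,4,5} on the left,
# evens and odds on the right.
left = [(u, v) for u in (0, 1, 2) for v in (3, 4, 5)]
right = [(u, v) for u in (0, 2, 4) for v in (1, 3, 5)]
print(isomorphic(6, left, right))    # True
```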
III.3 Exercises
✓ 3.16 Draw a picture of a graph illustrating each relationship. Some graphs will
be digraphs, or may have loops or multiple edges between some pairs of vertices.
(a) Maine is adjacent to Massachusetts and New Hampshire. Massachusetts is adja-
cent to every other state. New Hampshire is adjacent to Maine, Massachusetts,
and Vermont. Rhode Island is adjacent to Connecticut and Massachusetts.
Vermont is adjacent to Massachusetts and New Hampshire. Give the graph
describing the adjacency relation.
(b) In the game of Rock-Paper-Scissors, Rock beats Scissors, Paper beats Rock,
and Scissors beats Paper. Give the graph of the ‘beats’ relation; note that this
is a directed relation.
(c) The number 𝑚 ∈ N is related to the number 𝑛 ∈ N by being its divisor if they
are unequal and if there is a 𝑘 ∈ N with 𝑚 · 𝑘 = 𝑛 . Give the graph describing
the divisor relation among positive natural numbers less than or equal to 12
(it is a digraph).
(d) The river Pregel cut the town of Königsberg into four land masses. There were
two bridges from mass 0 to mass 1 and one bridge from mass 0 to mass 2.
There was one bridge from mass 1 to mass 2, and two bridges from mass 1
to mass 3. Finally, there was one bridge from mass 2 to 3. Consider masses
related by bridges. Give the graph (it is a multigraph).
3.17 Put ‘Y’ or ‘N’ in the array cells for these kinds of walks.
3.18 If a graph has many edges then from a visual design standpoint it can be
confusing. Sometimes in a directed graph we can take advantage of a ‘precedes’
relation being transitive to draw it with the minimum number of edges that conveys
all of the information. Suppose that in a Mathematics program students must take
Calculus II before Calculus III, and must take Calculus I before II. They must also
take Calculus II before Linear Algebra, and to take Real Analysis they must have
both Linear Algebra and Calculus III. Draw the digraph with a minimum number
of edges.
3.19 Let a simple graph G have vertices {𝑣 0, ... 𝑣 5 } and the edges 𝑣 0𝑣 1 , 𝑣 0𝑣 3 , 𝑣 0𝑣 5 ,
𝑣 1𝑣 4 , 𝑣 3𝑣 4 , and 𝑣 4𝑣 5 . (a) Draw G . (b) Give its adjacency matrix. (c) Find all
subgraphs with four nodes and four edges. (d) Find all induced subgraphs with
four nodes and four edges.
3.20 The complete graph on 𝑛 vertices, 𝐾𝑛 , is the simple graph with all possible
edges. (a) Draw 𝐾4 , 𝐾3 , 𝐾2 , and 𝐾1 . (b) Draw 𝐾5 . (c) How many edges does 𝐾𝑛
have?
✓ 3.21 Morse code represents text with a combination of a short sound, written ‘.’
and pronounced “dit,” and a long sound, written ‘-’ and pronounced “dah.” Here
are the representations of the twenty six English letters.
A .-      F ..-.    K -.-     O ---     S ...     W .--
B -...    G --.     L .-..    P .--.    T -       X -..-
C -.-.    H ....    M --      Q --.-    U ..-     Y -.--
D -..     I ..      N -.      R .-.     V ...-    Z --..
E .       J .---
Some representations are prefixes of others. Give the graph for the prefix relation.
3.22 This is the Petersen graph, often used for examples in Graph Theory.
(Picture: the outer vertices 𝑣0 –𝑣4 form a pentagon, the inner vertices 𝑣5 –𝑣9
form a five-pointed star, and each outer vertex is joined to an inner one.)
(a) List the vertices and edges. (b) Give two walks from 𝑣 0 to 𝑣 7 . What is the length
of each? (c) List both a closed walk and an open walk of length five, starting at 𝑣 4 .
(d) Give a cycle starting at 𝑣 5 . (e) Is this graph connected?
3.23 A graph is a set of vertices and edges, not a drawing. So a single graph
may be drawn with quite different pictures. Consider a graph G with the vertices
N = {𝐴, ... 𝐻 } and these edges.
E = {𝐴𝐵, 𝐴𝐶, 𝐴𝐺, 𝐴𝐻, 𝐵𝐶, 𝐵𝐷, 𝐵𝐹, 𝐶𝐷, 𝐶𝐸, 𝐷𝐸, 𝐷𝐹, 𝐸𝐹, 𝐸𝐺, 𝐹 𝐻, 𝐺𝐻 }
(a) Connect the dots below to get one drawing.
(Picture: eight dots labeled 𝐴–𝐻, with 𝐵, 𝐴, 𝐶, 𝐷 in one column and 𝐸, 𝐺,
𝐻, 𝐹 in another.)
(b) A planar graph is one that can be drawn in the plane so that its edges do not
cross. Show that G is planar.
3.24 A person keeps six species of fish as pets. Species 𝐴 cannot be in a tank with
species 𝐵 or 𝐶 . Species 𝐵 cannot be with 𝐴, 𝐶 , or 𝐷 . Species 𝐶 cannot be with 𝐴,
𝐵 , 𝐷 , or 𝐸 . Species 𝐷 cannot be with 𝐵 , 𝐶 or 𝐹 . Species 𝐸 cannot be together with
𝐶 , or 𝐹 . Finally, species 𝐹 cannot be in with 𝐷 or 𝐸 . (a) Draw the graph where
the nodes are species and the edges represent the relation ‘cannot be together’.
(b) Find the chromatic number. (c) Interpret it.
✓ 3.25 If two cell towers are within line of sight of each other then they must be
assigned different frequencies. Below each tower is a vertex and an edge between
towers denotes that they can see each other. What is the minimal number of
frequencies? Give an assignment of frequencies to towers.
(Picture: eleven towers 𝑣0 –𝑣10 , with an edge between each pair of towers that
can see each other.)
3.26 For the graph in the prior exercise, give the degree sequence.
✓ 3.27 For a blood transfusion, unless the recipient is compatible with the donor’s
blood type they can have a severe reaction. Compatibility depends on the presence
or absence of two antigens, called A and B, on the red blood cells. This creates
four major groups: A, B, O (the cells have neither antigen), and AB (the cells have
both). There is also a protein called the Rh factor that can be either present (+)
or absent (–). Thus there are eight common blood types, A+, A-, B+, B-, O+, O-,
AB+, and AB-. If the donor has the A antigen then the recipient must also have it,
and the B antigen and Rh factor work the same way. Draw a directed graph where
the nodes are blood types and there is an edge from the donor to the recipient if
transfusion is safe. Produce the adjacency matrix.
3.28 Find the degree sequence of the graph in Example 3.2 and of the two graphs
of Example 3.4.
3.29 Give the array representation, like that in equation (∗), for the graphs of
Example 3.4.
3.30 Draw a graph for this adjacency matrix.
      𝑣0 𝑣1 𝑣2 𝑣3
  𝑣0 ⎛ 0  1  1  0 ⎞
  𝑣1 ⎜ 1  0  0  1 ⎟
  𝑣2 ⎜ 1  0  0  1 ⎟
  𝑣3 ⎝ 0  1  1  0 ⎠
3.39 We can use degrees and degree sequences to show that graphs are not
isomorphic, or to help construct isomorphisms if they exist. (In this question graphs
can have loops and multiple edges between vertices, but not directed edges or
edges with weights.)
(a) Show that if two graphs are isomorphic then they have the same number of
vertices. Thus graphs with different numbers of vertices are not isomorphic.
(b) Show that if two graphs are isomorphic then they have the same number of
edges. Thus graphs with different numbers of edges are not isomorphic.
(c) Show that if two graphs are isomorphic and one has a vertex of degree 𝑘 then
so does the other. Thus two graphs where one has a degree 𝑘 vertex and the
other does not are not isomorphic.
(d) Show that if two graphs are isomorphic then for each degree 𝑘 , the number of
vertices of the first graph having that degree equals the number of vertices
of the second graph having that degree. Thus graphs with different degree
sequences are not isomorphic.
(e) Use the prior result to show that the two graphs of Example 3.4 are not
isomorphic.
(f) Verify that while these two graphs have the same degree sequence, they are
not isomorphic. Hint: consider the paths starting at the degree 3 vertex.
𝑣2 𝑣3 𝑤0
𝑣0 𝑣1 𝑤2 𝑤3 𝑤4 𝑤5
𝑣4 𝑣5 𝑤1
As in the final item, in arguments we often use the contrapositive of these statements.
For instance, the first item implies that if they do not have the same number of
vertices then they are not isomorphic.
✓ 3.40 Consider these two graphs, G0 and G1 .
(Pictures: G0 has vertices 𝑣0 –𝑣7 and G1 has vertices 𝑛0 –𝑛7 .)
3.41 These two are the base and inductive steps for a proof of Lemma 3.12.
(a) An edge is a length-one walk. Show that in the product of the matrix with
itself, M ( G )², the entry 𝑖, 𝑗 is the number of length-two walks.
(b) Show that for 𝑛 > 2, the 𝑖, 𝑗 entry of the power M ( G )ⁿ equals the number
of length 𝑛 walks from 𝑣𝑖 to 𝑣 𝑗 .
3.42 In a finite graph, for a node 𝑞 0 there may be some nodes 𝑞𝑖 that are
unreachable, so there is no path from 𝑞 0 to 𝑞𝑖 .
(a) Devise an algorithm that inputs a directed graph and a start node 𝑞 0 , and
finds the set of nodes that are unreachable from 𝑞 0 .
(b) Apply your algorithm to these two graphs, starting with 𝑤 0 .
(Pictures: two digraphs, one with vertices 𝑤0 –𝑤4 and one with vertices 𝑤0 –𝑤3 .)
Extra
III.A BNF
We shall introduce some grammar notation conveniences that are widely used.
Together they are called Backus-Naur form, BNF.
The study of grammar, the rules for phrase structure
and forming sentences, has a long history, dating
back as early as the fifth century BC. Mathematicians,
including A Thue and E Post, began systematizing it
as rewriting rules by the early 1900’s. The variant we
see here was produced in the late 1950’s by J Backus with contributions from
P Naur as part of the design of the early computer language ALGOL60. (Pictured:
John Backus 1924–2007 and Peter Naur 1928–2016.) Since then
these rules have become the most common way to express grammars.
One difference from Section 2 is a minor typographical change. Metacharacters
including ‘→’ were at the time not typeable with a standard keyboard. In its place
BNF uses ‘::=’.†
BNF is both clear and concise. It can express the range of languages that we
ordinarily want to express (context free grammars) and it smoothly translates to
a parser. That is, BNF is an impedance match — it fits with what we want to do.
Here we will include some extensions for grouping and replication that are like
what you typically see in the wild.‡
1.1 Example This is a BNF grammar for real numbers with a finite decimal part. To
the rules for ⟨natural⟩ from Example 2.6, add these.
† There are other typographical issues that arise with grammars. While many authors write nonterminals
with diamond brackets, as we do, others use a separate type style or color.
‡ BNF is only loosely defined. While there are standards, often what you see does not conform exactly
to any single standard.
Extra A. BNF 169
1.3 Example This grammar for Python floating point numbers shows both square
brackets and the plus sign.
⟨floatnumber⟩ ::= ⟨pointfloat⟩ | ⟨exponentfloat⟩
⟨pointfloat⟩ ::= [ ⟨intpart⟩ ] ⟨fraction⟩ | ⟨intpart⟩ .
⟨exponentfloat⟩ ::= ( ⟨intpart⟩ | ⟨pointfloat⟩ ) ⟨exponent⟩
⟨intpart⟩ ::= ⟨digit⟩ +
⟨fraction⟩ ::= . ⟨digit⟩ +
⟨exponent⟩ ::= (e | E) [+ | -] ⟨digit⟩ +
In the ⟨pointfloat⟩ rule the first ⟨intpart⟩ is optional. And, an ⟨intpart⟩ consists of
one or more digits.
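Because these rules involve no recursion, they translate nearly symbol-for-symbol into a regular expression, giving a quick way to test strings against the grammar. This is a sketch only; the real Python tokenizer has additional cases, such as underscores in digit strings.

```python
import re

digit = "[0-9]"
intpart = f"{digit}+"                          # <digit>+
fraction = rf"\.{digit}+"                      # . <digit>+
pointfloat = rf"(?:(?:{intpart})?{fraction}|{intpart}\.)"
exponent = rf"[eE][+-]?{digit}+"
exponentfloat = rf"(?:{intpart}|{pointfloat}){exponent}"
floatnumber = re.compile(rf"(?:{exponentfloat}|{pointfloat})\Z")

for s in ["3.14", ".5", "10.", "1e5", "3.14E-2", "42", "e5"]:
    print(s, bool(floatnumber.match(s)))       # the last two print False
```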
Each of these extension constructs is not necessary in that we can express the
grammars without the extensions. For instance, we could replace this use of
Kleene star
⟨identifier⟩ ::= ⟨letter⟩ ( ⟨letter⟩ | ⟨digit⟩ )*
with this.
⟨identifier⟩ ::= ⟨letter⟩ | ⟨letter⟩ ⟨atoms⟩
⟨atoms⟩ ::= ⟨letter⟩ ⟨atoms⟩ | ⟨digit⟩ ⟨atoms⟩ | 𝜀
But these constructs come up often enough that adopting an abbreviation is a
significant convenience.
Passing from the grammar to a parser for that grammar is mechanical. There
are programs that take as input a grammar, often one in BNF, and give as output
source code that will parse files following that grammar’s format. Such a program
is a parser-generator (sometimes instead called a compiler-compiler, which is a
fun term but is misleading because a parser is only part of a compiler).
III.A Exercises
✓ A.4 US ZIP codes have five digits, and may have a dash and four more digits at
the end. Give a BNF grammar.
A.5 Write a grammar in BNF for the language of palindromes, using Σ = { a, ... z }.
✓ A.6 At a college, course designations have a form like ‘MA 208’ or ‘PSY 101’,
where the department is two or three capital letters and the course is three digits.
Give a BNF grammar.
✓ A.7 Example 1.3 uses some BNF convenience abbreviations.
(a) Give a rule (or rules) equivalent to ⟨pointfloat⟩ but that doesn’t use square
brackets.
(b) Similarly replace the repetition operator in ⟨intpart⟩ ’s rule, as well as the
square brackets and repetition for ⟨exponent⟩ .
✓ A.8 In Roman numerals the letters I, V, X, L, C, D, and M stand for the values 1,
5, 10, 50, 100, 500, and 1 000. We represent natural numbers by writing these
letters from left to right in descending order of value, so that XVI represents the
number that in decimal notation is 16, while MDCCCCLVIII represents 1958. We
always write the shortest possible string, so we do not write IIIII because we can
instead write V. However, as we don’t have a symbol whose value is larger than
1 000 we must represent large numbers with lots of M’s.
(a) Give a grammar for the strings that make sense as Roman numerals.
(b) Often Roman numerals are written in subtractive notation: for instance, 4 is
represented as IV, because four I’s are hard to distinguish from three of them
in a setting such as the face of a watch or clock. In this notation 9 is IX, 40
is XL, 90 is XC, 400 is CD, and 900 is CM. Give an extended BNF grammar for
the strings that can appear in this notation.
A.9 This grammar is for a small C-like programming language.
⟨program⟩ ::= { ⟨statement-list⟩ }
⟨statement-list⟩ ::= [ ⟨statement⟩ ; ]*
⟨statement⟩ ::= ⟨data-type⟩ ⟨identifier⟩
| ⟨identifier⟩ = ⟨expression⟩
| print ⟨identifier⟩
| while ⟨expression⟩ { ⟨statement-list⟩ }
⟨data-type⟩ ::= int | boolean
⟨expression⟩ ::= ⟨identifier⟩ | ⟨number⟩ | ( ⟨expression⟩ ⟨operator⟩
⟨expression⟩ )
⟨identifier⟩ ::= ⟨letter⟩ [ ⟨letter⟩ ]*
⟨number⟩ ::= ⟨digit⟩ [ ⟨digit⟩ ]*
⟨operator⟩ ::= + | ==
⟨letter⟩ ::= A | B | . . . | Z
⟨digit⟩ ::= 0 | 1 | . . . | 9
(a) Give a derivation and parse tree for this program.
{ int A ;
A = 1 ;
print A ;
}
Extra
III.B Graph traversal
In a number of places in this book we describe traversing a tree or other graph. For
example, when we described Cantor’s correspondence enumerating the set N × N,
we drew this array.
⋮ ⋮ ⋮ ⋮
⟨0, 3⟩ ⟨1, 3⟩ ⟨2, 3⟩ ⟨3, 3⟩ ···
⟨0, 2⟩ ⟨1, 2⟩ ⟨2, 2⟩ ⟨3, 2⟩ ···
⟨0, 1⟩ ⟨1, 1⟩ ⟨2, 1⟩ ⟨3, 1⟩ ···
⟨0, 0⟩ ⟨1, 0⟩ ⟨2, 0⟩ ⟨3, 0⟩ ···
Number 0 1 2 3 4 5 6 ...
Pair ⟨0, 0⟩ ⟨0, 1⟩ ⟨1, 0⟩ ⟨0, 2⟩ ⟨1, 1⟩ ⟨2, 0⟩ ⟨0, 3⟩ . . .
We can make a graph by connecting each pair in the array to its neighbor above,
and the one to the right. This shows the result with the lower left rotated to the
top.
⟨0, 0⟩
⟨0, 1⟩  ⟨1, 0⟩
⟨0, 2⟩  ⟨1, 1⟩  ⟨2, 0⟩
⟨0, 3⟩  ⟨1, 2⟩  ⟨2, 1⟩  ⟨3, 0⟩
This graph isn’t a tree because there are vertices that are connected by more than
one path, for instance ⟨0, 0⟩ and ⟨1, 1⟩ . Instead it is a directed acyclic graph, a
DAG. Cantor’s enumeration is a breadth first traversal of the DAG.
Here we will show Racket code to traverse trees and DAG’s (they are alike in
that they have no cycles).
We will show two ways to do that. Below, on the left is a tree with ten nodes.
On the right the table illustrates visiting the nodes in depth-first order, where
we visit a node’s children before going on to visit its siblings. It also illustrates
breadth-first, where we cover all nodes that are at rank 𝑘 before visiting any nodes
of rank 𝑘 + 1 (a node is rank 𝑘 when a minimal path to the root has 𝑘 edges).
[Figure: a tree with ten nodes named a through j, beside a table of its depth-first and breadth-first visit orders.]
Note that children is created as a set so that we can quickly access its members.
This set is mutable because as we create the tree we will add children to that set,
so we must be able to change it.
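The definition of the node structure does not appear in this excerpt; here is a minimal sketch consistent with the calls that follow (that the book defines it with a Racket struct in exactly this shape is an assumption).

```racket
; A minimal node structure consistent with the calls used below.
; Each node has a name and a mutable set of children; the struct
; supplies the accessors node-name and node-children.
(struct node (name children))

(define (node-create name)
  (node name (mutable-set)))
```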
We use that routine to create new trees and DAG’s.
(define (graph-create first-node-name)
(node-create first-node-name))
The next routine inputs a node and adds a child to its set of children. (LISP-
derived languages have a convention of using an exclamation mark for the names
of procedures whose main role is not to return something but instead to cause side
effects such as altering a data structure.)†
(define (node-add-child! parent child-name)
(let ([n (node-create child-name)])
(set-add! (node-children parent) n)
n))
†
The code here has a way to tie from parent to child but no direct way to tie back. So this tree is
directed. We can of course write code for breadth-first traversals of undirected trees but for our purposes
this suffices.
And this returns the finite portion of Cantor’s array shown earlier.
(define (cantor-DAG-make)
(let* ([t (graph-create "0,0")]
[nb (node-add-child! t "0,1")]
[nc (node-add-child! t "1,0")]
[nd (node-add-child! nb "0,2")]
[ne (node-add-child! nb "1,1")]
[v0 (set-add! (node-children nc) ne)]
[nf (node-add-child! nb "2,0")]
[ng (node-add-child! nd "0,3")]
[nh (node-add-child! nd "1,2")]
[v1 (set-add! (node-children ne) nh)]
[ni (node-add-child! ne "2,1")]
[v2 (set-add! (node-children nf) ni)]
[nj (node-add-child! nf "3,0")]
)
t))
To demonstrate the traversal code, at each node we will just print out the name,
(define (show-node-name n r)
(printf "~a~a\n" (string-pad r) (node-name n)))
Mathematical sets are unordered, so when we show elements it can be that the
order out differs from the order in.
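The depth-first code itself does not appear in this excerpt; here is a minimal sketch consistent with the data structures above (the name traverse-dfs, the rank argument, and the repeated stand-in node definitions are our assumptions, included so the sketch runs on its own).

```racket
; Stand-in node structure, as in the text: a name and a mutable
; set of children.
(struct node (name children))
(define (node-create name) (node name (mutable-set)))
(define (node-add-child! parent child-name)
  (let ([n (node-create child-name)])
    (set-add! (node-children parent) n)
    n))

; Depth-first traversal of a tree: apply fcn to a node and then
; recurse into each child before moving on to that node's siblings.
(define (traverse-dfs n fcn [rank 0])
  (fcn n rank)
  (for ([child (node-children n)])
    (traverse-dfs child fcn (+ rank 1))))
```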
Now for breadth-first traversal. It comes in two functions, traverse-bfs and traverse-bfs-helper. In Scheme-derived languages such as Racket, routines are often organized with a caller and a helper. This is because the helper function is tail-recursive: its very last action is the recursive call, and in a Scheme language the compiler knows that it can translate such a routine into executable code that is iterative. This combines the expressiveness of recursion with the memory conservation of iteration.
The strategy of traverse-bfs-helper is that at each level, when rank is 𝑘, the routine traverses all nodes at that rank by moving through the members of level. As it does so, it stores all of the children of those nodes in the set next-level.
(define (traverse-bfs node fcn #:maxrank [maxrank MAXIMUM-RANK])
  (traverse-bfs-helper (mutable-set node) 0 fcn #:maxrank maxrank))

(define (traverse-bfs-helper level rank fcn #:maxrank [maxrank MAXIMUM-RANK])
  (when (< rank maxrank)
    (let ([next-level (mutable-set)])
      (for ([node level])
        (fcn node rank)
        (for ([child-node (node-children node)])
          (set-add! next-level child-node)))
      (when (not (set-empty? next-level))
        (traverse-bfs-helper next-level (+ 1 rank) fcn #:maxrank maxrank)))))
This strategy cannot send the routine around a cycle forever because both trees and DAGs are acyclic.
Here is the result of running the routine on the sample tree.
> (define t (sample-tree-make))
> (traverse-bfs t show-node-name)
a
b
c
d
h
e
f
g
i
j
III.B Exercises
B.1 This is a binary tree because each node has either two children or none (in
the definition some authors also allow one child).
a
b c
d e f g
h i
Section
IV.1 Finite State machines
We produce a new model of computation, the Finite State machine, by modifying the Turing machine definition. We will strip out the capability to write, changing the head from read/write to read-only. It will turn out that these machines can do many things, but not as many as Turing machines.
Definition We begin with some examples.
1.1 Example This power switch has two states, 𝑞 off and 𝑞 on , and its input alphabet
has one token, toggle. (Its standard symbol is on the right.)
[Diagram: states 𝑞off and 𝑞on with a toggle arrow from each to the other, beside the standard power symbol.]
The state 𝑞 on is drawn with a double circle, denoting that it is a different kind
of state than 𝑞 off . Finite State machines can’t write to the tape so they need some
other way to declare the computation’s outcome. We say that 𝑞 on is an accepting
state or final state. A computation accepts its input string if it ends with the
machine in an accepting state.
1.2 Example Operate the turnstile below by putting in two tokens and then pushing
through. It has three states and its input alphabet is Σ = { token, push }. As with
Turing machines, the states here serve as a form of memory, although a limited
one. For instance, 𝑞 one is how the turnstile “remembers” that it has so far received
one token.
Image: The astronomical clock in Notre-Dame-de-Strasbourg Cathedral, for computing the date of
Easter. Easter falls on the first Sunday after the first full moon on or after the nominal spring equinox of
March 21. Calculation of this date was a great challenge for mechanisms of that time, 1843. † Studying
the parts of the machine is natural but there is another motivation. A person could object to Turing’s
model by observing that there is a machine that iterates writing a character and then moving right,
and thereby goes through unboundedly many configurations, while no physical device can do that.
A rejoinder is that we for instance define a ‘book’ to be pages with words and don’t worry whether
physics limits the number of possible pages. Happily, we don’t need to go into this to justify our interest.
Tapeless machines are quite practical, appearing often in everyday computing, which is justification
enough.
[Diagram: the turnstile machine, with token and push arrows among its three states.]
1.3 Example This vending machine dispenses items that cost 30 cents.† The picture
is complex so we will show it in three layers. First are the arrows for nickels.
push n
After receiving 30 cents and getting another nickel, this machine does something
not very sensible: it stays in 𝑞 30 . In practice a machine would have further states to
keep track of overages so that it could give change but here we ignore that. Next
comes the arrows for dimes
[Diagram: dime arrows for the vending machine.]
[Diagram: states 𝑞0, 𝑞1, 𝑞2, 𝑞3 in a cycle, with an arrow labeled 1 from each state to the next, a self-loop labeled 0 on each state, and 𝑞0 both start state and accepting state.]
This machine accepts a bitstring if the number of 1’s in its input is a multiple of
four.
†
US coins are: 1 cent coins not used here, nickels are 5 cents, dimes are 10 cents, and quarters are 25.
The picture shows a light labeled ‘Accept’. When the machine stops, when the input
string is fully consumed, if the current state is an accepting state then the light
comes on. In this case we say that the machine accepts the input string, otherwise
it rejects that string.
Here is a trace of the steps when we start Example 1.4’s modulo 4 machine
with the input string 𝜏 = 10110. Since the ending state 𝑞 3 is not accepting the
machine rejects 𝜏 .
Step     0       1      2     3    4   5
State    𝑞0      𝑞1     𝑞1    𝑞2   𝑞3  𝑞3
Unread   10110   0110   110   10   0   𝜀
In contrast with the traces in the first chapter, here we hold the head still and move
the tape. This emphasizes that Finite State machines consume one character per
step. They stop once all the characters are gone so they are sure to halt — there is
no Halting problem for Finite State machines.
1.6 Example The machine below accepts a string if and only if it contains at least two
0’s as well as an even number of 1’s. (In tables we mark accepting states with ‘+’).
[Diagram: the transition graph for this machine.]
Δ        0     1
𝑞0       𝑞1    𝑞3
𝑞1       𝑞2    𝑞4
+ 𝑞2     𝑞2    𝑞5
𝑞3       𝑞4    𝑞0
𝑞4       𝑞5    𝑞1
𝑞5       𝑞5    𝑞2
1.7 Remark We pause to briefly address the key to designing Finite State machines.
Often people new to them put down a 𝑞 0 , think of some input strings and then
add states accounting for those inputs. This can give haphazard results.
Proceeding in this way is thinking of a state as about what happened to get
there. Better is to think of states as about the future. The prior example brings
this out: articulating the role of state 𝑞 1 gives something like, “waiting for a 0” or
possibly “waiting for at least one 0”. Similarly 𝑞 5 is “waiting for a 1.” Another
example is that state 𝑞 4 is looking for a 0 followed by a 1.
Finite State machine descriptions may take the alphabet to be clear from the
context. Thus, Example 1.6’s alphabet is B = { 0, 1 }. For in-practice machines, the
alphabet is the set of characters that the machine could conceivably receive, so that
a text-handling routine built to modern standards might well accept all of Unicode.
But for the examples and exercises in this book we will use small alphabets.†
1.8 Example This machine accepts strings that are valid decimal representations of
integers. So it accepts the strings 21 and -7 and +37 but does not accept 501-.
The transition graph and the table both group some inputs together when they
result in the same action. For instance, when in state 𝑞 0 this machine does the
same thing whether the input is + or -, namely it passes into 𝑞 1 .
[Diagram: 𝑞0 passes to 𝑞1 on + or -; both 𝑞0 and 𝑞1 pass on 0, . . . , 9 to the accepting state 𝑞2, which has a self-loop on 0, . . . , 9; any other input leads to the error state 𝑒.]
Δ        +, -   0, . . . , 9   –other–
𝑞0       𝑞1     𝑞2             𝑒
𝑞1       𝑒      𝑞2             𝑒
+ 𝑞2     𝑒      𝑞2             𝑒
𝑒        𝑒      𝑒              𝑒
Any wrong input character sends the machine to the state 𝑒 . Finite State machines
often have an error state, which is a sink in that once the machine enters that state
then it never leaves.
1.9 Example This machine accepts strings that are members of the set { jpg, pdf, png }.
It is our first example with more than one accepting state.
†
We often use the characters a, b, c, etc., because something like ‘b2 ’ is clearer than something like ‘12 ’.
[Diagram: from 𝑞0, j leads to 𝑞1, then p to 𝑞2, then g to the accepting state 𝑞3. Also from 𝑞0, p leads to 𝑞4; from there d leads to 𝑞5 and then f to the accepting state 𝑞6, while n leads to 𝑞7 and then g to the accepting state 𝑞8.]
That drawing omits many edges, the ones involving the error state 𝑒 . For instance,
from state 𝑞 0 any input character other than j or p is an error. We omit all of these
edges because they would make the drawing hard to read. This illustrates that
while pictures are better for simple machines, past some point of complexity, a
transition table presentation is better than a picture.
That example points out that if a language is finite then there is a Finite State
machine that accepts a string if and only if it is a member of that language.
1.10 Example Finite State machines can accomplish reasonably hard tasks. This one
accepts strings representing natural numbers that are multiples of three such as 15
and 8013, and does not accept non-multiples such as 14 and 8012.
Δ        0,3,6,9   1,4,7   2,5,8
+ 𝑞0     𝑞0        𝑞1      𝑞2
𝑞1       𝑞1        𝑞2      𝑞0
𝑞2       𝑞2        𝑞0      𝑞1
This machine accepts the empty string. Exercise 1.26 asks for a modification to
accept only non-empty strings.
1.11 Example Finite State machines translate easily to code. Here is the Racket code
for the delta function of the prior example’s multiple of three machine.
(define (delta state ch)
(cond
[(= state 0)
(cond
((memv ch '(#\0 #\3 #\6 #\9)) 0)
((memv ch '(#\1 #\4 #\7)) 1)
(else 2))]
[(= state 1)
(cond
((memv ch '(#\0 #\3 #\6 #\9)) 1)
((memv ch '(#\1 #\4 #\7)) 2)
(else 0))]
[else
(cond
((memv ch '(#\0 #\3 #\6 #\9)) 2)
((memv ch '(#\1 #\4 #\7)) 0)
(else 1))]))
(In Racket, a character such as ‘0’ is denoted #\0. The routine memv tests whether the character ch is in the list; here it serves as a boolean function.) All that’s left is to supply a calling function and a helper.
(define (multiple-of-three-fsm-helper state tau-list)
  (if (null? tau-list)
      state
      (multiple-of-three-fsm-helper (delta state (car tau-list))
                                    (cdr tau-list))))
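The calling function does not appear in this excerpt; here is a possible version, shown together with the text's delta and helper so that it runs on its own (the caller's name and its final comparison against the accepting state 0 are our assumptions).

```racket
; delta as given in the text, for the multiple-of-three machine.
(define (delta state ch)
  (cond
    [(= state 0)
     (cond ((memv ch '(#\0 #\3 #\6 #\9)) 0)
           ((memv ch '(#\1 #\4 #\7)) 1)
           (else 2))]
    [(= state 1)
     (cond ((memv ch '(#\0 #\3 #\6 #\9)) 1)
           ((memv ch '(#\1 #\4 #\7)) 2)
           (else 0))]
    [else
     (cond ((memv ch '(#\0 #\3 #\6 #\9)) 2)
           ((memv ch '(#\1 #\4 #\7)) 0)
           (else 1))]))

(define (multiple-of-three-fsm-helper state tau-list)
  (if (null? tau-list)
      state
      (multiple-of-three-fsm-helper (delta state (car tau-list))
                                    (cdr tau-list))))

; A possible caller (this name and the test against state 0 are our
; assumptions): run the helper from the start state and accept when
; the ending state is the accepting state 0.
(define (multiple-of-three-fsm tau)
  (= 0 (multiple-of-three-fsm-helper 0 (string->list tau))))
```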
1.12 Example In the 1940’s, phone call connections were handled by simple devices for
local calls but required operator intervention for long distance. That changed with
the adoption of the Finite State machine here, which allowed users to directly dial
long distance in North America. Consider dialing 1-802-555-0101. The initial 1
means that the call leaves the local office. The 802 is an area code; the system can
tell that this is not a same-area local exchange because its second digit is 0 or 1.
Next, the 555 routes the call to a local office. Then that office’s device makes the
connection to line 0101.
[Diagram: the long-distance dialing machine. Legend: 𝑥 stands for 0, . . . , 9; 𝑛 for 2, . . . , 9; 𝑝 for 0, 1.]
Today, area codes are no longer required to have a middle digit of 0 or 1. This additional flexibility is possible because switching now happens entirely in software.
After the definition of Turing machine we gave a complete description of the
action of those machines. We now do the same for Finite State machines. A
configuration of a Finite State machine is a pair C = ⟨𝑞, 𝜏⟩ where 𝑞 is a state,
𝑞 ∈ 𝑄 , and 𝜏 is a (possibly empty) string, 𝜏 ∈ Σ∗. A machine starts in an initial
configuration C0 = ⟨𝑞 0, 𝜏0 ⟩ , so that 𝜏0 is the input and 𝑞 0 is the initial state.
1.16 Definition For any Finite State machine, the extended transition function
Δ̂ : Σ∗ → 𝑄 gives the state in which the machine ends after starting in the start
state and consuming the given string.
1.17 Example Consider this machine and its transition function.
[Diagram: 𝑞0 has a self-loop on b and passes to 𝑞1 on a; 𝑞1 has a self-loop on a and passes to 𝑞2 on b; 𝑞2 has a self-loop on b and passes back to 𝑞1 on a; 𝑞1 is accepting.]
Δ        a     b
𝑞0       𝑞1    𝑞0
+ 𝑞1     𝑞1    𝑞2
𝑞2       𝑞1    𝑞2
Its extended transition function Δ̂ extends Δ in that it repeats the first row of Δ’s table.
†
As earlier, read ⊢ aloud as “yields.” ‡ Read ⊢∗ as “yields eventually.” # Finite State machines must halt and so there is no notion like computably enumerable. Thus the languages that such a machine can decide are the same as the languages that it can recognize (in contrast with the case for Turing machines, as defined on page 11). For these machines, ‘recognized’ is the more common term.
Δ̂(a) = 𝑞1        Δ̂(b) = 𝑞0
(We disregard the difference between Δ’s input of characters and Δ̂’s input of length one strings.) This is Δ̂ on the length two strings.
Δ̂(aa) = 𝑞1    Δ̂(ab) = 𝑞2    Δ̂(ba) = 𝑞1    Δ̂(bb) = 𝑞0
IV.1 Exercises
For the exercises that give a language description, a useful practice is to think through
that description by naming five strings that are in the language and five that are not.
✓ 1.18 Using this machine, trace through the computation when the input is (a) abba
(b) bab (c) bbaabbaa. Does the machine accept the string?
[Diagram: 𝑞0 has a self-loop on b and passes to 𝑞1 on a; 𝑞1 has a self-loop on a and passes to 𝑞2 on b; 𝑞2 has a self-loop on b and passes back to 𝑞1 on a; 𝑞1 is accepting.]
1.19 True or false: because a Finite State machine is finite, its language must be
finite.
1.20 Your classmate says, “I have a language L that recognizes the empty string 𝜀 .”
Explain to them the mistake.
1.21 Rebut “no Finite State machine can recognize the language { a𝑛b | 𝑛 ∈ N } because 𝑛 is infinite.”
✓ 1.22 How many transitions does an input string of length 𝑛 cause a Finite State
machine to undergo? 𝑛 many? 𝑛 + 1? 𝑛 − 1? How many (not necessarily distinct)
states will the machine have visited after consuming the string?
✓ 1.23 For each of these descriptions of a language, give a one or two sentence
informal English-language description. Also list five strings that are elements as
well as five that are not, if there are that many.
(a) L = {𝛼 ∈ { a, b }∗ | 𝛼 = a𝑛ba𝑛 for 𝑛 ∈ N }
(d) { a𝑛ba𝑛+2 ∈ { a, b }∗ | 𝑛 ∈ N }
✓ 1.24 For the machines of Example 1.6, Example 1.8, Example 1.9, and Ex-
ample 1.10, answer these. (a) What are the accepting states? (b) Does it
accept the empty string 𝜀 ? (c) What is the shortest string that each accepts?
(d) Is the language of accepted strings infinite?
1.25 As in Example 1.13, give the computation for the multiple of three machine
with the initial string 2332.
1.26 Modify the machine of Example 1.10 so that it accepts only non-empty
strings.
1.27 Produce the transition graph picturing this transition function. What is the
machine’s language?
Δ a b
𝑞0 𝑞2 𝑞1
+ 𝑞1 𝑞0 𝑞2
𝑞2 𝑞2 𝑞2
✓ 1.29 For each language, name five strings in the language and five that are not
(if there are not five, name as many as there are). Then produce a Finite State
machine that recognizes that language. Give both a circle diagram and a transition
function table. The alphabet is Σ = { a, b }.
(a) L1 = {𝜎 ∈ Σ∗ | 𝜎 has at least one a and at least one b }
(b) L2 = {𝜎 ∈ Σ∗ | 𝜎 has fewer than three a’s }
(c) L3 = {𝜎 ∈ Σ∗ | 𝜎 ends in ab }
(d) L4 = { a𝑛b𝑚 ∈ Σ∗ | 𝑛, 𝑚 ≥ 2 }
(e) L5 = { a𝑛b𝑚a𝑝 ∈ Σ∗ | 𝑚 = 2 and 𝑛, 𝑝 ∈ N }
1.30 Consider the language of strings over Σ = { a, b } containing at least two a’s
and at least two b’s. Name five elements of the language and five non-elements,
if there are that many. Then produce a Finite State machine recognizing this
language. As in Example 1.6, briefly describe the intuitive meaning of the states.
✓ 1.31 For each language give a transition graph and table for a Finite State machine
recognizing the language. Use Σ = { a, b }.
(a) {𝜎 ∈ Σ∗ | 𝜎 has at least two a’s }
(b) {𝜎 ∈ Σ∗ | 𝜎 has exactly two a’s }
(b) {𝜎 ∈ { a, b }∗ | 𝜎 = 𝜀 }
(c) {𝜎 ∈ { a, b }∗ | 𝜎 = a³b or 𝜎 = ba³ }
(d) {𝜎 ∈ { a, b }∗ | 𝜎 = a𝑛 or 𝜎 = b𝑛 for 𝑛 ∈ N }
1.34 Produce a Finite State machine over the alphabet Σ = { A, ... Z, 0, ... 9 } that
accepts only the string 911, and a machine that accepts any string but that one.
1.35 Using Example 1.17, apply the extended transition function to all of the
length three and length four string inputs.
1.36 What happens when the input to an extended transition function is the
empty string?
✓ 1.37 Consider a language of comments that begin with the two-character string
/#, end with the two-character string #/, and have no #/ substrings in the middle.
Give a Finite State machine to recognize that language.
✓ 1.38 Produce a Finite State machine that recognizes each.
(a) {𝜎 ∈ { 0, ... 9 }∗ | 𝜎 has either no 0’s or no 2’s }
✓ 1.39 Give a Finite State machine over the alphabet Σ = { A, ... Z } that accepts
only strings in which the vowels occur in ascending order. (The traditional vowels,
in ascending order, are A, E, I, O, and U.)
✓ 1.40 Consider this grammar.
⟨real⟩ → ⟨posreal⟩ | + ⟨posreal⟩ | - ⟨posreal⟩
⟨posreal⟩ → ⟨natural⟩ | ⟨natural⟩ . | ⟨natural⟩ . ⟨natural⟩
⟨natural⟩ → ⟨digit⟩ | ⟨digit⟩ ⟨natural⟩
⟨digit⟩ → 0 | . . . | 9
(a) Give five strings of terminals that are in its language and five that are not.
(b) Does the language contain the string .12? (c) Briefly describe the language.
(d) Give a Finite State machine that recognizes the language.
1.41 Produce a Finite State machine for each.
(a) {𝜎 ∈ B∗ | every 1 in 𝜎 has a 0 just before it and just after }
(b) {𝜎 ∈ B∗ | 𝜎 represents in binary a number divisible by 4 }
(c) {𝜎 ∈ { 0, ... 9 }∗ | 𝜎 represents in decimal an even number }
Section
IV.2 Nondeterminism
Turing machines and Finite State machines both have the property that, given the
current state and current character, the next state is completely determined. Once
we lay out an initial tape and push Start then the machine just walks through a
fixed succession of step/next step calculations. We now consider machines that
are nondeterministic, ones for which there may be configurations where there
is more than one next state, or configurations where there is just one, or even
configurations without any next state at all.
Motivation Imagine a grammar with some rules and a start symbol. We get a
string and are asked to find a derivation of it. The challenge is that we sometimes
don’t know which rules the derivation should follow. For instance, if we have
S → BaS | AbA then from S we can do two different things: which will work?
In the Grammar section’s exercises we expected that an intelligent person
would have the insight to guess the right way. If instead we were writing a program
then we might have it try every case — we might do a breadth-first traversal of the
directed acyclic graph of all derivations — until the program finds a success.
The American philosopher and Hall of Fame baseball catcher Yogi Berra said, “When you come to a fork in the road, take it.” That’s a natural way to attack this problem: when you come up against multiple possibilities, fork a child for each. Thus, the routine might begin with the start state 𝑆 and for each rule that could apply, it spawns a child process, deriving a string one removed from the start. After that, each child finds each rule that could apply to its string and spawns its own children, each of which now has a string that is two removed from the start. Continue until the desired string appears, if it ever does.
[Image: Yogi Berra, 1925–2015]
The prototypical example for this strategy is the celebrated Traveling Salesman
problem, that of finding the shortest circuit visiting every city in a list. For instance,
suppose that we want to know if there is a trip that visits each state capital in the
US lower forty eight states and returns back to where it began, in less than 16 000
kilometers. We start at Montpelier, the capital of Vermont. From there we could
fork a process for each potential next capital, making forty seven new processes.
Thus the process that after Montpelier goes next to Concord, New Hampshire
would know that the trip so far is 188 kilometers. In the next round, each child
would fork its own child processes, forty six of them. At the end, many processes
will have failed to find a short-enough trip but if even one finds it then we consider
the overall search a success.
That computation is nondeterministic in that while it is happening the machine
is simultaneously in many different states. Restated, the computation happens
on an unboundedly-parallel machine, where whenever we need an additional
computing agent, another CPU plus tape, one is available.†
We will have two ways to think about nondeterminism, two mental models.‡
The first is the one introduced above: when a machine is presented with multiple
possible next states then it forks, so that it is in all of them simultaneously. The
next example illustrates.
2.1 Example The Finite State machine below is nondeterministic because leaving 𝑞0 are two arrows labeled 0. It also has states with a deficit of edges: no arrow for 1 leaves 𝑞1, so if the machine is in that state and reads that input then it passes to no state at all.
†
This is like our experience with everyday computers, where we may be writing an email in one window
and watching a video in another. The machine appears to be in multiple states simultaneously. ‡ While
these models are helpful in learning and thinking about nondeterminism, they are not part of the
formal definitions and proofs.
[Diagram: 𝑞0 has a self-loop on 0,1 and also passes to 𝑞1 on 0; 𝑞1 passes to 𝑞2 on 0; 𝑞2 passes to the accepting state 𝑞3 on 1.]
The table below shows the computation history with input 00001. At each step it lists the set of states that the machine occupies; for instance, on the first 0 the computation splits in two, so the machine is then in two states at once.
Step     0      1           2                3                4                5
States   {𝑞0}   {𝑞0, 𝑞1}   {𝑞0, 𝑞1, 𝑞2}   {𝑞0, 𝑞1, 𝑞2}   {𝑞0, 𝑞1, 𝑞2}   {𝑞0, 𝑞3}
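Such a step-by-step state-set computation can be written directly. This sketch encodes Example 2.1's transition function; the names step and run are our choices.

```racket
; delta gives, for a state and a character, the list of possible
; next states of Example 2.1's nondeterministic machine.
(define (delta q c)
  (cond [(and (eq? q 'q0) (char=? c #\0)) '(q0 q1)]
        [(and (eq? q 'q0) (char=? c #\1)) '(q0)]
        [(and (eq? q 'q1) (char=? c #\0)) '(q2)]
        [(and (eq? q 'q2) (char=? c #\1)) '(q3)]
        [else '()]))

; One step: the union of the possible next states over the current set.
(define (step states c)
  (remove-duplicates (append-map (lambda (q) (delta q c)) states)))

; Run the machine on a whole input string from the start state q0,
; yielding the set of states it ends in.
(define (run str)
  (foldl (lambda (c states) (step states c)) '(q0) (string->list str)))
```

The machine accepts the input exactly when the ending set contains the accepting state 𝑞3.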
As an alternative, we can imagine that the machine is furnished with the answer
(“go around twice, then off to the right”) and only has to check it.
When we talk about this way of expressing the second mental
model our convention is to call the furnisher a demon, because they
somehow know answers that cannot otherwise be found but also
because we must be suspicious and check that the answer is not a trick.
Under this model, a nondeterministic computation accepts the input
if there exists a branch of the computation tree that a deterministic
machine, if told what branch to take, could verify.
Below we shall describe nondeterminism using both paradigms: as a machine being in multiple states at once, and also as a machine guessing (or being told and verifying). In this chapter we will do that for Finite State machines and in the fifth chapter we will return to it for Turing machines.
[Image: Flauros, Duke of Hell]
[Diagram: the machine of Example 2.1, beside its transition table.]
Δ        0           1
𝑞0       {𝑞0, 𝑞1}   {𝑞0}
𝑞1       {𝑞2}        { }
𝑞2       { }         {𝑞3}
+ 𝑞3     { }         { }
The imagery in informal terms such as “guess” and “demon” helps introduce the
ideas but may also give an impression that those ideas are fuzzy. So when we next
step through the description of the action of these machines, note that it is precise.
2.5 Remark When we described the action of deterministic Finite State machines
on page 184, we laid out how to construct the sequence of configurations, by
transitioning from each to the succeeding one until the tape is empty. But for
nondeterministic machines there needn’t be one and only one sequence. That
makes a description that is non-constructive the clearer choice.
[Diagram: 𝑞0 has a self-loop on a,b; 𝑞0 passes to 𝑞1 on a and 𝑞1 to 𝑞3 on a; 𝑞0 passes to 𝑞2 on b and 𝑞2 to 𝑞3 on b; the accepting state 𝑞3 has a self-loop on a,b.]
The language of this machine is the set of strings containing the substring aa or bb. For instance, the machine accepts abaaba because there is a sequence of transitions ending in an accepting state.
This machine recognizes the language { (ac)𝑛 | 𝑛 ∈ N } = {𝜀, ac, acac, ... }. The symbol b isn’t attached to any arrow so it won’t play a part in any accepted string.
Often a nondeterministic Finite State machine is easier to write than a deterministic machine that does the same job.
2.10 Example Both of these machines accept any string whose next to last character
is a. The nondeterministic one on the left is simpler than the deterministic one.
[Diagrams: on the left, the nondeterministic machine, in which 𝑞0 has a self-loop on a,b and passes to 𝑞1 on a, and 𝑞1 passes to the accepting state 𝑞2 on a,b; on the right, a four-state deterministic machine for the same language.]
2.12 Example This is a remote control listener that waits to hear the signal 0101110. That is, it recognizes the language {𝜎 ⌢ 0101110 | 𝜎 ∈ B∗ }.
[Diagram: 𝑞0 has a self-loop on 0,1 and a chain of arrows 𝑞0 to 𝑞1 to 𝑞2 to 𝑞3 to 𝑞4 to 𝑞5 to 𝑞6 to 𝑞7 labeled 0, 1, 0, 1, 1, 1, 0 in turn, with 𝑞7 accepting.]
2.13 Example This machine uses an 𝜀 transition.†
[Diagram: 𝑞0 passes to 𝑞1 on +, -, or 𝜀; 𝑞1 passes to the accepting state 𝑞2 on 0, . . . , 9; 𝑞2 has a self-loop on 0, . . . , 9.]
†
For this purpose the ‘𝜀 ’ is a character, not a representation of the empty string. Assume that it is not
an element of Σ.
For instance, with input 123 the machine can begin by following the 𝜀 transition to state 𝑞1, without reading and deleting the leading 1. It next reads that 1 and transitions to 𝑞2, and then stays there while processing the 2 and 3. This branch of the machine’s computation tree accepts its input and so 123 is in the machine’s language.
The practical effect of the 𝜀 is that this machine can accept strings that do not start
with a + or - sign.
2.14 Example This machine has a number of 𝜀 transitions.
[Diagram: 𝑞0 passes to 𝑞1 on a and 𝑞1 to 𝑞2 on b; 𝜀 arrows lead to the branch where c takes 𝑞3 to the accepting state 𝑞4, and to the branch where d takes 𝑞5 to the accepting state 𝑞6.]
⟨𝑞 0, abc⟩ ⊢ ⟨𝑞 1, bc⟩ ⊢ ⟨𝑞 2, c⟩ ⊢ ⟨𝑞 3, c⟩ ⊢ ⟨𝑞 4, 𝜀⟩
A machine may also, in a single step, follow two or more 𝜀 transitions in succession.
Here, it accepts d by transitioning from 𝑞 0 to 𝑞 5 without consuming any input.
⟨𝑞 0, d⟩ ⊢ ⟨𝑞 5, d⟩ ⊢ ⟨𝑞 6, 𝜀⟩
[Diagram of the machine, and the table of its computation on input aab, showing at each step the stripe of states that the machine occupies.]
At each step, the machine is in all of the states that are inside of the step’s stripe.
For instance, at step 0 the machine is in both 𝑞 0 and 𝑞 1 . After exhausting the tape,
at step 3 it is in both 𝑞 0 and 𝑞 2 and because 𝑞 0 is an accepting state, it accepts the
input aab.
The 𝜀 transitions simplify building Finite State machines.
2.17 Example An 𝜀 transition can put two machines together with a parallel connection.
Here is a machine whose states are named with 𝑞 ’s combined in parallel with one
whose states are named with 𝑟 ’s.
[Diagram: a new start state 𝑠0 with 𝜀 arrows to 𝑞0 and to 𝑟0, placing the 𝑞 machine and the 𝑟 machine in parallel.]
We can take the alphabet for the entire machine to be the union, Σ = { a, b, c }.
2.18 Example An 𝜀 transition can also connect machines serially. The machine on the left below recognizes L0 = { (aab)𝑖 | 𝑖 ∈ N }. The one on the right recognizes L1 = {𝜎0 ⌢ · · · ⌢ 𝜎𝑗−1 | 𝑗 ∈ N and 𝜎𝑘 = a or 𝜎𝑘 = aba for 0 ≤ 𝑘 ≤ 𝑗 − 1 }.
[Diagrams: on the left, a machine recognizing L0; on the right, one recognizing L1.]
If we insert an 𝜀 bridge to the right side’s initial state from each of the left side’s
final states (here there is only one such state), and de-finalize those states on the
left,
[Diagram: the two machines joined by an 𝜀 arrow from the left machine’s final state to the right machine’s initial state.]
then the combined machine accepts strings in the concatenation of those languages,
L ( M) = L0 ⌢ L1 . For example, it accepts aabaababa, and aabaa, as well as abaa.
2.19 Example We can also use 𝜀 transitions to get the Kleene star of a language.
Without the 𝜀 edge this machine’s language is L = {𝜀, ab },
[Diagram: 𝑞0 passes to 𝑞1 on a and 𝑞1 to 𝑞2 on b, with an 𝜀 arrow from 𝑞2 back to 𝑞0; 𝑞0 and 𝑞2 are accepting.]
[Diagram: a machine with states 𝑞0 through 𝑞6 and several 𝜀 transitions, whose closures are computed below.]
𝐸 (𝑞, 𝑚) 𝑚 =0 1 2 3 𝐸ˆ(𝑞)
𝑞0 {𝑞 0 } {𝑞 0, 𝑞 2 } {𝑞 0, 𝑞 2, 𝑞 3, 𝑞 5 } {𝑞 0, 𝑞 2, 𝑞 3, 𝑞 5 } {𝑞 0, 𝑞 2, 𝑞 3, 𝑞 5 }
𝑞1 {𝑞 1 } {𝑞 1 } {𝑞 1 } {𝑞 1 } {𝑞 1 }
𝑞2 {𝑞 2 } {𝑞 2, 𝑞 3, 𝑞 5 } {𝑞 2, 𝑞 3, 𝑞 5 } {𝑞 2, 𝑞 3, 𝑞 5 } {𝑞 2, 𝑞 3, 𝑞 5 }
𝑞3 {𝑞 3 } {𝑞 3 } {𝑞 3 } {𝑞 3 } {𝑞 3 }
𝑞4 {𝑞 4 } {𝑞 4 } {𝑞 4 } {𝑞 4 } {𝑞 4 }
𝑞5 {𝑞 5 } {𝑞 5 } {𝑞 5 } {𝑞 5 } {𝑞 5 }
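The closure column can also be produced by a short routine that repeatedly expands along 𝜀 arrows until nothing new appears. This is a sketch: the edge table below is our reading of the 𝜀 arrows behind the table above, and the names are our choices.

```racket
; Epsilon edges, mapping a state to the states reachable by one
; epsilon arrow (our reading of the machine: q0 to q2, and q2 to
; q3 and q5); states with no epsilon arrows are simply absent.
(define eps-edges (hash 'q0 '(q2) 'q2 '(q3 q5)))

; The epsilon closure of q: start with q itself and keep adding
; states reachable by epsilon arrows until the set stops growing.
(define (eps-closure q)
  (let loop ([seen (list q)] [frontier (list q)])
    (if (null? frontier)
        seen
        (let* ([nexts (append-map (lambda (p) (hash-ref eps-edges p '()))
                                  frontier)]
               [new (remove-duplicates
                     (filter (lambda (p) (not (member p seen))) nexts))])
          (loop (append seen new) new)))))
```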
[Diagram: the nondeterministic machine M𝑁 with states 𝑞0, 𝑞1, 𝑞2 and arrows on a and b.]
Δ𝐷                      a     b
𝑠0 = { }                𝑠0    𝑠0
+ 𝑠1 = {𝑞0}             𝑠4    𝑠0
𝑠2 = {𝑞1}               𝑠0    𝑠3
+ 𝑠3 = {𝑞2}             𝑠0    𝑠0
+ 𝑠4 = {𝑞0, 𝑞1}         𝑠4    𝑠3
+ 𝑠5 = {𝑞0, 𝑞2}         𝑠4    𝑠0
+ 𝑠6 = {𝑞1, 𝑞2}         𝑠0    𝑠3
+ 𝑠7 = {𝑞0, 𝑞1, 𝑞2}     𝑠4    𝑠3
[Diagram: the transition graph of M𝐷.]
The machine’s table, and its transition graph, make clear that M𝐷 is deterministic.
Many of the states are unreachable; for example, 𝑠 6 has only outgoing arrows.
Below is the machine with those states removed. Again, the start state is 𝑠 1 .
[Diagram: 𝑠1 passes to 𝑠4 on a and to 𝑠0 on b; 𝑠4 has a self-loop on a and passes to 𝑠3 on b; 𝑠3 passes to 𝑠0 on a and b; 𝑠0 has a self-loop on a and b.]
Now we give the algorithm, the powerset construction. States in M𝐷 are sets
of states from M𝑁 . The start state of M𝐷 is the 𝜀 closure 𝐸ˆ(𝑞 0 ) (for machines
without 𝜀 moves this is {𝑞 0 }). A state of M𝐷 is accepting if it contains any of
M𝑁 ’s accepting states.
The transition function Δ𝐷 inputs a state 𝑠𝑖 of M𝐷, that is, 𝑠𝑖 = {𝑞𝑘0, ... 𝑞𝑘𝑖 }, along with a character 𝑐 ∈ Σ. First apply M𝑁’s next state function to 𝑠𝑖’s elements to get a set 𝑆𝑖,𝑐 = Δ𝑁 (𝑞𝑘0, 𝑐) ∪ · · · ∪ Δ𝑁 (𝑞𝑘𝑖 , 𝑐) (if 𝑠𝑖 is empty then 𝑆𝑖,𝑐 is also empty).
Then include 𝜀 moves: where 𝑆𝑖,𝑐 = {𝑞𝑗0, ... 𝑞𝑗𝑖 }, let Δ𝐷 (𝑠𝑖 , 𝑐) = 𝐸ˆ(𝑞𝑗0 ) ∪ · · · ∪ 𝐸ˆ(𝑞𝑗𝑖 ).
(For machines without 𝜀 transitions this second part has no effect.)
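The construction can be sketched in code. This follows the description above, restricted to machines without 𝜀 transitions; all of the names are our choices, and the example delta-N encodes Example 2.1's machine.

```racket
; Powerset construction, for machines without epsilon transitions.
; delta-N maps a state and a character to a list of possible next
; states.  A state of the deterministic machine is a sorted list of
; N-machine states, so the empty list plays the role of the empty set.
(define (powerset-delta delta-N)
  (lambda (s c)
    (sort (remove-duplicates
           (append-map (lambda (q) (delta-N q c)) s))
          symbol<?)))

; Breadth-first collection of the subset-states reachable from the
; start state, so that unreachable subsets never get constructed.
(define (reachable-states delta-D start sigma)
  (let loop ([frontier (list start)] [seen (list start)])
    (if (null? frontier)
        seen
        (let* ([s (car frontier)]
               [succs (remove-duplicates
                       (map (lambda (c) (delta-D s c)) sigma))]
               [new (filter (lambda (t) (not (member t seen))) succs)])
          (loop (append (cdr frontier) new) (append seen new))))))

; Example: the nondeterministic machine of Example 2.1.
(define (delta-N q c)
  (cond [(and (eq? q 'q0) (char=? c #\0)) '(q0 q1)]
        [(and (eq? q 'q0) (char=? c #\1)) '(q0)]
        [(and (eq? q 'q1) (char=? c #\0)) '(q2)]
        [(and (eq? q 'q2) (char=? c #\1)) '(q3)]
        [else '()]))
(define delta-D (powerset-delta delta-N))
```

On Example 2.1's four-state machine, only four of the sixteen subset-states turn out to be reachable from {𝑞0}, which illustrates why building only the reachable part keeps the deterministic machine small in practice.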
2.23 Example We next do a nondeterministic machine with 𝜀 transitions.
[Diagram: a four-state machine with states 𝑞0, 𝑞1, 𝑞2, 𝑞3, arrows on a and b, and 𝜀 transitions; among the arrows, 𝑞0 passes to 𝑞1 on b.]
The table below computes the associated deterministic machine. The start state is
𝐸ˆ(𝑞 0 ) = {𝑞 0, 𝑞 3 } = 𝑠 7 . A state is accepting if it contains 𝑞 1 .
Here is an example walking through the powerset algorithm. First, let the
machine be in state 𝑠 7 = {𝑞 0, 𝑞 3 } and reading b. In the terms of the algorithm’s
description, applying Δ𝑁 to each element of 𝑠 7 gives 𝑆 7,b = Δ𝑁 (𝑞 0, b) ∪Δ𝑁 (𝑞 3, b) =
{𝑞 1 } ∪ { } = {𝑞 1 }. Taking the 𝜀 closure gives Δ𝐷 (𝑠 7, b) = {𝑞 0, 𝑞 1, 𝑞 3 } = 𝑠 12 .
Finding the 𝜀 closures in advance is a help in constructing the table. We have
𝐸ˆ(𝑞 0 ) = {𝑞 0, 𝑞 3 } = 𝑠 7 , and 𝐸ˆ(𝑞 1 ) = {𝑞 0, 𝑞 1, 𝑞 3 } = 𝑠 12 , and 𝐸ˆ(𝑞 2 ) = {𝑞 2 } = 𝑠 3 ,
and 𝐸ˆ(𝑞 3 ) = {𝑞 3 } = 𝑠 4 .
The transition diagram is below. Many of the machine’s table’s sixteen states are unreachable from the starting state 𝑠7 = 𝐸ˆ(𝑞0). We can see that by starting at 𝑠7 and tracing through the states. The diagram omits unreachable states.
[Diagram: the reachable states 𝑠7, 𝑠12, 𝑠10, 𝑠4, and 𝑠0, with their arrows on a and b.]
The powerset construction shows that for any nondeterministic machine there
is a deterministic machine that recognizes the same language.
We can say more: if the nondeterministic machine has 𝑛 states then the deterministic machine has at most 2ⁿ states. (It turns out that 2ⁿ is the best that we can do, in that for any 𝑛 there is an 𝑛-state nondeterministic machine requiring a deterministic machine with 2ⁿ states. However, in practice the deterministic machine is usually not too big once we minimize the number of states. Extra C shows how to minimize.)
IV.2 Exercises
2.24 Give the transition function for the machine of Example 2.8, and of Exam-
ple 2.9.
✓ 2.25 Consider this machine.
[Transition diagram omitted: states 𝑞0, 𝑞1, 𝑞2 with edges labeled 0,1 and 1.]
(a) Does it accept the empty string? (b) The string 0? (c) 011? (d) 010?
(e) List all length five accepted strings.
2.26 Your class has someone who asks, “I get that it is interesting, but isn’t all
this machine-guessing stuff just mathematical abstractions that are not real?” How
might the prof respond?
✓ 2.27 Your friend objects, “Epsilon transitions don’t make any sense because the
machine below will never get its first step done; it just endlessly follows the
epsilons.” Correct their misimpression.
[Transition diagram omitted: states 𝑞0, 𝑞1, 𝑞2 with 𝜀 transitions between them and edges labeled b and a.]
✓ 2.28 Using the nondeterministic machine from Example 2.23, give a computation
tree table like Example 2.15’s for each input string. (a) the empty string (b) a
(c) b (d) aa (e) ab (f) ba (g) bb
2.29 Give a sequence of ‘⊢’ relations showing that Example 2.12’s machine accepts
𝜏 = 010101110.
2.30 This machine has Σ = { a, b }.
[Transition diagram omitted: states 𝑞0, 𝑞1, 𝑞2 over { a, b } with an 𝜀 transition and edges labeled a and b.]
(a) What is the 𝜀 closure of 𝑞 0 ? Of 𝑞 1 ? 𝑞 2 ? (b) Does it accept the empty string?
(c) a? b? (d) Show that it accepts aab by producing a suitable sequence of ⊢
relations. (e) List five strings of minimal length that it accepts. (f) List five of
minimal length that it does not accept.
2.31 Produce the table description of the next-state function Δ for the machine in
the prior exercise. It should have three columns, for a, b, and 𝜀 .
2.32 Consider this machine.
[Transition diagram omitted: states 𝑞0, 𝑞1, 𝑞2 with edges labeled 0 and 1.]
relations. (c) Does it accept the empty string? (d) 0? 1? (e) List five strings of
minimal length that it accepts. (f) List five of minimal length that it does not accept.
(g) What is the language of this machine?
✓ 2.33 Find the 𝜀 closures of the states of this nondeterministic machine, using a
table like Example 2.20’s.
[Transition diagram omitted: states 𝑞0, 𝑞1, 𝑞2, 𝑞3 over B with 𝜀 transitions and edges labeled 0 and 1.]
2.34 Draw the transition graph of a nondeterministic machine that recognizes the
language {𝜎 = 𝜏0 ⌢ 𝜏1 ⌢ 𝜏2 ∈ B∗ | 𝜏0 = 1, 𝜏1 = 1, and 𝜏2 = ( 00)𝑘 for some 𝑘 ∈ N }.
✓ 2.35 Give diagrams for nondeterministic Finite State machines that recognize the
given language and that have the given number of states. Use Σ = B.
(a) L0 = {𝜎 | 𝜎 ends in 00 } , having three states
(b) L1 = {𝜎 | 𝜎 has the substring 0110 } , with five states
(c) L2 = {𝜎 | 𝜎 contains an even number of 0’s or exactly two 1’s } , with six states
(d) L3 = { 0 }∗, with one state
✓ 2.36 Draw the graph of a nondeterministic Finite State machine over B that
accepts strings with the suffix 111000111.
2.37 Find a nondeterministic Finite State machine that recognizes this language
of three words: L = { cat, cap, carumba }.
2.38 Give a nondeterministic Finite State machine over Σ = { a, b, c } recognizing
the language of strings that omit at least one of the characters in the alphabet.
✓ 2.39 For each, draw the transition graph for a Finite State machine, which may
be nondeterministic, that accepts the given strings from { a, b }∗.
(a) Accepted strings have a second character of a and next to last character of b.
(b) Accepted strings have second character a and the next to last character is
also a.
2.40 What is the language of this nondeterministic machine with 𝜀 transitions?
[Transition diagram omitted: states 𝑞0, 𝑞1, 𝑞2 with 𝜀 transitions and edges labeled a and b.]
A → aA | bB
B → bB | b

[Transition diagram omitted: a machine with states 𝑆, 𝐴, 𝐵, 𝐹 and edges labeled a and b.]
(a) Give three strings from the language of the grammar and show that they are
accepted by the machine. (b) Describe that language.
2.49 Decide whether each problem is solvable or unsolvable by a Turing machine.
(a) L𝐷𝐹𝐴 = { ⟨M, 𝜎⟩ | the deterministic Finite State machine M accepts 𝜎 }
(b) L𝑁 𝐹𝐴 = { ⟨M, 𝜎⟩ | the nondeterministic machine M accepts 𝜎 }
2.50 (a) For the machine of Example 2.23, for each 𝑞 ∈ 𝑄 produce 𝐸 (𝑞, 0) ,
𝐸 (𝑞, 1) , 𝐸 (𝑞, 2) , and 𝐸 (𝑞, 3) . List 𝐸ˆ(𝑞) for each 𝑞 ∈ 𝑄 . (b) Do the same for
Exercise 2.30’s machine.
Section IV.3 Regular expressions
3.7 Example The language consisting of strings of a’s whose length is a multiple of
three, L = { a3𝑘 | 𝑘 ∈ N } = {𝜀, aaa, aaaaaa, ... }, is described by (aaa)*.
Note that the empty string is a member of that language. A common mistake is
to forget that star includes the possibility of zero-many repetitions.
3.8 Example To match any character we can list them all. The language over
Σ = { a, b, c } of three-letter words ending in bc is { abc, bbc, cbc }. The regular
expression (a|b|c)bc describes it. (Another is (abc)|(bbc)|(cbc).)
3.9 Example Use 𝜀 to mark things as optional. Thus a*(𝜀 |b) describes the lan-
guage of strings that have any number of a’s and optionally end in one b,
L = {𝜀, b, a, ab, aa, aab, ... }. Similarly, to describe the language consisting of
words with between three and five a’s, L = { aaa, aaaa, aaaaa }, we can use
aaa(𝜀 |a|aa).
3.10 Example The language { b, bc, bcc, ab, abc, abcc, aab, ... } has words starting with
any number of a’s (including zero-many a’s), followed by a single b, and then
ending in fewer than three c’s. To describe it we can use a*b(𝜀 |c|cc).
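These textbook expressions carry over almost directly to Python's re module. There is no literal for 𝜀, but an empty alternative (or bounded repetition) plays its role. A quick check of Examples 3.7 through 3.10, assuming fullmatch semantics (the whole string must match):

```python
import re

# Example 3.7: strings of a's whose length is a multiple of three
assert re.fullmatch(r"(aaa)*", "") is not None      # star allows zero repetitions
assert re.fullmatch(r"(aaa)*", "aaaaaa") is not None
assert re.fullmatch(r"(aaa)*", "aaaa") is None

# Example 3.8: three-letter words over {a, b, c} ending in bc
assert re.fullmatch(r"(a|b|c)bc", "cbc") is not None

# Example 3.9: any number of a's, optionally ending in one b; (|b) plays ε|b
assert re.fullmatch(r"a*(|b)", "aab") is not None

# Example 3.10: a's, then one b, then fewer than three c's
assert re.fullmatch(r"a*b(|c|cc)", "bcc") is not None
assert re.fullmatch(r"a*b(|c|cc)", "bccc") is None
```

Practical engines also offer shorthands like `b?` and `c{0,2}` for these optional pieces; Extra A surveys such extensions.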
Also see Extra A for extensions that are widely used in practice.
Kleene’s Theorem The next result justifies our study of regular expressions
because it shows that they describe the languages of interest.
3.11 Theorem (Kleene’s theorem) A language is recognized by a Finite State
machine if and only if that language is described by a regular expression.
We will prove this in separate halves. The proofs use nondeterministic machines
but since we can convert those to deterministic machines, the result holds for them
also.
3.12 Lemma If a language is described by a regular expression then there is a Finite
State machine recognizing that language.
Proof Fix an alphabet Σ. We will show that for any regular expression 𝑅 over Σ
there is a machine with alphabet Σ accepting exactly the strings matching the
expression. We use induction on the structure of regular expressions.
Start with the base cases, regular expressions that are a single symbol. If 𝑅 = ∅ then
L (𝑅) = { } and the machine on the left below recognizes this language. If 𝑅 = 𝜀
then L (𝑅) = {𝜀 } and the machine in the middle recognizes it. If the regular
expression is a character from the alphabet such as 𝑅 = a then the machine on the
right works.
[Diagrams omitted: on the left a one-state machine with no accepting states; in the middle a one-state machine whose lone state is accepting; on the right a two-state machine with an a edge from the start state to an accepting state.]
Next consider alternation, 𝑅 = 𝑅0 |𝑅1 . Where M0 and M1 are machines for the two
subexpressions, add a new state 𝑠 and use 𝜀 transitions to connect 𝑠 to the start
states of M0 and M1 . See Example 2.17 in the prior section.
Next consider concatenation, 𝑅 = 𝑅0 ⌢ 𝑅1 . Join the two machines serially: for
each accepting state in M0 , make an 𝜀 transition to the start state of M1 and
then convert all of the accepting states of M0 to be non-accepting states. See
Example 2.18.
Finally consider Kleene star, 𝑅 = (𝑅0 )*. For each accepting state in the
machine M0 that is not the start state, make an 𝜀 transition to the start state and
then make the start state an accepting state. See Example 2.19.
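The three constructions can be carried out mechanically. Below is a minimal Python sketch that builds machines bottom-up; the encoding of a machine as a (start, accepting states, edge list) triple is our own, with 𝜀 edges carrying the label "".

```python
import itertools

_new_state = itertools.count()

def char(c):
    """Machine for the single-character expression c."""
    s, t = next(_new_state), next(_new_state)
    return (s, {t}, [(s, c, t)])

def union(m0, m1):
    """R0|R1: a new start state with ε edges to both machines' starts."""
    s = next(_new_state)
    return (s, m0[1] | m1[1],
            m0[2] + m1[2] + [(s, "", m0[0]), (s, "", m1[0])])

def concat(m0, m1):
    """R0 R1: ε edges from M0's accepting states to M1's start."""
    return (m0[0], m1[1],
            m0[2] + m1[2] + [(q, "", m1[0]) for q in m0[1]])

def star(m0):
    """R0*: accepting states loop back to the start, which becomes accepting."""
    return (m0[0], m0[1] | {m0[0]},
            m0[2] + [(q, "", m0[0]) for q in m0[1] if q != m0[0]])

def accepts(m, sigma):
    """Simulate: track the set of possible states, ε-closing as we go."""
    start, accepting, edges = m
    def close(S):
        while True:
            more = {t for (q, c, t) in edges if q in S and c == ""} - S
            if not more:
                return S
            S = S | more
    S = close({start})
    for c in sigma:
        S = close({t for (q, lbl, t) in edges if q in S and lbl == c})
    return bool(S & accepting)

# the machine of Example 3.13, for ab(c|d)(ef)*
m = concat(concat(char("a"), char("b")),
           concat(union(char("c"), char("d")),
                  star(concat(char("e"), char("f")))))
assert accepts(m, "abc") and accepts(m, "abdefef")
assert not accepts(m, "ab") and not accepts(m, "abce")
```

Nondeterminism here costs nothing extra: the simulator simply carries the whole set of states the machine might be in.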
3.13 Example Building a machine for the regular expression ab(c|d)(ef)* starts with
machines for each of the single characters.
[Diagrams omitted: the component machines, with states 𝑞0 through 𝑞12, shown first separately and then joined by 𝜀 transitions into a single machine for ab(c|d)(ef)*.]
3.14 Lemma If a language is recognized by a Finite State machine then there is a
regular expression describing that language.

[Diagram omitted: in the before picture, edges 𝑞𝑖 →a→ 𝑞 →b→ 𝑞𝑜 ; in the after picture, a single edge 𝑞𝑖 →ab→ 𝑞𝑜 .]
In the after picture the edge is labeled ab, with more than just one character.
For this proof we will generalize transition graphs to allow edge labels that are
regular expressions. As we eliminate states, we keep the recognized language of
the machines the same. We will be done when what remains is two states, with
one edge between them. The desired regular expression will be that edge’s label.
Before the proof, one more illustration. Start with the machine on the left.
[Diagrams omitted: on the left, a machine with states 𝑞0, 𝑞1, 𝑞2, edges 𝑞0 →a→ 𝑞1 , a loop labeled b on 𝑞1 , 𝑞1 →c→ 𝑞2 , and 𝑞0 →d→ 𝑞2 ; on the right, the same machine with a new start state 𝑒 →𝜀→ 𝑞0 and a new final state, 𝑞2 →𝜀→ 𝑓 .]
The proof goes as on the right, by introducing a new start state, 𝑒 , and a new final
state, 𝑓 . Then the proof eliminates 𝑞 1 as below.
[Diagram omitted: 𝑒 →𝜀→ 𝑞0 →d|(ab*c)→ 𝑞2 →𝜀→ 𝑓 .]
Clearly this machine recognizes the same language as the starting one.
Proof Call the machine M. If it has no accepting states then the regular expression
is ∅ and we are done. Otherwise, we start by transforming M to a new machine,
M̂, that has the same language and that is ready for the state-elimination strategy.
First we arrange that M̂ has a single accepting state. Create a new state 𝑓 and
for each of M’s accepting states make an 𝜀 transition to 𝑓 (by the prior paragraph
there is at least one such accepting state, so 𝑓 is connected to the rest of M̂). Change
all the accepting states to non-accepting ones and then make 𝑓 accepting. Clearly
this does not change the language of accepted strings.
Next introduce a new start state, 𝑒 . Connect it to 𝑞 0 with an 𝜀 transition, again
leaving the language of the machine unchanged. (Putting 𝑒 in M̂ allows us to
uniformly eliminate each state in M when we say below, “Pick any 𝑞 not equal to
𝑒 or 𝑓 .”)
Because the edge labels are regular expressions, we can arrange that from
any 𝑞𝑖 to any 𝑞 𝑗 there is at most one edge, since if M has more than one edge then
in M̂ we can use alternation, ‘|’, to combine the labels.
[Diagram omitted: two edges from 𝑞𝑖 to 𝑞 𝑗 labeled a and b are combined into a single edge labeled a|b.]
Do the same with loops, that is, cases where 𝑖 = 𝑗 . These adjustments do not
change the language of accepted strings.
The last part of transforming to M̂ is to drop states that are useless in that
they don’t affect which strings are accepted. If a state node other than 𝑓 has
no outgoing edges then omit it, along with the edges into it. The language of
the machine will not change because this state is not itself accepting as only 𝑓 is
accepting, and cannot lead to an accepting state since it doesn’t lead anywhere.
Along the same lines, if a state node is unreachable from the start 𝑒 then drop that
node along with its incoming and outgoing edges. (The idea behind useless states
has some technical aspects. For instance, omitting a no-outgoing-edges node along
with its incoming edges can result in another node now having no outgoing edges,
which in turn needs the same treatment. But these machines have only finitely
many nodes and so this omitting process must eventually finish. For a definition of
unreachability see Exercise 3.34.)
With that, M̂ is ready for state elimination. Pick any 𝑞 not equal to 𝑒 or 𝑓 .
Below are before and after diagrams. By the setup work above, 𝑞 has at least one
incoming and at least one outgoing edge. So there are states 𝑞𝑖 0 , . . . 𝑞𝑖 𝑗 with an
edge leading into 𝑞 , and states 𝑞𝑜 0 , . . . 𝑞𝑜𝑘 that receive an edge leading out of 𝑞 .
In addition, 𝑞 may have a loop. (A fine point is that possibly some of the states
shown on the left of each diagram equal some shown on the right. For example,
possibly 𝑞𝑖 0 equals 𝑞𝑜 0 , and the edge on the top of each diagram is a loop.)
[Diagrams omitted: in the before picture, each of 𝑞𝑖0 , . . . 𝑞𝑖 𝑗 has an edge into 𝑞 (labeled 𝑅𝑖0 , . . . 𝑅𝑖 𝑗 ), 𝑞 may have a loop labeled 𝑅ℓ , and 𝑞 has an edge out to each of 𝑞𝑜0 , . . . 𝑞𝑜𝑘 (labeled 𝑅𝑜0 , . . . 𝑅𝑜𝑘 ); there may also be direct edges labeled 𝑅𝑖𝑚,𝑜𝑛 from 𝑞𝑖𝑚 to 𝑞𝑜𝑛 . In the after picture 𝑞 is gone and the edge from 𝑞𝑖𝑚 to 𝑞𝑜𝑛 is labeled 𝑅𝑖𝑚,𝑜𝑛 |(𝑅𝑖𝑚 𝑅ℓ *𝑅𝑜𝑛 ).]
Eliminate 𝑞 and the associated edges by making the replacements shown on the
after diagram. (If an edge is not present then don’t include any regular expression
in the replacement. For instance, if there is no 𝑅ℓ edge then the right’s top edge
should be 𝑅𝑖 0,𝑜 0 |𝑅𝑖 0 𝑅𝑜 0 .) By construction, for any two states 𝑞𝑖 and 𝑞𝑜 , the set of
strings taking the machine from the first to the second is unchanged in passing from
the before diagram to the after. Thus the languages of the before and after machines
are equal.
Repeat this procedure until the only states left are 𝑒 and 𝑓 . The desired regular
expression is on the sole remaining edge.
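The elimination procedure can also be automated. A sketch in Python, with the edges as a dictionary mapping state pairs to regular-expression strings (this encoding and the over-generous parenthesization are our own; we assume at least one accepting state, since the proof's first line handles the other case):

```python
import re   # used only to sanity-check the result below

def wrap(r):
    """Parenthesize if r contains an alternation, so concatenation is safe."""
    return "(%s)" % r if "|" in r else r

def cat(*parts):
    """Concatenate labels, dropping ε factors."""
    return "".join(wrap(p) for p in parts if p != "ε")

def regex_of(states, edges, start, accepting):
    """State elimination as sketched in Lemma 3.14's proof."""
    E = dict(edges)
    E[("e", start)] = "ε"                     # new start state e
    for q in accepting:
        E[(q, "f")] = "ε"                     # new single final state f
    for q in states:                          # eliminate each original state
        loop = E.pop((q, q), None)
        mid = "(%s)*" % loop if loop else "ε"
        ins = [(u, r) for ((u, v), r) in E.items() if v == q]
        outs = [(v, r) for ((u, v), r) in E.items() if u == q]
        for u, _ in ins:
            del E[(u, q)]
        for v, _ in outs:
            del E[(q, v)]
        for u, ri in ins:                     # R_{i,o} | (R_i R_l* R_o)
            for v, ro in outs:
                new = cat(ri, mid, ro) or "ε"
                old = E.get((u, v))
                E[(u, v)] = new if old is None else "%s|%s" % (old, new)
    return E[("e", "f")]

# the machine pictured before the proof: q0 →a→ q1 (loop b), q1 →c→ q2, q0 →d→ q2
edges = {("q0", "q1"): "a", ("q1", "q1"): "b",
         ("q1", "q2"): "c", ("q0", "q2"): "d"}
pattern = regex_of(["q0", "q1", "q2"], edges, "q0", {"q2"})
assert re.fullmatch(pattern, "abbc")
```

On this machine the sketch returns (d|a(b)*c), agreeing with the illustration's d|(ab*c) up to extra parentheses.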
3.15 Example We want a regular expression describing the language of the machine M
on the left below. Introduce 𝑒 and 𝑓 as on the right. There are no useless states so
this is M̂.
[Diagrams omitted: on the left, the machine M with states 𝑞0, 𝑞1, 𝑞2 and edges labeled a and b; on the right, M̂, which adds a start state 𝑒 →𝜀→ 𝑞0 and 𝜀 transitions from the accepting states to a new final state 𝑓 . Eliminating 𝑞2 then gives the next machine.]
[Diagram omitted: 𝑒 →𝜀→ 𝑞0 →a→ 𝑞1 , with an edge from 𝑞1 back to 𝑞0 labeled b|(ab*b) and 𝑞1 →𝜀→ 𝑓 .]
Next 𝑞 1 . There is one node giving an incoming arrow, 𝑞 0 = 𝑞𝑖 0 , and two nodes
associated with outgoing arrows, 𝑞 0 = 𝑞𝑜 0 and 𝑓 = 𝑞𝑜 1 . (Note that 𝑞 0 is both
an incoming and outgoing node; this is the “fine point” mentioned in the proof.)
The regular expressions are: there is no arrow for 𝑅𝑖 0,𝑜 0 , 𝑅𝑖 0,𝑜 1 = 𝜀 , 𝑅𝑖 0 = a,
𝑅𝑜 0 = b|(ab*b), there is also no arrow for 𝑅ℓ , and 𝑅𝑜 1 = 𝜀 . Eliminating 𝑞 1
means that the next machine has an arrow from 𝑞𝑖 0 = 𝑞 0 to 𝑞𝑜 0 = 𝑞 0 labeled
𝑅𝑖 0,𝑜 0 |(𝑅𝑖 0 𝑅ℓ *𝑅𝑜 0 ), which is a(b|ab*b). It also means that the machine has an
arrow from 𝑞𝑖 0 = 𝑞 0 to 𝑞𝑜 1 = 𝑞 𝑓 labeled 𝑅𝑖 0,𝑜 1 |(𝑅𝑖 0 𝑅ℓ *𝑅𝑜 1 ), which is 𝜀 |(a𝜀 ).
[Diagram omitted: 𝑒 →𝜀→ 𝑞0 with a loop on 𝑞0 labeled a(b|ab*b), and an edge 𝑞0 →𝜀 |a𝜀→ 𝑓 .]
Final step. The sole incoming node is 𝑒 = 𝑞𝑖 0 and the sole outgoing node is 𝑓 = 𝑞𝑜 0 .
As well, 𝑅𝑖 0 = 𝜀 , 𝑅𝑜 0 = 𝜀 |a𝜀 , and 𝑅ℓ = a(b|ab*b).
[Diagram omitted: a single edge from 𝑒 to 𝑓 labeled 𝜀 (a(b|ab*b))*(𝜀 |a𝜀 ).]
This regular expression describes the language of the starting machine (we can
simplify it; for instance, in the final parenthesis we can replace a𝜀 with a).
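We can sanity-check the derived expression with Python's re module, writing the 𝜀 factors out of the expression (they drop from concatenations, and 𝜀 |a is just an optional a); the particular test strings are our own choices:

```python
import re

# ε(a(b|ab*b))*(ε|aε), simplified as in the text: ε factors drop out
pattern = r"(?:a(?:b|ab*b))*a?"

for sigma in ["", "a", "ab", "aba", "abab"]:
    assert re.fullmatch(pattern, sigma), sigma   # all described by the expression
for sigma in ["b", "aa", "ba"]:
    assert re.fullmatch(pattern, sigma) is None, sigma
```

Spot checks like this do not prove the elimination was carried out correctly, but they catch many slips cheaply.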
IV.3 Exercises
3.16 Decide if the string 𝜎 matches the regular expression 𝑅 . (a) 𝜎 = 0010,
𝑅 = 0*10 (b) 𝜎 = 101, 𝑅 = 1*01 (c) 𝜎 = 101, 𝑅 = 1*(0|1) (d) 𝜎 = 101,
𝑅 = 1*(0|1)* (e) 𝜎 = 01, 𝑅 = 1*01*
✓ 3.17 For each regular expression produce five bitstrings that match and five
that do not, or as many as there are if there are not five. (a) 01* (b) (01)*
(c) 1(0|1)1 (d) (0|1)(𝜀 |1)0* (e) ∅
3.18 Give a brief plain English description of the language for each regular
expression. (a) a*cb* (b) aa* (c) a(a|b)*bb
✓ 3.19 For each string in { a, b }∗ that is of length less than or equal to 3, decide if the
string is a match. (a) a*b (b) a* (c) ∅ (d) 𝜀 (e) b(a|b)a (f) (a|b)(𝜀 |a)a
3.20 For these regular expressions, decide if each element of B∗ of length at most 3
is a match. (a) 0*1 (b) 1*0 (c) ∅ (d) 𝜀 (e) 0(0|1)* (f) (100)(𝜀 |1)0*
✓ 3.21 A friend says to you, “The point of parentheses is that you first do inside
the parentheses and then do what’s outside. So Kleene star must mean ‘match
the inside and repeat’. So I think that (0*1)* should match the strings 001001
and 010101, but not the strings 01001 and 00000101, because you can’t write
those two as 𝜎 𝑛 for any substring 𝜎 .” Straighten them out.
3.22 The person behind you in class says, “I don’t get it. I got a regular expression
that I am sure is right. But I looked in the answers and the book got a different
one.” Explain what is up.
3.23 Produce a regular expression for the language of bitstrings that have a
substring consisting of at least three consecutive 1’s.
3.24 For each language, give five strings that are in the language and five that
are not. Then give a regular expression describing the language. Finally, give a
Finite State machine that accepts the language (a nondeterministic machine is
acceptable). (a) L0 = { a𝑛 b2𝑚 | 𝑚, 𝑛 ≥ 1 } (b) L1 = { a𝑛 b3𝑚 | 𝑚, 𝑛 ≥ 1 }
3.25 Give a regular expression for the language over Σ = { a, b, c } whose strings
are missing at least one letter, that is, whose strings are either without any a’s, or
without any b’s, or without any c’s.
3.26 Give a regular expression for each language. Use Σ = { a, b }. (a) The set of
strings starting with b. (b) The set of strings whose second-to-last character is a.
(c) The set of strings containing at least one of each character. (d) The strings
where the number of a’s is divisible by three.
3.27 Give a regular expression to describe each language over the alphabet
Σ = { a, b, c }. (a) The set of strings starting with aba. (b) The set of strings
ending with aba. (c) The set of strings containing the substring aba.
✓ 3.28 Give a regular expression to describe each language over B. (a) The set of
strings of odd parity, where the number of 1’s is odd. (b) The set of strings where
no two adjacent characters are equal. (c) The set of strings that represent, in
binary, multiples of eight.
✓ 3.29 Give a regular expression to describe each language over the alphabet
Σ = { a, b }. (a) Every a is both immediately preceded and immediately followed
by a b. (b) Each string has at least two b’s that are not followed by an a.
3.30 Give a regular expression for each language of bitstrings. (a) The number of
0’s is even. (b) There are more than two 1’s. (c) The number of 0’s is even and
there are more than two 1’s.
3.31 Give a regular expression to describe each language.
(a) {𝜎 ∈ { a, b }∗ | 𝜎 ends with the same symbol that it began with, and 𝜎 ≠ 𝜀 }
the set 𝑆𝑖 of states that are reachable in 𝑖 -many steps, for each 𝑞˜ ∈ 𝑆𝑖 follow each
outbound edge for a single step and also include the elements of the 𝜀 closure.
The union of 𝑆𝑖 with the collection of the states reached in this way is the set 𝑆𝑖+1 .
Stop when 𝑆𝑖 = 𝑆𝑖+1 , at which point it is the set of ever-reachable states. The
unreachable states are the others. For each machine, use that definition to find the
set of unreachable states.
[Transition diagrams omitted: machine (a) with states 𝑞0, 𝑞1, 𝑞2 and machine (b) with states 𝑞0–𝑞4, edges labeled a and b.]
3.35 Here is a grammar for regular expressions that reflects the operator
precedence rules.
⟨reg-exp⟩ → ⟨concat⟩ | ⟨reg-exp⟩ ‘|’ ⟨concat⟩
⟨concat⟩ → ⟨simple⟩ | ⟨concat⟩ ⟨simple⟩
⟨simple⟩ → ( ⟨reg-exp⟩ ) | ⟨simple⟩ * | ∅ | 𝜀 | 𝑥 0 | . . . | 𝑥𝑛
Derive and construct the parse tree for each regular expression over Σ = { a, b, c }.
(a) a(b|c) (b) ab*(a|c)
3.36 Use the grammar in the prior exercise to give the parse trees for Remark 3.6’s
a(b|c)* and a(b*|c*).
3.37 Apply the method of Lemma 3.14’s proof to this machine to eliminate 𝑞 0 .
[Transition diagram omitted: states 𝑞0 and 𝑞1 with edges labeled a, b, and a,b.]
(a) Get M̂ by introducing 𝑒 and 𝑓 . (b) Where 𝑞 = 𝑞 0 , describe which state from
the machine is playing the diagram’s before picture role of 𝑞𝑖 0 , which edge is 𝑅𝑖 0 , etc.
(c) Eliminate 𝑞 0 .
✓ 3.38 Apply method of Lemma 3.14’s proof to this machine. At each step describe
which state from the machine is playing the role of 𝑞𝑖 0 , which edge is 𝑅𝑖 0 , etc.
[Transition diagram omitted: states 𝑞0, 𝑞1, 𝑞2 with edges labeled 1 and a loop labeled 0,1.]
(a) Eliminate 𝑞 0 . (b) Eliminate 𝑞 1 . (c) 𝑞 2 (d) Give the regular expression.
3.39 Apply the state elimination method of Lemma 3.14’s proof to eliminate 𝑞 1 .
Note that each of the states 𝑞 0 and 𝑞 2 are as described in the proof ’s comment on
the fine point.
[Diagram omitted: states 𝑞0, 𝑞1, 𝑞2 with edges labeled by the regular expressions A through F.]
3.41 Fix a Finite State machine M. Kleene’s Theorem shows that the set of strings
taking M from the start state 𝑞 0 to the set of final states is regular.
(a) Show that for any set of states 𝑆 ⊆ 𝑄 M , final or not, the set of strings taking
M from 𝑞 0 to one of the states in 𝑆 is regular.
(b) Show that the set of strings taking M from any single state to any other single
state is regular.
3.42 Fix an alphabet Σ. Show that the set of languages over Σ that are described
by a regular expression is countably infinite. Conclude that there are languages
over Σ not recognized by any Finite State machine.
3.43 An alternative proof of Lemma 3.12, the subset method, goes from a given
regular expression to an associated machine by reversing the steps of Lemma 3.14.
Start by labeling the single edge on a two-state machine with the given regular
expression.
[Diagrams omitted: begin with 𝑒 →𝑅→ 𝑓 . The expansion rules reverse state elimination: an edge 𝑞𝑖 →𝑅0𝑅1→ 𝑞𝑜 becomes 𝑞𝑖 →𝑅0→ 𝑞 →𝑅1→ 𝑞𝑜 through a new state 𝑞 ; an edge 𝑞𝑖 →𝑅0 |𝑅1→ 𝑞𝑜 becomes two parallel edges labeled 𝑅0 and 𝑅1 ; and an edge 𝑞𝑖 →𝑅*→ 𝑞𝑜 becomes 𝑞𝑖 →𝜀→ 𝑞 →𝜀→ 𝑞𝑜 with a loop labeled 𝑅 on the new state 𝑞 .]
Use this approach to get a machine that recognizes the language described by
these regular expressions. (a) a|b (b) ca* (c) (a|b)c* (d) (a|b)(b*|a*)
3.44 Nondeterministic Finite State machines can always be made to have a single
accepting state. For deterministic machines that is not so.
(a) Show that any deterministic Finite State machine that recognizes the finite
language L1 = {𝜀, a } must have at least two accepting states.
(b) Show that any deterministic Finite State machine that recognizes L2 =
{𝜀, a, aa } must have at least three accepting states.
(c) Show that for any 𝑛 ∈ N there is a regular language that is not recognized by
any deterministic Finite State machine with at most 𝑛 accepting states.
Section IV.4 Regular languages
We have seen that deterministic Finite State machines, nondeterministic Finite
State machines, and regular expressions all describe the same set of languages.
The fact that we can describe these languages in so many different ways says that
there is something natural and important about them.†
Definition We now study the languages in this collection.
4.1 Definition A regular language is one that is recognized by some Finite State
machine or equivalently, described by a regular expression.
4.2 Lemma Fix an alphabet. The set of regular languages over it is countably infinite.
There are languages that are not regular.
Proof Call the alphabet Σ. We first show that the set of regular languages over Σ is
infinite. Section A specifies that any alphabet is nonempty and finite. Where 𝑥 is a
character from Σ, each of these languages is finite and therefore regular: L0 = { },
L1 = {𝑥 }, L2 = {𝑥𝑥 } . . .
Next we show that the set of regular languages over Σ is at most countable.
There is one language for each regular expression so we can do that by showing
that there are countably many regular expressions. There are finitely many regular
expressions of length 1, finitely many of length 2, etc. The union of them all is a
countable union of countable sets, and so is countable.
To finish we show that the set of all languages over Σ is uncountable, from
which it follows that there are languages that are not regular. First, the collection of
strings Σ∗ is infinite because, where 𝑦 ∈ Σ, it contains the infinitely many different
elements 𝑦 , 𝑦𝑦 , . . . In addition, that collection contains finitely many strings
of length zero, finitely many of length one, etc. and so is a countable union
of countable sets, and is therefore countably infinite. In contrast, the set of all
languages L ⊆ Σ∗ is the power set of Σ∗, and so has a greater cardinality, which
makes it uncountable.
Closure properties In proving the first half of Kleene’s Theorem, Lemma 3.12, we
showed that if L0 and L1 are regular then their union L0 ∪ L1 is regular, as is their
concatenation L0 ⌢ L1 , and the Kleene star L0 ∗. A set is closed under an operation
if performing that operation on its members always yields another member. This
restates Lemma 3.12 using that term.
4.3 Lemma The collection of regular languages is closed under the union of two
languages,‡ the concatenation of two languages, and the Kleene star of a language.
We can ask about the closure of regular languages under other operations. To
answer we will use the product construction.
† This is just like how the fact that Turing machines, general recursive functions, and many other models
all compute the same sets says that these computable sets are a natural and important collection. This
collection is not just a historical artifact of what happened to be first proposed. ‡ If the two languages
have different alphabets Σ0 and Σ1 then the two languages as well as their union are regular over the
alphabet Σ0 ∪ Σ1 .
4.4 Example The machine on the left, M0 , accepts strings with fewer than two a’s.
The one on the right, M1 , accepts strings with an odd number of b’s.
[Transition diagrams omitted.]

Δ0      a    b          Δ1      a    b
+ 𝑞0    𝑞1   𝑞0           𝑠0    𝑠0   𝑠1
+ 𝑞1    𝑞2   𝑞1         + 𝑠1    𝑠1   𝑠0
  𝑞2    𝑞2   𝑞2
The product machine M has states that are the members of the cross product
𝑄 0 × 𝑄 1 and transitions that are given by Δ( (𝑞𝑖 , 𝑠 𝑗 ), 𝑥) = ( Δ0 (𝑞𝑖 , 𝑥), Δ1 (𝑠 𝑗 , 𝑥) ) . Its
start state is (𝑞 0, 𝑠 0 ) .
Δ a b
(𝑞 0, 𝑠 0 ) (𝑞 1, 𝑠 0 ) (𝑞 0, 𝑠 1 )
(𝑞 0, 𝑠 1 ) (𝑞 1, 𝑠 1 ) (𝑞 0, 𝑠 0 )
(𝑞 1, 𝑠 0 ) (𝑞 2, 𝑠 0 ) (𝑞 1, 𝑠 1 )
(𝑞 1, 𝑠 1 ) (𝑞 2, 𝑠 1 ) (𝑞 1, 𝑠 0 )
(𝑞 2, 𝑠 0 ) (𝑞 2, 𝑠 0 ) (𝑞 2, 𝑠 1 )
(𝑞 2, 𝑠 1 ) (𝑞 2, 𝑠 1 ) (𝑞 2, 𝑠 0 )
The two tables below differ only in the choice of accepting states.

        a          b
  (𝑞 0, 𝑠 0 )  (𝑞 1, 𝑠 0 )  (𝑞 0, 𝑠 1 )
+ (𝑞 0, 𝑠 1 )  (𝑞 1, 𝑠 1 )  (𝑞 0, 𝑠 0 )
  (𝑞 1, 𝑠 0 )  (𝑞 2, 𝑠 0 )  (𝑞 1, 𝑠 1 )
+ (𝑞 1, 𝑠 1 )  (𝑞 2, 𝑠 1 )  (𝑞 1, 𝑠 0 )
  (𝑞 2, 𝑠 0 )  (𝑞 2, 𝑠 0 )  (𝑞 2, 𝑠 1 )
  (𝑞 2, 𝑠 1 )  (𝑞 2, 𝑠 1 )  (𝑞 2, 𝑠 0 )

        a          b
+ (𝑞 0, 𝑠 0 )  (𝑞 1, 𝑠 0 )  (𝑞 0, 𝑠 1 )
  (𝑞 0, 𝑠 1 )  (𝑞 1, 𝑠 1 )  (𝑞 0, 𝑠 0 )
+ (𝑞 1, 𝑠 0 )  (𝑞 2, 𝑠 0 )  (𝑞 1, 𝑠 1 )
  (𝑞 1, 𝑠 1 )  (𝑞 2, 𝑠 1 )  (𝑞 1, 𝑠 0 )
  (𝑞 2, 𝑠 0 )  (𝑞 2, 𝑠 0 )  (𝑞 2, 𝑠 1 )
  (𝑞 2, 𝑠 1 )  (𝑞 2, 𝑠 1 )  (𝑞 2, 𝑠 0 )

In the second table the accepting states (𝑞𝑖 , 𝑠 𝑗 ) are the ones where 𝑞𝑖 is accepting and 𝑠 𝑗 is not.
Then the machine accepts strings that are in the language of M0 but not that of M1 ,
so M recognizes {𝜎 ∈ Σ∗ | 𝜎 has fewer than two a’s and an even number of b’s }.
4.5 Theorem The collection of regular languages is closed under the intersection of
two languages, the set difference of two languages, and the set complement of a
language.
Proof Fix an alphabet Σ and consider languages L0 and L1 . Let them be recognized
by the Finite State machines M0 and M1 . Perform the product construction to
get M.
If the accepting states of M are those pairs where both the first and second
component states are accepting then M recognizes the intersection of the languages,
L0 ∩ L1 . If the accepting states of M are those pairs where the first component
state is accepting but the second is not, then M recognizes the set difference of
the languages, L0 − L1 . A special case of this is when L0 is the set of all strings, Σ∗,
so that M recognizes the complement, L1 c.
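The constructions in the proof are easy to run. A minimal sketch in Python using Example 4.4's two machines, with each deterministic machine encoded as a dictionary from (state, character) pairs (the encoding and function names are our own):

```python
def product_accepts(d0, f0, d1, f1, start0, start1, sigma, mode):
    """Run M0 and M1 in lockstep on sigma and accept according to `mode`:
    'and' gives the intersection of the languages, 'diff' gives L0 - L1."""
    q, s = start0, start1
    for c in sigma:
        q, s = d0[(q, c)], d1[(s, c)]
    if mode == "and":
        return q in f0 and s in f1
    return q in f0 and s not in f1          # mode == "diff"

# Example 4.4: M0 accepts strings with fewer than two a's (q0, q1 accepting),
# M1 accepts strings with an odd number of b's (s1 accepting).
d0 = {("q0", "a"): "q1", ("q0", "b"): "q0",
      ("q1", "a"): "q2", ("q1", "b"): "q1",
      ("q2", "a"): "q2", ("q2", "b"): "q2"}
d1 = {("s0", "a"): "s0", ("s0", "b"): "s1",
      ("s1", "a"): "s1", ("s1", "b"): "s0"}
f0, f1 = {"q0", "q1"}, {"s1"}

assert product_accepts(d0, f0, d1, f1, "q0", "s0", "ab", "and")      # one a, odd b's
assert not product_accepts(d0, f0, d1, f1, "q0", "s0", "aab", "and") # two a's
assert product_accepts(d0, f0, d1, f1, "q0", "s0", "abb", "diff")    # even b's
```

For the complement, run with a machine for Σ∗ (a single accepting state looping on every character) as M0 and use the 'diff' mode.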
These closure properties often simplify showing that a language is regular.
4.6 Example To show that the language
IV.4 Exercises
4.7 Someone in class says, “I know that regular languages are closed under closure
properties. For example, we know if L0 and L1 are regular then their intersection
L0 ∩ L1 is also regular. But when L0 and L1 are not regular, why doesn’t
L0 ∩ L1 = L2 make L2 not regular? Doesn’t being closed work for non-regularity
too?” Explain it to them.
4.8 Is English a regular language?
4.9 Name a class of languages that are closed under intersection and union but
not under complement.
✓ 4.10 True or false? Justify each answer.
(a) The empty language is not regular.
(b) The intersection of two languages is regular.
(c) The language of all bitstrings, B∗, is not regular.
(d) In every infinite regular language there are two strings where no character
from the alphabet is in the same place in both.
4.11 One of these is true and one is false. Which is which? (a) Any finite language
is regular. (b) Any regular language is finite.
4.12 Is {𝜎 ∈ B∗ | 𝜎 represents in binary a power of 2 } a regular language?
[Transition diagrams omitted: a three-state machine with states 𝑞0, 𝑞1, 𝑞2 and a two-state machine with states 𝑠0, 𝑠1, over { a, b }. For this pair of machines,]
give the transition table for the product machine. Specify the accepting states so that
the result will accept (a) the intersection of the languages of the two machines, and
(b) the union of the languages.
4.18 Find the cross product of this machine, M1 from Example 4.4, with itself.

[Transition diagram omitted: states 𝑠0 and 𝑠1 with b edges between them and a loops on each.]
4.32 Prove that the language recognized by a Finite State machine with 𝑛 states
is infinite if and only if the machine accepts at least one string of length 𝑘 , where
𝑛 ≤ 𝑘 < 2𝑛 .
4.33 Fix two alphabets Σ0, Σ1 . A function ℎ : Σ0 → Σ1 ∗ induces a homomorphism
on Σ0 ∗ via the operation ℎ(𝜎 ⌢ 𝜏) = ℎ(𝜎) ⌢ ℎ(𝜏) and ℎ(𝜀) = 𝜀 .
(a) Take Σ0 = B and Σ1 = { a, b } . Fix a homomorphism ℎ̂( 0) = a and ℎ̂( 1) = ba.
Find ℎ̂( 01) , ℎ̂( 10) , and ℎ̂( 101) .
(b) Define ℎ( L) = {ℎ(𝜎) | 𝜎 ∈ L } . Let L̂ = {𝜎 ⌢ 1 | 𝜎 ∈ B∗ } ; describe it with a
regular expression. Using the homomorphism ℎ̂ from the prior item, describe
ℎ̂( L̂) with a regular expression.
(c) Prove that the collection of regular languages is closed under homomorphism,
that if L is regular then so is ℎ( L) .
4.34 Find a nondeterministic Finite State machine M so that producing another
machine M̂ by taking the complement of the accepting states, 𝐹 M̂ = (𝐹 M ) c, will
not result in the language of the second machine being the complement of the
language of the first.
4.35 We will show that the class of regular languages is closed under reversal.
Recall that the reversal of the language is defined to be the set of reversals of the
strings in the language L R = {𝜎 R | 𝜎 ∈ L }.
(a) Show that for any two strings the reversal of the concatenation is the
concatenation, in the opposite order, of the reversals: (𝜎0 ⌢ 𝜎1 ) R = 𝜎1 R ⌢ 𝜎0 R. Hint: do
induction on the length of 𝜎1 .
(b) We will prove the result by showing that for any regular expression 𝑅 , the
reversal L (𝑅) R is described by a regular expression. We will construct
this expression by defining a reversal operation on regular expressions.
Fix an alphabet Σ and let (1) ∅ R = ∅, (2) 𝜀 R = 𝜀 , (3) 𝑥 R = 𝑥 for
any 𝑥 ∈ Σ, (4) (𝑅0 ⌢ 𝑅1 ) R = 𝑅1 R ⌢ 𝑅0 R , (5) (𝑅0 |𝑅1 ) R = 𝑅0 R |𝑅1 R , and
(6) (𝑅 *) R = (𝑅 R )*. (Note the connection between (4) and the prior exercise
item.) Now show that 𝑅 R describes L (𝑅) R . Hint: use induction on the length
of the regular expression 𝑅 .
Section IV.5 Non-regular languages
The prior section showed via a counting argument that there are languages that are
not regular. We now see a technique to show that specific languages are not
regular.† This is similar to the second chapter, where we first used a counting
argument to prove that there are unsolvable problems and later showed that specific
problems such as the Halting problem are unsolvable.
The idea is that although Finite State machines are finite, they can get arbitrarily
long inputs. For instance, the power switch from Example 1.1 has only two states
but even if we toggle it hundreds of times, it still keeps track of whether the switch
is on or off. The key observation is that to process long inputs with only a small
number of states, a machine must revisit states, that is, it must cycle.
Cycles inside a machine cause a pattern in what that machine accepts. The
diagram below shows a machine that accepts aabbbc (it only shows some of the
states, those that the machine traverses in processing this input).
[Diagram (∗) omitted: the states 𝑞0, 𝑞𝑖1 , . . . 𝑞𝑖5 that the machine traverses on input aabbbc, where reading the substring abb carries the machine around a cycle back to 𝑞𝑖1 .]
Because of the cycle, in addition to aabbbc this machine must also accept a ( abb) 2 bc
since that string takes the machine through the cycle twice. Likewise, this machine
accepts a ( abb) 3 bc, and cycling more times pumps out more accepted strings.
5.1 Theorem (Pumping Lemma) Let L be a regular language. There is a constant 𝑝 ∈
N+, the pumping length for the language,‡ such that every string 𝜎 ∈ L with
|𝜎 | ≥ 𝑝 decomposes into three substrings 𝜎 = 𝛼 ⌢ 𝛽 ⌢𝛾 satisfying: (1) the first two
components are short, |𝛼𝛽 | ≤ 𝑝 , (2) 𝛽 is not empty, and (3) the strings 𝛼𝛾 , 𝛼𝛽 2𝛾 ,
𝛼𝛽 3𝛾 , . . . are also members of the language L.
Proof Suppose that L is recognized by the deterministic Finite State machine M.
For 𝑝 it suffices to use the number of states in M.
Consider a string 𝜎 ∈ L with |𝜎 | ≥ 𝑝 . Finite State machines perform one
transition per character so the number of characters in an input string equals the
number of transitions. Thus the number of states, not necessarily distinct ones,
that the machine visits is one more than the number of transitions. (For instance,
with a one-character input a machine visits two states.) So in processing 𝜎 , the
machine revisits at least one state. It cycles.
Of the states that are repeated as the machine processes 𝜎 , fix the one 𝑞 that it
revisits first. Also fix 𝜎 ’s shortest two prefixes ⟨𝑠 0, ... 𝑠𝑖 ⟩ and ⟨𝑠 0, ... 𝑠𝑖 , ... 𝑠 𝑗 ⟩ that
† ?? contains another way to show that a language is not regular. While somewhat more abstract, it
applies to all non-regular languages whereas the result here does not apply to some (see Exercise 5.30).
‡ If 𝑝 works then so does any number greater than 𝑝 .
take the machine to 𝑞 . That is, 𝑖 and 𝑗 are minimal such that 𝑖 ≠ 𝑗 and the extended
transition function gives Δ̂(⟨𝑠 0, ... 𝑠𝑖 ⟩) = Δ̂(⟨𝑠 0, ... 𝑠 𝑗 ⟩) = 𝑞 . Let 𝛼 = ⟨𝑠 0, ... , 𝑠𝑖 ⟩ , let
𝛽 = ⟨𝑠𝑖+1, ... 𝑠 𝑗 ⟩ , and let 𝛾 = ⟨𝑠 𝑗+1, ... 𝑠𝑘 ⟩ .
These strings satisfy conditions (1) and (2). In particular, choosing 𝑞 , 𝑖 , and 𝑗 to
be minimal guarantees that |𝛼 ⌢𝛽 | ≤ 𝑝 because the machine has 𝑝 -many states
and so a state must repeat by at most the 𝑝 -th input character. For condition (3),
this string
𝛼 ⌢ 𝛾 = ⟨𝑠 0, ... 𝑠𝑖 , 𝑠 𝑗+1, ... 𝑠𝑘 ⟩
brings the machine from the start state 𝑞 0 to 𝑞 , and then to the same ending state
as did 𝜎 . That is, Δ̂(𝛼𝛾) = Δ̂(𝛼𝛽𝛾) and so the machine accepts 𝛼𝛾 . As to the other
strings in (3), for instance with 𝛼𝛽 2𝛾 = 𝛼 ⌢ 𝛽 ⌢ 𝛽 ⌢𝛾 ,
the substring 𝛼 brings the machine from 𝑞 0 to 𝑞 , the first 𝛽 brings it from 𝑞 around
to 𝑞 again, the second 𝛽 makes the machine cycle to 𝑞 yet again, and finally 𝛾
brings it to the same ending state as did 𝜎 .
We typically use the Pumping Lemma to show that a language is not regular
through an argument by contradiction.
5.2 Example The canonical example is to show that this language, whose strings have a block of a’s matched by an equal block after a central b, is not regular.
L = { a𝑛 ba𝑛 𝑛 ∈ N }
For contradiction assume that L is regular. The Pumping Lemma says that this
language has a pumping length. Call it 𝑝 and consider 𝜎 = a𝑝 ba𝑝.
The string 𝜎 is an element of L and |𝜎 | ≥ 𝑝 . Thus it decomposes as 𝜎 = 𝛼𝛽𝛾 ,
subject to the three conditions. Condition (1) is that |𝛼𝛽 | ≤ 𝑝 and so both
substrings 𝛼 and 𝛽 are composed entirely of a’s. Condition (2) is that 𝛽 is not the
empty string and so 𝛽 consists of at least one a. Condition (3) states that all of
the strings 𝛼𝛾, 𝛼𝛽 2𝛾, 𝛼𝛽 3𝛾, ... are members of L. Consider the first, 𝛼𝛾 (there are
other choices that would work).
Compared to 𝜎 = 𝛼𝛽𝛾 , in 𝛼𝛾 the substring 𝛽 is gone. Because 𝛼 and 𝛽 consist
entirely of a’s, the substring 𝛾 has the b character from 𝜎 , and hence also has the
a𝑝 that follows this b. So compared to 𝜎 = 𝛼𝛽𝛾 , the string 𝛼𝛾 omits at least one a
before the b but none of the a’s after it. Therefore 𝛼𝛾 is not a palindrome, which is
the desired contradiction.
5.4 Remark In that example the string 𝜎 has three parts, 𝜎 = a𝑝 ⌢ b ⌢ a𝑝, and it
decomposes into three parts, 𝜎 = 𝛼 ⌢ 𝛽 ⌢𝛾 . Don’t make the mistake of thinking that
the two decompositions line up. The Pumping Lemma does not say that 𝛼 = a𝑝,
𝛽 = b, and 𝛾 = a𝑝 — indeed, we’ve shown that 𝛽 does not contain the b. Instead
the lemma’s first condition only says that the first two substrings together, 𝛼𝛽 ,
consists exclusively of a’s. So perhaps 𝛼𝛽 = a𝑝, or perhaps 𝛾 starts with some a’s
that are then followed by ba𝑝. That is, all we know is that 𝛼𝛽 matches the regular
expression a* while 𝛾 matches a*baa ... a, with 𝑝 -many a’s at the end.
5.5 Example Consider L = { 0𝑚 1𝑛 ∈ B∗ 𝑚 = 𝑛 + 1 } = { 0, 001, 00011, ... }, whose
members start with a number of 0’s that is one more than the number of 1’s at the
end. We will prove that it is not regular.
For contradiction assume otherwise, that L is regular, and denote its pumping
length by 𝑝 . Consider 𝜎 = 0𝑝+1 1𝑝 ∈ L. Because |𝜎 | ≥ 𝑝 , the Pumping Lemma
gives a decomposition 𝜎 = 𝛼𝛽𝛾 satisfying the three conditions. Condition (1) says
that |𝛼𝛽 | ≤ 𝑝 , so that the substrings 𝛼 and 𝛽 have only 0’s (and also, all of 𝜎 ’s
1’s are in 𝛾 ). Condition (2) says that 𝛽 has at least one character, necessarily 0.
Consider Condition (3)’s list: 𝛼𝛾 , 𝛼𝛽 2𝛾 , 𝛼𝛽 3𝛾 , . . . Compare its first entry, 𝛼𝛾 ,
to 𝜎 . The string 𝛼𝛾 has fewer 0’s than does 𝜎 but the same number of 1’s. So the
number of 0’s in 𝛼𝛾 is not one more than the number of 1’s. Thus 𝛼𝛾 ∉ L, which
contradicts the third condition of the Pumping Lemma.
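Example 5.5’s argument can be checked mechanically. Below is a sketch in Python (an illustration only; the language choice, the membership test in_lang, and the sample pumping length 𝑝 = 4 are all assumptions made for the demonstration). It confirms that every legal decomposition of 𝜎 pumps down to a string outside the language.

```python
def in_lang(s):
    # Membership in L = { 0^m 1^n | m = n + 1 }: a run of 0's followed by
    # a run of 1's, with one more 0 than there are 1's.
    zeros = len(s) - len(s.lstrip('0'))
    ones = len(s) - len(s.rstrip('1'))
    return zeros + ones == len(s) and zeros == ones + 1

p = 4                                    # a sample pumping length
sigma = '0' * (p + 1) + '1' * p          # sigma is in L and |sigma| >= p
assert in_lang(sigma)

# Every decomposition sigma = alpha beta gamma with |alpha beta| <= p and
# beta nonempty puts only 0's in beta, so alpha+gamma drops at least one 0.
for i in range(p):                       # alpha = sigma[:i]
    for j in range(i + 1, p + 1):        # beta = sigma[i:j] is nonempty
        alpha, beta, gamma = sigma[:i], sigma[i:j], sigma[j:]
        assert set(beta) == {'0'}
        assert not in_lang(alpha + gamma)
```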
We can interpret that example to say that Finite State machines cannot recognize
a predecessor-successor relationship. We can similarly use the Pumping Lemma to
show Finite State machines cannot recognize other arithmetic relations.
5.6 Example The language L = { a𝑛 𝑛 is a perfect square } = {𝜀, a, a4, a9, a16, ... } is
not regular. For, suppose otherwise. Denote the pumping length by 𝑝 and consider
𝜎 = a(𝑝2), so that 𝜎 ∈ L and |𝜎 | ≥ 𝑝 .
By the Pumping Lemma, 𝜎 decomposes as 𝛼𝛽𝛾 , subject to the three conditions.
Condition (1) is that |𝛼𝛽 | ≤ 𝑝 , which implies that |𝛽 | ≤ 𝑝 . Condition (2) is that
0 < |𝛽 | . Now consider the strings 𝛼𝛾 , 𝛼𝛽 2𝛾 , . . .
We will get a contradiction from 𝛼𝛽 2𝛾 . The definition of L is that after 𝜎 the next longer member has length (𝑝 + 1) 2 = 𝑝 2 + 2𝑝 + 1. But |𝛼𝛽 2𝛾 | = 𝑝 2 + |𝛽 | , and conditions (1) and (2) give 0 < |𝛽 | ≤ 𝑝 , so that 𝑝 2 < |𝛼𝛽 2𝛾 | ≤ 𝑝 2 + 𝑝 . This length falls strictly between the consecutive squares 𝑝 2 and (𝑝 + 1) 2 , so 𝛼𝛽 2𝛾 ∉ L, contradicting the third condition.
IV.5 Exercises
✓ 5.8 Example 5.5 shows that L = { 0𝑚 1𝑛 ∈ B∗ 𝑚 = 𝑛 + 1 } is not regular but your
friend doesn’t get it and asks you, “What’s wrong with the regular expression
0𝑛+1 1𝑛 ?” Explain it to them.
5.9 Example 5.2 uses 𝛼𝛽 2𝛾 to show that the language of balanced parentheses is
not regular. Instead get the contradiction by showing that 𝛼𝛾 is not a member of
the language.
5.10 Your friend has been thinking. They say, “Hey, the diagram (∗) before
Theorem 5.1 doesn’t apply unless the language is infinite. Sometimes languages
are regular because they only have like three or four strings. But the Pumping
Lemma’s third condition requires that infinitely many strings be in the language, so
the Pumping Lemma is wrong.” In what way do they need to further refine their
thinking?
5.11 Someone in the class emails you, “If a language has a string with length
greater than the number of states, which is the pumping length, then it cannot be
a regular language.” Correct?
✓ 5.12 Your study partner has read Remark 5.4 but it is still sinking in. About the
matched parentheses example, Example 5.2, they say, “So 𝜎 = (𝑝 )𝑝 , and 𝜎 = 𝛼𝛽𝛾 .
We know that 𝛼𝛽 consists only of (’s, so it must be that 𝛾 consists of )’s.” Give
them a prompt.
224 Chapter IV. Automata
✓ 5.13 For each, give five strings that are elements of the language and five that are
not, and then show that the language is not regular by using the Pumping Lemma.
(a) L0 = { a𝑛 b𝑚 𝑛 + 2 = 𝑚 }
(b) L1 = { a𝑛 b𝑚 c𝑛 𝑛, 𝑚 ∈ N }
(c) L2 = { a𝑛 b𝑚 𝑛 < 𝑚 }
✓ 5.14 For each language over Σ = { a, b } produce five strings that are members.
Then decide whether that language is regular. Prove your assertion either by
producing a regular expression or using the Pumping Lemma.
(a) { a𝑛 b𝑚 ∈ Σ∗ 𝑛 = 3 }
(b) { a𝑛 b𝑚 ∈ Σ∗ 𝑛 + 3 = 𝑚 }
(c) { a𝑛 b𝑚 ∈ Σ∗ 𝑛, 𝑚 ∈ N }
(d) { a𝑛 b𝑚 ∈ Σ∗ 𝑚 − 𝑛 > 12 }
✓ 5.15 Each language is non-regular and 𝜎 is a good choice as part of a proof using
the Pumping Lemma, where 𝑝 is the pumping length. For each, give the most
specific regular expression describing 𝛼𝛽 and 𝛾 . Take Σ = { a, b }.
(a) L = { a𝑛 b2𝑛 𝑛 ∈ N } , 𝜎 = a𝑝 b2𝑝
(b) L = { a𝑛 b𝑛+5 𝑛 ∈ N } , 𝜎 = a𝑝 b𝑝+5
(c) L = { a𝑖 b 𝑗 a𝑖+𝑗 𝑖 ∈ N and 𝑗 ∈ N+ } , 𝜎 = a𝑝 ba𝑝+1
(d) L = { a𝑘 ⌢ 𝜏 𝑘 ∈ N and |𝜏 | = 𝑘 } , 𝜎 = a𝑝 b𝑝
(e) L = {𝜏 𝜏 is a palindrome and |𝜏 | is even } , 𝜎 = a𝑝 bba𝑝
5.16 With a friend you try to apply the Pumping Lemma to {𝜏 ⌢ 𝜏 𝜏 ∈ Σ∗ }.
(a) List five elements of the language.
(b) You pick 𝜎 = a𝑝 ba𝑝 b; go through the argument.
(c) Your friend tries 𝜎 = a𝑝 a𝑝 and can’t get their argument to go. Suggestions?
✓ 5.17 Use the Pumping Lemma to prove that L = { a𝑚−1 cb𝑚 𝑚 ∈ N+ } is not
regular. It may help to first produce five strings from the language.
5.18 Show that the language over { a, b } of strings having more a’s than b’s is not
regular.
5.19 One of these is regular, one is not. Which is which? (Prove your assertions.)
(a) { a𝑛 b𝑚 ∈ { a, b }∗ 𝑛 = 𝑚 2 }
(b) { a𝑛 b𝑚 ∈ { a, b }∗ 3 < 𝑚, 𝑛 }
[State diagram over { a, b } with states 𝑞 0 , 𝑞𝑖1 , 𝑞𝑖4 , 𝑞𝑖5 , 𝑞𝑖8 .]
(a) Produce a Finite State machine with three states that recognizes this language and argue that this is the minimal number of states for such a machine.
(b) Show that the minimal pumping length for L is 1.
Section IV.6 Pushdown machines
No Finite State machine can recognize the language of balanced parentheses. So
this machine model is not powerful enough to, for instance, decide whether input
strings are valid programs in most programming languages. To handle nested
parentheses, the natural data structure is a pushdown stack. We now supplement
a Finite State machine by giving it access to a stack.
A stack is like the restaurant dish dispenser below: when you push a new dish
on, its weight compresses a spring underneath, so that the old ones move down
and the most recent dish is the only one that you can reach. When you pop that
top dish off, the spring pushes the remaining dishes up and now you can reach the
next one. We say that this stack is LIFO: Last-In, First-Out.
Below on the right is a sequence of views of a stack. Initially the stack has two
characters, g3 and g2. We push g1 on the stack, and then g0. Now, although g1 is
on the stack, we don’t have immediate access to it. To get at g1 we must first pop
off g0, as in the last stack shown.
g2     g1     g0     g1
g3     g2     g1     g2
       g3     g2     g3
              g3
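In a program the same discipline is natural with a list used as a stack; this Python fragment (an illustration, not part of the text) replays the snapshots above, with the end of the list as the top.

```python
stack = ['g3', 'g2']     # initially g2 sits on top of g3
stack.append('g1')       # push g1
stack.append('g0')       # push g0; g1 is now buried
top = stack.pop()        # pop returns the most recently pushed character
assert top == 'g0'
assert stack[-1] == 'g1' # g1 is reachable again: Last-In, First-Out
```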
Like a Turing machine tape, a stack provides storage that is unbounded. But
it has restrictions that the tape does not. Once something is popped, it is gone.
We could include in the machine a state whose intuitive meaning is that we have
just popped g0 but as there are finitely many states and unboundedly many stack
arrangements, that strategy has limits.
Section 6. Pushdown machines 227
6.1 Example We will give a Pushdown machine that recognizes the language of
balanced parentheses, LBAL , containing strings such as [] and [[]], as well as
[[][]] and [][]. Precisely stated, 𝜎 ∈ LBAL if it contains the same number of [’s
as ]’s and no prefix of 𝜎 contains more ]’s than [’s.
The Pumping Lemma shows that no Finite State machine recognizes LBAL . But
it is recognized by this Pushdown machine. It has two states 𝑄 = {𝑞 0, 𝑞 1 }, one
of which is an accepting state, 𝐹 = {𝑞 1 }. Its tape alphabet is Σ = { [ , ] } and its
stack alphabet is Γ = { g0 }. The table below gives Δ. Instruction numbers are for
ease of reference.
In a Pushdown machine every computation step begins with the machine popping
the top character off the stack. At the start of the computation that character is ⊥. The machine is then in
state 𝑞 0 , is reading [ on the tape, and the popped character is ⊥, so instruction 0
applies. The machine goes into state 𝑞 0 (which is not a change) and pushes the
two-token string g0⊥ onto the stack. The ⊥ only replaces what was there already,
but the g0 makes a new stack top character.
Here is an example computation accepting the string [[]][].
† Read aloud as “bottom.”
‡ These machines sometimes need to do final work triggered by the end of the input. This doesn’t happen for Finite State machines and so for them we don’t mark the input end in the same way.
Step   State   Tape      Stack
0      𝑞 0     [[]][]B   ⊥
1      𝑞 0     []][]B    g0 ⊥
2      𝑞 0     ]][]B     g0 g0 ⊥
3      𝑞 0     ][]B      g0 ⊥
4      𝑞 0     []B       ⊥
5      𝑞 0     ]B        g0 ⊥
6      𝑞 0     B         ⊥
7      𝑞 1               ⊥
After step 1 there are two g0’s on the stack, which is how the machine remembers
that the number of [’s it has consumed is two more than the number of ]’s. At the
end it has an empty tape and is in an accepting state, so it accepts the input.
Here is a rejection example, whose initial string does not have balanced
parentheses.
Step   State   Tape   Stack
0      𝑞 0     []]B   ⊥
1      𝑞 0     ]]B    g0 ⊥
2      𝑞 0     ]B     ⊥
3      𝑞 0     B      (empty)
At the end, although the tape still has content, the stack is empty. The machine
cannot start the next step by popping the top stack character because there is no
such character. The computation dies, without accepting the input.
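The machine’s stack discipline is easy to mimic in ordinary code. Here is a Python sketch (an illustration; the book’s machine is given by its Δ table, not by this program): push a g0 for each [ , pop for each ] , die on popping an empty stack, and accept when the input ends with the stack back at bottom.

```python
def accepts_balanced(s):
    stack = []
    for ch in s:
        if ch == '[':
            stack.append('g0')       # remember one more unmatched [
        else:                        # ch == ']'
            if not stack:
                return False         # popping an empty stack: the run dies
            stack.pop()
    return not stack                 # accept iff every [ was matched

assert accepts_balanced('[[]][]')
assert not accepts_balanced('[]]')
```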
We are ready for the definition.
In state 𝑞 0 , when this machine sees a on the tape then it pushes g0 onto the stack,
and for b it pushes g1. Reading the c switches the machine to state 𝑞 1 . In this
phase, if it is reading a and the character on top of the stack is g0 then the machine
consumes the a, pops the g0, and goes on. The same happens with b and g1.
Otherwise, the computation dead-ends. Finally, if the machine reaches the end of
the input string at the same moment that it reaches the bottom of the stack then it
goes to the accepting state 𝑞 3 .
Here is an example computation accepting the input bacab.
Step   State   Tape     Stack
0      𝑞 0     bacabB   ⊥
1      𝑞 0     acabB    g1 ⊥
2      𝑞 0     cabB     g0 g1 ⊥
3      𝑞 1     abB      g0 g1 ⊥
4      𝑞 1     bB       g1 ⊥
5      𝑞 1     B        ⊥
6      𝑞 3              ⊥
The machine runs in two phases. Where the input is 𝜎𝜎 R, the first phase works
with 𝜎 . If the tape character is 0 then the machine pushes the token g0 onto the
stack, and if it is 1 then the machine pushes g1. This is done while in state 𝑞 0 .
The second phase works with 𝜎 R. If 0 is on the tape and g0 tops the stack, or 1
and g1, then the machine proceeds. Otherwise there is no matching instruction
and the computation branch dies. This is done while in state 𝑞 1 .
Without a middle marker how does the machine know when to change from
phase one to two, from pushing to popping? It is nondeterministic — it guesses.
That happens in lines 7 and 8. An 𝜀 in an instruction’s input slot means that the machine
can spontaneously transition from 𝑞 0 to 𝑞 1 .
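One way to picture the guessing is to try every possible guess. This Python sketch (an illustration only, not the machine itself) simulates the nondeterminism by trying each split point as the moment the machine jumps from the pushing state 𝑞 0 to the popping state 𝑞 1 ; the input is accepted if any branch succeeds.

```python
def accepts_even_palindrome(s):
    for i in range(len(s) + 1):     # guess: switch phases after i characters
        stack = list(s[:i])         # phase one pushed these, last one on top
        ok = True
        for ch in s[i:]:            # phase two pops and compares
            if stack and stack[-1] == ch:
                stack.pop()
            else:
                ok = False
                break
        if ok and not stack:
            return True             # some branch of the tree accepts
    return False

assert accepts_even_palindrome('0110')
assert not accepts_even_palindrome('100')
```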
We will show three example computations. For the first, we exhibit a successful
branch of the computation tree.
Step   State   Tape   Stack
0      𝑞 0     0110   ⊥
1      𝑞 0     110    g0 ⊥
2      𝑞 1     10     g1 g0 ⊥
3      𝑞 1     0      g0 ⊥
4      𝑞 2            ⊥
First note a point about the input. Because this machine can guess, it can guess
whether the input is finished. (Instruction 11 says that if the machine is in state 𝑞 1
and the stack has only ⊥ then the machine can spontaneously transition to 𝑞 2 ,
which is the only accepting state. If this happens after the input string has run out
then the computation branch succeeds.) So we can omit the terminating B that we
used earlier.
Next is the computation for input 00. The picture below shows the entire
computation tree. The 𝜀 transitions are drawn vertically (note the difference
between the vertical ‘⊢’ and the bottom symbol). The machine accepts the input
because the highlighted branch ends with an empty tape and in the accepting
state 𝑞 2 .
[Computation tree for input 00. The bottom row is the all-push branch 𝑞 0 , ⊥ ⊢ 𝑞 0 , g0⊥ ⊢ 𝑞 0 , g0g0⊥ , via instructions 0 and 3. From each of those configurations an 𝜀 edge via instruction 7 leads up to 𝑞 1 , and from 𝑞 1 with a bare ⊥ an 𝜀 edge via instruction 11 leads to 𝑞 2 . The highlighted accepting branch is 𝑞 0 , ⊥ ⊢ 𝑞 0 , g0⊥ ⊢ 𝑞 1 , g0⊥ ⊢ 𝑞 1 , ⊥ ⊢ 𝑞 2 , ⊥ , using instructions 0, 7, 9, and 11.]
6.5 Animation: Computation tree for 00. Next to the ⊢’s are instruction numbers.
The third example computation is a rejection. The input is 100, which isn’t an
even-length palindrome, and none of the branches end both with an empty string
and in an accepting state.
[Computation tree for input 100. The all-push branch along the bottom is 𝑞 0 , ⊥ ⊢ 𝑞 0 , g1⊥ ⊢ 𝑞 0 , g0g1⊥ ⊢ 𝑞 0 , g0g0g1⊥ , via instructions 1, 4, and 3, with 𝜀 edges branching off to 𝑞 1 and 𝑞 2 along the way. No branch ends with both an empty tape and the accepting state 𝑞 2 .]
Our intuition is that Pushdown machines have more power than Finite State
machines, in that they have a kind of unbounded read/write memory. The prior
examples support that, by showing Pushdown machines that recognize languages
that cannot be recognized by any Finite State machine.
6.7 Remark Stack machine models are often used in practice for running hardware.
Here is a ‘Hello World’ program in the PostScript printer language.
/Courier % name the font
20 selectfont % font size in points, 1/72 of an inch
72 500 moveto % position the cursor
(Hello world!) show % stroke the text
showpage % print the page
The interpreter pushes Courier on the stack, and then on the second line pushes
20 on the stack. It then executes selectfont, which pops two things off the stack
to set the font name and size. After that it moves the current point and places the
text on the page. Finally, it draws that page to paper.
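The operand-stack style of evaluation is easy to sketch. The toy Python evaluator below (an invented illustration; PostScript itself has a far richer object model) handles postfix tokens the same way: operands are pushed, and an operator pops its arguments and pushes its result.

```python
def rpn(tokens):
    stack = []
    for t in tokens:
        if t in ('add', 'mul'):
            b, a = stack.pop(), stack.pop()   # an operator consumes the stack top
            stack.append(a + b if t == 'add' else a * b)
        else:
            stack.append(int(t))              # an operand is pushed
    return stack.pop()

assert rpn(['72', '500', 'add']) == 572
```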
We close this section with a number of related results that together make a
bigger picture, that the machine models form a linear hierarchy. Full coverage is
outside our scope so we will only discuss some of these results without proof.
The first result we have already seen, that deterministic Finite State machines
do the same jobs as nondeterministic ones. That also holds for Turing machines,
although we will not consider nondeterministic Turing machines in depth until the
final chapter.
Another relevant result, which we won’t prove, is that there are things that
Turing machines can do but that no Pushdown machine can do. One is to decide
membership in the language {𝜎 ⌢ 𝜎 𝜎 ∈ B∗ }, which contains strings such as 1010
and 011011. A Pushdown machine can remember the characters by pushing them
onto the stack, and if that machine is nondeterministic then it can guess when the
first half of the input ends. But then to check that the second half of the string
matches the first it would need to pop the characters off to reverse them, and
reversing an arbitrary length string requires being able to write to the tape.
We know that Finite State machines accept Regular languages, and Turing ma-
chines accept computable languages. As to nondeterministic Pushdown machines,
recall that in the section on Grammars we restricted our attention to production
rules where the head consists of a single nonterminal, such as S → aSb.† If a
language has a grammar in which all the rules are of this type then it is a context
free language. Most familiar programming languages are context free, including C,
Java, Python, and Racket. We will state but not prove that a language is accepted
by some nondeterministic Pushdown machine if and only if it has a context free
grammar.
The last result needs deterministic Pushdown machines so we first outline how to
define them. In contradistinction to a nondeterministic machine, in a deterministic
machine at any step there is exactly one legal move. So to adjust the definition we
have for nondeterministic Pushdown machines to one for deterministic ones we
eliminate situations where the machine has choices. There are two situations. One
is that Δ(𝑞𝑖 , 𝑡, 𝑔) is a set and so we will require that in a deterministic machine
that set must have exactly one element. The other is evident in the tree diagrams
above: nondeterministic machines can have that Δ(𝑞𝑖 , 𝜀, 𝑔) is a nonempty set and
also that Δ(𝑞𝑖 , 𝑡, 𝑔) is nonempty for 𝑡 ≠ 𝜀 (see for instance the prior example’s
machine in lines 0–2). So we outlaw the possibility that both are nonempty.
Example 6.1 and Example 6.3 are both deterministic.‡
With that, the last relevant result is that the collection of languages accepted
by deterministic Pushdown machines is a proper subset of the collection accepted
by nondeterministic Pushdown machines. While we won’t prove that, we can give
a good idea of why it is true. We have shown that there is a nondeterministic
Pushdown machine that accepts the language of even-length palindromes. It uses
𝜀 moves to guess when to change from pushing to popping. But a deterministic
Pushdown machine has recourse to no such tactic. Nor is there a middle marker to
rely on. In short, no deterministic Pushdown machine accepts LELP . So Pushdown
machines are different from Finite State machines and Turing machines: for Pushdown machines, nondeterminism adds power.
† An example of a rule where the head is not of that form is cSb → aS. With this rule we can substitute for S only if it is preceded by c and followed by b. A grammar with rules of this type is called context sensitive because substitutions can only be done in a context.
‡ Deterministic Pushdown machines need the end-marker B, which is why we used it for those examples.
IV.6 Exercises
✓ 6.8 Produce a Pushdown machine that does not halt.
6.9 Consider the Pushdown machine in Example 6.1.
(a) With the input [][], step through the computation as a sequence of ⊢ relations.
(b) Do the same but with the input ][][.
✓ 6.10 Produce a Pushdown machine to accept each language over Σ = { a, b, c }.
(a) { a𝑛 cb2𝑛 𝑛 ∈ N } (b) { a𝑛 cb𝑛−1 𝑛 > 0 }
✓ 6.11 Give a Pushdown machine that accepts { 0 ⌢ 𝜏 ⌢ 1 𝜏 ∈ B∗ }.
✓ 6.12 Write a Pushdown machine that accepts { a2𝑛 𝑛 ∈ N }.
6.13 Give a Pushdown machine that accepts { a2𝑛 b𝑛 𝑛 ∈ N }.
✓ 6.14 Example 6.4 discusses the view of a nondeterministic computation as a tree.
Draw the tree for that machine on these inputs. (a) 0110 (b) 010
✓ 6.15 Give a grammar for the language in Example 6.4, the even-length palindromes
over B.
6.16 Use the Pumping Lemma to show that the language of even-length palin-
dromes from Example 6.4 is not recognized by any Finite State machine.
6.17 Fix an alphabet Σ. (a) Show that the set of Pushdown machines over Σ
is countable. (b) Show that the collection of languages accepted by Pushdown
machines is countable. (c) Conclude that there are languages that no Pushdown
machine accepts.
6.18 Use Church’s Thesis to argue that any language recognized by a Pushdown
machine is recognized by some Turing machine.
Extra IV.A Regular expressions in the wild
Regular expressions are an important tool in practice. Modern programming
languages such as Racket and Python include capabilities for extensions to regular
expressions, which we will call regexes. These go beyond the small-scale theory
examples that we saw earlier.
Extra A. Regular expressions in the wild 235
As an example, consider a system administrator searching a web server log for the
PDF’s downloaded from a directory. They might give this command.
$ grep "/linearalgebra/.*\.pdf" /var/log/apache2/access.log
The grep utility program looks through the log file line by line. If a line has a
substring matching the regex then grep prints that line.
We will illustrate with Racket regexes. As a prototype,
> (regexp-match? #px"^[A-Z][A-Z][0-9][A-Z][A-Z]$" "KE1AZ")
returns #t. Note the caret ^ at the start of the string and the dollar sign
at the end. These are anchors, making Racket match the entire string from
start to finish. They are needed because the most common use case is for
programmers to want to find the expression anywhere in the string. Thus for
instance, (regexp-match? #px"[0-9]" "KE1AZ") also returns #t, although
its expression doesn’t account for the letters, because it asks for at least one digit
somewhere in KE1AZ. However, here we will use the caret and dollar sign because
for the purpose of this explication, they better describe the matching.
The extensions that languages make in going
from the theoretical regular expressions that we have
seen earlier to in-practice regexes fall into two cat-
egories. First come convenience constructs that ease
doing something that otherwise would be possible
but awkward. Second comes extensions that give
capabilities that are just not possible with regular
expressions.
Many of the convenience extensions are about the
problem of sheer scale: in the theory discussion our
alphabets had two or three characters but in practice
an alphabet must include at least ASCII’s printable
characters: a – z, A – Z, 0 – 9, space, tab, period, dash,
exclamation point, percent sign, dollar sign, open
and closed parenthesis, open and closed curly braces,
etc. These days it may even contain all of Unicode’s more than one hundred
thousand characters. We need manageable ways to describe such large sets.
Consider matching a digit. The regular expression (0|1|2|3|4|5|6|7|8|9)
works, but is too verbose for an often-needed list. One abbreviation that modern
languages allow is [0123456789], omitting the pipe characters and using square
brackets, which in regexes are metacharacters. Or, because the digit characters
are contiguous in the character set,† we can shorten it further to [0-9]. Along the
same lines, [A-Za-z] matches a singleton English letter.
To invert the set of matched characters, put a caret ‘^’ as the first thing inside
the bracket (and note that it is a metacharacter). Thus, [^0-9] matches a non-digit
and [^A-Za-z] matches a character that is not an ASCII letter.
† The digits 0 through 9 are contiguous in both ASCII and Unicode.
The most common lists have short abbreviations. Another abbreviation for the
digits is \d. Use \D for the ASCII non-digits, \s for the whitespace characters
(space, tab, newline, formfeed, and line return) and \S for ASCII characters that are
non-whitespace. Cover the alphanumeric characters (upper and lower case ASCII
letters, digits, and underscore) with \w and cover the ASCII non-alphanumeric
characters with \W. And — the big kahuna — the dot ‘.’ is a metacharacter that
matches any member of the alphabet at all.†
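The book’s in-practice examples use Racket; the same class syntax also works in Python’s re module, which we can use to spot-check the abbreviations (the particular test strings here are invented for illustration).

```python
import re

assert re.fullmatch(r'[0123456789]', '7')     # explicit list of digits
assert re.fullmatch(r'[0-9]', '7')            # range shorthand
assert re.fullmatch(r'[A-Za-z]', 'Q')         # a single ASCII letter
assert not re.fullmatch(r'[^0-9]', '3')       # leading caret inverts the class
assert re.fullmatch(r'\d\D\s\w\W', '1a b!')   # digit, non-digit, space, word, non-word
assert re.fullmatch(r'..', 'ab')              # the dot matches any character
```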
1.1 Example Canadian postal codes have seven characters: the fourth is a space, the
first, third, and sixth are letters, and the others are digits. The regular expression
[a-zA-Z]\d[a-zA-Z] \d[a-zA-Z]\d describes them.
1.2 Example Dates are often given in the ‘dd/mm/yy’ format. This matches:
\d\d/\d\d/\d\d.
1.3 Example In the twelve hour time format some typical time strings are ‘8:05 am’
or ‘10:15 pm’. You could use this (note the empty string at the start).
(|0|1)\d:\d\d\s(am|pm)
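The three patterns above can be spot-checked with Python’s re module (an illustration; the sample strings are invented).

```python
import re

assert re.fullmatch(r'[a-zA-Z]\d[a-zA-Z] \d[a-zA-Z]\d', 'K1A 0B1')  # postal code
assert re.fullmatch(r'\d\d/\d\d/\d\d', '14/07/89')                  # dd/mm/yy date
twelve_hour = r'(|0|1)\d:\d\d\s(am|pm)'
assert re.fullmatch(twelve_hour, '8:05 am')
assert re.fullmatch(twelve_hour, '10:15 pm')
```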
Quantifiers In the theoretical cases we saw earlier, to match ‘at most one a’ we
used 𝜀 |a. In practice we can write something like (|a), as we did above for the
twelve hour times. But depicting the empty string by just putting nothing there
can be confusing. Modern languages make question mark a metacharacter and
allow you to write a? for ‘at most one a’.
For ‘at least one a’ modern languages use a+, so the plus sign is another
metacharacter. More generally, we often want to specify quantities. For instance,
to match five a’s regexes use the curly braces as metacharacters, with a{5}. Match
between two and five of them with a{2,5} and match at least two with a{2,}.
Thus, a+ is shorthand for a{1,}.
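A quick check of the quantifiers, again in Python’s re module (illustrative only):

```python
import re

assert re.fullmatch(r'ba?c', 'bc') and re.fullmatch(r'ba?c', 'bac')   # at most one a
assert re.fullmatch(r'a+', 'aaa') and not re.fullmatch(r'a+', '')     # at least one a
assert re.fullmatch(r'a{5}', 'aaaaa')                                 # exactly five
assert re.fullmatch(r'a{2,5}', 'aaa') and not re.fullmatch(r'a{2,5}', 'a')
assert re.fullmatch(r'a{2,}', 'a' * 9)                                # at least two
```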
As earlier, to match any of these metacharacters you must escape them. For
instance, To be or not to be\? matches the famous question.
† Programming languages in practice by default have the dot match any character except newline. In addition, these languages have a way to make it also match newline.
Cookbook All of the extensions to regular expressions that we are seeing are
driven by the desires of working programmers. Here is a pile of examples showing
them accomplishing practical work, matching things you’d want to match.
1.4 Example US postal codes, called ZIP codes, are five digits. Match them with
\d{5}.
1.5 Example North American phone numbers match \d{3} \d{3}-\d{4}.
1.6 Example The regex (-|\+)?\d+ matches an integer, positive or negative. The
question mark makes the sign optional. The plus sign makes sure there is at least
one digit.
1.7 Example A natural number represented in hexadecimal can contain the usual
digits, along with the additional characters ‘a’ through ‘f ’ (sometimes capital-
ized). Programmers often prefix such a representation with 0x, so the regex is
(0x)?[a-fA-F0-9]+.
1.8 Example A C language identifier begins with an ASCII letter or underscore and
then can have arbitrarily many more letters, digits, or underscores: [a-zA-Z_]\w*.
1.9 Example Match a user name of between three and twelve letters, digits, under-
scores, or periods with [\w\.]{3,12}. Match a password that is at least eight
characters long with .{8,}.
1.10 Example The International Standards Organization date format calls for dates
like ‘yyyy-mm-dd HH:MM:SS’ (along with many other variants). The regex
\d{4}-\d{2}-\d{2} (\d{2}:\d{2}(:\d{2})?)? will match them.
1.11 Example Match the text inside a single set of parentheses with \([^()]*\).
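The cookbook entries above can all be spot-checked the same way in Python (sample strings invented for illustration):

```python
import re

assert re.fullmatch(r'\d{5}', '05401')                      # ZIP code
assert re.fullmatch(r'\d{3} \d{3}-\d{4}', '802 555-0123')   # phone number
assert re.fullmatch(r'(-|\+)?\d+', '-37')                   # signed integer
assert re.fullmatch(r'(0x)?[a-fA-F0-9]+', '0x1F')           # hexadecimal
assert re.fullmatch(r'[a-zA-Z_]\w*', '_count1')             # C identifier
assert re.fullmatch(r'\([^()]*\)', '(no nesting)')          # parenthesized text
```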
1.12 Example We next match a URL, a web address such as https://hefferon.net/computation. This regex is more intricate than prior ones. It is based on
breaking URL’s into three parts: a scheme such as ‘http’ along with a colon and
two forward slashes, a host such as hefferon.net and a slash, and then a path
such as computation (the standard also allows a trailing query string but this regex
does not handle that).
(https?|ftp)://([^\s/?\.#]+\.?){0,3}[^\s/?\.#]+(/[^\s]*/?)?
Notice the question mark in https?, so that the scheme can be http or https.
Notice also that the host part consists of between one and four fields separated
by periods. We allow almost any character in those fields, except for a space, a
question mark, a period or a hash. At the end comes the path.
But wait! There’s more! We have already noted that you can match the start of a
line and end of line with the metacharacters caret ‘^’ and dollar sign ‘$’.
1.13 Example Match lines starting with ‘Theorem’ using ^Theorem. Match lines ending
with end{equation*} using end{equation\*}$.
Regex engines in modern languages let you specify that the match is case
insensitive, although they differ in the syntax that you use to achieve this.
1.14 Example The web document language HTML has a tag for an image, such
as <img src="logo.jpg">, that uses either of the keys src or img to give the name of
the file containing the image. Those strings can be in upper case or lower case,
or any mix. Racket uses a ‘?i:’ syntax to mark part of the regex as insensitive:
\\s+(?i:(img|src))=. (Note also the double backslash, which is how Racket
escapes the backslash.)
Beyond convenience The regular expression engines that come with recent
programming languages have capabilities beyond matching only those languages
that are recognized by Finite State machines.
1.15 Example The language HTML uses tags such as <b>boldface text</b> and
<i>italicized text</i>. Matching any one tag is straightforward, for instance
<b>[^<]*</b>. But for a single expression that matches them all, you would seem
to have to do each as a separate case and then combine cases with an alternation
operator. However, instead we can have the system remember what it finds at
the start and look for that again at the end. Thus, the regex <([^>]+)>.*</\\1>
matches HTML tags like the ones given. Its second character is an open paren-
thesis, and the \\1 refers to everything between that open parenthesis and the
matching close parenthesis (and, that is not a typo; Racket’s syntax calls for double
backslashes). As is hinted by the 1, you can also have a second match with \\2,
etc.
That is a back reference. It is very convenient. However, it gives regexes more
power than the theoretical regular expressions that we studied earlier.
1.16 Example This is the language of square strings over Σ = { a, b }.
L = {𝜎 ∈ Σ∗ 𝜎 = 𝜏 ⌢ 𝜏 for some 𝜏 ∈ Σ∗ }
Some members are aabaab, baaabaaa, and aa. The Pumping Lemma shows that
the language of squares is not regular; see Exercise A.36. Describe this language
with the regex (.+)\1; note the back-reference.
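Python’s re module supports the same back references, written with a single backslash inside a raw string where Racket’s syntax needs two. A sketch (the sample strings are invented):

```python
import re

square = r'(.+)\1'                 # the group, then the same text again
assert re.fullmatch(square, 'aabaab')      # tau = aab
assert re.fullmatch(square, 'aa')
assert not re.fullmatch(square, 'aba')

tag = r'<([^>]+)>.*</\1>'          # opening tag remembered, reused at the close
assert re.fullmatch(tag, '<b>boldface text</b>')
assert not re.fullmatch(tag, '<b>mismatched</i>')
```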
1.17 Example Another language that the Pumping Lemma shows cannot be represented
using regular expressions, but that can be described with regexes is the language
of numbers that are nonprime, represented in unary, L = { 1𝑛 𝑛 is not prime }.
It is described by the regex ^1?$|^(11+?)\\1+$. A brief explanation: the ^1?$
matches a string that is either zero-many or one-many 1’s. The ^(11+?)\\1+$
matches a group of 1’s repeated one or more times. Being able to divide the number
of 1’s into some number of subgroups is what characterizes a unary number as
composite, that is, not prime.
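We can watch that regex classify small numbers using Python’s re module, whose backtracking behaves the same way here (in Python the back reference is written with a single backslash):

```python
import re

nonprime = re.compile(r'^1?$|^(11+?)\1+$')
matched = [n for n in range(10) if nonprime.match('1' * n)]
# 0 and 1 match the first alternative; 4, 6, 8, and 9 match the second.
assert matched == [0, 1, 4, 6, 8, 9]
```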
Tradeoffs Regexes are powerful tools. But they come with downsides.
For instance, the regular expression for twelve hour time from Example 1.3
(𝜀 |0|1)\d:\d\d\s(am|pm) does indeed match ‘8:05 am’ and ‘10:15 pm’ but it falls
short in some respects. One is that it requires am or pm at the end, but times are
often given without them. We could change the ending to (𝜀 |\s am|\s pm).
Another issue is that it also matches some strings that you don’t want, such as
13:00 am or 9:61 pm. We can solve this as with the prior paragraph, by listing the
cases.† (01|02|...|11|12):(00|01|...|59)(\s am|\s pm). This is like
the prior patch in that it fixes the issue but at a cost of complexity, since it amounts
to a list of allowed substrings. Regexes have a tendency to grow, to accrete subcases
like this.
Another example is the Canadian
postal expression in Example 1.1. Not
every matching string has a correspond-
ing physical post office — for one thing,
no valid codes begin with Z. And US ZIP
codes work the same way; there are fewer
than 50 000 assigned ZIP codes, so many
five-digit strings are not in use. Changing the regexes to cover only those codes
actually in use would make them just lists of strings, which would change frequently.
The canonical example of this is the regex describing the official standard for
valid email addresses. We show here just five lines out of its 81 but that’s enough
to make the point about its complexity.
(?:(?:\r\n)?[ \t])*(?:(?:(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t]
)+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:
\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(
?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[
\t]))*"(?:(?:\r\n)?[ \t])*))*@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\0
And, even if you do have an address that fits the standard, you don’t know if there
is an email server listening at that address. In practice, people often use the regex
\S+@\S+ as a sanity check, for instance on a web form that expects users to input
an email address.
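That pragmatic check can be sketched like this in Python (the sample addresses are made up):

```python
import re

# Accept: one or more non-space characters, an @, one or more non-space characters.
sane = re.compile(r'^\S+@\S+$')

for candidate in ['jim@example.com', 'no-at-sign', '@host', 'user@']:
    print(candidate, '->', bool(sane.match(candidate)))
```

It will accept plenty of strings that are not deliverable addresses, but that is the point: it catches obvious input mistakes cheaply and leaves real validation to the mail system.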
At this point regexes may be starting to seem less like a fast and neat problem-
solver and a little more like a potential development and maintenance problem.
The full story is that sometimes a regex is just what you need for a quick job, and
sometimes they are good for more complex tasks also. But some of the time the
cost of complexity outweighs the gain in expressiveness. This power/complexity
tradeoff is often referred to online by citing this quote from J Zawinski.
The notion that [regexes] are the solution to all problems is . . . braindead. . . .
Some people, when confronted with a problem, think "I know, I’ll use regular
expressions." Now they have two problems.
[Comic courtesy xkcd.com]
IV.A Exercises
✓ A.18 Which of the strings matches the regex ab+c? (a) abc (b) ac (c) abbb
(d) bbc
† Some substrings are elided so it fits in the margins.
240 Chapter IV. Automata
A.19 Which of the strings matches the regex [a-z]+[\.\? !]? (a) battle!
(b) Hot (c) green (d) swamping. (e) jump up. (f) undulate? (g) is.?
✓ A.20 Give a regex for each. (a) Match a string that has ab followed by zero or
more c’s, (b) ab followed by one or more c’s, (c) ab followed by zero or one c,
(d) ab followed by two c’s, (e) ab followed by between two and five c’s, (f) ab
followed by two or more c’s, (g) a followed by either b or c.
✓ A.21 Give a regex to accept a string for each description.
(a) Containing the substring abe.
(b) Containing only upper and lower case ASCII letters and digits.
(c) Containing a string of between one and three digits.
A.22 Give a regex to accept a string for each description. Take the English vowels
to be a, e, i, o, and u.
(a) Starting with a vowel and containing the substring bc.
(b) Starting with a vowel and containing the substring abc.
(c) Containing the five vowels in ascending order.
(d) Containing the five vowels.
A.23 Give a regex matching strings that contain an open square bracket and an
open curly brace.
✓ A.24 Every lot of land in New York City is denoted by a string of digits called BBL,
for Borough (one digit), Block (five digits), and Lot (four digits). Give a regex.
✓ A.25 Example 1.5 gives a regex for North American phone numbers. (a) They
are sometimes written with parentheses around the area code. Extend the regex
to cover this case. (b) Sometimes phone numbers do not include the area code.
Extend to cover this also.
A.26 Most operating systems come with a file that has a list of words, for spell-
checking, etc. For instance, on Linux it may be at /usr/share/dict/words.
Use that file to find how many words fit the criteria. (a) contains the letter a
(b) starts with A (c) contains a or A (d) contains X (e) contains x or X
(f) contains the string st (g) contains the string ing (h) contains an a, and
later a b (i) contains none of the usual vowels a, e, i, o or u (j) contains all the
usual vowels (k) contains all the usual vowels, in ascending order
✓ A.27 Give a regex to accept time in a 24 hour format. It should match times of
the form ‘hh:mm:ss.sss’ or ‘hh:mm:ss’ or ‘hh:mm’ or ‘hh’.
A.28 Give a regex describing a floating point number.
✓ A.29 Give a suitable regex. (a) All Visa card numbers start with a 4. New
cards have 16 digits. Old cards have 13. (b) MasterCard numbers either start
with 51 through 55, or with the numbers 2221 through 2720. All have 16 digits.
(c) American Express card numbers start with 34 or 37 and have 15 digits.
✓ A.30 Postal codes in the United Kingdom have six possible formats. They are:
(i) A11 1AA, (ii) A1 1AA, (iii) A1A 1AA, (iv) AA11 1AA, (v) AA1 1AA, and (vi) AA1A
1AA, where A stands for a capital ASCII letter and 1 stands for a digit. (a) Give a
regex. (b) Shorten it.
✓ A.31 You are stuck on a crossword puzzle. You know that the first letter (of eight)
is a g, the third is an n, and the seventh is an i. You have access to a file that
contains all English words, each on its own line. Give a suitable regex.
A.32 In the Tradeoffs discussion, we change the ending to (𝜀 |\s am|\s pm).
Why not \s(𝜀 |am|pm), which factors out the whitespace?
A.33 Imagine that you decide to avoid regexes but still want to do the sanity
check for email addresses discussed above, of accepting the string if and only if it
consists of a nonempty string of characters, followed by @, followed by a nonempty
string of characters. Implement that as a routine in your favorite language.
✓ A.35 The Roman numerals from grade school use the letters I, V, X, L, C, D, and M
to represent 1, 5, 10, 50, 100, 500, and 1000. They are written in descending order
of magnitude, from M to I, and are written greedily so that we don’t write six I’s
but rather VI. Thus, the date written on the book held by the Statue of Liberty is
MDCCLXXVI, for 1776. Further, we replace IIII with IV, and replace VIIII with IX.
Give a regular expression for valid Roman numerals less than 5000.
A.37 Consider L = { 0𝑛 10𝑛 | 𝑛 > 0 }. (a) Show that it is not regular. (b) Find a
regex.
A.38 In regex golf you are given two lists and must produce a regex that matches
all the words in the first list but none of the words in the second. The ‘golf ’ aspect
is that the person who finds the shortest regex, the one with the fewest characters,
wins. Try these: accept the words in the first list and not the words in the second.
(a) Accept: Arthur, Ester, le Seur, Silverter
Do not accept: Bruble, Jones, Pappas, Trent, Zikle
(b) Accept: alight, bright, kite, mite, tickle
Do not accept: buffing, curt, penny, tart
(c) Accept: afoot, catfoot, dogfoot, fanfoot, foody, foolery, foolish, fooster, footage,
foothot, footle, footpad, footway, hotfoot, jawfoot, mafoo, nonfood, padfoot,
prefool, sfoot, unfool
Do not accept: Atlas, Aymoro, Iberic, Mahran, Ormazd, Silipan, altared,
chandoo, crenel, crooked, fardo, folksy, forest, hebamic, idgah, manlike, marly,
palazzi, sixfold, tarrock, unfold
A.39 In a regex crossword each row and column has a regex. You have to find
strings for those rows and columns that meet the constraints.
(a) Rows: HE|LL|O+ and [PLEASE]+. Columns: [^SPEAK]+ and EP|IP|EF.
(b) Rows: .*M?O.* and (AN|FE|BE). Columns: (A|B|C)\1 and (AB|OE|SK).
Extra IV.B The Myhill-Nerode theorem
We have defined regular languages in terms of Finite State machines. Here we will
give a characterization that instead goes directly to the properties of the language.
Recall that in this chapter’s first section, Remark 1.7 said that the key to
designing Finite State machines is to think of each state as being about its future,
about the input strings to come.
2.1 Definition Fix a language L over an alphabet Σ, along with two strings
𝜎0, 𝜎1 ∈ Σ∗. Then 𝜏 ∈ Σ∗ is a distinguishing extension when of 𝜎0 ⌢ 𝜏 and 𝜎1 ⌢ 𝜏 ,
one is an element of L and the other is not. If such an extension exists then the
strings are L-distinguishable, otherwise they are L-indistinguishable or L-related,
denoted 𝜎0 ∼L 𝜎1 .
2.2 Lemma For any language L, the binary relation ∼L is an equivalence and hence
partitions the universe of all strings into equivalence classes, denoted EL,𝑗 .
Proof This is Exercise B.31’s item (a) .
2.3 Example Let L = {𝜎 ∈ { a, b }∗ | |𝜎 | = 3 }, with 𝜎0 = aa and 𝜎1 = a.† Then 𝜏 = bb
is a distinguishing extension because 𝜎0 ⌢ 𝜏 = aabb ∉ L while 𝜎1 ⌢ 𝜏 = abb ∈ L.
The prior paragraph brings out that for this language two strings are L-
distinguishable if and only if they have different lengths and at least one of them
has length less than four. So the equivalence classes, the collections of
indistinguishable strings, are the length zero strings, EL,0 = {𝜀 }, the length one
strings, EL,1 = { a, b }, those of length two, EL,2 = { aa, ab, ba, bb }, and length
three, EL,3 = { aaa, aab, ... bbb }, along with the longer strings, EL,4 = {𝜎 | |𝜎 | ≥ 4 }.
In the picture below the box is the universe of all strings Σ∗. It is partitioned into
the equivalence classes, with every string a member of one and only one class.
[Diagram: the box Σ∗ partitioned into the five classes EL,0 , EL,1 , EL,2 , EL,3 , and EL,4 .]
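Definition 2.1 suggests a brute-force experiment: search through short strings 𝜏 for one that distinguishes two given strings. This Python sketch does that for Example 2.3's language (the bound max_len and the helper names are ours):

```python
from itertools import product

def in_L(s):
    """Example 2.3's language: strings over {a,b} of length exactly 3."""
    return len(s) == 3

def distinguishing_extension(s0, s1, max_len=5):
    """Return some tau with in_L(s0 + tau) != in_L(s1 + tau), or None."""
    for n in range(max_len + 1):
        for chars in product('ab', repeat=n):
            tau = ''.join(chars)
            if in_L(s0 + tau) != in_L(s1 + tau):
                return tau
    return None  # none found up to max_len; the strings look L-related

# The shortest extension distinguishing aa from a is 'a'
# (Example 2.3's tau = bb also works; any distinguishing extension will do).
print(repr(distinguishing_extension('aa', 'a')))
# Strings of length four or more are mutually indistinguishable (class EL,4).
print(distinguishing_extension('aaaa', 'bbbbb'))  # None
```

Of course a bounded search can only confirm distinguishability; returning None is merely evidence, not proof, that two strings are L-related.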
† Here, 𝜎1 = a is not a character, it is a length one string containing the character a. We won’t worry
too much about the distinction.
Extra B. The Myhill-Nerode theorem 243
2.5 Example The above examples have finitely many ∼L classes. Some languages
have infinitely many. One is L = {𝜎 ∈ { a, b }∗ | 𝜎 = a𝑛 b𝑛 for some 𝑛 ∈ N }, from
Example 5.2. To show that the number of classes is infinite we don’t need to
produce them all; it is enough to produce infinitely many unequal classes (or for
that matter, infinitely many mutually distinguishable strings). We claim that each
singleton set { a𝑛 } is an equivalence class, so there are infinitely many classes.
To verify that these singleton sets are equivalence classes, consider 𝜎 = a𝑛. Since
𝜎 ⌢ b𝑛 ∈ L, the only candidates for strings that are L-indistinguishable from 𝜎
but unequal to it have the form 𝛼 = a𝑛+𝑗 b 𝑗 with 𝑗 > 0. If 𝑛 = 0 then 𝜎 = 𝜀 and
𝛼 = b 𝑗 , and an extension distinguishing 𝜎 from 𝛼 is ab. If 𝑛 > 0 then an extension
distinguishing 𝜎 = a𝑛 from 𝛼 = a𝑛+𝑗 b 𝑗 is ab𝑛+1.
We next make a connection between the Finite State machines that recognize a
language L and the relationship of L-indistinguishability.
2.6 Example This machine recognizes L = {𝜎 ∈ { a, b }∗ | 𝜎 has even length }, the
language of Example 2.4.
[State diagram: the start state 𝑞0 , which is accepting, goes to 𝑞1 on a and to 𝑞2 on b;
𝑞1 and the accepting state 𝑞3 go to each other on both a and b, as do 𝑞2 and the
accepting state 𝑞4 .]
Consider other string inputs, not just the accepted ones, and see where they bring
the machine.
Input string 𝜎 𝜀 a b aa ab ba bb aaa aab aba abb ...
Ending state Δ̂(𝜎) 𝑞0 𝑞1 𝑞2 𝑞3 𝑞3 𝑞4 𝑞4 𝑞1 𝑞1 𝑞1 𝑞1 ...
The collection of input strings breaks into five sets, those that bring the machine
to 𝑞0 , those that bring it to 𝑞1 , etc. This is another kind of partition, which will
prove to be related to, but different from, the partition above. Denote the classes of
this M-related partition with EM,𝑖 = {𝜎 ∈ Σ∗ | Δ̂(𝜎) = 𝑞𝑖 }.
EM,0 = {𝜀 }
EM,1 = { a, aaa, ... }
EM,2 = { b, baa, ... }
EM,3 = { aa, ab, ... }
EM,4 = { ba, bb, ... }
Below we lay the language-related partition, with two parts, on top of the machine-
related one with five.
[Diagram (∗): the part EL,0 contains EM,0 , EM,3 , and EM,4 , while EL,1 contains
EM,1 and EM,2 .]
The M-related parts are subsets of the L-related parts. That is, the M-related
partition is finer than the L-related partition.†
2.7 Definition Let M be a Finite State machine with alphabet Σ. Two strings
𝜎0, 𝜎1 ∈ Σ∗ are M-related if Δ̂(𝜎0 ) = Δ̂(𝜎1 ) , that is, if starting the machine with
input 𝜎0 ends with it in the same state as does starting the machine with input 𝜎1 .
2.8 Lemma The binary relation of M-related is an equivalence and so partitions the
collection of all strings Σ∗ into equivalence classes.
Proof See Exercise B.31’s item (b) .
2.9 Lemma Let M be a deterministic Finite State machine that recognizes L. If two
strings are M-related then they are L-related.
Proof Assume that 𝜎0 and 𝜎1 are M-related, so that starting M with input 𝜎0
causes it to end in the same state as starting it with input 𝜎1 . It follows that for
any suffix 𝜏 , starting M with the input 𝜎0 ⌢ 𝜏 causes it to end in the same state as
does starting it with the input 𝜎1 ⌢ 𝜏 (because the machine is deterministic). In
particular, 𝜎0 ⌢ 𝜏 takes M to an accepting state if and only if 𝜎1 ⌢ 𝜏 does. So the
two strings are L-related.
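We can check Lemma 2.9 empirically on Example 2.6's machine. The transition table below is read off that example's ending-state table, so treat it as an assumption; the code verifies that every EM,𝑖 class sits inside a single parity class of the even-length language:

```python
from itertools import product

# Transitions of Example 2.6's five-state machine (assumed from its table).
DELTA = {('q0', 'a'): 'q1', ('q0', 'b'): 'q2',
         ('q1', 'a'): 'q3', ('q1', 'b'): 'q3',
         ('q2', 'a'): 'q4', ('q2', 'b'): 'q4',
         ('q3', 'a'): 'q1', ('q3', 'b'): 'q1',
         ('q4', 'a'): 'q2', ('q4', 'b'): 'q2'}

def ending_state(s):
    state = 'q0'
    for ch in s:
        state = DELTA[(state, ch)]
    return state

# Group every string of length at most 6 by its ending state: the E_M classes.
classes = {}
for n in range(7):
    for chars in product('ab', repeat=n):
        s = ''.join(chars)
        classes.setdefault(ending_state(s), set()).add(s)

# Each machine class should contain strings of only one parity (one E_L class).
for state, strings in sorted(classes.items()):
    parities = {len(s) % 2 for s in strings}
    assert len(parities) == 1           # the M partition refines the L partition
    print(state, 'strings all have', 'even' if parities == {0} else 'odd', 'length')
```

The check only looks at strings up to length six, which is evidence rather than proof; the lemma itself covers all of Σ∗.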
The EM,𝑖 classes reflect M’s states, in the following sense. Consider 𝑞 1 ’s class
EM,1 = { a, aaa, ... } and 𝑞 3 ’s class EM,3 = { aa, ab, ... }. Just as when the machine
is in 𝑞 1 and reads an a then it transitions to 𝑞 3 , so also if we choose any string
𝜎 ∈ EM,1 and append an a to it then 𝜎 ⌢ a is an element of EM,3 . An example is
that choosing 𝜎 = a ∈ EM,1 and appending a to it gives aa ∈ EM,3 .
This way of thinking of the M classes has them acting as a machine, in that
there are transitions among them just as a machine has. It suggests extending to
also think of the L classes as constituting their own machine. In particular, the
prior paragraph’s workings of the transitions on the EM,𝑖 classes suggest how to
define the transitions in this new machine.
† ‘Finer’ in the sense that sand is finer than gravel.
2.10 Definition Let L be a language over Σ and let the collection of ∼L equivalence
classes EL,𝑖 be 𝐸 . The L-machine has states that are the classes, where the
start state is the one containing 𝜀 . Its accepting states are the ones containing
strings from L. The transition operation, Δ : 𝐸 × Σ → 𝐸 , is: given input EL,𝑖 and
𝑥 ∈ Σ, choose a 𝜎 in the class and then 𝜎 ⌢ 𝑥 is an element of some EL,𝑗 . Set
Δ( EL,𝑖 , 𝑥) = EL,𝑗 .
For instance, the machine for Example 2.6’s two-class language is the two-state
machine in (∗∗) below.
As stated, the definition allows us to choose any string 𝜎 at all from EL,𝑖 . We
must establish that choosing two different string representatives of the input class
does not give two different outputs.
2.11 Lemma Fix a language L. (1) The transition operation is well-defined: if two
strings 𝜎0, 𝜎1 are L-related, 𝜎0 ∼L 𝜎1 , then adjoining a common character 𝑥 ∈ Σ
gives strings that are also L-related, (𝜎0 ⌢𝑥) ∼L (𝜎1 ⌢𝑥) . (2) If one member of a
class is an element of L then every other member of that class is also an element
of L. (3) There is one and only one class containing the empty string.
Proof For the first item, if 𝜎0 ⌢ 𝑥 were not L-related to 𝜎1 ⌢ 𝑥 then the single-
character string 𝑥 would be a distinguishing extension. But 𝜎0 ∼L 𝜎1 so they have
no distinguishing extension.
The second item is: if 𝜎0 ∼L 𝜎1 and 𝜎0 ∈ L but 𝜎1 ∉ L then they are
distinguished by the empty string, which contradicts that they are L-related.
The third item is trivial since for any string, empty or not, there is one and only
one equivalence class containing that string.
2.12 Corollary The L-machine, if it has finitely many states, is a Finite State machine
that recognizes L.
Proof By well-definedness of transitions, starting the L-machine with any 𝜎 ∈ Σ∗
as input will cause the machine to end in the class containing 𝜎 . By the lemma’s
item (2), that is an accepting class of the machine if and only if 𝜎 ∈ L.
2.13 Example Let L = { ab𝑛 | 𝑛 ∈ N } = { a, ab, abb, ... }. We will first find the equiva-
lence classes and then determine the L-machine’s transitions.
There are three classes. First, EL,0 = {𝜀 } because for any nonempty string
𝜎 ∈ { a, b }∗, a distinguishing extension between 𝜀 and 𝜎 is the single-character string a.
The second class is EL,1 = L. To see that any two elements of this set,
ab𝑖 , ab 𝑗 ∈ L, are L-related, suppose that 𝜏 ∈ Σ∗. If 𝜏 has the form b𝑘 then both of
ab𝑖 ⌢𝜏 and ab 𝑗 ⌢𝜏 are members of L, while if 𝜏 has at least one a then both are not
members because they have at least two a’s. It remains to show that if 𝜎0 ∈ L and
𝜎1 ∉ L then they are not L-related. That’s because, as in Lemma 2.11’s item (2),
they have a distinguishing extension of 𝜀 .
The final class contains all remaining strings, EL,2 = Σ∗ − ( EL,0 ∪ EL,1 ) . We will
show that any two elements of this set are L-related (there is no need to argue
that elements are not L-related to strings outside the set because we have already
shown that in the prior paragraphs). An element 𝜎 ∈ EL,2 must have at least one
character and must fall into at least one of two cases: either its first character is
not a, or the rest of the string contains at least one a. In both cases, for any
extension 𝜏 the string 𝜎 ⌢ 𝜏 is not an element of L; that is, there are no
distinguishing extensions.
In summary, the universe of all strings is partitioned by ∼L into these classes.
EL,0 = {𝜀 } EL,1 = { a, ab, abb, ... } EL,2 = { b, aa, ba, bb, aaa, aba, ... }
To compute the transitions, for each class choose one representative element,
append in turn each of a and b, and find the resulting classes. As representatives,
besides 𝜀 ∈ EL,0 we can choose the one-character strings a ∈ EL,1 and b ∈ EL,2 .

Δ       a      b
EL,0    EL,1   EL,2
EL,1    EL,2   EL,1
EL,2    EL,2   EL,2

[Diagram: the three-state L-machine, with start state EL,0 and accepting state EL,1 .]
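Definition 2.10's recipe can be carried out mechanically for this language: pick a representative of each class, append each character, and see which class results. A Python sketch (the class names EL0, EL1, EL2 are ours):

```python
def class_of(s):
    """Which ~L class of Example 2.13 contains s, where L = { a b^n | n in N }."""
    if s == '':
        return 'EL0'
    if s[0] == 'a' and s.count('a') == 1:
        return 'EL1'          # s is a followed by b's, so s is in L
    return 'EL2'              # everything else

representatives = {'EL0': '', 'EL1': 'a', 'EL2': 'b'}

# Definition 2.10: transition by appending the character to a representative.
delta = {(c, x): class_of(representatives[c] + x)
         for c in representatives for x in 'ab'}

for (c, x), target in sorted(delta.items()):
    print(f'{c} --{x}--> {target}')
```

Well-definedness (Lemma 2.11) says the choice of representative does not matter; for instance appending b to the alternative representative ab of EL1 gives abb, which lands in the same class EL1 as appending b to a.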
2.15 Lemma The L-machine is minimal, meaning that from among all the deterministic
Finite State machines that recognize the language L, it has a minimal number of
states.
Proof Let a deterministic Finite State machine M recognize L. Consider two
strings 𝜎0, 𝜎1 ∈ Σ∗. If those two take M to the same state, Δ̂(𝜎0 ) = Δ̂(𝜎1 ) , then
appending any common extension does the same, Δ̂(𝜎0 ⌢ 𝜏) = Δ̂(𝜎1 ⌢ 𝜏) . So
M-related strings are L-related, meaning that each EM class lies inside a single
∼L class. Therefore M has at least as many states as there are ∼L classes, which
is the number of states of the L-machine.
[Example 2.16’s figures: a four-state machine over B with states 𝑞0 through 𝑞3 ,
alongside its three-class L-machine and the transition computation, which includes
𝜀 ∈ EL,0 with 0 ∈ EL,0 and 1 ∈ EL,1 , and 1 ∈ EL,1 with 10 ∈ EL,1 and 11 ∈ EL,2 .]
Of course, the two machines are essentially the same. If we minimize a minimal
machine then we get back pretty much what we started with.
In summary, the Myhill-Nerode construction starts with the desired language L, considers the relation ∼L , and if there are
finitely many associated classes then gives a Finite State machine that recognizes L.
That is, by seeing the patterns inside L given by ∼L , we don’t have to make a
machine recognizing that language, the mathematics gives it to us. For free, this
machine is minimal. This is a deep kind of wizardry — these problems are solved
automatically.
IV.B Exercises
✓ B.17 Use the machine from Example 2.13. (a) What class contains the string bba?
(b) abba? (c) babab? (d) abbbb?
B.18 Use the L-machine from Example 2.13. For each string, identify the string’s
class, and then append the character a and identify the ending class for the
machine’s transition. (a) bba (b) 𝜀 (c) abbb (d) a
✓ B.19 This illustrates the point about ‘well-defined’ in Lemma 2.11’s part (1).
Consider the L-machine of Example 2.16.
(a) Three representative strings from EL,0 are 𝜎0 = 00, 𝜎1 = 11011, and 𝜎2 =
0011111111 = 001⁸. Append 0 to each and name the class of the resulting
string. Verify that all three lead to a single class and that in the machine the
state EL,0 transitions on input 0 to that class.
(b) Using the same three strings, to each append 1 and name the class of the
resulting string. Verify that the three lead to the same class, and that in the
machine the state EL,0 transitions on input 1 to it.
(c) Repeat that for three strings from EL,1 , with both 0 and 1.
✓ B.20 Example 2.4 gives a language L = {𝜎 ∈ { a, b }∗ | 𝜎 has even length } with
two classes. Produce the transition table and arrow diagram for the associated
L-machine.
B.21 Example 2.3 gives a language L = {𝜎 ∈ { a, b }∗ | |𝜎 | = 3 } with five classes.
Produce the L-machine.
✓ B.22 Let L be the set of strings from { a, b }∗ ending in a.
(a) Show that L is an equivalence class, EL,1 , for the relation ∼L .
(b) Show that the complement Lc is also an equivalence class, EL,0 , and therefore
there are exactly two classes.
(c) What is the initial state of the L-machine?
(d) What are the accepting states?
(e) Give the transition table and the diagram.
(f) Which of the strings 𝜀 , a, b, abba, and bba are accepted by this machine?
✓ B.23 For the language L = { a2 b𝑛 | 𝑛 ∈ N } with alphabet Σ = { a, b } this is a
minimal Finite State machine.
[State diagram: 𝑞0 goes to 𝑞1 on a, 𝑞1 goes to 𝑞2 on a, and 𝑞2 goes to 𝑞3 on a;
on b, both 𝑞0 and 𝑞1 go to 𝑞3 while the accepting state 𝑞2 loops; 𝑞3 loops on a,b.]
Extra C. Machine minimization 249
is not regular.
B.31 Recall that a binary relation ∼ is an equivalence if it has three proper-
ties: (1) reflexivity, that 𝑥 ∼ 𝑥 for all 𝑥 , (2) symmetry, that if 𝑥 ∼ 𝑦 then 𝑦 ∼ 𝑥 , and
(3) transitivity, that if 𝑥 ∼ 𝑦 and 𝑦 ∼ 𝑧 then also 𝑥 ∼ 𝑧 . (a) Verify Lemma 2.2.
(b) Verify Lemma 2.8.
B.32 Generalize the first item in Lemma 2.11 to: if two strings 𝜎0, 𝜎1 are L-
related, 𝜎0 ∼L 𝜎1 , then adjoining any common extension 𝛽 gives strings that are
also L-related, (𝜎0 ⌢ 𝛽) ∼L (𝜎1 ⌢ 𝛽) .
B.33 Show that the equivalence classes for the language L are the same as for the
language that is its complement, Lc.
Extra IV.C Machine minimization
Imagine that a person is tasked with ensuring that input for a password form
contains both upper and lower case ASCII characters, and produces the machine
on the left.
[Two state diagrams. The machine on the left has start state 𝑞0 , which goes to 𝑞1
on a lower case character a .. z and to 𝑞2 on an upper case character A .. Z; 𝑞1 loops
on a .. z and goes to 𝑞3 on A .. Z; 𝑞2 loops on A .. Z and goes to 𝑞4 on a .. z; the
accepting states 𝑞3 and 𝑞4 each loop on any character. The machine on the right
has start state 𝑟0 , which goes to 𝑟1 on a .. z and to 𝑟2 on A .. Z; 𝑟1 loops on a .. z
and 𝑟2 loops on A .. Z, and both go on the other case to the accepting state 𝑟3 ,
which loops on any character.]
The machine on the right is better in that it has one fewer state. We will give
an algorithm, Moore’s algorithm (or the table-filling algorithm), that inputs a
deterministic Finite State machine and outputs a deterministic machine that is
minimal, one that from among all of the machines recognizing the same language
has the fewest states.†
It collapses together redundant states so we begin with an example of those.
This recognizes L = {𝜎 ∈ B∗ | 𝜎 has at least one 0 and at least one 1 }.
[State diagram (∗): the start state 𝑞0 goes to 𝑞1 on 0 and to 𝑞2 on 1; 𝑞1 loops on 0
and goes to 𝑞3 on 1; 𝑞2 goes to 𝑞3 on 0 and to 𝑞4 on 1; 𝑞4 loops on 1 and goes
to 𝑞5 on 0; the accepting state 𝑞3 loops on 0 and goes to 𝑞5 on 1; the accepting
state 𝑞5 loops on 0,1.]
In this chapter’s first section, Remark 1.7 recommended designing Finite State
machines by thinking about each state’s future, anticipating inputs to come. In
this machine the future of 𝑞 2 , “waiting for a 0,” is the same as that of 𝑞 4 . Those
states are redundant. Likewise, the future of 𝑞 5 matches that of 𝑞 3 .
To be concrete, this table lists what happens if the machine is started in the
given state and the given string is on the tape. Entries contain ‘+’ if the machine
then ends in an accepting state, and otherwise are blank. The states 𝑞 2 and 𝑞 4
have the same rows, at least for the strings listed, as do the states 𝑞 3 and 𝑞 5 .
        𝜀   0   1   00  01  10  11  ...
𝑞0                      +   +       ...
𝑞1              +       +   +   +   ...
𝑞2          +       +   +   +       ...   (∗∗)
𝑞3      +   +   +   +   +   +   +   ...
𝑞4          +       +   +   +       ...
𝑞5      +   +   +   +   +   +   +   ...
In contrast, 𝑞 0 does not have the same row as any other state, nor does 𝑞 1 .
3.1 Definition Fix a Finite State machine over an alphabet Σ. For two states 𝑞
and 𝑞ˆ, a 𝜎 ∈ Σ∗ is a distinguishing string if starting the machine in 𝑞 with 𝜎 on the
tape and starting it in 𝑞ˆ with 𝜎 on the tape results in two different outcomes: in
one case the machine ends in an accepting state while in the other it rejects. Two
states for which there is a distinguishing string are distinguishable, otherwise they
are indistinguishable, written 𝑞 ∼ 𝑞ˆ.
3.2 Example This is a minimal version of the machine in (∗).
[State diagram: 𝑟0 goes to 𝑟1 on 0 and to 𝑟2 on 1; 𝑟1 loops on 0 and goes to 𝑟3
on 1; 𝑟2 loops on 1 and goes to 𝑟3 on 0; the accepting state 𝑟3 loops on 0,1.]
† Moore’s algorithm is easy to understand and is suitable for small calculations, but when writing code
be aware that another, Hopcroft’s algorithm, is more efficient.
Starting in 𝑟0 and processing the string 𝜎 = 1 ends in a rejecting state, while starting
in 𝑟1 and processing 𝜎 ends in an accepting state. So the two are distinguished
by 𝜎 . The states 𝑟0 and 𝑟2 are distinguished by the length one string 𝜎 = 0, and 𝑟0
is distinguished from 𝑟3 by the empty string. Similarly, the pairs 𝑟1, 𝑟2 and 𝑟1, 𝑟3 ,
and 𝑟2, 𝑟3 are all distinguishable. This minimal machine has no indistinguishable
states.
States that are indistinguishable are redundant. We will compute whether
states are indistinguishable by checking whether they are distinguished by strings
of length 0, or by strings of length 1, etc. Two states 𝑞 and 𝑞ˆ are 𝑛 -distinguishable
if there is a distinguishing string of length at most 𝑛 , otherwise they are 𝑛 -
indistinguishable, denoted 𝑞 ∼𝑛 𝑞ˆ.
Observe that two states 𝑞 and 𝑞ˆ are 0-indistinguishable if and only if both are
accepting states or both are rejecting states.
3.3 Lemma The relations ∼0 , ∼1 , . . . are equivalences, as is ∼, and so partition the
states into equivalence classes: the 0-indistinguishability classes, the
1-indistinguishability classes, . . . along with the ∼ classes.
Proof Exercise C.23.
3.4 Example Here are some 𝑛 -equivalence classes for the machine (∗), using the
information in the table (∗∗).
𝑛 ∼𝑛 classes
0 E0,0 = {𝑞 0, 𝑞 1, 𝑞 2, 𝑞 4 } E0,1 = {𝑞 3, 𝑞 5 }
1 E1,0 = {𝑞 0 } E1,1 = {𝑞 1 } E1,2 = {𝑞 2, 𝑞 4 } E1,3 = {𝑞 3, 𝑞 5 }
2 E2,0 = {𝑞 0 } E2,1 = {𝑞 1 } E2,2 = {𝑞 2, 𝑞 4 } E2,3 = {𝑞 3, 𝑞 5 }
The 0-distinguishable classes divide the rejecting states from the accepting ones.
The 1-distinguishable classes subdivide those, based on the length one strings.
Specifically, starting with E0,0 = {𝑞 0, 𝑞 1, 𝑞 2, 𝑞 4 }, we can next distinguish 𝑞 0 from 𝑞 1
because they differ on the string 1. Similarly, 𝑞 0 differs from 𝑞 2 and 𝑞 4 on 0. And,
𝑞 1 differs from 𝑞 2 and 𝑞 4 on the string 0. So where there was one ∼0 class E0,0
there are now three ∼1 classes, E1,0 = {𝑞 0 }, E1,1 = {𝑞 1 }, and E1,2 = {𝑞 2, 𝑞 4 }.
At the next stage, using the length two strings to find the 2-distinguishability
classes does not result in any further subdivisions.
So the algorithm first finds the ∼0 classes, then finds the ∼1 classes, etc., until
the classes stop splitting. What remains are the ∼ classes, and they serve as states
of a minimal machine.
There is one difficulty. In the prior example, to get the ∼0 classes we checked
all length 0 string inputs, stage 1 checked all length 1 strings, and stage 2 checked
all length 2 strings. If at stage 𝑛 the algorithm were to check all length 𝑛 strings
then it would take exponentially long, because there are 2𝑛 of them.
To fix that, consider states 𝑞 and 𝑞ˆ that are not 𝑛 -distinguishable but are
𝑛 + 1-distinguishable. Write a distinguishing string 𝜏 = ⟨𝑠 0, 𝑠 1, ... 𝑠𝑛−1, 𝑠𝑛 ⟩ , as
𝜏 = 𝛼 ⌢ 𝑠𝑛 . Because 𝑞 and 𝑞ˆ are not 𝑛 -distinguishable, where 𝛼 brings the machine
from 𝑞 to some state 𝑟 and also brings the machine from 𝑞ˆ to some 𝑟ˆ, then 𝑟 and 𝑟ˆ
are equivalent, 𝑟, 𝑟ˆ ∈ E𝑛,𝑖 . Therefore, distinguishing between 𝑞 and 𝑞ˆ must happen
with the final character, 𝑠𝑛 . It must take the machine from state 𝑟 to a state in
one class, and also take the machine from 𝑟ˆ to a state in a different class. In
short, in looking for a split in passing from the 𝑛 -distinguishability classes to the
𝑛 + 1-distinguishability classes, we need only look at single characters.
In summary, Moore’s algorithm is that nodes 𝑞 and 𝑞ˆ are 𝑛 + 1-indistinguishable
if and only if they are 𝑛 -indistinguishable and also Δ(𝑞, 𝑥) is 𝑛 -indistinguishable
from Δ(𝑞,ˆ 𝑥) for all characters 𝑥 ∈ Σ. The next two examples illustrate, and also
show how to use the ∼ classes to make minimal machines.
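The refinement loop just described can be sketched in Python. This is our own rendering of Moore's algorithm, not the book's code; we run it on a transition table consistent with Example 2.6's five-state even-length machine (an assumption on our part), which should collapse to two classes:

```python
def moore_classes(states, alphabet, delta, accepting):
    """Return a dict mapping each state to its ~ class id (Moore's algorithm)."""
    # Stage 0: two states start together iff they agree on acceptance.
    classes = {q: int(q in accepting) for q in states}
    while True:
        # q and q' stay together iff they are together now and every single
        # character x sends them to states that are together now.
        signature = {q: (classes[q],
                         tuple(classes[delta[(q, x)]] for x in alphabet))
                     for q in states}
        ids = {sig: i for i, sig in enumerate(sorted(set(signature.values())))}
        refined = {q: ids[signature[q]] for q in states}
        if len(set(refined.values())) == len(set(classes.values())):
            return refined          # no class split: the partition is stable
        classes = refined

# Example 2.6's machine (transitions assumed from its ending-state table).
delta = {('q0', 'a'): 'q1', ('q0', 'b'): 'q2',
         ('q1', 'a'): 'q3', ('q1', 'b'): 'q3',
         ('q2', 'a'): 'q4', ('q2', 'b'): 'q4',
         ('q3', 'a'): 'q1', ('q3', 'b'): 'q1',
         ('q4', 'a'): 'q2', ('q4', 'b'): 'q2'}
result = moore_classes(['q0', 'q1', 'q2', 'q3', 'q4'], 'ab', delta,
                       accepting={'q0', 'q3', 'q4'})
print(result)   # q0, q3, q4 share one class; q1, q2 share the other
```

Each pass looks at only single characters, per the argument above, so each pass is cheap; and since every pass either splits a class or terminates, there are at most as many passes as states.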
3.5 Example We will find a machine that recognizes the same language as this one
but that is minimal.
[State diagram of the input machine, with start state 𝑞0 and states 𝑞1 , . . . , 𝑞5 ; its
transition function appears as the table ΔM at the end of this example.]
For bookkeeping we will use triangular tables, with an entry for every pair of
different states. We will checkmark pairs that are distinguishable. From stage
to stage we fill in more marks, first doing 0-distinguishability, then refining it to
1-distinguishability, etc. When we reach a stage where the table does not change
then we are done.
Stage 0 is to checkmark the 𝑖, 𝑗 entries that are 0-distinguishable, where one of
𝑞𝑖 and 𝑞 𝑗 is accepting while the other is not.
      𝑞0   𝑞1   𝑞2   𝑞3   𝑞4
𝑞1    ✓
𝑞2    ✓
𝑞3         ✓    ✓
𝑞4         ✓    ✓
𝑞5    ✓              ✓    ✓
Use the blank boxes to read off the ∼0 -equivalence classes, because blankness
means that the states are mutually 0-indistinguishable. For instance, there are blank
boxes in entries 0, 3 and 0, 4 and 3, 4 and this cluster is the first ∼0 equivalence
class. Similarly there is a cluster of blank entries in 1, 2 and 1, 5 and 2, 5.
E0,0 = {𝑞 0, 𝑞 3, 𝑞 4 } E0,1 = {𝑞 1, 𝑞 2, 𝑞 5 }
Stage 1 determines whether those ∼0 classes split. For each pair 𝑞𝑖 , 𝑞 𝑗 that are
together in a class, and for each input character, compute into which classes that
character sends those states.
a b
𝑞 0, 𝑞 3 𝑞1 ∈ E0,1, 𝑞 5 ∈ E0,1 𝑞2 ∈ E0,1, 𝑞 5 ∈ E0,1
𝑞 0, 𝑞 4 𝑞1 ∈ E0,1, 𝑞 5 ∈ E0,1 𝑞2 ∈ E0,1, 𝑞 5 ∈ E0,1
𝑞 3, 𝑞 4 𝑞5 ∈ E0,1, 𝑞 5 ∈ E0,1 𝑞5 ∈ E0,1, 𝑞 5 ∈ E0,1
𝑞 1, 𝑞 2 𝑞3 ∈ E0,0, 𝑞 4 ∈ E0,0 𝑞4 ∈ E0,0, 𝑞 3 ∈ E0,0
𝑞 1, 𝑞 5 𝑞3 ∈ E0,0, 𝑞 5 ∈ E0,1 𝑞4 ∈ E0,0, 𝑞 5 ∈ E0,1
𝑞 2, 𝑞 5 𝑞4 ∈ E0,0, 𝑞 5 ∈ E0,1 𝑞3 ∈ E0,0, 𝑞 5 ∈ E0,1
Two states are distinguished when they are sent by a character to members of
unequal classes. The first case is in the 𝑞 1, 𝑞 5 line, where a takes 𝑞 1 to 𝑞 3 ∈ E0,0
and takes 𝑞 5 to 𝑞 5 ∈ E0,1 . So the ∼0 class containing 𝑞 1 and 𝑞 5 , the class E0,1 , will
split into multiple ∼1 classes. The next line of the computation also shows that 𝑞 2
and 𝑞 5 are distinguishable.
To record that 𝑞 1 and 𝑞 5 are distinguishable, add a checkmark in the triangular
table’s cell for 1, 5. Also add a 2, 5 checkmark.
      𝑞0   𝑞1   𝑞2   𝑞3   𝑞4
𝑞1    ✓
𝑞2    ✓
𝑞3         ✓    ✓
𝑞4         ✓    ✓
𝑞5    ✓    ✓    ✓    ✓    ✓
Finish this stage by getting the ∼1 classes as clusters of blank cells. There is a
cluster in 0, 3 and 0, 4 and 3, 4. There is also a blank cell in 1, 2. There is no cluster
involving 𝑞 5 but since every state must be in some class, it goes in a class by itself.
         a                        b
𝑞0, 𝑞3   𝑞1 ∈ E1,1, 𝑞5 ∈ E1,2    𝑞2 ∈ E1,1, 𝑞5 ∈ E1,2
𝑞0, 𝑞4   𝑞1 ∈ E1,1, 𝑞5 ∈ E1,2    𝑞2 ∈ E1,1, 𝑞5 ∈ E1,2
𝑞3, 𝑞4   𝑞5 ∈ E1,2, 𝑞5 ∈ E1,2    𝑞5 ∈ E1,2, 𝑞5 ∈ E1,2
𝑞1, 𝑞2   𝑞3 ∈ E1,0, 𝑞4 ∈ E1,0    𝑞4 ∈ E1,0, 𝑞3 ∈ E1,0

      𝑞0   𝑞1   𝑞2   𝑞3   𝑞4
𝑞1    ✓
𝑞2    ✓
𝑞3    ✓    ✓    ✓
𝑞4    ✓    ✓    ✓
𝑞5    ✓    ✓    ✓    ✓    ✓
These computations split 𝑞0 from 𝑞3 and from 𝑞4 , and a further stage finds no
more splitting. The algorithm terminates with these ∼ classes.
E0 = {𝑞 0 } E1 = {𝑞 1, 𝑞 2 } E2 = {𝑞 3, 𝑞 4 } E3 = {𝑞 5 }
To define the transitions between states, consider what happens when we feed
the character a to elements of a class, such as the class 𝑟 1 = E1 = {𝑞 1, 𝑞 2 }. For
instance, if we choose 𝑞 1 and look in the original machine then under input a it
goes to 𝑞 3 . Since 𝑞 3 is an element of E2 = 𝑟 2 , in the minimal machine the a arrow
out of 𝑟 1 goes to 𝑟 2 . Other transitions work the same way.
As to the terminology that this algorithm ‘collapses’ together the redundant
states, consider this picture of the prior example, showing a kind of projection.
[Picture: the input machine M drawn above the output machine N , with the
collapsing shown as a projection: 𝑞0 maps to 𝑟0 , both 𝑞1 and 𝑞2 map to 𝑟1 , both
𝑞3 and 𝑞4 map to 𝑟2 , and 𝑞5 maps to 𝑟3 .]

ΔM    a    b              ΔN    a    b
𝑞0    𝑞1   𝑞2             𝑟0    𝑟1   𝑟1
𝑞1    𝑞3   𝑞4             𝑟1    𝑟2   𝑟2
𝑞2    𝑞4   𝑞3             𝑟2    𝑟3   𝑟3
𝑞3    𝑞5   𝑞5             𝑟3    𝑟3   𝑟3
𝑞4    𝑞5   𝑞5
𝑞5    𝑞5   𝑞5
3.6 Example We will minimize the machine below. This illustrates one additional
point of the algorithm since this machine has an unreachable state, 𝑞5 . Start by
omitting it.
[State diagram: the start state 𝑞0 goes to 𝑞1 on 0 and to 𝑞2 on 1; 𝑞1 goes to 𝑞2
on 0 and to 𝑞3 on 1; 𝑞2 loops on 0 and goes to 𝑞4 on 1; the accepting states 𝑞3
and 𝑞4 each loop on 0,1; 𝑞5 is unreachable.]
Stage 0 separates the rejecting states from the accepting ones, giving the classes
E0,0 = {𝑞 0, 𝑞 1, 𝑞 2 } and E0,1 = {𝑞 3, 𝑞 4 }.
For stage 1 check whether those classes split. The calculation shows that 𝑞 0
is distinguished from 𝑞 1 by 1, and 𝑞 0 is distinguished from 𝑞 2 , also by 1. The
triangular table below reflects those updates.
         0                        1
𝑞0, 𝑞1   𝑞1 ∈ E0,0, 𝑞2 ∈ E0,0    𝑞2 ∈ E0,0, 𝑞3 ∈ E0,1
𝑞0, 𝑞2   𝑞1 ∈ E0,0, 𝑞2 ∈ E0,0    𝑞2 ∈ E0,0, 𝑞4 ∈ E0,1
𝑞1, 𝑞2   𝑞2 ∈ E0,0, 𝑞2 ∈ E0,0    𝑞3 ∈ E0,1, 𝑞4 ∈ E0,1
𝑞3, 𝑞4   𝑞3 ∈ E0,1, 𝑞4 ∈ E0,1    𝑞3 ∈ E0,1, 𝑞4 ∈ E0,1

      𝑞0   𝑞1   𝑞2   𝑞3
𝑞1    ✓
𝑞2    ✓
𝑞3    ✓    ✓    ✓
𝑞4    ✓    ✓    ✓
Thus the first ∼0 class splits and these are the ∼1 classes.
E1,0 = {𝑞 0 } E1,1 = {𝑞 1, 𝑞 2 } E1,2 = {𝑞 3, 𝑞 4 }
[State diagram of the minimal machine: 𝑟0 goes to 𝑟1 on both 0 and 1; 𝑟1 loops
on 0 and goes to 𝑟2 on 1; the accepting state 𝑟2 loops on 0,1.]
3.7 Lemma Moore’s algorithm outputs a machine that recognizes the same language
as the input machine and that is minimal.
Proof See Exercise C.24, which verifies that Moore’s algorithm always halts,
that it produces a Finite State machine with a well-defined transition function,† that
this output machine recognizes the same language as the input machine, and that
the output machine is minimal.
3.8 Example As an alternative to the lemma’s whole argument, we will illustrate the
approach to showing minimality. We use the two machines given at the section’s
start (we write ‘a’ for ‘a .. z’ and ‘A’ for ‘A .. Z’). Call the input machine M and
the output N . Consider the union of the two sets of states. Here is the stage 0
table and classes.
† The last paragraph of Example 3.5 describes how to define the transition function of the minimal
machine. It appears that if an input class 𝑟𝑖 has more than one element 𝑞 𝑗 then, depending on which
one we choose, we could get different output classes. We must show that the output is the same no
matter what choice we make.
      𝑞0   𝑞1   𝑞2   𝑞3   𝑞4   𝑟0   𝑟1   𝑟2
𝑞1
𝑞2
𝑞3    ✓    ✓    ✓
𝑞4    ✓    ✓    ✓
𝑟0                    ✓    ✓
𝑟1                    ✓    ✓
𝑟2                    ✓    ✓
𝑟3    ✓    ✓    ✓              ✓    ✓

E0,0 = {𝑞 3, 𝑞 4, 𝑟 3 }   E0,1 = {𝑞 0, 𝑞 1, 𝑞 2, 𝑟 0, 𝑟 1, 𝑟 2 }
Stage 1 looks at pairs from the two ∼0 classes, calculating whether character
transitions split any class.
a A
𝑞 3, 𝑞 4 𝑞 3 ∈ E0,0, 𝑞 4 ∈ E0,0 𝑞 3 ∈ E0,0, 𝑞 4 ∈ E0,0
𝑞 3, 𝑟 3 𝑞 3 ∈ E0,0, 𝑟 3 ∈ E0,0 𝑞 3 ∈ E0,0, 𝑟 3 ∈ E0,0
𝑞 4, 𝑟 3 𝑞 4 ∈ E0,0, 𝑟 3 ∈ E0,0 𝑞 4 ∈ E0,0, 𝑟 3 ∈ E0,0
𝑞 0, 𝑞 1 𝑞 1 ∈ E0,1, 𝑞 1 ∈ E0,1 𝑞 2 ∈ E0,1, 𝑞 3 ∈ E0,0
𝑞 0, 𝑞 2 𝑞 1 ∈ E0,1, 𝑞 4 ∈ E0,0 𝑞 2 ∈ E0,1, 𝑞 2 ∈ E0,1
𝑞 0, 𝑟 0 𝑞 1 ∈ E0,1, 𝑟 1 ∈ E0,1 𝑞 2 ∈ E0,1, 𝑟 2 ∈ E0,1
𝑞 0, 𝑟 1 𝑞 1 ∈ E0,1, 𝑟 1 ∈ E0,1 𝑞 2 ∈ E0,1, 𝑟 3 ∈ E0,0
𝑞 0, 𝑟 2 𝑞 1 ∈ E0,1, 𝑟 3 ∈ E0,0 𝑞 2 ∈ E0,1, 𝑟 2 ∈ E0,1
𝑞 1, 𝑞 2 𝑞 1 ∈ E0,1, 𝑞 4 ∈ E0,0 𝑞 3 ∈ E0,0, 𝑞 4 ∈ E0,0
𝑞 1, 𝑟 0 𝑞 1 ∈ E0,1, 𝑟 1 ∈ E0,1 𝑞 3 ∈ E0,0, 𝑟 2 ∈ E0,1
𝑞 1, 𝑟 1 𝑞 1 ∈ E0,1, 𝑟 1 ∈ E0,1 𝑞 3 ∈ E0,0, 𝑟 3 ∈ E0,0
𝑞 1, 𝑟 2 𝑞 1 ∈ E0,1, 𝑟 3 ∈ E0,0 𝑞 3 ∈ E0,0, 𝑟 2 ∈ E0,1
𝑞 2, 𝑟 0 𝑞 4 ∈ E0,0, 𝑟 1 ∈ E0,1 𝑞 2 ∈ E0,1, 𝑟 2 ∈ E0,1
𝑞 2, 𝑟 1 𝑞 4 ∈ E0,0, 𝑟 1 ∈ E0,1 𝑞 2 ∈ E0,1, 𝑟 3 ∈ E0,0
𝑞 2, 𝑟 2 𝑞 4 ∈ E0,0, 𝑟 3 ∈ E0,0 𝑞 2 ∈ E0,1, 𝑟 2 ∈ E0,1
𝑟 0, 𝑟 1 𝑟 1 ∈ E0,1, 𝑟 1 ∈ E0,1 𝑟 2 ∈ E0,1, 𝑟 3 ∈ E0,0
𝑟 0, 𝑟 2 𝑟 1 ∈ E0,1, 𝑟 3 ∈ E0,0 𝑟 2 ∈ E0,1, 𝑟 2 ∈ E0,1
𝑟 1, 𝑟 2 𝑟 1 ∈ E0,1, 𝑟 3 ∈ E0,0 𝑟 3 ∈ E0,0, 𝑟 2 ∈ E0,1
There are many splits, for example 𝑞 0 and 𝑞 1 are distinguished by A. In the resulting
triangular table there are six empty boxes, at 𝑞 0, 𝑟 0 , at 𝑞 1, 𝑟 1 , at 𝑞 2, 𝑟 2 , at 𝑞 3, 𝑞 4 , at
𝑞 3, 𝑟 3 , and at 𝑞 4, 𝑟 3 . We conclude that E0,0 does not split but that E0,1 splits into
three.
[Triangular table after stage 1, with a check marking each distinguished pair.]
E1,0 = {𝑞 3, 𝑞 4, 𝑟 3 }    E1,1 = {𝑞 0, 𝑟 0 }    E1,2 = {𝑞 1, 𝑟 1 }    E1,3 = {𝑞 2, 𝑟 2 }
The next stage shows no more splittings. The algorithm has grouped 𝑞 0 with 𝑟 0 ,
and 𝑞 1 with 𝑟 1 , and 𝑞 2 with 𝑟 2 . It has also grouped 𝑞 3 and 𝑞 4 together with 𝑟 3 .
Extra C. Machine minimization 257
It is not surprising that starting with the minimal machine N and performing
the algorithm results in one state 𝑟𝑖 per ∼ class. It is also not surprising that the
states of M can end with multiple 𝑞 𝑗 ’s in a class, since we have seen that in earlier
examples. But the point is that for each 𝑟𝑖 there is at least one associated 𝑞 𝑗 , and
therefore N has a number of states that is less than or equal to the number in M.
We close by describing a common scenario in which minimization plays an
important role. We have seen that when we have a problem to solve with
a Finite State machine, often a nondeterministic machine is easier and more
natural. An example is an algorithm that inputs a regular expression and outputs
a machine recognizing that expression. But our algorithm for converting a
nondeterministic machine to a deterministic machine has the problem that where
the nondeterministic machine has 𝑛 states, the deterministic machine can have 2^𝑛 . This
section’s result alleviates that exponential blow-up. We now have a three-step
process: from a problem, we start with a nondeterministic answer, convert that to
an equivalent deterministic machine, and then minimize to get a reasonably-sized
final answer. In practice, this gives good results.
IV.C Exercises
[Triangular table over states 0, ..., 5 for a preceding exercise, with checks marking distinguished pairs.]
C.10 From the ∼𝑖 classes find the associated triangular table. (a) E𝑖,0 = {𝑞 0, 𝑞 1 },
E𝑖,1 = {𝑞 2 }, and E𝑖,2 = {𝑞 3, 𝑞 4 }, (b) E𝑖,0 = {𝑞 0 }, E𝑖,1 = {𝑞 1, 𝑞 2, 𝑞 4 }, and
E𝑖,2 = {𝑞 3 }, (c) E𝑖,0 = {𝑞 0, 𝑞 1, 𝑞 5 }, E𝑖,1 = {𝑞 2, 𝑞 3 }, and E𝑖,2 = {𝑞 4 }.
✓ C.11 Suppose that E0,0 = {𝑞 0, 𝑞 1, 𝑞 2, 𝑞 5 } and E0,1 = {𝑞 3, 𝑞 4 }, and we compute
this table.
a b
𝑞 0, 𝑞 1 𝑞1 ∈ E0,0, 𝑞 1 ∈ E0,0 𝑞2 ∈ E0,0, 𝑞 3 ∈ E0,1
𝑞 0, 𝑞 2 𝑞1 ∈ E0,0, 𝑞 2 ∈ E0,0 𝑞2 ∈ E0,0, 𝑞 4 ∈ E0,1
𝑞 0, 𝑞 5 𝑞1 ∈ E0,0, 𝑞 5 ∈ E0,0 𝑞2 ∈ E0,0, 𝑞 5 ∈ E0,0
𝑞 1, 𝑞 2 𝑞1 ∈ E0,0, 𝑞 2 ∈ E0,0 𝑞3 ∈ E0,1, 𝑞 4 ∈ E0,1
𝑞 1, 𝑞 5 𝑞1 ∈ E0,0, 𝑞 5 ∈ E0,0 𝑞3 ∈ E0,1, 𝑞 5 ∈ E0,0
𝑞 2, 𝑞 5 𝑞2 ∈ E0,0, 𝑞 5 ∈ E0,0 𝑞4 ∈ E0,1, 𝑞 5 ∈ E0,0
𝑞 3, 𝑞 4 𝑞3 ∈ E0,1, 𝑞 4 ∈ E0,1 𝑞5 ∈ E0,0, 𝑞 5 ∈ E0,0
(a) Which states are 1-distinguishable that were not 0-distinguishable? (b) Give
the resulting ∼1 classes.
✓ C.12 This machine accepts strings with odd parity, that is, with an odd number of 1’s.
Minimize it using the algorithm described in this section.
[Diagram: states 𝑞 0 , 𝑞 1 , 𝑞 2 , each with a self-loop on 0 and with transitions between states on 1.]
C.13 For many machines we can find the unreachable states by eye, but there
is an algorithm. It inputs a machine M and initializes the set of reachable states
to 𝑅0 = {𝑞 0 }. For 𝑛 > 0, step 𝑛 of the algorithm is: for each 𝑞 ∈ 𝑅𝑛 find all
states 𝑞ˆ reachable from 𝑞 in one transition and add those to make 𝑅𝑛+1 . That
is, 𝑅𝑛+1 = 𝑅𝑛 ∪ { 𝑞ˆ = ΔM (𝑞, 𝑥) | 𝑞 ∈ 𝑅𝑛 and 𝑥 ∈ Σ }. The algorithm stops when
𝑅𝑘 = 𝑅𝑘+1 and the set of reachable states is 𝑅 = 𝑅𝑘 . The unreachable states are
the others, 𝑄 − 𝑅 . For each machine, perform this algorithm.
[Diagrams for machines (a) and (b), over the alphabet { a, b }.]
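The loop just described translates directly into code. This sketch runs on a small made-up machine, not one of the exercise’s, in which 𝑞 2 is unreachable from the start state.

```python
def reachable_states(delta, start, alphabet):
    """Compute the reachable states by iterating
    R_{n+1} = R_n ∪ { delta[(q, x)] : q in R_n, x in alphabet }
    until the set stabilizes, as in the exercise's description."""
    R = {start}
    while True:
        R_next = R | {delta[(q, x)] for q in R for x in alphabet}
        if R_next == R:
            return R
        R = R_next

# A made-up machine in which q2 is unreachable from q0.
delta = {('q0', 'a'): 'q1', ('q0', 'b'): 'q0',
         ('q1', 'a'): 'q1', ('q1', 'b'): 'q0',
         ('q2', 'a'): 'q0', ('q2', 'b'): 'q2'}
states = {'q0', 'q1', 'q2'}
R = reachable_states(delta, 'q0', {'a', 'b'})
print(sorted(R))            # the reachable states
print(sorted(states - R))   # the unreachable states
```

The loop must stop: each stage either adds a state or repeats the prior set, and there are only finitely many states.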
✓ C.14 Perform the minimization algorithm on the machine with redundant states
at the start of this section, the one labeled (∗).
✓ C.15 This machine accepts strings described by (ab|ba)*. Minimize it, using the
algorithm of this section.
[Diagram: an eight-state machine 𝑞 0 , ..., 𝑞 7 over the alphabet { a, b }.]
C.16 If a machine’s start state is accepting, must the minimized machine’s start
state be accepting? If so then prove it, and if not then give an example machine
where it is false.
C.17 Minimize.
[Diagram: a five-state machine 𝑞 0 , ..., 𝑞 4 over the alphabet { 0, 1 }.]
C.18 Minimize.
[Diagram: a six-state machine 𝑞 0 , ..., 𝑞 5 over the alphabet { a, b }, together with residue of a second machine diagram over { 0, 1 }.]
Note that the algorithm takes time that is roughly equal to the number of
states in the machine.
C.23 Verify Lemma 3.3. (a) Verify that each ∼𝑛 is an equivalence relation
on the states. (b) Verify that ∼ is an equivalence relation.
C.24 We will verify that Moore’s algorithm halts on any input machine M and
outputs an N that recognizes the same language, and that is minimal.
(a) Prove that the algorithm always halts.
(b) Prove that the transition function of N is well-defined.
(c) Verify that the two machines recognize the same language.
(d) Show that N is minimal: any M̂ that recognizes the same language as N has at
least as many states as N . Hint: first follow Example 3.8. Then do an argument
by induction that shows that the start states of the two are indistinguishable,
and if two states 𝑞 and 𝑟 are indistinguishable then so are the states they
transition to on a single-character input. This gives an association of single 𝑟 ’s
with at least one 𝑞 , and so there are at least as many 𝑞 ’s as 𝑟 ’s.
C.25 There are ways to minimize Finite State machines other than the one given
in this section. One is Brzozowski’s algorithm, which has the advantage of being
surprising and fun in that you perform some steps that seem a bit wacky and
unrelated to elimination of states and then at the end it has worked. (However,
it has the disadvantage of taking worst-case exponential time.) We will not go
through why it works, but we will walk through the algorithm using this Finite
State machine, M.
[Diagram of M: states 𝑞 0 , 𝑞 1 , 𝑞 2 over the alphabet { a, b }.]
Chapter V
Computational Complexity
Earlier, we asked what can be done with a mechanism at all. This mirrors the
subject’s history: when the Theory of Computing began there were no physical com-
puters. Researchers were driven by considerations such as the Entscheidungsproblem.
The subject was interesting, the questions compelling, and there were plenty of
problems, but the initial phase had a theory-driven feel.
A natural next step is to ask how to do jobs efficiently. When physical computers
became widely available, that’s exactly what happened. Today, the Theory of
Computing has incorporated many questions that at least originate in applied fields,
and that need answers that are feasible.
We will review how we determine the practicality of algorithms, the order of
growth of functions. Then we will see a collection of the kinds of problems that
drive the field today. By the end of this chapter we will be at the research frontier
and we will state some things without proof, as well as discuss some things about
which we are not sure. In particular, we will consider the celebrated question of P
versus NP.
Section
V.1 Big O
We begin by reviewing the definition of the order of growth of functions. We will
study this because it measures how algorithms consume computational resources.
First, an anecdote. Here is a grade school multiplication.
678
× 42
1356
2712
28476
The algorithm combines each digit of the multiplier 42 with each digit of the
multiplicand 678, in a nested loop. A person could sensibly feel that this is the
right way to compute multiplication — indeed, the only reasonable way — and that
in general, to multiply two 𝑛 digit numbers requires about 𝑛 2 -many operations.
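The nested loop can be written out explicitly. This is a sketch, with numbers represented as lists of decimal digits, most significant first; the two loops pair every digit of one factor with every digit of the other, which is the source of the roughly 𝑛^2 operation count.

```python
def grade_school_multiply(a_digits, b_digits):
    """Multiply two numbers given as lists of decimal digits
    (most significant first) by the nested-loop method: each digit
    of one factor is combined with each digit of the other."""
    n, m = len(a_digits), len(b_digits)
    result = [0] * (n + m)
    # Nested loop: about n*m digit-by-digit products.
    for i, a in enumerate(reversed(a_digits)):
        for j, b in enumerate(reversed(b_digits)):
            result[i + j] += a * b
    # Propagate carries.
    for k in range(len(result) - 1):
        result[k + 1] += result[k] // 10
        result[k] %= 10
    digits = list(reversed(result))
    while len(digits) > 1 and digits[0] == 0:
        digits.pop(0)
    return digits

print(grade_school_multiply([6, 7, 8], [4, 2]))  # [2, 8, 4, 7, 6]
```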
Image: Striders can walk on water because they are five orders of magnitude smaller than us. This
change of scale changes the world — bugs see surface tension as more important than gravity. Similarly,
finding an algorithm that changes the time that it takes to solve a problem from 𝑛 2 to 𝑛 · lg 𝑛 can make
something easy that was previously not practical.
[Graph of √𝑛 for 𝑛 up to 1 000.]
However, for large 𝑛 the value √𝑛 is much bigger than 10 lg (𝑛) . For instance,
√( 1 000 000) = 1 000 while 10 lg ( 1 000 000) ≈ 199.32.
[Graph comparing √𝑛 with 10 lg (𝑛) for 𝑛 up to 1 000 000.]
Thus the first criterion is that Big O must focus on what happens in the long run.
†
See the Theory of Computing blog feed at https://theory.report (Various authors 2017). ‡ We
write lg (𝑛) for log2 (𝑛) . That is, compute lg (𝑛) by finding the power of 2 that produces 𝑛 , so if 𝑛 = 8
then lg (𝑛) = 3, while if 𝑛 = 10 then lg (𝑛) ≈ 3.32. # These graphs show functions where the domain
is the real numbers. Turing machines are discrete devices so it may seem more natural to have the
domain be the natural numbers. But we will see that real functions are much more convenient for
complexity measures.
The second criterion is more subtle. The next four examples illustrate it.
1.1 Example These graphs compare 𝑓 (𝑛) = 𝑛 2 + 5𝑛 + 6 with 𝑔(𝑛) = 𝑛 2 . The graph
on the right compares them in ratio, 𝑓 /𝑔.
[Graphs: on the left, 𝑓 and 𝑔 for 𝑛 up to 20; on the right, their ratio 𝑓 /𝑔.]
On the left we are struck that 𝑛 2 + 5𝑛 + 6 is ahead of 𝑛 2 . But on the right the
ratios show that this is misleading. For large 𝑛 ’s, 𝑓 ’s 5𝑛 and 6 are swamped by the
𝑛 2 . Consequently in the long run these two functions track together — by far the
biggest ingredient in their behavior is that they are both quadratic.
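A quick numeric check of the claim: evaluating the ratio 𝑓 /𝑔 at growing inputs shows it settling toward 1.

```python
f = lambda n: n**2 + 5*n + 6
g = lambda n: n**2

# The ratio starts well above 1 but approaches 1 as n grows.
for n in [10, 100, 10_000, 1_000_000]:
    print(n, f(n) / g(n))
```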
1.2 Example Next compare the quadratic 𝑔(𝑛) = 𝑛 2 + 5𝑛 + 6 with the cubic 𝑓 (𝑛) =
𝑛 3 + 2𝑛 + 3. In contrast to the prior example, these two don’t track together.
Initially 𝑔 is larger, with 𝑔( 0) = 6 > 𝑓 ( 0) = 3 and 𝑔( 1) = 12 > 𝑓 ( 1) = 6. But
then the cubic accelerates ahead of the quadratic, so much that at the scale of the
image, the graph of 𝑔 doesn’t rise much above the axis.
[Graphs: 𝑓 and 𝑔 for 𝑛 up to 20, and their ratio 𝑓 /𝑔.]
[Graphs for the next example: 𝑓 and 𝑔, and their ratio 𝑓 /𝑔.]
This example differs from Example 1.1 in that in the long run 𝑓 stays ahead of 𝑔
and also gains in an absolute sense, because 𝑓 ’s dominant term 2𝑛 2 is twice as
large as 𝑔’s 𝑛 2 . So it may appear that we should view 𝑔’s rate as less than 𝑓 ’s.
However, unlike in Example 1.2, 𝑓 does not accelerate away. Instead, the ratio
between the two is bounded. We will take 𝑔 to be equivalent to 𝑓 .
1.4 Example We close the motivation with a very important example. Let the function
bits : N → N give the number of bits needed to represent its input in binary. The
bottom line of this table shows lg (𝑛) , the power to which 2 must be raised to produce 𝑛 .
Input 𝑛 0 1 2 3 4 5 6 7 8 9
Binary 0 1 10 11 100 101 110 111 1000 1001
bits (𝑛) 1 1 2 2 3 3 3 3 4 4
lg (𝑛) – 0 1 1.58 2 2.32 2.58 2.81 3 3.17
Here is a graph of bits (𝑛) , the table’s third line, for 𝑛 ∈ { 1, ... 30 }.
[Graph of bits (𝑛) for 1 ≤ 𝑛 ≤ 30.]
The relationship between the third and fourth lines is that bits (𝑛) = 1 + ⌊ lg (𝑛)⌋ ,
except for the boundary value that bits ( 0) = 1 (lg ( 0) is undefined). The graph
below compares bits (𝑛) with lg (𝑛) . Note the change in the horizontal and vertical
scales.
[Graph comparing bits (𝑛) with lg (𝑛) for 𝑛 up to 100.]
This illustrates that in the formula bits (𝑛) = 1 + ⌊ lg (𝑛)⌋ , over the long run the
‘1+’ and the floor don’t matter much. A reasonable summary is that the base 2
logarithm, lg 𝑛 , describes the number of bits required to represent the number 𝑛 .
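The relationship bits (𝑛) = 1 + ⌊ lg (𝑛)⌋ is easy to confirm mechanically; in this sketch Python’s built-in bin gives the binary representation, prefixed by the two characters '0b'.

```python
import math

def bits(n):
    """Number of characters in the binary representation of n."""
    return len(bin(n)) - 2  # drop the '0b' prefix; note bits(0) == 1

# Check the identity bits(n) = 1 + floor(lg(n)) for positive n.
for n in range(1, 1000):
    assert bits(n) == 1 + math.floor(math.log2(n))
print(bits(0), bits(9), bits(1000))  # 1 4 10
```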
Further, the formula for converting among logarithmic functions with other
bases, log𝑐 (𝑥) = log𝑏 (𝑥)/log𝑏 (𝑐) , shows that they differ only by the constant factor
1/log𝑏 (𝑐) . As Example 1.3 notes, with the function comparison definition given
below we will disregard constant factors. So even the base does not matter —
another reasonable summary is that the number of bits is “a” logarithmic function.
Definition Machine resource sizes, such as the number of bits of the input and of
memory, are natural numbers. So to describe the performance of algorithms we
may think to focus on functions that input and output natural numbers. However,
above we already found a useful function, lg, that inputs and outputs reals.
So instead we will consider a subset of the functions from R to R.†
1.5 Definition A complexity function 𝑓 is one that inputs real number arguments
and outputs real number values, and (1) has an unbounded domain, so that there
is a number 𝑁 ∈ R+ such that 𝑥 ≥ 𝑁 implies that 𝑓 (𝑥) is defined, and (2) is
eventually nonnegative, so that there is a number 𝑀 ∈ R+ so that 𝑥 ≥ 𝑀 implies
that 𝑓 (𝑥) ≥ 0.
1.6 Definition Let 𝑔 be a complexity function. Then Big O of 𝑔, O (𝑔) , is the set of
complexity functions 𝑓 satisfying that there are constants 𝑁 , 𝐶 ∈ R+ so that if
𝑥 ≥ 𝑁 then both 𝑔(𝑥) and 𝑓 (𝑥) are defined and 𝐶 · 𝑔(𝑥) ≥ 𝑓 (𝑥) . We say that 𝑓
is O (𝑔) , or that 𝑓 ∈ O (𝑔) , or that 𝑓 is of order at most 𝑔, or that 𝑓 = O (𝑔) .
1.7 Remarks (1) We use the letter ‘O’ because this is about the order of growth.
(2) The term ‘complexity function’ is not standard but we find it convenient. (3) The
‘ 𝑓 = O (𝑔) ’ notation is very common, but awkward. It does not follow the usual
rules of equality; for instance, 𝑓 = O (𝑔) does not allow us to write ‘O (𝑔) = 𝑓 ’.
Another is that 𝑥 = O (𝑥^2 ) and 𝑥^2 = O (𝑥^2 ) together do not imply that 𝑥 = 𝑥^2 .
(4) Some authors do something a little different, they allow negative real outputs
and write the inequality with absolute values, 𝑓 (𝑥) ≤ 𝐶 · |𝑔(𝑥)| . (5) Sometimes
you see ‘ 𝑓 is O (𝑔) ’ stated as ‘ 𝑓 (𝑥) is O (𝑔(𝑥)) ’. Speaking strictly, this is wrong
because 𝑓 (𝑥) and 𝑔(𝑥) are numbers, not functions.
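To make Definition 1.6 concrete: for 𝑓 (𝑥) = 𝑥^2 + 5𝑥 + 6 and 𝑔(𝑥) = 𝑥^2 , the constants 𝐶 = 2 and 𝑁 = 6 witness that 𝑓 is O (𝑔) , since 2𝑥^2 ≥ 𝑥^2 + 5𝑥 + 6 exactly when 𝑥^2 ≥ 5𝑥 + 6, which holds for 𝑥 ≥ 6. A spot-check in code:

```python
C, N = 2, 6
f = lambda x: x**2 + 5*x + 6
g = lambda x: x**2

# Spot-check the defining inequality C·g(x) >= f(x) for x >= N.
for x in range(N, 1000):
    assert C * g(x) >= f(x)
print("f is O(g), witnessed by C = 2, N = 6")
```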
Think of ‘ 𝑓 is O (𝑔) ’ as meaning that 𝑓 ’s growth rate is less than or equal to 𝑔’s
rate. The sketches below illustrate the two possibilities.
[Two sketches of 𝑓 and 𝑔, each marking the point 𝑁 beyond which the comparison holds.]
1.16 Figure: Each bean holds the complexity functions. Faster growing functions are
higher, so that if they were shown then 𝑥 5 would be above 𝑥 4. On the left is the cone
O (𝑔) for some 𝑔. The ellipse at the top is Θ(𝑔) , holding functions with growth rate
equivalent to 𝑔’s. The sketch on the right adds the cone O (𝑓 ) for some 𝑓 in O (𝑔) .
The next result eases Big O calculations for most of the functions that we
encounter, such as polynomial, exponential, and logarithmic functions.
1.17 Theorem Let 𝑓 , 𝑔 be complexity functions. Suppose that lim𝑥→∞ 𝑓 (𝑥)/𝑔(𝑥)
exists and equals 𝐿 , which is a member of R ∪ { ∞ }.
(a) If 𝐿 = 0 then 𝑔 grows faster than 𝑓 , that is, 𝑓 is O (𝑔) but 𝑔 is not O (𝑓 ) .†
(b) If 𝐿 = ∞ then 𝑓 grows faster than 𝑔, so that 𝑔 is O (𝑓 ) but 𝑓 is not O (𝑔) .‡
(c) If 𝐿 is between 0 and ∞ then the two functions have something like the same
growth rates, so that 𝑓 is Θ(𝑔) and 𝑔 is Θ(𝑓 ) .#
It pairs well with the following result familiar from Calculus I.
1.18 Theorem (L’Hôpital’s Rule) Let 𝑓 and 𝑔 be complexity functions such that both
𝑓 (𝑥) → ∞ and 𝑔(𝑥) → ∞ as 𝑥 → ∞, and such that both are differentiable for
large enough inputs. If lim𝑥→∞ 𝑓 ′ (𝑥)/𝑔′ (𝑥) exists and equals 𝐿 ∈ R ∪ {∞} then
lim𝑥→∞ 𝑓 (𝑥)/𝑔(𝑥) also exists and also equals 𝐿 .
1.19 Example Let 𝑓 (𝑥) = 𝑥 2 + 5𝑥 + 6 and 𝑔(𝑥) = 𝑥 3 + 2𝑥 + 3. Here we apply L’Hôpital’s
Rule multiple times.
lim_{𝑥→∞} 𝑓 (𝑥)/𝑔(𝑥) = lim_{𝑥→∞} (𝑥^2 + 5𝑥 + 6)/(𝑥^3 + 2𝑥 + 3) = lim_{𝑥→∞} ( 2𝑥 + 5)/( 3𝑥^2 + 2) = lim_{𝑥→∞} 2/6𝑥 = 0
So 𝑓 is O (𝑔) but 𝑔 is not O (𝑓 ) . That is, 𝑓 ’s growth rate is strictly less than 𝑔’s.
1.20 Example Next consider 𝑓 (𝑥) = 3𝑥 2 + 4𝑥 + 5 and 𝑔(𝑥) = 𝑥 2.
lim_{𝑥→∞} ( 3𝑥^2 + 4𝑥 + 5)/𝑥^2 = lim_{𝑥→∞} ( 6𝑥 + 4)/2𝑥 = lim_{𝑥→∞} 6/2 = 3
So their growth rates are roughly the same. That is, 𝑓 is Θ(𝑔) .
1.21 Example For 𝑓 (𝑥) = 5𝑥 4 + 15 and 𝑔(𝑥) = 𝑥 2 − 3𝑥 , this
lim_{𝑥→∞} ( 5𝑥^4 + 15)/(𝑥^2 − 3𝑥) = lim_{𝑥→∞} 20𝑥^3/( 2𝑥 − 3) = lim_{𝑥→∞} 60𝑥^2/2 = ∞
shows that 𝑓 ’s growth rate is strictly greater than 𝑔’s rate — 𝑔 is O (𝑓 ) but 𝑓 is
not O (𝑔) .
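The three limits above can also be checked numerically; this sketch evaluates each ratio at a large input, which is not a proof but agrees with the computed limits.

```python
# Ratios from Examples 1.19, 1.20, and 1.21 at a large input.
x = 10.0**6
r19 = (x**2 + 5*x + 6) / (x**3 + 2*x + 3)   # tends to 0
r20 = (3*x**2 + 4*x + 5) / x**2             # tends to 3
r21 = (5*x**4 + 15) / (x**2 - 3*x)          # grows without bound

print(r19, r20, r21)
```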
†
This case is denoted 𝑓 is 𝑜 (𝑔) , read aloud as “Little Oh of 𝑔.” ‡ We also denote ‘𝑔 is O (𝑓 ) ’ by 𝑓 is
Ω (𝑔) , read aloud as “Big Omega of 𝑔.” # If 𝐿 = 1 then 𝑓 and 𝑔 are asymptotically equivalent.
1.22 Example The logarithmic function 𝑓 (𝑥) = log𝑏 (𝑥) grows very slowly: log𝑏 (𝑥)
is O (𝑥) , and log𝑏 (𝑥) is O (𝑥 0.1 ) , and is O (𝑥 0.01 ) . In fact by this equation, for any
𝑑 > 0 no matter how small, log𝑏 (𝑥) is O (𝑥 𝑑 ) and 𝑥 𝑑 is not O ( log𝑏 (𝑥)) .
lim_{𝑥→∞} log𝑏 (𝑥)/𝑥^𝑑 = lim_{𝑥→∞} ( 1/(𝑥 ln (𝑏)))/(𝑑𝑥^(𝑑−1) ) = ( 1/(𝑑 ln (𝑏))) · lim_{𝑥→∞} 1/𝑥^𝑑 = 0
The difference in growth rates is even more marked than that. L’Hôpital’s Rule,
along with the Chain Rule, gives that ( log𝑏 (𝑥))^2 is also O (𝑥^𝑑 ) : two applications
of the rule reduce the limit of ( log𝑏 (𝑥))^2 /𝑥^𝑑 to a constant multiple of
lim_{𝑥→∞} 1/𝑥^𝑑 , which is zero.
1.23 Example We can compare the polynomial function 𝑓 (𝑥) = 𝑥 2 with the exponential
function 𝑔(𝑥) = 2𝑥 .
lim_{𝑥→∞} 2^𝑥 /𝑥^2 = lim_{𝑥→∞} ( 2^𝑥 · ln ( 2))/2𝑥 = lim_{𝑥→∞} ( 2^𝑥 · ( ln ( 2))^2 )/2 = ∞
Thus 𝑓 ∈ O (𝑔) but 𝑔 ∉ O (𝑓 ) . Induction gives that lim𝑥→∞ 2𝑥 /𝑥 𝑘 = ∞ for any 𝑘 .
1.24 Lemma Logarithmic functions grow more slowly than polynomial functions: if
𝑓 (𝑥) = log𝑏 (𝑥) for some base 𝑏 and 𝑔(𝑥) = 𝑎𝑚 𝑥 𝑚 + · · · + 𝑎 0 then 𝑓 is O (𝑔)
but 𝑔 is not O (𝑓 ) . Polynomial functions grow more slowly than exponential
functions: where ℎ(𝑥) = 𝑏^𝑥 for some base 𝑏 > 1, 𝑔 is O (ℎ) but ℎ is not
O (𝑔) .
We’ve defined complexity functions as mapping R to R, rather than the more
natural N to N. (One motivation is that some functions that we want to work with
are real functions, such as logarithms. Another is that L’Hôpital’s Rule, which uses
the derivative and so needs reals, is a big convenience.) The next result ensures
that our conclusions in the continuous context carry over to the discrete.
1.25 Lemma Let 𝑓0, 𝑓1 : R → R, and consider the restrictions to a discrete domain
𝑔0 = 𝑓0 ↾N and 𝑔1 = 𝑓1 ↾N . Where 𝐿 ∈ R ∪ { ∞ },
(a) for 𝑎 ∈ R, if 𝐿 = lim𝑥→∞ (𝑎𝑓0 ) (𝑥) then 𝐿 = lim𝑛→∞ (𝑎𝑔0 ) (𝑛)
(b) if 𝐿 = lim𝑥→∞ (𝑓0 + 𝑓1 ) (𝑥) then 𝐿 = lim𝑛→∞ (𝑔0 + 𝑔1 ) (𝑛) ,
(c) if 𝐿 = lim𝑥→∞ (𝑓0 · 𝑓1 ) (𝑥) then 𝐿 = lim𝑛→∞ (𝑔0 · 𝑔1 ) (𝑛) , and
(d) when the expressions are defined, if 𝐿 = lim𝑥→∞ (𝑓0 /𝑓1 ) (𝑥) then 𝐿 =
lim𝑛→∞ (𝑔0 /𝑔1 ) (𝑛) .
Tractable and intractable The table below lists orders of growth that are most
common in practice.
Order                   Name                     Examples
O ( 1)                  Bounded                  𝑓 (𝑛) = 15
O ( lg ( lg (𝑛)))       Double logarithmic       𝑓 (𝑛) = ln ( ln (𝑛))
O ( lg (𝑛))             Logarithmic              𝑓0 (𝑛) = ln (𝑛) , 𝑓1 (𝑛) = lg (𝑛^3 )
O (( lg (𝑛))^𝑐 )        Polylogarithmic          𝑓 (𝑛) = ( lg (𝑛))^3
O (𝑛)                   Linear                   𝑓 (𝑛) = 3𝑛 + 4
O (𝑛 lg (𝑛))            Log-linear               𝑓0 (𝑛) = 5𝑛 lg (𝑛) + 𝑛 , 𝑓1 (𝑛) = lg (𝑛 !)
O (𝑛^2 )                Polynomial (quadratic)   𝑓 (𝑛) = 5𝑛^2 + 2𝑛 + 12
O (𝑛^3 )                Polynomial (cubic)       𝑓 (𝑛) = 2𝑛^3 + 12𝑛^2 + 5
  ...
O ( 2^poly ( lg (𝑛) ) ) Quasipolynomial          𝑓0 (𝑛) = 2^( ( lg (𝑛))^2 + 3 lg (𝑛) ) , 𝑓1 (𝑛) = 𝑛^( lg (𝑛) )
  ...
O ( 2^𝑛 )               Exponential              𝑓 (𝑛) = 10 · 2^𝑛
O ( 3^𝑛 )               Exponential              𝑓 (𝑛) = 6 · 3^𝑛 + 𝑛^2
  ...
O (𝑛 !)                 Factorial                𝑓 (𝑛) = 5 · 𝑛 ! + 𝑛^15 − 7
O (𝑛^𝑛 )                –No standard name–       𝑓 (𝑛) = 2 · 𝑛^𝑛 + 3 · 2^𝑛
1.26 Table: The order of growth hierarchy.
We often draw a line in this hierarchy after the polynomial functions; the next
table shows why. It lists how long a job would take if we used an algorithm that
runs in time lg 𝑛 , time 𝑛 , etc. (A modern computer runs at 10 GHz, 10 000 million
ticks per second, and there are 3.16 × 107 seconds in a year.)
         𝑛 = 1           𝑛 = 10          𝑛 = 50          𝑛 = 100
lg 𝑛     –               1.05 × 10^−17   1.79 × 10^−17   2.11 × 10^−17
𝑛        3.17 × 10^−18   3.17 × 10^−17   1.58 × 10^−16   3.17 × 10^−16
𝑛 lg 𝑛   –               1.05 × 10^−16   8.94 × 10^−16   2.11 × 10^−15
𝑛^2      3.17 × 10^−18   3.17 × 10^−16   7.92 × 10^−15   3.17 × 10^−14
𝑛^3      3.17 × 10^−18   3.17 × 10^−15   3.96 × 10^−13   3.17 × 10^−12
2^𝑛      6.34 × 10^−18   3.24 × 10^−15   3.57 × 10^−3    4.02 × 10^12
1.27 Table: Time taken in years by algorithms whose behavior is given by a few func-
tions, on a few size 𝑛 ’s.
In the 𝑛 = 100 column, between the first few rows the relative change is an
order of magnitude but the absolute times are small. Then we get to the final
row. That’s not a typo — the last entry really is on the order of 10^12 years. It is
huge — the universe is 14 × 10^9 years old, so this computation, even with an input
size of only 100, would take longer than the age of the universe. Exponential growth is
very, very much larger than polynomial growth.
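The table’s entries are simple to recompute, using the text’s figures of 10^10 ticks per second and 3.16 × 10^7 seconds per year.

```python
TICKS_PER_SECOND = 10**10        # 10 GHz, as in the text
SECONDS_PER_YEAR = 3.16 * 10**7

def years(steps):
    """Years needed for an algorithm that takes the given number of steps."""
    return steps / (TICKS_PER_SECOND * SECONDS_PER_YEAR)

print(years(100**3))   # cubic at n = 100: about 3.17e-12 years
print(years(2**100))   # exponential at n = 100: about 4.0e12 years
```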
They do the same thing but their run times are different. On the left g0 sets the
local variable x inside the loop. That makes the code on the left slower than the
right by four calculations. Big O disregards this constant time difference. Big O is
good for comparing running times among algorithms but not as good for comparing
running times among programs.
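The listings discussed in this paragraph are not shown here, so the pair below is a hypothetical reconstruction of the idea: both functions return the same value, but g0 redoes a computation inside the loop. The difference is a constant factor, which Big O disregards.

```python
def g0(n):
    total = 0
    for i in range(n):
        x = 2 * 3 * 5 * 7    # recomputed on every iteration
        total += x
    return total

def g1(n):
    x = 2 * 3 * 5 * 7        # computed once, before the loop
    total = 0
    for i in range(n):
        total += x
    return total

assert g0(1000) == g1(1000)  # same output; both are O(n)
```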
That fits with our second point about Big O. We use it to help pick the best
algorithm, to rank them according to how much they use of some computing
resources. But algorithms are tied to an underlying computing model.‡
Besides the Turing machine, another model that is widely used in this context
is the Random Access machine (RAM). Whereas a Turing machine cell stores only
a single symbol, so that big numbers need multiple cells, on a RAM model machine
each register holds an entire integer. And whereas to get to a cell a Turing machine
may spend lots of steps traversing the tape, the RAM model gets each register’s
†
Cobham’s Thesis is not universally accepted. Some researchers object that if an algorithm runs in time
𝐶𝑛𝑘 but with an enormous 𝑘 or an enormous 𝐶 , or both, then the algorithm is not practical. A rejoinder
to that objection notes a pattern that when someone announces an algorithm with a large exponent or
large constant then typically the approach gets refined over time, shrinking the number. In any event,
polynomial time is significantly better than exponential time. Here we accept Cobham’s thesis because
it gives technical meaning to the informal ‘tractable’. ‡ More discussion of the relationship between
algorithms and machine models is in Section 3.
†
Authors do sometimes state the order of magnitude of these constants.
𝑓 (𝑛) = 𝑛 !  if 𝑛 is a power of ten
𝑓 (𝑛) = 𝑛    otherwise
This machine runs in superexponential time for rare inputs (called “black holes”).
The definition gives that overall this machine runs in time O (𝑛 !) , while for most
inputs it would be quite fast.†
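A sketch of this black-hole function; is_power_of_ten is a helper written for the illustration.

```python
import math

def is_power_of_ten(n):
    """True for 1, 10, 100, 1000, ..."""
    while n >= 10 and n % 10 == 0:
        n //= 10
    return n == 1

def f(n):
    # Superexponential on the rare 'black hole' inputs, fast otherwise.
    return math.factorial(n) if is_power_of_ten(n) else n

print(f(9), f(10), f(11))  # 9 3628800 11
```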
V.1 Exercises
1.29 True or false: if a function is O (𝑛 2 ) then it is O (𝑛 3 ) .
✓ 1.30 Your classmate emails you a draft of an assignment answer that says, “I have
an algorithm with running time that is O (𝑛 2 ) . So with input 𝑛 = 5 it will take
25 ticks.” Make two corrections.
1.31 Suppose that someone posts to a group that you are in, “I’m working on a
problem that is O (𝑛 3 ) .” Explain to them, gently, how their sentence is mistaken.
✓ 1.32 How many bits does it take to express each number in binary? (a) 5 (b) 50
(c) 500 (d) 5 000
✓ 1.33 One is true, the other one is not. Which is which?
(a) If 𝑓 is O (𝑔) then 𝑓 is Θ(𝑔) .
(b) If 𝑓 is Θ(𝑔) then 𝑓 is O (𝑔) .
✓ 1.34 For each find the function on the order of growth hierarchy, Table 1.26,
that has the same rate of growth. (a) 𝑛 2 + 5𝑛 − 2 (b) 2𝑛 + 𝑛 3 (c) 3𝑛 4 − lg lg 𝑛
(d) lg 𝑛 + 5
1.35 For each give the function on the order of growth hierarchy, Table 1.26, that
has the same rate of growth. That is, find 𝑔 in that table where 𝑓 is Θ(𝑔) .
(a) 𝑓 (𝑛) = 𝑛 if 𝑛 < 100, and 𝑓 (𝑛) = 0 otherwise
†
A real life example of such a thing is that the simplex algorithm, which is very widely used for linear
optimization, runs in exponential time in the worst case but typically seems to run in polynomial time.
1.48 Where does 𝑔(𝑥) ≤ 𝑥^O ( 1 ) place the function 𝑔 in the order of growth
hierarchy? Hint: see the prior question.
1.49 Let 𝑓 (𝑥) = 2𝑥 and 𝑔(𝑥) = 𝑥 2 . Prove directly from Definition 1.6 that 𝑓
is O (𝑔) , but that 𝑔 is not O (𝑓 ) .
1.50 Prove that 2𝑛 is O (𝑛 !) . Hint: because of the factorial, consider these natural
number functions and find suitable 𝑁 , 𝐶 ∈ N.
1.51 Use L’Hôpital’s Rule as in Example 1.22 to verify these for any 𝑑 ∈ R+:
(a) ( log𝑏 (𝑥)) 3 is O (𝑥 𝑑 ) (b) for any 𝑘 ∈ N+ , ( log𝑏 (𝑥))𝑘 is O (𝑥 𝑑 ) .
1.52 Assume that 𝑔 : R → R is increasing, so that 𝑥 1 ≥ 𝑥 0 implies that 𝑔(𝑥 1 ) ≥
𝑔(𝑥 0 ) . Let 𝑓 : R → R be a constant function. Show that 𝑓 is O (𝑔) .
1.53 (a) Show that there is a computable function whose output values grow at a
rate that is O ( 1) , one whose values grow at a rate that is O (𝑛) , one for O (𝑛 2 ) , etc.
(b) The Halting problem function 𝐾 is uncomputable. Place its rate of growth
in the order of growth hierarchy, Table 1.26. (c) Produce a function that is not
computable because its output values are larger than those of any computable
function. (You need not show that the rate of growth is greater, only that the
outputs are larger.)
1.54 Show that 𝑥^( lg 𝑥 ) is quasipolynomial.
1.55 Show that the quasipolynomial function 𝑓 (𝑥) = 𝑥^( lg 𝑥 ) grows faster than any
polynomial but slower than any exponential function.
✓ 1.56 Show that O ( 2^𝑥 ) ⊆ O ( 3^𝑥 ) but O ( 2^𝑥 ) ≠ O ( 3^𝑥 ) .
1.57 Table 1.26 states that 𝑛 ! grows slower than 𝑛^𝑛 . (a) Verify this. Hint: although
𝑛 ! is a natural number function, Theorem 1.17 still applies. (b) Stirling’s formula
is that 𝑛 ! ≈ √( 2𝜋𝑛) · (𝑛^𝑛 /𝑒^𝑛 ) . Doesn’t this imply that 𝑛 ! is Θ(𝑛^𝑛 ) ?
✓ 1.58 Two complexity functions 𝑓 , 𝑔 are asymptotically equivalent, 𝑓 ∼ 𝑔, if
lim𝑥→∞ (𝑓 (𝑥)/𝑔(𝑥)) = 1. Show that each pair is asymptotically equivalent:
(a) 𝑓 (𝑥) = 𝑥 2 + 5𝑥 + 1 and 𝑔(𝑥) = 𝑥 2 , (b) lg (𝑥 + 1) and lg (𝑥) .
1.59 Is there an 𝑓 so that O (𝑓 ) is the set of all polynomials?
1.60 There are orders of growth between polynomial and exponential. Specifically,
𝑓 (𝑥) = 𝑥^( lg 𝑥 ) is one. (a) Show that lg (𝑥) ∈ O (( lg (𝑥))^2 ) but ( lg (𝑥))^2 ∉ O ( lg (𝑥)) .
(b) Argue that for any power 𝑘 , we have 𝑥^𝑘 ∈ O (𝑥^( lg 𝑥 ) ) but 𝑥^( lg 𝑥 ) ∉ O (𝑥^𝑘 ) .
Hint: take the ratio, rewrite using 𝑎 = 2^( lg 𝑎 ) , and consider the limit of the exponent.
(c) Show that 𝑥^( lg 𝑥 ) = 2^( ( lg 𝑥 )^2 ) . Hint: take the logarithm of both halves. (d) Show
that 𝑥^( lg 𝑥 ) is in O ( 2^𝑥 ) . Hint: form the ratio using the prior item.
1.61 Verify the clauses of Lemma 1.12. (a) If 𝑎 ∈ R+ then 𝑎𝑓 is also O (𝑔) .
(b) The function 𝑓0 + 𝑓1 is O (𝑔) , where 𝑔 is defined by 𝑔(𝑛) = max (𝑔0 (𝑛), 𝑔1 (𝑛)) .
(c) The product 𝑓0 𝑓1 is O (𝑔0𝑔1 ) .
1.62 Verify these clauses of Lemma 1.15. (a) The Big-O relation is reflexive.
(b) It is also transitive.
1.63 Assume that 𝑓 and 𝑔 are complexity functions. (a) Let lim𝑥→∞ 𝑓 (𝑥)/𝑔(𝑥)
exist and equal 0. Show that 𝑓 is O (𝑔) . (Hint: this requires a rigorous defini-
tion of the limit.) (b) We can give an example where 𝑓 is O (𝑔) even though
lim𝑥→∞ 𝑓 (𝑥)/𝑔(𝑥) does not exist. Verify that, where 𝑔(𝑥) = 𝑥 and where 𝑓 (𝑥) = 𝑥
when ⌊𝑥⌋ is odd and 𝑓 (𝑥) = 2𝑥 when ⌊𝑥⌋ is even.
1.64 Prove Lemma 1.24.
Section
V.2 A problem miscellany
Much of today’s work in the Theory of Computation is driven by problems that
originate outside of the subject. We will describe some of these problems to get a
sense of the ones that people work on and also to use for examples and exercises.
All of these problems are well-known to anyone in the field.
Problems, with stories We start with a few that come with stories.
These stories are fun and an important part of the culture, and they also
give a sense of where, in general, problems come from.
WR Hamilton was a polymath whose genius was recognized early,
and he was given a sinecure as Astronomer Royal of Ireland. He made
important contributions to classical mechanics, where his reformulation
of Newtonian mechanics is now called Hamiltonian mechanics. Other
work of his in physics helped develop classical field theories such as
electromagnetism and laid the groundwork for the development of
quantum mechanics. In mathematics, he is best known as the inventor
of the quaternion number system.
[Portrait: William Rowan Hamilton, 1805–1865]
One of his ventures was a game, Around the World. The vertices in
the graph below were holes in a wooden board, labeled with the names of world
cities. Players put pegs in the holes, looking for a circuit that visits each city once
and only once.
It did not make Hamilton rich. But it did get him associated with a great problem.
2.2 Problem (Hamiltonian Circuit) Given a graph, decide if it contains a cyclic path
that includes each vertex once and only once.
A special case is the Knight’s Tour problem, to use a chess knight to make a
circuit of the squares on the board. (Recall that a knight moves three squares
at a time, with the first two squares in one direction and then the third one
perpendicular to that direction.)
[Figure: a knight’s tour of the board.]
This is the solution given by L Euler. In graph terms, there are sixty-four vertices,
representing the board squares. An edge goes between two vertices if they are
connected by a single knight move. Knight’s Tour asks for a Hamiltonian circuit of
that graph.
Hamiltonian Circuit has another famous variant.
2.3 Problem (Traveling Salesman) Given a weighted undirected graph, where we call
the vertices 𝑆 = {𝑐 0 , ... , 𝑐 𝑘−1 } ‘cities’ and we call the edge weight 𝑑 (𝑐𝑖 , 𝑐 𝑗 ) ∈ N+
for 𝑖 ≠ 𝑗 the ‘distance’ between the cities, find the shortest-distance circuit that
visits every city and returns to the start.
We can start with a map of the state capitals of the
forty-eight contiguous US states and the distances between
them: Montpelier VT to Albany NY is 254 kilometers, etc.
From among all trips that visit each city and return to
the start, such as Montpelier → Albany → Harrisburg →
· · · → Montpelier, we want the shortest one.
[Map courtesy xkcd.com]
As stated, this is an optimization problem. However we
can recast it as a decision problem. Introduce a parameter bound 𝐵 ∈ N and
change the problem statement to ‘decide if there is a circuit of total distance less
than 𝐵 ’. If we had an algorithm to quickly solve this decision problem then we
could also solve the optimization problem: ask whether there is a trip of total
distance less than 𝐵 = 1, then less than 𝐵 = 2, etc. When we eventually
get a ‘yes’, we know the length of the shortest trip.
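The reduction just described can be sketched in code. The brute-force decision procedure below is only sensible for tiny instances, and the four-city distance matrix is made up for the illustration.

```python
from itertools import permutations

def circuit_within_bound(dist, B):
    """Decision problem: is there a circuit visiting every city
    with total distance strictly less than B?  Brute force over
    all orderings, so only for small instances."""
    cities = list(range(len(dist)))
    for order in permutations(cities[1:]):
        tour = [cities[0], *order, cities[0]]
        total = sum(dist[a][b] for a, b in zip(tour, tour[1:]))
        if total < B:
            return True
    return False

def shortest_circuit_length(dist):
    """Optimization by repeated decision queries: B = 1, 2, 3, ...
    With the strict bound, the first 'yes' at B means the optimum
    is B - 1."""
    B = 1
    while not circuit_within_bound(dist, B):
        B += 1
    return B - 1

# A made-up four-city instance.
dist = [[0, 1, 4, 2],
        [1, 0, 1, 5],
        [4, 1, 0, 1],
        [2, 5, 1, 0]]
print(shortest_circuit_length(dist))  # 5
```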
The next problem sounds much like Hamiltonian Circuit, in that
it involves exhaustively traversing a graph. But it proves to act very
differently.
Today the city of Kaliningrad is a Russian enclave between Poland
and Lithuania. But in 1727 it was in Prussia and was called Königsberg.
The Pregel river divides the city into four areas, connected by seven
bridges. The citizens used to promenade, to take leisurely walks or
drives where they could see and be seen. The question arose: can a
person cross each bridge once and only once, and arrive back at the
start? No one could think of a way, but no one could think of a reason
that there was no way. A local mayor wrote to Euler, who proved that
no circuit is possible. This paper founded Graph Theory.
[Portrait: Leonhard Euler, 1707–1783]
[Figures: a map of the city, Euler’s summary sketch in the middle, and the corresponding graph on the right.]
2.4 Problem (Euler Circuit) Given a graph, find a circuit that traverses each edge
once and only once, or find that no such circuit exists.
Next is a problem that sounds hard. But all of us see it solved every day, for
instance when we ask our smartphone for the shortest route to some place.
2.5 Problem (Shortest Path) Given a weighted graph and two vertices, find the
least-weight path between them, or find that no path exists.
There is an algorithm that solves this problem quickly.† For instance, with the
graph below we could look for the path from 𝐴 to 𝐹 of least cost.
[Diagram: a weighted graph with vertices 𝐴 , 𝐵 , 𝐶 , 𝐷 , 𝐸 , 𝐹 and edge weights 2, 6, 7, 9, 9, 10, 11, 14, 15.]
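The classic quick algorithm for this problem is Dijkstra’s; here is a sketch. The edge list below is one reading of the figure’s weights, so treat it as illustrative.

```python
import heapq

def dijkstra(graph, start, goal):
    """Least-cost path by Dijkstra's algorithm: repeatedly settle
    the unvisited vertex with the smallest tentative distance."""
    dist = {start: 0}
    prev = {}
    pq = [(0, start)]
    visited = set()
    while pq:
        d, u = heapq.heappop(pq)
        if u in visited:
            continue
        visited.add(u)
        if u == goal:
            break
        for v, w in graph.get(u, []):
            nd = d + w
            if nd < dist.get(v, float('inf')):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(pq, (nd, v))
    path = [goal]
    while path[-1] != start:
        path.append(prev[path[-1]])
    return dist[goal], path[::-1]

# One reading of the figure (the exact edges are a guess).
edges = [('A','B',7), ('A','C',9), ('A','D',14), ('B','C',10),
         ('B','E',15), ('C','D',2), ('C','E',11), ('D','F',9),
         ('E','F',6)]
graph = {}
for u, v, w in edges:                 # build the undirected graph
    graph.setdefault(u, []).append((v, w))
    graph.setdefault(v, []).append((u, w))

print(dijkstra(graph, 'A', 'F'))      # (20, ['A', 'C', 'D', 'F'])
```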
the conjecture. But he did make the problem famous by promoting it among his
friends.
𝑃 𝑄 𝑅 𝑃 ∨𝑄 𝑃 ∨ ¬𝑄 ¬𝑃 ∨ 𝑄 ¬𝑃 ∨ ¬𝑄 ∨ ¬𝑅 𝑓 (𝑃, 𝑄, 𝑅)
𝐹 𝐹 𝐹 𝐹 𝑇 𝑇 𝑇 𝐹
𝐹 𝐹 𝑇 𝐹 𝑇 𝑇 𝑇 𝐹
𝐹 𝑇 𝐹 𝑇 𝐹 𝑇 𝑇 𝐹
𝐹 𝑇 𝑇 𝑇 𝐹 𝑇 𝑇 𝐹
𝑇 𝐹 𝐹 𝑇 𝑇 𝐹 𝑇 𝐹
𝑇 𝐹 𝑇 𝑇 𝑇 𝐹 𝑇 𝐹
𝑇 𝑇 𝐹 𝑇 𝑇 𝑇 𝑇 𝑇
𝑇 𝑇 𝑇 𝑇 𝑇 𝑇 𝐹 𝐹
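The table can be checked by brute force over the eight assignments; this sketch confirms that the only row making all four clauses true is 𝑃 = 𝑇 , 𝑄 = 𝑇 , 𝑅 = 𝐹 .

```python
from itertools import product

def f(P, Q, R):
    """Conjunction of the table's four clauses."""
    return (P or Q) and (P or not Q) and (not P or Q) and \
           (not P or not Q or not R)

# Try all eight assignments and keep the satisfying ones.
satisfying = [(P, Q, R) for P, Q, R in product([False, True], repeat=3)
              if f(P, Q, R)]
print(satisfying)   # [(True, True, False)]
```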
More problems, omitting the stories We will list more example problems but
leaving out the background (although for some of them the motivation is clear
even without a story). All of these problems are also widely known in the field.
2.11 Problem (Vertex-to-Vertex Path) Given a graph and two vertices, find if the
second is reachable from the first.†
2.12 Example These are two Western-tradition constellations, Ursa Minor and Draco.
Here we can solve the Vertex-to-Vertex Path problem by eye. For any two vertices
in Ursa Minor there is a path and for any two vertices in Draco there is a path. But
if the two are in different constellations then there is no path.
For a graph with many thousands of nodes, such as a computer network, the
problem is harder than in the prior example. A close variant problem is to decide,
given a graph, whether all vertex pairs are connected.
†
The name Vertex-to-Vertex Path is nonstandard. It is usually known as 𝑠𝑡 -Path, 𝑠𝑡 -Connectivity, or
STCON (𝑠 and 𝑡 are generic names for vertices).
Section 2. A problem miscellany 283
2.13 Problem (Minimum Spanning Tree) Given a weighted undirected graph, find a
subgraph containing all the vertices of the original graph such that its edges have
a minimum total.
This is an undirected graph with weights on the edges.
[Figure: a weighted undirected graph with a highlighted spanning subgraph]
The highlighted subgraph includes all of the vertices, that is, it spans the graph. In
addition, its weights total to a minimum from among all of the spanning subgraphs.
From that it follows that this subgraph is a tree, meaning that it has no cycles, or
else we could eliminate an edge from the cycle and thereby lower the edge weight
total without dropping any vertices.
This looks somewhat like the Hamiltonian Circuit problem in that the sought-for
subgraph contains all of the vertices. However, for the Minimum Spanning Tree
problem we know algorithms that are quick, O (𝑛 lg 𝑛) .
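One such quick algorithm is Kruskal's: sort the edges by weight and keep each edge that does not close a cycle. A sketch using a union–find structure, on a small illustrative graph rather than the one in the figure:

```python
def kruskal(n, edges):
    """Minimum spanning tree by Kruskal's algorithm.

    n is the number of vertices 0..n-1; edges is a list of
    (weight, u, v) triples.  Returns (total_weight, chosen_edges).
    """
    parent = list(range(n))

    def find(x):                      # union-find with path halving
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    total, tree = 0, []
    for w, u, v in sorted(edges):     # cheapest edges first
        ru, rv = find(u), find(v)
        if ru != rv:                  # keeping this edge makes no cycle
            parent[ru] = rv
            total += w
            tree.append((u, v))
    return total, tree

# A hypothetical weighted graph on four vertices.
edges = [(1, 0, 1), (4, 0, 2), (3, 1, 2), (2, 2, 3), (5, 1, 3)]
```

The sorting dominates, which is where the O (𝑛 lg 𝑛) bound comes from.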
2.14 Problem (Vertex Cover) Given a graph and a bound 𝐵 ∈ N, decide if the graph
has a size 𝐵 set of vertices, 𝐶 , such that for any edge, at least one of its ends is a
member of 𝐶 .
2.15 Example A museum posts guards to watch their exhibits. There are eight halls,
laid out as below. They will put the guards at some of the corners 𝑤 0 , . . . 𝑤 5 . What
is the smallest number of guards that will suffice to watch all of the hallways?
[Figure: the museum floor plan, with corners 𝑤0–𝑤5]
Checking each corner shows that one guard will not suffice. The two-element set
𝐶 = {𝑤 0, 𝑤 4 } is a vertex cover: every hallway has at least one end in 𝐶 .
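For small instances like the museum we can decide Vertex Cover by brute force, trying every size-𝐵 vertex set. The graph below is a hypothetical 4-cycle, not the museum floor plan:

```python
from itertools import combinations

def has_vertex_cover(vertices, edges, B):
    """Decide Vertex Cover by brute force: is there a size-B set C
    such that every edge has at least one endpoint in C?"""
    for C in combinations(vertices, B):
        Cset = set(C)
        if all(u in Cset or v in Cset for u, v in edges):
            return True
    return False

# A hypothetical 4-cycle: one vertex always misses an edge, but two
# opposite corners together cover all four edges.
V = [0, 1, 2, 3]
E = [(0, 1), (1, 2), (2, 3), (3, 0)]
```

There are C(|V|, B) candidate sets, so this is exponential in general; no polytime algorithm for Vertex Cover is known.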
2.16 Problem (Clique) Given a graph and a bound 𝐵 ∈ N, decide if the graph has a
size 𝐵 set of vertices such that any two are connected.
The term ‘clique’ comes from social networks; if the nodes represent people
and the edges connect friends then a clique is a set of people who are all friends.
A graph with a 4-clique has a subgraph like the one below on the left, and
any graph with a 5-clique has a subgraph like the one on the right.
284 Chapter V. Computational Complexity
[Figure: a 4-clique on the left and a 5-clique on the right]
2.19 Problem (Max Cut) A graph cut partitions the vertices into two disjoint subsets.
The cut set contains the edges with a vertex in each subset. The Max Cut problem
is to find the partition with the largest cut set.
2.20 Example For this graph the largest cut set contains six edges, the ones connecting
differently colored vertices here.†
[Figure: the graph on 𝑣0–𝑣5, with the two sides of the partition shown in different colors]
2.21 Animation: A partition for the graph with a maximum cut set.
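The verification script mentioned in the footnote might look like this sketch; it pins one vertex to a fixed side so that each two-set partition is examined only once. The 5-cycle here is a hypothetical instance, not the example's graph:

```python
from itertools import combinations

def max_cut(vertices, edges):
    """Find a maximum cut by brute force over two-set partitions."""
    best_size, best_side = 0, set()
    rest = vertices[1:]                 # pin vertices[0] to one side
    for k in range(len(rest) + 1):
        for side in combinations(rest, k):
            S = set(side) | {vertices[0]}
            # an edge is cut when its endpoints lie on opposite sides
            cut = sum(1 for u, v in edges if (u in S) != (v in S))
            if cut > best_size:
                best_size, best_side = cut, S
    return best_size, best_side

# A hypothetical 5-cycle; its maximum cut has four edges.
V = [0, 1, 2, 3, 4]
E = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)]
```

The loop examines 2^(|V|-1) partitions, so this is only practical for toy instances.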
2.22 Problem (Three Dimensional Matching) Let the sets 𝑋, 𝑌 , 𝑍 all have the same
number of elements, 𝑛 . Given as input a set 𝑀 ⊆ 𝑋 × 𝑌 × 𝑍 , decide if there is a
matching, a set 𝑀ˆ ⊆ 𝑀 containing 𝑛 elements such that no two of the triples in
𝑀ˆ agree on their first coordinates, or their second or third coordinates either.
2.23 Example Let 𝑋 = { a, b }, 𝑌 = { b, c }, and 𝑍 = { a, d }, so that 𝑛 = 2. Below is a
subset of 𝑋 × 𝑌 × 𝑍 (it actually equals 𝑋 × 𝑌 × 𝑍 ).
𝑀 = { ⟨a, b, a⟩, ⟨a, c, a⟩, ⟨b, b, a⟩, ⟨b, c, a⟩, ⟨a, b, d⟩, ⟨a, c, d⟩, ⟨b, b, d⟩, ⟨b, c, d⟩ }
The set 𝑀ˆ = { ⟨a, b, a⟩, ⟨b, c, d⟩ } has 2 elements. They disagree in their first
coordinates, and their second, and their third.
2.24 Example Fix 𝑛 = 4 and consider 𝑋 = { 1, 2, 3, 4 }, 𝑌 = { 10, 20, 30, 40 }, and
𝑍 = { 100, 200, 300, 400 }, all four-element sets. Also fix this subset of 𝑋 × 𝑌 × 𝑍 .
𝑀 = { ⟨1, 10, 200⟩, ⟨1, 20, 300⟩, ⟨2, 30, 400⟩, ⟨3, 10, 400⟩,
⟨3, 40, 100⟩, ⟨3, 40, 200⟩, ⟨4, 10, 200⟩, ⟨4, 20, 300⟩ }
A matching is 𝑀ˆ = { ⟨1, 20, 300⟩, ⟨2, 30, 400⟩, ⟨3, 40, 100⟩, ⟨4, 10, 200⟩ }.
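Instances this small can be handled by a brute-force script. This sketch uses Example 2.24's set 𝑀, with triples written as Python tuples; note that it may return a different matching than the one displayed above, since a matching need not be unique:

```python
from itertools import combinations

def is_matching(triples, n):
    """A collection of triples is a matching when it has n elements and
    no two triples agree in any coordinate."""
    return len(triples) == n and all(
        len({t[i] for t in triples}) == n for i in range(3))

def find_matching(M, n):
    """Brute-force search for a matching inside M."""
    for cand in combinations(M, n):
        if is_matching(cand, n):
            return cand
    return None

# The data of Example 2.24.
M = [(1, 10, 200), (1, 20, 300), (2, 30, 400), (3, 10, 400),
     (3, 40, 100), (3, 40, 200), (4, 10, 200), (4, 20, 300)]
```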
2.25 Problem (Subset Sum) Given a multiset of natural numbers 𝑆 = {𝑛 0, ... 𝑛𝑘 − 1 }
and a target 𝑇 ∈ N, decide if a subset of 𝑆 sums to the target.†
† One way to verify this is with a script that checks all two-set partitions of the vertices.
† Recall that in a multiset repeats do not collapse, so the multiset { 1, 2, 2, 3 } is different
from the multiset { 1, 2, 3 } . But a multiset is like a set in that the order of the elements is
not significant, so the multiset { 1, 2, 2, 3 } is the same as the multiset { 1, 2, 3, 2 } . In
short, a multiset is an unordered list.
2.26 Example Do some of the numbers { 911, 22, 821, 563, 405, 986, 165, 732 } add to
𝑇 = 1173? One such collection is { 165, 986, 22 }.
In contrast, no subset of { 831, 357, 63, 987, 117, 81, 6785, 606 } adds to 𝑇 =
2105. The number 6785 exceeds the target, and all of the remaining numbers are
multiples of three while the target 𝑇 is not.
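A brute-force check of all 2^|𝑆| subsets settles small instances like these; a sketch:

```python
from itertools import combinations

def subset_sum(S, T):
    """Decide Subset Sum by checking every subset of S.

    Returns a witness subset summing to T, or None."""
    for k in range(len(S) + 1):
        for sub in combinations(S, k):
            if sum(sub) == T:
                return sub
    return None
```

With |𝑆| = 8 there are only 256 subsets, but the count doubles with each added element.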
2.27 Problem (Knapsack) Given a finite multiset 𝑆 whose elements 𝑠 have a natural
number weight 𝑤 (𝑠) and value 𝑣 (𝑠) , and also given a weight bound 𝐵 and a value
target 𝑇, find a subset 𝑆ˆ ⊆ 𝑆 whose elements have a total weight less than or equal
to the bound and total value greater than or equal to the target.
2.28 Example Our knapsack can carry at most 𝐵 = 10 pounds. Can we pack items
with total worth at least 𝑇 = 100?
Item a b c d
Weight 3 4 5 6
Value 50 40 10 30
The best that we can do is take items a and b. We cannot meet the value target.
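A brute-force sketch for small Knapsack instances, using the table's data; here items maps each name to its (weight, value) pair:

```python
from itertools import combinations

def knapsack(items, B, T):
    """Decide Knapsack by brute force.

    items maps name -> (weight, value).  Returns a subset of names
    meeting weight bound B and value target T, or None."""
    names = list(items)
    for k in range(len(names) + 1):
        for sub in combinations(names, k):
            weight = sum(items[s][0] for s in sub)
            value = sum(items[s][1] for s in sub)
            if weight <= B and value >= T:
                return sub
    return None

# The instance of Example 2.28.
items = {"a": (3, 50), "b": (4, 40), "c": (5, 10), "d": (6, 30)}
```

On this data no subset within the 10-pound bound reaches value 100, matching the example; lowering the target to 90 succeeds.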
2.29 Problem (Partition) Given a finite multiset 𝐴 such that each element has an
associated natural number size 𝑠 (𝑎) , decide if the set splits into two, 𝐴ˆ and 𝐴 − 𝐴ˆ,
so that the totals of the sizes are the same, ∑𝑎∈𝐴ˆ 𝑠 (𝑎) = ∑𝑎∉𝐴ˆ 𝑠 (𝑎) .
2.30 Example The set 𝐴 = { I, a, my, go, rivers, cat, hotel, comb } has eight words.
The size of a word, 𝑠 (𝜎) , is the number of letters. Then 𝐴ˆ = { rivers, cat, my, I }
gives ∑𝑎∈𝐴ˆ 𝑠 (𝑎) = ∑𝑎∉𝐴ˆ 𝑠 (𝑎) = 12.
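A brute-force sketch for Partition, using the word data from Example 2.30 (the elements here are distinct, so an ordinary dictionary suffices; a true multiset would need indexed elements):

```python
from itertools import combinations

def partition(sizes):
    """Decide Partition: find a subset whose sizes total half of the
    grand total, if one exists.  sizes maps element -> size."""
    total = sum(sizes.values())
    if total % 2 != 0:
        return None                      # an odd total cannot split evenly
    for k in range(len(sizes) + 1):
        for sub in combinations(sizes, k):
            if sum(sizes[a] for a in sub) == total // 2:
                return set(sub)
    return None

# Word lengths from Example 2.30; the grand total is 24.
words = {"I": 1, "a": 1, "my": 2, "go": 2, "rivers": 6,
         "cat": 3, "hotel": 5, "comb": 4}
```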
2.31 Example The US President is elected by having states send representatives to the
Electoral College. The number depends in part on the state’s population.
Reps  No. states  States
55    1           CA
38    1           TX
29    2           FL, NY
20    2           IL, PA
18    1           OH
16    2           GA, MI
15    1           NC
14    1           NJ
13    1           VA
12    1           WA
11    4           AZ, IN, MA, TN
10    4           MD, MN, MO, WI
9     3           AL, CO, SC
8     2           KY, LA
7     3           CT, OK, OR
6     6           AR, IA, KS, MS, NV, UT
5     3           NE, NM, WV
4     5           HI, ID, ME, NH, RI
3     8           AK, DE, DC, MT, ND, SD, VT, WY
The table above gives the numbers for the 2020 election; all of a state’s represen-
tatives vote for the same person (we will ignore some fine points). The Partition
Problem asks if there could be a tie.
2.32 Problem (Linear Programming) Optimize a linear function 𝐹 (𝑥 0, ... 𝑥𝑛 ) = 𝑐 0𝑥 0 +
· · · + 𝑐𝑛 𝑥𝑛 subject to linear constraints, ones of the form 𝑎𝑖,0𝑥 0 + · · · + 𝑎𝑖,𝑛 𝑥𝑛 ≤ 𝑏𝑖
or 𝑎𝑖,0𝑥 0 + · · · + 𝑎𝑖,𝑛 𝑥𝑛 ≥ 𝑏𝑖 .
2.33 Example Maximize 𝐹 (𝑥 0, 𝑥 1 ) = 𝑥 0 + 2𝑥 1 subject to 4𝑥 0 + 3𝑥 1 ≤ 24, 𝑥 1 ≤ 4, 𝑥 0 ≥ 0
and 𝑥 1 ≥ 0. The shaded region has the points that satisfy all the inequalities; these
are said to be ‘feasible’ points.
[Figure: the feasible region in the (𝑥0, 𝑥1)-plane, shaded, with level lines 𝐹 = 2, 4, 6, 8]
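For two variables we can use the fact that a linear function attains its optimum at a corner of the feasible region. This sketch enumerates the intersections of pairs of boundary lines and keeps the feasible ones; it is a brute-force illustration for Example 2.33, not the simplex method:

```python
from itertools import combinations

# Each constraint a0*x0 + a1*x1 <= b is written as (a0, a1, b).
# These encode Example 2.33: 4x0 + 3x1 <= 24, x1 <= 4, x0 >= 0, x1 >= 0.
cons = [(4, 3, 24), (0, 1, 4), (-1, 0, 0), (0, -1, 0)]

def feasible(x0, x1, eps=1e-9):
    return all(a0 * x0 + a1 * x1 <= b + eps for a0, a1, b in cons)

def maximize(F):
    """Maximize F over the polygon's corners, found by intersecting
    every pair of boundary lines (Cramer's rule) and testing feasibility."""
    best = None
    for (a0, a1, b), (c0, c1, d) in combinations(cons, 2):
        det = a0 * c1 - a1 * c0
        if abs(det) < 1e-12:
            continue                       # parallel boundary lines
        x0 = (b * c1 - a1 * d) / det
        x1 = (a0 * d - b * c0) / det
        if feasible(x0, x1):
            value = F(x0, x1)
            if best is None or value > best[0]:
                best = (value, x0, x1)
    return best
```

For Example 2.33 the corners are (0, 0), (6, 0), (0, 4), and (3, 4), and the maximum of 𝐹 = 𝑥0 + 2𝑥1 is 11, at (3, 4).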
The final three problems, about primes and divisibility, have an impeccable
history. No less an authority than Gauss said, “The problem of distinguishing prime
numbers from composite numbers and of resolving the latter into their prime
factors is known to be one of the most important and useful in arithmetic. It has
engaged the industry and wisdom of ancient and modern geometers to such an
extent that it would be superfluous to discuss the problem at length . . . Further, the
dignity of the science itself seems to require that every possible means be explored
for the solution of a problem so elegant and so celebrated.”
The three may be hard to tell apart at first glance. But as we understand them
today, they differ in the Big-O behavior of the algorithms to solve them.
2.38 Problem (Divisor) Given a number 𝑛 ∈ N, find a nontrivial divisor.
We know of no efficient algorithm to find divisors.† However, as is so often the
case, at this moment we also have no proof that no such algorithm exists.‡ Not
† No efficient algorithm is known on a non-quantum computer. ‡ The presumed difficulty of this
problem is at the heart of widely used algorithms in cryptography.
all numbers of a given length are equally hard to factor. The hardest numbers to
factor are semiprimes, products of two prime numbers.
2.39 Problem (Prime Factorization) Given a number 𝑛 ∈ N, produce its decomposition
into a product of primes.
Factoring seems to be hard. But what if you only want to know whether a
number is prime and don’t care about its factors?
2.40 Problem (Primality) Given a number 𝑛 ∈ N, determine if it is prime; that is,
decide if there are no numbers 𝑎 that divide 𝑛 and such that 1 < 𝑎 < 𝑛 .
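The definition suggests an immediate algorithm, trial division. It is correct, but it checks on the order of √𝑛 candidates and so runs in time exponential in the number of digits of 𝑛; it is not the polytime test discussed below:

```python
def is_prime(n):
    """Primality by trial division: look for a divisor a with
    1 < a and a*a <= n.

    Correct, but the loop runs about sqrt(n) times, which is
    exponential in the number of digits of n."""
    if n < 2:
        return False
    a = 2
    while a * a <= n:
        if n % a == 0:
            return False
        a += 1
    return True
```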
For many years the consensus among experts was that finding
a primality testing algorithm that was polytime in the number of
digits of the input was very unlikely. After all, for centuries, many
of the smartest people in the world had worked on composites
and primes, and none of them had produced a fast test.†
However, in 2002 M Agrawal, N Kayal, and N Saxena produced such an
algorithm, the AKS algorithm.‡ Today, refinements of their technique
run in O (𝑛 6 ) .
[Portraits: Nitin Saxena (b 1981), Neeraj Kayal (b 1979), Manindra Agrawal (b 1966)]
This dramatically illustrates that a problem being high profile, and
having been worked on by many well-respected experts, does not mean
that it will never be solved.
Although the opinions of experts have value, they can nonetheless be wrong.
Results that gainsay established orthodoxy have appeared before and will appear
again. One correct proof is all it takes.
V.2 Exercises
2.41 Name the prime numbers less than one hundred.
2.42 Decide if each is prime.
(a) 5 477
(b) 6 165
(c) 6 863
(d) 4 207
(e) 7 689
✓ 2.43 Find a proper divisor of each. (a) 31 221 (b) 52 424 (c) 9 600 (d) 4 331
(e) 877
2.44 Your friend asks, “Doesn’t the polytime solution of Primality automatically
give us one for Divisor? Just take the divisor from the first one and use it for the
second one.” Help them out.
✓ 2.45 Decide if each formula is satisfiable.
(a) (𝑃 ∧ 𝑄) ∨ (¬𝑄 ∧ 𝑅)
†
There are a number of probabilistic algorithms that are often used in practice that can test primality
very quickly, with an extremely small chance of error. ‡ At the time that they did most of this research,
Kayal and Saxena were undergraduates.
Hamilton used the fourth, the dodecahedron, for his game. Find a Hamiltonian
circuit for the third and the fifth, the octahedron and the icosahedron. To make
the connections easier to see, below we have grabbed a face in the back of each
solid, and expanded it until we could squash the entire shape down into the plane
without any edge crossings.
[Figure: planar drawings of the octahedron, vertices 0–5, and the icosahedron, vertices 0–11]
✓ 2.50 Example 2.20 exhibits a cut set with six members, as shown on the left. But
on the right there are eight cut edges; what’s wrong with it?
[Figure: two copies of the graph from Example 2.20 on vertices 𝑣0–𝑣5, each showing a two-coloring of the vertices]
✓ 2.52 This shows interlocking corporate directorships. The vertices are corporations
and they are connected if they share a member of their Board of Directors (the
data is from 2004).
[Figure: a graph of interlocking directorships among corporations including JP Morgan, Caterpillar, AT&T, and Texas Instruments]
(a) Is there a path from AT&T to Ford Motor? (b) Can you get from Halliburton to
Ford Motor? (c) Can you get from Caterpillar to Ford Motor? (d) JP Morgan to
Ford Motor?
2.53 How many edges are there in a Hamiltonian path?
2.54 On some Traveling Salesman problem graphs we can change the edge weights
to ensure that an edge is used but on some we cannot.
(a) A circuit for a Traveling Salesman problem instance is a Hamiltonian circuit.
Produce an undirected graph without loops on which there is at least one
Hamiltonian circuit, but containing an edge that belongs to no such circuit.
(b) Consider an undirected graph with an edge 𝑒 through which at least one
Hamiltonian circuit runs. Fix edge weights for all other edges. Show that
there is an edge weight for 𝑒 such that any solution for the Traveling Salesman
problem includes 𝑒 .
✓ 2.55 A popular game extends the Vertex-to-Vertex Path problem by counting the
degrees of separation. Below is a portion of the movie connection graph, where
actors are connected if they have ever been together in a movie.
[Figure: a portion of the movie connection graph with actors Elvis Presley, Ed Asner, and Meryl Streep, and the movies Change of Habit, JFK, and The River Wild]
✓ 2.56 Verify that this Knapsack instance has no solution when the weight bound
is 𝐵 = 73 and the value target is 𝑇 = 140.
Item a b c d e
Weight 21 33 49 42 19
Value 50 48 34 44 40
2.57 Using the data in Example 2.31, decide if there could be a tie in the 2020
Electoral College.
[Figure: a weighted graph on vertices 𝑞0 through 𝑞8]
2.59 The Subset Sum instance with 𝑆 = { 21, 33, 49, 42, 19 } and target 𝑇 = 114
has no solution. Verify that by brute force, by checking every possible combination.
✓ 2.62 The Course Scheduling problem starts with a list of students and the classes
that they wish to take, and then finds how many time slots are needed to schedule
the classes. If there is a student taking two classes then those two will not be
scheduled to meet at the same time. Here is an instance: a school has classes
in Astronomy, Biology, Computing, Drama, English, French, Geography, History,
and Italian. After students sign up, the graph below shows which classes have
an overlap. For instance Astronomy and Biology share at least one student while
[Figure: the class-overlap graph on vertices 𝐴 through 𝐼]
What is the minimum number of class times that we must use? In graph coloring
terms, we define that classes meeting at the same time are the same color and
we ask for the minimum number of colors needed so that no two same-colored
vertices share an edge. (a) Show that no three-coloring suffices. (b) Produce a
four-coloring.
2.63 If a Boolean expression 𝐹 is satisfiable, does that imply that its negation ¬𝐹
is not satisfiable?
2.64 Some authors define the Satisfiability problem as: given a finite set of
propositional logic statements, not just one statement, find if there is a single input
tuple 𝑏 0, ... 𝑏 𝑗 − 1 , where each 𝑏𝑖 is either 𝑇 or 𝐹 , that satisfies them all. Show that
this is equivalent to the definition given in Problem 2.9.
✓ 2.65 Find all 3-cliques in this graph.
[Figure: a graph on vertices 𝑣0–𝑣6]
[Figure: another graph on vertices 𝑣0–𝑣6]
(b) In this graph find a vertex cover with 𝑘 = 3 elements, and an independent set
with 𝑘ˆ = 3 elements.
[Figure: a graph on vertices 𝑣0–𝑣5]
(c) In this graph find a vertex cover 𝑆 with 𝑘 = 4 elements. Find an independent
set 𝑆ˆ with 𝑘ˆ = 6 elements.
[Figure: a graph on vertices 𝑣0–𝑣9]
[Figure: a diagram with courses 0–4 and times 𝛼–𝜀]
For example, instructor 𝐴 can only teach courses 1, 2, and 3. And, course 0 can only
run at time 𝛼 or time 𝛿 . Verify that this is an instance of the Three-dimensional Matching
problem and find a match.
2.69 Consider Three Dimensional Matching, Problem 2.22. Let 𝑋 = { a, b, c },
𝑌 = { b, c, d }, and 𝑍 = { a, d, e }. (a) List the elements of 𝑀 = 𝑋 × 𝑌 × 𝑍 .
(b) Is there a three element subset 𝑀ˆ whose triples have the property that no two
of them agree on any coordinate?
Section V.3 Problems, algorithms, and programs
Now, with many examples in hand, we will briefly reflect on problems and solutions.
We will keep this discussion on an intuitive level only — indeed, many of these
things have no widely accepted precise definition.
Section 3. Problems, algorithms, and programs 293
Problem types We have already seen function problems. These ask for an
algorithm with a single output for each input. An example is the Prime Factorization
problem, which takes in a natural number and returns its prime decomposition.
Another example is the problem of finding the greatest common divisor, where the
input is a pair of natural numbers and the output is a natural number.
Another problem type is the optimization problem. These ask for a solution
that is best according to some metric. The Shortest Path problem is one of these,
†
There are interesting problems with only one task, such as computing the digits of 𝜋 . ‡ There is
no widely-accepted formal definition of ‘algorithm’. Whatever it is, it fits between ‘mathematical
function’ and ‘computer program’. For example, a ‘sort’ routine takes in a set of items and returns
the sorted sequence. This task, this input-output behavior, could be accomplished using different
algorithms: merge sort, heap sort, etc. So the best handle that we have is informal — an ‘algorithm’ is
an equivalence class of programs (i.e., Turing machines), where two programs are equivalent if they do
a task in essentially the same way, whatever “essentially” means. # There are now coming up on a
million volunteers offering computing time. To join them, visit https://scienceunited.org/.
3.3 Figure: Both of these show the collection of languages, P ( B∗ ) , which we often call
the ‘problems’. On the left, the dots in the blob emphasize that this is a collection
of separate sets, not a continuum. It is drawn with quickly-solvable problems, those
with a fast decider, at the bottom. But there is a catch. On the right the shaded
collection Rec consists of the Turing computable languages. Similarly, RE consists
of the languages that are computably enumerable. So this diagram makes the point
that not all languages have a decider or a recognizer — some languages are perfectly
good problems, but they are unsolvable.
†
Thus, on a Turing machine, if when the machine starts the head is under the final character, then the
machine does not even need to read the entire input to decide the question. The algorithm runs in time
independent of the input length. ‡ That is, the unary case reduces to the binary one. # ‘Reasonable’
means that it is not so inefficient as to greatly change the big-O behavior. § This is in a way like
Church’s Thesis. We cannot prove it but our experience with digital reproduction of music, movies, etc.,
argues that it is so.
V.3 Exercises
✓ 3.10 For each of these, list three examples and then — speaking informally, since
some of them do not have formal definitions — describe the difference between
them and an algorithm. (a) a heuristic (b) pseudocode (c) a Turing machine
(d) a flowchart (e) source code (f) an executable (g) a process
3.11 Your friend asks, “So, if a problem is essentially a set of strings, what
constitutes a solution?” Answer them.
3.12 What is the difference between a decision problem and a language decision
problem?
3.13 As an illustration of the thesis that even surprising things can be represented
reasonably efficiently and with reasonable fidelity in binary, we can do a simple
calculation. (a) At 30 cm, the resolution of the human eye is about 0.01 cm.
How many such pixels are there in a photograph that is 21 cm by 30 cm?
(b) We can see about a million colors. How many bits per pixel is that? (c) How
many bits for the photo, in total?
3.14 Name something important that cannot be represented in binary.
✓ 3.15 True or false: any two programs that implement the same algorithm must
compute the same function. What about the converse?
3.16 Some tasks are hard to express as a language decision problem. Consider
sorting the characters of a string into ascending order. Briefly describe why each
of these language decision problems fails to capture the task’s essential difficulty.
(a) {𝜎 ∈ Σ∗ ∣ 𝜎 is sorted } (b) { ⟨𝜎, 𝑝⟩ ∣ 𝑝 is a permutation that orders 𝜎 }
✓ 3.17 For each language decision problem, name three members of the set, if there
are three, and then sketch an algorithm solving it.
(a) L0 = { ⟨𝑛, 𝑚⟩ ∈ N2 ∣ 𝑛 + 𝑚 is a square and one greater than a prime }
†
Many authors use diamond brackets to stand for a representation, as in ‘ ⟨ G, 𝑣0 , 𝑣1 ⟩ ’. Here, we reserve
diamond brackets for sequences.
3.18 Solve the language decision problem for (a) the empty language, (b) the
language B, and (c) the language B∗.
3.19 For each language, sketch an algorithm that solves the language decision
problem.
(a) {𝜎 ∈ B∗ ∣ 𝜎 matches the regular expression a*ba* }
(b) The language defined by this grammar
S → AB
A → aA | 𝜀
B → bB | 𝜀
3.20 Solve each decision problem about Finite State machines, M, by producing
an algorithm. (a) Given M, decide if the language accepted by M is empty.
(b) Decide if the language accepted by M is infinite. (c) Decide if L ( M) is the
set of all strings, Σ∗ .
3.21 For each language decision problem, give an algorithm that runs in O ( 1) .
(a) The language of minimal-length binary representations of numbers that are
nonzero.
(b) The binary representations of numbers that exceed 1000.
3.22 In a graph, a bridge edge is one whose removal disconnects the graph. That
is, there are two vertices that before the bridge is removed are connected by a
path, but are not connected after it is removed. (More precisely, a connected
component of a graph is a set of vertices that can be reached from each other by
a path. A bridge edge is one whose removal increases the number of connected
components.) The problem is: given a graph, find a bridge. Is this a function
problem, a decision problem, a language decision problem, a search problem, or
an optimization problem?
✓ 3.23 For each, give the categorization that best applies: a function problem, a
decision problem, a language decision problem, a search problem, or an opti-
mization problem. (a) The Graph Connectedness problem, which inputs a graph
and decides whether for any two vertices there is a path between them. (b) The
problem that inputs two natural numbers and returns their least common multiple.
(c) The Graph Isomorphism problem that inputs two graphs and determines
whether they are isomorphic. (d) The problem that takes in a propositional logic
statement and returns an assignment of truth values to its inputs that makes the
statement true, if there is such an assignment. (e) The Nearest Neighbor problem
that inputs a weighted graph and a vertex, and returns a vertex nearest the given
one that does not equal the given one. (f) The Discrete Logarithm problem: given
a prime number 𝑝 and two numbers 𝑎, 𝑏 ∈ N, determine if there is a power 𝑘 ∈ N
so that 𝑎𝑘 ≡ 𝑏 ( mod 𝑝) . (g) The problem that inputs a bitstring and decides if
the number that it represents in binary will, when converted to decimal, contain
only odd digits.
✓ 3.24 For each, give the characterization that best applies: a function problem, a
decision problem, a language decision problem, a search problem, or an optimiza-
tion problem. (a) The 3-SAT problem, Problem 2.10 (b) The Divisor problem,
Problem 2.38 (c) The Prime Factorization problem, Problem 2.39 (d) The F-SAT
problem, where the input is a propositional logic expression and the output is
either an assignment of 𝑇 and 𝐹 to the expression’s variables that makes it evaluate
to 𝑇 , or the string None. (e) The Primality problem, Problem 2.40
3.25 Express each task as a language decision problem. Include in the description
explicit mention of the string representation. (a) Decide whether a number is a
perfect square. (b) Decide whether a triple ⟨𝑥, 𝑦, 𝑧⟩ ∈ N3 is a Pythagorean triple,
that is, whether 𝑥 2 + 𝑦 2 = 𝑧 2 . (c) Decide whether a graph has an even number of
edges. (d) Decide whether a path in a graph has any repeated vertices.
✓ 3.26 Recast each as a language decision problem. Include explicit mention of the
string representation. (a) Given a natural number, do its factors add to more than
twice the number? (b) Given a Turing machine and input, does the machine halt
on the input in less than ten steps? (c) Given a propositional logic statement, are
there three different assignments that evaluate to 𝑇 ? That is, are there more than
three lines in the truth table that end in 𝑇 ? (d) Given a weighted graph and a
bound 𝐵 ∈ R, for any two vertices is there a path from one to the other with total
cost less than the bound?
3.27 Recast each in language decision terms. Include explicit mention of the string
representation. (a) Graph Colorability, Problem 2.7, (b) Euler Circuit, Problem 2.4,
(c) Shortest Path, Problem 2.5.
3.28 Restate the Halting problem as a language decision problem.
✓ 3.29 As stated, the Shortest Path problem, Problem 2.5, is an optimization problem.
Convert it into a parametrized family of decision problems. Hint: use the technique
outlined following the Traveling Salesman problem, Problem 2.3.
✓ 3.30 Express each optimization problem as a parametrized family of language
decision problems. (a) Given a Fifteen Game board, find the least number of slides
that will solve it. (b) Given a Rubik’s cube configuration, find the least number of
moves to solve it. (c) Given a list of jobs that must be accomplished to assemble a
car, along with how long each job takes and which jobs must be done before other
jobs, find the shortest time to finish the entire car.
3.31 As stated, the Hamiltonian Circuit problem is a decision problem. Give a
function version of this problem. Also give an optimization version.
3.32 The different problem types are related. Each of these inputs a square
matrix 𝑀 with more than 3 rows, and relates to a 3 × 3 submatrix (form the
submatrix by picking three rows and three columns, which need not be adjacent).
Characterize each as a function problem, a decision problem, a search problem, or
an optimization problem. (a) Find a submatrix that is invertible. (b) Decide if
Section V.4 P
We have said that we often blur the distinction between the problem of deciding
membership in a language L and the language itself. So to express that we are
studying problems of a certain type we may say we are studying languages.
4.1 Definition A complexity class is a collection of languages.
The term ‘complexity’ is there because these collections are often associated
with some resource specification, so that for instance one class is the collection of
languages that are accepted by a Turing machine in quadratic time.†
4.2 Example One complexity class is the collection of languages for which there is a
deciding Turing machine that runs in time O (𝑛 2 ) . Thus C = { L0, L1, ... }, where
each L 𝑗 is decided by some machine P𝑖 𝑗 , for which the function 𝑓 relating the size
of the machine’s input |𝜎 | to the number of steps that the machine takes to finish
is quadratic, that is, 𝑓 is O (𝑛 2 ) .
4.3 Example Another is the collection of languages accepted by some Turing machine
that uses only logarithmic space. That is, for such a machine, with input string 𝜎 the
function 𝑓 relating |𝜎 | to the maximum number of tape squares that the machine
visits in deciding a string of that length is logarithmic, 𝑓 ∈ O ( lg) .
Two points bear explication. As to the computing machine, researchers
study not just Turing machines but other types of machines as well, including
nondeterministic Turing machines and Turing machines with access to an oracle
for random numbers. And as for the resource specification, it often involves bounds
on the time or space behavior. But a class could instead be, for instance, the
complement of O (𝑛 2 ) , so the specification isn’t always a bound.‡
Definition The complexity class that we introduce now is the most important one.
It is the collection of problems that under Cobham’s Thesis we take to be tractable.
4.4 Definition A language decision problem L is a member of the class P if there is
an algorithm to decide membership in L that on a deterministic Turing machine
runs in polynomial time.
4.5 Example The problem L = { G ∣ there is a path between any two vertices } of
deciding whether a given graph is connected is a member of P. To verify this, we
must produce an algorithm that decides membership in this language, and that
runs in polynomial time. One is to do a breadth first search of the graph, which
has a runtime that is cubic in the number of nodes.
4.6 Example Another member of P is the problem of deciding whether two numbers are
relatively prime, { ⟨𝑛 0, 𝑛 1 ⟩ ∈ N2 ∣ their greatest common divisor is 1 }. As before,
to verify that this language is a member of P we produce an algorithm that
determines membership and that runs in polytime. Euclid’s algorithm fits the bill;
it solves this problem and has runtime O ( lg ( max (𝑛 0, 𝑛 1 ))) .
4.7 Example Still another member of P is the String Search problem of deciding
substring-ness, { ⟨𝜎, 𝜏⟩ ∈ Σ∗ × Σ∗ ∣ 𝜎 is a substring of 𝜏 }. (Often in practice 𝜏 is very
long and is called the haystack while 𝜎 is short and is the needle.) The algorithm
that first tests 𝜎 at the initial character of 𝜏 , then at the next character, etc., has a
runtime of O ( |𝜎 | · |𝜏 |) , which is polynomial.
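That naive algorithm can be written directly; it makes at most about |𝜏| · |𝜎| character comparisons, so it runs in polytime:

```python
def is_substring(needle, haystack):
    """Naive string search: try the needle at each starting position
    of the haystack.  At most len(haystack) * len(needle) comparisons."""
    n, h = len(needle), len(haystack)
    for start in range(h - n + 1):
        if haystack[start:start + n] == needle:
            return True
    return False
```

Faster algorithms exist (Knuth-Morris-Pratt runs in linear time), but polytime membership only needs this simple one.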
†
There are other definitions of complexity class. Some authors require that in a class the characteristic
function of each language can be computed under some resource specification. This has implications —
if all of the members of a class must be computable by Turing machines then each class is countable.
Here, we only say that it is a collection so that the definition is maximally general. ‡ At this
writing there are 546 studied classes but the number changes frequently; see the Complexity Zoo,
https://complexityzoo.net/.
[Figure: a Boolean circuit on inputs 𝑏0–𝑏3, built from ⊕, ∧, ∨, and ≡ gates, with output 𝑓 (𝑏 0 , 𝑏 1, 𝑏 2, 𝑏 3 )]
This circuit returns 1 if the sum of the input bits is a multiple of 3. The
Circuit Evaluation problem inputs a circuit like this one and computes the out-
put, 𝑓 (𝑏 0, 𝑏 1, 𝑏 2, 𝑏 3 ) . This problem is a member of P.
4.9 Example Although polytime is a restriction, nonetheless P is a very large collection.
More example members: (1) matrix multiplication, taken as a language decision
problem for { ⟨𝜎0, 𝜎1, 𝜎2 ⟩ ∣ they represent matrices with 𝑀0 · 𝑀1 = 𝑀2 } (2) minimal
spanning tree, { ⟨G,𝑇 ⟩ ∣ 𝑇 is a minimal spanning tree in G } (3) edit distance,
the number of single-character removals, insertions, or substitutions needed to
transform between strings, { ⟨𝜎0, 𝜎1, 𝑛⟩ ∣ 𝜎0 transforms to 𝜎1 in at most 𝑛 edits }.
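The edit-distance language in item (3) is decidable in polytime via the standard dynamic-programming computation of Levenshtein distance, which fills an (|𝜎0|+1) × (|𝜎1|+1) table row by row:

```python
def edit_distance(s, t):
    """Levenshtein distance by dynamic programming, O(len(s)*len(t))."""
    prev = list(range(len(t) + 1))       # distances from "" to prefixes of t
    for i, cs in enumerate(s, start=1):
        cur = [i]                        # distance from s[:i] to ""
        for j, ct in enumerate(t, start=1):
            cur.append(min(prev[j] + 1,               # delete cs
                           cur[j - 1] + 1,            # insert ct
                           prev[j - 1] + (cs != ct))) # substitute or match
        prev = cur
    return prev[-1]

def in_language(s0, s1, n):
    """Membership in the set of triples where s0 transforms to s1
    in at most n edits."""
    return edit_distance(s0, s1) <= n
```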
4.10 Figure: This blob contains all language decision problems, all L ⊆ B∗ . Shaded is P.
Two final points. First, if a problem has an algorithm that is O ( lg 𝑛) then that
problem is in P. Second, the members of P are problems (actually, languages that
represent problems), so it is wrong to say that an algorithm is in P.
of alternative computational models proposed over the years shows that while
Turing machine algorithms are indeed often slower than related algorithms for
other natural models, it is only by a factor of between 𝑛 2 and 𝑛 4.† That is, if we
have a problem for which there is a O (𝑛) algorithm on another model then we
may find that on a Turing machine model it is O (𝑛 3 ) , or O (𝑛 4 ) , or O (𝑛 5 ) . So it is
still in P.
A variation of Church’s thesis, the Extended Church’s Thesis, posits that not
only are all reasonable models of mechanical computation of equal power, but in
addition that they are of equivalent speed in that we can simulate any reasonable
model of computation‡ in polytime on a probabilistic Turing machine.# Under the
extended thesis, a problem that falls in the class P using Turing machines also falls
in that class using any other natural models. (Note, however, that this thesis does
not enjoy anything like the support of the original Church’s Thesis. Also, we know
of several problems, including the familiar Prime Factorization problem, that under
the Quantum Computing model have algorithms with polytime solutions, but for
which we do not know of any polytime solution in a non-quantum model. So
Quantum Computing could well provide a counterexample to the extended
thesis, if we can produce physical devices matching that model.)
4.11 Remark In recent years a number of researchers have claimed to have built devices
that achieve quantum advantage, that is, to have solved a problem, using an
algorithm running on a physical quantum device, that does not appear to be
solvable on a Turing machine or RAM machine-based device in less than centuries.
The claim is the subject of scholarly reservations. For one thing, the advantage
depends both on there being a quantum device that accomplishes the task and
also on there not being a classical algorithm that is fast. In fact soon after the
original claim was made other researchers produced an algorithm for a traditional
device that is near parity. Another thing is that on its face, this is not about
general purpose computing; the problem solved is exotic and especially suitable
to the instruments that researchers have available. There are sound reasons
to wonder whether quantum computers will ever be practical physical devices
used for everyday problems, although scientists and engineers are making great
progress. We will put this aside for being as-yet unsettled but it is worth monitoring
developments closely.
Naturalness We give the class P our attention because there are reasons to suppose
that it is the best candidate for the collection of problems that have a feasible
solution. We close this section with a discussion of those.
The first reason echoes the prior subsection. There are many models of computation, including Turing machines, RAM machines, and Racket programs. All
of them compute the same set of functions as Turing machines. Further, while
their speeds may differ, all of them run within polytime of each other.§ That makes
† We take a model to be ‘natural’ if it was not invented in order to be a counterexample to this.
‡ One definition of ‘reasonable’ is “in principle physically realizable” (Bernstein and Vazirani 1997).
# A Turing machine with access to an oracle of random bits.
§ All, that is, of the non-quantum natural models.
Section 4. P 305
(Recall that str (...) means that we represent the argument reasonably efficiently
as a bitstring.) With that recasting of functions as languages, P is closed under
function addition, scalar multiplication by an integer, subtraction, multiplication,
and composition. It is also closed under language concatenation and the Kleene
star operator. It is the smallest nontrivial class with these appealing properties.
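The closure under concatenation, for instance, has a one-line polytime argument: a string belongs to the concatenation exactly when some split point puts its prefix in one language and its suffix in the other. A minimal Python sketch, assuming the two languages come as polytime deciders (the predicate names are ours):

```python
def decide_concat(sigma, in_L0, in_L1):
    """Decide membership in the concatenation of L0 and L1, given
    polytime deciders in_L0 and in_L1 for the two languages."""
    # Try each of the |sigma|+1 split points.  Each trial is polytime
    # and there are linearly many, so the whole loop is polytime.
    return any(in_L0(sigma[:i]) and in_L1(sigma[i:])
               for i in range(len(sigma) + 1))
```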
But the main reason that P is our candidate is Cobham’s Thesis, the contention
that a problem is tractable if it has a solution algorithm that runs in polytime.
Recall the counterargument that a problem whose solution algorithms cannot
be improved below a runtime of O (𝑛¹⁰⁰⁰⁰⁰⁰) is not really tractable. We know
such problems exist because we can produce them using diagonalization. But
problems produced in that way are artificial. Empirical experience over close to
a century of computing is that problems with solution algorithms of very large
degree polynomial time complexity do not seem to arise often in practice. We see
plenty of problems with solution algorithms that are O (𝑛 lg 𝑛) , or O (𝑛 3 ) , and we
see plenty that are exponential, but we just do not see much of O (𝑛 1 000 000 ) .
Moreover, in the past when a researcher has produced an algorithm for a
problem with a runtime of even a moderately large degree then often, with
this foot in the door, over the next few years the community brings to bear an
array of mathematical and algorithmic techniques that lower the runtime degree
to a reasonable size.
Even if the objection to Cobham’s Thesis is right and P is too broad, the class
would still be useful because if we could show that a problem is not in P then we
would have shown that it has no general solution algorithm that is practical.†
So Cobham’s Thesis, to this point, has largely held up. Insofar as theory should
be a guide for practice, this is a good reason to use P as a touchstone for other
complexity classes.
V.4 Exercises
✓ 4.12 True or False: if the language is finite then the language decision problem is
in P.
† This argument has lost some of its force in recent years with the rise of SAT solvers. These attack
problems believed to not be in P and can solve instances of the problems of moderately large size,
using only moderately large computing resources. See Extra C.
306 Chapter V. Computational Complexity
✓ 4.13 Your coworker says something mistaken, “I’ve got a problem whose algorithm
is in P.” They are being a little sloppy with terms; how?
✓ 4.14 What is the difference between an order of growth and a complexity class?
✓ 4.15 Your friend says to you, “I think that the Circuit Evaluation problem takes
exponential time. There is a final vertex. It takes two inputs, which come from
two vertices, and each of those takes two inputs, etc., so that a five-deep circuit can
have thirty-two vertices.” Help them see where they are wrong.
4.16 In class, someone says to the professor, “Why aren’t all languages in P: I’ll
design a Turing machine so that no matter what the input is, it outputs 1. That
runs in polytime for sure.” Explain how this is mistaken.
4.17 True or false: if a problem has a logarithmic solution then it is in P.
4.18 True or false: if a language is decided by a machine then its complement is
also accepted by some machine.
✓ 4.19 Show that the decision problem for {𝜎 ∈ B∗ 𝜎 = 𝜏 3 for some 𝜏 ∈ B∗ } is
in P.
✓ 4.20 Show that the language of palindromes, {𝜎 ∈ B∗ 𝜎 = 𝜎 R }, is in P.
4.21 Sketch a proof that each problem is in P.
(a) The 𝜏 3 problem: given a bitstring 𝜎 , decide if it has the form 𝜎 = 𝜏 ⌢ 𝜏 ⌢ 𝜏 .
(b) The problem of deciding which Turing machines halt within ten steps.
✓ 4.22 Consider the problem of Triangle: given an undirected graph, decide if it has
a 3-clique, three vertices that are mutually connected.
(a) Why is this not the Clique problem, from page 283?
(b) Sketch a proof that this problem is in P.
✓ 4.23 Prove that each problem is in P by citing the runtime of an algorithm that
suits.
(a) Deciding the language {𝜎 ∈ { a, ... z }∗ 𝜎 is in alphabetical order } .
4.24 Find which of these are currently known to be in P and which are not.
Hint: you may need to look up the fastest known algorithm. (a) Shortest Path
(b) Knapsack (c) Euler Path (d) Hamiltonian Circuit
4.25 Is the empty language { } ⊂ B∗ a member of P?
4.26 The problem of Graph Connectedness is: given a finite graph, decide if there
is a path from any vertex to any other. Sketch an argument that this problem is
in P.
4.40 Show that this problem is unsolvable: give a Turing machine P , decide
whether it runs in polytime on the empty input. Hint: if you could solve this
problem then you could solve the Halting problem.
4.41 There are studied complexity classes besides those associated with language
decision problems. The class FP consists of the binary relations 𝑅 ⊆ N2 where
there is a Turing machine that, given input 𝑥 ∈ N, can in polytime find a 𝑦 ∈ N
where ⟨𝑥, 𝑦⟩ ∈ 𝑅 .
(a) Prove that this class is closed under function addition, multiplication by a
scalar 𝑟 ∈ N, subtraction, multiplication, and function composition.
(b) Where 𝑓 : N → N is computable, consider this decision problem associated
with the function, L 𝑓 = { str (⟨𝑛, 𝑓 (𝑛)⟩) ∈ B∗ 𝑛 ∈ N } (where the numbers
are represented in binary). Assume that we have two functions 𝑓0, 𝑓1 : N → N
such that L 𝑓0 , L 𝑓1 ∈ P. Show that the natural algorithm to check for closure
under function addition is pseudopolynomial.
4.42 Where L0, L1 ⊆ B∗ are languages, we say that L1 ≤𝑝 L0 if there is a function
𝑓 : B∗ → B∗ that is computable, total, that runs in polytime, and so that 𝜎 ∈ L1 if
and only if 𝑓 (𝜎) ∈ L0 . Prove that if L0 ∈ P and L1 ≤𝑝 L0 then L1 ∈ P.
Section
V.5 NP
Recall that a Finite State machine is nondeterministic if from a present configuration
and input it may pass to more than one next configuration, or one, or zero. We can
make a nondeterministic Turing machine by doing the same. Here is one having
two instructions starting with 𝑞 0 and 0 so if the machine is in 𝑞 0 and it reads a 0
on the tape then it is legal both to go to state 𝑞 2 and to state 𝑞 1 .
For such a machine the computational history can be more than a line, it can be a
tree. Below is part of the tree for machine P and input 00, with the middle branch
highlighted.
[Diagram: part of the computation tree for P on input 00. The root configuration is in state 𝑞 0 ; the branches pass through configurations in states 𝑞 1 , 𝑞 2 , and 𝑞 3 , with the middle branch highlighted.]
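One deterministic mental model for such a tree is breadth-first search over configurations, keeping a frontier of all branches-in-progress. Here is a toy Python sketch over an abstract transition relation (the representation of configurations and the `step` relation are ours, not the book's):

```python
from collections import deque

def accepts(start, step, accepting, max_steps):
    """Deterministically simulate a nondeterministic machine by
    breadth-first search of its computation tree.  `step` maps a
    configuration to the list of legal next configurations (possibly
    zero, one, or several); the machine accepts when any branch does."""
    frontier = deque([(start, 0)])
    while frontier:
        config, depth = frontier.popleft()
        if accepting(config):
            return True               # some branch accepts
        if depth < max_steps:
            for nxt in step(config):  # fan out on every legal move
                frontier.append((nxt, depth + 1))
    return False                      # no branch accepted within the bound
```

Note the cost: a tree with two choices per step can have 2ⁿ-many leaves, so this deterministic simulation may take exponential time, a point central to the rest of the chapter.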
[Figure: the two possibilities, P a proper subclass of NP, or P = NP.]
Consider Satisfiability. Imagine that in Example 5.4 above the demon whispers,
“Psst! Try TTF.” With that hint we can quickly verify with a deterministic machine.
For that example’s language of satisfiable expressions, here is the verifier.
[Flowchart: the verifier starts, reads 𝜎 and 𝜔 , and then either accepts or rejects.]
We start with the expression 𝜎 = 𝐸 from that example’s (∗), and also feed it the
demon’s hint 𝜔 = TTF. It accepts, certainly in polytime, which verifies that 𝐸 is
satisfiable.
5.8 Definition A verifier for a language L is a deterministic Turing machine V that
inputs ⟨𝜎, 𝜔⟩ ∈ ( B∗ ) 2 and is such that 𝜎 ∈ L if and only if there exists an 𝜔 so that
V accepts ⟨𝜎, 𝜔⟩ .† The string 𝜔 is called the witness or certificate.
5.9 Lemma A language is in NP if and only if it has a verifier that runs in time
polynomial in |𝜎 | . That is, L ∈ NP if and only if there is a polynomial 𝑝 and a
deterministic Turing machine V that halts on all inputs ⟨𝜎, 𝜔⟩ in 𝑝 (|𝜎 |) time, and
is such that 𝜎 ∈ L exactly when there is a witness 𝜔 where V accepts ⟨𝜎, 𝜔⟩ .
Before the lemma’s proof we will first discuss some aspects of both the definition
and the lemma.
5.10 Example Our touchstone is the Satisfiability problem. Using the lemma to show
that this problem is in NP requires that we produce a deterministic Turing machine
verifier. In the flowchart below the first input 𝜎 is a Boolean expression while the
second, the 𝜔 , is a string that V interprets as describing a line of 𝜎 ’s truth table. If
a candidate expression 𝜎 is satisfiable then there is a suitable witness, a line from
the truth table, so that V can check that the named line gives a result of 𝑇 . As an
example, for the expression (∗) from above, take 𝜔 = TTF. Clearly the verifier can
do the checking in polytime. On the other hand, if a candidate 𝜎 is not satisfiable,
for example with the expression 𝜎 = 𝑃 ∧ ¬𝑃 , then no 𝜔 will cause V to accept.
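In code, such a verifier is little more than an expression evaluator. A Python sketch, with a CNF expression represented as a list of clauses of signed variable numbers (a representation of our choosing, with +𝑖 for variable 𝑥𝑖 and −𝑖 for its negation, variables numbered from 1):

```python
def verify_sat(clauses, witness):
    """Verifier for Satisfiability.  The candidate is a CNF expression,
    given as a list of clauses, each a list of signed variable numbers.
    The witness names a truth-table line: witness[i] is the Boolean
    value assigned to variable i+1 (it must cover every variable)."""
    def true_literal(lit):
        value = witness[abs(lit) - 1]
        return value if lit > 0 else not value
    # One pass over the expression: linear in its length, so polytime.
    return all(any(true_literal(lit) for lit in clause)
               for clause in clauses)
```

For instance, (𝑥1 ∨ ¬𝑥2) ∧ (𝑥2 ∨ 𝑥3) is accepted with the witness T, F, T, while for 𝑥1 ∧ ¬𝑥1 no witness causes acceptance.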
Before the next example, a few comments.
The most striking thing about the definition is that it says that ‘there exists’
a witness 𝜔 but it does not say where the witness comes from. A person with a
computational mindset may well ask, “but how will we calculate the 𝜔 ’s?” The
point is not how to find them. The point is whether there exists a deterministic
Turing machine V that can leverage a given hint 𝜔 to verify in polytime that 𝜎 ∈ L.
That is, we don’t find the 𝜔 ’s, we just use them.
†
While we have given a definition of a nondeterministic Turing machine accepting its input, we have
not given one for deterministic machines. We could modify the machine definition to add accepting
states but for simplicity we take it to mean that V halts and outputs 1 or ‘Accept’.
Second, if 𝜎 ∉ L then the definition does not require a witness to that. Instead,
what’s required is that from among all possible strings 𝜔 there is none such that
the verifier accepts ⟨𝜎, 𝜔⟩ .†
The third comment relates to this asymmetry. Imagine that a demon hands
you some papers and claims that they contain an unbeatable strategy for chess.
Verifying requires stepping through the responses to each move, and responses
to the responses, etc., so there is lots of branching. To prove that the strategy is
unbeatable we appear to have to check all of the branches, not just find one good
one. It seems that a deterministic verifier must take exponential time. That would
make the demon’s papers, in a sense, useless. So this chess strategy is not like the
problems that we have been considering.
Also reflecting this asymmetry, it is not clear that L ∈ NP implies that its
complement Lc is a member of NP. Consider Satisfiability. If a propositional logic
expression 𝜎 is satisfiable then a witness to that is a pointer to a line of the truth
table. But for non-satisfiability there is no natural witness; instead, the natural
thing is to check all lines. As far as we know today, verifying that a Boolean formula
is not satisfiable takes more than polytime. Consequently, where the complexity
class co-NP contains the complements of languages from NP, we suspect that
NP ≠ co-NP.
Thus, the lemma explains something about the class NP: while P contains
problems where we can find the answer in polytime, NP contains the problems
whose answers are useful in that we can at least verify them in polytime.
Finally, the lemma requires that the runtime of the verifier is polynomial in |𝜎 | ,
not polynomial in the length of its input, ⟨𝜎, 𝜔⟩ . If it said the latter then we could
check the chess strategy just by using a witness that is exponentially long, which
consequently makes ⟨𝜎, 𝜔⟩ exponentially long. Also observe that because V runs in
time polynomial in |𝜎 | , for the verifier to accept there must exist a witness whose
length is at most polynomial in |𝜎 | , because with 𝜔 ’s that are too long the verifier
cannot even input them before its runtime bound expires.
5.11 Example The Hamiltonian Path problem is like the Hamiltonian Circuit problem
except that instead of requiring that the starting vertex equals the ending one, it
inputs two vertices. It is the problem of determining membership in this set.
L = { ⟨G, 𝑣, 𝑣ˆ⟩ path in G between 𝑣 and 𝑣ˆ visits every vertex exactly once }
We will show that this problem is in the class NP. We must produce a deterministic
Turing machine verifier V . It is sketched below. It takes as input ⟨𝜎, 𝜔⟩ , where the
candidate for membership in L is 𝜎 = ⟨G, 𝑣, 𝑣ˆ⟩ . The verifier interprets the witness
to be a path, 𝜔 = ⟨𝑣, 𝑣 1, ... 𝑣ˆ⟩ .
†
With this in mind, perhaps a better term for 𝜔 is “potential witness” or “proposed witness.” But those
are not standard terms.
[Flowchart: the verifier reads 𝜎 and 𝜔 and asks “All vertices visited once?” On Y it accepts, on N it rejects.]
If there is a Hamiltonian path then there exists a witness 𝜔 , and so there is input
that V will accept. Clearly, if given acceptable input then V runs in polytime. On
the other hand, if 𝜎 has no Hamiltonian path then for no 𝜔 will V be able to verify
that 𝜔 is such a path, and thus it will not accept any input pair starting with 𝜎 .
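In code the verifier's check is a few linear scans. A Python sketch (the graph representation, a vertex list plus an edge list, is ours):

```python
def verify_ham_path(graph, v, v_hat, witness):
    """Verifier for Hamiltonian Path.  The candidate is (graph, v, v_hat)
    with graph = (vertices, edges); the witness is a proposed path,
    given as a sequence of vertices."""
    vertices, edges = graph
    edge_set = {frozenset(e) for e in edges}
    path = list(witness)
    if not path or path[0] != v or path[-1] != v_hat:
        return False                    # wrong endpoints
    if sorted(path) != sorted(vertices):
        return False                    # not every vertex exactly once
    # Each consecutive pair must be an edge of the graph.
    return all(frozenset(path[i:i + 2]) in edge_set
               for i in range(len(path) - 1))
```

Each check is at most quadratic in the size of the input, comfortably polytime.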
5.12 Example The Compositeness problem, the complement of Primality, asks whether a
given number has a nontrivial factor.
L = {𝑛 ∈ N+ 𝑛 has a divisor 𝑎 with 1 < 𝑎 < 𝑛 }
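Here the natural witness is a nontrivial divisor itself, and the verifier just checks the division. A minimal Python sketch:

```python
def verify_factor(n, witness):
    """Verifier for the language above: the witness is a proposed
    nontrivial divisor of n."""
    a = witness
    # Comparison and division of k-bit numbers take time polynomial in k.
    return 1 < a < n and n % a == 0
```

Note the asymmetry again: when 𝑛 is composite some witness makes the verifier accept, while when 𝑛 is prime no witness does.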
accepting ⟨𝜏, 𝜔̂⟩ , so there is a way for the prior paragraph to result in acceptance
of 𝜏 , and so P̂ accepts 𝜏 . Conversely, suppose that 𝜏 ∉ L. By the definition of a
verifier, no witness 𝜔̂ will result in V̂ accepting ⟨𝜏, 𝜔̂⟩ , and thus P̂ rejects 𝜏 .
A common reaction to the second half of that proof is, “Wait, P̂ pulls 𝜔̂ out
of the air? How is that legal?” This reaction — about everyday experience versus
abstraction — is both common and reasonable so we will address it.
The first response is purely formal. Definition 5.8, as written, states that the
candidate 𝜏 is accepted if there exists an 𝜔̂ and does not require us to be able to
compute it. The proof ’s final paragraph covers the two possibilities: if 𝜏 ∈ L then
there is such an 𝜔̂ and otherwise there is not, so the definition is satisfied. True, the
language “nondeterministically produces a witness” is provocative in that it tends
to draw the objection that we are addressing, but this language is common in the
literature. (In terms of the two mental models, we can take ‘P̂ nondeterministically
produces a witness’ to mean either that it uses unbounded parallelism to produce
all possible 𝜔̂ ’s, or that it guesses 𝜔̂ or gets it from a demon.)
The second response is broader. Today we do not have physical devices
bearing the same relationship to nondeterministic Turing machines that everyday
computers bear to deterministic ones. (We can write a program to simulate
nondeterministic behavior but no device does it in hardware.) When Turing
formulated his definition there were no physical computers but they appeared soon
after; will we someday have nondeterministic devices? Putting aside as too exotic
proposals that involve things like time travel through wormholes, we don’t know
of any candidates.† But that doesn’t mean that thinking about them is a purely
academic exercise.‡ The model of nondeterministic Turing machines has proven to
be very fruitful.
As evidence of that, the problems that are associated with these machines, the
members of NP, are eminently practical, as witnessed by the fact that computer
scientists have been trying to find fast solutions to many members of this class
since computers have existed. For another, Lemma 5.9 rephrases questions about
nondeterministic machines as questions about deterministic ones, the verifiers.
We close with a reflection. In this section we have defined the class of problems
NP for which there is a good way to verify the solution, in contrast with the
problems in P for which there is a good way to find the solution. Just as computably
enumerable sets seem to be the limit of what can in theory be known, polytime
verification seems to be the limit of what can feasibly be done.
In the next section we will consider whether the two classes P and NP differ.
†
In order here is a caution about the machine types that seem likely to be coming, quantum computers.
Well-established physical theory says that subatomic particles can be in a superposition of many states
at once. Naively, it may seem that because of this essentially unbounded multi-way branching, if we
could manipulate these particles then we would have nondeterministic computation. But, that we
know of, this is false. That we know of, to get information out of a quantum system we must use
interference. (Some popularizations wrongly suggest that quantum computers can try all potential
solutions in parallel. That is, they miss the point about interference.) ‡ Not that there is anything
wrong with academic exercises.
V.5 Exercises
✓ 5.13 Your study partner asks, “In Lemma 5.9, since the witness 𝜔 is not required
to be effectively computable, why can’t I just take it to be the bit 1 if 𝜎 ∈ L, and 0
if not? Then writing the verifier is easy: just ignore 𝜎 and follow the bit.” They are
confused. Straighten them out.
5.14 Which is the negation of ‘at least one branch accepts’?
(a) Every branch accepts.
(b) At least one branch rejects.
(c) Every branch rejects.
(d) At least one branch fails to reject.
(e) None of these.
✓ 5.15 Decide if it is satisfiable.
(a) (𝑃 ∧ 𝑄) ∨ (¬𝑄 ∧ 𝑅)
(b) (𝑃 → 𝑄) ∧ ¬((𝑃 ∧ 𝑄) ∨ ¬𝑃)
5.16 True or false? If a language is in P then it is in NP.
5.17 Uh-oh. You find yourself with a nondeterministic Turing machine where
on input 𝜎 , one branch of the computation tree accepts and one rejects. Some
branches don’t halt at all. What is the upshot?
✓ 5.18 You get an exercise, Write a nondeterministic algorithm that inputs a maze
and outputs 1 if there is a path from the start to the end.
(a) You hand in an algorithm that does backtracking to find any possible solution.
Your professor sends it back, and says to try again. What was wrong?
(b) You hand in an algorithm that, each time it comes to a fork in the maze,
chooses at random which way to go. Again you get it back with a note to work
out another try. What is wrong with this one?
(c) Give a right answer.
5.19 Sketch a nondeterministic algorithm to search an unordered array of numbers
for the number 𝑘 . Describe it both in terms of unbounded parallelism and in terms
of guessing.
5.20 A simple substitution cipher encrypts text by substituting one letter for
another. Start by fixing a permutation of the letters, for example ⟨W, P, ...⟩. Then
the cipher is that any A is replaced by a W, any B is replaced by a P, etc. Sketch
three algorithms for decoding a substitution cipher (assume that you can recognize
a correctly decoded string): (a) one that is deterministic, (b) one expressed in
terms of unbounded parallelism, and (c) one expressed in terms of guessing.
✓ 5.21 Outline a nondeterministic algorithm that inputs a finite planar graph and
outputs Yes if and only if the graph has a four-coloring. Describe it both in terms
of unbounded parallelism and in terms of a demon providing a witness.
5.22 The Linear Programming problem is described on page 285. The related
problem Integer Linear Programming also seeks to maximize a linear objective
function 𝐹 (𝑥 0, ... 𝑥𝑛 ) = 𝑑 0𝑥 0 + · · · + 𝑑𝑛 𝑥𝑛 subject to linear constraints 𝑎𝑖,0𝑥 0 +
5.25 Sketch a nondeterministic algorithm that inputs a planar graph and a bound
𝐵 ∈ N and decides whether the graph is 𝐵 -colorable, described both in terms of
unbounded parallelism and also in terms of guessing.
✓ 5.26 For each problem, cast it as a language decision problem and then prove that
it is in NP by filling in the blanks in this argument.
Lemma 5.9 requires that we produce a deterministic Turing machine verifier, V . It must input
pairs of the form ⟨𝜎, 𝜔⟩ , where 𝜎 is (1) . It must have the property that if 𝜎 ∈ L then
there is an 𝜔 such that V accepts the input, while if 𝜎 ∉ L then there is no such witness 𝜔 . And
it must run in time polynomial in |𝜎 | .
The verifier interprets the bitstring witness 𝜔 as (2) , and checks that (3) . Clearly
that check can be done in polytime.
If 𝜎 ∈ L then by definition there is (4) , and so a witness 𝜔 exists that will cause V to
accept the input pair ⟨𝜎, 𝜔⟩ . If 𝜎 ∉ L then there is no such (5) , and therefore no witness 𝜔
will cause V to accept the input pair.
(a) The Double-SAT problem inputs a propositional logic statement and decides
whether it has at least two different substitutions of Boolean values that make
it true.
(b) The Subset Sum problem inputs a set of numbers 𝑆 ⊂ N and a target sum 𝑇 ∈
N, and decides whether at least one subset of 𝑆 adds to 𝑇 .
✓ 5.27 In the game show Countdown, players get a target integer 𝑇 ∈ [100 .. 999]
and six numbers from 𝑆 = { 1, 2, ... 10, 25, 50, 75, 100 } (these can be repeated).
They make an arithmetic expression that evaluates to the target, using each given
number at most once. The expression can use addition, subtraction, multiplication,
and division without remainder. Show that the decision problem for Countdown =
{ ⟨𝑠 0, ... 𝑠 5,𝑇 ⟩ ∈ 𝑆 6 × [ 100 .. 999 ] a combination of the 𝑠𝑖 gives 𝑇 } is in NP.
✓ 5.28 Recall that we recast Traveling Salesman optimization problem as a language
decision problem for a family of languages. Show that each such language is in NP
by applying Lemma 5.9, sketching a verifier that works with a suitable witness.
5.29 The problem of Independent Sets starts with a graph and a natural number 𝑛
and decides whether in the graph there are 𝑛 -many independent vertices, that is,
vertices that are not connected. State it as a language decision problem, and use
Lemma 5.9 to show that this problem is in NP.
✓ 5.30 Use Lemma 5.9 to show that the Knapsack problem is in NP.
5.31 True or false? For the language { ⟨𝑎, 𝑏, 𝑐⟩ ∈ N3 𝑎 + 𝑏 = 𝑐 }, the problem of
deciding membership is in NP.
✓ 5.32 The Longest Path problem is to input a graph and a bound, ⟨G, 𝐵⟩ , and
determine whether the graph contains a simple path of length at least 𝐵 ∈ N. (A
path is simple if no two of its vertices are equal). Show that this is in NP.
5.33 Recast each as a language decision problem and then show it is in NP.
(a) The Linear Divisibility problem inputs a pair of natural numbers 𝜎 = ⟨𝑎, 𝑏⟩ and
asks if there is an 𝑥 ∈ N with 𝑎𝑥 + 1 = 𝑏 .
(b) Given 𝑛 points scattered on a line, how far they are from each other defines
a multiset. (Recall that a multiset is like a set but element repeats don’t
collapse.) The reverse of this problem, starting with a multiset 𝑀 of numbers
and deciding whether there exist a set of points on a line whose pairwise
distances defines 𝑀 , is the Turnpike problem.
5.34 Is NP countable or uncountable?
✓ 5.35 Show that this problem is in NP. A company has two delivery trucks. They
work with a weighted graph called the ‘road map’. (Some vertex is distinguished
as the start/finish.) Each morning the company gets a set of vertices, 𝑉 . They
must decide if there are two cycles such that every vertex in 𝑉 is on at least one of
the two cycles, and each cycle has length at most 𝐵 ∈ N.
✓ 5.36 Two graphs G0, G1 are isomorphic if there is a one-to-one and onto function
𝑓 : N0 → N1 such that {𝑣, 𝑣ˆ } is an edge of G0 if and only if { 𝑓 (𝑣), 𝑓 (𝑣ˆ) } is an edge
of G1 . Consider the problem of computing whether two graphs are isomorphic.
(a) Define the appropriate language. (b) Show that the language membership
problem is in NP.
5.37 The definition of when a nondeterministic machine decides a language,
Definition 5.2, requires that every branch in the computation tree is finite. For
recognition of languages we can drop that. We say that a nondeterministic Turing
machine 𝑃 recognizes a language L when if 𝜎 ∈ L then there is at least one
branch in the computation tree that accepts 𝜎 , while if 𝜎 ∉ L then no branch
in the computation tree accepts (some branches may fail to accept because they
are infinite). Show that if there is a nondeterministic machine that recognizes a
language then there is a deterministic machine that also recognizes it.
5.38 Following the definition of Turing machine, on page 8, we gave a formal
description of how these machines act. We did the same for Finite State machines
on page 184, and for nondeterministic Finite State machines on page 193. Give a
formal description of the action of a nondeterministic Turing machine.
5.39 (a) Show that the Halting problem is not in NP. (b) What is wrong with
this reasoning? The Halting problem is in NP because given ⟨P , 𝑥⟩ , we can take as
the witness 𝜔 a number of steps for P to halt on input 𝑥 . If it halts in that number
of steps then the verifier accepts, and if not then the verifier rejects.
Section
V.6 Reductions between problems
†
Often people get the phrase ‘reduces to’ the wrong way around. Perhaps they are misled by ‘𝐵 ≤𝑇 𝐴’
into thinking that 𝐵 is the reduced-to thing but the opposite is true. For example where 𝐴 is the
Entscheidungsproblem of answering all questions in Mathematics and 𝐵 is Goldbach’s conjecture, the
right terminology is that 𝐵 reduces to 𝐴 because a solution for 𝐴 gives one for 𝐵 as a side effect.
6.1 Definition Let L0, L1 be languages, subsets of B∗. Then L1 is polynomial time
reducible to L0 , or Karp reducible, or polynomial time mapping reducible, or
polynomial time many-one reducible, written L1 ≤𝑝 L0 or L1 ≤ᵖₘ L0 , if there is
a total computable function 𝑓 : B∗ → B∗ that runs in polytime and is such that
𝜎 ∈ L1 if and only if 𝑓 (𝜎) ∈ L0 .
6.2 Figure: This is the collection of all problems, L ∈ P ( B∗ ) , with a few shown as dots.
Ones with fast algorithms are at the bottom. Problems are connected if there is a
polytime reduction from one to the other. Highlighted are connections within P.
We gave the intuition that there is a reduction of this kind when one problem
is a translation of the other, or at least a translation of a special case. The first
example illustrates.
6.3 Example The Shortest Path problem inputs a weighted graph, two vertices, and a
bound, and decides if there is a path between the vertices of length less than the
bound.
L0 = { ⟨G, 𝑣 0, 𝑣 1, 𝐵⟩ there is a path from 𝑣 0 to 𝑣 1 of length less than 𝐵 }
The Vertex-to-Vertex Path problem inputs an unweighted graph and two vertices,
and decides if there is a path between the two.
L1 = { ⟨H, 𝑤 0, 𝑤 1 ⟩ there is a path between 𝑤 0 and 𝑤 1 }
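One natural reduction function gives every edge weight 1 and picks a bound that any simple path satisfies. A Python sketch (the instance representation, an edge list plus the two vertices, is ours):

```python
def reduce_v2v_to_shortest_path(instance):
    """Polytime reduction from Vertex-to-Vertex Path to Shortest Path:
    weight every edge 1 and use a bound that no simple path can reach.
    An instance is (edges, w0, w1), with edges a list of vertex pairs."""
    edges, w0, w1 = instance
    vertices = {w0, w1}
    for u, v in edges:
        vertices |= {u, v}
    weighted = [(u, v, 1) for u, v in edges]   # unit weights
    bound = len(vertices)    # any simple path has length < |V|
    return (weighted, w0, w1, bound)
```

There is a path between 𝑤 0 and 𝑤 1 in H if and only if the unit-weighted graph has a path between them of length less than the bound, so 𝜎 ∈ L1 if and only if 𝑓 (𝜎) ∈ L0 , and the function clearly runs in polytime.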
6.4 Remark Authors describing a reduction will often omit this kind of development
from the write-up. It is perfectly standard to expect the reader to work out the
motivation for the reduction function’s definition.
6.5 Example The Clique problem is the decision problem for the language L𝐵 =
{ ⟨G, 𝐵⟩ G has a clique with 𝐵 vertices }. We will sketch that Satisfiability ≤𝑝
Clique, so that intuitively Clique is at least as hard as Satisfiability.
Consider how to satisfy this Boolean expression.
𝐸 = (𝑥 0 ∨ ¬𝑥 1 ∨ 𝑥 2 ) ∧ (¬𝑥 0 ∨ 𝑥 2 ∨ ¬𝑥 3 ) ∧ (¬𝑥 1 ∨ ¬𝑥 2 )
The ∧’s make the statement as a whole 𝑇 if and only if all of its clauses are 𝑇. The
∨’s mean that each clause is 𝑇 if and only if any of its literals is 𝑇. So to satisfy the
expression, select a literal from each clause and assign it the value 𝑇. For example,
we can make 𝐸 be 𝑇 by selecting 𝑥 0 from the first clause, 𝑥 2 from the second, and
¬𝑥 1 from the third, and making them 𝑇. Similarly, if ¬𝑥 1 from the first and third
clauses and 𝑥 3 from the second are 𝑇 then 𝐸 is 𝑇. What we cannot do is pick 𝑥 2
from the first and second and then ¬𝑥 2 from the third, because we cannot set both
of these literals to be 𝑇.
That is, we can think of Satisfiability as a combinatorial problem. The clauses
are like buckets and we select one thing from each bucket, subject to the constraint
that the things we select must be pairwise compatible.
This view of Satisfiability has a binary relation ‘can be compatibly picked’
between the literals. So, as below, let G𝐸 be a graph whose vertices are pairs ⟨𝑐, ℓ⟩
where 𝑐 is the number of a clause and ℓ is a literal in that clause. Two vertices
𝑣 0 = ⟨𝑐 0, ℓ0 ⟩ and 𝑣 1 = ⟨𝑐 1, ℓ1 ⟩ are connected by an edge if they come from different
clauses so 𝑐 0 ≠ 𝑐 1 , and if the literals are not negations of each other so ℓ0 ≠ ¬ℓ1 .
[Figure: the compatibility graph G𝐸 , with vertices ⟨0, 𝑥 0 ⟩, ⟨0, ¬𝑥 1 ⟩, ⟨0, 𝑥 2 ⟩, ⟨1, ¬𝑥 0 ⟩, ⟨1, 𝑥 2 ⟩, ⟨1, ¬𝑥 3 ⟩, ⟨2, ¬𝑥 1 ⟩, and ⟨2, ¬𝑥 2 ⟩, joined by edges between compatible pairs.]
A choice of three mutually compatible vertices makes 𝐸 evaluate to 𝑇 . That is, the
3-clause expression 𝐸 is satisfiable if and only if G𝐸 has a 3-clique.
More formally, the reduction function 𝑓 inputs a propositional logic expression 𝐸
and outputs a pair 𝑓 (𝐸) = ⟨G𝐸 , 𝐵⟩ where G𝐸 is the compatibility graph associated
with 𝐸 defined in the prior paragraph and where 𝐵 is the number of clauses in 𝐸 .
Then 𝐸 ∈ SAT if and only if 𝑓 (𝐸) ∈ L𝐵 . Clearly this function can be computed in
polytime.
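The construction mechanizes directly. A Python sketch, again with a clause-list representation of our choosing (+𝑖 for 𝑥𝑖 and −𝑖 for its negation):

```python
def sat_to_clique(clauses):
    """Karp reduction from Satisfiability to Clique: build the
    compatibility graph and set the bound B to the number of clauses.
    Vertices are (clause number, literal) pairs."""
    vertices = [(c, lit) for c, clause in enumerate(clauses)
                         for lit in clause]
    edges = [(v0, v1)
             for i, v0 in enumerate(vertices)
             for v1 in vertices[i + 1:]
             if v0[0] != v1[0]          # from different clauses
             and v0[1] != -v1[1]]       # not negations of each other
    return (vertices, edges), len(clauses)
```

With only quadratically many vertex pairs to examine, the function clearly runs in polytime.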
6.7 Example Recall that a graph is 𝑘 -colorable if we can partition the vertices into
𝑘 -many classes, called ‘colors’, so that two vertices can have the same color only
if there is no edge between them.
We will illustrate that the Graph Colorability problem reduces to the Satisfiability
problem, Graph Colorability ≤𝑝 Satisfiability, by focusing on the 𝑘 = 3 construction.
(Larger 𝑘 ’s work much the same way, although the 𝑘 = 2 case is different.)
Denote the set of satisfiable propositional logic statements as L0 and the set
of 3-colorable graphs as L1 . To show that L1 ≤𝑝 L0 we must produce a reduction
function 𝑓 . It inputs a graph G and outputs a propositional logic expression
𝐸 = 𝑓 ( G ) such that the graph is 3-colorable if and only if the expression is satisfiable.
The function builds 𝐸 by including clauses that state, in the language of
propositional logic, the constraints to be met for the graph to be 3-colorable.
Let G have 𝑛 -many vertices {𝑣 0, ... 𝑣𝑛− 1 }. Then 𝐸 has 3𝑛 -many Boolean variables,
𝑎 0, ... 𝑎𝑛−1 , and 𝑏 0, ... 𝑏𝑛−1 , and 𝑐 0, ... 𝑐𝑛−1 . The idea is that if the 𝑖 -th vertex 𝑣𝑖
gets the first color then 𝐸 will be satisfied when the associated variables have
𝑎𝑖 = 𝑇 , 𝑏𝑖 = 𝐹, 𝑐𝑖 = 𝐹 , while if 𝑣𝑖 gets the second color then 𝐸 will be satisfied when
𝑎𝑖 = 𝐹, 𝑏𝑖 = 𝑇 , 𝑐𝑖 = 𝐹 , and if 𝑣𝑖 gets the third color then 𝐸 will be satisfied when
𝑎𝑖 = 𝐹, 𝑏𝑖 = 𝐹, 𝑐𝑖 = 𝑇 .
Specifically, the expression includes two kinds of clauses. For every vertex 𝑣𝑖 ,
there is a clause saying that the vertex gets at least one color.
(𝑎𝑖 ∨ 𝑏𝑖 ∨ 𝑐𝑖 )
And for each edge {𝑣𝑖 , 𝑣 𝑗 }, there are three clauses which together ensure that the
edge does not connect two same-color vertices.
(¬𝑎𝑖 ∨ ¬𝑎 𝑗 ) (¬𝑏𝑖 ∨ ¬𝑏 𝑗 ) (¬𝑐𝑖 ∨ ¬𝑐 𝑗 )
The function’s output 𝐸 is the conjunction of all of these clauses.
This illustrates. The expression’s top line has the clauses of the first kind while
the remaining lines have the other kind.
[Figure: a graph with vertices 𝑣 0 , 𝑣 1 , 𝑣 2 , 𝑣 3 and edges {𝑣 0, 𝑣 1 }, {𝑣 0, 𝑣 3 }, {𝑣 1, 𝑣 2 }, {𝑣 1, 𝑣 3 }.]
(𝑎 0 ∨ 𝑏 0 ∨ 𝑐 0 ) ∧ (𝑎 1 ∨ 𝑏 1 ∨ 𝑐 1 ) ∧ (𝑎 2 ∨ 𝑏 2 ∨ 𝑐 2 ) ∧ (𝑎 3 ∨ 𝑏 3 ∨ 𝑐 3 )
∧ (¬𝑎 0 ∨ ¬𝑎 1 ) ∧ (¬𝑏 0 ∨ ¬𝑏 1 ) ∧ (¬𝑐 0 ∨ ¬𝑐 1 )
∧ (¬𝑎 0 ∨ ¬𝑎 3 ) ∧ (¬𝑏 0 ∨ ¬𝑏 3 ) ∧ (¬𝑐 0 ∨ ¬𝑐 3 )
∧ (¬𝑎 1 ∨ ¬𝑎 2 ) ∧ (¬𝑏 1 ∨ ¬𝑏 2 ) ∧ (¬𝑐 1 ∨ ¬𝑐 2 )
∧ (¬𝑎 1 ∨ ¬𝑎 3 ) ∧ (¬𝑏 1 ∨ ¬𝑏 3 ) ∧ (¬𝑐 1 ∨ ¬𝑐 3 )
Completing the argument requires checking that the reduction function, which
inputs a bitstring representation of the graph and outputs a bitstring representation
of the expression, is polynomial. That’s clear so we omit the details.
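To make the construction concrete, here is a sketch of the reduction function in Python. The encoding is our own choice for illustration: variable 𝑎𝑖 becomes the integer 3𝑖 + 1, 𝑏𝑖 becomes 3𝑖 + 2, 𝑐𝑖 becomes 3𝑖 + 3, and a negative integer stands for a negated variable.

```python
def three_color_to_sat(n, edges):
    """Reduction function f of Example 6.7. The graph is given as a vertex
    count n and a list of edges (i, j); the output is a list of CNF clauses,
    each clause a list of nonzero integers (negative means negated)."""
    def a(i): return 3 * i + 1   # "vertex i gets the first color"
    def b(i): return 3 * i + 2   # "vertex i gets the second color"
    def c(i): return 3 * i + 3   # "vertex i gets the third color"
    # Each vertex gets at least one color.
    clauses = [[a(i), b(i), c(i)] for i in range(n)]
    # No edge may join two vertices of the same color.
    for i, j in edges:
        clauses += [[-a(i), -a(j)], [-b(i), -b(j)], [-c(i), -c(j)]]
    return clauses

# The illustration's graph: vertices v0..v3 with edges v0v1, v0v3, v1v2, v1v3.
clauses = three_color_to_sat(4, [(0, 1), (0, 3), (1, 2), (1, 3)])
```

With 𝑛 = 4 vertices and 4 edges this produces the 4 + 3 · 4 = 16 clauses displayed above, so the output size is linear in the size of the graph.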
A reduction function is a kind of compiler. A programming language compiler
inputs descriptions from one domain, such as a Racket program, and outputs a
corresponding statement from another domain, such as an executable in the machine’s
native format. Similarly, the function 𝑓 above translates problem instances in the
domain of graphs to those in the domain of propositional logic.
6.9 Example We will show that Subset Sum reduces to Knapsack, that Subset Sum ≤𝑝
Knapsack. The Knapsack problem starts with a multiset of objects 𝑈 = {𝑢 0, ... 𝑢𝑛− 1 }
whose elements each have a weight 𝑤 (𝑢𝑖 ) and a value 𝑣 (𝑢𝑖 ) , along with an upper
bound on the weights 𝑊 ∈ N and a lower bound for the values 𝑉 ∈ N. It asks for a
subset 𝐴 ⊆ 𝑈 such that the weight total does not exceed 𝑊 while the value total is
at least as big as 𝑉 .
Denote the Subset Sum language as L1 and the Knapsack language as L0 . The
reduction function 𝑓 must input pairs ⟨𝑆,𝑇 ⟩ and output five-tuples
⟨𝑈 , 𝑤, 𝑣,𝑊 , 𝑉 ⟩ , and must be such that ⟨𝑆,𝑇 ⟩ ∈ L1 if and only if 𝑓 (⟨𝑆,𝑇 ⟩) ∈ L0 .
And it must be polytime.
A numerical example gives the idea of how 𝑓 proceeds. Imagine that we want
to know if there is a subset of 𝑆 = { 18, 23, 31, 33, 72, 86, 94 } that adds to 𝑇 = 126.
If we had access to an oracle for Knapsack then we could set 𝑈 = 𝑆 , let 𝑤 and 𝑣 be
the identity functions so that 𝑤 ( 18) = 𝑣 ( 18) = 18 and 𝑤 ( 23) = 𝑣 ( 23) = 23, etc.,
and then fix the weight and value targets as 𝑊 = 𝑉 = 𝑇 = 126. Then ⟨𝑆,𝑇 ⟩ ∈ L1
iff ⟨𝑆, 𝑤, 𝑣,𝑊 , 𝑉 ⟩ ∈ L0 . In this way, we think of the Subset Sum problem as a
special case of Knapsack.
More generally, let 𝑓 take the input ⟨𝑆,𝑇 ⟩ to the output ⟨𝑆, 𝑤, 𝑣,𝑇 ,𝑇 ⟩ , where
the functions 𝑤 and 𝑣 are given by 𝑤 (𝑠𝑖 ) = 𝑣 (𝑠𝑖 ) = 𝑠𝑖 . Then ⟨𝑆,𝑇 ⟩ ∈ L1 if and
only if 𝑓 (⟨𝑆,𝑇 ⟩) ∈ L0 . Clearly 𝑓 can be computed in polytime.
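A sketch of this reduction in Python (the data representation, a list plus two dictionaries, is ours; for simplicity it assumes the elements of 𝑆 are distinct):

```python
def subset_sum_to_knapsack(S, T):
    """Reduction f of Example 6.9: output the five-tuple <U, w, v, W, V>
    with identity weight and value functions and both bounds set to T."""
    w = {s: s for s in S}   # weight function, w(s) = s
    v = {s: s for s in S}   # value function, v(s) = s
    return (S, w, v, T, T)
```

For instance, `subset_sum_to_knapsack([18, 23, 31], 72)` asks the Knapsack oracle for a subset of weight at most 72 and value at least 72; since weights equal values, the only way to meet both criteria is a subset summing to exactly 72.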
We close with some basic facts about polytime reduction.
6.10 Lemma Polytime reduction is reflexive: L ≤𝑝 L for all languages. It is also
transitive: L2 ≤𝑝 L1 and L1 ≤𝑝 L0 imply that L2 ≤𝑝 L0 . Every nontrivial
computable language is 𝑃 hard: for L1 ∈ P, every language L0 with L0 ≠ ∅ and
L0 ≠ N satisfies that L1 ≤𝑝 L0 . The class P is closed downward: if L0 ∈ P and
L1 ≤𝑝 L0 then L1 ∈ P. So also is the class NP.
Proof The first two sentences and the final sentence are Exercise 6.36.
For the third sentence fix a L0 that is nontrivial, so there is a member 𝜎 ∈ L0
and a nonmember 𝜏 ∉ L0 . Let L1 be an element of P. We will specify a polytime
reduction function 𝑓 : on input 𝑥 , decide in polynomial time whether 𝑥 ∈ L1 ,
and output 𝜎 if so and 𝜏 if not. Then 𝑥 ∈ L1 if and only if 𝑓 (𝑥) ∈ L0 , so
L1 ≤𝑝 L0 .
324 Chapter V. Computational Complexity
V.6 Exercises
6.12 Suppose that L1 ≤𝑝 L0 . Which is the right way to use the phrase ‘reduces
to’: “L1 reduces to L0 ,” or “L0 reduces to L1 ?”
✓ 6.13 Show that if L0 ∉ P and L0 ≤𝑝 L1 then L1 ∉ P also. What about NP?
6.14 Your friend is confused. “Lemma 6.10 says that every language in P is ≤𝑝
every other nontrivial language. But there are uncountably many languages and
only countably many 𝑓 ’s because they each come from some Turing machine. So
I’m not seeing how there are enough reduction functions for a given language to
be related to all others.” Un-confuse them.
6.15 Must a set be polytime reducible to its complement?
(a) Show that N is not polytime reducible to the empty set.
(b) Further, show that if 𝐴 ≤𝑝 𝐵 and 𝐵 is computably enumerable then 𝐴 is
computably enumerable. Conclude that 𝐾 c ≰𝑝 𝐾 .
6.16 Prove that L ≤𝑝 Lc if and only if Lc ≤𝑝 L.
6.17 Example 6.9 includes as illustration a Subset Sum problem, where 𝑆 =
{ 18, 23, 31, 33, 72, 86, 94 } and 𝑇 = 126. Solve it.
6.18 Produce the compatibility graph for (𝑥 0 ∨ 𝑥 1 ) ∧ (¬𝑥 0 ∨ ¬𝑥 1 ) ∧ (𝑥 0 ∨ ¬𝑥 1 ) .
6.19 Following the method of Example 6.7 give the expression associated with
the question of whether this graph is 3-colorable. Is that expression satisfiable?
[Graph with vertices 𝑣0,0 , 𝑣0,1 , 𝑣1,0 , 𝑣1,1 , 𝑣2,0 , 𝑣2,1 ]
✓ 6.20 Suppose that the language 𝐴 is polynomial time reducible to the language 𝐵 ,
𝐴 ≤𝑝 𝐵 . Which of these are true?
(a) A tractable way to decide 𝐴 can be used to tractably decide 𝐵 .
(b) If 𝐴 is tractably decidable then 𝐵 is tractably decidable also.
(c) If 𝐴 is not tractably decidable then 𝐵 is not tractably decidable either.
✓ 6.21 The Substring problem inputs two strings and decides if the second is a
substring of the first. The Cyclic Shift problem inputs two strings and decides
if the second is a cyclic shift of the first. (Where 𝛼 = abcde, one cyclic shift is
𝛽 = deabc. More precisely, if 𝛼 = 𝑎 0𝑎 1 ... 𝑎𝑛−1 and 𝛽 = 𝑏 0𝑏 1 ... 𝑏𝑛−1 are length 𝑛
strings, then 𝛽 is a cyclic shift of 𝛼 when there is an index 𝑘 ∈ { 0, ... 𝑛 − 1 } such
that 𝑎𝑖 = 𝑏 (𝑘+𝑖 ) mod 𝑛 for all 𝑖 < 𝑛 .)
(a) Name three cyclic shifts of 𝛼 = 0110010.
(b) Decide whether 𝛽 = 101001101 is a cyclic shift of 𝛼 = 001101101.
(c) State the Substring problem as a language decision problem.
(d) Also state the Cyclic Shift problem as a language decision problem.
(e) Show that Cyclic Shift ≤𝑝 Substring. Hint: for same length strings, 𝛽 is a cyclic
shift of 𝛼 if and only if 𝛽 is a substring of 𝛼 ⌢ 𝛼 .
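The hint translates directly into code; here is a sketch in Python, where the built-in substring test plays the role of an oracle for the Substring problem:

```python
def is_cyclic_shift(alpha, beta):
    """Decide Cyclic Shift via the exercise's hint: two strings of the
    same length are cyclic shifts of one another exactly when the second
    occurs as a substring of the first concatenated with itself."""
    return len(alpha) == len(beta) and beta in alpha + alpha

# The text's example: deabc is a cyclic shift of abcde.
print(is_cyclic_shift("abcde", "deabc"))   # True
```

Since `alpha + alpha` has length 2𝑛 and substring search is polynomial, the whole reduction is polytime.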
✓ 6.22 The Independent Set problem inputs a graph and a bound, and decides if
there is a set of vertices, of size at least equal to the bound, that are not connected
by any edge. The Vertex Cover problem inputs a graph and a bound and decides if
there is a vertex set, of size less than or equal to the bound, such that every edge
contains at least one vertex in the set.
(a) State each as a language decision problem.
(b) Consider this graph. Find a vertex cover with four elements.
[Graph with vertices 𝑣0 , ... , 𝑣9 ]
6.23 The Vertex Cover problem inputs a graph and a bound and decides if there is
a vertex set, of size at most the bound, such that every edge contains at least one
vertex in the set. The Set Cover problem inputs a set 𝑆 , a collection of
subsets 𝑆 0 ⊆ 𝑆 , . . . 𝑆𝑛 ⊆ 𝑆 , and a bound, and decides if there is a subcollection of
the 𝑆 𝑗 , with a number of sets at most equal to the bound, whose union is 𝑆 .
(a) State each as a language decision problem.
(b) Find a vertex cover for this graph.
[Graph with vertices 𝑞0 , ... , 𝑞9 and edges labeled 𝑎 through 𝑛 ]
(c) Make a set 𝑆 consisting of all of that graph’s edges, and for each 𝑣 make a
subset 𝑆 𝑣 of the edges incident on that vertex. Find a set cover.
(d) Show that Vertex Cover ≤𝑝 Set Cover.
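Parts (c) and (d) suggest a reduction function, sketched here in Python (the representation of a graph as a vertex list and a list of edge tuples is our own choice):

```python
def vertex_cover_to_set_cover(vertices, edges, bound):
    """Sketch of the reduction suggested by parts (c) and (d): the ground
    set S is the edge set, and each vertex v contributes the subset S_v of
    edges incident on v. A vertex cover of size k then corresponds to a
    set cover using k of the subsets, and vice versa."""
    S = set(edges)
    subsets = {v: {e for e in edges if v in e} for v in vertices}
    return S, subsets, bound
```

Building the subsets examines every vertex-edge pair, so the function is clearly polytime.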
✓ 6.24 Show that Hamiltonian Circuit ≤𝑝 Traveling Salesman. (a) State each as a
language decision problem. (b) Produce the reduction function.
✓ 6.25 In this network, each edge is labeled with a capacity. (Imagine railroad
lines going from 𝑞 0 to 𝑞 6 .)
[Network with source 𝑞0 , sink 𝑞6 , internal vertices 𝑞1 , ... , 𝑞5 , and edges labeled with capacities]
The Max-Flow problem is to find the maximum total amount that can flow across
the network, usually by using many paths at once. That is, we will find a flow 𝐹𝑞𝑖 ,𝑞 𝑗
for each edge, subject to the constraints that the flow through an edge must not
exceed its capacity and that the flow into a vertex must equal the flow out (except
for the source 𝑞 0 and the sink 𝑞 6 ). The Linear Programming problem is described
on page 285.
(a) Express each as a language decision problem, remembering the technique of
converting optimization problems using bounds.
(b) By eye, find the maximum flow for the above network.
(c) For each edge 𝑣 𝑖 𝑣 𝑗 , define a variable 𝑥𝑖,𝑗 . Describe the constraints on that
variable imposed by the edge’s capacity. Also describe the constraints on the
set of variables imposed by the limitation that for many vertices the flow in
must equal the flow out. Finally, use the variables to give an expression to
optimize in order to get maximum flow.
(d) Show that Max-Flow ≤𝑝 Linear Programming.
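The translation of part (c) can be sketched in Python. This builds the data of the linear program rather than solving it; the representation of the network as a dictionary from directed edges to capacities is ours, and the usage instance below is hypothetical since it need not match the exercise’s figure.

```python
def max_flow_to_lp(capacity, source, sink):
    """Sketch of Max-Flow as a linear program. `capacity` maps each
    directed edge (u, v) to its capacity. Returns: the edge variables
    whose sum is to be maximized (flow into the sink), the per-edge
    capacity bounds 0 <= x_e <= cap(e), and for each internal vertex the
    pair of edge lists whose total flows must be equal."""
    edges = list(capacity)
    objective = [e for e in edges if e[1] == sink]      # maximize their sum
    bounds = [(e, capacity[e]) for e in edges]          # capacity constraints
    internal = {u for e in edges for u in e} - {source, sink}
    conservation = [([e for e in edges if e[1] == n],   # flow in equals ...
                     [e for e in edges if e[0] == n])   # ... flow out
                    for n in internal]
    return objective, bounds, conservation

# A hypothetical four-vertex network from s to t.
net = {("s", "a"): 3, ("s", "b"): 2, ("a", "b"): 1, ("a", "t"): 2, ("b", "t"): 3}
obj, bnds, cons = max_flow_to_lp(net, "s", "t")
```

All three pieces are lists whose sizes are linear or quadratic in the network, so the translation is polytime, which is the heart of part (d).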
6.26 The Max-Flow problem inputs a directed graph where each edge is labeled
with a capacity, and the task is to find the maximum amount that can flow from the
source node to the sink node (for more, see Exercise 6.25). The Drummer problem
starts with two same-sized sets, the rock bands, 𝐵 , and potential drummers, 𝐷 .
Each band 𝑏 ∈ 𝐵 has a set 𝑆𝑏 ⊆ 𝐷 of drummers that they would agree to take on.
The goal is to make the most number of matches.
(a) Consider four bands 𝐵 = {𝑏 0, 𝑏 1, 𝑏 2, 𝑏 3 } and drummers 𝐷 = {𝑑 0, 𝑑 1, 𝑑 2, 𝑑 3 } .
Band 𝑏 0 likes drummers 𝑑 0 and 𝑑 2 . Band 𝑏 1 likes only drummer 𝑑 1 , and 𝑏 2
also likes only 𝑑 1 . Band 𝑏 3 likes the sound of both 𝑑 2 and 𝑑 3 . What is the
largest number of matches?
(b) Express each as a language decision problem.
(c) Draw a graph with the bands on the left and the drummers on the right. Make
an arrow from a band to a drummer if there is a connection. Now add a source
and a sink node to make a flow diagram.
(d) Show that Drummer ≤𝑝 Max-Flow.
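The flow diagram of part (c) can be built mechanically; here is a sketch in Python (the node names "source" and "sink" and the dictionary representation are ours):

```python
def drummer_to_flow(bands, likes):
    """Sketch of parts (c) and (d): a flow network in which every edge
    has capacity 1. An integral maximum flow of size k corresponds to a
    set of k band-drummer matches. `likes` maps each band to the set of
    drummers that band would accept."""
    capacity = {}
    drummers = {d for wanted in likes.values() for d in wanted}
    for b in bands:
        capacity[("source", b)] = 1    # each band takes at most one drummer
        for d in likes[b]:
            capacity[(b, d)] = 1       # an acceptable pairing
    for d in drummers:
        capacity[(d, "sink")] = 1      # each drummer joins at most one band
    return capacity
```

The unit capacities at the source and sink are what enforce that no band or drummer is matched twice.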
6.27 The 3-SAT problem is to decide the satisfiability of CNF propositional logic
expressions where every clause has at most three literals. The Strict 3-Satisfiability
problem requires that each clause has exactly three unequal literals. We will show
that the two are inter-reducible.
(a) Show the easy half, that Strict 3-Satisfiability ≤𝑝 3-SAT.
(b) Also show that we can go from clauses with two literals to clauses with
three by introducing an irrelevant variable: 𝑃 ∨ 𝑄 is equivalent to (𝑃 ∨
𝑄 ∨ 𝑅) ∧ (𝑃 ∨ 𝑄 ∨ ¬𝑅) . Along the same lines, show that 𝑃 is equivalent to
(𝑃 ∨ 𝑄 ∨ 𝑅) ∧ (𝑃 ∨ ¬𝑄 ∨ 𝑅) ∧ (𝑃 ∨ 𝑄 ∨ ¬𝑅) ∧ (𝑃 ∨ ¬𝑄 ∨ ¬𝑅) .
(c) Show 3-SAT ≤𝑝 Strict 3-Satisfiability.
6.28 We will show that the 3-SAT problem, 3-SAT, is inter-reducible with SAT.
(We will assume that instances of SAT are in Conjunctive Normal form.)
(a) Show the easy half, that 3-SAT ≤𝑝 SAT.
(b) As a preliminary for the other reduction, show that the propositional logic
implication 𝑃 → 𝑄 is equivalent to ¬𝑃 ∨ 𝑄 .
(c) To go from clauses with four literals to those with three, start with 𝑃 ∨𝑄 ∨𝑅 ∨𝑆 .
Introduce a variable 𝐴 such that 𝐴 ↔ (𝑃 ∨𝑄) , that is, (𝐴 → (𝑃 ∨𝑄)) ∧ (𝐴 ←
(𝑃 ∨𝑄)) . Show that (𝐴 → (𝑃 ∨𝑄)) is equivalent to (𝑃 ∨𝑄 ∨ ¬𝐴) . Also verify
that (𝑃 ∨𝑄) → 𝐴 is equivalent to (𝑃 ∨¬𝑄 ∨𝐴) ∧ (¬𝑃 ∨𝑄 ∨𝐴) ∧ (¬𝑃 ∨¬𝑄 ∨𝐴) .
Conclude that 𝑃 ∨ 𝑄 ∨ 𝑅 ∨ 𝑆 is equivalent to (𝐴 ∨ 𝑅 ∨ 𝑆) ∧ (𝑃 ∨ 𝑄 ∨ ¬𝐴) ∧
(𝑃 ∨ ¬𝑄 ∨ 𝐴) ∧ (¬𝑃 ∨ 𝑄 ∨ 𝐴) ∧ (¬𝑃 ∨ ¬𝑄 ∨ 𝐴) .
(d) For a five literal clause 𝑃 ∨ 𝑄 ∨ 𝑅 ∨ 𝑆 ∨ 𝑋 , find an equivalent propositional
logic expression made of clauses having only three literals each.
(e) Show that SAT ≤𝑝 3-SAT.
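The equivalence claimed in part (c) is small enough to check by brute force. A sketch in Python, where the Boolean parameter names match the text’s 𝑃, 𝑄, 𝑅, 𝑆, and 𝐴:

```python
from itertools import product

def original(P, Q, R, S):
    # The four-literal clause P ∨ Q ∨ R ∨ S.
    return P or Q or R or S

def three_literal_version(P, Q, R, S, A):
    # The five clauses from part (c), with A introduced to stand for P ∨ Q.
    return ((A or R or S) and (P or Q or not A) and (P or not Q or A)
            and (not P or Q or A) and (not P or not Q or A))

# For every assignment to P, Q, R, S, the four-literal clause is true exactly
# when some value of A makes the three-literal clauses all true.
for P, Q, R, S in product([False, True], repeat=4):
    assert original(P, Q, R, S) == any(
        three_literal_version(P, Q, R, S, A) for A in (False, True))
```

This checks equisatisfiability rather than logical equivalence: the two expressions are over different variable sets, but each satisfying assignment of one extends or restricts to one of the other.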
6.29 The Independent Set problem inputs a graph and a bound, and decides if
there is a set of vertices, of size at least equal to the bound, that are not connected
by any edge.
(a) In this graph, find an independent set with at least 𝐵 = 3 members.
[Graph with vertices 𝑞0 , ... , 𝑞5 ]
An example of such a problem is when nodes are cities and edges are flights, with
the weight of an edge being the flight’s cost.
(a) Show that Traveling Salesman ≤𝑝 Asymmetric Traveling Salesman.
[Figure: tasks 𝑡0 , 𝑡1 , 𝑡2 , 𝑡3 ]
but make it a directed graph where between each worker and task there is an
edge in each direction. Use the given assignment cost table to make appropriate
edge weights. Finish by verifying that there is a polytime computable function 𝑔
that associates optimal assignments with optimal circuits.
6.34 We will show that Fin ≤𝑝 Reg, where they are the decision problems for the
language 𝑅 = {𝑥 ∈ N | the language decided by P𝑥 is regular } and also for the
language 𝐹 = {𝑖 ∈ N | the language decided by P𝑖 is finite } (this means that P𝑖
halts on all inputs and acts as the characteristic function of a set that is finite).
(a) Adapt Example 5.2 from Chapter Four to show that any infinite subset of
{ a𝑛 b𝑛 | 𝑛 ∈ N } is not regular.
(b) Argue that there is a Turing machine with the behavior below. Then apply the
s-m-n lemma to parameterize 𝑥 .
Start
Read 𝜎 , 𝑥
If 𝜎 does not match a𝑛 b𝑛 then print 0
Otherwise, if P𝑥 accepts a 𝜏 of length 𝑛 then print 1, else print 0
End
6.37 When L𝑖 ≤𝑝 L 𝑗 , does that mean that the best algorithm to decide L𝑖 takes
time that is less than or equal to the amount taken by the best algorithm for L 𝑗 ?
Fix a language decision problem L0 whose fastest algorithm is O (𝑛³) , an L1 whose
best algorithm is O (𝑛²) , a L2 whose best is O (2ⁿ) , and a L3 whose best is O ( lg 𝑛) .
Section
V.7 NP completeness
Because P ⊆ NP, the class NP contains lots of easy problems, ones with a fast
algorithm. But the interest in the class is that it also contains lots of problems that
seem to be hard. Can we prove that these problems are indeed hard?
This question was raised by S Cook in 1971. He noted
that the idea of polynomial time reducibility gives us a way
to make precise that an efficient solution for one problem
implies an efficient solution for the other. He then showed
that among the problems in NP, there are ones that are
maximally hard. (This was also shown by L Levin but he
was behind the Iron Curtain and knowledge of his work
did not spread to the rest of the world for some time.)
[Photos: Stephen Cook, b 1939, and Leonid Levin, b 1948]
Here, ‘maximally hard’ means that these are NP problems and they are at least as
hard as any NP problem, in that if we could solve one of these then we could solve
any NP problem at all.
7.1 Theorem (Cook-Levin theorem) The Satisfiability problem is in NP and has the
property that any problem in NP reduces to it: L ≤𝑝 SAT for any L ∈ NP.
First, we have already observed that SAT ∈ NP because, given a Boolean
expression, we can use as a witness 𝜔 an assignment of truth values that satisfies
the expression.
Here is an outline of the proof ’s other half. Given L ∈ NP, we must show that
L ≤𝑝 SAT. We produce a function 𝑓L that translates membership questions for L
into Boolean expressions, such that the membership answer is ‘yes’ if and only if
the expression is satisfiable. What we know about L is that its member 𝜎 ’s are
accepted by a nondeterministic machine P in time given by a polynomial 𝑞 . With
that, from ⟨P , 𝜎, 𝑞⟩ the proof constructs a Boolean expression that yields 𝑇 if and
only if P accepts 𝜎 . The Boolean expression encodes the constraints under which
a Turing machine operates, such as that the only tape symbol that can be changed
in the current step is the symbol under the machine’s head.
7.2 Definition A problem is NP hard if every problem in NP reduces to it, that is,
L is NP hard if L̂ ∈ NP implies that L̂ ≤𝑝 L. A problem is NP complete if, in
addition to being NP hard, it is also a member of NP.†
So a problem is NP complete if it is, in a sense, at least as hard as any member
of NP. The sketch below illustrates.
[Diagram: regions labeled NP hard, NP complete, P, and NP]
7.3 Figure: The blob contains all problems. In the bottom is NP, drawn with P as a
proper subset. The top has the NP-hard problems. The highlighted intersection is
the set of NP complete problems.
The Cook-Levin Theorem says that there is at least one NP complete problem,
namely SAT. In fact, we shall see that there are many such problems.
The NP complete problems are to the class NP as the problems Turing-equivalent
to the Halting problem set 𝐾 are to the computably enumerable sets. If we could
solve the one problem then we could solve every other problem in that class.
7.4 Lemma If L0 is NP complete, and L0 ≤𝑝 L1 , and L1 ∈ NP then L1 is NP complete.
Proof Exercise 7.30.
Soon after Cook raised the question of NP completeness, R Karp
brought it to widespread attention. Karp noted that there are clusters
of problems: there is a collection of problems solvable in time O ( lg (𝑛)) ,
problems of time O (𝑛) , those of time O (𝑛 lg 𝑛) , etc. There is also
a cluster of problems that seem much tougher. He gave a list of twenty-one of
these, drawn from Computer Science, Mathematics, and the natural sciences, where
lots of smart people had for years been unable to find efficient algorithms.
[Photo: Richard M Karp, b 1935]
He showed that all of these problems are NP complete, so that if we could
efficiently solve any then we could efficiently solve them all. Not every
difficult problem is NP complete
but many thousands of problems have been shown to be so and thus whatever it is
that makes these problems hard, all of them share it.
Typically we prove that a problem L is NP complete in two halves. First we
show that it is in NP by exhibiting a witness 𝜔 that a deterministic verifier can
check in polytime. Second, we show that the problem is NP hard by showing
†
In general, for a complexity class C, a problem L is C hard when all problems in that class reduce to
it: if L̂ ∈ C then L̂ ≤𝑝 L. A problem is C complete if it is hard for that class and also is a member of
that class.
that an NP complete problem reduces to it. The list below gives the NP complete
problems most often used. For instance, we might show that 3-SAT ≤𝑝 L.
7.5 Theorem (Basic NP Complete Problems) Each of these problems is NP com-
plete.
3-Satisfiability, 3-SAT Given a propositional logic formula in conjunctive normal
form in which each clause has at most 3 literals, decide if it is satisfiable.
3 Dimensional Matching Given as input a set 𝑀 ⊆ 𝑋 × 𝑌 × 𝑍 , where the sets
𝑋, 𝑌 , 𝑍 all have the same number of elements, 𝑛 , decide if there is a matching,
a set 𝑀̂ ⊆ 𝑀 containing 𝑛 elements such that no two of the triples in 𝑀̂ agree
on any of their coordinates.
Vertex cover Given a graph and a bound 𝐵 ∈ N, decide if the graph has a set 𝐶
of at most 𝐵 -many vertices such that for any edge 𝑣𝑖 𝑣 𝑗 , at least one of its ends
is a member of 𝐶 .
Clique Given a graph and a bound 𝐵 ∈ N, decide if the graph has a set of 𝐵 -many
vertices where any two are connected.
Hamiltonian Circuit Given a graph, decide if it contains a cyclic path that includes
each vertex.
Partition Given a finite multiset 𝑆 of natural numbers, decide if there is a division
of the set into the two parts 𝑆̂ and 𝑆 − 𝑆̂ so the total of their elements is the
same, Σ𝑠∈𝑆̂ 𝑠 = Σ𝑠∉𝑆̂ 𝑠 .
7.6 Example We will show that the Traveling Salesman problem is NP complete.
Recall that we have recast it as the decision problem for the language of pairs ⟨G, 𝐵⟩ ,
where 𝐵 is a parameter bound, and that this problem is a member of NP. We will
show that it is NP hard by proving that the Hamiltonian Circuit problem reduces to
it, Hamiltonian Circuit ≤𝑝 Traveling Salesman.
We need a reduction function 𝑓 . It must input an instance of Hamiltonian Circuit,
a graph G = ⟨N , E ⟩ whose edges are unweighted. Define 𝑓 to return the instance
of Traveling Salesman that uses N as cities, that takes the distances between cities
to be 𝑑 (𝑣𝑖 , 𝑣 𝑗 ) = 1 if 𝑣𝑖 𝑣 𝑗 ∈ E and 𝑑 (𝑣𝑖 , 𝑣 𝑗 ) = 2 if 𝑣𝑖 𝑣 𝑗 ∉ E , and such that the bound
is the number of vertices, 𝐵 = | N | .
This bound means that there will be a Traveling Salesman solution if and only
if there is a Hamiltonian Circuit solution; namely, the salesman uses the edges that
appear in the Hamiltonian circuit. All that remains is to argue that the reduction
function runs in polytime. The number of edges in a graph is at most quadratic in
the number of vertices, so polytime in the size of the input graph is the same as
polytime in the number of vertices. The reduction function’s algorithm examines
all pairs of vertices, which takes time that is quadratic in the number of vertices.
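A sketch of this reduction function in Python (the representation of the graph as a vertex list plus a list of edge tuples, and of distances as a dictionary, is our own choice):

```python
def hamiltonian_to_tsp(vertices, edges):
    """The reduction of Example 7.6: every pair of distinct vertices
    becomes a pair of cities, at distance 1 if the graph has that edge
    and distance 2 if not, and the bound B is the number of vertices."""
    E = {frozenset(e) for e in edges}   # undirected edges as unordered pairs
    d = {(u, v): 1 if frozenset((u, v)) in E else 2
         for u in vertices for v in vertices if u != v}
    return vertices, d, len(vertices)
```

A circuit of total length | N | must use only distance-1 legs, that is, only edges of the original graph, which is exactly a Hamiltonian circuit.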
A common way to show that a given problem L is NP hard is to show that a
special case of L is NP hard.
7.7 Example The Knapsack problem starts with a multiset of objects 𝑆 = {𝑠 0, ... 𝑠𝑘 − 1 },
each with a natural number weight 𝑤 (𝑠𝑖 ) and a value 𝑣 (𝑠𝑖 ) , along with a weight
bound 𝐵 and value target 𝑇 . We then look for a knapsack 𝐾 ⊆ 𝑆 whose elements
have total weight less than or equal to the bound and total value greater than or
equal to the target.
First we check that this problem is in NP. As the witness we can use the 𝑘 -bit
string 𝜔 such that 𝜔 [𝑖] = 1 if 𝑠𝑖 is in the knapsack 𝐾 , and 𝜔 [𝑖] = 0 if it is not. A
deterministic machine can verify this witness in polynomial time since it only has
to total the weights and values of the elements of 𝐾 .
To finish we must show that Knapsack is NP hard. It is sufficient to show that
a special case is NP hard. Consider the Knapsack instance where 𝑤 (𝑠𝑖 ) = 𝑣 (𝑠𝑖 )
for all 𝑠𝑖 ∈ 𝑆 , and where the two criteria each equal half of the weight total,
𝐵 = 𝑇 = 0.5 · Σ0≤𝑖<𝑘 𝑤 (𝑠𝑖 ) . This shows that any instance of the Partition problem,
which is in the above basic list, can be expressed as a Knapsack instance, so
Knapsack is NP hard.
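The special case can be sketched in Python. The tuple layout matches the five-tuple ⟨𝑈 , 𝑤, 𝑣,𝑊 , 𝑉 ⟩ used earlier; indexing the items is our own device so that repeated numbers in the multiset stay distinct.

```python
def partition_to_knapsack(S):
    """Example 7.7's special case: weights and values are the numbers
    themselves, and both the weight bound and the value target equal
    half the total. A knapsack meeting both criteria is exactly one
    side of an even split of S."""
    half = sum(S) / 2
    w = {i: s for i, s in enumerate(S)}   # item i weighs (and is worth) S[i]
    return (list(range(len(S))), w, w, half, half)
```

If the total is odd then `half` is not an integer and no knapsack can hit it, correctly reflecting that no partition exists.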
[Figure (∗): a gadget graph with nodes 𝑇 and 𝐹 , internal nodes 𝑛0 , ... , 𝑛5 , and literal nodes 𝑎 , 𝑏 , 𝑐 ]
We will verify that this gadget is 3-colorable if and only if nodes 𝑎 , 𝑏 , and 𝑐 are
not all the color of 𝐹, matching the behavior of the clause 𝑎 ∨ 𝑏 ∨ 𝑐 . For ‘only if ’,
assume that 𝑎 , 𝑏 , and 𝑐 are the color of 𝐹. Then one of 𝑛 3 and 𝑛 4 is the color of 𝑇
while the other is the color of 𝐺 , and hence 𝑛 0 is the color of 𝐹 . Since 𝑛 2 is the
color of 𝑇 this implies that 𝑛 1 is the color of 𝐺 and this in turn gives that 𝑛 5 is the
color of 𝐹 . That violates 3-colorability because 𝑐 is the color of 𝐹 .
For ‘if ’, we need only exhibit that a 3-coloring exists for each remaining case.
𝑎 𝑏 𝑐 𝑛0 𝑛1 𝑛2 𝑛3 𝑛4 𝑛5
𝐹 𝐹 𝑇 𝐹 𝐺 𝑇 𝑇 𝐺 𝐹
𝐹 𝑇 𝐹 𝑇 𝐹 𝑇 𝐺 𝐹 𝐺
𝐹 𝑇 𝑇 𝐹 𝐺 𝑇 𝑇 𝐺 𝐹
𝑇 𝐹 𝐹 𝑇 𝐹 𝑇 𝐹 𝐺 𝐺
𝑇 𝐹 𝑇 𝐹 𝐺 𝑇 𝐺 𝑇 𝐹
𝑇 𝑇 𝐹 𝑇 𝐹 𝑇 𝐹 𝐺 𝐺
𝑇 𝑇 𝑇 𝑇 𝐹 𝑇 𝐹 𝐺 𝐺
[Figure (∗∗): gadget nodes 𝑇 and 𝐹 wired to literal nodes 𝑥 , ¬𝑥 , 𝑦 , ¬𝑦 , 𝑧 , ¬𝑧 ]
[Diagram: the two possibilities, P = NP, or P a proper subset of NP]
There are a number of ways to potentially settle the question. For example,
by Lemma 7.4 if there is even one NP complete problem that we can prove is a
member of P, then P = NP. Conversely, if someone shows that there is an NP
problem that is not a member of P then P ≠ NP. However, despite nearly a half
century of effort by many brilliant people, no one has accomplished either one.
To explain all the effort on the question, we first argue for its importance. As
formulated in Karp’s original paper, the question of whether P equals NP might
seem of only technical interest.
A large class of computational problems involve the determination of properties
of graphs, digraphs, integers, arrays of integers, finite families of finite sets, boolean
formulas and elements of other countable domains. Through simple encodings . . .
these problems can be converted into language recognition problems, and we can
inquire into their computational complexity. It is reasonable to consider such a problem
satisfactorily solved when an algorithm for its solution is found which terminates
within a number of steps bounded by a polynomial in the length of the input. We
show that a large number of classic unsolved problems of covering, matching, packing,
routing, assignment and sequencing are equivalent, in the sense that either each of
them possesses a polynomial-bounded algorithm or none of them does.
These careful words mask the excitement. Karp demonstrated that many problems
that people had been struggling with in practice — classic unsolved problems —
fall in this category. Researchers who have been looking for an efficient solution
to Vertex Cover and those who have been working on Clique find that they are
working on the same problem, in that the two are inter-translatable. By now the
list of NP complete problems includes determining the best layout of transistors on
a chip, developing accurate financial-forecasting models, analyzing protein-folding
behavior in a cell, or finding the most energy-efficient airplane wing. So the
question of whether P equals NP is extremely practical, and extremely important.†
Researchers often take proving that a problem is NP complete to be an ending
point; they may feel that continuing to look for an algorithm is a waste since
many of the world’s best minds have failed to find one. They may turn to finding
approximations (see Extra B) or to probabilistic methods.
We next argue that among many similar questions, each of which is important,
P versus NP suggests itself as especially significant. First, a philosophical take. At
the start of this book we studied problems that are unsolvable. That is black and
white — either a problem is mechanically solvable or it is not. In this chapter we
find that many problems are solvable in principle but computing a solution seems
to be infeasible. The set P consists of the problems that we can feasibly solve. But
if P ≠ NP then the problems in NP − P, including the NP complete ones, are ones
for which we can verify a correct answer but we cannot reliably find it. The poet
†
One indication of its importance is its inclusion on the Clay Mathematics Institute’s list of problems for
which there is a one million dollar prize; see http://www.claymath.org/millennium-problems. Part
of the introduction there says, “[O]ne of the outstanding problems in computer science is determining
whether questions exist whose answer can be quickly checked, but which require an impossibly long
time to solve by any direct procedure. Problems . . . certainly seem to be of this kind, but so far no one
has managed to prove that any of them really are so hard as they appear.”
R Browning wrote, “Ah, but a man’s reach should exceed his grasp, Or what’s a
heaven for?” We can view these problems as a transition between the possible and
the impossible.
The sense that the P versus NP question fits into a larger intellectual
setting returns us to the book’s opening. Recall the Entscheidungsproblem
that was a motivation behind the definition of a Turing machine. It asks for
an algorithm that inputs a mathematical statement and decides whether it is
true. It is perhaps a caricature, but imagine that the job of mathematicians
is to prove theorems. Then the Entscheidungsproblem asks if it is possible
to replace mathematicians with mechanisms.
[Portrait: Robert Browning, 1812–1889]
In the intervening century we have come to understand, through the work of Gödel
and others, that there is a difference between a statement’s being true and its
being provable. Church and Turing expanded on this
insight to show that the Entscheidungsproblem is unsolvable. Consequently, we
change to asking for an algorithm that inputs statements and decides whether they
are provable.
In principle this is simple. A proof is a sequence of statements, 𝜎0 , 𝜎1 , . . . 𝜎𝑘 ,
where the final statement is the conclusion and where each statement either is
an axiom or else follows from the statements before it by an application of a rule
of deduction (a typical rule allows the simultaneous replacement of all 𝑥 ’s with
𝑦 + 4’s). A computer could brute-force the question of whether a given statement
is provable by doing a dovetail, a breadth-first search of all derivations. If a proof
exists then it will appear, eventually.†
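This brute-force search can be illustrated with a toy rewriting system invented here for the purpose; it is not first-order logic, just an analogue in which the single axiom is a string and each rule of deduction produces a new ‘theorem’ from an old one.

```python
from collections import deque

# A toy proof system: the axiom is "I" and the two deduction rules
# append "U" to a theorem or double it.
AXIOM = "I"
RULES = [lambda s: s + "U", lambda s: s + s]

def provable(goal, max_steps=10_000):
    """Breadth-first (dovetailed) search of all derivations. If a proof
    of `goal` exists it appears, eventually; we also cap the step count.
    Since both rules lengthen strings, anything longer than the goal can
    safely be pruned."""
    queue, seen = deque([AXIOM]), {AXIOM}
    for _ in range(max_steps):
        if not queue:
            return False          # the whole (pruned) space was searched
        statement = queue.popleft()
        if statement == goal:
            return True           # a derivation ending in `goal` was found
        for rule in RULES:
            new = rule(statement)
            if new not in seen and len(new) <= len(goal):
                seen.add(new)
                queue.append(new)
    return False                  # gave up within the step budget

print(provable("IUIU"))   # True, via I -> IU -> IUIU
```

The pruning makes this toy search terminate, but in a real logic the search space is unbounded, which is exactly the ‘eventually’ that the next paragraph worries about.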
The difficulty is the ‘eventually’. This algorithm is very slow. Is there a tractable
way? In the terminology that we now have, the modified Entscheidungsproblem is a
decision problem: given a statement 𝜎 and a bound, we ask if there is a sequence 𝜔
of statements witnessing a proof that ends in 𝜎 and that is shorter than the bound.
A computer can quickly check whether a given proof is valid — this problem is
in NP. With the current status of the P versus NP problem, the answer to the
question in the prior paragraph is that no one knows of a fast algorithm but no one
can show that there isn’t one either.
As far back as 1956, Gödel raised these issues in a letter to von Neumann (this
letter did not become public until years later).‡
One can obviously easily construct a Turing machine, which for every formula 𝐹 in
first order predicate logic and every natural number 𝑛 , allows one to decide if there
†
That is, in a particular subject such as elementary number theory, the set of theorems is computably
enumerable. ‡ At the meeting where Gödel, as an unknown fresh PhD, announced his Incompleteness
Theorem, the only person who approached him with interest was von Neumann, who was already well
established. Later, when Gödel was trying to escape the Nazis, von Neumann wrote to the director of
the Institute for Advanced Study, “Gödel is absolutely irreplaceable. He is the only mathematician . . .
about whom I would dare to make this statement.” So they were professionally quite close. At the time
of the letter, von Neumann had cancer, probably from his work on the Manhattan Project. Gödel was
misinformed and wrote, “Since you now, as I hear, are feeling stronger, I would like to allow myself to
write you about a mathematical problem, of which your opinion would very much interest me.” Within
a year von Neumann had died. We don’t know if he replied or even read the letter.
Discussion Certainly the P versus NP question is the sexiest one in the Theory
of Computing today. It has attracted a great deal of gossip. In 2018, a poll of
experts found that out of 152 respondents, 88% thought that P ≠ NP while only
12% thought that P = NP. This subsection discusses some of the intuition involved
in the question.
First we address the intuition around the conjecture that P ≠ NP.
One way to think about the question is that a problem is in P if finding a solution
is fast, while a problem is in NP if verifying the correctness of a given witness is
fast. Then the claim that P ⊆ NP becomes the observation that if a problem is
fast to solve then it must be fast to verify. But the other inclusion seems to most
experts to be extremely unlikely.
[Photo: A Selman’s plate, courtesy S Selman]
For example, speaking informally, S Aaronson has said, “I’d give it a 2 to 3 percent
chance that P equals NP. Those are the betting odds that I’d take.” Similarly, R Williams puts
the chance that P ≠ NP at 80%.
V Strassen has compared our confidence in this with our confidence in laws of
natural science such as 𝐹 = 𝑚𝑎 or 𝑃𝑉 = 𝑛𝑅𝑇 , “The evidence in favor of P ≠ NP . . .
is so overwhelming, and the consequences of their failure are so grotesque, that
their status may perhaps be compared to that of physical laws rather than that of
ordinary mathematical conjectures.”
As early as Karp’s original paper there was a sense that P ≠ NP was the natural
supposition. Here is the first paragraph of that paper.
All the general methods presently known for computing the chromatic number of a
graph, deciding whether a graph has a Hamiltonian circuit, or solving a system of linear
inequalities in which the variables are constrained to be 0 or 1, require a combinatorial
search for which the worst case time requirement grows exponentially with the length
of the input. In this paper we give theorems which strongly suggest . . . that these
problems, as well as many others, will remain intractable perpetually.
This intuition comes from a number of sources but an important one is the
everyday experience that there is a genuine difference between the difficulty of
finding a solution and that of verifying that an existing solution is correct.
Imagine a jigsaw puzzle. We perceive that if a demon gave
us an assembled puzzle 𝜔 , then checking that it is correct is very
much easier than it would have been to work out the solution
from scratch. Checking for correctness is mechanical, tedious. But
the finding of a solution, we perceive, is creative — we feel that
solving a jigsaw puzzle by brute-force trying every possible piece
against every other is too much computation to be practical.
Similarly, mathematicians find that verifying the correctness of a formally-
described proof is routine. But finding that proof in the first place may be the work
of a lifetime, or more.
Some commentators have extended this way of thinking beyond the narrow
bounds of Theoretical Computer Science. One is A Wigderson, “[P = NP would
be] utopia for the quest for knowledge and technological development by humans.
There would be a short program that, for every mathematical statement and given
page limit, would quickly generate a proof of that length, if one exists! There
would be a short program which, given detailed constraints on any engineering
task, would quickly generate a design which meets the given criteria, if one exists.
The design of new drugs, cheap energy, better strains of food, safer transportation,
and robots that would release us from all unpleasant chores, would become a
triviality.” He continues, “. . . most people revolt against the idea that such amazing
discoveries like Wiles’s proof of Fermat, Einstein’s relativity, Darwin’s evolution,
Edison’s inventions, as well as all the ones we are awaiting, could be produced in
succession quickly by a mindless robot. . . . If P = NP , any human (or computer)
would have the sort of reasoning power traditionally ascribed to deities, and this
seems hard to accept.”
Cook is of the same mind, “. . . Similar remarks apply to diverse creative
human endeavors, such as designing airplane wings, creating physical theories, or
even composing music. The question in each case is to what extent an efficient
algorithm for recognizing a good result can be found.” Perhaps it is hyperbole to
say that if P = NP then writing great symphonies would be a job for computers,
a job for mechanisms, but it is correct to say that if P = NP and if we can write
fast algorithms to recognize excellent music — and our everyday experience with
Artificial Intelligence makes this seem more and more a possibility — then we could
have fast mechanical writers of excellent music.
We finish with a taste of the intuition behind the contrarian view, the sense that
perhaps P = NP could be right.
Many observers have noted that there are cases where everyone “knew” that
some algorithm was the fastest but in the end it proved not to be so. The section on
Big-O begins with one, the grade school algorithm for multiplication. Another is
the problem of solving systems of linear equations. The Gauss’s Method algorithm,
which runs in time O (𝑛^3 ) , is perfectly natural and had been known for centuries
without anyone making improvements. However, while trying to prove that Gauss’s
Method is optimal, V Strassen found an O (𝑛^{lg 7} ) method (lg 7 ≈ 2.81).†
A more dramatic speedup happens with the Matching problem. It starts with a
graph whose vertices represent people and such that pairs of vertices are connected
if the people are compatible. We want a set of edges that is as large as possible, such
that no two edges share a vertex. The naive algorithm tries all possible match
sets, which takes 2^𝑚 checks where 𝑚 is the number of edges. Even with only a
hundred people there are more things to try than atoms in the universe. But since
the 1960’s we have an algorithm that runs in polytime.
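To make the brute-force cost concrete, here is a sketch in Python (the graph and the function name are ours, for illustration only). It examines subsets of edges, so in the worst case it does on the order of 2^𝑚 checks.

```python
from itertools import combinations

def max_matching(edges):
    # Try subsets of edges, largest first, and return the first subset
    # in which no two edges share a vertex.
    for size in range(len(edges), 0, -1):
        for subset in combinations(edges, size):
            endpoints = [v for edge in subset for v in edge]
            if len(endpoints) == len(set(endpoints)):
                return list(subset)
    return []

# A path on four vertices: the best matching uses the two outer edges.
print(max_matching([(0, 1), (1, 2), (2, 3)]))  # [(0, 1), (2, 3)]
```

The polytime algorithms alluded to above avoid this exhaustive search entirely.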
Every day on the Theory of Computing blog feed there are examples of
researchers producing algorithms faster than the ones previously known. A person
can certainly have the sense that we are only just starting to explore what is
possible with algorithms. R J Lipton captured this feeling.
Since we are constantly discovering new ways to program our “machines,” why not
a discovery that shows how to factor? or how to solve SAT? Why are we all so sure that
there are no great new programming methods still to be discovered? . . . I am puzzled
that so many are convinced that these problems could not fall to new programming
tricks, yet that is what is done each and every day in their own research.
Knuth has a related but somewhat different take.
Some of my reasoning is admittedly naive: It’s hard to believe that P ≠ NP and that
so many brilliant people have failed to discover why. On the other hand if you imagine
a number 𝑀 that’s finite but incredibly large . . . then there’s a humongous number of
possible algorithms that do 𝑛^𝑀 bitwise addition or shift operations on 𝑛 given bits, and
it’s really hard to believe that all of those algorithms fail.
My main point, however, is that I don’t believe that the equality P = NP will turn
out to be helpful even if it is proved, because such a proof will almost surely be
nonconstructive. Although I think 𝑀 probably exists, I also think human beings will
never know such a value. I even suspect that nobody will even know an upper bound
on 𝑀 .
Mathematics is full of examples where something is proved to exist, yet the proof
tells us nothing about how to find it. Knowledge of the mere existence of an algorithm
is completely different from the knowledge of an actual algorithm.
†
Here is an analogy: consider the problem of evaluating 2𝑝^3 + 3𝑝^2 + 4𝑝 + 5. Someone might claim
that writing it as 2 · 𝑝 · 𝑝 · 𝑝 + 3 · 𝑝 · 𝑝 + 4 · 𝑝 + 5 makes it obvious that it requires six multiplications.
But rewriting it as 𝑝 · (𝑝 · ( 2 · 𝑝 + 3 ) + 4 ) + 5 shows that it can be done with just three. That is,
naturalness and obviousness do not guarantee that something is correct. Without a proof, we must
worry that someone will produce a clever way to do the job with fewer operations.
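The counts in that analogy can be checked mechanically. Here is a quick sketch, in Python (the function names are ours), that evaluates both forms while tallying multiplications.

```python
def naive_eval(coeffs, p):
    # Evaluate term by term, computing a * p * p * ... one step at a time.
    total, mults = 0, 0
    degree = len(coeffs) - 1
    for i, a in enumerate(coeffs):
        term = a
        for _ in range(degree - i):
            term *= p
            mults += 1
        total += term
    return total, mults

def horner_eval(coeffs, p):
    # Horner's form p*(p*(2p + 3) + 4) + 5: one multiplication per level.
    total, mults = coeffs[0], 0
    for a in coeffs[1:]:
        total = total * p + a
        mults += 1
    return total, mults

print(naive_eval([2, 3, 4, 5], 7))   # (866, 6)
print(horner_eval([2, 3, 4, 5], 7))  # (866, 3)
```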
V.7 Exercises
7.11 This diagram is an extension of one we saw earlier. (It assumes that P ≠ NP.)
[Diagram with nested regions labeled P, NP, NP hard, Rec, and RE]
✓ 7.17 Assume that P ≠ NP. Which of these statements can we infer from the fact
that the Prime Factorization problem is in NP, but is not known to be NP complete?
(a) There exists an algorithm for arbitrary instances of the Prime Factorization
problem.
(b) There exists an algorithm that efficiently solves arbitrary instances of this
problem.
(c) If we found an efficient algorithm for the Prime Factorization problem then we
could immediately use it to solve Traveling Salesman.
✓ 7.18 Suppose that L1 ≤𝑝 L0 . For each, decide if you can conclude it. (a) If L0 is
NP complete then so is L1 . (b) If L1 is NP complete then so is L0 . (c) If L0 is
NP complete and L1 is in NP then L1 is NP complete. (d) If L1 is NP complete
and L0 is in NP then L0 is NP complete. (e) It cannot be the case that both L0
and L1 are NP complete. (f) If L1 is in P then so is L0 . (g) If L0 is in P then so
is L1 .
7.19 Show that these are in NP but are not NP complete, assuming that P ≠ NP.
(a) The language of even numbers.
(b) The language { G | G has a vertex cover of size at most four } .
[Figure: a graph on the vertices 𝑣0 , ... , 𝑣8 ]
✓ 7.24 The Longest Path problem is to input a graph and find the longest simple
path in that graph.
(a) Find the longest path in this graph.
[Figure: a graph on the vertices 𝑞0 , ... , 𝑞8 ]
Section
V.8 Other classes
There are many other defined complexity classes. The next class is quite natural.
We know by a result called the Time Hierarchy Theorem that the three classes
are not all equal. But where the division is, we don’t know. Just as we don’t today
have a proof that P is a proper subset of NP, we also don’t know whether or not
there are NP complete problems that absolutely require exponential time. The
class NP could conceivably be contained in a smaller deterministic time complexity
class — for instance, maybe Satisfiability can be solved in less than exponential
time. But we just don’t know.
8.3 Figure: The blob encloses all problems. Shaded are the three classes P, NP, and
EXP. They are drawn with strict containment, which most experts guess is the true
arrangement, but no one knows for sure.
EXP = ⋃_{𝑐∈N} DTIME ( 2^{𝑛^𝑐} ) = DTIME ( 2^𝑛 ) ∪ DTIME ( 2^{𝑛^2} ) ∪ DTIME ( 2^{𝑛^3} ) ∪ · · ·
Proof The only equality that is not immediate is the last one. Recall that a problem
is in EXP if there is an algorithm for it that runs in time O (𝑏^{𝑝 (𝑛)} ) for some constant
base 𝑏 and polynomial 𝑝 . The equality above only uses the base 2. To cover the
discrepancy, we will show that 3^𝑛 ∈ O ( 2^{𝑛^2} ) . Consider lim_{𝑥→∞} 2^{𝑥^2} /3^𝑥 . Rewrite
the fraction as ( 2^𝑥 /3)^𝑥 , which when 𝑥 > 2 is larger than ( 4/3)^𝑥 , which goes to
infinity. This argument works for any base, not just 𝑏 = 3.
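A quick numerical check of that limit argument, sketched here in Python with sample values of our own choosing:

```python
# The ratio 2^(x^2) / 3^x equals (2^x / 3)^x, which grows without bound
# once x > 2; so 3^n is in O(2^(n^2)).
for x in [3, 5, 10]:
    ratio = 2 ** (x * x) // 3 ** x
    print(x, ratio)
```

The printed ratios grow rapidly, as the proof predicts.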
8.6 Remark While the above description of NP reiterates its naturalness, as we saw
earlier, the characterization that proves to be most useful in practice is that a
problem L is in NP if there is a deterministic Turing machine V such that for each
input 𝜎 there is a polynomial length witness 𝜔 and the verification on V for 𝜎
using 𝜔 takes polytime.
Space Complexity We can consider how much space is used in solving a problem.
8.7 Definition A deterministic Turing machine runs in space 𝑠 : N → R+ if for
all but finitely many inputs 𝜎 , the computation on that input uses less than or
equal to 𝑠 (|𝜎 |) -many cells on the tape. A nondeterministic Turing machine runs
in space 𝑠 if for all but finitely many inputs 𝜎 , every computation path on that
input takes less than or equal to 𝑠 (|𝜎 |) -many cells.
The machine must use less than or equal to 𝑠 (|𝜎 |) -many cells even on non-
accepting computations.
8.8 Definition Let 𝑠 : N → N. A language decision problem is an element of
DSPACE (𝑠) , or SPACE (𝑠) , if that language is decided by a deterministic Turing
machine that runs in space O (𝑠) . A problem is an element of NSPACE (𝑠) if
the language is decided by a nondeterministic Turing machine that runs in
space O (𝑠) .
The definitions arise from a sense we have of a symmetry between time and
space, that they are both examples of computational resources. (There are other
resources; for instance we may want to minimize disk reading or writing, which
may be quite different than space usage.) But space is not just like time. For one
thing, while a program can take a long time but use only a little space, the opposite
is not possible.
8.9 Lemma Let 𝑓 : N → N. Then DTIME (𝑓 ) ⊆ DSPACE (𝑓 ) . As well, this holds for
nondeterministic machines, NTIME (𝑓 ) ⊆ NSPACE (𝑓 ) .
Proof A machine can use at most one new cell per step, so a machine that runs
within time 𝑓 also runs within space 𝑓 .
8.10 Definition PSPACE = ⋃_{𝑐∈N} DSPACE (𝑛^𝑐 ) is the class of problems decidable
in polynomial space. Similarly, NPSPACE = ⋃_{𝑐∈N} NSPACE (𝑛^𝑐 ) .
The Zoo Researchers have studied a great many complexity classes. There
are so many that they have been gathered into an online Complexity Zoo, at
complexityzoo.uwaterloo.ca/.
One way to understand these classes is that defining a class asks a type of
Theory of Computing question. For instance, we have already seen that asking
whether NP equals P is a way of asking whether unbounded parallelism makes any
essential difference — can a problem change from intractable to tractable if we
switch from a deterministic to a nondeterministic machine? Similarly, we know
that P ⊆ PSPACE. In thinking about whether the two are equal, researchers are
considering the space-time tradeoff: if you can solve a problem without much
memory does that mean you can solve it without using much time?
Here is one extra class, to give some flavor of the possibilities. For more, see
the Zoo.
The class BPP, Bounded-Error Probabilistic Polynomial Time, contains the
problems solvable by a nondeterministic polytime machine such that if the answer
is ‘yes’ then at least two-thirds of the computation paths accept and if the answer is
‘no’ then at most one-third of the computation paths accept. (Here all computation
paths have the same length.) This is often identified as the class of feasible problems
for a computer with access to a genuine random-number source. Investigating
whether BPP equals P is asking whether every efficient randomized
algorithm can be made deterministic: are there some problems for which there are
fast randomized algorithms but no fast deterministic ones?
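The two-thirds threshold in that definition is not delicate: repeating a bounded-error algorithm and taking a majority vote drives the error down. Here is a sketch, in Python, that computes the exact error probability of the majority vote (the function name and the sample trial counts are ours).

```python
from math import comb

def majority_error(p_err, trials):
    # Probability that a majority of independent trials are wrong, when
    # each single trial errs with probability p_err.
    need = trials // 2 + 1  # this many wrong answers make the vote wrong
    return sum(comb(trials, k) * p_err ** k * (1 - p_err) ** (trials - k)
               for k in range(need, trials + 1))

# With per-trial error 1/3, repetition shrinks the overall error.
for trials in [1, 15, 101]:
    print(trials, majority_error(1 / 3, trials))
```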
On reading in the Zoo, a person is struck by two things. There are many, many
results listed — we know a lot. But there also are many questions to be answered —
breakthroughs are there waiting for a discoverer.
V.8 Exercises
✓ 8.14 For each problem, give a naive algorithm that runs in exponential time. (a) Subset Sum
problem (b) 𝑘 Coloring problem
✓ 8.16 This illustrates how large a problem can be and still be in EXP. Consider a
game that has two possible moves at each step. The game tree is binary.
(a) How many elementary particles are there in the universe?
(b) At what level of the game tree will there be more possible branches than there
are elementary particles?
(c) Is that longer than a chess game can reasonably run?
8.17 We will show that a polynomial time algorithm that calls a polynomial time
subroutine can run, altogether, in exponential time.
(a) Verify that the grade school algorithm for multiplication gives that squaring
an 𝑛 -bit integer takes time O (𝑛^2 ) .
(b) Verify that repeated squaring of an 𝑛 -bit integer gives a result that has length
2^𝑖 · 𝑛 , where 𝑖 is the number of squarings.
(c) Verify that if your polynomial time algorithm calls a squaring subroutine 𝑛
times then the complexity is O ( 4^𝑛 · 𝑛^2 ) , which is exponential.
Extra
V.A RSA Encryption
In this chapter we have built up the sense that there are functions that are intractable
to compute. Here we see how we can try to leverage this to engineering advantage.
We will describe the celebrated RSA encryption system.
One of the great things about the interwebs, besides that you can get free books,
is that you can buy stuff. You send a credit card number and a couple of days
later the stuff appears. For this to be practical, your credit card number must be
encrypted.
When you visit a web site using a https address, that site sends you information,
called a key, that your browser uses to encrypt your card number. The web site then
uses a different key to decrypt. This is an important point: the decrypter must differ
from the encrypter since people on the net can see the encrypter information that
the site sent you. But the site keeps the decrypter information private. These two,
encrypter and decrypter, form a matched pair. We will describe the mathematical
technologies that make this work.
The arithmetic We can take the view that everything on a computer is numbers.
Consider the message ‘send money’. Its ASCII encoding is 115 101 110 100 32 109
111 110 101 121. Converting to a bitstring gives 01110011 01100101 01101110
01100100 00100000 01101101 01101111 01101110 01100101 01111001. In
decimal that’s 544 943 221 199 950 100 456 825. So there is no loss in generality
in viewing everything we do, including encryption systems, as numerical operations.
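This bytes-to-number view is a one-liner in most languages; for instance, in Python:

```python
# A string of characters is a string of bytes, which is a (large) number.
message = b"send money"
as_number = int.from_bytes(message, "big")
print(as_number)  # 544943221199950100456825

# The conversion loses nothing: we can get the bytes back.
assert as_number.to_bytes(len(message), "big") == message
```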
To make encryption systems, mathematicians and computer scientists have
leveraged that there are things we can do easily but that we do not know
how to easily undo — there are operations we can use for encryption that are fast,
but such that the operations needed to decrypt (without the decrypter) are believed
to be so slow that they are completely impractical. So this is the engineering of
Big-O.
We will describe an algorithm based on the Factoring prob-
lem. We have algorithms for multiplying numbers that are
fast. By comparison, the algorithms that we have for starting
with a number and decomposing it into factors are quite slow.
To illustrate this, you might contrast the time it takes you to
multiply two four-digit numbers by hand with the time it takes
you to factor an eight-digit number chosen at random. For
that second job set aside an afternoon; it’ll take a while.
[Photo: Adi Shamir (b 1952), Ron Rivest (b 1947), Leonard Adleman (b 1945)]
The algorithm that we shall describe exploits this difference.
It was invented in 1976 by three young MIT researchers, R Rivest,
A Shamir, and L Adleman. Rivest read a paper proposing the idea of key pairs
and decided to develop an implementation. Over a year, he and Shamir came
up with a number of ideas and for each Adleman would then produce a way to
break it. Finally they thought to use Fermat’s Little Theorem (see below). Adleman
was unable to break it since, he said, it seemed that only solving the Factoring
problem would break it and no one knew how to do that. Their algorithm, called
RSA, was first announced in Martin Gardner’s Mathematical Games column in the
August 1977 issue of Scientific American. It generated a tremendous amount of
interest and excitement.
The basis of RSA is to find three numbers, a modulus 𝑛 , an encrypter 𝑒 , and a
decrypter 𝑑 , related by this equation (here 𝑚 is the message, as a number).
(𝑚^𝑒 )^𝑑 ≡ 𝑚 ( mod 𝑛)
and sends Alice the sequence ⟨10496, 4861⟩ . Alice recovers his message by using
her private key.
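As a toy illustration of the key equation, here is a sketch in Python. The small primes and keys below are our own illustrative numbers; real keys use primes hundreds of digits long.

```python
# Toy RSA with small (completely insecure) primes p and q.
p, q = 61, 53
n = p * q                # modulus, published
phi = (p - 1) * (q - 1)  # kept secret
e = 17                   # encrypter, published
d = 2753                 # decrypter: e*d = 1 mod phi, kept secret
assert (e * d) % phi == 1

m = 65                    # the message, as a number smaller than n
c = pow(m, e, n)          # encrypt: c = m^e mod n
assert pow(c, d, n) == m  # decrypt: (m^e)^d = m mod n
print("decrypted:", pow(c, d, n))
```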
The arithmetic, fast We’ve just illustrated that RSA uses invertible operations.
There are lots of ways to get invertible operations, so our understanding of RSA is
not complete until we see why these particular operations are fast in practice.
[Graph of the number of primes below 𝑥 together with 𝑥/ln (𝑥 )]
The Prime Number Theorem says that the number of primes less than 𝑥 grows
like 𝑥/ln (𝑥 ) . This theorem says that primes are common. For example, the number of primes
less than 2^1024 is about 2^1024 /ln ( 2^1024 ) ≈ 2^1024 /709.78 ≈ 2^1024 /2^9.47 ≈ 2^1015 .
Said another way, if we choose a number 𝑛 at random then the probability that it
is prime is about 1/ln (𝑛) and so a random number that is 1024 bits long will be a
prime with probability about 1/( ln ( 2^1024 )) ≈ 1/710. On average we need only
select 355 odd numbers of about that size before we find a prime. Hence we can
efficiently generate large primes by just picking random numbers, as long as we
can efficiently test their primality.
On our way to giving an efficient way to test primality, we observe that the
operations of multiplication and addition modulo 𝑚 are efficient.
1.3 Example Multiplying 3 915 421 by 52 567 004 modulo 3 looks hard. The naive
approach is to first take their product and then divide by 3 to find the remainder.
But there is a more efficient way. Rather than multiply first and then reduce
modulo 𝑚 , reduce first and then multiply. That is, we know that if 𝑎 ≡ 𝑏 ( mod 𝑚)
and 𝑐 ≡ 𝑑 ( mod 𝑚) then 𝑎𝑐 ≡ 𝑏𝑑 ( mod 𝑚) and so since 3 915 421 ≡ 1 ( mod 3)
and 52 567 004 ≡ 2 ( mod 3) we have this.
3 915 421 · 52 567 004 ≡ 1 · 2 ≡ 2 ( mod 3)
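In code, the reduce-first strategy keeps the intermediate numbers small. A quick check in Python:

```python
a, b, m = 3_915_421, 52_567_004, 3

# Naive: multiply the big numbers, then reduce.
naive = (a * b) % m

# Better: reduce each factor first, then multiply small numbers.
reduced = ((a % m) * (b % m)) % m  # (1 * 2) mod 3

assert naive == reduced == 2
print(reduced)  # 2
```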
𝑎 · 2𝑎 · · · (𝑝 − 1)𝑎 ≡ 1 · 2 · · · (𝑝 − 1) ( mod 𝑝)
(𝑝 − 1) ! · 𝑎^{𝑝−1} ≡ (𝑝 − 1) ! ( mod 𝑝)

𝑎                 1     2      3       4        5        6
𝑎^{𝑝−1} = 𝑎^6     1     64     729     4 096    15 625   46 656
(𝑎^6 − 1)/7       0     9      104     585      2 232    6 665
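The table is easy to regenerate; here is a sketch in Python.

```python
# Fermat's Little Theorem with p = 7: for 0 < a < 7, the value
# a^6 - 1 is divisible by 7.
p = 7
for a in range(1, p):
    power = a ** (p - 1)
    assert (power - 1) % p == 0
    print(a, power, (power - 1) // p)
```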
By Fermat’s Little Theorem, given 𝑛 , if we find a base 𝑎 with 0 < 𝑎 < 𝑛 so that
there are no such computers built, although there has been progress on that. For
the moment, RSA seems safe. (There are schemes that could replace it, if needed.)
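The test described above, applied contrapositively, is the standard Fermat test: if some base 𝑎 has 𝑎^{𝑛−1} ≢ 1 (mod 𝑛) then 𝑛 is composite. Here is a sketch in Python (the function names are ours), with the usual caveat that certain rare composites, the Carmichael numbers, fool the test on every base coprime to them.

```python
import random

def fermat_witness(n, a):
    # A base a witnesses that n is composite when a^(n-1) is not 1 mod n.
    return pow(a, n - 1, n) != 1

def probably_prime(n, trials=20):
    # Try several random bases; a prime never has a Fermat witness.
    if n < 4:
        return n in (2, 3)
    return not any(fermat_witness(n, random.randrange(2, n - 1))
                   for _ in range(trials))

print(fermat_witness(91, 2))  # True: 91 = 7 * 13 is composite
print(fermat_witness(97, 2))  # False: no base witnesses against a prime
```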
V.A Exercises
✓ A.11 There are twenty-five primes less than or equal to 100. Find them.
✓ A.12 We can walk through an RSA calculation. (a) For the primes, take
𝑝 = 11, 𝑞 = 13. Find 𝑛 = 𝑝𝑞 and 𝜑 (𝑛) = (𝑝 − 1) · (𝑞 − 1) . (b) For the
encoder 𝑒 use the smallest prime 1 < 𝑒 < 𝜑 (𝑛) that is relatively prime with 𝜑 (𝑛) .
(c) Find the decoder 𝑑 , the multiplicative inverse of 𝑒 modulo 𝜑 (𝑛) . (You can use
Euclid’s algorithm, or just test the candidates.) (d) Take the message to be
represented as the number 𝑚 = 9. Encrypt it and decrypt it.
A.13 To test whether a number 𝑛 is prime, we could just try dividing it by all
numbers less than it. (a) Show that we needn’t try all numbers less than 𝑛 , instead
we can just try all 𝑘 with 2 ≤ 𝑘 ≤ √𝑛 . (b) Show that we cannot lower that any
further than √𝑛 . (c) For input 𝑛 = 10^12 how many numbers would you need to test?
(d) Show that this is a terrible algorithm since it is exponential in the size of the
input.
A.14 Show that the probability that a random 𝑏 -bit number is prime is about 1/𝑏 .
Extra
V.B Good-enoughness
A theory shapes the way that you look at the world, at how you see and address
what comes before you in practice. For example, Newton’s 𝐹 = 𝑚𝑎 is a
program for analyzing physical situations: if you see an acceleration then look
around for a force. That approach has been fantastically successful, enabling us to
build bridges, send people to the moon, etc. Likewise, Darwin’s theory tells us that
if you see a change in a species then look for a reproductive advantage.
Here we will point out a way in which a naive understanding of Complexity
Theory can lead to a misunderstanding of what can be done in practice. Of course,
the theorems are right — the proofs check out, the results stand up to formalization,
etc. But in learning, we build mental models of what those formal statements
mean and there is a common misperception about solving problems that our theory
labels “hard.”
Cobham’s Thesis identifies the problems having a tractable algorithm with P.
However, we have noted that just because a problem is in P does not mean that it
has an algorithm that we could use in practice. An example is a problem whose
fastest algorithm is O (𝑛^1000 ) and another is a problem whose algorithm has a huge
coefficient, such as 2^1000 · 𝑛^2 . The flip side of this is that just because a problem is
NP hard does not mean that it is hard to solve on problems that we see in practice.
It could be that an algorithm’s runtime function takes a while to get big and the
first 20 000 instances are quite doable. For another example, consider this function.
𝑓 (𝑛) = 𝑛^{lg 𝑛} if 𝑛 is a multiple of 5, and 𝑓 (𝑛) = 𝑛^2 otherwise
[Plot of 𝑓 for 𝑛 ≤ 20; the vertical axis runs to 400 000]
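The function just graphed can be sketched directly; here it is in Python.

```python
from math import log2

def f(n):
    # Quadratic on most inputs, super-polynomial on every fifth one.
    if n % 5 == 0:
        return n ** log2(n)
    return n ** 2

print(f(4))   # 16
print(f(21))  # 441
print(f(20) > 400_000)  # True: 20^(lg 20) is about 420 000
```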
Most of the time 𝑓 grows slowly but for every fifth input it is super-polynomial.
The exceptions could be rarer than that, such as every 10^10 th input, or even rarer
still. The definition of Big O is such that as long as there are infinitely
many super-polynomial exceptions then the growth of the function as a whole is
superpoly. If a problem’s best algorithm is O (𝑓 ) then we classify that problem as
hard. But if the exception comes once in 10^10 times then for any single instance
the chance of a fast runtime is awfully good.
In short, thinking that NP hard problems are sure
to be too slow to solve except for extremely small
inputs is an incomplete understanding.
[Photo: London pubs, via Google Earth]
An example of an NP complete problem for which there are available very capable
algorithms is the Traveling Salesman problem. There are algorithms that can in a
reasonable time find solutions for problem instances with millions of nodes, either
giving the optimal solution or, with a high probability, even more quickly finding a
path just two or three percent away from the optimal solution. Recently a group of
applied mathematicians solved the minimal pub crawl, the shortest route to visit
all 24 727 UK pubs. The optimal
tour is 45 495 239 meters. The algorithm took 305.2 CPU days, running in parallel
on up to 48 cores on Linux servers. That is a lot of computing but it is also a lot of
pubs — this is not a toy example.
Another group solved the Traveling Salesman instance of visiting all 24 978
cities in Sweden, giving a tour of about 72 500 kilometers. The approach was
to find a nearly-best solution and then use that to find the best one. The
final stages, which improved the lower bound by 0.000 023 percent, required
eight years of computation time running in parallel on a network of Linux
workstations.
There are a number of systems for solving the Traveling Salesman problem
that are widely available. An example is that the Free mathematics system Sage
includes one. Here is a brief example. It uses a graph that is sure to have a
solution, and then we display the solution as an adjacency matrix. For more,
see the documentation.
[Map: the tour of Sweden]
sage: g = graphs.HeawoodGraph()
sage: tsp = g.traveling_salesman_problem()
sage: tsp.adjacency_matrix()
[0 0 0 0 0 1 0 0 0 0 0 0 0 1]
[0 0 1 0 0 0 0 0 0 0 1 0 0 0]
[0 1 0 0 0 0 0 1 0 0 0 0 0 0]
[0 0 0 0 1 0 0 0 0 0 0 0 1 0]
[0 0 0 1 0 0 0 0 0 1 0 0 0 0]
[1 0 0 0 0 0 1 0 0 0 0 0 0 0]
[0 0 0 0 0 1 0 0 0 0 0 1 0 0]
[0 0 1 0 0 0 0 0 1 0 0 0 0 0]
[0 0 0 0 0 0 0 1 0 0 0 0 0 1]
[0 0 0 0 1 0 0 0 0 0 1 0 0 0]
[0 1 0 0 0 0 0 0 0 1 0 0 0 0]
[0 0 0 0 0 0 1 0 0 0 0 0 1 0]
[0 0 0 1 0 0 0 0 0 0 0 1 0 0]
[1 0 0 0 0 0 0 0 1 0 0 0 0 0]
V.B Exercises
B.1 Critique this from social media: “The Traveling Salesman problem is NP hard.
That means that algorithms exist that solve the problem but these algorithms are
very slow. So for an input of size 100 you may already have to wait hundreds of
years.”
B.2 A naive algorithm for the Traveling Salesman problem is to try every possible
circuit. If there are 𝑛 -many cities then how many circuits are there?
B.3 Use the exhaustive search of all possible circuits to find the shortest Traveling
Salesman problem solution for a circuit involving the graph below.
[Figure: a weighted graph on 𝑣0 , 𝑣1 , 𝑣2 , 𝑣3 with edge weights 3, 2, 7, 1, 5, and 4]
Extra
V.C SAT solvers
The prior Extra section gives an example of a problem where the best algorithm we
know is super-polynomial, but in practice that problem is solvable. For instance,
problems may have occasional hard instances — black holes where the computer
goes in and does not come out — but on most instances we can find the answer in
a reasonable time. This section demonstrates a program to solve SAT, a SAT solver,
and shows how to use it as an oracle to solve other problems.
A problem reduction L1 ≤𝑝 L0 gives a way to transfer problem domains, to
change questions about L1 into questions about L0 . Here we take L0 to be SAT
and for L1 we will use Sudoku.
[Figure: a partially filled Sudoku board]
In case it is unfamiliar, the popular Sudoku puzzle starts with a 9-by-9 array, with
some of the cells already filled in. An example is above. Players solve it by filling in
the blanks while satisfying three restrictions: every row must contain each of the
numbers 1–9, and the same holds for each column, as well as the nine subsquares.
We first argue that this is hard. The definition of the computational complexity
of a problem requires that we describe how a solution algorithm’s use of some
resource grows with the problem’s input size. For that, we must frame the problem
to allow instances of different input sizes. The natural way to generalize the puzzle
is: instead of eighty-one variables 𝑥 1,1, ... 𝑥 9,9 we could use an arbitrary number,
𝑥 1, ... 𝑥𝑛 . Instead of those variables taking on 1–9, they could take on a value in
1–𝑘 (and we could call these ‘colors’ instead of ‘values’). And, in place of rows,
columns, and subsquares that are driven by the geometry, a problem instance
could have arbitrary sets, 𝑆 ⊆ {𝑥 1, ... 𝑥𝑛 } of size 𝑘 . With that, it is more than a
fixed-sized puzzle, it is a problem, and we can show that the Sudoku problem is
NP complete. Thus we can fairly consider this problem to be quite hard.
However, having noted this, in this section we will ignore the generalization
and limit our attention to the traditional 9 × 9 board.
To solve puzzle instances using the SAT solver as an oracle we focus on the
reduction Sudoku ≤𝑝 SAT. We must produce a function that inputs game boards
and outputs Propositional Logic expressions. Recall that expressions such as
(𝑥 1 ∨ 𝑥 2 ) ∧ (¬𝑥 2 ∨ 𝑥 3 ) or (𝑥 1 ∨ 𝑥 2 ∨ ¬𝑥 3 ) ∧ 𝑥 4 ∧ (¬𝑥 2 ∨ 𝑥 3 ) are in Conjunctive
Normal form, CNF. These are the conjunction of clauses where each clause is the
disjunction of literals (a literal is either a single atom such as 𝑥𝑖 or the negation
of an atom such as ¬𝑥𝑖 ). The SAT solver can require that input be in this form
because for any Boolean function there is a CNF expression giving that behavior;
more is in Section C.
The SAT solver that we use needs its input file formatted to a standard called
DIMACS. It starts with comment lines, beginning with the character c. Next comes
a problem line, which starts with a p, followed by a space and the problem type
cnf, then followed by a space and the number of variables, and then followed by a
space and the number of clauses. After that line the rest of the file consists of the
clause descriptions.
To describe a clause the file lists its indices. Thus we describe 𝑥 2 ∨ 𝑥 5 ∨ 𝑥 6 with
the list 2 5 6. Negations are described with negatives, so that ¬𝑥 5 ∨ 𝑥 7 ∨ ¬𝑥 9
matches -5 7 -9. Each clause description is terminated by a 0.†
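The format is simple enough that producing a clause line is nearly a one-liner. A sketch in Python (the function name is ours):

```python
def clause_line(clause):
    # A clause is a list of signed variable numbers; DIMACS ends each
    # clause description with a 0.
    return " ".join(str(v) for v in clause) + " 0"

print(clause_line([2, 5, 6]))    # 2 5 6 0
print(clause_line([-5, 7, -9]))  # -5 7 -9 0
```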
We will have many variables. For instance, for the Sudoku array’s row 1,
column 1 entry, we will have a Boolean variable 𝑥 1,1,1 , another variable 𝑥 1,1,2 , etc.,
up to 𝑥 1,1,9 . Only one of these nine will be 𝑇 and the rest will be 𝐹 . The variable
𝑥 1,1,𝑣 is 𝑇 if in our solution the number in row 1 and column 1 is 𝑣 . Otherwise this
variable is 𝐹 . Restated, if for instance in the first row and first column the puzzle
has the value 5 then 𝑥 1,1,5 is 𝑇 while all other 𝑥 1,1,𝑖 are 𝐹 .
Thus for each row, column, and value triple 𝑟, 𝑐, 𝑣 ∈ { 1, ... 9 } we will have a
variable 𝑥𝑟,𝑐,𝑣 . That’s 9^3 = 729 variables.
The CNF expression that we will produce has clauses of two kinds. One
describes the general rules of the game Sudoku while the other is specific to the
particular starting board instance. It is as though we bought a puzzle book and
opened first to the introduction describing the rules, and later opened to the page
containing the specific partial board.
To describe the general rules we need a lot of clauses describing relationships
among the variables. An example of such a rule is that exactly one of 𝑥 1,1,1 , 𝑥 1,1,2 ,
. . . 𝑥 1,1,9 is 𝑇 . There are too many of these rules to write by hand so we will get
the computer to do them.
The Racket file starts with some constants that make the code easier to read.
(define ONETONINE '(1 2 3 4 5 6 7 8 9))
(define ONETOTHREE '(1 2 3)) ;; for boxes
(define FOURTOSIX '(4 5 6))
(define SEVENTONINE '(7 8 9))
(define BOX-INDICES (list ONETOTHREE FOURTOSIX SEVENTONINE))
Each row has nine restrictions. For instance, to express that the first row
contains an entry with the value 2, we need this clause.
𝑥 1,1,2 ∨ 𝑥 1,2,2 ∨ 𝑥 1,3,2 ∨ 𝑥 1,4,2 ∨ 𝑥 1,5,2 ∨ 𝑥 1,6,2 ∨ 𝑥 1,7,2 ∨ 𝑥 1,8,2 ∨ 𝑥 1,9,2
The Racket code below produces this corresponding list: ( (1 1 2) (1 2 2) (1
3 2) (1 4 2) (1 5 2) (1 6 2) (1 7 2) (1 8 2) (1 9 2) ). We express all
of the row restrictions with one such list for each row and value.
;; row-restrictions Return list of lists of triples, each list of triples meaning
;; that each row has to have each value 1-9.
(define (row-restrictions)
  (define (one-row-one-value row-number variable-value)
    (for/list ([column-number ONETONINE])
      (list row-number column-number variable-value)))
  (for*/list ([row-number ONETONINE]   ; one clause per row and value
              [variable-value ONETONINE])
    (one-row-one-value row-number variable-value)))
Running that routine produces a list of lists. This is a typical member, requiring
that at least one entry in row 3 must be an 8.
((3 1 8) (3 2 8) (3 3 8) (3 4 8) (3 5 8) (3 6 8) (3 7 8) (3 8 8) (3 9 8))
Here is a typical line, requiring that at least one entry in column 7 must be an 8.
((1 7 8) (2 7 8) (3 7 8) (4 7 8) (5 7 8) (6 7 8) (7 7 8) (8 7 8) (9 7 8))
For the subsquares, the restrictions are the same in principle but the form of
the code is a bit different.
Here is one of the box restrictions produced by the Racket code below, saying that
some entry in the lower-left box has the value 8.
((7 1 8) (7 2 8) (7 3 8) (8 1 8) (8 2 8) (8 3 8) (9 1 8) (9 2 8) (9 3 8))
Running the SAT solver with just the restrictions above finds that they can be
satisfied. But there is a surprise. It finds a satisfying assignment by putting more
than one value in some entries and no value at all in some others.
Thus we need one more set of restrictions, that no entry can contain two values.
We add clauses like ¬𝑥 3,4,1 ∨ ¬𝑥 3,4,2 , meaning that the row 3 and column 4 entry
cannot be both a 1 and a 2 (it could of course be neither).
This is one of the resulting lines, enforcing that the entry in row 8 and column 8
cannot be both 6 and 9 (again, the negative means logical negation).
((8 8 -6) (8 8 -9))
To finish we must specify the initial board. For example, to tell the SAT solver
that there is a 9 in row 1 and column 3 we include the one-literal clause 𝑥 1,3,9 .
Here are the first few lines of that routine.
;; INITIAL-CLAUSES The given layout of the board. Each row is a list with a
;; triple: row number, column number, integer.
(define INITIAL-CLAUSES
  (list (list '(1 3 9)) ; there is a 9 in position (1,3)
        (list '(1 8 1))
        (list '(1 9 5))
In this development, and in the Racket file, we work in 𝑥𝑟,𝑐,𝑣 ’s. But DIMACS
wants a single index. So we need to convert each of our variables to some 𝑥𝑘 and
back again. The formula is 𝑘 = 1 + 81 · (|𝑣 | − 1) + 9 · (𝑟 − 1) + (𝑐 − 1) .
;; triple->varnum Find the variable number associated with the row, column, and value
;;   row-number column-number  integers, counting starts at 1
;;   variable-value  integer value of the entry. If negative, then
;;     the predicate is to be negated.
;; If variable-value < 0 then use the absolute value for the basic varnum,
;; but return the negative of the polynomial (indicating that the predicate is negated).
(define (triple->varnum row-number column-number variable-value)
  (let ([a-value (+ (* 81 (- (abs variable-value) 1))
                    (* 9 (- row-number 1))
                    (- column-number 1)
                    1)]) ;; add 1 because DIMACS uses 0 to terminate clauses
    (if (negative? variable-value)
        (* -1 a-value)
        a-value)))
;; varnum->triple From the variable number, return the associated row, column, and value
(define (varnum->triple v)
  (let* ([offset (- (abs v) 1)]
         [variable-value (quotient offset 81)]
         [vv-removed (- offset (* 81 variable-value))]
         [row-number (quotient vv-removed 9)]
         [column-number (remainder vv-removed 9)])
    (if (negative? v)
        (list (+ 1 row-number) (+ 1 column-number) (* -1 (+ 1 variable-value)))
        (list (+ 1 row-number) (+ 1 column-number) (+ 1 variable-value)))))
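As a sanity check, the two conversions should be mutually inverse. Here is the same arithmetic rendered in Python, with a round-trip test over all 729 variables (a sketch for checking, separate from the book's Racket code).

```python
def triple_to_varnum(r, c, v):
    """k = 1 + 81*(|v| - 1) + 9*(r - 1) + (c - 1); a negative v
    negates the resulting literal."""
    k = 1 + 81 * (abs(v) - 1) + 9 * (r - 1) + (c - 1)
    return -k if v < 0 else k

def varnum_to_triple(k):
    """Invert triple_to_varnum by peeling off the value, then row, then column."""
    offset = abs(k) - 1
    v, rest = divmod(offset, 81)
    r, c = divmod(rest, 9)
    val = v + 1
    return (r + 1, c + 1, -val if k < 0 else val)

# every (row, column, value) triple survives the round trip
assert all(varnum_to_triple(triple_to_varnum(r, c, v)) == (r, c, v)
           for r in range(1, 10) for c in range(1, 10) for v in range(1, 10))
```

For instance, the initial clause for the 9 in row 1 and column 3 encodes to variable number 651, which is the first clause line of the DIMACS file below, and the positive literal 107 decodes to row 3, column 8, value 2, matching the varnum->triple output shown later.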
Now we can write all of this to the DIMACS-format file. This will convert the
clauses.
;; produce-clauses Given a list of lists of triples, produce the matching set of
;;   strings for the DIMACS file
(define (produce-clauses list-of-lists)
  (define (one-line-of-one variable-in-clause) ; produce line from a list of one number
    (apply format "~a 0\n" variable-in-clause))
  (define (one-line-of-two variables-in-clause) ; produce line from a list of two numbers
    (apply format "~a ~a 0\n" variables-in-clause))
  (define (one-line-of-nine variables-in-clause) ; produce line from a list of nine numbers
    (apply format "~a ~a ~a ~a ~a ~a ~a ~a ~a 0\n" variables-in-clause))
And this gathers the clauses together and then calls the above routine.
(define FILE-PREAMBLE
  (list (format "c ~a\n" FILENAME)
        "c DIMACS format file for SAT solver\n"
        (format "c ~a Jim Hefferon, hefferon.net. Public Domain.\n"
                (date->string (current-date)))
        (format "p cnf ~a ~a\n" (* 9 9 9) (length CLAUSES))))
(define FILE-LINES
  (append
   FILE-PREAMBLE
   (produce-clauses CLAUSES)))
The first few lines of the result soduku.cnf look like this.
c soduku.cnf
c DIMACS format file for SAT solver
c 2022-01-16 Jim Hefferon, hefferon.net. Public Domain.
p cnf 729 3197
651 0
8 0
333 0
334 0
256 0
663 0
502 0
262 0
506 0
183 0
SATISFIABLE
It took far less than a second on an ordinary laptop. The algorithms for SAT
solvers are exponential in the worst case but they seem to do very well in practice.
Below is the output. It contains 729 numbers but only a few fit on this page,
and anyway that many numbers would not be more enlightening than just showing
the first few.
ftpmaint@millstone:~/Documents/computing/src/scheme/complexity$ cat soduku.out
SAT
-1 -2 -3 -4 -5 -6 -7 8 -9 -10 -11 12 -13 -14 -15 -16 -17 -18 -19 -20 -21 -22 -23 24 -25 -26
Here is a positive output number that does not appear in the line shown above.
> (varnum->triple 107)
'(3 8 2)
> (show-solved-board)
'#(#(2 6 9 3 7 8 4 1 5)
#(5 8 1 4 2 9 7 6 3)
#(4 7 3 5 6 1 9 2 8)
#(8 1 2 7 4 5 3 9 6)
#(3 5 7 1 9 6 2 8 4)
#(6 9 4 8 3 2 1 5 7)
#(1 3 5 9 8 4 6 7 2)
#(9 4 6 2 5 7 8 3 1)
#(7 2 8 6 1 3 5 4 9))
In summary, we can in reasonable time solve instances of SAT that are not toy
exercises. They are large enough that to write the CNF form we had to resort to
code.
V.C Exercises
C.1 This board is described online as an especially hard Sudoku. Use the routines
from this section to solve it.
3 4 5 6 9
5 4
8 1
8 2 3 7
1 8 7 3
9
8 7
8 7 2
4 9
Extra
V.D The Bounded Halting problem
This chapter’s final section develops the intuition that the class of NP complete
problems forms a transition or bridge between the solvable problems and the
unsolvable ones. Here we support that with some results.
The signature unsolvable problem is the Halting problem. Here is a variant of it
that is easy.
4.1 Problem Given an index, an input, and a step limit, ⟨𝑒, 𝑥, 𝑆⟩ ∈ N × B∗ × N,
decide if Turing machine P𝑒 halts on 𝑥 within 𝑆 steps.
This problem is clearly solvable since we can just run the machine to see if it
halts within 𝑆 steps. Consider a version that is not so trivial.
4.2 Problem (Nondeterministic Bounded Halting problem) Given an index and a
step limit, ⟨𝑒, 𝑆⟩ (where 𝑆 is specified in unary notation), decide if on an empty
tape, the nondeterministic Turing machine P𝑒 halts within 𝑆 steps. That is, decide
if P𝑒 ’s computation history contains a branch that reaches a halting state in no
more than 𝑆 many transitions.
(Flowcharts of machines built from a verifier 𝑉 : each starts, obtains a string 𝜎 and a
certificate 𝜔 , either by reading them or by guessing 𝜔 , and then runs 𝑉 (𝜎, 𝜔 ) . If the
verifier accepts then the machine halts; if not, it enters an infinite loop.)
Appendices
Appendix A Strings
An alphabet is a nonempty and finite set of symbols (sometimes called tokens). We
write symbols in a distinct typeface, as in 1 or a, because the alternative of quoting
them would be clunky.† A string or word over an alphabet is a finite sequence
of elements from that alphabet. The string with no elements is the empty string,
denoted 𝜀 .
One potentially surprising aspect of a symbol is that it may contain more
than one letter. For instance, a programming language may have if as a symbol,
meaning that it is indecomposable into separate letters. Another example is that
the Racket alphabet contains the symbols or and car, as well as allowing variable
names such as x, or lastname. An example of a string is (or a ready), which is
a sequence of five alphabet elements, ⟨(, or, a, ready, )⟩ .
Traditionally, we denote an alphabet with the Greek letter Σ. In this book we
will name strings with lower case Greek letters (except that we use 𝜙 for something
else) and denote the items in the string with the associated lower case roman letter,
as in 𝜎 = ⟨𝑠 0, ... 𝑠𝑛− 1 ⟩ and 𝜏 = ⟨𝑡 0, ... 𝑡𝑚− 1 ⟩ . The length of the string 𝜎 , |𝜎 | , is the
number of symbols that it contains, 𝑛 . In particular, the length of the empty string
is |𝜀 | = 0.
In place of 𝑠𝑖 we sometimes write 𝜎 [𝑖] . One convenience of this form is that we
use 𝜎 [−1] for the final character, 𝜎 [−2] for the one before it, etc. We also write
𝜎 [𝑖 : 𝑗] for the substring between terms 𝑖 and 𝑗 , including the 𝑖 -th term but not the
𝑗 -th, and we write 𝜎 [𝑖 :] for the tail substring that starts with term 𝑖 as well as
𝜎 [ : 𝑗] for 𝜎 [ 0: 𝑗] .
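These conventions will be familiar from the indexing and slicing of strings in languages such as Python, so they can be checked directly; a quick illustration (not part of the book's development).

```python
sigma = "abcde"

sigma[0]     # 'a', the symbol s_0
sigma[-1]    # 'e', the final symbol
sigma[1:4]   # 'bcd': includes term 1 but not term 4
sigma[2:]    # 'cde', the tail substring starting with term 2
sigma[:3]    # 'abc', the same as sigma[0:3]
```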
The notations such as diamond brackets and commas are ungainly. We usually
work with alphabets having single-character symbols and then we write strings by
omitting the brackets and commas. That is, we write 𝜎 = abc instead of ⟨a, b, c⟩ .‡
This convenience comes with the disadvantage that without the diamond brackets
the empty string is just nothing, which is why we use the separate symbol 𝜀 .#
The alphabet consisting of the bit characters is B = { 0, 1 } (we sometimes
instead use B for the set { 0, 1 } of the bits themselves). Strings over B are bitstrings
or bit strings.§
Where Σ is an alphabet, for 𝑘 ∈ N the set of length 𝑘 strings over that alphabet
is Σ𝑘. The set of strings over Σ of any finite length is Σ∗ = ∪𝑘 ∈ N Σ𝑘. The asterisk
symbol is the Kleene star, read aloud as “star.”
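For a small alphabet we can enumerate Σ𝑘 directly. Here is a sketch in Python with the hypothetical alphabet Σ = { a, b }.

```python
from itertools import product

Sigma = ["a", "b"]

def strings_of_length(k):
    """All length-k strings over the alphabet, that is, the set Sigma^k."""
    return ["".join(s) for s in product(Sigma, repeat=k)]

# here |Sigma^k| = 2^k; the Kleene star Sigma* is the union over all k
```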
Strings are simple so there are only a few operations. Let 𝜎 = ⟨𝑠 0 ... 𝑠𝑛− 1 ⟩
and 𝜏 = ⟨𝑡 0, ... 𝑡𝑚− 1 ⟩ be strings over an alphabet Σ. The concatenation 𝜎 ⌢ 𝜏 or
†
We give them a distinct look to distinguish the symbol ‘a’ from the variable ‘𝑎 ’, so that we can tell “let
𝑥 = a” apart from “let 𝑥 = 𝑎 .” Symbols are not variables — they don’t hold a value, they are themselves
a value. ‡ To see why when we drop the commas we want the alphabet to consist of single-character
symbols, consider Σ = { a, aa } and the string aaa. Without the commas this string is ambiguous: it
could mean ⟨ a, aa ⟩ , or ⟨ aa, a ⟩ , or ⟨ a, a, a ⟩ . # Omitting the diamond brackets and commas also blurs
the distinction between a symbol and a one-symbol string, between a and ⟨ a ⟩ . However, dropping
the brackets is so convenient that we accept this disadvantage. § Some authors consider infinite
bitstrings but ours will always be finite.
𝜎𝜏 appends the second string to the first, 𝜎 ⌢ 𝜏 = ⟨𝑠 0 ... 𝑠𝑛−1, 𝑡 0, ... 𝑡𝑚−1 ⟩ . Where
𝜎 = 𝜏0 ⌢ · · · ⌢ 𝜏𝑘 −1 , we say that 𝜎 decomposes into the 𝜏 ’s, and that each 𝜏𝑖 is a
substring of 𝜎 . The first substring, 𝜏0 , is a prefix of 𝜎 . The last, 𝜏𝑘 − 1 , is a suffix.
A power or replication of a string is an iterated concatenation with itself, so
that 𝜎 2 = 𝜎 ⌢ 𝜎 and 𝜎 3 = 𝜎 ⌢ 𝜎 ⌢ 𝜎 , etc. We write 𝜎 1 = 𝜎 and 𝜎 0 = 𝜀 . The reversal
𝜎 R of a string takes the symbols in reverse order: 𝜎 R = ⟨𝑠𝑛−1, ... 𝑠 0 ⟩ . The empty
string’s reversal is 𝜀 R = 𝜀 .
For example, let Σ = { a, b, c } and let 𝜎 = abc and 𝜏 = bbaac. Then the
concatenation 𝜎𝜏 is abcbbaac. The third power 𝜎 3 is abcabcabc, and the reversal
𝜏 R is caabb. A palindrome is a string that equals its own reversal. Examples are
𝛼 = abba, 𝛽 = cdc, and 𝜀 .
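The example can be checked in code; Python strings happen to support all of these operations directly (a sketch for illustration, not part of the book's Racket development).

```python
sigma = "abc"
tau = "bbaac"

concatenation = sigma + tau   # sigma ⌢ tau
third_power = sigma * 3       # sigma^3, iterated concatenation
reversal = tau[::-1]          # tau^R, the symbols in reverse order

def is_palindrome(s):
    """A palindrome equals its own reversal; the empty string qualifies."""
    return s == s[::-1]
```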
Exercises
A.1 Let 𝜎 = 10110 and 𝜏 = 110111 be bit strings. Find each. (a) 𝜎⌢𝜏 (b) 𝜎⌢𝜏 ⌢𝜎
(c) 𝜎 R (d) 𝜎 3 (e) 03 ⌢ 𝜎
A.2 Let the alphabet be Σ = { a, b, c }. Suppose that 𝜎 = ab and 𝜏 = bca. Find
each. (a) 𝜎 ⌢ 𝜏 (b) 𝜎 2 ⌢ 𝜏 2 (c) 𝜎 R ⌢ 𝜏 R (d) 𝜎 3
A.3 Let L = {𝜎 ∈ B∗ | |𝜎 | = 4 and 𝜎 starts with 0 }. How many elements are in
that language?
A.4 Suppose that Σ = { a, b, c } and that 𝜎 = abcbccbba. (a) Is abcb a prefix of 𝜎 ?
(b) Is ba a suffix? (c) Is bab a substring? (d) Is 𝜀 a suffix?
A.5 What is the relation between |𝜎 | , |𝜏 | , and |𝜎 ⌢ 𝜏 | ? You must justify your
answer.
A.6 The operation of string concatenation follows a simple algebra. For each
of these, decide if it is true. If so, prove it. If not, give a counterexample.
(a) 𝛼 ⌢ 𝜀 = 𝛼 and 𝜀 ⌢ 𝛼 = 𝛼 (b) 𝛼 ⌢ 𝛽 = 𝛽 ⌢ 𝛼 (c) (𝛼 ⌢ 𝛽) R = 𝛽 R ⌢ 𝛼 R
(d) (𝛼 R ) R = 𝛼 (e) (𝛼 𝑖 ) R = (𝛼 R ) 𝑖
A.7 Show that string concatenation is not commutative, that there are strings 𝜎
and 𝜏 so that 𝜎 ⌢ 𝜏 ≠ 𝜏 ⌢ 𝜎 .
A.8 In defining decomposition above we have ‘𝜎 = 𝜏0 ⌢ · · · ⌢ 𝜏𝑛− 1 ’, without
parentheses on the right side. This takes for granted that the concatenation
operation is associative, that no matter how we parenthesize it we get the same
string. Prove this. Hint: use induction on the number of substrings, 𝑛 .
A.9 Prove that this constructive definition of string power is equivalent to the one
above.
𝜎 𝑛 = 𝜀 – if 𝑛 = 0
𝜎 𝑛 = 𝜎 𝑛−1 ⌢ 𝜎 – if 𝑛 > 0
Appendix B Functions
A function is an input-output relationship: each input is associated with a unique
output. An example is the association of each input natural number with the output
number that is its square. Another is the association of each string of characters
with the length of that string. A third is the association of each polynomial
𝑎𝑛 𝑥 𝑛 + · · · + 𝑎 1𝑥 + 𝑎 0 with a Boolean value 𝑇 or 𝐹 , depending on whether 1 is a
root of that polynomial.
An important point is that, contrary to what is said in most introductions, a
function isn’t a ‘rule’. The function that associates a year with that year’s winners
of the US baseball World Series isn’t given by any rule simpler than an exhaustive
listing of all cases. Nor is the kind of association that a database might have, such
as linking the government ID of US citizens to their income in the most recent
tax year. True, in science many functions are described by a formula, such as
𝐸 (𝑚) = 𝑚𝑐 2, and as well many functions are computed by a program. But what
makes something a function is that for each input there is exactly one associated
output. If we can go from an input to the associated output with a calculation then
that’s great but even if we can’t, it is still a function.
For a precise definition fix two sets, a domain 𝐷 and a codomain 𝐶 . A function
or map, 𝑓 : 𝐷 → 𝐶 , is a set of pairs (𝑥, 𝑦) ∈ 𝐷 × 𝐶 , subject to the restriction of
being well-defined, that every 𝑥 ∈ 𝐷 appears as the first entry in one and only
one pair (more on well-definedness is below). We write 𝑓 (𝑥) = 𝑦 or 𝑥 ↦→ 𝑦 and
say that ‘𝑥 maps to 𝑦 ’. Note the difference between the arrow symbols used in
𝑓 : 𝐷 → 𝐶 and 𝑥 ↦→ 𝑦 . We say that 𝑥 is an input or argument to the function, and
that 𝑦 is an output or value of the function.
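Viewing a function as a set of pairs makes well-definedness mechanically checkable. Here is a sketch in Python, with hypothetical example sets.

```python
def is_well_defined(pairs, domain):
    """A set of pairs is a function on the domain when every domain
    element appears as a first entry in exactly one pair."""
    firsts = [x for x, _ in pairs]
    return all(firsts.count(x) == 1 for x in domain)

square = {(0, 0), (1, 1), (2, 4), (3, 9)}
not_a_function = {(0, 0), (0, 1), (2, 4)}   # 0 is paired with two outputs
```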
Some functions take more than one input, such as dist (𝑥, 𝑦) = √(𝑥 2 + 𝑦 2 ) . We say
that this function is 2-ary, while some other functions are 3-ary, etc. The number of
inputs is the function’s arity. If the function takes only one input but that input is a
tuple then we often drop the parentheses, so we write 𝑓 ( 3, 5) instead of 𝑓 (( 3, 5)) .
We also illustrate functions with a bean diagram, which separates the domain and
the codomain sets. Below on the left is the action of the exclusive or operator while
on the right is a variant of the bean diagram, showing the absolute value function
mapping integers to integers.
(On the left, each input pair maps to its exclusive or: (𝐹, 𝐹 ) ↦ 𝐹 , (𝐹,𝑇 ) ↦ 𝑇 ,
(𝑇 , 𝐹 ) ↦ 𝑇 , and (𝑇 ,𝑇 ) ↦ 𝐹 . On the right, each integer between −3 and 3 maps to
its absolute value.)
always nonnegative and so the output is real, writing 𝑓 : R → R where the second
R is the codomain, rather than troubling to find its exact range.
†
Sometimes people say that they are, “checking that the function is well-defined.” In a strict sense this
is confused, because if it is a function then it is by definition well-defined. However, natural language is
funny this way — while all tigers have stripes, we may well sometimes say “striped tiger.”
The most common way to verify that a function is onto is to start with a generic
(that is, arbitrary) codomain element 𝑦 and then exhibit a domain element 𝑥 that
maps to it. If a function is suitable for graphing on 𝑥𝑦 axes then visual proof that it
is onto is that for any 𝑦 in the codomain, the horizontal line at 𝑦 intercepts the
graph in at least one point.
As the above pictures suggest, where the domain and codomain are finite, when
there is a one-to-one function 𝑓 : 𝐷 → 𝐶 then we can conclude that the number
of elements in the domain is less than or equal to the number in the codomain.
Further, if the function is onto then the number of elements in the domain equals
the number in the codomain if and only if the function is one-to-one.
0 ↦ 2, 1 ↦ 3, 2 ↦ 5, 3 ↦ 7, 4 ↦ 11, 5 ↦ 13, 6 ↦ 17, 7 ↦ 19, ...
(the map associating each 𝑛 with the 𝑛 -th prime)
Exercises
B.1 Let 𝑓 , 𝑔 : R → R be 𝑓 (𝑥) = 3𝑥 + 1 and 𝑔(𝑥) = 𝑥 2 + 1. (a) Show that 𝑓 is
one-to-one and onto. (b) Show that 𝑔 is not one-to-one and not onto.
B.2 Show each of these.
(a) Let 𝑔 : R3 → R2 be the projection map (𝑥, 𝑦, 𝑧) ↦→ (𝑥, 𝑦) and let 𝑓 : R2 → R3
be (𝑥, 𝑦) ↦→ (𝑥, 𝑦, 0) . Then 𝑔 is a left inverse of 𝑓 but not a right inverse.
(b) The function 𝑓 : Z → Z given by 𝑓 (𝑛) = 𝑛 2 has no left inverse.
(c) Where 𝐷 = { 0, 1, 2, 3 } and 𝐶 = { 10, 11 } , the function 𝑓 : 𝐷 → 𝐶 given by
0 ↦→ 10, 1 ↦→ 11, 2 ↦→ 10, 3 ↦→ 11 has more than one right inverse.
B.3
(a) Where 𝑓 : Z → Z is 𝑓 (𝑎) = 𝑎 + 3 and 𝑔 : Z → Z is 𝑔(𝑎) = 𝑎 − 3, show that 𝑔
is inverse to 𝑓 .
(b) Where ℎ : Z → Z is the function that returns 𝑛 + 1 if 𝑛 is even and returns
𝑛 − 1 if 𝑛 is odd, find a function inverse to ℎ .
(c) If 𝑠 : R+ → R+ is 𝑠 (𝑥) = 𝑥 2 , find its inverse.
B.4 Fix 𝐷 = { 0, 1, 2 } and 𝐶 = { 10, 11, 12 }. Let 𝑓 , 𝑔 : 𝐷 → 𝐶 be 𝑓 ( 0) = 10,
𝑓 ( 1) = 11, 𝑓 ( 2) = 12, and 𝑔( 0) = 10, 𝑔( 1) = 10, 𝑔( 2) = 12. Then: (a) verify
that 𝑓 is a correspondence (b) construct an inverse for 𝑓 (c) verify that 𝑔 is not a
correspondence (d) show that 𝑔 has no inverse.
B.5
(a) Prove that a composition of one-to-one functions is one-to-one.
(b) Prove that a composition of onto functions is onto. With the prior item, this
gives that a composition of correspondences is a correspondence.
(c) Prove that if 𝑔 ◦ 𝑓 is one-to-one then 𝑓 is one-to-one.
(d) Prove that if 𝑔 ◦ 𝑓 is onto then 𝑔 is onto.
(e) If 𝑔 ◦ 𝑓 is onto, must 𝑓 be onto? If it is one-to-one, must 𝑔 be one-to-one?
B.6 Prove each.
(a) A function 𝑓 has an inverse if and only if 𝑓 is a correspondence.
(b) If a function has an inverse then that inverse is unique.
(c) The inverse of a correspondence is a correspondence.
(d) If 𝑓 and 𝑔 are each invertible then so is 𝑔 ◦ 𝑓 , and (𝑔 ◦ 𝑓 ) − 1 = 𝑓 − 1 ◦ 𝑔 − 1 .
B.7 Prove these for a function 𝑓 with a finite domain 𝐷 . They imply that
corresponding finite sets have the same size. Hint: for each, you can do induction
on either | ran (𝑓 )| or |𝐷 | .
(a) | ran (𝑓 )| ≤ |𝐷 |
(b) If 𝑓 is one-to-one then | ran (𝑓 )| = |𝐷 | .
Appendix C Propositional logic
A proposition is a statement that has a Boolean value, that is, it is either true
or false, which we write 𝑇 or 𝐹. For instance, ‘7 is odd’ and ‘8² − 1 = 127’ are
propositions, with values 𝑇 and 𝐹. In contrast, ‘𝑥 is a perfect square’ is not a
proposition because for some 𝑥 it is 𝑇 while for others it is not.
We can operate on propositions, including negating as with ‘it is not the case
that 8 is prime’, or taking the conjunction of two propositions as with ‘5 is prime
and 7 is prime’. The truth tables below define the behavior of not (also called
negation), and (also called conjunction), and or (also called disjunction).
not 𝑃          𝑃 and 𝑄 , 𝑃 or 𝑄
𝑃 ¬𝑃           𝑃 𝑄 𝑃 ∧𝑄 𝑃 ∨𝑄
𝐹 𝑇            𝐹 𝐹  𝐹   𝐹
𝑇 𝐹            𝐹 𝑇  𝐹   𝑇
               𝑇 𝐹  𝐹   𝑇
               𝑇 𝑇  𝑇   𝑇
Thus where ‘7 is odd’ is 𝑃 and ‘8 is prime’ is 𝑄 , we get the value of ‘7 is odd and 8
is prime’ from the right-hand table’s third column, third row: 𝐹. Observe that ∨
accumulates truth, in that if any of its inputs are 𝑇 then 𝑃 ∨ 𝑄 is 𝑇 . Similarly, ∧
accumulates 𝐹 .
In some fields the practice is to write 0 where we write 𝐹 and 1 in place of 𝑇.
The advantage of using symbols over writing the sentences out is that we can
express more things. For instance, if 𝑃 stands for ‘7 is odd’, 𝑄 stands for ‘9 is a
perfect square’, and 𝑅 means ‘11 is prime’ then (𝑃 ∨ 𝑄) ∧ ¬(𝑃 ∨ (𝑅 ∧ 𝑄)) is too
complex to comfortably state in everyday language. We call that a propositional
logic expression and denote it with a capital Roman letter such as 𝐸 .
Truth tables help in working out the behavior of complex statements by
building them up from their components. The table below shows the input/output
behavior of (𝑃 ∨ 𝑄) ∧ ¬(𝑃 ∨ (𝑅 ∧ 𝑄)) .
𝑃 𝑄 𝑅 (𝑃 ∨ 𝑄) ∧ ¬(𝑃 ∨ (𝑅 ∧ 𝑄))
𝐹 𝐹 𝐹 𝐹
𝐹 𝐹 𝑇 𝐹
𝐹 𝑇 𝐹 𝑇
𝐹 𝑇 𝑇 𝐹
𝑇 𝐹 𝐹 𝐹
𝑇 𝐹 𝑇 𝐹
𝑇 𝑇 𝐹 𝐹
𝑇 𝑇 𝑇 𝐹
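Such tables can be generated mechanically by enumerating all assignments of the variables; here is a sketch in Python.

```python
from itertools import product

def truth_table(expr, num_vars):
    """Pair each assignment of truth values with the expression's output."""
    return [(vals, expr(*vals))
            for vals in product([False, True], repeat=num_vars)]

E = lambda P, Q, R: (P or Q) and not (P or (R and Q))
table = truth_table(E, 3)
```

This expression comes out true on exactly one of the eight rows, the assignment 𝑃 = 𝐹 , 𝑄 = 𝑇 , 𝑅 = 𝐹 .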
The three ‘¬’, ‘∧’, and ‘∨’ are operators (or connectives). There are other
operators; here are two common ones.
𝑃 implies 𝑄 𝑃 if and only if 𝑄
𝑃 𝑄 𝑃 →𝑄 𝑃 ↔𝑄
𝐹 𝐹 𝑇 𝑇
𝐹 𝑇 𝑇 𝐹
𝑇 𝐹 𝐹 𝐹
𝑇 𝑇 𝑇 𝑇
Two statements are equivalent (or logically equivalent) if they have equal
output values whenever we give the same values to the variables. For instance,
𝑃 → 𝑄 is equivalent to ¬𝑃 ∨ 𝑄 , because if we assign 𝑃 = 𝐹, 𝑄 = 𝐹 then they both
give the value 𝑇 , if we assign 𝑃 = 𝐹, 𝑄 = 𝑇 then they also give the same value, etc.
That is, the statements are equivalent when their truth tables have the same final
column. We denote equivalence using ≡, as with 𝑃 → 𝑄 ≡ ¬𝑃 ∨ 𝑄 .
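Equivalences like this one can be confirmed by machine: tabulate both statements and compare the final columns. A sketch in Python, using the table for → given above.

```python
# the truth table for P -> Q, as given above
IMPLIES = {(False, False): True, (False, True): True,
           (True, False): False, (True, True): True}

# compare against the proposed equivalent, not-P or Q, on every assignment
same_column = all(IMPLIES[(P, Q)] == ((not P) or Q)
                  for P in (False, True) for Q in (False, True))
```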
The set of formulas describing when statements are equivalent is Boolean
algebra. For instance, these are the distributive laws
𝑃 ∧ (𝑄 ∨ 𝑅) ≡ (𝑃 ∧ 𝑄) ∨ (𝑃 ∧ 𝑅) 𝑃 ∨ (𝑄 ∧ 𝑅) ≡ (𝑃 ∨ 𝑄) ∧ (𝑃 ∨ 𝑅)
and these are De Morgan’s laws.
¬(𝑃 ∧ 𝑄) ≡ ¬𝑃 ∨ ¬𝑄 ¬(𝑃 ∨ 𝑄) ≡ ¬𝑃 ∧ ¬𝑄
The three operators ‘¬’, ‘∧’, and ‘∨’ form a complete set in that we can reverse
the activity above: for any truth table we can use the three to produce an expression
whose input/output behavior is that table. In short, we can produce expressions
with any desired behavior. Here are two examples.
𝑃 𝑄 𝐸0
𝐹 𝐹 𝑇
𝐹 𝑇 𝐹
𝑇 𝐹 𝐹
𝑇 𝑇 𝐹

𝑃 𝑄 𝑅 𝐸1
𝐹 𝐹 𝐹 𝐹
𝐹 𝐹 𝑇 𝑇
𝐹 𝑇 𝐹 𝑇
𝐹 𝑇 𝑇 𝐹
𝑇 𝐹 𝐹 𝑇
𝑇 𝐹 𝑇 𝐹
𝑇 𝑇 𝐹 𝐹
𝑇 𝑇 𝑇 𝐹
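The construction behind this completeness claim is mechanical: for each row whose output is 𝑇 , form the conjunction of literals matching that row, and join the conjuncts with ∨ (disjunctive normal form). Here is a sketch in Python.

```python
def from_table(table):
    """Build a Boolean function from a truth table via disjunctive normal
    form: the result is true exactly on the rows the table marks true."""
    true_rows = [vals for vals, out in table.items() if out]
    return lambda *args: any(all(a == v for a, v in zip(args, row))
                             for row in true_rows)

# the table for E0 above is true only on the row P = F, Q = F
E0_table = {(False, False): True, (False, True): False,
            (True, False): False, (True, True): False}
E0 = from_table(E0_table)   # behaves as the single conjunct not-P and not-Q
```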
Exercises
C.1 Make a truth table for each of these propositions. (a) (𝑃∧𝑄)∧𝑅 (b) 𝑃∧(𝑄∧𝑅)
(c) 𝑃 ∧ (𝑄 ∨ 𝑅) (d) (𝑃 ∧ 𝑄) ∨ (𝑃 ∧ 𝑅)
C.2 Make a truth table for these. (a) ¬(𝑃 ∨ 𝑄) (b) ¬𝑃 ∧ ¬𝑄 (c) ¬(𝑃 ∧ 𝑄)
(d) ¬𝑃 ∨ ¬𝑄
C.3 For the tables below, construct a DNF propositional logic expression: (a) the
table on the left, (b) the one on the right.
𝑃 𝑄 𝑅 𝐸        𝑃 𝑄 𝑅 𝐸
𝐹 𝐹 𝐹 𝐹        𝐹 𝐹 𝐹 𝑇
𝐹 𝐹 𝑇 𝑇        𝐹 𝐹 𝑇 𝐹
𝐹 𝑇 𝐹 𝑇        𝐹 𝑇 𝐹 𝑇
𝐹 𝑇 𝑇 𝐹        𝐹 𝑇 𝑇 𝐹
𝑇 𝐹 𝐹 𝐹        𝑇 𝐹 𝐹 𝐹
𝑇 𝐹 𝑇 𝑇        𝑇 𝐹 𝑇 𝐹
𝑇 𝑇 𝐹 𝐹        𝑇 𝑇 𝐹 𝑇
𝑇 𝑇 𝑇 𝐹        𝑇 𝑇 𝑇 𝑇
C.4 For the tables in the prior exercise, construct a CNF propositional logic
expression: (a) the table on the left, (b) the one on the right.
C.5 There are sixteen binary logical operators. Give all sixteen truth tables, and
give the operator’s name, such as ‘𝑃 → 𝑄 ’ or ‘𝑄 → 𝑃 ’.
Part Five
Notes
Endnotes
These are citations, sources, or discussions that supplement the text body. Each refers to a word or phrase
from that text body, in italics, and then the note is in plain text. Many of the entries include links to more
detail.
Cover
Calculating the bonus http://www.loc.gov/pictures/item/npc2007012636/
Preface
in addition to technical detail, also attends to a breadth of knowledge S Pinker emphasizes that a liberal
approach involves making connections and understanding in a context (Pinker 2014). “It seems to me
that educated people should know something about the 13-billion-year prehistory of our species and the
basic laws governing the physical and living world, including our bodies and brains. They should grasp
the timeline of human history from the dawn of agriculture to the present. They should be exposed
to the diversity of human cultures, and the major systems of belief and value with which they have
made sense of their lives. They should know about the formative events in human history, including
the blunders we can hope not to repeat. They should understand the principles behind democratic
governance and the rule of law. They should know how to appreciate works of fiction and art as sources
of aesthetic pleasure and as impetuses to reflect on the human condition. On top of this knowledge,
a liberal education should make certain habits of rationality second nature. Educated people should
be able to express complex ideas in clear writing and speech. They should appreciate that objective
knowledge is a precious commodity, and know how to distinguish vetted fact from superstition, rumor,
and unexamined conventional wisdom. They should know how to reason logically and statistically,
avoiding the fallacies and biases to which the untutored human mind is vulnerable. They should think
causally rather than magically, and know what it takes to distinguish causation from correlation and
coincidence. They should be acutely aware of human fallibility, most notably their own, and appreciate
that people who disagree with them are not stupid or evil. Accordingly, they should appreciate the
value of trying to change minds by persuasion rather than intimidation or demagoguery.” See also
https://www.aacu.org/leap/what-is-a-liberal-education.
computational thinking http://www.cs.cmu.edu/afs/cs/usr/wing/www/publications/Wing06.pdf
Prologue
D Hilbert and W Ackermann Hilbert was a very prominent mathematician, perhaps the world’s most
prominent mathematician, and Ackermann was his student. So they made an impression when they
wrote, “[This] must be considered the main problem of mathematical logic” (Hilbert and Ackermann
1950), p 73.
mathematical statement Specifically, the statement as discussed by Hilbert and Ackermann comes from a
first-order logic (versions of the Entscheidungsproblem for other systems had been proposed by other
mathematicians). First-order logic differs from propositional logic, the logic of truth tables, in that it
allows variables. Thus for instance if you are studying the natural numbers then you can have a Boolean
function Prime (𝑥) . (In this context a Boolean function is traditionally called a ‘predicate’.) To make
a statement that is either true or false we must then quantify statements, as in the (false) statement
“for all 𝑥 ∈ N, Prime (𝑥) implies PerfectSquare (𝑥) .” The modifier “first-order” means that the variables
used by the Boolean functions are members of the domain of discourse (for Prime above it is N), but
we cannot have that variables themselves are Boolean functions. (Allowing Boolean functions to take
Boolean functions as input is possible, but would make this a second-order, or even higher-order, logic.)
after a run He was 22 years old at the time. (Hodges 1983), p 96. This book is the authoritative source
for Turing’s fascinating life. During the Second World War, he led a group of British cryptanalysts at
Bletchley Park, Britain’s code breaking center, where his section was responsible for German naval
codes. He devised a number of techniques for breaking German ciphers, including an electromechanical
machine that could find settings for the German coding machine, the Enigma. Because the Battle of the
Atlantic was critical to the Allied war effort, and because cracking the codes was critical to defeating the
German submarine effort, Turing’s work was very important. (The major motion picture on this, The
Imitation Game (Wikipedia contributors 2016e), is a fun watch but is not a slave to historical accuracy.)
After the war, at the National Physical Laboratory he made one of the first designs for a stored-program
computer. In 1952, when it was a crime in the UK, Turing was prosecuted for homosexual acts. He was
given chemical castration as an alternative to prison. He died in 1954 from cyanide poisoning which
an inquest determined was suicide. In 2009, following an Internet campaign, British Prime Minister
G Brown made an official public apology on behalf of the British government for “the appalling way he
was treated.”
Olympic marathon His time at the qualifying event was only ten minutes behind what was later the winning
time in the 1948 Olympic marathon. For more, see https://www.turing.org.uk/book/update/pa
rt6.html and http://www-groups.dcs.st-and.ac.uk/~history/Extras/Turing_running.html.
clerk Before the engineering of computing machines had advanced enough to make capable machines
widely available, much of what we would today do with a program was done by people, then called
“computers.” This book’s cover shows such computers at work.
Another example, as told in the film Hidden Figures, is that the trajectory for US astronaut John Glenn’s
pioneering orbit of Earth was found by the human computer Katherine Johnson and her colleagues,
African American women whose accomplishments are all the more impressive because they occurred
despite appalling discrimination.
don’t involve random methods We can build things that return completely random results; one example is a
device that registers consecutive clicks on a Geiger counter and if the second gap between clicks is longer
than the first it returns 1, else it returns 0. See also https://blog.cloudflare.com/randomness-
101-lavarand-in-production/.
continuous methods Before there were computers, engineers worked with analog models that were
sometimes quite large; see (Wikipedia contributors 2021). In these models there is no sense of step
one, step two.
analog devices See (A/V Geeks 2013) about slide rules, (Wikipedia contributors 2016c) about nomograms,
(YouTube user navyreviewer 2010) about a naval firing computer, and (Gizmodo 1948) about a more
general-purpose machine. See also https://www.youtube.com/watch?v=qqlJ50zDgeA about the
Antikythera mechanism. For a more recent take, see https://www.youtube.com/watch?v=GVsUOuSj
vcg.
reading results off of a slide rule or an instrument dial Suppose that an intermediate result of a calculation
is 1.23. If we read it off the slide rule with the convention that the resolution accuracy is only one
decimal place then we write down 1.2. Doubling that gives 2.4. But doubling the original number
2 · 1.23 = 2.46 and then rounding to one place gives 2.5.
no upper bound This explication is derived from (Rogers 1987), p 1–5.
more is provided Perhaps the clerk has a helper or the mechanism has a person attending it.
A reader may object that this violates the goal of the definition, to model in-principle-physically-realizable
computations We all know computations with no natural bounds. The long division algorithm that
we learn in grade school has no inherent bounds on the lengths of either inputs or outputs, or on the
amount of available scratch paper.
are so elementary that we cannot easily imagine them further divided (Turing 1937), (Turing 1938a)
LEGO’s See for instance https://www.youtube.com/watch?v=RLPVCJjTNgk&t=114s.
Finally, it trims off a 1 The instruction 𝑞 4 11𝑞 5 won’t ever be reached, but it does no harm. It is there for
the definition of a Turing machine, to make Δ defined on all 𝑞𝑝𝑇𝑝 . See also the note to that definition.
transition function The definition describes Δ as a function Δ : 𝑄 × Σ → (Σ ∪ { L, R }) × 𝑄 . That is a
fudge. In Ppred , the state 𝑞 3 is used only for the purpose of halting the machine and so there is no
defined next state. In Padd , the state 𝑞 5 plays the same role. So, strictly speaking, the transition function
is a partial function, one where for some members of the domain there is no associated value; see
page 373. (Alternatively, we could write the set of states as 𝑄 ∪ 𝑄ˆ where the states in 𝑄ˆ are there only
for halting, and the transition function’s definition is Δ : 𝑄 × Σ → (Σ ∪ { L, R }) × (𝑄 ∪ 𝑄ˆ ) .) We have
left this point out of the main presentation since it doesn’t cause confusion and the discussion can be a
distraction.
a complete description of a machine’s action It is reasonable to ask why our standard model, the Turing
machine, is one that is so basic that programming it can be annoying. Why not choose a real world
machine? The reason is that, as here, we can completely describe the actions of the Turing machine
model, or of any of the other simple models that are sometimes used, in only a few paragraphs. A real
machine would take a full book, and a full semester. We do Turing machines because they are simple
to describe (they are also historically important, and the work in Chapter Five needs them).
𝑞 is a state, a member of 𝑄 We are vague about what ‘states’ are but we assume that whatever they are,
the set of states 𝑄 is disjoint from the set Σ ∪ { L, R }.
a snapshot, an instant in a computation So the configuration along with the Turing machine is all the
information that you need to continue a computation — it encapsulates the future history of that
computation.
A state machine is a device that stores the status of something at a given time. On input it can change
the status (it can also cause an action or output to take place). Mathematically, it is a finite set
𝑄 = {𝑞 0, ... 𝑞𝑛 } along with a function Δ : 𝑄 × Σ → 𝑄 .
rather than, “this shows that 𝜙 takes a string representing 3 to a string representing 5.” That is, we do this
for the same reason that we would say, “This is me when I was ten.” instead of, “This is a picture of me
when I was ten.”
a physical system evolves through a sequence of discrete steps that are local, meaning that all the action takes
place within one cell of the head Adapted from (Wigderson 2017).
constructed the first machine See (Leupold 1725).
A number of mathematicians See also (Wikipedia contributors 2014).
Church suggested to the most prominent expert in the area, Gödel (Soare 1999)
established beyond any doubt (Gödel 1995)
This is central to the Theory of Computation Some authors have claimed that neither Church nor Turing
stated anything as strong as is given here but instead that they proposed that the set of things that can
be done by a Turing machine is the same as the set of things that are computable by a human computer
(see for instance (Copeland and Proudfoot 1999)). But the thesis as stated here, that what can be done
by a Turing machine is what can be done by any physical mechanism that is discrete and deterministic,
is certainly the thesis as it is taken by most researchers in the field today. And besides, Church and
Turing did not in fact distinguish between the two cases; (Hodges 2016) points to Church’s review of
Turing’s paper in the Journal of Symbolic Logic: “The author [i.e. Turing] proposes as a criterion that
an infinite sequence of digits 0 and 1 be ‘computable’ that it shall be possible to devise a computing
machine, occupying a finite space and with working parts of finite size, which will write down the
sequence to any desired number of terms if allowed to run for a sufficiently long time. As a matter of
convenience, certain further restrictions are imposed on the character of the machine, but these are of
such a nature as obviously to cause no loss of generality — in particular, a human calculator, provided
with pencil and paper and explicit instructions, can be regarded as a kind of Turing machine.” This has
Church referring to the human calculator not as the prototype but instead as a special case of the class
of defined machines.
We cannot give a mathematical proof of Church’s Thesis We cannot give a proof that starts from axioms
whose justification is on firmer footing than the thesis itself. R Williams has commented, “[T]he
Church-Turing thesis is not a formal proposition that can be proved. It is a scientific hypothesis, so it can
be ‘disproved’ in the sense that it is falsifiable. Any ‘proof’ must provide a definition of computability
with it, and the proof is only as good as that definition.” (Stack Exchange author Ryan Williams 2010)
formalizes ‘intuitively mechanically computable’ Kleene wrote that “its role is to delimit precisely an
hitherto vaguely conceived totality.” (Kleene 1952), p 318.
Turing wrote (Turing 1937)
systematic error (Dershowitz and Gurevich 2008) p 304.
it may be the right answer Gödel wrote, “the great importance . . . [of] Turing’s computability [is] largely
due to the fact that with this concept one has for the first time succeeded in giving an absolute definition
of an interesting epistemological notion, i.e., one not depending on the formalism chosen.” (Gödel
1995), pages 150–153.
can compute all of the functions that can be computed by machines with two or more tapes For instance, we
can simulate a two-tape machine P2 on a one-tape machine P1 . One way to do this is by having P1 use
its even-numbered tape positions for P2 ’s first tape and using its odd tape positions for P2 ’s second tape.
(A more hand-wavy explanation is: a modern computer can clearly simulate a two-tape Turing machine
but a modern computer has sequential memory, which is like the one-tape machine’s sequential tape.)
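The even/odd interleaving can be made concrete with a toy sketch (ours, for illustration; the helper names are invented): one dict-backed tape holds both of P2's logical tapes, with logical tape 0 stored at the even-numbered physical cells and logical tape 1 at the odd-numbered ones.

```python
# One physical tape simulates two logical tapes: logical tape 0 lives at
# even physical positions, logical tape 1 at odd physical positions.
tape = {}  # physical position -> symbol; absent cells read as the blank 'B'

def write(which, pos, symbol):
    tape[2 * pos + which] = symbol

def read(which, pos):
    return tape.get(2 * pos + which, 'B')

write(0, 3, 'x')  # cell 3 of logical tape 0 -> physical cell 6
write(1, 3, 'y')  # cell 3 of logical tape 1 -> physical cell 7
```

A one-tape machine simulating P2 walks this single tape, moving two physical cells wherever P2 moves one, so the two logical tapes never interfere.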
compute the same set of functions We must adjust the convention for what is the output of a function.
evident immediately (Church 1937)
S Aaronson has made this point From his blog Shtetl-Optimized, (Aaronson 2012b).
supply a stream of random bits Some CPUs come with that capability built in; see for instance
https://en.wikipedia.org/wiki/RdRand.
beyond discrete and deterministic From (Stack Exchange author Andrej Bauer 2016): “Turing machines
are described concretely in terms of states, a head, and a working tape. It is far from obvious that this
exhausts the computing possibilities of the universe we live in. Could we not make a more powerful
machine using electricity, or water, or quantum phenomena? What if we fly a Turing machine into a
black hole at just the right speed and direction, so that it can perform infinitely many steps in what
appears finite time to us? You cannot just say ‘obviously not’ — you need to do some calculations in
general relativity first. And what if physicists find out a way to communicate and control parallel
universes, so that we can run infinitely many Turing machines in parallel time?”
everything that experiments with reality would ever find to be possible Modern Physics is a sophisticated
and advanced field of study so we could doubt that anything large has been overlooked. However,
there is historical reason for supposing that such a thing is possible. The physicists H von Helmholtz
in 1856 and S Newcomb in 1892 calculated that the Sun is about 20 million years old (they assumed
that the Sun glowed from the energy provided by its gravitational contraction in condensing from a
nebula of gas and dust to its current state). Consistently with that, one of the world’s most reputable
physicists, W Kelvin, estimated in 1897 that the Earth was, “more than 20 and less than 40 million years
old, and probably much nearer 20 than 40” (he calculated how long it would take the Earth to cool
from a completely molten object to its present temperature). He said, “unless sources now unknown to
us are prepared in the great storehouse of creation” then there was not enough energy in the system
to justify a longer estimate. One person very troubled by this was Darwin, having himself found that
a valley in England took 300 million years to erode, and consequently that there was enough time,
called “deep time,” for the slow but steady process of evolution of species to happen. Then, in 1896,
A Becquerel discovered radiation. Everything changed. All of the prior calculations did not account for
it and the apparent discrepancy vanished. (Wikipedia contributors 2016a)
the solution is not computable See (Pour-El and Richards 1981).
compute an exact solution See http://www.smbc-comics.com/?id=3054.
Three-Body Problem See https://en.wikipedia.org/wiki/Three-body_problem.
we can still wonder See (Piccinini 2017).
This big question remains open A sample of readings: frequently cited is (Black 2000), which takes the
thesis to be about what is humanly computable, and (Copeland 1996), (Copeland 1999), and (Copeland
2002) argue that computations can be done that are beyond the capabilities of Turing machines. Against
that are (Davis 2004), (Davis 2006), and (Gandy 1980), which give arguments that many Theory of
Computing researchers consider conclusive.
the mainstream community of researchers takes Church’s Thesis as the basis for its work For some idea of
additional views see (Zenil 2012).
Often when we want to show that something is computable The same point stated another way, from (Stack
Exchange author Andrej Bauer 2018): In books on computability theory it is common for the text to
skip details on how a particular machine is to be constructed. The author of the computability book
will mumble something about the Turing-Church thesis somewhere in the beginning. This is to be read
as “you will have to do the missing parts yourself, or equip yourself with the same sense of inner feeling
about computation as I did”. Often the author will give you hints on how to construct a machine, and
call them “pseudo-code”, “effective procedure”, “idea”, or some such. The Church-Turing thesis is the
social convention that such descriptions of machines suffice. (Of course, the social convention is not
arbitrary but rather based on many years of experience on what is and is not computable.) . . . I am
not saying that this is a bad idea, I am just telling you honestly what is going on. . . . So what are we
supposed to do? We certainly do not want to write out detailed constructions of machines, because
then students will end up thinking that’s what computability theory is about. It isn’t. Computability
theory is about contemplating what machines we could construct if we wanted to, but we don’t. As
usual, the best path to wisdom is to pass through a phase of confusion.
Suppose that you have infinitely many dollars. (MathOverflow user Joel David Hamkins 2010)
H Grassmann produced a more elegant definition In 1888 Dedekind used this definition to give the first
rigorous proof of the laws of elementary school arithmetic.
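Grassmann-style recursion can be sketched in a few lines of Python (our sketch, assuming the usual formulation: 𝑎 + 0 = 𝑎 and 𝑎 + S(𝑏) = S(𝑎 + 𝑏), where S is the successor operation).

```python
def succ(n):
    # the successor operation, the only arithmetic taken as given
    return n + 1

def add(a, b):
    if b == 0:                      # base case: a + 0 = a
        return a
    return succ(add(a, b - 1))      # recursive case: a + S(b) = S(a + b)
```

From this definition the familiar laws, such as commutativity, follow by induction, which is the content of Dedekind's proof.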
it specifies the meaning, the semantics, of the operation A Perlis’s epigram, “Recursion is the root of
computation since it trades description for time” expresses this idea. The recursive definition includes
steps implicitly, and with them time, in that you need to keep expanding the recursive calls. But it does
not include them in preference to what they are about.
logically problematic The sense that there is something perplexing about recursion is often expressed with
a story.
W James gave a public lecture on cosmology, and was approached by an older woman from
the audience. “Your idea that the sun is the center of the solar system and the earth orbits
around it has a good ring, Mr James, but it’s wrong,” she said. “Our crust of earth lies on the
back of a giant turtle.” James gently asked, “What does this turtle stand on?” “You’re very
clever, Mr James,” she replied, “but the first turtle stands on the back of a second, far larger,
turtle.” James persisted, “And the second turtle, Madam?” Immediately she crowed, “It’s no
use Mr James — it’s turtles all the way down!” (Wikipedia contributors 2016f)
See https://xkcd.com/1416.
Another widely known reference is that with the invention of better microscopes, scientists studying
fleas came to see that the fleas themselves had parasites. The Victorian mathematician Augustus De
Morgan wrote a poem (derived from one by Jonathan Swift) called Siphonaptera, which is the biological
order of fleas.
Great fleas have little fleas upon their backs to bite ’em,
And little fleas have lesser fleas, and so ad infinitum.
See also Room 8, winner of the 2014 short film award from the British Academy of Film and Television
Arts.
define the function on higher-numbered inputs using only its values on lower-numbered ones For the function
specified by 𝑓 ( 0) = 1 and 𝑓 (𝑛) = 𝑛 · 𝑓 (𝑓 (𝑛 − 1) − 1) , try computing the values 𝑓 ( 0) through 𝑓 ( 5) .
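A sketch for experimenting with that exercise (ours): the trouble is that the inner call can land on an input *larger* than 𝑛, so this is not a legitimate definition by values on lower-numbered inputs.

```python
def f(n):
    if n == 0:
        return 1
    # Note: the inner call f(f(n-1) - 1) need not be on a smaller input.
    return n * f(f(n - 1) - 1)

print([f(n) for n in range(5)])  # f(0) through f(4) all terminate
```

Tracing 𝑓(5) shows it calling 𝑓(7), which needs 𝑓(6), which needs 𝑓(5) again, so that computation never halts.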
the first sequence of numbers ever computed on an electronic computer It was computed on EDSAC on
1949-May-06. See (N. J. A. Sloane 2019) and (Renwick 1949).
Towers of Hanoi The puzzle was invented by E Lucas in 1883, but the next year H De Parville made it
famous with a delightful problem statement.
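The puzzle's recursive structure fits in a short sketch (ours, not De Parville's): to move 𝑛 disks, move 𝑛 − 1 of them out of the way, move the largest, then move the 𝑛 − 1 back on top.

```python
def hanoi(n, src, dst, spare, moves):
    if n == 0:
        return
    hanoi(n - 1, src, spare, dst, moves)  # park n-1 disks on the spare peg
    moves.append((src, dst))              # move the largest disk
    hanoi(n - 1, spare, dst, src, moves)  # bring the n-1 disks back on top

moves = []
hanoi(3, 'A', 'C', 'B', moves)
print(len(moves))  # 2**3 - 1 = 7 moves, which is the minimum
```

The recurrence for the number of moves, 𝑚(𝑛) = 2𝑚(𝑛 − 1) + 1, gives 2ⁿ − 1 in total.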
hyperoperation (Goodstein 1947)
H4 ( 4, 3) = 4^(4^4) is much greater than the number of elementary particles in the universe The radius of the
universe is about 45 × 10^9 light years. That’s about 10^62 Planck units. A system of much more than
𝑟^1.5 particles packed in 𝑟 Planck units will collapse rapidly. So the number of particles is less than 10^92,
which is much less than H4 ( 4, 3) ≈ 10^154 (solve 4^256 = 10^𝑥 by taking the logarithm base 10 of both
sides to get 𝑥 = 256 · log ( 4) ≈ 154.13). (Levin 2016)
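The hyperoperations can be sketched directly from their recursion (one common convention, ours to pick since, as the text notes, variants abound): H0 is successor, H1 addition, H2 multiplication, H3 exponentiation, H4 tetration, each defined by iterating the one before.

```python
def H(n, a, b):
    if n == 0:
        return b + 1
    if b == 0:
        return a if n == 1 else (0 if n == 2 else 1)  # base cases per level
    return H(n - 1, a, H(n, a, b - 1))  # iterate the previous operation

print(H(3, 2, 3), H(4, 2, 3))  # 8 and 16: 2^3 = 8 and 2^(2^2) = 16
print(len(str(4 ** 256)))      # H4(4,3) = 4^(4^4) = 4^256 has 155 digits
```

Running the pure recursion on H4(4, 3) itself is hopeless, which is rather the point; the digit count above cheats by using Python's built-in exponentiation.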
Ackermann function There are many different Ackermann functions in the literature. A common one is
the function of one variable A (𝑘, 𝑘) . See (Wikipedia contributors 2024).
a programming language having only bounded loops computes the primitive recursive functions (Meyer and
Ritchie 1966)
output only primes In fact, there is no one-input polynomial with integer coefficients that outputs a prime
for all integer inputs, except if the polynomial is constant. This was shown in 1752 by C Goldbach.
The proof is so simple and delightful, and not widely known, that we will give it here. Suppose 𝑝 is a
polynomial with integer coefficients that on integer inputs returns only primes. Fix some n̂ ∈ N, and
then 𝑝 (n̂) = m̂ is a prime. Into the polynomial plug n̂ + 𝑘 · m̂, where 𝑘 ∈ Z. Expanding gives lots of
terms with m̂ in them, and gathering together like terms shows 𝑝 (n̂ + 𝑘 · m̂) ≡ 𝑝 (n̂) mod m̂. Because
𝑝 (n̂) = m̂, this gives that 𝑝 (n̂ + 𝑘 · m̂) = m̂, since that is the only prime number that is a multiple of m̂
and 𝑝 outputs only primes. But with that, 𝑝 (𝑛) = m̂ has infinitely many roots, and 𝑝 is therefore the
constant polynomial.
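The congruence at the heart of the proof can be checked concretely with Euler's famous polynomial 𝑝(𝑛) = 𝑛² + 𝑛 + 41 (our example; it yields primes for 𝑛 = 0 through 39 but, by the argument above, must eventually fail).

```python
def p(n):
    return n * n + n + 41

n_hat = 1
m_hat = p(n_hat)              # p(1) = 43, a prime
value = p(n_hat + 1 * m_hat)  # plug in n_hat + k*m_hat with k = 1
print(value, value % m_hat)   # 2021 0: a multiple of 43, but not 43 itself
```

Here 𝑝(44) = 2021 = 43 · 47 is congruent to 𝑝(1) mod 43, and being a proper multiple of 43 it is composite.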
this relates unbounded search to the Entscheidungsproblem It is possible that neither search will halt. It is
possible that the conjecture is true but not provable from the axiom system that you are using.
Collatz conjecture See (Wikipedia contributors 2019a).
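Unbounded search in miniature (a sketch of ours): the loop below has no a priori bound on its number of iterations, and that it halts for every starting value is precisely the open conjecture.

```python
def collatz_steps(n):
    steps = 0
    while n != 1:                               # nothing bounds this loop
        n = n // 2 if n % 2 == 0 else 3 * n + 1
        steps += 1
    return steps

print(collatz_steps(6))  # 6 -> 3 -> 10 -> 5 -> 16 -> 8 -> 4 -> 2 -> 1: 8 steps
```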
sin (𝑥) may be calculated via its Taylor polynomial The Taylor series is sin (𝑥) = 𝑥 −𝑥 3 /3!+𝑥 5 /5!−𝑥 7 /7!+· · · .
We might do a practical calculation by deciding that a sufficiently good approximation is to terminate
that series at the 𝑥 5 term, giving a Taylor polynomial.
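For instance, terminating the series at the 𝑥⁵ term gives this approximation (a sketch of ours), which is quite accurate near 𝑥 = 0.

```python
import math

def taylor_sin(x):
    # sin(x) ~ x - x^3/3! + x^5/5!, the Taylor series truncated at x^5
    return x - x**3 / math.factorial(3) + x**5 / math.factorial(5)

print(abs(taylor_sin(0.5) - math.sin(0.5)))  # error on the order of 1e-6
```

By the alternating series bound the error at 𝑥 is at most |𝑥|⁷/7!, so near zero a handful of terms suffices.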
C Shannon See this profile of him: http://www.newyorker.com/tech/elements/claude-shannon-
the-father-of-the-information-age-turns-1100100.
master’s thesis His paper on the subject was his master’s thesis, https://en.wikipedia.org/wiki/A_Sy
mbolic_Analysis_of_Relay_and_Switching_Circuits.
type of not gate This shows an N-type Metal Oxide Semiconductor Transistor. There are many other types.
the von Neumann architecture Although, that architecture was based on the work of JP Eckert and
J Mauchly.
problem of humans living on Mars To get there the idea was to use a rocket ship impelled by dropping
atom bombs out the bottom; the energy would let the ship move rapidly around the solar system. This
sounds like a crank plan but it is perfectly feasible (Brower 1983). Having been a key person in the
development of the atomic bomb, von Neumann was keenly aware of their capabilities.
Game of Life Conway explains it here: https://www.youtube.com/watch?v=E8kUJL04ELA .
J Conway Conway was a magnetic person and extraordinarily creative. Sadly, he died in the Covid-19
pandemic. See an excerpt from the excellent biography at https://www.ias.edu/ideas/2015/rober
ts-john-horton-conway.
M Gardner’s celebrated Mathematical Games column of Scientific American in October 1970 (Gardner 1970)
computer craze (Bellos 2014)
zero-player game See https://www.youtube.com/watch?v=R9Plq-D1gEk.
B Gosper With R Greenblatt, he started the hacker community, and is particularly well-known among
Lisp-ers.
a rabbit Discovered by A Trevorrow in 1986.
anything that can be mechanically computed (Rendell 2011)
Here we will produce a simplified variant There are a number of variants in the literature. For instance,
the hyperoperation used here is not the function actually introduced by Ackermann, which has three
inputs. And another student of Hilbert’s, G. Sudan, produced a similar function at roughly the same
time and for the same purpose, of being computable but not primitive recursive. The hyperoperation
itself was defined in 1948 by R Goodstein.
it is not primitive recursive This presentation is based on that of (Hennie 1977), (Smoryński 1991),
and (Robinson 1948).
This variant In addition to Péter, development of this variant also came from R Robinson.
a function is primitive recursive See the history at (Brock 2020).
LOOP (Meyer and Ritchie 1966)
the interpreter for LOOP Adapted from (Schnieder 2001).
Background
Deep Field movie https://www.youtube.com/watch?v=yDiD8F9ItX0
two paradoxes These are what Quine calls veridical paradoxes: they may at first seem absurd but we will
demonstrate that they are nonetheless true. (Wikipedia contributors 2018)
Galileo’s Paradox He did not invent it but he gave it prominence in his celebrated Discourses and
Mathematical Demonstrations Relating to Two New Sciences.
same cardinality Numbers have two natures. First, in referring to the set of stars known as the Pleiades
as the “Seven Sisters” we mean to take them as a set, not ordered in any way. In contrast, second, in
referring to the “Seven Deadly Sins,” well, clearly some of them score higher than others. The first
reference speaks to the cardinal nature of numbers and the second to their ordinal nature. For finite
numbers the two are bound together, as Lemma 1.5 says, but for infinite numbers they differ.
was proposed by G Cantor in the 1870’s For his discoveries, Cantor was reviled by a prominent mathematician
and former professor L Kronecker as a “corrupter of youth.” That was pre-Elvis.
which is Cantor’s definition (Gödel 1964)
the most important infinite set is the natural numbers, N = { 0, 1, 2, ... } Its existence is guaranteed by the
Axiom of Infinity, one of the standard axioms of Mathematics, the Zermelo-Frankel axioms.
lexicographic order Sometimes called lexical order or dictionary order.
due to Zeno Zeno gave a number of related paradoxes of motion. See (Wikipedia contributors 2016g)
(Huggett 2010), (Bragg 2016), as well as http://www.smbc-comics.com/comic/zeno and this xkcd.
Courtesy xkcd.com
the distances 𝑥𝑖+1 − 𝑥𝑖 shrink toward zero, there is always further to go because of the open-endedness at
the left of the interval ( 0 .. ∞) A modern paradox that, like this one, uses the open-endedness of the
numbers is Thomson’s Lamp Paradox: a person turns on the room lights and then a minute later turns
them off, a half minute later turns them on again, and a quarter minute later turns them off, etc. After
two minutes, are the lights on or off? This paradox was devised in 1954 by J F Thomson to analyze the
possibility of a supertask, the completion of an infinite number of tasks. Thomson’s answer was that it
creates a contradiction: “It cannot be on, because I did not ever turn it on without at once turning it off.
It cannot be off, because I did in the first place turn it on, and thereafter I never turned it off without at
once turning it on. But the lamp must be either on or off ” (Thomson 1954). See also the discussion of
the Littlewood Paradox (Wikipedia contributors 2016d).
Number the diagonals Really, these are the anti-diagonals, since the diagonal is composed of the pairs
⟨𝑛, 𝑛⟩ .
arithmetic series with total 𝑑 (𝑑 + 1)/2 It is called the 𝑑 -th triangular number.
cantor (𝑥, 𝑦) = 𝑥 + [(𝑥 + 𝑦)(𝑥 + 𝑦 + 1)/2] The Fueter-Pólya Theorem says that this is essentially the
only quadratic function that serves as a pairing; see (Smoryński 1991). More precisely, the only
real-coefficient quadratic polynomials in two variables giving a correspondence from N2 to N are 𝑝 (𝑥, 𝑦)
and 𝑝 (𝑦, 𝑥) , where 𝑝 (𝑎, 𝑏) = 𝑎 + [(𝑎 + 𝑏)(𝑎 + 𝑏 + 1)/2] . No one knows whether there are pairing
functions that are any other kind of polynomial.
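As a quick concrete check of the pairing (our sketch): the pairs on the first five anti-diagonals map exactly onto 0 through 14, each value hit exactly once.

```python
def cantor(x, y):
    return x + (x + y) * (x + y + 1) // 2

# Enumerate pairs anti-diagonal by anti-diagonal, where d = x + y.
values = sorted(cantor(x, d - x) for d in range(5) for x in range(d + 1))
print(values == list(range(15)))  # True: a bijection onto {0, ..., 14}
```

Each anti-diagonal 𝑑 contributes the consecutive block of values from 𝑑(𝑑 + 1)/2 up through 𝑑(𝑑 + 1)/2 + 𝑑, which is why the blocks tile N with no gaps or overlaps.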
memoization The term was invented by Donald Michie (Wikipedia contributors 2016b), who among other
accomplishments was a coworker of Turing’s in the World War II effort to break the German secret
codes.
assume that we have a family of correspondences 𝑔 𝑗 : N → 𝑆ˆ𝑗 To pass from the original collection of
infinitely many onto functions 𝑔𝑖 : N → 𝑆ˆ𝑖 to a single, uniform family of onto functions 𝑔 𝑗 (𝑦) = 𝐺 ( 𝑗, 𝑦)
we need some version of the Axiom of Choice, perhaps Countable Choice. In this book we assume a
suitable Choice axiom, and we omit further discussion of that because it would take us far afield.
doesn’t matter much For more on “much” see (Rogers 1958).
adding the instruction 𝑞 𝑗+𝑘 BB𝑞 𝑗+𝑘
This is essentially what a compiler calls ‘unreachable code’ in that it is not a state that the machine will
ever be in.
central to the entire subject The classic text (Rogers 1987) says, “It is not inaccurate to say that our theory
is, in large part, a ‘theory of diagonalization’.”
This technique is diagonalization The argument just sketched is often called Cantor’s diagonal proof,
although it was not Cantor’s original argument for the result, and although the argument style is not
due to Cantor but instead to Paul du Bois-Reymond. The fact that scientific results are often attributed
to people who are not their inventor is Stigler’s law of eponymy. Naturally it wasn’t invented by Stigler
(who attributes it to Merton). In mathematics this is called Boyer’s Law, who didn’t invent it either.
(Wikipedia contributors 2015).
Musical Chairs It starts with more children than chairs. Some music plays and the children walk around
the chairs. When the music suddenly stops, each child tries to sit, leaving someone without a chair. That
child has to leave the game, a chair is removed, and the game proceeds.
so many reals This is a Pigeonhole Principle argument.
That is true but the proof is beyond our scope Also beyond our scope is the argument that for any two sets,
one of them has cardinality less than or equal to the other. This is equivalent to the Axiom of Choice.
consider this element of P (𝑆) This is sometimes called the Russell set because of its relation to Russell’s
paradox. See also this XKCD.
Courtesy XKCD
Your study partner is confused about the diagonal argument From (Stack Exchange author Kaktus and
various others 2019).
ENIAC, reconfigure by rewiring. Jean Jennings (left), Marlyn Wescoff (center), and Ruth Lichterman
program the ENIAC, circa 1946. US Army Photo.
A pattern in technology is for jobs done in hardware to migrate to software One story that illustrates the
naturalness of this involves the English mathematician C Babbage, and his protegee A Lovelace. In 1812
Babbage was developing tables of logarithms. These were calculated by computers — the word then
current for the people who computed them by hand. To check the accuracy he had two people do the
same table and compared. He was annoyed at the number of discrepancies and had the idea to build a
machine to do the computing. He got a government grant to design and construct a machine called the
difference engine, which he started in 1822. This was a single-purpose device, what we today would
call a calculator. One person who became interested in the computations was an acquaintance of his,
Lovelace (who at the time was named Byron, as she was the daughter of the poet Lord Byron).
However, this machine was never finished because Babbage had the thought to make a device that
would be programmable, and that was too much of a temptation. Lovelace contributed an extensive
set of notes on a proposed new machine, the analytical engine, and has become known as the first
programmer.
controlled by cards It weaves with hooks whose positions, raised or lowered, are determined by holes
punched in the cards.
have the same output behavior A technical point: Turing machines have a tape alphabet. So a universal
machine’s input or output can only involve symbols that it is defined as able to use. If another machine
has a different tape alphabet then how can the universal machine simulate it? As usual, we define
things so that the universal machine manipulates representations of the other machine’s alphabet. This
is similar to the way that an everyday computer represents decimals using binary.
flowchart Flowcharts are widely used to sketch algorithms; here is one from XKCD.
Courtesy xkcd
through 4 they are 3, 5, 17, 257, and 65537, all of which are prime. Computer searches up to 30 have
not found any more.
a quadrillion, 1 × 1015
See https://github.com/jhg023/brocard.
‘extensional’ This description is derived from A Bauer, in https://cs.stackexchange.com/q/2811/67
754.
dovetailing A dovetail joint is used by carpenters and woodworkers to build strong wood drawers. It
weaves the two sides together alternately, in an interlocking way, as shown here.
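A computational sketch of dovetailing (our illustration): run infinitely many computations by rounds, starting one more computation each round and giving every active one a single step, so that every computation eventually gets arbitrarily many steps.

```python
import itertools

def stream(i):
    # stands in for computation number i; it yields (i, 0), (i, 1), ...
    return ((i, j) for j in itertools.count())

def dovetail():
    active = []
    for d in itertools.count():
        active.append(stream(d))   # start computation d
        for s in active:           # then give every active one a step
            yield next(s)

print(list(itertools.islice(dovetail(), 6)))
# [(0, 0), (0, 1), (1, 0), (0, 2), (1, 1), (2, 0)]
```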
“We shall not go any further into the nature of this oracle apart from saying that it cannot be a machine.”
(Turing 1938b)
magic smoke See (Wikipedia contributors 2017f).
we will instead describe it conceptually For a full treatment see (Rogers 1987).
like the ‘divides’ relation between integers This is not a partial order because 𝐴 ≤𝑇 𝐵 and 𝐵 ≤𝑇 𝐴 does not
imply that 𝐴 = 𝐵 . Instead, this is called a ‘preorder’.
jump further up the order This is analogous to the situation with cardinalities, where taking a power set
jumps to a larger cardinality.
the notion of partial computable function seems to have an built-in defense against diagonalization (Odifreddi
1992), p 152.
this machine’s name is where it halts Nominative determinism is the theory that a person’s name has some
influence over what they do with their life. Examples are: the sprinter Usain Bolt, the US weatherman
Storm Fields, the baseball player Prince Fielder, and the Lord Chief Justice of England and Wales named
Igor Judge, I Judge. See https://en.wikipedia.org/wiki/Nominative_determinism.
considered mysterious, or at any rate obscure For example, “The recursion theorem . . . has one of the most
unintuitive proofs where I cannot explain why it works, only that it does.” (Fortnow and Gasarch 2002)
we say that it is mentioned We can have a lot of fun with the use-mention distinction. One example is the
old wisecrack that answers the statement, “Nothing rhymes with orange” with “No it doesn’t,” that
turns on the distinction between nothing and ‘nothing’. Another example is the conundrum that we all
agree that 1/2 = 3/6, but one of them involves a 3 and the other does not — how can different things
be equal? The resolution of course is that the assertion that they are equal refers to the number that
they represent, not to the representation itself. That is, in mention, ‘1/2’ and ‘3/6’ are different strings
but in use, they point to the same number.
mathematical fable This fable came from David Hilbert in 1924. It was popularized by George Gamow in
One, Two, Three . . . Infinity. (Kragh 2014).
Napoleon’s downfall in the early 1800’s See (Wikipedia contributors 2017d).
period of prosperity and peace See (Wikipedia contributors 2017i).
A A Michelson, who wrote in 1899, “The more important fundamental laws and facts of physical science
have all been discovered, and these are now so firmly established that the possibility of their ever being
supplanted in consequence of new discoveries is exceedingly remote.” Michelson was a major figure,
whose opinions carried weight. From 1901 to 1903 he was president of the American Physical Society.
In 1910–1911 he was president of the American Association for the Advancement of Science and from
1923–1927 he was president of the National Academy of Sciences. In 1907 he received the Copley
Medal from the Royal Society in London, and then the Nobel Prize. He remains well known today for
the Michelson–Morley experiment that tried to detect the presence of aether, the hypothesized medium
through which light waves travel.
working out nature’s rules See https://www.youtube.com/watch?v=o1dgrvlWML4.
many observers thought that we basically had got the rules An example is that Max Planck was advised not
to go into physics by his professor, who said, “in this field, almost everything is already discovered, and
all that remains is to fill a few unimportant holes.” (Wikipedia contributors 2017j)
the discovery of radiation This happened in 1896, before Michelson’s statement. Often the significance of
things takes time to be apparent.
he became an overnight celebrity “Einstein Theory Triumphs” was the headline in The New York Times.
JJ Thomson, president of the Royal Society, referred to the experiment’s success as “one of the
momentous, if not the most momentous, pronouncements in the history of human thought.” And, when
Einstein arrived in New York by boat in 1921, reporters were delighted to find not a stuffy academic
but instead someone who was very endearing, quotable, and photogenic. The hair, the scruffy clothes,
and the violin, all made him seem the personification of a genius, which of course continues today.
“everything is relative.” Of course, the history around Einstein’s work is vastly more complex and subtle.
But we are speaking of the broad understanding, not of the truth.
loss of certainty This phrase is the title of a famous popular book on mathematics, by M Kline. The book is
a fun and thought-provoking read. Also thought-provoking are some criticisms of the book. (Wikipedia
contributors 2019b) is a good introduction to both.
from a proof of the Halting problem, we can get to a proof of Gödel’s Theorem See (Aaronson 2011a). See
also https://math.stackexchange.com/a/53324/12012.
the development of a fetus is that it basically just expands The issue was whether the fetus began preformed
or as a homogeneous mass; see (Maienschein 2017). Today we have similar questions about the Big
Bang — we are puzzled to explain how a mathematical point, which is without internal structure and
entirely homogeneous, could develop into the very non-homogeneous universe that we see today.
infinite regress This line of thinking often depends on the suggestion that all organisms were created at
the same time, that they have existed since the beginning of the posited creation.
development by Darwin and Wallace of the theory of differential reproduction through natural selection
Darwin wrote in his autobiography, “The old argument of design in nature, as given by Paley, which
formerly seemed to me so conclusive, fails, now that the law of natural selection has been discovered.
We can no longer argue that, for instance, the beautiful hinge of a bivalve shell must have been made
by an intelligent being, like the hinge of a door by man. There seems to be no more design in the
variability of organic beings and in the action of natural selection, than in the course which the wind
blows. Everything in nature is the result of fixed laws.”
the rug is less complex than the machine This is an information theoretic analog of the Second Law of
Thermodynamics. E Musk has tweeted something of the same sentiment, “The extreme difficulty of
scaling production of new technology is not well understood. It’s 1000% to 10,000% harder than making
a few prototypes. The machine that makes the machine is vastly harder than the machine itself.” See
https://twitter.com/elonmusk/status/1308284091142266881.
self-reference ‘Self-reference’ describes something that refers to itself. The classic example is the Liar
paradox, the statement attributed to the Cretan Epimenides, “All Cretans are liars.” Because he is
Cretan we take the statement to be an utterance about utterances by him, that is, to be about itself. If
we suppose that the statement is true then it asserts that anything he says is false, so the statement is
false. But if we suppose that it is false then we take him to be telling the truth, that all his statements
are false. It’s a paradox, meaning that the reasoning seems locally sound but it leads to a global
impossibility.
This is related to Russell’s paradox, which lies at the heart of the diagonalization technique, that if we
define the collection of sets 𝑅 = {𝑆 | 𝑆 ∉ 𝑆 } then 𝑅 ∈ 𝑅 holds if and only if 𝑅 ∉ 𝑅 holds.
Self-reference is obviously related to recurrence. You see it sometimes pictured as an infinite recurrence,
as here on the front of a chocolate product.
Because of this product, having a picture contain itself is sometimes known as the Droste effect.
Besides the Liar paradox there are many others. One is Quine’s paradox, a sentence that asserts its own
falsehood.
If this sentence were false then it would be saying something that is true. If this sentence were true
then what it says would hold and it would be not true.
A wonderful popular book exploring these topics and many others is (Hofstadter 1979).
quine Named for the philosopher Willard Van Orman Quine.
The verb ‘to quine’ Invented by D Hofstadter. It traces back to the statement due to the philosopher
W Quine, “yields falsehood when preceded by its quotation” yields falsehood when preceded by its quotation
which has the paradoxical quality that if true it asserts its own falsehood, and if false it must be true.
And it accomplishes that without direct self-reference.
We can express that in code The development of this part of the subsection comes from (Boro Sitnikovski
2024) — also where the name Boro comes from — and (Avigad 2007).
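For instance, here is a classic minimal Python quine (a standard construction, not necessarily the one developed in the text): the string s is *mentioned*, via %r, inside its own *use*.

```python
# A program whose output is its own source code. The %r conversion embeds
# the repr of s (its mention) inside the formatted result (its use), and
# %% escapes the literal percent sign.
s = 's = %r\nprint(s %% s)'
print(s % s)
```

Running it prints exactly its own two lines of source, with no file reading and no direct self-reference, just as Quine's sentence manages self-description through quotation.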
which 𝑛 -state Turing Machine does the most computational work before halting R H Bruck wrote (Bruck
1953), “I might compare the high-speed computing machine to a remarkably large and awkward pencil
which takes a long time to sharpen and cannot be held in the fingers in the usual manner so that it
gives the illusion of responding to my thoughts, but is fitted with a rather delicate engine and will write
like a mad thing provided I am willing to let it dictate pretty much the subjects on which it writes.”
The Busy Beaver machine is the maddest writer of that size.
Think of this as a competition Two very nice videos on this subject are The Boundary of Computation and
What happens at the Boundary of Computation? from YouTube contributor Mutual Information.
In the 1962 paper Radó This paper (Radó 1962) is exceptionally clear and interesting.
In 2024, a team of researchers See (Brubaker 2024).
odd perfect number A number is perfect if it is the sum of its proper divisors. For instance, 6 = 1 + 2 + 3. Even
perfect numbers exist but we do not know if odd ones do.
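A direct brute-force check of perfectness (our sketch, fine for small 𝑛):

```python
def is_perfect(n):
    # a number is perfect when it equals the sum of its proper divisors
    return n == sum(d for d in range(1, n) if n % d == 0)

print([n for n in range(2, 500) if is_perfect(n)])  # [6, 28, 496]
```

Every perfect number found so far is even; no search or proof has yet produced an odd one, or ruled them out.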
machines with three or more symbols The case of machines with three states and three symbols is
not known. Solving it requires solving a Collatz-like problem that currently no one can do. See
https://www.sligocki.com/2023/10/16/bb-3-3-is-hard.html.
BB (𝑛) is unknowable See (Aaronson 2012a) and the excellent summary (Aaronson 2020). See also
https://www.quantamagazine.org/the-busy-beaver-game-illuminates-the-fundamental-limits-of-math-20201210/.
a 7918-state Turing machine The number of states needed has since been reduced. As of this writing it is
748. See the wonderful bachelor’s degree thesis by J Riebel at https://www.ingo-blechschmidt.eu/assets/bachelor-thesis-undecidability-bb748.pdf.
the standard axioms for Mathematics This is ZFC, the Zermelo–Fraenkel axioms with the Axiom of Choice.
(In addition, they also took the hypothesis of the Stationary Ramsey Property.)
take the floor Let the 𝑛-th triangle number be t(n) = 0 + 1 + · · · + n = n(n + 1)/2. The function t is
monotonically increasing and there are infinitely many triangle numbers. Thus for every natural number c
there is a unique triangle number t(n) that is maximal so that c = t(n) + k for some k ∈ N. Because
t(n + 1) = t(n) + n + 1, we see that k < n + 1, that is, k ≤ n. Thus, to compute the diagonal number d
from the Cantor number c of a pair, we have (1/2)d(d + 1) ≤ c < (1/2)(d + 1)(d + 2). Applying the
quadratic formula to the left and right halves gives (1/2)(−3 + √(1 + 8c)) < d ≤ (1/2)(−1 + √(1 + 8c)).
Taking (1/2)(−1 + √(1 + 8c)) to be α gives that d ∈ (α − 1 .. α], so that d = ⌊α⌋. (SE author Brian M.
Scott 2020)
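As a sketch in Python (the function names here are invented for illustration), the floor formula above inverts the pairing:

```python
import math

def cantor_pair(a, b):
    """Cantor pairing, one common convention: number the pairs along
    diagonals, so (a, b) maps to t(a + b) + b where t is triangular."""
    d = a + b
    return d * (d + 1) // 2 + b

def cantor_unpair(c):
    """Invert the pairing using the floor formula from the note."""
    # alpha = (1/2)(-1 + sqrt(1 + 8c)); the diagonal number is floor(alpha)
    alpha = (-1 + math.sqrt(1 + 8 * c)) / 2
    d = math.floor(alpha)
    k = c - d * (d + 1) // 2   # offset within the diagonal, 0 <= k <= d
    return d - k, k
```

For example, cantor_unpair(0) recovers (0, 0), and pairing then unpairing round-trips for every natural number.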
we can extend to tuples of any size See https://en.wikipedia.org/wiki/You_aren%27t_gonna_need_it.
Languages
having elephants move to the left side of a road or to the right Less fancifully, we could be making a Turing
machine out of LEGO bricks and want to keep track by sliding a block from one side of a column to the other.
Or, we could use an abacus.
we could translate any such procedure While a person may quite sensibly worry that elephants could be
not just on the left side or the right, but in any of the continuum of points in between, we will make
this assertion without more philosophical analysis than by just referring to the discrete nature of our
mechanisms (as Turing basically did). That is, we take it as an axiom.
finite set { 1000001, 1100001 } Although it looks like two strings plucked from the air, the language is not
without sense. The bitstring 1000001 represents capital A in the ASCII encoding, while 1100001 is lower
case a. The American Standard Code for Information Interchange, ASCII, is a widely used, albeit quite
old, way of encoding character information in computers. The most common modern character encoding
is UTF-8, which extends ASCII. For the history see https://www.cl.cam.ac.uk/~mgk25/ucs/utf-8-history.txt.
palindrome Sometimes people call Psychology the study of college freshmen because so many studies
start, roughly, “we put a bunch of college freshmen in a room, lied to them about what we were doing,
and . . . ” In the same way, Theory of Computing can sometimes seem like the study of palindromes.
palindromes in English Some people like to move beyond single word palindromes to make sentence-length
palindromes that make some sense. Some of the more famous are: (1) supposedly the first sentence
ever uttered, “Madam, I’m Adam” (2) Napoleon’s lament, “Able was I ere I saw Elba” and (3) “A man, a
plan, a canal: Panama”, about Theodore Roosevelt. See also http://norvig.com/palindrome.html.
defining Σ∗ to be the set of strings of characters from that alphabet That is, we won’t be careful to distinguish
between the symbols of the alphabet and the single-character strings consisting of just those characters.
In practice usually a language is governed by rules Linguists started formalizing the description of language,
including phrase structure, at the start of the 1900’s. Meanwhile, string rewriting rules as formal,
abstract systems were introduced and studied by mathematicians including Axel Thue in 1914, Emil
Post from the 1920’s through the 1940’s and Turing in 1936. Noam Chomsky, while teaching linguistics
to students of information theory at MIT, combined linguistics and mathematics by taking Thue’s
formalism as the basis for the description of the syntax of natural language. (Wikipedia contributors
2017e)
“the red big barn” sounds wrong. Experts vary on the exact rules but one source gives the correct order
as (article) + number + judgment/attitude + size, length, height + age + color + origin + material
+ purpose + (noun), so that “big red barn” is size + color + noun, as is “little green men.” This is
called the Royal Order of Adjectives; see http://english.stackexchange.com/a/1159. A person
may object by citing “big bad wolf ” but it turns out there is another, stronger, rule that if there are
three words then they have to go I-A-O and if there are two words then the order has to be I followed
by either A or O. Thus we have tick tock but not tock tick. Similarly for tic-tac-toe, mishmash, King
Kong, or dilly dally.
very strict rules Everyone who has programmed has had a compiler chide them about a syntax violation.
grammars are the language of languages. From Matt Might, http://matt.might.net/articles/grammars-bnf-ebnf/.
this grammar Taken from https://en.wikipedia.org/wiki/Formal_grammar.
dangling else See https://en.wikipedia.org/wiki/Dangling_else.
postal addresses. Adapted from https://en.wikipedia.org/wiki/BackusNaur_Form.
Recall Turing’s prototype computer The fact that in this book we stick to grammars where each rule head
is a single nonterminal greatly restricts the languages that we can compute. More general grammars
can compute more, including every set that can be decided by a Turing machine.
we often state problems For instance, see the blogfeed for Theoretical Computer Science http://cstheory-feed.org/ (Various authors 2017)
represent graphs Example 3.2 makes the point that a graph is about the connections between vertices, not
about how it is drawn. This graph representation via a matrix also illustrates that point because it is,
after all, not drawn.
the most common way to express grammars One factor influencing its adoption was a letter that D Knuth
wrote to the Communications of the ACM (Knuth 1964). He listed some advantages over the grammar-
specification methods that were then widely used. Most importantly, he contrasted BNF’s more
descriptive elements such as using ‘<addition operator>’ instead of ‘A’, saying that the difference is a
great addition to “the explanatory power of a syntax.” He also proposed the name ‘Backus Naur Form’.
(Now a hyphen is most common.)
some extensions for grouping and replication The best current standard is https://www.w3.org/TR/xml/.
Time is a difficult engineering problem One complication of time, among many, is leap seconds. The Earth is
constantly undergoing deceleration caused by the braking action of the tides. The average deceleration
of the Earth is roughly 1.4 milliseconds per day per century, although the exact number varies from year
to year depending on many factors, including major earthquakes and volcanic eruptions. To ensure
that atomic clocks and the Earth’s rotational time do not differ by more than 0.9 seconds, occasionally
an extra second is added to civil time. This leap second can be either positive or negative depending on
the Earth’s rotation — on occasion there are minutes with only 58 seconds, and on occasion minutes
with 60.
Adding to the confusion is that the changes in rotation are uneven and we cannot predict leap seconds
far into the future. The International Earth Rotation Service publishes bulletins that announce leap
seconds with a few weeks warning. Thus, there is no way to determine how many seconds there will
be between the current instant and, say, ten years from now. (This can cause trouble in area such as
navigation and high-frequency trading and there are proposals to eliminate leap seconds or replace
them with leap hours.) Since the first leap second in 1972, all leap seconds have been positive and
there were 23 leap seconds in the 34 years to January 2006. (U.S. Naval Observatory 2017)
RFC 3339 (Klyne and Newman 2002)
strings such as 1958-10-12T23:20:50.52Z This format has a number of advantages including human
readability, that if you sort a collection of these strings then earlier times will come earlier, simplicity
(there is only one format), and that they include the time zone information.
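The sortability claim is easy to demonstrate; here are a few sample timestamps (invented, all in the UTC “Z” form, where lexicographic order agrees with chronological order):

```python
# sample RFC 3339 timestamps, all in UTC (Z) form
stamps = ["1996-12-19T16:39:57Z",
          "2024-10-12T08:30:00Z",
          "1958-10-12T23:20:50.52Z"]
chronological = sorted(stamps)  # plain string sort; no date parsing needed
```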
a BNF grammar Some notes: (1) Coordinated Universal Time, the basis for civil time, is often called UTC,
but is sometimes abbreviated Z and read aloud as “Zulu,” (2) years are four digits to prevent the Y2K
problem (Encyclopædia Britannica Editors 2017), (3) the only month numbers allowed are 01–12 and
in each month only some day numbers are allowed, and (4) the only time hours allowed are 00–23,
minutes must be in the range 00–59, etc. (Klyne and Newman 2002)
Automata
what can be done by a machine having a number of possible configurations that is bounded From Rabin,
Scott, Finite Automata and Their Decision Problems, 1959: Turing machines are widely considered to be
the abstract prototype of digital computers; workers in the field, however, have felt more and more that the
notion of a Turing machine is too general to serve as an accurate model of actual computers. It is well
known that even for simple calculations it is impossible to give an a priori upper bound on the amount of
tape a Turing machine will need for any given computation. It is precisely this feature that renders Turing’s
concept unrealistic. In the last few years the idea of a finite automaton has appeared in the literature. These
are machines having only a finite number of internal states that can be used for memory and computation.
The restriction on finiteness appears to give a better approximation to the idea of a physical machine. Of
course, such machines cannot do as much as Turing machines, but the advantage of being able to compute
an arbitrary general recursive function is questionable, since very few of these functions come up in practical
applications.
transition function Δ : 𝑄 × Σ → 𝑄 Some authors allow the transition function to be partial. That is, some
authors allow that for some state-symbol pairs there is no next state. This choice by an author is a
matter of convenience, as for any such machine you can create an error state 𝑞 error or dead state, that is
not an accepting state and that transitions only to itself, and send all such pairs there. This transition
function is total, and the new machine has the same collection of accepted strings as the old.
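A sketch of that construction, with a hypothetical representation of the transition function as a Python dict:

```python
def totalize(delta, states, alphabet, error="q_error"):
    """Complete a partial transition table: send every missing
    (state, symbol) pair to a non-accepting dead state that
    transitions only to itself.  The accepted language is unchanged."""
    total = dict(delta)
    for q in list(states) + [error]:
        for a in alphabet:
            total.setdefault((q, a), error)  # the dead state loops forever
    return total

# a partial machine over {0,1}: no move is defined from "q1" on 0
partial = {("q0", "0"): "q0", ("q0", "1"): "q1", ("q1", "1"): "q0"}
full = totalize(partial, ["q0", "q1"], ["0", "1"])
```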
Unicode While in the early days of computers characters could be encoded with standards such as ASCII,
which includes only upper and lower case unaccented letters, digits, a few punctuation marks, and a
few control characters, today’s global interconnected world needs more. The Unicode standard assigns
a unique number called a code point to every character in every language (to a fair approximation).
See (Wikipedia contributors 2017l).
if a language is finite then there is a Finite State machine that accepts a string if and only if it is a member of
that language In practice the suggestion that for any finite set of strings there is a Finite State machine
that accepts it, simply by listing all of the cases, may not be reasonable. For example, there are finitely
many people and each has finitely many active phone numbers so the set of all currently-active phone
numbers is a finite language. But constructing a machine for it would be silly. In addition, a finite
language doesn’t have to be large for it to be difficult, in a sense. Take Goldbach’s conjecture, that
every even number greater than 2 is the sum of two primes, as in 4 = 2 + 2, 6 = 3 + 3, 8 = 3 + 5, . . .
Computer testing shows that this pattern continues to hold up to very large numbers but no one knows
if it is true for all evens. Now consider the set consisting of the string 𝜎 ∈ { 0, ... 9 }∗ representing in
decimal the smallest even number that is not the sum of two primes. This set is finite since it has either
one member or none. But while that set is tiny, we don’t know what it contains.
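The computer testing that the note mentions is straightforward to sketch:

```python
def is_prime(m):
    if m < 2:
        return False
    return all(m % d != 0 for d in range(2, int(m ** 0.5) + 1))

def goldbach_witness(n):
    """Return primes (p, q) with p + q = n, or None if none exist."""
    for p in range(2, n // 2 + 1):
        if is_prime(p) and is_prime(n - p):
            return p, n - p
    return None

# every even number tested so far has a witness; nobody knows if all do
checked = all(goldbach_witness(n) for n in range(4, 1000, 2))
```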
simple devices The devices to do the switching were invented in 1889 by an undertaker whose competitor’s
wife was the local telephone operator and routed calls to her husband’s business. (Wikipedia contributors
2017b)
allowed users to directly dial long distance in North America See the description of the North America
Numbering Plan (Wikipedia contributors 2017g).
same-area local exchange Initially, large states, those divided into multiple numbering plan areas, were
assigned area codes with a 1 in the second position. Areas that covered entire states or provinces got
codes with 0 as the middle digit. That was abandoned by the early 1950’s. (Wikipedia contributors
2017g).
Alcuin of York (735–804) See https://www.bbc.co.uk/programmes/m000dqy8.
a wolf, a goat, and a bundle of cabbages This translation is from A Raymond, from the University of
Washington.
that of finding the shortest circuit visiting every city in a list See https://nbviewer.jupyter.org/url/norvig.com/ipython/TSP.ipynb.
US lower forty eight states See https://wiki.openstreetmap.org/wiki/TIGER.
no-state What is no-state, exactly? We can think that it is like what happens if you write a program with a
sequence of if-then statements and forget to include an else. Obviously the computer goes somewhere,
the instruction pointer points to some address, but what happens is not sensible in terms of the model
that you’ve stated.
As an alternate, the wonderful book (Hofstadter 1979) describes a place named Tumbolia, which is
where holes go when they are filled, and also where your lap goes when you stand up. Perhaps the
machines go there.
amb (...) This operator takes a list of possibilities and evaluates to an option, if one is available, that
allows the program as a whole to succeed. Here is a small example (from https://rosettacode.org/wiki/Amb):
first let (x (amb 1 2 3)) and (y (amb 5 4 3)), then require (amb (= (* x y) 8)). The result is that
x has the value 2, while y is 4. That is, (amb 1 2 3) chooses the future in which x has value 2, and
(amb 5 4 3) chooses 4, in order to ensure that (= (* x y) 8) succeeds.
These operators were described by John McCarthy in (McCarthy 1963). “Ambiguous functions are
not really functions. For each prescription of values to the arguments the ambiguous function has a
collection of possible values. An example of an ambiguous function is less (𝑛) defined for all positive
integer values of 𝑛 . Every non-negative integer less than 𝑛 is a possible value of less (𝑛) . First we
define a basic ambiguity operator amb (𝑥, 𝑦) whose possible values are 𝑥 and 𝑦 when both are defined:
otherwise, whichever is defined. Now we can define less (𝑛) by less (𝑛) = amb (𝑛 − 1, less (𝑛 − 1)) .”
demon The term ‘demon’ arose from Maxwell’s demon. This is a thought experiment created in 1867 by
the physicist J C Maxwell about the second law of thermodynamics, which says that it takes energy to
raise the temperature of a sealed system. Maxwell imagined a chamber of gas with a door controlled by
an all-knowing demon. When the demon sees a slow-moving gas molecule approaching, it opens the door
and lets that molecule out of the chamber, thereby raising the average temperature of the gas remaining
inside without any external heat. See (Wikipedia contributors 2019c).
Pronounced KLAY-nee His son, Ken Kleene, wrote, “As far as I am aware this pronunciation is incorrect in
all known languages. I believe that this novel pronunciation was invented by my father.” (Free Online
Dictionary of Computing (Denis Howe) 2017)
mathematical model of neurons (Wikipedia contributors 2017c)
have a vowel in the middle Most speakers of American English cite the vowels as ‘a’, ‘e’, ‘i’, ‘o’, and ‘u’. See
(Bigham 2014).
before and after diagrams This is derived from (Hopcroft, Motwani, and Ullman 2001).
The fact that we can describe these languages in so many different ways (Stack Exchange author David
Richerby 2018).
performing that operation on its members always yields another member Familiar examples are that adding
two integers always gives an integer so the integers are closed under the operation of addition, and
that squaring an integer always results in an integer so that the integers are closed under squaring.
the machine accepts at least one string of length 𝑘 , where 𝑛 ≤ 𝑘 < 2𝑛 This gives an algorithm that inputs a
Finite State machine and determines, in a finite time, if it recognizes an infinite language.
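A sketch of that algorithm (with a hypothetical dict-based machine representation): for each length k, compute the set of states reachable by some string of length k, and look for an accepting state at some k with n ≤ k < 2n.

```python
def accepts_infinitely_many(delta, start, accepting, alphabet, n):
    """A machine with n states recognizes an infinite language iff
    it accepts some string of length k where n <= k < 2n."""
    reachable = {start}                  # states reachable by the length-0 string
    for k in range(1, 2 * n):
        reachable = {delta[(q, a)] for q in reachable for a in alphabet}
        if k >= n and reachable & set(accepting):
            return True
    return False

# a two-state machine over {0,1} accepting the strings that end in 1
delta = {("e", "0"): "e", ("e", "1"): "o", ("o", "0"): "e", ("o", "1"): "o"}
```

This runs in time polynomial in the size of the machine, unlike trying all strings one at a time.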
reserve a character We use it only to mark the stack bottom, never in the middle of the stack.
We are ready for the definition There are a variety of definitions for Pushdown machines. For instance,
here we have the machine accepts if its tape is empty and it is in an accepting state, but a variant
requires that its stack be empty. However, all of these variants that extend nondeterministic machines
accept the same set of languages.
Δ : 𝑄 × (Σ ∪ { B, 𝜀 }) × (Γ ∪ { ⊥ }) → P (𝑄 × (Γ ∪ { ⊥ })∗ ) Stack outputs consist of sequences of elements
of Γ that optionally end in a ⊥. So a more precise codomain is P (𝑄 × 𝑆) for 𝑆 = Γ ∗ ∪ (Γ ∗ ⌢ { ⊥ }) .
without proof An excellent source for more is (Hopcroft, Motwani, and Ullman 2001).
including C, Java, Python, and Racket This is a good approximation but the full story is more complicated.
Usually the set of programs accepted by the parser is a subset of a context free language, conditioned
on some additional rules that the parser enforces. For example, in C every variable must appear in a
declaration inside an enclosing scope, which is clearly a context-sensitive constraint. Another example
is that in Python all the whitespace prefixes inside a block have to be the same length, which again is a
context-sensitive constraint.
\d We shall ignore cases of non-ASCII digits, that is, cases outside 0–9. Unicode includes many different
sets of graphemes for the decimal digits, along with non-decimal numerals such as Roman numerals.
There are also a number of typographical variations of the ASCII numerals provided for specialized
mathematical use and for compatibility with earlier character sets, such as circled digits sometimes
used for itemization.
ZIP codes ZIP stands for Zone Improvement Plan. The system has been in place since 1963 so it, like the
music movement called ‘New Wave’, is an example of the danger of naming your project something that
will become obsolete if that project succeeds.
a colon and two forward slashes The inventor of the World Wide Web, T Berners-Lee, has admitted that
the two slashes don’t have a purpose (Firth 2009).
more power than the theoretical regular expressions that we studied earlier Omitting this power, and keeping
the implementation in sync with the theory, has the advantage of speed. See (Cox 2007).
It is described by the regex It is credited to the Perl hacker Abigail, from http://abigail.be/.
valid email addresses This expression follows the RFC 822 standard. The full listing is at
http://www.ex-parrot.com/pdw/Mail-RFC822-Address.html. It is due to Paul Warren who did not write it by hand
but instead used a Perl program to concatenate a simpler set of regular expressions that relate directly
to the grammar defined in the RFC. To use the regular expression, should you be so reckless, you would
need to remove the formatting newlines.
J Zawinski The post is from alt.religion.emacs on 1997-Aug-12. For some reason it keeps disappearing
from the online archive. The full discussion reveals that the quote, taken alone, is more dogmatic than
the complete assertion. One response to the quote is, “Some people, when confronted with a problem,
think ‘I know, I’ll quote Jamie Zawinski.’ Now they have two problems.” (Martin Liebach, 2009-Mar-04,
https://m.lieba.ch/2009/03/04/regex-humor/)
Now they have two problems. A classic example is trying to use regular expressions to parse an HTML
document. Sometimes scraping a fixed document to get some needed data by using regexes is just
what you need, quick and not too hard. But to parse significant parts of an HTML document, or to try
to anticipate possible changes, just leads to horrors. See (Stack Exchange author bobnice 2009).
regex golf See https://alf.nu/RegexGolf, and https://nbviewer.jupyter.org/url/norvig.com/ipython/xkcd1313.ipynb.
John Myhill Sr 1923–1987 and Anil Nerode b 1932 Photo credits Paul Halmos, Jason Koski/Cornell
University
the two machines are essentially the same The two machines are said to be ‘isomorphic’.
Hopcroft’s algorithm See (Knuutila 2001)
Complexity
mirrors the subject’s history This is like the slogan “ontogeny recapitulates phylogeny” for the now-
discredited biological theory that the development of an embryo, which is called ontogeny, goes through
the same stages as the adult stages in the evolution of the animal’s ancestors, which is phylogeny.
A natural next step is to look to do jobs efficiently S Aaronson states it more provocatively as, “[A]s
computers became widely available starting in the 1960s, computer scientists increasingly came to see
computability theory as not asking quite the right questions. For, almost all the problems we actually
want to solve turn out to be computable in Turing’s sense; the real question is which problems are
efficiently or feasibly computable.” (Aaronson 2011b)
A Karatsuba See https://en.wikipedia.org/wiki/Anatoly_Karatsuba.
clever algorithm The idea is: let k = ⌈n/2⌉ and write x = x_1·2^k + x_0 and y = y_1·2^k + y_0 (so for instance,
678 = 21 · 2^5 + 6 and 42 = 1 · 2^5 + 10). Then xy = A · 2^{2k} + B · 2^k + C where A = x_1·y_1, and
B = x_1·y_0 + x_0·y_1, and C = x_0·y_0 (for example, 28 476 = 21 · 2^10 + 216 · 2^5 + 60). The multiplications
by 2^{2k} and 2^k are just bit-shifts to known locations independent of the values of x and y, so they don’t
affect the time much. But the two multiplications for B seem to remove all the advantage and still give
n^2 time. However, Karatsuba noted that B = (x_0 + x_1) · (y_0 + y_1) − A − C. Boom: done. Just one
multiplication.
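The idea translates directly to code; here is a sketch (the base-case cutoff of 16 is an arbitrary choice for illustration):

```python
def karatsuba(x, y):
    """Multiply using three recursive multiplications instead of four:
    xy = A*2^(2k) + B*2^k + C, with B = (x0 + x1)(y0 + y1) - A - C."""
    if x < 16 or y < 16:                    # small cases: multiply directly
        return x * y
    k = max(x.bit_length(), y.bit_length()) // 2
    x1, x0 = x >> k, x & ((1 << k) - 1)     # x = x1*2^k + x0
    y1, y0 = y >> k, y & ((1 << k) - 1)     # y = y1*2^k + y0
    A = karatsuba(x1, y1)
    C = karatsuba(x0, y0)
    B = karatsuba(x0 + x1, y0 + y1) - A - C  # the one extra multiplication
    return (A << 2 * k) + (B << k) + C       # shifts are cheap
```

For instance, karatsuba(678, 42) reproduces the 28 476 worked out in the note.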
The ‘ 𝑓 = O (𝑔) ’ notation is very common, but awkward See also https://whystartat.xyz/wiki/Big_O_
notation.
our conclusions in the continuous context carry over to the discrete It does not cover some functions that we
may use such as the factorial, or those that are only defined for inputs larger than some value 𝑁 , but
this version is easier to understand and makes the same point.
are most common in practice Sometimes in practice intermediate powers are notable. For instance, at this
moment the complexity of matrix multiplication is O(n^2.373), approximately. But most often we work
with natural number exponents.
next table shows why This table is adapted from (Garey and Johnson 1979).
there are 3.16 × 10^7 seconds in a year The easy way to remember this is the bumper sticker slogan by
Tom Duff from Bell Labs: “𝜋 seconds is a nanocentury.”
very, very much larger than polynomial growth According to an old tale from India, the Grand Vizier Sissa
Ben Dahir invented chess. For it, the delighted Indian King granted him a wish. Sissa said, “Majesty,
give me a grain of wheat to place on the first square of the board, and two grains of wheat to place on
the second square, and four grains of wheat to place on the third, and eight grains of wheat to place on
the fourth, and so on. Oh, King, let me cover each of the 64 squares of the board.”
“And is that all you wish, Sissa, you fool?” exclaimed the astonished King.
“Oh, Sire,” Sissa replied, “I have asked for more wheat than you have in your entire kingdom. Nay, for
more wheat than there is in the whole world, truly, for enough to cover the whole surface of the earth
to the depth of the twentieth part of a cubit.”
Sissa has the right idea but his arithmetic is slightly off. A cubit is the length of a forearm, from the tip
of the middle finger to the bottom of the elbow, so perhaps twenty inches. The geometric series formula
gives 1 + 2 + 4 + · · · + 2^63 = 2^64 − 1 = 18 446 744 073 709 551 615 ≈ 1.84 × 10^19 grains of wheat. The
surface area of the earth, including oceans, is 510 072 000 square kilometers. There are 10^10 square
centimeters in each square kilometer so the surface of the earth is 5.10 × 10^18 square centimeters.
That’s between three and four grains of wheat on every square centimeter of the earth. Not wheat an inch
thick, but still a lot.
Another way to get a sense of the amount of wheat is: there are about 7.5 billion people on earth so it is
on the order of 10^9 grains of wheat for each person in the world. There are about 1 000 000 = 10^6 grains
of wheat in a bushel. In sum, thousands of bushels for each person.
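The arithmetic can be checked directly (the grains-per-bushel figure is the rough estimate used in the note):

```python
grains = 2**64 - 1                      # 1 + 2 + 4 + ... + 2^63
earth_km2 = 510_072_000                 # surface area, including oceans
earth_cm2 = earth_km2 * 10**10          # 10^10 cm^2 per km^2
per_cm2 = grains / earth_cm2            # grains on each square centimeter
per_person = grains / 7.5e9             # grains per person on earth
bushels_each = per_person / 1_000_000   # roughly 10^6 grains per bushel
```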
Cobham’s thesis Credit for this goes to both A Cobham and J Edmonds, separately; see (Cobham 1965)
and (Edmunds 1965).
Jack Edmonds, b 1934 Alan Cobham, 1927–2011
Cobham’s paper starts by asking “is it harder to multiply than to add?”, a question that we still
cannot answer. Clearly we can add two 𝑛 -bit numbers in O (𝑛) time, but we don’t know whether we
can multiply in linear time.
Cobham then goes on to point out the distinction between the complexity of a problem and the running
time of a particular algorithm to solve that problem, and notes that many familiar functions, such
as addition, multiplication, division, and square roots, can all be computed in time “bounded by a
polynomial in the lengths of the numbers involved.” He suggests we consider the class of all functions
having this property.
As for Edmonds, in a “Digression” he writes: “An explanation is due on the use of the words ‘efficient
algorithm.’ According to the dictionary, ‘efficient’ means ‘adequate in operation or performance.’ This
is roughly the meaning I want — in the sense that it is conceivable for [this problem] to have no
efficient algorithm. . . . There is an obvious finite algorithm, but that algorithm increases in difficulty
exponentially with the size of the graph. It is by no means obvious whether or not there exists an
algorithm whose difficulty increases only algebraically with the size of the graph . . . If only to motivate
the search for good, practical algorithms, it is important to realize that it is mathematically sensible
even to question their existence.”
(It is worth noting that Cobham and Edmonds were not the first to talk about polynomial and other time
behaviors. For instance, in 1910 H C Pocklington discussed it while exploring the behavior of algorithms
for solving quadratic congruences. But Cobham and Edmonds were the ones who started the current
interest.)
tractable Another word that you can see in this context is ‘feasible’. Some authors use them to mean the
same thing, roughly that we can solve reasonably-sized problem instances using reasonable resources.
But some authors use ‘feasible’ to have a different connotation, for instance explicitly disallowing inputs
are too large, such as having too many bits to fit in the physical universe. The word ‘tractable’ is more
standard and works better with the definition that includes the limit as the input size goes to infinity,
so here we stick with it.
slower than the right by four calculations We won’t consider whether the compiler optimizes it out of the
loop.
if the algorithm is O(n^2) on the RAM then on the Turing machine it can be O(n^5) A more extreme example
of a model-based difference is that addition of two n × n matrices on a RAM model takes time that is
O(n^2), but on an unboundedly parallel machine model it takes constant time, O(1).
the most common model is a Turing machine This observation is from Avi Wigderson’s Turing Award lecture,
https://www.youtube.com/watch?v=f2NiGO8zC1c.
Its definition ignores constant factors This discussion originated as (Stack Exchange author babou and
various others 2015).
could that not make the second algorithm more useful in practice? A great writeup of the details of an
algorithm for small values is the description of the sorting algorithm used by Python in (Peters 2023).
the order of magnitude of these constants For a rough idea of what these may be, here are some numbers
that every programmer should know.
Courtesy xkcd.com
no circuit is possible Consider a land mass. For each bridge in there must be an associated bridge out,
so a necessary condition is that every land mass have an even number of associated edges. But that is
not true for this city.
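That necessary condition is easy to check mechanically; here is a sketch, with the seven Königsberg bridges as an edge list (vertex names A–D invented here for the four land masses):

```python
from collections import Counter

def all_even_degree(edges):
    """An Euler circuit requires every vertex to have even degree:
    each bridge in needs a matching bridge out."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return all(d % 2 == 0 for d in degree.values())

# the seven bridges of Königsberg: banks A and B, islands C and D
bridges = [("A", "C"), ("A", "C"), ("A", "D"),
           ("B", "C"), ("B", "C"), ("B", "D"), ("C", "D")]
```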
the countries must be contiguous A notable example of a non-contiguous country in the world today is that
Russia is separated from Kaliningrad, the city that used to be known as Königsberg.
we can draw it in the plane This is because the graph comes from a planar map.
start with a planar graph The graph is undirected and without loops.
Counties of England and the derived planar graph This is today’s map. At the time, some counties were not
contiguous.
it was controversial
See https://www.maa.org/sites/default/files/pdf/upload_library/22/Ford/Swart697-707.pdf.
Given a graph and a number 𝑘 ∈ N In the name of the problem we often omit the 𝑘 .
Conjunctive Normal form Any Boolean function can be expressed in that form; see the Appendix.
The table above gives the numbers for the 2020 election Here are the abbreviations for states and the
District of Columbia: Alabama AL, Alaska AK, Arizona AZ, Arkansas AR, California CA, Colorado CO,
Connecticut CT, Delaware DE, District of Columbia DC, Florida FL, Georgia GA, Hawaii HI, Idaho ID,
Illinois IL, Indiana IN, Iowa IA, Kansas KS, Kentucky KY, Louisiana LA, Maine ME, Maryland MD,
Massachusetts MA, Michigan MI, Minnesota MN, Mississippi MS, Missouri MO, Montana MT, Ne-
braska NE, Nevada NV, New Hampshire NH, New Jersey NJ, New Mexico NM, New York NY, North
Carolina NC, North Dakota ND, Ohio OH, Oklahoma OK, Oregon OR, Pennsylvania PA, Rhode Island RI,
South Carolina SC, South Dakota SD, Tennessee TN, Texas TX, Utah UT, Vermont VT,
Virginia VA, Washington WA, Wisconsin WI, Wyoming WY
ignore some fine points Maine and Nebraska award an elector to the winner of each congressional district,
with only the remaining two electors going to the statewide winner, rather than having all of their
electors vote the same way.
words can be packed into the grid The earliest known example is the Sator square, five Latin words that
pack into a grid.
S A T O R
A R E P O
T E N E T
O P E R A
R O T A S
It appears in many places in the Roman Empire, often as graffiti. For instance, it was found in the ruins
of Pompeii. Like many word game solutions it sacrifices comprehension for form but it is a perfectly
grammatical sentence that translates as something like, “The farmer Arepo works the wheel with effort.”
popular with 𝑛 = 4 as a toy It was invented by Noyes Palmer Chapman, a postmaster in Canastota, New
York. As early as 1874 he showed friends a precursor puzzle. By December 1879 copies of the improved
puzzle were circulating in the northeast, and students in the American School for the Deaf and others
started manufacturing it. It became popular as the “Gem Puzzle.” Noyes Chapman applied for
a patent in February 1880. By that time the game had become a craze in the US, somewhat like Rubik’s
Cube a century later. It was also popular in Canada and Europe. See (Wikipedia contributors 2017a).
We know of no efficient algorithm to find divisors An effort in 2009 to factor a 768-bit (232-digit) number
used hundreds of machines and took two years. The researchers estimated that a 1024-bit number
would take about a thousand times as long.
Factoring seems to be hard Finding factors has for many years been thought hard. For instance, a number
is called a Mersenne prime if it is a prime number of the form 2^𝑛 − 1. They are named after M Mersenne,
a French friar and important figure in the early sharing of scientific results, who studied them in the
early 1600’s. He observed that if 𝑛 is prime then 2^𝑛 − 1 may be prime, for instance with 𝑛 = 3, 𝑛 = 7,
𝑛 = 31, and 𝑛 = 127. He suspected that others of that form were also prime, in particular 𝑛 = 67.
On 1903-Oct-31 F N Cole, then Secretary of the American Mathematical Society, made a presentation
at a math meeting. When introduced, he went to the chalkboard and in complete silence computed
2^67 − 1 = 147 573 952 589 676 412 927. He then moved to the other side of the board, wrote
193 707 721 × 761 838 257 287, and worked through the calculation, finally finding equality. When
he was done Cole returned to his seat, having not uttered a word in the hour-long presentation. His
audience gave him a standing ovation.
Cole later said that finding the factors had been a significant effort, taking “three years of Sundays.”
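What took Cole three years of Sundays takes a computer an instant today; a quick sketch:

```python
# Verify Cole's 1903 factorization of the Mersenne number 2^67 - 1.
m67 = 2**67 - 1
assert m67 == 147_573_952_589_676_412_927
assert 193_707_721 * 761_838_257_287 == m67
print("2^67 - 1 is composite")
```

Of course the hard part, then and now, was finding the factors rather than checking them.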
Platonic solids See (Wikipedia contributors 2017k).
as shown Some PDF readers cannot do opacity, so you may not see the entire Hamiltonian path.
Six Degrees of Kevin Bacon One night, three college friends, Brian Turtle, Mike Ginelli, and Craig Fass,
were watching movies. Footloose was followed by Quicksilver, and between was a commercial for a
third Kevin Bacon movie. It seemed like Kevin Bacon was in everything! This prompted the question
of whether Bacon had ever worked with De Niro. The answer at that time was no, but De Niro was
in The Untouchables with Kevin Costner, who was in JFK with Bacon. The game was born. It became
popular when they wrote to Jon Stewart about it and appeared on his show. (From (Blanda 2013).)
See https://oracleofbacon.org/.
uniform family of tasks From (Jones 1997).
There is no widely-accepted formal definition of ‘algorithm’ This discussion derives from (Pseudonym 2014).
we prefer language decision problems Because of this, some authors modify the definition of a Turing
machine to have it come with a subset of accepting states. Such a machine solves a problem if it halts on
all input strings, and when it halts it is in an accepting state exactly when that string is in the language.
default interpretation of ‘problem’ Not every computational problem is naturally expressible as a language
decision problem Consider the task of sorting the characters of strings into ascending order. We could try
to express it as the language of sorted strings {𝜎 ∈ Σ∗ | 𝜎 is sorted }. But recognizing a correctly-sorted
string does not require that we find a good way to sort an unsorted input. Another thought is to
string does not require that we find a good way to sort an unsorted input. Another thought is to
consider the language of pairs ⟨𝜎, 𝑝⟩ where 𝑝 is a permutation of the numbers 0, ... |𝜎 | − 1 that brings
the string into ascending order. Here also the formulation seems to not capture the sorting problem, in
that recognizing a correct permutation feels different than generating one from scratch.
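The contrast is easy to see in a sketch: both deciders below run in linear time, yet neither one does any sorting.

```python
def is_sorted(sigma: str) -> bool:
    """Decide the language of sorted strings with one linear scan."""
    return all(a <= b for a, b in zip(sigma, sigma[1:]))

def certifies(sigma: str, p: list[int]) -> bool:
    """Decide the language of pairs: p is a permutation of the positions
    of sigma that brings the string into ascending order."""
    return (sorted(p) == list(range(len(sigma)))
            and is_sorted("".join(sigma[i] for i in p)))
```

For instance, `certifies("bca", [2, 0, 1])` holds, but checking it gives no hint of how to produce such a permutation from scratch.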
Both of these show the collection of languages One misleading aspect of this picture is that there are
uncountably many languages but only countably many Turing machines, and hence only countably many
computable or computably enumerable languages. So, shown to scale, the computably enumerable
area of the blob would be an infinitesimally small speck at the very bottom. But such a picture would
not show the features we want to illustrate, so these drawings take a graphical license.
the shaded collection Rec consists of the Turing computable languages The name Rec is because these used
to be known as the ‘recursive’ languages.
input two numbers and output their midpoint See https://hal.archives-ouvertes.fr/file/index/d
ocid/576641/filename/computing-midpoint.pdf.
final two bits are 00 Decimal representation is not much harder since a decimal number is divisible by
four if and only if the final two digits are in the set { 00, 04, ... 96 }.
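Either representation gives a constant-time check that looks only at the final two symbols; a small sketch:

```python
def div4_binary(bits: str) -> bool:
    # A binary numeral names a multiple of four exactly when it
    # ends in 00 (counting the numeral "0" itself).
    return bits == "0" or bits.endswith("00")

def div4_decimal(digits: str) -> bool:
    # In decimal, only the last two digits matter.
    return int(digits[-2:]) % 4 == 0
```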
everything of interest can be represented with reasonable efficiency by bitstrings See https://rjlipton
.wordpress.com/2010/11/07/what-is-a-complexity-class/. Of course, a wag may say that if it
cannot be represented by bitstrings then it isn’t of interest. But we mean something less tautological: we
mean that if we could want to compute with it then it can be put in bitstrings. For example, we find
that we can process speech, adjust colors on an image, or regulate pressure in a rocket fuel tank, all in
bitstrings, despite what may at first encounter seem to be the inherently analog nature of these things.
Beethoven’s 9th Symphony The official story is that CDs are 74 minutes long so that they can hold this
piece.
researchers often do not mention representations This is like a programmer saying, “My program inputs
a number” rather than, “My program inputs the binary representation of a number.” It is also like a
person saying, “That’s me on the card” rather than “That’s a picture of me.”
leaving implementation details to a programmer (Grossman 2010)
the time or space behavior We will concentrate our attention on resource bounds in the range from logarithmic
to exponential, because these are the most useful for understanding problems that arise in practice.
less than centuries See the video from Google at https://www.youtube.com/watch?v=-ZNEzzDcllU
and S Aaronson’s Quantum Supremacy FAQ at https://www.scottaaronson.com/blog/?p=4317.
The claim is the subject of scholarly reservations See the posting from IBM Research at https://www.
ibm.com/blogs/research/2019/10/on-quantum-supremacy/ and G Kalai’s Quantum Supremacy
Skepticism FAQ at https://gilkalai.wordpress.com/2019/11/13/gils-collegial-quantum-
supremacy-skepticism-faq/.
We give the class P our attention This discussion gained much from the material in (Allender, Loui, and
Regan 1997). This includes several direct quotations.
RE Recall that ‘recursively enumerable’ is an older term for ‘computably enumerable’.
adds some wrinkles But it avoids a wrinkle that we needed for Finite State machines and Pushdown
machines, 𝜀 transitions, since Turing machines are not required to consume their input one character at
a time.
function computed by a nondeterministic machine One thing that we can do is to define that the
nondeterministic machine computes 𝑓 : B∗ → B∗ if, on each input 𝜎 , all branches halt and all
leave the same value on the tape, which we call 𝑓 (𝜎) . Otherwise, the value is undefined, 𝑓 (𝜎)↑.
might be much faster R Hamming gives this example to demonstrate that an order of magnitude change
in speed can change the world, can change what can be done: we walk at 4 mph, a car goes at 40 mph,
and an airplane goes at 400 mph. This relates to the bug picture that opens this chapter.
we don’t find the 𝜔 ’s, we just use them This is like the Mechanical Turk https://en.wikipedia.org/wik
i/Mechanical_Turk in that the machine V does not need the smarts, it is the person, or the demon,
who provides that.
strategy for chess Chess is known to be a solvable game. This is Zermelo’s Theorem (Wikipedia contributors
2017m) — there is a strategy for one of the two players that forces a win or a draw, no matter how the
opponent plays.
a deterministic verifier must take exponential time In fact, in the terminology of a later section, chess is
known to be EXP complete. See (Fraenkel and Lichtenstein 1981).
in a sense, useless Being given an answer with no accompanying justification is a problem. This is
like the Feynman algorithm for doing Physics: “The student asks . . . what are Feynman’s methods?
[M] Gell-Mann leans coyly against the blackboard and says: Dick’s method is this. You write down the
problem. You think very hard. (He shuts his eyes and presses his knuckles parodically to his forehead.)
Then you write down the answer.” (Gleick 1992) It is also like the mathematician S Ramanujan, who
relayed that the advanced formulas that he produced came in dreams from the god Narasimha. Some of
these formulas were startling and amazing, but some of them were wrong. (Chakrabarty 2017) Another
such story has to do with G Hardy, about to board a ferry crossing rough seas from Denmark to Britain.
He sent a postcard to another mathematician stating that he had proved the Riemann hypothesis (this
is still one of the most famous unproven hypotheses in mathematics); the story goes that this was his
insurance, since God would not let the ship sink and leave him with undeserved credit. And of course
the most famous example of a failure to provide backing is Fermat writing in a book he was reading that
𝑥^𝑛 + 𝑦^𝑛 = 𝑧^𝑛 has no nontrivial solutions for 𝑛 > 2 and then saying, “I have discovered a truly marvelous
proof of this, which this margin is too narrow to contain.”
the verifier cannot even input them before its runtime bound expires Some authors instead define that the
verifier runs in time polynomial in its input, ⟨𝜎, 𝜔⟩ , and add the explicit restriction that |𝜔 | must be
polynomial in |𝜎 | .
How is that legal? This is reminiscent of Quantum Bogosort, a facetious sorting algorithm. Given an
unordered list of length 𝑛 , it uses a quantum source of randomness to generate a random permutation
of the 𝑛 items. It reorders the input according to that permutation. If the list is now sorted then good.
If not then the algorithm destroys the entire universe. Assuming the Many-Worlds Hypothesis, the
result is that in any surviving universe the list has been sorted in linear time.
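For comparison, the classical version, minus the universe-destroying step, is a one-loop sketch (with expected running time proportional to 𝑛 · 𝑛! , so only run it on tiny lists):

```python
import random

def bogosort(xs: list) -> list:
    """Shuffle until sorted. The quantum variant replaces the retry
    loop with destruction of the unsorted universes."""
    while any(a > b for a, b in zip(xs, xs[1:])):
        random.shuffle(xs)
    return xs
```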
Countdown For a brief description see https://en.wikipedia.org/wiki/Countdown_(game_show). A
version that you may find fun, worth searching for the videos, is https://en.wikipedia.org/wiki/
8_Out_of_10_Cats_Does_Countdown.
many-one reducible The name comes from the fact that still another reducibility is one-one reducibility,
where the function must be one-to-one.
that’s not true under Karp reduction The Halting problem set 𝐾 and its complement are not Karp reducible.
For, we already know that 𝐾 is computably enumerable. If 𝐾 c ≤𝑝 𝐾 then 𝑥 ∈ 𝐾 c implies that 𝑓 (𝑥) ∈ 𝐾 ,
and we can enumerate 𝑓 ( 0), 𝑓 ( 1), ... and check those against the values enumerated into 𝐾 , so we
would have that 𝐾 c is also computably enumerable. That would imply that 𝐾 is computable, which it is
not.
the Petersen graph The Petersen graph is a rich source of counterexamples for conjectures in Graph Theory.
Drummer problem This is often called the Marriage problem, where the men pick suitable women. But
perhaps it is time for a new paradigm.
Asymmetric Traveling Salesman (Jonker and Volgenent 1983)
Stephen Cook b 1939 and Leonid Levin b 1948 Photo credits University of Toronto, Boston University
NP complete The name is from a survey created by Knuth. See blog.computationalcomplexity.org/2
010/11/by-any-other-name-would-be-just-as-hard.html.
there are many such problems The “at least as hard” is true in the sense that such problems can answer
questions about any other problem in that class. However, note that it might be that one NP complete
problem runs in nondeterministic time that is O (𝑛) while another runs in O (𝑛^1 000 000 ) time. So this
sense is at odds with our earlier characterization of problems that are harder to solve.
The list below gives the NP complete problems most often used These are from the classic standard reference
(Garey and Johnson 1979).
a gadget See https://cs.stackexchange.com/a/1249/50343 from the Computer Science Stack
Exchange user Jeff E.
tied to whether P = NP or P ≠ NP Ladner’s theorem is that if P ≠ NP then there is a problem in NP − P
that is not NP complete.
A large class See (Karp 1972).
an ending point That is, as P Pudlák observes, we treat P ≠ NP as an informal axiom. (Pudlák 2013)
caricature Paul Erdős joked that a mathematician is a machine for turning coffee into theorems.
completely within the realm of possibility that 𝜙 (𝑛) grows that slowly Hartmanis observes (Hartmanis
2017) that it is interesting that Gödel, the person who destroyed Hilbert’s program of automating
mathematics, seemed to think that these problems quite possibly are solvable in linear or quadratic
time.
In 2018, a poll The poll was conducted by W Gasarch, a prominent researcher and blogger in Computational
Complexity. There were 124 respondents. For the description see https://www.cs.umd.edu/users
/gasarch/BLOGPAPERS/pollpaper3.pdf. Note the suggestions that both respondents and even the
surveyor took the enterprise in a light-hearted way.
88% thought that P ≠ NP Gasarch divided respondents into experts, the people who are known to have
seriously thought about the problem, and the masses. The experts were 99% for P ≠ NP.
S Aaronson has said See (Roberts 2021) for both the Aaronson and Williams estimates.
A Wigderson See (Wigderson 2009).
Cook is of the same mind See (S. Cook 2000).
Many observers For example, (Viola 2018).
O (𝑛^lg 7 ) method (lg 7 ≈ 2.81) Strassen’s algorithm is used in practice. The current record is O (𝑛^2.37 ) but
it is not practical. It is a galactic algorithm: while it runs faster than any other known algorithm
when the problem is sufficiently large, the first such problem is so big that we never use the
algorithm. For other examples see (Wikipedia contributors 2020b).
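A minimal sketch of Strassen's idea, assuming square matrices whose size is a power of two: seven recursive multiplications in place of the naive eight, which gives the O (𝑛^lg 7 ) bound.

```python
def strassen(A, B):
    """Multiply square matrices (lists of lists) whose size is a
    power of 2, using 7 recursive products instead of 8."""
    n = len(A)
    if n == 1:
        return [[A[0][0] * B[0][0]]]
    h = n // 2
    add = lambda X, Y: [[x + y for x, y in zip(r, s)] for r, s in zip(X, Y)]
    sub = lambda X, Y: [[x - y for x, y in zip(r, s)] for r, s in zip(X, Y)]
    quad = lambda M: ([r[:h] for r in M[:h]], [r[h:] for r in M[:h]],
                      [r[:h] for r in M[h:]], [r[h:] for r in M[h:]])
    a, b, c, d = quad(A)          # A = [[a, b], [c, d]] in quadrants
    e, f, g, i = quad(B)          # B = [[e, f], [g, i]] in quadrants
    m1 = strassen(add(a, d), add(e, i))
    m2 = strassen(add(c, d), e)
    m3 = strassen(a, sub(f, i))
    m4 = strassen(d, sub(g, e))
    m5 = strassen(add(a, b), i)
    m6 = strassen(sub(c, a), add(e, f))
    m7 = strassen(sub(b, d), add(g, i))
    c11 = add(sub(add(m1, m4), m5), m7)
    c12 = add(m3, m5)
    c21 = add(m2, m4)
    c22 = add(sub(add(m1, m3), m2), m6)
    return ([r1 + r2 for r1, r2 in zip(c11, c12)]
            + [r1 + r2 for r1, r2 in zip(c21, c22)])
```

In practice implementations switch to the ordinary cubic method once the subproblems get small, since the recursion's overhead dominates there.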
Matching problem The Drummer problem described earlier is a special case of this for bipartite graphs.
more things to try than atoms in the universe There are about 10^80 atoms in the universe. A graph
with 100 vertices has the potential for (100 choose 2) edges, which is about 100^2 . Trying every subset
of the edges would be 2^10 000 ≈ 10^(10 000/3.32) cases, which is much greater than 10^80 .
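A quick sanity check of the arithmetic in this note:

```python
# 2^10000 written in decimal has about 10000/3.32 digits, dwarfing
# the roughly 80-digit count of atoms in the universe.
digits = len(str(2**10_000))
assert digits == 3011       # 10000 * log10(2) rounds up to 3011 digits
assert digits > 80
```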
since the 1960’s we have an algorithm Due to J Edmonds.
Theory of Computing blog feed (Various authors 2017)
R J Lipton captured this feeling (Lipton 2009)
Knuth has a related but somewhat different take (Knuth 2014)
all this is speculation Arthur C Clarke’s celebrated First Law is, “When a distinguished but elderly scientist
states that something is possible, he is almost certainly right. When he states that something is
impossible, he is very probably wrong.” (Wikipedia contributors 2023)
exploits this difference Recent versions of the algorithm used in practice incorporate refinements that we
shall not discuss. The core idea is unchanged.
Their algorithm, called RSA Originally the authors were listed in the standard alphabetic order: Adleman,
Rivest, and Shamir. Adleman objected that he had not done enough work to be listed first and insisted
on being listed last. He said later, “I remember thinking that this is probably the least interesting paper
I will ever write.”
tremendous amount of interest and excitement In his 1977 column, Martin Gardner posed a $100 challenge,
to crack this message: 9686 9613 7546 2206 1477 1409 2225 4355 8829 0575 9991 1245 7431
9874 6951 2093 0816 2982 2514 5708 3569 3147 6622 8839 8962 8013 3919 9055 1829 9451
5781 5254. The ciphertext was generated by the MIT team from a plaintext (English) message using
𝑒 = 9007 and this number 𝑛 (which is too long to fit on one line).
114, 381, 625, 757, 888, 867, 669, 235, 779, 976, 146, 612, 010, 218, 296, 721, 242,
362, 562, 561, 842, 935, 706, 935, 245, 733, 897, 830, 597, 123, 563, 958, 705,
058, 989, 075, 147, 599, 290, 026, 879, 543, 541
In 1994, a team of about 600 volunteers announced that they had factored 𝑛 .
𝑝 =3, 490, 529, 510, 847, 650, 949, 147, 849, 619, 903, 898, 133, 417, 764,
638, 493, 387, 843, 990, 820, 577
and
𝑞 = 32, 769, 132, 993, 266, 709, 549, 961, 988, 190, 834, 461, 413, 177, 642, 967,
992, 942, 539, 798, 288, 533
That enabled them to decrypt the message: the magic words are squeamish ossifrage.
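The announced factors are easy to check, in contrast with the two years of effort it took to find them; a sketch with the digits as printed above:

```python
# Gardner's challenge modulus (RSA-129) and the factors found in 1994.
n = int("114381625757888867669235779976146612010218296721242"
        "362562561842935706935245733897830597123563958705"
        "058989075147599290026879543541")
p = 3490529510847650949147849619903898133417764638493387843990820577
q = 32769132993266709549961988190834461413177642967992942539798288533
assert p * q == n   # checking takes microseconds; finding took years
```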
based on the next result It is called Fermat’s Little Theorem in contrast with his celebrated Last Theorem,
the assertion that 𝑎^𝑛 + 𝑏^𝑛 = 𝑐^𝑛 has no solutions in positive integers for 𝑛 > 2.
computer searches suggest that these are very rare Among the numbers less than 2.5 × 10^10 there are only
21 853 ≈ 2.2 × 10^4 pseudoprimes base 2. That’s six orders of magnitude less.
a greater than 1 − ( 1/2)^𝑘 chance that 𝑛 is prime Here is the probability 1 − ( 1/2)^𝑘 for the first few
𝑘 ’s.
𝑘 Chance 𝑛 is prime
1 0.500 000 000
2 0.750 000 000
3 0.875 000 000
4 0.937 500 000
5 0.968 750 000
6 0.984 375 000
7 0.992 187 500
8 0.996 093 750
9 0.998 046 875
We get an extra decimal place of certainty about every 3 1/3 iterations because lg ( 10) ≈ 3.32. So if
you want, say, five decimal places, so that you have at least a probability of 0.999 99, then it is safe to
iterate 4 · 5 = 20 times.
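A sketch of one such repeated test, the Fermat test, with the iteration count as a parameter (Python's built-in three-argument `pow` does the modular exponentiation):

```python
import random

def probably_prime(n: int, k: int = 20) -> bool:
    """Fermat test: k rounds, each catching a composite n with
    probability at least 1/2 (Carmichael numbers aside)."""
    if n < 4:
        return n in (2, 3)
    for _ in range(k):
        a = random.randrange(2, n - 1)
        if pow(a, n - 1, n) != 1:
            return False        # a is a witness: n is composite
    return True                 # prime with chance > 1 - (1/2)^k
```

As the earlier note on pseudoprimes observes, a composite can slip through a single round: 341 = 11 · 31 satisfies `pow(2, 340, 341) == 1`, so base 2 alone does not expose it.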
any reasonable-sized 𝑘 Selecting an appropriate 𝑘 is an engineering choice between the cost of extra
iterations and the gain in confidence.
we are quite confident that it is prime We are confident, but not certain. There are numbers, called
Carmichael numbers, that are pseudoprime for every base 𝑎 relatively prime to 𝑛 . The smallest example
is 𝑛 = 561 = 3 · 11 · 17, and the next two are 1 105 and 1 729. Like pseudoprimes, these seem to be
very rare. Among the numbers less than 10^16 there are 279 238 341 033 922 primes, about 2.7 × 10^14 ,
but only 246 683 ≈ 2.4 × 10^5 Carmichael numbers.
the minimal pub crawl See (W. Cook et al. 2017).
An example is that the Free mathematics system Sage includes one See also https://www.youtube.com/
watch?v=q8nQTNvCrjE about the Concorde TSP solver.
the Sudoku problem is NP complete First proved in the MS thesis of Takayuki Yato, from the Department
of Information Science at the University of Tokyo in 2003. That document seems to have disappeared
from the web; for a place to start see the Sudoku Wikipedia page.
Appendices
empty string, denoted 𝜀 Possibly 𝜀 came as an abbreviation for ‘empty’. Some authors use 𝜆 , possibly
from the German word for ‘empty’, leer. Or it might be that someone used the symbol just
because one was needed; the story goes that when asked why he used the 𝜆 symbol for his 𝜆 calculus,
Church replied, “eenie, meenie, meinie, mo” (Stack Exchange author Jouni Sirén 2016); see also
https://www.youtube.com/watch?v=juXwu0Nqc3I
reversal 𝜎 R of a string The most practical current notion of a string, the Unicode standard, does not have
string reversal. All of the naive ways to reverse a string run into problems for arbitrary Unicode strings
which may contain non-ASCII characters, combining characters, ligatures, bidirectional text in multiple
languages, and so on. For example, merely reversing the chars (the Unicode scalar values) in a string
can cause combining marks to become attached to the wrong characters. Another example is: how
to reverse ab<backspace>ab? The Unicode Consortium has not gone through the effort to define the
reverse of a string because there is no real-world need for it. (From https://qntm.org/trick.)
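A two-line Python illustration of the combining-character problem (U+0301 is COMBINING ACUTE ACCENT):

```python
s = "cafe\u0301"   # 'café' written as 'e' followed by a combining accent
r = s[::-1]        # naive reversal of the code points
# The accent now precedes 'e' and combines with nothing sensible;
# rendered, it attaches to the wrong neighbor.
assert r == "\u0301efac"
```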
Credits
Prologue
I.1.12 SE user Shuzheng, https://cs.stackexchange.com/q/45589/50343
I.1.13 Question by SE user Arsalan MGR, https://cs.stackexchange.com/q/135343/50343
I.2.9 SE user Yuval Filmus, https://cs.stackexchange.com/a/135170/50343
I.2.13 http://www.ivanociardelli.altervista.org/wp-content/uploads/2016/09/Solutions-to-
exercises.pdf
I.4.30 SE user Ted, https://math.stackexchange.com/a/75300/12012
Background
II.2 Image credit: Robert Williams and the Hubble Deep Field Team (STScI) and NASA.
II. Image credit File:Galilee.jpg. (2018, September 27). Wikimedia Commons, the free media repository.
Retrieved 22:19, January 26, 2020 from https://commons.wikimedia.org/w/index.php?title=File:
Galilee.jpg&oldid=322065651.
II.3.18 User scherk at pbworks.com.
II.3.20 Math StackExchange user Robert Z https://math.stackexchange.com/a/1896328/12012
II.3.28 Michael J Neely
II.3.31 Answer from Stack Exchange member Alex Becker.
II.4.1 ENIAC Programmers, 1946 U. S. Army Photo from Army Research Labs Technical Library
II.4.6 Started on Stack Exchange
II.4.9 From a Stack Exchange question.
II.5.13 CS SE user Kyle Strand https://cs.stackexchange.com/q/11645/50343.
II.5.14 SE user npostavs, https://cs.stackexchange.com/a/44875/50343
II.5.35 SE user Raphael https://cs.stackexchange.com/a/44901/50343
II.6.10 Question by SE user MathematicalOrchid, https://cs.stackexchange.com/q/2811/67754, and
answer by SE user Andrej Bauer used in section. The answer here is not from Andrej Bauer.
II.6.31 SE user Rajesh R
II.8.14 https://mathoverflow.net/questions/33046/arent-oracle-machines-unsound-concepts,
(The question there as elaborated is different than this adaptation’s.)
II.8.16 SE user Karolis Juodelė
II.8.19 SE user Noah Schweber
II.8.20 http://people.cs.aau.dk/~srba/courses/tutorials-CC-10/t5-sol.pdf
II.9.10 (Rogers 1987), p 214.
II.9.12 (Rogers 1987), p 214.
II.9.17 (Rogers 1987), p 214.
II.A.1 https://www.ias.edu/ideas/2016/pires-hilbert-hotel
Languages
III.1.25 F Stephan, https://www.comp.nus.edu.sg/~fstephan/toc01slides.pdf
III.1.36 SE user babou
III.2.9 SE user Rick Decker
III.2.16 http://www.cs.utsa.edu/~wagner/CS3723/grammar/examples.html
III.2.19 (Hopcroft, Motwani, and Ullman 2001), exercise 5.1.2.
III.2.32 Wikipedia contributors, https://en.wikipedia.org/wiki/Buffalo_buffalo_Buffalo_buffal
o_buffalo_buffalo_Buffalo_buffalo, William J. Rapaport, https://cse.buffalo.edu/~rapaport/
BuffaloBuffalo/buffalobuffalo.html
III.2.36 http://www.cs.utsa.edu/~wagner/CS3723/grammar/examples.html
III.3.17 SE user DollarAkshay
III.3.24 T Zaremba, http://www.geom.uiuc.edu/~zarembe/graph3.html.
III.A.9 http://people.cs.ksu.edu/~schmidt/300s05/Lectures/GrammarNotes/bnf.html
Automata
IV.1.44 From Introduction to Languages by Martin, edition four, p 77.
IV.3.44 https://cs.stackexchange.com/a/30726
IV.4.7 https://cs.stackexchange.com/q/155353/50343
IV.4.27 SE user jmite, https://cs.stackexchange.com/a/67317/50343.
IV.4.29 (Rich 2008)
IV.4.30 (Rich 2008), https://math.stackexchange.com/a/1102627
IV.5.20 SE user David Richerby, https://cs.stackexchange.com/a/97885/67754
IV.5.24 (Rich 2008)
IV.5.30 SE author Yuval Filmus, https://cs.stackexchange.com/a/41445/50343
IV.5.31 SE user Brian M Scott, https://math.stackexchange.com/a/1508488
IV.C.15 https://www.eecs.wsu.edu/~cook/tcs/l10.html
Complexity
V. Some of the discussion is from https://softwareengineering.stackexchange.com/a/20833.
V. Discussion of the third point started as https://cs.stackexchange.com/questions/9957/justific
ation-for-neglecting-constants-in-big-o.
V. The fourth point derives from https://stackoverflow.com/a/19647659.
V. This discussion originated as (Stack Exchange author templatetypedef 2013).
V.1.57 Stack Exchange user Daniel Fischer, https://math.stackexchange.com/a/674039, and Stack
Exchange user anon, https://math.stackexchange.com/a/61741
V.1.63 Stack Exchange user Ilmari Karonen, https://math.stackexchange.com/questions/925053/us
ing-limits-to-determine-big-o-big-omega-and-big-theta
V.2.24 Sean McCulloch, https://npcomplete.owu.edu/2014/06/03/3-dimensional-matching/
V.2.54 Private communication from Puck Rombach.
V.2.68 Jan Verschelde, http://homepages.math.uic.edu/~jan/mcs401/partitioning.pdf
V.3.11 A.A. at https://rjlipton.wordpress.com/2010/11/07/what-is-a-complexity-class/#com
ment-8872
V.4.16 https://cs.stackexchange.com/q/57518
V.5.19 Paul Black, https://xlinux.nist.gov/dads/HTML/nondetermAlgo.html
V.6.28 SE user JesusIsLord at https://cstheory.stackexchange.com/a/47031/4731
V.6.30 SE user user326210, https://math.stackexchange.com/a/2564255
V.6.34 Neal E Young, University of California Riverside
V. By Psyon (Own work) CC BY-SA 3.0 https://commons.wikimedia.org/wiki/File:Jigsaw_Puzzle.
svg
V.7.13 William Gasarch, https://www.cs.umd.edu/~gasarch/COURSES/452/F14/poly.pdf
V.7.17 http://www.cs.princeton.edu/courses/archive/fall02/cos126/exercises/np.html
V.7.18 Kevin Wayne. http://www.cs.princeton.edu/courses/archive/fall02/cos126/exercises/n
p-sol.html
V.7.21 http://www.cs.princeton.edu/courses/archive/fall02/cos126/exercises/np.html
V.7.24 Y Lyuu, https://www.csie.ntu.edu.tw/~lyuu/complexity/2016/20161129s.pdf
V.7.29 SE user Yuval Filmus https://cs.stackexchange.com/a/132902/50343
V.8.17 SE user Yuval Filmas https://cs.stackexchange.com/a/54452/50343
Bibliography
A/V Geeks, Y. user, ed. (2013). Slide Rule - Proportion, Percentage, Squares And Square Roots (1944).
Division of Visual Aids, US Office of Education. url:
https://www.youtube.com/watch?v=dT7bSn03lx0 (visited on 08/09/2015).
Aaronson, S. (July 21, 2011a). Rosser’s Theorem via Turing machines. url:
http://www.scottaaronson.com/blog/?p=710 (visited on 12/31/2023).
— (Aug. 14, 2011b). Why Philosophers Should Care About Computational Complexity. url:
https://arxiv.org/abs/1108.1791.
— (May 3, 2012a). The 8000th Busy Beaver number eludes ZF set theory: new paper by Adam Yedidia and
me. url: http://www.scottaaronson.com/blog/?p=2725.
— (Aug. 30, 2012b). The Toaster-Enhanced Turing Machine. url:
http://www.scottaaronson.com/blog/?p=1121 (visited on 05/28/2015).
— (July 27, 2020). The Busy Beaver Frontier. url: https://www.scottaaronson.com/papers/bb.pdf
(visited on 07/02/2024).
Adams, D. (1979). The Hitchhiker’s Guide to the Galaxy. Harmony Books. isbn: 9780345391803.
Allender, E., M. C. Loui, and K. W. Regan (1997). “Complexity Classes”. In: Algorithms and Theory of
Computation Handbook. Ed. by M. J. Atallah and M. Blanton. Boca Raton, Florida: CRC Press. Chap. 27.
Avigad, J. (Jan. 9, 2007). “Computability and Incompleteness Lecture Notes”. In: url:
https://www.andrew.cmu.edu/user/avigad/Teaching/candi_notes.pdf (visited on 09/26/2024).
Bellos, A. (Dec. 15, 2014). “The Game of Life: a beginner’s guide”. In: The Guardian. url:
http://www.theguardian.com/science/alexs-adventures-in-numberland/2014/dec/15/the-
game-of-life-a-beginners-guide (visited on 07/14/2015).
Bernstein, E. and U. Vazirani (1997). “Quantum Complexity Theory”. In: SIAM Journal of Compututing
26.5, pp. 1411–1473.
Bigham, D. S. (Aug. 19, 2014). How Many Vowels Are There in English? (Hint: It’s More Than AEIOUY.).
Slate. url: http://www.slate.com/blogs/lexicon_valley/2014/08/19/aeiou_and_sometimes_
y_how_many_english_vowels_and_what_is_a_vowel_anyway.html (visited on 06/12/2017).
Black, R. (2000). “Proving Church’s Thesis”. In: Philosophia Mathematica 8, pp. 244–258.
Blanda, S. (2013). The Six Degrees of Kevin Bacon. [Online; accessed 2019-Apr-01]. url:
https://blogs.ams.org/mathgradblog/2013/11/22/degrees-kevin-bacon/.
Sitnikovski, B. (2024). Deriving a Quine in a Lisp. url:
https://bor0.wordpress.com/2020/04/24/deriving-a-quine-in-a-lisp/ (visited on
09/26/2024).
Brady, A. H. (Apr. 1983). “The Determination of the Value of Rado’s Noncomputable Function Σ(𝑘) for
Four-State Turing Machines”. In: Mathematics of Computation 40.162, pp. 647–665.
Bragg, M. (Sept. 2016). Zeno’s Paradoxes. Podcast. Guests: Marcus du Sautoy, Barbara Sattler, and James
Warren. British Broadcasting Corporation. url: https://www.bbc.co.uk/programmes/b07vs3v1.
Brock, D. C. (2020). Discovering Dennis Ritchie’s Lost Dissertation. [Online; accessed 2020-Jun-20]. url:
https://computerhistory.org/blog/discovering-dennis-ritchies-lost-dissertation/.
Brower, K. (1983). The Starship and the Canoe. Harper Perennial; Reprint edition. isbn: 978-0060910303.
Brubaker, B. (Apr. 2, 2024). “With Fifth Busy Beaver, Researchers Approach Computation’s Limits”. In:
Quanta. url: https://www.quantamagazine.org/amateur-mathematicians-find-fifth-busy-
beaver-turing-machine-20240702/ (visited on 04/02/2024).
Bruck, R. H. (1953). “Computational Aspects of Certain Combinatorial Problems”. In: AMS Symposium in
Applied Mathematics 6, p. 31.
Chakrabarty, R. (Apr. 26, 2017). “Srinivasa Ramanujan: The mathematical genius who credited his 3900
formulae to visions from Goddess Mahalakshmi”. In: India Today. url:
https://www.indiatoday.in/education-today/gk-current-affairs/story/srinivasa-
ramanujan-life-story-973662-2017-04-26 (visited on 11/27/2020).
Church, A. (1937). “Review of Alan M. Turing, On computable numbers, with an application to the
Entscheidungsproblem”. In: Journal of Symbolic Logic 2, pp. 42–43.
Cobham, A. (1965). “The intrinsic computational difficulty of functions”. In: Logic, Methodology and
Philosophy of Science: Proceedings of the 1964 International Congress. Ed. by Y. Bar-Hillel. North-Holland
Publishing Company, pp. 24–30.
Cook, S. (2000). The P vs NP Problem. Official problem description. Clay Mathematics Institute. url:
https://www.claymath.org/sites/default/files/pvsnp.pdf (visited on 01/11/2018).
Cook, W. et al. (2017). UK Pubs Travelling Salesman Problem. url:
http://www.math.uwaterloo.ca/tsp/pubs/index.html (visited on 12/16/2017).
Copeland, B. J. and D. Proudfoot (1999). “Alan Turing’s Forgotten Ideas in Computer Science”. In:
Scientific American 280.4, pp. 99–103.
Copeland, B. J. (Sept. 1996). “What is Computation?” In: Computation, Cognition and AI, pp. 335–359.
— (1999). “Beyond the universal Turing machine”. In: Australasian Journal of Philosophy 77.1, pp. 46–67.
— (Aug. 19, 2002). The Church-Turing Thesis; Misunderstandings of the Thesis. url:
http://plato.stanford.edu/entries/church-turing/#Bloopers (visited on 01/07/2016).
Cox, R. (2007). Regular Expression Matching Can Be Simple And Fast (but is slow in Java, Perl, PHP, Python,
Ruby, . . .). url: https://swtch.com/~rsc/regexp/regexp1.html (visited on 06/29/2019).
Davis, M. (2004). “The Myth of Hypercomputation”. In: Alan Turing: Life and Legacy of a Great Thinker.
Ed. by C. Teuscher. Springer, pp. 195–211. isbn: 978-3-662-05642-4.
— (2006). “Why there is no such discipline as hypercomputation”. In: Applied Mathematics and
Computation 178, pp. 4–7.
Dershowitz, N. and Y. Gurevich (Sept. 2008). “A Natural Axiomatization of Computability and Proof of
Church’s Thesis”. In: Bulletin of Symbolic Logic 14.3, pp. 299–350.
Edmonds, J. (1965). “Paths, trees, and flowers”. In: Canadian Journal of Mathematics 17, pp. 449–467.
Eén, N. and N. Sörensson (2005). MiniSat. url: http://minisat.se/ (visited on 05/16/2022).
Encyclopædia Britannica Editors (2017). Y2K bug. url:
https://www.britannica.com/technology/Y2K-bug (visited on 05/10/2017).
Euler, L. (1766). “Solution d’une question curieuse que ne paroit soumise a aucune analyse (Solution of a
curious question which does not seem to have been subjected to any analysis)”. In: Mémoires de
l’Academie Royale des Sciences et Belles Lettres, Année 1759 15. [Online; accessed 2017-Sep-23, article
309], pp. 310–337. url: http://eulerarchive.maa.org/.
Firth, N. (Oct. 14, 2009). “Sir Tim Berners-Lee admits the forward slashes in every web address ‘were a
mistake’”. In: Daily Mail. url: https://www.dailymail.co.uk/sciencetech/article-
1220286/Sir-Tim-Berners-Lee-admits-forward-slashes-web-address-mistake.html (visited
on 11/29/2018).
Fortnow, L. and B. Gasarch (2002). Computational Complexity Blog. [Online; accessed 2017-Nov-13]. url:
http://blog.computationalcomplexity.org/2002/11/foundations-of-complexitylesson-
7.html.
Fraenkel, A. S. and D. Lichtenstein (1981). “Computing a Perfect Strategy for 𝑛 × 𝑛 Chess Requires Time
Exponential in 𝑛 ”. In: Journal of Combinatorial Theory, Series A 31, pp. 199–214.
Free Online Dictionary of Computing (Denis Howe) (2017). Stephen Kleene. [Online; accessed
21-June-2017]. url: http://foldoc.org/Stephen%20Kleene.
Gandy, R. (1980). “Church’s Thesis and Principles for Mechanisms”. In: The Kleene Symposium. Ed. by
J. Barwise, H. J. Keisler, and K. Kunen. North-Holland Amsterdam, pp. 123–148. isbn:
978-0-444-85345-5.
Gardner, M. (Oct. 1970). “Mathematical Games: The fantastic combinations of John Conway’s new solitaire
game ‘life’”. In: Scientific American 223, pp. 120–123. url:
http://www.ibiblio.org/lifepatterns/october1970.html.
Garey, M. and D. Johnson (1979). Computers and Intractability: A Guide to the Theory of NP-Completeness.
W. H. Freeman.
Gizmodo (1948). UCLA’s 1948 Mechanical Computer. Accessed 2019-September-18. url:
https://vimeo.com/70589461.
Gleick, J. (Sept. 20, 1992). “Part Showman, All Genius”. In: New York Times Magazine. url:
https://www.nytimes.com/1992/09/20/magazine/part-showman-all-genius.html (visited on
11/27/2020).
Gödel, K. (1964). “What is Cantor’s Continuum Problem?” In: Philosophy of Mathematics: Selected Readings.
Ed. by P. Benacerraf and H. Putnam. Cambridge University Press, pp. 470–494.
— (1995). “Undecidable diophantine propositions”. In: Collected works Volume III: Unpublished essays and
lectures. Ed. by S. Feferman et al. Oxford University Press.
Goodstein, R. L. (Dec. 1947). “Transfinite Ordinals in Recursive Number Theory”. In: Journal of Symbolic
Logic 12.4, pp. 123–129.
Grossman, L. (2010). Metric Math Mistake Muffed Mars Meteorology Mission. [Online; accessed
2017-May-25]. url: https://www.wired.com/2010/11/1110mars-climate-observer-report/.
Hartmanis, J. (2017). Gödel, von Neumann and the P =?NP Problem. url:
http://www.cs.cmu.edu/~15455/hartmanis-on-godel-von-neumann.pdf (visited on
12/25/2017).
Hennie, F. (1977). Introduction to Computability. Addison-Wesley. isbn: 978-0201028485.
Hilbert, D. and W. Ackermann (1950). Principles of Mathematical Logic. Trans. by R. E. Luce. AMS Chelsea
Publishing. isbn: 978-0821820247.
Hodges, A. (1983). Alan Turing: the enigma. Simon and Schuster. isbn: 0-671-49207-1.
— (2016). Alan Turing in the Stanford Encyclopedia of Philosophy. url:
http://www.turing.org.uk/publications/stanford.html (visited on 04/06/2016).
Hofstadter, D. R. (1979). Gödel, Escher, Bach: An Eternal Golden Braid. Basic Books. isbn: 978-0465026562.
Hopcroft, J. E., R. Motwani, and J. D. Ullman (2001). Introduction to Automata Theory, Languages, and
Computation. 2nd ed. Pearson Education. isbn: 0201441241.
Huggett, N. (2010). Zeno’s Paradoxes — Stanford Encyclopedia of Philosophy. [Online; accessed
23-Dec-2016]. url: https://plato.stanford.edu/entries/paradox-zeno/#ParMot.
Indian Institute of Science and Indian Institutes of Technologies (2021). Graduate Aptitude Test in
Engineering.
— (2022). Graduate Aptitude Test in Engineering.
Jones, N. D. (1997). Computability and Complexity: From a Programming Perspective. 1st ed. MIT Press.
isbn: 978-0262100649.
Jonker, R. and T. Volgenant (Nov. 1, 1983). “Transforming Asymmetric into Symmetric Traveling Salesman
Problems”. In: Operations Research Letters 2.4, pp. 161–163.
Karp, R. M. (1972). “Reducibility Among Combinatorial Problems”. In: Complexity of Computer
Computations. Ed. by R. E. Miller and J. W. Thatcher. New York: Plenum, pp. 85–103.
Kleene, S. (1952). Introduction to Metamathematics. North-Holland Amsterdam. isbn: 978-0923891572.
Klyne, G. and C. Newman (July 2002). Date and Time on the Internet: Timestamps. RFC 3339. RFC Editor,
pp. 1–18. url: https://www.ietf.org/rfc/rfc3339.txt.
Knuth, D. E. (Dec. 1964). “Backus Normal Form vs. Backus Naur Form”. In: Communications of the ACM
7.12, pp. 735–736.
— (May 20, 2014). Twenty Questions for Donald Knuth. url:
http://www.informit.com/articles/article.aspx?p=2213858 (visited on 02/17/2018).
Knuutila, T. (2001). “Redescribing an algorithm by Hopcroft”. In: Theoretical Computer Science 250,
pp. 333–363.
Kragh, H. (Mar. 27, 2014). The True (?) Story of Hilbert’s Infinite Hotel. url:
http://arxiv.org/abs/1403.0059.
Leupold, J. (1725). “Details of the mechanisms of the Leibniz calculator, the most advanced of its time”. In:
Illustration in: Theatrum arithmetico-geometricum, das ist . . . [bound with Theatrum machinarium,
oder, Schau-Platz der Heb-Zeuge/Jacob Leupold. Leipzig, 1725]. Leipzig: Zufinden bey dem Autore
und Joh. Friedr. Gleditschens seel. Sohn: Gedruckt bey Christoph Zunkel, 1727. url:
https://www.loc.gov/resource/cph.3c10471/ (visited on 11/14/2016).
Levin, L. A. (Dec. 7, 2016). Fundamentals of Computing. url: https://www.cs.bu.edu/fac/lnd/toc/.
Lipton, R. J. (Sept. 22, 2009). It’s All Algorithms, Algorithms and Algorithms. url: https:
//rjlipton.wordpress.com/2009/09/22/its-all-algorithms-algorithms-and-algorithms/
(visited on 02/17/2018).
Maienschein, J. (2017). “Epigenesis and Preformationism”. In: The Stanford Encyclopedia of Philosophy.
Ed. by E. N. Zalta. Spring 2017. Metaphysics Research Lab, Stanford University.
MathOverflow user Joel David Hamkins (2010). Answer to: Infinite CPU clock rate and hotel Hilbert. url:
https://mathoverflow.net/a/22038 (visited on 04/19/2017).
McCarthy, J. (1963). A Basis for a Mathematical Theory of Computation. url:
http://www-formal.stanford.edu/jmc/basis1.pdf (visited on 06/15/2017).
Meyer, A. R. and D. M. Ritchie (1966). Research report: The complexity of loop programs. Tech. rep. 1817.
IBM.
N. J. A. Sloane, ed. (2019). The On-Line Encyclopedia of Integer Sequences, A000290. url:
https://oeis.org/A000290 (visited on 03/02/2019).
Odifreddi, P. (1992). Classical Recursion Theory. Elsevier Science. isbn: 0-444-87295-7.
Perlis, A. J. (Sept. 1, 1982). “Epigrams on Programming”. In: SIGPLAN Notices 17.9, pp. 7–13. url:
https://web.archive.org/web/19990117034445/http://www-pu.informatik.uni-
tuebingen.de/users/klaeren/epigrams.html (visited on 12/23/2023).
Peters, T. (2023). Timsort. url: https://bugs.python.org/file4451/timsort.txt (visited on
01/14/2023).
Piccinini, G. (2017). “Computation in Physical Systems”. In: The Stanford Encyclopedia of Philosophy. Ed. by
E. N. Zalta. Summer 2017. Metaphysics Research Lab, Stanford University.
Pinker, S. (Sept. 4, 2014). The Trouble With Harvard. url:
https://newrepublic.com/article/119321/harvard-ivy-league-should-judge-students-
standardized-tests (visited on 12/23/2020).
Pour-El, M. B. and I. Richards (1981). “The wave equation with computable initial data such that its unique
solution is not computable”. In: Advances in Mathematics 39, pp. 215–239.
Pseudonym, Stack Exchange author (2014). Answer to: What exactly is an algorithm? url:
https://cs.stackexchange.com/a/31953 (visited on 12/27/2018).
Pudlák, P. (2013). Logical Foundations of Mathematics and Computational Complexity. Springer. isbn:
978-3-319-34268-9.
Radó, T. (May 1962). “On Non-computable Functions”. In: Bell System Technical Journal 41.3, pp. 877–884.
url: https://ia601900.us.archive.org/0/items/bstj41-3-877/bstj41-3-877.pdf.
Rendell, P. (2011). A Turing Machine in Conway’s Game of Life, extendable to a Universal Turing Machine. url:
http://rendell-attic.org/gol/tm.htm (visited on 07/21/2015).
Renwick, W. S. (May 6, 1949). The start of the EDSAC log. [Online; accessed 2019-Mar-02]. url:
https://www.cl.cam.ac.uk/relics/elog.html.
Rich, E. (2008). Automata, Computability, and Complexity. Pearson. isbn: 978-0-13-228806-4.
Roberts, S. (Oct. 27, 2021). “The 50-year-old problem that eludes theoretical computer science”. In: MIT
Technology Review.
Robinson, R. (1948). “Recursion and Double Recursion”. In: Bulletin of the American Mathematical Society
54, pp. 987–993.
Rogers Jr., H. (Sept. 1958). “Gödel numberings of partial recursive functions”. In: Journal of Symbolic Logic
23.3, pp. 331–341.
— (1987). Theory of Recursive Functions and Effective Computability. MIT Press. isbn: 0-262-68052-1.
Schnieder, H.-J. (2001). “Computability in an Introductory Course on Programming”. In: Bulletin of the
European Association for Theoretical Computer Science, EATCS 73, pp. 153–164.
SE author Brian M. Scott (Feb. 14, 2020). Inverting the Cantor pairing function. url:
http://math.stackexchange.com/q/222835 (visited on 10/28/2012).
Smoryński, C. (1991). Logical Number Theory I. Springer-Verlag. isbn: 978-3540522362.
Soare, R. I. (1999). “Computability and Incomputability”. In: Handbook of Computability Theory. Ed. by
E. R. Griffor. North-Holland, Amsterdam, pp. 3–36.
Stack Exchange author Andrej Bauer (2016). Answer to: Is a Turing Machine “by definition” the most
powerful machine? [Online; accessed 2017-Nov-05]. Computer Science Stack Exchange discussion board.
url:
https://cs.stackexchange.com/a/66753/78536.
— (2018). Answer to: Problems understanding proof of smn theorem using Church-Turing thesis. [Online;
accessed 2020-Feb-13]. Computer Science Stack Exchange discussion board. url:
https://cs.stackexchange.com/a/97946/67754.
Stack Exchange author babou and various others (2015). Justification for neglecting constants in Big O.
[Online; accessed 2017-Oct-29]. Computer Science Stack Exchange discussion board. url:
https://cs.stackexchange.com/a/41000/78536.
Stack Exchange author bobnice (2009). Answer to: RegEx match open tags except XHTML self-contained tags.
url: https://stackoverflow.com/a/1732454/7168267 (visited on 01/27/2019).
Stack Exchange author David Richerby (2018). Why is there no permutation in Regexes? (Even if regular
languages seem to be able to do this). [Online; accessed 2020-Jan-01]. Computer Science Stack Exchange
discussion board.
url: https://cs.stackexchange.com/a/100215/67754.
Stack Exchange author JohnL (2020). How to decide whether a language is decidable when not involving
turing machines? [Online; accessed 2020-Jun-11]. Computer Science Stack Exchange discussion board.
url: https://cs.stackexchange.com/a/127035/67754.
Stack Exchange author Jouni Sirén (2016). Answer to: What is the origin of 𝜆 for empty string? Accessed
2016-October-20. url: http://cs.stackexchange.com/a/64850/50343.
Stack Exchange author Kaktus and various others (2019). Georg Cantor’s diagonal argument, what exactly
does it prove? [Online; accessed 2019-Dec-25]. Mathematics Stack Exchange discussion board.
url: https://math.stackexchange.com/q/2176304.
Stack Exchange author Ryan Williams (Sept. 2, 2010). Comment to answer for What would it mean to
disprove Church-Turing thesis? url: https://cstheory.stackexchange.com/a/105/4731 (visited on
06/24/2019).
Stack Exchange author templatetypedef (2013). What is pseudopolynomial time? How does it differ from
polynomial time? [Online; accessed 2017-Oct-29]. Stack Overflow discussion board. url:
https://stackoverflow.com/a/19647659.
Thompson, K. (Aug. 1984). “Reflections on trusting trust”. In: Communications of the ACM 27 (8),
pp. 761–763.
Thomson, J. F. (Oct. 1954). “Tasks and Super-Tasks”. In: Analysis 15.1, pp. 1–13.
Turing, A. M. (1937). “On Computable Numbers, with an Application to the Entscheidungsproblem”. In:
Proceedings of the London Mathematical Society. 2nd ser. 42, pp. 230–265.
— (1938a). “On Computable Numbers, with an Application to the Entscheidungsproblem. A Correction.”
In: Proceedings of the London Mathematical Society. 2nd ser. 43, pp. 544–546.
— (1938b). “Systems of Logic Based on Ordinals”. PhD thesis. Princeton University.
U.S. Naval Observatory, Time Service Dept. (2017). Leap Seconds. [Online; accessed 10-May-2017]. url:
http://tycho.usno.navy.mil/leapsec.html.
Various authors (2017). Theory of Computing Blog Aggregator. [Online; accessed 17-May-2017]. url:
http://cstheory-feed.org/.
Viola, E. (Feb. 16, 2018). I believe P=NP. url:
https://emanueleviola.wordpress.com/2018/02/16/i-believe-pnp/ (visited on 02/16/2018).
Wigderson, A. (2009). “Knowledge, Creativity and P versus NP”. url:
https://www.math.ias.edu/~avi/PUBLICATIONS/MYPAPERS/AW09/AW09.pdf (visited on
06/10/2023).
— (2017). Mathematics and Computation. [Draft of a to-be-published book; accessed 2017-Oct-27]. url:
https://www.math.ias.edu/avi/book.
Wikipedia contributors (2014). History of the Church–Turing thesis — Wikipedia, The Free Encyclopedia.
[Online; accessed 2-October-2016]. url: https://en.wikipedia.org/w/index.php?title=Histo
ry_of_the_Church%E2%80%93Turing_thesis&oldid=618643863.
— (2015). Stigler’s law of eponymy — Wikipedia, The Free Encyclopedia. url: https:
//en.wikipedia.org/w/index.php?title=Stigler%27s_law_of_eponymy&oldid=691378684
(visited on 02/14/2016).
— (2016a). Age of the Earth — Wikipedia, The Free Encyclopedia. [Online; accessed 13-June-2016]. url:
https://en.wikipedia.org/w/index.php?title=Age_of_the_Earth&oldid=724796250.
— (2016b). Donald Michie — Wikipedia, The Free Encyclopedia. [Online; accessed 24-March-2016]. url:
https://en.wikipedia.org/w/index.php?title=Donald_Michie&oldid=708156000 (visited on
03/24/2016).
— (2016c). Nomogram — Wikipedia, The Free Encyclopedia. [Online; accessed 6-October-2016]. url:
https://en.wikipedia.org/w/index.php?title=Nomogram&oldid=742964268.
— (2016d). Ross–Littlewood paradox — Wikipedia, The Free Encyclopedia. [Online; accessed
9-February-2017]. url: https://en.wikipedia.org/w/index.php?title=Ross%E2%80%93Little
wood_paradox&oldid=739534216.
— (2016e). The Imitation Game — Wikipedia, The Free Encyclopedia. [Online; accessed 28-June-2016].
url: https://en.wikipedia.org/w/index.php?title=The_Imitation_Game&oldid=723336480.
— (2016f). Turtles all the way down — Wikipedia, The Free Encyclopedia. [Online; accessed
2016-September-04]. url: https:
//en.wikipedia.org/w/index.php?title=Turtles_all_the_way_down&oldid=736001775.
— (2016g). Zeno’s paradoxes — Wikipedia, The Free Encyclopedia. [Online; accessed 23-December-2016].
url: https://en.wikipedia.org/w/index.php?title=Zeno%27s_paradoxes&oldid=752685211.
— (2017a). 15 puzzle — Wikipedia, The Free Encyclopedia. [Online; accessed 16-September-2017]. url:
https://en.wikipedia.org/w/index.php?title=15_puzzle&oldid=789930961.
— (2017b). Almon Brown Strowger — Wikipedia, The Free Encyclopedia. [Online; accessed 9-June-2017].
url:
https://en.wikipedia.org/w/index.php?title=Almon_Brown_Strowger&oldid=783883144.
— (2017c). Artificial neuron — Wikipedia, The Free Encyclopedia. [Online; accessed 21-June-2017]. url:
https://en.wikipedia.org/w/index.php?title=Artificial_neuron&oldid=780239713.
— (2017d). Aubrey–Maturin series — Wikipedia, The Free Encyclopedia. [Online; accessed 28-March-2017].
url: https:
//en.wikipedia.org/w/index.php?title=Aubrey%E2%80%93Maturin_series&oldid=771937634.
— (2017e). Backus–Naur form — Wikipedia, The Free Encyclopedia. [Online; accessed 7-May-2017]. url:
https:
//en.wikipedia.org/w/index.php?title=Backus%E2%80%93Naur_form&oldid=778354081.
— (2017f). Magic smoke — Wikipedia, The Free Encyclopedia. [Online; accessed 2017-October-11]. url:
https://en.wikipedia.org/w/index.php?title=Magic_smoke&oldid=785207817.
— (2017g). North American Numbering Plan — Wikipedia, The Free Encyclopedia. [Online; accessed
9-June-2017]. url: https:
//en.wikipedia.org/w/index.php?title=North_American_Numbering_Plan&oldid=780178791.
— (2017h). Ouija — Wikipedia, The Free Encyclopedia. [Online; accessed 14-May-2017]. url:
https://en.wikipedia.org/w/index.php?title=Ouija&oldid=776109372.
— (2017i). Pax Britannica — Wikipedia, The Free Encyclopedia. [Online; accessed 14-May-2017]. url:
https://en.wikipedia.org/w/index.php?title=Pax_Britannica&oldid=775067301.
— (2017j). Philipp von Jolly — Wikipedia, The Free Encyclopedia. [Online; accessed 30-January-2019].
url: https://en.wikipedia.org/w/index.php?title=Philipp_von_Jolly&oldid=764485788.
— (2017k). Platonic solid — Wikipedia, The Free Encyclopedia. [Online; accessed 2017-October-22]. url:
https://en.wikipedia.org/w/index.php?title=Platonic_solid&oldid=801264236.
— (2017l). Unicode — Wikipedia, The Free Encyclopedia. url:
https://en.wikipedia.org/w/index.php?title=Unicode&oldid=784443067.
— (2017m). Zermelo’s theorem (game theory) — Wikipedia, The Free Encyclopedia. [Online; accessed
2017-Nov-26]. url: https://en.wikipedia.org/w/index.php?title=Zermelo%27s_theorem_(ga
me_theory)&oldid=806070716.
— (2018). Paradox — Wikipedia, The Free Encyclopedia. [Online; accessed 14-December-2018]. url:
https://en.wikipedia.org/w/index.php?title=Paradox&oldid=871193884.
— (2019a). Collatz conjecture — Wikipedia, The Free Encyclopedia. [Online; accessed 15-February-2019].
— (2019b). Mathematics: The Loss of Certainty — Wikipedia, The Free Encyclopedia. [Online; accessed
30-January-2019]. url: https://en.wikipedia.org/w/index.php?title=Mathematics:
_The_Loss_of_Certainty&oldid=879406248.
— (2019c). Maxwell’s demon — Wikipedia, The Free Encyclopedia. [Online; accessed 1-January-2020]. url:
https://en.wikipedia.org/w/index.php?title=Maxwell%27s_demon&oldid=930445803.
— (2019d). Partial application — Wikipedia, The Free Encyclopedia. [Online; accessed 26-December-2019].
— (2020a). Foobar — Wikipedia, The Free Encyclopedia. [Online; accessed 2020-Feb-14]. url:
https://en.wikipedia.org/w/index.php?title=Foobar&oldid=934819128.
— (2020b). Galactic algorithm — Wikipedia, The Free Encyclopedia. [Online; accessed 2020-Jun-17]. url:
https://en.wikipedia.org/w/index.php?title=Galactic_algorithm&oldid=957279293.
— (2021). Mississippi River Basin Model — Wikipedia, The Free Encyclopedia. [Online; accessed
25-September-2022]. url: https://en.wikipedia.org/w/index.php?title=Mississippi_River
_Basin_Model&oldid=1041334010.
— (2023). Clarke’s three laws — Wikipedia, The Free Encyclopedia. [Online; accessed 27-June-2023]. url:
https://en.wikipedia.org/w/index.php?title=Clarke%27s_three_laws&oldid=1156462008.
— (2024). Hyperoperation — Wikipedia, The Free Encyclopedia. [Online; accessed 9-August-2024]. url:
https://en.wikipedia.org/w/index.php?title=Hyperoperation&oldid=1226527372.
YouTube channel Joint Mathematics Meetings (May 5, 2018). William Cook: “Information, Computation,
Optimization . . .”. url: https://www.youtube.com/watch?v=q8nQTNvCrjE (visited on 07/02/2024).
YouTube user navyreviewer (2010). Mechanical computer part 1. url:
https://www.youtube.com/watch?v=mpkTHyfr0pM (visited on 08/09/2015).
Zenil, H., ed. (2012). A Computable Universe, Understanding and Exploring Nature as Computation. World
Scientific. isbn: 978-9814374293.
Index
+ tape, 8
in transition tables, 181 alternation, 204
operation on a language, 218 amb, ambiguous function, 191
3 Dimensional Matching problem, 333 ambiguous grammar, 152, 153
3-Coloring problem, 334 argument, to a function, 372
3-SAT, see 3-Satisfiability problem Aristotle’s Paradox, 59, 61
3-SAT problem, 327, 328, 346 Assignment problem, 329
3-Satisfiability problem, 282, 300, 333 Asymmetric Traveling Salesman problem, 328, 329
asymptotically equivalent, 269, 277
atom, 288
B, 370
3-Satisfiability Backus, J
Strict variant, 327 picture, 168
4-Satisfiability problem, 344 Backus-Naur form, BNF, 168
Bacon, Kevin, 407
accept a language, 145 balanced parentheses, 227
accept an input, 185, 193, 197 BB function, 133
acceptable numbering, 71 Berra, Y
accepted language, see recognized picture, 190
of a Turing machine, 294 Big O, 267
accepting state, 179, 181, 309 Big Θ, 268
in transition tables, 181 bijection, 375
nondeterministic Finite State machine, 192 binary sequence, 73
Pushdown machine, 229 binary tree, 176
accepts, 181, 185, 193, 197 bit string, see bitstring
Ackermann function, 31–33, 36, 47–50 bitstring, 370
Ackermann, W, 3 set of, B, 370
picture, 33 blank, 8, 309
action set, 8 blank, B, 5
action symbol, 8 BNF, 168–172
addition, 6 body of a production, 148
adjacency matrix, 161 Bogosort, 409
adjacent, 158, 159 Boole, G
Adleman, L picture, 281
picture, 351 boolean, 281
Agrawal, M expression, 281
picture, 287 function, 281
AKS primality test, 287 variable, 281
algorithm, 293 Boolean algebra, 378
definition, 293 Boolean function, 379
reliance on model, 293 bottom, ⊥, 227
alphabet, 143, 370 Bounded Halting problem, 366
input, 181 BPP, Bounded-Error Probabilistic Polynomial Time
Kleene star, 144, 398 problem, 350
Pushdown machine, 229 branch
of the computation tree, 193, 197 closed walk, 159
breadth-first traversal, 166, 173 closure under an operation, 214
CNF, 155, 281, 282, 327, 334, 359, 379
bridge, 299 co-computably enumerable, 111
Brocard’s problem, 100 co-NP, 313
Brzozowski’s algorithm, 260 Cobham’s Thesis, 272
Busy Beaver, 133–135 Cobham, A
Busy Beaver problem, 133 picture, 404
button Thesis, 272
start, 5 codomain, 372
codomain versus range, 373
c.e. set, see computably enumerable Collatz conjecture, 97
caching, 69 coloring of a graph, 161
Cantor’s correspondence, 66–74 colors, 280
Cantor’s pairing function, 67 common divisor, 27
Cantor’s Theorem, 76 compiler-compiler, 170
Cantor, G complement of a language, 146
and diagonalization, 390 complete, 115
picture, 61 for a class, 332
cardinality, 59–80 NP, 332
less than or equal to, 75 complete graph, 164
cellular automaton, 43 complexity class, 301
certificate, 312 canonical, 349
characteristic function 1𝑆 , 76 EXP, 347
Chromatic Number problem, 281, 295 NP, 311
chromatic number, 162 P, 302
Church’s Thesis, 14–21 polytime, 302
and uncountability, 77 complexity function, 267
argument by, 19 Complexity Zoo, 302, 349
clarity, 17 composition, 375
consistency, 16 computable
convergence, 16 from a set, 112
coverage, 15 relative to a set, 112
Extended, 304 set, 107
Church, A computable function, 11
picture, 15 relative to an oracle, 112
Thesis, 15 computable functions, 9–11
circuit, 159, 303 computable relation, 11
Euler, 159 computable set, 11
gate, 303 computably enumerable, 106–111
Hamiltonian, 159 in an oracle, 119
wire, 303 𝐾 is complete, 115
Circuit Evaluation problem, 303 computably enumerable set, 107
class, 144, 301 co-computably enumerable, 111
complexity, 301 collection of, RE, 307
Class Scheduling problem, 336 in increasing order, 110
Clique problem, 283, 306, 321, 333 computation
clique, in a graph, 283 distributed, 293
Finite State machine, 185 DAG, directed acyclic graph, 159
nondeterministic Finite State machine, 193, 197 dangling else, 153
relative to an oracle, 112 De Morgan, A
step, 8 picture, 280
Turing machine, 9 dead state, 399
computation tree decidable, 294
branch, 193, 197 language, 100
nondeterministic Finite State machine, 193, 197 decidable language, 307
concatenation of languages, 144 decide a language, 145
concatenation of strings, 371 decided language, 185, 294
configuration, 8, 184, 193, 197 of a nondeterministic Turing machine, 309
halting, 185, 193, 197 decider, 11
initial, 8 decides
conjunction, 281, 377 language, 307
Conjunctive Normal form, 155, 281, 334, 359, 379 set, 11
connected component, 299 deciding
connected graph, 160 a language, 11
connectives decision problem, 3, 12, 35, 294
logical, 377 decrypter, 352
context free degree of a vertex, 162
grammar, 149 degree sequence, 162
language, 233 demon, or daemon, 192
context sensitive grammar, 233 De Morgan’s laws
control, of a machine, 4 for logic expressions, 378
converge, 10 depth-first traversal, 166
Conway, J depth-first traversal, 173
picture, 43 derivation, 148, 149, 151
Cook reducibility, 319 derivation tree, 149
Cook, S determinism, 8, 16
picture, 331 diagonalization, 74–80, 119, 120
Cook-Levin theorem, 331 effectivized, 90
correspondence, 60, 375 routine, 132
Cantor’s, 67 digraph, 159
countable, 62 directed acyclic graph, 159
countably infinite, 62 directed graph, 159
Course Scheduling problem, 290 Discrete Logarithm problem, 299
course-of-values recursion, 30 disjunction, 281, 377
CPU of Turing machine, 4 Disjunctive Normal form, DNF, 40, 379
Crossword problem, 286 distinguishable classes, 251
current symbol, 8 distinguishable states, 250
currying, 393 distributed computation, 293
CW, 164 distributive laws
cycle, 159 for logic expressions, 378
Cyclic Shift problem, 325 diverge, 10
Divisor problem, 286, 300
daemon, see demon divisor, 27
DAG common, 27
traversal, 172 greatest common, 27
DNF, 40, 379 F-SAT problem, 300
domain, 372, 373 Factoring problem, 351, 352
in the Theory of Computation, 373 Prime Factorization problem, 300
Double-SAT problem, 317 Fermat number, 36
doubler function, 3, 12 Fermat prime, 36
dovetailing, 107 Fifteen Game problem, 286, 300
Droste effect, 396 Fin problem, 330
Drummer problem, 326 final state, 179, 181
DSPACE, 348 in transition tables, 181
DTIME, 347 nondeterministic Finite State machine, 192
finite set, 61
edge, 158 Finite State automata, see Finite State machine
Finite State machine, 179–189
edge weight, 159
accept string, 185
Edmonds, J
accepting state, 181
picture, 404
alphabet, 181
effective, 3, 14
computation, 185
Electoral College, 285
configuration, 184
empty language
final state, 181
decision problem, 299
halting configuration, 185
empty string, 𝜀 , 8, 62, 370
initial configuration, 184
encrypter, 352
initial state, 184
Entscheidungsproblem, 3, 12, 14, 35, 294, 339
input string, 184
unsolvability, 96
language of, 185
enumerate, 62
minimization, 249–260
𝜀 closure, 197 next-state function, 181
𝜀 moves, 194 nondeterminism, 308
𝜀 transitions, 194–198 nondeterministic, 192
equinumerous sets, 61 powerset construction, 199
equivalent growth rates, 268 product, 215
equivalent propositional logic statements, 378 product construction, 214
error state, 182, 399
Euler Circuit problem, 280, 300
Pumping Lemma, 220
reject string, 185
Euler circuit, 159 state, 181
Euler, L step, 185
picture, 279 subset method, 213
eval, 83 transition function, 181
exclusive or, 42 Fixed point theorem, 119–125
EXP, 346–347 discussion, 122–124
expansion of a production, 148 Flauros, Duke of Hell
Ext, extensible functions, 119 picture, 192
Extended Church’s Thesis, 304 flow, 326
extended regular expression, 234 flow chart, 82
extended transition function, 185 Four Color problem, 280
for nondeterministic Finite State machines, 198 function, 372–376
nondeterministic Finite State machine, 193 91 (McCarthy), 30
extensible, 119 argument, 372
extensional property, 394 Big O, 267
Big Θ, 268 translation, 319
Boolean, 379 unpairing, 67, 136
boolean, 281 value, 372
characteristic, 76 well-defined, 372, 374
codomain, 372 zero, 24, 35, 49
composition, 375 function problem, 293
computable, 11 functions
computed by a Turing machine, 9 same behavior, 100
converge, 10 same order of growth, 268
correspondence, 60, 375
definition, 372 gadget
diverge, 10 example of, 334
domain, 372 in complexity arguments, 334
doubler, 3, 12 Galilei, Galileo
effective, 3 picture, 59
enumeration, 62 Galileo, see Galilei, Galileo
exponential growth, 270 Galileo’s Paradox, 59, 61, 62
extended transition, 185 Game of Life, 43–46
extensible, 119 rules, 43
general recursive, 35 Gardner, M, 43
identity, 375 gate, 40, 303
image under, 373 gcd, see greatest common divisor
index, 372 general recursion, 31–38
injection, 374 general recursive function, 35
inverse, 376 general unsolvability, 91–94
left inverse, 375 Gödel number, 71
logarithmic growth, 270 Gödel’s multiplicative encoding, 30
𝜇 recursive function (mu recursive), 35 Gödel, K, 14
next-state, 8, 181 letter to von Neumann, 339
one-to-one, 60, 374 picture, 15
onto, 60, 374 picture with Einstein, 128
order of growth, 267 Gödel’s theorem, 14
output, 372 Goldbach’s conjecture, 34, 100, 107
pairing, 67, 136 grammar, 147–157
partial, 10, 373 ambiguous, 152, 153
partial recursive, 35 Backus-Naur form, BNF, 168
polynomial growth, 270 body of a production, 148
predecessor, 6, 23 context free, 149
projection, 24, 35, 49 context sensitive, 233
range, 373 derivation, 148
recursive, 11, 35 expansion of a production, 148
reduction, 320 head, 148
restriction, 373 linear, 203
right inverse, 376 nonterminal, 148
successor, 12, 21, 24, 35, 49 production, 148, 149
surjection, 374 regular, 218
total, 10, 118, 373 rewrite rule, 148, 149
transition, 8, 181, 309 right linear, 203
start symbol, 149 vertex, 158
syntactic category, 149 vertex cover, 283
terminal, 148 vertex degree, 162
graph, 158–168 walk, 159
adjacent, 158 walk length, 159
adjacent edges, 159 weighted, 159
bridge edge, 299 Graph Colorability problem, 281, 300, 322, 336
chromatic number, 162 Graph Connectedness problem, 299, 301
circuit, 159 Graph Isomorphism problem, 299, 337
clique, 283 Graph traversal, 172–176
closed walk, 159 Grassmann, H, 21
coloring, 161–162 picture, 21
colors, 280 greatest common divisor, 27
complete, 164 guessing
connected, 160 by a machine, 191
connected component, 299
cycle, 159 hailstone function, 97
degree sequence, 162 Halt light, 5
digraph, 159 halting configuration, 185, 193, 197
directed, 159 Halting problem, 89–91, 100
directed acyclic, 159 as a decision problem, 300
edge, 158 discussion, 94–96
edge weight, 159 reduction to another problem, 94
Euler circuit, 159 relativized, 116
finite, 158 relativized to a set, 116
Hamiltonian circuit, 159 significance, 95
induced subgraph, 159 unsolvability, 90
infinite, 158 halting state, 12
isomorphism, 162–163 Halts On Three problem, 91, 113, 319
loop, 159 Hamilton, W R
matrix representation, 161 picture, 278
multigraph, 159 Hamiltonian circuit, 159
neighbors, 158 Hamiltonian Circuit problem, 278, 300, 313, 326, 333
node, 158 Hamiltonian Path problem, 313, 344
rank, 173 hard
open walk, 159 for a class, 332
path, 159 NP, 332
planar, 165, 280 haystack, 302
representation, 160–161 head
simple, 158 read/write, 4
spanning subgraph, 283 head of a production, 148
subgraph, 159 Hilbert’s Hotel, 126
trail, 159 Hilbert, D, 3
transition, 7 picture, 127
traversal, 159–160, 166 Hofstadter, D, 396
breadth-first, 173 hyperoperation, 31
depth-first, 173
tree, 159, 283 𝜄 , see inclusion function
I/O head, see read/write head Kleene star, 62, 143, 144, 370, 398
identity function, 375 regular expression, 205
Ignorabimus, 128 Kleene’s fixed point theorem, 120
image under a function, 373 Kleene’s theorem, 206–210
implication, 42 Kleene, S, 35
inclusion function 𝜄 , 374 picture, 204
Incompleteness Theorem, 14 𝐾𝑛 , complete graph on 𝑛 vertices, 164
Independent Set problem, 291, 301, 325, 327, 345 Knapsack problem, 285, 294, 318, 323, 333
index number, 71 Knight’s Tour problem, 279
index set, 101 Knuth, D
indistinguishable states, 250 picture, 273
induced subgraph, 159 Kolmogorov, A
infinite set, 61 picture, 263
infinity, 59–66, 80 König’s lemma, 160, 310
initial configuration, 8, 184, 193, 197 Königsberg, 279
initial state, 184
injection, 374 L’Hôpital’s Rule, 269
input L-distinguishable, 242
loading, 9 L-indistinguishable, 242
input alphabet, 181 L-related, 242
input string, 184, 193, 197 lambda calculus, 𝜆 calculus, 15
input symbol, 8 language, 143–147
input, to a function, 372 + operation, 218
instruction, 5, 8, 309 accept, 145
stack machine, 229 accepted by a Finite State machine, see lan-
Integer Linear Programming problem, 316 guage, recognized by a Finite State ma-
decision problem, 328 chine
inverse of a function, 376 accepted by Turing machine, 105, 294
left, 375 class, 144
right, 376 complement, 146
two-sided, 376 concatenation, 144
isomorphic graphs, 162 context free, 233
isomorphism, 162 decidable, 100, 307
decide, 145
Johnson, K decided, 294
picture, 383 decided by a Finite State machine, 185
decided by a Turing machine, 11, 307
𝑘 Coloring problem, 161, 281 decision problem, 294
𝐾 , the Halting problem set, 90, 109 derived from a grammar, 151
complete among computably enumerable sets, grammar, 148, 149
115 Kleene star, 144
𝐾0 , set of halting pairs, 99, 110, 114 non-regular, 220–226
Karatsuba, A, 264 of a Finite State machine, 185
Karp reducible, 320 of a nondeterministic Finite State machine, 193
Karp, R operations on, 144
picture, 332 power, 144
Kayal, N recognize, 145
picture, 287 recognized, 294
    recognized by a Finite State machine, 185
    recognized by a Turing machine, 11
    recognized by Turing machine, 294
    regular, 214–219
    reversal, 144
    verifier, 312
language decision problem, 294
last in, first out (LIFO) stack, 226
left inverse, 375
leftmost derivation, 149
Legendre's conjecture, 35
LEGO, 5
length, 159
length of a string, 370
Levin, L
    picture, 331
lexicographic order, 62
Life, Game of, 43–46
    rules, 43
LIFO stack, 226
light
    Halt, 5
Linear Divisibility problem, 318
Linear Programming language decision problem, 285, 301, 316, 326
Lipton's Thesis, 297
loading, 9
logic gate, 40
logical connectives, 377
logical operator
    and, 281, 377
    not, 281, 377
    or, 281, 377
logical operators, 377
Longest Path problem, 318, 345
LOOP
    language, 51
    program, 51
loop, 159
LOOP program, 51–56
M-related, 244
machine
    state, 9
many-one reducible, 319
map, see function
mapping reducible, 319
Marriage problem, see Drummer problem or Matching problem
Matching problem, 342
matching, three dimensional, 284
Max Cut problem, 284
Max-Flow problem, 326
McCarthy's 91 function, 30
memoization, 69
memory, 4
metacharacter, 148, 204
Meyer, A
    picture, 51
Minimal Spanning Tree problem, 294
minimization, 33
minimization of a Finite State machine, 249–260
    Brzozowski's algorithm, 260
    Moore's algorithm, 250
minimization, unbounded, 35
Minimum Spanning Tree problem, 283
modulus, 352
Moore's algorithm, 250
Morse code, 164
𝜇-recursion (mu recursion), 33
𝜇 recursive function, 35
multigraph, 159
multiset, 284
Musical Chairs, 75
Myhill, J
    picture, 247
Myhill-Nerode theorem, 242–249
𝑛-distinguishable states, 251
𝑛-indistinguishable states, 251
𝑛-distinguishable classes, 251
Naur, P
    picture, 168
Nearest Neighbor problem, 299, 301
needle, 302
negation, 281, 377
neighbors, 158
Nerode, A
    picture, 247
next state, 5, 8
next tape action, 5
next-state function, 8, 181
NFSM, see nondeterministic Finite State machine
node, 158
    rank, 166
nondeterminism, 189–203
    for Finite State machines, 192, 308
    for Turing machines, 308
Nondeterministic Bounded Halting problem, 365
nondeterministic Finite State machine, 192
    accept string, 193, 197
    computation, 193, 197
    computation tree, 193, 197
    configuration, 193, 197
    convert to a deterministic machine, 198, 199
    𝜀 moves, 194
    𝜀 transitions, 194
    halting configuration, 193, 197
    initial configuration, 193, 197
    input string, 193, 197
    language of, 193
    language recognized, 193
    reject string, 193, 197
nondeterministic machine
    recognizes a language, 318
nondeterministic Pushdown machine, 226–234
nondeterministic Turing machine
    accepting state, 309
    decided language, 309
    definition, 309
    instruction, 309
    transition function, 309
nonterminal, 148, 149
NP, 308–319
NP complete, 331–337
    basic problems, 333
NP hard, 332
NP intermediate problems, 337
NSPACE, 348
NTIME, 347
numbering, 71
    acceptable, 71
Ω, Big Omega, 269
𝑜, omicron, 269
one-to-one function, 60, 374
onto function, 60, 374
open walk, 159
operators
    logical, 377
optimization problem, 293
optimization problem reducibility, 329
oracle, 111–119
    computably enumerable in, 119
    computation relative to, 112
    set computable from, 112
oracle Turing machine, 112
order of growth, 263–278
    function, 267
    hierarchy, 271
ouroboros, 82
output, from a function, 372
P, 301–308
P hard, 323
P versus NP, 311, 337–343
pairing function, 67, 136
Paley, W
    picture, 130
palindrome, 14, 143, 230, 371
paradox
    Aristotle's, 59
    Galileo's, 59
    Zeno's, 62
parameter, 85
Parameter theorem, 85
parametrization, 84–87
parametrizing, 85
parentheses
    balanced, 227
parse tree, 149
parser-generator, 170
partial function, 10, 373
partial recursive function, 35
Partition problem, 285, 317, 333, 334, 345
path, 159
perfect number, 95
Péter, R
    picture, 47
Petersen graph, 164, 166
pipe, alternation operator, 204
pipe symbol, |, 148
planar graph, 165, 280
pointer, in C, 123
polynomial time, 302
polynomial time reducibility, 320
polytime, 302
power of a language, 144
power of a string, 371
powerset construction, 199
predecessor function, 6, 23
prefix of a string, 371
present state, 5, 8
present tape symbol, 5
primality, 287
Primality problem, 287, 293, 294, 300, 314
Prime Factorization problem, 287, 293, 337, 344
primitive recursion, 21–30, 35
    arity, 23
primitive recursive functions, 24
private key, 352
problem, 293
    decision, 294
    function, 293
    Halting, 90, 91
    language decision, 294
    optimization, 293
    search, 294
    unsolvable, 91
problem miscellany, 278–292
problems
    tractable, 272
    unsolvable, 106
product construction, 214
production, 149
production in a grammar, 148
program, 293
projection function, 24, 35, 49
proper subtraction, 24
property
    extensional, 394
Propositional logic, 377–379
    atom, 288
    Boolean algebra, 378
    Boolean function, 379
    Conjunctive Normal form, 155, 282, 327, 359, 379
    DeMorgan's laws, 378
    Disjunctive Normal form, 40, 379
    distributive laws, 378
    exclusive or, 42
    Implication, 42
    operators, 377
pseudopolynomial, 275
public key, 352
Pumping lemma, 220
pumping length, 220
Pushdown automata, see pushdown machine
Pushdown machine, 226–234
    input alphabet, 229
    nondeterministic, 226–234
    stack alphabet, 229
    transition function, 229
pushdown stack, 226
quantum advantage, 304
Quantum Bogosort, 409
Quantum Computing, 304
quantum computing
    quantum advantage, 304
    quantum supremacy, see also quantum advantage
quine, 130
Quine's paradox, 396
r.e. set, see computably enumerable set
Radó, T
    picture, 133
RAM, see Random Access machine
Random Access machine, 272
range of a function, 373
rank, 166, 173
RE, computably enumerable sets, 295
reachable vertex, 160, 282
read/write head, 4
REC, computable sets, 295, 407
recognize a language, 145
recognized language
    of a Finite State machine, 185
    of a nondeterministic Finite State machine, 193
    of a Turing machine, 294
recognizing
    a language, 11
recursion, 21–38
    course-of-values, 30
Recursion theorem, 120
recursive function, 11, 35
recursive set, 11
recursively enumerable set, see computably enumerable set
reduces to, 112
reducibility
    between optimization problems, 329
    Cook, 319
    Karp, 320
    polynomial time, 320
    polytime, 320
    polytime many-one, 320
    polytime mapping, 320
    polytime Turing, 319
reducible
    many-one, 319
    mapping, 319
reduction from the Halting problem to another, 94
reduction function, 320
reductions between problems, 94, 319–331
Reflections on Trusting Trust, 132
Reg problem, 330
regex, 234
regular expression, 204–213
    extended, 234
    in practice, 234–242
    operator precedence, 205
    regex, 234
    semantics, 205
    syntax, 204
regular grammar, 218
regular language, 214–219
reject an input, 185, 193, 197
rejects, 181
relation, computable, 11
relativized Halting problem, 116
    for a set, 116
replication of a string, 371
representation, of a problem, 297
restriction of a function, 373
reversal of a language, 144
reversal of a string, 371
rewrite rule, 148, 149
Rice's theorem, 100–106
right inverse, 376
right linear, 203
Ritchie, D
    picture, 51
Rivest, R
    picture, 351
root, 159
RSA Encryption, 351–356
Russell set, 391
𝑠-𝑚-𝑛 theorem, 85
same behavior, functions with, 100
same order of growth, 268
SAT, see Satisfiability problem
SAT solver, 358
Satisfiability problem, 282, 291, 296, 312, 321, 322, 327, 331
    as a language recognition problem, 295
    on a nondeterministic Turing machine, 310
satisfiable Propositional logic expression, 281
Satisfying Assignment problem, 296
Saxena, N
    picture, 287
schema of primitive recursion, 23
Science United, 293
search problem, 294
self reproducing program, 130
self reproduction, 129–132
semicomputable set, 107
semidecidable set, 107
semidecide a language, 145
semiprime, 287
Semiprime problem, 317
set
    c.e., 107
    cardinality, 61
    computable, 11, 107
    computably enumerable, 106–111
    countable, 62
    countably infinite, 62
    decider, 11
    equinumerous, 61
    finite, 61
    index, 101
    infinite, 61
    oracle, 111–119
    r.e., see computably enumerable set
    recursive, 11
    recursively enumerable, see computably enumerable set
    reduces to, 112
    semicomputable, 107
    semidecidable, 107
    𝑇 equivalent, 114
    Turing equivalent, 114
    uncountable, 75
    undecidable, 91
Set Cover problem, 326
Shamir, A
    picture, 351
Shannon, C
    picture, 40
Shortest Path problem, 280, 293, 300, 301, 320
Σ function, 133
∼, asymptotically equivalent, 277
simple graph, 158
SPACE, 348
span a graph, 283
spanning subgraph, 283
𝑠𝑡-Connectivity problem, see Vertex-to-Vertex Path problem
𝑠𝑡-Path problem, see Vertex-to-Vertex Path problem
stack, 226
    alphabet, 229
    bottom, ⊥, 227
    LIFO, Last-In, First-Out, 226
    pop, 226
    push, 226
Start button, 5, 181
start state, 5, 181
    Pushdown machine, 229
start symbol, 149
state, 181
    accepting, 179, 181, 309
    dead, 399
    error, 399
    final, 179, 181
    halting, 12
    next, 5
    present, 5
    start, 5
    unreachable, 105
    working, 12
state machine, 9, 384
states, 4
    distinguishable, 250
    indistinguishable, 250
    𝑛-distinguishable, 251
    𝑛-indistinguishable, 251
    set of, 8
Stator square, 406
STCON problem, see Vertex-to-Vertex Path problem
step of a computation, 8, 185
store, of a machine, 4
str function, 298
Strict 3-Satisfiability, 327
string, 143, 370–371
    concatenation, 371
    decomposition, 371
    empty, 8, 62, 370
    length, 370
    power, 371
    prefix, 371
    replication, 371
    reversal, 371
    substring, 371
    suffix, 371
string accepted
    by deterministic Finite State machine, 181, 185
    by nondeterministic Finite State machine, 193, 197
string rejected, 181
String Search problem, 302
subgraph, 159
    induced, 159
subset method, 213
Subset Sum problem, 284, 294, 301, 317, 323, 345
substring, 371
Substring problem, 325
successor function, 12, 21, 24, 35, 49
suffix of a string, 371
surjection, 374
symbol, 8, 143, 370
    action, 8
    current, 8
    input, 8
syntactic category, 149
𝑇 equivalent, 114
𝑇 reducible, 112
table, transition, 7
table-filling algorithm, 250
tail recursion, 175
tape, 4
tape alphabet, 8
tape symbol, 8
    blank, 5
terminal, 148, 149
tetration, 32
Thompson, K
    picture, 132
Three Dimensional Matching problem, 284, 317
time taken by a machine, 273
token, 143, 370
Tot, set of total computable functions, 110, 118
total function, 10, 118, 373
Towers of Hanoi, 26
tractable, 271–272
trail, 159
transformation function, see reduction function
transition function, 8, 181, 309
    extended, 185, 198
    graph of, 7
    Pushdown machine, 229
    table of, 7
transition graph, 7
transition table, 7
translation function, 319
Traveling Salesman problem, 190, 279, 296, 311, 326, 328, 333, 344, 357
    Asymmetric, 328, 329
traversal, 166
tree, 159, 283
    binary, 176
    rank, 166
    root, 159
    traversal, 172
Triangle problem, 306
triangular number, 26
truth table, 281, 377
Turing equivalent, 114
Turing machine, 3–14
    accept a language, 105
    accepting a language, 307
    action set, 8
    action symbol, 8
    computation, 9
    configuration, 8
    control, 4
    CPU, 4
    current symbol, 8
    decidable, 294
    decides a set, 11
    deciding a language, 11, 307
    definition, 8
    deterministic, 8
    for addition, 6
    function computed, 9
    Gödel number, 71
    index number, 71
    input symbol, 8
    instruction, 5, 8
    language accepted, 294
    language decided, 294
    language recognized, 294
    multitape, 20
    next action, 5
    next state, 5, 8
    next-state function, 8
    nondeterminism, 308
    numbering, 71
    palindrome, 14
    present state, 5, 8
    present symbol, 5
    recognizing a language, 11
    simulator, 38–39
    tape alphabet, 8
    transition function, 8
    universal, 81–83
    with oracle, 112
Turing reducibility, 319
Turing reducible, 112, 319
Turing, A
    picture, 3
Turnpike problem, 318
two-sided inverse, 376
unbounded minimization, 33
unbounded search, 33
uncountable, 75
undecidable, 91
Unicode, 182, 400
uniformity, 83–84
Universal Turing machine, 81–83
universality, 80–89
unpairing function, 67, 136
unreachable state, 105
Unsolvability
    in intellectual culture, 127–129
unsolvability, 106
unsolvable problem, 91, 106
use-mention distinction, 123
value, of a function, 372
verifier, 312
    polytime, 312
vertex, 158
    rank, 166
    reachable, 160, 282
vertex cover, 283
Vertex Cover problem, 283, 291, 325, 333
Vertex-to-Vertex Path problem, 282, 301, 306, 320
von Neumann, J
    architecture, 43
    picture, 43
walk, 159
walk length, 159
weight, 159
weighted graph, 159
well-defined, 372, 374
wire, 303
witness, 312
word, see string
working state, 12
XOR, 42
⊢, yields
    for Finite State machines, 185
    for nondeterministic Finite State machines, 193, 197
    for Turing machines, 9
Zeno's Paradox, 62
zero function, 24, 35, 49
Zoo, Complexity, 349