Computation Book
Making Connections
Jim Hefferon
https://hefferon.net/computation
Notation summary
Notation Description
P (𝑆) power set, collection of all subsets of 𝑆
𝑆c complement of the set 𝑆
1𝑆 characteristic function of the set 𝑆
⟨𝑎 0, 𝑎 1, ... ⟩ sequence
N, Z, Q, R natural numbers { 0, 1, ... }, integers, rationals, reals
a, b, . . . 0, 1 character (note the typeface)
Σ alphabet, set of characters
B alphabet of bits characters { 0, 1 }, or set of bits { 0, 1 }
𝜎, 𝜏 strings (any lower-case Greek letter except 𝜙 )
𝜀 empty string
Σ∗ set of all strings over the alphabet
L language, a subset of Σ∗
P Turing machine
𝜙 function computed by a Turing machine
𝜙 (𝑥)↓, 𝜙 (𝑥)↑ function converges on that input, or diverges
G graph
M Finite State machine
O (𝑓 ) order of growth of the function
C complexity class
Prob problem
V verifier for an NP language
Research into learning shows that content is best learned within context
. . . , when the learner is active, and that above all, when the learner can
actively construct knowledge by developing meaning and ‘layered’
understanding.
– A W (Tony) Bates, TEACHING IN A DIGITAL AGE
Jim Hefferon
Jericho, VT USA
University of Vermont
hefferon.net
Version 1.11, 2024-Oct-12
Contents
I Mechanical Computation 3
1 Turing machines . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
Computable functions . . . . . . . . . . . . . . . . . . . . . . . . 9
2 Church’s Thesis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
History . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
Evidence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
What it does not say . . . . . . . . . . . . . . . . . . . . . . . . . 17
An empirical question? . . . . . . . . . . . . . . . . . . . . . . . 17
Using Church’s Thesis . . . . . . . . . . . . . . . . . . . . . . . . 18
3 Recursion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
Primitive recursion . . . . . . . . . . . . . . . . . . . . . . . . . 21
4 General recursion . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
Ackermann functions . . . . . . . . . . . . . . . . . . . . . . . . 31
𝜇 recursion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
A Turing machine simulator . . . . . . . . . . . . . . . . . . . . . . . 38
B Hardware . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
C Game of Life . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
D Ackermann’s function is not primitive recursive . . . . . . . . . . . . 47
E LOOP programs . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
II Background 59
1 Infinity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
Cardinality . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
2 Cantor’s correspondence . . . . . . . . . . . . . . . . . . . . . . . . 66
3 Diagonalization . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74
Diagonalization . . . . . . . . . . . . . . . . . . . . . . . . . . . 74
4 Universality . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80
Universal Turing machine . . . . . . . . . . . . . . . . . . . . . . 81
Uniformity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83
Parametrization . . . . . . . . . . . . . . . . . . . . . . . . . . . 84
5 The Halting problem . . . . . . . . . . . . . . . . . . . . . . . . . . 89
Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89
General unsolvability . . . . . . . . . . . . . . . . . . . . . . . . 91
Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 94
6 Rice’s Theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 100
7 Computably enumerable sets . . . . . . . . . . . . . . . . . . . . . . 106
8 Oracles . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111
Jumping . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 115
9 Fixed point theorem . . . . . . . . . . . . . . . . . . . . . . . . . . 119
When diagonalization fails . . . . . . . . . . . . . . . . . . . . . 120
Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 122
A Hilbert’s Hotel . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 126
B Unsolvability in intellectual culture . . . . . . . . . . . . . . . . . . 127
C Self Reproduction . . . . . . . . . . . . . . . . . . . . . . . . . . . 129
D Busy Beaver . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 133
E Cantor in code . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 136
IV Automata 179
1 Finite State machines . . . . . . . . . . . . . . . . . . . . . . . . . 179
Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 179
2 Nondeterminism . . . . . . . . . . . . . . . . . . . . . . . . . . . . 189
Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 190
Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 192
𝜀 transitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 194
Equivalence of the machine types . . . . . . . . . . . . . . . . . . 198
3 Regular expressions . . . . . . . . . . . . . . . . . . . . . . . . . . 204
Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 204
Kleene’s Theorem . . . . . . . . . . . . . . . . . . . . . . . . . . 206
4 Regular languages . . . . . . . . . . . . . . . . . . . . . . . . . . . 214
Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 214
Closure properties . . . . . . . . . . . . . . . . . . . . . . . . . . 214
5 Non-regular languages . . . . . . . . . . . . . . . . . . . . . . . . . 220
6 Pushdown machines . . . . . . . . . . . . . . . . . . . . . . . . . . 226
Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 227
A Regular expressions in the wild . . . . . . . . . . . . . . . . . . . . 234
B The Myhill-Nerode theorem . . . . . . . . . . . . . . . . . . . . . . 242
C Machine minimization . . . . . . . . . . . . . . . . . . . . . . . . . 249
Appendix 369
A Strings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 370
B Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 372
C Propositional logic . . . . . . . . . . . . . . . . . . . . . . . . . . . 377
Notes 382
Bibliography 417
Part One
Classical Computability
Chapter
I Mechanical Computation
What can be computed? For instance, the function that doubles its input, that
takes in 𝑥 and puts out 2𝑥 , is intuitively mechanically computable. We shall call
such functions effective.
The question asks what things can be computed more than it asks how to
compute them. In this Part we will be more interested in the function, in
the input-output behavior, than in the details of implementing that behavior.
Section
I.1 Turing machines
Despite this desire to downplay implementation, we follow the approach of
A Turing that the first step toward defining the set of computable
functions is to reflect on the details of what mechanisms can do.
The context of Turing’s thinking was the Entscheidungsproblem,†
proposed in 1928 by D Hilbert and W Ackermann, which asks for an
algorithm that decides, after taking as input a mathematical state-
ment, whether that statement is true or false. So he considered the
kind of symbol-manipulating computation familiar in mathematics,
such as when we expand nested brackets or verify a step in a plane
geometry proof.
After reflecting on it for a while, one day after a run‡ Turing lay
down in the grass and imagined a clerk doing by-hand multiplication
with a sheet of paper that gradually becomes covered with columns
of numbers. With this as a prototype, Turing posited conditions for
the computing agent.
First, it (or he or she) has a memory facility, such as the clerk’s
paper, where it can put information for later retrieval.
[Photo: Alan Turing, 1912–1954]
Second, the computing agent must follow a definite procedure, a
precise set of instructions with no room for creative leaps. Part of what makes the
procedure definite is that the instructions don’t involve random methods, such as
counting clicks from radioactive decay to determine which of two possibilities to
perform.
The other thing making the procedure definite is that the agent is discrete —
it does not use continuous methods or analog devices. Thus there is no question
about the precision of operations as there might be when reading results off of a
[Image: copyright Kevin Twomey, http://kevintwomey.com/lowtech.html]
† German for “decision problem.” Pronounced en-SHY-dungs-prob-lem.
‡ He was a serious candidate for the 1948 British Olympic marathon team.
slide rule or an instrument dial. In line with this, the agent works in a step-by-step
fashion. If needed they could pause between steps, note where they are (“about to
carry a 1”), and pick up again later. We say that at each moment the clerk is in one
of a finite set of possible states, which we denote 𝑞 0 , 𝑞 1 , . . .
Turing’s third condition arose because he wanted to investigate what is com-
putable in principle. He therefore imposed no upper bound on the amount of
available memory. More precisely, he imposed no finite upper bound — should
a calculation threaten to run out of storage space then more is provided. This
includes imposing no upper bound on the amount of memory available for inputs or
for outputs and no bound on the amount of extra storage, scratch memory, needed
in addition to that for inputs and outputs.† He similarly put no upper bound on
the number of instructions. And, he left unbounded the number of steps that a
computation performs before it finishes.‡
The final question Turing faced is: how smart is the computing agent? For
instance, can it multiply? We don’t need to include a special facility for multiplica-
tion because we can in principle multiply via repeated addition. We don’t even
need addition because we can repeat the add-one operation. In this way Turing
pared the computing agent down until it is quite basic, quite easy to understand,
until the operations are so elementary that we cannot easily imagine them further
divided, while still keeping that agent powerful enough to do anything that can in
principle be done.
The tape is the memory, sometimes called the ‘store’. The box can read from it
and write to it, one character at a time, as well as move a read/write head relative
to the tape in either direction. Thus, to multiply, the computing agent can start by
reading the two input multiplicands from the tape (the drawing shows 74 and 72
in binary, separated by a blank), can use the tape for scratch work, and can halt
with the output written on the tape.
The box is the computing agent, the CPU, sometimes called the ‘control’.
† True, every existing physical computer has bounded memory, putting aside storing things in the Cloud.
However, that space is extremely large. In this Part, when working with the model devices, imposing
a bound on memory is a hindrance or at best irrelevant.
‡ Some authors describe the availability of resources such as the amount of memory as ‘infinite’.
Turing himself does this. A reader may object
that this violates the goal of the definition, to model in-principle-physically-realizable computations,
and so the development here instead says that the resources have no finite upper bound. But really, it
doesn’t matter. In both cases the point is that if something cannot be computed when there are no
bounds then it cannot be computed on any real-world device.
The Start button sets the computation going. When the computation is finished the
Halt light comes on. The engineering inside the box is not important — perhaps
like the machines that we are used to it has integrated circuits, or perhaps it has
gears and levers, or perhaps LEGO’s — what matters is that each of its finitely many
parts can only be in finitely many states. If it has chips then each register has a
finite number of possible values, while if it is made with gears or bricks then each
settles in only a finite number of possible positions. Thus, however it is made, in
total the box has only finitely many states.
While executing a calculation, the mechanism steps from state to state. For
instance, an agent doing multiplication may determine, because of what state it is
in now and because of what it is reading on the tape, that they next need to carry
a 1. The agent transitions to a new state, one whose intuitive meaning is that it is
where carries take place.
Consequently, machine steps involve four pieces of information. Call the present
state 𝑞𝑝 and the next state 𝑞𝑛 . The symbol that the read/write head is presently
pointing to is 𝑇𝑝 . Finally, the next tape action is 𝑇𝑛 . Possible actions are: moving the
tape head left or right without writing, which we denote with 𝑇𝑛 = L or 𝑇𝑛 = R,†
or writing a symbol to the tape without moving the head, which we denote with
that symbol, so that 𝑇𝑛 = 1 means the machine will write a 1 to the tape. As to the
set of characters that can go on the tape, we will choose whatever is convenient
for the job we are doing. However, every tape is blank in all but finitely many
places, and so blank must be one of the symbols. (We denote blank with B when an
empty space could cause confusion.)
The four-tuple 𝑞𝑝𝑇𝑝𝑇𝑛𝑞𝑛 is an instruction. For example, the instruction 𝑞 3 1B𝑞 5
is executed only if the machine is now in state 𝑞 3 and is reading a 1 on the tape. If
so, the machine writes a blank to the tape, replacing the 1, and passes to state 𝑞 5 .
1.1 Example This Turing machine with the tape symbol set Σ = { B, 1 } has six
instructions.
Ppred = {𝑞 0 BL𝑞 1, 𝑞 0 1R𝑞 0, 𝑞 1 BL𝑞 2, 𝑞 1 1B𝑞 1, 𝑞 2 BR𝑞 3, 𝑞 2 1L𝑞 2 }
We adopt the convention that when we press Start the machine is in state 𝑞 0 . The
picture above shows the machine reading 1, so instruction 𝑞 0 1R𝑞 0 applies. Thus the
first step is that the machine moves its tape head right and stays in state 𝑞 0 . The
first line of the following table shows this and later lines show the configurations
after later steps. Briefly, the head slides to the right, blanks out the final 1, and
slides back to the start.
† Whether we move the tape or the head doesn’t matter; what matters is their relative motion. Thus
𝑇𝑛 = L means that either the tape or the head moves so that the head now points one place to the left.
In drawings we hold the tape steady and move the head because the graphics are easier to read.
[Trace table: the graphics for steps 2 through 9 show the head sliding right past the 1’s, blanking the final 1, and sliding back, halting in state 𝑞3.]
Interpreted on natural numbers represented in unary, the machine Ppred computes the predecessor function.
pred(𝑥) = 𝑥 − 1 if 𝑥 > 0, and pred(𝑥) = 0 otherwise
If the machine’s initial tape is entirely blank except for 𝑛 -many consecutive 1’s and
the read/write head points to the leftmost of those 1’s, then when the machine
halts the tape will have 𝑛 − 1-many 1’s. The only exception is where the tape starts
with 0-many 1’s, and there the tape will end with 0-many 1’s.
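The trace just described can be reproduced in a few lines of code. This is a sketch of our own (the book presents its own simulator in the Part One appendix), storing the six instructions of Ppred as a Python dictionary and the tape as a map from cell index to character:

```python
# A small sketch (not the book's code) simulating the machine Ppred.
# Instructions q_p T_p T_n q_n are stored as {(q_p, T_p): (T_n, q_n)}.
PRED = {
    ('q0', 'B'): ('L', 'q1'), ('q0', '1'): ('R', 'q0'),
    ('q1', 'B'): ('L', 'q2'), ('q1', '1'): ('B', 'q1'),
    ('q2', 'B'): ('R', 'q3'), ('q2', '1'): ('L', 'q2'),
}

def run(machine, tape_string):
    """Load the input, run until no instruction applies, return the 1's left."""
    tape = {i: c for i, c in enumerate(tape_string)}
    state, head = 'q0', 0
    while (state, tape.get(head, 'B')) in machine:
        action, state = machine[(state, tape.get(head, 'B'))]
        if action == 'L':
            head -= 1
        elif action == 'R':
            head += 1
        else:                      # write a symbol; the head stays put
            tape[head] = action
    return ''.join(c for c in tape.values() if c == '1')

print(run(PRED, '111'))   # three 1's in, two 1's out: pred(3) = 2
print(run(PRED, ''))      # pred(0) = 0: the tape stays blank
```

Running it on a tape of three 1’s reproduces the trace: the head slides right, blanks the final 1, and returns, leaving two 1’s.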
1.2 Example We can think of this machine with tape alphabet Σ = { B, 1 } as adding
two natural numbers.
The input numbers are represented by two strings of 1’s, separated with a blank.
The read/write head starts under the first symbol in the first number. This shows
the machine ready to compute 2 + 3.
[Tape graphic: 11 B 111 with the read/write head under the first 1, in state 𝑞0.]
We adopt the convention that this is the configuration at step 0. Now the machine
scans right, looking for the blank separator. It changes that into a 1, then scans
left until it finds the start. Finally, it trims off a 1 and halts with the read/write
head pointing to the start of the string. Here are the steps.
[Trace table: the graphics for steps 2 through 12 show the head scanning right, converting the separator blank to a 1, scanning back to the left end, blanking the first 1, and halting in state 𝑞5 at the start of the five remaining 1’s.]
We can also present a machine with a table for its transition function, or with a
graph whose edge labels give the symbol read and the action taken. Here are the
tables for the two machines; a dash marks a pair where no instruction applies.

Δpred   B      1
𝑞0      L𝑞1    R𝑞0
𝑞1      L𝑞2    B𝑞1
𝑞2      R𝑞3    L𝑞2
𝑞3      –      –

Δadd    B      1
𝑞0      B𝑞1    R𝑞0
𝑞1      1𝑞1    1𝑞2
𝑞2      B𝑞3    L𝑞2
𝑞3      R𝑞3    B𝑞4
𝑞4      R𝑞5    1𝑞5
𝑞5      –      –

[State-transition graphs for the two machines, with states 𝑞0–𝑞3 and 𝑞0–𝑞5 and edges labeled like B, L for read-blank-then-move-left, are omitted.]
A graph is how we will most often present a machine that is small, but if there
are lots of states then it can be visually confusing.
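As a check, the Δadd table can be run directly. This sketch (our code, not the book’s) simulates it on the 2 + 3 tape of Example 1.2 and confirms that it halts in state 𝑞5 after twelve steps with five 1’s on the tape:

```python
# Sketch (not from the book): run the Delta_add transition table on 2 + 3.
DELTA_ADD = {
    ('q0', 'B'): ('B', 'q1'), ('q0', '1'): ('R', 'q0'),
    ('q1', 'B'): ('1', 'q1'), ('q1', '1'): ('1', 'q2'),
    ('q2', 'B'): ('B', 'q3'), ('q2', '1'): ('L', 'q2'),
    ('q3', 'B'): ('R', 'q3'), ('q3', '1'): ('B', 'q4'),
    ('q4', 'B'): ('R', 'q5'), ('q4', '1'): ('1', 'q5'),
}

def simulate(delta, tape_string):
    """Run from state q0 until no instruction applies; report state, steps, 1's."""
    tape = {i: c for i, c in enumerate(tape_string)}
    state, head, steps = 'q0', 0, 0
    while (state, tape.get(head, 'B')) in delta:
        action, state = delta[(state, tape.get(head, 'B'))]
        if action == 'L':
            head -= 1
        elif action == 'R':
            head += 1
        else:
            tape[head] = action
        steps += 1
    ones = sum(1 for c in tape.values() if c == '1')
    return state, steps, ones

print(simulate(DELTA_ADD, '11B111'))   # ('q5', 12, 5): 2 + 3 = 5
```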
Next, a crucial observation. Some Turing machines, for at least some starting
configurations, never halt.
1.3 Example The machine Pinf loop = {𝑞 0 BB𝑞 0, 𝑞 0 11𝑞 0 } never halts, regardless of the
input.
[State graph: a single state 𝑞0 with two self-loops, labeled B, B and 1, 1.]
The exercises ask for examples of Turing machines that halt on some inputs and
not on others.
High time for definitions. We take a symbol to be something that the device
can write and read, for storage and retrieval.†
1.4 Definition A Turing machine P is a finite set of four-tuple instructions 𝑞𝑝𝑇𝑝𝑇𝑛𝑞𝑛 .‡
In an instruction, the present state 𝑞𝑝 and next state 𝑞𝑛 are elements of a set
of states 𝑄 . The input symbol or current symbol 𝑇𝑝 is an element of the tape
alphabet set Σ, which contains at least two members including one called blank
(and does not contain L or R). The action symbol 𝑇𝑛 is an element of the action
set Σ ∪ { L, R }.
The set P must be deterministic: different four-tuples cannot begin with the
same 𝑞𝑝𝑇𝑝 . Thus, over the set of instructions 𝑞𝑝𝑇𝑝𝑇𝑛𝑞𝑛 ∈ P, the association of
present pair 𝑞𝑝𝑇𝑝 with next pair 𝑇𝑛𝑞𝑛 defines a function, the transition function
or next-state function Δ : 𝑄 × Σ → (Σ ∪ { L, R }) × 𝑄 .
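The determinism condition is a purely syntactic check: no two four-tuples may share the same present pair 𝑞𝑝𝑇𝑝. A sketch in Python (the names are our own, not the book’s):

```python
def is_deterministic(instructions):
    """True when no two four-tuples (q_p, T_p, T_n, q_n) share a present pair."""
    present_pairs = [(qp, tp) for qp, tp, tn, qn in instructions]
    return len(present_pairs) == len(set(present_pairs))

def transition_function(instructions):
    """Turn a deterministic instruction set into the map (q_p, T_p) -> (T_n, q_n)."""
    assert is_deterministic(instructions)
    return {(qp, tp): (tn, qn) for qp, tp, tn, qn in instructions}

# The six instructions of Ppred from Example 1.1.
P_PRED = [('q0', 'B', 'L', 'q1'), ('q0', '1', 'R', 'q0'),
          ('q1', 'B', 'L', 'q2'), ('q1', '1', 'B', 'q1'),
          ('q2', 'B', 'R', 'q3'), ('q2', '1', 'L', 'q2')]
print(is_deterministic(P_PRED))                             # True
print(is_deterministic(P_PRED + [('q0', '1', 'L', 'q2')]))  # clash on (q0, 1): False
```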
Of course, the point of these machines is what they do. To finish the formaliza-
tion we now give a complete description of a machine’s action.
In tracing through Example 1.1 and Example 1.2 we saw that a Turing machine
acts by governing the transitions as that machine moves step by step. A configuration
of a Turing machine is a four-tuple ⟨𝑞, 𝑠, 𝜏𝐿 , 𝜏𝑅 ⟩ , where 𝑞 is a state, a member
of 𝑄 , 𝑠 is a character from the tape alphabet Σ, and 𝜏𝐿 and 𝜏𝑅 are strings from Σ∗,
including possibly the empty string 𝜀 . These signify the current state, the character
under the read/write head, and the tape contents to the left and right of the head.
For instance, in the trace table of Example 1.2, the ‘Step 2’ line shows that after
two transitions the state is 𝑞 = 𝑞 0 , the character under the head is the blank 𝑠 = B,
to the left of the head is 𝜏𝐿 = 11, and to the right is 𝜏𝑅 = 111. Thus the graphic
on that line pictures the configuration ⟨𝑞 0, B, 11, 111⟩ . That is, a configuration is a
snapshot, an instant in a computation.
We write C (𝑡) for the machine’s configuration after the 𝑡 -th transition and say
that this is the configuration at step 𝑡 . We extend that to step 0 by saying that the
initial configuration C ( 0) is the machine’s configuration before we press Start.
Then to define the action: suppose that at step 𝑡 the machine P is in configuration
C (𝑡) = ⟨𝑞, 𝑠, 𝜏𝐿 , 𝜏𝑅 ⟩ . To make the next transition, look for an instruction 𝑞𝑝𝑇𝑝𝑇𝑛 𝑞𝑛 ∈
P with 𝑞𝑝 = 𝑞 and 𝑇𝑝 = 𝑠 . The condition of determinism ensures that the set P
has at most one such instruction. If there is no such instruction then at step 𝑡 + 1
the machine P halts.
Otherwise, there are three possibilities. (1) If 𝑇𝑛 is a symbol in the tape alphabet
set Σ then the machine writes that symbol to the tape, so that the next configuration
† How the device does this depends on its construction details. It could read and write marks on a
paper tape, align magnetic particles on a plastic tape, twiddle bits on a solid state drive, or it could
push LEGO bricks to the left or right side of a slot. Discreteness ensures that the machine can cleanly
distinguish between the symbols, in contrast with the trouble that can happen, for instance, in reading
an instrument dial near a boundary.
‡ We denote a Turing machine with a P because although these
machines are hardware, the things from everyday experience that they are most like are programs.
is C (𝑡 + 1) = ⟨𝑞𝑛 ,𝑇𝑛 , 𝜏𝐿 , 𝜏𝑅 ⟩ . (2) If 𝑇𝑛 = L then the machine moves the tape head
to the left. So the next configuration is C (𝑡 + 1) = ⟨𝑞𝑛 , 𝑠ˆ, 𝜏ˆ𝐿 , 𝜏ˆ𝑅 ⟩ where 𝜏ˆ𝑅 is the
concatenation of the one-character string ⟨𝑠⟩ with 𝜏𝑅 , where if 𝜏𝐿 = 𝜀 then 𝑠ˆ is
the blank and 𝜏ˆ𝐿 = 𝜀 , and otherwise where 𝑠ˆ = 𝜏𝐿 [−1] and 𝜏ˆ𝐿 = 𝜏𝐿 [ : −1] . (3) If
𝑇𝑛 = R then the machine moves the tape head to the right. This is like (2) so we
omit the details.
If two configurations are related by being a step apart then we write C (𝑖) ⊢
C (𝑖 + 1) .† A computation is a sequence C ( 0) ⊢ C ( 1) ⊢ C ( 2) ⊢ · · · . We abbreviate a
sequence of ⊢’s with ⊢∗ .‡ If the computation halts then the sequence has a final
configuration C (ℎ) so we could write a halting computation as C ( 0) ⊢∗ C (ℎ) .
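The case analysis above translates almost line for line into code. This sketch (ours; the names are assumptions, not the book’s) represents a configuration as the four-tuple ⟨𝑞, 𝑠, 𝜏𝐿, 𝜏𝑅⟩ and computes one application of the yields relation ⊢, using the same 𝜏𝐿[−1] and 𝜏𝐿[:−1] slices as the text:

```python
def step(delta, config):
    """One application of the yields relation: C(t) |- C(t+1), or None on halt."""
    q, s, left, right = config
    if (q, s) not in delta:
        return None                      # no instruction applies: the machine halts
    action, q_next = delta[(q, s)]
    if action == 'L':                    # head moves left; s joins the right string
        new_right = s + right
        if left == '':
            return (q_next, 'B', '', new_right)
        return (q_next, left[-1], left[:-1], new_right)
    if action == 'R':                    # mirror image of the L case
        new_left = left + s
        if right == '':
            return (q_next, 'B', new_left, '')
        return (q_next, right[0], new_left, right[1:])
    return (q_next, action, left, right)  # write a symbol; head stays put

DELTA_PRED = {('q0', 'B'): ('L', 'q1'), ('q0', '1'): ('R', 'q0'),
              ('q1', 'B'): ('L', 'q2'), ('q1', '1'): ('B', 'q1'),
              ('q2', 'B'): ('R', 'q3'), ('q2', '1'): ('L', 'q2')}
c = ('q0', '1', '', '11')        # C(0): Ppred loaded with 111
while (n := step(DELTA_PRED, c)) is not None:
    c = n
print(c)                         # halting configuration, in state q3
```

Iterating `step` until it returns `None` produces exactly the halting computation C(0) ⊢∗ C(ℎ).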
1.5 Example In Example 1.1’s table tracing the machine’s steps, the graphics illustrate
the successive configurations. Here is the same sequence as a computation.
[Computation graphic: the tape pictures from that table, linked by ⊢, running from state 𝑞0 to the halting state 𝑞ℎ.]
Finally, as in that example, observe that our description of the action of a Turing
machine emphasizes that it is a state machine — a computation is a sequence of
discrete transitions.
But there are a couple of things that the definition must take care with. First, a
Turing machine may fail to halt on some input strings. Second, just specifying
the input string is not enough since the initial position of the head can change the
computation.
1.6 Definition Let P be a Turing machine with tape alphabet Σ. For input 𝜎 ∈ Σ∗,
placing that on an otherwise blank tape and pointing P ’s read/write head to
𝜎 ’s left-most symbol is loading that input. If we start P with 𝜎 loaded and it
eventually halts then we denote the associated output string as 𝜙 P (𝜎) . If the
machine never halts then 𝜎 has no associated output. The function computed by
the machine P is the set of associations 𝜎 ↦→ 𝜙 P (𝜎) .
† Read ‘⊢’ aloud as “yields.” ‡ Read ‘⊢∗’ aloud as “yields eventually.”
1.7 Definition For 𝜎 ∈ Σ∗, if the value of a Turing machine computation is not
defined on 𝜎 then we say that the function computed by the machine diverges on
that input, written 𝜙 P (𝜎)↑ (or 𝜙 P (𝜎) = ⊥ ). Otherwise we say that it converges,
𝜙 P (𝜎)↓.
Note the difference between the machine P and the function computed by
that machine, 𝜙 P . For example, the machine Ppred is a set of four-tuples but
the predecessor function is a set of input-output pairs, which we might denote
𝑥 ↦→ pred (𝑥) . Another example of the difference is that machines halt or fail to
halt, while functions converge or diverge.
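Divergence is what keeps a simulator honest: it can report that 𝜙P(𝜎)↓ by running the machine to a halt, but it can never positively report 𝜙P(𝜎)↑; the best it can do is give up after a step budget. A sketch under our own conventions (start state 𝑞0, blanks stripped from the output):

```python
def phi(delta, tape_string, budget=10_000):
    """Run at most `budget` steps; return the output string if the machine
    halts (phi converges), or None for 'no answer yet' (maybe divergent)."""
    tape = {i: c for i, c in enumerate(tape_string)}
    state, head = 'q0', 0
    for _ in range(budget):
        symbol = tape.get(head, 'B')
        if (state, symbol) not in delta:          # halted: phi converges
            return ''.join(c for c in tape.values() if c != 'B')
        action, state = delta[(state, symbol)]
        if action == 'L':
            head -= 1
        elif action == 'R':
            head += 1
        else:
            tape[head] = action
    return None                                   # budget exhausted

# Example 1.3's machine never halts, so the budget always runs out.
INF_LOOP = {('q0', 'B'): ('B', 'q0'), ('q0', '1'): ('1', 'q0')}
PRED = {('q0', 'B'): ('L', 'q1'), ('q0', '1'): ('R', 'q0'),
        ('q1', 'B'): ('L', 'q2'), ('q1', '1'): ('B', 'q1'),
        ('q2', 'B'): ('R', 'q3'), ('q2', '1'): ('L', 'q2')}
print(phi(PRED, '111'))      # '11': the machine halts, phi converges
print(phi(INF_LOOP, '1'))    # None: no verdict after the budget
```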
More points: (1) When there is only one machine under discussion then we
write 𝜙 instead of 𝜙 P . (2) In this book we like to write machines so that they
also finish with the head under the first character of the output string, which isn’t
strictly necessary but it makes composing machines easier. (3) In other fields of
mathematics a function comes with a domain, the set of inputs on which it is
defined. In this field the convention is to write 𝜙 : Σ∗ → Σ∗ and describe it as a
partial function, where some 𝑊 ⊆ Σ∗ is the set of input strings 𝜎 such that 𝜙 (𝜎)↓.
If 𝑊 = Σ∗ then 𝜙 is said to be a total function. (Every 𝜙 is partial but saying
‘partial’ usually connotes that the function is not total.)
There is one more point to raise about the definition. We will often consider
a function that isn’t an association of string input and output, and describe it as
computed by a machine. For this we must impose an interpretation on the strings.
For instance, with the predecessor machine in Example 1.1 we took the strings
to represent natural numbers in unary. The same holds for computations with
non-numbers, such as directed graphs, where we also just fix some encoding of
the input and output strings. (We could worry that our interpretation might be so
involved that, as with a horoscope, the work happens in the interpretation. But
we will stick to cases such as the unary representation of numbers where this is
not an issue.) Of course, the same thing happens on physical computers, where
the machine twiddles bitstrings and then we interpret them as characters in a
document, or notes in a quartet, or however we please.
When we describe the function computed by a machine, we typically omit the
part about interpreting the strings. We say, “this shows that 𝜙 ( 3) = 5” rather than,
“this shows that 𝜙 takes a string representing 3 to a string representing 5.” The
details of the representation are usually not of interest in this chapter (in the fifth
chapter we will sometimes worry about the time or space that they consume).
1.8 Remark Early researchers, working before actual machines were widely available,
needed airtight proofs that for instance there is a mechanical computation of the
function that takes in a number and returns the power of 5 in that number’s prime
factorization. So they did the details, building up a large body of work which
could be quite low level.
As an example of low-level detail, in the addition machine Example 1.2 we
took the separator blank to be significant. Allowing significant blanks raises the
issue of ambiguity: which of the blanks on the tape count as input and output and
which do not? We could handle this by adding a character to the alphabet to use
exclusively as a begin/end marker. Or we could enforce that strings come in the
form 𝜎 = 𝛼 B𝜏 where 𝜏 consists of |𝛼 | many 1’s. Or we could code everything with
integers, such as coding the triple ⟨7, 8, 9⟩ as 2⁷3⁸5⁹.
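The prime-power coding in that last convention is easy to make concrete. A sketch (ours): the triple is recovered from the code number by counting how many times each prime divides it.

```python
def encode(triple):
    """Code <a, b, c> as 2^a * 3^b * 5^c."""
    a, b, c = triple
    return 2**a * 3**b * 5**c

def decode(n):
    """Recover the triple by counting how many times each prime divides n."""
    triple = []
    for p in (2, 3, 5):
        k = 0
        while n % p == 0:
            n //= p
            k += 1
        triple.append(k)
    return tuple(triple)

print(encode((7, 8, 9)))          # 2**7 * 3**8 * 5**9
print(decode(encode((7, 8, 9))))  # (7, 8, 9)
```

Because prime factorizations are unique, `decode` is a true inverse of `encode`, so no information is lost in the coding.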
In this book we typically don’t work through these details. Our everyday
experience convinces us that machines can use their alphabet to reasonably
represent anything computable. Besides, spending a great deal of time on these
details risks hiding the underlying ideas, and we want to get to more interesting
material. The next section will say more.
1.9 Definition A computable function, or recursive function,† is one computed by
some Turing machine (it may be a total function or partial). A computable set,
or recursive set, is one whose characteristic function is computable. A Turing
machine decides a set if it computes the characteristic function of that set. A
relation is computable if it is computable as a set.‡
There is a terminology focused on Boolean functions that we will emphasize in
Chapter Five.
1.10 Definition A Turing machine decides a language if for all strings in the language
it halts and accepts (perhaps signaled by ending with just a 1 on the tape) and
for all strings not in the language it halts and rejects (perhaps ending with all
blanks). A Turing machine recognizes a language if for all members of the
language it halts and accepts, while for nonmembers it does not accept (it may
halt and reject, or it may fail to halt).
We close with a summary. We have defined mechanical computation. We
view it as a process whereby a physical system evolves through a sequence of
discrete steps that are local, meaning that all the action takes place within one cell
of the head. This gives us a precise characterization of which functions can be
mechanically computed. The next subsection discusses why this characterization
is widely accepted.
I.1 Exercises
Unless the exercise says otherwise, assume that Σ = { B, 1 }. Also assume that any
machine must start with its head under the leftmost input character and arrange for
it to end with the head under the leftmost output character.
1.11 How is a Turing machine like a program? How is it unlike a program? How
is it like the kind of computer we have on our desks? Unlike?
1.12 Why does the definition of a Turing machine, Definition 1.4, not include a
definition of the tape?
1.13 Your study partner asks you, “The opening paragraphs talk about the Entschei-
dungsproblem, to mechanically determine whether a mathematical statement is
† The term ‘recursive’ used to be universal but is now old-fashioned. ‡ For instance, the relation ‘less
than’ is recursive because there is a recursive function that inputs two integers 𝑎 and 𝑏 and returns 1 if
𝑎 < 𝑏 but otherwise returns 0.
true or false. I write programs with bits like if (x>3) all the time. What’s the
problem?” Help your friend out.
✓ 1.14 Trace each computation, as in Example 1.5. (a) The machine Ppred
from Example 1.1 when starting on a tape with two 1’s. (b) The machine Padd
from Example 1.2 when the addends are 2 and 2. (c) Give the two computations as
configuration sequences, as on page 8.
✓ 1.15 For each of these false statements about Turing machines, briefly explain
the fallacy. (a) Turing machines are not a complete model of computation
because they can’t do negative numbers. (b) The problem with Example 1.3 is
that the instructions don’t have any extra states where the machine goes to halt.
(c) For a machine to reach state 𝑞50 it must run for at least fifty-one steps.
1.16 We often have some states that are halting states, where we send the machine
solely to make it halt. In this case the others are working states. For instance,
Example 1.1 uses 𝑞 3 as a halting state and its working states are 𝑞 0 , 𝑞 1 , and 𝑞 2 .
Name Example 1.2’s halting and working states.
✓ 1.17 Trace the execution of Example 1.3’s Pinf loop for ten steps, from a blank
tape.
1.18 Trace the execution on each input of this Turing machine with alphabet
Σ = { B, 0, 1 } for ten steps, or fewer if it halts.
move to the end of the 1’s, past a blank, and put down two 1’s. Then move
left until you are at the start of the first sequence of 1’s. Repeat.
(b) Instead assume that the alphabet is Σ = { B, 0, 1 } and the input is represented
in binary.
✓ 1.26 Produce a Turing machine that takes as input a number 𝑛 written in unary,
represented as 𝑛 -many 1’s, and if 𝑛 is odd then it gives as output the number 1 in
unary, with the head under that 1, while if 𝑛 is even it gives the number 0 (which
in a unary representation means the tape is blank).
1.27 Write a machine P with tape alphabet Σ consisting of blank B, stroke 1, and
the comma ‘,’ character. Where Σ0 = Σ − { B }, if we interpret the input 𝜎 ∈ Σ0 as
a comma-separated list of natural numbers represented in unary, then this machine
should return the sum, also in unary. Thus, 𝜙 P ( 1111,111,1) = 11111111.
1.28 Is there a Turing machine configuration without any predecessor? Restated,
is there a configuration C = ⟨𝑞, 𝑠, 𝜏𝐿, 𝜏𝑅⟩ for which there does not exist any
configuration Ĉ = ⟨𝑞̂, 𝑠̂, 𝜏̂𝐿, 𝜏̂𝑅⟩ and instruction I = 𝑞̂ 𝑠̂𝑇𝑛𝑞𝑛 such that if a machine
is in configuration Ĉ then instruction I applies and Ĉ ⊢ C?
1.29 One way to argue that Turing machines can do anything that a modern
CPU can do involves showing how to do all of the CPU’s operations on a Turing
machine. For each, describe a Turing machine that will perform that operation.
You need not produce the machine, just outline the steps. Use the alphabet
Σ = { 0, 1, B }. (a) Take as input a 4-bit string and do a bitwise NOT, so that
each 0 becomes a 1 and each 1 becomes a 0. (b) Take as input a 4-bit string
and do a bitwise circular left shift, so that from 𝑏 3𝑏 2𝑏 1𝑏 0 you end with 𝑏 2𝑏 1𝑏 0𝑏 3 .
(c) Take as input two 4-bit strings and perform a bitwise AND.
✓ 1.30 For each, produce a machine meeting the condition. (a) It halts on exactly
one input. (b) It fails to halt on exactly one input. (c) It halts on infinitely many
inputs and fails to halt on infinitely many.
1.31 Definition 1.9 says that a set is computable if there is a Turing machine that
acts as its characteristic function. That is, the machine is started with the tape blank
except for the input string 𝜎 , and with the head under the leftmost input character.
This machine halts on all inputs, and when it halts, the tape is blank except for a
single character, and the head points to that character. That character is either 1
(meaning that the string 𝜎 is in the set) or 0 (meaning it is not). For the next three
exercises, produce a Turing machine that acts as the characteristic function of the set.
1.32 See the note above. Produce a Turing machine that acts as the characteristic
function of the set {𝜎 ∈ B∗ | 𝜎[0] = 0 } of bitstrings that start with 0.
1.33 Produce a Turing machine that acts as the characteristic function of the set
{𝜎 ∈ B∗ | 𝜎[0 : 1] = 01 } of bitstrings that start with 01.
1.34 See the note before Exercise 1.32. Produce a Turing machine that acts as the
characteristic function of the set of bitstrings that start with some number of 0’s,
including possibly zero-many of them, followed by a 1.
14 Chapter I. Mechanical Computation
1.35 Definition 1.9 talks about computable relations. Consider the ‘less than or
equal’ relation between two natural numbers. Produce a Turing machine with
Σ = { 0, 1, B } that takes in two numbers represented in unary and outputs 𝜏 = 1 if
the first number is less than or equal to the second, and 𝜏 = 0 if not.
1.36 Write a Turing machine that decides if its input is a palindrome, a string that
is the same backward as forward. Use Σ = { B, 0, 1 }. Have the machine end with a
single 1 on the tape if the input was a palindrome, and with a blank tape if not.
1.37 Turing machines tend to have many instructions and to be hard to understand.
So rather than exhibit a machine, people often give an overview. Do that for a
machine that replicates the input: if it is started with the tape blank except for a
contiguous sequence of 𝑛 -many 1’s, then it will halt with the tape containing two
sequences of 𝑛 -many 1’s separated by a single blank.
1.38 Show that if a Turing machine has the same configuration at two different
steps then it will never halt. Is that sufficient condition also necessary?
1.39 Show that the steps in the execution of a Turing machine are not necessarily
invertible. That is, produce a Turing machine and a configuration such that if you
are told the machine was brought to that configuration after some number of steps,
and you were asked what was the prior configuration, you couldn’t tell.
Section I.2 Church's Thesis
History Algorithms have always played a central role in mathematics. The simplest
example is a formula such as the one giving the height of a ball dropped from the
Leaning Tower of Pisa, ℎ(𝑡) = −4.9𝑡² + 56. This is a kind of program: get the
height output by squaring the time input, multiplying by −4.9, and adding 56.
In the 1670’s the co-creator of Calculus, G Leibniz, constructed
the first machine that could do addition, subtraction, multiplication,
division, and square roots as well. This led him to speculate on
the possibility of a machine that manipulates not just numbers but
also symbols, and could thereby determine the truth of scientific
statements. To settle any dispute, Leibniz wrote, scholars could say,
“Let us calculate!” This is a version of the Entscheidungsproblem.
[photo: Leibniz's Stepped Reckoner]

The real push to understand computation arose in 1931 from
the Incompleteness Theorem of K Gödel. This says that for any
(sufficiently powerful) axiom system there are statements that,
while true in any model of the axioms, are not provable from those
axioms. Gödel gave an algorithm that inputs the axioms and outputs
the statement. This made evident the need to precisely define what
is ‘algorithmic’ or ‘mechanically computable’ or ‘effective’.
A number of mathematicians proposed formalizations. One was A Church,†
† After producing his machine model in 1935, Turing got a PhD in 1938 under Church at Princeton.
who developed a system called the 𝜆 -calculus. Church and his students used it to
derive many intuitively computable functions such as number theoretic functions
for divisibility and prime factorization. Church suggested to the most prominent
expert in the area, Gödel, defining the set of effective functions as the set of
functions that are 𝜆 -computable. But Gödel, who was notoriously careful, was
unconvinced.
Everyone agreed that the doubler function 𝑓 (𝑥) = 2𝑥 is effective: we
can go from input to output in a way that is typographic, that pushes
symbols without any need for intuition or insight. Church and his students
had exhibited a wide class of functions that they argued are effective by
proving that they are 𝜆 calculable. But the question is: where is the far
end of this collection? Arguing that ‘derivable with the 𝜆 calculus’ implies
effective does not give the converse.
Everything changed when Gödel read Turing’s masterful analysis, outlined
in the prior section. He wrote, “That this really is the correct definition
of mechanical computability was established beyond any doubt by Turing.”

[photo: Alonzo Church, 1903–1995]
2.1 Church’s Thesis The set of things that can be computed by a discrete and
deterministic mechanism is the same as the set of things that can be computed by
a Turing machine.‡
This is central to the Theory of Computation. It says that our technical results
have a larger importance — they describe the devices that are on our desks and in
our pockets.
Evidence We cannot give a mathematical proof of Church’s Thesis. The definition
of a Turing machine, or of 𝜆 calculus or other equivalent schemes, formalizes
‘intuitively mechanically computable’. When a researcher consents to work within
this formalization they are then free to reason about computation mathematically.
So in a sense Church’s Thesis comes before the mathematics, or at any rate sits
outside its usual derivation and verification work. Turing wrote, “All arguments
which can be given are bound to be, fundamentally, appeals to intuition, and for
this reason rather unsatisfactory mathematically.”
Despite not being the conclusion of a deductive system, Church’s Thesis
is generally accepted. We will give four points in its favor that persuaded
Gödel, Church, and others at the time, and that still persuade researchers
today: coverage, convergence, consistency, and clarity.
First, coverage. Everything that is intuitively computable has proven
to be computable by a Turing machine. This includes not just the number
theoretic functions investigated by researchers in the 1930’s but also
everything ever computed by every program written for every existing
computer, because all of them can be compiled to run on a Turing machine.

[photo: Kurt Gödel, 1906–1978]
Despite this weight of evidence, the argument by coverage would collapse if
someone exhibited even one counterexample, one operation that can be done in
‡ In recent years this has come to be often called the Church-Turing Thesis. Here we figure that because Turing has the machine, we can give Church nominal possession of the thesis.
arbitrary choice, making a different choice leads to the same set of computable
functions. This is persuasive in that any proper definition of what is computable
should possess this property. For instance, if two-tape machines computed more
functions than one-tape machines and three-tape machines more than that, then
identifying the set of computable functions with those computable by single-tape
machines would be foolish. But as with the coverage and convergence arguments,
while this means that the class of Turing machine-computable functions is natural
and wide-ranging, it still leaves open a small crack of a possibility that the class
does not exhaust the list of functions that are mechanically computable.
The most persuasive single argument for Church’s Thesis — what caused Gödel
to change his mind and what still convinces scholars today — is clarity: Turing’s
analysis is compelling. Gödel noted this in the quote given earlier and Church felt
the same way, writing that Turing machines have, “the advantage of making the
identification with effectiveness . . . evident immediately.”
What it does not say Church’s Thesis does not say that in all circumstances the
best way to understand a discrete and deterministic computation is via the Turing
machine model. For example, a numerical analyst studying the performance of a
floating point algorithm should use a computer model that has registers. Church’s
Thesis says that the calculation could in principle be done by a Turing machine but
for this use registers are better because the researcher wants results that apply to
in-practice machines.†
Church’s Thesis also does not say that Turing machines are all there is to any
computation in the sense that if, say, you are working on an automobile antilock
braking system then while the Turing machine model can account for the logical
and arithmetic computations, it cannot do the entire system including sensor inputs
and actuator outputs. S Aaronson has made this point, “Suppose I . . . [argued] that
. . . [Church’s] Thesis fails to capture all of computation, because Turing machines
can’t toast bread. . . . No one ever claimed that a Turing machine could handle
every possible interaction with the external world, without first hooking it up to
suitable peripherals. If you want a Turing machine to toast bread, you need to
connect it to a toaster; then the [Turing machine] can easily handle the toaster’s
internal logic.”
In the same vein, we can get physical devices that supply a stream of random
bits. These are not pseudorandom bits that are computed by a method that
is deterministic; instead, well-established physics says these are truly random.
Turing machines are not lacking because they cannot produce the bits. Rather,
Church’s Thesis asserts that we can use Turing machines to model the discrete and
deterministic computations that we can do after we get the bits.
An empirical question? This discussion raises a big question: even if we accept
Church’s Thesis, can we do more by going beyond discrete and deterministic?
† Scientists who study the brain also find Turing machines to be not the most suitable model. Note however that saying that another model is a better fit is different than saying that there are brain operations that could not in principle be done using a Turing machine as a substrate.
Would analog methods such as passing lasers through a gas, say, or some kind of
subatomic magic allow us to compute things that no Turing machine can compute?
Or are Turing machines an ultimate in physically-possible machines? Did Turing,
on that day, lying on that grassy river bank after his run, intuit everything that
experiments with reality would ever find to be possible?
For a sense of the conversation, we know that the wave equation† can have
computable initial conditions (for these real numbers 𝑥 , there is a program that
inputs 𝑖 ∈ N and outputs 𝑥 ’s 𝑖 -th decimal place) but the solution is not computable.
So does the wave tank modeled by this equation compute something that Turing
machines cannot? Stated for rhetorical effect, do the planets in their orbits compute
an exact solution to the Three-Body Problem but our machines fail at it?
In this case we can object that an experimental apparatus can have noise
and measurement problems, including a finite number of decimal places in the
instruments, etc. But even if careful analysis of the physics of a wave tank leads us
to discount it as a reliable computer of a function, we can still wonder whether
there might be another apparatus that would work.
This big question remains open. No one has produced a generally accepted
example of a non-discrete mechanism that computes a function that no Turing
machine computes. However, there is also not yet an analysis of physically-possible
mechanical computation in the non-discrete case which has the support enjoyed
by Turing’s analysis in its more narrow domain.
We will not pursue this further, instead only observing that the mainstream
community of researchers takes Church’s Thesis as the basis for its work. For us,
‘computation’ will refer to the kind of work that Turing analyzed. That’s because
we are interested in thinking about symbol-pushing, not toasting bread.
Using Church’s Thesis Church’s Thesis asserts that the models of computation
(Turing machines, 𝜆 calculus, the general recursive functions that we will see
in the next section, and others that we won’t describe) are maximally capable. By
that we mean that these models all compute the same things: the set of functions
that each model computes equals the set that we earlier named the set of
computable functions. So we can fix one of these models as our preferred
formalization and get on with the analysis. Here we choose Turing machines.
One reason that we emphasize Church’s Thesis is that it imbues our results
with a larger importance. When, for instance, we later describe a function that
no Turing machine can compute then, with the thesis in mind, we will interpret
the technical statement to mean that this function cannot be computed by any
discrete and deterministic device.
But there is one more thing that we will do with Church’s Thesis. We will
leverage it to make life easier. As the exercises above illustrate, while writing a
few Turing machines gives some insight, after a while you find that doing more
machines does not give more illumination. Worse, focusing too much on machine
details risks obscuring larger points. So if we can be clear and rigorous without
exhibiting the machines themselves, then that is what we will do.
† A partial differential equation that describes the propagation of waves.
I.2 Exercises
2.2 Why is it Church’s Thesis instead of Church’s Theorem?
✓ 2.3 We’ve said that the thing from our everyday experience that Turing Machines
are most like is programs. What is the difference between: (a) a Turing Machine
and an algorithm? (b) a Turing Machine and a computer? (c) a program and a
computer? (d) a Turing Machine and a program?
2.4 Your study partner is struggling with a point. “I don’t get the excitement about
computing with a mechanism. I mean, the Stepped Reckoner is like an old-timey
calculator device: it can do some very limited computations, with numbers only.
But I’m interested in a modern computer that is vastly more flexible in that it can
also work with strings, for instance. I mean, a slide rule is not programmable, is
it?” Help them understand.
✓ 2.5 Each of these is often given as a counterargument to Church’s Thesis. Explain
why each is mistaken. (a) Turing machines have an infinite tape so it is not
a realistic model. (b) The universe is finite so there are only finitely many
configurations possible for any computing device, whereas a Turing machine has
infinitely many configurations, so it is not realistic.
✓ 2.6 One of these is a correct statement of Church’s Thesis and the others are not.
Which one is right? (a) Anything that can be computed by any mechanism can be
computed by a Turing machine. (b) No human computer, or machine that mimics
a human computer, can out-compute a Turing machine. (c) The set of things that
are computable by a discrete and deterministic mechanism is the same as the set of
things that are computable by a Turing machine. (d) Every product of a person’s
mind, or product of a mechanism that mimics the activity of a person’s mind, can
be produced by some Turing machine.
2.7 List two benefits from adopting Church’s Thesis.
Section I.3 Recursion
We will outline an approach to defining computability that is different than Turing’s,
both to give a sense of another way to do this and because it is useful.† We will
list some initial functions that are intuitively computable. We will also describe
ways to combine existing functions to make new ones, where if the existing ones
are intuitively computable then so is the new one. An example of an intuitively
computable initial function is successor S : N → N, described by S (𝑥) = 𝑥 + 1, and
a combiner that preserves effectiveness is function composition. Using those, the
plus-two operation S ◦ S (𝑥) = 𝑥 + 2 is also intuitively mechanically computable.
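These two ingredients are easy to simulate. Here is a minimal sketch in Python (Python rather than the Racket used later in this section; the function names are our own) of the successor initial function and the composition combiner.

```python
def successor(x):
    # The initial function S(x) = x + 1 on the natural numbers.
    return x + 1

def compose(f, g):
    # Combining effective functions preserves effectiveness:
    # if f and g are mechanically computable then so is their composition.
    return lambda x: f(g(x))

# The plus-two operation S ∘ S from the text.
plus_two = compose(successor, successor)
```

Calling plus_two(5) steps through the two successors to give 7.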
Primitive recursion We now introduce another effectiveness-preserving
combiner, after beginning with some motivation.
Grade school students learn addition and multiplication as mildly
involved algorithms. They multiply, for example, by arranging the digits
into a table, doing partial products, and then adding. In 1861, H Grassmann
produced a more elegant definition. Here is the formula for addition,
plus : N² → N, which takes as given the successor map.

    plus(𝑥, 𝑦) = 𝑥                 – if 𝑦 = 0
    plus(𝑥, 𝑦) = S(plus(𝑥, 𝑧))     – if 𝑦 = S(𝑧) for 𝑧 ∈ N

[photo: Hermann Grassmann, 1809–1877]
Besides being compact, this approach has a very interesting feature: ‘plus’ recurs
in its own definition.† This is definition by recursion. Whereas the grade school
definition of addition is prescriptive in that it gives a procedure, this recursive
definition is descriptive because it specifies the meaning, the semantics, of the
operation.
On first seeing recursion, many people wonder whether it might be logically
problematic — isn’t defining something in terms of itself a fallacy? However, in the
example above plus ( 3, 2) is not defined in terms of itself, it is defined in terms of
plus ( 3, 1) (and the successor function). Similarly, plus ( 3, 1) is defined in terms
of plus ( 3, 0) . And, clearly the definition of plus ( 3, 0) is not a problem. The key
here is to define the function on higher-numbered inputs using only its values on
lower-numbered ones.‡
A marvelous feature of Grassmann’s approach is that it extends naturally to
other operations. Multiplication has the same form.
    product(𝑥, 𝑦) = 0                          – if 𝑦 = 0
    product(𝑥, 𝑦) = plus(product(𝑥, 𝑧), 𝑥)     – if 𝑦 = S(𝑧)

    power(𝑥, 𝑦) = 1                            – if 𝑦 = 0
    power(𝑥, 𝑦) = product(power(𝑥, 𝑧), 𝑥)      – if 𝑦 = S(𝑧)
3.3 Example Similarly, the expansion of power ( 2, 3) gives a product of three 2’s.
† That is, this is a discrete form of feedback. ‡ So the idea behind this recursion is that addition of larger numbers reduces to addition of smaller ones.
(The (let ..) creates the local variable z, and sets it to 𝑦 − 1.) The same is true
for product and power.
(define (product x y)
  (let ((z (- y 1)))
    (if (= y 0)
        0
        (plus (product x z) x))))

(define (power x y)
  (let ((z (- y 1)))
    (if (= y 0)
        1
        (product (power x z) x))))
3.4 Definition The function 𝑓 is defined by primitive recursion from the functions
𝑔 and ℎ via this schema.‡

    𝑓(𝑥0, ... 𝑥𝑘−1, 𝑦) = 𝑔(𝑥0, ... 𝑥𝑘−1)                          – if 𝑦 = 0
    𝑓(𝑥0, ... 𝑥𝑘−1, 𝑦) = ℎ(𝑓(𝑥0, ... 𝑥𝑘−1, 𝑧), 𝑥0, ... 𝑥𝑘−1, 𝑧)    – if 𝑦 = S(𝑧)
Here the bookkeeping is that the arity of 𝑓 , the number of inputs, is one more than
the arity of 𝑔 and one less than the arity of ℎ .
3.5 Example The function plus is defined by primitive recursion from 𝑔(𝑥 0 ) = 𝑥 0
and ℎ(𝑤, 𝑥 0, 𝑧) = S (𝑤) . The function product is defined by primitive recursion
from 𝑔(𝑥 0 ) = 0 and ℎ(𝑤, 𝑥 0, 𝑧) = plus (𝑤, 𝑥 0 ) . The function power is defined by
primitive recursion from 𝑔(𝑥 0 ) = 1 and ℎ(𝑤, 𝑥 0, 𝑧) = product (𝑤, 𝑥 0 ) .
Primitive recursion, along with function composition, suffices to define many
familiar functions.
3.6 Example The predecessor function is like an inverse to successor except that we
are using the natural numbers and so we can’t allow the predecessor of zero to
be negative. We instead take the special case that if the input is zero then the
output is zero also. We can define this function pred : N → N using the primitive
recursive schema.
    pred(𝑦) = 0    – if 𝑦 = 0
    pred(𝑦) = 𝑧    – if 𝑦 = S(𝑧)
Comparing this with Definition 3.4, pred has no 𝑥𝑖 ’s. Thus the bookkeeping is that
𝑔 has an arity of zero and, having no inputs, it is therefore the constant function
𝑔( ) = 0. As to ℎ , its arity is two although it ignores its first input, ℎ(𝑎, 𝑏) = 𝑏 .
† Obviously Racket, like every general purpose programming language, comes with a built-in addition operator, as in (+ 3 2), along with a multiplication operator, as in (* 3 2), and with many other arithmetic operators. ‡ A schema is an underlying organizational pattern or structure.
3.7 Example For subtraction we must also special-case negatives. We take proper
subtraction, denoted 𝑥 ∸ 𝑦, to equal 𝑥 − 𝑦 unless that is negative, in which case it
equals 0. This defines the function via primitive recursion.

    propersub(𝑥, 𝑦) = 𝑥                         – if 𝑦 = 0
    propersub(𝑥, 𝑦) = pred(propersub(𝑥, 𝑧))     – if 𝑦 = S(𝑧)
In the terms of Definition 3.4, 𝑓 is of arity two. That makes 𝑔 of arity one,
𝑔(𝑥 0 ) = 𝑥 0 . And the arity of ℎ is three so ℎ(𝑤, 𝑥 0, 𝑧) = pred (𝑤) , with two dummy
inputs.
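A Python sketch of these two examples (the names are our own, a hypothetical rendering rather than the book's code):

```python
def pred(y):
    # Schema: g() = 0 and h(a, b) = b, so pred ignores the
    # recursively computed value and just returns z where y = S(z).
    if y == 0:
        return 0      # special case: pred(0) = 0 on the naturals
    z = y - 1
    return z

def propersub(x, y):
    # Schema: g(x0) = x0 and h(w, x0, z) = pred(w),
    # so subtracting y applies pred to x a total of y-many times.
    if y == 0:
        return x
    z = y - 1
    return pred(propersub(x, z))
```

Here propersub(7, 3) gives 4, while propersub(3, 7) bottoms out at 0.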
Here is the promised collection of initial functions and function combiners.
3.8 Definition The set of primitive recursive functions consists of those that can be
derived from the initial operations of the zero function Z(𝑥⃗) = 0,† the successor
function S(𝑥) = 𝑥 + 1, and the projection functions I𝑖(𝑥⃗) = I𝑖(𝑥0, ... 𝑥𝑘−1) = 𝑥𝑖, by
a finite number of applications of the combining operations of function composition
and primitive recursion.
The initial functions are all clearly effective. Note also that the combiners are
such that if the parts are effective then so is their combination. In particular, the
computer code above makes evident that primitive recursion preserves effectiveness.
Hence every function in that set is of interest to us as it is intuitively mechanically
computable.
Function composition covers not just the simple case of two functions 𝑓 and 𝑔
that combine as 𝑓 ∘ 𝑔(𝑥⃗) = 𝑓(𝑔(𝑥⃗)). It also covers simultaneous substitution,
where from 𝑓(𝑥0, ... 𝑥𝑛) and ℎ0(𝑦0,0, ... 𝑦0,𝑚0), . . . and ℎ𝑛(𝑦𝑛,0, ... 𝑦𝑛,𝑚𝑛) we get
𝑓(ℎ0(𝑦0,0, ... 𝑦0,𝑚0), ... ℎ𝑛(𝑦𝑛,0, ... 𝑦𝑛,𝑚𝑛)).
3.9 Example The function defined by the recurrence

    𝑓(𝑦) = 2                     – if 𝑦 = 0
    𝑓(𝑦) = 𝑓(𝑦 − 1) + 3𝑦 + 2     – otherwise

is primitive recursive. In the schema of Definition 3.4 it has no 𝑥𝑖’s, the base case
is the arity-zero function 𝑔( ) = 2, and the recursive case is

    ℎ(𝑎, 𝑏) = plus(plus(𝑎, product(S(S(S(Z(𝑎)))), plus(𝑏, 1))), S(S(Z(𝑎))))
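We can verify numerically (a Python sketch with names of our choosing) that this ℎ, together with the base value 2, reproduces Example 3.9's recurrence: with 𝑎 = 𝑓(𝑧) and 𝑏 = 𝑧 the formula computes 𝑎 + 3(𝑏 + 1) + 2 = 𝑓(𝑧) + 3𝑦 + 2.

```python
def f_rec(y):
    # The recurrence taken directly: f(0) = 2, f(y) = f(y-1) + 3y + 2.
    return 2 if y == 0 else f_rec(y - 1) + 3 * y + 2

def h(a, b):
    # plus(plus(a, product(3, plus(b, 1))), 2), in ordinary notation.
    return (a + 3 * (b + 1)) + 2

def f_schema(y):
    # Primitive recursion with the arity-zero g() = 2 and the h above.
    return 2 if y == 0 else h(f_schema(y - 1), y - 1)
```

The two agree: f_rec(3) and f_schema(3) are both 26.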
Besides the ones in the above examples, many other familiar mathematical
operations are in the set of primitive recursive functions. They include the boolean
function that tests whether one number is less than another, the elementary
arithmetic function that finds the remainder left when one number is divided by
another, and the number-theoretic function that inputs a number and a prime and
returns the largest power of the prime that divides the number.
We have noted that every primitive recursive function is mechanically com-
putable. The list of primitive recursive functions given above and in the exercises is
so extensive that we may wonder whether every mechanically computable function
is in the set of primitive recursive functions. The next section shows that the answer
is no: although primitive recursion is powerful, nonetheless there are intuitively
mechanically computable functions that are not primitive recursive.
I.3 Exercises
✓ 3.10 What is the difference between primitive recursion and primitive recursive?
3.11 In defining 00 there is a conflict between the desire to have that every power
of 0 is 0 and the desire to have that every number to the 0 power is 1. What does
the definition of power given above do?
✓ 3.12 As the section body describes, recursion doesn’t have to be logically problem-
atic. But some recursions are ill-defined; consider this one.
    𝑓(𝑛) = 0            – if 𝑛 = 0
    𝑓(𝑛) = 𝑓(2𝑛 − 2)    – otherwise

    𝐹(𝑦) = 42           – if 𝑦 = 0
    𝐹(𝑦) = 𝐹(𝑦 − 1)     – otherwise
3.15 The Boolean function is_zero inputs a natural number and returns 𝑇 if the
input is zero, and 𝐹 otherwise. Give a definition by primitive recursion, representing
𝑇 with 1 and 𝐹 with 0.
✓ 3.16 This is the first sequence of numbers ever computed on an electronic
computer.
    𝑠(𝑦) = 0                     – if 𝑦 = 0
    𝑠(𝑦) = 𝑠(𝑦 − 1) + 2𝑦 − 1     – otherwise
(a) Find 𝑠 ( 0) , . . . 𝑠 ( 10) .
(b) Verify that 𝑠 is primitive recursive by putting it in the form given in Defini-
tion 3.4, giving suitable functions 𝑔 and ℎ . You can use functions already
shown in this section to be primitive recursive.
✓ 3.17 Start with a square array of dots that is 𝑛 dots on a side. Consider those
dots that are below or on the diagonal (the upper left to lower right diagonal).
This triangle has one dot in row 1, two in row 2, etc. The total number of dots in
𝑛 rows is the 𝑛 -th triangular number 𝑡 (𝑛) .
(a) Find 𝑡 ( 0) , . . . 𝑡 ( 10) .
(b) Show that 𝑡 is primitive recursive by describing it in the form given in
Definition 3.4. For 𝑔 and ℎ you can use functions already verified in this
section to be primitive recursive.
3.18 Consider this recurrence.
    𝑑(𝑦) = 0                             – if 𝑦 = 0
    𝑑(𝑦) = 𝑑(𝑦 − 1) + 3𝑦² + 3𝑦 + 1       – otherwise
(a) Find 𝑑 ( 0) , . . . 𝑑 ( 5) .
(b) Verify that 𝑑 is primitive recursive by putting it in the form given in Defini-
tion 3.4. You can use functions already shown in this section to be primitive
recursive.
✓ 3.19 The Towers of Hanoi is a famous puzzle: In the great temple at Benares . . .
beneath the dome which marks the center of the world, rests a brass plate in which
are fixed three diamond needles, each a cubit high and as thick as the body of a bee.
On one of these needles, at the creation, God placed sixty-four discs of pure gold, the
largest disc resting on the brass plate, and the others getting smaller and smaller up
to the top one. This is the Tower of Brahma. Day and night unceasingly the priests
transfer the discs from one diamond needle to another according to the fixed and
immutable laws of Brahma, which require that the priest on duty must not move more
than one disc at a time and that he must place this disc on a needle so that there is no
smaller disc below it. When the sixty-four discs shall have been thus transferred from
the needle on which at the creation God placed them to one of the other needles, tower,
temple, and Brahmans alike will crumble into dust, and with a thunderclap the world
will vanish. It gives the recurrence below because to move a pile of discs you first
move to one side all but the bottom, which takes 𝐻 (𝑛 − 1) steps, then move that
Section 3. Recursion 27
bottom one, which takes one step, then re-move the other disks into place on top
of it, taking another 𝐻 (𝑛 − 1) steps.
    𝐻(𝑛) = 1                    – if 𝑛 = 1
    𝐻(𝑛) = 2 · 𝐻(𝑛 − 1) + 1     – if 𝑛 > 1
    gcd(𝑎, 𝑏) = 𝑎                      – if 𝑏 = 0
    gcd(𝑎, 𝑏) = gcd(𝑏, rem(𝑎, 𝑏))      – if 𝑏 > 0

where rem(𝑎, 𝑏) is the remainder when 𝑎 is divided by 𝑏. Note that it has the
form of the schema of primitive recursion (however, this does not show that
it is primitive recursive because we have not yet verified that the remainder
function is primitive recursive). Use this method to compute gcd(28, 12),
gcd(104, 20), and gcd(300009, 25).
3.22 The following four exercises list functions and predicates. (A predicate is a
truth-valued function; we take an output of 1 to mean ‘true’ while 0 is ‘false’.) Show
that each is primitive recursive. For each, you may use functions already shown to be
primitive recursive in this section body, or in a prior exercise item or subitem.
✓ 3.23 See the note above.
(a) Constant function: C𝑘(𝑥⃗) = C𝑘(𝑥0, ... 𝑥𝑛−1) = 𝑘 for a fixed 𝑘 ∈ N.
(b) Maximum and minimum of two numbers: max (𝑥, 𝑦) and min (𝑥, 𝑦) . Hint: use
addition and proper subtraction.
(c) Absolute difference function: absdiff (𝑥, 𝑦) = |𝑥 − 𝑦| .
3.24 See the note before Exercise 3.23.
(a) Sign predicate: sgn (𝑦) , which gives 0 if 𝑦 = 0 and gives 1 if 𝑦 is greater than
zero.
(b) Negation of the sign predicate: negsign (𝑦) , which gives 0 if 𝑦 is greater than
zero and 1 if 𝑦 = 0.
(c) Less-than predicate: lessthan (𝑥, 𝑦) = 1 if 𝑥 is less than 𝑦 , and 0 otherwise.
The greater-than predicate is similar.
✓ 3.25 See the note before Exercise 3.23.
(a) Boolean functions: we have the convention that we represent ‘true’ with 1 and
‘false’ with 0, and that holds for the outputs here. But for inputs, while we
still take 0 for ‘false’, we take any positive input to mean ‘true’. There is the
standard one-input function
    not(𝑥) = 1    – if 𝑥 = 0
    not(𝑥) = 0    – otherwise

    and(𝑥, 𝑦) = 1    – if 𝑥 ≥ 1 and 𝑦 ≥ 1
    and(𝑥, 𝑦) = 0    – otherwise

    or(𝑥, 𝑦) = 0    – if 𝑥 = 𝑦 = 0
    or(𝑥, 𝑦) = 1    – otherwise

(To avoid being tedious, in the output of the clauses we write 0 to abbreviate
Z( ) while 1 abbreviates S(Z( )).)
(b) Equality predicate: equal (𝑥, 𝑦) = 1 if 𝑥 = 𝑦 and 0 otherwise.
✓ 3.26 See the note before Exercise 3.23.
(a) Inequality predicate: notequal (𝑥, 𝑦) = 0 if 𝑥 = 𝑦 and 1 otherwise.
(b) Functions defined by a finite and fixed number of cases, as with these.

    𝑚(𝑥) = 7    – if 𝑥 = 1
    𝑚(𝑥) = 9    – if 𝑥 = 5
    𝑚(𝑥) = 2    – otherwise

    𝑛(𝑥, 𝑦) = 7    – if 𝑥 = 1 and 𝑦 = 2
    𝑛(𝑥, 𝑦) = 9    – if 𝑥 = 5 and 𝑦 = 5
    𝑛(𝑥, 𝑦) = 0    – otherwise
3.27 Show that each of these is primitive recursive. You may use any function
shown to be primitive recursive in the section body, in the prior exercise, or in a
prior item.
(a) Bounded sum function: the partial sums of a series where the terms 𝑔(𝑖)
are specified by a single primitive recursive function 𝑔, so that 𝑆𝑔(𝑦) =
Σ_{0≤𝑖<𝑦} 𝑔(𝑖) = 𝑔(0) + 𝑔(1) + · · · + 𝑔(𝑦 − 1) (the sum of zero-many terms is
𝑆𝑔(0) = 0). In comparison with the final item of the prior question, while the
number of summands is also finite, here it varies with 𝑦.
(b) Bounded product function: the partial products of a series whose terms
𝑔(𝑖) are given by a primitive recursive function, 𝑃𝑔(𝑦) = Π_{0≤𝑖<𝑦} 𝑔(𝑖) =
𝑔(0) · 𝑔(1) · · · 𝑔(𝑦 − 1) (the product of zero-many terms is 𝑃𝑔(0) = 1).
(c) Bounded minimization: let 𝑚 ∈ N and let 𝑝(𝑥⃗, 𝑖) be a predicate for all
𝑖 < 𝑚. The minimization operator 𝑀(𝑥⃗, 𝑚), often written min_{𝑖<𝑚}[𝑝(𝑥⃗, 𝑖)]
or 𝜇𝑖<𝑚[𝑝(𝑥⃗, 𝑖)], returns the smallest 𝑖 < 𝑚 such that 𝑝(𝑥⃗, 𝑖) = 0, or else
returns 𝑚. Hint: Consider the bounded sum of the bounded products of the
predicates.
3.28 Show that each is a primitive recursive function. You can use functions
shown to be primitive recursive in this section, or in a prior exercise, or a prior
item.
(a) Bounded universal quantification: where 𝑚 ∈ N, for each 𝑖 < 𝑚 let 𝑝(𝑥⃗, 𝑖)
be a predicate. Then 𝑈(𝑥⃗, 𝑚), typically written ∀𝑖 < 𝑚 [𝑝(𝑥⃗, 𝑖)], has value 1
if 𝑝(𝑥⃗, 0) = 1 and . . . and 𝑝(𝑥⃗, 𝑚 − 1) = 1. Otherwise, if even one 𝑝(𝑥⃗, 𝑖) is
non-1 for 0 ≤ 𝑖 < 𝑚 then 𝑈(𝑥⃗, 𝑚) = 0.
(b) Bounded existential quantification: where 𝑚 ∈ N, for each 𝑖 < 𝑚 let 𝑝(𝑥⃗, 𝑖)
be a predicate. Then 𝐸(𝑥⃗, 𝑚), typically written ∃𝑖 < 𝑚 [𝑝(𝑥⃗, 𝑖)], has value 1
if 𝑝(𝑥⃗, 0) = 1 or . . . or 𝑝(𝑥⃗, 𝑚 − 1) = 1, and has value 0 otherwise.
(c) Divides predicate: where 𝑥, 𝑦 ∈ N we have divides (𝑥, 𝑦) if there is some 𝑘 ∈ N
with 𝑦 = 𝑥 · 𝑘 .
(d) Primality predicate: prime (𝑦) if 𝑦 has no nontrivial divisor.
3.29 We will show that the function rem (𝑎, 𝑏) giving the remainder when 𝑎 is
divided by 𝑏 is primitive recursive.
(a) Fill in this table.

    𝑎           0  1  2  3  4  5  6  7
    rem(𝑎, 3)

(b) Observe that rem(𝑎 + 1, 3) = rem(𝑎, 3) + 1 for many of the entries. When is
this relationship not true?
(c) Fill in the blanks.

    rem(𝑎, 3) = (1)    – if 𝑎 = 0
    rem(𝑎, 3) = (2)    – if 𝑎 = S(𝑧) and rem(𝑧, 3) + 1 = 3
    rem(𝑎, 3) = (3)    – otherwise
3.31 The floor function 𝑓 (𝑥, 𝑦) = ⌊𝑥/𝑦⌋ returns the largest natural number
less than or equal to 𝑥/𝑦 . Show that it is primitive recursive. Hint: bounded
minimization from Exercise 3.27 is a good place to start.
3.32 The examples of primitive recursion in this section and earlier exercises all
have 𝑓 (𝑦) use only one prior value, 𝑓 (𝑧) = 𝑓 (𝑦 − 1) . But some recursions use more
than one, such as the Fibonacci recursion 𝐹 (𝑦) = 𝐹 (𝑦 − 1) + 𝐹 (𝑦 − 2) that uses two
(for Fibonacci, we get the recursion started by defining 𝐹 ( 0) = 1 and 𝐹 ( 1) = 1).
In a ‘course-of-values recursion’, the next value 𝑓 (𝑦) depends on some or all of the
prior values 𝑓 (𝑦 − 1) , . . . 𝑓 ( 0) . To do these in a primitive recursive way, we get
access to the sequence of all prior values by encoding them into a single number.
Consider a finite sequence of natural numbers 𝐴 = ⟨𝑎0, ... 𝑎𝑘−1⟩. Gödel’s multiplica-
tive encoding of 𝐴 is the natural number 𝐺(𝐴) computed by multiplying factors, where
each factor is the 𝑖-th prime number raised to the successor of the 𝑖-th sequence
element.
For the empty sequence, 𝐺 (⟨ ⟩) = 1. We will sketch how to include all the prior
values.
(a) Find 𝐺 (𝐴) for 𝐴0 = ⟨3, 1⟩ and 𝐴1 = ⟨2, 2, 2⟩ .
(b) For each number 𝑛 , find the sequence 𝐴 where 𝑛 = 𝐺 (𝐴) , or find that no such
sequence exists: 𝑛 0 = 10800, 𝑛 1 = 12, and 𝑛 2 = 343.
(c) Why does the encoding use the successor function?
(d) Where 𝑓 : N^(𝑘+1) → N, the course of values function 𝑓̄ : N^(𝑘+1) → N is defined
as here for any 𝑥⃗ ∈ N^𝑘.

    𝑓̄(𝑥⃗, 𝑦) = 𝐺(⟨ ⟩)                          – if 𝑦 = 0
    𝑓̄(𝑥⃗, 𝑦) = 𝐺(⟨𝑓(𝑥⃗, 0), ... 𝑓(𝑥⃗, 𝑧)⟩)      – if 𝑦 = S(𝑧)
Let 𝐹 (𝑛) be the 𝑛 -th Fibonacci number. Find 𝐹¯( 0) , 𝐹¯( 1) , 𝐹¯( 2) , and 𝐹¯( 3) .
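The multiplicative encoding described in this exercise is simple to experiment with. Here is a Python sketch (the function names are ours) that computes 𝐺 for a finite sequence.

```python
def primes(n):
    # The first n primes, by trial division; fine for short sequences.
    found = []
    candidate = 2
    while len(found) < n:
        if all(candidate % p != 0 for p in found):
            found.append(candidate)
        candidate += 1
    return found

def godel_encode(seq):
    # G(<a0, ..., a_{k-1}>) multiplies, for each i, the i-th prime
    # raised to the successor of a_i.  G of the empty sequence is 1.
    result = 1
    for p, a in zip(primes(len(seq)), seq):
        result *= p ** (a + 1)
    return result
```

The successor in the exponent is what lets a decoder tell a sequence ending in 0 apart from a shorter sequence, since every prime that is used appears to at least the first power.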
✓ 3.33 This is McCarthy’s 91 function.

  M (𝑥) =  M ( M (𝑥 + 11))   – if 𝑥 ≤ 100
           𝑥 − 10            – if 𝑥 > 100
(a) What is the output for inputs 𝑥 ∈ { 0, ... 101 }? For larger inputs? (You may
want to write a small script.) (b) Show that this function is primitive recursive.
You may cite the results from this section or from prior exercises.
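For part (a), the suggested small script can be a direct transcription of the definition; this Python sketch (ours, not from the book's source) tabulates the outputs for the first inputs.

```python
def M(x):
    """McCarthy's 91 function, transcribed clause by clause."""
    if x > 100:
        return x - 10
    return M(M(x + 11))

# Collect the distinct outputs over the inputs 0..101.
outputs = {M(x) for x in range(102)}
```

Running it shows why the function has its name: the nested recursion collapses to a constant on every input up to 101.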
3.34 Show that every primitive recursive function is total.
Section I.4 General recursion
Every primitive recursive function is intuitively mechanically computable. What
about the converse: is every mechanically computable function primitive recursive?
Here we will answer ‘no’.†
Ackermann functions We will give a function that is intuitively mechanically
computable but that is not primitive recursive. An important feature of this function
is that it arises naturally so we will introduce it using familiar operations. Recall
that the addition operation is repeated successor, that multiplication is repeated
addition, and that exponentiation is repeated multiplication.
𝑥 + 𝑦 = S ( S ( · · · S (𝑥)))      𝑥 · 𝑦 = 𝑥 + 𝑥 + · · · + 𝑥      𝑥^𝑦 = 𝑥 · 𝑥 · · · · · 𝑥
      (𝑦-many S’s)                     (𝑦-many terms)                 (𝑦-many factors)
product (𝑥, 𝑦) = H2 (𝑥, 𝑦) =  0                       – if 𝑦 = 0
                              H1 (𝑥, H2 (𝑥, 𝑦 − 1))   – otherwise

power (𝑥, 𝑦) = H3 (𝑥, 𝑦) =  1                       – if 𝑦 = 0
                            H2 (𝑥, H3 (𝑥, 𝑦 − 1))   – otherwise
The pattern is in the ‘otherwise’ lines. Each one is H𝑛 (𝑥, 𝑦) = H𝑛− 1 (𝑥, H𝑛 (𝑥, 𝑦 − 1)) .
Because of this pattern we call each H𝑛 the level 𝑛 function, so that successor is
the level 0 operation, addition is level 1, multiplication is level 2, and exponentiation
is level 3. The definition below writes H (𝑛, 𝑥, 𝑦) in place of H𝑛 (𝑥, 𝑦) to bring all of
the levels into one formula.
4.1 Definition This is the hyperoperation H : N3 → N.
H (𝑛, 𝑥, 𝑦) =  𝑦 + 1                            – if 𝑛 = 0
               𝑥                                – if 𝑛 = 1 and 𝑦 = 0
               0                                – if 𝑛 = 2 and 𝑦 = 0
               1                                – if 𝑛 > 2 and 𝑦 = 0
               H (𝑛 − 1, 𝑥, H (𝑛, 𝑥, 𝑦 − 1))    – otherwise
† That’s why the diminutive ‘primitive’ is in the name — while the class is interesting and important, it
isn’t big enough to contain every effective function.
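Definition 4.1 transcribes directly into code. This Python sketch (ours, not from the book's source) follows the five clauses; the recursion limit is raised because H recurses deeply even on small inputs.

```python
import sys
sys.setrecursionlimit(20000)  # H recurses deeply even for small arguments

def H(n, x, y):
    """The hyperoperation of Definition 4.1, clause by clause."""
    if n == 0:
        return y + 1
    if y == 0:
        return x if n == 1 else (0 if n == 2 else 1)
    return H(n - 1, x, H(n, x, y - 1))
```

So H(1, ·, ·) is addition, H(2, ·, ·) is multiplication, H(3, ·, ·) is exponentiation, and H(4, 2, 3) = 2^(2^2) = 16 is the level 4 operation.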
𝜇 recursion The prior section’s Exercise 3.27 suggests the right direction. Primitive
recursion does operations that are bounded, such as the bounded sum Σ0≤𝑖<𝑦 𝑔(𝑖) =
𝑔(0) + · · · + 𝑔(𝑦 − 1) and bounded minimization min𝑖<𝑚 [𝑝 (𝑥®, 𝑖)] = 𝜇𝑖<𝑚 [𝑝 (𝑥®, 𝑖)],
which returns the smallest 𝑖 < 𝑚 such that 𝑝 (𝑥®, 𝑖) = 0. We can show that
a programming language having only bounded loops computes the primitive
recursive functions (see Extra E). To include all of the functions that are intuitively
mechanically computable we must add an operation that is unbounded.
4.5 Definition Suppose that 𝑔 : N^(𝑛+1) → N is total, so that for every input tuple
there is an output number. Then 𝑓 : N^𝑛 → N is defined from 𝑔 by minimization or
𝜇-recursion, written 𝑓 (𝑥®) = 𝜇𝑦 [𝑔(𝑥®, 𝑦) = 0], if 𝑓 (𝑥®) is the minimum number 𝑦
such that 𝑔(𝑥®, 𝑦) = 0.
This is unbounded search. Think of it as examining 𝑔(𝑥®, 0), then 𝑔(𝑥®, 1),
etc., looking for one of them to give the output 0. If that ever happens, so that
𝑔(𝑥®, 𝑦) = 0 for some least 𝑦, then 𝑓 (𝑥®) = 𝑦. If there is no such number then 𝑓 (𝑥®)
is undefined.
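That search is just a while loop. In this Python sketch (names ours), mu builds 𝑓 from a total 𝑔; exactly as in the definition, the loop runs forever on inputs with no witness. The 𝑔 below is a toy example chosen so the search halts.

```python
def mu(g):
    """Return f where f(*x) is the least y with g(*x, y) == 0.
    The while loop runs forever if no such y exists."""
    def f(*x):
        y = 0
        while g(*x, y) != 0:
            y += 1
        return y
    return f

# A toy total g: the search f(x) finds the least y with 3 * y >= x.
g = lambda x, y: 0 if 3 * y >= x else 1
f = mu(g)
```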
4.6 Example Euler noticed that the polynomial 𝑝 (𝑦) = 𝑦² + 𝑦 + 41 outputs only
primes, at least at first. Does the pattern continue forever?
𝑦 0 1 2 3 4 5 6 7 8 9 ...
𝑝 (𝑦) 41 43 47 53 61 71 83 97 113 131 ...
𝑔(𝑥0, 𝑥1, 𝑥2, 𝑦) =  1   – if 𝑥0𝑦² + 𝑥1𝑦 + 𝑥2 is prime
                    0   – otherwise

This Racket function does the unbounded search.

;; g is the function above, with prime? from math/number-theory
(define (f x0 x1 x2)
  (define (f-helper y)
    (if (zero? (g x0 x1 x2 y))
        y
        (f-helper (add1 y))))
  (f-helper 0))

It calls (f-helper 0), then (f-helper 1), etc. It finds an input for which Euler’s
quadratic 𝑝 returns a non-prime.
> (f 1 1 41)
40
All primitive recursive functions are total. But by using the minimization
operator we can get functions whose output value is undefined for some or all
inputs. For instance, if 𝑔(𝑥, 𝑦) = 1 for all 𝑥, 𝑦 ∈ N then 𝑓 (𝑥) = 𝜇𝑦 [𝑔(𝑥, 𝑦) = 0]
is undefined for all 𝑥 . In the next example no one currently knows whether the
search will end.
4.7 Example Goldbach’s conjecture is that every even number greater than two is the
sum of two primes. The first few instances are 4 = 2 + 2, 6 = 3 + 3, 8 = 3 + 5,
and 10 = 3 + 7. This conjecture is not known to be true, although researchers have
confirmed it for all evens up to 𝑦 = 4 × 1018.
Here we do an unbounded search for a counterexample. This auxiliary function
looks for a Goldbach decomposition of 𝑛.

;; Returns minimal i <= n such that i and n-i are prime; returns #f if no such i
(define (gb-check n)
  (for/first ([i (in-range 2 (add1 n))]
              #:when (and (prime? i)
                          (prime? (- n i))))
    i))

The next function returns 0 when 𝑦 is a counterexample, an even number greater
than two with no such decomposition, and otherwise returns 1.

(define (gb-g y)
  (if (or (odd? y) (<= y 2))
      1
      (if (gb-check y)
          1
          0)))

The search itself is the unbounded minimization 𝜇𝑦 [gb-g(𝑦) = 0].

(define (gb-f)
  (define (gb-f-helper y)
    (if (zero? (gb-g y))
        y
        (gb-f-helper (add1 y))))
  (gb-f-helper 0))

If Goldbach’s conjecture holds then this search runs forever.
4.8 Example We can expand on that approach. Suppose that we want to settle
Legendre’s conjecture, that for every natural number 𝑛 > 0 there is a prime
number 𝑝 with 𝑛 2 < 𝑝 < (𝑛 + 1) 2. Start an unbounded search for a counterexample,
but at the same time also run an unbounded search for a proof. After all, a proof
is a sequence of statements in a suitable formal language where each statement
is either an axiom or follows logically from the prior statements, and where the
final statement is the desired theorem. You could use a computer to search for
a proof as: for each 𝑛 , interpret it as a string (perhaps convert 𝑛 to binary and
interpret that binary as a string) and check whether that string is a proof of the
theorem. Now wait for one or the other of these searches to halt. Obviously this
relates unbounded search to the Entscheidungsproblem.
The above discussion makes clear that unbounded search via the 𝜇 operator is
intuitively mechanically computable. We now define a superset of the primitive
recursive functions by adding this function operation.
4.9 Definition A function is general recursive or partial recursive, or 𝜇-recursive,
or just recursive, if it can be derived from the initial operations of the zero
function Z (𝑥®) = 0, the successor function S (𝑥) = 𝑥 + 1, and the projection
functions I𝑖 (𝑥®) = 𝑥𝑖 by a finite number of applications of function composition,
the schema of primitive recursion, and minimization.
S Kleene showed that this set of functions is the same as the Turing machine-based
set of computable functions.
We have seen that unbounded search is a natural computational construct. It
is also a theme in this book. For instance, we will later consider the question of
which programs halt and a natural way to think about this is as a search for a
halting step.
I.4 Exercises
Some of these have answers that are tedious to compute. It may help to use a computer,
for instance by writing a Racket program or using Sage.
4.10 What is the difference between total recursive and primitive recursive?
✓ 4.11 Find each: H4 ( 2, 0) , H4 ( 2, 1) , H4 ( 2, 2) , H4 ( 2, 3) , and H4 ( 2, 4) .
4.23 Finish the proof of Lemma 4.2 by verifying that H2 (𝑥, 𝑦) = 𝑥 · 𝑦 and
H3 (𝑥, 𝑦) = 𝑥^𝑦.
4.24 Prove that the computation of H (𝑛, 𝑥, 𝑦) always terminates.
✓ 4.25
(a) Prove that the function remtwo : N → { 0, 1 } giving the remainder on division
by two is primitive recursive.
(b) Use that to prove that this function is 𝜇 -recursive: 𝑓 (𝑛) = 0 if 𝑛 is even, and
𝑓 (𝑛)↑ if 𝑛 is odd.
✓ 4.26 Consider the Turing machine P = {𝑞 0 B1𝑞 1, 𝑞 0 1R𝑞 0, 𝑞 1 BR𝑞 2, 𝑞 1 1L𝑞 1 }. De-
fine 𝑔(𝑥, 𝑦) = 0 if the machine P , when started on a tape that is blank except for
𝑥 -many consecutive 1’s and with the head under the leftmost 1, has halted after
step 𝑦. Otherwise, 𝑔(𝑥, 𝑦) = 1. Find 𝑓 (𝑥) = 𝜇𝑦 [𝑔(𝑥, 𝑦) = 0] for these. (a) 𝑓 (0)
(b) 𝑓 (1) (c) 𝑓 (2) (d) 𝑓 (3) (e) 𝑓 (4) (f) 𝑓 (5)
4.27 Define 𝑔(𝑥, 𝑦) by: start P = {𝑞 0 B1𝑞 2, 𝑞 0 1L𝑞 1, 𝑞 1 B1𝑞 2, 𝑞 1 11𝑞 2 } on a tape
that is blank except for 𝑥 -many consecutive 1’s and with the head under the
leftmost 1. If P has halted after step 𝑦 then 𝑔(𝑥, 𝑦) = 0 and otherwise 𝑔(𝑥, 𝑦) = 1.
Let 𝑓 (𝑥) = 𝜇𝑦 [𝑔(𝑥, 𝑦) = 0]. Find 𝑓 (𝑥) for these. (a) 𝑓 (0) (b) 𝑓 (1) (c) 𝑓 (2)
(d) 𝑓 (3) (e) 𝑓 (4) (f) 𝑓 (5)
4.28 Consider this Turing machine.
Let 𝑔(𝑥, 𝑦) = 0 if this machine, when started on a tape that is all blank except for
𝑥 -many consecutive 1’s and with the head under the leftmost 1, has halted after
𝑦 steps. Otherwise, 𝑔(𝑥, 𝑦) = 1. Let 𝑓 (𝑥) = 𝜇𝑦 [𝑔(𝑥, 𝑦) = 0]. Find: (a) 𝑓 (0)
(b) 𝑓 (1) (c) 𝑓 (2) (d) 𝑓 (𝑥).
✓ 4.29 Define ℎ : N+ → N by: ℎ(𝑛) = 𝑛/2 if 𝑛 is even, and otherwise ℎ(𝑛) = 3𝑛 + 1.
Let 𝐻 (𝑛, 𝑘) be the 𝑘 -fold composition of ℎ with itself, so 𝐻 (𝑛, 1) = ℎ(𝑛) , 𝐻 (𝑛, 2) =
ℎ ◦ ℎ (𝑛) , 𝐻 (𝑛, 3) = ℎ ◦ ℎ ◦ ℎ (𝑛) , etc. (We can take 𝐻 (𝑛, 0) = 0, although its
value isn’t interesting.) Let 𝐶 (𝑛) = 𝜇𝑘 [𝐻 (𝑛, 𝑘) = 1]. (a) Compute 𝐻 (4, 1),
𝐻 ( 4, 2) , and 𝐻 ( 4, 3) . (b) Find 𝐶 ( 4) , if it is defined. (c) Find 𝐶 ( 5) , if it is defined.
(d) Find 𝐶 ( 11) , if it is defined. (e) Find 𝐶 (𝑛) for all 𝑛 ∈ [ 1 .. 20) , where defined.
The Collatz conjecture is that 𝐶 (𝑛) is defined for all 𝑛 . No one knows if it is true.
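A short script makes part (e) painless. This Python sketch (helper names ours) implements h and the unbounded search C.

```python
def h(n):
    """One step: halve an even number, send an odd n to 3n + 1."""
    return n // 2 if n % 2 == 0 else 3 * n + 1

def C(n):
    """Least k >= 1 with the k-fold iterate of h carrying n to 1.
    This is unbounded search; it halts only if the trajectory reaches 1."""
    k = 0
    while True:
        n, k = h(n), k + 1
        if n == 1:
            return k
```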
4.30 The Ackermann function is intuitively mechanically computable (and total)
but is not primitive recursive. Here is an alternative such function. Assume that all
partial recursive functions take one natural number input and yield one natural
number output. (We can simulate input pairs, etc., with Gödel’s multiplicative
encoding; see Exercise 3.32.) Let 𝑓0 , 𝑓1 , . . . be an effective list of all primitive
recursive functions. That is, there is a primitive recursive function that inputs the
index 𝑖 and returns some way of computing 𝑓𝑖 . (Remark: this is an interpreter for
the primitive recursive functions, which given the specification 𝑖 of the function,
can do the computation of 𝑓𝑖 (𝑥) for any input 𝑥 .)
Now consider 𝐷 (𝑛) = 𝑓𝑛 (𝑛) + 1. Show that 𝐷 , while intuitively computable, is not
primitive recursive.
Extra I.A Turing machine simulator
The source repository for this book includes a program, written in Racket, to
simulate a Turing machine. It is in the directory src/scheme/prologue. Here we will
show how to run this simulator. (The implementation tracks closely the description
of the action of a Turing machine given on page 8.)
Example 1.1 gives a Turing machine that computes the predecessor function.
% pred.tm
% Compute predecessor fcn: pred(0)=0 and pred(n)=n-1
0 B L 1
0 1 R 0
1 B L 2
1 1 B 1
2 B R 3
2 1 L 2
Thus the simulator for any particular Turing machine is really the pair consisting
of the Racket code along with the machine’s description, as above.
Below is a run of the simulator, including its command line invocation. The
machine starts with a current symbol of 1 and the tape to the right of the current
symbol is 11 (the tape to the left is empty). Thus, the entire tape input is 𝜏 = 111.
Since the predecessor of 3 is 2, we expect that when it finishes the tape will contain
11, with the rest blank.
computing/src/scheme/prologue$ ./turing-machine.rkt -f machines/pred.tm -c "1" -r "11"
step 0: q0: *1*11
step 1: q0: 1*1*1
step 2: q0: 11*1*
step 3: q0: 111*B*
step 4: q1: 11*1*B
step 5: q1: 11*B*B
step 6: q2: 1*1*BB
step 7: q2: *1*1BB
step 8: q2: *B*11BB
step 9: q3: B*1*1BB
step 10: HALT
The output is crude but good enough for small experiments. The command line
turing-machine.rkt --help gives the simulator’s options.
I.A Exercises
A.1 Run the simulator on Ppred starting with 11111. Also start with an empty
tape.
A.2 Run the simulator on Example 1.2’s Padd to do 1 + 2. Also simulate 0 + 2 and
0 + 0.
A.3 Write a Turing machine to perform the operation of adding 3, so that given
as input a tape containing only a string of 𝑛 consecutive 1’s, it returns a tape with
a string of 𝑛 + 3 consecutive 1’s. Follow our convention that when the program
starts and ends the head is under the first 1. Run it on the simulator, with an input
of 4 consecutive 1’s, and also with an empty tape.
A.4 Write a machine to decide if the input contains the substring 010. Fix
Σ = { 0, 1, B }. The machine starts with the tape blank except for a contiguous
string of 0’s and 1’s, and with the head under the first non-blank symbol. When
it finishes, the tape will have either just a 1 if the input contained the desired
substring, or otherwise just a 0. We will do this in stages, building a few of what
amount to subroutines.
(a) Write instructions, starting in state 𝑞 10 , so that if initially the machine’s head
is under the first of a sequence of non-blank entries then at the end the head
will be to the right of the final such entry.
(b) Write a sequence of instructions, starting in state 𝑞 20 , so that if initially the
head is just to the right of a sequence of non-blank entries, then at the end all
entries are blank.
(c) Write the full machine, including linking in the prior items.
Extra I.B Hardware
not 𝑃             𝑃 and 𝑄, 𝑃 or 𝑄
𝑃  ¬𝑃             𝑃  𝑄  𝑃 ∧ 𝑄  𝑃 ∨ 𝑄
0   1             0  0    0      0
1   0             0  1    0      1
                  1  0    0      1
                  1  1    1      1
Those three logic operators are all we need. We will show how to go from
a specified input-output behavior, a desired truth table, to a propositional logic
expression having that behavior that uses only ‘¬’, ‘∧’, and ‘∨’. Then we will sketch
how to implement that with electronic components.
The two tables below show how. Start with the one on the left and focus on the
row with output 1. The expression ¬𝑃 ∧ ¬𝑄 makes this row take on value 1 and
every other row take on value 0.
𝑃  𝑄               𝑃  𝑄  𝑅
0  0  1            0  0  0  0
0  1  0            0  0  1  1
1  0  0            0  1  0  1
1  1  0            0  1  1  0
                   1  0  0  1
                   1  0  1  0
                   1  1  0  0
                   1  1  1  0
For the table on the right, again focus on the rows ending in 1’s. For the second
row the clause is ¬𝑃 ∧ ¬𝑄 ∧ 𝑅 . Target the third row with ¬𝑃 ∧ 𝑄 ∧ ¬𝑅 and the
fifth row with 𝑃 ∧ ¬𝑄 ∧ ¬𝑅 . Now put these clauses together with ∨’s to get the
statement with the given table. (A statement consisting of clauses using ∧’s that
are joined with ∨’s is in Disjunctive Normal Form, DNF. See Section C.)
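The row-by-row construction is mechanical, so it is easy to automate. This Python sketch (our own helper, not part of the book's source) produces the DNF expression for any truth table given as a map from input tuples to output bits; the sample table is the three-input one above.

```python
from itertools import product

def dnf(names, table):
    """Build a Disjunctive Normal Form expression for a truth table.
    `names` lists the variables; `table` maps bit tuples to 0 or 1."""
    clauses = []
    for row in product((0, 1), repeat=len(names)):
        if table[row] == 1:
            # One conjunctive clause per row that has output 1.
            lits = [n if bit else '¬' + n for n, bit in zip(names, row)]
            clauses.append('(' + ' ∧ '.join(lits) + ')')
    return ' ∨ '.join(clauses) if clauses else '0'

# The three-input table from the text: output 1 on rows 001, 010, 100.
table = {(0, 0, 0): 0, (0, 0, 1): 1, (0, 1, 0): 1, (0, 1, 1): 0,
         (1, 0, 0): 1, (1, 0, 1): 0, (1, 1, 0): 0, (1, 1, 1): 0}
```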
(Figures: gates for the inputs 𝑃, 𝑄, 𝑅, and a not-gate circuit having a 5-volt
battery, a resistor, and a transistor with terminals G, D, and S, with input 𝑉in
and output 𝑉out.)
On the right is a battery, which as we shall see supplies the extra voltage. On the
top left, shown as a wiggle, is a resistor. When current is flowing around the circuit,
this resistor regulates the power output from the battery.
On the bottom left, shown with the circle, is a transistor. If there is enough
voltage between G and S then this component allows current from the battery
to flow between D and S. (Because it is sometimes open and sometimes closed
it is depicted as a switch, although it has no moving parts.) This transistor is
manufactured such that an input voltage 𝑉in of 5 volts will trigger this event.
To verify that this circuit inverts the signal, assume first that 𝑉in = 0. Then
there is no current flow between D and S. With no current the resistor provides no
voltage drop, and consequently the output voltage 𝑉out across the gap is all of the
voltage supplied by the battery, 5 volts. So 𝑉in = 0 results in 𝑉out = 5.
Conversely, assume that 𝑉in = 5. Then current flows between D and S, and so
the resistor drops the voltage, meaning that the output is 𝑉out = 0.
Thus, for this device the voltage out 𝑉out is the opposite of the voltage in 𝑉in .
I.B Exercises
B.1 A propositional logic operator that is often used is Exclusive Or, XOR. It is
defined by: 𝑃 XOR 𝑄 = 1 if and only if 𝑃 ≠ 𝑄 . (a) Specify a truth table and from it
construct a DNF propositional logic expression. (b) Use that to make a circuit.
B.2 The propositional logic operator Implication, →, is given by: 𝑃 → 𝑄 is 1
except when 𝑃 is 1 and 𝑄 is 0.
(a) Make a truth table and from it construct a Disjunctive Normal Form expression.
(b) Use that to make a circuit.
B.3 For the table below, construct a DNF propositional logic expression and use
that to make a circuit.
𝑃 𝑄
0 0 0
0 1 1
1 0 0
1 1 1
B.5 Make a table with inputs 𝑃 , 𝑄 , and 𝑅 for the behavior that the output is 1 if 𝑃
equals 𝑅 . Produce the associated DNF expression. Draw the circuit.
B.6 Make a three-input table for the behavior: the output is 1 if a majority of the
inputs are 1’s. Produce the associated DNF expression. Draw the circuit.
B.7 Consider the input/output behavior that the output is 1 if a strict majority of
the inputs are 1’s (so a tie does not count as a majority).
(a) Make a four-input table for the behavior. Produce the associated Disjunctive
Normal Form expression.
(b) Also produce the DNF expression for this behavior with five inputs.
B.8 To add two binary numbers the most natural approach works like the grade
school decimal addition algorithm. Start at the right with the one’s column. Add
those two bits and possibly carry a 1 to the next column. Then add down the next
column, including any carry. Repeat this from right to left.
(a) Use this method to add the two binary numbers 1011 and 1001.
(b) Make a truth table giving the desired behavior in adding the numbers in one
column. It must have three inputs because of the possibility of a carry. It must
also have two output columns, one for the least significant bit of the sum along
with one for any carry.
(c) Draw the circuits.
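The column-by-column method sketches naturally in Python (names ours): full_adder is exactly the three-input, two-output behavior that part (b) asks for, and add_binary ripples it across the columns.

```python
def full_adder(a, b, carry_in):
    """Add one column: two bits plus a carry give a sum bit and a carry out."""
    total = a + b + carry_in
    return total % 2, total // 2

def add_binary(x, y):
    """Grade-school addition of two bit strings, right to left."""
    width = max(len(x), len(y))
    x, y = x.zfill(width), y.zfill(width)
    carry, bits = 0, []
    for a, b in zip(reversed(x), reversed(y)):
        bit, carry = full_adder(int(a), int(b), carry)
        bits.append(str(bit))
    if carry:
        bits.append('1')
    return ''.join(reversed(bits))
```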
Extra I.C Game of Life
J von Neumann was one of the twentieth century’s most prolific and
influential mathematicians. Just in computing, his contributions to
developments in hardware are significant enough that the single-memory
stored-program architecture is commonly called the von Neumann archi-
tecture, and in software he was also an important innovator including
inventing merge sort.
One of the many things he studied was the problem of humans living
on Mars. He thought that to colonize Mars we should first terraform
it with robots. Mars is red because it is full of rust, iron oxide. Robots
could mine that rust, break it into iron and oxygen, and release the
oxygen into the atmosphere. With all of that iron, the robots could make
more robots. So von Neumann was thinking about making machines that could
self-reproduce.† A suggestion from his best friend S Ulam led him to explore the
topic by computing on a grid, a cellular automaton.

John von Neumann 1903–1957
Widespread interest in cellular automata greatly increased with
the appearance of the Game of Life, by J Conway. It was featured
in M Gardner’s celebrated Mathematical Games column of Scientific
American in October 1970. The rules are simple enough that a person
could immediately start experimenting. Lots of people did. When
personal computers appeared, Life became a computer craze since
it is easy for a beginner to program.

John Conway 1937–2020

Start by drawing a two-dimensional grid of square cells, as with
graph paper. Each cell has eight neighbors, four that are horizontally
or vertically adjacent and four more that are diagonally adjacent. The game
proceeds in stages, or generations. At each generation each cell is in one of two
states, alive or dead. For the next generation the next state is determined by: (1) a
live cell with two or three live neighbors will again be live at the next generation but
any other live cell dies, (2) a dead cell with exactly three live neighbors becomes
alive at the next generation but other dead cells stay dead. (The backstory goes
that for (1) live cells will die if they are either isolated or overcrowded while for (2),
if the environment is just right then the neighbors can reproduce to spread life
† There is a later Extra on self-reproduction.
into this cell.) We begin by seeding the board with some initial pattern, and then
watch what develops.
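The two rules translate into a few lines of code. This Python sketch (our own toy, separate from the book's Racket simulator) represents a board as the set of live cell coordinates.

```python
from collections import Counter

def step(live):
    """One Life generation; `live` is a set of (row, col) cells."""
    # Count, for every cell, how many live neighbors it has.
    counts = Counter((r + dr, c + dc)
                     for r, c in live
                     for dr in (-1, 0, 1) for dc in (-1, 0, 1)
                     if (dr, dc) != (0, 0))
    # A cell is live next generation if it has three live neighbors,
    # or if it is live now and has two.
    return {cell for cell, n in counts.items()
            if n == 3 or (n == 2 and cell in live)}

blinker = {(0, 0), (0, 1), (0, 2)}
block = {(0, 0), (0, 1), (1, 0), (1, 1)}
```

Stepping the blinker twice returns it to its start, while the block does not change at all.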
As Gardner noted, the rules of the game balance tedious simplicity against
impenetrable complexity.
Conway chose his rules carefully, after a long period of experimentation, to meet
three desiderata:
1. There should be no initial pattern for which there is a simple proof that the
population can grow without limit.
2. There should be initial patterns that apparently do grow without limit.
3. There should be simple initial patterns that grow and change for a considerable
period of time before coming to end in three possible ways: fading away completely
(from overcrowding or becoming too sparse), settling into a stable configuration
that remains unchanged thereafter, or entering an oscillating phase in which they
repeat an endless cycle of two or more periods.
In brief, the rules should be such as to make the behavior of the population unpredictable.
The result, as Conway says, is a mathematical recreation that is a “zero-player
game.”
The simplest nontrivial pattern, a single cell, immediately dies.†
Generation 0 Generation 1
Some other patterns don’t die but don’t do anything else, either. This 2 × 2
collection is a block. It is stable from generation to generation.
Generation 0 Generation 1
Because it doesn’t change, a block is a ‘still life’. Another still life is the beehive.
Generation 0 Generation 1
But many patterns are not still. This three-cell pattern, the blinker, does a
simple oscillation.
There are other patterns that move. This is a glider, the most famous pattern in
Life.
It moves one cell vertically and one horizontally every four generations, crawling
across the screen.
When Conway came up with the Life rules he was not sure whether there is a
pattern where the total number of live cells keeps on growing. B Gosper showed
that there is, by building the glider gun, which produces a new glider every thirty
generations.
The glider pattern is an example of a spaceship, a pattern that reappears, displaced,
after a number of generations. Here is another, the medium weight spaceship.
Another important pattern is the eater, which consumes gliders and other
spaceships.
I.C Exercises
For some of these a program to simulate the game will be a help. This book’s source
has a Life simulator written in Racket under the src/scheme directory. You can also
find simulators using a search engine.
C.4 On the left is the tub and on the right is the toad. One is a still life and one
an oscillator. Which is which?
C.5 It is easy to run the clock forward. Can you run the clock back?
Extra I.D Ackermann’s function is not primitive recursive
H (𝑛, 𝑥, 𝑦) =  𝑦 + 1                            – if 𝑛 = 0
               𝑥                                – if 𝑛 = 1 and 𝑦 = 0
               0                                – if 𝑛 = 2 and 𝑦 = 0
               1                                – if 𝑛 > 2 and 𝑦 = 0
               H (𝑛 − 1, 𝑥, H (𝑛, 𝑥, 𝑦 − 1))    – otherwise
We have cited that this function is not primitive recursive. Here we will produce a
simplified variant and then show that it is not primitive recursive.
In H’s definition, the variable 𝑥 does not play an active role. R Péter
noted this and got a function with a simpler definition, by considering
H (𝑛, 𝑦, 𝑦) . That, and tweaking the initial value of each level, gives this.
A (𝑘, 𝑦) =  𝑦 + 1                       – if 𝑘 = 0
            A (𝑘 − 1, 1)                – if 𝑘 > 0 and 𝑦 = 0
            A (𝑘 − 1, A (𝑘, 𝑦 − 1))     – if 𝑘 > 0 and 𝑦 > 0
Including the next two entries gives the sense that this function grows very fast
indeed.

A (4, 2) = 2^65536 − 3        A (4, 3) = 2^(2^65536) − 3
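Péter's definition transcribes directly. This Python sketch (ours) memoizes with lru_cache and raises the recursion limit, since even modest arguments recurse deeply.

```python
import sys
from functools import lru_cache

sys.setrecursionlimit(100000)  # the recursion is deep even for small k, y

@lru_cache(maxsize=None)
def A(k, y):
    """Peter's two-argument Ackermann variant, clause by clause."""
    if k == 0:
        return y + 1
    if y == 0:
        return A(k - 1, 1)
    return A(k - 1, A(k, y - 1))
```

The first levels are familiar: A(1, 𝑦) = 𝑦 + 2, A(2, 𝑦) = 2𝑦 + 3, and A(3, 𝑦) = 2^(𝑦+3) − 3. Do not try A(4, 2).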
We will prove that A is not primitive recursive. The intuition is that for any
𝑓 : N𝑛 → N that is primitive recursive, A grows faster than 𝑓 .
Recall that if a function has multiple inputs 𝑥0, ... 𝑥𝑛−1 then we sometimes
abbreviate that sequence with the vector 𝑥®. And, we will write max (𝑥®) for
max ({𝑥0, ... 𝑥𝑛−1 }). To compare the growth of the two-input function A with an
𝑛-input 𝑓, we will look at A (𝑘, max (𝑥®)).
† Although some authors mean the one-input version 𝑓 (𝑥) = A (𝑥, 𝑥).
The proof ’s strategy is to show that each primitive recursive function has a
natural number level, but A does not — it grows faster than any fixed-level function.
4.1 Definition Where 𝑘 ∈ N, a function 𝑓 is level 𝑘 if A (𝑘, max (𝑥®)) > 𝑓 (𝑥®) for
all 𝑥®.
By item e of the following result, if a function is level 𝑘 then it is also level 𝑘ˆ
for any 𝑘ˆ > 𝑘 .
4.2 Lemma (Monotonicity properties) (a) A (𝑘, 𝑦) > 𝑦
(b) A (𝑘, 𝑦 + 1) > A (𝑘, 𝑦), and in general if 𝑦̂ > 𝑦 then A (𝑘, 𝑦̂) > A (𝑘, 𝑦)
(c) A (𝑘 + 1, 𝑦) ≥ A (𝑘, 𝑦 + 1)
(d) A (𝑘, 𝑦) > 𝑘
(e) A (𝑘 + 1, 𝑦) > A (𝑘, 𝑦), and in general if 𝑘̂ > 𝑘 then A (𝑘̂, 𝑦) > A (𝑘, 𝑦)
(f) A (𝑘 + 2, 𝑦) > A (𝑘, 2𝑦)
Proof Here we will verify the first item, that A (𝑘, 𝑦) > 𝑦 for all 𝑘 and for all 𝑦 ,
leaving the others as Exercise D.12. We will do induction on 𝑘 . The 𝑘 = 0 base
step holds because A ( 0, 𝑦) = 𝑦 + 1, and so A ( 0, 𝑦) > 𝑦 .
For the inductive step, assume that this holds for 𝑘 = 0, ... 𝑛,

∀𝑦 A (𝑘, 𝑦) > 𝑦   (∗)

and consider the statement for 𝑛 + 1.

∀𝑦 A (𝑛 + 1, 𝑦) > 𝑦   (∗∗)
4.4 Lemma Each of these initial functions has a level: (1) the zero functions
Z (𝑥®) = 0, (2) the successor function S (𝑥) = 𝑥 + 1, and (3) the projection functions
I𝑖 (𝑥®) = I𝑖 (𝑥0, ... 𝑥𝑘−1 ) = 𝑥𝑖.
Proof For the first, 𝑘 = 0 suffices by the first clause of the definition of A since
A (0, 𝑦) = 𝑦 + 1 > Z (𝑦) = 0. For item (2), 𝑘 = 1 works because by Lemma 4.2.e
A (1, 𝑦) > A (0, 𝑦) = 𝑦 + 1. For (3) take 𝑘 = 0 because by the definition’s first
clause A (0, max (𝑥®)) = max (𝑥®) + 1 and that is larger than the projection I𝑖 (𝑥®), as
we are taking a maximum.
4.5 Lemma Let each primitive recursive function 𝑔0, ... 𝑔𝑚−1, ℎ have a level,
𝑘0, ... 𝑘𝑚−1, 𝑘𝑚. Let 𝑓 be the composition 𝑓 (𝑥®) = ℎ(𝑔0 (𝑥®), ... 𝑔𝑚−1 (𝑥®)). Then 𝑓
is level max ({𝑘0, ... 𝑘𝑚−1, 𝑘𝑚 }) + 2.
Proof Take 𝑘 = max ({𝑘0, ... 𝑘𝑚−1, 𝑘𝑚 }). Then all of the functions 𝑔0, ... 𝑔𝑚−1, ℎ
are level 𝑘 by Lemma 4.2.e.
Lemma 4.2’s item c and then the third clause in A’s definition gives this.

A (𝑘 + 2, max (𝑥®)) ≥ A (𝑘 + 1, max (𝑥®) + 1) = A (𝑘, A (𝑘 + 1, max (𝑥®)))   (∗)

Focusing on the second argument of the right-hand expression, Lemma 4.2.e and
the assumption that each function 𝑔0, ... 𝑔𝑚−1 is level 𝑘 show that for each function
index 𝑖 ∈ { 0, ... 𝑚 − 1 } we have A (𝑘 + 1, max (𝑥®)) > A (𝑘, max (𝑥®)) > 𝑔𝑖 (𝑥®).
Hence A (𝑘 + 1, max (𝑥®)) > max ({ 𝑔0 (𝑥®), ... 𝑔𝑚−1 (𝑥®) }).
Lemma 4.2.b says that A is monotone in the second argument, so returning to
equation (∗) and swapping out A (𝑘 + 1, max (𝑥®)) gives the first inequality here.

A (𝑘 + 2, max (𝑥®)) ≥ A (𝑘, max ({ 𝑔0 (𝑥®), ... 𝑔𝑚−1 (𝑥®) }))
                    > ℎ(𝑔0 (𝑥®), ... 𝑔𝑚−1 (𝑥®)) = 𝑓 (𝑥®)
A (𝑘 + 1, max (𝑥®) + 𝑦) > 𝑓 (𝑥®, 𝑦)   (∗)

A (𝑘 + 1, max (𝑥®) + 𝑛) > max ({ 𝑓 (𝑥®, 𝑛), 𝑥0, ... 𝑥𝑛−1, 𝑛 })
With that, the first inequality below follows from Lemma 4.2.b, monotonicity of A
in its second argument. The second holds because ℎ is a level 𝑘 function.

A (𝑘 + 1, max (𝑥®) + 𝑛 + 1) = A (𝑘, A (𝑘 + 1, max (𝑥®) + 𝑛))
                            > A (𝑘, max ({ 𝑓 (𝑥®, 𝑛), 𝑥0, ... 𝑥𝑛−1, 𝑛 }))
                            > ℎ(𝑓 (𝑥®, 𝑛), 𝑥®, 𝑛) = 𝑓 (𝑥®, 𝑛 + 1)
A (𝑘 + 3, max ({𝑥0, ... 𝑥𝑚−1, 𝑦 })) > A (𝑘 + 1, 2 · max ({𝑥0, ... 𝑥𝑚−1, 𝑦 }))
                                     ≥ A (𝑘 + 1, max (𝑥®) + 𝑦)
                                     > 𝑓 (𝑥®, 𝑦)

The second inequality follows from 2 · max ({𝑥0, ... 𝑥𝑚−1, 𝑦 }) ≥ max (𝑥®) + 𝑦, and
the third is (∗).
4.7 Corollary The function A is not primitive recursive.
Proof If A were primitive recursive then it would be of some level, 𝑘 . That
means A (𝑘, max ({𝑥, 𝑦 })) > A (𝑥, 𝑦) for all 𝑥, 𝑦 . Taking 𝑥 and 𝑦 to be 𝑘 gives a
contradiction.
I.D Exercises
D.8 In base 10, how many digits are in A (4, 2) = 2^65536 − 3?
D.9 A classmate asks you, “How does it work that all the levels of A are primitive
recursive but as a whole it is not? Isn’t that like saying you have a cake and all the
parts are delicious but the cake as a whole is not?”
D.10 Trace through the argument to find a level number 𝑘 for these primitive
recursive functions (it needn’t be the least level).
(a) 𝑓 (𝑦) = 𝑦 + 2
(b) pred (𝑦) = 𝑦 − 1 if 𝑦 > 0 and pred ( 0) = 0.
D.11 Show that for any 𝑘, 𝑦 the evaluation of A (𝑘, 𝑦) terminates.
D.12 Verify these parts of Lemma 4.2. (a) Item b, A (𝑘, 𝑦 + 1) > A (𝑘, 𝑦) and
in general if 𝑦̂ > 𝑦 then A (𝑘, 𝑦̂) > A (𝑘, 𝑦) (b) Item c, A (𝑘 + 1, 𝑦) ≥ A (𝑘, 𝑦 + 1)
(c) Item d, A (𝑘, 𝑦) > 𝑘 (d) Item e, A (𝑘 + 1, 𝑦) > A (𝑘, 𝑦) and in general if 𝑘̂ > 𝑘
then A (𝑘̂, 𝑦) > A (𝑘, 𝑦) (e) Item f, A (𝑘 + 2, 𝑦) > A (𝑘, 2𝑦)
Extra I.E LOOP programs
The primitive recursive functions are a proper subset of the general recursive
functions. The latter set consists of all functions that are mechanically computable
(under Church’s Thesis), so that collection is easy to understand. We will now give
a concrete way to understand the primitive recursive functions.
Here is a Racket for loop,
(define (show-numbers)
(for ([i '(1 2 3)])
(display i)))
The difference is that in a for loop we know in advance the number of times that
the machine will go through the code inside the loop (above it is three times) —
as long as we don’t change the value of the loop variable — but a do allows the
machine to go through its code an unbounded number of times. For instance, this
procedure loops until the user enters the right string.

(define (wait-until-yes)
  (displayln "Please enter 'yes'")
  (do ()
      ((equal? (read-line) "yes"))
    (displayln "Enter exactly the string 'yes'"))
  (displayln "Thanks"))
> (wait-until-yes)
Please enter 'yes'
yse
Enter exactly the string 'yes'
yes
Thanks
The next result says that a function is primitive recursive if and only if it can be
computed using only for loops.
E.1 Theorem (Meyer and Ritchie, 1967) A function is primitive recursive if and
only if it can be computed without using unbounded loops. More precisely, it is
limited to loops where we can compute in advance, using only primitive recursive
functions, how many iterations will occur.
We will show half of this, that if a function is
primitive recursive then we can compute it using only
bounded loops. We will do it by programming the
primitive recursive functions in a language, called
LOOP, that does not have unbounded loops. (Proof of
the converse is outside our scope.)

Albert Meyer b 1941 and Dennis Ritchie 1941–2011 (inventor of C)

Programs in LOOP execute on a machine model
with registers r0, r1, ... that hold natural numbers.
There are four kinds of instructions, which we describe using r0 and r1: (i) r0 = 0
sets the contents of the register to zero, (ii) r0 = r0 + 1 increments the register,
(iii) r0 = r1 copies the contents of r1 into r0, and (iv) loop r0 ... end repeats
the enclosed instructions as many times as the value that r0 holds when the loop
is entered.
Very important: changing the contents of the loop register inside of the loop
does not change the number of times that the machine steps through that loop.
Thus, what’s below is not an infinite loop.
loop r0
r0 = r0 + 1
end
Instead, when the loop ends the value in r0 will be twice what it was when the
loop began.
To interpret LOOP programs as computing functions, we need a convention for
input and output. Where the function takes 𝑛 inputs, we will preload those inputs
into the machine’s first 𝑛 registers. Similarly, where the function has 𝑚 outputs
we take those to be the final values of the first 𝑚 registers.
With that convention, this LOOP program computes the two-input, one output
addition function plus (𝑥, 𝑦) = 𝑥 + 𝑦 .
# plus.loop Return r0 + r1
loop r1
r0 = r0 + 1
end
This book’s source distribution comes with loop . rkt , a Racket program that
interprets LOOP code. Here is an invocation running that code.†
jim@millstone:src/scheme/prologue$ ./loop.rkt -f machines/plus.loop -p "3 2" -s
r0=3 r1=2
--start loop of 2 repetitions --
r0=4 r1=2
r0=5 r1=2
--end loop--
5
The program options are: -p preloads the registers r0 and r1 with 3 and 2, while
-s shows the registers for each step of the computation. By default the simulator
returns the value of the first register, here 5.
† Racket version 8.2.
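An interpreter for this fragment of LOOP is itself a short program. This Python sketch (our own toy run_loop, independent of the book's loop.rkt) handles the four instruction kinds; note that it reads a loop's repetition count once, on entry, matching the language's semantics.

```python
import re

def run_loop(program, regs):
    """Interpret a LOOP program; `regs` maps register names to naturals."""
    lines = [ln.strip() for ln in program.strip().splitlines()
             if ln.strip() and not ln.strip().startswith('#')]

    def matching_end(start):
        """Index just past the `end` matching the `loop` at start - 1."""
        depth, i = 1, start
        while depth > 0:
            if lines[i].startswith('loop'):
                depth += 1
            elif lines[i] == 'end':
                depth -= 1
            i += 1
        return i

    def run_block(i):
        """Run lines from index i until the `end` that closes this block."""
        while i < len(lines):
            ln = lines[i]
            if ln == 'end':
                return
            m = re.fullmatch(r'loop (r\d+)', ln)
            if m:
                # The repetition count is read once, on entry to the loop.
                for _ in range(regs.get(m.group(1), 0)):
                    run_block(i + 1)
                i = matching_end(i + 1)
            else:
                tgt, expr = (s.strip() for s in ln.split('='))
                if expr == '0':
                    regs[tgt] = 0                                   # ri = 0
                elif expr.endswith('+ 1'):
                    regs[tgt] = regs.get(expr[:-3].strip(), 0) + 1  # ri = rj + 1
                else:
                    regs[tgt] = regs.get(expr, 0)                   # ri = rj
                i += 1

    run_block(0)
    return regs

plus = "loop r1\nr0 = r0 + 1\nend"                    # the plus.loop program
pred = "loop r0\nr2 = r1\nr1 = r1 + 1\nend\nr0 = r2"
```

Because the count is fixed on entry, incrementing the loop register inside its own loop doubles it rather than looping forever.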
Two more examples. This computes the predecessor function pred (𝑥),

# pred.loop Return r0 - 1 (or 0)
loop r0
r2 = r1
r1 = r1 + 1
end
r0 = r2

and this rotates the contents of the first three registers one place to the right.

# rotate-shift-right.loop Send r0 r1 r2 to r2 r0 r1
r3 = r2
r2 = r1
r1 = r0
r0 = r3
The program’s -o option lets us show three registers instead of the default one
jim@millstone$ ./loop.rkt -f machines/rotate-shift-right.loop -p "1 2 3" -o 3
3 1 2
(we’ve avoided showing the computation’s steps by not using the s option).
We are now ready to prove that for each primitive recursive function there is a
LOOP program that computes it. The strategy is to first show how to compute the
initial functions and then show how to do the combining operations of function
composition and primitive recursion.
The zero function Z (𝑥) = 0 is computed by the LOOP program whose single
line is r0 = 0. The successor function S (𝑥) = 𝑥 + 1 is computed by the one-line
r0 = r0 + 1. Projection I 𝑖 (𝑥 0, ... 𝑥𝑖 , ... 𝑥𝑛−1 ) = 𝑥𝑖 is computed by r0 = r𝑖 .
Composition of two functions is easy. Let 𝑔(𝑥 0, ... 𝑥𝑛 ) and 𝑓 (𝑦0, ... 𝑦𝑚 ) be
computed by LOOP programs 𝑃𝑔 and 𝑃 𝑓 . Suppose that the bookkeeping of the
composition 𝑓 ◦ 𝑔 is right, that 𝑔 is an 𝑚 -output function to match the number of
𝑓 ’s inputs. Then concatenating the two programs, so that the instructions of 𝑃𝑔
are just followed by the instructions of 𝑃 𝑓 , gives the desired LOOP program for
composition, since it uses the output of 𝑔 as input to compute the action of 𝑓 .
General composition starts with the functions
𝑓 (𝑥 0, ... 𝑥𝑛 ), ℎ 0 (𝑦0,0, ... 𝑦0,𝑚0 ), ... ℎ𝑛 (𝑦𝑛,0, ... 𝑦𝑛,𝑚𝑛 )
and produces 𝑓 (ℎ 0 (𝑦0,0, ... 𝑦0,𝑚0 ), ... ℎ𝑛 (𝑦𝑛,0, ... 𝑦𝑛,𝑚𝑛 )) . This needs a little more
thought than the two-function case. The issue is that were we to load the inputs
𝑦0,0 , . . . 𝑦𝑛,𝑚𝑛 into the registers r0 , r1 , . . . and then immediately begin computing
ℎ 0 , there would be a danger of overwriting the inputs for later functions such as
ℎ 1 . For instance, rotate-shift-right.loop above used an extra register, r3 ,
beyond those used to store inputs.
So we must move those inputs out of the way. Let 𝑃 𝑓 , 𝑃ℎ0 , . . . 𝑃ℎ𝑛 be LOOP
programs to compute the functions. Each uses a limited number of registers and
thus there is a number 𝑗 so large that no program uses register 𝑗 . By definition, the
program 𝑃 to compute the composition gets the sequence of inputs starting in the
register numbered 0. The first step is to copy these inputs to start in the register 𝑗 .
Next, zero out the registers below register 𝑗 , copy ℎ 0 ’s arguments down to begin
at r0 , and run the program 𝑃ℎ0 . When it finishes, copy its output to the register
numbered 𝑗 + 𝑚 0 + · · · + 𝑚𝑛 + 1. Do a similar thing for the other ℎ𝑖 ’s. Finish
by copying these outputs down to the initial registers, zeroing out the remaining
registers, and running 𝑃 𝑓 .
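The register discipline just described can be sketched in Python, modeling each program as a function and the registers as a dictionary (the name run_composition and the choice of 𝑗 are ours, for illustration):

```python
def run_composition(Pf, Phs, arities, inputs):
    """Sketch of general composition f(h0(...), ..., hn(...)) using the
    register strategy described above."""
    j = len(inputs) + 1                # a register index that no program uses
    regs = {}
    for i, v in enumerate(inputs):     # copy the inputs to start at register j
        regs[j + i] = v
    outputs, pos = [], 0
    for Ph, m in zip(Phs, arities):    # run each P_h on its own arguments
        args = [regs[j + pos + t] for t in range(m)]
        outputs.append(Ph(*args))      # stash its output out of the way
        pos += m
    return Pf(*outputs)                # finish by running P_f on those outputs
```

For instance, with 𝑓 (𝑎, 𝑏) = 𝑎 · 𝑏 , ℎ 0 (𝑥) = 𝑥 + 1, ℎ 1 (𝑥, 𝑦) = 𝑥 + 𝑦 , and inputs 2, 3, 4, this computes (2 + 1) · (3 + 4) = 21.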
The other combiner operation is primitive recursion.
𝑓 (𝑥 0, ... 𝑥𝑘−1, 𝑦) = 𝑔(𝑥 0, ... 𝑥𝑘−1 )   – if 𝑦 = 0
𝑓 (𝑥 0, ... 𝑥𝑘−1, 𝑦) = ℎ(𝑓 (𝑥 0, ... 𝑥𝑘−1, 𝑧), 𝑥 0, ... 𝑥𝑘−1, 𝑧)   – if 𝑦 = S (𝑧)
Suppose that we have LOOP programs 𝑃𝑔 and 𝑃ℎ . The register swapping needed
is similar to what happens for composition so we won’t go through it. The
program 𝑃 𝑓 starts by running 𝑃𝑔 . Then it sets a fresh register to 0; call that
register t. Now it enters a loop based on the register y (that is, successive times
through the loop count down as 𝑦 , 𝑦 − 1, etc.). The body of the loop computes
𝑓 (𝑥 0, ... 𝑥𝑘 −1, 𝑡 + 1) = ℎ(𝑓 (𝑥 0, ... 𝑥𝑘 −1, 𝑡), 𝑥 0, ... 𝑥𝑘 −1, 𝑡) by running 𝑃ℎ , and then
incrementing t. That ends the argument.
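The scheme just argued can be sketched in Python, with 𝑃𝑔 and 𝑃ℎ modeled as ordinary functions and the accumulator acc standing in for the register holding 𝑓 ’s running value (names ours):

```python
def prim_rec(g, h):
    """Return the f defined by primitive recursion from g and h."""
    def f(xs, y):
        acc = g(*xs)                 # f(xs, 0) = g(xs): run P_g
        t = 0                        # the fresh register t, set to 0
        for _ in range(y):           # loop based on the register holding y
            acc = h(acc, *xs, t)     # f(xs, t+1) = h(f(xs, t), xs, t): run P_h
            t = t + 1                # increment t
        return acc
    return f

# addition from the successor: add(x, 0) = x, add(x, y + 1) = S(add(x, y))
add = prim_rec(lambda x: x, lambda acc, x, t: acc + 1)
```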
We close with a remark on an interesting aspect of loop.rkt, the interpreter
for LOOP. It works by replacing the C-like syntax used above with a LISP-ish one.
For instance, the interpreter converts the string input on the left to the string on
the right.
The advantage of this switch is that the parentheses automatically match the
beginning of each loop with its end and thus we don’t have to write into the
interpreter some code, including a stack, to keep track of loop nesting. With the
string on the right, loop.rkt computes the answer by running it through the
eval command.
56 Chapter I. Mechanical Computation
I.E Exercises
E.2 Write a LOOP program that inputs two numbers and swaps them, so that 𝑥, 𝑦
becomes 𝑦, 𝑥 .
E.3 Argue that the LOOP language would not gain strength if it were to allow
statements like r0 = r0 + 2, or statements like r0 = 1.
E.4 The program rotate-shift-right.loop inputs three numbers, outputs
three, and shifts the inputs right (with the third number ending in the first register).
Write a three-input/three-output program that does a rotate shift left. Also write
the program that composes the two. What does it compute?
E.5 In Ackermann’s function, after the operations plus (𝑥, 𝑦) and product (𝑥, 𝑦)
comes power (𝑥, 𝑦) . Write a LOOP program for it.
E.6 What happens when you try to change a Racket loop variable inside of the
loop? For example, what is the behavior of these two?
We want to understand the set of functions that are effective, that are mechani-
cally computable, which we have defined as computable by a Turing machine. The
major result of this chapter and the single most important result in the book is
that there are functions not computed by any machine — there are jobs that no
machine can do. We will first prove this with a counting argument, and later in
the chapter we will give specific problems that are unsolvable.
Section
II.1 Infinity
We will show that there are more functions 𝑓 : N → N than Turing machines and
that therefore there are functions with no associated machine.
Cardinality The set of functions and the set of Turing machines are both
infinite. We will begin with two paradoxes that dramatize the challenge
to our intuition posed by comparing the sizes of infinite sets. We will then
produce the mathematics to resolve these puzzles, and apply it to the sets
of functions and Turing machines.
The first puzzle is Galileo’s Paradox. It compares the size of the set
of perfect squares with the size of the set of natural numbers. The first
is a proper subset of the second and so it may seem somehow smaller.
(Galileo Galilei, 1564–1642.)
However, the figure below shows that the two sets can be made to match
element-to-element, to correspond, so in this sense there are exactly as
many squares as there are natural numbers.
1.1 Animation: Correspondence 𝑛 ↔ 𝑛 2 between the natural numbers and the squares.
The second puzzle is Aristotle’s Paradox. On the left below are two circles. If
we roll them through one revolution then the trail left by the smaller one is shorter.
But if we put the smaller inside the larger and roll them, as in a train wheel, then
they appear to leave equal-length trails.
Image: This is the Hubble Deep Field image. It came from pointing the Hubble telescope at the darkest
part of the sky, the very background, for eleven days. It covers an area of the sky about the same width
as a dime viewed seventy-five feet away. Every speck is a galaxy. There are thousands of them — there is
a lot in the background. Credit: Robert Williams and the Hubble Deep Field Team (STScI) and NASA.
(Also see the Deep Field movie.)
60 Chapter II. Background
As with Galileo’s Paradox, a person might think that the smaller circle’s points
make a set that is in some way smaller. But point-for-point, the smaller circle
matches the larger. The correct view is that the two sets of points have the same
number of elements, because they correspond.
The animations below illustrate matching the points in two ways. On the
left they are shown as nested circles, with points on the inside corresponding
to points on the outside. The second animation straightens that out so that the
circumferences make segments, and then for every point on the top there is a
matching point on the bottom.
1.5 Lemma For a function with a finite domain, the number of elements in its domain
is greater than or equal to the number of elements in its range. If the function is
one-to-one then its domain has the same number of elements as its range, while if
it is not one-to-one then its domain has more elements. Consequently, two finite
sets have the same number of elements if and only if they correspond, that is, if
and only if there is a function from one to the other that is a correspondence.
Proof Exercise 1.49.
1.6 Lemma The relation between two sets of ‘there is a correspondence from one to
the other’ is an equivalence.
Section 1. Infinity 61
Proof Reflexivity, that any set is related to itself, is clear since a set corresponds to
itself via the identity function. For symmetry suppose that 𝑆 0 is related to 𝑆 1 , so
that there is a correspondence 𝑓 : 𝑆 0 → 𝑆 1 , and recall that its inverse 𝑓 ⁻¹ : 𝑆 1 → 𝑆 0
exists and is a correspondence in the other direction. For transitivity, assume
that 𝑆 0 is related to 𝑆 1 and 𝑆 1 is related to 𝑆 2 , so that there are correspondences
𝑓 : 𝑆 0 → 𝑆 1 and 𝑔 : 𝑆 1 → 𝑆 2 . Recall also that the composition 𝑔 ◦ 𝑓 : 𝑆 0 → 𝑆 2 is a
correspondence.
We now give that relation a name. This carries from the finite to the infinite
the observation of Lemma 1.5 about same-sized sets.
1.7 Definition Two sets have the same cardinality or are equinumerous, denoted
|𝑆 0 | = |𝑆 1 | , if there is a correspondence between them.
1.8 Example Galileo’s Paradox is that the set of squares 𝑆 = {𝑛² | 𝑛 ∈ N } has the same
cardinality as N, written |𝑆 | = | N | . The function 𝑓 : N → 𝑆 given by 𝑓 (𝑛) = 𝑛²
is one-to-one because if 𝑓 (𝑥 0 ) = 𝑓 (𝑥 1 ) then 𝑥 0² = 𝑥 1² and thus, since these are
nonnegative, 𝑥 0 = 𝑥 1 . It is onto because any element of the codomain 𝑦 ∈ 𝑆 is the
square of some 𝑛 from the domain N, by the definition of 𝑆 .
1.9 Example Aristotle’s Paradox is that for 𝑟 0, 𝑟 1 ∈ R+, the interval [ 0 .. 2𝜋𝑟 0 ) has the
same cardinality as the interval [ 0 .. 2𝜋𝑟 1 ) . The map 𝑔(𝑥) = ( 2𝜋𝑟 1 /2𝜋𝑟 0 ) · 𝑥 is a
correspondence; verification is Exercise 1.43.
1.10 Example The sets 𝑆 0 = { 0, 1, 2, 3 } and 𝑆 1 = { 10, 11, 12, 13 } have the same
cardinality, |𝑆 0 | = |𝑆 1 | . One correspondence, from 𝑆 0 to 𝑆 1 , is 𝑥 ↦→ 𝑥 + 10.
1.11 Example The set of natural numbers greater than zero, N+ = { 1, 2, ... }, has the
same cardinality as N. A correspondence is 𝑓 : N → N+ given by 𝑛 ↦→ 𝑛 + 1.
Comparing the sizes of sets in this way was proposed by G Cantor in
the 1870’s. As the paradoxes above dramatize, Definition 1.7 introduces
a deep idea. We should convince ourselves that it captures what we
mean by sets having the ‘same number’ of elements. One supporting
argument is that it is the natural generalization of Lemma 1.5. A
second is Lemma 1.6, that it partitions sets into classes so that inside
of a class all of the sets have the same cardinality. That is, it justifies
the “equi” in equinumerous. The most important supporting argument
is that, as with Turing’s definition of his machine, Cantor’s definition
Georg Cantor is persuasive in itself. Gödel noted this, writing “Whatever ‘number’
1845–1918 as applied to infinite sets may mean, we certainly want it to have the
property that the number of objects belonging to some class does not
change if, leaving the objects the same, one changes in any way . . . e.g., their
colors or their distribution in space . . . From this, however, it follows at once that
two sets will have the same [cardinality] if their elements can be brought into
one-to-one correspondence, which is Cantor’s definition.”
1.12 Definition A set is finite if it has the same cardinality as { 0, 1, ... 𝑛 } for some
𝑛 ∈ N, or if it is empty. Otherwise it is infinite.
For us the most important infinite set is the natural numbers, N = { 0, 1, 2, ... }.
1.13 Definition A set with the same cardinality as the natural numbers is countably
infinite. A set that is either finite or countably infinite is countable. If a set is
the range of a function whose domain is the natural numbers then we say the
function enumerates, or is an enumeration of, that set.†
The idea behind the term ‘enumeration’ is that 𝑓 : N → 𝑆 lists its range: first
𝑓 ( 0) , then 𝑓 ( 1) , etc. (This listing might have repeats, where 𝑓 (𝑛 0 ) = 𝑓 (𝑛 1 ) but
𝑛 0 ≠ 𝑛 1 .) We are often interested in enumerations that are computable.
1.14 Example The set of multiples of three, 3N = { 3𝑘 | 𝑘 ∈ N }, is countable. The
natural map 𝑔 : N → 3N is 𝑔(𝑛) = 3𝑛 .
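This enumeration is computable; in Python (a sketch) the listing 𝑔( 0) , 𝑔( 1) , . . . is a generator:

```python
from itertools import count, islice

def g():
    """Computable enumeration of 3N = { 3k | k in N }."""
    for n in count(0):
        yield 3 * n
```

For instance, list(islice(g(), 5)) produces the first five multiples of three.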
1.15 Example The set N − { 2, 5 } = { 0, 1, 3, 4, 6, 7, ... } is countable. The function below,
both formally defined and illustrated with a table, closes up the gaps.
𝑛 – if 𝑛 < 2
0 1 2 3 4 5 6 ...
𝑛
𝑓 (𝑛) = 𝑛 + 1 – if 𝑛 ∈ { 2, 3 }
𝑓 (𝑛) 0 1 3 4 6 7 8 ...
𝑛 + 2 – if 𝑛 ≥ 4
This function is clearly both one-to-one and onto.
1.16 Example The set of prime numbers 𝑃 is countable. There is a function 𝑝 : N → 𝑃
where 𝑝 (𝑛) is the 𝑛 -th prime, so that 𝑝 ( 0) = 2, 𝑝 ( 1) = 3, etc.
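This 𝑝 is also computable; here is a Python sketch by trial division:

```python
def p(n):
    """The n-th prime, with p(0) = 2, by trial division."""
    primes = []
    candidate = 2
    while len(primes) <= n:
        if all(candidate % q != 0 for q in primes):  # no smaller prime divides it
            primes.append(candidate)
        candidate += 1
    return primes[n]
```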
1.17 Example Fix the set of symbols Σ = { a, ... z }. Consider the set of strings made of
those symbols, such as az and abba. The set of all such strings, Σ∗, is countable. This
table illustrates one correspondence, the one that puts the strings in lexicographic
order, where shorter strings come before longer ones and equal-length strings
come in alphabetical order. (The first entry is the empty string, 𝜀 = ‘ ’.)
𝑛 ∈ N       0 1 2 3 ... 26 27 28 ...
𝑓 (𝑛) ∈ Σ∗  𝜀 a b c ... z  aa ab ...
1.18 Example The set of integers Z is countable: alternate between the
nonnegative and the negative integers.
𝑛 ∈ N      0 1  2  3  4  5  6 ...
𝑓 (𝑛) ∈ Z  0 +1 −1 +2 −2 +3 −3 ...
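The lexicographic listing of Example 1.17 is computable. A Python sketch (names ours), using the standard library to run through the length- 𝑛 strings in alphabetical order:

```python
from itertools import count, islice, product
from string import ascii_lowercase

def strings():
    """Enumerate all strings over {a, ..., z}, shorter before longer and
    equal lengths alphabetically: '', 'a', ..., 'z', 'aa', 'ab', ..."""
    for n in count(0):
        for tup in product(ascii_lowercase, repeat=n):
            yield ''.join(tup)
```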
Zeno’s Paradox has Achilles racing a tortoise that gets a head start, at 𝑥 0 . By the
time Achilles reaches the tortoise’s start, the tortoise will have moved on to some 𝑥 1 . On reaching 𝑥 1 , Achilles will
find that the tortoise is ahead at 𝑥 2 . For any 𝑥𝑖 , Achilles will always be behind and
so, the tortoise reasons, Achilles can never get ahead. The heart of this argument
is that while the distances 𝑥𝑖+1 − 𝑥𝑖 shrink toward zero, there is always further to
go because of the open-endedness at the left of the interval ( 0 .. ∞) .
1.19 Figure: Zeno of Elea shows Youths the Doors to Truth and Falsehood, by covering half
the distance to the door, and then half of that, etc. (By either B Carducci (1560–
1608) or P Tibaldi (1527–1596).)
Zeno’s Paradox is not directly connected to the material of this section. But in
this chapter we will often give arguments that use the unboundedness of the
natural numbers, that is, that leverage the open-endedness of N at infinity.
II.1 Exercises
1.25 Decide if each set is finite or infinite and justify your answer. (a) { 1, 2, 3 }
(b) { 0, 1, 4, 9, 16, ... } (c) the set of prime numbers (d) the set of real roots of
𝑥 5 − 5𝑥 4 + 3𝑥 2 + 7
1.26 Show that each pair of sets has the same cardinality by producing a one-to-
one and onto function from one to the other. You must verify that the function is a
correspondence. (a) { 0, 1, 2 }, { 3, 4, 5 } (b) Z, {𝑖³ | 𝑖 ∈ Z }
✓ 1.27 Show that each pair of sets has the same cardinality by producing a corre-
spondence (you must verify that the function is a correspondence): (a) { 0, 1, 3, 7 }
and { 𝜋, 𝜋 + 1, 𝜋 + 2, 𝜋 + 3 } (b) the even natural numbers and the perfect squares
(c) the real intervals ( 1 .. 4) and (−1 .. 1) .
✓ 1.28 Verify that the function 𝑓 (𝑥) = 1/𝑥 is a correspondence between the subsets
( 0 .. 1) and ( 1 .. ∞) of R.
1.29 Give a formula for a correspondence between the sets { 1, 2, 3, 4, ... } and
{ 7, 10, 13, 16 ... }.
✓ 1.30 Consider the set of characters 𝐶 = { 0, 1, ... 9 } and the set of integers
𝐴 = { 48, 49, ... 57 }.
(a) Produce a correspondence 𝑓 : 𝐶 → 𝐴.
(b) Verify that the inverse 𝑓 ⁻¹ : 𝐴 → 𝐶 is also a correspondence.
✓ 1.31 Show that each pair of sets have the same cardinality. You must give a
suitable function and also verify that it is one-to-one and onto. (a) N and the set
of even numbers (b) N and the odd numbers (c) the even numbers and the odd
numbers
✓ 1.32 Although sometimes there is a correspondence that is natural, correspon-
dences need not be unique. Produce the natural correspondence from ( 0 .. 1) to
( 0 .. 2) , and then produce a different one, and then another different one.
1.33 Example 1.8 gives one correspondence between the natural numbers and
the perfect squares. Give another.
1.34 Fix 𝑐 ∈ R such that 𝑐 > 1. Show that 𝑓 : R → ( 0 .. ∞) given by 𝑥 ↦→ 𝑐 𝑥 is a
correspondence.
1.35 Show that the set of powers of two { 2^𝑘 | 𝑘 ∈ N } and the set of powers of
three { 3^𝑘 | 𝑘 ∈ N } have the same cardinality. Generalize.
1.36 For each, give functions from N to itself. You must justify your claims.
(a) Give two examples of functions that are one-to-one but not onto. (b) Give two
examples of functions that are onto but not one-to-one. (c) Give two that are neither.
(d) Give two that are both.
1.37 Show that the intervals ( 3 .. 5) and (−1 .. 10) of real numbers have the same
cardinality by producing a correspondence. Then produce a second one.
1.38 Show that the sets have the same cardinality. (a) { 4𝑘 | 𝑘 ∈ N }, { 5𝑘 | 𝑘 ∈ N }
(b) { 0, 1, ... 99 }, {𝑚 ∈ N | 𝑚² < 10 000 } (c) { 0, 1, 3, 6, 10, 15, ... }, N
✓ 1.39 Produce a correspondence between each pair of open intervals of reals.
(a) ( 0 .. 1) , ( 0 .. 2)
Section
II.2 Cantor’s correspondence
Countability is a property of sets so we can ask how it interacts with set operations.
We start with the Cartesian product operation, in part because we will want to
count Turing machines, which are sets of four-tuples.
2.1 Example The set 𝑆 = { 0, 1 } × N consists of ordered pairs ⟨𝑖, 𝑗⟩ where 𝑖 ∈ { 0, 1 }
and 𝑗 ∈ N. The diagram below shows two columns, each of which looks like
the natural numbers in that it is discrete and unbounded in one direction. So
informally, 𝑆 is twice the natural numbers. As in Galileo’s Paradox this might lead
to a mistaken guess that it has more members than N. But 𝑆 is countable.
To count it, alternate between columns.
  ⋮       ⋮
⟨0, 3⟩  ⟨1, 3⟩
⟨0, 2⟩  ⟨1, 2⟩
⟨0, 1⟩  ⟨1, 1⟩
⟨0, 0⟩  ⟨1, 0⟩
𝑛∈N 0 1 2 3 4 5 ...
⟨𝑖, 𝑗⟩ ∈ 𝑆 ⟨0, 0⟩ ⟨1, 0⟩ ⟨0, 1⟩ ⟨1, 1⟩ ⟨0, 2⟩ ⟨1, 2⟩ ...
Section 2. Cantor’s correspondence 67
The map from the table’s top row to the bottom is a pairing function. Its inverse,
from bottom to top, is an unpairing function. This counting technique extends to
three copies, { 0, 1, 2 } × N, to four copies, etc.
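The alternation for { 0, 1 } × N has a simple closed form; in Python (a sketch, with names ours):

```python
def pair(n):
    """The n-th element of {0,1} x N under the alternating count."""
    return (n % 2, n // 2)     # which column, then height in that column

def unpair(i, j):
    """Inverse: the position of <i, j> in the count."""
    return 2 * j + i
```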
2.3 Lemma The Cartesian product of two finite sets is finite, and therefore countable.
The Cartesian product of a finite set and a countably infinite set, or of a countably
infinite set and a finite set, is countably infinite.
Proof Exercise 2.42; use the above example as a model.
2.4 Example The obvious next set to consider is the Cartesian product of two
countably infinite sets, N × N. In the informal language of the prior example we
can describe it as infinitely many copies of the natural numbers.
.. .. .. ..
. . . .
⟨0, 3⟩ ⟨1, 3⟩ ⟨2, 3⟩ ⟨3, 3⟩ ···
⟨0, 2⟩ ⟨1, 2⟩ ⟨2, 2⟩ ⟨3, 2⟩ ···
⟨0, 1⟩ ⟨1, 1⟩ ⟨2, 1⟩ ⟨3, 1⟩ ···
⟨0, 0⟩ ⟨1, 0⟩ ⟨2, 0⟩ ⟨3, 0⟩ ···
Sticking to a single column or row won’t work so here also we need to alternate.
Starting from the lower left, do a breadth-first traversal: after ⟨0, 0⟩ , next take pairs
that are one away, ⟨1, 0⟩ and ⟨0, 1⟩ , then those that are two away, ⟨2, 0⟩ , ⟨1, 1⟩
and ⟨0, 2⟩ , etc.
  ⋮       ⋮       ⋮
⟨0, 3⟩  ⟨1, 3⟩  ⟨2, 3⟩  ⟨3, 3⟩
⟨0, 2⟩  ⟨1, 2⟩  ⟨2, 2⟩  ⟨3, 2⟩  ...
⟨0, 1⟩  ⟨1, 1⟩  ⟨2, 1⟩  ⟨3, 1⟩  ...
⟨0, 0⟩  ⟨1, 0⟩  ⟨2, 0⟩  ⟨3, 0⟩  ...
𝑛∈N 0 1 2 3 4 5 6 ...
⟨𝑥, 𝑦⟩ ∈ N2 ⟨0, 0⟩ ⟨0, 1⟩ ⟨1, 0⟩ ⟨0, 2⟩ ⟨1, 1⟩ ⟨2, 0⟩ ⟨0, 3⟩ . . .
  ⋮       ⋮       ⋮
⟨0, 3⟩  ⟨1, 3⟩  ⟨2, 3⟩  ⟨3, 3⟩
⟨0, 2⟩  ⟨1, 2⟩  ⟨2, 2⟩  ⟨3, 2⟩  ...
⟨0, 1⟩  ⟨1, 1⟩  ⟨2, 1⟩  ⟨3, 1⟩  ...
⟨0, 0⟩  ⟨1, 0⟩  ⟨2, 0⟩  ⟨3, 0⟩  ...
Diagonal:  0      1      2      3
The pair ⟨1, 2⟩ is on diagonal 3. Prior to that diagonal come six pairs: diagonal 0
has a single entry, diagonal 1 has two entries, and diagonal 2 has three entries.
Thus, because the counting starts at zero, diagonal 3’s initial pair ⟨0, 3⟩ is number 6
in Cantor’s correspondence. With that, ⟨1, 2⟩ is number 7.
To find the number corresponding to ⟨𝑥, 𝑦⟩ , observe that it lies on diagonal
𝑑 = 𝑥 + 𝑦 . Prior to diagonal 𝑑 come 1 + 2 + · · · + 𝑑 pairs, which is an arithmetic
series with total 𝑑 (𝑑 + 1)/2. So on diagonal 𝑑 the first pair, ⟨0, 𝑥 + 𝑦⟩ , has
number (𝑥 + 𝑦) (𝑥 + 𝑦 + 1)/2 in Cantor’s correspondence. Next on that diagonal,
⟨1, 𝑥 + 𝑦 − 1⟩ gets the number 1 + [(𝑥 + 𝑦) (𝑥 + 𝑦 + 1)/2] , etc. In general,
cantor (𝑥, 𝑦) = 𝑥 + [(𝑥 + 𝑦) (𝑥 + 𝑦 + 1)/2] .
2.8 Example Two early examples are cantor ( 1, 2) = 7 and cantor ( 6, 2) = 42. A later
one is cantor ( 0, 36) = 666.
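Both cantor and its inverse are easy to compute. A Python sketch (the helper name uncantor is ours; the book’s Exercise 2.27 calls the inverse pair):

```python
def cantor(x, y):
    """Cantor's correspondence N x N -> N."""
    return x + (x + y) * (x + y + 1) // 2

def uncantor(n):
    """Inverse of cantor: recover <x, y> from n."""
    d = 0                                  # find the diagonal d = x + y:
    while (d + 1) * (d + 2) // 2 <= n:     # the largest d with d(d+1)/2 <= n
        d += 1
    x = n - d * (d + 1) // 2               # the position along that diagonal
    return (x, d - x)
```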
2.9 Lemma The Cartesian product N × N is countable, for instance under Cantor’s
correspondence, cantor : N2 → N. As well, the sets N3 = N × N × N, and N4 , . . .
are all countable.
Proof The function cantor : N × N → N is one-to-one and onto by construction,
meaning that the construction ensures that each output natural number is associated
with one and only one input pair.
The prior paragraph with domain N2 forms the base step of an induction
argument. To do N3 the idea is to take a triple ⟨𝑥, 𝑦, 𝑧⟩ to be a pair whose first entry
is a pair, ⟨⟨𝑥, 𝑦⟩, 𝑧⟩ . More formally, define cantor3 : N3 → N by cantor3 (𝑥, 𝑦, 𝑧) =
cantor ( cantor (𝑥, 𝑦), 𝑧) . Exercise 2.35 shows that this function is a correspondence.
With that, the details of the full induction are routine.
2.10 Corollary The Cartesian product of finitely many countable sets is countable.
Proof Suppose that 𝑆 0, ... 𝑆𝑛− 1 are countable and that each function 𝑓𝑖 : N → 𝑆𝑖 is
a correspondence. By the prior result, the tuple-ing function cantor𝑛− 1 : N → N𝑛
is a correspondence. Write cantor𝑛− 1 (𝑘) = ⟨𝑘 0, 𝑘 1, ... 𝑘𝑛− 1 ⟩ . Then the composition
𝑘 ↦→ ⟨𝑓0 (𝑘 0 ), 𝑓1 (𝑘 1 ), ... 𝑓𝑛−1 (𝑘𝑛−1 )⟩ from N to 𝑆 0 × · · · 𝑆𝑛−1 is a correspondence.
Thus 𝑆 0 × 𝑆 1 × · · · 𝑆𝑛− 1 is countable.
2.11 Example Also countable is the set of rational numbers, Q. We have already used
the technique of counting by alternating between positives and negatives. So
it suffices to count the nonnegative rationals with some 𝑓 : N → Q+ ∪ { 0 }. A
nonnegative rational number is a numerator-denominator pair ⟨𝑛, 𝑑⟩ ∈ N × N+ .
The complication is that some pairs collapse, such as that 𝑛 = 10 and 𝑑 = 5 is the
same number as 𝑛 = 2 and 𝑑 = 1. So go through the pairs in the order given by
Cantor’s correspondence, skipping any pair that names an already-listed number.
2.13 Lemma A set 𝑆 is countable if and only if it is empty or there is an onto
function 𝑓 : N → 𝑆 .
Proof First suppose that 𝑆 is countable and nonempty. If 𝑆 is finite then write
𝑆 = { 𝑠 0, ... 𝑠𝑛−1 } and this map 𝑓 : N → 𝑆 is onto.
𝑓 (𝑖) = 𝑠𝑖   – if 𝑖 < 𝑛
𝑓 (𝑖) = 𝑠 0   – otherwise
If 𝑆 is infinite and countable then it has the same cardinality as N so there is a
correspondence 𝑓 : N → 𝑆 . Correspondences are onto.
For the converse assume that either 𝑆 is empty or there is an onto map from N
to 𝑆 . Definition 1.13 says that an empty set is countable so what’s left is to consider
an onto map 𝑓 : N → 𝑆 . If 𝑆 is finite then it is countable so we are down to the
case where 𝑆 is infinite. Define 𝑓ˆ: N → 𝑆 by 𝑓ˆ(𝑛) = 𝑓 (𝑘) where 𝑘 is the least
natural number such that 𝑓 (𝑘) ∉ { 𝑓ˆ( 0), ... 𝑓ˆ(𝑛 − 1) }. Such a 𝑘 exists because 𝑆 is
infinite and 𝑓 is onto. This 𝑓ˆ is both one-to-one and onto, by construction.
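The proof’s construction of 𝑓ˆ can be sketched in Python, listing 𝑓 ( 0) , 𝑓 ( 1) , . . . and keeping only first appearances (the function name and the finite cutoff are ours, for illustration):

```python
def distinct_values(f, how_many):
    """First how_many values of the one-to-one f-hat built from an onto f."""
    seen, out, k = set(), [], 0
    while len(out) < how_many:
        v = f(k)
        if v not in seen:        # the least k giving a not-yet-listed value
            seen.add(v)
            out.append(v)
        k += 1
    return out
```

For instance, with 𝑓 (𝑘) = ⌊𝑘/2⌋ , which is onto N but not one-to-one, the first four values listed are 0, 1, 2, 3.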
2.14 Corollary (1) Any subset of a countable set is countable. (2) The intersection of
two countable sets is countable. More generally, the intersection of any number of
countable sets is countable. (3) The union of two countable sets is countable. The
union of any finite number of countable sets is countable. The union of countably
many countable sets is countable.
Proof For (1), suppose that 𝑆 is countable and that 𝑆ˆ ⊆ 𝑆 . If 𝑆 is empty then so is 𝑆ˆ,
and thus it is countable. Otherwise by the prior lemma there is an onto 𝑓 : N → 𝑆 .
If 𝑆ˆ is empty then it is countable, and otherwise fix some 𝑠ˆ ∈ 𝑆ˆ. Then this map
𝑓ˆ: N → 𝑆ˆ is onto.
𝑓ˆ(𝑛) = 𝑓 (𝑛)   – if 𝑓 (𝑛) ∈ 𝑆ˆ
𝑓ˆ(𝑛) = 𝑠ˆ   – otherwise
Item (2) is immediate from (1) since the intersection is a subset of both sets.
Now item (3). In the two-set case suppose that 𝑆 0 and 𝑆 1 are countable.
If either set is empty, or both, then the result is trivial because for instance
𝑆 0 ∪ ∅ = 𝑆 0 . Otherwise, suppose that 𝑓0 : N → 𝑆 0 and 𝑓1 : N → 𝑆 1 are onto.
Then count by alternating between the two sets. More precisely, Lemma 2.3
gives a correspondence 𝑔 : N → { 0, 1 } × N and this is a function that is onto the
set 𝑆 0 ∪ 𝑆 1 .
𝑓2 (𝑛) = 𝑓0 ( 𝑗)   – if 𝑔(𝑛) = ⟨0, 𝑗⟩
𝑓2 (𝑛) = 𝑓1 ( 𝑗)   – if 𝑔(𝑛) = ⟨1, 𝑗⟩
We next assign numbers to Turing machines, in an effective way, so that we can for
instance round-trip from the number to the machine and back to the number.
The exact numbering scheme that we use doesn’t matter much as long as it is
has the properties in the definition below. But for illustration here is an outline
of a specific way: starting with a Turing machine P, effectively convert each of
its instructions to a number, giving a set {𝑖 0, 𝑖 1, ... 𝑖𝑛 }. Then define the number 𝑒
associated with P to be the one that when written in binary has 1 in bits 𝑖 0 , . . . 𝑖𝑛 ,
that is, 𝑒 = 2^𝑖 0 + 2^𝑖 1 + · · · + 2^𝑖𝑛 . For the inverse, given 𝑒 ∈ N, expand it into binary
as 𝑒 = 2^𝑗 0 + · · · + 2^𝑗𝑘 and the set of instructions corresponding to the numbers 𝑗0 ,
. . . 𝑗𝑘 is the Turing machine. (Except that we must check that the instruction set is
deterministic, that no two instructions begin with the same 𝑞𝑝𝑇𝑝 . If this is not true
then let the machine associated with 𝑒 be the empty machine, P = { }.)
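The bits-of- 𝑒 scheme round-trips mechanically. In Python (a sketch of the outline above, ignoring the determinism check; names ours):

```python
def number_of(instruction_numbers):
    """e has binary bit i equal to 1 exactly when i is an instruction number."""
    return sum(2 ** i for i in set(instruction_numbers))

def instructions_of(e):
    """Inverse: the set of positions of the 1 bits in e's binary expansion."""
    return {i for i in range(e.bit_length()) if (e >> i) & 1}
```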
2.17 Definition A numbering is a function that assigns to each Turing machine a
natural number. A numbering is acceptable if: (1) there is an effective function
that takes as input the set of instructions and gives as output the associated number,
(2) the set of numbers for which there is an associated machine is computable,
and (3) there is an effective inverse that takes as input a natural number and gives
as output the associated machine.
For the rest of the book we will just fix a numbering and cite its properties
rather than deal with its details. We call this the machine’s index number or
Gödel number. For the machine with index 𝑒 ∈ N we write P𝑒 . For the function
computed by P𝑒 we write 𝜙𝑒 .
Think of the machine’s index as its name. We will refer to the index frequently,
for instance by saying “the 𝑒 -th Turing machine.” The takeaway point is that
because the numbering is acceptable there is a program to go from the machine’s
index to its source, the set of four-tuple instructions, and a program going from the
source to the index. Briefly, the index is computationally equivalent to the source.†
2.18 Lemma (Padding lemma) Every computable function has infinitely many indices: if
𝑓 is computable then there are infinitely many distinct 𝑒𝑖 ∈ N with 𝑓 = 𝜙𝑒0 =
𝜙𝑒1 = · · · . We can effectively produce a list of such indices.
2.19 Remark In programming terms, the lemma says that for any compiled behavior
there are infinitely many different source codes. One way to get them is by starting
with a single source code and padding it by adding to the bottom a comment line
that contains the number 0, or the number 1, etc.
Proof Let 𝑓 = 𝜙𝑒 . Let 𝑞 𝑗 be the highest-numbered state in P𝑒 . For each 𝑘 ∈ N+
consider the Turing machine obtained from P𝑒 by adding the instruction 𝑞 𝑗+𝑘 BB𝑞 𝑗+𝑘 .
This gives an effective sequence of Turing machines P𝑒1 , P𝑒2 , . . . with distinct
indices, all having the same behavior, 𝜙𝑒𝑘 = 𝑓 .
†
Here is an informal alternative index-source correspondence that can give some intuition about
numbering. On a computer, a program’s source code is saved as a bitstring, which we can interpret as a
binary number. In the other direction, given a number we take it to be a bitstring, and disassemble it
into machine code source. (One problem with this approach is that if the first character in the source is
represented by binary 0 then in passing to a binary number that information is lost. There are patches
for the problems but they reduce the intuitive appeal so while this idea is helpful, it is best left informal.)
With the ability to number machines, we are set up for this book’s most
important result. The next section shows that while the set of Turing machines is
countable, the set of natural number functions 𝑓 : N → N is not. This will establish
that there are functions that are not computable.
II.2 Exercises
✓ 2.20 Extend the table of Example 2.1 through 𝑛 = 12. Where 𝑓 (𝑛) = ⟨𝑥, 𝑦⟩ , give
formulas for 𝑥 and 𝑦 .
✓ 2.21 For each pair ⟨𝑎, 𝑏⟩ find the pair before it and the pair after it in Cantor’s
correspondence. That is, where cantor (𝑎, 𝑏) = 𝑛 , find the pair associated with
𝑛 + 1 and the pair with 𝑛 − 1. (a) ⟨50, 50⟩ (b) ⟨100, 4⟩ (c) ⟨4, 100⟩ (d) ⟨0, 200⟩
(e) ⟨200, 0⟩
✓ 2.22 Corollary 2.14 says that the union of two countable sets is countable.
(a) For the sets 𝑇 = { 2𝑘 | 𝑘 ∈ N } and 𝐹 = { 5𝑚 | 𝑚 ∈ N } produce correspon-
dences 𝑓𝑇 : N → 𝑇 and 𝑓𝐹 : N → 𝐹 . Give a table listing the values of 𝑓𝑇 ( 0) , . . .
𝑓𝑇 ( 9) and give another table listing 𝑓𝐹 ( 0) , . . . 𝑓𝐹 ( 9) .
(b) Give a table listing the first ten values for a correspondence 𝑓 : N → 𝑇 ∪ 𝐹 .
2.23 Give an enumeration of N × { 0, 1 }. Find the pair matching 0, 10, 100, and
101. Find the number corresponding to ⟨2, 1⟩ , ⟨20, 1⟩ , and ⟨200, 1⟩ .
✓ 2.24 Example 2.1 says that the method for two columns extends to three. Give a
function enumerating { 0, 1, 2 } × N. That is, where 𝑓 (𝑛) = ⟨𝑥, 𝑦⟩ give a formula
for 𝑥 and 𝑦 as functions of 𝑛 . Find the pair corresponding to 0, 10, 100, and 1 000.
Find the number corresponding to ⟨1, 2, 3⟩ , ⟨1, 20, 300⟩ , and ⟨1, 200, 3000⟩ .
2.25 Give an enumeration 𝑓 of { 0, 1, 2, 3 } × N. That is, where 𝑓 (𝑛) = ⟨𝑥, 𝑦⟩ , give
a formula for 𝑥 and 𝑦 . Also give the formula for the general case of an enumeration of
{ 0, 1, 2, ... 𝑘 } × N.
✓ 2.26 Extend the table of Example 2.4 to cover correspondences up to 16.
✓ 2.27 Definition 2.6’s function cantor (𝑥, 𝑦) = 𝑥 + [(𝑥 + 𝑦) (𝑥 + 𝑦 + 1)/2] is clearly
effective since it is given as a formula. Show that its inverse, pair : N → N2 , is also
effective by sketching a way to compute it with a program.
2.28 Prove that if 𝐴 and 𝐵 are countable sets then their symmetric difference
𝐴Δ𝐵 = (𝐴 − 𝐵) ∪ (𝐵 − 𝐴) is countable.
2.29 Show that the subset 𝑆 = {𝑎 + 𝑏𝑖 | 𝑎, 𝑏 ∈ Z } of the complex numbers is
countable.
2.30 List the first dozen nonnegative rational numbers enumerated by the method
described in Example 2.11.
2.31 Let 𝑆 be countably infinite and let 𝑇 ⊂ 𝑆 be finite.
(a) Show that 𝑆 − 𝑇 is countable.
(b) Show that 𝑆 − 𝑇 is countably infinite.
(c) Can there be an infinite subset 𝑇 so that 𝑆 − 𝑇 is infinite?
2.32 Show that every infinite set contains a countably infinite subset.
2.45 Use Lemma 2.13 to give a much slicker, and shorter, proof that the rational
numbers are countable than the one in Example 2.11.
2.46 The formula for Cantor’s unpairing function cantor (𝑥, 𝑦) = 𝑥 + [(𝑥 + 𝑦) (𝑥 +
𝑦 + 1)/2] gives a correspondence for natural number input. What about for real
number input? (a) Find cantor ( 2, 1) . (b) Fix 𝑥 = 1 and find two different 𝑦 ∈ R
so that cantor ( 1, 𝑦) = cantor ( 2, 1) .
Section
II.3 Diagonalization
Following Cantor’s definition of cardinality, we produced a number of correspon-
dences between sets. After working through these example maps, a person could
come to think that for any two infinite sets there is some sufficiently clever way to
give a matching between them.
This impression is wrong. There are pairs of infinite sets that do not correspond.
To demonstrate this we now introduce a very powerful technique. Our interest in
it goes far beyond this result — it is central to the entire subject.
Diagonalization There are sets so large that they are not countable. That is, there
are infinite sets 𝑆 for which no correspondence exists between 𝑆 and N. One such
set is R.
3.1 Theorem There is no onto map 𝑓 : N → R. Hence, the set of reals is not
countable.
We start by illustrating the proof ’s technique. The table below shows a function
𝑓 : N → R, listing some inputs and outputs, with the outputs aligned on the
decimal point.
Input 𝑛 Decimal expansion of output 𝑓 (𝑛)
0 42 . 3 1 2 7 7 0 4 ...
1 2 . 0 1 0 0 0 0 0 ...
2 1 . 4 1 4 1 5 9 2 ...
3 −20 . 9 1 9 5 9 1 9 ...
4 0 . 1 0 1 0 0 1 0 ...
5 −0 . 6 2 5 5 4 1 8 ...
.. ..
. .
We will show that this function is not onto by producing a number 𝑧 ∈ R that does
not equal any of the 𝑓 (𝑛) ’s.
Ignore what is to the left of the decimal point. To its right go down the diagonal,
taking the digits 3, 1, 4, 5, 0, 1 . . . Construct the desired 𝑧 by making its first
decimal place something other than 3, making its second decimal place something
other than 1, etc. Specifically, if the diagonal digit is a 1 then in that decimal place 𝑧
gets a 2, while otherwise 𝑧 gets a 1 there. Thus, in this example 𝑧 = 0.121112 ...
By construction, 𝑧 differs from what’s in the first row, 𝑧 ≠ 𝑓 ( 0) , because they
differ in the first decimal place. Similarly, 𝑧 ≠ 𝑓 ( 1) because they differ in the
Section 3. Diagonalization 75
second place. In this way 𝑧 does not equal any of the 𝑓 (𝑛) . Therefore 𝑓 is not
onto. This technique is diagonalization.
(We have skirted a technicality, that some real numbers have two different
decimal representations. For instance, 1.000 ... = 0.999 ... because the two differ
by less than 0.1, less than 0.01, etc. This is a potential snag to the argument because
it means that even though we have constructed a representation that is different
than all the representations on the list, it still might not be that the number 𝑧 is
different than all the numbers 𝑓 (𝑛) on the list. However, dual representation only
happens for decimals when one of the representations ends in 0’s while the other
ends in 9’s. That’s why we build 𝑧 using 1’s and 2’s.)
Proof We will show that no map 𝑓 : N → R is onto.
Denote the 𝑖 -th decimal digit of 𝑓 (𝑛) as 𝑓 (𝑛) [𝑖] (if 𝑓 (𝑛) is a number with two
decimal representations then use the one ending in 0’s). Let 𝑔 be the map on the
decimal digits { 0, ... , 9 } given by: 𝑔( 𝑗) = 2 if 𝑗 is 1 and 𝑔( 𝑗) = 1 otherwise.
Now let 𝑧 be the real number that has 0 to the left of its decimal point, and whose
𝑖 -th decimal digit is 𝑔(𝑓 (𝑖) [𝑖]) . Then for all 𝑖 , 𝑧 ≠ 𝑓 (𝑖) because 𝑧 [𝑖] ≠ 𝑓 (𝑖) [𝑖] . So
𝑓 is not onto.
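As an aside, the construction can be carried out in Racket. This is our illustration, not part of the proof; the row digits below are made up, except that the diagonal is 3, 1, 4, 5, 0, 1 as in the example above.

```
;; g sends the digit 1 to 2 and every other digit to 1.
(define (g j)
  (if (= j 1) 2 1))

;; Row n lists the first decimal digits of f(n); only the diagonal matters.
(define rows
  '((3 1 4 1 5 9)
    (0 1 0 0 0 0)
    (4 1 4 1 5 9)
    (9 1 9 5 9 1)
    (1 0 1 0 0 1)
    (6 2 5 5 4 1)))

;; The i-th decimal digit of z is g applied to the i-th digit of f(i).
(define (z-digit i)
  (g (list-ref (list-ref rows i) i)))

(map z-digit '(0 1 2 3 4 5))   ; gives (1 2 1 1 1 2), so z = 0.121112 ...
```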
3.2 Definition A set that is infinite but not countable is uncountable.
3.3 Remark Before going on, we pause to reflect that the work we have seen so far
in this chapter, especially the prior theorem, is both startling and profound: some
infinite sets have more elements than other infinite sets. In particular, the reals
have more elements than the naturals. As dramatized by Galileo’s Paradox, it is
not just that the reals are a superset of the naturals. Instead, the set of naturals
cannot be made to correspond with the set of reals.
We can make an analogy with the children’s game of Musical Chairs. We have
countably many chairs 𝑃 0, 𝑃 1, ... but there are so many children — so many reals —
that at least one is left without a chair.
We next define when one set has fewer, or more, elements than another. The
intuition comes from the picture below, trying to make a correspondence between
the two finite sets { 0, 1, 2 } and { 0, 1, 2, 3 }. There are too many elements in the
codomain for any function to cover them all. The best that we can do is to cover as
many codomain elements as possible, with a function that is one-to-one but not
onto.
[Diagram: a one-to-one map from { 0, 1, 2 } into { 0, 1, 2, 3 }; each domain element gets its own partner but one codomain element is left uncovered.]
3.4 Definition The set 𝑆 has cardinality less than or equal to that of the set 𝑇 ,
denoted |𝑆 | ≤ |𝑇 | , if there is a one-to-one function from 𝑆 to 𝑇 .
3.5 Example The inclusion map 𝜄 : N → R that sends 𝑛 ∈ N to itself, 𝑛 ∈ R, is
one-to-one and so | N | ≤ | R | . By Theorem 3.1 the cardinality is strictly less.
76 Chapter II. Background
3.6 Remark The wording of that definition suggests that if both |𝑆 | ≤ |𝑇 | and |𝑇 | ≤ |𝑆 |
then |𝑆 | = |𝑇 | . That is true but the proof is beyond our scope; see Exercise 3.32.
For the next result, recall that for a set 𝑆 the characteristic function 1𝑆 is the
Boolean function determining membership: 1𝑆 (𝑠) = 𝑇 if 𝑠 ∈ 𝑆 and 1𝑆 (𝑠) = 𝐹 if
𝑠 ∉ 𝑆 . (We sometimes instead use the bits 1 for 𝑇 and 0 for 𝐹 .) Thus for the set of
two characters 𝑆 = { a, c }, the characteristic function with domain Σ = { a, ... , z }
is 1𝑆 ( a) = 𝑇 , 1𝑆 ( b) = 𝐹 , 1𝑆 ( c) = 𝑇 , 1𝑆 ( d) = 𝐹 , ... 1𝑆 ( z) = 𝐹 .
Recall also that the power set P (𝑆) is the collection of subsets of 𝑆 . For
instance, if 𝑆 = { a, c } then P (𝑆) = { ∅, { a }, { c }, { a, c } }.
3.7 Theorem (Cantor’s Theorem) A set’s cardinality is strictly less than that of its
power set.
We first illustrate the proof. One half is easy: to start with a set 𝑆 and produce
a function to P (𝑆) that is one-to-one, just map 𝑠 ∈ 𝑆 to the set {𝑠 }.
The harder half is showing that no map from 𝑆 to P (𝑆) is a correspondence.
For an example of this half consider the set 𝑆 = { a, b, c }. We will walk through
how we prove that this function 𝑓 : 𝑆 → P (𝑆) is not onto.
𝑓 : a ↦→ { b, c }    b ↦→ { b }    c ↦→ { a, b, c }    (∗)
Below, the first row, the a row, lists the values of the characteristic function
1 𝑓 ( a ) = 1{ b, c } on the inputs a, b, and c. The second row lists the values for
1 𝑓 ( b ) = 1{ b } . And, the third row lists 1 𝑓 ( c ) = 1{ a, b, c } .
𝑠 ∈𝑆 𝑓 (𝑠) 1 𝑓 (𝑠 ) ( a) 1 𝑓 (𝑠 ) ( b) 1 𝑓 (𝑠 ) ( c)
a { b, c } 𝐹 𝑇 𝑇
b {b} 𝐹 𝑇 𝐹
c { a, b, c } 𝑇 𝑇 𝑇
We show that 𝑓 is not onto by producing a member of P (𝑆) that is not any of the
three sets in (∗). For that, diagonalize. Take the table’s diagonal 𝐹𝑇𝑇 and flip the
values to get 𝑇 𝐹 𝐹. That describes the characteristic function of the set 𝑅 = { a }.
This set is not equal to the set 𝑓 ( a) because their characteristic functions differ
on a. Similarly, 𝑅 is not the set 𝑓 ( b) because the characteristic functions differ
on b, and 𝑅 is not 𝑓 ( c) because they differ on c. So 𝑅 is not in the range of 𝑓 , so 𝑓
is not onto.
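The finite example invites a computation. Here is a sketch in Racket (our illustration; subsets are represented as lists of symbols):

```
;; The function f from (*), with subsets as lists.
(define S '(a b c))
(define (f s)
  (case s
    [(a) '(b c)]
    [(b) '(b)]
    [(c) '(a b c)]))

;; The diagonal set R = { s | s is not in f(s) }.
(define R
  (filter (lambda (s) (not (member s (f s)))) S))

R                                          ; => '(a)
(for/and ([s S]) (not (equal? (f s) R)))   ; => #t, so R is not in the range
```

(Comparing the sets with equal? works here because all of the lists happen to be in the same order.)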
Proof First, |𝑆 | ≤ | P (𝑆)| because the inclusion map 𝜄 : 𝑆 → P (𝑆) given by
𝜄 (𝑠) = {𝑠} is one-to-one. For the ‘strictly’ half we will show that no map from a set
to its power set is onto. Fix 𝑓 : 𝑆 → P (𝑆) and consider this element of P (𝑆) .
𝑅 = {𝑠 | 𝑠 ∉ 𝑓 (𝑠) }
We will demonstrate that no member of the domain maps to 𝑅 , and thus 𝑓 is not
onto. Suppose that there exists 𝑠ˆ ∈ 𝑆 such that 𝑓 (𝑠ˆ) = 𝑅 . Consider whether 𝑠ˆ
is an element of 𝑅 . We have that 𝑠ˆ ∈ 𝑅 if and only if 𝑠ˆ ∈ {𝑠 | 𝑠 ∉ 𝑓 (𝑠) }. By the
definition of 𝑅 , that holds if and only if 𝑠ˆ ∉ 𝑓 (𝑠ˆ) , which holds if and only if 𝑠ˆ ∉ 𝑅 .
The contradiction means that no such 𝑠ˆ exists.
3.8 Corollary The cardinality of the set N is strictly less than the cardinality of the
set of functions 𝑓 : N → N.
Proof Let the set of functions be 𝐺 . The map that associates each subset 𝑆 ⊆ N
with its characteristic function 1𝑆 : N → N is one-to-one from P ( N) into 𝐺 .
Therefore | N | < | P ( N)| ≤ |𝐺 | .
3.9 Corollary (Existence of uncomputable functions) There is a function
𝑓 : N → N that is not computable: 𝑓 ≠ 𝜙𝑒 for all 𝑒 .
Proof Lemma 2.9 shows that the cardinality of the set of Turing machines equals
the cardinality of the set N. The prior result shows that the cardinality of the set
of functions from N to itself is strictly greater than the cardinality of N. So the
cardinality of the set of functions from N to itself is greater than the cardinality
of the set of Turing machines — no association of Turing machines with natural
number functions is onto. In particular, when we associate each Turing machine
with the function that it computes, that association is not onto. There is a natural
number function that is without a Turing machine to compute it.
This is an epochal result. In the light of Church’s Thesis, we take it to prove
that there are jobs that no computer can do.
To a person trained in programming, where students learn to go from a task to a
program that does that task, the existence of things that cannot be done can be a
surprise, perhaps even a shock. One point that these results make is that
the work here on sizes of infinities, which can at first seem impracticably abstract,
leads to interesting and useful conclusions.
II.3 Exercises
3.10 Your study partner is confused about the diagonal argument. “If you had an
infinite list of numbers, it would clearly contain every number, right? I mean, if
you had a list that was truly INFINITE, then you simply couldn’t find a number
that is not on the list!” Help them out.
3.11 Your classmate says, “Professor, I’m confused. The set of numbers with one
decimal place, such as 25.4 and 0.1, is clearly countable — just take the integers
and shift all the decimal places by one. The set with two decimal places, such
as 2.54 and 6.02 is likewise countable, etc. This is countably many sets, each of
which is countable, and so the union is countable. The union is the whole reals, so
I think that the reals are countable.” Where is their mistake?
3.12 Verify Cantor’s Theorem, Theorem 3.7, for these finite sets. (a) { 0, 1, 2 }
(b) { 0, 1 } (c) { 0 } (d) { }
✓ 3.13 Use Definition 3.4 to prove that the first set has cardinality less than or equal
to the second.
(a) 𝑆 = { 1, 2, 3 } , 𝑆ˆ = { 11, 12, 13 }
✓ 3.26 Example 2.11 shows that the rational numbers are countable. What happens
when we apply the diagonal argument given in Theorem 3.1 to an enumeration
of the rationals? Consider a sequence 𝑞 0, 𝑞 1, ... that contains all of the rationals.
Represent each of those numbers with a decimal expansion 𝑞𝑖 = 𝑑𝑖 .𝑑𝑖,0𝑑𝑖,1𝑑𝑖,2 ...
(where 𝑑𝑖 ∈ Z and 𝑑𝑖,𝑗 ∈ { 0, ... , 9 }) that does not end in all 9’s, so that the decimal
expansion is unique.
(a) Let 𝑔 be the map on the decimal digits 0, 1, ... , 9 given by 𝑔( 1) = 2, and
𝑔(𝑖) = 1 if 𝑖 ≠ 1. Consider the number down the diagonal, 𝑑 = ∑𝑛∈N 𝑑𝑛,𝑛 · 10^−(𝑛+1).
Transform its digits using 𝑔, that is, define 𝑧 = ∑𝑛∈N 𝑔(𝑑𝑛,𝑛 ) · 10^−(𝑛+1). Show
that 𝑧 is irrational.
(b) Use the prior item to conclude that the diagonal number 𝑑 = ∑𝑛∈N 𝑑𝑛,𝑛 · 10^−(𝑛+1)
is irrational. Hint: show that it has no repeating pattern in its decimal
expansion.
(c) Why is the fact that the diagonal is not rational not a contradiction to the fact
that we can enumerate all of the rationals?
3.27 Verify Cantor’s Theorem in the finite case by showing that if 𝑆 is finite then
the cardinality of its power set is | P (𝑆)| = 2^|𝑆| .
3.28 The key to the proof of Cantor’s Theorem, Theorem 3.7, is the definition of
𝑅 = {𝑠 | 𝑠 ∉ 𝑓 (𝑠) }. This story illustrates the idea: a high school yearbook asks
each graduating student 𝑠𝑖 to make a list 𝑓 (𝑠𝑖 ) of class members that they predict
will someday be famous. Define the set of humble students 𝐻 to consist of those
who are not on their own list. Show that no student’s list equals 𝐻 .
3.29 Show that there is no set of all sets. Hint: use Theorem 3.7.
3.30 The proof of Theorem 3.1 must work around the fact that some numbers
have more than one base ten representation. Base two also has the property that
some numbers have more than one representation; an example is 0.01000 ... and
0.00111 .... But in a base two argument, when building 𝑧 there is no way to avoid
the digits 0 and 1. How could you make the argument work in base two?
3.31 The discussion after the statement of Theorem 3.1 includes that the real
number 1 has two different decimal representations, 1.000 ... = 0.999 ...
(a) Verify this equality by using the formula for an infinite geometric series,
𝑎 + 𝑎𝑟² + 𝑎𝑟³ + · · · + 𝑎𝑟 + · · · = 𝑎/( 1 − 𝑟 ) for |𝑟 | < 1.
(b) Show that if a number has two different decimal representations then in the
leftmost decimal place where they differ, they differ by 1. Hint: that is the
biggest difference that the remaining decimal places can make up.
(c) In addition show that for the one with the larger digit in that first differing
place, all of the digits to its right are 0, while the other representation has that
all of the remaining digits are 9’s.
3.32 Definition 3.4 extends the definition of equal cardinality to say that |𝐴| ≤ |𝐵|
if there is a one-to-one function from 𝐴 to 𝐵 . The Schröder–Bernstein theorem is
that if both |𝑆 | ≤ |𝑇 | and |𝑇 | ≤ |𝑆 | then |𝑆 | = |𝑇 | . We will walk through the proof.
It depends on finding chains of images: for any 𝑠 ∈ 𝑆 we form the associated chain
by iterating application of the two functions, both to the right and the left, as here.
Section
II.4 Universality
We have seen a number of Turing machines, such as one whose output is the
successor of its input, one that adds two input numbers, and others. Each is a
single-purpose device, doing its one job.
Section 4. Universality 81
Weaving by hand, as the loom operator on the left is doing, is intricate and slow.
We can make a machine to reproduce her pattern. But what if we want a different
pattern; do we need another machine? In 1801 J Jacquard built a loom like the
one on the right, controlled by cards. Getting a different pattern does not require
a new loom, it only requires swapping cards.
Turing introduced the analog to this for computers. He produced a Turing
machine UP that can be fed a tape containing a description of a Turing machine M,
along with input for that machine. Then UP will have the same input-output
behavior as would M. If M halts on the input then UP will halt and give the same
output, while if M does not halt on that input then UP also does not halt.
This single machine can be made to have any desired computable behavior. So
we don’t need infinitely many different machines, we can just use UP . This was
what we meant by saying that a good first take on Turing machines is that they are
[Flowchart: Simulate P𝑒 on input 𝑥 → Print result → End]
Universal machines are familiar from everyday computing. For one thing, we
can compare this flowchart with the behavior of a computer operating
system. An operating system is given a program to run and some data
to feed to that program. Think of the program as P𝑒 and the data as
a bitstring that we can interpret as a number, 𝑥 . The operating system
arranges that the underlying hardware will behave like machine 𝑒 , with
input 𝑥 . In short, as with an operating system, Universal Turing machines change
their behavior in software. No patch cords.
Another everyday computing experience that is like a universal machine is a
language interpreter. Below is an interaction with the Racket interpreter. At the
first prompt we type in a routine that takes x and returns the sum of the first
x numbers. At the second prompt we specify the input to that routine, 𝑥 = 4.
$ racket
Welcome to Racket v8.2 [cs].
> (define (triangular x)
(if (= x 0)
0
(+ x (triangular (sub1 x)))))
> (triangular 4)
10
† We could also define a Universal Turing machine to take the single-number input cantor (𝑒, 𝑥 ) . ‡ This
is a flowchart, which gives a high level sketch of a routine. We use three types of boxes. Boxes with
rounded corners are for Start and End. Rectangles are for ordinary operations on data. In later charts
we will also see diamond boxes, which are for decisions, if statements.
The most direct example of computing systems that act as universal machines
is a language’s eval statement. At the first prompt below we define a routine
that has the interpreter evaluate the expression that is input. In the next prompt
we define a list (quoted so that it is not interpreted). This list, lambda (i) ... ,
describes a function of one input.† In the third and fourth prompts, the interpreter
evaluates the routine that is described in that list and applies it to the numbers 5
and 0. That is, as with the loom’s punched cards, we can make utm behave as
different routines, by giving it a description of whatever routine is desired.
> (define (utm s)
(eval s))
> (define test '(lambda (i) (if (= i 0) 1 0)))
> ((utm test) 5)
0
> ((utm test) 0)
1
Finally, as to the proof of the theorem, the simplest way to prove that something
exists is to produce it. We have already exhibited what amounts to a Universal
Turing machine. At the end of Chapter One, on page 38, we gave code for a Turing
machine simulator, which reads a Turing machine from a file and then runs it. The
code is in Racket but Church’s Thesis asserts that we could write a Turing machine
with the same behavior.
Uniformity Consider this job: given a real number 𝑟 ∈ R, write a program to
output its digits. More precisely, produce a Turing machine P𝑟 such that when
given 𝑛 ∈ N as input, P𝑟 outputs the 𝑛 -th decimal place of 𝑟 (for 𝑛 = 0, it outputs
the integer to the left of the decimal point).
We know that this is not possible for all 𝑟 because while there are uncountably
many real numbers, there are only countably many Turing machines. But what
stops us? One of the enjoyable things about coding is the feeling of being able
to get the machine to do anything — why can’t we write a routine that will output
whatever digits we like?
There certainly are real numbers for which there is such a routine. One is 11.25.
(define (one-quarter-decimal-places n)
  (cond
    [(= n 0) 11]
    [(= n 1) 2]
    [(= n 2) 5]
    [else 0]))
For a more generic number, say, some 𝑟 = 0.703 ... , we might momentarily imagine
brute-forcing it.
(define (r-decimal-place n)
  (cond
    [(= n 0) 0]
    [(= n 1) 7]
    [(= n 2) 0]
    [(= n 3) 3]
    ...
    ))
† It uses ‘lambda’ to start the definition of a function because that’s the word Church used.
But that’s silly. Programs have finite length and so can’t have infinitely many cases.
That is, because of the if, what the following program does on 𝑛 = 7 is
unconnected to what it does on other inputs.
(define (foo n)
(if (= n 7)
42
(* 2 n)))
But a program can only have finitely many such differently-behaving branches. The
fact that a Turing machine has only finitely many instructions imposes a condition
of uniformity on its behavior.
4.3 Example Connecting in this way the idea that ‘something is computable’ with ‘it
is uniformly computable’ has some surprising consequences. Consider the problem
of producing a program that inputs a number 𝑛 and decides whether somewhere
in the decimal expansion of 𝜋 = 3.14159 ... there are 𝑛 consecutive nines.
There are two possibilities. Either for all 𝑛 such a sequence exists, or else there
is some 𝑁 where a sequence of nines exists for lengths less than 𝑁 and no sequence
exists when 𝑛 ≥ 𝑁 . Consequently the problem is solved: one of the two below is
the right program (for illustration here we take 𝑁 = 1234).
(define (sequence-of-nines-0 n)
  1)

(define (sequence-of-nines-1 n)
  (if (< n 1234)
      1
      0))
One surprising aspect of this argument is that neither of the two routines
appears to have much to do with 𝜋 . Also surprising, and perhaps unsettling, is that
we have shown that the problem is solvable without showing how to solve it. That
is, there is a difference between showing that this function is computable
and possessing an algorithm to compute it. This shows that the assertion “something
is computable if you can write a program for it” at the very least suppresses some
important subtleties.
In contrast, imagine that we have a routine pi_decimals that inputs 𝑖 ∈ N
and outputs the 𝑖 -th decimal place of 𝜋 . Using it, we can write a program that takes
in 𝑛 and steps through 𝜋 ’s digits, looking for 𝑛 consecutive nines. This approach
has the advantage that it doesn’t just say whether the answer exists, it constructs
that answer. This approach is also uniform in the sense that we could modify
it to use other routines such as e_decimals and so look for strings of nines in
other numbers. However this approach has the disadvantage that if there is an
𝑁 where 𝜋 does not have 𝑛 consecutive nines for 𝑛 ≥ 𝑁 then this program will
search without bound, and never discover that.
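The searching approach can be sketched in Racket. The routine pi-decimals is the assumed digit routine; here it is stubbed with only the first few decimal places of 𝜋, so this is an illustration, not a computation about 𝜋.

```
;; Stub for the assumed digit routine: (pi-decimals i) is the i-th
;; decimal place of pi, counting from 0.  Only a few places are given.
(define pi-digits (vector 1 4 1 5 9 2 6 5 3 5 8 9 7 9))
(define (pi-decimals i) (vector-ref pi-digits i))

;; Scan for n consecutive nines, returning where the run starts.  If no
;; such run exists the search never ends (with the finite stub it would
;; instead run off the end of the vector).
(define (find-nines n)
  (let loop ([i 0] [run 0])
    (cond
      [(= run n) (- i n)]
      [(= (pi-decimals i) 9) (loop (+ i 1) (+ run 1))]
      [else (loop (+ i 1) 0)])))

(find-nines 1)   ; => 4: the first nine is decimal place 4, counting from 0
```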
input 𝑥 , including not halting if the machine does not halt on that input. That
is, there is a computable function 𝜙 : N² → N such that 𝜙 (𝑒, 𝑥) = 𝜙𝑒 (𝑥) if 𝜙𝑒 (𝑥)↓
and 𝜙 (𝑒, 𝑥)↑ if 𝜙𝑒 (𝑥)↑.
There, the 𝑒 travels from the function’s argument to the index. We now
generalize. Start with a program that takes two inputs such as this one.
(define (P x y)
(+ x y))
Freeze the first argument. The result is a one-input program. Here we freeze 𝑥
at 7 and at 8.
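For instance, freezing 𝑥 at 7 and at 8 could be written like this (the names P7 and P8 are ours):

```
;; The two-input starting program.
(define (P x y)
  (+ x y))

;; One-input programs with x frozen at 7 and at 8.
(define (P7 y)
  (P 7 y))
(define (P8 y)
  (P 8 y))

(P7 2)   ; => 9
(P8 2)   ; => 10
```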
This is partial application because we are not freezing all of the input variables.
Instead, we are parametrizing the variable 𝑥 .
The programs in the family are related to the starting one, obviously. Denoting
the function computed by the above starting program P as 𝜓 (𝑥, 𝑦) = 𝑥 + 𝑦 , partial
application gives a family of functions: 𝜓 0 (𝑦) = 𝑦 , 𝜓 1 (𝑦) = 1 +𝑦 , 𝜓 2 (𝑦) = 2 +𝑦 , . . .
The next result says that in general, from the index of a starting Turing machine
or computable function and from the values that are frozen, we can compute the
family members.
4.4 Theorem (s-m-n theorem, or Parameter theorem) For every 𝑚, 𝑛 ∈ N there
is a computable total function 𝑠𝑚,𝑛 : N^(1+𝑚) → N such that for an 𝑚 +𝑛 -ary function
𝜙𝑒 (𝑥 0, ... 𝑥𝑚−1, 𝑥𝑚 , ... 𝑥𝑚+𝑛−1 ) , freezing the initial 𝑚 variables at 𝑎 0, ... 𝑎𝑚−1 ∈ N
gives the 𝑛 -ary computable function 𝜙𝑠 (𝑒,𝑎0,...𝑎𝑚−1 ) (𝑥𝑚 , ... 𝑥𝑚+𝑛−1 ) .
Proof We will produce the function 𝑠 to satisfy three requirements: it must be
effective, it must input an index 𝑒 and an 𝑚 -tuple 𝑎 0, ... 𝑎𝑚− 1 , and it must output
the index of a machine P̂ that, when given the input 𝑥𝑚 , ... 𝑥𝑚+𝑛− 1 , will return the
value 𝜙𝑒 (𝑎 0, ... 𝑎𝑚− 1, 𝑥𝑚 , ... 𝑥𝑚+𝑛− 1 ) , or fail to halt if that function diverges.
The idea is that the machine that computes 𝑠 will construct the instructions
for P̂ . We can get effectively from the instruction set to the index, so with that we
will be done.
Below on the left is the flowchart for the machine that computes the function 𝑠 .
In its third box it creates the set of four-tuple instructions, P̂, sketched on the
right. The machine on the left needs 𝑎 0, ... 𝑎𝑚− 1 for the right side’s second, third,
and fourth boxes, and it needs 𝑒 for P̂ ’s fifth box. (In this book we try to avoid
getting entangled in the detail of the convention for representations for inputs and
outputs of Turing machines. However in this proof, to be as clear as possible in the
right side’s flowchart, we assume that its input is encoded in unary, that inputs
are separated with a single blank, and that when the machine is started the head
should be under the input’s left-most 1.)
[Flowchart on the left, computing 𝑠 : Start → Read 𝑒, 𝑎0 , ... , 𝑎𝑚−1 → Create instructions for P̂ → ...]
[Flowchart on the right, P̂ : Start → Move left 𝑎0 + · · · + 𝑎𝑚−1 + 𝑚 cells → Put 𝑎0 , ... , 𝑎𝑚−1 on the tape, separated by blanks → ...]
The Turing machine P̂ does not first read its inputs 𝑥𝑚 , ... 𝑥𝑚+𝑛− 1 . Instead,
it first moves left and writes 𝑎 0, ... 𝑎𝑚− 1 on the tape, in unary and separated by
blanks, and with a blank between 𝑎𝑚− 1 and 𝑥𝑚 . (Recall that the 𝑎𝑖 are parameters,
not variables. They are fixed. They are, so to speak, hard-coded into P̂.) Then,
using universality, 𝑃ˆ simulates Turing machine 𝑃𝑒 and lets it run on the entire list
of inputs now on the tape, 𝑎 0, ... 𝑎𝑚− 1, 𝑥𝑚 , ... 𝑥𝑚+𝑛− 1 .
In the notation 𝑠𝑚,𝑛 , the subscript 𝑚 is the number of inputs being frozen
while 𝑛 is the number of inputs left free. These subscripts can be a bother and we
often omit them.
The key point about the s-m-n theorem is that it gives not just one computable
function but instead a family.
4.5 Example Consider the two-input routine sketched by this flowchart.
Start
Read 𝑥 , 𝑦
(∗)
Print 𝑥 · 𝑦
End
Start
Read 𝑦
(∗∗)
Print 𝑥 · 𝑦
End
Compare (∗∗) to (∗). The difference is that the machine in (∗∗) does not read 𝑥 ;
rather, thinking of these as programs instead of Turing machines, 𝑥 is hard-coded
into the source body.
In summary, the s-m-n Theorem gives a sequence of computable functions such
as 𝜙𝑠 (𝑒0,𝑥 ) that is a family in that the indices are given by a computable function.
This family is parametrized by 𝑥 , since 𝑒 0 is fixed.
Restated, this family is uniformly computable — there is a computable function 𝑠
(more precisely, 𝑠 1,1 ) going from the index 𝑒 and the parameter value 𝑥 to the index
of the result in (∗∗). So the s-m-n Theorem is about uniformity.
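In the eval style of the earlier interpreter session we can sketch the function 𝑠 itself: it inputs the description of a two-input routine along with the value to freeze, and outputs the description of a one-input routine. This is only an analogy; the quoted program descriptions stand in for Turing machine indices, and the names s-1-1 and psi are ours.

```
> (define (s-1-1 description a)
    (list 'lambda '(y) (list description a 'y)))
> (define psi '(lambda (x y) (+ x y)))
> (s-1-1 psi 2)
'(lambda (y) ((lambda (x y) (+ x y)) 2 y))
> ((eval (s-1-1 psi 2)) 5)
7
```

Note that s-1-1 is total and never runs the routine it is handed; it only edits the description. Each value of the parameter gives another member of the family, computed uniformly, which is the content of the theorem.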
II.4 Exercises
✓ 4.6 Someone in your study group asks, “What can a Universal Turing machine do
that a regular Turing machine cannot?” Help them out.
4.7 Has anyone ever built a Universal Turing machine or a device equivalent to
one, or is it a theory-only thing?
4.8 Can a Universal Turing machine simulate another Universal Turing machine,
or for that matter can it simulate itself?
✓ 4.9 Your class has someone who says, “Universal Turing machines make no sense
to me. How could a machine simulate another machine that has more states?”
Correct their misimpression.
4.10 Is there more than one Universal Turing machine?
✓ 4.11 Consider the function 𝑓 (𝑥 0, 𝑥 1 ) = 3𝑥 0 + 𝑥 0 · 𝑥 1 .
(a) Freeze 𝑥 0 to have the value 4. What is the resulting one-variable function?
(b) Freeze 𝑥 0 at 5. What is the resulting one-variable function?
(c) Freeze 𝑥 1 to be 0. What is the resulting function?
4.12 Consider 𝑓 (𝑥 0, 𝑥 1, 𝑥 2 ) = 𝑥 0 + 2𝑥 1 + 3𝑥 2 .
(a) Freeze 𝑥 0 to have the value 1. What is the resulting two-variable function?
(b) What two-variable function results from fixing 𝑥 0 to be 2?
(c) Let 𝑎 be a natural number. What two-variable function results from fixing 𝑥 0
to be 𝑎 ?
(d) Freeze 𝑥 0 at 5 and 𝑥 1 at 3. What is the resulting one-variable function?
(e) What one-variable function results from fixing 𝑥 0 to be 𝑎 and 𝑥 1 to be 𝑏 , for
𝑎, 𝑏 ∈ N?
✓ 4.13 Suppose that the Turing machine sketched by this flowchart has index 𝑒 0 .
[Flowchart: Start → Read 𝑥 0 , 𝑥 1 → Print 𝑥 0 + 𝑥 1 → End]
[Flowchart: Start → Read 𝑥 0 , 𝑥 1 , 𝑥 2 → Print 𝑥 0 + 𝑥 1 · 𝑥 2 → End]
[Flowchart: Start → Read 𝑥 0 , 𝑥 1 → 𝑥 0 > 1? → if Y, Infinite loop; if N, Print 𝑥 1 → End]
(a) Describe 𝜙𝑠1,1 (𝑒0,0 ) . (b) Find 𝜙𝑠1,1 (𝑒0,0 ) ( 5) . (c) Describe 𝜙𝑠1,1 (𝑒0,1 ) . (d) Find
𝜙𝑠1,1 (𝑒0,1 ) ( 5) . (e) Describe 𝜙𝑠1,1 (𝑒0,2 ) . (f) Find 𝜙𝑠1,1 (𝑒0,2 ) ( 5) .
✓ 4.16 Let the Turing machine sketched by this flowchart have index 𝑒 0 .
[Flowchart: Start → Read 𝑥 0 , 𝑥 1 , 𝑦 → 𝑥 0 even? → if Y, Print 𝑥 1 · 𝑦 ; if N, Print 𝑥 1 + 𝑦 → End]
Section 5. The Halting problem 89
Section
II.5 The Halting problem
We’ve showed that there are functions that are not mechanically computable. We
gave a counting argument, that there are countably many Turing machines but
uncountably many functions and so there are functions with no associated machine.
While knowing what’s true is great, even better is to exhibit a specific function that
is unsolvable. We will now do that.
                    Input
              0 1 2 3 4 5 6 ...
          𝜙0  3 1 2 7 7 0 4 ...
          𝜙1  0 5 0 0 0 0 0 ...
Function  𝜙2  1 4 1 5 9 2 6 ...
          𝜙3  9 1 9 1 9 1 9 ...
          𝜙4  1 0 1 0 0 1 0 ...
          𝜙5  6 2 5 5 4 1 8 ...
              ⋮
[Flowchart, on the right: Start → Read 𝑒 → Compute table entry for index 𝑒 , input 𝑒 → Print result + 1 → End]
Diagonalizing means considering the machine on the right. It moves down the
array’s diagonal, changing the 3, changing the 5, etc. Thus, when 𝑒 = 0 then the
output is 4, when 𝑒 = 1 then the output is 6, etc. Our goal with this machine
is to ensure that no computable function, none of the table’s rows, has the same
input-output relationship as this machine.
But that’s a puzzle. The flowchart outlines an effective procedure — we can
implement this using a Universal Turing machine in its third box — and thus it
seems that its output should be one of the rows.
What’s the puzzle’s resolution? The flowchart’s first, second, fourth, and fifth
boxes are trivial so the answer must involve the third one. There must be an 𝑒 ∈ N
so that 𝜙𝑒 (𝑒) ↑, so that for that number the machine in the flowchart never gets
through its middle box, and consequently never gives any output. That is, to avoid
a contradiction the above table must contain ↑ ’s.
So this puzzle has led to a key insight: the fact that some computations fail to
halt on some inputs is very important.
5.1 Problem (Halting problem) † Given 𝑒 ∈ N, determine whether 𝜙𝑒 (𝑒) ↓, that is,
whether Turing machine P𝑒 halts on input 𝑒 .
Suppose, to get a contradiction, that the function below is mechanically computable.

    1𝐾 (𝑒) = 𝐾 (𝑒) = halt_decider (𝑒) =  1 – if 𝜙𝑒 (𝑒)↓
                                          0 – if 𝜙𝑒 (𝑒)↑
That assumption implies that the function 𝑓 below is also mechanically computable.
(In the top case the particular output value 42 doesn’t matter, all that matters is
that 𝑓 converges.) The flowchart illustrates how 𝑓 is constructed; it uses the above
function halt_decider in its decision box.
† We use a distinct typeface for problem names, as in ‘Halting’.
    𝑓 (𝑒) =  42 – if 𝜙𝑒 (𝑒)↑
              ↑ – if 𝜙𝑒 (𝑒)↓

[Flowchart: Start → Read 𝑒 → 𝐾 (𝑒) = 0? → if Y, Print 42 → End; if N, Infinite loop]
Since this is mechanically computable, it has a Turing machine index. Let that
index be 𝑒 0 , so that 𝑓 (𝑥) = 𝜙𝑒0 (𝑥) for all inputs 𝑥 .
Now consider 𝑓 (𝑒 0 ) = 𝜙𝑒0 (𝑒 0 ) (that is, feed the machine P𝑒0 its own index).
If it diverges then the first clause in the definition of 𝑓 means that 𝑓 (𝑒 0 )↓, which
contradicts the assumption of divergence. If it converges then 𝑓 ’s second clause
means that 𝑓 (𝑒 0 )↑, also a contradiction. So there are two possibilities and both lead
to a contradiction. Since assuming that halt_decider is mechanically computable
gives a contradiction, that function is not mechanically computable.
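In program form the construction of 𝑓 looks like this. The routine halt-decider is hypothetical; no such routine can exist, which is the point, so below it is only a stub that makes the sketch self-contained.

```
;; Stub standing in for the supposed decider; it cannot really be written.
(define (halt-decider e) 0)

;; The diagonal function f: output 42 where phi_e(e) diverges, and
;; diverge (loop forever) where phi_e(e) converges.
(define (f e)
  (if (= (halt-decider e) 0)
      42
      (f e)))

(f 5)   ; => 42, with this stub
```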
We say that a problem is unsolvable if no Turing machine has the specified
input-output behavior. If the problem is to compute the answers to ‘yes’ or ‘no’
questions, that is, to decide membership in a set, then we say that the set is
undecidable. With Church’s Thesis in mind, we interpret these to mean that the
problem or set is unsolvable by any discrete mechanism.
General unsolvability We have named one task, the Halting problem, that no
mechanical device can solve. We will next leverage that one to produce many jobs
that cannot be done. That is, the Halting problem is part of a larger phenomenon
of unsolvability.
5.4 Example Consider this problem: we want an algorithm that tells us whether a
given Turing machine halts on the input 3. That is: given 𝑒 , does 𝜙𝑒 ( 3)↓?
We will show that if this

    halts_on_three_decider (𝑒) =  1 – if 𝜙𝑒 ( 3)↓
                                   0 – otherwise
were a computable function then we could compute the solution of the Halting
problem. That’s impossible, so we will then know that halts_on_three_decider
is also not computable.
Our strategy is to create a scheme where being able to determine whether
an arbitrary machine halts on 3 allows us to settle questions about the Halting
problem. Imagine that we have a particular 𝑥 and want to know whether 𝜙𝑥 (𝑥)↓.
Consider the machine outlined on the right below. It reads the input 𝑦 and ignores
it, and also gives a nominal output. Its action is in the middle box, where the code
uses a universal Turing machine to simulate running P𝑥 on input 𝑥 . If that halts
then the machine on the right as a whole halts, for any input. If not then it never
gets through its middle box and so the machine as a whole does not halt.
[Flowchart on the left: Start → Read 𝑥, 𝑦 → Run P𝑥 on 𝑥 → Print 42 → End]
[Flowchart on the right: Start → Read 𝑦 → Run P𝑥 on 𝑥 → Print 42 → End]
As just one case, the machine on the right halts on input 𝑦 = 3 if and only if P𝑥
halts on 𝑥 (having P𝑥 halt on 𝑥 implies the same for all other input 𝑦 ’s, but that
is not relevant to our strategy). So with the machine on the right, if we were able
to answer questions about halting on 3 then we could leverage that ability, making
ourselves able to determine whether P𝑥 halts on 𝑥 .
We are ready for the argument. Consider this function.
    𝜓 (𝑥, 𝑦) =  42 – if 𝜙𝑥 (𝑥)↓
                 ↑ – otherwise
5.6 Example We will show that this problem is not mechanically solvable: given 𝑒 ,
determine whether P𝑒 outputs 7 for some input.
The argument is much like the one in the prior example. Consider this.
    𝜓 (𝑥, 𝑦) =  7 – if 𝜙𝑥 (𝑥)↓
                 ↑ – otherwise
The flowchart on the left below sketches how to compute 𝜓. Thus 𝜓 is intuitively
mechanically computable and by Church’s Thesis there is a Turing machine whose
input-output behavior is 𝜓. That Turing machine has an index, 𝑒 0 , so that 𝜓 = 𝜙𝑒0 .
[Flowchart on the left: Start → Read 𝑥, 𝑦 → Run P𝑥 on 𝑥 → Print 7 → End]
[Flowchart on the right: Start → Read 𝑦 → Run P𝑥 on 𝑥 → Print 7 → End]
The function on the left below is intuitively computable by the flowchart in the
middle.
    𝜓 (𝑥, 𝑦) =  2𝑦 – if 𝜙𝑥 (𝑥)↓
                 ↑ – otherwise

[Flowchart in the middle: Start → Read 𝑥, 𝑦 → Run P𝑥 on 𝑥 → Print 2𝑦 → End]
[Flowchart on the right: Start → Read 𝑦 → Run P𝑥 on 𝑥 → Print 2𝑦 → End]
So Church’s Thesis says that there is a Turing machine that computes it. Let that
machine’s index be 𝑒 0 . Apply the s-m-n theorem to get a family of functions 𝜙𝑠 (𝑒0,0 ) ,
𝜙𝑠 (𝑒0,1 ) , . . . The generic member of this family P𝑠 (𝑒0,𝑥 ) is sketched by the flowchart
on the right. It illustrates that 𝜙𝑥 (𝑥)↓ if and only if doubler_decider (𝑠 (𝑒 0, 𝑥)) = 1.
So the supposition that doubler_decider is computable implies that the Halting
problem is computably solvable, which is false.
These examples show that the Halting problem serves as a touchstone for
unsolvability: often we prove that something is unsolvable by demonstrating that if
we could solve it then we could solve the Halting problem. We say that the Halting
problem reduces to the given problem.† Thus for instance the Halting problem
reduces to the problem of determining whether a given Turing machine halts on
input 3.
Discussion The unsolvability of the Halting problem is one of the most important
results in the Theory of Computation. We will close with a few points.
First, to reiterate, saying that a problem is unsolvable means that it is unsolvable
by a mechanism, that no Turing machine computes the solution to the problem.
There is a function that solves it, but that function is not effectively computable.
Second, the fact that the Halting problem is unsolvable does not mean that for
all computations, we cannot tell if that computation halts. Obviously this program
halts for every input.
> (define (successor i)
(+ 1 i))
Nor does it mean that we cannot tell if a computation does not halt. This one,
> (define (f x)
(displayln x)
(f (+ 1 x)))
once started, just keeps going (below, control-C interrupted the run).
> (f 0)
0
1
...
97806
97807
; user break [,bt for context]
†
Often newcomers get this terminology backwards. We are using ‘reduces to’ in the same sense
that we would in saying in Calculus, “finding the area under the graph of a polynomial reduces to
antidifferentiating that polynomial.” We can find the area if we can antidifferentiate. Similarly here,
we can solve the Halting problem if we can solve the halts on 3 problem.
Section 5. The Halting problem 95
Instead, the unsolvability of the Halting problem says: there is no single program
that for all 𝑒 correctly decides in a finite time whether P𝑒 halts on input 𝑒 .
That sentence contains the qualifier ‘single program’ because for any index 𝑒 ,
either P𝑒 halts on 𝑒 or else it does not. Consequently, for any 𝑒 one of these two
programs produces the right answer.
Of course, guessing which one of the two applies is not what we have in mind
when we think about solving the Halting problem. We want uniformity. We want a
single effective procedure, one program, that inputs 𝑒 and that outputs the right
answer.
The sentence above also includes the qualifier ‘finite time’. We could write code
that reads an input 𝑒 and simulates P𝑒 on input 𝑒 . This is a uniform approach
because it is a single program. If P𝑒 on input 𝑒 halts then our code would discover
that. But if it does not halt then our code would not get that result in a finite time.
In short, the second point is that the unsolvability of the Halting Problem is about
the non-existence of a single program that works across all indices. Theorem 5.3
speaks to uniformity — specifically, it says that uniformity is impossible.
Our third point is about why unsolvability of the Halting problem is so important
in the subject. A beginning programming class could leave the impression that if a
program doesn’t halt then it just has a bug, something fixable. So it could seem to
a student in that course that the Halting problem is not interesting.
That impression is wrong. Imagine that we could somehow write a utility
always_halt that inputs any source P and adjusts it so that for any input where
P does not halt, the modified program will halt (with some nominal output) but
the utility does not change any outputs where P does halt. That would give a
list of total functions like the one on page 90, and diagonalization would give a
contradiction. Thus, in any general computational scheme there must be some
computations that halt on all inputs, some that halt on no inputs, and some that
halt on a proper subset of inputs but not on the rest. Unsolvability of the Halting
problem is inherent in the nature of computation.
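The diagonalization step in that argument can be tried on a toy scale. In this Python sketch (the names are ours), any finite list standing in for a supposed complete list of total functions is escaped by the diagonal function 𝑑 (𝑛) = 𝑓𝑛 (𝑛) + 1, which disagrees with the 𝑛-th function at input 𝑛 and so cannot appear anywhere on the list.

```python
# Toy diagonalization sketch: f_list stands in for a supposed complete
# list of total functions.  The diagonal function d differs from the
# n-th listed function at input n, so it is not on the list.
f_list = [
    lambda n: n,        # identity
    lambda n: 2 * n,    # doubling
    lambda n: n * n,    # squaring
]

def d(n):
    return f_list[n](n) + 1

for n, f in enumerate(f_list):
    assert d(n) != f(n)   # d escapes every function on the list
```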
This alone is enough to justify study of the problem, but our fourth point is
that there is another reason for our interest. With a computable halt_decider in
hand, we could solve many other problems. Some we saw above in this section,
but there are others, involving unbounded search, that we currently don’t know
how to solve.
For instance, a perfect number is a natural number that is the sum of its proper
positive divisors. An example is that 6 is perfect because 6 = 1 + 2 + 3. Another is
28 = 1 + 2 + 4 + 7 + 14. The next two perfect numbers are 496 and 8128. These
numbers have been studied since Euclid and today we understand the form of all
even perfect numbers. But no one knows if there are any odd perfect numbers.†
†
People have done computer checks up to 10¹⁵⁰⁰ and not found any.
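To see concretely how a halting decider would settle such questions, consider this Python sketch (the function names are ours). The search below halts exactly when an odd perfect number exists, so asking a hypothetical halt_decider whether the search halts would, in principle, answer the open question.

```python
def is_perfect(n):
    """True when n equals the sum of its proper positive divisors."""
    return n > 1 and sum(d for d in range(1, n) if n % d == 0) == n

def search_odd_perfect():
    """Unbounded search over the odd numbers; it halts exactly when an
    odd perfect number exists (no one knows whether it ever does)."""
    n = 1
    while True:
        if is_perfect(n):
            return n
        n += 2

# The known small perfect numbers are all even.
print([n for n in range(1, 500) if is_perfect(n)])   # → [6, 28, 496]
```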
II.5 Exercises
5.8 Someone asks the professor, “I don’t get the point of the Halting problem.
If you want programs to halt then just watch them and when they exceed a set
number of cycles, send a kill signal.” How to respond?
5.9 Is this statement right or wrong: there is no function that solves the Halting
Problem, that is, there is no 𝑓 such that 𝑓 (𝑒) = 1 if 𝜙𝑒 (𝑒)↓ and 𝑓 (𝑒) = 0 if 𝜙𝑒 (𝑒)↑?
†
This program takes an input 𝑥 but ignores it; in this book we like to have the machines that we use
take an input and also give an output.
ℎ( 3𝑛 + 1) – else
The Collatz conjecture is that ℎ(𝑛) = 1 for all 𝑛 ∈ N, that is, ℎ(𝑛) halts in that it
does not keep expanding forever. No one knows whether the Collatz conjecture is
true. Is it an unsolvable problem to determine whether ℎ halts on all input?
✓ 5.15 For each of these, is it true or false?
(a) The problem of determining, given 𝑒 , whether 𝜙𝑒 ( 3)↓ is unsolvable because
no function halts_on_three_decider exists.
(b) The existence of unsolvable problems indicates weaknesses in the models of
computation, and we need stronger models.
5.16 A set is computable if its characteristic function is a computable function.
Consider the set consisting of the single number 1 if in 1924 G Mallory reached the
summit of Everest, and otherwise consisting of 0. Is that set computable?
5.17 Describe the family of computable functions that you get by using the
s-m-n Theorem to parametrize 𝑥 in each function. Also give flowcharts sketching
the associated machines for 𝑥 = 0, 𝑥 = 1, and 𝑥 = 2. (a) 𝑓 (𝑥, 𝑦) = 3𝑥 + 𝑦
(b) 𝑓 (𝑥, 𝑦) = 𝑥𝑦² (c) 𝑓 (𝑥, 𝑦) = { 𝑥 – if 𝑥 is odd; 0 – otherwise }
5.18 Show that each of these is a solvable problem. (a) Given an index 𝑒 ,
determine whether Turing machine P𝑒 runs for at least 42 steps on input 3.
(b) Given an index 𝑒 , determine whether P𝑒 runs for at least 42 steps on input 𝑒 .
(c) Given 𝑒 , decide whether P𝑒 runs for at least 𝑒 steps on input 𝑒 .
Each exercise from 5.19 through 5.25 states a problem. Show that the problem is
unsolvable by reducing the Halting problem to it.
✓ 5.19 See the instructions above. Given an index 𝑒 , determine if 𝜙𝑒 is a total function,
that is, if it converges on every input.
✓ 5.20 See the instructions before Exercise 5.19. Given an index 𝑒 , decide if the
Turing machine P𝑒 squares its input. That is, decide if 𝜙𝑒 associates 𝑦 ↦→ 𝑦 2.
5.21 See the instructions above. Given 𝑒 , determine if the function 𝜙𝑒 halts and
returns the same value on two consecutive inputs, so that 𝜙𝑒 (𝑦) = 𝜙𝑒 (𝑦 + 1) for
some 𝑦 ∈ N.
✓ 5.22 See the instructions above. Given 𝑒 , decide whether 𝜙𝑒 fails to converge on
input 5.
5.23 See the instructions above. Given an index, determine if the computable
function with that index fails to converge on all odd numbers.
5.24 See the instructions above. Given 𝑒 , decide if the function 𝜙𝑒 has the action
𝑥 ↦→ 𝑥 + 1.
5.25 See the instructions above. Given 𝑒 , decide if the function 𝜙𝑒 fails to converge
on both inputs 𝑥 and 2𝑥 , for some 𝑥 .
5.26 One of these problems is solvable and one is not. Which is which? (a) Given
an index 𝑒 , decide if P𝑒 halts on input 153. (b) Given an index 𝑒 , decide if P𝑒
halts in fewer than 1000 steps on input 153.
5.27 Fix integers 𝑎, 𝑏, 𝑐 ∈ N and consider the problem L𝑎,𝑏,𝑐 of determining
whether there is a single-number input cantor (𝑥, 𝑦) such that 𝑎𝑥 + 𝑏𝑦 = 𝑐 . Is this
problem solvable or unsolvable?
5.28 For each problem, state whether it is solvable, unsolvable, or you cannot
tell. You needn’t give a proof, just decide. (a) Given 𝑒 , decide if P𝑒 halts on
all even numbers 𝑦 . (b) Given 𝑒 , decide if P𝑒 halts on three or fewer inputs 𝑦 .
(c) Given 𝑒 , decide if P4 halts on input 𝑒 . (d) Given 𝑒 , decide if P𝑒 contains an
instruction with state 𝑞𝑒 .
✓ 5.29 For each problem, fill in the blanks to show that the problem is unsolvable.
We will show that this is not mechanically computable.
(1)_decider (𝑒) = { 1 – if (2); 0 – otherwise }

𝜓 (𝑥, 𝑦) = { (3) – if 𝜙𝑥 (𝑥)↓; 0 – otherwise }

Left flowchart: Read 𝑥 , 𝑦 → Run P𝑥 on 𝑥 → __(4)__ → End
Right flowchart: Read 𝑦 → Run P𝑥 on 𝑥 → __(4)__ → End
By Church’s Thesis there is a Turing machine with that behavior. Let that machine have
index 𝑒 0 , so that 𝜓 (𝑥, 𝑦) = 𝜙𝑒0 (𝑥, 𝑦) . Apply the s-m-n Theorem to parametrize 𝑥 . A member
of the resulting family of Turing machines is sketched above on the right. Observe that
𝜙𝑥 (𝑥)↓ if and only if (1) _decider (𝑠 (𝑒 0, 𝑥)) = 1. Because the function 𝑠 is mechanically
computable, if (1) _decider were mechanically computable then the Halting problem
would be mechanically solvable. But the Halting problem is not mechanically solvable. Therefore
(1) _decider is not mechanically computable.
(a) Given machine index 𝑒 , decide if there is a 𝑦 ∈ N so that P𝑒 outputs 𝑦 on
input 𝑦 .
(b) Given 𝑒 , decide if there is a 𝑦 so that 𝜙𝑒 (𝑦) = 42.
(c) Given 𝑒 , decide if there is a 𝑦 so that 𝜙𝑒 (𝑦) = 𝑦 + 2.
5.30 In some ways a more natural set than 𝐾 = {𝑒 ∈ N 𝜙𝑒 (𝑒)↓} is 𝐾0 =
{ ⟨𝑒, 𝑥⟩ ∈ N2 𝜙𝑒 (𝑥)↓}. Use the fact that 𝐾 is not computable to prove that 𝐾0 is
also not computable.
5.31 The Halting problem of determining membership in the set 𝐾 = {𝑒 𝜙𝑒 (𝑒)↓}
appears to be an aggregate, or to cut across all Turing machines, in that for every
Turing machine a piece of information about that machine forms part of 𝐾 .
(a) Produce a single Turing machine, P𝑒 , such that the question of determining
membership in {𝑦 𝜙𝑒 (𝑦)↓} is undecidable.
(b) Fix a number 𝑦 . Show that the question of whether P𝑒 halts on 𝑦 is decidable.
✓ 5.32 For each, if it is mechanically solvable then sketch an algorithm to solve it. If
it is unsolvable then show that.
(a) Given 𝑒 ∈ N, determine the number of states in P𝑒 .
(b) Given 𝑒 , determine whether P𝑒 halts when the input is the empty string.
(c) Given 𝑒 , determine if P𝑒 halts on input 𝑛 within one hundred steps.
5.33 Is 𝐾 infinite?
5.34 True or false: the number of unsolvable problems is countably infinite.
5.35 Show that for any Turing machine, the problem of determining whether it
halts on all inputs is solvable.
5.36 Goldbach’s conjecture is that every even natural number greater than two is
the sum of two primes. It is one of the oldest and best-known unsolved problems
in mathematics. Show that if we could solve the Halting problem then we could in
principle settle Goldbach’s conjecture.
5.37 Brocard’s problem asks whether there are any numbers besides 4, 5, and 7
for which 𝑛! + 1 is a perfect square (computer searches up to a quadrillion, 1 × 10¹⁵,
have not found any other solutions). Show that if we could solve the Halting
problem then we could in principle settle this problem.
5.38 Show that most problems are unsolvable by showing that there are uncount-
ably many functions 𝑓 : N → N that are not computed by any Turing machine,
while the number of functions that are computable is countable.
5.39 Give an example of a computable function that is total, meaning that it
converges on all inputs, but whose range is not computable.
5.40 A set of bitstrings is a decidable language if its characteristic function is
computable. Prove each. (a) The union of two decidable languages is a decidable
language. (b) The intersection of two decidable languages is a decidable language
(c) The complement of a decidable language is a decidable language.
Section
II.6 Rice’s Theorem
Our finishing point in the prior section was that the results and examples there give
the intuition that we cannot mechanically analyze the behavior of Turing machines.
In this section we will make this intuition precise.
Mechanical analysis does apply to some properties of Turing machines. We
can write a routine that, given 𝑒 , determines whether or not P𝑒 has a four-tuple
instruction whose first entry is the state 𝑞 5 . The analogue in ordinary programming
is that we can write a program to parse source code for a variable named x1. But
these are not what we mean by “behavior.” Instead, they are properties of the
implementation.
6.1 Definition Two computable functions have the same behavior, 𝜙𝑒 ≃ 𝜙𝑒ˆ, if they
converge on the same inputs 𝑥 ∈ N and when they do converge, they have the
same outputs.‡
‡
Strictly speaking, we don’t need the symbol ≃. By definition, a function is a set of ordered pairs. If
𝜙𝑒 ( 0 ) ↓ while 𝜙𝑒 ( 1 ) ↑ then the set 𝜙𝑒 contains a pair with first entry 0 but no pair starting with 1.
Thus for partial functions, if they converge on the same inputs and when they do converge they have
the same outputs, then we can simply say that the two are equal, 𝜙 = 𝜙ˆ. But we use ≃ as a reminder
that the functions may be partial.
6.2 Definition A set I of natural numbers is an index set† when for all 𝑒, 𝑒ˆ ∈ N, if
𝑒 ∈ I and 𝜙𝑒 ≃ 𝜙𝑒ˆ then also 𝑒ˆ ∈ I .
6.3 Example If we fix a behavior and consider the indices of all of the Tur-
ing machines with that behavior then we get an index set. Thus the set
I = {𝑒 ∈ N 𝜙𝑒 (𝑥) = 2𝑥 for all 𝑥 } is an index set. To verify, suppose that 𝑒 ∈ I
and that 𝑒ˆ ∈ N is such that 𝜙𝑒 ≃ 𝜙𝑒ˆ. Then 𝜙𝑒ˆ also doubles its input: 𝜙𝑒ˆ (𝑥) = 2𝑥 for
all 𝑥 . Thus 𝑒ˆ ∈ I also.
6.4 Example We can also get an index set by collecting multiple behaviors together.
The set J = {𝑒 ∈ N 𝜙𝑒 (𝑥) = 3𝑥 for all 𝑥 , or 𝜙𝑒 (𝑥) = 𝑥 3 for all 𝑥 } is an index set.
For, suppose that 𝑒 ∈ J and that 𝜙𝑒 ≃ 𝜙𝑒ˆ where 𝑒ˆ ∈ N. Because 𝑒 ∈ J, either
𝜙𝑒 (𝑥) = 3𝑥 for all 𝑥 or 𝜙𝑒 (𝑥) = 𝑥 3 for all 𝑥 . From 𝜙𝑒 ≃ 𝜙𝑒ˆ we know that either
𝜙𝑒 (𝑥) = 3𝑥 for all 𝑥 or 𝜙𝑒 (𝑥) = 𝑥 3 for all 𝑥 , and consequently 𝑒ˆ ∈ J.
6.5 Example The set {𝑒 ∈ N P𝑒 contains an instruction starting with 𝑞 10 } is not an
index set. We can easily produce two Turing machines having the same behavior
where one machine contains such an instruction while the other does not.
6.6 Theorem (Rice’s theorem) Every index set that is not trivial, that is not empty
and not all of N, is not computable.
Proof Let I be a nontrivial index set. Choose an 𝑒 ∈ N so that 𝜙𝑒 (𝑦) ↑ for all 𝑦 .
Then either 𝑒 ∈ I or 𝑒 ∉ I . We shall show that in the second case I is not
computable. The first case is similar and is Exercise 6.36.
So assume 𝑒 ∉ I . Since I is not empty it contains an index 𝑒ˆ ∈ I . Because I is
an index set and 𝑒 ∉ I , the two functions have different behaviors, 𝜙𝑒 ≄ 𝜙𝑒ˆ.
Since 𝜙𝑒 diverges everywhere, there is a 𝑦 such that 𝜙𝑒ˆ (𝑦)↓.
Consider the flowchart on the left below. By Church’s Thesis there is a
Turing machine with that behavior. Let it be P𝑒0 . Apply the s-m-n theorem to
parametrize 𝑥 , resulting in the uniformly computable family of functions 𝜙𝑠 (𝑒0,𝑥 )
whose computation is outlined on the right.
Left flowchart: Start → Read 𝑥, 𝑦 → Run P𝑥 on 𝑥 → Run P𝑒ˆ on 𝑦 → End
Right flowchart: Start → Read 𝑦 → Run P𝑥 on 𝑥 → Run P𝑒ˆ on 𝑦 → End
We’ve constructed the machine on the right so that if 𝜙𝑥 (𝑥)↑ then 𝜙𝑠 (𝑒0,𝑥 ) ≃ 𝜙𝑒 and
thus 𝑠 (𝑒 0, 𝑥) ∉ I . As well, if 𝜙𝑥 (𝑥) ↓ then 𝜙𝑠 (𝑒0,𝑥 ) ≃ 𝜙𝑒ˆ, and thus 𝑠 (𝑒 0, 𝑥) ∈ I . It
follows that if I were mechanically computable, so that we could effectively check
whether 𝑠 (𝑒 0, 𝑥) ∈ I , then we could solve the Halting problem.
6.7 Example We will use Rice’s Theorem to show that this problem is unsolvable: given
𝑒 , decide if 𝜙𝑒 ( 3)↓. We must define an appropriate set I and then verify that it is
not empty, that it is not all of N, and that it is an index set.
†
It is called an index set because it is a set of indices.
Let I = {𝑒 ∈ N 𝜙𝑒 ( 3)↓}. The simplest way to verify that this set is not empty
is to exhibit a member. The routine sketched on the left below is intuitively
computable and so Church’s Thesis says there is a Turing machine with that
behavior. That machine’s index is a member of I and thus I ≠ ∅.
Left flowchart: Start → Read 𝑥 → Print 42 → End
Right flowchart: Start → Read 𝑥 → Infinite loop
Likewise, to verify that I does not contain every number, consider the routine on
the right. Church’s Thesis gives that there is a Turing machine with that behavior.
That machine’s index is not a member of I and so I ≠ N.
We finish by verifying that I is an index set. Assume that 𝑒 ∈ I and let 𝑒ˆ ∈ N
be such that 𝜙𝑒 ≃ 𝜙𝑒ˆ. Because 𝑒 ∈ I , we have that 𝜙𝑒 ( 3) ↓. Because 𝜙𝑒 ≃ 𝜙𝑒ˆ, we
have that 𝜙𝑒ˆ ( 3)↓ also, and thus 𝑒ˆ ∈ I . Hence, I is an index set.
The above example is the same problem as in the first example of the prior
subsection. Note that Rice’s Theorem makes the answer considerably simpler. (Of
course, our development of the theorem requires the prior section’s work.)
6.8 Example We can use Rice’s Theorem to show that the prior section’s second
problem is unsolvable: given 𝑒 , decide if 𝜙𝑒 (𝑥) = 7 for some 𝑥 . Rice’s Theorem
asks us to produce an appropriate I and verify that it is a nontrivial index set.
Let I = {𝑒 ∈ N 𝜙𝑒 (𝑥) = 7 for some 𝑥 }. This set is not empty because there
is a Turing machine that acts as the identity function, so that 𝜙 (𝑥) = 𝑥 , and the
index of that machine is a member of I . This set is not all of N because there is a
Turing Machine that never halts, 𝜙 (𝑥)↑ for all 𝑥 , and that machine’s index is not a
member of I . Hence I is nontrivial.
To show that I is an index set assume that 𝑒 ∈ I , and let 𝑒ˆ ∈ N be such that
𝜙𝑒 ≃ 𝜙𝑒ˆ. By the assumption, 𝜙𝑒 (𝑥 0 ) = 7 for some input 𝑥 0 . Since the two have the
same behavior, the same input gives 𝜙𝑒ˆ (𝑥 0 ) = 7. Consequently, 𝑒ˆ ∈ I .
6.9 Example This problem is also unsolvable: given 𝑒 , decide whether 𝜙𝑒 equals this.
𝑓 (𝑥) = { 4 – if 𝑥 is prime; 𝑥 + 1 – otherwise }
(two pictures omitted: on the left, the box of all indices split into behavior
tiles; on the right, the same box with three tiles selected)
As in Example 6.4, an index set joins together the indices from a number of
behaviors; in the picture above on the right it is three of them. The property
of being in this I is extensional because we are taking entire behavior tiles.
In summary, although Rice’s Theorem does not apply to all problems, never-
theless it is especially significant for understanding what can be done through
mechanical analysis alone. Rice’s Theorem is about those properties of machines
that extend to be properties of the functions that those machines compute. For
instance, the problem of deciding whether a program computes the squaring
function is unsolvable, but the problem of deciding whether the code uses the letter
‘k’ is not.
II.6 Exercises
6.10 Your friend is confused, “According to Rice’s Theorem, everything is impos-
sible. Every property of a computer program is non-computable. But I do this
supposedly impossible stuff all the time!” Help them out.
6.11 Is I = {𝑒 P𝑒 runs for at least 100 steps on input 5 } an index set?
6.12 Why does Rice’s theorem not show that this problem is unsolvable: given 𝑒 ,
decide whether ∅ ⊆ {𝑥 𝜙𝑒 (𝑥)↓}?
6.13 True or false: the given property of machines is extensional. Briefly justify.
(a) The machine halts on input 5. (b) It has exactly four instructions. (c) It
computes twice its input.
6.14 Briefly describe why these machine properties, listed in the section, are
non-extensional. (a) The property of halting within 100 steps on every input,
(b) of visiting fewer than 100 tape cells on input 0, and (c) of containing the
state 𝑞 10 .
6.15 Give a trivial index set: fill in the blank I = {𝑒 P𝑒 ____ } so that
the set I is empty.
6.16 Give a trivial index set: fill in the blank I = {𝑒 P𝑒 ____ } so that
the set I is all of N.
6.17 For each problem, produce an index set suitable for applying Rice’s Theorem.
You needn’t give the entire argument, just produce the set.
(a) Given 𝑒 , determine if P𝑒 halts on input 7 with output 7.
(b) Given 𝑒 , determine if P𝑒 halts on input 𝑒 and returns output 𝑒 .
(c) Given 𝑒 , determine if P2𝑒 returns output 7 for any input 𝑦 .
(d) Given 𝑒 , determine if P𝑒 halts on 7 and gives a prime number.
For each of the problems from Exercise 6.18 to Exercise 6.24, show that it is unsolvable
by applying Rice’s theorem. (These repeat the problems from Exercise 5.19 to
Exercise 5.25.)
✓ 6.18 See the instructions above. Given an index 𝑒 , determine if 𝜙𝑒 is total, that is,
if it converges on every input.
✓ 6.19 See the instructions above. Given an index 𝑒 , decide if the Turing machine P𝑒
squares its input. That is, decide if 𝜙𝑒 performs 𝑦 ↦→ 𝑦 2 .
6.20 See the instructions above. Given 𝑒 , determine if the function 𝜙𝑒 returns the
same value on two consecutive inputs, so that 𝜙𝑒 (𝑦) = 𝜙𝑒 (𝑦 + 1) for some 𝑦 ∈ N.
6.21 See the instructions above. Given an index 𝑒 , determine whether 𝜙𝑒 fails to
converge on input 5.
6.22 See the instructions above. Given an index, determine whether the Turing
machine P𝑒 fails to halt on all odd numbers.
6.23 See the instructions above. Given an index 𝑒 , decide if the function 𝜙𝑒
computed by machine P𝑒 performs 𝑥 ↦→ 𝑥 + 1.
6.24 See the instructions above. Given an index 𝑒 , decide if the function 𝜙𝑒 fails
to converge on both inputs 𝑥 and 2𝑥 , for some 𝑥 .
✓ 6.25 Show that each of these is an unsolvable problem by applying Rice’s Theorem.
(a) The problem of determining whether a function is partial, that is, whether it
fails to converge on some input.
(b) The problem of deciding whether a function ever converges, on any input.
✓ 6.26 For each problem, fill in the blanks to prove that it is unsolvable.
We will show that I = {𝑒 ∈ N (1) } is a nontrivial index set. Then Rice’s theorem will
give that the problem of determining membership in I is algorithmically unsolvable.
First we argue that I ≠ ∅. The routine sketched here: (2) is intuitively computable so
by Church’s Thesis there is such a Turing machine. That machine’s index is an element of I .
Next we argue that I ≠ N. The other sketch: (3) is intuitively computable so by
Church’s Thesis there is such a Turing machine. Its index is not an element of I .
Finally, we show that I is an index set. Suppose that 𝑒 ∈ I and that 𝑒ˆ is such that 𝜙𝑒 ≃ 𝜙𝑒ˆ.
Because 𝑒 ∈ I , (4) . Because 𝜙𝑒 ≃ 𝜙𝑒ˆ we have that (5) . Thus, 𝑒ˆ ∈ I . Consequently I
is an index set.
(a) Given 𝑒 , determine if Turing machine 𝑒 halts on all inputs 𝑥 that are multiples
of five.
(b) Given 𝑒 , decide if Turing machine 𝑒 ever outputs a seven.
6.27 Prove that any set that is not computable is infinite.
6.28 Define that a Turing machine accepts a set of bit strings L ⊆ B∗ if that machine
inputs bit strings, and it halts on all inputs, and it outputs 1 if and only if the input
is a member of L. Show that each problem is unsolvable, using Rice’s Theorem.
(a) The problem of deciding, given 𝑒 ∈ N, whether P𝑒 accepts an infinite language.
(b) The problem of deciding, given 𝑒 ∈ N, whether P𝑒 accepts the string 101.
6.29 As in the prior exercise, a Turing machine accepts a set of bit strings L ⊆ B∗
if it inputs bit strings, halts on all inputs, and it outputs 1 if input is a member
of L, and 0 otherwise. Show that this problem is unsolvable: given 𝑒 , determine if
P𝑒 accepts B∗ itself. Show that this problem is mechanically unsolvable: given 𝑒 ,
determine whether there is an input 𝑥 so that 𝜙𝑒 (𝑥)↓.
6.30 We say that a Turing machine has an unreachable state if for all inputs,
during the course of the computation the machine never enters that state. Show
that I = {𝑒 P𝑒 has an unreachable state } is not an index set.
6.31 Your classmate says, “The section ends with ‘Rice’s Theorem is about those
properties of machines that extend to be properties of the functions that those
machines compute.’ But here is a problem that is about the properties of machines
but is also solvable: given 𝑒 , determine whether P𝑒 only halts on an empty input
tape. To solve this problem, give machine P𝑒 an empty input and see whether it
halts or it goes on.” Where are they mistaken?
6.32 Show that no finite set that is nonempty is an index set.
6.33 Show that each of these is an index set.
(a) {𝑒 ∈ N machine P𝑒 halts on at least five inputs }
(b) {𝑒 ∈ N the function 𝜙𝑒 is one-to-one }
(c) {𝑒 ∈ N the function 𝜙𝑒 is either total or else 𝜙𝑒 ( 3)↑}
6.34 In the section we characterized index sets as in the picture below. We start
with the set of all integers, which is the rectangular box, and group them together
when they are indices of equal computable functions. Then to get an index set,
select a few parts such as the three shown, and take their union.
(picture omitted: the box of all indices grouped into parts, with three parts selected)
Here we justify that picture. (a) Consider the relation ≃ between natural numbers
given by 𝑒 ≃ 𝑒ˆ if 𝜙𝑒 ≃𝜙𝑒ˆ. Show that this is an equivalence relation. (b) Describe the
parts, the equivalence classes. (c) Show that each index set is the union of some
of the equivalence classes. Hint: show that if an index set contains one element of
a class then it contains them all.
6.35 Because being an index set is a property of a set, we naturally consider how
it interacts with set operations. (a) Show that the complement of an index set is
also an index set. (b) Show that the collection of index sets is closed under union.
(c) Is it closed under intersection? If so prove that and if not then give a
counterexample.
6.36 Do the 𝑒 ∈ I case in the proof of Rice’s Theorem, Theorem 6.6.
Section
II.7 Computably enumerable sets
The natural way to attack the Halting problem is to start by simulating P0 on
input 0 for one step. Next, simulate P0 on input 0 for a second step and also
simulate P1 on input 1 for one step. After that, run P0 on 0 for a third step, and P1
on 1 for a second step, and then P2 on 2 for one step. In this way, cycle among the
P𝑒 on 𝑒 simulations, running each for a step.† Eventually some of these halt and
†
That is, run a loop that at iteration 𝑖 runs the 𝑠 -th step of simulating P𝑒 on input 𝑒 , where 𝑖 = 𝑒 + 𝑠 .
the elements of 𝐾 start to fill in. On computer systems this interleaving is called
time-slicing but in theory discussions it is called dovetailing.
We are imagining a computable 𝑓 such that 𝑓 ( 0) = 𝑒 , where it happens that
P𝑒 on input 𝑒 is the first of these to halt in the dovetailing, etc. The stream of
numbers 𝑓 ( 0) , 𝑓 ( 1) , . . . gives the elements of 𝐾 .
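Here is a Python sketch of dovetailing (the names are ours), with each machine simulation stood in for by a generator that we can advance one step at a time. A real dovetailer runs forever; this sketch bounds the number of rounds.

```python
def dovetail(machines, rounds):
    """Run each 'machine' (a generator standing in for a step-by-step
    simulation) in interleaved fashion; yield the index of each one
    that halts.  Real dovetailing never stops; rounds bounds this sketch."""
    sims = {e: m() for e, m in enumerate(machines)}
    for _ in range(rounds):
        for e in list(sims):
            try:
                next(sims[e])          # one more simulated step
            except StopIteration:      # machine e halted
                del sims[e]
                yield e

def halts_after(k):
    """A toy 'machine' that runs k steps and then halts."""
    def run():
        for _ in range(k):
            yield
    return run

def runs_forever():
    def run():
        while True:
            yield
    return run

machines = [halts_after(3), runs_forever(), halts_after(1)]
print(list(dovetail(machines, 10)))    # → [2, 0]
```

Machine 1 never halts, so its index never appears; that is exactly why dovetailing reveals membership in 𝐾 but never reveals non-membership.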
Why won’t this process solve the Halting problem? If 𝑒 ∈ 𝐾 then dovetailing will
eventually find that out. But if 𝑒 ∉ 𝐾 then it will never reveal the non-membership.
Recall that a set of natural numbers is computable if its characteristic function
is computable. We are seeing another way to describe a set, listing its members.
Definition 1.13 gives the terminology: a function 𝑓 with domain N ‘enumerates’ its
range.
7.1 Definition A set of natural numbers is computably enumerable (or c.e.) if it
is effectively listable, that is, if it is the range of a computable function. (That
function may be total or it may be partial.) Alternate terms for the same thing
are: recursively enumerable (or r.e.), or semicomputable, or semidecidable.
Picture the stream 𝜙 ( 0) , 𝜙 ( 1) , 𝜙 ( 2) , . . . gradually filling out the set. (The
stream may contain repeats, the numbers may appear in some willy-nilly order,
not necessarily ascending, and perhaps some of the 𝜙 (𝑖) ’s diverge.)
7.2 Remark Here is a particularly interesting stream. Fix a mathematical topic such
as elementary number theory. Statements in that topic are strings of symbols and
we can give each a number (perhaps by writing that statement in Unicode and the
number is its binary encoding, prefixed with a 1 to avoid any ambiguity of leading
0’s in the binary). Set up a process that starts with the axioms for this topic and
does a breadth-first traversal of all logical derivations from those axioms. It might
first combine axiom 0 with axiom 1, and then next combine axiom 0 with axiom 2,
etc. In this way it generates a list of all of this theory’s possible proofs. Whenever
it finishes a proof, the process outputs the number of the final statement in the
derivation, the proved statement.
Suppose that we have a statement from this topic that we are interested in, such
as Goldbach’s conjecture that every even number is the sum of at most two primes.
We could watch the process as it enumerates the theorems. If the statement is
provable then its number will eventually appear.
7.3 Lemma The following are equivalent for a set of natural numbers.
(a) It is computably enumerable, that is, it is the range of a computable function.
(b) It is the range of a total computable function, or it is empty.
(c) It is the domain of a computable function.
Proof We will show that the first and second are equivalent. That the second and
third are equivalent is Exercise 7.32.
As usual, one of the two directions is easy. Here it is (b) implies (a). If the set 𝑆
is the range of a total computable function then it is the range of a computable
function. If 𝑆 is empty then it is the range of the computable function that never
converges.
Now for (a) implies (b). Assume that 𝑆 is computably enumerable so that it is
the range of a computable function 𝜙𝑒 (which may be non-total). If 𝜙𝑒 diverges for
all inputs then 𝑆 = ∅, which is one of the cases in (b).
In the other case, where 𝜙𝑒 does not diverge for all inputs, we will produce
a total computable 𝑓 whose range is 𝑆 . In this case there is an input 𝑦 where
𝜙𝑒 (𝑦)↓; let 𝑠 0 be 𝜙𝑒 (𝑦) . Define 𝑓 (𝑛) by: given 𝑛 ∈ N, run the computations of P𝑒
on inputs 0, 1, . . . 𝑛 , each for 𝑛 -many steps. Possibly some of these halt. Let 𝑓 (𝑛)
be the least 𝑘 where P𝑒 halts on some input 𝑖 within 𝑛 steps and outputs 𝑘 , and also
where 𝑘 ∉ { 𝑓 ( 0), 𝑓 ( 1), ... 𝑓 (𝑛 − 1) }. If no such 𝑘 exists then define 𝑓 (𝑛) = 𝑠 0 .
By the prior paragraph’s final sentence, 𝑓 is total. We must verify that the range
of 𝑓 is 𝑆 . For 𝑡 ∈ N, if 𝑡 ∉ 𝑆 then P𝑒 never outputs it and so 𝑡 is never defined
as 𝑓 (𝑛) . If 𝑡 ∈ 𝑆 then there must be an 𝑛 large enough that P𝑒 halts on some
input 𝑖 ≤ 𝑛 within 𝑛 steps and outputs 𝑡 . The number 𝑡 is then queued for output
by 𝑓 in the sense that it will be enumerated as, at most, 𝑓 (𝑛 + 𝑡) .
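The construction in this proof can be sketched in Python (the names are ours, and the partial function 𝜙𝑒 is modeled by a table giving, for each input where the machine halts, its step count and output, with None marking divergence).

```python
# Sketch of the proof's construction.  The partial enumerating function
# is modeled as a table: phi[i] = (steps, value) when the machine halts
# on input i after that many steps, and None when it diverges.
phi = {0: None, 1: (2, 7), 2: (5, 7), 3: (1, 4), 4: None}
s0 = 7   # a value known to be in the range, here phi(1)

def halts_within(i, n):
    return phi.get(i) is not None and phi[i][0] <= n

def f(n):
    """Total function with the same range as phi: simulate inputs
    0..n for n steps each, output the least value not yet produced,
    and fall back on s0 so that f is defined everywhere."""
    seen = {f(m) for m in range(n)}          # f(0), ..., f(n-1)
    fresh = sorted(phi[i][1] for i in range(n + 1)
                   if halts_within(i, n) and phi[i][1] not in seen)
    return fresh[0] if fresh else s0

# f is total, and its range fills out to the range of phi.
print(sorted({f(n) for n in range(10)}))     # → [4, 7]
```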
Thus, the collection of effectively listable sets is the same as the collection of
domains of computable functions. There is a standard notation for the latter.
7.4 Definition 𝑊𝑒 = {𝑥 𝜙𝑒 (𝑥)↓}
The contrast between computable and computably enumerable is that a set 𝑆 is
computable if there is a Turing machine that decides its membership, that inputs
a number 𝑥 and decides either ‘yes’ or ‘no’ whether 𝑥 ∈ 𝑆 . But with computably
enumerable, given some 𝑥 we can set up a machine to monitor the number stream
and if 𝑥 appears then this machine decides ‘yes’. However, it might never discover
‘no’. Restated, a set is computable if there is a Turing machine that recognizes
both members and nonmembers, while a set is computably enumerable if there is
a Turing machine that recognizes members.
7.5 Lemma (a) If a set is computable then it is computably enumerable.
(b) A set is computable if and only if both it and its complement are computably
enumerable.
Proof For (a) let 𝑆 ⊆ N be computable so that its characteristic function is the
computable function 𝜙 . We will enumerate the elements of 𝑆 . Begin by using 𝜙 to
test whether 0 ∈ 𝑆 , that is, whether 𝜙 ( 0) = 1. Then test whether 1 ∈ 𝑆 , etc. If this
sequence of tests ever finds a 𝑘 0 so that 𝜙 (𝑘 0 ) = 1, then set 𝑓 ( 0) = 𝑘 0 . After that,
iterate: find the next element of 𝑆 by testing whether 𝑘 0 + 1 ∈ 𝑆 , 𝑘 0 + 2 ∈ 𝑆 , . . .
and if this testing sequence ever halts with a 𝑘 1 then set 𝑓 ( 1) = 𝑘 1 . Clearly 𝑓 is a
computable function whose range is 𝑆 .
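That procedure, as a bounded Python sketch (the names are ours; a true enumeration is unending):

```python
def enumerate_set(chi, limit):
    """List the members of a computable set by testing 0, 1, 2, ...
    against its characteristic function chi.  A real enumeration never
    stops looking; limit bounds this sketch."""
    return [k for k in range(limit) if chi(k) == 1]

chi_even = lambda k: 1 if k % 2 == 0 else 0
print(enumerate_set(chi_even, 10))   # → [0, 2, 4, 6, 8]
```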
As to (b), suppose first that 𝑆 is computable. The complement 𝑆 c is also
computable because its characteristic function is 1𝑆 c = 1 − 1𝑆 . With that, item (a)
gives that both 𝑆 and 𝑆 c are computably enumerable.
For the converse, suppose that both 𝑆 and 𝑆 c are computably enumerable. Let
𝑆 be enumerated by the function 𝑔 and let 𝑆 c be enumerated by 𝑔ˆ. To show that
𝑆 is computable we will give an effective procedure that acts as its characteristic
II.7 Exercises
✓ 7.7 A question on the quiz asked you to define computably enumerable. A friend
says that they answered, “A set that can be enumerated by a Turing machine but
that is not computable.” Is that right?
7.8 Your study partner asks the group, “One computably enumerable set is the
empty set. But the empty set is not effectively listable, because you can’t list
nothing.” Where are they mis-thinking?
✓ 7.9 For each set, produce a function that enumerates it (a) N (b) the even numbers
(c) the perfect squares (d) the set { 5, 7, 11 }.
7.10 For each, produce a function that enumerates it (a) the prime numbers
(b) the natural numbers whose digits are in non-increasing order (e.g., 531 or
5331 but not 513).
7.11 Are there computably enumerable sets that are infinite? Finite? Empty? All
of the natural numbers?
7.12 One of these two is computable and the other is computably enumerable but
not computable. Which is which?
(a) {𝑒 P𝑒 halts on input 4 in less than twenty steps }
(b) {𝑒 P𝑒 halts on input 4 in more than twenty steps }
7.13 Which of these sets are decidable, which are semidecidable but not decidable,
and which are neither? Justify in one sentence. (a) The set of indices 𝑒 such that
P𝑒 takes more than 100 steps on input 7. (b) The set of indices 𝑒 such that P𝑒
takes less than 100 steps on input 7.
7.14 (IIS, IIT 2022) One of these statements is true. Which? (a) Every proper sub-
set of a computably enumerable set is computable. (b) If a set and its complement
are both computably enumerable then both are computable.
✓ 7.15 Someone online says, “every countable set 𝑆 is computably enumerable
because if 𝑓 : N → N has range 𝑆 then you have the enumeration of 𝑆 as 𝑓 ( 0) , 𝑓 ( 1) ,
. . .” Explain why this is wrong.
✓ 7.16 The set 𝐴5 = {𝑒 | 𝜙𝑒 ( 5)↓} is not computable. Show that it is computably
enumerable.
7.17 Show that the collection of computably enumerable sets is countable.
7.18 Every uncomputable set is infinite, since every finite set is computable. Is
every computably enumerable set infinite?
7.19 Short answer: for each set, state whether it is computable, computably
enumerable but not computable, or neither. (a) The set of indices 𝑒 of Turing
machines that contain an instruction starting with state 𝑞 4 . (b) The set of indices
of Turing machines that halt on input 4. (c) The set of indices of Turing machines
that halt on input 4 in fewer than 100 steps.
7.20 Show that the set {𝑒 | 𝜙𝑒 ( 2) = 4 } is computably enumerable.
7.21 Name three sets that are computably enumerable but not computable.
✓ 7.22 Let 𝐾0 = { ⟨𝑒, 𝑥⟩ | P𝑒 halts on input 𝑥 }. (a) Show that it is computably enu-
merable. (b) Show that its columns, the sets 𝐶𝑒 = {𝑥 | P𝑒 halts on input 𝑥 },
make up all of the computably enumerable sets.
7.23 We know that there are subsets of N that are not computable. Do the
computably enumerable sets make up the subsets that are not computable?
✓ 7.24 Show that the set Tot = {𝑒 | 𝜙𝑒 (𝑥)↓ for all 𝑥 } is not computable and not
computably enumerable. Hint: if this collection is computably enumerable then we
can get a table like the one that starts Section II.5 on the Halting problem.
7.25 Prove that the set {𝑒 | 𝜙𝑒 ( 3)↑} is not computably enumerable.
✓ 7.26 Prove that every infinite computably enumerable set has an infinite
computable subset.
7.27 Define the function steps by: steps (𝑒) is the minimal number of steps so
that Turing machine P𝑒 halts if started with input 𝑒 on its tape, or is undefined
if the machine never halts. (a) Argue that this function is partial computable.
(b) Argue that it is not total. (c) Prove that it has no total computable extension,
no total computable 𝑓 : N → N so that if steps (𝑒)↓ then steps (𝑒) = 𝑓 (𝑒) .
7.28 A set is computably enumerable in increasing order if there is a computable 𝑓
that is increasing, so that 𝑛 < 𝑚 implies 𝑓 (𝑛) < 𝑓 (𝑚) , and whose range is the
set. Assume that the set 𝑆 is infinite. Prove that 𝑆 is computable if and only if it is
computably enumerable in increasing order.
7.29 A set is co-computably enumerable if its complement is computably enu-
merable. Produce a set that is neither computably enumerable nor co-computably
enumerable.
7.30 (Compare this with the next exercise.) Computability is a property of sets so we
can consider its interaction with set operations. (a) Must a subset of a computable
set be computable? (b) Must the union of two computable sets be computable?
(c) Intersection? (d) Complement?
7.31 (Compare this with the prior exercise.) We can consider the interaction of
computable enumerability with set operations. (a) Must a subset of a computably
enumerable set be computably enumerable? (b) Must the union of two computably
enumerable sets be computably enumerable? (c) Intersection? (d) Complement?
7.32 Finish the proof of Lemma 7.3 by showing that the second and third items
are equivalent.
Section
II.8 Oracles
The problem of deciding whether a given machine halts is so hard that
it is unsolvable. Is this the absolutely hardest problem, or are there
ones that are even harder?
What does it mean to say that one problem is harder than another?
We have compared problem hardness already, for instance when we
considered the problem of whether a Turing machine halts on input 3.
There we proved that if we could solve the halts-on-3 problem then we
could solve the Halting problem. That is, we proved that halts-on-3 is
at least as hard as the Halting problem. So, the idea is that one problem
is at least as hard as a second one if solving the first would also give
a solution to the second.†
Under Church’s Thesis we interpret the unsolvability of the Halting
problem to say that no mechanism can answer all questions about
membership in 𝐾 . So if we want to answer questions about problems
that are related to 𝐾 then we need the answers to be supplied in some
way that isn’t an in-principle physically realizable discrete machine.‡
[Picture: Priestess of Delphi (Collier, 1891)]
Consequently, we posit an oracle that we attach to the Turing
machine and that acts as the characteristic function of a set. Thus, to see what we
could compute if we somehow were able to solve the Halting problem, we attach
† We can instead conceptualize that the first problem is at least as general as the second. An example is
that the problem of inputting a natural number and outputting its prime factors is at least as general as
the problem of inputting a natural and determining whether it is divisible by seven.
‡ Turing introduced oracles in his PhD thesis. He said, “We shall not go any further into the nature of
this oracle apart from saying that it cannot be a machine.”
a 𝐾 -oracle that answers questions of the form “Is 𝑥 ∈ 𝐾 ?” for any 𝑥 . This oracle is
a black box, meaning that we can’t open it to see how it works.†
[Diagram: the oracle machine. A white CPU box is attached to a black-box oracle;
inside the CPU a decision step asks “𝑥 ∈ oracle?” with Y and N branches, which in
code form is:
(if (oracle? x)
    (displayln "It is in the set")
    (displayln "It is not in the set")) ]
We can change the oracle without changing the program code: in the picture if
we swap out black boxes, exchanging an 𝑋 oracle for a 𝑌 oracle, then the white
CPU box is unchanged. Of course, the values returned by the oracle queries may
change, which may change the tape output when we run the two-box system. But
such a swap leaves the white hardware unchanged.
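The two-box picture can be mimicked in code. This is a Python sketch with invented names, where an oracle is simply a pluggable characteristic function.

```python
def machine(x, oracle):
    # The "white box": fixed program code that consults whatever
    # black-box oracle it happens to be connected to.
    if oracle(x):
        return "It is in the set"
    else:
        return "It is not in the set"

# Two interchangeable black boxes.
evens = lambda n: n % 2 == 0
perfect_squares = lambda n: round(n ** 0.5) ** 2 == n
```

Calling machine(6, evens) and machine(6, perfect_squares) gives different answers, but the body of machine is untouched by the swap.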
The rest of what we have already developed about Turing machines carries
over. In particular, each such machine — each CPU box — has an index. That index
is source-equivalent, meaning that from an index we can compute the machine
source and from the source we can find the index.
Therefore to fully specify such a computation, we must specify which machine
we are using, along with specifying which oracle. That explains the notations
for the white box, the oracle-ready Turing machine, P𝑒𝑋, and for the associated
functions, 𝜙𝑒𝑋.
8.1 Definition Let 𝑋 be a set. If the characteristic function of a set 𝑆 can be
computed relative to 𝑋 , that is, if 1𝑆 = 𝜙𝑒𝑋 for some 𝑒 , then we say that 𝑆 is
computable from the oracle 𝑋 , or is 𝑋 -computable, or is computable relative to 𝑋 ,
or that 𝑆 reduces to 𝑋 , or is Turing reducible to 𝑋 , denoted 𝑆 ≤𝑇 𝑋 .
† Opening it would let out the magic smoke, the stuff inside of an electronic component that makes it
work. After all, once the smoke gets out, the component no longer works.
‡ A particular computation relative to an oracle might use one such query, or more than one, or none
at all.
𝜓 (𝑥, 𝑦) = 42 if 𝜙𝑥 (𝑥)↓, and ↑ otherwise
[Flowcharts. Left: Start → Read 𝑥 , 𝑦 → Run P𝑥 on 𝑥 → Print 42 → End.
Right: Start → Read 𝑦 → Run P𝑥 on 𝑥 → Print 42 → End.]
With that we can build the oracle machine. The machine charted below is
P𝑒𝑋. It uses 𝑒 0 from the prior paragraph. If we connect it to an 𝐴 oracle then it
computes the characteristic function of 𝐾 , by the prior paragraph’s final sentence.
[Flowchart: Start → Read 𝑘 → “𝑠 (𝑒0, 𝑘 ) ∈ oracle?” → Y: Print 1, N: Print 0 → End]
We have a kind of ordering, where some sets precede others in the sense that
when 𝑆 ≤𝑇 𝑋 then 𝑆 is before 𝑋 .† The intuition is that sets that are larger in this
ordering “contain more information” or are “computationally harder” than the
ones that precede them. We next show that there are sets that come at the very
beginning of this ordering, sets that are less than or equal to every other set.
8.4 Lemma If a set 𝑌 ⊆ N is computable then for any 𝑋 ⊆ N at all, 𝑌 ≤𝑇 𝑋 . In
particular, ∅ ≤𝑇 𝑋 and N ≤𝑇 𝑋 . Further, a set 𝑍 ⊆ N is computable if and only if
it is reducible to the empty set, 𝑍 ≤𝑇 ∅, or to any computable set.
Proof Assume that 𝑌 is computable, so that its characteristic function is computable.
That characteristic function can be computed relative to 𝑋 using an oracle Turing
machine, simply by never referring to the oracle, never asking it any questions.
As to the second statement, the prior paragraph proves that if a set is computable
then it is reducible to a computable set. For the other half of the double implication,
suppose that the characteristic function of 𝑍 can be computed by reference to a
computable oracle, so that 1𝑍 = 𝜙𝑒𝑋 where 1𝑋 is computable. Then replacing
oracle calls in the machine P𝑒𝑋 with direct computations of 1𝑋 will compute 1𝑍
without reference to an oracle. Hence, 𝑍 is computable.
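The proof's two moves can be mirrored in code. In this Python sketch (all names are illustrative) an oracle machine witnesses 𝑌 ≤𝑇 𝑋 by never querying its oracle, and a computable oracle is inlined away.

```python
def never_queries(x, oracle):
    # Computes the characteristic function of Y = the even numbers
    # while ignoring the attached oracle, so Y <=_T X for every X.
    return 1 if x % 2 == 0 else 0

def inline_oracle(oracle_machine, char_fn):
    # When the oracle set X is computable, replace every oracle call
    # with a direct computation of 1_X, yielding an oracle-free machine.
    return lambda x: oracle_machine(x, char_fn)

# Inlining a computable oracle (here, the multiples of three) leaves a
# machine that needs no oracle at all.
plain_machine = inline_oracle(never_queries, lambda n: 1 if n % 3 == 0 else 0)
```

Here plain_machine still computes the characteristic function of the even numbers, with no oracle attached.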
8.5 Definition Two sets 𝐴, 𝐵 are Turing equivalent or 𝑇 -equivalent, denoted 𝐴 ≡𝑇 𝐵 ,
if both 𝐴 ≤𝑇 𝐵 and 𝐵 ≤𝑇 𝐴.
Showing that two sets are 𝑇 -equivalent, that they are inter-computable, shows
that there is an underlying unity between the two seemingly-different problems,
that they are restatements of the same problem.
8.6 Example Any two computable sets are Turing equivalent, by Lemma 8.4.
8.7 Example In Example 8.2 we proved that 𝐾 ≤𝑇 𝐴 where 𝐴 = {𝑒 | P𝑒 halts on 3 }.
Exercise 8.22 uses a very similar argument to show that 𝐴 ≤𝑇 𝐾 . So 𝐴 ≡𝑇 𝐾 .
Of course, the Halting problem asks whether P𝑒 halts on input 𝑒 . A person may
perceive that a more natural problem is deciding whether P𝑒 halts on input 𝑥 .
8.8 Definition 𝐾0 = { ⟨𝑒, 𝑥⟩ | P𝑒 halts on input 𝑥 }
We will show that the two are Turing equivalent, that with access to solutions
to one problem we can compute solutions to the other.‡
8.9 Theorem 𝐾 ≡𝑇 𝐾0 .
Proof For 𝐾 ≤𝑇 𝐾0 , suppose that we have access to a 𝐾0 -oracle. Then this
machine, if connected to that oracle, will have the input/output behavior that is
the characteristic function of 𝐾 .
†
This ordering between sets turns out not to be linear. It is more like the ‘divides’ relation between
integers, where 2 divides 6 and 2 divides 10, but 6 and 10 are not related. ‡ Thus our choice of 𝐾 as
our touchstone is just a matter of convenience and convention. We use it because it is the standard in the
literature and because it has some technical advantages, including that it falls out of the diagonalization
development that we did at the start of the Halting problem section.
[Flowchart: Start → Read 𝑥 → “⟨𝑥, 𝑥⟩ ∈ oracle?” → Y: Print 1, N: Print 0 → End]
What remains is the 𝐾0 ≤𝑇 𝐾 half. Consider the flowchart on the left below.
It halts for the input triple ⟨𝑒, 𝑥, 𝑦⟩ if and only if ⟨𝑒, 𝑥⟩ ∈ 𝐾0 . By Church’s Thesis
there is a Turing machine implementing it; let that machine have index 𝑒 0 .
[Flowcharts. Left: Start → Read 𝑒, 𝑥, 𝑦 → Simulate P𝑒 on input 𝑥 → Print 42 → End.
Right: Start → Read 𝑦 → Simulate P𝑒 on input 𝑥 → Print 42 → End.]
Get the flowchart on the right by applying the s-m-n theorem to parametrize 𝑒
and 𝑥 . That is, on the right is a sketch of P𝑠 (𝑒0,𝑒,𝑥 ) .
Now for the oracle Turing machine. Given a pair ⟨𝑒, 𝑥⟩ , the right-side machine
above, P𝑠 (𝑒0,𝑒,𝑥 ) , behaves the same on all inputs 𝑦 , namely, it either halts on all
inputs or fails to halt on all inputs, depending on whether 𝜙𝑒 (𝑥)↓. In particular,
P𝑠 (𝑒0,𝑒,𝑥 ) halts on input 𝑠 (𝑒 0, 𝑒, 𝑥) , so that 𝑠 (𝑒 0, 𝑒, 𝑥) ∈ 𝐾 , if and only if 𝜙𝑒 (𝑥)↓.
[Flowchart: Start → Read 𝑒 , 𝑥 → “𝑠 (𝑒0, 𝑒, 𝑥 ) ∈ oracle?” → Y: Print 1, N: Print 0 → End]
Thus, given oracle 𝐾 the machine above acts as the characteristic function of 𝐾0 .
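The first half of the proof is a single oracle query. This Python sketch runs that query over an invented toy universe; the predicate standing in for the 𝐾0 oracle is made up purely for illustration.

```python
def decide_K(e, K0_oracle):
    # K = { e : P_e halts on input e }, so ask the K0 oracle about <e, e>.
    return 1 if K0_oracle((e, e)) else 0

# Toy stand-in for the K0 oracle: pretend P_e halts on input x exactly
# when e * x > 3.  (Invented; a real K0 oracle is not computable.)
toy_K0 = lambda pair: pair[0] * pair[1] > 3
```

In the toy universe decide_K(2, toy_K0) is 1 and decide_K(1, toy_K0) is 0; against a genuine 𝐾0 oracle the same one-query program would decide 𝐾.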
8.10 Corollary The Halting problem is at least as hard as any computably enumerable
problem: 𝑊𝑒 ≤𝑇 𝐾 for all 𝑒 ∈ N.
Proof By Lemma 7.3 the computably enumerable sets are the columns of 𝐾0 , as
𝑊𝑒 = {𝑦 | 𝜙𝑒 (𝑦)↓} = {𝑦 | ⟨𝑒, 𝑦⟩ ∈ 𝐾0 }. So 𝑊𝑒 ≤𝑇 𝐾0 ≡𝑇 𝐾 .
Because every computably enumerable set is Turing-computable from 𝐾 , we
say that 𝐾 is complete among the computably enumerable sets.
Jumping We are ranking sets by how hard they are to compute. This illustrates.
The dots are sets 𝑆 ⊂ N, where in the 𝑆 ≤𝑇 𝑋 ordering, 𝑋 is drawn higher than 𝑆 .
The computable sets are at the bottom, grouped in with the empty set. The sets
that are Turing equivalent to 𝐾 are grouped together at the top of the computably
enumerable sets.
[Diagram: dots for the sets, with the computable sets at the bottom, the c.e. sets
above them topped by the sets Turing equivalent to 𝐾 , and 𝐾 𝐾 higher still.]
We finish this section by describing a way, given a set 𝑆 , to jump further up the
order than 𝑆 . (The set 𝐾 𝐾 illustrates the jump of 𝐾 .)
8.11 Theorem Where the relativized Halting problem is to determine membership
in 𝐾 𝐾 = {𝑥 | 𝜙𝑥𝐾 (𝑥)↓}, its solution is not computable from a 𝐾 oracle. That is,
there is no index 𝑒 ∈ N such that 𝜙𝑒𝐾 is the characteristic function of 𝐾 𝐾.
Proof We will adapt the proof that the Halting problem is unsolvable. Assume
otherwise, that there is a computation relative to a 𝐾 oracle, P𝑒𝐾0, that acts as the
characteristic function of 𝐾 𝐾.
𝜙𝑒0𝐾 (𝑥) = 1𝐾 𝐾 (𝑥) = 1 if 𝜙𝑥𝐾 (𝑥)↓, and 0 otherwise (∗)
Consider the function defined below, along with its flowchart. Inside
the decision box the computation uses a 𝐾 oracle. (Rather than describe it as an
oracle-ready chart with a general oracle 𝑋 and then later say that we give it 𝐾 as 𝑋 ,
we’ve just written in the 𝐾 .) The first equality in (∗) makes 𝜙𝑒0𝐾 a total function.
𝑓 𝐾(𝑥) = 42 if 𝜙𝑥𝐾 (𝑥)↑, and ↑ if 𝜙𝑥𝐾 (𝑥)↓
[Flowchart: Start → Read 𝑥 → “𝜙𝑒0𝐾 (𝑥 ) = 0 ?” → Y: Print 42 → End, N: Infinite loop]
Since 𝑓 is computable, it has an index. Let that index be 𝑒ˆ, so that 𝑓 𝐾 = 𝜙𝑒ˆ𝐾.
Now feed 𝑓 its own index — that is, consider 𝑓 𝐾(𝑒ˆ) = 𝜙𝑒ˆ𝐾(𝑒ˆ) . If that diverges
then we follow the first clause in the definition of 𝑓 , which gives that 𝑓 𝐾 (𝑒ˆ) ↓,
which is a contradiction. If instead 𝑓 𝐾 (𝑒ˆ) converges then the second clause in
the definition of 𝑓 gives that 𝑓 𝐾 (𝑒ˆ)↑, also a contradiction. Either way, assuming
that (∗) can be computed relative to a 𝐾 oracle gives an impossibility.
8.12 Theorem For any set 𝑆 , the relativized Halting problem for 𝑆 is to determine
membership in 𝐾 𝑆 = {𝑥 | 𝜙𝑥𝑆 (𝑥)↓}. Every set is reducible to its relativized Halting
problem: 𝑆 ≤𝑇 𝐾 𝑆.
[Flowcharts. Left: Start → Read 𝑥, 𝑦 → “𝑥 ∈ oracle?” → Y: Print 42 → End, N: Loop.
Right, the machine P𝑠 (𝑒0,𝑥 )𝑋: Start → Read 𝑦 → “𝑥 ∈ oracle?” → Y: Print 42 → End, N: Loop.]
The machine P𝑠 (𝑒0,𝑥 )𝑋 halts for any input 𝑦 if and only if 𝑥 is a member of the
oracle. Taking the oracle to be 𝑆 and 𝑦 to be 𝑠 (𝑒 0, 𝑥) gives that 𝑥 ∈ 𝑆 if and only if
𝜙𝑠𝑆(𝑒0,𝑥 ) (𝑠 (𝑒 0, 𝑥))↓, which in turn holds if and only if 𝑠 (𝑒 0, 𝑥) ∈ 𝐾 𝑆. So if 𝐾 𝑆 is the
oracle for the following machine then the machine acts as the characteristic
function of 𝑆 , giving 𝑆 ≤𝑇 𝐾 𝑆.
[Flowchart: Start → Read 𝑥 → “𝑠 (𝑒0 , 𝑥 ) ∈ oracle?” → Y: Print 1, N: Print 0 → End]
II.8 Exercises
Recall that a Turing machine is a decider for a set if it computes the characteristic
function of that set.
8.14 How do you answer your friend? “An oracle machine is a Turing machine with
a black box oracle that is able to decide certain problems in a single operation. As
the oracle you can even use undecidable problems, such as the Halting problem.
But isn’t assuming the existence of a machine which can decide the Halting problem
. . . problematic?”
8.15 Both oracles and deciders take in a number and return 0 or 1, giving whether
that number is in the set. What’s the difference?
✓ 8.16 Your friend says to the professor, “Oracle machines are not real so why talk
about them?” What should the professor say?
8.17 Your classmate says they answered a quiz question to define an oracle with,
“A set to solve unsolvable problems.” Give them a gentle critique.
✓ 8.18 Is there an oracle for every problem? For every problem, is there an oracle?
8.19 A person in your class asks, “Oracles can solve unsolvable problems, right?
And 𝐾 𝐾 is unsolvable. So an oracle like the 𝐾 oracle should solve it.” Help your
prof out here; suggest a response.
✓ 8.20 Suppose that the set 𝐴 is Turing-reducible to the set 𝐵 . Which of these are
true?
(a) A decider for 𝐴 can be used to decide 𝐵 .
(b) If 𝐴 is computable then 𝐵 is computable also.
(c) If 𝐴 is uncomputable then 𝐵 is uncomputable too.
8.21 Where 𝐵 ⊆ N is a set, let 2𝐵 = { 2𝑏 | 𝑏 ∈ 𝐵 }. We will show that 𝐵 ≡𝑇 2𝐵 .
(a) Give a flowchart sketching a machine that, given access to oracle 2𝐵 , will
act as the characteristic function of 𝐵 . That is, this machine witnesses that
𝐵 ≤𝑇 2𝐵 .
(b) Sketch a machine that, given access to oracle 𝐵 , will act as the characteristic
function of 2𝐵 . This machine witnesses that 2𝐵 ≤𝑇 𝐵 .
✓ 8.22 Where 𝐴 = {𝑒 | P𝑒 halts on 3 }, show that 𝐴 ≤𝑇 𝐾 . Hint: this machine
satisfies that 𝜙𝑖 (𝑖)↓ if and only if 𝜙𝑥 ( 3)↓.
[Flowchart: Start → Read 𝑦 → Run P𝑥 on 3 → End]
✓ 8.23 The set 𝑆 = {𝑒 | 𝜙𝑒 ( 3)↓ and 𝜙𝑒 ( 4)↓} is not computable. Sketch how to
compute it using a 𝐾 oracle. That is, sketch an oracle machine that shows 𝑆 ≤𝑇 𝐾 .
Hint: follow Example 8.2.
✓ 8.24 For the set 𝑆 = {𝑒 | 𝜙𝑒 ( 3)↓}, show that 𝑆 ≤𝑇 𝐾0 .
✓ 8.25 Show that 𝐾 ≤𝑇 {𝑥 | 𝜙𝑥 (𝑦) = 2𝑦 for all input 𝑦 }.
8.26 Consider the set {𝑥 | 𝜙𝑥 ( 𝑗) = 7 for some 𝑗 }.
(a) Show that it is not computable using Rice’s theorem.
(b) Sketch how to compute it using a 𝐾 oracle.
8.27 Let 𝑆 = {𝑥 ∈ N | 𝜙𝑥 ( 3)↓ and 𝜙 2𝑥 ( 3)↓ and 𝜙𝑥 ( 3) = 𝜙 2𝑥 ( 3) }. Show 𝑆 ≤𝑇 𝐾
by producing a way to answer questions about membership in 𝑆 from a 𝐾 oracle.
8.28 Recall that a computable function 𝜙 is total if 𝜙 (𝑦)↓ for all 𝑦 ∈ N. The set of
indices of total computable functions is Tot. Show that 𝐾 ≤𝑇 Tot.
Section
II.9 Fixed point theorem
Recall our first example of diagonalization, the proof that the set of real numbers is
not countable, on page 74. We assumed that there is an onto function 𝑓 : N → R
and considered its inputs and outputs, as illustrated in this table.
[Table: the outputs 𝑓 ( 0) , 𝑓 ( 1) , ... listed with their decimal expansions]
Let row 𝑛 ’s decimal representation be 𝑑𝑛 = 𝑑ˆ𝑛 .𝑑𝑛,0 𝑑𝑛,1 𝑑𝑛,2 ... Go down the diagonal
to the right of the decimal point to get the sequence of digits 𝑑 0,0, 𝑑 1,1, 𝑑 2,2, ...,
which in the illustration above is 3, 1, 4, 5, ... Using that, construct a number 𝑧 =
0.𝑧 0 𝑧 1 𝑧 2 ... by making its 𝑛 -th decimal place 𝑧𝑛 be something other than 𝑑𝑛,𝑛 .
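The construction of 𝑧 is effective, as this Python sketch shows over a finite table of digit rows. The table is invented for illustration, and choosing digits only from {5, 6} sidesteps the 0.4999... = 0.5000... representation issue.

```python
def diagonal_digits(rows):
    # rows[n] holds the digits of f(n) after the decimal point.
    # Make the n-th digit of z differ from the n-th digit of row n.
    return [5 if rows[n][n] != 5 else 6 for n in range(len(rows))]

# The diagonal digits here are 3, 1, 4, so z begins 0.555...
table = [[3, 7, 1],
         [0, 1, 8],
         [2, 2, 4]]
```

By construction the resulting number differs from every row at some decimal place, so it is not in the table.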
When diagonalization fails What if the transformation is such that the diagonal
is a row, that 𝑧 = 𝑓 (𝑛 0 ) ? Then the array member where the diagonal crosses
that row is unchanged by the transformation, 𝑑𝑛0,𝑛0 = 𝑡 (𝑑𝑛0,𝑛0 ) . Conclusion: if
diagonalization fails then the transformation has a fixed point.
We will apply this to sequences of computable functions, 𝜙𝑖 0 , 𝜙𝑖 1 , 𝜙𝑖 2 ... We
are interested in effectiveness so we take the indices 𝑖 0, 𝑖 1, 𝑖 2 ... to be computable,
meaning that for some 𝑒 we have 𝑖 0 = 𝜙𝑒 ( 0) , 𝑖 1 = 𝜙𝑒 ( 1) , 𝑖 2 = 𝜙𝑒 ( 2) , etc. In short,
a computable sequence of computable functions has this form.
Sequence term
              𝑛 = 0       𝑛 = 1       𝑛 = 2       𝑛 = 3     ...
  𝑒 = 0    𝜙𝜙 0 ( 0 )   𝜙𝜙 0 ( 1 )   𝜙𝜙 0 ( 2 )   𝜙𝜙 0 ( 3 )   ...
  𝑒 = 1    𝜙𝜙 1 ( 0 )   𝜙𝜙 1 ( 1 )   𝜙𝜙 1 ( 2 )   𝜙𝜙 1 ( 3 )   ...     (∗)
  𝑒 = 2    𝜙𝜙 2 ( 0 )   𝜙𝜙 2 ( 1 )   𝜙𝜙 2 ( 2 )   𝜙𝜙 2 ( 3 )   ...
  𝑒 = 3    𝜙𝜙 3 ( 0 )   𝜙𝜙 3 ( 1 )   𝜙𝜙 3 ( 2 )   𝜙𝜙 3 ( 3 )   ...
Each entry 𝜙𝜙𝑒 (𝑛) is a computable function. If the index computation 𝜙𝑒 (𝑛)
diverges then the function as a whole diverges.
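A toy model can make this concrete. In this Python sketch, which is only an illustration, an "index" is a string of Python source for a one-argument function and phi plays the role of 𝑒 ↦→ 𝜙𝑒; real indices are numbers coding Turing machine instruction sets, not source strings.

```python
def phi(e):
    # Interpret the "index" e, a string of source defining a function f,
    # and return the function it denotes.
    env = {}
    exec(e, env)
    return env["f"]

# phi_e computes indices: phi_e(n) is an index of the function x -> x + n,
# so phi_{phi_e(0)}, phi_{phi_e(1)}, ... is a computable sequence of
# computable functions.
e = "def f(n): return 'def f(x): return x + %d' % n"
sequence_of_indices = phi(e)
third_function = phi(sequence_of_indices(3))   # behaves as x -> x + 3
```

The point of the model is the two levels: sequence_of_indices outputs names of functions, while phi turns a name into the function it names.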
As to the transformation, the natural one is this, for the computable function 𝑓 :
𝜙𝑥 ↦−→ 𝜙 𝑓 (𝑥 ) under 𝑡𝑓 , so that 𝜙𝜙𝑖 ( 𝑗 ) ↦−→ 𝜙 𝑓 (𝜙𝑖 ( 𝑗 ) )
family, 𝜙𝑠 (𝑒0,𝑛) , computes the 𝑛 -th function on the diagonal of the array (∗) above,
𝜙𝜙 0 ( 0 ) , 𝜙𝜙 1 ( 1 ) , 𝜙𝜙 2 ( 2 ) ...
𝜙 𝑓 𝑔 (𝑛) (𝑥) = 𝜙 𝑓 (𝜙𝑛 (𝑛) ) (𝑥) if 𝜙𝑛 (𝑛)↓, and ↑ otherwise
[Flowchart: Start → Read 𝑥 → Run P𝑛 on 𝑛 → With the result 𝑤 , run P 𝑓 (𝑤) on 𝑥 → End]
𝜙𝑠 (𝑒0,𝑚) (𝑥) = 42 if 𝑥 = 𝑚, and ↑ otherwise
[Flowcharts. Left: Start → Read 𝑚, 𝑥 → “𝑥 = 𝑚 ?” → Y: Print 42 → End, N: Loop.
Right: Start → Read 𝑥 → “𝑥 = 𝑚 ?” → Y: Print 42 → End, N: Loop.]
[Flowcharts. Left: Start → Read 𝑥 , 𝑦 → Print 𝑥 → End.
Right: Start → Read 𝑦 → Print 𝑥 → End.]
Discussion The Fixed Point Theorem and its proof are often considered mysterious,
or at any rate obscure. Here we will develop a point about the role of naming in
the result.
Compare the sentence Atlantis is a mythical city with There are two t’s in ‘Atlantis’.
In the first we say that ‘Atlantis’ is used because it points to something, it has a value,
it names something. In the second ‘Atlantis’ is not referring to something — its value
is itself — so we say that it is mentioned.† This is the use-mention distinction, that
we are using the word on two different levels.
A version of this happens in computer programming. See the C language code
below. There, x and y are variables. If these were ordinary variables then the
compiler would associate them with a particular memory cell. For instance, if an
ordinary variable a were associated with cell 122 then the statement a = 5 would
result in the value 5 being stored in that cell. Thus a is a name for the cell.
But the second line’s asterisk means that x and y are not
ordinary variables, they are pointers, which are associated with
a cell but have some additional implications. The four vertical
arrays illustrate by showing a machine’s memory cells over time.
They imagine that the compiler associates x with register 123
and y with 124. The first array has that cell 123 holds the
number 901 and cell 124 is 902.
Because these are pointers, we have declared to the compiler
that we are interested in the contents of the memory cells that
they point to: cell 123 is itself a name for location 901, and
124 names 902.
[Comic: courtesy xkcd.com]
The second vertical array illustrates, showing the effect of running
the *x = 42 statement. The system does not put 42 into 123, rather it puts 42
into 901. Next, with y = x the system sets the cell named by y to point to the
same address as x’s cell, address 901. Finally, the last line puts 13 where y points,
which is at this moment the same cell to which x points.
void main() {
    int *x, *y;
    x = malloc(sizeof(int));
    y = malloc(sizeof(int));
    *x = 42;
    y = x;
    *y = 13;
}

Address   initially   after *x = 42   after y = x   after *y = 13
  123        901           901            901            901
  124        902           902            901            901
  901         -             42             42             13
  902         -             -              -              -
Here, as with ‘Atlantis’, x and y are being used on two different levels. One is
that x refers to the contents of register 123, so it names 123. The other level is
that the system is set up to refer to the contents of the contents, that is, to what’s
in address 901. On this level, x and y are names for names.
As to the role played by names in the Fixed Point Theorem, recall the Padding
Lemma, Lemma 2.18, that every computable function has infinitely many indices.
† We see this distinction in programming books. In the sentence, “The number of players is players”
the first ‘players’ refers to people while the second is a program variable. The typewriter font helps
with the distinction. Similarly in this book we use italic for variables such as 𝑎 , which have a value, and
typewriter for characters such as a, which are a value.
So it is easy for a computable function to have two different names. We see this
in Theorem 9.1, where the conclusion that 𝜙𝑘 = 𝜙 𝑓 (𝑘 ) does not say that the two
indices are equal. Rather it says that they describe machines that give rise to the
same input/output relationship.
Another example is that in the proof 𝑔(𝑛) is this.
[Display: the definition of 𝑔(𝑛) from the proof]
So 𝑔(𝑛) , 𝑠 (𝑒 0, 𝑛) , and 𝜙𝑛 (𝑛) are names for the same function. Again, equality of
the named functions does not imply that the names are equal.
Informally, what 𝑔(𝑛) names is the procedure, “Given input 𝑥 , run P𝑛 on input 𝑛
and if it halts with output 𝑤 then run P𝑤 on input 𝑥 .” Shorter: “Produce 𝜙𝑛 (𝑛)
and then do 𝜙𝑛 (𝑛) .” So here also we see the use-mention distinction.
One way in which this distinction between what is named and the name itself
is important is that regardless of whether 𝜙𝑛 (𝑛) converges, we can nonetheless
compute the index 𝑔(𝑛) and from it the instruction set P𝑔 (𝑛) . There is an analogy
here with ‘Atlantis’ — even though the referred-to city doesn’t exist we can still
sensibly assert things about its name.
In summary, the Fixed Point Theorem is deep, showing that surprising and
interesting behaviors occur in any sufficiently powerful computation system.
II.9 Exercises
9.6 Your friend asks you about the proof of the Fixed Point Theorem, Theorem 9.1.
“The last line says 𝜙𝑔 (𝑣) = 𝜙𝜙 𝑣 (𝑣) ; isn’t this just saying that 𝑔(𝑣) = 𝜙 𝑣 (𝑣) ? Why the
circumlocution?” What can you say?
✓ 9.7 Show each. (a) There is an index 𝑒 such that 𝜙𝑒 = 𝜙𝑒+7 . (b) There is an 𝑒
such that 𝜙𝑒 = 𝜙 2𝑒 .
9.8 What conclusion can you draw by applying the Fixed Point Theorem to the
adds-five function 𝑥 ↦→ 𝑥 + 5? Generalize.
9.9 What conclusion can you draw about acceptable enumerations of Turing
machines by applying the Fixed Point Theorem to each of these? (a) The tripling
function 𝑥 ↦→ 3𝑥 . (b) The squaring function 𝑥 ↦→ 𝑥 2 . (c) The function that gives
0 except for 𝑥 = 5, when it gives 1. (d) The constant function 𝑥 ↦→ 42.
✓ 9.10 We will prove that there is an 𝑚 such that 𝑊𝑚 = {𝑥 | 𝜙𝑚 (𝑥)↓} = {𝑚 2 }.
(a) Produce this uniformly computable family of functions.
𝜙𝑠 (𝑒0,𝑥 ) (𝑦) = 42 if 𝑦 = 𝑥 2, and ↑ otherwise
(b) Observe that 𝑒 0 is fixed so that 𝑠 (𝑒 0, 𝑥) is a function of one variable only, and
call that function 𝑔 : N → N.
ℎ(𝑥) = 𝑒 0 if 𝑥 ∈ 𝐹 , and 𝑓 (𝑥) otherwise
Show that ℎ has no fixed point, contradicting the Fixed Point theorem.
Extra
II.A Hilbert’s Hotel
II.A Exercises
A.1 Imagine that the hotel is empty. A hundred buses arrive, where bus 𝐵𝑖
contains passengers 𝑏𝑖,0 , 𝑏𝑖,1 , etc. Give a scheme for putting them in rooms.
A.2 Give a formula assigning a room to each person from the infinite bus convoy.
A.3 The hotel builds a parking lot. Each floor 𝐹𝑖 has infinitely many spaces 𝑓𝑖,0 ,
𝑓𝑖,1 , . . . And, no surprise, there are infinitely many floors 𝐹 0 , 𝐹 1 , . . . One day
when the hotel is empty a fleet of buses arrives, one per parking space, each with
infinitely many people. Give a way to accommodate all these people.
A.4 The management is irked that this hotel cannot fit all of the real numbers. So
they announce plans for a new hotel, with a room for each 𝑟 ∈ R. Can they now
cover every possible set of guests?
†
Alas, the infinite hotel does not now exist. The guest in room 0 said that the guest from room 1 would
cover both of their bills. The guest from room 1 said yes, but in addition the guest from room 2 had
agreed to pay for all three rooms. Room 2 said that room 3 would pay, etc. So Hilbert’s Hotel made no
money despite having infinitely many rooms, or perhaps because of it.
Extra
II.B Unsolvability in intellectual culture
Unsolvability results such as the Halting problem are about limits. Interpreted
in the light of Church’s Thesis, they say that there are things that we cannot do.
These results had an impact on the culture of mathematics but they also had an
impact on the wider intellectual world.
The discussion here is in the context of the history of European intellectual
culture, the context in which early Theory of Computation results appeared. A
broader view is beyond our scope.
With Napoleon’s downfall in the early 1800’s, many
people in Europe felt a swing back to a sense of order,
optimism, and progress. For example, in the history
of Turing’s native England, Queen Victoria’s reign from
1837 to 1901 seemed to many English commentators to
be an extended period of prosperity and peace. Across
Europe, many people perceived that the natural world
was being tamed with science and engineering — witness
the introduction of steam railways in 1825, the opening
of the Suez Canal in 1869, and the invention of the
electric light in 1879.†
[Picture: Queen Victoria opens the Great Exhibition, 1851]
In science this optimism was captured by the physicist
A A Michelson, who wrote in 1899, “The more important fundamental laws and
facts of physical science have all been discovered, and these are now so firmly
established that the possibility of their ever being supplanted in consequence of
new discoveries is exceedingly remote.”
The twentieth century physicist R Feynman likened science to
working out nature’s rules, “to try to understand nature is to imagine
that the gods are playing some great game like chess. . . . And you
don’t know the rules of the game, but you’re allowed to look at the
board from time to time, in a little corner, perhaps. And from these
observations, you try to figure out what the rules are of the game.”
Around the year 1900 many observers thought that we basically had
got the rules and that although there might remain a couple of obscure
things like castling, soon enough those would be done also.
[Portrait: David Hilbert, 1862–1943]
In Mathematics, this view was most famously voiced in an address
given by Hilbert in 1930, “We must not believe those, who today, with philosophical
bearing and deliberative tone, prophesy the fall of culture and accept the
ignorabimus. For us there is no ignorabimus, and in my opinion none whatever in
natural science. In opposition to the foolish ignorabimus our slogan shall be: We
†
This is not to say that the perception was justified. Disease and poverty were rampant, imperialism
ruined millions of lives around the world, for much of the time the horrors of industrial slavery in the
US south went unchecked, and Europe was hardly an oasis of calm, with for instance the revolutions of
1848. Nonetheless the general feeling included a sense of progress, of winning.
must know — we will know.” (‘Ignorabimus’ means ‘that which we must be forever
ignorant of ’ or ‘that thing which we will never fully penetrate’.) There was of
course a range of opinion but the zeitgeist was that we could expect that any
question would be settled, and perhaps soon.†
But starting in the early 1900’s, that changed. Exhibit A is the
picture to the right. That the modern mastery of mechanisms can have
terrible effect on human bodies became apparent to everyone during
World War I, 1914–1918. Ten million military men died. Overall,
seventeen million people died. With universal conscription, probably
the men in this picture did not want to be here. Probably they were
killed by someone who also did not want to be there, who never knew
that he killed them, and who simply entered coordinates into a firing
mechanism. For people at those coordinates, it didn’t matter how brave
they were, or how strong, or how right was their cause — they died.
[Picture: World War I trench dead]
The zeitgeist shifted: Pandora’s box was now opened and the world
had become not at all ordered, reasoned, or sensible.
At something like the same time in science, Michelson’s assertion that physics
was a solved problem was destroyed by the discovery of radiation. This brought in
quantum theory, that has at its heart randomness, that included the uncertainty
principle, and that led to the atom bomb.
With Einstein we see the cultural shift directly. After experiments during a solar
eclipse in 1919 provided strong support for his theories, he became an overnight
celebrity. He was seen as having changed our view of the universe from Newtonian
clockwork to one where “everything is relative.” His work showed that the universe
has limits and that old certainties break down: nothing can travel faster than light
and even the commonsense idea of two things happening at the same instant falls
apart.
There were many reflections of this loss of certainty. For
example, the generation of writers and artists who came of age
in World War I — including Fitzgerald, Hemingway, and Stein —
became known as the Lost Generation. They expressed their
experience through themes of alienation, isolation, and dismay.
In music, composers such as Debussy and Mahler broke with
the traditional forms in ways that were often hard for listeners —
Stravinsky’s Rite of Spring caused a near riot at its premiere in
1913. As for visual arts, the painting here shows the same themes.
[Painting: S Dali’s 1931 Persistence of Memory]
In mathematics, much the same inversion of the standing
order happened in 1930 with K Gödel’s announcement of the Incompleteness
† Below we will cite some things as turning points that occur before 1930; how can that be? For one
thing, it is typical for cultural shifts to have muddled timelines. For another, this is Hilbert’s
retirement address, so we can reasonably take his as a lagging view. Finally, in mathematics the shift
occurred later than in the general culture. We mark that shift with the announcement of Gödel’s
Incompleteness Theorem, discussed below. That announcement came at the same meeting as Hilbert’s
speech, on the day before it. Gödel was in the audience for Hilbert’s address and during it whispered to
O Taussky-Todd, “He doesn’t get it.”
Theorem. This says that if we fix a sufficiently strong formal system such as the
elementary theory of N with addition and multiplication then there are statements
that, while true in the system, cannot be proved in that system.†
This statement of hard limits seemed to many to be especially
striking in mathematics, which traditionally held the place as
the most solid of knowledge. For example, I Kant said, “I assert
that in any particular natural science, one encounters genuine
scientific substance only to the extent that mathematics is present.”
This is all the more impactful as Gödel’s results are not about a
specialized area of only technical interest but instead are about
statements in the natural numbers and about proof itself, and so
the hole that Gödel finds lies at the very foundation of rational thought.
Gödel and friend, 1947
To be a mathematical proof, each step in an argument must be verifiable as
either an axiom or as a valid deduction from the prior steps. So proving a
mathematical theorem is a kind of computation.‡ Thus, Gödel’s Theorem and other
uncomputability results are in the same vein. In fact, from a proof of the Halting
problem, we can get to a proof of Gödel’s Theorem in a way that is reasonably
straightforward. (Of course, while part of the battle is the technical steps, a larger
part is the genius of envisioning the statement at all.)
To people at the time these results were deeply shocking, revolutionary. And
while we work in an intellectual culture that has absorbed this shock, we must
nevertheless recognize them as bedrock.
Extra
II.C Self Reproduction
† Gödel produces a statement that asserts, in a coded way, “This statement cannot be proved.” If it
were false then it could be proved, but false statements cannot be proved in the natural numbers. So
it must be true. But then it, indeed, is true but cannot be proved to be so. ‡ This implies that you
could start with all of the axioms and apply all of the logic rules to get a set of theorems. Then
applying all of the logic rules to those will give all the second-rank theorems, etc. In this way, by
dovetailing from the axioms you can in principle computably enumerate the theorems.
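The rank-by-rank closure in that second footnote can be sketched in code. This is only a toy (Python, for concreteness): the “axioms” and string-rewriting “rules” below are hypothetical stand-ins, not a real logic, but the dovetailed enumeration is the point.

```python
def enumerate_theorems(axioms, rules, ranks):
    # Rank 0 is the axioms themselves; each pass applies every rule to
    # everything derived so far, producing the next rank of theorems.
    known = set(axioms)
    for _ in range(ranks):
        known |= {rule(t) for rule in rules for t in known}
    return known

# A toy formal system: one axiom and two string-rewriting rules.
axioms = {"a"}
rules = [lambda t: t + "b", lambda t: "c" + t]
print(sorted(enumerate_theorems(axioms, rules, 2)))
# → ['a', 'ab', 'abb', 'ca', 'cab', 'cca']
```

Every theorem of the toy system appears at some finite rank, which is all that computable enumerability asks.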
Paley’s watch In 1802, W Paley argued for the existence of a god from a
perception of unexplained order in the natural world.
In crossing a heath, . . . suppose I had found a watch upon the ground . . . [W]hen
we come to inspect the watch we perceive . . . that its several parts are framed and put
together for a purpose, e.g., that they are so formed and adjusted as to produce motion,
and that motion so regulated as to point out the hour of the day . . . the inference we
think is inevitable, that the watch must have a maker — that there must have existed,
at some time and at some place or other, an artificer or artificers who formed it for the
purpose which we find it actually to answer, who comprehended its construction and
designed its use.
The marks of design are too strong to be got over. Design must have had a designer.
That designer must have been a person. That person is GOD.
This essay was very influential before the development by Darwin and Wallace
of the theory of differential reproduction through natural selection.
Paley then gives his strongest argument, that the most incredible
thing in the natural world, that which distinguishes living things from
stones or machines, is that they can, if given a chance, self-reproduce.
Suppose, in the next place, that the person, who found the watch,
would, after some time, discover, that, in addition to all the properties
which he had hitherto observed in it, it possessed the unexpected property
of producing, in the course of its movement, another watch like itself . . . If
that construction without this property, or which is the same thing, before
this property had been noticed, proved intention and art to have been
employed about it; still more strong would the proof appear, when he
came to the knowledge of this further property, the crown and perfection
of all the rest.
William Paley, 1743–1805
This captures that for many pre-evolution thinkers, from among all the things
in the world to marvel at — the graceful shell of a nautilus, the precision of an
eagle’s eye, or consciousness — the greatest wonder was self-reproduction. It may
seem, for example, that making a machine to weave a rug is possible only because
the rug is less complex than the machine. In this mindset, having something that
assembles a copy of itself appears to be an impossibility, a kind of magic. But that’s
wrong. The Fixed Point Theorem gives self-reproducing mechanisms.
A person might think to include the source as a string within the source. Below
is a start at that,† which we can call try0.c. But this is naive. The string would
have to contain another string, etc. Like the homunculus theory, this leads to an
infinite regress. Instead, we need a program that somehow contains instructions
for computing a part of itself.
main() {
printf("main(){\n ... }");
}
This is close. Escaping some newlines and quotation marks# leads to this program,
try3.c, which works.
char*e="char*e=%c%s%c;%cmain(){printf(e,34,e,34,10,10);}%c";
main(){printf(e,34,e,34,10,10);}
The verb ‘to quine’ means to write a sentence fragment a first time, and then
to write it a second time, but with quotation marks around it. For example, from
‘say’ we get “say ‘say’.” Another is “quine ‘quine’.” This is a linguistic analog of the
self-reproducing programs where the second word plays the part of the data in a
traditional program/data split, the same part as is played by try3.c’s first line
string. That part is also played by ‘produce’ in “Produce the machine, and then do
the machine.”
We can express that in code. First consider quoting. To perform some action we
ordinarily define a function and then call it as with (f 1 2) . As a consequence, if
we want to produce a list of three strings then this
> (Boro is reading)
gives the error Boro : undefined . We must tell Racket not to evaluate these
things, in this case not to evaluate the list in the usual way of taking the first entry
to be a function and then applying it to the evaluation of the other entries.
> (quote (Boro is reading))
'(Boro is reading)
> (P 'reading)
'(Boro is reading)
For a version that does not depend on the definitions use this.
> ((lambda (x) (list x (list 'quote x)))
'(lambda (x) (list x (list 'quote x))))
'((lambda (x) (list x (list 'quote x))) '(lambda (x) (list x (list 'quote x))))
The (lambda (x) ...) construct is how Racket defines a function of one input
without giving it a name (the term ‘lambda’ comes from Church’s Lambda Calculus).
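The same trick carries over to other languages. Here is a Python version (my own illustration, not from the text): as in the C program, the string s plays the data role, and the print both uses s as a template and splices s into itself.

```python
# A self-reproducing Python program: running these two lines as a script
# prints exactly these two lines.  The %r conversion inserts repr(s),
# quotation marks included, and %% becomes a literal percent sign.
s = 's = %r\nprint(s %% s)'
print(s % s)
```

The %r conversion plays the role that the %c arguments with value 34 (the quotation mark) play in the C version.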
Extra
II.D Busy Beaver
For any 𝑛 ∈ N, the set of Turing machines having no more than 𝑛 states
is finite. There are machines P𝑒 in this set that halt (on a blank tape)
and machines that do not. As the set is finite, we can think to start all
of the machines and wait until no more of them will ever converge. At
that point we will know which 𝑛-state machine runs for the most steps,
which produces the most output, which visits the most tape squares, etc.
Define the function BB : N → N to give the minimal number of steps
after which all of these size-𝑛 machines that will ever halt on a blank
tape have done so. Also let Σ : N → N give the largest number of 1’s left
on the tape, after halting, by any 𝑛-state Turing machine that is started
on a blank tape.
Tibor Radó, 1895–1965
D.1 Theorem (Radó, 1962) The functions BB and Σ are not computable.
Proof For BB, assume otherwise. To compute whether some P𝑒 halts on input 𝑒 ,
build a machine that, started on a blank tape, first writes 𝑒 and then behaves as
P𝑒 does on that input, and let 𝑛 be its number of states. Run that machine for
BB (𝑛)-many steps. If P𝑒 (𝑒) has not halted by then, it never will.
So computability of BB would contradict the unsolvability of the Halting problem.
(The function Σ is similar and is Exercise D.7.)
This BB may seem to be just one more uncomputable function among many.
However, it has the interesting property that any function 𝑓 that grows faster than
it — where 𝑓 (𝑛) ≥ BB (𝑛) for all sufficiently large 𝑛 — is also not computable, by
the same argument as in the proof. This gives us an insight about what makes a
function uncomputable: one way is to grow faster than any computable function.†
The Busy Beaver problem is: which 𝑛-state Turing Machine
does the most computational work before halting?
Think of this as a competition, to produce the machine that
sets the limit BB (𝑛) or Σ(𝑛).‡ A competition needs rules, and
here tradition fixes a definition of Turing machines where there
is a single tape that is unbounded at one end, there are two tape
symbols 1 and B, there is a separate halt state that is not counted in
the number of machine states, the machine is started on a blank
tape, and transitions are of the form Δ(state, tape symbol) =
⟨state, tape symbol, head shift⟩.
Rare moment of rest
What is known In the 1962 paper Radó covered the 𝑛 = 0, 𝑛 = 1, and 𝑛 = 2 cases
(𝑛 = 0 is trivial since it refers to a machine consisting only of a halting state). In
1964 Radó and Lin showed that Σ( 3) = 6.
D.2 Example This is the three state Busy Beaver machine, with halting state 𝑞 3 .
†
Note the connection with the Ackermann function; we showed that it is not primitive recursive because
it grows faster than any primitive recursive function. ‡ For many years after the problem was originally
stated by T Radó, the competition was centered on Σ. However, recently it has become more common
to discuss BB. In any event, the two are very closely related.
Δ      B            1
𝑞0    𝑞 1, 1, 𝑅    𝑞 3, 1, 𝑅
𝑞1    𝑞 2, B, 𝑅    𝑞 1, 1, 𝑅
𝑞2    𝑞 2, 1, 𝐿    𝑞 0, 1, 𝐿
Halt – otherwise
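We can check this machine by simulating it. The sketch below (Python, for concreteness) uses a dictionary as a two-way-infinite tape, which is harmless here since the head strays only one square to the left of the start square.

```python
# Transition table for the three-state Busy Beaver machine above:
# (state, read symbol) -> (new state, written symbol, head shift).
DELTA = {
    ("q0", "B"): ("q1", "1", +1), ("q0", "1"): ("q3", "1", +1),
    ("q1", "B"): ("q2", "B", +1), ("q1", "1"): ("q1", "1", +1),
    ("q2", "B"): ("q2", "1", -1), ("q2", "1"): ("q0", "1", -1),
}

def run(delta, start="q0", halt="q3", max_steps=10_000):
    tape, pos, state, steps = {}, 0, start, 0
    while state != halt and steps < max_steps:
        state, written, shift = delta[(state, tape.get(pos, "B"))]
        tape[pos] = written
        pos += shift
        steps += 1
    return steps, sum(1 for symbol in tape.values() if symbol == "1")

steps, ones = run(DELTA)
print(steps, ones)  # 14 6 -- it halts leaving six 1's, matching Σ(3) = 6
```

Counting the final transition into the halt state, this machine makes 14 steps; the step-count record BB (3) = 21 in the table below is set by a different three-state machine.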
This summarizes the current world records.
𝑛        1   2   3    4     5            6
BB (𝑛)   1   6   21   107   47 176 870   ≥ 10 ↑↑ 15
Σ(𝑛)     1   4   6    13    4 098        ≥ 10 ↑↑ 15
The notation 10 ↑↑ 15 means 10ˆ(10ˆ(· · · ˆ10)), a tower of fifteen 10’s.
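For a sense of how such towers grow, here is the tower operation in code (Python; only tiny heights are feasible to evaluate):

```python
def tower(base, height):
    # height-many copies of base: tower(10, 3) is 10**(10**10)
    value = 1
    for _ in range(height):
        value = base ** value
    return value

print(tower(2, 3), tower(2, 4))  # 16 65536
```

Already tower(10, 3) has ten billion digits, and 10 ↑↑ 15 is tower(10, 15).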
How we find these After 𝑛 = 2 the obvious place to start an attack on this problem
is with a breadth-first search: there are finitely many 𝑛 -state machines so run them
all on a blank tape, dovetail, and await developments. That will quickly settle the
question for a large number of machines. Of course, some of them won’t halt or
will run longer than our patience lasts. For some of these their action will be easy
to determine from the source, and we can hope to quickly reduce to a relatively
few machines which we can study in depth, and so by exhaustion find the answer
for this 𝑛 .†
But what if for some 𝑛 we find a machine that computes something that we
don’t know? For instance, what if 𝑛 is big enough to allow a machine that halts
if and only if it finds an odd perfect number? The 𝑛 = 6 case seems to have a
machine similar to this.
For more, there are a number of websites that cover the topic, including the
latest results. Besides the Wikipedia page, the canonical site is bbchallenge.org.
Some cover variations on machine standards such as considering machines with
three or more symbols.
Not only are Busy Beaver numbers very hard to find, at some point they become
impossible. In 2016, A Yedidia and S Aaronson obtained an 𝑛 for which BB (𝑛) is
unknowable. To do that, they created a programming language where programs
compile down to Turing machines. With this, they constructed a 7918-state
Turing machine that halts if there is a contradiction within the standard axioms
for Mathematics, and never halts if those axioms are consistent. We believe that
†
Brady (Brady 1983) reports 5 280 such machines for 𝑛 = 4.
these axioms are consistent, so we believe that this machine doesn’t halt. However,
Gödel’s Second Incompleteness Theorem shows that there is no way to prove
that the axioms are consistent using the axioms themselves. So in this case the
solution to the Busy Beaver problem is unknowable in that even if we were given
the number BB (𝑛) we could not use our axioms to prove that it is right, to prove
that this machine halts.
In summary, one way for a function to fail to be computable is if it grows faster
than any computable function. Note, however, that this is not the only way. There
are functions that grow slower than some computable function but are nonetheless
not computable.
II.D Exercises
D.3 How many Turing machines are there of the style used in this discussion?
✓ D.4 Write and run a routine to compute 𝑔( 0) , 𝑔(𝑔( 0)) , . . ..
✓ D.5 Give a diagonal construction of a function that is eventually greater than any
computable function.
✓ D.6 Show that there are uncomputable functions with the property that they
grow no faster than the computable function 𝑓 (𝑥) = 1. Hint: An argument by
countability works.
D.7 This is a proof that Σ is not computable. Let 𝑓 : N → N be any total
computable function. We will show that Σ(𝑛) > 𝑓 (𝑛) for infinitely many 𝑛 , and so
Σ ≠ 𝑓.
(a) Show that there is a Turing Machine M 𝑗 having 𝑗 many states that writes
𝑗 -many 1’s to a blank tape.
(b) Let 𝐹 : N → N be this function.
𝐹 (𝑚) = (𝑓 (0) + 0²) + (𝑓 (1) + 1²) + (𝑓 (2) + 2²) + · · · + (𝑓 (𝑚) + 𝑚²)
Argue that it has these three properties: if 0 < 𝑚 then 𝑓 (𝑚) < 𝐹 (𝑚), and
𝑚² ≤ 𝐹 (𝑚), and 𝐹 (𝑚) < 𝐹 (𝑚 + 1).
(c) The illustration below shows the composition of two Turing machines. On the
right, we have combined the final states of the first machine from the left with
the start state of the second.
(diagram: the first machine’s states . . . 𝑚𝑖 , 𝑚 𝑗 lead to Halt, which on the right is
replaced by the second machine’s start state 𝑛 0 , followed by 𝑛 1 , . . .)
Consider the Turing machine P that performs M 𝑗 , followed by the machine
M𝐹 , followed by another copy of the machine M𝐹 . Show that its
productivity is 𝐹 (𝐹 ( 𝑗)) and that it has 𝑗 + 2𝑛 𝐹 many states.
(d) Finish by comparing that with the 𝑗 + 2𝑛 𝐹 -state Busy Beaver machine. By
definition 𝐹 (𝐹 ( 𝑗)) ≤ Σ( 𝑗 + 2𝑛 𝐹 ). Because 𝑛 𝐹 is constant, being the number
of states in the machine M𝐹 , the relation 𝑗 + 2𝑛 𝐹 ≤ 𝑗² < 𝐹 ( 𝑗) holds for
sufficiently large 𝑗 . Argue that 𝑓 ( 𝑗 + 2𝑛 𝐹 ) ≤ Σ( 𝑗 + 2𝑛 𝐹 ).
Extra
II.E Cantor in code
𝑛 ∈ N            0       1       2       3       4       5      ...
⟨𝑖, 𝑗⟩ ∈ N × N    ⟨0, 0⟩  ⟨0, 1⟩  ⟨1, 0⟩  ⟨0, 2⟩  ⟨1, 1⟩  ⟨2, 0⟩  ...
The map from the top row to the bottom is Cantor’s pairing function because it
outputs pairs, while its inverse from the bottom to the top is the unpairing function.
First, unpairing. Given ⟨𝑥, 𝑦⟩ , it lies on diagonal number 𝑑 = 𝑥 + 𝑦 , and the
earlier diagonals together contain 1 + 2 + · · · + 𝑑 = 𝑑 (𝑑 + 1)/2-many pairs.
;; triangle-num  return 1+2+3+...+n
;; natural number -> natural number
(define (triangle-num n)
  (/ (* (+ n 1) n)
     2))
The pair ⟨𝑥, 𝑦⟩ sits 𝑥-many places along its diagonal, so its cantor number is
that triangle number plus 𝑥 .
;; cantor-unpairing  Given the pair x,y return its cantor number
;; natural number, natural number -> natural number
(define (cantor-unpairing x y)
  (+ (triangle-num (+ x y)) x))
Next, the pairing function. Given a natural number 𝑐 , to find the associated
⟨𝑥, 𝑦⟩ we first find the diagonal on which it falls. Where the diagonal is 𝑑 (𝑥, 𝑦) =
𝑥 + 𝑦 , the associated triangle number is 𝑡 (𝑥, 𝑦) = 𝑑 (𝑑 + 1)/2 = (𝑑² + 𝑑)/2.
Then 0 = 𝑑² + 𝑑 − 2𝑡 . Applying the familiar formula (−𝑏 ± √(𝑏² − 4𝑎𝑐))/(2𝑎) gives
this.
𝑑 = (−1 + √(1 − 4 · 1 · (−2𝑡)))/(2 · 1) = (−1 + √(1 + 8𝑡))/2
(We kept only the ‘+’ of the ‘±’ because the other root is negative.) Given a pairing
function input 𝑐 , to find the number of the diagonal containing the associated
⟨𝑥, 𝑦⟩ , take the floor, 𝑑 = ⌊(−1 + √(1 + 8𝑐))/2⌋ .
;; diag-num  Give number of diagonal containing Cantor pair numbered c
;; natural number -> natural number
(define (diag-num c)
  (let ([s (integer-sqrt (+ 1 (* 8 c)))])
    (quotient (- s 1)
              2)))
and then we get ⟨𝑥, 𝑦⟩ by seeing how far 𝑐 is along that diagonal.
;; cantor-pairing  Given the cantor number, return the pair with that number
;; natural number -> (natural number natural number)
(define (cantor-pairing c)
  (let* ([d (diag-num c)]
         [t (triangle-num d)])
    (list (- c t)
          (- d (- c t)))))
With those we can reproduce the table from the section’s start.
> (for ([i '(0 1 2 3 4 5)])
(displayln (cantor-pairing i)))
(0 0)
(0 1)
(1 0)
(0 2)
(1 1)
(2 0)
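As a cross-check on the Racket routines, the same computation fits in a few lines of Python (an illustration only; the names triangle, unpair, and pair here are mine, not the book’s):

```python
from math import isqrt

def triangle(n):              # 1 + 2 + ... + n
    return n * (n + 1) // 2

def unpair(x, y):             # pair -> cantor number
    return triangle(x + y) + x

def pair(c):                  # cantor number -> pair
    d = (isqrt(8 * c + 1) - 1) // 2    # the diagonal containing c
    x = c - triangle(d)
    return (x, d - x)

print([pair(c) for c in range(6)])
# → [(0, 0), (0, 1), (1, 0), (0, 2), (1, 1), (2, 0)]
```

Because the two functions are mutually inverse, pair(unpair(x, y)) returns (x, y) for every pair.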
The routines for triples and four-tuples show that there is a general pattern.
What the heck, just for fun we can extend to tuples of any size.
For the function unpair : N𝑘 → N, which we also call cantor, we can determine
𝑘 by peeking at the number of inputs. Thus cantor-unpairing-n generalizes
cantor-unpairing, cantor-unpairing-3, etc., by taking a tuple of any
length.
;; cantor-unpairing-n  Take a tuple of any length n and return its cantor number
;; (natural ...) of n elements -> natural
(define (cantor-unpairing-n . args)
  (cond
    [(null? args) 0]
    [(= 1 (length args)) (car args)]
    [(= 2 (length args)) (cantor-unpairing (car args) (cadr args))]
    [else
     (cantor-unpairing (car args) (apply cantor-unpairing-n (cdr args)))]))
> (cantor-unpairing-n 0 0 1 0)
6
> (cantor-unpairing-n 1 2 3 4)
159331
To generalize to the function pair : N → N𝑘, the awkwardness is that the routine
can’t know the intended arity 𝑘 and we must specify it separately.
;; cantor-pairing-arity  return the list of the given arity making the cantor number c
;; If arity=0 then only c=0 is valid (others return #f)
;; natural natural -> (natural .. natural) with arity-many elements
(define (cantor-pairing-arity arity c)
  (cond
    [(= 0 arity)
     (if (= 0 c)
         '()
         (begin
           (display "ERROR: cantor-pairing-arity with arity=0 requires c=0")
           (newline)
           #f))]
    [(= 1 arity) (list c)]
    [else (cons (car (cantor-pairing c))
                (cantor-pairing-arity (- arity 1) (cadr (cantor-pairing c))))]))
The cantor-pairing-arity routine is not uniform because it covers only
one arity at a time. Said another way, cantor-pairing-arity is not a full
inverse of cantor-unpairing-n, in that we must tell it the tuple’s arity.
> (cantor-unpairing-n 3 4 5)
1381
> (cantor-pairing-arity 3 1381)
'(3 4 5)
The idea of cantor-pairing-omega is to interpret its input c as a pair ⟨𝑥, 𝑦⟩ ,
that is, 𝑐 = pair (𝑥, 𝑦) . It then returns a tuple of length 𝑥 + 1, where 𝑦 is the tuple’s
cantor number. (The reason for the +1 in 𝑥 + 1 is that the empty tuple is associated
with 𝑐 = 0. Then rather than have all later pairs ⟨0, 𝑦⟩ not be associated with any
number, we next use the one-tuple ⟨0⟩ , and after that we use ⟨1⟩ , etc.)
II.E Exercises
E.1 What is the pair with Cantor number 42?
E.2 What is the pair with the number 666?
E.3 What is the first number matched by cantor-pairing-omega with a
four-tuple?
Part Two
Automata
Chapter III. Languages, Grammars, and Graphs
This chapter covers three topics we will use as a foundation for later work.
Section III.1 Languages
Our machines input and output strings of symbols. We take a symbol, sometimes
called a token, to be an atomic unit that a machine can read and write.† On
everyday binary computers the symbols are the bits 0 and 1. An alphabet is a
nonempty and finite set of symbols. We usually denote an alphabet with the upper
case Greek letter Σ, although an exception is the alphabet of bits, B = { 0, 1 }. A
string over an alphabet is a sequence of symbols from that alphabet. We use lower
case Greek letters such as 𝜎 and 𝜏 to denote strings. We use 𝜀 to denote the empty
string, the length zero sequence of symbols. The set of all strings over Σ is Σ∗ .‡
1.1 Definition A language L over an alphabet Σ is a set of strings drawn from that
alphabet. That is, L ⊆ Σ∗.
1.2 Example The set of bitstrings that begin with 1 is L = { 1, 10, 11, 100, ... }.
1.3 Example Another language over B is the finite set { 1000001, 1100001 }.
1.4 Example Let Σ = { a, b }. The language consisting of strings where the number of
a’s is twice the number of b’s is L = {𝜀, aab, aba, baa, aaaabb, ... }.
1.5 Example Let Σ = { a, b, c }. The language of length-two strings over that alphabet
is L2 = Σ2 = { aa, ab, ba ... , cc }. Over the same alphabet, this language consists
of length-three strings whose characters are in ascending order.
L3 = { aaa, bbb, ccc, aab, aac, abb, abc, acc, bbc, bcc }
1.6 Definition A palindrome is a string that reads the same forwards as backwards.
Some palindromes in English are kayak, noon, and racecar.
1.7 Example The language of palindromes over Σ = { a, b } is L = {𝜎 ∈ Σ∗ | 𝜎 = 𝜎 R }.
A few members are abba, aaabaaa, a, and 𝜀 .
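Languages like these are easy to experiment with by writing membership tests. Here are sketches for Example 1.4 and Example 1.7 (Python, purely as an illustration; the function names are mine):

```python
def in_twice_language(s):
    # Example 1.4: strings over {a, b} where the number of a's is
    # twice the number of b's
    return set(s) <= {"a", "b"} and s.count("a") == 2 * s.count("b")

def is_palindrome(s):
    # Example 1.7: the string equals its own reversal
    return s == s[::-1]

print(in_twice_language("aab"), in_twice_language("aabb"))  # True False
print(is_palindrome("abba"), is_palindrome("ab"))           # True False
```

Note that both tests accept the empty string ε, agreeing with the examples.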
1.8 Example Let Σ = { a, b, c }. Recall that a Pythagorean triple of integers has the
sum of the squares of the first two equal to the square of the third, as with 3,
4, and 5, or 5, 12, and 13. One way to describe Pythagorean triples is with the
Image: The Tower of Babel, by Pieter Bruegel the Elder (1563)
† We can imagine Turing’s clerk calculating without reading and writing symbols, for instance by
keeping track of information by having elephants move to the left side of a road or to the right. But we
could translate any such procedure into one using marks that our mechanism’s read/write head can
handle. So readability and writeability are not essential but we require them in the definition of
symbols as a convenience; after all, elephants are inconvenient. ‡ For more on strings see the
Appendix on page 370.
1.16 Remark For the above definition of the operation L𝑘 of repeatedly choosing
strings, there are two ways that we could go. We could choose a string 𝜎 and then
†
Don’t confuse this with the Cartesian product operation for sets. ‡ We take 𝜎 0 = 𝜀 since 𝜀 is the
identity element for string concatenation. (We saw the same reasoning when we defined the sum of
zero-many numbers to be 0 and the product of zero-many numbers to be 1, on page 21.)
repeat it, and so get the set of all 𝜎 𝑘. Or we could repeatedly choose strings, getting
the set of all 𝜎0 ⌢ 𝜎1 ⌢ · · · ⌢ 𝜎𝑘−1 . The second is more useful so that’s what we use.
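The repeated-choice reading of L𝑘 can be prototyped directly for finite languages (a Python sketch; the names concat and power are mine):

```python
def concat(L0, L1):
    # the concatenation language: every string sigma0 followed by sigma1
    return {s + t for s in L0 for t in L1}

def power(L, k):
    # L^k by repeatedly choosing strings from L; L^0 is {empty string}
    result = {""}
    for _ in range(k):
        result = concat(result, L)
    return result

print(sorted(power({"a", "ab"}, 2)))  # ['aa', 'aab', 'aba', 'abab']
```

Note that the output contains aab and aba, strings that are not the square of any single member, which is exactly the difference between the two readings.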
We finish by describing two ways that a machine can relate to a language. We
have already defined that a machine decides a language if it computes whether or
not a given input is a member of that language. The other way relates to languages
that are computably enumerable but not computable. For these there is a machine
that determines whether a given input is a member of the language but it is not
able to determine whether the input is not in the language. For instance, there is a
Turing machine that, given input 𝑒 , can determine whether 𝑒 ∈ 𝐾 , but no machine
can determine whether 𝑒 ∉ 𝐾 .
We will say that a machine recognizes (or accepts, or semidecides) a language
when, given an input, the machine computes in a finite time whether the input is
in the language, and further, if the input is not an element of the language then
the machine will never incorrectly report that it is an element. (The machine may
determine that it is not, or it may simply not report a conclusion by failing to halt.)
In short, ‘deciding’ means that on any input the machine correctly computes
both yes and no answers, while ‘recognizing’ requires only that it correctly computes
yes answers.
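The contrast can be mimicked in ordinary code. The set of perfect squares is of course decidable, so the pair below is only meant to illustrate the shape of the two definitions (Python): the decider always answers, while the recognizer is deliberately written in a pure search style and simply never returns on a non-member.

```python
def decide_square(n):
    # decider: halts on every input, answering yes or no
    k = 0
    while k * k < n:
        k += 1
    return k * k == n

def recognize_square(n):
    # recognizer: halts (with True) exactly on members; on a
    # non-member the search below runs forever
    k = 0
    while True:
        if k * k == n:
            return True
        k += 1

print(decide_square(50), recognize_square(49))  # False True
```

For a genuinely semidecidable set such as 𝐾 no amount of cleverness can turn the second style into the first.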
III.1 Exercises
1.17 List five of the shortest strings in each language, if there are five.
(a) {𝜎 ∈ B∗ | the number of 0’s plus the number of 1’s equals 3 }
(b) {𝜎 ∈ B∗ | 𝜎 ’s first and last characters are equal }
✓ 1.18 Is the set of decimal representations of real numbers a language?
1.19 Which of these is a palindrome: ()() or )(()? (a) Only the first (b) Only
the second (c) Both (d) Neither
✓ 1.20 Show that if 𝛽 is a string then 𝛽 ⌢ 𝛽 R is a palindrome. Do all palindromes
have that form?
✓ 1.21 Let L0 = {𝜀, a, aa, aaa } and L1 = {𝜀, b, bb, bbb }. (a) List all the members
of L0 ⌢ L1 . (b) List all the members of L1 ⌢ L0 . (c) List all the members of L0².
(d) List ten members, if there are ten, of L0∗.
✓ 1.22 List five members of each language, if there are five, and if not then list all
of them.
(a) {𝜎 ∈ { a, b }∗ | 𝜎 = aⁿb for 𝑛 ∈ N }
(c) { 1ⁿ0ⁿ⁺¹ ∈ B∗ | 𝑛 ∈ N }
(d) { 1ⁿ0²ⁿ1 ∈ B∗ | 𝑛 ∈ N }
✓ 1.23 Where L = { a, ab }, list each. (a) L² (b) L³ (c) L¹ (d) L⁰
1.24 Where L0 = { a, ab } and L1 = { b, bb } find each. (a) L0 ⌢ L1 (b) L1 ⌢ L0
(c) L0² (d) L1² (e) L0² ⌢ L1²
1.25 Suppose that the language L0 has three elements and L1 has two. Knowing
only that information, for each of these find the least number of elements possible
and the greatest number possible. (a) L0 ∪ L1 (b) L0 ∩ L1 (c) L0 ⌢ L1 (d) L1²
(e) L1 R (f) L0∗ ∩ L1∗
1.26 What is the language that is the Kleene star of the empty set, ∅∗ ?
✓ 1.27 Is the 𝑘 -th power of a language the same as the language of 𝑘 -th powers?
1.28 Does L∗ differ from ( L ∪ {𝜀 }) ∗ ?
1.29 We can ask how many elements are in the set L2.
(a) Prove that if two strings are unequal then their squares are also unequal.
Conclude that if L has 𝑘 -many elements then L2 has at least 𝑘 -many elements.
(b) Provide an example of a nonempty language that achieves this lower bound.
(c) Prove that where L has 𝑘 -many elements, L2 has at most 𝑘 2 -many.
(d) Provide an example, for each 𝑘 ∈ N, of a language that achieves this upper
bound.
1.30 Prove that L∗ = L0 ∪ L1 ∪ L2 ∪ · · · .
1.31 Consider the empty language L0 = ∅. For any language L1 , describe
L1 ⌢ L0 .
1.32 Languages are sets and so the operations of union and intersection apply.
(a) Name the shortest five strings in the union of 𝐴 = { a }∗ with 𝐵 = { b }∗.
(b) Suppose that Σ0 and Σ1 are disjoint, and that L0 and L1 are finite languages
over those alphabets respectively. What is the number of elements in their
union?
(c) Fill in the blank: the union of a language over Σ0 with a language over Σ1 is a
language over .
(d) Formulate the similar statement for intersection.
1.33 Let the language L over some Σ be finite, that is, suppose that | L | < ∞.
(a) With the language finite, must the alphabet be finite?
(b) Show that there is some bound 𝐵 ∈ N where |𝜎 | ≤ 𝐵 for all 𝜎 ∈ L.
(c) Show that the class of finite languages is closed under finite union. That is,
show that if L0, ... L𝑘 − 1 are finite languages over a shared alphabet for some
𝑘 ∈ N then their union is also finite.
(d) Show also that the class of finite languages is closed under finite intersection
and finite concatenation.
(e) Show that the class of finite languages is not closed under complementation
or Kleene star. (For an alphabet Σ, a language is a subset, L ⊆ Σ∗ . So its
complement is Lc = Σ∗ − L, also a language over Σ.)
1.34 What is the difference between the languages L = {𝜎 ∈ Σ∗ | 𝜎 = 𝜎 R } and
L̂ = {𝜎 ⌢ 𝜎 R | 𝜎 ∈ Σ∗ }?
1.35 For any language L ⊆ Σ∗ we can form the set of prefixes.
Pref ( L) = {𝜏 ∈ Σ∗ | 𝜏 is a prefix of some 𝜎 ∈ L }
Where Σ = { a, b } and L = { abaaba, bba }, find Pref ( L) .
Section III.2 Grammars
We have defined a ‘language’ as a set of strings. But this allows for any willy-nilly
set. In practice usually a language is governed by rules.
Here is an example. Native English speakers will say that the noun phrase
“the big red barn” sounds fine but that “the red big barn” sounds wrong. That is,
sentences in natural languages are constructed in patterns and the second of those
does not follow the pattern for English. Artificial languages such as programming
languages also have syntax rules, usually very strict rules.
⟨expr⟩ ⇒ ⟨term⟩
       ⇒ ⟨term⟩ * ⟨factor⟩
       ⇒ ⟨factor⟩ * ⟨factor⟩
       ⇒ x * ⟨factor⟩
       ⇒ x * ( ⟨expr⟩ )
       ⇒ x * ( ⟨term⟩ + ⟨expr⟩ )
       ⇒ x * ( ⟨term⟩ + ⟨term⟩ )
       ⇒ x * ( ⟨factor⟩ + ⟨term⟩ )
       ⇒ x * ( ⟨factor⟩ + ⟨factor⟩ )
       ⇒ x * ( y + ⟨factor⟩ )
       ⇒ x * ( y + z )
(The accompanying parse tree has root ⟨expr⟩ and leaves spelling out x * ( y + z ).)
In that example the rules for ⟨expr⟩ and ⟨term⟩ are recursive. But we don’t get
stuck in an infinite regress because the question is not whether we could perversely
keep expanding ⟨expr⟩ forever. Instead, the question is whether, given a string
such as x*(y+z), we can find a terminating derivation.
In the prior example the nonterminals such as ⟨expr⟩ or ⟨term⟩ describe the
role of those components in the language, as did the English grammar fragment’s
⟨noun phrase⟩ and ⟨article⟩ . That is why nonterminals are sometimes called
‘syntactic categories’. But for examples and exercises we often use small grammars
whose terminals and nonterminals do not have any particular meaning. For these,
a common convention is to write productions using single letters, with nonterminals
in upper case and terminals in lower case.
2.4 Example This two-rule grammar has one nonterminal, S.
S → aSb | 𝜀
Here is a derivation of the string a²b² = aabb.
S ⇒ aSb ⇒ aaSbb ⇒ aabb
That is, if there is a match for the rule’s head then we can replace it with the body.
Where 𝜎0, 𝜎1 are strings of terminals and nonterminals, if they are related by a
sequence of derivation steps then we write 𝜎0 ⇒∗ 𝜎1 . Where 𝜎0 = 𝑆 is the start
symbol, if there is a sequence 𝜎0 ⇒∗ 𝜎1 that finishes with a string of terminals
𝜎1 ∈ Σ∗ then we say that 𝜎1 has a derivation from the grammar.
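Finding such a derivation can be automated, at least for small cases. Below is a breadth-first sketch (Python, as an illustration; the rules dictionary and the length-based pruning bound are my own choices, tuned to grammars such as S → aSb | 𝜀, where a rule never removes a terminal once written):

```python
from collections import deque

def derivable(rules, start, target):
    # Breadth-first search over sentential forms.  A form is dropped once
    # its terminals alone outnumber the target's characters, or it grows
    # too long, so for grammars like this one the search terminates.
    seen, queue = {start}, deque([start])
    while queue:
        form = queue.popleft()
        if form == target:
            return True
        for i, symbol in enumerate(form):
            for body in rules.get(symbol, []):
                new = form[:i] + body + form[i + 1:]
                terminals = sum(1 for ch in new if ch not in rules)
                if (terminals <= len(target)
                        and len(new) <= 2 * len(target) + 2
                        and new not in seen):
                    seen.add(new)
                    queue.append(new)
    return False

rules = {"S": ["aSb", ""]}   # the grammar S -> aSb | empty string
print(derivable(rules, "S", "aabb"), derivable(rules, "S", "aab"))  # True False
```

This is exactly the point made above: we never get stuck expanding forever, because the search asks only whether some terminating derivation reaches the given string.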
This description is like the one detailing how a Turing machine’s instructions
determine the evolution of the sequence of configurations that is a computation,
on page 8. That is, production rules are like a program, directing a derivation.
However, one difference is that Turing machines are deterministic, so that from
a given input string there is a determined sequence of configurations, while here
the sequence of derivation steps is nondeterministic, in that from a given start
symbol a derivation can branch out to go to many different ending strings.
2.5 Definition The language derived from a grammar is the set of strings of
terminals having derivations that begin with the start symbol.
2.6 Example This grammar’s language is the set of representations of natural numbers.
⟨natural⟩ → ⟨digit⟩ | ⟨digit⟩⟨natural⟩
⟨digit⟩ → 0 | . . . | 9
This is a derivation for the string 321, along with its parse tree.
⟨natural⟩ ⇒ ⟨digit⟩⟨natural⟩
          ⇒ 3 ⟨natural⟩
          ⇒ 3 ⟨digit⟩⟨natural⟩
          ⇒ 32 ⟨natural⟩
          ⇒ 32 ⟨digit⟩
          ⇒ 321
(The parse tree has root ⟨natural⟩ and leaves 3, 2, and 1.)
S → aSb | aS | a | Sb | b
generates the language L = { a𝑖 b 𝑗 ∈ { a, b }∗ | 𝑖 ≠ 0 or 𝑗 ≠ 0 }.
This is the first grammar that we have seen where the generated language is
not clear, so we will do a verification. We will show mutual containment, first that
the generated language is a subset of L and then that it is also a superset.
The rules show that any derivation step 𝜏0 ⌢ head ⌢ 𝜏1 ⇒ 𝜏0 ⌢ body ⌢ 𝜏1 only
adds a’s on the left and b’s on the right, so every string in the language has the form
a𝑖 b 𝑗 . Those same rules show that in any terminating derivation S must eventually
be replaced by either a or b. Together these two give that the generated language
is a subset of L.
For containment the other way, we will prove that every 𝜎 ∈ L has a derivation.
We will use induction on the length |𝜎 | . By the definition of L the base case is
|𝜎 | = 1. In this case either 𝜎 = a or 𝜎 = b, each of which obviously has a derivation.
For the inductive step, fix 𝑛 ≥ 1 such that every string from L of length 𝑘 = 1, . . . ,
𝑘 = 𝑛 has a derivation, and let 𝜎 have length 𝑛 + 1. By the definition of L it has
the form 𝜎 = a𝑖 b 𝑗 . There are three cases: either 𝑖 = 𝑗 = 1, or 𝑖 > 1, or 𝑗 > 1.
The 𝜎 = a¹b¹ case is easy. For the 𝑖 > 1 case, 𝜎ˆ = a𝑖−1 b 𝑗 is a string of length 𝑛 ,
so by the inductive hypothesis it has a derivation S ⇒ · · · ⇒ 𝜎ˆ . Prefixing that
derivation with a S ⇒ aS step will put an additional a on the left. The 𝑗 > 1 case
works the same way.
2.10 Example The fact that derivations can go more than one way leads to an important
issue with grammars, that they can be ambiguous. Consider this fragment of a
grammar for if statements in a C-like language
⟨stmt⟩ → if ⟨bool⟩ ⟨stmt⟩
⟨stmt⟩ → if ⟨bool⟩ ⟨stmt⟩ else ⟨stmt⟩
and this code string.
if enrolled(s) if studied(s) grade='P' else grade='F'
This string can be parsed in two ways, as dramatized by these two copies of the
C-like language code string, indented to show the two associations.
if enrolled(s)
    if studied(s)
        grade='P'
    else
        grade='F'

if enrolled(s)
    if studied(s)
        grade='P'
else
    grade='F'
Obviously, those programs behave differently. This is known as a dangling else. (In
a language such as C a programmer makes clear which of the two possibilities is
the intended one by using curly braces.)
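The two associations really do compute different grades. As an illustration we can transcribe the two readings into Python, where the indentation forces the choice; the predicates here are stand-ins for the enrolled and studied tests.

```python
def grade_inner(enrolled, studied):
    # First reading: the else goes with the nearest if.
    if enrolled:
        if studied:
            return 'P'
        else:
            return 'F'

def grade_outer(enrolled, studied):
    # Second reading: the else goes with the outer if.
    if enrolled:
        if studied:
            return 'P'
    else:
        return 'F'

# An enrolled student who did not study fails under the first reading
# but gets no grade at all under the second.
print(grade_inner(True, False), grade_outer(True, False))   # F None
```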
A grammar is ambiguous if there is a string in its language with more than one
leftmost derivation.
2.11 Example This grammar for elementary algebra expressions
⟨expr⟩ → ⟨expr⟩ + ⟨expr⟩
| ⟨expr⟩ * ⟨expr⟩
| ( ⟨expr⟩ ) | a | b | . . . z
is ambiguous because a+b*c has two leftmost derivations.
⟨expr⟩ ⇒ ⟨expr⟩ + ⟨expr⟩ ⇒ a + ⟨expr⟩ ⇒ a + ⟨expr⟩ * ⟨expr⟩ ⇒ a + b * ⟨expr⟩ ⇒ a + b * c
⟨expr⟩ ⇒ ⟨expr⟩ * ⟨expr⟩ ⇒ ⟨expr⟩ + ⟨expr⟩ * ⟨expr⟩ ⇒ a + ⟨expr⟩ * ⟨expr⟩ ⇒ a + b * ⟨expr⟩ ⇒ a + b * c
Again, the issue is that we get two different behaviors. For instance, take 1 for a,
and 2 for b, and 3 for c. The first derivation gives 1 + ( 2 · 3) = 7 while the second
one gives ( 1 + 2) · 3 = 9.
In contrast, this grammar for the same language is unambiguous.
⟨expr⟩ → ⟨expr⟩ + ⟨term⟩
| ⟨term⟩
⟨term⟩ → ⟨term⟩ * ⟨factor⟩
| ⟨factor⟩
⟨factor⟩ → ( ⟨expr⟩ )
| a | b | ... | z
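One payoff of the unambiguous form is that it maps directly onto a parser. This sketch (our own, not from the text) is a recursive-descent evaluator in which each rule becomes a function and the left recursion becomes a loop, so that * automatically binds tighter than +.

```python
def evaluate(src, env):
    """Evaluate an expression string by the unambiguous grammar:
    expr -> expr + term | term;  term -> term * factor | factor;
    factor -> ( expr ) | a | b | ... | z."""
    tokens = list(src)
    pos = 0

    def factor():
        nonlocal pos
        if tokens[pos] == '(':
            pos += 1
            val = expr()
            pos += 1            # step past ')'
            return val
        val = env[tokens[pos]]  # a single letter, looked up in env
        pos += 1
        return val

    def term():
        nonlocal pos
        val = factor()
        while pos < len(tokens) and tokens[pos] == '*':
            pos += 1
            val = val * factor()
        return val

    def expr():
        nonlocal pos
        val = term()
        while pos < len(tokens) and tokens[pos] == '+':
            pos += 1
            val = val + term()
        return val

    return expr()

env = {'a': 1, 'b': 2, 'c': 3}
print(evaluate('a+b*c', env), evaluate('(a+b)*c', env))   # 7 9
```

The two results match the 1 + (2 · 3) = 7 and (1 + 2) · 3 = 9 computations above.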
154 Chapter III. Languages, Grammars, and Graphs
III.2 Exercises
✓ 2.12 Use the grammar of Example 2.3. (a) What is the start symbol? (b) What
are the terminals? (c) What are the nonterminals? (d) How many rewrite rules
does it have? (e) Give three strings derived from the grammar, besides the string
in the example. (f) Give three strings in the language { +, *, ), (, a ... , z }∗ that
cannot be derived.
2.13 Use the grammar of Example 2.1. (a) What is the start symbol? (b) What
are the terminals? (c) What are the nonterminals? (d) How many rewrite rules
does it have? (e) Give three strings derived from the grammar besides the ones in
the exercise, or show that there are not three such strings. (f) Give three strings
in the language that cannot be derived from this grammar, or show that there are
not three such strings.
2.14 Use this grammar.
⟨natural⟩ → ⟨digit⟩ | ⟨digit⟩⟨natural⟩
⟨digit⟩ → 0 | 1 | . . . | 9
(a) What is the alphabet? What are the terminals? The nonterminals? What is the
start symbol? (b) For each production, name the head and the body. (c) Which
metacharacters are used? (d) Derive 42. Also give the associated parse tree.
(e) Derive 993 and give its parse tree. (f) How can ⟨natural⟩ be defined in terms
of ⟨natural⟩ ? Doesn’t that lead to infinite regress? (g) Extend this grammar to
cover the integers. (h) With your grammar, can you derive +0? -0?
✓ 2.15 From this grammar
⟨sentence⟩ → ⟨subject⟩ ⟨predicate⟩
⟨subject⟩ → ⟨article⟩ ⟨noun⟩
⟨predicate⟩ → ⟨verb⟩ ⟨direct object⟩
⟨direct object⟩ → ⟨article⟩ ⟨noun⟩
⟨article⟩ → the | a
⟨noun⟩ → car | wall
⟨verb⟩ → hit
derive each of these: (a) the car hit a wall (b) the car hit the wall
(c) the wall hit a car.
2.16 Consider this grammar.
⟨sentence⟩ → ⟨subject⟩ ⟨predicate⟩
⟨subject⟩ → ⟨article⟩ ⟨noun1⟩
⟨predicate⟩ → ⟨verb⟩ ⟨direct-object⟩
⟨direct-object⟩ → ⟨article⟩ ⟨noun2⟩
⟨article⟩ → the | a | 𝜀
⟨noun1⟩ → dog | flea
⟨noun2⟩ → man | dog
⟨verb⟩ → bites | licks
(a) Give a derivation for dog bites man.
(b) Show that there is no derivation for man bites dog.
Section 2. Grammars 155
✓ 2.17 Your friend tries the prior exercise and you see their work so far.
⟨sentence⟩ ⇒ ⟨subject⟩ ⟨predicate⟩
⇒ ⟨article⟩ ⟨noun1⟩ ⟨predicate⟩
⇒ ⟨article⟩ ⟨noun1⟩ ⟨verb⟩ ⟨direct object⟩
⇒ ⟨article⟩ ⟨dog|flea⟩ ⟨verb⟩ ⟨article⟩ ⟨noun2⟩
⇒ ⟨article⟩ ⟨dog|flea⟩ ⟨verb⟩ ⟨article⟩ ⟨man|dog⟩
Stop them and explain what they are doing wrong.
2.18 With the grammar of Example 2.3, derive (a+b)*c.
✓ 2.19 Use this grammar
S → TbU
T → aT | 𝜀
U → aU | bU | 𝜀
for each part. (a) Give both a leftmost derivation and rightmost derivation of aabab.
(b) Do the same for baab. (c) Show that there is no derivation of aa.
2.20 Use this grammar.
S → aABb
A → aA | a
B → Bb | b
(a) Derive three strings.
(b) Name three strings over Σ = { a, b } that are not derivable.
(c) Describe the language generated by this grammar.
2.21 Give a grammar for the language { a𝑛 b𝑛+𝑚 a𝑚 | 𝑛, 𝑚 ∈ N }.
✓ 2.22 Give the parse tree for the derivation of aabb in Example 2.4.
2.23 Verify that the language derived from the grammar in Example 2.4 is
L = { a𝑛 b𝑛 | 𝑛 ∈ N }.
2.24 What is the language generated by this grammar?
A → aA | B
B → bB | cA
✓ 2.25 In many programming languages identifier names consist of a string of
letters or digits, with the restriction that the first character must be a letter. Create
a grammar for this, using ASCII letters.
2.26 Early programming languages had strong restrictions on what could be a
variable name. Create a grammar for a language that consists of strings of at most
four characters, upper case ASCII letters or digits, where the first character must
be a letter.
2.27 What is the language generated by a grammar with a set of production rules
that is empty?
2.28 Here is a grammar for propositional logic expressions in Conjunctive Normal
form.
⟨CNF⟩ → ( ⟨Disjunction⟩ ) ∧ ⟨CNF⟩ | ( ⟨Disjunction⟩ )
✓ 2.30 This is a grammar for postal addresses. Note the use of the empty string 𝜀 to
make some components optional, such as ⟨opt suffix⟩ and ⟨apt num⟩ .
⟨postal address⟩ → ⟨name⟩ ⟨EOL⟩ ⟨street address⟩ ⟨EOL⟩ ⟨town⟩
⟨name⟩ → ⟨personal part⟩ ⟨last name⟩ ⟨opt suffix⟩
⟨street address⟩ → ⟨house num⟩ ⟨street name⟩ ⟨apt num⟩
⟨town⟩ → ⟨town name⟩ , ⟨state or region⟩
⟨personal part⟩ → ⟨initial⟩ . | ⟨first name⟩
⟨last name⟩ → ⟨char string⟩
⟨opt suffix⟩ → Sr. | Jr. | 𝜀
⟨house num⟩ → ⟨digit string⟩
⟨street name⟩ → ⟨char string⟩
⟨apt num⟩ → ⟨char string⟩ | 𝜀
⟨town name⟩ → ⟨char string⟩
⟨state or region⟩ → ⟨char string⟩
⟨initial⟩ → ⟨char⟩
⟨first name⟩ → ⟨char string⟩ | 𝜀
⟨char string⟩ → ⟨char⟩ | ⟨char⟩ ⟨char string⟩ | 𝜀
⟨char⟩ → A | B | . . . z | 0 | . . . 9 | (space)
⟨digit string⟩ → ⟨digit⟩ | ⟨digit⟩ ⟨digit string⟩ | 𝜀
⟨digit⟩ → 0 | . . . 9
The nonterminal ⟨EOL⟩ expands to an end of line such as ASCII 10, while (space)
signifies a whitespace character such as ASCII 9 or ASCII 32, or even more exotic
characters such as en-space or em-space.
(a) Give a derivation for this address.
President
1600 Pennsylvania Avenue
Washington, DC
(b) Why is there no derivation for this address?
Sherlock Holmes
221B Baker Street
London, UK
Suggest a modification of the grammar so that this address is in the language.
(c) Give three reasons why this grammar is inadequate.
2.31 Recall Turing’s prototype computer, a clerk doing the symbolic manipulations
to multiply two large numbers. Deriving a string from a grammar has a similar
feel and we can write grammars to do computations. Fix the alphabet Σ = { 1 }, so
that we can interpret derived strings as numbers represented in unary.
(a) Produce a grammar whose language is the even numbers, { 12𝑛 | 𝑛 ∈ N } .
(b) Do the same for the multiples of three, { 13𝑛 | 𝑛 ∈ N } .
✓ 2.32 Here is a grammar that is notable for having a small alphabet, while
producing an infinite set of valid English sentences.
⟨sentence⟩ → buffalo ⟨sentence⟩ | 𝜀
(a) Derive a sentence of length one, one of length two, and one of length three.
(b) Give those sentences semantics, that is, make sense of them.
2.33 Here is a grammar for LISP.
⟨s expression⟩ → ⟨atomic symbol⟩
| ( ⟨s expression⟩ . ⟨s expression⟩ )
| ⟨list⟩
⟨list⟩ → ( ⟨list-entries⟩ )
⟨list-entries⟩ → ⟨s expression⟩
| ⟨s expression⟩ ⟨list-entries⟩
⟨atomic symbol⟩ → ⟨letter⟩ ⟨atom part⟩
⟨atom part⟩ → 𝜀
| ⟨letter⟩ ⟨atom part⟩
| ⟨number⟩ ⟨atom part⟩
⟨letter⟩ → a | . . . z
⟨number⟩ → 0 | . . . 9
Give a derivation for each string. (a) (a . b) (b) (a . (b . c))
2.34 Using the Example 2.11’s unambiguous grammar, produce a derivation for
a+(b*c).
2.35 The simplest example of an ambiguous grammar is
S → S | 𝜀
(a) What is the language generated by this grammar?
(b) Produce two different derivations of the empty string.
2.36 This is a grammar for the language of bitstrings L = B∗.
⟨bit-string⟩ → 0 | 1 | ⟨bit-string⟩ ⟨bit-string⟩
Show that it is ambiguous.
2.37
(a) Show that this grammar is ambiguous by producing two different leftmost
derivations for a-b-a.
E → E-E | a | b
(b) Derive a-b-a from this grammar, which is unambiguous.
E → E-T | T
T → a | b
Section
III.3 Graphs
In the Theory of Computation we often state problems using the language of Graph
Theory. Here are two examples we have already seen. Both have vertices, and
those vertices are connected by edges that represent a relationship between the
vertices.
(Two pictures: a parse tree whose interior nodes are ⟨expr⟩, ⟨term⟩, and ⟨factor⟩
and whose leaves spell an expression in x and y, and the state diagram of a Turing
machine with states 𝑞0 –𝑞3 , its edges labeled with instructions such as 1,R and B,L.)
(Picture: a graph with vertices 𝑣0 –𝑣4 .) Its vertex and edge sets are
N = {𝑣 0, ... 𝑣 4 }
E = { {𝑣 0, 𝑣 1 }, {𝑣 0, 𝑣 2 }, ... {𝑣 3, 𝑣 4 } }
Important: a graph is not its picture. Both of the pictures below show the same
graph as above because they show the same vertices connected with the same
edges.
(Two pictures: the same five vertices 𝑣0 –𝑣4 , placed differently on the page but
joined by the same edges.)
Instead of writing 𝑒 = {𝑣, 𝑣ˆ } we often write 𝑒 = 𝑣 𝑣ˆ. Since edges are sets and
sets are unordered we could write the same edge as 𝑒 = 𝑣ˆ𝑣 .
There are many extensions of that definition for modeling different circum-
stances. One is to allow some vertices to connect to themselves, forming a loop.†
Another variant is a multigraph, which allows two vertices to share more than
one edge. Still another is a weighted graph, which gives each edge a real number
weight, perhaps signifying the distance or the cost in money or time to traverse
that edge.
A very often-used variation is a directed graph or digraph, where edges have a
direction, as in a road map that includes one-way streets. If an edge is directed
from 𝑣 to 𝑣ˆ then we can write it as 𝑣 𝑣ˆ but not in the other order. The Turing
machine graph above is a digraph and also has loops.
Some important variations involve whether the graph has cycles. A cycle is a
closed path around the graph; see the complete definition just below. A tree is an
undirected connected graph with no cycles (often one vertex is singled out as the
tree’s root). A directed acyclic graph or DAG is a directed graph with no directed
cycles.
Paths Many problems that we shall consider involve moving through a graph.
3.3 Definition Two graph edges are adjacent if they share a vertex, so that they
have the form 𝑒 0 = 𝑢𝑣 and 𝑒 1 = 𝑣𝑤 . A walk is a sequence of adjacent edges
⟨𝑣 0𝑣 1, 𝑣 1𝑣 2, ... 𝑣𝑛−1𝑣𝑛 ⟩ . Its length is the number of edges, 𝑛 . If the initial vertex 𝑣 0
equals the final vertex 𝑣𝑛 then the walk is closed, otherwise it is open. A trail is a
walk where no edge occurs twice. A circuit is a closed trail. A path is a walk with
no repeated edges or vertices, except that it may be closed, so that its
first and last vertices are equal. A closed path with at least one edge is a cycle.‡
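A small routine can apply these definitions mechanically. This sketch (the function name and edge representation are our own) takes a sequence of edges, each written as an ordered pair so that consecutive edges chain, and reports which labels from Definition 3.3 apply.

```python
def classify(edges):
    """Classify a sequence of edges, each an ordered pair, so that each
    edge ends where the next one starts."""
    if not edges or any(e[1] != f[0] for e, f in zip(edges, edges[1:])):
        return set()                       # not a walk at all
    labels = {"walk"}
    closed = edges[0][0] == edges[-1][1]
    labels.add("closed" if closed else "open")
    undirected = [frozenset(e) for e in edges]
    if len(set(undirected)) == len(undirected):     # no edge repeats
        labels.add("trail")
        if closed:
            labels.add("circuit")
        vertices = [edges[0][0]] + [e[1] for e in edges]
        inner = vertices[:-1] if closed else vertices
        if len(set(inner)) == len(inner):           # no vertex repeats
            labels.add("path")
            if closed:
                labels.add("cycle")
    return labels

print(classify([("u0", "u1"), ("u1", "u3")]))   # an open walk, a trail, a path
```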
3.4 Example On the left, highlighted is a path from 𝑢 0 to 𝑢 3 , 𝑝 = ⟨𝑢 0𝑢 1, 𝑢 1𝑢 3 ⟩ . On
the right the highlighted walk is a cycle.
(Two pictures: on the left a graph with vertices 𝑢0 –𝑢3 , on the right a graph with
vertices 𝑣0 –𝑣7 , each with the named walk highlighted.)
3.5 Definition If a circuit contains all of a graph’s edges then it is an Euler circuit.
If it contains all of the vertices then it is a Hamiltonian circuit.
3.6 Example In Example 3.4 the path in the graph on the left is not a circuit because
it is not closed. The path in the graph on the right is a Hamiltonian circuit but it is
not an Euler circuit.
3.7 Definition Where G = ⟨N , E ⟩ is a graph, a subgraph Ĝ = ⟨N̂ , Ê ⟩ satisfies
N̂ ⊆ N and Ê ⊆ E . A subgraph with every possible edge, so that 𝑣𝑖 , 𝑣 𝑗 ∈ N̂ and
𝑒 = 𝑣𝑖 𝑣 𝑗 ∈ E implies that 𝑒 ∈ Ê also, is an induced subgraph.
† Formally, we might extend the definition to allow some edges in E to be single-element sets. We will
not specify how each variant is described.
‡ These terms are not completely standardized so you may see them used in other ways, especially in
older work.
3.8 Example In the graph G on the left of Example 3.4, consider the edges in the
highlighted path, Ê = {𝑢 0𝑢 1, 𝑢 1𝑢 3 }. Taking those edges along with the vertices
that they contain, N̂ = {𝑢 0, 𝑢 1, 𝑢 3 }, gives a subgraph Ĝ .
With the same set of vertices, N̂ = {𝑢 0, 𝑢 1, 𝑢 3 }, the induced subgraph is the
triangle that adds the outer edge, Ê ∪ {𝑢 0𝑢 3 }.
3.9 Definition A vertex 𝑣 1 is reachable from the vertex 𝑣 0 if there is a path from 𝑣 0
to 𝑣 1 . A graph is connected if between any two vertices there is a path.
In Chapter Five we will consider the graph of the possible branchings of a
computation by a machine. Such a graph may have infinitely many nodes, as when
there is a branch that does not halt. There, we will need the next result.
3.10 Lemma (König’s lemma) Suppose that in a connected graph each vertex is
adjacent to only finitely many other vertices. If the graph has infinitely many
vertices then it has an infinite path, one with infinitely many vertices.
Proof Fix a vertex 𝑣 0 . The graph is connected, so for every other vertex there is
a path starting at 𝑣 0 that reaches it. For each of 𝑣 0 ’s neighbors, there is a set of
vertices that can be reached from 𝑣 0 via a path through that neighbor. There are
infinitely many vertices so there must be a neighbor (unequal to 𝑣 0 ) where the set
of vertices that are reachable in that way is infinite. Pick such a neighbor and call
it 𝑣 1 .
Now iterate: by choice of 𝑣 1 there are infinitely many vertices reachable by a
path starting with the edge 𝑣 0𝑣 1 . Because 𝑣 1 has finitely many neighbors, there is
a 𝑣 2 adjacent to 𝑣 1 (and unequal to either 𝑣 0 or 𝑣 1 ), through which there are paths
to infinitely many of the graph’s vertices. In this way we get a path containing
infinitely many vertices.
We can represent a graph with an array of numbers. For instance, this array
represents a graph with five vertices.

              𝑣0 𝑣1 𝑣2 𝑣3 𝑣4
          𝑣0 ⎛ 0  1  1  0  0 ⎞
          𝑣1 ⎜ 1  0  1  1  1 ⎟
M( G ) =  𝑣2 ⎜ 1  1  0  1  1 ⎟                                  (∗)
          𝑣3 ⎜ 0  1  1  0  1 ⎟
          𝑣4 ⎝ 0  1  1  1  0 ⎠
We can extend this to cover other graph variants that were listed earlier. For
instance, the graph represented in (∗) is a simple graph because the matrix has
only 0 and 1 entries, because all the diagonal entries are 0, and because the matrix
is symmetric, meaning that the 𝑖, 𝑗 entry has a 1 if and only if the 𝑗, 𝑖 entry is also 1.
If the graph is directed and has a one-way edge from 𝑣𝑖 to 𝑣 𝑗 but none from 𝑣 𝑗 to 𝑣𝑖
then the matrix is not symmetric because the 𝑖, 𝑗 entry will be 1 but the 𝑗, 𝑖 entry
will be 0. For a multigraph, where there can be multiple edges from one vertex to
another, the associated entry can be larger than 1. And, if the graph has a loop
then the matrix has a diagonal entry that is a natural number larger than zero.
3.11 Definition For a graph G , the adjacency matrix M ( G ) has that the 𝑖, 𝑗 entry
equals the number of edges from 𝑣𝑖 to 𝑣 𝑗 .
3.12 Lemma Let the matrix M ( G ) represent the graph G . Then in its matrix multi-
plicative 𝑛 -th power the 𝑖, 𝑗 entry is the number of walks of length 𝑛 from vertex 𝑣𝑖
to vertex 𝑣 𝑗 .
Proof Exercise 3.41.
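The lemma is easy to check numerically. This sketch (the helper names are our own) squares the five-vertex adjacency array discussed above and compares each entry with a brute-force count of the length-two walks.

```python
from itertools import product

# Adjacency array of the five-vertex graph discussed above.
M = [[0, 1, 1, 0, 0],
     [1, 0, 1, 1, 1],
     [1, 1, 0, 1, 1],
     [0, 1, 1, 0, 1],
     [0, 1, 1, 1, 0]]

def mat_mul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def count_walks(M, length, i, j):
    """Brute force: try every choice of intermediate vertices."""
    n = len(M)
    return sum(all(M[a][b] for a, b in zip((i, *mid, j), (*mid, j)))
               for mid in product(range(n), repeat=length - 1))

M2 = mat_mul(M, M)
print(all(M2[i][j] == count_walks(M, 2, i, j)
          for i in range(5) for j in range(5)))    # True
```

The same comparison holds for the cube of the matrix and length-three walks.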
In contrast, the graph on the right has no 3-coloring. The four vertices are
completely connected so if two got the same color then they would be adjacent.
3.14 Example This table gives five committees. How many time slots must we use
so that no one has two meetings at once?
A B C D E
Armis Crump Burke India Burke
Jones Edwards Frank Harris Jones
Smith Robinson Ke Smith Robinson
Model this with a graph by taking each vertex to be a committee and if committees
are related by sharing a member then put an edge between them.
(Picture: the committee graph, whose edges are 𝐴𝐷 , 𝐴𝐸 , 𝐵𝐸 , and 𝐶𝐸 , shown
with a three-coloring.)
The picture shows that three colors is enough, that is, three time slots suffice. But
there is also a two-coloring, C0 = {𝐴, 𝐵, 𝐶 } and C1 = { 𝐷, 𝐸 }.
A graph’s chromatic number is the minimum number 𝑘 where the graph has a
𝑘 -coloring.
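For a graph as small as the committee graph we can find the chromatic number by exhaustive search. In this sketch (our own illustration) the edge list transcribes the shared-member relation from the committee table.

```python
from itertools import product

def chromatic_number(vertices, edges):
    """Least k for which some assignment of k colors gives every edge
    differently-colored endpoints.  Brute force: k^n candidate colorings."""
    for k in range(1, len(vertices) + 1):
        for colors in product(range(k), repeat=len(vertices)):
            color = dict(zip(vertices, colors))
            if all(color[u] != color[v] for u, v in edges):
                return k

# Edges of the committee graph: committees that share a member.
committee_edges = [("A", "D"), ("A", "E"), ("B", "E"), ("C", "E")]
print(chromatic_number(list("ABCDE"), committee_edges))    # 2
```

The answer 2 agrees with the two-coloring C0 = {𝐴, 𝐵, 𝐶} and C1 = {𝐷, 𝐸}.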
Graph isomorphism We sometimes want to know when two graphs are essentially
identical. Consider these two.
(Two pictures: on the left a graph with vertices 𝑣0 –𝑣5 drawn in two rows of three,
on the right a graph with vertices 𝑤0 –𝑤5 drawn around a hexagon.)
They have the same number of vertices and the same number of edges. Further,
on the right as well as on the left there are two classes of vertices where all the
vertices in the first class connect to all the vertices in the second class: on the left
the two classes are the top and bottom rows while on the right they are the even-
and odd-numbered vertices. A person may suspect that, as in Example 3.2, these
are two ways to draw the same graph, with the vertex names changed for further
obfuscation.
That’s true; if we define this correspondence between the vertices
Vertex on left   𝑣0 𝑣1 𝑣2 𝑣3 𝑣4 𝑣5
Vertex on right  𝑤0 𝑤2 𝑤4 𝑤1 𝑤3 𝑤5
then the edges also correspond.
Edge on left   {𝑣0, 𝑣3} {𝑣0, 𝑣4} {𝑣0, 𝑣5} {𝑣1, 𝑣3} {𝑣1, 𝑣4} {𝑣1, 𝑣5} {𝑣2, 𝑣3} {𝑣2, 𝑣4} {𝑣2, 𝑣5}
Edge on right  {𝑤0, 𝑤1} {𝑤0, 𝑤3} {𝑤0, 𝑤5} {𝑤2, 𝑤1} {𝑤2, 𝑤3} {𝑤2, 𝑤5} {𝑤4, 𝑤1} {𝑤4, 𝑤3} {𝑤4, 𝑤5}
3.15 Definition Two graphs G and Ĝ are isomorphic if there is a one-to-one and onto
map 𝑓 : N → N̂ such that G has an edge {𝑣𝑖 , 𝑣 𝑗 } ∈ E if and only if Ĝ has the
associated edge { 𝑓 (𝑣𝑖 ), 𝑓 (𝑣 𝑗 ) } ∈ Ê .
To verify that two graphs are isomorphic the most natural thing is to
produce the map 𝑓 and then verify that in consequence the edges also
correspond. The exercises have examples.
Showing that graphs are not isomorphic usually entails finding some
graph-theoretic way in which they differ. A useful such property to
consider is the degree of a vertex, the total number of edges touching
that vertex with the proviso that a loop from the vertex to itself counts as
two. The degree sequence of a graph is the non-increasing sequence of
its vertex degrees. Thus, the graph in Example 3.14 has degree sequence
⟨3, 2, 1, 1, 1⟩.
Exercise 3.39 shows that if graphs are isomorphic then associated
vertices have the same degree and thus graphs with different degree
sequences are not isomorphic. Also, given two graphs, we
can use the degrees of the vertices to help us construct an isomorphism, if there is
one; examples are in the exercises. (Note, though, that there are graphs with the
same degree sequence that are not isomorphic.)
Determining whether two given graphs are isomorphic is in general a hard
problem. We could use brute force, checking every possible correspondence
between the two sets of vertices, but that would be slow. We do not currently know
whether there is a quick way. More on algorithm speed, including the speed of a
number of graph algorithms, is in the final chapter.
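That brute-force check is simple to write down, which makes plain where the cost lies: with 𝑛 vertices there are 𝑛! correspondences to try. A sketch (our own illustration, not from the text):

```python
from itertools import permutations

def isomorphic(n, edges0, edges1):
    """Brute-force test for simple graphs on vertices 0..n-1: check
    every one-to-one and onto correspondence between the vertex sets."""
    e0 = {frozenset(e) for e in edges0}
    e1 = {frozenset(e) for e in edges1}
    if len(e0) != len(e1):
        return False
    return any({frozenset((f[u], f[v])) for u, v in e0} == e1
               for f in permutations(range(n)))

# The two six-vertex graphs above: rows {0,1,2} and {3,4,5} on the left,
# evens and odds on the right.
left = [(u, v) for u in (0, 1, 2) for v in (3, 4, 5)]
right = [(u, v) for u in (0, 2, 4) for v in (1, 3, 5)]
print(isomorphic(6, left, right))    # True
```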
III.3 Exercises
✓ 3.16 Draw a picture of a graph illustrating each relationship. Some graphs will
be digraphs, or may have loops or multiple edges between some pairs of vertices.
(a) Maine is adjacent to Massachusetts and New Hampshire. Massachusetts is adja-
cent to every other state. New Hampshire is adjacent to Maine, Massachusetts,
and Vermont. Rhode Island is adjacent to Connecticut and Massachusetts.
Vermont is adjacent to Massachusetts and New Hampshire. Give the graph
describing the adjacency relation.
(b) In the game of Rock-Paper-Scissors, Rock beats Scissors, Paper beats Rock,
and Scissors beats Paper. Give the graph of the ‘beats’ relation; note that this
is a directed relation.
(c) The number 𝑚 ∈ N is related to the number 𝑛 ∈ N by being its divisor if they
are unequal and if there is a 𝑘 ∈ N with 𝑚 · 𝑘 = 𝑛 . Give the graph describing
the divisor relation among positive natural numbers less than or equal to 12
(it is a digraph).
(d) The river Pregel cut the town of Königsberg into four land masses. There were
two bridges from mass 0 to mass 1 and one bridge from mass 0 to mass 2.
There was one bridge from mass 1 to mass 2, and two bridges from mass 1
to mass 3. Finally, there was one bridge from mass 2 to 3. Consider masses
related by bridges. Give the graph (it is a multigraph).
3.17 Put ‘Y’ or ‘N’ in the array cells for these kinds of walks.
3.18 If a graph has many edges then from a visual design standpoint it can be
confusing. Sometimes in a directed graph we can take advantage of a ‘precedes’
relation being transitive to draw it with the minimum number of edges that conveys
all of the information. Suppose that in a Mathematics program students must take
Calculus II before Calculus III, and must take Calculus I before II. They must also
take Calculus II before Linear Algebra, and to take Real Analysis they must have
both Linear Algebra and Calculus III. Draw the digraph with a minimum number
of edges.
3.19 Let a simple graph G have vertices {𝑣 0, ... 𝑣 5 } and the edges 𝑣 0𝑣 1 , 𝑣 0𝑣 3 , 𝑣 0𝑣 5 ,
𝑣 1𝑣 4 , 𝑣 3𝑣 4 , and 𝑣 4𝑣 5 . (a) Draw G . (b) Give its adjacency matrix. (c) Find all
subgraphs with four nodes and four edges. (d) Find all induced subgraphs with
four nodes and four edges.
3.20 The complete graph on 𝑛 vertices, 𝐾𝑛 , is the simple graph with all possible
edges. (a) Draw 𝐾4 , 𝐾3 , 𝐾2 , and 𝐾1 . (b) Draw 𝐾5 . (c) How many edges does 𝐾𝑛
have?
✓ 3.21 Morse code represents text with a combination of a short sound, written ‘.’
and pronounced “dit,” and a long sound, written ‘-’ and pronounced “dah.” Here
are the representations of the twenty six English letters.
A .-      F ..-.    K -.-     O ---     S ...     W .--
B -...    G --.     L .-..    P .--.    T -       X -..-
C -.-.    H ....    M --      Q --.-    U ..-     Y -.--
D -..     I ..      N -.      R .-.     V ...-    Z --..
E .       J .---
Some representations are prefixes of others. Give the graph for the prefix relation.
3.22 This is the Petersen graph, often used for examples in Graph Theory.
(Picture: the outer vertices 𝑣0 –𝑣4 form a pentagon, the inner vertices 𝑣5 –𝑣9
form a five-pointed star, and each outer vertex is joined to an inner one.)
(a) List the vertices and edges. (b) Give two walks from 𝑣 0 to 𝑣 7 . What is the length
of each? (c) List both a closed walk and an open walk of length five, starting at 𝑣 4 .
(d) Give a cycle starting at 𝑣 5 . (e) Is this graph connected?
3.23 A graph is a set of vertices and edges, not a drawing. So a single graph
may be drawn with quite different pictures. Consider a graph G with the vertices
N = {𝐴, ... 𝐻 } and these edges.
E = {𝐴𝐵, 𝐴𝐶, 𝐴𝐺, 𝐴𝐻, 𝐵𝐶, 𝐵𝐷, 𝐵𝐹, 𝐶𝐷, 𝐶𝐸, 𝐷𝐸, 𝐷𝐹, 𝐸𝐹, 𝐸𝐺, 𝐹 𝐻, 𝐺𝐻 }
(a) Connect the dots below to get one drawing.
(Picture: eight dots labeled 𝐴–𝐻, with 𝐵, 𝐴, 𝐶, 𝐷 in one column and 𝐸, 𝐺,
𝐻, 𝐹 in another.)
(b) A planar graph is one that can be drawn in the plane so that its edges do not
cross. Show that G is planar.
3.24 A person keeps six species of fish as pets. Species 𝐴 cannot be in a tank with
species 𝐵 or 𝐶 . Species 𝐵 cannot be with 𝐴, 𝐶 , or 𝐷 . Species 𝐶 cannot be with 𝐴,
𝐵 , 𝐷 , or 𝐸 . Species 𝐷 cannot be with 𝐵 , 𝐶 or 𝐹 . Species 𝐸 cannot be together with
𝐶 , or 𝐹 . Finally, species 𝐹 cannot be in with 𝐷 or 𝐸 . (a) Draw the graph where
the nodes are species and the edges represent the relation ‘cannot be together’.
(b) Find the chromatic number. (c) Interpret it.
✓ 3.25 If two cell towers are within line of sight of each other then they must be
assigned different frequencies. Below each tower is a vertex and an edge between
towers denotes that they can see each other. What is the minimal number of
frequencies? Give an assignment of frequencies to towers.
(Picture: eleven towers 𝑣0 –𝑣10 , with an edge between each pair of towers that
can see each other.)
3.26 For the graph in the prior exercise, give the degree sequence.
✓ 3.27 For a blood transfusion, unless the recipient is compatible with the donor’s
blood type they can have a severe reaction. Compatibility depends on the presence
or absence of two antigens, called A and B, on the red blood cells. This creates
four major groups: A, B, O (the cells have neither antigen), and AB (the cells have
both). There is also a protein called the Rh factor that can be either present (+)
or absent (–). Thus there are eight common blood types, A+, A-, B+, B-, O+, O-,
AB+, and AB-. If the donor has the A antigen then the recipient must also have it,
and the B antigen and Rh factor work the same way. Draw a directed graph where
the nodes are blood types and there is an edge from the donor to the recipient if
transfusion is safe. Produce the adjacency matrix.
3.28 Find the degree sequence of the graph in Example 3.2 and of the two graphs
of Example 3.4.
3.29 Give the array representation, like that in equation (∗), for the graphs of
Example 3.4.
3.30 Draw a graph for this adjacency matrix.
      𝑣0 𝑣1 𝑣2 𝑣3
  𝑣0 ⎛ 0  1  1  0 ⎞
  𝑣1 ⎜ 1  0  0  1 ⎟
  𝑣2 ⎜ 1  0  0  1 ⎟
  𝑣3 ⎝ 0  1  1  0 ⎠
3.39 We can use degrees and degree sequences to show that graphs are not
isomorphic, or to help construct isomorphisms if they exist. (In this question graphs
can have loops and multiple edges between vertices, but not directed edges or
edges with weights.)
(a) Show that if two graphs are isomorphic then they have the same number of
vertices. Thus graphs with different numbers of vertices are not isomorphic.
(b) Show that if two graphs are isomorphic then they have the same number of
edges. Thus graphs with different numbers of edges are not isomorphic.
(c) Show that if two graphs are isomorphic and one has a vertex of degree 𝑘 then
so does the other. Thus two graphs where one has a degree 𝑘 vertex and the
other does not are not isomorphic.
(d) Show that if two graphs are isomorphic then for each degree 𝑘 , the number of
vertices of the first graph having that degree equals the number of vertices
of the second graph having that degree. Thus graphs with different degree
sequences are not isomorphic.
(e) Use the prior result to show that the two graphs of Example 3.4 are not
isomorphic.
(f) Verify that while these two graphs have the same degree sequence, they are
not isomorphic. Hint: consider the paths starting at the degree 3 vertex.
𝑣2 𝑣3 𝑤0
𝑣0 𝑣1 𝑤2 𝑤3 𝑤4 𝑤5
𝑣4 𝑣5 𝑤1
As in the final item, in arguments we often use the contrapositive of these statements.
For instance, the first item implies that if they do not have the same number of
vertices then they are not isomorphic.
✓ 3.40 Consider these two graphs, G0 and G1 .
(Pictures: G0 has vertices 𝑣0 –𝑣7 and G1 has vertices 𝑛0 –𝑛7 .)
3.41 These two are the base and inductive steps for a proof of Lemma 3.12.
(a) An edge is a length-one walk. Show that in the product of the matrix with
itself, M ( G )², the entry 𝑖, 𝑗 is the number of length-two walks.
(b) Show that for 𝑛 > 2, the 𝑖, 𝑗 entry of the power M ( G )ⁿ equals the number
of length 𝑛 walks from 𝑣𝑖 to 𝑣 𝑗 .
3.42 In a finite graph, for a node 𝑞 0 there may be some nodes 𝑞𝑖 that are
unreachable, so there is no path from 𝑞 0 to 𝑞𝑖 .
(a) Devise an algorithm that inputs a directed graph and a start node 𝑞 0 , and
finds the set of nodes that are unreachable from 𝑞 0 .
(b) Apply your algorithm to these two graphs, starting with 𝑤 0 .
(Pictures: two digraphs, one with vertices 𝑤0 –𝑤4 and one with vertices 𝑤0 –𝑤3 .)
Extra
III.A BNF
We shall introduce some grammar notation conveniences that are widely used.
Together they are called Backus-Naur form, BNF.
The study of grammar, the rules for phrase structure
and forming sentences, has a long history, dating
back as early as the fifth century BC. Mathematicians,
including A Thue and E Post, began systematizing it
as rewriting rules by the early 1900’s. The variant we
see here was produced in the late 1950’s by J Backus with contributions from
P Naur as part of the design of the early computer language ALGOL60. (Pictured:
John Backus 1924–2007 and Peter Naur 1928–2016.) Since then
these rules have become the most common way to express grammars.
One difference from Section 2 is a minor typographical change. Metacharacters
including ‘→’ were at the time not typeable with a standard keyboard. In its place
BNF uses ‘::=’.†
BNF is both clear and concise. It can express the range of languages that we
ordinarily want to express (context free grammars) and it smoothly translates to
a parser. That is, BNF is an impedance match — it fits with what we want to do.
Here we will include some extensions for grouping and replication that are like
what you typically see in the wild.‡
1.1 Example This is a BNF grammar for real numbers with a finite decimal part. To
the rules for ⟨natural⟩ from Example 2.6, add these.
† There are other typographical issues that arise with grammars. While many authors write nonterminals
with diamond brackets, as we do, others use a separate type style or color.
‡ BNF is only loosely defined. While there are standards, often what you see does not conform exactly
to any single standard.
Extra A. BNF 169
1.3 Example This grammar for Python floating point numbers shows both square
brackets and the plus sign.
⟨floatnumber⟩ ::= ⟨pointfloat⟩ | ⟨exponentfloat⟩
⟨pointfloat⟩ ::= [ ⟨intpart⟩ ] ⟨fraction⟩ | ⟨intpart⟩ .
⟨exponentfloat⟩ ::= ( ⟨intpart⟩ | ⟨pointfloat⟩ ) ⟨exponent⟩
⟨intpart⟩ ::= ⟨digit⟩ +
⟨fraction⟩ ::= . ⟨digit⟩ +
⟨exponent⟩ ::= (e | E) [+ | -] ⟨digit⟩ +
In the ⟨pointfloat⟩ rule the first ⟨intpart⟩ is optional. And, an ⟨intpart⟩ consists of
one or more digits.
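Because these rules involve no recursion, they translate nearly symbol-for-symbol into a regular expression, giving a quick way to test strings against the grammar. This is a sketch only; the real Python tokenizer has additional cases, such as underscores in digit strings.

```python
import re

digit = "[0-9]"
intpart = f"{digit}+"                          # <digit>+
fraction = rf"\.{digit}+"                      # . <digit>+
pointfloat = rf"(?:(?:{intpart})?{fraction}|{intpart}\.)"
exponent = rf"[eE][+-]?{digit}+"
exponentfloat = rf"(?:{intpart}|{pointfloat}){exponent}"
floatnumber = re.compile(rf"(?:{exponentfloat}|{pointfloat})\Z")

for s in ["3.14", ".5", "10.", "1e5", "3.14E-2", "42", "e5"]:
    print(s, bool(floatnumber.match(s)))       # the last two print False
```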
Each of these extension constructs is not necessary in that we can express the
grammars without the extensions. For instance, we could replace this use of
Kleene star
⟨identifier⟩ ::= ⟨letter⟩ ( ⟨letter⟩ | ⟨digit⟩ )*
with this.
⟨identifier⟩ ::= ⟨letter⟩ | ⟨letter⟩ ⟨atoms⟩
⟨atoms⟩ ::= ⟨letter⟩ ⟨atoms⟩ | ⟨digit⟩ ⟨atoms⟩ | 𝜀
But these constructs come up often enough that adopting an abbreviation is a
significant convenience.
Passing from the grammar to a parser for that grammar is mechanical. There
are programs that take as input a grammar, often one in BNF, and give as output
source code that will parse files following that grammar’s format. Such a program
is a parser-generator (sometimes instead called a compiler-compiler, which is a
fun term but is misleading because a parser is only part of a compiler).
III.A Exercises
✓ A.4 US ZIP codes have five digits, and may have a dash and four more digits at
the end. Give a BNF grammar.
A.5 Write a grammar in BNF for the language of palindromes, using Σ = { a, ... z }.
✓ A.6 At a college, course designations have a form like ‘MA 208’ or ‘PSY 101’,
where the department is two or three capital letters and the course is three digits.
Give a BNF grammar.
✓ A.7 Example 1.3 uses some BNF convenience abbreviations.
(a) Give a rule (or rules) equivalent to ⟨pointfloat⟩ but that doesn’t use square
brackets.
(b) Similarly replace the repetition operator in ⟨intpart⟩ ’s rule, as well as the
square brackets and repetition for ⟨exponent⟩ .
✓ A.8 In Roman numerals the letters I, V, X, L, C, D, and M stand for the values 1,
5, 10, 50, 100, 500, and 1 000. We represent natural numbers by writing these
letters from left to right in descending order of value, so that XVI represents the
number that in decimal notation is 16, while MDCCCCLVIII represents 1958. We
always write the shortest possible string, so we do not write IIIII because we can
instead write V. However, as we don’t have a symbol whose value is larger than
1 000 we must represent large numbers with lots of M’s.
(a) Give a grammar for the strings that make sense as Roman numerals.
(b) Often Roman numerals are written in subtractive notation: for instance, 4 is
represented as IV, because four I’s are hard to distinguish from three of them
in a setting such as the face of a watch or clock. In this notation 9 is IX, 40
is XL, 90 is XC, 400 is CD, and 900 is CM. Give an extended BNF grammar for
the strings that can appear in this notation.
A.9 This grammar is for a small C-like programming language.
⟨program⟩ ::= { ⟨statement-list⟩ }
⟨statement-list⟩ ::= [ ⟨statement⟩ ; ]*
⟨statement⟩ ::= ⟨data-type⟩ ⟨identifier⟩
| ⟨identifier⟩ = ⟨expression⟩
| print ⟨identifier⟩
| while ⟨expression⟩ { ⟨statement-list⟩ }
⟨data-type⟩ ::= int | boolean
⟨expression⟩ ::= ⟨identifier⟩ | ⟨number⟩ | ( ⟨expression⟩ ⟨operator⟩
⟨expression⟩ )
⟨identifier⟩ ::= ⟨letter⟩ [ ⟨letter⟩ ]*
⟨number⟩ ::= ⟨digit⟩ [ ⟨digit⟩ ]*
⟨operator⟩ ::= + | ==
⟨letter⟩ ::= A | B | . . . | Z
⟨digit⟩ ::= 0 | 1 | . . . | 9
(a) Give a derivation and parse tree for this program.
{ int A ;
A = 1 ;
print A ;
}
Extra
III.B Graph traversal
In a number of places in this book we describe traversing a tree or other graph. For
example, when we described Cantor’s correspondence enumerating the set N × N,
we drew this array.
⋮ ⋮ ⋮ ⋮
⟨0, 3⟩ ⟨1, 3⟩ ⟨2, 3⟩ ⟨3, 3⟩ ···
⟨0, 2⟩ ⟨1, 2⟩ ⟨2, 2⟩ ⟨3, 2⟩ ···
⟨0, 1⟩ ⟨1, 1⟩ ⟨2, 1⟩ ⟨3, 1⟩ ···
⟨0, 0⟩ ⟨1, 0⟩ ⟨2, 0⟩ ⟨3, 0⟩ ···
Number 0 1 2 3 4 5 6 ...
Pair ⟨0, 0⟩ ⟨0, 1⟩ ⟨1, 0⟩ ⟨0, 2⟩ ⟨1, 1⟩ ⟨2, 0⟩ ⟨0, 3⟩ . . .
We can make a graph by connecting each pair in the array to its neighbor above,
and the one to the right. This shows the result with the lower left rotated to the
top.
⟨0, 0⟩
⟨0, 1⟩  ⟨1, 0⟩
⟨0, 2⟩  ⟨1, 1⟩  ⟨2, 0⟩
⟨0, 3⟩  ⟨1, 2⟩  ⟨2, 1⟩  ⟨3, 0⟩
This graph isn’t a tree because there are vertices that are connected by more than
one path, for instance ⟨0, 0⟩ and ⟨1, 1⟩ . Instead it is a directed acyclic graph, a
DAG. Cantor’s enumeration is a breadth first traversal of the DAG.
Here we will show Racket code to traverse trees and DAG’s (they are alike in
that they have no cycles).
We will show two ways to do that. Below, on the left is a tree with ten nodes.
On the right the table illustrates visiting the nodes in depth-first order, where
we visit a node’s children before going on to visit its siblings. It also illustrates
breadth-first, where we cover all nodes that are at rank 𝑘 before visiting any nodes
of rank 𝑘 + 1 (a node is rank 𝑘 when a minimal path to the root has 𝑘 edges).
[Figure: a tree with ten nodes named a through j, beside a table of its depth-first and breadth-first visit orders.]
Note that children is created as a set so that we can quickly access its members.
This set is mutable because as we create the tree we will add children to that set,
so we must be able to change it.
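The definition of the node structure does not appear in this excerpt; here is a minimal sketch consistent with the calls that follow (that the book defines it with a Racket struct in exactly this shape is an assumption).

```racket
; A minimal node structure consistent with the calls used below.
; Each node has a name and a mutable set of children; the struct
; supplies the accessors node-name and node-children.
(struct node (name children))

(define (node-create name)
  (node name (mutable-set)))
```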
We use that routine to create new trees and DAG’s.
(define (graph-create first-node-name)
(node-create first-node-name))
The next routine inputs a node and adds a child to its set of children. (LISP-
derived languages have a convention of using an exclamation mark for the names
of procedures whose main role is not to return something but instead to cause side
effects such as altering a data structure.)†
(define (node-add-child! parent child-name)
(let ([n (node-create child-name)])
(set-add! (node-children parent) n)
n))
†
The code here has a way to tie from parent to child but no direct way to tie back. So this tree is
directed. We can of course write code for breadth-first traversals of undirected trees but for our purposes
this suffices.
And this returns the finite portion of Cantor’s array shown earlier.
(define (cantor-DAG-make)
(let* ([t (graph-create "0,0")]
[nb (node-add-child! t "0,1")]
[nc (node-add-child! t "1,0")]
[nd (node-add-child! nb "0,2")]
[ne (node-add-child! nb "1,1")]
[v0 (set-add! (node-children nc) ne)]
[nf (node-add-child! nb "2,0")]
[ng (node-add-child! nd "0,3")]
[nh (node-add-child! nd "1,2")]
[v1 (set-add! (node-children ne) nh)]
[ni (node-add-child! ne "2,1")]
[v2 (set-add! (node-children nf) ni)]
[nj (node-add-child! nf "3,0")]
)
t))
To demonstrate the traversal code, at each node we will just print out the name,
(define (show-node-name n r)
(printf "~a~a\n" (string-pad r) (node-name n)))
Mathematical sets are unordered, so when we show elements it can be that the
order out differs from the order in.
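The depth-first code itself does not appear in this excerpt; here is a minimal sketch consistent with the data structures above (the name traverse-dfs, the rank argument, and the repeated stand-in node definitions are our assumptions, included so the sketch runs on its own).

```racket
; Stand-in node structure, as in the text: a name and a mutable
; set of children.
(struct node (name children))
(define (node-create name) (node name (mutable-set)))
(define (node-add-child! parent child-name)
  (let ([n (node-create child-name)])
    (set-add! (node-children parent) n)
    n))

; Depth-first traversal of a tree: apply fcn to a node and then
; recurse into each child before moving on to that node's siblings.
(define (traverse-dfs n fcn [rank 0])
  (fcn n rank)
  (for ([child (node-children n)])
    (traverse-dfs child fcn (+ rank 1))))
```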
Now for breadth-first traversal. It comes in two functions, traverse-bfs and traverse-bfs-helper. In Scheme-derived languages such as Racket, routines are often organized with a caller and a helper. This is because the helper function is tail-recursive: its very last action is the recursive call, and in a Scheme language the compiler knows that it can translate such a routine into executable code that is iterative. This combines the expressiveness of recursion with the memory conservation of iteration.
The strategy of traverse-bfs-helper is that at each level, when rank is 𝑘, the routine traverses all nodes at that rank by moving through the members of level. As it does so, it stores all of the children of those nodes in the set next-level.
(define (traverse-bfs node fcn #:maxrank [maxrank MAXIMUM-RANK])
  (traverse-bfs-helper (mutable-set node) 0 fcn #:maxrank maxrank))

(define (traverse-bfs-helper level rank fcn #:maxrank [maxrank MAXIMUM-RANK])
  (when (< rank maxrank)
    (let ([next-level (mutable-set)])
      (for ([node level])
        (fcn node rank)
        (for ([child-node (node-children node)])
          (set-add! next-level child-node)))
      (when (not (set-empty? next-level))
        (traverse-bfs-helper next-level (+ 1 rank) fcn #:maxrank maxrank)))))
This strategy cannot send the routine around a cycle forever because both trees and DAGs are acyclic.
Here is the result of running the routine on the sample tree.
> (define t (sample-tree-make))
> (traverse-bfs t show-node-name)
a
b
c
d
h
e
f
g
i
j
III.B Exercises
B.1 This is a binary tree because each node has either two children or none (in
the definition some authors also allow one child).
a
b c
d e f g
h i
Section
IV.1 Finite State machines
We produce a new model of computation, the Finite State machine, by modifying the Turing machine definition. We will strip out the capability to write, changing the head from read/write to read-only. It will turn out that these machines can do many things, but not as many as Turing machines.
Definition We begin with some examples.
1.1 Example This power switch has two states, 𝑞 off and 𝑞 on , and its input alphabet
has one token, toggle. (Its standard symbol is on the right.)
[Diagram: states 𝑞off and 𝑞on with a toggle arrow from each to the other, beside the standard power symbol.]
The state 𝑞 on is drawn with a double circle, denoting that it is a different kind
of state than 𝑞 off . Finite State machines can’t write to the tape so they need some
other way to declare the computation’s outcome. We say that 𝑞 on is an accepting
state or final state. A computation accepts its input string if it ends with the
machine in an accepting state.
1.2 Example Operate the turnstile below by putting in two tokens and then pushing
through. It has three states and its input alphabet is Σ = { token, push }. As with
Turing machines, the states here serve as a form of memory, although a limited
one. For instance, 𝑞 one is how the turnstile “remembers” that it has so far received
one token.
Image: The astronomical clock in Notre-Dame-de-Strasbourg Cathedral, for computing the date of
Easter. Easter falls on the first Sunday after the first full moon on or after the nominal spring equinox of
March 21. Calculation of this date was a great challenge for mechanisms of that time, 1843. † Studying
the parts of the machine is natural but there is another motivation. A person could object to Turing’s
model by observing that there is a machine that iterates writing a character and then moving right,
and thereby goes through unboundedly many configurations, while no physical device can do that.
A rejoinder is that we for instance define a ‘book’ to be pages with words and don’t worry whether
physics limits the number of possible pages. Happily, we don’t need to go into this to justify our interest.
Tapeless machines are quite practical, appearing often in everyday computing, which is justification
enough.
[Diagram: the turnstile machine, with token and push arrows among its three states.]
1.3 Example This vending machine dispenses items that cost 30 cents.† The picture
is complex so we will show it in three layers. First are the arrows for nickels.
push n
After receiving 30 cents and getting another nickel, this machine does something
not very sensible: it stays in 𝑞 30 . In practice a machine would have further states to
keep track of overages so that it could give change but here we ignore that. Next
comes the arrows for dimes
[Diagram: dime arrows for the vending machine.]
[Diagram: states 𝑞0, 𝑞1, 𝑞2, 𝑞3 in a cycle, with an arrow labeled 1 from each state to the next, a self-loop labeled 0 on each state, and 𝑞0 both start state and accepting state.]
This machine accepts a bitstring if the number of 1’s in its input is a multiple of
four.
†
US coins are: 1 cent coins not used here, nickels are 5 cents, dimes are 10 cents, and quarters are 25.
The picture shows a light labeled ‘Accept’. When the machine stops, when the input
string is fully consumed, if the current state is an accepting state then the light
comes on. In this case we say that the machine accepts the input string, otherwise
it rejects that string.
Here is a trace of the steps when we start Example 1.4’s modulo 4 machine
with the input string 𝜏 = 10110. Since the ending state 𝑞 3 is not accepting the
machine rejects 𝜏 .
Step     0       1      2     3    4   5
State    𝑞0      𝑞1     𝑞1    𝑞2   𝑞3  𝑞3
Unread   10110   0110   110   10   0   𝜀
In contrast with the traces in the first chapter, here we hold the head still and move
the tape. This emphasizes that Finite State machines consume one character per
step. They stop once all the characters are gone so they are sure to halt — there is
no Halting problem for Finite State machines.
1.6 Example The machine below accepts a string if and only if it contains at least two
0’s as well as an even number of 1’s. (In tables we mark accepting states with ‘+’).
[Diagram: the transition graph for this machine.]
Δ        0     1
𝑞0       𝑞1    𝑞3
𝑞1       𝑞2    𝑞4
+ 𝑞2     𝑞2    𝑞5
𝑞3       𝑞4    𝑞0
𝑞4       𝑞5    𝑞1
𝑞5       𝑞5    𝑞2
1.7 Remark We pause to briefly address the key to designing Finite State machines.
Often people new to them put down a 𝑞 0 , think of some input strings and then
add states accounting for those inputs. This can give haphazard results.
Proceeding in this way is thinking of a state as about what happened to get
there. Better is to think of states as about the future. The prior example brings
this out: articulating the role of state 𝑞 1 gives something like, “waiting for a 0” or
possibly “waiting for at least one 0”. Similarly 𝑞 5 is “waiting for a 1.” Another
example is that state 𝑞 4 is looking for a 0 followed by a 1.
Finite State machine descriptions may take the alphabet to be clear from the
context. Thus, Example 1.6’s alphabet is B = { 0, 1 }. For in-practice machines, the
alphabet is the set of characters that the machine could conceivably receive, so that
a text-handling routine built to modern standards might well accept all of Unicode.
But for the examples and exercises in this book we will use small alphabets.†
1.8 Example This machine accepts strings that are valid decimal representations of
integers. So it accepts the strings 21 and -7 and +37 but does not accept 501-.
The transition graph and the table both group some inputs together when they
result in the same action. For instance, when in state 𝑞 0 this machine does the
same thing whether the input is + or -, namely it passes into 𝑞 1 .
[Diagram: 𝑞0 passes to 𝑞1 on + or -; both 𝑞0 and 𝑞1 pass on 0, . . . , 9 to the accepting state 𝑞2, which has a self-loop on 0, . . . , 9; any other input leads to the error state 𝑒.]
Δ        +, -   0, . . . , 9   –other–
𝑞0       𝑞1     𝑞2             𝑒
𝑞1       𝑒      𝑞2             𝑒
+ 𝑞2     𝑒      𝑞2             𝑒
𝑒        𝑒      𝑒              𝑒
Any wrong input character sends the machine to the state 𝑒 . Finite State machines
often have an error state, which is a sink in that once the machine enters that state
then it never leaves.
1.9 Example This machine accepts strings that are members of the set { jpg, pdf, png }.
It is our first example with more than one accepting state.
†
We often use the characters a, b, c, etc., because something like ‘b2 ’ is clearer than something like ‘12 ’.
[Diagram: from 𝑞0, j leads to 𝑞1, then p to 𝑞2, then g to the accepting state 𝑞3. Also from 𝑞0, p leads to 𝑞4; from there d leads to 𝑞5 and then f to the accepting state 𝑞6, while n leads to 𝑞7 and then g to the accepting state 𝑞8.]
That drawing omits many edges, the ones involving the error state 𝑒 . For instance,
from state 𝑞 0 any input character other than j or p is an error. We omit all of these
edges because they would make the drawing hard to read. This illustrates that
while pictures are better for simple machines, past some point of complexity, a
transition table presentation is better than a picture.
That example points out that if a language is finite then there is a Finite State
machine that accepts a string if and only if it is a member of that language.
1.10 Example Finite State machines can accomplish reasonably hard tasks. This one
accepts strings representing natural numbers that are multiples of three such as 15
and 8013, and does not accept non-multiples such as 14 and 8012.
Δ        0,3,6,9   1,4,7   2,5,8
+ 𝑞0     𝑞0        𝑞1      𝑞2
𝑞1       𝑞1        𝑞2      𝑞0
𝑞2       𝑞2        𝑞0      𝑞1
This machine accepts the empty string. Exercise 1.26 asks for a modification to
accept only non-empty strings.
1.11 Example Finite State machines translate easily to code. Here is the Racket code
for the delta function of the prior example’s multiple of three machine.
(define (delta state ch)
(cond
[(= state 0)
(cond
((memv ch '(#\0 #\3 #\6 #\9)) 0)
((memv ch '(#\1 #\4 #\7)) 1)
(else 2))]
[(= state 1)
(cond
((memv ch '(#\0 #\3 #\6 #\9)) 1)
((memv ch '(#\1 #\4 #\7)) 2)
(else 0))]
[else
(cond
((memv ch '(#\0 #\3 #\6 #\9)) 2)
((memv ch '(#\1 #\4 #\7)) 0)
(else 1))]))
(In Racket, a character such as ‘0’ is denoted #\0. The routine memv tests whether the character ch is in the list; here it serves as a boolean function.) All that’s left is to supply a calling function and a helper.
(define (multiple-of-three-fsm-helper state tau-list)
  (if (null? tau-list)
      state
      (multiple-of-three-fsm-helper (delta state (car tau-list))
                                    (cdr tau-list))))
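The calling function does not appear in this excerpt; here is a possible version, shown together with the text's delta and helper so that it runs on its own (the caller's name and its final comparison against the accepting state 0 are our assumptions).

```racket
; delta as given in the text, for the multiple-of-three machine.
(define (delta state ch)
  (cond
    [(= state 0)
     (cond ((memv ch '(#\0 #\3 #\6 #\9)) 0)
           ((memv ch '(#\1 #\4 #\7)) 1)
           (else 2))]
    [(= state 1)
     (cond ((memv ch '(#\0 #\3 #\6 #\9)) 1)
           ((memv ch '(#\1 #\4 #\7)) 2)
           (else 0))]
    [else
     (cond ((memv ch '(#\0 #\3 #\6 #\9)) 2)
           ((memv ch '(#\1 #\4 #\7)) 0)
           (else 1))]))

(define (multiple-of-three-fsm-helper state tau-list)
  (if (null? tau-list)
      state
      (multiple-of-three-fsm-helper (delta state (car tau-list))
                                    (cdr tau-list))))

; A possible caller (this name and the test against state 0 are our
; assumptions): run the helper from the start state and accept when
; the ending state is the accepting state 0.
(define (multiple-of-three-fsm tau)
  (= 0 (multiple-of-three-fsm-helper 0 (string->list tau))))
```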
1.12 Example In the 1940’s, phone call connections were handled by simple devices for
local calls but required operator intervention for long distance. That changed with
the adoption of the Finite State machine here, which allowed users to directly dial
long distance in North America. Consider dialing 1-802-555-0101. The initial 1
means that the call leaves the local office. The 802 is an area code; the system can
tell that this is not a same-area local exchange because its second digit is 0 or 1.
Next, the 555 routes the call to a local office. Then that office’s device makes the
connection to line 0101.
[Diagram: the long-distance dialing machine. Legend: 𝑥 stands for 0, . . . , 9; 𝑛 for 2, . . . , 9; 𝑝 for 0, 1.]
Today, area codes are no longer required to have a middle digit of 0 or 1. This additional flexibility is possible because switching now happens entirely in software.
After the definition of Turing machine we gave a complete description of the
action of those machines. We now do the same for Finite State machines. A
configuration of a Finite State machine is a pair C = ⟨𝑞, 𝜏⟩ where 𝑞 is a state,
𝑞 ∈ 𝑄 , and 𝜏 is a (possibly empty) string, 𝜏 ∈ Σ∗. A machine starts in an initial
configuration C0 = ⟨𝑞 0, 𝜏0 ⟩ , so that 𝜏0 is the input and 𝑞 0 is the initial state.
1.16 Definition For any Finite State machine, the extended transition function
Δ̂ : Σ∗ → 𝑄 gives the state in which the machine ends after starting in the start
state and consuming the given string.
1.17 Example Consider this machine and its transition function.
[Diagram: 𝑞0 has a self-loop on b and passes to 𝑞1 on a; 𝑞1 has a self-loop on a and passes to 𝑞2 on b; 𝑞2 has a self-loop on b and passes back to 𝑞1 on a; 𝑞1 is accepting.]
Δ        a     b
𝑞0       𝑞1    𝑞0
+ 𝑞1     𝑞1    𝑞2
𝑞2       𝑞1    𝑞2
Its extended transition function Δ̂ extends Δ in that it repeats the first row of Δ’s table.
†
As earlier, read ⊢ aloud as “yields.” ‡ Read ⊢∗ as “yields eventually.” # Finite State machines must halt and so there is no notion like computably enumerable. Thus the languages that such a machine can decide are the same as the languages that it can recognize (in contrast with the case for Turing machines, as defined on page 11). For these machines, ‘recognized’ is the more common term.
Δ̂(a) = 𝑞1        Δ̂(b) = 𝑞0
(We disregard the difference between Δ’s input of characters and Δ̂’s input of length one strings.) This is Δ̂ on the length two strings.
Δ̂(aa) = 𝑞1    Δ̂(ab) = 𝑞2    Δ̂(ba) = 𝑞1    Δ̂(bb) = 𝑞0
IV.1 Exercises
For the exercises that give a language description, a useful practice is to think through
that description by naming five strings that are in the language and five that are not.
✓ 1.18 Using this machine, trace through the computation when the input is (a) abba
(b) bab (c) bbaabbaa. Does the machine accept the string?
[Diagram: 𝑞0 has a self-loop on b and passes to 𝑞1 on a; 𝑞1 has a self-loop on a and passes to 𝑞2 on b; 𝑞2 has a self-loop on b and passes back to 𝑞1 on a; 𝑞1 is accepting.]
1.19 True or false: because a Finite State machine is finite, its language must be
finite.
1.20 Your classmate says, “I have a language L that recognizes the empty string 𝜀 .”
Explain to them the mistake.
1.21 Rebut “no Finite State machine can recognize the language { a𝑛b | 𝑛 ∈ N } because 𝑛 is infinite.”
✓ 1.22 How many transitions does an input string of length 𝑛 cause a Finite State
machine to undergo? 𝑛 many? 𝑛 + 1? 𝑛 − 1? How many (not necessarily distinct)
states will the machine have visited after consuming the string?
✓ 1.23 For each of these descriptions of a language, give a one or two sentence
informal English-language description. Also list five strings that are elements as
well as five that are not, if there are that many.
(a) L = {𝛼 ∈ { a, b }∗ | 𝛼 = a𝑛ba𝑛 for 𝑛 ∈ N }
(d) { a𝑛ba𝑛+2 ∈ { a, b }∗ | 𝑛 ∈ N }
✓ 1.24 For the machines of Example 1.6, Example 1.8, Example 1.9, and Ex-
ample 1.10, answer these. (a) What are the accepting states? (b) Does it
accept the empty string 𝜀 ? (c) What is the shortest string that each accepts?
(d) Is the language of accepted strings infinite?
1.25 As in Example 1.13, give the computation for the multiple of three machine
with the initial string 2332.
1.26 Modify the machine of Example 1.10 so that it accepts only non-empty
strings.
1.27 Produce the transition graph picturing this transition function. What is the
machine’s language?
Δ a b
𝑞0 𝑞2 𝑞1
+ 𝑞1 𝑞0 𝑞2
𝑞2 𝑞2 𝑞2
✓ 1.29 For each language, name five strings in the language and five that are not
(if there are not five, name as many as there are). Then produce a Finite State
machine that recognizes that language. Give both a circle diagram and a transition
function table. The alphabet is Σ = { a, b }.
(a) L1 = {𝜎 ∈ Σ∗ | 𝜎 has at least one a and at least one b }
(b) L2 = {𝜎 ∈ Σ∗ | 𝜎 has fewer than three a’s }
(c) L3 = {𝜎 ∈ Σ∗ | 𝜎 ends in ab }
(d) L4 = { a𝑛b𝑚 ∈ Σ∗ | 𝑛, 𝑚 ≥ 2 }
(e) L5 = { a𝑛b𝑚a𝑝 ∈ Σ∗ | 𝑚 = 2 and 𝑛, 𝑝 ∈ N }
1.30 Consider the language of strings over Σ = { a, b } containing at least two a’s
and at least two b’s. Name five elements of the language and five non-elements,
if there are that many. Then produce a Finite State machine recognizing this
language. As in Example 1.6, briefly describe the intuitive meaning of the states.
✓ 1.31 For each language give a transition graph and table for a Finite State machine
recognizing the language. Use Σ = { a, b }.
(a) {𝜎 ∈ Σ∗ | 𝜎 has at least two a’s }
(b) {𝜎 ∈ Σ∗ | 𝜎 has exactly two a’s }
(b) {𝜎 ∈ { a, b }∗ | 𝜎 = 𝜀 }
(c) {𝜎 ∈ { a, b }∗ | 𝜎 = a³b or 𝜎 = ba³ }
(d) {𝜎 ∈ { a, b }∗ | 𝜎 = a𝑛 or 𝜎 = b𝑛 for 𝑛 ∈ N }
1.34 Produce a Finite State machine over the alphabet Σ = { A, ... Z, 0, ... 9 } that
accepts only the string 911, and a machine that accepts any string but that one.
1.35 Using Example 1.17, apply the extended transition function to all of the
length three and length four string inputs.
1.36 What happens when the input to an extended transition function is the
empty string?
✓ 1.37 Consider a language of comments that begin with the two-character string
/#, end with the two-character string #/, and have no #/ substrings in the middle.
Give a Finite State machine to recognize that language.
✓ 1.38 Produce a Finite State machine that recognizes each.
(a) {𝜎 ∈ { 0, ... 9 }∗ | 𝜎 has either no 0’s or no 2’s }
✓ 1.39 Give a Finite State machine over the alphabet Σ = { A, ... Z } that accepts
only strings in which the vowels occur in ascending order. (The traditional vowels,
in ascending order, are A, E, I, O, and U.)
✓ 1.40 Consider this grammar.
⟨real⟩ → ⟨posreal⟩ | + ⟨posreal⟩ | - ⟨posreal⟩
⟨posreal⟩ → ⟨natural⟩ | ⟨natural⟩ . | ⟨natural⟩ . ⟨natural⟩
⟨natural⟩ → ⟨digit⟩ | ⟨digit⟩ ⟨natural⟩
⟨digit⟩ → 0 | . . . | 9
(a) Give five strings of terminals that are in its language and five that are not.
(b) Does the language contain the string .12? (c) Briefly describe the language.
(d) Give a Finite State machine that recognizes the language.
1.41 Produce a Finite State machine for each.
(a) {𝜎 ∈ B∗ | every 1 in 𝜎 has a 0 just before it and just after }
(b) {𝜎 ∈ B∗ | 𝜎 represents in binary a number divisible by 4 }
(c) {𝜎 ∈ { 0, ... 9 }∗ | 𝜎 represents in decimal an even number }
Section
IV.2 Nondeterminism
Turing machines and Finite State machines both have the property that, given the
current state and current character, the next state is completely determined. Once
we lay out an initial tape and push Start then the machine just walks through a
fixed succession of step/next step calculations. We now consider machines that
are nondeterministic, ones for which there may be configurations where there
is more than one next state, or configurations where there is just one, or even
configurations without any next state at all.
Motivation Imagine a grammar with some rules and a start symbol. We get a
string and are asked to find a derivation of it. The challenge is that we sometimes
don’t know which rules the derivation should follow. For instance, if we have
S → BaS | AbA then from S we can do two different things: which will work?
In the Grammar section’s exercises we expected that an intelligent person
would have the insight to guess the right way. If instead we were writing a program
then we might have it try every case — we might do a breadth-first traversal of the
directed acyclic graph of all derivations — until the program finds a success.
The American philosopher and Hall of Fame baseball catcher Yogi Berra said, “When you come to a fork in the road, take it.” That’s a natural way to attack this problem: when you come up against multiple possibilities, fork a child for each. Thus, the routine might begin with the start state 𝑆 and for each rule that could apply, it spawns a child process, deriving a string one removed from the start. After that, each child finds each rule that could apply to its string and spawns its own children, each of which now has a string that is two removed from the start. Continue until the desired string appears, if it ever does.
[Image: Yogi Berra, 1925–2015]
The prototypical example for this strategy is the celebrated Traveling Salesman
problem, that of finding the shortest circuit visiting every city in a list. For instance,
suppose that we want to know if there is a trip that visits each state capital in the
US lower forty eight states and returns back to where it began, in less than 16 000
kilometers. We start at Montpelier, the capital of Vermont. From there we could
fork a process for each potential next capital, making forty seven new processes.
Thus the process that after Montpelier goes next to Concord, New Hampshire
would know that the trip so far is 188 kilometers. In the next round, each child
would fork its own child processes, forty six of them. At the end, many processes
will have failed to find a short-enough trip but if even one finds it then we consider
the overall search a success.
That computation is nondeterministic in that while it is happening the machine
is simultaneously in many different states. Restated, the computation happens
on an unboundedly-parallel machine, where whenever we need an additional
computing agent, another CPU plus tape, one is available.†
We will have two ways to think about nondeterminism, two mental models.‡
The first is the one introduced above: when a machine is presented with multiple
possible next states then it forks, so that it is in all of them simultaneously. The
next example illustrates.
2.1 Example The Finite State machine below is nondeterministic because leaving 𝑞0 are two arrows labeled 0. It also has states with a deficit of edges: no arrow for 1 leaves 𝑞1, so if the machine is in that state and reads that input then it passes to no state at all.
†
This is like our experience with everyday computers, where we may be writing an email in one window
and watching a video in another. The machine appears to be in multiple states simultaneously. ‡ While
these models are helpful in learning and thinking about nondeterminism, they are not part of the
formal definitions and proofs.
[Diagram: 𝑞0 has a self-loop on 0,1 and also passes to 𝑞1 on 0; 𝑞1 passes to 𝑞2 on 0; 𝑞2 passes to the accepting state 𝑞3 on 1.]
The table below shows the computation history with input 00001. At each step it lists the set of states that the machine occupies; for instance, on the first 0 the computation splits in two, so the machine is then in two states at once.
Step     0      1           2                3                4                5
States   {𝑞0}   {𝑞0, 𝑞1}   {𝑞0, 𝑞1, 𝑞2}   {𝑞0, 𝑞1, 𝑞2}   {𝑞0, 𝑞1, 𝑞2}   {𝑞0, 𝑞3}
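Such a step-by-step state-set computation can be written directly. This sketch encodes Example 2.1's transition function; the names step and run are our choices.

```racket
; delta gives, for a state and a character, the list of possible
; next states of Example 2.1's nondeterministic machine.
(define (delta q c)
  (cond [(and (eq? q 'q0) (char=? c #\0)) '(q0 q1)]
        [(and (eq? q 'q0) (char=? c #\1)) '(q0)]
        [(and (eq? q 'q1) (char=? c #\0)) '(q2)]
        [(and (eq? q 'q2) (char=? c #\1)) '(q3)]
        [else '()]))

; One step: the union of the possible next states over the current set.
(define (step states c)
  (remove-duplicates (append-map (lambda (q) (delta q c)) states)))

; Run the machine on a whole input string from the start state q0,
; yielding the set of states it ends in.
(define (run str)
  (foldl (lambda (c states) (step states c)) '(q0) (string->list str)))
```

The machine accepts the input exactly when the ending set contains the accepting state 𝑞3.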
As an alternative, we can imagine that the machine is furnished with the answer
(“go around twice, then off to the right”) and only has to check it.
When we talk about this way of expressing the second mental
model our convention is to call the furnisher a demon, because they
somehow know answers that cannot otherwise be found but also
because we must be suspicious and check that the answer is not a trick.
Under this model, a nondeterministic computation accepts the input
if there exists a branch of the computation tree that a deterministic
machine, if told what branch to take, could verify.
Below we shall describe nondeterminism using both paradigms: as a machine being in multiple states at once, and also as a machine guessing (or being told and verifying). In this chapter we will do that for Finite State machines and in the fifth chapter we will return to it for Turing machines.
[Image: Flauros, Duke of Hell]
[Diagram: the machine of Example 2.1, beside its transition table.]
Δ        0           1
𝑞0       {𝑞0, 𝑞1}   {𝑞0}
𝑞1       {𝑞2}        { }
𝑞2       { }         {𝑞3}
+ 𝑞3     { }         { }
The imagery in informal terms such as “guess” and “demon” helps introduce the
ideas but may also give an impression that those ideas are fuzzy. So when we next
step through the description of the action of these machines, note that it is precise.
2.5 Remark When we described the action of deterministic Finite State machines
on page 184, we laid out how to construct the sequence of configurations, by
transitioning from each to the succeeding one until the tape is empty. But for
nondeterministic machines there needn’t be one and only one sequence. That
makes a description that is non-constructive the clearer choice.
[Diagram: 𝑞0 has a self-loop on a,b; 𝑞0 passes to 𝑞1 on a and 𝑞1 to 𝑞3 on a; 𝑞0 passes to 𝑞2 on b and 𝑞2 to 𝑞3 on b; the accepting state 𝑞3 has a self-loop on a,b.]
The language of this machine is the set of strings containing the substring aa or bb. For instance, the machine accepts abaaba because there is a sequence of transitions ending in an accepting state.
This machine recognizes the language { (ac)𝑛 | 𝑛 ∈ N } = {𝜀, ac, acac, ... }. The symbol b isn’t attached to any arrow so it won’t play a part in any accepted string.
Often a nondeterministic Finite State machine is easier to write than a deterministic machine that does the same job.
2.10 Example Both of these machines accept any string whose next to last character
is a. The nondeterministic one on the left is simpler than the deterministic one.
[Diagrams: on the left, the nondeterministic machine, in which 𝑞0 has a self-loop on a,b and passes to 𝑞1 on a, and 𝑞1 passes to the accepting state 𝑞2 on a,b; on the right, a four-state deterministic machine for the same language.]
2.12 Example This is a remote control listener that waits to hear the signal 0101110. That is, it recognizes the language {𝜎 ⌢ 0101110 | 𝜎 ∈ B∗ }.
[Diagram: 𝑞0 has a self-loop on 0,1 and a chain of arrows 𝑞0 to 𝑞1 to 𝑞2 to 𝑞3 to 𝑞4 to 𝑞5 to 𝑞6 to 𝑞7 labeled 0, 1, 0, 1, 1, 1, 0 in turn, with 𝑞7 accepting.]
2.13 Example This machine uses an 𝜀 transition.†
[Diagram: 𝑞0 passes to 𝑞1 on +, -, or 𝜀; 𝑞1 passes to the accepting state 𝑞2 on 0, . . . , 9; 𝑞2 has a self-loop on 0, . . . , 9.]
†
For this purpose the ‘𝜀 ’ is a character, not a representation of the empty string. Assume that it is not
an element of Σ.
For instance, with input 123 the machine can begin by following the 𝜀 transition to state 𝑞1, without reading and deleting the leading 1. It next reads that 1 and transitions to 𝑞2, and then stays there while processing the 2 and 3. This branch of the machine’s computation tree accepts its input and so 123 is in the machine’s language.
The practical effect of the 𝜀 is that this machine can accept strings that do not start
with a + or - sign.
2.14 Example This machine has a number of 𝜀 transitions.
[Diagram: 𝑞0 passes to 𝑞1 on a and 𝑞1 to 𝑞2 on b; 𝜀 arrows lead to the branch where c takes 𝑞3 to the accepting state 𝑞4, and to the branch where d takes 𝑞5 to the accepting state 𝑞6.]
⟨𝑞 0, abc⟩ ⊢ ⟨𝑞 1, bc⟩ ⊢ ⟨𝑞 2, c⟩ ⊢ ⟨𝑞 3, c⟩ ⊢ ⟨𝑞 4, 𝜀⟩
A machine may also, in a single step, follow two or more 𝜀 transitions in succession.
Here, it accepts d by transitioning from 𝑞 0 to 𝑞 5 without consuming any input.
⟨𝑞 0, d⟩ ⊢ ⟨𝑞 5, d⟩ ⊢ ⟨𝑞 6, 𝜀⟩
[Diagram of the machine, and the table of its computation on input aab, showing at each step the stripe of states that the machine occupies.]
At each step, the machine is in all of the states that are inside of the step’s stripe.
For instance, at step 0 the machine is in both 𝑞 0 and 𝑞 1 . After exhausting the tape,
at step 3 it is in both 𝑞 0 and 𝑞 2 and because 𝑞 0 is an accepting state, it accepts the
input aab.
The 𝜀 transitions simplify building Finite State machines.
2.17 Example An 𝜀 transition can put two machines together with a parallel connection.
Here is a machine whose states are named with 𝑞 ’s combined in parallel with one
whose states are named with 𝑟 ’s.
[Diagram: a new start state 𝑠0 with 𝜀 arrows to 𝑞0 and to 𝑟0, placing the 𝑞 machine and the 𝑟 machine in parallel.]
We can take the alphabet for the entire machine to be the union, Σ = { a, b, c }.
2.18 Example An 𝜀 transition can also connect machines serially. The machine on the left below recognizes L0 = { (aab)𝑖 | 𝑖 ∈ N }. The one on the right recognizes L1 = {𝜎0 ⌢ · · · ⌢ 𝜎𝑗−1 | 𝑗 ∈ N and 𝜎𝑘 = a or 𝜎𝑘 = aba for 0 ≤ 𝑘 ≤ 𝑗 − 1 }.
[Diagrams: on the left, a machine recognizing L0; on the right, one recognizing L1.]
If we insert an 𝜀 bridge to the right side’s initial state from each of the left side’s
final states (here there is only one such state), and de-finalize those states on the
left,
[Diagram: the two machines joined by an 𝜀 arrow from the left machine’s final state to the right machine’s initial state.]
then the combined machine accepts strings in the concatenation of those languages,
L ( M) = L0 ⌢ L1 . For example, it accepts aabaababa, and aabaa, as well as abaa.
2.19 Example We can also use 𝜀 transitions to get the Kleene star of a language.
Without the 𝜀 edge this machine’s language is L = {𝜀, ab },
[Diagram: 𝑞0 passes to 𝑞1 on a and 𝑞1 to 𝑞2 on b, with an 𝜀 arrow from 𝑞2 back to 𝑞0; 𝑞0 and 𝑞2 are accepting.]
[Diagram: a machine with states 𝑞0 through 𝑞6 and several 𝜀 transitions, whose closures are computed below.]
𝐸 (𝑞, 𝑚) 𝑚 =0 1 2 3 𝐸ˆ(𝑞)
𝑞0 {𝑞 0 } {𝑞 0, 𝑞 2 } {𝑞 0, 𝑞 2, 𝑞 3, 𝑞 5 } {𝑞 0, 𝑞 2, 𝑞 3, 𝑞 5 } {𝑞 0, 𝑞 2, 𝑞 3, 𝑞 5 }
𝑞1 {𝑞 1 } {𝑞 1 } {𝑞 1 } {𝑞 1 } {𝑞 1 }
𝑞2 {𝑞 2 } {𝑞 2, 𝑞 3, 𝑞 5 } {𝑞 2, 𝑞 3, 𝑞 5 } {𝑞 2, 𝑞 3, 𝑞 5 } {𝑞 2, 𝑞 3, 𝑞 5 }
𝑞3 {𝑞 3 } {𝑞 3 } {𝑞 3 } {𝑞 3 } {𝑞 3 }
𝑞4 {𝑞 4 } {𝑞 4 } {𝑞 4 } {𝑞 4 } {𝑞 4 }
𝑞5 {𝑞 5 } {𝑞 5 } {𝑞 5 } {𝑞 5 } {𝑞 5 }
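The closure column can also be produced by a short routine that repeatedly expands along 𝜀 arrows until nothing new appears. This is a sketch: the edge table below is our reading of the 𝜀 arrows behind the table above, and the names are our choices.

```racket
; Epsilon edges, mapping a state to the states reachable by one
; epsilon arrow (our reading of the machine: q0 to q2, and q2 to
; q3 and q5); states with no epsilon arrows are simply absent.
(define eps-edges (hash 'q0 '(q2) 'q2 '(q3 q5)))

; The epsilon closure of q: start with q itself and keep adding
; states reachable by epsilon arrows until the set stops growing.
(define (eps-closure q)
  (let loop ([seen (list q)] [frontier (list q)])
    (if (null? frontier)
        seen
        (let* ([nexts (append-map (lambda (p) (hash-ref eps-edges p '()))
                                  frontier)]
               [new (remove-duplicates
                     (filter (lambda (p) (not (member p seen))) nexts))])
          (loop (append seen new) new)))))
```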
[Diagram: the nondeterministic machine M𝑁 with states 𝑞0, 𝑞1, 𝑞2 and arrows on a and b.]
Δ𝐷                      a     b
𝑠0 = { }                𝑠0    𝑠0
+ 𝑠1 = {𝑞0}             𝑠4    𝑠0
𝑠2 = {𝑞1}               𝑠0    𝑠3
+ 𝑠3 = {𝑞2}             𝑠0    𝑠0
+ 𝑠4 = {𝑞0, 𝑞1}         𝑠4    𝑠3
+ 𝑠5 = {𝑞0, 𝑞2}         𝑠4    𝑠0
+ 𝑠6 = {𝑞1, 𝑞2}         𝑠0    𝑠3
+ 𝑠7 = {𝑞0, 𝑞1, 𝑞2}     𝑠4    𝑠3
[Diagram: the transition graph of M𝐷.]
The machine’s table, and its transition graph, make clear that M𝐷 is deterministic.
Many of the states are unreachable; for example, 𝑠 6 has only outgoing arrows.
Below is the machine with those states removed. Again, the start state is 𝑠 1 .
[Diagram: 𝑠1 passes to 𝑠4 on a and to 𝑠0 on b; 𝑠4 has a self-loop on a and passes to 𝑠3 on b; 𝑠3 passes to 𝑠0 on a and b; 𝑠0 has a self-loop on a and b.]
Now we give the algorithm, the powerset construction. States in M𝐷 are sets
of states from M𝑁 . The start state of M𝐷 is the 𝜀 closure 𝐸ˆ(𝑞 0 ) (for machines
without 𝜀 moves this is {𝑞 0 }). A state of M𝐷 is accepting if it contains any of
M𝑁 ’s accepting states.
The transition function Δ𝐷 inputs a state 𝑠𝑖 of M𝐷, that is, 𝑠𝑖 = {𝑞𝑘0, ... 𝑞𝑘𝑖 }, along with a character 𝑐 ∈ Σ. First apply M𝑁’s next state function to 𝑠𝑖’s elements to get a set 𝑆𝑖,𝑐 = Δ𝑁 (𝑞𝑘0, 𝑐) ∪ · · · ∪ Δ𝑁 (𝑞𝑘𝑖 , 𝑐) (if 𝑠𝑖 is empty then 𝑆𝑖,𝑐 is also empty).
Then include 𝜀 moves: where 𝑆𝑖,𝑐 = {𝑞𝑗0, ... 𝑞𝑗𝑖 }, let Δ𝐷 (𝑠𝑖 , 𝑐) = 𝐸ˆ(𝑞𝑗0 ) ∪ · · · ∪ 𝐸ˆ(𝑞𝑗𝑖 ).
(For machines without 𝜀 transitions this second part has no effect.)
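The construction can be sketched in code. This follows the description above, restricted to machines without 𝜀 transitions; all of the names are our choices, and the example delta-N encodes Example 2.1's machine.

```racket
; Powerset construction, for machines without epsilon transitions.
; delta-N maps a state and a character to a list of possible next
; states.  A state of the deterministic machine is a sorted list of
; N-machine states, so the empty list plays the role of the empty set.
(define (powerset-delta delta-N)
  (lambda (s c)
    (sort (remove-duplicates
           (append-map (lambda (q) (delta-N q c)) s))
          symbol<?)))

; Breadth-first collection of the subset-states reachable from the
; start state, so that unreachable subsets never get constructed.
(define (reachable-states delta-D start sigma)
  (let loop ([frontier (list start)] [seen (list start)])
    (if (null? frontier)
        seen
        (let* ([s (car frontier)]
               [succs (remove-duplicates
                       (map (lambda (c) (delta-D s c)) sigma))]
               [new (filter (lambda (t) (not (member t seen))) succs)])
          (loop (append (cdr frontier) new) (append seen new))))))

; Example: the nondeterministic machine of Example 2.1.
(define (delta-N q c)
  (cond [(and (eq? q 'q0) (char=? c #\0)) '(q0 q1)]
        [(and (eq? q 'q0) (char=? c #\1)) '(q0)]
        [(and (eq? q 'q1) (char=? c #\0)) '(q2)]
        [(and (eq? q 'q2) (char=? c #\1)) '(q3)]
        [else '()]))
(define delta-D (powerset-delta delta-N))
```

On Example 2.1's four-state machine, only four of the sixteen subset-states turn out to be reachable from {𝑞0}, which illustrates why building only the reachable part keeps the deterministic machine small in practice.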
2.23 Example We next do a nondeterministic machine with 𝜀 transitions.
[Diagram: a four-state machine with states 𝑞0, 𝑞1, 𝑞2, 𝑞3, arrows on a and b, and 𝜀 transitions; among the arrows, 𝑞0 passes to 𝑞1 on b.]
The table below computes the associated deterministic machine. The start state is
𝐸ˆ(𝑞 0 ) = {𝑞 0, 𝑞 3 } = 𝑠 7 . A state is accepting if it contains 𝑞 1 .
Here is an example walking through the powerset algorithm. First, let the
machine be in state 𝑠 7 = {𝑞 0, 𝑞 3 } and reading b. In the terms of the algorithm’s
description, applying Δ𝑁 to each element of 𝑠 7 gives 𝑆 7,b = Δ𝑁 (𝑞 0, b) ∪Δ𝑁 (𝑞 3, b) =
{𝑞 1 } ∪ { } = {𝑞 1 }. Taking the 𝜀 closure gives Δ𝐷 (𝑠 7, b) = {𝑞 0, 𝑞 1, 𝑞 3 } = 𝑠 12 .
Finding the 𝜀 closures in advance is a help in constructing the table. We have
𝐸ˆ(𝑞 0 ) = {𝑞 0, 𝑞 3 } = 𝑠 7 , and 𝐸ˆ(𝑞 1 ) = {𝑞 0, 𝑞 1, 𝑞 3 } = 𝑠 12 , and 𝐸ˆ(𝑞 2 ) = {𝑞 2 } = 𝑠 3 ,
and 𝐸ˆ(𝑞 3 ) = {𝑞 3 } = 𝑠 4 .
The transition diagram is below. Many of the machine’s table’s sixteen states are unreachable from the starting state 𝑠7 = 𝐸ˆ(𝑞0). We can see that by starting at 𝑠7 and tracing through the states. The diagram omits unreachable states.
[Diagram: the reachable states 𝑠7, 𝑠12, 𝑠10, 𝑠4, and 𝑠0, with their arrows on a and b.]
The powerset construction shows that for any nondeterministic machine there
is a deterministic machine that recognizes the same language.
We can say more: if the nondeterministic machine has 𝑛 states then the deterministic machine has at most 2ⁿ states. (It turns out that 2ⁿ is the best that we can do, in that for any 𝑛 there is an 𝑛-state nondeterministic machine requiring a deterministic machine with 2ⁿ states. However, in practice the deterministic machine is usually not too big once we minimize the number of states. Extra C shows how to minimize.)
IV.2 Exercises
2.24 Give the transition function for the machine of Example 2.8, and of Exam-
ple 2.9.
✓ 2.25 Consider this machine.
[Transition diagram omitted: states 𝑞0, 𝑞1, 𝑞2 with edges labeled 0,1 and 1.]
(a) Does it accept the empty string? (b) The string 0? (c) 011? (d) 010?
(e) List all length five accepted strings.
2.26 Your class has someone who asks, “I get that it is interesting, but isn’t all
this machine-guessing stuff just mathematical abstractions that are not real?” How
might the prof respond?
✓ 2.27 Your friend objects, “Epsilon transitions don’t make any sense because the
machine below will never get its first step done; it just endlessly follows the
epsilons.” Correct their misimpression.
[Transition diagram omitted: states 𝑞0, 𝑞1, 𝑞2 with 𝜀 transitions between them and edges labeled b and a.]
✓ 2.28 Using the nondeterministic machine from Example 2.23, give a computation
tree table like Example 2.15’s for each input string. (a) the empty string (b) a
(c) b (d) aa (e) ab (f) ba (g) bb
2.29 Give a sequence of ‘⊢’ relations showing that Example 2.12’s machine accepts
𝜏 = 010101110.
2.30 This machine has Σ = { a, b }.
[Transition diagram omitted: states 𝑞0, 𝑞1, 𝑞2 over { a, b } with an 𝜀 transition and edges labeled a and b.]
(a) What is the 𝜀 closure of 𝑞 0 ? Of 𝑞 1 ? 𝑞 2 ? (b) Does it accept the empty string?
(c) a? b? (d) Show that it accepts aab by producing a suitable sequence of ⊢
relations. (e) List five strings of minimal length that it accepts. (f) List five of
minimal length that it does not accept.
2.31 Produce the table description of the next-state function Δ for the machine in
the prior exercise. It should have three columns, for a, b, and 𝜀 .
2.32 Consider this machine.
[Transition diagram omitted: states 𝑞0, 𝑞1, 𝑞2 with edges labeled 0 and 1.]
relations. (c) Does it accept the empty string? (d) 0? 1? (e) List five strings of
minimal length that it accepts. (f) List five of minimal length that it does not accept.
(g) What is the language of this machine?
✓ 2.33 Find the 𝜀 closures of the states of this nondeterministic machine, using a
table like Example 2.20’s.
[Transition diagram omitted: states 𝑞0, 𝑞1, 𝑞2, 𝑞3 over B with 𝜀 transitions and edges labeled 0 and 1.]
2.34 Draw the transition graph of a nondeterministic machine that recognizes the
language {𝜎 = 𝜏0 ⌢ 𝜏1 ⌢ 𝜏2 ∈ B∗ | 𝜏0 = 1, 𝜏1 = 1, and 𝜏2 = ( 00)𝑘 for some 𝑘 ∈ N }.
✓ 2.35 Give diagrams for nondeterministic Finite State machines that recognize the
given language and that have the given number of states. Use Σ = B.
(a) L0 = {𝜎 | 𝜎 ends in 00 } , having three states
(b) L1 = {𝜎 | 𝜎 has the substring 0110 } , with five states
(c) L2 = {𝜎 | 𝜎 contains an even number of 0’s or exactly two 1’s } , with six states
(d) L3 = { 0 }∗, with one state
✓ 2.36 Draw the graph of a nondeterministic Finite State machine over B that
accepts strings with the suffix 111000111.
2.37 Find a nondeterministic Finite State machine that recognizes this language
of three words: L = { cat, cap, carumba }.
2.38 Give a nondeterministic Finite State machine over Σ = { a, b, c } recognizing
the language of strings that omit at least one of the characters in the alphabet.
✓ 2.39 For each, draw the transition graph for a Finite State machine, which may
be nondeterministic, that accepts the given strings from { a, b }∗.
(a) Accepted strings have a second character of a and next to last character of b.
(b) Accepted strings have second character a and the next to last character is
also a.
2.40 What is the language of this nondeterministic machine with 𝜀 transitions?
[Transition diagram omitted: states 𝑞0, 𝑞1, 𝑞2 with 𝜀 transitions and edges labeled a and b.]
A → aA | bB
B → bB | b

[Transition diagram omitted: a machine with states 𝑆, 𝐴, 𝐵, 𝐹 and edges labeled a and b.]
(a) Give three strings from the language of the grammar and show that they are
accepted by the machine. (b) Describe that language.
2.49 Decide whether each problem is solvable or unsolvable by a Turing machine.
(a) L𝐷𝐹𝐴 = { ⟨M, 𝜎⟩ | the deterministic Finite State machine M accepts 𝜎 }
(b) L𝑁 𝐹𝐴 = { ⟨M, 𝜎⟩ | the nondeterministic machine M accepts 𝜎 }
2.50 (a) For the machine of Example 2.23, for each 𝑞 ∈ 𝑄 produce 𝐸 (𝑞, 0) ,
𝐸 (𝑞, 1) , 𝐸 (𝑞, 2) , and 𝐸 (𝑞, 3) . List 𝐸ˆ(𝑞) for each 𝑞 ∈ 𝑄 . (b) Do the same for
Exercise 2.30’s machine.
Section IV.3 Regular expressions
3.7 Example The language consisting of strings of a’s whose length is a multiple of
three, L = { a3𝑘 | 𝑘 ∈ N } = {𝜀, aaa, aaaaaa, ... }, is described by (aaa)*.
Note that the empty string is a member of that language. A common mistake is
to forget that star includes the possibility of zero-many repetitions.
3.8 Example To match any character we can list them all. The language over
Σ = { a, b, c } of three-letter words ending in bc is { abc, bbc, cbc }. The regular
expression (a|b|c)bc describes it. (Another is (abc)|(bbc)|(cbc).)
3.9 Example Use 𝜀 to mark things as optional. Thus a*(𝜀 |b) describes the lan-
guage of strings that have any number of a’s and optionally end in one b,
L = {𝜀, b, a, ab, aa, aab, ... }. Similarly, to describe the language consisting of
words with between three and five a’s, L = { aaa, aaaa, aaaaa }, we can use
aaa(𝜀 |a|aa).
3.10 Example The language { b, bc, bcc, ab, abc, abcc, aab, ... } has words starting with
any number of a’s (including zero-many a’s), followed by a single b, and then
ending in fewer than three c’s. To describe it we can use a*b(𝜀 |c|cc).
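These textbook expressions carry over almost directly to Python's re module. There is no literal for 𝜀, but an empty alternative (or bounded repetition) plays its role. A quick check of Examples 3.7 through 3.10, assuming fullmatch semantics (the whole string must match):

```python
import re

# Example 3.7: strings of a's whose length is a multiple of three
assert re.fullmatch(r"(aaa)*", "") is not None      # star allows zero repetitions
assert re.fullmatch(r"(aaa)*", "aaaaaa") is not None
assert re.fullmatch(r"(aaa)*", "aaaa") is None

# Example 3.8: three-letter words over {a, b, c} ending in bc
assert re.fullmatch(r"(a|b|c)bc", "cbc") is not None

# Example 3.9: any number of a's, optionally ending in one b; (|b) plays ε|b
assert re.fullmatch(r"a*(|b)", "aab") is not None

# Example 3.10: a's, then one b, then fewer than three c's
assert re.fullmatch(r"a*b(|c|cc)", "bcc") is not None
assert re.fullmatch(r"a*b(|c|cc)", "bccc") is None
```

Practical engines also offer shorthands like `b?` and `c{0,2}` for these optional pieces; Extra A surveys such extensions.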
Also see Extra A for extensions that are widely used in practice.
Kleene’s Theorem The next result justifies our study of regular expressions
because it shows that they describe the languages of interest.
3.11 Theorem (Kleene’s theorem) A language is recognized by a Finite State
machine if and only if that language is described by a regular expression.
We will prove this in separate halves. The proofs use nondeterministic machines
but since we can convert those to deterministic machines, the result holds for them
also.
3.12 Lemma If a language is described by a regular expression then there is a Finite
State machine recognizing that language.
Proof Fix an alphabet Σ. We will show that for any regular expression 𝑅 over Σ
there is a machine with alphabet Σ accepting exactly the strings matching the
expression. We use induction on the structure of regular expressions.
Start with the base cases, regular expressions that are a single symbol. If 𝑅 = ∅ then
L (𝑅) = { } and the machine on the left below recognizes this language. If 𝑅 = 𝜀
then L (𝑅) = {𝜀 } and the machine in the middle recognizes it. If the regular
expression is a character from the alphabet such as 𝑅 = a then the machine on the
right works.
[Diagrams omitted: on the left a one-state machine with no accepting states; in the middle a one-state machine whose lone state is accepting; on the right a two-state machine with an a edge from the start state to an accepting state.]
Next consider alternation, 𝑅 = 𝑅0 |𝑅1 . Where M0 and M1 are machines for the two
subexpressions, add a new state 𝑠 and use 𝜀 transitions to connect 𝑠 to the start
states of M0 and M1 . See Example 2.17 in the prior section.
Next consider concatenation, 𝑅 = 𝑅0 ⌢ 𝑅1 . Join the two machines serially: for
each accepting state in M0 , make an 𝜀 transition to the start state of M1 and
then convert all of the accepting states of M0 to be non-accepting states. See
Example 2.18.
Finally consider Kleene star, 𝑅 = (𝑅0 )*. For each accepting state in the
machine M0 that is not the start state, make an 𝜀 transition to the start state and
then make the start state an accepting state. See Example 2.19.
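The three constructions can be carried out mechanically. Below is a minimal Python sketch that builds machines bottom-up; the encoding of a machine as a (start, accepting states, edge list) triple is our own, with 𝜀 edges carrying the label "".

```python
import itertools

_new_state = itertools.count()

def char(c):
    """Machine for the single-character expression c."""
    s, t = next(_new_state), next(_new_state)
    return (s, {t}, [(s, c, t)])

def union(m0, m1):
    """R0|R1: a new start state with ε edges to both machines' starts."""
    s = next(_new_state)
    return (s, m0[1] | m1[1],
            m0[2] + m1[2] + [(s, "", m0[0]), (s, "", m1[0])])

def concat(m0, m1):
    """R0 R1: ε edges from M0's accepting states to M1's start."""
    return (m0[0], m1[1],
            m0[2] + m1[2] + [(q, "", m1[0]) for q in m0[1]])

def star(m0):
    """R0*: accepting states loop back to the start, which becomes accepting."""
    return (m0[0], m0[1] | {m0[0]},
            m0[2] + [(q, "", m0[0]) for q in m0[1] if q != m0[0]])

def accepts(m, sigma):
    """Simulate: track the set of possible states, ε-closing as we go."""
    start, accepting, edges = m
    def close(S):
        while True:
            more = {t for (q, c, t) in edges if q in S and c == ""} - S
            if not more:
                return S
            S = S | more
    S = close({start})
    for c in sigma:
        S = close({t for (q, lbl, t) in edges if q in S and lbl == c})
    return bool(S & accepting)

# the machine of Example 3.13, for ab(c|d)(ef)*
m = concat(concat(char("a"), char("b")),
           concat(union(char("c"), char("d")),
                  star(concat(char("e"), char("f")))))
assert accepts(m, "abc") and accepts(m, "abdefef")
assert not accepts(m, "ab") and not accepts(m, "abce")
```

Nondeterminism here costs nothing extra: the simulator simply carries the whole set of states the machine might be in.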
3.13 Example Building a machine for the regular expression ab(c|d)(ef)* starts with
machines for each of the single characters.
[Diagrams omitted: the component machines, with states 𝑞0 through 𝑞12, shown first separately and then joined by 𝜀 transitions into a single machine for ab(c|d)(ef)*.]
3.14 Lemma If a language is recognized by a Finite State machine then there is a
regular expression describing that language.

[Diagram omitted: in the before picture, edges 𝑞𝑖 →a→ 𝑞 →b→ 𝑞𝑜 ; in the after picture, a single edge 𝑞𝑖 →ab→ 𝑞𝑜 .]
In the after picture the edge is labeled ab, with more than just one character.
For this proof we will generalize transition graphs to allow edge labels that are
regular expressions. As we eliminate states, we keep the recognized language of
the machines the same. We will be done when what remains is two states, with
one edge between them. The desired regular expression will be that edge’s label.
Before the proof, one more illustration. Start with the machine on the left.
[Diagrams omitted: on the left, a machine with states 𝑞0, 𝑞1, 𝑞2, edges 𝑞0 →a→ 𝑞1 , a loop labeled b on 𝑞1 , 𝑞1 →c→ 𝑞2 , and 𝑞0 →d→ 𝑞2 ; on the right, the same machine with a new start state 𝑒 →𝜀→ 𝑞0 and a new final state, 𝑞2 →𝜀→ 𝑓 .]
The proof goes as on the right, by introducing a new start state, 𝑒 , and a new final
state, 𝑓 . Then the proof eliminates 𝑞 1 as below.
[Diagram omitted: 𝑒 →𝜀→ 𝑞0 →d|(ab*c)→ 𝑞2 →𝜀→ 𝑓 .]
Clearly this machine recognizes the same language as the starting one.
Proof Call the machine M. If it has no accepting states then the regular expression
is ∅ and we are done. Otherwise, we start by transforming M to a new machine,
M̂, that has the same language and that is ready for the state-elimination strategy.
First we arrange that M̂ has a single accepting state. Create a new state 𝑓 and
for each of M’s accepting states make an 𝜀 transition to 𝑓 (by the prior paragraph
there is at least one such accepting state, so 𝑓 is connected to the rest of M̂). Change
all the accepting states to non-accepting ones and then make 𝑓 accepting. Clearly
this does not change the language of accepted strings.
Next introduce a new start state, 𝑒 . Connect it to 𝑞 0 with an 𝜀 transition, again
leaving the language of the machine unchanged. (Putting 𝑒 in M̂ allows us to
uniformly eliminate each state in M when we say below, “Pick any 𝑞 not equal to
𝑒 or 𝑓 .”)
Because the edge labels are regular expressions, we can arrange that from
any 𝑞𝑖 to any 𝑞 𝑗 there is at most one edge, since if M has more than one edge then
in M̂ we can use alternation, ‘|’, to combine the labels.
[Diagram omitted: two edges from 𝑞𝑖 to 𝑞 𝑗 labeled a and b are combined into a single edge labeled a|b.]
Do the same with loops, that is, cases where 𝑖 = 𝑗 . These adjustments do not
change the language of accepted strings.
The last part of transforming to M̂ is to drop states that are useless in that
they don’t affect which strings are accepted. If a state node other than 𝑓 has
no outgoing edges then omit it, along with the edges into it. The language of
the machine will not change because this state is not itself accepting as only 𝑓 is
accepting, and cannot lead to an accepting state since it doesn’t lead anywhere.
Along the same lines, if a state node is unreachable from the start 𝑒 then drop that
node along with its incoming and outgoing edges. (The idea behind useless states
has some technical aspects. For instance, omitting a no-outgoing-edges node along
with its incoming edges can result in another node now having no outgoing edges,
which in turn needs the same treatment. But these machines have only finitely
many nodes and so this omitting process must eventually finish. For a definition of
unreachability see Exercise 3.34.)
With that, M̂ is ready for state elimination. Pick any 𝑞 not equal to 𝑒 or 𝑓 .
Below are before and after diagrams. By the setup work above, 𝑞 has at least one
incoming and at least one outgoing edge. So there are states 𝑞𝑖 0 , . . . 𝑞𝑖 𝑗 with an
edge leading into 𝑞 , and states 𝑞𝑜 0 , . . . 𝑞𝑜𝑘 that receive an edge leading out of 𝑞 .
In addition, 𝑞 may have a loop. (A fine point is that possibly some of the states
shown on the left of each diagram equal some shown on the right. For example,
possibly 𝑞𝑖 0 equals 𝑞𝑜 0 , and the edge on the top of each diagram is a loop.)
[Diagrams omitted: in the before picture, each of 𝑞𝑖0 , . . . 𝑞𝑖 𝑗 has an edge into 𝑞 (labeled 𝑅𝑖0 , . . . 𝑅𝑖 𝑗 ), 𝑞 may have a loop labeled 𝑅ℓ , and 𝑞 has an edge out to each of 𝑞𝑜0 , . . . 𝑞𝑜𝑘 (labeled 𝑅𝑜0 , . . . 𝑅𝑜𝑘 ); there may also be direct edges labeled 𝑅𝑖𝑚,𝑜𝑛 from 𝑞𝑖𝑚 to 𝑞𝑜𝑛 . In the after picture 𝑞 is gone and the edge from 𝑞𝑖𝑚 to 𝑞𝑜𝑛 is labeled 𝑅𝑖𝑚,𝑜𝑛 |(𝑅𝑖𝑚 𝑅ℓ *𝑅𝑜𝑛 ).]
Eliminate 𝑞 and the associated edges by making the replacements shown on the
after diagram. (If an edge is not present then don’t include any regular expression
in the replacement. For instance, if there is no 𝑅ℓ edge then the right’s top edge
should be 𝑅𝑖 0,𝑜 0 |𝑅𝑖 0 𝑅𝑜 0 .) By construction, for any two states 𝑞𝑖 and 𝑞𝑜 , the set of
strings taking the machine from the first to the second is unchanged in passing from
the before diagram to the after. Thus the languages of the before and after machines
are equal.
Repeat this procedure until the only states left are 𝑒 and 𝑓 . The desired regular
expression is on the sole remaining edge.
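The elimination procedure can also be automated. A sketch in Python, with the edges as a dictionary mapping state pairs to regular-expression strings (this encoding and the over-generous parenthesization are our own; we assume at least one accepting state, since the proof's first line handles the other case):

```python
import re   # used only to sanity-check the result below

def wrap(r):
    """Parenthesize if r contains an alternation, so concatenation is safe."""
    return "(%s)" % r if "|" in r else r

def cat(*parts):
    """Concatenate labels, dropping ε factors."""
    return "".join(wrap(p) for p in parts if p != "ε")

def regex_of(states, edges, start, accepting):
    """State elimination as sketched in Lemma 3.14's proof."""
    E = dict(edges)
    E[("e", start)] = "ε"                     # new start state e
    for q in accepting:
        E[(q, "f")] = "ε"                     # new single final state f
    for q in states:                          # eliminate each original state
        loop = E.pop((q, q), None)
        mid = "(%s)*" % loop if loop else "ε"
        ins = [(u, r) for ((u, v), r) in E.items() if v == q]
        outs = [(v, r) for ((u, v), r) in E.items() if u == q]
        for u, _ in ins:
            del E[(u, q)]
        for v, _ in outs:
            del E[(q, v)]
        for u, ri in ins:                     # R_{i,o} | (R_i R_l* R_o)
            for v, ro in outs:
                new = cat(ri, mid, ro) or "ε"
                old = E.get((u, v))
                E[(u, v)] = new if old is None else "%s|%s" % (old, new)
    return E[("e", "f")]

# the machine pictured before the proof: q0 →a→ q1 (loop b), q1 →c→ q2, q0 →d→ q2
edges = {("q0", "q1"): "a", ("q1", "q1"): "b",
         ("q1", "q2"): "c", ("q0", "q2"): "d"}
pattern = regex_of(["q0", "q1", "q2"], edges, "q0", {"q2"})
assert re.fullmatch(pattern, "abbc")
```

On this machine the sketch returns (d|a(b)*c), agreeing with the illustration's d|(ab*c) up to extra parentheses.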
3.15 Example We want a regular expression describing the language of the machine M
on the left below. Introduce 𝑒 and 𝑓 as on the right. There are no useless states so
this is M̂.
[Diagrams omitted: on the left, the machine M with states 𝑞0, 𝑞1, 𝑞2 and edges labeled a and b; on the right, M̂, which adds a start state 𝑒 →𝜀→ 𝑞0 and 𝜀 transitions from the accepting states to a new final state 𝑓 . Eliminating 𝑞2 then gives the next machine.]
[Diagram omitted: 𝑒 →𝜀→ 𝑞0 →a→ 𝑞1 , with an edge from 𝑞1 back to 𝑞0 labeled b|(ab*b) and 𝑞1 →𝜀→ 𝑓 .]
Next 𝑞 1 . There is one node giving an incoming arrow, 𝑞 0 = 𝑞𝑖 0 , and two nodes
associated with outgoing arrows, 𝑞 0 = 𝑞𝑜 0 and 𝑓 = 𝑞𝑜 1 . (Note that 𝑞 0 is both
an incoming and outgoing node; this is the “fine point” mentioned in the proof.)
The regular expressions are: there is no arrow for 𝑅𝑖 0,𝑜 0 , 𝑅𝑖 0,𝑜 1 = 𝜀 , 𝑅𝑖 0 = a,
𝑅𝑜 0 = b|(ab*b), there is also no arrow for 𝑅ℓ , and 𝑅𝑜 1 = 𝜀 . Eliminating 𝑞 1
means that the next machine has an arrow from 𝑞𝑖 0 = 𝑞 0 to 𝑞𝑜 0 = 𝑞 0 labeled
𝑅𝑖 0,𝑜 0 |(𝑅𝑖 0 𝑅ℓ *𝑅𝑜 0 ), which is a(b|ab*b). It also means that the machine has an
arrow from 𝑞𝑖 0 = 𝑞 0 to 𝑞𝑜 1 = 𝑞 𝑓 labeled 𝑅𝑖 0,𝑜 1 |(𝑅𝑖 0 𝑅ℓ *𝑅𝑜 1 ), which is 𝜀 |(a𝜀 ).
[Diagram omitted: 𝑒 →𝜀→ 𝑞0 with a loop on 𝑞0 labeled a(b|ab*b), and an edge 𝑞0 →𝜀 |a𝜀→ 𝑓 .]
Final step. The sole incoming node is 𝑒 = 𝑞𝑖 0 and the sole outgoing node is 𝑓 = 𝑞𝑜 0 .
As well, 𝑅𝑖 0 = 𝜀 , 𝑅𝑜 0 = 𝜀 |a𝜀 , and 𝑅ℓ = a(b|ab*b).
[Diagram omitted: a single edge from 𝑒 to 𝑓 labeled 𝜀 (a(b|ab*b))*(𝜀 |a𝜀 ).]
This regular expression describes the language of the starting machine (we can
simplify it; for instance, in the final parenthesis we can replace a𝜀 with a).
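We can sanity-check the derived expression with Python's re module, writing the 𝜀 factors out of the expression (they drop from concatenations, and 𝜀 |a is just an optional a); the particular test strings are our own choices:

```python
import re

# ε(a(b|ab*b))*(ε|aε), simplified as in the text: ε factors drop out
pattern = r"(?:a(?:b|ab*b))*a?"

for sigma in ["", "a", "ab", "aba", "abab"]:
    assert re.fullmatch(pattern, sigma), sigma   # all described by the expression
for sigma in ["b", "aa", "ba"]:
    assert re.fullmatch(pattern, sigma) is None, sigma
```

Spot checks like this do not prove the elimination was carried out correctly, but they catch many slips cheaply.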
IV.3 Exercises
3.16 Decide if the string 𝜎 matches the regular expression 𝑅 . (a) 𝜎 = 0010,
𝑅 = 0*10 (b) 𝜎 = 101, 𝑅 = 1*01 (c) 𝜎 = 101, 𝑅 = 1*(0|1) (d) 𝜎 = 101,
𝑅 = 1*(0|1)* (e) 𝜎 = 01, 𝑅 = 1*01*
✓ 3.17 For each regular expression produce five bitstrings that match and five
that do not, or as many as there are if there are not five. (a) 01* (b) (01)*
(c) 1(0|1)1 (d) (0|1)(𝜀 |1)0* (e) ∅
3.18 Give a brief plain English description of the language for each regular
expression. (a) a*cb* (b) aa* (c) a(a|b)*bb
✓ 3.19 For each string in { a, b }∗ that is of length less than or equal to 3, decide if the
string is a match. (a) a*b (b) a* (c) ∅ (d) 𝜀 (e) b(a|b)a (f) (a|b)(𝜀 |a)a
3.20 For these regular expressions, decide if each element of B∗ of length at most 3
is a match. (a) 0*1 (b) 1*0 (c) ∅ (d) 𝜀 (e) 0(0|1)* (f) (100)(𝜀 |1)0*
✓ 3.21 A friend says to you, “The point of parentheses is that you first do inside
the parentheses and then do what’s outside. So Kleene star must mean ‘match
the inside and repeat’. So I think that (0*1)* should match the strings 001001
and 010101, but not the strings 01001 and 00000101, because you can’t write
those two as 𝜎 𝑛 for any substring 𝜎 .” Straighten them out.
3.22 The person behind you in class says, “I don’t get it. I got a regular expression
that I am sure is right. But I looked in the answers and the book got a different
one.” Explain what is up.
3.23 Produce a regular expression for the language of bitstrings that have a
substring consisting of at least three consecutive 1’s.
3.24 For each language, give five strings that are in the language and five that
are not. Then give a regular expression describing the language. Finally, give a
Finite State machine that accepts the language (a nondeterministic machine is
acceptable). (a) L0 = { a𝑛 b2𝑚 | 𝑚, 𝑛 ≥ 1 } (b) L1 = { a𝑛 b3𝑚 | 𝑚, 𝑛 ≥ 1 }
3.25 Give a regular expression for the language over Σ = { a, b, c } whose strings
are missing at least one letter, that is, whose strings are either without any a’s, or
without any b’s, or without any c’s.
3.26 Give a regular expression for each language. Use Σ = { a, b }. (a) The set of
strings starting with b. (b) The set of strings whose second-to-last character is a.
(c) The set of strings containing at least one of each character. (d) The strings
where the number of a’s is divisible by three.
3.27 Give a regular expression to describe each language over the alphabet
Σ = { a, b, c }. (a) The set of strings starting with aba. (b) The set of strings
ending with aba. (c) The set of strings containing the substring aba.
✓ 3.28 Give a regular expression to describe each language over B. (a) The set of
strings of odd parity, where the number of 1’s is odd. (b) The set of strings where
no two adjacent characters are equal. (c) The set of strings that represent, in
binary, multiples of eight.
✓ 3.29 Give a regular expression to describe each language over the alphabet
Σ = { a, b }. (a) Every a is both immediately preceded and immediately followed
by a b. (b) Each string has at least two b’s that are not followed by an a.
3.30 Give a regular expression for each language of bitstrings. (a) The number of
0’s is even. (b) There are more than two 1’s. (c) The number of 0’s is even and
there are more than two 1’s.
3.31 Give a regular expression to describe each language.
(a) {𝜎 ∈ { a, b }∗ | 𝜎 ends with the same symbol that it began with, and 𝜎 ≠ 𝜀 }
the set 𝑆𝑖 of states that are reachable in 𝑖 -many steps, for each 𝑞˜ ∈ 𝑆𝑖 follow each
outbound edge for a single step and also include the elements of the 𝜀 closure.
The union of 𝑆𝑖 with the collection of the states reached in this way is the set 𝑆𝑖+1 .
Stop when 𝑆𝑖 = 𝑆𝑖+1 , at which point it is the set of ever-reachable states. The
unreachable states are the others. For each machine, use that definition to find the
set of unreachable states.
[Transition diagrams omitted: machine (a) with states 𝑞0, 𝑞1, 𝑞2 and machine (b) with states 𝑞0–𝑞4, edges labeled a and b.]
3.35 Here is a grammar for regular expressions that reflects the operator
precedence rules.
⟨reg-exp⟩ → ⟨concat⟩ | ⟨reg-exp⟩ ‘|’ ⟨concat⟩
⟨concat⟩ → ⟨simple⟩ | ⟨concat⟩ ⟨simple⟩
⟨simple⟩ → ( ⟨reg-exp⟩ ) | ⟨simple⟩ * | ∅ | 𝜀 | 𝑥 0 | . . . | 𝑥𝑛
Derive and construct the parse tree for each regular expression over Σ = { a, b, c }.
(a) a(b|c) (b) ab*(a|c)
3.36 Use the grammar in the prior exercise to give the parse trees for Remark 3.6’s
a(b|c)* and a(b*|c*).
3.37 Apply the method of Lemma 3.14’s proof to this machine to eliminate 𝑞 0 .
[Transition diagram omitted: states 𝑞0 and 𝑞1 with edges labeled a, b, and a,b.]
(a) Get M̂ by introducing 𝑒 and 𝑓 . (b) Where 𝑞 = 𝑞 0 , describe which state from
the machine is playing the diagram’s before picture role of 𝑞𝑖 0 , which edge is 𝑅𝑖 0 , etc.
(c) Eliminate 𝑞 0 .
✓ 3.38 Apply method of Lemma 3.14’s proof to this machine. At each step describe
which state from the machine is playing the role of 𝑞𝑖 0 , which edge is 𝑅𝑖 0 , etc.
[Transition diagram omitted: states 𝑞0, 𝑞1, 𝑞2 with edges labeled 1 and a loop labeled 0,1.]
(a) Eliminate 𝑞 0 . (b) Eliminate 𝑞 1 . (c) 𝑞 2 (d) Give the regular expression.
3.39 Apply the state elimination method of Lemma 3.14’s proof to eliminate 𝑞 1 .
Note that each of the states 𝑞 0 and 𝑞 2 are as described in the proof ’s comment on
the fine point.
[Diagram omitted: states 𝑞0, 𝑞1, 𝑞2 with edges labeled by the regular expressions A through F.]
3.41 Fix a Finite State machine M. Kleene’s Theorem shows that the set of strings
taking M from the start state 𝑞 0 to the set of final states is regular.
(a) Show that for any set of states 𝑆 ⊆ 𝑄 M , final or not, the set of strings taking
M from 𝑞 0 to one of the states in 𝑆 is regular.
(b) Show that the set of strings taking M from any single state to any other single
state is regular.
3.42 Fix an alphabet Σ. Show that the set of languages over Σ that are described
by a regular expression is countably infinite. Conclude that there are languages
over Σ not recognized by any Finite State machine.
3.43 An alternative proof of Lemma 3.12, the subset method, goes from a given
regular expression to an associated machine by reversing the steps of Lemma 3.14.
Start by labeling the single edge on a two-state machine with the given regular
expression.
[Diagrams omitted: begin with 𝑒 →𝑅→ 𝑓 . The expansion rules reverse state elimination: an edge 𝑞𝑖 →𝑅0𝑅1→ 𝑞𝑜 becomes 𝑞𝑖 →𝑅0→ 𝑞 →𝑅1→ 𝑞𝑜 through a new state 𝑞 ; an edge 𝑞𝑖 →𝑅0 |𝑅1→ 𝑞𝑜 becomes two parallel edges labeled 𝑅0 and 𝑅1 ; and an edge 𝑞𝑖 →𝑅*→ 𝑞𝑜 becomes 𝑞𝑖 →𝜀→ 𝑞 →𝜀→ 𝑞𝑜 with a loop labeled 𝑅 on the new state 𝑞 .]
Use this approach to get a machine that recognizes the language described by
these regular expressions. (a) a|b (b) ca* (c) (a|b)c* (d) (a|b)(b*|a*)
3.44 Nondeterministic Finite State machines can always be made to have a single
accepting state. For deterministic machines that is not so.
(a) Show that any deterministic Finite State machine that recognizes the finite
language L1 = {𝜀, a } must have at least two accepting states.
(b) Show that any deterministic Finite State machine that recognizes L2 =
{𝜀, a, aa } must have at least three accepting states.
(c) Show that for any 𝑛 ∈ N there is a regular language that is not recognized by
any deterministic Finite State machine with at most 𝑛 accepting states.
Section IV.4 Regular languages
We have seen that deterministic Finite State machines, nondeterministic Finite
State machines, and regular expressions all describe the same set of languages.
The fact that we can describe these languages in so many different ways says that
there is something natural and important about them.†
Definition We now study the languages in this collection.
4.1 Definition A regular language is one that is recognized by some Finite State
machine or equivalently, described by a regular expression.
4.2 Lemma Fix an alphabet. The set of regular languages over it is countably infinite.
There are languages that are not regular.
Proof Call the alphabet Σ. We first show that the set of regular languages over Σ is
infinite. Section A specifies that any alphabet is nonempty and finite. Where 𝑥 is a
character from Σ, each of these languages is finite and therefore regular: L0 = { },
L1 = {𝑥 }, L2 = {𝑥𝑥 } . . .
Next we show that the set of regular languages over Σ is at most countable.
There is one language for each regular expression so we can do that by showing
that there are countably many regular expressions. There are finitely many regular
expressions of length 1, finitely many of length 2, etc. The union of them all is a
countable union of countable sets, and so is countable.
To finish we show that the set of all languages over Σ is uncountable, from
which it follows that there are languages that are not regular. First, the collection of
strings Σ∗ is infinite because, where 𝑦 ∈ Σ, it contains the infinitely many different
elements 𝑦 , 𝑦𝑦 , . . . In addition, that collection contains finitely many strings
of length zero, finitely many of length one, etc. and so is a countable union
of countable sets, and is therefore countably infinite. In contrast, the set of all
languages L ⊆ Σ∗ is the power set of Σ∗, and so has a greater cardinality, which
makes it uncountable.
Closure properties In proving the first half of Kleene’s Theorem, Lemma 3.12, we
showed that if L0 and L1 are regular then their union L0 ∪ L1 is regular, as is their
concatenation L0 ⌢ L1 , and the Kleene star L0 ∗. A set is closed under an operation
if performing that operation on its members always yields another member. This
restates Lemma 3.12 using that term.
4.3 Lemma The collection of regular languages is closed under the union of two
languages,‡ the concatenation of two languages, and the Kleene star of a language.
We can ask about the closure of regular languages under other operations. To
answer we will use the product construction.
† This is just like how the fact that Turing machines, general recursive functions, and many other models
all compute the same sets says that these computable sets are a natural and important collection. This
collection is not just a historical artifact of what happened to be first proposed. ‡ If the two languages
have different alphabets Σ0 and Σ1 then the two languages as well as their union are regular over the
alphabet Σ0 ∪ Σ1 .
4.4 Example The machine on the left, M0 , accepts strings with fewer than two a’s.
The one on the right, M1 , accepts strings with an odd number of b’s.
[Transition diagrams omitted.]

Δ0      a    b          Δ1      a    b
+ 𝑞0    𝑞1   𝑞0           𝑠0    𝑠0   𝑠1
+ 𝑞1    𝑞2   𝑞1         + 𝑠1    𝑠1   𝑠0
  𝑞2    𝑞2   𝑞2
The product machine M has states that are the members of the cross product
𝑄 0 × 𝑄 1 and transitions that are given by Δ( (𝑞𝑖 , 𝑠 𝑗 ), 𝑥) = ( Δ0 (𝑞𝑖 , 𝑥), Δ1 (𝑠 𝑗 , 𝑥) ) . Its
start state is (𝑞 0, 𝑠 0 ) .
Δ a b
(𝑞 0, 𝑠 0 ) (𝑞 1, 𝑠 0 ) (𝑞 0, 𝑠 1 )
(𝑞 0, 𝑠 1 ) (𝑞 1, 𝑠 1 ) (𝑞 0, 𝑠 0 )
(𝑞 1, 𝑠 0 ) (𝑞 2, 𝑠 0 ) (𝑞 1, 𝑠 1 )
(𝑞 1, 𝑠 1 ) (𝑞 2, 𝑠 1 ) (𝑞 1, 𝑠 0 )
(𝑞 2, 𝑠 0 ) (𝑞 2, 𝑠 0 ) (𝑞 2, 𝑠 1 )
(𝑞 2, 𝑠 1 ) (𝑞 2, 𝑠 1 ) (𝑞 2, 𝑠 0 )
The two tables below differ only in the choice of accepting states.

        a          b
  (𝑞 0, 𝑠 0 )  (𝑞 1, 𝑠 0 )  (𝑞 0, 𝑠 1 )
+ (𝑞 0, 𝑠 1 )  (𝑞 1, 𝑠 1 )  (𝑞 0, 𝑠 0 )
  (𝑞 1, 𝑠 0 )  (𝑞 2, 𝑠 0 )  (𝑞 1, 𝑠 1 )
+ (𝑞 1, 𝑠 1 )  (𝑞 2, 𝑠 1 )  (𝑞 1, 𝑠 0 )
  (𝑞 2, 𝑠 0 )  (𝑞 2, 𝑠 0 )  (𝑞 2, 𝑠 1 )
  (𝑞 2, 𝑠 1 )  (𝑞 2, 𝑠 1 )  (𝑞 2, 𝑠 0 )

        a          b
+ (𝑞 0, 𝑠 0 )  (𝑞 1, 𝑠 0 )  (𝑞 0, 𝑠 1 )
  (𝑞 0, 𝑠 1 )  (𝑞 1, 𝑠 1 )  (𝑞 0, 𝑠 0 )
+ (𝑞 1, 𝑠 0 )  (𝑞 2, 𝑠 0 )  (𝑞 1, 𝑠 1 )
  (𝑞 1, 𝑠 1 )  (𝑞 2, 𝑠 1 )  (𝑞 1, 𝑠 0 )
  (𝑞 2, 𝑠 0 )  (𝑞 2, 𝑠 0 )  (𝑞 2, 𝑠 1 )
  (𝑞 2, 𝑠 1 )  (𝑞 2, 𝑠 1 )  (𝑞 2, 𝑠 0 )

In the second table the accepting states (𝑞𝑖 , 𝑠 𝑗 ) are the ones where 𝑞𝑖 is accepting and 𝑠 𝑗 is not.
Then the machine accepts strings that are in the language of M0 but not that of M1 ,
so M recognizes {𝜎 ∈ Σ∗ | 𝜎 has fewer than two a’s and an even number of b’s }.
4.5 Theorem The collection of regular languages is closed under the intersection of
two languages, the set difference of two languages, and the set complement of a
language.
Proof Fix an alphabet Σ and consider languages L0 and L1 . Let them be recognized
by the Finite State machines M0 and M1 . Perform the product construction to
get M.
If the accepting states of M are those pairs where both the first and second
component states are accepting then M recognizes the intersection of the languages,
L0 ∩ L1 . If the accepting states of M are those pairs where the first component
state is accepting but the second is not, then M recognizes the set difference of
the languages, L0 − L1 . A special case of this is when L0 is the set of all strings, Σ∗,
so that M recognizes the complement, L1 c.
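The constructions in the proof are easy to run. A minimal sketch in Python using Example 4.4's two machines, with each deterministic machine encoded as a dictionary from (state, character) pairs (the encoding and function names are our own):

```python
def product_accepts(d0, f0, d1, f1, start0, start1, sigma, mode):
    """Run M0 and M1 in lockstep on sigma and accept according to `mode`:
    'and' gives the intersection of the languages, 'diff' gives L0 - L1."""
    q, s = start0, start1
    for c in sigma:
        q, s = d0[(q, c)], d1[(s, c)]
    if mode == "and":
        return q in f0 and s in f1
    return q in f0 and s not in f1          # mode == "diff"

# Example 4.4: M0 accepts strings with fewer than two a's (q0, q1 accepting),
# M1 accepts strings with an odd number of b's (s1 accepting).
d0 = {("q0", "a"): "q1", ("q0", "b"): "q0",
      ("q1", "a"): "q2", ("q1", "b"): "q1",
      ("q2", "a"): "q2", ("q2", "b"): "q2"}
d1 = {("s0", "a"): "s0", ("s0", "b"): "s1",
      ("s1", "a"): "s1", ("s1", "b"): "s0"}
f0, f1 = {"q0", "q1"}, {"s1"}

assert product_accepts(d0, f0, d1, f1, "q0", "s0", "ab", "and")      # one a, odd b's
assert not product_accepts(d0, f0, d1, f1, "q0", "s0", "aab", "and") # two a's
assert product_accepts(d0, f0, d1, f1, "q0", "s0", "abb", "diff")    # even b's
```

For the complement, run with a machine for Σ∗ (a single accepting state looping on every character) as M0 and use the 'diff' mode.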
These closure properties often simplify showing that a language is regular.
4.6 Example To show that the language
IV.4 Exercises
4.7 Someone in class says, “I know that regular languages are closed under closure
properties. For example, we know if L0 and L1 are regular then their intersection
L0 ∩ L1 is also regular. But when L0 and L1 are not regular, why doesn’t
L0 ∩ L1 = L2 make L2 not regular? Doesn’t being closed work for non-regularity
too?” Explain it to them.
4.8 Is English a regular language?
4.9 Name a class of languages that are closed under intersection and union but
not under complement.
✓ 4.10 True or false? Justify each answer.
(a) The empty language is not regular.
(b) The intersection of two languages is regular.
(c) The language of all bitstrings, B∗, is not regular.
(d) In every infinite regular language there are two strings where no character
from the alphabet is in the same place in both.
4.11 One of these is true and one is false. Which is which? (a) Any finite language
is regular. (b) Any regular language is finite.
4.12 Is {𝜎 ∈ B∗ | 𝜎 represents in binary a power of 2 } a regular language?
[Transition diagrams omitted: a three-state machine with states 𝑞0, 𝑞1, 𝑞2 and a two-state machine with states 𝑠0, 𝑠1, over { a, b }. For this pair of machines,]
give the transition table for the product machine. Specify the accepting states so that
the result will accept (a) the intersection of the languages of the two machines, and
(b) the union of the languages.
4.18 Find the cross product of this machine, M1 from Example 4.4, with itself.

[Transition diagram omitted: states 𝑠0 and 𝑠1 with b edges between them and a loops on each.]
4.32 Prove that the language recognized by a Finite State machine with 𝑛 states
is infinite if and only if the machine accepts at least one string of length 𝑘 , where
𝑛 ≤ 𝑘 < 2𝑛 .
4.33 Fix two alphabets Σ0, Σ1 . A function ℎ : Σ0 → Σ1 ∗ induces a homomorphism
on Σ0 ∗ via the operation ℎ(𝜎 ⌢ 𝜏) = ℎ(𝜎) ⌢ ℎ(𝜏) and ℎ(𝜀) = 𝜀 .
(a) Take Σ0 = B and Σ1 = { a, b } . Fix a homomorphism ℎ̂( 0) = a and ℎ̂( 1) = ba.
Find ℎ̂( 01) , ℎ̂( 10) , and ℎ̂( 101) .
(b) Define ℎ( L) = {ℎ(𝜎) | 𝜎 ∈ L } . Let L̂ = {𝜎 ⌢ 1 | 𝜎 ∈ B∗ } ; describe it with a
regular expression. Using the homomorphism ℎ̂ from the prior item, describe
ℎ̂( L̂) with a regular expression.
(c) Prove that the collection of regular languages is closed under homomorphism,
that if L is regular then so is ℎ( L) .
4.34 Find a nondeterministic Finite State machine M so that producing another
machine M̂ by taking the complement of the accepting states, 𝐹 M̂ = (𝐹 M ) c, will
not result in the language of the second machine being the complement of the
language of the first.
4.35 We will show that the class of regular languages is closed under reversal.
Recall that the reversal of the language is defined to be the set of reversals of the
strings in the language L R = {𝜎 R | 𝜎 ∈ L }.
(a) Show that for any two strings the reversal of the concatenation is the
concatenation, in the opposite order, of the reversals: (𝜎0 ⌢ 𝜎1 ) R = 𝜎1 R ⌢ 𝜎0 R. Hint: do
induction on the length of 𝜎1 .
(b) We will prove the result by showing that for any regular expression 𝑅 , the
reversal L (𝑅) R is described by a regular expression. We will construct
this expression by defining a reversal operation on regular expressions.
Fix an alphabet Σ and let (1) ∅ R = ∅, (2) 𝜀 R = 𝜀 , (3) 𝑥 R = 𝑥 for
any 𝑥 ∈ Σ, (4) (𝑅0 ⌢ 𝑅1 ) R = 𝑅1 R ⌢ 𝑅0 R , (5) (𝑅0 |𝑅1 ) R = 𝑅0 R |𝑅1 R , and
(6) (𝑅 *) R = (𝑅 R )*. (Note the connection between (4) and the prior exercise
item.) Now show that 𝑅 R describes L (𝑅) R . Hint: use induction on the length
of the regular expression 𝑅 .
Section IV.5 Non-regular languages
The prior section showed via a counting argument that there are languages that are
not regular. We now see a technique to show that specific languages are not
regular.† This is similar to the second chapter, where we first used a counting
argument to prove that there are unsolvable problems and later showed that specific
problems such as the Halting problem are unsolvable.
The idea is that although Finite State machines are finite, they can get arbitrarily
long inputs. For instance, the power switch from Example 1.1 has only two states
but even if we toggle it hundreds of times, it still keeps track of whether the switch
is on or off. The key observation is that to process long inputs with only a small
number of states, a machine must revisit states, that is, it must cycle.
Cycles inside a machine cause a pattern in what that machine accepts. The
diagram below shows a machine that accepts aabbbc (it only shows some of the
states, those that the machine traverses in processing this input).
[Diagram (∗) omitted: the states 𝑞0, 𝑞𝑖1 , . . . 𝑞𝑖5 that the machine traverses on input aabbbc, where reading the substring abb carries the machine around a cycle back to 𝑞𝑖1 .]
Because of the cycle, in addition to aabbbc this machine must also accept a ( abb) 2 bc
since that string takes the machine through the cycle twice. Likewise, this machine
accepts a ( abb) 3 bc, and cycling more times pumps out more accepted strings.
5.1 Theorem (Pumping Lemma) Let L be a regular language. There is a constant 𝑝 ∈
N+, the pumping length for the language,‡ such that every string 𝜎 ∈ L with
|𝜎 | ≥ 𝑝 decomposes into three substrings 𝜎 = 𝛼 ⌢ 𝛽 ⌢𝛾 satisfying: (1) the first two
components are short, |𝛼𝛽 | ≤ 𝑝 , (2) 𝛽 is not empty, and (3) the strings 𝛼𝛾 , 𝛼𝛽 2𝛾 ,
𝛼𝛽 3𝛾 , . . . are also members of the language L.
Proof Suppose that L is recognized by the deterministic Finite State machine M.
For 𝑝 it suffices to use the number of states in M.
Consider a string 𝜎 ∈ L with |𝜎 | ≥ 𝑝 . Finite State machines perform one
transition per character so the number of characters in an input string equals the
number of transitions. Thus the number of states, not necessarily distinct ones,
that the machine visits is one more than the number of transitions. (For instance,
with a one-character input a machine visits two states.) So in processing 𝜎 , the
machine revisits at least one state. It cycles.
Of the states that are repeated as the machine processes 𝜎 , fix the one 𝑞 that it
revisits first. Also fix 𝜎 ’s shortest two prefixes ⟨𝑠 0, ... 𝑠𝑖 ⟩ and ⟨𝑠 0, ... 𝑠𝑖 , ... 𝑠 𝑗 ⟩ that
† ?? contains another way to show that a language is not regular. While somewhat more abstract, it
applies to all non-regular languages whereas the result here does not apply to some (see Exercise 5.30).
‡ If 𝑝 works then so does any number greater than 𝑝 .
take the machine to 𝑞 . That is, 𝑖 and 𝑗 are minimal such that 𝑖 ≠ 𝑗 and the extended
transition function gives Δ̂(⟨𝑠 0, ... 𝑠𝑖 ⟩) = Δ̂(⟨𝑠 0, ... 𝑠 𝑗 ⟩) = 𝑞 . Let 𝛼 = ⟨𝑠 0, ... , 𝑠𝑖 ⟩ , let
𝛽 = ⟨𝑠𝑖+1, ... 𝑠 𝑗 ⟩ , and let 𝛾 = ⟨𝑠 𝑗+1, ... 𝑠𝑘 ⟩ .
These strings satisfy conditions (1) and (2). In particular, choosing 𝑞 , 𝑖 , and 𝑗 to
be minimal guarantees that |𝛼 ⌢𝛽 | ≤ 𝑝 because the machine has 𝑝 -many states
and so a state must repeat by at most the 𝑝 -th input character. For condition (3),
this string
𝛼 ⌢ 𝛾 = ⟨𝑠 0, ... 𝑠𝑖 , 𝑠 𝑗+1, ... 𝑠𝑘 ⟩
brings the machine from the start state 𝑞 0 to 𝑞 , and then to the same ending state
as did 𝜎 . That is, Δ̂(𝛼𝛾) = Δ̂(𝛼𝛽𝛾) and so the machine accepts 𝛼𝛾 . As to the other
strings in (3), for instance with 𝛼𝛽 2𝛾 = 𝛼 ⌢ 𝛽 ⌢ 𝛽 ⌢𝛾 ,
the substring 𝛼 brings the machine from 𝑞 0 to 𝑞 , the first 𝛽 brings it from 𝑞 around
to 𝑞 again, the second 𝛽 makes the machine cycle to 𝑞 yet again, and finally 𝛾
brings it to the same ending state as did 𝜎 .
We typically use the Pumping Lemma to show that a language is not regular
through an argument by contradiction.
5.2 Example The canonical example is to show that this language, whose strings have a block of a’s matched by an equal block after a central b, is not regular.
L = { a𝑛 ba𝑛 𝑛 ∈ N }
For contradiction assume that L is regular. The Pumping Lemma says that this
language has a pumping length. Call it 𝑝 and consider 𝜎 = a𝑝 ba𝑝.
The string 𝜎 is an element of L and |𝜎 | ≥ 𝑝 . Thus it decomposes as 𝜎 = 𝛼𝛽𝛾 ,
subject to the three conditions. Condition (1) is that |𝛼𝛽 | ≤ 𝑝 and so both
substrings 𝛼 and 𝛽 are composed entirely of a’s. Condition (2) is that 𝛽 is not the
empty string and so 𝛽 consists of at least one a. Condition (3) states that all of
the strings 𝛼𝛾, 𝛼𝛽 2𝛾, 𝛼𝛽 3𝛾, ... are members of L. Consider the first, 𝛼𝛾 (there are
other choices that would work).
Compared to 𝜎 = 𝛼𝛽𝛾 , in 𝛼𝛾 the substring 𝛽 is gone. Because 𝛼 and 𝛽 consist
entirely of a’s, the substring 𝛾 has the b character from 𝜎 , and hence also has the
a𝑝 that follows this b. So compared to 𝜎 = 𝛼𝛽𝛾 , the string 𝛼𝛾 omits at least one a
before the b but none of the a’s after it. Therefore 𝛼𝛾 is not a palindrome, which is
the desired contradiction.
5.4 Remark In that example the string 𝜎 has three parts, 𝜎 = a𝑝 ⌢ b ⌢ a𝑝, and it
decomposes into three parts, 𝜎 = 𝛼 ⌢ 𝛽 ⌢𝛾 . Don’t make the mistake of thinking that
the two decompositions line up. The Pumping Lemma does not say that 𝛼 = a𝑝,
𝛽 = b, and 𝛾 = a𝑝 — indeed, we’ve shown that 𝛽 does not contain the b. Instead
the lemma’s first condition only says that the first two substrings together, 𝛼𝛽 ,
consists exclusively of a’s. So perhaps 𝛼𝛽 = a𝑝, or perhaps 𝛾 starts with some a’s
that are then followed by ba𝑝. That is, all we know is that 𝛼𝛽 matches the regular
expression a* while 𝛾 matches a*baa ... a, with 𝑝 -many a’s at the end.
5.5 Example Consider L = { 0𝑚 1𝑛 ∈ B∗ 𝑚 = 𝑛 + 1 } = { 0, 001, 00011, ... }, whose
members start with a number of 0’s that is one more than the number of 1’s at the
end. We will prove that it is not regular.
For contradiction assume otherwise, that L is regular, and denote its pumping
length by 𝑝 . Consider 𝜎 = 0𝑝+1 1𝑝 ∈ L. Because |𝜎 | ≥ 𝑝 , the Pumping Lemma
gives a decomposition 𝜎 = 𝛼𝛽𝛾 satisfying the three conditions. Condition (1) says
that |𝛼𝛽 | ≤ 𝑝 , so that the substrings 𝛼 and 𝛽 have only 0’s (and also, all of 𝜎 ’s
1’s are in 𝛾 ). Condition (2) says that 𝛽 has at least one character, necessarily 0.
Consider Condition (3)’s list: 𝛼𝛾 , 𝛼𝛽 2𝛾 , 𝛼𝛽 3𝛾 , . . . Compare its first entry, 𝛼𝛾 ,
to 𝜎 . The string 𝛼𝛾 has fewer 0’s than does 𝜎 but the same number of 1’s. So the
number of 0’s in 𝛼𝛾 is not one more than the number of 1’s. Thus 𝛼𝛾 ∉ L, which
contradicts the third condition of the Pumping Lemma.
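Example 5.5’s argument can be checked mechanically. Below is a sketch in Python (an illustration only; the language choice, the membership test in_lang, and the sample pumping length 𝑝 = 4 are all assumptions made for the demonstration). It confirms that every legal decomposition of 𝜎 pumps down to a string outside the language.

```python
def in_lang(s):
    # Membership in L = { 0^m 1^n | m = n + 1 }: a run of 0's followed by
    # a run of 1's, with one more 0 than there are 1's.
    zeros = len(s) - len(s.lstrip('0'))
    ones = len(s) - len(s.rstrip('1'))
    return zeros + ones == len(s) and zeros == ones + 1

p = 4                                    # a sample pumping length
sigma = '0' * (p + 1) + '1' * p          # sigma is in L and |sigma| >= p
assert in_lang(sigma)

# Every decomposition sigma = alpha beta gamma with |alpha beta| <= p and
# beta nonempty puts only 0's in beta, so alpha+gamma drops at least one 0.
for i in range(p):                       # alpha = sigma[:i]
    for j in range(i + 1, p + 1):        # beta = sigma[i:j] is nonempty
        alpha, beta, gamma = sigma[:i], sigma[i:j], sigma[j:]
        assert set(beta) == {'0'}
        assert not in_lang(alpha + gamma)
```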
We can interpret that example to say that Finite State machines cannot recognize
a predecessor-successor relationship. We can similarly use the Pumping Lemma to
show Finite State machines cannot recognize other arithmetic relations.
5.6 Example The language L = { a𝑛 𝑛 is a perfect square } = {𝜀, a, a4, a9, a16, ... } is
not regular. For, suppose otherwise. Denote the pumping length by 𝑝 and consider
𝜎 = a(𝑝2), so that 𝜎 ∈ L and |𝜎 | ≥ 𝑝 .
By the Pumping Lemma, 𝜎 decomposes as 𝛼𝛽𝛾 , subject to the three conditions.
Condition (1) is that |𝛼𝛽 | ≤ 𝑝 , which implies that |𝛽 | ≤ 𝑝 . Condition (2) is that
0 < |𝛽 | . Now consider the strings 𝛼𝛾 , 𝛼𝛽 2𝛾 , . . .
We will get a contradiction from 𝛼𝛽 2𝛾 . The definition of L is that after 𝜎 the next longer member has length (𝑝 + 1) 2 = 𝑝 2 + 2𝑝 + 1. But |𝛼𝛽 2𝛾 | = 𝑝 2 + |𝛽 | , and conditions (1) and (2) give 0 < |𝛽 | ≤ 𝑝 , so that 𝑝 2 < |𝛼𝛽 2𝛾 | ≤ 𝑝 2 + 𝑝 . This length falls strictly between the consecutive squares 𝑝 2 and (𝑝 + 1) 2 , so 𝛼𝛽 2𝛾 ∉ L, contradicting the third condition.
IV.5 Exercises
✓ 5.8 Example 5.5 shows that L = { 0𝑚 1𝑛 ∈ B∗ 𝑚 = 𝑛 + 1 } is not regular but your
friend doesn’t get it and asks you, “What’s wrong with the regular expression
0𝑛+1 1𝑛 ?” Explain it to them.
5.9 Example 5.2 uses 𝛼𝛽 2𝛾 to show that the language of balanced parentheses is
not regular. Instead get the contradiction by showing that 𝛼𝛾 is not a member of
the language.
5.10 Your friend has been thinking. They say, “Hey, the diagram (∗) before
Theorem 5.1 doesn’t apply unless the language is infinite. Sometimes languages
are regular because they only have like three or four strings. But the Pumping
Lemma’s third condition requires that infinitely many strings be in the language, so
the Pumping Lemma is wrong.” In what way do they need to further refine their
thinking?
5.11 Someone in the class emails you, “If a language has a string with length
greater than the number of states, which is the pumping length, then it cannot be
a regular language.” Correct?
✓ 5.12 Your study partner has read Remark 5.4 but it is still sinking in. About the
matched parentheses example, Example 5.2, they say, “So 𝜎 = (𝑝 )𝑝 , and 𝜎 = 𝛼𝛽𝛾 .
We know that 𝛼𝛽 consists only of (’s, so it must be that 𝛾 consists of )’s.” Give
them a prompt.
224 Chapter IV. Automata
✓ 5.13 For each, give five strings that are elements of the language and five that are
not, and then show that the language is not regular by using the Pumping Lemma.
(a) L0 = { a𝑛 b𝑚 𝑛 + 2 = 𝑚 }
(b) L1 = { a𝑛 b𝑚 c𝑛 𝑛, 𝑚 ∈ N }
(c) L2 = { a𝑛 b𝑚 𝑛 < 𝑚 }
✓ 5.14 For each language over Σ = { a, b } produce five strings that are members.
Then decide whether that language is regular. Prove your assertion either by
producing a regular expression or using the Pumping Lemma.
(a) { a𝑛 b𝑚 ∈ Σ∗ 𝑛 = 3 }
(b) { a𝑛 b𝑚 ∈ Σ∗ 𝑛 + 3 = 𝑚 }
(c) { a𝑛 b𝑚 ∈ Σ∗ 𝑛, 𝑚 ∈ N }
(d) { a𝑛 b𝑚 ∈ Σ∗ 𝑚 − 𝑛 > 12 }
✓ 5.15 Each language is non-regular and 𝜎 is a good choice as part of a proof using
the Pumping Lemma, where 𝑝 is the pumping length. For each, give the most
specific regular expression describing 𝛼𝛽 and 𝛾 . Take Σ = { a, b }.
(a) L = { a𝑛 b2𝑛 𝑛 ∈ N } , 𝜎 = a𝑝 b2𝑝
(b) L = { a𝑛 b𝑛+5 𝑛 ∈ N } , 𝜎 = a𝑝 b𝑝+5
(c) L = { a𝑖 b 𝑗 a𝑖+𝑗 𝑖 ∈ N and 𝑗 ∈ N+ } , 𝜎 = a𝑝 ba𝑝+1
(d) L = { a𝑘 ⌢ 𝜏 𝑘 ∈ N and |𝜏 | = 𝑘 } , 𝜎 = a𝑝 b𝑝
(e) L = {𝜏 𝜏 is a palindrome and |𝜏 | is even } , 𝜎 = a𝑝 bba𝑝
5.16 With a friend you try to apply the Pumping Lemma to {𝜏 ⌢ 𝜏 𝜏 ∈ Σ∗ }.
(a) List five elements of the language.
(b) You pick 𝜎 = a𝑝 ba𝑝 b; go through the argument.
(c) Your friend tries 𝜎 = a𝑝 a𝑝 and can’t get their argument to go. Suggestions?
✓ 5.17 Use the Pumping Lemma to prove that L = { a𝑚−1 cb𝑚 𝑚 ∈ N+ } is not
regular. It may help to first produce five strings from the language.
5.18 Show that the language over { a, b } of strings having more a’s than b’s is not
regular.
5.19 One of these is regular, one is not. Which is which? (Prove your assertions.)
(a) { a𝑛 b𝑚 ∈ { a, b }∗ 𝑛 = 𝑚 2 }
(b) { a𝑛 b𝑚 ∈ { a, b }∗ 3 < 𝑚, 𝑛 }
[State diagram over { a, b } with states 𝑞 0 , 𝑞𝑖1 , 𝑞𝑖4 , 𝑞𝑖5 , 𝑞𝑖8 .]
(a) Produce a Finite State machine with three states that recognizes this language and argue that this is the minimal number of states for such a machine.
(b) Show that the minimal pumping length for L is 1.
Section IV.6 Pushdown machines
No Finite State machine can recognize the language of balanced parentheses. So
this machine model is not powerful enough to, for instance, decide whether input
strings are valid programs in most programming languages. To handle nested
parentheses, the natural data structure is a pushdown stack. We now supplement
a Finite State machine by giving it access to a stack.
A stack is like the restaurant dish dispenser below: when you push a new dish
on, its weight compresses a spring underneath, so that the old ones move down
and the most recent dish is the only one that you can reach. When you pop that
top dish off, the spring pushes the remaining dishes up and now you can reach the
next one. We say that this stack is LIFO: Last-In, First-Out.
Below on the right is a sequence of views of a stack. Initially the stack has two
characters, g3 and g2. We push g1 on the stack, and then g0. Now, although g1 is
on the stack, we don’t have immediate access to it. To get at g1 we must first pop
off g0, as in the last stack shown.
g2     g1     g0     g1
g3     g2     g1     g2
       g3     g2     g3
              g3
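In a program the same discipline is natural with a list used as a stack; this Python fragment (an illustration, not part of the text) replays the snapshots above, with the end of the list as the top.

```python
stack = ['g3', 'g2']     # initially g2 sits on top of g3
stack.append('g1')       # push g1
stack.append('g0')       # push g0; g1 is now buried
top = stack.pop()        # pop returns the most recently pushed character
assert top == 'g0'
assert stack[-1] == 'g1' # g1 is reachable again: Last-In, First-Out
```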
Like a Turing machine tape, a stack provides storage that is unbounded. But
it has restrictions that the tape does not. Once something is popped, it is gone.
We could include in the machine a state whose intuitive meaning is that we have
just popped g0 but as there are finitely many states and unboundedly many stack
arrangements, that strategy has limits.
Section 6. Pushdown machines 227
6.1 Example We will give a Pushdown machine that recognizes the language of
balanced parentheses, LBAL , containing strings such as [] and [[]], as well as
[[][]] and [][]. Precisely stated, 𝜎 ∈ LBAL if it contains the same number of [’s
as ]’s and no prefix of 𝜎 contains more ]’s than [’s.
The Pumping Lemma shows that no Finite State machine recognizes LBAL . But
it is recognized by this Pushdown machine. It has two states 𝑄 = {𝑞 0, 𝑞 1 }, one
of which is an accepting state, 𝐹 = {𝑞 1 }. Its tape alphabet is Σ = { [ , ] } and its
stack alphabet is Γ = { g0 }. The table below gives Δ. Instruction numbers are for
ease of reference.
In a Pushdown machine every computation step begins with the machine popping
the top character off the stack. At the start of the computation that character is ⊥. The machine is then in
state 𝑞 0 , is reading [ on the tape, and the popped character is ⊥, so instruction 0
applies. The machine goes into state 𝑞 0 (which is not a change) and pushes the
two-token string g0⊥ onto the stack. The ⊥ only replaces what was there already,
but the g0 makes a new stack top character.
Here is an example computation accepting the string [[]][].
† Read aloud as “bottom.”
‡ These machines sometimes need to do final work triggered by the end of the input. This doesn’t happen for Finite State machines and so for them we don’t mark the input end in the same way.
Step   State   Tape      Stack
0      𝑞 0     [[]][]B   ⊥
1      𝑞 0     []][]B    g0 ⊥
2      𝑞 0     ]][]B     g0 g0 ⊥
3      𝑞 0     ][]B      g0 ⊥
4      𝑞 0     []B       ⊥
5      𝑞 0     ]B        g0 ⊥
6      𝑞 0     B         ⊥
7      𝑞 1               ⊥
After step 1 there are two g0’s on the stack, which is how the machine remembers
that the number of [’s it has consumed is two more than the number of ]’s. At the
end it has an empty tape and is in an accepting state, so it accepts the input.
Here is a rejection example, whose initial string does not have balanced
parentheses.
Step   State   Tape   Stack
0      𝑞 0     []]B   ⊥
1      𝑞 0     ]]B    g0 ⊥
2      𝑞 0     ]B     ⊥
3      𝑞 0     B      (empty)
At the end, although the tape still has content, the stack is empty. The machine
cannot start the next step by popping the top stack character because there is no
such character. The computation dies, without accepting the input.
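The machine’s stack discipline is easy to mimic in ordinary code. Here is a Python sketch (an illustration; the book’s machine is given by its Δ table, not by this program): push a g0 for each [ , pop for each ] , die on popping an empty stack, and accept when the input ends with the stack back at bottom.

```python
def accepts_balanced(s):
    stack = []
    for ch in s:
        if ch == '[':
            stack.append('g0')       # remember one more unmatched [
        else:                        # ch == ']'
            if not stack:
                return False         # popping an empty stack: the run dies
            stack.pop()
    return not stack                 # accept iff every [ was matched

assert accepts_balanced('[[]][]')
assert not accepts_balanced('[]]')
```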
We are ready for the definition.
In state 𝑞 0 , when this machine sees a on the tape then it pushes g0 onto the stack,
and for b it pushes g1. Reading the c switches the machine to state 𝑞 1 . In this
phase, if it is reading a and the character on top of the stack is g0 then the machine
consumes the a, pops the g0, and goes on. The same happens with b and g1.
Otherwise, the computation dead-ends. Finally, if the machine reaches the end of
the input string at the same moment that it reaches the bottom of the stack then it
goes to the accepting state 𝑞 3 .
Here is an example computation accepting the input bacab.
Step   State   Tape     Stack
0      𝑞 0     bacabB   ⊥
1      𝑞 0     acabB    g1 ⊥
2      𝑞 0     cabB     g0 g1 ⊥
3      𝑞 1     abB      g0 g1 ⊥
4      𝑞 1     bB       g1 ⊥
5      𝑞 1     B        ⊥
6      𝑞 3              ⊥
The machine runs in two phases. Where the input is 𝜎𝜎 R, the first phase works
with 𝜎 . If the tape character is 0 then the machine pushes the token g0 onto the
stack, and if it is 1 then the machine pushes g1. This is done while in state 𝑞 0 .
The second phase works with 𝜎 R. If 0 is on the tape and g0 tops the stack, or 1
and g1, then the machine proceeds. Otherwise there is no matching instruction
and the computation branch dies. This is done while in state 𝑞 1 .
Without a middle marker how does the machine know when to change from
phase one to two, from pushing to popping? It is nondeterministic — it guesses.
That happens in lines 7 and 8. An 𝜀 in an instruction’s input slot means that the machine
can spontaneously transition from 𝑞 0 to 𝑞 1 .
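One way to picture the guessing is to try every possible guess. This Python sketch (an illustration only, not the machine itself) simulates the nondeterminism by trying each split point as the moment the machine jumps from the pushing state 𝑞 0 to the popping state 𝑞 1 ; the input is accepted if any branch succeeds.

```python
def accepts_even_palindrome(s):
    for i in range(len(s) + 1):     # guess: switch phases after i characters
        stack = list(s[:i])         # phase one pushed these, last one on top
        ok = True
        for ch in s[i:]:            # phase two pops and compares
            if stack and stack[-1] == ch:
                stack.pop()
            else:
                ok = False
                break
        if ok and not stack:
            return True             # some branch of the tree accepts
    return False

assert accepts_even_palindrome('0110')
assert not accepts_even_palindrome('100')
```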
We will show three example computations. For the first, we exhibit a successful
branch of the computation tree.
Step   State   Tape   Stack
0      𝑞 0     0110   ⊥
1      𝑞 0     110    g0 ⊥
2      𝑞 1     10     g1 g0 ⊥
3      𝑞 1     0      g0 ⊥
4      𝑞 2            ⊥
First note a point about the input. Because this machine can guess, it can guess
whether the input is finished. (Instruction 11 says that if the machine is in state 𝑞 1
and the stack has only ⊥ then the machine can spontaneously transition to 𝑞 2 ,
which is the only accepting state. If this happens after the input string has run out
then the computation branch succeeds.) So we can omit the terminating B that we
used earlier.
Next is the computation for input 00. The picture below shows the entire
computation tree. The 𝜀 transitions are drawn vertically (note the difference
between the vertical ‘⊢’ and the bottom symbol). The machine accepts the input
because the highlighted branch ends with an empty tape and in the accepting
state 𝑞 2 .
[Computation tree for input 00. The bottom row is the all-push branch 𝑞 0 , ⊥ ⊢ 𝑞 0 , g0⊥ ⊢ 𝑞 0 , g0g0⊥ , via instructions 0 and 3. From each of those configurations an 𝜀 edge via instruction 7 leads up to 𝑞 1 , and from 𝑞 1 with a bare ⊥ an 𝜀 edge via instruction 11 leads to 𝑞 2 . The highlighted accepting branch is 𝑞 0 , ⊥ ⊢ 𝑞 0 , g0⊥ ⊢ 𝑞 1 , g0⊥ ⊢ 𝑞 1 , ⊥ ⊢ 𝑞 2 , ⊥ , using instructions 0, 7, 9, and 11.]
6.5 Animation: Computation tree for 00. Next to the ⊢’s are instruction numbers.
The third example computation is a rejection. The input is 100, which isn’t an
even-length palindrome, and none of the branches end both with an empty string
and in an accepting state.
[Computation tree for input 100. The all-push branch along the bottom is 𝑞 0 , ⊥ ⊢ 𝑞 0 , g1⊥ ⊢ 𝑞 0 , g0g1⊥ ⊢ 𝑞 0 , g0g0g1⊥ , via instructions 1, 4, and 3, with 𝜀 edges branching off to 𝑞 1 and 𝑞 2 along the way. No branch ends with both an empty tape and the accepting state 𝑞 2 .]
Our intuition is that Pushdown machines have more power than Finite State
machines, in that they have a kind of unbounded read/write memory. The prior
examples support that, by showing Pushdown machines that recognize languages
that cannot be recognized by any Finite State machine.
6.7 Remark Stack machine models are often used in practice for running hardware.
Here is a ‘Hello World’ program in the PostScript printer language.
/Courier % name the font
20 selectfont % font size in points, 1/72 of an inch
72 500 moveto % position the cursor
(Hello world!) show % stroke the text
showpage % print the page
The interpreter pushes Courier on the stack, and then on the second line pushes
20 on the stack. It then executes selectfont, which pops two things off the stack
to set the font name and size. After that it moves the current point and places the
text on the page. Finally, it draws that page to paper.
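The operand-stack style of evaluation is easy to sketch. The toy Python evaluator below (an invented illustration; PostScript itself has a far richer object model) handles postfix tokens the same way: operands are pushed, and an operator pops its arguments and pushes its result.

```python
def rpn(tokens):
    stack = []
    for t in tokens:
        if t in ('add', 'mul'):
            b, a = stack.pop(), stack.pop()   # an operator consumes the stack top
            stack.append(a + b if t == 'add' else a * b)
        else:
            stack.append(int(t))              # an operand is pushed
    return stack.pop()

assert rpn(['72', '500', 'add']) == 572
```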
We close this section with a number of related results that together make a
bigger picture, that the machine models form a linear hierarchy. Full coverage is
outside our scope so we will only discuss some of these results without proof.
The first result we have already seen, that deterministic Finite State machines
do the same jobs as nondeterministic ones. That also holds for Turing machines,
although we will not consider nondeterministic Turing machines in depth until the
final chapter.
Another relevant result, which we won’t prove, is that there are things that
Turing machines can do but that no Pushdown machine can do. One is to decide
membership in the language {𝜎 ⌢ 𝜎 𝜎 ∈ B∗ }, which contains strings such as 1010
and 011011. A Pushdown machine can remember the characters by pushing them
onto the stack, and if that machine is nondeterministic then it can guess when the
first half of the input ends. But then to check that the second half of the string
matches the first it would need to pop the characters off to reverse them, and
reversing an arbitrary length string requires being able to write to the tape.
We know that Finite State machines accept Regular languages, and Turing ma-
chines accept computable languages. As to nondeterministic Pushdown machines,
recall that in the section on Grammars we restricted our attention to production
rules where the head consists of a single nonterminal, such as S → aSb.† If a
language has a grammar in which all the rules are of this type then it is a context
free language. Most familiar programming languages are context free, including C,
Java, Python, and Racket. We will state but not prove that a language is accepted
by some nondeterministic Pushdown machine if and only if it has a context free
grammar.
The last result needs deterministic Pushdown machines so we first outline how to
define them. In contradistinction to a nondeterministic machine, in a deterministic
machine at any step there is exactly one legal move. So to adjust the definition we
have for nondeterministic Pushdown machines to one for deterministic ones we
eliminate situations where the machine has choices. There are two situations. One
is that Δ(𝑞𝑖 , 𝑡, 𝑔) is a set and so we will require that in a deterministic machine
that set must have exactly one element. The other is evident in the tree diagrams
above: nondeterministic machines can have that Δ(𝑞𝑖 , 𝜀, 𝑔) is a nonempty set and
also that Δ(𝑞𝑖 , 𝑡, 𝑔) is nonempty for 𝑡 ≠ 𝜀 (see for instance the prior example’s
machine in lines 0–2). So we outlaw the possibility that both are nonempty.
Example 6.1 and Example 6.3 are both deterministic.‡
With that, the last relevant result is that the collection of languages accepted
by deterministic Pushdown machines is a proper subset of the collection accepted
by nondeterministic Pushdown machines. While we won’t prove that, we can give
a good idea of why it is true. We have shown that there is a nondeterministic
Pushdown machine that accepts the language of even-length palindromes. It uses
𝜀 moves to guess when to change from pushing to popping. But a deterministic
Pushdown machine has recourse to no such tactic. Nor is there a middle marker to
rely on. In short, no deterministic Pushdown machine accepts LELP . So Pushdown
machines are different from Finite State machines and Turing machines: for Pushdown machines, nondeterminism adds power.
† An example of a rule where the head is not of that form is cSb → aS. With this rule we can substitute for S only if it is preceded by c and followed by b. A grammar with rules of this type is called context sensitive because substitutions can only be done in a context.
‡ Deterministic Pushdown machines need the end-marker B, which is why we used it for those examples.
IV.6 Exercises
✓ 6.8 Produce a Pushdown machine that does not halt.
6.9 Consider the Pushdown machine in Example 6.1.
(a) With the input [][], step through the computation as a sequence of ⊢ relations.
(b) Do the same but with the input ][][.
✓ 6.10 Produce a Pushdown machine to accept each language over Σ = { a, b, c }.
(a) { a𝑛 cb2𝑛 𝑛 ∈ N } (b) { a𝑛 cb𝑛−1 𝑛 > 0 }
✓ 6.11 Give a Pushdown machine that accepts { 0 ⌢ 𝜏 ⌢ 1 𝜏 ∈ B∗ }.
✓ 6.12 Write a Pushdown machine that accepts { a2𝑛 𝑛 ∈ N }.
6.13 Give a Pushdown machine that accepts { a2𝑛 b𝑛 𝑛 ∈ N }.
✓ 6.14 Example 6.4 discusses the view of a nondeterministic computation as a tree.
Draw the tree for that machine on these inputs. (a) 0110 (b) 010
✓ 6.15 Give a grammar for the language in Example 6.4, the even-length palindromes
over B.
6.16 Use the Pumping Lemma to show that the language of even-length palin-
dromes from Example 6.4 is not recognized by any Finite State machine.
6.17 Fix an alphabet Σ. (a) Show that the set of Pushdown machines over Σ
is countable. (b) Show that the collection of languages accepted by Pushdown
machines is countable. (c) Conclude that there are languages that no Pushdown
machine accepts.
6.18 Use Church’s Thesis to argue that any language recognized by a Pushdown
machine is recognized by some Turing machine.
Extra IV.A Regular expressions in the wild
Regular expressions are an important tool in practice. Modern programming
languages such as Racket and Python include capabilities for extensions to regular
expressions, which we will call regexes. These go beyond the small-scale theory
examples that we saw earlier.
Extra A. Regular expressions in the wild 235
As an example, consider a system administrator searching a web server log for the
PDF’s downloaded from a directory. They might give this command.
$ grep "/linearalgebra/.*\.pdf" /var/log/apache2/access.log
The grep utility program looks through the log file line by line. If a line has a
substring matching the regex then grep prints that line.
We will illustrate with Racket regexes. As a prototype,
> (regexp-match? #px"^[A-Z][A-Z][0-9][A-Z][A-Z]$" "KE1AZ")
returns #t. Note the caret ^ at the start of the string and the dollar sign
at the end. These are anchors, making Racket match the entire string from
start to finish. They are needed because the most common use case is for
programmers to want to find the expression anywhere in the string. Thus for
instance, (regexp-match? #px"[0-9]" "KE1AZ") also returns #t, although
its expression doesn’t account for the letters, because it asks for at least one digit
somewhere in KE1AZ. However, here we will use the caret and dollar sign because
for the purpose of this explication, they better describe the matching.
The extensions that languages make in going
from the theoretical regular expressions that we have
seen earlier to in-practice regexes fall into two cat-
egories. First come convenience constructs that ease
doing something that otherwise would be possible
but awkward. Second comes extensions that give
capabilities that are just not possible with regular
expressions.
Many of the convenience extensions are about the
problem of sheer scale: in the theory discussion our
alphabets had two or three characters but in practice
an alphabet must include at least ASCII’s printable
characters: a – z, A – Z, 0 – 9, space, tab, period, dash,
exclamation point, percent sign, dollar sign, open
and closed parenthesis, open and closed curly braces,
etc. These days it may even contain all of Unicode’s more than one hundred
thousand characters. We need manageable ways to describe such large sets.
Consider matching a digit. The regular expression (0|1|2|3|4|5|6|7|8|9)
works, but is too verbose for an often-needed list. One abbreviation that modern
languages allow is [0123456789], omitting the pipe characters and using square
brackets, which in regexes are metacharacters. Or, because the digit characters
are contiguous in the character set,† we can shorten it further to [0-9]. Along the
same lines, [A-Za-z] matches a singleton English letter.
To invert the set of matched characters, put a caret ‘^’ as the first thing inside
the bracket (and note that it is a metacharacter). Thus, [^0-9] matches a non-digit
and [^A-Za-z] matches a character that is not an ASCII letter.
† The digits 0 through 9 are contiguous in both ASCII and Unicode.
The most common lists have short abbreviations. Another abbreviation for the
digits is \d. Use \D for the ASCII non-digits, \s for the whitespace characters
(space, tab, newline, formfeed, and line return) and \S for ASCII characters that are
non-whitespace. Cover the alphanumeric characters (upper and lower case ASCII
letters, digits, and underscore) with \w and cover the ASCII non-alphanumeric
characters with \W. And — the big kahuna — the dot ‘.’ is a metacharacter that
matches any member of the alphabet at all.†
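The book’s in-practice examples use Racket; the same class syntax also works in Python’s re module, which we can use to spot-check the abbreviations (the particular test strings here are invented for illustration).

```python
import re

assert re.fullmatch(r'[0123456789]', '7')     # explicit list of digits
assert re.fullmatch(r'[0-9]', '7')            # range shorthand
assert re.fullmatch(r'[A-Za-z]', 'Q')         # a single ASCII letter
assert not re.fullmatch(r'[^0-9]', '3')       # leading caret inverts the class
assert re.fullmatch(r'\d\D\s\w\W', '1a b!')   # digit, non-digit, space, word, non-word
assert re.fullmatch(r'..', 'ab')              # the dot matches any character
```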
1.1 Example Canadian postal codes have seven characters: the fourth is a space, the
first, third, and sixth are letters, and the others are digits. The regular expression
[a-zA-Z]\d[a-zA-Z] \d[a-zA-Z]\d describes them.
1.2 Example Dates are often given in the ‘dd/mm/yy’ format. This matches:
\d\d/\d\d/\d\d.
1.3 Example In the twelve hour time format some typical time strings are ‘8:05 am’
or ‘10:15 pm’. You could use this (note the empty string at the start).
(|0|1)\d:\d\d\s(am|pm)
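The three patterns above can be spot-checked with Python’s re module (an illustration; the sample strings are invented).

```python
import re

assert re.fullmatch(r'[a-zA-Z]\d[a-zA-Z] \d[a-zA-Z]\d', 'K1A 0B1')  # postal code
assert re.fullmatch(r'\d\d/\d\d/\d\d', '14/07/89')                  # dd/mm/yy date
twelve_hour = r'(|0|1)\d:\d\d\s(am|pm)'
assert re.fullmatch(twelve_hour, '8:05 am')
assert re.fullmatch(twelve_hour, '10:15 pm')
```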
Quantifiers In the theoretical cases we saw earlier, to match ‘at most one a’ we
used 𝜀 |a. In practice we can write something like (|a), as we did above for the
twelve hour times. But depicting the empty string by just putting nothing there
can be confusing. Modern languages make question mark a metacharacter and
allow you to write a? for ‘at most one a’.
For ‘at least one a’ modern languages use a+, so the plus sign is another
metacharacter. More generally, we often want to specify quantities. For instance,
to match five a’s regexes use the curly braces as metacharacters, with a{5}. Match
between two and five of them with a{2,5} and match at least two with a{2,}.
Thus, a+ is shorthand for a{1,}.
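A quick check of the quantifiers, again in Python’s re module (illustrative only):

```python
import re

assert re.fullmatch(r'ba?c', 'bc') and re.fullmatch(r'ba?c', 'bac')   # at most one a
assert re.fullmatch(r'a+', 'aaa') and not re.fullmatch(r'a+', '')     # at least one a
assert re.fullmatch(r'a{5}', 'aaaaa')                                 # exactly five
assert re.fullmatch(r'a{2,5}', 'aaa') and not re.fullmatch(r'a{2,5}', 'a')
assert re.fullmatch(r'a{2,}', 'a' * 9)                                # at least two
```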
As earlier, to match any of these metacharacters you must escape them. For
instance, To be or not to be\? matches the famous question.
† Programming languages in practice by default have the dot match any character except newline. In addition, these languages have a way to make it also match newline.
Cookbook All of the extensions to regular expressions that we are seeing are
driven by the desires of working programmers. Here is a pile of examples showing
them accomplishing practical work, matching things you’d want to match.
1.4 Example US postal codes, called ZIP codes, are five digits. Match them with
\d{5}.
1.5 Example North American phone numbers match \d{3} \d{3}-\d{4}.
1.6 Example The regex (-|\+)?\d+ matches an integer, positive or negative. The
question mark makes the sign optional. The plus sign makes sure there is at least
one digit.
1.7 Example A natural number represented in hexadecimal can contain the usual
digits, along with the additional characters ‘a’ through ‘f ’ (sometimes capital-
ized). Programmers often prefix such a representation with 0x, so the regex is
(0x)?[a-fA-F0-9]+.
1.8 Example A C language identifier begins with an ASCII letter or underscore and
then can have arbitrarily many more letters, digits, or underscores: [a-zA-Z_]\w*.
1.9 Example Match a user name of between three and twelve letters, digits, under-
scores, or periods with [\w\.]{3,12}. Match a password that is at least eight
characters long with .{8,}.
1.10 Example The International Standards Organization date format calls for dates
like ‘yyyy-mm-dd HH:MM:SS’ (along with many other variants). The regex
\d{4}-\d{2}-\d{2} (\d{2}:\d{2}(:\d{2})?)? will match them.
1.11 Example Match the text inside a single set of parentheses with \([^()]*\).
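The cookbook entries above can all be spot-checked the same way in Python (sample strings invented for illustration):

```python
import re

assert re.fullmatch(r'\d{5}', '05401')                      # ZIP code
assert re.fullmatch(r'\d{3} \d{3}-\d{4}', '802 555-0123')   # phone number
assert re.fullmatch(r'(-|\+)?\d+', '-37')                   # signed integer
assert re.fullmatch(r'(0x)?[a-fA-F0-9]+', '0x1F')           # hexadecimal
assert re.fullmatch(r'[a-zA-Z_]\w*', '_count1')             # C identifier
assert re.fullmatch(r'\([^()]*\)', '(no nesting)')          # parenthesized text
```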
1.12 Example We next match a URL, a web address such as https://hefferon.net/computation. This regex is more intricate than prior ones. It is based on
breaking URL’s into three parts: a scheme such as ‘http’ along with a colon and
two forward slashes, a host such as hefferon.net and a slash, and then a path
such as computation (the standard also allows a trailing query string but this regex
does not handle that).
(https?|ftp)://([^\s/?\.#]+\.?){0,3}[^\s/?\.#]+(/[^\s]*/?)?
Notice the question mark in https?, so that the scheme can be http or https.
Notice also that the host part consists of between one and four fields separated
by periods. We allow almost any character in those fields, except for a space, a
question mark, a period or a hash. At the end comes the path.
But wait! There’s more! We have already noted that you can match the start of a
line and end of line with the metacharacters caret ‘^’ and dollar sign ‘$’.
1.13 Example Match lines starting with ‘Theorem’ using ^Theorem. Match lines ending
with end{equation*} using end{equation\*}$.
Regex engines in modern languages let you specify that the match is case
insensitive, although they differ in the syntax that you use to achieve this.
1.14 Example The web document language HTML has a tag for an image, such
as <img src="logo.jpg">, that uses either of the keys src or img to give the name of
the file containing the image. Those strings can be in upper case or lower case,
or any mix. Racket uses a ‘?i:’ syntax to mark part of the regex as insensitive:
\\s+(?i:(img|src))=. (Note also the double backslash, which is how Racket
escapes the backslash.)
Beyond convenience The regular expression engines that come with recent
programming languages have capabilities beyond matching only those languages
that are recognized by Finite State machines.
1.15 Example The language HTML uses tags such as <b>boldface text</b> and
<i>italicized text</i>. Matching any one tag is straightforward, for instance
<b>[^<]*</b>. But for a single expression that matches them all, you would seem
to have to do each as a separate case and then combine cases with an alternation
operator. However, instead we can have the system remember what it finds at
the start and look for that again at the end. Thus, the regex <([^>]+)>.*</\\1>
matches HTML tags like the ones given. Its second character is an open paren-
thesis, and the \\1 refers to everything between that open parenthesis and the
matching close parenthesis (and, that is not a typo; Racket’s syntax calls for double
backslashes). As is hinted by the 1, you can also have a second match with \\2,
etc.
That is a back reference. It is very convenient. However, it gives regexes more
power than the theoretical regular expressions that we studied earlier.
1.16 Example This is the language of square strings over Σ = { a, b }.
L = {𝜎 ∈ Σ∗ 𝜎 = 𝜏 ⌢ 𝜏 for some 𝜏 ∈ Σ∗ }
Some members are aabaab, baaabaaa, and aa. The Pumping Lemma shows that
the language of squares is not regular; see Exercise A.36. Describe this language
with the regex (.+)\1; note the back-reference.
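Python’s re module supports the same back references, written with a single backslash inside a raw string where Racket’s syntax needs two. A sketch (the sample strings are invented):

```python
import re

square = r'(.+)\1'                 # the group, then the same text again
assert re.fullmatch(square, 'aabaab')      # tau = aab
assert re.fullmatch(square, 'aa')
assert not re.fullmatch(square, 'aba')

tag = r'<([^>]+)>.*</\1>'          # opening tag remembered, reused at the close
assert re.fullmatch(tag, '<b>boldface text</b>')
assert not re.fullmatch(tag, '<b>mismatched</i>')
```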
1.17 Example Another language that the Pumping Lemma shows cannot be represented
using regular expressions, but that can be described with regexes is the language
of numbers that are nonprime, represented in unary, L = { 1𝑛 𝑛 is not prime }.
It is described by the regex ^1?$|^(11+?)\\1+$. A brief explanation: the ^1?$
matches a string that is either zero-many or one-many 1’s. The ^(11+?)\\1+$
matches a group of 1’s repeated one or more times. Being able to divide the number
of 1’s into some number of subgroups is what characterizes a unary number as
composite, that is, not prime.
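We can watch that regex classify small numbers using Python’s re module, whose backtracking behaves the same way here (in Python the back reference is written with a single backslash):

```python
import re

nonprime = re.compile(r'^1?$|^(11+?)\1+$')
matched = [n for n in range(10) if nonprime.match('1' * n)]
# 0 and 1 match the first alternative; 4, 6, 8, and 9 match the second.
assert matched == [0, 1, 4, 6, 8, 9]
```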
Tradeoffs Regexes are powerful tools. But they come with downsides.
For instance, the regular expression for twelve hour time from Example 1.3
(𝜀 |0|1)\d:\d\d\s(am|pm) does indeed match ‘8:05 am’ and ‘10:15 pm’ but it falls
short in some respects. One is that it requires am or pm at the end, but times are
often given without them. We could change the ending to (𝜀 |\s am|\s pm).
Another issue is that it also matches some strings that you don’t want, such as
13:00 am or 9:61 pm. We can solve this as with the prior paragraph, by listing the
cases.† (01|02|...|11|12):(00|01|...|59)(\s am|\s pm). This is like
the prior patch in that it fixes the issue but at a cost of complexity, since it amounts
to a list of allowed substrings. Regexes have a tendency to grow, to accrete subcases
like this.
Another example is the Canadian
postal expression in Example 1.1. Not
every matching string has a correspond-
ing physical post office — for one thing,
no valid codes begin with Z. And US ZIP
codes work the same way; there are fewer
than 50 000 assigned ZIP codes, so many
five-digit strings are not in use. Changing the regexes to cover only those codes
actually in use would make them just lists of strings, which would change frequently.
The canonical example of this is the regex describing the official standard for
valid email addresses. We show here just five lines out of its 81 but that’s enough
to make the point about its complexity.
(?:(?:\r\n)?[ \t])*(?:(?:(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t]
)+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:
\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(
?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[
\t]))*"(?:(?:\r\n)?[ \t])*))*@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\0
And, even if you do have an address that fits the standard, you don’t know if there
is an email server listening at that address. In practice, people often use the regex
\S+@\S+ as a sanity check, for instance on a web form that expects users to input
an email address.
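That pragmatic check can be sketched like this in Python (the sample addresses are made up):

```python
import re

# Accept: one or more non-space characters, an @, one or more non-space characters.
sane = re.compile(r'^\S+@\S+$')

for candidate in ['jim@example.com', 'no-at-sign', '@host', 'user@']:
    print(candidate, '->', bool(sane.match(candidate)))
```

It will accept plenty of strings that are not deliverable addresses, but that is the point: it catches obvious input mistakes cheaply and leaves real validation to the mail system.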
At this point regexes may be starting to seem less like a fast and neat problem-
solver and a little more like a potential development and maintenance problem.
The full story is that sometimes a regex is just what you need for a quick job, and
sometimes they are good for more complex tasks also. But some of the time the
cost of complexity outweighs the gain in expressiveness. This power/complexity
tradeoff is often referred to online by citing this quote from J Zawinski.
The notion that [regexes] are the solution to all problems is . . . braindead. . . .
Some people, when confronted with a problem, think "I know, I’ll use regular
expressions." Now they have two problems.
[Comic courtesy xkcd.com]
IV.A Exercises
✓ A.18 Which of the strings matches the regex ab+c? (a) abc (b) ac (c) abbb
(d) bbc
† Some substrings are elided so it fits in the margins.
240 Chapter IV. Automata
A.19 Which of the strings matches the regex [a-z]+[\.\? !]? (a) battle!
(b) Hot (c) green (d) swamping. (e) jump up. (f) undulate? (g) is.?
✓ A.20 Give a regex for each. (a) Match a string that has ab followed by zero or
more c’s, (b) ab followed by one or more c’s, (c) ab followed by zero or one c,
(d) ab followed by two c’s, (e) ab followed by between two and five c’s, (f) ab
followed by two or more c’s, (g) a followed by either b or c.
✓ A.21 Give a regex to accept a string for each description.
(a) Containing the substring abe.
(b) Containing only upper and lower case ASCII letters and digits.
(c) Containing a string of between one and three digits.
A.22 Give a regex to accept a string for each description. Take the English vowels
to be a, e, i, o, and u.
(a) Starting with a vowel and containing the substring bc.
(b) Starting with a vowel and containing the substring abc.
(c) Containing the five vowels in ascending order.
(d) Containing the five vowels.
A.23 Give a regex matching strings that contain an open square bracket and an
open curly brace.
✓ A.24 Every lot of land in New York City is denoted by a string of digits called BBL,
for Borough (one digit), Block (five digits), and Lot (four digits). Give a regex.
✓ A.25 Example 1.5 gives a regex for North American phone numbers. (a) They
are sometimes written with parentheses around the area code. Extend the regex
to cover this case. (b) Sometimes phone numbers do not include the area code.
Extend to cover this also.
A.26 Most operating systems come with a file that has a list of words, for spell-
checking, etc. For instance, on Linux it may be at /usr/share/dict/words.
Use that file to find how many words fit the criteria. (a) contains the letter a
(b) starts with A (c) contains a or A (d) contains X (e) contains x or X
(f) contains the string st (g) contains the string ing (h) contains an a, and
later a b (i) contains none of the usual vowels a, e, i, o or u (j) contains all the
usual vowels (k) contains all the usual vowels, in ascending order
✓ A.27 Give a regex to accept time in a 24 hour format. It should match times of
the form ‘hh:mm:ss.sss’ or ‘hh:mm:ss’ or ‘hh:mm’ or ‘hh’.
A.28 Give a regex describing a floating point number.
✓ A.29 Give a suitable regex. (a) All Visa card numbers start with a 4. New
cards have 16 digits. Old cards have 13. (b) MasterCard numbers either start
with 51 through 55, or with the numbers 2221 through 2720. All have 16 digits.
(c) American Express card numbers start with 34 or 37 and have 15 digits.
✓ A.30 Postal codes in the United Kingdom have six possible formats. They are:
(i) A11 1AA, (ii) A1 1AA, (iii) A1A 1AA, (iv) AA11 1AA, (v) AA1 1AA, and (vi) AA1A
1AA, where A stands for a capital ASCII letter and 1 stands for a digit. (a) Give a
regex. (b) Shorten it.
✓ A.31 You are stuck on a crossword puzzle. You know that the first letter (of eight)
is a g, the third is an n, and the seventh is an i. You have access to a file that
contains all English words, each on its own line. Give a suitable regex.
A.32 In the Tradeoffs discussion, we change the ending to (𝜀 |\s am|\s pm).
Why not \s(𝜀 |am|pm), which factors out the whitespace?
A.33 Imagine that you decide to avoid regexes but still want to do the sanity
check for email addresses discussed above, of accepting the string if and only if it
consists of a nonempty string of characters, followed by @, followed by a nonempty
string of characters. Implement that as a routine in your favorite language.
✓ A.35 The Roman numerals from grade school use the letters I, V, X, L, C, D, and M
to represent 1, 5, 10, 50, 100, 500, and 1000. They are written in descending order
of magnitude, from M to I, and are written greedily so that we don’t write six I’s
but rather VI. Thus, the date written on the book held by the Statue of Liberty is
MDCCLXXVI, for 1776. Further, we replace IIII with IV, and replace VIIII with IX.
Give a regular expression for valid Roman numerals less than 5000.
A.37 Consider L = { 0𝑛 10𝑛 | 𝑛 > 0 }. (a) Show that it is not regular. (b) Find a
regex.
A.38 In regex golf you are given two lists and must produce a regex that matches
all the words in the first list but none of the words in the second. The ‘golf ’ aspect
is that the person who finds the shortest regex, the one with the fewest characters,
wins. Try these: accept the words in the first list and not the words in the second.
(a) Accept: Arthur, Ester, le Seur, Silverter
Do not accept: Bruble, Jones, Pappas, Trent, Zikle
(b) Accept: alight, bright, kite, mite, tickle
Do not accept: buffing, curt, penny, tart
(c) Accept: afoot, catfoot, dogfoot, fanfoot, foody, foolery, foolish, fooster, footage,
foothot, footle, footpad, footway, hotfoot, jawfoot, mafoo, nonfood, padfoot,
prefool, sfoot, unfool
Do not accept: Atlas, Aymoro, Iberic, Mahran, Ormazd, Silipan, altared,
chandoo, crenel, crooked, fardo, folksy, forest, hebamic, idgah, manlike, marly,
palazzi, sixfold, tarrock, unfold
A.39 In a regex crossword each row and column has a regex. You have to find
strings for those rows and columns that meet the constraints.
(a) Rows: HE|LL|O+ and [PLEASE]+. Columns: [^SPEAK]+ and EP|IP|EF.
(b) Rows: .*M?O.* and (AN|FE|BE). Columns: (A|B|C)\1 and (AB|OE|SK).
Extra IV.B The Myhill-Nerode theorem
We have defined regular languages in terms of Finite State machines. Here we will
give a characterization that instead goes directly to the properties of the language.
Recall that in this chapter’s first section, Remark 1.7 said that the key to
designing Finite State machines is to think of each state as being about its future,
about the input strings to come.
2.1 Definition Fix a language L over an alphabet Σ, along with two strings
𝜎0, 𝜎1 ∈ Σ∗. Then 𝜏 ∈ Σ∗ is a distinguishing extension when of 𝜎0 ⌢ 𝜏 and 𝜎1 ⌢ 𝜏 ,
one is an element of L and the other is not. If such an extension exists then the
strings are L-distinguishable, otherwise they are L-indistinguishable or L-related,
denoted 𝜎0 ∼L 𝜎1 .
2.2 Lemma For any language L, the binary relation ∼L is an equivalence and hence
partitions the universe of all strings into equivalence classes, denoted EL,𝑗 .
Proof This is Exercise B.31’s item (a) .
2.3 Example Let L = {𝜎 ∈ { a, b }∗ | |𝜎 | = 3 }, with 𝜎0 = aa and 𝜎1 = a.† Then 𝜏 = bb
is a distinguishing extension because 𝜎0 ⌢ 𝜏 = aabb ∉ L while 𝜎1 ⌢ 𝜏 = abb ∈ L.
The prior paragraph brings out that for this language two strings are L-
distinguishable if and only if they have different lengths and at least one of them
has length less than four. So the equivalence classes, the collections of
indistinguishable strings, are the length zero strings, EL,0 = {𝜀 }, the length one
strings, EL,1 = { a, b }, those of length two, EL,2 = { aa, ab, ba, bb }, and length
three, EL,3 = { aaa, aab, ... bbb }, along with the longer strings, EL,4 = {𝜎 | |𝜎 | ≥ 4 }.
In the picture below the box is the universe of all strings Σ∗. It is partitioned into
the equivalence classes, with every string a member of one and only one class.
[Diagram: the box Σ∗ partitioned into the five classes EL,0 , EL,1 , EL,2 , EL,3 , and EL,4 .]
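Definition 2.1 suggests a brute-force experiment: search through short strings 𝜏 for one that distinguishes two given strings. This Python sketch does that for Example 2.3's language (the bound max_len and the helper names are ours):

```python
from itertools import product

def in_L(s):
    """Example 2.3's language: strings over {a,b} of length exactly 3."""
    return len(s) == 3

def distinguishing_extension(s0, s1, max_len=5):
    """Return some tau with in_L(s0 + tau) != in_L(s1 + tau), or None."""
    for n in range(max_len + 1):
        for chars in product('ab', repeat=n):
            tau = ''.join(chars)
            if in_L(s0 + tau) != in_L(s1 + tau):
                return tau
    return None  # none found up to max_len; the strings look L-related

# The shortest extension distinguishing aa from a is 'a'
# (Example 2.3's tau = bb also works; any distinguishing extension will do).
print(repr(distinguishing_extension('aa', 'a')))
# Strings of length four or more are mutually indistinguishable (class EL,4).
print(distinguishing_extension('aaaa', 'bbbbb'))  # None
```

Of course a bounded search can only confirm distinguishability; returning None is merely evidence, not proof, that two strings are L-related.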
† Here, 𝜎1 = a is not a character, it is a length one string containing the character a. We won’t worry
too much about the distinction.
Extra B. The Myhill-Nerode theorem 243
2.5 Example The above examples have finitely many ∼L classes. Some languages
have infinitely many. One is L = {𝜎 ∈ { a, b }∗ | 𝜎 = a𝑛 b𝑛 for some 𝑛 ∈ N }, from
Example 5.2. To show that the number of classes is infinite we don’t need to
produce them all; it is enough to produce infinitely many unequal classes (or for
that matter, infinitely many mutually distinguishable strings). We claim that each
singleton set { a𝑛 } is an equivalence class, so there are infinitely many classes.
To verify that these singleton sets are equivalence classes, consider 𝜎 = a𝑛. Since
𝜎 ⌢ b𝑛 ∈ L, the only candidates for strings that are L-indistinguishable from 𝜎
but unequal to it have the form 𝛼 = a𝑛+𝑗 b 𝑗 with 𝑗 > 0. If 𝑛 = 0 then 𝜎 = 𝜀 and
𝛼 = b 𝑗 , and an extension distinguishing 𝜎 from 𝛼 is ab. If 𝑛 > 0 then an extension
distinguishing 𝜎 = a𝑛 from 𝛼 = a𝑛+𝑗 b 𝑗 is ab𝑛+1.
We next make a connection between the Finite State machines that recognize a
language L and the relationship of L-indistinguishability.
2.6 Example This machine recognizes L = {𝜎 ∈ { a, b }∗ | 𝜎 has even length }, the
language of Example 2.4.
[State diagram: the start state 𝑞0 , which is accepting, goes to 𝑞1 on a and to 𝑞2 on b;
𝑞1 and the accepting state 𝑞3 go to each other on both a and b, as do 𝑞2 and the
accepting state 𝑞4 .]
Consider other string inputs, not just the accepted ones, and see where they bring
the machine.
Input string 𝜎 𝜀 a b aa ab ba bb aaa aab aba abb ...
Ending state Δ̂(𝜎) 𝑞0 𝑞1 𝑞2 𝑞3 𝑞3 𝑞4 𝑞4 𝑞1 𝑞1 𝑞1 𝑞1 ...
The collection of input strings breaks into five sets, those that bring the machine
to 𝑞0 , those that bring it to 𝑞1 , etc. This is another kind of partition, which will
prove to be related to, but different from, the partition above. Denote the classes of
this M-related partition with EM,𝑖 = {𝜎 ∈ Σ∗ | Δ̂(𝜎) = 𝑞𝑖 }.
EM,0 = {𝜀 }
EM,1 = { a, aaa, ... }
EM,2 = { b, baa, ... }
EM,3 = { aa, ab, ... }
EM,4 = { ba, bb, ... }
Below we lay the language-related partition, with two parts, on top of the machine-
related one with five.
[Diagram (∗): the part EL,0 contains EM,0 , EM,3 , and EM,4 , while EL,1 contains
EM,1 and EM,2 .]
The M-related parts are subsets of the L-related parts. That is, the M-related
partition is finer than the L-related partition.†
2.7 Definition Let M be a Finite State machine with alphabet Σ. Two strings
𝜎0, 𝜎1 ∈ Σ∗ are M-related if Δ̂(𝜎0 ) = Δ̂(𝜎1 ) , that is, if starting the machine with
input 𝜎0 ends with it in the same state as does starting the machine with input 𝜎1 .
2.8 Lemma The binary relation of M-related is an equivalence and so partitions the
collection of all strings Σ∗ into equivalence classes.
Proof See Exercise B.31’s item (b) .
2.9 Lemma Let M be a deterministic Finite State machine that recognizes L. If two
strings are M-related then they are L-related.
Proof Assume that 𝜎0 and 𝜎1 are M-related, so that starting M with input 𝜎0
causes it to end in the same state as starting it with input 𝜎1 . It follows that for
any suffix 𝜏 , starting M with the input 𝜎0 ⌢ 𝜏 causes it to end in the same state as
does starting it with the input 𝜎1 ⌢ 𝜏 (because the machine is deterministic). In
particular, 𝜎0 ⌢ 𝜏 takes M to an accepting state if and only if 𝜎1 ⌢ 𝜏 does. So the
two strings are L-related.
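We can check Lemma 2.9 empirically on Example 2.6's machine. The transition table below is read off that example's ending-state table, so treat it as an assumption; the code verifies that every EM,𝑖 class sits inside a single parity class of the even-length language:

```python
from itertools import product

# Transitions of Example 2.6's five-state machine (assumed from its table).
DELTA = {('q0', 'a'): 'q1', ('q0', 'b'): 'q2',
         ('q1', 'a'): 'q3', ('q1', 'b'): 'q3',
         ('q2', 'a'): 'q4', ('q2', 'b'): 'q4',
         ('q3', 'a'): 'q1', ('q3', 'b'): 'q1',
         ('q4', 'a'): 'q2', ('q4', 'b'): 'q2'}

def ending_state(s):
    state = 'q0'
    for ch in s:
        state = DELTA[(state, ch)]
    return state

# Group every string of length at most 6 by its ending state: the E_M classes.
classes = {}
for n in range(7):
    for chars in product('ab', repeat=n):
        s = ''.join(chars)
        classes.setdefault(ending_state(s), set()).add(s)

# Each machine class should contain strings of only one parity (one E_L class).
for state, strings in sorted(classes.items()):
    parities = {len(s) % 2 for s in strings}
    assert len(parities) == 1           # the M partition refines the L partition
    print(state, 'strings all have', 'even' if parities == {0} else 'odd', 'length')
```

The check only looks at strings up to length six, which is evidence rather than proof; the lemma itself covers all of Σ∗.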
The EM,𝑖 classes reflect M’s states, in the following sense. Consider 𝑞 1 ’s class
EM,1 = { a, aaa, ... } and 𝑞 3 ’s class EM,3 = { aa, ab, ... }. Just as when the machine
is in 𝑞 1 and reads an a then it transitions to 𝑞 3 , so also if we choose any string
𝜎 ∈ EM,1 and append an a to it then 𝜎 ⌢ a is an element of EM,3 . An example is
that choosing 𝜎 = a ∈ EM,1 and appending a to it gives aa ∈ EM,3 .
This way of thinking of the M classes has them acting as a machine, in that
there are transitions among them just as a machine has. It suggests extending to
also think of the L classes as constituting their own machine. In particular, the
prior paragraph’s workings of the transitions on the EM,𝑖 classes suggest how to
define the transitions in this new machine.
† ‘Finer’ in the sense that sand is finer than gravel.
2.10 Definition Let L be a language over Σ and let the collection of ∼L equivalence
classes EL,𝑖 be 𝐸 . The L-machine has states that are the classes, where the
start state is the one containing 𝜀 . Its accepting states are the ones containing
strings from L. The transition operation, Δ : 𝐸 × Σ → 𝐸 , is: given input EL,𝑖 and
𝑥 ∈ Σ, choose a 𝜎 in the class and then 𝜎 ⌢ 𝑥 is an element of some EL,𝑗 . Set
Δ( EL,𝑖 , 𝑥) = EL,𝑗 .
For instance, the machine for Example 2.6’s two-class language is the two-state
machine in (∗∗) below.
As stated, the definition allows us to choose any string 𝜎 at all from EL,𝑖 . We
must establish that choosing two different string representatives of the input class
does not give two different outputs.
2.11 Lemma Fix a language L. (1) The transition operation is well-defined: if two
strings 𝜎0, 𝜎1 are L-related, 𝜎0 ∼L 𝜎1 , then adjoining a common character 𝑥 ∈ Σ
gives strings that are also L-related, (𝜎0 ⌢𝑥) ∼L (𝜎1 ⌢𝑥) . (2) If one member of a
class is an element of L then every other member of that class is also an element
of L. (3) There is one and only one class containing the empty string.
Proof For the first item, if 𝜎0 ⌢ 𝑥 were not L-related to 𝜎1 ⌢ 𝑥 then the single-
character string 𝑥 would be a distinguishing extension. But 𝜎0 ∼L 𝜎1 so they have
no distinguishing extension.
The second item is: if 𝜎0 ∼L 𝜎1 and 𝜎0 ∈ L but 𝜎1 ∉ L then they are
distinguished by the empty string, which contradicts that they are L-related.
The third item is trivial since for any string, empty or not, there is one and only
one equivalence class containing that string.
2.12 Corollary The L-machine, if it has finitely many states, is a Finite State machine
that recognizes L.
Proof By well-definedness of transitions, starting the L-machine with any 𝜎 ∈ Σ∗
as input will cause the machine to end in the class containing 𝜎 . By the lemma’s
item (2), that is an accepting class of the machine if and only if 𝜎 ∈ L.
2.13 Example Let L = { ab𝑛 | 𝑛 ∈ N } = { a, ab, abb, ... }. We will first find the equiva-
lence classes and then determine the L-machine’s transitions.
There are three classes. First, EL,0 = {𝜀 } because for any nonempty string
𝜎 ∈ { a, b }∗, a distinguishing extension between 𝜀 and 𝜎 is the single-character string a.
The second class is EL,1 = L. To see that any two elements of this set,
ab𝑖 , ab 𝑗 ∈ L, are L-related, suppose that 𝜏 ∈ Σ∗. If 𝜏 has the form b𝑘 then both of
ab𝑖 ⌢𝜏 and ab 𝑗 ⌢𝜏 are members of L, while if 𝜏 has at least one a then both are not
members because they have at least two a’s. It remains to show that if 𝜎0 ∈ L and
𝜎1 ∉ L then they are not L-related. That’s because, as in Lemma 2.11’s item (2),
they have a distinguishing extension of 𝜀 .
The final class contains all remaining strings, EL,2 = Σ∗ − ( EL,0 ∪ EL,1 ) . We will
show that any two elements of this set are L-related (there is no need to argue
that elements are not L-related to strings outside the set because we have already
shown that in the prior paragraphs). An element 𝜎 ∈ EL,2 must have at least one
character and must fall into at least one of two cases: either its first character is
not a, or the rest of the string contains at least one a. In both cases, for any
extension 𝜏 the string 𝜎 ⌢ 𝜏 is not an element of L; that is, there are no
distinguishing extensions.
In summary, the universe of all strings is partitioned by ∼L into these classes.
EL,0 = {𝜀 } EL,1 = { a, ab, abb, ... } EL,2 = { b, aa, ba, bb, aaa, aba, ... }
To compute the transitions, for each class choose one representative element,
append in turn each of a and b, and find the resulting classes. As representatives,
besides 𝜀 ∈ EL,0 we can choose the one-character strings a ∈ EL,1 and b ∈ EL,2 .

Δ       a      b
EL,0    EL,1   EL,2
EL,1    EL,2   EL,1
EL,2    EL,2   EL,2

[Diagram: the three-state L-machine, with start state EL,0 and accepting state EL,1 .]
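Definition 2.10's recipe can be carried out mechanically for this language: pick a representative of each class, append each character, and see which class results. A Python sketch (the class names EL0, EL1, EL2 are ours):

```python
def class_of(s):
    """Which ~L class of Example 2.13 contains s, where L = { a b^n | n in N }."""
    if s == '':
        return 'EL0'
    if s[0] == 'a' and s.count('a') == 1:
        return 'EL1'          # s is a followed by b's, so s is in L
    return 'EL2'              # everything else

representatives = {'EL0': '', 'EL1': 'a', 'EL2': 'b'}

# Definition 2.10: transition by appending the character to a representative.
delta = {(c, x): class_of(representatives[c] + x)
         for c in representatives for x in 'ab'}

for (c, x), target in sorted(delta.items()):
    print(f'{c} --{x}--> {target}')
```

Well-definedness (Lemma 2.11) says the choice of representative does not matter; for instance appending b to the alternative representative ab of EL1 gives abb, which lands in the same class EL1 as appending b to a.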
2.15 Lemma The L-machine is minimal, meaning that from among all the deterministic
Finite State machines that recognize the language L, it has a minimal number of
states.
Proof Let a deterministic Finite State machine M recognize L. Consider two
strings 𝜎0, 𝜎1 ∈ Σ∗. If those two take M to the same state, Δ̂(𝜎0 ) = Δ̂(𝜎1 ) , then
appending any common extension does the same, Δ̂(𝜎0 ⌢ 𝜏) = Δ̂(𝜎1 ⌢ 𝜏) . So
M-related strings are L-related, meaning that each EM class lies inside a single
∼L class. Therefore M has at least as many states as there are ∼L classes, which
is the number of states of the L-machine.
[Example 2.16’s figures: a four-state machine over B with states 𝑞0 through 𝑞3 ,
alongside its three-class L-machine and the transition computation, which includes
𝜀 ∈ EL,0 with 0 ∈ EL,0 and 1 ∈ EL,1 , and 1 ∈ EL,1 with 10 ∈ EL,1 and 11 ∈ EL,2 .]
Of course, the two machines are essentially the same. If we minimize a minimal
machine then we get back pretty much what we started with.
In summary, the Myhill-Nerode construction starts with the desired language L, considers the relation ∼L , and if there are
finitely many associated classes then gives a Finite State machine that recognizes L.
That is, by seeing the patterns inside L given by ∼L , we don’t have to make a
machine recognizing that language, the mathematics gives it to us. For free, this
machine is minimal. This is a deep kind of wizardry — these problems are solved
automatically.
IV.B Exercises
✓ B.17 Use the machine from Example 2.13. (a) What class contains the string bba?
(b) abba? (c) babab? (d) abbbb?
B.18 Use the L-machine from Example 2.13. For each string, identify the string’s
class, and then append the character a and identify the ending class for the
machine’s transition. (a) bba (b) 𝜀 (c) abbb (d) a
✓ B.19 This illustrates the point about ‘well-defined’ in Lemma 2.11’s part (1).
Consider the L-machine of Example 2.16.
(a) Three representative strings from EL,0 are 𝜎0 = 00, 𝜎1 = 11011, and 𝜎2 =
0011111111 = 001⁸. Append 0 to each and name the class of the resulting
string. Verify that all three lead to a single class and that in the machine the
state EL,0 transitions on input 0 to that class.
(b) Using the same three strings, to each append 1 and name the class of the
resulting string. Verify that the three lead to the same class, and that in the
machine the state EL,0 transitions on input 1 to it.
(c) Repeat that for three strings from EL,1 , with both 0 and 1.
✓ B.20 Example 2.4 gives a language L = {𝜎 ∈ { a, b }∗ | 𝜎 has even length } with
two classes. Produce the transition table and arrow diagram for the associated
L-machine.
B.21 Example 2.3 gives a language L = {𝜎 ∈ { a, b }∗ | |𝜎 | = 3 } with five classes.
Produce the L-machine.
✓ B.22 Let L be the set of strings from { a, b }∗ ending in a.
(a) Show that L is an equivalence class, EL,1 , for the relation ∼L .
(b) Show that the complement Lc is also an equivalence class, EL,0 , and therefore
there are exactly two classes.
(c) What is the initial state of the L-machine?
(d) What are the accepting states?
(e) Give the transition table and the diagram.
(f) Which of the strings 𝜀 , a, b, abba, and bba are accepted by this machine?
✓ B.23 For the language L = { a2 b𝑛 | 𝑛 ∈ N } with alphabet Σ = { a, b } this is a
minimal Finite State machine.
[State diagram: 𝑞0 goes to 𝑞1 on a, 𝑞1 goes to 𝑞2 on a, and 𝑞2 goes to 𝑞3 on a;
on b, both 𝑞0 and 𝑞1 go to 𝑞3 while the accepting state 𝑞2 loops; 𝑞3 loops on a,b.]
Extra C. Machine minimization 249
is not regular.
B.31 Recall that a binary relation ∼ is an equivalence if it has three proper-
ties: (1) reflexivity, that 𝑥 ∼ 𝑥 for all 𝑥 , (2) symmetry, that if 𝑥 ∼ 𝑦 then 𝑦 ∼ 𝑥 , and
(3) transitivity, that if 𝑥 ∼ 𝑦 and 𝑦 ∼ 𝑧 then also 𝑥 ∼ 𝑧 . (a) Verify Lemma 2.2.
(b) Verify Lemma 2.8.
B.32 Generalize the first item in Lemma 2.11 to: if two strings 𝜎0, 𝜎1 are L-
related, 𝜎0 ∼L 𝜎1 , then adjoining any common extension 𝛽 gives strings that are
also L-related, (𝜎0 ⌢ 𝛽) ∼L (𝜎1 ⌢ 𝛽) .
B.33 Show that the equivalence classes for the language L are the same as for the
language that is its complement, Lc.
Extra IV.C Machine minimization
Imagine that a person is tasked with ensuring that input for a password form
contains both upper and lower case ASCII characters, and produces the machine
on the left.
[Two state diagrams. The machine on the left has start state 𝑞0 , which goes to 𝑞1
on a lower case character a .. z and to 𝑞2 on an upper case character A .. Z; 𝑞1 loops
on a .. z and goes to 𝑞3 on A .. Z; 𝑞2 loops on A .. Z and goes to 𝑞4 on a .. z; the
accepting states 𝑞3 and 𝑞4 each loop on any character. The machine on the right
has start state 𝑟0 , which goes to 𝑟1 on a .. z and to 𝑟2 on A .. Z; 𝑟1 loops on a .. z
and 𝑟2 loops on A .. Z, and both go on the other case to the accepting state 𝑟3 ,
which loops on any character.]
The machine on the right is better in that it has one fewer state. We will give
an algorithm, Moore’s algorithm (or the table-filling algorithm), that inputs a
deterministic Finite State machine and outputs a deterministic machine that is
minimal, one that from among all of the machines recognizing the same language
has the fewest states.†
It collapses together redundant states so we begin with an example of those.
This recognizes L = {𝜎 ∈ B∗ | 𝜎 has at least one 0 and at least one 1 }.
[State diagram (∗): the start state 𝑞0 goes to 𝑞1 on 0 and to 𝑞2 on 1; 𝑞1 loops on 0
and goes to 𝑞3 on 1; 𝑞2 goes to 𝑞3 on 0 and to 𝑞4 on 1; 𝑞4 loops on 1 and goes
to 𝑞5 on 0; the accepting state 𝑞3 loops on 0 and goes to 𝑞5 on 1; the accepting
state 𝑞5 loops on 0,1.]
In this chapter’s first section, Remark 1.7 recommended designing Finite State
machines by thinking about each state’s future, anticipating inputs to come. In
this machine the future of 𝑞 2 , “waiting for a 0,” is the same as that of 𝑞 4 . Those
states are redundant. Likewise, the future of 𝑞 5 matches that of 𝑞 3 .
To be concrete, this table lists what happens if the machine is started in the
given state and the given string is on the tape. Entries contain ‘+’ if the machine
then ends in an accepting state, and otherwise are blank. The states 𝑞 2 and 𝑞 4
have the same rows, at least for the strings listed, as do the states 𝑞 3 and 𝑞 5 .
        𝜀   0   1   00  01  10  11  ...
𝑞0                      +   +       ...
𝑞1              +       +   +   +   ...
𝑞2          +       +   +   +       ...   (∗∗)
𝑞3      +   +   +   +   +   +   +   ...
𝑞4          +       +   +   +       ...
𝑞5      +   +   +   +   +   +   +   ...
In contrast, 𝑞 0 does not have the same row as any other state, nor does 𝑞 1 .
3.1 Definition Fix a Finite State machine over an alphabet Σ. For two states 𝑞
and 𝑞ˆ, a 𝜎 ∈ Σ∗ is a distinguishing string if starting the machine in 𝑞 with 𝜎 on the
tape and starting it in 𝑞ˆ with 𝜎 on the tape results in two different outcomes: in
one case the machine ends in an accepting state while in the other it rejects. Two
states for which there is a distinguishing string are distinguishable, otherwise they
are indistinguishable, written 𝑞 ∼ 𝑞ˆ.
3.2 Example This is a minimal version of the machine in (∗).
[State diagram: 𝑟0 goes to 𝑟1 on 0 and to 𝑟2 on 1; 𝑟1 loops on 0 and goes to 𝑟3
on 1; 𝑟2 loops on 1 and goes to 𝑟3 on 0; the accepting state 𝑟3 loops on 0,1.]
† Moore’s algorithm is easy to understand and is suitable for small calculations, but when writing code
be aware that another, Hopcroft’s algorithm, is more efficient.
Starting in 𝑟0 and processing the string 𝜎 = 1 ends in a rejecting state, while starting
in 𝑟1 and processing 𝜎 ends in an accepting state. So the two are distinguished
by 𝜎 . The states 𝑟0 and 𝑟2 are distinguished by the length one string 𝜎 = 0, and 𝑟0
is distinguished from 𝑟3 by the empty string. Similarly, the pairs 𝑟1, 𝑟2 and 𝑟1, 𝑟3 ,
and 𝑟2, 𝑟3 are all distinguishable. This minimal machine has no indistinguishable
states.
States that are indistinguishable are redundant. We will compute whether
states are indistinguishable by checking whether they are distinguished by strings
of length 0, or by strings of length 1, etc. Two states 𝑞 and 𝑞ˆ are 𝑛 -distinguishable
if there is a distinguishing string of length at most 𝑛 , otherwise they are 𝑛 -
indistinguishable, denoted 𝑞 ∼𝑛 𝑞ˆ.
Observe that two states 𝑞 and 𝑞ˆ are 0-indistinguishable if and only if both are
accepting states or both are rejecting states.
3.3 Lemma The relations ∼0 , ∼1 , . . . are equivalences, as is ∼, and so partition the
states into equivalence classes: the 0-indistinguishability classes, the
1-indistinguishability classes, . . . along with the ∼ classes.
Proof Exercise C.23.
3.4 Example Here are some 𝑛 -equivalence classes for the machine (∗), using the
information in the table (∗∗).
𝑛 ∼𝑛 classes
0 E0,0 = {𝑞 0, 𝑞 1, 𝑞 2, 𝑞 4 } E0,1 = {𝑞 3, 𝑞 5 }
1 E1,0 = {𝑞 0 } E1,1 = {𝑞 1 } E1,2 = {𝑞 2, 𝑞 4 } E1,3 = {𝑞 3, 𝑞 5 }
2 E2,0 = {𝑞 0 } E2,1 = {𝑞 1 } E2,2 = {𝑞 2, 𝑞 4 } E2,3 = {𝑞 3, 𝑞 5 }
The 0-distinguishable classes divide the rejecting states from the accepting ones.
The 1-distinguishable classes subdivide those, based on the length one strings.
Specifically, starting with E0,0 = {𝑞 0, 𝑞 1, 𝑞 2, 𝑞 4 }, we can next distinguish 𝑞 0 from 𝑞 1
because they differ on the string 1. Similarly, 𝑞 0 differs from 𝑞 2 and 𝑞 4 on 0. And,
𝑞 1 differs from 𝑞 2 and 𝑞 4 on the string 0. So where there was one ∼0 class E0,0
there are now three ∼1 classes, E1,0 = {𝑞 0 }, E1,1 = {𝑞 1 }, and E1,2 = {𝑞 2, 𝑞 4 }.
At the next stage, using the length two strings to find the 2-distinguishability
classes does not result in any further subdivisions.
So the algorithm first finds the ∼0 classes, then finds the ∼1 classes, etc., until
the classes stop splitting. What remains are the ∼ classes, and they serve as states
of a minimal machine.
There is one difficulty. In the prior example, to get the ∼0 classes we checked
all length 0 string inputs, stage 1 checked all length 1 strings, and stage 2 checked
all length 2 strings. If at stage 𝑛 the algorithm were to check all length 𝑛 strings
then it would take exponentially long, because there are 2𝑛 of them.
To fix that, consider states 𝑞 and 𝑞ˆ that are not 𝑛 -distinguishable but are
𝑛 + 1-distinguishable. Write a distinguishing string 𝜏 = ⟨𝑠 0, 𝑠 1, ... 𝑠𝑛−1, 𝑠𝑛 ⟩ , as
𝜏 = 𝛼 ⌢ 𝑠𝑛 . Because 𝑞 and 𝑞ˆ are not 𝑛 -distinguishable, where 𝛼 brings the machine
from 𝑞 to some state 𝑟 and also brings the machine from 𝑞ˆ to some 𝑟ˆ, then 𝑟 and 𝑟ˆ
are equivalent, 𝑟, 𝑟ˆ ∈ E𝑛,𝑖 . Therefore, distinguishing between 𝑞 and 𝑞ˆ must happen
with the final character, 𝑠𝑛 . It must take the machine from state 𝑟 to a state in
one class, and also take the machine from 𝑟ˆ to a state in a different class. In
short, in looking for a split in passing from the 𝑛 -distinguishability classes to the
𝑛 + 1-distinguishability classes, we need only look at single characters.
In summary, Moore’s algorithm is that nodes 𝑞 and 𝑞ˆ are 𝑛 + 1-indistinguishable
if and only if they are 𝑛 -indistinguishable and also Δ(𝑞, 𝑥) is 𝑛 -indistinguishable
from Δ(𝑞,ˆ 𝑥) for all characters 𝑥 ∈ Σ. The next two examples illustrate, and also
show how to use the ∼ classes to make minimal machines.
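The refinement loop just described can be sketched in Python. This is our own rendering of Moore's algorithm, not the book's code; we run it on a transition table consistent with Example 2.6's five-state even-length machine (an assumption on our part), which should collapse to two classes:

```python
def moore_classes(states, alphabet, delta, accepting):
    """Return a dict mapping each state to its ~ class id (Moore's algorithm)."""
    # Stage 0: two states start together iff they agree on acceptance.
    classes = {q: int(q in accepting) for q in states}
    while True:
        # q and q' stay together iff they are together now and every single
        # character x sends them to states that are together now.
        signature = {q: (classes[q],
                         tuple(classes[delta[(q, x)]] for x in alphabet))
                     for q in states}
        ids = {sig: i for i, sig in enumerate(sorted(set(signature.values())))}
        refined = {q: ids[signature[q]] for q in states}
        if len(set(refined.values())) == len(set(classes.values())):
            return refined          # no class split: the partition is stable
        classes = refined

# Example 2.6's machine (transitions assumed from its ending-state table).
delta = {('q0', 'a'): 'q1', ('q0', 'b'): 'q2',
         ('q1', 'a'): 'q3', ('q1', 'b'): 'q3',
         ('q2', 'a'): 'q4', ('q2', 'b'): 'q4',
         ('q3', 'a'): 'q1', ('q3', 'b'): 'q1',
         ('q4', 'a'): 'q2', ('q4', 'b'): 'q2'}
result = moore_classes(['q0', 'q1', 'q2', 'q3', 'q4'], 'ab', delta,
                       accepting={'q0', 'q3', 'q4'})
print(result)   # q0, q3, q4 share one class; q1, q2 share the other
```

Each pass looks at only single characters, per the argument above, so each pass is cheap; and since every pass either splits a class or terminates, there are at most as many passes as states.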
3.5 Example We will find a machine that recognizes the same language as this one
but that is minimal.
[State diagram of the input machine, with start state 𝑞0 and states 𝑞1 , . . . , 𝑞5 ; its
transition function appears as the table ΔM at the end of this example.]
For bookkeeping we will use triangular tables, with an entry for every pair of
different states. We will checkmark pairs that are distinguishable. From stage
to stage we fill in more marks, first doing 0-distinguishability, then refining it to
1-distinguishability, etc. When we reach a stage where the table does not change
then we are done.
Stage 0 is to checkmark the 𝑖, 𝑗 entries that are 0-distinguishable, where one of
𝑞𝑖 and 𝑞 𝑗 is accepting while the other is not.
      𝑞0   𝑞1   𝑞2   𝑞3   𝑞4
𝑞1    ✓
𝑞2    ✓
𝑞3         ✓    ✓
𝑞4         ✓    ✓
𝑞5    ✓              ✓    ✓
Use the blank boxes to read off the ∼0 -equivalence classes, because blankness
means that the states are mutually 0-indistinguishable. For instance, there are blank
boxes in entries 0, 3 and 0, 4 and 3, 4 and this cluster is the first ∼0 equivalence
class. Similarly there is a cluster of blank entries in 1, 2 and 1, 5 and 2, 5.
E0,0 = {𝑞 0, 𝑞 3, 𝑞 4 } E0,1 = {𝑞 1, 𝑞 2, 𝑞 5 }
Stage 1 determines whether those ∼0 classes split. For each pair 𝑞𝑖 , 𝑞 𝑗 that are
together in a class, and for each input character, compute into which classes that
character sends those states.
a b
𝑞 0, 𝑞 3 𝑞1 ∈ E0,1, 𝑞 5 ∈ E0,1 𝑞2 ∈ E0,1, 𝑞 5 ∈ E0,1
𝑞 0, 𝑞 4 𝑞1 ∈ E0,1, 𝑞 5 ∈ E0,1 𝑞2 ∈ E0,1, 𝑞 5 ∈ E0,1
𝑞 3, 𝑞 4 𝑞5 ∈ E0,1, 𝑞 5 ∈ E0,1 𝑞5 ∈ E0,1, 𝑞 5 ∈ E0,1
𝑞 1, 𝑞 2 𝑞3 ∈ E0,0, 𝑞 4 ∈ E0,0 𝑞4 ∈ E0,0, 𝑞 3 ∈ E0,0
𝑞 1, 𝑞 5 𝑞3 ∈ E0,0, 𝑞 5 ∈ E0,1 𝑞4 ∈ E0,0, 𝑞 5 ∈ E0,1
𝑞 2, 𝑞 5 𝑞4 ∈ E0,0, 𝑞 5 ∈ E0,1 𝑞3 ∈ E0,0, 𝑞 5 ∈ E0,1
Two states are distinguished when they are sent by a character to members of
unequal classes. The first case is in the 𝑞 1, 𝑞 5 line, where a takes 𝑞 1 to 𝑞 3 ∈ E0,0
and takes 𝑞 5 to 𝑞 5 ∈ E0,1 . So the ∼0 class containing 𝑞 1 and 𝑞 5 , the class E0,1 , will
split into multiple ∼1 classes. The next line of the computation also shows that 𝑞 2
and 𝑞 5 are distinguishable.
To record that 𝑞 1 and 𝑞 5 are distinguishable, add a checkmark in the triangular
table’s cell for 1, 5. Also add a 2, 5 checkmark.
      𝑞0   𝑞1   𝑞2   𝑞3   𝑞4
𝑞1    ✓
𝑞2    ✓
𝑞3         ✓    ✓
𝑞4         ✓    ✓
𝑞5    ✓    ✓    ✓    ✓    ✓
Finish this stage by getting the ∼1 classes as clusters of blank cells. There is a
cluster in 0, 3 and 0, 4 and 3, 4. There is also a blank cell in 1, 2. There is no cluster
involving 𝑞 5 but since every state must be in some class, it goes in a class by itself.
         a                        b
𝑞0, 𝑞3   𝑞1 ∈ E1,1, 𝑞5 ∈ E1,2    𝑞2 ∈ E1,1, 𝑞5 ∈ E1,2
𝑞0, 𝑞4   𝑞1 ∈ E1,1, 𝑞5 ∈ E1,2    𝑞2 ∈ E1,1, 𝑞5 ∈ E1,2
𝑞3, 𝑞4   𝑞5 ∈ E1,2, 𝑞5 ∈ E1,2    𝑞5 ∈ E1,2, 𝑞5 ∈ E1,2
𝑞1, 𝑞2   𝑞3 ∈ E1,0, 𝑞4 ∈ E1,0    𝑞4 ∈ E1,0, 𝑞3 ∈ E1,0

      𝑞0   𝑞1   𝑞2   𝑞3   𝑞4
𝑞1    ✓
𝑞2    ✓
𝑞3    ✓    ✓    ✓
𝑞4    ✓    ✓    ✓
𝑞5    ✓    ✓    ✓    ✓    ✓
These computations split 𝑞0 from 𝑞3 and from 𝑞4 , and a further stage finds no
more splitting. The algorithm terminates with these ∼ classes.
E0 = {𝑞 0 } E1 = {𝑞 1, 𝑞 2 } E2 = {𝑞 3, 𝑞 4 } E3 = {𝑞 5 }
To define the transitions between states, consider what happens when we feed
the character a to elements of a class, such as the class 𝑟 1 = E1 = {𝑞 1, 𝑞 2 }. For
instance, if we choose 𝑞 1 and look in the original machine then under input a it
goes to 𝑞 3 . Since 𝑞 3 is an element of E2 = 𝑟 2 , in the minimal machine the a arrow
out of 𝑟 1 goes to 𝑟 2 . Other transitions work the same way.
As to the terminology that this algorithm ‘collapses’ together the redundant
states, consider this picture of the prior example, showing a kind of projection.
[Picture: the input machine M drawn above the output machine N , with the
collapsing shown as a projection: 𝑞0 maps to 𝑟0 , both 𝑞1 and 𝑞2 map to 𝑟1 , both
𝑞3 and 𝑞4 map to 𝑟2 , and 𝑞5 maps to 𝑟3 .]

ΔM    a    b              ΔN    a    b
𝑞0    𝑞1   𝑞2             𝑟0    𝑟1   𝑟1
𝑞1    𝑞3   𝑞4             𝑟1    𝑟2   𝑟2
𝑞2    𝑞4   𝑞3             𝑟2    𝑟3   𝑟3
𝑞3    𝑞5   𝑞5             𝑟3    𝑟3   𝑟3
𝑞4    𝑞5   𝑞5
𝑞5    𝑞5   𝑞5
3.6 Example We will minimize the machine below. This illustrates one additional
point of the algorithm since this machine has an unreachable state, 𝑞5 . Start by
omitting it.
[State diagram: the start state 𝑞0 goes to 𝑞1 on 0 and to 𝑞2 on 1; 𝑞1 goes to 𝑞2
on 0 and to 𝑞3 on 1; 𝑞2 loops on 0 and goes to 𝑞4 on 1; the accepting states 𝑞3
and 𝑞4 each loop on 0,1; 𝑞5 is unreachable.]
Stage 0 separates the rejecting states from the accepting ones, giving the classes
E0,0 = {𝑞 0, 𝑞 1, 𝑞 2 } and E0,1 = {𝑞 3, 𝑞 4 }.
For stage 1 check whether those classes split. The calculation shows that 𝑞 0
is distinguished from 𝑞 1 by 1, and 𝑞 0 is distinguished from 𝑞 2 , also by 1. The
triangular table below reflects those updates.
         0                        1
𝑞0, 𝑞1   𝑞1 ∈ E0,0, 𝑞2 ∈ E0,0    𝑞2 ∈ E0,0, 𝑞3 ∈ E0,1
𝑞0, 𝑞2   𝑞1 ∈ E0,0, 𝑞2 ∈ E0,0    𝑞2 ∈ E0,0, 𝑞4 ∈ E0,1
𝑞1, 𝑞2   𝑞2 ∈ E0,0, 𝑞2 ∈ E0,0    𝑞3 ∈ E0,1, 𝑞4 ∈ E0,1
𝑞3, 𝑞4   𝑞3 ∈ E0,1, 𝑞4 ∈ E0,1    𝑞3 ∈ E0,1, 𝑞4 ∈ E0,1

      𝑞0   𝑞1   𝑞2   𝑞3
𝑞1    ✓
𝑞2    ✓
𝑞3    ✓    ✓    ✓
𝑞4    ✓    ✓    ✓
Thus the first ∼0 class splits and these are the ∼1 classes.
E1,0 = {𝑞 0 } E1,1 = {𝑞 1, 𝑞 2 } E1,2 = {𝑞 3, 𝑞 4 }
[State diagram of the minimal machine: 𝑟0 goes to 𝑟1 on both 0 and 1; 𝑟1 loops
on 0 and goes to 𝑟2 on 1; the accepting state 𝑟2 loops on 0,1.]
3.7 Lemma Moore’s algorithm outputs a machine that recognizes the same language
as the input machine and that is minimal.
Proof See Exercise C.24, which verifies that Moore’s algorithm always halts,
that it produces a Finite State machine with a well-defined transition function,† that
this output machine recognizes the same language as the input machine, and that
the output machine is minimal.
3.8 Example As an alternative to the lemma’s whole argument, we will illustrate the
approach to showing minimality. We use the two machines given at the section’s
start (we write ‘a’ for ‘a .. z’ and ‘A’ for ‘A .. Z’). Call the input machine M and
the output N . Consider the union of the two sets of states. Here is the stage 0
table and classes.
† The last paragraph of Example 3.5 describes how to define the transition function of the minimal
machine. It appears that if an input class 𝑟𝑖 has more than one element 𝑞 𝑗 then, depending on which
one we choose, we could get different output classes. We must show that the output is the same no
matter what choice we make.
      𝑞0   𝑞1   𝑞2   𝑞3   𝑞4   𝑟0   𝑟1   𝑟2
𝑞1
𝑞2
𝑞3    ✓    ✓    ✓
𝑞4    ✓    ✓    ✓
𝑟0                    ✓    ✓
𝑟1                    ✓    ✓
𝑟2                    ✓    ✓
𝑟3    ✓    ✓    ✓              ✓    ✓

E0,0 = {𝑞 3, 𝑞 4, 𝑟 3 }   E0,1 = {𝑞 0, 𝑞 1, 𝑞 2, 𝑟 0, 𝑟 1, 𝑟 2 }
Stage 1 looks at pairs from the two ∼0 classes, calculating whether character
transitions split any class.
a A
𝑞 3, 𝑞 4 𝑞 3 ∈ E0,0, 𝑞 4 ∈ E0,0 𝑞 3 ∈ E0,0, 𝑞 4 ∈ E0,0
𝑞 3, 𝑟 3 𝑞 3 ∈ E0,0, 𝑟 3 ∈ E0,0 𝑞 3 ∈ E0,0, 𝑟 3 ∈ E0,0
𝑞 4, 𝑟 3 𝑞 4 ∈ E0,0, 𝑟 3 ∈ E0,0 𝑞 4 ∈ E0,0, 𝑟 3 ∈ E0,0
𝑞 0, 𝑞 1 𝑞 1 ∈ E0,1, 𝑞 1 ∈ E0,1 𝑞 2 ∈ E0,1, 𝑞 3 ∈ E0,0
𝑞 0, 𝑞 2 𝑞 1 ∈ E0,1, 𝑞 4 ∈ E0,0 𝑞 2 ∈ E0,1, 𝑞 2 ∈ E0,1
𝑞 0, 𝑟 0 𝑞 1 ∈ E0,1, 𝑟 1 ∈ E0,1 𝑞 2 ∈ E0,1, 𝑟 2 ∈ E0,1
𝑞 0, 𝑟 1 𝑞 1 ∈ E0,1, 𝑟 1 ∈ E0,1 𝑞 2 ∈ E0,1, 𝑟 3 ∈ E0,0
𝑞 0, 𝑟 2 𝑞 1 ∈ E0,1, 𝑟 3 ∈ E0,0 𝑞 2 ∈ E0,1, 𝑟 2 ∈ E0,1
𝑞 1, 𝑞 2 𝑞 1 ∈ E0,1, 𝑞 4 ∈ E0,0 𝑞 3 ∈ E0,0, 𝑞 4 ∈ E0,0
𝑞 1, 𝑟 0 𝑞 1 ∈ E0,1, 𝑟 1 ∈ E0,1 𝑞 3 ∈ E0,0, 𝑟 2 ∈ E0,1
𝑞 1, 𝑟 1 𝑞 1 ∈ E0,1, 𝑟 1 ∈ E0,1 𝑞 3 ∈ E0,0, 𝑟 3 ∈ E0,0
𝑞 1, 𝑟 2 𝑞 1 ∈ E0,1, 𝑟 3 ∈ E0,0 𝑞 3 ∈ E0,0, 𝑟 2 ∈ E0,1
𝑞 2, 𝑟 0 𝑞 4 ∈ E0,0, 𝑟 1 ∈ E0,1 𝑞 2 ∈ E0,1, 𝑟 2 ∈ E0,1
𝑞 2, 𝑟 1 𝑞 4 ∈ E0,0, 𝑟 1 ∈ E0,1 𝑞 2 ∈ E0,1, 𝑟 3 ∈ E0,0
𝑞 2, 𝑟 2 𝑞 4 ∈ E0,0, 𝑟 3 ∈ E0,0 𝑞 2 ∈ E0,1, 𝑟 2 ∈ E0,1
𝑟 0, 𝑟 1 𝑟 1 ∈ E0,1, 𝑟 1 ∈ E0,1 𝑟 2 ∈ E0,1, 𝑟 3 ∈ E0,0
𝑟 0, 𝑟 2 𝑟 1 ∈ E0,1, 𝑟 3 ∈ E0,0 𝑟 2 ∈ E0,1, 𝑟 2 ∈ E0,1
𝑟 1, 𝑟 2 𝑟 1 ∈ E0,1, 𝑟 3 ∈ E0,0 𝑟 3 ∈ E0,0, 𝑟 2 ∈ E0,1
There are many splits, for example 𝑞 0 and 𝑞 1 are distinguished by A. In the resulting
triangular table there are six empty boxes, at 𝑞 0, 𝑟 0 , at 𝑞 1, 𝑟 1 , at 𝑞 2, 𝑟 2 , at 𝑞 3, 𝑞 4 , at
𝑞 3, 𝑟 3 , and at 𝑞 4, 𝑟 3 . We conclude that E0,0 does not split but that E0,1 splits into
three.
[Triangular table after stage 1, with a check marking each distinguished pair.]
E1,0 = {𝑞 3, 𝑞 4, 𝑟 3 }    E1,1 = {𝑞 0, 𝑟 0 }    E1,2 = {𝑞 1, 𝑟 1 }    E1,3 = {𝑞 2, 𝑟 2 }
The next stage shows no more splittings. The algorithm has grouped 𝑞 0 with 𝑟 0 ,
and 𝑞 1 with 𝑟 1 , and 𝑞 2 with 𝑟 2 . It has also grouped 𝑞 3 and 𝑞 4 together with 𝑟 3 .
Extra C. Machine minimization 257
It is not surprising that starting with the minimal machine N and performing
the algorithm results in one state 𝑟𝑖 per ∼ class. It is also not surprising that the
states of M can end with multiple 𝑞 𝑗 ’s in a class, since we have seen that in earlier
examples. But the point is that for each 𝑟𝑖 there is at least one associated 𝑞 𝑗 , and
therefore N has a number of states that is less than or equal to the number in M.
We close by describing a common scenario in which minimization plays an
important role. We have seen that when we have a problem to solve with
a Finite State machine, often a nondeterministic machine is easier and more
natural. An example is an algorithm that inputs a regular expression and outputs
a machine recognizing that expression. But our algorithm for converting a
nondeterministic machine to a deterministic machine has the problem that where
the nondeterministic machine has 𝑛 states, the deterministic machine can have 2^𝑛 . This
section’s result alleviates that exponential blow-up. We now have a three-step
process: from a problem, we start with a nondeterministic answer, convert that to
an equivalent deterministic machine, and then minimize to get a reasonably-sized
final answer. In practice, this gives good results.
IV.C Exercises
[Triangular table over states 0, ..., 5 for a preceding exercise, with checks marking distinguished pairs.]
C.10 From the ∼𝑖 classes find the associated triangular table. (a) E𝑖,0 = {𝑞 0, 𝑞 1 },
E𝑖,1 = {𝑞 2 }, and E𝑖,2 = {𝑞 3, 𝑞 4 }, (b) E𝑖,0 = {𝑞 0 }, E𝑖,1 = {𝑞 1, 𝑞 2, 𝑞 4 }, and
E𝑖,2 = {𝑞 3 }, (c) E𝑖,0 = {𝑞 0, 𝑞 1, 𝑞 5 }, E𝑖,1 = {𝑞 2, 𝑞 3 }, and E𝑖,2 = {𝑞 4 }.
✓ C.11 Suppose that E0,0 = {𝑞 0, 𝑞 1, 𝑞 2, 𝑞 5 } and E0,1 = {𝑞 3, 𝑞 4 }, and we compute
this table.
a b
𝑞 0, 𝑞 1 𝑞1 ∈ E0,0, 𝑞 1 ∈ E0,0 𝑞2 ∈ E0,0, 𝑞 3 ∈ E0,1
𝑞 0, 𝑞 2 𝑞1 ∈ E0,0, 𝑞 2 ∈ E0,0 𝑞2 ∈ E0,0, 𝑞 4 ∈ E0,1
𝑞 0, 𝑞 5 𝑞1 ∈ E0,0, 𝑞 5 ∈ E0,0 𝑞2 ∈ E0,0, 𝑞 5 ∈ E0,0
𝑞 1, 𝑞 2 𝑞1 ∈ E0,0, 𝑞 2 ∈ E0,0 𝑞3 ∈ E0,1, 𝑞 4 ∈ E0,1
𝑞 1, 𝑞 5 𝑞1 ∈ E0,0, 𝑞 5 ∈ E0,0 𝑞3 ∈ E0,1, 𝑞 5 ∈ E0,0
𝑞 2, 𝑞 5 𝑞2 ∈ E0,0, 𝑞 5 ∈ E0,0 𝑞4 ∈ E0,1, 𝑞 5 ∈ E0,0
𝑞 3, 𝑞 4 𝑞3 ∈ E0,1, 𝑞 4 ∈ E0,1 𝑞5 ∈ E0,0, 𝑞 5 ∈ E0,0
(a) Which states are 1-distinguishable that were not 0-distinguishable? (b) Give
the resulting ∼1 classes.
✓ C.12 This machine accepts strings with odd parity, that is, with an odd number of 1’s.
Minimize it using the algorithm described in this section.
[Diagram: states 𝑞 0 , 𝑞 1 , 𝑞 2 , each with a self-loop on 0 and with transitions between states on 1.]
C.13 For many machines we can find the unreachable states by eye, but there
is an algorithm. It inputs a machine M and initializes the set of reachable states
to 𝑅0 = {𝑞 0 }. For 𝑛 > 0, step 𝑛 of the algorithm is: for each 𝑞 ∈ 𝑅𝑛 find all
states 𝑞ˆ reachable from 𝑞 in one transition and add those to make 𝑅𝑛+1 . That
is, 𝑅𝑛+1 = 𝑅𝑛 ∪ { 𝑞ˆ = ΔM (𝑞, 𝑥) | 𝑞 ∈ 𝑅𝑛 and 𝑥 ∈ Σ }. The algorithm stops when
𝑅𝑘 = 𝑅𝑘+1 and the set of reachable states is 𝑅 = 𝑅𝑘 . The unreachable states are
the others, 𝑄 − 𝑅 . For each machine, perform this algorithm.
[Diagrams for machines (a) and (b), over the alphabet { a, b }.]
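The loop just described translates directly into code. This sketch runs on a small made-up machine, not one of the exercise’s, in which 𝑞 2 is unreachable from the start state.

```python
def reachable_states(delta, start, alphabet):
    """Compute the reachable states by iterating
    R_{n+1} = R_n ∪ { delta[(q, x)] : q in R_n, x in alphabet }
    until the set stabilizes, as in the exercise's description."""
    R = {start}
    while True:
        R_next = R | {delta[(q, x)] for q in R for x in alphabet}
        if R_next == R:
            return R
        R = R_next

# A made-up machine in which q2 is unreachable from q0.
delta = {('q0', 'a'): 'q1', ('q0', 'b'): 'q0',
         ('q1', 'a'): 'q1', ('q1', 'b'): 'q0',
         ('q2', 'a'): 'q0', ('q2', 'b'): 'q2'}
states = {'q0', 'q1', 'q2'}
R = reachable_states(delta, 'q0', {'a', 'b'})
print(sorted(R))            # the reachable states
print(sorted(states - R))   # the unreachable states
```

The loop must stop: each stage either adds a state or repeats the prior set, and there are only finitely many states.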
✓ C.14 Perform the minimization algorithm on the machine with redundant states
at the start of this section, the one labeled (∗).
✓ C.15 This machine accepts strings described by (ab|ba)*. Minimize it, using the
algorithm of this section.
[Diagram: an eight-state machine 𝑞 0 , ..., 𝑞 7 over the alphabet { a, b }.]
C.16 If a machine’s start state is accepting, must the minimized machine’s start
state be accepting? If so then prove it, and if not then give an example machine
where it is false.
C.17 Minimize.
[Diagram: a five-state machine 𝑞 0 , ..., 𝑞 4 over the alphabet { 0, 1 }.]
C.18 Minimize.
[Diagram: a six-state machine 𝑞 0 , ..., 𝑞 5 over the alphabet { a, b }, together with residue of a second machine diagram over { 0, 1 }.]
Note that the algorithm takes time that is roughly equal to the number of
states in the machine.
C.23 Verify Lemma 3.3. (a) Verify that each ∼𝑛 is an equivalence relation
on the states. (b) Verify that ∼ is an equivalence relation.
C.24 We will verify that Moore’s algorithm halts on any input machine M and
outputs an N that recognizes the same language, and that is minimal.
(a) Prove that the algorithm always halts.
(b) Prove that the transition function of N is well-defined.
(c) Verify that the two machines recognize the same language.
(d) Show that N is minimal: any M̂ that recognizes the same language as N has at
least as many states as N . Hint: first follow Example 3.8. Then do an argument
by induction that shows that the start states of the two are indistinguishable,
and if two states 𝑞 and 𝑟 are indistinguishable then so are the states they
transition to on a single-character input. This gives an association of single 𝑟 ’s
with at least one 𝑞 , and so there are at least as many 𝑞 ’s as 𝑟 ’s.
C.25 There are ways to minimize Finite State machines other than the one given
in this section. One is Brzozowski’s algorithm, which has the advantage of being
surprising and fun in that you perform some steps that seem a bit wacky and
unrelated to elimination of states and then at the end it has worked. (However,
it has the disadvantage of taking worst-case exponential time.) We will not go
through why it works, but we will walk through the algorithm using this Finite
State machine, M.
[Diagram of M: states 𝑞 0 , 𝑞 1 , 𝑞 2 over the alphabet { a, b }.]
Chapter V
Computational Complexity
Earlier, we asked what can be done with a mechanism at all. This mirrors the
subject’s history: when the Theory of Computing began there were no physical com-
puters. Researchers were driven by considerations such as the Entscheidungsproblem.
The subject was interesting, the questions compelling, and there were plenty of
problems, but the initial phase had a theory-driven feel.
A natural next step is to ask how to do jobs efficiently. When physical computers
became widely available, that’s exactly what happened. Today, the Theory of
Computing has incorporated many questions that at least originate in applied fields,
and that need answers that are feasible.
We will review how we determine the practicality of algorithms, the order of
growth of functions. Then we will see a collection of the kinds of problems that
drive the field today. By the end of this chapter we will be at the research frontier
and we will state some things without proof, as well as discuss some things about
which we are not sure. In particular, we will consider the celebrated question of P
versus NP.
Section
V.1 Big O
We begin by reviewing the definition of the order of growth of functions. We will
study this because it measures how algorithms consume computational resources.
First, an anecdote. Here is a grade school multiplication.
678
× 42
1356
2712
28476
The algorithm combines each digit of the multiplier 42 with each digit of the
multiplicand 678, in a nested loop. A person could sensibly feel that this is the
right way to compute multiplication — indeed, the only reasonable way — and that
in general, to multiply two 𝑛 digit numbers requires about 𝑛 2 -many operations.
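The nested loop can be written out explicitly. This is a sketch, with numbers represented as lists of decimal digits, most significant first; the two loops pair every digit of one factor with every digit of the other, which is the source of the roughly 𝑛^2 operation count.

```python
def grade_school_multiply(a_digits, b_digits):
    """Multiply two numbers given as lists of decimal digits
    (most significant first) by the nested-loop method: each digit
    of one factor is combined with each digit of the other."""
    n, m = len(a_digits), len(b_digits)
    result = [0] * (n + m)
    # Nested loop: about n*m digit-by-digit products.
    for i, a in enumerate(reversed(a_digits)):
        for j, b in enumerate(reversed(b_digits)):
            result[i + j] += a * b
    # Propagate carries.
    for k in range(len(result) - 1):
        result[k + 1] += result[k] // 10
        result[k] %= 10
    digits = list(reversed(result))
    while len(digits) > 1 and digits[0] == 0:
        digits.pop(0)
    return digits

print(grade_school_multiply([6, 7, 8], [4, 2]))  # [2, 8, 4, 7, 6]
```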
Image: Striders can walk on water because they are five orders of magnitude smaller than us. This
change of scale changes the world — bugs see surface tension as more important than gravity. Similarly,
finding an algorithm that changes the time that it takes to solve a problem from 𝑛 2 to 𝑛 · lg 𝑛 can make
something easy that was previously not practical.
[Graph of √𝑛 for 𝑛 up to 1 000.]
However, for large 𝑛 the value √𝑛 is much bigger than 10 lg (𝑛) . For instance,
√( 1 000 000) = 1 000 while 10 lg ( 1 000 000) ≈ 199.32.
[Graph comparing √𝑛 with 10 lg (𝑛) for 𝑛 up to 1 000 000.]
Thus the first criterion is that Big O must focus on what happens in the long run.
†
See the Theory of Computing blog feed at https://theory.report (Various authors 2017). ‡ We
write lg (𝑛) for log2 (𝑛) . That is, compute lg (𝑛) by finding the power of 2 that produces 𝑛 , so if 𝑛 = 8
then lg (𝑛) = 3, while if 𝑛 = 10 then lg (𝑛) ≈ 3.32. # These graphs show functions where the domain
is the real numbers. Turing machines are discrete devices so it may seem more natural to have the
domain be the natural numbers. But we will see that real functions are much more convenient for
complexity measures.
The second criterion is more subtle. The next four examples illustrate it.
1.1 Example These graphs compare 𝑓 (𝑛) = 𝑛 2 + 5𝑛 + 6 with 𝑔(𝑛) = 𝑛 2 . The graph
on the right compares them in ratio, 𝑓 /𝑔.
[Graphs: on the left, 𝑓 and 𝑔 for 𝑛 up to 20; on the right, their ratio 𝑓 /𝑔.]
On the left we are struck that 𝑛 2 + 5𝑛 + 6 is ahead of 𝑛 2 . But on the right the
ratios show that this is misleading. For large 𝑛 ’s, 𝑓 ’s 5𝑛 and 6 are swamped by the
𝑛 2 . Consequently in the long run these two functions track together — by far the
biggest ingredient in their behavior is that they are both quadratic.
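A quick numeric check of the claim: evaluating the ratio 𝑓 /𝑔 at growing inputs shows it settling toward 1.

```python
f = lambda n: n**2 + 5*n + 6
g = lambda n: n**2

# The ratio starts well above 1 but approaches 1 as n grows.
for n in [10, 100, 10_000, 1_000_000]:
    print(n, f(n) / g(n))
```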
1.2 Example Next compare the quadratic 𝑔(𝑛) = 𝑛 2 + 5𝑛 + 6 with the cubic 𝑓 (𝑛) =
𝑛 3 + 2𝑛 + 3. In contrast to the prior example, these two don’t track together.
Initially 𝑔 is larger, with 𝑔( 0) = 6 > 𝑓 ( 0) = 3 and 𝑔( 1) = 12 > 𝑓 ( 1) = 6. But
then the cubic accelerates ahead of the quadratic, so much that at the scale of the
image, the graph of 𝑔 doesn’t rise much above the axis.
[Graphs: 𝑓 and 𝑔 for 𝑛 up to 20, and their ratio 𝑓 /𝑔.]
[Graphs for the next example: 𝑓 and 𝑔, and their ratio 𝑓 /𝑔.]
This example differs from Example 1.1 in that in the long run 𝑓 stays ahead of 𝑔
and also gains in an absolute sense, because 𝑓 ’s dominant term 2𝑛 2 is twice as
large as 𝑔’s 𝑛 2 . So it may appear that we should view 𝑔’s rate as less than 𝑓 ’s.
However, unlike in Example 1.2, 𝑓 does not accelerate away. Instead, the ratio
between the two is bounded. We will take 𝑔 to be equivalent to 𝑓 .
1.4 Example We close the motivation with a very important example. Let the function
bits : N → N give the number of bits needed to represent its input in binary. The
bottom line of this table shows lg (𝑛) , the power to which 2 must be raised to produce 𝑛 .
Input 𝑛 0 1 2 3 4 5 6 7 8 9
Binary 0 1 10 11 100 101 110 111 1000 1001
bits (𝑛) 1 1 2 2 3 3 3 3 4 4
lg (𝑛) – 0 1 1.58 2 2.32 2.58 2.81 3 3.17
Here is a graph of bits (𝑛) , the table’s third line, for 𝑛 ∈ { 1, ... 30 }.
[Graph of bits (𝑛) for 1 ≤ 𝑛 ≤ 30.]
The relationship between the third and fourth lines is that bits (𝑛) = 1 + ⌊ lg (𝑛)⌋ ,
except for the boundary value that bits ( 0) = 1 (lg ( 0) is undefined). The graph
below compares bits (𝑛) with lg (𝑛) . Note the change in the horizontal and vertical
scales.
[Graph comparing bits (𝑛) with lg (𝑛) for 𝑛 up to 100.]
This illustrates that in the formula bits (𝑛) = 1 + ⌊ lg (𝑛)⌋ , over the long run the
‘1+’ and the floor don’t matter much. A reasonable summary is that the base 2
logarithm, lg 𝑛 , describes the number of bits required to represent the number 𝑛 .
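The relationship bits (𝑛) = 1 + ⌊ lg (𝑛)⌋ is easy to confirm mechanically; in this sketch Python’s built-in bin gives the binary representation, prefixed by the two characters '0b'.

```python
import math

def bits(n):
    """Number of characters in the binary representation of n."""
    return len(bin(n)) - 2  # drop the '0b' prefix; note bits(0) == 1

# Check the identity bits(n) = 1 + floor(lg(n)) for positive n.
for n in range(1, 1000):
    assert bits(n) == 1 + math.floor(math.log2(n))
print(bits(0), bits(9), bits(1000))  # 1 4 10
```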
Further, the formula for converting among logarithmic functions with other
bases, log𝑐 (𝑥) = log𝑏 (𝑥)/log𝑏 (𝑐) , shows that they differ only by the constant factor
1/log𝑏 (𝑐) . As Example 1.3 notes, with the function comparison definition given
below we will disregard constant factors. So even the base does not matter —
another reasonable summary is that the number of bits is “a” logarithmic function.
Definition Machine resource sizes, such as the number of bits of the input and of
memory, are natural numbers. So to describe the performance of algorithms we
may think to focus on functions that input and output natural numbers. However,
above we already found a useful function, lg, that inputs and outputs reals.
So instead we will consider a subset of the functions from R to R.†
1.5 Definition A complexity function 𝑓 is one that inputs real number arguments
and outputs real number values, and (1) has an unbounded domain, so that there
is a number 𝑁 ∈ R+ such that 𝑥 ≥ 𝑁 implies that 𝑓 (𝑥) is defined, and (2) is
eventually nonnegative, so that there is a number 𝑀 ∈ R+ so that 𝑥 ≥ 𝑀 implies
that 𝑓 (𝑥) ≥ 0.
1.6 Definition Let 𝑔 be a complexity function. Then Big O of 𝑔, O (𝑔) , is the set of
complexity functions 𝑓 satisfying that there are constants 𝑁 , 𝐶 ∈ R+ so that if
𝑥 ≥ 𝑁 then both 𝑔(𝑥) and 𝑓 (𝑥) are defined and 𝐶 · 𝑔(𝑥) ≥ 𝑓 (𝑥) . We say that 𝑓
is O (𝑔) , or that 𝑓 ∈ O (𝑔) , or that 𝑓 is of order at most 𝑔, or that 𝑓 = O (𝑔) .
1.7 Remarks (1) We use the letter ‘O’ because this is about the order of growth.
(2) The term ‘complexity function’ is not standard but we find it convenient. (3) The
‘ 𝑓 = O (𝑔) ’ notation is very common, but awkward. It does not follow the usual
rules of equality; for instance, 𝑓 = O (𝑔) does not allow us to write ‘O (𝑔) = 𝑓 ’.
Another is that 𝑥 = O (𝑥^2 ) and 𝑥^2 = O (𝑥^2 ) together do not imply that 𝑥 = 𝑥^2 .
(4) Some authors do something a little different, they allow negative real outputs
and write the inequality with absolute values, 𝑓 (𝑥) ≤ 𝐶 · |𝑔(𝑥)| . (5) Sometimes
you see ‘ 𝑓 is O (𝑔) ’ stated as ‘ 𝑓 (𝑥) is O (𝑔(𝑥)) ’. Speaking strictly, this is wrong
because 𝑓 (𝑥) and 𝑔(𝑥) are numbers, not functions.
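To make Definition 1.6 concrete: for 𝑓 (𝑥) = 𝑥^2 + 5𝑥 + 6 and 𝑔(𝑥) = 𝑥^2 , the constants 𝐶 = 2 and 𝑁 = 6 witness that 𝑓 is O (𝑔) , since 2𝑥^2 ≥ 𝑥^2 + 5𝑥 + 6 exactly when 𝑥^2 ≥ 5𝑥 + 6, which holds for 𝑥 ≥ 6. A spot-check in code:

```python
C, N = 2, 6
f = lambda x: x**2 + 5*x + 6
g = lambda x: x**2

# Spot-check the defining inequality C·g(x) >= f(x) for x >= N.
for x in range(N, 1000):
    assert C * g(x) >= f(x)
print("f is O(g), witnessed by C = 2, N = 6")
```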
Think of ‘ 𝑓 is O (𝑔) ’ as meaning that 𝑓 ’s growth rate is less than or equal to 𝑔’s
rate. The sketches below illustrate the two possibilities.
[Two sketches of 𝑓 and 𝑔, each marking the point 𝑁 beyond which the comparison holds.]
1.16 Figure: Each bean holds the complexity functions. Faster growing functions are
higher, so that if they were shown then 𝑥 5 would be above 𝑥 4. On the left is the cone
O (𝑔) for some 𝑔. The ellipse at the top is Θ(𝑔) , holding functions with growth rate
equivalent to 𝑔’s. The sketch on the right adds the cone O (𝑓 ) for some 𝑓 in O (𝑔) .
The next result eases Big O calculations for most of the functions that we
encounter, such as polynomial, exponential, and logarithmic functions.
1.17 Theorem Let 𝑓 , 𝑔 be complexity functions. Suppose that lim𝑥→∞ 𝑓 (𝑥)/𝑔(𝑥)
exists and equals 𝐿 , which is a member of R ∪ { ∞ }.
(a) If 𝐿 = 0 then 𝑔 grows faster than 𝑓 , that is, 𝑓 is O (𝑔) but 𝑔 is not O (𝑓 ) .†
(b) If 𝐿 = ∞ then 𝑓 grows faster than 𝑔, so that 𝑔 is O (𝑓 ) but 𝑓 is not O (𝑔) .‡
(c) If 𝐿 is between 0 and ∞ then the two functions have something like the same
growth rates, so that 𝑓 is Θ(𝑔) and 𝑔 is Θ(𝑓 ) .#
It pairs well with the following result familiar from Calculus I.
1.18 Theorem (L’Hôpital’s Rule) Let 𝑓 and 𝑔 be complexity functions such that both
𝑓 (𝑥) → ∞ and 𝑔(𝑥) → ∞ as 𝑥 → ∞, and such that both are differentiable for
large enough inputs. If lim𝑥→∞ 𝑓 ′ (𝑥)/𝑔′ (𝑥) exists and equals 𝐿 ∈ R ∪ {∞} then
lim𝑥→∞ 𝑓 (𝑥)/𝑔(𝑥) also exists and also equals 𝐿 .
1.19 Example Let 𝑓 (𝑥) = 𝑥 2 + 5𝑥 + 6 and 𝑔(𝑥) = 𝑥 3 + 2𝑥 + 3. Here we apply L’Hôpital’s
Rule multiple times.
lim_{𝑥→∞} 𝑓 (𝑥)/𝑔(𝑥) = lim_{𝑥→∞} (𝑥^2 + 5𝑥 + 6)/(𝑥^3 + 2𝑥 + 3) = lim_{𝑥→∞} ( 2𝑥 + 5)/( 3𝑥^2 + 2) = lim_{𝑥→∞} 2/6𝑥 = 0
So 𝑓 is O (𝑔) but 𝑔 is not O (𝑓 ) . That is, 𝑓 ’s growth rate is strictly less than 𝑔’s.
1.20 Example Next consider 𝑓 (𝑥) = 3𝑥 2 + 4𝑥 + 5 and 𝑔(𝑥) = 𝑥 2.
lim_{𝑥→∞} ( 3𝑥^2 + 4𝑥 + 5)/𝑥^2 = lim_{𝑥→∞} ( 6𝑥 + 4)/2𝑥 = lim_{𝑥→∞} 6/2 = 3
So their growth rates are roughly the same. That is, 𝑓 is Θ(𝑔) .
1.21 Example For 𝑓 (𝑥) = 5𝑥 4 + 15 and 𝑔(𝑥) = 𝑥 2 − 3𝑥 , this
lim_{𝑥→∞} ( 5𝑥^4 + 15)/(𝑥^2 − 3𝑥) = lim_{𝑥→∞} 20𝑥^3/( 2𝑥 − 3) = lim_{𝑥→∞} 60𝑥^2/2 = ∞
shows that 𝑓 ’s growth rate is strictly greater than 𝑔’s rate — 𝑔 is O (𝑓 ) but 𝑓 is
not O (𝑔) .
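The three limits above can also be checked numerically; this sketch evaluates each ratio at a large input, which is not a proof but agrees with the computed limits.

```python
# Ratios from Examples 1.19, 1.20, and 1.21 at a large input.
x = 10.0**6
r19 = (x**2 + 5*x + 6) / (x**3 + 2*x + 3)   # tends to 0
r20 = (3*x**2 + 4*x + 5) / x**2             # tends to 3
r21 = (5*x**4 + 15) / (x**2 - 3*x)          # grows without bound

print(r19, r20, r21)
```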
†
This case is denoted 𝑓 is 𝑜 (𝑔) , read aloud as “Little Oh of 𝑔.” ‡ We also denote ‘𝑔 is O (𝑓 ) ’ by 𝑓 is
Ω (𝑔) , read aloud as “Big Omega of 𝑔.” # If 𝐿 = 1 then 𝑓 and 𝑔 are asymptotically equivalent.
1.22 Example The logarithmic function 𝑓 (𝑥) = log𝑏 (𝑥) grows very slowly: log𝑏 (𝑥)
is O (𝑥) , and log𝑏 (𝑥) is O (𝑥 0.1 ) , and is O (𝑥 0.01 ) . In fact by this equation, for any
𝑑 > 0 no matter how small, log𝑏 (𝑥) is O (𝑥 𝑑 ) and 𝑥 𝑑 is not O ( log𝑏 (𝑥)) .
lim_{𝑥→∞} log𝑏 (𝑥)/𝑥^𝑑 = lim_{𝑥→∞} ( 1/(𝑥 ln (𝑏)))/(𝑑𝑥^(𝑑−1) ) = ( 1/(𝑑 ln (𝑏))) · lim_{𝑥→∞} 1/𝑥^𝑑 = 0
The difference in growth rates is even more marked than that. L’Hôpital’s Rule,
along with the Chain Rule, gives that ( log𝑏 (𝑥))^2 is also O (𝑥^𝑑 ) : two applications
of the rule reduce the limit of ( log𝑏 (𝑥))^2 /𝑥^𝑑 to a constant multiple of
lim_{𝑥→∞} 1/𝑥^𝑑 , which is zero.
1.23 Example We can compare the polynomial function 𝑓 (𝑥) = 𝑥 2 with the exponential
function 𝑔(𝑥) = 2𝑥 .
lim_{𝑥→∞} 2^𝑥 /𝑥^2 = lim_{𝑥→∞} ( 2^𝑥 · ln ( 2))/2𝑥 = lim_{𝑥→∞} ( 2^𝑥 · ( ln ( 2))^2 )/2 = ∞
Thus 𝑓 ∈ O (𝑔) but 𝑔 ∉ O (𝑓 ) . Induction gives that lim𝑥→∞ 2𝑥 /𝑥 𝑘 = ∞ for any 𝑘 .
1.24 Lemma Logarithmic functions grow more slowly than polynomial functions: if
𝑓 (𝑥) = log𝑏 (𝑥) for some base 𝑏 and 𝑔(𝑥) = 𝑎𝑚 𝑥 𝑚 + · · · + 𝑎 0 then 𝑓 is O (𝑔)
but 𝑔 is not O (𝑓 ) . Polynomial functions grow more slowly than exponential
functions: where ℎ(𝑥) = 𝑏^𝑥 for some base 𝑏 > 1, 𝑔 is O (ℎ) but ℎ is not
O (𝑔) .
We’ve defined complexity functions as mapping R to R, rather than the more
natural N to N. (One motivation is that some functions that we want to work with
are real functions, such as logarithms. Another is that L’Hôpital’s Rule, which uses
the derivative and so needs reals, is a big convenience.) The next result ensures
that our conclusions in the continuous context carry over to the discrete.
1.25 Lemma Let 𝑓0, 𝑓1 : R → R, and consider the restrictions to a discrete domain
𝑔0 = 𝑓0 ↾N and 𝑔1 = 𝑓1 ↾N . Where 𝐿 ∈ R ∪ { ∞ },
(a) for 𝑎 ∈ R, if 𝐿 = lim𝑥→∞ (𝑎𝑓0 ) (𝑥) then 𝐿 = lim𝑛→∞ (𝑎𝑔0 ) (𝑛)
(b) if 𝐿 = lim𝑥→∞ (𝑓0 + 𝑓1 ) (𝑥) then 𝐿 = lim𝑛→∞ (𝑔0 + 𝑔1 ) (𝑛) ,
(c) if 𝐿 = lim𝑥→∞ (𝑓0 · 𝑓1 ) (𝑥) then 𝐿 = lim𝑛→∞ (𝑔0 · 𝑔1 ) (𝑛) , and
(d) when the expressions are defined, if 𝐿 = lim𝑥→∞ (𝑓0 /𝑓1 ) (𝑥) then 𝐿 =
lim𝑛→∞ (𝑔0 /𝑔1 ) (𝑛) .
Tractable and intractable The table below lists orders of growth that are most
common in practice.
Order                   Name                     Examples
O ( 1)                  Bounded                  𝑓 (𝑛) = 15
O ( lg ( lg (𝑛)))       Double logarithmic       𝑓 (𝑛) = ln ( ln (𝑛))
O ( lg (𝑛))             Logarithmic              𝑓0 (𝑛) = ln (𝑛) , 𝑓1 (𝑛) = lg (𝑛^3 )
O (( lg (𝑛))^𝑐 )        Polylogarithmic          𝑓 (𝑛) = ( lg (𝑛))^3
O (𝑛)                   Linear                   𝑓 (𝑛) = 3𝑛 + 4
O (𝑛 lg (𝑛))            Log-linear               𝑓0 (𝑛) = 5𝑛 lg (𝑛) + 𝑛 , 𝑓1 (𝑛) = lg (𝑛 !)
O (𝑛^2 )                Polynomial (quadratic)   𝑓 (𝑛) = 5𝑛^2 + 2𝑛 + 12
O (𝑛^3 )                Polynomial (cubic)       𝑓 (𝑛) = 2𝑛^3 + 12𝑛^2 + 5
  ...
O ( 2^poly ( lg (𝑛) ) ) Quasipolynomial          𝑓0 (𝑛) = 2^( ( lg (𝑛))^2 + 3 lg (𝑛) ) , 𝑓1 (𝑛) = 𝑛^( lg (𝑛) )
  ...
O ( 2^𝑛 )               Exponential              𝑓 (𝑛) = 10 · 2^𝑛
O ( 3^𝑛 )               Exponential              𝑓 (𝑛) = 6 · 3^𝑛 + 𝑛^2
  ...
O (𝑛 !)                 Factorial                𝑓 (𝑛) = 5 · 𝑛 ! + 𝑛^15 − 7
O (𝑛^𝑛 )                –No standard name–       𝑓 (𝑛) = 2 · 𝑛^𝑛 + 3 · 2^𝑛
1.26 Table: The order of growth hierarchy.
We often draw a line in this hierarchy after the polynomial functions; the next
table shows why. It lists how long a job would take if we used an algorithm that
runs in time lg 𝑛 , time 𝑛 , etc. (A modern computer runs at 10 GHz, 10 000 million
ticks per second, and there are 3.16 × 107 seconds in a year.)
         𝑛 = 1           𝑛 = 10          𝑛 = 50          𝑛 = 100
lg 𝑛     –               1.05 × 10^−17   1.79 × 10^−17   2.11 × 10^−17
𝑛        3.17 × 10^−18   3.17 × 10^−17   1.58 × 10^−16   3.17 × 10^−16
𝑛 lg 𝑛   –               1.05 × 10^−16   8.94 × 10^−16   2.11 × 10^−15
𝑛^2      3.17 × 10^−18   3.17 × 10^−16   7.92 × 10^−15   3.17 × 10^−14
𝑛^3      3.17 × 10^−18   3.17 × 10^−15   3.96 × 10^−13   3.17 × 10^−12
2^𝑛      6.34 × 10^−18   3.24 × 10^−15   3.57 × 10^−3    4.02 × 10^12
1.27 Table: Time taken in years by algorithms whose behavior is given by a few func-
tions, on a few size 𝑛 ’s.
In the 𝑛 = 100 column, between the first few rows the relative change is an
order of magnitude but the absolute times are small. Then we get to the final
row. That’s not a typo — the last entry really is on the order of 10^12 years. It is
huge — the universe is 14 × 10^9 years old, so this computation, even with an input
size of only 100, would take longer than the age of the universe. Exponential growth is
very, very much larger than polynomial growth.
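The table’s entries are simple to recompute, using the text’s figures of 10^10 ticks per second and 3.16 × 10^7 seconds per year.

```python
TICKS_PER_SECOND = 10**10        # 10 GHz, as in the text
SECONDS_PER_YEAR = 3.16 * 10**7

def years(steps):
    """Years needed for an algorithm that takes the given number of steps."""
    return steps / (TICKS_PER_SECOND * SECONDS_PER_YEAR)

print(years(100**3))   # cubic at n = 100: about 3.17e-12 years
print(years(2**100))   # exponential at n = 100: about 4.0e12 years
```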
They do the same thing but their run times are different. On the left g0 sets the
local variable x inside the loop. That makes the code on the left slower than the
right by four calculations. Big O disregards this constant time difference. Big O is
good for comparing running times among algorithms but not as good for comparing
running times among programs.
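The listings discussed in this paragraph are not shown here, so the pair below is a hypothetical reconstruction of the idea: both functions return the same value, but g0 redoes a computation inside the loop. The difference is a constant factor, which Big O disregards.

```python
def g0(n):
    total = 0
    for i in range(n):
        x = 2 * 3 * 5 * 7    # recomputed on every iteration
        total += x
    return total

def g1(n):
    x = 2 * 3 * 5 * 7        # computed once, before the loop
    total = 0
    for i in range(n):
        total += x
    return total

assert g0(1000) == g1(1000)  # same output; both are O(n)
```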
That fits with our second point about Big O. We use it to help pick the best
algorithm, to rank them according to how much they use of some computing
resources. But algorithms are tied to an underlying computing model.‡
Besides the Turing machine, another model that is widely used in this context
is the Random Access machine (RAM). Whereas a Turing machine cell stores only
a single symbol, so that big numbers need multiple cells, on a RAM model machine
each register holds an entire integer. And whereas to get to a cell a Turing machine
may spend lots of steps traversing the tape, the RAM model gets each register’s
†
Cobham’s Thesis is not universally accepted. Some researchers object that if an algorithm runs in time
𝐶𝑛𝑘 but with an enormous 𝑘 or an enormous 𝐶 , or both, then the algorithm is not practical. A rejoinder
to that objection notes a pattern that when someone announces an algorithm with a large exponent or
large constant then typically the approach gets refined over time, shrinking the number. In any event,
polynomial time is significantly better than exponential time. Here we accept Cobham’s thesis because
it gives technical meaning to the informal ‘tractable’. ‡ More discussion of the relationship between
algorithms and machine models is in Section 3.
†
Authors do sometimes state the order of magnitude of these constants.
𝑓 (𝑛) = 𝑛 !  if 𝑛 is a power of ten
𝑓 (𝑛) = 𝑛    otherwise
This machine runs in superexponential time for rare inputs (called “black holes”).
The definition gives that overall this machine runs in time O (𝑛 !) , while for most
inputs it would be quite fast.†
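A sketch of this black-hole function; is_power_of_ten is a helper written for the illustration.

```python
import math

def is_power_of_ten(n):
    """True for 1, 10, 100, 1000, ..."""
    while n >= 10 and n % 10 == 0:
        n //= 10
    return n == 1

def f(n):
    # Superexponential on the rare 'black hole' inputs, fast otherwise.
    return math.factorial(n) if is_power_of_ten(n) else n

print(f(9), f(10), f(11))  # 9 3628800 11
```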
V.1 Exercises
1.29 True or false: if a function is O (𝑛 2 ) then it is O (𝑛 3 ) .
✓ 1.30 Your classmate emails you a draft of an assignment answer that says, “I have
an algorithm with running time that is O (𝑛 2 ) . So with input 𝑛 = 5 it will take
25 ticks.” Make two corrections.
1.31 Suppose that someone posts to a group that you are in, “I’m working on a
problem that is O (𝑛 3 ) .” Explain to them, gently, how their sentence is mistaken.
✓ 1.32 How many bits does it take to express each number in binary? (a) 5 (b) 50
(c) 500 (d) 5 000
✓ 1.33 One is true, the other one is not. Which is which?
(a) If 𝑓 is O (𝑔) then 𝑓 is Θ(𝑔) .
(b) If 𝑓 is Θ(𝑔) then 𝑓 is O (𝑔) .
✓ 1.34 For each find the function on the order of growth hierarchy, Table 1.26,
that has the same rate of growth. (a) 𝑛 2 + 5𝑛 − 2 (b) 2𝑛 + 𝑛 3 (c) 3𝑛 4 − lg lg 𝑛
(d) lg 𝑛 + 5
1.35 For each give the function on the order of growth hierarchy, Table 1.26, that
has the same rate of growth. That is, find 𝑔 in that table where 𝑓 is Θ(𝑔) .
(a) 𝑓 (𝑛) = 𝑛 if 𝑛 < 100, and 𝑓 (𝑛) = 0 otherwise
†
A real life example of such a thing is that the simplex algorithm, which is very widely used for linear
optimization, runs in exponential time in the worst case but typically seems to run in polynomial time.
1.48 Where does 𝑔(𝑥) ≤ 𝑥^O ( 1 ) place the function 𝑔 in the order of growth
hierarchy? Hint: see the prior question.
1.49 Let 𝑓 (𝑥) = 2𝑥 and 𝑔(𝑥) = 𝑥 2 . Prove directly from Definition 1.6 that 𝑓
is O (𝑔) , but that 𝑔 is not O (𝑓 ) .
1.50 Prove that 2𝑛 is O (𝑛 !) . Hint: because of the factorial, consider these natural
number functions and find suitable 𝑁 , 𝐶 ∈ N.
1.51 Use L’Hôpital’s Rule as in Example 1.22 to verify these for any 𝑑 ∈ R+:
(a) ( log𝑏 (𝑥)) 3 is O (𝑥 𝑑 ) (b) for any 𝑘 ∈ N+ , ( log𝑏 (𝑥))𝑘 is O (𝑥 𝑑 ) .
1.52 Assume that 𝑔 : R → R is increasing, so that 𝑥 1 ≥ 𝑥 0 implies that 𝑔(𝑥 1 ) ≥
𝑔(𝑥 0 ) . Let 𝑓 : R → R be a constant function. Show that 𝑓 is O (𝑔) .
1.53 (a) Show that there is a computable function whose output values grow at a
rate that is O ( 1) , one whose values grow at a rate that is O (𝑛) , one for O (𝑛 2 ) , etc.
(b) The Halting problem function 𝐾 is uncomputable. Place its rate of growth
in the order of growth hierarchy, Table 1.26. (c) Produce a function that is not
computable because its output values are larger than those of any computable
function. (You need not show that the rate of growth is greater, only that the
outputs are larger.)
1.54 Show that 𝑥^( lg 𝑥 ) is quasipolynomial.
1.55 Show that the quasipolynomial function 𝑓 (𝑥) = 𝑥^( lg 𝑥 ) grows faster than any
polynomial but slower than any exponential function.
✓ 1.56 Show that O ( 2^𝑥 ) ⊆ O ( 3^𝑥 ) but O ( 2^𝑥 ) ≠ O ( 3^𝑥 ) .
1.57 Table 1.26 states that 𝑛 ! grows slower than 𝑛^𝑛 . (a) Verify this. Hint: although
𝑛 ! is a natural number function, Theorem 1.17 still applies. (b) Stirling’s formula
is that 𝑛 ! ≈ √( 2𝜋𝑛) · (𝑛^𝑛 /𝑒^𝑛 ) . Doesn’t this imply that 𝑛 ! is Θ(𝑛^𝑛 ) ?
✓ 1.58 Two complexity functions 𝑓 , 𝑔 are asymptotically equivalent, 𝑓 ∼ 𝑔, if
lim𝑥→∞ (𝑓 (𝑥)/𝑔(𝑥)) = 1. Show that each pair is asymptotically equivalent:
(a) 𝑓 (𝑥) = 𝑥 2 + 5𝑥 + 1 and 𝑔(𝑥) = 𝑥 2 , (b) lg (𝑥 + 1) and lg (𝑥) .
1.59 Is there an 𝑓 so that O (𝑓 ) is the set of all polynomials?
1.60 There are orders of growth between polynomial and exponential. Specifically,
𝑓 (𝑥) = 𝑥^( lg 𝑥 ) is one. (a) Show that lg (𝑥) ∈ O (( lg (𝑥))^2 ) but ( lg (𝑥))^2 ∉ O ( lg (𝑥)) .
(b) Argue that for any power 𝑘 , we have 𝑥^𝑘 ∈ O (𝑥^( lg 𝑥 ) ) but 𝑥^( lg 𝑥 ) ∉ O (𝑥^𝑘 ) .
Hint: take the ratio, rewrite using 𝑎 = 2^( lg 𝑎 ) , and consider the limit of the exponent.
(c) Show that 𝑥^( lg 𝑥 ) = 2^( ( lg 𝑥 )^2 ) . Hint: take the logarithm of both halves. (d) Show
that 𝑥^( lg 𝑥 ) is in O ( 2^𝑥 ) . Hint: form the ratio using the prior item.
1.61 Verify the clauses of Lemma 1.12. (a) If 𝑎 ∈ R+ then 𝑎𝑓 is also O (𝑔) .
(b) The function 𝑓0 + 𝑓1 is O (𝑔) , where 𝑔 is defined by 𝑔(𝑛) = max (𝑔0 (𝑛), 𝑔1 (𝑛)) .
(c) The product 𝑓0 𝑓1 is O (𝑔0𝑔1 ) .
1.62 Verify these clauses of Lemma 1.15. (a) The Big-O relation is reflexive.
(b) It is also transitive.
1.63 Assume that 𝑓 and 𝑔 are complexity functions. (a) Let lim𝑥→∞ 𝑓 (𝑥)/𝑔(𝑥)
exist and equal 0. Show that 𝑓 is O (𝑔) . (Hint: this requires a rigorous defini-
tion of the limit.) (b) We can give an example where 𝑓 is O (𝑔) even though
lim𝑥→∞ 𝑓 (𝑥)/𝑔(𝑥) does not exist. Verify that, where 𝑔(𝑥) = 𝑥 and where 𝑓 (𝑥) = 𝑥
when ⌊𝑥⌋ is odd and 𝑓 (𝑥) = 2𝑥 when ⌊𝑥⌋ is even.
1.64 Prove Lemma 1.24.
Section
V.2 A problem miscellany
Much of today’s work in the Theory of Computation is driven by problems that
originate outside of the subject. We will describe some of these problems to get a
sense of the ones that people work on and also to use for examples and exercises.
All of these problems are well-known to anyone in the field.
Problems, with stories We start with a few that come with stories.
These stories are fun and an important part of the culture, and they also
give a sense of where, in general, problems come from.
WR Hamilton was a polymath whose genius was recognized early,
and he was given a sinecure as Astronomer Royal of Ireland. He made
important contributions to classical mechanics, where his reformulation
of Newtonian mechanics is now called Hamiltonian mechanics. Other
work of his in physics helped develop classical field theories such as
electromagnetism and laid the groundwork for the development of
quantum mechanics. In mathematics, he is best known as the inventor
of the quaternion number system.
[Portrait: William Rowan Hamilton, 1805–1865]
One of his ventures was a game, Around the World. The vertices in
the graph below were holes in a wooden board, labeled with the names of world
cities. Players put pegs in the holes, looking for a circuit that visits each city once
and only once.
It did not make Hamilton rich. But it did get him associated with a great problem.
2.2 Problem (Hamiltonian Circuit) Given a graph, decide if it contains a cyclic path
that includes each vertex once and only once.
A special case is the Knight’s Tour problem, to use a chess knight to make a
circuit of the squares on the board. (Recall that a knight moves three squares
at a time, with the first two squares in one direction and then the third one
perpendicular to that direction.)
[Figure: a knight’s tour of the board.]
This is the solution given by L Euler. In graph terms, there are sixty-four vertices,
representing the board squares. An edge goes between two vertices if they are
connected by a single knight move. Knight’s Tour asks for a Hamiltonian circuit of
that graph.
Hamiltonian Circuit has another famous variant.
2.3 Problem (Traveling Salesman) Given a weighted undirected graph, where we call
the vertices 𝑆 = {𝑐 0 , ... , 𝑐 𝑘−1 } ‘cities’ and we call the edge weight 𝑑 (𝑐𝑖 , 𝑐 𝑗 ) ∈ N+
for 𝑖 ≠ 𝑗 the ‘distance’ between the cities, find the shortest-distance circuit that
visits every city and returns to the start.
We can start with a map of the state capitals of the
forty-eight contiguous US states and the distances between
them: Montpelier VT to Albany NY is 254 kilometers, etc.
From among all trips that visit each city and return to
the start, such as Montpelier → Albany → Harrisburg →
· · · → Montpelier, we want the shortest one.
[Map courtesy xkcd.com]
As stated, this is an optimization problem. However we
can recast it as a decision problem. Introduce a parameter bound 𝐵 ∈ N and
change the problem statement to ‘decide if there is a circuit of total distance less
than 𝐵 ’. If we had an algorithm to quickly solve this decision problem then we
could also solve the optimization problem: ask whether there is a trip of total
distance less than 𝐵 = 1, then less than 𝐵 = 2, etc. When we eventually
get a ‘yes’, we know the length of the shortest trip.
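The reduction just described can be sketched in code. The brute-force decision procedure below is only sensible for tiny instances, and the four-city distance matrix is made up for the illustration.

```python
from itertools import permutations

def circuit_within_bound(dist, B):
    """Decision problem: is there a circuit visiting every city
    with total distance strictly less than B?  Brute force over
    all orderings, so only for small instances."""
    cities = list(range(len(dist)))
    for order in permutations(cities[1:]):
        tour = [cities[0], *order, cities[0]]
        total = sum(dist[a][b] for a, b in zip(tour, tour[1:]))
        if total < B:
            return True
    return False

def shortest_circuit_length(dist):
    """Optimization by repeated decision queries: B = 1, 2, 3, ...
    With the strict bound, the first 'yes' at B means the optimum
    is B - 1."""
    B = 1
    while not circuit_within_bound(dist, B):
        B += 1
    return B - 1

# A made-up four-city instance.
dist = [[0, 1, 4, 2],
        [1, 0, 1, 5],
        [4, 1, 0, 1],
        [2, 5, 1, 0]]
print(shortest_circuit_length(dist))  # 5
```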
The next problem sounds much like Hamiltonian Circuit, in that
it involves exhaustively traversing a graph. But it proves to act very
differently.
Today the city of Kaliningrad is a Russian enclave between Poland
and Lithuania. But in 1727 it was in Prussia and was called Königsberg.
The Pregel river divides the city into four areas, connected by seven
bridges. The citizens used to promenade, to take leisurely walks or
drives where they could see and be seen. The question arose: can a
person cross each bridge once and only once, and arrive back at the
start? No one could think of a way, but no one could think of a reason
that there was no way. A local mayor wrote to Euler, who proved that
no circuit is possible. This paper founded Graph Theory.
[Portrait: Leonhard Euler, 1707–1783]
[Figures: a map of the city, Euler’s summary sketch in the middle, and the corresponding graph on the right.]
2.4 Problem (Euler Circuit) Given a graph, find a circuit that traverses each edge
once and only once, or find that no such circuit exists.
Next is a problem that sounds hard. But all of us see it solved every day, for
instance when we ask our smartphone for the shortest route to some place.
2.5 Problem (Shortest Path) Given a weighted graph and two vertices, find the
least-weight path between them, or find that no path exists.
There is an algorithm that solves this problem quickly.† For instance, with the
graph below we could look for the path from 𝐴 to 𝐹 of least cost.
[Diagram: a weighted graph with vertices 𝐴 , 𝐵 , 𝐶 , 𝐷 , 𝐸 , 𝐹 and edge weights 2, 6, 7, 9, 9, 10, 11, 14, 15.]
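The classic quick algorithm for this problem is Dijkstra’s; here is a sketch. The edge list below is one reading of the figure’s weights, so treat it as illustrative.

```python
import heapq

def dijkstra(graph, start, goal):
    """Least-cost path by Dijkstra's algorithm: repeatedly settle
    the unvisited vertex with the smallest tentative distance."""
    dist = {start: 0}
    prev = {}
    pq = [(0, start)]
    visited = set()
    while pq:
        d, u = heapq.heappop(pq)
        if u in visited:
            continue
        visited.add(u)
        if u == goal:
            break
        for v, w in graph.get(u, []):
            nd = d + w
            if nd < dist.get(v, float('inf')):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(pq, (nd, v))
    path = [goal]
    while path[-1] != start:
        path.append(prev[path[-1]])
    return dist[goal], path[::-1]

# One reading of the figure (the exact edges are a guess).
edges = [('A','B',7), ('A','C',9), ('A','D',14), ('B','C',10),
         ('B','E',15), ('C','D',2), ('C','E',11), ('D','F',9),
         ('E','F',6)]
graph = {}
for u, v, w in edges:                 # build the undirected graph
    graph.setdefault(u, []).append((v, w))
    graph.setdefault(v, []).append((u, w))

print(dijkstra(graph, 'A', 'F'))      # (20, ['A', 'C', 'D', 'F'])
```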
the conjecture. But he did make the problem famous by promoting it among his
friends.
𝑃 𝑄 𝑅 𝑃 ∨𝑄 𝑃 ∨ ¬𝑄 ¬𝑃 ∨ 𝑄 ¬𝑃 ∨ ¬𝑄 ∨ ¬𝑅 𝑓 (𝑃, 𝑄, 𝑅)
𝐹 𝐹 𝐹 𝐹 𝑇 𝑇 𝑇 𝐹
𝐹 𝐹 𝑇 𝐹 𝑇 𝑇 𝑇 𝐹
𝐹 𝑇 𝐹 𝑇 𝐹 𝑇 𝑇 𝐹
𝐹 𝑇 𝑇 𝑇 𝐹 𝑇 𝑇 𝐹
𝑇 𝐹 𝐹 𝑇 𝑇 𝐹 𝑇 𝐹
𝑇 𝐹 𝑇 𝑇 𝑇 𝐹 𝑇 𝐹
𝑇 𝑇 𝐹 𝑇 𝑇 𝑇 𝑇 𝑇
𝑇 𝑇 𝑇 𝑇 𝑇 𝑇 𝐹 𝐹
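The table can be checked by brute force over the eight assignments; this sketch confirms that the only row making all four clauses true is 𝑃 = 𝑇 , 𝑄 = 𝑇 , 𝑅 = 𝐹 .

```python
from itertools import product

def f(P, Q, R):
    """Conjunction of the table's four clauses."""
    return (P or Q) and (P or not Q) and (not P or Q) and \
           (not P or not Q or not R)

# Try all eight assignments and keep the satisfying ones.
satisfying = [(P, Q, R) for P, Q, R in product([False, True], repeat=3)
              if f(P, Q, R)]
print(satisfying)   # [(True, True, False)]
```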
More problems, omitting the stories We will list more example problems but
leaving out the background (although for some of them the motivation is clear
even without a story). All of these problems are also widely known in the field.
2.11 Problem (Vertex-to-Vertex Path) Given a graph and two vertices, find if the
second is reachable from the first.†
2.12 Example These are two Western-tradition constellations, Ursa Minor and Draco.
Here we can solve the Vertex-to-Vertex Path problem by eye. For any two vertices
in Ursa Minor there is a path and for any two vertices in Draco there is a path. But
if the two are in different constellations then there is no path.
For a graph with many thousands of nodes, such as a computer network, the
problem is harder than in the prior example. A close variant problem is to decide,
given a graph, whether all vertex pairs are connected.
†
The name Vertex-to-Vertex Path is nonstandard. It is usually known as 𝑠𝑡 -Path, 𝑠𝑡 -Connectivity, or
STCON (𝑠 and 𝑡 are generic names for vertices).
Section 2. A problem miscellany 283
2.13 Problem (Minimum Spanning Tree) Given a weighted undirected graph, find a
subgraph containing all the vertices of the original graph such that its edges have
a minimum total.
This is an undirected graph with weights on the edges.
[Figure: a weighted undirected graph with a highlighted spanning subgraph]
The highlighted subgraph includes all of the vertices, that is, it spans the graph. In
addition, its weights total to a minimum from among all of the spanning subgraphs.
From that it follows that this subgraph is a tree, meaning that it has no cycles, or
else we could eliminate an edge from the cycle and thereby lower the edge weight
total without dropping any vertices.
This looks somewhat like the Hamiltonian Circuit problem in that the sought-for
subgraph contains all of the vertices. However, for the Minimum Spanning Tree
problem we know algorithms that are quick, O (𝑛 lg 𝑛) .
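One such quick algorithm is Kruskal's: sort the edges by weight and keep each edge that does not close a cycle. A sketch using a union–find structure, on a small illustrative graph rather than the one in the figure:

```python
def kruskal(n, edges):
    """Minimum spanning tree by Kruskal's algorithm.

    n is the number of vertices 0..n-1; edges is a list of
    (weight, u, v) triples.  Returns (total_weight, chosen_edges).
    """
    parent = list(range(n))

    def find(x):                      # union-find with path halving
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    total, tree = 0, []
    for w, u, v in sorted(edges):     # cheapest edges first
        ru, rv = find(u), find(v)
        if ru != rv:                  # keeping this edge makes no cycle
            parent[ru] = rv
            total += w
            tree.append((u, v))
    return total, tree

# A hypothetical weighted graph on four vertices.
edges = [(1, 0, 1), (4, 0, 2), (3, 1, 2), (2, 2, 3), (5, 1, 3)]
```

The sorting dominates, which is where the O (𝑛 lg 𝑛) bound comes from.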
2.14 Problem (Vertex Cover) Given a graph and a bound 𝐵 ∈ N, decide if the graph
has a size 𝐵 set of vertices, 𝐶 , such that for any edge, at least one of its ends is a
member of 𝐶 .
2.15 Example A museum posts guards to watch their exhibits. There are eight halls,
laid out as below. They will put the guards at some of the corners 𝑤 0 , . . . 𝑤 5 . What
is the smallest number of guards that will suffice to watch all of the hallways?
[Figure: the museum floor plan, with corners 𝑤0–𝑤5]
Checking each corner shows that one guard will not suffice. The two-element set
𝐶 = {𝑤 0, 𝑤 4 } is a vertex cover: every hallway has at least one end in 𝐶 .
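For small instances like the museum we can decide Vertex Cover by brute force, trying every size-𝐵 vertex set. The graph below is a hypothetical 4-cycle, not the museum floor plan:

```python
from itertools import combinations

def has_vertex_cover(vertices, edges, B):
    """Decide Vertex Cover by brute force: is there a size-B set C
    such that every edge has at least one endpoint in C?"""
    for C in combinations(vertices, B):
        Cset = set(C)
        if all(u in Cset or v in Cset for u, v in edges):
            return True
    return False

# A hypothetical 4-cycle: one vertex always misses an edge, but two
# opposite corners together cover all four edges.
V = [0, 1, 2, 3]
E = [(0, 1), (1, 2), (2, 3), (3, 0)]
```

There are C(|V|, B) candidate sets, so this is exponential in general; no polytime algorithm for Vertex Cover is known.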
2.16 Problem (Clique) Given a graph and a bound 𝐵 ∈ N, decide if the graph has a
size 𝐵 set of vertices such that any two are connected.
The term ‘clique’ comes from social networks; if the nodes represent people
and the edges connect friends then a clique is a set of people who are all friends.
A graph with a 4-clique has a subgraph like the one below on the left, and
any graph with a 5-clique has a subgraph like the one on the right.
284 Chapter V. Computational Complexity
[Figure: a 4-clique on the left and a 5-clique on the right]
2.19 Problem (Max Cut) A graph cut partitions the vertices into two disjoint subsets.
The cut set contains the edges with a vertex in each subset. The Max Cut problem
is to find the partition with the largest cut set.
2.20 Example For this graph the largest cut set contains six edges, the ones connecting
differently colored vertices here.†
[Figure: the graph on 𝑣0–𝑣5, with the two sides of the partition shown in different colors]
2.21 Animation: A partition for the graph with a maximum cut set.
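The verification script mentioned in the footnote might look like this sketch; it pins one vertex to a fixed side so that each two-set partition is examined only once. The 5-cycle here is a hypothetical instance, not the example's graph:

```python
from itertools import combinations

def max_cut(vertices, edges):
    """Find a maximum cut by brute force over two-set partitions."""
    best_size, best_side = 0, set()
    rest = vertices[1:]                 # pin vertices[0] to one side
    for k in range(len(rest) + 1):
        for side in combinations(rest, k):
            S = set(side) | {vertices[0]}
            # an edge is cut when its endpoints lie on opposite sides
            cut = sum(1 for u, v in edges if (u in S) != (v in S))
            if cut > best_size:
                best_size, best_side = cut, S
    return best_size, best_side

# A hypothetical 5-cycle; its maximum cut has four edges.
V = [0, 1, 2, 3, 4]
E = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)]
```

The loop examines 2^(|V|-1) partitions, so this is only practical for toy instances.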
2.22 Problem (Three Dimensional Matching) Let the sets 𝑋, 𝑌 , 𝑍 all have the same
number of elements, 𝑛 . Given as input a set 𝑀 ⊆ 𝑋 × 𝑌 × 𝑍 , decide if there is a
matching, a set 𝑀ˆ ⊆ 𝑀 containing 𝑛 elements such that no two of the triples in
𝑀ˆ agree on their first coordinates, or their second or third coordinates either.
2.23 Example Let 𝑋 = { a, b }, 𝑌 = { b, c }, and 𝑍 = { a, d }, so that 𝑛 = 2. Below is a
subset of 𝑋 × 𝑌 × 𝑍 (it actually equals 𝑋 × 𝑌 × 𝑍 ).
𝑀 = { ⟨a, b, a⟩, ⟨a, c, a⟩, ⟨b, b, a⟩, ⟨b, c, a⟩, ⟨a, b, d⟩, ⟨a, c, d⟩, ⟨b, b, d⟩, ⟨b, c, d⟩ }
The set 𝑀ˆ = { ⟨a, b, a⟩, ⟨b, c, d⟩ } has 2 elements. They disagree in their first
coordinates, and their second, and their third.
2.24 Example Fix 𝑛 = 4 and consider 𝑋 = { 1, 2, 3, 4 }, 𝑌 = { 10, 20, 30, 40 }, and
𝑍 = { 100, 200, 300, 400 }, all four-element sets. Also fix this subset of 𝑋 × 𝑌 × 𝑍 .
𝑀 = { ⟨1, 10, 200⟩, ⟨1, 20, 300⟩, ⟨2, 30, 400⟩, ⟨3, 10, 400⟩,
⟨3, 40, 100⟩, ⟨3, 40, 200⟩, ⟨4, 10, 200⟩, ⟨4, 20, 300⟩ }
A matching is 𝑀ˆ = { ⟨1, 20, 300⟩, ⟨2, 30, 400⟩, ⟨3, 40, 100⟩, ⟨4, 10, 200⟩ }.
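Instances this small can be handled by a brute-force script. This sketch uses Example 2.24's set 𝑀, with triples written as Python tuples; note that it may return a different matching than the one displayed above, since a matching need not be unique:

```python
from itertools import combinations

def is_matching(triples, n):
    """A collection of triples is a matching when it has n elements and
    no two triples agree in any coordinate."""
    return len(triples) == n and all(
        len({t[i] for t in triples}) == n for i in range(3))

def find_matching(M, n):
    """Brute-force search for a matching inside M."""
    for cand in combinations(M, n):
        if is_matching(cand, n):
            return cand
    return None

# The data of Example 2.24.
M = [(1, 10, 200), (1, 20, 300), (2, 30, 400), (3, 10, 400),
     (3, 40, 100), (3, 40, 200), (4, 10, 200), (4, 20, 300)]
```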
2.25 Problem (Subset Sum) Given a multiset of natural numbers 𝑆 = {𝑛 0, ... 𝑛𝑘 − 1 }
and a target 𝑇 ∈ N, decide if a subset of 𝑆 sums to the target.†
† One way to verify this is with a script that checks all two-set partitions of the vertices.
† Recall that in a multiset repeats do not collapse, so the multiset { 1, 2, 2, 3 } is different
from the multiset { 1, 2, 3 } . But a multiset is like a set in that the order of the elements is
not significant, so the multiset { 1, 2, 2, 3 } is the same as the multiset { 1, 2, 3, 2 } . In
short, a multiset is an unordered list.
2.26 Example Do some of the numbers { 911, 22, 821, 563, 405, 986, 165, 732 } add to
𝑇 = 1173? One such collection is { 165, 986, 22 }.
In contrast, no subset of { 831, 357, 63, 987, 117, 81, 6785, 606 } adds to 𝑇 =
2105. The number 6785 exceeds the target, and all of the remaining numbers are
multiples of three while the target 𝑇 is not.
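A brute-force check of all 2^|𝑆| subsets settles small instances like these; a sketch:

```python
from itertools import combinations

def subset_sum(S, T):
    """Decide Subset Sum by checking every subset of S.

    Returns a witness subset summing to T, or None."""
    for k in range(len(S) + 1):
        for sub in combinations(S, k):
            if sum(sub) == T:
                return sub
    return None
```

With |𝑆| = 8 there are only 256 subsets, but the count doubles with each added element.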
2.27 Problem (Knapsack) Given a finite multiset 𝑆 whose elements 𝑠 have a natural
number weight 𝑤 (𝑠) and value 𝑣 (𝑠) , and also given a weight bound 𝐵 and a value
target 𝑇, find a subset 𝑆ˆ ⊆ 𝑆 whose elements have a total weight less than or equal
to the bound and total value greater than or equal to the target.
2.28 Example Our knapsack can carry at most 𝐵 = 10 pounds. Can we pack items
with total worth at least 𝑇 = 100?
Item a b c d
Weight 3 4 5 6
Value 50 40 10 30
The best that we can do is take items a and b. We cannot meet the value target.
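A brute-force sketch for small Knapsack instances, using the table's data; here items maps each name to its (weight, value) pair:

```python
from itertools import combinations

def knapsack(items, B, T):
    """Decide Knapsack by brute force.

    items maps name -> (weight, value).  Returns a subset of names
    meeting weight bound B and value target T, or None."""
    names = list(items)
    for k in range(len(names) + 1):
        for sub in combinations(names, k):
            weight = sum(items[s][0] for s in sub)
            value = sum(items[s][1] for s in sub)
            if weight <= B and value >= T:
                return sub
    return None

# The instance of Example 2.28.
items = {"a": (3, 50), "b": (4, 40), "c": (5, 10), "d": (6, 30)}
```

On this data no subset within the 10-pound bound reaches value 100, matching the example; lowering the target to 90 succeeds.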
2.29 Problem (Partition) Given a finite multiset 𝐴 such that each element has an
associated natural number size 𝑠 (𝑎) , decide if the set splits into two, 𝐴ˆ and 𝐴 − 𝐴ˆ,
so that the totals of the sizes are the same, ∑𝑎∈𝐴ˆ 𝑠 (𝑎) = ∑𝑎∉𝐴ˆ 𝑠 (𝑎) .
2.30 Example The set 𝐴 = { I, a, my, go, rivers, cat, hotel, comb } has eight words.
The size of a word, 𝑠 (𝜎) , is the number of letters. Then 𝐴ˆ = { rivers, cat, my, I }
gives ∑𝑎∈𝐴ˆ 𝑠 (𝑎) = ∑𝑎∉𝐴ˆ 𝑠 (𝑎) = 12.
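A brute-force sketch for Partition, using the word data from Example 2.30 (the elements here are distinct, so an ordinary dictionary suffices; a true multiset would need indexed elements):

```python
from itertools import combinations

def partition(sizes):
    """Decide Partition: find a subset whose sizes total half of the
    grand total, if one exists.  sizes maps element -> size."""
    total = sum(sizes.values())
    if total % 2 != 0:
        return None                      # an odd total cannot split evenly
    for k in range(len(sizes) + 1):
        for sub in combinations(sizes, k):
            if sum(sizes[a] for a in sub) == total // 2:
                return set(sub)
    return None

# Word lengths from Example 2.30; the grand total is 24.
words = {"I": 1, "a": 1, "my": 2, "go": 2, "rivers": 6,
         "cat": 3, "hotel": 5, "comb": 4}
```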
2.31 Example The US President is elected by having states send representatives to the
Electoral College. The number depends in part on the state’s population.
Reps  No. states  States
55    1           CA
38    1           TX
29    2           FL, NY
20    2           IL, PA
18    1           OH
16    2           GA, MI
15    1           NC
14    1           NJ
13    1           VA
12    1           WA
11    4           AZ, IN, MA, TN
10    4           MD, MN, MO, WI
9     3           AL, CO, SC
8     2           KY, LA
7     3           CT, OK, OR
6     6           AR, IA, KS, MS, NV, UT
5     3           NE, NM, WV
4     5           HI, ID, ME, NH, RI
3     8           AK, DE, DC, MT, ND, SD, VT, WY
The table above gives the numbers for the 2020 election; all of a state’s represen-
tatives vote for the same person (we will ignore some fine points). The Partition
Problem asks if there could be a tie.
2.32 Problem (Linear Programming) Optimize a linear function 𝐹 (𝑥 0, ... 𝑥𝑛 ) = 𝑐 0𝑥 0 +
· · · + 𝑐𝑛 𝑥𝑛 subject to linear constraints, ones of the form 𝑎𝑖,0𝑥 0 + · · · + 𝑎𝑖,𝑛 𝑥𝑛 ≤ 𝑏𝑖
or 𝑎𝑖,0𝑥 0 + · · · + 𝑎𝑖,𝑛 𝑥𝑛 ≥ 𝑏𝑖 .
2.33 Example Maximize 𝐹 (𝑥 0, 𝑥 1 ) = 𝑥 0 + 2𝑥 1 subject to 4𝑥 0 + 3𝑥 1 ≤ 24, 𝑥 1 ≤ 4, 𝑥 0 ≥ 0
and 𝑥 1 ≥ 0. The shaded region has the points that satisfy all the inequalities; these
are said to be ‘feasible’ points.
[Figure: the feasible region in the (𝑥0, 𝑥1)-plane, shaded, with level lines 𝐹 = 2, 4, 6, 8]
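For two variables we can use the fact that a linear function attains its optimum at a corner of the feasible region. This sketch enumerates the intersections of pairs of boundary lines and keeps the feasible ones; it is a brute-force illustration for Example 2.33, not the simplex method:

```python
from itertools import combinations

# Each constraint a0*x0 + a1*x1 <= b is written as (a0, a1, b).
# These encode Example 2.33: 4x0 + 3x1 <= 24, x1 <= 4, x0 >= 0, x1 >= 0.
cons = [(4, 3, 24), (0, 1, 4), (-1, 0, 0), (0, -1, 0)]

def feasible(x0, x1, eps=1e-9):
    return all(a0 * x0 + a1 * x1 <= b + eps for a0, a1, b in cons)

def maximize(F):
    """Maximize F over the polygon's corners, found by intersecting
    every pair of boundary lines (Cramer's rule) and testing feasibility."""
    best = None
    for (a0, a1, b), (c0, c1, d) in combinations(cons, 2):
        det = a0 * c1 - a1 * c0
        if abs(det) < 1e-12:
            continue                       # parallel boundary lines
        x0 = (b * c1 - a1 * d) / det
        x1 = (a0 * d - b * c0) / det
        if feasible(x0, x1):
            value = F(x0, x1)
            if best is None or value > best[0]:
                best = (value, x0, x1)
    return best
```

For Example 2.33 the corners are (0, 0), (6, 0), (0, 4), and (3, 4), and the maximum of 𝐹 = 𝑥0 + 2𝑥1 is 11, at (3, 4).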
The final three problems, about primes and divisibility, have an impeccable
history. No less an authority than Gauss said, “The problem of distinguishing prime
numbers from composite numbers and of resolving the latter into their prime
factors is known to be one of the most important and useful in arithmetic. It has
engaged the industry and wisdom of ancient and modern geometers to such an
extent that it would be superfluous to discuss the problem at length . . . Further, the
dignity of the science itself seems to require that every possible means be explored
for the solution of a problem so elegant and so celebrated.”
The three may be hard to tell apart at first glance. But as we understand them
today, they differ in the Big-O behavior of the algorithms to solve them.
2.38 Problem (Divisor) Given a number 𝑛 ∈ N, find a nontrivial divisor.
We know of no efficient algorithm to find divisors.† However, as is so often the
case, at this moment we also have no proof that no such algorithm exists.‡ Not
† No efficient algorithm is known on a non-quantum computer. ‡ The presumed difficulty of this
problem is at the heart of widely used algorithms in cryptography.
all numbers of a given length are equally hard to factor. The hardest numbers to
factor are semiprimes, products of two prime numbers.
2.39 Problem (Prime Factorization) Given a number 𝑛 ∈ N, produce its decomposition
into a product of primes.
Factoring seems to be hard. But what if you only want to know whether a
number is prime and don’t care about its factors?
2.40 Problem (Primality) Given a number 𝑛 ∈ N, determine if it is prime; that is,
decide if there are no numbers 𝑎 that divide 𝑛 and such that 1 < 𝑎 < 𝑛 .
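The definition suggests an immediate algorithm, trial division. It is correct, but it checks on the order of √𝑛 candidates and so runs in time exponential in the number of digits of 𝑛; it is not the polytime test discussed below:

```python
def is_prime(n):
    """Primality by trial division: look for a divisor a with
    1 < a and a*a <= n.

    Correct, but the loop runs about sqrt(n) times, which is
    exponential in the number of digits of n."""
    if n < 2:
        return False
    a = 2
    while a * a <= n:
        if n % a == 0:
            return False
        a += 1
    return True
```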
For many years the consensus among experts was that finding
a primality testing algorithm that was polytime in the number of
digits of the input was very unlikely. After all, for centuries, many
of the smartest people in the world had worked on composites
and primes, and none of them had produced a fast test.†
However, in 2002 M Agrawal, N Kayal, and N Saxena produced such an
algorithm, the AKS algorithm.‡ Today, refinements of their technique
run in O (𝑛 6 ) .
[Portraits: Nitin Saxena (b 1981), Neeraj Kayal (b 1979), Manindra Agrawal (b 1966)]
This dramatically illustrates that a problem being high profile, and
having been worked on by many well-respected experts, does not mean
that it will never be solved.
Although the opinions of experts have value, they can nonetheless be wrong.
Results that gainsay established orthodoxy have appeared before and will appear
again. One correct proof is all it takes.
V.2 Exercises
2.41 Name the prime numbers less than one hundred.
2.42 Decide if each is prime.
(a) 5 477
(b) 6 165
(c) 6 863
(d) 4 207
(e) 7 689
✓ 2.43 Find a proper divisor of each. (a) 31 221 (b) 52 424 (c) 9 600 (d) 4 331
(e) 877
2.44 Your friend asks, “Doesn’t the polytime solution of Primality automatically
give us one for Divisor? Just take the divisor from the first one and use it for the
second one.” Help them out.
✓ 2.45 Decide if each formula is satisfiable.
(a) (𝑃 ∧ 𝑄) ∨ (¬𝑄 ∧ 𝑅)
†
There are a number of probabilistic algorithms that are often used in practice that can test primality
very quickly, with an extremely small chance of error. ‡ At the time that they did most of this research,
Kayal and Saxena were undergraduates.
Hamilton used the fourth, the dodecahedron, for his game. Find a Hamiltonian
circuit for the third and the fifth, the octahedron and the icosahedron. To make
the connections easier to see, below we have grabbed a face in the back of each
solid, and expanded it until we could squash the entire shape down into the plane
without any edge crossings.
[Figure: planar drawings of the octahedron, vertices 0–5, and the icosahedron, vertices 0–11]
✓ 2.50 Example 2.20 exhibits a cut set with six members, as shown on the left. But
on the right there are eight cut edges; what’s wrong with it?
[Figure: two copies of the graph from Example 2.20 on vertices 𝑣0–𝑣5, each showing a two-coloring of the vertices]
✓ 2.52 This shows interlocking corporate directorships. The vertices are corporations
and they are connected if they share a member of their Board of Directors (the
data is from 2004).
[Figure: a graph of interlocking directorships among corporations including JP Morgan, Caterpillar, AT&T, and Texas Instruments]
(a) Is there a path from AT&T to Ford Motor? (b) Can you get from Halliburton to
Ford Motor? (c) Can you get from Caterpillar to Ford Motor? (d) JP Morgan to
Ford Motor?
2.53 How many edges are there in a Hamiltonian path?
2.54 On some Traveling Salesman problem graphs we can change the edge weights
to ensure that an edge is used but on some we cannot.
(a) A circuit for a Traveling Salesman problem instance is a Hamiltonian circuit.
Produce an undirected graph without loops on which there is at least one
Hamiltonian circuit, but containing an edge that belongs to no such circuit.
(b) Consider an undirected graph with an edge 𝑒 through which at least one
Hamiltonian circuit runs. Fix edge weights for all other edges. Show that
there is an edge weight for 𝑒 such that any solution for the Traveling Salesman
problem includes 𝑒 .
✓ 2.55 A popular game extends the Vertex-to-Vertex Path problem by counting the
degrees of separation. Below is a portion of the movie connection graph, where
actors are connected if they have ever been together in a movie.
[Figure: a portion of the movie connection graph with actors Elvis Presley, Ed Asner, and Meryl Streep, and the movies Change of Habit, JFK, and The River Wild]
✓ 2.56 Verify that this Knapsack instance has no solution when the weight bound
is 𝐵 = 73 and the value target is 𝑇 = 140.
Item a b c d e
Weight 21 33 49 42 19
Value 50 48 34 44 40
2.57 Using the data in Example 2.31, decide if there could be a tie in the 2020
Electoral College.
[Figure: a weighted graph on vertices 𝑞0 through 𝑞8]
2.59 The Subset Sum instance with 𝑆 = { 21, 33, 49, 42, 19 } and target 𝑇 = 114
has no solution. Verify that by brute force, by checking every possible combination.
✓ 2.62 The Course Scheduling problem starts with a list of students and the classes
that they wish to take, and then finds how many time slots are needed to schedule
the classes. If there is a student taking two classes then those two will not be
scheduled to meet at the same time. Here is an instance: a school has classes
in Astronomy, Biology, Computing, Drama, English, French, Geography, History,
and Italian. After students sign up, the graph below shows which classes have
an overlap. For instance Astronomy and Biology share at least one student while
[Figure: the class-overlap graph on vertices 𝐴 through 𝐼]
What is the minimum number of class times that we must use? In graph coloring
terms, we define that classes meeting at the same time are the same color and
we ask for the minimum number of colors needed so that no two same-colored
vertices share an edge. (a) Show that no three-coloring suffices. (b) Produce a
four-coloring.
2.63 If a Boolean expression 𝐹 is satisfiable, does that imply that its negation ¬𝐹
is not satisfiable?
2.64 Some authors define the Satisfiability problem as: given a finite set of
propositional logic statements, not just one statement, find if there is a single input
tuple 𝑏 0, ... 𝑏 𝑗 − 1 , where each 𝑏𝑖 is either 𝑇 or 𝐹 , that satisfies them all. Show that
this is equivalent to the definition given in Problem 2.9.
✓ 2.65 Find all 3-cliques in this graph.
[Figure: a graph on vertices 𝑣0–𝑣6]
[Figure: another graph on vertices 𝑣0–𝑣6]
(b) In this graph find a vertex cover with 𝑘 = 3 elements, and an independent set
with 𝑘ˆ = 3 elements.
[Figure: a graph on vertices 𝑣0–𝑣5]
(c) In this graph find a vertex cover 𝑆 with 𝑘 = 4 elements. Find an independent
set 𝑆ˆ with 𝑘ˆ = 6 elements.
[Figure: a graph on vertices 𝑣0–𝑣9]
[Figure: a diagram with courses 0–4 and times 𝛼–𝜀]
For example, instructor 𝐴 can only teach courses 1, 2, and 3. And, course 0 can only
run at time 𝛼 or time 𝛿 . Verify that this is an instance of the Three-dimensional Matching
problem and find a match.
2.69 Consider Three Dimensional Matching, Problem 2.22. Let 𝑋 = { a, b, c },
𝑌 = { b, c, d }, and 𝑍 = { a, d, e }. (a) List the elements of 𝑀 = 𝑋 × 𝑌 × 𝑍 .
(b) Is there a three element subset 𝑀ˆ whose triples have the property that no two
of them agree on any coordinate?
Section V.3 Problems, algorithms, and programs
Now, with many examples in hand, we will briefly reflect on problems and solutions.
We will keep this discussion on an intuitive level only — indeed, many of these
things have no widely accepted precise definition.
Section 3. Problems, algorithms, and programs 293
Problem types We have already seen function problems. These ask for an
algorithm with a single output for each input. An example is the Prime Factorization
problem, which takes in a natural number and returns its prime decomposition.
Another example is the problem of finding the greatest common divisor, where the
input is a pair of natural numbers and the output is a natural number.
Another problem type is the optimization problem. These ask for a solution
that is best according to some metric. The Shortest Path problem is one of these,
†
There are interesting problems with only one task, such as computing the digits of 𝜋 . ‡ There is
no widely-accepted formal definition of ‘algorithm’. Whatever it is, it fits between ‘mathematical
function’ and ‘computer program’. For example, a ‘sort’ routine takes in a set of items and returns
the sorted sequence. This task, this input-output behavior, could be accomplished using different
algorithms: merge sort, heap sort, etc. So the best handle that we have is informal — an ‘algorithm’ is
an equivalence class of programs (i.e., Turing machines), where two programs are equivalent if they do
a task in essentially the same way, whatever “essentially” means. # There are now coming up on a
million volunteers offering computing time. To join them, visit https://scienceunited.org/.
3.3 Figure: Both of these show the collection of languages, P ( B∗ ) , which we often call
the ‘problems’. On the left, the dots in the blob emphasize that this is a collection
of separate sets, not a continuum. It is drawn with quickly-solvable problems, those
with a fast decider, at the bottom. But there is a catch. On the right the shaded
collection Rec consists of the Turing computable languages. Similarly, RE consists
of the languages that are computably enumerable. So this diagram makes the point
that not all languages have a decider or a recognizer — some languages are perfectly
good problems, but they are unsolvable.
†
Thus, on a Turing machine, if when the machine starts the head is under the final character, then the
machine does not even need to read the entire input to decide the question. The algorithm runs in time
independent of the input length. ‡ That is, the unary case reduces to the binary one. # ‘Reasonable’
means that it is not so inefficient as to greatly change the big-O behavior. § This is in a way like
Church’s Thesis. We cannot prove it but our experience with digital reproduction of music, movies, etc.,
argues that it is so.
V.3 Exercises
✓ 3.10 For each of these, list three examples and then — speaking informally, since
some of them do not have formal definitions — describe the difference between
them and an algorithm. (a) a heuristic (b) pseudocode (c) a Turing machine
(d) a flowchart (e) source code (f) an executable (g) a process
3.11 Your friend asks, “So, if a problem is essentially a set of strings, what
constitutes a solution?” Answer them.
3.12 What is the difference between a decision problem and a language decision
problem?
3.13 As an illustration of the thesis that even surprising things can be represented
reasonably efficiently and with reasonable fidelity in binary, we can do a simple
calculation. (a) At 30 cm, the resolution of the human eye is about 0.01 cm.
How many such pixels are there in a photograph that is 21 cm by 30 cm?
(b) We can see about a million colors. How many bits per pixel is that? (c) How
many bits for the photo, in total?
3.14 Name something important that cannot be represented in binary.
✓ 3.15 True or false: any two programs that implement the same algorithm must
compute the same function. What about the converse?
3.16 Some tasks are hard to express as a language decision problem. Consider
sorting the characters of a string into ascending order. Briefly describe why each
of these language decision problems fails to capture the task’s essential difficulty.
(a) {𝜎 ∈ Σ∗ ∣ 𝜎 is sorted } (b) { ⟨𝜎, 𝑝⟩ ∣ 𝑝 is a permutation that orders 𝜎 }
✓ 3.17 For each language decision problem, name three members of the set, if there
are three, and then sketch an algorithm solving it.
(a) L0 = { ⟨𝑛, 𝑚⟩ ∈ N2 ∣ 𝑛 + 𝑚 is a square and one greater than a prime }
†
Many authors use diamond brackets to stand for a representation, as in ‘ ⟨ G, 𝑣0 , 𝑣1 ⟩ ’. Here, we reserve
diamond brackets for sequences.
3.18 Solve the language decision problem for (a) the empty language, (b) the
language B, and (c) the language B∗.
3.19 For each language, sketch an algorithm that solves the language decision
problem.
(a) {𝜎 ∈ B∗ ∣ 𝜎 matches the regular expression a*ba* }
(b) The language defined by this grammar
S → AB
A → aA | 𝜀
B → bB | 𝜀
3.20 Solve each decision problem about Finite State machines, M, by producing
an algorithm. (a) Given M, decide if the language accepted by M is empty.
(b) Decide if the language accepted by M is infinite. (c) Decide if L ( M) is the
set of all strings, Σ∗ .
3.21 For each language decision problem, give an algorithm that runs in O ( 1) .
(a) The language of minimal-length binary representations of numbers that are
nonzero.
(b) The binary representations of numbers that exceed 1000.
3.22 In a graph, a bridge edge is one whose removal disconnects the graph. That
is, there are two vertices that before the bridge is removed are connected by a
path, but are not connected after it is removed. (More precisely, a connected
component of a graph is a set of vertices that can be reached from each other by
a path. A bridge edge is one whose removal increases the number of connected
components.) The problem is: given a graph, find a bridge. Is this a function
problem, a decision problem, a language decision problem, a search problem, or
an optimization problem?
✓ 3.23 For each, give the categorization that best applies: a function problem, a
decision problem, a language decision problem, a search problem, or an opti-
mization problem. (a) The Graph Connectedness problem, which inputs a graph
and decides whether for any two vertices there is a path between them. (b) The
problem that inputs two natural numbers and returns their least common multiple.
(c) The Graph Isomorphism problem that inputs two graphs and determines
whether they are isomorphic. (d) The problem that takes in a propositional logic
statement and returns an assignment of truth values to its inputs that makes the
statement true, if there is such an assignment. (e) The Nearest Neighbor problem
that inputs a weighted graph and a vertex, and returns a vertex nearest the given
one that does not equal the given one. (f) The Discrete Logarithm problem: given
a prime number 𝑝 and two numbers 𝑎, 𝑏 ∈ N, determine if there is a power 𝑘 ∈ N
so that 𝑎𝑘 ≡ 𝑏 ( mod 𝑝) . (g) The problem that inputs a bitstring and decides if
the number that it represents in binary will, when converted to decimal, contain
only odd digits.
✓ 3.24 For each, give the characterization that best applies: a function problem, a
decision problem, a language decision problem, a search problem, or an optimiza-
tion problem. (a) The 3-SAT problem, Problem 2.10 (b) The Divisor problem,
Problem 2.38 (c) The Prime Factorization problem, Problem 2.39 (d) The F-SAT
problem, where the input is a propositional logic expression and the output is
either an assignment of 𝑇 and 𝐹 to the expression’s variables that makes it evaluate
to 𝑇 , or the string None. (e) The Primality problem, Problem 2.40
3.25 Express each task as a language decision problem. Include in the description
explicit mention of the string representation. (a) Decide whether a number is a
perfect square. (b) Decide whether a triple ⟨𝑥, 𝑦, 𝑧⟩ ∈ N3 is a Pythagorean triple,
that is, whether 𝑥 2 + 𝑦 2 = 𝑧 2 . (c) Decide whether a graph has an even number of
edges. (d) Decide whether a path in a graph has any repeated vertices.
✓ 3.26 Recast each as a language decision problem. Include explicit mention of the
string representation. (a) Given a natural number, do its factors add to more than
twice the number? (b) Given a Turing machine and input, does the machine halt
on the input in less than ten steps? (c) Given a propositional logic statement, are
there three different assignments that evaluate to 𝑇 ? That is, are there more than
three lines in the truth table that end in 𝑇 ? (d) Given a weighted graph and a
bound 𝐵 ∈ R, for any two vertices is there a path from one to the other with total
cost less than the bound?
3.27 Recast each in language decision terms. Include explicit mention of the string
representation. (a) Graph Colorability, Problem 2.7, (b) Euler Circuit, Problem 2.4,
(c) Shortest Path, Problem 2.5.
3.28 Restate the Halting problem as a language decision problem.
✓ 3.29 As stated, the Shortest Path problem, Problem 2.5, is an optimization problem.
Convert it into a parametrized family of decision problems. Hint: use the technique
outlined following the Traveling Salesman problem, Problem 2.3.
✓ 3.30 Express each optimization problem as a parametrized family of language
decision problems. (a) Given a Fifteen Game board, find the least number of slides
that will solve it. (b) Given a Rubik’s cube configuration, find the least number of
moves to solve it. (c) Given a list of jobs that must be accomplished to assemble a
car, along with how long each job takes and which jobs must be done before other
jobs, find the shortest time to finish the entire car.
3.31 As stated, the Hamiltonian Circuit problem is a decision problem. Give a
function version of this problem. Also give an optimization version.
3.32 The different problem types are related. Each of these inputs a square
matrix 𝑀 with more than 3 rows, and relates to a 3 × 3 submatrix (form the
submatrix by picking three rows and three columns, which need not be adjacent).
Characterize each as a function problem, a decision problem, a search problem, or
an optimization problem. (a) Find a submatrix that is invertible. (b) Decide if
Section V.4 P
We have said that we often blur the distinction between the problem of deciding
membership in a language L and the language itself. So to express that we are
studying problems of a certain type we may say we are studying languages.
4.1 Definition A complexity class is a collection of languages.
The term ‘complexity’ is there because these collections are often associated
with some resource specification, so that for instance one class is the collection of
languages that are accepted by a Turing machine in quadratic time.†
4.2 Example One complexity class is the collection of languages for which there is a
deciding Turing machine that runs in time O (𝑛 2 ) . Thus C = { L0, L1, ... }, where
each L 𝑗 is decided by some machine P𝑖 𝑗 , for which the function 𝑓 relating the size
of the machine’s input |𝜎 | to the number of steps that the machine takes to finish
is quadratic, that is, 𝑓 is O (𝑛 2 ) .
4.3 Example Another is the collection of languages accepted by some Turing machine
that uses only logarithmic space. That is, for such a machine, with input string 𝜎 the
function 𝑓 relating |𝜎 | to the maximum number of tape squares that the machine
visits in deciding a string of that length is logarithmic, 𝑓 ∈ O ( lg) .
Two points bear explication. As to the computing machine, researchers
study not just Turing machines but other types of machines as well, including
nondeterministic Turing machines and Turing machines with access to an oracle
for random numbers. And as for the resource specification, it often involves bounds
on the time or space behavior. But a class could instead be, for instance, the
complement of O (𝑛 2 ) , so the specification isn’t always a bound.‡
Definition The complexity class that we introduce now is the most important one.
It is the collection of problems that under Cobham’s Thesis we take to be tractable.
4.4 Definition A language decision problem L is a member of the class P if there is
an algorithm to decide membership in L that on a deterministic Turing machine
runs in polynomial time.
4.5 Example The problem L = { G ∣ there is a path between any two vertices } of
deciding whether a given graph is connected is a member of P. To verify this, we
must produce an algorithm that decides membership in this language, and that
runs in polynomial time. One is to do a breadth first search of the graph, which
has a runtime that is cubic in the number of nodes.
4.6 Example Another member of P is the problem of deciding whether two numbers are
relatively prime, { ⟨𝑛 0, 𝑛 1 ⟩ ∈ N2 ∣ their greatest common divisor is 1 }. As before,
to verify that this language is a member of P we produce an algorithm that
determines membership and that runs in polytime. Euclid’s algorithm fits the bill;
it solves this problem and has runtime O ( lg ( max (𝑛 0, 𝑛 1 ))) .
4.7 Example Still another member of P is the String Search problem of deciding
substring-ness, { ⟨𝜎, 𝜏⟩ ∈ Σ∗ × Σ∗ ∣ 𝜎 is a substring of 𝜏 }. (Often in practice 𝜏 is very
long and is called the haystack while 𝜎 is short and is the needle.) The algorithm
that first tests 𝜎 at the initial character of 𝜏 , then at the next character, etc., has a
runtime of O ( |𝜎 | · |𝜏 |) , which is polynomial.
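That naive algorithm can be written directly; it makes at most about |𝜏| · |𝜎| character comparisons, so it runs in polytime:

```python
def is_substring(needle, haystack):
    """Naive string search: try the needle at each starting position
    of the haystack.  At most len(haystack) * len(needle) comparisons."""
    n, h = len(needle), len(haystack)
    for start in range(h - n + 1):
        if haystack[start:start + n] == needle:
            return True
    return False
```

Faster algorithms exist (Knuth-Morris-Pratt runs in linear time), but polytime membership only needs this simple one.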
†
There are other definitions of complexity class. Some authors require that in a class the characteristic
function of each language can be computed under some resource specification. This has implications —
if all of the members of a class must be computable by Turing machines then each class is countable.
Here, we only say that it is a collection so that the definition is maximally general. ‡ At this
writing there are 546 studied classes but the number changes frequently; see the Complexity Zoo,
https://complexityzoo.net/.
[Figure: a Boolean circuit on inputs 𝑏0–𝑏3, built from ⊕, ∧, ∨, and ≡ gates, with output 𝑓 (𝑏 0 , 𝑏 1, 𝑏 2, 𝑏 3 )]
This circuit returns 1 if the sum of the input bits is a multiple of 3. The
Circuit Evaluation problem inputs a circuit like this one and computes the out-
put, 𝑓 (𝑏 0, 𝑏 1, 𝑏 2, 𝑏 3 ) . This problem is a member of P.
4.9 Example Although polytime is a restriction, nonetheless P is a very large collection.
More example members: (1) matrix multiplication, taken as a language decision
problem for { ⟨𝜎0, 𝜎1, 𝜎2 ⟩ ∣ they represent matrices with 𝑀0 · 𝑀1 = 𝑀2 } (2) minimal
spanning tree, { ⟨G,𝑇 ⟩ ∣ 𝑇 is a minimal spanning tree in G } (3) edit distance,
the number of single-character removals, insertions, or substitutions needed to
transform between strings, { ⟨𝜎0, 𝜎1, 𝑛⟩ ∣ 𝜎0 transforms to 𝜎1 in at most 𝑛 edits }.
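The edit-distance language in item (3) is decidable in polytime via the standard dynamic-programming computation of Levenshtein distance, which fills an (|𝜎0|+1) × (|𝜎1|+1) table row by row:

```python
def edit_distance(s, t):
    """Levenshtein distance by dynamic programming, O(len(s)*len(t))."""
    prev = list(range(len(t) + 1))       # distances from "" to prefixes of t
    for i, cs in enumerate(s, start=1):
        cur = [i]                        # distance from s[:i] to ""
        for j, ct in enumerate(t, start=1):
            cur.append(min(prev[j] + 1,               # delete cs
                           cur[j - 1] + 1,            # insert ct
                           prev[j - 1] + (cs != ct))) # substitute or match
        prev = cur
    return prev[-1]

def in_language(s0, s1, n):
    """Membership in the set of triples where s0 transforms to s1
    in at most n edits."""
    return edit_distance(s0, s1) <= n
```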
4.10 Figure: This blob contains all language decision problems, all L ⊆ B∗ . Shaded is P.
Two final points. First, if a problem has an algorithm that is O ( lg 𝑛) then that
problem is in P. Second, the members of P are problems (actually, languages that
represent problems), so it is wrong to say that an algorithm is in P.
of alternative computational models proposed over the years shows that while
Turing machine algorithms are indeed often slower than related algorithms for
other natural models, it is only by a factor of between 𝑛 2 and 𝑛 4.† That is, if we
have a problem for which there is a O (𝑛) algorithm on another model then we
may find that on a Turing machine model it is O (𝑛 3 ) , or O (𝑛 4 ) , or O (𝑛 5 ) . So it is
still in P.
A variation of Church’s thesis, the Extended Church’s Thesis, posits that not
only are all reasonable models of mechanical computation of equal power, but in
addition that they are of equivalent speed in that we can simulate any reasonable
model of computation‡ in polytime on a probabilistic Turing machine.# Under the
extended thesis, a problem that falls in the class P using Turing machines also falls
in that class using any other natural models. (Note, however, that this thesis does
not enjoy anything like the support of the original Church’s Thesis. Also, we know
of several problems, including the familiar Prime Factorization problem, that under
the Quantum Computing model have algorithms with polytime solutions, but for
which we do not know of any polytime solution in a non-quantum model. So
Quantum Computing could well provide a counterexample to the extended
thesis, if we can produce physical devices matching that model.)
4.11 Remark In recent years a number of researchers have claimed to have built devices
that achieve quantum advantage, that is, to have solved a problem, using an
algorithm running on a physical quantum device, that does not appear to be
solvable on a Turing machine or RAM machine-based device in less than centuries.
The claim is the subject of scholarly reservations. For one thing, the advantage
depends both on there being a quantum device that accomplishes the task and
also on there not being a classical algorithm that is fast. In fact soon after the
original claim was made other researchers produced an algorithm for a traditional
device that is near parity. Another thing is that on its face, this is not about
general purpose computing; the problem solved is exotic and especially suitable
to the instruments that researchers have available. There are sound reasons
to wonder whether quantum computers will ever be practical physical devices
used for everyday problems, although scientists and engineers are making great
progress. We will put this aside for being as-yet unsettled but it is worth monitoring
developments closely.
Naturalness We give the class P our attention because there are reasons to suppose
that it is the best candidate for the collection of problems that have a feasible
solution. We close this section with a discussion of those.
The first reason echoes the prior subsection. There are many models of computation, including Turing machines, RAM machines, and Racket programs. All
of them compute the same set of functions as Turing machines. Further, while
their speeds may differ, all of them run within polytime of each other.§ That makes
† We take a model to be ‘natural’ if it was not invented in order to be a counterexample to this.
‡ One definition of ‘reasonable’ is “in principle physically realizable” (Bernstein and Vazirani 1997).
# A Turing machine with access to an oracle of random bits.
§ All, that is, of the non-quantum natural models.
Section 4. P 305
(Recall that str (...) means that we represent the argument reasonably efficiently
as a bitstring.) With that recasting of functions as languages, P is closed under
function addition, scalar multiplication by an integer, subtraction, multiplication,
and composition. It is also closed under language concatenation and the Kleene
star operator. It is the smallest nontrivial class with these appealing properties.
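The closure under concatenation, for instance, has a one-line polytime argument: a string belongs to the concatenation exactly when some split point puts its prefix in one language and its suffix in the other. A minimal Python sketch, assuming the two languages come as polytime deciders (the predicate names are ours):

```python
def decide_concat(sigma, in_L0, in_L1):
    """Decide membership in the concatenation of L0 and L1, given
    polytime deciders in_L0 and in_L1 for the two languages."""
    # Try each of the |sigma|+1 split points.  Each trial is polytime
    # and there are linearly many, so the whole loop is polytime.
    return any(in_L0(sigma[:i]) and in_L1(sigma[i:])
               for i in range(len(sigma) + 1))
```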
But the main reason that P is our candidate is Cobham’s Thesis, the contention
that a problem is tractable if it has a solution algorithm that runs in polytime.
Recall the counterargument that a problem whose solution algorithms cannot
be improved below a runtime of O (𝑛¹⁰⁰⁰⁰⁰⁰) is not really tractable. We know
such problems exist because we can produce them using diagonalization. But
problems produced in that way are artificial. Empirical experience over close to
a century of computing is that problems with solution algorithms of very large
degree polynomial time complexity do not seem to arise often in practice. We see
plenty of problems with solution algorithms that are O (𝑛 lg 𝑛) , or O (𝑛 3 ) , and we
see plenty that are exponential, but we just do not see much of O (𝑛 1 000 000 ) .
Moreover, in the past when a researcher has produced an algorithm for a
problem with a runtime of even a moderately large degree then often, with
this foot in the door, over the next few years the community brings to bear an
array of mathematical and algorithmic techniques that lower the runtime degree
to a reasonable size.
Even if the objection to Cobham’s Thesis is right and P is too broad, the class
would still be useful because if we could show that a problem is not in P then we
would have shown that it has no general solution algorithm that is practical.†
So Cobham’s Thesis, to this point, has largely held up. Insofar as theory should
be a guide for practice, this is a good reason to use P as a touchstone for other
complexity classes.
V.4 Exercises
✓ 4.12 True or False: if the language is finite then the language decision problem is
in P.
† This argument has lost some of its force in recent years with the rise of SAT solvers. These attack
problems believed to not be in P and can solve instances of the problems of moderately large size,
using only moderately large computing resources. See Extra C.
306 Chapter V. Computational Complexity
✓ 4.13 Your coworker says something mistaken, “I’ve got a problem whose algorithm
is in P.” They are being a little sloppy with terms; how?
✓ 4.14 What is the difference between an order of growth and a complexity class?
✓ 4.15 Your friend says to you, “I think that the Circuit Evaluation problem takes
exponential time. There is a final vertex. It takes two inputs, which come from
two vertices, and each of those takes two inputs, etc., so that a five-deep circuit can
have thirty-two vertices.” Help them see where they are wrong.
4.16 In class, someone says to the professor, “Why aren’t all languages in P: I’ll
design a Turing machine so that no matter what the input is, it outputs 1. That
runs in polytime for sure.” Explain how this is mistaken.
4.17 True or false: if a problem has a logarithmic solution then it is in P.
4.18 True or false: if a language is decided by a machine then its complement is
also accepted by some machine.
✓ 4.19 Show that the decision problem for {𝜎 ∈ B∗ 𝜎 = 𝜏 3 for some 𝜏 ∈ B∗ } is
in P.
✓ 4.20 Show that the language of palindromes, {𝜎 ∈ B∗ 𝜎 = 𝜎 R }, is in P.
4.21 Sketch a proof that each problem is in P.
(a) The 𝜏 3 problem: given a bitstring 𝜎 , decide if it has the form 𝜎 = 𝜏 ⌢ 𝜏 ⌢ 𝜏 .
(b) The problem of deciding which Turing machines halt within ten steps.
✓ 4.22 Consider the problem of Triangle: given an undirected graph, decide if it has
a 3-clique, three vertices that are mutually connected.
(a) Why is this not the Clique problem, from page 283?
(b) Sketch a proof that this problem is in P.
✓ 4.23 Prove that each problem is in P by citing the runtime of an algorithm that
suits.
(a) Deciding the language {𝜎 ∈ { a, ... z }∗ 𝜎 is in alphabetical order } .
4.24 Find which of these are currently known to be in P and which are not.
Hint: you may need to look up the fastest known algorithm. (a) Shortest Path
(b) Knapsack (c) Euler Path (d) Hamiltonian Circuit
4.25 Is the empty language { } ⊂ B∗ a member of P?
4.26 The problem of Graph Connectedness is: given a finite graph, decide if there
is a path from any vertex to any other. Sketch an argument that this problem is
in P.
4.40 Show that this problem is unsolvable: give a Turing machine P , decide
whether it runs in polytime on the empty input. Hint: if you could solve this
problem then you could solve the Halting problem.
4.41 There are studied complexity classes besides those associated with language
decision problems. The class FP consists of the binary relations 𝑅 ⊆ N2 where
there is a Turing machine that, given input 𝑥 ∈ N, can in polytime find a 𝑦 ∈ N
where ⟨𝑥, 𝑦⟩ ∈ 𝑅 .
(a) Prove that this class is closed under function addition, multiplication by a
scalar 𝑟 ∈ N, subtraction, multiplication, and function composition.
(b) Where 𝑓 : N → N is computable, consider this decision problem associated
with the function, L 𝑓 = { str (⟨𝑛, 𝑓 (𝑛)⟩) ∈ B∗ 𝑛 ∈ N } (where the numbers
are represented in binary). Assume that we have two functions 𝑓0, 𝑓1 : N → N
such that L 𝑓0 , L 𝑓1 ∈ P. Show that the natural algorithm to check for closure
under function addition is pseudopolynomial.
4.42 Where L0, L1 ⊆ B∗ are languages, we say that L1 ≤𝑝 L0 if there is a function
𝑓 : B∗ → B∗ that is computable, total, that runs in polytime, and so that 𝜎 ∈ L1 if
and only if 𝑓 (𝜎) ∈ L0 . Prove that if L0 ∈ P and L1 ≤𝑝 L0 then L1 ∈ P.
Section
V.5 NP
Recall that a Finite State machine is nondeterministic if from a present configuration
and input it may pass to more than one next configuration, or one, or zero. We can
make a nondeterministic Turing machine by doing the same. Here is one having
two instructions starting with 𝑞 0 and 0 so if the machine is in 𝑞 0 and it reads a 0
on the tape then it is legal both to go to state 𝑞 2 and to state 𝑞 1 .
For such a machine the computational history can be more than a line, it can be a
tree. Below is part of the tree for machine P and input 00, with the middle branch
highlighted.
[Diagram: part of the computation tree for P on input 00. The root configuration is in state 𝑞 0 ; the branches pass through configurations in states 𝑞 1 , 𝑞 2 , and 𝑞 3 , with the middle branch highlighted.]
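One deterministic mental model for such a tree is breadth-first search over configurations, keeping a frontier of all branches-in-progress. Here is a toy Python sketch over an abstract transition relation (the representation of configurations and the `step` relation are ours, not the book's):

```python
from collections import deque

def accepts(start, step, accepting, max_steps):
    """Deterministically simulate a nondeterministic machine by
    breadth-first search of its computation tree.  `step` maps a
    configuration to the list of legal next configurations (possibly
    zero, one, or several); the machine accepts when any branch does."""
    frontier = deque([(start, 0)])
    while frontier:
        config, depth = frontier.popleft()
        if accepting(config):
            return True               # some branch accepts
        if depth < max_steps:
            for nxt in step(config):  # fan out on every legal move
                frontier.append((nxt, depth + 1))
    return False                      # no branch accepted within the bound
```

Note the cost: a tree with two choices per step can have 2ⁿ-many leaves, so this deterministic simulation may take exponential time, a point central to the rest of the chapter.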
[Figure: the two possibilities, P a proper subclass of NP, or P = NP.]
Consider Satisfiability. Imagine that in Example 5.4 above the demon whispers,
“Psst! Try TTF.” With that hint we can quickly verify with a deterministic machine.
For that example’s language of satisfiable expressions, here is the verifier.
[Flowchart: the verifier starts, reads 𝜎 and 𝜔 , and then either accepts or rejects.]
We start with the expression 𝜎 = 𝐸 from that example’s (∗), and also feed it the
demon’s hint 𝜔 = TTF. It accepts, certainly in polytime, which verifies that 𝐸 is
satisfiable.
5.8 Definition A verifier for a language L is a deterministic Turing machine V that
inputs ⟨𝜎, 𝜔⟩ ∈ ( B∗ ) 2 and is such that 𝜎 ∈ L if and only if there exists an 𝜔 so that
V accepts ⟨𝜎, 𝜔⟩ .† The string 𝜔 is called the witness or certificate.
5.9 Lemma A language is in NP if and only if it has a verifier that runs in time
polynomial in |𝜎 | . That is, L ∈ NP if and only if there is a polynomial 𝑝 and a
deterministic Turing machine V that halts on all inputs ⟨𝜎, 𝜔⟩ in 𝑝 (|𝜎 |) time, and
is such that 𝜎 ∈ L exactly when there is a witness 𝜔 where V accepts ⟨𝜎, 𝜔⟩ .
Before the lemma’s proof we will first discuss some aspects of both the definition
and the lemma.
5.10 Example Our touchstone is the Satisfiability problem. Using the lemma to show
that this problem is in NP requires that we produce a deterministic Turing machine
verifier. In the flowchart below the first input 𝜎 is a Boolean expression while the
second, the 𝜔 , is a string that V interprets as describing a line of 𝜎 ’s truth table. If
a candidate expression 𝜎 is satisfiable then there is a suitable witness, a line from
the truth table, so that V can check that the named line gives a result of 𝑇 . As an
example, for the expression (∗) from above, take 𝜔 = TTF. Clearly the verifier can
do the checking in polytime. On the other hand, if a candidate 𝜎 is not satisfiable,
for example with the expression 𝜎 = 𝑃 ∧ ¬𝑃 , then no 𝜔 will cause V to accept.
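In code, such a verifier is little more than an expression evaluator. A Python sketch, with a CNF expression represented as a list of clauses of signed variable numbers (a representation of our choosing, with +𝑖 for variable 𝑥𝑖 and −𝑖 for its negation, variables numbered from 1):

```python
def verify_sat(clauses, witness):
    """Verifier for Satisfiability.  The candidate is a CNF expression,
    given as a list of clauses, each a list of signed variable numbers.
    The witness names a truth-table line: witness[i] is the Boolean
    value assigned to variable i+1 (it must cover every variable)."""
    def true_literal(lit):
        value = witness[abs(lit) - 1]
        return value if lit > 0 else not value
    # One pass over the expression: linear in its length, so polytime.
    return all(any(true_literal(lit) for lit in clause)
               for clause in clauses)
```

For instance, (𝑥1 ∨ ¬𝑥2) ∧ (𝑥2 ∨ 𝑥3) is accepted with the witness T, F, T, while for 𝑥1 ∧ ¬𝑥1 no witness causes acceptance.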
Before the next example, a few comments.
The most striking thing about the definition is that it says that ‘there exists’
a witness 𝜔 but it does not say where the witness comes from. A person with a
computational mindset may well ask, “but how will we calculate the 𝜔 ’s?” The
point is not how to find them. The point is whether there exists a deterministic
Turing machine V that can leverage a given hint 𝜔 to verify in polytime that 𝜎 ∈ L.
That is, we don’t find the 𝜔 ’s, we just use them.
†
While we have given a definition of a nondeterministic Turing machine accepting its input, we have
not given one for deterministic machines. We could modify the machine definition to add accepting
states but for simplicity we take it to mean that V halts and outputs 1 or ‘Accept’.
Second, if 𝜎 ∉ L then the definition does not require a witness to that. Instead,
what’s required is that from among all possible strings 𝜔 there is none such that
the verifier accepts ⟨𝜎, 𝜔⟩ .†
The third comment relates to this asymmetry. Imagine that a demon hands
you some papers and claims that they contain an unbeatable strategy for chess.
Verifying requires stepping through the responses to each move, and responses
to the responses, etc., so there is lots of branching. To prove that the strategy is
unbeatable we appear to have to check all of the branches, not just find one good
one. It seems that a deterministic verifier must take exponential time. That would
make the demon’s papers, in a sense, useless. So this chess strategy is not like the
problems that we have been considering.
Also reflecting this asymmetry, it is not clear that L ∈ NP implies that its
complement Lc is a member of NP. Consider Satisfiability. If a propositional logic
expression 𝜎 is satisfiable then a witness to that is a pointer to a line of the truth
table. But for non-satisfiability there is no natural witness; instead, the natural
thing is to check all lines. As far as we know today, verifying that a Boolean formula
is not satisfiable takes more than polytime. Consequently, where the complexity
class co-NP contains the complements of languages from NP, we suspect that
NP ≠ co-NP.
Thus, the lemma explains something about the class NP: while P contains
problems where we can find the answer in polytime, NP contains the problems
whose answers are useful in that we can at least verify them in polytime.
Finally, the lemma requires that the runtime of the verifier is polynomial in |𝜎 | ,
not polynomial in the length of its input, ⟨𝜎, 𝜔⟩ . If it said the latter then we could
check the chess strategy just by using a witness that is exponentially long, which
consequently makes ⟨𝜎, 𝜔⟩ exponentially long. Also observe that because V runs in
time polynomial in |𝜎 | , for the verifier to accept there must exist a witness whose
length is at most polynomial in |𝜎 | , because with 𝜔 ’s that are too long the verifier
cannot even input them before its runtime bound expires.
5.11 Example The Hamiltonian Path problem is like the Hamiltonian Circuit problem
except that instead of requiring that the starting vertex equals the ending one, it
inputs two vertices. It is the problem of determining membership in this set.
L = { ⟨G, 𝑣, 𝑣ˆ⟩ path in G between 𝑣 and 𝑣ˆ visits every vertex exactly once }
We will show that this problem is in the class NP. We must produce a deterministic
Turing machine verifier V . It is sketched below. It takes as input ⟨𝜎, 𝜔⟩ , where the
candidate for membership in L is 𝜎 = ⟨G, 𝑣, 𝑣ˆ⟩ . The verifier interprets the witness
to be a path, 𝜔 = ⟨𝑣, 𝑣 1, ... 𝑣ˆ⟩ .
†
With this in mind, perhaps a better term for 𝜔 is “potential witness” or “proposed witness.” But those
are not standard terms.
[Flowchart: the verifier reads 𝜎 and 𝜔 and asks “All vertices visited once?” On Y it accepts, on N it rejects.]
If there is a Hamiltonian path then there exists a witness 𝜔 , and so there is input
that V will accept. Clearly, if given acceptable input then V runs in polytime. On
the other hand, if 𝜎 has no Hamiltonian path then for no 𝜔 will V be able to verify
that 𝜔 is such a path, and thus it will not accept any input pair starting with 𝜎 .
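In code the verifier's check is a few linear scans. A Python sketch (the graph representation, a vertex list plus an edge list, is ours):

```python
def verify_ham_path(graph, v, v_hat, witness):
    """Verifier for Hamiltonian Path.  The candidate is (graph, v, v_hat)
    with graph = (vertices, edges); the witness is a proposed path,
    given as a sequence of vertices."""
    vertices, edges = graph
    edge_set = {frozenset(e) for e in edges}
    path = list(witness)
    if not path or path[0] != v or path[-1] != v_hat:
        return False                    # wrong endpoints
    if sorted(path) != sorted(vertices):
        return False                    # not every vertex exactly once
    # Each consecutive pair must be an edge of the graph.
    return all(frozenset(path[i:i + 2]) in edge_set
               for i in range(len(path) - 1))
```

Each check is at most quadratic in the size of the input, comfortably polytime.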
5.12 Example The Compositeness problem, the complement of Primality, asks whether a
given number has a nontrivial factor.
L = {𝑛 ∈ N+ 𝑛 has a divisor 𝑎 with 1 < 𝑎 < 𝑛 }
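Here the natural witness is a nontrivial divisor itself, and the verifier just checks the division. A minimal Python sketch:

```python
def verify_factor(n, witness):
    """Verifier for the language above: the witness is a proposed
    nontrivial divisor of n."""
    a = witness
    # Comparison and division of k-bit numbers take time polynomial in k.
    return 1 < a < n and n % a == 0
```

Note the asymmetry again: when 𝑛 is composite some witness makes the verifier accept, while when 𝑛 is prime no witness does.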
accepting ⟨𝜏, 𝜔̂⟩ , so there is a way for the prior paragraph to result in acceptance
of 𝜏 , and so P̂ accepts 𝜏 . Conversely, suppose that 𝜏 ∉ L. By the definition of a
verifier, no witness 𝜔̂ will result in V̂ accepting ⟨𝜏, 𝜔̂⟩ , and thus P̂ rejects 𝜏 .
A common reaction to the second half of that proof is, “Wait, P̂ pulls 𝜔̂ out
of the air? How is that legal?” This reaction — about everyday experience versus
abstraction — is both common and reasonable so we will address it.
The first response is purely formal. Definition 5.8, as written, states that the
candidate 𝜏 is accepted if there exists an 𝜔̂ and does not require us to be able to
compute it. The proof ’s final paragraph covers the two possibilities: if 𝜏 ∈ L then
there is such an 𝜔̂ and otherwise there is not, so the definition is satisfied. True, the
language “nondeterministically produces a witness” is provocative in that it tends
to draw the objection that we are addressing, but this language is common in the
literature. (In terms of the two mental models, we can take ‘P̂ nondeterministically
produces a witness’ to mean either that it uses unbounded parallelism to produce
all possible 𝜔̂ ’s, or that it guesses 𝜔̂ or gets it from a demon.)
The second response is broader. Today we do not have physical devices
bearing the same relationship to nondeterministic Turing machines that everyday
computers bear to deterministic ones. (We can write a program to simulate
nondeterministic behavior but no device does it in hardware.) When Turing
formulated his definition there were no physical computers but they appeared soon
after; will we someday have nondeterministic devices? Putting aside as too exotic
proposals that involve things like time travel through wormholes, we don’t know
of any candidates.† But that doesn’t mean that thinking about them is a purely
academic exercise.‡ The model of nondeterministic Turing machines has proven to
be very fruitful.
As evidence of that, the problems that are associated with these machines, the
members of NP, are eminently practical, as witnessed by the fact that computer
scientists have been trying to find fast solutions to many members of this class
since computers have existed. For another, Lemma 5.9 rephrases questions about
nondeterministic machines as questions about deterministic ones, the verifiers.
We close with a reflection. In this section we have defined the class of problems
NP for which there is a good way to verify the solution, in contrast with the
problems in P for which there is a good way to find the solution. Just as computably
enumerable sets seem to be the limit of what can in theory be known, polytime
verification seems to be the limit of what can feasibly be done.
In the next section we will consider whether the two classes P and NP differ.
†
In order here is a caution about the machine types that seem likely to be coming, quantum computers.
Well-established physical theory says that subatomic particles can be in a superposition of many states
at once. Naively, it may seem that because of this essentially unbounded multi-way branching, if we
could manipulate these particles then we would have nondeterministic computation. But, that we
know of, this is false. That we know of, to get information out of a quantum system we must use
interference. (Some popularizations wrongly suggest that quantum computers can try all potential
solutions in parallel. That is, they miss the point about interference.) ‡ Not that there is anything
wrong with academic exercises.
V.5 Exercises
✓ 5.13 Your study partner asks, “In Lemma 5.9, since the witness 𝜔 is not required
to be effectively computable, why can’t I just take it to be the bit 1 if 𝜎 ∈ L, and 0
if not? Then writing the verifier is easy: just ignore 𝜎 and follow the bit.” They are
confused. Straighten them out.
5.14 Which is the negation of ‘at least one branch accepts’?
(a) Every branch accepts.
(b) At least one branch rejects.
(c) Every branch rejects.
(d) At least one branch fails to reject.
(e) None of these.
✓ 5.15 Decide if it is satisfiable.
(a) (𝑃 ∧ 𝑄) ∨ (¬𝑄 ∧ 𝑅)
(b) (𝑃 → 𝑄) ∧ ¬((𝑃 ∧ 𝑄) ∨ ¬𝑃)
5.16 True or false? If a language is in P then it is in NP.
5.17 Uh-oh. You find yourself with a nondeterministic Turing machine where
on input 𝜎 , one branch of the computation tree accepts and one rejects. Some
branches don’t halt at all. What is the upshot?
✓ 5.18 You get an exercise, Write a nondeterministic algorithm that inputs a maze
and outputs 1 if there is a path from the start to the end.
(a) You hand in an algorithm that does backtracking to find any possible solution.
Your professor sends it back, and says to try again. What was wrong?
(b) You hand in an algorithm that, each time it comes to a fork in the maze,
chooses at random which way to go. Again you get it back with a note to work
out another try. What is wrong with this one?
(c) Give a right answer.
5.19 Sketch a nondeterministic algorithm to search an unordered array of numbers
for the number 𝑘 . Describe it both in terms of unbounded parallelism and in terms
of guessing.
5.20 A simple substitution cipher encrypts text by substituting one letter for
another. Start by fixing a permutation of the letters, for example ⟨W, P, ...⟩. Then
the cipher is that any A is replaced by a W, any B is replaced by a P, etc. Sketch
three algorithms for decoding a substitution cipher (assume that you can recognize
a correctly decoded string): (a) one that is deterministic, (b) one expressed in
terms of unbounded parallelism, and (c) one expressed in terms of guessing.
✓ 5.21 Outline a nondeterministic algorithm that inputs a finite planar graph and
outputs Yes if and only if the graph has a four-coloring. Describe it both in terms
of unbounded parallelism and in terms of a demon providing a witness.
5.22 The Linear Programming problem is described on page 285. The related
problem Integer Linear Programming also seeks to maximize a linear objective
function 𝐹 (𝑥 0, ... 𝑥𝑛 ) = 𝑑 0𝑥 0 + · · · + 𝑑𝑛 𝑥𝑛 subject to linear constraints 𝑎𝑖,0𝑥 0 +
5.25 Sketch a nondeterministic algorithm that inputs a planar graph and a bound
𝐵 ∈ N and decides whether the graph is 𝐵 -colorable, described both in terms of
unbounded parallelism and also in terms of guessing.
✓ 5.26 For each problem, cast it as a language decision problem and then prove that
it is in NP by filling in the blanks in this argument.
Lemma 5.9 requires that we produce a deterministic Turing machine verifier, V . It must input
pairs of the form ⟨𝜎, 𝜔⟩ , where 𝜎 is (1) . It must have the property that if 𝜎 ∈ L then
there is an 𝜔 such that V accepts the input, while if 𝜎 ∉ L then there is no such witness 𝜔 . And
it must run in time polynomial in |𝜎 | .
The verifier interprets the bitstring witness 𝜔 as (2) , and checks that (3) . Clearly
that check can be done in polytime.
If 𝜎 ∈ L then by definition there is (4) , and so a witness 𝜔 exists that will cause V to
accept the input pair ⟨𝜎, 𝜔⟩ . If 𝜎 ∉ L then there is no such (5) , and therefore no witness 𝜔
will cause V to accept the input pair.
(a) The Double-SAT problem inputs a propositional logic statement and decides
whether it has at least two different substitutions of Boolean values that make
it true.
(b) The Subset Sum problem inputs a set of numbers 𝑆 ⊂ N and a target sum 𝑇 ∈
N, and decides whether at least one subset of 𝑆 adds to 𝑇 .
✓ 5.27 In the game show Countdown, players get a target integer 𝑇 ∈ [100 .. 999]
and six numbers from 𝑆 = { 1, 2, ... 10, 25, 50, 75, 100 } (these can be repeated).
They make an arithmetic expression that evaluates to the target, using each given
number at most once. The expression can use addition, subtraction, multiplication,
and division without remainder. Show that the decision problem for Countdown =
{ ⟨𝑠 0, ... 𝑠 5,𝑇 ⟩ ∈ 𝑆 6 × [ 100 .. 999 ] a combination of the 𝑠𝑖 gives 𝑇 } is in NP.
✓ 5.28 Recall that we recast Traveling Salesman optimization problem as a language
decision problem for a family of languages. Show that each such language is in NP
by applying Lemma 5.9, sketching a verifier that works with a suitable witness.
5.29 The problem of Independent Sets starts with a graph and a natural number 𝑛
and decides whether in the graph there are 𝑛 -many independent vertices, that is,
vertices that are not connected. State it as a language decision problem, and use
Lemma 5.9 to show that this problem is in NP.
✓ 5.30 Use Lemma 5.9 to show that the Knapsack problem is in NP.
5.31 True or false? For the language { ⟨𝑎, 𝑏, 𝑐⟩ ∈ N3 𝑎 + 𝑏 = 𝑐 }, the problem of
deciding membership is in NP.
✓ 5.32 The Longest Path problem is to input a graph and a bound, ⟨G, 𝐵⟩ , and
determine whether the graph contains a simple path of length at least 𝐵 ∈ N. (A
path is simple if no two of its vertices are equal). Show that this is in NP.
5.33 Recast each as a language decision problem and then show it is in NP.
(a) The Linear Divisibility problem inputs a pair of natural numbers 𝜎 = ⟨𝑎, 𝑏⟩ and
asks if there is an 𝑥 ∈ N with 𝑎𝑥 + 1 = 𝑏 .
(b) Given 𝑛 points scattered on a line, how far they are from each other defines
a multiset. (Recall that a multiset is like a set but element repeats don’t
collapse.) The reverse of this problem, starting with a multiset 𝑀 of numbers
and deciding whether there exist a set of points on a line whose pairwise
distances defines 𝑀 , is the Turnpike problem.
5.34 Is NP countable or uncountable?
✓ 5.35 Show that this problem is in NP. A company has two delivery trucks. They
work with a weighted graph called the ‘road map’. (Some vertex is distinguished
as the start/finish.) Each morning the company gets a set of vertices, 𝑉 . They
must decide if there are two cycles such that every vertex in 𝑉 is on at least one of
the two cycles, and each cycle has length at most 𝐵 ∈ N.
✓ 5.36 Two graphs G0, G1 are isomorphic if there is a one-to-one and onto function
𝑓 : N0 → N1 such that {𝑣, 𝑣ˆ } is an edge of G0 if and only if { 𝑓 (𝑣), 𝑓 (𝑣ˆ) } is an edge
of G1 . Consider the problem of computing whether two graphs are isomorphic.
(a) Define the appropriate language. (b) Show that the language membership
problem is in NP.
5.37 The definition of when a nondeterministic machine decides a language,
Definition 5.2, requires that every branch in the computation tree is finite. For
recognition of languages we can drop that. We say that a nondeterministic Turing
machine 𝑃 recognizes a language L when if 𝜎 ∈ L then there is at least one
branch in the computation tree that accepts 𝜎 , while if 𝜎 ∉ L then no branch
in the computation tree accepts (some branches may fail to accept because they
are infinite). Show that if there is a nondeterministic machine that recognizes a
language then there is a deterministic machine that also recognizes it.
5.38 Following the definition of Turing machine, on page 8, we gave a formal
description of how these machines act. We did the same for Finite State machines
on page 184, and for nondeterministic Finite State machines on page 193. Give a
formal description of the action of a nondeterministic Turing machine.
5.39 (a) Show that the Halting problem is not in NP. (b) What is wrong with
this reasoning? The Halting problem is in NP because given ⟨P , 𝑥⟩ , we can take as
the witness 𝜔 a number of steps for P to halt on input 𝑥 . If it halts in that number
of steps then the verifier accepts, and if not then the verifier rejects.
Section
V.6 Reductions between problems
†
Often people get the phrase ‘reduces to’ the wrong way around. Perhaps they are misled by ‘𝐵 ≤𝑇 𝐴’
into thinking that 𝐵 is the reduced-to thing but the opposite is true. For example where 𝐴 is the
Entscheidungsproblem of answering all questions in Mathematics and 𝐵 is Goldbach’s conjecture, the
right terminology is that 𝐵 reduces to 𝐴 because a solution for 𝐴 gives one for 𝐵 as a side effect.
6.1 Definition Let L0, L1 be languages, subsets of B∗. Then L1 is polynomial time
reducible to L0 , or Karp reducible, or polynomial time mapping reducible, or
polynomial time many-one reducible, written L1 ≤𝑝 L0 or L1 ≤ᵖₘ L0 , if there is
a total computable function 𝑓 : B∗ → B∗ that runs in polytime and is such that
𝜎 ∈ L1 if and only if 𝑓 (𝜎) ∈ L0 .
6.2 Figure: This is the collection of all problems, L ∈ P ( B∗ ) , with a few shown as dots.
Ones with fast algorithms are at the bottom. Problems are connected if there is a
polytime reduction from one to the other. Highlighted are connections within P.
We gave the intuition that there is a reduction of this kind when one problem
is a translation of the other, or at least a translation of a special case. The first
example illustrates.
6.3 Example The Shortest Path problem inputs a weighted graph, two vertices, and a
bound, and decides if there is a path between the vertices of length less than the
bound.
L0 = { ⟨G, 𝑣 0, 𝑣 1, 𝐵⟩ there is a path from 𝑣 0 to 𝑣 1 of length less than 𝐵 }
The Vertex-to-Vertex Path problem inputs an unweighted graph and two vertices,
and decides if there is a path between the two.
L1 = { ⟨H, 𝑤 0, 𝑤 1 ⟩ there is a path between 𝑤 0 and 𝑤 1 }
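One natural reduction function gives every edge weight 1 and picks a bound that any simple path satisfies. A Python sketch (the instance representation, an edge list plus the two vertices, is ours):

```python
def reduce_v2v_to_shortest_path(instance):
    """Polytime reduction from Vertex-to-Vertex Path to Shortest Path:
    weight every edge 1 and use a bound that no simple path can reach.
    An instance is (edges, w0, w1), with edges a list of vertex pairs."""
    edges, w0, w1 = instance
    vertices = {w0, w1}
    for u, v in edges:
        vertices |= {u, v}
    weighted = [(u, v, 1) for u, v in edges]   # unit weights
    bound = len(vertices)    # any simple path has length < |V|
    return (weighted, w0, w1, bound)
```

There is a path between 𝑤 0 and 𝑤 1 in H if and only if the unit-weighted graph has a path between them of length less than the bound, so 𝜎 ∈ L1 if and only if 𝑓 (𝜎) ∈ L0 , and the function clearly runs in polytime.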
6.4 Remark Authors describing a reduction will often omit this kind of development
from the write-up. It is perfectly standard to expect the reader to work out the
motivation for the reduction function’s definition.
6.5 Example The Clique problem is the decision problem for the language L𝐵 =
{ ⟨G, 𝐵⟩ G has a clique with 𝐵 vertices }. We will sketch that Satisfiability ≤𝑝
Clique, so that intuitively Clique is at least as hard as Satisfiability.
Consider how to satisfy this Boolean expression.
𝐸 = (𝑥 0 ∨ ¬𝑥 1 ∨ 𝑥 2 ) ∧ (¬𝑥 0 ∨ 𝑥 2 ∨ ¬𝑥 3 ) ∧ (¬𝑥 1 ∨ ¬𝑥 2 )
The ∧’s make the statement as a whole 𝑇 if and only if all of its clauses are 𝑇. The
∨’s mean that each clause is 𝑇 if and only if any of its literals is 𝑇. So to satisfy the
expression, select a literal from each clause and assign it the value 𝑇. For example,
we can make 𝐸 be 𝑇 by selecting 𝑥 0 from the first clause, 𝑥 2 from the second, and
¬𝑥 1 from the third, and making them 𝑇. Similarly, if ¬𝑥 1 from the first and third
clauses and 𝑥 3 from the second are 𝑇 then 𝐸 is 𝑇. What we cannot do is pick 𝑥 2
from the first and second and then ¬𝑥 2 from the third, because we cannot set both
of these literals to be 𝑇.
That is, we can think of Satisfiability as a combinatorial problem. The clauses
are like buckets and we select one thing from each bucket, subject to the constraint
that the things we select must be pairwise compatible.
This view of Satisfiability has a binary relation ‘can be compatibly picked’
between the literals. So, as below, let G𝐸 be a graph whose vertices are pairs ⟨𝑐, ℓ⟩
where 𝑐 is the number of a clause and ℓ is a literal in that clause. Two vertices
𝑣 0 = ⟨𝑐 0, ℓ0 ⟩ and 𝑣 1 = ⟨𝑐 1, ℓ1 ⟩ are connected by an edge if they come from different
clauses so 𝑐 0 ≠ 𝑐 1 , and if the literals are not negations of each other so ℓ0 ≠ ¬ℓ1 .
[Figure: the compatibility graph G𝐸 , with vertices ⟨0, 𝑥 0 ⟩, ⟨0, ¬𝑥 1 ⟩, ⟨0, 𝑥 2 ⟩, ⟨1, ¬𝑥 0 ⟩, ⟨1, 𝑥 2 ⟩, ⟨1, ¬𝑥 3 ⟩, ⟨2, ¬𝑥 1 ⟩, and ⟨2, ¬𝑥 2 ⟩, joined by edges between compatible pairs.]
A choice of three mutually compatible vertices makes 𝐸 evaluate to 𝑇 . That is, the
3-clause expression 𝐸 is satisfiable if and only if G𝐸 has a 3-clique.
More formally, the reduction function 𝑓 inputs a propositional logic expression 𝐸
and outputs a pair 𝑓 (𝐸) = ⟨G𝐸 , 𝐵⟩ where G𝐸 is the compatibility graph associated
with 𝐸 defined in the prior paragraph and where 𝐵 is the number of clauses in 𝐸 .
Then 𝐸 ∈ SAT if and only if 𝑓 (𝐸) ∈ L𝐵 . Clearly this function can be computed in
polytime.
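The construction mechanizes directly. A Python sketch, again with a clause-list representation of our choosing (+𝑖 for 𝑥𝑖 and −𝑖 for its negation):

```python
def sat_to_clique(clauses):
    """Karp reduction from Satisfiability to Clique: build the
    compatibility graph and set the bound B to the number of clauses.
    Vertices are (clause number, literal) pairs."""
    vertices = [(c, lit) for c, clause in enumerate(clauses)
                         for lit in clause]
    edges = [(v0, v1)
             for i, v0 in enumerate(vertices)
             for v1 in vertices[i + 1:]
             if v0[0] != v1[0]          # from different clauses
             and v0[1] != -v1[1]]       # not negations of each other
    return (vertices, edges), len(clauses)
```

With only quadratically many vertex pairs to examine, the function clearly runs in polytime.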
6.7 Example Recall that a graph is 𝑘 -colorable if we can partition the vertices into
𝑘 -many classes, called ‘colors’, so that two vertices can have the same color only
if there is no edge between them.
We will illustrate that the Graph Colorability problem reduces to the Satisfiability
problem, Graph Colorability ≤𝑝 Satisfiability, by focusing on the 𝑘 = 3 construction.
(Larger 𝑘 ’s work much the same way, although the 𝑘 = 2 case is different.)
Denote the set of satisfiable propositional logic statements as L0 and the set
of 3-colorable graphs as L1 . To show that L1 ≤𝑝 L0 we must produce a reduction
function 𝑓 . It inputs a graph G and outputs a propositional logic expression
𝐸 = 𝑓 ( G ) such that the graph is 3-colorable if and only if the expression is satisfiable.
The function builds 𝐸 by including clauses that state, in the language of
propositional logic, the constraints to be met for the graph to be 3-colorable.
Let G have 𝑛 -many vertices {𝑣 0, ... 𝑣𝑛− 1 }. Then 𝐸 has 3𝑛 -many Boolean variables,
𝑎 0, ... 𝑎𝑛−1 , and 𝑏 0, ... 𝑏𝑛−1 , and 𝑐 0, ... 𝑐𝑛−1 . The idea is that if the 𝑖 -th vertex 𝑣𝑖
gets the first color then 𝐸 will be satisfied when the associated variables have
𝑎𝑖 = 𝑇 , 𝑏𝑖 = 𝐹, 𝑐𝑖 = 𝐹 , while if 𝑣𝑖 gets the second color then 𝐸 will be satisfied when
𝑎𝑖 = 𝐹, 𝑏𝑖 = 𝑇 , 𝑐𝑖 = 𝐹 , and if 𝑣𝑖 gets the third color then 𝐸 will be satisfied when
𝑎𝑖 = 𝐹, 𝑏𝑖 = 𝐹, 𝑐𝑖 = 𝑇 .
Specifically, the expression includes two kinds of clauses. For every vertex 𝑣𝑖 ,
there is a clause saying that the vertex gets at least one color.
(𝑎𝑖 ∨ 𝑏𝑖 ∨ 𝑐𝑖 )
And for each edge {𝑣𝑖 , 𝑣 𝑗 }, there are three clauses which together ensure that the
edge does not connect two same-color vertices.
(¬𝑎𝑖 ∨ ¬𝑎 𝑗 ) (¬𝑏𝑖 ∨ ¬𝑏 𝑗 ) (¬𝑐𝑖 ∨ ¬𝑐 𝑗 )
The function’s output 𝐸 is the conjunction of all of these clauses.
This illustrates. The expression’s top line has the clauses of the first kind while
the remaining lines have the other kind.
[Figure: a graph with vertices 𝑣 0 , 𝑣 1 , 𝑣 2 , 𝑣 3 and edges {𝑣 0, 𝑣 1 }, {𝑣 0, 𝑣 3 }, {𝑣 1, 𝑣 2 }, {𝑣 1, 𝑣 3 }.]
(𝑎 0 ∨ 𝑏 0 ∨ 𝑐 0 ) ∧ (𝑎 1 ∨ 𝑏 1 ∨ 𝑐 1 ) ∧ (𝑎 2 ∨ 𝑏 2 ∨ 𝑐 2 ) ∧ (𝑎 3 ∨ 𝑏 3 ∨ 𝑐 3 )
∧ (¬𝑎 0 ∨ ¬𝑎 1 ) ∧ (¬𝑏 0 ∨ ¬𝑏 1 ) ∧ (¬𝑐 0 ∨ ¬𝑐 1 )
∧ (¬𝑎 0 ∨ ¬𝑎 3 ) ∧ (¬𝑏 0 ∨ ¬𝑏 3 ) ∧ (¬𝑐 0 ∨ ¬𝑐 3 )
∧ (¬𝑎 1 ∨ ¬𝑎 2 ) ∧ (¬𝑏 1 ∨ ¬𝑏 2 ) ∧ (¬𝑐 1 ∨ ¬𝑐 2 )
∧ (¬𝑎 1 ∨ ¬𝑎 3 ) ∧ (¬𝑏 1 ∨ ¬𝑏 3 ) ∧ (¬𝑐 1 ∨ ¬𝑐 3 )
Completing the argument requires checking that the reduction function, which
inputs a bitstring representation of the graph and outputs a bitstring representation
of the expression, is polynomial. That’s clear so we omit the details.
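To make the construction concrete, here is a sketch of the reduction function in Python. The encoding is our own choice for illustration: variable 𝑎𝑖 becomes the integer 3𝑖 + 1, 𝑏𝑖 becomes 3𝑖 + 2, 𝑐𝑖 becomes 3𝑖 + 3, and a negative integer stands for a negated variable.

```python
def three_color_to_sat(n, edges):
    """Reduction function f of Example 6.7. The graph is given as a vertex
    count n and a list of edges (i, j); the output is a list of CNF clauses,
    each clause a list of nonzero integers (negative means negated)."""
    def a(i): return 3 * i + 1   # "vertex i gets the first color"
    def b(i): return 3 * i + 2   # "vertex i gets the second color"
    def c(i): return 3 * i + 3   # "vertex i gets the third color"
    # Each vertex gets at least one color.
    clauses = [[a(i), b(i), c(i)] for i in range(n)]
    # No edge may join two vertices of the same color.
    for i, j in edges:
        clauses += [[-a(i), -a(j)], [-b(i), -b(j)], [-c(i), -c(j)]]
    return clauses

# The illustration's graph: vertices v0..v3 with edges v0v1, v0v3, v1v2, v1v3.
clauses = three_color_to_sat(4, [(0, 1), (0, 3), (1, 2), (1, 3)])
```

With 𝑛 = 4 vertices and 4 edges this produces the 4 + 3 · 4 = 16 clauses displayed above, so the output size is linear in the size of the graph.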
A reduction function is a kind of compiler. A programming language compiler
inputs descriptions from one domain, such as a Racket program, and outputs a
corresponding statement from another domain, such as an executable in the machine’s
native format. Similarly, the function 𝑓 above translates problem instances in the
domain of graphs to those in the domain of propositional logic.
6.9 Example We will show that Subset Sum reduces to Knapsack, that Subset Sum ≤𝑝
Knapsack. The Knapsack problem starts with a multiset of objects 𝑈 = {𝑢 0, ... 𝑢𝑛− 1 }
whose elements each have a weight 𝑤 (𝑢𝑖 ) and a value 𝑣 (𝑢𝑖 ) , along with an upper
bound on the weights 𝑊 ∈ N and a lower bound for the values 𝑉 ∈ N. It asks for a
subset 𝐴 ⊆ 𝑈 such that the weight total does not exceed 𝑊 while the value total is
at least as big as 𝑉 .
Denote the Subset Sum language as L1 and the Knapsack language as L0 . The
reduction function 𝑓 must input pairs ⟨𝑆,𝑇 ⟩ and output five-tuples
⟨𝑈 , 𝑤, 𝑣,𝑊 , 𝑉 ⟩ , and must be such that ⟨𝑆,𝑇 ⟩ ∈ L1 if and only if 𝑓 (⟨𝑆,𝑇 ⟩) ∈ L0 .
And it must be polytime.
A numerical example gives the idea of how 𝑓 proceeds. Imagine that we want
to know if there is a subset of 𝑆 = { 18, 23, 31, 33, 72, 86, 94 } that adds to 𝑇 = 126.
If we had access to an oracle for Knapsack then we could set 𝑈 = 𝑆 , let 𝑤 and 𝑣 be
the identity functions so that 𝑤 ( 18) = 𝑣 ( 18) = 18 and 𝑤 ( 23) = 𝑣 ( 23) = 23, etc.,
and then fix the weight and value targets as 𝑊 = 𝑉 = 𝑇 = 126. Then ⟨𝑆,𝑇 ⟩ ∈ L1
iff ⟨𝑆, 𝑤, 𝑣,𝑊 , 𝑉 ⟩ ∈ L0 . In this way, we think of the Subset Sum problem as a
special case of Knapsack.
More generally, let 𝑓 take the input ⟨𝑆,𝑇 ⟩ to the output ⟨𝑆, 𝑤, 𝑣,𝑇 ,𝑇 ⟩ , where
the functions 𝑤 and 𝑣 are given by 𝑤 (𝑠𝑖 ) = 𝑣 (𝑠𝑖 ) = 𝑠𝑖 . Then ⟨𝑆,𝑇 ⟩ ∈ L1 if and
only if 𝑓 (⟨𝑆,𝑇 ⟩) ∈ L0 . Clearly 𝑓 can be computed in polytime.
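A sketch of this reduction in Python (the data representation, a list plus two dictionaries, is ours; for simplicity it assumes the elements of 𝑆 are distinct):

```python
def subset_sum_to_knapsack(S, T):
    """Reduction f of Example 6.9: output the five-tuple <U, w, v, W, V>
    with identity weight and value functions and both bounds set to T."""
    w = {s: s for s in S}   # weight function, w(s) = s
    v = {s: s for s in S}   # value function, v(s) = s
    return (S, w, v, T, T)
```

For instance, `subset_sum_to_knapsack([18, 23, 31], 72)` asks the Knapsack oracle for a subset of weight at most 72 and value at least 72; since weights equal values, the only way to meet both criteria is a subset summing to exactly 72.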
We close with some basic facts about polytime reduction.
6.10 Lemma Polytime reduction is reflexive: L ≤𝑝 L for all languages. It is also
transitive: L2 ≤𝑝 L1 and L1 ≤𝑝 L0 imply that L2 ≤𝑝 L0 . Every nontrivial
computable language is 𝑃 hard: for L1 ∈ P, every language L0 with L0 ≠ ∅ and
L0 ≠ N satisfies that L1 ≤𝑝 L0 . The class P is closed downward: if L0 ∈ P and
L1 ≤𝑝 L0 then L1 ∈ P. So also is the class NP.
Proof The first two sentences and the final sentence are Exercise 6.36.
For the third sentence fix a L0 that is nontrivial, so there is a member 𝜎 ∈ L0
and a nonmember 𝜏 ∉ L0 . Let L1 be an element of P. We will specify a polytime
reduction function 𝑓 : on input 𝑥 , decide in polynomial time whether 𝑥 ∈ L1 ,
and output 𝜎 if so and 𝜏 if not. Then 𝑥 ∈ L1 if and only if 𝑓 (𝑥) ∈ L0 , so
L1 ≤𝑝 L0 .
324 Chapter V. Computational Complexity
V.6 Exercises
6.12 Suppose that L1 ≤𝑝 L0 . Which is the right way to use the phrase ‘reduces
to’: “L1 reduces to L0 ,” or “L0 reduces to L1 ?”
✓ 6.13 Show that if L0 ∉ P and L0 ≤𝑝 L1 then L1 ∉ P also. What about NP?
6.14 Your friend is confused. “Lemma 6.10 says that every language in P is ≤𝑝
every other nontrivial language. But there are uncountably many languages and
only countably many 𝑓 ’s because they each come from some Turing machine. So
I’m not seeing how there are enough reduction functions for a given language to
be related to all others.” Un-confuse them.
6.15 Must a set be polytime reducible to its complement?
(a) Show that N is not polytime reducible to the empty set.
(b) Further, show that if 𝐴 ≤𝑝 𝐵 and 𝐵 is computably enumerable then 𝐴 is
computably enumerable. Conclude that 𝐾 c ≰𝑝 𝐾 .
6.16 Prove that L ≤𝑝 Lc if and only if Lc ≤𝑝 L.
6.17 Example 6.9 includes as illustration a Subset Sum problem, where 𝑆 =
{ 18, 23, 31, 33, 72, 86, 94 } and 𝑇 = 126. Solve it.
6.18 Produce the compatibility graph for (𝑥 0 ∨ 𝑥 1 ) ∧ (¬𝑥 0 ∨ ¬𝑥 1 ) ∧ (𝑥 0 ∨ ¬𝑥 1 ) .
6.19 Following the method of Example 6.7 give the expression associated with
the question of whether this graph is 3-colorable. Is that expression satisfiable?
[Graph with vertices 𝑣0,0 , 𝑣0,1 , 𝑣1,0 , 𝑣1,1 , 𝑣2,0 , 𝑣2,1 ]
✓ 6.20 Suppose that the language 𝐴 is polynomial time reducible to the language 𝐵 ,
𝐴 ≤𝑝 𝐵 . Which of these are true?
(a) A tractable way to decide 𝐴 can be used to tractably decide 𝐵 .
(b) If 𝐴 is tractably decidable then 𝐵 is tractably decidable also.
(c) If 𝐴 is not tractably decidable then 𝐵 is not tractably decidable either.
✓ 6.21 The Substring problem inputs two strings and decides if the second is a
substring of the first. The Cyclic Shift problem inputs two strings and decides
if the second is a cyclic shift of the first. (Where 𝛼 = abcde, one cyclic shift is
𝛽 = deabc. More precisely, if 𝛼 = 𝑎 0𝑎 1 ... 𝑎𝑛−1 and 𝛽 = 𝑏 0𝑏 1 ... 𝑏𝑛−1 are length 𝑛
strings, then 𝛽 is a cyclic shift of 𝛼 when there is an index 𝑘 ∈ { 0, ... 𝑛 − 1 } such
that 𝑎𝑖 = 𝑏 (𝑘+𝑖 ) mod 𝑛 for all 𝑖 < 𝑛 .)
(a) Name three cyclic shifts of 𝛼 = 0110010.
(b) Decide whether 𝛽 = 101001101 is a cyclic shift of 𝛼 = 001101101.
(c) State the Substring problem as a language decision problem.
(d) Also state the Cyclic Shift problem as a language decision problem.
(e) Show that Cyclic Shift ≤𝑝 Substring. Hint: for same length strings, 𝛽 is a cyclic
shift of 𝛼 if and only if 𝛽 is a substring of 𝛼 ⌢ 𝛼 .
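The hint translates directly into code; here is a sketch in Python, where the built-in substring test plays the role of an oracle for the Substring problem:

```python
def is_cyclic_shift(alpha, beta):
    """Decide Cyclic Shift via the exercise's hint: two strings of the
    same length are cyclic shifts of one another exactly when the second
    occurs as a substring of the first concatenated with itself."""
    return len(alpha) == len(beta) and beta in alpha + alpha

# The text's example: deabc is a cyclic shift of abcde.
print(is_cyclic_shift("abcde", "deabc"))   # True
```

Since `alpha + alpha` has length 2𝑛 and substring search is polynomial, the whole reduction is polytime.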
✓ 6.22 The Independent Set problem inputs a graph and a bound, and decides if
there is a set of vertices, of size at least equal to the bound, that are not connected
by any edge. The Vertex Cover problem inputs a graph and a bound and decides if
there is a vertex set, of size less than or equal to the bound, such that every edge
contains at least one vertex in the set.
(a) State each as a language decision problem.
(b) Consider this graph. Find a vertex cover with four elements.
[Graph with vertices 𝑣0 , ... , 𝑣9 ]
6.23 The Vertex Cover problem inputs a graph and a bound and decides if there is
a vertex set, of size at most the bound, such that every edge contains at least one
vertex in the set. The Set Cover problem inputs a set 𝑆 , a collection of
subsets 𝑆 0 ⊆ 𝑆 , . . . 𝑆𝑛 ⊆ 𝑆 , and a bound, and decides if there is a subcollection of
the 𝑆 𝑗 , with a number of sets at most equal to the bound, whose union is 𝑆 .
(a) State each as a language decision problem.
(b) Find a vertex cover for this graph.
[Graph with vertices 𝑞0 , ... , 𝑞9 and edges labeled 𝑎 through 𝑛 ]
(c) Make a set 𝑆 consisting of all of that graph’s edges, and for each 𝑣 make a
subset 𝑆 𝑣 of the edges incident on that vertex. Find a set cover.
(d) Show that Vertex Cover ≤𝑝 Set Cover.
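Parts (c) and (d) suggest a reduction function, sketched here in Python (the representation of a graph as a vertex list and a list of edge tuples is our own choice):

```python
def vertex_cover_to_set_cover(vertices, edges, bound):
    """Sketch of the reduction suggested by parts (c) and (d): the ground
    set S is the edge set, and each vertex v contributes the subset S_v of
    edges incident on v. A vertex cover of size k then corresponds to a
    set cover using k of the subsets, and vice versa."""
    S = set(edges)
    subsets = {v: {e for e in edges if v in e} for v in vertices}
    return S, subsets, bound
```

Building the subsets examines every vertex-edge pair, so the function is clearly polytime.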
✓ 6.24 Show that Hamiltonian Circuit ≤𝑝 Traveling Salesman. (a) State each as a
language decision problem. (b) Produce the reduction function.
✓ 6.25 In this network, each edge is labeled with a capacity. (Imagine railroad
lines going from 𝑞 0 to 𝑞 6 .)
[Network with source 𝑞0 , sink 𝑞6 , internal vertices 𝑞1 , ... , 𝑞5 , and edges labeled with capacities]
The Max-Flow problem is to find the maximum total amount that can flow across
the network, usually by using many paths at once. That is, we will find a flow 𝐹𝑞𝑖 ,𝑞 𝑗
for each edge, subject to the constraints that the flow through an edge must not
exceed its capacity and that the flow into a vertex must equal the flow out (except
for the source 𝑞 0 and the sink 𝑞 6 ). The Linear Programming problem is described
on page 285.
(a) Express each as a language decision problem, remembering the technique of
converting optimization problems using bounds.
(b) By eye, find the maximum flow for the above network.
(c) For each edge 𝑣 𝑖 𝑣 𝑗 , define a variable 𝑥𝑖,𝑗 . Describe the constraints on that
variable imposed by the edge’s capacity. Also describe the constraints on the
set of variables imposed by the limitation that for many vertices the flow in
must equal the flow out. Finally, use the variables to give an expression to
optimize in order to get maximum flow.
(d) Show that Max-Flow ≤𝑝 Linear Programming.
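The translation of part (c) can be sketched in Python. This builds the data of the linear program rather than solving it; the representation of the network as a dictionary from directed edges to capacities is ours, and the usage instance below is hypothetical since it need not match the exercise’s figure.

```python
def max_flow_to_lp(capacity, source, sink):
    """Sketch of Max-Flow as a linear program. `capacity` maps each
    directed edge (u, v) to its capacity. Returns: the edge variables
    whose sum is to be maximized (flow into the sink), the per-edge
    capacity bounds 0 <= x_e <= cap(e), and for each internal vertex the
    pair of edge lists whose total flows must be equal."""
    edges = list(capacity)
    objective = [e for e in edges if e[1] == sink]      # maximize their sum
    bounds = [(e, capacity[e]) for e in edges]          # capacity constraints
    internal = {u for e in edges for u in e} - {source, sink}
    conservation = [([e for e in edges if e[1] == n],   # flow in equals ...
                     [e for e in edges if e[0] == n])   # ... flow out
                    for n in internal]
    return objective, bounds, conservation

# A hypothetical four-vertex network from s to t.
net = {("s", "a"): 3, ("s", "b"): 2, ("a", "b"): 1, ("a", "t"): 2, ("b", "t"): 3}
obj, bnds, cons = max_flow_to_lp(net, "s", "t")
```

All three pieces are lists whose sizes are linear or quadratic in the network, so the translation is polytime, which is the heart of part (d).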
6.26 The Max-Flow problem inputs a directed graph where each edge is labeled
with a capacity, and the task is to find the maximum amount that can flow from the
source node to the sink node (for more, see Exercise 6.25). The Drummer problem
starts with two same-sized sets, the rock bands, 𝐵 , and potential drummers, 𝐷 .
Each band 𝑏 ∈ 𝐵 has a set 𝑆𝑏 ⊆ 𝐷 of drummers that they would agree to take on.
The goal is to make the most number of matches.
(a) Consider four bands 𝐵 = {𝑏 0, 𝑏 1, 𝑏 2, 𝑏 3 } and drummers 𝐷 = {𝑑 0, 𝑑 1, 𝑑 2, 𝑑 3 } .
Band 𝑏 0 likes drummers 𝑑 0 and 𝑑 2 . Band 𝑏 1 likes only drummer 𝑑 1 , and 𝑏 2
also likes only 𝑑 1 . Band 𝑏 3 likes the sound of both 𝑑 2 and 𝑑 3 . What is the
largest number of matches?
(b) Express each as a language decision problem.
(c) Draw a graph with the bands on the left and the drummers on the right. Make
an arrow from a band to a drummer if there is a connection. Now add a source
and a sink node to make a flow diagram.
(d) Show that Drummer ≤𝑝 Max-Flow.
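The flow diagram of part (c) can be built mechanically; here is a sketch in Python (the node names "source" and "sink" and the dictionary representation are ours):

```python
def drummer_to_flow(bands, likes):
    """Sketch of parts (c) and (d): a flow network in which every edge
    has capacity 1. An integral maximum flow of size k corresponds to a
    set of k band-drummer matches. `likes` maps each band to the set of
    drummers that band would accept."""
    capacity = {}
    drummers = {d for wanted in likes.values() for d in wanted}
    for b in bands:
        capacity[("source", b)] = 1    # each band takes at most one drummer
        for d in likes[b]:
            capacity[(b, d)] = 1       # an acceptable pairing
    for d in drummers:
        capacity[(d, "sink")] = 1      # each drummer joins at most one band
    return capacity
```

The unit capacities at the source and sink are what enforce that no band or drummer is matched twice.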
6.27 The 3-SAT problem is to decide the satisfiability of CNF propositional logic
expressions where every clause has at most three literals. The Strict 3-Satisfiability
problem requires that each clause has exactly three unequal literals. We will show
that the two are inter-reducible.
(a) Show the easy half, that Strict 3-Satisfiability ≤𝑝 3-SAT.
(b) Also show that we can go from clauses with two literals to clauses with
three by introducing an irrelevant variable: 𝑃 ∨ 𝑄 is equivalent to (𝑃 ∨
𝑄 ∨ 𝑅) ∧ (𝑃 ∨ 𝑄 ∨ ¬𝑅) . Along the same lines, show that 𝑃 is equivalent to
(𝑃 ∨ 𝑄 ∨ 𝑅) ∧ (𝑃 ∨ ¬𝑄 ∨ 𝑅) ∧ (𝑃 ∨ 𝑄 ∨ ¬𝑅) ∧ (𝑃 ∨ ¬𝑄 ∨ ¬𝑅) .
(c) Show 3-SAT ≤𝑝 Strict 3-Satisfiability.
6.28 We will show that the 3-SAT problem, 3-SAT, is inter-reducible with SAT.
(We will assume that instances of SAT are in Conjunctive Normal form.)
(a) Show the easy half, that 3-SAT ≤𝑝 SAT.
(b) As a preliminary for the other reduction, show that the propositional logic
implication 𝑃 → 𝑄 is equivalent to ¬𝑃 ∨ 𝑄 .
(c) To go from clauses with four literals to those with three, start with 𝑃 ∨𝑄 ∨𝑅 ∨𝑆 .
Introduce a variable 𝐴 such that 𝐴 ↔ (𝑃 ∨𝑄) , that is, (𝐴 → (𝑃 ∨𝑄)) ∧ (𝐴 ←
(𝑃 ∨𝑄)) . Show that (𝐴 → (𝑃 ∨𝑄)) is equivalent to (𝑃 ∨𝑄 ∨ ¬𝐴) . Also verify
that (𝑃 ∨𝑄) → 𝐴 is equivalent to (𝑃 ∨¬𝑄 ∨𝐴) ∧ (¬𝑃 ∨𝑄 ∨𝐴) ∧ (¬𝑃 ∨¬𝑄 ∨𝐴) .
Conclude that 𝑃 ∨ 𝑄 ∨ 𝑅 ∨ 𝑆 is equivalent to (𝐴 ∨ 𝑅 ∨ 𝑆) ∧ (𝑃 ∨ 𝑄 ∨ ¬𝐴) ∧
(𝑃 ∨ ¬𝑄 ∨ 𝐴) ∧ (¬𝑃 ∨ 𝑄 ∨ 𝐴) ∧ (¬𝑃 ∨ ¬𝑄 ∨ 𝐴) .
(d) For a five literal clause 𝑃 ∨ 𝑄 ∨ 𝑅 ∨ 𝑆 ∨ 𝑋 , find an equivalent propositional
logic expression made of clauses having only three literals each.
(e) Show that SAT ≤𝑝 3-SAT.
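The equivalence claimed in part (c) is small enough to check by brute force. A sketch in Python, where the Boolean parameter names match the text’s 𝑃, 𝑄, 𝑅, 𝑆, and 𝐴:

```python
from itertools import product

def original(P, Q, R, S):
    # The four-literal clause P ∨ Q ∨ R ∨ S.
    return P or Q or R or S

def three_literal_version(P, Q, R, S, A):
    # The five clauses from part (c), with A introduced to stand for P ∨ Q.
    return ((A or R or S) and (P or Q or not A) and (P or not Q or A)
            and (not P or Q or A) and (not P or not Q or A))

# For every assignment to P, Q, R, S, the four-literal clause is true exactly
# when some value of A makes the three-literal clauses all true.
for P, Q, R, S in product([False, True], repeat=4):
    assert original(P, Q, R, S) == any(
        three_literal_version(P, Q, R, S, A) for A in (False, True))
```

This checks equisatisfiability rather than logical equivalence: the two expressions are over different variable sets, but each satisfying assignment of one extends or restricts to one of the other.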
6.29 The Independent Set problem inputs a graph and a bound, and decides if
there is a set of vertices, of size at least equal to the bound, that are not connected
by any edge.
(a) In this graph, find an independent set with at least 𝐵 = 3 members.
[Graph with vertices 𝑞0 , ... , 𝑞5 ]
An example of such a problem is when nodes are cities and edges are flights, with
the weight of an edge being the flight’s cost.
(a) Show that Traveling Salesman ≤𝑝 Asymmetric Traveling Salesman.
[Figure: tasks 𝑡0 , 𝑡1 , 𝑡2 , 𝑡3 ]
but make it a directed graph where between each worker and task there is an
edge in each direction. Use the given assignment cost table to make appropriate
edge weights. Finish by verifying that there is a polytime computable function 𝑔
that associates optimal assignments with optimal circuits.
6.34 We will show that Fin ≤𝑝 Reg, where they are the decision problems for the
language 𝑅 = {𝑥 ∈ N | the language decided by P𝑥 is regular } and also for the
language 𝐹 = {𝑖 ∈ N | the language decided by P𝑖 is finite } (this means that P𝑖
halts on all inputs and acts as the characteristic function of a set that is finite).
(a) Adapt Example 5.2 from Chapter Four to show that any infinite subset of
{ a𝑛 b𝑛 | 𝑛 ∈ N } is not regular.
(b) Argue that there is a Turing machine with the behavior below. Then apply the
s-m-n lemma to parameterize 𝑥 .
Start
Read 𝜎 , 𝑥
If 𝜎 does not match a𝑛 b𝑛 then print 0
Otherwise, if P𝑥 accepts a 𝜏 of length 𝑛 then print 1, else print 0
End
6.37 When L𝑖 ≤𝑝 L 𝑗 , does that mean that the best algorithm to decide L𝑖 takes
time that is less than or equal to the amount taken by the best algorithm for L 𝑗 ?
Fix a language decision problem L0 whose fastest algorithm is O (𝑛³) , an L1 whose
best algorithm is O (𝑛²) , a L2 whose best is O (2ⁿ) , and a L3 whose best is O ( lg 𝑛) .
Section
V.7 NP completeness
Because P ⊆ NP, the class NP contains lots of easy problems, ones with a fast
algorithm. But the interest in the class is that it also contains lots of problems that
seem to be hard. Can we prove that these problems are indeed hard?
This question was raised by S Cook in 1971. He noted
that the idea of polynomial time reducibility gives us a way
to make precise that an efficient solution for one problem
implies an efficient solution for the other. He then showed
that among the problems in NP, there are ones that are
maximally hard. (This was also shown by L Levin but he
was behind the Iron Curtain and knowledge of his work
did not spread to the rest of the world for some time.)
[Photos: Stephen Cook, b 1939, and Leonid Levin, b 1948]
Here, ‘maximally hard’ means that these are NP problems and they are at least as
hard as any NP problem, in that if we could solve one of these then we could solve
any NP problem at all.
7.1 Theorem (Cook-Levin theorem) The Satisfiability problem is in NP and has the
property that any problem in NP reduces to it: L ≤𝑝 SAT for any L ∈ NP.
First, we have already observed that SAT ∈ NP because, given a Boolean
expression, we can use as a witness 𝜔 an assignment of truth values that satisfies
the expression.
Here is an outline of the proof ’s other half. Given L ∈ NP, we must show that
L ≤𝑝 SAT. We produce a function 𝑓L that translates membership questions for L
into Boolean expressions, such that the membership answer is ‘yes’ if and only if
the expression is satisfiable. What we know about L is that its member 𝜎 ’s are
accepted by a nondeterministic machine P in time given by a polynomial 𝑞 . With
that, from ⟨P , 𝜎, 𝑞⟩ the proof constructs a Boolean expression that yields 𝑇 if and
only if P accepts 𝜎 . The Boolean expression encodes the constraints under which
a Turing machine operates, such as that the only tape symbol that can be changed
in the current step is the symbol under the machine’s head.
7.2 Definition A problem is NP hard if every problem in NP reduces to it, that is,
L is NP hard if L̂ ∈ NP implies that L̂ ≤𝑝 L. A problem is NP complete if, in
addition to being NP hard, it is also a member of NP.†
So a problem is NP complete if it is, in a sense, at least as hard as any member
of NP. The sketch below illustrates.
[Diagram: regions labeled NP hard, NP complete, P, and NP]
7.3 Figure: The blob contains all problems. In the bottom is NP, drawn with P as a
proper subset. The top has the NP-hard problems. The highlighted intersection is
the set of NP complete problems.
The Cook-Levin Theorem says that there is at least one NP complete problem,
namely SAT. In fact, we shall see that there are many such problems.
The NP complete problems are to the class NP as the problems Turing-equivalent
to the Halting problem set 𝐾 are to the computably enumerable sets. If we could
solve the one problem then we could solve every other problem in that class.
7.4 Lemma If L0 is NP complete, and L0 ≤𝑝 L1 , and L1 ∈ NP then L1 is NP complete.
Proof Exercise 7.30.
Soon after Cook raised the question of NP completeness, R Karp
brought it to widespread attention. Karp noted that there are clusters
of problems: there is a collection of problems solvable in time O ( lg (𝑛)) ,
problems of time O (𝑛) , those of time O (𝑛 lg 𝑛) , etc. There is also
a cluster of problems that seem much tougher. He gave a list of twenty-one of
these, drawn from Computer Science, Mathematics, and the natural sciences, where
lots of smart people had for years been unable to find efficient algorithms.
[Photo: Richard M Karp, b 1935]
He showed that all of these problems are NP complete, so that if we could
efficiently solve any then we could efficiently solve them all. Not every
difficult problem is NP complete
but many thousands of problems have been shown to be so and thus whatever it is
that makes these problems hard, all of them share it.
Typically we prove that a problem L is NP complete in two halves. First we
show that it is in NP by exhibiting a witness 𝜔 that a deterministic verifier can
check in polytime. Second, we show that the problem is NP hard by showing
†
In general, for a complexity class C, a problem L is C hard when all problems in that class reduce to
it: if L̂ ∈ C then L̂ ≤𝑝 L. A problem is C complete if it is hard for that class and also is a member of
that class.
that an NP complete problem reduces to it. The list below gives the NP complete
problems most often used. For instance, we might show that 3-SAT ≤𝑝 L.
7.5 Theorem (Basic NP Complete Problems) Each of these problems is NP com-
plete.
3-Satisfiability, 3-SAT Given a propositional logic formula in conjunctive normal
form in which each clause has at most 3 literals, decide if it is satisfiable.
3 Dimensional Matching Given as input a set 𝑀 ⊆ 𝑋 × 𝑌 × 𝑍 , where the sets
𝑋, 𝑌 , 𝑍 all have the same number of elements, 𝑛 , decide if there is a matching,
a set 𝑀̂ ⊆ 𝑀 containing 𝑛 elements such that no two of the triples in 𝑀̂ agree
on any of their coordinates.
Vertex cover Given a graph and a bound 𝐵 ∈ N, decide if the graph has a set 𝐶
of at most 𝐵 -many vertices such that for any edge 𝑣𝑖 𝑣 𝑗 , at least one of its ends
is a member of 𝐶 .
Clique Given a graph and a bound 𝐵 ∈ N, decide if the graph has a set of 𝐵 -many
vertices where any two are connected.
Hamiltonian Circuit Given a graph, decide if it contains a cyclic path that includes
each vertex.
Partition Given a finite multiset 𝑆 of natural numbers, decide if there is a division
of the set into the two parts 𝑆̂ and 𝑆 − 𝑆̂ so the total of their elements is the
same, Σ𝑠∈𝑆̂ 𝑠 = Σ𝑠∉𝑆̂ 𝑠 .
7.6 Example We will show that the Traveling Salesman problem is NP complete.
Recall that we have recast it as the decision problem for the language of pairs ⟨G, 𝐵⟩ ,
where 𝐵 is a parameter bound, and that this problem is a member of NP. We will
show that it is NP hard by proving that the Hamiltonian Circuit problem reduces to
it, Hamiltonian Circuit ≤𝑝 Traveling Salesman.
We need a reduction function 𝑓 . It must input an instance of Hamiltonian Circuit,
a graph G = ⟨N , E ⟩ whose edges are unweighted. Define 𝑓 to return the instance
of Traveling Salesman that uses N as cities, that takes the distances between cities
to be 𝑑 (𝑣𝑖 , 𝑣 𝑗 ) = 1 if 𝑣𝑖 𝑣 𝑗 ∈ E and 𝑑 (𝑣𝑖 , 𝑣 𝑗 ) = 2 if 𝑣𝑖 𝑣 𝑗 ∉ E , and such that the bound
is the number of vertices, 𝐵 = | N | .
This bound means that there will be a Traveling Salesman solution if and only
if there is a Hamiltonian Circuit solution; namely, the salesman uses the edges that
appear in the Hamiltonian circuit. All that remains is to argue that the reduction
function runs in polytime. The number of edges in a graph is at most quadratic in
the number of vertices, so polytime in the size of the input graph is the same as
polytime in the number of vertices. The reduction function’s algorithm examines
all pairs of vertices, which takes time that is quadratic in the number of vertices.
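A sketch of this reduction function in Python (the representation of the graph as a vertex list plus a list of edge tuples, and of distances as a dictionary, is our own choice):

```python
def hamiltonian_to_tsp(vertices, edges):
    """The reduction of Example 7.6: every pair of distinct vertices
    becomes a pair of cities, at distance 1 if the graph has that edge
    and distance 2 if not, and the bound B is the number of vertices."""
    E = {frozenset(e) for e in edges}   # undirected edges as unordered pairs
    d = {(u, v): 1 if frozenset((u, v)) in E else 2
         for u in vertices for v in vertices if u != v}
    return vertices, d, len(vertices)
```

A circuit of total length | N | must use only distance-1 legs, that is, only edges of the original graph, which is exactly a Hamiltonian circuit.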
A common way to show that a given problem L is NP hard is to show that a
special case of L is NP hard.
7.7 Example The Knapsack problem starts with a multiset of objects 𝑆 = {𝑠 0, ... 𝑠𝑘 − 1 },
each with a natural number weight 𝑤 (𝑠𝑖 ) and a value 𝑣 (𝑠𝑖 ) , along with a weight
bound 𝐵 and value target 𝑇 . We then look for a knapsack 𝐾 ⊆ 𝑆 whose elements
have total weight less than or equal to the bound and total value greater than or
equal to the target.
First we check that this problem is in NP. As the witness we can use the 𝑘 -bit
string 𝜔 such that 𝜔 [𝑖] = 1 if 𝑠𝑖 is in the knapsack 𝐾 , and 𝜔 [𝑖] = 0 if it is not. A
deterministic machine can verify this witness in polynomial time since it only has
to total the weights and values of the elements of 𝐾 .
To finish we must show that Knapsack is NP hard. It is sufficient to show that
a special case is NP hard. Consider the Knapsack instance where 𝑤 (𝑠𝑖 ) = 𝑣 (𝑠𝑖 )
for all 𝑠𝑖 ∈ 𝑆 , and where the two criteria each equal half of the weight total,
𝐵 = 𝑇 = 0.5 · Σ0≤𝑖<𝑘 𝑤 (𝑠𝑖 ) . This shows that any instance of the Partition problem,
which is in the above basic list, can be expressed as a Knapsack instance, so
Knapsack is NP hard.
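The special case can be sketched in Python. The tuple layout matches the five-tuple ⟨𝑈 , 𝑤, 𝑣,𝑊 , 𝑉 ⟩ used earlier; indexing the items is our own device so that repeated numbers in the multiset stay distinct.

```python
def partition_to_knapsack(S):
    """Example 7.7's special case: weights and values are the numbers
    themselves, and both the weight bound and the value target equal
    half the total. A knapsack meeting both criteria is exactly one
    side of an even split of S."""
    half = sum(S) / 2
    w = {i: s for i, s in enumerate(S)}   # item i weighs (and is worth) S[i]
    return (list(range(len(S))), w, w, half, half)
```

If the total is odd then `half` is not an integer and no knapsack can hit it, correctly reflecting that no partition exists.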
[Figure (∗): a gadget graph with nodes 𝑇 and 𝐹 , internal nodes 𝑛0 , ... , 𝑛5 , and literal nodes 𝑎 , 𝑏 , 𝑐 ]
We will verify that this gadget is 3-colorable if and only if nodes 𝑎 , 𝑏 , and 𝑐 are
not all the color of 𝐹, matching the behavior of the clause 𝑎 ∨ 𝑏 ∨ 𝑐 . For ‘only if ’,
assume that 𝑎 , 𝑏 , and 𝑐 are the color of 𝐹. Then one of 𝑛 3 and 𝑛 4 is the color of 𝑇
while the other is the color of 𝐺 , and hence 𝑛 0 is the color of 𝐹 . Since 𝑛 2 is the
color of 𝑇 this implies that 𝑛 1 is the color of 𝐺 and this in turn gives that 𝑛 5 is the
color of 𝐹 . That violates 3-colorability because 𝑐 is the color of 𝐹 .
For ‘if ’, we need only exhibit that a 3-coloring exists for each remaining case.
𝑎 𝑏 𝑐 𝑛0 𝑛1 𝑛2 𝑛3 𝑛4 𝑛5
𝐹 𝐹 𝑇 𝐹 𝐺 𝑇 𝑇 𝐺 𝐹
𝐹 𝑇 𝐹 𝑇 𝐹 𝑇 𝐺 𝐹 𝐺
𝐹 𝑇 𝑇 𝐹 𝐺 𝑇 𝑇 𝐺 𝐹
𝑇 𝐹 𝐹 𝑇 𝐹 𝑇 𝐹 𝐺 𝐺
𝑇 𝐹 𝑇 𝐹 𝐺 𝑇 𝐺 𝑇 𝐹
𝑇 𝑇 𝐹 𝑇 𝐹 𝑇 𝐹 𝐺 𝐺
𝑇 𝑇 𝑇 𝑇 𝐹 𝑇 𝐹 𝐺 𝐺
[Figure (∗∗): gadget nodes 𝑇 and 𝐹 wired to literal nodes 𝑥 , ¬𝑥 , 𝑦 , ¬𝑦 , 𝑧 , ¬𝑧 ]
[Diagram: the two possibilities, P = NP, or P a proper subset of NP]
There are a number of ways to potentially settle the question. For example,
by Lemma 7.4 if there is even one NP complete problem that we can prove is a
member of P, then P = NP. Conversely, if someone shows that there is an NP
problem that is not a member of P then P ≠ NP. However, despite nearly a half
century of effort by many brilliant people, no one has accomplished either one.
To explain all the effort on the question, we first argue for its importance. As
formulated in Karp’s original paper, the question of whether P equals NP might
seem of only technical interest.
A large class of computational problems involve the determination of properties
of graphs, digraphs, integers, arrays of integers, finite families of finite sets, boolean
formulas and elements of other countable domains. Through simple encodings . . .
these problems can be converted into language recognition problems, and we can
inquire into their computational complexity. It is reasonable to consider such a problem
satisfactorily solved when an algorithm for its solution is found which terminates
within a number of steps bounded by a polynomial in the length of the input. We
show that a large number of classic unsolved problems of covering, matching, packing,
routing, assignment and sequencing are equivalent, in the sense that either each of
them possesses a polynomial-bounded algorithm or none of them does.
These careful words mask the excitement. Karp demonstrated that many problems
that people had been struggling with in practice — classic unsolved problems —
fall in this category. Researchers who have been looking for an efficient solution
to Vertex Cover and those who have been working on Clique find that they are
working on the same problem, in that the two are inter-translatable. By now the
list of NP complete problems includes determining the best layout of transistors on
a chip, developing accurate financial-forecasting models, analyzing protein-folding
behavior in a cell, or finding the most energy-efficient airplane wing. So the
question of whether P equals NP is extremely practical, and extremely important.†
Researchers often take proving that a problem is NP complete to be an ending
point; they may feel that continuing to look for an algorithm is a waste since
many of the world’s best minds have failed to find one. They may turn to finding
approximations (see Extra B) or to probabilistic methods.
We next argue that among many similar questions, each of which is important,
P versus NP suggests itself as especially significant. First, a philosophical take. At
the start of this book we studied problems that are unsolvable. That is black and
white — either a problem is mechanically solvable or it is not. In this chapter we
find that many problems are solvable in principle but computing a solution seems
to be infeasible. The set P consists of the problems that we can feasibly solve. But
if P ≠ NP then the problems in NP − P, including the NP complete ones, are ones
for which we can verify a correct answer but we cannot reliably find it. The poet
†
One indication of its importance is its inclusion on the Clay Mathematics Institute’s list of problems for
which there is a one million dollar prize; see http://www.claymath.org/millennium-problems. Part
of the introduction there says, “[O]ne of the outstanding problems in computer science is determining
whether questions exist whose answer can be quickly checked, but which require an impossibly long
time to solve by any direct procedure. Problems . . . certainly seem to be of this kind, but so far no one
has managed to prove that any of them really are so hard as they appear.”
R Browning wrote, “Ah, but a man’s reach should exceed his grasp, Or what’s a
heaven for?” We can view these problems as a transition between the possible and
the impossible.
The sense that the P versus NP question fits into a larger intellectual
setting returns us to the book’s opening. Recall the Entscheidungsproblem
that was a motivation behind the definition of a Turing machine. It asks for
an algorithm that inputs a mathematical statement and decides whether it is
true. It is perhaps a caricature, but imagine that the job of mathematicians
is to prove theorems. Then the Entscheidungsproblem asks if it is possible
to replace mathematicians with mechanisms.
[Portrait: Robert Browning, 1812–1889]
In the intervening century we have come to understand, through the work of Gödel
and others, that there is a difference between a statement’s being true and its
being provable. Church and Turing expanded on this
insight to show that the Entscheidungsproblem is unsolvable. Consequently, we
change to asking for an algorithm that inputs statements and decides whether they
are provable.
In principle this is simple. A proof is a sequence of statements, 𝜎0 , 𝜎1 , . . . 𝜎𝑘 ,
where the final statement is the conclusion and where each statement either is
an axiom or else follows from the statements before it by an application of a rule
of deduction (a typical rule allows the simultaneous replacement of all 𝑥 ’s with
𝑦 + 4’s). A computer could brute-force the question of whether a given statement
is provable by doing a dovetail, a breadth-first search of all derivations. If a proof
exists then it will appear, eventually.†
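This brute-force search can be illustrated with a toy rewriting system invented here for the purpose; it is not first-order logic, just an analogue in which the single axiom is a string and each rule of deduction produces a new ‘theorem’ from an old one.

```python
from collections import deque

# A toy proof system: the axiom is "I" and the two deduction rules
# append "U" to a theorem or double it.
AXIOM = "I"
RULES = [lambda s: s + "U", lambda s: s + s]

def provable(goal, max_steps=10_000):
    """Breadth-first (dovetailed) search of all derivations. If a proof
    of `goal` exists it appears, eventually; we also cap the step count.
    Since both rules lengthen strings, anything longer than the goal can
    safely be pruned."""
    queue, seen = deque([AXIOM]), {AXIOM}
    for _ in range(max_steps):
        if not queue:
            return False          # the whole (pruned) space was searched
        statement = queue.popleft()
        if statement == goal:
            return True           # a derivation ending in `goal` was found
        for rule in RULES:
            new = rule(statement)
            if new not in seen and len(new) <= len(goal):
                seen.add(new)
                queue.append(new)
    return False                  # gave up within the step budget

print(provable("IUIU"))   # True, via I -> IU -> IUIU
```

The pruning makes this toy search terminate, but in a real logic the search space is unbounded, which is exactly the ‘eventually’ that the next paragraph worries about.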
The difficulty is the ‘eventually’. This algorithm is very slow. Is there a tractable
way? In the terminology that we now have, the modified Entscheidungsproblem is a
decision problem: given a statement 𝜎 and a bound, we ask if there is a sequence 𝜔
of statements witnessing a proof that ends in 𝜎 and that is shorter than the bound.
A computer can quickly check whether a given proof is valid — this problem is
in NP. With the current status of the P versus NP problem, the answer to the
question in the prior paragraph is that no one knows of a fast algorithm but no one
can show that there isn’t one either.
As far back as 1956, Gödel raised these issues in a letter to von Neumann (this
letter did not become public until years later).‡
One can obviously easily construct a Turing machine, which for every formula 𝐹 in
first order predicate logic and every natural number 𝑛 , allows one to decide if there
†
That is, in a particular subject such as elementary number theory, the set of theorems is computably
enumerable. ‡ At the meeting where Gödel, as an unknown fresh PhD, announced his Incompleteness
Theorem, the only person who approached him with interest was von Neumann, who was already well
established. Later, when Gödel was trying to escape the Nazis, von Neumann wrote to the director of
the Institute for Advanced Study, “Gödel is absolutely irreplaceable. He is the only mathematician . . .
about whom I would dare to make this statement.” So they were professionally quite close. At the time
of the letter, von Neumann had cancer, probably from his work on the Manhattan Project. Gödel was
misinformed and wrote, “Since you now, as I hear, are feeling stronger, I would like to allow myself to
write you about a mathematical problem, of which your opinion would very much interest me.” Within
a year von Neumann had died. We don’t know if he replied or even read the letter.
Discussion Certainly the P versus NP question is the sexiest one in the Theory
of Computing today. It has attracted a great deal of gossip. In 2018, a poll of
experts found that out of 152 respondents, 88% thought that P ≠ NP while only
12% thought that P = NP. This subsection discusses some of the intuition involved
in the question.
First we address the intuition around the conjecture that P ≠ NP.
One way to think about the question is that a problem is in P if finding a solution
is fast, while a problem is in NP if verifying the correctness of a given witness is
fast. Then the claim that P ⊆ NP becomes the observation that if a problem is
fast to solve then it must be fast to verify. But the other inclusion seems to most
experts to be extremely unlikely.
[Photo: A Selman’s plate, courtesy S Selman]
For example, speaking informally, S Aaronson has said, “I’d give it a 2 to 3 percent
chance that P equals NP. Those are the betting odds that I’d take.” Similarly, R Williams puts
the chance that P ≠ NP at 80%.
V Strassen has compared our confidence in this with our confidence in laws of
natural science such as 𝐹 = 𝑚𝑎 or 𝑃𝑉 = 𝑛𝑅𝑇 , “The evidence in favor of P ≠ NP . . .
is so overwhelming, and the consequences of their failure are so grotesque, that
their status may perhaps be compared to that of physical laws rather than that of
ordinary mathematical conjectures.”
As early as Karp’s original paper there was a sense that P ≠ NP was the natural
supposition. Here is the first paragraph of that paper.
All the general methods presently known for computing the chromatic number of a
graph, deciding whether a graph has a Hamiltonian circuit, or solving a system of linear
inequalities in which the variables are constrained to be 0 or 1, require a combinatorial
search for which the worst case time requirement grows exponentially with the length
of the input. In this paper we give theorems which strongly suggest . . . that these
problems, as well as many others, will remain intractable perpetually.
This intuition comes from a number of sources but an important one is the
everyday experience that there is a genuine difference between the difficulty of
finding a solution and that of verifying that an existing solution is correct.
Imagine a jigsaw puzzle. We perceive that if a demon gave
us an assembled puzzle 𝜔 , then checking that it is correct is very
much easier than it would have been to work out the solution
from scratch. Checking for correctness is mechanical, tedious. But
the finding of a solution, we perceive, is creative — we feel that
solving a jigsaw puzzle by brute-force trying every possible piece
against every other is too much computation to be practical.
Similarly, mathematicians find that verifying the correctness of a formally-
described proof is routine. But finding that proof in the first place may be the work
of a lifetime, or more.
Some commentators have extended this way of thinking beyond the narrow
bounds of Theoretical Computer Science. One is A Wigderson, “[P = NP would
be] utopia for the quest for knowledge and technological development by humans.
There would be a short program that, for every mathematical statement and given
page limit, would quickly generate a proof of that length, if one exists! There
would be a short program which, given detailed constraints on any engineering
task, would quickly generate a design which meets the given criteria, if one exists.
The design of new drugs, cheap energy, better strains of food, safer transportation,
and robots that would release us from all unpleasant chores, would become a
triviality.” He continues, “. . . most people revolt against the idea that such amazing
discoveries like Wiles’s proof of Fermat, Einstein’s relativity, Darwin’s evolution,
Edison’s inventions, as well as all the ones we are awaiting, could be produced in
succession quickly by a mindless robot. . . . If P = NP , any human (or computer)
would have the sort of reasoning power traditionally ascribed to deities, and this
seems hard to accept.”
Cook is of the same mind, “. . . Similar remarks apply to diverse creative
human endeavors, such as designing airplane wings, creating physical theories, or
even composing music. The question in each case is to what extent an efficient
algorithm for recognizing a good result can be found.” Perhaps it is hyperbole to
say that if P = NP then writing great symphonies would be a job for computers,
a job for mechanisms, but it is correct to say that if P = NP and if we can write
fast algorithms to recognize excellent music — and our everyday experience with
Artificial Intelligence makes this seem more and more a possibility — then we could
have fast mechanical writers of excellent music.
We finish with a taste of the intuition behind the contrarian view, the sense that
perhaps P = NP could be right.
Many observers have noted that there are cases where everyone “knew” that
some algorithm was the fastest but in the end it proved not to be so. The section on
Big-O begins with one, the grade school algorithm for multiplication. Another is
the problem of solving systems of linear equations. The Gauss’s Method algorithm,
which runs in time O (𝑛^3 ) , is perfectly natural and had been known for centuries
without anyone making improvements. However, while trying to prove that Gauss’s
Method is optimal, V Strassen found an O (𝑛^{lg 7} ) method (lg 7 ≈ 2.81).†
A more dramatic speedup happens with the Matching problem. It starts with a
graph whose vertices represent people and such that pairs of vertices are connected
if the people are compatible. We want a set of edges that is as large as possible, such
that no two edges share a vertex. The naive algorithm tries all possible match
sets, which takes 2^𝑚 checks where 𝑚 is the number of edges. Even with only a
hundred people there are more things to try than atoms in the universe. But since
the 1960’s we have an algorithm that runs in polytime.
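To make the brute-force cost concrete, here is a sketch in Python (the graph and the function name are ours, for illustration only). It examines subsets of edges, so in the worst case it does on the order of 2^𝑚 checks.

```python
from itertools import combinations

def max_matching(edges):
    # Try subsets of edges, largest first, and return the first subset
    # in which no two edges share a vertex.
    for size in range(len(edges), 0, -1):
        for subset in combinations(edges, size):
            endpoints = [v for edge in subset for v in edge]
            if len(endpoints) == len(set(endpoints)):
                return list(subset)
    return []

# A path on four vertices: the best matching uses the two outer edges.
print(max_matching([(0, 1), (1, 2), (2, 3)]))  # [(0, 1), (2, 3)]
```

The polytime algorithms alluded to above avoid this exhaustive search entirely.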
Every day on the Theory of Computing blog feed there are examples of
researchers producing algorithms faster than the ones previously known. A person
can certainly have the sense that we are only just starting to explore what is
possible with algorithms. R J Lipton captured this feeling.
Since we are constantly discovering new ways to program our “machines,” why not
a discovery that shows how to factor? or how to solve SAT? Why are we all so sure that
there are no great new programming methods still to be discovered? . . . I am puzzled
that so many are convinced that these problems could not fall to new programming
tricks, yet that is what is done each and every day in their own research.
Knuth has a related but somewhat different take.
Some of my reasoning is admittedly naive: It’s hard to believe that P ≠ NP and that
so many brilliant people have failed to discover why. On the other hand if you imagine
a number 𝑀 that’s finite but incredibly large . . . then there’s a humongous number of
possible algorithms that do 𝑛^𝑀 bitwise addition or shift operations on 𝑛 given bits, and
it’s really hard to believe that all of those algorithms fail.
My main point, however, is that I don’t believe that the equality P = NP will turn
out to be helpful even if it is proved, because such a proof will almost surely be
nonconstructive. Although I think 𝑀 probably exists, I also think human beings will
never know such a value. I even suspect that nobody will even know an upper bound
on 𝑀 .
Mathematics is full of examples where something is proved to exist, yet the proof
tells us nothing about how to find it. Knowledge of the mere existence of an algorithm
is completely different from the knowledge of an actual algorithm.
†
Here is an analogy: consider the problem of evaluating 2𝑝^3 + 3𝑝^2 + 4𝑝 + 5. Someone might claim
that writing it as 2 · 𝑝 · 𝑝 · 𝑝 + 3 · 𝑝 · 𝑝 + 4 · 𝑝 + 5 makes it obvious that it requires six multiplications.
But rewriting it as 𝑝 · (𝑝 · ( 2 · 𝑝 + 3 ) + 4 ) + 5 shows that it can be done with just three. That is,
naturalness and obviousness do not guarantee that something is correct. Without a proof, we must
worry that someone will produce a clever way to do the job with fewer operations.
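The counts in that analogy can be checked mechanically. Here is a quick sketch, in Python (the function names are ours), that evaluates both forms while tallying multiplications.

```python
def naive_eval(coeffs, p):
    # Evaluate term by term, computing a * p * p * ... one step at a time.
    total, mults = 0, 0
    degree = len(coeffs) - 1
    for i, a in enumerate(coeffs):
        term = a
        for _ in range(degree - i):
            term *= p
            mults += 1
        total += term
    return total, mults

def horner_eval(coeffs, p):
    # Horner's form p*(p*(2p + 3) + 4) + 5: one multiplication per level.
    total, mults = coeffs[0], 0
    for a in coeffs[1:]:
        total = total * p + a
        mults += 1
    return total, mults

print(naive_eval([2, 3, 4, 5], 7))   # (866, 6)
print(horner_eval([2, 3, 4, 5], 7))  # (866, 3)
```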
V.7 Exercises
7.11 This diagram is an extension of one we saw earlier. (It assumes that P ≠ NP.)
[Diagram with nested regions labeled P, NP, NP hard, Rec, and RE]
✓ 7.17 Assume that P ≠ NP. Which of these statements can we infer from the fact
that the Prime Factorization problem is in NP, but is not known to be NP complete?
(a) There exists an algorithm for arbitrary instances of the Prime Factorization
problem.
(b) There exists an algorithm that efficiently solves arbitrary instances of this
problem.
(c) If we found an efficient algorithm for the Prime Factorization problem then we
could immediately use it to solve Traveling Salesman.
✓ 7.18 Suppose that L1 ≤𝑝 L0 . For each, decide if you can conclude it. (a) If L0 is
NP complete then so is L1 . (b) If L1 is NP complete then so is L0 . (c) If L0 is
NP complete and L1 is in NP then L1 is NP complete. (d) If L1 is NP complete
and L0 is in NP then L0 is NP complete. (e) It cannot be the case that both L0
and L1 are NP complete. (f) If L1 is in P then so is L0 . (g) If L0 is in P then so
is L1 .
7.19 Show that these are in NP but are not NP complete, assuming that P ≠ NP.
(a) The language of even numbers.
(b) The language { G | G has a vertex cover of size at most four } .
[Figure: a graph on the vertices 𝑣0 , ... , 𝑣8 ]
✓ 7.24 The Longest Path problem is to input a graph and find the longest simple
path in that graph.
(a) Find the longest path in this graph.
[Figure: a graph on the vertices 𝑞0 , ... , 𝑞8 ]
Section
V.8 Other classes
There are many other defined complexity classes. The next class is quite natural.
We know by a result called the Time Hierarchy Theorem that the three classes
are not all equal. But where the division is, we don’t know. Just as we don’t today
have a proof that P is a proper subset of NP, we also don’t know whether or not
there are NP complete problems that absolutely require exponential time. The
class NP could conceivably be contained in a smaller deterministic time complexity
class — for instance, maybe Satisfiability can be solved in less than exponential
time. But we just don’t know.
8.3 Figure: The blob encloses all problems. Shaded are the three classes P, NP, and
EXP. They are drawn with strict containment, which most experts guess is the true
arrangement, but no one knows for sure.
EXP = ⋃_{𝑐∈N} DTIME ( 2^{𝑛^𝑐} ) = DTIME ( 2^𝑛 ) ∪ DTIME ( 2^{𝑛^2} ) ∪ DTIME ( 2^{𝑛^3} ) ∪ · · ·
Proof The only equality that is not immediate is the last one. Recall that a problem
is in EXP if there is an algorithm for it that runs in time O (𝑏^{𝑝 (𝑛)} ) for some constant
base 𝑏 and polynomial 𝑝 . The equality above only uses the base 2. To cover the
discrepancy, we will show that 3^𝑛 ∈ O ( 2^{𝑛^2} ) . Consider lim_{𝑥→∞} 2^{𝑥^2} /3^𝑥 . Rewrite
the fraction as ( 2^𝑥 /3)^𝑥 , which when 𝑥 > 2 is larger than ( 4/3)^𝑥 , which goes to
infinity. This argument works for any base, not just 𝑏 = 3.
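A quick numerical check of that limit argument, sketched here in Python with sample values of our own choosing:

```python
# The ratio 2^(x^2) / 3^x equals (2^x / 3)^x, which grows without bound
# once x > 2; so 3^n is in O(2^(n^2)).
for x in [3, 5, 10]:
    ratio = 2 ** (x * x) // 3 ** x
    print(x, ratio)
```

The printed ratios grow rapidly, as the proof predicts.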
8.6 Remark While the above description of NP reiterates its naturalness, as we saw
earlier, the characterization that proves to be most useful in practice is that a
problem L is in NP if there is a deterministic Turing machine V such that for each
input 𝜎 there is a polynomial length witness 𝜔 and the verification on V for 𝜎
using 𝜔 takes polytime.
Space Complexity We can consider how much space is used in solving a problem.
8.7 Definition A deterministic Turing machine runs in space 𝑠 : N → R+ if for
all but finitely many inputs 𝜎 , the computation on that input uses less than or
equal to 𝑠 (|𝜎 |) -many cells on the tape. A nondeterministic Turing machine runs
in space 𝑠 if for all but finitely many inputs 𝜎 , every computation path on that
input takes less than or equal to 𝑠 (|𝜎 |) -many cells.
The machine must use less than or equal to 𝑠 (|𝜎 |) -many cells even on non-
accepting computations.
8.8 Definition Let 𝑠 : N → N. A language decision problem is an element of
DSPACE (𝑠) , or SPACE (𝑠) , if that language is decided by a deterministic Turing
machine that runs in space O (𝑠) . A problem is an element of NSPACE (𝑠) if
the language is decided by a nondeterministic Turing machine that runs in
space O (𝑠) .
The definitions arise from a sense we have of a symmetry between time and
space, that they are both examples of computational resources. (There are other
resources; for instance we may want to minimize disk reading or writing, which
may be quite different than space usage.) But space is not just like time. For one
thing, while a program can take a long time but use only a little space, the opposite
is not possible.
8.9 Lemma Let 𝑓 : N → N. Then DTIME (𝑓 ) ⊆ DSPACE (𝑓 ) . As well, this holds for
nondeterministic machines, NTIME (𝑓 ) ⊆ NSPACE (𝑓 ) .
Proof A machine can use at most one new cell per step, so a machine that runs
within time 𝑓 also runs within space 𝑓 .
8.10 Definition PSPACE = ⋃_{𝑐∈N} DSPACE (𝑛^𝑐 ) is the class of problems decidable
in polynomial space. Similarly, NPSPACE = ⋃_{𝑐∈N} NSPACE (𝑛^𝑐 ) .
The Zoo Researchers have studied a great many complexity classes. There
are so many that they have been gathered into an online Complexity Zoo, at
complexityzoo.uwaterloo.ca/.
One way to understand these classes is that defining a class asks a type of
Theory of Computing question. For instance, we have already seen that asking
whether NP equals P is a way of asking whether unbounded parallelism makes any
essential difference — can a problem change from intractable to tractable if we
switch from a deterministic to a nondeterministic machine? Similarly, we know
that P ⊆ PSPACE. In thinking about whether the two are equal, researchers are
considering the space-time tradeoff: if you can solve a problem without much
memory does that mean you can solve it without using much time?
Here is one extra class, to give some flavor of the possibilities. For more, see
the Zoo.
The class BPP, Bounded-Error Probabilistic Polynomial Time, contains the
problems solvable by a nondeterministic polytime machine such that if the answer
is ‘yes’ then at least two-thirds of the computation paths accept and if the answer is
‘no’ then at most one-third of the computation paths accept. (Here all computation
paths have the same length.) This is often identified as the class of feasible problems
for a computer with access to a genuine random-number source. Investigating
whether BPP equals P is asking whether every efficient randomized
algorithm can be made deterministic: are there some problems for which there are
fast randomized algorithms but no fast deterministic ones?
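The two-thirds threshold in that definition is not delicate: repeating a bounded-error algorithm and taking a majority vote drives the error down. Here is a sketch, in Python, that computes the exact error probability of the majority vote (the function name and the sample trial counts are ours).

```python
from math import comb

def majority_error(p_err, trials):
    # Probability that a majority of independent trials are wrong, when
    # each single trial errs with probability p_err.
    need = trials // 2 + 1  # this many wrong answers make the vote wrong
    return sum(comb(trials, k) * p_err ** k * (1 - p_err) ** (trials - k)
               for k in range(need, trials + 1))

# With per-trial error 1/3, repetition shrinks the overall error.
for trials in [1, 15, 101]:
    print(trials, majority_error(1 / 3, trials))
```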
On reading in the Zoo, a person is struck by two things. There are many, many
results listed — we know a lot. But there also are many questions to be answered —
breakthroughs are there waiting for a discoverer.
V.8 Exercises
✓ 8.14 For each problem, give a naive algorithm that runs in exponential time. (a) Subset Sum
problem (b) 𝑘 Coloring problem
✓ 8.16 This illustrates how large a problem can be and still be in EXP. Consider a
game that has two possible moves at each step. The game tree is binary.
(a) How many elementary particles are there in the universe?
(b) At what level of the game tree will there be more possible branches than there
are elementary particles?
(c) Is that longer than a chess game can reasonably run?
8.17 We will show that a polynomial time algorithm that calls a polynomial time
subroutine can run, altogether, in exponential time.
(a) Verify that the grade school algorithm for multiplication gives that squaring
an 𝑛 -bit integer takes time O (𝑛^2 ) .
(b) Verify that repeated squaring of an 𝑛 -bit integer gives a result that has length
2^𝑖 · 𝑛 , where 𝑖 is the number of squarings.
(c) Verify that if your polynomial time algorithm calls a squaring subroutine 𝑛
times then the complexity is O ( 4^𝑛 · 𝑛^2 ) , which is exponential.
Extra
V.A RSA Encryption
In this chapter we have built up the sense that there are functions that are intractable
to compute. Here we see how we can try to leverage this to engineering advantage.
We will describe the celebrated RSA encryption system.
One of the great things about the interwebs, besides that you can get free books,
is that you can buy stuff. You send a credit card number and a couple of days
later the stuff appears. For this to be practical, your credit card number must be
encrypted.
When you visit a web site using a https address, that site sends you information,
called a key, that your browser uses to encrypt your card number. The web site then
uses a different key to decrypt. This is an important point: the decrypter must differ
from the encrypter since people on the net can see the encrypter information that
the site sent you. But the site keeps the decrypter information private. These two,
encrypter and decrypter, form a matched pair. We will describe the mathematical
technologies that make this work.
The arithmetic We can take the view that everything on a computer is numbers.
Consider the message ‘send money’. Its ASCII encoding is 115 101 110 100 32 109
111 110 101 121. Converting to a bitstring gives 01110011 01100101 01101110
01100100 00100000 01101101 01101111 01101110 01100101 01111001. In
decimal that’s 544 943 221 199 950 100 456 825. So there is no loss in generality
in viewing everything we do, including encryption systems, as numerical operations.
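This bytes-to-number view is a one-liner in most languages; for instance, in Python:

```python
# A string of characters is a string of bytes, which is a (large) number.
message = b"send money"
as_number = int.from_bytes(message, "big")
print(as_number)  # 544943221199950100456825

# The conversion loses nothing: we can get the bytes back.
assert as_number.to_bytes(len(message), "big") == message
```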
To make encryption systems, mathematicians and computer scientists have
leveraged that there are things we can do easily but that we do not know
how to easily undo — there are operations we can use for encryption that are fast,
but such that the operations needed to decrypt (without the decrypter) are believed
to be so slow that they are completely impractical. So this is the engineering of
Big-O.
We will describe an algorithm based on the Factoring prob-
lem. We have algorithms for multiplying numbers that are
fast. By comparison, the algorithms that we have for starting
with a number and decomposing it into factors are quite slow.
To illustrate this, you might contrast the time it takes you to
multiply two four-digit numbers by hand with the time it takes
you to factor an eight-digit number chosen at random. For
that second job set aside an afternoon; it’ll take a while.
[Photo: Adi Shamir (b 1952), Ron Rivest (b 1947), Leonard Adleman (b 1945)]
The algorithm that we shall describe exploits this difference.
It was invented in 1976 by three young MIT researchers, R Rivest,
A Shamir, and L Adleman. Rivest read a paper proposing the idea of key pairs
and decided to develop an implementation. Over a year, he and Shamir came
up with a number of ideas and for each Adleman would then produce a way to
break it. Finally they thought to use Fermat’s Little Theorem (see below). Adleman
was unable to break it since, he said, it seemed that only solving the Factoring
problem would break it and no one knew how to do that. Their algorithm, called
RSA, was first announced in Martin Gardner’s Mathematical Games column in the
August 1977 issue of Scientific American. It generated a tremendous amount of
interest and excitement.
The basis of RSA is to find three numbers, a modulus 𝑛 , an encrypter 𝑒 , and a
decrypter 𝑑 , related by this equation (here 𝑚 is the message, as a number).
(𝑚^𝑒 )^𝑑 ≡ 𝑚 ( mod 𝑛)
and sends Alice the sequence ⟨10496, 4861⟩ . Alice recovers his message by using
her private key.
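As a toy illustration of the key equation, here is a sketch in Python. The small primes and keys below are our own illustrative numbers; real keys use primes hundreds of digits long.

```python
# Toy RSA with small (completely insecure) primes p and q.
p, q = 61, 53
n = p * q                # modulus, published
phi = (p - 1) * (q - 1)  # kept secret
e = 17                   # encrypter, published
d = 2753                 # decrypter: e*d = 1 mod phi, kept secret
assert (e * d) % phi == 1

m = 65                    # the message, as a number smaller than n
c = pow(m, e, n)          # encrypt: c = m^e mod n
assert pow(c, d, n) == m  # decrypt: (m^e)^d = m mod n
print("decrypted:", pow(c, d, n))
```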
The arithmetic, fast We’ve just illustrated that RSA uses invertible operations.
There are lots of ways to get invertible operations, so our understanding of RSA is
not complete until we see why these particular operations are fast in practice.
[Graph of the number of primes below 𝑥 together with 𝑥/ln (𝑥 )]
The Prime Number Theorem says that the number of primes less than 𝑥 grows
like 𝑥/ln (𝑥 ) . This theorem says that primes are common. For example, the number of primes
less than 2^1024 is about 2^1024 /ln ( 2^1024 ) ≈ 2^1024 /709.78 ≈ 2^1024 /2^9.47 ≈ 2^1015 .
Said another way, if we choose a number 𝑛 at random then the probability that it
is prime is about 1/ln (𝑛) and so a random number that is 1024 bits long will be a
prime with probability about 1/( ln ( 2^1024 )) ≈ 1/710. On average we need only
select 355 odd numbers of about that size before we find a prime. Hence we can
efficiently generate large primes by just picking random numbers, as long as we
can efficiently test their primality.
On our way to giving an efficient way to test primality, we observe that the
operations of multiplication and addition modulo 𝑚 are efficient.
1.3 Example Multiplying 3 915 421 by 52 567 004 modulo 3 looks hard. The naive
approach is to first take their product and then divide by 3 to find the remainder.
But there is a more efficient way. Rather than multiply first and then reduce
modulo 𝑚 , reduce first and then multiply. That is, we know that if 𝑎 ≡ 𝑏 ( mod 𝑚)
and 𝑐 ≡ 𝑑 ( mod 𝑚) then 𝑎𝑐 ≡ 𝑏𝑑 ( mod 𝑚) and so since 3 915 421 ≡ 1 ( mod 3)
and 52 567 004 ≡ 2 ( mod 3) we have this.
3 915 421 · 52 567 004 ≡ 1 · 2 ≡ 2 ( mod 3)
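In code, the reduce-first strategy keeps the intermediate numbers small. A quick check in Python:

```python
a, b, m = 3_915_421, 52_567_004, 3

# Naive: multiply the big numbers, then reduce.
naive = (a * b) % m

# Better: reduce each factor first, then multiply small numbers.
reduced = ((a % m) * (b % m)) % m  # (1 * 2) mod 3

assert naive == reduced == 2
print(reduced)  # 2
```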
𝑎 · 2𝑎 · · · (𝑝 − 1)𝑎 ≡ 1 · 2 · · · (𝑝 − 1) ( mod 𝑝)
(𝑝 − 1) ! · 𝑎^{𝑝−1} ≡ (𝑝 − 1) ! ( mod 𝑝)

𝑎                 1     2      3       4        5        6
𝑎^{𝑝−1} = 𝑎^6     1     64     729     4 096    15 625   46 656
(𝑎^6 − 1)/7       0     9      104     585      2 232    6 665
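The table is easy to regenerate; here is a sketch in Python.

```python
# Fermat's Little Theorem with p = 7: for 0 < a < 7, the value
# a^6 - 1 is divisible by 7.
p = 7
for a in range(1, p):
    power = a ** (p - 1)
    assert (power - 1) % p == 0
    print(a, power, (power - 1) // p)
```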
By Fermat’s Little Theorem, given 𝑛 , if we find a base 𝑎 with 0 < 𝑎 < 𝑛 so that
there are no such computers built, although there has been progress on that. For
the moment, RSA seems safe. (There are schemes that could replace it, if needed.)
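The test described above, applied contrapositively, is the standard Fermat test: if some base 𝑎 has 𝑎^{𝑛−1} ≢ 1 (mod 𝑛) then 𝑛 is composite. Here is a sketch in Python (the function names are ours), with the usual caveat that certain rare composites, the Carmichael numbers, fool the test on every base coprime to them.

```python
import random

def fermat_witness(n, a):
    # A base a witnesses that n is composite when a^(n-1) is not 1 mod n.
    return pow(a, n - 1, n) != 1

def probably_prime(n, trials=20):
    # Try several random bases; a prime never has a Fermat witness.
    if n < 4:
        return n in (2, 3)
    return not any(fermat_witness(n, random.randrange(2, n - 1))
                   for _ in range(trials))

print(fermat_witness(91, 2))  # True: 91 = 7 * 13 is composite
print(fermat_witness(97, 2))  # False: no base witnesses against a prime
```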
V.A Exercises
✓ A.11 There are twenty-five primes less than or equal to 100. Find them.
✓ A.12 We can walk through an RSA calculation. (a) For the primes, take
𝑝 = 11, 𝑞 = 13. Find 𝑛 = 𝑝𝑞 and 𝜑 (𝑛) = (𝑝 − 1) · (𝑞 − 1) . (b) For the
encoder 𝑒 use the smallest prime 1 < 𝑒 < 𝜑 (𝑛) that is relatively prime with 𝜑 (𝑛) .
(c) Find the decoder 𝑑 , the multiplicative inverse of 𝑒 modulo 𝜑 (𝑛) . (You can use
Euclid’s algorithm, or just test the candidates.) (d) Take the message to be
represented as the number 𝑚 = 9. Encrypt it and decrypt it.
A.13 To test whether a number 𝑛 is prime, we could just try dividing it by all
numbers less than it. (a) Show that we needn’t try all numbers less than 𝑛 , instead
we can just try all 𝑘 with 2 ≤ 𝑘 ≤ √𝑛 . (b) Show that we cannot lower that any
further than √𝑛 . (c) For input 𝑛 = 10^12 how many numbers would you need to test?
(d) Show that this is a terrible algorithm since it is exponential in the size of the
input.
A.14 Show that the probability that a random 𝑏 -bit number is prime is about 1/𝑏 .
Extra
V.B Good-enoughness
A theory shapes the way that you look at the world, at how you see and address
what comes before you in practice. For example, Newton’s 𝐹 = 𝑚𝑎 is a
program for analyzing physical situations: if you see an acceleration then look
around for a force. That approach has been fantastically successful, enabling us to
build bridges, send people to the moon, etc. Likewise, Darwin’s theory tells us that
if you see a change in a species then look for a reproductive advantage.
Here we will point out a way in which a naive understanding of Complexity
Theory can lead to a misunderstanding of what can be done in practice. Of course,
the theorems are right — the proofs check out, the results stand up to formalization,
etc. But in learning, we build mental models of what those formal statements
mean and there is a common misperception about solving problems that our theory
labels “hard.”
Cobham’s Thesis identifies the problems having a tractable algorithm with P.
However, we have noted that just because a problem is in P does not mean that it
has an algorithm that we could use in practice. An example is a problem whose
fastest algorithm is O (𝑛^1000 ) and another is a problem whose algorithm has a huge
coefficient, such as 2^1000 · 𝑛^2 . The flip side of this is that just because a problem is
NP hard does not mean that it is hard to solve on problems that we see in practice.
It could be that an algorithm’s runtime function takes a while to get big and the
first 20 000 instances are quite doable. For another example, consider this function.
𝑓 (𝑛) = 𝑛^{lg 𝑛} if 𝑛 is a multiple of 5, and 𝑓 (𝑛) = 𝑛^2 otherwise
[Plot of 𝑓 for 𝑛 ≤ 20; the vertical axis runs to 400 000]
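The function just graphed can be sketched directly; here it is in Python.

```python
from math import log2

def f(n):
    # Quadratic on most inputs, super-polynomial on every fifth one.
    if n % 5 == 0:
        return n ** log2(n)
    return n ** 2

print(f(4))   # 16
print(f(21))  # 441
print(f(20) > 400_000)  # True: 20^(lg 20) is about 420 000
```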
Most of the time 𝑓 grows slowly but for every fifth input it is super-polynomial.
The exceptions could be rarer than that, such as every 10^10 th input, or even rarer
still. The definition of Big O is such that as long as there are infinitely
many super-polynomial exceptions then the growth of the function as a whole is
superpoly. If a problem’s best algorithm is O (𝑓 ) then we classify that problem as
hard. But if the exception comes once in 10^10 times then for any single instance
the chance of a fast runtime is awfully good.
In short, thinking that NP hard problems are sure
to be too slow to solve except for extremely small
inputs is an incomplete understanding.
[Photo: London pubs, via Google Earth]
An example of an NP complete problem for which there are available very capable
algorithms is the Traveling Salesman problem. There are algorithms that can in a
reasonable time find solutions for problem instances with millions of nodes, either
giving the optimal solution or, with a high probability, even more quickly finding a
path just two or three percent away from the optimal solution. Recently a group of
applied mathematicians solved the minimal pub crawl, the shortest route to visit
all 24 727 UK pubs. The optimal
tour is 45 495 239 meters. The algorithm took 305.2 CPU days, running in parallel
on up to 48 cores on Linux servers. That is a lot of computing but it is also a lot of
pubs — this is not a toy example.
Another group solved the Traveling Salesman instance of visiting all 24 978
cities in Sweden, giving a tour of about 72 500 kilometers. The approach was
to find a nearly-best solution and then use that to find the best one. The
final stages, which improved the lower bound by 0.000 023 percent, required
eight years of computation time running in parallel on a network of Linux
workstations.
There are a number of systems for solving the Traveling Salesman problem
that are widely available. An example is that the Free mathematics system Sage
includes one. Here is a brief example. It uses a graph that is sure to have a
solution, and then we display the solution as an adjacency matrix. For more,
see the documentation.
[Map: the tour of Sweden]
sage: g = graphs.HeawoodGraph()
sage: tsp = g.traveling_salesman_problem()
sage: tsp.adjacency_matrix()
[0 0 0 0 0 1 0 0 0 0 0 0 0 1]
[0 0 1 0 0 0 0 0 0 0 1 0 0 0]
[0 1 0 0 0 0 0 1 0 0 0 0 0 0]
[0 0 0 0 1 0 0 0 0 0 0 0 1 0]
[0 0 0 1 0 0 0 0 0 1 0 0 0 0]
[1 0 0 0 0 0 1 0 0 0 0 0 0 0]
[0 0 0 0 0 1 0 0 0 0 0 1 0 0]
[0 0 1 0 0 0 0 0 1 0 0 0 0 0]
[0 0 0 0 0 0 0 1 0 0 0 0 0 1]
[0 0 0 0 1 0 0 0 0 0 1 0 0 0]
[0 1 0 0 0 0 0 0 0 1 0 0 0 0]
[0 0 0 0 0 0 1 0 0 0 0 0 1 0]
[0 0 0 1 0 0 0 0 0 0 0 1 0 0]
[1 0 0 0 0 0 0 0 1 0 0 0 0 0]
V.B Exercises
B.1 Critique this from social media: “The Traveling Salesman problem is NP hard.
That means that algorithms exist that solve the problem but these algorithms are
very slow. So for an input of size 100 you may already have to wait hundreds of
years.”
B.2 A naive algorithm for the Traveling Salesman problem is to try every possible
circuit. If there are 𝑛 -many cities then how many circuits are there?
B.3 Use the exhaustive search of all possible circuits to find the shortest Traveling
Salesman problem solution for a circuit involving the graph below.
[Figure: a weighted graph on 𝑣0 , 𝑣1 , 𝑣2 , 𝑣3 with edge weights 3, 2, 7, 1, 5, and 4]
Extra
V.C SAT solvers
The prior Extra section gives an example of a problem where the best algorithm we
know is super-polynomial, but in practice that problem is solvable. For instance,
problems may have occasional hard instances — black holes where the computer
goes in and does not come out — but on most instances we can find the answer in
a reasonable time. This section demonstrates a program to solve SAT, a SAT solver,
and shows how to use it as an oracle to solve other problems.
A problem reduction L1 ≤𝑝 L0 gives a way to transfer problem domains, to
change questions about L1 into questions about L0 . Here we take L0 to be SAT
and for L1 we will use Sudoku.
[Figure: a partially filled Sudoku board]
In case it is unfamiliar, the popular Sudoku puzzle starts with a 9-by-9 array, with
some of the cells already filled in. An example is above. Players solve it by filling in
the blanks while satisfying three restrictions: every row must contain each of the
numbers 1–9, and the same holds for each column, as well as the nine subsquares.
We first argue that this is hard. The definition of the computational complexity
of a problem requires that we describe how a solution algorithm’s use of some
resource grows with the problem’s input size. For that, we must frame the problem
to allow instances of different input sizes. The natural way to generalize the puzzle
is: instead of eighty-one variables 𝑥 1,1, ... 𝑥 9,9 we could use an arbitrary number,
𝑥 1, ... 𝑥𝑛 . Instead of those variables taking on 1–9, they could take on a value in
1–𝑘 (and we could call these ‘colors’ instead of ‘values’). And, in place of rows,
columns, and subsquares that are driven by the geometry, a problem instance
could have arbitrary sets, 𝑆 ⊆ {𝑥 1, ... 𝑥𝑛 } of size 𝑘 . With that, it is more than a
fixed-sized puzzle, it is a problem, and we can show that the Sudoku problem is
NP complete. Thus we can fairly consider this problem to be quite hard.
However, having noted this, in this section we will ignore the generalization
and limit our attention to the traditional 9 × 9 board.
To solve puzzle instances using the SAT solver as an oracle we focus on the
reduction Sudoku ≤𝑝 SAT. We must produce a function that inputs game boards
and outputs Propositional Logic expressions. Recall that expressions such as
(𝑥 1 ∨ 𝑥 2 ) ∧ (¬𝑥 2 ∨ 𝑥 3 ) or (𝑥 1 ∨ 𝑥 2 ∨ ¬𝑥 3 ) ∧ 𝑥 4 ∧ (¬𝑥 2 ∨ 𝑥 3 ) are in Conjunctive
Normal form, CNF. These are the conjunction of clauses where each clause is the
disjunction of literals (a literal is either a single atom such as 𝑥𝑖 or the negation
of an atom such as ¬𝑥𝑖 ). The SAT solver can require that input be in this form
because for any Boolean function there is a CNF expression giving that behavior;
more is in Section C.
The SAT solver that we use needs its input file formatted to a standard called
DIMACS. It starts with comment lines, beginning with the character c. Next comes
a problem line, which starts with a p, followed by a space and the problem type
cnf, then followed by a space and the number of variables, and then followed by a
space and the number of clauses. After that line the rest of the file consists of the
clause descriptions.
To describe a clause the file lists its indices. Thus we describe 𝑥 2 ∨ 𝑥 5 ∨ 𝑥 6 with
the list 2 5 6. Negations are described with negatives, so that ¬𝑥 5 ∨ 𝑥 7 ∨ ¬𝑥 9
matches -5 7 -9. Each clause description is terminated by a 0.†
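The format is simple enough that producing a clause line is nearly a one-liner. A sketch in Python (the function name is ours):

```python
def clause_line(clause):
    # A clause is a list of signed variable numbers; DIMACS ends each
    # clause description with a 0.
    return " ".join(str(v) for v in clause) + " 0"

print(clause_line([2, 5, 6]))    # 2 5 6 0
print(clause_line([-5, 7, -9]))  # -5 7 -9 0
```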
We will have many variables. For instance, for the Sudoku array’s row 1,
column 1 entry, we will have a Boolean variable 𝑥 1,1,1 , another variable 𝑥 1,1,2 , etc.,
up to 𝑥 1,1,9 . Only one of these nine will be 𝑇 and the rest will be 𝐹 . The variable
𝑥 1,1,𝑣 is 𝑇 if in our solution the number in row 1 and column 1 is 𝑣 . Otherwise this
variable is 𝐹 . Restated, if for instance in the first row and first column the puzzle
has the value 5 then 𝑥 1,1,5 is 𝑇 while all other 𝑥 1,1,𝑖 are 𝐹 .
Thus for each row, column, and value triple 𝑟, 𝑐, 𝑣 ∈ { 1, ... 9 } we will have a
variable 𝑥𝑟,𝑐,𝑣 . That’s 9^3 = 729 variables.
The CNF expression that we will produce has clauses of two kinds. One
describes the general rules of the game Sudoku while the other is specific to the
particular starting board instance. It is as though we bought a puzzle book and
opened first to the introduction describing the rules, and later opened to the page
containing the specific partial board.
To describe the general rules we need a lot of clauses describing relationships
among the variables. An example of such a rule is that exactly one of 𝑥 1,1,1 , 𝑥 1,1,2 ,
. . . 𝑥 1,1,9 is 𝑇 . There are too many of these rules to write by hand so we will get
the computer to do them.
The Racket file starts with some constants that make the code easier to read.
(define ONETONINE '(1 2 3 4 5 6 7 8 9))
(define ONETOTHREE '(1 2 3)) ;; for boxes
(define FOURTOSIX '(4 5 6))
(define SEVENTONINE '(7 8 9))
(define BOX-INDICES (list ONETOTHREE FOURTOSIX SEVENTONINE))
Each row has nine restrictions. For instance, to express that the first row
contains an entry with the value 2, we need this clause.
𝑥 1,1,2 ∨ 𝑥 1,2,2 ∨ 𝑥 1,3,2 ∨ 𝑥 1,4,2 ∨ 𝑥 1,5,2 ∨ 𝑥 1,6,2 ∨ 𝑥 1,7,2 ∨ 𝑥 1,8,2 ∨ 𝑥 1,9,2
The Racket code below produces this corresponding list: ( (1 1 2) (1 2 2) (1
3 2) (1 4 2) (1 5 2) (1 6 2) (1 7 2) (1 8 2) (1 9 2) ). We express all
of the row restrictions with one such list for each row and value.
;; row-restrictions Return list of lists of triples, each list of triples meaning
;; that each row has to have each value 1-9.
(define (row-restrictions)
  (define (one-row-one-value row-number variable-value)
    (for/list ([column-number ONETONINE])
      (list row-number column-number variable-value)))
  (for*/list ([row-number ONETONINE]   ; one clause per row and value
              [variable-value ONETONINE])
    (one-row-one-value row-number variable-value)))
Running that routine produces a list of lists. This is a typical member, requiring
that at least one entry in row 3 must be an 8.
((3 1 8) (3 2 8) (3 3 8) (3 4 8) (3 5 8) (3 6 8) (3 7 8) (3 8 8) (3 9 8))
Here is a typical line, requiring that at least one entry in column 7 must be an 8.
((1 7 8) (2 7 8) (3 7 8) (4 7 8) (5 7 8) (6 7 8) (7 7 8) (8 7 8) (9 7 8))
For the subsquares, the restrictions are the same in principle but the form of
the code is a bit different.
Here is one of the box restrictions produced by the Racket code below, saying that
some entry in the lower-left box has the value 8.
((7 1 8) (7 2 8) (7 3 8) (8 1 8) (8 2 8) (8 3 8) (9 1 8) (9 2 8) (9 3 8))
Running the SAT solver with just the restrictions above finds that they can be
satisfied. But there is a surprise. It finds a satisfying assignment by putting more
than one value in some entries and no value at all in some others.
Thus we need one more set of restrictions, that no entry can contain two values.
We add clauses like ¬𝑥 3,4,1 ∨ ¬𝑥 3,4,2 , meaning that the row 3 and column 4 entry
cannot be both a 1 and a 2 (it could of course be neither).
This is one of the resulting lines, enforcing that the entry in row 8 and column 8
cannot be both 6 and 9 (again, the negative means logical negation).
((8 8 -6) (8 8 -9))
To finish we must specify the initial board. For example, to tell the SAT solver
that there is a 9 in row 1 and column 3 we include the one-literal clause 𝑥 1,3,9 .
Here are the first few lines of that routine.
;; INITIAL-CLAUSES The given layout of the board. Each row is a list with a
;; triple: row number, column number, integer.
(define INITIAL-CLAUSES
  (list (list '(1 3 9)) ; there is a 9 in position (1,3)
        (list '(1 8 1))
        (list '(1 9 5))
In this development, and in the Racket file, we work in 𝑥𝑟,𝑐,𝑣 ’s. But DIMACS
wants a single index. So we need to convert each of our variables to some 𝑥𝑘 and
back again. The formula is 𝑘 = 1 + 81 · (|𝑣 | − 1) + 9 · (𝑟 − 1) + (𝑐 − 1) .
;; triple->varnum Find the variable number associated with the row, column, and value
;;   row-number column-number  integers, counting starts at 1
;;   variable-value  integer value of the entry. If negative, then
;;     the predicate is to be negated.
;; If variable-value < 0 then use the absolute value for the basic varnum,
;; but return the negative of the polynomial (indicating that the predicate is negated).
(define (triple->varnum row-number column-number variable-value)
  (let ([a-value (+ (* 81 (- (abs variable-value) 1))
                    (* 9 (- row-number 1))
                    (- column-number 1)
                    1)]) ;; add 1 because DIMACS uses 0 to terminate clauses
    (if (negative? variable-value)
        (* -1 a-value)
        a-value)))
;; varnum->triple From the variable number, return the associated row, column, and value
(define (varnum->triple v)
  (let* ([offset (- (abs v) 1)]
         [variable-value (quotient offset 81)]
         [vv-removed (- offset (* 81 variable-value))]
         [row-number (quotient vv-removed 9)]
         [column-number (remainder vv-removed 9)])
    (if (negative? v)
        (list (+ 1 row-number) (+ 1 column-number) (* -1 (+ 1 variable-value)))
        (list (+ 1 row-number) (+ 1 column-number) (+ 1 variable-value)))))
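As a sanity check, the two conversions should be mutually inverse. Here is the same arithmetic rendered in Python, with a round-trip test over all 729 variables (a sketch for checking, separate from the book's Racket code).

```python
def triple_to_varnum(r, c, v):
    """k = 1 + 81*(|v| - 1) + 9*(r - 1) + (c - 1); a negative v
    negates the resulting literal."""
    k = 1 + 81 * (abs(v) - 1) + 9 * (r - 1) + (c - 1)
    return -k if v < 0 else k

def varnum_to_triple(k):
    """Invert triple_to_varnum by peeling off the value, then row, then column."""
    offset = abs(k) - 1
    v, rest = divmod(offset, 81)
    r, c = divmod(rest, 9)
    val = v + 1
    return (r + 1, c + 1, -val if k < 0 else val)

# every (row, column, value) triple survives the round trip
assert all(varnum_to_triple(triple_to_varnum(r, c, v)) == (r, c, v)
           for r in range(1, 10) for c in range(1, 10) for v in range(1, 10))
```

For instance, the initial clause for the 9 in row 1 and column 3 encodes to variable number 651, which is the first clause line of the DIMACS file below, and the positive literal 107 decodes to row 3, column 8, value 2, matching the varnum->triple output shown later.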
Now we can write all of this to the DIMACS-format file. This will convert the
clauses.
;; produce-clauses Given a list of lists of triples, produce the matching set of
;;   strings for the DIMACS file
(define (produce-clauses list-of-lists)
  (define (one-line-of-one variable-in-clause) ; produce line from a list of one number
    (apply format "~a 0\n" variable-in-clause))
  (define (one-line-of-two variables-in-clause) ; produce line from a list of two numbers
    (apply format "~a ~a 0\n" variables-in-clause))
  (define (one-line-of-nine variables-in-clause) ; produce line from a list of nine numbers
    (apply format "~a ~a ~a ~a ~a ~a ~a ~a ~a 0\n" variables-in-clause))
And this gathers the clauses together and then calls the above routine.
(define FILE-PREAMBLE
  (list (format "c ~a\n" FILENAME)
        "c DIMACS format file for SAT solver\n"
        (format "c ~a Jim Hefferon, hefferon.net. Public Domain.\n"
                (date->string (current-date)))
        (format "p cnf ~a ~a\n" (* 9 9 9) (length CLAUSES))))
(define FILE-LINES
  (append
   FILE-PREAMBLE
   (produce-clauses CLAUSES)))
The first few lines of the result soduku.cnf look like this.
c soduku.cnf
c DIMACS format file for SAT solver
c 2022-01-16 Jim Hefferon, hefferon.net. Public Domain.
p cnf 729 3197
651 0
8 0
333 0
334 0
256 0
663 0
502 0
262 0
506 0
183 0
SATISFIABLE
It took far less than a second on an ordinary laptop. The algorithms for SAT
solvers are exponential in the worst case but they seem to do very well in practice.
Below is the output. It contains 729 numbers but only a few fit on this page,
and anyway that many numbers would not be more enlightening than just showing
the first few.
ftpmaint@millstone:~/Documents/computing/src/scheme/complexity$ cat soduku.out
SAT
-1 -2 -3 -4 -5 -6 -7 8 -9 -10 -11 12 -13 -14 -15 -16 -17 -18 -19 -20 -21 -22 -23 24 -25 -26
Here is a positive output number that does not appear in the line shown above.
> (varnum->triple 107)
'(3 8 2)
> (show-solved-board)
'#(#(2 6 9 3 7 8 4 1 5)
#(5 8 1 4 2 9 7 6 3)
#(4 7 3 5 6 1 9 2 8)
#(8 1 2 7 4 5 3 9 6)
#(3 5 7 1 9 6 2 8 4)
#(6 9 4 8 3 2 1 5 7)
#(1 3 5 9 8 4 6 7 2)
#(9 4 6 2 5 7 8 3 1)
#(7 2 8 6 1 3 5 4 9))
In summary, we can in reasonable time solve instances of SAT that are not toy
exercises. They are large enough that to write the CNF form we had to resort to
code.
V.C Exercises
C.1 This board is described online as an especially hard Sudoku. Use the routines
from this section to solve it.
3 4 5 6 9
5 4
8 1
8 2 3 7
1 8 7 3
9
8 7
8 7 2
4 9
Extra
V.D The Bounded Halting problem
This chapter’s final section develops the intuition that the class of NP complete
problems forms a transition or bridge between the solvable problems and the
unsolvable ones. Here we support that with some results.
The signature unsolvable problem is the Halting problem. Here is a variant of it
that is easy.
4.1 Problem Given an index, an input, and a step limit, ⟨𝑒, 𝑥, 𝑆⟩ ∈ N × B∗ × N,
decide if Turing machine P𝑒 halts on 𝑥 within 𝑆 steps.
This problem is clearly solvable since we can just run the machine to see if it
halts within 𝑆 steps. Consider a version that is not so trivial.
4.2 Problem (Nondeterministic Bounded Halting problem) Given an index and a
step limit, ⟨𝑒, 𝑆⟩ (where 𝑆 is specified in unary notation), decide if on an empty
tape, the nondeterministic Turing machine P𝑒 halts within 𝑆 steps. That is, decide
if P𝑒 ’s computation history contains a branch that reaches a halting state in no
more than 𝑆 many transitions.
(Flowcharts of machines built from a verifier 𝑉 : each starts, obtains a string 𝜎 and a
certificate 𝜔 , either by reading them or by guessing 𝜔 , and then runs 𝑉 (𝜎, 𝜔 ) . If the
verifier accepts then the machine halts; if not, it enters an infinite loop.)
Appendices
Appendix A Strings
An alphabet is a nonempty and finite set of symbols (sometimes called tokens). We
write symbols in a distinct typeface, as in 1 or a, because the alternative of quoting
them would be clunky.† A string or word over an alphabet is a finite sequence
of elements from that alphabet. The string with no elements is the empty string,
denoted 𝜀 .
One potentially surprising aspect of a symbol is that it may contain more
than one letter. For instance, a programming language may have if as a symbol,
meaning that it is indecomposable into separate letters. Another example is that
the Racket alphabet contains the symbols or and car, as well as allowing variable
names such as x, or lastname. An example of a string is (or a ready), which is
a sequence of five alphabet elements, ⟨(, or, a, ready, )⟩ .
Traditionally, we denote an alphabet with the Greek letter Σ. In this book we
will name strings with lower case Greek letters (except that we use 𝜙 for something
else) and denote the items in the string with the associated lower case roman letter,
as in 𝜎 = ⟨𝑠 0, ... 𝑠𝑛− 1 ⟩ and 𝜏 = ⟨𝑡 0, ... 𝑡𝑚− 1 ⟩ . The length of the string 𝜎 , |𝜎 | , is the
number of symbols that it contains, 𝑛 . In particular, the length of the empty string
is |𝜀 | = 0.
In place of 𝑠𝑖 we sometimes write 𝜎 [𝑖] . One convenience of this form is that we
use 𝜎 [−1] for the final character, 𝜎 [−2] for the one before it, etc. We also write
𝜎 [𝑖 : 𝑗] for the substring between terms 𝑖 and 𝑗 , including the 𝑖 -th term but not the
𝑗 -th, and we write 𝜎 [𝑖 :] for the tail substring that starts with term 𝑖 as well as
𝜎 [ : 𝑗] for 𝜎 [ 0: 𝑗] .
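These conventions will be familiar from the indexing and slicing of strings in languages such as Python, so they can be checked directly; a quick illustration (not part of the book's development).

```python
sigma = "abcde"

sigma[0]     # 'a', the symbol s_0
sigma[-1]    # 'e', the final symbol
sigma[1:4]   # 'bcd': includes term 1 but not term 4
sigma[2:]    # 'cde', the tail substring starting with term 2
sigma[:3]    # 'abc', the same as sigma[0:3]
```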
The notations such as diamond brackets and commas are ungainly. We usually
work with alphabets having single-character symbols and then we write strings by
omitting the brackets and commas. That is, we write 𝜎 = abc instead of ⟨a, b, c⟩ .‡
This convenience comes with the disadvantage that without the diamond brackets
the empty string is just nothing, which is why we use the separate symbol 𝜀 .#
The alphabet consisting of the bit characters is B = { 0, 1 } (we sometimes
instead use B for the set { 0, 1 } of the bits themselves). Strings over B are bitstrings
or bit strings.§
Where Σ is an alphabet, for 𝑘 ∈ N the set of length 𝑘 strings over that alphabet
is Σ𝑘. The set of strings over Σ of any finite length is Σ∗ = ∪𝑘 ∈ N Σ𝑘. The asterisk
symbol is the Kleene star, read aloud as “star.”
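For a small alphabet we can enumerate Σ𝑘 directly. Here is a sketch in Python with the hypothetical alphabet Σ = { a, b }.

```python
from itertools import product

Sigma = ["a", "b"]

def strings_of_length(k):
    """All length-k strings over the alphabet, that is, the set Sigma^k."""
    return ["".join(s) for s in product(Sigma, repeat=k)]

# here |Sigma^k| = 2^k; the Kleene star Sigma* is the union over all k
```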
Strings are simple so there are only a few operations. Let 𝜎 = ⟨𝑠 0 ... 𝑠𝑛− 1 ⟩
and 𝜏 = ⟨𝑡 0, ... 𝑡𝑚− 1 ⟩ be strings over an alphabet Σ. The concatenation 𝜎 ⌢ 𝜏 or
†
We give them a distinct look to distinguish the symbol ‘a’ from the variable ‘𝑎 ’, so that we can tell “let
𝑥 = a” apart from “let 𝑥 = 𝑎 .” Symbols are not variables — they don’t hold a value, they are themselves
a value. ‡ To see why when we drop the commas we want the alphabet to consist of single-character
symbols, consider Σ = { a, aa } and the string aaa. Without the commas this string is ambiguous: it
could mean ⟨ a, aa ⟩ , or ⟨ aa, a ⟩ , or ⟨ a, a, a ⟩ . # Omitting the diamond brackets and commas also blurs
the distinction between a symbol and a one-symbol string, between a and ⟨ a ⟩ . However, dropping
the brackets is so convenient that we accept this disadvantage. § Some authors consider infinite
bitstrings but ours will always be finite.
𝜎𝜏 appends the second string to the first, 𝜎 ⌢ 𝜏 = ⟨𝑠 0 ... 𝑠𝑛−1, 𝑡 0, ... 𝑡𝑚−1 ⟩ . Where
𝜎 = 𝜏0 ⌢ · · · ⌢ 𝜏𝑘 −1 , we say that 𝜎 decomposes into the 𝜏 ’s, and that each 𝜏𝑖 is a
substring of 𝜎 . The first substring, 𝜏0 , is a prefix of 𝜎 . The last, 𝜏𝑘 − 1 , is a suffix.
A power or replication of a string is an iterated concatenation with itself, so
that 𝜎 2 = 𝜎 ⌢ 𝜎 and 𝜎 3 = 𝜎 ⌢ 𝜎 ⌢ 𝜎 , etc. We write 𝜎 1 = 𝜎 and 𝜎 0 = 𝜀 . The reversal
𝜎 R of a string takes the symbols in reverse order: 𝜎 R = ⟨𝑠𝑛−1, ... 𝑠 0 ⟩ . The empty
string’s reversal is 𝜀 R = 𝜀 .
For example, let Σ = { a, b, c } and let 𝜎 = abc and 𝜏 = bbaac. Then the
concatenation 𝜎𝜏 is abcbbaac. The third power 𝜎 3 is abcabcabc, and the reversal
𝜏 R is caabb. A palindrome is a string that equals its own reversal. Examples are
𝛼 = abba, 𝛽 = cdc, and 𝜀 .
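The example can be checked in code; Python strings happen to support all of these operations directly (a sketch for illustration, not part of the book's Racket development).

```python
sigma = "abc"
tau = "bbaac"

concatenation = sigma + tau   # sigma ⌢ tau
third_power = sigma * 3       # sigma^3, iterated concatenation
reversal = tau[::-1]          # tau^R, the symbols in reverse order

def is_palindrome(s):
    """A palindrome equals its own reversal; the empty string qualifies."""
    return s == s[::-1]
```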
Exercises
A.1 Let 𝜎 = 10110 and 𝜏 = 110111 be bit strings. Find each. (a) 𝜎⌢𝜏 (b) 𝜎⌢𝜏 ⌢𝜎
(c) 𝜎 R (d) 𝜎 3 (e) 03 ⌢ 𝜎
A.2 Let the alphabet be Σ = { a, b, c }. Suppose that 𝜎 = ab and 𝜏 = bca. Find
each. (a) 𝜎 ⌢ 𝜏 (b) 𝜎 2 ⌢ 𝜏 2 (c) 𝜎 R ⌢ 𝜏 R (d) 𝜎 3
A.3 Let L = {𝜎 ∈ B∗ | |𝜎 | = 4 and 𝜎 starts with 0 }. How many elements are in
that language?
A.4 Suppose that Σ = { a, b, c } and that 𝜎 = abcbccbba. (a) Is abcb a prefix of 𝜎 ?
(b) Is ba a suffix? (c) Is bab a substring? (d) Is 𝜀 a suffix?
A.5 What is the relation between |𝜎 | , |𝜏 | , and |𝜎 ⌢ 𝜏 | ? You must justify your
answer.
A.6 The operation of string concatenation follows a simple algebra. For each
of these, decide if it is true. If so, prove it. If not, give a counterexample.
(a) 𝛼 ⌢ 𝜀 = 𝛼 and 𝜀 ⌢ 𝛼 = 𝛼 (b) 𝛼 ⌢ 𝛽 = 𝛽 ⌢ 𝛼 (c) (𝛼 ⌢ 𝛽) R = 𝛽 R ⌢ 𝛼 R
(d) (𝛼 R ) R = 𝛼 (e) (𝛼 𝑖 ) R = (𝛼 R ) 𝑖
A.7 Show that string concatenation is not commutative, that there are strings 𝜎
and 𝜏 so that 𝜎 ⌢ 𝜏 ≠ 𝜏 ⌢ 𝜎 .
A.8 In defining decomposition above we have ‘𝜎 = 𝜏0 ⌢ · · · ⌢ 𝜏𝑛− 1 ’, without
parentheses on the right side. This takes for granted that the concatenation
operation is associative, that no matter how we parenthesize it we get the same
string. Prove this. Hint: use induction on the number of substrings, 𝑛 .
A.9 Prove that this constructive definition of string power is equivalent to the one
above.
𝜎 𝑛 = 𝜀 – if 𝑛 = 0
𝜎 𝑛 = 𝜎 𝑛−1 ⌢ 𝜎 – if 𝑛 > 0
Appendix B Functions
A function is an input-output relationship: each input is associated with a unique
output. An example is the association of each input natural number with the output
number that is its square. Another is the association of each string of characters
with the length of that string. A third is the association of each polynomial
𝑎𝑛 𝑥 𝑛 + · · · + 𝑎 1𝑥 + 𝑎 0 with a Boolean value 𝑇 or 𝐹 , depending on whether 1 is a
root of that polynomial.
An important point is that, contrary to what is said in most introductions, a
function isn’t a ‘rule’. The function that associates a year with that year’s winners
of the US baseball World Series isn’t given by any rule simpler than an exhaustive
listing of all cases. Nor is the kind of association that a database might have, such
as linking the government ID of US citizens to their income in the most recent
tax year. True, in science many functions are described by a formula, such as
𝐸 (𝑚) = 𝑚𝑐 2, and as well many functions are computed by a program. But what
makes something a function is that for each input there is exactly one associated
output. If we can go from an input to the associated output with a calculation then
that’s great but even if we can’t, it is still a function.
For a precise definition fix two sets, a domain 𝐷 and a codomain 𝐶 . A function
or map, 𝑓 : 𝐷 → 𝐶 , is a set of pairs (𝑥, 𝑦) ∈ 𝐷 × 𝐶 , subject to the restriction of
being well-defined, that every 𝑥 ∈ 𝐷 appears as the first entry in one and only
one pair (more on well-definedness is below). We write 𝑓 (𝑥) = 𝑦 or 𝑥 ↦→ 𝑦 and
say that ‘𝑥 maps to 𝑦 ’. Note the difference between the arrow symbols used in
𝑓 : 𝐷 → 𝐶 and 𝑥 ↦→ 𝑦 . We say that 𝑥 is an input or argument to the function, and
that 𝑦 is an output or value of the function.
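Viewing a function as a set of pairs makes well-definedness mechanically checkable. Here is a sketch in Python, with hypothetical example sets.

```python
def is_well_defined(pairs, domain):
    """A set of pairs is a function on the domain when every domain
    element appears as a first entry in exactly one pair."""
    firsts = [x for x, _ in pairs]
    return all(firsts.count(x) == 1 for x in domain)

square = {(0, 0), (1, 1), (2, 4), (3, 9)}
not_a_function = {(0, 0), (0, 1), (2, 4)}   # 0 is paired with two outputs
```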
Some functions take more than one input, such as dist (𝑥, 𝑦) = √(𝑥 2 + 𝑦 2 ) . We say
that this function is 2-ary, while some other functions are 3-ary, etc. The number of
inputs is the function’s arity. If the function takes only one input but that input is a
tuple then we often drop the parentheses, so we write 𝑓 ( 3, 5) instead of 𝑓 (( 3, 5)) .
We also illustrate functions with a bean diagram, which separates the domain and
the codomain sets. Below on the left is the action of the exclusive or operator while
on the right is a variant of the bean diagram, showing the absolute value function
mapping integers to integers.
(On the left, each input pair maps to its exclusive or: (𝐹, 𝐹 ) ↦ 𝐹 , (𝐹,𝑇 ) ↦ 𝑇 ,
(𝑇 , 𝐹 ) ↦ 𝑇 , and (𝑇 ,𝑇 ) ↦ 𝐹 . On the right, each integer between −3 and 3 maps to
its absolute value.)
always nonnegative and so the output is real, writing 𝑓 : R → R where the second
R is the codomain, rather than troubling to find its exact range.
†
Sometimes people say that they are, “checking that the function is well-defined.” In a strict sense this
is confused, because if it is a function then it is by definition well-defined. However, natural language is
funny this way — while all tigers have stripes, we may well sometimes say “striped tiger.”
The most common way to verify that a function is onto is to start with a generic
(that is, arbitrary) codomain element 𝑦 and then exhibit a domain element 𝑥 that
maps to it. If a function is suitable for graphing on 𝑥𝑦 axes then visual proof that it
is onto is that for any 𝑦 in the codomain, the horizontal line at 𝑦 intercepts the
graph in at least one point.
As the above pictures suggest, where the domain and codomain are finite, when
there is a one-to-one function 𝑓 : 𝐷 → 𝐶 then we can conclude that the number
of elements in the domain is less than or equal to the number in the codomain.
Further, if the function is onto then the number of elements in the domain equals
the number in the codomain if and only if the function is one-to-one.
0 ↦ 2, 1 ↦ 3, 2 ↦ 5, 3 ↦ 7, 4 ↦ 11, 5 ↦ 13, 6 ↦ 17, 7 ↦ 19, ...
(the map associating each 𝑛 with the 𝑛 -th prime)
Exercises
B.1 Let 𝑓 , 𝑔 : R → R be 𝑓 (𝑥) = 3𝑥 + 1 and 𝑔(𝑥) = 𝑥 2 + 1. (a) Show that 𝑓 is
one-to-one and onto. (b) Show that 𝑔 is not one-to-one and not onto.
B.2 Show each of these.
(a) Let 𝑔 : R3 → R2 be the projection map (𝑥, 𝑦, 𝑧) ↦→ (𝑥, 𝑦) and let 𝑓 : R2 → R3
be (𝑥, 𝑦) ↦→ (𝑥, 𝑦, 0) . Then 𝑔 is a left inverse of 𝑓 but not a right inverse.
(b) The function 𝑓 : Z → Z given by 𝑓 (𝑛) = 𝑛 2 has no left inverse.
(c) Where 𝐷 = { 0, 1, 2, 3 } and 𝐶 = { 10, 11 } , the function 𝑓 : 𝐷 → 𝐶 given by
0 ↦→ 10, 1 ↦→ 11, 2 ↦→ 10, 3 ↦→ 11 has more than one right inverse.
B.3
(a) Where 𝑓 : Z → Z is 𝑓 (𝑎) = 𝑎 + 3 and 𝑔 : Z → Z is 𝑔(𝑎) = 𝑎 − 3, show that 𝑔
is inverse to 𝑓 .
(b) Where ℎ : Z → Z is the function that returns 𝑛 + 1 if 𝑛 is even and returns
𝑛 − 1 if 𝑛 is odd, find a function inverse to ℎ .
(c) If 𝑠 : R+ → R+ is 𝑠 (𝑥) = 𝑥 2 , find its inverse.
B.4 Fix 𝐷 = { 0, 1, 2 } and 𝐶 = { 10, 11, 12 }. Let 𝑓 , 𝑔 : 𝐷 → 𝐶 be 𝑓 ( 0) = 10,
𝑓 ( 1) = 11, 𝑓 ( 2) = 12, and 𝑔( 0) = 10, 𝑔( 1) = 10, 𝑔( 2) = 12. Then: (a) verify
that 𝑓 is a correspondence (b) construct an inverse for 𝑓 (c) verify that 𝑔 is not a
correspondence (d) show that 𝑔 has no inverse.
B.5
(a) Prove that a composition of one-to-one functions is one-to-one.
(b) Prove that a composition of onto functions is onto. With the prior item, this
gives that a composition of correspondences is a correspondence.
(c) Prove that if 𝑔 ◦ 𝑓 is one-to-one then 𝑓 is one-to-one.
(d) Prove that if 𝑔 ◦ 𝑓 is onto then 𝑔 is onto.
(e) If 𝑔 ◦ 𝑓 is onto, must 𝑓 be onto? If it is one-to-one, must 𝑔 be one-to-one?
B.6 Prove each.
(a) A function 𝑓 has an inverse if and only if 𝑓 is a correspondence.
(b) If a function has an inverse then that inverse is unique.
(c) The inverse of a correspondence is a correspondence.
(d) If 𝑓 and 𝑔 are each invertible then so is 𝑔 ◦ 𝑓 , and (𝑔 ◦ 𝑓 ) − 1 = 𝑓 − 1 ◦ 𝑔 − 1 .
B.7 Prove these for a function 𝑓 with a finite domain 𝐷 . They imply that
corresponding finite sets have the same size. Hint: for each, you can do induction
on either | ran (𝑓 )| or |𝐷 | .
(a) | ran (𝑓 )| ≤ |𝐷 |
(b) If 𝑓 is one-to-one then | ran (𝑓 )| = |𝐷 | .
Appendix C Propositional logic
A proposition is a statement that has a Boolean value, that is, it is either true
or false, which we write 𝑇 or 𝐹. For instance, ‘7 is odd’ and ‘8² − 1 = 127’ are
propositions, with values 𝑇 and 𝐹. In contrast, ‘𝑥 is a perfect square’ is not a
proposition because for some 𝑥 it is 𝑇 while for others it is not.
We can operate on propositions, including negating as with ‘it is not the case
that 8 is prime’, or taking the conjunction of two propositions as with ‘5 is prime
and 7 is prime’. The truth tables below define the behavior of not (also called
negation), and (also called conjunction), and or (also called disjunction).
not 𝑃          𝑃 and 𝑄 , 𝑃 or 𝑄
𝑃 ¬𝑃           𝑃 𝑄 𝑃 ∧𝑄 𝑃 ∨𝑄
𝐹 𝑇            𝐹 𝐹  𝐹   𝐹
𝑇 𝐹            𝐹 𝑇  𝐹   𝑇
               𝑇 𝐹  𝐹   𝑇
               𝑇 𝑇  𝑇   𝑇
Thus where ‘7 is odd’ is 𝑃 and ‘8 is prime’ is 𝑄 , we get the value of ‘7 is odd and 8
is prime’ from the right-hand table’s third column, third row: 𝐹. Observe that ∨
accumulates truth, in that if any of its inputs are 𝑇 then 𝑃 ∨ 𝑄 is 𝑇 . Similarly, ∧
accumulates 𝐹 .
In some fields the practice is to write 0 where we write 𝐹 and 1 in place of 𝑇.
The advantage of using symbols over writing the sentences out is that we can
express more things. For instance, if 𝑃 stands for ‘7 is odd’, 𝑄 stands for ‘9 is a
perfect square’, and 𝑅 means ‘11 is prime’ then (𝑃 ∨ 𝑄) ∧ ¬(𝑃 ∨ (𝑅 ∧ 𝑄)) is too
complex to comfortably state in everyday language. We call that a propositional
logic expression and denote it with a capital Roman letter such as 𝐸 .
Truth tables help in working out the behavior of complex statements by
building them up from their components. The table below shows the input/output
behavior of (𝑃 ∨ 𝑄) ∧ ¬(𝑃 ∨ (𝑅 ∧ 𝑄)) .
𝑃 𝑄 𝑅 (𝑃 ∨ 𝑄) ∧ ¬(𝑃 ∨ (𝑅 ∧ 𝑄))
𝐹 𝐹 𝐹 𝐹
𝐹 𝐹 𝑇 𝐹
𝐹 𝑇 𝐹 𝑇
𝐹 𝑇 𝑇 𝐹
𝑇 𝐹 𝐹 𝐹
𝑇 𝐹 𝑇 𝐹
𝑇 𝑇 𝐹 𝐹
𝑇 𝑇 𝑇 𝐹
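Such tables can be generated mechanically by enumerating all assignments of the variables; here is a sketch in Python.

```python
from itertools import product

def truth_table(expr, num_vars):
    """Pair each assignment of truth values with the expression's output."""
    return [(vals, expr(*vals))
            for vals in product([False, True], repeat=num_vars)]

E = lambda P, Q, R: (P or Q) and not (P or (R and Q))
table = truth_table(E, 3)
```

This expression comes out true on exactly one of the eight rows, the assignment 𝑃 = 𝐹 , 𝑄 = 𝑇 , 𝑅 = 𝐹 .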
The three ‘¬’, ‘∧’, and ‘∨’ are operators (or connectives). There are other
operators; here are two common ones.
𝑃 implies 𝑄 𝑃 if and only if 𝑄
𝑃 𝑄 𝑃 →𝑄 𝑃 ↔𝑄
𝐹 𝐹 𝑇 𝑇
𝐹 𝑇 𝑇 𝐹
𝑇 𝐹 𝐹 𝐹
𝑇 𝑇 𝑇 𝑇
Two statements are equivalent (or logically equivalent) if they have equal
output values whenever we give the same values to the variables. For instance,
𝑃 → 𝑄 is equivalent to ¬𝑃 ∨ 𝑄 , because if we assign 𝑃 = 𝐹, 𝑄 = 𝐹 then they both
give the value 𝑇 , if we assign 𝑃 = 𝐹, 𝑄 = 𝑇 then they also give the same value, etc.
That is, the statements are equivalent when their truth tables have the same final
column. We denote equivalence using ≡, as with 𝑃 → 𝑄 ≡ ¬𝑃 ∨ 𝑄 .
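Equivalences like this one can be confirmed by machine: tabulate both statements and compare the final columns. A sketch in Python, using the table for → given above.

```python
# the truth table for P -> Q, as given above
IMPLIES = {(False, False): True, (False, True): True,
           (True, False): False, (True, True): True}

# compare against the proposed equivalent, not-P or Q, on every assignment
same_column = all(IMPLIES[(P, Q)] == ((not P) or Q)
                  for P in (False, True) for Q in (False, True))
```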
The set of formulas describing when statements are equivalent is Boolean
algebra. For instance, these are the distributive laws
𝑃 ∧ (𝑄 ∨ 𝑅) ≡ (𝑃 ∧ 𝑄) ∨ (𝑃 ∧ 𝑅) 𝑃 ∨ (𝑄 ∧ 𝑅) ≡ (𝑃 ∨ 𝑄) ∧ (𝑃 ∨ 𝑅)
and these are De Morgan’s laws.
¬(𝑃 ∧ 𝑄) ≡ ¬𝑃 ∨ ¬𝑄 ¬(𝑃 ∨ 𝑄) ≡ ¬𝑃 ∧ ¬𝑄
The three operators ‘¬’, ‘∧’, and ‘∨’ form a complete set in that we can reverse
the activity above: for any truth table we can use the three to produce an expression
whose input/output behavior is that table. In short, we can produce expressions
with any desired behavior. Here are two examples.
𝑃 𝑄 𝐸0
𝐹 𝐹 𝑇
𝐹 𝑇 𝐹
𝑇 𝐹 𝐹
𝑇 𝑇 𝐹

𝑃 𝑄 𝑅 𝐸1
𝐹 𝐹 𝐹 𝐹
𝐹 𝐹 𝑇 𝑇
𝐹 𝑇 𝐹 𝑇
𝐹 𝑇 𝑇 𝐹
𝑇 𝐹 𝐹 𝑇
𝑇 𝐹 𝑇 𝐹
𝑇 𝑇 𝐹 𝐹
𝑇 𝑇 𝑇 𝐹
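The construction behind this completeness claim is mechanical: for each row whose output is 𝑇 , form the conjunction of literals matching that row, and join the conjuncts with ∨ (disjunctive normal form). Here is a sketch in Python.

```python
def from_table(table):
    """Build a Boolean function from a truth table via disjunctive normal
    form: the result is true exactly on the rows the table marks true."""
    true_rows = [vals for vals, out in table.items() if out]
    return lambda *args: any(all(a == v for a, v in zip(args, row))
                             for row in true_rows)

# the table for E0 above is true only on the row P = F, Q = F
E0_table = {(False, False): True, (False, True): False,
            (True, False): False, (True, True): False}
E0 = from_table(E0_table)   # behaves as the single conjunct not-P and not-Q
```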
Exercises
C.1 Make a truth table for each of these propositions. (a) (𝑃∧𝑄)∧𝑅 (b) 𝑃∧(𝑄∧𝑅)
(c) 𝑃 ∧ (𝑄 ∨ 𝑅) (d) (𝑃 ∧ 𝑄) ∨ (𝑃 ∧ 𝑅)
C.2 Make a truth table for these. (a) ¬(𝑃 ∨ 𝑄) (b) ¬𝑃 ∧ ¬𝑄 (c) ¬(𝑃 ∧ 𝑄)
(d) ¬𝑃 ∨ ¬𝑄
C.3 For the tables below, construct a DNF propositional logic expression: (a) the
table on the left, (b) the one on the right.
𝑃 𝑄 𝑅 𝐸        𝑃 𝑄 𝑅 𝐸
𝐹 𝐹 𝐹 𝐹        𝐹 𝐹 𝐹 𝑇
𝐹 𝐹 𝑇 𝑇        𝐹 𝐹 𝑇 𝐹
𝐹 𝑇 𝐹 𝑇        𝐹 𝑇 𝐹 𝑇
𝐹 𝑇 𝑇 𝐹        𝐹 𝑇 𝑇 𝐹
𝑇 𝐹 𝐹 𝐹        𝑇 𝐹 𝐹 𝐹
𝑇 𝐹 𝑇 𝑇        𝑇 𝐹 𝑇 𝐹
𝑇 𝑇 𝐹 𝐹        𝑇 𝑇 𝐹 𝑇
𝑇 𝑇 𝑇 𝐹        𝑇 𝑇 𝑇 𝑇
C.4 For the tables in the prior exercise, construct a CNF propositional logic
expression: (a) the table on the left, (b) the one on the right.
C.5 There are sixteen binary logical operators. Give all sixteen truth tables, and
give the operator’s name, such as ‘𝑃 → 𝑄 ’ or ‘𝑄 → 𝑃 ’.
Part Five
Notes
Endnotes
These are citations, sources, or discussions that supplement the text body. Each refers to a word or phrase
from that text body, in italics, and then the note is in plain text. Many of the entries include links to more
detail.
Cover
Calculating the bonus http://www.loc.gov/pictures/item/npc2007012636/
Preface
in addition to technical detail, also attends to a breadth of knowledge S Pinker emphasizes that a liberal
approach involves making connections and understanding in a context (Pinker 2014). “It seems to me
that educated people should know something about the 13-billion-year prehistory of our species and the
basic laws governing the physical and living world, including our bodies and brains. They should grasp
the timeline of human history from the dawn of agriculture to the present. They should be exposed
to the diversity of human cultures, and the major systems of belief and value with which they have
made sense of their lives. They should know about the formative events in human history, including
the blunders we can hope not to repeat. They should understand the principles behind democratic
governance and the rule of law. They should know how to appreciate works of fiction and art as sources
of aesthetic pleasure and as impetuses to reflect on the human condition. On top of this knowledge,
a liberal education should make certain habits of rationality second nature. Educated people should
be able to express complex ideas in clear writing and speech. They should appreciate that objective
knowledge is a precious commodity, and know how to distinguish vetted fact from superstition, rumor,
and unexamined conventional wisdom. They should know how to reason logically and statistically,
avoiding the fallacies and biases to which the untutored human mind is vulnerable. They should think
causally rather than magically, and know what it takes to distinguish causation from correlation and
coincidence. They should be acutely aware of human fallibility, most notably their own, and appreciate
that people who disagree with them are not stupid or evil. Accordingly, they should appreciate the
value of trying to change minds by persuasion rather than intimidation or demagoguery.” See also
https://www.aacu.org/leap/what-is-a-liberal-education.
computational thinking http://www.cs.cmu.edu/afs/cs/usr/wing/www/publications/Wing06.pdf
Prologue
D Hilbert and W Ackermann Hilbert was a very prominent mathematician, perhaps the world’s most
prominent mathematician, and Ackermann was his student. So they made an impression when they
wrote, “[This] must be considered the main problem of mathematical logic” (Hilbert and Ackermann
1950), p 73.
mathematical statement Specifically, the statement as discussed by Hilbert and Ackermann comes from a
first-order logic (versions of the Entscheidungsproblem for other systems had been proposed by other
mathematicians). First-order logic differs from propositional logic, the logic of truth tables, in that it
allows variables. Thus for instance if you are studying the natural numbers then you can have a Boolean
function Prime (𝑥) . (In this context a Boolean function is traditionally called a ‘predicate’.) To make
a statement that is either true or false we must then quantify statements, as in the (false) statement
“for all 𝑥 ∈ N, Prime (𝑥) implies PerfectSquare (𝑥) .” The modifier “first-order” means that the variables
used by the Boolean functions are members of the domain of discourse (for Prime above it is N), but
we cannot have that variables themselves are Boolean functions. (Allowing Boolean functions to take
Boolean functions as input is possible, but would make this a second-order, or even higher-order, logic.)
after a run He was 22 years old at the time. (Hodges 1983), p 96. This book is the authoritative source
for Turing’s fascinating life. During the Second World War, he led a group of British cryptanalysts at
Bletchley Park, Britain’s code breaking center, where his section was responsible for German naval
codes. He devised a number of techniques for breaking German ciphers, including an electromechanical
machine that could find settings for the German coding machine, the Enigma. Because the Battle of the
Atlantic was critical to the Allied war effort, and because cracking the codes was critical to defeating the
German submarine effort, Turing’s work was very important. (The major motion picture on this, The
Imitation Game (Wikipedia contributors 2016e), is a fun watch but is not a slave to historical accuracy.)
After the war, at the National Physical Laboratory he made one of the first designs for a stored-program
computer. In 1952, when it was a crime in the UK, Turing was prosecuted for homosexual acts. He was
given chemical castration as an alternative to prison. He died in 1954 from cyanide poisoning which
an inquest determined was suicide. In 2009, following an Internet campaign, British Prime Minister
G Brown made an official public apology on behalf of the British government for “the appalling way he
was treated.”
Olympic marathon His time at the qualifying event was only ten minutes behind what was later the winning
time in the 1948 Olympic marathon. For more, see https://www.turing.org.uk/book/update/pa
rt6.html and http://www-groups.dcs.st-and.ac.uk/~history/Extras/Turing_running.html.
clerk Before the engineering of computing machines had advanced enough to make capable machines
widely available, much of what we would today do with a program was done by people, then called
“computers.” This book’s cover shows such computers at work.
Another example, as told in the film Hidden Figures, is that the trajectory for US astronaut John Glenn’s
pioneering orbit of Earth was found by the human computer Katherine Johnson and her colleagues,
African American women whose accomplishments are all the more impressive because they occurred
despite appalling discrimination.
don’t involve random methods We can build things that return completely random results; one example is a
device that registers consecutive clicks on a Geiger counter and if the second gap between clicks is longer
than the first it returns 1, else it returns 0. See also https://blog.cloudflare.com/randomness-
101-lavarand-in-production/.
continuous methods Before there were computers, engineers worked with analog models that were
sometimes quite large; see (Wikipedia contributors 2021). In these models there is no sense of step
one, step two.
analog devices See (A/V Geeks 2013) about slide rules, (Wikipedia contributors 2016c) about nomograms,
(YouTube user navyreviewer 2010) about a naval firing computer, and (Gizmodo 1948) about a more
general-purpose machine. See also https://www.youtube.com/watch?v=qqlJ50zDgeA about the
Antikythera mechanism. For a more recent take, see https://www.youtube.com/watch?v=GVsUOuSj
vcg.
reading results off of a slide rule or an instrument dial Suppose that an intermediate result of a calculation
is 1.23. If we read it off the slide rule with the convention that the resolution accuracy is only one
decimal place then we write down 1.2. Doubling that gives 2.4. But doubling the original number
2 · 1.23 = 2.46 and then rounding to one place gives 2.5.
no upper bound This explication is derived from (Rogers 1987), p 1–5.
more is provided Perhaps the clerk has a helper or the mechanism has a person attending it.
A reader may object that this violates the goal of the definition, to model in-principle-physically-realizable
computations We all know computations with no natural bounds. The long division algorithm that
we learn in grade school has no inherent bounds on the lengths of either inputs or outputs, or on the
amount of available scratch paper.
are so elementary that we cannot easily imagine them further divided (Turing 1937), (Turing 1938a)
LEGO’s See for instance https://www.youtube.com/watch?v=RLPVCJjTNgk&t=114s.
Finally, it trims off a 1 The instruction 𝑞 4 11𝑞 5 won’t ever be reached, but it does no harm. It is there for
the definition of a Turing machine, to make Δ defined on all 𝑞𝑝𝑇𝑝 . See also the note to that definition.
transition function The definition describes Δ as a function Δ : 𝑄 × Σ → (Σ ∪ { L, R }) × 𝑄 . That is a
fudge. In Ppred , the state 𝑞 3 is used only for the purpose of halting the machine and so there is no
defined next state. In Padd , the state 𝑞 5 plays the same role. So, strictly speaking, the transition function
is a partial function, one where for some members of the domain there is no associated value; see
page 373. (Alternatively, we could write the set of states as 𝑄 ∪ 𝑄ˆ where the states in 𝑄ˆ are there only
for halting, and the transition function’s definition is Δ : 𝑄 × Σ → (Σ ∪ { L, R }) × (𝑄 ∪ 𝑄ˆ ) .) We have
left this point out of the main presentation since it doesn’t cause confusion and the discussion can be a
distraction.
a complete description of a machine’s action It is reasonable to ask why our standard model, the Turing
machine, is one that is so basic that programming it can be annoying. Why not choose a real world
machine? The reason is that, as here, we can completely describe the actions of the Turing machine
model, or of any of the other simple models that are sometimes used, in only a few paragraphs. A real
machine would take a full book, and a full semester. We do Turing machines because they are simple
to describe (they are also historically important, and the work in Chapter Five needs them).
𝑞 is a state, a member of 𝑄 We are vague about what ‘states’ are but we assume that whatever they are,
the set of states 𝑄 is disjoint from the set Σ ∪ { L, R }.
a snapshot, an instant in a computation So the configuration along with the Turing machine is all the
information that you need to continue a computation — it encapsulates the future history of that
computation.
A state machine is a device that stores the status of something at a given time. On input it can change
the status (it can also cause an action or output to take place). Mathematically, it is a finite set
𝑄 = {𝑞 0, ... 𝑞𝑛 } along with a function Δ : 𝑄 × Σ → 𝑄 .
rather than, “this shows that 𝜙 takes a string representing 3 to a string representing 5.” That is, we do this
for the same reason that we would say, “This is me when I was ten.” instead of, “This is a picture of me
when I was ten.”
a physical system evolves through a sequence of discrete steps that are local, meaning that all the action takes
place within one cell of the head Adapted from (Wigderson 2017).
constructed the first machine See (Leupold 1725).
A number of mathematicians See also (Wikipedia contributors 2014).
Church suggested to the most prominent expert in the area, Gödel (Soare 1999)
established beyond any doubt (Gödel 1995)
This is central to the Theory of Computation Some authors have claimed that neither Church nor Turing
stated anything as strong as is given here but instead that they proposed that the set of things that can
be done by a Turing machine is the same as the set of things that are computable by a human computer
(see for instance (Copeland and Proudfoot 1999)). But the thesis as stated here, that what can be done
by a Turing machine is what can be done by any physical mechanism that is discrete and deterministic,
is certainly the thesis as it is taken by most researchers in the field today. And besides, Church and
Turing did not in fact distinguish between the two cases; (Hodges 2016) points to Church’s review of
Turing’s paper in the Journal of Symbolic Logic: “The author [i.e. Turing] proposes as a criterion that
an infinite sequence of digits 0 and 1 be ‘computable’ that it shall be possible to devise a computing
machine, occupying a finite space and with working parts of finite size, which will write down the
sequence to any desired number of terms if allowed to run for a sufficiently long time. As a matter of
convenience, certain further restrictions are imposed on the character of the machine, but these are of
such a nature as obviously to cause no loss of generality — in particular, a human calculator, provided
with pencil and paper and explicit instructions, can be regarded as a kind of Turing machine.” This has
Church referring to the human calculator not as the prototype but instead as a special case of the class
of defined machines.
We cannot give a mathematical proof of Church’s Thesis We cannot give a proof that starts from axioms
whose justification is on firmer footing than the thesis itself. R Williams has commented, “[T]he
Church-Turing thesis is not a formal proposition that can be proved. It is a scientific hypothesis, so it can
be ‘disproved’ in the sense that it is falsifiable. Any ‘proof’ must provide a definition of computability
with it, and the proof is only as good as that definition.” (Stack Exchange author Ryan Williams 2010)
formalizes ‘intuitively mechanically computable’ Kleene wrote that “its role is to delimit precisely an
hitherto vaguely conceived totality.” (Kleene 1952), p 318.
Turing wrote (Turing 1937)
systematic error (Dershowitz and Gurevich 2008) p 304.
it may be the right answer Gödel wrote, “the great importance . . . [of] Turing’s computability [is] largely
due to the fact that with this concept one has for the first time succeeded in giving an absolute definition
of an interesting epistemological notion, i.e., one not depending on the formalism chosen.” (Gödel
1995), pages 150–153.
can compute all of the functions that can be computed by machines with two or more tapes For instance, we
can simulate a two-tape machine P2 on a one-tape machine P1 . One way to do this is by having P1 use
its even-numbered tape positions for P2 ’s first tape and using its odd tape positions for P2 ’s second tape.
(A more hand-wavy explanation is: a modern computer can clearly simulate a two-tape Turing machine
but a modern computer has sequential memory, which is like the one-tape machine’s sequential tape.)
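The even/odd interleaving can be made concrete with a toy sketch (ours, for illustration; the helper names are invented): one dict-backed tape holds both of P2's logical tapes, with logical tape 0 stored at the even-numbered physical cells and logical tape 1 at the odd-numbered ones.

```python
# One physical tape simulates two logical tapes: logical tape 0 lives at
# even physical positions, logical tape 1 at odd physical positions.
tape = {}  # physical position -> symbol; absent cells read as the blank 'B'

def write(which, pos, symbol):
    tape[2 * pos + which] = symbol

def read(which, pos):
    return tape.get(2 * pos + which, 'B')

write(0, 3, 'x')  # cell 3 of logical tape 0 -> physical cell 6
write(1, 3, 'y')  # cell 3 of logical tape 1 -> physical cell 7
```

A one-tape machine simulating P2 walks this single tape, moving two physical cells wherever P2 moves one, so the two logical tapes never interfere.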
compute the same set of functions We must adjust the convention for what is the output of a function.
evident immediately (Church 1937)
S Aaronson has made this point From his blog Shtetl-Optimized, (Aaronson 2012b).
supply a stream of random bits Some CPUs come with that capability built in; see for instance
https://en.wikipedia.org/wiki/RdRand.
beyond discrete and deterministic From (Stack Exchange author Andrej Bauer 2016): “Turing machines
are described concretely in terms of states, a head, and a working tape. It is far from obvious that this
exhausts the computing possibilities of the universe we live in. Could we not make a more powerful
machine using electricity, or water, or quantum phenomena? What if we fly a Turing machine into a
black hole at just the right speed and direction, so that it can perform infinitely many steps in what
appears finite time to us? You cannot just say ‘obviously not’ — you need to do some calculations in
general relativity first. And what if physicists find out a way to communicate and control parallel
universes, so that we can run infinitely many Turing machines in parallel time?”
everything that experiments with reality would ever find to be possible Modern Physics is a sophisticated
and advanced field of study so we could doubt that anything large has been overlooked. However,
there is historical reason for supposing that such a thing is possible. The physicists H von Helmholtz
in 1856 and S Newcomb in 1892 calculated that the Sun is about 20 million years old (they assumed
that the Sun glowed from the energy provided by its gravitational contraction in condensing from a
nebula of gas and dust to its current state). Consistently with that, one of the world’s most reputable
physicists, W Kelvin, estimated in 1897 that the Earth was, “more than 20 and less than 40 million years
old, and probably much nearer 20 than 40” (he calculated how long it would take the Earth to cool
from a completely molten object to its present temperature). He said, “unless sources now unknown to
us are prepared in the great storehouse of creation” then there was not enough energy in the system
to justify a longer estimate. One person very troubled by this was Darwin, having himself found that
a valley in England took 300 million years to erode, and consequently that there was enough time,
called “deep time,” for the slow but steady process of evolution of species to happen. Then, in 1896,
A Becquerel discovered radiation. Everything changed. All of the prior calculations did not account for
it and the apparent discrepancy vanished. (Wikipedia contributors 2016a)
the solution is not computable See (Pour-El and Richards 1981).
compute an exact solution See http://www.smbc-comics.com/?id=3054.
Three-Body Problem See https://en.wikipedia.org/wiki/Three-body_problem.
we can still wonder See (Piccinini 2017).
This big question remains open A sample of readings: frequently cited is (Black 2000), which takes the
thesis to be about what is humanly computable, and (Copeland 1996), (Copeland 1999), and (Copeland
2002) argue that computations can be done that are beyond the capabilities of Turing machines. Against
that are (Davis 2004), (Davis 2006), and (Gandy 1980), which give arguments that many Theory of
Computing researchers consider conclusive.
the mainstream community of researchers takes Church’s Thesis as the basis for its work For some idea of
additional views see (Zenil 2012).
Often when we want to show that something is computable The same point stated another way, from (Stack
Exchange author Andrej Bauer 2018): In books on computability theory it is common for the text to
skip details on how a particular machine is to be constructed. The author of the computability book
will mumble something about the Turing-Church thesis somewhere in the beginning. This is to be read
as “you will have to do the missing parts yourself, or equip yourself with the same sense of inner feeling
about computation as I did”. Often the author will give you hints on how to construct a machine, and
call them “pseudo-code”, “effective procedure”, “idea”, or some such. The Church-Turing thesis is the
social convention that such descriptions of machines suffice. (Of course, the social convention is not
arbitrary but rather based on many years of experience on what is and is not computable.) . . . I am
not saying that this is a bad idea, I am just telling you honestly what is going on. . . . So what are we
supposed to do? We certainly do not want to write out detailed constructions of machines, because
then students will end up thinking that’s what computability theory is about. It isn’t. Computability
theory is about contemplating what machines we could construct if we wanted to, but we don’t. As
usual, the best path to wisdom is to pass through a phase of confusion.
Suppose that you have infinitely many dollars. (MathOverflow user Joel David Hamkins 2010)
H Grassmann produced a more elegant definition In 1888 Dedekind used this definition to give the first
rigorous proof of the laws of elementary school arithmetic.
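Grassmann-style recursion can be sketched in a few lines of Python (our sketch, assuming the usual formulation: 𝑎 + 0 = 𝑎 and 𝑎 + S(𝑏) = S(𝑎 + 𝑏), where S is the successor operation).

```python
def succ(n):
    # the successor operation, the only arithmetic taken as given
    return n + 1

def add(a, b):
    if b == 0:                      # base case: a + 0 = a
        return a
    return succ(add(a, b - 1))      # recursive case: a + S(b) = S(a + b)
```

From this definition the familiar laws, such as commutativity, follow by induction, which is the content of Dedekind's proof.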
it specifies the meaning, the semantics, of the operation A Perlis’s epigram, “Recursion is the root of
computation since it trades description for time” expresses this idea. The recursive definition includes
steps implicitly, and with them time, in that you need to keep expanding the recursive calls. But it does
not include them in preference to what they are about.
logically problematic The sense that there is something perplexing about recursion is often expressed with
a story.
W James gave a public lecture on cosmology, and was approached by an older woman from
the audience. “Your idea that the sun is the center of the solar system and the earth orbits
around it has a good ring, Mr James, but it’s wrong,” she said. “Our crust of earth lies on the
back of a giant turtle.” James gently asked, “What does this turtle stand on?” “You’re very
clever, Mr James,” she replied, “but the first turtle stands on the back of a second, far larger,
turtle.” James persisted, “And the second turtle, Madam?” Immediately she crowed, “It’s no
use Mr James — it’s turtles all the way down!” (Wikipedia contributors 2016f)
See https://xkcd.com/1416.
Another widely known reference is that with the invention of better microscopes, scientists studying
fleas came to see that the fleas themselves had parasites. The Victorian mathematician Augustus De
Morgan wrote a poem (derived from one by Jonathan Swift) called Siphonaptera, which is the biological
order of fleas.
Great fleas have little fleas upon their backs to bite ’em,
And little fleas have lesser fleas, and so ad infinitum.
See also Room 8, winner of the 2014 short film award from the British Academy of Film and Television
Arts.
define the function on higher-numbered inputs using only its values on lower-numbered ones For the function
specified by 𝑓 ( 0) = 1 and 𝑓 (𝑛) = 𝑛 · 𝑓 (𝑓 (𝑛 − 1) − 1) , try computing the values 𝑓 ( 0) through 𝑓 ( 5) .
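A sketch for experimenting with that exercise (ours): the trouble is that the inner call can land on an input *larger* than 𝑛, so this is not a legitimate definition by values on lower-numbered inputs.

```python
def f(n):
    if n == 0:
        return 1
    # Note: the inner call f(f(n-1) - 1) need not be on a smaller input.
    return n * f(f(n - 1) - 1)

print([f(n) for n in range(5)])  # f(0) through f(4) all terminate
```

Tracing 𝑓(5) shows it calling 𝑓(7), which needs 𝑓(6), which needs 𝑓(5) again, so that computation never halts.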
the first sequence of numbers ever computed on an electronic computer It was computed on EDSAC on
1949-May-06. See (N. J. A. Sloane 2019) and (Renwick 1949).
Towers of Hanoi The puzzle was invented by E Lucas in 1883, but the next year H De Parville made it
famous with a delightful problem statement.
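The puzzle's recursive structure fits in a short sketch (ours, not De Parville's): to move 𝑛 disks, move 𝑛 − 1 of them out of the way, move the largest, then move the 𝑛 − 1 back on top.

```python
def hanoi(n, src, dst, spare, moves):
    if n == 0:
        return
    hanoi(n - 1, src, spare, dst, moves)  # park n-1 disks on the spare peg
    moves.append((src, dst))              # move the largest disk
    hanoi(n - 1, spare, dst, src, moves)  # bring the n-1 disks back on top

moves = []
hanoi(3, 'A', 'C', 'B', moves)
print(len(moves))  # 2**3 - 1 = 7 moves, which is the minimum
```

The recurrence for the number of moves, 𝑚(𝑛) = 2𝑚(𝑛 − 1) + 1, gives 2ⁿ − 1 in total.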
hyperoperation (Goodstein 1947)
H4 ( 4, 3) = 4^(4^4) is much greater than the number of elementary particles in the universe The radius of the
universe is about 45 × 10^9 light years. That’s about 10^62 Planck units. A system of much more than
𝑟^1.5 particles packed in 𝑟 Planck units will collapse rapidly. So the number of particles is less than 10^92,
which is much less than H4 ( 4, 3) ≈ 10^154 (solve 4^256 = 10^𝑥 by taking the logarithm base 10 of both
sides to get 𝑥 = 256 · log ( 4) ≈ 154.13). (Levin 2016)
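The hyperoperations can be sketched directly from their recursion (one common convention, ours to pick since, as the text notes, variants abound): H0 is successor, H1 addition, H2 multiplication, H3 exponentiation, H4 tetration, each defined by iterating the one before.

```python
def H(n, a, b):
    if n == 0:
        return b + 1
    if b == 0:
        return a if n == 1 else (0 if n == 2 else 1)  # base cases per level
    return H(n - 1, a, H(n, a, b - 1))  # iterate the previous operation

print(H(3, 2, 3), H(4, 2, 3))  # 8 and 16: 2^3 = 8 and 2^(2^2) = 16
print(len(str(4 ** 256)))      # H4(4,3) = 4^(4^4) = 4^256 has 155 digits
```

Running the pure recursion on H4(4, 3) itself is hopeless, which is rather the point; the digit count above cheats by using Python's built-in exponentiation.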
Ackermann function There are many different Ackermann functions in the literature. A common one is
the function of one variable A (𝑘, 𝑘) . See (Wikipedia contributors 2024).
a programming language having only bounded loops computes the primitive recursive functions (Meyer and
Ritchie 1966)
output only primes In fact, there is no one-input polynomial with integer coefficients that outputs a prime
for all integer inputs, except if the polynomial is constant. This was shown in 1752 by C Goldbach.
The proof is so simple and delightful, and not widely known, that we will give it here. Suppose 𝑝 is a
polynomial with integer coefficients that on integer inputs returns only primes. Fix some n̂ ∈ N, and
then 𝑝 (n̂) = m̂ is a prime. Into the polynomial plug n̂ + 𝑘 · m̂, where 𝑘 ∈ Z. Expanding gives lots of
terms with m̂ in them, and gathering together like terms shows 𝑝 (n̂ + 𝑘 · m̂) ≡ 𝑝 (n̂) mod m̂. Because
𝑝 (n̂) = m̂, this gives that 𝑝 (n̂ + 𝑘 · m̂) = m̂, since that is the only prime number that is a multiple of m̂
and 𝑝 outputs only primes. But with that, 𝑝 (𝑛) = m̂ has infinitely many roots, and 𝑝 is therefore the
constant polynomial.
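The congruence at the heart of the proof can be checked concretely with Euler's famous polynomial 𝑝(𝑛) = 𝑛² + 𝑛 + 41 (our example; it yields primes for 𝑛 = 0 through 39 but, by the argument above, must eventually fail).

```python
def p(n):
    return n * n + n + 41

n_hat = 1
m_hat = p(n_hat)              # p(1) = 43, a prime
value = p(n_hat + 1 * m_hat)  # plug in n_hat + k*m_hat with k = 1
print(value, value % m_hat)   # 2021 0: a multiple of 43, but not 43 itself
```

Here 𝑝(44) = 2021 = 43 · 47 is congruent to 𝑝(1) mod 43, and being a proper multiple of 43 it is composite.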
this relates unbounded search to the Entscheidungsproblem It is possible that neither search will halt. It is
possible that the conjecture is true but not provable from the axiom system that you are using.
Collatz conjecture See (Wikipedia contributors 2019a).
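Unbounded search in miniature (a sketch of ours): the loop below has no a priori bound on its number of iterations, and that it halts for every starting value is precisely the open conjecture.

```python
def collatz_steps(n):
    steps = 0
    while n != 1:                               # nothing bounds this loop
        n = n // 2 if n % 2 == 0 else 3 * n + 1
        steps += 1
    return steps

print(collatz_steps(6))  # 6 -> 3 -> 10 -> 5 -> 16 -> 8 -> 4 -> 2 -> 1: 8 steps
```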
sin (𝑥) may be calculated via its Taylor polynomial The Taylor series is sin (𝑥) = 𝑥 −𝑥 3 /3!+𝑥 5 /5!−𝑥 7 /7!+· · · .
We might do a practical calculation by deciding that a sufficiently good approximation is to terminate
that series at the 𝑥 5 term, giving a Taylor polynomial.
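For instance, terminating the series at the 𝑥⁵ term gives this approximation (a sketch of ours), which is quite accurate near 𝑥 = 0.

```python
import math

def taylor_sin(x):
    # sin(x) ~ x - x^3/3! + x^5/5!, the Taylor series truncated at x^5
    return x - x**3 / math.factorial(3) + x**5 / math.factorial(5)

print(abs(taylor_sin(0.5) - math.sin(0.5)))  # error on the order of 1e-6
```

By the alternating series bound the error at 𝑥 is at most |𝑥|⁷/7!, so near zero a handful of terms suffices.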
C Shannon See this profile of him: http://www.newyorker.com/tech/elements/claude-shannon-
the-father-of-the-information-age-turns-1100100.
master’s thesis His paper on the subject was his master’s thesis, https://en.wikipedia.org/wiki/A_Sy
mbolic_Analysis_of_Relay_and_Switching_Circuits.
type of not gate This shows an N-type Metal Oxide Semiconductor Transistor. There are many other types.
the von Neumann architecture Although, that architecture was based on the work of JP Eckert and
J Mauchly.
problem of humans living on Mars To get there the idea was to use a rocket ship impelled by dropping
atom bombs out the bottom; the energy would let the ship move rapidly around the solar system. This
sounds like a crank plan but it is perfectly feasible (Brower 1983). Having been a key person in the
development of the atomic bomb, von Neumann was keenly aware of their capabilities.
Game of Life Conway explains it here: https://www.youtube.com/watch?v=E8kUJL04ELA .
J Conway Conway was a magnetic person and extraordinarily creative. Sadly, he died in the Covid-19
pandemic. See an excerpt from the excellent biography at https://www.ias.edu/ideas/2015/rober
ts-john-horton-conway.
M Gardner’s celebrated Mathematical Games column of Scientific American in October 1970 (Gardner 1970)
computer craze (Bellos 2014)
zero-player game See https://www.youtube.com/watch?v=R9Plq-D1gEk.
B Gosper With R Greenblatt, he started the hacker community, and is particularly well-known among
Lisp-ers.
a rabbit Discovered by A Trevorrow in 1986.
anything that can be mechanically computed (Rendell 2011)
Here we will produce a simplified variant There are a number of variants in the literature. For instance,
the hyperoperation used here is not the function actually introduced by Ackermann, which has three
inputs. And another student of Hilbert’s, G. Sudan, produced a similar function at roughly the same
time and for the same purpose, of being computable but not primitive recursive. The hyperoperation
itself was defined in 1948 by R Goodstein.
it is not primitive recursive This presentation is based on that of (Hennie 1977), (Smoryński 1991),
and (Robinson 1948).
This variant In addition to Péter, development of this variant also came from R Robinson.
a function is primitive recursive See the history at (Brock 2020).
LOOP (Meyer and Ritchie 1966)
the interpreter for LOOP Adapted from (Schnieder 2001).
Background
Deep Field movie https://www.youtube.com/watch?v=yDiD8F9ItX0
two paradoxes These are what Quine calls veridical paradoxes: they may at first seem absurd but we will
demonstrate that they are nonetheless true. (Wikipedia contributors 2018)
Galileo’s Paradox He did not invent it but he gave it prominence in his celebrated Discourses and
Mathematical Demonstrations Relating to Two New Sciences.
same cardinality Numbers have two natures. First, in referring to the set of stars known as the Pleiades
as the “Seven Sisters” we mean to take them as a set, not ordered in any way. In contrast, second, in
referring to the “Seven Deadly Sins,” well, clearly some of them score higher than others. The first
reference speaks to the cardinal nature of numbers and the second to their ordinal nature. For finite
numbers the two are bound together, as Lemma 1.5 says, but for infinite numbers they differ.
was proposed by G Cantor in the 1870’s For his discoveries, Cantor was reviled by a prominent mathematician
and former professor L Kronecker as a “corrupter of youth.” That was pre-Elvis.
which is Cantor’s definition (Gödel 1964)
the most important infinite set is the natural numbers, N = { 0, 1, 2, ... } Its existence is guaranteed by the
Axiom of Infinity, one of the standard axioms of Mathematics, the Zermelo-Frankel axioms.
lexicographic order Sometimes called lexical order or dictionary order.
due to Zeno Zeno gave a number of related paradoxes of motion. See (Wikipedia contributors 2016g)
(Huggett 2010), (Bragg 2016), as well as http://www.smbc-comics.com/comic/zeno and this xkcd.
Courtesy xkcd.com
the distances 𝑥𝑖+1 − 𝑥𝑖 shrink toward zero, there is always further to go because of the open-endedness at
the left of the interval ( 0 .. ∞) A modern paradox that, like this one, uses the open-endedness of the
numbers is Thomson’s Lamp Paradox: a person turns on the room lights and then a minute later turns
them off, a half minute later turns them on again, and a quarter minute later turns them off, etc. After
two minutes, are the lights on or off? This paradox was devised in 1954 by J F Thomson to analyze the
possibility of a supertask, the completion of an infinite number of tasks. Thomson’s answer was that it
creates a contradiction: “It cannot be on, because I did not ever turn it on without at once turning it off.
It cannot be off, because I did in the first place turn it on, and thereafter I never turned it off without at
once turning it on. But the lamp must be either on or off ” (Thomson 1954). See also the discussion of
the Littlewood Paradox (Wikipedia contributors 2016d).
Number the diagonals Really, these are the anti-diagonals, since the diagonal is composed of the pairs
⟨𝑛, 𝑛⟩ .
arithmetic series with total 𝑑 (𝑑 + 1)/2 It is called the 𝑑 -th triangular number.
cantor (𝑥, 𝑦) = 𝑥 + [(𝑥 + 𝑦)(𝑥 + 𝑦 + 1)/2] The Fueter-Pólya Theorem says that this is essentially the
only quadratic function that serves as a pairing; see (Smoryński 1991). More precisely, the only
real-coefficient quadratic polynomials in two variables giving a correspondence from N2 to N are 𝑝 (𝑥, 𝑦)
and 𝑝 (𝑦, 𝑥) , where 𝑝 (𝑎, 𝑏) = 𝑎 + [(𝑎 + 𝑏)(𝑎 + 𝑏 + 1)/2] . No one knows whether there are pairing
functions that are any other kind of polynomial.
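As a quick concrete check of the pairing (our sketch): the pairs on the first five anti-diagonals map exactly onto 0 through 14, each value hit exactly once.

```python
def cantor(x, y):
    return x + (x + y) * (x + y + 1) // 2

# Enumerate pairs anti-diagonal by anti-diagonal, where d = x + y.
values = sorted(cantor(x, d - x) for d in range(5) for x in range(d + 1))
print(values == list(range(15)))  # True: a bijection onto {0, ..., 14}
```

Each anti-diagonal 𝑑 contributes the consecutive block of values from 𝑑(𝑑 + 1)/2 up through 𝑑(𝑑 + 1)/2 + 𝑑, which is why the blocks tile N with no gaps or overlaps.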
memoization The term was invented by Donald Michie (Wikipedia contributors 2016b), who among other
accomplishments was a coworker of Turing’s in the World War II effort to break the German secret
codes.
assume that we have a family of correspondences 𝑔 𝑗 : N → 𝑆ˆ𝑗 To pass from the original collection of
infinitely many onto functions 𝑔𝑖 : N → 𝑆ˆ𝑖 to a single, uniform family of onto functions 𝑔 𝑗 (𝑦) = 𝐺 ( 𝑗, 𝑦)
we need some version of the Axiom of Choice, perhaps Countable Choice. In this book we assume a
suitable Choice axiom, and we omit further discussion of that because it would take us far afield.
doesn’t matter much For more on “much” see (Rogers 1958).
adding the instruction 𝑞 𝑗+𝑘 BB𝑞 𝑗+𝑘
This is essentially what a compiler calls ‘unreachable code’ in that it is not a state that the machine will
ever be in.
central to the entire subject The classic text (Rogers 1987) says, “It is not inaccurate to say that our theory
is, in large part, a ‘theory of diagonalization’.”
This technique is diagonalization The argument just sketched is often called Cantor’s diagonal proof,
although it was not Cantor’s original argument for the result, and although the argument style is not
due to Cantor but instead to Paul du Bois-Reymond. The fact that scientific results are often attributed
to people who are not their inventor is Stigler’s law of eponymy. Naturally it wasn’t invented by Stigler
(who attributes it to Merton). In mathematics this is called Boyer’s Law, who didn’t invent it either.
(Wikipedia contributors 2015).
Musical Chairs It starts with more children than chairs. Some music plays and the children walk around
the chairs. When the music suddenly stops, each child tries to sit, leaving someone without a chair. That
child has to leave the game, a chair is removed, and the game proceeds.
so many reals This is a Pigeonhole Principle argument.
That is true but the proof is beyond our scope Also beyond our scope is the argument that for any two sets,
one of them has cardinality less than or equal to the other. This is equivalent to the Axiom of Choice.
consider this element of P (𝑆) This is sometimes called the Russell set because of its relation to Russell’s
paradox. See also this XKCD.
Courtesy XKCD
Your study partner is confused about the diagonal argument From (Stack Exchange author Kaktus and
various others 2019).
ENIAC, reconfigure by rewiring. Jean Jennings (left), Marlyn Wescoff (center), and Ruth Lichterman
program the ENIAC, circa 1946. US Army Photo.
A pattern in technology is for jobs done in hardware to migrate to software One story that illustrates the
naturalness of this involves the English mathematician C Babbage, and his protegee A Lovelace. In 1812
Babbage was developing tables of logarithms. These were calculated by computers — the word then
current for the people who computed them by hand. To check the accuracy he had two people do the
same table and compared. He was annoyed at the number of discrepancies and had the idea to build a
machine to do the computing. He got a government grant to design and construct a machine called the
difference engine, which he started in 1822. This was a single-purpose device, what we today would
call a calculator. One person who became interested in the computations was an acquaintance of his,
Lovelace (who at the time was named Byron, as she was the daughter of the poet Lord Byron).
However, this machine was never finished because Babbage had the thought to make a device that
would be programmable, and that was too much of a temptation. Lovelace contributed an extensive
set of notes on a proposed new machine, the analytical engine, and has become known as the first
programmer.
controlled by cards It weaves with hooks whose positions, raised or lowered, are determined by holes
punched in the cards.
have the same output behavior A technical point: Turing machines have a tape alphabet. So a universal
machine’s input or output can only involve symbols that it is defined as able to use. If another machine
has a different tape alphabet then how can the universal machine simulate it? As usual, we define
things so that the universal machine manipulates representations of the other machine’s alphabet. This
is similar to the way that an everyday computer represents decimals using binary.
flowchart Flowcharts are widely used to sketch algorithms; here is one from XKCD.
Courtesy xkcd
through 4 they are 3, 5, 17, 257, and 65537, all of which are prime. Computer searches up to 30 have
not found any more.
a quadrillion, 1 × 1015
See https://github.com/jhg023/brocard.
‘extensional’ This description is derived from A Bauer, in https://cs.stackexchange.com/q/2811/67
754.
dovetailing A dovetail joint is used by carpenters and woodworkers to build strong wood drawers. It
weaves the two sides together alternately, in an interlocking way, as shown here.
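A computational sketch of dovetailing (our illustration): run infinitely many computations by rounds, starting one more computation each round and giving every active one a single step, so that every computation eventually gets arbitrarily many steps.

```python
import itertools

def stream(i):
    # stands in for computation number i; it yields (i, 0), (i, 1), ...
    return ((i, j) for j in itertools.count())

def dovetail():
    active = []
    for d in itertools.count():
        active.append(stream(d))   # start computation d
        for s in active:           # then give every active one a step
            yield next(s)

print(list(itertools.islice(dovetail(), 6)))
# [(0, 0), (0, 1), (1, 0), (0, 2), (1, 1), (2, 0)]
```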
“We shall not go any further into the nature of this oracle apart from saying that it cannot be a machine.”
(Turing 1938b)
magic smoke See (Wikipedia contributors 2017f).
we will instead describe it conceptually For a full treatment see (Rogers 1987).
like the ‘divides’ relation between integers This is not a partial order because 𝐴 ≤𝑇 𝐵 and 𝐵 ≤𝑇 𝐴 does not
imply that 𝐴 = 𝐵 . Instead, this is called a ‘preorder’.
jump further up the order This is analogous to the situation with cardinalities, where taking a power set
jumps to a larger cardinality.
the notion of partial computable function seems to have an built-in defense against diagonalization (Odifreddi
1992), p 152.
this machine’s name is where it halts Nominative determinism is the theory that a person’s name has some
influence over what they do with their life. Examples are: the sprinter Usain Bolt, the US weatherman
Storm Fields, the baseball player Prince Fielder, and the Lord Chief Justice of England and Wales named
Igor Judge, I Judge. See https://en.wikipedia.org/wiki/Nominative_determinism.
considered mysterious, or at any rate obscure For example, “The recursion theorem . . . has one of the most
unintuitive proofs where I cannot explain why it works, only that it does.” (Fortnow and Gasarch 2002)
we say that it is mentioned We can have a lot of fun with the use-mention distinction. One example is the
old wisecrack that answers the statement, “Nothing rhymes with orange” with “No it doesn’t,” that
turns on the distinction between nothing and ‘nothing’. Another example is the conundrum that we all
agree that 1/2 = 3/6, but one of them involves a 3 and the other does not — how can different things
be equal? The resolution of course is that the assertion that they are equal refers to the number that
they represent, not to the representation itself. That is, in mention, ‘1/2’ and ‘3/6’ are different strings
but in use, they point to the same number.
mathematical fable This fable came from David Hilbert in 1924. It was popularized by George Gamow in
One, Two, Three . . . Infinity. (Kragh 2014).
Napoleon’s downfall in the early 1800’s See (Wikipedia contributors 2017d).
period of prosperity and peace See (Wikipedia contributors 2017i).
A A Michelson, who wrote in 1899, “The more important fundamental laws and facts of physical science
have all been discovered, and these are now so firmly established that the possibility of their ever being
supplanted in consequence of new discoveries is exceedingly remote.” Michelson was a major figure,
whose opinions carried weight. From 1901 to 1903 he was president of the American Physical Society.
In 1910–1911 he was president of the American Association for the Advancement of Science and from
1923–1927 he was president of the National Academy of Sciences. In 1907 he received the Copley
Medal from the Royal Society in London, and then the Nobel Prize. He remains well known today for
the Michelson–Morley experiment that tried to detect the presence of aether, the hypothesized medium
through which light waves travel.
working out nature’s rules See https://www.youtube.com/watch?v=o1dgrvlWML4.
many observers thought that we basically had got the rules An example is that Max Planck was advised not
to go into physics by his professor, who said, “in this field, almost everything is already discovered, and
all that remains is to fill a few unimportant holes.” (Wikipedia contributors 2017j)
the discovery of radiation This happened in 1896, before Michelson’s statement. Often the significance of
things takes time to be apparent.
he became an overnight celebrity “Einstein Theory Triumphs” was the headline in The New York Times.
JJ Thomson, president of the Royal Society, referred to the experiment’s success as “one of the
momentous, if not the most momentous, pronouncements in the history of human thought.” And, when
Einstein arrived in New York by boat in 1921, reporters were delighted to find not a stuffy academic
but instead someone who was very endearing, quotable, and photogenic. The hair, the scruffy clothes,
and the violin, all made him seem the personification of a genius, which of course continues today.
“everything is relative.” Of course, the history around Einstein’s work is vastly more complex and subtle.
But we are speaking of the broad understanding, not of the truth.
loss of certainty This phrase is the title of a famous popular book on mathematics, by M Kline. The book is
a fun and thought-provoking read. Also thought-provoking are some criticisms of the book. (Wikipedia
contributors 2019b) is a good introduction to both.
from a proof of the Halting problem, we can get to a proof of Gödel’s Theorem See (Aaronson 2011a). See
also https://math.stackexchange.com/a/53324/12012.
the development of a fetus is that it basically just expands The issue was whether the fetus began preformed
or as a homogeneous mass; see (Maienschein 2017). Today we have similar questions about the Big
Bang — we are puzzled to explain how a mathematical point, which is without internal structure and
entirely homogeneous, could develop into the very non-homogeneous universe that we see today.
infinite regress This line of thinking often depends on the suggestion that all organisms were created at
the same time, that they have existed since the beginning of the posited creation.
development by Darwin and Wallace of the theory of differential reproduction through natural selection
Darwin wrote in his autobiography, “The old argument of design in nature, as given by Paley, which
formerly seemed to me so conclusive, fails, now that the law of natural selection has been discovered.
We can no longer argue that, for instance, the beautiful hinge of a bivalve shell must have been made
by an intelligent being, like the hinge of a door by man. There seems to be no more design in the
variability of organic beings and in the action of natural selection, than in the course which the wind
blows. Everything in nature is the result of fixed laws.”
the rug is less complex than the machine This is an information theoretic analog of the Second Law of
Thermodynamics. E Musk has tweeted something of the same sentiment, “The extreme difficulty of
scaling production of new technology is not well understood. It’s 1000% to 10,000% harder than making
a few prototypes. The machine that makes the machine is vastly harder than the machine itself.” See
https://twitter.com/elonmusk/status/1308284091142266881.
self-reference ‘Self-reference’ describes something that refers to itself. The classic example is the Liar
paradox, the statement attributed to the Cretan Epimenides, “All Cretans are liars.” Because he is
Cretan we take the statement to be an utterance about utterances by him, that is, to be about itself. If
we suppose that the statement is true then it asserts that anything he says is false, so the statement is
false. But if we suppose that it is false then we take him to be telling the truth, that all his statements
are false. It’s a paradox, meaning that the reasoning seems locally sound but it leads to a global
impossibility.
This is related to Russell’s paradox, which lies at the heart of the diagonalization technique, that if we
define the collection of sets 𝑅 = {𝑆 | 𝑆 ∉ 𝑆 } then 𝑅 ∈ 𝑅 holds if and only if 𝑅 ∉ 𝑅 holds.
Self-reference is obviously related to recurrence. You see it sometimes pictured as an infinite recurrence,
as here on the front of a chocolate product.
Because of this product, having a picture contain itself is sometimes known as the Droste effect.
Besides the Liar paradox there are many others. One is Quine’s paradox, a sentence that asserts its own
falsehood.
If this sentence were false then it would be saying something that is true. If this sentence were true
then what it says would hold and it would be not true.
A wonderful popular book exploring these topics and many others is (Hofstadter 1979).
quine Named for the philosopher Willard Van Orman Quine.
The verb ‘to quine’ Invented by D Hofstadter. It traces back to the statement due to the philosopher
W Quine, “yields falsehood when preceded by its quotation” yields falsehood when preceded by its quotation
which has the paradoxical quality that if true it asserts its own falsehood, and if false it must be true.
And it accomplishes that without direct self-reference.
We can express that in code The development of this part of the subsection comes from (Boro Sitnikovski
2024) — also where the name Boro comes from — and (Avigad 2007).
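For instance, here is a classic minimal Python quine (a standard construction, not necessarily the one developed in the text): the string s is *mentioned*, via %r, inside its own *use*.

```python
# A program whose output is its own source code. The %r conversion embeds
# the repr of s (its mention) inside the formatted result (its use), and
# %% escapes the literal percent sign.
s = 's = %r\nprint(s %% s)'
print(s % s)
```

Running it prints exactly its own two lines of source, with no file reading and no direct self-reference, just as Quine's sentence manages self-description through quotation.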
which 𝑛 -state Turing Machine does the most computational work before halting R H Bruck wrote (Bruck
1953), “I might compare the high-speed computing machine to a remarkably large and awkward pencil
which takes a long time to sharpen and cannot be held in the fingers in the usual manner so that it
gives the illusion of responding to my thoughts, but is fitted with a rather delicate engine and will write
like a mad thing provided I am willing to let it dictate pretty much the subjects on which it writes.”
The Busy Beaver machine is the maddest writer of that size.
Think of this as a competition Two very nice videos on this subject are The Boundary of Computation and
What happens at the Boundary of Computation? from YouTube contributor Mutual Information.
In the 1962 paper Radó This paper (Radó 1962) is exceptionally clear and interesting.
In 2024, a team of researchers See (Brubaker 2024).
odd perfect number A number is perfect if it is the sum of its proper divisors. For instance, 6 = 1 + 2 + 3. Even
perfect numbers exist but we do not know if odd ones do.
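A direct brute-force check of perfectness (our sketch, fine for small 𝑛):

```python
def is_perfect(n):
    # a number is perfect when it equals the sum of its proper divisors
    return n == sum(d for d in range(1, n) if n % d == 0)

print([n for n in range(2, 500) if is_perfect(n)])  # [6, 28, 496]
```

Every perfect number found so far is even; no search or proof has yet produced an odd one, or ruled them out.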
machines with three or more symbols The case of machines with three states and three symbols is
not known. Solving it requires solving a Collatz-like problem that currently no one can do. See
https://www.sligocki.com/2023/10/16/bb-3-3-is-hard.html.
BB (𝑛) is unknowable See (Aaronson 2012a) and the excellent summary (Aaronson 2020). See also
https://www.quantamagazine.org/the-busy-beaver-game-illuminates-the-fundamental-limits-of-math-20201210/.
a 7918-state Turing machine The number of states needed has since been reduced. As of this writing it is
748. See the wonderful bachelor’s degree thesis by J Riebel at https://www.ingo-blechschmidt.eu/assets/bachelor-thesis-undecidability-bb748.pdf.
the standard axioms for Mathematics This is ZFC, the Zermelo–Fraenkel axioms with the Axiom of Choice.
(In addition, they also took the hypothesis of the Stationary Ramsey Property.)
take the floor Let the 𝑛-th triangle number be t(n) = 0 + 1 + · · · + n = n(n + 1)/2. The function t is
monotonically increasing and there are infinitely many triangle numbers. Thus for every natural number c
there is a unique triangle number t(n) that is maximal so that c = t(n) + k for some k ∈ N. Because
t(n + 1) = t(n) + n + 1, we see that k < n + 1, that is, k ≤ n. Thus, to compute the diagonal number d
from the Cantor number c of a pair, we have (1/2)d(d + 1) ≤ c < (1/2)(d + 1)(d + 2). Applying the
quadratic formula to the left and right halves gives (1/2)(−3 + √(1 + 8c)) < d ≤ (1/2)(−1 + √(1 + 8c)).
Taking (1/2)(−1 + √(1 + 8c)) to be α gives that d ∈ (α − 1 .. α], so that d = ⌊α⌋. (SE author Brian M.
Scott 2020)
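As a sketch in Python (the function names here are invented for illustration), the floor formula above inverts the pairing:

```python
import math

def cantor_pair(a, b):
    """Cantor pairing, one common convention: number the pairs along
    diagonals, so (a, b) maps to t(a + b) + b where t is triangular."""
    d = a + b
    return d * (d + 1) // 2 + b

def cantor_unpair(c):
    """Invert the pairing using the floor formula from the note."""
    # alpha = (1/2)(-1 + sqrt(1 + 8c)); the diagonal number is floor(alpha)
    alpha = (-1 + math.sqrt(1 + 8 * c)) / 2
    d = math.floor(alpha)
    k = c - d * (d + 1) // 2   # offset within the diagonal, 0 <= k <= d
    return d - k, k
```

For example, cantor_unpair(0) recovers (0, 0), and pairing then unpairing round-trips for every natural number.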
we can extend to tuples of any size See https://en.wikipedia.org/wiki/You_aren%27t_gonna_need_it.
Languages
having elephants move to the left side of a road or to the right Less fancifully, we could be making a Turing
machine out of LEGO bricks and want to keep track by sliding a block from one side of a column to the other.
Or, we could use an abacus.
we could translate any such procedure While a person may quite sensibly worry that elephants could be
not just on the left side or the right, but in any of the continuum of points in between, we will make
this assertion without more philosophical analysis than by just referring to the discrete nature of our
mechanisms (as Turing basically did). That is, we take it as an axiom.
finite set { 1000001, 1100001 } Although it looks like two strings plucked from the air, the language is not
without sense. The bitstring 1000001 represents capital A in the ASCII encoding, while 1100001 is lower
case a. The American Standard Code for Information Interchange, ASCII, is a widely used, albeit quite
old, way of encoding character information in computers. The most common modern character encoding
is UTF-8, which extends ASCII. For the history see https://www.cl.cam.ac.uk/~mgk25/ucs/utf-8-history.txt.
palindrome Sometimes people call Psychology the study of college freshmen because so many studies
start, roughly, “we put a bunch of college freshmen in a room, lied to them about what we were doing,
and . . . ” In the same way, Theory of Computing can sometimes seem like the study of palindromes.
palindromes in English Some people like to move beyond single word palindromes to make sentence-length
palindromes that make some sense. Some of the more famous are: (1) supposedly the first sentence
ever uttered, “Madam, I’m Adam” (2) Napoleon’s lament, “Able was I ere I saw Elba” and (3) “A man, a
plan, a canal: Panama”, about Theodore Roosevelt. See also http://norvig.com/palindrome.html.
defining Σ∗ to be the set of strings of characters from that alphabet That is, we won’t be careful to distinguish
between the symbols of the alphabet and the single-character strings consisting of just those characters.
In practice usually a language is governed by rules Linguists started formalizing the description of language,
including phrase structure, at the start of the 1900’s. Meanwhile, string rewriting rules as formal,
abstract systems were introduced and studied by mathematicians including Axel Thue in 1914, Emil
Post from the 1920’s through the 1940’s and Turing in 1936. Noam Chomsky, while teaching linguistics
to students of information theory at MIT, combined linguistics and mathematics by taking Thue’s
formalism as the basis for the description of the syntax of natural language. (Wikipedia contributors
2017e)
“the red big barn” sounds wrong. Experts vary on the exact rules but one source gives the correct order
as (article) + number + judgment/attitude + size, length, height + age + color + origin + material
+ purpose + (noun), so that “big red barn” is size + color + noun, as is “little green men.” This is
called the Royal Order of Adjectives; see http://english.stackexchange.com/a/1159. A person
may object by citing “big bad wolf ” but it turns out there is another, stronger, rule that if there are
three words then they have to go I-A-O and if there are two words then the order has to be I followed
by either A or O. Thus we have tick tock but not tock tick. Similarly for tic-tac-toe, mishmash, King
Kong, or dilly dally.
very strict rules Everyone who has programmed has had a compiler chide them about a syntax violation.
grammars are the language of languages. From Matt Might, http://matt.might.net/articles/grammars-bnf-ebnf/.
this grammar Taken from https://en.wikipedia.org/wiki/Formal_grammar.
dangling else See https://en.wikipedia.org/wiki/Dangling_else.
postal addresses. Adapted from https://en.wikipedia.org/wiki/BackusNaur_Form.
Recall Turing’s prototype computer The fact that in this book we stick to grammars where each rule head
is a single nonterminal greatly restricts the languages that we can compute. More general grammars
can compute more, including every set that can be decided by a Turing machine.
we often state problems For instance, see the blogfeed for Theoretical Computer Science http://cstheory-feed.org/ (Various authors 2017)
represent graphs Example 3.2 makes the point that a graph is about the connections between vertices, not
about how it is drawn. This graph representation via a matrix also illustrates that point because it is,
after all, not drawn.
the most common way to express grammars One factor influencing its adoption was a letter that D Knuth
wrote to the Communications of the ACM (Knuth 1964). He listed some advantages over the grammar-
specification methods that were then widely used. Most importantly, he contrasted BNF’s more
descriptive elements such as using ‘<addition operator>’ instead of ‘A’, saying that the difference is a
great addition to “the explanatory power of a syntax.” He also proposed the name ‘Backus Naur Form’.
(Now a hyphen is most common.)
some extensions for grouping and replication The best current standard is https://www.w3.org/TR/xml/.
Time is a difficult engineering problem One complication of time, among many, is leap seconds. The Earth is
constantly undergoing deceleration caused by the braking action of the tides. The average deceleration
of the Earth is roughly 1.4 milliseconds per day per century, although the exact number varies from year
to year depending on many factors, including major earthquakes and volcanic eruptions. To ensure
that atomic clocks and the Earth’s rotational time do not differ by more than 0.9 seconds, occasionally
an extra second is added to civil time. This leap second can be either positive or negative depending on
the Earth’s rotation — on occasion there are minutes with only 58 seconds, and on occasion minutes
with 60.
Adding to the confusion is that the changes in rotation are uneven and we cannot predict leap seconds
far into the future. The International Earth Rotation Service publishes bulletins that announce leap
seconds with a few weeks warning. Thus, there is no way to determine how many seconds there will
be between the current instant and, say, ten years from now. (This can cause trouble in area such as
navigation and high-frequency trading and there are proposals to eliminate leap seconds or replace
them with leap hours.) Since the first leap second in 1972, all leap seconds have been positive and
there were 23 leap seconds in the 34 years to January 2006. (U.S. Naval Observatory 2017)
RFC 3339 (Klyne and Newman 2002)
strings such as 1958-10-12T23:20:50.52Z This format has a number of advantages including human
readability, that if you sort a collection of these strings then earlier times will come earlier, simplicity
(there is only one format), and that they include the time zone information.
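The sortability claim is easy to demonstrate; here are a few sample timestamps (invented, all in the UTC “Z” form, where lexicographic order agrees with chronological order):

```python
# sample RFC 3339 timestamps, all in UTC (Z) form
stamps = ["1996-12-19T16:39:57Z",
          "2024-10-12T08:30:00Z",
          "1958-10-12T23:20:50.52Z"]
chronological = sorted(stamps)  # plain string sort; no date parsing needed
```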
a BNF grammar Some notes: (1) Coordinated Universal Time, the basis for civil time, is often called UTC,
but is sometimes abbreviated Z and read aloud as “Zulu,” (2) years are four digits to prevent the Y2K
problem (Encyclopædia Britannica Editors 2017), (3) the only month numbers allowed are 01–12 and
in each month only some day numbers are allowed, and (4) the only time hours allowed are 00–23,
minutes must be in the range 00–59, etc. (Klyne and Newman 2002)
Automata
what can be done by a machine having a number of possible configurations that is bounded From Rabin,
Scott, Finite Automata and Their Decision Problems, 1959: Turing machines are widely considered to be
the abstract prototype of digital computers; workers in the field, however, have felt more and more that the
notion of a Turing machine is too general to serve as an accurate model of actual computers. It is well
known that even for simple calculations it is impossible to give an a priori upper bound on the amount of
tape a Turing machine will need for any given computation. It is precisely this feature that renders Turing’s
concept unrealistic. In the last few years the idea of a finite automaton has appeared in the literature. These
are machines having only a finite number of internal states that can be used for memory and computation.
The restriction on finiteness appears to give a better approximation to the idea of a physical machine. Of
course, such machines cannot do as much as Turing machines, but the advantage of being able to compute
an arbitrary general recursive function is questionable, since very few of these functions come up in practical
applications.
transition function Δ : 𝑄 × Σ → 𝑄 Some authors allow the transition function to be partial. That is, some
authors allow that for some state-symbol pairs there is no next state. This choice by an author is a
matter of convenience, as for any such machine you can create an error state 𝑞 error or dead state, that is
not an accepting state and that transitions only to itself, and send all such pairs there. This transition
function is total, and the new machine has the same collection of accepted strings as the old.
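A sketch of that construction, with a hypothetical representation of the transition function as a Python dict:

```python
def totalize(delta, states, alphabet, error="q_error"):
    """Complete a partial transition table: send every missing
    (state, symbol) pair to a non-accepting dead state that
    transitions only to itself.  The accepted language is unchanged."""
    total = dict(delta)
    for q in list(states) + [error]:
        for a in alphabet:
            total.setdefault((q, a), error)  # the dead state loops forever
    return total

# a partial machine over {0,1}: no move is defined from "q1" on 0
partial = {("q0", "0"): "q0", ("q0", "1"): "q1", ("q1", "1"): "q0"}
full = totalize(partial, ["q0", "q1"], ["0", "1"])
```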
Unicode While in the early days of computers characters could be encoded with standards such as ASCII,
which includes only upper and lower case unaccented letters, digits, a few punctuation marks, and a
few control characters, today’s global interconnected world needs more. The Unicode standard assigns
a unique number called a code point to every character in every language (to a fair approximation).
See (Wikipedia contributors 2017l).
if a language is finite then there is a Finite State machine that accepts a string if and only if it is a member of
that language In practice the suggestion that for any finite set of strings there is a Finite State machine
that accepts it, simply by listing all of the cases, may not be reasonable. For example, there are finitely
many people and each has finitely many active phone numbers so the set of all currently-active phone
numbers is a finite language. But constructing a machine for it would be silly. In addition, a finite
language doesn’t have to be large for it to be difficult, in a sense. Take Goldbach’s conjecture, that
every even number greater than 2 is the sum of two primes, as in 4 = 2 + 2, 6 = 3 + 3, 8 = 3 + 5, . . .
Computer testing shows that this pattern continues to hold up to very large numbers but no one knows
if it is true for all evens. Now consider the set consisting of the string 𝜎 ∈ { 0, ... 9 }∗ representing in
decimal the smallest even number that is not the sum of two primes. This set is finite since it has either
one member or none. But while that set is tiny, we don’t know what it contains.
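The computer testing that the note mentions is straightforward to sketch:

```python
def is_prime(m):
    if m < 2:
        return False
    return all(m % d != 0 for d in range(2, int(m ** 0.5) + 1))

def goldbach_witness(n):
    """Return primes (p, q) with p + q = n, or None if none exist."""
    for p in range(2, n // 2 + 1):
        if is_prime(p) and is_prime(n - p):
            return p, n - p
    return None

# every even number tested so far has a witness; nobody knows if all do
checked = all(goldbach_witness(n) for n in range(4, 1000, 2))
```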
simple devices The devices to do the switching were invented in 1889 by an undertaker whose competitor’s
wife was the local telephone operator and routed calls to her husband’s business. (Wikipedia contributors
2017b)
allowed users to directly dial long distance in North America See the description of the North America
Numbering Plan (Wikipedia contributors 2017g).
same-area local exchange Initially, large states, those divided into multiple numbering plan areas, were
assigned area codes with a 1 in the second position. Areas that covered entire states or provinces got
codes with 0 as the middle digit. That was abandoned by the early 1950’s. (Wikipedia contributors
2017g).
Alcuin of York (735–804) See https://www.bbc.co.uk/programmes/m000dqy8.
a wolf, a goat, and a bundle of cabbages This translation is from A Raymond, from the University of
Washington.
that of finding the shortest circuit visiting every city in a list See https://nbviewer.jupyter.org/url/norvig.com/ipython/TSP.ipynb.
US lower forty eight states See https://wiki.openstreetmap.org/wiki/TIGER.
no-state What is no-state, exactly? We can think that it is like what happens if you write a program with a
sequence of if-then statements and forget to include an else. Obviously the computer goes somewhere,
the instruction pointer points to some address, but what happens is not sensible in terms of the model
that you’ve stated.
As an alternate, the wonderful book (Hofstadter 1979) describes a place named Tumbolia, which is
where holes go when they are filled, and also where your lap goes when you stand up. Perhaps the
machines go there.
amb (...) This operator takes a list of possibilities and evaluates to an option, if one is available, that
allows the program as a whole to succeed. Here is a small example (from https://rosettacode.org/wiki/Amb):
first let (x (amb 1 2 3)) and (y (amb 5 4 3)), then require (amb (= (* x y) 8)). The result is that
x has the value 2, while y is 4. That is, (amb 1 2 3) chooses the future in which x has value 2, and
(amb 5 4 3) chooses 4, in order to ensure that (= (* x y) 8) succeeds.
These operators were described by John McCarthy in (McCarthy 1963). “Ambiguous functions are
not really functions. For each prescription of values to the arguments the ambiguous function has a
collection of possible values. An example of an ambiguous function is less (𝑛) defined for all positive
integer values of 𝑛 . Every non-negative integer less than 𝑛 is a possible value of less (𝑛) . First we
define a basic ambiguity operator amb (𝑥, 𝑦) whose possible values are 𝑥 and 𝑦 when both are defined:
otherwise, whichever is defined. Now we can define less (𝑛) by less (𝑛) = amb (𝑛 − 1, less (𝑛 − 1)) .”
demon The term ‘demon’ arose from Maxwell’s demon. This is a thought experiment created in 1867 by
the physicist J C Maxwell about the second law of thermodynamics, which says that it takes energy to
raise the temperature of a sealed system. Maxwell imagined a chamber of gas with a door controlled by
an all-knowing demon. When the demon sees a slow-moving gas molecule approaching, it opens the door
and lets that molecule out of the chamber, thereby raising the average temperature of the gas remaining
inside without any external heat. See (Wikipedia contributors 2019c).
Pronounced KLAY-nee His son, Ken Kleene, wrote, “As far as I am aware this pronunciation is incorrect in
all known languages. I believe that this novel pronunciation was invented by my father.” (Free Online
Dictionary of Computing (Denis Howe) 2017)
mathematical model of neurons (Wikipedia contributors 2017c)
have a vowel in the middle Most speakers of American English cite the vowels as ‘a’, ‘e’, ‘i’, ‘o’, and ‘u’. See
(Bigham 2014).
before and after diagrams This is derived from (Hopcroft, Motwani, and Ullman 2001).
The fact that we can describe these languages in so many different ways (Stack Exchange author David
Richerby 2018).
performing that operation on its members always yields another member Familiar examples are that adding
two integers always gives an integer so the integers are closed under the operation of addition, and
that squaring an integer always results in an integer so that the integers are closed under squaring.
the machine accepts at least one string of length 𝑘 , where 𝑛 ≤ 𝑘 < 2𝑛 This gives an algorithm that inputs a
Finite State machine and determines, in a finite time, if it recognizes an infinite language.
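A sketch of that algorithm (with a hypothetical dict-based machine representation): for each length k, compute the set of states reachable by some string of length k, and look for an accepting state at some k with n ≤ k < 2n.

```python
def accepts_infinitely_many(delta, start, accepting, alphabet, n):
    """A machine with n states recognizes an infinite language iff
    it accepts some string of length k where n <= k < 2n."""
    reachable = {start}                  # states reachable by the length-0 string
    for k in range(1, 2 * n):
        reachable = {delta[(q, a)] for q in reachable for a in alphabet}
        if k >= n and reachable & set(accepting):
            return True
    return False

# a two-state machine over {0,1} accepting the strings that end in 1
delta = {("e", "0"): "e", ("e", "1"): "o", ("o", "0"): "e", ("o", "1"): "o"}
```

This runs in time polynomial in the size of the machine, unlike trying all strings one at a time.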
reserve a character We use it only to mark the stack bottom, never in the middle of the stack.
We are ready for the definition There are a variety of definitions for Pushdown machines. For instance,
here we have the machine accepts if its tape is empty and it is in an accepting state, but a variant
requires that its stack be empty. However, all of these variants that extend nondeterministic machines
accept the same set of languages.
Δ : 𝑄 × (Σ ∪ { B, 𝜀 }) × (Γ ∪ { ⊥ }) → P (𝑄 × (Γ ∪ { ⊥ })∗ ) Stack outputs consist of sequences of elements
of Γ that optionally end in a ⊥. So a more precise codomain is P (𝑄 × 𝑆) for 𝑆 = Γ ∗ ∪ (Γ ∗ ⌢ { ⊥ }) .
without proof An excellent source for more is (Hopcroft, Motwani, and Ullman 2001).
including C, Java, Python, and Racket This is a good approximation but the full story is more complicated.
Usually the set of programs accepted by the parser is a subset of a context free language, conditioned
on some additional rules that the parser enforces. For example, in C every variable must appear in a
declaration inside an enclosing scope, which is clearly a context-sensitive constraint. Another example
is that in Python all the whitespace prefixes inside a block have to be the same length, which again is a
context-sensitive constraint.
\d We shall ignore cases of non-ASCII digits, that is, cases outside 0–9. Unicode includes many different
sets of graphemes for the decimal digits, along with non-decimal numerals such as Roman numerals.
There are also a number of typographical variations of the ASCII numerals provided for specialized
mathematical use and for compatibility with earlier character sets, such as circled digits sometimes
used for itemization.
ZIP codes ZIP stands for Zone Improvement Plan. The system has been in place since 1963 so it, like the
music movement called ‘New Wave’, is an example of the danger of naming your project something that
will become obsolete if that project succeeds.
a colon and two forward slashes The inventor of the World Wide Web, T Berners-Lee, has admitted that
the two slashes don’t have a purpose (Firth 2009).
more power than the theoretical regular expressions that we studied earlier Omitting this power, and keeping
the implementation in sync with the theory, has the advantage of speed. See (Cox 2007).
It is described by the regex It is credited to the Perl hacker Abigail, from http://abigail.be/.
valid email addresses This expression follows the RFC 822 standard. The full listing is at
http://www.ex-parrot.com/pdw/Mail-RFC822-Address.html. It is due to Paul Warren who did not write it by hand
but instead used a Perl program to concatenate a simpler set of regular expressions that relate directly
to the grammar defined in the RFC. To use the regular expression, should you be so reckless, you would
need to remove the formatting newlines.
J Zawinski The post is from alt.religion.emacs on 1997-Aug-12. For some reason it keeps disappearing
from the online archive. The full discussion reveals that the quote, taken alone, is more dogmatic than
the complete assertion. One response to the quote is, “Some people, when confronted with a problem,
think ‘I know, I’ll quote Jamie Zawinski.’ Now they have two problems.” (Martin Liebach, 2009-Mar-04,
https://m.lieba.ch/2009/03/04/regex-humor/)
Now they have two problems. A classic example is trying to use regular expressions to parse an HTML
document. Sometimes scraping a fixed document to get some needed data by using regexes is just
what you need, quick and not too hard. But to parse significant parts of an HTML document, or to try
to anticipate possible changes, just leads to horrors. See (Stack Exchange author bobnice 2009).
regex golf See https://alf.nu/RegexGolf, and https://nbviewer.jupyter.org/url/norvig.com/ipython/xkcd1313.ipynb.
John Myhill Sr 1923–1987 and Anil Nerode b 1932 Photo credits Paul Halmos, Jason Koski/Cornell
University
the two machines are essentially the same The two machines are said to be ‘isomorphic’.
Hopcroft’s algorithm See (Knuutila 2001)
Complexity
mirrors the subject’s history This is like the slogan “ontogeny recapitulates phylogeny” for the now-
discredited biological theory that the development of an embryo, which is called ontogeny, goes through
the same stages as the adult stages in the evolution of the animal’s ancestors, which is phylogeny.
A natural next step is to look to do jobs efficiently S Aaronson states it more provocatively as, “[A]s
computers became widely available starting in the 1960s, computer scientists increasingly came to see
computability theory as not asking quite the right questions. For, almost all the problems we actually
want to solve turn out to be computable in Turing’s sense; the real question is which problems are
efficiently or feasibly computable.” (Aaronson 2011b)
A Karatsuba See https://en.wikipedia.org/wiki/Anatoly_Karatsuba.
clever algorithm The idea is: let k = ⌈n/2⌉ and write x = x_1·2^k + x_0 and y = y_1·2^k + y_0 (so for instance,
678 = 21 · 2^5 + 6 and 42 = 1 · 2^5 + 10). Then xy = A · 2^{2k} + B · 2^k + C where A = x_1·y_1, and
B = x_1·y_0 + x_0·y_1, and C = x_0·y_0 (for example, 28 476 = 21 · 2^10 + 216 · 2^5 + 60). The multiplications
by 2^{2k} and 2^k are just bit-shifts to known locations independent of the values of x and y, so they don’t
affect the time much. But the two multiplications for B seem to remove all the advantage and still give
n^2 time. However, Karatsuba noted that B = (x_0 + x_1) · (y_0 + y_1) − A − C. Boom: done. Just one
multiplication.
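The idea translates directly to code; here is a sketch (the base-case cutoff of 16 is an arbitrary choice for illustration):

```python
def karatsuba(x, y):
    """Multiply using three recursive multiplications instead of four:
    xy = A*2^(2k) + B*2^k + C, with B = (x0 + x1)(y0 + y1) - A - C."""
    if x < 16 or y < 16:                    # small cases: multiply directly
        return x * y
    k = max(x.bit_length(), y.bit_length()) // 2
    x1, x0 = x >> k, x & ((1 << k) - 1)     # x = x1*2^k + x0
    y1, y0 = y >> k, y & ((1 << k) - 1)     # y = y1*2^k + y0
    A = karatsuba(x1, y1)
    C = karatsuba(x0, y0)
    B = karatsuba(x0 + x1, y0 + y1) - A - C  # the one extra multiplication
    return (A << 2 * k) + (B << k) + C       # shifts are cheap
```

For instance, karatsuba(678, 42) reproduces the 28 476 worked out in the note.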
The ‘ 𝑓 = O (𝑔) ’ notation is very common, but awkward See also https://whystartat.xyz/wiki/Big_O_
notation.
our conclusions in the continuous context carry over to the discrete It does not cover some functions that we
may use such as the factorial, or those that are only defined for inputs larger than some value 𝑁 , but
this version is easier to understand and makes the same point.
are most common in practice Sometimes in practice intermediate powers are notable. For instance, at this
moment the complexity of matrix multiplication is O(n^2.373), approximately. But most often we work
with natural number exponents.
next table shows why This table is adapted from (Garey and Johnson 1979).
there are 3.16 × 10^7 seconds in a year The easy way to remember this is the bumper sticker slogan by
Tom Duff from Bell Labs: “𝜋 seconds is a nanocentury.”
very, very much larger than polynomial growth According to an old tale from India, the Grand Vizier Sissa
Ben Dahir invented chess. For it, the delighted Indian King granted him a wish. Sissa said, “Majesty,
give me a grain of wheat to place on the first square of the board, and two grains of wheat to place on
the second square, and four grains of wheat to place on the third, and eight grains of wheat to place on
the fourth, and so on. Oh, King, let me cover each of the 64 squares of the board.”
“And is that all you wish, Sissa, you fool?” exclaimed the astonished King.
“Oh, Sire,” Sissa replied, “I have asked for more wheat than you have in your entire kingdom. Nay, for
more wheat than there is in the whole world, truly, for enough to cover the whole surface of the earth
to the depth of the twentieth part of a cubit.”
Sissa has the right idea but his arithmetic is slightly off. A cubit is the length of a forearm, from the tip
of the middle finger to the bottom of the elbow, so perhaps twenty inches. The geometric series formula
gives 1 + 2 + 4 + · · · + 2^63 = 2^64 − 1 = 18 446 744 073 709 551 615 ≈ 1.84 × 10^19 grains of wheat. The
surface area of the earth, including oceans, is 510 072 000 square kilometers. There are 10^10 square
centimeters in each square kilometer so the surface of the earth is 5.10 × 10^18 square centimeters.
That’s between three and four grains of wheat on every square centimeter of the earth. Not wheat an inch
thick, but still a lot.
Another way to get a sense of the amount of wheat is: there are about 7.5 billion people on earth so it is
on the order of 10^9 grains of wheat for each person in the world. There are about 1 000 000 = 10^6 grains
of wheat in a bushel. In sum, thousands of bushels for each person.
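The arithmetic can be checked directly (the grains-per-bushel figure is the rough estimate used in the note):

```python
grains = 2**64 - 1                      # 1 + 2 + 4 + ... + 2^63
earth_km2 = 510_072_000                 # surface area, including oceans
earth_cm2 = earth_km2 * 10**10          # 10^10 cm^2 per km^2
per_cm2 = grains / earth_cm2            # grains on each square centimeter
per_person = grains / 7.5e9             # grains per person on earth
bushels_each = per_person / 1_000_000   # roughly 10^6 grains per bushel
```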
Cobham’s thesis Credit for this goes to both A Cobham and J Edmonds, separately; see (Cobham 1965)
and (Edmunds 1965).
Jack Edmonds, b 1934 Alan Cobham, 1927–2011
Cobham’s paper starts by asking “is it harder to multiply than to add?”, a question that we still
cannot answer. Clearly we can add two 𝑛 -bit numbers in O (𝑛) time, but we don’t know whether we
can multiply in linear time.
Cobham then goes on to point out the distinction between the complexity of a problem and the running
time of a particular algorithm to solve that problem, and notes that many familiar functions, such
as addition, multiplication, division, and square roots, can all be computed in time “bounded by a
polynomial in the lengths of the numbers involved.” He suggests we consider the class of all functions
having this property.
As for Edmonds, in a “Digression” he writes: “An explanation is due on the use of the words ‘efficient
algorithm.’ According to the dictionary, ‘efficient’ means ‘adequate in operation or performance.’ This
is roughly the meaning I want — in the sense that it is conceivable for [this problem] to have no
efficient algorithm. . . . There is an obvious finite algorithm, but that algorithm increases in difficulty
exponentially with the size of the graph. It is by no means obvious whether or not there exists an
algorithm whose difficulty increases only algebraically with the size of the graph . . . If only to motivate
the search for good, practical algorithms, it is important to realize that it is mathematically sensible
even to question their existence.”
(It is worth noting that Cobham and Edmonds were not the first to talk about polynomial and other time
behaviors. For instance, in 1910 H C Pocklington discussed it while exploring the behavior of algorithms
for solving quadratic congruences. But Cobham and Edmonds were the ones who started the current
interest.)
tractable Another word that you can see in this context is ‘feasible’. Some authors use them to mean the
same thing, roughly that we can solve reasonably-sized problem instances using reasonable resources.
But some authors use ‘feasible’ to have a different connotation, for instance explicitly disallowing inputs
are too large, such as having too many bits to fit in the physical universe. The word ‘tractable’ is more
standard and works better with the definition that includes the limit as the input size goes to infinity,
so here we stick with it.
slower than the right by four calculations We won’t consider whether the compiler optimizes it out of the
loop.
if the algorithm is O(n^2) on the RAM then on the Turing machine it can be O(n^5) A more extreme example
of a model-based difference is that addition of two n × n matrices on a RAM model takes time that is
O(n^2), but on an unboundedly parallel machine model it takes constant time, O(1).
the most common model is a Turing machine This observation is from Avi Wigderson’s Turing Award lecture,
https://www.youtube.com/watch?v=f2NiGO8zC1c.
Its definition ignores constant factors This discussion originated as (Stack Exchange author babou and
various others 2015).
could that not make the second algorithm more useful in practice? A great writeup of the details of an
algorithm for small values is the description of the sorting algorithm used by Python in (Peters 2023).
the order of magnitude of these constants For a rough idea of what these may be, here are some numbers
that every programmer should know.
Courtesy xkcd.com
no circuit is possible Consider a land mass. For each bridge in there must be an associated bridge out,
so a necessary condition is that every land mass have an even number of associated edges. But that is
not true for this city.
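That necessary condition is easy to check mechanically; here is a sketch, with the seven Königsberg bridges as an edge list (vertex names A–D invented here for the four land masses):

```python
from collections import Counter

def all_even_degree(edges):
    """An Euler circuit requires every vertex to have even degree:
    each bridge in needs a matching bridge out."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return all(d % 2 == 0 for d in degree.values())

# the seven bridges of Königsberg: banks A and B, islands C and D
bridges = [("A", "C"), ("A", "C"), ("A", "D"),
           ("B", "C"), ("B", "C"), ("B", "D"), ("C", "D")]
```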
the countries must be contiguous A notable example of a non-contiguous country in the world today is that
Russia is separated from Kaliningrad, the city that used to be known as Königsberg.
we can draw it in the plane This is because the graph comes from a planar map.
start with a planar graph The graph is undirected and without loops.
Counties of England and the derived planar graph This is today’s map. At the time, some counties were not
contiguous.
it was controversial
See https://www.maa.org/sites/default/files/pdf/upload_library/22/Ford/Swart697-707.pdf.
Given a graph and a number 𝑘 ∈ N In the name of the problem we often omit the 𝑘 .
Conjunctive Normal form Any Boolean function can be expressed in that form; see the Appendix.
The table above gives the numbers for the 2020 election Here are the abbreviations for states and the
District of Columbia: Alabama AL, Alaska AK, Arizona AZ, Arkansas AR, California CA, Colorado CO,
Connecticut CT, Delaware DE, District of Columbia DC, Florida FL, Georgia GA, Hawaii HI, Idaho ID,
Illinois IL, Indiana IN, Iowa IA, Kansas KS, Kentucky KY, Louisiana LA, Maine ME, Maryland MD,
Massachusetts MA, Michigan MI, Minnesota MN, Mississippi MS, Missouri MO, Montana MT, Ne-
braska NE, Nevada NV, New Hampshire NH, New Jersey NJ, New Mexico NM, New York NY, North
Carolina NC, North Dakota ND, Ohio OH, Oklahoma OK, Oregon OR, Pennsylvania PA, Rhode Island RI,
South Carolina SC, South Dakota SD, Tennessee TN, Texas TX, Utah UT, Vermont VT,
Virginia VA, Washington WA, Wisconsin WI, Wyoming WY
ignore some fine points Maine and Nebraska award an elector to the winner of each congressional district,
with only the remaining two electors going to the statewide winner, rather than having all of their
electors vote the same way.
words can be packed into the grid The earliest known example is the Sator square, five Latin words that
pack into a grid.
S A T O R
A R E P O
T E N E T
O P E R A
R O T A S
It appears in many places in the Roman Empire, often as graffiti. For instance, it was found in the ruins
of Pompeii. Like many word game solutions it sacrifices comprehension for form but it is a perfectly
grammatical sentence that translates as something like, “The farmer Arepo works the wheel with effort.”
popular with 𝑛 = 4 as a toy It was invented by Noyes Palmer Chapman, a postmaster in Canastota, New
York. As early as 1874 he showed friends a precursor puzzle. By December 1879 copies of the improved
puzzle were circulating in the northeast, and students in the American School for the Deaf and others
started manufacturing it. It became popular as the “Gem Puzzle.” Noyes Chapman applied for
a patent in February 1880. By that time the game had become a craze in the US, somewhat like Rubik’s
Cube a century later. It was also popular in Canada and Europe. See (Wikipedia contributors 2017a).
We know of no efficient algorithm to find divisors An effort in 2009 to factor a 768-bit (232-digit) number
used hundreds of machines and took two years. The researchers estimated that a 1024-bit number
would take about a thousand times as long.
Factoring seems to be hard Finding factors has for many years been thought hard. For instance, a number
is called a Mersenne prime if it is a prime number of the form 2^𝑛 − 1. They are named after M Mersenne,
a French friar and important figure in the early sharing of scientific results, who studied them in the
early 1600’s. He observed that if 𝑛 is prime then 2^𝑛 − 1 may be prime, for instance with 𝑛 = 3, 𝑛 = 7,
𝑛 = 31, and 𝑛 = 127. He suspected that others of that form were also prime, in particular 𝑛 = 67.
On 1903-Oct-31 F N Cole, then Secretary of the American Mathematical Society, made a presentation
at a math meeting. When introduced, he went to the chalkboard and in complete silence computed
2^67 − 1 = 147 573 952 589 676 412 927. He then moved to the other side of the board, wrote
193 707 721 × 761 838 257 287, and worked through the calculation, finally finding equality. When
he was done Cole returned to his seat, having not uttered a word in the hour-long presentation. His
audience gave him a standing ovation.
Cole later said that finding the factors had been a significant effort, taking “three years of Sundays.”
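What took Cole three years of Sundays takes a computer an instant today; a quick sketch:

```python
# Verify Cole's 1903 factorization of the Mersenne number 2^67 - 1.
m67 = 2**67 - 1
assert m67 == 147_573_952_589_676_412_927
assert 193_707_721 * 761_838_257_287 == m67
print("2^67 - 1 is composite")
```

Of course the hard part, then and now, was finding the factors rather than checking them.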
Platonic solids See (Wikipedia contributors 2017k).
as shown Some PDF readers cannot do opacity, so you may not see the entire Hamiltonian path.
Six Degrees of Kevin Bacon One night, three college friends, Brian Turtle, Mike Ginelli, and Craig Fass,
were watching movies. Footloose was followed by Quicksilver, and between was a commercial for a
third Kevin Bacon movie. It seemed like Kevin Bacon was in everything! This prompted the question
of whether Bacon had ever worked with De Niro. The answer at that time was no, but De Niro was
in The Untouchables with Kevin Costner, who was in JFK with Bacon. The game was born. It became
popular when they wrote to Jon Stewart about it and appeared on his show. (From (Blanda 2013).)
See https://oracleofbacon.org/.
uniform family of tasks From (Jones 1997).
There is no widely-accepted formal definition of ‘algorithm’ This discussion derives from (Pseudonym 2014).
we prefer language decision problems Because of this, some authors modify the definition of a Turing
machine to have it come with a subset of accepting states. Such a machine solves a problem if it halts on
all input strings, and when it halts it is in an accepting state exactly when that string is in the language.
default interpretation of ‘problem’ Not every computational problem is naturally expressible as a language
decision problem Consider the task of sorting the characters of strings into ascending order. We could try
to express it as the language of sorted strings {𝜎 ∈ Σ∗ | 𝜎 is sorted }. But recognizing a correctly-sorted
string does not require that we find a good way to sort an unsorted input. Another thought is to
string does not require that we find a good way to sort an unsorted input. Another thought is to
consider the language of pairs ⟨𝜎, 𝑝⟩ where 𝑝 is a permutation of the numbers 0, ... |𝜎 | − 1 that brings
the string into ascending order. Here also the formulation seems to not capture the sorting problem, in
that recognizing a correct permutation feels different than generating one from scratch.
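The contrast is easy to see in a sketch: both deciders below run in linear time, yet neither one does any sorting.

```python
def is_sorted(sigma: str) -> bool:
    """Decide the language of sorted strings with one linear scan."""
    return all(a <= b for a, b in zip(sigma, sigma[1:]))

def certifies(sigma: str, p: list[int]) -> bool:
    """Decide the language of pairs: p is a permutation of the positions
    of sigma that brings the string into ascending order."""
    return (sorted(p) == list(range(len(sigma)))
            and is_sorted("".join(sigma[i] for i in p)))
```

For instance, `certifies("bca", [2, 0, 1])` holds, but checking it gives no hint of how to produce such a permutation from scratch.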
Both of these show the collection of languages One misleading aspect of this picture is that there are
uncountably many languages but only countably many Turing machines, and hence only countably many
computable or computably enumerable languages. So, shown to scale, the computably enumerable
area of the blob would be an infinitesimally small speck at the very bottom. But such a picture would
not show the features we want to illustrate, so these drawings take a graphical license.
the shaded collection Rec consists of the Turing computable languages The name Rec is because these used
to be known as the ‘recursive’ languages.
input two numbers and output their midpoint See https://hal.archives-ouvertes.fr/file/index/d
ocid/576641/filename/computing-midpoint.pdf.
final two bits are 00 Decimal representation is not much harder since a decimal number is divisible by
four if and only if the final two digits are in the set { 00, 04, ... 96 }.
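Either representation gives a constant-time check that looks only at the final two symbols; a small sketch:

```python
def div4_binary(bits: str) -> bool:
    # A binary numeral names a multiple of four exactly when it
    # ends in 00 (counting the numeral "0" itself).
    return bits == "0" or bits.endswith("00")

def div4_decimal(digits: str) -> bool:
    # In decimal, only the last two digits matter.
    return int(digits[-2:]) % 4 == 0
```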
everything of interest can be represented with reasonable efficiency by bitstrings See https://rjlipton
.wordpress.com/2010/11/07/what-is-a-complexity-class/. Of course, a wag may say that if it
cannot be represented by bitstrings then it isn’t of interest. But we mean something less tautological: we
mean that if we could want to compute with it then it can be put in bitstrings. For example, we find
that we can process speech, adjust colors on an image, or regulate pressure in a rocket fuel tank, all in
bitstrings, despite what may at first encounter seem to be the inherently analog nature of these things.
Beethoven’s 9th Symphony The official story is that CDs are 74 minutes long so that they can hold this
piece.
researchers often do not mention representations This is like a programmer saying, “My program inputs
a number” rather than, “My program inputs the binary representation of a number.” It is also like a
person saying, “That’s me on the card” rather than “That’s a picture of me.”
leaving implementation details to a programmer (Grossman 2010)
the time or space behavior We will concentrate our attention on resource bounds in the range from logarithmic
to exponential, because these are the most useful for understanding problems that arise in practice.
less than centuries See the video from Google at https://www.youtube.com/watch?v=-ZNEzzDcllU
and S Aaronson’s Quantum Supremacy FAQ at https://www.scottaaronson.com/blog/?p=4317.
The claim is the subject of scholarly reservations See the posting from IBM Research at https://www.
ibm.com/blogs/research/2019/10/on-quantum-supremacy/ and G Kalai’s Quantum Supremacy
Skepticism FAQ at https://gilkalai.wordpress.com/2019/11/13/gils-collegial-quantum-
supremacy-skepticism-faq/.
We give the class P our attention This discussion gained much from the material in (Allender, Loui, and
Regan 1997). This includes several direct quotations.
RE Recall that ‘recursively enumerable’ is an older term for ‘computably enumerable’.
adds some wrinkles But it avoids a wrinkle that we needed for Finite State machines and Pushdown
machines, 𝜀 transitions, since Turing machines are not required to consume their input one character at
a time.
function computed by a nondeterministic machine One thing that we can do is to define that the
nondeterministic machine computes 𝑓 : B∗ → B∗ if, on each input 𝜎 , all branches halt and all
leave the same value on the tape, which we call 𝑓 (𝜎) . Otherwise, the value is undefined, 𝑓 (𝜎)↑.
might be much faster R Hamming gives this example to demonstrate that an order of magnitude change
in speed can change the world, can change what can be done: we walk at 4 mph, a car goes at 40 mph,
and an airplane goes at 400 mph. This relates to the bug picture that opens this chapter.
we don’t find the 𝜔 ’s, we just use them This is like the Mechanical Turk https://en.wikipedia.org/wik
i/Mechanical_Turk in that the machine V does not need the smarts, it is the person, or the demon,
who provides that.
strategy for chess Chess is known to be a solvable game. This is Zermelo’s Theorem (Wikipedia contributors
2017m) — there is a strategy for one of the two players that forces a win or a draw, no matter how the
opponent plays.
a deterministic verifier must take exponential time In fact, in the terminology of a later section, chess is
known to be EXP complete. See (Fraenkel and Lichtenstein 1981).
in a sense, useless Being given an answer with no accompanying justification is a problem. This is
like the Feynman algorithm for doing Physics: “The student asks . . . what are Feynman’s methods?
[M] Gell-Mann leans coyly against the blackboard and says: Dick’s method is this. You write down the
problem. You think very hard. (He shuts his eyes and presses his knuckles parodically to his forehead.)
Then you write down the answer.” (Gleick 1992) It is also like the mathematician S Ramanujan, who
relayed that the advanced formulas that he produced came in dreams from the god Narasimha. Some of
these formulas were startling and amazing, but some of them were wrong. (Chakrabarty 2017) Another
such story has to do with G Hardy, about to board a ferry crossing rough seas from Denmark to Britain.
He sent a postcard to another mathematician stating that he had proved the Riemann hypothesis (this
is still one of the most famous unproven hypotheses in mathematics); the story goes that this was his
insurance, since God would not let the ship sink and leave him with undeserved credit. And of course
the most famous example of a failure to provide backing is Fermat writing in a book he was reading that
𝑥^𝑛 + 𝑦^𝑛 = 𝑧^𝑛 has no nontrivial solutions for 𝑛 > 2 and then saying, “I have discovered a truly marvelous
proof of this, which this margin is too narrow to contain.”
the verifier cannot even input them before its runtime bound expires Some authors instead define that the
verifier runs in time polynomial in its input, ⟨𝜎, 𝜔⟩ , and add the explicit restriction that |𝜔 | must be
polynomial in |𝜎 | .
How is that legal? This is reminiscent of Quantum Bogosort, a facetious sorting algorithm. Given an
unordered list of length 𝑛 , it uses a quantum source of randomness to generate a random permutation
of the 𝑛 items. It reorders the input according to that permutation. If the list is now sorted then good.
If not then the algorithm destroys the entire universe. Assuming the Many-Worlds Hypothesis, the
result is that in any surviving universe the list has been sorted in linear time.
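For comparison, the classical version, minus the universe-destroying step, is a one-loop sketch (with expected running time proportional to 𝑛 · 𝑛! , so only run it on tiny lists):

```python
import random

def bogosort(xs: list) -> list:
    """Shuffle until sorted. The quantum variant replaces the retry
    loop with destruction of the unsorted universes."""
    while any(a > b for a, b in zip(xs, xs[1:])):
        random.shuffle(xs)
    return xs
```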
Countdown For a brief description see https://en.wikipedia.org/wiki/Countdown_(game_show). A
version that you may find fun, worth searching for the videos, is https://en.wikipedia.org/wiki/
8_Out_of_10_Cats_Does_Countdown.
many-one reducible The name comes from the fact that still another reducibility is one-one reducibility,
where the function must be one-to-one.
that’s not true under Karp reduction The Halting problem set 𝐾 and its complement are not Karp reducible.
For, we already know that 𝐾 is computably enumerable. If 𝐾 c ≤𝑝 𝐾 then 𝑥 ∈ 𝐾 c implies that 𝑓 (𝑥) ∈ 𝐾 ,
and we can enumerate 𝑓 ( 0), 𝑓 ( 1), ... and check those against the values enumerated into 𝐾 , so we
would have that 𝐾 c is also computably enumerable. That would imply that 𝐾 is computable, which it is
not.
the Petersen graph The Petersen graph is a rich source of counterexamples for conjectures in Graph Theory.
Drummer problem This is often called the Marriage problem, where the men pick suitable women. But
perhaps it is time for a new paradigm.
Asymmetric Traveling Salesman (Jonker and Volgenent 1983)
Stephen Cook b 1939 and Leonid Levin b 1948 Photo credits University of Toronto, Boston University
NP complete The name is from a survey created by Knuth. See blog.computationalcomplexity.org/2
010/11/by-any-other-name-would-be-just-as-hard.html.
there are many such problems The “at least as hard” is true in the sense that such problems can answer
questions about any other problem in that class. However, note that it might be that one NP complete
problem runs in nondeterministic time that is O (𝑛) while another runs in O (𝑛^1 000 000 ) time. So this
sense is at odds with our earlier characterization of problems that are harder to solve.
The list below gives the NP complete problems most often used These are from the classic standard reference
(Garey and Johnson 1979).
a gadget See https://cs.stackexchange.com/a/1249/50343 from the Computer Science Stack
Exchange user Jeff E.
tied to whether P = NP or P ≠ NP Ladner’s theorem is that if P ≠ NP then there is a problem in NP − P
that is not NP complete.
A large class See (Karp 1972).
an ending point That is, as P Pudlák observes, we treat P ≠ NP as an informal axiom. (Pudlák 2013)
caricature Paul Erdős joked that a mathematician is a machine for turning coffee into theorems.
completely within the realm of possibility that 𝜙 (𝑛) grows that slowly Hartmanis observes (Hartmanis
2017) that it is interesting that Gödel, the person who destroyed Hilbert’s program of automating
mathematics, seemed to think that these problems quite possibly are solvable in linear or quadratic
time.
In 2018, a poll The poll was conducted by W Gasarch, a prominent researcher and blogger in Computational
Complexity. There were 124 respondents. For the description see https://www.cs.umd.edu/users
/gasarch/BLOGPAPERS/pollpaper3.pdf. Note the suggestions that both respondents and even the
surveyor took the enterprise in a light-hearted way.
88% thought that P ≠ NP Gasarch divided respondents into experts, the people who are known to have
seriously thought about the problem, and the masses. The experts were 99% for P ≠ NP.
S Aaronson has said See (Roberts 2021) for both the Aaronson and Williams estimates.
A Wigderson See (Wigderson 2009).
Cook is of the same mind See (S. Cook 2000).
Many observers For example, (Viola 2018).
O (𝑛^lg 7 ) method (lg 7 ≈ 2.81) Strassen’s algorithm is used in practice. The current record is O (𝑛^2.37 ) but
it is not practical. It is a galactic algorithm: while it runs faster than any other known algorithm
when the problem is sufficiently large, the first such problem is so big that we never use the
algorithm. For other examples see (Wikipedia contributors 2020b).
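A minimal sketch of Strassen's idea, assuming square matrices whose size is a power of two: seven recursive multiplications in place of the naive eight, which gives the O (𝑛^lg 7 ) bound.

```python
def strassen(A, B):
    """Multiply square matrices (lists of lists) whose size is a
    power of 2, using 7 recursive products instead of 8."""
    n = len(A)
    if n == 1:
        return [[A[0][0] * B[0][0]]]
    h = n // 2
    add = lambda X, Y: [[x + y for x, y in zip(r, s)] for r, s in zip(X, Y)]
    sub = lambda X, Y: [[x - y for x, y in zip(r, s)] for r, s in zip(X, Y)]
    quad = lambda M: ([r[:h] for r in M[:h]], [r[h:] for r in M[:h]],
                      [r[:h] for r in M[h:]], [r[h:] for r in M[h:]])
    a, b, c, d = quad(A)          # A = [[a, b], [c, d]] in quadrants
    e, f, g, i = quad(B)          # B = [[e, f], [g, i]] in quadrants
    m1 = strassen(add(a, d), add(e, i))
    m2 = strassen(add(c, d), e)
    m3 = strassen(a, sub(f, i))
    m4 = strassen(d, sub(g, e))
    m5 = strassen(add(a, b), i)
    m6 = strassen(sub(c, a), add(e, f))
    m7 = strassen(sub(b, d), add(g, i))
    c11 = add(sub(add(m1, m4), m5), m7)
    c12 = add(m3, m5)
    c21 = add(m2, m4)
    c22 = add(sub(add(m1, m3), m2), m6)
    return ([r1 + r2 for r1, r2 in zip(c11, c12)]
            + [r1 + r2 for r1, r2 in zip(c21, c22)])
```

In practice implementations switch to the ordinary cubic method once the subproblems get small, since the recursion's overhead dominates there.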
Matching problem The Drummer problem described earlier is a special case of this for bipartite graphs.
more things to try than atoms in the universe There are about 10^80 atoms in the universe. A graph
with 100 vertices has the potential for (100 choose 2) edges, which is about 100^2 . Trying every subset
of the edges would be 2^10 000 ≈ 10^(10 000/3.32) cases, which is much greater than 10^80 .
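A quick sanity check of the arithmetic in this note:

```python
# 2^10000 written in decimal has about 10000/3.32 digits, dwarfing
# the roughly 80-digit count of atoms in the universe.
digits = len(str(2**10_000))
assert digits == 3011       # 10000 * log10(2) rounds up to 3011 digits
assert digits > 80
```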
since the 1960’s we have an algorithm Due to J Edmonds.
Theory of Computing blog feed (Various authors 2017)
R J Lipton captured this feeling (Lipton 2009)
Knuth has a related but somewhat different take (Knuth 2014)
all this is speculation Arthur C Clarke’s celebrated First Law is, “When a distinguished but elderly scientist
states that something is possible, he is almost certainly right. When he states that something is
impossible, he is very probably wrong.” (Wikipedia contributors 2023)
exploits this difference Recent versions of the algorithm used in practice incorporate refinements that we
shall not discuss. The core idea is unchanged.
Their algorithm, called RSA Originally the authors were listed in the standard alphabetic order: Adleman,
Rivest, and Shamir. Adleman objected that he had not done enough work to be listed first and insisted
on being listed last. He said later, “I remember thinking that this is probably the least interesting paper
I will ever write.”
tremendous amount of interest and excitement In his 1977 column, Martin Gardner posed a $100 challenge,
to crack this message: 9686 9613 7546 2206 1477 1409 2225 4355 8829 0575 9991 1245 7431
9874 6951 2093 0816 2982 2514 5708 3569 3147 6622 8839 8962 8013 3919 9055 1829 9451
5781 5254. The ciphertext was generated by the MIT team from a plaintext (English) message using
𝑒 = 9007 and this number 𝑛 (which is too long to fit on one line).
114, 381, 625, 757, 888, 867, 669, 235, 779, 976, 146, 612, 010, 218, 296, 721, 242,
362, 562, 561, 842, 935, 706, 935, 245, 733, 897, 830, 597, 123, 563, 958, 705,
058, 989, 075, 147, 599, 290, 026, 879, 543, 541
In 1994, a team of about 600 volunteers announced that they had factored 𝑛 .
𝑝 =3, 490, 529, 510, 847, 650, 949, 147, 849, 619, 903, 898, 133, 417, 764,
638, 493, 387, 843, 990, 820, 577
and
𝑞 = 32, 769, 132, 993, 266, 709, 549, 961, 988, 190, 834, 461, 413, 177, 642, 967,
992, 942, 539, 798, 288, 533
That enabled them to decrypt the message: the magic words are squeamish ossifrage.
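The announced factors are easy to check, in contrast with the two years of effort it took to find them; a sketch with the digits as printed above:

```python
# Gardner's challenge modulus (RSA-129) and the factors found in 1994.
n = int("114381625757888867669235779976146612010218296721242"
        "362562561842935706935245733897830597123563958705"
        "058989075147599290026879543541")
p = 3490529510847650949147849619903898133417764638493387843990820577
q = 32769132993266709549961988190834461413177642967992942539798288533
assert p * q == n   # checking takes microseconds; finding took years
```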
based on the next result It is called Fermat’s Little Theorem in contrast with his celebrated Last Theorem,
the assertion that 𝑎^𝑛 + 𝑏^𝑛 = 𝑐^𝑛 has no solutions in positive integers for 𝑛 > 2.
computer searches suggest that these are very rare Among the numbers less than 2.5 × 10^10 there are only
21 853 ≈ 2.2 × 10^4 pseudoprimes base 2. That’s six orders of magnitude less.
a greater than 1 − ( 1/2)^𝑘 chance that 𝑛 is prime Here is the probability 1 − ( 1/2)^𝑘 for the first few
𝑘 ’s.
𝑘 Chance 𝑛 is prime
1 0.500 000 000
2 0.750 000 000
3 0.875 000 000
4 0.937 500 000
5 0.968 750 000
6 0.984 375 000
7 0.992 187 500
8 0.996 093 750
9 0.998 046 875
We get an extra decimal place of certainty about every 3 1/3 iterations because lg ( 10) ≈ 3.32. So if
you want, say, five decimal places, so that you have at least a probability of 0.999 99, then it is safe to
iterate 4 · 5 = 20 times.
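A sketch of one such repeated test, the Fermat test, with the iteration count as a parameter (Python's built-in three-argument `pow` does the modular exponentiation):

```python
import random

def probably_prime(n: int, k: int = 20) -> bool:
    """Fermat test: k rounds, each catching a composite n with
    probability at least 1/2 (Carmichael numbers aside)."""
    if n < 4:
        return n in (2, 3)
    for _ in range(k):
        a = random.randrange(2, n - 1)
        if pow(a, n - 1, n) != 1:
            return False        # a is a witness: n is composite
    return True                 # prime with chance > 1 - (1/2)^k
```

As the earlier note on pseudoprimes observes, a composite can slip through a single round: 341 = 11 · 31 satisfies `pow(2, 340, 341) == 1`, so base 2 alone does not expose it.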
any reasonable-sized 𝑘 Selecting an appropriate 𝑘 is an engineering choice between the cost of extra
iterations and the gain in confidence.
we are quite confident that it is prime We are confident, but not certain. There are numbers, called
Carmichael numbers, that are pseudoprime for every base 𝑎 relatively prime to 𝑛 . The smallest example
is 𝑛 = 561 = 3 · 11 · 17, and the next two are 1 105 and 1 729. Like pseudoprimes, these seem to be
very rare. Among the numbers less than 10^16 there are 279 238 341 033 922 primes, about 2.7 × 10^14 ,
but only 246 683 ≈ 2.4 × 10^5 Carmichael numbers.
the minimal pub crawl See (W. Cook et al. 2017).
An example is that the Free mathematics system Sage includes one See also https://www.youtube.com/
watch?v=q8nQTNvCrjE about the Concorde TSP solver.
the Sudoku problem is NP complete First proved in the MS thesis of Takayuki Yato, from the Department
of Information Science at the University of Tokyo in 2003. That document seems to have disappeared
from the web; for a place to start see the Sudoku Wikipedia page.
Appendices
empty string, denoted 𝜀 Possibly 𝜀 came as an abbreviation for ‘empty’. Some authors use 𝜆 , possibly
from the German word for ‘empty’, leer. Or it might be that someone used the symbol just
because one was needed; the story goes that when asked why he used the 𝜆 symbol for his 𝜆 calculus,
Church replied, “eenie, meenie, meinie, mo” (Stack Exchange author Jouni Sirén 2016); see also
https://www.youtube.com/watch?v=juXwu0Nqc3I
reversal 𝜎 R of a string The most practical current notion of a string, the Unicode standard, does not have
string reversal. All of the naive ways to reverse a string run into problems for arbitrary Unicode strings
which may contain non-ASCII characters, combining characters, ligatures, bidirectional text in multiple
languages, and so on. For example, merely reversing the chars (the Unicode scalar values) in a string
can cause combining marks to become attached to the wrong characters. Another example is: how
to reverse ab<backspace>ab? The Unicode Consortium has not gone through the effort to define the
reverse of a string because there is no real-world need for it. (From https://qntm.org/trick.)
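A two-line Python illustration of the combining-character problem (U+0301 is COMBINING ACUTE ACCENT):

```python
s = "cafe\u0301"   # 'café' written as 'e' followed by a combining accent
r = s[::-1]        # naive reversal of the code points
# The accent now precedes 'e' and combines with nothing sensible;
# rendered, it attaches to the wrong neighbor.
assert r == "\u0301efac"
```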
Credits
Prologue
I.1.12 SE user Shuzheng, https://cs.stackexchange.com/q/45589/50343
I.1.13 Question by SE user Arsalan MGR, https://cs.stackexchange.com/q/135343/50343
I.2.9 SE user Yuval Filmus, https://cs.stackexchange.com/a/135170/50343
I.2.13 http://www.ivanociardelli.altervista.org/wp-content/uploads/2016/09/Solutions-to-
exercises.pdf
I.4.30 SE user Ted, https://math.stackexchange.com/a/75300/12012
Background
II.2 Image credit: Robert Williams and the Hubble Deep Field Team (STScI) and NASA.
II. Image credit File:Galilee.jpg. (2018, September 27). Wikimedia Commons, the free media repository.
Retrieved 22:19, January 26, 2020 from https://commons.wikimedia.org/w/index.php?title=File:
Galilee.jpg&oldid=322065651.
II.3.18 User scherk at pbworks.com.
II.3.20 Math StackExchange user Robert Z https://math.stackexchange.com/a/1896328/12012
II.3.28 Michael J Neely
II.3.31 Answer from Stack Exchange member Alex Becker.
II.4.1 ENIAC Programmers, 1946 U. S. Army Photo from Army Research Labs Technical Library
II.4.6 Started on Stack Exchange
II.4.9 From a Stack Exchange question.
II.5.13 CS SE user Kyle Strand https://cs.stackexchange.com/q/11645/50343.
II.5.14 SE user npostavs, https://cs.stackexchange.com/a/44875/50343
II.5.35 SE user Raphael https://cs.stackexchange.com/a/44901/50343
II.6.10 Question by SE user MathematicalOrchid, https://cs.stackexchange.com/q/2811/67754, and
answer by SE user Andrej Bauer used in section. The answer here is not from Andrej Bauer.
II.6.31 SE user Rajesh R
II.8.14 https://mathoverflow.net/questions/33046/arent-oracle-machines-unsound-concepts,
(The question there as elaborated is different than this adaptation’s.)
II.8.16 SE user Karolis Juodelė
II.8.19 SE user Noah Schweber
II.8.20 http://people.cs.aau.dk/~srba/courses/tutorials-CC-10/t5-sol.pdf
II.9.10 (Rogers 1987), p 214.
II.9.12 (Rogers 1987), p 214.
II.9.17 (Rogers 1987), p 214.
II.A.1 https://www.ias.edu/ideas/2016/pires-hilbert-hotel
Languages
III.1.25 F Stephan, https://www.comp.nus.edu.sg/~fstephan/toc01slides.pdf
III.1.36 SE user babou
III.2.9 SE user Rick Decker
III.2.16 http://www.cs.utsa.edu/~wagner/CS3723/grammar/examples.html
III.2.19 (Hopcroft, Motwani, and Ullman 2001), exercise 5.1.2.
III.2.32 Wikipedia contributors, https://en.wikipedia.org/wiki/Buffalo_buffalo_Buffalo_buffal
o_buffalo_buffalo_Buffalo_buffalo, William J. Rapaport, https://cse.buffalo.edu/~rapaport/
BuffaloBuffalo/buffalobuffalo.html
III.2.36 http://www.cs.utsa.edu/~wagner/CS3723/grammar/examples.html
III.3.17 SE user DollarAkshay
III.3.24 T Zaremba, http://www.geom.uiuc.edu/~zarembe/graph3.html.
III.A.9 http://people.cs.ksu.edu/~schmidt/300s05/Lectures/GrammarNotes/bnf.html
Automata
IV.1.44 From Introduction to Languages by Martin, edition four, p 77.
IV.3.44 https://cs.stackexchange.com/a/30726
IV.4.7 https://cs.stackexchange.com/q/155353/50343
IV.4.27 SE user jmite, https://cs.stackexchange.com/a/67317/50343.
IV.4.29 (Rich 2008)
IV.4.30 (Rich 2008), https://math.stackexchange.com/a/1102627
IV.5.20 SE user David Richerby, https://cs.stackexchange.com/a/97885/67754
IV.5.24 (Rich 2008)
IV.5.30 SE author Yuval Filmus, https://cs.stackexchange.com/a/41445/50343
IV.5.31 SE user Brian M Scott, https://math.stackexchange.com/a/1508488
IV.C.15 https://www.eecs.wsu.edu/~cook/tcs/l10.html
Complexity
V. Some of the discussion is from https://softwareengineering.stackexchange.com/a/20833.
V. Discussion of the third point started as https://cs.stackexchange.com/questions/9957/justific
ation-for-neglecting-constants-in-big-o.
V. The fourth point derives from https://stackoverflow.com/a/19647659.
V. This discussion originated as (Stack Exchange author templatetypedef 2013).
V.1.57 Stack Exchange user Daniel Fischer, https://math.stackexchange.com/a/674039, and Stack
Exchange user anon, https://math.stackexchange.com/a/61741
V.1.63 Stack Exchange user Ilmari Karonen, https://math.stackexchange.com/questions/925053/us
ing-limits-to-determine-big-o-big-omega-and-big-theta
V.2.24 Sean McCulloch, https://npcomplete.owu.edu/2014/06/03/3-dimensional-matching/
V.2.54 Private communication from Puck Rombach.
V.2.68 Jan Verschelde, http://homepages.math.uic.edu/~jan/mcs401/partitioning.pdf
V.3.11 A.A. at https://rjlipton.wordpress.com/2010/11/07/what-is-a-complexity-class/#com
ment-8872
V.4.16 https://cs.stackexchange.com/q/57518
V.5.19 Paul Black, https://xlinux.nist.gov/dads/HTML/nondetermAlgo.html
V.6.28 SE user JesusIsLord at https://cstheory.stackexchange.com/a/47031/4731
V.6.30 SE user user326210, https://math.stackexchange.com/a/2564255
V.6.34 Neal E Young, University of California Riverside
V. By Psyon (Own work) CC BY-SA 3.0 https://commons.wikimedia.org/wiki/File:Jigsaw_Puzzle.
svg
V.7.13 William Gasarch, https://www.cs.umd.edu/~gasarch/COURSES/452/F14/poly.pdf
V.7.17 http://www.cs.princeton.edu/courses/archive/fall02/cos126/exercises/np.html
V.7.18 Kevin Wayne. http://www.cs.princeton.edu/courses/archive/fall02/cos126/exercises/n
p-sol.html
V.7.21 http://www.cs.princeton.edu/courses/archive/fall02/cos126/exercises/np.html
V.7.24 Y Lyuu, https://www.csie.ntu.edu.tw/~lyuu/complexity/2016/20161129s.pdf
V.7.29 SE user Yuval Filmus https://cs.stackexchange.com/a/132902/50343
V.8.17 SE user Yuval Filmas https://cs.stackexchange.com/a/54452/50343
Bibliography
A/V Geeks, Y. user, ed. (2013). Slide Rule - Proportion, Percentage, Squares And Square Roots (1944).
Division of Visual Aids, US Office of Education. url:
https://www.youtube.com/watch?v=dT7bSn03lx0 (visited on 08/09/2015).
Aaronson, S. (July 21, 2011a). Rosser’s Theorem via Turing machines. url:
http://www.scottaaronson.com/blog/?p=710 (visited on 12/31/2023).
— (Aug. 14, 2011b). Why Philosophers Should Care About Computational Complexity. url:
https://arxiv.org/abs/1108.1791.
— (May 3, 2012a). The 8000th Busy Beaver number eludes ZF set theory: new paper by Adam Yedidia and
me. url: http://www.scottaaronson.com/blog/?p=2725.
— (Aug. 30, 2012b). The Toaster-Enhanced Turing Machine. url:
http://www.scottaaronson.com/blog/?p=1121 (visited on 05/28/2015).
— (July 27, 2020). The Busy Beaver Frontier. url: https://www.scottaaronson.com/papers/bb.pdf
(visited on 07/02/2024).
Adams, D. (1979). The Hitchhiker’s Guide to the Galaxy. Harmony Books. isbn: 9780345391803.
Allender, E., M. C. Loui, and K. W. Regan (1997). “Complexity Classes”. In: Algorithms and Theory of
Computation Handbook. Ed. by M. J. Atallah and M. Blanton. Boca Raton, Florida: CRC Press. Chap. 27.
Avigad, J. (Jan. 9, 2007). “Computability and Incompleteness Lecture Notes”. In: url:
https://www.andrew.cmu.edu/user/avigad/Teaching/candi_notes.pdf (visited on 09/26/2024).
Bellos, A. (Dec. 15, 2014). “The Game of Life: a beginner’s guide”. In: The Guardian. url:
http://www.theguardian.com/science/alexs-adventures-in-numberland/2014/dec/15/the-
game-of-life-a-beginners-guide (visited on 07/14/2015).
Bernstein, E. and U. Vazirani (1997). “Quantum Complexity Theory”. In: SIAM Journal of Compututing
26.5, pp. 1411–1473.
Bigham, D. S. (Aug. 19, 2014). How Many Vowels Are There in English? (Hint: It’s More Than AEIOUY.).
Slate. url: http://www.slate.com/blogs/lexicon_valley/2014/08/19/aeiou_and_sometimes_
y_how_many_english_vowels_and_what_is_a_vowel_anyway.html (visited on 06/12/2017).
Black, R. (2000). “Proving Church’s Thesis”. In: Philosophia Mathematica 8, pp. 244–258.
Blanda, S. (2013). The Six Degrees of Kevin Bacon. [Online; accessed 2019-Apr-01]. url:
https://blogs.ams.org/mathgradblog/2013/11/22/degrees-kevin-bacon/.
Sitnikovski, B. (2024). Deriving a Quine in a Lisp. url:
https://bor0.wordpress.com/2020/04/24/deriving-a-quine-in-a-lisp/ (visited on
09/26/2024).
Brady, A. H. (Apr. 1983). “The Determination of the Value of Rado’s Noncomputable Function Σ(𝑘) for
Four-State Turing Machines”. In: Mathematics of Computation 40.162, pp. 647–665.
Bragg, M. (Sept. 2016). Zeno’s Paradoxes. Podcast. Guests: Marcus du Sautoy, Barbara Sattler, and James
Warren. British Broadcasting Corporation. url: https://www.bbc.co.uk/programmes/b07vs3v1.
Brock, D. C. (2020). Discovering Dennis Ritchie’s Lost Dissertation. [Online; accessed 2020-Jun-20]. url:
https://computerhistory.org/blog/discovering-dennis-ritchies-lost-dissertation/.
Brower, K. (1983). The Starship and the Canoe. Harper Perennial; Reprint edition. isbn: 978-0060910303.
Brubaker, B. (Apr. 2, 2024). “With Fifth Busy Beaver, Researchers Approach Computation’s Limits”. In:
Quanta. url: https://www.quantamagazine.org/amateur-mathematicians-find-fifth-busy-
beaver-turing-machine-20240702/ (visited on 04/02/2024).
Bruck, R. H. (1953). “Computational Aspects of Certain Combinatorial Problems”. In: AMS Symposium in
Applied Mathematics 6, p. 31.
Chakrabarty, R. (Apr. 26, 2017). “Srinivasa Ramanujan: The mathematical genius who credited his 3900
formulae to visions from Goddess Mahalakshmi”. In: India Today. url:
https://www.indiatoday.in/education-today/gk-current-affairs/story/srinivasa-
ramanujan-life-story-973662-2017-04-26 (visited on 11/27/2020).
Church, A. (1937). “Review of Alan M. Turing, On computable numbers, with an application to the
Entscheidungsproblem”. In: Journal of Symbolic Logic 2, pp. 42–43.
Cobham, A. (1965). “The intrinsic computational difficulty of functions”. In: Logic, Methodology and
Philosophy of Science: Proceedings of the 1964 International Congress. Ed. by Y. Bar-Hillel. North-Holland
Publishing Company, pp. 24–30.
Cook, S. (2000). The P vs NP Problem. Official problem description. Clay Mathematics Institute. url:
https://www.claymath.org/sites/default/files/pvsnp.pdf (visited on 01/11/2018).
Cook, W. et al. (2017). UK Pubs Travelling Salesman Problem. url:
http://www.math.uwaterloo.ca/tsp/pubs/index.html (visited on 12/16/2017).
Copeland, B. J. and D. Proudfoot (1999). “Alan Turing’s Forgotten Ideas in Computer Science”. In:
Scientific American 280.4, pp. 99–103.
Copeland, B. J. (Sept. 1996). “What is Computation?” In: Computation, Cognition and AI, pp. 335–359.
— (1999). “Beyond the universal Turing machine”. In: Australasian Journal of Philosophy 77.1, pp. 46–67.
— (Aug. 19, 2002). The Church-Turing Thesis; Misunderstandings of the Thesis. url:
http://plato.stanford.edu/entries/church-turing/#Bloopers (visited on 01/07/2016).
Cox, R. (2007). Regular Expression Matching Can Be Simple And Fast (but is slow in Java, Perl, PHP, Python,
Ruby, . . .). url: https://swtch.com/~rsc/regexp/regexp1.html (visited on 06/29/2019).
Davis, M. (2004). “The Myth of Hypercomputation”. In: Alan Turing: Life and Legacy of a Great Thinker.
Ed. by C. Teuscher. Springer, pp. 195–211. isbn: 978-3-662-05642-4.
— (2006). “Why there is no such discipline as hypercomputation”. In: Applied Mathematics and
Computation 178, pp. 4–7.
Dershowitz, N. and Y. Gurevich (Sept. 2008). “A Natural Axiomatization of Computability and Proof of
Church’s Thesis”. In: Bulletin of Symbolic Logic 14.3, pp. 299–350.
Edmonds, J. (1965). “Paths, trees, and flowers”. In: Canadian Journal of Mathematics 17, pp. 449–467.
Eén, N. and N. Sörensson (2005). MiniSat. url: http://minisat.se/ (visited on 05/16/2022).
Encyclopædia Britannica Editors (2017). Y2K bug. url:
https://www.britannica.com/technology/Y2K-bug (visited on 05/10/2017).
Euler, L. (1766). “Solution d’une question curieuse que ne paroit soumise a aucune analyse (Solution of a
curious question which does not seem to have been subjected to any analysis)”. In: Mémoires de
l’Academie Royale des Sciences et Belles Lettres, Année 1759 15. [Online; accessed 2017-Sep-23, article
309], pp. 310–337. url: http://eulerarchive.maa.org/.
Firth, N. (Oct. 14, 2009). “Sir Tim Berners-Lee admits the forward slashes in every web address ‘were a
mistake’”. In: Daily Mail. url: https://www.dailymail.co.uk/sciencetech/article-
1220286/Sir-Tim-Berners-Lee-admits-forward-slashes-web-address-mistake.html (visited
on 11/29/2018).
Fortnow, L. and B. Gasarch (2002). Computational Complexity Blog. [Online; accessed 2017-Nov-13]. url:
http://blog.computationalcomplexity.org/2002/11/foundations-of-complexitylesson-
7.html.
Fraenkel, A. S. and D. Lichtenstein (1981). “Computing a Perfect Strategy for 𝑛 × 𝑛 Chess Requires Time
Exponential in 𝑛 ”. In: Journal of Combinatorial Theory, Series A 31, pp. 199–214.
Free Online Dictionary of Computing (Denis Howe) (2017). Stephen Kleene. [Online; accessed
21-June-2017]. url: http://foldoc.org/Stephen%20Kleene.
Gandy, R. (1980). “Church’s Thesis and Principles for Mechanisms”. In: The Kleene Symposium. Ed. by
J. Barwise, H. J. Keisler, and K. Kunen. North-Holland Amsterdam, pp. 123–148. isbn:
978-0-444-85345-5.
Gardner, M. (Oct. 1970). “Mathematical Games: The fantastic combinations of John Conway’s new solitaire
game ‘life’”. In: Scientific American 223, pp. 120–123. url:
http://www.ibiblio.org/lifepatterns/october1970.html.
Garey, M. and D. Johnson (1979). Computers and Intractability: A Guide to the Theory of NP-Completeness.
W. H. Freeman.
Gizmodo (1948). UCLA’s 1948 Mechanical Computer. Accessed 2019-September-18. url:
https://vimeo.com/70589461.
Gleick, J. (Sept. 20, 1992). “Part Showman, All Genius”. In: New York Times Magazine. url:
https://www.nytimes.com/1992/09/20/magazine/part-showman-all-genius.html (visited on
11/27/2020).
Gödel, K. (1964). “What is Cantor’s Continuum Problem?” In: Philosophy of Mathematics: Selected Readings.
Ed. by P. Benacerraf and H. Putnam. Cambridge University Press, pp. 470–494.
— (1995). “Undecidable diophantine propositions”. In: Collected works Volume III: Unpublished essays and
lectures. Ed. by S. Feferman et al. Oxford University Press.
Goodstein, R. L. (Dec. 1947). “Transfinite Ordinals in Recursive Number Theory”. In: Journal of Symbolic
Logic 12.4, pp. 123–129.
Grossman, L. (2010). Metric Math Mistake Muffed Mars Meteorology Mission. [Online; accessed
2017-May-25]. url: https://www.wired.com/2010/11/1110mars-climate-observer-report/.
Hartmanis, J. (2017). Gödel, von Neumann and the P =?NP Problem. url:
http://www.cs.cmu.edu/~15455/hartmanis-on-godel-von-neumann.pdf (visited on
12/25/2017).
Hennie, F. (1977). Introduction to Computability. Addison-Wesley. isbn: 978-0201028485.
Hilbert, D. and W. Ackermann (1950). Principles of Mathematical Logic. Trans. by R. E. Luce. AMS Chelsea
Publishing. isbn: 978-0821820247.
Hodges, A. (1983). Alan Turing: the enigma. Simon and Schuster. isbn: 0-671-49207-1.
— (2016). Alan Turing in the Stanford Encyclopedia of Philosophy. url:
http://www.turing.org.uk/publications/stanford.html (visited on 04/06/2016).
Hofstadter, D. R. (1979). Gödel, Escher, Bach: An Eternal Golden Braid. Basic Books. isbn: 978-0465026562.
Hopcroft, J. E., R. Motwani, and J. D. Ullman (2001). Introduction to Automata Theory, Languages, and
Computation. 2nd ed. Pearson Education. isbn: 0201441241.
Huggett, N. (2010). Zeno’s Paradoxes — Stanford Encyclopedia of Philosophy. [Online; accessed
23-Dec-2016]. url: https://plato.stanford.edu/entries/paradox-zeno/#ParMot.
Indian Institute of Science and Indian Institutes of Technologies (2021). Graduate Aptitude Test in
Engineering.
— (2022). Graduate Aptitude Test in Engineering.
Jones, N. D. (1997). Computability and Complexity: From a Programming Perspective. 1st ed. MIT Press.
isbn: 978-0262100649.
Jonker, R. and T. Volgenant (Nov. 1, 1983). “Transforming Asymmetric into Symmetric Traveling Salesman
Problems”. In: Operations Research Letters 2.4, pp. 161–163.
Karp, R. M. (1972). “Reducibility Among Combinatorial Problems”. In: Complexity of Computer
Computations. Ed. by R. E. Miller and J. W. Thatcher. New York: Plenum, pp. 85–103.
Kleene, S. (1952). Introduction to Metamathematics. North-Holland Amsterdam. isbn: 978-0923891572.
Klyne, G. and C. Newman (July 2002). Date and Time on the Internet: Timestamps. RFC 3339. RFC Editor,
pp. 1–18. url: https://www.ietf.org/rfc/rfc3339.txt.
Knuth, D. E. (Dec. 1964). “Backus Normal Form vs. Backus Naur Form”. In: Communications of the ACM
7.12, pp. 735–736.
— (May 20, 2014). Twenty Questions for Donald Knuth. url:
http://www.informit.com/articles/article.aspx?p=2213858 (visited on 02/17/2018).
Knuutila, T. (2001). “Redescribing an algorithm by Hopcroft”. In: Theoretical Computer Science 250,
pp. 333–363.
Kragh, H. (Mar. 27, 2014). The True (?) Story of Hilbert’s Infinite Hotel. url:
http://arxiv.org/abs/1403.0059.
Leupold, J. (1725). “Details of the mechanisms of the Leibniz calculator, the most advanced of its time”. In:
Illustration in: Theatrum arithmetico-geometricum, das ist . . . [bound with Theatrum machinarium,
oder, Schau-Platz der Heb-Zeuge/Jacob Leupold. Leipzig, 1725]. Leipzig: Zufinden bey dem Autore
und Joh. Friedr. Gleditschens seel. Sohn: Gedruckt bey Christoph Zunkel, 1727. url:
https://www.loc.gov/resource/cph.3c10471/ (visited on 11/14/2016).
Levin, L. A. (Dec. 7, 2016). Fundamentals of Computing. url: https://www.cs.bu.edu/fac/lnd/toc/.
Lipton, R. J. (Sept. 22, 2009). It’s All Algorithms, Algorithms and Algorithms. url: https:
//rjlipton.wordpress.com/2009/09/22/its-all-algorithms-algorithms-and-algorithms/
(visited on 02/17/2018).
Maienschein, J. (2017). “Epigenesis and Preformationism”. In: The Stanford Encyclopedia of Philosophy.
Ed. by E. N. Zalta. Spring 2017. Metaphysics Research Lab, Stanford University.
MathOverflow user Joel David Hamkins (2010). Answer to: Infinite CPU clock rate and hotel Hilbert. url:
https://mathoverflow.net/a/22038 (visited on 04/19/2017).
McCarthy, J. (1963). A Basis for a Mathematical Theory of Computation. url:
http://www-formal.stanford.edu/jmc/basis1.pdf (visited on 06/15/2017).
Meyer, A. R. and D. M. Ritchie (1966). Research report: The complexity of loop programs. Tech. rep. 1817.
IBM.
N. J. A. Sloane, ed. (2019). The On-Line Encyclopedia of Integer Sequences, A000290. url:
https://oeis.org/A000290 (visited on 03/02/2019).
Odifreddi, P. (1992). Classical Recursion Theory. Elsevier Science. isbn: 0-444-87295-7.
Perlis, A. J. (Sept. 1, 1982). “Epigrams on Programming”. In: SIGPLAN Notices 17.9, pp. 7–13. url:
https://web.archive.org/web/19990117034445/http://www-pu.informatik.uni-
tuebingen.de/users/klaeren/epigrams.html (visited on 12/23/2023).
Peters, T. (2023). Timsort. url: https://bugs.python.org/file4451/timsort.txt (visited on
01/14/2023).
Piccinini, G. (2017). “Computation in Physical Systems”. In: The Stanford Encyclopedia of Philosophy. Ed. by
E. N. Zalta. Summer 2017. Metaphysics Research Lab, Stanford University.
Pinker, S. (Sept. 4, 2014). The Trouble With Harvard. url:
https://newrepublic.com/article/119321/harvard-ivy-league-should-judge-students-
standardized-tests (visited on 12/23/2020).
Pour-El, M. B. and I. Richards (1981). “The wave equation with computable initial data such that its unique
solution is not computable”. In: Advances in Mathematics 39, pp. 215–239.
Pseudonym, Stack Exchange author (2014). Answer to: What exactly is an algorithm? url:
https://cs.stackexchange.com/a/31953 (visited on 12/27/2018).
Pudlák, P. (2013). Logical Foundations of Mathematics and Computational Complexity. Springer. isbn:
978-3-319-34268-9.
Radó, T. (May 1962). “On Non-computable Functions”. In: Bell System Technical Journal 41.3, pp. 877–884.
url: https://ia601900.us.archive.org/0/items/bstj41-3-877/bstj41-3-877.pdf.
Rendell, P. (2011). A Turing Machine in Conway’s Game of Life, extendable to a Universal Turing Machine. url:
http://rendell-attic.org/gol/tm.htm (visited on 07/21/2015).
Renwick, W. S. (May 6, 1949). The start of the EDSAC log. [Online; accessed 2019-Mar-02]. url:
https://www.cl.cam.ac.uk/relics/elog.html.
Rich, E. (2008). Automata, Computability, and Complexity. Pearson. isbn: 978-0-13-228806-4.
Roberts, S. (Oct. 27, 2021). “The 50-year-old problem that eludes theoretical computer science”. In: MIT
Technology Review.
Robinson, R. (1948). “Recursion and Double Recursion”. In: Bulletin of the American Mathematical Society
54, pp. 987–993.
Rogers Jr., H. (Sept. 1958). “Gödel numberings of partial recursive functions”. In: Journal of Symbolic Logic
23.3, pp. 331–341.
— (1987). Theory of Recursive Functions and Effective Computability. MIT Press. isbn: 0-262-68052-1.
Schnieder, H.-J. (2001). “Computability in an Introductory Course on Programming”. In: Bulletin of the
European Association for Theoretical Computer Science, EATCS 73, pp. 153–164.
SE author Brian M. Scott (Feb. 14, 2020). Inverting the Cantor pairing function. url:
http://math.stackexchange.com/q/222835 (visited on 10/28/2012).
Smoryński, C. (1991). Logical Number Theory I. Springer-Verlag. isbn: 978-3540522362.
Soare, R. I. (1999). “Computability and Incomputability”. In: Handbook of Computability Theory. Ed. by
E. R. Griffor. North-Holland, Amsterdam, pp. 3–36.
Stack Exchange author Andrej Bauer (2016). Answer to: Is a Turing Machine “by definition” the most
powerful machine? [Online; accessed 2017-Nov-05]. Computer Science Stack Exchange discussion board.
url:
https://cs.stackexchange.com/a/66753/78536.
— (2018). Answer to: Problems understanding proof of smn theorem using Church-Turing thesis. [Online;
accessed 2020-Feb-13]. Computer Science Stack Exchange discussion board. url:
https://cs.stackexchange.com/a/97946/67754.
Stack Exchange author babou and various others (2015). Justification for neglecting constants in Big O.
[Online; accessed 2017-Oct-29]. Computer Science Stack Exchange discussion board. url:
https://cs.stackexchange.com/a/41000/78536.
Stack Exchange author bobnice (2009). Answer to: RegEx match open tags except XHTML self-contained tags.
url: https://stackoverflow.com/a/1732454/7168267 (visited on 01/27/2019).
Stack Exchange author David Richerby (2018). Why is there no permutation in Regexes? (Even if regular
languages seem to be able to do this). [Online; accessed 2020-Jan-01]. Computer Science Stack Exchange
discussion board.
url: https://cs.stackexchange.com/a/100215/67754.
Stack Exchange author JohnL (2020). How to decide whether a language is decidable when not involving
turing machines? [Online; accessed 2020-Jun-11]. Computer Science Stack Exchange discussion board.
url: https://cs.stackexchange.com/a/127035/67754.
Stack Exchange author Jouni Sirén (2016). Answer to: What is the origin of 𝜆 for empty string? Accessed
2016-October-20. url: http://cs.stackexchange.com/a/64850/50343.
Stack Exchange author Kaktus and various others (2019). Georg Cantor’s diagonal argument, what exactly
does it prove? [Online; accessed 2019-Dec-25]. Mathematics Stack Exchange discussion board.
url: https://math.stackexchange.com/q/2176304.
Stack Exchange author Ryan Williams (Sept. 2, 2010). Comment to answer for What would it mean to
disprove Church-Turing thesis? url: https://cstheory.stackexchange.com/a/105/4731 (visited on
06/24/2019).
Stack Exchange author templatetypedef (2013). What is pseudopolynomial time? How does it differ from
polynomial time? [Online; accessed 2017-Oct-29]. Stack Overflow discussion board. url:
https://stackoverflow.com/a/19647659.
Thompson, K. (Aug. 1984). “Reflections on trusting trust”. In: Communications of the ACM 27 (8),
pp. 761–763.
Thomson, J. F. (Oct. 1954). “Tasks and Super-Tasks”. In: Analysis 15.1, pp. 1–13.
Turing, A. M. (1937). “On Computable Numbers, with an Application to the Entscheidungsproblem”. In:
Proceedings of the London Mathematical Society. 2nd ser. 42, pp. 230–265.
— (1938a). “On Computable Numbers, with an Application to the Entscheidungsproblem. A Correction.”
In: Proceedings of the London Mathematical Society. 2nd ser. 43, pp. 544–546.
— (1938b). “Systems of Logic Based on Ordinals”. PhD thesis. Princeton University.
U.S. Naval Observatory, Time Service Dept. (2017). Leap Seconds. [Online; accessed 10-May-2017]. url:
http://tycho.usno.navy.mil/leapsec.html.
Various authors (2017). Theory of Computing Blog Aggregator. [Online; accessed 17-May-2017]. url:
http://cstheory-feed.org/.
Viola, E. (Feb. 16, 2018). I believe P=NP. url:
https://emanueleviola.wordpress.com/2018/02/16/i-believe-pnp/ (visited on 02/16/2018).
Wigderson, A. (2009). “Knowledge, Creativity and P versus NP”. url:
https://www.math.ias.edu/~avi/PUBLICATIONS/MYPAPERS/AW09/AW09.pdf (visited on
06/10/2023).
— (2017). Mathematics and Computation. [Draft of a to-be-published book; accessed 2017-Oct-27]. url:
https://www.math.ias.edu/avi/book.
Wikipedia contributors (2014). History of the Church–Turing thesis — Wikipedia, The Free Encyclopedia.
[Online; accessed 2-October-2016]. url: https://en.wikipedia.org/w/index.php?title=Histo
ry_of_the_Church%E2%80%93Turing_thesis&oldid=618643863.
— (2015). Stigler’s law of eponymy — Wikipedia, The Free Encyclopedia. url: https:
//en.wikipedia.org/w/index.php?title=Stigler%27s_law_of_eponymy&oldid=691378684
(visited on 02/14/2016).
— (2016a). Age of the Earth — Wikipedia, The Free Encyclopedia. [Online; accessed 13-June-2016]. url:
https://en.wikipedia.org/w/index.php?title=Age_of_the_Earth&oldid=724796250.
— (2016b). Donald Michie — Wikipedia, The Free Encyclopedia. [Online; accessed 24-March-2016]. url:
https://en.wikipedia.org/w/index.php?title=Donald_Michie&oldid=708156000 (visited on
03/24/2016).
— (2016c). Nomogram — Wikipedia, The Free Encyclopedia. [Online; accessed 6-October-2016]. url:
https://en.wikipedia.org/w/index.php?title=Nomogram&oldid=742964268.
— (2016d). Ross–Littlewood paradox — Wikipedia, The Free Encyclopedia. [Online; accessed
9-February-2017]. url: https://en.wikipedia.org/w/index.php?title=Ross%E2%80%93Little
wood_paradox&oldid=739534216.
— (2016e). The Imitation Game — Wikipedia, The Free Encyclopedia. [Online; accessed 28-June-2016].
url: https://en.wikipedia.org/w/index.php?title=The_Imitation_Game&oldid=723336480.
— (2016f). Turtles all the way down — Wikipedia, The Free Encyclopedia. [Online; accessed
2016-September-04]. url: https:
//en.wikipedia.org/w/index.php?title=Turtles_all_the_way_down&oldid=736001775.
— (2016g). Zeno’s paradoxes — Wikipedia, The Free Encyclopedia. [Online; accessed 23-December-2016].
url: https://en.wikipedia.org/w/index.php?title=Zeno%27s_paradoxes&oldid=752685211.
— (2017a). 15 puzzle — Wikipedia, The Free Encyclopedia. [Online; accessed 16-September-2017]. url:
https://en.wikipedia.org/w/index.php?title=15_puzzle&oldid=789930961.
— (2017b). Almon Brown Strowger — Wikipedia, The Free Encyclopedia. [Online; accessed 9-June-2017].
url:
https://en.wikipedia.org/w/index.php?title=Almon_Brown_Strowger&oldid=783883144.
— (2017c). Artificial neuron — Wikipedia, The Free Encyclopedia. [Online; accessed 21-June-2017]. url:
https://en.wikipedia.org/w/index.php?title=Artificial_neuron&oldid=780239713.
— (2017d). Aubrey–Maturin series — Wikipedia, The Free Encyclopedia. [Online; accessed 28-March-2017].
url: https:
//en.wikipedia.org/w/index.php?title=Aubrey%E2%80%93Maturin_series&oldid=771937634.
— (2017e). Backus–Naur form — Wikipedia, The Free Encyclopedia. [Online; accessed 7-May-2017]. url:
https:
//en.wikipedia.org/w/index.php?title=Backus%E2%80%93Naur_form&oldid=778354081.
— (2017f). Magic smoke — Wikipedia, The Free Encyclopedia. [Online; accessed 2017-October-11]. url:
https://en.wikipedia.org/w/index.php?title=Magic_smoke&oldid=785207817.
— (2017g). North American Numbering Plan — Wikipedia, The Free Encyclopedia. [Online; accessed
9-June-2017]. url: https:
//en.wikipedia.org/w/index.php?title=North_American_Numbering_Plan&oldid=780178791.
— (2017h). Ouija — Wikipedia, The Free Encyclopedia. [Online; accessed 14-May-2017]. url:
https://en.wikipedia.org/w/index.php?title=Ouija&oldid=776109372.
— (2017i). Pax Britannica — Wikipedia, The Free Encyclopedia. [Online; accessed 14-May-2017]. url:
https://en.wikipedia.org/w/index.php?title=Pax_Britannica&oldid=775067301.
— (2017j). Philipp von Jolly — Wikipedia, The Free Encyclopedia. [Online; accessed 30-January-2019].
url: https://en.wikipedia.org/w/index.php?title=Philipp_von_Jolly&oldid=764485788.
— (2017k). Platonic solid — Wikipedia, The Free Encyclopedia. [Online; accessed 2017-October-22]. url:
https://en.wikipedia.org/w/index.php?title=Platonic_solid&oldid=801264236.
— (2017l). Unicode — Wikipedia, The Free Encyclopedia. url:
https://en.wikipedia.org/w/index.php?title=Unicode&oldid=784443067.
— (2017m). Zermelo’s theorem (game theory) — Wikipedia, The Free Encyclopedia. [Online; accessed
2017-Nov-26]. url: https://en.wikipedia.org/w/index.php?title=Zermelo%27s_theorem_(ga
me_theory)&oldid=806070716.
— (2018). Paradox — Wikipedia, The Free Encyclopedia. [Online; accessed 14-December-2018]. url:
https://en.wikipedia.org/w/index.php?title=Paradox&oldid=871193884.
— (2019a). Collatz conjecture — Wikipedia, The Free Encyclopedia. [Online; accessed 15-February-2019].
— (2019b). Mathematics: The Loss of Certainty — Wikipedia, The Free Encyclopedia. [Online; accessed
30-January-2019]. url: https://en.wikipedia.org/w/index.php?title=Mathematics:
_The_Loss_of_Certainty&oldid=879406248.
— (2019c). Maxwell’s demon — Wikipedia, The Free Encyclopedia. [Online; accessed 1-January-2020]. url:
https://en.wikipedia.org/w/index.php?title=Maxwell%27s_demon&oldid=930445803.
— (2019d). Partial application — Wikipedia, The Free Encyclopedia. [Online; accessed 26-December-2019].
— (2020a). Foobar — Wikipedia, The Free Encyclopedia. [Online; accessed 2020-Feb-14]. url:
https://en.wikipedia.org/w/index.php?title=Foobar&oldid=934819128.
— (2020b). Galactic algorithm — Wikipedia, The Free Encyclopedia. [Online; accessed 2020-Jun-17]. url:
https://en.wikipedia.org/w/index.php?title=Galactic_algorithm&oldid=957279293.
— (2021). Mississippi River Basin Model — Wikipedia, The Free Encyclopedia. [Online; accessed
25-September-2022]. url: https://en.wikipedia.org/w/index.php?title=Mississippi_River
_Basin_Model&oldid=1041334010.
— (2023). Clarke’s three laws — Wikipedia, The Free Encyclopedia. [Online; accessed 27-June-2023]. url:
https://en.wikipedia.org/w/index.php?title=Clarke%27s_three_laws&oldid=1156462008.
— (2024). Hyperoperation — Wikipedia, The Free Encyclopedia. [Online; accessed 9-August-2024]. url:
https://en.wikipedia.org/w/index.php?title=Hyperoperation&oldid=1226527372.
YouTube channel Joint Mathematics Meetings (May 5, 2018). William Cook: “Information, Computation,
Optimization . . .”. url: https://www.youtube.com/watch?v=q8nQTNvCrjE (visited on 07/02/2024).
YouTube user navyreviewer (2010). Mechanical computer part 1. url:
https://www.youtube.com/watch?v=mpkTHyfr0pM (visited on 08/09/2015).
Zenil, H., ed. (2012). A Computable Universe, Understanding and Exploring Nature as Computation. World
Scientific. isbn: 978-9814374293.
Index
+ tape, 8
in transition tables, 181 alternation, 204
operation on a language, 218 amb, ambiguous function, 191
3 Dimensional Matching problem, 333 ambiguous grammar, 152, 153
3-Coloring problem, 334 argument, to a function, 372
3-SAT, see 3-Satisfiability problem Aristotle’s Paradox, 59, 61
3-SAT problem, 327, 328, 346 Assignment problem, 329
3-Satisfiability problem, 282, 300, 333 Asymmetric Traveling Salesman problem, 328, 329
asymptotically equivalent, 269, 277
atom, 288
B, 370
3-Satisfiability Backus, J
Strict variant, 327 picture, 168
4-Satisfiability problem, 344 Backus-Naur form, BNF, 168
Bacon, Kevin, 407
accept a language, 145 balanced parentheses, 227
accept an input, 185, 193, 197 BB function, 133
acceptable numbering, 71 Berra, Y
accepted language, see recognized picture, 190
of a Turing machine, 294 Big O, 267
accepting state, 179, 181, 309 Big Θ, 268
in transition tables, 181 bijection, 375
nondeterministic Finite State machine, 192 binary sequence, 73
Pushdown machine, 229 binary tree, 176
accepts, 181, 185, 193, 197 bit string, see bitstring
Ackermann function, 31–33, 36, 47–50 bitstring, 370
Ackermann, W, 3 set of, B, 370
picture, 33 blank, 8, 309
action set, 8 blank, B, 5
action symbol, 8 BNF, 168–172
addition, 6 body of a production, 148
adjacency matrix, 161 Bogosort, 409
adjacent, 158, 159 Boole, G
Adleman, L picture, 281
picture, 351 boolean, 281
Agrawal, M expression, 281
picture, 287 function, 281
AKS primality test, 287 variable, 281
algorithm, 293 Boolean algebra, 378
definition, 293 Boolean function, 379
reliance on model, 293 bottom, ⊥, 227
alphabet, 143, 370 Bounded Halting problem, 366
input, 181 BPP, Bounded-Error Probabilistic Polynomial Time
Kleene star, 144, 398 problem, 350
Pushdown machine, 229 branch
of the computation tree, 193, 197 closed walk, 159
breadth-first traversal, 166, 173 closure under an operation, 214
CNF, 155, 281, 282, 327, 334, 359, 379
bridge, 299 co-computably enumerable, 111
Brocard’s problem, 100 co-NP, 313
Brzozowski’s algorithm, 260 Cobham’s Thesis, 272
Busy Beaver, 133–135 Cobham, A
Busy Beaver problem, 133 picture, 404
button Thesis, 272
start, 5 codomain, 372
codomain versus range, 373
c.e. set, see computably enumerable Collatz conjecture, 97
caching, 69 coloring of a graph, 161
Cantor’s correspondence, 66–74 colors, 280
Cantor’s pairing function, 67 common divisor, 27
Cantor’s Theorem, 76 compiler-compiler, 170
Cantor, G complement of a language, 146
and diagonalization, 390 complete, 115
picture, 61 for a class, 332
cardinality, 59–80 NP, 332
less than or equal to, 75 complete graph, 164
cellular automaton, 43 complexity class, 301
certificate, 312 canonical, 349
characteristic function 1𝑆 , 76 EXP, 347
Chromatic Number problem, 281, 295 NP, 311
chromatic number, 162 P, 302
Church’s Thesis, 14–21 polytime, 302
and uncountability, 77 complexity function, 267
argument by, 19 Complexity Zoo, 302, 349
clarity, 17 composition, 375
consistency, 16 computable
convergence, 16 from a set, 112
coverage, 15 relative to a set, 112
Extended, 304 set, 107
Church, A computable function, 11
picture, 15 relative to an oracle, 112
Thesis, 15 computable functions, 9–11
circuit, 159, 303 computable relation, 11
Euler, 159 computable set, 11
gate, 303 computably enumerable, 106–111
Hamiltonian, 159 in an oracle, 119
wire, 303 𝐾 is complete, 115
Circuit Evaluation problem, 303 computably enumerable set, 107
class, 144, 301 co-computably enumerable, 111
complexity, 301 collection of, RE, 307
Class Scheduling problem, 336 in increasing order, 110
Clique problem, 283, 306, 321, 333 computation
clique, in a graph, 283 distributed, 293
Finite State machine, 185 DAG, directed acyclic graph, 159
nondeterministic Finite State machine, 193, 197 dangling else, 153
relative to an oracle, 112 De Morgan, A
step, 8 picture, 280
Turing machine, 9 dead state, 399
computation tree decidable, 294
branch, 193, 197 language, 100
nondeterministic Finite State machine, 193, 197 decidable language, 307
concatenation of languages, 144 decide a language, 145
concatenation of strings, 371 decided language, 185, 294
configuration, 8, 184, 193, 197 of a nondeterministic Turing machine, 309
halting, 185, 193, 197 decider, 11
initial, 8 decides
conjunction, 281, 377 language, 307
Conjunctive Normal form, 155, 281, 334, 359, 379 set, 11
connected component, 299 deciding
connected graph, 160 a language, 11
connectives decision problem, 3, 12, 35, 294
logical, 377 decrypter, 352
context free degree of a vertex, 162
grammar, 149 degree sequence, 162
language, 233 demon, or daemon, 192
context sensitive grammar, 233 De Morgan’s laws
control, of a machine, 4 for logic expressions, 378
converge, 10 depth-first traversal, 166
Conway, J depth-first traversal, 173
picture, 43 derivation, 148, 149, 151
Cook reducibility, 319 derivation tree, 149
Cook, S determinism, 8, 16
picture, 331 diagonalization, 74–80, 119, 120
Cook-Levin theorem, 331 effectivized, 90
correspondence, 60, 375 routine, 132
Cantor’s, 67 digraph, 159
countable, 62 directed acyclic graph, 159
countably infinite, 62 directed graph, 159
Course Scheduling problem, 290 Discrete Logarithm problem, 299
course-of-values recursion, 30 disjunction, 281, 377
CPU of Turing machine, 4 Disjunctive Normal form, DNF, 40, 379
Crossword problem, 286 distinguishable classes, 251
current symbol, 8 distinguishable states, 250
currying, 393 distributed computation, 293
CW, 164 distributive laws
cycle, 159 for logic expressions, 378
Cyclic Shift problem, 325 diverge, 10
Divisor problem, 286, 300
daemon, see demon divisor, 27
DAG common, 27
traversal, 172 greatest common, 27
DNF, 40, 379 F-SAT problem, 300
domain, 372, 373 Factoring problem, 351, 352
in the Theory of Computation, 373 Prime Factorization problem, 300
Double-SAT problem, 317 Fermat number, 36
doubler function, 3, 12 Fermat prime, 36
dovetailing, 107 Fifteen Game problem, 286, 300
Droste effect, 396 Fin problem, 330
Drummer problem, 326 final state, 179, 181
DSPACE, 348 in transition tables, 181
DTIME, 347 nondeterministic Finite State machine, 192
finite set, 61
edge, 158 Finite State automata, see Finite State machine
Finite State machine, 179–189
edge weight, 159
accept string, 185
Edmonds, J
accepting state, 181
picture, 404
alphabet, 181
effective, 3, 14
computation, 185
Electoral College, 285
configuration, 184
empty language
final state, 181
decision problem, 299
halting configuration, 185
empty string, 𝜀 , 8, 62, 370
initial configuration, 184
encrypter, 352
initial state, 184
Entscheidungsproblem, 3, 12, 14, 35, 294, 339
input string, 184
unsolvability, 96
language of, 185
enumerate, 62
minimization, 249–260
𝜀 closure, 197 next-state function, 181
𝜀 moves, 194 nondeterminism, 308
𝜀 transitions, 194–198 nondeterministic, 192
equinumerous sets, 61 powerset construction, 199
equivalent growth rates, 268 product, 215
equivalent propositional logic statements, 378 product construction, 214
error state, 182, 399
Euler Circuit problem, 280, 300
Pumping Lemma, 220
reject string, 185
Euler circuit, 159 state, 181
Euler, L step, 185
picture, 279 subset method, 213
eval, 83 transition function, 181
exclusive or, 42 Fixed point theorem, 119–125
EXP, 346–347 discussion, 122–124
expansion of a production, 148 Flauros, Duke of Hell
Ext, extensible functions, 119 picture, 192
Extended Church’s Thesis, 304 flow, 326
extended regular expression, 234 flow chart, 82
extended transition function, 185 Four Color problem, 280
for nondeterministic Finite State machines, 198 function, 372–376
nondeterministic Finite State machine, 193 91 (McCarthy), 30
extensible, 119 argument, 372
extensional property, 394 Big O, 267
Big Θ, 268 translation, 319
Boolean, 379 unpairing, 67, 136
boolean, 281 value, 372
characteristic, 76 well-defined, 372, 374
codomain, 372 zero, 24, 35, 49
composition, 375 function problem, 293
computable, 11 functions
computed by a Turing machine, 9 same behavior, 100
converge, 10 same order of growth, 268
correspondence, 60, 375
definition, 372 gadget
diverge, 10 example of, 334
domain, 372 in complexity arguments, 334
doubler, 3, 12 Galilei, Galileo
effective, 3 picture, 59
enumeration, 62 Galileo, see Galilei, Galileo
exponential growth, 270 Galileo’s Paradox, 59, 61, 62
extended transition, 185 Game of Life, 43–46
extensible, 119 rules, 43
general recursive, 35 Gardner, M, 43
identity, 375 gate, 40, 303
image under, 373 gcd, see greatest common divisor
index, 372 general recursion, 31–38
injection, 374 general recursive function, 35
inverse, 376 general unsolvability, 91–94
left inverse, 375 Gödel number, 71
logarithmic growth, 270 Gödel’s multiplicative encoding, 30
𝜇 recursive function (mu recursive), 35 Gödel, K, 14
next-state, 8, 181 letter to von Neumann, 339
one-to-one, 60, 374 picture, 15
onto, 60, 374 picture with Einstein, 128
order of growth, 267 Gödel’s theorem, 14
output, 372 Goldbach’s conjecture, 34, 100, 107
pairing, 67, 136 grammar, 147–157
partial, 10, 373 ambiguous, 152, 153
partial recursive, 35 Backus-Naur form, BNF, 168
polynomial growth, 270 body of a production, 148
predecessor, 6, 23 context free, 149
projection, 24, 35, 49 context sensitive, 233
range, 373 derivation, 148
recursive, 11, 35 expansion of a production, 148
reduction, 320 head, 148
restriction, 373 linear, 203
right inverse, 376 nonterminal, 148
successor, 12, 21, 24, 35, 49 production, 148, 149
surjection, 374 regular, 218
total, 10, 118, 373 rewrite rule, 148, 149
transition, 8, 181, 309 right linear, 203
start symbol, 149 vertex, 158
syntactic category, 149 vertex cover, 283
terminal, 148 vertex degree, 162
graph, 158–168 walk, 159
adjacent, 158 walk length, 159
adjacent edges, 159 weighted, 159
bridge edge, 299 Graph Colorability problem, 281, 300, 322, 336
chromatic number, 162 Graph Connectedness problem, 299, 301
circuit, 159 Graph Isomorphism problem, 299, 337
clique, 283 Graph traversal, 172–176
closed walk, 159 Grassmann, H, 21
coloring, 161–162 picture, 21
colors, 280 greatest common divisor, 27
complete, 164 guessing
connected, 160 by a machine, 191
connected component, 299
cycle, 159 hailstone function, 97
degree sequence, 162 Halt light, 5
digraph, 159 halting configuration, 185, 193, 197
directed, 159 Halting problem, 89–91, 100
directed acyclic, 159 as a decision problem, 300
edge, 158 discussion, 94–96
edge weight, 159 reduction to another problem, 94
Euler circuit, 159 relativized, 116
finite, 158 relativized to a set, 116
Hamiltonian circuit, 159 significance, 95
induced subgraph, 159 unsolvability, 90
infinite, 158 halting state, 12
isomorphism, 162–163 Halts On Three problem, 91, 113, 319
loop, 159 Hamilton, W R
matrix representation, 161 picture, 278
multigraph, 159 Hamiltonian circuit, 159
neighbors, 158 Hamiltonian Circuit problem, 278, 300, 313, 326, 333
node, 158 Hamiltonian Path problem, 313, 344
rank, 173 hard
open walk, 159 for a class, 332
path, 159 NP, 332
planar, 165, 280 haystack, 302
representation, 160–161 head
simple, 158 read/write, 4
spanning subgraph, 283 head of a production, 148
subgraph, 159 Hilbert’s Hotel, 126
trail, 159 Hilbert, D, 3
transition, 7 picture, 127
traversal, 159–160, 166 Hofstadter, D, 396
breadth-first, 173 hyperoperation, 31
depth-first, 173
tree, 159, 283 𝜄 , see inclusion function
I/O head, see read/write head Kleene star, 62, 143, 144, 370, 398
identity function, 375 regular expression, 205
Ignorabimus, 128 Kleene’s fixed point theorem, 120
image under a function, 373 Kleene’s theorem, 206–210
implication, 42 Kleene, S, 35
inclusion function 𝜄 , 374 picture, 204
Incompleteness Theorem, 14 𝐾𝑛 , complete graph on 𝑛 vertices, 164
Independent Set problem, 291, 301, 325, 327, 345 Knapsack problem, 285, 294, 318, 323, 333
index number, 71 Knight’s Tour problem, 279
index set, 101 Knuth, D
indistinguishable states, 250 picture, 273
induced subgraph, 159 Kolmogorov, A
infinite set, 61 picture, 263
infinity, 59–66, 80 König’s lemma, 160, 310
initial configuration, 8, 184, 193, 197 Königsberg, 279
initial state, 184
injection, 374 L’Hôpital’s Rule, 269
input L-distinguishable, 242
loading, 9 L-indistinguishable, 242
input alphabet, 181 L-related, 242
input string, 184, 193, 197 lambda calculus, 𝜆 calculus, 15
input symbol, 8 language, 143–147
input, to a function, 372 + operation, 218
instruction, 5, 8, 309 accept, 145
stack machine, 229 accepted by a Finite State machine, see lan-
Integer Linear Programming problem, 316 guage, recognized by a Finite State ma-
decision problem, 328 chine
inverse of a function, 376 accepted by Turing machine, 105, 294
left, 375 class, 144
right, 376 complement, 146
two-sided, 376 concatenation, 144
isomorphic graphs, 162 context free, 233
isomorphism, 162 decidable, 100, 307
decide, 145
Johnson, K decided, 294
picture, 383 decided by a Finite State machine, 185
decided by a Turing machine, 11, 307
𝑘 Coloring problem, 161, 281 decision problem, 294
𝐾 , the Halting problem set, 90, 109 derived from a grammar, 151
complete among computably enumerable sets, grammar, 148, 149
115 Kleene star, 144
𝐾0 , set of halting pairs, 99, 110, 114 non-regular, 220–226
Karatsuba, A, 264 of a Finite State machine, 185
Karp reducible, 320 of a nondeterministic Finite State machine, 193
Karp, R operations on, 144
picture, 332 power, 144
Kayal, N recognize, 145
picture, 287 recognized, 294
    recognized by a Finite State machine, 185
    recognized by a Turing machine, 11
    recognized by Turing machine, 294
    regular, 214–219
    reversal, 144
    verifier, 312
language decision problem, 294
last in, first out (LIFO) stack, 226
left inverse, 375
leftmost derivation, 149
Legendre's conjecture, 35
LEGO, 5
length, 159
length of a string, 370
Levin, L
    picture, 331
lexicographic order, 62
Life, Game of, 43–46
    rules, 43
LIFO stack, 226
light
    Halt, 5
Linear Divisibility problem, 318
Linear Programming language decision problem, 285, 301, 316, 326
Lipton's Thesis, 297
loading, 9
logic gate, 40
logical connectives, 377
logical operator
    and, 281, 377
    not, 281, 377
    or, 281, 377
logical operators, 377
Longest Path problem, 318, 345
LOOP
    language, 51
    program, 51
loop, 159
LOOP program, 51–56
M-related, 244
machine
    state, 9
many-one reducible, 319
map, see function
mapping reducible, 319
Marriage problem, see Drummer problem or Matching problem
Matching problem, 342
matching, three dimensional, 284
Max Cut problem, 284
Max-Flow problem, 326
McCarthy's 91 function, 30
memoization, 69
memory, 4
metacharacter, 148, 204
Meyer, A
    picture, 51
Minimal Spanning Tree problem, 294
minimization, 33
minimization of a Finite State machine, 249–260
    Brzozowski's algorithm, 260
    Moore's algorithm, 250
minimization, unbounded, 35
Minimum Spanning Tree problem, 283
modulus, 352
Moore's algorithm, 250
Morse code, 164
𝜇-recursion (mu recursion), 33
𝜇 recursive function, 35
multigraph, 159
multiset, 284
Musical Chairs, 75
Myhill, J
    picture, 247
Myhill-Nerode theorem, 242–249
𝑛-distinguishable states, 251
𝑛-indistinguishable states, 251
𝑛-distinguishable classes, 251
Naur, P
    picture, 168
Nearest Neighbor problem, 299, 301
needle, 302
negation, 281, 377
neighbors, 158
Nerode, A
    picture, 247
next state, 5, 8
next tape action, 5
next-state function, 8, 181
NFSM, see nondeterministic Finite State machine
node, 158
    rank, 166
nondeterminism, 189–203
    for Finite State machines, 192, 308
    for Turing machines, 308
Nondeterministic Bounded Halting problem, 365
nondeterministic Finite State machine, 192
    accept string, 193, 197
    computation, 193, 197
    computation tree, 193, 197
    configuration, 193, 197
    convert to a deterministic machine, 198, 199
    𝜀 moves, 194
    𝜀 transitions, 194
    halting configuration, 193, 197
    initial configuration, 193, 197
    input string, 193, 197
    language of, 193
    language recognized, 193
    reject string, 193, 197
nondeterministic machine
    recognizes a language, 318
nondeterministic Pushdown machine, 226–234
nondeterministic Turing machine
    accepting state, 309
    decided language, 309
    definition, 309
    instruction, 309
    transition function, 309
nonterminal, 148, 149
NP, 308–319
NP complete, 331–337
    basic problems, 333
NP hard, 332
NP intermediate problems, 337
NSPACE, 348
NTIME, 347
numbering, 71
    acceptable, 71
Ω, Big Omega, 269
𝑜, omicron, 269
one-to-one function, 60, 374
onto function, 60, 374
open walk, 159
operators
    logical, 377
optimization problem, 293
optimization problem reducibility, 329
oracle, 111–119
    computably enumerable in, 119
    computation relative to, 112
    set computable from, 112
oracle Turing machine, 112
order of growth, 263–278
    function, 267
    hierarchy, 271
ouroboros, 82
output, from a function, 372
P, 301–308
P hard, 323
P versus NP, 311, 337–343
pairing function, 67, 136
Paley, W
    picture, 130
palindrome, 14, 143, 230, 371
paradox
    Aristotle's, 59
    Galileo's, 59
    Zeno's, 62
parameter, 85
Parameter theorem, 85
parametrization, 84–87
parametrizing, 85
parentheses
    balanced, 227
parse tree, 149
parser-generator, 170
partial function, 10, 373
partial recursive function, 35
Partition problem, 285, 317, 333, 334, 345
path, 159
perfect number, 95
Péter, R
    picture, 47
Petersen graph, 164, 166
pipe, alternation operator, 204
pipe symbol, |, 148
planar graph, 165, 280
pointer, in C, 123
polynomial time, 302
polynomial time reducibility, 320
polytime, 302
power of a language, 144
power of a string, 371
powerset construction, 199
predecessor function, 6, 23
prefix of a string, 371
present state, 5, 8
present tape symbol, 5
primality, 287
Primality problem, 287, 293, 294, 300, 314
Prime Factorization problem, 287, 293, 337, 344
primitive recursion, 21–30, 35
    arity, 23
primitive recursive functions, 24
private key, 352
problem, 293
    decision, 294
    function, 293
    Halting, 90, 91
    language decision, 294
    optimization, 293
    search, 294
    unsolvable, 91
problem miscellany, 278–292
problems
    tractable, 272
    unsolvable, 106
product construction, 214
production, 149
production in a grammar, 148
program, 293
projection function, 24, 35, 49
proper subtraction, 24
property
    extensional, 394
Propositional logic, 377–379
    atom, 288
    Boolean algebra, 378
    Boolean function, 379
    Conjunctive Normal form, 155, 282, 327, 359, 379
    DeMorgan's laws, 378
    Disjunctive Normal form, 40, 379
    distributive laws, 378
    exclusive or, 42
    Implication, 42
    operators, 377
pseudopolynomial, 275
public key, 352
Pumping lemma, 220
pumping length, 220
Pushdown automata, see pushdown machine
Pushdown machine, 226–234
    input alphabet, 229
    nondeterministic, 226–234
    stack alphabet, 229
    transition function, 229
pushdown stack, 226
quantum advantage, 304
Quantum Bogosort, 409
Quantum Computing, 304
quantum computing
    quantum advantage, 304
    quantum supremacy, see also quantum advantage
quine, 130
Quine's paradox, 396
r.e. set, see computably enumerable set
Radó, T
    picture, 133
RAM, see Random Access machine
Random Access machine, 272
range of a function, 373
rank, 166, 173
RE, computably enumerable sets, 295
reachable vertex, 160, 282
read/write head, 4
REC, computable sets, 295, 407
recognize a language, 145
recognized language
    of a Finite State machine, 185
    of a nondeterministic Finite State machine, 193
    of a Turing machine, 294
recognizing
    a language, 11
recursion, 21–38
    course-of-values, 30
Recursion theorem, 120
recursive function, 11, 35
recursive set, 11
recursively enumerable set, see computably enumerable set
reduces to, 112
reducibility
    between optimization problems, 329
    Cook, 319
    Karp, 320
    polynomial time, 320
    polytime, 320
    polytime many-one, 320
    polytime mapping, 320
    polytime Turing, 319
reducible
    many-one, 319
    mapping, 319
reduction from the Halting problem to another, 94
reduction function, 320
reductions between problems, 94, 319–331
Reflections on Trusting Trust, 132
Reg problem, 330
regex, 234
regular expression, 204–213
    extended, 234
    in practice, 234–242
    operator precedence, 205
    regex, 234
    semantics, 205
    syntax, 204
regular grammar, 218
regular language, 214–219
reject an input, 185, 193, 197
rejects, 181
relation, computable, 11
relativized Halting problem, 116
    for a set, 116
replication of a string, 371
representation, of a problem, 297
restriction of a function, 373
reversal of a language, 144
reversal of a string, 371
rewrite rule, 148, 149
Rice's theorem, 100–106
right inverse, 376
right linear, 203
Ritchie, D
    picture, 51
Rivest, R
    picture, 351
root, 159
RSA Encryption, 351–356
Russell set, 391
𝑠-𝑚-𝑛 theorem, 85
same behavior, functions with, 100
same order of growth, 268
SAT, see Satisfiability problem
SAT solver, 358
Satisfiability problem, 282, 291, 296, 312, 321, 322, 327, 331
    as a language recognition problem, 295
    on a nondeterministic Turing machine, 310
satisfiable Propositional logic expression, 281
Satisfying Assignment problem, 296
Saxena, N
    picture, 287
schema of primitive recursion, 23
Science United, 293
search problem, 294
self reproducing program, 130
self reproduction, 129–132
semicomputable set, 107
semidecidable set, 107
semidecide a language, 145
semiprime, 287
Semiprime problem, 317
set
    c.e., 107
    cardinality, 61
    computable, 11, 107
    computably enumerable, 106–111
    countable, 62
    countably infinite, 62
    decider, 11
    equinumerous, 61
    finite, 61
    index, 101
    infinite, 61
    oracle, 111–119
    r.e., see computably enumerable set
    recursive, 11
    recursively enumerable, see computably enumerable set
    reduces to, 112
    semicomputable, 107
    semidecidable, 107
    𝑇 equivalent, 114
    Turing equivalent, 114
    uncountable, 75
    undecidable, 91
Set Cover problem, 326
Shamir, A
    picture, 351
Shannon, C
    picture, 40
Shortest Path problem, 280, 293, 300, 301, 320
Σ function, 133
∼, asymptotically equivalent, 277
simple graph, 158
SPACE, 348
span a graph, 283
spanning subgraph, 283
𝑠𝑡-Connectivity problem, see Vertex-to-Vertex Path problem
𝑠𝑡-Path problem, see Vertex-to-Vertex Path problem
stack, 226
    alphabet, 229
    bottom, ⊥, 227
    LIFO, Last-In, First-Out, 226
    pop, 226
    push, 226
Start button, 5, 181
start state, 5, 181
    Pushdown machine, 229
start symbol, 149
state, 181
    accepting, 179, 181, 309
    dead, 399
    error, 399
    final, 179, 181
    halting, 12
    next, 5
    present, 5
    start, 5
    unreachable, 105
    working, 12
state machine, 9, 384
states, 4
    distinguishable, 250
    indistinguishable, 250
    𝑛-distinguishable, 251
    𝑛-indistinguishable, 251
    set of, 8
Stator square, 406
STCON problem, see Vertex-to-Vertex Path problem
step of a computation, 8, 185
store, of a machine, 4
str function, 298
Strict 3-Satisfiability, 327
string, 143, 370–371
    concatenation, 371
    decomposition, 371
    empty, 8, 62, 370
    length, 370
    power, 371
    prefix, 371
    replication, 371
    reversal, 371
    substring, 371
    suffix, 371
string accepted
    by deterministic Finite State machine, 181, 185
    by nondeterministic Finite State machine, 193, 197
string rejected, 181
String Search problem, 302
subgraph, 159
    induced, 159
subset method, 213
Subset Sum problem, 284, 294, 301, 317, 323, 345
substring, 371
Substring problem, 325
successor function, 12, 21, 24, 35, 49
suffix of a string, 371
surjection, 374
symbol, 8, 143, 370
    action, 8
    current, 8
    input, 8
syntactic category, 149
𝑇 equivalent, 114
𝑇 reducible, 112
table, transition, 7
table-filling algorithm, 250
tail recursion, 175
tape, 4
tape alphabet, 8
tape symbol, 8
    blank, 5
terminal, 148, 149
tetration, 32
Thompson, K
    picture, 132
Three Dimensional Matching problem, 284, 317
time taken by a machine, 273
token, 143, 370
Tot, set of total computable functions, 110, 118
total function, 10, 118, 373
Towers of Hanoi, 26
tractable, 271–272
trail, 159
transformation function, see reduction function
transition function, 8, 181, 309
    extended, 185, 198
    graph of, 7
    Pushdown machine, 229
    table of, 7
transition graph, 7
transition table, 7
translation function, 319
Traveling Salesman problem, 190, 279, 296, 311, 326, 328, 333, 344, 357
    Asymmetric, 328, 329
traversal, 166
tree, 159, 283
    binary, 176
    rank, 166
    root, 159
    traversal, 172
Triangle problem, 306
triangular number, 26
truth table, 281, 377
Turing equivalent, 114
Turing machine, 3–14
    accept a language, 105
    accepting a language, 307
    action set, 8
    action symbol, 8
    computation, 9
    configuration, 8
    control, 4
    CPU, 4
    current symbol, 8
    decidable, 294
    decides a set, 11
    deciding a language, 11, 307
    definition, 8
    deterministic, 8
    for addition, 6
    function computed, 9
    Gödel number, 71
    index number, 71
    input symbol, 8
    instruction, 5, 8
    language accepted, 294
    language decided, 294
    language recognized, 294
    multitape, 20
    next action, 5
    next state, 5, 8
    next-state function, 8
    nondeterminism, 308
    numbering, 71
    palindrome, 14
    present state, 5, 8
    present symbol, 5
    recognizing a language, 11
    simulator, 38–39
    tape alphabet, 8
    transition function, 8
    universal, 81–83
    with oracle, 112
Turing reducibility, 319
Turing reducible, 112, 319
Turing, A
    picture, 3
Turnpike problem, 318
two-sided inverse, 376
unbounded minimization, 33
unbounded search, 33
uncountable, 75
undecidable, 91
Unicode, 182, 400
uniformity, 83–84
Universal Turing machine, 81–83
universality, 80–89
unpairing function, 67, 136
unreachable state, 105
Unsolvability
    in intellectual culture, 127–129
unsolvability, 106
unsolvable problem, 91, 106
use-mention distinction, 123
value, of a function, 372
verifier, 312
    polytime, 312
vertex, 158
    rank, 166
    reachable, 160, 282
vertex cover, 283
Vertex Cover problem, 283, 291, 325, 333
Vertex-to-Vertex Path problem, 282, 301, 306, 320
von Neumann, J
    architecture, 43
    picture, 43
walk, 159
walk length, 159
weight, 159
weighted graph, 159
well-defined, 372, 374
wire, 303
witness, 312
word, see string
working state, 12
XOR, 42
⊢, yields
    for Finite State machines, 185
    for nondeterministic Finite State machines, 193, 197
    for Turing machines, 9
Zeno's Paradox, 62
zero function, 24, 35, 49
Zoo, Complexity, 349