Downey R. Computability and Complexity. Foundations and Tools... 2024
Rod Downey
Computability
and Complexity
Foundations and Tools for
Pursuing Scientific Applications
Undergraduate Topics in Computer Science
Series Editor
Ian Mackie
Advisory Board
Samson Abramsky , Department of Computer Science, University of Oxford,
Oxford, UK
Chris Hankin , Department of Computing, Imperial College London, London, UK
Mike Hinchey , Lero — The Irish Software Research Centre, University of
Limerick, Limerick, Ireland
Dexter C. Kozen, Department of Computer Science, Cornell University, Ithaca,
NY, USA
Hanne Riis Nielson , Department of Applied Mathematics and Computer Science,
Technical University of Denmark, Kongens Lyngby, Denmark
Steven S. Skiena, Department of Computer Science, Stony Brook University, Stony
Brook, NY, USA
Iain Stewart , Department of Computer Science, Durham University, Durham, UK
Joseph Migga Kizza, Engineering and Computer Science, University of Tennessee
at Chattanooga, Chattanooga, TN, USA
Roy Crole, School of Computing and Mathematics Sciences, University of Leicester,
Leicester, UK
Elizabeth Scott, Department of Computer Science, Royal Holloway University of
London, Egham, UK
Andrew Pitts , Department of Computer Science and Technology, University of
Cambridge, Cambridge, UK
‘Undergraduate Topics in Computer Science’ (UTiCS) delivers high-quality
instructional content for undergraduates studying in all areas of computing and
information science. From core foundational and theoretical material to final-year
topics and applications, UTiCS books take a fresh, concise, and modern approach
and are ideal for self-study or for a one- or two-semester course. The texts are
authored by established experts in their fields, reviewed by an international advisory
board, and contain numerous examples and problems, many of which include fully
worked solutions.
The UTiCS concept centers on high-quality, concise books in softback format. For advanced undergraduate textbooks that are likely to be longer and more expository, Springer continues to offer the highly regarded Texts in Computer Science series, to which we refer potential authors.
Rod Downey
School of Mathematics and Statistics
Victoria University
Wellington, New Zealand
© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature
Switzerland AG 2024
This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether
the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse
of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and
transmission or information storage and retrieval, electronic adaptation, computer software, or by similar
or dissimilar methodology now known or hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication
does not imply, even in the absence of a specific statement, that such names are exempt from the relevant
protective laws and regulations and therefore free for general use.
The publisher, the authors, and the editors are safe to assume that the advice and information in this book
are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or
the editors give a warranty, expressed or implied, with respect to the material contained herein or for any
errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional
claims in published maps and institutional affiliations.
This Springer imprint is published by the registered company Springer Nature Switzerland AG
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland
One of the great achievements of humanity has been the clarification of the
concept of an algorithmic process. Having done this, we then sought to un-
derstand the structure of algorithms and their complexity. Whilst this story
goes back literally thousands of years, it has only been relatively recently
that we have had the mathematical tools to deal with these issues.
This book deals with the amazing intellectual development of the ideas
of computability and complexity over the last 150 years or so. These ideas
are deep, and they should be known by working mathematicians and computer
scientists.
The goal of this book is to give a straightforward introduction to the
main ideas in these areas, at least from my own biased point of view. The
book has been derived from various courses I have taught at Victoria University
of Wellington, the University of Wisconsin-Madison, Notre Dame, the University
of Chicago, Cornell University, Nanyang Technological University, and
elsewhere over the last 35 years. The target audience is somewhere around
advanced undergraduate and beginning graduate students: in the
British system, final-year undergraduate students and Masters Part 1 students;
in the US system, seniors and first-year graduate students.
This text could also be used by a professional mathematician to gain a
thorough grounding in computability and computational complexity.
The topics covered include basic naive set theory, regular languages and au-
tomata, models of computation, undecidability proofs, classical computability
theory including the arithmetical hierarchy and the priority method, the ba-
sics of computational complexity, hierarchy theorems, NP and PSPACE com-
pleteness, structural complexity such as oracle results, parameterized com-
plexity and other methods of coping such as approximation, average case
complexity, generic case complexity and smoothed analysis.
There are a number of topics in this book which have never been given in
a basic text before. I have included the fascinating series of reductions used
by John Conway to show that a variation of the “Collatz 3n + 1 Function” is a
universal programming language, something I have found is a great teaching
tool.
Other areas of science always cite source papers². (Of course this can only
be done within reason, as we would otherwise endlessly be citing Euclid!) I
sincerely hope that the rather large list of references will serve as a resource
for the reader.
The theory of computation is now a huge subject. It is a great intellectual
achievement. It is also a bit daunting to those who enter. I hope this text will
serve as a guide to help you on your first steps.
² It is a striking fact that up to around 2012, the centenary of Turing’s birth, Turing’s
most cited paper was not the fundamental work he did on laying the foundations of the
theory of computation, but one in biology called “The Chemical Basis of Morphogenesis”
[Tur52]. This work describes how patterns in nature, such as stripes (e.g. in zebras) and
spirals, can arise naturally from a homogeneous, uniform state. It was also one of the
earliest simulations of nonlinear partial differential equations.
Acknowledgements
Introduction
³ https://en.wikipedia.org/wiki/YBC_7289.
It is reasonable to assume that Babylonian mathematicians had an algorithm
for approximating √2 to an arbitrary precision [FR98].⁴ In India, the
text Shatapatha Brahmana used 339/108 as an approximation of π. This work
is dated (if you can believe it) either 600, 700 or 800 BC. Again, there must
have been some algorithm for this approximation.
At the heart of our understanding of computability in mathematics and
computer science is the notion of an algorithm. The etymology of this word
goes back to Al-Khwarizmi, a Persian astronomer and mathematician who
wrote a treatise in 825 AD, On Calculation with Hindu Numerals; the word
itself arose from an error in the Latin translation of his name.
Likely the first algorithm which we might be taught in an elementary
algebra course at university is Euclid’s algorithm, devised around 300 BC.
Euclid, or perhaps Team Euclid⁵, devised this algorithm for determining the
greatest common divisor of two numbers. Here is an example of Euclid’s
Algorithm.
Euclid’s Algorithm
• To find gcd(1001,357).
• 1001 = 357 · 2 + 287
• 357 = 287 · 1 + 70
• 287 = 70 · 4 + 7
• 70 = 7 · 10
• 7 = gcd(1001, 357).
⁴ Fowler and Robson use their understanding of Old Babylonian mathematics to suggest an
algorithm that the Babylonians may have used. Possibly it was something called “Hero’s
Method”, which begins with a guess and refines it iteratively.
⁵ In those days, people such as Euclid would have had a number of disciples working for
them, so it is not completely clear who was responsible for such results.
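For readers who like to experiment, the division chain above can be run directly; here is a minimal sketch of Euclid’s algorithm in Python:

```python
def gcd(a, b):
    """Euclid's algorithm: repeatedly replace (a, b) by (b, a mod b)."""
    while b != 0:
        a, b = b, a % b
    return a

print(gcd(1001, 357))  # → 7, matching the division chain above
```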
The invariants guaranteed by Hilbert’s Basis Theorem are important for applications,
and hence we need to know them. Is there an algorithm which generates
them? And, if so, how quickly can we compute them? Actually finding
these invariants, rather than simply proving that they exist, requires significant
algorithmic insight. This quest led to a new subject called Gröbner
Basis Theory, an area due to Buchberger [Buc65].
Hilbert was the leading mathematician of his day. In a famous list of
problems for the new century (i.e. the 20th Century, published in [Hil12])
presented at the 1900 International Congress of Mathematicians, Hilbert asked
what is called the Entscheidungsproblem: the decidability of first order logic. More
precisely, we know that there is a way of deciding if a formula of propositional
logic is a tautology: you draw up a truth table, and see if all the lines are true.
That is, propositional logic has a decision procedure for validity⁸. Suppose
we enrich our language and add quantifiers and predicates to form first order
logic.
⁶ This is likely a myth, as the first reference to this “quote” was 25 years after Hilbert’s
paper and after Gordan’s death. But it is one of those things that should be true even if it
is not!
⁷ Another version says “Zauberei” (sorcery) in place of “Theologie”.
⁸ Not a very efficient method, since a propositional formula with 12,000 variables
would have a truth table with 2^{12,000} lines. In Chapter 7, we will examine the question
of whether there is a shorter decision method for this logic.
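The truth-table procedure is easy to sketch in code, and the loop over all 2ⁿ assignments below is exactly why it is so inefficient (the function and variable names are illustrative, not from the text):

```python
from itertools import product

def is_tautology(formula, variables):
    """Brute-force validity test: check the formula under all 2^n assignments."""
    for values in product([False, True], repeat=len(variables)):
        if not formula(dict(zip(variables, values))):
            return False
    return True

def implies(a, b):
    return (not a) or b

# Peirce's law, ((p -> q) -> p) -> p, is a classical tautology.
peirce = lambda v: implies(implies(implies(v["p"], v["q"]), v["p"]), v["p"])
print(is_tautology(peirce, ["p", "q"]))        # → True
print(is_tautology(lambda v: v["p"], ["p"]))   # → False
```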
Other examples, such as the work of Kronecker, Hermann, Dehn, and von
Mises [Kro82], [Her26], [Deh11], show that phrases such as ‘by finite means’
or ‘by constructive measures’ were relatively standard, but lacked any precise
definition.
It was in the work of a collection of logicians in the 1930s that the notion of
algorithm was given a mathematical definition. The culmination of this work
was Alan Turing’s famous paper of 1936 [Tur36]. This definition
used a model now called a Turing Machine. Turing gave a detailed conceptual
analysis of human thought, and used this as the basis for the justification of
his model of a “decision procedure”; the point being that in the 1930s, a
decision procedure would be something a human could do. Turing’s model
was immediately accepted as a true model of computation, and allowed him
⁹ Strictly speaking, Hilbert asked for mathematicians to give such a decision procedure.
See §4.3.6.
¹⁰ Quotes and comments from Borel’s paper [Bor12] are based on a translation (French to
English) by Avigad and Brattka [AB12].
to give a proof that first order logic was undecidable. There is no decision
procedure.
We will look at Turing’s work and other models of computation in Chapter
3. These include partial recursive functions and register machines. The
other models are all equivalent to Turing’s model, and hence support what is
called the Church-Turing Thesis. This states that Turing machines capture
the intuitive notion of computation. (We will discuss this further in Chapter
3.)
These models are also useful in establishing other undecidability results,
such as ones from “normal mathematics”: group theory, algebra,
analysis, etc. Typical undecidability results work as follows. We have some
problem we suspect is algorithmically undecidable. To establish undecidability,
we use a method called “the method of reductions”. First we establish
that a certain problem is algorithmically unsolvable, namely the Halting Problem
for Turing Machines. Then, given a problem, we show that if the given
problem were decidable, then we could solve the halting problem. That is, we
“code” the halting problem into the problem at hand.
Because the models are all equivalent, we don’t need to start with Turing
machines, but could code any undecidable problem for the given model. As
an illustration, we will look at a seemingly tame problem generalizing the
famous Collatz Conjecture. The Collatz problem takes a positive integer n,
and defines a function:

    f(n) = n/2      if n is even
    f(n) = 3n + 1   if n is odd

We then look at the sequence of iterates f(n), f(f(n)), . . . . The Collatz Conjecture
states that we always get back to 1 no matter what n we start with. In
[Con72], John Conway defined a variation of this where we replace even and
odd by a sequence of congruences mod p, and asked what we can predict algorithmically
about the sequence of iterates we get. That is, we give a set of rationals
{r_0, . . . , r_{p−1}, d_0, . . . , d_{p−1}} and define g(n) = r_i n + d_i if n ≡ i mod p. In
the original Collatz conjecture, p = 2 and we have r_0 = 1/2, d_0 = 0; r_1 = 3, d_1 = 1.
Amazingly, for suitably chosen rationals r_i, with d_i = 0 for all i, and integer
p, g = g⟨r_0,...,r_{p−1}⟩ can simulate a universal model of computation equivalent to
the Turing machine! Hence we can’t predict anything about the behaviour of the iterates!
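Both the classical function and its generalization are easy to iterate in code; a sketch (the helper name `make_collatz` is my own, not from the text):

```python
from fractions import Fraction

def make_collatz(rs, ds):
    """Generalized Collatz function: g(n) = r_i * n + d_i when n ≡ i (mod p)."""
    p = len(rs)
    def g(n):
        i = n % p
        value = Fraction(rs[i]) * n + ds[i]
        assert value.denominator == 1, "parameters should send integers to integers"
        return int(value)
    return g

# The classical case: p = 2, r0 = 1/2, d0 = 0, r1 = 3, d1 = 1.
f = make_collatz([Fraction(1, 2), Fraction(3)], [0, 1])
n, trail = 7, [7]
while n != 1:
    n = f(n)
    trail.append(n)
print(trail)  # → [7, 22, 11, 34, 17, 52, 26, 13, 40, 20, 10, 5, 16, 8, 4, 2, 1]
```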
We prove Conway’s result in Chapter 4. We do so by coding Register Ma-
chines into Conway/Collatz functions. In Chapter 4 we will also prove several
other natural undecidability results-undecidability results concerning prob-
lems which appear in “normal mathematics”, such as in algebra or analysis.
These include word problems in semigroups and we will sketch the undecid-
ability of the Entscheidungsproblem. To complete the chapter, we will look
at the Exponential case of Hilbert’s 10th Problem. This algorithmic problem
asks for an algorithm to determine zeroes of multivariable polynomials with
rational coefficients. Again, we will show that the relevant problem is algo-
xviii Introduction
Hamilton Cycle
Input: A graph G.
Question: Is there a cycle through the vertices of G going through every
vertex exactly once?
Essentially the only known way to solve this is to “try all possibilities”. For
a graph with 50,000,000 vertices this would involve looking at around 50,000,000!
orderings.
Euler Cycle
Input: A graph G.
Question: Does G have a cycle passing through every edge of G exactly
once?
Long ago, Euler [Eul36] proved¹¹ that G has an Euler cycle iff G is connected
and has only vertices of even degree. For a graph with 50,000,000
vertices, deciding this would take around 50,000,000² many steps, well within
the range of modern computers, as it involves simply looking at all pairs of
vertices: first seeing if the graph is connected (which uses an efficient algorithm
called network flow, running in around n² many steps), and then seeing
what the degree of each vertex of G is (taking about n many steps, where n
is the number of vertices).
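Euler’s criterion translates directly into a fast test. Here is a sketch for simple graphs; the adjacency representation and names are my own, and it uses a plain depth-first search for connectivity rather than network flow:

```python
def has_euler_cycle(adj):
    """Euler's criterion for a simple graph: an Euler cycle exists iff the
    graph is connected (ignoring isolated vertices) and every degree is even."""
    # Every vertex must have even degree.
    if any(len(nbrs) % 2 != 0 for nbrs in adj.values()):
        return False
    # Depth-first search from some vertex that has an edge.
    start = next((v for v in adj if adj[v]), None)
    if start is None:
        return True  # no edges at all: trivially Eulerian
    seen, stack = set(), [start]
    while stack:
        v = stack.pop()
        if v not in seen:
            seen.add(v)
            stack.extend(adj[v])
    return all(v in seen for v in adj if adj[v])

square = {1: {2, 4}, 2: {1, 3}, 3: {2, 4}, 4: {1, 3}}  # a 4-cycle: Eulerian
path = {1: {2}, 2: {1, 3}, 3: {2}}                     # a path: two odd-degree ends
print(has_euler_cycle(square), has_euler_cycle(path))  # → True False
```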
Why is one problem, Hamilton Cycle, seemingly impossible to solve efficiently,
and the other, Euler Cycle, “fairly quick” to solve? After all, all we
have done is replace “all vertices” by “all edges”. In the case of Hamilton
Cycle, can we show that we cannot do better than trying all possibilities?
That would certainly show that the problem is infeasible, whatever we mean
by infeasible.
Is it that we are not smart enough to find some efficient algorithm? What
do we mean by “efficient” anyway? What is our definition?
It was in the 1960s that authors such as Hartmanis and Stearns [HS65] gave
a formal framework for the investigation of the complexity of algorithms¹².
Their framework was based on putting time and space constraints on the
model used for computability theory. They used their model to calibrate
complexity into time and space bounded classes. At more or less the same
time, Edmonds [Edm65] also suggested asymptotic analysis of the behaviour
of algorithms as a basis for classifying their difficulty, suggesting that
counting computation steps and using polynomial time was a reasonable approximation
for feasibility.
The major advance was the identification of the class NP. This is the class
of problems which can be solved in nondeterministic polynomial time. Nondeterminism
was discovered decades earlier in an area called automata theory.
However, the introduction of nondeterminism into complexity analyses was
found to give great insight into understanding the intrinsic difficulty of
computational problems.
¹¹ Actually, in 1736, Euler only proved the necessary condition; it was only later, in
1871, that Hierholzer [Hie73] established that the conditions were sufficient, although
sufficiency had been stated by Euler.
¹² There are many other contributors to the birth of computational complexity, such as
Pocklington (namely [Poc12]), Cobham, Rabin, and others. There is also a famous letter
by Gödel in which he describes the class NP well before it was defined. The reader should
read the preface of Demaine, Gasarch and Hajiaghayi [DGH24].
In the last chapter, Chapter 10, we will rather briefly look at other methods
of dealing with the murky universe of intractability. These will include
approximation algorithms (where we give up on an exact solution and seek
one which is “close” by some approximation measure, e.g. Ausiello et al.
[ACG+99]); and average case complexity (where we seek to understand how
algorithms behave “on a typical input”, Levin [Lev86]). We will also give, for
the first time in a text, a treatment of the ideas of generic case complexity
(Kapovich et al. [KMSS03]), a variation of average case complexity
within which sometimes even undecidable problems become feasible. In the
last section of the chapter, we will look at the idea behind Spielman and
Teng’s smoothed analysis ([ST01]), a relatively recent approach to
understanding why well-known algorithms such as the Simplex Method
for linear programming work much more quickly in practice than we
would expect, given that their worst case performance is known to be exponential
time. In a basic textbook such as this one, it is impossible to give a complete
treatment of these approaches to dealing with intractability, as each of these
topics deserves a book to itself (and has one already in most cases!). But
I believe that it is important for the reader to see some of the horizons, and
hopefully be inspired to pursue them further. This is a starred section which is not
essential for the remainder of the book, more of a tour d'horizon.
The only chapters we have not yet discussed are the first two. The first
two chapters are concerned with, respectively, naive set theory and automata
theory.
Naive Set Theory gives a platform for the material to follow, and intro-
duces a number of key ideas, such as coding and diagonalization, more easily
understood in a situation where there is no computation happening.
The second chapter gives a general introduction to computation via a
primitive computation device called an automaton. It also introduces the
notion of nondeterminism. Aside from the use for the later sections of the
book, regular languages and automata theory are a thriving and important
area of computer science in their own right.
Starred Material. Some material is starred. This means that it can easily
be skipped, especially on a first reading. Starred material is not necessary for
the rest of the book.
[Figure: a diagram of inclusions among the classes studied in this book, including the arithmetical hierarchy (Σ_n^0, Π_n^0, Δ_n^0), the polynomial hierarchy (Σ_n^P, Π_n^P, PH), the space and time classes PSPACE, E, EXPTIME and EXPSPACE, and the parameterized classes W[2], W[P] and XP.]
Part I Background
2.5 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
2.6 Kleene’s Theorem: Finite State = Regular . . . . . . . . . . . . . . . . . 33
2.6.1 Historical Notes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
2.7 The Myhill-Nerode Theorem∗ . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
2.7.1 The Method of Test Sets . . . . . . . . . . . . . . . . . . . . . . . . . . 40
2.7.2 State Minimization∗ . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
2.7.3 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
2.7.4 Historical Notes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
4 Undecidable Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71
4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71
4.1.1 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72
4.1.2 The Halting Problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
4.1.3 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75
4.2 Minsky Machines and Generalized Collatz Functions . . . . . . . . 76
4.2.1 Collatz functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76
4.2.2 Vector games . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78
4.2.3 Rational Games . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82
4.2.4 Generalized Collatz functions . . . . . . . . . . . . . . . . . . . . . . 83
4.2.5 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84
4.3 Unsolvable Word problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85
4.3.1 Semi-Thue Processes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85
4.3.2 Thue Processes and Word Problems in Semigroups . . . 89
4.3.3 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91
11 Solutions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 313
11.1 Chapter 1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 313
11.2 Chapter 2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 314
11.3 Chapter 3 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 315
11.4 Chapter 5 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 319
11.5 Chapter 6 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 320
11.6 Chapter 7 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 321
11.7 Chapter 8 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 324
11.8 Chapter 9 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 324
11.9 Chapter 10 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 325
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 327
Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 341
Part I
Background
Chapter 1
Some Naive Set Theory
Abstract This chapter gives meaning to the notion of size (cardinality) for
infinite sets. We define countable and uncountable sets, and introduce Gödel
numbering, coding, and diagonalization arguments. These ideas will be recy-
cled throughout the book.
1.1 Introduction
It may seem strange that we begin our journey towards understanding the
basics of the theory of computation with something much older: naive set the-
ory. This goes back to famous work of Georg Cantor [Can74, Can78, Can79]
in the late 19th century. This work attempts to give meaning to |A|, the
“size” of a set A for A infinite. The reason we begin with set theory is that
the ideas and techniques such as diagonalization and coding will have reflec-
tions in much of what follows, so it is useful to see them in the pure situation
of not having the overlay of computation.
We will be looking at “naive” set theory, which takes “set” as an undefined
term and allows definitions of sets as S = {x : R(x) holds}, without caring
about what properties R we allow. Russell’s Paradox from 1901 points out
that this naive approach leads to problems when, for example, we consider
S = {x : x ∉ x}, the collection of sets which are not members of themselves.
If we then ask “Is S ∈ S?”, either answer leads to a contradiction. There are
methods to circumvent such paradoxes, and these belong to an area of mathematical
logic called axiomatic set theory; the reader is urged to pursue
these issues (see, for example, Kunen [Kun11]). We will be concerned
with sets commonly found in mathematics, such as N, Z, R and the like, and
for the questions we will pursue such issues do not generally arise.
How do we give meaning to the size of sets? If we have two bags of candy and
ask children who cannot count which bag has more, they would likely pair the
candies off one at a time; whoever had candies left over would know they
had more, and if the bags ran out together, the bags have the same size. This
is the basic idea behind relative cardinality.
The following result actually needs an axiom called the Axiom of Choice,
but we will take it as a theorem for our purposes.
Theorem 1.1.1. |A| ≤ |B| iff A = ∅ or there is a function g mapping B
onto A. (Recall that g being onto means that for all a ∈ A there is a b ∈ B with
g(b) = a.)
Remark 1.1.1 (No Special Cases for Empty Sets). Unless otherwise specified,
henceforth we will assume sets are nonempty, as statements which are obviously
true will remain obviously true for empty sets, without our treating
them as a special case.
Proof. Suppose that |A| ≤ |B|. Let f : A → B be 1-1. The obvious thing to
do is to invert f, and then map any of B left over to a fixed member a ∈ A.
Thus g is defined as g(b) = f⁻¹(b) if b ∈ ran f, and g(b) = a if b ∉ ran f.
Conversely¹, suppose that g : B → A is onto. We’d like to invert this
map, but it could be that many elements of B go to the same element of A,
and we need to choose one to make the inversion 1-1. We say that b1 ≡ b2
if g(b1) = g(b2). For each equivalence class [b] = {b̂ ∈ B : b̂ ≡ b}, choose one
representative. Then to define f : A → B, for each a ∈ A, define f(a) = b0,
where b0 is the representative of [{b : g(b) = a}]. This f is well-defined since
for all a ∈ A there is a b ∈ B with g(b) = a. And it is 1-1 since we are
choosing one b0 for each a. □
¹ It is here that we are using the Axiom of Choice. This axiom says that if {H_i : i ∈ D} is a
family of subsets of a set A, then there is a “choice function” f such that, for all i ∈ D,
f(i) ∈ H_i. In the present proof the subsets are the pre-images, and the choices are the
representatives of the equivalence classes.
Proposition 1.1.1. |A| ≤ |B| and |B| ≤ |C| implies |A| ≤ |C|.
We can define |A| = |B| iff |A| ≤ |B| and |B| ≤ |A|. But the candy
matching motivation would lead us to expect that |A| ≤ |B| and |B| ≤ |A|
would mean that there is a bijection, that is, a 1-1 correspondence, between A
and B; not simply that each can be injected into the other. This intuition
is indeed correct: there is such a bijection. But proving this from the
assumption that |A| ≤ |B| and |B| ≤ |A| is surprisingly difficult. Do
not be concerned if you find this proof tricky; it is. Also, after this we will
really only use the fact that the result is true, so the proof could well be skipped.
I am including it as it is good for the soul to see such a pretty proof.
Theorem 1.1.2 (Cantor-Schröder-Bernstein).
|A| = |B| iff there is a bijection h : A → B.
Proof.∗ If there is a bijection h then obviously |A| ≤ |B| (by h) and |B| ≤ |A|
(by h⁻¹).
Conversely, suppose that |A| ≤ |B| and |B| ≤ |A|, via injective functions
f : A → B and g : B → A. Let P(A) denote the power set of A, the set of
all subsets of A. Let ϕ : P(A) → P(A) be defined as follows. For E ∈ P(A),
that is E ⊆ A, let

    ϕ(E) = A \ g(B \ f(E)).

Note that ϕ is monotone: if E ⊆ E′ then ϕ(E) ⊆ ϕ(E′). Let R = {E ∈ P(A) : E ⊆ ϕ(E)}, and let

    D = ∪_{E∈R} E = ∪R.

By monotonicity, D is a fixed point of ϕ. That is,

    ϕ(D) = A \ g(B \ f(D)) = D.

Hence g(B \ f(D)) = A \ D, and therefore B \ f(D) = g⁻¹(A \ D).
But now we are finished: f maps D 1-1 onto f(D), and g⁻¹ maps A \ D 1-1
onto B \ f(D). Therefore we can define

    h(a) = f(a)     if a ∈ D,
    h(a) = g⁻¹(a)   if a ∈ A \ D.

Then as f and g⁻¹ are 1-1 and onto for these restricted domains, h is a
bijection. □
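The fixed-point construction can be watched in action on a small illustrative instance (my own example, not from the text): take A = B = N with the injections f(n) = g(n) = 2n. Unfolding ϕ(E) = A \ g(B \ f(E)) gives ϕ(E) = {odd numbers} ∪ {4m : m ∈ E}, so membership in the greatest fixed point D can be computed recursively:

```python
from functools import lru_cache

# Illustrative instance: A = B = N, f(n) = g(n) = 2n.
# Then phi(E) = {odds} ∪ {4m : m in E}, and the greatest fixed point D
# satisfies: n in D iff n is odd, or n = 4m with m in D (and 0 in D).

@lru_cache(maxsize=None)
def in_D(n):
    if n == 0:
        return True            # 0 = 4*0 survives in the greatest fixed point
    if n % 2 == 1:
        return True            # odd numbers lie in phi(E) for every E
    if n % 4 == 0:
        return in_D(n // 4)
    return False               # n ≡ 2 (mod 4)

def h(n):
    """The bijection built in the proof: f on D, g inverse off D."""
    return 2 * n if in_D(n) else n // 2

values = [h(n) for n in range(1000)]
assert len(set(values)) == len(values)  # h is injective on this window
print(h(1), h(2), h(4), h(8))  # → 2 1 8 4
```

Here f alone is not onto (it misses the odd numbers), yet the proof’s patchwork of f and g⁻¹ produces a genuine bijection of N with itself.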
As we see later in Section 1.3, Cantor’s insight is that using this definition
of relative size some infinite sets cannot be put into 1-1 correspondence with
each other. We will start this analysis with the simplest infinite sets.
The name comes from the intuition that if we had g : N → A onto, then as
we “count” 0, 1, 2, . . . we are implicitly counting A = {g(0), g(1), g(2), . . . }.
Clearly every B ⊆ N is countable, as we can map B into N in a 1-1 way
by mapping b ↦ b for b ∈ B. Are there any other interesting examples of
countable sets? Definitely yes. We will look at several examples, and then
introduce some generic coding techniques for establishing countability. These
techniques will be recycled when we look at computable functions in Chapter
5, and the undecidability of the Halting Problem in Chapter 4.
Example 1.1.1. Z is countable. There is a simple bijection f from Z to N defined
via f(0) = 0 and, for n ≥ 1, f(−n) = 2n and f(n) = 2n − 1. The negative
integers are mapped to the even numbers and the positive ones to the odd
numbers.
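A quick computational sanity check of such a bijection (a sketch, using n ↦ 2n − 1 on the positives so that every natural number is hit exactly once; the window size is arbitrary):

```python
def f(z):
    """A bijection Z -> N: 0 -> 0, -n -> 2n, and n -> 2n - 1 for n >= 1."""
    if z == 0:
        return 0
    return -2 * z if z < 0 else 2 * z - 1

# On the window {-50, ..., 50}, f hits each of {0, ..., 100} exactly once.
assert sorted(f(z) for z in range(-50, 51)) == list(range(101))
print([f(z) for z in (-2, -1, 0, 1, 2)])  # → [4, 2, 0, 1, 3]
```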
Example 1.1.2 (Cantor [Can74, Can78]). Q is countable. We will look at Cantor’s
original technique, which uses a “pairing” function (see Exercise 1.2.5).
Using the trick for Z, it is surely enough to show that Q⁺, the positive rationals,
are countable. Arrange the positive fractions in an infinite two-dimensional array:

    1/1  1/2  1/3  1/4  1/5  ···
    2/1  2/2  2/3  2/4  2/5  ···
    3/1  3/2  3/3  3/4  3/5  ···
    4/1  4/2  4/3  4/4  4/5  ···
     .    .    .    .    .

Counting along the successive finite anti-diagonals, skipping fractions already
counted (such as 2/2 = 1/1), enumerates all of Q⁺.
The method of Example 1.1.2 can be used in other situations where such
a two-dimensional array can be visualized.
1.2 Gödel numberings and other coding techniques

The methods of the previous section are valuable, and should be internalized.
However, today we think of data as being digitized, and this involves coding
the data by some representation. Typically this would be base 2.
³ Again, choice is being used here, as we did with Theorem 1.1.1.
1.2.1 Exercises
Show that

    p(x, y) = (x + y)(x + y + 1)/2 + y

is a bijection from N × N to N.
This is not at all easy. A hint is that, from basic arithmetic,

    Σ_{k=0}^{x+y} k = (x + y)(x + y + 1)/2.
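Before proving the exercise, one can at least check the bijection claim numerically: the pairs (x, y) with x + y < n should map exactly onto an initial segment of N (a sketch):

```python
def p(x, y):
    """Cantor's pairing function from the exercise."""
    return (x + y) * (x + y + 1) // 2 + y

# All pairs with x + y < n map exactly onto {0, ..., n(n+1)/2 - 1}.
n = 60
values = sorted(p(x, s - x) for s in range(n) for x in range(s + 1))
assert values == list(range(n * (n + 1) // 2))
print(p(0, 0), p(1, 0), p(0, 1), p(2, 0))  # → 0 1 2 3
```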
1.3 Diagonalization and Uncountable Sets

So are all sets countable? Set theory would certainly be simpler if that were
true, but Cantor had the intuition that N cannot be put into 1-1 correspondence
with R, or even with R ∩ (0, 1). Thus there are different cardinalities of
infinite sets. Cantor’s method was an ingenious new technique called diagonalization,
which has underpinned a lot of modern mathematics and computer
science. In the proof below we include a diagram which makes it clear why
the technique is called diagonalization.
Theorem 1.3.1 (Cantor [Can79]). |N| < |R ∩ (0, 1)|.
Proof. Certainly |N| 6 |R ∩ (0, 1)| via a map such as n 7→ ·0n where we
write n in unary (base 1). (Thus, for instance 2 7→ ·011.) Thus we need to
show that R ∩ (0, 1) is not countable. Suppose not. Then, by the Cantor-
Schröder-Bernstein Theorem, there is a bijection f : N → R ∩ (0, 1). That is
{f (0), f (1), . . . } lists all of the reals between 0 and 1.
Now we think of a real between 0 and 1 via its decimal expansion. Thus
r = .a1 a2 . . . with ai ∈ N. This is not the greatest representation in the
world, since we can't really decide between, say, ·010000 . . . and
·009999999 . . . . But the method below will dispose of both such representa-
tions simultaneously.
We can think of f (i) = ·ai,1 ai,2 ai,3 . . . .
We define a real r which
1. Should be on the list as it is in (0, 1), and
2. Can’t be on the list, because of the way we construct it, to diagonalize
against the list.
We define r = ·r1 r2 . . . where we specify ri = ai,i +5 mod 10. So if ai,i = 6
then ri = 1, for example. Now we can imagine this using the diagram below:
f (1) = · a1,1 a1,2 a1,3 a1,4 a1,5 ···
f (2) = · a2,1 a2,2 a2,3 a2,4 a2,5 ···
f (3) = · a3,1 a3,2 a3,3 a3,4 a3,5 ···
 ⋮
You can see we are working down the diagonal, making ri , the i-th
decimal place of r, significantly different from ai,i , the i-th place of f (i). Thus
the distance between r and f (i) is at least 4 × 10−i . Thus, r ≠ f (i) for all i,
each f (i) being "diagonalized" at decimal place i.
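The diagonal rule ri = ai,i + 5 mod 10 is easy to experiment with. The following Python sketch (ours, with a made-up finite list of expansions) computes the diagonal digits:

```python
# A finite illustration of the diagonal argument: the produced digits
# differ from the i-th listed expansion in its i-th decimal place.

def diagonalize(expansions):
    """expansions: list of digit lists; returns the digits of the diagonal real r."""
    return [(row[i] + 5) % 10 for i, row in enumerate(expansions)]

listed = [
    [1, 4, 1, 5, 9],   # made-up digits of f(0)
    [7, 1, 8, 2, 8],   # f(1)
    [3, 3, 3, 3, 3],   # f(2)
    [0, 0, 0, 0, 0],   # f(3)
    [9, 9, 9, 9, 9],   # f(4)
]
r = diagonalize(listed)
print(r)  # prints [6, 6, 8, 5, 4]
# Each digit of r is at distance 5 from the corresponding diagonal digit,
# so no trailing-nines ambiguity can make r equal to any listed real.
assert all(r[i] != row[i] for i, row in enumerate(listed))
```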
The reader should note that we did not state this as |N| < |P (N)| although
this is easily seen to be true since a ↦ {a} is a suitable injection from N into
P (N). But for any infinite set X, if it is not countable then |N| < |X|. This
fact is essentially a consequence of the following "Trichotomy" result, which
is also equivalent to what is called the Axiom of Choice. We will take the
result as a Black Box, since delving into the niceties of axioms of set theory
will distract from our main mission.
Theorem 1.3.2 (Trichotomy). For any sets A, B exactly one of the fol-
lowing holds: |A| < |B|, |A| = |B|, or |B| < |A|.
We remark that |N| < |P (N)| is a special case of a more general result
which uses an abstract form of diagonalization, saying that for any set A,
|A| < |P (A)| as we see below. We will recycle the diagonalization method
when we prove that the halting problem for Turing Machines is algorithmi-
cally undecidable in Theorem 4.1.4, and later the Hierarchy Theorem (The-
orem 6.1.5) in §6, two of the core theorems of this book.
Theorem 1.3.3 (Cantor [Can79]). For any set A, |A| < |P (A)|.
1.3.1 Exercises
Exercise 1.3.4 A power series is an "infinite polynomial" of the form
Σ_{n=0}^∞ an X^n . It is called binary if an ∈ {0, 1} for all n. Two binary power
series a(X) = Σ_{n=0}^∞ an X^n and b(X) = Σ_{n=0}^∞ bn X^n are equal iff an = bn for
all n. Show that the collection of binary power series forms an uncountable
set.
Exercise 1.3.5 Let X = {x1 , x2 , . . . } be a countable set. Let {Yn : n ∈ N}
be a countable listing of all finite subsets of X. Show that the set
S = {n : xn ∉ Yn }
is not finite.
Exercise 1.3.6 Suppose that A is countable and B is uncountable. Show
that |A × B| = |B|.
Exercise 1.3.7 Show that |R × R| = |R|.
belief that if a function satisfied the intermediate value theorem (namely for
all x < y and z ∈ [f (x), f (y)], there is a q ∈ [x, y] such that f (q) = z),
then it must be continuous. This is easily seen to be false by using the point-
set definition of function, and in fact it is possible to have a function which
assumes all real values in every interval, and is everywhere discontinuous!
Cantor [Can74] showed the insight gained by the point-set definition of
an injective function.
One of the first applications of Cantor’s work concerned transcendental
numbers. Transcendental numbers are reals which are not algebraic. We met
algebraic numbers in Exercise 1.2.4; they being defined as those real numbers
which are roots of polynomials with rational coefficients. Heroic efforts were
made in the 19th century, with Liouville showing in 1844 that transcendental
numbers do exist. In 1851, Liouville showed that the "explicit number"
Σ_{n=1}^∞ 10^{-n!}
is transcendental.
The reader might note that we have skirted the issue of what the actual size of
a set is. In view of Cantor-Schröder-Bernstein, we can think of relative sizes
as being subdivided into equivalence classes where A ≡ B means |A| = |B|.
These equivalence classes correspond to the sizes of certain well-orderings
called ordinals, a discussion of which would take us on a beautiful trip, too
expensive for us to travel in this (relatively) short book. Now, for the finite
A's we know the names of the classes [A]: these ordinals are simply the finite
numbers. For |N| the ordinal is ω, which is N considered as an ordering
ω = 0 < 1 < 2 . . . . This sequence is extended by adding a new element in the
ordering beyond ω, bigger than all the finite ordinals; we call this ω + 1, thence
ω + 2, and so on. Ordinals have many other uses in mathematics beyond set theory.
The equivalence class of sets which have the same cardinality as N, or
more precisely the order-type of this ordering ω, is called ℵ0 . This is referred
to as a cardinal. The next cardinal is called ℵ1 , and it is again associated
with a particular ordinal, a well ordering, akin to ω and is called ω1 . It is the
first uncountable ordinal. Thus, the order type of ω1 is the first uncountable
cardinal. The cardinality of R is the cardinality of the set of functions from N
to {0, 1}, that is, the number of subsets of N. In the arithmetic of cardinals,
this is 2^ℵ0 .
The continuum hypothesis says that ℵ1 = 2^ℵ0 . That is, there is no set A
with
|N| < |A| < |R|.
This remarkable hypothesis was at the heart of set theory for over 60 years;
and is really part of axiomatic set theory. It turns out that relative to the
accepted axioms of set theory, it is both consistent that the continuum hy-
pothesis is true, and that the continuum hypothesis is false. The consistency
of it being true was proven by Gödel [Goe40]. Cohen [Coh63, Coh64] proved
the consistency of it being false. Both proofs involved truly significant new
ideas (constructibility in Gödel’s case, and forcing in Cohen’s case) the ram-
ifications of which are still being felt today. We refer the reader to Kunen
[Kun11] if they want to pursue this fascinating topic.
Part II
Computability Theory
Chapter 2
Regular Languages and Finite
Automata
Abstract We introduce the notion of a regular language, and show that reg-
ular languages are precisely those that are accepted by deterministic finite
automata. We introduce nondeterminism, and prove that for automata, non-
deterministic and deterministic machines have the same power, the trade-off
being an exponential increase in the number of states. We finish with the
Myhill-Nerode Theorem, which shows that being finite state is the same as
having finite index for a certain canonical equivalence relation.
2.1 Introduction
In this chapter we will begin by looking at yet another apparently unrelated area:
formal language theory. But we will find that formal languages go hand-in-
hand with a primitive notion of computation called an automaton. Whilst
automata post-date the Turing machines we will soon meet in Chapter 3,
studying them first makes the material from Chapter 3 more easily digested.
To do this, we will first introduce a syntactic notion1 called regularity, a
very important notion in an area called formal language theory. As we see,
regularity coincides with being accepted by an automaton. This fact shows
that things which seem far from being concerned with computation can be
intertwined with computation theory.
Even now, the area of formal language theory, and its widespread applica-
tions in computing, remain significant areas of research interest. In particular,
many automata-based algorithms are used in compilers, operating systems
(e.g. avoiding deadlock), and security modelling. More
1 That is one based on symbols and rules for manipulating them.
widely, even algorithm design for some aspects of graph theory (such as for
parse-based graph families like those of bounded pathwidth and treewidth) is re-
liant on automata-theoretical methods. Many basic algorithms in computer
science, such as in software engineering, rely heavily on such methods. Thus,
beyond motivation for later material, this material is something “very good
to know”. The principal proof technique for this section is mathematical in-
duction, applied both to the number or length of an object and to its structure.
In this section, we will be considering subsets of the set of all strings ob-
tained from an alphabet Σ = {a1 , . . . , an }, which will always be finite in
this Chapter. Recall from Exercise 1.2.2 that a string or word is a finite con-
catenation of symbols from Σ. For example, if Σ = {a, b, c} then bbaccaacc is
a string in this alphabet Σ. We sometimes write w^j for w concatenated with
itself j times. Thus, this previous expression could be written as b^2 ac^2 a^2 c^2 .
The reader should note that if Σ = {00, 11}, then 1100 would be a string,
but 01 would not be. We let Σ ∗ denote the collection of all strings obtainable
from Σ. Notice that this set could be obtained using an inductive definition:
λ, the empty string, is a string. If σ ∈ Σ ∗ , then σa ∈ Σ ∗ for each a ∈ Σ.
Σ ∗ is called the Kleene Star of Σ, named in honour of the American mathe-
matician Stephen Cole Kleene [Kle56], the first to realize the significance of
the class of languages we will study, the regular languages.
For example, if Σ denotes the letters of the English alphabet, then strings
would be potential English words. Only some of these strings are English
words, and we could form
The fact that most languages are not regular does not give any method
to decide if some simple-looking language which could be regular is regular.
The following result gives a necessary condition for regularity. Note that all
finite languages are regular, since we could use an expression which simply
lists the members. For example if L = {a, aa, b, cc} over Σ = {a, b, c} then
L = L(α) for α = a ∪ aa ∪ b ∪ cc. Thus the theorem below is concerned with
necessary conditions for infinite languages.
The Pumping Lemma is a very useful tool for showing that languages are
not regular.
Proof. Like many proofs relying on the Pumping Lemma, the proof works by
looking at the form of the members of L, and then arguing that the relevant
x, y, z cannot exist. In this case, x, y and z must be a^j for fixed j's, and then
it would follow that there would be an infinite arithmetical progression in the
primes.
In more detail, if L were regular, there would need to be p, q, r ∈ N such
that x = a^p , y = a^q and z = a^r with q ≠ 0, such that for all n ≥ 0,
a^p (a^q )^n a^r = a^{p+nq+r} ∈ L. But then p + nq + r would need to be prime for
all n ≥ 0. So let n = p + 2q + r + 2. Then p + nq + r is prime and hence
p + (p + 2q + r + 2)q + r is prime. But p + nq + r = p + (p + 2q + r + 2)q + r =
(q + 1)(p + 2q + r), which is the product of two smaller numbers (as q ≠ 0),
and this is a contradiction. □
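The algebraic step in this proof can be confirmed numerically. A quick Python check (ours) of the factorization used above:

```python
# With n = p + 2q + r + 2, the pumped length p + nq + r factors as
# (q + 1)(p + 2q + r), so for q >= 1 it is never prime.

for p in range(10):
    for q in range(1, 10):
        for r in range(10):
            n = p + 2 * q + r + 2
            assert p + n * q + r == (q + 1) * (p + 2 * q + r)
```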
2.2.4 Exercises
Given regular languages L1 , L2 can we make others? It turns out that the
following closure properties hold: Regular languages are closed under
1. Union: If L1 and L2 are regular, so is L1 ∪ L2 .
2. Intersection: If L1 and L2 are regular, so is L1 ∩ L2 .
3. Concatenation: If L1 and L2 are regular, so is L1 ·L2 = {xy : x ∈ L1 ∧y ∈
L2 }.
4. Kleene star: If L is regular, so is L∗ .
5. Complementation: If L is regular, so is its complement Σ ∗ \ L.
We have the techniques to establish union, concatenation, and Kleene
star, as they follow by definition of a regular expression. However, showing
closure under intersection and complementation is much more difficult. If
we can show closure under complementation, then closure under intersection
follows, since L(α) ∩ L(β) is the complement of the union of the complements
of L(α) and L(β), by De Morgan's Laws. We will delay proving the following
result until §2.6, where we show that regular languages are exactly those
accepted by automata; there the result becomes obviously true.
This book is about computation, and we have not yet seen any computational
devices. In this section, we will introduce our first model of computation. It is
not a very general one, but turns out to coincide with the syntactic notion of
regularity. The reader should think of the definition below as a computation
device with a finite number of internal states. The device examines the symbols
of the string in order. According to the symbol being scanned and the internal
state, it moves to another (perhaps the same) state and moves on to the next
symbol. We end when we run out of symbols to be scanned; so that the machine
has eaten the whole string and is now in a state qi reading the empty string
λ. If we are in a “yes” (accept) state at the end we put the string in the
language, and if not we leave it out. More formally we have the following.
We will usually denote states by {qi : i ∈ G} for some set G and have
S = {q0 }. The automaton M starts on the leftmost symbol of a string
σ ∈ Σ ∗ , in state q0 . The transition function δ induces a rule
“If M is in state qi reading a (∈ Σ), move one symbol right and change
the internal state of M to δ(qi , a) = qj for some qj ∈ K.”
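This rule is easily rendered in code. A Python sketch (ours; the example automaton, accepting strings with an odd number of 1's, is made up):

```python
# Consume the string left to right, then accept iff the final state
# is an accept state.

def accepts(delta, start, accept_states, w):
    """delta: dict (state, symbol) -> state."""
    q = start
    for a in w:
        q = delta[(q, a)]      # "move one symbol right and change state"
    return q in accept_states  # machine has eaten the whole string

# Example: strings over {0, 1} with an odd number of 1's.
delta = {("q0", "0"): "q0", ("q0", "1"): "q1",
         ("q1", "0"): "q1", ("q1", "1"): "q0"}
assert accepts(delta, "q0", {"q1"}, "010") is True
assert accepts(delta, "q0", {"q1"}, "0110") is False
```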
[Figure: a two-state automaton with states q0 and q1 , and transitions labelled 0 and 1.]
Historical Notes
The study of what we now call finite automata essentially began with Mc-
Culloch and Pitts [MP43]. Kleene [Kle56] introduced the model we now use.
Many others had similar ideas and introduced various models. Hopcroft and
Ullman [HU79] is a standard text for finite state automata.
2.4 Exercises
The basic idea behind nondeterminism is that from any position, there are
a number of possible computation paths. For automata nondeterminism is
manifested as the generalization of the transition function δ from a function to
a multi-function. Thus, δ now becomes a relation so that from a given ⟨state,
symbol⟩ pair there may be several possibilities for δ(state, symbol). Formally, we
have the following definition.
but now we will interpret ⊢ to mean "can possibly yield." Again, we will let
⊢∗ denote the reflexive transitive closure of ⊢ and declare that
[Figure: a nondeterministic automaton with states q0 , q1 , q2 , q3 , q4 , q5 and transitions labelled a and b.]
Q0 = S = {q0 },
Q1 = {q ∈ K : ⟨q0 , a⟩ ⊢ ⟨q, λ⟩},
and
Q2 = ∪q∈Q1 {r ∈ K : ⟨q, b⟩ ⊢ ⟨r, λ⟩}.
The idea continues inductively as we run through all the members of Σ and
longer and longer input strings. We continue until no new states are gener-
ated. Notice that path lengths are bounded by n = |K|, and that the process
will halt after O(2^n ) steps. The accept states of the machine so generated
consist of those subsets of K that we construct which contain at least one
accept state of M . It is routine to prove that this construction works. (See
Exercise 2.5.3.) □
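The construction in this proof is easy to implement. The following Python sketch (ours; the example NFA is made up) builds only the reachable subsets, as in the proof:

```python
# Subset construction: states of the deterministic machine are the
# reachable sets of NFA states, explored with a worklist.
from collections import deque

def determinize(nfa_delta, alphabet, start_states, accept_states):
    """nfa_delta: dict (state, symbol) -> set of states.
    Returns (dfa_delta, dfa_start, dfa_accepts) over frozensets."""
    dfa_start = frozenset(start_states)
    dfa_delta, seen, queue = {}, {dfa_start}, deque([dfa_start])
    while queue:                       # stop when no new subsets appear
        Q = queue.popleft()
        for a in alphabet:
            R = frozenset(s for q in Q for s in nfa_delta.get((q, a), ()))
            dfa_delta[(Q, a)] = R
            if R not in seen:
                seen.add(R)
                queue.append(R)
    dfa_accepts = {Q for Q in seen if Q & accept_states}
    return dfa_delta, dfa_start, dfa_accepts

def runs(dfa_delta, start, accepts, w):
    Q = start
    for c in w:
        Q = dfa_delta[(Q, c)]
    return Q in accepts

# Example NFA accepting strings over {a, b} ending in "ab":
delta = {("q0", "a"): {"q0", "q1"}, ("q0", "b"): {"q0"}, ("q1", "b"): {"q2"}}
dfa, start, accepts = determinize(delta, "ab", {"q0"}, {"q2"})
assert runs(dfa, start, accepts, "aab") and not runs(dfa, start, accepts, "aba")
```

Notice that only the subsets reachable from {q0} are ever generated, which is usually far fewer than all 2^n subsets of K.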
The algorithm implicit in the proof above is often called Thompson's con-
struction [Tho68]. Notice that we may not necessarily construct all the 2^n
subsets of K as states. The point here is that many classical proofs of Rabin
and Scott's theorem simply say to generate all the 2^n subsets of K. Then,
let the initial state of M 0 be the set of start states of M , the accept states
of M 0 are the subsets of K containing at least one accept state of M , and, if
F ⊆ K and a ∈ Σ, define
2.4.3 Exercises
[Figure: the deterministic automaton produced by the subset construction, with states {q0 }, {q0 , q5 }, {q0 , q1 }, {q0 , q5 , q2 , q4 }, {q0 , q3 }, {q0 , q1 , q4 }, {q0 , q3 , q4 }, and transitions labelled a and b.]
Exercise 2.4.6 (Leiss [Lei81]) (This is not easy.) Let L = L(α) for α =
2(0 ∪ 1)∗ 12∗ ((0 ∪ 1)2∗ )^n . Prove that L can be represented by a nondeterministic
automaton with n + 2 many states, but any deterministic automaton accepting
L must have at least 2^{n+2} + 1 many states.
[Figure: a nondeterministic automaton with states q0 , q1 , q2 , q3 , transitions labelled 0 and 1, and λ moves.]
that is, E(q) denotes the collection of states accessible from q with no head
movement. In Example 2.4.3, E(q0 ) = {q0 , q2 }. As with the proof of Theorem
2.4.4, for M ′ , we replace states by sets of states of M . This time the new
states are the sets E(q) for q a state of M . Of course, the accept states
are just those E(q) containing accept states of M , and the new transition on
input, a, takes E(q) to E(r), provided that there is some qi in E(q) and
some qj in E(r) with ∆ taking qi to qj on input a. It is easy to see that this
construction works (Exercise 2.5.4). □
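Computing E(q) is a standard closure computation. A Python sketch (ours; the λ moves shown are hypothetical, not those of Example 2.4.3):

```python
# E(q): the set of states reachable from q using only lambda moves,
# computed by a worklist loop.

def e_closure(q, lambda_moves):
    """lambda_moves: dict state -> set of states one lambda move away."""
    closure, stack = {q}, [q]
    while stack:
        s = stack.pop()
        for t in lambda_moves.get(s, ()):
            if t not in closure:
                closure.add(t)
                stack.append(t)
    return closure

# Hypothetical lambda moves q0 -> q2 and q2 -> q3:
moves = {"q0": {"q2"}, "q2": {"q3"}}
assert e_closure("q0", moves) == {"q0", "q2", "q3"}
assert e_closure("q1", moves) == {"q1"}
```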
When applied to the automaton of Example 2.4.3, the method used in the
proof of Theorem 2.4.7 yields the automaton in Figure 2.5.
[Figure 2.5: the resulting deterministic automaton with states {q0 , q2 }, {q1 , q2 }, {q3 }, and transitions labelled 0 and 1.]
1. union
2. intersection
3. star (i.e., if L ∈ L, then so is L∗ , the collection of strings obtainable from
L)
4. complementation
5. concatenation (i.e., if L1 , L2 ∈ L, then so is L1 L2 = {xy : x ∈ L1 ∧ y ∈
L2 }).
Proof. Let Mi = ⟨Ki , Σ, Si , ∆i , Fi ⟩ for i = 1, 2, be automata over Σ. By
renaming, we can suppose that K1 ∩ K2 = ∅.
For (i), we simply add a new start state S ′ with a λ move to each of the
start states of M1 and M2 . The accept states for M1 ∪ M2 would be F1 ∪ F2 .
For (iv), we need to prove that the complement of L(M1 ) is finite state.
This is easy. Let M ′ = ⟨K1 , Σ, S1 , δ1 , K1 − F1 ⟩. Clearly, the complement of
L(M1 ) equals L(M ′ ).
For (ii), we can use the fact that L(M1 ) ∩ L(M2 ) is the complement of the
union of the complements of L(M1 ) and L(M2 ).
We leave (iii) and (v) for the reader (Exercise 2.5.6). □
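The construction for (iv) can be sketched in code. The following Python fragment (ours; the example automaton is made up) flips the accept states of a total deterministic automaton and checks that membership answers are complemented:

```python
# Closure under complementation for a deterministic automaton with a
# total transition function: accept states of M' are K1 - F1.

def complement(states, accept_states):
    """K1 - F1: the accept states of the complement machine."""
    return states - accept_states

def run(delta, start, accept_states, w):
    q = start
    for c in w:
        q = delta[(q, c)]
    return q in accept_states

# A total DFA over {a, b} accepting strings with an even number of a's:
K = {"even", "odd"}
delta = {("even", "a"): "odd", ("odd", "a"): "even",
         ("even", "b"): "even", ("odd", "b"): "odd"}
F = {"even"}
co_F = complement(K, F)

for w in ["", "a", "ab", "aab", "aba"]:
    assert run(delta, "even", F, w) != run(delta, "even", co_F, w)
```

Note the totality requirement: if δ were partial or nondeterministic, swapping accept states would not give the complement, which is why determinization matters here.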
Historical Notes
2.5 Exercises
Exercise 2.5.6 Prove (iii) and (v) of Theorem 2.4.8; that is, prove that the
collection of finite-state languages over Σ is closed under star and concate-
nation.
Exercise 2.5.7 It is possible to generalize the notion of a nondeterministic
automaton as follows. Now M is a quintuple hK, Σ, δ, S, F i with all as before
save that δ is a multi-function from a subset of K × Σ ∗ to K giving a rule:
In the last few sections, we have examined regular languages, proving that
determinism and nondeterminism have the same computational power, at
least insofar as accepting languages is concerned. The following beautiful
theorem shows that the syntactic approach given by regularity and the se-
mantic approach using automata coincide. Later we will see an echo of this
when we show that partial recursive functions and partial Turing computable
functions coincide in Chapter 3.
Theorem 2.6.1 (Kleene [Kle56]). For a language L, L is finite state iff
L is regular. Furthermore, from a regular expression of length n, one can
construct a deterministic finite automaton with at most (more or less) 2^n
states accepting L.
L(M ) = ∪{R(i, j, n + 1) : qi ∈ S ∧ qj ∈ F }.
To see that the claim holds, note that R(i, j, k + 1) is the collection of strings
σ with ⟨qi , σ⟩ moving to ⟨qj , λ⟩ without using states with indices ≥ k + 1.
Now to go from state qi to state qj without involving qm for m ≥ k + 1, we
must choose one of the following options.
1. Go from qi to qj and only involve states qp for p < k. This is the set
R(i, j, k).
2. Otherwise go from qi to qj and necessarily involve qk . Therefore, we must
choose one of the three sub-options below.
[Figure 2.6: a path from qi to qj decomposed at its visits to qk .]
Figure 2.6 may be helpful in visualizing (b) above. Thus the contribution
of (b) is R(i, k, k)R(k, k, k)∗ R(k, j, k). Now we can apply the induction hy-
pothesis to each of the expressions in the decompositions of R(i, j, k + 1) of
the claim. This observation establishes the claim, and hence the Lemma and
finishes the proof of Theorem 2.6.1. □
Kleene’s theorem is taken from Kleene [Kle56]. The now-standard proof that
finite-state languages are regular was formulated by McNaughton and Ya-
mada [MY60].
Definition 2.7.1.
The archetype for this notion is congruence mod k, on the integers. For
this equivalence relation there are k cells {[0], [1], . . . , [k − 1]}. If we consider
k ∈ {2, 4}, then congruence mod 4 is a refinement of congruence mod 2.
Proof. (i) → (ii). Suppose that L is regular and is accepted by the deter-
ministic finite automaton M = hK, Σ, S, F, δi. Without loss of generality, let
S = {q0 }. We define R = RM via
xRy iff there is a q ∈ K such that ⟨q0 , x⟩ ⊢∗ ⟨q, λ⟩ and ⟨q0 , y⟩ ⊢∗ ⟨q, λ⟩
(that is, on input x or y, we finish up in the same state q). It is evident that
RM is a right congruence since, for all z ∈ Σ ∗ ,
Also, the index of RM is finite since there are only finitely many states in K.
The theorem follows since L = ∪{σ ∈ Σ ∗ : ⟨q0 , σ⟩ ⊢∗ ⟨q, λ⟩ for some q ∈ F }.
(ii) → (iii). We prove (b) from which (ii) → (iii) follows. Let R be a right
congruence of finite index so that L is a union of some of R’s equivalence
classes. By the fact that R is a right congruence, for all z,
xz ∈ L iff yz ∈ L.
It is often quite easy to prove that certain languages are not finite state
using the Myhill-Nerode Theorem.
Proof. Recall that we earlier proved this using the Pumping Lemma. But
using the Myhill-Nerode Theorem, all we need to note is that a^n ≁L a^k for
all k ≠ n ∈ N. □
xy^n z ∈ L.
Theorem 2.7.2. For a formal language L, ∼L is finite index iff ≈L has finite
index.
Proof. Suppose that ≈L is finite index, but ∼L is not. Let {xi : i ∈ N} be a
set of strings with xi ≁L xj for i ≠ j. But then these also witness that ≈L
does not have finite index, a contradiction.
Conversely, suppose that ≈L is not finite index, but ∼L is. Let A = {xi :
i ∈ N} be a set of strings with xi ≉L xj for i ≠ j. Then for each pair e = ⟨i, j⟩
there are ye , ze such that
ye xi ze ∈ L iff ye xj ze ∉ L.
see that R is a right congruence, suppose that xRy and z ∈ Σ ∗ . Now x^rev ≈L
y^rev and since ≈L is a (left) congruence, we have z^rev x^rev ≈L z^rev y^rev and
hence (xz)^rev ≈L (yz)^rev . This implies that xzRyz and hence R is a right
congruence.
Our result will then follow from the Myhill–Nerode Theorem once we can
establish that L^rev is a union of equivalence classes of R. We need to show
that if xRy, then x ∈ L^rev implies y ∈ L^rev . If x ∈ L^rev , then x^rev ∈ L. If
xRy, then x^rev ≈L y^rev and hence y^rev ∈ L, which implies that y ∈ L^rev .
The proof of (iii) in Theorem 2.7.1 gives rise to what Downey and Fellows
[DF13] call The Method of Test Sets.
Suppose that we have a language L which we know is regular but we are
not given a regular expression for L, only given a representation via a decision
procedure for membership by some algorithm which says, on input x, either
yes x ∈ L, or no, x 6∈ L. Is it possible to deduce an automaton accepting L?
Because of Rice’s Theorem (which we prove in Chapter 5), we know that we
cannot deduce a regular expression or automaton for L.
However, given such an algorithm, we can deduce the automaton if we
additionally know, for instance, some bound on the number of states of such
an automaton. The method of test sets allows us to solve the following
"promise problem."
Proof. The idea is simple. We use the proof (iii) → (i) of Theorem 2.7.1. The
machine is described by items 1 to 4 of that proof. The only fact that we
need to check is that we can decide if x ∼L y for x, y ∈ Σ ∗ . By definition,
x ∼L y iff for all z ∈ Σ ∗ , xz ∈ L iff yz ∈ L. Suppose that M is any automaton
accepting L with n states, and x ≁L y. We claim that there is some z ∈ Σ ∗
with |z| ≤ n and z a witness to x ≁L y. The point is that there is some z ′
such that, say, xz ′ ∈ L yet yz ′ ∉ L. This means that if we input xz ′ into M ,
we will get to an accept state, and if we input yz ′ into M , we must get to a
nonaccept state. But since there are at most n states in M , the longest loop-
free path from one state to another can have length n. Thus, we can refine z ′
down to a string of length ≤ n by deleting all loops.
Thus, there is some z with |z| ≤ n witnessing that x ≁L y. Hence, to
decide if [x] = [y], we simply need go through all z ∈ Σ ∗ with |z| ≤ n and
see if for those z, xz ∈ L iff yz ∈ L. The same reasoning demonstrates that
for all x ∈ Σ ∗ , there is some x′ ∈ [x] with |x′ | ≤ n. Hence, to implement
the method of (iii) → (i), we need only look at a table of strings of length
≤ n and test them with strings of length ≤ n. The fact that ∼L defines the
coarsest congruence means that the number of states of M ′ we obtain from
this technique must be minimal. □
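The decision procedure in this proof can be sketched directly: given a membership oracle for L and a state bound n, test all suffixes of length at most n. A Python sketch (ours; the example language is made up):

```python
# Deciding x ~_L y using only a membership oracle in_L and the bound n
# on the number of states of some automaton accepting L.
from itertools import product

def equivalent(x, y, in_L, sigma, n):
    """Return True iff xz in L <-> yz in L for all z with |z| <= n."""
    for length in range(n + 1):
        for z in map("".join, product(sigma, repeat=length)):
            if in_L(x + z) != in_L(y + z):
                return False
    return True

# Example: L = strings over {a, b} ending in b; a 2-state automaton suffices.
def in_L(w):
    return w.endswith("b")

assert equivalent("ab", "b", in_L, "ab", 2)       # same ~_L class
assert not equivalent("a", "b", in_L, "ab", 2)    # separated by z = lambda
```

The number of oracle queries is exponential in n, but it is finite, which is all the promise problem requires.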
[Figure: a five-state automaton with states a, b, c, d, e and transitions labelled 0 and 1 (the original automaton), the table-filling steps, and the minimized automaton, in which states b and c are merged into the single state {b, c}.]
2.7.3 Exercises
Use the method of Example 2.7.1 to prove that for any r ∈ N, if L is finite
state, then so is
(Hint: For x ∈ Σ ∗ , let Sx,i = {[u]∼L : d(x, u) = i}. For i ≤ r, let xRi y iff Sx,i = Sy,i .
Argue via R = ∩_{0≤i≤r} Ri .)
Exercise 2.7.7 Use the method of Example 2.7.1 to prove that if L is finite
state, then so is (1/3)L = {x ∈ Σ ∗ : ∃y ∈ Σ ∗ (xy ∈ L ∧ |xy| = 3|x|)}. (Hint:
Define xRy to hold iff x ≈L y ∧ Sx = Sy , where Sz = {[u]∼L : |zu| = 3|z|}.)
3.1 Introduction
“in state qi reading xj , so perform action Ai,j and move to state qi,j ”.
Some authors use the model of quintuples where one can print a symbol
and also move but this clearly makes no essential difference in that one kind
of machine can be simulated by the other at the expense of a few extra moves.
All of our machines will have an initial state q0 . They will also have a halt
state qh . We will use the convention that if the machine gets into a situation
where there is no quadruple for an ⟨qi , Sj , ·, ·⟩ situation, then the machine
will halt.
1 One traditional model would have the tape be infinite from the start, but a halting
computation will only ever see a finite portion of the tape. The reader should think of the
tape as having a tape generating machine at each end, which will manufacture more tape
whenever a computation needs more tape.
Non-halting. In contrast with the situation for finite automata, a Turing Ma-
chine might never halt. For example, consider the machines M1 and M2 over
the alphabet {0} (and the blank square symbol B) which have quadruples
M1 = {⟨q0 , 0, L, q1 ⟩, ⟨q1 , B, 0, q0 ⟩}, and M2 = {⟨q0 , 0, L, q1 ⟩, ⟨q1 , B, R, q0 ⟩}.
Neither machine will ever halt when started on a tape blank except for a
single 0, in state q0 reading that single 0. They do so for different reasons. M1
never halts but continues left forever. M2 never halts but moves back and
forth from the 0 to the blank on the left. We will use the notation ↑ to denote
a non-halting computation (no matter what the reason), and ↓ for a halting
one.
Now we need to say what we mean by a machine giving a mechanical
method of computing a function. We restrict our attention to functions on
the integers, but as we will see this is no real restriction. We will represent x ∈ N
as a block of x 1's on the tape, meaning that for the present we will be using
unary notation. When running M on x we begin with an otherwise blank
tape on the leftmost 1 of a block of x 1’s. For a deterministic machine, we
regard the computation as being successful computing y if we finish in the
halting state on the leftmost 1 of a block of y 1’s of an otherwise blank tape.
In this case we would write M (x) ↓ and M (x) = y. For reasons that are not
obvious at this stage, we will concern ourselves with partial functions.
Definition 3.2.2.
That is, if x is in the domain of f , the machine will stop on the leftmost
1 of a block of f (x) many 1’s, and if x is not in the domain of f , M won’t
halt.
Example 3.2.1. f (x) = 2x. In the below, qn is the halt state.
⟨q0 , B, B, qn ⟩   ⟨q2 , 1, 0, q1 ⟩
⟨q0 , 1, 0, q1 ⟩   ⟨q2 , B, L, q3 ⟩
⟨q1 , 0, L, q1 ⟩   ⟨q3 , 0, 1, q3 ⟩
⟨q1 , B, 0, q2 ⟩   ⟨q3 , 1, L, q3 ⟩
⟨q2 , 0, R, q2 ⟩   ⟨q3 , B, R, qn ⟩
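The reader can check this machine by simulating it. The following Python sketch (ours) runs a quadruple program of the above kind, using a dictionary as the (unbounded) tape:

```python
# A quadruple-machine simulator, run on the doubling machine of
# Example 3.2.1 (qn is the halt state; blanks are 'B').

def run_tm(quads, x, max_steps=10_000):
    """quads: dict (state, symbol) -> (action, new_state); action is
    'L', 'R', or a symbol to print. Returns the number of 1's on the tape."""
    tape = {i: "1" for i in range(x)}   # x in unary, head at square 0
    head, state = 0, "q0"
    for _ in range(max_steps):
        if state == "qn":
            break
        action, state = quads[(state, tape.get(head, "B"))]
        if action == "L":
            head -= 1
        elif action == "R":
            head += 1
        else:
            tape[head] = action         # print a symbol, head stays put
    return sum(1 for v in tape.values() if v == "1")

quads = {
    ("q0", "B"): ("B", "qn"), ("q0", "1"): ("0", "q1"),
    ("q1", "0"): ("L", "q1"), ("q1", "B"): ("0", "q2"),
    ("q2", "0"): ("R", "q2"), ("q2", "1"): ("0", "q1"),
    ("q2", "B"): ("L", "q3"), ("q3", "0"): ("1", "q3"),
    ("q3", "1"): ("L", "q3"), ("q3", "B"): ("R", "qn"),
}
assert all(run_tm(quads, x) == 2 * x for x in range(6))
```

(The simulator counts the 1's left on the tape rather than checking the halting position of the head; this is enough to confirm that the machine computes f(x) = 2x.)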
We will delay the proof of Theorem 3.2.2 until later since to prove it by
using explicit quadruples would provide much detail and no insight.
Another model, more or less the same as Turing machines is provided by
the following definition.
3.2.1 Exercises
Exercise 3.2.3 There are other ways a machine may not halt. Using the
alphabet Σ = {0, B}, (B for blank) build a machine which, when started on
an otherwise blank tape will fill the whole tape with 0’s.
Exercise 3.2.4 Show that the function f (x) which is 1 if x is even and 2x
if x is odd is Turing computable.
Exercise 3.2.5 Construct Turing Machines for the following partial func-
tions f : N → N. You may assume that the input is a string of 1’s on an
otherwise blank tape. Use the alphabet B (for blank), 0, 1.
In this subsection, we will introduce a central idea, coding, which will be used
throughout the book. This idea should become so ingrained in the reader as
to become second nature.
The reader should also note that coding allows us to consider a lot of other
models such as multi-tape Turing Machines. That is, we have n tapes instead
of one, with n heads. Then the next "move" is determined by a transition
function based on the symbol each head is reading and on the machine's
internal state; this determines the action on each of the tapes. In
this model, it is convenient to have a single tape for the input and output,
at least for unary functions. Again, this could be simulated by a single tape
machine using, for example, Gödel numbers, with n primes devoted to each
of the tapes, and their powers to the contents.
3.2.3 Exercises
Exercise 3.2.11 (We have already seen this so it should be trivial.) Show
how to effectively code all finite subsets of N.
Exercise 3.2.12 Build a Turing machine which simulates g(x, y) : N × N →
{0, 1}, which is 1 if x is a factor of y, and 0 if either x = 0 or x is not a factor
of y.
In the next few sections, we will look at other models of computation. They
provably give the same class of functions as the partial functions computable
by Turing Machines. We are now in a position to state one of the fundamental
claims in computability theory.
This is a claim that cannot be proven, although as usual with the Scien-
tific Method, it could be disproven. However, this thesis is almost universally
accepted. The evidence, so far, is that all models so far found by people co-
incide with the class of (partial) Turing computable functions, and whenever
one analyses the way a computable process might work it seems that a Turing
machine can emulate it.
The context of Turing's 1936 paper [Tur36] where he introduced Turing
Machines was that Turing was trying to model an "abstract human". That
is, in 1936, a decision procedure would have been interpreted to mean "a
procedure which could be performed by a human with arbitrary amounts of
time". Thus, Turing did a thought experiment and argued that an abstract
human computor (computors were people employed to do mathematical
computations) would, by the limitations of our sensory and mental apparatus,
have the following properties:
1. a finite bound for the number of symbols,
2. a finite bound for the number of squares,
3. a finite bound on the number of actions at each step,
4. a finite bound on the number of squares each action could be applied to
on each step,
5. a finite bound on the movement, and
6. a finite bound on the number of (mental) states.
Gandy, Soare (and others, see [Dow14]) argue that Turing proves (in the
sense of proofs in the physical sciences) any function calculable by an abstract
human is computable by a Turing Machine.
There were other models of computation before Turing’s model, such as
λ-computable functions (not treated in this book,) and partial computable
functions we cover soon. But the impact of Turing’s work can best be de-
scribed by the following quote from Gandy in [Her95]:
"What Turing did, by his analysis of the processes and limitations of calculations of
human beings, was to clear away, with a single stroke of his broom, this dependency
on contemporary experience, and produce a characterization (within clearly perceived
limits) which will stand for all time. ... What Turing also did was to show that
calculation can be broken down into the iteration (controlled by a "program") of
extremely simple concrete operations²; so concrete that they can easily be described
in terms of (physical) mechanisms. (The operations of λ-calculus are much more
abstract.)"
3.2.5 Nondeterminism
If the reader is willing to believe that Turing machines have the same com-
putational power as modern computers and vice versa, this result is obvious.
That is, given a program in some computer language, we can convert it into
ASCII code, and treat it as a number. Given such a binary number, we can
decode it and decide whether it corresponds to the code of a program, and if
so execute this program. Thus a compiler for the given language can be used
to produce a universal program.
We will prove a sharper statement of the result in the next section, but
here is a sketch of Turing's proof. First, a Turing machine is
simply a finite collection of quadruples Q1, . . . , Qk over some alphabet of states
q1, . . . , qm and symbols L, R, S1, . . . , Sd. We can assign unique Gödel numbers
to each of the symbols and states, and hence a quadruple Q = ⟨qi, Sj, Ai,j, qi,j⟩ could be coded
by #(Q) = 2^{#(qi)} 3^{#(Sj)} 5^{#(Ai,j)} 7^{#(qi,j)}. Then we could code the machine by
2^{#(Q1)} 3^{#(Q2)} · · · pk^{#(Qk)}. Then we could consider the two-tape Turing machine
which reads the input x, y and on tape 1 decodes the input x onto tape 2 as
a collection of quadruples, and then executes this machine on y.
The reader who, perhaps rightfully, believes that the above is mere hand-
waving should be reassured by the next section, where we will formalize
things, and culminate with Kleene Normal Form. At the same time we will
introduce another more functional model of computation called partial recur-
sive functions.
Turing was the deviser of the idea of a universal machine, a compiler.
The reader should remember that, before this, models of computation were
hand-crafted for the particular function. For example, things like “slide-rules”
were mechanical devices to add, multiply, divide, and perform other basic arithmetical
operations. That is, if you wanted to compute some function, a machine would
be purpose-built for it. The Universal Turing machine makes this unnecessary:
Turing said in a lecture of 1947, referring to his design of the ACE (Automatic
Computing Engine, one of the first designs for a programmable computer):
“The special machine may be called the universal machine; it works in the following
quite simple manner. When we have decided what machine we wish to imitate we
punch a description of it on the tape of the universal machine... . The universal
machine has only to keep looking at this description in order to find out what it
should do at each stage. Thus the complexity of the machine to be imitated is
concentrated in the tape and does not appear in the universal machine proper in
any way... [D]igital computing machines such as the ACE ... are in fact practical
versions of the universal machine.”
Examples
(i) f (a, b) = a + b is primitive recursive.
Let h1(x1) = S(x1).
Let h2(x1, x2) = P^2_1(x1, x2) = x1.
Let h3(x1, x2, x3) = P^3_3(x1, x2, x3) = x3.
Let h4(x1, x2, x3) = h1(h3(x1, x2, x3)).
Define:
f (a, 0) = h2 (a, 0),
f (a, b + 1) = h4 (a, b, f (a, b)).
We can check that this definition is correct by induction:
Base case: f (a, 0) = h2 (a, 0) = a.
Inductive Hypothesis: Suppose f (a, b) = a + b.
Then f (a, b + 1) = h4 (a, b, f (a, b))
= h4 (a, b, a + b)
= h1 (h3 (a, b, a + b))
= h1 (a + b)
= a + b + 1.
(ii) f (a, b) = ab is primitive recursive.
Let g(x1 , x2 ) = x1 + x2 .
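The derivation of addition above transcribes directly into code; a sketch (the helper names h1–h4 mirror the text, though the transcription itself is ours):

```python
def h1(x1):                 # the successor function S
    return x1 + 1

def h2(x1, x2):             # the projection P^2_1
    return x1

def h3(x1, x2, x3):         # the projection P^3_3
    return x3

def h4(x1, x2, x3):
    return h1(h3(x1, x2, x3))

def f(a, b):
    # primitive recursion: f(a, 0) = h2(a, 0); f(a, b+1) = h4(a, b, f(a, b))
    if b == 0:
        return h2(a, 0)
    return h4(a, b - 1, f(a, b - 1))

assert all(f(a, b) == a + b for a in range(10) for b in range(10))
```

The final assertion is exactly the inductive check carried out above, run over a finite range of arguments.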
3.3.2 Exercises
Exercise 3.3.1 Give full derivations from the initial functions and the rules
to show that the following functions are primitive recursive:
(i) f(a, b) = a^b,
(ii) f(a) = a!,
(iii) f(a, b) = a ∸ b, where a ∸ b = 0 if b ≥ a, and a ∸ b = a − b if a > b.
Exercise 3.3.2 Prove that any primitive recursive function is total, that is,
it is defined on all arguments. (Hint: Use induction on the way that the functions are
defined.)
Examples
(i) min(a, b) = b ∸ (b ∸ a).
(ii) max(a, b) = (a + b) ∸ min(a, b).
(iii) Define the function sg(a) as follows:
sg(a) = 0 if a = 0, and sg(a) = 1 if a > 0.
Then sg(a) = 1 ∸ (1 ∸ a). This follows since if a > 0, then 1 ∸ a = 0, and
hence 1 ∸ (1 ∸ a) = 1 ∸ 0 = 1; and if a = 0, then 1 ∸ (1 ∸ a) = 1 ∸ 1 = 0.
(iv) The function called bounded sum is quite important: g(x1, . . . , xn, z + 1) = Σ_{y≤z+1} f(x1, . . . , xn, y):
Σ_{y≤0} f(x1, . . . , xn, y) = f(x1, . . . , xn, 0),
Σ_{y≤z+1} f(x1, . . . , xn, y) = Σ_{y≤z} f(x1, . . . , xn, y) + f(x1, . . . , xn, z + 1).
(v) As is bounded product, g(x1, . . . , xn, z + 1) = Π_{y≤z+1} f(x1, . . . , xn, y):
Π_{y≤0} f(x1, . . . , xn, y) = f(x1, . . . , xn, 0),
Π_{y≤z+1} f(x1, . . . , xn, y) = Π_{y≤z} f(x1, . . . , xn, y) × f(x1, . . . , xn, z + 1).
(vi) |a − b| = (a ∸ b) + (b ∸ a). You can prove this by considering whether
a ≥ b or not. If a ≥ b then a ∸ b = a − b and b ∸ a = 0. If b ≥ a then
the symmetric argument applies.
(vii) Let rm(a, b) denote the function which outputs the remainder when b is
divided by a. This function is defined by primitive recursion as follows:
rm(a, 0) = 0,
rm(a, b + 1) = (rm(a, b) + 1) · sg(|a − (rm(a, b) + 1)|).
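The recurrence for rm can be checked mechanically; a sketch (written with a loop rather than literal recursion, which is our own shortcut):

```python
def sg(a):
    return 0 if a == 0 else 1

def rm(a, b):
    # unwind the recursion rm(a, 0) = 0,
    # rm(a, b+1) = (rm(a, b) + 1) * sg(|a - (rm(a, b) + 1)|)
    r = 0
    for _ in range(b):
        r = (r + 1) * sg(abs(a - (r + 1)))
    return r

# for a > 0 this agrees with the built-in remainder of b divided by a
assert all(rm(a, b) == b % a for a in range(1, 12) for b in range(40))
```

The sg factor is what resets the running remainder to 0 exactly when it would reach a.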
Remark 3.3.1. ∗ We remark that any (total) function on N which the reader
has ever encountered in “normal” mathematics or computing will be prim-
itive recursive. An explicit function which is total and computable but not
Recall that a relation or predicate is a set R = {x | x has
property R}. We say that R(x) holds if and only if x has property R. We also
define the function χR to be the following:
χR(x1, . . . , xn, y) = 1 if R(x1, . . . , xn, y) holds, and 0 if R(x1, . . . , xn, y) does not hold.
Then we can say a relation R is primitive recursive if and only if χR is a
primitive recursive function. We call χR the characteristic function of R, in
the same way as the characteristic function of a set.
Examples
(i) Let D(a, b) hold if and only if a divides b. Then
χD(a, b) = 1 ∸ sg(rm(a, b)).
Now we will give some rules for making new primitive recursive functions
from old ones. Let R and S be two given primitive recursive relations.
(i) The relation “T = R and S”, written R ∧ S, is primitive recursive since
χT = χR · χS .
(ii) The relation T = not R, written as ¬R, is primitive recursive since the
function χT = 1 ∸ χR is primitive recursive.
(iii) Hence any boolean combination of primitive recursive relations is primitive
recursive.
3.3 Partial recursive functions 61
∀y ≤ z R(x1, . . . , xn, y) if and only if Π_{y≤z} χR(x1, . . . , xn, y) = 1,
∃y ≤ z R(x1, . . . , xn, y) if and only if sg(Σ_{y≤z} χR(x1, . . . , xn, y)) = 1.
The notion of bounded search, namely the “least y less than or equal to z
such that some relation holds”, defined below, is also primitive recursive:
μy ≤ z R(x1, . . . , xn, y) = μy R(x1, . . . , xn, y) if (∃y ≤ z)R(x1, . . . , xn, y), and 0 otherwise.
We invite the reader to prove this in Exercise 3.3.4 below. The primi-
tive recursiveness of bounded quantification and bounded search is useful to
quickly observe the primitive recursiveness of other functions and relations:
Examples
(i) Let Pr(x) hold if and only if x is prime. Then Pr(x) holds if and only if
x > 1 ∧ (∀y ≤ x)(D(y, x) → (y = 1 ∨ y = x)),
a boolean combination of primitive recursive relations and bounded quantification.
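Bounded search and bounded quantification transcribe directly; a sketch (the function names are ours, and of course Python code is not itself built from the initial functions):

```python
def bounded_mu(R, z, *xs):
    # least y <= z with R(*xs, y); by convention 0 if there is no such y
    for y in range(z + 1):
        if R(*xs, y):
            return y
    return 0

def is_prime(x):
    # x is prime iff x > 1 and every y <= x dividing x is 1 or x itself
    return x > 1 and all(x % y != 0 or y in (1, x) for y in range(1, x + 1))

assert [x for x in range(20) if is_prime(x)] == [2, 3, 5, 7, 11, 13, 17, 19]
assert bounded_mu(lambda x, y: y * y >= x, 10, 17) == 5   # least y <= 10 with y^2 >= 17
```

The crucial point is that every loop here runs up to an explicit bound, which is exactly what keeps such definitions inside the primitive recursive functions.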
3.3.5 Exercises
Exercise 3.3.5 Let fp(x) denote the power of p in the prime decomposition
of x. Thus, if x = 6, then f2(x) = 1 and f11(x) = 0, for example. Show that
for any prime p, fp(x) is primitive recursive. Conclude that the relation
Rp(x) = 1 if the power of p in x is 4, and 0 otherwise,
is primitive recursive.
Are the primitive recursive functions enough to characterize all (total) computable
functions? Using diagonalization, the answer is clearly no. Since the
primitive recursive functions are built up in a computable and hierarchical
way, we could easily assign a Gödel number to each primitive recursive
function. Let {fn : n ∈ N} be such an enumeration. Consider the function
g(n) = fn(n) + 1. This is clearly a total and intuitively computable function,
but it cannot be primitive recursive. If it were, we would have g = fm
for some m, but then fm(m) = g(m) = fm(m) + 1, so 0 = 1! The masochistic reader
might be keen to formalize this using Turing Machines. Notice that this is
the diagonalization technique we met in Chapter 1 to show that the reals
are uncountable, but now applied in the context of computation.
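The diagonal argument can be caricatured on any finite list of total functions; a sketch (the particular functions chosen are our own illustration):

```python
# List a few total functions f_0, ..., f_3 and let g(n) = f_n(n) + 1.
# Then g cannot equal any f_n, since g differs from f_n at the input n.
fs = [lambda x: 0, lambda x: x + 1, lambda x: 2 * x, lambda x: x * x]

def g(n):
    return fs[n](n) + 1

for n in range(len(fs)):
    assert g(n) != fs[n](n)   # g differs from f_n on the diagonal
```

The real argument applies the same idea to a computable enumeration of all primitive recursive functions, not just four of them.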
Thus, primitive recursive functions are insufficient to capture the notion
of intuitively computable functions. The problem lies in the fact that searches
performed by primitive recursive functions are bounded. In fact there is a kind of
Church-Turing Thesis for primitive recursive functions.
As it is not central to our story, we won’t dwell on this thesis; but mention
it as the primitive recursive functions do occupy a significant place in the
theory of computation. Any function the reader has encountered in “normal”
mathematics will be primitive recursive.
Thus, to capture the notion of computability we need “until” searches.
We remark that the last clause in the least number rule is necessary to
ensure that the class C is closed under this rule. All of these definitions are
due to the logician Stephen Cole Kleene. The (likely apocryphal) legend is
that he thought of the least number operator at the dentist whilst having
wisdom teeth extracted.
The reader might well ask: why stop here? Can't we play the same trick
as before and get something which is intuitively computable but not partial
recursive? We can still enumerate the partial recursive functions, as
{fn : n ∈ N} say, and define g(n) = fn(n) + 1. Why can't we
diagonalize out of the class as we did before? The answer is that g may itself
be partial recursive (and indeed it is): if g = fm, then fm(m) must be
undefined, and then g(m) = fm(m) + 1 is undefined too, so no contradiction arises.
We are dealing with partial functions here, and remember that we count
f(x) = g(x) as holding when both sides are undefined.
Kleene knew that we could diagonalize out of the class of primitive recur-
sive functions, and realized that he could not diagonalize out of the class of
partial recursive functions. The partial recursive functions gave one of the
original models of computation. Church and others claimed that the class of
partial recursive functions captured all intuitively computable functions; and
they made this claim well before the Turing model had been constructed.
Imagine you had only seen the definition of partial recursive functions. I
believe you would be rightly suspicious of the claim that this class
captures all intuitively computable functions. It was only when Turing proposed
his model of Turing machines that people accepted Church's claim.
The point was that Turing machines are so obviously computable, and gen-
uinely seem to reflect the actions of the human computor of Turing’s thought
experiment.
Turing [Tur37a] also showed that the class of partial recursive functions
and the class of partial Turing computable functions coincide. We will do this
in the next subsection. Earlier, Kleene [Kle36] had proven that the models
of λ-computability (the even more obscure model of computation developed by
Church) and of partial recursive functions coincide.
The aim of the next two sections is to prove that the class C of partial recursive
functions is exactly the class of functions T computable by a Turing machine.
The following exercise is one direction. It extends Exercise 3.3.3.
Exercise 3.3.6 Show how to compute a given partial recursive function with
a Turing machine program.
We will now work towards the other direction, namely that T ⊆ C. That
is, any partial function computed by some Turing machine program is also a
partial recursive function.
To do this we again use the coding technique of Gödel numbering to pro-
duce a unique code for each Turing machine program.
Recall that a Turing machine program P is a finite set of quadruples, say
Q0, Q1, . . . , Qn, of the general form ⟨qi, Sj, Ai,j, qi,j⟩. Also recall that the set
of states from which the qi and qi,j are drawn is finite, as is the alphabet Σ. For our purposes
we fix the alphabet Σ to be {0, 1, B}.
We define a function g to assign a number to the different parameters in
a quadruple as follows:
g(0) = 2
g(1) = 3
g(B) = 4
g(L) = 5
g(R) = 6
g(qi ) = 7 + i
We can now use this definition to assign numbers to quadruples:
Examples
The following program P when given input x computes x + 1.
Q0 = ⟨q0, 1, R, q0⟩
Q1 = ⟨q0, B, 1, q1⟩
Q2 = ⟨q1, 1, L, q1⟩
Q3 = ⟨q1, B, R, q2⟩
where state q2 is a halt state.
The Gödel numbers are:
g(Q0) = 2^7 · 3^3 · 5^6 · 7^7
g(Q1) = 2^7 · 3^4 · 5^3 · 7^8
g(Q2) = 2^8 · 3^3 · 5^5 · 7^8
g(Q3) = 2^8 · 3^4 · 5^6 · 7^9
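These Gödel numbers can be checked by direct computation; a sketch (the encoding function name is ours):

```python
# The coding g from the text: g(0)=2, g(1)=3, g(B)=4, g(L)=5, g(R)=6, g(q_i)=7+i.
g = {"0": 2, "1": 3, "B": 4, "L": 5, "R": 6}

def gq(i):
    return 7 + i                      # g(q_i)

def code_quadruple(i, symbol, action, j):
    # <q_i, symbol, action, q_j> -> 2^g(q_i) 3^g(symbol) 5^g(action) 7^g(q_j)
    return 2 ** gq(i) * 3 ** g[symbol] * 5 ** g[action] * 7 ** gq(j)

assert code_quadruple(0, "1", "R", 0) == 2**7 * 3**3 * 5**6 * 7**7   # g(Q0)
assert code_quadruple(0, "B", "1", 1) == 2**7 * 3**4 * 5**3 * 7**8   # g(Q1)
assert code_quadruple(1, "1", "L", 1) == 2**8 * 3**3 * 5**5 * 7**8   # g(Q2)
assert code_quadruple(1, "B", "R", 2) == 2**8 * 3**4 * 5**6 * 7**9   # g(Q3)
```

By unique factorization, the quadruple can be recovered from its code, which is what makes the numbering usable.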
In a similar way we can also use the definition of g so far to number
programs. If P is the Turing machine program consisting of the quadruples
Q0 , Q1 , . . . , Qn then define:
g(P) = 2^{g(Q0)} × 3^{g(Q1)} × · · · × p_{n+1}^{g(Qn)},
Examples
Let P be the program in the previous example. Suppose we give P the
input x = 1. Then the configurations of the computation are as follows:
c0 : q0 1
c1 : 1q0 B
c2 : 1q1 1
c3 : q1 11
c4 : q1 B11
c5 : q2 11
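The configuration sequence above can be reproduced by a small simulator; a sketch (representing the tape as a dictionary is our own choice):

```python
# Running the quadruple program above; the tape maps square index to
# symbol, with blanks ("B") implicit.
program = {               # (state, scanned symbol) -> (action, next state)
    (0, "1"): ("R", 0),   # Q0 = <q0, 1, R, q0>
    (0, "B"): ("1", 1),   # Q1 = <q0, B, 1, q1>
    (1, "1"): ("L", 1),   # Q2 = <q1, 1, L, q1>
    (1, "B"): ("R", 2),   # Q3 = <q1, B, R, q2>
}

def run(input_word, halt_state=2):
    tape = {i: s for i, s in enumerate(input_word)}
    pos, state = 0, 0
    while state != halt_state:
        action, state = program[(state, tape.get(pos, "B"))]
        if action == "L":
            pos -= 1
        elif action == "R":
            pos += 1
        else:                       # any other action writes that symbol
            tape[pos] = action
    written = "".join(tape.get(i, "B") for i in range(min(tape), max(tape) + 1))
    return written.strip("B")

assert run("1") == "11"     # input x = 1: the machine halts with x + 1 ones
```

Tracing the loop by hand on input "1" yields exactly the configurations c0 through c5 listed above.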
We can think of a Turing machine computation of program P on input x as
a sequence of configurations c0 , c1 , . . . , ct such that c0 represents the machine
in state q0 reading the leftmost symbol of the input x, ct represents the
machine halting in some state qj , and the transition between configurations
ci and ci+1 is given by the program P . Here we are thinking of a computation
as a calculation that halts. Since at the beginning the tape contains only
finitely many non-blank squares, and this is true at any later stage of the
calculation whether it halts or not (since at most one square can be changed
from a blank to a non-blank in each configuration), then the integers n and
m exist for each configuration.
Now we will use the configuration approach to code a computation with
Gödel numbering.
Let r be the sequence s1, s2, . . . , sn. Then define
g(r) = 2^{g(s1)} × 3^{g(s2)} × · · · × pn^{g(sn)}.
It follows from the Normal Form Theorem that every Turing computable
partial function is partial recursive. In fact, the proof shows that any partial
recursive function can be obtained from two primitive recursive functions by
one application of the µ-operator. The following is the promised proof of the
existence of a universal Turing machine.
the same class of partial functions as the partial computable functions, and
hence support the Church-Turing Thesis.
Here we will introduce yet another model of computation. This model is very
useful in modelling complexity (questions of size and running times) as it
is more faithful to the idea of a RAM (Random Access Machine, the kind
that modern computers are). Generally if we wish to show that, for example,
multiplication takes a certain time for n-bit numbers, we would choose a
model like a register machine rather than a Turing machine. The point is that,
while Turing machines and register machines have the same computational
power, there is a certain overhead of simulating one by another. Thus what
might take a linear number of steps on a register machine might take, say,
cubic time on a Turing machine. This is because adding one to a register
takes one step on a Register Machine, whereas a Turing Machine would need
to set aside a part of the tape to record the contents of the register, move to
that part and add one to the block and then move back. This involves lots
of criss-crossing the tape and hence many steps. We remark in passing that
the overhead in this simulation is polynomial at worst, and can be done so
that one register machine step corresponds to about a quadratic number of
Turing Machine steps. We will discuss this point more later in Chapter 7.
Convention 3.4.1 We will use the following useful convention: We will not
allow “go to ⟨0⟩” as part of any instruction. Thus, for example, “0. R1+ go
Proof. We sketch the proof, and leave the details to the reader. The easiest
model to simulate using register machines is the partial recursive functions. It
is trivial to simulate zero, predecessor, successor. For projection, we can either
use Gödel numbers to represent k-ary functions or define register machines to
devote their first k registers to the inputs ⟨x1, . . . , xk⟩. In either case, projection
is also trivial, or at least easy. Composition is straightforward, since we can
start the second machine on the halt line of the first machine. This only
leaves recursion and least number. For recursion we would devote a register
to the parameter being recursed upon (i.e. the one used in applications of the
recursion scheme), and similarly for least number, increasing the parameter when the
machine returns a no. We leave the details to the reader. □
3.4.1 Exercises
4.1 Introduction
4.1.1 Exercises
Exercise 4.1.3 Suppose that L, L1 , and L2 are finite state. Prove that the
following are decidable if we are given finite state automata M , M1 , and M2
with L = L(M ), L1 = L(M1 ), and L2 = L(M2 ).
(a) Is L = ∅?
(b) Is L = Σ ∗ ?
(c) Is L1 ⊆ L2 ?
(d) Is L1 = L2 ?
Again we will drop the trivial cases of A ∈ {∅, N}, as we did in Chapter 1.
In Definition 4.1.2, we think of A as coding the problem Q and B as coding
P in the explanation above. The following Lemma is immediate.
function f such that x ∈ A iff f (x) ∈ B, then we can say that A 61 B, but
this refined reducibility plays no explicit part in the present book.
Now to complete our preliminaries we need some core problem to reduce
from.
Halting Problem
Input: ⟨x, y⟩ ∈ N × N.
Question: Does ϕx(y) ↓?
Theorem 4.1.4 (Gödel (in some sense), Turing [Tur36]). The halting
problem is undecidable.
Proof. The proof resembles the proof that the primitive recursive functions do
not exhaust the total computable functions. It is a diagonalization. Towards
a contradiction, suppose that the Halting Problem were decidable. Then we
would have an algorithm showing that the set A is computable, where
⟨x, y⟩ ∈ A iff ϕx(y) ↓. Now we define a partial computable function g as
follows:
g(x) = 1 if ϕx(x) ↑, and g(x) = ϕx(x) + 1 if ϕx(x) ↓.
Then g would be a total and computable function. To compute g(x) see if
hx, xi ∈ A. We have an algorithm for this by assumption. If the answer is
no, then g(x) = 1. If the answer is yes, we know that ϕx (x) ↓, and hence we
can use the Universal Turing Machine, to run ϕx (x)[s] over stages s until it
halts. When we find the answer, we add one.
But now we arrive at a contradiction. Since g is computable, for some z,
g = ϕz. But then g(z) ↓ as g is total, and hence ϕz(z) ↓ and ϕz(z) = g(z) =
ϕz(z) + 1, so that 0 = 1, a contradiction!
Therefore the algorithm deciding the halting problem cannot exist. t u
Actually the proof above yields the following sharper result. Not only can't
we decide whether ϕx(y) ↓ for arbitrary ⟨x, y⟩, but we can't even decide this
for x = y. In Theorem 5.1.5 we will prove that both problems are more or
less the same, as they have the same m-degree.
Since all the models of computation are the same, there are similar results
for the partial recursive functions and the register machines. That is, for
instance, there is no algorithm to decide if a given register machine on input
⟨n, 0, . . . , 0⟩ halts.
Remark 4.1.1. The reader will note that we have been a wee bit naughty
in referring to the halting problem, when which pairs ⟨x, y⟩ have ϕx(y) ↓
thoroughly depends on the choice of universal Turing Machine. The point is
that it is irrelevant which enumeration we choose, since any will yield an
undecidable problem. In fact it is possible to show that all halting problems
are m-reducible to each other1 , so more-or-less the same problem, and we’ll
see how to do this in Chapter 5.
4.1.3 Exercises
Exercise 4.1.6 Use the method of the proof of the undecidability of the
halting problem to show that there is no computable function f such that
{ϕf (e) : e ∈ N}
The famous number theorist Paul Erdős said of the Collatz problem:
“Mathematics may not be ready for such problems.” An apparently simpler
question is the following. This question clearly has a positive solution if
the Collatz conjecture has a positive solution, since C = N+ in that case.
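For concreteness, the classical Collatz map and the (conjectural!) convergence to 1 can be sketched as follows; checking small n, as here, of course proves nothing:

```python
def collatz_step(n):
    # the classical Collatz map: halve n if even, send n to 3n + 1 if odd
    return n // 2 if n % 2 == 0 else 3 * n + 1

def reaches_one(n, max_steps=10_000):
    # does iterating the map from n hit 1 within max_steps steps?
    # The conjecture asserts this happens for every n >= 1.
    for _ in range(max_steps):
        if n == 1:
            return True
        n = collatz_step(n)
    return False

assert all(reaches_one(n) for n in range(1, 1000))
```

The step cap is unavoidable in any such experiment: without a proof of the conjecture, no finite amount of iteration can distinguish "diverges" from "has not converged yet".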
recursive for partial functions. That is, we show that each game can be used
as a universal programming language.
Convention 4.2.2 The game will stop if we can’t find any such vector. This
condition will be understood in all the games below.
We have defined the action of the game, and, like Turing Machines and
Register Machines, now we need to define what we mean by a game simulating
a function.
Example 4.2.1. If the instruction at line k is Ri+ go to line t, then it is represented
by the vector
vj = ⟨0, 0, . . . , 1, . . . , 0 | −k, t⟩,
where the 1 occurs in position i. (Here the vertical line has no meaning beyond
distinguishing the last two positions in the vector.)
If the instruction at line k is Ri−; if > 0 go to line t1, and if = 0 go to line
t2, then this instruction is represented by two vectors:
vj = ⟨0, 0, . . . , −1, . . . , 0 | −k, t1⟩, with the −1 in position i, and
vj+1 = ⟨0, 0, . . . , 0, . . . , 0 | −k, t2⟩.
Example 4.2.2. Before we give a formal proof that this works, we pause for an
example; our old friend f (x) = 2x. Recall that this had the Minsky program
0. R1− if = 0 go to 6
if > 0 go to 1
1. R2+ go to 2
2. R2+ go to 3
3. R1− if = 0 go to 4
if > 0 go to 1
4. R2− if = 0 go to 6
if > 0 go to 5
5. R1+ go to 4
6. Halt
The corresponding Minsky Machine is the following:
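The Minsky program for f(x) = 2x can be run directly; a sketch (the transcription of the branching at each line is ours):

```python
# Registers (r1, r2) start as (x, 0); `line` is the current instruction,
# and the answer sits in R1 when we reach the halt line.
def double(x):
    r1, r2, line = x, 0, 0
    while line != 6:                                      # 6. Halt
        if line == 0:                                     # 0. R1-
            r1, line = (r1 - 1, 1) if r1 > 0 else (r1, 6)
        elif line == 1:                                   # 1. R2+ go to 2
            r2, line = r2 + 1, 2
        elif line == 2:                                   # 2. R2+ go to 3
            r2, line = r2 + 1, 3
        elif line == 3:                                   # 3. R1-
            r1, line = (r1 - 1, 1) if r1 > 0 else (r1, 4)
        elif line == 4:                                   # 4. R2-
            r2, line = (r2 - 1, 5) if r2 > 0 else (r2, 6)
        elif line == 5:                                   # 5. R1+ go to 4
            r1, line = r1 + 1, 4
    return r1

assert [double(x) for x in range(6)] == [0, 2, 4, 6, 8, 10]
```

Lines 0–3 drain R1 while putting two units into R2 for each unit removed; lines 4–5 then copy R2 back into R1.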
The next step of the equivalences concerns a simple translation into another
game. Remember that multiplying powers of the same number is
the same as adding exponents; for example, 5^n 5^m = 5^{n+m} and, more
generally, (2^x 3^y) · (2^z 3^m) = 2^{x+z} 3^{y+m}. That is, when we multiply two
powers of the same prime, the multiplication translates into addition of
exponents. This observation is the motivation for the next game.
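This exponent observation is easy to check; a sketch (the helper that reads off exponent vectors is ours):

```python
def factor_exponents(n, primes=(2, 3, 5)):
    # read off the exponent vector of n with respect to the given primes
    out = []
    for p in primes:
        e = 0
        while n % p == 0:
            n //= p
            e += 1
        out.append(e)
    return out

x, y, z, m = 4, 1, 2, 3
# (2^x 3^y) * (2^z 3^m) = 2^(x+z) 3^(y+m): multiplying codes adds exponents
assert factor_exponents((2**x * 3**y) * (2**z * 3**m)) == [x + z, y + m, 0]
```

This is the sense in which vector addition (on exponent vectors) can be mirrored by multiplication of rationals.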
(i). For all n ∈ dom ϕ, 2^{ϕ(n)} is the first power of 2 in the sequence
gL(2^n), gL^{(2)}(2^n), . . . .
(ii). For all n ∉ dom ϕ, there is no power of 2 in gL(2^n), gL^{(2)}(2^n), . . . .
Proof. First we show that every Vector Game can be emulated by a Rational
Game. For a vector game of arity k, let 2, 3, 5, . . . , pk be the first k prime
numbers. Then, for each vector vi = ⟨vi1, vi2, . . . , vik⟩ in the game, encode vi as
ri = 2^{vi1} 3^{vi2} · · · pk^{vik}, hence preserving the order of the vectors in the order of the
rationals. So the input vector v = ⟨x, 0, . . . , 0⟩ translates to 2^x, as expected by
the rational game. Moreover, the output vector v = ⟨f(x), 0, . . . , 0⟩ translates
to 2^{f(x)}, as expected also, for all x ∈ dom f. Choosing the first vector vi that
keeps ⟨v1, . . . , vk⟩ in N^k corresponds to choosing the first rational that keeps
2^{v1} 3^{v2} · · · pk^{vk} in N+.
Every Rational Game can be simulated by a Vector Game: find the highest
prime needed for the representation of a rational in the rational game, say pk.
Then make a vector game of arity k by constructing a vector vi from each ri
as follows:
if ri = 2^{r1i} 3^{r2i} · · · pk^{rki}, then vi = ⟨r1i, r2i, . . . , rki⟩.
Then the input rational r = 2^x corresponds to the vector ⟨x, 0, . . . , 0⟩ as
expected by the vector game, and the output rational r = 2^{f(x)} corresponds
to the vector ⟨f(x), 0, . . . , 0⟩ as expected also, for all x ∈ dom f. Choosing
the first rational that keeps 2^{r1i} 3^{r2i} · · · pk^{rki} in N+ corresponds to choosing
the first vector that keeps ⟨r1i, r2i, . . . , rki⟩ in N^k. □
Thus our final task is to demonstrate that rational games can be translated
into a system of congruences for a suitably chosen p.
such that
f(x) = ri x,
where i is uniquely determined by x ≡ y (mod p) for some y ∈ Di.
That is, every Rational Game is step by step equivalent to a Generalised
Collatz function.
4.2.5 Exercises
turn it into a Vector Game, then into a Rational Game, and then into a
generalized Collatz function.
Exercise 4.2.8 What games would the original Collatz function generate?
Input : Words u, v
Question : Does u ⇒* v?
Theorem 4.3.1 (Post [Pos47]). There are semi-Thue systems whose word
problems are unsolvable.
Proof. To prove this, we show how to obtain from any Turing machine T a
corresponding semi-Thue system Π(T) such that an algorithm for solving the
word problem for Π(T) can be used to solve the halting problem for T. If T
is a Turing Machine with states q0, q1, · · · qn and
alphabet Σ = {0, 1}, then Π(T) will be a semi-Thue process on the alphabet
A = {B, 0, 1, q0, q1, · · · qn, q, q′, h}. We refer to the symbols q0, q1, · · · qn, q, q′
as q-symbols; B is the blank symbol.
The following echoes the definition of configurations we used in the proof
that Turing Machines could be emulated by partial recursive functions in
Definition 3.3.3.
u, v are words on Σ,
v ≠ λ, and
0 ≤ i ≤ n.
qi Sj −→ qi,j Sk
(ii). For each quadruple ⟨qi, Sj, R, qi,j⟩ we would have, for each symbol Sd of
T,
qi Sj Sd −→ Sj qi,j Sd
qi Sj h −→ Sj qi,j B h
(iii). For each quadruple ⟨qi, Sj, L, qi,j⟩ we would have, for each symbol Sd of
T,
Sd qi Sj −→ qi,j Sd Sj
h qi Sj −→ h qi,j B Sj.
(iv). For each pair ⟨i, j⟩ where there is no quadruple in T beginning qi, Sj, add
to Π(T ):
qi Sj −→ q Sj .
(This uses the convention that if the machine reaches a configuration for
which it has no corresponding quadruple, it halts; the symbol q plays the role of qhalt.)
(v). For each symbol Sd , always add to Π(T ):
q Sd −→ q
q h −→ q 0 h
Sd q 0 −→ q 0
Each production added by (i), (ii) and (iii) causes changes in a Post word
exactly corresponding to the effect on a machine configuration of applying
the specified quadruple of the machine. Suppose h u qi Sj v h is a Post
word corresponding to a given configuration of T, and that T contains, for
example, the quadruple ⟨qi, Sj, Si,j, qi,j⟩. Then we have
h u qi Sj v h ⇒Π(T) h u qi,j Si,j v h,
where h u qi,j Si,j v h is in fact the Post word corresponding to the next
configuration of T. The other kinds of quadruples translate similarly, remembering
that h denotes the end markers for the portion of the tape addressed so
far. The symbol q′ is a kind of “eating” symbol which converts a halting
configuration into the single word h q′ h. Each production added for (iv) operates
on a Post word corresponding to a configuration of T when T has just been
forced to halt; the production replaces the instruction symbol qi by q.
That is, via the productions from (i), (ii), and (iii), the semi-Thue process
emulates the actions of the Turing Machine until the Turing Machine halts.
If the Turing Machine eventually halts, the productions from (iv) and (v)
delete the representation of the tape contents until only h q 0 h is left, giving
us a single specific word to ask the word problem about.
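The one-step rewriting relation of a semi-Thue system is easy to implement, though any derivability search must be artificially bounded, precisely because the word problem is undecidable; a sketch on a toy system of our own (not Π(T)):

```python
from collections import deque

def derives(u, v, productions, max_steps=10_000):
    # breadth-first search for a derivation u =>* v; the search is capped,
    # since derivability is undecidable in general
    seen, queue = {u}, deque([u])
    for _ in range(max_steps):
        if not queue:
            return False
        w = queue.popleft()
        if w == v:
            return True
        for g, h in productions:
            i = w.find(g)
            while i != -1:          # apply g -> h at every occurrence of g
                nw = w[:i] + h + w[i + len(g):]
                if nw not in seen:
                    seen.add(nw)
                    queue.append(nw)
                i = w.find(g, i + 1)
    return False

# toy system with the single production ab -> ba
assert derives("aabb", "bbaa", [("ab", "ba")])
assert not derives("ba", "ab", [("ab", "ba")])
```

The second assertion illustrates that semi-Thue rewriting is one-directional: ab ⇒ ba does not permit ba ⇒ ab.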
Now, if T begins at q0 scanning the leftmost symbol of x, the corresponding
Post word is h q0 x h. Suppose that T eventually halts. Then
h q0 x h ⇒*Π(T) h u q v h ⇒*Π(T) h q′ h.
Conversely, suppose that
h q0 x h = u1 ⇒ u2 ⇒ · · · ⇒ un = h q′ h.
Note each ui must contain exactly one q-symbol, since the initial word contains
exactly one and each production g → ḡ has exactly one q-symbol in
each of g and ḡ. Now, (i), (ii) and (iii) replace each qi with some qi,j, and (v) never
replaces a qi. However, to get to q′ some qi must be replaced by q, and q by q′. Hence,
a production from (iv) must have been used (exactly) once. This implies that
T halts. □
Hence we have proven the following.
Theorem 4.3.2. The Turing Machine T, beginning with instruction q0 on
input x, eventually halts if and only if h q0 x h ⇒*Π(T) h q′ h.
Proof. Above. □
Corollary 4.3.1. There is no algorithm to determine, for any x ∈ N,
whether
h q0 x h ⇒*Π(T) h q′ h.
Corollary 4.3.2. The word problem for Π(T ) is unsolvable.
whenever u = u1 ⇒Π u2 ⇒Π · · · ⇒Π un = v
also v = un ⇒Π un−1 ⇒Π · · · ⇒Π u1 = u.
Definition 4.3.5.
Proof. The lemma is clearly true in direction →, since Π(T ) ⊆ Π̄(T ), and
hence all the productions from Π(T ) are present in those of Π̄(T ).
Conversely, suppose
By repeating the above steps we can remove all productions not in Π(T),
and so end up with h q0 x h ⇒*Π(T) h q′ h as required. □
4.3.3 Exercises
compatibility will be determined by the quadruples of the Turing Machine. If h is the “end
marker” of the configuration, then you would also add tiles for h, and ones saying that
it is compatible to add tiles with colours hl and hr which can be added to the left and
right respectively of a tile coloured h and extended arbitrarily left and right5 DNA models
another universal programming language.)
That is, given two lists of m words, {h1 , . . . , hm } and {k1 , . . . , km }, we want
to determine whether any concatenation of words from the h list is equal to
the concatenation of the corresponding words from the k list. A solution is
such a concatenation.
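A brute-force bounded search for solutions is straightforward, and the bound is unavoidable since the problem is undecidable; a sketch with a toy instance of our own:

```python
from itertools import product

def pcp_solution(hs, ks, max_len=6):
    # search all index sequences up to a fixed length for a match between
    # the h-concatenation and the k-concatenation
    for length in range(1, max_len + 1):
        for seq in product(range(len(hs)), repeat=length):
            if "".join(hs[i] for i in seq) == "".join(ks[i] for i in seq):
                return list(seq)
    return None

# h-list "a", "ab" and k-list "aa", "b": the index sequence (0, 1) works,
# since "a" + "ab" == "aa" + "b" == "aab"
assert pcp_solution(["a", "ab"], ["aa", "b"]) == [0, 1]
```

Undecidability means no such search, however cleverly bounded, can decide every instance; it can only confirm solutions it happens to find.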
Given a semi-Thue process Π and words u, v, we can construct a Post
correspondence system that has a solution iff u ⇒*Π v. Then we can conclude
Theorem 4.3.5. There is no algorithm for determining of a given arbitrary
Post correspondence system whether or not it has a solution.
Proof. Let Π be a semi-Thue process on alphabet A = {a1 , . . . , an }, and let
u, v be words on A. We construct a Post correspondence system P on the
alphabet
B = {a1, . . . , an, a1′, . . . , an′, [, ], ⋆, ⋆′}
of 2n + 4 symbols. For any word w on A, write w′ for the word obtained from
w by placing ′ after each symbol of w.
Let the productions of Π be gi → ḡi, 1 ≤ i ≤ k, and assume these include
the n identity productions ai → ai, 1 ≤ i ≤ n. Note this is without loss of
generality, as the identity productions do not change the set of pairs u, v such
that u ⇒*Π v. However, we may now assert that u ⇒*Π v iff we can write
u = u1 ⇒Π u2 ⇒Π · · · ⇒Π um = v, where m is odd.
5 Tiling systems provide a fascinating arena in which to represent many, often very complicated,
undecidable problems. For example, you might change the shapes, ask for aperiodicity, etc.
An old account, written for a lay audience, can be found in Wang [Wan65]. Remarkably,
these ideas found further realizations in modelling DNA self-assembly, the basis of life,
beginning with Winfree's remarkable PhD Thesis [Win98]. For a more recent analysis, see
Doty et al. [DLP+12].
w = [u ? | · · · ?0 v|]
= [|u ? · · · | ?0 v]
4.3.5 Groups∗
G = ⟨a1, . . . , an : g1 = g2, . . . , gk = gk+1⟩.
Here the gi are words in the generators {a1 , . . . , an }. In the same way as
the semigroup above, we are declaring that we can substitute gi+1 whenever
we see gi as a subword of v and conversely. Groups will automatically have
additional relations since they must have an identity element 1, and for each
v ∈ G, there must also be an inverse v −1 ; hence we will also add lots of new
elements. As we have seen, we can use Post words to construct semigroups
which are very close to Turing Machines.
However, because of the additional structure guaranteed by the presence of
inverses many, many new “productions” will be present if we try to somehow
emulate the proof for semigroups to give an undecidability proof for groups
as we discuss below.
The definition of a finitely presented group above correlates with another
classical equivalent formulation of group presentations. For that formulation
we would have a free group F based on the symbols {a1, . . . , an}. One definition
of a free group on such symbols is that F is the group in which the
only relations holding between the symbols are those guaranteed by the
definition of a group. So if x, y ∈ F then xy ∈ F, and this “product” is
concatenation. Every x ∈ F must have an inverse x^{-1} with x^{-1}x = xx^{-1} = 1;
and this is induced for the elements of F by giving each of a1, . . . , an its
own inverse. So if x = a2 a1 a3^{-1}, say, then x^{-1} would be a3 a1^{-1} a2^{-1}.
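Inversion and free reduction in a free group are easy to mechanize; a sketch (the pair representation of letters is our own convention):

```python
def invert(word):
    # inverse of a word in a free group: reverse it and invert each letter;
    # a letter is a (generator, exponent) pair with exponent +1 or -1
    return [(g, -e) for g, e in reversed(word)]

def reduce_word(word):
    # free reduction: cancel adjacent x x^{-1} (or x^{-1} x) pairs
    out = []
    for g, e in word:
        if out and out[-1][0] == g and out[-1][1] == -e:
            out.pop()
        else:
            out.append((g, e))
    return out

x = [("a2", 1), ("a1", 1), ("a3", -1)]          # x = a2 a1 a3^{-1}
assert invert(x) == [("a3", 1), ("a1", -1), ("a2", -1)]
assert reduce_word(invert(x) + x) == []          # x^{-1} x reduces to 1
```

In the free group this reduction solves the word problem; the whole point of the discussion above is that once arbitrary relations gi = gi+1 are imposed, no such mechanical procedure can exist in general.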
Then we can add the relations gk = gk+1 by thinking of them as saying
gk gk+1^{-1} = 1. Using group theory, the way that we can force this to happen in
F is to consider the quotient group G = F/N, where N is the normal subgroup
generated by {gi gi+1^{-1} : i = 1, . . . , k}.
In this equivalent classical form, in 1911 Max Dehn [Deh11] asked three fundamental
questions (expressed here in more modern language), in some sense founding
the area of combinatorial group theory.
• The Word Problem6. Is there an algorithm to decide, given g ∈ G,
whether g = 1 in G? Because of the existence of inverses, this is equivalent
to asking whether we can decide, given u, v, whether u = v, since u = v iff
uv^{-1} = 1.
• The Conjugacy Problem. Given u and v in G, can we decide if there
exists a z such that
u = zvz −1 ?
• The Isomorphism Problem. Is there an algorithm which, given two
finitely presented groups G1 and G2, decides whether G1 ≅ G2?
6 On the first page of this paper, Dehn states all three problems. For instance, the word
problem is stated as “Irgend ein Element der Gruppe ist durch seine Zusammensetzung aus
den Erzeugenden gegeben. Man soll eine Methode angeben, um mit einer endlichen Anzahl
von Schritten zu entscheiden, ob dies Element der Identität gleich ist oder nicht.” This
translates (more or less) as “An element of the group is given by its composition from the
generators. One should specify a method to decide in a finite number of steps whether this
element is equal to the identity or not.” As with Hilbert and his Entscheidungsproblem,
Dehn implicitly believes that there will be such an algorithm.
4.3 Unsolvable Word problems 95
Lemma, and the like, were by-products of the quest for undecidability. These
methods are now mainstays of combinatorial group theory. Rotman [Rot65],
Chapter 12 gives a very good account of the standard proof. The hidden
message here is that more involved undecidability proofs will often rely on
some highly nontrivial structure theory.
In the case that we allow an infinite set of relations7 gi = gi+1 , the
following group has an unsolvable word problem:
In some sense this is a bit of a cheat, since the proof of the Higman
Embedding Theorem relies on the coding method in the proof of the Unde-
cidability of the Word Problem plus a certain technique called the “Higman
Rope Trick.”
While remaining restricted to groups, we can also expand the problems
shown to be algorithmically unsolvable. Let G be a collection of groups. We
say G is
For example, G may consist of only the trivial group {1}. It might consist
of exactly the groups which are cyclic, finite, free, or abelian. Thus there is
no algorithm allowing us to tell from a presentation whether the associated
group has any of those properties. The Adian-Rabin Theorem says that most
“reasonable” properties of finitely presented groups are algorithmically un-
solvable. When we meet Rice’s Theorem (Theorem 5.2.1) later in the book,
we will see that the same holds for deciding properties of machines from their
descriptions alone. The proof of the Adian-Rabin Theorem is not that difficult,
but would take us a bit far. We refer the reader to, for example, Lyndon and
Schupp [LS01] for more on this fascinating subject.
Recall that one of Hilbert’s problems was to give a decision procedure for first
order logic. We are in a position to sketch the proof that there is no such
procedure. That is, we give a detailed sketch of the proof of the undecidability
of predicate logic.
For this section, we will assume that the reader is familiar with the rudi-
ments of predicate logic. Here we start with variables x, y, z, . . . , predicates
P x̄ for tuples x̄ = x1 . . . xk (for k ∈ N) and the usual rules of boolean algebra (for
formulas A, B, C) such as A∧(B∨C) = (A∧B)∨(A∧C), ¬(A∨B) = ¬A∧¬B,
etc., and then enrich with quantifiers ∀x and ∃y. The Entscheidungsproblem
is the following:
The proof is to take the proof for semi-Thue systems and “interpret it”
as something happening in predicate logic. From the proof of the undecid-
ability of semi-Thue systems, we have the following: We can use a unary
relation R, and think of R(q) as q′ . That is, we can have a set of states
q, q′ , q′′ , . . . representing q0 , q1 , q2 , . . . . Similarly, symbols will be represented
by B, 1, 1′ , 1′′ , . . . in place of B, S0 , S1 , S2 , . . . . We’d need the “axiom” that
x(yz) = (xy)z, which says that brackets can be forgotten. We also interpret
u ⇒∗Π v as a binary relation ⇒∗ . We then need to write out axioms for the
behaviour of ⇒∗ . Namely, we would write, for each u ⇒Π v generated by T ,
For each u ⇒Π v generated by T , we would add the above sentence ψu,v . The
particular formula which would not be decidable would be, for a suitably
chosen T ,

[∀x, y, z (x(yz) = (xy)z) ∧ [∧u⇒Π v ψu,v ] ∧ ∀u, v, w ([(u ⇒∗ v) ∧ (v ⇒∗ w)] → u ⇒∗ w)] → (hxh ⇒∗ hq′h),

giving a sentence ρx . This is valid iff T halts on x.
The actual setting for the proof is in the natural numbers, not the integers.
This does not result in loss of generality because we are proving unsolvability.
If the integer version is solvable, then we may solve the natural number
version for P (x1 , . . . , xn ) = 0 by solving the integer version of P (u1² + v1² +
x1² + y1² , . . . , un² + vn² + xn² + yn² ) = 0, because every non-negative integer is the
sum of four squares. Therefore if the natural number version is unsolvable,
the integer version must also be.
For example, sticking to the naturals, “is a factor of” is Diophantine,
since “a is a factor of b” iff ∃x(a · x = b); similarly, “a is composite” iff
(∃x1 )(∃x2 )[(x1 + 2)(x2 + 2) − a = 0]; and “a < b” iff (∃x)[a + x + 1 − b = 0].
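These definitions can be checked by brute-force witness search. The Python sketch below (with an arbitrary search bound, my addition, since a genuine Diophantine search is unbounded on “no” instances) illustrates the three examples:

```python
from itertools import product

def is_composite(a, bound=50):
    # "a is composite" iff there exist x1, x2 in N with (x1+2)(x2+2) - a = 0.
    return any((x1 + 2) * (x2 + 2) - a == 0
               for x1, x2 in product(range(bound), repeat=2))

def divides(a, b, bound=200):
    # "a is a factor of b" iff there exists x with a*x - b = 0.
    return any(a * x - b == 0 for x in range(bound + 1))

def less_than(a, b, bound=200):
    # "a < b" iff there exists x with a + x + 1 - b = 0.
    return any(a + x + 1 - b == 0 for x in range(bound + 1))

assert is_composite(15) and not is_composite(13)
assert divides(3, 12) and not divides(5, 12)
assert less_than(4, 9) and not less_than(9, 4)
```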
8 “Given a Diophantine equation with any number of unknown quantities and with rational
integral numerical coefficients: To devise a process according to which it can be determined
in a finite number of operations whether the equation is solvable in rational integers.”
The proof of this result involves very clever results about the behaviour
of polynomials, as well as ingredients like Fibonacci numbers, and other mis-
cellaneous number-theoretic facts. Thus we will not cover this intricate proof
in detail.
However, because we have covered Register Machines, there is a relatively
accessible proof due to Jones and Matijacevič [JM79] for one of the prelimi-
nary results of the proof. This proof is not widely known, and it seems worth
studying. Matijacevič’s Theorem is the end of a long series of papers, and is
regarded as a major theorem in the history of mathematics. The result we
prove here is one of the major steps in the proof and we clarify its relationship
with the final proof later. Thus we will take the opportunity to look at the
Jones-Matijacevič material in this section, and then use hand waving for the
last step of the proof.
We will prove the weaker result that the halting problem (any c.e. relation)
is exponential Diophantine. This is defined precisely as in Definition 4.3.7,
but also allowing terms like a^x in the definition (exponentials, but only single
exponentials, so 2^(2^x), for example, is not allowed). Thus, for example,
2^y − 7 = x², x^x y^y = z^z and 2^x + 11^z = 7^y + 1 are examples of exponential
Diophantine equations. Matijacevič’s key result was to show that exponential
Diophantine is the same as Diophantine. The earlier undecidability result
for exponential Diophantine relations is a celebrated result due to Davis,
Putnam, and Robinson [DPR61].
We code the action of Register Machines and the goal will be to ensure the
following: We want to write equations which have a solution iff the machine
9 Actually this works for any computably enumerable k-ary relation as defined in Chapter
5.
10 We state this as we will make some remarks about the consequences of the full result
at the end. Diophantine to c.e. is the easy direction. Let R be as in Definition 4.3.7. We
may computably list all n-tuples ⟨y1 , . . . , yn ⟩. The procedure ϕ(⟨x1 , . . . , xk ⟩) evaluates
P (x1 , . . . , xk , y1 , . . . , yn ) on successive n-tuples until 0 is obtained, if ever (and we argue
by the Church-Turing thesis that ϕ is partial computable).
100 4 Undecidable Problems
To follow the paper [JM79], we will slightly modify the definition of register
machines we use. The new lines are11 :
Li If Rj < Rm, GOTO Lk; meaning if the contents of register j is less than
that of register m, go to line k.
Li If Rj = 0, GOTO Lk.
Li + 1 ELSE Rj ← Rj − 1.
Finally,
Li HALT.
It is not hard to prove that the machines created with these commands are
equivalent to those we considered when we looked at Minsky Machines, and
Collatz Functions.
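A minimal Python interpreter for this modified instruction set makes the equivalence plausible (the tuple encoding of instructions and the GOTO spelling are my choices for a runnable sketch, not the book's):

```python
def run(program, registers, max_steps=10_000):
    """Interpret the modified instruction set: INC j; GOTO k;
    JLT j, m, k (if Rj < Rm goto Lk); JZDEC j, k (if Rj = 0 goto Lk,
    ELSE Rj <- Rj - 1); HALT."""
    pc, steps, R = 0, 0, dict(registers)
    while steps < max_steps:
        instr = program[pc]
        op = instr[0]
        if op == "HALT":
            return R
        elif op == "INC":
            R[instr[1]] += 1; pc += 1
        elif op == "GOTO":
            pc = instr[1]
        elif op == "JLT":
            _, j, m, k = instr
            pc = k if R[j] < R[m] else pc + 1
        elif op == "JZDEC":
            _, j, k = instr
            if R[j] == 0:
                pc = k
            else:
                R[j] -= 1; pc += 1
        steps += 1
    raise RuntimeError("no halt within step bound")

# Transfer R2 into R1 (computes R1 + R2 in R1, zeroing R2):
prog = [("JZDEC", 2, 3),  # L0: if R2 = 0 goto L3, ELSE R2 <- R2 - 1
        ("INC", 1),       # L1: R1 <- R1 + 1
        ("GOTO", 0),      # L2: back to the test
        ("HALT",)]        # L3
assert run(prog, {1: 2, 2: 3}) == {1: 5, 2: 0}
```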
We now introduce some new definitions and notations where all definitions
have variables in N.
[a ≼ b and c ≤ d] ↔ [a + cQ ≼ b + dQ],
where the ri and si are the binary digits of r and s, respectively (taken to
have common length n).
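Since r ≼ s compares the binary digits of r and s, on machine integers it is just a bitwise test: r ≼ s iff r AND s = r. The sketch below uses this to check the digit-block behaviour behind the display above in the all-≼ case (assuming Q is a power of 2 and a, b < Q; any further side conditions in the book's statement are elided here):

```python
def mask(r, s):
    # r ≼ s iff every binary digit of r is at most the corresponding
    # digit of s, i.e. iff r AND s == r.
    return (r & s) == r

# Digit-block behaviour: with Q a power of 2 and a, b < Q, the base-2
# digits of a + c*Q are just those of c followed by those of a, so
# a ≼ b and c ≼ d together imply a + c*Q ≼ b + d*Q.
Q = 16
for a in range(Q):
    for b in range(Q):
        for c, d in [(3, 7), (5, 5), (2, 10)]:
            if mask(a, b) and mask(c, d):
                assert mask(a + c * Q, b + d * Q)

assert mask(0b0101, 0b1101)
assert not mask(0b0110, 0b1101)
```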
We turn to coding register machines. We will define the version of the halt-
ing problem for M with registers R1, . . . , Rr to have lines labelled L1, . . . , Lℓ.
M (x) will halt when started on x in R1, finishing in the halt line with zero
in all other registers. This makes no essential difference to the definition of
halting. We will pick a suitably large number Q, and use the digits of numbers,
written in base Q, to emulate the action of the machine. Let s be the number
of steps used by the machine when it stops, let rj,t denote the content of
register Rj at step t, and ℓi,t be 1 if we execute line Li at time t, and 0 if we
do not. Now we define
Rj = Σ_{t=0}^{s} rj,t Q^t   (0 ≤ rj,t ≤ Q/2), and,

Li = Σ_{t=0}^{s} ℓi,t Q^t   (0 ≤ ℓi,t ≤ 1).
The base Q we use will be a sufficiently large power of 2. At step t the contents
of any register cannot be larger than x + t, and hence rj,t ≤ x + s. So we can
choose Q so that

x + s < Q/2 ∧ ℓ + 1 < Q ∧ Q pow2

(where “Q pow2” means that Q is a power of 2). Thus take Q = 2^(x+s+ℓ).
Example 4.3.1. The following example is taken from [JM79] (so that I don’t
get it wrong!)
L0 R2← R2+1
L1 R2← R2+1
L2 IF R3=0 GOTO L5
L3 R3← R3-1
L4 GOTO L2
L5 R3← R3+1, R4← R4+1, R2← R2-1
L6 IF R2> 0 GOTO L5
L7 R2← R2+1, R4← R4-1
L8 IF R4> 0 GOTO L7
L9 IF R3< R1 GOTO L5
L10 IF R1< R3 GOTO L1
L11 IF R2< R1 GOTO L10
L12 R1← R1-1, R2← R2-1, R3← R3-1
L13 IF R1> 0 GOTO L12
L14 HALT
0 0 1 1 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 R1
0 0 1 1 2 2 2 2 2 1 1 0 0 1 1 2 2 1 0 R2
0 0 1 1 2 2 2 2 2 2 2 2 2 1 1 0 0 0 0 R3
0 0 0 0 0 0 0 0 0 1 1 2 2 1 1 0 0 0 0 R4
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 L0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 L1
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 L2
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 L3
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 L4
0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 0 0 0 L5
0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 0 0 0 0 L6
0 0 0 0 0 0 0 0 0 1 0 1 0 0 0 0 0 0 0 L7
0 0 0 0 0 0 0 0 1 0 1 0 0 0 0 0 0 0 0 L8
0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 L9
0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 L10
0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 L11
0 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 L12
0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 L13
1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 L14
For the proof we need a number I which is all 1’s when written base Q. If
1 + (Q − 1)I = Q^(s+1) , then

I = Σ_{t=0}^{s} Q^t will work.
We remark that Σ_{t=0}^{s} at Q^t ≼ Σ_{t=0}^{s} bt Q^t iff at ≼ bt for each t,
for any at , bt < Q. It remains to
show how these parameters x, s, Q, I, R1 , . . . , Rr , L0 , . . . , Lℓ can be used
to write conditions that have a solution iff
M (x) ↓ .
That is, there is some s with M (x) ↓ [s]. The numbers r, ℓ are constants, x is
the exponential Diophantine parameter, and s, Q, I, R1 , . . . , Rr , L0 , . . . , Lℓ are
unknowns.
The first set of conditions is

x + s < Q/2 ∧ ℓ + 1 < Q ∧ Q pow2.
The next are 1 + (Q − 1)I = Q^(s+1) and I = Σ_{t=0}^{s} Q^t . To force an arbitrary
number Rj to have the form Rj = Σ_{t=0}^{s} rj,t Q^t (0 ≤ rj,t ≤ Q/2), we use ≼, as
Q is a power of 2. This condition and rj,t ≤ Q/2 are implied by

Rj ≼ (Q/2 − 1)I.
The next action is to force exactly one digit to be 1 in each column.
Since ℓ < Q, we use

I = Σ_{i=0}^{ℓ} Li , and,

Li ≼ I (i = 0, . . . , ℓ).
The starting condition is stipulated by 1 ≼ L0 . Because of GOTO commands,
we can assume there is only one halting line, and this happens at step s, so
Lℓ = Q^s . If the command is

Li GOTO Lk,

we use

Q Li ≼ Lk .

If the command is

Li : If Rj = 0 GOTO Lk,

we use

Q Li ≼ Lk + Li+1 and Q Li ≼ Li+1 + QI − 2Rj .
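We can verify the conditions so far concretely. The Python sketch below runs a toy machine (one register, three lines) on x = 2, builds Q, I, the Li and R1 from its trace, and checks the conditions; the convention that register contents are recorded just before each step is my assumption:

```python
def mask(r, s):  # r ≼ s: bitwise comparison in base 2
    return (r & s) == r

# Toy machine: L0: if R1 = 0 goto L2, ELSE R1 <- R1 - 1; L1: GOTO L0; L2: HALT.
x = 2
r1, pc = x, 0
line_trace, reg_trace = [], []
while True:
    line_trace.append(pc)
    reg_trace.append(r1)
    if pc == 0:
        if r1 == 0: pc = 2
        else: r1 -= 1; pc = 1
    elif pc == 1:
        pc = 0
    else:                       # HALT, executed at step s
        break
s = len(line_trace) - 1         # the halting step
ell = 2                         # the highest line number
Q = 2 ** (x + s + ell)          # Q = 2^(x+s+ell), a power of 2
I = sum(Q ** t for t in range(s + 1))
L = [sum(Q ** t for t in range(s + 1) if line_trace[t] == i)
     for i in range(ell + 1)]
R1 = sum(reg_trace[t] * Q ** t for t in range(s + 1))

assert x + s < Q // 2 and ell + 1 < Q
assert 1 + (Q - 1) * I == Q ** (s + 1)   # I is all 1's base Q
assert sum(L) == I                        # exactly one line per step
assert L[ell] == Q ** s                   # halting happens exactly at step s
assert mask(1, L[0])                      # starting condition 1 ≼ L0
assert mask(R1, (Q // 2 - 1) * I)         # digits of R1 stay below Q/2
```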
If the command is

Li If Rj < Rm GOTO Lk,

we simulate it with conditions of the same kind. The k-sum is taken over all
k for which the programme has an instruction of the type Lk . . . Rj ← Rj + 1 . . . .
The i-sum is taken over all i where the programme has an instruction of the
form Li . . . Rj ← Rj − 1 . . . , or one of the form Li + 1 ELSE Rj ← Rj − 1.
This simulation proves the theorem.
More details can be found in Jones and Matijacevič [JM79]. In that paper
they also discuss applications in computational complexity theory and NP-
completeness, via miniaturizations and placing bounds on Register Machines.
4.3.8 Exercises
Exercise 4.3.11 Quantifying over N, show that “√a is rational” is Diophan-
tine.
Exercise 4.3.12 (Putnam) Prove that every Diophantine set is the positive
part of the range of a polynomial over Z.
Exercise 4.3.13 Prove by induction that if r ≤ s then r ≼ s iff the binomial coefficient C(s, r) is odd.
Exercise 4.3.14 Apply the method of this section to obtain a table corre-
sponding to the following register machine for x = 2.
L0 IF R1=0 GOTO L6
L1 R1← R1-1
L2 IF R1=0 GOTO L5
L3 R1← R1-1
L4 GOTO L0
L5 R1← R1+1
L6 HALT
4.3.9 Coda
5.1 Introduction
In this Chapter, we will develop a number of more advanced tools we can use
to tackle issues in computability theory. In particular, we will be able to deal
with problems more complex than the halting problem, as we delve more deeply
into the fine structure of reducibilities and noncomputable sets.
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2024 109
R. Downey, Computability and Complexity, Undergraduate Topics
in Computer Science, https://doi.org/10.1007/978-3-031-53744-8_5
110 5 Deeper Computability
K = {x : Φx (x) ↓}
In Theorem 5.1.5, we will see that they are the “same” problem as K ≡m K0 .
We can use similar codings for other problems:
Nonemptiness
Input : e
Question : Is dom(ϕe ) ≠ ∅?
The hardness of the halting problem stems from the fact that we are
seeking a witness and perhaps one never occurs. Such problems have a very
special character. Although they are not decidable (proven below for N ),
they are “semi-decidable”. This means that a “yes” instance will be observed
should one ever happen, as time goes by. For example, if we began a process:
Observing the Halting Computations
• Stage 0 Compute ϕ0,0 (0). This means “run partial computable function ϕ0
on input 0 for 0 many steps.”
• Stage s For e, j ≤ s, compute ϕe,s (j). That is, run the first s many machines
for s steps on inputs j ≤ s.
The following definition makes sense once we consider the dovetail enu-
meration above.
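The dovetailing process can be sketched in Python. Here the “machines” are stand-ins given by a table of halting times (None modelling divergence), since the point is only that every halting pair is observed at some finite stage:

```python
def dovetail(halting_time, num_stages):
    """halting_time[e](j) gives the number of steps machine e takes on
    input j, or None if it diverges (a toy stand-in for actually running
    a machine).  Observe every halting pair (e, j) at some finite stage."""
    observed = set()
    for s in range(num_stages):
        for e in range(s + 1):
            for j in range(s + 1):
                steps = halting_time[e](j) if e < len(halting_time) else None
                if steps is not None and steps <= s:
                    observed.add((e, j))
    return observed

# Machine 0 halts on evens (in j steps); machine 1 always diverges;
# machine 2 halts everywhere in 3 steps.
machines = [lambda j: j if j % 2 == 0 else None,
            lambda j: None,
            lambda j: 3]
seen = dovetail(machines, 10)
assert (0, 4) in seen and (2, 5) in seen
assert all(e != 1 for (e, j) in seen)   # divergent machine never observed
```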
Remark 5.1.1. We remark that we will use both the notations We,s and We [s],
as well as similar ones like ϕe,s and ϕe [s] deliberately, as both are commonly
used in the literature. The notation with [s] is due to Lachlan and is partic-
ularly convenient as we take it to mean “approximate everything for s many
steps.”
If A is c.e., then it clearly has a computable approximation, that is, a
uniformly computable family {As }s∈N of sets such that A(n) = lims As (n)
for all n. (For example, {We [s]}s∈ω is a computable approximation of We .)
We [s] is our old friend T (e, x, s), where T is the Kleene T -predicate of Lemma
3.3.1. The halting problem, or problems, occupy a central place in the theory
of c.e. sets.
K = {e : ϕe (e) ↓}.
Note that “K” in this definition is the standard notation. The etymology
is that “K” is the first letter of komplett, “complete” in German. K and K0
are called m-complete since they are c.e. and if W is a c.e. set then W ≤m C
where C ∈ {K, K0 }.
This is obviously true for K0 since
y ∈ Wx iff ⟨x, y⟩ ∈ K0 .
f (v) = x2 , etc. Then since x ∈ A iff x ∈ Au for some u, we see that the range
of f will be A, as required.
Conversely, if f is computable with f (N) = A, we build a partial computable
function ϕ by declaring ϕ(x) ↓ [s] if f (s) = x. Then A = dom ϕ. ⊓⊔
The reader might note that this proof is basically an “effectivization” (i.e.
making computable step by step) of arguments from Chapter 1, §1.1.2.
Some important refinements of the results above are the following two
results:
Proposition 5.1.4.
The reader should establish these results as exercises (see below) to make
sure that they have followed the material above. Solutions can be found in
the back of the book if they are stuck.
5.1.2 Exercises
Exercise 5.1.1 Show that if A and B are c.e. then so are A ∩ B and A ∪ B.
Exercise 5.1.2 Show that if f is a computable function then ∪e Wf (e) is
computably enumerable.
Exercise 5.1.3 Modify the proof of Proposition 5.1.2 to prove Proposition
5.1.3.
Exercise 5.1.4 Modify the proof of Proposition 5.1.2 to prove Proposition
5.1.4.
Proof (Sketch). The proof of Lemma 5.1.1 runs as follows: For 1, given
(the quadruples and hence the index of) a Turing machine M computing
g and a number x, we can build a Turing machine N that on input y simulates
the action of writing the pair (x, y) on M ’s input tape and running M . We
can, in a primitive recursive way, calculate an index s(x) for the function
computed by N from the index of M . ⊓⊔
We give two typical proofs using the s-m-n Theorem. The first is the
promised proof of the m-completeness of K.
Proposition 5.1.5. K ≡m K0 .
Note that f (a, z) does not depend on z. Also note that f (⟨x, y⟩, z) ↓ for all z
iff f (⟨x, y⟩, z) ↓ for some z iff ϕx (y) ↓.
By the s-m-n-Theorem, there is a computable function s such that for
each a, f (a, ·) = ϕs(a) . It follows that ϕx (y) ↓ iff ϕs(⟨x,y⟩) (s(⟨x, y⟩)) ↓ . That
is, ⟨x, y⟩ ∈ K0 iff s(⟨x, y⟩) ∈ K. ⊓⊔
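In a toy model where “indices” are positions in a Python list of functions, the s-m-n Theorem is just partial application: hard-coding the first argument is computable in the index. This is only an analogy (real s-m-n operates on machine descriptions), but it captures the content of the proof sketch above:

```python
registry = []          # registry[i] plays the role of the program with index i

def new_index(f):
    registry.append(f)
    return len(registry) - 1

def s_1_1(f_index, x):
    """s-m-n: from an index for f(x, y) and a fixed x, compute an index
    for the one-variable function y -> f(x, y)."""
    f = registry[f_index]
    return new_index(lambda y: f(x, y))

f = new_index(lambda x, y: x * 10 + y)
i = s_1_1(f, 4)
assert registry[i](7) == 47   # phi_{s(4)}(7) = f(4, 7)
assert registry[i](0) == 40
```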
Proof. We will show that the halting problem is m-reducible to this emptiness
problem. To do this, we define a partial computable function of two variables
via
2 Actually, this function is linear for any reasonable programming system.
5.2 Index Sets and Rice’s Theorem 115
g(x, y) = 1 if ϕx (x) ↓, and g(x, y) ↑ if ϕx (x) ↑.
Notice that g ignores its second input.
Via the s-m-n theorem, we can consider g(x, y) as a computable collection
of partial computable functions. That is, there is a computable (primitive
recursive) s(x) such that, for all x and y, ϕs(x) (y) = g(x, y). Now
dom(ϕs(x) ) = N if ϕx (x) ↓, and dom(ϕs(x) ) = ∅ if ϕx (x) ↑,
so if we could decide for a given x whether ϕs(x) has empty domain, then we
could solve the halting problem. That is, K ≤m N . ⊓⊔
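The shape of this reduction can be seen in a toy Python model, where divergence is modelled by returning None and we cheat with a fixed halting table (which is, of course, exactly what no program can compute):

```python
# Toy stand-in for "phi_x(x) converges"; in reality this is undecidable,
# we fix a table just to watch the reduction work.
halts_on_self = {0: True, 1: False, 2: True}

def s(x):
    # phi_{s(x)}(y) = 1 for every y if phi_x(x) converges,
    # and is undefined (None) on every y otherwise.
    return lambda y: 1 if halts_on_self[x] else None

# A nonemptiness decider for dom(phi_{s(x)}) would decide the halting
# problem: by construction it suffices to test any single input, say y = 0.
for x in halts_on_self:
    assert (s(x)(0) is not None) == halts_on_self[x]
```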
We have seen in Exercise 4.1.3 that some algorithmic problems are decidable.
For example, if α is a regular expression, “Is L(α) nonempty?” is algorith-
mically decidable. However, in this section we will show that for a nontrivial
problem to be decidable, it cannot be a problem given purely by a Turing
machine description. We want to capture the idea that we are representing a
class of (partial computable) functions and not particular ways of computing
the functions. This idea leads to the notion of an index set defined below.
For example, {x : dom(ϕx ) = ∅} is an index set. On the other hand, the set
of indices of Turing machines with 12 states is not an index set. Keep in mind:
an index set can be thought of as coding a problem about computable func-
tions (like the emptiness of domain problem) whose answer does not depend
on the particular algorithm used to compute a function. Generalizing Propo-
sition 5.1.6, we have the following result, which shows that nontrivial index
sets are never computable. Its proof is very similar to that of Proposition
5.1.6.
Proof. Let A ∉ {∅, N} be an index set. Let e be such that dom(ϕe ) = ∅.
We can assume without loss of generality that e ∈ A (the case e ∉ A being
How should we reconcile Rice’s Theorem with the fact that, for example,
for a regular expression α we can decide if L(α) is finite? The point is that
for that result, we supposed that L was given as L(α) for a regular expression α.
Suppose instead that some regular L is given only by a Turing machine M with
L(M ) = L. Because of Rice’s Theorem, from a Turing Machine description of
L alone, rather than the particular representation by a regular expression (or
by an automaton), we can deduce nothing about a regular language.
Notice that in the proof of Rice’s Theorem we are establishing an m-
reduction from the halting problem K. Thus, as a corollary, we have:
The explicit definition of n given above can clearly be carried out computably
given an index for f . ⊓⊔
The reader, like the author, might find the proof above to be quite mysterious.
It is. A longer but more perspicuous proof of the recursion theorem was given
by Owings [Owi73]; see also Soare [Soa87, pp. 36–37]. I will give another proof
in a starred section §5.3.1.
There are many variations on this theme. For example, if f (x, y) is com-
putable, then there is a computable total function n(y) such that, for all
y,
ϕn(y) = ϕf (n(y),y) .
This result is called the Recursion Theorem with Parameters, and is also due
to Kleene [Kle38]. (See Exercise 5.3.7.)
Here is a very simple application of the Recursion Theorem.
Proof. Let f be a computable function such that ϕf (n) (n) ↓ and ϕf (n) (m) ↑
for all m 6= n. We can obtain such an f by a standard application of the
s-m-n-Theorem: That is, we can define

g(n, y) = 1 if y = n, and g(n, y) ↑ otherwise.

Now take f with ϕf (n) (·) = g(n, ·), using the s-m-n-Theorem.
Let n be a fixed point for f , so that ϕn = ϕf (n) . Let m ≠ n be another
index for ϕn . Then ϕn (n) ↓ and hence n ∈ K, but ϕm (m) ↑ and hence m ∉ K.
So ∅′ is not an index set. ⊓⊔
Note that this example also shows that there is a Turing machine that
halts only on its own index.
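In the toy registry model this observation is easy to realise, because a program can read off the index it is about to receive; on real machines, this self-reference is exactly what the Recursion Theorem supplies:

```python
registry = []   # registry[i] plays the role of phi_i; None models divergence

def make_self_aware():
    n = len(registry)      # the index this function is about to receive
    # phi_n(y) converges (returns 1) iff y = n
    registry.append(lambda y: 1 if y == n else None)
    return n

n = make_self_aware()
assert registry[n](n) == 1                                  # halts on its own index
assert all(registry[n](m) is None for m in range(10) if m != n)
```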
Let’s do another proof of the Recursion Theorem. This proof was communi-
cated to Noam Greenberg and is due to Iskander Kalimullin from Kazan.
We will work up to it by first looking at an apparently weaker form.
Our goal is to prove that if f is total computable then there is an e with
We = Wf (e) .
Then the desired fixed point is Wn^[n] , since this is certainly a c.e. set and
Wn^[n] = Wf (n)^[n] .
The Problem What’s the problem? The problem is that the index of Wn^[n]
is not n, but g(n, n).
Wg(m,m) = Wm^[m] = B^[m] = {⟨m, y⟩ | ⟨m, y⟩ ∈ Wf (g(m,m)) }.
5.3.2 Exercises
ϕn(y) = ϕf (n(y),y) .
Exercise 5.3.10 Use the Recursion Theorem to show that there is an infinite
computable set C with K ∩ C = ∅.
In this section, we will look at a class of often quite simple arguments commonly
used in computability theory, called wait and see arguments, or sometimes
bait and snatch arguments. The idea is that we want to build some objects,
typically computably enumerable sets, and want to perform some diagonal-
ization. We think of the diagonalization as meeting requirements.
As a prototype for such proofs, think of Cantor’s [Can79] proof that the
collection of all infinite binary sequences is uncountable (essentially Theorem
1.3.1). We can conceive of this proof as follows.
Suppose we could list the infinite binary sequences as S = {S0 , S1 , . . .},
with Se = se,0 se,1 . . .. It is our goal to construct a binary sequence U =
u0 u1 . . . that is not on the list S. We think of the construction as a game
against our opponent who must supply us with S. We construct U in stages,
at stage t specifying only u0 . . . ut , the initial segment of U of length t + 1.
Our list of requirements is the decomposition of the overall goal into subgoals
of the form
Re : U 6= Se .
There is one such requirement for each e ∈ N. Of course, we know how to
satisfy these requirements. At stage e, we simply ensure that ue 6= se,e by
setting ue = 1−se,e . This action ensures that U 6= Se for all e; in other words,
all the requirements are met. This fact contradicts the assumption that S lists
all infinite binary sequences, as U is itself an infinite binary sequence.
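The diagonal argument is short enough to write out as a Python sketch, with a purported list S given as a function taking e to the e-th sequence (itself a function from positions to bits):

```python
def diagonal(S, n):
    """Given a purported list S of infinite binary sequences (S(e) is the
    e-th sequence, a function from positions to bits), produce the first
    n bits of a sequence U that differs from S_e at position e."""
    return [1 - S(e)(e) for e in range(n)]

# A sample "list": S_e is the sequence whose k-th bit is (e + k) mod 2.
S = lambda e: (lambda k: (e + k) % 2)
U = diagonal(S, 8)
for e in range(8):
    assert U[e] != S(e)(e)   # requirement R_e : U differs from S_e
```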
In the construction of a c.e. set with certain properties, we will lay some
kind of trap to meet requirements Re of some kind. We observe the dovetail
enumeration of the universe and make some decision.
The simplest example is a form of the halting problem. The following is
an immediate consequence of the existence of ∅0 , but let’s pretend we did not
know that result.
Proposition 5.4.1 (The Halting Problem-Revisited). There is a c.e.
noncomputable set A.
Proof. We think about building A using requirements
Re : We 6= A.
5.4 Wait and See Arguments 121
Of course there is nothing new here, but the proof constitutes a new way
of viewing the construction of K.
Here is an another example. In this construction, we will construct an
example of a variety of c.e. sets we have not yet met in this book.
Re : |We | = ∞ implies We ∩ A ≠ ∅.
We think of N as being divided into boxes. The size of the e-th box is e + 2,
so that B0 = {0, 1}, B1 = {2, 3, 4}, . . . .
Construction At stage s, for e ≤ s, if Re is not yet satisfied, meaning that
We,s ∩ As = ∅, and we see some y ∈ We,s with y ∉ ∪j≤e Bj , put y ∈ As+1 \ As .
This action makes Re satisfied at stage s + 1.
End of Construction
Notice that we need to act at most once for each Re , and the condition
y ∉ ∪j≤e Bj means that if our action for Re takes something from box Bj
and puts it into A, then j > e. Therefore at most e elements can be taken
from Be and put into A. Thus at least two remain. Therefore the complement
of A is infinite, as witnessed by the elements of the boxes not put into A.
Re is met since, if We is infinite, it must have some element y ∉ ∪j≤e Bj .
This will be observed at some stage s > y and we will put y into As+1 , if
necessary. ⊓⊔
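A finite-stage Python sketch of this construction, with the enumerations We,s supplied as an explicit table (a stand-in for a genuine uniform enumeration of the c.e. sets):

```python
def build_simple(enumerations, num_stages):
    """enumerations[e][s] is the finite set W_{e,s}.  Box B_e holds the
    next e + 2 numbers, so B_0 = {0,1}, B_1 = {2,3,4}, ..."""
    def box(e):
        start = sum(j + 2 for j in range(e))
        return set(range(start, start + e + 2))
    A, satisfied = set(), set()
    for s in range(num_stages):
        for e in range(min(s + 1, len(enumerations))):
            if e in satisfied:
                continue
            low_boxes = set().union(*(box(j) for j in range(e + 1)))
            W = enumerations[e][min(s, len(enumerations[e]) - 1)]
            for y in sorted(W - low_boxes):   # wait until a witness appears
                A.add(y); satisfied.add(e)
                break
    return A, box

# W_0 enumerates the evens, W_1 enumerates the multiples of 5.
enums = [[set(range(0, 2 * s, 2)) for s in range(10)],
         [set(range(0, 5 * s, 5)) for s in range(10)]]
A, box = build_simple(enums, 10)
assert A & set(range(0, 20, 2))   # R_0 met: A meets W_0
assert A & set(range(0, 50, 5))   # R_1 met: A meets W_1
assert len(box(0) - A) >= 1       # each box keeps elements out of A
```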
Simple sets were constructed by Post to try to show that there are unde-
cidable semi-decidable problems which are equivalent neither to the halting
problem nor to its complement. Simple sets do this at least for m-reducibility.
Theorem 5.4.2 (Post [Pos47]). If a c.e. set A is simple, then ∅ <m A <m
K.
This subsection is devoted to using wait and see arguments, aimed at a new
arena of computability theory: computable structure theory. Computable structure
theory is a broad area where we seek to understand the effective (computable)
content of well-known algebraic structures. We have already seen this in
Chapter 4, where we looked at word problems in structures such as groups
and semigroups. Those structures had finite presentations, but there is no
reason not to look at computable presentations. Historically, one of the
classical examples was that of Fröhlich and Shepherdson [FS56] who studied
computable procedures in field theory3 , such as whether there is an algorithm
to classify the algebraic closure. This paper clearly shows the historical con-
text of the subject, the clear intuition of van der Waerden (which apparently
came from Emmy Noether’s lecture notes) and the fact that isomorphic com-
putable structures (here fields) can have distinct algorithmic properties, and
hence cannot be computably isomorphic. Here we quote from the abstract.
“Van der Waerden (1930a, pp. 128–131) has discussed the problem of carrying out
certain field theoretical procedures effectively, i.e. in a finite number of steps. He
defined an ‘explicitly given’ field as one whose elements are uniquely represented
by distinguishable symbols with which one can perform the operations of addition,
3 The reader should not concern themselves if they have not had a course in abstract
algebra, since this is only a historical example. Suffice to say that fields are central algebraic
objects studied by mathematicians, and understanding the effective (algorithmic) content
of their theories seems important.
In modern terms, Fröhlich and Shepherdson [FS56] showed that the halting
problem is many-one reducible to the problem of having a splitting algorithm.
Since this book is a first course in the theory of computation, we will
concentrate on the effective content of a class of structures where we don’t
need to concern ourselves with complicated operations. That is, we will study
the class of linear orderings (linearly ordered sets). Recall that a linear or-
dering consists of a set A and an ordering <A which is transitive, irreflexive,
and total (any two distinct elements are comparable). We will only be concerned
with infinite linear orderings.
The rationals with the standard ordering have a computable presentation ob-
tained from some Gödel numbering of Q. The rationals have a special role
in the theory of countable orderings: all countable orderings embed into
(Q, <Q ). This result is also computable in the following sense:
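For a taste of the effectiveness, here is a greedy Python sketch embedding any order presented element-by-element (with a computable comparison) into the rationals, by sending each new element to a rational strictly between the images of its current neighbours:

```python
from fractions import Fraction

def embed(elements, less):
    """Process the elements one at a time, choosing for each new element
    a rational strictly between the images of its current neighbours."""
    placed = []                      # pairs (element, image)
    for x in elements:
        below = [q for (y, q) in placed if less(y, x)]
        above = [q for (y, q) in placed if less(x, y)]
        lo = max(below) if below else min(above, default=Fraction(0)) - 1
        hi = min(above) if above else max(below, default=Fraction(0)) + 1
        placed.append((x, (lo + hi) / 2))
    return dict(placed)

# The integers, presented in the order 0, 1, -1, 2, -2, ...
elts = [0, 1, -1, 2, -2, 3, -3]
f = embed(elts, lambda a, b: a < b)
for a in elts:
    for b in elts:
        assert (a < b) == (f[a] < f[b])   # the embedding preserves order
```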
The construction of L works as follows. The bait (witness) for Re is the pair
(2e, 2(e + 1))
Construction. At each stage s, we will add at least one new (odd) element
to Ls+1 \ Ls . It is convenient to think of L0 = 0 <L 2 <L 4 <L 6 <L . . . . We
could deal with a finite number of points at stage s, but this will always be
the skeleton of the orderings, so we might as well regard this as the initial
ordering. For e ≤ s, we will say that Re requires attention at stage s if it is
not yet satisfied and ϕe,s (⟨2e, 2(e + 1)⟩) ↓ [s] = 1, so that it is telling us that
(2e, 2(e + 1)) is an adjacency. At stage s, for the least such e, we will add
the least odd number 2k + 1 into the adjacency: that is, declare
5.4.3 Exercises
Exercise 5.4.4 We say that a collection of canonical finite sets (finite sets
given by Gödel numbers)
F = {Fx : x ∈ N}
sets. (Hint: Build Ai = ∪s Ai,s in stages. Note that We is computable iff there is some j
with We ⊔ Wj = N. Thus, it suffices to meet the requirements
Think about setting aside a witness for this requirement, and see if it occurs in Wi or Wj ,
or neither.)
|α − qn | < 2^(−n) .
lim_{n→∞} rn = β.
computes ∅′ . (Hint: The bait should be a dedicated pair for the question “Is e ∈
∅′ ?”. Make these non-adjacent iff e enters ∅′ [s]. This shows that ∅′ is m-reducible to the
nonadjacency relation.)
Now the question is: What principles should we use for a general reduction?
We do a thought experiment akin to that done by Turing when he introduced
Turing Machines:
The reader might wonder how to formalize this definition using quadruples,
etc. One model would work as follows. We imagine a 3-tape Turing Machine,
with one work tape as in a normal Turing Machine; one tape upon which
the oracle set is written, for example as χB , a sequence of 1’s and 0’s with
B(x) = 1 iff x ∈ B; and one question tape upon which to write queries.
Each tape has its own head, and the quadruples, as processed, can use the
current state and the symbols currently being read on each tape by that
tape’s head to decide the action on the tapes. The question tape has blocks
of a special character, which also carry a symbol +, −, indicating whether
the query is positive or negative. They also have some symbol such as ∗ which
indicates that the query is ready to be asked. There will be extra tuples
and states devoted to writing on each tape: work states, question states and
query states. During the computation on the work tape we can be generating
a question on the question tape, and will place a ∗ when the question has
the right length. Then the query states act to see whether the position marked
on the question tape has (for +), or does not have (for −), the corresponding
entry on the oracle tape. This determines the next state, symbol and moves
of the rest of the machine, and the contents of the question tape are wiped
before the next move of the work tape, so that it is ready for a new query.
The oracle tape is read-only, meaning that all we can do is check whether
entries on the question tape match the contents of the oracle tape.
This all looks extremely messy; and that’s because it is. The details of
equivalent formulations have been worked out by many authors and are not
especially illuminating, so we will omit them. Again, note that the action is
fundamentally local (in fact primitive recursive): the machine itself is defined
without recourse to the contents of the oracle tape, but what the machine
does depends on the contents of the oracle tape.
Definition 5.5.3.
relentlessly mix notation by writing, for example, A <T b, for a set A and a
degree b, to mean that A <T B for some (or equivalently all) B ∈ b.
The Turing degrees form an upper semilattice. The join operation is in-
duced by ⊕, where A ⊕ B = {2n : n ∈ A} ∪ {2n + 1 : n ∈ B}. Clearly
A, B ≤T A ⊕ B, and if A, B ≤T C, then A ⊕ B ≤T C. Furthermore, if A ≡T Â
and B ≡T B̂, then A ⊕ B ≡T Â ⊕ B̂. Thus it makes sense to define the
join a ∨ b of the degrees a and b to be the degree of A ⊕ B for some (or
equivalently all) A ∈ a and B ∈ b.
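The join is easy to compute with. A Python sketch of ⊕ on membership functions, checking that each side is recovered by querying one parity only:

```python
def join(A, B):
    # A ⊕ B = {2n : n in A} ∪ {2n+1 : n in B}, as a membership function.
    return lambda k: A(k // 2) if k % 2 == 0 else B(k // 2)

A = lambda n: n % 2 == 0          # the evens
B = lambda n: n in {1, 5, 7}
C = join(A, B)
# A ≤T A⊕B and B ≤T A⊕B: each side is read off from one parity.
assert all(C(2 * n) == A(n) for n in range(20))
assert all(C(2 * n + 1) == B(n) for n in range(20))
```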
We let 0 denote the degree of the computable sets. Note that each degree
is countable and has only countably many predecessors (since there are only
countably many oracle machines), so there are continuum many degrees.
Φ^X(e, x) = Φ^X_e(x).
Proof. Proposition 5.5.1 is most easily seen to be true via the functional formulation of Turing reduction, since there is a computable listing of partial computable functions (acting on {0, 1}∗, without loss of generality, as the coding is immaterial), and a universal partial computable function guaranteed by Proposition 5.1.6. For any e we can regard Φe as initially undefined. As we find longer and longer τ's and σ's with ϕe(τ) = σ[s], obeying the rule that we have already seen ϕe(τ̂) = σ̂ for σ̂ ≺ σ, we can extend the definition so that Φ^τ_e(|σ|) ↓ [s]. This gives a dovetailed enumeration of all the oracle computations, as required. □
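The dovetailing idea in this proof can be illustrated schematically. Below is a hedged Python sketch (entirely our own, using a toy stand-in for the machines rather than real oracle computations): at stage ⟨e, s⟩ we run computation e for s steps, so every convergent computation is eventually discovered.

```python
# A schematic illustration (assumed, not the book's construction) of
# dovetailing: at each stage, visit every pair (e, s) with e + s == stage.

def dovetail(run_for_steps, num_stages):
    """run_for_steps(e, s) -> result if machine e halts within s steps, else None.
    Yields (e, s, result) the first time each computation is seen to converge."""
    seen = set()
    for stage in range(num_stages):
        for e in range(stage + 1):      # pairs (e, s) with e + s == stage
            s = stage - e
            if e not in seen:
                r = run_for_steps(e, s)
                if r is not None:
                    seen.add(e)
                    yield (e, s, r)

# Toy family: "machine" e halts after exactly e*e steps with output e + 1.
toy = lambda e, s: e + 1 if s >= e * e else None
found = {e: out for (e, s, out) in dovetail(toy, 50)}
assert found[3] == 4 and found[6] == 7   # both found, despite different run times
```

No single computation can block the enumeration: a divergent machine simply never contributes, while every convergent one is reached at some finite stage.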
Proof. The sets A and B give the same answers to all questions asked during the relevant computations, so the results must be the same. □
Remark 5.5.1. The reader should note that using lower case letters like ϕ to correspond to upper case letters like Φ when representing a use function has some potential for confusion, as here ϕ might not be a partial computable function. However, this convention is standard practice, and there should be no confusion because of the context.
Lemma 5.5.1.
1. A <T A′.
2. If A ≤T B, then A′ ≤T B′.
5 The reader might think that this should be ≡T, as we'd need to relativize the definition of ≤m. However, the machine given by the proof of the s-m-n Theorem (which is used in the proof that K ≡m K0) is primitive recursive (recall: writing 'x' on the tape), is always the same, and does not have anything to do with oracles.
Φ^A_z(z) ↓ = g^A(z) = Φ^A_z(z) + 1.
This is a contradiction.
2. Suppose that A ≤T B. Let Ψ^B = A. We can construct a computable function f such that Φ^B_{f(e)} = Φ^A_e, by simulating Φ^A_e computations using B as an oracle, via Ψ^B. (That is, if Φ^A_e(x) queries "y ∈ A?", this is transferred to B via Ψ^B, since from B we can compute A(y).) Then Φ^A_e(e) ↓ iff Φ^B_{f(e)}(e) ↓. That is, e ∈ K^A iff ⟨f(e), e⟩ ∈ K^B_0. □
Note that with this notation, we would write A^(0) = A and A^(n+1) = (A^(n))′. So, for example, A^(2) = A′′ and A^(3) = A′′′. If a = deg(A) then we write a′ for deg(A′), and similarly for the n-th jump notation. This definition makes sense because A ≡T B implies A′ ≡T B′. Note that we have a hierarchy of degrees 0 < 0′ < 0′′ < · · · . Next we explore this hierarchy.
We define the notions of Σ^0_n, Π^0_n, and ∆^0_n sets as follows. A set A is Σ^0_n if there is a computable relation R(x1, . . . , xn, y) such that y ∈ A iff

∃x1 ∀x2 ∃x3 · · · Qxn R(x1, . . . , xn, y),

where the quantifiers alternate, so that Q is ∃ if n is odd and ∀ if n is even. A set is Π^0_n if its complement is Σ^0_n, and ∆^0_n if it is both Σ^0_n and Π^0_n. These classes form a hierarchy of proper inclusions: for each n ≥ 1,

∆^0_n ⊊ Σ^0_n, Π^0_n ⊊ ∆^0_{n+1}.
As we will see in the next section, there is a strong relationship between the arithmetical hierarchy and enumeration. The following is a simple example at the lowest level of the hierarchy.
Proposition 5.6.1 (Kleene) A set A is computably enumerable iff A is Σ^0_1.
Proof. If A is c.e. then A = dom ϕe for some e. Now we can apply the Kleene Normal Form Theorem, Theorem 3.3.8.
Intuitively, the proof of the "if" direction of the limit lemma boils down to saying that, since (by Proposition 5.6.1) ∅′ can decide whether ∃s > t (g(n, s) ≠ g(n, s + 1)) for any n and t, it can also compute lims g(n, s).
As we have seen, we often want to relativize results, definitions, and proofs
in computability theory. The limit lemma relativizes to show that A 6T B 0 iff
there is a B-computable binary function f such that A(n) = lims f (n, s) for
all n. Combining this fact with induction, we have the following generalization
of the limit lemma.
Corollary 5.6.1 (Limit Lemma, Strong Form). Let k ≥ 1. For a set A, we have A ≤T ∅^(k) iff there is a computable function g of k + 1 variables such that A(n) = lim_{s1} lim_{s2} . . . lim_{sk} g(n, s1, s2, . . . , sk) for all n.
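The flavour of the limit lemma can be seen in a toy Python sketch (our own, under simplifying assumptions): a computable approximation g(n, s) that changes at most finitely often in s, and whose limit decides membership in a c.e. set.

```python
# A toy illustration (our own) of a limit approximation: g(n, s) = 1 iff n
# has been enumerated by stage s; lim_s g(n, s) decides n ∈ range(f).

def make_limit_approx(f):
    """Computable g with g(n, s) = 1 iff n in {f(0), ..., f(s)}."""
    def g(n, s):
        return 1 if n in {f(k) for k in range(s + 1)} else 0
    return g

f = lambda k: k * k          # enumerates the set of perfect squares
g = make_limit_approx(f)

assert g(9, 1) == 0                              # not yet enumerated by stage 1
assert g(9, 3) == 1                              # f(3) = 9: the value settles...
assert all(g(9, s) == 1 for s in range(3, 30))   # ...and never changes again
assert all(g(7, s) == 0 for s in range(30))      # 7 is never enumerated
```

Each column g(n, ·) changes at most once here, so the limit exists for every n; in the general limit lemma the approximation may change finitely often, not just once.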
5.6.2 Exercises
Theorem 5.6.6 (Post's Theorem). Let n ≥ 0. Recall that ∅^(n) denotes the n-th jump of ∅.
(i) A set B is Σ^0_{n+1} ⇔ B is c.e. in some Σ^0_n set ⇔ B is c.e. in some Π^0_n set.
(ii) For n ≥ 1, the set ∅^(n) is Σ^0_n m-complete.
(iii) A set B is Σ^0_{n+1} iff B is c.e. in ∅^(n).
(iv) A set B is ∆^0_{n+1} iff B ≤T ∅^(n).
Proof. (i) First note that if B is c.e. in A then B is also c.e. in the complement Ā. Thus, being c.e. in a Σ^0_n set is the same as being c.e. in a Π^0_n set, so all we need to show is that B is Σ^0_{n+1} iff B is c.e. in some Π^0_n set.
5.6 The Arithmetic Hierarchy 139
The “only if” direction has the same proof as the corresponding part of
Proposition 5.6.1, except that the computable relation R in that proof is now
replaced by a Πn0 relation R.
For the “if” direction, let B be c.e. in some Πn0 set A. Then, by Proposition
5.6.1 relativized to A, there is an e such that n ∈ B iff
We need to m-reduce ∅′′ to Fin. Using the s-m-n theorem, we can define a computable function f such that for all s and e, we have s ∈ W_{f(e)} iff there is a t > s such that either Φ^{∅′}_e(e)[t] ↑ or ∅′_{t+1} ↾ ϕ^{∅′}_e(e)[t] ≠ ∅′_t ↾ ϕ^{∅′}_e(e)[t]. Then f(e) ∈ Fin iff Φ^{∅′}_e(e) ↓ iff e ∈ ∅′′.
Part (ii) is similar, and is left to the reader in Exercise 5.6.9. □
The following result is more difficult, and can easily be skipped; but the reader who works through the proof will see some of the more dynamic methods used in classical computability theory.
∗ Theorem 5.6.8. Cof = {e : We is cofinite} is Σ^0_3 m-complete.
Proof. First, Cof is Σ^0_3 because e ∈ Cof iff ∃k ∀n ∃s (n < k ∨ n ∈ W_{e,s}).
Stage s + 1.
Case 1. No (x, v) with v ≤ s fires. In this case we will add a new zone and otherwise change nothing: for v ≤ s, Z_{v,s+1} = Z_{v,s}, and we pick the least fresh number a and define Z_{s+1,s+1} = {a}, so that z(s + 1, s + 1) = a, and declare that ϕ_{f(x)}(a) ↑ [s + 1].
Case 2. Some (x, v) with v least fires at stage s + 1. Then, as above, we let Z_{v,s+1} eat Z_{v′,s} for s ≥ v′ > v, and for all elements a ∈ Z_{v,s} ∪ ⋃_{s≥v′>v} Z_{v′,s}, declare that ϕ_{f(x)}(a) ↓ [s + 1] if currently ϕ_{f(x)}(a) ↑, that is, if necessary. We add the least number z ∉ Z_{v,s} ∪ ⋃_{s≥v′>v} Z_{v′,s} into Z_{v,s+1} and declare that ϕ_{f(x)}(z) ↑ [s + 1], so that z = z(v, s + 1). Then, as indicated by the zone rules, we define Z_{v′,s+1} for s + 1 ≥ v′ > v, each a singleton containing a fresh number.
End of Construction
Now we verify that the construction works. Suppose first that ∃v∀t∃s R(x, v, t, s). We call v a witness if ∀t∃s R(x, v, t, s). There is some least witness, and we consider this v. Since for all w < v, ∃t∀s > t ¬R(x, w, t, s), there is some stage s0 after which (x, w) will not fire for any w < v. If s0 − 1 is the last stage any such (x, w) fires, then after s0, Z_{v,s} will never again be initialized, since zones are initialized only by smaller (x, w) firing.
Now, since ∀t∃s R(x, v, t, s), (x, v) must fire infinitely many times. Every stage s at which it fires, it eats all bigger zones Z_{v′,s} with v < v′ ≤ s in existence, and makes ϕ_{f(x)}(a) ↓ [s + 1] for all p ≤ a ≤ s, where p is the least number in Z_{v,s} (which does not change after stage s0). Because it fires infinitely often, for every number a ≥ p there will be some stage where we define ϕ_{f(x)}(a) ↓. Thus f(x) ∈ Cof.
Conversely, suppose that x is not in the given Σ^0_3 set; that is,
∀v ∃t ∀s ¬R(x, v, t, s).
Then for each v, we will reach a stage s = s_v such that, for all w ≤ v, (x, w) does not fire after stage s_v. It follows that Z_{v,s_v} = Z_{v,s′} for all s′ ≥ s_v. Every zone at each stage has one element z(v, s) with ϕ_{f(x)}(z(v, s)) ↑ [s], and this won't change after stage s_v. Therefore there are infinitely many elements, namely z(v) = lims z(v, s), with ϕ_{f(x)}(z(v)) ↑, so that f(x) ∉ Cof. □
5.6.4 Exercises
Exercise 5.6.9 Prove that the following sets are both Π^0_2 m-complete.
1. Tot = {e : ϕe is total}.
2. Inf = {e : We is infinite}.
Exercise 5.6.10 Show that the index sets J = {e : ϕe halts in exactly 3 places} and Q = {e : ϕe halts in at most 3 places} are both Σ^0_2 m-complete.
Exercise 5.6.11 Show that the index set P = {e : ϕe has infinite co-infinite domain (i.e. ∃^∞ x ∉ dom ϕe ∧ ∃^∞ y ∈ dom ϕe)} is Π^0_2 m-complete.
Exercise 5.6.12 (∗) (You will need to use the method of Theorem 5.6.8.) Prove that {⟨x, y⟩ : Wx =∗ Wy} is Σ^0_3 m-complete. Here Wx =∗ Wy means that they differ by only finitely many elements.
Exercise 5.6.13 (∗) (Lerman [Ler81]) (You will need to use the method of Theorem 5.6.8.) Let L be a linear ordering. A block of size n in L is a collection of elements x1 <L · · · <L xn such that each (xi, xi+1) is an adjacency, and there is no y ∈ L with either (y, x1) or (xn, y) an adjacency. (That is, x1 and xn are limit points.) Consider a computable ordering L of order-type Z + n1 + Z + n2 + · · · , and let B(L) = {n : L has a block of size n}. Such an L is called a Z-representation of S = B(L).
1. Show that B(L) is Σ^0_3.
2. Show that if S is a Σ^0_3 set, then it has a Z-representation; that is, there is a computable ordering L (of the indicated order-type) with B(L) = S.9
(Hint: Without loss of generality, suppose that 2 ∈ S and is the smallest element of S. Let x ∈ S iff ∃v∀t∃s R(x, v, t, s). Break the ordering into Z + 2 + Z + 2 + · · · at stage 0. Devote the n-th copy of 2 to ⟨x, v⟩ = n. First turn the 2-block into an x-block. The idea is to try to build around it by putting points y between the x-block and the Z's on each side: Z + (points in here) + x + (points in here) + Z. Whilst ⟨x, v⟩ is not firing, try to turn the x-block into a copy of Z by putting points on the outside of Z + (yyyyyy x yyyyyy) + Z (here y denotes single points), one more y at each end. Each time ⟨x, v⟩ fires, also put a y on the right-hand end of the first lot of y-points, and one on the left-hand end of the second. If ⟨x, v⟩ fires infinitely often, it will isolate out the x-block, and we'll produce Z + Z + x + Z + Z. If it stops firing, then we'll replace the x-block by a copy of Z, and we'll finish with Z + Z + Z.)
It is consistent with our knowledge so far that the only Turing degrees are 0, 0′, 0′′, . . . (actually, together with "transfinite" extensions of this, beginning with 0^(ω) = deg_T(∅^(ω)), where ∅^(ω) =def ⊕_{n∈N} ∅^(n), a uniform upper bound of all the 0^(n)'s), but certainly the only arithmetical degrees we have seen have been the degrees of the iterated jumps of ∅, given by Post's Theorem, Theorem 5.6.6. Maybe those are the only Turing degrees; namely the natural degrees
9 This result relativizes: for any X and any Σ^X_3 set S, there is an X-computable linear ordering Z-representing S. By Post's Theorem there exist sets S ∈ Σ^{∅′}_3 \ Σ^0_3, such as ∅^(4). Since B(L) is a classical invariant, it follows that there are ∅′-computable linear orderings not classically isomorphic to computable ones. The reason is that if L̂ were a computable ordering isomorphic to L, then B(L̂) would be Σ^0_3, a contradiction. The existence of such an ordering is due to Feiner [Fei68], using slightly different methods.
5.7 The Structure of Turing Degrees and Post’s Problem 143
generated by the jump operator.10 That would make the structure of the ordering of the degrees a linear ordering 0 < 0′ < 0′′ < . . . .
This picture is very far from being true. The Turing degrees, even below 0′, form a complex structure which is an upper semilattice but not a lattice. We won't have time to look at this complex structure, as that is best left to a second course in computability theory, as per the classic texts Rogers [Rog87], Lerman [Ler83], and Soare [Soa87, Soa16].
In this section we will have a first foray into the fascinating techniques
developed to understand the structure of the Turing degrees. It is fair to say
that the impetus for this analysis came from Post’s classic paper [Pos44].
Rice's Theorem 5.2.1 shows that all index sets are of degree ≥ 0′. In 1944, Post [Pos44] observed that all computably enumerable problems known at the time were either computable or of Turing degree 0′. He asked the following question: is there a computably enumerable set whose Turing degree lies strictly between 0 and 0′?
As we will see in §5.7.2, Post’s problem was finally given a positive answer
by Friedberg [Fri57] and Muchnik [Muc56], using a new and ingenious method
called the priority method. This method was an effectivization of an earlier
method discovered by Kleene and Post [KP54]. The latter is called the finite
extension method, and was used to prove the following result.
Theorem 5.7.1 (Kleene and Post [KP54]). There are degrees a and b, both below 0′, such that a |T b; that is, a ≰T b and b ≰T a. In other words, there are ∅′-computable sets that are incomparable under Turing reducibility.11
Proof. As in §5.4, we will break our task down into infinitely many requirements. We construct A = lims As and B = lims Bs in stages, to meet the following requirements for all e ∈ N.

R_{2e}: Φ^A_e ≠ B.
R_{2e+1}: Φ^B_e ≠ A.
11 Note that the degrees constructed in the proof of Theorem 5.7.1 are not necessarily computably enumerable, but merely ∆^0_2.
The reasoning at the end of the above proof is quite common: we often make use of the fact that ∅′ can answer any ∆^0_2 question, and hence any Σ^0_1 or Π^0_1 question.
A key ingredient of the proof of Theorem 5.7.1 is the use principle (Propo-
sition 5.5.1). In constructions of this sort, where we build objects to defeat
certain oracle computations, a typical requirement will say something like
"the reduction Γ is not a witness to A ≤T B." If we have a converging computation Γ^B(n)[s] ≠ A(n)[s], and we "preserve the use" of this computation by not changing B after stage s below the use γ^B(n)[s] (and similarly preserve A(n)), then we will preserve this disagreement. But this use is only a finite portion of B, so we still have all the numbers bigger than it to meet other requirements. In the finite extension method, this use preservation is automatic, since once we define B(x) we never redefine it; but in other constructions we will introduce below, this may not be the case, because we may have occasion to redefine certain values of B. In that case, to ensure that Γ^B ≠ A, we will have to structure the construction so that, if Γ^B is total, then there are n and s such that Γ^B(n)[s] ≠ A(n)[s] and, from stage s on, we preserve both A(n) and B ↾ γ^B(n)[s].
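The use principle underlying this preservation can be illustrated with a small Python sketch (hypothetical machinery, not the text's construction): an oracle computation records the positions it queries, and changing the oracle only at or above the use cannot change the output.

```python
# A sketch (our own toy functional, not from the text) of the use principle:
# the computation queries only finitely many oracle positions, and freezing
# the oracle below that bound preserves the result.

def gamma(oracle):
    """Toy oracle functional: queries oracle at 0..4 and sums the answers."""
    queried = []
    total = 0
    for y in range(5):
        queried.append(y)
        total += oracle(y)
    use = max(queried) + 1      # all queries were strictly below this bound
    return total, use

B = lambda y: 1 if y % 2 == 0 else 0
out, use = gamma(B)

# Flip B only at positions >= use: the computation is unaffected.
B_changed = lambda y: (1 - B(y)) if y >= use else B(y)
assert gamma(B_changed)[0] == out
```

This is exactly why a requirement only needs to "freeze" a finite initial segment of B: everything above the use remains free for other requirements.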
5.7.1 Exercises
That is, for each x, A can determine whether x ∈ A without using x in any query to itself. For example, a complete theory T has this property since, for each (code of a) sentence x, x ∈ T iff ¬x ∉ T. Using the finite extension method, construct a set A ≤T ∅′ which is not autoreducible.
Exercise 5.7.3 (Jockusch and Posner [JP78])
1. A set A is called 1-generic if for all c.e. sets of strings V (that is, V = {σ : the code of σ is in We} for some e), one of the following holds.
a. There is a σ ∈ V with σ ≺ A.
b. There is a σ ≺ A such that for all τ ∈ V, σ ⊀ τ.
12 More precisely, we use the s-m-n theorem to construct a computable ternary function f such that for all e, σ, x, and z, we have Φ_{f(e,σ,x)}(z) ↓ iff (5.2) holds. Then (5.2) holds iff f(e, σ, x) ∈ ∅′.
R_{i,j}: Φ^A_i = Φ^B_j = f total implies f is computable,

as well as

P^A_e: A ≠ We,
P^B_e: B ≠ We.

The P^A_e and P^B_e are met by diagonalization with witnesses. The more complex requirements R_{i,j} are met as follows. Suppose that we have As and Bs and are now dealing with R_{i,j} at step s + 1. We seek extensions σ and τ, and a witness n, such that As ≼ σ, Bs ≼ τ, and t > s with

Φ^σ_i(n) ↓ ≠ Φ^τ_j(n) ↓ [t].
R_{2e}: Φ^A_e ≠ B.
R_{2e+1}: Φ^B_e ≠ A.
Find the least j with Rj requiring attention. (If there is none, then proceed
to the next stage.) We suppose that j = 2e, the odd case being symmetric.
If R2e has no follower, then let x be a fresh large number (that is, one larger
than all numbers seen in the construction so far) and appoint x as R2e ’s
follower.
If R_{2e} has a follower x, then it must be the case that Φ^A_e(x)[s] ↓ = 0 = Bs(x). In this case, enumerate x into B and initialize all Rk with k > 2e by canceling all their followers.
In either case, we say that R2e receives attention at stage s.
End of Construction.
Verification. We prove by induction that, for each j,
(i) Rj receives attention only finitely often, and
(ii) Rj is met.
Suppose that (i) holds for all k < j. Suppose that j = 2e for some e, the
odd case being symmetric. Let s be the least stage such that for all k < j, the
requirement Rk does not require attention after stage s. By the minimality of
s, some requirement Rk with k < j received attention at stage s (or s = 0),
and hence Rj does not have a follower at the beginning of stage s + 1. Thus,
Rj requires attention at stage s + 1, and is appointed a follower x. Since
Rj cannot have its follower canceled unless some Rk with k < j receives
attention, x is Rj ’s permanent follower.
It is clear by the way followers are chosen that x is never any other re-
quirement’s follower, so x will not enter B unless Rj acts to put it into B.
So if Rj never requires attention after stage s + 1, then x ∉ B, and we never have Φ^A_e(x)[t] ↓ = 0 for t > s, which implies that either Φ^A_e(x) ↑ or Φ^A_e(x) ↓ ≠ 0.
In either case, Rj is met.
On the other hand, if Rj requires attention at a stage t + 1 > s + 1, then x ∈ B and Φ^A_e(x)[t] ↓ = 0. The only requirements that put numbers into A after stage t + 1 are ones weaker than Rj (i.e., requirements Rk for k > j). Each such strategy is initialized at stage t + 1, which means that, when it is later appointed a follower, that follower will be bigger than ϕ^A_e(x)[t]. Thus no number less than ϕ^A_e(x)[t] will ever enter A after stage t + 1, which implies, by the use principle, that Φ^A_e(x) ↓ = Φ^A_e(x)[t] = 0 ≠ B(x). So in this case also, Rj is met. Since x ∈ B_{t+2} and x is Rj's permanent follower, Rj never requires attention after stage t + 1. □
The reader may wonder if all this complexity is necessary for solving Post's Problem. There are solutions to Post's Problem not using the priority method, such as those of Kučera [Kuč86] and Downey, Hirschfeldt, Nies and Stephan [DHNS03]. These proofs are much more difficult than the proof above. In fact, using metamathematical techniques, they have been shown to be provably more complicated in a certain technical sense! Also, the non-priority techniques used have not yet seen many further applications. The priority technique has seen many applications throughout computability theory, computable structure theory, and also in areas such as algorithmic randomness and descriptive set theory. Such applications are beyond the scope of the present text, but we refer the reader to Downey and Hirschfeldt [DH10], Ash and Knight [AK00], Downey and Melnikov [DMar], and Moschovakis [Mos09].
The above proof is an example of the simplest kind of finite injury argument, what is called a bounded injury construction. That is, we can put a computable bound, in advance, on the number of times that a given requirement Rj will be injured. In this case, the bound is 2^j − 1.
We give another example of this kind of construction, connected with the
important concept of lowness. It is natural to ask what can be said about the
jump operator beyond the basic facts we have seen so far. The next theorem
proves that the jump operator on degrees is not injective. Indeed, injectivity
fails in the first place it can, in the sense that there are noncomputable sets
that the jump operator cannot distinguish from ∅. Recall that X is low if X′ ≡T ∅′.
Theorem 5.7.6 (Friedberg). There is a noncomputable c.e. low set.
Proof. We construct our set A in stages. To make A noncomputable we need
to meet the requirements
Pe: A ≠ We.
To make A low we meet the requirements
Ne: ∀n [(∃^∞ s Φ^A_e(n)[s] ↓) ⟹ Φ^A_e(n) ↓].
To see that such requirements suffice, suppose they are met, and define the computable binary function g by letting g(e, s) = 1 if Φ^A_e(e)[s] ↓ and g(e, s) = 0 otherwise. Then g(e) = lims g(e, s) is well-defined, and by the limit lemma, A′ = {e : g(e) = 1} ≤T ∅′.
The strategy for Pe is simple. We pick a fresh large follower x and keep it out of A. If x enters We, then we put x into A. We meet Ne by an equally simple conservation strategy. If we see Φ^A_e(n)[s] ↓, then we simply try to ensure that A ↾ ϕ^A_e(n)[s] = As ↾ ϕ^A_e(n)[s] by initializing all weaker priority requirements, which forces them to choose fresh large numbers as followers. These numbers will be too big to injure the Φ^A_e(n)[s] computation after stage s. The priority method sorts out the actions of the various strategies. Since Pe picks a fresh large follower each time it is initialized, it cannot injure any Nj for j < e. It is easy to see that any Ne can be injured at most e many times, and that each Pe is met, since it is initialized at most 2e many times. □
We finish this section with a classical application of the bounded injury
method to computable linear orderings. The conflicts are very clear between
the two teams of requirements in this proof. The presentation of this proof
is taken from [DMNar].
Theorem 5.7.7 (Tennenbaum, unpublished). There is a computable copy of ω∗ + ω with no infinite computable ascending or descending suborderings.
Proof. We will build the ordering (A, ≤A) in stages. The domain of A will be N, so that suborderings correspond to subsets of N. Recall that We denotes
the e-th computably enumerable set. We meet the requirements

R_{2e}: We is not an infinite ascending subordering,
R_{2e+1}: We is not an infinite descending subordering,

and at each stage s the ordering built so far consists of a blue part Bs and a red part

Cs = c_{m,s} <A c_{m−1,s} <A · · · <A c_{0,s}.

Thus we will need to ensure that for all i, lims c_{i,s} = c_i exists, and similarly lims b_{i,s} = b_i exists. (We could explicitly add these as new requirements, but it is unnecessary, as we see below.)
Now the blue part is of course the ω∗ part and the red part the ω part of A. At any stage s + 1, a red element can become blue and vice versa. If b_{i,s} becomes red, say, then every element x ∈ Bs with x >A b_{i,s} will also become red, so that if b_{i,s} is the ≤A-least element that becomes red, and k elements become red, then at stage s + 1 we'd have the blue elements now b_{0,s}, . . . , b_{i−1,s}, and the red ones now c_{m+k,s}, c_{m+k−1,s}, . . . , c_{0,s}. That is, b_{j,s+1} = b_{j,s} for j ≤ i − 1, and the c_{j,s+1} are the same for j ≤ m and for j > m are defined to be the erstwhile blue elements.
Why would we do such a thing? We wish to make sure that We is not an ascending (blue) ω-sequence. If We contains a red element then it cannot be such a sequence. So the obvious strategy for R_{2e} is to wait till we see some blue b_{i,s} ∈ W_{e,s}, and make it red, as indicated. To make sure that we don't do this for all elements, we will only look at b_{i,s} > 2e, so that R_{2e} has no authority to recolour the elements {0, 1, . . . , 2e}. On the other hand, R_{2q+1} is trying to stop Wq from being an infinite ω∗-sequence, and by the same token wants to make red elements blue. If we allowed it to undo the work we just did for R_{2e} by making the erstwhile b_{i,s} blue again, we would undo the work we have done to meet R_{2e}. Thus, when we act for Rk, we will do so with priority k, and make some element the colour demanded by Rk, unless some R_{k′} for k′ < k wishes to change its colour.
So we would say that R_{2e} requires attention at stage s if R_{2e} is not currently declared satisfied and we see some b_{i,s} ∈ W_{e,s} with b_{i,s} > 2e not protected by any Rk for k < 2e; and similarly for R_{2q+1}, with c_{i,s} in place of b_{i,s}. The construction is to take the smallest k, if any, such that Rk requires attention, and perform the recolouring demanded by Rk. The only other part of the construction is that at every stage we add one more blue element to the right of B_{s+1} and one more red element to the left of C_{s+1}. The remaining details are to let the requirements fight it out by priorities.
A routine induction on k shows that we meet Rk. Once the higher priority requirements have ceased activity, if Rk requires attention via some n ∈ W_{d,s}, then whatever colour Rk gives n will thereafter be fixed, as Rk has priority. Finally, if Wd is infinite, then such an n will occur. □
5.7.3 Exercises
There are priority arguments in which the number of injuries to each requirement, while finite, is not bounded by any computable function. One example is the following proof of Sacks' Splitting Theorem [Sac63]. We write A = A0 ⊔ A1 to mean that A = A0 ∪ A1 and A0 ∩ A1 = ∅.
which can be thought of as a high water mark for the lengths of agreement seen so far. Associated with this maximum length of agreement function is a use function

u(e, i, s) = max{ϕ^{Ai}_e(z)[t] : z ≤ l^i(e, t), t ≤ s},13

using the convention that use functions are monotone increasing where defined.
The main idea of the proof is perhaps initially counterintuitive. Let us consider a single requirement R_{e,i} in isolation. At each stage s, although we want Φ^{Ai}_e ≠ A, instead of trying to destroy the agreement between Φ^{Ai}_e(k)[s] and A(k)[s] represented by l^i(e, s), we try to preserve it. (A method sometimes called the Sacks preservation strategy.) The way we do this preservation is to put numbers entering A below u(e, i, s) after stage s into A_{1−i} and not into Ai. By the use principle, since this action freezes the Ai side of the computations involved in the definition of l^i(e, s), it ensures that for all k < l^i(e, s), we have Φ^{Ai}_e(k) = Φ^{Ai}_e(k)[s].
Now suppose that lim inf_s l^i(e, s) = ∞, so for each k we can find infinitely many stages s at which k < l^i(e, s). For each such stage, Φ^{Ai}_e(k) = Φ^{Ai}_e(k)[s] = A(k)[s]. Thus A(k) = A(k)[s] for any such s. So we can compute A(k) simply by finding such an s, which contradicts the noncomputability of A. Thus lim inf_s l^i(e, s) < ∞, which clearly implies that R_{e,i} is met.
In the full construction, of course, we have competing requirements, which we sort out by using priorities. That is, we establish a priority list of our requirements (for instance, saying that R_{e,i} is stronger than R_{e′,i′} iff ⟨e, i⟩ < ⟨e′, i′⟩). At stage s, for the single element xs entering A at stage s, we find the strongest priority R_{e,i} with ⟨e, i⟩ < s such that xs < u(e, i, s), and put xs into A_{1−i}. We say that R_{e,i} acts at stage s. (If there is no such requirement, then we put xs into A0.)
13 Of course, in defining this set we ignore t's such that l^i(e, t) = 0. We will do the same without further comment below. Here and below, we take the maximum of the empty set to be 0.
5.7.5 Exercises
Exercise 5.7.13 Show that Sacks' method can be used to prove the following. Given a c.e. set C ≢T ∅, there is a c.e. noncomputable set A with C ≰T A. (Hint: The requirements are Re: A ≠ We, and Ne: Φ^A_e ≠ C. Use the length of agreement function ℓ(e, s) = max{x : ∀y < x [Φ^A_e(y) = C(y)[s]]}, where C = ∪s Cs is a computable enumeration of C.)
Exercise 5.7.14
1. Consider the proof of Sacks' Splitting Theorem. Prove that for any noncomputable c.e. set C, we can add requirements of the form Φ^{Ai}_e ≠ C to the above construction, satisfying them in the same way that we did for the R_{e,i}, via the method of Exercise 5.7.13. Thus, as shown by Sacks [Sac63], in addition to making A0 |T A1, we can also ensure that C ≰T Ai for i = 0, 1.
2. Show that C being c.e. is not essential. Such an argument shows that if C is a noncomputable ∆^0_2 set, then there exist Turing incomparable c.e. sets A0 and A1 such that A = A0 ⊔ A1 and C ≰T Ai for i = 0, 1. Here you would use the Limit Lemma to begin with a ∆^0_2 approximation C = lims Cs.
14 A much more difficult result is that the computably enumerable degrees are dense. That is, if a > c are c.e. degrees, then there is a c.e. degree b with c <T b <T a. This beautiful result is due to Sacks [Sac64], and needs the infinite injury method to prove it.
15 In fact, it is possible to calibrate how many times the injury must occur, using more complex techniques; we refer the reader to [DASMar].
Part III
Computational Complexity Theory
Chapter 6
Computational Complexity
6.1 Introduction
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2024
R. Downey, Computability and Complexity, Undergraduate Topics
in Computer Science, https://doi.org/10.1007/978-3-031-53744-8_6
160 6 Computational Complexity
important, especially as we are in the computer age. But first we will need
to deal with some preliminaries. We’ll need to develop the framework.
Turing showed that we could break down computation into elementary
steps, and computable functions were built from combinations of these steps.
The most intuitive idea for trying to understand how hard some computation
is would be to try to understand the resources we need to perform these steps.
Classically, the resources which have been the most important are:
• Time: the number of basic steps the computation takes.
• Space: the amount of memory (tape cells) the computation uses.
These two measures will be clarified once we have developed models for
them. Once we have developed models, we will then develop measures of
hardness which we will use to calibrate the intrinsic difficulty of the problems.
It might seem that what takes about n steps on a Turing machine with binary input should take about n² many steps with unary input.
Wrong! If I give M the number 2^100 in binary, the representation would be a string consisting of a 1 followed by 100 many 0's. In unary, the input would be 2^100 many 1's, exponentially longer. For the input n = 2^50,000,000, the unary input itself would be larger than the estimated size of the universe! It would have galactic size.
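These representation lengths are easy to check directly; here is a quick Python sanity check (our own illustration).

```python
# Checking the representation sizes directly.
n = 2 ** 100
binary = bin(n)[2:]                  # strip the "0b" prefix
assert binary == "1" + "0" * 100     # a 1 followed by 100 zeros
assert len(binary) == 101

# The unary representation would have 2**100 symbols -- more than 10**30,
# exponentially longer than the 101-symbol binary string.
assert n > 10 ** 30
```
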
What’s the correct way to measure the difference between the cost for
unary and for binary? We want to measure the performance in terms of the
natural size of the input.
The following notation/definition is important. For functions f, g : N → N, we say that f is O(g) if there are constants c and n0 such that f(n) ≤ c · g(n) for all n ≥ n0; f is o(g) if lim_{n→∞} f(n)/g(n) = 0; and f is Ω(g) if g is O(f).
So if f(n) = 3n, g(n) = 27n + 2, and h(n) = 2^{5n}, then f is O(g), g is O(f), f is o(h), and certainly h is Ω(f). The number of steps for f(n) = 2n with the input in binary is o(g), where g is the number of steps needed even to read the input in unary.
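The stated relations can be checked numerically. The sketch below (our own illustration, taking h(n) = 2^{5n} as in the example) verifies witnessing constants for the O, o, and Ω claims on small ranges.

```python
# Numeric sanity checks (our own) for f(n) = 3n, g(n) = 27n + 2, h(n) = 2**(5n).
f = lambda n: 3 * n
g = lambda n: 27 * n + 2
h = lambda n: 2 ** (5 * n)

# f is O(g): f(n) <= 1 * g(n) for all n (the constant c = 1 works).
assert all(f(n) <= g(n) for n in range(200))
# g is O(f): g(n) <= 10 * f(n) for n >= 1, since 27n + 2 <= 30n when n >= 1.
assert all(g(n) <= 10 * f(n) for n in range(1, 200))
# f is o(h): the ratio f(n) / h(n) tends to 0.
assert f(20) / h(20) < 1e-28
# h is Omega(f): here h(n) >= f(n) for all n >= 1.
assert all(h(n) >= f(n) for n in range(1, 50))
```
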
Why base 2? The issue above is that representing objects in unary means that the input is exponentially longer than in binary. But this is not true between binary and ternary, for instance. Since we are only interested in
Thus we see that the theory we will be looking at does depend on the way
that we specify the input; but that the specification is natural.
Pairing. We will also have to be careful with how we use various computable functions from the previous chapters. Consider the pairing function ⟨x, y⟩. Now, supposing that x, y ∈ {0, 1}∗, presumably we want the size of the code representing the pair not to be too much larger. One way to do this is to locally double x, append 01, and then write y. For example, if x = 10001 and y = 1101, then we can represent ⟨x, y⟩ as 1100000011011101. Then |⟨x, y⟩| = 2|x| + |y| + 2. This is "small" relative to |x| + |y|.
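The doubling code is easy to implement. Here is a Python sketch (the function names are ours) of encoding and decoding, checked against the example from the text.

```python
# Encoding <x, y>: double each bit of x, append the separator 01, then y.

def pair(x, y):
    return "".join(2 * b for b in x) + "01" + y

def unpair(z):
    """Decode: scan doubled bits until the first non-matching pair, '01'."""
    i = 0
    x = []
    while z[i] == z[i + 1]:       # doubled bits come in equal pairs
        x.append(z[i])
        i += 2
    return "".join(x), z[i + 2:]  # skip the '01' separator

x, y = "10001", "1101"
z = pair(x, y)
assert z == "1100000011011101"            # the example from the text
assert len(z) == 2 * len(x) + len(y) + 2  # |<x, y>| = 2|x| + |y| + 2
assert unpair(z) == (x, y)
```

The design point is that the separator 01 can never occur inside the doubled region, since doubled bits always come in equal pairs; this is what makes the code uniquely decodable.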
The reader might wonder how much we can compress the information in
(x, y). The answer comes from an area of research called Kolmogorov Com-
plexity, and we refer the reader to, for example, Li and Vitanyi [LV93] or
Downey and Hirschfeldt [DH10].
For our purposes we will deal with languages L ⊆ {0, 1}∗ . The length of a
string will be the size of the input. For simplicity, we will be concerned with
the following form, at least at present:
6.1 Introduction 163
Membership of L
Input: x ∈ Σ ∗ .
Question: Is x ∈ L?
(Figure: a machine M with a read-only input tape and a separate work tape.)
Learning algorithm. But for the general theory, we are considering coarse complexity-theoretic behaviour such as |x|² vs 2^{|x|}, and hence for coarse behaviour we have the following variation of the Church–Turing Thesis:
Theorem 6.1.1. All of the models which we have discussed in this book
can be translated from one to another with at worst a cubic overhead.
That is, not only is there a polynomial overhead, but the polynomial has low degree (3). In the case of space, these translations only take constant extra space. The proof comes from very carefully counting how we translate, for example, a single step of a register machine as several steps in a Turing Machine, or how to translate a step in the computation of a partial recursive function as steps in a Turing Machine simulation. For example, for partial recursive functions, on input x, we count |x| many steps for the basic zero, successor and monus functions, and then recursion translates as counting, etc. We will take Theorem 6.1.1 as a black box.
Various special g’s are given names. So if g(n) = O(n), then L ∈ DTIME(g)
is called linear time on a Turing Machine, and similarly quadratic time, for
g(n) = O(n^2). Notice that, for such small time classes, we are careful to
specify the computation device. Linear time on a Turing Machine would not be
the same as linear time on a Register Machine. There is a branch of compu-
tational complexity theory concerned with finding, for example, the fastest
algorithm for processes such as matrix multiplication3 . For such questions,
we would not work with a general model such as a Turing Machine. Rather,
we would be concerned with a model which more closely resembles a modern
computer; for example a Register Machine or something similar such as a
“random access machine”. For a fixed machine model it does make sense to
discuss questions such as time n^2 vs time n^2.5.
But keep in mind we are more concerned with coarser delineations such as
O(n) vs O(2^n), feasible vs infeasible. In that case, the model plays no part,
at least with the current state of human knowledge.
6.1.4 Exercises
Exercise 6.1.2 The following are Blum’s Axioms for measuring resources
used in computation. We say that a collection of partial computable functions
{Γe : e ∈ N} is a complexity measure iff the following two axioms are
satisfied.
(i) ϕe(x) ↓ iff Γe(x) ↓.
(ii) The question “Does Γe(x) = s?” is algorithmically decidable.
³ As of 2023, this is O(n^2.371552), Williams, Xu, Xu, and Zhou [WXXZ23].
The P vs NP Problem.
not even known if P = PSPACE! We remark that later we will show that
PSPACE = NPSPACE. We will also prove that NP ⊆ PSPACE, and we also
don’t know if NP = PSPACE. Lots of basic open questions!
Definition 6.1.4.
Now the reader might wonder what kinds of functions are not, for example,
time constructible. We will construct some examples later when we look at
the Union Theorem, Theorem 6.3. However, for our purposes, constructibility
allows us to define “clocks” for time (and space).
Namely suppose that g is time constructible via machine M. We can
enumerate
{⟨ϕe, c⟩ : ϕe runs in c times (i.e. multiplied) the time constructed by M, c ≥ 1}.
That is, on input x we would run ϕe(x) only for c · M(1^|x|) many steps. If it
does not halt in time, we output 0. This would computably list DTIME(g).
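The clocked behaviour just described can be illustrated with a step-bounded simulation (a Python sketch under our own conventions: a “machine” is modelled as a generator that yields once per simulated step and then returns its answer; run_with_clock and even_length are illustrative names, not part of the text):

```python
def run_with_clock(prog, x, budget):
    """Run prog(x) for at most `budget` simulated steps; if it has not
    halted by then, output 0 (reject), just as the clocked machines
    listing DTIME(g) do.  A "program" here is a generator that yields
    once per step and returns its output when it halts."""
    gen = prog(x)
    try:
        for _ in range(budget):
            next(gen)                 # one simulated step
    except StopIteration as halted:
        return halted.value           # halted within the budget
    return 0                          # out of time: output 0

def even_length(x):
    """A toy machine: one step per input symbol, accept iff |x| is even."""
    for _ in x:
        yield
    return 1 if len(x) % 2 == 0 else 0
```

With a generous budget the toy machine’s own answer comes back; with a budget below its running time the clock cuts it off and 0 is output.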
We have an analog of Theorem 3.2.13, an enumeration theorem. Strictly
speaking, this takes about 2 · c · g(|x|) many steps, but we only really care
up to an O factor for the definition of a time (or space) class. (There is a
technical linear speed-up theorem, which says that, by changing the number
of states in the Turing Machine, if you can do something in time cg(|x|) and
g is superlinear with c > 1, then you can do it in time g(|x|). We will not
prove this result as it detracts from the story.)
Lemma 6.1.1. The following are time and space constructible.
ϕe(1^m) ≠ L(1^m), and then moving on to the next requirement in the list.
Every step of the construction is kept O(g(s)) for some O independent of the
construction. So L ∈ DTIME(g). □
1. DTIME(n^k) ⊂ DTIME(n^(k+1)).
2. NTIME(n^k) ⊂ NTIME(n^(k+1)).
3. DSPACE(n^k) ⊂ DSPACE(n^(k+1)).
4. NSPACE(n^k) ⊂ NSPACE(n^(k+1)).
At this stage, we introduce further complexity classes.
Definition 6.1.5.
1. LOG = ∪_{c≥1} DSPACE(c · log n)
2. NLOG = ∪_{c≥1} NSPACE(c · log n)
3. E = DEXT = ∪_{c>0} DTIME(2^(c·n))
4. NE = NEXT = ∪_{c>0} NTIME(2^(c·n))
5. ESPACE = ∪_{c>0} DSPACE(2^(c·n))
6. EXPTIME = ∪_{c>0} DTIME(2^(n^c))
7. NEXPTIME = ∪_{c>0} NTIME(2^(n^c))
The following separations use the same method as the Hierarchy Theorem.
Theorem 6.1.6.
1. P ⊂ E ⊂ EXPTIME.
2. LOG ⊂ PSPACE.
6.1.9 Exercises
Exercise 6.1.8 Show that if f and g are time constructible and ∃^∞ n (g(n) >
f(n)), then DTIME(g) ⊈ DTIME(f).
f ∘ f ∘ f ∘ · · · ∘ f   (n times).
That is, f composed with itself n times. We will meet the requirements:
Si : (ϕi accepts R in time Ψi) → ∃j (ϕj accepts R ∧ (a.a. x) f(Ψj(x)) < Ψi(x)).
The presentation of the proof given here is adapted from a recent paper by
Bienvenu and Downey [BD20]. In that paper the authors used the Speedup
Theorem to show that access to a randomness source (as an oracle in the sense
of Chapter 8) will always accelerate the computation of some computable
functions by more than a polynomial amount. More on this in §8.1.1.
6.3 The Union Theorem∗
In this section, we will prove another classic theorem not normally covered
in complexity courses today. A corollary to this result is that there is a single
computable function f such that P = DTIME(f). It follows that f cannot be
time constructible; for otherwise the machine M running in time O(f) would
accept (or could be modified to accept) a language L ∈ DTIME(f), meaning
that it would be in P. But that would contradict the Hierarchy Theorem,
since f would then need to be dominated by a polynomial O(|x|^k), and the
Hierarchy Theorem says that DTIME(f) ⊆ DTIME(|x|^k) ⊂ DTIME(|x|^(k+1)).
Proof. For convenience, we assume that for each n and each c ∈ N, there is
some tm with m > n such that tm(k) > c · tn(k) for almost all k. This makes
the construction simpler, as then we don’t need to worry about the constants
in the DTIME(tn).
We build t in stages using a priority argument. To ensure that
∪_n DTIME(tn) ⊆ DTIME(t),
Here “a.a. m” means “for all but finitely many.” It is convenient to define the
diagonal function d(m) = tm(m). Note that d is computable as the sequence
tn is uniformly computable. (In the case of tn = |x|^n, then d(m) = m^m.)
Note that if e ≤ m then te(m) ≤ d(m), by construction. But we can’t use
d = t since, in the case of d(m) = m^m, this is definitely not a polynomial,
and it would be in DTIME(d). So we also have negative requirements
Ne : If (a.a. x) ϕe(x) runs in time ≤ t(|x|), then ϕe ∈ ∪_n DTIME(tn).
By this we mean the language Le accepted by ϕe is in ∪_n DTIME(tn). We
denote the running time of ϕe by Ψe, to simplify notation.
We satisfy this as
Ne : Le ∉ ∪_n DTIME(tn) → ∃^∞ x [t(|x|) < Ψe(x)].
For all m, we will force t(m) ≤ d(m). Now, suppose Le ∉ ∪_n DTIME(tn).
Then as the construction proceeds we ask that the pair ⟨e, n⟩ requires atten-
tion when we see some x with
If we make the value of t(|x|) smaller than Ψe(x), we would be working towards
meeting Ne.
Thus we will take a pairing ⟨·, ·⟩, and at stage n, see if there is an x with
|x| = n and for which there is an ⟨e, z⟩ not yet cancelled, such that
(i) ⟨e, z⟩ ≤ |x|.
(ii) t_⟨e,z⟩(|x|) < Ψe(x) ≤ d(|x|).
If there is no such pair, set t(|x|) = d(|x|).
If there is such a pair, choose the least one and define t(|x|) = Ψe(x) − 1.
Then cancel ⟨e, z⟩.
Clearly t is computable by the hypotheses of the theorem, and because we
can see if Ψe has halted yet; if not, its running time is too large.
First we argue that we meet Pe. Note that t(|x|) < te(|x|) only when
we defined t(|x|) = Ψp(x) − 1 in the construction, when we would need
t_⟨f,p⟩(|x|) < Ψp(x). But this can only happen if ⟨f, p⟩ < e, and after it happens
we cancel ⟨f, p⟩. So, from some point onwards t(|x|) ≥ te(|x|).
Second, we argue that Ne is met. Suppose that Le ∉ ∪_n DTIME(tn). Then
for all z,
∃^∞ x (t_⟨e,z⟩(|x|) < Ψe(x)).
If there are infinitely many x with d(|x|) < Ψe(x), then since t(|x|) ≤ d(|x|),
we are done. Otherwise, for almost all x, Ψe(x) ≤ d(|x|), and hence
⟨e, z⟩ will eventually receive attention, and we set t(|x|) < Ψe(x) by
construction. □
Chapter 7
NP- and PSPACE-Completeness
In this Chapter, we will look at the class NP in more detail. We will see that
it relates to natural problems in way that was discovered in the 1970’s, and is
a widespread and important concept. There is a vague analog of c.e. vs NP.
By Kleene’s Normal Form Theorem, Theorem 3.3.8, a language L is c.e. iff
there is a computable (primitive recursive) R such that
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2024 177
R. Downey, Computability and Complexity, Undergraduate Topics
in Computer Science, https://doi.org/10.1007/978-3-031-53744-8_7
Definition 7.1.1.
[Diagram: the first levels of the polynomial hierarchy; each of Σ_k^P and Π_k^P is contained in both Σ_(k+1)^P and Π_(k+1)^P.]
This hierarchy is called the Polynomial (Time) Hierarchy, PH, a name
first given by Meyer and Stockmeyer [MS72]. In the case of the arithmetical
hierarchy, we know that all containments are proper, but alas for the time
bounded versions, we don’t know if any are proper!
The reader might note that we have not denoted Σ_k^P ∩ Π_k^P by ∆_k^P.
There is a class ∆_k^P, but it won’t play any part in this book. It will be defined
when we look at polynomial time Turing reducibility in §7.2.4.
The analog of Kleene Normal Form is the following:
Conversely, if L is Σ_1^P, i.e., x ∈ L iff ∃y(|y| ≤ |x|^d ∧ R(x, y)), then we design
a nondeterministic Turing Machine which, on input x, begins by guessing y
of length ≤ |x|^d and then checks if R(x, y) holds. □
The theorem above says that NP models “guess and check” and that all the
nondeterminism can be guessed at the beginning. Theorem 7.1.1 is analogous
to Kleene’s Theorem (Lemma 3.3.1) which says that we can code c.e. sets
with a single search over a primitive recursive relation.
7.2 NP Completeness
x ∈ L1 iff f (x) ∈ L2 .
Proof. For any B ∈ NP, compose the polynomial-time algorithm for A with
the polynomial-time function reducing B to A to get a polynomial-time al-
gorithm for B. □
Theorem 7.2.1.
The languages in Theorem 7.2.1 are unnatural, but are useful machine-
based ones.
7.2.2 Exercises
By Theorem 7.2.1, we know that there are some abstract languages taking
the role of the halting problem. They are NP-complete. But are there any
interesting, natural languages which are NP-complete? The answer is most
definitely yes. Cook [Coo71] and Levin [Lev73] both realized that there were
natural NP-complete problems. In the original papers, Cook showed that
Satisfiability, below, is NP-complete as well as Subgraph Isomorphism
(see Clique below), and Levin proved a version of Satisfiability, as well
as Set Cover and a tiling problem akin to that of Exercise 4.3.4 (see also
Exercise 7.2.9 below). We will begin with the Cook-Levin Theorem, which
is concerned with satisfiability in propositional logic; that is, finding out if
the truth table of a propositional formula has a line that evaluates to
true. We can view this example in the context of the Undecidability of First
Order Logic. We know that Propositional Logic is decidable, since the decision
process is to build a truth table for the formula. We see below that we can
use satisfiability of a formula of propositional logic to emulate instances of an
NP-complete problem, and hence the satisfiability problem is NP-complete.
Both Cook and Levin had the idea of using a generic reduction, meaning that
we will perform a direct simulation of nondeterministic Turing Machines.
Sat or Satisfiability
Input: A formula X of propositional logic.
Question: Does X have a satisfying assignment of boolean variables?
That is, is there at least one assignment of the boolean variables which
makes X true?
(1). Sat ∈ NP
(2). Sat is NP-hard
Proof. (1) Clearly, Sat ∈ NP. Simply take the machine M which on input X,
guesses values for the variables of X and then evaluates X on those values.
(2) Take any A ∈ NP, for example, A = L from Theorem 7.2.1. We need
to show that A ≤_m^p Sat.
2 We do not think that this problem is NP-complete.
G5: By time N, M has entered state qy (the accept state) and accepted x:
∨_{0≤i≤N} Q(i, y).
G6: For each i, the configuration of M at time i + 1 follows by a single application
of the transition function from the configuration at time i; i.e., the action is
faithful.
The transition function of M is the set ∆ of quadruples for M.
(i) Suppose that the quadruple is ⟨qd, Sk, Sk′, qd′⟩. Then the formula would
be
∧_{0≤i≤N, −N≤j≤N} [ (Q(i, d) ∧ H(i, j) ∧ S(i, j, k)) →
(Q(i+1, d′) ∧ H(i+1, j) ∧ S(i+1, j, k′) ∧ ∧_{j′≠j, −N≤j′≤N, 0≤p≤P} (S(i, j′, p) → S(i+1, j′, p))) ].
CNFSat
Input: A boolean formula X in conjunctive normal form
Question: Is X satisfiable?
In many ways Sat and CNFSat are the bases for most practical NP-
completeness results. As mentioned above, this proof is an example of a
generic reduction, where we emulate a nondeterministic Turing Machine step-by-step.
Usually, if we want to show B is NP-complete we show for some
known NP-complete problem A that A ≤_m^p B. One particularly useful vari-
ation of Sat is 3-Sat, which is exactly the same problem as CNFSat but
the clause size is at most 3.
Then replace the clause by new clauses (Y1 ∨ Y2 ∨ Zj,1) ∧ (¬Zj,1 ∨ Y3 ∨ Zj,2) ∧
(¬Zj,2 ∨ . . . ) ∧ . . . . For example,
ϕ = (p1 ∨ p2 ∨ ¬p3 ∨ p4) becomes
ψ = (p1 ∨ p2 ∨ z) ∧ (¬z ∨ ¬p3 ∨ p4).
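This local replacement can be sketched as a short program (a Python sketch; literals are encoded as nonzero integers in the DIMACS style, with −v standing for ¬pv, and split_clause and satisfiable are our own illustrative names):

```python
from itertools import product

def split_clause(clause, next_var):
    """Replace one CNF clause (a list of nonzero integer literals) by an
    equi-satisfiable chain of clauses of size <= 3, using fresh chain
    variables numbered from next_var.  Returns (clauses, next unused var)."""
    if len(clause) <= 3:
        return [list(clause)], next_var
    out, rest, z = [], list(clause), next_var
    out.append([rest[0], rest[1], z])        # (Y1 v Y2 v Z1)
    rest = rest[2:]
    while len(rest) > 2:
        out.append([-z, rest[0], z + 1])     # (-Zi v Y_{i+2} v Z_{i+1})
        z, rest = z + 1, rest[1:]
    out.append([-z, rest[0], rest[1]])       # final clause closes the chain
    return out, z + 1

def satisfiable(clauses, nvars):
    """Brute-force satisfiability test, for checking small examples only."""
    return any(all(any(bits[abs(l) - 1] == (l > 0) for l in c) for c in clauses)
               for bits in product([False, True], repeat=nvars))
```

On the example above, split_clause([1, 2, -3, 4], 5) returns exactly the two clauses of ψ, with the fresh variable z numbered 5.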
The reader should note that you cannot use the Zj,k’s alone to make the
conjunction of the clauses evaluate to true. You must make one of the literals
of Cj evaluate to true. Moreover, if you can make Cj true, then this assignment
can use the Zj,k’s to make the conjunction of the clauses in ψ true. That is,
ϕ is satisfiable iff f(ϕ) = ψ is satisfiable. □
The reader should note that the polynomial time m-reduction used to
reduce CNFSat to 3-Sat is not a direct emulation as we have used in
machine-to-machine reductions. They are simply equi-satisfiable formulae. What
about formulae where we want every line of the truth table to evaluate to be
true?
Boolean Tautology
Input: A formula X of propositional logic.
Question: Is X valid? That is, does every assignment of values to the boolean
variables evaluate to true?
The proof of Sat being NP-complete is so generic that it yields the fol-
lowing corollary.
The reader might also wonder if we can do better than 3-Sat. What about
clause size 2?
Proof. Take some ϕ in 2-CNF. Then the clauses in ϕ contain at most two
literals, and without loss of generality, exactly two. Think of every clause
(l ∨ l′) as a pair of implications:
(¬l → l′) and (¬l′ → l).
Construct a directed graph G = (V, E) with a vertex for every literal (positive
and negated) and directed edges corresponding to the implications as above.
Recall a subset of the nodes of a directed graph is a strongly connected
component if there is a path between any pair of nodes in the subset. We
claim that ϕ is satisfiable iff no pair of complementary literals both appear
in the same strongly connected component of G.
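The strongly-connected-component criterion yields a polynomial time algorithm for 2-Sat. Here is a sketch using Kosaraju's two-pass algorithm on the implication graph (Python; two_sat is our own illustrative name, and literals are again nonzero integers):

```python
from collections import defaultdict

def two_sat(n, clauses):
    """Decide a 2-CNF formula over variables 1..n.  Each clause (a, b)
    of nonzero integer literals contributes the implications -a -> b
    and -b -> a.  The formula is satisfiable iff no variable lies in
    the same strongly connected component as its negation; SCCs are
    found with Kosaraju's two-pass depth-first search."""
    g, gr = defaultdict(list), defaultdict(list)
    for a, b in clauses:
        g[-a].append(b); gr[b].append(-a)
        g[-b].append(a); gr[a].append(-b)
    lits = [v for i in range(1, n + 1) for v in (i, -i)]
    seen, order = set(), []
    def dfs1(u):                       # first pass: record finish order
        seen.add(u)
        for w in g[u]:
            if w not in seen:
                dfs1(w)
        order.append(u)
    for u in lits:
        if u not in seen:
            dfs1(u)
    comp = {}
    def dfs2(u, c):                    # second pass: label components
        comp[u] = c
        for w in gr[u]:
            if w not in comp:
                dfs2(w, c)
    for u in reversed(order):
        if u not in comp:
            dfs2(u, len(comp))
    return all(comp[i] != comp[-i] for i in range(1, n + 1))
```

Kosaraju's algorithm runs in time linear in the number of literals and clauses, so this decision procedure is comfortably polynomial.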
The reader should note that the reduction in this process is not a polynomial
m-reduction, as it involves many questions to the decision problem
³ This is a fact from graph theory. It is not hard to prove. To prove the problem is in P it
would be enough to compute the transitive closure of the directed graph, and algorithms
such as Warshall’s Algorithm do this in polynomial time.
The kind of Turing reduction reducing the search to the decision problem
is known in the literature as a self-reduction. The following is yet another
open problem in computational complexity theory:
If P=NP the answer is definitely yes. But assuming P6=NP, we don’t know.
All natural problems have this property.
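The classic example of such a self-reduction is Sat itself: given only a decision oracle, we can fix the variables one at a time, keeping the formula satisfiable at every step. A sketch (Python; find_assignment and the brute-force stand-in oracle brute_sat are our own illustrative names):

```python
from itertools import product

def brute_sat(clauses, nvars):
    """Stand-in decision oracle (brute force, for small examples only)."""
    return any(all(any(bits[abs(l) - 1] == (l > 0) for l in c) for c in clauses)
               for bits in product([False, True], repeat=nvars))

def find_assignment(clauses, nvars, sat_oracle):
    """Search-to-decision self-reduction for Sat: query the decision
    oracle to fix the variables one at a time.  Returns a list of
    literals forming a satisfying assignment, or None."""
    fixed = []
    for v in range(1, nvars + 1):
        for lit in (v, -v):            # try v = True, then v = False
            trial = clauses + [[l] for l in fixed + [lit]]
            if sat_oracle(trial, nvars):
                fixed.append(lit)
                break
        else:
            return None                # the formula was unsatisfiable
    return fixed
```

The reduction makes at most 2n oracle calls for n variables, so a polynomial time decision procedure for Sat would yield a polynomial time search procedure.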
We also remark that we can use polynomial time Turing reductions to
define the class ∆_k^P for k ∈ N^+. Recall from the Limit Lemma (Theorem 5.6.2)
and Post’s Theorem (Theorem 5.6.6) that A is ∆_2^0 iff A ∈ Σ_2^0 ∩ Π_2^0 iff A ≤_T ∅′.
So ∆_2^0 is the same as being ≤_T the halting problem. Complexity theorists
defined ∆_k^P as those languages polynomial time Turing reducible to Σ_(k−1)^P.
Note that ∆_k^P ⊆ Σ_k^P ∩ Π_k^P. (See Exercise 7.2.12.) We won’t pursue this further
as it plays no part in our story, save to point out that all kinds of analogs
we might expect from computability theory do not hold in computational
complexity theory, at least with the current state of knowledge.
7.2.5 Exercises
Exercise 7.2.9 (Levin [Lev73]) Recall from Exercise 4.3.4 that a Tiling
System T is a finite collection of unit square tiles with coloured edges and a
set of rules saying which colours can be next to which. Using the ideas from
Exercise 4.3.4, show that the following problem is NP-complete.
Tiling
Input: A Tiling System T and an integer N .
Question: Can T colour the N × N square from a given starting position?
Exercise 7.2.10 Suppose that I have a polynomial time oracle procedure
deciding whether a given graph G has a Hamilton Cycle. Construct a self-
reduction showing that we can find Hamilton cycles in polynomial time.
In this section, we will discover that many problems are NP-complete. There
are tens of thousands more. Imagine you are a worker in, say, computational
biology, or linguistics or some other area where discrete computational prob-
lems naturally occur. You have some problem and it seems hard to find an
efficient algorithm for it. Should you wish to show the problem is NP-hard,
then likely one of the problems in the current section will be helpful to re-
duce from. Moreover, the techniques we use such as local replacement and
component design are standard in applications of this hardness theory.
We start by generating a collection of NP-complete graphical problems.
We will begin by reducing from Sat. We recall the following definition.
Clique
Input : A graph G = (V, E), integer k
Question : Does G have a k-clique?
and are not complementary. There are k clauses, hence those literals form a
k-clique in G.
Conversely, suppose G has a k-clique.
G is k-partite4 , with partition elements corresponding to clauses, so the k-
clique must have one element in each clause.
Now, assign true to the literals corresponding to the vertices in the k-clique.
This is possible since no pair of complementary literals are connected, hence
cannot be in the clique. The remaining literals can have values assigned ar-
bitrarily.
The resulting truth assignment assigns true to at least one literal in each
clause, hence ϕ is satisfiable. □
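The construction in the proof can be sketched directly (a Python sketch with our own illustrative names; clauses are lists of nonzero integer literals, and has_k_clique is a brute-force check for small examples only):

```python
from itertools import combinations

def clause_graph(clauses):
    """Vertices: (clause index, literal).  Edges join literals from
    different clauses that are not complementary, so a k-clique
    (k = number of clauses) selects one literal per clause that can
    all be made true simultaneously."""
    V = [(i, l) for i, c in enumerate(clauses) for l in c]
    E = {(u, v) for u, v in combinations(V, 2)
         if u[0] != v[0] and u[1] != -v[1]}
    return V, E

def has_k_clique(V, E, k):
    """Brute-force clique test, for checking small examples."""
    adj = lambda u, v: (u, v) in E or (v, u) in E
    return any(all(adj(u, v) for u, v in combinations(S, 2))
               for S in combinations(V, k))
```

The graph is built in polynomial time; only the brute-force verification here is exponential, as we would expect for an NP-complete problem.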
⁴ Meaning that the vertices of G can be split into k pairwise disjoint sets S1, . . . , Sk,
and xy is an edge of G only if x ∈ Si and y ∈ Sj for some i ≠ j.
Hence, we have shown that CNFSat ≡_m^p Clique, and so we know that
Clique is NP-complete.
Independent Set
Input : A graph G = (V, E), integer k.
Question : Does G have an independent set of size k?
Then G has a k-clique iff Ḡ has an independent set of size k. This simple
one-to-one correspondence gives polynomial reductions in both directions. □
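The complement correspondence can be sketched in a few lines (Python; illustrative names, with edges stored as sorted vertex pairs and brute-force checks for small graphs only):

```python
from itertools import combinations

def complement(V, E):
    """The complement graph: same vertices, edges exactly the non-edges
    (edges are stored as sorted vertex pairs)."""
    return V, {p for p in combinations(sorted(V), 2) if p not in E}

def has_clique(V, E, k):
    """Brute-force k-clique test for small graphs."""
    return any(all(tuple(sorted(e)) in E for e in combinations(S, 2))
               for S in combinations(sorted(V), k))

def has_independent_set(V, E, k):
    # G has an independent set of size k iff its complement has a k-clique.
    return has_clique(*complement(V, E), k)
```

Taking the complement is trivially polynomial time, which is all the reduction needs.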
Vertex Cover
Input : A graph G = (V, E), and a positive integer k.
Question : Is there a vertex cover in G of cardinality 6 k?
Proof. We reduce from CNFSat. In fact, we can assume that we are reducing
from an instance of Exact 3-Sat, by Exercise 7.2.11. (In the diagram we
have indicated the minor change needed to make the reduction work for
CNFSat.) We can trivially reduce 3-Col to k-Col for k > 3 by appending
a (k − 3)-clique and edges from every vertex of the (k − 3)-clique to every other
vertex. So we only need to show CNFSat ≤_m^p 3-Col.
Let ϕ be a propositional formula in CNF. We will construct a graph G
that is 3-colourable iff ϕ is satisfiable.
Create three central vertices, R, G, B, connected in a triangle. These
letters indicate labels for us but are not part of the graph. We can think of
these as aligning to the colours red, green, and blue. Also create a vertex for
each literal in ϕ, connecting each to both its complement and B.
[Figure: the gadget graph, with the central triangle on R, G, B.]
Now, if every vertex along the top is coloured red this corresponds to ϕ
being unsatisfied.
But we easily see that this cannot give a legal 3-colouring, as the middle
vertices must then all be blue so the bottom row alternates red and green,
clashing with the final vertex in the bottom row.
Suppose however that one of the vertices on the top row is coloured green,
which corresponds to ϕ being satisfied.
Then we can colour the vertex below it red and all other vertices on the
middle row blue. This forces a colouring for the bottom row which is successful
alternation of red and green, with one blue vertex under the green top row
vertex. This 3-colouring extends to the whole graph G.
Thus, we have that if there is a legal 3-colouring then the subgraph corresponding
to each clause must have at least one green literal, hence we have a
satisfying truth assignment for ϕ.
Conversely, if there is a satisfying assignment then simply colour the true
variables green and the false ones red. There is a green literal in each clause,
so the colouring can be extended to a 3-colouring of the whole graph.
Hence, ϕ is satisfiable iff G is 3-colourable, and G can be constructed in
polynomial time. □
The technique in the proof above is called component design.
Often in problems with a parameter k, like k-Col and k-CNF-SAT, larger
values of k make the problem harder. However, this is not always true. Consider
the problem of trying to determine whether a planar graph⁵ has a
k-colouring. The problem is easy for k = 1, k = 2, and trivial (“just say yes”)
for k ≥ 4 (by the famous Four Colour Theorem [HA77, AHK77]).
Theorem 7.2.16 (Karp [Kar73]). 3-Col ≤_m^p Planar 3-Col, so
Planar 3-Col is NP-complete.
Proof. This proof uses widget design or gadget design. We will use a crossing
widget.
Given an undirected graph G = (V, E), embed it in the plane arbitrarily,
letting edges cross if necessary. Replace each such cross with the following
edge crossing gadget:
[Figure: the crossing widget W.]
⁵ That is, a graph that can be drawn in the plane with no edges crossing.
7.2.7 Exercises
Exercise 7.2.17 Prove that the widget of Theorem 7.2.16 does what is
claimed. This is basically case analysis.
Exercise 7.2.18 Prove that 2-Col ∈ P (for any graph).
One problem related to 3-Col, but not obviously so, is the following.
Proof. We reduce from 3-Col. Let G = (V, E) be the given graph. For
each v ∈ V we will add 4 elements to X: v together with elements labelled
Rv, Gv, Bv. Then for each edge uv ∈ E, add three elements to X:
Ruv, Buv, Guv. Then the total size of X is O(|G|).
Now we need to specify the family of subsets. For each v ∈ V, construct
a subset Sv^R which contains v and Ruv for each vu ∈ E. Now do the same to
define Sv^B and Sv^G. To complete the construction, add sets of the form {p}
for each p ∈ X with p ∉ V. Again the size is O(|G|).
Now, suppose that Exact Cover has a solution. Since the sets of the
form {p} never contain any v ∈ V, to cover each v ∈ V we must choose
exactly one of Sv^R, Sv^G, or Sv^B. This selection is a 3-colouring: if adjacent
u and v both chose, say, red, then Ruv would be covered twice. The
reasoning reverses, giving the result. □
7.2.9 Exercises
Exercise 7.2.22 Consider the following scheduling problem. You are given
positive integers m and t and an undirected graph G. The vertices of G
specify unit time jobs and the edges specify that two jobs cannot be scheduled
⁶ You can think of the dominating sets as placements of e.g. radar or cell phone towers.
One of these gets destroyed and is no longer available for placement of towers. Maybe the
land is now a designated park.
Subset Sum
Input : A finite set S, integer weight function w : S → N, and a target
integer B.
Question : Is there a subset S′ ⊆ S such that Σ_{a∈S′} w(a) = B?
B = Σ_{x=0}^{m−1} p^x = (p^m − 1)/(p − 1).
In p-ary notation, w(A) looks like a string of 0’s and 1’s with a 1 in position
x for each x ∈ A and 0 elsewhere. B looks like a string of 1’s of length m.
Adding the numbers w(A) simulates the union of the sets A. The number
p was chosen big enough so that we don’t get into trouble with carries.
Asking whether there is a subset sum that gives B is the same as asking
for an exact cover of X. □
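The base-p encoding can be sketched as follows (Python; illustrative names; the brute-force subset_sum check is only for small instances):

```python
from itertools import combinations

def exact_cover_to_subset_sum(m, sets):
    """Encode Exact Cover over X = {0, ..., m-1} as Subset Sum.
    With base p larger than the number of sets, column sums cannot
    carry, so a collection of weights sums to B = 11...1 (base p)
    exactly when the corresponding sets tile X disjointly."""
    p = len(sets) + 1
    weights = [sum(p ** x for x in A) for A in sets]
    B = (p ** m - 1) // (p - 1)          # the number 11...1 in base p
    return weights, B

def subset_sum(weights, B):
    """Brute-force check, for small examples only."""
    return any(sum(c) == B for r in range(len(weights) + 1)
               for c in combinations(weights, r))
```

For X = {0, 1, 2} and the sets {0, 1}, {2}, {0, 2} we get p = 4, weights 5, 16, 17, and target B = 21; the exact cover {0, 1} ∪ {2} corresponds to the subset sum 5 + 16 = 21.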
Partition
Input : A finite set S, integer weight function w : S → N.
Question : Is there a subset S′ ⊆ S such that Σ_{a∈S′} w(a) = Σ_{a∈S−S′} w(a)?
Knapsack
Input : A finite set S, integer weight function w : S → N, benefit function
b : S → N, weight limit W ∈ N, and desired benefit B ∈ N.
Question : Is there a subset S′ ⊆ S such that Σ_{a∈S′} w(a) ≤ W and
Σ_{a∈S′} b(a) ≥ B?
Bin Packing
Input : A finite set S, volumes w : S → N, bin size B ∈ N, k ∈ N.
Question : Is there a way to fit all the elements of S into k or fewer bins?
Proof. For Partition, we reduce from Subset Sum. We introduce two new
elements of weight N − B and N − (Q − B), where we have Q = Σ_{a∈S} w(a)
and N is sufficiently large (with N > Q).
Choose N large enough so that both new elements cannot go in the same
partition element, because together they outweigh all the other elements.
Now ask whether this new set of elements can be partitioned into two sets of
equal weight (which must be N).
By leaving out the new elements, this gives a partition of the original set
into two sets of weight B and Q − B.
To reduce Partition to Knapsack take b = w and W = B = Q/2.
For Bin Packing, take an instance of Partition. Then choose B to be
half the total weight of all elements of S and k = 2. □
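The padding argument for Partition can be sketched as follows (Python; illustrative names, with a brute-force check usable only on small instances):

```python
from itertools import combinations

def subset_sum_to_partition(weights, B):
    """Pad a Subset Sum instance with two elements of weight N - B and
    N - (Q - B), where Q is the total weight and N > Q.  Together the
    two new elements outweigh everything else, so they must go to
    opposite sides, and the padded list splits into two equal halves
    iff some subset of the original weights sums to B."""
    Q = sum(weights)
    N = Q + 1
    return weights + [N - B, N - (Q - B)]

def has_partition(ws):
    """Brute-force Partition check, for small examples only."""
    total = sum(ws)
    return total % 2 == 0 and any(
        2 * sum(c) == total
        for r in range(len(ws) + 1) for c in combinations(ws, r))
```

Each side of an equal split of the padded list has weight N; the side containing the element of weight N − B therefore holds original elements of total weight exactly B.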
Integer Programming
Input : Positive rationals aij, cj, bi, 1 ≤ i ≤ m, 1 ≤ j ≤ n, and threshold B.
Question : Can we find integers x1, . . . , xn such that Σ_{j=1}^n cj xj ≥ B,
subject to the constraints Σ_{j=1}^n aij xj ≤ bi, 1 ≤ i ≤ m?
Proof. Hardness: We reduce from Subset Sum. The subset sum instance,
consisting of a set S with weights w : S → N and threshold B, has a positive
solution iff the integer programming instance 0 ≤ xa ≤ 1 for a ∈ S,
Σ_{a∈S} w(a) xa = B, has an integer solution.
Membership: Integer Programming is in NP. We show that if there is
an integer solution, then there is one with only polynomially many bits as a
function of the size of the input. The integer solution can then be guessed
and verified in polynomial time. Considering the constraints Σ_{j=1}^n aij xj ≤ bi,
1 ≤ i ≤ m, we can convert to all the entries being integers by multiplying by
the least common multiple of the denominators. Thus we can consider the
coefficients as all integers. The fact that all entries are positive means that
the bi will be an upper bound (after we turn everything into integers) for the
values of the xi, and hence the solutions can only have at most a polynomial
number of bits. □
There is a related and very important optimization problem called Linear
Programming (where we optimize the sum over the real or complex
numbers) which is solvable in polynomial time. The existence of a polynomial
time algorithm for Linear Programming was a celebrated result
proven by Khachiyan [Kha79] in 1979. Khachiyan’s algorithm is not practical,
but there is another polynomial time algorithm due to Karmarkar
[Kar84] which is “reasonably” practical. However, Linear Programming
shows the difference between theory and practice. As we will discuss in §10.5,
the most commonly used algorithm for Linear Programming is one called
the simplex method, which provably does not run in polynomial time. However,
the simplex method works well “mostly” in practice. It was, and is, a
longstanding question to give a satisfactory explanation for this phenomenon.
(More in §10.5.)
In practice, often Integer Programming problems are approximated
by Linear Programming ones, hoping to get a reasonable approximate
solution quickly. We discuss this idea in the proof of Theorem 10.2.8, where
we are looking at approximation algorithms.
We finish with one of our original motivating examples.
Theorem 7.2.27 (Karp [Kar73]).
(i) Hamilton Cycle is NP-complete.
(ii) Directed Hamilton Cycle (in directed graphs, where to be a cycle
all the edges must go in the same direction) is NP-complete.
The proof below involves some relatively complicated gadgetry.
Proof. We show Vertex Cover ≤_m^p Hamilton Cycle. We build a
graph H which has a Hamilton cycle iff a given graph G = (V, E) has a
vertex cover of size k. We begin by considering the directed version.
For a directed graph we use a four vertex widget:
[Figure: the four-vertex widget; vertices 1, 2 on the u side, 3, 4 on the v side.]
For an undirected graph we use a twelve vertex widget:
[Figure: the twelve-vertex widget; vertices 1, 2 on the u side, 3, 4 on the v side.]
There is one widget for each edge (u, v) ∈ E, one side corresponds to the
vertex u and the other to the vertex v.
The widgets have the property that any Hamiltonian cycle entering at
vertex 1 must leave at vertex 2, and likewise with vertices 3 and 4. There are
two ways for this to happen: either straight through, so that no vertices on
the other side are visited, or zigzagging so that every vertex on both sides
is visited. Any other path through will leave some vertex stranded and so
cannot be part of a Hamiltonian cycle. This is proven by case analysis.
We form H as follows.
For each vertex u, string together all the u sides of all the widgets corre-
sponding to edges in E incident to u. Call this the u loop.
Also, H has a set K of k extra vertices, where k is the parameter of vertex
cover as given.
There is an edge from each vertex in K to the first vertex in the u loop, and
an edge from the last vertex in the u loop to each vertex in K.
[Figure: the u loop, with edges to and from the vertices in K.]
Travelling Salesman
Input : A graph G = (V, E), k ∈ N, and weights w : E → N.
Question : Is there a tour of total weight at most k visiting each vertex at
least once and returning home?
7.2.11 Commentary∗
By the end of the last three (sub-)sections, you have worked through a number
of natural problems which are NP-complete. Soon after the papers of Cook
and Levin, Karp [Kar73] was the first to realize the widespread nature
of NP-completeness. His original paper listed 21 problems, including all of
those we have analysed. Many, many more can be found in the classic text
by Garey and Johnson [GJ79]. Since then, thousands of problems have been
shown to be NP-complete.
In a first course such as this, you gain the impression that proving NP-completeness
of some problem is roughly like “bouldering” compared to the
“mountain climbing” of theorem proving⁷. That is, the proofs are clever but
not that difficult, and some elementary local replacement or gadget design
will work.
However, there are many exceptionally complex and deep NP-completeness
results requiring techniques far too advanced for this book, and involving sub-
stantial amounts of deep mathematical tools. Arguably, one of the high points
of computational complexity theory is something called the PCP Theorem,
which is fundamentally a reduction. This remarkable theorem states roughly
that for some universal constant C, for every n, any mathematical proof for a
⁷ Thanks to my friend Mike Fellows for this analogy.
But Does It All Matter? Beyond the question of P vs NP, one of the
striking developments in recent years is the growing realization of the importance
of the question: “Does P ≠ NP really matter? ”. What do we mean
by this? There are now many engineering applications using Sat Solvers.
In the proof of Clique being NP-complete we explicitly gave a reduction
from Clique to Sat (in Lemma 7.2.2). We gave this explicit reduction in
spite of the fact that we could have simply argued that Clique was in NP.
Therefore, Clique must have such a translation via the Cook-Levin Theo-
rem. But we wanted to show that that detour was unnecessary and there was
an easy reduction re-writing the Clique problem into an instance of Sat.
Many problems have easy and natural translations into instances of Sat.
So What? Isn’t Sat supposedly hard? Is that not the point of NP-completeness?
Well, not quite. We believe that such an NP-completeness
result shows that if L is NP-complete, then there are instances of “x ∈ L?”
not solvable in polynomial time, assuming P ≠ NP. What about instances
that occur in practice? It turns out that many natural engineering problems,
when translated into instances of Sat, can be solved with a class of algorithms
called Sat Solvers. These are algorithms specifically engineered to
solve Satisfiability. Moreover, this methodology seems to work well and
quickly for the versions of the problems encountered in “real life”. NASA,
for example, uses Sat Solvers for robot navigation. When the author was
young, showing something to be NP-complete meant that the problem looked
scary and unapproachable; like they who must not be named in Harry Potter.
But now many engineers regard a problem coming from practice as “solved”
if it can be efficiently translated into an instance of Sat.
Of course, this translation does not always work, such as in certain biolog-
ical problems and in cryptography (the latter being a very good thing, since
otherwise all banking would be insecure).
A really good open question is: why? What is it about these instances
of Sat, coming from engineering problems, that allows us to solve them
efficiently? We really have no idea.
7.2.12 Exercises
Set Cover
Input : A ground set U = {x1 , . . . , xn }, a collection {S1 , . . . , Sm } of
subsets Si ⊆ U , and an integer k.
Question : Is there a sub-collection of at most k of the Si whose union is U ?
(Hint. Reduce from Vertex Cover.)
Long Path.
Input : A graph G with n vertices.
Question : Does G have a path with exactly n − 1 edges and no vertices
repeated?
(This might need a bit of thought.)
Exercise 7.2.32 A graph H is called a topological minor of a graph G if
there is an injection f : V (H) → V (G) such that the edges of H are
represented by internally disjoint paths in G. That is, if xy ∈ E(H) then
there is a path P (xy) from f (x) to f (y) in G, and for distinct edges xy
and qr, if P (xy) and P (qr) have a vertex in common, it is an endpoint of
both paths.
Prove that the problem below is NP-complete.
Topological Minor
Input : Graphs H and G.
Question : Is H a topological minor of G?
Exercise 7.2.33 Binary Integer Programming asks that the integer
solutions satisfy xi ∈ {0, 1}. Show that Binary Integer Programming
is NP-complete.
7.3 Space
there are ≤ c^{log |x|} many possible configurations. Working base 2, this gives
a worst case running time of O(2^{c log |x|}). But this is O(|x|^c) and hence poly-
nomial time. The upshot is L ⊆ P.
Similar simulations show that NP ⊆ PSPACE, since we can check every
computation path of a nondeterministic machine running in nondeterministic
time |x|^k, in space O(|x|^k), by cycling through the computation paths in
lexicographic order, one at a time. Furthermore, by the Hierarchy Theorem,
Theorem 6.1.5, L ⊂ PSPACE. Thus, we get the following picture:
L ⊆ P ⊆ NP ⊆ PSPACE, with at least one of these containments proper.
We also see that co-PSPACE = PSPACE, since we can cycle through all
configurations of a machine M to see if we get acceptance.
The following theorem shows that space seems more pliable than time, and
also shows that nondeterminism is often less interesting for space.
Now C1, C2 have length ≤ t(|x|) and i ≤ n; we guess the half-way
configurations so that the tape at any stage appears as:
C1 | C2 | i | C      C1 | C | i − 1 | C′      C1 | C′ | i − 2 | C″
(i module)           (i − 1 module)           (i − 2 module)
We remark that the hypothesis t(|x|) ≥ log |x| is needed for this theorem.
It is unknown if NL, the nondeterministic version of L, is different from L.
An example of a problem in NL is whether x and y are path connected in a
directed graph G, since you only need to guess the vertices of the path one
at a time, making sure that each pair is connected and stopping if you hit
the target. This connectivity problem is in fact complete for the class NL,
under the appropriate reductions. It seems that to determine such
connectivity deterministically we need the square of the logarithmic space
bound. But again, L ≠ NL is a longstanding open question. (See Papadimitriou
[Pap94], Chapter 16 for a thorough discussion of this issue.)
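The guess-one-vertex-at-a-time idea can be sketched as follows. The recursion plays the role of nondeterminism: the only data a nondeterministic machine would keep on its work tape is the pair (current vertex, steps left), which is O(log n) bits. Python itself uses far more space, of course; this is only an illustration.

```python
def nl_reachable(adj, current, target, steps_left):
    """Is there a walk from `current` to `target` of length at most
    `steps_left`?  Each recursive call "guesses" the next vertex; a
    nondeterministic machine stores only (current, steps_left)."""
    if current == target:
        return True
    if steps_left == 0:
        return False
    return any(nl_reachable(adj, nxt, target, steps_left - 1)
               for nxt in adj.get(current, []))

# Directed graph 0 → 1 → 2; a path of length ≤ n − 1 suffices if any does.
adj = {0: [1], 1: [2]}
print(nl_reachable(adj, 0, 2, 2))  # True
print(nl_reachable(adj, 2, 0, 2))  # False: no edges leave 2
```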
Less obvious, but also true, is that the “diagonal” of PH is complete for
PSPACE. That is, we can define QBFSat as the collection of codes for true
quantified Boolean formulae.
QBF (QBFSat)
Input : A quantified Boolean formula ϕ = ∃x1 ∀x2 . . . ϕ(x1 , x2 , . . . , xn ).
Question : Is ϕ valid?
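To make the definition concrete, here is a brute-force evaluator sketch: exponential time, but space proportional to the number of variables, in line with QBFSat ∈ PSPACE. The formula representation (a list of quantifiers plus a Python function for the matrix) is ad hoc.

```python
def eval_qbf(prefix, matrix, env=None):
    """Evaluate a closed QBF given as a prefix [('E','a'), ('A','b'), ...]
    and a matrix: a function from an assignment dict to bool.  Exponential
    time, but the recursion depth is only the number of variables."""
    env = dict(env or {})
    if not prefix:
        return matrix(env)
    (q, var), rest = prefix[0], prefix[1:]
    branches = (eval_qbf(rest, matrix, {**env, var: b}) for b in (False, True))
    return any(branches) if q == 'E' else all(branches)

# X = ∃a∀b∃c∀d[(a ∨ ¬b ∨ c) ∧ (b ∨ ¬d)], the formula used later in this chapter.
X_prefix = [('E', 'a'), ('A', 'b'), ('E', 'c'), ('A', 'd')]
X_matrix = lambda e: (e['a'] or not e['b'] or e['c']) and (e['b'] or not e['d'])
print(eval_qbf(X_prefix, X_matrix))   # False: b = False, d = True kills (b ∨ ¬d)
print(eval_qbf([('E', 'a'), ('A', 'b')], lambda e: e['a'] or not e['b']))  # True
```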
Proof. We only sketch the proof as it uses the same ideas as Savitch's Theo-
rem: to get from one configuration to another one passes through a half-way
point, and here the half-way point can be found by quantifying, with an
additional layer of quantification as the computations get longer.
Fix a PSPACE bounded machine M with bound n^c. The proof uses
the formula access_j(α, β), where α and β are free variables representing M-
configurations, and the formula asserts that we can produce β from α in ≤ 2^j
steps.
The key idea is that the formula is re-usable: for j = 1, access_j can be
read off from the transition diagram, and for j > 1, you can express access_j
as
access_j(α, β) = ∃γ ∀(δ, ε) ∈ {(α, γ), (γ, β)} access_{j−1}(δ, ε),
so that only a single copy of access_{j−1} is needed at each level.
7.3.3 Exercises
Proof. This is more or less identical to Theorem 7.2.1, and is left to the
reader. (Exercise 7.4.4). ⊓⊔
Generalized Geography
Input : An instance of generalized geography on a directed graph G.
Question : Does player I have a winning strategy?
X = ∃a∀b∃c∀d[(a ∨ ¬b ∨ c) ∧ (b ∨ ¬d)]
See Figure 7.1 below.
[Figure 7.1: the geography graph for X. Each variable v ∈ {a, b, c, d}
contributes a diamond with vertices v0 , v1 , v̄1 , v2 .]
We complete the proof. Player I will choose a vertex after v1,0. This will
correspond to a choice of value for v1. Because the path to v2,0 is of length 2,
it is player II who makes a choice at v2,0. Player II wishes to make a choice
making X false, etc. Thus we see a direct emulation of the quantifiers. At
the end, when we hit vn,2, the parity of the number of quantifier alternations
means that it is player II who gets to choose a clause vertex cj. Player I will
then get to choose one of the paths leading back to the main frame of the
graph. That is, it is choosing one of the literals. Player II must choose the
vertex to which the path is connected. If player I has chosen this variable
vi,1 or v̄i,1 already, then player I wins, as II has no play. If I has chosen
badly, II will be able to choose one of vi,1 or v̄i,1, and then I would need to
choose vi,2, which it cannot. If player II gets to choose for this diamond,
then they would choose vi,1 if ¬vi ∈ cj, and choose v̄i,1 if vi ∈ cj.
That is, I can win the game iff X is true. ⊓⊔
The pebble game is played by two players I and II, who alternately move
pebbles on the nodes. At the beginning of the game, pebbles are placed on
all the start nodes. If (x, y, z) ∈ R and there are pebbles on x and y but
not on z, then a player can move a pebble from x to z. The winner is the
first player to put a pebble on T, or to force the other player into a position
where they cannot move.
There are natural problems for these classes, but we will not look at them
in this first course.
7.4.1 Exercises
7.5.1 Introduction
Proof. First it is easy to see that for any complexity class C, (co-C)/Poly is
the same as co-(C/Poly). It is enough to prove this for i = 1 and the rest
works entirely analogously. Suppose that Π1 ⊆ Σ1/Poly and let L ∈ Σ2/Poly.
Since L ∈ Σ2/Poly, we have some advice function f, and Q ∈ Σ2 with x ∈ L
iff ⟨x, f(x)⟩ ∈ Q, and hence for some polynomial time computable R, x ∈ L
if and only if ∃y[∀zR(⟨x, f(x)⟩, y, z)]. Because Π1 ⊆ Σ1/Poly, we can replace
the Π1 relation in the brackets [ ] by a Σ1/Poly one. We see that for each
y, there are polynomial time relations R′_y and an advice function g such that
x ∈ L if and only if ∃y∃y′(R′_y(⟨x, f(x)⟩, y′, ⟨y, g(x, f(x), y)⟩)). To finish we
need to amalgamate all the advice so that the advice will depend only upon
x. However, all the advice depends only on x and y, and the range of these
is ≤ (|x| + |y|)^c = |x|^{c′} for some c, c′. As it is only the lengths that matter,
we can concatenate the advice together in a self-delimiting way, and only have
a polynomial amount of advice that depends on |x|^{c′}. Thus we can construct
a polynomial time relation R″ and an advice function h, such that x ∈ L if
and only if ∃yR″(⟨x, h(x)⟩, y), as required. ⊓⊔
We remark that advice classes became popular for quite a while, as there
was a plan to tackle P vs NP using them. The argument was that circuits
are non-dynamic objects and perhaps proving lower bounds for them would
be easier than dealing with moving heads on Turing Machines and the like.
Also the feeling was that P vs NP has little to do with uniformity. Thus if
we could show that there were problems in NP requiring exponential sized
circuits, we would be done. Unfortunately, the best lower bounds proven so
far for languages in NP on inputs of size m are of size 3m.
In this last section we will meet another complexity class not routinely en-
countered, but of significant importance. It concerns randomized reductions.
We will be using elementary probability theory, which is fundamentally count-
ing. The probability that a given string of length n is chosen is 2^{−n};
technically we will be using the uniform distribution (i.e. the unbiased
coin-tossing met in school).
In the above, the numbers 3/4 and 1/4 are arbitrary. For a randomly chosen
computation path y, the probability that M(x) accepts for certificate y is
Pr_M(x, y) ≥ 3/4, and likewise that M rejects. Using repetition, we can
amplify the probabilities as we like8. In particular, for any L ∈ RP or BPP,
and any polynomial n^d, if you build a new Turing machine M′ which
replicates the original one n^{c+d} many times, you can make sure that
• x ∈ L implies that the probability that M′ accepts is ≥ 1 − 2^{−n^d}, and
• x ∉ L implies M′ does not accept (RP) or the probability that M′ accepts
is ≤ 2^{−n^d} (BPP).
This process is referred to as (probability) amplification.
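The effect of repetition can be computed exactly. If each independent run is correct with probability 3/4 and we take the majority of t runs, the error probability is a binomial tail; a Chernoff bound turns this into the 2^{−n^d} claim above. A back-of-the-envelope calculation (not from the text):

```python
from math import comb

def majority_error(t, p=0.75):
    """Probability that the majority vote of t independent runs is wrong,
    when each run is correct with probability p (take t odd).  The majority
    is wrong exactly when at most t//2 runs are correct."""
    return sum(comb(t, i) * p**i * (1 - p)**(t - i) for i in range(t // 2 + 1))

for t in (1, 11, 101):
    print(t, majority_error(t))   # error drops steeply as t grows
```

With t = 1 the error is exactly 1/4; by t = 101 it is already far below any practical threshold, which is the whole point of amplification.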
It is easily seen that P ⊆ RP ⊆ NP, and that RP ⊆ BPP. There is one
additional relationship known.
Theorem 7.5.2 (Sipser [Sip83], Lautemann [Lau83]). BPP ⊆ Σ2 ∩Π2 .
• Guess z1 , . . . , zm .
• Generate all potential d of length m.
• See, in polynomial time, if d ∈ {y ⊕ zj | 1 ≤ j ≤ m ∧ y ∈ Ax }, which is the
same as {d ⊕ zj | 1 ≤ j ≤ m} ∩ Ax ≠ ∅, by checking M (x, d ⊕ zj ) for each
j.
This matrix defines a Σ2 predicate. Hence L ∈ Σ2 by the lemma. ⊓⊔
We next explore means for showing that problems are likely to be in-
tractable, using randomized polynomial-time reductions. In some sense, this
result and the one above were the key to many results in modern complexity
like Toda’s Theorem9 and the PCP Theorem10 . The problem of interest is
whether there is unique satisfiability for a CNF formula.
beyond the scope of this book. We discussed this earlier. Kozen [Koz06] has a nice and
accessible discussion of the PCP Theorem.
The reader will note that this proof also gives a reduction for Sat with a
uniqueness promise, that is, for formulae ρ such that if ρ has a satisfying
assignment it has exactly one. Note also that the reduction is a polynomial
time Turing reduction as we need to ask many questions. More on this in
Chapter 8.
Chapter 8
Some Structural Complexity
8.1 Introduction
8.1.1 Oracles
X′ = {e : Φ_e^X(e) ↓}.
Then we see that for any X, X <_T X′. For the proof we simply say "use the
same proof as we did for the proof of the undecidability of the halting problem,
but putting X's everywhere". (Chapter 5.)
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2024 217
R. Downey, Computability and Complexity, Undergraduate Topics
in Computer Science, https://doi.org/10.1007/978-3-031-53744-8_8
Techniques which relativize will not suffice to show P = NP, nor show
P 6= NP.
This is the upshot of the next theorem. What techniques relativize? Any
normal diagonalization relativizes, and any normal simulation relativizes. In
the below, we use the relativization P^X, interpreted to be the collection
of languages {L : L ≤_T^P X}, that is, those for which there is a deterministic
oracle Turing machine running in polynomial time and accepting L with
oracle X. Similarly, we take NP^X to be the set of languages for which there
is a nondeterministic oracle Turing machine M running in polynomial time
with oracle X, such that x ∈ L iff some computation path of M^X accepts
x.
Theorem 8.1.1 (Baker, Gill and Solovay [BGS75]).
1. There is a computable oracle A such that P^A ≠ NP^A.
2. There is a computable oracle B such that P^B = NP^B.
Proof. 1.
We construct a computable oracle A and a language L ∈ NP^A to meet the
requirements
reached the threshold of the R_{⟨e′,k⟩} for ⟨e′, k⟩ < j, and 2^s > d · s^k, compute
Φ_e^{A*_s}(1^s). Here A*_s is the same as A_s and also has no elements of length s put
into A_{s+1}. Note that we can do this action in more or less s^k many steps.
Case 1. Φ_e^{A*_s}(1^s) = 1. Then we can diagonalize against Φ_e by
(i) Adding nothing to L by asking that nothing of length s enters A \ A_s.
(ii) Making sure that A*_s ≺ A by making sure that the threshold of R_{⟨e,k⟩}
exceeds the use of the computation Φ_e^{A*_s}(1^s).
This makes
L(1^s) = 0 ≠ 1 = Φ_e^A(1^s).
We set the next threshold to exceed the use of the computation, and move
to j + 1.
End of Construction
It is clear that A is computable and that we have diagonalized against each Φ_e and k.
Now we turn to 2.
This is easy. Let B be any PSPACE complete language. These are evidently
closed under polynomial time Turing reductions. Thus P^B = PSPACE^B, and
since P^B ⊆ NP^B ⊆ PSPACE^B, we get P^B = NP^B. ⊓⊔
There have been many oracle separation and collapse results. It is argued
from these results that techniques which relativize don't suffice to solve ques-
tions regarding the relationships between complexity classes. One notable
example, also from the original paper [BGS75], is the following. Its significance
is that in Proposition 5.1.1, we observed that Δ^0_1 is the same as computable.
That is, being computable is the same as being c.e. and co-c.e.. The natural
complexity analog is that "if language L is in NP ∩ co-NP then L ∈ P".
That is, co-NP ∩ NP = P. This analog might be true, but proving it will
require a proof that does not relativize, as we see in the next result. We re-
mark that most experts think that co-NP ∩ NP ≠ P, but this is yet another
longstanding conjecture.
and hence
P^B ≠ co-NP^B ∩ NP^B.
Proof. This construction is significantly more complex than the proof of The-
orem 8.1.1. Again, we will have L ∈ NP^B to meet the requirements
set.
The usual argument shows that K^B is NP^B-complete. The reader is invited
to prove that this language is NP^B-complete in Exercise 8.1.3. It is important
that the reader notices that to decide if z ∈ K^B, we only look at elements of
length smaller than |z|. We will build B so that
x ∈ K^B iff x ∈ K^{B_s},
8.1.2 Exercises
Exercise 8.1.5 (Baker, Gill and Solovay [BGS75]) Show that there is
a computable oracle A with
One result which generated a lot of hope was by Bennett and Gill [BG81].
Bennett and Gill showed that for a random oracle X, P^X ≠ NP^X with
probability 1. Bennett and Gill argued that
“random oracles, by their very structurelessness, appear more benign and less likely
to distort the relations among complexity classes than the other oracles used in com-
plexity theory and recursive function theory, which are usually designed expressly
to help or frustrate some class of computations.”
However, this hope was yet another forlorn one. Within two years Stuart
Kurtz [Kur83] had disproved this “random oracle hypothesis” (that is, the
hypothesis that if something was true relative to a random oracle, then it
was actually true unrelativized). Also see Chang et al. [CCG+ 94] for another
solution, but more below. Kurtz's counterexample was a bit artificial as it
involved relativizing something which was already relativized. So perhaps
something might be salvaged for “natural problems”.
Anyway, the only real way we seemed to know how to separate classes
was diagonalization, and the only way to achieve coincidence was to use
simulation. What methods might be used if not these?
The next major advance came in Shamir [Sha92]. It built on earlier work of
Lund, Fortnow, Karloff and Nisan [LFKN92]. Shamir's paper was concerned
with a class called IP, for interactive protocol. An interactive proof system
(protocol) for a language L is a generalization of membership in NP where we
use probabilistic notions in an interactive game between a prover and a
verifier. The prover is trying to convince the verifier that it should believe
x ∈ L. After a polynomial number of rounds the verifier will accept if it
believes x ∈ L and reject if it believes x ∉ L. We need that the probability
that the protocol accepts if x ∈ L is at least 2/3 (which can be amplified) and
the probability that the verifier rejects is at least 2/3 if x ∉ L. The verifier has
a supply of random coins for this protocol. Here is an example of this idea:
Graph Non-Isomorphism is in IP.
The protocol begins with two graphs G, H with n vertices. We want the
verifier to accept if G ≇ H and reject otherwise. The following protocol is
repeated a constant number of times.
(i) The verifier uses its random coin to compute graphs G1 ≅ G and H1 ≅
H. (These are random permutations of the vertices.)
(ii) The verifier flips a coin and sends one of the pair (G, H1) or (G, G1) to
the prover.
(iii) The prover must return to the verifier which of (G, H1) or (G, G1) was
chosen by the verifier.
The only way that the prover can determine which of (G, H1) or (G, G1)
was chosen by the verifier is if it can distinguish between G1 and H1. If
G ≇ H, then H1 ≇ G. Thus, the prover will always be able to determine
which it was sent by simply testing if the second graph is isomorphic to G.
Therefore, if the two graphs are not isomorphic and the prover and the verifier
both follow the protocol, the prover can always determine the result of the
coin flip. However, if H ≅ G, then G1 and H1 are indistinguishable. Hence
the prover cannot distinguish between (G, H1) and (G, G1) any better than
random guessing.
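The protocol can be simulated directly on tiny graphs. The sketch below is an illustration under simplifying assumptions: the honest prover is modelled by a brute-force isomorphism test, and "random permutation" is a seeded shuffle.

```python
import random
from itertools import permutations

def isomorphic(E1, E2, n):
    """Brute-force isomorphism test on vertex set {0,...,n-1}; graphs are
    sets of frozenset edges (fine only for tiny n)."""
    if len(E1) != len(E2):
        return False
    return any(all(frozenset({p[u], p[v]}) in E2 for u, v in map(tuple, E1))
               for p in permutations(range(n)))

def run_protocol(E_G, E_H, n, rounds=200, seed=0):
    """One verifier coin per round: send a relabelled copy of G (coin 0) or
    H (coin 1).  The all-powerful prover answers "G" iff the copy is
    isomorphic to G.  Returns the fraction of coins guessed correctly."""
    rng = random.Random(seed)
    correct = 0
    for _ in range(rounds):
        coin = rng.randrange(2)
        p = list(range(n))
        rng.shuffle(p)
        source = (E_G, E_H)[coin]
        copy = {frozenset({p[u], p[v]}) for u, v in map(tuple, source)}
        answer = 0 if isomorphic(copy, E_G, n) else 1
        correct += (answer == coin)
    return correct / rounds

C4  = {frozenset(e) for e in [(0, 1), (1, 2), (2, 3), (3, 0)]}   # 4-cycle
paw = {frozenset(e) for e in [(0, 1), (1, 2), (2, 0), (2, 3)]}   # triangle + pendant
print(run_protocol(C4, paw, 4))   # 1.0: non-isomorphic, the prover is never fooled
print(run_protocol(C4, C4, 4))    # about 0.5: the prover is reduced to guessing
```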
What’s the computational power of this strange model? It was great shock
when Shamir proved IP = PSPACE. Introduced by Lund, Fortnow, Karloff
and Nisan in [LFKN92], the method Shamir used was quite new. It is called
arithmetization. What we do is take a QBF formula X and P1 turn it into a
number by replacing ∨ by +, ∧ by ·, ¬x by (1 − x), ∃x by x=0 , and ∀x by
Q1
x=0 . Then X is true iff the number evaluates to n 6= 0. So why not just
compute this and solve X, showing that P = PSPACE? The problem is that
the number is likely too big to be evaluated in polynomial time.
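The replacement rules are easy to carry out mechanically. The sketch below evaluates the arithmetized number modulo a prime; note that the naive rule ∨ → + can push clause values above 1, which is one reason the real protocol works with a large prime (and uses degree-reduction steps we ignore here). The formula representation is ad hoc.

```python
def arith(prefix, poly, env, p):
    """Arithmetize a QBF over GF(p): ∃x becomes a sum over x ∈ {0, 1},
    ∀x a product, and `poly` evaluates the quantifier-free part with
    ∨ → +, ∧ → ·, ¬x → (1 − x)."""
    if not prefix:
        return poly(env) % p
    (q, var), rest = prefix[0], prefix[1:]
    vals = [arith(rest, poly, {**env, var: b}, p) for b in (0, 1)]
    return sum(vals) % p if q == 'E' else (vals[0] * vals[1]) % p

p = 2**31 - 1   # a convenient large prime; a real protocol chooses it carefully
# ∃a∀b (a ∨ ¬b) is true, so its arithmetization is nonzero:
f = lambda e: e['a'] + (1 - e['b'])
print(arith([('E', 'a'), ('A', 'b')], f, {}, p))       # 2
# X = ∃a∀b∃c∀d[(a ∨ ¬b ∨ c) ∧ (b ∨ ¬d)] is false, so it evaluates to 0:
X = lambda e: (e['a'] + (1 - e['b']) + e['c']) * (e['b'] + (1 - e['d']))
print(arith([('E', 'a'), ('A', 'b'), ('E', 'c'), ('A', 'd')], X, {}, p))  # 0
```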
So, what Shamir did was to move the whole evaluation problem into arith-
metic over a prime field (i.e. replacing the sums and products by evaluations
in the field), with the prime of polynomial size relative to the input. This
alleviates the problem of numbers being too large, since modular arithmetic
can be done in polynomial time (relative to the prime). In the protocol, first
we ask the oracle "What is the value of X in the prime field?". Then the
protocol, roughly, strips one quantifier at each step, treating the formula
without the sum (product) as a sum (product) of polynomials and asking the
oracle to supply the value of the evaluation. You can verify that the oracle
does not lie by checking random points of the field to evaluate the
"polynomial" and see if it evaluates correctly. For example, if we had
f(z) = h(z) + q(z) as the claimed value modulo p, then this could be checked
by substituting a random point of GF(p) and seeing if it evaluates correctly.
The protocol uses the fact that modular arithmetic is fast. The point,
roughly, is that the only way that we can be fooled is if the point chosen
randomly happens to be a point of intersection of two curves over GF(p), for
a large prime p. This is highly unlikely. Moreover, lies propagate through
the rest of the proof; the first lie by the oracle is never forgiven, at least
with high probability. An amusing
hard instances, and yet seem easily solvable for instances occurring in natural
problems?”.
Remark 8.1.1. Personal Musings. I finish this section with some personal
musings. We have to be very careful with what we mean by these limitation
results. IP = PSPACE is proven by a simulation, only one that filters
through a kind of reduction to arithmetic formulas. So what do we mean by
"normal simulation"? Also, proving lower bounds for monotone circuits cer-
tainly used diagonalization, only filtered through a randomization process,
so what counts as "normal diagonalization"? Finally, part of the problem,
the mixing of oracles and bounded reductions, can be illustrated by classical
computability. A truth table reduction A ≤_tt B is given by a computable
function f, such that, on input x, f(x) computes a Boolean expression σ(x),
and x ∈ A iff B models σ(x), where xi ∈ σ(x) is interpreted as "i ∈ B". For
example, f(4) = σ(4) might say ((x1 ∨ ¬x3) ∧ x5) ∨ x17, so that 4 ∈ A iff
either (5 ∈ B and at least one of (1 ∈ B) or (3 ∉ B)) or (17 ∈ B)
holds. It is known that
there is a "minimal" c.e. truth table degree: namely a > 0 such that there
is no c with a > c > 0 (Kobzev [Kob78]). On the other hand, Mohrherr
[Moh84] showed that the truth table degrees above 0′ are dense, meaning, in
particular, that if a > 0′, there is always a c with a > c > 0′. Does this mean
that the existence of minimal truth table degrees does not relativize? Well
no; the point is that Mohrherr's Theorem concerns the truth table degrees
above 0′, and in those degrees we have not relativized the x ↦ σ(x) function.
The reason I mention Mohrherr's Theorem is that truth table reduction is
much more like polynomial time Turing reduction in that giving access to
an oracle does not change the nature of the access mechanism: it is still a
computable computation of σ(x), and similarly for A ≤_T^P B, we still only
have polynomial access to the oracle. They are only partial relativizations. In
classical relativization in computability theory, everything is relativized in-
cluding the access mechanism, and it is not altogether clear what that would
mean for a polynomial. In the case of ≤_tt we would need to relativize
this to ∅′, and then we do find that there is a degree c.e. relative to ∅′ which
is a minimal ≤_tt^{∅′} "truth table" degree, so Kobzev's theorem is resurrected.
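The "fixed access mechanism" point can be seen concretely in the earlier example: σ(4) is a fixed Boolean expression, evaluated mechanically once the oracle answers are in. A sketch:

```python
def sigma4(B):
    """σ(4) from the example: ((x1 ∨ ¬x3) ∧ x5) ∨ x17, with each xi read
    as the oracle answer to "i ∈ B?".  So 4 ∈ A iff sigma4(B) holds."""
    return ((1 in B or 3 not in B) and 5 in B) or 17 in B

print(sigma4({1, 5}))    # True: 5 ∈ B and 1 ∈ B
print(sigma4({3, 5}))    # False: 5 ∈ B but 1 ∉ B and 3 ∈ B, and 17 ∉ B
print(sigma4({17}))      # True: 17 ∈ B alone suffices
```

Changing the oracle B changes the answers but not the expression σ(4) itself, which is exactly the sense in which such reductions are only partially relativized.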
Certainly there have been some attempts to re-define the concept of an
oracle in computational complexity theory, but the situation remains murky
at best.
A ⊕ B = {0σ, 1τ : σ ∈ A, τ ∈ B}.
The following is one of the classical results from the early days of compu-
tational complexity theory.
Proof. Whilst this is not the original Ladner proof, it is the one most used. We
will be using a diagonalization argument which works roughly by “blowing
holes in B”. We will construct X = A ⊕ C, and hence will be concerned
with C. We will build a linear time computable relation R with R defined on
strings of the form 1s . We will define
Clearly C ≤_m^P B. We need to meet the requirements
R_{2⟨e,k⟩} : Φ_e^X ≠ B in time |x|^k.
R_{2⟨e,k⟩+1} : Φ_e^A ≠ X in time |x|^k.
Φ_e^X = B in time |x|^k implies B ≤_T^P A,
Now, notice that we might have made a decision long ago at stage t which
would have caused a disagreement Φ_e^X(x) ≠ B(x), but it might take us a very
long time to actually find that disagreement at some stage s with t ≪ s.
But eventually we will observe this previously created disagreement.
In the construction, if we have not observed a disagreement, then keep
R_{2⟨e,k⟩} as the requirement asserting control and keep R(1^{s+1}) = 0.
If we have observed a disagreement looking back, then we pass control to
R_{2⟨e,k⟩+1}, setting R(1^{s+1}) = 1.
The case that R_{2⟨e,k⟩+1} has control at stage s is similar. We will have
R(1^s) = 1 and hence C is copying B. If C successfully copied B and we failed
to meet R_{2⟨e,k⟩+1}, then Φ_e^A = X, and C almost equals B. Thus B ≤_T^P A,
a contradiction.
Proof. Now we run the same proof using an oracle for B. In linear time
relative to an oracle for B we can compute a slow enumeration of B, and
moreover, as A ≤_T^P B, we have a polynomial time procedure Γ with
Γ^B = A in time |x|^d for some d on inputs x. Thus we can run the whole
construction using B as an oracle. ⊓⊔
Corollary 8.2.3 (Ladner [Lad75]).
If P ≠ NP, then there are languages in NP \ P which are not NP-complete.
Proof. By Exercise 7.2.3, if C ≤_m^P B and B ∈ NP, then C ∈ NP. We can apply
Ladner's Theorem with A = ∅ and B coding Sat, giving an NP language X
with X not NP-complete. ⊓⊔
Corollary 8.2.3 yields the following question:
Question 8.2.1. Assuming that P 6= NP, are there any natural languages L ∈
NP which are not NP-complete?
We have no idea. We have some possible candidates, two noteworthy ones
are below.
Graph Isomorphism
Input : Two graphs G1 , G2 .
Question : Is G1 ≅ G2 ?
The next one is not a decision problem but an NP-search problem.
Factorization
Input : An integer n.
Question : Find the prime factors of n.
For more fine-grained results we refer the reader to the book [KST93].
8.2.1 Exercises
R_{e,i,k,d} : Φ_e^A = Φ_i^B running in times |x|^k and |x|^d, respectively → Φ_e^A ∈ P.
The easiest way to do this is to make one side or the other empty for a long time, making
the computation in P.)
Exercise 8.2.4 The original proof of Ladner used stretching. The language
C we need for X = A ⊕ C is constructed by building a function f. Now
we will put z = x01^{f(|x|)} ∈ C iff x ∈ B. That is, z is a "stretched form" of
x. The idea is that in a computation Φ_e^{A⊕C}(x) running in time |x|^k, we can
only ask questions of length ≤ |x|^k. In the construction, we would be build-
ing f, making sure that f is built in polynomial time. When we compute
Φ_e^{A⊕C}(x), once f(|x|) is bigger than |x|^k, the value of B on x will not affect
the value of Φ_e^{A⊕C}(x). Once R_{⟨e,k⟩} asserts control, we will define f accordingly
to have f(|x|) > |x|^k for |x| > s. Now, if Φ_e^{A⊕C} = B, then you can argue that
the very computation Φ_e^{A⊕C}(x) allows us to compute B(x), which then allows
us to compute C on longer z, which then allows us to compute B on longer
x's.
Using these ideas, give a proof of the density of the polynomial time Turing
degrees of computable sets using stretching.
Chapter 9
Parameterized Complexity
9.1 Introduction
In this section we will look at a coping strategy which seems to have turned
out to be important for a wide class of practical problems. It is called param-
eterized complexity or, more recently, multivariate complexity or fine-grained
complexity. The idea is to look more deeply into the way that a problem is
intractable: what is causing the intractability? The area was introduced by
Downey and Fellows [DF92, DF93, DF95a, DF95b].
In this book we have seen the story of classical complexity. It begins with
some problem we wish to find an efficient algorithm for. Now, what do we
mean by efficient? We have idealized the notion of being efficient by being in
polynomial time. Having done this, we discover that the only algorithm we
have for the given problem is to try all possibilities and this takes Ω(2^n) for
instances of size n. What we would like is to prove that there is no algorithm
running in feasible time. Using our idealization that feasible is the same as
polynomial, this equates to showing that there is no algorithm running in
polynomial time.
When in real life do we know nothing about a problem other than its
size?
database example, an algorithm that works very efficiently for small formulas
with low logical depth might well be perfectly acceptable in practice.
Thus, parameterized complexity is a refined complexity analysis, driven
by the idea that in real life data is often given to us naturally with an under-
lying structure which we might profitably exploit. The idea is not to replace
polynomial time as the underlying paradigm of feasibility, but to provide a
set of tools that refine this concept, allowing some exponential aspect in the
running times: we either use the given structure of the input to arrive at
feasibility, or develop some relevant hardness theory to show that the given
kind of structure is not useful for this approach.
This simple idea is pretty obvious once you think about it. For example,
in Chapter 2.2, we showed regular language acceptance is in linear time. But
this is really not quite true: it is only true if the language is presented to
us as, say, a regular expression. If instead it were presented as the language
accepted by some Turing machine, then by Rice's Theorem, regular language
acceptance is undecidable. The point is that we only really care about
regular languages when they are given to us in a structured way, namely via
regular expressions.
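When the language is handed to us as an automaton, acceptance really is a single left-to-right scan. A sketch, using an ad hoc DFA for the language (ab)*:

```python
def dfa_accepts(delta, start, accepting, word):
    """Run a DFA: one table lookup per input symbol, so time O(|word|)."""
    state = start
    for ch in word:
        state = delta[(state, ch)]
    return state in accepting

# DFA for (ab)*: state 0 accepts, state 2 is a dead state.
delta = {(0, 'a'): 1, (0, 'b'): 2,
         (1, 'a'): 2, (1, 'b'): 0,
         (2, 'a'): 2, (2, 'b'): 2}
print(dfa_accepts(delta, 0, {0}, "abab"))  # True
print(dfa_accepts(delta, 0, {0}, "aba"))   # False
```

The linear bound is visible in the code: the loop body is constant time, and the structure that makes this possible is precisely the automaton presentation.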
By way of motivation, we recall three basic combinatorial decision prob-
lems: Vertex Cover, Dominating Set, Independent Set.
The reader should recall that in a graph G a vertex cover is where vertices
cover edges: that is, C = {v1 , . . . , vk } is a vertex cover iff for each e ∈ E(G),
there is a vi ∈ C such that vi ∈ e. The reader should recall that a dominat-
ing set is the situation where vertices cover vertices: D = {v1 , . . . , vk } is a
dominating set iff for all v ∈ V (G), either v ∈ D or there is an e ∈ E(G)
such that e = ⟨vi , v⟩ for some vi ∈ D. Finally, an independent set is a col-
lection of vertices no pair of which is connected. In Chapter 7, we proved
that all of the related decision problems, Vertex Cover, Independent Set,
Dominating Set, are NP-complete.
These kinds of problems can occur as mathematical models of data occur-
ring in computational biology. Suppose we had a conflict graph of some data
from this area. Because of the nature of the data we know that it is likely the
conflicts are at most about 50 or so, but the data set is large, maybe 10^12
points. We wish to eliminate the conflicts by identifying those 50 or fewer
points. Let's examine the problem depending on whether the identification
turns out to be a dominating set problem or a vertex cover problem.
• Dominating Set Essentially the only known algorithm for this problem
is to try all possibilities. (Does this sound familiar?) Since we are looking
at subsets of size 50 or less, we will need to examine all (10^12)^50 many
possibilities. Of course, this is completely impossible.
• Vertex Cover There is now an algorithm running in time O(1.2738^k +
kn) (Chen et al. [CKX10]) for determining if a graph G has a vertex cover of
size k. This and structurally similar algorithms have been implemented
and are practical for n of unlimited practical size and k large. The rele-
vant k has been increasing all the time, evolving from about 400. Such
         n = 50         n = 100        n = 150
k = 2    625            2,500          5,625
k = 3    15,625         125,000        421,875
k = 5    390,625        6,250,000      31,640,625
k = 10   1.9 × 10^12    9.8 × 10^14    3.7 × 10^16
k = 20   1.8 × 10^26    9.5 × 10^31    2.1 × 10^35
Table 9.1: The Ratio n^{k+1}/(2^k n) for Various Values of n and k.
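The 2^k n term in the table's denominator comes from the classical bounded search tree algorithm for Vertex Cover, a much simpler relative of the Chen et al. algorithm mentioned above. A sketch:

```python
def has_vertex_cover(edges, k):
    """Bounded search tree for k-Vertex Cover: any uncovered edge {u, v}
    must have an endpoint in the cover, so branch two ways.  The search
    tree has depth ≤ k, giving O(2^k · |E|) time overall."""
    if not edges:
        return True                # every edge covered
    if k == 0:
        return False               # an edge remains but no budget left
    u, v = next(iter(edges))       # pick any uncovered edge
    for w in (u, v):
        remaining = {e for e in edges if w not in e}
        if has_vertex_cover(remaining, k - 1):
            return True
    return False

triangle = {(0, 1), (1, 2), (0, 2)}
print(has_vertex_cover(triangle, 1))   # False: one vertex always misses an edge
print(has_vertex_cover(triangle, 2))   # True
```

The exponential cost is confined to the parameter k; the graph size enters only linearly, which is exactly the contrast the table displays.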
There are several varieties of parametric tractability, but for this course, I
will stick to the basic “strongly uniform” variety most commonly used in
practice.
9.2.1 Discussion
Now you might (in some cases validly) complain about the presence of an
arbitrarily bad computable function f. Could this not be like, for example,
Ackermann's function? This is a true enough complaint, but the argument
also applies to polynomial time. Could not polynomial time allow for running
times like n^{30,000,000}? As noted by Edmonds [Edm65], the practical algorithm
builder's answer tends to be that "in real life situations, polynomial time
algorithms tend to have small exponents and small constants." That certainly
was true in 1965, but as we will see in §9.3.4, this is no longer true. The same
heuristic applies here. By and large, for most practical problems, at least until
recently, the f(k)'s tended to be manageable and the exponents reasonable.
In fact, an important offshoot of parameterized complexity theory is that
it does (sometimes) provide tools to show that bad constants or bad expo-
nents for problems with algorithms running in polynomial time cannot be
eliminated, modulo some reasonable complexity assumption. More on this in
§9.3, especially in §9.3.4.
One of the key features of the theory is a wide variety of associated tech-
niques for proving parametric tractability. We will discuss them in §9.4, but
before we do so, let’s examine the associated hardness theory.
is probably Ω(|M|^k) since again our intuition would be that all paths would need to be tried:

M : ⟨G, k⟩ ↦ ⟨G′, k′⟩,
so that
Following Karp [Kar73], and then four decades of work, we know that
thousands of problems are all NP-complete. They are all reducible to one
another and hence seem to have the same classical complexity. On the other
hand, with parameterized complexity, as we now see, we have a theory which separates Vertex Cover from Dominating Set and Independent Set. With such refined reducibilities, it seems highly unlikely that the hardness classes would coalesce into a single class like NP-complete. And indeed we think that there are many hardness classes. We have seen in the theorem above that Short Nondeterministic Turing Machine Acceptance ≡fpt Independent Set. However, we do not think that Dominating Set ≤fpt Independent Set.
Similarly, we can define Weighted s-Cnf Sat where the clauses have
only s many literals. Classically, using a padding argument, we know that
Cnf Sat ≡_m^P 3-Sat, as we saw in Corollary 7.2.1. Recall that to do this for a clause of the form {q1 , . . . , qk } we add extra variables zj and turn the clause into several as per: {q1 , q2 , z1 }, {¬z1 , q3 , z2 }, etc. Now this is definitely
not a parametric reduction from Weighted Cnf Sat to Weighted 3 Cnf
Sat because a weight k assignment could go to any other weight assignment
for the corresponding instance of 3-Cnf Sat.
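The padding can be made concrete in a few lines. The sketch below is illustrative only (the variable-naming scheme and "~" for negation are my own conventions): a clause of length k ≥ 4 becomes a chain of k − 2 three-literal clauses.

```python
def pad_clause(clause, fresh):
    """Split a clause {q1, ..., qk} into 3-clauses linked by fresh
    variables drawn from the iterator `fresh`:
    {q1, q2, z1}, {~z1, q3, z2}, ..., {~z_{k-3}, q_{k-1}, qk}."""
    if len(clause) <= 3:
        return [list(clause)]
    zs = [next(fresh) for _ in range(len(clause) - 3)]
    out = [[clause[0], clause[1], zs[0]]]
    for j in range(1, len(zs)):
        out.append(["~" + zs[j - 1], clause[j + 1], zs[j]])
    out.append(["~" + zs[-1], clause[-2], clause[-1]])
    return out

fresh = (f"z{i}" for i in range(1, 10))
print(pad_clause(["q1", "q2", "q3", "q4", "q5"], fresh))
```

Satisfiability is preserved, but a weight k assignment of the q's may force anywhere between none and all of the z's to be true, so the weight of the image is not a function of k; this is exactly why the reduction is not parametric.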
We define infinitely many parameterized complexity classes as
The letter “W” stands for weft,2 which comes from the original papers. In
[DF92, DF93, DF95a, DF95b], the classes were introduced as circuits with
gates representing ∧, ∨ and inverters representing ¬. They had n many inputs
for boolean variables and a single output at the bottom. These circuits were
2 A term from weaving.
classified via depth and weft, where depth is the length of the longest path
to the output and weft is the depth when only large gates are considered.
Then, for example, a CNF formula would have a single large ∧ gate at the
bottom with a layer of small or large ∨ gates (representing clauses) feeding
into it, perhaps with inverters between the inputs and the ∨ gates. In the
case of CNF the gates would be large as there is no bound beyond n on their
size. In the case of s-CNF, they are small. The difference between s-CNF and
s0 -CNF is that we bound the size of the small gates by s and s0 respectively.
In the original model, the circuits did not have to be boolean circuits, only have a single output and be of polynomial depth. For if the circuits had reasonably high depth, then something at level k could have an output into something of level k′ ≪ k. In boolean circuits this only happens with k′ = k − 1 (modulo inverters). In Figure 9.1, we have an example of a weft 2
depth 5 decision circuit.
However, in [DF92, DF95a, DF95b], a normalization theorem is proven
to say that we can regard these circuits as being boolean, and representing
formulae. This proof is a combinatorial argument and we have decided to
omit it, and take as a fact that we can define the complexity classes using
boolean circuits (formulae).
Convention 9.3.2 For this Chapter we will also use the notation + for ∨
and · for ∧, as this makes the material more consistent with usage in the
area. So a CNF formula is a product of sums.
This then allows us to reduce circuits of polynomial depth and weft 1 back to one of the classes W [1, s]. We are about to prove a basic result in parameterized complexity. The result says that weighted (antimonotone) 2-Cnf
Sat is complete for W [1]. It is important that the reader see the difference
between the combinatorics needed for an NP-completeness proof and one for
parameterized complexity. Do not be daunted by this proof, for whilst it is
reasonably long, it is combinatorial, and does not need high powered math-
ematics.
We are defining the basic class as

W[1] = ⋃_{s=1}^{∞} W[1, s].
Fig. 9.1: A weft 2, depth 5 decision circuit on inputs x1 , . . . , x7 , with large gates (unbounded fanin), small gates (bounded fanin), and inverters (¬).
Red/Blue Nonblocker
Input: A graph G = (V, E) where V is partitioned into two colour classes
V = Vred ∪ Vblue .
Parameter: A positive integer k.
Question: Is there a set of red vertices V′ ⊆ Vred of cardinality k such that every blue vertex has at least one neighbor that does not belong to V′?
The closed neighborhood of a vertex u ∈ V is the set of vertices N[u] = {x ∈ V : x = u or xu ∈ E}.
It is easy to see that the restriction of Red/Blue Nonblocker to graphs
G with blue vertices of maximum degree s belongs to antimonotone W [1, s]
since the product-of-sums (remember: an and of ors) boolean expression

∏_{u ∈ Vblue} ∑_{xi ∈ N[u] ∩ Vred} ¬xi

has a weight k truth assignment if and only if G has a size k nonblocking set.
Such an expression corresponds directly to a formula meeting the defin-
ing conditions for antimonotone W [1, s]. We will refer to the restriction of
Red/Blue Nonblocker to graphs with blue vertices of maximum de-
gree bounded by s as s-Red/Blue Nonblocker. We next argue that s-
Red/Blue Nonblocker is complete for W [1, s].
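This correspondence is easy to sanity-check by brute force on a toy instance; the little graph below is my own example, not one from the text:

```python
from itertools import combinations

# red vertices r1..r4; for each blue vertex b we list N[b] ∩ V_red
red = ["r1", "r2", "r3", "r4"]
blue_red_nbrs = {"b1": {"r1", "r2"}, "b2": {"r2", "r3", "r4"}}

def nonblocking(sel):
    """Every blue vertex keeps a red neighbour outside the selection."""
    return all(nbrs - sel for nbrs in blue_red_nbrs.values())

def expression(true_vars):
    """Product over blue u of the sum of ¬x_i for x_i ∈ N[u] ∩ V_red."""
    return all(any(x not in true_vars for x in nbrs)
               for nbrs in blue_red_nbrs.values())

k = 2
nb_sets = {frozenset(s) for s in combinations(red, k) if nonblocking(set(s))}
wk_assts = {frozenset(s) for s in combinations(red, k) if expression(set(s))}
assert nb_sets == wk_assts and nb_sets   # the two notions coincide
```

That the two functions agree is just De Morgan's law: the clause for u is satisfied by some ¬x_i exactly when some red neighbour of u is unselected.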
By our definition of W [1, s], we can assume that we are given a normalized
s-CNF boolean expression. Thus, let X be a boolean expression in conjunctive
normal form with clauses of size bounded by s. Suppose X consists of m
clauses C1 , . . . , Cm over the set of n variables x0 , . . . , xn−1 . We show how to
produce in polynomial time by local replacement, a graph G = (Vred , Vblue , E)
that has a nonblocking set of size 2k if and only if X is satisfied by a truth
assignment of weight k.
Before we give any details, we give a brief overview of the construction,
whose component design is outlined in Figure 9.2. There are 2k “red” components arranged in a line. These are alternately grouped as blocks from Vred = V1 ∪ V2 (V1 ∩ V2 = ∅), with a block of vertices from V1 and then V2 , to be precisely described below. The idea is that the V1 blocks should represent a positive choice (corresponding to a literal being true) and the V2 blocks the “gap” until the next positive choice. We think of the
3 The reduction for Red/Blue Nonblocker from Downey-Fellows [DF95b] contained a
flaw that was spotted by Alexander Vardy. The reduction used here is from [DF13].
Fig. 9.2: Outline of the component design: “variable choice” blocks (V1 , red) with rows 0, . . . , n − 1, “gap choice” blocks (V2 , red), and, for each i, one blue vertex with edges to the vertices not in row i.
N = {a[r, ir ] : 0 ≤ r ≤ k − 1} ∪ {b[r, ir , dr ] : 0 ≤ r ≤ k − 1}
is a nonblocking set in G.
Conversely, suppose N is a 2k-element nonblocking set in G. It is straight-
forward to check that a truth assignment for X of weight k is described by
setting those variables true for which a vertex representative of this possibility
belongs to N , and by setting all other variables to false.
Note that the edges of the sets E1 (E2 ) which connect pairs of distinct
vertices of A(r1 ) [B(r1 )] to blue vertices of degree 2 enforce that any 2k-
element nonblocking set must contain exactly one vertex from each of the sets
A(0), B(0), A(1), B(1), . . . , A(k −1), B(k −1). The edges of E3 and E4 enforce
(again, by connections to blue vertices of degree 2) that if a representative
of the possibility that xi evaluates to true is selected for a nonblocking set
from A(r1 ), then a vertex in the i-th row of B(r1 ) must be selected as well,
representing (consistently) the interval of variables set false (by increasing
index because of the E7 and E8 edges) until the “next” variable selected to
be true. The edges of E5 and E6 ensure consistency between the selection
in A(r1 ) and the selection in A(r1 + 1 mod n). The edges of E9 and E10
ensure that a consistent selection can be nonblocking if and only if it does
not happen that there is a set of representatives for a clause witnessing that
every literal in the clause evaluates to false. (There is a blue vertex for every
such possible set of representatives.) □
Theorem 9.3.4 provides the starting point for demonstrating the following
dramatic collapse of the W [1] stratification.
Theorem 9.3.5. W [1] = W [1, 2].
v[i′ ] ⇒ v[i].
It may be seen that this transformation may increase the size of the circuit
by a linear factor exponential in k. We make the following argument for the
correctness of the transformation.
If C accepts a weight k input vector, then setting the corresponding copies
x[i, j] among the new input variables accordingly, together with appropriate
settings for the new “collective” variables v[i], yields a vector of weight k 0
that is accepted by C 0 .
For the other direction, suppose C 0 accepts a vector of weight k 0 . Because
of the implications in (1) above, exactly k sets of copies of inputs to C must
have value 1 in the accepted input vector (since there are 2k copies in each
set). Because of the implications described in (2) and (3) above, the variables
v[i] must have values in the accepted input vector compatible with the values
of the sets of copies. By the construction of C 0 , this implies there is a weight
k input vector accepted by C. □
We have now done most of the work required to show that the following
parameterizations of well-known problems are complete for W [1]:
Clique
Instance : A graph G = (V, E).
Parameter : A positive integer k.
Question : Is there a set of k vertices V 0 ⊆ V that forms a complete subgraph
of G (that is, a clique of size k)?
Theorem 9.3.6 (Downey and Fellows [DF95b]).
(i) Independent Set is complete for W [1].
(ii) Clique is complete for W [1].
Proof. It is easy to observe that Independent Set belongs to W [1]. By
Theorems 9.3.4 and 9.3.5, it is enough to argue that Independent Set is
hard for Antimonotone W [1, 2]. Given a boolean expression X in conjunc-
tive normal form (product of sums) with clause size 2 and all literals negated,
we may form a graph GX with one vertex for each variable of X and having
an edge between each pair of vertices corresponding to variables in a clause.
The graph GX has an independent set of size k if and only if X has a weight
k truth assignment.
(ii) This follows immediately by considering the complement of a given graph.
The complement has an independent set of size k if and only if the graph has
a clique of size k. □
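Both steps of this proof are one-liners to make concrete. A brute-force sanity check in Python (illustrative only; the tiny formula X is my own example):

```python
from itertools import combinations

def has_independent_set(vs, edges, k):
    """Brute force: is there a set of k vertices spanning no edge?"""
    eset = {frozenset(e) for e in edges}
    return any(all(frozenset(p) not in eset for p in combinations(s, 2))
               for s in combinations(vs, k))

def complement(vs, edges):
    """The complement graph: an edge exactly where the input has none."""
    eset = {frozenset(e) for e in edges}
    return [p for p in combinations(vs, 2) if frozenset(p) not in eset]

# X = (¬a ∨ ¬b)(¬b ∨ ¬c): the graph G_X has one edge per clause
V, E = ["a", "b", "c"], [("a", "b"), ("b", "c")]
assert has_independent_set(V, E, 2)                 # weight-2 assignment: a, c true
assert has_independent_set(V, complement(V, E), 2)  # equivalently, G_X has a 2-clique
```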
Phew! The reader should note that henceforth we will only need the statement of the parameterized version of the Cook-Levin Theorem. Thus, Theorem 9.3.7 can be used as a black box; that is, as a starting point for hardness proofs.
We remark that Theorem 9.3.7 depends crucially on there being no bound
on the size of the Turing machine alphabets in the definition of the problem.
If we restrict Short Turing Machine Acceptance to Turing machines
with |Σ| bounded by some constant b, then the number of configurations is bounded by b^k |Q| k and the problem becomes fixed-parameter tractable (see
Exercise 9.3.12).
9.3.2 Exercises
the answer is no. So figure out a way to lift the degrees of vertices. Think about taking r
copies of the graph G.)
Irredundant Set
Instance : A graph G = (V, E), a positive integer k.
Parameter : A positive integer k.
Question : Is there a set V′ ⊆ V of cardinality k having the property that each vertex u ∈ V′ has a private neighbour? (A private neighbour of a vertex u ∈ V′ is a vertex u′ (possibly u′ = u) with the property that for every vertex v ∈ V′ with u ≠ v, u′ ∉ N [v].)
(Hint: Let G = (V, E) be a graph for which we wish to determine if G has a k-element
irredundant set. We construct the circuit C corresponding to the following boolean expres-
sion E. The variables of E are:
{p[i, x, y] : 1 ≤ i ≤ k, x, y ∈ V } ∪ {c[i, x] : 1 ≤ i ≤ k, x ∈ V }
k-Multicoloured Clique
Input : A graph G with a vertex colouring χ : V (G) → {1, . . . , k}.
Parameter : k
Question : Does G have a colourful clique of size k? Here the clique being
colourful means that if x, y are in the clique, then χ(x) 6= χ(y).
Definition 9.3.3.
(i) W [P ] denotes the collection of parameterized languages fpt-
reducible to the following problem.
Weighted Circuit Sat
Input : A decision circuit X of size polynomial in the inputs {x1 , . . . , xn }.
Parameter : An integer k.
Question : Does X have a weight k satisfying assignment?
4 Originally these classes were defined using bounded weft circuits, but for our purposes
boolean formulae suffice, although the fact that the two definitions are equivalent requires
a combinatorial proof.
(ii) W [SAT ] is the same as W [P ] except that we ask that the circuit is
a boolean one corresponding to formula of propositional logic.
(iii) XP is the class of all languages L fpt-reducible to a language L̂ with L̂(k) ∈ DTIME(n^k),
Proof. Clearly FPT ⊆ XP. Define a language L such that, for each k, L(k) is in DTIME(n^{k+1}) \ DTIME(n^k), given by the Hierarchy Theorem. Then L ∈ XP. Suppose that L ∈ FPT. Thus, by definition, there is a p, and M with
k-Linear Inequalities
Instance : A system of (rational) linear inequalities.
Parameter : A positive integer k.
Question : Can the system be made consistent over the rationals by deleting
at most k of the inequalities?
(1 + ε)|Q| ≤ |S|.
Table 9.2: The Running Times for Some Relatively Recent PTAS’s with 20%
Error.
Arora gave an EPTAS for the Euclidean TSP in [Aro97], but for all of
the other PTAS’s mentioned above, the possibility of such an improvement
remains open.
But we have a strategy: prove the problem is W [1]-hard parameterized by k = 1/ε, and you prove that the problem has no EPTAS, assuming that W [1] ≠ FPT. Parameterizing this way, the following is not hard to see.
Historically, it was first observed by Bazgan [Baz95] and also Cai and Chen
[CC97].
Planar Tmin
Input : A collection of Boolean formulas in DNF (sum-of-products) form,
with all literals positive, where the associated bipartite graph is planar (this
graph has a vertex for each formula and a vertex for each variable, and an
edge between two such vertices if the variable occurs in the formula).
Output : A truth assignment of minimum weight (i.e., a minimum number
of variables set to true) that satisfies all the formulas.
all 1 ≤ i ≤ t. If a term of ψi,j is satisfied, both ai,j,s and bi,j,s have to be true for some s, and this implies s = vi = vj . Then the vertices v1 , . . . , vt must form a clique. If vi vj ∉ E(G), none of the terms in ψi,j are satisfied. □
This “gap creation” method has been used to show that various PTAS’s
cannot be EPTAS’s. We refer the reader to Marx [Mar08] and [CT97, Mar05a,
Mar05b].
Rather than dwelling on hardness, in the next sections we turn to looking
at the rich collection of techniques available to prove that problems are FPT.
for a k-vertex cover is k and then we can decide if G has a vertex cover in time O(2^k |G|) by checking the 2^k paths of the tree.
Whilst this is a simple idea, to use it in genuine algorithms, you need to
appeal to some kind of combinatorics to improve performance. An illustration
of this idea is that if we can make the search tree smaller than the complete binary tree of depth k, then the performance will improve. Notice that, if G has no vertex of degree three or more, then G consists of a collection of paths and cycles, and this is very quick to check. Thus we can assume we have a vertex v of degree higher than 2. For a vertex cover of G we must have either v or all of its neighbours, so we create children of the root node corresponding to these
two possibilities. The first child is labeled with {v} and G − v, the second
with {w1 , w2 , . . . , wp }, the neighbours of v, and G − {w1 , w2 , . . . , wp }. In the
case of the first child, we are still looking for a size k − 1 vertex cover, but
in the case of the second child we need only look for a vertex cover of size
k − p, where p is at least 3. Thus, the bound on the size of the search tree
is now somewhat smaller than 2^k . It can be shown that this algorithm runs in time O((5^{1/4})^k · n), and note that 5^{1/4} ≈ 1.495. In typical graphs, there are
many vertices of higher degree than 3, and hence this works even faster.
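The branching just described fits in a few lines. The following sketch is illustrative code of my own (it decides the question; it does not implement the refined degree analysis): branch on a maximum-degree vertex v, putting either v or all of its neighbours into the cover.

```python
from collections import Counter

def vertex_cover(edges, k):
    """Does the graph (given as a list of edges) have a vertex cover of
    size <= k?  Bounded search tree: branch on a max-degree vertex v."""
    if k < 0:
        return False
    if not edges:
        return True
    if k == 0:
        return False
    deg = Counter(v for e in edges for v in e)
    v, _ = deg.most_common(1)[0]
    nbrs = {u for e in edges if v in e for u in e if u != v}
    # Branch 1: put v in the cover (delete edges touching v, budget k - 1).
    if vertex_cover([e for e in edges if v not in e], k - 1):
        return True
    # Branch 2: put every neighbour of v in the cover (budget k - deg(v)).
    return vertex_cover([e for e in edges if not nbrs & set(e)], k - len(nbrs))
```

Any vertex cover must contain v or all of v's neighbours, so the two branches are exhaustive; when v has degree at least 3, the second branch drops the budget by at least 3, which is what shrinks the search tree.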
The best algorithms along these lines use more complex branching rules.
For example, Niedermeier [Nie06] uses the following branching rules.
9.4.2 Exercises
Exercise 9.4.1 Prove that the problem called Hitting Set, parameterized
by k and hyperedge size d, is FPT. A hypergraph is like a graph except that
an edge is simply a subset of V (G). In this problem, we are given a hypergraph
where the maximum number of vertices in a hyperedge is bounded by d ≥ 2. Note that for d = 2 this is simply Vertex Cover.
Exercise 9.4.2 Prove that Dominating Set restricted to graphs of maximum degree t is solvable in time O((t + 1)^k |G|).
Exercise 9.4.3 (Downey and Fellows [DF95c]) Let q be fixed. Prove
that the following problems are solvable in time O(q^k |E|).
Minimum Fill-In
Input : A graph G.
9.4.3 Kernelization
This is again a simple basic idea. If we can make the problem smaller then
the search will be quicker. This idea of shrinking is a data reduction or pre-
processing idea, and is the heart of many heuristics.
Whilst there are variations of the idea below, the simplest version of ker-
nelization is the following.
7 Kaplan, Shamir and Tarjan [KST94] have used (essentially) the Problem Kernel Method of the next section to give an O(k^5 |V ||E| + f (k)) algorithm for suitably chosen f . Finally, Leizhen Cai [Cai96] also used the problem kernel method another way to give an O(4^k (k + 1)^{−3/2} [|E(G)||V (G)| + |V (G)|^2 ]) time algorithm. The current champion for this problem is Fomin and Villanger [FV11] and runs in time O^∗(2^{√(k log k)}).
There are other notions, where the kernel may be another problem (often
“annotated”) or the parameter might increase, but, crucially, the size of the
kernel depends only on k.
Here are some natural reduction rules for a kernel for Vertex Cover.
These rules are obvious. Sam Buss (see [DF98]) originally observed that,
for a simple graph G, any vertex of degree greater than k must belong to
every k-element vertex cover of G (otherwise all the neighbours of the vertex
must be included, and there are more than k of these).
This leads to our last reduction rule.
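Buss's observation translates directly into a kernelization: repeatedly force any vertex of degree greater than k into the cover, then reject if more than k² edges survive (after the rule, every vertex covers at most k edges). A sketch, with function names of my own:

```python
def buss_kernel(edges, k):
    """Buss-style kernelization for k-Vertex Cover (a sketch).
    Returns a reduced instance (edges', k') or None for a certified 'no'."""
    changed = True
    while changed and k >= 0:
        changed = False
        deg = {}
        for u, w in edges:
            deg[u] = deg.get(u, 0) + 1
            deg[w] = deg.get(w, 0) + 1
        for v, d in deg.items():
            if d > k:                    # v must be in every size <= k cover
                edges = [e for e in edges if v not in e]
                k -= 1
                changed = True
                break
    # now max degree <= k, so a size k cover covers at most k^2 edges
    if k < 0 or len(edges) > k * k:
        return None
    return edges, k
```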
in time O(|I|), and the problem can be solved by employing a search tree of size 2.27^k . Compare a running time of O(2.27^k · k^3 + |I|) (without interleaving) with a running time of O(2.27^k + |I|) (with interleaving).
We also remark that there are many strategies of reduction rules to shrink
the kernel. These include things like crown reductions (Abu-Khzam et al.
[AKCF+ 04, AKLSS06]), and other crown structures which generalize the
notion of a degree 1 vertex having its neighbours in the vertex cover, to more
complicated structures which resemble “crowns” attached to the graph8 .
Clearly, another game is to seek the smallest kernel. For instance, we know by Nemhauser and Trotter [NT75] that a size 2k kernel (i.e. the number of vertices) is possible for Vertex Cover. Note that with this kernel size, we get an
algorithm running in time O(n) + g(2k, k) · 2k, where g(2k, k) is the best
we can do for finding a vertex cover in a graph with 2k many vertices, and
seeking a vertex cover of size k. A natural question is “can we do better?”.
As we see in §9.5, modulo some complexity considerations, sometimes we can
show lower bounds on kernels. (Clearly, if P = NP then all such problems have constant size
kernels, so some assumption is needed.) Using a classical assumption called
the unique games conjecture (something apparently stronger than P6=NP) we
can prove that there is an ε > 0 such that no kernel of size (1 + ε) · k can be
achieved. We mention this in passing, and urge the reader to follow up this
line.
The state of the art in the theory of kernelization can be found in Fomin,
Lokshtanov, Saurabh, and Zehavi [FLSZ19]. This is a 529 page book de-
voted to kernelization so the reader can see that the area of kernelization is
technically very deep and we have only addressed the very basic technique.
9.4.4 Exercises
Exercise 9.4.6 Show that the following problem is FPT by the method of
reduction to a problem kernel:
Weighted Marbles
Input : A sequence of marbles, each with an integer weight and a colour.
Parameter : A positive integer k.
Question : Can we remove marbles of a total cost ≤ k,
such that for each colour, all marbles of that colour are consecutive?
(Hint: Consider the following reduction rules: Rule 1: If we have two consecutive marbles of the same colour, replace them by one with the sum of the weights. Call a colour good
if there is only one marble with this colour. Rule 2: Suppose two successive marbles both
have a good colour. Give the second the colour of the first. Apply these exhaustively. No
two successive marbles will be of the same colour, and no two successive marbles will have
a good colour. The number of marbles is at most twice (+1) the number with a bad colour.
This gives Rule 3: If there are at least 2k + 1 bad coloured marbles, say No. Finally for
a kernelization we also apply exhaustively Rule 4: If a marble has weight > k + 1, give it
weight k + 1.)
Exercise 9.4.8 (Downey and Fellows [DF13]) Prove that the following
problem is FPT.
Matrix Domination
Instance : An n × n matrix X with entries in {0, 1}.
Parameter: A positive integer k.
Question : Find a set C of at most k 1’s such that all the other nonzero
entries are in the same row or column as at least one member of C.
This technique was first introduced in a paper by Reed, Smith and Vetta
in 2004 [RSV04] and more or less re-discovered by Karp [Kar11]. Although
currently only a small number of results are known, it seems to be applicable
to a range of parameterized minimization problems, where the parameter is
the size of the solution set. Most of the currently known iterative compression
algorithms solve feedback set problems in graphs, problems where the task is
to destroy certain cycles in the graph by deleting at most k vertices or edges.
In particular, the k-Graph Bipartisation problem, where the task is to
find a set of at most k vertices whose deletion destroys all odd-length cycles,
has been shown to be FPT by means of iterative compression [RSV04]. This
had been a long-standing open problem in parameterized complexity theory.
Here is the iterative compression routine for our much attacked friend
Vertex Cover. It is slightly modified in that we begin with an empty cover,
but this is of no consequence. More importantly, in the algorithm below, if
there is no size k solution at some stage, this certified “no” is a hereditary
property. Therefore, as we only add vertices and edges, the instance cannot
become a “yes”; so we are safe to abort.
The routine is to add the vertices of the graph, one at a time; if the relevant
problem is of the compressible type, we will have a running solution for the
induced subgraph of size k or at some stage get a certificate that there is
no such solution for G. The algorithm modifies the current solution at each
compression stage to make another one for the next stage.
Algorithm 9.4.10 (Iterative Compression for Vertex Cover)
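The routine can be sketched as follows (an illustrative reconstruction of the idea, not the book's own pseudocode): vertices are added one at a time, and whenever the running cover reaches size k + 1 a compression step tries every way of keeping part of the old cover, covering the rest of the edges from outside it.

```python
from itertools import combinations

def compress(edges, cover, k):
    """Compression step: given a vertex cover `cover` of size k + 1 of the
    graph `edges`, look for a cover of size <= k (return it, or None)."""
    cover = set(cover)
    for r in range(len(cover) + 1):
        for keep in combinations(sorted(cover), r):
            keep, new, ok = set(keep), set(keep), True
            discard = cover - set(keep)
            for u, w in edges:
                if u in keep or w in keep:
                    continue
                if u in discard and w in discard:
                    ok = False           # this split cannot cover the edge
                    break
                # one endpoint discarded: the other endpoint must be taken
                new.add(w if u in discard else u)
            if ok and len(new) <= k:
                return new
    return None

def iterative_compression_vc(vertices, edges, k):
    """Add vertices one at a time, maintaining a size <= k cover of the
    induced subgraph; abort on a certified 'no' (VC is hereditary)."""
    cover, seen = set(), set()
    for v in vertices:
        seen.add(v)
        cur = [e for e in edges if set(e) <= seen]
        cover = cover | {v}              # still a cover after adding v
        if len(cover) > k:
            cover = compress(cur, cover, k)
            if cover is None:
                return None
    return cover
```

Each compression step tries at most 2^{k+1} ways of splitting the old cover and scans the edges once per split, which is where the f(k) · poly(n) running time comes from.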
The parametric tractability of the method stems from the fact that each
intermediate solution considered has size bounded by some k 0 = f (k), where
k is the parameter value for the original problem. It works very well with
monotone problems, where if we get an intermediate no then the answer is
definitely NO. Note that many minimization problems are not monotone in
this sense. For example, a NO instance (G = (V, E), k) for k-Dominating
Set can be changed to a YES instance by means of the addition of a single
vertex that is adjacent to all vertices in V .
Niedermeier [Nie06] has an excellent discussion of this technique, which
would seem to have a lot of applications.
9.4.6 Exercises
Exercise 9.4.11∗ (Reed, Smith, Vetta [RSV04]) (This is definitely not
easy. But I have given solutions.)
(i) Prove the following Lemma.
9.4 Parameterized Tractability 267
Edge Bipartization
Instance : A graph G = (V, E).
Parameter : An integer k.
Question : Can I delete k or fewer edges and turn G into a bipartite graph?
a vertex s ∈ V (G) and then for any v ∈ V (G), and i ∈ {1, . . . , k}, we define
The key to the efficiency is that Cs (i, v) stores sets of colours in paths
of length i − 1 between s and v, instead of the paths themselves.
Here χ(v) denotes the colour of v. Observe that |Cs (i, v)| ≤ (k choose i) and, for
Alon, Yuster and Zwick demonstrated that the colour coding technique
could be applied to a number of problems of the form asking “is G0 a subgraph
of G?”
What has this to do with parameterized complexity? It turns out that this
method can be used as an engine for FPT algorithms via a de-randomization
process. A k-perfect family of hash functions is a family F of functions (colour-
ings) taking [n] = {1, . . . , n} onto [k], such that for all S ⊆ [n] of size k there is an f ∈ F whose restriction to S is bijective (colourful). It is known that
k-perfect families of 2^{O(k)} log n linear time hash functions exist. Using the hash functions allows us to deterministically simulate the probabilistic algorithm by running through the hashings, and hence gives a deterministic 2^{O(k)} |E| log |V | algorithm for k-Path. More such applications can be found in Downey and Fellows [DF13], and Niedermeier [Nie06], and Cygan et al.
[CFK+ 16].
The O(k) in the exponent hides evil, and the de-randomization method
at present seems far from practical, since the hashing families only begin at
large numbers.
Note that the method does not work when applied to things like k-Clique to show membership of randomized FPT, because (i) above fails. The important part of the dynamic programming method was that a path was represented by its beginning v0 and some vertex vi , and to extend the path only needed local knowledge; namely the colours used so far and vi . This fails for Clique, which would need n^s states at step s.
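For k-Path, the dynamic programming over colour sets can be sketched as follows (illustrative code of my own; the deterministic version would replace the random colourings by a k-perfect family of hash functions):

```python
import random

def colourful_path_exists(vertices, adj, colouring, k):
    """Dynamic programming of colour coding: C[v] stores the colour sets
    of colourful paths ending at v (not the paths themselves)."""
    C = {v: {frozenset([colouring[v]])} for v in vertices}
    for _ in range(k - 1):
        new = {v: set() for v in vertices}
        for v in vertices:
            for u in adj[v]:
                for S in C[u]:
                    if colouring[v] not in S:        # stay colourful
                        new[v].add(S | {colouring[v]})
        C = new
    return any(C[v] for v in vertices)

def k_path(vertices, adj, k, trials=200):
    """Randomized k-path: if a k-vertex path exists, a random colouring
    makes it colourful with probability at least k!/k^k."""
    for _ in range(trials):
        col = {v: random.randrange(k) for v in vertices}
        if colourful_path_exists(vertices, adj, col, k):
            return True
    return False
```

Since the colours in a stored set are distinct, the vertices of the corresponding walk are distinct, so a nonempty C[v] really certifies a path; this locality is exactly what fails for Clique.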
There are many, many more methods used for proving problems FPT.
They involve methods using automata on graphs which are treelike, or path-
like, methods based on logic, methods based on “well-quasi-ordering” theory,
which can give proofs that certain problems are FPT and hence in P for a
fixed k, where we have no idea what the algorithm actually is (an idea going
back to Fellows and Langston [FL88]). We cannot do justice to these meth-
ods in a short chapter like this, and refer the reader to Downey and Fellows
[DF13], Flum and Grohe [FG06], Cygan et al. [CFK+ 16].
9.4.8 Exercises
Multidimensional Matching
Input : A set M ⊆ X1 × · · · × Xr , where the Xi are disjoint sets.
Parameter : The positive integer r and a positive integer k.
Question : Is there a subset M 0 ⊆ M with |M 0 | = k, such that no two
elements of M 0 agree in any coordinate?
Exercise 9.4.14 Show that the same is true for the following variations of
k-Path.
(i) k-Cycle which asks for a cycle of length at least k,
(ii) k-Exact Cycle which asks for a cycle of length exactly k, and
(iii) k-Cheap Tour which takes a weighted graph and asks for a tour of
length at least k and cost ≤ S. The parameters here are S and k.
x ∈ Ln . In the remaining part of the proof, we show that there exists such
an advice S ⊆ Lnc as required above.
We view A as a function mapping strings from (Ln )t to Lnc , and say a
string y ∈ Lnc covers a string x ∈ Ln if there exist x1 , . . . , xt ∈ Σ≤n with xi = x for some i, 1 ≤ i ≤ t, and with A(x1 , . . . , xt ) = y. Clearly, our goal is to find a polynomial-size subset of Lnc which covers all strings in Ln . By the pigeonhole principle, there is a string y ∈ Lnc to which A maps at least |(Ln )t |/|Lnc | = |Ln |t /|Lnc | tuples in (Ln )t . Taking the t-th root, this gives us |Ln |/|Lnc |1/t distinct strings in Ln which are covered by y. Hence, by letting t = lg |Lnc | = O(nc ), this gives us a constant fraction of the strings in Ln . It follows that we can repeat this process recursively in order to cover all strings in Ln with a polynomial (in fact, logarithmic) number of strings in Lnc . □
We invite the reader to prove this in Exercise 9.5.5. The parametric version
of distillation is the following.
What about AND composition and distillation? For a long time, the situ-
ation was as above. There was no “classical” evidence that AND-distillation
implied any collapse until a remarkable solution to this conundrum was an-
nounced by a graduate student from MIT10 .
Even though there are now shorter proofs, all proofs of Theorem 9.5.4
are complex and need concepts beyond the scope of the present book. AND-
composition can be used to show that many important problems such as
determining if a graph has “treewidth k” (meaning it has a certain struc-
ture widely applied in algorithmic graph theory) cannot have polynomial
sized kernels unless collapse occurs, in spite of having an FPT algorithm for
recognition of a fixed k.
9.5.1 Exercises
k log n Mini-Circsat
Input : Positive integers k and n in unary, and a Boolean circuit C of total description size n.
Parameter : A positive integer k.
Question : Is there any input vector x of weight ≤ k log n with C(x) = 1?
We are led to the following (which has higher level extensions to M [t] for
t > 1), by confining ourselves to boolean circuits of weft t.
10 Now a professor at the University of Chicago.
k log n Mini-H[t]
Input : Positive integers k, n and m, and a weft t Boolean circuit
C with k log n variables and of total description size m.
Parameter : A positive integer k.
Question : Is there any input vector x such that C(x) = 1?
It is not hard to see that M [1] ⊆ W [1]. Next we will need a very deep
result from computational complexity theory:
Theorem 9.6.2 (Cai and Juedes [CJ03], Downey et al. [DECF+ 03]).
Mini-3CNF Sat is M [1]-complete under parameterized Turing reductions.
Proof. One direction follows from the fact that a language decidable in time
2^{o(k log n)} is FPT.
For the other direction, suppose we are given a weft 1 Boolean circuit C of size N , and suppose that Mini-Circsat is solvable in FPT time f (k)n^c . Take k = f^{−1}(N ) and n = 2^{N/k} . Then, of course, N = k log n. For example, if f (k) = 2^{2^k} then f^{−1}(N ) = log log N . In general, k = f^{−1}(N ) will be some slowly growing function of N , and therefore N/k = o(N ), and also cN/k = o(N ) since c is a constant, and furthermore by trivial algebra cN/k + log N = o(N ). Using the FPT algorithm, we thus have a running time of
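The running time being referred to can be reconstructed from the quantities just defined (a sketch): since k = f^{−1}(N) gives f(k) = N, and n = 2^{N/k},

```latex
f(k)\, n^{c} \;=\; N \cdot \bigl(2^{N/k}\bigr)^{c} \;=\; 2^{\log N + cN/k} \;=\; 2^{o(N)},
```

which is subexponential in the circuit size N, using cN/k = o(N) and log N = o(N) as noted above.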
It is possible to show that many of the classical reductions work for the
miniaturized problems, which miniaturize the size of the input and not some
part of the input; meaning we can often re-cycle classical proofs. Such meth-
ods allow us to show the following.
Corollary 9.6.1.
1. The following are all M [1]-complete under fpt Turing reductions: Mini-SAT, Mini-3SAT, Mini-d-Colourability, and Mini-Independent Set.
2. Hence neither Mini-Vertex Cover nor any of these can have a subex-
ponential time algorithm unless the ETH fails.
Theorem 9.6.4 (Cai and Juedes [CJ01, CJ03]). Assuming ETH (equivalently M [1] ≠ FPT) there is no O^∗(2^{o(k)}) algorithm for k-Path or Feedback Vertex Set, and no O^∗(2^{o(√k)}) algorithm for Planar Vertex Cover, Planar Independent Set, Planar Dominating Set, and Planar Red/Blue Dominating Set.
We can even refine this further to establish exactly which level of the M -hierarchy classifies the subexponential hardness of a given problem. Consider, for example, Independent Set and Dominating Set, which certainly are in XP. But what is the best exponent we can hope for for slice k?
In some sense we have returned to the dream we started with for com-
putational complexity in Chapter 6; we can prove algorithms more or less
optimal subject to complexity assumptions. We refer the reader to Downey
and Fellows [DF13], Chapter 29, for more on this fascinating story.
9.6.1 Exercises
10.1 Introduction
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2024 281
R. Downey, Computability and Complexity, Undergraduate Topics
in Computer Science, https://doi.org/10.1007/978-3-031-53744-8_10
10.2.2 Exercises
Prove:
(i) For m > 2, the problem is NP-complete.
(ii) The problem has an approximation algorithm to within a constant.
c(I, f(I)) / c(I, OPT(I)).
The idea is that even if we cannot solve the problem exactly, maybe we can
do no worse than (e.g.) double the true solution. One of the classic constant
ratio approximation algorithms is for the problem Bin Packing, which we
proved NP-complete in Theorem 7.2.25. We recall that it is the following
problem:
Bin Packing
Input : A finite set S, volumes w : S → N, bin size B ∈ N, k ∈ N.
Question : Is there a way to fit all the elements of S into k or fewer bins?
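A classic constant-ratio heuristic for this problem is First Fit, which never uses more than twice the optimal number of bins (at most one bin can end up at most half full). A minimal sketch (our own code, not the book's):

```python
def first_fit(weights, B):
    """Pack items into bins of capacity B with the First Fit heuristic.

    Each item goes into the first already-open bin with room for it;
    otherwise a new bin is opened."""
    bins = []  # each bin is a list of item weights
    for w in weights:
        for b in bins:
            if sum(b) + w <= B:  # first bin with room
                b.append(w)
                break
        else:
            bins.append([w])     # no bin fits: open a new one
    return bins

print(len(first_fit([5, 4, 3, 2, 2], 8)))  # 2 bins, which is optimal here
```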
1 The actual running time of the best algorithm is O(√(nm)).
In Exercise 10.2.9, we invite the reader to verify that this lemma holds for
one of the reducibilities ≤_AP.
Using these notions of reduction, we can now define APX-R-complete optimization
problems as those whose decision problems are NP-complete, which have
constant approximation algorithms, and to which every other such language R-reduces.
The following problem is APX-complete for appropriate reductions R.
Detailed proofs appear in, for example, Vazirani [Vaz01] or Ausiello et al.
[ACG+99].
If G has a Hamilton cycle then the optimal solution is |V(G)|, and if not then
the optimal solution must have weight > c · |V|. The competitive ratio allows
us to use this to show that Hamilton Cycle is in P. □
10.2.5 Exercises
10.2.6 PTAS’s
(1 + ε)|Q| ≤ |S|.
In §9.3.4, we met PTAS's and showed that some problems have PTAS's which
are quite infeasible assuming W[1] ≠ FPT. As we mentioned there, some
problems do have quite feasible PTAS's, and there is quite a rich literature
around this area. We will limit ourselves to an analysis of the following
problem, a variation of Partition, which we met in Chapter 7.
Minimum Partition
Instance : A finite set X of items and for each x_i ∈ X a weight a_i.
Solution : A partition Y_1 ⊔ Y_2 = X.
Cost : max{∑_{x_i ∈ Y_1} a_i, ∑_{x_i ∈ Y_2} a_i}.
The following result is folklore; at least, it is very hard to track down who
proved it first.
The running time of the algorithm is polynomial since, first, we need to sort the
items, taking O(n log n) (as is well-known). The preliminary optimal solution
takes n^{k(r)}; since k(r) is O(1/(r−1)), we get that the running time is O(n log n +
n^{k(r)}). □
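The scheme just analysed can be sketched as follows; this is our own rendering, not the book's code, with the parameter k playing the role of k(r): brute-force the k largest items, then place each remaining item greedily on the lighter side.

```python
from itertools import product

def min_partition(weights, k):
    """PTAS-style sketch for Minimum Partition: try every assignment of the
    k largest items exactly, finish each one greedily, keep the best cost."""
    ws = sorted(weights, reverse=True)
    head, tail = ws[:k], ws[k:]
    best = None
    for mask in product([0, 1], repeat=len(head)):  # 2^k exact assignments
        sides = [0, 0]
        for w, side in zip(head, mask):
            sides[side] += w
        for w in tail:  # greedy: each remaining item to the lighter side
            sides[sides.index(min(sides))] += w
        cost = max(sides)
        if best is None or cost < best:
            best = cost
    return best

print(min_partition([8, 7, 6, 5, 4], 2))  # 15, a perfect split of total 30
```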
Definition 10.2.4.
terval representation. It would take us too far afield to look into properties
of interval graphs and bounded pathwidth graphs, so we refer the reader either
to Diestel's excellent book [Die05] for a good treatment, or perhaps to Downey
and Fellows [DF13], which gives a reasonable source for the main ideas. It is
known that determining the pathwidth of a graph is NP-complete, but also
FPT for a fixed k (Bodlaender [Bod96]). However, the algorithm is horrible,
and for a fixed k takes O(2^{35k^3}|G|). Cattell, Dinneen, and Fellows [CDF96]
produce a very simple algorithm based on "pebbling" the graph using a pool
of O(2^k) pebbles, that in linear time (for fixed k) either determines that the
pathwidth of a graph is more than k or finds a path decomposition of width
at most the number of pebbles actually used. The main advantages of this
algorithm over previous results are (1) the simplicity of the algorithm and (2)
the improvement of the hidden constant for a determination that the pathwidth
is greater than k. The algorithm works by trying to build a binary tree
T as a topological minor (cf. Exercise 7.2.32) of G (represented by pebbles
which are slid along the edges), or builds a path decomposition of width 2^k.
It is known that G has pathwidth k iff it does not have the complete binary
tree as a topological minor. Again, it would take us a bit too far from our
main goals to look at the detailed proof of this result. Hence we only state it
below:
10.2.9 Exercises
Exercise 10.2.13 (i) Show that G is an interval graph of width k iff its
maximal cliques have size ≤ k + 1 and can be ordered M_1, . . . , M_p so
that every vertex belonging to two of these cliques also belongs to all
cliques between them.
(ii) Hence, or otherwise, show that being an interval graph of width ≤ k
can be decided in polynomial time.
The idea behind this approach is to figure out which problems have algorithms
which are hard/easy for "typical" instances. Most problems seem, in
practice, to have lots of easy instances. How do we capture this idea? Problems
which are not in P will have an infinite complexity core of hard instances (Exercise
10.3.3), but these can be distributed in many different ways. We have
already seen this in the Buhrman and Hitchcock Theorem, which showed
that under the assumption that co-NP ⊄ NP/Poly, NP m-complete
problems have exponentially dense hard instances.
Pioneered by Leonid Levin [Lev86], authors looked at what it means for a
problem to be easy on average. By way of motivation, consider the problem
of graph colouring in k colours. Suppose that we posit that the graphs with n
vertices are distributed uniformly at random. (That is, if G is one of the 2^{C(n,2)}
graphs with n vertices, the probability of G being randomly chosen is 2^{-C(n,2)}. This
is the standard "coin flip" probability.) Then, if we simply use backtracking,
the expected number of steps turns out to be bounded by a constant.
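The backtracking procedure just mentioned can be sketched as follows (a minimal version, with the graph given as an adjacency dictionary; this encoding is our own choice). On dense random graphs, branches typically die very quickly, which is the intuition behind the good average case behaviour.

```python
def k_colourable(adj, k):
    """Backtracking k-colouring: colour vertices 0..n-1 in order, pruning a
    branch as soon as an already-coloured neighbour has the same colour."""
    n = len(adj)
    colour = [None] * n

    def extend(v):
        if v == n:
            return True
        for c in range(k):
            # only neighbours u < v are coloured so far
            if all(colour[u] != c for u in adj[v] if u < v):
                colour[v] = c
                if extend(v + 1):
                    return True
        colour[v] = None  # dead end: backtrack
        return False

    return extend(0)

triangle = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
print(k_colourable(triangle, 2), k_colourable(triangle, 3))  # False True
```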
2 Narayanaswamy and Subhash Babu [NSB08] gave an approximation ratio of 8. We remark
that in 1988, Chrobak and Ślusarek [CS88] gave a lower bound of 4.4. Finally, in 2016,
Kierstead, Smith, and Trotter [KST16] showed that the performance ratio of first fit
on interval graphs is at least 5. The most recent word is by Dujmović, Joret and Wood
[DJW12], who have shown that first-fit will colour graphs of pathwidth k (i.e. not just
interval graphs) with at most 8(k + 1) many colours. The precise best bound is not yet
known.
3 After Kierstead and Trotter [KT99], who proved this for interval graphs.
μ(t(x) > d) ≤ |x|^c / d^ε.

That is, after running for t(x) many steps, the algorithm solves all but an
|x|^c / t(x)^ε fraction of the inputs of length |x|. The reader should note that the use
of ε means that we make the definition machine independent.
To complete our notion of average case, we need to say what typical means
for inputs. Thus we will need to specify a pair (L, μ) where μ is a probability
distribution with associated probability Prob_μ = Prob. Since it seems foolish
to define this for distributions which take a long time to compute, we use
distributions that we can compute, and compute quickly on average. The
reader should note that most distributions are, in fact, real-valued, but we
can ask that they be approximable in polynomial time.
Definition 10.3.3.
We will omit the somewhat technical proof, although we invite the reader
to prove Lemma 10.3.1 in Exercise 10.3.5.
The art here is finding the relevant distributions where a problem becomes
tractable. Alternatively, we can formulate the problem so that it is only con-
cerned with instances where the problem becomes average case tractable.
An illustration of this is the paper [DTH86] by Dieu, Thanh and Hoa. They
consider the problem below.
Clique(n^e)
Input : A graph G and integers n, k.
Question : Does G have n vertices, ≤ n^e edges, and a clique of size k?
Theorem 10.3.1 (Dieu, Thanh and Hoa [DTH86]). For 0 < e < 2,
Clique(n^e) ∈ AvgP.
Dieu, Thanh and Hoa also show that the problem is NP-complete. Their
proof actually shows that the algorithm of Tsukiyama, Ide, Ariyoshi and
Shirakawa [TMAS77] runs in polynomial time on average. Unsurprisingly,
the methods are taken from probability theory and are sufficiently complex
that we will omit them and refer the reader to that paper.
Actually, this example shows one of the main drawbacks of the theory.
It seems that applying the theory involves some complicated mathematics,
and there are not too many genuine applications in the literature. Although
average case complexity seems a fine idea in theory, as a practical coping
strategy, it has proven quite unwieldy.
Average case complexity also has a hardness theory. The next subsection
looks at this theory: when things seem hard on average.
10.3.2 DistNP
Levin proved that there are problems complete under this notion of membership
and reduction. It is an analog of our old friend we found in Theorem
7.2.1, which had echoes for other complexity classes: L = {⟨e, x, 1^n⟩ : some
computation of N_e accepts x in ≤ n steps}. Now we need to get the distribution
involved, and this seems somewhat artificial.
We invite the reader to prove that such an h exists in Exercise 10.3.7. Livne
then defines a new polynomial time m-reduction f, in place of h, which explicitly
codes z into f(z). For example, if we use an underlying lexicographic ordering
on formulae, and consider the formulae e_0 = (x_0 ∨ ¬x_0), e_1 = (x_1 ∨ ¬x_1),
so that (the coding of) e_0 ≤_lex e_1, with |e_0| = |e_1|, for w ∈ {0,1}*, Livne
defines

f(w) = e_{w_1} ∧ e_{w_2} ∧ · · · ∧ e_{w_{|w|}} ∧ h(w),

where w_i is the i-th bit of w. Then
• Given f(w) we can compute w.
• w ≤_lex ŵ implies f(w) ≤_lex f(ŵ).
• Since e_0 and e_1 are tautologies, f(w) preserves the truth value of h(w).
We call f an order preserving and polynomially invertible reduction from L
to Sat. Now we want to make a distributional version of Sat. We give it the
distribution ν via

ν(x) = μ(f^{-1}(x)) if x ∈ ra(f), and ν(x) = 0 otherwise.

Then (L, μ) ≤_m (Sat, ν), because f maps each instance of L to one of Sat,
and the probabilities are preserved⁴. It is also clear that ν is P-computable
as μ is.
As Livne observes, the proof above utilizes padding, and lexicographic
order. So all that is needed are similar methods for other problems to define
length preserving invertible reductions. He provides details for Clique and
Hamilton Cycle. It is relatively clear that we could also do this for all of,
for example, Karp’s original 21 problems.
10.3.4 Exercises
Exercise 10.3.4 Prove that the distribution μ(x) = (6/π²)·|x|^{-2}·2^{-|x|} on {0,1}*
is computable and uniform.
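As a numerical sanity check on this exercise (assuming, as the formula forces, that the empty string gets weight 0, so |x| ≥ 1), one can verify that μ really is a probability distribution: the 2^n strings of length n together carry mass (6/π²)·n^{-2}, and ∑ 1/n² = π²/6.

```python
import math

# mu(x) = (6/pi^2) * |x|**-2 * 2**-|x|; each length n has 2**n strings, so
# the 2**n and 2**-n cancel in the total mass carried by length n.
def length_mass(n):
    return (6 / math.pi**2) / n**2

total = sum(length_mass(n) for n in range(1, 100_000))
print(round(total, 4))  # close to 1, since sum of 1/n^2 is pi^2/6
```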
4 The reader might be somewhat dubious about whether this says a lot about "normal"
distributional versions of the combinatorial problems; but the point is that the methods
show that there is some distributional version.
One relatively recent initiative is an area called generic case complexity, which
resembles average case complexity but has a somewhat different flavour.
Generic case complexity is still largely undeveloped. Its motivation comes
from the idea of an algorithm being essentially good provided that it works
well once we completely ignore a small collection of inputs⁵. Its
origin was combinatorial group theory. We have seen in §4.3.5 that there
are finitely presented groups with unsolvable word problems. However, in
practice, it seems that you can solve the word problem in a typical finitely
presented group on a laptop. In particular, the reader might imagine that
we might have a partial algorithm for a problem which, for instance, might
not halt on a very small set of inputs, but might run very quickly on the
remainder. This partial algorithm would have no "average" running time, as
the running time would not even be defined for such a partial algorithm. This
is the idea behind generic case complexity, defined formally below. So far,
generic case complexity has proven rather simpler to apply than average
case complexity. This is because generic case feasibility proofs rely
on less heavy duty probability theory, at least to this author's knowledge.
5 There have been other examples of this in numerical analysis, such as Amelunxen and
Lotz [AL17], which is entitled “Complexity theory without the black swans” which comes
across as amusing to me, as I originate from Australia, where for the most part swans are
black.
is generic in I.
(ii) L is said to be in SGP, strongly GenP, if
is strongly generic in I.
Here is one example drawn from the unpublished note [GMMU07]. Recall
that Subset Sum is NP-complete by Theorem 7.2.24. Subset Sum is theoretically
difficult as it is NP-complete, but easily solved in practice. We will
consider the following variant.

Subset Sum 2
Input : A finite set S = {w_1, . . . , w_n} ⊂ N, and a target integer c.
Question : Is there a subset S′ ⊆ S such that ∑_{w ∈ S′} w = c?

If ∑_{k=1}^{j−1} w_k = c, Accept.
as n → ∞. □
With a wee bit more work, it is possible to eliminate the use of parameters
to see that Subset Sum 2 is in GenP. This is an interesting example, in
that it is observed in [GMMU07] that the measure of negligible strings only
goes to 0 rather slowly.
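The shape of such a partial algorithm can be sketched as follows (this is our own rendering of the idea, not the code of [GMMU07]): answer quickly in the cases that cover a generic set of inputs, and stay silent otherwise.

```python
def prefix_sum_test(weights, c):
    """Partial decision procedure for Subset Sum 2 (sketch): answer True if
    some prefix of the list already sums to c, False if c exceeds the total,
    and None (no answer) otherwise, where one would fall back to an
    exhaustive method."""
    total = 0
    for w in weights:
        if total == c:
            return True   # a prefix hits the target exactly
        total += w
    if total == c:
        return True
    if c > total:
        return False      # no subset can reach c
    return None           # the partial algorithm gives no answer

print(prefix_sum_test([3, 5, 2], 8))   # True  (3 + 5)
print(prefix_sum_test([3, 5, 2], 99))  # False (exceeds the total)
```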
Here B_n is the collection of words of length ≤ n. This works, in fact, for any
finitely generated group.
Here is an example of a strongly generic algorithm in this area. To make
sense of this example, the reader will need to be familiar with some combinatorial
group theory for the correctness proof. The reader unfamiliar with
the concepts below should simply skip the details. Let G = ⟨a, b | R⟩ be any
2-generator group. It is a fact from combinatorial group theory (see [LS01])
that any countable group is embeddable in a 2-generator group. Thus, there
are uncountably many such G! Let F = ⟨x, y | ⟩ be the free group of rank 2.
Define H = G ∗ ⟨x, y⟩ := ⟨a, b, x, y; R⟩, that is, the free product of G and F.
Theorem 10.4.2 (Kapovich, Myasnikov, Schupp, Shpilrain [KMSS03]).
The word problem for H is strongly generically solvable in linear time.
Proof. (Sketch) Take a long word w on the alphabet {a, b, x, y}^{±1}, e.g.
abx^{-1}bxyaxbby.
Erase the a, b symbols, freely reduce the remaining word on {x, y}^{±1}, and
if any letters remain, Reject.
This partial algorithm gives no incorrect answers because if the image of
w under the projection homomorphism to the free group F is not 1, then
w ≠ 1 in H. For example,

abx^{-1}bxyaxbby → x^{-1}xyxy → yxy ≠ 1.

The successive letters on {x, y}^{±1} in a long random word w ∈ H form a long
random word in F, which is not equal to the identity. So the algorithm rejects
on a strongly generic set, and gives no answer if the image in F is equal to
the identity. □
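The projection-and-reduce step in this proof can be sketched directly (our own encoding: a word is a list of letters, with capitals denoting inverses, so X stands for x^{-1}):

```python
def quotient_method(word):
    """Partial word-problem algorithm for H = G * F(x, y): project away the
    G-letters (a, b and their inverses), freely reduce over x, y with a
    stack, and Reject if anything survives."""
    stack = []
    for letter in word:
        if letter.lower() in ('a', 'b'):
            continue                         # apply the projection H -> F
        if stack and stack[-1] == letter.swapcase():
            stack.pop()                      # free cancellation, e.g. X then x
        else:
            stack.append(letter)
    return 'Reject' if stack else 'No answer'

# The example from the proof: the image yxy is not 1 in F, so w != 1 in H.
print(quotient_method(list('abXbxyaxbby')))  # Reject
```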
The above is called the quotient method, and can be used to show that the
word problem is generically solvable for any G = ⟨X, R⟩ which is a finite index
subgroup of a group K for which there is an epimorphism K → H onto a group
H that is hyperbolic and not virtually cyclic. (Don't worry about these terms if
they are not familiar. It is enough to know that these are important classes of
geometric groups.)
The methods of the paper [KMSS03] and follow-up papers allowed us
to show that the word problems are generically linear time decidable for
1-relator groups with ≥ 3 generators. 1-relator finitely presented groups famously
have solvable word problems by Magnus (see [LS01]). However, all
known algorithms use double exponential time. The paper [KMSS03] also
gives generic case algorithms for isomorphism problems and algorithms for
braid groups. (Again, don't worry if you are not familiar with these concepts
from group theory, as they only serve as illustrations of important problems
which have good generic case decidable solutions.) Certain finitely presented
groups with undecidable word problems have generic case decidable word
problems. For example, the Boone group, which we constructed to give one with an
unsolvable word problem, has a strongly generic case decidable word problem
([KMSS03]).
We remark that the algorithms above (in some sense) show the limitations
of the concept in that most of the generic case algorithms are trivial. Also
generic case complexity seems hard to apply to settings outside of group the-
ory. For example, we can define what it means for a first order theory under
a reasonable coding to be generically decidable with the obvious meaning.
One way to do this would be to count the number of possible well-formed
sentences of a given length (determined by counting the symbols in the ex-
pressions), and asking for an algorithm which generically decides if a sentence
is valid, according to this notion of density. However, many workers have in-
dependently observed that a generic case decidable theory with this notion
of density is decidable. (This is reasonably clear. If we have a natural size
measure for formulae, then to decide if a sentence X is true, we can keep
padding the sentence with X ∧ (Y ∨ (a ∨ ¬a)) for all longer sentences Y . This
will have positive density and hence at some stage the generic case algorithm
run upon these padded instances will need to answer yes or no, and it must
be correct. All we need to do is wait.)
Generic case complexity also has a hardness theory. Not surprisingly, a
“dense” variation of the halting problem has no algorithm which can generi-
cally decide it; and there is a GenP version of this.
We also remark that there are other variations on the theme of ignoring
inputs. For example, X is called coarsely computable if there is an algorithm
A which halts on all inputs and is correct on a set of density 1; and, combining the
two, there is a notion of coarsely generic. All finitely presented groups have
coarsely decidable word problems, with the density upon which they are correct
converging exponentially fast, and in linear time ([KMSS03]).
It is not yet clear how important this area is, as it remains in a state of
formation.
10.4.2 Exercises
In this last short section, we will give a brief account of one recent area
of computational complexity, called smoothed analysis, which has delivered
techniques which (sometimes) seem to explain why algorithms seem to work
better in practice than we would expect. We have seen some other explana-
tions in the two preceding sections on average case and generic case complex-
ity.
It is fair to say that the motivating example for smoothed analysis was an
analysis of the Simplex Algorithm for Linear Programming. The Simplex
Algorithm was created in 1947 by George Dantzig for solving linear
programming questions. These questions seek to optimize a set of solutions
to multivariable linear inequalities over R. The algorithm works by creating
a simplex of feasible solutions and then systematically searching for a solution
by traversing the simplicial edges⁶. Because of this it is known to take
exponential time in the worst case. But, in practice, it seems to almost always
run quickly, and for many years researchers were puzzled by that fact.
There were several attempts to explain the behaviour by average case analysis,
such as Smale [Sma83]. In 2001, Spielman and Teng [ST01] (see also [ST09])
suggested an explanation they called smoothed analysis. This suggestion has
attracted considerable interest in the last two decades. Spielman and Teng's
idea is that not only is the problem average case feasible, but also the distribution
of hard instances has a certain very nice feature: a hard instance
is "very close" to an easy one. That is, if we randomly perturb the problem,
then almost surely we will hit an easy instance.
The relevant definition is the following.
Notice that this is a mixture of worst case analysis, since we look at all
inputs I of length n, and average case, by looking at the expected behaviour
within neighbourhoods of each I.
6 It is not important for this motivating example to know the precise details, only to know
that this is one of the most important and used algorithms for optimization in existence.
|u^1 − v^1| + |u^2 − v^2| + |x^1 − y^1| + |x^2 − y^2| − |u^1 − y^1| − |u^2 − y^2| − |v^1 − x^1| − |v^2 − x^2|.

Claim. For a perturbed instance and ε > 0, the probability that there is an
ε-small swap is O(ε · n^4/d).
We first prove the claim. Considering all improving swaps, there are only
O(n^4) distinct swaps. We first establish a bound of O(ε/d) on the probability
that a fixed swap is ε-small. Consider the swap xy, uv as above. Then |u^1 −
v^1| + |u^2 − v^2| + |x^1 − y^1| + |x^2 − y^2| − |u^1 − y^1| − |u^2 − y^2| − |v^1 − x^1| − |v^2 − x^2|
is linear in 8 variables, with each variable having a coefficient from {−2, 0, 2}.
There are 576 = (4!)^2 linear combinations (i.e. all pairs in differing orders).
Thus, the expression is ε-small iff one of these combinations lies in (0, ε). For
each fixed linear combination, we claim that the probability that the linear
combination lies in (0, ε) is at most ε/(2d). The coefficients cannot all be 0 for
an ε-small improvement, so we can assume that at least one, say that of u^1,
is ±2. Fixing the values of the other 7 variables, the probability that the linear
combination, now of the form ±2u^1 plus a constant, lies in (0, ε) is at most
ε/(2d), since we are using the uniform distribution. Finally, taking a union
bound over the O(n^4) swaps, we get a bound of O(n^4 · ε/(2d)) = O(d^{-1}n^4 · ε),
as required to establish the claim.
Now we can finish the proof of the Theorem. We first note that in [0, 1] ×
[0, 1], the maximal distance between any two points in the ℓ_1 norm is 2,
and hence any cycle has length at most 2n. If there are no ε-small swaps, then
the local search will halt within 2n/ε many iterations. The worst case number
of iterations is bounded above by n!, as this is the number of different cycles.
We estimate how many iterations N are needed. Let ε = 2n/N. The expected
number of iterations is ∑_{N=1}^{n!} Prob[the number of iterations ≥ N]. This
sum is bounded above by ∑_{N=1}^{n!} Prob[there is a (2n/N)-small swap]. By the
Claim, this is

∑_{N=1}^{n!} O((2n/N) · (n^4/d)) = ∑_{N=1}^{n!} O(n^5/(N·d)).

Now,

∑_{N=1}^{n!} O(n^5/(N·d)) = O(d^{-1} · n^5 (1 + 1/2 + 1/3 + · · · + 1/n!)) = O(d^{-1} · n^5 log(n!)) = O(d^{-1} · n^6 log n),

using the fact that the harmonic sum up to n! is O(log(n!)) = O(n log n).
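The 2-opt local search analysed above can be sketched as follows; this is a minimal version of our own, for points in [0,1]² under the ℓ_1 metric, which repeatedly applies improving swaps until none remain.

```python
import random

def l1(p, q):
    """The l1 (Manhattan) distance between two points."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])

def tour_length(tour, pts):
    return sum(l1(pts[tour[i]], pts[tour[(i + 1) % len(tour)]])
               for i in range(len(tour)))

def two_opt(pts):
    """2-opt local search: replace edges (a,b) and (c,d) by (a,c) and (b,d),
    reversing the segment between them, whenever that shortens the tour."""
    tour = list(range(len(pts)))
    improved = True
    while improved:
        improved = False
        for i in range(len(pts) - 1):
            for j in range(i + 2, len(pts)):
                a, b = tour[i], tour[i + 1]
                c, d = tour[j], tour[(j + 1) % len(pts)]
                if a == d:
                    continue  # the two edges share an endpoint
                delta = (l1(pts[a], pts[c]) + l1(pts[b], pts[d])
                         - l1(pts[a], pts[b]) - l1(pts[c], pts[d]))
                if delta < -1e-12:  # strictly improving swap
                    tour[i + 1:j + 1] = tour[i + 1:j + 1][::-1]
                    improved = True
    return tour

random.seed(0)
pts = [(random.random(), random.random()) for _ in range(12)]
tour = two_opt(pts)
print(sorted(tour) == list(range(12)))  # True: still a valid tour
```

Each accepted swap strictly decreases the tour length, so the search terminates; the smoothed analysis above bounds how many such iterations we expect on a perturbed instance.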
10.6 Summary
In this chapter we gave brief and often sketchy snapshots of several methods of
coping with intractability. These were not meant to be complete treatments,
and we hope that the reader will be inspired to seek out fuller details for
themselves. One slightly dated but very interesting article is Impagliazzo
[Imp95], where Impagliazzo discusses the implications of the validity of
various complexity hypotheses about average case complexity, and proposes
"Five Worlds". As he says:
"In each such "world", we will look at the outcomes of these questions on algorithm
design for such areas as artificial intelligence and VLSI design, and for cryptography
and computer security".
11.1 Chapter 1
Exercise 1.2.2.
1. Use, for example, Gödel numbers. If Σ = {a_1, . . . , a_n} then let a_i be represented by i,
i.e. #(a_i) = i, and then a string x_1 . . . x_n could be represented by 2^{#(x_1)} 3^{#(x_2)} . . . (p_n)^{#(x_n)},
where p_n denotes the n-th prime.
2. The same technique works for Σ = {a_1, a_2, . . . }.
3. To specify the periodic sequence, you only need to specify the finite subsequence which
generates it.
4. For eventually periodic, you'd need to specify the finite initial segment, and then the
finite periodic part.
Exercises 1.2.3 and 1.2.4 can both be done with Gödel numbers, the first one specifying
the finite complement of a cofinite A, and the second the coefficients of the polynomial.
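The Gödel numbering in part 1 can be sketched as follows (the helper is our own; decoding is by unique factorization):

```python
def godel_number(string, alphabet):
    """Encode a string over alphabet {a_1,...,a_n}: letter a_i gets code i,
    and x_1...x_m is represented by 2^#(x_1) * 3^#(x_2) * ... * p_m^#(x_m)."""
    code = {a: i + 1 for i, a in enumerate(alphabet)}

    def primes():  # trial division by previously found primes
        found, p = [], 2
        while True:
            if all(p % q != 0 for q in found):
                found.append(p)
                yield p
            p += 1

    n, gen = 1, primes()
    for x in string:
        n *= next(gen) ** code[x]
    return n

print(godel_number("aba", "ab"))  # 2^1 * 3^2 * 5^1 = 90
```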
Exercise 1.2.5. This is purely algebraic. Is f onto? If x = 0, y = 0, then

f(x, y) = ½(0² + 2·0·0 + 0² + 3·0 + 0) = 0.

So if S ⊆ N and S = {m ∈ N : ∃x′, y′ ∈ N, f(x′, y′) = m}, then 0 ∈ S.
Suppose that k ∈ S, so ∃x′, y′ ∈ N : f(x′, y′) = k. Then what values for x and y do we
need so that f(x, y) = k + 1?
Either y′ = 0, or it isn't. If it is, then 2k = (x′)² + 3x′. What values for x and y are
needed so that

2(k + 1) = x² + 2xy + y² + 3x + y?

We can choose x = 0 and y = x′ + 1. Then
2k = (a + b)² + 3a + b = (c + d)² + 3c + d.
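The pairing function of this exercise can also be checked numerically; here f(x, y) = ½(x² + 2xy + y² + 3x + y), and the checks below (our own sanity test) confirm injectivity on a box and surjectivity onto an initial segment.

```python
def f(x, y):
    """The pairing function of Exercise 1.2.5:
    f(x, y) = (x^2 + 2xy + y^2 + 3x + y) / 2, always an integer."""
    return (x * x + 2 * x * y + y * y + 3 * x + y) // 2

pairs = [(x, y) for x in range(60) for y in range(60)]
values = {f(x, y) for x, y in pairs}
print(len(values) == len(pairs))              # injective on this box
print(all(m in values for m in range(1000)))  # onto an initial segment
```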
11.2 Chapter 2
Exercise 2.2.2 1. L = L(α) with α = b* ∪ (b*ab*) ∪ (b*ab*ab*) ∪ (b*ab*ab*ab*).
2. L = L(β) with β = (a*ba*ba*)*.
Exercise 2.2.6 1. (a*b*)*(b*a*)* = (a ∪ b)*. Prove this by induction. Clearly λ is in
both. If σ ∈ (a ∪ b)*, then σa ∈ (a*b*)*(b*a*)*, since the (b*a*)* allows one more a, and
similarly for b.
Exercise 2.4.5 I strongly suggest you draw a diagram:
Q_0 = {q_0}, (Q_0, a) ⊢ {q_0, q_1} = Q_1, (Q_0, b) ⊢ {q_3} = Q_2, (Q_1, a) ⊢ {q_0, q_1, q_2, q_3} =
Q_3, (Q_1, b) ⊢ {q_1, q_3} = Q_4, (Q_2, a) ⊢ {q_0} = Q_0, (Q_2, b) ⊢ Q_2, (Q_3, a) ⊢ Q_3, (Q_3, b) ⊢
{q_3, q_1, q_2} = Q_5, (Q_4, a) ⊢ {q_0, q_3, q_2} = Q_6, (Q_4, b) ⊢ Q_4, (Q_5, a) ⊢ Q_6, (Q_5, b) ⊢
Q_5, (Q_6, a) ⊢ {q_0, q_1, q_2} = Q_7, (Q_6, b) ⊢ {q_3, q_2} = Q_8, (Q_7, a) ⊢ Q_3, (Q_7, b) ⊢ Q_5, (Q_8, a) ⊢
{q_0, q_2} = Q_9, (Q_8, b) ⊢ Q_8, (Q_9, a) ⊢ Q_7, (Q_9, b) ⊢ {q_3, q_2} = Q_8. K = {Q_0, . . . , Q_9},
F = {Q_0, . . . , Q_7}.
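The determinization carried out by hand above is an instance of the general subset construction, which can be sketched as follows (the dictionary encoding of the NFA, and the small example, are our own):

```python
def subset_construction(nfa, start, alphabet):
    """Determinize an NFA given as a dict {(state, symbol): set of states}.
    Each DFA state is a frozenset of NFA states, discovered on demand."""
    start_set = frozenset([start])
    seen, todo, delta = {start_set}, [start_set], {}
    while todo:
        S = todo.pop()
        for a in alphabet:
            # union of the NFA moves out of every state in S
            T = frozenset(q for s in S for q in nfa.get((s, a), ()))
            delta[(S, a)] = T
            if T not in seen:
                seen.add(T)
                todo.append(T)
    return delta, seen

# Tiny example NFA: on 'a', state 0 may stay or advance; 'b' loops on 1.
nfa = {(0, 'a'): {0, 1}, (1, 'b'): {1}}
delta, states = subset_construction(nfa, 0, 'ab')
print(len(states))  # 4 reachable subset-states, including the empty set
```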
(Diagram: the resulting DFA, with states q_0, q_1, q_2 and a trap state q_b, and transitions on a and b.)
11.3 Chapter 3
Exercise 3.2.7 Quadruples needed for a Turing machine to run on a planar tape are as
follows:
hqi , xi , Ai,j , qi,j i where qi , qi,j ∈ {q0 , . . . , qm } with q0 the initial state; xi ∈ Σ; Ai,j ∈
Σ ∪ {N, S, E, W }.
A quadruple is interpreted as: if I am in state qi reading symbol xi then perform action
Ai,j by writing a symbol in Σ or moving in one of the directions north, south, east or west,
and then go to state qi,j .
The following list of quadruples is a program for covering an empty planar tape with
the symbol A. It works by moving around in a clockwise spiral.
{hq0 , B, A, q0 i, hq0 , A, S, q1 i, hq1 , A, N, q2 i, hq1 , B, A, q3 i, hq2 , A, E, q0 i, hq3 , A, W, q4 i,
hq4 , A, E, q5 i, hq4 , B, A, q6 i, hq5 , A, S, q5 i, hq5 , B, A, q3 i, hq6 , A, N, q7 i, hq7 , A, S, q8 i,
hq7 , B, A, q9 i, hq8 , A, W, q8 i, hq8 , B, A, q6 i, hq9 , A, E, q10 i, hq10 , A, W, q11 i,
hq10 , B, A, q0 i, hq11 , A, N, q11 i, hq11 , B, A, q9 i}
Quadruples 1 to 5 move east writing the symbol A in blank squares until finding a blank
square to the south. Similarly, quadruples 6 to 10 move south writing A's until finding a
blank square to the west. Quadruples 11 to 15 move west writing A's until detecting a blank
row to the north, and quadruples 16 to 20 write A's north until finding a blank square to
the east. The process then repeats forever.
Exercise 3.3.1
(i) Use recursion. f(a, 0) = 1 = P(S(Z(a))). f(a, b + 1) = f(a, b) · a. (We have already
shown that multiplication is primitive recursive.)
Use induction to prove that it works. For the induction step, if we assume f(a, b) = a^b,
then f(a, b + 1) = f(a, b) · a = a^b · a = a^{b+1}, as required.
(ii) Use recursion again. f(0) = 1, f(a + 1) = (a + 1) · f(a), etc.
Exercise 4.1.3 These are all similar. You can convert the regular languages into deterministic
finite automata. Then, for example, asking if L(M) = ∅ is asking whether there is
a string σ accepted by the machine. Since there are only n states for fixed M, if M accepts
anything it will accept something of length ≤ n, as the longest path without repetitions
will have length n. Thus, at worst try all strings of length ≤ n.
Exercise 4.1.5 Let f (x) = ϕx (x) + 1 if ϕx (x) ↓. Then f is partial computable, by the
universal Turing Machine. Suppose that g is a computable function extending f . Then
there is some z with g = ϕz . But then as g is total, g(z) ↓ . But by definition of f ,
f (z) = ϕz (z) + 1. However, since g extends f , ϕz (z) = g(z) = f (z) = ϕz (z) + 1, a
contradiction.
Exercise 4.2.6 A Minsky machine to compute the function f(x) = x² + 1 is:
(Here I am using the shorthand R_i − p, q to mean "If the content of R_i is > 0, subtract 1
and go to instruction p; if it is = 0, go to the instruction labeled q." The "take 1" option
thus happens when going to instruction p.)
0. R1 − 2, 11
1. R1 − 2, 4
2. R2 + 3
3. R3 + 1
4. R3 − 5, 10
5. R2 − 6, 8
6. R1 + 7
7. R4 + 5
8. R4 − 9, 4
9. R2 + 8
10. R2 − 10, 11
11. R1 + 12
12. Halt.
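One can check this program with a small simulator (our own code, assuming the convention just stated: the decrement is performed when the positive branch is taken):

```python
def run_minsky(program, registers, pc=0, fuel=10**6):
    """Simulate a Minsky machine.  Instructions are ('inc', i, next) and
    ('dec', i, if_pos, if_zero): if R_i > 0, decrement it and jump to
    if_pos, else jump to if_zero.  'halt' stops the machine."""
    R = dict(registers)
    for _ in range(fuel):
        ins = program[pc]
        if ins == 'halt':
            return R
        if ins[0] == 'inc':
            _, i, nxt = ins
            R[i] = R.get(i, 0) + 1
            pc = nxt
        else:
            _, i, if_pos, if_zero = ins
            if R.get(i, 0) > 0:
                R[i] -= 1
                pc = if_pos
            else:
                pc = if_zero
    raise RuntimeError('out of fuel')

# The program above, instruction by instruction:
square_plus_one = [
    ('dec', 1, 2, 11), ('dec', 1, 2, 4), ('inc', 2, 3), ('inc', 3, 1),
    ('dec', 3, 5, 10), ('dec', 2, 6, 8), ('inc', 1, 7), ('inc', 4, 5),
    ('dec', 4, 9, 4), ('inc', 2, 8), ('dec', 2, 10, 11), ('inc', 1, 12),
    'halt',
]

for x in range(5):
    assert run_minsky(square_plus_one, {1: x})[1] == x * x + 1
print('ok')
```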
A rational game for f (x) = x2 + 1 derived from the above vector game is:
Finally, the relevant Collatz function is the following. (To simplify the calculation, I
will use the least common multiple of the denominators instead of their product as the
base p.)
Let p = 2 · 3 · 5 · 7 · 11^{12} · 13. The value of f(n) is
1. (11/13)·n if n = 13k mod p for some k.
2. (1/11^{12})·n if n = 11^{12}k mod p for some k and the previous case does not apply.
3. ((2·13^{12})/11^{11})·n if n = 11^{11}k mod p for some k and the previous cases do not apply.
4. (13^{10}/(3·11^{10}))·n if n = (3·11^{10})k mod p for some k and the previous cases do not apply.
5. (13^{11}/11^{10})·n if n = 11^{10}k mod p for some k and the previous cases do not apply.
6. ((3·13^{8})/11^{9})·n if n = 11^{9}k mod p for some k and the previous cases do not apply.
7. (13^{9}/(7·11^{8}))·n if n = (7·11^{8})k mod p for some k and the previous cases do not apply.
8. (13^{4}/11^{8})·n if n = 11^{8}k mod p for some k and the previous cases do not apply.
9. ((7·13^{5})/11^{7})·n if n = 11^{7}k mod p for some k and the previous cases do not apply.
10. ((2·13^{7})/11^{6})·n if n = 11^{6}k mod p for some k and the previous cases do not apply.
11. (13^{6}/(3·11^{5}))·n if n = (3·11^{5})k mod p for some k and the previous cases do not apply.
12. (13^{8}/11^{5})·n if n = 11^{5}k mod p for some k and the previous cases do not apply.
13. (13^{5}/(5·11^{4}))·n if n = (5·11^{4})k mod p for some k and the previous cases do not apply.
14. (13^{10}/11^{4})·n if n = 11^{4}k mod p for some k and the previous cases do not apply.
15. ((5·13)/11^{3})·n if n = 11^{3}k mod p for some k and the previous cases do not apply.
16. ((3·13^{3})/11^{2})·n if n = 11^{2}k mod p for some k and the previous cases do not apply.
17. (13^{2}/(2·11))·n if n = (2·11)k mod p for some k and the previous cases do not apply.
18. (13^{4}/11)·n if n = 11k mod p for some k and the previous cases do not apply.
19. (13^{2}/2)·n if n = 2k mod p for some k and the previous cases do not apply.
20. 13^{11}·n if none of the previous cases apply.
Exercise 4.3.11 √a is rational iff ∃x∃y((x + 1)² − a(y + 1)² = 0).
Exercise 4.3.12 Suppose S is Diophantine: a ∈ S ⟺ (∃x)[p(a, x) = 0]. Define q(a, x) =
(a + 1)(1 − 2p(a, x)²) − 1. Then if p(a, x) = 0, then q(a, x) = (a + 1)(1 − 2·0) − 1 = a. On
the other hand, if p(a, x) ≠ 0, then 1 − 2p(a, x)² < 0, so q(a, x) < 0. Therefore
S = ra(q) ∩ N.
Exercise 4.3.14 ` = 6, x = 2, s = 7 so that Q = 32.
2 2 1 1 0 0 0 R1
1 0 0 0 0 1 0 L0
0 1 0 0 0 0 0 L1
0 0 1 0 0 0 0 L2
0 0 0 1 0 1 0 L3
0 0 0 0 1 0 0 L4
0 0 0 0 0 0 0 L5
1 0 0 0 0 0 1 L6
11.4 Chapter 5
Exercise 5.1.3 Suppose that A is infinite and c.e. Then there is a computable f with
f(N) = A. We can "thin f down" as we know that A is infinite. Thus we can define
a new function g with g(0) = f(0) and g(n + 1) = f(s), where s is least with
f(s) ∉ {g(0), . . . , g(n)}. Then g is 1-1.
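The "thinning down" in this solution can be sketched as a generator (the example enumeration f below is hypothetical, chosen to have many repetitions):

```python
def thin(f):
    """Turn an enumeration f of an infinite set into a 1-1 enumeration g of
    the same set, by skipping values that have already appeared."""
    def gen():
        seen = set()
        n = 0
        while True:
            x = f(n)
            if x not in seen:   # first time we see x: output it
                seen.add(x)
                yield x
            n += 1
    return gen()

# A hypothetical enumeration of the even numbers, each value repeated thrice:
g = thin(lambda n: (n // 3) * 2)
print([next(g) for _ in range(5)])  # [0, 2, 4, 6, 8]
```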
Exercise 5.1.4 1. Suppose that A = ra(f) with f computable and increasing. Then
to decide if x ∈ A, that is, to compute χ_A(x), compute {f(0), . . . , f(x + 1)}. Then x ∈ A iff x ∈
{f(0), . . . , f(x + 1)}.
2. Let A be c.e. and infinite, with A = ra(g) and g computable. By waiting, we can define a
computable increasing function f with f(0) = g(0) and f(n + 1) = g(m), where m is least with g(m) > f(n).
Then define B = ra(f).
Exercise 5.3.4 Actually, the easiest proof is to take f (x) = x2 and to take an index e
with ϕe = f . then ϕe (e) = e2 . The original proof I had was the following, and a similar
proof would show that there is an index e with ϕe (e) = e2 and ϕ(x) ↑ for x 6= e using
g(x, y) = y 2 if x = y and ↑, else. Define g(x, y) = y 2 for all x. Use the s-m-n theorem to
get computable s with g(x, ·) = ϕs(x) (·). Now apply the Recursion Theorem to get a fixed
point, ϕs(z) = ϕz . Let e = s(z). Then g(e, e) = ϕs(e) (e) = ϕe (e) = e2 .
Exercise 5.6.3. (Sketch) Take the proof of the Limit Lemma and observe that if Φ is a
wtt functional, then it can change its mind on x at most 2^{ϕ(x)+1} many times. For the
other direction, use the same proof but observe that once you know the computable mind
change bound, then you can bound the search in B. Since B is c.e., B ≤_m K_0, so we can
bound the use in the halting problem represented by K_0.
Exercise 5.6.4 (Sketch) The normal proofs from a course in calculus can easily be made
computable. This is because the relationship between ε and δ is given by explicit functions.
Exercise 5.7.2 Build A = lims As by finite extension to meet

Re : ∃x[Φe^(A\{x}) (x) ≠ A(x)].

Having met Rj for j < e and given As , we pick a large fresh number x and ask ∅′ whether
there is a string σ extending As with

Φ^(σ̂)_(s+1) (x) = 1?

Here σ̂ is the same as σ except σ̂(x) = 0. If the answer is yes, let As+1 = σ̂. If the
answer is no, let As+1 be the extension of As of length x + 1 which is all 0 except at x,
where we have As+1 (x) = 1. In either case we meet Re .
Exercise 5.7.3 (Sketch) (i) Let Ve be the e-th c.e. set of strings. Meet a requirement Re
which meets or avoids Ve : at stage s, see if there is a σ with As σ ∈ Ve ; if so, let As+1 = As σ, meeting Re .
(ii) If A were c.e. then, as it is infinite, it would have an infinite computable subset B = {b0 < b1 < . . . }.
Consider the collection of strings V = {τj : |τj | = bj + 1 ∧ τj (bj ) = 0, j ∈ N}. Then for all
σ ≺ A, there is a τj ∈ V with σ ⪯ τj , but for no j is τj ≺ A.
Exercise 5.7.10 1. In the same way as we turned Kleene–Post into a priority argument,
turn the solution of Exercise 5.7.2 into a priority argument.
2. Suppose that A = A1 ⊔ A2 with Γ^(A1) = A2 and ∆^(A2) = A1 . We build an autoreduction Φ
for A. Run the enumerations of Ai,s so that at every stage As ↾ ξ(s) = (A1,s ⊔ A2,s ) ↾ ξ(s), where
ξ(s) = max{j ≤ s : γ(j)[s], δ(j)[s] ↓}. Now to compute whether x ∈ A, Φ computes the least stage
s such that (As \ {x}) ↾ ξ(x)[s] = (A \ {x}) ↾ ξ(x)[s]. Then x ∈ A iff x ∈ As . The point is
that if x enters A after stage s, it must enter one of A1 and A2 , but then the other Ai must
change as well, and this change cannot use x. Therefore (As \ {x}) ↾ ξ(x)[s] ≠ (A \ {x}) ↾ ξ(x)[s].
The reasoning more or less reverses. If Φ^(A\{x}) (x) = A(x) is an autoreduction, then should
x enter As+1 \ As , some other number must enter As+1 \ As , once we speed up the
computations so that at each stage s the length of agreement is at least s. For the least x
entering As+1 \ As we put the least number into A1,s+1 and all the others into A2,s+1 .
Then it is not hard to show that A1 ≡T A2 .
11.5 Chapter 6
Exercise 6.1.2
(i) Let Γe (x) be a measure such that Γe (x) = s means that ϕe (x) uses exactly s space.
ϕe (x) ↓ implies that ϕe (x) halts in some space s, and hence Γe (x) ↓= s. Conversely,
if Γe (x) = s then ϕe (x) halts in exactly space s, which must mean that ϕe (x) ↓ .
Hence, ϕe (x) ↓ if and only if Γe (x) ↓, so Blum’s first axiom is satisfied for Γ.
To show that the question “Γe (x) = s?” is computable, we begin running ϕe (x) and keep
track step by step of how much space is used and whether a configuration occurring at
some step has occurred at some earlier step. We consider four cases:
a. If ϕe (x) reaches a stage when it uses more than s space then we answer the
above question NO.
b. If ϕe (x) ↓ at some stage and has used space strictly less than s then we answer
NO.
c. If ϕe (x) ↓ at some stage having used exactly space s then we answer YES.
d. If we see a configuration at some stage that we have already seen at an earlier
stage, and strictly less than s space has been used, then we know that the
computation has entered an infinite loop of repeating configurations, and so we
can answer NO.
Hence Γ satisfies Blum’s second axiom, and so Γ is a complexity measure.
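The four cases can be mirrored in a toy simulator. The configurations, step function, and space function below are abstract stand-ins (illustration only), but the logic — abort when s space is exceeded, report on halting, declare a loop when a configuration repeats — is exactly the case analysis above:

```python
def uses_exactly_space(step, halted, space_of, start, s):
    """Decide whether the run from `start` halts using exactly s space."""
    seen, c, max_space = set(), start, 0
    while True:
        max_space = max(max_space, space_of(c))
        if max_space > s:
            return False               # case (a): more than s space used
        if halted(c):
            return max_space == s      # cases (b) and (c)
        if c in seen:
            return False               # case (d): repeated configuration, loop
        seen.add(c)
        c = step(c)

# A toy "machine" that counts 0,1,...,4 and halts, using space equal to the
# counter value, and one that loops forever within bounded space.
counter = dict(step=lambda c: c + 1, halted=lambda c: c == 4,
               space_of=lambda c: c, start=0)
looper = dict(step=lambda c: (c + 1) % 3, halted=lambda c: False,
              space_of=lambda c: c, start=0)
```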
(ii) To show Blum’s axioms are independent we give two examples (there being many
others, of course): one for which axiom 1 holds but axiom 2 does not, and one for which
axiom 2 holds but axiom 1 does not.
Let Γe (x) = 1 if ϕe (x) ↓, and Γe (x) ↑ otherwise;
then clearly ϕe (x) ↓ if and only if Γe (x) ↓ and so axiom 1 holds. But the ques-
tion Γe (x) = s? cannot be computable because then the Halting problem would be
computable, a contradiction.
Now let
Γe (x) ↓= 1
for all e and x. Then clearly Blum’s second axiom holds since to answer the question
Γe (x) = s? we just say YES if s = 1 and NO otherwise. Also, axiom 1 does not hold
since Γe (x) is always defined, even if ϕe (x) is undefined.
Exercise 6.1.3 The simulation of each step of the 2-tape machine M requires finding and
recording the symbols under the two heads using one tape and one head. The alphabet
for the one-tape machine M′ uses symbols to encode what is being read by each tape of
M together with whether the tape heads of M are reading a particular square or not.
If L is accepted in time f (n) by M then M′ never needs to have more than 2f (n) tape
squares in use. Searching for the squares of M′ that represent the square of each tape in M
that is being read by each head can therefore take time at most 4f (n). Performing
the required change of state and tape-writing action takes at most another 2f (n) time
steps, so at most 6f (n) in total (moving to the opposite end of the tape), ignoring a
constant number of steps needed to perform the action. So each step of M takes
O(f (n)) steps of M′. Therefore the at most f (n) steps that M performs can be simulated by
M′ in O(f (n)²) steps.
Exercise 6.1.7 We first show that P ⊆ DTIME(n^(log n)).
Recall that P = ∪d DTIME(n^d), where n is the length of the input x, and DTIME(f (n)) =
{L : there is some machine M that accepts almost all x ∈ L in time ≤ f (n)}. Almost all
means all but finitely many; that is, there exists n0 such that for all n ≥ n0 , M accepts x
in time less than f (n), where n is the length of the input x.
We show that given L ∈ P (so L ∈ DTIME(n^d) for some d) there is an n0 such that
for all strings x of length ≥ n0 , n^(log n) > n^d.
Choose n0 = 3^d. Then e^d < 3^d and so e^d < n0 . Therefore e^d < e^(log n0) and so
d < log n0 . It follows that n0^d < n0^(log n0), and likewise n^d < n^(log n) for all n ≥ n0 ,
as required.
Therefore P ⊆ DTIME(n^(log n)).
Now we show the inclusion is strict, that is, P ≠ DTIME(n^(log n)). Let f (n) = n^(log n)
and let g(n) = n^k for some k ∈ N. Then

lim(n→∞) g(n)/f (n) = lim(n→∞) n^k/n^(log n) = lim(n→∞) 1/n^(log n − k) = 0,

so by the time hierarchy theorem there is a language in DTIME(n^(log n)) which is not in
DTIME(n^k) for any fixed k; hence P ≠ DTIME(n^(log n)).
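A numeric spot check of the choice n0 = 3^d (an illustration, not a proof), using the natural logarithm as in the solution:

```python
import math

# With n0 = 3**d we have d < log(n) for all n >= n0, hence n**d < n**log(n).
d = 3
n0 = 3 ** d                       # n0 = 27, and log(27) > 3 = d
ok = all(n ** d < n ** math.log(n) for n in range(n0, n0 + 200))
```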
11.6 Chapter 7
Since B is a polynomial time computable language, the above algorithm can be performed
in polynomial time.
Exercise 7.2.3 Suppose that L̂ ≤Pm L via f . Using Exercise 7.2.2, let x ∈ L iff ∃z[|z| =
p(|x|) ∧ ⟨x, z⟩ ∈ B], with B ∈ P. Then u ∈ L̂ iff f (u) ∈ L iff ∃z[|z| = p(|f (u)|) ∧ ⟨f (u), z⟩ ∈ B].
Exercise 7.2.10 Given G, ask if G has a Hamilton cycle. If not then say no. If yes, start
at any vertex v and look at any edge uv. Consider G1 = G \ {uv}. If G1 has a Hamilton
cycle, try to remove another edge from v. If G1 has no Hamilton cycle, then uv is essential to
any cycle. Continue in this way, removing edges until only a cycle remains.
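The self-reduction can be sketched as follows; the brute-force `has_hamilton_cycle` stands in for the hypothetical polynomial-time decider, and an edge survives the pruning loop exactly when it is essential:

```python
from itertools import permutations

def has_hamilton_cycle(n, edges):
    # Brute-force oracle, a stand-in for the assumed decider.
    E = set(edges) | {(v, u) for u, v in edges}
    return any(all((p[i], p[(i + 1) % n]) in E for i in range(n))
               for p in permutations(range(n)))

def extract_cycle(n, edges):
    edges = list(edges)
    if not has_hamilton_cycle(n, edges):
        return None
    for e in list(edges):
        rest = [f for f in edges if f != e]
        if has_hamilton_cycle(n, rest):   # e is not essential: discard it
            edges = rest
    return edges                          # exactly the cycle edges remain

# On K4 the pruning leaves a Hamilton cycle (4 edges).
cycle = extract_cycle(4, [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)])
```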
Exercise 7.2.11 (Sketch) Consider k = 3. Let X be an instance of 3-Sat. The only
problem might be that X has one or more clauses C of size < 3. We use local replacement
to replace them with clauses of size 3. For example, if C has size 2, say m ∨ n for literals
m, n, then we replace C by (m ∨ n ∨ z) ∧ (m ∨ n ∨ ¬z) for some fresh variable z, etc.
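The local replacement, extended to clauses of size 1 as well, can be sketched with literals coded as integers (negative meaning negated); fresh variable numbers are drawn from a counter:

```python
import itertools

def pad_clause(clause, fresh):
    """Recursively pad a clause (list of int literals) to size exactly 3."""
    if len(clause) >= 3:
        return [clause]
    z = next(fresh)
    # (C or z) and (C or not z) is satisfiable iff C is.
    return pad_clause(clause + [z], fresh) + pad_clause(clause + [-z], fresh)

padded = pad_clause([1, 2], itertools.count(10))   # fresh vars from 10 on
```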
Exercise 7.2.17 The solution is a case analysis. As the graph is symmetric, it is enough
to consider the opposite corners on the horizontal axis.
Suppose that x has colour 1. We need to show y has colour 1. Then give s colour 2.
Hence h has colour 3.
Case 1. r has colour 3. Then ` has colour 1 and q has colour 2 and k has colour 3. Hence
t has colour 1, p has colour 2, m has colour 3, v has colour 2 meaning that y must have
colour 3.
Case 2. r has colour 2. Then ` has colour 1 and hence q and k have colour 3. Thus v
has colour 2, and n has colour 1, meaning that y must be 3.
Exercise 7.2.20 The problem is in NP by guess and check. To establish NP-hardness, we
reduce from Vertex Cover. Let (G, k) be an instance of Vertex Cover.
We construct a new graph H such that H has a dominating set of size k iff G has a
vertex cover of size k. To make H, begin with G, and for each edge xy ∈ E(G), add a new
vertex vxy to G. We then add edges vxy x and vxy y to specify H as G together with the
new vertices and edges.
Now suppose H has a dominating set D of size k. Since every vertex of H must be dom-
inated, each vxy must be dominated, so for each edge xy ∈ E(G) the set D must contain x, y,
or vxy , since these are the only vertices adjacent to vxy . Replacing each vxy ∈ D by one of
its neighbours x or y gives a set of at most k vertices meeting every edge of G, that is,
a vertex cover of size k. Conversely, if G has a vertex cover of size k, say u1 , . . . , uk , then this
is also a dominating set in H (assuming, as we may, that G has no isolated vertices).
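The construction and the equivalence can be checked by brute force on a tiny graph; `build_H` is the reduction above, and the two brute-force deciders are only for illustration:

```python
from itertools import combinations

def build_H(vertices, edges):
    # Add a vertex v_xy adjacent to x and y for each edge xy of G.
    Hv, He = list(vertices), [tuple(e) for e in edges]
    for (x, y) in edges:
        w = ('e', x, y)
        Hv.append(w)
        He += [(w, x), (w, y)]
    return Hv, He

def has_vertex_cover(vertices, edges, k):
    return any(all(x in S or y in S for x, y in edges)
               for S in combinations(vertices, k))

def has_dominating_set(vertices, edges, k):
    nbr = {v: {v} for v in vertices}        # closed neighbourhoods
    for x, y in edges:
        nbr[x].add(y); nbr[y].add(x)
    return any(all(nbr[v] & set(S) for v in vertices)
               for S in combinations(vertices, k))

# A path on 3 vertices has vertex cover {1} of size 1.
G = ([0, 1, 2], [(0, 1), (1, 2)])
Hv, He = build_H(*G)
```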
Exercise 7.2.21 This is NP-complete. Let (G, k) be an instance of Dominating Set. Now
form a new graph H by adding a single vertex v, joined to every vertex of G. H has a
dominating set of size 1, namely {v}. If we later remove v (and all its edges) from H then
solving Dominating Set Repair is the same as finding a dominating set in G.
Exercise 7.2.22 (i) First see if the graph is 2-colourable. If it is not then evidently no
schedule exists. If it is 2-colourable, then we need to decide, for each component, whether
to schedule the red vertices at time 0 and the blue ones at time 1, or vice versa, so that
there are at most m vertices scheduled at any one time. This can be done in polynomial
time as can be seen by induction on the number of components.
(ii) The problem is equivalent to the following: given a graph G, can the vertices of G
be properly coloured with t colours so that each monochromatic set has cardinality at most
m? Taking m = n this solves t-Col, and hence the problem is NP-complete.
Exercise 7.2.23 (i) We reduce from SAT. Suppose that we have an instance ϕ of SAT with
a variable x occurring k > 3 times. Then we will replace all but one of these occurrences
by new variables {s1 , . . . , sk−1 } (chosen to be distinct for each x). We then add
new clauses saying that each si is equivalent to x. This is enforced by the implication cycle
x → s1 → s2 → . . . → sk−1 → x, which is equivalent to the conjunction of 2-clauses
(¬x ∨ s1 ) ∧ (¬s1 ∨ s2 ) ∧ . . . ∧ (¬sk−1 ∨ x).
(ii) If the variable x occurs only positively then we can make all clauses containing x true
by making x true, and eliminate them; similarly if all occurrences are negative. Therefore
we can reduce to the case where every variable occurs both positively and negatively, and,
since it can occur at most twice, each occurs exactly once positively and once negatively.
If those two clauses contain no other variable, we would get x ∧ ¬x, so no satisfying
assignment. Then to solve the question, we can always apply a “resolution rule”: combine
the two clauses, one containing x and one containing ¬x, by throwing x away. For
example, if we have (x ∨ y ∨ z) and (¬x ∨ p ∨ q ∨ r), they combine to give (y ∨ z ∨ p ∨ q ∨ r),
and the new clauses are satisfiable iff the old ones were. Continue in this way until all
variables are eliminated (then it is satisfiable) or we get a contradiction.
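A single step of the resolution rule, with literals coded as integers (negative meaning negated):

```python
# Combine a clause containing x with a clause containing ¬x, discarding x,
# as in the example above.

def resolve(c_pos, c_neg, x):
    return [l for l in c_pos if l != x] + [l for l in c_neg if l != -x]

# (x ∨ y ∨ z) and (¬x ∨ p ∨ q ∨ r) give (y ∨ z ∨ p ∨ q ∨ r),
# coding x, y, z, p, q, r as 1, 2, 3, 4, 5, 6.
combined = resolve([1, 2, 3], [-1, 4, 5, 6], 1)
```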
Exercise 7.2.31 First, the problem is in NP as you can guess the solution and then check
it. We reduce Hamilton Cycle to Long Path. Take G and suppose that G has n vertices.
We make a new graph Ĝ by taking a vertex x of G and replacing it with an edge uv,
where u and v each inherit all the edges of x. Then we add two new vertices û and v̂,
declaring ûu and v̂v to be new edges, so that û and v̂ have degree 1 in Ĝ. Then Ĝ has
n + 3 many vertices and a long path will have n + 2 edges. If Ĝ has a long path then it
must start at û and finish at v̂, up to reversal. This is because they have degree 1.
Moreover, since vertices are not repeated, u and v are used exactly once. But then there
is, in G, a path from u to v consisting of only old edges, and contracting uv in this path
makes a Hamilton cycle in G. Conversely, if G has a Hamilton cycle, then it has one
beginning and ending at x. Reversing the reasoning above, we can turn this into a long
path.
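The construction of Ĝ, and a brute-force check of the resulting path length on a triangle (which has a Hamilton cycle), can be sketched as:

```python
from itertools import permutations

def longest_path_edges(vertices, edges):
    """Brute-force length (in edges) of a longest simple path."""
    E = {frozenset(e) for e in edges}
    for k in range(len(vertices), 1, -1):
        for p in permutations(vertices, k):
            if all(frozenset((p[i], p[i + 1])) in E for i in range(k - 1)):
                return k - 1
    return 0

def split_reduction(vertices, edges, x):
    """Build Ĝ: replace x by adjacent u, v (each inheriting x's edges),
    then attach pendant vertices u_hat and v_hat."""
    u, v, u_hat, v_hat = 'u', 'v', 'u^', 'v^'
    new_edges = [e for e in edges if x not in e]
    for (a, b) in edges:
        if x in (a, b):
            y = b if a == x else a
            new_edges += [(u, y), (v, y)]
    new_edges += [(u, v), (u_hat, u), (v_hat, v)]
    return [w for w in vertices if w != x] + [u, v, u_hat, v_hat], new_edges

# A triangle (n = 3) has a Hamilton cycle, so Ĝ (6 vertices) has a path
# with n + 2 = 5 edges.
Gv, Ge = split_reduction([0, 1, 2], [(0, 1), (1, 2), (0, 2)], 0)
```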
Exercise 7.2.32 Reduce from Hamilton Cycle. Let G be an instance with n vertices;
then the cycle on n vertices is a topological minor of G iff G has a Hamilton cycle.
Exercise 7.4.5 (Sketch) Use the following widget, where the indicated crossing is replaced
by the crossing widget.
11.7 Chapter 8
11.8 Chapter 9
Exercise 9.4.11 (i) (2) implies (1). It suffices to prove that E(C) ∩ Y ≠ ∅ for any odd
cycle C. Let E(C) ∩ X = {u0 v0 , u1 v1 , . . . , uq vq }, which can be shown to be odd, and
numbered so that the vertices vi and u(i+1) mod (q+1) are connected by a path in G − X. Since Φ is
a valid colouring, we see that Φ(vi ) ≠ Φ(ui ) for all 0 ≤ i ≤ q. As |E(C) ∩ X| is odd, there must
be some pair vi , u(i+1) mod (q+1) with Φ(vi ) ≠ Φ(u(i+1) mod (q+1) ). But removal of Y destroys all
paths between black and white vertices, and hence E(C) ∩ Y ≠ ∅.
(1) implies (2). Let CX : V → {B, W } be a two-colouring of the bipartite graph G − X,
and similarly CY a two-colouring of G − Y . Now define Φ : V → {B, W } by Φ(v) = B
if CX (v) = CY (v), and Φ(v) = W otherwise. The claim is that the restriction Φ̂ of Φ to
V (X) is a valid colouring with Y an edge cut in G − X between the white and
the black vertices of Φ̂.
(ii) The method is quite similar to that used for Vertex Cover. Let E(G) = {e1 , . . . , em }.
At step i we consider Gi := G[{e1 , . . . , ei }], and either construct from Xi−1 a minimal
bipartization set Xi or return a “no” answer. X0 = ∅. Consider step i, and suppose that
|Xi−1 | 6 k. If Xi−1 is a bipartization set for Gi , we are done, as we can keep Xi−1 = Xi ,
and move on to step i + 1. Else, we consider Xi−1 ∪ {ei }, which will clearly be a minimal
edge bipartization set for Gi . If |Xi−1 ∪ {ei }| 6 k then set Xi = Xi−1 ∪ {ei } and move
on to step i + 1. If |Xi−1 ∪ {ei }| = k + 1, we seek an Xi that will have 6 k edges, or we
report “no.” The plan is to use Lemma 9.4.1 with “X = Xi−1 ” and “Y = Xi ” to seek new
bipartizations. This lemma needs Y to be disjoint from X. This is achieved by a simple
reduction at this step. For each uv ∈ Xi−1 , we delete this edge, and replace it in G with
3 edges, uw1 , w1 w2 , w2 v (w1 , w2 are distinct for each pair u, v). Then in Xi−1 we include
one of these edges for each uv, for instance w1 w2 . Notice that if uv is necessary for a
minimal edge bipartization before this step, then we can use either of the other two in its
stead for Gi . Now we proceed analogously to Vertex Cover. Namely, we enumerate all
valid colourings Φ of V (Xi−1 ), and determine a minimum edge cut between the white and
black vertices of size 6 k. Each of the minimum cut problems can be solved using bipartite
matching and hence in time O((k + 1)i), as this has k + 1 rounds. If no such cut is found
in any of the partitions we say “no”. If we find a Y , set Xi = Y , and move to step i + 1.
The total running time is thus O(Σ(i=1..m) 2^(k+1) ki) = O(2^k km²).
Exercise 9.4.13 Let N denote the number of distinct coordinates that appear in the r-
tuples of M , and assume that these coordinates are {1, . . . , N }. Let K = kr. Let n denote
the total size of the description of M .
Define a solution schema S for the problem to be a set of k r-tuples with all coor-
dinates distinct, and with the coordinates chosen from J(K). If α is an r-tuple in M ,
α = (x1 , . . . , xr ), let h(α) denote the image r-tuple, h(α) = (h(x1 ), . . . , h(xr )).
The correctness of the algorithm is nearly immediate. If M has a k-matching, then this
involves K = kr distinct coordinates, and by the lemma, there must be some h ∈ H(N, K)
that maps these injectively to J(K), and the image under h is a solution schema that
causes the algorithm to output “yes.” Conversely, if there is an h that realizes a solution
schema S, then choosing one r-tuple in each of the k preimages yields a matching.
The running time of the algorithm is bounded by O(K!K 3K+1 n(log n)6 ).
11.9 Chapter 10
Exercise 10.2.1 This uses a simple gadget. Suppose that we had such an approximation
algorithm, approximating to within additive constant k. Let G be a graph, an instance
of classical Independent Set. To G add an independent set with k elements to make H.
Then G has an independent set of size t iff H has one of size t + k.
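The gadget is easy to check by brute force on a small example:

```python
from itertools import combinations

def max_independent_set(vertices, edges):
    # Brute force: largest S containing no edge of the graph.
    for r in range(len(vertices), 0, -1):
        for S in combinations(vertices, r):
            S = set(S)
            if not any(x in S and y in S for x, y in edges):
                return r
    return 0

triangle = ([0, 1, 2], [(0, 1), (1, 2), (0, 2)])
k = 2
H = (triangle[0] + ['a0', 'a1'], triangle[1])   # add k isolated vertices
```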
Exercise 10.2.9 Suppose that (I1 , S1 ) 6AP (I2 , S2 ), with f , g and d as above. Let A
be an approximation algorithm for (I2 , S2 ). Then the algorithm g(I, A(f (I, r)), r) is an
approximation algorithm for (I1 , S1 ) with ratio 1 + d(r − 1).
Exercise 10.2.15 This is proven by induction on the width k, and for each k we recursively
construct an algorithm Ak . If k = 1 then G is what is called a caterpillar graph, one that
is like a path except that it can have places which look like small stars with spikes of length
1, and we can use greedy minimization, which will use at most 3 colours. So suppose k > 1,
and let Gn have vertices {v1 , . . . , vn }. The computable algorithm Ak will have computed a
computable partition of G, which we denote by {Dy | y < k}. We refer to the Dy as layers.
Consider vn+1 . If the pathwidth of Gn+1 = Gn ∪ {vn+1 } is < k, colour vn+1 by Ak−1
and put it into one of the cells Dy , y < k − 1, recursively. (In the case of pathwidth 1, this
will all go into D0 .) We will be colouring using the set of colours {1, . . . , 3k − 2}.
If the pathwidth of Gn+1 is k, consider Hn+1 , the induced subgraph of Gn+1 generated
by Gn+1 \ Dk . If the pathwidth of Hn+1 is < k, then again colour vn+1 by Ak−1 , and
put into one of the cells Dy , for y < k − 1, recursively, and colour using the set of colours
{1, . . . , 3k − 2}. If the pathwidth of Hn+1 is k, then we put vn+1 into Dk−1 . Within
Dk−1 we use first fit with colours j, 3k − 2 < j ≤ 3k + 1.
The validity of this method follows from the fact that the maximum degree of vertices
restricted to Dk−1 is 2, and induction on k. Assume that Ak−1 is correct and colours the
subgraph of Gn induced by the layers {Dy | y < k − 1} using colours {1, . . . , 3k − 2}.
Note that the construction ensures that the pathwidth of this subgraph Hk is at most
k−1. Moreover, induction on n ensures that Ak−1 would colour the vertices of the subgraph
AB09. Sanjeev Arora and Boaz Barak. Computational Complexity: A Modern Ap-
proach. Princeton University Press, 2009. vi
AB12. Jeremy Avigad and Vasco Brattka. Computability and analysis: the legacy
of Alan Turing. In Rodney Downey, editor, Turing’s Legacy, pages 1–46.
Springer-Verlag, 2012. xiv
ACG+ 99. Giorgio Ausiello, Pierluigi Crescenzi, Giorgio Gambosi, Viggo Kann, Alberto
Marchetti-Spaccamela, and Marco Protasi. Complexity and Approximation.
Springer-Verlag, 1999. vi, xix, 286, 287
Ack28. Wilhelm Ackermann. Zum Hilbertschen Aufbau der reellen Zahlen. Mathema-
tische Annalen, 99:118–133, 1928. 60
AD22. Matthew Askes and Rodney Downey. Online, computable, and punctual struc-
ture theory. Logic Journal of the IGPL, jzac065:44 pages, 2022. 293
ADF93. Karl Abrahamson, Rodney Downey, and Michael Fellows. Fixed parameter
tractability and completeness iv: On completeness for W [P ] and PSPACE
analogs. Annals of Pure and Applied Logic, 73:235–276, 1993. 252
Adi55. Sergei Adian. Algorithmic unsolvability of problems of recognition of certain
properties of groups. Doklady Akademii Nauk SSSR, 103:533–535, 1955. 96
AHK77. Kenneth Appel, Wolfgang Haken, and John Koch. Every planar map is four
colorable. ii. reducibility. Illinois Journal of Mathematics, 21(3):491–567,
1977. 193, 282
AIK84. Akeo Adachi, Shigeki Iwata, and Takumi Kasai. Some combinatorial game
problems require Ω(n^k) time. Journal of the ACM, 31(2), 1984. 209
AK00. Chris Ash and Julia Knight. Computable structures and the hyperarithmetical
hierarchy, volume 144 of Studies in Logic and the Foundations of Mathemat-
ics. North-Holland Publishing Co., Amsterdam, 2000. vi, 150
AKCF+ 04. Faisal Abu-Khzam, Rebecca Collins, Michael Fellows, Michael Langston,
Henry Suters, and Christopher Symons. Kernelization algorithms for the
vertex cover problem: theory and experiments. In R. Sedgewick L. Arge,
G. Italiano, editor, ALENEX/ANALC, Proceedings of the Sixth Workshop
on Algorithm Engineering and Experiments and the First Workshop on Ana-
lytic Algorithmics and Combinatorics, New Orleans, LA, USA, pages 62–69.
SIAM, 2004. 263
AKLSS06. Faisal Abu-Khzam, Michael Langston, Puskar Shanbhag, and Christopher
Symons. Scalable parallel algorithms for FPT problems. Algorithmica, 45:269–
284, 2006. 260, 262, 263
AKS04. Manindra Agrawal, Neeraj Kayal, and Nitin Saxena. PRIMES is in P. Annals
of Mathematics, 160:781–793, 2004. 181
© The Editor(s) (if applicable) and The Author(s), under exclusive license
to Springer Nature Switzerland AG 2024
R. Downey, Computability and Complexity, Undergraduate Topics
in Computer Science, https://doi.org/10.1007/978-3-031-53744-8
AL17. Dennis Amelunxen and Martin Lotz. Average-case complexity without the
black swans. Journal of Complexity, 41:82–101, 2017. 300, 309
ALM+ 98. Sanjeev Arora, Carsten Lund, Rajeev Motwani, Madhu Sudan, and Mario
Szegedy. Proof verification and the hardness of approximation problems. Jour-
nal of the ACM, 45(3):501–555, 1998. 201, 224, 253
Aro96. Sanjeev Arora. Polynomial time approximation schemes for Euclidean TSP
and other geometric problems. In Proceedings of the 37th IEEE Symposium
on Foundations of Computer Science, 1996. 253, 254
Aro97. Sanjeev Arora. Nearly linear time approximation schemes for Euclidean TSP
and other geometric problems. In Proc. 38th Annual IEEE Symposium on
the Foundations of Computing (FOCS’97), pages 554–563. IEEE Press, 1997.
254, 255
AW09. Scott Aaronson and Avi Wigderson. Algebraization: A new barrier in com-
plexity theory. In Symposium on the Theory of Computing (STOC), 2009.
224
AYZ94. Noga Alon, Raphael Yuster, and Uri Zwick. Color-coding: A new method for
finding simple paths, cycles and other small subgraphs within large graphs. In
Proc. Symp. Theory of Computing (STOC), pages 326–335. ACM, 1994. 267,
268
Bab90. László Babai. E-mail and the unexpected power of interaction. In Proceedings
Fifth Annual Structure in Complexity Theory Conference, pages 30–44, 1990.
224
Bab16. László Babai. Graph isomorphism in quasipolynomial time [extended abstract].
In STOC ’16: Proceedings of the forty-eighth annual ACM symposium on
Theory of Computing, pages 684–697, 2016. 228
Baz95. Christina Bazgan. Schémas d’approximation et complexité paramétrée. Rap-
port de stage de DEA d’Informatique à Orsay, 1995. 255
BBM03. Cyril Banderier, René Beier, and Kurt Mehlhorn. Smoothed analysis of three
combinatorial problems. In Branislav Rovan and Peter Vojtáš, editors, Math-
ematical Foundations of Computer Science 2003, pages 198–207, Berlin, Hei-
delberg, 2003. Springer Berlin Heidelberg. 309
BD20. Laurent Bienvenu and Rodney Downey. On low for speed oracles. Journal of
Computing and System Sciences, 108:49–63, 2020. 174
BDFH09. Hans Bodlaender, Rodney Downey, Michael Fellows, and Danny Hermelin.
On problems without polynomial kernels. Journal of Computing and System
Sciences, 75(8):423–434, 2009. 270, 273, 274, 275
BDG90. José Balcázar, Josep Dı́az, and Joaquim Gabarró. Structural Complexity I
and II. Number 11 and 22 in EATCS Monographs on Theoretical Computer
Science. Springer Verlag, 1988 and 1990. 170, 211
Bee85. Michael J. Beeson. Foundations of Constructive Mathematics: Metamathe-
matical Studies. Springer-Verlag, 1985. 72
BG81. Charles Bennett and John Gill. Relative to a random oracle A, P^A ≠ NP^A ≠
co-NP^A with probability 1. SIAM Journal on Computing, 10:96–113, 1981. 222
BGG97. Egon Börger, Erich Grädel, and Yuri Gurevich. The Classical Decision Prob-
lem. Springer-Verlag, 1997. 107
BGS75. Theodore Baker, John Gill, and Robert Solovay. Relativizations of the P=?NP
question. SIAM Journal on Computing, 4:431–442, 1975. 218, 219, 220, 221,
222
BH08. Harry Buhrman and John Hitchcock. NP-hard sets are exponentially dense
unless coNP ⊆ NP/poly. In Proceedings of the 23rd Annual IEEE Conference
on Computational Complexity, pages 1–7. IEEE, 2008. 272, 273
BHPS61. Yehoshua Bar-Hillel, Micha Perles, and Eli Shamir. On the formal properties
of the simple phrase structure grammars. Z. Phon. Sprachwiss Kummun.
Forsch., 14(2):143–172, 1961. 21, 39
CFK+ 16. Marek Cygan, Fedor Fomin, Łukasz Kowalik, Daniel Lokshtanov, Dániel Marx,
Marcin Pilipczuk, Michał Pilipczuk, and Saket Saurabh. Parameterized Algo-
rithms. Springer-Verlag, 2016. vi, xviii, 260, 267, 269
CI97. Marco Cesati and Miriam Di Ianni. Computation models for parameterized
complexity. MLQ Math. Log. Q., 43:179–202, 1997. 251, 275
CJ01. Liming Cai and David Juedes. Subexponential parameterized algorithms col-
lapse the W -hierarchy. In 28th International Colloquium on Automata, Lan-
guages and Programming, pages 273–284, 2001. 278, 279
CJ03. Liming Cai and David Juedes. On the existence of subexponential parameter-
ized algorithms. Journal of Computing and System Sciences, 67(4):789–807,
2003. 277, 278, 279
CK00. Chandra Chekuri and Sanjeev Khanna. A PTAS for the multiple knapsack prob-
lem. In Proceedings of the ACM-SIAM Symposium on Discrete Algorithms
(SODA 2000), pages 213–222, 2000. 253, 254
CKX10. Jianer Chen, Iyad Kanj, and Ge Xia. Improved upper bounds for vertex cover.
Theoretical Computer Science A, 411:3736–3756, 2010. 233, 259
CL19. Yijia Chen and Bingkai Lin. The constant inapproximability of the parame-
terized dominating set problem. SIAM Journal on Computing, 48(2):513–533,
2019. 291
CM99. Jianer Chen and Antonio Miranda. A polynomial-time approximation scheme
for general multiprocessor scheduling. In Proc. ACM Symposium on Theory
of Computing (STOC ’99), pages 418–427. ACM Press, 1999. 253, 254
Coh63. Paul Cohen. The independence of the continuum hypothesis, [part 1]. Pro-
ceedings of the National Academy of Sciences of the United States of America,
50(6):1143–1148, 1963. 14
Coh64. Paul Cohen. The independence of the continuum hypothesis, [part 2]. Pro-
ceedings of the National Academy of Sciences of the United States of America,
51(1):105–110, 1964. 14
Con72. John Conway. Unpredictable iterations. In 1972 Number Theory Conference,
University of Colorado, Boulder, pages 49–52. Springer-Verlag, 1972. xv, 77,
79, 83
Coo71. Stephen Cook. The complexity of theorem proving procedures. In Proceedings
of the Third Annual ACM Symposium on Theory of Computing, pages 151–
158, 1971. xviii, 182
CS88. Marek Chrobak and Maciej Ślusarek. On some packing problem related to
dynamic storage allocation. RAIRO Inform. Théor. Appl., 22(4):487–499,
1988. 293
CT97. Marco Cesati and Luca Trevisan. On the efficiency of polynomial time ap-
proximation schemes. Information Processing Letters, pages 165–171, 1997.
258
DASMar. Rodney Downey, Klaus Ambos-Spies, and Martin Monath. Notes on Sacks’
Splitting Theorem. Journal of Symbolic Logic, to appear. 155
Dav77. Martin Davis. Unsolvable problems. In Jon Barwise, editor, Handbook of
Mathematical Logic, volume 90 of Studies in Logic and the Foundations of
Mathematics, pages 567–594. North-Holland Publishing Co., 1977. 107
DECF+ 03. Rodney Downey, Vladimir Estivill-Castro, Michael R. Fellows, Elena Prieto-
Rodriguez, and Frances A. Rosamond. Cutting up is hard to do: the pa-
rameterized complexity of k-cut and related problems. Electronic Notes in
Theoretical Computer Science, 78:205–218, 2003. 277
Deh11. Max Dehn. Über unendliche diskontinuierliche Gruppen. Math. Ann.,
71(1):116–144, 1911. xiv, 85, 94, 95
DF92. Rodney Downey and Michael Fellows. Fixed parameter tractability and com-
pleteness. Congressus Numerantium, 87:161–187, 1992. 231, 235, 237, 238,
239, 250, 251
DF93. Rodney Downey and Michael Fellows. Fixed parameter tractability and com-
pleteness iii: Some structural aspects of the W -hierarchy. In K Ambos-Spies,
S. Homer, and U. Schöning, editors, Complexity Theory: Current Research,
pages 166–191. Cambridge Univ. Press, 1993. 231, 238
DF95a. Rodney Downey and Michael Fellows. Fixed parameter tractability and com-
pleteness i: Basic theory. SIAM Journal on Computing, 24:873–921, 1995.
231, 238, 239, 250, 251
DF95b. Rodney Downey and Michael Fellows. Fixed parameter tractability and com-
pleteness ii: Completeness for W [1]. Theoretical Computer Science A, 141:109–
131, 1995. 231, 237, 238, 239, 241, 245, 246
DF95c. Rodney Downey and Michael Fellows. Parameterized computational feasibil-
ity. In P. Clote and J. Remmel, editors, Proceedings of Feasible Mathematics
II, pages 219–244. Birkhauser, 1995. 260
DF98. Rodney Downey and Michael Fellows. Parameterized Complexity. Springer-
Verlag, 1998. 234, 262, 269
DF13. Rodney Downey and Michael Fellows. Fundamentals of Parameterized Com-
plexity. Springer-Verlag, 2013. vi, xviii, 40, 241, 251, 252, 264, 267, 269, 270,
279, 291, 292
DFM06. Rodney Downey, Michael Fellows, and Catherine McCartin. Parameterized
approximation problems. In Hans Bodlaender and Michael Langston, editors,
Parameterized and Exact Computation. Second International Workshop, IW-
PEC ’06. Zürich, Switzerland, September 13–15, 2006. Proceedings, LNCS
4169, pages 121–129. Springer, 2006. 292
DGH24. Eric Demaine, William Gasarch, and Mohammed Hajiaghayi. Computational
Intractability: A Guide to Algorithmic Lower Bounds. MIT Press, 2024. xvii,
xviii
DH10. Rodney Downey and Denis Hirschfeldt. Algorithmic randomness and complex-
ity. Theory and Applications of Computability. Springer, New York, 2010. vi,
147, 150, 162, 209
DHNS03. Rodney Downey, Denis Hirschfeldt, André Nies, and Frank Stephan. Trivial
reals. In Proceedings of the 7th and 8th Asian Logic Conferences, pages 103–
131, Singapore, 2003. Singapore Univ. Press. 150
Die05. Reinhard Diestel. Graph Theory, 3rd Edition. Springer-Verlag, 2005. 291
Din07. Irit Dinur. The PCP Theorem by gap amplification. Journal of the ACM,
54(3):12–es, 2007. 201
Dir89. G. P. Lejeune Dirichlet. Über die Darstellung ganz willkürlicher Funktionen
durch Sinus- und Cosinusreihen (1837). In Gesammelte Werke, pages 135–160.
Bd. I. Berlin, 1889. 12
DJW12. Vida Dujmovic, Gwenael Joret, and David Wood. An improved bound for
first-fit on posets without two long incomparable chains. SIAM J. Discrete
Math, 26:1068–1075, 2012. 293
DK92. Rodney Downey and Julia Knight. Orderings with αth jump degree 0(α) .
Proc. Amer. Math. Soc., 114(2):545–552, 1992. 138
DLP+ 12. David Doty, Jack Lutz, Matthew Patitz, Robert Schweller, Scott Summers,
and Damien Woods. The tile assembly model is intrinsically universal. In
Proceedings of the Fifty-third Annual IEEE Symposium on Foundations of
Computer Science (FOCS 2012, New Brunswick, NJ, October 20-23), pages
302–310. IEEE, 2012. 92
DM40. Ben Dushnik and Evan Miller. Concerning similarity transformations of lin-
early ordered sets. Bull. Amer. Math. Soc., 46:322–326, 1940. 153
DMar. Rodney Downey and Alexander Melnikov. Computable Structure Theory:
A Unified Approach. Springer-Verlag, to appear. vi, 137, 150, 151
Dow03. Rodney Downey. Parameterized complexity for the skeptic. In Computational
Complexity, 18th Annual Conference, pages 147–169. IEEE, 2003. 253, 254
Dow14. Rodney Downey. Turing’s Legacy. Cambridge University Press, 2014. 54, 67
DPR61. Martin Davis, Hilary Putnam, and Julia Robinson. The decision problem for
exponential diophantine equations. Annals of Mathematics, 74:425–436, 1961.
99
Dru12. Andrew Drucker. New limits to classical and quantum instance compression.
In Foundations of Computer Science, FOCS 2012, pages 609–618, 2012. 276
DTH86. Phan Dinh Dieu, Le Cong Thanh, and Le Tuan Hoa. Average polynomial time
complexity of some NP-complete problems. Theoretical Computer Science,
46:219–327, 1986. 296, 297
Edm65. Jack Edmonds. Paths, trees, and flowers. Canadian Journal of Mathematics,
17:449–467, 1965. xvii, 159, 236
EJS01. Thomas Erlebach, Klaus Jansen, and Eike Seidel. Polynomial time approxi-
mation schemes for geometric graphs. In Proc. ACM Symposium on Discrete
Algorithms (SODA’01), pages 671–679, 2001. 254
Epp95. David Eppstein. Subgraph isomorphism in planar graphs and related prob-
lems. In Proceedings of the Sixth Annual ACM-SIAM Symposium on Discrete
Algorithms, 22–24 January 1995. San Francisco, California, pages 632–640,
1995. 275
ERV14. Matthias Englert, Heiko Röglin, and Berthold Vöcking. Worst case and prob-
abilistic analysis of the 2-opt algorithm for the TSP. Algorithmica, 68(1):190–
264, 2014. 307, 309
Eul36. Leonhard Euler. Solutio problematis ad geometriam situs pertinentis. Comment. Academiae Sci. I. Petropolitanae, 8:128–140, 1736. xvii
Fei68. Lawrence Feiner. Orderings and boolean algebras not isomorphic to recursive
ones. ProQuest LLC, Ann Arbor, MI, 1968. Thesis (Ph.D.)–Massachusetts
Institute of Technology. 142
Fel02. Michael Fellows. Parameterized complexity: the main ideas and connections to
practical computing. In Experimental Algorithmics. Springer-Verlag, LNCS
2547, 2002. 256, 257
FG06. Jörg Flum and Martin Grohe. Parameterized Complexity Theory. Springer-
Verlag, 2006. vi, xviii, 235, 251, 269
FHRV09. Michael Fellows, Danny Hermelin, Frances Rosamond, and Stéphanie Vialette.
On the parameterized complexity of multiple-interval graph problems. Theo-
retical Computer Science A, 410:53–61, 2009. 249
FL88. Michael Fellows and Michael Langston. Nonconstructive proofs of polynomial-
time complexity. Information Processing Letters, 26:157–162, 1987/88. 269
FLSZ19. Fedor Fomin, Daniel Lokshtanov, Saket Saurabh, and Meirav Zehavi. Kernel-
ization: Theory of Parameterized Preprocessing. Cambridge University Press,
2019. vi, xviii, 263, 270
FR98. David Fowler and Eleanor Robson. Square root approximations in Old Babylonian mathematics: YBC 7289 in context. Historia Math., 25(4):366–378, 1998.
xii
Fri57. Richard Friedberg. Two recursively enumerable sets of incomparable degrees of
unsolvability. Proceedings of the National Academy of Sciences of the United
States of America, 43:236–238, 1957. 143, 147, 148
FS56. A. Fröhlich and J. Shepherdson. Effective procedures in field theory. Philos.
Trans. Roy. Soc. London. Ser. A., 248:407–432, 1956. 122, 123
FS11. Lance Fortnow and Rahul Santhanam. Infeasibility of instance compression
and succinct PCPs for NP. Journal of Computer and System Sciences,
77(1):91–106, 2011. 270, 271
FSU83. Aviezri Fraenkel, Edward Scheinerman, and Daniel Ullman. Undirected edge
geography. Theoretical Computer Science, 112(2):371–381, 1983. 208
FV11. Fedor Fomin and Yngve Villanger. Subexponential parameterized algorithm
for minimum fill-in. CoRR, abs/1104.2230, 2011. 261
IPZ01. Russell Impagliazzo, Ramamohan Paturi, and Francis Zane. Which problems
have strongly exponential complexity? Journal of Computer and System Sciences, 63(4):512–530, 2001. 277, 278
JM79. James Jones and Yuri Matijasevič. Diophantine representation of enumerable
sets. Journal of Symbolic Logic, 49(3):818–829, 1979. 99, 100, 101, 102, 103,
105
Joh73. David Johnson. Near-Optimal Bin Packing Algorithms. PhD thesis, Mas-
sachusetts Institute of Technology, 1973. 284
JP78. Carl Jockusch and David Posner. Double jumps of minimal degrees. The
Journal of Symbolic Logic, 43:715–724, 1978. 145
Kar73. Richard Karp. Reducibility among combinatorial problems. In Raymond
Miller and James Thatcher, editors, Complexity of Computer Computations,
pages 85–103. Plenum Press, 1973. 70, 189, 191, 192, 193, 195, 196, 197, 198,
200, 238
Kar84. Narendra Karmarkar. A new polynomial-time algorithm for linear program-
ming. Combinatorica, 4(4):373–395, 1984. 198
Kar11. Richard Karp. Heuristic algorithms in computational molecular biology. Journal of Computer and System Sciences, 77(1):122–128, 2011. 265
Kha79. Leonid Khachiyan. A polynomial algorithm for linear programming. Doklady
Akademii Nauk SSSR, 244(5):1093–1096, 1979. 198
Kho02. Subhash Khot. On the power of unique 2-prover 1-round games. In Proceedings
of the thirty-fourth annual ACM symposium on Theory of computing, pages
767–775, 2002. 286
KL80. Richard Karp and Richard Lipton. Some connections between nonuniform
and uniform complexity classes. In Proceedings of the 12th Symposium on the
Theory of Computing, pages 302–309, 1980. 210, 211
Kle36. Stephen Kleene. λ-definability and recursiveness. Duke Mathematical Journal,
2(2):340–353, 1936. 63
Kle38. Stephen Kleene. On notation for ordinal numbers. The Journal of Symbolic
Logic, 3:150–155, 1938. 116, 117
Kle56. Stephen Kleene. Representation of events in nerve nets and finite automata.
In C. Shannon and J. McCarthy, editors, Automata Studies, Annals of Math-
ematics Studies, pages 3–42. Princeton University Press, 1956. 18, 26, 33,
36
KM96. Sanjeev Khanna and Rajeev Motwani. Towards a syntactic characterization
of PTAS. In Proceedings of Symposium on the Theory of Computing, pages
329–337, 1996. 255
KMSS03. Ilya Kapovich, Alexander Myasnikov, Paul Schupp, and Vladimir Shpilrain.
Generic case complexity, decision problems in group theory and random walks.
Journal of Algebra, 264:665–694, 2003. xix, 302, 304, 305
Kob78. G. N. Kobzev. On tt-degrees of recursively enumerable Turing degrees. Mat. Sb., 106:507–514, 1978.
225
Koz06. Dexter Kozen. Theory of Computation. Springer-Verlag, 2006. vi, 210, 215
KP54. Stephen Kleene and Emil Post. The upper semi-lattice of degrees of recursive
unsolvability. Annals of Mathematics. Second Series, 59:379–407, 1954. 143
KQ95. Hal Kierstead and Jun Qin. Coloring interval graphs with First-Fit. Discrete
Math., 144(1–3):47–57, 1995. Combinatorics of ordered sets (Oberwolfach,
1991). 293
Kro82. Leopold Kronecker. Grundzüge einer arithmetischen Theorie der algebraischen Grössen. J. Reine Angew. Math., 92:1–123, 1882. xiv
KS81. Jussi Ketonen and Robert Solovay. Rapidly growing Ramsey functions. Annals
of Mathematics, 113(2):267–314, 1981. 60
KST93. Johannes Köbler, Uwe Schöning, and Jacobo Torán. The Graph Isomorphism
Problem: Its Structural Complexity. Springer-Verlag, 1993. 229
KST94. Haim Kaplan, Ron Shamir, and Robert Tarjan. Tractability of parameterized
completion problems on chordal and interval graphs: Minimum fill-in and DNA
physical mapping (extended abstract). In 35th Ann. Proc. of the Foundations
of Computer Science (FOCS ’94), pages 780–791, 1994. 260, 261
KST16. Hal Kierstead, David Smith, and William Trotter. First-fit coloring on interval
graphs has performance ratio at least 5. European J. Combin., 51:236–254,
2016. 293
KT99. Antonín Kučera and Sebastiaan Terwijn. Lowness for the class of random sets.
The Journal of Symbolic Logic, 64:1396–1402, 1999. 293
Kuč86. Antonín Kučera. An alternative priority-free solution to Post’s problem. In
J. Gruska, B. Rovan, and J. Wiedermann, editors, Proceedings, Mathematical
Foundations of Computer Science, Lecture Notes in Computer Science 233,
pages 493–500. Springer, Berlin, 1986. 150
Kun11. Kenneth Kunen. Set Theory, Revised Edition. Studies in Logic: Mathematical
Logic and Foundations. College Publications, 2011. 3, 14
Kur83. Stuart Kurtz. On the random oracle hypothesis. Information and Control,
57:40–47, 1983. 222
Lac76. Alastair Lachlan. A recursively enumerable degree which will not split over
all lesser ones. Annals of Mathematical Logic, 9(4):307–365, 1976. 148
Lad73. Richard Ladner. Mitotic recursively enumerable sets. Journal of Symbolic
Logic, 38(2):199–211, 1973. 153
Lad75. Richard Ladner. On the structure of polynomial time reducibilities. Journal
of the ACM, 22:155–171, 1975. 226, 227, 228
Lau83. Clemens Lautemann. BPP and the polynomial hierarchy. Information Pro-
cessing Letters, 17(4):215–217, 1983. 214
Lei81. Ernst Leiss. The complexity of restricted regular expressions and the synthesis
problem for finite automata. Journal of Computer and System Sciences,
23(3):348–354, 1981. 30
Ler81. Manuel Lerman. On recursive linear orderings. In Logic Year 1979–80
(Proc. Seminars and Conf. Math. Logic, Univ. Connecticut, Storrs, Conn.,
1979/80), volume 859 of Lecture Notes in Math., pages 132–142. Springer,
Berlin, 1981. 142
Ler83. Manuel Lerman. Degrees of Unsolvability. Perspectives in Mathematical Logic.
Springer, Berlin, 1983. 143
Lev73. Leonid Levin. Universal search problems. Problems of Information Transmis-
sion, 9(3):115–116, 1973. xviii, 182, 188
Lev86. Leonid Levin. Average case complete problems. SIAM Journal on Computing,
15:285–286, 1986. xix, 293, 298
LFKN92. Carsten Lund, Lance Fortnow, Howard Karloff, and Noam Nisan. Algebraic
methods for interactive proof systems. Journal of the ACM, 39:859–868, 1992.
222, 223, 224
Liv10. Noam Livne. All natural NP-complete problems have average case complete
versions. Computational Complexity, 19:477–499, 2010. 298
Lob51. Nikolai Lobachevsky. On the vanishing of trigonometric series (1834). In
Collected Works, pages 31–80. Moscow-Leningrad, 1951. 12
LPS+08. Michael Langston, Andy Perkins, Arnold Saxton, Jon Scharff, and Brynn Voy.
Innovative computational methods for transcriptomic data analysis: A case
study in the use of FPT for practical algorithm design and implementation.
The Computer Journal, 51:26–38, 2008. 234
LS80. David Lichtenstein and Michael Sipser. Go is polynomial-space hard. Journal
of the ACM, 27(2):393–401, 1980. 206, 210
LS01. Roger C. Lyndon and Paul E. Schupp. Combinatorial group theory. Classics
in Mathematics. Springer-Verlag, Berlin, 2001. Reprint of the 1977 edition.
97, 304
Myh57. John Myhill. Finite automata and representation of events. WADD TR-57-
624, Wright-Patterson AFB, Ohio, pages 112–137, 1957. 36, 43
Ner58. Anil Nerode. Linear automaton transformations. Proceedings of the American
Math. Soc., 9:541–544, 1958. 36, 43
Nie06. Rolf Niedermeier. Invitation to Fixed-Parameter Algorithms. Oxford Univer-
sity Press, 2006. vi, xviii, 259, 262, 264, 266, 269
Nie09. André Nies. Computability and Randomness, volume 51 of Oxford Logic
Guides. Oxford University Press, Oxford, 2009. vi
Nov55. Pyotr Novikov. On the algorithmic unsolvability of the word problem in group
theory. Trudy Mat. Inst. Steklov, 44:1–143, 1955. 96
NR00. Rolf Niedermeier and Peter Rossmanith. A general method to speed up
fixed-parameter-tractable algorithms. Inform. Process. Lett., 73(3–4):125–
129, 2000. 262
NSB08. N. S. Narayanaswamy and R. Subhash Babu. A note on first-fit coloring of
interval graphs. Order, 25(1):49–53, 2008. 293
NT75. George Nemhauser and Leslie Trotter. Vertex packings: Structural properties
and algorithms. Math. Program., 8:232–248, 1975. 263, 284, 288
Odi90. Piergiorgio Odifreddi. Classical Recursion Theory. The theory of functions and
sets of natural numbers. Number 125 in Studies in Logic and the Foundations
of Mathematics. North-Holland Publishing Company, Amsterdam, 1990. vi
Owi73. James Owings, Jr. Diagonalization and the recursion theorem. Notre Dame
Journal of Formal Logic, 14:95–99, 1973. 117
Pap94. Christos Papadimitriou. Computational Complexity. Addison–Wesley, 1994.
204
PCW05. Iman Poernomo, John Crossley, and Martin Wirsing. Adapting Proofs-as-
Programs: The Curry-Howard Protocol. Monographs in Computer Science.
Springer-Verlag, 2005. 72
PER89. Marian Pour-El and Ian Richards. Computability in Analysis and Physics.
Perspectives in Mathematical Logic. Springer-Verlag, Berlin, 1989. vi, 137
Pip79. Nick Pippenger. On simultaneous resource bounds. In 20th Annual Sympo-
sium on Foundations of Computer Science, FOCS 1979, San Juan, Puerto
Rico, 29–31 October 1979. Proceedings, pages 307–311. IEEE Computer So-
ciety, 1979. 211
Poc12. Henry Pocklington. The determination of the exponent to which a number
belongs, the practical solution of certain congruences, and the law of quadratic
reciprocity. Proceedings of the Cambridge Philosophical Society, 16:1–5, 1912.
xvii
Poo14. Bjorn Poonen. Undecidable problems: a sampler. In J. Kennedy, editor,
Interpreting Gödel, pages 211–241. Cambridge Univ. Press, 2014. 107
Pos44. Emil Post. Recursively enumerable sets of positive integers and their decision
problems. Bulletin of the American Mathematical Society, 50:284–316, 1944.
143
Pos47. Emil Post. Recursive unsolvability of a problem of Thue. Journal of Symbolic
Logic, 12:1–11, 1947. 86, 91, 121, 122
Pre08. Charles Petzold. The Annotated Turing. Wiley, 2008. 54
Rab58. Michael Rabin. Recursive unsolvability of group theoretic problems. Annals
of Mathematics, 67:172–194, 1958. 96
Raz85. Alexander Razborov. Some lower bounds for the monotone complexity of some
boolean functions. Soviet Math. Dokl., 31:354–357, 1985. 224
Ric53. Henry Rice. Classes of recursively enumerable sets and their decision problems.
Transactions of the American Mathematical Society, 74:358–366, 1953. 115
Rob52. Julia Robinson. Existential definability in arithmetic. Transactions of the
American Mathematical Society, 72:437–449, 1952. 101
Rog87. Hartley Rogers. Theory of recursive functions and effective computability.
MIT Press, Cambridge, MA, second edition, 1987. vi, 130, 143
Rot65. Joseph Rotman. The Theory of Groups. Allyn and Bacon, 1965. 96
RR97. Alexander Razborov and Stephen Rudich. Natural proofs. Journal of Computer and System Sciences, 55(1):24–35, 1997. 224
RS69. Michael O. Rabin and Dana Scott. Finite automata and their decision prob-
lems. IBM J. Res. Dev., 3:114–125, 1959. 28, 30, 32
RSV04. Bruce Reed, Kaleigh Smith, and Adrian Vetta. Finding odd cycle transversals.
Operations Research Letters, 32:299–301, 2004. 265, 266
Sac63. Gerald Sacks. On the degrees less than 0′. Annals of Mathematics. Second
Series, 77:211–231, 1963. 153, 155
Sac64. Gerald Sacks. The recursively enumerable degrees are dense. Annals of Math-
ematics Second Series, 80:300–312, 1964. 155
Sav70. Walter Savitch. Relationships between nondeterministic and deterministic
tape complexities. Journal of Computer and System Sciences, 4(2):177–192,
1970. 203
Sch78. Thomas Schaefer. On the complexity of some two-person perfect-information
games. Journal of Computer and System Sciences, 16(2):185–225, 1978. 206
Sch88. Uwe Schöning. Graph isomorphism is in the low hierarchy. Journal of Computer and System Sciences, 37(3):312–323, 1988. 229
See14. Abigail See. Smoothed analysis with applications in machine learning. Tripos
Part III, Essay, Cambridge University, 2014. 309
SG76. Sartaj Sahni and Teofilo Gonzalez. P-complete approximation problems. Journal of the Association for Computing Machinery, 23(3):555–565, 1976. 287
Sha92. Adi Shamir. IP=PSPACE. Journal of the ACM, 39:869–877, 1992. 222
Sho59. Joseph Shoenfield. On degrees of unsolvability. Annals of Mathematics. Sec-
ond Series, 69:644–653, 1959. 136
Sho79. Richard Shore. The homogeneity conjecture. Proceedings of the National
Academy of Science, USA, 76(9):4218–4219, 1979. 218
Sip83. Michael Sipser. A complexity theoretic approach to randomness. In 15th
Symposium on the Theory of Computing, pages 330–335, 1983. 214
Sma83. Steve Smale. On the average number of steps of the simplex method of linear
programming. Mathematical Programming, 27:241–262, 1983. 306
Soa87. Robert I. Soare. Recursively enumerable sets and degrees. Perspectives in
Mathematical Logic. Springer-Verlag, Berlin, 1987. A study of computable
functions and computably generated sets. vi, 117, 143, 148
Soa16. Robert I. Soare. Turing Computability. Theory and Applications of Computability. Springer-Verlag, Berlin, 2016. vi, 143
SST06. Arvind Sankar, Daniel A. Spielman, and Shang-Hua Teng. Smoothed analysis
of the condition numbers and growth factors of matrices. SIAM Journal on
Matrix Analysis and Applications, 28(2):446–476, 2006. 309
ST98. Ron Shamir and Dekel Tzur. The maximum subforest problem: Approxima-
tion and exact algorithms. In Proc. ACM Symposium on Discrete Algorithms
(SODA’98), pages 394–399. ACM Press, 1998. 253, 254
ST01. Daniel Spielman and Shang-Hua Teng. Smoothed analysis of algorithms: why
the simplex algorithm usually takes polynomial time. In Proceedings of the
Thirty-Third Annual ACM Symposium on Theory of Computing, pages 296–
305. ACM, 2001. xix, 306
ST09. Daniel Spielman and Shang-Hua Teng. Smoothed analysis: an attempt to
explain the behavior of algorithms in practice. Communications of the ACM,
52(10):76–84, 2009. 306
Tao22. Terence Tao. Almost all orbits of the Collatz map attain almost bounded values.
Forum of Mathematics, Pi, 10(e12), 2022. 76
Tho68. Ken Thompson. Regular expression search algorithm. Communications of the
ACM, 11(6):419–422, 1968. 29, 32
Tho98. Robin Thomas. An update on the four-color theorem. Notices of the American
Mathematical Society, 45(7):848–859, 1998. 282
TMAS77. Shuji Tsukiyama, Mikio Ide, Hiromu Ariyoshi, and Isao Shirakawa. A new
algorithm for generating all the maximal independent sets. SIAM Journal on
Computing, 6:506–517, 1977. 297
Tra84. Boris Trakhtenbrot. A survey of Russian approaches to perebor (brute-force
search) algorithms. Annals of the History of Computing, 6:384–400, 1984. xvi
Tur36. Alan Turing. On computable numbers, with an application to the Entscheidungsproblem. Proceedings of the London Mathematical Society, 42:230–265,
1936. Correction in Proceedings of the London Mathematical Society, vol. 43
(1937), pp. 544–546. xiv, 46, 53, 74, 126
Tur37a. Alan Turing. Computability and λ-definability. Journal of Symbolic Logic,
2(4):153–163, 1937. 63
Tur37b. Alan M. Turing. On Computable Numbers, with an Application to the
Entscheidungsproblem. A Correction. Proceedings of the London Mathemati-
cal Society, 43:544–546, 1937. 126
Tur39. Alan Turing. Systems of logic based on ordinals. Proceedings of the London
Mathematical Society, 45:154–222, 1939. 129
Tur52. Alan Turing. The chemical basis of morphogenesis. Philosophical Transactions
of the Royal Society of London B, 237:37–72, 1952. vii
Var82. Moshe Vardi. The complexity of relational query languages. In Proceedings
STOC ’82. ACM, 1982. 208
Vaz01. Vijay Vazirani. Approximation Algorithms. Springer-Verlag, 2001. vi, 286,
287
Viz64. Vadim Vizing. On an estimate of the chromatic class of a p-graph. Diskret.
Analiz., 3:25–30, 1964. 282
VV86. Leslie Valiant and Vijay Vazirani. NP is as easy as detecting unique solutions.
Theoretical Computer Science, 47:85–93, 1986. 215
Wan65. Hao Wang. Games, logic and computers. Scientific American, 213(5):98–106,
1965. 92
Wil85. Herbert Wilf. Some examples of combinatorial averaging. American Math.
Monthly, 92:250–261, 1985. 294
Win98. Erik Winfree. Algorithmic Self Assembly of DNA. PhD thesis, California
Institute of Technology, 1998. 92
WXXZ23. Virginia Vassilevska Williams, Yinzhan Xu, Zixuan Xu, and Renfei Zhou. New
bounds for matrix multiplication: from alpha to omega, 2023. 166
Yan81. Michael Yannakakis. Computing the minimum fill-in is NP-complete. SIAM
J. Algebr. Discrete Methods, 2:77–79, 1981. 261
Yap83. Chee Yap. Some consequences of non-uniform conditions on uniform classes.
Theoretical Computer Science, 26:287–300, 1983. 211, 212
Index