Computability and Incompleteness
Lecture notes
Jeremy Avigad
Contents

1 Preliminaries
  1.1 Overview
  1.2 The set-theoretic view of mathematics
  1.3 Cardinality
2 Models of computation
  2.1 Turing machines
  2.2 Some Turing computable functions
  2.3 Primitive recursion
  2.4 Some primitive recursive functions
  2.5 The recursive functions
  2.6 Recursive is equivalent to Turing computable
  2.7 Theorems on computability
  2.8 The lambda calculus
3 Computability Theory
  3.1 Generalities
  3.2 Computably enumerable sets
  3.3 Reducibility and Rice’s theorem
  3.4 The fixed-point theorem
  3.5 Applications of the fixed-point theorem
4 Incompleteness
  4.1 Historical background
  4.2 Background in logic
  4.3 Representability in Q
  4.4 The first incompleteness theorem
  4.5 The fixed-point lemma
  4.6 The first incompleteness theorem, revisited
  4.7 The second incompleteness theorem
  4.8 Löb’s theorem
  4.9 The undefinability of truth
5 Undecidability
  5.1 Combinatorial problems
  5.2 Problems in linguistics
  5.3 Hilbert’s 10th problem
Chapter 1
Preliminaries
1.1 Overview
Three themes are developed in this course. The first is computability, and
its flip side, uncomputability or unsolvability.
The informal notion of a computation as a sequence of steps performed
according to some kind of recipe goes back to antiquity. In Euclid, one finds
algorithmic procedures for constructing various geometric objects using a
compass and straightedge. Throughout the middle ages Chinese and Arabic
mathematicians wrote treatises on arithmetic calculations and methods of
solving equations and word problems. The word “algorithm” comes from the
name “al-Khowârizmi,” a mathematician who, around the year 825, wrote
such a treatise. It was titled Hisâb al-jabr w’al-muqâbalah, “science of the
reunion and the opposition.” The phrase “al-jabr” was also used to describe
the procedure of setting broken bones, and is the source of the word algebra.
I have just alluded to computations that were intended to be carried out
by human beings. But as technology progressed there was also an interest
in mechanization. Blaise Pascal built a calculating machine in 1642, and
Gottfried Leibniz built a better one a little later in the century. In the early
19th century Charles Babbage designed two grand mechanical computers,
the Difference Engine and the Analytical Engine, and Ada Lovelace wrote
some of the earliest computer programs. Alas, the technology of the time was
incapable of machining gears fine enough to meet Babbage’s specifications.
What is lacking in all these developments is a precise definition of what
it means for a function to be computable, or for a problem to be solvable.
For most purposes, this absence did not cause any difficulties; in a sense, computability is similar to Supreme Court Justice Stewart’s characterization of pornography: we may not be able to define it precisely, but we know it when we see it.
{x ∈ A | P (x)}
is just a fancy way of describing the set of even numbers. Here are some
other examples:
1. {x ∈ N | x is prime}
One can also describe a set by listing its elements, as in {1, 2}. Note that by
Fermat’s last theorem this is the same set as the one described in the second
example above, because they have the same elements; but a proof that
the different descriptions denote the same set is a major accomplishment
of contemporary mathematics. In philosophical terms, this highlights the
{x | P (x)}
    S = {x | x ∉ x},

the set of all sets that are not elements of themselves. The paradox arises from asking whether or not S ∈ S. By definition, if S ∈ S, then S ∉ S, a contradiction. So S ∉ S. But then, by definition, S ∈ S. And this is contradictory too.
This is the reason for restricting the set formation property above to
elements of a previously formed set A. Note that Russell’s paradox also
tells us that it is inconsistent to have a “set of all sets.” If A were such a
thing, then {x ∈ A | P (x)} would be no different from {x | P (x)}.
If A and B are sets, A × B, “the cross product of A and B,” is the set of all ordered pairs ⟨a, b⟩ consisting of an element a ∈ A and an element b ∈ B. Iterating this gives us notions of ordered triple, quadruple, and so on; for example, one can take ⟨a, b, c⟩ to abbreviate ⟨a, ⟨b, c⟩⟩. I noted above that on the set-theoretic point of view, everything can be construed as a set. This is true for ordered pairs as well; I will ask you to show, for homework, that if one defines ⟨a, b⟩ to be {{a}, {a, b}}, the definiendum has the right properties; in particular, ⟨a, b⟩ = ⟨c, d⟩ if and only if a = c and b = d. (It is a further exercise to show that the definition of A × B can be put in the form {x ∈ C | P(x)}, where C is constructed using operations, like power set, described above.) This definition of ordered pairs is due to Kuratowski.
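As a quick aside (not from the notes), one can model the Kuratowski definition with Python frozensets and spot-check the characteristic property:

```python
# Kuratowski pairs: <a, b> is modeled as {{a}, {a, b}}.
def pair(a, b):
    return frozenset([frozenset([a]), frozenset([a, b])])

# The characteristic property: pair(a, b) == pair(c, d) iff a == c and b == d.
assert pair(1, 2) == pair(1, 2)
assert pair(1, 2) != pair(2, 1)                   # order is recoverable
assert pair(1, 1) == frozenset([frozenset([1])])  # degenerate case collapses
```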
A binary relation R on A and B is just a subset of A × B. For example,
the relation “divides” on {1, 2, 3, 4, 5, 6} × {1, 2, 3, 4, 5, 6} is formally defined
to be the set of ordered pairs
    {⟨1, 1⟩, ⟨1, 2⟩, ⟨1, 3⟩, ⟨1, 4⟩, ⟨1, 5⟩, ⟨1, 6⟩, ⟨2, 2⟩, ⟨2, 4⟩,
     ⟨2, 6⟩, ⟨3, 3⟩, ⟨3, 6⟩, ⟨4, 4⟩, ⟨5, 5⟩, ⟨6, 6⟩}.
The first clause says that for every a there is some b such that Rf (a, b),
while the second clause says there is at most one such b. So, the two can be
combined by saying that for every a there is exactly one b such that Rf (a, b).
Of course, we write f (a) = b instead of Rf (a, b). (Similar considerations
hold for binary functions, ternary functions, and so on.) The important
thing to keep in mind is that in the official definition, a function is just a set
of ordered pairs. The advantage to this definition is that it provides a lot
of latitude in defining functions. Essentially, you can use any methods that
you use to define sets. According to the recipe above, you can define any
set of the form {x ∈ C | P (x)}, so the challenge is just to find a set C that
is big enough and a clearly stated property P (x). For example, consider the
function f : R → R defined by
    f(x) = 1   if x is irrational
    f(x) = 0   if x is rational
• f : N → N
• f : N → {even numbers}
• f : N → R
Again, I will draw the corresponding picture on the board. You should think
about what the equation above says in terms of the relations Rf and Rg .
It is not hard to argue from the basic axioms of set theory that for every
such f and g there is a function g ◦ f meeting the specification. (So the
“definition” has a little “theorem” built in.)
Later in the course we will need to use the notion of a partial function.
1. f : N → N defined by

       f(x) = x/2          if x is even
       f(x) is undefined   otherwise

2. g : R → R defined by

       g(x) = √x           if x ≥ 0
       g(x) is undefined   otherwise
1.3 Cardinality
The abstract style of reasoning in mathematics is nicely illustrated by Can-
tor’s theory of cardinality. Later, what has come to be known as Cantor’s
“diagonal method” will also play a central role in our analysis of computabil-
ity.
The following definition suggests a sense in which two sets can be said
to have the same “size”:
This definition agrees with the usual notion of the size of a finite set (namely,
the number of elements), so it can be seen as a way of extending size com-
parisons to the infinite. The definition has a lot of pleasant properties. For
example:
Proposition 1.3.2 Equipollence is an equivalence relation: for every A, B,
and C,
• A ≈ A
• if A ≈ B, then B ≈ A
• if A ≈ B and B ≈ C then A ≈ C
(An aside: one can define an ordering A ⪯ B, which holds if and only if there is an injective map from A to B. Under the axiom of choice, this is a linear ordering. It is true but by no means obvious that if A ⪯ B and B ⪯ A then A ≈ B; this is known as the Schröder-Bernstein theorem.)
Here are some examples.
1. The set of even numbers is countably infinite: f (x) = 2x is a bijection
from N to this set.
2. The set of prime numbers is countably infinite: let f (x) be the xth
prime number.
Proof. Let us show that in fact the real interval [0, 1] is not countable. Sup-
pose f : N → [0, 1] is any function; it suffices to construct a real number that
is not in the range of f. Note that every real number f(i) can be written as a decimal of the form

    0.a_{i,0} a_{i,1} a_{i,2} . . .

writing 1, for example, as 0.9999. . . (If f(i) is a terminating decimal, it can also be written as a decimal ending with 9’s. For concreteness, choose the latter representation.) Now define a new number 0.b_0 b_1 b_2 . . . by making each b_i different from a_{i,i}. Specifically, set b_i to be 3 if a_{i,i} is any number other than 3, and 7 otherwise. Then the number 0.b_0 b_1 b_2 . . . is not in the range of f, because it differs from f(i) at the ith digit.
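The diagonal step is mechanical enough to write down; here is an illustrative Python sketch, where digit(i, j) is a stand-in (my name, not the notes’) for “the jth decimal digit of f(i)”:

```python
# Given any rule digit(i, j) producing the jth decimal digit of f(i),
# build the first n digits of a number that differs from every f(i)
# at the ith digit.
def diagonal(digit, n):
    return "0." + "".join(
        "3" if digit(i, i) != 3 else "7" for i in range(n)
    )

# Example with a toy digit rule:
print(diagonal(lambda i, j: (i + j) % 10, 8))
```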
Similar arguments can be used to show that the set of all functions f :
N → N, and even the set of all functions f : N → {0, 1} are uncountable. In
fact, both these sets have the same cardinality, namely, that of R. Cantor’s
Chapter 2
Models of computation
2. A proof of the equivalence of two definitions (in case the new definition
has a greater intuitive appeal).
• The machine has a two-way infinite tape with discrete cells. Note that
“infinite” really means “as big as is needed for the computation”; any
halting computation will only have used a finite piece of it.
To start a computation, you put the machine in the start state, with the tape
head to the right of a finite string of symbols (on an otherwise blank tape).
Then you keep following instructions, until you end up in a state/symbol
pair for which no further instruction applies.
The textbook describes Turing machines with only two symbols, 0 and
1; but one can show that with only two symbols, it is possible to simulate
machines with more. Similarly, some authors use Turing machines with
“one-way” infinite tapes; with some work, one can show how to simulate two-way tapes, or even multiple tapes or two-dimensional tapes, etc. Indeed, we
will argue that with the Turing machines we have described, it is possible
to simulate any mechanical procedure at all.
The book has a standard but clunky notation for describing Turing ma-
chine programs. We will use a more convenient type of diagram, which I will
describe in class. Roughly, circles with numbers in them represent states.
An arrow between states i and l labelled (j, k) stands for the instruction
“if in state i and scanning j, write k and go to state l.” “Move right” and
“move left” are indicated with arrows, → and ← respectively. This is the
notation used in the program Turing’s World, which allows you to design
Turing machines and then watch them run. If you have never played with
Turing machines before, I recommend this program to you.
It is easy to design machines that never halt; for example, you can use
one state and loop indefinitely. In class, I will go over an example from
Turing’s World called “The Lone Rearranger.”
I have described the notion of a Turing machine informally. Now let me
present a precise mathematical definition. For starters, if the machine has
n states, I will assume that they are numbered 0, . . . , n − 1, and that 0 is
the start state; similarly, it is convenient to assume that the symbols are
numbered 0, . . . , m − 1, where 0 is the “blank” character. For such a Turing
machine, it is also convenient to use m to stand for “move left” and m + 1
for “move right.”
Since we only care about the position of the tape head relative to the data,
it is convenient to replace the last two pieces of information with these three:
the symbol under the tape head, the string to the left of the tape head (in
reverse order), and the string to the right of the tape head.
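To make this concrete, here is an illustrative Python sketch of a single step in this representation (the names step and delta, and the tuple layout, are mine, not the notes’):

```python
# delta maps (state, symbol) pairs to (new_state, action), where an action
# a < m writes symbol a, a == m means "move left", and a == m + 1 means
# "move right". A configuration is (state, symbol, left, right), with left
# holding the tape to the left of the head in reverse order.
def step(delta, m, config):
    state, symbol, left, right = config
    if (state, symbol) not in delta:
        return None                      # no instruction applies: halt
    new_state, action = delta[(state, symbol)]
    if action < m:                       # write symbol `action`
        return (new_state, action, left, right)
    if action == m:                      # move left
        sym = left[0] if left else 0     # blank (0) past the used portion
        return (new_state, sym, left[1:], [symbol] + right)
    sym = right[0] if right else 0       # action == m + 1: move right
    return (new_state, sym, [symbol] + left, right[1:])
```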
M halts with output 1^{f(x_0, . . . , x_{k−1})} extending to the right of the tape head.
3. Print a 1.
2. If there are no more 1’s in the first block (i.e. x = 0), delete the second
block, and halt.
6. Now the tape head is on a blank (i.e. a 0); to the right of the blank
are (x − 1)y blanks, followed by y 1’s. Fill in the blanks to the right
of the tape head with 1’s.
These examples fall far short of showing that Turing machines can do anything a Cray supercomputer can do, even setting issues of efficiency aside. Beyond the direct appeal to intuition, Turing suggested two ways
of making the case stronger: first, showing that lots more functions can be
computed by such machines; and, second, showing that one can simulate
other models of computation. For example, many of you would be firmly
convinced if we had a mechanical way of compiling C++ source down to
Turing machine code!
One way to proceed towards both these ends would be to build up a
library of computable functions, as well as build up methods of executing
subroutines, passing arguments, and so on. But designing Turing machines
with diagrams and lists of 4-tuples can be tedious, so we will take another
tack. I will describe another class of functions, the primitive recursive func-
tions, and show that this class is very flexible and robust; and then we will
show that every primitive recursive function is Turing computable.
We can also compose functions to build more complex ones; for example,
    k(x) = x^x + (x + 3) · x
         = f(h(x, x), g(f(x, 3), x)).
for each natural number n and i < n. In the end, we have the following:
Definition 2.3.1 The set of primitive recursive functions is the set of func-
tions of various arities from the set of natural numbers to the set of natural
numbers, defined inductively by the following clauses:
    h(x, y) = f(P_0^2(x, y), g(P_0^2(x, y), P_0^2(x, y), P_1^2(x, y)), P_1^2(x, y)).
    l(x, y) = g(P_0^2(x, y), P_0^2(x, y), P_1^2(x, y)),
x+0 = x
x + (y + 1) = S(x + y).
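As an illustrative sketch (Python, not part of the notes), the primitive recursion scheme can itself be written as a higher-order function, with addition as an instance:

```python
# Primitive recursion: from g (base case) and h (successor step), build f with
#   f(0, zs...) = g(zs...)   and   f(y + 1, zs...) = h(y, f(y, zs...), zs...).
def primitive_recursion(g, h):
    def f(y, *zs):
        acc = g(*zs)
        for i in range(y):
            acc = h(i, acc, *zs)
        return acc
    return f

# Addition, from x + 0 = x and x + (y + 1) = S(x + y); here add(y, x) = x + y.
add = primitive_recursion(lambda x: x, lambda y, prev, x: prev + 1)
assert add(3, 5) == 8
```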
• Addition, x + y
• Multiplication, x · y
• Factorial, x!
• Truncated subtraction, x ∸ y, defined by x ∸ 0 = x and x ∸ (y + 1) = pred(x ∸ y)
• Maximum, max(x, y) = x + (y ∸ x)
• Minimum, min(x, y)
The set of primitive recursive functions is further closed under the fol-
lowing two operations:
We can also define boolean operations, where 1 stands for true, and 0 for
false:
• Negation, not(x) = 1 ∸ x
It should be clear that one can compose relations with other primitive
recursive functions. So the following are also primitive recursive:
• Negation, ¬P
• Conjunction, P ∧ Q
• Disjunction, P ∨ Q
• Implication, P → Q
For m greater than 1, one can just compose definitions of this form.
We will also make good use of bounded minimization:
x|y ≡ ∃z ≤ y (x · z = y).
Here we are relying on Euclid’s proof of the fact that there is always
a prime number between x and x! + 1.
• The function p(x), returning the xth prime, defined by p(0) = 2, p(x +
1) = nextPrime(p(x)). For convenience we will write this as px (start-
ing with 0; i.e. p0 = 2).
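These definitions translate directly; here is an illustrative Python sketch (the helper names are mine, and the factorial bound is exactly the one justified by Euclid’s argument):

```python
import math

# Bounded search: the least i < bound with test(i), or bound if none exists.
def bounded_min(test, bound):
    for i in range(bound):
        if test(i):
            return i
    return bound

def divides(x, y):
    return any(x * z == y for z in range(y + 1))   # ∃z ≤ y (x·z = y)

def is_prime(x):
    return x > 1 and not any(divides(z, x) for z in range(2, x))

def next_prime(x):
    # Euclid: some prime q with x < q ≤ x! + 1 always exists.
    return bounded_min(lambda q: q > x and is_prime(q), math.factorial(x) + 2)

def p(x):                       # the xth prime, p(0) = 2
    q = 2
    for _ in range(x):
        q = next_prime(q)
    return q

assert [p(i) for i in range(5)] == [2, 3, 5, 7, 11]
```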
I have added one to the last exponent, to guarantee that, for example, the
sequences h2, 7, 3i and h2, 7, 3, 0, 0i have distinct numeric codes. I will take
both 0 and 1 to code the empty sequence; for concreteness, let ∅ denote 0.
(This coding scheme is slightly different from the one used in the book.)
Let us define the following functions:
• length(s), which returns the length of the sequence s:

      length(s) = 0   if s = 0 or s = 1
      length(s) = (min i < s) (p_i | s ∧ ∀j < s (j > i → p_j ∤ s)) + 1   otherwise
• append(s, a), which returns the result of appending a to the sequence s:

      append(s, a) = 2^{a+1}   if s = 0 or s = 1
      append(s, a) = (s · p_{length(s)}^{a+1}) / p_{length(s)−1}   otherwise
I will leave it to you to check that integer division can also be defined
using minimization.
• element(s, i), which returns the ith element of s (where the initial element is called the 0th), or 0 if i is greater than or equal to the length of s:

      element(s, i) = 0   if i ≥ length(s)
      element(s, i) = (min j < s) (p_i^{j+1} ∤ s) − 1   if i + 1 = length(s)
      element(s, i) = (min j < s) (p_i^{j+1} ∤ s)   otherwise
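An illustrative Python sketch of this coding (self-contained; the names are mine, and p is a simple trial-division stand-in for the xth prime):

```python
def p(i):
    # the ith prime, p(0) = 2 (slow trial division, fine for illustration)
    count, q = -1, 1
    while count < i:
        q += 1
        if all(q % d for d in range(2, q)):
            count += 1
    return q

# Code <a0, ..., a_{k-1}> as p0^a0 * ... * p_{k-1}^(a_{k-1}+1), with the
# last exponent bumped by one, and 0 coding the empty sequence.
def encode(seq):
    code = 1 if seq else 0
    for i, a in enumerate(seq):
        code *= p(i) ** (a + 1 if i == len(seq) - 1 else a)
    return code

def length(s):
    if s <= 1:
        return 0
    i, last = 0, 0
    while s > 1:                 # the largest i with p_i | s, plus 1
        while s % p(i) == 0:
            s //= p(i)
            last = i
        i += 1
    return last + 1

def element(s, i):
    n = length(s)
    if i >= n:
        return 0
    j = 0
    while s % p(i) ** (j + 1) == 0:   # least j with p_i^(j+1) not dividing s
        j += 1
    return j - 1 if i + 1 == n else j

s = encode([2, 7, 3])
assert [element(s, i) for i in range(length(s))] == [2, 7, 3]
assert encode([2, 7, 3]) != encode([2, 7, 3, 0, 0])
```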
But using bounded search, we can be lazy. All we need to do is write down
a primitive recursive specification of the object (number) we are looking for,
and a bound on how far to look. The following works:
with the understanding that the second argument to h is just the empty
sequence when x is 0. In either formulation, the idea is that in computing
the “successor step,” the function f can make use of the entire sequence of
values computed so far. This is known as a course-of-values recursion. For a
particular example, it can be used to justify the following type of definition:
    f(x, ~z) = h(x, f(k(x, ~z), ~z), ~z)   if k(x, ~z) < x
    f(x, ~z) = g(x, ~z)                    otherwise
• iequal (x, y)
• iplus(x, y)
• iminus(x, y)
• itimes(x, y)
Similarly, we can define a rational number to be a pair ⟨x, y⟩ of integers with y ≠ 0, representing the value x/y. And we can define qequal, qplus, qminus,
qtimes, qdivides, and so on.
    g(x, y) = f_x(y)
    h(x) = g(x, x) + 1
         = f_x(x) + 1.
    g_0(x) = x + 1
    g_{n+1}(x) = g_n^x(x)
You can confirm that each function g_n is primitive recursive. Each successive function grows much faster than the one before; g_1(x) is equal to 2x, g_2(x) is equal to 2^x · x, and g_3(x) grows roughly like an exponential stack of x 2’s. Ackermann’s function is essentially the function G(x) = g_x(x), and one can show that this grows faster than any primitive recursive function.)
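An illustrative Python sketch of this hierarchy (only tiny inputs are feasible, which is rather the point):

```python
# g_0(x) = x + 1; g_{n+1}(x) is g_n iterated x times, starting at x.
def g(n, x):
    if n == 0:
        return x + 1
    y = x
    for _ in range(x):
        y = g(n - 1, y)
    return y

assert g(1, 5) == 10            # g_1(x) = 2x
assert g(2, 4) == 2 ** 4 * 4    # g_2(x) = 2^x * x
```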
To motivate the definition of the recursive functions, note that our proof
that there are computable functions that are not primitive recursive actually
establishes much more. The argument was very simple: all we used was the fact that it is possible to enumerate functions f_0, f_1, . . . such that, as a function of x and y, f_x(y) is computable. So the argument applies to any
class of functions that can be enumerated in such a way. This puts us in
a bind: we would like to describe the computable functions explicitly; but
any explicit description of a collection of computable functions cannot be
exhaustive!
The way out is to allow partial functions to come into play. We will see
that it is possible to enumerate the partial Turing computable functions; in
fact, we already pretty much know that this is the case, since it is possible
to enumerate Turing machines in a systematic way. We will come back to
our diagonal argument later, and explore why it does not go through when
partial functions are included.
The question is now this: what do we need to add to the primitive
recursive functions to obtain all the partial recursive functions? We need to
do two things:
1. Modify our definition of the primitive recursive functions to allow for
partial functions as well.
is defined if and only if each gi is defined at ~x, and h is defined at g0 (~x), . . . , gk (~x).
With this understanding, the definitions of composition and primitive recursion for partial functions are just as above, except that we have to replace “=” by “≃”.
Definition 2.5.1 The set of partial recursive functions is the smallest set
of partial functions from the natural numbers to the natural numbers (of
various arities) containing zero, successor, and projections, and closed under
composition, primitive recursion, and unbounded search.
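For illustration (Python, names mine), unbounded search is a one-line loop; applying it to a test that never succeeds simply diverges, which is exactly how partiality enters:

```python
# mu f: the least x with f(x, *zs) == 0; diverges if there is none.
def mu(f, *zs):
    x = 0
    while f(x, *zs) != 0:
        x += 1
    return x

# Example: integer square root of z as mu x. [(x + 1)^2 > z], where the
# bracketed test is 0 when true and 1 when false.
isqrt = lambda z: mu(lambda x, z: 0 if (x + 1) ** 2 > z else 1, z)
assert isqrt(17) == 4
```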
Definition 2.5.2 The set of recursive functions is the set of partial recur-
sive functions that are total.
is an x such that f (x, ~z) = 0. In other words, the regular functions are
exactly those functions to which one can apply unbounded search, and end
up with a total function. One can, conservatively, restrict unbounded search
to regular functions:
Definition 2.5.3 The set of general recursive functions is the smallest set
of functions from the natural numbers to the natural numbers (of various
arities) containing zero, successor, and projections, and closed under com-
position, primitive recursion, and unbounded search applied to regular func-
tions.
Since the recursive functions are just the partial recursive functions that
happen to be total, and similarly for the Turing computable functions, we
have:
There are two directions to proving Theorem 2.6.1. First, we will show
that every partial recursive function is Turing computable. Then we will
show that every Turing computable function is partial recursive.
For the first direction, recall the definition of the set of partial recursive
functions as being the smallest set containing zero, successor, and projec-
tions, and closed under composition, primitive recursion, and unbounded
search. To show that every partial recursive function is Turing computable,
it therefore suffices to show that the initial functions are Turing computable,
and that the (partial) Turing computable functions are closed under these
same operations. Indeed, we will show something slightly stronger: each ini-
tial function is computed by a Turing machine that never moves to the left
of the start position, ends its computation on the same square on which it
started, and leaves the tape after the output blank; and the set of functions
computable in this way is closed under the relevant operations. I will follow
the argument in the textbook.
Computing the constant zero is easy: just halt with a blank tape. Com-
puting the successor function is also easy: again, just halt. Computing a
projection function P_i^n is not much harder: just erase all the inputs other
than the ith, copy the ith input to the beginning of the tape, and delete a
single 1.
Closure under composition is slightly more interesting. Suppose f is the
function defined by composition from h, g0 , . . . , gk , i.e.
• Copy I: I, 0, I
• ...
• Delete the initial block I, and move the output to the left.
Setting aside primitive recursion for the moment, let us consider un-
bounded search. Suppose we already have a machine Mf that computes
f (x, ~z), and we wish to find a machine to compute µx f (x, ~z). Let I denote
the block corresponding to the input ~z. The algorithm, of course, is to iter-
atively compute f (0, ~z), f (1, ~z), . . . until the output is zero. We can do this
as follows:
• Loop, as follows:
and that the function that returns the output of a halting computation is also
primitive recursive. Then, assuming f is computed by Turing machine M ,
we can describe f as a partial recursive function as follows: on input x, use
unbounded search to look for a halting computation sequence for machine
M on input x; and, if there is one, return the output of the computation
sequence.
In fact, we did most of the work when we gave a precise definition of Tur-
ing computability in Section 2.1; we only have to show that all the definitions
can be expressed in terms of primitive recursive functions and relations. It
turns out to be convenient to use a sequence of 4-tuples to represent a Tur-
ing machine’s list of instructions (instead of a partial function); otherwise,
the definitions below are just the primitive recursive analogues of the ones
given in Section 2.1.
In the list below, names of functions begin with lower case letters, while
the names of relations begin with upper case letters. I will not provide every
last detail; the ones I leave out are for you to fill in.
4. Operations on configurations:
addSymbol(a, s) = ⟨a⟩ˆs
dropSymbol (s) = . . .
This should return the result of dropping the first symbol of s, if s is
nonempty, and s otherwise.
    firstSymbol(s) = (s)_0   if length(s) > 0
    firstSymbol(s) = 0       otherwise
changeState(c, q) = ⟨q, cSymbol(c), cLeftString(c), cRightString(c)⟩

This should change the state of configuration c to q.

moveLeft(c) = ⟨cState(c), firstSymbol(cLeftString(c)), dropSymbol(cLeftString(c)), addSymbol(cSymbol(c), cRightString(c))⟩
moveRight(c) = . . .
These return the result of moving left (resp. right) in configuration c.
We can now finish off the proof of the theorem. Suppose f (x) is a partial
function computed by a Turing machine, coded by M . Then for every x, we
have
    f(x) ≃ output(µs CompSeq(M, x, s)).
This shows that f is partial recursive.
for every x.
Proof. T and U are simply more conventional notations for any relation and
function pair that behaves like our CompSeq and output.
Proof. Let Un(k, x) ≃ U(µs T(k, x, s)), where T and U are as in Kleene’s normal form theorem.
Proof. This theorem says that there is no total computable function that is universal for the total computable functions. The proof is a simple diagonalization: if Un′(k, x) is total and computable, then

    f(x) = Un′(x, x) + 1

is also total and computable, and for every k, f(k) is not equal to Un′(k, k).
    g(0, k, x) ≃ 0
    g(y + 1, k, x) ≃ Un(k, x);

then

    Un′(k, x) ≃ g(f(k, x), k, x).

But now Un′(k, x) is a total function. And since Un′(k, x) agrees with Un(k, x) wherever the latter is defined, Un′ is universal for those partial computable functions that happen to be total. But this contradicts Theorem 2.7.6.
• computable functions
To sort this out, it might help to draw a big square representing all the
partial functions from N to N, and then mark off two overlapping regions,
corresponding to the total functions and the computable partial functions,
respectively. It is a good exercise to see if you can describe an object in each
of the resulting regions in the diagram.
I have already noted that there are many more — including abacus com-
putability, computability by register machines, computability by a C++ or
Java program, and more. In fact, we will come across a few additional ones
towards the end of the course. In this section we will consider one more
model: computability by lambda terms.
The lambda calculus was originally designed by Alonzo Church in the
early 1930’s as a basis for constructive logic, and not as a model of the
computable functions. But soon after the Turing computable functions, the
recursive functions, and the general recursive functions were shown to be
equivalent, lambda computability was added to the list. The fact that this
initially came as a small surprise makes the characterization all the more
interesting.
In Chapter 3, the textbook discusses some simple uses of λ notation.
Instead of saying “let f be the function defined by f (x) = x + 3,” one can
say, “let f be the function λx (x + 3).” In other words, λx (x + 3) is just a
name for the function that adds three to its argument. In this expression,
x is just a dummy variable, or a placeholder: the same function can just as
well be denoted λy (y + 3). The notation works even with other parameters
around. For example, suppose g(x, y) is a function of two variables, and k
is a natural number. Then λx g(x, k) is the function which maps any value
of x to g(x, k).
This way of defining a function from a symbolic expression is known
as lambda abstraction. The flip side of lambda abstraction is application:
assuming one has a function f (say, defined on the natural numbers), one
can apply it to any value, like 2. In conventional notation, of course, we
write f (2).
What happens when you combine lambda abstraction with application?
Then the resulting expression can be simplified, by “plugging” the applicand
in for the abstracted variable. For example,
(λx (x + 3))(2)
can be simplified to 2 + 3.
Up to this point, we have done nothing but introduce new notations
for conventional notions. The lambda calculus, however, represents a more
radical departure from the set-theoretic viewpoint. In this framework:
The system without any constants at all is called the pure lambda calculus.
Following the handout, we will follow a few notational conventions:
• When parentheses are left out, application takes place from left to
right. For example, if M , N , P , and Q are terms, then M N P Q
abbreviates (((M N )P )Q).
For example,
λxy. xxyxλz xz
abbreviates
λx λy ((((xx)y)x)λz (xz)).
Memorize these conventions. They will drive you crazy at first, but you
will get used to them, and after a while they will drive you less crazy than
having to deal with a morass of parentheses.
Two terms that differ only in the names of the bound variables are called
α-equivalent; for example, λx x and λy y. It will be convenient to think
of these as being the “same” term; in other words, when I say that M and
N are the same, I also mean “up to renamings of the bound variables.”
Variables that are in the scope of a λ are called “bound”, while others are
called “free.” There are no free variables in the previous example; but in
(λz yz)x
y and x are free, and z is bound. More precise definitions of “free,” “bound,” and “α-equivalent” can be found in the handout.
What can one do with lambda terms? Simplify them. If M and N are
any lambda terms and x is any variable, the handout uses [N/x]M to denote
the result of substituting N for x in M , after renaming any bound variables
of M that would interfere with the free variables of N after the substitution.
For example,
[yyz/x](λw xxw) = λw (yyz)(yyz)w.
This notation is not the only one that is standardly used; I, myself, prefer
to use the notation M [N/x], and others use M [x/N ]. Beware!
Intuitively, (λx M )N and [N/x]M have the same meaning; the act of
replacing the first term by the second is called β-contraction. More generally,
if it is possible convert a term P to P 0 by β-contracting some subterm, one
says P β-reduces to P 0 in one step. If P can be converted to P 0 with any
number of one-step reductions (possibly none), then P β-reduces to P 0 . A
term that can not be β-reduced any further is called β-irreducible, or β-
normal. I will say “reduces” instead of “β-reduces,” etc., when the context
is clear.
Let us consider some examples.
1. We have
4. Also, some terms can be reduced in more than one way; for example,
Theorem 2.8.1 Let M, N_1, and N_2 be terms such that M ▷ N_1 and M ▷ N_2. Then there is a term P such that N_1 ▷ P and N_2 ▷ P.
The proof of Theorem 2.8.1 goes well beyond the scope of this class, but
if you are interested you can look it up in Hindley and Seldin, Introduction
to Combinators and λ Calculus.
Finally, I will say that two terms M and N are β-equivalent, or just equivalent, if they reduce to a common term; in other words, if there is some P such that M ▷ P and N ▷ P. This is written M ≡ N. Using Theorem 2.8.1, you can check that ≡ is an equivalence relation, with the additional property that for every M and N, if M ▷ N or N ▷ M, then M ≡ N. (In fact, one can show that ≡ is the smallest equivalence relation having this property.)
What is the lambda calculus doing in a chapter on models of computa-
tion? The point is that it does provide us with a model of the computable
functions, although, at first, it is not even clear how to make sense of this
statement. To talk about computability on the natural numbers, we need
to find a suitable representation for such numbers. Here is one that works
surprisingly well.
make the intentions behind the definitions clearer. In a similar way, I will
resort to the old-fashioned way of saying “define M by M (x, y, z) = . . .”
instead of “define M by M = λx λy λz . . ..”
Let us run through the list. Zero, 0, is just λxy. y. The successor function, S, is defined by S(u) = λxy. x(uxy). You should think about why this works: for each numeral n, thought of as an iterator, and each function f, S(n, f) is a function that, on input y, applies f n times starting with y, and then applies it once more.
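Church numerals translate directly into Python lambdas; here is an illustrative sketch (not from the notes) of zero, successor, and a helper that converts back to an ordinary integer:

```python
# Church numerals: n takes a function x and a value y, and applies x to y
# exactly n times.
zero = lambda x: lambda y: y
succ = lambda u: (lambda x: lambda y: x(u(x)(y)))

def to_int(n):
    # Apply the numeral to the ordinary successor, starting from 0.
    return n(lambda k: k + 1)(0)

three = succ(succ(succ(zero)))
assert to_int(three) == 3
```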
There is nothing to say about projections: P_i^n(x_0, . . . , x_{n−1}) = x_i. In other words, by our conventions, P_i^n is the lambda term λx_0 . . . x_{n−1}. x_i.
Closure under composition is similarly easy. Suppose f is defined by
composition from h, g_0, . . . , g_{k−1}. Assuming h, g_0, . . . , g_{k−1} are represented by terms h̄, ḡ_0, . . . , ḡ_{k−1}, respectively, we need to find a term f̄ representing f. But we can simply define f̄ by
In other words, the language of the lambda calculus is well suited to represent
composition as well.
When it comes to primitive recursion, we finally need to do some work.
We will have to proceed in stages. As before, on the assumption that we
already have terms g and h representing functions g and h, respectively, we
want a term f representing the function f defined by
for every natural number n; the fact that G′ and H′ represent g and h means that whenever we plug in numerals ~m for ~z, F(n + 1, ~m) will normalize to the right answer.
But for this, it suffices to find a term F satisfying
F (0) ≡ G
F (n + 1) ≡ H(n, F (n))
    G = λ~z G′(~z)

and

    H(u, v) = λ~z H′(u, v(u, ~z), ~z).
In other words, with lambda trickery, we can avoid having to worry about
the extra parameters ~z — they just get absorbed in the lambda notation.
Before we define the term F , we need a mechanism for handling ordered
pairs. This is provided by the next lemma.
Lemma 2.8.6 There is a lambda term D such that for each pair of lambda terms M and N, D(M, N)(0) ▷ M and D(M, N)(1) ▷ N.
K(y) = λx y.
and
    D(M, N, 1) ▷ 1(K(N))M ▷ K(N)M ▷ N,
as required.
The idea is that D(M, N) represents the pair ⟨M, N⟩, and if P is assumed to represent such a pair, P(0) and P(1) represent the left and right projections, (P)_0 and (P)_1. For clarity, I will use the latter notations.
Now, let us remember where we stand. We need to show that given any
terms, G and H, we can find a term F such that
F (0) ≡ G
F (n + 1) ≡ H(n, F (n))
using numerals as iterators. Notice that the first pair is just ⟨0, G⟩. Given a pair ⟨n, F(n)⟩, the next pair, ⟨n + 1, F(n + 1)⟩, is supposed to be equivalent to ⟨n + 1, H(n, F(n))⟩. We will design a lambda term T that makes this one-step transition.
The last paragraph was simply heuristic; the details are as follows. Define T(u) by

    T(u) = ⟨S((u)_0), H((u)_0, (u)_1)⟩.

Now it is easy to verify that for any number n,

    T(⟨n, M⟩) ▷ ⟨n + 1, H(n, M)⟩.

As suggested above, given G and H, define F(u) by

    F(u) = (u(T, ⟨0, G⟩))_1.

In other words, on input n, F iterates T n times on ⟨0, G⟩, and then returns the second component. To start with, we have

• 0(T, ⟨0, G⟩) ≡ ⟨0, G⟩
• F(0) ≡ G

By induction on n, we can show that for each natural number one has the following:

• n + 1(T, ⟨0, G⟩) ≡ ⟨n + 1, F(n + 1)⟩
• F(n + 1) ≡ H(n, F(n))

For the second clause, we have

    F(n + 1) ▷ (n + 1(T, ⟨0, G⟩))_1
             ≡ (T(n(T, ⟨0, G⟩)))_1
             ≡ (T(⟨n, F(n)⟩))_1
             ≡ (⟨n + 1, H(n, F(n))⟩)_1
             ≡ H(n, F(n)).

Here we have used the induction hypothesis on the second-to-last line. For the first clause, we have

    n + 1(T, ⟨0, G⟩) ≡ T(n(T, ⟨0, G⟩))
                     ≡ T(⟨n, F(n)⟩)
                     ≡ ⟨n + 1, H(n, F(n))⟩
                     ≡ ⟨n + 1, F(n + 1)⟩.
Here we have used the second clause in the last line. So we have shown
F (0) ≡ G and, for every n, F (n + 1) ≡ H(n, F (n)), which is exactly what
we needed.
The only thing left to do is to show that the partial functions repre-
sented by lambda terms are closed under the µ operation, i.e. unbounded
search. But it will be much easier to do this later on, after we have dis-
cussed the fixed-point theorem. So, take this as an IOU. Modulo this claim
(and some details that have been left for you to work out), we have proved
Theorem 2.8.5.
Chapter 3
Computability Theory
3.1 Generalities
The branch of logic known as Computability Theory deals with issues having
to do with the computability, or relative computability, of functions and sets.
From the last chapter, we know that we can take the word “computable” to
mean “Turing computable” or, equivalently, “recursive.” It is evidence of Kleene’s influence that the subject used to be known as Recursion Theory, and today both names are commonly used.
Most introductions to Computability Theory begin by trying to abstract
away the general features of computability as much as possible, so that
one can explore the subject without having to refer to a specific model of
computation. For example, we have seen that there is a universal partial
computable function, Un(n, x). This allows us to enumerate the partial
computable functions; from now on, we will adopt the notation ϕ_n to denote the nth unary partial computable function, defined by ϕ_n(x) ≃ Un(n, x).
(Kleene used {n} for this purpose, but this notation has not been used as
much recently.) Slightly more generally, we can uniformly enumerate the
partial computable functions of arbitrary arities, and I will use ϕ_n^k to denote the nth k-ary partial recursive function. The key fact is that there is a
universal function for this set. In other words:
Theorem 3.1.1 There is a partial computable function f (x, y) such that
for each n and k and sequence of numbers a0 , . . . , ak−1 we have
In fact, we can take f(n, x) to be Un(n, x), and define ϕ_n^k(a_0, . . . , a_{k−1}) ≃ Un(n, ⟨a_0, . . . , a_{k−1}⟩). Alternatively, you can think of f as the partial computable function that, on input n and ⟨a_0, . . . , a_{k−1}⟩, returns the output of Turing machine n on input a_0, . . . , a_{k−1}.
Remember also Kleene’s normal form theorem:
Theorem 3.1.2 There is a primitive recursive relation T(n, x, s) and a primitive recursive function U such that for each recursive function f there is a number n, such that

    f(x) ≃ U(µs T(n, x, s)).

In fact, T and U can be used to define the enumeration ϕ_0, ϕ_1, ϕ_2, . . .. From now on, we will assume that we have fixed a suitable choice of T and U, and take the equation

    ϕ_n(x) ≃ U(µs T(n, x, s))

to be the definition of ϕ_n.
The next theorem is known as the “s-m-n theorem,” for a reason that
will be clear in a moment. The hard part is understanding just what the
theorem says; once you understand the statement, it will seem fairly obvious.
Theorem 3.1.3 For each pair of natural numbers n and m, there is a primitive recursive function s_n^m such that for every sequence x, a_0, . . . , a_{m−1}, y_0, . . . , y_{n−1}, we have

    ϕ_{s_n^m(x, a_0, . . . , a_{m−1})}^n(y_0, . . . , y_{n−1}) ≃ ϕ_x^{m+n}(a_0, . . . , a_{m−1}, y_0, . . . , y_{n−1}).

It is helpful to think of s_n^m as acting on programs. That is, s_n^m takes a program, x, for an (m+n)-ary function, as well as fixed inputs a_0, . . . , a_{m−1}; and it returns a program, s_n^m(x, a_0, . . . , a_{m−1}), for the n-ary function of the remaining arguments. If you think of x as the description of a Turing machine, then s_n^m(x, a_0, . . . , a_{m−1}) is the Turing machine that, on input y_0, . . . , y_{n−1}, prepends a_0, . . . , a_{m−1} to the input string, and runs x. Each s_n^m is then just a primitive recursive function that finds a code for the appropriate Turing machine.
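The computational content of the s-m-n theorem is just partial application; an illustrative Python analogy (with made-up names):

```python
from functools import partial

# A 3-ary "program": think of it as phi_x, with x its index.
def add3(a, y0, y1):
    return a + y0 + y1

# s takes a program and fixed arguments, and returns a program of the
# remaining arguments; computationally this is just partial application.
def s(prog, *fixed):
    return partial(prog, *fixed)

g = s(add3, 10)                   # fix a = 10
assert g(1, 2) == add3(10, 1, 2)
```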
Here is another useful fact:
Theorem 3.1.4 Every partial computable function has infinitely many in-
dices.
Again, this is intuitively clear. Given any Turing machine, M, one can design another Turing machine M′ that twiddles its thumbs for a while, and then acts like M.
Throughout this chapter, we will reason about what types of things are
computable. To show that a function is computable, there are two ways one
can proceed:
The textbook uses the term recursively enumerable instead. This is the
original terminology, and today both are commonly used, as well as the
abbreviations “c.e.” and “r.e.” You should think about what the definition
means, and why the terminology is appropriate. The idea is that if S is the
range of the computable function f , then
Theorem 3.2.3 Let S be a set of natural numbers. Then the following are
equivalent:
1. S is computably enumerable.
The first three clauses say that we can equivalently take any nonempty
computably enumerable set to be enumerated by either a computable func-
tion, a partial computable function, or a primitive recursive function. The
fourth clause tells us that if S is computably enumerable, then for some
index e,
S = {x | ϕe (x) ↓}.
If we take e to code a Turing machine, then S is the set of inputs on which
the Turing machine halts. For that reason, computably enumerable sets are
Let
g(y) = µx(f (x) = y).
S = {x | ϕe (x) ↓}.
S = {x | ∃y R(x, y)}.
S = {x | ∃y T (e, x, y)}.
Theorem 3.2.5 Let K_0 be the set {⟨e, x⟩ | ϕ_e(x) ↓}. Then K_0 is computably enumerable but not computable.
The following theorem gives some closure properties on the set of com-
putably enumerable sets.
and

    j(x) = µy (f((y)_0) = x ∧ g((y)_1) = x).
Then A ∪ B is the domain of h, and A ∩ B is the domain of j. Here is what is going on, in computational terms: given procedures that enumerate A and B, we can semi-decide if an element x is in A ∪ B by looking for x in either enumeration; and we can semi-decide if an element x is in A ∩ B by looking for x in both enumerations at the same time.
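An illustrative sketch of the dovetailing idea with Python generators: interleave the two enumerations to semi-decide membership in the union.

```python
from itertools import count

# Given enumerations (generator factories) of A and B, semi-decide x ∈ A ∪ B
# by interleaving them; this loops forever exactly when x is in neither set.
def in_union(enum_a, enum_b, x):
    a, b = enum_a(), enum_b()
    while True:
        if next(a) == x or next(b) == x:
            return True

evens = lambda: (2 * n for n in count())
squares = lambda: (n * n for n in count())
assert in_union(evens, squares, 9)   # 9 is a square
```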
that is similar to the one described below, only with the added requirement
that the reduction can be computed in polynomial time.
We have already used this notion implicitly. Define the set K by
K = {x | ϕx (x) ↓},
Proof. Let f be a many-one reduction from A to B. For the first claim, just
check that if B is the domain of partial function g, then A is the domain of
g ◦ f:
x ∈ A ↔ f (x) ∈ B
↔ g(f (x)) ↓ .
K1 = {e | ϕe (0) ↓}.
Note that f ignores its third input entirely. Pick an index e such that f = ϕ_e^3; so we have

    ϕ_e^3(x, y, z) ≃ ϕ_x(y).
By the s-m-n theorem, there is a function s(e, x, y) such that, for every z,
In terms of the informal argument above, s(e, x, y) is an index for the ma-
chine that, for any input z, ignores that input and computes ϕx (y). In
particular, we have
    B = W_e = {x | ϕ_e(x) ↓}.

Let f be the function f(x) = ⟨e, x⟩. Then for every natural number x, x ∈ B if and only if f(x) ∈ K_0. In other words, f reduces B to K_0.
To see that K1 is complete, note that in the last theorem we reduced
K0 to it. So, by Proposition 3.3.2, any computably enumerable set can be
reduced to K1 as well. K can be reduced to K0 in much the same way.
So, it turns out that all the examples of computably enumerable sets
that we have considered so far are either computable, or complete. This
should seem strange! Are there any examples of computably enumerable
sets that are neither computable nor complete? The answer is yes, but it
wasn’t until the middle of the 1950’s that this was established by Friedberg
and Muchnik, independently.
Let us consider one more example of using the s-m-n theorem to show
that something is noncomputable. Let Tot be the set of indices of total
computable functions, i.e.
Proof. It turns out that Tot is not even computably enumerable — its com-
plexity lies further up on the “arithmetic hierarchy.” But we will not worry
about this strengthening here.
To see that Tot is not computable, it suffices to show that K is reducible
to it. Let h(x, y) be defined by
    h(x, y) ≃ 0            if x ∈ K
    h(x, y) is undefined   otherwise

Note that h(x, y) does not depend on y at all. By now, it should not be hard to see that h is partial computable: on input x, y, the program computing h first simulates machine x on input x; if this computation halts, h(x, y) outputs 0 and halts. So h(x, y) is just Z(µs T(x, x, s)), where Z is the constant zero function.
Using the s-m-n theorem, there is a primitive recursive function k(x)
such that for every x and y,
    ϕ_{k(x)}(y) ≃ 0            if x ∈ K
    ϕ_{k(x)}(y) is undefined   otherwise
If you think about it, you will see that the specifics of Tot do not play
into the proof above. We designed h(x, y) to act like the constant function
j(y) = 0 exactly when x is in K; but we could just as well have made it act
like any other partial computable function under those circumstances. This
observation lets us state a more general theorem, which says, roughly, that
no nontrivial property of computable functions is decidable.
Keep in mind that ϕ0 , ϕ1 , ϕ2 , . . . is our standard enumeration of the
partial computable functions.
An index set is a set A with the property that if n and m are indices
which “compute” the same function, then either both n and m are in A,
or neither is. It is not hard to see that the set A in the theorem has this
• {x | ϕx is constant}
• {x | ϕx is total}
and let ⌜l⌝ be an index for l. Finally, let e = diag(⌜l⌝). Then for every y, we have
as required.
What’s going on? The following heuristic might help you understand
the proof.
Suppose you are given the task of writing a computer program that
prints itself out. Suppose further, however, that you are working with a
programming language with a rich and bizarre library of string functions.
In particular, suppose your programming language has a function diag which works as follows: given an input string s, diag locates each instance of the symbol ‘x’ occurring in s, and replaces it by a quoted version of the original string. For example, given the string
hello x world
as input, the function returns
hello ’hello x world’ world
as output. In that case, it is easy to write the desired program; you can
check that
print(diag(’print(diag(x))’))
does the trick. For more common programming languages like C++ and
Java, the same idea (with a more involved implementation) still works.
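The same trick can be sketched in Python, with repr playing the role of quoting (illustrative; the output reproduces the final line only, and a fully self-printing program would fold the definition of diag into the string as well):

```python
# diag replaces each 'x' in s by a quoted copy of the original string.
def diag(s):
    return s.replace('x', repr(s))

print(diag('print(diag(x))'))   # prints: print(diag('print(diag(x))'))
```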
We are only a couple of steps away from the proof of the fixed-point
theorem. Suppose a variant of the print function print(x , y) accepts a string
x and another numeric argument y, and prints the string x repeatedly, y
times. Then the “program”
getinput(y);print(diag(’getinput(y);print(diag(x),y)’),y)
prints itself out y times, on input y. Replacing the getinput–print–diag skeleton by an arbitrary function g(x, y) yields

    g(diag(’g(diag(x),y)’),y)

which is a program that, on input y, runs g on the program itself and y. Identifying “quoting” with “using an index for,” we have the proof above.
For now, it is o.k. if you want to think of the proof as formal trickery,
or black magic. But you should be able to reconstruct the details of the ar-
gument given above. When we prove the incompleteness theorems (and the
related “fixed-point theorem”) we will discuss other ways of understanding
why it works.
Let me also show that the same idea can be used to get a “fixed point”
combinator. Suppose you have a lambda term g, and you want another term
k with the property that k is β-equivalent to gk. Define terms
diag(x) = xx
and
l(x) = g(diag(x))
using our notational conventions; in other words, l is the term λx.g(xx). Let
k be the term ll. Then we have

    k = (λx. g(xx))(λx. g(xx))
      ▷ g((λx. g(xx))(λx. g(xx)))
      = gk.
If one takes
Y = λg ((λx. g(xx))(λx. g(xx)))
then Y g and g(Y g) reduce to a common term; so Y g ≡β g(Y g). This is
known as “Curry’s combinator.” If instead one takes
then in fact Y g reduces to g(Y g), which is a stronger statement. This latter
version of Y is known as “Turing’s combinator.”
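For what it is worth, the same idea runs in Python, where eager evaluation forces an η-expanded variant of Y (often called Z); an illustrative sketch:

```python
# Z is Y with the self-application guarded by an extra lambda, so that
# Python's eager evaluation does not unfold it forever.
Z = lambda g: (lambda x: g(lambda v: x(x)(v)))(lambda x: g(lambda v: x(x)(v)))

# Example: factorial as a fixed point of a non-recursive functional.
fact = Z(lambda f: lambda n: 1 if n == 0 else n * f(n - 1))
assert fact(5) == 120
```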
ϕe (y) = e + y.
As another example, one can use the proof of the fixed-point theorem to
design a program in Java or C++ that prints itself out.
and then using the fixed-point lemma to find an index e such that ϕe (y) =
g(e, y).
For a concrete example, the “greatest common divisor” function gcd(u, v) can be defined by

    gcd(u, v) ≃ v                    if u = 0
    gcd(u, v) ≃ gcd(v mod u, u)      otherwise
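Mirroring in Python what the fixed-point lemma provides for indices, the recursion can be supplied entirely by a fixed-point combinator (illustrative sketch):

```python
# The eager fixed-point combinator Z, as in the sketch above.
Z = lambda g: (lambda x: g(lambda v: x(x)(v)))(lambda x: g(lambda v: x(x)(v)))

# gcd as the fixed point of a non-recursive functional, mirroring
# gcd(u, v) = v if u = 0, and gcd(v mod u, u) otherwise.
gcd = Z(lambda rec: lambda u: lambda v: v if u == 0 else rec(v % u)(u))
assert gcd(12)(18) == 6
```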
Proof. The idea is roughly as follows. Given x, we will use the fixed-point lambda term Y to define a function h_x(n) which searches for a y starting at n; then g(x) is just h_x(0). The function h_x can be expressed as the solution of a fixed-point equation:

    h_x(n) ≃ n             if f(x, n) = 0
    h_x(n) ≃ h_x(n + 1)    otherwise.
We can do this using the fixed-point term Y . First, let U be the term
and then let H be the term Y U . Notice that the only free variable in H is
x. Let us show that H satisfies the equation above.
By the definition of Y , we have
H = Y U ≡ U (Y U ) = U (H).
    H(n) ≡ U(H, n)
         ▷ D(n, H(S(n)), F(x, n)),
Chapter 4
Incompleteness
the century, mathematicians tried to push further, and explain all aspects
of calculus, including the real numbers themselves, in terms of the natu-
ral numbers. (Kronecker: “God created the whole numbers, all else is the
work of man.”) In 1872, Dedekind wrote “Continuity and the irrational
numbers,” where he showed how to “construct” the real numbers as sets of
rational numbers (which, as you know, can be viewed as pairs of natural
numbers); in 1888 he wrote “Was sind und was sollen die Zahlen” (roughly,
“What are the natural numbers, and what should they be?”) which aimed
to explain the natural numbers in purely “logical” terms. In 1887 Kronecker wrote “Über den Zahlbegriff” (“On the concept of number”) where he spoke of representing all mathematical objects in terms of the integers; in 1889 Giuseppe Peano gave formal, symbolic axioms for the natural numbers.
The end of the nineteenth century also brought a new boldness in dealing
with the infinite. Before then, infinitary objects and structures (like the set
of natural numbers) were treated gingerly; “infinitely many” was understood
as “as many as you want,” and “approaches in the limit” was understood as
“gets as close as you want.” But Georg Cantor showed that it was possible to take the infinite at face value. Work by Cantor, Dedekind, and others helped to introduce the general set-theoretic understanding of mathematics that we discussed earlier in this course.
Which brings us to twentieth century developments in logic and foun-
dations. In 1902 Russell discovered the paradox in Frege’s logical system.
In 1904 Zermelo proved Cantor’s well-ordering principle, using the so-called
“axiom of choice”; the legitimacy of this axiom prompted a good deal of
debate. Between 1910 and 1913 the three volumes of Russell and White-
head’s Principia Mathematica appeared, extending the Fregean program of
establishing mathematics on logical grounds. Unfortunately, Russell and
Whitehead were forced to adopt two principles that seemed hard to jus-
tify as purely logical: an axiom of infinity and an axiom of “reducibility.”
In the 1900’s Poincaré criticized the use of “impredicative definitions” in
mathematics, and in the 1910’s Brouwer began proposing to refound all of mathematics on an “intuitionistic” basis, which avoided the use of the law of the excluded middle (p ∨ ¬p).
Strange days indeed! The program of reducing all of mathematics to
logic is now referred to as “logicism,” and is commonly viewed as having
failed, due to the difficulties mentioned above. The program of developing
mathematics in terms of intuitionistic mental constructions is called “in-
tuitionism,” and is viewed as posing overly severe restrictions on everyday
mathematics. Around the turn of the century, David Hilbert, one of the
most influential mathematicians of all time, was a strong supporter of the
new, abstract methods introduced by Cantor and Dedekind: “no one will
drive us from the paradise that Cantor has created for us.” At the same
time, he was sensitive to foundational criticisms of these new methods (oddly
enough, now called “classical”). He proposed a way of having one’s cake and
eating it too:
• Use safe, “finitary” methods to prove that these formal deductive sys-
tems are consistent.
are all terms. Strictly speaking, there should be more parentheses, and
function symbols should all be written before the arguments (e.g. +(x, y)),
but we will adopt the usual conventions for readability. I will typically use
symbols r, s, t to range over terms, as in “let t be any term.” Some terms,
like the last one above, have no variables; they are said to be “closed.”
Once one has specified the set of terms, one then defines the set of for-
mulas. Do not confuse these with terms: terms name things, while formulas
say things. I will use Greek letters like ϕ, ψ, and θ to range over formulas.
Some examples are
is read
((∀x R(x)) → ((¬S(z)) ∧ T (u))).
One can extend both the semantic and syntactic notions above to first-
order logic. On the semantic side, an interpretation of a language is called a
model. Intuitively, it should be clear which of the following models satisfies
the sentence ∀x ∃y R(x, y):
• ⟨N, <⟩, that is, the interpretation in which variables range over natural numbers and R is interpreted as the less-than relation
• ⟨Z, <⟩
• ⟨N, >⟩
• ⟨N, |⟩
Here too the hard direction is completeness; this was proved by Gödel
in 1929. It may seem consummately confusing that he later proved incom-
pleteness in 1931, but it is important to keep in mind that the two theorems
say very different things.
• Gödel’s completeness theorem says that the usual deductive systems
are complete for the semantics “true in all models.”
– ∀x ϕ(x) → ϕ(t)
– ϕ(t) → ∃x ϕ(x)
– ∀x (ϕ → ψ(x)) → (ϕ → ∀x ψ(x)), as long as x is not free in ϕ.
– ∀x (x = x)
– ∀x, y (x = y → y = x)
– ∀x, y, z (x = y ∧ y = z → x = z)
– ∀x_0, . . . , x_k, y_0, . . . , y_k (x_0 = y_0 ∧ . . . ∧ x_k = y_k ∧ ϕ(x_0, . . . , x_k) → ϕ(y_0, . . . , y_k)).
Note that the first clause relies on the fact that the set of propositional
validities is decidable. Note also that there are infinitely many axioms above;
for example, the first quantifier axiom is really an infinite list of axioms, one
for each formula ϕ. Finally, there are three rules that allow you to derive
more theorems:
Incidentally, any sound and complete deductive system will satisfy what
is known as the deduction theorem: if Γ is any set of sentences and ϕ and
ψ are any sentences, then if Γ ∪ {ϕ} ` ψ, then Γ ` ϕ → ψ (the converse is
obvious). This is often useful. Since ¬ϕ is logically equivalent to ϕ → ⊥, where ⊥ is any contradiction, the deduction theorem implies that Γ ∪ {ϕ} is consistent if and only if Γ ⊬ ¬ϕ, and Γ ∪ {¬ϕ} is consistent if and only if Γ ⊬ ϕ.
Where are we going with all this? We would like to bring computabil-
ity into play; in other words, we would like to ask questions about the
computability of various sets and relations having to do with formulas and
proofs. So the first step is to choose numerical codings of
• terms,
• formulas, and
• proofs
One can do similar things for formulas, and then a proof is just a sequence of formulas satisfying certain restrictions. It is not difficult to choose a coding such that the following, for example, are all computable (and, in fact, primitive recursive):
4.3 Representability in Q
Let us start by focusing on theories of arithmetic. We will describe a
very minimal such theory called “Q” (or, sometimes, “Robinson’s Q,” af-
ter Raphael Robinson). We will say what it means for a function to be
representable in Q, and then we will prove the following:
For one thing, this provides us with yet another model of computability.
But we will also use it to show that the set {ϕ | Q ` ϕ} is not decidable, by
reducing the halting problem to it. By the time we are done, we will have
proved much stronger things than this; but this initial sketch gives a good
sense of where we are headed.
First, let us define Q. The language of Q is the language of arithmetic,
as described above; Q consists of the following axioms (to be used in con-
junction with the other axioms and rules of first-order logic with equality):
1. x′ = y′ → x = y
2. 0 ≠ x′
3. x ≠ 0 → ∃y (x = y′)
4. x + 0 = x
5. x + y′ = (x + y)′
6. x × 0 = 0
7. x × y′ = x × y + x
8. x < y ↔ ∃z (z′ + x = y)

For each natural number n, define the numeral n̄ to be the term 0′′...′ where there are n tick marks in all. (Note that the book does not take < to be
• ϕ_f(n̄_0, . . . , n̄_k, m̄)
There are other ways of stating the definition; for example, we could equivalently require that Q proves ∀y (ϕ_f(n̄_0, . . . , n̄_k, y) ↔ m̄ = y), where we can take θ ↔ η to abbreviate (θ → η) ∧ (η → θ). The main theorem of this section is the following:
There are two directions to proving the theorem. One of them is fairly
straightforward.
Proof. All we need to know is that we can code terms, formulas, and proofs
in such a way that the relation “d is a proof of ϕ in theory Q” is computable,
as well as the function SubNumeral (ϕ, n, v) which returns (a numerical code
of) the result of substituting the numeral corresponding to n for the vari-
able (coded by) v in the formula (coded by) ϕ. Assuming this, suppose f is
represented by the formula ϕf . Then, given inputs n0 , . . . , nk , we can compute
f (n0 , . . . , nk ) by searching systematically through pairs ⟨d, m⟩ until we find a
d that codes a proof in Q of ϕf (n̄0 , . . . , n̄k , m̄), and returning the m we find.
The first clause of the definition of representability guarantees that the search
terminates, and the second clause guarantees that the answer is unique.
This completes the proof, modulo the (involved but routine) details of coding
and defining the function and relation above.
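The following Python sketch shows the shape of the search just described. The helpers is_proof_in_Q and subst_numerals are assumed to exist; they stand for the (involved but routine) coding machinery, and are not implemented here.

from itertools import count

def compute_f(phi_f, inputs, is_proof_in_Q, subst_numerals):
    # Dovetail over all pairs (d, m): at stage n, consider d + m == n.
    for n in count():
        for m in range(n + 1):
            d = n - m
            # target: (the code of) phi_f with the numerals for the
            # inputs and for m substituted in.
            target = subst_numerals(phi_f, list(inputs) + [m])
            if is_proof_in_Q(d, target):
                # By representability, this m must equal f(inputs).
                return m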
The other direction is more interesting, and requires more work. We will
complete the proof as follows:
• We will define a class of functions C, and show that every function in C
is representable in Q.
• We will show that C is the set of computable functions, i.e. our defi-
nition provides another characterization of computability.
Our proof will follow the proof in Chapter 22 of the textbook very closely.
(The textbook takes C to include partial functions as well, but then implic-
itly restricts to the total ones later in the proof.)
Let C be the smallest set of functions containing
• 0,
• successor,
• addition,
• multiplication,
• χ= (the characteristic function of equality), and
• projections,
and closed under
• composition, and
• unbounded search (µ) applied to regular functions.
Remember this last restriction means simply that you can only use the µ
operation when the result is total. Compare this to the definition of the
general recursive functions: here we have added plus, times, and χ= , but we
have dropped primitive recursion. Clearly everything in C is recursive, since
plus, times, and χ= are. We will show that the converse is also true; this
amounts to saying that with the other stuff in C we can carry out primitive
recursion.
To do so, we need to develop functions that handle sequences. (If we had
exponentiation as well, our task would be easier.) When we had primitive
recursion, we could define things like the “nth prime,” and pick a fairly
straightforward coding. But here we do not have primitive recursion, so we
need to be more clever.
Lemma 4.3.4 There is a function β(d, i) in C such that for every sequence
a0 , . . . , an there is a number d, such that for every i less than or equal to n,
β(d, i) = ai .
Definition 4.3.5 Two natural numbers a and b are relatively prime if their
greatest common divisor is 1; in other words, they have no divisors in
common other than 1.
Definition 4.3.6 a ≡ b mod c means c|(a − b), i.e. a and b have the same
remainder when divided by c.
The Chinese remainder theorem says that whenever x0 , . . . , xn are pairwise
relatively prime, then for any y0 , . . . , yn there is a number z such that
z ≡ y0 mod x0
z ≡ y1 mod x1
. . .
z ≡ yn mod xn .
I will not prove this theorem, but you can find the proof in many number
theory textbooks. The proof is also outlined as exercise 1 on page 201 of
the textbook.
Here is how we will use the Chinese remainder theorem: if x0 , . . . , xn
are bigger than y0 , . . . , yn respectively, then we can take z to code the se-
quence hy0 , . . . , yn i. To recover yi , we need only divide z by xi and take
the remainder. To use this coding, we will need to find suitable values for
x0 , . . . , xn .
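Here is a small numerical illustration of this use of the theorem, with moduli picked by hand for the example. The brute-force search for z succeeds because the Chinese remainder theorem guarantees a solution exists.

from math import gcd, prod

ys = [3, 1, 4, 1, 5]               # the sequence to be coded
xs = [7, 11, 13, 15, 16]           # pairwise relatively prime, each x_i > y_i
assert all(gcd(a, b) == 1 for i, a in enumerate(xs) for b in xs[i + 1:])

# Some z below x_0 * ... * x_n must satisfy all the congruences.
z = next(z for z in range(prod(xs))
         if all(z % x == y for x, y in zip(xs, ys)))

print(z)                    # a single number coding the whole sequence
print([z % x for x in xs])  # decoding: prints [3, 1, 4, 1, 5]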
A couple of observations will help us in this regard. Given y0 , . . . , yn , let
j = max(n, y0 , . . . , yn ) + 1,
and let
x0 = 1 + j!
x1 = 1 + 2 · j!
x2 = 1 + 3 · j!
..
.
xn = 1 + (n + 1) · j!
The claim is that, first, x0 , . . . , xn are pairwise relatively prime and, second,
each xi is greater than the corresponding yi . To prove the first clause, suppose
p is a prime number that divides both xi and xk , with i < k. Then p divides
their difference xk − xi = (k − i) · j!. Since p divides 1 + (i + 1) · j!, p cannot
divide j!, and so p must divide k − i. But k − i is
at most n, and we have chosen j > n, so this implies that p|j!, again a
contradiction. So there is no prime number dividing both xi and xk . Clause
2 is easy: we have yi < j < j! < xi .
Now let us prove the β function lemma. Remember that C is the smallest
set containing 0, successor, plus, times, χ= , projections, and closed under
composition and µ applied to regular functions. As usual, say a relation is in
C if its characteristic function is. As before we can show that the relations
in C are closed under boolean combinations and bounded quantification; for
example:
• not(x) = χ= (x, 0)
• µx ≤ z R(x, y) = µx (R(x, y) ∨ x = z)
• ∃x ≤ z R(x, y) ≡ R(µx ≤ z R(x, y), y)
We can then show that all of the following are in C:
• The pairing function, J(x, y) = ½[(x + y)(x + y + 1)] + x
• Projections
K(z) = µx ≤ z (∃y ≤ z (z = J(x, y)))
and
L(z) = µy ≤ z (∃x ≤ z (z = J(x, y))).
• x<y
• x|y
• The function rem(x, y) which returns the remainder when y is divided
by x
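These definitions can be run directly; here is a Python transcription, in which K and L are computed by the same bounded searches as the µ-terms above.

def J(x, y):
    """The pairing function from the text."""
    return ((x + y) * (x + y + 1)) // 2 + x

def K(z):
    """First projection: the least x <= z such that some y <= z
    gives J(x, y) == z."""
    return next(x for x in range(z + 1)
                if any(J(x, y) == z for y in range(z + 1)))

def L(z):
    """Second projection, likewise."""
    return next(y for y in range(z + 1)
                if any(J(x, y) == z for x in range(z + 1)))

z = J(3, 5)                  # z == 39: the pair (3, 5) in a single number
assert (K(z), L(z)) == (3, 5)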
Now define
β ∗ (d0 , d1 , i) = rem(1 + (i + 1)d1 , d0 )
and
β(d, i) = β ∗ (K(d), L(d), i).
This is the function we need. Given a0 , . . . , an , as above, let
j = max(n, a0 , . . . , an ) + 1,
and let d1 = j!. By the observations above, we know that 1 + d1 , 1 +
2d1 , . . . , 1 + (n + 1)d1 are relatively prime and all are bigger than a0 , . . . , an .
By the Chinese remainder theorem there is a value d0 such that for each i,
d0 ≡ ai mod (1 + (i + 1)d1 )
and so, since ai < 1 + (i + 1)d1 ,
ai = rem(1 + (i + 1)d1 , d0 ).
Now let d = J(d0 , d1 ). Then, unwinding the definitions, we have
β(d, i) = β∗(d0 , d1 , i)
= rem(1 + (i + 1)d1 , d0 )
= ai ,
which is what we need. This completes the proof of the β-function lemma.
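The whole construction can be checked numerically. The following Python sketch builds d1 = j! and finds d0 by brute-force search (standing in for the appeal to the Chinese remainder theorem), then decodes the sequence with β∗.

from math import factorial, prod

def rem(x, y):
    """The remainder when y is divided by x."""
    return y % x

def beta_star(d0, d1, i):
    return rem(1 + (i + 1) * d1, d0)

def code(seq):
    """Return a pair (d0, d1) coding seq, following the text's recipe."""
    n = len(seq) - 1
    j = max([n] + list(seq)) + 1
    d1 = factorial(j)
    moduli = [1 + (i + 1) * d1 for i in range(n + 1)]
    # Brute-force stand-in for the Chinese remainder theorem:
    d0 = next(z for z in range(prod(moduli))
              if all(z % m == a for m, a in zip(moduli, seq)))
    return d0, d1

d0, d1 = code([2, 0, 3])
assert [beta_star(d0, d1, i) for i in range(3)] == [2, 0, 3]
# beta itself is beta(d, i) = beta_star(K(d), L(d), i), with d = J(d0, d1).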
Now we can show that C is closed under primitive recursion. Suppose
f (~z) and g(u, v, ~z) are both in C, and let h(x, ~z) be the function defined by
h(0, ~z) = f (~z)
h(x + 1, ~z) = g(x, h(x, ~z), ~z).
The trick is to first define an auxiliary function ĥ(x, ~z) which returns a
code d for the sequence of values h(0, ~z), . . . , h(x, ~z),
where now (d)i is short for β(d, i). In other words, ĥ returns a sequence that
begins ⟨h(0, ~z), h(1, ~z), . . . , h(x, ~z)⟩. ĥ is in C, because we can write it as
ĥ(x, ~z) = µd ((d)0 = f (~z) ∧ ∀i < x ((d)i+1 = g(i, (d)i , ~z))),
and then h(x, ~z) = (ĥ(x, ~z))x .
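Here is the same idea in executable form: a Python sketch that computes a primitive recursion by a µ-search over sequence codes, with the pair (d0, d1) unpacked from a single number. The example function at the end is chosen here just for illustration.

from itertools import count
from math import isqrt

def beta(d, i):
    """beta(d, i), with (d0, d1) unpacked from d in closed form
    (equivalent to the bounded searches defining K and L)."""
    w = (isqrt(8 * d + 1) - 1) // 2
    d0 = d - w * (w + 1) // 2
    d1 = w - d0
    return d0 % (1 + (i + 1) * d1)

def h(x, z, f, g):
    """The primitive recursion of f and g, computed by mu-search for a
    number d coding the sequence h(0, z), ..., h(x, z)."""
    for d in count():
        if beta(d, 0) != f(z):
            continue
        if all(beta(d, i + 1) == g(i, beta(d, i), z) for i in range(x)):
            return beta(d, x)

# Example: h(x) = 2^x, from f(z) = 1 and g(u, v, z) = 2v.
print(h(2, (), lambda z: 1, lambda u, v, z: 2 * v))   # prints 4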
The first lemma we need says that whenever n ≠ m, Q proves n̄ ≠ m̄.
Note that the lemma does not say much: in essence it says that Q
can prove that different numerals denote different objects. For example, Q
proves 0′′ ≠ 0′′′. But showing that this holds in general requires some care.
Note also that although we are using induction, it is induction outside of Q.
I will continue on through Lemma 12. At that point, we will be able
to represent zero, successor, plus, times, and the characteristic function for
equality, and projections. In each case, the appropriate representing formula
is entirely straightforward; for example, zero is represented by the formula
y = 0, successor by the formula
x0′ = y,
and addition by the formula
x0 + x1 = y.
The work involves showing that Q can prove the relevant statements; for ex-
ample, saying that plus is represented by the formula above involves showing
that for every pair of natural numbers m and n, Q proves
n̄ + m̄ = k̄
and
∀y (n̄ + m̄ = y → y = k̄),
where k = n + m.
What about composition? Suppose h is defined by
h(x0 , . . . , xl−1 ) = f (g0 (x0 , . . . , xl−1 ), . . . , gk−1 (x0 , . . . , xl−1 )).
where we have already found formulas ϕf , ϕg0 , . . . , ϕgk−1 representing the
functions f, g0 , . . . , gk−1 , respectively. Then we can define a formula ϕh
representing h, by defining ϕh (x0 , . . . , xl−1 , y) to be
∃z0 . . . ∃zk−1 (ϕg0 (x0 , . . . , xl−1 , z0 ) ∧ . . . ∧ ϕgk−1 (x0 , . . . , xl−1 , zk−1 ) ∧
ϕf (z0 , . . . , zk−1 , y)).
We will also need to know that Q can prove some basic facts about the
ordering <; in particular, for each natural number n, Q proves:
1. ¬(x < 0), and
2. x < k̄ → x = 0̄ ∨ . . . ∨ x = n̄, where k = n + 1.
Proof. Let us do 1 and part of 2, informally (i.e. only giving hints as to how
to construct the formal derivation).
For part 1, by the definition of <, we need to prove ¬∃y (y′ + x = 0)
in Q, which is equivalent (using the axioms and rules of first-order logic) to
∀y (y′ + x ≠ 0). Here is the idea: suppose y′ + x = 0. If x is 0, we have
y′ + 0 = 0. But by axiom 4 of Q, we have y′ + 0 = y′, and by axiom 2 we
have y′ ≠ 0, a contradiction. If x is not 0, by axiom 3 there is a z such that
x = z′. But then we have y′ + z′ = 0, and by axiom 5, we have (y′ + z)′ = 0,
again contradicting axiom 2. So ∀y (y′ + x ≠ 0).
For part 2, use induction on n. Let us consider the base case, when
n = 0. In that case, we need to show x < 1̄ → x = 0. Suppose x < 1̄. Then
by the defining axiom for <, we have ∃y (y′ + x = 0′). Suppose y has that
property; so we have y′ + x = 0′.
We need to show x = 0. By axiom 3, if x is not 0, it is equal to z′ for
some z. Then we have y′ + z′ = 0′. By axiom 5 of Q, we have (y′ + z)′ = 0′.
By axiom 1, we have y′ + z = 0. But this means, by definition, z < 0,
contradicting part 1.
For the induction step, and more details, see the textbook.
With these two definitions and theorems in hand, we have opened the flood-
gates between logic and computability, and now we can use the work we
have already done. In particular, we can prove that the set Q = {ϕ | Q ⊢ ϕ}
of sentences provable in Q is a complete computably enumerable set.
Proof. It is not hard to see that Q is c.e., since it is the set of (codes for)
sentences y such that there is a proof x of y in Q:
Q = {y | ∃x PrQ (x, y)}.
But we know that PrQ (x, y) is computable (in fact, primitive recursive), and
any set that can be written in the above form is c.e.
Saying that it is a complete c.e. set is equivalent to saying that K ≤m Q,
where K = {x | ϕx (x) ↓}. So let us show that K is reducible to Q. Since
Kleene's predicate T (e, x, s) is primitive recursive, it is representable in Q,
say by ϕT . Then for every x, we have
x ∈ K → ∃s T (x, x, s)
→ for some n, Q proves ϕT (x̄, x̄, n̄)
→ Q proves ∃s ϕT (x̄, x̄, s).
Conversely, if Q proves ∃s ϕT (x̄, x̄, s), then, in fact, for some natural number
n the formula ϕT (x̄, x̄, n̄) must be true. Now, if T (x, x, n) were false, Q
would prove ¬ϕT (x̄, x̄, n̄), since ϕT represents T . But then Q proves a false
formula, which is a contradiction. So T (x, x, n) must be true, which implies
ϕx (x) ↓.
In short, we have that for every x, x is in K if and only if Q proves
∃s ϕT (x̄, x̄, s). So the function f which takes x to (a code for) the sentence
∃s ϕT (x̄, x̄, s) is a reduction of K to Q.
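The reduction itself is a very simple syntactic transformation. In a sketch, with plain strings standing in for numerical codes:

def reduce_K_to_Q(x):
    """Map x to the sentence "exists s, phi_T(x, x, s)"."""
    return f"∃s ϕ_T({x}, {x}, s)"

# x ∈ K if and only if Q proves reduce_K_to_Q(x), so a decision
# procedure for provability in Q would solve the halting problem.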
The proof above relied on the fact that any sentence provable in Q is
“true” of the natural numbers. The next definition and theorem strengthen
this theorem, by pinpointing just those aspects of “truth” that were needed
in the proof above. Don’t dwell on this theorem too long, though, because
we will soon strengthen it even further. I am including it mainly for his-
torical purposes: Gödel’s original paper used the notion of ω-consistency,
but his result was strengthened by replacing ω-consistency with ordinary
consistency soon after.
to R(k, y). But then we have that S(k) is equivalent to both R(k, k) and
¬R(k, k), which is a contradiction.
S(n) → T ⊢ θS (n̄)
→ R(#(θS (u)), n)
and
¬S(n) → T ⊢ ¬θS (n̄)
→ ¬R(#(θS (u)), n).
That is, for every y, S(y) is true if and only if R(#(θS (u)), y) is. So R is
universal, and we have the contradiction we were looking for.
Let "true arithmetic" be the theory {ϕ | ⟨N, 0, ′, +, ×, <⟩ |= ϕ}, that
is, the set of sentences in the language of arithmetic that are true in the
standard interpretation.
S(n) → Q ⊢ θS (n̄)
→ θS (n̄) ∈ C
and
¬S(n) → Q ⊢ ¬θS (n̄)
→ θS (n̄) ∉ C.
The following theorem says that not only is Q undecidable, but, in fact,
any theory that does not disagree with Q is undecidable.
C = {ϕ | T ⊢ η → ϕ}.
Corollary 4.4.12 First-order logic for the language of arithmetic (that is,
the set {ϕ | ϕ is provable in first-order logic}) is undecidable.
The proof is just a small modification of the proof of the last theorem;
one could use a counterexample to get a separation of Q and Q′. One can
take ZFC, Zermelo-Fraenkel set theory with the axiom of choice, to be an
axiomatic foundation that is powerful enough to carry out a good deal of
ordinary mathematics. In ZFC one can define the natural numbers, and via
this interpretation, the axioms of Q are true. So we have the corollary that
the set of sentences provable in ZFC is undecidable.
The language of ZFC has only a single binary relation, ∈. (In fact, you
don’t even need equality.) So we have
Corollary 4.4.16 First-order logic for any language with a binary relation
symbol is undecidable.
This result extends to any language with two unary function symbols, since
one can use these to simulate a binary relation symbol. The results just
cited are tight: it turns out that first-order logic for a language with only
unary relation symbols and at most one unary function symbol is decidable.
One more bit of trivia. We know that the set of sentences in the language
0, S, +, ×, < that are true in the standard model is undecidable. In fact,
one can define < in terms of the other symbols, and then one can define +
in terms of × and S. So the set of true sentences in the language 0, S, ×
is undecidable. On the other hand, Presburger showed that the set of
sentences in the language 0, S, + that are true in the standard model is
decidable. The decision procedure is computationally infeasible, however.
4.5 The fixed-point lemma
The fixed-point lemma says the following: if T is any theory extending Q,
then for any formula ψ(x) there is a sentence ϕ such that T proves ϕ ↔ ψ(⌜ϕ⌝),
where ⌜ϕ⌝ denotes the numeral for the code #(ϕ) of ϕ.
Here is the idea behind the proof. Let diag(x) be the computable function
which, given the code of a formula θ(x), returns the code of the formula
θ(⌜θ(x)⌝), that is, the result of substituting the numeral for the formula's
own code for its free variable. Suppose, for the moment, that T has a
function symbol diag representing diag. Given ψ(x), let ϕ be the sentence
ψ(diag(⌜ψ(diag(x))⌝)). Then
diag(#(ψ(diag(x)))) = #(ψ(diag(⌜ψ(diag(x))⌝)))
= #(ϕ).
So, since T can prove
diag(⌜ψ(diag(x))⌝) = ⌜ϕ⌝,
it can prove ψ(diag(⌜ψ(diag(x))⌝)) ↔ ψ(⌜ϕ⌝). But the left-hand side is, by
definition, ϕ.
In general, diag will not be a function symbol of T . But since T extends
Q, the function diag will be represented in T by some formula θdiag (x, y). So
instead of writing ψ(diag(x)) we will have to write ∃y (θdiag (x, y) ∧ ψ(y)).
Otherwise, the proof sketched above goes through.
Proof of the fixed-point lemma. Given ψ(x), let η(x) be the formula
∃y (θdiag (x, y) ∧ ψ(y)) and let ϕ be the formula η(⌜η(x)⌝).
Since θdiag represents diag, and diag(#(η(x))) = #(ϕ), T can prove
∀y (θdiag (⌜η(x)⌝, y) ↔ y = ⌜ϕ⌝).
Going back to the definition of η(x), we see η(⌜η(x)⌝) is just the formula
∃y (θdiag (⌜η(x)⌝, y) ∧ ψ(y)).
Using the last two sentences and ordinary first-order logic, one can then
prove
η(⌜η(x)⌝) ↔ ψ(⌜ϕ⌝).
But the left-hand side is just ϕ.
You should compare this to the proof of the fixed-point lemma in com-
putability theory. The difference is that here we want to define a statement
in terms of itself, whereas there we wanted to define a function in terms of
itself; this difference aside, it is really the same idea.
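One can act out the idea at the level of strings. The following Python sketch is an illustration only, with strings playing the role of Gödel codes: applying diag to the formula ψ(diag(x)) produces a sentence that refers to itself.

def diag(formula):
    """Substitute the formula's own quotation for the variable x."""
    return formula.replace("x", "'" + formula + "'")

psi_of_diag = "psi(diag(x))"
fixed_point = diag(psi_of_diag)
print(fixed_point)
# Output: psi(diag('psi(diag(x))'))
# Evaluating the inner diag('psi(diag(x))') returns exactly the sentence
# printed above, so the sentence asserts psi of (a name for) itself.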
We can now prove the first incompleteness theorem. Let T be a computably
axiomatized, consistent theory extending Q, let PrT (x, y) be a formula
representing "x codes a proof in T of the formula coded by y," and let
ProvT (y) be the formula ∃x PrT (x, y). By the fixed-point lemma, there is a
sentence ϕ such that T proves
ϕ ↔ ¬ProvT (⌜ϕ⌝). (4.1)
We will show two things: first, if T proves ϕ, then T is inconsistent; and
second, if T proves ¬ϕ, then T is ω-inconsistent.
Suppose T proves ϕ. Then there is an m coding a proof of ϕ in T , so T proves
PrT (m̄, ⌜ϕ⌝). So T proves ∃x PrT (x, ⌜ϕ⌝), which is, by definition, ProvT (⌜ϕ⌝).
By the equivalence (4.1), T proves ¬ϕ. We have shown that if T proves ϕ,
then it also proves ¬ϕ, and hence it is inconsistent.
For the second claim, let us show that if T proves ¬ϕ, then it is ω-
inconsistent. Suppose T proves ¬ϕ. If T is inconsistent, it is ω-inconsistent,
and we are done. Otherwise, T is consistent, so it does not prove ϕ. Since
there is no proof of ϕ in T , T proves ¬PrT (n̄, ⌜ϕ⌝) for each natural number n.
But by (4.1), ¬ϕ is equivalent in T to ProvT (⌜ϕ⌝), i.e. to ∃x PrT (x, ⌜ϕ⌝); so
T proves ∃x PrT (x, ⌜ϕ⌝) while also proving ¬PrT (n̄, ⌜ϕ⌝) for each n. This is
exactly ω-inconsistency. So, assuming only that T is consistent, we get that
T does not prove ϕ; it would be nicer if consistency alone also ruled out a
proof of ¬ϕ.
Can we modify Gödel’s proof, to get this stronger result? The answer is
“yes,” using a trick discovered by Rosser. Let not(x) be the primitive re-
cursive function which does the following: if x is the code of a formula ϕ,
not(x) is a code of ¬ϕ. To simplify matters, assume T has a function symbol
not such that for any formula ϕ, T proves not(pϕq) = p¬ϕq. This is not
a major assumption; since not(x) is computable, it is represented in T by
some formula θnot (x, y), and we could eliminate the reference to the function
symbol in the same way that we avoided using a function symbol diag in
the proof of the fixed-point lemma.
Rosser's trick is to use a "modified" provability predicate Prov′T (y), de-
fined to be
∃x (PrT (x, y) ∧ ∀z < x ¬PrT (z, not(y))).
In words: there is a proof of y, and no proof of its negation with a smaller
code. If T is consistent, ProvT and Prov′T define the same set; but T can
prove more about the modified predicate. Now apply the fixed-point lemma
to get a sentence ϕ such that T proves
ϕ ↔ ¬Prov′T (⌜ϕ⌝).
One can then show that if T is consistent, T proves neither ϕ nor ¬ϕ.
4.7 The second incompleteness theorem
Let PA, "Peano arithmetic," be the theory obtained by adding to the axioms
of Q the schema of induction,
ϕ(0) ∧ ∀x (ϕ(x) → ϕ(x′)) → ∀x ϕ(x),
for every formula ϕ. Notice that this is really a schema, which is to say, in-
finitely many axioms (and it turns out that PA is not finitely axiomatizable).
But since one can effectively determine whether or not a string of symbols is
an instance of an induction axiom, the set of axioms for PA is computable.
PA is a much more robust theory than Q. For example, one can easily
prove that addition and multiplication are commutative, using induction in
the usual way. In fact, most finitary number-theoretic and combinatorial
arguments can be carried out in PA.
Since PA is computably axiomatized, the provability predicate PrPA (x, y)
is computable and hence represented in Q (and so, in PA). As before, I
will take PrPA (x, y) to denote the formula representing the relation. Let
ProvPA (y) be the formula ∃x PrPA (x, y), which, intuitively, says "y is prov-
able from the axioms of PA." The reason we need a little bit more than
the axioms of Q is that we need to know that the theory we are using is
strong enough to prove a few basic facts about this provability predicate.
In fact, what we need are the following facts:
1. If PA ⊢ ϕ, then PA ⊢ ProvPA (⌜ϕ⌝).
2. PA ⊢ ProvPA (⌜ϕ → ψ⌝) → (ProvPA (⌜ϕ⌝) → ProvPA (⌜ψ⌝)).
3. PA ⊢ ProvPA (⌜ϕ⌝) → ProvPA (⌜ProvPA (⌜ϕ⌝)⌝).
The first fact is easy; the second and third require long and careful (but
routine) verifications, which Gödel himself never
got around to; since everyone who understood the argument believed that
it could be carried out, he did not need to fill in the details. (Detailed
proofs were later given by Hilbert and Bernays.)
How can we express the assertion that PA doesn’t prove its own consis-
tency? Saying PA is inconsistent amounts to saying that PA proves 0 = 1.
So we can take ConPA to be the formula ¬ProvPA (⌜0 = 1⌝), and then the
following theorem does the job:
Theorem 4.7.1 If PA is consistent, then PA does not prove ConPA .
Here is the idea behind the proof. Let ϕ be a Gödel sentence for PA, so
that PA proves ϕ ↔ ¬ProvPA (⌜ϕ⌝). Now suppose PA proves ConPA . Then
it proves ¬ProvPA (⌜ϕ⌝). But since ϕ is a Gödel sentence, this is equivalent
to ϕ. So PA proves ϕ.
But: we know that if PA is consistent, it doesn’t prove ϕ! So if PA is
consistent, it can’t prove ConPA .
To make the argument more precise, we will let ϕ be the Gödel sentence
and use properties 1–3 above to show that PA proves ConPA → ϕ. This will
show that PA doesn't prove ConPA . Here is a sketch of the proof, in PA:
ProvPA (⌜ϕ⌝) → ProvPA (⌜ProvPA (⌜ϕ⌝)⌝) by 3
ϕ ↔ ¬ProvPA (⌜ϕ⌝) since ϕ is a Gödel sentence
ProvPA (⌜ϕ⌝) → ProvPA (⌜¬ProvPA (⌜ϕ⌝)⌝) from the previous line, using 1 and 2
ProvPA (⌜ϕ⌝) → ProvPA (⌜ProvPA (⌜ϕ⌝) → 0 = 1⌝)
ProvPA (⌜ϕ⌝) → (ProvPA (⌜ProvPA (⌜ϕ⌝)⌝) → ProvPA (⌜0 = 1⌝)) using 2
ProvPA (⌜ϕ⌝) → ProvPA (⌜0 = 1⌝) from the first and fifth lines
ConPA → ¬ProvPA (⌜ϕ⌝), i.e. ConPA → ϕ
The move from the third to the fourth line uses the fact that ¬ProvPA (⌜ϕ⌝)
is equivalent to ProvPA (⌜ϕ⌝) → 0 = 1 in PA. The more abstract version of
the incompleteness theorem is as follows:
Theorem 4.7.2 Let T be any theory extending Q and let ProvT (y) be any
formula satisfying 1–3 for T . If T is consistent, then T does not prove
¬ProvT (⌜0 = 1⌝).
The moral of the story is that no “reasonable” consistent theory for mathe-
matics can prove its own consistency. Suppose T is a theory of mathematics
that includes Q and Hilbert’s “finitary” reasoning (whatever that may be).
Then, the whole of T cannot prove the consistency of T , and so, a fortiori,
the finitary fragment can’t prove the consistency of T either. In that sense,
there cannot be a finitary consistency proof for “all of mathematics.”
There is some leeway in interpreting the term finitary, and Gödel, in the
1931 paper, grants the possibility that something we may consider “finitary”
may lie outside the kinds of mathematics Hilbert wanted to formalize. But
Gödel was being charitable; today, it is hard to see how we might find
something that can reasonably be called finitary but is not formalizable in,
say, ZFC .
4.8 Löb's theorem
Theorem 4.8.1 Let T be any theory extending Q, and suppose ProvT (y)
is a formula satisfying conditions 1–3 from the previous section. If T proves
ProvT (⌜ϕ⌝) → ϕ, then in fact T proves ϕ.
The proof of Löb's theorem is related to the following curious argument,
which seems to prove that Santa Claus exists:
1. Let X be the sentence "If X is true, then Santa Claus exists."
2. Suppose X is true.
3. Then what it says is true; i.e. if X is true, then Santa Claus exists.
4. Since X is true (by 2), Santa Claus exists (by 3).
5. Steps 2–4 show that if X is true, then Santa Claus exists. But that is
exactly what X says, so X is true.
6. By 3 and 5, Santa Claus exists.
The formal analogue uses the fixed-point lemma in place of the self-referential
sentence X. Suppose T proves ProvT (⌜ϕ⌝) → ϕ. By the fixed-point lemma,
there is a sentence θ such that T proves θ ↔ (ProvT (⌜θ⌝) → ϕ). Then:
θ → (ProvT (⌜θ⌝) → ϕ) def of θ
ProvT (⌜θ → (ProvT (⌜θ⌝) → ϕ)⌝) by 1
ProvT (⌜θ⌝) → ProvT (⌜ProvT (⌜θ⌝) → ϕ⌝) using 2
ProvT (⌜θ⌝) → (ProvT (⌜ProvT (⌜θ⌝)⌝) → ProvT (⌜ϕ⌝)) using 2
ProvT (⌜θ⌝) → ProvT (⌜ProvT (⌜θ⌝)⌝) by 3
ProvT (⌜θ⌝) → ProvT (⌜ϕ⌝)
ProvT (⌜ϕ⌝) → ϕ by assumption
ProvT (⌜θ⌝) → ϕ
θ def of θ
ProvT (⌜θ⌝) by 1
ϕ
It is instructive to compare the following four statements:
1. If T ⊢ ϕ, then T ⊢ Prov(⌜ϕ⌝).
2. T ⊢ ϕ → Prov(⌜ϕ⌝).
3. If T ⊢ Prov(⌜ϕ⌝), then T ⊢ ϕ.
4. T ⊢ Prov(⌜ϕ⌝) → ϕ.
Under what conditions are each of these statements true?
4.9 The undefinability of truth
Say a set S of natural numbers is definable in N (the standard model of
arithmetic) if there is a formula θ(x) such that S = {n | N |= θ(n̄)}. Every
c.e. set is definable in N : if S is c.e., then for some e,
S = {x | ∃y T (e, x, y)},
where T is Kleene's predicate. Letting θT be a formula representing T in Q,
we have
S = {x | N |= ∃y θT (ē, x̄, y)},
so ∃y θT (ē, x, y) defines S in N .
So every c.e. set is definable in N ; and more sets are definable besides.
For example, it is not hard to see that
complements of c.e. sets are also definable. The sets of numbers definable
in N are called, appropriately, the “arithmetically definable sets,” or, more
simply, the “arithmetic sets.” I will draw a picture on the board.
What about Arith itself? Is it definable in arithmetic? That is: is the
set {⌜ϕ⌝ | N |= ϕ} definable in arithmetic? Tarski's theorem answers this
in the negative.
Proof. Suppose θ(x) defined it. By the fixed-point lemma, there is a sentence
ϕ such that Q proves ϕ ↔ ¬θ(⌜ϕ⌝), and hence N |= ϕ ↔ ¬θ(⌜ϕ⌝). But
then N |= ϕ if and only if N |= ¬θ(⌜ϕ⌝), which contradicts the fact that
θ(x) is supposed to define the set of true statements of arithmetic.
The same phenomenon is quite general: for any language strong enough to
represent the diagonal function, and any predicate T (x), one can construct
a sentence X satisfying "X if and only if not T ('X')." Given that we do
not want a truth predicate to declare some sentences to be both true and
false, Tarski concluded that one cannot specify a truth predicate for all
sentences in a language without, somehow, stepping outside the bounds of
the language. In other words, the truth predicate for a language cannot be
defined in the language itself.
Chapter 5
Undecidability
Most of the examples I will discuss are in the handout I have given you,
taken from Lewis and Papadimitriou’s book, Elements of the Theory of
Computation. Hilbert’s 10th problem is discussed in an appendix to Martin
Davis’ book Computability and Unsolvability.
The first problem concerns Thue systems. A Thue system is a finite set of
pairs of strings {u, v} over some fixed (finite) alphabet. Given such a system
and strings x and y, say x can be transformed to y in one step if there is a
pair {u, v} in the system such that y can be obtained from x by replacing
some substring u in x by v; in other words, for some (possibly empty) strings
s and t, x is sut and y is svt. Say that x and y are equivalent if x can be
transformed to y in finitely many steps. A sketch of a search procedure for
equivalence, on a small sample system, appears below.
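The search only semi-decides equivalence in general, so the sketch is given a cutoff; the sample system here is made up for the illustration (the example from the lecture is not reproduced).

from collections import deque

def one_step(s, rules):
    """All strings obtainable from s by one replacement."""
    for u, v in rules:
        for a, b in ((u, v), (v, u)):      # Thue rules apply both ways
            start = 0
            while (i := s.find(a, start)) != -1:
                yield s[:i] + b + s[i + len(a):]
                start = i + 1

def equivalent(x, y, rules, limit=10000):
    """Breadth-first search for a transformation of x into y, cut off
    after exploring `limit` strings."""
    seen, queue = {x}, deque([x])
    while queue and len(seen) < limit:
        s = queue.popleft()
        if s == y:
            return True
        for t in one_step(s, rules):
            if t not in seen:
                seen.add(t)
                queue.append(t)
    return False   # not found within the cutoff

rules = [("ab", "ba"), ("a", "aa")]        # a made-up system
print(equivalent("ab", "baa", rules))       # True: ab -> ba -> baa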
In fact, there is even a single Thue system and a string x such that the
following question is undecidable: given a string y, is y equivalent to x?
The next problem is Post's correspondence problem. An instance of the
problem is a finite list of pairs of strings ⟨x0 , y0 ⟩, . . . , ⟨xn , yn ⟩. A match
is a sequence of pairs ⟨xi0 , yi0 ⟩, . . . , ⟨xik , yik ⟩ from the list (with duplicates
allowed) such that
xi0 xi1 . . . xik = yi0 yi1 . . . yik .
It is undecidable, given an instance, whether a match exists.
For example, try finding a match for a small system like the one in the
sketch below:
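Since the system from the lecture is not reproduced here, the following sketch uses a small instance chosen for illustration, together with a brute-force search for a match.

from itertools import product

def find_match(pairs, max_len=8):
    """Try all sequences of pairs up to length max_len."""
    for k in range(1, max_len + 1):
        for choice in product(range(len(pairs)), repeat=k):
            top = "".join(pairs[i][0] for i in choice)
            bottom = "".join(pairs[i][1] for i in choice)
            if top == bottom:
                return [pairs[i] for i in choice]
    return None   # no match up to max_len (says nothing about longer ones)

pairs = [("a", "baa"), ("ab", "aa"), ("bba", "bb")]
print(find_match(pairs))
# [('bba', 'bb'), ('ab', 'aa'), ('bba', 'bb'), ('a', 'baa')]:
# both rows spell "bbaabbbaa".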
5.2 Problems in linguistics
The next batch of problems concerns grammars. A grammar consists of:
• A finite set of symbols, divided into terminal and nonterminal symbols,
with a designated start symbol; and
• A set of rules, i.e. pairs ⟨u, v⟩, where u is a string of symbols with at
least one nonterminal symbol, and v is a string of symbols.
You can think of the symbols as denoting grammatical elements, and the
terminal symbols as denoting basic elements like words or phrases. In the
example below, you can think of Se as standing for "sentence," Su as standing
for "subject," P r as standing for "predicate," and so on. A sketch of how
such a grammar generates sentences follows the example.
Se → Su P r
Su → Art N
Art → the
Art → a
N → dog
N → boy
N → ball
Pr → V I
P r → V T Su
V I → flies
V I → falls
VT → kicks
VT → throws
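Here is a sketch, in Python, of how this grammar generates sentences: repeatedly expand nonterminals until only words remain. (This particular grammar happens to be context free, so a simple recursive expansion works.)

import random

RULES = {
    "Se":  [["Su", "Pr"]],
    "Su":  [["Art", "N"]],
    "Art": [["the"], ["a"]],
    "N":   [["dog"], ["boy"], ["ball"]],
    "Pr":  [["VI"], ["VT", "Su"]],
    "VI":  [["flies"], ["falls"]],
    "VT":  [["kicks"], ["throws"]],
}

def derive(symbol="Se"):
    """Expand a symbol into a list of terminal words."""
    if symbol not in RULES:            # terminal: a word
        return [symbol]
    expansion = random.choice(RULES[symbol])
    return [word for s in expansion for word in derive(s)]

print(" ".join(derive()))   # e.g. "the dog kicks a ball"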
In the general setup, there may be more than one symbol on the left side;
such grammars are called “unrestricted,” or “context sensitive,” because you
can think of the extra symbols on the left as specifying the context in which
a substitution can occur. For example, you could have rules
P r → P r and P r
and P r → and his P r
For example, the following grammar generates the strings consisting of equal
numbers of a's, b's, and c's, in any order:
S → ∅
S → ABCS
AB → BA
BA → AB
AC → CA
CA → AC
BC → CB
CB → BC
A → a
B → b
C → c
It is less obvious that one can code Turing machine computations into gram-
mars, but the handout shows that this is the case. As a result, the following
questions are undecidable:
• Given a grammar G and a string w, is w in the language generated by
G?
• Given grammars G1 and G2 , do they generate the same language?
• Given a grammar G, is anything in the language generated by G?
The first question shows that in general it is not possible to parse an unre-
stricted grammar, which is obviously an undesirable feature for some formal
languages, such as programming languages. For that very reason, computer
scientists are interested in more restricted classes of grammars, for which
one can parse computably, and even reasonably efficiently. For example, if,
in every rule ⟨u, v⟩, the string u has to consist of a single nonterminal symbol,
one has the context free grammars, and these can be parsed. But still the
following questions are undecidable:
questions are undecidable:
• Given context free grammars G1 and G2 , is there any string that is
simultaneously in both languages?
• Given a context free grammar G, does every string in the language of
G have a unique parse tree? (In other words, is the grammar unam-
biguous?)
Hilbert's 10th problem asks for an algorithm to determine, given a poly-
nomial equation with integer coefficients (a "Diophantine" equation), whether
or not it has a solution in the integers. Note that an equation written without
exponents or coefficients, like
xxyz = xy + xy + xy + yz,
can be abbreviated
x²yz = 3xy + yz.
For example, consider the following equations:
• 15x + 6y = 19
• x² = 48
• 2x² − y² = 56
Do any of these have integer solutions?
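One direction of the problem is easy and worth seeing concretely: the set of solvable Diophantine equations is computably enumerable, since one can search through candidate solutions. A bounded Python sketch for two unknowns (a true semi-decision procedure would search without a bound):

def has_solution(p, bound=50):
    """Search |x|, |y| <= bound for a zero of p(x, y)."""
    for n in range(bound + 1):
        for x in range(-n, n + 1):
            for y in range(-n, n + 1):
                if p(x, y) == 0:
                    return (x, y)
    return None

print(has_solution(lambda x, y: 15*x + 6*y - 19))    # None: 3 divides 15x + 6y but not 19
print(has_solution(lambda x, y: x**2 - 48))          # None: 48 is not a perfect square
print(has_solution(lambda x, y: 2*x**2 - y**2 - 56)) # finds one, e.g. (-6, -4)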