240 Computability, algorithms, and complexity
Contents
Introduction 5

I Computability 8
1 What is an algorithm? 8
1.1 The problem 8
1.2 What is an algorithm? 10
1.3 Turing machines 13
1.4 Why use Turing machines? 16
1.5 Church's thesis 17
1.6 Summary of section 21
5 Unsolvable problems 66
5.1 Introduction 66
5.2 The halting problem 67
5.3 Reduction 72
5.4 Gödel's incompleteness theorem 79
5.5 Summary of section 85
5.6 Part I in a nutshell 85

II Algorithms 88
6 Use of algorithms 88
6.1 Run time function of an algorithm 88
6.2 Choice of algorithm 94
6.3 Implementation 95
6.4 Useful books on algorithms 96
6.5 Summary of section 96
7 Graph algorithms 96
7.1 Graphs: the basics 96
7.2 Representing graphs 98
7.3 Algorithm for searching a graph 99
7.4 Paths and connectedness 105
7.5 Trees, spanning trees 106
7.6 Complete graphs 109
7.7 Hamiltonian circuit problem (HCP) 110
7.8 Summary of section 111
12 NP-completeness 153
12.1 Introduction 154
12.2 Proving NP-completeness by reduction 156
12.3 Cook's theorem 157
12.4 Sample exam questions on Parts II, III 160
12.5 Part III in a nutshell 162

Index 165
Introduction
The text and index are copyright © Ian Hodkinson. You may use them freely so long
as you do not sell them for profit.
The text has been used by Ian Hodkinson and Margaret Cunningham as course notes
in the 20-lecture second-year undergraduate course ‘240 Computability, algorithms,
and complexity’ in the Department of Computing at Imperial College, London, UK,
since 1991.
Italic font is used for emphasis, and bold to highlight some technical terms. ‘E.g.’
means 'for example', 'viz.' means 'namely', 'i.e.' means 'that is', and 'iff' means 'if and
only if’. § means ‘section’ — for example, ‘§5.3.3’ refers to the section called ‘The
Turing machine EDIT' starting on page 74.
There are bibliographic references on pages 6, 21, 96, and 154.
Part I
Computability
1. What is an algorithm?
We begin Part I with a problem that could pose difficulties for those who think com-
puters are ‘all-powerful’. To analyse the problem, we then discuss the general notion
of an algorithm (as opposed to particular algorithms), and why it is important.
1.1 The problem
1 repeat forever
2 generate the next program Pn in the list
3 run Pn as far as the nth bit of the output
4 if Pn terminates or prompts for input before the nth bit is output then
5 output 1
6 else if the nth bit of Pn ’s output is 0 then
7 output 1
8 else if the nth bit of Pn ’s output is 1 then
9 output 0
10 end if
11 end repeat
• This language is not quite Java, but the idea is the same — certainly we could
write it formally in Java.
• Generating and running the next program (lines 2 and 3) is easy — we generate
all strings of text in alphabetical order, and use an interpreter to check each string
in turn for syntactic errors. If there are none, the string is our next program, and
the interpreter can run it. This is slow, but it works.
• We assume that we can write an interpreter in our language — certainly we can
write a Java interpreter in Java.
• In each trip round the loop, the interpreter is provided with the text of the next
program, Pn , and the number n. The interpreter runs Pn , halting execution if
(a) Pn itself halts, (b) Pn prompts for input or tries to read a file, or (c) Pn has
produced n bits of output.
• All other steps of P are easy to implement.
So P is a legitimate program. So P is in the list of Pn s. Which Pn is P?
Suppose that P is P7 , say. Then P has the same output as P7 . Now on the seventh
loop of P, P7 (i.e., P) will be generated, and run as far as its seventh output bit. The
possibilities are:
1. P7 halts or prompts for input before it outputs 7 bits (impossible, as the code for
P = P7 has no HALT or READ statement!)
2. P7 does output bit 7, and it’s 0. Then P outputs 1 (look at the code above). But
this 1 will be the 7th output bit of P = P7 , a contradiction!
3. P7 does output bit 7, and it’s 1. Then P outputs 0 (look at the code again). But
this 0 will be P’s 7th output bit, and P = P7 !
This is a contradiction: if P7 outputs 0 then P outputs 1, and vice versa; yet P was
supposed to be P7 . So P is not P7 after all.
In the same way we can show that P is not Pn for any n, because P differs from Pn
at the nth place of its output. So P is not in our list of programs. This is impossible, as
the list contains all programs of our language!
Paradoxes might not be too worrying in a natural language like English. We might
suppose that English is vague, or the speaker is talking nonsense. But we think of com-
puting as a precise engineering-mathematical discipline. It is used for safety-critical
applications. Certainly it should not admit any paradoxes. We should therefore exam-
ine our ‘paradox’ very carefully.
It may be that it comes from some quirk of the programming language. Perhaps
a better version of Java or whatever would avoid it. In Part I of the course our aim is
first to show that the ‘paradox’ above is extremely general and occurs in all reasonable
models of computing. We will do this by examining a very simple model of a computer.
In spite of its simplicity we will give evidence for its being fully general, able to do
anything that a computer — real or imagined — could.
We will then rediscover the ‘paradox’ in our simple model. I have to say at this
point that there is no real paradox here. The argument above contained an implicit
assumption. [What?] Nonetheless, there is still a problem: the implicit assumption
cannot be avoided, because if it could, we really would have a paradox. So we cannot
‘patch’ our program P to remove the assumption!
But now, because our simple model is so general, we are forced to draw funda-
mental conclusions about the limitations of computing itself. Certain precisely-stated
problems are unsolvable by a computer even in principle. (We cannot write a patch for
P.)
There are lots of unsolvable problems! They include:
• telling whether a given program will halt on a given input (the halting problem);
• printing out all the true statements about arithmetic and no false ones (Gödel's
incompleteness theorem).
Undeniably these are problems for which solutions would be very useful.
In Part III of the course we will apply the idea of self-reference again to NP-
complete problems — not now to the question of what we can compute, but to how
fast we can compute it. Here our results will be more positive in tone.
Our formalisation of the notion of algorithm should be:
• simple and without extraneous details, so we can reason easily with it;
• general, so that all algorithms are covered.
Once formalised, an idea can be explored with rigour, using high-powered mathe-
matical techniques. This can pay huge dividends. Once gravity was formalised by
Newton as F = Gm₁m₂/r², calculations of orbits, tides, etc., became possible, with all
that that implies. Pay-offs from the formalisation of algorithm included the modern
programmable computer itself. This is quite a spectacular pay-off! Others include
the answer to Hilbert’s question, related work in complexity (see Part III) and more
besides.
[Figure: a Turing machine. A one-way infinite tape is divided into squares numbered 0, 1, 2, . . . (the numbers are not visible to the TM head); each square holds a symbol from the alphabet Σ, with ∧ as the 'blank' symbol. The read/write head connects the tape to the Turing machine itself, which carries the state set Q, the starting state, the final states and the instruction table δ.]
The tape The main memory of a TM is a 1-way-infinite tape, viewed as laid out from
left to right. The tape goes off to the right, forever. It is divided into squares,
numbered 0, 1, 2, . . . ; these numbers are for our convenience and are not seen
by the Turing machine.
The alphabets In each square of the tape is written a single symbol. These symbols
are taken from some finite alphabet. We will use the Greek letter sigma (Σ)
to denote the alphabet. The alphabet Σ is part of the Turing machine. Σ is just
a set of symbols, but it will always be finite with at least two symbols, one of
which is a special blank symbol which we always write as ‘∧’. Subject to these
restrictions, a Turing machine can have any finite alphabet Σ we like.
A blank in a square really means that the square is empty. Having a symbol for
‘empty’ is convenient — we don’t have to have a special case for empty squares,
so things are kept simple.
The read/write head The TM has a single head, as on a tape recorder. The head can
read and write to the tape.
At any given moment the head of the TM is positioned over a particular square
of the tape — the current square. At the start, the head is over square 0.
The set of states The TM has a finite set Q of states. There is a special state q0 in
Q, called the starting state or initial state. The machine begins in the starting
state, and changes state as it goes along. At any given stage, the machine will
be ‘in’ some particular state in Q, called the current state. The current state is
one of the two factors that determine, at each stage, what it does next (the other
is the symbol in the square where the head is). The state of the TM corresponds
roughly to the current instruction together with the contents of the registers in a
conventional computer. It gives our ‘current position’ within the algorithm.
• In (b), the TM could write the same symbol as it just read. So in this case, the
tape contents will not change.
• Similarly, in (d) the state of the TM may not change (as perhaps in a loop). In
(c), the head may not move.
Also notice that the tape will always contain only finitely many non-blank symbols,
because at the start only finitely many squares are not blank, and at most one square is
altered at each step.
Knowing the current state and symbol, the Turing machine can read down the list
to find the relevant line, and take action according to what it finds. To avoid ambiguity,
no pair (current-state; current-symbol) should occur in more than one line of the list.
(You might think that every such pair should occur somewhere in the list, but in fact
we don’t insist on this: see Halting below.)
Clearly, the ‘programming language’ is very low-level, like assembler. This fits
our wish to keep things simple. But we will see some higher-level constructs for TMs
later.
3. If the head is over square 0 of the tape and tries to move left from square 0
along the tape, we count it as ‘no applicable instruction’ (because there is no
tape square to the left of square 0, so the TM is stuck again). So in this case the
machine also halts and fails. Again, the output is undefined.
Of course the machine may never halt — it may go on running forever. If so, the output
is again undefined. E.g., it may be writing the decimal expansion of π on the tape ‘to
the last place’ (there is a Turing machine that does this). Or it may get into a loop: i.e.,
at some point of the run, its 'configuration' (state, tape and head position) is exactly
the same as at some earlier point, so that from then on, the same configurations will
recycle again, forever. (A machine computing π never does this, as the tape keeps
changing as more digits are printed. It never halts, but it doesn’t loop, either.)
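This 'repeated configuration' test can even be automated: a configuration is a finite object, so a simulator can simply remember every configuration it has seen. Here is a rough sketch in Python (the representation, names and step limit are our own, not part of the formal definitions to come):

BLANK = '^'                                   # stands for the blank symbol ∧

def loops(delta, final, tape, state='q0', max_steps=10000):
    """Detect a loop by remembering every configuration (state, head, tape) seen."""
    tape, head, seen = list(tape) or [BLANK], 0, set()
    for _ in range(max_steps):
        config = (state, head, tuple(tape))
        if config in seen:
            return True                       # same configuration as before: a loop
        seen.add(config)
        if state in final:
            return False                      # halted and succeeded
        step = delta.get((state, tape[head]))
        if step is None or (head == 0 and step[2] < 0):
            return False                      # halted and failed
        state, tape[head], move = step
        head += move
        if head == len(tape):
            tape.append(BLANK)                # the tape is blank beyond the input
    return False                              # undecided after max_steps

# a made-up one-state machine that rewrites square 0 forever without moving: it loops
print(loops({('q0', BLANK): ('q0', BLANK, 0)}, final=set(), tape=''))   # True

Of course this only catches loops, not every form of non-termination: the π-printing machine never repeats a configuration, so no such test would flag it.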
1.3.6 Summary
The Turing machine has a 1-way infinite tape, a read/write head, and a finite set of
states. It looks at its state and reads the current square, and then writes, moves and
changes state according to its instruction table. Get-State, Read, Write, Move, Next-
State. It does this over and over again, until it halts, if at all. And that’s it!
• It fits the requirements that the formalisation of algorithm should be precise and
simple. (We’ll make it even more precise in section 2.) Its generality will be
discussed when we come to Church’s thesis — the architecture of the Turing
machine allows strong intuitive arguments here.
• It is the standard benchmark for reasoning about the time or space used by an
algorithm (see Part III).
• It is now part of the computing culture. Its historical importance is great and
every computer scientist should be familiar with it.
Why not adopt (say) a Cray YMP as our model? We could, but it would be too complex
for our work. Our aim here is to study the concept of computability. We are concerned
with which problems can be solved in principle, and not (yet) with practicality. So a
very simple model of a computer, whose workings can be explained fully in a page
or two, is better for us than one that takes many manuals to describe, and may have
unknown bugs in it. And one can prove that a TM can solve exactly the same problems
as an idealised Cray with unlimited memory!
turned out to be algorithms under either definition! Their definitions were equivalent.
Now if two people independently put forward quite different-looking definitions of
algorithm that turn out to be equivalent, we may take it as evidence for the correctness
of both. Such a ‘coincidence’ hints that there is in nature a genuine class of things that
are algorithms, one that fits most definitions we offer.
This and other considerations (below) made Church put forward his famous Thesis,
which says that the definition is the correct one.
This is also known as the Church-Turing thesis, and, when phrased in terms of
Turing machines, it is certainly argued for in Turing’s 1936 paper, which was writ-
ten without knowledge of Church's work. But the shorter title is probably more common,
though less just.
(b) A proof of the equivalence of two definitions (in case the new definition has a greater
intuitive appeal).
They all look very different, but can solve (at best) precisely the same problems
as Turing machines. As we will see, various souped-up versions of the Turing
machine itself — even non-deterministic variants — are also equivalent to the
basic model.
The essential features of the Turing machine are:
• its computations work in a discrete way, step by step, acting on only a
finite amount of information at each stage,
• it uses finite but unbounded storage.
Any model with these two features will probably lead to an equivalent definition
of algorithm.
(c) There are intuitive arguments that any algorithm could be implemented by a
Turing machine. In his paper, Turing imagines someone calculating (computing)
by hand.
It is always possible for the computer to break off from his work, to go
away and forget all about it, and later to come back and go on with it. If
he does this he must leave a note of instructions (written in some standard
form) explaining how the work is to be continued. . . . We will suppose that
the computer works in such a desultory manner that he never does more than
one step at a sitting. The note of instructions must enable him to carry out
one step and write the next note. Thus the state of progress of the compu-
tation at any stage is completely determined by the note of instructions and
the symbols on the tape. . . . This [combination] may be called the “state
formula”. We know that the state formula at any given stage is determined
by the state formula before the last step was made, and we assume that the
relation of these two formulae is expressible. In other words we assume that
there is an axiom A which expresses the rules governing the behaviour of the
computer, in terms of the relation of the state formula at any stage to the state
formula at the preceding stage. If this is so, we can construct a machine to
write down the successive state formulae, and hence to compute the required
number. (pp. 253–4).
So for Turing, any calculation that a person can do on paper could be done by a
Turing machine: type (c) (i.e., intuitive) evidence for Church’s thesis. He also showed
that π, e, etc., can be printed out by a TM (type (a) evidence), and in an appendix
proved the equivalence to Church’s lambda calculus formalisation (type (b)).
Is Church’s thesis really true then? Can Turing machines do interactive work?
Well, as the ‘specification’ for an interactive system corresponds to a function whose
input and output are ‘infinite’ (the interaction can go on forever), the Turing machine
model needs modifying. But the basic Turing machine hardware is still adequate —
it’s only how we use it that changes. For example, every time the Turing machine
reaches a halting state, we might look at its output, overwrite it with a new input of our
choice (depending on the previous output), and set it off again from the initial state.
We could model a word processor like this. The collection of all the inputs and outputs
(the ‘behaviour over time/at infinity’) is what counts now. This is research material
and beyond the scope of the course. See Harel’s book for more information on reactive
systems.
More recent challenges to Church’s thesis include quantum computers — whether
they violate the thesis depends on who you read (go to the third-year course on quan-
tum computing). Another is a Turing machine dropped into a rotating black hole.
Theoretically, such a ‘Marvin machine’ could run forever, yet we could still read the
‘answer’ after its infinitely long computation. Recent research (still in progress) sug-
gests this might be possible in principle in certain kinds of solution to Einstein’s equa-
tions of general relativity. Whether it could ever be practically possible is quite another
question, and whether it would violate Church’s thesis is debated among philosophers.
Those who want to find out more could start with the article Smash and grab by
Marcus Chown, New Scientist vol. 174, issue 2337, 6 April 2002, page 24, online via
http://archive.newscientist.com/
We must now define Turing machines more precisely, using mathematical notation.
Then we will see some examples and programming tricks.
2. Turing machines and examples
2.1.1 Explanation
Q, Σ, q0 , and F are self-explanatory, and we’ll explain I in §2.2.1 below. Let us
examine the instruction table δ. If q is the current state and s the character of Σ in
the current square, δ(q, s) (if defined) will be a triple (q′, s′, d) ∈ Q × Σ × {−1, 0, 1}.
This represents the instruction to make q′ the next state of M, to write s′ in the old
square, and to move the head in direction d: −1 for left, 0 for no move, +1 for right.
So the line
q s q′ s′ d
of the 'instruction table' of §1.3.4 is represented formally as δ(q, s) = (q′, s′, d).
We can represent the entire table as a partial function δ in this way, by letting δ(first
two symbols) = last three symbols, for each line of the table. The table and δ carry the
same information. Functions are more familiar mathematical objects than ‘tables’, so
it is now standard to use a function for the instruction table. But it is not essential:
Turing used tables in his original paper.
Note that δ(q, s) is undefined if q ∈ F (why?). Also, δ is a partial function: it is
undefined for those arguments (q, s) that didn’t occur in the table. So it’s OK to write
δ : Q × Σ → Q × Σ × {−1, 0, 1}, rather than δ : (Q \ F) × Σ → Q × Σ × {−1, 0, 1}, since
δ is partial anyway.
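The partiality of δ is easy to picture if we think of the table as a lookup structure. A minimal sketch in Python (the dictionary representation and the state/symbol names are ours, not part of the definition):

BLANK = '^'                                   # stands for the blank symbol ∧

delta = {                                     # a small made-up instruction table
    ('q0', 'a'):   ('q0', 'b', +1),           # δ(q0, a) = (q0, b, +1)
    ('q0', BLANK): ('q1', BLANK, 0),          # δ(q0, ∧) = (q1, ∧, 0)
}

def instruction(q, s):
    """Return δ(q, s), or None if there is no applicable instruction."""
    return delta.get((q, s))

print(instruction('q0', 'a'))                 # ('q0', 'b', 1)
print(instruction('q1', 'a'))                 # None: δ is partial

A pair (q, s) that is simply absent from the dictionary is exactly the 'no applicable instruction' case, which makes the machine halt and fail.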
[Figure: the input word w = w0 w1 . . . wn−1 written in squares 0 to n − 1 of the tape, with blanks (∧) in squares n, n + 1, . . . ]
The word w is the input of M for the coming run. It is the initial data that we have
provided for M .
Note that w can have any finite length ≥ 0. M will probably want to read all of w.
How does M know where w ends? Well, M can just move its head rightwards until its
head reads a blank ‘∧’ on the tape. Then it knows it has reached the end of the input.
This is why w must contain no blanks, and why the remainder of the tape is filled up
with blanks (∧). If w were allowed to have blanks in it, M could not tell whether it had
reached the end of w. Effectively, ∧ is the ‘end-of-data’ character for the input.
Of course, M might put blanks anywhere on the tape when it is running. In fact it
can write any letters from Σ. The extra letters of Σ \ I are used for rough (or ‘scratch’)
work, and we call them scratch characters.
Exercise 2.3 Consider the Turing machine M = ({q0 , q1 , q2 }, {1, ∧}, {1}, q0 , δ, {q2 }),
with instruction table:
q0 1 q1 ∧ 1
q0 ∧ q2 ∧ 0
q1 1 q0 1 1
So δ is given by: δ(q0 , 1) = (q1 , ∧, 1), δ(q0 , ∧) = (q2 , ∧, 0), and δ(q1 , 1) = (q0 , 1, 1).
List the successive configurations of the machine and tape until M halts, for inputs
1111, 11111 respectively. What is the output of M in each case?
Exercise 2.5 Let M be as in exercise 2.3. Let 1^n abbreviate 1111. . . 1 (n times). For
which n is fM(1^n) defined?
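If you want to check your hand-simulation mechanically, the following rough interpreter (a Python sketch under our own conventions — the tape is grown with blanks on demand, and we give up after a fixed number of steps as a safety net) prints the successive configurations of the machine of exercise 2.3:

BLANK = '^'                                   # stands for the blank symbol ∧

delta = {                                     # instruction table of exercise 2.3
    ('q0', '1'):   ('q1', BLANK, +1),
    ('q0', BLANK): ('q2', BLANK, 0),
    ('q1', '1'):   ('q0', '1', +1),
}
FINAL = {'q2'}

def run(word, max_steps=100):
    tape = list(word) or [BLANK]
    state, head = 'q0', 0
    for _ in range(max_steps):
        print(state, head, ''.join(tape))     # current configuration
        if state in FINAL:                    # halt & succeed
            return ''.join(tape).split(BLANK, 1)[0]   # output: squares up to first blank
        step = delta.get((state, tape[head]))
        if step is None:                      # no applicable instruction
            return None                       # halt & fail: output undefined
        state, symbol, move = step
        tape[head] = symbol
        head += move
        if head < 0:                          # tried to move left from square 0
            return None                       # halt & fail
        if head == len(tape):
            tape.append(BLANK)                # the tape is blank beyond the input
    return None                               # still running after max_steps

print('output:', run('1111'))
print('output:', run('11111'))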
[Figure: two arrows from state q to state q′, labelled (a, b, −1) and (b, c, 1).]
If the label on an arrow from state q to state q′ is (a, a′, d), this means that if M reads a from the current square
while in state q, it must write a′ , then take q′ as its new state, and move the head
by d (+1 for right, 0 for ‘no move’, and −1 for left). Thus, for each (q, a) ∈ Q × Σ,
if δ(q, a) = (q′ , a′ , d) then we draw an arrow from state q to state q′ , labelled with
(a, a′ , d).
By allowing multiple labels on an arrow, as in figure 2.2, we can combine all arrows
from q to q′ into one. We can attach more than one label to an arrow either by listing
them all, or (shorthand) by using a variable (s,t, x, y, z, etc.), and perhaps attaching
conditions. So for example, the label '(x, a, 1) if x ≠ ∧, a' from state q to state q′ in
figure 2.3 below means that when in state q, if any symbol other than ∧ or a is read,
then the head writes a and moves right, and the state changes to q′. It is equivalent to
adding lots of labels (b, a, 1), one for each b ∈ Σ with b ≠ ∧, a.
[Figure 2.3: an arrow from state q to state q′ labelled '(x, a, 1) if x ≠ ∧, a'.]
Exercises 2.6
1. No arrows leave any final state. How does this follow from definition 2.1? Can
there be a non-final (i.e., round) state from which no arrows come, and what
would happen if the TM got into such a state?
2. Figure 2.4 is a flowchart of the Turing machine of exercises 2.3 and 2.5 above.
Try doing the exercises using the flowchart. Is it easier?
Warning Because δ is a function, each state of a flowchart should have no more than
one arrow labelled (a, ?, ?) leaving it, for any a ∈ Σ and any values ?, ?. And if you
forget an arrow or label, the machine might halt and fail wrongly.
[Figure 2.4: flowchart of the machine of exercise 2.3 — an arrow from q0 to q1 labelled (1, ∧, 1), an arrow from q1 back to q0 labelled (1, 1, 1), and an arrow from q0 to the final state q2 labelled (∧, ∧, 0).]
2.3.3 Illustration
Example 2.7 (Deleting characters) Fix an alphabet I. Let us define a TM M with fM = head,
where the function head is as in definition 2.2. M will have three states: skip, erase,
and stop. So Q = {skip, erase, stop}. Skip is the start state, and stop is the only halting
state. We can take the alphabet Σ to be I ∪ {∧}. δ is given by:
[Figure: flowchart for M — an arrow from skip to erase labelled (x, x, 1), and an arrow from erase to the final state stop labelled (x, ∧, 0).]
The names of the states are not really needed in a flowchart, but they can make it
more readable. In pseudo-code:
move right
write ∧
halt & succeed
Exercises 2.8 (Unary notation, unary addition) We can represent the number n on
the tape by 111. . . 1 (n times). This is unary notation. So 0 is represented by a blank
tape, 2 by two 1s followed by blanks, etc. For short, we write the string 111. . . 1 of n
1's as 1^n. In this course, 1^n will NOT mean 1 × 1 × 1 . . . × 1 (n times). Note: 1^0 is ε.
1. Suppose I = {1, +}. Draw a flowchart for a Turing machine M with input alpha-
bet I, such that fM(1^n.+.1^m) = 1^(n+m). (Remember that '.' means concatenation.
E.g., if the input is ‘111+11’, the output is ‘11111’.) So M adds unary numbers.
(There is a suitable machine with 4 states. Beware of the case n = 0 and/or
m = 0.)
2. Write a pseudo-code version of M .
Warning These devices are to help the programmer. They involve no change to the
definition of a TM. (In section 3 we will consider genuine variants of the TM that make
for even easier programming — though these are no more powerful in theory, as we
would expect from Church’s thesis.)
The M above only works for inputs in {0, 1}∗ , but we could design a similar ma-
chine MI = (QI , I ∪ {∧}, I, q0 , δI , FI ) to shift a word of I ∗ to the right, where I is any
finite alphabet. If I has more than 2 symbols then MI would need more states than
M above (how many?). But the idea will be the same for each I , so we would like to
express MI uniformly in I .
Suppose we could introduce into QI a special state seen(x) with a parameter, x,
that can take any value in I . We could then use x to remember the symbol just read.
Using seen(x), the table δI can be given very simply as follows:
[Figure: the flowchart written out in full for I = {0, 1} — states q0, seen_0, seen_1 and final state q1, with a separate arrow for each of the symbols 0 and 1.]
[Figure: the same machine written with the parameterised state seen(x) — an arrow from q0 to seen(x) labelled '(s, s, 1; x := s) if s ≠ ∧', an arrow from q0 to q1 labelled (∧, ∧, 0), a loop on seen(x) labelled (s, x, 1; x := s), and an arrow from seen(x) to q1 labelled (∧, x, 0).]
Each arrow leading to seen(x) is labelled with one or more 4-tuples. The last entry
of each 4-tuple is an ‘assignment statement’, saying what x becomes when seen(x) is
entered.
The pseudo-code will use a variable x. x can take only finitely many values. We
need not mention the initial write, as we only need specify writes that actually alter the
tape.
In fact we can use states like seen(x) without changing the formal definition of the
Turing machine at all! We just observe that whilst it’s convenient to view seen(x)
as a single state with a parameter x, we could equally get away with the collection
seen(a), seen(b), . . . of states, one for each letter in I , if we are prepared to draw them
all in and connect all the arrows correctly. This is a bit like multiplication: 3 × 4 is
convenient, but if we only have addition we can view this as shorthand for 3+3+3+3.
What we do is this. For each letter a of I we introduce a single state, called seen(a),
or if you prefer, seena . Because I is finite, this introduces only finitely many states.
So the resulting state set is finite, and so is allowed by the definition of a Turing
machine. In fact, if I = {a1 , . . . , an } then QI = {q0 , q1 , seen(a1 ), . . . , seen(an )}: i.e.,
n + 2 states in all. Then δI as above is just a partial function from QI × (I ∪ {∧})
into QI × (I ∪ {∧}) × {0, 1, −1}. So our machine is MI = (QI , I ∪ {∧}, I, q0 , δI , F) — a
genuine Turing machine!
So although seen(x) is conveniently viewed by us as a single state with a param-
eter ranging over I , for the Turing machine it is really many states, namely seen(a1 ),
seen(a2 ), . . . seen(an ), one for each element of I .
So we can in effect allow parameters x in the states of Turing machines, so long
as x can take only finitely many values. Doing so is just a useful piece of notation, to
help us write programs. This notation represents the idea of storing a bounded finite
amount of information in the state (as in the registers on a computer).
Warning We cannot store any parameter x that can take infinitely many values, or
even an unbounded finite number of values. That would force the underlying genuine
state set Q to be infinite, in contravention of the definition of a Turing machine. So,
e.g., for any I , we get a Turing machine MI that works for I . MI is built in a uniform
way, but we do not (cannot) get a single Turing machine M that works for any I !
Similarly, we cannot use a parameter in a state to count the length of the input word,
since even though the length of the input is always finite, there is no finite upper bound
on it.
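To see that the parameter really is only notation, here is a sketch in Python (the names are ours, and the instruction table is our reading of the flowchart above) that builds the genuine, finite instruction table of the shift-right machine MI for an arbitrary finite alphabet I, with one concrete state seen(a) per letter of I:

BLANK = '^'                                        # stands for the blank symbol ∧

def build_shift_right(I):
    """Return (states, delta) for the shift-right machine over the finite alphabet I."""
    states = {'q0', 'q1'} | {('seen', a) for a in I}     # n + 2 states in all
    delta = {}
    delta[('q0', BLANK)] = ('q1', BLANK, 0)              # empty input: nothing to shift
    for s in I:
        delta[('q0', s)] = (('seen', s), s, +1)          # remember s, keep it in square 0
        delta[(('seen', s), BLANK)] = ('q1', s, 0)       # end of word: write the remembered s
        for t in I:
            delta[(('seen', s), t)] = (('seen', t), s, +1)   # write s, now remember t
    return states, delta

states, delta = build_shift_right({'0', '1'})
print(len(states), 'states')                       # 4 states for I = {0, 1}
for line in sorted(map(str, delta.items())):
    print(line)

Because I is finite, the loop produces only finitely many states and instructions, so the result is a genuine Turing machine in the sense of the definition.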
(The machine M of the next example takes input w1 ∗ w2, where w1, w2 ∈ I∗, and checks whether w1 = w2.)
[Figure: the initial tape — w1 ∗ w2 followed by blanks.]
We will use a parameter to remember the last character seen. We will also need
to tick off characters once we have checked them. So we let M have full alphabet
Σ = I ∪ {∧, √}, where √ ('tick') is a new character not in I. We will overwrite each
character with √, once we've checked it. Figure 2.9 shows a flowchart for M.
[Figure 2.9: flowchart for M — states begin, seen(x), test(y), return and halt, with labels such as '(a, √, 0, x := a) if a ≠ ∗', '(∗, ∗, 1, y := x)', '(a, ∗, 0) if a ≠ ∗ and a = y', and '(a, a, −1) if a ≠ √'.]
M overwrites the leftmost unchecked character of w1 with √, passing to state
seen(x) and remembering what the character was using the parameter x of ‘seen’. (But
if x is ∗, this means it has checked all of w1 , so it only remains to make sure there are
no more uncompared characters of w2 .) Then it moves right until it sees ∗, when it
jumps to state test(y), remembering x as y. In this state it moves past all ∗’s (which are
the checked characters of w2 ). It stops when it finds a character — a, say — that isn’t
∗ (i.e., a is the first unchecked character of w2 ). It compares a with y, the remembered
character of w1 . There are three possibilities:
Exercises 2.11
1. Try M on the inputs 123∗123, 12∗13, 1∗11, 12∗1, ∗1, 1∗, and ∗ (in the last three,
w1 , w2 , or both are empty (ε)). What is the output of M in each case?
2. What would go wrong if the 'begin → seen' arrow was just labelled (a, √, 0, x := a)?
Please don’t worry if you found that hard; Turing machines that need as many as five
states (not counting any parameters) are fairly rare, and anyway we’ll soon see ways to
make things easier. By the way, it’s a good idea to write your Turing machines using
as few states as you can.
3. Design a Turing machine TI to calculate the function tail : I ∗ → I ∗ .
4. Design a Turing machine M that checks that the first character of its input does
not appear elsewhere in the input. How will you make M output the answer?
[Figure: a single tape divided into two tracks — track 1 holds w1 and track 2 holds w2, each followed by blanks.]
[Figure: a flowchart with a loop on q0 labelled '((x, x), ∧, 1) if x ≠ ∧' and an arrow from q0 to the halting state labelled ((∧, ∧), ∧, 1).]
Warning The tuples (a1 , . . . , an ) are just single symbols in the Turing machine’s al-
phabet. The tracks only help us to think about Turing machine operations — they
exist only in the mind of the programmer. No change to the definition of a Turing ma-
chine has been made. Compare arrays in an ordinary computer. The array A(5, 6) will
usually be implemented as a 1-dimensional memory area of 30 contiguous cells. The
division into a 2-dimensional array is done in software.
Warning We cannot divide the tape into infinitely many tracks — this would vio-
late the requirement that Σ be finite. (But see 2-dimensional-tape Turing machines in
§3.4.1.)
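The point that tracks exist only in the programmer's mind can be made concrete with a small sketch in Python (our own names): the alphabet simply contains pair symbols, and 'splitting into tracks' is ordinary tuple handling.

BLANK = '^'                                   # stands for the blank symbol ∧

def as_tracks(tape):
    """View a tape of symbols (some of them pairs) as two tracks.  A non-pair
    symbol a is treated as (a, ∧), as in the dynamic track set-up."""
    pairs = [s if isinstance(s, tuple) else (s, BLANK) for s in tape]
    return [a for (a, b) in pairs], [b for (a, b) in pairs]

tape = [('a', 'b'), ('a', 'a'), ('1', 'a'), ('2', 'b'), 'c', BLANK]
track1, track2 = as_tracks(tape)
print(track1)    # ['a', 'a', '1', '2', 'c', '^']
print(track2)    # ['b', 'a', 'a', 'b', '^', '^']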
Suppose, for example, that the squares of the tape hold the pair symbols (a, b), (a, a), (1, a), (2, b), . . . We view it as:
[Figure: the same tape drawn as two tracks — track 1 reading a, a, 1, 2, . . . and track 2 reading b, a, a, b, . . . ]
[Figure: a flowchart fragment on the 2-track tape, with labels '((a, b), (a, a), 1) if a ≠ ∧' and ((∧, b), (∧, ∧), 0).]
[Figure: the input — w1 ∗ w2 followed by blanks — on a single-track tape.]
Stage 1: move w2 onto track 2, as in figure 2.13 (how exactly?). Then return to square 0. The resulting tape has
2 tracks as far as the input went; after that, it has only one. Also, while we’re at
it, we mark square 0 with a ‘∗’ in track 2 (figure 2.15).
[Figure 2.15: after stage 1 — track 1 holds w1 followed by blanks; track 2 holds ∗ in square 0 and w2 further along, followed by blanks.]
Stage 2: shift w2 left to align it with w1. E.g., use some version of tail (exercise 2.11(3))
repeatedly, until the ∗ is gone (see figure 2.16).
[Figure 2.16: after stage 2 — track 1 holds w1 and track 2 holds w2, both starting at square 0 and followed by blanks.]
Stage 3: compare the tracks as far as their first ∧’s, halting & failing if a difference is
found. This is easy — see figure 2.11.
Exercise: work out the details.
So comparing two words is easier with two tracks. But tapes with more than one
track are useful even if there’s only one input. An example is implicit marking of
square 0 (§2.4.3 below); we’ll see others in section 3.
[Figure: demolishing extra tracks — a loop labelled '((x, y, z), x, 1) if x ≠ ∧' and an exit arrow labelled ((∧, y, z), ∧, 0), overwriting each 3-track symbol by its track-1 component.]
3. Include ∗ as a special character of Σ. To initialise, shift the input right one place
and insert ∗ in square 0. Then carry out all operations on squares 1,2,. . . , using
∗ as a left end marker. This works OK, but involves some tedious copying, so is
not recommended when designing actual TMs!
2.4.3.1 Convention
Because we can always know when in square 0 (by using one of these ways), we will
assume that a Turing machine always knows when its head is over square 0 of the tape:
square 0 is assumed to be implicitly marked. This saves us having to mention the
extra track explicitly when describing the machine, and so keeps things simple.
We often need to return the head to square 0. This can be done very simply, using
a loop:
[Figure: a single looping state whose labels read '(x, a, −1) if x ≠ a and not in sq. 0' and '(x, x, 0) if x = a or head in sq. 0'.]
2.4.4 Subroutines
It is quite in order to string several Turing machines together. Informally, when a
final state of one is reached, the state changes to the initial state of the next in the
chain. This can be done formally by collecting up the states and instruction tables of
all the machines in the chain, and for each final state of one machine, adding a new
instruction that changes the state to the initial state of the next machine in the chain,
without altering the tape or moving the head. The number of states in the ‘chain’
machine is the sum of the numbers of states for the individual machines, so is finite.
Thus we obtain a single Turing machine from the machines in the chain; again we have
not changed the basic definition of the Turing machine. We will use this technique
repeatedly.
Warning When control passes to the next Turing machine in the chain, the head may
not be over square 0. Moreover, the tape following the ‘input’ may contain the previous
machine’s scratchwork and so not be entirely blank. Each machine’s design should
allow for these possibilities, e.g., by returning the head to square 0 before starting, or
otherwise.
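As a sketch of the chaining construction (in Python, using the dictionary representation of δ from the earlier sketches; tagging the state names with 'A'/'B' is our own bookkeeping to keep the two state sets disjoint):

def chain(m1, m2):
    """m1, m2: dicts with keys 'q0', 'F', 'Sigma', 'delta' as in earlier sketches."""
    delta = {}
    for (q, s), (q2, s2, d) in m1['delta'].items():
        delta[(('A', q), s)] = (('A', q2), s2, d)
    for (q, s), (q2, s2, d) in m2['delta'].items():
        delta[(('B', q), s)] = (('B', q2), s2, d)
    sigma = m1['Sigma'] | m2['Sigma']
    for f in m1['F']:
        for s in sigma:
            delta[(('A', f), s)] = (('B', m2['q0']), s, 0)   # hand over to the next machine
    return {'q0': ('A', m1['q0']),
            'F': {('B', f) for f in m2['F']},
            'Sigma': sigma,
            'delta': delta}

The handover instructions write back whatever they read and do not move the head, so the tape is untouched when control passes to the second machine; the combined state set is still finite.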
2.4.5 Exercises
We end this section with some problems that illustrate the techniques we have seen
here.
Exercises 2.12
1. (subtraction) Suppose that I = {1, −}. Design a Turing machine M = (Q, Σ,
I, q0 , δ, F) such that
fM(1^n.−.1^m) = 1^(n−m) if n ≥ m, and ε otherwise.
M performs subtraction on unary numbers.
2. (unary multiplication) Suppose I = {1, ∗}. Design a Turing machine M such that
fM(1^n.∗.1^m) = 1^(nm). Hint: use 3 tracks and repeated addition and subtraction.
3. (inverting words) Let I be an alphabet. Find a Turing machine M = (Q, Σ, I, q0 ,
δ, F) such that fM (w) is the reverse of w. E.g.: use storage in states and marking
of square 0.
4. (unary-binary conversion)
(a) Design a machine M to add 1 to a binary number (much easier than in
decimal!). That is, if if n > 0 is a number let ‘n’ ∈ {0, 1}∗ be the binary
expansion of n, without leading zeros, written with the least significant
digits on the left. (E.g., ‘13’ = 1011. This makes things easier.) Define ‘0’
to be a single zero. We then require fM (‘n’) = ‘n + 1’ for all n ≥ 0.
(b) Extend M to a machine that converts a unary number to its binary equiva-
lent. Hint: use two tracks.
(c) Design a Turing machine that converts a binary number to unary.
5. (primality testing) Design a TM that, given as input some binary number, tests
whether it is prime. [Again, a 3-track machine is useful. See Hopcroft & Ull-
man, p. 154].
3. Variants of Turing machines
In this section we examine some variants of the TM we considered before. The main
examples we have in mind are machines with a two-way infinite tape, or more than
one tape. We will see that in computational power they are all the same as the ordinary
model. This is in line with Church’s thesis, and provides some evidence for the ‘truth’
of the thesis.
Nonetheless, variants of the basic Turing machine are still useful. Just as in real
life, the more complex (expensive) versions can be easier to program (user-friendly),
whilst the simpler, cheaper models can be easier to understand and so prove things
about. For example, suppose we wanted to prove that ‘register machines’ are equiva-
lent in power to Turing machines. This amounts to showing that Turing machines are
no better and no worse than register machines (with respect to computational power).
We could show this directly. But clearly it might be easier to prove that a cheap Tur-
ing machine is no better than a register machine, which in turn is no better than an
expensive Turing machine. As in fact both kinds of Turing machine have equal com-
putational power, this is good enough.
[Figure: the two-way infinite tape of M±, with squares numbered . . . , −3, −2, −1, 0, 1, 2, 3, . . . ]
Definition 3.2 (Two-way-infinite tape TM) A two-way infinite tape Turing ma-
chine has the form M ± = (Q, Σ, I, q0 , δ, F), exactly as before. The tape now goes right
and left forever, so the head of M ± can move left from square 0 to squares −1, −2, etc.,
without halting and failing. (You can’t tell from the 6-tuple definition what kind of tape
the machine has; this information must be added as a rider. Of course, by default the
tape is 1-way infinite, as that’s our definition of a Turing machine.)
The input to M ± is written initially in squares 0, 1, . . . , n. All squares > n and
< 0 are blank. If M ± terminates, the output is taken to be whatever is in squares
0, 1, . . . , m − 1, where the first blank square ≥ 0 is in square m. So we can define the
input-output function fM± for a two-way infinite tape machine as before.
Exercise 3.3 (This is too easy!) Since we can’t tell from the 6-tuple definition what
kind of tape the machine has, we can alter an ordinary TM by giving it a two-way infi-
nite tape to run on: the result is a working machine. Find a 1-way infinite tape Turing
machine that has a different input-output function if we give it a two-way infinite tape
in this way.
Now 2-way infinite tape Turing machines still seem algorithmic in nature, so if
Church’s thesis is true, they should be able to compute exactly the same functions as
ordinary Turing machines. Indeed they can: for every two-way infinite Turing machine
there is an equivalent ordinary Turing machine, and vice versa. But we can’t just
quote Church’s thesis for this, as we are still gathering evidence for the thesis! We
must prove it. If we can do this, it will provide some type (b) evidence (see §1.5.3) for
the correctness of Church’s thesis as a definition of algorithm.
Two-way machines seem intuitively more powerful than ordinary ones. So it
should be easy to prove:
And it is.
Theorem 3.4 For every ordinary Turing machine M there is an equivalent Turing machine M± with a two-way infinite tape.
PROOF. We take M± = (Q, Σ ∪ {fail}, I, q0, δ±, F), where 'fail' is a new symbol
not in Σ. M ± begins by moving left to square −1, writing ‘fail’ there, and moving right
to square 0 again. Then it behaves exactly as M , except that if it ever reads ‘fail’ it
halts and fails. Clearly fM = fM± . QED.
As we might expect, the converse is a little harder.
Theorem 3.5 For every Turing machine M± with a two-way infinite tape there is an equivalent ordinary Turing machine M.
PROOF. The idea is to curl the two-way tape round in a U-shape, making it 1-way
infinite but with two tracks. The top track will have the same contents as squares
0,1,2,. . . of the two-way infinite tape of M ± . The bottom track will have a special
symbol ‘∗’ in square 0, to mark the end of the tape, and squares 1, 2, . . . will contain
the contents of squares −1, −2, . . . of M ± ’s tape. See figure 3.2.
[Figure 3.2: above, the two-way infinite tape of M± (squares −3, . . . , 6) with its head in state q; below, the corresponding one-way 2-track tape of M — the top track holds squares 0, 1, 2, . . . of M±'s tape, the bottom track holds ∗ in square 0 and then squares −1, −2, . . . — with M in state q(−1).]
The 1-way tape of M holds the same information as M ± ’s 2-way tape. M will use
it to follow M ± , move for move. It keeps track of whether M ± is currently left or right
of square 0, by remembering this (a finite amount of information!) in its state, as in
§2.4.1.
In pseudo-code, it is quite easy to specify M. The variable track will hold 1 if
M± is now reading a positive square or 0, and −1 if M± is reading a negative square.
For M, +1 means 'top track' and −1 means 'bottom track'. The variable M±-state
holds the state of M±. Note that these variables can only take finitely many values, so
they can be implemented as a parameter in the states of M, as in §2.4.1. Remember
(§2.4.2.1) that in reality, two-track squares of M's tape hold pairs (a, b), where we view
a as the contents of the top track, and b the contents of the bottom track.
repeat until M±-state is a halting state of M±
    if the current square contains a (say), and a is not a pair of symbols, then
        if track = 1 then write (a, ∧) else write (∧, a) end if    % dynamic track set-up
    end if
    if track = 1 then                                % M± is reading a square ≥ 0
        case [we are reading (a, b) and δ(M±-state, a) = (q, a′, d)]:    % δ from M±
            write (a′, b)                            % write to top track
            M±-state := q
            if b = ∗ and d = −1 then                 % M± in square 0 and moving left. . .
                move right; track := −1
            else
                move in direction d                  % track = 1, so M moves the same way as M±
            end if
        end case
    else if track = −1 then                          % M± is reading a square < 0
        case [we are reading (a, b) and δ(M±-state, b) = (q, b′, d)]:
            write (a, b′)                            % write to bottom track
            M±-state := q
            move in direction −d                     % track = −1, so M moves the 'wrong' way
            if now reading ∗ in track 2 then track := 1    % M± now in square 0
        end case
    end if
end repeat
% M± has halted & succeeded, so clean up & output
move left until read ∗ in track 2                    % return to square 0
repeat while not reading ∧ in track 1
    if reading (a, b) (say) then write a             % replace two tracks with one
    move right
end repeat
write ∧; halt & succeed                              % blank to mark end of output
So M mimics M ± , move for move. Note that the case statements involve a fixed
finite number of options, one case for each triple (q, a, b) where q ∈ Q and a, b ∈ Σ. So
we can implement them by ‘hard-wiring’, using finitely many states of M . We stipulate
that if no option applies, the case statement halts & fails.
When M ± halts and succeeds (if ever!), M removes the bottom track, the old top
track up to its first ∧ (which is M ± ’s output) becoming the whole width of the tape.
Thus, the output of M is the same as M ± in all cases, and so M is equivalent to M ± .
QED.
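The heart of the construction is just an address translation between the two tapes. A small sketch in Python (the names are ours):

def fold(i):
    """Square i of M±'s two-way tape  ->  (track, square) on M's one-way tape."""
    if i >= 0:
        return ('top', i)          # the top track holds squares 0, 1, 2, ...
    return ('bottom', -i)          # the bottom track holds squares -1, -2, ... ; its square 0 holds '*'

def unfold(track, j):
    """The inverse translation, for checking."""
    return j if track == 'top' else -j

print(fold(-3))                    # ('bottom', 3)
print(unfold(*fold(-3)))           # -3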
[Figure 3.3: a 3-tape Turing machine — three tapes, each with its own head, all connected to a single control that holds the state.]
This is not the same as a machine with 3 tracks. Here, there are three heads which can move independently on
their own tapes. At each step:
• All 3 heads read their squares; the 3 symbols found are passed to control.
• Depending on these 3 symbols and on its current state, control then:
– tells each head what to write;
– tells each head which way to move;
– moves into a new state.
• The process repeats.
The moves, new symbols and state are determined by the instruction table, and de-
pend on the old state and all three old symbols. So what one head writes can depend
on what the other heads just read. In many ways, a many-tape Turing machine is
analogous to concurrent execution, as in effect we have several communicating Turing
machines running together.
Warning Do not confuse multi-tape TMs with multi-track TMs. I know they are
similar as English words, but these names are in general use and we are stuck with
them. They mean quite different things. Think of tape recorders. Two old mono tape
recorders (2 tapes, with 1 track each) are not the same as one stereo tape recorder
(1 tape, 2 tracks). They are in principle better, because (a) we could synchronise them
to get the capabilities of a single stereo machine, but (b) we can use them in other ways
too, e.g., for editing. Similar considerations apply to Turing machines.
The definition of the n-tape machine is the same, except that we have:
δ : Q × Σ^n → Q × Σ^n × {−1, 0, 1}^n.
Here, and below, if S is any set then S^n is the set {(a1, . . . , an) : a1, . . . , an ∈ S}.
3.3.1.1 Remarks
2. How can you tell how many tapes the Turing machine (Q, Σ, I, q0 , δ, F) has?
Only by looking at δ. If δ takes 1 state argument and n symbol arguments, there
are n tapes.
3. Note that it is NOT correct to write, e.g., (Q^3, Σ^3, I^3, q0, δ, F^3) for a 3-tape ma-
chine. One Turing machine, one state set, one alphabet. There is a single state
set in a 3-tape Turing machine, and so we write it as Q. If each of the three heads
were in its own state from Q, then the state of the whole machine would indeed
be a triple in Q^3. But remember, everything is linked up, and one head's actions
depend on what the other heads read. With so much instantaneous communica-
tion between heads, it is meaningless to say that they have individual states. The
machine as a whole is in some state — some q in Q.
And there is one alphabet: the set of characters that can occur in the squares on
the tapes. We used Σ^3 when there were 3 tracks on a single tape, because this
involved changing the symbols we were allowed to write in a single square of a
tape. So if you write Σ^3, you should be thinking of a 3-track machine. In fact,
after using 3-tape machines for a while you won’t want to bother with tracks
any more, except to mark square 0.
3.3.1.2 Computations
How does a many-tape TM operate? Consider (for instance) a 3-tape machine M =
(Q, Σ, I, q0 , δ, F). At the beginning, each head is over square 0 of its tape. Assume that
at some stage, M is in state q and reads the symbol ai from head number i (for each
i = 1, 2, 3). Suppose that
δ(q, a1 , a2 , a3 ) = (q′ , b1 , b2 , b3 , d1 , d2 , d3 ).
Then for each i = 1, 2, 3, head i will write the symbol bi in its current square and then
move in direction di (0 or ±1 as usual), and M will go into state q′ . M halts and
succeeds if q′ is a halting state. It halts and fails if there is no applicable instruction, or
if any of the three heads tries to move left from square 0. The definition of an n-tape
machine is similar.
Exercises 3.7
1. Draw a flowchart for this machine. The main bit is shown in figure 3.4. (This
[Figure 3.4: a single state with a loop labelled '((x, x), (x, x), (1, −1)) if h2 not in sq. 0' and an exit arrow labelled '((x, x), (x, x), (0, 0)) if h2 in sq. 0'.]
halts & fails if the heads read different characters — there’s no applicable in-
struction.) Now try to design a single-tape Turing machine that does the same
job. That should convince you that many-tape machines can help programming
considerably. (For a solution see Harel’s book, p.202.)
Theorem 3.8 Let n ≥ 1. (1) For every ordinary Turing machine there is an equivalent n-tape Turing machine. (2) For every n-tape Turing machine there is an equivalent ordinary Turing machine.
PROOF. To show (1) is easy (because expensive is 'obviously better' than cheap).
Given an ordinary 1-tape Turing machine M , we can make it into an n-tape Turing
machine by adding extra tapes and heads but telling it not to use them. In short, it
ignores the extra tapes and goes on ignoring them!
Formally, if M = (Q, Σ, I, q0 , δ, F), we define Mn = (Q, Σ, I, q0 , δ′ , F) by:
δ′ : Q × Σ^n → Q × Σ^n × {−1, 0, 1}^n
δ′ (q, a1 , . . . , an ) = (q′ , b1 , ∧, ∧, . . . , ∧, d1 , 0, 0, . . . , 0) where δ(q, a1 ) = (q′ , b1 , d1 ).
(Recall that δ is the only formal difference between TMs with different numbers of
tapes.) Clearly, Mn computes the same function as M , so it’s equivalent to M .
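A sketch of this transformation of δ in Python (our own representation; Σ must be supplied so that the extra symbol arguments can be enumerated):

from itertools import product

BLANK = '^'                                   # stands for the blank symbol ∧

def lift(delta1, sigma, n):
    """delta1 maps (q, a) -> (q2, b, d).  The result maps
    (q, a1, ..., an) -> (q2, b, ∧, ..., ∧, d, 0, ..., 0):
    heads 2..n always rewrite blanks and never move."""
    delta_n = {}
    for (q, a1), (q2, b1, d1) in delta1.items():
        for rest in product(sigma, repeat=n - 1):
            delta_n[(q, a1) + rest] = (q2, b1) + (BLANK,) * (n - 1) + (d1,) + (0,) * (n - 1)
    return delta_n

print(lift({('q0', '1'): ('q1', BLANK, +1)}, {'1', BLANK}, 2))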
The converse (2), showing that cheap is really just as good as expensive, is of
course harder to prove. For simplicity, we only do it for n = 2, but the idea for larger n
is the same.
So let M2 be a 2-tape Turing machine. We will construct a 1-tape Turing machine
M that simulates M2 . As we said, each M2 -instruction will correspond to an entire
subroutine for M . The idea is very simple: M simulates M2 by drawing a diagram or
picture of M2 ’s initial configuration, and then updating the picture to keep track of the
moves M2 makes.
M has a single 4-track tape (cf. §2.4.2). At each stage, track 1 will have the same
contents as tape 1 of M2 . Track 2 will show the location of head 1 of M2 , by having
an X in the current location of head 1 and a blank in the other squares. Tracks 3 and 4
will do the same for tape 2 and head 2 of M2 .
Example 3.9 Suppose that at some point of execution, the tapes and heads of M2 are
as in figure 3.5. Then the tape of M will currently be looking like figure 3.6.
[Figure 3.5: the two tapes of M2 — tape 1 reading A B A C D . . . and tape 2 reading D A D D A . . . — each with its own head over its own square, both connected to the control.]
[Figure 3.6: the corresponding single 4-track tape of M — track 1 copies tape 1, track 2 has an X marking head 1's square, track 3 copies tape 2, and track 4 has an X marking head 2's square.]
So the tape of M will always show the current layout of M2 ’s tapes and heads. We have
to show how M can update its tape to keep track of M2 . Let us describe M ’s operation
from start to finish, beginning with the setting-up of the tracks.
Initialisation Recall that initially both heads of M2 are over square 0; tape 1 carries
the input of M2 , and tape 2 of M2 is blank. M is trying to compute the same
function as M2 , so we can assume its input is the same. That is, initially M ’s
(single) tape is the same as tape 1 of M2 .
First, M sets up square 0. Suppose that tape 1 of M2 (and so also M ’s tape) has
the symbol a in square 0. Then M writes (a, X, ∧, X) in square 0. This is because
it knows that both heads of M2 start in square 0 — so that’s where the Xs should
be! And it knows tape 2 of M2 is blank.
But these Xs will move around later, with the heads of M2 , so also, square 0
should be marked. M can mark square 0 with an extra track — cf. §2.4.3. So
really M ’s tape has five tracks; but we agreed in §2.4.3.1 not to mention this
track, for simplicity.
We’ll assume dynamic track set-up, as in §2.4.2.4. So whenever M moves into
a square whose contents are not of the form (a, b, c, d), but just a, it immediately
overwrites the square with (a, ∧, ∧, ∧), and then continues. This is because to
begin with, M ’s tape is the same as tape 1 of M2 , and tape 2 of M2 is blank. So
(a, ∧, ∧, ∧) is the right thing to write.
We assume also that M always knows the current state q of M2 (initially q0 ). It
can keep this information in its state set as well, because the state set of M2 is
also finite.
M must now update the tape after each move of M2 , repeating the process until
M2 halts. Suppose that M2 is about to execute an instruction (i.e., to read from and
write to the tapes, move the heads, and change state). When M2 has done this, its head
positions and tape contents may be different. M updates its own tape to reflect this, in
two stages:
Stage 1: Finding out what M2 knows First M ’s head sweeps from square 0 to the
right. As it does so, it will come across the X markers in tracks 2 and 4. When
it hits the X in track 2, it looks at the symbol in track 1 of the same square.
This is the symbol that head 1 of M2 is currently scanning. Suppose it is a1 ,
say. M remembers this symbol ‘a1 ’ in its own internal states — cf. §2.4.1. It
can do this because there are only finitely many possible symbols that a1 could
be (Σ is finite). Similarly, M will eventually find the X in track 4, and then it
also remembers the symbol — a2 , say — in track 3 of the same square. a2 is
the symbol that head 2 of M2 is currently scanning. Of course, M might find the
X in track 4 before or even at the same time as the X in track 2. In any event,
once it has found both Xs, M knows both the current symbols a1 , a2 that M2 is
scanning.
Stage 2: Updating the tape We assume that M ‘knows’ the instruction table δ of M2 .
This never changes so can be ‘hard-coded’ in the instruction table of M . As with
the 2-way-infinite tape simulation (theorem 3.5), M does not have to compute
δ — δ is built into M in the sense that the instruction table of M is based on it.
If q′ is not a halting state for M2 , M now forgets M2 ’s old state q, remembers the new
state q′ , and begins the next sweep at Stage 1 above.
The output Suppose then that q′ is a halting state for M2 . So at this point, M2 will
halt and succeed with the output on tape 1. As track 1 of M ’s tape always looks
the same as tape 1 of M2 , this same output word must now be on track 1 of M ’s
tape. M now demolishes the other three tracks in the usual way, leaving a single
track tape containing the contents of the old track 1, up to the first blank.
M has simulated every move of M2 . So for all inputs in I ∗ , the output of M is the
same as that of M2 . Thus fM = fM2 , and M is equivalent to M2 , as required. QED.
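A sketch in Python (our own names) of the 4-track encoding that M maintains — compare figures 3.5 and 3.6:

BLANK = '^'                                   # stands for the blank symbol ∧

def encode(tape1, head1, tape2, head2):
    """Return M's tape: a list of 4-tuples, one per square.  Track 1 mirrors
    tape 1 of M2, track 2 marks head 1 with an X, track 3 mirrors tape 2,
    and track 4 marks head 2 with an X."""
    width = max(len(tape1), len(tape2), head1 + 1, head2 + 1)
    def get(tape, i):
        return tape[i] if i < len(tape) else BLANK
    return [(get(tape1, i),
             'X' if i == head1 else BLANK,
             get(tape2, i),
             'X' if i == head2 else BLANK)
            for i in range(width)]

# a made-up configuration of M2: head 1 over square 2, head 2 over square 5
for square in encode(list('ABACD'), 2, list('DADDA'), 5):
    print(square)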
Exercises 3.10
1. Write out the pseudo-code routine to handle tracks 3 and 4 in SweepLeft.
2. Draw flowcharts of the parts of M that handle stages 1 and 2 above. It’s not too
complicated if you use parameters to store a1 , a2 , q, q′ , b1 , b2 , d1 , and d2 , as in
§2.4.1.
3. Why do we ‘move −d1 ’ after writing X in track 2? (Hint: what if the heads are
both in square 6?)
4. Why do we need the variable done1 in SweepLeft? What might happen if we
omitted it?
5. What alterations would be needed to simulate an n-tape machine?
6. Suppose that at some point M2 tries to move one of its heads left from square 0.
M2 halts and fails in this situation. What will M do?
7. Suppose that on some input, M2 never halts. What will M do?
8. How could we make M more efficient?
9. (Quite long.) Let M2 be the ‘reverser’ 2-tape Turing machine of exercise 3.7.
Suppose M2 is given as input a word of length n. How many steps will M2 take
before it halts? If M is the 1-tape machine that simulates M2 , as above, how
many steps (roughly!) will it take?
flow-chart diagram; in the latter case you should explain your notation for
instructions.
(c) By modifying M or otherwise, briefly explain how you would design a
(2-tape) Turing machine M ∗ with input alphabet {1, ∗}, such that for any
n ≥ 0 and m > 0, fM∗(1^n ∗ 1^m) = 1^r, where r is the remainder when dividing
n by m.
The three parts carry, respectively, 20%, 45% and 35% of the marks.
3. (a) What is Church’s thesis? Explain why it cannot be proved but could
possibly be disproved. What kinds of evidence for the thesis are there?
(b) Design a 2-tape Turing machine M with input alphabet I , such that if w1
and w2 are words of I of equal length, the initial contents of tape 1 are w1
and the initial contents of tape 2 are w2 , then M halts and succeeds if w1 is
an anagram (i.e., a rearrangement of the letters) of w2 , and halts and fails
otherwise.
For example, if w1 = abca and w2 = caba, M halts and succeeds; if w1 =
abca and w2 = cabb, M halts and fails.
You may use pseudo-code or a flow-chart diagram; in the latter case you
should explain your notation for instructions. You may assume that square
zero of each Turing machine tape is implicitly marked.
The two parts carry, respectively, 40% and 60% of the marks. [1993]
3.4.1 Turing machines with 2-dimensional tapes
[Figure: the used part of the 'big' machine's 2-dimensional tape — three rows, reading a b a, then 1 1 1 0 a, then 1 a 1.]
Then tape 1 of the little machine will contain the three segments
— or the same but with longer segments filled out by blanks. Note the double ∗∗ at the
end. Tape 2 of the little machine is used for scratch work.
Head 1 of the little machine is over the symbol corresponding to where the big
machine’s head is. If big head moves left or right, so does little head 1. If however,
big head moves up, little head 1 must move to the corresponding symbol in the next
segment to the right. So the little machine must remember the offset of head 1 within
its current segment. This offset is put on tape 2, e.g., in unary notation. So in this case,
little head 1 moves left until it sees ‘∗’. For each move left, little head 2 writes a 1 to
tape 2. When ‘∗’ is hit, head 1 moves right to the next ‘∗’. Then for each further move
of head 1 right, head 2 deletes a 1 from tape 2. When all the 1’s have gone, head 1 is
over the correct square, and the next cycle commences.
Sometimes the little machine must add a segment to tape 1 (if big head moves
higher on the big tape than ever before), or lengthen each segment (if big head moves
further right than before). It is easy to add an extra segment of blanks on the end of
tape 1 of the right length — tape 2 is used to count out the length. Adding a blank at
the end of each segment can be done by shifting, as in example 2.9. The little machine
can do all this, return head 1 to the correct position (how?), and then implement the
move of the big head as above — there is now room for it to do so.
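A sketch in Python (our own names; the exact punctuation of the segments is not fixed above, so this is one plausible layout) of flattening the used part of the big machine's 2-dimensional tape into the ∗-separated segments kept on tape 1 of the little machine:

BLANK = '^'                                   # stands for the blank symbol ∧

def flatten(rows):
    """rows[i] is the contents of row i of the big tape (a string of symbols).
    Each row becomes a segment, padded with blanks to a common length,
    with '**' marking the end."""
    width = max(len(row) for row in rows)
    segments = [row + BLANK * (width - len(row)) for row in rows]
    return '*'.join(segments) + '**'

print(flatten(['aba', '1110a', '1a1']))       # aba^^*1110a*1a1^^**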
This bears out Turing’s remark in his pioneering paper that whilst people use paper
to calculate, the 2-dimensional character of the paper is never strictly necessary. Again
we have found evidence of type (b) for Church’s thesis. A similar construction can
be used to show that for any n ≥ 1, n-dimensional Turing machines are equivalent to
ordinary ones.
Limited character machines We can imagine Turing machines with alphabet Σ0 = {0, 1, ∧} and I = {0, 1}. Unlike
the previous variants, these are seemingly less powerful (cheaper) than the basic model.
But they can compute any function fM : I → I for any Turing machine M . The idea is
to simulate a given Turing machine (Q, Σ, I, q0 , δ, F) by coding its scratch characters
(those of Σ \ I ) as strings of 1s. E.g., we list Σ as {s1 , . . . , sn } and represent si by a string
1^i of i 1s. Exercise: work out the details. We will develop this idea considerably in the
next section.
We will define these and show that they’re equivalent to ordinary machines in Part III
of the course.
Ordinary Turing machines have the same computational power as register machines,
and also more abstract systems such as the lambda calculus and partial recursive
functions. No-one has found a formalism that is intuitively algorithmic in nature but
has more computational power. This fact provides further evidence for Church’s thesis.
We considered what it means for two different kinds of machine to have the same
computational power, deciding that it meant that they could compute the same class of
functions. Examples such as palindrome detection showed how useful many-tape TMs
can be. We proved or indicated that the ordinary Turing machine has the same com-
putational power as the variants: 2-way infinite tape machines, multi-tape machines,
2-dimensional tape machines, limited character machines, and non-deterministic ma-
chines. This provided evidence for Church’s thesis.
We nowadays accept that a single computer can solve a vast range of problems, rang-
ing from astronomical calculations to graphics and process control. But before com-
puters were invented there were many kinds of problem-solving machines, with quite
different ‘hardware’. Turing himself helped to design code-breaking equipment with
dedicated hardware during the second world war. These machines could do nothing
but break codes. Turing machines themselves come in different kinds, with different
alphabets, state sets, and even hardware (many tapes, etc).
It was Turing’s great insight that this proliferation is unnecessary. In his 1936 paper
he described a single general-purpose Turing machine, that can solve all problems that
any Turing machine could solve.
This machine is called a universal Turing machine. We call it U . U is not magic
— it is an ordinary Turing machine, with a state set, alphabet, etc, as usual. If we want
U to calculate fM (w) for some arbitrary Turing machine M and input w to M , we give U
the input w plus a description of M . We can do this because M = (Q, Σ, I, q0 , δ, F) can
be described by a finite amount of information. U then evaluates fM (w) by calculating
what M would do, given input w — rather in the way that the 1-tape Turing machine
simulated a 2-tape Turing machine in theorem 3.8.
So really, U is programmable: it is an interpreter for arbitrary Turing machines.
In this section, we will show how to build U .
Definition 4.1 We let C be the alphabet {a,b,c,. . . ,A,B,. . . ,0,1,2,. . . ,!,@,£,. . . } of char-
acters that you would find on any typewriter (about 88 in all; note that ∧ is not included
in C).
S is then specified completely by this list, together with the numbers n and f .
Footnote 1: We will input the pair (code(S), w) to U in the usual way, by giving it the string code(S) concatenated with the string w, with a delimiting character, say ∗, in between.
There are many ways of coding this information. We will use a simple one. Con-
sider the word
n, f ,t1 ,t2 , . . . ,tN
where the list of 5-tuples is t1 ,t2 , . . . ,tN in some arbitrary order,2 and all numbers (n, f
and the numbers q, q′ and d in the 5-tuples) are written in decimal (say). This is a word
of our coding alphabet C ∪ {∧}. We let code(S) be the word of C obtained from this
by replacing every ‘∧’ by the five-letter word ‘blank’. (As we will be giving code(S)
as input to Turing machines, we don’t want ∧ to appear in it.)
For example, the code ‘2,2,(1,a,2,blank,-1)’ describes a machine given by: the decimal number 2 (which is 1 less than the number of states), the decimal number 2 of the first halting state, and the single instruction (1, a, 2, ∧, −1). The only characters needed are 0123456789 , − ( ) and the letters of ‘blank’.
We stress that these are only details. U only needs to be able to recover the work-
ings of S from code(S). We can use any coding that allows this.
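Purely as an illustration of how mechanical this is (the representation of a machine as a triple (n, f, instructions) is an assumption of this sketch, not part of the definition), here is a little Python function producing such a code, writing ‘blank’ for ∧ as above.

    # Illustrative sketch: building code(S) from n, f and a list of 5-tuples
    # (q, s, q2, s2, d), with '^' standing for the blank symbol.
    def code(n, f, instructions):
        def sym(s):
            return 'blank' if s == '^' else s    # the blank must not appear in code(S)
        parts = [str(n), str(f)]
        for (q, s, q2, s2, d) in instructions:
            parts.append('(%d,%s,%d,%s,%d)' % (q, sym(s), q2, sym(s2), d))
        return ','.join(parts)

    # The example above: n = 2, f = 2, and the single instruction (1, a, 2, blank, -1).
    print(code(2, 2, [(1, 'a', 2, '^', -1)]))    # prints 2,2,(1,a,2,blank,-1)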
4.2.4 Summary
The point behind these details is that each standard Turing machine S can be repre-
sented by a finite piece of information, and hence can be coded by a word code(S)
of C, in such a way that we can reconstruct S from code(S). The word code(S) ∈ C∗
carries all the information about S. It is really a name or plan of S.
for some f ≤ n. Assume the input to U is code(S) ∗ w. U will simulate the run of S on
input w. We will ensure that at each stage during the simulation:
• tape 1 keeps its original contents code(S) ∗ w, for reference;
• tape 2 holds the current tape contents of S (so it starts off holding w);
• tape 3 holds the current state q of S, as a decimal number.
Footnote 3: Recall that code(S) is not unique. But any code for S carries all the information about S. In fact, it will be clear that U will output fS (w) given input s ∗ w, where s is any code for S.
Footnote 4: If we wish, we can use theorem 3.8 to find a one-tape equivalent of U, and then, as the output of U will be a word of C (why?), apply theorem 4.7 below to find a standard TM equivalent to U.
1. Maybe the current state q of S is a halting state. To find out, U first com-
pares the number q on tape 3 with the number f in code(S). U can find
what f is by looking just after the first ‘,’ on tape 1. They are in the same
decimal format, so U can use a simple string comparison to check whether
q < f or q ≥ f .
2. If q ≥ f , this means that S is now in a halting state. Because tape 2 of U is
always the same as the tape of S, the output of S is now on tape 2 of U . U
now copies tape 2 to tape 1, terminated by a blank, and halts & succeeds.
3. If q < f then q is not a halting state, and S is about to execute its next
instruction. So head 1 of U scans through the list of instructions (the rest
of code(S), still on tape 1) until it finds a 5-tuple of the form (q, s, q′ , s′ , d)
where:
• q (as above) is S’s current state as held on tape 3. Head 3 repeatedly
moves along in parallel with head 1, to check this.
• s is the symbol that head 2 is now scanning — i.e., S’s current symbol.
A direct comparison of the symbols read by heads 1 and 2 will check
this. (If s is ‘blank’, U tests whether head 2 is reading ∧.)
4. If no such tuple is found on tape 1, this means that S has no applicable
instruction, and will halt and fail. Hence U halts and fails too (e.g., by
moving heads left until they run off the tape).
5. So assume that U has found on tape 1 the part ‘(q, s’ of the instruction
(q, s, q′ , s′ , d) that S is about to execute. S will write s′ , move its head by
d , and change state to q′ . To match this, U needs to know what s′ , q′ ,
and d are. It finds out by looking further along the instruction 5-tuple it
just found on tape 1, using the delimiter ‘,’ to keep track of where it is in
the 5-tuple. (Footnote 5: the awful possibility that s and/or s′ is the delimiter ‘,’ can be got round by careful counting.) Head 2 of U can now write s′ at its current location (by just
copying it from tape 1, except that if it is blank, head 2 writes ∧), and
then move by d (d is also got from tape 1). Finally, U copies the decimal
number q′ from tape 1 to tape 3, replacing tape 3’s current contents. After
returning head 1 to square 0, U is ready for the next step of the run of S. It
now repeats the cycle, going back to step 1.
Thus, every move of S is simulated by U . Clearly, U halts and succeeds if and only
if S does, and in that case, the output of U is just fS (w). Hence, fU (code(S) ∗ w) =
fS (w), and U is the universal machine we wanted.
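To see the cycle as ordinary code, here is a Python sketch that simulates a standard Turing machine in much the same way that U does, but at a higher level: the machine is handed over as a description rather than as the string code(S) ∗ w, and the output convention (the tape contents up to the first blank) is an assumption of the sketch.

    # Illustrative sketch (not U itself): simulate a machine with states 0..n,
    # halting states f..n, and instruction 5-tuples (q, s, q2, s2, d), on input w.
    BLANK = '^'

    def simulate(n, f, instructions, w, max_steps=100000):
        delta = {(q, s): (q2, s2, d) for (q, s, q2, s2, d) in instructions}
        tape = dict(enumerate(w))
        state, head = 0, 0
        for _ in range(max_steps):
            if state >= f:                         # steps 1-2: a halting state -- succeed
                out, i = [], 0
                while tape.get(i, BLANK) != BLANK: # output = tape up to the first blank
                    out.append(tape[i]); i += 1
                return ''.join(out)
            key = (state, tape.get(head, BLANK))
            if key not in delta:                   # step 4: no applicable instruction
                return None                        # halt and fail
            state, symbol, d = delta[key]          # step 5: execute the instruction
            tape[head] = symbol
            head += d
            if head < 0:                           # ran off square 0: halt and fail
                return None
        raise RuntimeError('gave up after max_steps moves')

    # Example: a machine that changes a leading 'a' to 'b' and halts (f = 1).
    print(simulate(1, 1, [(0, 'a', 1, 'b', 0)], 'abc'))    # prints bbc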
Exercises 4.6
1. What does U do if S tries at some step to move its head left from square 0 of its
tape?
2. (Important) Why do we not hold the state of S in the state of U (cf. storing a
finite amount of information in the states, as in §2.4.1 and theorems 3.5 and 3.8)?
After all, the state set of S is finite!
3. By using theorem 3.8, and then theorem 4.7 below, we can replace U with an
equivalent standard TM. So we can assume that U is standard, so that code(U)
exists and is a word of C.
Let S be a standard TM, and let w ∈ C∗ . What is fU (code(U) ∗ code(S) ∗ w)?
Using an interpreter was a key step in our original paradox, and so we are now
well on the way to rediscovering it in the TM setting. In fact we will not use U to do
this, but will give a direct argument. Nonetheless, U is an ingenious and fascinating
construction — and historically it led to the modern programmable computer.
4.4 Coding
A Turing machine can have any finite alphabet, but the machine U built above can only
‘interpret’ standard Turing machines, with alphabet C. This is not a serious restriction.
Computers use only 0 and 1 internally, yet they can work with English text, Chinese,
graphics, sound, etc. They do this by coding.
Coding is not the secret art of spies — that is called cryptography. Coding means
turning information into a different (e.g., condensed) format, but in such a way that
nothing is lost, so that we can decode it to recover the original form. (Cryptography
seeks codings having decodings that are hard to find without knowing them.) Examples
of codings are ASCII, Braille, hashing and some compression techniques, Morse code,
etc. (think of some more). A computer stores graphics in coded (e.g., bit-mapped)
form.
Here, we will indicate briefly how to use coding to get round the restriction that U
can only ‘do’ standard machines.
PROOF. (sketch; cf. Rayward-Smith, Theorem 2.6) The idea is to get S to mimic M
by working with codes throughout. Choose an encoding function code : Σ → C∗ . The
input word w is a word of C. But as C is contained in Σ, w is also a word of Σ, so w
itself can be coded! S begins by encoding w itself, to obtain a (longer) word code(w)
of C. S can then simulate the action of M , working with codes of characters all along.
Whatever M does with the symbols of Σ, S does with their codes.
If the simulation halts, S can decode the information on the tape to obtain the
required output. The decoding only has to cope with codes of characters in C ∪ {∧},
as we are told that the output consists only of characters in C. Because S simulates
all operations of M , we have fS = fM , so S and M are equivalent. At no stage does S
need to use any other characters than ∧ or those in C. So S can be taken to be standard.
QED.
We can now use U to interpret any Turing machine M with input alphabet C and
such that fM : C∗ → C∗ . We first apply the theorem to obtain an equivalent standard
Turing machine S, and then pass code(S) to U .
is a possible input word for M ; we are not allowed to pass this to U , as it’s not a word
of U ’s input alphabet. But M is presumably executing some algorithm, so we’d like U
to have a crack at simulating M .
Well, coding can help here, too. Just as computers can do English (or Chinese)
word processing with their limited 0-1 alphabet, so we can design a new Turing ma-
chine M ∗ that parallels the action of M , but working with codes of the characters that
M actually uses. We’ll describe briefly how to do this; it’s like eliminating scratch
characters.
Assume M has full alphabet Σ. Σ could be very large, but it is finite (because the
definition of Turing machine only allows finite alphabets). Choose a coding function
code : Σ → C∗ . Where M is given input w ∈ C∗ , we’ll give code(w) to M ∗ . From then
on, M ∗ will work with the codes of characters from Σ just as in theorem 4.7 above. M
will halt and succeed on input w if and only if M ∗ halts and succeeds on input code(w).
The output of M ∗ in this case will be code( fM (w)), the code of M ’s output, and this
carries the same information as the actual output fM (w) of M .
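As a toy illustration of such a coding (the fixed 2-character codes and the choice of C below are assumptions of this sketch, nothing more), here is how encoding and decoding might look: each symbol of a large alphabet gets a distinct fixed-length word of C, so code(w) can be cut back into blocks and decoded with nothing lost.

    # Illustrative sketch: a fixed-width coding code : Σ → C*.
    import string

    C = string.ascii_letters + string.digits       # stand-in for the typewriter alphabet

    def make_code(sigma):
        # give each symbol of sigma a distinct 2-character word of C
        assert len(sigma) <= len(C) ** 2, 'alphabet too large for 2-character codes'
        table = {}
        for i, s in enumerate(sorted(sigma)):
            first, second = divmod(i, len(C))
            table[s] = C[first] + C[second]
        return table

    def encode(word, table):
        return ''.join(table[s] for s in word)

    def decode(coded, table):
        back = {v: k for k, v in table.items()}
        return ''.join(back[coded[i:i + 2]] for i in range(0, len(coded), 2))

    sigma = set('abc') | {'£', '∧', '☺'}           # a 'large' alphabet, scratch characters included
    t = make_code(sigma)
    w = 'ab☺c'
    print(encode(w, t))
    print(decode(encode(w, t), t) == w)            # True: we can always recover the original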
5. Unsolvable problems
5.1 Introduction
In this section, we will show that some problems, although not vague in any way,
are inherently unsolvable by a Turing machine. Church’s thesis then applies, and we
conclude that there is no algorithm to solve the problem.
Question Why can we not just use U of section 4 to do this, by getting U to simulate
S running on w, and seeing whether it halts or not?
• h(x) = 1 if x = code(S) ∗ w for some standard Turing machine S, and S halts and
succeeds on input w
• h(x) = 0 if x = code(S) ∗ w for some standard Turing machine S, and S does not
halt and succeed on input w
• h(x) is arbitrary (e.g., undefined) if x is not of the form code(S) ∗ w for any
standard Turing machine S and word w ∈ C∗ .
Big question: is this function h Turing-computable? Is there a Turing machine H such
that fH = h? Such an H would solve the halting problem.
Warning Our choice of values 1, 0 for h is not important. Any two different words
of C would do. What matters is that, on input code(S) ∗ w, H always halts & succeeds,
and we can tell from its output whether or not S would halt & succeed on input w.
The halting problem is not a toy problem. Such an H would be very useful. As we
now demonstrate, regrettably there is no such H . This fact has serious repercussions.
PROOF. Assume for contradiction that the partial function h (as above) is Turing
computable. Clearly, if h is computable it is trivial to compute the partial function
g : C∗ → C∗ given by:

    g(w) = 1,          if h(w ∗ w) = 0,
           undefined,  otherwise.
(Here, ‘w ∗ w’ is just w followed by a ‘*’, followed by w.) So let M be a Turing machine
with fM = g. By theorem 4.7 (scratch character elimination) we can assume that M is
standard, so it has a code, namely code(M).
There are two cases, according to whether g(code(M)) is defined or not.
Another way of seeing the proof: Suppose for contradiction that H is a Turing
machine that solves HP. We don’t know how H operates. We only know that fH = h.
Consider the simple modification M of H shown in figure 5.1. If the input to M is w,
then M adds a ∗ after w, then adds a copy of w after it, leaving ‘w ∗ w’ on the tape. It
then returns to square 0, calls H as a subroutine, and halts & succeeds/fails according
to the output of H , as in the figure. Note that these extra operations (copying, etc.,) are
easy to do with a TM. So if H exists, so does M .
Clearly M outputs only 0, if anything. So fM : C∗ → C∗ , and by theorem 4.7
(elimination of scratch characters) we can assume that M is standard. So M has a code,
viz. code(M).
Consider the run of M when its input is code(M). M will send input ‘code(M) ∗
code(M)’ to H . Now as we assumed H solves HP, the output of H on input code(M) ∗
code(M) says whether M halts and succeeds when given input code(M).
But we are now in the middle of this very run — of M on input code(M)! H is
saying whether the current run will halt & succeed or not! The run hasn’t finished yet,
but H is supposed to predict how it will end — in success or failure. This is clearly a
difficult task for H ! In fact, M is designed to find out what H predicts, and then do the
exact opposite! Let us continue and see what happens.
The input to H was code(M) ∗ code(M), and code(M) is the code for a standard
Turing machine (M itself). So H will definitely halt and succeed. Suppose H outputs
1 (saying that M halts and succeeds on input code(M)). M now moves to a state with
no applicable instruction (look at figure 5.1). M has now halted and failed on input
code(M), so H was wrong.
Alternatively, H decides that M halts and fails on input code(M). So H outputs 0.
In this case, M gleefully halts and succeeds: again, H was wrong.
But H was assumed to be correct for all inputs. This is a contradiction. So H does
not exist. QED.
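The same diagonal trick can be phrased in everyday programming terms. The sketch below is only an illustration: it assumes a hypothetical function halts(p, x) that solves the halting problem for Python program texts, and the function contrary plays the role of the machine M, doing the opposite of whatever halts predicts about contrary run on its own text.

    # Hypothetical halting tester: halts(p, x) is supposed to return True iff the
    # program text p, run on input x, eventually halts.  By theorem 5.1 (read
    # through Church's thesis), no such function can exist.
    def halts(program_text, input_text):
        raise NotImplementedError('cannot exist')

    def contrary(program_text):
        # the analogue of M: ask halts about this program run on its own text,
        # then do the exact opposite
        if halts(program_text, program_text):
            while True:      # predicted to halt -- so loop forever
                pass
        else:
            return           # predicted to loop -- so halt at once

    # Running contrary on its own source text reproduces the contradiction:
    # whatever halts answers, the answer is wrong.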
1. We proved that there’s no Turing machine that solves the halting problem.
2. So by Church’s thesis, there is no algorithm that solves the halting problem.
3. But our brains are algorithmic — just complicated computers running a complex
algorithm.
4. We can solve the halting problem, as we can tell whether a program will halt or
not. So there is an algorithm to solve the halting problem — us!
The function prime returns true if the argument is a prime number, and false other-
wise. The main program halts if some even number > 2 is not the sum of two primes.
Otherwise it runs forever. As far as I know, no-one knows whether it halts or not. See
Goldbach’s conjecture (§5.4). (And of course we could design a Turing machine doing
the same job, and no-one would know whether it halts or not.)
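For instance, the following Python program is of the kind being described (a sketch of it, at least — the original program is not reproduced here). It halts if and only if Goldbach's conjecture is false, and nobody knows which.

    # Halts on the first even number > 2 that is not a sum of two primes (if any).
    def prime(k):
        # True iff k is a prime number
        if k < 2:
            return False
        d = 2
        while d * d <= k:
            if k % d == 0:
                return False
            d += 1
        return True

    n = 4
    while any(prime(p) and prime(n - p) for p in range(2, n - 1)):
        n += 2    # n is a sum of two primes; try the next even number
    print(n)      # reached only if Goldbach's conjecture fails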
Exercises 5.2
1. Write a program that halts iff Fermat’s last theorem is false. (This theorem was
only proved in around 1995, after 300 years of effort. So telling if your program
halts can be quite hard!)
2. What happens if we rewire the Turing machine M of figure 5.1, swapping the
0 and 1, so that M halts and succeeds if H outputs 1, and halts and fails if H ’s
output is 0? What if we omit the duplicator that adds ‘∗w’ after w? [Try the
resulting machines on some sample inputs.]
3. Show that there is no Turing machine X such that for all standard Turing ma-
chines S and words w of C, fX (code(S) ∗ w) = 1 if S halts (successfully or not)
on input w, and 0 otherwise.
5. (similar to part of exam question, 1991) Let X be a Turing machine such that
fX (w) = w ∗ w for all w ∈ C∗ . Let Y be a hypothetical Turing machine such that
for every standard Turing machine S and word w of C,
    fY (code(S) ∗ w) = 1, if fS (w) = 0,
                       0, otherwise.
So Y tells us whether or not S outputs 0 on input w.
(a) How might we build a standard Turing machine M such that for all w ∈ C∗ ,
we have fM (w) = fY ( fX (w))?
(b) By evaluating fM (code(M)), or otherwise, deduce that Y does not exist.
7. A super Turing machine is like an ordinary TM except that I and Σ are allowed
to be infinite. Find a super TM that solves HP for ordinary TMs. [Hint: take
the alphabet to be C∗ .] Deduce that super TMs can ‘compute’ non-algorithmic
functions. What if instead we let Q be infinite?
5.3 Reduction
So the halting problem, HP, is not solvable by a Turing machine. There is no machine
H as above. (By Church’s thesis, HP has no algorithmic solution.) We can use this fact
to show that a range of other problems have no solution by Turing machines.
The method is to reduce HP to a special case of the new problem. The idea is very
simple. We just show that in order to solve HP (by a Turing machine), it is enough to
solve the new problem. We could say that the task of solving HP reduces to the task of
solving the new problem, or that HP ‘is’ a special case of the new problem. So if the
new problem had a TM solution, so would HP, contradicting theorem 5.1. This means
that the new problem doesn’t have a TM solution.
Warning A reduces to B = you can use B to solve A. Please get it the right way
round!
Warning It is vital to realise that we can reduce A to B whether or not A and B are
solvable. The point is that if we were given a solution to B, we could use it to solve
A, so that A is ‘no harder’ than B. (These free, magic solutions to problems like B are
called oracles.) Not all unsolvable problems reduce to each other.3 Some unsolvable
problems are more unsolvable than others! We’ll see more of this idea in Part III.
Suppose M is any Turing machine, and w a word of its input alphabet, I . We write
M[w] for the new Turing machine4 that does the following: on any input, it first writes w on the tape (overwriting whatever input was there), then returns to square 0, and then runs M.
It’s easy to see that there always is a TM doing this, whatever M and w are. Here’s an
example.
Example 5.4 Suppose TAIL is a Turing machine such that fTAIL (w) = tail(w) (for all
w ∈ C∗ ). So TAIL deletes the first character of w, and shifts the rest one square to the
left. See exercise 2.11(3) on page 33.
The machine TAIL[hello world] first writes ‘hello world’ on the tape, overwriting
whatever was there already. Then it returns to square 0 and calls TAIL as a subroutine.
This means that its output will be the rather coarse ‘ello world’ on any input. The input
is immediately overwritten, so it doesn’t matter what it is.
Figure 5.2 shows the Turing machine TAIL[hello] as a flowchart.
(Figure 5.2: flowchart of TAIL[hello]. States 0–4 write h, e, l, l, o, each moving right; state 5 writes ∧ and turns left; states 6–9 move left, rewriting o, l, l, e; state 10 writes h at square 0 without moving, and control then passes to TAIL.)
2. M halts and succeeds on input w, iff M[w] halts and succeeds on input v for any
or all words v of I .
Footnote 4: A less succinct notation for M[w] would be ‘w, then M’.
% In figure 5.2 we’d get (0, a, 1, h, 1), (1, a, 2, e, 1), (2, a, 3, l, 1), . . . , for all a in C ∪ {blank}.
s := 0        % s will be current state number (s = 0, 1, . . . , 2 · length(w)).
repeat with q = 1 to length(w)
    for each a ∈ C ∪ {blank}, add an instruction 5-tuple ‘(s, a, s + 1, ⟨qth char of w⟩, 1)’
    s := s + 1
end repeat
for each a ∈ C ∪ {blank}, add an instruction 5-tuple ‘(s, a, s + 1, blank, −1)’
s := s + 1
% S[w] returns to square 0 and hands over to S.
% Add instructions for this, in the way we said.
repeat with q = length(w) down to 2
    for each a ∈ C ∪ {blank}, add an instruction 5-tuple ‘(s, a, s + 1, ⟨qth char of w⟩, −1)’
    s := s + 1
end repeat
for each a ∈ C ∪ {blank}, add an instruction 5-tuple ‘(s, a, s + 1, ⟨1st char of w⟩, 0)’
halt & succeed
Of course, this does not show all the details of how to transform the input word
code(S) ∗ w into the output word code(S[w]).
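Here, as an illustration only (it is not the notes' EDIT machine, and it ignores the issue of delimiters inside the code word), is a Python sketch of the transformation at the level of instruction 5-tuples: it generates the ‘write w and return to square 0’ prefix and renumbers S's states to follow on after it.

    # Illustrative sketch: the 5-tuples of S[w], given the 5-tuples of S, its first
    # halting state f, the word w, and the alphabet ('^' stands for blank).
    def tuples_for_S_w(s_tuples, f, w, alphabet):
        offset = 2 * len(w) + 1                   # S's states are shifted by this much
        new, s = [], 0
        for ch in w:                              # write w, moving right
            for a in alphabet:
                new.append((s, a, s + 1, ch, 1))
            s += 1
        for a in alphabet:                        # write a blank after w and turn round
            new.append((s, a, s + 1, '^', -1))
        s += 1
        for ch in reversed(w[1:]):                # move back, rewriting w's characters
            for a in alphabet:
                new.append((s, a, s + 1, ch, -1))
            s += 1
        for a in alphabet:                        # rewrite w's first character, stay put,
            new.append((s, a, offset, w[0], 0))   # and hand over to S's (shifted) start state
        for (q, a, q2, b, d) in s_tuples:         # S's instructions, states shifted
            new.append((q + offset, a, q2 + offset, b, d))
        return new, f + offset                    # new first halting state

    tuples, f2 = tuples_for_S_w([(0, 'a', 1, 'b', 0)], 1, 'ab', {'a', 'b', '^'})
    print(f2, len(tuples))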
Exercises 5.5
1. Write out the code of the head-calculating machine M shown in figure 2.5
(p. 28), assuming to keep it short that the alphabet is only {a, b, ∧}. Then write
out the code of M[ab]. Do a flowchart for it. Does it have the number of states
that I claimed above? What is its output on inputs (i) bab, (ii) ∧? (Don’t just
guess; run your machine and see what it outputs!)
2. Would you implement the variable q in the pseudocode for EDIT using a param-
eter in one of EDIT ’s states, or by an extra tape? Why?
3. Write proper pseudocode (or even a flowchart!) for EDIT .
PROOF. We will prove this by showing that HP reduces to EIHP. So assume we’re
given a Turing machine EI that solves EIHP. We convert it into a Turing machine H ,
as shown in figure 5.3. H first runs EDIT (see §5.3.3), then returns to square 0, then
(Figure 5.3: EIHP solution gives HP solution. H = EDIT, then a loop ‘(x, x, −1) if not sq. 0’ that returns the head to square 0, then ‘(x, x, 0) if sq. 0’, then EI.)
calls EI as a subroutine. We showed how to make EDIT , we’re given EI , and the rest
is easy. So we can really make H as above.
We claim that H solves HP. Let’s feed into H an input code(S) ∗ w of HP. H runs
EDIT , which converts code(S) ∗ w into code(S[w]). This word code(S[w]) is then fed
into EI , which (we are told) outputs 1 if S[w] halts and succeeds on input ε, and 0
otherwise.
But (cf. §5.3.2.1, with v = ε) S[w] halts and succeeds on input ε iff S halts and
succeeds on w.
So the output of H is 1 if S halts and succeeds on w, and 0 otherwise. Thus, H
solves HP, as claimed, and we have reduced HP to EIHP.
So by the argument at the beginning of §5.3, EIHP has no Turing machine solution.
EI does not exist. QED.
Exercises 5.7
4. [Challenge!] Show that the problem of deciding whether or not two arbitrary
standard Turing machines S1 , S2 are equivalent (definition 3.1) is not solvable
by a Turing machine.
    fEO (code(M) ∗ w) = 1, if fM (w) = ε,
                        0, otherwise,
for any standard Turing machine M and word w of the typewriter alphabet
C.
(Figure: the machine X — DUPLICATE, then EO; one exit is taken if EO outputs 1, the other if EO outputs 0.)
The three parts carry, respectively, 20%, 40% and 40% of the marks. [1994]
(Figure: an example configuration — squares 0–6 hold A, B, A, ∧, 3, ∧, ∧, and the machine is in state 6.)
We can use such a formula SEQ to code the sequence (a0 , a1 , . . . , an ) by the single number c.
We can recover (a0 , a1 , . . . , an ) from c using SEQ.
Finding such a formula SEQ is not easy, but it can be done. For example, we might try
to code (a0 , a1 , . . . , an ) by the number c = 2^(a0+1) · 3^(a1+1) · . . . · pn^(an+1), where the first n + 1 primes
are p0 = 2, p1 = 3, . . . , pn .8 E.g., the sequence (2,0,3) would be coded by 2^(2+1) · 3^(0+1) · 5^(3+1) =
15,000. Because any whole number factors uniquely into primes, we can recover a0 + 1, . . . , an +
1, and hence (a0 , a1 , . . . , an ) itself, from the number c. So it is enough if we can write a formula
SEQ(x, y, z) saying ‘the highest power of the yth prime that divides x is z + 1’. In fact we can
write such a formula, but there are simpler ones available (and usually the simple versions are
used in proving that there is a formula SEQ like this!)
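Just to see this coding in action, here is a small Python sketch of it (an illustration only).

    # Code a sequence (a0, ..., an) as c = 2^(a0+1) * 3^(a1+1) * ... , and decode again.
    def primes():
        # yield 2, 3, 5, 7, ... by trial division (fine for small examples)
        k = 2
        while True:
            if all(k % d for d in range(2, int(k ** 0.5) + 1)):
                yield k
            k += 1

    def encode(seq):
        c, gen = 1, primes()
        for a in seq:
            c *= next(gen) ** (a + 1)
        return c

    def decode(c):
        seq = []
        for p in primes():
            if c % p:              # this prime doesn't divide c: the sequence has ended
                break
            e = 0
            while c % p == 0:
                c //= p
                e += 1
            seq.append(e - 1)      # the exponent was a_i + 1
        return seq

    print(encode([2, 0, 3]))       # 15000, as in the text
    print(decode(15000))           # [2, 0, 3]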
4. In fact we want to code a configuration of M as a single number. Using SEQ, we can
code the configuration (k, r, ASCII(a0 ), ASCII(a1 ), . . . , ASCII(am )) as a single number, c. If we
do this, then SEQ(c, 0, k), SEQ(c, 1, r), SEQ(c, 2, ASCII(a0 )), . . . , SEQ(c, m + 2, ASCII(am )) are
true, and in each case the number in the third slot is the only one that makes the formula true.
5. Relationship between successive configurations. Suppose that M is in a configuration
coded (as in (4)) by the number c. If M executes an instruction successfully, without halting &
failing, it will move to a new configuration, coded by c′ , say. What is the arithmetical relationship
between c and c′ ?
Let c code (k, r, ASCII(a0 ), ASCII(a1 ), . . . , ASCII(am )). Assume that r ≤ m.9 So M is in
state k, and its head is reading the symbol ar = a, say. But we know the instruction table δ
of M. Assume that δ(k, a) = (k′ , b, d), where k′ is the new state, b is the symbol written, and
d the move. So if c′ codes the next configuration (k′ , r′ , ASCII(a′0 ), ASCII(a′1 ), . . . , ASCII(a′m ))
of M, we know that (i) r′ = r + d, and (ii) a′i = ai unless i = r, in which case a′r = b. So in
this case, the arithmetical relationship between c and c′ is expressible by the following formula
Footnote 8: We use a1 + 1, etc., because a1 may be 0; if we just used a1 then 0 wouldn’t show up as a power of 2 dividing c. So we’d have code(2, 0, 3) = code(2, 0, 3, 0, 0, 0) — not a good idea.
Footnote 9: If r > m, M is reading ∧. This is a special case which we omit for simplicity. Think about what we need to do to allow for it.
F(c, c′ , k, a, k′ , b, d):10

    ∀r ( [SEQ(c, 0, k) ∧ SEQ(c, 1, r) ∧ SEQ(c, r + 2, ASCII(a))]              % state k, head in sq. r reads a
         → [SEQ(c′ , 0, k′ ) ∧ SEQ(c′ , 1, r + d) ∧ SEQ(c′ , r + 2, ASCII(b))  % new state, head pos & char
            ∧ ∀i(i ≥ 2 ∧ i ≠ r + 2 → ∀x(SEQ(c, i, x) ↔ SEQ(c′ , i, x)))] )     % rest of tape is unchanged
7. Coding a successful run as a single number. If we now use the formula SEQ to code the
entire (n + 2)-sequence (n, c0 , . . . , cn ) as a single number, g, say, we can express the constraints
in (6) as properties of g:
• ∀x(SEQ(g, 1, x) → I(x))
• ∀n∀i∀x∀y(SEQ(g, 0, n) ∧ 1 ≤ i < n + 1 ∧ SEQ(g, i, x) ∧ SEQ(g, i + 1, y) → N(x, y))
• ∀n∀x(SEQ(g, 0, n) ∧ SEQ(g, n + 1, x) → H(x))
Footnote 10: We’ve cheated and used 1,2,3 in this formula, rather than S0, SS0, and SSS0. We will continue to cheat like this, to simplify things. Also, the formula only works if d ≥ 0, as there’s no ‘−’ in the language of arithmetic so if d = −1 we can’t write r + d. If d = −1, we replace SEQ(c, 2, r) in line 1 by SEQ(c, 2, r + 1) and SEQ(c′ , 2, r + d) in line 2 by SEQ(c′ , 2, r).
So these algorithmic steps would solve the halting problem by a Turing machine.
11. Conclusion. But we know the halting problem can’t be solved by a Turing machine.
This is a contradiction. So G does not exist (because this is the only assumption we made).
3. Complete the missing details in the proof of Gödel’s theorem. Find out (from
libraries) how to write the formula SEQ.
Part II
Algorithms
6. Use of algorithms
We now take a rest from Turing machines, and examine the use of algorithms in general
practice. There are many: quicksort, mergesort, treesort, selection and insertion sort,
etc., are just some of the sorting algorithms in common use, and for most problems
there is more than one algorithm. How to choose the right algorithm? There is no
‘best’ search algorithm (say): it depends on the application, the implementation, the
environment, the frequency of use, etc, etc. Comparing algorithms is a subtle issue
with many sides.
place to begin.) Other algorithms are still mysterious. Maybe you designed your own
algorithm or improved someone else’s, or you have a new implementation on a new
system. Then you have to analyse it yourself.
Often it’s not worth the effort to do a detailed analysis: rough rules of thumb are
good enough. Suggestions:
• Identify the abstract operations used by the algorithm (read, if-then, +, etc). To
maximise machine-independence, base the analysis on the abstract operations,
rather than on individual Java instructions.
• Most of the (abstract) instructions in the algorithm will be unimportant resource-
wise. Some may only be used once, in initialisation. Generally the algorithm
will have an ‘inner loop’. Instructions in the inner loop will be executed by far
the most often, and so will be the most important. Where is the inner loop?
A ‘profiling’ compilation option to count the different instructions that get exe-
cuted can help to find it.
• By counting instructions, find a good upper bound for the worst case run time,
and if possible an average case figure. It will usually not be worth finding an
exact value here.
• Repeatedly refine the analysis until you are happy. Improvements to the code
may be suggested by the analysis: see later.
4. Maybe the algorithm throws away half the input in each step, as in binary search,
or heap accessing (see §7.3.4). So
f (n) = 1 + f (n/2),
roughly. Letting n = 2^x, we get f(2^x) = 1 + f(2^(x−1)) = 2 + f(2^(x−2)) = · · · = x + c for some constant c, so f(n) is roughly log2 n: a log time algorithm.
5. The algorithm might recursively divide the input into two halves, but make a
pass through the entire input before, during or after splitting it. This is a very
common ‘divide and conquer’ scheme, used in mergesort, quicksort (but see
below), etc. We have
f (n) = n + 2 f (n/2)
roughly — n to pass through the whole input, plus f (n/2) to process each half.
So, using a trick of dividing by the argument, and letting n = 2^x again, we get f(2^x)/2^x = 1 + f(2^(x−1))/2^(x−1) = 2 + f(2^(x−2))/2^(x−2) = · · · = x + c,
for some constant c. So f(2^x) = 2^x(x + c) = 2^x · x + smaller terms. Thus, roughly,
f (n) = n log2 n + smaller terms. We have a log linear algorithm.
Log linear algorithms are usually good in practice.
6. Maybe the input is a set of n positive and negative whole numbers, and the algo-
rithm must find a subset of the numbers that add up to 0. If it does an exhaustive
search, in the worst case it has to check all possible subsets — 2^n of them. This
takes exponential time. Algorithms with this run time are probably going to
be appalling, unless their average-case performance is better (the simplex algo-
rithm from optimisation is an example).
7. If the problem is to find all anagrams of the n-letter input word, it might try all
n! possible orderings of the n letters. The factorial function n! = 1 × 2 × 3 × · · · × n
grows at about the rate of n^n, even faster than 2^n.
Quicksort This is a borderline case. In the average case, the hope is that in each re-
cursive call quicksort divides its input into two roughly equal parts. By case 5 above,
this means log linear time average-case performance. In the worst case, in each recur-
sive call the division gives a big and a small part. Then case 3 is more appropriate, and
it shows that quicksort runs in quadratic time in the worst case.
In practice, quicksort performs very well — and it can sort ‘in place’ without using
much extra space, which is good for large sorting jobs.
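To tie case 5 to a concrete algorithm: the sketch below (an illustration, nothing more) is a plain mergesort, whose run time obeys f(n) ≈ n + 2 f(n/2) — one pass to merge, plus two recursive calls on halves — and so is log linear.

    # Illustrative sketch: mergesort, a divide-and-conquer algorithm of case 5.
    def mergesort(xs):
        if len(xs) <= 1:
            return xs
        mid = len(xs) // 2
        left, right = mergesort(xs[:mid]), mergesort(xs[mid:])   # two halves: 2 f(n/2)
        merged, i, j = [], 0, 0
        while i < len(left) and j < len(right):                  # one pass: about n steps
            if left[i] <= right[j]:
                merged.append(left[i]); i += 1
            else:
                merged.append(right[j]); j += 1
        return merged + left[i:] + right[j:]

    print(mergesort([5, 2, 8, 1, 9, 3]))    # [1, 2, 3, 5, 8, 9]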
Exercises 6.4
1. Show that f is θ(g) iff there are m, c, d (with d > 0, possibly a fraction) such
that d · g(n) ≤ f (n) ≤ c · g(n) for all n ≥ m.
2. Show that if f is O (g) then there are c, d such that for all n, f (n) ≤ max(c, d ·
g(n)). Is the converse true?
3. [Quite long.] Check that the functions in §6.1.1 are listed in increasing order of
growth: if f is before g in the list then f is O (g) but not θ(g). [Calculus, taking
logs, etc., may help.]
4. Let F be the set of all real-valued functions on whole numbers. Define a binary
relation E on F by: E( f , g) holds iff f is θ(g). Show that E is an equivalence
relation on F (see §7.1 if you’ve forgotten what an equivalence relation is).
(Some people define θ( f ) to be the E -class of f .)
Show also that ‘ f is O (g)’ is a pre-order (reflexive and transitive) on F .
5. Show that for any a, b, x > 0, loga (x) = logb (x) · loga (b). Deduce that loga (n) is
θ(logb (n)). Conclude that when we say an algorithm runs in log time, we don’t
need to say what base the log is taken to.
• A rough analysis is quicker to do, and often surprisingly accurate. E.g., most
time will be spent in the inner loop (‘90% of the time is spent in 10% of the
code’), so you might even ignore the rest of your program.
• You may not know whether the algorithm will be run on a Cray or a PC (or
both). Each instruction runs faster by roughly a constant factor on a Cray than
on a PC, so we might prefer to keep the constant factor c above.
• You may not know how good the implementation of the algorithm is. If it uses
more instructions than needed in the inner loop, this will increase its running
time by roughly a constant factor.
• The run time of an algorithm will depend on the format of the data provided to
it. E.g., hexadecimal addition may run faster than decimal addition. So if you
don’t know the data format, or it may change, an uncertainty of a constant factor
is again introduced.
Your choice of algorithm may also be influenced by other factors, such as the
prevalent data structures (linked list, tree, etc.,) in your programming environment, as
some algorithms take advantage of certain structures. The algorithm’s space (memory)
usage may also be important.
6.3 Implementation
We’ve seen some of the factors involved in choosing an algorithm. But the same algo-
rithm can be implemented in many different ways, even in the same language. What
advice is there here?
6.3.2 Optimisation
Only do this if it’s worth it: if the implementation will be used a lot, or if you know it
can be improved. If it is, improve it incrementally:
• Get a simple version going first. It may do as it is!
• Check any known maths for the algorithm against your simple implementation.
E.g., if a supposedly linear time algorithm takes ages to run, something’s wrong.
• Find the inner loop and shorten it. Use a profiling option to find the heavily-
used instructions. Are they in what you think is the inner loop? Look at every
instruction in the inner loop. Is it necessary, or inefficient? Remove procedure
calls from the loop, or even (last resort!) implement it in assembler. But try
to preserve robustness and machine-independence, or more programmers’ time
may be needed later.
• Check improvements by empirical testing at each stage — this helps to elimi-
nate the bad improvements. Watch out for diminishing returns: your time may
nowadays be more expensive than the computer’s.
An improvement by a factor of even 4 or 5 might be obtained in the end. You may
even end up improving the algorithm itself.
If you’re building a large system, try to keep it amenable to improvements in the
algorithms it uses, as they can be crucial in performance.
The last three are listed in Sedgewick. Maybe that’s why all four are from the same
publisher.
7. Graph algorithms
Definition 7.1 A graph is a pair (V, E), where V is a non-empty set of vertices or
nodes, and E is a symmetric, irreflexive binary relation on V .
We can represent a graph by drawing the nodes as little circles, and putting a line
(‘edge’) between nodes x, y iff E(x, y) holds. In figure 7.1, the graph is ({1,2,3,4,5,6},
{(1,3),(3,1), (2,3),(3,2), (4,5),(5,4), (5,6),(6,5), (4,6),(6,4)}).
(Figure 7.1: a drawing of this graph.)
Exercise 7.2 Show that any graph with n nodes has at most n(n − 1)/2 edges.
Examples There are many examples of graphs, and many problems can be repre-
sented in terms of graphs. The London tube stations form the nodes of a graph whose
edges are the pairs of stations (s, s′ ) that are one stop apart. The world’s airports form the nodes
of a graph whose edge pairs consist of airports that one can fly between by Aer Lingus
without changing planes. The problem of pairing university places to students can be
considered as a graph: we have a node for each place and each student, and we ask
for every student to be connected to a unique place by an edge. Electrical circuits: the
wires form the edges between the nodes (transistors etc).
Other graphs There are more ‘advanced’ graphs. Directed graphs do not require
that E is symmetric. In effect, we allow arrows on edges, giving each edge a direction.
There may now be an edge from x to y, but no edge from y to x. The nodes could
represent tasks, and an arrow from a to b could say that we should do a before b. They
might be players in a tournament: an arrow from Merlin to Gandalf means Merlin won.
Weighted graphs (see section 8) have edges labelled with numbers, called weights
or lengths. If the nodes represent cities, the weights might represent distances between
them, or costs or times of travel.
Graph algorithms Many useful algorithms for dealing with graphs are known, but
as we will see, some are not easy. For example, no fast way to tell whether a graph
can be drawn on paper without edges crossing one another was known until 1974,
when R.E. Tarjan developed an ingenious linear time algorithm. We’ll soon see graph
problems with no known efficient algorithmic solution.
A linked list can represent directed graphs: the entries in a line headed by x are
those nodes y such that there’s an arrow from x to y. The weights in a weighted graph
can be held in an integer field attached to each non-header entry in the list.
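In a language with dictionaries the same representation is a one-liner; the sketch below (an illustration — the notes describe it with linked lists) holds the adjacency lists of the graph in figure 7.3, and shows how weights could be attached.

    # Illustrative sketch: adjacency lists as a dictionary (the lists of figure 7.3).
    graph = {
        1: [2, 5, 6, 7],
        2: [1, 6],
        3: [4, 5, 6],
        4: [3, 5],
        5: [1, 3, 4],
        6: [1, 2, 3],
        7: [1],
    }
    print(graph[1])          # the neighbours of node 1

    # For a weighted graph, store (neighbour, weight) pairs instead (made-up weights):
    weighted = {'A': [('B', 1), ('D', 5)], 'B': [('A', 1)], 'D': [('A', 5)]}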
Unlike when searching a tree (e.g., in implementing Prolog), in a graph the difference between
breadth-first and depth-first search is not just the order in which vertices are
visited. The path actually taken also differs.
It is x that’s in the queue (with label y and priority p). So at most one x-entry is allowed
in the queue at any time.
1. We can ‘push’ onto the priority queue any entry x, with any label y, and any
priority p.
2. The push has no effect if x is already an entry in the queue with higher or equal
priority than p. I.e., if the queue contains a triple (x, z, q), where z is any label
and q is a priority higher than or equal to p, the push doesn’t do anything.
3. Any x-entry already in the queue but with lower priority than p is removed. I.e.,
if the queue contains a triple (x, z, q), with any label z, and q a lower priority
than p, then the push replaces it with (x, y, p).
4. A ‘pop’ operation always removes from the queue (and returns) an entry (x, y, p)
with highest possible priority.
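Here is a small Python sketch of a fringe with exactly these rules (an illustration only — it uses a dictionary rather than a structure with logarithmic-time operations): each node has at most one entry, a push only ever improves a node's priority, and pop returns an entry of highest priority.

    # Illustrative sketch: the 'fringe' priority queue of triples (node, label, priority).
    class Fringe:
        def __init__(self):
            self.entries = {}                       # node -> (label, priority)

        def push(self, node, label, priority):
            if node in self.entries and self.entries[node][1] >= priority:
                return                              # existing entry is at least as good
            self.entries[node] = (label, priority)  # new entry, or replace a worse one

        def pop(self):
            node = max(self.entries, key=lambda x: self.entries[x][1])   # highest priority
            label, priority = self.entries.pop(node)
            return node, label, priority

        def __bool__(self):
            return bool(self.entries)

    f = Fringe()
    f.push(2, 1, 1); f.push(5, 1, 2); f.push(2, 6, 5)   # the second push of node 2 wins
    print(f.pop())                                      # (2, 6, 5)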
We usually want to know the route we took when searching the graph, as well as which
nodes we visited and in what order. Knowing the order that we visited the nodes in is
not enough to determine the route: see the example below.
(Figure 7.3: a graph on nodes 1–7, and its linked-list representation:
1 → 2 → 5 → 6 → 7
2 → 1 → 6
3 → 4 → 5 → 6
4 → 3 → 5
5 → 1 → 3 → 4
6 → 1 → 2 → 3
7 → 1 )
(Figure 7.4: the same graph, with the tree produced by depth-first search from node 1 shown as heavy arrows.)
The ‘tree’ produced (heavy arrows in figure 7.4) tends to have only a few branches,
which tend to be long. Cf. figure 7.2 (left).
7.3.6.1 Execution
We show the execution as a table. Initially, the fringe consists of (1, ∗, 0), where the la-
bel ∗ indicates we’ve just started, 1 is the starting node, and 0 is the (arbitrary) priority.
We pop it from the fringe. The immediate neighbours of 1 are numbered 2, 5, 6 and
7. Assume we push them in this order. The fringe becomes (2, 1, 1), (5, 1, 2), (6, 1, 3),
(7, 1, 4), in the format of §7.3.4.1; the third figure is the (increasing) priority.
fringe | pop | visited | print | push | comments
(1, ∗, 0) | (1, ∗, 0) | 1 | – | (2, 1, 1), (5, 1, 2), (6, 1, 3), (7, 1, 4) |
(2, 1, 1), (5, 1, 2), (6, 1, 3), (7, 1, 4) | (7, 1, 4) | 7 | edge ‘1, 7’ | – | No unvisited neighbours of 7, so no push.
(2, 1, 1), (5, 1, 2), (6, 1, 3) | (6, 1, 3) | 6 | edge ‘1, 6’ | (2, 6, 5), (3, 6, 6) | ‘Backtrack’ to visit 6 from 1. Push of 2 has better priority than the current fringe entry (2, 1, 1), which is replaced.
(5, 1, 2), (2, 6, 5), (3, 6, 6) | (3, 6, 6) | 3 | edge ‘6, 3’ | (4, 3, 7), (5, 3, 8) | The view of 5 from 3 replaces the older view from 1.
(2, 6, 5), (4, 3, 7), (5, 3, 8) | (5, 3, 8) | 5 | edge ‘3, 5’ | (4, 5, 9) | Again, this push involves updating the priority of node 4.
(2, 6, 5), (4, 5, 9) | (4, 5, 9) | 4 | edge ‘5, 4’ | – | No unvisited neighbours of 4, so no push.
(2, 6, 5) | (2, 6, 5) | 2 | edge ‘6, 2’ | – | Another backtrack! No unvisited neighbours of 2, so no pushes.
empty | | | | | Terminate call of visit(1). Return.
Exercises 7.3
1. What route do we get if we start at 2, or 4, instead of 1?
2. What alterations to the code in §7.3.5 would be needed to implement the fringe
with an ordinary stack?
3. Work out how to deduce the path taken in depth-first search, if you know only (a)
the graph, and (b) the order in which its nodes were visited. (The main problem,
of course, is to handle backtracking.) Can you do the same for breadth-first
search (see below)?
(Figure: a graph on nodes 1–7, with the tree produced by breadth-first search shown as heavy arrows.)
Exercise 7.4 Work out the execution sequence and try it from different starting nodes.
• Initialisation of the n elements of the visited array to false (line 1), the n reads
from it in line 3, and initialisation of the fringe (lines 7–8): total = 2n + 2.
• Every node is removed from the fringe exactly once (line 10). This involves n
accesses, each taking time ≤ log n. Total: n log n.
• For each node x visited, every neighbour z is obtained from the linked list (only
count 1 for each access to this, since we just follow the links) and checked to
see if it’s been visited (lines 12–16; count 1). As each graph edge (x, z) gets
checked twice in this way, once from x and once from z, the total time cost here
is 2 × 2e = 4e.
• Not all z connected to x get written to the fringe, because of the test in line 13.
If z is put on the fringe when at x, then x will not be put on the fringe later, when
at z, as x will by then have been visited. So each edge results in at most one
fringe-write. Hence the fringe is written to at most e times. Each write takes
log n. Total: e log n.
Grand total: 2n + 2 + n log n + 4e + e log n. This is satisfactorily low, and the algorithm
is useful in practice. Neglecting smaller terms, we conclude:
The algorithm takes time O ((n + e) log n) (i.e., log linear time) in the
worst case.
The performance of graph algorithms is often stated as f (n, e), not just f (n).
(Figure 7.6: three copies of a graph on nodes A–H; heavy lines show, from left to right, the paths discussed below.)
In figure 7.6 (left), the heavy lines show the path ACHFDE from A to E. (They
also represent the path EDFHCA from E to A — we can’t tell the direction from the
figure.) This path is non-backtracking. In the centre, BHFEHC (or BHEFHC?) is a
path from B to C, but it’s a backtracking path because H comes up twice. On the right,
the heavy line is an attempt to represent the path HFH, which again is backtracking.
(Figure 7.7: a disconnected graph on nodes 1–7, drawn twice; on the right, heavy lines show what depth-first search from node 1 traces out.)
7.4.1 Connectedness
The graph of figure 7.3 is connected: there’s a path along edges between any two
distinct (= different) vertices. In contrast, the graph on the left of figure 7.7 is discon-
nected. There’s no path from 1 to 3.
What if we run the algorithm on a disconnected graph like this? In depth-first
mode, it traces out the heavy lines on the right of figure 7.7. Visit(1) starts at 1 and
worms its way round to 2,6,5 and 7. But then it terminates, and visit(3) is called (line
3 of the code in §7.3.5). Whatever priority scheme we adopt, visit(1) will only visit all
nodes reachable from 1.
7.5.1 Trees
A tree is a special kind of graph:
Definition 7.7 (very important!) A tree is a connected graph with no cycles.
But what’s a cycle?
Definition 7.8 A cycle in a graph is a path from a node back to itself without using a
node or edge twice.
So paths of the form ABA, and figures-of-eight such as ABCAEDA (see figure 7.8),
are not cycles.
(Figure 7.8: the figure-of-eight path ABCAEDA on nodes A–E.)
(Figure 7.9: three graphs A, B and C; C splits into three connected components, ringed 1, 2 and 3.)
In figure 7.9, A is a tree. B has a cycle (several in fact), so isn’t a tree. C has no
cycles but isn’t connected, so isn’t a tree. It splits into three connected components,
the ringed 1,2 and 3, which are trees. Such a ‘disconnected tree’ is called a forest.
Only a connected graph can have a spanning tree, but it can have more than one
spanning tree. The breadth-first and depth-first searches above gave different spanning
trees (figures 7.4 and 7.5).
A spanning tree is the quickest way to visit all the nodes of a (connected) graph
from a starting node, as no edge is wasted. The algorithm starts with the initial vertex
and no edges. Every step adds a single node and edge, so the number of nodes visited
is always one ahead of the number of edges. Because of this, the number of edges in
the final spanning tree is one less than the number of nodes.
If we run the algorithm on a tree, it will trace out the entire tree, using all the edges
in the tree. (A tree T is connected, so the algorithm generates a spanning tree T ′ of T .
Every edge of T is in T ′ . For if e = (x, y) were not an edge of T ′ , then as x and y are
in T ′ , there’s a (non-backtracking) path from x to y in T ′ ; and this path, plus e, gives a
cycle in the original tree T — impossible. So T ′ = T .) Thus we see:
Proposition 7.10 Any tree with n vertices has n − 1 edges.
Definition 7.12 A graph (V, E) is said to be complete if (x, y) ∈ E for all x, y ∈ V with
x ≠ y.
Exercises 7.13
1. What spanning trees are obtained by depth first and breadth-first search in the
complete graph of figure 7.12? How many writes to the fringe are there in each
case?
(Figure 7.12: the complete graph on six nodes, numbered 1–6.)
Definition 7.15 The Hamiltonian circuit problem (HCP) asks: does a given graph
have a Hamiltonian circuit?
Warning HCP is, it seems, much harder than the previous problems. Our search
algorithm is no use here: we want a ‘spanning cycle’, not a spanning tree. An algorithm
could check every possible ordered list of the nodes in turn, stopping if one of them is
a Hamiltonian circuit. If the graph has n nodes, there are essentially at most (n − 1)!/2
such lists: n! ways of ordering the nodes, but we don’t care which is the start node (so
divide by n), or which way we go round (so divide by 2). Whether a given combination
is a Hamiltonian circuit can be checked quickly, so the (n − 1)! part dominates the time
function of this algorithm. But (n − 1)! is not O (n^k) for any number k. It is not even
O (2^n) (exponential time).
There is no known polynomial time solution to this problem: one with time func-
tion O (n^k) for some k. We will look at it again later, as it is one of the important class
of NP-complete problems.
Exercise 7.16 (Puzzle) Consider the squares on a chess-board as the nodes of a graph,
and let two squares (nodes) be connected by an edge iff a knight can move from one
square to the other in one move. Find a Hamiltonian circuit for this graph.
8. Weighted graphs
Now we’ll consider the more exotic but still useful weighted graph. We’ll examine
some weighted graph problems and algorithms to solve them. Sedgewick’s book has
more details.
edges represent the cost of building an oil pipeline from one town to another: e.g.,
from A to D it’s £5 million. The problem is to find the cheapest network.
(Figure 8.1: a complete weighted graph on the towns A, B, C, D, E, with edge weights between 1 and 9; e.g., the edge from A to D has weight 5.)
(Figure 8.2: two possible pipe networks on A–E. The left one contains a cycle — there are two routes from A to B — and the right one does not connect all the towns.)
a pipe is to be built directly between x and y. Two possible pipe networks are given in
figure 8.2.
Clearly, the cheapest network will have only one pipeline route from any town to
any other. For if there were two different ways of getting oil from A to B (e.g., via C or
via E and D, as on the left of figure 8.2), it would be cheaper to cut out one of the pipes
(say the expensive one from A to E). The remaining network would still link all the
towns, but would be cheaper. In general, if there is a cycle in the proposed network, we
can remove an edge from it and get a cheaper network that still links up all the towns.
So:
• the pipes the company builds should form a tree.
The right hand pipeline network in figure 8.2 does not connect all the towns. As every
town should lie on the network,
• the tree should be a spanning tree (of the complete graph with vertices A–E).
• And its total cost should be least possible.
Definition 8.2 A minimal spanning tree (MST) of a (connected) weighted graph
(V, E, w) is a graph (V, P) such that:
1. (V, P) is connected
2. P ⊆ E
3. the sum of the weights of the edges in P is as small as possible, subject to the
two previous constraints.
A minimal spanning tree (V, P) will be a tree, for (as above) we could delete an edge
from any cycle, leaving edges still connecting all the nodes but with smaller total
weight. Because (V, P) must be connected, it must be a spanning tree of (V, E).
A MST will give the oil company a cheapest network. There may be more than
one such tree: e.g., if all weights in (V, E, w) are equal, any spanning tree of (V, E) will
do. Though one might find a MST in the graph of figure 8.1 by inspection, this will be
harder if there are 100 towns, say. We need an algorithm to find a MST.
Example 8.3 Figure 8.3 shows a weighted graph, the weight w(x, y) being the distance
between the nodes x and y as measured in the diagram. The bold lines form a spanning
tree; the light lines are the graph edges not in the tree.
(Figure 8.3: does this tree have the separation property? (weight ≈ distance) The nodes are divided into two sets X and Y.)
We’ve chosen an arbitrary
division of the nodes into sets X,Y . If the spanning tree has the separation property, no
graph edge from X to Y should be shorter than the three heavy tree edges crossing the
X−Y division.
Warning — what the separation property is not. There might be more than one
shortest edge from X to Y . (They’ll all be of equal length, of course. For example, this
happens if all graph edges have the same length!) The separation property says that at
least one of them is in the spanning tree.
The separation property is talking about shortest X−Y edges, not shortest paths. It
is false in general that the shortest path between any node of X and any node of Y is
the path through the tree. Look for yourself. The top two nodes in figure 8.4 below are
connected by an edge of length 12. But the path between them in the tree shown has
length 4 + 7 + 6 + 5 = 22 — and the tree does in fact have the separation property.
(The figure shows the same weighted graph twice, with edge lengths between 3 and 12; on the left the nodes are divided into sets X and Y, on the right into sets Z and T.)
Figure 8.4: the MST has one of the least weight X–Y (and Z–T) edges
On the left of figure 8.4, more than one X−Y edge has the least weight (5), and one of them is indeed in the MST shown, as the separation
property says. On the right, I used a different division, Z−T , of the same weighted
graph. The shortest Z−T edge is of length 3 — and again, it’s in the MST.
So the separation property might just hold for this MST, if we checked all sets X,Y .
But in general? In fact, any MST has the separation property. But we can’t establish
this by checking all possible MSTs of all weighted graphs — there are infinitely many
of them, and we wouldn’t have the time. We will have to prove it.
Theorem 8.5 Any MST has the separation property.
PROOF. We will show that any spanning tree that does not have the separation property
is not an MST.
Suppose then that T = (V, P) is a spanning tree of the weighted graph (V, E, w) and that, for some division of the nodes into sets X and Y , no shortest X−Y edge is in T . Let e = (x, y) be a shortest X−Y edge, with x ∈ X and y ∈ Y ; so e is not an edge of T .
(For an example, see the spanning tree shown in figure 8.3.) We’ll show that T is not a
MST.
As T is a spanning tree, there’s a unique path in T connecting x to y (the dotted line
in figure 8.5). This path must cross over from X to Y by some edge e′ = (x′ , y′ ) ∈ E .
(We let e′ be any cross-over edge if there’s more than one.)
(Figure 8.5: the division X–Y; e = (x, y) is a shortest X–Y edge not in T, and the dotted path in T from x to y crosses from X to Y by the edge e′ = (x′, y′).)
Let’s replace e′ by e in T . We get T ∗ = (V, (P ∪ {e}) \ {e′ }) (see figure 8.6). Then:
• T ∗ is connected. For if z,t ∈ V are different nodes, there was a path from z to t
in T . If this path didn’t use the edge e′ , it’s still a path in T ∗ . If it did use e′ , then
the path needed to get from x′ to y′ . But we can get from x′ to y′ in T ∗ , by going
via e. So we can still get from z to t in T ∗ .
• T ∗ has smaller total weight than T . For e′ is an X−Y edge of T , so by our supposition it is not a shortest X−Y edge, and hence w(e) < w(e′ ).
So T ∗ is a spanning tree. But T ∗ has smaller total weight than T . So T was not a
MST. The separation property is proved. QED.
(Figure 8.6: T ∗ , obtained from T by replacing e′ with e.)
PROOF. Assume for simplicity that all graph edges have different weights (lengths).
(The algorithm finds a MST even if they don’t: proving this is a tutorial exercise.) Let
T be any MST. At each stage, our proposed algorithm adds to its half-built tree X
the shortest possible edge connecting the nodes of X with the remaining nodes Y . (As
all edges have different weights, there is exactly one such edge.) By the ‘separation’
property (theorem 8.5), the MST also includes this edge. So every edge of the tree built
by the algorithm is in T .
Example 8.7 Figure 8.7 shows Prim’s algorithm half way through building a MST for
the graph in figure 8.4.
(Figure 8.7: Prim’s algorithm half-way through building a MST for the graph of figure 8.4; X is the half-built tree, Y the remaining nodes, and an arrow marks the next edge to add.)
X is the half-built tree — the nodes already visited. Y is the rest. In the next step,
the algorithm will add the edge shown, as it has highest priority on the fringe at the
moment (check this!) But by the separation property, this edge is the shortest X−Y
edge. So it is also in any MST — e.g., it’s in the one shown in figure 8.4.
But all spanning trees have the same number of edges (n − 1, where the whole
graph has n nodes; see proposition 7.10). We know the algorithm always builds a
spanning tree — so it chooses n − 1 edges. But T is a MST, so also has n − 1 edges.
Since the algorithm only chooses edges in the MST T , and it chooses the same number
of edges (n − 1) as T has, it follows that the tree built by the algorithm is T . So Prim’s
algorithm does indeed produce a MST. This is true even with fractional or real-number
weights. QED.
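A compact Python sketch of Prim's algorithm (an illustration only; it uses Python's heapq, which pops smallest items, so the 'priority' is simply the edge weight, and the graph is a dictionary of adjacency lists with weights):

    # Illustrative sketch: Prim's algorithm -- grow X from a start node, always
    # adding a shortest edge from X to the remaining nodes Y.
    import heapq

    def prim(graph, start):
        visited = {start}
        edges = [(w, start, y) for y, w in graph[start]]
        heapq.heapify(edges)
        tree = []
        while edges and len(visited) < len(graph):
            w, x, y = heapq.heappop(edges)        # a shortest edge leaving X
            if y in visited:
                continue
            visited.add(y)
            tree.append((x, y, w))
            for z, wz in graph[y]:
                if z not in visited:
                    heapq.heappush(edges, (wz, y, z))
        return tree

    g = {'A': [('B', 1), ('C', 4), ('D', 5)],
         'B': [('A', 1), ('C', 2)],
         'C': [('A', 4), ('B', 2), ('D', 3)],
         'D': [('A', 5), ('C', 3)]}
    print(prim(g, 'A'))    # [('A', 'B', 1), ('B', 'C', 2), ('C', 'D', 3)]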
Exercises 8.8 (challenge!)
1. Deduce that if all edges in a weighted graph have different weights, then it has
a unique MST. Must this still be true if some edges have equal weight? Can it
still be true?
2. Is it true that any spanning tree (of a weighted graph) that has the separation
property is a MST of that graph? (This, ‘separation property ⇒ MST’, is the
converse of theorem 8.5.)
3. Here’s a proposed algorithm to find a MST of a connected weighted graph G.
1 Start with any spanning tree T of G.
2 Pick any X−Y division of the nodes of G.
3 If T doesn’t have a shortest X−Y edge, replace an X−Y edge of T
with a shorter one [as in the proof in §8.3.1.3, especially figures 8.5
and 8.6].
4 Repeat steps 2–3 until T doesn’t change any more, whichever X , Y
are picked.
(a) Does this terminate?
(b) If it does, is the resulting tree T a MST of G?
(c) If so, would you recommend this algorithm? Why?
Warning If we run Prim’s algorithm on the graph in figure 8.1, starting from node
A, we get a MST — we just proved this. If we start it from node D, we also get a MST
— we proved that the algorithm always gives a MST. So wherever we start it from,
it delivers a MST. Of course, we may not always get the same one. But if all edges
had different weights, we would get the same MST wherever we started it from (by
exercise 8.8(1) above).
So to get an MST, there is no need to run the algorithm from each node in turn, and
take the smallest tree found. It gives an MST wherever we start it from.
Try the algorithm on figure 8.1, starting from each node in turn. What is the total
weight of the tree found in each case? (They should all be the same!) Do you get the
same tree?
Exercise 8.9 (Kruskal’s algorithm for MST) Another algorithm to find an MST, due
to Kruskal, runs in O (e log e). This exercise is to check that it works. Let (V, E, w) be a
connected weighted graph with n nodes and e edges. The algorithm works as follows:
1 Sort the edges in E . Let the result be E = {s1 , . . . , se } in order of weight,
so that w(s1 ) ≤ w(s2 ) ≤ · · · ≤ w(se ).
2 set T := ∅ ; set i := 1
3 repeat until T contains n − 1 edges
4 if the graph (V, T ∪ {si }) has no cycles then set T := T ∪ {si }
5 add 1 to i
6 end repeat
7 output T
3. Deduce that (V, T ) is a MST of (V, E, w). (It may help to simplify by assuming
all edges of E have different weights, but try to eliminate this assumption later.)
4. Show that, using a suitable sorting algorithm (suggest one), Kruskal’s algorithm
runs in time O (e log e).
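For comparison, here is a Python sketch of Kruskal's algorithm as described above (an illustration only; the cycle test in line 4 is done here with a simple union–find structure rather than an explicit search).

    # Illustrative sketch: Kruskal's algorithm -- take edges in order of weight,
    # keeping an edge only if it creates no cycle.
    def kruskal(nodes, edges):
        parent = {v: v for v in nodes}        # union-find, to detect cycles quickly

        def find(v):
            while parent[v] != v:
                v = parent[v]
            return v

        tree = []
        for w, x, y in sorted(edges):         # step 1: sort the edges by weight
            rx, ry = find(x), find(y)
            if rx != ry:                      # adding (x, y) creates no cycle
                parent[rx] = ry
                tree.append((x, y, w))
            if len(tree) == len(nodes) - 1:   # a spanning tree has n - 1 edges
                break
        return tree

    edges = [(1, 'A', 'B'), (2, 'B', 'C'), (4, 'A', 'C'), (3, 'C', 'D'), (5, 'A', 'D')]
    print(kruskal(['A', 'B', 'C', 'D'], edges))   # [('A', 'B', 1), ('B', 'C', 2), ('C', 'D', 3)]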
Problem (TSP): Given a complete weighted graph (V, E, w) (that is, (V, E) is a com-
plete graph), and a number d , is there a Hamiltonian circuit of (V, E) of total length at
most d ?2
As the graph is complete, there will be many Hamiltonian circuits, but they may all
be longer than d .
This is not a toy problem: variants arise in network design, integrated circuit de-
sign, robotics, etc. See Harel’s book, p.153. TSP is another hard problem. The ex-
haustive search algorithm for HCP also works for TSP. There are (n − 1)!/2 possible
routes to consider (see page 111). For each route, we find its length (this can be done
in time O (n)) and compare it with d . As for HCP, this algorithm runs even slower
than exponential time. There is no known polynomial time solution to TSP. Some
heuristics and sub-optimal solutions in special cases are known.
Whereas Prim’s algorithm is correct, actually delivering a MST, the performance of the nearest neighbour heuris-
tic is absolutely diabolical in many cases — it’s one of the worst TSP heuristics of all.
The energetic will find seriously incriminating evidence in Rayward-Smith’s book; the
rest of us may just try the heuristic on the graph shown in figure 8.9.
[Figure 8.9: a small weighted graph, with edge weights 1, 2 and 150, on which the nearest neighbour heuristic does very badly from the marked start node.]
One might easily think that the nearest neighbour heuristic was ‘intuitively a cor-
rect solution’ to TSP. It takes the best edge at each step, yes? But in fact, it is far
from being correct. Intuition is surely very valuable. Here we have an awful warning
against relying on it uncritically. Nonetheless, nearest neighbour is used as an initial
stage in some more effective heuristics.
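To see the warning in action, here is a minimal Python sketch of the nearest neighbour heuristic (our illustration; the distance-matrix representation and the four-city example, chosen in the spirit of figure 8.9, are assumptions of the sketch).

    def nearest_neighbour(dist, start=0):
        # dist[i][j] = distance between cities i and j (complete graph);
        # greedily visit the nearest unvisited city, then return to the start
        n = len(dist)
        tour, current = [start], start
        unvisited = set(range(n)) - {start}
        while unvisited:
            nxt = min(unvisited, key=lambda c: dist[current][c])
            tour.append(nxt)
            unvisited.remove(nxt)
            current = nxt
        tour.append(start)                      # close the round trip
        length = sum(dist[a][b] for a, b in zip(tour, tour[1:]))
        return tour, length

    # greedily grabbing the cheap edges first forces the 150 edge at the end
    d = [[0,   1,   2, 150],
         [1,   0,   1,   2],
         [2,   1,   0,   1],
         [150, 2,   1,   0]]
    print(nearest_neighbour(d))   # tour of length 153; the optimal tour has length 6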
Exercise 8.10 Suppose we had a polynomial time algorithm that solved TSP. Show
how it could be used to solve the ‘original’ version of TSP mentioned in footnote 2.
(Creativity is called for.)
Example 8.11 (p-time reduction of HCP to TSP) Suppose that we have an instance
of HCP: a graph such as the one shown in figure 8.10.
We can turn it into an instance of TSP by:
• defining the distance between nodes x and y by
      d(x, y) = 1, if x is joined to y in the graph,
      d(x, y) = 2, otherwise;
• taking the target length to be n, the number of nodes.
[Figures 8.10, 8.11: a graph on nodes c1, . . . , c6, and the complete weighted graph obtained from it; the original edges have distance 1, the added edges distance 2.]
We get figure 8.11. This conversion takes time about n² if there are n nodes, so is
p-time. Then
• any Hamiltonian circuit in the original graph yields a round trip of length n in
the weighted graph.
• Conversely, any round trip in the weighted graph must obviously contain n
edges; if it is of length ≤ n then all its edges must have length 1. So they
must be real edges of the original graph.
So the original graph has a Hamiltonian circuit iff there’s a route of length ≤ n in the
corresponding weighted graph. E.g., in figure 8.11, the route (c1 , c2 , c3 , c4 , c6 , c5 , c1 )
has length 6.
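A minimal Python sketch of this reduction (our illustration; the adjacency-list input format and the function name are assumptions of the sketch, and the sample edge set is just the circuit mentioned above, not the whole of figure 8.10):

    def hcp_to_tsp(nodes, edges):
        # Example 8.11: from an instance of HCP build an instance of TSP:
        # a distance matrix, plus the bound d = n (the number of nodes).
        nodes = list(nodes)
        index = {v: i for i, v in enumerate(nodes)}
        n = len(nodes)
        adj = set()
        for u, v in edges:
            adj.add((index[u], index[v]))
            adj.add((index[v], index[u]))
        dist = [[0 if i == j else (1 if (i, j) in adj else 2)
                 for j in range(n)] for i in range(n)]
        return dist, n          # ask TSP: is there a round trip of length <= n?

    dist, bound = hcp_to_tsp(['c1', 'c2', 'c3', 'c4', 'c5', 'c6'],
                             [('c1', 'c2'), ('c2', 'c3'), ('c3', 'c4'),
                              ('c4', 'c6'), ('c6', 'c5'), ('c5', 'c1')])
    print(bound)                # 6
    for row in dist:
        print(row)

Building the n × n matrix is the 'time about n²' mentioned above.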
Part III
Complexity
In Part I of the course we saw that some problems are algorithmically unsolvable.
Examples:
• the halting problem (will a given TM halt on a given input?)
• deciding the truth of an arbitrary statement about arithmetic.
But there are wide variations in difficulty even amongst the solvable problems. In prac-
tice it’s no use to know that a problem is solvable, if all solutions take an inordinately
long time to run. So we need to refine our view of the solvable problems. In Part III
we will classify them according to difficulty: how long they take to solve. Note: the
problems in Part III are solvable by an algorithm; but they may not be solvable in a
reasonable time.
Earlier, we formalised the notion of a solvable problem as one that can be solved
by a Turing machine (Church’s thesis). We did this to be able to reason about al-
gorithms in general. We will now formalise the complexity of a problem, in terms
of Turing machines, so that we can reason in general about the varying difficulty of
problems.
We will classify problems into four levels of difficulty or complexity. (There are
many finer divisions).
1. The class P of tractable problems that can be solved efficiently (in polynomial
time: p-time).
2. The intractable problems. Even though these are algorithmically solvable, any
algorithmic solution will run in exponential time (or slower) in the worst case.
Such problems cannot be solved in a reasonable time, even for quite small in-
puts, and for practical purposes they are unsolvable for most inputs, unless
the algorithm’s average case performance is good. The exponential function
dwarfs technological changes (figure 6.1), so hardware improvements will not
help much (though quantum computers might).
3. The class NP of problems. These form a kind of half-way house between the
tractable and intractable problems. They can be solved in p-time, but by a non-
deterministic algorithm. Could they have p-time deterministic solutions?
This is the famous question ‘P = NP?’ — is every NP-problem a P-problem?
The answer is thought to be no, though no-one has proved it. So these problems
are currently believed to be intractable, but haven’t been proved so.
4. The class NPC of NP-complete problems. In a precise sense, these are the hard-
est problems in NP. Cook’s theorem (section 12) shows that NP-complete prob-
lems exist (e.g., ‘PSAT’); examples include the Hamiltonian circuit and travel-
ling salesman problems we saw in sections 7–8, and around 1,000 others (so
far). All NP-complete problems reduce to each other in polynomial time (see
§8.6). So a fast solution to any NP-complete problem would immediately give
fast solutions to all the others — in fact to all NP problems. This is one rea-
son why most people believe NP-complete problems have no fast deterministic
solution.
Why study complexity? It is useful in practice. It guides us towards the tractable
problems that are solvable with fast algorithms. Conversely, NP-complete problems
occur frequently in applications. Complexity theory tells us that when we meet one, it
might be wise not to seek a fast solution: many have tried to find one, without success.
On a more philosophical level, Church’s thesis defined an algorithm to be a Turing
machine. So two Turing machines that differ even slightly represent two different
algorithms. But if each reduces quickly to the other, as all NP-complete problems
do, we might wish to regard them as the same algorithm — even if they solve quite
different problems! So the notion of fast reducibility of one problem or algorithm to
another gives us a higher-level view of the notion of algorithm.
So in Part III we will:
1. define the run time function of a Turing machine,
2. introduce non-deterministic Turing machines and define their run time function
also,
3. formalise fast reduction of one problem to another,
4. examine NP- and NP-complete problems.
We begin by introducing the notions needed to distinguish between tractable and in-
tractable problems. The classes NP and NPC will be discussed in sections 10 and 12.
9. Basic complexity theory
9.1 Yes/no problems
Definition 9.1 A yes/no problem is one with answer yes or no. Each yes/no problem
has a set of instances — the set of valid inputs for that problem. The yes-instances
of a problem are those instances for which the answer is ‘yes’. The others are the
no-instances.
Definition 9.2
1. A Turing machine M is said to accept a word w of its input alphabet if M halts
and succeeds on input w.
2. M is said to reject w if M halts and fails on input w.
3. A Turing machine M is said to solve a yes/no problem A if M accepts all the
yes-instances of A and rejects all the no-instances of A.
[Figure 9.1: Turing machines Y and N.]
Note We do not use the O -notation in the definition of M running in p-time (e.g.,
by saying ‘timeM (n) is O (nk )’ for some k). We require that timeM (n) should be at
most p(n), not just at most c · p(n) for some constant c. This is no restriction because
c · p(n) is a polynomial anyway. But further, we require that timeM (n) ≤ p(n) for all
n, however small, so that we are sure what happens for all n. We have to be a bit
more careful than with the more liberal O -notation, but there are some benefits of this
approach:
Proposition 9.6 p-time Turing machines always halt.
P ROOF. If M is p-time then for some polynomial p(n), M takes at most p(n) steps to
run on any word w of length n. But p(n) is always finite, for any n. So M always halts
(succeeding or failing) on any input; it can’t run forever. QED.
The following exercise shows that insisting on a firm polynomial bound on run
time for all n is not really a restriction.
Exercise 9.7 Let f : I ∗ → Σ∗ be any partial function. Show that the following are
equivalent:
9.3.2 P
Adopting the Cook–Karp thesis, we make the following important definition.
Definition 9.8
1. A yes/no problem is said to be tractable if it can be solved by a Turing machine
running in p-time.
2. An algorithm is said to be tractable if it can be implemented by a Turing machine
that runs in polynomial time.
3. We write P for the class of tractable yes/no problems: those that are solvable by
a Turing machine running in polynomial time.
Definition 9.9 The complement of a yes/no problem is got by exchanging the answers
yes and no. What were the yes-instances now become the no-instances, and vice versa.
E.g., the complement of ‘is n prime?’ is ‘is n composite?’.
If S is a class of problems (e.g., P), we write co-S for the class consisting of the
complements of the problems in S . Clearly, S = co-co-S .
Exercise 9.11 The proof above doesn’t explain in detail what happens if M tries to
move left from square 0. How would you do this?
Show that co-P = P (i.e., not just ‘⊆’).
Some problems are known to be intractable. There are many examples from logic:
one is deciding validity of sentences of first-order logic written with only two variables
(possibly re-used: like ∃x∃y(x < y ∧ ∃x(y < x))). This problem is solvable, but all
algorithms must take at least exponential time.
But many common problems have not been proved to be either tractable or in-
tractable! Typical examples are HCP and TSP. All known algorithms to solve these
problems are intractable, but it is not known if the problems are themselves intractable.
Maybe there’s a fast algorithm that everyone’s missed. For reasons to be seen in sec-
tion 12, this is not thought likely.
Besides TSP and HCP, problems in this category include PSAT, the propositional satisfiability problem: given a formula of propositional logic, is there a valuation making it true? For example, under the valuation h with h(p) = true and h(q) = false we have:
• h(p → q) = false,
• h(((p → q) → p) → p) = true,
• h((p ∧ q) ∨ (¬p ∧ ¬q)) = h(p ↔ q) = false.
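These truth values can be checked mechanically; here is a tiny Python sketch (ours), using the valuation h(p) = true, h(q) = false that the first equation forces:

    p, q = True, False                               # the valuation h
    implies = lambda a, b: (not a) or b
    print(implies(p, q))                             # h(p -> q) = False
    print(implies(implies(implies(p, q), p), p))     # h(((p -> q) -> p) -> p) = True
    print((p and q) or (not p and not q), p == q)    # both False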
For other problems such as HCP and TSP (type (∃)), no clever search strategy has
yet been found, and no tractable solutions are known.3
So problems subdivide further:
(∃1) Problems of type (∃) for which a clever (i.e., p-time) search strategy is available.
(∃2) Problems of type (∃) for which no clever search strategy is known.
The type (∀) problems subdivide similarly. E.g., is every spanning tree of length
> d ? is type (∀1), as we can find a MST and see if it weighs in at more than d . The
table gives more examples.
                                 (∃) Is there a needle?           (∀) Is there no needle?
 1) a fast search strategy       ‘Does the weighted graph G       ‘Is every spanning tree of the
    is known                     have a spanning tree of          weighted graph G of total
                                 weight < d?’                     weight ≥ d?’
 2) no fast search strategy      TSP, HCP, PSAT (and as yet       ‘Is every Ham. circuit of
    is known                     all NP-complete problems)        length > d?’ ‘Is the given
                                                                  formula unsatisfiable?’
The point is that we know where to look for a tree of length ≤ d if there is one. We
can narrow down the search space to a small size that can be searched tractably, given
that we have a fast algorithm to check that a given possible solution to the problem is
actually a solution.
Now, importantly, if we can narrow down the search space in this way, then it’s
just as easy to find a solution as to check that there isn’t a solution! Both involve
going through the same shrunken search space. So type (∃1) and (∀1) problems are
equally easy (they are tractable). This really follows from the fact that P is closed
under complement (proposition 9.10). Once we know a type (∃) problem is tractable
(in P), its complement, a type (∀) problem, will also be tractable.
The type (2) problems seem to be intractable, but the only (!) source of intractabil-
ity is our inability to find a clever strategy to replace exhaustive search. They would
become tractable if we had a good search strategy.
Which problems become tractable if we discount4 the cost of exhaustive search?
This is a kind of science fiction question: what would it be like if . . . ? The answer is:
over a thousand commonly encountered ones: the NP problems. They are of type (∃);
their complements, of type (∀), would simultaneously become tractable, too.
And could there really be a clever search strategy for these problems, one that
we’ve all missed? Most people think not, but no-one is sure. We explore these inter-
esting questions in the next section, using a new kind of Turing machine.
3 There are fast probabilistic and genetic ‘solutions’ to TSP that are sometimes very effective, but they
are not guaranteed to solve the problem.
4 There are several ways of doing the discounting, depending on what kind of information we want
from the exhaustive search. The simplest way is to discount the cost of search in type (∃) problems —
those involving simply seeing whether there exists (∃) a solution among many possibilities — and this is
the approach we will take in section 10. Another way uses oracles.
10. Non-deterministic Turing machines
A non-deterministic Turing machine is one that can make choices of which ‘instruc-
tion’ to execute at various points in a run. So what happens during the run is not
determined in advance. Such a machine gives us an exhaustive search for free, be-
cause by using a sequence of choices it can simply guess the solution (which part of
the haystack to check). We don’t specify which choices are made, or how, because we
are interested in solving problems when we’re given a search for free, not in the mech-
anism of the search. We can view the non-deterministic parts of a non-deterministic
Turing machine N as ‘holes’, waiting to be filled by a clever search strategy if it’s ever
invented. (Such holes are rather like variables in equations — e.g., x in x2 + 2x + 1 = 0
— and we know how useful variables can be.) In the meantime we can still study the
behaviour of N — by studying non-determinism itself.1
So: a non-deterministic Turing machine is like an ordinary one, but more than one
instruction may be applicable in a given state, reading a given symbol. If you like, the
instruction table δ can have more than one entry for each pair (q, a) ∈ Q × Σ. When
in state q and reading a, the machine can choose which instruction to execute. This is
why these machines are called non-deterministic: their behaviour on a given input is
not determined in advance.
its head in square 0. In state q and reading symbol a, N works like this:
• If q ∈ F then N halts and succeeds.
• Otherwise, N can go into state q′ , write symbol a′ , and move the head in direc-
tion d ∈ {0, 1, −1}, for any (q′ , a′ , d) ∈ δ(q, a).
• N has free choice as to which (q′ , a′ , d) ∈ δ(q, a) to take.
• δ(q, a) = ∅ means that there is no applicable instruction. In this case N halts and
fails. N also halts and fails if its head tries to move left off the tape.
Of course, there are NDTMs N such that δ(q, a) always either contains a single triple
(q′ , a′ , d) or is empty. Such an N will behave like an ordinary Turing machine — deter-
ministically. So the ordinary Turing machine is a special case of a non-deterministic
Turing machine. We have again generalised the definition of a Turing machine, as
we did with the n-tape Turing machine in section 3 (the n-tape model generalises the
ordinary one, as n could be 1).
The definition of solving a yes/no problem is the same as for deterministic Tur-
ing machines (definition 9.2): N should accept the yes-instances and reject the no-
instances.
There is a lot of dubious mysticism surrounding the way NDTMs make their
choices. Some writers talk about magic coins, others, lucky guesses, etc etc. In my
view, there is no magic or luck involved. According to the above definition, a NDTM
N accepts an input w if it is possible for N to halt & succeed on input w. If you re-
member this simple statement, you’ll save yourself a lot of headaches. N would have
to make all the right choices — somehow — but we don’t need to say how! We are
not claiming that NDTMs are (yet) a practical form of computation. They are a tool
for studying complexity. As we said, non-determinism is a ‘hole’ waiting to be filled
in by future discoveries.
Exercise 10.3 What’s wrong with this argument: the non-deterministic Turing ma-
chine in figure 10.1 solves any yes/no problem, because given a yes-instance, it can
move to state q1 , and given a no-instance it can move to state q2 . As it’s non-determin-
istic, we don’t need to say how it chooses! [Look at the definition of solving.]
[Figure 10.1: a machine with initial state q0 and, for each symbol x, two applicable instructions (x, x, 0), one leading to state q1 and one to state q2.]
3. if there’s no remainder then it halts and succeeds (‘yes’); otherwise it halts and
fails (‘no’).
[Figure: a non-deterministic Turing machine N for compositeness testing. Starting in state q0 it guesses a binary number on a second track below the input (instructions (x, (x, 0), 1) and (x, (x, 1), 1) for x = 0 or 1), then checks whether the guessed number is bigger than 1, less than the number on track 1, and divides it with no remainder; if so it halts and succeeds, otherwise it halts and fails. The tape shown carries the input on track 1 and a guess on track 2.]
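The definition of acceptance ('N accepts w if it is possible for N to halt and succeed on w') can be imitated deterministically by simply trying every guess. A minimal Python sketch for the compositeness example (our illustration, not the Turing machine itself):

    def accepts_composite(n):
        # imitate the NDTM deterministically: try every possible guess d
        # for track 2, and accept iff some guess leads to halt-and-succeed
        for d in range(2, n):            # guesses with 1 < d < n
            if n % d == 0:               # the easy, p-time check of one guess
                return True              # some run halts and succeeds
        return False                     # every run halts and fails

    print([m for m in range(2, 20) if accepts_composite(m)])
    # [4, 6, 8, 9, 10, 12, 14, 15, 16, 18]

Trying all the guesses like this takes time exponential in the length of n written in binary, which is exactly the cost the non-deterministic machine gets for free.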
Exercise 10.5 Why doesn’t this approach work for the problem ‘is n prime?’
The class NP is very important. It is the class of ‘type (∃) problems’ (see §9.5) that
would succumb to a clever search strategy. It contains many commonly occurring
problems of practical importance. For instance, each NDTM described in the examples
of §10.2 has p-time complexity, so PSAT, TSP, HCP, and compositeness testing are all
in NP. As Pratt proved in 1977, so is primality testing (it is now known to be in P).
We said that an NDTM is a Turing machine with ‘holes’ waiting to be filled by
a clever search strategy. In effect, all the hoped-for strategy needs do is to search the
tree of figure 10.4, to find an accepting run. Thus the remark in footnote 1 on p. 137 is
justified!
10.4.1 P = NP?
Clearly P ⊆ NP (simply because p-time deterministic TMs are special cases of p-time
NDTMs). It is not known whether P = NP — this is probably the most famous open
question in computer science. Most computer scientists believe that P ≠ NP, but they
may all be wrong. Unlike Church’s thesis, the question P = NP is precisely stated.
Whether P = NP or not shouldn’t be a matter of belief. We want to prove it, one way
or the other. A lot of work has been done, but no-one has yet published a proof.
10. Having done all children of this configuration s∗t , head 1 moves to the next con-
figuration on tape 1, and the process repeats, the new children being appended
to tape 2. And so on, through all configurations on tape 1.
11. When every configuration on tape 1 has been dealt with in this way, tape 2 holds
the configurations corresponding to level n + 1 of the tree. M can copy tape 2
back over tape 1 and go on to the next cycle (step 7 above). If tape 2 is empty,
this means that there are no valid children. The tree has no level n + 1, so M
halts and fails.
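Abstractly, the breadth-first simulation looks like the following Python sketch (ours; a 'configuration' is left abstract, and children(c) stands for the configurations reachable from c by one step of the NDTM):

    def simulate(initial, children, accepting, max_level):
        # breadth-first search of the tree of runs of a NDTM: True iff some
        # run of length <= max_level reaches an accepting configuration
        level = [initial]                         # plays the role of tape 1 (level n)
        for _ in range(max_level + 1):
            if any(accepting(c) for c in level):
                return True                       # an accepting run exists
            nxt = []                              # plays the role of tape 2 (level n+1)
            for c in level:
                nxt.extend(children(c))           # append the children of each configuration
            if not nxt:
                return False                      # no next level: halt and fail
            level = nxt                           # copy tape 2 back over tape 1
        return False

    # toy example: configurations are numbers, one step doubles or adds 1,
    # and the only accepting configuration is 6 (reached by 1 -> 2 -> 3 -> 6)
    print(simulate(1, lambda c: [2 * c, c + 1] if c < 6 else [],
                   lambda c: c == 6, max_level=5))     # True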
11. Reduction in p-time
We now use Turing machines to formalise the technique we saw in §8.6 of reducing
one problem to another in p-time. This p-time reduction gives fast non-deterministic
solutions to new yes/no problems from known fast non-deterministic solutions to old
ones. It gives a measure of the relative hardness of yes/no problems.
[Figure: a Turing machine reducing A to B, followed by a Turing machine solving B, together make up a Turing machine that solves A.]
Warning A ≤ B implies that any fast solution to B can be used to solve A quickly,
but it is not the same statement as this. There might be other ways of using B to solve
A than via the reduction (have a look at exercise 8.10 again).
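The intended use of a reduction is just function composition, as in this schematic Python sketch (ours; the two toy problems are made up purely for illustration):

    def solve_A(w, reduce_A_to_B, solve_B):
        # if reduce_A_to_B maps yes-instances of A to yes-instances of B
        # (and no-instances to no-instances), any solver for B answers A too
        return solve_B(reduce_A_to_B(w))

    # toy example: A = 'are all the numbers even?', B = 'is the number zero?'
    reduce_A_to_B = lambda xs: sum(x % 2 for x in xs)     # parity-preserving map
    solve_B = lambda n: n == 0
    print(solve_A([2, 4, 6], reduce_A_to_B, solve_B))     # True
    print(solve_A([2, 3, 6], reduce_A_to_B, solve_B))     # False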
Example 11.2 By example 8.11, HCP reduces to TSP, and the reduction can easily be
done by a deterministic Turing machine running in p-time. So HCP ≤ TSP.
Warning Don’t try to reduce HCP to TSP in p-time as follows: given an instance G
of HCP,
• if G is a yes-instance of HCP, output some fixed yes-instance of TSP (say a triangle with all edge weights 1 and d = 3);
• if G is a no-instance of HCP, output some fixed no-instance of TSP (say a triangle with edge weights 3, 9, 5 and d = 1).
The trouble is that to decide which output to produce, this ‘reduction’ would have to determine whether G is a yes-instance, i.e., solve HCP itself, and we have no p-time way of doing that.
Why do we not allow a = 1 here — unary notation? Unary is a special case; the
exercise below shows why.
Exercise 11.4 There is a deterministic Turing machine BU that, given the binary rep-
resentation of a number as input, outputs the unary representation. (1) Design one. (2)
Show that no such machine BU can have polynomial time complexity. [Hint: how long
does BU take to output the answer if the input is the binary representation of n?]
11.2 ≤ is a pre-order
We saw that if A ≤ B then we can use a fast solution to B to solve A quickly. So
if A ≤ B then in effect A is no harder than B. Thus the relation ≤ orders the yes/no
problems by increasing difficulty.
But is ≤ really an ordering at all? In fact it’s what’s called a pre-order: a reflexive,
transitive (see §7.1) binary relation. Other pre-orders include the ordering on numbers:
x ≤ x for all x, and x ≤ y ≤ z implies x ≤ z. The relation T on students given by s T t
iff t is at least as tall as s is a pre-order. The well-known binary relation likes(x, y)
may be a pre-order, if everyone likes themselves (so Bob likes Bob, for example), and
whenever (say) Bob likes Chris and Chris likes Keith then also Bob likes Keith.
Theorem 11.5 The relation ≤ defined above is a pre-order on the class of yes/no prob-
lems.
P ROOF. ≤ is reflexive. To prove this we must show that for any yes/no problem A,
A ≤ A holds. To prove A ≤ A, we must find a deterministic p-time Turing machine
reducing A to A.
Let I be a finite alphabet in which all instances of A can be written. Let X be the
deterministic Turing machine
(q0, I ∪ {∧}, I, q0, ∅, {q0}).
(Cf. Y of figure 9.1.) X just halts & succeeds without action, so its output is the same
as its input. Hence if w is a yes-instance of A then fX (w) = w is a yes-instance of A;
and similarly for no-instances. So X reduces A to A. Moreover, X runs in polynomial
time, since timeX (n) = 0 for all n.
The total run time of X ∗Y is thus at most 1 + p(n) + p(n) + q(p(n)), which works out
to a polynomial. For example, if p(n) = 2n² + n³ and q(n) = 4 + 5n, then the expression
is 1 + 2(2n² + n³) + 4 + 5(2n² + n³), which works out to 5 + 14n² + 7n³, a polynomial.
QED.
Remark 11.6
1. Steps 3 and 4 above are important.
2. We cannot just say: the symbol ≤ looks like the usual ordering 1 ≤ 2 ≤ 3 . . . on
numbers (it has a line under it: it is not <, but ≤), so therefore A ≤ A (etc).
The symbol ≤ may look like the ordering of numbers, but it has a quite different
meaning. To prove that ≤ is reflexive and transitive we must use the definition
of ≤. No other way will do.
3. To prove that A reduces to A in p-time may seem a silly thing to do. It is not
silly. It is just trivial.
Exercise 11.7 Show that the relation A ∼ B given by ‘A ≤ B and B ≤ A’ (defini-
tion 11.1) is an equivalence relation (see §7.1) on the class of all yes/no problems.
(Use the theorem.)
[Figure: the machine X ∗ N. The input word w (an instance of A) has square 0 marked and is fed to X, the deterministic Turing machine reducing A to B in p-time; the output of X, an instance of B, is handed on (with the head returned to square 0) to N, a non-deterministic Turing machine solving B in NP-time. The result, X ∗ N, is a non-deterministic Turing machine solving A in NP-time.]
• X runs for time at most p(n) on input w. X halts with output fX (w).
• Then the output fX (w) of X is given as input to N . As before, fX (w) has length
≤ p(n), so no run of N is longer than q(p(n)).
Remark We’ve showed that the concatenation (joining up) of p-time Turing ma-
chines X,Y by sending the output of X into Y as input, gives another p-time Turing
machine X ∗Y . Y can be non-deterministic, in which case so is X ∗Y .
P ROOF. TSP is in NP, by example 10.6. A simple formalisation of example 8.11 using
Turing machines shows that HCP ≤ TSP. So by the theorem, HCP is in NP also. QED.
We already knew this (exercise 10.8), but reduction is useful for other things — see
especially NP-complete problems in section 12. Some 1,000 other p-time reductions
of problems to NP problems are known.
Theorem 11.10 If A is a yes-no problem, and A ≤ B for all yes-no problems B, then
A ∈ P.
[Figure: the machine X ∗ M. The input word w (an instance of A) has square 0 marked and is fed to X, the deterministic Turing machine reducing A to B in p-time; the output of X, an instance of B, is handed on (with the head returned to square 0) to M, a deterministic Turing machine solving B in p-time. The result, X ∗ M, is a deterministic Turing machine solving A in p-time.]
Now we show the other half, that the problems in P are ≤-easiest. This is a new
argument for us, and one that seems like a trick.
Theorem 11.11 If A is any problem in P, and B is any yes/no problem at all, then
A ≤ B.
P ROOF.
Crudely, the idea is this. We want to show that we can find a fast solution to A
if we are allowed to use one for B. But we’re told A is solvable in p-time, so we can
solve A directly, without using the solution for B! This is crude because we have to
show that A reduces to B in p-time, which is not quite the same (see the warning on
page 146). But the same trick works.
In the figure, ‘output w1 ’ is a Turing machine that outputs the word w1 as fixed
text (as in the ‘hello world’ example). The Turing machine ‘output w2 ’ is similar. X
contains a likeness M ′ of M , slightly modified so that:
• if M halts and succeeds then control passes to output w1 ,
• if M halts and fails then control passes to output w2 .
We require that M ′ eventually passes control to the rest of X , so that fX (w) is defined
for any instance w of A. This is true, because as M runs in p-time, it always halts (see
proposition 9.6).
Clearly, X is deterministic. By counting steps, as in theorems 11.5 and 11.8, we
can check that M runs in p-time.
We show that X reduces A to B. If the input to X is a yes-instance w of A, then M
will halt and succeed on w, so X outputs w1 , a yes-instance of B. Alternatively, if the
input is a no-instance of A, then M halts and fails, and X outputs w2 , a no-instance of
B. So by definition 11.1(1), X reduces A to B. So A ≤ B as required. QED.
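In programming terms the trick is very short. Here is a schematic Python sketch (ours; it assumes, as the proof does, that we have fixed a yes-instance w1 and a no-instance w2 of B):

    def make_reduction(solve_A_in_ptime, w1, w2):
        # Theorem 11.11: if A itself is solvable in p-time, then A reduces to B.
        # w1 is a fixed yes-instance of B, w2 a fixed no-instance of B.
        def reduce_A_to_B(w):
            return w1 if solve_A_in_ptime(w) else w2
        return reduce_A_to_B

    # toy example: A = 'is the number even?', B = 'does the string contain an x?'
    f = make_reduction(lambda n: n % 2 == 0, 'axb', 'ab')
    print(f(4), f(7))      # axb ab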
Exercises 11.12
1. Check that M above does run in p-time.
2. Let ∼ be the equivalence relation of definition 11.1(4). Let A be any yes/no
problem in P. Show that for any yes/no problem B, A ∼ B iff B ∈ P.
3. In theorem 5.6, we reduced HP to EIHP. Is this reduction p-time?
[Figure: the yes/no problems laid out from ≤-easiest to ≤-hardest: P (the tractable problems) at the easy end; NP containing P together with PSAT, HCP and TSP; then the intractable problems; and at the hard end the unsolvable problems, such as HP and the truth-of-arithmetic problem from Gödel's theorem.]
Each ∼-class consists of all problems of a certain ≤-difficulty (of A, B such that
A ≤ B and B ≤ A). By theorem 11.11 and exercise 11.12(2), P is an entire ∼-class,
consisting of the ≤-easiest problems. If A ∈ NP then any ≤-easier problem is also in
NP. So NP is a union of classes: no ∼-class overlaps NP on both sides. Of course the
shaded area in NP but outside P may be empty, so that PSAT and friends are all in P!
Whether this is so is the question P = NP, which is unsolved.
You will probably survive if you remember that:
The problems in P are the ≤-easiest. Is there a ≤-hardest problem? We will investigate this — at least within NP — in the final section.
12. NP-completeness
12.1 Introduction
Is there a ≤-hardest problem? The answer is not at all obvious, even within NP.
Could it be that for any NP-problem, there’s always a harder one (still in NP)? If so,
there’d be harder and harder problems in NP (with respect to ≤), forming an infinite
sequence of increasingly hard problems, never stopping. Or maybe there are many
different ≤-hardest NP-problems, all unrelated by ≤ — after all, why should we expect
very different problems to reduce to each other? (Of course if P = NP then our question
is irrelevant. But most likely, P ≠ NP. Ladner showed in 1975 that if P ≠ NP then there
are infinitely many ∼-classes within NP.)
In fact this doesn’t happen, at least within NP (and also within several other com-
plexity classes which we won’t discuss). In a famous paper of 1971, Stephen Cook
proved that there are ≤-hardest problems in NP. Such problems are called NP-complete
problems.
What do we mean by a ≤-hardest problem? A yes/no problem A is said to be NP-complete if:
1. A ∈ NP,
2. B ≤ A for every yes/no problem B in NP.
Exercise 12.2 Show that if A, B are NP-complete then A ∼ B. Show that if A is NP-
complete and A ∼ B then B is also NP-complete. So the NP-complete problems form
a single ∼-class.
We prove Cook's theorem (that PSAT is NP-complete) in §12.3.
Some 1,000 examples of NP-complete problems are now known. They include
PSAT, the Hamiltonian circuit problem (HCP), the travelling salesman problem (TSP),
partition (given a set of integers, can it be divided into two sets with equal sum?),
scheduling (can a given set of tasks of varying length be done on two identical
machines to meet a given deadline?), and many other problems of great practical impor-
tance. See texts, e.g., Garey & Johnson, for a longer list. Non-primality and primality
testing were known for a long time to be in NP, but are now known (since 2002) to be
in P, which is even better (remember that P ⊆ NP).
1 S.A. Cook, The complexity of theorem proving procedures, Proceedings of Third Annual ACM Symposium on the Theory of Computing, 1971, pp. 151–158.
delivers a route round the cities of at most twice the optimal length. But for the general
version of TSP, the existence of such a polynomial time algorithm would imply P =
NP.
12.2 Proving NP-completeness by reduction
To show that a yes/no problem A is NP-complete, it is enough to show that:
1. A is in NP, and
2. B ≤ A for some problem B that is already known to be NP-complete.
For if A is ≥-harder than a ≥-hardest problem in NP, but is still in NP, then A must
also be ≥-hardest in NP: i.e., NP-complete.
(1) is usually easy, but must not be forgotten. (For example, there are unsolvable
problems A that satisfy (2). Satisfiability for predicate logic is unsolvable, but is
clearly ≥ PSAT since PSAT is a special case of it. Such problems are not NP-complete,
being outside NP.) One can either prove (1) directly, as in §10.2, or else show that
A ≤ C for some C known to be in NP, and then use theorem 11.8.
To show (2), you must reduce a known NP-complete problem B to A in p-time.
Any B in NPC will do. There are now about 1000 Bs to choose from. A popular
choice is 3SAT:
3SAT Given a propositional formula F which is a conjunction of clauses, each clause a disjunction X ∨ Y ∨ Z of three literals (atoms or negated atoms), is F satisfiable?3
Cook showed that 3SAT is NP-complete in his 1971 paper. Because the instances of
3SAT are more limited than those of PSAT, it is often simpler to reduce 3SAT to the
original problem A than to reduce PSAT to it.
Exercises 12.4
1. Recall the definition of the class co-NP (definition 9.9). Show that NP = co-NP
iff NPC ∩ co-NP ≠ ∅.4
2. [Quite hard; for mathematicians] Show PSAT ∼ 3SAT, and HCP ∼ PSAT. You
can assume Cook’s theorem (below).
3 The ‘3’ in 3SAT refers to there being 3 disjuncts (X,Y, Z above) in each clause. The formula F may
use many more than 3 atoms.
4 By exercise 9.11, if P = NP then NP = co-NP. It might be true that NP = co-NP and still P ≠ NP, but
this is thought unlikely. Cf. the discussion in §9.5.
[Figure: the reducing machine X sends each yes-instance w of A to a satisfiable formula Fw, and each no-instance w of A to an unsatisfiable formula Fw.]
time 0: initially. Box 1 will represent the configuration at time 1, and so on. We know
N halts by time p(n), so we don’t need any more boxes than 0, 1, . . . , p(n).
How does the table record a run of N on w? Each box is divided horizontally
into three segments, to describe the configuration of N at the time that concerns the
box (the ‘current time’). The first segment indicates the state of N at that time. It’s
divided into as many rows as there are states, one row for each state. We shade the
row of the current state. So if Q = {q0 , . . . , qs } and the current state is qi , row i alone
will be shaded. In figure 12.3 (time 0) row 0 is shaded because initially N is in state
q0 . The second segment describes the current tape contents. Now as N only has at
most p(n) moves, its head can never get beyond square p(n). So all tape squares after
p(n) will always be blank, and we need only describe the contents up to square p(n).
We do it by chopping the segment up into rows and columns. The rows correspond
to symbols from Σ = {a0 , . . . , ar } say, where ar = ∧, and the columns correspond to
squares 0, 1, . . . , p(n) of the tape. We shade the intersection of row i, column j iff ai is
currently the character in square j. So the shading for time 0 describes the initial tape
contents: w itself.5
We can read off w from figure 12.3: it is a0 a1 a4 a3 a4 . The rest of the tape is ar = ∧.
Finally the third segment describes N ’s current head position. We divide it into
columns 0, 1, . . . , p(n), and shade the column where the head is. The head never moves
more than p(n) away from square 0 in any run (it hasn’t time), so we only need p(n)+1
columns for this. For the time 0 box (figure 12.3) we shade column 0, as N ’s head
begins in square 0. If N halts before p(n), we can leave all later boxes blank.
Will a table like that of figure 12.2 but with random shading correspond to a real
accepting run of N ? No: there are four kinds of constraint.
1. For each t ≤ p(n), the box for time t must represent a genuine configuration C(t)
of N: in segment 1 exactly one state row is shaded, in segment 2 exactly one symbol
row is shaded in each column, and in segment 3 exactly one head-position column is
shaded.
5 Technical point: by replacing p(n) by p(n) + n if need be, we can assume that p(n) ≥ n. We still have
timeN (n) ≤ p(n). So we’ve room to record w itself.
[Figure 12.3: the box for time 0. Segment 1 has one row for each state q0, . . . , qs (row 0 shaded); segment 2 has one row for each tape symbol a0, . . . , ar = ∧ and one column for each square 0, 1, . . . , p(n); segment 3, ‘head position’, has columns 0, 1, . . . , p(n) (column 0 shaded).]
2. C(0) must be the initial configuration of N . So the box for time 0 should say
that the head is in square 0 and the state is q0 , as in figure 12.3. Moreover, it
should say that the initial tape contents are w.
3. The whole table must represent a run of N . The successive configurations C(t)
represented by the boxes should be related, as we have to be able to get from
C(t) to C(t + 1) by a single step of N . So there’ll be further constraints. Only
one tape character can change at a time; and the head position can vary by at
most 1. Compare boxes 0 and 1 of figure 12.2. And the new character and
position, and new state, are related to the old by δ. E.g., we can read off from
figure 12.2 that (q0 , a0 , q2 , a2 , 1) is an instruction of N .
4. The run indicated by the table must be accepting. This is a constraint on the
final state, as shown in segment 1 of the last non-empty box. If the shaded state
is in F , the run is accepting; otherwise, not.
Any accepting run of N on w meets these constraints and we can fill in a table for
it. Conversely, if we fill in the table so as to meet the four constraints, we do get an
accepting run of N on w. So the question does N accept w is the same as can we fill
in the table subject to the four constraints?
Describing the table with logic Filling in squares of the table is really a logical
Boolean operation — either a square is shaded (1), or not (0). Let’s introduce a propo-
sitional atom for each little square of the table. The atom’s being true will mean that
its square is filled in. So a valuation v of the atoms corresponds exactly (in a 1–1 way)
to a completed table (though it may not meet the constraints).
Describing the constraints with logic The constraints (1)–(4) above correspond to
constraints on the truth values of the atoms. It is possible to write a propositional
formula Fw that expresses these constraints. Fw does not say which squares are filled
in: it only expresses the constraints. (The constraints do determine how box 0 is filled
in, but boxes 1, 2, . . . , p(n) can be filled in in many different ways, corresponding to
the different choices made by N during its run.) Given any valuation v of the atoms
that makes Fw true, we can read off from v a shading for the table that meets the four
constraints. And from any validly-completed table we can read off a valuation v to the
atoms such that v(Fw ) = true. So the question can we fill in the table subject to the
four constraints? is the same as the question is there a valuation v making Fw true?
The end We have fX (w) = Fw . But now we’re finished. For, w is a yes-instance of A
iff N accepts w, iff there’s a way to fill in the table that meets the constraints, iff there’s
some valuation v making fX (w) = Fw true, iff Fw is a yes-instance of PSAT. Hence X
reduces A to PSAT. Since X is deterministic and runs in p-time, we have A ≤ PSAT.
But this holds for any A in NP. So PSAT is NP-complete. QED and goodnight.
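To make the encoding a little more concrete, here is a small Python sketch (ours, and only a fragment) that generates, in clause form, one family of the constraints: 'at each time t, exactly one state row is shaded'. An atom 'state q at time t' is represented by the pair (t, q); the full formula Fw would add similar families of clauses for the tape contents, the head position, the initial configuration, the δ-steps and acceptance.

    from itertools import combinations

    def exactly_one_state_clauses(num_times, states):
        # clauses (disjunctions of literals) saying that for each time t
        # exactly one atom (t, q) is true; ('not', atom) is a negated literal
        clauses = []
        for t in range(num_times + 1):
            clauses.append([(t, q) for q in states])      # at least one state at time t
            for q1, q2 in combinations(states, 2):        # no two states at time t
                clauses.append([('not', (t, q1)), ('not', (t, q2))])
        return clauses

    for clause in exactly_one_state_clauses(1, ['q0', 'q1', 'q2']):
        print(clause)

Each such family contributes only polynomially many clauses (in p(n)), which is why the reducing machine X can write Fw down in p-time.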
2. (a) Briefly explain the difference between the depth-first and breadth-first
methods of constructing a spanning tree of a connected graph.
(b) i. List the edges of a spanning tree of the following graph, using the
depth-first method.
[Figure: a graph, with nodes including A, B, D and E, for this question.]
accepting run on that input. It rejects an input if all its runs on that input end in
failure. It solves a yes/no problem if it accepts the yes-instances and rejects the
rest, as before. Its run time function timeN (n) is the length of the longest run
of N on any input of size n. The definition of N running in p-time is as before.
NDTMs can solve yes/no problems like PSAT, HCP, TSP, etc., in p-time, as
they simply guess a possible solution x to the input instance w, accepting w if x
is in fact a solution. Checking that x is a solution to w (e.g., x is a round trip of
length ≤ d in TSP, or in PSAT, x is a valuation making the propositional formula
A true) can be done in p-time. We let NP be the class of all yes/no problems
solvable by some NDTM in p-time. So TSP, HCP and PSAT are all in NP. As
deterministic Turing machines are a special case of NDTMs, any P problem is
in NP.
However, though faster than deterministic Turing machines, NDTMs can solve
no more yes/no problems. This gives more evidence for Church’s thesis. A de-
terministic Turing machine can simulate any NDTM by constructing all possible
runs in a breadth-first manner, and seeing if any is accepting. That is, it does a
full exhaustive search of the tree of runs of the NDTM.
Section 11: We can formalise the notion of a yes/no problem A being no harder than
another, B, by p-time reduction. To reduce A to B in p-time (‘A ≤ B’) is to find
a (deterministic) p-time Turing machine X that converts yes-instances of A into
yes-instances of B, and similarly for no-instances. Since X is fast, any given fast
solution F to B can be used to solve A: first apply X , then F . If F solves B non-
deterministically, the solution we get to A is also non-deterministic, so if B ∈ NP
and A ≤ B then also A ∈ NP: that is, ≤-easier problems than NP-problems are
also in NP.
It’s easy to convert yes-instances of A into yes-instances of A and no-instances
of A into no-instances of A in p-time (leave them alone!), so A ≤ A and ≤
is reflexive. ≤ is transitive, as if X converts A-instances to B-instances in p-
time, and Y converts B-instances to C-instances in p-time, then running X then
Y converts A-instances to C-instances (always preserving parity: yes goes to
yes, no to no). Careful counting shows that this takes only p-time (remember to
return heads to square 0, and that the input to Y may be (polynomially) longer
than the original input to X ). Hence A ≤ B ≤ C implies A ≤ C, so ≤ is transitive.
≤ is thus a pre-order.
Problems in P are ≤-easiest of all. For if A ∈ P and B is arbitrary, we can
convert instances w of A to instances of B (in p-time and preserving parity) by
the following trick. As we can solve A completely in p-time, we find out in
p-time whether w is yes or no for A. Then we hand over a fixed instance of B of
appropriate parity.
Section 12: The ≥-hardest problems in NP are called NP-complete. A yes/no problem
A is NP-complete if A is in NP but A ≥ B for all NP-problems B. Cook proved
in 1971 that PSAT is NP-complete, so NP-complete problems exist. His proof
went like this. We know PSAT ∈ NP. If A ∈ NP, there’s a p-time NDTM