Formal Language - A Practical Introduction 2008 - Adam Brooks Webber
© 2008 Franklin, Beedle & Associates, Incorporated. No part of this book may be
reproduced, stored in a retrieval system, transmitted, or transcribed, in any form or by
any means—electronic, mechanical, telepathic, photocopying, recording, or otherwise—
without prior written permission of the publisher. Requests for permission should be
addressed as follows:
Rights and Permissions
Franklin, Beedle & Associates, Incorporated
8536 SW St. Helens Drive, Ste. D
Wilsonville, Oregon 97070
Preface
are minor technicalities, to be sure, but there are enough of them that I find my students
become discouraged—and if they look to other books for guidance that only makes it worse,
because no two agree on these details. The formalism generates too much heat and not
enough light. In place of PDAs this book investigates stack machines, a simpler computational
formalism that captures the CFLs, connects neatly with CFGs, and provides a good
foundation for a discussion of parsing techniques.
With few other exceptions, the ideas, theorems, and proofs in this book are standard.
They are presented without the detailed historical attributions that may be found in
graduate-level texts. However, many of the chapters conclude with a Further Reading section,
giving references into the literature for interested students.
I designed the book to be accessible to sophomores in a computer-science major—
perhaps even to second-semester freshmen. The presentation does not rely on any college-
level mathematical prerequisites. The only computer-science prerequisite is the ability to read
simple code examples: a Java-like syntax is used, but the ability to read any C-family language
will suffice.
In some academic departments, the material on complexity—Chapters 19, 20, and 21,
and Appendices B and C—will be positioned in a separate course, together with related
material on algorithms and complexity analysis. In my experience there is more than enough
even in Chapters 1 through 18 to fill a semester-long course, so those wishing to cover
the entire subject of the book may want to consider skipping or abbreviating, particularly
Chapters 9, 11, 14, and 18.
In many CS and CE departments (such as at the University of Wisconsin—Milwaukee,
where I began the development of this book) a course in this subject is a required part of
the major. At other institutions (including Monmouth College, where I completed it) the
course has gradually fallen into disuse; it has sometimes become an elective, rarely if ever
offered. The reason for this decline, I believe, is the mismatch between the students and
the treatment of the subject. Instructors do not enjoy the drudgery of having to frog-march
undergraduates through a difficult textbook that they barely understand and quickly forget;
students enjoy the experience even less. I hope that Formal Language: A Practical Introduction
will help to rekindle the excitement of teaching and studying this subject.
Introduction
It sometimes seems that everything is a science these days: political science and beauty
science, food science and waste science, creation science and mortuary science. Just for
fun, try searching the Web for any well-formed phrase ending with the word science. Weed
science, reincarnation science, sales science—it’s hard to find one that is not in use.
Because of this oddly mixed company, the name computer science triggers some
skepticism. That’s unfortunate, because it is actually an excellent field of study for many
undergraduates. Looking for a liberal-arts education? Computer science has deep connections
throughout the web of human knowledge and is highly relevant to broad, ongoing changes
in our culture. Looking for a rigorous scientific education? There are research-oriented
departments of computer science at all the world’s great universities, engaged in a dramatic
scientific adventure as our young science begins to reveal its secrets. Looking for a vocational
education? Computer science is a solid foundation for a variety of career paths in the
computer industry.
The subject of this book, formal language, is at the heart of computer science, and it
exemplifies the virtues just cited:
• Formal language is connected to many other branches of knowledge. It is where
computer science, mathematics, linguistics, and philosophy meet. Understanding
formal language helps you see this interconnectedness, so that you will never make
the mistake of thinking of computer science as a separate intellectual kingdom.
• Formal language is a rigorous branch of mathematics, with many open questions at
the frontiers. This book covers only the basics, but if you find the basics interesting
there is much more to study. Some advanced topics are identified here, with
pointers to further reading; others may be found in any graduate-level text.
• Formal language is very useful. This book stresses applications wherever they arise,
and they arise frequently. Techniques that derive from the study of formal language
are used in many different practical computer systems, especially in programming
languages and compilers.
In addition, formal language has two special virtues, not shared with most of the other
branches of computer science:
• Formal language is accessible. This book treats formal language in a way that does
not assume the reader knows any advanced mathematics.
• Finally, formal language is stable—at least in the elementary parts. Almost
everything in this book has been known for 30 years or more. That’s a very long
time in computer science. The computer you bought as a freshman may be obsolete
by the time you’re a senior, and the cutting-edge programming language you
learned may be past its prime, but the things you learn about formal language will
not lose their relevance.
No one who loves language can take much pleasure in the prospect of studying a subject
called formal language. It sounds suspiciously abstract and reductionistic. It sounds as if all
the transcendent beauty of language will be burned away, fired under a dry heat of definitions
and theorems and proofs, until nothing is left but an ash of syntax. It sounds abstract—and
it is, undeniably. Yet from this abstraction arise some of the most beautiful and enduring
ideas in all of computer science.
This book has two major goals. The first is to help you understand and appreciate
the beautiful and enduring ideas of formal language. These ideas are the birthright of all
computer scientists, and they will profoundly change the way you think about computation.
They are not only among the most beautiful, but also among the most useful tools in
computer science. They are used to solve problems in a wide variety of practical applications,
and they are especially useful for defining programming languages and for building language
systems. The second purpose of this book is to help you develop a facility with these useful
tools. Our code examples are in Java, but they are not particularly Java-centric and should be
accessible to any programmer.
There is also a third major reason to study formal language, one that is not a primary
focus of this book: to learn the techniques of mathematical proof. When you are learning
about formal language, it can also be a good time to learn proof techniques, because the
subject is full of theorems to practice on. But this book tries to make the beautiful and useful
ideas of formal language accessible to students at all levels of mathematical interest and
ability. To that end, although the book presents and discusses many simple proofs, it does not
try to teach advanced proof techniques. Relatively few of the exercises pose challenging proof
problems. Those planning graduate-level study of theoretical computer science would be well
advised not to rely exclusively on this book for that kind of training.
Acknowledgments
Today the territory mapped in this book is part of the intellectual commons of our
discipline, but it was not always so. I am indebted to the many researchers, living and dead,
who first explored this territory. I am honored to be able to share their work with others.
I am particularly indebted to those scholars who first shared it with me, for their style, grace,
and elegant precision: Scot Drysdale at Dartmouth College, and Dexter Kozen and Juris
Hartmanis at Cornell University.
Thanks to Jim Leisy, my publisher at Franklin, Beedle & Associates. His feedback and
support, both for this project and for my previous book, have been extremely valuable.
Thanks also to Stephanie Welch and Tom Sumner, my editors, for their great patience and
diligence with this project. Thanks to the anonymous reviewers who commented on early
drafts of the book. The remaining defects are, of course, entirely my own.
I have been encouraged and supported in this work by two excellent department
chairmen: K. Vairavan at the University of Wisconsin—Milwaukee, and Lyle Welch at
Monmouth College. Monmouth has been a particularly supportive environment for a
project like this, with its emphasis on the undergraduate experience. I’m proud to have been
able to teach there, and I thank the Monmouth students who endured my well-meaning
experimentation with early drafts of this book.
Finally, thanks to my family: my wife Kelly, my children Fern and Fox, my parents
Howard and Helen Webber, and my wife’s parents, Harold and Margaret Autrey. Their
loving support has been unfailing. For their sake I hope that the next book I write will be
more fun for them to read.
Web Site
The web site for this book is at http://www.webber-labs.com/fl.html. Materials there
include a full set of slides for each chapter, and all the larger code examples from the text
and exercises. There are also instructions for contacting the author to report defects, and for
accessing additional instructor-only materials.
CHAPTER 1
Fundamentals
Algebraists use the words group, ring, and field in technical ways,
while entomologists have precise definitions for common words
like bug and fly. Although it can be slightly confusing to overload
ordinary words like this, it’s usually better than the alternative,
which is to invent new words. So most specialized fields of study
make the same choice, adding crisp, rigorous definitions for words
whose common meaning is fuzzy and intuitive.
The study of formal language is no exception. We use crisp, rigorous
definitions for basic terms such as alphabet, string, and language.
1.1 Alphabets
Formal language begins with sets of symbols called alphabets:
An alphabet is any finite set of symbols.
A typical alphabet is the set Σ = {a, b}. It is the set of two symbols, a and b. There is no
semantics; we do not ascribe any meaning to a or to b. They are just symbols, which could as
well be 0 and 1, or any other pair of distinct symbols.
Different applications of formal languages use different alphabets. If you wanted to work
with decimal numbers, you might use the alphabet {0, 1, 2, 3, 4, 5, 6, 7, 8, 9}. If you wanted
to work with the contents of computer memories, you might use the alphabet {0, 1}. If you
wanted to work with text files on a computer, you might use one of the standard machine-
text alphabets, like ASCII or Unicode.
It is a convention in the study of formal languages, which is followed in this book, to
use the first few Latin letters, like a and b, in most example alphabets. This helps you resist
the urge to ascribe meaning to the symbols. When you see a string like 1000, you naturally
think of it as representing the decimal number one thousand, while the string abbb does not
tempt you to make any irrelevant interpretations. Because the symbols in the alphabet are
uninterpreted, our results apply to all alphabets.
Another convention followed in the book is to use the symbol Σ to stand for the
alphabet currently in use. When it is necessary to manipulate more than one alphabet at
once, subscripts are used: Σ1, Σ2, and so on.
The empty set {} is a legal alphabet, but not usually an interesting one. (Some people use
the special symbol ∅ for the empty set, but this book just uses {}.)
1.2 Strings
The symbols from an alphabet are put together into strings:
A string is a finite sequence of zero or more symbols from an alphabet.
For example, abbb and 010 are strings. In formal languages, unlike most programming
languages, strings are not written with quotation marks around them.
The length of a string is the number of symbols in it. To refer to the length of a string,
bracket the string with vertical lines: |abbb| = 4.
When the symbols in a string are part of a particular alphabet Σ, we say that the string
is “a string over Σ.” In this usage, the word “over” means “built using symbols from.” So, for
example, the set of all strings of length 2 over the alphabet {a, b} is the set of all strings of
length 2 that can be built using the symbols a and b: {aa, bb, ab, ba}.
The special symbol ε is used to represent the string of length zero: the string of no
symbols. Note here that ε is not a symbol in any alphabet we will use; it simply stands for the
empty string, much as you would write "" in some programming languages. For example,
the set of all strings of length 2 or less over the alphabet {a, b} is {ε, a, b, aa, bb, ab, ba}.
The length of ε is zero, of course: |ε| = 0. Be careful not to confuse the empty set, {}, with
the empty string, ε, and note also that {} ≠ {ε}; the set {} contains nothing, while the set {ε}
contains one thing: the empty string.
When describing languages and proving things about them, it is sometimes necessary
to use variables that stand for strings, such as x = abbb. This is a natural concept in
programming languages; in Java one writes String x = "abbb", and it is clear from
the syntax that x is the name of a variable and abbb are the characters making up the string
to which x refers. In the notation used for describing formal languages there is not so much
syntax, so you have to rely more on context and on naming conventions. The convention
followed in the book is to use the last few Latin letters, like x, y, and z, as string variables, not
as symbols in alphabets.
For example, the following definition of concatenation uses string variables.
The concatenation of two strings x and y is the string containing all the symbols
of x in order, followed by all the symbols of y in order.
To refer to the concatenation of two strings, just write them right next to each other. For
example, if x = abc and y = def then the concatenation of x and y is xy = abcdef. For any string
x, we have xε = εx = x. So ε is the identity element for concatenation of strings, just as 0 is
the identity element for addition of natural numbers and just as 1 is for multiplication of
natural numbers.
Speaking of numbers, we’ll denote the set of natural numbers, {0, 1, …}, as N. Any
natural number n ∈ N can be used like an exponent on a string, denoting the concatenation
of that string with itself, n times. Thus for any string x,
x⁰ = ε (that is, zero copies of x)
x¹ = x
x² = xx
x³ = xxx
and, in general,
xⁿ = xx…x (n times)
When the alphabet does not contain the symbols ( and ), which is almost all the time, you
can use parentheses to group symbols together for exponentiation. For example, (ab)⁷ denotes
the string containing seven concatenated copies of ab: (ab)⁷ = ababababababab.
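These operations have direct analogues in programming languages. The following sketch (not from the text; the class name and the particular strings are made up, and String.repeat requires Java 11 or later) shows Java expressions that mirror concatenation, the empty string, and exponentiation:
public class StringOps {
    public static void main(String[] args) {
        String x = "abc", y = "def";
        System.out.println(x + y);                    // concatenation xy: abcdef
        System.out.println((x + "").equals(x));       // concatenating the empty string leaves x unchanged: true
        System.out.println("ab".repeat(7));           // (ab) to the 7th power: ababababababab
        System.out.println("ab".repeat(0).length());  // (ab) to the 0th power is the empty string, length 0
    }
}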
1.3 Languages
A special notation is used to refer to the set of all strings over a given alphabet.
The Kleene closure of an alphabet Σ, written Σ*, is the set of all strings over Σ.
For example, {a}* is the set of all strings of zero or more as: {ε, a, aa, aaa, ...}, and {a, b}* is the set of
all strings of zero or more symbols, each of which is either a or b: {ε, a, b, aa, bb, ab, ba, aaa, ...}. This
allows a more compact way of saying that x is a string over the alphabet Σ; we can just write x ∈ Σ*.
(Here the symbol ∈ is used as in standard mathematical notation for sets and means “is an element
of.”)
Except for the special case of Σ = {}, the Kleene closure of any alphabet is an infinite
set. Alphabets are finite sets of symbols, and strings are finite sequences of symbols, but
a language is any set of strings—and most interesting languages are in fact infinite sets of
strings.
Languages are often described using set formers. A set former is written like a set, but
it uses the symbol |, read as “such that,” to add extra constraints or conditions limiting the
elements of the set. For example, {x ∈ {a, b}* | |x| ≤ 2} is a set former that specifies the set of
all strings x over the alphabet {a, b}, such that the length of x is less than or equal to 2. Thus,
{x ∈ {a, b}* | |x| ≤ 2} = {ε, a, b, aa, bb, ab, ba}
{xy | x ∈ {a, aa} and y ∈ {b, bb}} = {ab, abb, aab, aabb}
{x ∈ {a, b}* | x contains one a and two bs} = {abb, bab, bba}
{aⁿbⁿ | n ≥ 1} = {ab, aabb, aaabbb, aaaabbbb, ...}
That last example shows why set former notation is so useful: it allows you to describe
infinite languages without trying to list the strings in them. Unless otherwise constrained,
exponents in a set former are assumed to range over all of N. For example,
{(ab)ⁿ} = {ε, ab, abab, ababab, abababab, ...}
{aⁿbⁿ} = {ε, ab, aabb, aaabbb, aaaabbbb, ...}
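To make the meaning of such set formers concrete, here is a small sketch (not from the text; the class and method names are made up) that builds the finite language {x ∈ {a, b}* | |x| ≤ 2} by enumeration:
import java.util.ArrayList;
import java.util.List;

public class Enumerate {
    // Return all strings over the given alphabet whose length is at most maxLen.
    static List<String> upToLength(char[] alphabet, int maxLen) {
        List<String> result = new ArrayList<>();
        result.add("");                      // the empty string, epsilon
        for (int i = 0; i < result.size(); i++) {
            String s = result.get(i);
            if (s.length() < maxLen) {
                for (char c : alphabet) result.add(s + c);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Prints [, a, b, aa, ab, ba, bb]: the strings of length 2 or less over {a, b}
        System.out.println(upToLength(new char[]{'a', 'b'}, 2));
    }
}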
The set former is a very expressive tool for defining languages, but it lacks precision. In
one example above we used an English-language constraint: “x contains one a and two bs.”
That’s allowed—but it makes it easy to write set formers that are vague, ambiguous, or self-
contradictory. For that matter, even if we stick to mathematical constraints like “|x| ≤ 2” and
“n ≥ 1,” we still have the problem of explaining exactly which mathematical constraints are
permitted and exactly what they mean. We won’t try to do that—we’ll continue to use set
formers throughout this book, but we’ll use them in an informal way, without any formal
definition of what they mean. Our quest in subsequent chapters will be to find other tools
for defining languages, formal tools that are precise and unambiguous.
Exercises
EXERCISE 1
Restate each of the following languages by listing its contents. For example, if the
language is shown as {x ∈ {a, b}* | |x| ≤ 2}, your answer should be {ε, a, b, aa, bb, ab, ba}.
a. {x ∈ {a, b, c}* | |x| ≤ 2}
b. {xy | x ∈ {a, aa} and y ∈ {aa, aaa}}
c. {}*
d. {aⁿ | n is less than 20 and divisible by 3}
e. {aⁿbᵐ | n < 2 and m < 3}
EXERCISE 2
List all strings of length 3 or less in each of the following languages:
a. {a}*
b. {a, b}*
c. {aⁿbⁿ}
d. {xy | x ∈ {a}* and y ∈ {b}*}
e. {aⁿbᵐ | n > m}
EXERCISE 3
Many applications of formal language theory do associate meanings with the strings in a
language. Restate each of the following languages by listing its contents:
a. {x ∈ {0, 1}* | x is a binary representation, without unnecessary leading zeros, of a
number less than 10}
b. {x ∈ {0, 1, 2, 3, 4, 5, 6, 7, 8, 9}* | x is a decimal representation, without unnecessary
leading zeros, of a prime number less than 20}
c. {x ∈ {a, b, ..., z}* | x is a two-letter word in English}
EXERCISE 4
Restate each of the following languages using set former notation.
a. the language of all strings over the alphabet {a, b} that begin with a
b. the language of all even-length strings over the alphabet {a, b, c}
c. the language of strings consisting of zero or more copies of the string ba
d. the language of strings consisting of any number of as followed by the same number
of bs, followed by the same number of cs
EXERCISE 5
This exercise illustrates one of the perils of set formers: their use can lead to logical
contradictions. This example is related to Russell’s Paradox, a famous contradiction in
naive set theory discovered by Bertrand Russell in 1901, which led to an important body
of work in mathematics.
A set former is itself a string, so one can define languages that contain set formers. For
example, the set of all set formers that define the empty set would include such strings as
“{}”, “{x | |x| < 0}”, and “{x | x is not equal to x}”. (We put quotation marks around set
formers considered as strings.)
a. Give an example of a string x that is a set former that defines a language that
includes x itself.
b. Give an example of a string y that is a set former that defines a language that does
not include y itself.
c. Consider r = “{set formers x | the language defined by x does not include x}”.
(Your set former y from Part b, above, is an example of a string in the language
defined by r.) Show that assuming r is in the language defined by r leads to a
contradiction. Show that assuming r is not in the language defined by r also leads to
a contradiction.
CHAPTER 2
Finite Automata
2.1 Man, Wolf, Goat, and Cabbage
A man must ferry a wolf, a goat, and a cabbage across a river, in a boat that holds only himself and at most one passenger. The wolf cannot be left alone with the goat, and the goat cannot be left alone with the cabbage. Each crossing is recorded with a symbol: w, g, or c when the man rows across with the wolf, goat, or cabbage, and n when he rows across with nothing. A state of the puzzle records what is on each bank; in the diagrams, E: lists the contents of the east bank and W: the contents of the west bank, with everything starting on the east side. The only safe first move is for the man to row the goat across, which we draw as an arrow labeled g:
[Diagram: the start state (E: mwgc, W: empty) with a g-arrow to the state reached after that move (E: wc, W: mg).]
What happens if the g move is used again? Then the man rows back to the east side with the
goat, and the whole puzzle returns to the initial state. We include this transition as another
arrow:
[Diagram: the same two states, now connected by g-arrows in both directions.]
In this way, a state transition diagram can show all legal transitions and all reachable
states, omitting those illegal states where something gets eaten:
[Diagram: the full state-transition diagram for the puzzle, with ten legal states labeled by the contents of the east and west banks, connected by arrows labeled w, g, c, and n.]
An extra arrow indicates the start state, and a double circle indicates the state in which the
diagram accepts that the solution is correct.
By starting in the start state and making one transition on each symbol in the string
gnwgcng, you can check that this is indeed a minimum-length solution to the problem. The
other minimum-length solution is gncgwng. There are longer solutions too, which involve
repeated states. For example, the man can just row back and forth with the goat any number
of times before completing the solution: gggggggggncgwng. In fact, there are infinitely many
possible solutions, though only two minimum-length solutions. The language of strings
representing legal solutions to the problem can be described with reference to the diagram
above: {x ∈ {w, g, c, n}* | starting in the start state and following the transitions of x ends up in the
accepting state}.
The moves not shown in the diagram, those after which something would get eaten, can all be sent to one additional state, an error state:
[Diagram: an error state with a single self-loop labeled w, g, c, n.]
The transition arrow is labeled with all the symbols in the alphabet and simply returns to the
same state. This shows that the state is a trap: once reached, it cannot be departed from. All
the transitions that were unspecified in the original diagram can now be shown as explicit
transitions to this error state. The resulting fully specified diagram is shown on the following
page.
This more elaborate diagram can handle any string over the alphabet {w, g, c, n}. By
starting in the start state and following the transitions on each character in the string, you
end up in a final state. If that final state is the accepting state, the string is a solution; if it is
any other state, the string is not a solution.
[Diagram: the fully specified diagram, with the ten puzzle states plus the error state; every transition that was unspecified in the original diagram now goes to the error state.]
A deterministic finite automaton (DFA) is a diagram with a finite set of states, drawn as circles; exactly one state is marked as the start state, and any number of states are marked with double circles as accepting states. From every state, for every symbol in the alphabet Σ, there is exactly one
arrow labeled with that symbol going to another state (or back to the same
state).
DFAs define languages, just as our man-wolf-goat-cabbage automaton defined the language
of solutions to the puzzle. Given any string over its alphabet, a DFA can read the string and
follow its state-to-state transitions. At the end of the string it is either in an accepting state
or not. If it is an accepting state, we say that the machine accepts the string; otherwise we
say that it rejects the string. The language defined by a DFA M is just the set of strings in Σ*
accepted by M.
For example, here is a DFA for {xa | x ∈ {a, b}*}—that is, the language of strings over the
alphabet {a, b} that end in a:
[Diagram: a two-state DFA; the start state loops on b and moves on a to the accepting state, which loops on a and moves back on b.]
This diagram meets all the requirements for a DFA: it has a finite set of states with a single
start state, and it has exactly one transition from every state on every symbol in the alphabet
{a, b}. Unlike the man-wolf-goat-cabbage diagram, the states in this DFA are unlabeled. It
does not hurt to label the states, if they have some meaning that you want to call attention
to, or if you need to refer to the states by name for some reason. We could, for example, draw
the DFA above like this:
[Diagram: the same DFA with descriptive labels written inside its two states.]
But the labels of states (unlike the labels on the arrows) have no impact on the behavior of
the machine. They are rather like comments in a computer program.
There is one important convention you should know about before attempting the
exercises at the end of the chapter. If a DFA has more than one arrow with the same source
and destination states, like this:
[Diagram: two states connected by two parallel arrows, one labeled a and one labeled b.]
then we usually draw it more compactly as a single arrow with a list of symbols as its label,
like this:
[Diagram: the same two states connected by a single arrow labeled a,b.]
These two forms mean exactly the same thing: they both show a state transition that the
DFA can make on either a or b.
2.4 The 5-Tuple
A DFA M is a 5-tuple M = (Q, Σ, δ, q0, F), where Q is a finite set of states, Σ is an alphabet, δ ∈ (Q × Σ → Q) is the transition function, q0 ∈ Q is the start state, and F ⊆ Q is the set of accepting states.
The first part of a DFA is the set Q of states; this corresponds to the set of states drawn
as circles in a DFA diagram. The different states in Q are usually referred to as qi for different
values of i; note that the definition implies that there must be at least one state in Q, namely
the start state q0.
The alphabet Σ is the second part of a DFA. When you draw a diagram for a DFA, the
alphabet is implicit; you can tell what it must be by looking at the labels on the arrows. For
the formal definition of a DFA, however, the alphabet is explicit.
The third part of a DFA is the transition function δ. The definition says δ ∈ (Q × Σ → Q).
This is the mathematical way of giving the type of the function, and it says that δ takes two
inputs, a state from Q and a symbol from Σ, and produces a single output, another state in
Q. If the DFA is in state qi reading symbol a, then δ(qi, a) is the state to go to next. Thus the
transition function δ encodes all the information about the arrows in a DFA diagram.
The fourth part of the DFA is the start state q0. There must be exactly one start state.
The fifth and final part of the DFA is F, a subset of Q identifying the accepting states, those
that are double-circled in the DFA diagram. There is nothing in the definition that prevents
F from being {}—in that case there would be no accepting states, so the machine would
reject all strings and the language would be {}. There is also nothing that prevents F from
being equal to Q—in that case all states would be accepting states, so the machine would
accept all strings and the language would be Σ*.
Consider for example this DFA for the language {xa | x ∈ {a, b}*}:
[Diagram: the two-state DFA for this language, with states labeled q0 (the start state) and q1 (accepting); q0 loops on b and moves to q1 on a, while q1 loops on a and moves back to q0 on b.]
Formally, the DFA shown is M = (Q, Σ, δ, q0, F), where Q = {q0, q1}, Σ = {a, b},
F = {q1}, and the transition function δ is
δ(q0, a) = q1
δ(q0, b) = q0
δ(q1, a) = q1
δ(q1, b) = q0
A DFA is a 5-tuple: it is a mathematical structure with five parts given in order.
The names given to those five parts in the definition above—Q, Σ, δ, and so forth—
are conventional, but the machine M could just as well have been defined as
M = ({q0, q1}, {a, b}, δ, q0, {q1}), without naming all the parts. The important thing
is to specify the five parts in the required order.
A string x ∈ Σ* is accepted by a DFA M = (Q, Σ, δ, q0, F) if and only if δ*(q0, x) ∈ F, where δ* is the extended transition function: δ*(q, ε) = q, and δ*(q, ya) = δ(δ*(q, y), a) for any string y and symbol a.
That is, a string x is accepted if and only if the DFA, started in its start state and taken
through the sequence of transitions on the symbols in x, ends up in one of its accepting
states. We can also define L(M), the language accepted by a DFA M:
For any DFA M = (Q, Σ, δ, q0, F), L(M) denotes the language accepted by M,
which is L(M) = {x ∈ Σ* | δ*(q0, x) ∈ F}.
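One way to read these definitions is as a recipe for a simulator. The sketch below (not from the text; the names are made up) encodes the example DFA with states q0 and q1 as the ints 0 and 1, and computes δ* by applying δ once per symbol:
public class EndsInA {
    // delta: the transition function of the example DFA
    static int delta(int q, char c) {
        return (c == 'a') ? 1 : 0;   // every a leads to q1, every b leads to q0
    }

    // deltaStar: apply delta once per symbol, starting from state q
    static int deltaStar(int q, String x) {
        for (int i = 0; i < x.length(); i++) q = delta(q, x.charAt(i));
        return q;
    }

    // x is accepted if and only if deltaStar(q0, x) is in F = {q1}
    static boolean accepts(String x) {
        return deltaStar(0, x) == 1;
    }

    public static void main(String[] args) {
        System.out.println(accepts("abb"));   // false: does not end in a
        System.out.println(accepts("abba"));  // true: ends in a
    }
}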
A language is regular if and only if it is L(M) for some DFA M.
A direct way to prove that a language is regular is to give a DFA that accepts it. To prove that
a language is not regular is generally much harder, and this is a topic that will be addressed
later in this book.
Exercises
EXERCISE 1
For each of the following strings, say whether it is in the language accepted by this DFA:
[Diagram: a DFA over {a, b} (not preserved here).]
a. b
b. ε
c. ab
d. abba
e. ababaaaab
EXERCISE 2
For each of the following strings, say whether it is in the language accepted by this DFA:
[Diagram: a DFA over {0, 1} (not preserved here).]
a. 0
b. 11
c. 110
d. 1001
e. 101
EXERCISE 3
Say what language is accepted by each of the following DFAs. Do not just describe the
language in English; write your answer as a set, using set former notation if necessary.
For example, if the DFA shown is
[Diagram: an example DFA over {a, b}; the example answer and part a are not preserved here.]
b. [Diagram not preserved here.]
c. [Diagram not preserved here.]
d. [Diagram not preserved here.]
e. [Diagram not preserved here.]
EXERCISE 4
For each of the following languages, draw a DFA accepting that language. The alphabet
for the DFA in all cases should be {a, b}.
a. {a, b}*
b. {a}*
c. {x ∈ {a, b}* | the number of as in x is odd}
d. {ax | x ∈ {a, b}*}
e. {(ab)ⁿ}
EXERCISE 5
Draw the DFA diagram for each of the following DFAs.
a. ({q0, q1}, {a, b}, δ, q0, {q1}), where the transition function δ is
δ(q0, a) = q1
δ(q0, b) = q1
δ(q1, a) = q0
δ(q1, b) = q0
b. ({q0, q1, q2}, {0, 1}, δ, q0, {q0}), where the transition function δ is
δ(q0, 0) = q0
δ(q0, 1) = q1
δ(q1, 0) = q2
δ(q1, 1) = q0
δ(q2, 0) = q1
δ(q2, 1) = q2
c. ({q0, q1, q2}, {a, b}, δ, q0, {q2}), where the transition function δ is
δ(q0, a) = q1
δ(q0, b) = q0
δ(q1, a) = q1
δ(q1, b) = q2
δ(q2, a) = q2
δ(q2, b) = q2
EXERCISE 6
State each of the following DFAs formally, as a 5-tuple.
a. [Diagram not preserved here.]
b. [Diagram not preserved here.]
c. [Diagram not preserved here.]
EXERCISE 7
For each machine M in the previous exercise, state what the language L(M) is, using set
formers.
EXERCISE 8
Evaluate each of these expressions for the following DFA:
[Diagram: a DFA over {0, 1} with states q0, q1, and q2 (transitions not preserved here).]
a. δ(q2, 0)
b. δ*(q0, 010)
c. δ(δ*(q1, 010), 1)
d. δ*(q2, ε)
e. δ*(q2, 1101)
f. δ*(q2, 110111)
EXERCISE 9
Draw the diagram for a DFA for each of the following languages.
a. {x ∈ {a, b}* | x contains at least 3 as}
b. {x ∈ {a, b}* | x contains at least 3 consecutive as}
CHAPTER 3
Closure Properties for Regular Languages
3.1 Closed under Complement
Consider the language L defined by this DFA:
[Diagram: a three-state DFA over {0, 1}. From the start state q0, a 0 leads to the accepting state q1 and a 1 leads to the nonaccepting state q2; both q1 and q2 loop on 0 and 1.]
L is the language of strings over the alphabet {0, 1} that start with a 0. The complement of L,
written as L̄, is the language of strings over the alphabet {0, 1} that do not start with a 0. We
can show that this is a regular language too, by constructing a DFA to recognize it:
[Diagram: the same three states and transitions, but with the accepting and nonaccepting states exchanged: q0 and q2 are accepting, and q1 is not.]
Comparing the two DFAs, you can see that the machine for L̄ is the same as the machine for
L, with one difference: the accepting states and the nonaccepting states have been exchanged.
Whenever the first machine accepts, the second machine rejects; whenever the first machine
rejects, the second machine accepts.
This trick for constructing a complemented DFA can be generalized to handle any
complemented language. First, here is a formal definition of the complement of a language:
The complement of a language L over an alphabet Σ is L̄ = {x ∈ Σ* | x ∉ L}.
Notice that the complement of a language is always taken with respect to the underlying
alphabet Σ. This guarantees that the complement of a complement is the original language:
complementing L̄ gives back L.
Now we can easily prove a useful property of the regular languages:
Theorem 3.1: If L is any regular language, L̄ is also a regular language.
Proof: Let L be any regular language. By definition there must be some DFA
M = (Q, Σ, δ, q0, F) with L(M) = L. Define a new DFA
M' = (Q, Σ, δ, q0, Q - F) using the same set of states, alphabet, transition
function, and start state as M, but using Q - F as its set of accepting states. Now
for any x ∈ Σ*, M' accepts x if and only if M rejects x. So L(M') = L̄. Since M'
is a DFA that accepts L̄, it follows that L̄ is regular.
This theorem shows that if you take any regular language and complement it, you
still get a regular language. In other words, you cannot leave the class of regular languages
by using the complement operation. We say that the regular languages are closed under
complement. One useful thing about such closure properties is that they give you shortcuts
for proving that a language belongs to a particular class. If you need to know whether L̄
is a regular language and you already know that L is, you can immediately conclude that
L̄ is (without actually constructing a DFA for it) simply by invoking the closed-under-
complement property of the regular languages.
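The construction in the proof of Theorem 3.1 is mechanical enough to express in code. A minimal sketch (not from the text; the representation is invented here): if a DFA's accepting states are stored as a boolean array indexed by state number, the complement machine keeps the same states, alphabet, transitions, and start state, and simply flips that array.
public class Complement {
    // Given accepting[q] == true exactly when q is in F,
    // return the accepting set Q - F of the complement machine.
    static boolean[] complementAccepting(boolean[] accepting) {
        boolean[] flipped = new boolean[accepting.length];
        for (int q = 0; q < accepting.length; q++) {
            flipped[q] = !accepting[q];
        }
        return flipped;
    }
}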
3.2 Closed under Intersection
The intersection of two languages L1 and L2 is L1 ∩ L2 = {x | x ∈ L1 and x ∈ L2}.
This definition works fine no matter what the underlying alphabets are like. If L1 and L2 are
over the same alphabet Σ, which is the most common case, then L1 ∩ L2 is a language over
the same alphabet—it may not actually use all the symbols of Σ, but it certainly does not use
more. Similarly, if L1 and L2 are languages over different alphabets Σ1 and Σ2, then L1 ∩ L2 is
a language over Σ1 ∪ Σ2. We will generally assume that L1 and L2 are over the same alphabet.
The regular languages are closed under intersection. As in the previous section, we can
prove this by construction: given any two DFAs, there is a mechanical way to combine them
into a third DFA that accepts the intersection of the two original languages. For example,
consider the languages L1 = {0x | x ∈ {0, 1}*} and L2 = {x0 | x ∈ {0, 1}*}: the language of
strings that start with a 0 and the language of strings that end with a 0. Here are DFAs M1
and M2 for the two languages:
[Diagram: the DFAs M1 and M2. M1 has start state q0; on a 0 it moves to the accepting state q1, and on a 1 it moves to the state q2; both q1 and q2 loop on 0 and 1. M2 has start state r0, which loops on 1 and moves to the accepting state r1 on 0; r1 loops on 0 and moves back to r0 on 1.]
We will define a DFA M3 that keeps track of where M1 is and where M2 is after each
symbol. Initially, M1 is in state q0 and M2 is in state r0; our new machine will reflect this by
starting in a state named (q0, r0 ):
[Diagram: the start state of M3, labeled (q0, r0).]
Now when M1 is in state q0 and reads a 0, it goes to state q1, and when M2 is in state
r0 and reads a 0, it goes to state r1, so when M3 is in state (q0, r0 ) and reads a 0, it goes to a
new state named (q1, r1). That way it keeps track of what both M1 and M2 are doing. We can
define the transition on 1 similarly, adding this to the construction of M3:
[Diagram: the construction so far: from (q0, r0), a 0 leads to a new state (q1, r1) and a 1 leads to a new state (q2, r0).]
The process of constructing M3 is not infinite, since there are only six possible states of the
form (qi, rj ). In fact, the construction is already about to repeat some states: for example,
from (q1, r1) on a 0 it will return to (q1, r1). The fully constructed DFA M3 looks like this:
[Diagram: the fully constructed M3, with five states (q0, r0), (q1, r1), (q1, r0), (q2, r0), and (q2, r1); its only accepting state is (q1, r1).]
Notice that we did not need a state for the pair (q0, r1). That is because this combination
of states is unreachable: it is not possible for M1 to be in the state q0 while M2 is in the state
r1. The constructed machine accepts in the state (q1, r1), which is exactly where both M1 and
M2 accept. Since the new machine in effect simulates both the original machines and accepts
only when both accept, we can conclude that the language accepted is L1 ∩ L2.
The construction uses an operation on sets that you may not have seen before. Given
any two sets, you can form the set of all pairs of elements—pairs in which the first is from
the first set and the second is from the second set. In mathematics this is called the Cartesian
product of two sets:
The Cartesian product of two sets Q and R is Q × R = {(q, r) | q ∈ Q and r ∈ R}.
To express the general form of the construction combining two DFAs, we use the Cartesian
product of the two sets of states. For this reason, the construction used in the following proof
is called the product construction.
Theorem 3.2: If L1 and L2 are any regular languages, L1 ∩ L2 is also a regular
language.
Proof: Let L1 and L2 be any two regular languages. Since they are regular, there
must be some DFAs M1 = (Q, Σ, δ1, q0, F1) with L(M1) = L1 and
M2 = (R, Σ, δ2, r0, F2) with L(M2) = L2. Construct a new DFA
M3 = (Q × R, Σ, δ, (q0, r0), F1 × F2), where δ is defined so that for all q ∈ Q,
r ∈ R, and a ∈ Σ, we have δ((q, r), a) = (δ1(q, a), δ2(r, a)). Since this DFA
simulates both M1 and M2 and accepts if and only if both accept, we can
conclude that L(M3) = L1 ∩ L2. It follows that L1 ∩ L2 is a regular language.
Notice that the machine constructed for the proof includes all the states in Q × R, which may
include some unreachable states. In the case of our earlier example, this is a six-state DFA
instead of the five-state DFA. The extra state is not reachable by any path from the start state,
so it has no effect on the language accepted.
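The product construction is also easy to mechanize. In the sketch below (not from the text; the representation is invented here), a DFA over a k-symbol alphabet is a transition table delta[state][symbol] plus a boolean accepting array, and the pair (q, r) is encoded as the single integer q * n2 + r, where n2 is the number of states of M2; the start state of the product machine is then the encoding of (q0, r0).
public class Product {
    int[][] delta;          // transition table of the product machine
    boolean[] accepting;    // accepting states of the product machine

    Product(int[][] d1, boolean[] f1, int[][] d2, boolean[] f2) {
        int n1 = d1.length, n2 = d2.length, k = d1[0].length;
        delta = new int[n1 * n2][k];
        accepting = new boolean[n1 * n2];
        for (int q = 0; q < n1; q++) {
            for (int r = 0; r < n2; r++) {
                int qr = q * n2 + r;   // the pair (q, r) encoded as one int
                for (int a = 0; a < k; a++) {
                    // delta((q, r), a) = (delta1(q, a), delta2(r, a))
                    delta[qr][a] = d1[q][a] * n2 + d2[r][a];
                }
                // intersection: accept when both accept
                // (for union, as in Section 3.3, use f1[q] || f2[r] instead)
                accepting[qr] = f1[q] && f2[r];
            }
        }
    }
}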
3.3 Closed under Union
The union of two languages L1 and L2 is L1 ∪ L2 = {x | x ∈ L1 or x ∈ L2}.
As with intersection, we will generally assume that L1 and L2 are over the same alphabet, but
the definition works fine even if they are not.
The regular languages are closed under union. There is a simple proof of this using
DeMorgan’s laws. There is also a direct proof using the product construction, but choosing a
different set of accepting states.
Theorem 3.3: If L1 and L2 are any regular languages, L1 ∪ L2 is also a regular
language.
Proof 1: By DeMorgan’s laws, L1 ∪ L2 is the complement of L̄1 ∩ L̄2. This defines union in terms
of intersection and complement, for which the regular languages have already
been shown to be closed.
Proof 2: Use the product construction as in the proof of closure under
intersection, but for the accepting states use (F1 × R) ∪ (Q × F2). Now the
constructed DFA simulates both M1 and M2 and accepts if and only if either
or both accept. We conclude that L(M3) = L1 ∪ L2 and that L1 ∪ L2 is a regular
language.
The product construction for union is almost the same as the product construction for
intersection. The only difference is in the choice of accepting states. For intersection we make
the new DFA accept when both the original DFAs accept, while for union we make the new
DFA accept when either or both of the original DFAs accept. For example, in Section 3.2 we
produced a DFA for the intersection of the two languages {0x | x ∈ {0, 1}*} and {x0 | x ∈
{0, 1}*}. For the union of those same two languages, we could use the same set of states and
the same transition function:
[Diagram: the same five-state product DFA, but now with three accepting states: (q1, r1), (q1, r0), and (q2, r1).]
The only difference is that this DFA accepts in more states. It accepts if either or both of the
original DFAs accept.
3.4 DFA Proofs Using Induction
Consider again the proof of Theorem 3.2:
Proof: Let L1 and L2 be any two regular languages. Since they are regular, there
must be some DFAs M1 = (Q, Σ, δ1, q0, F1) with L(M1) = L1 and
M2 = (R, Σ, δ2, r0, F2) with L(M2) = L2. Construct a new DFA
M3 = (Q × R, Σ, δ, (q0, r0), F1 × F2), where δ is defined so that for all
q ∈ Q, r ∈ R, and a ∈ Σ, we have δ((q, r), a) = (δ1(q, a), δ2(r, a)). Since this
DFA simulates both M1 and M2 and accepts if and only if both accept, we can
conclude that L(M3) = L1 ∩ L2. It follows that L1 ∩ L2 is a regular language.
What exactly does it mean that the new DFA “simulates both M1 and M2”? Formally
speaking, it means this:
Lemma 3.1: In the product construction, for all x ∈ Σ*, δ*((q0, r0), x) =
(δ1*(q0, x), δ2*(r0, x)).
That is, for any input string x, the new machine finds exactly the pair of final states that
the two original machines reach. Our proof of Theorem 3.2 assumed Lemma 3.1 was true
without proving it.
Does Lemma 3.1 require any proof? By construction, we know that for all q ∈ Q,
r ∈ R, and a ∈ Σ, we have δ((q, r), a) = (δ1(q, a), δ2(r, a)). This is something like Lemma 3.1,
but only for a single symbol a, not for a whole string x. Whether the leap from the δ property
to the corresponding δ* property is obvious enough to be passed over without comment
is largely a matter of taste. Whenever you prove something, you have to exercise judgment
about how much detail to give. Not enough detail makes a proof unconvincing, while too
much makes it tiresome (and just as unconvincing). But even if you think Lemma 3.1 is
obvious enough not to require any proof, let’s prove it anyway.
For any fixed length of x, it is easy to prove Lemma 3.1. For example, when |x| = 0
δ*((q0, r0), x)
= δ*((q0, r0), ε) (since |x| = 0)
= (q0, r0) (by the definition of δ*)
= (δ1*(q0, ε), δ2*(r0, ε)) (by the definitions of δ1* and δ2*)
= (δ1*(q0, x), δ2*(r0, x)) (since |x| = 0)
And when |x| = 1
δ*((q0, r0), x)
= δ*((q0, r0), ya) (for some symbol a and string y)
= δ(δ*((q0, r0), y), a) (by the definition of δ*)
= δ((δ1*(q0, y), δ2*(r0, y)), a) (using Lemma 3.1 for |y| = 0)
= (δ1(δ1*(q0, y), a), δ2(δ2*(r0, y), a)) (by the construction of δ)
= (δ1*(q0, ya), δ2*(r0, ya)) (by the definitions of δ1* and δ2*)
= (δ1*(q0, x), δ2*(r0, x)) (since x = ya)
Notice that we used the fact that we already proved Lemma 3.1 for strings of length 0. When
|x| = 2
δ*((q0, r0), x)
= δ*((q0, r0), ya) (for some symbol a and string y)
= δ(δ*((q0, r0), y), a) (by the definition of δ*)
= δ((δ1*(q0, y), δ2*(r0, y)), a) (using Lemma 3.1 for |y| = 1)
= (δ1(δ1*(q0, y), a), δ2(δ2*(r0, y), a)) (by the construction of δ)
= (δ1*(q0, ya), δ2*(r0, ya)) (by the definitions of δ1* and δ2*)
= (δ1*(q0, x), δ2*(r0, x)) (since x = ya)
Here, we used the fact that we already proved Lemma 3.1 for strings of length 1. As you can
see, the proof for |x| = 2 is almost the same as the proof for |x| = 1. In general, you can easily
continue with proofs for |x| = 3, 4, 5, 6, and so on, each one using the fact that the lemma
was already proved for shorter strings. Unfortunately, this is a proof process that never ends.
What we need is a finite proof that Lemma 3.1 holds for all the infinitely many different
lengths of x.
To prove Lemma 3.1 for all string lengths we use an inductive proof.
Proof: By induction on |x|.
Base case: When |x| = 0, we have
δ*((q0, r0), x)
= δ*((q0, r0), ε) (since |x| = 0)
= (q0, r0) (by the definition of δ*)
= (δ1*(q0, ε), δ2*(r0, ε)) (by the definitions of δ1* and δ2*)
= (δ1*(q0, x), δ2*(r0, x)) (since |x| = 0)
Inductive case: When |x| > 0, we have x = ya for some symbol a and some shorter string y, and
δ*((q0, r0), x)
= δ*((q0, r0), ya)
= δ(δ*((q0, r0), y), a) (by the definition of δ*)
= δ((δ1*(q0, y), δ2*(r0, y)), a) (by the inductive hypothesis, since |y| < |x|)
= (δ1(δ1*(q0, y), a), δ2(δ2*(r0, y), a)) (by the construction of δ)
= (δ1*(q0, ya), δ2*(r0, ya)) (by the definitions of δ1* and δ2*)
= (δ1*(q0, x), δ2*(r0, x)) (since x = ya)
An inductive proof of this kind has the same shape as a recursive method:
void proveit(int n) {
    if (n == 0) {
        // base case: prove the lemma for the empty string
    } else {
        proveit(n - 1);
        // prove for strings of length n, assuming the n-1 case has been proved
    }
}
Not all inductive proofs follow this pattern exactly, of course. There are as many different
ways to prove something with induction as there are to program something with recursion.
But DFA-related proofs often do follow this pattern. Many such proofs use induction on the
length of the string, with the empty string as the base case.
3.5 A Mystery DFA
Consider this DFA:
[Diagram: a DFA over {0, 1} with three states, named 0, 1, and 2. State 0 is the start state and the only accepting state. On a 0, state 0 stays at 0, state 1 goes to 2, and state 2 goes to 1; on a 1, state 0 goes to 1, state 1 goes to 0, and state 2 stays at 2.]
What language does it accept? By experimenting with it, you can see that it rejects the
strings 1, 10, 100, 101, 111, and 1000, while it accepts 0, 11, 110, and 1001. Do you see
a pattern there? Can you give an intuitive characterization of the language, before reading
further?
First, here is a lemma that summarizes the transition function δ:
Lemma 3.2.1: For all states i ∈ Q and symbols c ∈ Σ,
δ(i, c) = (2i + c) mod 3.
Proof: By enumeration. Notice that we have named the states with the numbers
0, 1, and 2, which lets us do direct arithmetic on them. We have
δ(0, 0) = 0 = (2·0 + 0) mod 3
δ(0, 1) = 1 = (2·0 + 1) mod 3
δ(1, 0) = 2 = (2·1 + 0) mod 3
δ(1, 1) = 0 = (2·1 + 1) mod 3
δ(2, 0) = 1 = (2·2 + 0) mod 3
δ(2, 1) = 2 = (2·2 + 1) mod 3
Now for any string x over the alphabet {0, 1}, define val(x) as the natural number for
which x is a binary representation. (For completeness, define val(ε) = 0.) So, for example,
val(11) = 3, val(111) = 7, and val(000) = val(0) = val(ε) = 0. Using this, we can describe the
language accepted by the mystery DFA: L(M ) = {x | val(x) mod 3 = 0}. In other words, it is
the language of strings that are binary representations of numbers that are divisible by three.
Let’s prove by induction that L(M ) = {x | val(x) mod 3 = 0}. This illustrates a common
pitfall for inductive proofs and a technique for avoiding it. If you try to prove directly the
hypothesis L(M) = {x | val(x) mod 3 = 0}, you get into trouble:
Lemma 3.2.2 (weak): L(M ) = {x | val(x) mod 3 = 0}.
Proof: By induction on |x|.
Base case: When |x| = 0, we have
δ*(0, x)
= δ*(0, ε) (since |x| = 0)
= 0 (by definition of δ*)
So in this case x ∈ L(M) and val(x) mod 3 = 0.
Inductive case: When |x| > 0, we have
δ*(0, x)
= δ*(0, yc) (for some symbol c and string y)
= δ(δ*(0, y), c) (by definition of δ*)
= ???
Here the proof would falter, because the inductive hypothesis we’re using is not strong
enough to make progress with. It tells us that δ*(0, y) = 0 if and only if val(y) mod 3 = 0, but
it does not tell us what δ*(0, y) is when val(y) mod 3 ≠ 0. Without knowing that, we can’t
make progress from here.
To make a successful proof of Lemma 3.2.2, we actually need to prove something even
stronger. We will prove that δ*(0, x) = val(x) mod 3. This implies the weak version of Lemma
3.2.2, because state 0 is the only accepting state. But it is stronger than that, because
it tells you exactly what state the DFA will end up in after reading any string in {0, 1}*. Using
that trick, let’s try the proof again:
Lemma 3.2.2 (strong): δ*(0, x) = val(x) mod 3.
Proof: By induction on |x|.
Base case: When |x| = 0, we have
δ*(0, x)
= δ*(0, ε) (since |x| = 0)
= 0 (by definition of δ*)
= val(x) mod 3 (since val(x) mod 3 = val(ε) mod 3 = 0)
Inductive case: When |x| > 0, we have
δ*(0, x)
= δ*(0, yc) (for some symbol c and string y)
= δ(δ*(0, y), c) (by definition of δ*)
= δ(val(y) mod 3, c) (using the inductive hypothesis)
= (2(val(y) mod 3) + c) mod 3 (by Lemma 3.2.1)
= (2·val(y) + c) mod 3 (using modular arithmetic)
= val(yc) mod 3 (using binary arithmetic, since val(yc) = 2·val(y) + c)
= val(x) mod 3 (since x = yc)
This technique is something you will often need to use with inductive proofs; to make
the induction go through, you will often find that you need to prove something stronger,
something more detailed, than your original hypothesis. This is a little counterintuitive at
first. It seems like it should be more difficult to prove something stronger and easier to prove
something weaker. But with induction, you get to use the thing you are trying to prove as the
inductive hypothesis, so proving something stronger can give you more to work with.
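Induction aside, the strong claim δ*(0, x) = val(x) mod 3 is also easy to spot-check by brute force. The sketch below (not from the text; the class and method names are made up) compares the DFA's final state with val(x) mod 3 for every string over {0, 1} of length up to 12:
public class Mod3Check {
    // Lemma 3.2.1: delta(i, c) = (2i + c) mod 3
    static int delta(int i, int c) { return (2 * i + c) % 3; }

    static int deltaStar(int q, String x) {
        for (int i = 0; i < x.length(); i++) q = delta(q, x.charAt(i) - '0');
        return q;
    }

    public static void main(String[] args) {
        int mismatches = 0;
        for (int len = 1; len <= 12; len++) {
            for (int bits = 0; bits < (1 << len); bits++) {
                // the length-len binary string for bits, padded with leading zeros
                String x = String.format("%" + len + "s",
                        Integer.toBinaryString(bits)).replace(' ', '0');
                int val = Integer.parseInt(x, 2);   // val(x)
                if (deltaStar(0, x) != val % 3) mismatches++;
            }
        }
        System.out.println(mismatches + " mismatches found");  // prints 0 mismatches found
    }
}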
Exercises
EXERCISE 1
Draw DFAs for the following languages:
a. {0x0 | x ∈ {0, 1}*}
b. the complement of the language above
c. {xabay | x ∈ {a, b}* and y ∈ {a, b}*}
d. the complement of the language above
e. {waxayaz | w, x, y, and z are all in {a, b}*}
f. the complement of the language above
EXERCISE 2
The DFA constructed for the example in Section 3.2 accepts the language of strings over
the alphabet {0, 1} that start and end with a 0. It is not, however, the smallest DFA to do
so. Draw a DFA for the same language, using as few states as possible.
EXERCISE 3
The DFA constructed for the example in Section 3.2 has five states, but the full DFA
given by the product construction has six. Draw the full six-state DFA.
EXERCISE 4
Consider the following three DFAs and the languages they define:
[Diagram: three DFAs, with state sets {q, r}, {s, t, u}, and {v, w, x} (transitions not preserved here).]
EXERCISE 6
Consider this DFA M:
[Diagram: a two-state DFA M over {a, b}, with states 0 and 1 (transitions not preserved here).]
CHAPTER 4
Deterministic-Finite-Automata Applications
We have seen how DFAs can be used to define formal languages.
In addition to this formal use, DFAs have practical applications.
DFA-based pieces of code lie at the heart of many commonly used
computer programs.
[Diagram: the three-state DFA from the previous chapter, which accepts binary representations of numbers divisible by three.]
Now let’s see how this DFA can be implemented in the Java programming language.
The first thing to deal with is the input alphabet. The DFA above uses the alphabet
{0, 1}, which is the alphabet of interest for this problem. But the program will work with a
typed input string, so we do not have the luxury of restricting the alphabet in this way. The
program should accept "011" (a representation for the number 3) and reject "101" (a
representation for the number 5), but it must also properly reject strings like "01i" and
"fred". The alphabet for the Java implementation must be the whole set of characters that
can occur in a Java string—that is, the whole set of values making up the Java char type.
The DFA we actually implement will have four states, like this:
[Diagram: the implemented DFA. It has the three original states 0, 1, and 2, plus an error state 3. The 0 and 1 transitions among states 0, 1, and 2 are as before; every other character (Σ − {0, 1}) leads to state 3, and state 3 loops back to itself on every character in Σ.]
An object of the Mod3 class represents such a DFA. A Mod3 object has a current state,
which is encoded using the integers 0 through 3. The class definition begins like this:
/**
* A deterministic finite-state automaton that
* recognizes strings that are binary representations
* of natural numbers that are divisible
* by 3. Leading zeros are permitted, and the
* empty string is taken as a representation for 0
* (along with "0", "00", and so on).
*/
public class Mod3 {
/*
* Constants q0 through q3 represent states, and
* a private int holds the current state code.
*/
private static final int q0 = 0;
private static final int q1 = 1;
private static final int q2 = 2;
private static final int q3 = 3;
private int state;
The int variables q0, q1, q2, and q3 are private (visible only in this class), static
(shared by all objects of this class), and final (not permitted to change after initialization).
In effect, they are named constants. The int field named state will hold the current state
of each Mod3 object.
The next part of this class definition is the transition function, a method named delta.
/**
* The transition function.
* @param s state code (an int)
* @param c char to make a transition on
* @return the next state code
*/
static private int delta(int s, char c) {
switch (s) {
case q0: switch (c) {
case '0': return q0;
case '1': return q1;
default: return q3;
}
case q1: switch (c) {
case '0': return q2;
case '1': return q0;
default: return q3;
}
case q2: switch (c) {
case '0': return q1;
case '1': return q2;
default: return q3;
}
default: return q3;
}
}
The delta method is declared private (cannot be called from outside the class) and
static (is not given an object of the class to work on). It is thus a true function, having
no side effects and returning a value that depends only on its parameters. It computes the
transition function shown in the previous DFA diagram.
Next, the class defines methods that modify the state field of a Mod3 object. The first
resets it to the start state; the second applies the transitions required for a given input string.
/**
* Reset the current state to the start state.
*/
public void reset() {
state = q0;
}
/**
* Make one transition on each char in the given
* string.
* @param in the String to use
*/
public void process(String in) {
for (int i = 0; i < in.length(); i++) {
char c = in.charAt(i);
state = delta(state, c);
}
}
The process method makes one transition on each character in the input string. Note
that process handles the empty string correctly—by doing nothing.
The only other thing required is a method to test whether, after processing an input
string, the DFA has ended in an accepting state:
/**
* Test whether the DFA accepted the string.
* @return true if the final state was accepting
*/
public boolean accepted() {
return state==q0;
}
}
That is the end of the class definition. To test whether a string s is in the language defined by
this DFA, we would write something like
Mod3 m = new Mod3();
m.reset();
m.process(s);
if (m.accepted()) ...
To demonstrate the Mod3 class we will use it in a Java application. This is a simple filter
program. It reads lines of text from the standard input, filters out those that are not binary
representations of numbers that are divisible by three, and echoes the others to the standard
output.
import java.io.*;
/**
* A Java application to demonstrate the Mod3 class by
* using it to filter the standard input stream. Those
* lines that are accepted by Mod3 are echoed to the
* standard output.
*/
public class Mod3Filter {
public static void main(String[] args)
throws IOException {
Mod3 m = new Mod3();
BufferedReader in =
new BufferedReader(new InputStreamReader(System.in));
String s = in.readLine();
while (s!=null) {
m.reset();
m.process(s);
if (m.accepted()) System.out.println(s);
s = in.readLine();
}
}
}
To test this program, we can create a file named numbers containing the numbers zero
through ten in binary:
0
1
10
11
100
101
110
111
1000
1001
1010
After compiling Mod3Filter (and Mod3), we can use it on the file numbers
to filter out all the numbers not divisible by 3. On a Unix system, the command
java Mod3Filter < numbers produces this output:
0
11
110
1001
4.3 Table-Driven Alternatives
Rather than coding the transition function as nested switch statements, we can store it in a
two-dimensional array delta and make each transition with a simple lookup of the form
state = delta[state,c]. Of course, the array delta must first be initialized with the appropriate transitions, so that
delta[q0,'0'] is q0, delta[q0,'1'] is q1, and so on. To avoid the possibility of
the reference delta[state,c] being out of bounds, delta will have to be initialized
with a very large array. The program uses only 4 values for state, but there are 65,536
possible values for c! That is because Java uses Unicode, a 16-bit character encoding, for
the char type. Depending on the source of the input string, we may be able to restrict this
considerably—we may know, for example, that the characters are 7-bit ASCII (with values 0
through 127). Even so, we will have to initialize the array so that delta[state,c] is q3
for every value of c other than '0' and '1'.
Instead of using a very large array, we could use a small array but handle the
exception that occurs when the array reference is out of bounds. In Java this is the
ArrayIndexOutOfBoundsException; the process method could catch this
exception and use the state q3 as the next state whenever it occurs. The definition of the
delta array and the process method would then be the following:
/*
* The transition function represented as an array.
* The next state from current state s and character c
* is at delta[s][c-'0'].
*/
static private int[][] delta =
{{q0,q1},{q2,q0},{q1,q2},{q3,q3}};
/**
* Make one transition on each char in the given
* string.
* @param in the String to use
*/
public void process(String in) {
for (int i = 0; i < in.length(); i++) {
char c = in.charAt(i);
try {
state = delta[state][c-'0'];
}
catch (ArrayIndexOutOfBoundsException ex) {
state = q3;
}
}
}
This is a reasonable way to solve the problem by hand. Automatically generated systems
usually use the full table with an element for every possible input character. One reason for
this is that when the full array is used, process need contain no reference to individual
states or characters. This way, any DFA can be implemented using the same process code,
just by substituting a different transition table.
Incidentally, the transition table is usually stored in a more compressed form than we
have shown. Our implementation used a full 32-bit int for each entry in the table, which
is quite wasteful. It would be relatively easy to implement this using one byte per entry,
and it would be possible to use even less, since we really need only two bits to represent
our four possible states. The degree of compression chosen is, as always, a trade-off: heavily
compressed representations take less space, but using them slows down each table access and
thus slows down each DFA transition.
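For instance, the four states of Mod3 fit easily in a byte, so the table could be declared with byte entries rather than int entries. The sketch below (not from the text) is one way to write that variant; reading an entry needs no cast, since a byte widens to an int automatically.
/*
 * A more compact transition table for Mod3 (hypothetical variant):
 * one byte per entry instead of a 32-bit int.
 */
static private byte[][] delta =
    {{q0, q1}, {q2, q0}, {q1, q2}, {q3, q3}};

public void process(String in) {
    for (int i = 0; i < in.length(); i++) {
        try {
            state = delta[state][in.charAt(i) - '0'];
        }
        catch (ArrayIndexOutOfBoundsException ex) {
            state = q3;   // any character other than '0' or '1'
        }
    }
}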
Exercises
EXERCISE 1
Reimplement Mod3 using the full transition table. Assume that the characters in the
input string are in the range 0 through 127.
EXERCISE 2
Using the DFA-based approach, write a Java class Mod4Filter that reads lines of text
from the standard input, filters out those that are not binary representations of numbers
that are divisible by four, and echoes the others to the standard output.
EXERCISE 3
Using the DFA-based approach, write a Java class ManWolf that takes a string from the
command line and reports whether or not it represents a solution to the man-wolf-goat-
cabbage problem of Chapter 2. For example, it should have this behavior:
CHAPTER 5
Nondeterministic Finite Automata
A DFA has exactly one transition from every state on every symbol
in the alphabet. By relaxing this requirement we get a related but
more flexible kind of automaton: the nondeterministic finite
automaton (NFA).
NFAs are a bit harder to think about than DFAs because they do
not appear to define simple computational processes. They may seem
at first to be unnatural, like puzzles invented by professors for the
torment of students. But have patience! NFAs and other kinds of
nondeterministic automata arise naturally in many ways, as you
will see later in this book, and they too have a variety of practical
applications.
[Diagram: an automaton with start state q0 and accepting state q1; q0 loops on a and b and also has an a-transition to q1.]
This is not a DFA, since it violates the rule that there must be exactly one transition from
every state on every symbol in the alphabet. There is no transition from the q1 state on either
a or b, and there is more than one transition from the q0 state on an a. This is an example of
a nondeterministic finite automaton (NFA).
How can you tell whether such a machine accepts a given string? Consider the string aa.
Following the arrows, there are three possible sequences of moves:
1. from q0 back to q0 on the first a, and back to q0 again on the second;
2. from q0 back to q0 on the first a, then to q1 on the second; and
3. from q0 to q1 on the first a, then getting stuck with no legal move on the second.
Only the second of these is an accepting sequence of moves. But the convention is that if
there is any way for an NFA to accept a string, that string is taken to be part of the language
defined by the NFA. The string aa is in the language of the NFA above because there is a
sequence of legal moves that reaches the end of aa in an accepting state. The fact that there
are also sequences of legal moves that do not succeed is immaterial.
The language defined by the NFA above is {xa | x ∈ {a, b}*}—that is, the language of
strings over the alphabet {a, b} that end in a. The machine can accept any such string by
staying in the q0 state until it reaches the a at the end, then making the transition to q1 as its
final move.
A DFA can be thought of as a special kind of NFA—an NFA that happens to specify
exactly one transition from every state on every symbol. In this sense every DFA is an NFA,
but not every NFA is a DFA. So in some ways the name nondeterministic finite automaton
is misleading. An NFA might more properly be called a not-necessarily-deterministic finite
automaton.
The extra flexibility afforded by NFAs can make them much easier to construct. For
example, consider the language {x ∈ {0, 1}* | the next-to-last symbol in x is 1}. The smallest
DFA for the language is this one:
[Figure: the smallest DFA for this language, with four states]
The connection between the language and this DFA is not obvious. (With some effort you
should be able to convince yourself that the four states correspond to the four possibilities for
the last two symbols seen—00, 01, 10, or 11—and that the two accepting states are the states
corresponding to 10 and 11, where the next-to-last symbol is a 1.) An NFA for the language
can be much simpler:
[Figure: an NFA for the same language: the start state has a self-loop on 0 and 1 and a transition on 1 to a second state, which has a transition on 0 and on 1 to a third, accepting state]
This NFA has fewer states and far fewer transitions. Moreover, it is much easier to
understand. It can clearly accept a string if and only if the next-to-last symbol is a 1.
[Figure: an NFA with start state q0 and ε-transitions from q0 to q1 and from q0 to q2; q1 is an accepting state with a self-loop on a, and q2 is an accepting state with a self-loop on b]
To accept the string aa, this NFA would first make an ε-transition to q1, then make a
transition back to q1 on the first a, and finally another transition back to q1 on the second a.
Notice that although q0 is not an accepting state, this NFA does accept the empty string.
Because of the ε-transitions, it has three possible sequences of moves when the input is the
empty string: it can make no moves, staying in q0; it can move to q1 and end there; or it
can move to q2 and end there. Since two of these are sequences that end in accepting states,
the empty string is in the language defined by this NFA. In general, any state that has an ε-
transition to an accepting state is, in effect, an accepting state too.
ε-transitions are especially useful for combining smaller automata into larger ones.
The example above is an NFA that accepts {a^n} ∪ {b^n}, and it can be thought of as the
combination of two smaller NFAs, one for the language {a^n} and one for the language {b^n}.
For another example, consider these two NFAs:
The first accepts L1 = {a^n | n is odd}, and the second accepts L2 = {b^n | n is odd}. Suppose you
wanted an NFA for the union of the two languages, L1 ∪ L2. You might think of combining
the two smaller NFAs by unifying their start states, like this:
[Figure: the two NFAs joined by merging their start states into a single state]
But that does not work—it accepts strings like aab, which is not in either of the original
languages. Since NFAs may return to their start states, you cannot form an NFA for the
union of two languages just by combining the start states of their two NFAs. But you can
safely combine the two NFAs using ε-transitions, like this:
[Figure: a new start state with ε-transitions to the start states of the two original NFAs]
Similarly, suppose you wanted an NFA for the concatenation of the two languages,
{xy | x ∈ L1 and y ∈ L2}. You might think of combining the two smaller NFAs by using the
accepting state of the first as the start state of the second, like this:
[Figure: the two NFAs joined by using the accepting state of the first as the start state of the second]
But again that does not work—it accepts strings like abbaab, which is not in the desired
language. As before, the problem can be solved using ε-transitions:
[Figure: the two NFAs joined by an ε-transition from the accepting state of the first to the start state of the second]
Using such techniques, large NFAs can be composed by combining smaller NFAs. This
property of NFAs is called compositionality, and it is one of the advantages NFAs have over
DFAs in some applications. We will see further examples of this in later chapters.
5.3 Nondeterminism
NFAs and DFAs both define languages, but only DFAs really define direct computational
procedures for testing language membership. It is easy to check whether a DFA accepts a
given input string: you just start in the start state, make one transition for each symbol in
the string, and check at the end to see whether the final state is accepting. You can carry this
computational procedure out for yourself or write a program to do it. But what about the
same test for an NFA—how can you test whether an NFA accepts a given input string? It is
no longer so easy, since there may be more than one legal sequence of steps for an input, and
you have to determine whether at least one of them ends in an accepting state. This seems
to require searching through all legal sequences of steps for the given input string, but how?
In what order? The NFA does not say. It does not fully define a computational procedure for
testing language membership.
This is the essence of the nondeterminism of NFAs:
1. For a given input there can be more than one legal sequence of steps.
2. The input is in the language if at least one of the legal sequences says so.
The first part of nondeterminism is something you may be familiar with from ordinary
programming. For example, a program using a random-number generator can make more
than one legal sequence of steps for a given input. But randomness is not the same thing
as nondeterminism. That second part of nondeterminism is the key, and it is what makes
nondeterministic automata seem rather alien to most programmers.
Because of their nondeterminism, NFAs are harder to implement than DFAs. In spite of
this higher level of abstraction, NFAs have many practical applications, as we will see in later
chapters. We will see algorithmic techniques for deciding whether an NFA accepts a given
string, and we will see that, in many applications, the compactness and compositionality of
NFAs outweigh the difficulties of nondeterminism.
The definition says δ ∈ (Q × (Σ ∪ {ε}) → P(Q)); this is the mathematical way of giving
the type of the function, and it says that δ takes two inputs, a state from Q and a symbol
from Σ ∪ {ε}, and produces a single output, which is some subset of the set Q. If the NFA
is in the qi state, then δ(qi, a) is the set of possible next states after reading the a symbol
and δ(qi, ε) is the set of possible next states that can be reached on an ε-transition, without
consuming an input symbol. Thus the transition function δ encodes all the information
about the arrows in an NFA diagram. All the other parts of the NFA—the set of states, the
alphabet, the start state, and the set of accepting states—are the same as for a DFA.
Consider for example this NFA:
[Figure: an NFA with states q0 (the start state), q1, and q2 (an accepting state); q0 has a self-loop on a and b, a transition to q1 on a, and an ε-transition to q2; q1 has a transition to q2 on b]
Formally, the NFA shown is M = (Q, Σ, δ, q0, F), where Q = {q0, q1, q2}, Σ = {a, b},
F = {q2}, and the transition function δ is
δ(q0, a) = {q0, q1}
δ(q0, b) = {q0}
δ(q0, ε) = {q2}
δ(q1, a) = {}
δ(q1, b) = {q2}
δ(q1, ε) = {}
δ(q2, a) = {}
δ(q2, b) = {}
δ(q2, ε) = {}
When the diagram has no arrow from a given state on a given symbol, as for δ(q1, a), the
transition function reflects this by producing {}, the empty set, as the set of possible next
states. When the diagram has a single arrow from a given state on a given symbol, as for
δ(q0, b), the transition function produces a singleton set such as {q0}, showing that there
is exactly one possible next state. When the diagram allows more than one transition
from a given state on a given symbol, as for δ(q0, a), the transition function produces a set
containing more than one possible next state.
The example above demonstrates an odd technicality about NFA alphabets. We
assumed for the example that Σ = {a, b}, but all we really know from looking at the diagram
is that Σ contains a and b. It could contain other symbols as well. A DFA diagram tells
you unequivocally what the alphabet must be, because it gives a transition from every state
on every symbol in the alphabet. An NFA need not give transitions on every symbol, so
the alphabet is not fully determined by the diagram. Usually this is an inconsequential
technicality; the language defined by the machine above is {a, b}*, even if we take it that
Σ = {a, b, c}. But sometimes you do need to know the exact alphabet (as when taking the
complement of the language), and at such times an NFA diagram by itself is inadequate.
The definition permits the symbol in a move to be an element of Σ or to be ε. Thus the move
can be either a move on one symbol of input or an ε-transition. Next we define an extended
relation ↦* for sequences of zero or more steps.
Notice here that ↦* is reflexive; for any ID I, I ↦* I by a sequence of zero moves. Using the
↦* relation, we can define the δ* function for M:
Thus, for any string x ∈ Σ* and any state q ∈ Q, δ*(q, x) is the set of possible states for the
NFA to end up in, starting from q and making some sequence of legal moves on the
symbols in x. Using this extended transition function, it is possible to express the idea that an
NFA accepts a string if it has at least one path to an accepting state on that string:
A string x is accepted if and only if the NFA, started in its start state and taken through some
sequence of transitions on the symbols in x, has at least one accepting state as a possible final
state. We can also define L(M), the language accepted by an NFA M:
For any NFA M = (Q, Σ, δ, q0, F), L(M) denotes the language accepted by M,
which is L(M) = {x ∈ Σ* | δ*(q0, x) contains at least one element of F}.
Exercises
EXERCISE 1
For each of the following strings, say whether it is in the language accepted by this NFA:
[NFA diagram with transitions on a and b]
a. aa
b. baabb
c. aabaab
d. ababa
e. ababaab
EXERCISE 2
For each of the following strings, say whether it is in the language accepted by this NFA:
[NFA diagram with transitions on a and b]
a. bb
b. bba
c. bbabb
d. aabaab
e. ababa
f. abbaab
EXERCISE 3
Draw a DFA that accepts the same language as each of the following NFAs. Assume that
the alphabet in each case is {a, b}; even though not all the NFAs explicitly mention both
symbols, your DFAs must.
a. [NFA diagram]
b. [NFA diagram]
c. [NFA diagram]
d. [NFA diagram]
e. [NFA diagram]
f. [NFA diagram]
EXERCISE 4
Draw the diagram for an NFA for each of the following languages. Use as few states
and as few transitions as possible. Don’t just give a DFA, unless you are convinced it is
necessary.
a. {x ∈ {a, b}* | x contains at least 3 as}
b. {x ∈ {a, b}* | x starts with at least 3 consecutive as}
c. {x ∈ {a, b}* | x ends with at least 3 consecutive as}
d. {x ∈ {a, b}* | x contains at least 3 consecutive as}
e. {x ∈ {a, b}* | x has no two consecutive as}
f. {axb | x ∈ {a, b}*}
g. {x ∈ {0, 1}* | x ends in either 0001 or 1000}
h. {x ∈ {0, 1}* | x either starts with 000 or ends with 000, or both}
EXERCISE 5
Draw an NFA for each of the following languages. Hint: Try combining smaller NFAs
using ε-transitions.
a. {a^n | n is even} ∪ {b^n | n is odd}
b. {(ab)^n} ∪ {(aba)^n}
c. {x ∈ {a, b}* | x has at least 3 consecutive as at the start or end (or both)}
d. {x ∈ {a, b}* | the number of as in x is odd} ∪ {b^n | n is odd} ∪ {(aba)^n}
e. {a^n | n is divisible by at least one of the numbers 2, 3, or 5}
f. {xy ∈ {a, b}* | the number of as in x is odd and the number of bs in y is even}
g. {xy | x ∈ {(ab)^n} and y ∈ {(aba)^m}}
EXERCISE 6
a. Draw a DFA for the set of strings over the alphabet {a} whose length is divisible by 3
or 5 (or both).
b. Draw an NFA for the same language, using at most nine states.
EXERCISE 7
Draw the diagram for each of the following NFAs.
a. ({q0, q1}, {0, 1}, δ, q0, {q0}), where the transition function δ is
δ(q0, 0) = {}
δ(q0, 1) = {q1}
δ(q0, ε) = {}
δ(q1, 0) = {q1}
δ(q1, 1) = {q0}
δ(q1, ε) = {}
b. ({q0, q1, q2}, {0, 1}, δ, q0, {q2}), where the transition function δ is
δ(q0, 0) = {q0, q1}
δ(q0, 1) = {q0}
δ(q0, ε) = {}
δ(q1, 0) = {}
δ(q1, 1) = {q2}
δ(q1, ε) = {q0}
δ(q2, 0) = {}
δ(q2, 1) = {}
δ(q2, ε) = {}
c. ({q0, q1, q2}, {a, b, c}, δ, q0, {q0}), where the transition function δ is
δ(q0, a) = {q1}
δ(q0, b) = {}
δ(q0, c) = {}
δ(q0, ε) = {}
δ(q1, a) = {}
δ(q1, b) = {q2}
δ(q1, c) = {}
δ(q1, ε) = {}
δ(q2, a) = {}
δ(q2, b) = {}
δ(q2, c) = {q0}
δ(q2, ε) = {}
EXERCISE 8
State each of the following NFAs formally, as a 5-tuple. The alphabet in each case should
be {a, b}.
a. [NFA diagram]
b. [NFA diagram]
c. [NFA diagram]
d. [NFA diagram]
EXERCISE 9
For each machine M in the previous exercise, state what the language L(M ) is, using sets
and/or set formers.
EXERCISE 10
Evaluate each of these expressions for the following NFA:
[Figure: an NFA with start state q0, which has a self-loop on 0 and 1 and a transition to q1 on 1; q1 has a transition to q2 on 0 and on 1; q2 is accepting]
a. δ*(q0, 010)
b. δ(q2, 0)
c. δ*(q2, ε)
d. δ*(q0, 1101)
e. δ*(q0, 1011)
EXERCISE 11
Evaluate each of these expressions for the following NFA:
[Figure: an NFA with states q0 (the start state), q1, and q2 (an accepting state); q0 has a self-loop on a and b and a transition to q1 on a; q1 has a transition to q2 on b; the diagram also includes an ε-transition]
a. δ(q0, a)
b. δ(q0, ε)
c. δ*(q0, a)
d. δ*(q0, ab)
e. δ(q1, ε)
f. δ*(q1, ε)
EXERCISE 12
Construct NFAs with the following properties:
a. an NFA N with L(N) = {a}, that has exactly one accepting sequence of moves
b. an NFA N with L(N) = {a}, that has exactly two accepting sequences of moves
c. an NFA N with L(N) = {a}, that has infinitely many accepting sequences of moves
EXERCISE 13
Prove formally that for any NFA M = (Q, Σ, δ, q0, F), any strings x ∈ Σ* and
y ∈ Σ*, and any state q ∈ Q,
δ*(q, xy) = ⋃ { δ*(q', y) | q' ∈ δ*(q, x) }.
Hint: induction is not helpful here. Use the definition of δ* for NFAs to express the sets
in terms of sequences of moves using the ↦* relation.
EXERCISE 14
You’re given an integer i > 0. Give the 5-tuple for an NFA for the language
{0^i x | x ∈ {0, 1}*}.
EXERCISE 15
You’re given a DFA M = (Q, Σ, δ, q0, F). Show how to construct the 5-tuple for a new
NFA N with L(N) = L(M) ∪ {ε}.
EXERCISE 16
Prove Theorem 3.3 (closure of regular languages under union) again, this time using
NFAs in the construction.
EXERCISE 17
Show how to take any given NFA M = (Q, Σ, δ, q0, F) and construct another NFA
N = (Q', Σ, δ', q0', F'), so that |F'| = 1 and L(M) = L(N). Be sure to give a convincing
proof that the two languages are equal.
EXERCISE 18
Suppose you’re given an NFA M that has a single accepting state. Show how to construct
the 5-tuple for a new NFA N with L(N) = {xy | x ∈ L(M) and y ∈ L(M)}. Be sure to give
a convincing proof that L(N) is the language specified.
CHAPTER 6
NFA Applications
[Figure: an NFA with start state q0, which has a self-loop on 0 and 1 and a transition to q1 on 1; q1 has a transition to q2 on 0 and on 1; q2 is accepting]
Now let’s see how this NFA can be implemented in the Java language. As in Chapter 4, we
will write the NFA as a Java class. As in the table-based implementation in Section 4.3, we
will use a table to represent the transition function δ. Where δ(q, a) is the set of zero or more
possible next states from the state q reading the symbol a, in our Java array
delta[q][a-'0'] will be an array of zero or more possible next states. Thus the delta
array is a three-dimensional array of int values.
/**
 * A nondeterministic finite-state automaton that
 * recognizes strings of 0s and 1s with 1 as the
 * next-to-last character.
 */
public class NFA1 {
    /*
     * The transition function represented as an array.
     * The entry at delta[s][c-'0'] is an array of 0 or
     * more ints, one for each possible move from
     * the state s on the character c.
     */
    private static int[][][] delta =
        {{{0},{0,1}}, // delta[q0,0], delta[q0,1]
         {{2},{2}},   // delta[q1,0], delta[q1,1]
         {{},{}}};    // delta[q2,0], delta[q2,1]
To determine whether the NFA accepts a string, we will do the obvious thing: just
search all possible paths through the machine on that string. If we find one that ends in
an accepting state, we accept; otherwise we reject. This search is easy to implement using
recursion:
/**
 * Test whether there is some path for the NFA to
 * reach an accepting state from the given state,
 * reading the given string at the given character
 * position.
 * @param s the current state
 * @param in the input string
 * @param pos index of the next char in the string
 * @return true if the NFA accepts on some path
 */
private static boolean accepts(int s, String in, int pos) {
    if (pos==in.length()) { // if no more to read
        return (s==2);      // accept if the final state is q2
    }
    char c = in.charAt(pos);   // the next char to read
    int[] nextStates;
    try {
        nextStates = delta[s][c-'0'];
    }
    catch (ArrayIndexOutOfBoundsException ex) {
        return false;          // no moves on this char, so this path rejects
    }
    // Try each possible move; accept if any one of them
    // leads to an accepting state at the end of the string.
    for (int i = 0; i < nextStates.length; i++) {
        if (accepts(nextStates[i], in, pos+1)) return true;
    }
    return false;              // no path from here accepts
}
The code uses some of the same tricks we saw in Chapter 4. In particular, it uses
exception handling to catch out-of-bounds array references, so that we do not need the array
to have entries for char values other than '0' and '1'. The new trick here is the recursive
search, which tries every possible move from each state reached.
/**
* Test whether the NFA accepts the string.
* @param in the String to test
* @return true if the NFA accepts on some path
*/
public static boolean accepts(String in) {
return accepts(0, in, 0); // start in q0 at char 0
}
}
That is the end of the class definition. To test whether a string s is in the language
defined by this NFA, we would write something like
if (NFA1.accepts(s)) ...
This implementation is not particularly object-oriented, since the whole thing uses static
methods and fields and no objects of the NFA1 class are created. All the information
manipulated in the recursive search is carried in the parameters, so no objects were needed.
/*
* The current set of states, encoded bitwise:
* state i is represented by the bit 1<<i.
*/
private int stateSet;
We use a bit-coded integer to represent a set of states. The << operator in Java shifts an
integer to the left, so 1<<0 is 1 shifted left 0 positions (thus 1<<0 is 1), 1<<1 is 1 shifted
left 1 position (thus 1<<1 is 2), 1<<2 is 1 shifted left 2 positions (thus 1<<2 is 4), and so
on. In general, an NFA implemented this way will use the number 1<<i to represent the
state qi. For this NFA, there are only the three states 1<<0, 1<<1, and 1<<2. Since we are
using one bit for each state, we can form any set of states just by combining the state bits
with logical OR. For example, the set {q0, q2} can be represented as 1<<0|1<<2.
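For instance (a small illustration, not part of the class):
int set = 1<<0 | 1<<2;               // the set {q0, q2}
set |= 1<<1;                         // add q1: now {q0, q1, q2}
boolean hasQ2 = (set & (1<<2)) != 0; // membership test: true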
The initial state set is {q0}. This method initializes the current state set to that value:
/**
* Reset the current state set to {the start state}.
*/
public void reset() {
stateSet = 1<<0; // {q0}
}
We will use the table-driven approach for the transition function, just as we did for
DFAs. In this case, however, each entry in the table is not just a single state, but a bit-coded
set of states:
/*
 * The transition function represented as an array.
 * The set of next states from a given state s and
 * character c is at delta[s][c-'0'].
 */
static private int[][] delta =
    {{1<<0, 1<<0|1<<1}, // delta[q0,0] = {q0}, delta[q0,1] = {q0,q1}
     {1<<2, 1<<2},      // delta[q1,0] = {q2}, delta[q1,1] = {q2}
     {0,    0}};        // delta[q2,0] = {},   delta[q2,1] = {}
The process method looks at characters in the input string and keeps track of the set
of possible states after each one. This is the only part that is significantly more complicated
than the DFA case, because for each character in the input string we have to execute a little
loop. To compute the next set of states given a symbol c, we need to take the union of the
sets δ(q, c) for each state q in the current set of states. The process method does that in its
inner loop; its outer loop repeats the computation once for each symbol in the input string.
/**
 * Make one transition from state-set to state-set on
 * each char in the given string.
 * @param in the String to use
 */
public void process(String in) {
    for (int i = 0; i < in.length(); i++) {
        char c = in.charAt(i);
        int nextSS = 0; // next state set, initially empty
        for (int s = 0; s <= 2; s++) { // for each state s
            if ((stateSet&(1<<s)) != 0) { // if maybe in s
                try {
                    nextSS |= delta[s][c-'0'];
                }
                catch (ArrayIndexOutOfBoundsException ex) {
                    // in effect, nextSS |= 0
                }
            }
        }
        stateSet = nextSS; // new state set after c
    }
}
All that remains is a method to test whether the NFA should accept. It accepts if the final
set of possible states includes the accepting state q2.
/**
* Test whether the NFA accepted the string.
* @return true if the final set includes
* an accepting state.
*/
public boolean accepted() {
return (stateSet&(1<<2))!=0; // true if q2 in set
}
}
That is the end of the class definition. To test whether a string s is in the language defined by
this NFA, we would write something like
NFA2 m = new NFA2();
m.reset();
m.process(s);
if (m.accepted()) ...
The interface to this is largely the same as for the DFAs we developed in Chapter 4.
Because the Java int type is a 32-bit word, our NFA2 code generalizes simply for any
NFA of up to 32 states. For NFAs of up to 64 states, the 64-bit long type could be used
easily. To implement an NFA with more than 64 states in Java would require some additional
programming. One could, for example, use an array of ⎡n/32⎤ int variables to represent
the set of n states. The process loop would be considerably more complicated, and slower,
for such large NFAs.
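Here is a minimal sketch of that idea (not the book's code; the class name and methods are invented for illustration): a state set stored in an array of int words.
/** A sketch: a bit-coded set of states for an NFA with n states. */
class BigStateSet {
    private final int[] bits;                // ceil(n/32) words of 32 bits each
    public BigStateSet(int n) { bits = new int[(n + 31) / 32]; }
    public void add(int i) { bits[i / 32] |= 1 << (i % 32); }
    public boolean contains(int i) {
        return (bits[i / 32] & (1 << (i % 32))) != 0;
    }
}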
As the NFA2 implementation runs, the stateSet field keeps track of the set of states
that the NFA could be in. This basic idea—keeping track of the set of possible states after
each symbol is read—is the basis for a construction that lets us convert any NFA into a DFA.
[Figure: the same NFA as before: start state q0 with a self-loop on 0 and 1 and a transition to q1 on 1; q1 has a transition to q2 on 0 and on 1; q2 is accepting]
We will construct a DFA for the same language. The new DFA will in fact simulate the NFA,
by keeping track of the states the NFA might be in after each symbol of the input string.
Being (potentially) nondeterministic, the NFA can have more than one legal sequence of
steps for an input, and so can be in more than one possible state after each symbol of the
input string. So each state of the DFA must correspond to some subset of the set of states
from the NFA. That is why this is called the subset construction.
Initially, of course, the NFA is in its start state. We will create a start state for the DFA
and label it with the set of states {q0}:
[Figure: the DFA under construction so far: a single state labeled {q0}, its transitions on 0 and 1 not yet determined]
Next, suppose the NFA is in q0 and reads a 0. What is the set of possible states after that?
It is δ(q0, 0) = {q0} again, so the transition on 0 in our DFA should return to the state labeled
{q0}. If the NFA is in q0 and reads a 1, its set of possible next states is δ(q0, 1) = {q0, q1}, so we
will add a new state labeled {q0, q1} to the DFA under construction. So far we have
[Figure: the DFA under construction so far: state {q0} with a self-loop on 0 and a transition on 1 to a new state {q0, q1}, whose transitions are not yet determined]
Next, suppose the NFA is in one of the states {q0, q1} and reads a 0. What is the set of
possible states after that? We have δ(q0, 0) = {q0} and δ(q1, 0) = {q2}, so the next set of possible
states is δ(q0, 0) ∪ δ(q1, 0) = {q0, q2}. Similarly, if the NFA is in one of the states {q0, q1} and
reads a 1, its set of possible next states is δ(q0, 1) ∪ δ(q1, 1) = {q0, q1, q2}. We add these two
new states to the DFA under construction, producing
[Figure: the DFA under construction with four states, {q0}, {q0, q1}, {q0, q2}, and {q0, q1, q2}; the transitions out of the two new states are not yet determined]
Each step in the construction follows this same pattern. Every state of the DFA is labeled
with a set of states from the NFA. The transition on a symbol a from a state labeled R in the
DFA is determined by taking the union, over all r ∈ R, of the NFA's δ(r, a). (This is exactly
what the inner loop of the process method in our NFA2 implementation computes.)
Because there are only finitely many different subsets of the set of states in the NFA, this
construction eventually stops with a finite DFA. In our example, in fact, we have already
found all the reachable subsets of states; the remaining transitions all return to sets already
added to the DFA.
The only remaining problem is to decide which of the states of the DFA should be
accepting states. Since the NFA will accept whenever its set of possible final states contains
at least one accepting state, that is what the DFA should do too. All the states labeled with
subsets containing at least one accepting state of the NFA should be accepting states of the
DFA. (This is exactly what the accepted method of our NFA2 implementation does.)
The final result is this:
[Figure: the completed DFA: {q0} goes to {q0} on 0 and to {q0, q1} on 1; {q0, q1} goes to {q0, q2} on 0 and to {q0, q1, q2} on 1; {q0, q2} goes to {q0} on 0 and to {q0, q1} on 1; {q0, q1, q2} goes to {q0, q2} on 0 and to {q0, q1, q2} on 1; the accepting states are {q0, q2} and {q0, q1, q2}]
If you now discard the state labels, you can see that this is the same as the DFA given in
Section 5.1 for the original language, {x ∈ {0, 1}* | the next-to-last symbol in x is 1}.
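The same bookkeeping is easy to carry out in code. The following sketch (not the book's code) builds the reachable part of the DFA's transition table from an NFA1-style table, representing each DFA state as a bit-coded subset in the style of NFA2; it assumes an NFA with no ε-transitions and at most 32 states.
import java.util.*;

class SubsetConstruction {
    /**
     * Build the DFA transition table for the reachable subsets.
     * @param delta NFA table: delta[s][c] lists the next states from s on symbol c
     * @param nSyms the number of symbols in the alphabet
     * @param start the NFA's start state
     * @return a map from each reachable subset (bit-coded) to its
     *         array of next subsets, one per symbol
     */
    static Map<Integer,int[]> build(int[][][] delta, int nSyms, int start) {
        Map<Integer,int[]> dfa = new HashMap<>();
        Deque<Integer> work = new ArrayDeque<>();
        work.add(1 << start);                      // the start subset {start}
        while (!work.isEmpty()) {
            int subset = work.remove();
            if (dfa.containsKey(subset)) continue; // already constructed
            int[] next = new int[nSyms];
            for (int c = 0; c < nSyms; c++) {
                int union = 0;                     // union of delta[s][c] over s in subset
                for (int s = 0; s < 32; s++)
                    if ((subset & (1 << s)) != 0)
                        for (int r : delta[s][c]) union |= 1 << r;
                next[c] = union;
                work.add(union);                   // explore the new subset later
            }
            dfa.put(subset, next);
        }
        return dfa;
    }
}
Applied to the NFA1 table from earlier in this chapter, it finds exactly the four reachable subsets shown above; the empty subset appears only if the NFA can get stuck.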
The construction just illustrated is the subset construction. The following proof expresses
it formally:
QD = P(QN)
δD(R, a) = ⋃ { δ*N(r, a) | r ∈ R }, for all R ∈ QD and a ∈ Σ
qD = δ*N(qN, ε)
FD = {R ∈ QD | R ∩ FN ≠ {}}
[Figure: an NFA consisting of a single state q0, which is both the start state and an accepting state, with no transitions]
There is no legal sequence of moves in this NFA for any string other than the empty string.
The set of states QN = {q0}, and the powerset of that is QD = P(QN ) = {{}, {q0}}. The DFA
produced by the construction will therefore be this two-state machine:
6.4 NFAS ARE EXACTLY AS POWERFUL AS DFAS 69
[Figure: a two-state DFA: the accepting start state {q0} and the nonaccepting state {}; from {q0}, both 0 and 1 go to {}, and {} has a self-loop on 0 and 1]
No matter what the original NFA is, the construction always gives
δD({}, a) = ⋃ { δ*N(r, a) | r ∈ {} } = {}
so the {} state in the DFA produced by the subset construction always has transitions back
to itself for all symbols in the alphabet. In this way, the subset construction automatically
provides {} as a nonaccepting trap state. This trap state will be reachable from the start state if
and only if the original NFA can get stuck—that is, if and only if there is some input string x
for which δ*N(qN, x) = {}.
We have seen two different approaches for implementing NFAs. The subset construction
gives us a third: we can first convert the NFA into a DFA using the subset construction, then
implement that using the techniques of Chapter 4.
Lemma 6.2: If L is any regular language, there is some NFA N for which
L(N) = L.
Proof: Let L be any regular language. By definition there must be some DFA
D = (Q, Σ, δD, q0, F) with L(D) = L. This DFA immediately gives an equivalent
NFA N = (Q, Σ, δN, q0, F), where we define δN(q, a) = {δD(q, a)} for all q ∈ Q
and a ∈ Σ, and δN(q, ε) = {} for all q ∈ Q. Clearly, we have δ*N(q, x) = {δ*D(q, x)}
for all q ∈ Q and x ∈ Σ*, and so N accepts x if and only if D
accepts x. It follows that L(N) = L(D) = L.
Although in diagrams it is clear that every DFA is an NFA, it is not quite true with
formal 5-tuples, since the δ functions of DFAs and NFAs have different types. However,
this difference is trivial, as the proof above illustrates; wherever a DFA has δD(q, a) = r, an
equivalent NFA should have δN(q, a) = {r}. The jump in the proof from δN(q, a) = {δD(q, a)}
to δ*N(q, x) = {δ*D(q, x)} can be made more rigorous using a routine inductive proof of the
kind demonstrated in Section 3.4.
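In table form the conversion is just as direct. Here is a sketch (not from the book) that wraps a Chapter 4-style DFA transition table as an NFA1-style table, making each entry a singleton array:
/** A sketch: view a DFA transition table as an NFA transition table. */
static int[][][] dfaToNfa(int[][] deltaD) {
    int[][][] deltaN = new int[deltaD.length][][];
    for (int q = 0; q < deltaD.length; q++) {
        deltaN[q] = new int[deltaD[q].length][];
        for (int c = 0; c < deltaD[q].length; c++)
            deltaN[q][c] = new int[] { deltaD[q][c] }; // the singleton {deltaD(q,c)}
    }
    return deltaN;
}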
From the previous two lemmas it follows that the set of languages that can be defined
using NFAs is exactly the set of regular languages.
Theorem 6.1: A language L is L(N ) for some NFA N if and only if L is a regular
language.
Proof: It follows immediately from Lemmas 6.1 and 6.2.
We conclude that allowing nondeterminism in finite automata can make them more
compact and easier to construct, but in the sense shown in this theorem, it neither weakens
nor strengthens them.
Exercises
EXERCISE 1
The DFA constructed for the example in Section 6.3 has four states, but the full DFA
given by the formal subset construction has eight. Draw the full eight-state DFA,
including the unreachable states.
EXERCISE 2
Convert this NFA into a DFA using the subset construction. Draw the DFA and label
each state with the corresponding subset of states from the NFA. You do not have to
show unreachable states, if any.
[Figure: an NFA with a transition from q0 to q1 on a and a transition from q1 to q2 on b]
EXERCISE 3
Convert this NFA into a DFA using the subset construction. Draw the DFA and label
each state with the corresponding subset of states from the NFA. You do not have to
show unreachable states, if any.
[Figure: an NFA with a self-loop on a and b at q0, a transition from q0 to q1 on a, and a transition from q1 to q2 on b]
EXERCISE 4
Convert this NFA into a DFA using the subset construction. Draw the DFA and label
each state with the corresponding subset of states from the NFA. You do not have to
show unreachable states, if any.
[Figure: an NFA with a self-loop on a and b at q0, a transition from q0 to q1 on a, and transitions from q1 to q2 on b and on ε]
EXERCISE 5
Convert this NFA into a DFA using the subset construction. Draw the DFA and label
each state with the corresponding subset of states from the NFA. You do not have to
show unreachable states, if any.
[NFA diagram with states q0, q1, and q2 and transitions on 0 and 1]
EXERCISE 6
Convert this NFA into a DFA using the subset construction. Draw the DFA and label
each state with the corresponding subset of states from the NFA. You do not have to
show unreachable states, if any.
[NFA diagram with states q0 through q6 and transitions on a, b, and ε]
EXERCISE 7
Write a Java class that implements this NFA:
[NFA diagram with transitions on a, b, and ε]
CHAPTER 7
Regular Expressions
Most programmers and other power-users of computer systems have used tools that match text
patterns. You may have used a Web search engine with a pattern like travel cancun
OR acapulco, trying to find information about either of two travel destinations. You may
have used a search tool with a pattern like gr[ea]y, trying to find files with occurrences of
either grey or gray. Any such pattern for text works as a definition of a formal language.
Some strings match the pattern and others do not; the language defined by the pattern is the
set of strings that match. In this chapter we study a particular kind of formal text pattern
called a regular expression. Regular expressions have many applications. They also provide
an unexpected affirmation of the importance of some of the theoretical concepts from previous
chapters.
This is a common occurrence in mathematics. The first time a young student sees the mathematical
constant π, it looks like just one more school artifact: one more arbitrary symbol whose definition
to memorize for the next test. Later, if he or she persists, this perception changes. In many
branches of mathematics and with many practical applications, π keeps on turning up. "There
it is again!" says the student, thus joining the ranks of mathematicians for whom mathematics
seems less like an artifact invented and more like a natural phenomenon discovered.
So it is with regular languages. We have seen that DFAs and NFAs have equal definitional power.
It turns out that regular expressions also have exactly that same definitional power: they can be
used to define all the regular languages and only the regular languages. There it is again!
The Kleene closure of a language L is the set of strings that can be formed by concatenating
any number of strings, each of which is an element of L:
L* = {x1x2 ... xn | n ≥ 0, with all xi ∈ L}
Note that this generalizes the definition we used in Chapter 1 for the Kleene closure of an
alphabet.
There is a common mistake to guard against as you read the Kleene closure definition.
It does not say that L* is the set of strings that are concatenations of zero or more copies
of some string in L. That would be {x^n | n ≥ 0, with x ∈ L}; compare that with the actual
definition above. The actual definition says that L* is the set of strings that are concatenations
of zero or more substrings, each of which is in the language L. Each of those zero or more
substrings may be a different element of L. For example, the language {ab, cd}* is the
language of strings that are concatenations of zero or more things, each of which is either ab
or cd: {ε, ab, cd, abab, abcd, cdab, cdcd, …}. Note that because the definition allows zero or
more, L* always includes ε.
Now we can define regular expressions.
A regular expression is a string r that denotes a language L(r) over some alphabet
Σ. The six kinds of regular expressions and the languages they denote are as
follows. First, there are three kinds of atomic regular expressions:
1. Any symbol a ∈ Σ is a regular expression with L(a) = {a}.
2. The special symbol ε is a regular expression with L(ε) = {ε}.
3. The special symbol ∅ is a regular expression with L(∅) = {}.
There are also three kinds of compound regular expressions, which are built from
smaller regular expressions, here called r, r1, and r2:
4. (r1 + r2) is a regular expression with L(r1 + r2) = L(r1) ∪ L(r2)
5. (r1r2) is a regular expression with L(r1r2) = L(r1)L(r2)
6. (r)* is a regular expression with L((r)*) = (L(r))*
The parentheses in compound regular expressions may be omitted, in which
case * has highest precedence and + has lowest precedence.
Regular expressions make special use of the symbols ε, ∅, +, *, (, and ), so for simplicity
we will assume that these special symbols are not included in Σ. Regular expressions can
look just like ordinary strings, so you will occasionally have to rely on the context to decide
whether a string such as abc is just a string or a regular expression denoting the language
{abc}.
Unfortunately, the term regular expression is heavily overloaded. It is used for the text
patterns of many different tools like awk, sed, and grep, many languages like Perl, Python,
Ruby, and PHP, and many language libraries like those for Java and the .NET languages.
Each of these applications means something slightly different by regular expression, and we
will see more about these applications later.
have ε ∈ L((r)*). Each of the concatenated substrings can be a different element of L(r). For
example:
Regular expression: (a + b)*
Language denoted: {a, b}*
In that example the parentheses were necessary, since * has higher precedence than +. The
regular expression a + b* would denote the language {a} ∪ {b^n}.
It is important to understand that (a + b)* is not the same as (a* + b*). Any confusion
about this point will cause big problems as we progress. In L((a + b)*) we concatenate zero
or more things together, each of which may be either an a or a b. That process can construct
every string over the alphabet {a, b}. In L(a* + b*) we take the union of two sets: the set of
all strings over the alphabet {a}, and the set of all strings over the alphabet {b}. That result
contains all strings of only as and all strings of only bs, but does not contain any strings that
contain both as and bs.
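You can check the difference concretely with the Java regexps of Chapter 8, where the union operator + is written | (this snippet is an illustration only, not from the book):
import java.util.regex.Pattern;

class UnionVersusStar {
    public static void main(String[] args) {
        System.out.println(Pattern.matches("(a|b)*", "abba")); // true
        System.out.println(Pattern.matches("a*|b*", "abba"));  // false: abba mixes as and bs
    }
}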
Occasionally the special symbol ε appears in a regular expression, when the empty string
is to be included in the language:
Regular expression: ab + ε
Language denoted: {ab, ε}
The special symbol ∅ appears very rarely. Without it there is no other way to denote the
empty set, but that is all it is good for. It is not useful in compound regular expressions, since
for any regular expression r we have L((r)∅) = L(∅(r)) = {}, L(r + ∅) = L(∅ + r) = L(r), and
L(∅*) = {ε}.
The subexpressions in a compound expression may be compound expressions
themselves. This way, compound expressions of arbitrary complexity may be developed:
Regular expression: (a + b)(c + d )
Language denoted: {ac, ad, bc, bd }
Regular expression: (abc)*
Language denoted: {(abc)^n} = {ε, abc, abcabc, abcabcabc, ...}
Regular expression: a*b*
Language denoted: {a^n b^m} = {ε, a, b, aa, ab, bb, aaa, aab, ...}
Regular expression: (a + b)*aa(a + b)*
Language denoted: {x ∈ {a, b}* | x contains at least two consecutive as}
Regular expression: (a + b)*a(a + b)*a(a + b)*
Language denoted: {x ∈ {a, b}* | x contains at least two as}
Regular expression: (a*b*)*
Language denoted: {a, b}*
In this last example, the regular expression (a*b*)* turns out to denote the same language
as the simpler (a + b)*. To see why, consider that L(a*b*) contains both a and b. Those two
symbols alone are enough for the Kleene star to build all of {a, b}*. In general, whenever
Σ ⊆ L(r), then L((r)*) = Σ*.
The constructed machine will have a start state, a single accepting state, and a collection
of other states and transitions that ensure that there is a path from the start state to the
accepting state if and only if the input string is in L(r).
Because all our NFAs will have this form, they can be combined with each other very
neatly. For example, if you have NFAs for L(r1) and L(r2), you can easily construct an NFA
for L(r1 + r2):
[Figure: an NFA for L(r1 + r2): a new start state with ε-transitions to the start states of the NFAs for r1 and r2, and ε-transitions from their accepting states to a new accepting state]
In this new machine there is a new start state with ε-transitions to the two original start states
and a new accepting state with ε-transitions from the two original accepting states (which are
no longer accepting states in the new machine). Clearly the new machine has a path from the
start state to the accepting state if and only if the input string is in L(r1 + r2). And it has the
special form—a single accepting state, not the same as the start state—which ensures that it
can be used as a building block in even larger machines.
The previous example shows how to do the construction for one kind of regular
expression—r1 + r2. The basic idea of the following proof sketch is to show that the same
construction can be done for all six kinds of regular expressions.
Lemma 7.1: If r is any regular expression, there is some NFA N that has a single
accepting state, not the same as the start state, with L(N ) = L(r).
Proof sketch: For any of the three kinds of atomic regular expressions, an NFA
of the desired kind can be constructed as follows:
[Figure: for any a ∈ Σ, an NFA with a start state and a separate accepting state joined by a transition on a; for ε, the same two states joined by an ε-transition; for ∅, the two states with no transitions at all]
For any of the three kinds of compound regular expressions, given appropriate
NFAs for regular subexpressions r1 and r2, an NFA of the desired kind can be
constructed as follows:
[Figure: NFA constructions for the three compound forms. For (r1 + r2): a new start state with ε-transitions to the start states of the NFAs for r1 and r2, and ε-transitions from their accepting states to a new accepting state. For (r1r2): the NFA for r1 followed by the NFA for r2, with an ε-transition from the accepting state of the first to the start state of the second. For (r1)*: the NFA for r1 wrapped with new start and accepting states and ε-transitions that allow zero or more passes through it]
Thus, for any regular expression r, we can construct an equivalent NFA with a
single accepting state, not the same as the start state.
A fully rigorous proof would have to demonstrate that each constructed machine actually
accepts the language it is supposed to accept. More significantly, a rigorous proof would be organized as a structural induction.
As we have observed, there are as many different ways to prove something with induction
as there are to program something with recursion. Our proofs in Chapter 3 concerning
DFAs used induction on a natural number—the length of the input string. That is the style
of induction that works most naturally for DFAs. Structural induction performs induction
on a recursively defined structure and is the style of induction that works most naturally
for regular expressions. The base cases are the atomic regular expressions, and the inductive
cases are the compound forms. The inductive hypothesis is the assumption that the proof has
been done for structurally simpler cases; for a compound regular expression r, the inductive
hypothesis is the assumption that the proof has been done for r’s subexpressions.
Using structural induction, a more formal proof of Lemma 7.1 would be organized like
this:
Base cases: When r is an atomic expression, it has one of three forms. (For each atomic
form you would give the NFA, as in the previous proof sketch.)
Inductive cases: When r is a compound expression, it has one of these three
forms:
A proof using this style of induction demonstrates that Lemma 7.1 holds for any atomic
regular expression and then shows that whenever it holds for some expressions r1 and r2,
it also holds for (r1 + r2), (r1r2), and (r1)*. It follows that the lemma holds for all regular
expressions.
Lemma 7.2: If N is any NFA, then there is some regular expression r with L(r) = L(N).
The construction that proves this lemma is rather tricky, so it is relegated to Appendix A. For
now, a short example will suffice.
Consider again this NFA (which is also a DFA) from Chapter 3:
[Figure: a three-state DFA with states 0, 1, and 2; state 0 is the start state and the accepting state; from state 0, input 0 goes to state 0 and input 1 goes to state 1; from state 1, input 0 goes to state 2 and input 1 goes to state 0; from state 2, input 0 goes to state 1 and input 1 goes to state 2]
We proved that this machine accepts the language of strings that are binary representations of
numbers that are divisible by three. Now we will construct a regular expression for this same
language. It looks like a hard problem; the trick is to break it into easy pieces.
When you see an NFA, you normally think only of the language of strings that take it
from the start state to end in any accepting state. But let’s consider some other languages that
are relevant to this NFA. For example, what is the language of strings that take the machine
from state 2 back to state 2, any number of times, without passing through states 0 or 1? Any
string of zero or more 1s would do it, and that is an easy language to give a regular expression
for:
1*
Next, a bigger piece: what is the language of strings that take the machine from 1 back to
1, any number of times, without passing through state 0? The machine has no transition that
allows it to go from 1 directly back to 1. So each single trip from 1 back to 1 must follow
these steps:
1. Go from 1 to 2.
2. Go from 2 back to 2, any number of times, without passing through 0 or 1.
3. Go from 2 to 1.
The first step requires a 0 in the input string. The second step is a piece we already have
a regular expression for: 1*. The third step requires a 0 in the input string. So the whole
language, the language of strings that make the machine do those three steps repeated any
number of times, is
(01*0)*
Next, a bigger piece: what is the language of strings that take the machine from 0 back to
0, any number of times? There are two ways to make a single trip from 0 back
to 0. The machine can make the direct transition from state 0 to state 0 on an input
symbol 0 or it can follow these steps:
1. Go from 0 to 1.
2. Go from 1 back to 1, any number of times, without passing through 0.
3. Go from 1 to 0.
The first step requires a 1 in the input string. The second step is a piece we already have a
regular expression for: (01*0)*. The third step requires a 1 in the input string. So the whole
language, the language of strings that make the machine go from state 0 back to state 0 any
number of times, is
(0 + 1(01*0)*1)*
This also defines the whole language of strings accepted by the NFA—the language of strings
that are binary representations of numbers that are divisible by three.
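As a quick sanity check (not part of the book's development), the Java regexps of Chapter 8 can test this expression mechanically; in that dialect the union operator + is written |, so (0+1(01*0)*1)* becomes (0|1(01*0)*1)*.
import java.util.regex.Pattern;

class CheckMod3 {
    public static void main(String[] args) {
        Pattern p = Pattern.compile("(0|1(01*0)*1)*");
        for (int n = 0; n < 300; n++) {
            boolean matched = p.matcher(Integer.toBinaryString(n)).matches();
            if (matched != (n % 3 == 0))   // the regexp should agree with n % 3
                System.out.println("disagreement at " + n);
        }
        System.out.println("done");        // prints only "done" if all agree
    }
}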
The proof of Lemma 7.2 is a construction that builds a regular expression for the
language accepted by any NFA. It works something like the example just shown, defining the
language in terms of smaller languages that correspond to restricted paths through the NFA.
Appendix A provides the full construction. But before we get lost in those details, let’s put all
these results together:
DFAs, NFAs, and regular expressions all have equal power for defining languages.
Exercises
EXERCISE 1
Give a regular expression for each of the following languages.
a. {abc}
b. {abc, xyz}
c. {a, b, c}*
d. {ax | x ∈ {a, b}*}
e. {axb | x ∈ {a, b}*}
f. {(ab)^n}
g. {x ∈ {a, b}* | x contains at least three consecutive as}
h. {x ∈ {a, b}* | the substring bab occurs somewhere in x}
i. {x ∈ {a, b}* | x starts with at least three consecutive as}
j. {x ∈ {a, b}* | x ends with at least three consecutive as}
k. {x ∈ {a, b}* | x contains at least three as}
l. {x ∈ {0, 1}* | x ends in either 0001 or 1000}
m. {x ∈ {0, 1}* | x either starts with 000 or ends with 000, or both}
n. {a^n | n is even} ∪ {b^n | n is odd}
o. {(ab)^n} ∪ {(aba)^n}
p. {x ∈ {a}* | the number of as in x is odd}
q. {x ∈ {a, b}* | the number of as in x is odd}
r. {x ∈ {a, b}* | the number of as in x is odd} ∪ {b^n | n is odd} ∪ {(aba)^n}
s. {a^n | n is divisible by at least one of the numbers 2, 3, or 5}
t. {x ∈ {a, b}* | x has no two consecutive as}
u. {xy ∈ {a, b}* | the number of as in x is odd and the number of bs in y is even}
EXERCISE 2
For each of these regular expressions, give two NFAs: the exact one constructed by the
proof of Lemma 7.1, and the smallest one you can think of.
a. ∅
b. ε
c. a
d. 0+1
e. (00)*
f. ab*
EXERCISE 3
For the following DFA, give a regular expression for each of the languages indicated.
When the question refers to a machine “passing through” a given state, that means
entering and then exiting the state. Merely starting in a state or ending in it does not
count as “passing through.”
[Figure: a DFA with states q0, q1, q2, and q3 and transitions on a and b]
a. the language of strings that make the machine, if started in q0, end in q0, without passing
through q0, q1, q2, or q3
b. the language of strings that make the machine, if started in q0, end in q2, without passing
through q0, q1, q2, or q3
c. the language of strings that make the machine, if started in q2, end in q0, without passing
through q0, q1, q2, or q3
d. the language of strings that make the machine, if started in q2, end in q2, without passing
through q0 or q1
e. the language of strings that make the machine, if started in q2, end in q0, without passing
through q0 or q1
f. the language of strings that make the machine, if started in q0, end in q2, without passing
through q0 or q1
g. the language of strings that make the machine, if started in q1, end in q1, without passing
through q0 or q1
h. the language of strings that make the machine, if started in q1, end in q2, without passing
through q0 or q1
i. the language of strings that make the machine, if started in q0, end in q0, without passing
through q0 or q1
j. the language of strings that make the machine, if started in q1, end in q1, without passing
through q0
EXERCISE 4
(This exercise refers to material from Appendix A.) Let M be this NFA:
[Figure: an NFA with states q0 and q1 and transitions on a and b]
Give the simplest regular expression you can think of for each of the following internal
languages of M.
a. L(M, 0, 0, 2)
b. L(M, 0, 0, 1)
c. L(M, 0, 0, 0)
d. L(M, 0, 1, 2)
e. L(M, 0, 1, 1)
f. L(M, 0, 1, 0)
g. L(M, 1, 1, 2)
h. L(M, 1, 1, 1)
i. L(M, 1, 1, 0)
j. L(M, 1, 0, 2)
k. L(M, 1, 0, 1)
l. L(M, 1, 0, 0)
EXERCISE 5
(This exercise refers to material from Appendix A.) Let M be this NFA:
[Figure: an NFA with states q0, q1, and q2 and transitions on a and b]
Give the simplest regular expression you can think of for each of the following internal
languages of M. (Use the d2r construction if necessary.)
a. L(M,2,2,3)
b. L(M,2,2,2)
c. L(M,1,1,3)
d. L(M,1,1,2)
e. L(M,1,1,1)
f. L(M,0,0,3)
g. L(M,0,0,2)
h. L(M,0,0,1)
i. L(M,0,0,0)
EXERCISE 6
(This exercise refers to material from Appendix A.) Let M be this DFA:
[Figure: a DFA with states q0 (the start state), q1, and q2; from q0, 0 goes to q0 and 1 goes to q1; from q1, 0 goes to q2 and 1 goes to q0; from q2, 0 goes to q1 and 1 goes to q2]
Give the simplest regular expression you can think of for each of the following internal
languages of M. (Use the d2r construction if necessary.)
a. L(M, 2, 2, 3)
b. L(M, 2, 2, 2)
c. L(M, 1, 1, 3)
d. L(M, 1, 1, 2)
e. L(M, 1, 1, 1)
f. L(M, 0, 0, 3)
g. L(M, 0, 0, 2)
h. L(M, 0, 0, 1)
i. L(M, 0, 0, 0)
EXERCISE 7
(This exercise refers to material from Appendix A.) The following code is used to verify
the regular expression computed by d2r for the example in Section A.3:
This code makes use of two classes, State and NFA. The NFA class represents an NFA.
When an NFA object is constructed, it initially contains only one state, the start state,
with no accepting states and no transitions. Additional states are added dynamically
by calling the addState method, which returns an object of the State class. Each
transition on a symbol is added dynamically by calling the addX method, giving the
source state, the symbol, and the target state. States are labeled as accepting by calling the
makeAccepting method. Finally, the d2r construction is performed by calling the
d2r method, passing it the integers i, j, and k. (States are considered to be numbered
in the order they were added to the NFA, with the start state at number 0.) The d2r
method returns a String. The code shown above creates an NFA object, initializes it as
the example NFA, and prints out the regular expression for the language it accepts.
Implement the State and NFA classes according to this specification, along with any
auxiliary classes you need. You need not implement ε-transitions in the NFA, and you
may represent ε in regular expressions using the symbol e. Test your classes with the
test code shown above, and check that your d2r returns the correct regular expression.
(Recall that, for readability, some of the parentheses in the regular expression in
Section A.3 were omitted. Your result may include more parentheses than shown there.)
EXERCISE 8
For any string x, define x^R to be that same string reversed. The intuition is plain enough,
but to support proofs we’ll need a formal definition:
ε^R = ε
(ax)^R = x^R a, for any symbol a and string x
We’ll use the same notation to denote the set formed by reversing every string in a
language:
A^R = {x^R | x ∈ A}
Using these definitions, prove the following properties of reversal:
a. Prove that for any strings x and y, (xy)^R = y^R x^R. Hint: Use induction on |x|.
b. Prove that for any string x, (x^R)^R = x. Hint: Use induction on |x| and the result from
Part a.
c. Prove that for any languages A and B, (A ∪ B)^R = A^R ∪ B^R.
d. Prove that for any languages A and B, (AB)^R = B^R A^R. You’ll need the result from
Part a.
e. Prove that for any language A, (A*)^R = (A^R)*. You’ll need the result from Part a.
f. Prove that the regular languages are closed for reversal. Hint: Using structural
induction, show that for every regular expression r, there is a regular expression
r' with L(r') = (L(r))^R.
CHAPTER 8
Regular-Expression Applications
We have seen some of the implementation techniques related to
DFAs and NFAs. These important techniques are like tricks of the
programmer’s trade, normally hidden from the end user. Not so with
regular expressions; they are often visible to the end user and are part
of the user interface of a variety of useful software tools.
fred
barney
wilma
betty
then this command after the % prompt searches the file for lines containing an a:
In that example, egrep searched for lines containing a simple constant substring, but
the language of patterns understood by egrep can do much more. Various dialects of the
patterns understood by egrep are also used by many other tools. Unfortunately, these patterns
are often simply called regular expressions, though both in syntax and meaning they are a bit
different from the regular expressions we studied in the last chapter. To keep the two ideas
separate, this book refers to the patterns used by egrep and other tools using their common
nickname: regexps. Some of the special characters used in egrep’s regexp dialect are
* This symbol is like our Kleene star. For any regexp x, x* matches strings that are
concatenations of zero or more strings from the language specified by x.
| This symbol is like our +. For any regexps x and y, x|y matches strings that match
either x or y (or both).
() These symbols are used for grouping.
^ When this special symbol is at the start of the regexp, it allows the regexp to match
only at the start of the line.
$ When this special symbol is at the end of the regexp, it allows the regexp to match
only at the end of the line.
. This symbol matches any symbol (except the end-of-line marker).
For example, the regexp a.*y matches strings consisting of an a, followed by zero or
more other characters, followed by a y. We can search the names file for any line containing
such a string:
% egrep 'a.*y' names
barney
%
Why did the search match all the lines, even those with an even number of characters? It did
this because egrep searches the file for all lines that contain a substring matching the specified
pattern—and any line with one or more characters contains a substring with an odd number
of characters. To match only odd-length lines, we must use the special symbols ^ and $:
0
1
10
11
100
101
110
111
1000
1001
1010
then this egrep command selects those that are divisible by three:
abaaba
ababa
abbbabbb
abbaabb
then this grep command selects those lines that consist of repeated strings:
The formal language corresponding to that example is {xx | x ∈ Σ*}. This is not
something that can be defined with plain regular expressions. A useful intuition about regular
expressions is that, like DFAs, they can do only what you could implement on a computer
using a fixed, finite amount of memory. Capturing parentheses clearly go beyond that limit,
since they must capture a string whose size is unbounded. To implement \(.*\)\1, you
would have to first store the string matched by .* somewhere, then test for a match with \1
by comparing against that stored string. Since the stored string can be arbitrarily large, this
cannot be done using a fixed, finite memory. That is, of course, just an informal argument.
We will prove formally in later chapters that {xx | x ∈ Σ*} is not a regular language and no
mere regular expression can define it.
situation differently, but many tools (like lex) are required to find the longest of the leftmost
matches first, which would be the string abb. In that case, the DFA-based implementation
must continue processing the string, always remembering the last accepting state that was
entered and the position in the string that went with it. Then, as soon as the DFA enters a
nonaccepting trap state or encounters the end of the string, it can report the longest leftmost
match that was found.
Similar accommodations must be made by NFA-based automata. In particular, when
an implementation using backtracking finds a match, it cannot necessarily stop there. If the
longest match is required, it must remember the match and continue, exploring all paths
through the NFA to make sure that the longest match is found.
import java.io.*;
import java.util.regex.*;
/**
 * A Java application to demonstrate the Java package
 * java.util.regex. We take one command-line argument,
 * which is treated as a regexp and compiled into a
 * Pattern. We then use that Pattern to filter the
 * standard input, echoing to standard output only
 * those lines that match the Pattern.
 */
class RegexFilter {
    public static void main(String[] args)
            throws IOException {
        Pattern p = Pattern.compile(args[0]); // the regexp to filter by
        BufferedReader in =                   // the standard input
            new BufferedReader(new InputStreamReader(System.in));
        String s = in.readLine();
        while (s!=null) {
            Matcher m = p.matcher(s);
            if (m.matches()) System.out.println(s);
            s = in.readLine();
        }
    }
}
This application is structured like Mod3Filter from Chapter 4. But now, instead
of using a fixed DFA to test each line, it takes a parameter from the command line, treats
it as a regexp, and uses it to construct a Pattern object. A Pattern is something like
a representation of an NFA: it is a compiled version of the regexp, ready to be given an
input string to test. In the main loop, the application tests each input line by creating a
Matcher, an object that represents the NFA of the Pattern along with a particular input
string and current state. A Matcher object can do many things, such as finding individual
matches within a string and reporting their locations. But here, we just use it to test the
entire string to see whether the string matches the pattern. Using this application, we could
do our divisible-by-three filtering with this command:
The regex package also provides a single, static method that combines all those
steps, so you don’t have to keep track of separate Pattern and Matcher objects. To test
whether a String s is in the language defined by a regexp String r, you can just
evaluate Pattern.matches(r,s). This is easier for the programmer and makes sense
if you are going to use the regexp only once. But if (as in our example above) you need to use
the same regexp repeatedly, the first technique is more efficient; it compiles the regexp into a
Pattern that can be used repeatedly.
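The two styles look like this (a small illustration, not from the book, using the divisible-by-three regexp):
import java.util.regex.*;

class MatchesDemo {
    public static void main(String[] args) {
        // One-off test: convenient, but recompiles the regexp on each call.
        System.out.println(Pattern.matches("(0|1(01*0)*1)*", "1001")); // true: 9 is divisible by 3

        // Reusable form: compile once, then reuse the Pattern many times.
        Pattern p = Pattern.compile("(0|1(01*0)*1)*");
        for (String s : new String[] {"11", "101", "110"}) {
            System.out.println(s + ": " + p.matcher(s).matches());
        }
    }
}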
Scripting languages—Perl, Python, PHP, Tcl, Ruby, JavaScript, VBScript, and so on—
often have extra support for programming with regular expressions. In Perl, for example,
operations involving regular expressions don’t look like method calls; regular expressions are
used so often that the language provides a special-purpose syntax that is much more compact.
The Perl expression $x =~ m/a*b/ evaluates to true if and only if the string variable $x
matches the regexp a*b.
These regexps do not change from one compilation to the next, and it would not make sense
to perform the conversion from regexp to automaton every time the compiler is run. Instead,
the automaton can be a fixed, preconstructed part of the compiler. Tools like lex help with
this; they convert regexps into high-level-language code that can be used as a fixed part of
any application.
The lex tool converts a collection of regexps into C code for a DFA. The input file to
lex has three sections: a definition section, a rules section, and a section of user subroutines.
Lex processes this input file and produces an output file containing DFA-based C code,
which can be a stand-alone application or can be used as part of a larger program. The three
sections are separated by lines consisting of two percent signs, so a lex input file looks like
this:
definition section
%%
rules section
%%
user subroutines
The definition section can include a variety of preliminary definitions, but for the simple
examples in this book this section is empty. The user subroutines section can include C
code, which is copied verbatim to the end of the lex output file. This can be used to define
C functions used from inside the rules section, but for the simple examples in this book this
section too is empty. Thus, in our examples, the lex input file has only the rules section, like
this:
%%
rules section
%%
The rules section is a list of regexps. Each regexp is followed by some C code, which is to
be executed whenever a match is found for the regexp. For example, this lex program prints
the line Found one. for each occurrence of the string abc in the input file and ignores all
other characters:
%%
abc {fprintf(yyout, "Found one.\n");}
.|\n {}
%%
The lex program above contains two regexps. The first is abc; whenever that is found, the C
code {fprintf(yyout, "Found one.\n");} is executed. Here, fprintf is C’s
standard input/output function for printing to a file, and yyout is the output file currently
being used by the lex-generated code. So the program generated by lex reads its input and,
whenever it sees the string abc, prints the line Found one.
Code generated by lex copies any unmatched characters to the output. Our lex program
avoids this default behavior by including a second regexp, .|\n, which matches every
character. (The regexp . matches every character except the end-of-line marker, and the
regexp \n matches the end-of-line marker.) The C code associated with the second regexp
is the empty statement, meaning that no action is taken when this match is found. The lex-
generated code finds the longest match it can, so it won’t match a single character a using the
second rule if it is the start of an abc that matches the first rule.
If the lex program above is stored in a file named abc.l, we can build a program from
it with the following Unix commands:
% flex abc.l
% gcc lex.yy.c -o abc -ll
%
The first command uses flex (the Gnu implementation of lex) to compile the lex program
into DFA-based C code, which is stored in a file named lex.yy.c. The second command
then runs the C compiler on the flex-generated code; the -o abc tells it to put the
executable program in a file named abc, and -ll tells it to link with the special library for
lex. To test the resulting program, we make a file named abctest that contains these lines:
abc
aabbcc
abcabc
A similar lex program handles our divisible-by-three language: any line matching the regexp is echoed to the output; any unmatched characters are ignored.
(The lex variable yytext gives us a way to access the substring that matched the regexp.)
%%
^(0|1(01*0)*1)*$ {fprintf(yyout, "%s\n", yytext);}
.|\n {}
%%
This lex program can be compiled into a filter program, just like the Mod3Filter we
implemented in Java:
% flex mod3.l
% gcc lex.yy.c -o mod3 -ll
%
The result is a directly executable file that filters its input file, echoing the divisible-by-
three lines to the standard output. Here, as before, the file numbers contains the binary
representations of the numbers zero through ten.
For simple applications like those above, the code produced by lex can be compiled as a
standalone program. For compilers and other large applications, the code produced by lex is
used as one of many source files from which the full application is compiled.
Exercises
EXERCISE 1
Show an egrep command that reads the standard input and echoes only those lines that
are in the language L(01*0).
EXERCISE 2
Show an egrep command that reads the standard input and echoes only those lines over
the alphabet {a, b} that have an odd number of as.
EXERCISE 3
Show an egrep command that reads the standard input and echoes only those lines that
are binary representations of numbers that are divisible by four.
EXERCISE 4
Show an egrep command that reads the standard input and echoes only those lines that
contain somewhere a decimal number in this form: one or more decimal digits, followed
by a decimal point, followed by one or more additional digits. You will need to find
more information about egrep’s regexp syntax to solve this problem correctly, and still
more to solve it elegantly.
EXERCISE 5
The lexical structure of Java programs includes the following three elements:
• Single-line comments, using //, as in these examples:
int x = 1; // NOT starting from 0 here
// s = "Hello";
• Literals of the String type, such as
"hello"
"Comments start with /*"
"He said, \"Hello.\""
• Literals of the char type, such as
'a'
'\n'
'\''
'\34'
If you are not a Java expert, you will need to refer to Java documentation for precise
definitions of these three elements. You’ll also need to learn more about egrep’s regexp
syntax to solve these problems.
a. Show an egrep command that reads the standard input and echoes only those lines
that contain single-line comments. You may assume that no String or char
literals and no block comments using /* and */ are present.
b. Show an egrep command that reads the standard input and echoes only those lines
that contain valid String literals. You may assume that no comments or char
literals are present.
c. Show an egrep command that reads the standard input and echoes only those lines
that contain valid char literals. You may assume that no comments or String
literals are present.
EXERCISE 6
The lexical structure of Java programs includes the following four elements:
• Single-line comments, as described in the previous exercise.
• Literals of the String type, as described in the previous exercise.
• Literals of the char type, as described in the previous exercise.
• Traditional comments using /* and */, as in these examples:
/* fix
this */
for (int i = 1; /* fix this */ i<100; i++)
/* hi /* // "hello" */
If you are not a Java expert, you will need to refer to Java documentation for precise
definitions of these elements. You’ll also need to learn more about the Java regexp syntax
and about the methods of the Matcher and Pattern classes. One hint: A common
beginner’s mistake when using the regex package is to forget to double up the
backslashes. Because a regexp is expressed as a Java String literal, a regexp like \s (the
character class for white space) must be written in the string as "\\s".
a. Write a Java application that reads a Java source file from standard input. It examines
the file one line at a time, identifies single-line comments by using the regex
package, and echoes only those comments (not necessarily the whole line) to
standard output. You may assume that none of the other elements described above is
present.
b. Write a Java application that reads a Java source file from standard input. It examines
the file one line at a time, identifies String literals by using the regex package,
and echoes only those literals to standard output, one on each line of output. (Note
that the input file may contain more than one String literal on a line; you must
find them all.) You may assume that none of the other elements described above
is present. Your program need not be well behaved if the input file contains errors
(such as unclosed String literals).
c. Write a Java application that reads a Java source file from standard input. It examines
the file one line at a time, identifies char literals by using the regex package,
and echoes only those literals to standard output, one on each line of output. (Note
that the input file may contain more than one char literal on a line; you must
find them all.) You may assume that none of the other elements described above
is present. Your program need not be well behaved if the input file contains errors
(such as unclosed char literals).
d. Write a Java application that takes the name of a Java source file as a command-line
argument. It reads the entire file into a string, identifies traditional comments by
using the regex package, and echoes only those comments to standard output. You
may assume that none of the other elements described above is present. Note that
these may be multiline comments, so it is not sufficient to process each input line
in isolation. Your program need not be well behaved if the input file contains errors
(such as unclosed traditional comments).
e. Write a Java application that takes the name of a Java source file as a command-line
argument. It reads the entire file into a string, identifies all four elements described
above by using the regex package, and echoes only them to standard output. Your
output should identify which of the four parts each element is. For example, if the
file Test.java contains this text:
Your program need not be well behaved if the input file contains errors. Hint: it may
sound difficult to handle comments that contain strings, strings that contain comments,
and so on, but it isn’t. With minor modifications, you can simply combine the four
regexps from the previous four parts into one regexp. If your final regexp has four sets
of capturing parentheses, one for each of the four elements, you’ll be able to identify
which of the four was responsible for each match by using the group method of the
Matcher class.
EXERCISE 7
Write a Java application that examines a text file, looking for words that occur twice in
a row, separated by white space. (Define a “word” to be a string of one or more letters,
upper- or lowercase; define “white space” as in the Java regexp \s character class.) Your
application should take the file name as input on the command line. For each duplicated
word found, your application should print the word, the line number (counting the
first line as line 1), and the character position of the start of the first of the two copies
(counting the first character as character 1).
For example, if the file speech.txt contains this:
In your application, read the entire file into a string, then use the regex package to find
the duplicated words. (As the example shows, you must catch pairs that are split from
one line to the next.) You’ll need to learn more about the Java regexp syntax and about
the methods of the Matcher and Pattern classes. Be careful that your application
does not find partial word pairs, like “ago our” in the example above or “equal equality.”
9
CHAPTER
Advanced Topics in
Regular Languages
There are many more things to learn about finite automata than
are covered in this book. There are many variations with interesting
applications, and there is a large body of theory. Especially interesting,
but beyond the scope of this book, are the various algebras that arise
around finite automata. This chapter gives just a taste of some of
these advanced topics.
9.1 DFA Minimization
[DFA diagram: a DFA with states q0, q1, q2, q3, and q4 over the alphabet {a, b}]
This machine has two “trap” states, q3 and q4. When in state q3, the machine will
ultimately reject the string, no matter what the unread part contains. State q4 has exactly
the same property, so it is equivalent to q3 in an important sense and can be merged with q3
without changing the language accepted by the machine. So then we have
[DFA diagram: the same machine with q3 and q4 merged into a single trap state]
The states q1 and q2 are equivalent in that same sense. When in state q2, the machine will
accept if and only if the rest of the string consists of zero or more as. State q1 has exactly the
same property, so it is equivalent to q2 and can be merged with it. We then have
[DFA diagram: the minimized machine, with q1 and q2 also merged into a single state]
The result is a minimum-state DFA for the language {xay | x ∈ {b}* and y ∈ {a}*}.
That example used an important idea about the equivalence of two states. Informally, we
said that two states were equivalent when the machine’s future decision, after any remaining
input, was going to be the same from either state. Formally, let’s define a little language
L(M, q) for each state q, which is the language that would be accepted by M if q were used as
the start state:
L(M, q) = {x ∈ Σ* | δ*(q, x) ∈ F}
Now we can formally define our idea of the equivalence of two states: “q is equivalent to r”
means L(M, q) = L(M, r). In our original DFA above, we had
L(M, q0) = {xay | x ∈ {b}* and y ∈ {a}*}
L(M, q1) = {x | x ∈ {a}*}
L(M, q2) = {x | x ∈ {a}*}
L(M, q3) = {}
L(M, q4) = {}
Thus q1 was equivalent to q2, and q3 was equivalent to q4.
A general procedure for minimizing DFAs is this:
1. Eliminate states that are not reachable from the start state.
2. Combine all the equivalent states, so that no two remaining states are equivalent to
each other.
Formally, Step 2 can be described as the construction of a new DFA whose states are
the equivalence classes of the states of the original DFA. This is sometimes called
the quotient construction. But we will not give a formal definition of the quotient
construction. Instead, we will just state without proof the important property of the
whole minimization procedure:
Theorem 9.1: Every regular language has a unique minimum-state DFA, and no
matter what DFA for the language you start with, the minimization procedure
finds it.
The minimum-state DFA is “unique” in a structural sense. Mathematicians say that the
minimized DFA is unique up to isomorphism—in this case, unique except perhaps for the
names of the states, which could of course be changed without affecting the structure of
the DFA materially. Thus our minimization procedure is both safe and effective: safe, in
that it does not change the language accepted by the DFA; effective, in that it arrives at the
structurally unique, smallest DFA for that language.
As described above, our minimization procedure is adequate for simple exercises done
by hand, but it is not obvious how to write a program to do it. Is there an algorithm that
can efficiently detect equivalent states and so perform the minimization? The answer is yes.
The basic strategy is to start with a partition of the states into two classes, accepting and
nonaccepting, and then repeatedly divide these into smaller partitions as states are discovered
to be nonequivalent. The Further Reading section below has references for this algorithm.
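To make the strategy concrete, here is a small sketch of that partition-refinement idea. It is illustrative code, not the book's: it assumes the states are numbered 0..n-1, the alphabet is indexed 0..k-1, delta[s][c] gives the next state, and unreachable states have already been removed (step 1 of the procedure).
import java.util.*;
// Sketch of partition refinement: start with accepting vs. nonaccepting, then split
// blocks until no two states in the same block disagree on the block any symbol leads to.
class DfaMinimizationSketch {
    static int[] minimize(int n, int k, int[][] delta, boolean[] accepting) {
        int[] block = new int[n];                       // block[s] = current class of state s
        for (int s = 0; s < n; s++) block[s] = accepting[s] ? 1 : 0;
        while (true) {
            Map<List<Integer>, Integer> ids = new HashMap<>();
            int[] next = new int[n];
            for (int s = 0; s < n; s++) {
                List<Integer> sig = new ArrayList<>();  // signature: own block plus successor blocks
                sig.add(block[s]);
                for (int c = 0; c < k; c++) sig.add(block[delta[s][c]]);
                Integer id = ids.get(sig);
                if (id == null) { id = ids.size(); ids.put(sig, id); }
                next[s] = id;
            }
            if (Arrays.equals(next, block)) return block;   // stable: states with equal entries can be merged
            block = next;
        }
    }
}
Two states end up with the same final block number exactly when they are equivalent in the sense described above, so those states can be merged.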
Minimization results are weaker for NFAs. With NFAs, you can eliminate unreachable
states and combine states using an equivalence relation similar to the one we saw for DFAs.
But the resulting NFA is not necessarily a unique minimum-state NFA for the language.
9.2 Two-Way Finite Automata
[Diagram: a 2DFA with its read head on an input tape holding x1 x2 ... xn-1 xn between left and right end-markers]
The picture above shows an input tape containing the input string x1x2 ... xn-1xn. The input
string is framed on the tape by special symbols that serve as left and right end-markers. Like
a DFA, a 2DFA defines state transitions based on the current symbol and state. But the
transition function has an expanded role, since it now returns two things: a new state and
a direction to move the head, either L for left or R for right. For example, δ(q, a) = (s, L)
says that if the 2DFA is in the state q and the head is currently reading the symbol a, it then
enters the state s and moves the head one place to the left. (On every transition the head
moves one place, either left or right.) The transition function cannot move the head left past
the left end-marker or right past the right end-marker.
In a DFA the end of the computation is reached when the last input symbol is read,
but a 2DFA needs some other way to tell when the computation is finished. So instead of a
set of accepting states, a 2DFA has a single accepting state t and a single rejecting state r. Its
computation is finished when it enters either of these two states, signaling that it has reached
its decision about the input string. (It can also get into an infinite loop, so it might never
reach a decision.)
We state without proof the important result about two-way finite automata: the languages that can be recognized by 2DFAs are exactly the regular languages.
So adding two-way reading to finite automata does not increase their power; they can still
recognize exactly the regular languages.
The interesting thing is that with just a little more tweaking we can get automata that
are much more powerful, as we will see in later chapters. Adding the ability to write the
tape as well as read it yields a kind of machine called a linear bounded automaton, which
can define far more than just the regular languages. Adding the ability to write and to move
unboundedly far past at least one of the end-markers produces a kind of machine called a
Turing machine, which is more powerful still.
9.3 Finite-State Transducers
[Finite-state transducer diagram: states q0, q1, and q2, with each transition labeled by a pair such as 0,0 or 1,ε or #,0#, giving the input symbol read and the output string produced]
The alphabet for this machine is {0, 1, #}. It works like a DFA, except that on every
transition, the machine not only reads one input symbol, but also produces a string of zero or
more output symbols. Each transition is labeled with a pair a, x. The first element in the pair,
a, is the input symbol read when the transition is made; the second element in the pair, x, is
an output string generated when the transition is made. There is no accepting state, because
the machine is not trying to recognize a language—it is trying to transform input strings into
output strings.
Let’s see what this machine does given the input 10#11#. Starting in q0, initially no
input has been read and no output produced; each move then reads one input symbol and
appends the output string of the transition taken.
Given the input 10#11#, the machine produces the output 10#10#. In general, if the input
to this machine is a string of binary numbers, each terminated by one or more # symbols, the
output is a string of the same binary numbers rounded down to the nearest even number, each
terminated by a single # symbol.
As this example suggests, finite-state transducers can be used as signal processors. In
particular, they are used in a variety of natural-language processing, speech recognition,
and speech-synthesis applications. Finite-state transducers come in many flavors. Some are
deterministic, some nondeterministic; some associate an output string with each transition,
others with each state.
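For readers who like to see the idea in code, here is a minimal sketch of a deterministic transducer of the first flavor, in which each transition carries an output string. This is illustrative code, not the book's; the tiny example machine, which copies its input but doubles every 0, is made up for the sketch.
import java.util.*;
// Sketch only: the transition table maps (state, input symbol) to a next state and an output string.
public class TransducerSketch {
    record Step(int next, String out) {}

    static String run(Map<Integer, Map<Character, Step>> delta, int start, String input) {
        StringBuilder output = new StringBuilder();
        int state = start;
        for (char c : input.toCharArray()) {
            Step step = delta.get(state).get(c);   // deterministic: exactly one move per symbol
            output.append(step.out());
            state = step.next();
        }
        return output.toString();
    }

    public static void main(String[] args) {
        Map<Integer, Map<Character, Step>> delta = Map.of(
            0, Map.of('0', new Step(0, "00"), '1', new Step(0, "1")));
        System.out.println(run(delta, 0, "1011"));   // prints 10011
    }
}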
Exercises
EXERCISE 1
For each of the following DFAs, list the unreachable states if any, show L(M, q) for each
q ∈ Q, and construct the minimized DFA using the procedure of Section 9.1.
a. [DFA diagram: states q0, q1, and q2 over the alphabet {0, 1}]
b. [DFA diagram: states q0, q1, q2, and q3 over the alphabet {a, b}]
c. [DFA diagram: states q0 through q5 over the alphabet {a, b}]
EXERCISE 2
Write a Java class that implements the finite-state transducer example from Section 9.3.
Include a main method that tests it on input from the command line.
EXERCISE 3
Using regular expressions extended with complement, find an expression with a star
height of 0 for each of these languages. Assume that Σ = {a, b}.
a. Σ*
b. L((a + b)*aa(a + b)*)
c. {x ∈ Σ* | x has no two as in a row}
d. L(b*)
e. For any two extended regular expressions r1 and r2 of star height 0, the language
L(r1) ∩ L(r2).
f. For any two extended regular expressions r1 and r2 of star height 0, the language
L(r1) - L(r2).
g. L((ab)*). Hint: Almost all of these are strings that begin with ab, end with another
ab, and have no two as or bs in a row.
10
CHAPTER
Grammars
For a more abstract example, consider this grammar, which we’ll call G2:
S → aS
S → X
X → bX
X → ε
The final production for X says that an X may be replaced by the empty string, so that for
example the string abbX can become the string abb. Written in the more compact way, G2 is
S → aS | X
X → bX | ε
Here are some derivations using G2:
S ⇒ aS ⇒ aX ⇒ a
S ⇒ X ⇒ bX ⇒ b
The first part of the grammar is an alphabet, the nonterminal alphabet V. The
nonterminal alphabet contains the symbols we have been writing as uppercase letters—the
symbols that are used in derivations but do not occur in the string that is finally derived. The
second part of a grammar is the terminal alphabet Σ. We use the same symbol Σ that we have
used throughout this book for the alphabet of current interest; these are the symbols that we
have been writing as lowercase letters. The sets Σ and V are disjoint, meaning that no symbol
occurs in both. The third part of the grammar is a special nonterminal symbol S, the start
symbol. The final part of a grammar is the set of productions. Each production is of the form
x → y, where x and y are both strings over Σ ∪ V and x is not permitted to be the empty
string.
For example, consider our previous grammar for the language L(a*b*):
S → aS | X
X → bX | ε
Formally, the grammar is G = (V, Σ, S, P), where V = {S, X}, Σ = {a, b}, and the set P of
productions is
{S → aS, S → X, X → bX, X → ε}
A grammar is a 4-tuple: it is a mathematical structure with four parts given in order. The
names given to those four parts in the definition above—V, Σ, S, and P—are conventional,
but the grammar G could just as well have been defined as G = ({S, X}, {a, b}, S, {S → aS,
S → X, X → bX, X → ε}), without naming all the parts. The important thing is to specify the
four parts in the required order.
the production x → y says that wherever you see the substring x, you may substitute y. When
a string w can be transformed into a string z in this way, we write w ⇒ z. (This is read as
“w derives z.”) Formally, w ⇒ z if and only if w = uxv and z = uyv for some strings u and v
and some production (x → y) ∈ P; and w ⇒* z if and only if w can be transformed into z by a
sequence of zero or more such steps.
Notice here that ⇒* is reflexive: for any string x, x ⇒* x by a zero-step derivation. Using the
⇒* relation, we can define the language generated by G: L(G) = {x ∈ Σ* | S ⇒* x}.
Notice here the restriction that x ∈ Σ*. The intermediate strings in a derivation can involve
both terminal and nonterminal symbols, but only the fully terminal strings derivable from
the start symbol are in the language generated by the grammar.
All four of the preceding definitions were made with respect to some fixed grammar
G = (V, Σ, S, P). Clearly, different grammars produce different ⇒ and ⇒* relations. In those
rare cases when we need to work with two or more grammars at once, we use subscripts to
identify the relations of each: ⇒G, ⇒H, and so on.
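Since the ⇒ relation is just string substitution, it is easy to sketch in code. The following snippet is illustrative only (not the book's code; the names are made up): it computes every string reachable from a given string in one step.
import java.util.*;
// Sketch of the one-step derivation relation: replace one occurrence of some
// left-hand side x with the corresponding right-hand side y.
public class DerivationStepSketch {
    record Production(String lhs, String rhs) {}

    static Set<String> oneStep(String w, List<Production> productions) {
        Set<String> result = new LinkedHashSet<>();
        for (Production p : productions) {
            for (int i = w.indexOf(p.lhs()); i >= 0; i = w.indexOf(p.lhs(), i + 1)) {
                result.add(w.substring(0, i) + p.rhs() + w.substring(i + p.lhs().length()));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Grammar G2 from the text: S -> aS | X, X -> bX | ε.
        List<Production> g2 = List.of(
            new Production("S", "aS"), new Production("S", "X"),
            new Production("X", "bX"), new Production("X", ""));
        System.out.println(oneStep("abX", g2));   // prints [abbX, ab]
    }
}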
[NFA diagram: states S, R, and T, with an a self-loop on S, a b-transition from S to R, a c self-loop on R, and an ε-transition from R to T; T is the accepting state]
We will construct a grammar that generates the same language. Not only will it generate the
same language, but its derivations will exactly mimic the behavior of the NFA. To make the
construction clearer, the states in the NFA are named S, R, and T, instead of the usual q0, q1,
and q2. That’s because in our construction, each state of the NFA will become a nonterminal
symbol in the grammar, and the start state of the NFA will become the start symbol in the
grammar. That settles the two alphabets and the start state for the new grammar; all that
remains is to construct the set of productions.
For each possible transition Y ∈ δ(X, z) in the NFA (where z is any single symbol from Σ
or z = ε), we add a production X → zY to the grammar. For our example this is
Transition of M    Production in G
δ(S, a) = {S}    S → aS
δ(S, b) = {R}    S → bR
δ(R, c) = {R}    R → cR
δ(R, ε) = {T}    R → T
In addition, for each accepting state in the NFA, the grammar contains an ε-production for
that nonterminal. For our example this is
Accepting State of M    Production in G
T    T → ε
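To show how mechanical this construction is, here is a small sketch (illustrative code, not the book's) that emits the productions for an NFA given as a list of transitions and a set of accepting states:
import java.util.*;
// Sketch of the NFA-to-grammar construction: every transition Y ∈ δ(X, z) becomes a
// production X -> zY, and every accepting state X contributes X -> ε.
public class NfaToGrammarSketch {
    record Transition(String from, String symbol, String to) {}   // symbol "" stands for ε

    static List<String> productions(List<Transition> delta, Set<String> accepting) {
        List<String> result = new ArrayList<>();
        for (Transition t : delta)
            result.add(t.from() + " -> " + t.symbol() + t.to());
        for (String x : accepting)
            result.add(x + " -> ε");
        return result;
    }

    public static void main(String[] args) {
        // The example NFA from the text: a loops on S, b goes from S to R, c loops on R,
        // an ε-transition goes from R to T, and T is the accepting state.
        List<Transition> delta = List.of(
            new Transition("S", "a", "S"),
            new Transition("S", "b", "R"),
            new Transition("R", "c", "R"),
            new Transition("R", "", "T"));
        productions(delta, Set.of("T")).forEach(System.out::println);
    }
}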
If you’re given a grammar that happens to be single step, you can easily reverse the
construction and build a corresponding NFA. For instance, here’s a single-step grammar that
generates the language L(ab*a):
S → aR
R → bR
R → aT
T → ε
We can make an NFA with three states, S, R, and T. The first three productions correspond
to three transitions in the NFA diagram, and the final production tells us to make T an
accepting state. The resulting NFA is
[NFA diagram: an a-transition from S to R, a b self-loop on R, and an a-transition from R to T; T is the accepting state]
productions R → aY and Y → ε, both of which are single step. The resulting single-step
grammar is
S → aX
X → bR
R → aY
Y → ε
This can then be converted into an NFA using our construction:
[NFA diagram: an a-transition from S to X, a b-transition from X to R, and an a-transition from R to Y; Y is the accepting state]
The grammar we started with this time, although not quite in the form we wanted, was
still fairly close; every production had a single nonterminal as its left-hand side and a string
containing at most one nonterminal as its right-hand side. Moreover, that nonterminal on
the right-hand side was always the rightmost symbol in the string. A grammar that follows
this restriction is called a right-linear grammar.
Every right-linear grammar generates a regular language and can be converted fairly easily
into an NFA that accepts that language. As in the example above, the trick is to first rewrite
the grammar to make it single step. This is always possible.
X → z1K1
K1 → z2K2
...
Kn-1 → znKn
Kn → Y
where each Ki is a new nonterminal symbol. Now let G' = (V', Σ, S, P'), where
V' is the set of nonterminals used in the new productions. The new grammar G'
is single step; it remains to be proven that L(G) = L(G' ). This is straightforward.
Any derivation of a terminal string in G can be converted to a derivation of
the same terminal string in G' simply by replacing each step that used a rule
X → z1...znY with the n + 1 corresponding steps in G' and vice versa.
The construction given above makes more new productions than strictly necessary. For
example, it converts a production A → bC into the two productions A → bK1 and
K1 → C. Conversion is unnecessary in that case, of course, since the original production
was already in the required form. But it doesn’t hurt, and it makes the proof simpler to use a
single, general transformation for every production in the original grammar.
With a little more work, one can show that the language generated by a left-linear grammar
is also always a regular language. A grammar that is either left linear or right linear is
sometimes called a regular grammar.
A grammar does not have to be regular to generate a regular language. Consider this
grammar:
S → aSaa | ε
This is neither right linear nor left linear, yet the language it generates is a simple regular
language: L((aaa)*). If a grammar is regular, you can conclude that the language it generates
is regular. But if a grammar is not regular, you can’t conclude anything about the language it
generates; the language may still be regular.
All of this leads to the next important question: are there grammars that generate
languages that are not regular? This is answered in the next chapter.
Exercises
EXERCISE 1
These questions concern grammar G1 from section 10.1:
a. Show a derivation from S for the string thedogeatsthecat.
b. List all the lowercase strings that can be derived from P.
c. How many lowercase strings are in the whole language defined by G1—the language
of all lowercase strings that can be derived from S? Show your computation.
EXERCISE 2
Modify grammar G1 from Section 10.1 to make it allow at most one of the adjectives
red, little, black, or cute, as part of any noun phrase. For example, your grammar should
generate the sentence thereddoglovestheblackcat. Make sure it also generates all the original
sentences, without adjectives.
EXERCISE 3
These questions concern grammar G2 from Section 10.1:
c. Show a derivation from S for the string bbb.
d. Show a derivation from S for the string aabb.
e. Show a derivation from S for the empty string.
EXERCISE 4
Give a regular expression for the language generated by each of these grammars.
a. S → abS | ε
b. S → aS | aA (Hint: This one is tricky—be careful!)
A → aS | aA
c. S → smellA | fishA
A → y | ε
d. S → aaSa | ε
EXERCISE 5
Give a grammar for each of the following languages. In each case, use S as the start
symbol.
a. L(a*)
b. L(aa*)
c. L(a*b*c*)
d. L((abc)*)
e. The set of all strings consisting of one or more digits, where each digit is one of the
symbols 0 through 9.
EXERCISE 6
Give a DFA that accepts the language generated by this grammar:
S → A | B | C
A → aB | ε
B → bC | ε
C → cA | ε
EXERCISE 7
State each of the following grammars formally, as a 4-tuple. (Assume, as usual, that
the terminal alphabet is the set of lowercase letters that appear in productions, the
nonterminal alphabet is the set of uppercase letters that appear in productions, and the
start symbol is S.)
a. S → aS | b
b. S → abS | A
A → baA | ε
c. S → A | B | C
A → aB | ε
B → bC | ε
C → cA | ε
EXERCISE 8
According to the definition, what is the smallest possible grammar—with the smallest
possible set in each of the four parts? What language does it generate?
EXERCISE 9
Consider this grammar:
S → aS | bbS | X
X → cccX | ε
For each of the following, show all strings that can be derived from the given string in
one step. For example, for the string cS, your answer would be the three strings caS, cbbS,
and cX.
a. S
b. aXa
c. abc
d. ScS
e. aXbSc
EXERCISE 10
Consider this grammar:
S → AB | ε
A → aA | ε
B → bB | ε
For each of the following pairs related by ⇒*, show a derivation. For instance, for the
pair S ⇒* a, one correct answer is the four-step derivation S ⇒ AB ⇒ aAB ⇒ aB ⇒ a.
a. aA ⇒* aaaA
b. aB ⇒* abbB
c. Bb ⇒* bb
d. S ⇒* ε
e. S ⇒* a
f. S ⇒* aabb
EXERCISE 11
Give a grammar for each of the following languages. You need not state it formally; just
list the productions, and use S as the start symbol.
a. The set of all strings consisting of zero or more as with a b after each one.
b. The set of all strings consisting of one or more as with a b after each one.
c. The set of all strings consisting of one or more as, with a b between each a and the
next. (There should be no b before the first or after the last a.)
d. The set of all strings consisting of zero or more as, with a b between each a and the
next. (There should be no b before the first or after the last a.)
e. The set of all strings consisting of an open bracket (the symbol [) followed by a
list of zero or more digits separated by commas, followed by a closing bracket (the
symbol ]). (A digit is one of the characters 0 through 9.)
EXERCISE 12
Using the construction of Theorem 10.1, make a right-linear grammar that generates the
language accepted by each of these NFAs.
a. [NFA diagram with transitions labeled a and b]
b. [NFA diagram with transitions labeled b]
c. [NFA diagram]
d. [NFA diagram with transitions labeled a, b, and ε]
e. [NFA diagram over the alphabet {0, 1}]
EXERCISE 13
Using the construction of Theorem 10.2, make an NFA that accepts the language
generated by each of these right-linear grammars.
a. S → bS | aX
X → aX | ε
b. S → bS | X | ε
X → aX | cY
Y → ε
c. S → bR | ε
R → baR | ε
d. S → aS | bbS | X
X → cccX | d
EXERCISE 14
Given any single-step grammar G = (Q, Σ, S, P), give a formal construction for an
NFA M with L(M) = L(G). (This is the reverse of the construction used to prove
Theorem 10.1.) You don’t have to show a proof that the languages are equal.
EXERCISE 15
Consider this grammar G:
S → Sa | Sb | ε
Prove by induction on |x| that for any string x ∈ {a, b}*, S ⇒* Sx. Then use this result to
prove that L(G) = {a, b}*.
EXERCISE 16
Rewrite the proof of Theorem 10.1. This time, include a full inductive proof that G has
a derivation X ⇒* zY if and only if M has (X, z) ⊢* (Y, ε). Hint: It is easier to prove a
slightly stronger lemma, that G has an n-step derivation X ⇒* zY if and only if M has an
n-step sequence (X, z) ⊢* (Y, ε).
11
CHAPTER
Nonregular
Languages
We have now encountered regular languages in several different
places. They are the languages that can be recognized by a DFA.
They are the languages that can be recognized by an NFA. They
are the languages that can be denoted by a regular expression. They
are the languages that can be generated by a right-linear grammar.
You might begin to wonder, are there any languages that are not
regular?
In this chapter, we will see that there are. There is a proof tool that
is often used to prove languages nonregular. It is called the pumping
lemma, and it describes an important property of all regular
languages. If you can show that a given language does not have this
property, you can conclude that it is not a regular language.
11.1 The Language {anbn}
[NFA diagram for a small finite subset of {anbn}]
Here is one for the larger subset {anbn | n ≤ 2}:
[NFA diagram for {anbn | n ≤ 2}]
For the next larger subset {anbn | n ≤ 3}, we could use this:
[NFA diagram for {anbn | n ≤ 3}]
Clearly, this is not going to be a successful pattern on which to construct an NFA for the
whole language {anbn}, since for each larger value of n we are adding two more states. In
effect, we are using the states of the NFA to count how many as were seen, then to verify that
the same number of bs follows. This won’t work in general, since the F in NFA stands for
finite, but no finite number of states will be enough to count the unbounded n in {anbn}.
Of course, this failure does not prove that {anbn} is not regular, but it contains the germ
of the idea for a proof—the intuition that no fixed, finite number of states can be enough. A
formal proof follows below.
Theorem 11.1: {anbn} is not regular.
Proof: Let M = (Q, {a, b}, δ, q0, F) be any DFA over the alphabet {a, b}; we
will show that L(M) ≠ {anbn}. Consider the behavior of M on an arbitrarily long
string of as. As it reads the string, M visits a sequence of states: first δ*(q0, ε),
then δ*(q0, a), then δ*(q0, aa), and so on. Eventually, since M has only finitely
many states, it must revisit a state; that is, there must be some i and j with i < j
for which δ*(q0, ai) = δ*(q0, aj). Now by appending bj to both strings, we see
that δ*(q0, aibj) = δ*(q0, ajbj). Thus M ends up in the same final state for both
aibj and ajbj. If that is an accepting state, M accepts both; if it is a rejecting
state, M rejects both. But this means that L(M) ≠ {anbn}, since ajbj is in {anbn}
while aibj is not. Since we have shown that L(M) ≠ {anbn} for any DFA M, we
conclude that {anbn} is not regular.
An interesting thing about this proof is how much one can infer about the behavior
of a completely unknown DFA M. Using only the fact that as a DFA M must have a finite
number of states, one can infer that there must be some i and j with i < j for which M
either accepts both a ib j and a jb j or rejects both. The basic insight is that with a sufficiently
long string you can force any DFA to repeat a state. This is the basis of a wide variety of
nonregularity proofs.
Theorem 11.2: {xxR} is not regular for any alphabet with at least two symbols.
Proof: Let M = (Q, Σ, δ, q0, F) be any DFA with |Σ| ≥ 2; we will show that
L(M) ≠ {xxR}. The alphabet contains at least two symbols; call two of these a and b.
This proof was almost identical to the previous proof. We used the same insight that by
giving a sufficiently long string we could force the DFA to repeat a state—and by so doing,
we could force it to end up in the same final state for two strings that ought to be treated
differently.
11.3 Pumping
Both of the last two proofs worked by choosing a string long enough to make any given DFA
repeat a state. Both proofs just used strings of as, showing that for any given DFA there must
be some i and j with i < j for which δ*(q0, ai) = δ*(q0, aj).
For those proofs it was enough to find those two strings—two different strings that put
the DFA in the same state. For other proofs of nonregularity it is sometimes necessary to
carry the same argument further: to show that there are not just two but infinitely many
different strings, all of which leave the DFA in the same state. It is actually a fairly simple
extension. Let r be the state that repeats, so r = δ*(q0, ai). We know that r is visited again
after j - i more as: δ*(q0, aj) = δ*(q0, ai+(j-i)) = r. In fact, every time the machine reads an
additional j - i as, it will return to state r:
r = δ*(q0, ai)
= δ*(q0, ai+(j-i))
= δ*(q0, ai+2(j-i))
= δ*(q0, ai+3(j-i))
= ...
and so on. That little substring of j - i additional as can be “pumped” any number of times,
and the DFA always ends up in the same state.
All regular languages have an important property involving pumping. Any sufficiently
long string in a regular language must contain a “pumpable” substring—a substring that can
be replicated any number of times, always yielding another string in the language. Formally:
Lemma 11.1 (The Pumping Lemma for Regular Languages): For all regular
languages L there exists some k ∈ N such that for all xyz ∈ L with |y| ≥ k, there
exist uvw = y with |v| > 0, such that for all i ≥ 0, xuviwz ∈ L.
Proof: Let L be any regular language, and let M = (Q, Σ, δ, q0, F) be any DFA
for it. For k we choose k = |Q|. Consider any x, y, and z with xyz ∈ L and |y| ≥
k. Since |y| ≥ |Q| we know there is some state r that M repeats while reading
the y part of the string xyz. We can therefore divide y into three substrings
uvw = y so that δ*(q0, xu) = δ*(q0, xuv) = r. Now v can be pumped: for all
i ≥ 0, δ*(q0, xuvi) = r, and so δ*(q0, xuviwz) = δ*(q0, xuvwz) = δ*(q0, xyz) ∈ F.
Therefore, for all i ≥ 0, xuviwz ∈ L.
In this proof, nothing is known about the strings x, y, and z except that xyz ∈ L and
|y| ≥ k = |Q|. Because there are at least as many symbols in y as states in Q, we know that
some state r must be visited more than once as M reads the y part of the string xyz:
[Diagram: the string x y z, with M in state r at one point inside the y part and again at a later point]
That gives us a way to divide y into those three further substrings uvw = y:
[Diagram: the string x u v w z, with M in state r just after reading u and again just after reading v]
Now v is the pumpable substring. We can replace it with any number of copies, and we know
the DFA will end up in the same state after each copy, so we know it will end in the same
(accepting) state after the whole string is read. Thus it will accept, not just that original string
xyz = xuvwz, but xuviwz for all i.
[Diagram: the pumped string x u v v ... v w z]
Take a closer look at the structure of the pumping lemma. It is a sequence of clauses that
alternate “for all” and “there exist” parts: for all regular languages L there exists some k ∈ N
such that for all xyz ∈ L with |y| ≥ k, there exist uvw = y with |v| > 0, such that for all i ≥ 0,
xuviwz ∈ L. Here is the overall structure of the pumping lemma, using the symbols ∀ (“for
all”) and ∃ (“there exists”):
1. ∀L ...
2. ∃k ...
3. ∀xyz ...
4. ∃uvw ...
5. ∀i ...
Our proof shows how the ∃ values, k and uvw, can be constructed, starting from a DFA for
the language L. But the construction used in the proof is not part of the lemma. From now
on we will forget that k might be the number of states in a DFA for L; we will close the lid
and treat the pumping lemma as a black box. The lemma merely says that suitable values k
and uvw can be found; it does not say how.
11.4 Pumping-Lemma Proofs
The pumping lemma is very useful for proving that languages are not regular. For example,
here is a pumping-lemma proof showing that {anbn} is not regular. The steps in the proof are
numbered for future reference.
1. The proof is by contradiction using the pumping lemma for regular languages.
Assume that L = {anbn} is regular, so the pumping lemma holds for L. Let k be as
given by the pumping lemma.
2. Choose x, y, and z as follows:
x = ak
y = bk
z = ε
Now xyz = akbk ∈ L and |y| ≥ k as required.
3. Let u, v, and w be as given by the pumping lemma, so that uvw = y, |v| > 0, and for
all i ≥ 0, xuviwz ∈ L.
4. Choose i = 2. Since v contains at least one b and nothing but bs, uv2w has more bs
than uvw. So xuv2wz has more bs than as, and xuv2wz ∉ L.
5. By contradiction, L = {anbn} is not regular.
The structure of this proof matches the structure of the pumping lemma. The alternating
∀ and ∃ parts of the lemma make every pumping-lemma proof into a kind of game. The ∃
parts (the natural number k and the strings u, v, and w) are merely guaranteed to exist. In
effect, the pumping lemma itself makes these moves; it is a black box that produces values for
k, u, v, and w. In Steps 1 and 3 we could only say these values are “as given by the pumping
lemma.” The parts, on the other hand, are the moves you get to make; you can choose any
values for x, y, z, and i, since the lemma is supposed to hold for all such values. In Steps 2
and 4 we chose values for x, y, z, and i so as to reach a contradiction: a string xuviwz that is
not in L, contradicting the pumping lemma.
Pumping-lemma proofs follow this pattern quite strictly. Here is another example: a
pumping-lemma proof showing that {xxR} is not regular.
1. The proof is by contradiction using the pumping lemma for regular languages.
Assume that L = {xxR } is regular, so the pumping lemma holds for L. Let k be as
given by the pumping lemma.
2. Choose x, y, and z as follows:
x = akbb
y = ak
z = ε
Now xyz = akbbak ∈ L and |y| ≥ k as required.
3. Let u, v, and w be as given by the pumping lemma, so that uvw = y, |v| > 0, and for
all i ≥ 0, xuviwz ∈ L.
4. Choose i = 2. Since v contains at least one a and nothing but as, uv2w has more
as than uvw. So xuv2wz has more as after the bs than before them, and thus
xuv2wz ∉ L.
5. By contradiction, L = {xxR} is not regular.
Notice that Steps 1, 3, and 5 are the same as before—only the language L has been changed.
In fact, Steps 1, 3, and 5 are the same in all pumping-lemma proofs. The only steps that
require creativity are Steps 2 and 4. In Step 2, you choose xyz and show that you have met the
requirements xyz ∈ L and |y| ≥ k; the trick here is to choose them so that pumping in the y
part will lead to a contradiction, a string that is not in L. In Step 4 you choose i, the number
of times to pump, and show that the contradiction has been achieved: xuviwz ∉ L.
11.5 Strategies
Proving that a language L is not regular using the pumping lemma comes down to those four
delicate choices: you must choose the strings xyz and the pumping count i and show that
these choices lead to a contradiction. There are usually a number of different choices that
successfully lead to a contradiction—and, of course, many others that fail.
Let A = {anbjan | n ≥ 0, j ≥ 1}, and let’s try to prove that A is not regular. The start is
always the same:
1. The proof is by contradiction using the pumping lemma for regular languages.
Assume that A is regular, so the pumping lemma holds for it. Let k be as given by
the pumping lemma.
2. Choose x, y, and z ...
How to proceed? The following table shows some good and bad choices for x, y, and z:
x = aaa, y = b, z = aaa: Bad choice. The pumping lemma requires |y| ≥ k; it never applies to
fixed-size examples. Since k is not known in advance, y must be some string that is
constructed using k, such as ak.
x = ε, y = ak, z = ak: Bad choice. The pumping lemma applies only if the string xyz ∈ A. That
is not the case here.
x = an, y = b, z = an: This is ill-formed, since the value of n is not defined. At this point the
only integer variable that is defined is k.
x = ak, y = bk+2, z = ak: This meets the requirements xyz ∈ A and |y| ≥ k, but it is a bad choice
because it won’t lead to a contradiction. Pumping within the string y will change the
number of bs in the middle, but the resulting string can still be in A.
x = ak, y = bbak, z = ε: This meets the requirements xyz ∈ A and |y| ≥ k, but it is a bad choice
because it won’t lead to a contradiction. The pumping lemma can choose any uvw = y
with |v| > 0. If it chooses u = b, v = b, and w = ak, there will be no contradiction, since
for all i ≥ 0, xuviwz ∈ A.
x = akb, y = ak, z = ε: Good choice. It meets the requirements xyz ∈ A and |y| ≥ k, and it will
lead to a contradiction because pumping anywhere in the y part will change the number
of as after the b, without changing the number before the b.
x = ε, y = ak, z = bak: An equally good choice.
If the language L is finite then there must be some number which is greater than the length
of the longest string in L. Choosing k to be this number, it is then obviously true that “for all
xyz ∈ L with |y| ≥ k, ...”—since there are no strings in L with |y| ≥ k. The rest of the lemma
is, as mathematicians say, vacuously true.
Thus, if L is a finite language, we can’t get a contradiction out of the pumping lemma.
In fact, you may already have noticed the following: every finite language is regular (a
regular expression for it can simply list all of its strings).
Alternative proofs using DFAs, NFAs, or right-linear grammars are only a little harder.
Exercises
EXERCISE 1
Prove that {anbncn} is not regular. Hint: Copy the proof of Theorem 11.1—only minor
alterations are needed.
EXERCISE 2
Prove that {anb*cn} is not regular. Hint: Copy the proof of Theorem 11.1—only minor
alterations are needed.
EXERCISE 3
Show that {xxR} is regular when |Σ| = 1.
EXERCISE 4
Show that {xcxR | x ∈ {a, b}*} is not regular. Hint: Copy the proof of Theorem 11.2—
only minor alterations are needed.
EXERCISE 5
Show that {xx | x ∈ {a, b}*} is not regular. Hint: Copy the proof of Theorem 11.2—only
minor alterations are needed.
EXERCISE 6
Let A = {anb2n | n ≥ 0}. Using the pumping lemma for regular languages, prove that A is
not regular.
EXERCISE 7
Let B = {anbncn | n ≥ 0}. Using the pumping lemma for regular languages, prove that B is
not regular.
EXERCISE 8
Let C = {0n1m0p | n + m = p}. Using the pumping lemma for regular languages, prove that
C is not regular.
EXERCISE 9
Let D = {anbm | n m}. Using the pumping lemma for regular languages, prove that D is
not regular.
EXERCISE 10
Let E = {anbm | m n}. Using the pumping lemma for regular languages, prove that E is
not regular.
EXERCISE 11
Let F = {an | n = b2 for some integer b }. Using the pumping lemma for regular languages,
prove that F is not regular.
12
CHAPTER
Context-free
Languages
We defined the right-linear grammars by giving a simple restriction
on the form of each production. By relaxing that restriction a bit, we
get a broader class of grammars: the context-free grammars. These
grammars generate the context-free languages, which include all
the regular languages along with many that are not regular.
Why is this called context free? A production like uRz → uyz specifies that R can be replaced
by y, but only in a specific context—only when there is a u to the left and a z to the right. In
a context-free grammar, all productions look like R → y, so they specify a substitution for a
nonterminal that does not depend on the context of surrounding symbols in the string.
Because every regular language has a right-linear grammar and every right-linear grammar
is a CFG, it follows that every regular language is a CFL. But, as the examples above show,
CFGs can generate languages that are not regular, so not every CFL is regular. The CFLs
properly contain the regular languages, like this:
[Diagram: the regular languages drawn as a proper subset of the CFLs, with L(a*b*) inside the regular languages and {anbn} in the CFLs but outside the regular languages]
[NFA diagram: states S, T, and U; a 1 loops on each state, while a 0 moves from S to T, from T to U, and from U back to S; S is the start state and the only accepting state, so the machine accepts the strings over {0, 1} whose number of 0s is divisible by 3]
Now we apply the construction of Theorem 10.1. Wherever the NFA has Y ∈ δ(X, z), we
add the production X → zY, and for each accepting state X in the NFA, we add X → ε. The
result is this grammar for L:
S → 1S | 0T | ε
T → 1T | 0U
U → 1U | 0S
12.2.3 Concatenations
A divide-and-conquer approach is often helpful for writing CFGs, just as for writing
programs. For instance, consider the language L = {anbncmdm}. At first glance this may look
daunting, but notice that we can easily write grammars for {anbn} and {cmdm}:
S1 → aS1b | ε
S2 → cS2d | ε
These are two separate grammars with two separate start symbols S1 and S2. Now every string
in L consists of a string generated from S1 followed by a string generated from S2. So if we
combine the two grammars and introduce a new start symbol, we get a full grammar for L:
S → S1S2
S1 → aS1b | ε
S2 → cS2d | ε
In general, when you discover that a CFL L can be thought of as the concatenation of two
languages L1 and L2
L = L1L2 = {xy | x ∈ L1 and y ∈ L2}
you can write a CFG for L by writing separate CFGs for L1 and L2, carefully keeping the two
sets of nonterminal symbols separate and using two separate start symbols S1 and S2. A full
grammar for L is then given by combining all the productions and adding a new start symbol
S with the production S → S1S2.
12.2.4 Unions
Another use of the divide-and-conquer approach is for a language that can be decomposed
into the union of two simpler languages. Consider this language:
L = {z ∈ {a, b}* | z = xxR for some x, or |z| is odd}
Taken all at once this might be difficult, but notice that the definition of L can be expressed
as a union:
L = {xxR | x ∈ {a, b}*} ∪ {z ∈ {a, b}* | |z| is odd}
We can easily give CFGs for these two parts—again, being careful to use separate sets of
nonterminal symbols and separate start symbols. A grammar for {xxR | x ∈ {a, b}*} is
S1 → aS1a | bS1b | ε
And a grammar for {z ∈ {a, b}* | |z| is odd} is
S2 → XXS2 | X
X → a | b
So a grammar for the whole language L is
S → S1 | S2
S1 → aS1a | bS1b | ε
S2 → XXS2 | X
X → a | b
For a slightly more subtle example, consider the language L = {anbm | n ≠ m}. One way to
build a grammar for this is to think of it as a union:
while (a<b) {
    c = c * a;
    a = a + a;
}
The BNF grammar for a full language may include hundreds of productions.
tree from left to right. These two different ways of thinking of a grammar are equivalent in
the sense that a string is produced by some derivation if and only if it is the fringe of some
parse tree.
For example, consider again this grammar for a simple language of expressions:
<exp> ::= <exp> - <exp> | <exp> * <exp> | <exp> = <exp> | <exp> < <exp>
| (<exp>) | a | b | c
The string a-b*c is in the language generated by this grammar. Our usual way to
demonstrate this has been to give a derivation, like this:
<exp> ⇒ <exp> * <exp>
⇒ <exp> - <exp> * <exp>
⇒ a - <exp> * <exp>
⇒ a - b * <exp>
⇒ a - b * c
Here is a parse tree for the same string:
<exp>
<exp> * <exp>
c
<exp> - <exp>
a b
This parse tree is more than just a demonstration that the string a–b*c is in the
language—it is also a plan for evaluating the expression when the program is run. It says to
evaluate a - b, then multiply that value by c. (That’s not how most programming languages
handle a-b*c, of course—more about that below!) In general, a parse tree specifies not just
the syntax of a program, but how the different parts of the program fit together. This in turn
says something about what happens when the program runs. The parse tree (or a simplified
version called the abstract syntax tree) is one of the central data structures of almost every
compiler or other programming language system. To parse a program is to find a parse tree
for it. Every time you compile a program, the compiler must first parse it. We will see more
about algorithms for parsing in another chapter.
12.5 Ambiguity
The previous grammar for expressions is ambiguous in the sense that it permits the
construction of different parse trees for the same string. For example, consider our string
a-b*c. We built one parse tree for it, but here is another one:
[Parse tree: the root <exp> expands to <exp> - <exp>; the left <exp> derives a, and the right <exp> expands to <exp> * <exp>, deriving b and c]
This parse tree suggests a different plan for evaluating the expression: first compute the
value b * c, then subtract that value from a. This kind of ambiguity is unacceptable for
programming languages; part of the definition of the language must be a clear decision about
whether a-b*c means (a - b) * c or a - (b * c).
To resolve this problem, BNF grammars for programming languages are usually crafted
to be unambiguous. They not only specify the intended syntax, but do so with a unique parse
tree for each program, one that agrees with intended semantics. This is not usually difficult,
but it generally means making the grammar more complicated. For example, the following
grammar generates the same language as the previous one, but now does so unambiguously:
<exp> ::= <ltexp> = <exp> | <ltexp>
<ltexp> ::= <ltexp> < <subexp> | <subexp>
<subexp> ::= <subexp> - <mulexp> | <mulexp>
<mulexp> ::= <mulexp> * <rootexp> | <rootexp>
<rootexp> ::= (<exp>) | a | b | c
Using this grammar, a parse tree for a-b*c is
[Parse tree: <exp> derives <ltexp>, which derives <subexp>; that <subexp> expands to <subexp> - <mulexp>, with the left <subexp> deriving a and the <mulexp> deriving b * c]
Like most programming language definitions, this gives the multiplication operator higher
precedence than the subtraction operator, so that the expression a-b*c is computed as
a - (b * c). It also makes subtraction left associative, so that the expression a-b-c is
computed as (a - b) - c.
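To see how the shape of the unambiguous grammar fixes precedence and associativity, here is a small illustrative sketch (not the book's code, and parsing algorithms are treated properly in a later chapter): one method per nonterminal, with the left-recursive rules realized as loops, so * binds tighter than - and subtraction groups to the left. Only -, *, parentheses, and the names a, b, and c (given made-up values) are handled; = and < are omitted to keep the sketch short.
// Sketch only: the call structure mirrors <subexp>, <mulexp>, and <rootexp>.
public class ExprSketch {
    private final String input;
    private int pos = 0;
    ExprSketch(String input) { this.input = input; }

    int subexp() {                      // <subexp> ::= <subexp> - <mulexp> | <mulexp>
        int value = mulexp();
        while (pos < input.length() && input.charAt(pos) == '-') {
            pos++;
            value = value - mulexp();   // left associative: combine as we go
        }
        return value;
    }
    int mulexp() {                      // <mulexp> ::= <mulexp> * <rootexp> | <rootexp>
        int value = rootexp();
        while (pos < input.length() && input.charAt(pos) == '*') {
            pos++;
            value = value * rootexp();
        }
        return value;
    }
    int rootexp() {                     // <rootexp> ::= (<exp>) | a | b | c
        char c = input.charAt(pos++);
        if (c == '(') { int v = subexp(); pos++; return v; }   // skip the closing parenthesis
        if (c == 'a') return 10;        // illustrative values, not part of the grammar
        if (c == 'b') return 4;
        return 2;                       // 'c'
    }
    public static void main(String[] args) {
        System.out.println(new ExprSketch("a-b*c").subexp());   // prints 2, i.e. a - (b * c)
        System.out.println(new ExprSketch("a-b-c").subexp());   // prints 4, i.e. (a - b) - c
    }
}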
An alternative solution to the problem of ambiguity is to stick with the ambiguous,
simple grammar, but add some text describing how to choose among the possible parse
trees. In our case, we could stick with our first grammar but add a sentence saying that
each operator is at a separate level of precedence, in order =, <, -, and * from lowest
to highest, and that all except = are left associative. If the grammar is meant only for
human consumption, this approach may make sense. But often, grammars are meant to
be used directly by computer programs, like parsers and parser generators, about which
we will see more in another chapter. For such uses it is generally better for the grammar
to be unambiguous. (Incidentally, there are CFLs for which it is not possible to give an
unambiguous grammar. Such languages are called inherently ambiguous. But programming
languages rarely suffer from this problem.)
12.7 Exercises
EXERCISE 1
Give CFGs for the following languages.
a. L(a*b*)
b. {x ∈ {0, 1}* | the number of 0s in x is divisible by 3}
c. {x ∈ {0, 1}* | |x| is divisible by 3}
d. {x ∈ {0, 1}* | x is a binary representation of a number divisible by 3}
e. {x ∈ {a, b}* | x ∉ L(a*b*)}
f. {anbm | m > 2n}
g. {anbn | n is even}
h. {(ab)nc(de)n}
i. {anbn | n is odd}
j. {aibnajbnak}
k. {anbnaibj}
l. {x ∈ {0, 1}* | the number of 0s in x is divisible by 3,
or |x| is divisible by 3, or both}
m. {anbm | m > 2n or m < n}
n. {x ∈ {a, b}* | x ∉ {anbn}}
o. {anbnc j}
p. {anb2n}
q. {anbmcmdn}
r. {anbncmdm}
s. {x ∈ {a, b}* | x = xR}
EXERCISE 2
Give a BNF grammar for each of the languages below:
a. The set of all strings consisting of zero or more as.
b. The set of all strings consisting of an uppercase letter followed by zero or more
additional characters, each of which is either an uppercase letter or one of the
characters 0 through 9.
c. The set of all strings consisting of one or more as.
d. The set of all strings consisting of one or more digits. (Each digit is one of the
characters 0 through 9.)
e. The set of all strings consisting of zero or more as with a semicolon after each one.
f. The set of all strings consisting of the keyword begin, followed by zero or more
statements with a semicolon after each one, followed by the keyword end. Use the
nonterminal <statement> for statements, and do not give productions for it.
g. The set of all strings consisting of one or more as with a semicolon after each one.
h. The set of all strings consisting of the keyword begin, followed by one or more
statements with a semicolon after each one, followed by the keyword end. Use the
nonterminal <statement> for statements, and do not give productions for it.
i. The set of all strings consisting of one or more as, with a comma between each a
and the next. (There should be no comma before the first or after the last a.)
j. The set of all strings consisting of an open bracket (the symbol [) followed by a list
of one or more digits separated by commas, followed by a closing bracket (the
symbol ]).
k. The set of all strings consisting of zero or more as, with a comma between each a
and the next. (There should be no comma before the first or after the last a.)
l. The set of all strings consisting of an open bracket (the symbol [) followed by a list
of zero or more digits separated by commas, followed by a closing bracket (the
symbol ]).
EXERCISE 3
Give an EBNF grammar for each of the languages of Exercise 1. Use the EBNF
extensions wherever possible to simplify the grammars. In particular, you should
eliminate recursion from the grammars wherever possible. Don’t forget to put quotation
marks around metasymbols when they are used as tokens.
EXERCISE 4
Show that each of the following BNF grammars is ambiguous. (To show that a grammar
is ambiguous, you must demonstrate that it can generate two parse trees for the same
string.)
a. This grammar for expressions:
<exp> ::= <exp> + <exp>
| <exp> * <exp>
| ( <exp> )
|a|b|c
b. This grammar:
<person> ::= <woman> | <man>
<woman> ::= wilma | betty | <empty>
<man> ::= fred | barney | <empty>
13
CHAPTER
Stack Machines
Commonly, the word stack refers to any orderly, vertical pile: a stack of
books, plates, or poker chips. All the action in a stack is at the top, where
items can be added or removed. Throughout computer science, the word
stack is used in a related technical sense: a stack is a collection of data
accessed in last-in-first-out order. Data items may be added to the top
(pushed onto the stack) or removed from the top (popped off the stack).
Stacks are ubiquitous in computer programming, and they have an
important role in formal language as well. A stack machine is a kind
of automaton that uses a stack for auxiliary data storage. The size of the
stack is unbounded—it never runs out of space—and that gives stack
machines an edge over finite automata. In effect, stack machines have
infinite memory, though they must use it in stack order.
If you travel by two paths that seem to depart in different directions, it is
a surprise to discover that they lead to the same destination. It makes that
destination feel more important—an intersection rather than a dead end.
That is the situation with context-free languages. Stack machines and
CFGs seem like two very different mechanisms for language definition—
two paths that depart in different directions. But it turns out that these
two paths lead to the same place. The set of languages that can be defined
using a stack machine is exactly the same as the set of languages that can
be defined using a CFG: the context-free languages.
13.1 Stack Machine Basics
read pop push
a c abc
The entry in the first column is an input symbol (or ε—more about this below). The entry
in the second column is a stack symbol, and the entry in the third column is a string of stack
symbols. The example above says
If the current input symbol is a, and if the symbol on top of the
stack is c, you may pop the c off the stack, push the string abc on in
its place, and advance to the next input symbol.
Remember that the stack is represented as a string with the top symbol at the left end. So
when you push a string of two or more symbols onto the stack, it is the leftmost one that
becomes the new top. For example, suppose the stack machine’s next input symbol is an a,
and suppose its stack is cd. Then the move shown above can be used, leaving the string abcd
on the stack. (It first pops the c off, leaving d, then pushes the string abc on, leaving abcd.)
The new top symbol on the stack is the leftmost symbol, the a.
Every move of a stack machine pops one symbol off the stack, then pushes a string of
zero or more symbols onto the stack. To specify a move that leaves the stack unchanged, you
can explicitly push the popped symbol back on, like this:
read pop push
a c c
In this case, if the stack machine’s next input symbol is an a, and its stack is cd, then the
move shown above can be used, leaving the string cd on the stack. (It first pops the c off,
leaving d, then pushes the c back on, leaving cd.)
Every move of a stack machine pushes some string of symbols onto the stack. To specify
a move that pops but does not push, you can explicitly push the empty string, like this:
read pop push
a c ε
In this case, if the stack machine’s next input symbol is an a, and its stack is cd, then the
move shown above can be used, leaving the string d on the stack. (It first pops the c off,
leaving d, then pushes the empty string back on, still leaving d.)
The entry in the first column can be ε. This encodes a move that can be made without reading an input symbol, just like an ε-transition in an NFA:

read pop push
ε c ab

The example above says that whenever the stack machine has a c on top of the stack, it may
pop it off and push the string ab onto the stack. This move does not advance to the next
input symbol and may even be made after all the input symbols have been read.
A stack machine starts with a stack that contains just one symbol S. On each move it can
alter its stack, but only in stack order—only by popping from the top and/or pushing onto
the top. If the stack machine decides to accept the input string, it signals this by leaving its
stack empty, popping everything off including the original bottom-of-stack symbol S.
Like an NFA, a stack machine is potentially nondeterministic: it may have more than
one legal sequence of moves on a given input string. It accepts if there is at least one legal
sequence of moves that reads the entire input string and ends with the stack empty—the initial
symbol S, and anything else that was pushed during the computation, must be popped off to
signal that the input string is accepted.
Consider this stack machine:
read pop push
1. ε S ab
2. a S ef
3. a S ε
Suppose the input string is a. The initial stack is S, so all three moves are possible. In fact,
there are three possible sequences of moves:
• If move 1 is used as the first move, no input is read and the stack becomes ab. From
there no further move is defined. This is a rejecting sequence of moves, both because the
input was not finished and because the stack is not empty.
• If move 2 is used as the first move, the a is read and the stack becomes ef; then no further
move can be made. This too is a rejecting sequence of moves, because even though all the
input was read, the stack is not empty.
• If move 3 is used as the first move, the a is read and the stack becomes empty. This is an
accepting sequence of moves: the input was all read and the stack is empty at the end.
Because there is at least one accepting sequence of moves for a, a is in the language defined
by this stack machine. (In fact, the language defined by this stack machine is just {a}.)
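The acceptance condition—try every sequence of moves, and accept if at least one of them works—is easy to express in code. Here is a small Java sketch (not from the text; the class and method names are invented) that represents a machine's move table and searches all move sequences. It is only an illustration: for machines whose ε-moves can keep growing the stack, the naive search below may not terminate on rejected strings.

import java.util.*;

// A minimal simulator for stack machines as described in this chapter:
// the stack is a string with its top symbol at the left, the machine starts
// with stack "S", and it accepts if some sequence of moves reads all the
// input and leaves the stack empty.
public class StackMachineSim {
    // One row of a move table: read an input symbol ("" stands for ε),
    // pop one stack symbol, push a string of stack symbols.
    record Move(String read, String pop, String push) { }

    // Explore every legal sequence of moves from the ID (input, stack).
    static boolean accepts(List<Move> moves, String input, String stack) {
        if (stack.isEmpty()) return input.isEmpty();     // no further move is possible
        for (Move m : moves) {
            if (!stack.startsWith(m.pop())) continue;    // pop column must match the top of the stack
            String newStack = m.push() + stack.substring(1);
            if (m.read().isEmpty()) {                    // ε-move: read nothing
                if (accepts(moves, input, newStack)) return true;
            } else if (input.startsWith(m.read())) {     // ordinary move: read one input symbol
                if (accepts(moves, input.substring(1), newStack)) return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // The three-move machine above, whose language is {a}.
        List<Move> m = List.of(
            new Move("", "S", "ab"),    // move 1
            new Move("a", "S", "ef"),   // move 2
            new Move("a", "S", ""));    // move 3
        System.out.println(accepts(m, "a", "S"));   // true
        System.out.println(accepts(m, "aa", "S"));  // false
    }
}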
Section 13.2 presents a stack machine for the language {anbn}, with these moves:

read pop push
1. a S S1
2. ε S ε
3. b 1 ε

Let's look at the accepting sequence of moves this machine makes for the string aaabbb.
Initially, the stack machine has all of aaabbb still to read and has S on its stack. The current input symbol and the top-of-the-stack symbol determine which moves are possible; here the possible moves are 1 and 2, but the accepting sequence starts with move 1. After that move, the stack machine has advanced to the second symbol of aaabbb and has S1 on its stack. Continuing in the same way, the whole accepting sequence looks like this, showing the remaining input and the stack at each step:

remaining input  stack
aaabbb           S
aabbb            S1     (move 1)
abbb             S11    (move 1)
bbb              S111   (move 1)
bbb              111    (move 2: pop S)
bb               11     (move 3)
b                1      (move 3)
ε                ε      (move 3)
At the end, the stack machine has read all its input and emptied its stack. That is an
accepting sequence of moves. As we have already seen, there are also many nonaccepting
sequences, such as this one:
remaining input  stack
aaabbb           S
aabbb            S1     (move 1)
aabbb            1      (move 2: pop S)
There is no next move here, so the stack machine terminates without reaching the end of the
input string and without emptying its stack. But since there is at least one accepting sequence
of moves, the string aaabbb is in the language defined by the machine.
The next example, from Section 13.3, is a stack machine for the language {xxR | x ∈ {a, b}*}: the strings that consist of any string over {a, b} followed by that same string reversed. Its moves are

read pop push
1. a S Sa
2. b S Sb
3. ε S ε
4. a a ε
5. b b ε

and here is an accepting sequence of moves for the string abbbba:

remaining input  stack
abbbba           S
bbbba            Sa     (move 1)
bbba             Sba    (move 2)
bba              Sbba   (move 2)
bba              bba    (move 3: pop S)
ba               ba     (move 5)
a                a      (move 5)
ε                ε      (move 4)
Notice that this stack machine can use move 3 at any time (though, since it pops off the
only S, it can be used only once). However, in an accepting sequence move 3 must occur
exactly in the middle of the input string. We can think of the stack machine as making a
guess about where the middle of the string is. When it guesses that it has found the middle,
it applies move 3, then begins popping instead of pushing. This takes advantage of the
nondeterminism of the stack machine. It accepts if there is any sequence of moves that reads
the entire input string and ends with an empty stack. All those sequences that make the
wrong guess about where the middle of the string is do not accept. But if the string is in our
language, the one sequence that makes the right guess about where the middle of the string is
does accept; and that one is all it takes.
Formally, a stack machine M is a 4-tuple (Γ, Σ, S, δ): Γ is the stack alphabet, Σ is the input alphabet, S ∈ Γ is the initial stack symbol, and δ is the transition function. The stack alphabet may or may not overlap with the input alphabet.
The transition function takes two parameters. The first parameter is an input symbol
or ε, and the second is a stack symbol. These parameters correspond to the first two
columns of the tables we used in the previous chapter: the input symbol being read (or ε
for ε-transitions) and the symbol currently on top of the stack. The output of the transition
function is a set of strings that can be pushed onto the stack. It is a set of strings, rather than
just a single string, because a stack machine is nondeterministic: at each stage there may be
any number of possible moves. For example, this machine:
read pop push
1. ε S ab
2. a S ef
3. a S ε

has δ(ε, S) = {ab} and δ(a, S) = {ef, ε}.
An instantaneous description (ID) for a stack machine is a pair (x, y), where x is the unread part of the input string and y is the current contents of the stack, written with the top symbol at the left. The δ function for the stack machine determines a relation ⊢ on IDs; we write I ⊢ J if I is an ID and J is an ID that follows from I after one move of the stack machine. Note that no move is possible when the stack machine's stack is empty; there is never an ID J with (x, ε) ⊢ J.
Next, as usual, we define an extended relation ⊢* for sequences of zero or more steps: I ⊢* J if and only if there is a sequence of zero or more ⊢ steps leading from I to J. Notice here that ⊢* is reflexive: for any ID I, I ⊢* I by a sequence of zero moves. Using the ⊢* relation, we can define the language accepted by a stack machine M:

L(M) = {x ∈ Σ* | (x, S) ⊢* (ε, ε)}

In this definition, the stack machine starts reading its input string x with S on the stack. It accepts the string x if it has some sequence of zero or more moves that ends with all the input used up and the stack empty.
For example, in Section 13.3 we gave an informal description of a stack machine for the language {xxR | x ∈ {a, b}*}, and we showed an accepting sequence of moves for the string abbbba. Formally, this machine is M = ({a, b, S}, {a, b}, S, δ), where
δ(a, S) = {Sa}
δ(b, S) = {Sb}
δ(ε, S) = {ε}
δ(a, a) = {ε}
δ(b, b) = {ε}
and the accepting sequence of moves that we showed for abbbba is
(abbbba, S) ⊢ (bbbba, Sa) ⊢ (bbba, Sba) ⊢ (bba, Sbba) ⊢ (bba, bba) ⊢ (ba, ba) ⊢ (a, a) ⊢ (ε, ε)
Thus (abbbba, S) ⊢* (ε, ε) and so, by definition, abbbba ∈ L(M).
That one example does not, of course, establish that L(M) = {xxR | x ∈ {a, b}*}. Proofs of
that kind tend to be quite tricky. In the case of our current stack machine, however, we can
sidestep explicit induction without too much difficulty. We first observe that only move 3
changes the number of S symbols in the stack, reducing it by 1. So move 3 must be used in
any accepting sequence, exactly once. In fact, any accepting sequence must have the form
(xy, S) ⊢* (y, Sz) ⊢3 (y, z) ⊢* (ε, ε)
where x, y, and z are in {a, b}*, only moves 1 and 2 occur before the use of move 3, and
only moves 4 and 5 occur after it. Moves 1 and 2 push the symbol just read onto the stack;
therefore z = xR. Moves 4 and 5 pop as and bs only if they match the input; therefore z = y.
Thus any accepting sequence must have the form
(xxR, S) ⊢* (xR, SxR) ⊢ (xR, xR) ⊢* (ε, ε)
and there is such an accepting sequence for any string x. We conclude that
L(M) = {xxR | x ∈ {a, b}*}.
For a further example, consider the language A of strings over {a, b} in which the number of as equals the number of bs. Here is a first attempt at a stack machine for A, which we'll call M1:

read pop push
1. a S 0S
2. a 0 00
3. b 0 ε
4. ε S ε

Here, moves 1 and 2 allow it to push a 0 for each a read. (Two moves are required to handle
the two possible symbols on top of the stack, S and 0.) Move 3 allows it to pop a 0 for each
b read. Move 4 allows it to pop the S off. Since move 4 is the only way to get rid of the S,
it must occur in any accepting sequence; and since it leaves the stack empty, no move can
follow it. Thus it is always the last move in any accepting sequence.
Formally, this is M1 = ({0, S}, {a, b}, S, δ), where the δ function is
δ(a, S) = {0S}
δ(a, 0) = {00}
δ(b, 0) = {ε}
δ(ε, S) = {ε}
Here is a sequence of IDs showing that M1 accepts abab:
(abab, S) ⊢ (bab, 0S) ⊢ (ab, S) ⊢ (b, 0S) ⊢ (ε, S) ⊢ (ε, ε)
This sequence shows that we have (abab, S) ⊢* (ε, ε), and so, by definition, abab ∈ L(M1).
It is clear that by using the stack as a counter, M1 checks that the number of as in the
string is the same as the number of bs. Thus everything M1 accepts is in our target language
A. Unfortunately, not everything in A is accepted by M1. Consider its behavior on the input
string abba. Starting in (abba, S), it can proceed to (bba, 0S ) and then to (ba, S ). But from
there it has no transition for reading the next input symbol, the b. Our strategy uses the
number of 0s on the stack to represent the number of as so far minus the number of bs so far.
That will fail if, as in abba, there is some prefix with more bs than as.
To handle this we need a more sophisticated strategy. We will still keep count using 0s
on the stack, whenever the number of as so far exceeds the number of bs so far. But when
the number of bs so far exceeds the number of as so far, we will keep count using 1s. That
strategy is embodied in this stack machine, which we'll call M2:

read pop push
1. a S 0S
2. a 0 00
3. b 0 ε
4. b S 1S
5. b 1 11
6. a 1 ε
7. ε S ε
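With the move numbering shown above, M2 now accepts the troublesome string abba that defeated M1; for example:

(abba, S) ⊢1 (bba, 0S) ⊢3 (ba, S) ⊢4 (a, 1S) ⊢6 (ε, S) ⊢7 (ε, ε)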
[Figure: the state diagram of a DFA over the alphabet {0, 1}, with states q0, q1, q2, and q3; q0 is the start state, and q2 and q3 are accepting. Its transitions parallel the stack-machine δ function given below.]
Our strategy for making a stack machine to simulate a DFA is simple. Until the last move,
our machine will always have exactly one symbol on its stack, which will be the current
state of the DFA. As its last move, if the DFA is in an accepting state, the stack machine will
pop that state off, ending with an empty stack. For the DFA above, our stack machine is
M = ({q0, q1, q2, q3}, {0, 1}, q0, δ), where the δ function is
δ(0, q0) = {q0}
δ(1, q0) = {q1}
δ(0, q1) = {q2}
δ(1, q1) = {q3}
δ(0, q2) = {q0}
δ(1, q2) = {q1}
δ(0, q3) = {q2}
δ(1, q3) = {q3}
δ(ε, q2) = {ε}
δ(ε, q3) = {ε}
Notice that we are using the states of the DFA as the stack alphabet of the stack machine,
and we are using the start state q0 of the DFA as the initial stack symbol. The transition
function of the stack machine exactly parallels that of the DFA, so that the stack machine
simply simulates the actions of the DFA. For example, the DFA accepts the string 0110 by
going through a series of states (q0, q0, q1, q3, q2) ending in an accepting state (q2). The stack
machine accepts the string 0110 through the corresponding sequence of IDs:
(0110, q0) ⊢ (110, q0) ⊢ (10, q1) ⊢ (0, q3) ⊢ (ε, q2) ⊢ (ε, ε)
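The construction is mechanical enough to write down directly. The following Java sketch (illustrative only; the names are invented) takes a DFA's transition function and accepting states and produces the move table of the corresponding stack machine.

import java.util.*;

// Build stack-machine moves from a DFA, following the idea of this section:
// the stack always holds exactly one symbol, the DFA's current state, and an
// ε-move pops an accepting state so that the machine can end with an empty stack.
public class DfaToStackMachine {
    // A move is written as in the tables of this chapter: read, pop, push.
    record Move(String read, String pop, String push) { }

    static List<Move> convert(Map<String, Map<Character, String>> delta,
                              Set<String> accepting) {
        List<Move> moves = new ArrayList<>();
        for (var entry : delta.entrySet()) {
            String q = entry.getKey();
            for (var t : entry.getValue().entrySet()) {
                // DFA transition on symbol c from q to q' becomes the move (c, q) -> q'
                moves.add(new Move(String.valueOf(t.getKey()), q, t.getValue()));
            }
        }
        for (String q : accepting) {
            moves.add(new Move("", q, ""));   // ε-move: pop an accepting state
        }
        return moves;
    }

    public static void main(String[] args) {
        // The example DFA of this section.
        Map<String, Map<Character, String>> delta = Map.of(
            "q0", Map.of('0', "q0", '1', "q1"),
            "q1", Map.of('0', "q2", '1', "q3"),
            "q2", Map.of('0', "q0", '1', "q1"),
            "q3", Map.of('0', "q2", '1', "q3"));
        convert(delta, Set.of("q2", "q3")).forEach(System.out::println);
    }
}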
The same kind of construction works, not just for DFAs, but for NFAs as well. You
can practice it in some of the following exercises. However, we will not give the formal
construction here, since we will now prove an even stronger result.
Using this idea, we can take any CFG and construct a stack machine for the same
language. The constructed machine starts, as always, with the start symbol S on its stack.
It carries out a derivation in the language, repeatedly using two types of moves:
1. If the top symbol on the stack is a nonterminal, replace it using any one of the
grammar’s productions for that nonterminal. This is an ε-move, not consuming any
symbols of the stack machine's input.
2. If the top symbol on the stack is a terminal that matches the next input symbol,
pop it off.
Finally, when it has completed a derivation and popped off all the resulting terminal symbols,
it is left with an empty stack, and it accepts the input string.
This is a highly nondeterministic machine. The moves of type 1 allow the machine to
trace out any derivation from S permitted by the grammar. But the moves of type 2 allow
progress only if the derived string matches the input string. The only way the stack can be
emptied is if a fully terminal string was derived. The only way the stack machine will accept
at that point is if the entire input string has been matched. In effect, the machine can try all
possible derivations; it accepts if and only if at least one of them is a derivation of the input
string.
For example, consider this CFG for {xxR | x ∈ {a, b}*}:
S → aSa | bSb | ε
The constructed stack machine corresponding to this CFG is this:
read pop push
1. ε S aSa
2. ε S bSb
3. ε S ε
4. a a ε
5. b b ε
Moves 1 through 3 correspond to the three productions in the grammar. Moves 4 and 5
allow the stack machine to pop the terminals a and b off the stack if they match the input
symbols; these moves would be the same in any stack machine constructed from a CFG
whose terminal alphabet is {a, b}.
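Carrying out this construction is just a matter of transcribing the grammar. Here is a Java sketch (an illustration with invented names) that produces the move table for a grammar given as a map from nonterminals to right-hand sides; the empty string stands for ε.

import java.util.*;

// Build the stack-machine moves for a CFG, following this section's
// construction: one ε-move per production, plus one pop-and-match move per
// terminal symbol.
public class CfgToStackMachine {
    record Move(String read, String pop, String push) { }

    // productions maps each nonterminal to its right-hand sides,
    // e.g. 'S' -> ["aSa", "bSb", ""] for S → aSa | bSb | ε.
    static List<Move> convert(Map<Character, List<String>> productions,
                              Set<Character> terminals) {
        List<Move> moves = new ArrayList<>();
        for (var p : productions.entrySet())
            for (String rhs : p.getValue())
                moves.add(new Move("", String.valueOf(p.getKey()), rhs));    // type-1 move
        for (char t : terminals)
            moves.add(new Move(String.valueOf(t), String.valueOf(t), ""));   // type-2 move
        return moves;
    }

    public static void main(String[] args) {
        Map<Character, List<String>> g = Map.of('S', List.of("aSa", "bSb", ""));
        convert(g, Set.of('a', 'b')).forEach(System.out::println);
    }
}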
A string in the language defined by the CFG is abbbba. Here is a derivation for the
string:
S ⇒ aSa ⇒ abSba ⇒ abbSbba ⇒ abbbba
Here is the corresponding accepting sequence of IDs for the stack machine. The subscripts
on each step indicate which move was performed:
(abbbba, S) ⊢1 (abbbba, aSa) ⊢4 (bbbba, Sa) ⊢2 (bbbba, bSba) ⊢5 (bbba, Sba) ⊢2 (bbba, bSbba) ⊢5 (bba, Sbba) ⊢3 (bba, bba) ⊢5 (ba, ba) ⊢5 (a, a) ⊢4 (ε, ε)
We can generalize this construction in the following lemma and proof sketch.
Consider again the stack machine M2 that we constructed earlier in this chapter. The language it accepts is the language of strings over the alphabet {a, b} in which the number of as equals the number of bs. Here is a grammar that can simulate this stack machine using productions that mimic the stack machine's transitions:
1. S → a0S
2. 0 → a00
3. 0 → b
4. S → b1S
5. 1 → b11
6. 1 → a
7. S → ε
The productions are numbered here to show how they match the moves of the stack
machine. Wherever the stack machine has a move β ∈ δ(t, A), the grammar has a production A → tβ. Thus the derivations in the grammar simulate the executions of the
stack machine: each time the stack machine reads an input symbol, the grammar generates
that symbol, and the stack machine’s stack is simulated in the nonterminal part of the string
being rewritten. For example, this is an accepting sequence in the stack machine on the input
abab:
(abab, S) ⊢1 (bab, 0S) ⊢3 (ab, S) ⊢1 (b, 0S) ⊢3 (ε, S) ⊢7 (ε, ε)
(The subscripts indicate which move was used at each step.) The grammar generates the
string abab using the corresponding sequence of derivation steps:
S ⇒1 a0S ⇒3 abS ⇒1 aba0S ⇒3 ababS ⇒7 abab
At each stage in this derivation, the string has two parts xy: x is the portion of the input
string that has been read by the stack machine, and y is the content of the stack. At the end
of the derivation the stack machine’s stack y is empty, and the string x is the fully terminal
string that was the stack machine’s input.
In the construction, the stack symbols of the stack machine become nonterminals in
the constructed grammar, and the input symbols become terminals. That’s why we need the
assumption that the two alphabets have no symbols in common; no symbol in a grammar
can be both a terminal and a nonterminal. The assumption is “without loss of generality”
because offending stack symbols can always be renamed if necessary.
For an example demonstrating this kind of renaming, consider this stack machine for the
language {anb2n}:
read pop push
1. a S Sbb
2. ε S ε
3. b b ε
This stack machine does not satisfy our assumption, because it uses the symbol b as
both an input symbol and a stack symbol. If we applied our construction to this machine
we would end up with a grammar containing the production b → b, which is not legal in a
CFG. But the problem is easy to solve by renaming: simply replace the stack symbol b with a
new symbol such as B:
read pop push
1. a S SBB
2. ε S ε
3. b B ε
Notice that we changed only the pop and push columns—only the instances of b on the stack,
not the instance of b as an input symbol. This modified stack machine now satisfies our
assumption, and we can proceed with the construction, yielding this CFG:
S → aSBB | ε
B → b
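The reverse construction is just as mechanical: each move β ∈ δ(t, A) becomes a production A → tβ. Here is a Java sketch (names invented for illustration), applied to the renamed machine for {anb2n} above; it assumes, as the construction does, that the stack and input alphabets are disjoint.

import java.util.*;

// Build grammar productions from a stack machine's moves:
// each move "read t, pop A, push β" becomes the production A -> tβ.
public class StackMachineToCfg {
    record Move(String read, String pop, String push) { }

    static List<String> convert(List<Move> moves) {
        List<String> productions = new ArrayList<>();
        for (Move m : moves) {
            String rhs = m.read() + m.push();
            productions.add(m.pop() + " -> " + (rhs.isEmpty() ? "ε" : rhs));
        }
        return productions;
    }

    public static void main(String[] args) {
        // The renamed machine for {a^n b^2n} above.
        List<Move> machine = List.of(
            new Move("a", "S", "SBB"),
            new Move("", "S", ""),
            new Move("b", "B", ""));
        convert(machine).forEach(System.out::println);  // S -> aSBB, S -> ε, B -> b
    }
}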
The proof of Lemma 13.2.1 asserts that the leftmost derivations in G exactly simulate
the executions of M. The previous examples should help to convince you that this is true, but
we can also prove the assertion in more detail using induction. We have avoided most such
proofs this far, but since this one is similar to the missing piece in our proof of Lemma 13.1,
let’s bite the bullet this time and fill in the details.
Lemma 13.2.2: For the construction of the proof of Lemma 13.2.1, for any x ∈ Σ* and y ∈ Γ*, S ⇒* xy if and only if (x, S) ⊢* (ε, y).
Proof that if S ⇒* xy then (x, S) ⊢* (ε, y): By induction on the length of the derivation. In the base case the length of the derivation is zero. The only zero-length derivation S ⇒* xy is with xy = S, and since x ∈ Σ* we must have x = ε and y = S. For these values, by definition, (x, S) ⊢* (ε, y) in zero steps.
In the inductive case, the length of the derivation is greater than zero, so it has
at least one final step. Also, there must be some leftmost derivation of the same
string, and a leftmost derivation in G always produces a string of terminals
followed by a string of nonterminals. Thus, we have
S ⇒* x'Ay' ⇒ xy
for some x' ∈ Σ*, A ∈ Γ, and y' ∈ Γ*. By the inductive hypothesis (which applies since the derivation of x'Ay' is one step shorter) we know that (x', S) ⊢* (ε, Ay'). The final step in the derivation uses one of the productions in P, which by construction are all of the form
A → tβ, where A ∈ Γ, t ∈ (Σ ∪ {ε}), and β ∈ δ(t, A)
so that x = x't and y = βy'. Now since (x', S) ⊢* (ε, Ay') we also have (x't, S) ⊢* (t, Ay'), and since β ∈ δ(t, A) we can add a final step for the stack machine:

(x, S) = (x't, S) ⊢* (t, Ay') ⊢ (ε, βy') = (ε, y)
Proof of the other direction (if (x, S) ⊢* (ε, y) then S ⇒* xy): Similar, by induction on the number of steps in the execution.
Theorem 13.1: A language is context free if and only if it is L(M ) for some stack
machine M.
Proof: Follows immediately from Lemmas 13.1 and 13.2.1.
Exercises
EXERCISE 1
How would you change the stack machine of section 13.2 so that the language it accepts
is {anbn | n > 0}?
EXERCISE 2
Show the table of moves for a stack machine for the language {ancbn}.
EXERCISE 3
How would you change the stack machine of Section 13.3 so that the language it accepts
is {xxR | x ∈ {a, b}*, x ≠ ε}?
EXERCISE 4
Show the table of moves for a stack machine for the language {xcxR | x ∈ {a, b}*}.
EXERCISE 5
Show the table of moves for a stack machine for the language {(ab)n(ba)n}.
EXERCISE 6
Give stack machines for the following languages:
a. L(a*b*)
b. {x ∈ {0, 1}* | the number of 0s in x is divisible by 3}
c. {x ∈ {0, 1}* | |x| is divisible by 3}
d. {x ∈ {0, 1}* | x is a binary representation of a number divisible by 3}
e. {x ∈ {a, b}* | x ∉ L(a*b*)}
f. {anbm | m > 2n}
g. {anbn | n is even}
h. {(ab)nc(de)n}
i. {anbn | n is odd}
j. {aibnajbnak}
k. {anbnaibj}
l. {anbm | m n}
m. {anbncj}
n. {anb2n}
o. {anbmcmdn}
p. {anbncmdm}
q. {x ∈ {a, b}* | x = xR}
EXERCISE 7
Using the construction of Lemma 13.1, give a stack machine for this grammar:
S → S1 | S2
S1 → aS1a | bS1b | ε
S2 → XXS2 | X
X → a | b
Show accepting sequences of IDs for your stack machine for abba and bbb.
EXERCISE 8
Using the construction of Lemma 13.1, give a stack machine for this grammar:
S → T0T0T0S | T
T → 1T | ε
Show accepting sequences of IDs for your stack machine for 110010 and 00011000.
EXERCISE 9
Using the construction of Lemma 13.1, give a stack machine for this grammar:
S → XSY | ε
X → a | b
Y → c | d
Show accepting sequences of IDs for your stack machine for abcd and abbddd.
EXERCISE 10
Using the construction of Lemma 13.1, give a stack machine for this grammar:
S → S1S2 | S3S1
S1 → aS1b | ε
S2 → bS2 | b
S3 → aS3 | a
Show accepting sequences of IDs for your stack machine for aabbb and aaabb.
EXERCISE 11
Give a more elegant grammar for the language recognized by the stack machine in
Section 13.8.
EXERCISE 12
Using the construction of Lemma 13.2.1, give a CFG for this stack machine:
(Note that because this machine has stack and input alphabets that overlap, you will have
to rename some of the symbols in the stack alphabet to make the construction work.)
Show a derivation of abba in your grammar.
EXERCISE 14
Give an example of a stack machine for the language {anbn} that has exactly two distinct
accepting sequences of IDs on every accepted input. Then give an example of a stack
machine for the same language that has infinitely many distinct accepting sequences of
IDs on every accepted input.
EXERCISE 15
Our proof of Lemma 13.2.2 ended like this:
Proof of the other direction (if (x, S) ⊢* (ε, y) then S ⇒* xy): Similar, by induction on the number of steps in the execution.
Fill in the missing details, proving by induction that if (x, S) ⊢* (ε, y) then S ⇒* xy.
CHAPTER 14
The Context-free Frontier
At this point we have two major language categories, the regular languages and the
context-free languages, and we have seen that the CFLs include the regular languages,
like this:
[Figure: the regular languages drawn as a subset of the CFLs; L(a*b*) lies inside the regular languages, while {anbn} lies in the CFLs but outside the regular languages.]
Are there languages outside of the CFLs? In this chapter we see that the answer is yes,
and we see some simple examples of languages that are not CFLs.
We have already seen that there are many closure properties for regular languages. Given
any two regular languages, there are many ways to combine them—intersections, unions,
and so on—that are guaranteed to produce another regular language. The context-
free languages also have some closure properties, though not as many as the regular
languages. If regular languages are a safe and settled territory, context-free languages are
more like frontier towns. Some operations like union get you safely to another context-
free language; others like complement and intersection just leave you in the wilderness.
A pumping parse tree for a CFG G = (V, Σ, S, P) is a parse tree with two
properties:
1. There is a node for some nonterminal symbol A, which has that
same nonterminal symbol A as one of its descendants.
2. The terminal string generated from the ancestor A is longer than
the terminal string generated from the descendant A.
If you can find a pumping parse tree for a grammar, you know a set of strings that must
be in the language—not just the string the pumping parse tree yields, but a whole collection
of related strings.
[Figure: a pumping parse tree with yield uvwxy; the upper A node generates vwx, and the lower A node beneath it generates w. Replacing the vwx subtree with the w subtree produces a parse tree for uwy.]
Or we could replace the w subtree with the vwx subtree, producing this parse tree for uv2wx2y:
[Figure: the resulting parse tree for uv2wx2y.]
Then we could again replace the w subtree with the vwx subtree, producing this parse tree for uv3wx3y:
[Figure: the resulting parse tree for uv3wx3y.]
Repeating this step any number of times, we can generate a parse tree for uviwxiy
for any i.
The previous lemma shows that pumping parse trees are very useful—find one, and you can conclude that not only uvwxy ∈ L(G), but uviwxiy ∈ L(G) for all i. The next lemma
shows that they are not at all hard to find. To prove it, we will need to refer to the height of a
parse tree, which is defined as follows:
The height of a parse tree is the number of edges in the longest path from the
start symbol to any leaf.
[Figure: a parse tree for the string a*b+c in an ambiguous grammar for expressions, illustrating the height of a parse tree.]
We will also need the idea of a minimum-size parse tree for a given string: a minimum-size parse tree for a string x is a parse tree for x with as few nodes as any parse tree for x. The need for this definition arises because an ambiguous grammar can generate the same string using more than one parse tree, and these parse trees might even have different numbers of nodes. For example, the previous grammar generates the string a*b+c with the parse tree shown above, but also with these ones:
[Figure: two more parse trees for the string a*b+c in the same grammar.]
All three parse trees generate the same string, but the last one is not minimum size, since it
has one more node than the others. Every string in the language generated by a grammar
has at least one minimum-size parse tree; for an ambiguous grammar, some strings may have
more than one.
Now we're ready to see why pumping parse trees are easy to find.
Lemma 14.1.2: Every CFG G = (V, Σ, S, P) that generates an infinite language generates a pumping parse tree.
Proof: Since L(G) is infinite, G generates infinitely many minimum-size parse trees. There are only finitely
many minimum-size parse trees of height |V | or less; therefore, G generates
a minimum-size parse tree of height greater than |V | (indeed, it generates
infinitely many). Such a parse tree must satisfy property 1 of pumping
parse trees, because on a path with more than |V| edges there must be some
nonterminal A that occurs at least twice. And such a parse tree must also satisfy
property 2, because it is a minimum-size parse tree; if it did not satisfy property
2, we could replace the ancestor A with the descendant A to produce a parse tree
with fewer nodes yielding the same string. Thus, G generates a pumping parse
tree.
This proof actually shows that every grammar for an infinite language generates not just one,
but infinitely many pumping parse trees. However, one is all we’ll need for the next proof.
Theorem 14.1 uses pumping parse trees to show that {anbncn} is not context free. The tricky part of this proof is seeing that no matter how you break up a string akbkck into substrings uvwxy, where v and x are not both ε, you must have uv2wx2y ∉ {anbncn}. If the substrings v and/or x contain a mixture of symbols, as in this example:
[Figure: the string aaaaabbbbbccccc divided into substrings u, v, w, x, and y, with v and x each containing a mixture of symbols.]
then the resulting string uv2wx2y would have as after bs and/or bs after cs; for the example above it would be aaaaabbaabbbbbcbbccccc, clearly not in {anbncn}. On the other hand, if the substrings v and x contain at most one kind of symbol each, as in this example:
[Figure: the string aaaaabbbbbccccc divided into substrings u, v, w, x, and y, with v and x each containing at most one kind of symbol.]
then the resulting string uv2wx2y would no longer have the same number of as, bs, and cs; for the example above it would be aaaaaaabbbbbbccccc, clearly not in {anbncn}. Either way, uv2wx2y ∉ {anbncn}.
V = V1 ∪ V2 ∪ {S},
Σ = Σ1 ∪ Σ2, and
P = P1 ∪ P2 ∪ {(S → S1), (S → S2)}
Theorem 14.2.2: If L1 and L2 are any context-free languages, L1L2 is also context free.
Proof: By construction. If L1 and L2 are context free then, by definition, there exist grammars G1 = (V1, Σ1, S1, P1) and G2 = (V2, Σ2, S2, P2) with L(G1) = L1 and L(G2) = L2. We can assume without loss of generality that V1 and V2 are disjoint. (Symbols shared between V1 and V2 could easily be renamed.) Now consider the grammar G = (V, Σ, S, P), where S is a new nonterminal symbol and
V = V1 ∪ V2 ∪ {S},
Σ = Σ1 ∪ Σ2, and
P = P1 ∪ P2 ∪ {(S → S1S2)}
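Both constructions can be carried out mechanically. Here is a Java sketch (illustrative only; it assumes the two grammars have already been renamed so that they share no nonterminals and have start symbols S1 and S2) that builds the combined production lists for the union and concatenation constructions.

import java.util.*;

// Combine two CFGs' production lists, following this section's constructions.
// Productions are kept as strings like "S1 -> aS1b".
public class CombineGrammars {
    static List<String> union(List<String> p1, List<String> p2) {
        List<String> p = new ArrayList<>(p1);
        p.addAll(p2);
        p.add("S -> S1");     // new start symbol S can derive either start symbol
        p.add("S -> S2");
        return p;
    }

    static List<String> concatenation(List<String> p1, List<String> p2) {
        List<String> p = new ArrayList<>(p1);
        p.addAll(p2);
        p.add("S -> S1S2");   // new start symbol S derives one string from each grammar
        return p;
    }

    public static void main(String[] args) {
        List<String> g1 = List.of("S1 -> aS1b", "S1 -> ε");
        List<String> g2 = List.of("S2 -> cS2", "S2 -> ε");
        union(g1, g2).forEach(System.out::println);
        concatenation(g1, g2).forEach(System.out::println);
    }
}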
We can define the Kleene closure of a language in a way that parallels our use of the
Kleene star in regular expressions:
The Kleene closure of any language L is L* = {x1x2 ... xn | n ≥ 0, with all xi ∈ L}.
The fact that the context-free languages are closed under Kleene star can then be proved
with a CFG construction, as before:
One more closure property: the CFLs are closed under intersection with regular languages. The CFLs are not, however, closed under intersection with each other. To see why, consider these two grammars, G1 and G2:
S1 → A1B1
A1 → aA1b | ε
B1 → cB1 | ε
S2 → A2B2
A2 → aA2 | ε
B2 → bB2c | ε
Now L(G1) = {anbncm}, while L(G2) = {ambncn}. The intersection of the two
languages is {anbncn}, which we know is not context free. Thus, the CFLs are not
closed for intersection.
This nonclosure property does not mean that every intersection of CFLs fails to be a
CFL. In fact, we have already seen many cases in which the intersection of two CFLs is
another CFL. For example, we know that any intersection of regular languages is a regular
language—and they are all CFLs. We also know that the intersection of a CFL with a regular
language is a CFL. All the theorem says is that the intersection of two CFLs is not always a
CFL. Similarly, the complement of a CFL is sometimes, but not always, another CFL:
Consider the language L = {xx | x ∈ {a, b}*} and its complement L̄. All odd-length strings are in L̄. Even-length strings are in L̄ if and only if they have at least one symbol in the first half that is not the same as the corresponding symbol in the second half; that is, either there is an a in the first half and a b at the corresponding position in the second half, or the reverse. In the first case the string looks like this, for some i and j:
[Figure: the string waxybz, with |w| = |y| = i and |x| = |z| = j; the marked center falls between x and y, so the a in the first half and the b in the second half occupy corresponding positions.]
At first this looks daunting: how could a grammar generate such strings? How
can we generate waxybz, where |w| = |y| = i and |x| = |z| = j? It doesn’t look
context free, until you realize that because the x and y parts can both be any
string in {a, b}*, we can swap them, instead picturing it as:
Now we see that this actually is context free; it’s {way | |w| = |y|}, concatenated
with {xbz | |x| = |z|}. For the case with a b in the first half corresponding to an
a in the second half, we can just swap the two parts, getting {xbz | |x| = |z|}
concatenated with {way | |w| = |y|}.
This CFG generates the language: O generates the odd-length part, and AB and
BA generate the two even-length parts.
S → O | AB | BA
A → XAX | a
B → XBX | b
O → XXO | X
X → a | b
We conclude that L̄ is context free. But, as we will show in Section 14.7, the complement of L̄, which is L = {xx | x ∈ {a, b}*} itself, is not context free. We conclude that the CFLs are not closed for complement.
Lemma 14.2.1: For every grammar G = (V, Σ, S, P), every minimum-size parse tree of a height greater than |V| can be expressed as a pumping parse tree with the properties shown:
[Figure: a pumping parse tree with yield uvwxy; the whole tree has height ≥ |V| + 1, while the subtree rooted at the upper A, which generates vwx, has height ≤ |V| + 1.]
Proof: Choose any path from root to leaf with more than |V | edges. Then,
working from the leaf back toward the root on this path, choose the first two
nodes that repeat a nonterminal, and let these be the nodes labeled A in the
diagram. As shown in the proof of Lemma 14.1.2, the result is a pumping parse
tree. Furthermore, the nonterminal A must have repeated within the first |V| + 1 edges, so the height of the subtree generating vwx is at most |V| + 1.
Being able to give a bound on the height of a tree or subtree also enables us to give a
bound on the length of the string generated.
Lemma 14.2.2: For every CFG G = (V, Σ, S, P) there exists some k greater than the length of any string generated by any parse tree or subtree of height |V| + 1 or less.
Proof 1: For any given grammar, there are only finitely many parse trees and
subtrees of height |V | + 1 or less. Let k be one greater than the length of the
longest string so generated.
Proof 2: Let b be the length of the longest right-hand side of any production in P. Then b is the maximum branching factor in any parse tree. So a tree or subtree of height |V| + 1 can have at most b^(|V|+1) leaves. Let k = b^(|V|+1) + 1.
These two proofs give different values for k. That is not a problem, because what matters
here is not the value of k, but the existence of some k with the right properties. This was
also true of the value k we used in the pumping lemma for regular languages. To prove the
pumping lemma, we let k be the number of states in some DFA for the language. But once
the pumping lemma for regular languages was proved, the value of k used in the proof was
irrelevant, since the lemma itself claims only that some suitable k exists.
Putting these lemmas together with our previous results about pumping parse trees, we
get the pumping lemma for context-free languages.
Lemma 14.2.3 (The Pumping Lemma for Context-free Languages): For all context-free languages L there exists some k ∈ ℕ such that for all z ∈ L with |z| ≥ k, there exist uvwxy such that
1. z = uvwxy,
2. v and x are not both ε,
3. |vwx| ≤ k, and
4. for all i ≥ 0, uviwxiy ∈ L.
Proof: We are given some CFL L. Let G be any CFG with L(G) = L, and let k be as given for this grammar by Lemma 14.2.2. We are given some z ∈ L with |z| ≥ k. Since all parse trees in G for z have a height greater than |V| + 1, that
includes a minimum-size parse tree for z. We can express this as a pumping parse
tree for z as given in Lemma 14.2.1. Since this is a parse tree for z, property
1 is satisfied; since it is a pumping parse tree, properties 2 and 4 are satisfied;
and since the subtree generating vwx is of height |V | + 1 or less, property 3 is
satisfied.
This pumping lemma shows once again how matching pairs are fundamental to the
context-free languages. Every sufficiently long string in a context-free language contains a
matching pair consisting of the two substrings v and x of the lemma. These substrings can be
pumped in tandem, always producing another string uv iwx iy in the language. Of course, one
or the other (but not both) of those substrings may be ε. In that way the pumping lemma for
context-free languages generalizes the one for regular languages, since if one of the strings is
ε, the other can be pumped by itself.
But in all these cases, since v and x are not both ε, pumping changes the number of one or two of the symbols, but not all three. So uv2wx2y ∉ L.
5. This contradicts the pumping lemma. By contradiction, L = {anbncn} is not context
free.
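To see the contradiction concretely, here is a tiny Java check (not part of the proof; the decomposition chosen is just one example in which v and x each contain one kind of symbol) that pumps a decomposition of a3b3c3 and tests whether the result still lies in {anbncn}.

// Pump a chosen decomposition u v w x y of a^3 b^3 c^3 and test membership.
public class PumpCheck {
    static boolean inAnBnCn(String s) {
        int n = s.length() / 3;
        return s.equals("a".repeat(n) + "b".repeat(n) + "c".repeat(n));
    }

    static String pump(String u, String v, String w, String x, String y, int i) {
        return u + v.repeat(i) + w + x.repeat(i) + y;   // u v^i w x^i y
    }

    public static void main(String[] args) {
        // One decomposition of aaabbbccc with |vwx| <= 3.
        String u = "aa", v = "a", w = "b", x = "b", y = "bccc";
        System.out.println(inAnBnCn(pump(u, v, w, x, y, 1))); // true: the original string
        System.out.println(inAnBnCn(pump(u, v, w, x, y, 2))); // false: pumping breaks the pattern
    }
}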
The structure of this proof matches the structure of the pumping lemma itself. Here is
the overall structure of the pumping lemma, using the symbols ∀ ("for all") and ∃ ("there exists"):
1. ∃k ...
2. ∀z ...
3. ∃uvwxy ...
4. ∀i ≥ 0 ...
The alternating ∀ and ∃ parts make every pumping-lemma proof into a kind of game, just as we saw using the pumping lemma for regular languages. The ∃ parts (the natural number k and the strings u, v, w, x, and y) are merely guaranteed to exist. In effect, the pumping lemma itself makes these moves, choosing k, u, v, w, x, and y any way it wants. In steps 1 and 3 we could only say these values are "as given by the pumping lemma." The ∀ parts,
on the other hand, are the moves you get to make: you can choose any values for z and i,
since the lemma holds for all such values. This pumping lemma offers fewer choices than the
pumping lemma for regular languages, and that actually makes the proofs a little harder to
do. The final step, showing that a contradiction is reached, is often more difficult, because
one has less control early in the proof.
You may have noticed that in that table in the proof—the one that shows how the strings
v and x can fall within the string akbkck—line 6 is unnecessary. It does not hurt to include it
in the proof, because all six cases lead to the same contradiction. But the substrings v and x
actually cannot be as far apart as shown in line 6, because that would make the combined
substring vwx more than k symbols long, while the pumping lemma guarantees that |vwx| ≤ k. In many applications of the pumping lemma it is necessary to make use of this.
The following pumping-lemma proof is an example. To show that L = {anbmcn | m ≤ n} is not context free, assume that it is, so the pumping lemma holds for L. Let k be as given by the pumping lemma. Choose z = akbkck; then z ∈ L and |z| ≥ k as required. Let u, v, w, x, and y be as given by the pumping lemma, so that uvwxy = akbkck, v and x are not both ε, |vwx| ≤ k, and for all i, uviwxiy ∈ L. Now consider how the substrings v and x can fall within the string; there are these six cases:

       ak        bk        ck
1.     v   x
2.     v         x
3.               v   x
4.               v         x
5.                         v   x
6.     v                   x
Because v and x are not both ε, uv2wx2y in case 1 has more as than cs. In case 2 there are either more as than cs, more bs than cs, or both. In case 3 there are more bs than as and cs. In case 4 there are more cs than as, more bs than as, or both. In case 5 there are more cs than as. Case 6 contradicts |vwx| ≤ k. Thus, all cases contradict the pumping lemma. By contradiction, L = {anbmcn | m ≤ n} is not context free.
This proof is very similar to the previous one, using the same string z and pumping
count i. This time, however, we needed to use the fact that |vwx| k to reach a contradiction
in case 6. Without that, no contradiction would be reached, since pumping v and x together
could make the number of as and the number of cs grow in step, which would still be a string
in the language L.
Theorem 14.5: {xx | x ∈ Σ*} is not a CFL for any alphabet Σ with at least two symbols.
Proof: By contradiction using the pumping lemma for context-free languages. Let Σ be any alphabet containing at least two symbols (which we will refer to as a and b). Assume that L = {xx | x ∈ Σ*} is context free, so the pumping lemma holds for L. Let k be as given by the pumping lemma. Choose z = akbkakbk. Now z ∈ L and |z| ≥ k as required. Let u, v, w, x, and y be as given by the pumping lemma, so that uvwxy = akbkakbk, v and x are not both ε, |vwx| ≤ k, and for all i, uviwxiy ∈ L.
Now consider how the substrings v and x fall within the string. Because |vwx| ≤ k, v and x cannot be widely separated, so we have these 13 cases:
[Table: the 13 possible ways the substrings v and x can fall within the four blocks of akbkakbk. In cases 1 through 5, vwx lies entirely within the first half of the string; in cases 6 through 8, vwx overlaps the center; in cases 9 through 13, vwx lies entirely within the second half.]
For cases 1 through 5, we can choose i = 0. Then the string uv0wx0y is some sakbk where |s| < 2k. In this string the last symbol of the first half is an a, while the last symbol of the second half is a b. So uv0wx0y ∉ L.
For cases 9 through 13, we can again choose i = 0. This time the string uv0wx0y is some akbks where |s| < 2k. In this string the first symbol of the first half is an a, while the first symbol of the second half is a b. So, again, uv0wx0y ∉ L.
Finally, for cases 6, 7, and 8, we can again choose i = 0. Now the string uv0wx0y is some aksbk where |s| < 2k. But such an aksbk can't be rr for any r, because if r starts with k as and ends with k bs, we must have |r| ≥ 2k and so |rr| ≥ 4k, while our |aksbk| < 4k. So, again, uv0wx0y ∉ L.
In every case we have a contradiction. By contradiction, L is not a context-free
language.
In Chapter 12 we saw the important role CFGs play in defining the syntax of
programming languages. Many languages, including Java, require variables to be declared
before they are used:
int fred = 0;
while (fred==0) {
...
}
The requirement that the same variable name fred occur in two places, once in a
declaration and again in a use, is not part of any grammar for Java. It is a non-context-free
construct in the same way that {xx | x ∈ Σ*} is a non-context-free language—think of x here
as the variable name which must be the same in two places. Java compilers can enforce the
requirement easily enough, but doing so requires more computational power than can be
wired into any context-free grammar.
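The check itself is easy to program once you step outside the grammar. Here is a Java sketch of the idea (the class and method names are invented): remember declared names in a set, something no fixed CFG can do.

import java.util.*;

// A sketch of the kind of check a compiler makes outside the grammar:
// record declared names in a set, and reject uses of undeclared names.
public class DeclarationChecker {
    private final Set<String> declared = new HashSet<>();

    void declare(String name) {
        declared.add(name);
    }

    void use(String name) {
        if (!declared.contains(name))
            throw new IllegalStateException("undeclared variable: " + name);
    }

    public static void main(String[] args) {
        DeclarationChecker checker = new DeclarationChecker();
        checker.declare("fred");
        checker.use("fred");     // fine: fred was declared
        checker.use("wilma");    // throws: wilma was never declared
    }
}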
One final note about pumping-lemma proofs. In the proof above, as in all those shown
in this chapter, the choice of i was static. The proof above always chose i = 0, regardless of the
values of the other variables. This is not always possible. The pumping lemma permits the
choice of i to depend on the values of k, u, v, w, x, and y, just as the choice of z must depend
on k. For more challenging proofs it is often necessary to take advantage of that flexibility.
In terms of the pumping-lemma game, it is often necessary to see what moves the pumping
lemma makes before choosing the i that will lead to a contradiction.
Exercises
EXERCISE 1
If our definition of pumping parse trees included only the first, but not the second
property, the proof of Theorem 14.1 would fail to reach a contradiction. Explain exactly
how it would fail.
EXERCISE 2
Construct a grammar for an infinite language that generates a parse tree of a height
greater than |V | that is not a pumping parse tree. Show both the grammar and the parse
tree.
EXERCISE 3
Prove that {anbncndn} is not a CFL.
EXERCISE 4
Prove that {anb2ncn} is not a CFL.
EXERCISE 5
Show that {anbncpdq} is a CFL by giving either a stack machine or a CFG for it.
EXERCISE 6
Show that {anbmcn+m} is a CFL by giving either a stack machine or a CFG for it.
EXERCISE 7
Give a CFG for {anbn} {xxR | x {a, b}*}.
EXERCISE 8
Give a stack machine for {anbn} {xxR | x {a, b}*}.
EXERCISE 9
Give a CFG for the set of all strings xy such that x ∈ {0n12n} and y is any string over the alphabet {0, 1} that has an even number of 1s.
EXERCISE 10
Give a stack machine for the set of all strings xy such that x ∈ {0n12n} and y is any string over the alphabet {0, 1} that has an even number of 1s.
EXERCISE 11
Give a CFG for {aibnajbnak}*.
EXERCISE 12
Give a stack machine for {aibnajbnak}*.
EXERCISE 13
Give a CFG for {anbn} {x {a, b}* | the number of as in x is even}.
EXERCISE 14
Give a stack machine for {anbn} {x {a, b}* | the number of as in x is even}.
EXERCISE 15
Using the pumping lemma for context-free languages, prove that {anbmcn | m n}
is not a CFL.
EXERCISE 16
Show that {anbmcn} is a CFL by giving either a stack machine or a CFG for it.
EXERCISE 17
Using the pumping lemma for context-free languages, prove that {anbmc p | n m
and m p} is not a CFL.
EXERCISE 18
Show that {anbmc p | n m} is a CFL by giving either a stack machine or a CFG for it.
EXERCISE 19
Let A = {x | x ∈ {a, b, c}* and x contains the same number of each of the three symbols}.
Using the pumping lemma for context-free languages, prove that A is not a CFL.
EXERCISE 20
Let B = {x | x ∈ {a, b, c}* and x contains the same number of at least two of the three
symbols}. Show that B is a CFL by giving either a stack machine or a CFG for it.
CHAPTER 15
Stack Machine Applications
The parse tree (or a simplified version, the abstract syntax tree) is
one of the central data structures of almost every compiler or other
programming language system. To parse a program is to find a parse
tree for it. Every time you compile a program, the compiler must
first parse it. Parsing algorithms are fundamentally related to stack
machines, as this chapter illustrates.
Section 15.1 considers top-down parsing using the CFG S → aSa | bSb | c and the stack machine constructed from it by the method of Chapter 13:

read pop push
1. ε S aSa
2. ε S bSb
3. ε S c
4. a a ε
5. b b ε
6. c c ε

Although this stack machine is nondeterministic, closer inspection reveals that it is always quite easy to choose the next move. For example, suppose the input string is abbcbba. There are three possible first moves:
(abbcbba, S) ⊢1 (abbcbba, aSa) ⊢ …
(abbcbba, S) ⊢2 (abbcbba, bSb) ⊢ …
(abbcbba, S) ⊢3 (abbcbba, c) ⊢ …
But only the first of these has any future. The other two leave, on top of the stack, a terminal
symbol that is not the same as the next input symbol. From there, no further move will be
possible.
We can formulate simple rules about when to use those three moves:
• Use move 1 when the top stack symbol is S and the next input symbol is a.
• Use move 2 when the top stack symbol is S and the next input symbol is b.
• Use move 3 when the top stack symbol is S and the next input symbol is c.
These rules say when to apply those moves that derived from productions in the original
grammar. (The other stack-machine moves, the ones that read an input symbol and pop
a matching stack symbol, are already deterministic.) Our rules can be expressed as a two-
dimensional look-ahead table.
     a          b          c          $
S    S → aSa    S → bSb    S → c

The entry at table[A][σ] tells which production to use when the top of the stack is A and the next input symbol is σ. The final column of the table, table[A][$], tells which production
to use when the top of the stack is A and all the input has been read. For our example,
table[S][$] is empty, showing that if you still have S on top of the stack but all the input has
been read, there is no way to proceed.
Using the table requires one symbol of look-ahead: when there is a nonterminal symbol
on top of the stack, we need to look ahead to see what the next input symbol will be, without
actually consuming it. With the help of this table our stack machine becomes deterministic.
Here is the idea in pseudocode:
1. void predictiveParse(table, S) {
2. initialize a stack containing just S
3. while (the stack is not empty) {
4. A = the top symbol on stack
5. c = the current symbol in input (or $ at the end)
6. if (A is a terminal symbol) {
7. if (A != c) the parse fails
8. pop A and advance input to the next symbol
9. }
10. else {
11. if table[A][c] is empty the parse fails
12. apply the production table[A][c]: pop A and push its right-hand side
13. }
14. }
15. }
This treats input as a global variable: the source of the input string to be parsed.
The code scans the input from left to right, one symbol at a time, maintaining (implicitly)
the current position of the scan. Line 5 peeks at the current input symbol, but only line 8
advances to the next symbol. We assume there is a special symbol, not the same as any
terminal symbol, marking the end of the string; call this symbol $. The parameter table is
the two-dimensional table of productions, as already described.
Our predictiveParse is an example of a family of top-down techniques called
LL(1) parsers. The first L signifies a left-to-right scan of the input, the second L signifies
that the parse follows the order of a leftmost derivation, and the 1 signifies that one symbol
of input look-ahead is used (along with the top symbol of the stack) to choose the next
production. It is a simple parsing algorithm; the catch is the parse table it requires. Such
parse tables exist for some grammars but not for others—which means that LL(1) parsing is
possible for some grammars but not for others. LL(1) grammars are those for which LL(1)
parsing is possible; LL(1) languages are those that have LL(1) grammars.
LL(1) grammars can often be constructed for programming languages. Unfortunately,
they tend to be rather contorted. Consider this simple grammar and the little language of
expressions it generates:
S → (S) | S + S | S * S | a | b | c
As we saw in Chapter 12, that grammar is ambiguous, and no ambiguous grammar is LL(1).
But even this reasonably simple unambiguous grammar for the language fails to be LL(1):
S → S + R | R
R → R * X | X
X → (S) | a | b | c
In this case the grammar’s problem is that it is left recursive: it includes productions like
S → S + R that replace a nonterminal with a string starting with that same nonterminal.
No left-recursive grammar is LL(1). This LL(1) grammar for the same language manages
to avoid left recursion, at some cost in clarity:
S → AR
R → +AR | ε
A → XB
B → *XB | ε
X → (S) | a | b | c
Once we have an LL(1) grammar, we still need to construct the LL(1) parse table for it. That
construction can be rather subtle. Here, for example, is the parse table that goes with the
grammar above:
     a        b        c        +         *         (         )       $
S    S → AR   S → AR   S → AR                       S → AR
R                               R → +AR                       R → ε   R → ε
A    A → XB   A → XB   A → XB                       A → XB
B                               B → ε     B → *XB             B → ε   B → ε
X    X → a    X → b    X → c                        X → (S)
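In a program, the parse table is just a two-dimensional lookup structure. Here is one possible Java representation of the table above (an illustrative sketch; the empty string stands for an ε right-hand side, and a missing entry means the parse fails).

import java.util.*;

public class LL1Table {
    // TABLE.get(A).get(c) gives the right-hand side to push when the
    // nonterminal A is on top of the stack and c is the look-ahead symbol.
    static final Map<Character, Map<Character, String>> TABLE = Map.of(
        'S', Map.of('a', "AR", 'b', "AR", 'c', "AR", '(', "AR"),
        'R', Map.of('+', "+AR", ')', "", '$', ""),
        'A', Map.of('a', "XB", 'b', "XB", 'c', "XB", '(', "XB"),
        'B', Map.of('+', "", '*', "*XB", ')', "", '$', ""),
        'X', Map.of('a', "a", 'b', "b", 'c', "c", '(', "(S)"));

    public static void main(String[] args) {
        System.out.println(TABLE.get('S').get('a'));  // AR
    }
}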
15.2 Recursive-Descent Parsing
Another way to implement a top-down parser is to write it as a collection of recursive functions, one for each nonterminal in the grammar. We start with a little helper function for matching terminal symbols:
void match(x) {
c = the current symbol in input
if (c!=x) the parse fails
advance input to the next symbol
}
All match does is check that the expected terminal symbol really does occur next in the input
and advance past it. Now we implement a function for deriving a parse tree for S:
void parse_S() {
c = the current symbol in input (or $ at the end)
if (c=='a') { // production S → aSa
match('a'); parse_S(); match('a');
}
else if (c=='b') { // production S → bSb
match('b'); parse_S(); match('b');
}
else if (c=='c') { // production S → c
match('c');
}
else the parse fails
}
This code is an LL(1) parser, not really that different from predictiveParse, our
previous, table-driven implementation. Both are left-to-right, top-down parsers that choose a
production to apply based on the current nonterminal and the current symbol in the input.
But now the information from the parse table is incorporated into the code, which works
recursively. This is called a recursive-descent parser.
This is part of a recursive-descent parser for our LL(1) grammar of expressions:
void parse_S() {
c = the current symbol in input (or $ at the end)
if (c=='a' || c=='b' ||
c=='c' || c=='(') { // production S → AR
parse_A(); parse_R();
}
else the parse fails
}
void parse_R() {
c = the current symbol in input (or $ at the end)
if (c=='+') { // production R → +AR
match('+'); parse_A(); parse_R();
}
else if (c==')' || c=='$') { // production R → ε
}
else the parse fails
}
Additional functions parse_A, parse_B, and parse_X constructed on the same lines
would complete the parser.
Compare the recursive-descent parser with the previous table-driven LL(1) parser. Both
are top-down, LL(1) techniques that work only with LL(1) grammars. The table-driven
method uses an explicit parse table, while recursive descent uses a separate function for each
nonterminal. The table-driven method uses an explicit stack—but what happened to the
stack in the recursive-descent method? It’s still there, but hidden. Where a table-driven, top-
down parser uses an explicit stack, a recursive-descent parser uses the language system’s call
stack. Each function call implicitly pushes and each return implicitly pops the call stack.
In this table the current input symbol is underlined and, for reduce moves, the salient
substring of the stack is underlined as well. As you can see, the parser does not get around to
the top of the parse tree, the root symbol S, until the final step. It builds the parse tree from
the bottom up.
A popular kind of shift-reduce parser is the LR(1) parser. The L signifies a left-to-right
scan of the input, the R signifies that the parse follows the order of a rightmost derivation
in reverse, and the 1 signifies that one symbol of input look-ahead is used (along with the
string on top of the stack) to select the next move. These techniques are considerably trickier
than LL(1) techniques. One difficulty is that reduce moves must operate on the top-of-the-
stack string, not just the top-of-the-stack symbol. Making this efficient requires some close
attention. (One implementation trick uses stacked DFA state numbers to avoid expensive
string comparisons in the stack.)
Grammars that can be parsed this way are called LR(1) grammars, and languages that
have LR(1) grammars are called LR(1) languages. In spite of their complexity, LR(1) parsers
are quite popular, chiefly because the class of LR(1) grammars is quite broad. The LR(1)
grammars include all the LL(1) grammars, and many others as well. Programming language
constructs can usually be expressed with LR(1) grammars that are reasonably readable;
making a grammar LR(1) usually does not require as many contortions as making it LL(1).
(For example, LR(1) grammars need not avoid left-recursive productions.)
LR(1) parsers are complicated—almost always too complicated to be written without
the help of special tools. There are many tools for generating LR(1) parsers automatically.
A popular one is the Unix tool called yacc, which works a bit like the lex tool we saw in
Chapter 8. It converts a context-free grammar into C code for an LR(1) parser. In the yacc
input file, each production in the grammar is followed by a piece of C code, the action,
which is incorporated into the generated parser and is executed by the parser whenever a
reduce step is made using that production. For most compilers, this action involves the
construction of a parse tree or abstract syntax tree, which is used in subsequent (hand-coded)
phases of compilation.
Although they are complicated, LR(1) techniques have very good efficiency—like LL(1)
techniques, they take time that is essentially proportional to the length of the program being
parsed. Beyond LR(1) techniques there are many other parsing algorithms, including some
that have no restrictions at all and work with any CFG. The Cocke-Kasami-Younger (CKY)
algorithm, for example, parses deterministically and works with any CFG. It is quite a simple
algorithm too—much simpler than LR(1) parsing. The drawback is its lack of efficiency; the
CKY algorithm takes time proportional to the cube of the length of the string being parsed,
and that isn’t nearly good enough for compilers and other programming-language tools.
[Figure: a PDA transition drawn as an arrow from state q to state r, labeled a,Z/x.]
This says that if the PDA is in the state q, and the current input symbol is a, and the symbol
on top of the stack is Z, the PDA can go to the state r and replace the Z on the stack with the
string of symbols x.
There is considerable variety among PDA models. Some accept a string by emptying
their stack, like a stack machine; some accept by ending in an accepting state, like an NFA;
some must do both to accept. Some start with a special symbol on the stack; some start with
a special symbol marking the end of the input string; some do both; some do neither. All
these minor variations end up defining the same set of languages; the languages that can be
recognized by a PDA are exactly the CFLs.
The formal definition for a PDA is more complicated than the formal definition for
a stack machine, particularly in the transition function. The set of languages that can be
recognized is the same as for a stack machine. So why bother with PDAs at all? One reason is
because they are useful for certain kinds of proofs. For example, the proof that the CFLs are
closed for intersection with regular languages is easier using PDAs, since you can essentially
perform the product construction on the two state machines while preserving the stack
transitions of the PDA. Another reason is that they have a narrative value: they make a good
story. You start with DFAs. Add nondeterminism, and you have NFAs. Does that enlarge
the class of languages you can define? No! Then add a stack, and you have PDAs. Does that
enlarge the class of languages you can define? Yes! What is the new class? The CFLs!
But perhaps the most important reason for studying PDAs is that they have an
interesting deterministic variety. When we studied finite automata, we saw that the transition
function for an NFA gives zero or more possible moves from each configuration, while the
transition function for a DFA always gives exactly one move until the end of the string is
reached. One can think of DFAs as a restricted subset of NFAs. Restricting PDAs to the
deterministic case yields another formalism for defining languages: the deterministic PDA
(DPDA). For technical reasons, DPDAs are usually defined in a way that allows them to get
stuck in some configurations, so a DPDA always has at most one possible move. But like a
DFA, it never faces a choice of moves, so it defines a simple computational procedure for
testing language membership.
When we studied finite automata, we discovered that adding nondeterminism did not
add any definitional power; NFAs and DFAs can define exactly the same set of languages,
the regular languages. For PDAs, the case is different; DPDAs are strictly weaker than PDAs.
DPDAs define a separate, smaller class of languages: the deterministic context-free languages.
[Figure: the regular languages drawn inside the DCFLs, which in turn lie inside the CFLs; L(a*b*) is regular, {anbn} is a DCFL but not regular, and {xxR | x ∈ {a, b}*} is a CFL but not a DCFL.]
For example, the language {xxR | x ∈ {a, b}*} is a CFL but not a DCFL. That makes intuitive sense; a parser cannot know where the center of the string is, so it cannot make a deterministic decision about when to stop pushing and start popping. By contrast, the language {xcxR | x ∈ {a, b}*} is a DCFL.
The DCFLs have their own closure properties, different from the closure properties of
CFLs. Unlike the CFLs, the DCFLs are not closed for union: the union of two DCFLs is not
necessarily a DCFL, though of course it is always a CFL. Unlike the CFLs, the DCFLs are
closed for complement: the complement of a DCFL is always another DCFL. These different
closure properties can be used to help prove that a given CFL is not a DCFL. Such proofs are
usually quite tricky, since there is no tool like the pumping lemma specifically for DCFLs.
Part of the reason why we view the regular languages and the CFLs as important
language classes is that they keep turning up. Regular languages turn up in DFAs, NFAs,
regular expressions, and right-linear grammars, as well as many other places not mentioned
in this book. CFLs turn up in CFGs, stack machines, and PDAs. DCFLs get this kind of
validation as well. It turns out that they are the same as a class we have already met; the
DCFLs are the same as the LR(1) languages. These are exactly the languages that can be
quickly parsed using deterministic, bottom-up techniques.
Exercises
EXERCISE 1
Implement a predictive parser in Java for the language of expressions discussed in
Section 15.1, using the grammar and parse table as given. Your Java class PParse
should have a main method that takes a string from the command line and reports the
sequence of productions (used at line 12 in the predictiveParse pseudocode) used
to parse it. If the parse fails, it should say so. For example, it should have this behavior:
(Unix systems usually preprocess input from the command line and give special
treatment to symbols like (, ), and *. On these systems, you should enclose the
argument to PParse in quotation marks, as in java PParse "a+b*c".)
EXERCISE 2
Complete the pseudocode for the recursive-descent parser begun in Section 15.2.
EXERCISE 3
Implement a recursive-descent parser in Java for the language of expressions discussed
in Sections 15.1 and 15.2, using the grammar given. Your Java class RDParse should
have a main method whose visible behavior is exactly like that of the main method of
PParse in Exercise 1.
EXERCISE 4
A stack machine is deterministic if it never has more than one possible move. To be
deterministic, a stack machine must meet two requirements:
1. For all a ∈ Σ ∪ {ε} and A ∈ Γ, |δ(a, A)| ≤ 1. (No two entries in the table have the
same “read” and “pop” columns.)
2. For all A ∈ Γ, if |δ(ε, A)| > 0 then for all c ∈ Σ, |δ(c, A)| = 0. (If there is an ε-move
for some stack symbol A, there can be no other move for A.)
For example, consider this machine for the language L(a*b):
read pop push
1. ε S aS
2. ε S b
3. a a ε
4. b b ε
This is not deterministic, because moves 1 and 2 are both possible whenever there is an S
on top of the stack. A deterministic stack machine for the same language would be
read pop push
1. a S S
2. b S ε
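The two requirements can be checked mechanically. Here is a minimal Java sketch (not from the text; the table representation is a hypothetical one) that tests them for a stack-machine table given as rows of the form {read, pop, push}, with the empty string standing for ε. On the first table above it returns false, and on the second it returns true.

public class DeterminismCheck {
    // moves[i] = {read, pop, push}; "" in the read column stands for an epsilon-move.
    public static boolean isDeterministic(String[][] moves) {
        for (int i = 0; i < moves.length; i++) {
            for (int j = i + 1; j < moves.length; j++) {
                boolean samePop = moves[i][1].equals(moves[j][1]);
                boolean sameRead = moves[i][0].equals(moves[j][0]);
                boolean epsilonInvolved =
                    moves[i][0].length() == 0 || moves[j][0].length() == 0;
                // Requirement 1: no two rows with the same read and pop entries.
                // Requirement 2: an epsilon-move on a stack symbol rules out any
                // other move that pops the same symbol.
                if (samePop && (sameRead || epsilonInvolved)) return false;
            }
        }
        return true;
    }
}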
Here, $ in the “read” column signifies a move that can be made only when all the input
has been read. The entry above says that if all the input has been read and if there is an
A on top of the stack, you may pop the A and push x in its place. Using this additional
kind of move (along with the others), give deterministic stack machines for the following
languages. (These languages lack the prefix property, so they cannot be defined using
plain deterministic stack machines.)
a. L((0 + 1) * 0)
b. {anbn}
EXERCISE 7
Using the extended definition of a deterministic stack machine from Exercise 6, prove by
construction that every regular language is defined by some deterministic stack machine.
(Hint: Start from a DFA for the language.) Include a detailed proof that the stack
machine you construct works as required.
EXERCISE 8
(Note to instructors: Read the sample solution before assigning this one!)
In this exercise you will write a recursive-descent parser for a language of quantified
Boolean formulas. Your parser will compile each string into an object representing the
formula. Each such object will implement the following interface:
/*
* An interface for quantified boolean formulas.
*/
public interface QBF {
/*
* A formula can convert itself to a string.
*/
String toString();
}
The simplest of these formulas are variables, which are single, lowercase letters, like x
and y. These will be compiled into objects of the Variable class:
/*
* A QBF that is a reference to a variable, v.
*/
public class Variable implements QBF {
private char v; // the variable to which we refer
public Variable(char v) {
this.v = v;
}
public String toString() {
return "" + v;
}
}
Any QBF can be logically negated using the ~ operator, as in ~a. Logical negations will
be compiled into objects of the Complement class:
/*
* A QBF for a complement: not e.
*/
public class Complement implements QBF {
private QBF e; // the QBF we complement
public Complement(QBF e) {
this.e = e;
}
public String toString() {
return "~(" + e + ")";;
}
}
Any two QBFs can be logically ANDed using the * operator, as in a*b. These formulas
will be compiled into objects of the Conjunction class:
/*
* A QBF for a conjunction: lhs and rhs.
*/
public class Conjunction implements QBF {
private QBF lhs; // the left operand
private QBF rhs; // the right operand
public Conjunction(QBF lhs, QBF rhs) {
this.lhs = lhs;
this.rhs = rhs;
}
public String toString() {
return "(" + lhs + "*" + rhs + ")";
}
}
Similarly, any two QBFs can be logically ORed using the + operator, as in a+b. These
formulas will be compiled into objects of the Disjunction class:
/*
* A QBF for a disjunction: lhs or rhs.
*/
public class Disjunction implements QBF {
private QBF lhs; // the left operand
private QBF rhs; // the right operand
public Disjunction(QBF lhs, QBF rhs) {
this.lhs = lhs;
this.rhs = rhs;
}
public String toString() {
return "(" + lhs + "+" + rhs + ")";
}
}
/*
* A QBF that is universally quantified: for all v, e.
*/
public class Universal implements QBF {
private char v; // our variable
private QBF e; // the quantified formula
public Universal(char v, QBF e) {
this.v = v;
this.e = e;
}
public String toString() {
return "A" + v + "(" + e + ")";
}
}
/*
* A QBF that is existentially quantified:
* there exists v such that e.
*/
public class Existential implements QBF {
private char v; // our variable
private QBF e; // the quantified formula
public Existential(char v, QBF e) {
this.v = v;
this.e = e;
}
public String toString() {
return "E" + v + "(" + e + ")";
}
}
This is a BNF grammar for the language, using <QBF> as the starting nonterminal:
<QBF> ::= A <V> <QBF> | E <V> <QBF> | <BF>
<BF> ::= <A> <R>
<R> ::= + <A> <R> | <empty>
<A> ::= <X> <B>
<B> ::= * <X> <B> | <empty>
<X> ::= ~ <X> | ( <QBF> ) | <V>
<V> ::= a | b | c | … | z
The alphabet of the language is exactly as shown, with no spaces. The grammar
establishes the precedence: ~ has highest precedence, and quantifiers have lowest precedence.
The QBF classes already have toString methods that will convert them back into
strings for printing. So, if your compiler works correctly, the printed output will look
much like the input, except that there will be exactly one pair of parentheses for each
operator and each quantifier.
Your code will compile QBF formulas, but it won’t do anything with them other than
print them out. Just throw an Error if the string does not parse. You should not have
to add anything to the QBF classes. (A later exercise, at the end of Chapter 20, will add
methods to the QBF classes, so that a formula can decide whether or not it is true.)
CHAPTER 16
Turing Machines

[Figure omitted: a Turing machine, consisting of a finite state machine controlling a read/write head on an infinite tape.]
The TM’s input is presented on a tape with one symbol at each position. The tape extends
infinitely in both directions; those positions not occupied by the input contain a special
blank-cell symbol B. The TM has a head that can read and write symbols on the tape and
can move in both directions. The head starts at the first symbol in the input string.
A state machine controls the read/write head. Each move of the TM is determined by
the current state and the current input symbol. To make a move, the TM writes a symbol at
the current position, makes a state transition, and moves the head one position, either left
or right. If the TM enters an accepting state, it halts and accepts the input string. It does not
matter what the TM leaves on its tape or where it leaves the head; it does not even matter
whether the TM has read all the input symbols. As soon as the TM enters an accepting state,
it halts and accepts. This mechanism for accepting is quite different from that of DFAs and
NFAs. DFAs and NFAs can make transitions out of their accepting states, proceeding until
all the input has been read, and they often have more than one accepting state. In a TM,
transitions leaving an accepting state are never used, so there is never any real need for more
than one accepting state.
We will draw TMs using state-transition diagrams. Each TM transition moves the head
either to the right or to the left, as illustrated in these two forms:
a/b,R
q r
a/b,L
q r
The first says that if the TM is in the state q and the current tape symbol is a, the TM can
write b over the a, move the head one place to the right, and go to the state r. The second is
the same except that the head moves one place to the left.
As you can see, TMs are not much more complicated than DFAs. TMs are deterministic
and have no ε-transitions. Unlike stack machines, TMs do not have a separate location for
storage. They use one tape both for input and for scratch memory. But they have a far richer
variety of behaviors than DFAs or stack machines and (as we will see) can define a far broader
class of languages.
[State-transition diagram omitted: a simple TM that always moves right, rewriting each symbol it reads unchanged.]
This machine does not take advantage of a TM’s ability to move the head in both directions;
it always moves right. It also does not take advantage of a TM’s ability to write on the tape;
it always rewrites the same symbol just read. Of course, since it never moves to the left, it
makes no difference what it writes on the tape. Any symbols written are left behind and
never revisited. So it could just as well write a B over every symbol, erasing as it reads.
TMs can also easily handle all context-free languages. For example, consider the language
{anbn}. Here is a TM that accepts this language:
[State-transition diagram omitted: a five-state TM for {anbn} that repeatedly erases an a at the left end and a matching b at the right end of the remaining input, accepting when nothing is left.]
It is also possible to take any stack machine and convert it into an equivalent TM that
uses the infinite tape as a stack. But, as in the example above, it is often easier to find some
non-stack-oriented approach.
[State-transition diagram omitted: the seven-state TM of Section 16.3 for {anbncn}. On each pass it overwrites one a with X, one b with Y, and one c with Z, then returns to the left and repeats; it accepts when only marked symbols remain.]
Formally, a TM is a 7-tuple M = (Q, Σ, Γ, δ, B, q0, F), where Q is the set of states, Σ is the
input alphabet, Γ is the tape alphabet, δ is the transition function, B ∈ Γ is the blank symbol,
q0 ∈ Q is the start state, and F ⊆ Q is the set of accepting states.
Note that the tape alphabet Γ includes all of the input alphabet Σ, plus at least one additional
symbol B. The requirement that Q ∩ Γ = {} means no state is also a symbol—not an onerous
requirement, but a necessary one for the definition of IDs that follows.
The transitions of the TM are defined by the transition function δ. The δ function
takes as parameters a state q ∈ Q (the current state) and a symbol X ∈ Γ (the symbol at the
current position of the head). The value δ(q, X) is a triple (p, Y, D), where p ∈ Q is the
next state, Y ∈ Γ is the symbol to write at the current head position, and D is one of the
directions L or R, indicating whether the head should move left or right. Note that a TM
is deterministic in the sense that it has at most one transition at any point. However, the δ
function need not be defined over its whole domain, so there may be some q and X with no
move δ(q, X).
A TM's configuration at any point is captured by an instantaneous description (ID): a string
giving the contents of the tape, with the current state written immediately to the left of the
symbol the head is reading; the function idfix puts such a string into a normal form, trimming
extra blanks at the ends and supplying a blank where one is needed.
Although the tape extends infinitely in both directions, it contains only Bs in those positions
that are outside the input string and not yet written by the TM. So, because the leading Bs
on the left and the trailing Bs on the right are suppressed, an ID is always a finite string. The
state q in the ID string tells what state the TM is in and also shows the position of the head.
The δ function for a TM determines a relation ⊢ on IDs; we write I ⊢ J if I is an ID
and J is an ID that follows from I after one move of the TM.
⊢ is a relation on IDs, defined by the δ function for the TM. For any
x ∈ Γ*, c ∈ Γ, q ∈ Q, a ∈ Γ, and y ∈ Γ*, we have
1. Left moves: if δ(q, a) = (p, b, L) then xcqay ⊢ idfix(xpcby)
2. Right moves: if δ(q, a) = (p, b, R) then xcqay ⊢ idfix(xcbpy)
For example, consider making a left move. Suppose the machine is in the configuration
given by the ID xcqay. So it is in the state q, reading the tape symbol a. Thus δ(q, a) gives
the transition. If that transition is δ(q, a) = (p, b, L), the machine can write a b over the a, go
to the state p, and move the head one position to the left. So after the move the ID is xpcby:
the a has been replaced by a b, and the head has moved left so that it is now reading the tape
symbol c.
Next, as usual, we define an extended relation ⊢* for sequences of zero or more steps:
I ⊢* J if and only if there is a sequence of zero or more moves I ⊢ I1 ⊢ ... ⊢ J.
Notice here that ⊢* is reflexive: for any ID I, I ⊢* I by a sequence of zero moves. Using the
⊢* relation, we can define the language accepted by M:
L(M) = {x ∈ Σ* | idfix(q0x) ⊢* ypz for some p ∈ F and some strings y and z}
In this definition, the TM starts in its start state reading the first symbol of the input
string x (or reading B, if x is ε—the initial ID normalized by idfix will be either Bq0x if
x ≠ ε or Bq0B if x = ε). It accepts x if it has a sequence of zero or more moves that goes to an
accepting state, regardless of what is left on the tape, and regardless of the final position of
the head.
[State-transition diagram omitted: a small TM M with states q, r, and s; its transitions are listed below.]
Formally this is M = ({q, r, s}, {0, 1}, {0, 1, B}, δ, B, q, {s}), where the transition function δ is
given by
δ(q, 1) = (r, 1, R)
δ(q, 0) = (s, 0, R)
δ(r, B) = (q, B, L)
Given the input string 0, M accepts (and thus halts):
Bq0 ⊢ 0sB
In general, M will accept a string if and only if it begins with a 0, so L(M) is the regular
language L(0(0 + 1)*). But M also exhibits the other two possible outcomes. Given the input
string ε, M rejects by halting in a nonaccepting state—the start state, since there is no move
from there:
BqB
Given the input string 1, M does not accept or reject, but runs forever without reaching an
accepting state:
Bq1 ⊢ 1rB ⊢ Bq1 ⊢ 1rB ⊢ Bq1 ⊢ ...
Of course, it is possible to make a TM for the language L(0(0 + 1)*) that halts on all inputs.
In general, though, we will see that it is not always possible for TMs to avoid infinite loops.
We might say that the possibility of running forever is the price TMs pay for their great
power.
In previous chapters we have seen occasional glimpses of nonterminating computations.
Stack machines and NFAs can in some sense run forever, since they can contain cycles of
ε-transitions. But those nonterminating computations are never necessary. We saw how to
construct an equivalent DFA for any NFA—and DFAs always terminate. Similarly, it is
possible to show that for any stack machine with a cycle of ε-transitions you can construct an
equivalent one without a cycle of ε-transitions—which therefore always terminates. Thus, the
possibility of having a computation run forever has been a minor nuisance, easily avoidable.
With TMs, however, the situation is different. There are three possible outcomes when a
TM is run: it may accept, it may reject, or it may run forever. A TM M actually partitions Σ*
into three subsets: those that make M accept (which we have named L(M )), those that make
M reject, and those that make M run forever. Instead of just defining L(M), then, a TM
really defines three languages: L(M) = {x ∈ Σ* | M accepts x}, R(M) = {x ∈ Σ* | M rejects x},
and F(M) = {x ∈ Σ* | M runs forever on x}.
That third possibility is critical. There is a special name for TMs that halt on all inputs;
they are called total TMs.
In the next few chapters we will see some important results about the nonterminating
behavior of TMs. We will see that if we restrict our attention to total TMs we lose power. In
other words, we will see that these two sets of languages are not the same: the recursive
languages, which are the languages L(M) for total TMs M, and the recursively enumerable
(RE) languages, which are the languages L(M) for TMs M in general.
These may seem like odd names—what does recursion have to do with TMs? But it turns
out that these two sets of languages were independently identified by a number of different
researchers working in different areas of mathematics. Although we are defining these sets in
terms of TMs, the standard names come from mathematical studies of computability using
the theory of recursive functions.
16.7 A TM for {xcx | x ∈ {a, b}*}
[State-transition diagram omitted: a TM for {xcx | x ∈ {a, b}*}. From a state p it follows one of two symmetric paths of states to a state u, one path when an a has been marked and one when a b has been marked.]
This TM always halts, showing that {xcx | x ∈ {a, b}*} is a recursive language.
This example demonstrates two important higher-level techniques of TM construction.
The first is the idea of marking cells of the tape. As the machine illustrates, a cell can be
marked simply by overwriting the symbol there with a marked version of that symbol. In our
machine, the input alphabet is {a, b, c }, but the tape alphabet is {a, b, c, a', b', B}—it includes
marked versions of the symbols a and b.
The second is the technique of using the TM states to record some finite information. In
our example, we needed the machine to remember whether a or b was marked in the state p.
So we constructed the machine with two paths of states from p to u. The machine is in the
state q or s if it saw a and is expecting to see a matching a in the second half of the string; it
is in the state r or t if it saw b and is expecting to see a matching b. In a high-level description
of the machine we can simply say that it “remembers” whether a or b was seen; we can always
implement any finite memory using the states of the TM.
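Both techniques translate directly into ordinary code. Here is a minimal Java sketch (not from the text) that decides {xcx | x ∈ {a, b}*} the way the TM does: it treats a char array as the tape, marks a cell by overwriting it with *, and uses a local variable where the TM uses its two paths of states to remember whether an a or a b was marked.

public class XcX {
    // Decide membership in {xcx | x in {a,b}*}, using marking as the TM does.
    public static boolean accepts(char[] t) {
        while (true) {
            int i = 0;
            while (i < t.length && t[i] == '*') i++;       // leftmost unmarked cell
            if (i == t.length) return false;               // ran out without seeing c
            if (t[i] == 'c') {                             // left half fully matched:
                for (int k = i + 1; k < t.length; k++)     // accept iff the right half
                    if (t[k] != '*') return false;         // is fully marked too
                return true;
            }
            char remembered = t[i];                        // plays the role of the TM state
            if (remembered != 'a' && remembered != 'b') return false;
            t[i] = '*';                                    // mark the left-half cell
            int j = i + 1;
            while (j < t.length && t[j] != 'c') j++;       // scan right to the c
            j++;
            while (j < t.length && t[j] == '*') j++;       // then to the first unmarked cell
            if (j >= t.length || t[j] != remembered) return false;
            t[j] = '*';                                    // match found: mark it too
        }
    }
}

For example, accepts("abcab".toCharArray()) returns true, while accepts("abcaa".toCharArray()) returns false.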
16.8 Three Tapes
[Figure omitted: a TM with three tapes, holding the strings apple, pear, and lemon, each tape with its own head position.]
We can encode all this information on one tape, using an enlarged tape alphabet:
Each symbol in this new alphabet is a triple, encoding all the information about three
symbols from the original alphabet and using markings to indicate the presence or absence of
the head at that position. For example, the symbol (a', p, l ) indicates that, in the three-tape
machine, there is an a on the first tape, a p on the second tape, and an l on the third tape; the
head of the first tape is at this location, while the heads of the other two tapes are elsewhere.
This alphabet of triples is, of course, much larger than the original alphabet. The original
alphabet is first doubled in size by the addition of a marked version of each symbol; for
example, Γ = {a, B} becomes Γ2 = {a, a', B, B'}. This enlarged alphabet is then cubed in size
by being formed into 3-tuples; for example, Γ2 = {a, a', B, B'} becomes
Γ3 = Γ2 × Γ2 × Γ2 = {(a, a, a), (a, a, a'), (a, a, B), (a, a, B'), (a, a', a), (a, a', a'), (a, a', B), (a, a', B'), ...}
and so on, for a total of 64 symbols.
Now let M3 be any three-tape TM. We can construct an ordinary TM M1 to simulate
M3. M1 will use the alphabet of triples, as shown above, and will use (B, B, B) as its blank
symbol. To simulate one move of the three-tape TM, M1 will use a two-pass strategy like this:
1. Make a left-to-right pass over the tape until all three marked symbols have been
found. Use our state to record M3’s state and the symbols being read at each of
M3’s three heads. This information determines the move M3 will make. If M3
halts, halt M1 as well—in an accepting state if M3 was in an accepting state or in a
nonaccepting state if not.
2. Make a right-to-left pass over the tape, carrying out M3’s actions at each of its
three head positions. (This means writing what M3 would write and also moving
the marks to record M3’s new head positions.) Leave the head at the leftmost cell
containing any marked symbol. Go back to step 1 for the next move.
By repeating these passes, M1 simulates M3. Of course, it makes far more moves than M3.
It uses a far larger alphabet than M3. It uses far more states than M3. But none of that matters
for questions of computability. M1 reaches exactly the same decisions as M3 for all input
strings: accepting where M3 accepts, rejecting where M3 rejects, and running forever where
M3 runs forever. Using this construction we can conclude the following:
Theorem 16.1: For any given partition of Σ* into three subsets L, R, and F,
there is a three-tape TM M3 with L(M3) = L, R(M3) = R, and F(M3) = F, if and
only if there is a one-tape TM M1 with L(M1) = L, R(M1) = R, and F(M1) = F.
Proof sketch: As indicated above, given any three-tape TM, we can construct a
one-tape TM with the same behaviors. In the other direction, any one-tape TM
can easily be simulated by a three-tape TM that simply ignores two of its three
tapes.
We can encode a DFA as a string over {0, 1}, representing each state qi as 1i, each input
symbol numbered j as 1j, and each transition δ(qi, j) = qk as 1i01j01k, with single 0s
separating the encoded transitions.
[State-transition diagram omitted: an example DFA M with two states, q1 and q2, over the alphabet {a, b}; q2 is the accepting state.]
Numbering a as 1 and b as 2, we have this transition function for M:
δ(q1, 1) = q2
δ(q1, 2) = q1
δ(q2, 1) = q1
δ(q2, 2) = q2
that is encoded as this string:
101011 0 101101 0 110101 0 11011011
(The spaces shown are not part of the string, but emphasize the division of the string into a
list of four transitions separated by 0s.)
Finally, we need to represent the set of accepting states. We are already representing each
state qi as 1i, so we can represent the set of accepting states as a list of such strings, using 0
as the separator. Putting this all together, we can represent the entire DFA by concatenating
the transition-function string with the accepting-states string, using 00 as a separator. For the
example DFA M, this is
101011 0 101101 0 110101 0 11011011 00 11
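As a cross-check on the encoding, here is a minimal Java sketch (not from the text; the class, method, and table representation are all hypothetical) that builds such a string from a transition table, numbering states and symbols from 1 as above. Called with the table for M and {2} as the set of accepting states, it produces exactly the string shown.

public class DFAEncoder {
    // delta[i][j] is the number of the state reached from state i on symbol j
    // (both numbered from 1); accepting lists the numbers of the accepting states.
    public static String encode(int[][] delta, int numStates, int numSymbols,
                                int[] accepting) {
        StringBuilder enc = new StringBuilder();
        for (int i = 1; i <= numStates; i++) {
            for (int j = 1; j <= numSymbols; j++) {
                if (enc.length() > 0) enc.append('0');   // a single 0 separates transitions
                enc.append(ones(i)).append('0');
                enc.append(ones(j)).append('0');
                enc.append(ones(delta[i][j]));           // one transition: 1^i 0 1^j 0 1^k
            }
        }
        enc.append("00");                                // 00 separates the two parts
        for (int a = 0; a < accepting.length; a++) {
            if (a > 0) enc.append('0');
            enc.append(ones(accepting[a]));              // accepting states, 0-separated
        }
        return enc.toString();
    }
    private static String ones(int n) {
        StringBuilder s = new StringBuilder();
        for (int k = 0; k < n; k++) s.append('1');
        return s.toString();
    }
}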
This gives us a way to encode a DFA as a string of 1s and 0s. It should come as no
surprise that this can be done—programmers are accustomed to the idea that everything
physical computers manipulate is somehow encoded using the binary alphabet. Our goal
now is to define a TM that takes one of these encoded DFAs along with an encoded input
string for that DFA and decides by a process of simulation whether the DFA accepts the
string.
We’ll express this as a three-tape TM (knowing that there is an equivalent, though more
complex, one-tape TM). Our TM will use the first tape to hold the DFA being simulated
and use the second tape to hold the DFA’s input string. Both tapes are encoded using the
alphabet {0, 1}, as explained in the previous section. The third tape will hold the DFA’s
current state, representing each state qi as 1i. This shows the initial configuration of the
simulator, using the example DFA M, defined above, and the input string abab.
[Figure omitted: the simulator's initial configuration. The first tape holds the encoded DFA M; the second tape holds 1 0 11 0 1 0 11, the encoding of the input string abab; the third tape holds 1, the encoding of the DFA's start state q1.]
As before, the spaces are added in the illustration to emphasize the encoding groups.
To simulate a move of the DFA, our TM will perform one state transition and erase one
encoded symbol of input. In our example above, the machine is in the state q1 (as shown by
the third tape) reading the input 1 = a (as shown by the second tape). The first transition
recorded on the first tape says what to do in that situation: go to the state 2. So after one
simulated move, the machine’s state will be this:
[Figure omitted: after one simulated move, the first encoded input symbol on the second tape has been erased, leaving 11 0 1 0 11, and the third tape now holds 11, the encoding of the state q2.]
Of course, it takes many moves of the TM to accomplish this simulation of one move of
the DFA. That one-move simulation is repeated until all the input has been erased; then the
final state of the DFA is checked to see whether or not the DFA accepts the string. This is the
strategy in more detail:
1. If the second tape is not empty, go to Step 2. Otherwise, the DFA’s input has all
been read and its computation is finished. Search the list of accepting states on the
first tape for a match with the DFA’s current state on the third tape. If a match is
found, halt in an accepting state; if it is not found, halt in a nonaccepting state.
2. Search the first tape for a delta-function move 1i01j01k, where 1i matches the
current state on the third tape and 1j matches the first input symbol on the second
tape. (Since a DFA has a move for every combination of state and symbol, this
search will always succeed.) The 1k part now gives the next state for the DFA.
3. Replace the 1i on the third tape with 1k, and erase the 1j and any subsequent
separator from the second tape, by writing Bs over them. This completes one move
of the DFA; go to Step 1.
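The same three steps can be carried out in ordinary Java, which may make the strategy easier to follow. This is a minimal sketch (not from the text; the class and method names are hypothetical): enc is the encoded DFA in the format above, and input is the DFA's input string encoded as groups of 1s separated by single 0s.

public class EncodedDFASimulator {
    public static boolean accepts(String enc, String input) {
        String[] parts = enc.split("00", 2);         // transitions, then accepting states
        String[] g = parts[0].split("0");            // groups of 1s: i, j, k, i, j, k, ...
        String[] accepting = parts[1].split("0");
        String state = "1";                          // the DFA starts in state q1
        for (String symbol : input.split("0")) {
            if (symbol.length() == 0) continue;      // an empty input encodes no symbols
            for (int t = 0; t + 2 < g.length; t += 3) {
                if (g[t].equals(state) && g[t + 1].equals(symbol)) {
                    state = g[t + 2];                // take the matching transition
                    break;
                }
            }
        }
        for (int a = 0; a < accepting.length; a++)   // is the final state accepting?
            if (accepting[a].equals(state)) return true;
        return false;
    }
}

On the encoded M and the encoded input abab shown earlier, the simulated run ends in state q1, which is not accepting, so the method returns false.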
The TM that simulates a DFA uses only a fixed, finite portion of each of its tapes. It
takes the encoded machine as input on its first tape and never uses more; it takes the encoded
input string on its second tape and never uses more; and it stores only one encoded state on
its third tape, which takes no more cells than there are states in the DFA. The existence of
this TM shows, indirectly, that all regular languages are recursive. (For any regular language,
we could construct a specialized version of the DFA-simulating TM, with a DFA for that
language wired in.)
One important detail we are skipping is the question of what to do with improper
inputs. What should the DFA simulator do with all the strings in {0, 1}* that don’t properly
encode DFAs and inputs? If our simulator needs to reject such strings, it will have to check
the encoding before beginning the simulation. For example, since the input is supposed to
be a DFA, it will have to check that there is exactly one transition from every state on every
symbol in the alphabet.
Similar constructions show that every CFL is recursive, and {anbncn} shows that the
recursive languages are a proper superset of the CFLs. This picture captures our language
classes so far:
[Figure: the regular languages are properly contained in the CFLs, which are properly contained in the recursive languages; L(a*b*) is regular, {anbn} is context free but not regular, and {anbncn} is recursive but not context free.]
Using the same techniques, a TM can even act as a TM-interpreter. Such a TM is called
a universal Turing machine. To construct a universal TM, we first design an encoding of TMs
as strings of 1s and 0s, whose principal content is an enumeration of the transition function.
Then we can construct a three-tape universal TM, taking an encoded TM as input on the
first tape and encoded input string on the second tape and using the third tape to record its
current state. The second tape, containing the input to the simulated TM, also serves as that
TM’s working tape, so we need to use marked symbols there to record the location of the
simulated TM’s head. Simulating a move is simply a matter of looking up the appropriate
transition on the first tape, making the necessary change on the second tape, and updating
the state on the third tape. If the simulated TM has no next move, we make our simulator
halt—in an accepting state if the simulated TM accepted or in a rejecting state if it rejected.
Of course, if the simulated TM runs forever, our simulating TM will too.
Like all the TMs described in the last few sections, our universal TM is merely sketched
here. In fact, we have not given a detailed construction of any TM since Section 16.7. We
did not give a complete, formal construction for converting three-tape TMs to standard
TMs; we just presented a sketch. We did not really construct any TMs to simulate other
automata; we just sketched how certain three-tape TMs could be constructed and then relied
on the previous construction to conclude that equivalent standard TMs exist. In effect, we
demonstrated that all these TMs could be constructed, without actually constructing them.
When you want to show that something can be done using a Turing machine, rough
outlines are often more effective than detailed constructions. That is because it is very
difficult to figure out what a TM does just by inspection—very difficult even for small TMs,
and inhumanly difficult for large ones. Merely showing a large TM and claiming that it
recognizes a certain language is not a good way to convince anyone that the language can be
recognized by a TM. Instead, it is more convincing to give less detail—to describe in outline
how a TM can be constructed. Once you’re convinced it can be done there is no point in
actually doing it!
No point, that is, except for fun. People have actually constructed many universal TMs
in detail. It serves virtually no purpose, but it is an interesting puzzle, especially if you try to
do it with the smallest possible number of states and the smallest possible alphabet. There is a
fine introduction to this puzzle in Marvin Minsky’s classic book, noted below.
Exercises
These exercises refer to ordinary, one-tape TMs. Show completed TMs in detail. (Yes, we
just asserted that high-level descriptions are more readable and that there is no point in
constructing large TMs in detail, except for fun. Conclusion: These exercises must be fun.)
EXERCISE 1
Give a TM for the language L(a*b*).
EXERCISE 2
Give a TM for the language accepted by this DFA:
[State-transition diagram of the DFA omitted.]
EXERCISE 3
Give a TM for {x {a, b}* | x contains at least three consecutive as}.
EXERCISE 4
Give a TM for the language generated by this grammar:
S aSa | bSb | c
EXERCISE 5
The TM in Section 16.3 for {anbncn} has two states that can be merged. Rewrite it with
these states merged, so that it has one less state in total.
EXERCISE 6
Give a TM for {anbnc2n}.
EXERCISE 7
Give a TM that accepts the same language as the TM in Section 16.6 but always halts.
EXERCISE 8
Construct a TM that accepts the language {anbncn} and runs forever on all strings that
are not accepted. Thus, for your TM M you will have L(M) = {anbncn}, R(M) = {}, and
F(M) = {x ∈ {a, b, c}* | x ∉ {anbncn}}. Hint: Modify the machine from Section 16.3.
EXERCISE 9
Construct a TM that accepts the language {x ∈ {a, b, c}* | x ∉ {anbncn}}. Hint: Modify
the machine from Section 16.3.
EXERCISE 10
The TM of Section 16.7 was chosen to illustrate the concept of marking cells of the tape.
A simpler TM for the same language can be implemented if the symbols in the first half
of the string are not marked but simply erased (by being overwritten with a B). Using
this idea, reimplement the TM. (You should be able to eliminate a state.)
EXERCISE 11
Construct a TM for the language {0i1j0k | i > 0, j > 0, and i + j = k}.
EXERCISE 12
Construct a TM for the language {anbnaibj}.
EXERCISE 13
Construct a TM for the language L(a*b* + b*a*).
EXERCISE 14
Construct a TM that runs forever on inputs in L(a*b* + b*a*), but halts on all other
inputs in {a, b}*. (It doesn’t matter whether or not it accepts the other inputs.)
EXERCISE 15
Consider the language of encoded DFAs from Section 16.9. By removing the
requirement that there be exactly one transition from every state on every symbol in the
alphabet, we have a similar language of encoded NFAs. Give a regular expression for this
language.
EXERCISE 16
Construct a TM that accepts the language of strings that encode NFAs that have at least
one accepting state; that is, your TM must check its input for membership in the regular
language of the previous exercise and must check that the NFA encoded has at least one
accepting state.
EXERCISE 17
Construct a TM that takes an input of the form 1i0x, where x is an encoding of a DFA
as in Section 16.9. Your TM should halt leaving only the string 1j on the tape, where
δ(q1, i) = qj. Other than this string 1j, the final tape should contain nothing but Bs. The
behavior of your TM on improperly formatted inputs does not matter, and it does not
matter whether the TM accepts or rejects.
EXERCISE 18
Construct a TM that takes an input of the form 1i0x, where x is an encoding of a DFA as
in Section 16.9. Your TM should accept if and only if the DFA accepts the one-symbol
string i. The behavior of your TM on improperly formatted inputs does not matter.
CHAPTER 17
Computability

A function on strings is Turing computable if there is some TM that, given any input string x
on its tape, always halts with the value of the function at x as the final contents of its tape.
In this definition, the final state of the TM is not considered, nor does it matter where the
TM’s head is positioned when it halts.
Let’s look at an example. In Java and the other C-family languages we write the
expression x&y to express the bitwise AND of two integer operands x and y. We’ll show that
this is Turing computable, and to make this more interesting, we’ll extend it to allow the
operands to be arbitrarily long strings of bits. Imagine first that the TM’s input is presented
with the operands already stacked up, using the four stacked pairs Σ = {(0,0), (0,1), (1,0), (1,1)}
as the input alphabet, where each symbol stacks a bit of the first operand on the corresponding
bit of the second. To show that this and function is Turing computable, we need to find a TM
that, given any x ∈ Σ* as input, halts with and(x) on its tape. For example, given
(0,0)(1,1)(1,0)(0,1)(1,1)(1,0) as input (the operands 011011 and 010110 stacked up), our TM
should halt with 010010 on its tape, since the bitwise AND of 011011 and 010110 is 010010.
But nothing could be simpler:
[State-transition diagram omitted: a one-state TM with four self-loops, one for each stacked pair; it overwrites (0,0), (0,1), and (1,0) with 0, overwrites (1,1) with 1, and always moves right.]
This and machine simply proceeds from left to right over its input, overwriting each stacked-
up pair of input bits with the corresponding output bit. When the end of the input is
reached, the machine halts, having no transition on a B. This machine implements our and
function, demonstrating that the function is Turing computable. Notice that the and TM has
no accepting states, so L(and ) = {}. But we’re not interested in the language it defines; we’re
interested in the function it implements.
17.2 TM Composition
You might argue that we cheated in that last example by assuming that the input was
presented in stacked form, using an input alphabet completely different from the output
alphabet. What if the input is presented as two, linear, binary operands? What if our
input is not the stacked string (0,0)(1,1)(1,0)(0,1)(1,1)(1,0) but 011011#010110, where some special symbol # separates the two
operands? It’s simple: we can preprocess the input into the stacked form we need. We can
define a function stacker that converts linear inputs into stacked form, so that for example
stacker(011011#010110) = (0,0)(1,1)(1,0)(0,1)(1,1)(1,0). This function is Turing computable; here’s a machine to
compute it:
[State-transition diagram omitted: the stacker TM. It has a start state that scans to the right end of the input, a main-loop state b, and final states y and z, as described below.]
The stacker machine uses the following strategy. In the start state it skips to the rightmost
symbol of the string. Then for each symbol in the second operand it makes a pass through
the main loop of the machine, starting and ending in the state b. On each such pass it erases
the rightmost remaining symbol of the second operand and overwrites the rightmost 0 or 1
of the first operand with the corresponding stacked symbol. (It treats B as equivalent to 0 in
the first operand, in case it is shorter than the second operand.)
When the second operand has been erased, the separating # is also erased and the
machine enters the state y. There it makes a final pass over the first operand. If the first
operand was longer than the second, it will still contain some unstacked 0s and/or 1s; in the
state y they are replaced with stacked versions, as if the second operand had been padded
with leading 0s.
In its final state z, the TM halts, leaving the stacked version of its input on the tape. (If
the input is improperly formatted—if it does not include exactly one separator #—then all
bets are off, and the output of the stacker TM will not be a neatly stacked binary string. But
we don’t care about the output for such cases.) In the state z the stacker TM’s head is at the
leftmost symbol of the stacked string, which is exactly where our and TM wants it. So we
can simply compose the two machines, replacing the final state of the stacker TM with the
start state of the and TM. This gives us a TM that implements the composition of the two
functions: linearAnd(y) = and(stacker(y)).
Using TM composition is a bit like using subroutines in a high-level language. It is an
important technique for building computers of more elaborate functions.
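To see what the composed function computes, here is a minimal Java sketch of it (not from the text; the class name is hypothetical): split the input at the #, line the operands up from the right as stacker does, treat missing bits of the shorter operand as 0, and AND column by column.

public class LinearAnd {
    // linearAnd("011011#010110") returns "010010".
    public static String linearAnd(String y) {
        int hash = y.indexOf('#');                    // assumes exactly one # separator
        String x1 = y.substring(0, hash);
        String x2 = y.substring(hash + 1);
        int n = Math.max(x1.length(), x2.length());
        StringBuilder out = new StringBuilder();
        for (int k = 0; k < n; k++) {                 // work from the right, as stacker does
            char b1 = k < x1.length() ? x1.charAt(x1.length() - 1 - k) : '0';
            char b2 = k < x2.length() ? x2.charAt(x2.length() - 1 - k) : '0';
            out.insert(0, (b1 == '1' && b2 == '1') ? '1' : '0');
        }
        return out.toString();
    }
}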
17.3 TM Arithmetic
We turn now to the problem of adding binary numbers. As before, we’ll assume that
the operands are stacked binary numbers of arbitrary length. The add function will
take an input of that form and produce the simple binary sum as output. For example,
add((0,0)(1,1)(1,0)(0,1)(1,1)(1,0)) = 110001 (or, in decimal, 27 + 22 = 49).
The add function is Turing computable:
[State-transition diagram omitted: the add TM, with a state c0 for carry-in 0 and a state c1 for carry-in 1, as described below.]
The add machine uses the following strategy. In the start state it skips to the rightmost
symbol of the string. Then for each pair of stacked symbols, working from right to left, it
overwrites that pair with the corresponding sum bit. The state of the TM records the carry-in
from the previous bit; it is in the state c0 if the carry-in was 0 and the state c1 if the carry-in
was 1. When all the bits have been added the TM encounters the B to the left of the input. If
the final carry-out was a 1, it writes a leftmost 1 over the B; then it halts.
As before, we can combine this machine with the stacker machine to make an adder that
takes its operands linearly: linearAdd(x) = add(stacker(x)).
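For comparison with the TM, here is the composed linearAdd function written directly as a minimal Java sketch (not from the text; the class name is hypothetical). The variable carry plays exactly the role of the add TM's states c0 and c1.

public class LinearAdd {
    // linearAdd("1010#1100") returns "10110" (10 + 12 = 22).
    public static String linearAdd(String x) {
        int hash = x.indexOf('#');                     // assumes exactly one # separator
        String a = x.substring(0, hash);
        String b = x.substring(hash + 1);
        StringBuilder sum = new StringBuilder();
        int carry = 0;                                 // the TM's c0/c1 state
        int n = Math.max(a.length(), b.length());
        for (int k = 0; k < n; k++) {                  // right to left, one column at a time
            int bitA = k < a.length() && a.charAt(a.length() - 1 - k) == '1' ? 1 : 0;
            int bitB = k < b.length() && b.charAt(b.length() - 1 - k) == '1' ? 1 : 0;
            int s = bitA + bitB + carry;
            sum.insert(0, (char) ('0' + s % 2));
            carry = s / 2;
        }
        if (carry == 1) sum.insert(0, '1');            // final carry-out, written over the B
        return sum.toString();
    }
}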
To decrement a binary number (that is, to subtract one from it) is a simple operation
closely related to addition. We can decrement a binary number simply by adding a
string of 1s to it and ignoring the carry-out from the final bit position. For example,
decrement(100) = 011; this is the same as linearAdd(100#111) = 1011, except for the final
carry-out. If you start with the add machine above, but assume that the bottom bit in each
stacked pair is a 1 and ignore the final carry-out, you get this decrement machine:
[State-transition diagram omitted: the decrement TM, with carry states c0 and c1 and two final states, z and nz.]
The extra final state in the decrement machine may come in handy, since it indicates whether
the number that has just been decremented was already zero. If it was zero, there was no
carry-out from the leftmost bit, so the final state is z. If it was not zero, the final state is nz.
So our decrement machine can be used not only as a decrement-function computer, but also as
a decrement-and-test component in larger machines.
17.4 TM Random Access
[State-transition diagram omitted: the ith machine, which contains the decrement machine (the states c0, c1, nz, and z) as a component.]
Notice that our decrement machine is embedded here, in the states labeled c0, c1, nz, and z.
The ith machine starts by marking the first symbol of the string s. Then, in its main loop, it
repeatedly decrements i and moves the mark one place to the right. When the decrementing
step finds that i was zero (entering the state z), the currently marked symbol is the one that
should be output; the ith machine then erases everything except the marked symbol and
halts.
Although these mathematicians all started out in different directions, they all arrived at
the same place. With suitable conversions between the different kinds of data—for instance,
representing the natural numbers of μ-recursive functions as strings of digits for Turing
machines—it turns out that all those formalisms for computation are interconvertible.
Any Turing machine can be simulated by a Post system and vice versa, any Post system
can be simulated by a λ-term and vice versa, and so on. People say that any formalism for
computation that is interconvertible with Turing machines is Turing equivalent. (It would
also be correct to say that they are “Post-system equivalent,” “λ-calculus equivalent,” and
so on, but Turing machines are the most popular of the formalisms, perhaps because they
are the most like physical computers.) All Turing-equivalent formalisms have the same
computational power as Turing machines; they also have the same lurking possibility of
infinite computation.
In 1936, Alonzo Church and Alan Turing both suggested that their equivalent
formalisms had in fact captured the elusive idea of “effective computational procedure.”
In effect, they were suggesting that computability by a total TM be taken as the definition
of computability. (Note that the definition of Turing computable requires a total Turing
machine. A computational procedure that sometimes goes on forever is naturally not
considered to be effective!) This has come to be known as Church’s Thesis, or the Church-
Turing Thesis. It is called a thesis, not a theorem, because it is not the kind of thing that is
subject to proof or disproof. But it is a definition that has come to be generally accepted,
and today it marks the border of our happy realm. One thing that makes Church’s Thesis so
compelling is the effect we have mentioned so often before: it just keeps turning up. Many
researchers walking down different paths of mathematical inquiry found the same idea of
effective computation waiting for them at the end. That was true in the 1930s, and it is even
more true today—because today we have the additional evidence of all modern programming
languages and all the physical computer systems that run them. They too are interconvertible
with Turing machines, as we will see in the next section. Java, ML, Prolog, and all the
rest just add more support for Church’s Thesis; they are Turing-equivalent formalisms for
computation. “Computable by a Java program that always halts” is the same as “Turing
computable,” and according to Church’s Thesis, that’s what computable means, period.
There are several different words in common use for this same concept of computability.
A function that is Turing computable (like addition on binary numbers) is called a
computable function. A language that is recognized by some total Turing machine (like
{anbncn}) is called a recursive language. When there is some total Turing machine that decides
whether an input has a given property (like being a prime number or the square of an
integer), we say the property is decidable. But beware: some authors prefer a different usage,
and some just invent new terms!
An important related term is algorithm. Its meaning is quite close to that of effective
computational procedure, as defined by Church’s Thesis. The usage of algorithm tends to be
broader, however. There are fault-tolerant, distributed algorithms; probabilistic algorithms;
and interactive algorithms. These kinds of algorithms don’t compute functions, so they’re
not a good fit for Church’s Thesis. In casual use the word algorithm sometimes even refers to
things like recipes, which are essentially noncomputational.
/**
* A Turing machine for the language a^n b^n c^n.
*/
public class TManbncn {
/*
* A constant for the tape blank.
*/
private static final char B = 0;
/*
* The transition function. A char is used to
* identify each state, and one of the chars
* 'R' or 'L' is used to identify the head directions.
* Each transition delta(q,a)=(p,b,D), where q is the
* current state, a is the symbol being read, p is
* the next state, b is the symbol to write, and D is
* the direction to move the head, is represented by
* a char array of length five: {q,a,p,b,D}.
* Individual move arrays occur in the delta array in
* no particular order, so looking up a move requires
* a linear search.
*/
private static final char[][] delta = {
{1,'a',2,'X','R'}, // delta(q1,a) = (q2,X,R)
{1,B,7,B,'R'}, // etc.
{2,'a',2,'a','R'},
{2,'Y',2,'Y','R'},
{2,'b',3,'Y','R'},
{3,'b',3,'b','R'},
{3,'Z',3,'Z','R'},
{3,'c',4,'Z','L'},
{4,'a',4,'a','L'},
{4,'b',4,'b','L'},
{4,'Z',4,'Z','L'},
{4,'Y',4,'Y','L'},
{4,'X',5,'X','R'},
{5,'a',2,'X','R'},
{5,'Y',6,'Y','R'},
{6,'Y',6,'Y','R'},
{6,'Z',6,'Z','R'},
{6,B,7,B,'R'}
};
/*
* A String containing the char for each accepting
* state, in any order.
*/
private static final String accepting = "\7";
/*
* The initial state.
*/
private static final char initial = 1;
Formally, a TM is M = (Q, Σ, Γ, δ, B, q0, F). In this simulation, we encode only the last four
parts. The tape alphabet Γ is taken to be Java’s char type, and the input alphabet Σ is taken to
be char – {B}. The state set is not explicitly represented, but the states are represented by
char values. Our machine uses the states 1 through 7.
The class definition continues:
/*
* The TM's current tape and head position. We always
* maintain 0 <= head < tape.length(), adding blanks
* to the front or rear of the tape as necessary.
*/
private String tape;
private int head;
/*
* The current state.
*/
private char state;
/**
* Decide whether the TM accepts the given string. We
* run the TM until it either enters an accepting state
* or gets stuck. If the TM runs forever on the given
* string, this will of course never return.
* @param s the String to test
* @return true if it accepts, false if not
*/
public boolean accepts(String s) {
state = initial;
head = 0;
tape = s;
while (true) {
if (accepting.indexOf(state)!=-1) return true;
char[] move = lookupMove(state,tape.charAt(head));
if (move==null) return false;
executeMove(move[2],move[3],move[4]);
}
}
The accepts method simulates the TM on a given input string. Its while loop simulates
one move of the TM for each iteration. The loop has two exits. If the machine is in an
accepting state, the simulation halts and accepts; if there is no next move from the current
configuration, the simulation halts and rejects. (To test for an accepting state, it uses the
indexOf method of the String class: s.indexOf(c) will return -1 if and only if the
char c does not occur in the String s.)
/**
* Execute one move of the TM.
* @param newstate the next state
* @param symbol the char to write
* @param dir the direction ('L' or 'R') to move
*/
void executeMove(char newstate, char symbol, char dir) {
tape = tape.substring(0,head) + symbol + tape.substring(head+1); // write at the head
state = newstate; // take the state transition
if (dir=='L') {
if (head==0) tape = B + tape; else head -= 1;
}
else {
head += 1;
if (head==tape.length()) tape += B;
}
}
/**
* Find the move for the given state and symbol.
* @param state current state
* @param symbol symbol at current head position
* @return the five-element char[] from the delta table
*/
char[] lookupMove(char state, char symbol) {
for(int i = 0; i<delta.length; i++) {
char[] move = delta[i];
if (move[0]==state && move[1]==symbol) return move;
}
return null;
}
That concludes the implementation of the TManbncn simulator. To test the String s for
membership in {anbncn}, we just create a TManbncn object and use its accepts method,
like this:
boolean inLanguage = new TManbncn().accepts(s);
Of course, there are much more efficient ways for a Java program to test a string for
membership in {anbncn}. Previous chapters examined implementations of DFAs, NFAs, and
stack machines, and those techniques had important practical applications. Unlike them, our
TM implementation has no practical value. It is just an exercise that helps you understand
how TMs work, and it illustrates one direction of the Turing-equivalence construction.
In the previous chapter we showed how to construct TMs that interpret other automata
encoded as input strings. Ultimately, we saw that it is possible to construct universal TMs
that interpret other TMs encoded as input strings. Our Java implementation is not quite
like that, since the TM being interpreted is hardwired into the program, in the form of those
definitions of the variables B, delta, accepting, and initial. But it would be easy
enough to modify the code to read the values of those variables from a file. Then it would be
an implementation of a universal Turing machine.
The other direction—constructing a TM equivalent to a given Java program—is also
possible, but requires a great deal more work. A TM for Java could have two parts: a Java
compiler (converting the Java source to a machine language) and an interpreter for that
machine language. The Java compiler could be written in machine language, so we only really
need the second part, the machine-language interpreter. We have already seen that TMs can
perform the basic operations of such an interpreter: Boolean logic, binary arithmetic, indexed
addressing, and so on. Following this outline, it would be possible (though inhumanly
complicated) to build a TM that serves as a Java interpreter, just as it was possible (and rather
simple) to build a Java program that serves as a TM interpreter.
Of course, there’s nothing in all this that is unique to Java. A similar argument
about interconvertibility can be made for any high-level programming language. Our
demonstration of this has been informal; rigorously proving that TMs are equivalent in
power to a particular high-level language is a long and tedious chore. But such proofs have
been done and are universally accepted. There is nothing that high-level languages can do
that TMs cannot do. All modern programming languages turn out to be Turing equivalent,
adding still more support to the Church-Turing Thesis.
Exercises
EXERCISE 1
The linearAnd function can be implemented with fewer states and a smaller tape alphabet,
if implemented directly and not as a composition of and with stacker. Show how.
EXERCISE 2
Let increment be the binary increment function, increment(x) = add(stacker(x#1)).
Give a TM that implements increment. Note that the definition requires it to treat
ε as a representation for 0, so that increment (ε) = 1. Hint: Actually using stacker in
your implementation would be doing it the hard way. The increment function can be
implemented with just three states, using just {0, 1, B} as the tape alphabet.
EXERCISE 3
Let roundup be the binary, round-up-to-even function, roundup(x) = x if x represents
an even number, or increment (x) if x represents an odd number. Give a TM that
implements roundup. Treat ε as a representation for zero, so that roundup(ε) = ε.
EXERCISE 4
Let rounddown be the binary, round-down-to-even function, rounddown(x) = x if
x represents an even number, or decrement (x) if x represents an odd number. Give
a TM that implements rounddown. Treat ε as a representation for zero, so that
rounddown(ε) = ε.
EXERCISE 5
For any x ∈ {a, b}*, let mid(x) be the middle symbol of x or ε if x has even length. Give a
TM that implements mid.
EXERCISE 6
Give a TM that implements subtraction on unsigned, binary numbers, assuming stacked
inputs as for the add TM of Section 17.3. Assume that the second operand is less than or
equal to the first, so that the result can be represented as an unsigned, binary number.
EXERCISE 7
Give a TM that implements multiplication by two on unary numbers, using
Σ = {1}. Given an input string 1i, your TM should halt leaving 12i (and nothing else) on
the tape.
EXERCISE 8
Give a TM that implements division by two on unary numbers, using Σ = {1}. Given
an input string 1i, your TM should halt leaving 1i/2 (and nothing else) on the tape. If i is
odd, round up.
EXERCISE 9
For any x ∈ {0, 1}*, let val(x) be the natural number for which x is a binary representation;
for completeness, let val(ε) = 0. Give a TM that implements a binary-to-unary
conversion function, btu(x) = 1val(x).
EXERCISE 10
Let utb be a unary-to-binary conversion function (an inverse of btu from the previous
exercise). Give a TM that implements utb.
EXERCISE 11
Write a Java method anbncn, so that anbncn(s) returns true if String s is
in {anbncn} and returns false if not. Make it as efficient as possible (which of course
means not implementing it as a TM simulation). Then answer in as much detail as you
can, how does the speed of your anbncn compare with that of the accepts method
in TManbncn?
EXERCISE 12
Write a Java implementation of the TM implementing the linearAdd function, as
described in Sections 17.2 and 17.3. Include a main method that reads the input
string from the command line and writes the output string (that is, the final contents of
the tape, not including leading and trailing Bs) to standard output. For example, your
program should have this behavior:
> java TMlinearAdd 1010#1100
10110
EXERCISE 13
Write a Java implementation of a universal Turing machine. Start with the code for
TManbncn, but alter it to make it read the TM to be simulated from a file. You can
design your own encoding for storing the TM in a file, but it should be in a text format
that can be read and written using an ordinary text editor. Your class should have a main
method that reads a file name and a text string from the command line, runs the TM
encoded in the given file on the given string, and reports the result. For example, if the
file anbncn.txt contains the encoding of a TM for the language {anbncn}, then your
program would have this behavior:
> java UTM anbncn.txt abca
reject
> java UTM anbncn.txt aabbcc
accept
EXERCISE 14
Complete the following short story:
The judge sighed as he sat down before the computer screen. He was administering
his eighth Turing test of the day, and he was tired of it. Six had been machines, their
classification easily revealed by their responses to simple conversational gambits like “So
... you married?” or “How 'bout them Yankees?” There had been only
one human this morning, and the judge had been sure of the classification in less than
a minute. He had always had a knack for this kind of testing, and the more he did, the
quicker he got.
But this morning it was getting tedious. One more test, he thought, and I can break for
lunch. He rubbed his eyes, sat up straighter in his chair, and quickly typed:
Would you say that your mind is finite?
Slowly, the response came back ...
CHAPTER 18
Uncomputability

[Figure: the language classes so far: the regular languages are properly contained in the CFLs, which are properly contained in the recursive languages; L(a*b*) is regular, {anbn} is context free but not regular, and {anbncn} is recursive but not context free.]

In place of Turing machines, this chapter works mostly with decision methods: Java methods
that take a String parameter, always return, and return a boolean verdict on the input.
For example, here is a decision method for the language of strings that begin with a:
boolean ax(String p) {
return (p.length()>0 && p.charAt(0)=='a');
}
(For those not familiar with Java, the expression p.length() is the length of the String
object referred to by p, and the expression p.charAt(i) yields the ith character of the
String p, counting from zero.) Here are decision methods for the languages {} and Σ*:
boolean emptySet(String p) {
return false;
}
boolean sigmaStar(String p) {
return true;
}
As with Turing machines, we will refer to the language accepted by a decision method m
as L(m). So L(emptySet) = {} and L(sigmaStar) = Σ*. Decision methods give us an
alternative, equivalent definition of a recursive language: a language is recursive if and only if
it is L(m) for some decision method m.
For methods that might run forever, we will use the broader term recognition method.
These correspond to TMs that are not necessarily total. For example, here is a recognition
method for the language {anbncn}:
boolean anbncn1(String p) {
String as = "", bs = "", cs = "";
while (true) {
String s = as+bs+cs;
if (p.equals(s)) return true;
as += 'a'; bs += 'b'; cs += 'c';
}
}
This is a highly inefficient way of testing whether the string p is in {anbncn}, but we are not
concerned with efficiency. We are at times concerned with avoiding nontermination; the
recognition method anbncn1 loops forever if the string is not in the language. Thus it only
demonstrates that {anbncn} is RE. We know that {anbncn} is a recursive language, so there is
some decision method for it, such as this one:
boolean anbncn2(String p) {
String as = "", bs = "", cs = "";
while (true) {
String s = as+bs+cs;
if (s.length()>p.length()) return false;
else if (p.equals(s)) return true;
as += 'a'; bs += 'b'; cs += 'c';
}
}
/**
* run(rSource, in) runs a recognition method, given
* its source code. If rSource contains the source
* code for a recognition method r, and in is any
* input string, then run(rSource, in) returns whatever
* r(in) would return, and runs forever if r(in) would
* run forever.
*/
boolean run(String rSource, String in) {
...
}
For example, in this fragment:
String s =
"boolean ax(String p) { " +
" return (p.length()>0 && p.charAt(0)=='a'); " +
"} ";
run(s,"ba");
run would return false, since ax("ba") returns false. Finally, in this fragment:
String s =
"boolean anbncn1(String p) { " +
" String as = \"\", bs = \"\", cs = \"\"; " +
" while (true) { " +
" String s = as+bs+cs; " +
" if (p.equals(s)( return true; " +
" as += 'a'; bs += 'b'; cs += 'c'; " +
" } " +
"} ";
run(s,"abbc");
run would never return, because anbncn1("abbc") runs forever. Implementing run
would be a lot of work—see any textbook on compiler construction for the details—but it is
clearly possible.
The run method doesn’t quite fit our definition of a recognition method, because it
takes two input strings. We could make it fit by redefining so that it takes one delimited
input string: something like run(p+'#'+in) instead of run(p,in). That’s the kind
of trick we had to use in Chapter 17 to give a Turing machine more than one input; recall
linearAdd(x#y). But to keep the code more readable, we will just relax our definitions a bit,
allowing recognition and decision methods to take more than one parameter of the String
type. With this relaxation, the run method qualifies as a recognition (but not a decision)
method.
18.2 The Language Lu
Suppose you write a bit of code like this:
int j = 0;
for (int i = 0; i < 100; j++) {
j += f(i);
}
Oops! That postincrement expression should read i++, not j++. As it stands, the variable i
is never assigned any value other than 0, so the loop never stops. Not noticing this mistake,
you run the program. Perhaps that function f is something complicated, so you expect the
computation to take a few minutes anyway. Ten minutes go by, then twenty, then thirty.
Eventually you ask yourself the question programmers around the world have been asking
since the dawn of the computer age: is this stuck in an infinite loop, or is it just taking a long
time? There is, alas, no sure way for a person to answer such questions—and, as it turns out,
no sure way for a computer to find the answer for you.
The existence of our run method shows that a particular language is RE:
L(run) = {(p, in) | p is a recognition method and in ∈ L(p)}
The language is RE because we have a recognition method that accepts it—the run method.
A corresponding language for TMs is this:
{m#x | m encodes a TM and x is a string it accepts}
The language is RE because we have a TM that accepts it—any universal TM. In either case,
we’ll call the language Lu—remember u for universal.
Is Lu recursive, that is, is it possible to write a decision method with this specification?
/**
* shortcut(p,in) returns true if run(p,in) would
* return true, and returns false if run(p,in)
* would return false or run forever.
*/
boolean shortcut(String p, String in) {
...
}
The shortcut method would be just like the run method, but it would always produce
an answer—not run forever, even when run(p,in) would. For example, in this fragment:
String x =
"boolean anbncn1(String p) { " +
" String as = \"\", bs = \"\", cs = \"\"; " +
" while (true) { " +
" String s = as+bs+cs; " +
" if (p.equals(s)) return true; " +
" as += 'a'; bs += 'b'; cs += 'c'; " +
" } " +
"} ";
shortcut(x,"abbc");
But is it possible to write shortcut so that it can detect and avoid all possible infinite
loops? It would certainly be valuable; imagine a debugging tool that could reliably alert you
to infinite computations! Unfortunately, we can prove that no such shortcut method
exists.
Such proofs are tricky; it isn’t enough to say that we tried really hard but just couldn’t
think of a way to implement shortcut. We need a proof that no such implementation is
possible. The proof is by contradiction. Assume by way of contradiction that Lu is recursive,
so some implementation of shortcut exists. Then we could use it to make this decision
method:
/**
* nonSelfAccepting(p) returns false if run(p,p)
* would return true, and returns true if run(p,p)
* would return false or run forever.
*/
boolean nonSelfAccepting(String p) {
return !shortcut(p,p);
}
This decision method determines what the program would decide, given itself as input—
then returns the opposite. So the language defined by nonSelfAccepting is simply that
set of recognition methods that do not accept, given themselves as input. For example,
nonSelfAccepting(
"boolean sigmaStar(String p) {return true;}"
);
returns false. That’s because sigmaStar accepts all strings, including the one that starts
boolean sigmaStar .... In other words, sigmaStar accepts itself, and therefore
nonSelfAccepting returns false given sigmaStar as input. On the other hand,
nonSelfAccepting(
"boolean ax(String p) { " +
" return (p.length()>0 && p.charAt(0)=='a'); " +
"} "
);
tests whether the string boolean ax ... is accepted by the decision method ax.
Since the string begins with b, ax returns false. It does not accept itself, and therefore
nonSelfAccepting returns true given ax as input.
Now comes the tricky part. What happens if we call nonSelfAccepting, giving it
itself as input? We can easily arrange to do this:
nonSelfAccepting(
"boolean nonSelfAccepting(p) { " +
" return !shortcut(p,p); " +
"} "
)
What does nonSelfAccepting return, given itself as input? If it accepts itself, that
means shortcut determined it was not self-accepting—which is a contradiction. If it
rejects itself, that means shortcut determined it was self-accepting—also a contradiction.
Yet it must either accept or reject; it cannot run forever, because shortcut is a decision
method. We’re left with a contradiction in all cases. By contradiction, our original
assumption must be false: no program satisfying the specifications of shortcut exists. In
other words, Lu is not recursive.
This is our first example of a problem that lies outside the borders of computability. Lu is
not recursive; equivalently, we can say that the shortcut function is not computable and
the machine-M-accepts-string-x property is not decidable. This verifies our earlier claim that
total TMs are weaker than general TMs. No total TM can be a universal TM.
A second important language concerns halting rather than acceptance. Define Lh = {(p, in) | p is a recognition method and run(p, in) halts}. Lh is RE, because we can write a recognition method for it:
/**
* haltsRE(p,in) returns true if run(p,in) halts.
* It just runs forever if run(p,in) runs forever.
*/
boolean haltsRE(String p, String in) {
run(p,in);
return true;
}
/**
* halts(p,in) returns true if run(p,in) halts, and
* returns false if run(p,in) runs forever.
*/
boolean halts(String p, String in) {
...
}
The halts method would be just like the haltsRE method, but it would always produce
an answer—not run forever, even when run(p,in) would.
From our results about Lu, you might guess that Lh is not going to be recursive either.
Intuitively, the only way to tell what p will do when run on in is to simulate it—and if
that simulation runs forever, we won’t get an answer. But that kind of intuitive argument
is not strong enough to constitute a proof. How do we know there isn’t some other way of
determining whether p halts, a way that doesn’t involve actually running it?
The proof that this is impossible is by contradiction. Assume by way of contradiction
that Lh is recursive, so some implementation of halts exists. Then we could use it to make
this program:
/**
* narcissist(p) returns true if run(p,p) would
* run forever, and runs forever if run(p,p) would
* halt.
*/
boolean narcissist(String p) {
if (halts(p,p)) while(true) {}
else return true;
}
This method determines whether the program p will contemplate itself forever—that is, it
defines the language of recognition methods that run forever, given themselves as input.
Now comes that trick using self-reference. What happens if we call narcissist,
giving it itself as input? We can easily arrange to do this:
narcissist(
"boolean narcissist(p) { " +
" if (halts(p,p)) while(true) {} " +
" else return true; " +
"} "
);
Does it run forever? No, that can’t be right; if it did we would have a contradiction, since
it would be saying that it halts. So then does it halt (returning true)? No, that can’t be right
either; if it did it would be saying that it runs forever. Either way we have a contradiction.
By contradiction, we conclude that no program satisfying the specifications of halts exists.
This proves that Lh is not recursive.
At this point, we have identified our first two languages that are not recursive, and our
picture looks like this:
L(a*b*)
regular
languages {anbncn}
CFLs Lu
Lh
recursive
languages {anbn}
The non-recursive languages don’t stop there, however. It turns out that there are
uncountably many languages beyond the computability border. The question of whether
a program halts on a given input is a classic undecidable problem: a halting problem. It has
many variations: does a program halt on a given input? Does it halt on any input? Does it
halt on every input? That last variation would be the most useful. It would be nice to have
a program that could check over your code and warn you about all possible infinite loops.
Unfortunately, all these variations of the halting problem are undecidable. In fact, as we will
see below, most questions about the runtime behavior of TMs (or computer programs) turn
out to be undecidable.
g. }
h. boolean b = anbncn2(x2);
i. return !b;
j. }
Lines b through g are step 1; they translate the string x1 into a new string x2, in this case by
replacing the ds with cs. Line h is step 2; it checks whether x2 ∈ L2, in this case using our
previously implemented decision method anbncn2. (In general, we don’t need to show all
the code for step 2; if we know that L2 is recursive, we know some decision method for it
exists, and that's enough.) Line i is step 3; it converts the answer for x2 ∈ L2 into an answer
for x1 ∈ L1, in this case by logical negation. Steps 1 and 3 obviously cannot run forever, so
this reduction shows that if L2 is recursive (which we know it is) then L1 is recursive.
Here is another example of a proof using a reduction: we’ll prove that {anbn} is recursive
by reduction to the recognition problem for {anbncn}:
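One way the reduction might be written out in code (a sketch, reusing the decision method anbncn2 for {anbncn} mentioned above; the name anbnByReduction is just illustrative):
boolean anbnByReduction(String x1) {
  // Step 1: translate x1 into x2 by appending one c for each b in x1.
  String cs = "";
  for (int i = 0; i < x1.length(); i++)
    if (x1.charAt(i) == 'b') cs += 'c';
  String x2 = x1 + cs;
  // Step 2: decide whether x2 is in {anbncn}.
  boolean b = anbncn2(x2);
  // Step 3: here the answer for x2 is already the answer for x1.
  return b;
}
A string x1 is in {anbn} exactly when appending one c for each of its bs yields a string in {anbncn}, so this decision method accepts {anbn} and always halts.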
This reduces the problem of deciding membership in {anbn} to the problem of deciding
membership in {anbncn}. As we have already shown that {anbncn} is recursive, we conclude that
{anbn} is recursive. Of course, we already knew that; in fact, {anbn} is not only recursive but
also context free. This illustrates the one-way nature of the inference you can draw from a
reduction. By reducing from a new problem to a problem with a known solution, we show
that the new problem is no harder than the known one. That reduction does not rule out the
possibility that the new problem might be easier.
For any given program p and input string x, halts constructs a string x2
containing the source of a new decision method f. The f method ignores its
input string z. Instead of looking at z, it runs p on x, then returns true. Now
if p runs forever on x, f will not return, so the language of strings z it accepts
is {}. But if p halts on x, f will return true, so the language of strings z it
accepts is Σ*. Thus x2 ∈ Le if and only if p runs forever on x, so halts can
simply return the negation of empty(x2), where empty is the assumed decision
method for Le. We conclude that halts is a decision method for Lh. But this is
a contradiction, since Lh is not recursive. By contradiction, we conclude that
Le is not recursive.
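To make the three steps concrete, here is one way the construction might be written out (a sketch; Le is the set of recognition methods whose language is {}, empty names the assumed decision method for Le, and quote is a hypothetical helper that renders a string as a Java string literal so it can be embedded in the source of f):
boolean halts(String p, String x) {
  // build the source of f, embedding p and x as string literals
  String x2 =
    "boolean f(String z) { " +
    "  run(" + quote(p) + "," + quote(x) + "); " +
    "  return true; " +
    "} ";
  return !empty(x2);   // L(f) = {} exactly when p runs forever on x
}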
The reduction shows that Le is no easier to decide than Lh. But we know Lh is not
recursive; so we conclude that Le is not recursive. This is a very important application for
reductions. To show that a language is recursive is comparatively easy: you only need to give
a program that recognizes it and always halts. But to show that a language is not recursive is
generally more difficult. Often, the best choice is to give a reduction from a problem that is
already known to be nonrecursive.
Here’s another example along the same lines. Define
Lr = {p | p is a recognition method and L(p) is regular}
For example, the string boolean ax(String p) {return (p.length()>0 && p.charAt(0)=='a');} is in Lr, since the language it accepts—the strings that begin with a—is regular. The proof that Lr is not recursive uses the same kind of reduction:
For any given program p and input string x, halts constructs a string x2
containing the source of a new decision method f. The f method runs p on x,
then returns anbn(z). Now if p runs forever on x, f will not return, so the
language of strings z it accepts is {}—a regular language. But if p halts on x, f
will return anbn(z), so the language of strings z it accepts is {anbn}—not a
regular language. Thus x2 ∈ Lr if and only if p runs forever on x. We conclude
that halts is a decision method for Lh. But this is a contradiction, since Lh is
not recursive. By contradiction, we conclude that Lr is not recursive.
Theorem 18.4 (Rice's theorem): For all nontrivial properties of the RE languages, the language {p | p is a recognition method and L(p) has the property} is not recursive.
18.7 Enumerators
We have generally treated Turing machines as language recognizers: machines that take an
input string and try to determine its membership in a language. When Alan Turing published
the 1936 paper introducing his model for effective computation, his concept was slightly
different. He envisioned his tape-memory automata as language enumerators: machines that
take no input but simply generate, on an output tape, a sequence of strings.
This is just another way of defining a language formally: L(M ) for such a machine is the
language of strings that eventually appear on the output tape:
L(M) = {x | for some i, x is the ith string in M ’s output sequence}
Like all Turing machines, enumerators may run forever. In fact, they must, if they enumerate
infinite languages; and they may even if they enumerate finite languages. In Java-like
notation we can think of these original Turing machines as enumerator objects:
An enumerator class is a class with an instance method next that takes no input
and returns a string (or runs forever).
An enumerator object may preserve state across calls of next, so next may (and generally
does) return a different string every time it is called. For an enumerator class C, we’ll take
L(C) to be the set of strings returned by an infinite sequence of calls to the next method of
an object of the C class.
For example, this is an enumerator class AStar with L(AStar) = {a}*:
class AStar {
int n = 0;
String next() {
String s = "";
for (int i = 0; i < n; i++) s += 'a';
n++;
return s;
}
}
For this enumerator, the jth call to next (counting from 0) will return a string of j as. In
general, an enumerator class won’t be limited to enumerating the strings in order of their
length, but this one just happens to be specified that way.
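For instance, tracing the code, the first few calls on a fresh AStar object go like this:
AStar e = new AStar();
e.next();   // returns ""
e.next();   // returns "a"
e.next();   // returns "aa"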
For a more interesting example, suppose we have a method isPrime that tests whether
a number is prime. Then we can implement this enumerator class:
class TwinPrimes {
int i = 1;
String next() {
while (true) {
i++;
if (isPrime(i) && isPrime(i+2))
return i + "," + (i+2);
}
}
}
This is an enumerator for a language of twin primes. A twin prime is a pair of numbers
(i, i+2) that are both prime. So a series of calls to next would return
3,5
5,7
11,13
17,19
29,31
and so on. There is a famous conjecture that there are infinitely many twin primes, but no
one has been able to prove it. So the language enumerated by TwinPrimes may or may
not be infinite. If it is not, there is some largest pair of twin primes, and a call made to next
after that largest pair has been returned will run forever.
Here’s an enumeration problem whose solution we’ll need later on: make an enumerator
class for the set of all pairs of numbers from N, {(j, k) | j ≥ 0, k ≥ 0}. (As usual, we'll represent
the natural numbers j and k as decimal strings.) This is a bit trickier. This enumerates all the
pairs (0, k):
class BadNatPairs1 {
int k = 0;
String next() {
return "(0," + k++ + ")";
}
}
But that’s not good enough. This enumerates all the pairs (j, k) where j = k, and that’s not
good enough either:
class BadNatPairs2 {
int j = 0;
int k = 0;
String next() {
return "(" + j++ + "," + k++ + ")";
}
}
Instead we need to visit the ( j, k) pairs in an order that eventually reaches every one. This is
one way:
class NatPairs {
int n = 0;
int j = 0;
String next() {
String s = "(" + j + "," + (n-j) + ")";
if (j<n) j++;
else {j=0; n++;}
return s;
}
}
[Figure: the pairs (j, k) laid out in a grid, with k increasing to the right and j increasing upward; NatPairs sweeps the diagonals j + k = n for n = 0, 1, 2, and so on.]
The enumerator works through the grid one diagonal at a time, so every point in the space, every pair (j, k) in the language, is reached eventually.
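Tracing the code, the first few calls to next on a fresh NatPairs object return "(0,0)", "(0,1)", "(1,0)", "(0,2)", "(1,1)", "(2,0)", and so on: all the pairs on one diagonal j + k = n are produced before the enumerator moves on to the next.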
For a last example, imagine defining an enumerator class SigmaStar that enumerates
Σ*. If Σ = {a, b}, a SigmaStar object might return these strings for a sequence of calls to
next:
""
"a"
"b"
"aa"
"ab"
"ba"
"bb"
"aaa"
and so on. The exact order in which strings are returned doesn’t matter for our purposes.
SigmaStar is not too difficult to implement, and this is left as an exercise.
Enumerators generate the strings in a language. You can also think of them as numbering
the strings in the language—since the strings are generated in a fixed order. For example, we
could define a method sigmaStarIth(i) that takes a natural number i and returns the
String that is the ith one enumerated by SigmaStar:
String sigmaStarIth(int i) {
SigmaStar e = new SigmaStar();
String s = "";
for (int j = 0; j<=i; j++) s = e.next();
return s;
}
By the way, nothing in our definitions requires enumerators to generate a unique string on
every call, so sigmaStarIth does not necessarily give a one-to-one mapping from N to
Σ*. In mathematical terms it is merely a surjection (an onto function): for every string s ∈ Σ*
there is at least one i such that sigmaStarIth(i) = s.
We’ll make use of both sigmaStarIth and NatPairs in the next section.
Theorem 18.5 connects enumerators with the RE languages: a language is RE if and only if it is enumerated by some enumerator class. For one direction, suppose we have an enumerator class AEnumerate for some language A. Then we can construct a recognition method for A:
boolean aRecognize(String s) {
AEnumerate e = new AEnumerate();
while (true)
if (s.equals(e.next())) return true;
}
This method returns true if and only if s is eventually enumerated by AEnumerate. Thus
L(aRecognize) = L(AEnumerate).
The construction in the other direction is a bit trickier. Given a recognition method
aRecognize for some RE language A, we need to show that we can construct an
enumerator class AEnumerate for the same language. Here is an attempt that doesn’t quite
work. It uses the SigmaStar enumerator to generate each possible string in Σ*. To find the
next string in A, it tries successive strings in Σ* until it finds one that aRecognize accepts:
class BadAEnumerate {
SigmaStar e = new SigmaStar();
String next() {
while (true) {
String s = e.next();
if (aRecognize(s)) return s;
}
}
}
That would work just fine if aRecognize were total—a decision method and not just a
recognition method. But if aRecognize runs forever on one of the strings generated by
SigmaStar, next will get stuck. It will never get around to testing any of the subsequent
strings.
To solve this problem we will introduce a time-limited version of run. Define
runWithTimeLimit(p,in,j) so that it returns true if and only if p returns true
for in within j steps of the simulation. (Various definitions of a “step” are possible here:
a statement, a virtual machine instruction, and so on. It doesn’t matter which one you
use. For TMs the meaning of “step” would be more obvious: one move of the TM.) This
runWithTimeLimit can be total, because it can return false if p exceeds j steps without
reaching a decision about in.
Using runWithTimeLimit, we can make an enumerator for A. For each ( j, k) pair,
we check whether the jth string in Σ* is accepted by aRecognize within k steps. This is
where we can use the NatPairs enumerator and the sigmaStarIth method from the
previous section. NatPairs enumerates all the ( j, k) pairs we need, and sigmaStarIth
finds indexed strings in Σ*.
class AEnumerate {
NatPairs e = new NatPairs();
String next() {
while (true) {
int (j,k) = e.next();
String s = sigmaStarIth(j);
if (runWithTimeLimit(aRecognize,s,k)) return s;
}
}
}
(This uses some syntax that is not proper Java. The NatPairs method next returns a
string giving a pair of natural numbers such as "(13,7)", and it would take a few real Java
statements to break that string apart and convert the numeric strings back into integer values
j and k.)
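For the curious, the unpacking might look something like this (a sketch; it assumes the exact "(j,k)" format produced by NatPairs):
String pair = e.next();   // for example, "(13,7)"
int comma = pair.indexOf(',');
int j = Integer.parseInt(pair.substring(1, comma));
int k = Integer.parseInt(pair.substring(comma + 1, pair.length() - 1));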
Clearly AEnumerate only enumerates strings accepted by aRecognize. Further,
every string s accepted by aRecognize is eventually enumerated by AEnumerate,
because every such s = sigmaStarIth(j) for some j, and every such s is accepted
by aRecognize within some number of steps k, and AEnumerate tries all such
pairs (j,k). Thus, L(AEnumerate) = L(aRecognize). That completes the proof of
Theorem 18.5.
Theorem 18.6: If a language is RE but not recursive, its complement is not RE.
Proof: Let L be any language that is RE but not recursive. Assume by way of
contradiction that the complement of L is also RE. Then there exist recognition
methods lrec for L and lbar for its complement, so we can implement
boolean ldec(String s) {
for (int j = 1; ; j++) {
if (runWithTimeLimit(lrec,s,j)) return true;
if (runWithTimeLimit(lbar,s,j)) return false;
}
}
Since every string s is in either L or its complement, one of the two calls must eventually return true, so ldec is a decision method for L. That would make L recursive, contradicting our assumption. From this we conclude that the RE languages are not closed for complement. Notice that the recursive languages are closed for complement; given a decision method ldec for L, we can construct a decision method for the complement of L very simply:
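For instance (a sketch; the name ldecComplement is just illustrative):
boolean ldecComplement(String s) {
  return !ldec(s);   // accept s exactly when ldec rejects it
}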
But when a language is RE but not recursive, this kind of construction fails. If the
recognition method lrec(s) runs forever, !lrec(s) will too.
At the beginning of this chapter we saw that the languages Lh and Lu are RE but not
recursive. From Theorem 18.6 it follows that their complements are not RE: the complement of Lu is {(p, s) | p is not a recognition method that returns true for s}, and the complement of Lh is {(p, s) | p is not a recognition method that halts given s}.
[Figure: nested language classes: the regular languages (containing L(a*b*)) inside the CFLs (containing {anbn}), inside the recursive languages (containing {anbncn}), inside the RE languages (containing Lu); the complement of Lu lies outside the RE languages.]
Grammars of these four types define a hierarchy of four classes of languages called the Chomsky hierarchy: unrestricted (type 0) grammars generate the RE languages, context-sensitive (type 1) grammars generate the context-sensitive languages, context-free (type 2) grammars generate the CFLs, and regular (type 3) grammars generate the regular languages.
18.12 Oracles
Here are two languages that Rice’s theorem shows are nonrecursive:
Le = {p | p is a recognition method and L(p) = {}}
Lf = {p | p is a recognition method and L(p) = Σ*}
The first is the set of recognition methods that always return false or run forever; the second
is the set of recognition methods that always return true. Neither one is recursive, and neither
is RE. But there is a sense in which one is much harder to recognize than the other.
To the purely practical mind, that last statement is meaningless. Undecidable is
undecidable; everything beyond the computability border is out of reach, so why worry
about additional distinctions? But for those with active mathematical imaginations, the
unreachable territory beyond the computability border has a fascinating structure.
Earlier in this chapter, we proved that Lh is not recursive. Later, we showed that Le is
not recursive, using a reduction from Lh. In other words, we showed that if there were a way
to decide Le, we could use that to decide Lh. Our conclusion at the time was that, since we
already know there is no decision method for Lh, there couldn’t be one for Le either. But
what if there were some way of deciding membership in Le? We know that empty can’t be a
decision method in our ordinary sense, but what if it decides Le somehow anyway? Then our
construction would allow us to decide Lh as well.
Turing machines with such impossible powers are called oracle machines. They are just
like ordinary Turing machines, except that they have a one-step way of checking a string’s
membership in a particular language. The language of this oracle can be anything we choose.
Giving a TM an oracle for a nonrecursive language like Le increases the range of languages
it can decide. In particular, our previous construction shows that given an oracle for Le, the
language Lh is recursive. With a different construction you can also show the reverse: given an
oracle for Lh, the language Le is recursive.
But an oracle for Lh would not put an end to uncomputability. In particular, although
it can decide the halting problem for ordinary Turing machines, it cannot decide the halting
problem for Turing machines with Lh oracles! That would require a stronger oracle, whose
addition to TMs would make the halting problem even harder, requiring a still stronger
oracle, and so on. The result is an infinite hierarchy of oracles. It turns out that Le is recursive
given an oracle for Lh, but Lf is not—it requires a more powerful oracle. In that sense, Lf is
harder to decide than Le.
Exercises
EXERCISE 1
Implement Java decision methods for the following languages:
a. {abc}
b. L(a*b*)
c. {anb2n}
d. {xx | x ∈ {a, b}*}
e. {x ∈ {0, 1}* | |x| is a prime number}
f. {x ∈ {0, 1}* | x is a binary representation of an odd number}
EXERCISE 2
For each of these Java decision methods, give a total Turing machine for the same
language.
a. boolean ax(String p) {
return (p.length()>0 && p.charAt(0)=='a');
}
b. boolean emptySet(String p) {
return false;
}
c. boolean sigmaStar(String p) {
return true;
}
d. boolean d1(String p) {
String s = "";
int n = p.length();
for (int i = 0; i<n; i++) s = p.charAt(i) + s;
return p.equals(s);
}
EXERCISE 3
For each of these Java recognition methods, give a Java decision method that accepts the
same language.
a. boolean r1(String p) {
while (true) {}
return true;
}
b. boolean r2(String p) {
if (p.length()<10) return true;
else return r2(p);
}
c. boolean r3(String p) {
while (true) {
if (p.length()<10) p += 'a';
}
return (p.length()>=10);
}
d. boolean r4(String p) {
String s = "";
while (true) {
s += 'a';
if (s.equals(p)) return true;
}
return false;
}
EXERCISE 4
Given the shortcut and halts methods as described in Sections 18.2 and 18.3,
what would be the outcome of these code fragments? (Of course, we proved no such
methods exist; but assume here that we have some way of meeting their specifications.)
a. String s =
"boolean sigmaStar(String p) { " +
" return true; " +
"} ";
shortcut(s,"fourscore and seven years ago");
b. String s =
"boolean sigmaStar(String p) { " +
" return true; " +
"} ";
halts(s,"fourscore and seven years ago");
c. String s =
"boolean emptySet(String p) { " +
" return false; " +
"} ";
shortcut(s,s);
d. String s =
"boolean emptySet(String p) { " +
" return false; " +
"} ";
halts(s,s);
e. String s =
"boolean r1(String p) { " +
" while (true) {} " +
" return true; " +
"} ";
shortcut(s,"abba");
f. String s =
"boolean r1(String p) { " +
" while (true) {} " +
" return true; " +
"} ";
halts(s,"abba");
EXERCISE 5
Consider the nonSelfAccepting method used in Section 18.2 to prove by
contradiction that shortcut cannot be implemented. If we reimplement it like this:
boolean nonSelfAccepting(String p) {
return !run(p,p);
}
does it lead in the same way to a contradiction proving that run cannot be
implemented? Carefully explain why or why not.
EXERCISE 6
Prove that this language is not recursive:
{p | p is a recognition method that returns false for the empty string}
Hint: Follow the pattern in Section 18.5; you can’t use Rice’s theorem for this one.
EXERCISE 7
Prove that this language is not recursive:
{p | p is a method that takes no parameters and returns the string “Hello world”}
Hint: Follow the pattern in Section 18.5; you can’t use Rice’s theorem for this one.
EXERCISE 8
For each of the following languages, decide whether Rice’s theorem applies, and explain
why it does or does not.
a. {p | p is a recognition method and L(p) contains the empty string}
b. {p | p is a recognition method and L(p) *}
c. {m | m encodes a Turing machine M with abba ∈ L(M )}
d. {p | p is a recognition method that contains the statement x=1;}
e. {p | p is a recognition method that, when run on the empty string, executes the
statement x=1;}
f. {m | m encodes a Turing machine M for which L(M ) is finite}
EXERCISE 9
Classify the following languages as either
A. regular,
B. context free but not regular,
C. recursive but not context free, or
D. not recursive,
and explain for each why you chose that answer. You do not have to give detailed
proofs—just give one or two sentences that outline how you would prove your
classification.
a. {m#x | m encodes a TM and x is a string it accepts}
b. {anbncn}
c. {x ∈ {a, b}* | the number of as in x is the same as the number of bs in x }
d. {anb2n}
e. {p | p is a recognition method that is syntactically legal (that is, it can be parsed
using a standard grammar for Java)}
f. the set of strings that contain the Java statement while(true){}
g. {(p, x) | p is a recognition method that, when run on the input x, executes the
statement while(true){}}
h. {p | p is a recognition method and L(p) ≠ {}}
i. {m | m encodes a TM M with L(M) ≠ {}}
j. {xx | x ∈ {a, b, c }* }
k. {(m, x) | m encodes a TM M that, when run on the input x, visits a state q7}
l. L(a*b*) L((a + ba*b)*)
m. {xxR | x ∈ {0, 1}*}
EXERCISE 10
Taking the alphabet to be Σ = {a, b}, implement a Java enumerator class SigmaStar.
Generate the strings in order of their length: first ε, then the strings of length 1, then
the strings of length 2, and so on. Among strings of the same length any fixed order is
acceptable.
EXERCISE 11
Repeat Exercise 10, but add a String parameter to the SigmaStar constructor, and
take Σ to be the set of characters in that String. You may assume that the string is not
empty, so Σ ≠ {}.
EXERCISE 12
Implement a Java decision method for {xx | x ∈ {a, b}*}.
EXERCISE 13
Implement a Java enumerator class for {xx | x ∈ {a, b}*}.
EXERCISE 14
In the proof of Theorem 18.5, how often will each string in the language recognized by
aRecognize be enumerated by AEnumerate? Explain.
EXERCISE 15
Suppose you’re given a Java decision method javaRec that decides the language {p | p
is a Java recognition method}. (The javaRec method would verify that the string p
compiles without error as a Java method, that it has a single String parameter, and
that it returns a boolean value.) Using this, write an enumerator class for {p | p is a
Java recognition method}.
EXERCISE 16
Using javaRec defined as in Exercise 15, write an enumerator class for the
nonrecursive language {p | p is a Java recognition method and ε ∈ L(p)}. Hint: This
is like the construction of AEnumerate in Section 18.8. You’ll need NatPairs,
runWithTimeLimit, and your solution to Exercise 15.
EXERCISE 17
Using javaRec defined as in Exercise 15, write an enumerator class for the
nonrecursive language {p | p is a Java recognition method and L(p) ≠ {}}. Hint: You may
assume you have an enumerator class NatTriples that enumerates natural-number
triples (i, j, k).
EXERCISE 18
Let AEnumerate be an enumerator class with the special property that it enumerates
the strings of an infinite language A in order of their length, shortest first. Give a decision
method for A, thus proving that A is recursive.
EXERCISE 19
Give a grammar for the language {anbncndn}, and show a derivation of the string
aaabbbcccddd.
EXERCISE 20
Give a grammar for the language {xx | x ∈ {a, b}*}, and show a derivation of the string
aabaab.
EXERCISE 21
Using decision methods, show that an oracle for Lh makes Lu recursive.
EXERCISE 22
Using decision methods, show that an oracle for Lu makes Lh recursive.
EXERCISE 23
Using decision methods, show that an oracle for Le makes Lu recursive.
EXERCISE 24
Using decision methods, show that an oracle for Lu makes Le recursive. (Hint: See the
construction of AEnumerate in Section 18.8.)
CHAPTER 19
Cost Models
Typically, the function f is the complicated function whose rate of growth we’re trying to
characterize; the function g is some simple function, like g(n) = n2, whose rate of growth is
easy to grasp. The assertion f(n) is O(g(n)) is a way of saying that f grows no faster than g.
If you multiply g(n) by some constant c, and plot the result, it will be above the plot of f(n)
almost everywhere—like this:
[Figure: plots of f(n) and c·g(n); to the right of n0, c·g(n) lies above f(n), illustrating that f(n) is O(g(n)).]
Some exceptions to f(n) ≤ c·g(n) may occur among small values of n, those less than n0; the
assertion “f(n) is O(g(n))” says that some such constant n0 exists, but does not say what it is,
just as it does not say what the constant of proportionality c is. As the illustration suggests,
the big-O notation describes upper bounds. When you assert that f (n) is O(g(n)) you are
saying that f grows no faster than g; that doesn’t rule out the possibility that f actually grows
much slower than g. For example, if f(n) is O(n2), then by definition it is also true (though
less informative) to say that f(n) is O(n3), O(n4), or O(n100).
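For a quick check against the definition (a worked example, with constants chosen by hand): 3n2 + 3n ≤ 3n2 + 3n2 = 6n2 for every n ≥ 1, so taking c = 6 and n0 = 1 shows that 3n2 + 3n is O(n2).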
When you want to describe a lower bound on a function—to say that f grows at least as
fast as g—there is a related asymptotic notation using the symbol Ω.
Thus, if you divide g(n) by some constant c and plot the result, it will be below the plot of
f (n) almost everywhere—like this:
[Figure: plots of f(n) and g(n)/c; to the right of n0, g(n)/c lies below f(n), illustrating that f(n) is Ω(g(n)).]
Again, some exceptions may occur among small values of n, those less than n0.
As the illustration suggests, the Ω notation describes lower bounds. When you assert
that f (n) is Ω(g(n)) you are saying that f grows at least as fast as g; that doesn't rule out the
possibility that f actually grows much faster than g. For example, if f (n) is Ω(n3), then by
definition it is also true (though less informative) to say that f (n) is Ω(n2), Ω(n), or Ω(1).
If you inspect the definitions for O and Ω, you'll see that it is possible for a function
f(n) to be both O(g(n)) and Ω(g(n)) for the same function g. In fact, this is a very important
situation: when your upper and lower bounds are the same, you know that you have an
asymptotic characterization of the function f that is as informative as possible. There is a
special notation for this: f(n) is Θ(g(n)) means that f(n) is both O(g(n)) and Ω(g(n)).
The important thing to observe here is that the constants involved, c and n0, are not the
same for both the upper bound and the lower bound. For the upper bound O(g(n)) we have
one pair of constants c and n0, and for the lower bound Ω(g(n)) we have a different pair of
constants—call them c' and n0':
[Figure: plot of f(n) sandwiched between g(n)/c' and c·g(n), illustrating that f(n) is Θ(g(n)); the upper-bound constants c, n0 and the lower-bound constants c', n0' need not be the same.]
The Θ notation is used to describe tight bounds. When you assert that f(n) is Θ(g(n)) you
are saying that f grows at the same rate as g, neither faster nor slower.
Theorem 19.1.1 (O version): For any functions f and g over N and any positive constant a, f(n) is O(a·g(n)) if and only if f(n) is O(g(n)).
This property says, for example, that O(3n2) and O(n2) are equivalent. Thus you should
never rest with the conclusion that a function is O(3n2), because O(n2) is equivalent and
simpler.
The same theorem also applies for Ω and Θ, with similar proofs:
Theorem 19.1.2, O version: Let {g1, …, gm} be any finite set of functions over
N in which g1 is maximal, in the sense that every gi(n) is O(g1(n)). Then for any
function f, f(n) is O(g1(n) + g2(n) + … + gm(n)) if and only if f (n) is O(g1(n)).
Proof: If f(n) is O(g1(n) + g2(n) + … + gm(n)) then, by definition, there exist
natural numbers c0 and n0 so that for every n ≥ n0,
f(n) ≤ c0·(g1(n) + g2(n) + … + gm(n)).
We are given that every term gi(n) is O(g1(n)), so by definition there exist
natural numbers {c1, …, cm} and {n1, …, nm} such that, for every i and every
n ≥ max(n0, …, nm), gi(n) ≤ ci·g1(n). Therefore, for every n ≥ max(n0, …, nm),
f(n) ≤ c0·(c1·g1(n) + c2·g1(n) + … + cm·g1(n)) = c0·(c1 + … + cm)·g1(n).
By choosing c' = c0(c1 + … + cm) and n' = max(n0, …, nm), we can restate
this as follows: for every n ≥ n', f(n) ≤ c'·g1(n). Thus, by definition, f (n) is
O(g1(n)). (In the other direction, the proof is trivial, using the fact that
g1(n) ≤ g1(n) + g2(n) + … + gm(n).)
Because of this property, sums of terms inside the big O can almost always be simplified.
You just identify the fastest growing term and discard the others. For example, you would
never conclude that a function is O(n2 + n + log n + 13), because O(n2) is equivalent and
simpler. If you ever find yourself writing “+” inside the big O, think twice; it’s almost always
possible to simplify using Theorem 19.1.2.
Theorem 19.1.2 also applies to the Ω and Θ forms:
Theorem 19.1.2, Ω version: Let {g1, …, gm} be any finite set of functions over
N in which g1 is maximal, in the sense that every gi(n) is O(g1(n)). Then for any
function f, f(n) is Ω(g1(n) + g2(n) + … + gm(n)) if and only if f(n) is Ω(g1(n)).
Theorem 19.1.2, Θ version: Let {g1, …, gm} be any finite set of functions over
N in which g1 is maximal, in the sense that every gi(n) is O(g1(n)). Then for any
function f, f(n) is Θ(g1(n) + g2(n) + … + gm(n)) if and only if f(n) is Θ(g1(n)).
Interestingly, we still select the maximal term to keep (defined using O), even in the Ω
and Θ versions of this theorem. You'll see how this works out if you do Exercise 7.
Another property that is very useful for analyzing algorithms is this (Theorem 19.1.3): if f1(n) is O(g1(n)) and f2(n) is O(g2(n)), then f1(n)·f2(n) is O(g1(n)·g2(n)); the same holds with Ω and with Θ in place of O.
The proofs of these are left as an exercise (Exercise 4). They come in handy for analyzing
loops; if you know that the time taken by each iteration of a loop is O(g1(n)), and you know
that the number of iterations is O(g2(n)), then you can conclude that the total amount of
time taken by the loop is O(g1(n)·g2(n)).
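For instance, in a little method like this one (an illustration, not one of the examples analyzed later), each iteration of the outer loop takes O(n) time because of the inner loop, and there are O(n) iterations, so the total time is O(n·n) = O(n2):
int square(int n) {
  int count = 0;
  for (int i = 0; i < n; i++)        // O(n) iterations
    for (int j = 0; j < n; j++)      // each outer iteration does O(n) work here
      count++;
  return count;
}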
One final property: the O and Ω forms are symmetric:
Theorem 19.1.4: For any functions f and g over N, f (n) is O(g(n)) if and only if
g(n) is Ω(f(n)).
Proof: Follows directly from the definitions of O and Ω.
This symmetry is interesting but not often useful, because O and Ω are typically used to
characterize complicated functions by relating them to simple functions. Reversing the
direction is not usually a good idea; it might be mathematically correct to characterize a
simple function by relating it to a complicated function, but it's rarely helpful.
Theorem 19.2:
• For all positive real constants a and b, a is O((log n)b).
• For all positive real constants a and b, and for any logarithmic base, (log n)a is
O(nb).
• For all real constants a > 0 and c > 1, na is O(cn).
Note that the theorem covers cases with fractional exponents; for example, √n = n1/2, so the theorem applies to roots as well as to whole-number powers.
[Figure: state diagram of a TM; its transitions (a/a,R, b/b,R, c/c,R, and B/B,R) all move the head to the right as it scans across its input.]
It is easy to see how many moves this machine will make on any x ∈ L(M ), because it
always moves right and never gets stuck until after the first B at the end of the input. So
for x ∈ L(M), time(M, x) = |x| + 1 and space(M, x) = |x| + 2 (including as "visited" the final
position of the head, from which the final state has no transition). On the other hand, the
machine stops early when x ∉ L(M), because it gets stuck before reaching the accepting state.
For example, it stops after one move on any input string beginning with ba, no matter how
long: time(M, bax) = 1. We conclude that the worst-case time and space costs are linear: both
worst-case-time(M, n) and worst-case-space(M, n) are Θ(n).
Example: Quadratic Time, Linear Space
Consider this TM for {anbn}, which was introduced in Chapter 16.
[Figure: state diagram of the five-state TM for {anbn} from Chapter 16; state 2 has self-edges a/a,R and b/b,R, state 4 has self-edges a/a,L and b/b,L, and the cycle through states 1, 2, 3, and 4 overwrites one a (a/B,R) and one b (b/B,L) on each pass.]
The machine contains two states with self-edges: 2 and 4. For each state, the self-edges
of that state all move the head in the same direction and all transition on a symbol that is
not B. The machine never writes a nonB over a B, so the number of nonB symbols on the
tape is never greater than the original length m. Therefore, when any state is entered, there
will be O(m) moves in the worst case before that state is exited. Aside from the self-edges,
the machine has only one cycle: the states 1, 2, 3, and 4. On each iteration of this cycle,
it overwrites one a and one b. It cannot iterate more than m/2 times before all the input
symbols have been overwritten, so the number of iterations is O(m) in the worst case. In the
worst case, then, the machine makes O(m) iterations of the cycle, each of which takes O(m)
time. Thus, worst-case-time(M, m) is O(m2). The machine visits the whole part of the tape on
which the input is presented, but not beyond that, so worst-case-space(M, m) is Θ(m).
With a little more work it is possible to show that the time bound is tight: worst-case-
time(M, m) is Θ(m2). It is even possible to derive an exact formula for worst-case-time,
without using asymptotics (see Exercise 8). But finding exact formulas for worst-case time
and space isn’t easy, even for this relatively simple TM. For larger TMs it becomes dauntingly
difficult, and for TMs that are described but not actually constructed it is impossible.
Fortunately, it is often possible to give asymptotic bounds on the time and space resources
required, based only on an informal description of the machine. For this, asymptotics are
more than just a convenience—they are indispensable.
boolean ax(String p) {
return (p.length()!=0 && p.charAt(0)=='a');
}
Let n be the input size, in this case simply n = |p|. Using our basic cost-model assumptions,
we can break down the worst-case time cost like this:
1. This is not quite realistic for integer multiplication and division. Simple algorithms for multiplication and division
of n-bit operands require Θ(n2) time. Advanced algorithms can improve on this, approaching Θ(n) time. To keep
things simple here, we'll use Θ(n) for all operators, even including *, /, and %.
Time Cost
1. boolean ax(String p) { Θ(1)
2. return ( Θ(1)
3. p.length() Θ(1)
4. != Θ(1)
5. 0 Θ(1)
6. && Θ(1)
7. p.charAt(0) Θ(1)
8. == Θ(1)
9. 'a'); Θ(1)
10. } +_________
Θ(1)
It turns out that the time complexity does not depend on n; every operation here takes
constant time and is executed at most once. (In general, the != operator takes time
proportional to the size of the data it reads; but it is reasonable to assume that the !=
operator on line 4 does not need to read the whole number to tell whether it is zero.) To get
the total worst-case-time(ax, n), we add that column of asymptotics. Using Theorem 19.1.2,
this is simply a matter of selecting the fastest-growing asymptotic in the column. The result:
worst-case-time(ax, n) is Θ(1).
Turning to the space analysis, we can again consider this line by line.
Space Cost
1. boolean ax(String p) { Θ(1)
2. return ( Θ(1)
3. p.length() Θ(1)
4. != Θ(1)
5. 0 Θ(1)
6. && Θ(1)
7. p.charAt(0) Θ(1)
8. == Θ(1)
9. 'a'); Θ(1)
10. } +_________
Θ(1)
The space taken by the string p is, of course, Θ(n). But in our model of execution this
value is passed as a reference—it is not copied into new memory space. Similarly, the
space required for p.length() on line 3 is Θ(log n). But in our model of execution
the length method just returns, by reference, the stored length of the string, so no new
memory space is used. We conclude that worst-case-space(ax, n) is also Θ(1).
Like ax, most methods manipulate a large (but constant) number of values that require
constant space each. These add up to Θ(1) space, which is already given as the overhead of
calling the method in the first place. So we can simplify the analysis of space complexity by
ignoring all these constant-space needs. The analysis can focus instead on expressions that
construct new values that require nonconstant space (like String, array, or unbounded
int values) and on expressions that call methods that require nonconstant space. If there
are none, as in ax, then the space complexity is Θ(1). In a similar way, the analysis of time
complexity can be simplified by ignoring constant-time operations repeated for any constant
number of iterations.
Example: Loglinear Time, Logarithmic Space
Here’s another example, a decision method for {anbn}:
boolean anbn(String x) {
if (x.length()%2 != 0) return false;
for (int i = 0; i < x.length()/2; i++)
if (x.charAt(i) != 'a') return false;
for (int i = x.length()/2; i < x.length(); i++)
if (x.charAt(i) != 'b') return false;
return true;
}
The previous example developed a tight bound on the worst-case complexity directly. This
time, it will be more instructive to develop upper and lower bounds separately. First, a lower
bound. Clearly, the worst case for this code is when x ∈ {anbn}, so that all the loops run to
completion and none of the early returns is taken. Assuming that x ∈ {anbn}, we can state a
lower bound for the time spent on each line:
The “Total” column here shows the total time spent on each line. In this case, it is computed
for each line by simple multiplication: the minimum time for one iteration of the line
multiplied by the minimum number of iterations of that line. This technique is very useful,
but it does have its limitations. Consider, for example, lines 6 and 7. Because i can be as
low as 0 there, we can't do better than Ω(1) as a lower bound on the time spent for a single
iteration of those lines. (For the similar lines 11 and 12 we can claim the stronger Ω(log n)
lower bound, because there we know that i ≥ n/2.) Of course, i doesn't stay 0—and the
total time for lines 6 and 7 is more than just the minimum time for any iteration times
the number of iterations. With a more detailed analysis we could get a stronger total lower
bound for these lines. But in this case there is no need. Adding up the "Total" column is
simply a matter of choosing the fastest growing function in the column (as Theorem 19.1.2
showed), and that is Ω(n log n). This turns out to be the strongest lower bound possible.
We know it is the strongest lower bound possible, because it matches the O(n log n) upper bound; so worst-case-time(anbn, n) is Θ(n log n). Turning to the space analysis:
Space Cost
1. boolean anbn(String x) {
2. if (x.length()%2 != 0)
3. return false;
4. for (int i = 0;
5. i < x.length()/2; Θ(log n)
6. i++) Θ(log n)
7. if (x.charAt(i) != 'a')
8. return false;
9. for (int i = x.length()/2; Θ(log n)
10. i < x.length();
11. i++) Θ(log n)
12. if (x.charAt(i) != 'b')
13. return false;
14. return true;
15. } +_________
Θ(log n)
The values constructed at lines 5 and 9 are the same: n/2. The largest value constructed at
line 6 is n/2, and the largest value constructed at line 11 is n. These all take Θ(log n) space.
Thus the total space requirement, worst-case-space(anbn, n), is Θ(log n).
For another example, consider this method, which builds a string of as as long as its input:
String matchingAs(String s) {
String result = "";
for (int i = 0; i < s.length(); i++)
result += 'a';
return result;
}
As in the previous example, the simple line-by-line technique produces some rather weak
lower bounds, and in this case the weakness is significant. For the upper bound we get O(n2):
line 6 takes O(n) time in each iteration and is executed O(n) times. For the lower bound, let
g(n) be the total worst-case time spent on line 6 over all its iterations:
g(n) = t(1) + t(2) + … + t(n)
where t(k), the time spent on the kth iteration, is Ω(k). Using the definition of Ω, we know
that for all sufficiently large n, there is some c for which
g(n) = t(1) + t(2) + … + t(n) ≥ 1/c + 2/c + … + n/c = (n2 + n)/2c.
It follows that g(n), the total worst-case time at line 6, is Ω(n2). That gives a total lower
bound of Ω(n2). That matches the upper bound, so worst-case-time(matchingAs, n) is
Θ(n2).
The worst-case space analysis is more straightforward:
Space Cost
1. String matchingAs(String s) {
2. String result = "";
3. for (int i = 0;
4. i < s.length();
5. i++) Θ(log n)
6. result += 'a'; Θ(n)
7. return result;
8. } +_________
Θ(n)
The largest value constructed at line 6 has the same length as the input string s.
For a final example, consider this isPrime method, which tests whether its input number is prime:
boolean isPrime(int x) {
if (x < 2) return false;
for (int i = 2; i < x; i++)
if (x % i == 0) return false;
return true;
}
The analysis must be carried out in terms of the input size n, which is the amount of space
required to represent the input integer x. The relation between n and x is determined by the
representation. If we’re using decimal, n is roughly log10 x; if we’re using binary, it’s roughly
log2 x, and so on. Let's just say that for some constant base b, x is Θ(bn). Then we have this
for an upper bound on the worst-case time:
An analysis of the lower bounds, which we omit, gives a weaker result for line 5 but the same
total, Ω(nbn). Thus, worst-case-time(isPrime, n) is Θ(nbn): exponential time.
If we performed our analysis in terms of the value of x, instead of the length of the string
representing x, we would get a much less intimidating result: worst-case-time(isPrime, x) is
O(x log x). But in the study of formal language, we always treat complexity as a function of
the size of the input string, whether it’s a string of letters or, as in this case, a string of digits.
Turning to the space complexity, we have:
Space Cost
1. boolean isPrime(int x) {
2. if (x < 2) return false;
3. for (int i = 2;
4. i < x;
5. i++) Θ(n)
6. if (x % i Θ(n)
7. ==
8. 0)
9. return false;
10. return true;
11. } +_________
Θ(n)
The largest value constructed at line 5 is x, and the largest remainder at line 6 is x/2 - 1.
Thus the total space requirement, worst-case-space(isPrime, n), is Θ(n).
Exercises
EXERCISE 1
True or false:
a. 2n is O(n)
b. 2n is Ω(n)
c. 2n is Θ(n)
d. 3n2 + 3n is O(n3)
e. 3n2 + 3n is Ω(n3)
f. 3n2 + 3n is Θ(n3)
g. n is O(2n)
h. n2 is Θ(n2 + 2n + 1)
i. log n2 is Θ(log n3)
j. n2 is O(n (log n)2)
k. 2n is O(22n)
l. 3n is (5n)
EXERCISE 2
Simplify each of the following as much as possible. For example,
O(3a2 + 2a) simplifies to O(a2). (Hint: A few are already as simple as possible.)
a. O(5)
b. O(log3 a)
c. O(log a + 1)
d. O(log a2)
e. O((log a)2)
f. (a + log a)
g. (log a + 1)
h. (max(a2, a))
i. (2n + n100)
j. (a2 + 2a)
k. (3n2 + 2n + log n + 1)
l. (n2n)
m. (3n + 2n)
EXERCISE 3
Using the definitions only, without using the simplification theorems proved in
Section 19.2, prove the following:
a. n2 + 5 is O(n2)
b. n3 + 2n2 is Ω(n)
c. a2 + 2a + 12 is Θ(a2)
EXERCISE 4
Give a detailed proof of all three cases for Theorem 19.1.3.
EXERCISE 5
Give a detailed proof of Theorem 19.1.4.
EXERCISE 6
Give a detailed proof of Theorem 19.1.1, for the Ω and Θ versions.
EXERCISE 7
Give a detailed proof of Theorem 19.1.2, for the Ω and Θ versions.
EXERCISE 8
These questions concern the TM M for the language L(M ) = {anbn}, discussed in
Section 19.4.
a. Derive an exact formula for time(M, anbn), and show that it is Θ(n2).
b. Show that space(M, anbn) is Θ(n).
c. Show that worst-case-space(M, m) is O(m), and conclude (using the result from
Part b) that it is Θ(m).
d. Show that worst-case-time(M, m) is O(m2), and conclude (using the result from
Part a) that it is Θ(m2). (Note that this m is the total length of the string; your proof
must encompass strings that are not in L(M ).)
EXERCISE 9
These questions concern the TM from Section 16.3 (for the language {anbncn}).
a. Show that worst-case-space(M, m) is O(m).
b. Show that worst-case-space(M, m) is Ω(m), and conclude that it is Θ(m).
c. Show that worst-case-time(M, m) is O(m2).
d. Show that worst-case-time(M, m) is Ω(m2), and conclude that it is Θ(m2).
EXERCISE 10
These questions concern the TM from Section 16.7 (for the language
{xcx | x ∈ {a, b}*}).
a. Show that worst-case-space(M, n) is O(n).
b. Show that worst-case-space(M, n) is Ω(n), and conclude that it is Θ(n).
c. Show that worst-case-time(M, n) is O(n2).
d. Show that worst-case-time(M, n) is Ω(n2), and conclude that it is Θ(n2).
EXERCISE 11
These questions concern the TM from Section 17.2 (the stacker machine). Assume that
all input strings are in the required form x#y, where x ∈ {0, 1}* and y ∈ {0, 1}*.
a. Show that space(M, x#y) is Θ(|x#y|), and thus worst-case-space(M, n) = Θ(n).
b. Derive a bound on worst-case-time(M, n).
c. Derive a bound on best-case-time(M, n).
EXERCISE 12
These questions concern the TM from Section 17.4 (the ith machine). Assume the input
string is in the required form.
a. Show that worst-case-time(M, n) is Θ(n2n).
b. Show that worst-case-space(M, n) is Θ(2n).
EXERCISE 13
Derive tight asymptotic bounds on the worst-case time and space used by this
contains method. Show enough detail to make a convincing case that your bounds
are correct.
boolean contains(String s, char c) {
for (int i = 0; i < s.length(); i++)
if (s.charAt(i)==c) return true;
return false;
}
EXERCISE 14
Derive tight asymptotic bounds on the worst-case time and space used by this decision
method for the language {xx R}. Show enough detail to make a convincing case that your
bounds are correct.
boolean xxR(String s) {
if (s.length() % 2 != 0) return false;
for (int i = 0; i < s.length() / 2; i++)
if (s.charAt(i) != s.charAt(s.length()-i-1))
return false;
return true;
}
EXERCISE 15
a. Let g(n) be a function with
g(n) = t(1) + t(2) + … + t(n)
where t(k) is Θ(log k). Show that g(n) is Θ(n log n). (Hint: Use this asymptotic
version of Stirling's approximation: log n! is Θ(n log n).)
b. Using that result, derive a tight asymptotic bound on the worst-case runtime of
the following method. Show your derivation for the upper bound first, then your
derivation for the matching lower bound.
boolean containsC(String s) {
int i = s.length();
while (i > 0) {
i--;
if (s.charAt(i)=='c') return true;
}
return false;
}
EXERCISE 16
Derive an asymptotic tight bound on the worst-case space used by this fact method,
and then derive an asymptotic upper bound on the worst-case time. (Hint: Use the result
of Exercise 15, Part a.) Make sure your results are expressed in terms of the length of the
input, not in terms of its integer value. Show enough detail to make a convincing case
that your bounds are correct.
int fact(int p) {
int sofar = 1;
while (p > 0) sofar *= p--;
return sofar;
}
}
CHAPTER 20
Deterministic Complexity Classes
Have you ever thought about what you would do if you had all the
money in the world? It’s fun to imagine. It can also be instructive,
when you realize that some of the things you want, money can’t buy.
When we considered questions of computability in previous chapters,
we were asking the same kind of question: “What languages can be
decided, with unlimited time and space?” Of course, we never have
unlimited time and space, but it’s fun to think about. It can also be
instructive, when you realize that some of the languages you want
to decide can’t be decided, no matter how much time and space you
have to spend.
Now we turn to a different, more pragmatic kind of question: what
languages can be decided on a limited budget? We’ll use the worst-
case-time and worst-case-space measures developed in the previous
chapter to classify languages according to the resource budgets
required to decide them.
TIME (g(n)) = {L | L is decided by some decision method p, where worst-case-time(p, n) is O(g(n))}.
The resulting class of languages is called a complexity class: languages classified according to
the resources required to decide them.
The previous chapter analyzed the worst-case time requirements of individual methods.
Now, we are considering the time requirements of whole languages—of all possible
decision methods for a given language. Proving that a language L is in TIME (g(n)) can
be straightforward. We need only give an example of a decision method p, show that it
decides L, and show (using the techniques of the previous chapter) that its worst-case time is
O(g(n)). For example, in the previous chapter we analyzed a decision method anbn for the
language {anbn}. We showed that worst-case-time(anbn, n) is O(n log n). That proves that the
language can be decided in O(n log n) time:
{anbn} ∈ TIME (n log n).
Since the language can be decided within that time budget, it can certainly also be
decided within any more generous budget, so we can also conclude that {anbn} ∈ TIME (n2),
{anbn} ∈ TIME (n3), and so on. In general, if f (n) is O(g(n)), then TIME (f(n)) ⊆ TIME (g(n)).
On the other hand, to prove that a language L is not in TIME (g(n)) is usually much more
difficult. It isn’t enough to say that, although you tried really hard, you just couldn’t think
of an O(g(n)) decision method. You have to prove that no such method can exist. Here’s an
example:
[Figure: TIME (log n) shown as a proper subset of TIME (n log n), with {anbn} inside TIME (n log n) but outside TIME (log n).]
It is not surprising to observe that the more time you have for the decision, the more
languages you can decide.
A time-hierarchy theorem states this observation more rigorously. We know that if
f (n) is O(g(n)), then TIME(f(n)) ⊆ TIME (g(n)). A time-hierarchy theorem establishes some
additional conditions on f and g that allow you to conclude that the inclusion is strict—that
TIME (f (n)) ⊊ TIME (g(n)). It's a very useful kind of theorem: it often allows you to conclude
that two complexity classes are distinct, without the need to find and prove special theorems
like Theorem 20.1, above.
In fact, our time-hierarchy theorem automatically establishes an infinite hierarchy
of complexity classes. For example, it can be used to show that if a < b then
TIME (na) ⊊ TIME (nb), producing an infinite hierarchy of polynomial complexity classes:
TIME (1) ⊊ TIME (n) ⊊ TIME (n2) ⊊ TIME (n3) ⊊ …
This polynomial time hierarchy is just one simple consequence of our time-hierarchy
theorem. Roughly speaking, our time-hierarchy theorem says that anything more than an
extra log factor is enough to decide additional languages. So there is actually a far richer
structure of time-complexity classes than those pictured above. For example, there is an
infinite hierarchy of distinct classes like TIME (n log2 n) and TIME (n log4 n) that all lie inside the
diagram above, between TIME (n) and TIME (n2).
Unfortunately, this rich and powerful theorem is rather difficult to state and prove in
detail. Interested readers should consult Appendix B.
P = ⋃k TIME (nk)
DFA-acceptance
Instance: a DFA M and a string x.
Question: does M accept x?
Informal descriptions are more readable because they suppress details. They are also more
susceptible to error, for the same reason. When you read or write these descriptions, you
should be vigilant, carefully thinking about how you might represent each problem instance
as a string. For example, consider this decision problem:
Primality
Instance: a number x N.
Question: is x prime?
How is the natural number x represented? The informal description doesn’t say, but it is
conventional to assume that natural numbers are represented compactly, using a string of
numerals with a positional notation like binary or decimal. That’s like our cost model for
Java. In fact, we’ve already seen a Java decision method for this problem:
boolean isPrime(int x) {
if (x < 2) return false;
for (int i = 2; i < x; i++)
if (x % i == 0) return false;
return true;
}
As we showed in Section 19.5, this takes exponential time—Θ(nbn), if integers are
represented in base b and n is the length of the string of base b numerals used to represent x.
That is not polynomial time; when b > 1, bn is not O(nk) for any fixed k. So this decision
method does not show that Primality is in P. (It turns out that it is in P, but proving it
requires a much more subtle algorithm. See the Further Reading at the end of this chapter.)
P includes some of the major classes we've already studied. In particular, every context-free language is in P, though not every language in P is context free.
Proof: Polynomial-time parsing algorithms for CFGs, like the CKY algorithm
(see Section 15.3), are well known. But not all languages in P are context free;
for example, it is easy to decide {anbncn} in polynomial time.
Because P includes all context-free languages, we know it includes all regular languages.
Because it includes all regular languages, we know it includes all finite languages—a fact we’ll
make use of in the next section.
People sometimes characterize the complexity class P as the set of tractable decision
problems. That’s probably an overstatement, because almost all the algorithms used in real
computer systems have not just polynomial complexity, but very low polynomial complexity.
For most applications that work with large data-sets, useful algorithms can’t take much more
than linear time, and it’s hard to think of practical examples in which problems requiring
even, say, O(n4) time are considered tractable. (For that matter, it’s hard to think of any
commercial setting in which the constant factor hidden by the big-O notation is actually
immaterial. If you slow down a program by a factor of ten, customers don’t say, “Who cares?
It’s only a constant factor!”) P certainly includes all the tractable decision problems, but also
includes many whose decision computations are, by most everyday measures, prohibitively
expensive.
But if in practice the word tractable isn’t a perfect description for the inside of P, the word
intractable certainly seems like a good description for the outside of P. That’s another reason
for the importance of this class. Languages that are recursive, yet beyond the polynomial-
time boundary, have an interesting intermediate status. The time required to decide such
a language grows more rapidly than n4, more rapidly than n4000, more rapidly than nk for
any fixed value of k. These are languages that can be decided in theory but not in practice;
languages whose computation we can specify but cannot afford to carry out, except on the
very shortest input strings.
EXPTIME = ⋃k TIME (2^(nk))
This is what it means to say that a language requires exponential time to decide: the
decision algorithm takes time proportional to some constant base raised to a power that is a
polynomial function of the length of the input string.
The time-hierarchy theorem demonstrates that two classes are distinct by constructing
a language that is provably in one but not in the other. In this way, it shows that there is
at least one language in EXPTIME that is not in P. But the language it constructs is highly
artificial—it’s provably hard to decide, but it isn’t clear why you would ever want to decide
it in the first place! Luckily, we don’t have to rely on the time-hierarchy theorem to illustrate
EXPTIME. In fact, we’ve already seen one example of a language that is in EXPTIME but not in P.
It’s the language decided by our runWithTimeLimit method. Recall that we defined
runWithTimeLimit(p,in,j) so that it returns true if and only if the recognition
method p returns true for the input string in within j steps. As an informal decision
problem, this is
Run-with-time-limit
Instance: a recognition method p, a string in, and a natural number j.
Question: does p accept in within j moves?
This decision problem can be solved in exponential time, as shown in the following theorem:
We can also prove that the decision problem cannot be solved using polynomial time:
1 boolean LdecPoly(String x) {
2 int n = x.length();
3 if (n < n0) return LdecShort(x);
4 int b = c * power(2,power(n,k));
5 return rWTL(Ldec,x,b);
6 }
This proof uses the technique of reduction, which we first encountered in Chapter 18.
In that chapter it didn’t matter how efficient the reduction was, as long as it was computable.
The proof above is different; it depends critically on the fact that the reduction itself requires
only polynomial time. Lines 2 through 4 accomplish the reduction; you can think of them as
defining a function f that converts any instance of L into an equivalent instance of Run-with-
time-limit. Such reductions are very useful for proving things about complexity classes; they
even have their own special notation: we write A ≤P B when there is a polynomial-time
computable function f such that x ∈ A if and only if f(x) ∈ B.
Our proof of Theorem 20.3.2 shows that for all languages L ∈ EXPTIME, L ≤P Run-with-
time-limit. In that sense, Run-with-time-limit is at least as hard as any EXPTIME problem. There
is a term for languages that have this property:
A language A is EXPTIME-hard if L ≤P A for every L ∈ EXPTIME; it is EXPTIME-complete if, in
addition, it is itself in EXPTIME. In this sense, Run-with-time-limit is EXPTIME-complete.
Space-complexity classes are defined by analogy with the time-complexity classes:
SPACE(g(n)) = {L | L is decided by some decision method p,
where worst-case-space(p, n) is O(g(n))}.
Basic results for space complexity parallel those for time complexity, so we’ll just summarize
them here.
• There is a separate hierarchy theorem for space. From basic definitions we know that if
f(n) is O(g(n)), then SPACE(f(n)) ⊆ SPACE(g(n)). The space-hierarchy theorem establishes
some additional conditions on f and g that allow you to conclude that the inclusion is
strict—that SPACE(f(n)) ⊊ SPACE(g(n)).
• The space-hierarchy theorem is strong enough to show, for example, that if a < b then
SPACE(n^a) ⊊ SPACE(n^b), producing an infinite hierarchy of polynomial-complexity classes
for space.
• Corresponding to the time-complexity class P (polynomial time) there is a space-
complexity class PSPACE (polynomial space) that is robust with respect to changes in the
underlying computational model:
PSPACE = ⋃_k SPACE(n^k)
• Corresponding to the time-complexity class EXPTIME (exponential time) there is a space-
complexity class EXPSPACE (exponential space) that provably contains problems not
solvable in PSPACE:
EXPSPACE = ⋃_k SPACE(2^(n^k))
Theorem 20.4.1: For any g(n), TIME(g(n)) ⊆ SPACE(g(n)).
Proof: According to our cost model, simply creating values taking Ω(g(n)) space
must take Ω(g(n)) time.
This holds, not just for our Java decision methods, but for any reasonable cost model and
for a variety of models of computation. For example, simply visiting g(n) cells of a Turing
machine’s tape requires making at least g(n) moves.
As a direct consequence of Theorem 20.4.1, we have P ⊆ PSPACE and EXPTIME ⊆ EXPSPACE.
Theorem 20.4.3: For any g(n), there exists some constant k such that
SPACE(g(n)) ⊆ TIME(2^(k⋅g(n))).
Proof sketch: This proof is easier to do using the TM model. The basic idea is
to show that a TM that operates within an m-cell region of its tape cannot
have more than 2^(km) distinct IDs, for some constant k. So a modified version
of the machine, limited to 2^(km) steps, must produce the same decision as the
original.
Again, this is true for any reasonable cost model and for a variety of models of
computation.
As a direct consequence of Theorem 20.4.3, we have PSPACE ⊆ EXPTIME.
Some of the relations among time- and space-complexity classes are fascinating puzzles—
puzzles that have withstood many years of research effort by some of the most brilliant
computer scientists on the planet. For example, most researchers believe that
P ⊊ PSPACE and EXPTIME ⊊ EXPSPACE, but no one has been able to prove either.
PSPACE-complete Problems
These are the hardest problems in PSPACE. They seem to be intractable, requiring exponential
time, but no one has been able to prove this. A classic, PSPACE-complete problem is QBF: the
Quantified Boolean Formulas problem.
For example, the QBF instance ∀x ∃y (x ∨ y) is true: for all Boolean values that can be
assigned to x, there exists an assignment for y that makes x ∨ y true. That's a positive instance,
an element of the QBF language. A negative instance is ∀x ∃y (x ∧ y); it's a well-formed
instance, but it is not true that for all x there exists a y that makes x ∧ y true.
Imagine yourself facing QBF as a programming assignment: write a program that lets the
user type in a QBF formula, evaluates it, and tells whether it is true or false. There’s a simple
kind of PSPACE solution to this problem: you just evaluate the input formula using all possible
combinations of truth assignments for the variables and collect the results. (To work out the
details, try Exercise 8.) But because there are exponentially many different combinations of
truth assignments for the variables, this approach takes exponential time. It’s a simple, brute-
force solution, yet no solution with better asymptotic worst-case time complexity is known.
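To make the brute-force idea concrete, here is a minimal sketch (not the book's code; the representation is hypothetical). It assumes the formula has been parsed so that quants[i] is 'A' for a universal quantifier or 'E' for an existential one, vars[i] is the variable it binds, and matrix.eval(assignment) evaluates the quantifier-free body under a truth assignment:
import java.util.Map;

class QbfSketch {
  interface Matrix { boolean eval(Map<String, Boolean> assignment); }

  // Evaluate Q1 x1 . Q2 x2 . ... . (matrix) by trying both truth values for each variable.
  static boolean eval(char[] quants, String[] vars, Matrix matrix,
                      Map<String, Boolean> assignment, int i) {
    if (i == quants.length) return matrix.eval(assignment);
    for (boolean value : new boolean[] { false, true }) {
      assignment.put(vars[i], value);
      boolean sub = eval(quants, vars, matrix, assignment, i + 1);
      if (quants[i] == 'E' && sub) return true;    // exists: one success is enough
      if (quants[i] == 'A' && !sub) return false;  // for all: one failure is fatal
    }
    return quants[i] == 'A';  // 'A': both values worked; 'E': neither did
  }
}
The recursion is only as deep as the number of variables, so the space used is polynomial, but in the worst case the body is evaluated for exponentially many combinations of truth values.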
A more familiar PSPACE-complete problem (at least to the readers of this book) is the
problem of deciding whether two regular expressions are equivalent.
Regular-expression-equivalence
Instance: two regular expressions r1 and r2, using only the three basic
operations: concatenation, +, and Kleene star.
Question: is L(r1) = L(r2)?
Both this problem and the corresponding problem for NFAs are PSPACE-complete:
NFA-equivalence
Instance: two NFAs M1 and M2.
Question: is L(M1) = L(M2)?
The same result does not hold for DFAs. Two DFAs can be converted to minimum-state
DFAs and compared for equivalence in polynomial time.
EXPTIME-complete Problems
We have already shown that Run-with-time-limit is EXPTIME-complete. That might seem like
a rather artificial problem, but many more natural EXPTIME-complete problems have been
found. An example is the game of checkers:
Generalized-checkers
Instance: an n-by-n checkerboard with red and black pieces.
Question: can red guarantee a win from this position?
Notice that the game is generalized to boards of unbounded size. Of course, even for n = 8
(as in American checkers) the number of problem instances—the number of possible
configurations of pieces on the 8-by-8 checkerboard—is large enough to make the game
challenging for humans. The language of winning positions is too large for human players
to memorize. But it is still a finite number, and that makes the fixed-size game uninteresting
from the perspective of formal language; it’s just a finite (though very large) language,
decidable by a simple (though very large) DFA. Generalized to n-by-n checkerboards, the
language becomes infinite, and deciding it turns out to be intractable. The exact rules of
checkers (or draughts) vary around the world, but under one reasonable definition of the
rules, Generalized-checkers has been shown to be EXPTIME-complete. Generalized versions of
chess and go have also been shown to be EXPTIME-complete.
EXPSPACE-complete Problems
In Section 9.4, several extensions to regular expressions were discussed. One was squaring,
defining L((r)^2) = L(rr). The question of whether two regular expressions are equivalent,
when squaring is allowed, becomes more difficult to answer. You can of course convert all
the squaring into explicit duplication; then you’ve got two basic regular expressions that can
be compared as already discussed. But that first step, eliminating the squaring, can blow
up the expression exponentially—not a promising first step, especially considering that the
rest of the algorithm needs space that is polynomial in that new, blown-up length! In fact,
the regular-expression-equivalence problem is known to be EXPSPACE-complete when the
expressions can use these four operations: concatenation, +, Kleene star, and squaring.
Instead of squaring, regular expressions can be extended with an intersection operator
&, so that L(r1 & r2) = L(r1) ∩ L(r2). This again makes the equivalence problem EXPSPACE-
complete. Note that neither of these extensions changes the set of languages you can define
using regular expressions; it remains the set of regular languages. The only change is that,
using the extensions, you can define some regular languages much more compactly.
Presburger-arithmetic
Instance: a formula that uses only the constants 0 and 1, variables over
the natural numbers, the + operator, the = comparison, the logical
connectives ∧, ∨, and ¬, and the quantifiers ∀ and ∃.
Question: is the formula true?
Mojzesz Presburger explored this theory in the late 1920s. He gave a set of axioms for this
arithmetic and proved that they are consistent (not leading to any contradictions) and
complete (capable of proving all true theorems). He proved that this arithmetic is decidable:
an algorithm exists to decide it, so the language is recursive. But it is known that no
algorithm that decides Presburger-arithmetic can take less than doubly exponential time. So
this is an example of a language that is in 2EXPTIME, but not in EXPTIME.
A Problem Requiring Nonelementary Time
Beyond 2EXPTIME lies an infinite hierarchy of higher exponentials: 3EXPTIME, 4EXPTIME, and
so on, each one occupied by problems that cannot be solved in less time. The union of all
the classes in this exponential-time hierarchy is the class called ELEMENTARY TIME : the class of
languages decidable in k-fold exponential time, for any k. That’s an impressively generous
time budget—but it still isn’t enough to handle all the recursive languages. There are
languages that are recursive, but provably not in ELEMENTARY TIME.
A simple example is the regular-expression-equivalence problem, when regular
expressions are extended with a complement operator C, so that L((r)^C) = {x ∈ Σ* | x ∉ L(r)}.
We know that the regular languages are closed for complement, so this extension does not
add the ability to define new languages. But, once again, we get the ability to define some
languages much more compactly; and, once again, that makes the problem of deciding the
equivalence of two regular expressions harder, in this case, much harder. When complement is
allowed, the problem of deciding the equivalence of two regular expressions provably requires
nonelementary time.
Exercises
EXERCISE 1
a. For fixed k, is O(n^k) the same as O(n^k log n)? Prove your answer.
EXERCISE 2
EXERCISE 3
Suppose that an incredibly fast computer began executing a decision method at the
time of the big bang. Assume that the big bang was 15 billion years ago, and that the
computer has executed one step per yoctosecond ever since—that is, 1024 steps per
second. What is the largest input size n for which it could have reached a decision
by now, assuming that the decision required
a. n steps
b. n^2 steps
c. n^13 steps
d. 2^n steps
e. 10^n steps
f. 2^(2n) steps
g. 2^(2^n) steps
EXERCISE 4
(This exercise refers to concepts discussed with the time-hierarchy theorem in
Appendix B.) Give an implementation and analysis to prove that each of the following
functions over N is time constructible.
a. n^3
b. 2^n
c. n log₂ n (implement this as n⌈log₂ n⌉, and assume n > 0)
d. 2^(2n) / log₂ n (implement this as ⌊2^(2n) / ⌈log₂ n⌉⌋, and assume n > 0)
EXERCISE 5
(Some parts require the time-hierarchy theorem from Appendix B. You may assume that
all the functions you need are time constructible.)
a. Is TIME(n^3) the same as TIME(n^3 (log n)^3)? Prove your answer.
b. Is TIME(n^3) the same as TIME(3n^3)? Prove your answer.
c. Is TIME(2^n) the same as TIME(2^(n+2))? Prove your answer.
d. Is TIME(2^n) the same as TIME(n·2^n)? Prove your answer.
e. Is TIME(n log n) the same as TIME(n√n)? Prove your answer.
EXERCISE 6
In the proof of Theorem 20.3.1, the decision method LdecPoly has this statement:
int b = c * power(2,power(n,k));
Give an implementation of the power method, and using it show that the runtime for
this statement is polynomial in n.
EXERCISE 7
For each of the following QBF instances, decide whether it is true or false, and prove it.
a. ∀x(x)
b. ∃x(x)
c. ∀x(x x)
d. ∃x(x x)
e. ∀x∃y((x y) (x y))
f. ∃x∀y((x y) (x y))
g. ∃x∀y∃z(x y z)
h. ∃x∀y∃z((x y) (y z))
EXERCISE 8
In this exercise, you will write a Java application DecideQBF that takes a QBF instance
from the command line and prints true or false depending on whether the input
string is in the language of true, syntactically legal, quantified Boolean formulas.
Start with the syntax, the QBF classes, and the parser from Chapter 15, Exercise 8.
Then add an eval method to the QBF interface and to each of the QBF classes, so that
each formula knows how to decide whether it is true or not. (You will need to pass the
context of current bindings in some form as a parameter to the eval method.) The
eval method should throw an Error if it encounters scoping problems—that is, if
there is a use of a free variable, as in Ax(y), or a redefinition of a bound variable, as
in Ax(Ex(x)). Of course, the parser will also throw an Error if the string cannot
be parsed. All these Errors should be caught, and your DecideQBF should decide
false for such inputs. It should decide true only for inputs that are legal, true, quantified
Boolean formulas. For example, you should have
EXERCISE 9
The following decision problems involving basic regular expressions can all be solved
by algorithms that begin by converting the regular expression into an NFA, as in the
construction of Lemma 7.1.
Regular-expression-emptiness
Instance: a regular expression r using only the three basic
operations: concatenation, +, and Kleene star.
Question: is L(r) = {}?
Regular-expression-universality
Instance: a regular expression r using only the three basic
operations: concatenation, +, and Kleene star.
Question: is L(r) = Σ*?
Regular-expression-finiteness
Instance: a regular expression r using only the three basic
operations: concatenation, +, and Kleene star.
Question: is L(r) finite?
EXERCISE 10
Consider the following decision problems:
Extended-regular-expression-equivalence
Instance: extended regular expressions r1 and r2 that may use
concatenation, +, Kleene star, and complement.
Question: is L(r1) = L(r2)?
Extended-regular-expression-emptiness
Instance: an extended regular expression r that may use
concatenation, +, Kleene star, and complement.
Question: is L(r) = {}?
This chapter stated (without proof ) that Extended-regular-expression-equivalence requires
nonelementary time. Now, prove that Extended-regular-expression-emptiness requires
nonelementary time if and only if Extended-regular-expression-equivalence requires
nonelementary time.
EXERCISE 11
State the following assertions about natural numbers as formulas in Presburger
arithmetic. Then state whether the assertion is true.
a. The successor of any number is not equal to that number.
b. There is a number that is greater than or equal to any other number.
c. Adding zero to any number gives you that number again.
d. No number is less than three.
e. Zero is not the successor of any number.
f. Between every number and its successor there lies another number.
g. For any x ≠ 0, 3x > 2.
EXERCISE 12
Using Java decision methods, show that P is closed for
a. complement
b. union
c. intersection
d. reversal
e. concatenation
f. Kleene star (This last is more difficult than the others. Try for a solution using
dynamic programming.)
EXERCISE 13
Using Java decision methods, show that PSPACE is closed for
a. complement
b. union
c. intersection
d. reversal
e. concatenation
f. Kleene star
EXERCISE 14
Prove or disprove each of the following assertions.
a. The ≤P relation is transitive: if A ≤P B and B ≤P C then A ≤P C.
b. The ≤P relation is reflexive: for all languages L, L ≤P L.
c. The ≤P relation is symmetric: if A ≤P B then B ≤P A.
d. If A ≤P B and B ∈ P then A ∈ P.
e. If A ≤P B and B ≤P A then both A and B are in P.
f. If A is EXPTIME-hard and A ≤P B then B is EXPTIME-hard.
Complexity Classes
A verification method takes two String parameters, the instance and the
certificate, and returns a boolean value.
We have already stretched our definition of a decision method to allow multiple string
parameters, on the understanding that if we wanted to use the strict definition, we could
rewrite any method that takes two strings (x,y) to make it take a single string x+c+y,
for some separator char c not occurring in x. So, in that sense, a verification method is no
different from a decision method, and we can continue to use all our old definitions.
The important difference between verification methods and decision methods lies in
how we use them to define languages. We use a verification method to define, not the set of
instance/certificate pairs, but the set of positive instances: the set of instances for which some
certificate exists.
The instance string x is the string being tested for language membership; it is in the language
if there exists at least one string p that the verification method accepts as a certificate for x.
For example, this is a verification method for the language {zz | z ∈ Σ*}:
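boolean xxVerify(String x, String p) {
  // accept p as a certificate for x exactly when x is p followed by p
  return x.equals(p + p);
}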
In this example, the certificate is a string p such that x = pp. The instance string x is in
the language {zz | z ∈ Σ*} if and only if such a string p exists. In this case, of course, the
certificate is telling us something we could easily have figured out anyway; a decision method
for the language is not that much more difficult to write. But in other cases, the information
in the certificate appears to make writing a verification method for a language much simpler
than writing a decision method. Stretching the definition to allow int inputs, consider this
verification method:
boolean compositeVerify(int x, int p) {
return p > 1 && p < x && x % p == 0;
}
This is a verification method for the language of composite numbers; a composite number is
a positive integer with an integer factor other than one or itself. Here, verification seems to
be much easier than decision; checking a proposed factor of x seems to be much easier than
deciding from scratch whether such a factor exists.
As these examples show, a verification method by itself is not a computational procedure
for determining language membership. It does not tell you how to find the certificate; it
only checks it after you’ve found it. Of course, you could search for it blindly. Given any
verification method aVerify for a language A, you could make a recognition method for
A by enumerating the possible certificates p and trying them all. Suppose SigmaStar is an
enumerator class that enumerates Σ*—then we could implement a recognition method this
way:
boolean aRecognize(String x) {
SigmaStar e = new SigmaStar();
while (true) {
String p = e.next();
if (aVerify(x,p)) return true;
}
}
But this is not a decision method; if there is no string p for which aVerify(x,p) returns
true, it simply runs forever. To make this into a decision method, we need some finite bound
on the set of certificates that must be checked.
This is a restriction on our general definition of a verification method: not only must it run
in polynomial time, but if it accepts any certificate for a string x, it must accept at least
one whose size is polynomial in |x|. We'll call this an NP verification method, to distinguish it
from the unrestricted kind. The time and space requirements of a verification method are
measured in terms of the input size n = |x| + |p|. But here, because |p| itself is polynomial
in |x|, we can say that the worst-case time and space of an NP verification method are simply
polynomial in n = |x|.
The class NP contains exactly the languages that are defined by NP verification methods.
Let's examine this class NP. Where does it fit among our other complexity classes? For one
thing, it clearly contains all of P.
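Here is a minimal sketch of the verifier used in that argument, assuming aDec is a polynomial-time decision method for an arbitrary language A in P:
boolean aVerify(String x, String p) {
  // ignore the certificate entirely and just decide x directly
  return aDec(x);
}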
This method ignores the certificate string and decides whether x ∈ A using
aDec. If x ∉ A, it will return false in polynomial time, no matter what
certificate is given. If x ∈ A, it will return true in polynomial time, for any
certificate p, and in particular for the empty string. So it meets our definition
for an NP verifier using k = 0.
With a slightly more elaborate construction, we can show that NP is contained in EXPTIME.
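The construction can be sketched as follows (a sketch only; it assumes aVerify is an NP verification method for A whose accepted certificates never need to be longer than |x|^k, that SigmaStar enumerates Σ* in order of nondecreasing length, and that power(n, k) computes n^k, where k is the constant from the certificate-length bound):
boolean aDecide(String x) {
  SigmaStar e = new SigmaStar();
  long bound = power(x.length(), k);        // no certificate longer than |x|^k needs checking
  while (true) {
    String p = e.next();
    if (p.length() > bound) return false;   // all short certificates exhausted: reject
    if (aVerify(x, p)) return true;         // found a certificate: accept
  }
}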
If there is a certificate for x, there must be some certificate p with |p| ≤ |x|^k,
so this method will find it and return true. If not, this method will exhaust all
the certificates with |p| ≤ |x|^k and return false. The number of iterations of the
loop—the number of calls to the polynomial-time verifier—in the worst case for
an input of size n is the number of possible certificates p with |p| ≤ n^k, which is
roughly |Σ|^(n^k). Thus, A ∈ EXPTIME.
In this proof, the verifier for A runs in polynomial time, so the language of instance/
certificate pairs is in P. But what about the language A itself—the language of instances for
which a certificate exists? It’s still possible that A is in P, using a decision method with a better
strategy than searching through all possible certificates. But our construction above shows
only that A is in EXPTIME.
We can make hardness and completeness definitions for NP as usual: a language A is NP-hard
if L ≤P A for every L ∈ NP, and NP-complete if, in addition, A is in NP.
We have already seen quite a few NP-hard problems. For example, Generalized-checkers is
EXPTIME-hard and NP ⊆ EXPTIME, so Generalized-checkers is NP-hard. But Generalized-checkers
is not known to be in NP. Of particular interest are those problems that are both NP-hard and
in NP.
NP-complete problems are, in a sense, the hardest problems in NP. We have not yet seen any
examples of NP-complete problems, but we’ll see several in the following pages.
Putting Theorems 21.1.1 and 21.1.2 together gives us a rough idea of where NP fits in
among our other complexity classes: P ⊆ NP ⊆ EXPTIME. But that's a very wide range; problems
in P are (at least nominally) tractable, while EXPTIME, as we have seen, provably contains
intractable problems. So what about those problems in NP ? Are they all tractable, or are the
harder ones (in particular, the NP-complete problems) intractable? That’s the heart of the
matter—and the most important open question in computer science today. Most researchers
believe that P ≠ NP, so that the NP-complete problems are intractable, requiring more than
polynomial time. But no one has been able to prove it. No one has been able either to find
polynomial-time algorithms for these problems or to prove that no such algorithms exist.
Satisfiability (SAT )
Instance: a formula that uses only Boolean variables, with operators
for logical AND, logical OR, and logical NOT.
Question: is there some assignment of values to the variables that
makes the formula true?
This is related to the QBF problem we’ve already seen, but simpler; a SAT instance is like
a QBF instance with no quantifiers. (In SAT, all the variables are implicitly existentially
quantified.) Mathematicians often use the operator symbols ¬, ∧, and ∨, with precedence
in that order and parentheses as needed; for example:
a ∧ b          true, under the assignment a=true and b=true
a ∧ ¬a         false, under all assignments
(a ∨ b) ∧ ¬(a ∧ b)          true, under the assignment a=true and b=false
In most programming languages we use symbols that are easier to type. In Java, we might
give the same examples as
a & b
a & !a
(a | b) & !(a & b)
But the concrete syntax isn’t important here. Both the mathematical notation and the Java
notation are just context-free languages, for which simple grammars can be given, and a
string that encodes a SAT instance in one of these languages can be parsed and checked
syntactically in polynomial time. We might say that the language of syntactically correct
SAT instances is in P. But that still leaves the satisfiability question: SAT is a subset of
the syntactically correct instances, containing those that have at least one satisfying truth
assignment. This subset is not context free, and it is not believed to be in P.
It is, however, in NP. We can show this by outlining a verification method for it. Here,
the certificate is a truth assignment; a verification method need only verify that it makes the
formula true.
Lemma 21.1.1: SAT ∈ NP.
The NP verification method for SAT only needs to check, in polynomial time, whether the
particular truth assignment given by the certificate makes the formula true. Checking a truth
assignment is easy; finding one seems much harder. There is a simple procedure, of course: a
blind search for a certificate, as in the proof of Theorem 21.1.2. But that takes O(2^n) time in
the worst case. And although many researchers have sought a polynomial-time solution, none
has been found.
If there is a polynomial-time solution, then there is a polynomial-time solution for every
problem in NP, and so P = NP. The following theorem, that SAT is NP-hard, was proved by
Stephen Cook in 1971; the proof is given in Appendix C, Section C.1.
We have now proved that SAT is in NP and is NP-hard. Putting these results together, we
conclude that SAT is NP-complete.
A SAT formula is in conjunctive normal form (CNF ) when it is the logical AND
of one or more clauses, each of which is the logical OR of one or more literals,
each of which is either a positive literal (a single Boolean variable) or a negative
literal (a logical NOT operator applied to a single Boolean variable).
CNFSAT instances use all the same operators as SAT instances, but restrict the order in
which they may be applied: variables may be negated using ¬, then those pieces (the literals)
may be combined using ∨, and finally those pieces (the clauses) may be combined using ∧.
For example, here are some CNFSAT instances, written with a mathematical syntax:
(a) ∧ (b)          true, under the assignment a=true and b=true
(a) ∧ (¬a)          false, under all assignments
(a ∨ b) ∧ (¬a ∨ ¬b)          true, under the assignment a=true and b=false
(¬b) ∧ (¬c) ∧ (a ∨ b ∨ c) ∧ (¬a ∨ b ∨ c)          false, under all assignments
This concrete syntax is simpler than the concrete syntax we used for SAT. (In fact,
depending on the details, it might well be regular—see Exercise 10.) But the concrete syntax
is not the important question here. For any reasonable representation of CNFSAT instances,
the important question is the satisfiability question: is there a truth assignment for the
variables that makes the whole formula true?
It is easy to see that CNFSAT is in NP, because it can use largely the same verification
method as SAT.
Lemma 21.2.1: CNFSAT ∈ NP.
Proof: The restricted syntax is context free and so can be checked in polynomial
time. With that change, an NP verification method can be constructed the same
way as for SAT, in the proof of Lemma 21.1.1.
In fact, checking the certificates could be simpler for CNFSAT than for general SAT. The
verification method only needs to check that, using the given truth assignment, each clause
contains at least one true literal. The more interesting result is that, in spite of the restricted
syntax, CNFSAT is still NP-hard.
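Before turning to hardness, here is a minimal sketch (not the book's code) of the clause-by-clause check just described. It represents a CNF formula as an array of clauses, each clause an array of integer literals: literal +v stands for variable v, literal -v for its negation, and assignment[v] gives the truth value assigned to variable v:
boolean cnfVerify(int[][] clauses, boolean[] assignment) {
  for (int[] clause : clauses) {
    boolean clauseTrue = false;
    for (int literal : clause) {
      boolean value = assignment[Math.abs(literal)];
      if (literal < 0) value = !value;          // a negated literal
      if (value) { clauseTrue = true; break; }  // one true literal is enough
    }
    if (!clauseTrue) return false;  // a clause with no true literal falsifies the formula
  }
  return true;  // every clause contains at least one true literal
}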
Lemma 21.2.2: CNFSAT is NP-hard.
Proof: By reduction from SAT. See Appendix C.
We have now shown that CNFSAT is in NP and is NP-hard. Putting these results together,
we conclude that CNFSAT is NP-complete.
We now have two NP-complete problems. Each can be reduced to the other in polynomial
time: SAT ≤P CNFSAT, and CNFSAT ≤P SAT. In that sense, CNFSAT is exactly as difficult
to decide as SAT itself. It’s simpler without being any easier.
3SAT
It turns out that we can simplify satisfiability even more, without losing NP-completeness. We
can require that every clause contain exactly the same number of literals. A formula is in
3-CNF if it is in CNF and every clause is the logical OR of exactly three distinct literals.
Restricting instances to 3-CNF produces a decision problem that is structurally very simple,
yet still NP-complete:
3SAT
Instance: a formula in 3-CNF.
Question: is there some assignment of values to the variables that
makes the formula true?
This is just a restricted syntax for SAT instances, so proving membership in NP is easy:
Proof: The restricted syntax is context free and so can be checked in polynomial
time. With that change, an NP verification method can be constructed the same
way as for SAT, in the proof of Lemma 21.1.1.
Lemma 21.3.2: 3SAT is NP-hard.
Proof: By reduction from CNFSAT. Given any CNF formula, depending on the number
n of literals in a clause, we can replace each clause c with one or more 3SAT
clauses as follows.
If n = 1, c = (l) for some single literal l. Create two new variables x and y, not
occurring elsewhere in the formula, and replace c with these four clauses:
(l ∨ x ∨ y) ∧ (l ∨ x ∨ ¬y) ∧ (l ∨ ¬x ∨ y) ∧ (l ∨ ¬x ∨ ¬y)
A truth assignment satisfies these if and only if it makes l true, so this
transformation preserves satisfiability. (Note that we cannot simply use
(l ∨ l ∨ l), because 3-CNF requires three distinct literals in each clause.)
If n = 2, c = (l1 ∨ l2). Create one new variable x, not occurring elsewhere in the
formula, and replace c with these two clauses:
(l1 ∨ l2 ∨ x) ∧ (l1 ∨ l2 ∨ ¬x)
A truth assignment satisfies these if and only if it makes l1 ∨ l2 true, so this
transformation preserves satisfiability.
If n = 3, c is already a 3-CNF clause, and no transformation is necessary.
If n ≥ 4, c = (l1 ∨ l2 ∨ l3 ∨ … ∨ ln). Create n - 3 new variables, x3 through
xn-1, not occurring elsewhere in the formula, and replace c with these n - 2
clauses:
(l1 ∨ l2 ∨ x3)
(¬x3 ∨ l3 ∨ x4)
(¬x4 ∨ l4 ∨ x5)
(¬x5 ∨ l5 ∨ x6)
…
(¬xn-1 ∨ ln-1 ∨ ln)
(Except for the first and last, these clauses are all of the form (¬xi ∨ li ∨ xi+1).)
Suppose the original clause is satisfiable. Then there is some truth assignment
that makes at least one of the li true. This can be extended to a truth assignment
that satisfies all the replacement clauses, by choosing xk = true for all k ≤ i, and
xk = false for all k > i. Conversely, suppose there is some truth assignment that
satisfies all the replacement clauses. Such a truth assignment cannot make all
the li false; if it did, the first replacement clause would require x3 = true, and so
the second would require x4 = true, and so on, making the last clause false. Since
a satisfying assignment for the replacement clauses must make at least one li
true, it also satisfies the original clause. Therefore, this transformation preserves
satisfiability.
Applying this transformation to each clause in the original formula φ produces
a 3-CNF formula φ', with φ ∈ CNFSAT if and only if φ' ∈ 3SAT. The entire
construction takes polynomial time. (In fact, the size of φ' is linear in the size
of φ.) Therefore, CNFSAT ≤P 3SAT. Because CNFSAT is NP-hard, we conclude
that 3SAT is NP-hard.
Incidentally, it turns out that 3 is the smallest value of k for which kSAT is NP-hard. The
satisfiability of 1-CNF and 2-CNF formulas can be decided in linear time.
We have now shown that 3SAT is in NP and is NP-hard. Putting these results together, we
conclude that 3SAT is NP-complete.
Vertex Cover
Imagine that you must place stations in a network of cities connected by roads, so that every
road has a station in the city at at least one of its ends.
[Figure: a map of five cities, Turington, Gödel City, Kleene Valley, Cookville, and Postonia, connected by roads.]
If you can afford to build k = 5 stations, there’s no problem: just put one in every city. But
suppose you can only afford k = 2. Is it still possible to place at least one at one end of every
road? Yes. If you build one in Turington and one in Postonia, that will suffice. On the other
hand, if you can only afford k = 1, there is no solution.
This is a concrete example of the Vertex Cover problem. In the abstract, a problem
instance has two parts: a graph and a goal k ∈ N. The graph G = (V, E) consists of a set V of
vertices and a set E of edges. Each edge is just a pair of vertices. So for the example above,
this would be
G = (V, E)
V = {v, w, x, y, z}
E = {(v, w), (w, x), (x, y), (v, y), (y, z)}
[Figure: this graph drawn as a diagram.]
A vertex cover of a graph is a subset of the vertex set, V' ⊆ V, such that for all
(u, v) ∈ E, u ∈ V' or v ∈ V'. Of course every graph has a vertex cover V' = V,
but the decision problem asks, is there a vertex cover V' of size k or less?
Vertex Cover
Instance: a graph G = (V, E) and a number k ∈ N.
Question: does G have a vertex cover of size k or less?
It is easy to see that Vertex Cover is in NP. The certificate can just be a proposed vertex
cover, and we have only to check that it is a correct cover and has size k or less. As usual, you
would have to specify some encoding of the problem instance as a string and check that the
input string is a legal encoding. But abstracting away those details, we have the following
result: Vertex Cover is in NP.
Proof: We can make an NP verification method for Vertex Cover. The instance
is a graph G = (V, E) and a number k ∈ N, and the certificate for an instance is
some V' ⊆ V. Check the syntax of the problem instance and certificate. Then
make a pass over the edge set E. For each (u, v) ∈ E, check that at least one of
u or v occurs in V'. For any reasonable encoding of the graph, these tests can be
completed in polynomial time.
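The core of that check might look like this (a sketch, not the book's code), with the vertices numbered, each edge given as a pair of vertex numbers, and the certificate given as a java.util.Set of vertex numbers:
boolean checkVertexCover(int[][] edges, java.util.Set<Integer> cover, int k) {
  if (cover.size() > k) return false;    // the proposed cover must have size k or less
  for (int[] edge : edges)
    if (!cover.contains(edge[0]) && !cover.contains(edge[1]))
      return false;                      // this edge has neither endpoint in the cover
  return true;                           // every edge has an endpoint in the cover
}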
Vertex Cover is also NP-hard. The proof, by reduction from 3SAT, is rather surprising:
what could Vertex Cover have to do with Boolean satisfiability? But we can show how to
take any 3-CNF formula φ and convert it into an instance of Vertex Cover: a graph G and
constant k such that G has a vertex cover of size k or less, if and only if φ is satisfiable.
We have now shown that Vertex Cover is in NP and is NP-hard. Putting these results
together, we conclude that Vertex Cover is NP-complete.
Edge Cover is a closely related problem: an edge cover of a graph is a subset of the edge set,
E' ⊆ E, such that every vertex of the graph is an endpoint of at least one edge in E'.
Edge Cover
Instance: a graph G = (V, E) and a number k ∈ N.
Question: does G have an edge cover of size k or less?
Recall that our first example of Vertex Cover used this graph:
[Figure: the same graph again, with vertices v, w, x, y, and z.]
Examining the same graph for edge covers, we see that at k = 5, we can use every edge,
E' = E, to make an edge cover; at k = 3 there is still a solution at E' = {(y, z), (v, w), (x, y)};
while at k = 2 there is no solution.
The surprising thing here is that, while Vertex Cover is NP-complete, Edge Cover is in P.
We won’t present it here, but there’s a fairly simple, polynomial-time algorithm for finding a
minimum-size edge cover for any graph. This is a good example of how unreliable intuition
can be as a guide to decision problems. Edge Cover may seem like a close relative of Vertex
Cover, but it’s in a different and (apparently) much simpler complexity class. Vertex Cover
seems at first glance to be completely unrelated to Boolean-satisfiability problems; yet Vertex
Cover, SAT, CNFSAT, and 3SAT are all NP-complete, so they can all be reduced to each other
in polynomial time.
Hamiltonian Circuit
The next is a classic route-finding problem: the problem of finding a path that starts and
ends at the same point and visits every vertex in the graph exactly once. (Think of a delivery
truck that needs to start at the warehouse, visit a list of destinations, and return to the
warehouse.)
Hamiltonian Circuit
Instance: a graph G = (V, E).
Question: is there a path that starts and ends at the same vertex,
arriving at every vertex exactly once?
[Figure: three example graphs.]
The first two have Hamiltonian circuits; the third does not.
It is easy to see that Hamiltonian Circuit is in NP. The certificate can just be a list of
vertices in some order. We have only to check that it is a Hamiltonian circuit for the graph.
As usual, you would have to specify some encoding of the problem instance as a string and
check that the input string is a legal encoding. But abstracting away those details, we have the
result that Hamiltonian Circuit is in NP.
Hamiltonian Circuit is also NP-hard. The proof is by reduction from Vertex Cover.
Given any instance of Vertex Cover, consisting of a graph G and a natural number k, we can
construct a new graph H such that H has a Hamiltonian circuit if and only if G has a vertex
cover of size k or less.
We have now shown that Hamiltonian Circuit is in NP and is NP-hard. Putting these
results together, we conclude that Hamiltonian Circuit is NP-complete.
Many variations of this problem are also NP-complete. For example, the problem remains NP-
complete if we allow paths that visit every vertex exactly once, but don’t end up where they
started. It remains NP-complete if we consider graphs with one-way edges (directed graphs).
On the other hand, there’s an interesting relative of Hamiltonian Circuit called Euler
Circuit. A Hamiltonian circuit must visit each vertex exactly once; an Euler circuit must use
each edge exactly once.
Euler Circuit
Instance: a graph G = (V, E).
Question: is there a path that starts and ends at the same vertex and
follows every edge exactly once?
Although this sounds like a close relative of Hamiltonian Circuit, it is actually much simpler.
We won’t present it here, but there is a simple algorithm to find an Euler circuit for a graph
or prove that none exists. (A graph has an Euler circuit if and only if it is fully connected and
every vertex has an even number of incident edges.) So Euler Circuit is in P.
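A minimal sketch of a polynomial-time test based on that condition (not the book's code, using an adjacency-matrix representation):
boolean hasEulerCircuit(boolean[][] adjacent) {
  int n = adjacent.length;
  if (n == 0) return true;
  for (int u = 0; u < n; u++) {          // every vertex must have an even number of incident edges
    int degree = 0;
    for (int v = 0; v < n; v++) if (adjacent[u][v]) degree++;
    if (degree % 2 != 0) return false;
  }
  boolean[] visited = new boolean[n];    // the graph must be connected: search from vertex 0
  java.util.Deque<Integer> stack = new java.util.ArrayDeque<>();
  visited[0] = true;
  stack.push(0);
  while (!stack.isEmpty()) {
    int u = stack.pop();
    for (int v = 0; v < n; v++)
      if (adjacent[u][v] && !visited[v]) { visited[v] = true; stack.push(v); }
  }
  for (boolean reached : visited) if (!reached) return false;
  return true;
}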
Traveling Salesman
Imagine you’re planning the route for a delivery truck. It leaves the distribution center in the
morning, makes a long list of deliveries, and returns to the distribution center at night. You
have only enough gas to drive k miles. Is there a route you can use to make all your deliveries
without running out of gas?
This is a Traveling Salesman Problem (TSP). As a decision problem, the instance is a graph
G = (V, E) with a length for each edge, together with a number k; the question is whether there
is a circuit that starts and ends at the same vertex, visits every vertex exactly once, and has
total length k or less. TSP is in NP:
Proof: We can make an NP verification method for TSP. The instance is a graph
G = (V, E) and a number k; the certificate for an instance is some sequence
of vertices. Check the syntax of the problem instance and certificate. Then
make a pass over the certificate, checking that it visits every vertex once, moves
only along edges actually present in the graph, ends where it began, and has a
total length ≤ k. For any reasonable encoding of the graph, these tests can be
completed in polynomial time.
TSP is also NP-hard, by a reduction from Hamiltonian Circuit. We have now shown that TSP
is in NP and is NP-hard. Putting these results together, we
have
Theorem 21.2.6: TSP is NP-complete.
There are many variations of this problem. In some, the goal is a closed circuit; in
others, the goal is a path from a specified target to a specified destination. In some the graph
is undirected; in others the graph is directed. In some, the goal is to visit every vertex at
least once; in others, exactly once. In some the edge lengths are Euclidean, or at least are
guaranteed to obey the triangle inequality (length(x, z) ≤ length(x, y) + length(y, z)); in
others the lengths are unconstrained. In some, the graph is always complete, with an edge
between every pair of vertices; in others the graph contains some edges but not others. All
these variations turn out to be NP-complete.
Unfortunately, there’s no standard terminology for these problems. In different contexts,
Traveling Salesman Problem might mean any one of these variations.
An amazing variety of problems in all kinds of domains, some very far removed from
computer science, have turned out to be NP-complete. All of them stand or fall together.
For many of these decision problems there is a corresponding optimization problem. For
example:
When studying formal language we focus on decision problems, but in other contexts you
are much more likely to encounter the corresponding optimization problems. People often
fail to distinguish between a decision problem and its corresponding optimization problem.
This is technically sloppy: P and NP are sets of languages, so it is not really correct to speak of
an optimization problem as being NP-complete. However, this casual usage is often justified
by the close connection between the two kinds of problems.
Consider, for example, our Vertex Cover decision problem and its corresponding
optimization problem, Vertex Cover Optimization. There is a simple recursive solution for
Vertex Cover Optimization:
minCover(G)
if G has no edges, return {}
else
let (v1, v2) be any edge in G
let V1 = minCover(G - v1)
let V2 = minCover(G - v2)
let n1 = |V1|
let n2 = |V2|
if n1 < n2 return V1 ∪ {v1}
else return V2 ∪ {v2}
Here, the expression G - v refers to the graph G, with the vertex v and all its incident edges
removed. In the base case, when there are no edges in the graph, the empty set is clearly the
minimum-size vertex cover. Otherwise, we choose any edge (v1, v2) in G. At least one of
the two ends of this edge must be in any cover for G, so we try both. We remove v1 and the
edges it covers and then recursively find a minimum cover V1 for the rest of the graph; then
V1 ∪ {v1} must have the minimum size of any cover for G that contains v1. Then we do the
same for the other vertex, so that V2 ∪ {v2} has the minimum size of any cover for G that
contains v2. Because we know that at least one of v1 or v2 must be part of any vertex cover, we
can conclude that the smaller of V1 ∪ {v1} and V2 ∪ {v2} must be a minimum vertex cover for
G. (If they are the same size, it would be correct to return either one.)
This is a reasonably simple solution, but very expensive. Our minCover calls itself
recursively, twice. That makes a binary tree of recursive calls, whose height is (in the worst
case) equal to the number of edges in the graph. That will take exponential time. Now, if we
could find n1 and n2 some other way—find the size of a minimum-size vertex cover, without
actually finding the cover—then we could reorganize the code so that minCover calls itself
recursively, only once:
minCover(G)
if G has no edges, return {}
else
let (v1, v2) be any edge in G
let n1 = minCoverSize(G - v1)
let n2 = minCoverSize(G - v2)
if n1 < n2 return minCover(G - v1) ∪ {v1}
else return minCover(G - v2) ∪ {v2}
This makes just one recursive call for each edge in the graph. In fact, not counting the time
spent in minCoverSize, this is now a polynomial-time solution for Vertex Cover Optimization.
That’s the basis of the following proof:
Theorem 21.3: Vertex Cover ∈ P if and only if Vertex Cover Optimization can be
computed in polynomial time.
Proof: Suppose Vertex Cover Optimization can be computed in polynomial
time. We can immediately use this to decide any Vertex Cover instance (G, k):
in polynomial time, compute a minimum-size cover U for G, and return true if
|U| ≤ k, false if |U| > k.
In the other direction, suppose Vertex Cover ∈ P. We can compute the size of a
minimum-size vertex cover like this:
minCoverSize(G)
for each i from 0 to the number of vertices in G
if (G, i) ∈ Vertex Cover return i
This is a common situation: either the decision problem and the corresponding optimization
problem can be done in polynomial time, or neither can. Such optimization problems stand
or fall along with the thousands of decision problems for NP-complete languages.
One interesting feature of optimization problems, not shared by decision problems, is
that they are natural targets for approximation. For example:
Optimization problem: Find the most valuable way to fill a knapsack from a given inventory,
so that the total weight is no more than w.
Approximation: Find a reasonably valuable way to fill a knapsack from a given inventory,
so that the total weight is no more than w.
Optimization problem: Find the least wasteful way to cut a sheet of metal into a set of
required parts.
Approximation: Find a way to cut a sheet of metal into a set of required parts, with a
reasonably small amount of waste.
In practice, many applications will be satisfied by any reasonably good solution, even if it
is not provably optimal. That’s good news, because while no polynomial-time solutions
are known for the optimization problems, we can sometimes find a reasonably close
approximation in polynomial time. (Of course, what makes a solution reasonably close to
optimal depends on the application.)
A good algorithm for finding an approximate solution for one NP-related optimization
problem does not necessarily translate to others. Each problem has unique approximation
properties. For example, consider Traveling Salesman Problem. In some variations, the
edge lengths are guaranteed to obey the triangle inequality (length(x, z) ≤ length(x, y) +
length(y, z)); in others the lengths are unconstrained. From the formal-language point of
view it makes no difference; both versions are NP-complete. But if you are trying to find an
approximation algorithm, it makes a great deal of difference. There is a good algorithm for
finding a circuit that is guaranteed to be no more than 3/2 times the length of the shortest
circuit—if the triangle inequality is obeyed. If the triangle inequality is not obeyed, you
can prove that no such approximation is possible in polynomial time, unless P = NP. (See
Exercise 16.)
But we will not explore these approximation algorithms further here. Our focus here is
on formal language. In the study of formal language we are chiefly concerned with deciding
language membership. In this black-and-white world, the NP-complete problems are all
treated as interchangeable; each may be reduced to the others in polynomial time. Indeed,
we tend to think of them as just different faces of the same problem—different avatars of the
one, big, abstract, NP-complete superproblem.
boolean lrec(String s) {
BinaryStar e = new BinaryStar();
while (true) {
String guide = e.next();
String result = runGuided(N,s,guide);
if (result.equals("accept")) return true;
}
}
A similar construction, with a little more attention to detail, can be used to make a decision
method for L(N ) from any total NDTM N. (See Exercise 17.) So our definition of a
recursive language is likewise unaffected by the addition of nondeterminism.
On the other hand, the addition of nondeterminism seems to have a major effect on
our complexity classes. To explore this, we’ll need to revisit our cost model. We will say that
the cost of an NDTM computation is the cost of the most expensive of any of the possible
sequences on the given input:
time(M, x) = the maximum number of moves made by M on input x ∈ Σ*
space(M, x) = the maximum length of tape visited by M on input x ∈ Σ*
Building from that foundation, what kinds of complexity classes do we get? Surprisingly, one
at least is already familiar: the class of languages defined by some polynomial-time NDTM is
exactly the class NP. We prove this in two parts.
Theorem 21.4.2: If L ∈ NP, L = L(N) for some polynomial-time NDTM N.
Proof: Let L be an arbitrary language in NP. Then L has a verification method
vL. Because this is an NP-verification method, it runs in polynomial time, and
we have a polynomial bound t(x) on the length of any viable certificate y. Now
construct an NDTM N with two phases of computation. In the first phase, N
skips to the end of its input string x, writes a separator symbol #, and then writes
a certificate string, choosing up to t(x) symbols to write nondeterministically.
Then it returns the head to the starting position, the first symbol of x. In the
second phase, N performs the same computation as vL(x, y), verifying the
instance/certificate pair. Now L(N) = L, and N runs in polynomial time.
The NDTM constructed in this proof does all of its nondeterministic computation up
front, capturing it in the certificate. Verifying the certificate is purely deterministic. The same
idea works in reverse: given any NDTM N, we can construct a verification algorithm that
works by capturing all the nondeterminism in the certificate. In fact, we’ve already seen the
construction: it’s our runGuided method.
Putting those results together, we see that NDTMs and verification methods are
interchangeable ways to define the complexity class NP. (“There it is again!”) In fact, NP was
originally defined in terms of NDTMs; NP stands for Nondeterministic Polynomial time.
NDTMs can be used to define many other nondeterministic complexity classes, both
for space and time: NPSPACE, NEXPTIME, and so on. But of all these, the most important open
questions revolve around NP.
We know that P ⊆ NP ⊆ PSPACE ⊆ EXPTIME, and we know that P ≠ EXPTIME, so at least one of
the three inclusions must be proper.
But we don’t know which one, or ones!
The one most important open problem in this field today is the question of P versus
NP. Roughly speaking, the P class represents problems for which we can find the answer in
polynomial time; the NP class represents problems for which we can verify a given answer in
polynomial time. Our intuition tells us that finding an answer can be much more difficult
than verifying one. Finding a proof feels harder than checking someone else’s proof; solving
a jigsaw puzzle takes hours of work, while checking someone’s solution takes just a glance.
(Same thing for Sudoku puzzles—and yes, decision problems related to Sudoku have been
shown to be NP-complete.) To this intuition, we add a long experience of failure—the failure
to find a polynomial-time solution to any of the NP-hard problems. Many of these problems
are of great commercial importance, and many programmers over the last three decades have
struggled with them, but to no avail.
For most researchers, this all adds up to a feeling that these problems are truly
intractable, though we can’t yet prove it. In fact, modern computing depends on this
supposed intractability in many ways. For example, many of the cryptographic techniques
that protect financial transactions on the Internet depend on this assumption. If a
polynomial-time solution to an NP-hard problem were ever found, there would be quite a
scramble to find other ways to secure network transactions!
But just because many researchers have this same feeling is no reason to consider the
solution inevitable. The history of mathematics and computing contains a number of widely
believed hunches and conjectures that turned out to be false. Indeed, we’ve seen a major one
already in this book: in the early 1900s, most mathematicians shared David Hilbert’s feeling
that a proof of the completeness and correctness of number theory was just around the
corner, until Kurt Gödel proved them all wrong. We’ve learned not to be too complacent; a
consensus of hunches is no substitute for a proof.
All in all, it’s a dramatic situation. There is a tension that has existed since the problem
was first identified over 30 years ago, and it has grown as the list of NP-complete problems has
grown. To resolve this growing tension a solid proof is needed: ideally, either a polynomial-
time decision method for some NP-complete problem, or a proof that no such method exists.
Certain fame awaits anyone who can solve this problem. Fame, and fortune: the P versus NP
problem is one of seven key problems for which the Clay Mathematics Institute has offered
prizes—$1 million for each problem. The prizes were announced in 2000, in Paris, 100
years after Hilbert announced his famous list of problems in that same city. So far, the prizes
remain unclaimed.
Exercises
EXERCISE 1
For each of the following verification methods, describe the language it defines, and write
a decision method for it.
a. boolean xxVerify(String x, String p) {
return x.equals(p+p);
}
b. boolean compositeVerify(int x, int p) {
return p > 1 && p < x && x % p == 0;
}
c. boolean vc(String x, String p) {
return x.equals(p+p+p);
}
d. boolean vd(String x, String p) {
return x.equals(p);
}
e. boolean ve(int x, int p) {
return x == 2*p;
}
f. boolean vf(int x, int p) {
return x == p*p;
}
EXERCISE 2
Assuming that L is any NP-complete language, use the definitions to prove the following
statements:
a. For all A ∈ NP, A ≤P L.
b. For all A, if A is NP-hard then L ≤P A.
c. For all A, if A is NP-complete then A ≤P L.
This is similar to Exercise 6, but not the same; you should not use the standard
distributive law, but should use instead the special transformation from the proof of
Lemma 21.2.2 and the commuted version of it, which introduce extra variables as
necessary but achieve a polynomial-time reduction. (Our sample output above names the
extra variables #i, but you can name them whatever you like.)
EXERCISE 9
Use the construction from the proof of Lemma 21.3.2 to convert these CNF formulas to
3-CNF. (This may involve the introduction of new variables.)
a. a
b. (a ∨ b ∨ c)
c. (a ∨ b)
d. (¬a ∨ ¬b)
e. (a ∨ b) ∧ (c ∨ d)
f. (a ∨ b ∨ c ∨ d)
g. (a ∨ b ∨ c ∨ d ∨ e ∨ f ∨ g)
EXERCISE 10
We define a concrete syntax for CNF formulas using the operator symbols ∧, ∨, and ¬.
There is exactly one pair of parentheses around each clause and no other parentheses in
the formula. Variable names are strings of one or more lowercase letters.
a. Show that this language is regular by giving a left-linear grammar for it.
b. Show that the restriction of this language to 3-CNF is regular by giving a left-linear
grammar for it.
EXERCISE 11
(This exercise uses material from Appendix C, Section C.1.) The proof of Cook’s
Theorem defines a function e = f (a, b, c, d). This function defines a symbol in an ID for
a Turing machine in terms of the previous-step symbols at and near that position:
... a b c d ...
... e ...
Give pseudocode for computing this function f using the transition function for the
TM.
EXERCISE 12
(This exercise uses material from Appendix C, Section C.3.) Convert (a) ∧ (¬a) to a
3SAT instance, show the Vertex Cover instance constructed for it as in the proof of
Lemma 21.4.2, and show a minimum-size vertex cover for it. (Because the formula is not
satisfiable, you should find that there is no vertex cover of size 2m + n.)
EXERCISE 13
Consider the following decision problem:
Monotone 3SAT
Instance: a formula in 3-CNF with the property that in each clause,
either all literals are positive or all are negative.
Question: is there some assignment of values to the variables that
makes the formula true?
EXERCISE 14
Consider the following decision problem:
1-in-3SAT
Instance: a formula in 3-CNF.
Question: is there some assignment of values to the variables that
makes exactly one literal in each clause true?
Here, the instances are the same as for 3SAT, but the question is different; we need to
decide whether the formula has an assignment that makes exactly one literal in every
clause true. (Our regular satisfiability problems ask whether there is an assignment that
makes at least one literal in every clause true.)
Prove that 1-in-3SAT is NP-complete. (Hint: To prove NP-hardness, use a reduction from
3SAT. For each clause in the original formula, come up with a set of new clauses that
have a 1-in-3 satisfying assignment if and only if the original clause has a plain satisfying
assignment. This will require the introduction of new variables.)
EXERCISE 15
The pseudocode in the proof of Theorem 21.3 is inefficient. (We were only interested in
showing polynomial time.) Show an implementation of minCoverSize that makes only
logarithmically many calls to the decision method for Vertex Cover.
EXERCISE 16
The Traveling Salesman Optimization problem is this: given a graph G with a length for
each edge, find a path that starts and ends at the same vertex, visits every vertex exactly
once, and has a minimum total length. An approximate solution to this optimization
problem would be an algorithm for finding a reasonably short circuit. We’ll say that a
circuit is reasonably short if it is no more than k times longer than the minimum-length
path, for some constant k > 1. (Such a constant k is called a ratio bound.)
Now consider the following construction. Given a graph G = (V, E) and a constant k > 1,
construct a new graph G' with the same vertices. For each edge in G, add the same edge
to G' with length 1. For each pair of vertices not connected by an edge in G, add that
edge to G' with length k|V |. (Thus G' is a complete graph, with an edge of some length
between every pair of vertices.)
a. Prove that if G has a Hamiltonian circuit then any Traveling Salesman Optimization
solution for G' has a total length of |V|.
b. Prove that if G does not have a Hamiltonian circuit then any Traveling Salesman
Optimization solution for G' has a total length > k|V |.
c. Using these results, prove that for any constant k > 1, if there is an approximate
solution for Traveling Salesman Optimization with a ratio bound k that runs in
polynomial time, then P = NP.
(Note that our construction builds a graph G' with lengths that may violate the triangle
inequality. For graphs that obey the triangle inequality, although TSP remains NP-
complete, there are approximation techniques with good ratio bounds.)
EXERCISE 17
An NDTM N is total if, for all inputs x ∈ Σ*, all legal sequences of moves either accept
(by entering an accepting state) or reject (by entering a configuration from which there is
no possible next move).
Using the ideas from the proof of Theorem 21.4.1, show how to construct a decision
method ldec from any total NDTM N, so that L(ldec) = L(N ).
EXERCISE 18
Using your solution to Exercise 17, show that NP ⊆ PSPACE. (You do not need to show
implementations for things like runGuided and BinaryStar, but you should
explain your assumptions about their space requirements.)
EXERCISE 19
Implement the runGuided method for NDTMs, as outlined in Section 21.5. Then,
using that and your ldec method from Exercise 17, write a Java implementation of
a universal Turing machine for NDTMs. Your program should read the NDTM to be
simulated from a file. You can design your own encoding for storing the NDTM in a
file, but it should be in a text format that can be read and written using an ordinary
text editor. Your class should have a main method that reads a file name and a text
string from the command line, uses your ldec method to decide whether the string is
in the language defined by the NDTM, and reports the result. For example, if the file
anbncn.txt contains the encoding of an NDTM for the language {a^n b^n c^n}, then your
program would have this behavior:
APPENDIX A
From an NFA to a Regular Expression
For every NFA, there is an equivalent regular expression. In Chapter 7
we showed an example of how to construct a regular expression for
an NFA, but skipped the general construction and proof. In this
appendix we tackle the proof. Warning: There are mathematical
proofs whose elegance compels both agreement and admiration.
They are so simple and concise that they can be understood in their
entirety, all at once. Unfortunately, this is not one of them!
Let M be any NFA with n states numbered q0 through qn-1. Define the internal
language L(M, i, j, k) = {x | (qi, x) ⊢* (qj, ε), by some sequence of IDs in which
no state other than the first and the last is numbered less than k}.
L(M, i, j, k) is the set of all strings that take the NFA from state qi to state qj, without
passing through any state numbered less than k. For example, let M be this DFA:
[Illustration: a DFA M over the alphabet {0, 1}, with three states q0, q1, and q2; q0 is both the start state and the only accepting state.]
• L(M, 2, 2, 3) is the set of strings x for which (q2, x) ⊢* (q2, ε), where states
other than the first and the last are all numbered 3 or higher. Since there are
no states in M numbered 3 or higher, there can be no IDs in the sequence
other than the first and last. So there are only two strings in this language:
ε (which produces the one-ID sequence (q2, ε)) and 1 (which produces the
two-ID sequence (q2, 1) ⊢ (q2, ε)). Thus L(M, 2, 2, 3) = L(1 + ε).
• L(M, 2, 2, 2) is the set of strings x for which (q2, x) ⊢* (q2, ε), where states
other than the first and the last are all numbered 2 or higher. Since q2 is the
only state that can appear in the sequence, the strings in this language are
just strings of any number of 1s. Thus L(M, 2, 2, 2) = L(1*).
• L(M, 2, 1, 2) is the set of strings x for which (q2, x) ⊢* (q1, ε), where states
other than the first and the last are all numbered 2 or higher. Since q2 is the
only state other than the last that can appear in the sequence, the strings in
this language are just strings of any number of 1s with a single 0 at the end.
Thus L(M, 2, 1, 2) = L(1*0).
• L(M, 0, 0, 0) is the set of strings x for which (q0, x) ⊢* (q0, ε), where states
other than the first and the last are all numbered 0 or higher. Since all states
in M are numbered 0 or higher, this is just the language of strings that
take M from its start state to its accepting state. Thus, for this machine,
L(M, 0, 0, 0) = L(M ).
A.2 The d2r Function
d2r(M, i, j, k)
  if k ≥ (number of states in M) then
    let r = a1 + a2 + …, for all ap ∈ Σ ∪ {ε} with qj ∈ δ(qi, ap),
      or ∅ if there are no transitions from qi to qj
    if i = j then return r + ε else return r
  else
    let r1 = d2r(M, i, j, k + 1)
    let r2 = d2r(M, i, k, k + 1)
    let r3 = d2r(M, k, k, k + 1)
    let r4 = d2r(M, k, j, k + 1)
    return r1 + (r2)(r3)*(r4)
This is a complete procedure for constructing a regular expression for any internal language
of an NFA.
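As a concrete, if unpolished, illustration, here is a minimal Java sketch of the d2r procedure. The class name, the transition-table representation, and the string output format are invented for this example and are not part of the construction itself; the table allows at most one transition symbol between any pair of states, which is enough for the example that follows. It builds regular expressions as strings, writing ∅ for the empty regular expression, and makes no attempt to simplify the result.

public class D2R {
    // trans[i][j] holds the label of the transition from qi to qj (a single
    // symbol, or "ε" for an epsilon-transition), or null if there is none.
    static String d2r(String[][] trans, int i, int j, int k) {
        int n = trans.length;
        if (k >= n) {                            // base case: no intermediate states are allowed
            String r = trans[i][j];
            if (i == j) return r == null ? "ε" : "(" + r + "+ε)";
            return r == null ? "∅" : r;
        }
        String r1 = d2r(trans, i, j, k + 1);     // paths that avoid qk entirely
        String r2 = d2r(trans, i, k, k + 1);     // paths that reach qk ...
        String r3 = d2r(trans, k, k, k + 1);     // ... loop at qk ...
        String r4 = d2r(trans, k, j, k + 1);     // ... and then continue on to qj
        return "(" + r1 + "+(" + r2 + ")(" + r3 + ")*(" + r4 + "))";
    }

    public static void main(String[] args) {
        // The two-state machine used in the example below: q0 goes to q0 on a and
        // to q1 on b; q1 goes to q1 on b and back to q0 on a; q1 is accepting.
        String[][] trans = {{"a", "b"}, {"a", "b"}};
        System.out.println(d2r(trans, 0, 1, 0)); // a long, unsimplified regex for L(M)
    }
}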
L(M) = ∪qj ∈ F L(M, 0, j, 0), the union of L(M, 0, j, 0) over all accepting states qj ∈ F.
[Illustration: an NFA M with two states: the start state q0, with a transition to itself on a and a transition to q1 on b, and the accepting state q1, with a transition to itself on b and a transition back to q0 on a.]
There is only one accepting state, q1, so L(M) = L(M, 0, 1, 0). The evaluation of
d2r(M, 0, 1, 0) uses the recursive case, returning r1 + (r2)(r3 )*(r4), where
r1 = d2r(M, 0, 1, 1)
r2 = d2r(M, 0, 0, 1)
r3 = d2r(M, 0, 0, 1)
r4 = d2r(M, 0, 1, 1)
Now the evaluation of d2r(M, 0, 1, 1), for both r1 and r4, uses the recursive case,
returning r5 + (r6)(r7)*(r8), where
r5 = d2r(M, 0, 1, 2) = b
r6 = d2r(M, 0, 1, 2) = b
r7 = d2r(M, 1, 1, 2) = b + ε
r8 = d2r(M, 1, 1, 2) = b + ε
Those four are all base cases, as shown. Finally the evaluation of d2r(M, 0, 0, 1), for both r2
and r3, uses the recursive case, returning r9 + (r10)(r11)*(r12), where
r9 = d2r(M, 0, 0, 2) = a + ε
r10 = d2r(M, 0, 1, 2) = b
r11 = d2r(M, 1, 1, 2) = b + ε
r12 = d2r(M, 1, 0, 2) = a
Those are also base cases, so we can now assemble the whole expression, omitting some
unnecessary parentheses to make the result slightly more readable:
d2r(M, 0, 1, 0)
= r1 + r2r3*r4
= r5 + r6r7*r8 + r2r3*(r5 + r6r7*r8)
= r5 + r6r7*r8 + (r9 + r10r11*r12)( r9 + r10r11*r12)*(r5 + r6r7*r8)
= b + b(b + ε)*(b + ε)
+ (a + ε + b(b + ε)*a)(a + ε + b(b + ε)*a)*(b + b(b + ε)*(b + ε))
The language accepted by the NFA in this example is the language of strings over the
alphabet {a, b} that end with b; a simple regular expression for that language is (a + b) * b. It
is true, but not at all obvious, that this short regular expression and the long one generated
by the formal construction are equivalent. As you can see, the regular expression given by
the construction can be very much longer than necessary. But that is not a concern here.
The point of Lemma 7.2 is that the construction can always be done; whether it can be done
concisely is another (and much harder) question. It can always be done—so we know that
there is a regular expression for every regular language.
APPENDIX B
A Time-Hierarchy Theorem
The more time you allow for a decision, the more languages you can
decide. It’s not a surprising claim, perhaps, but it isn’t easy to prove.
To say that f(n) is o(g(n)) is to say that f grows more slowly than g. That’s stricter than big O;
little o rules out the possibility that f and g grow at the same rate. Where big O is like , little
o is like <. That will serve as our definition of “more time”: g(n) allows more time than f (n) if
f(n) is o(g(n)).
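Written out, in the form the application in Section B.5 uses directly: f(n) is o(g(n)) if and only if for every constant c > 0 there exists some n0 such that f(n) ≤ (1/c) · g(n) for every n ≥ n0.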
In Chapter 19, Theorem 19.2 summarized the comparative rates of growth of some
common asymptotic functions. It turns out that all the results given there for big O can be
tightened to little o. We can state the following:
Theorem B.1:
• For all positive real constants a and b, a is o((log n)^b).
• For all positive real constants a and b, and for any logarithmic base, (log n)^a is o(n^b).
• For all real constants a > 0 and c > 1, n^a is o(c^n).
A function f is time constructible if we can compute what f(n) is within O(f (n)) time. (This
use of a time bound is unusual in that it measures time as a function of the input number,
not as a function of the size of the representation of that number.) For example, the function
n^2 is time constructible, because we can implement it as
int nSquared(int n) {
return n * n;
}
This computes n^2 and takes Θ(log n) time, which is certainly O(n^2). Most commonly
occurring functions, like n, n log n, n^k, and 2^n, are easily shown to be time constructible. (In
fact, it is challenging to come up with a function that is Ω(n) but not time constructible.)
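Along the same lines, a small sketch (the class and method names here are invented) suggests why 2^n is time constructible: shifting 1 to the left by n positions writes down the (n+1)-bit result in O(n) time, which is certainly O(2^n).

import java.math.BigInteger;

public class TwoToTheN {
    // Computes 2^n by a single left shift; writing the n+1 bits of the result
    // takes O(n) time, well within the O(2^n) bound time constructibility asks for.
    static BigInteger twoToTheN(int n) {
        return BigInteger.ONE.shiftLeft(n);
    }
    public static void main(String[] args) {
        System.out.println(twoToTheN(10)); // prints 1024
    }
}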
Roughly speaking, Theorem B.2 says that any more than an extra log factor of time allows you to
decide more languages. Of course, you can still decide all the languages you could have
decided without that extra log factor; clearly TIME(f(n)) ⊆ TIME(g(n) log n). The tricky part is
to show that the inclusion is proper—that there exists at least one language
A ∈ TIME(g(n) log n) that is not in TIME(f(n)). Such a language A is the one decided by this
method:
1. boolean aDec(String p) {
2. String rejectsItself =
3. "boolean rejectsItself(String s) {" +
4. " return !run(s,s); " +
5. "} ";
6. int t = gVal(p.length());
7. return runWithTimeLimit(rejectsItself,p,t);
8. }
Here, gVal is a method that computes g(n) and does it in O(g(n)) time; we know that such
a method exists because g is time constructible.
Let A be the language decided by aDec. Because gVal is the method given by the time
constructibility of g, line 6 runs in O(g(n)) time. Line 7 requires O(g(n) log g(n)) time, so
the whole method aDec requires O(g(n) log g(n)) time. This shows A ∈ TIME(g(n) log g(n)).
It remains to show that A ∉ TIME(f(n)).
Toward that end, we prove a lemma about the method aDec:
Lemma B.1: For any language in TIME ( f(n)), there is some decision method
string p for which aDec(p) = !run(p,p).
Proof: For any decision method string p, two conditions are sufficient to
establish that aDec(p) = !run(p,p): first, that p runs in O(f(n)) time, and second,
that p is sufficiently long. For if p runs in O(f(n)) time, and given that f(n) is o(g(n)), then
time(run, (rejectsItself,p)) is o(g(|p|)). This means that there exists
some n0 such that for every |p| ≥ n0,
time(run, (rejectsItself,p)) ≤ g(|p|).
Because g(|p|) = t, we can conclude that when p is sufficiently long, the
runWithTimeLimit at line 7 does not exceed its time limit. Therefore it
produces the same result as a plain run:
runWithTimeLimit(rejectsItself,p,t)
= run(rejectsItself,p)
= !run(p,p).
Now let B be any language in TIME (f(n)). By definition, there must exist an
O(f(n)) decision method for B. Its source code may not have |p| ≥ n0, but it is
easy to construct a sufficiently long equivalent method, simply by appending
spaces. Thus, B always has some decision method string p that meets our two
conditions, so that aDec(p) = !run(p,p).
Lemma B.1 shows that for every decision method for a language in TIME ( f(n)), there
is at least one string p about which aDec makes the opposite decision. It follows that the
language A, the language decided by aDec, is not the same as any language in TIME ( f(n)).
Therefore A ∉ TIME(f(n)), and this completes the proof of Theorem B.2.
The proof of this time-hierarchy theorem uses a variation of the same technique
used in Section 18.2 to prove that Lu is not recursive. Note the similarity between the
rejectsItself method used in this proof and the nonSelfAccepting method
used there.
B.5 Application
The time-hierarchy theorem is a tool for proving that one complexity class is a proper subset
of another. For example, we can use it to prove that TIME(2^n) is a proper subset of TIME(2^(2n)).
To apply the theorem, we choose f(n) = 2^n and g(n) = 2^(2n)/log_2 n. (Note the separation here:
we choose a g that is at least a log factor below the time-complexity function we’re ultimately
interested in.) First, to invoke the time-hierarchy theorem, we need to know that g is time
constructible. Next, we need to know that f(n) is o(g(n)). By definition, this is true if for
every c > 0 there exists some n0, so that for every n ≥ n0, 2^n ≤ (1/c) · 2^(2n)/log_2 n. Simplifying, we
see that this inequality is satisfied whenever we have 2^n/log_2 n ≥ c, which is clearly true for
sufficiently large n, for any constant c. So we can apply the time-hierarchy theorem and
conclude that TIME(f(n)) ⊊ TIME(g(n) log n). Substituting back in for f and g, we conclude
that TIME(2^n) is a proper subset of TIME(2^(2n)).
This time-hierarchy theorem cannot be used to prove that time-complexity classes are
distinct when they differ by exactly one log factor. For example, though it is true that TIME (n)
is a proper subset of TIME (n log n), our time-hierarchy theorem cannot be used to prove it.
The theorem kicks in only when the complexity functions differ by more than a log factor, in
the little-o sense.
APPENDIX C
Some NP-Hardness Proofs
This appendix presents the proofs of some of the trickier NP-hardness
results from Chapter 21.
[Illustration: the grid, with one row for each ID in M's computation on x, the last row being row t(x).]
In terms of this grid, we can be still more specific about the mapping function f
we’re looking for; f (x) should be satisfiable if and only if the grid can be filled so
that
1. the first line is a starting ID of M on x (for some y), as shown in the illustration
above,
2. each line follows from the line before it according to the transition function of M
(or is the same as the line before it, when M has halted), and
3. the final row is an ID in an accepting state.
Now our mapping function f (x) can construct a big formula using these Boolean
variables. The formula is a conjunction of three parts, encoding requirements 1
through 3 above.
f(x) = starts ∧ moves ∧ accepts
Here’s where it gets technical, and we’ll skip these details. Roughly speaking:
1. starts enforces the requirement that the first line of the grid is a starting ID of M on
x. This establishes some of the values b1,j so that the start state is q0, the tape to the
left is all Bs, and the tape at and to the right of the head contains x#. It leaves the
y part, the certificate, unconstrained; ultimately, our formula will be satisfiable if
there is some y part that makes the whole system true.
2. moves enforces the requirement that each line follow from the line before it according to
M’s transition function. In fact, because of the way a Turing machine works, each
individual cell bi+1,j is a simple function of its near neighbors in the previous row,
bi,j-1 through bi,j+2:
[Illustration: a cell e = bi+1,j in one row is determined by the four cells a, b, c, and d, that is, bi,j-1 through bi,j+2, directly above it in the previous row.]
Theorem C.1: For any SAT formula, there is an equivalent formula in CNF.
Proof sketch: Any SAT formula can be transformed to CNF by using simple
laws of Boolean logic. First, all the ¬ operators can be moved inward until they
apply directly to variables, thus forming the literals for CNF. For this we can use
DeMorgan's laws and the law of double negation:
• ¬(x ∧ y) = (¬x ∨ ¬y)
• ¬(x ∨ y) = (¬x ∧ ¬y)
• ¬¬x = x
Then, all the ∨ operators can be moved inward until they apply directly to
literals, thus forming the clauses for CNF. For this we can use the distributive
law:
• (x ∧ y) ∨ z = (x ∨ z) ∧ (y ∨ z)
This is only a proof sketch; a more complete proof would have to define the structure of
SAT instances more closely and then use a structural induction to demonstrate that the
transformations proposed actually do handle all possible SAT instances. (You can experiment
with this for yourself; see Exercise 6 in Chapter 21.)
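To make the two steps concrete, here is a small Java sketch; the tiny syntax-tree classes and names are invented for this example and are not taken from the text. It pushes negations inward with DeMorgan's laws and the law of double negation, then distributes ∨ over ∧ exactly as in the proof sketch, making no attempt to control the size of the result.

abstract class F {}                                   // a Boolean formula
class Var extends F { String n; Var(String n){this.n=n;} public String toString(){return n;} }
class Not extends F { F a; Not(F a){this.a=a;} public String toString(){return "!"+a;} }
class And extends F { F l,r; And(F l,F r){this.l=l;this.r=r;} public String toString(){return "("+l+" & "+r+")";} }
class Or  extends F { F l,r; Or(F l,F r){this.l=l;this.r=r;} public String toString(){return "("+l+" | "+r+")";} }

public class NaiveCnf {
    // Step 1: move negations inward until they apply only to variables.
    static F nnf(F f) {
        if (f instanceof Not) {
            F a = ((Not) f).a;
            if (a instanceof Not) return nnf(((Not) a).a);                                              // !!x = x
            if (a instanceof And) return new Or(nnf(new Not(((And) a).l)), nnf(new Not(((And) a).r)));  // !(x&y) = !x|!y
            if (a instanceof Or)  return new And(nnf(new Not(((Or) a).l)), nnf(new Not(((Or) a).r)));   // !(x|y) = !x&!y
            return f;                                                                                    // negated variable: a literal
        }
        if (f instanceof And) return new And(nnf(((And) f).l), nnf(((And) f).r));
        if (f instanceof Or)  return new Or(nnf(((Or) f).l), nnf(((Or) f).r));
        return f;
    }
    // Step 2: distribute | over &; this duplicates subformulas, which is the source of the blow-up.
    static F dist(F f) {
        if (f instanceof And) return new And(dist(((And) f).l), dist(((And) f).r));
        if (f instanceof Or) {
            F l = dist(((Or) f).l), r = dist(((Or) f).r);
            if (l instanceof And) return new And(dist(new Or(((And) l).l, r)), dist(new Or(((And) l).r, r)));
            if (r instanceof And) return new And(dist(new Or(l, ((And) r).l)), dist(new Or(l, ((And) r).r)));
            return new Or(l, r);
        }
        return f;
    }
    public static void main(String[] args) {
        F phi = new Not(new Or(new Var("x"), new And(new Var("y"), new Var("z"))));
        System.out.println(dist(nnf(phi)));           // prints (!x & (!y | !z))
    }
}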
The theorem shows that any SAT instance φ can be converted into an equivalent
CNFSAT instance φ'. Then, of course, φ ∈ SAT if and only if φ' ∈ CNFSAT. So why doesn't
this prove that SAT ≤P CNFSAT? The problem is that, although it is a reduction from SAT
to CNFSAT, it is not always a polynomial-time reduction. In particular, that distributive
law is trouble. It makes duplicate terms; as written above, the right-hand side contains two
copies of the term z. For formulas that require this operation repeatedly, the result can be
an exponential blow-up in the size of the formula (and in the time it takes to construct it).
Consider a term of this form:
(a1 ∧ b1) ∨ (a2 ∧ b2) ∨ … ∨ (ak ∧ bk)
for some k. After the conversion, the CNF formula will have one clause for each way to
choose one variable from every pair, 2^k clauses in all. For example:
(a1 ∧ b1) ∨ (a2 ∧ b2) = (a1 ∨ a2) ∧ (a1 ∨ b2) ∧ (b1 ∨ a2) ∧ (b1 ∨ b2)
Next, all the ∨ operators can be moved inward until they apply directly to
literals, thus forming the clauses for CNF. For this we use a special variation of
the distributive law, one that introduces a new variable di:
(x ∧ y) ∨ z = (di ∨ x) ∧ (di ∨ y) ∧ (¬di ∨ z)
(Each application of this law generates a new variable di, distinct from any other
variable in φ.) The result of applying this law is no longer an equivalent formula,
because it contains a new variable. However, suppose the left-hand side is
satisfiable. Then there is some truth assignment that makes either x ∧ y or z true.
We can extend this truth assignment to one that makes the right-hand side true,
by making di = z. Conversely, suppose the right-hand side is satisfiable; then
there is some truth assignment that makes all three clauses true. This assignment
must make either di or ¬di false; therefore it must make either x ∧ y or z true.
So it is also a satisfying truth assignment for the left-hand side. Therefore, this
transformation preserves satisfiability.
Repeated application of this law (and the commuted version, x ∨ (y ∧ z) =
(di ∨ x) ∧ (¬di ∨ y) ∧ (¬di ∨ z)) completes the construction of a CNF formula
φ' that is satisfiable if and only if the original formula φ is satisfiable. The entire
construction takes polynomial time, so we conclude that SAT ≤P CNFSAT.
Because SAT is NP-hard, we conclude that CNFSAT is NP-hard.
This is only a proof sketch; a more complete proof would have to define the structure
of SAT instances more closely and then use a structural induction to demonstrate that
the transformations proposed actually do convert all possible SAT instances and can be
implemented to do it in polynomial time. (You can experiment with this for yourself; see
Exercise 8 in Chapter 21.)
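As a quick sanity check on the satisfiability-preservation argument, the following sketch (ours, with invented names) tries every assignment to x, y, and z for the single rewrite above: it confirms that setting di = z makes the right-hand side agree with the left-hand side, and that no choice of di satisfies the right-hand side when the left-hand side is false.

public class FreshVariableCheck {
    public static void main(String[] args) {
        for (int bits = 0; bits < 8; bits++) {
            boolean x = (bits & 1) != 0, y = (bits & 2) != 0, z = (bits & 4) != 0;
            boolean left = (x && y) || z;                    // (x AND y) OR z
            boolean d = z;                                   // the extension used in the argument above
            boolean right = (d || x) && (d || y) && (!d || z);
            // For the converse, no value of d may satisfy the right side when the left side is false.
            boolean converseHolds = true;
            for (boolean dTry : new boolean[]{false, true}) {
                boolean r = (dTry || x) && (dTry || y) && (!dTry || z);
                if (r && !left) converseHolds = false;
            }
            System.out.printf("x=%b y=%b z=%b left=%b right(d=z)=%b converseHolds=%b%n",
                    x, y, z, left, right, converseHolds);
        }
    }
}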
Then, for each clause, we’ll add three vertices connected in a triangle. Our example has two
clauses, so we’ll add this:
Finally, we’ll add edges from each triangle vertex to the vertex for the corresponding literal
in that clause. For instance, the first clause is (x ∨ y ∨ z), so our first triangle will get edges
from its three corners to the three vertices for x, y, and z. We end up with this graph:
[Illustration: the constructed graph, with the literal vertices x, ¬x, y, ¬y, z, ¬z across the top, a triangle for each of the two clauses below, and edges from each triangle corner to the corresponding literal vertex.]
Now consider a minimum-size vertex cover for this graph. It must include two of the
three vertices from each triangle—there’s no other way to cover those three triangle edges.
It must also include one vertex for each variable, the positive or negative literal, in order
to cover those edges at the top of the illustration. So no cover can contain fewer than
2m + n = 7 vertices. Here’s an example at that minimum size, with the covered vertices
circled:
[Illustration: the same graph with a cover of the minimum size 2m + n = 7 circled: one literal vertex for each variable and two corners of each triangle.]
The covered literals give us a satisfying assignment for φ: we covered x, y, and z, so let
x = true, y = true, and z = true. By construction, that makes at least one literal in each
clause true. In fact, any cover of size 2m + n gives a satisfying truth assignment, and any
satisfying truth assignment gives a cover of size 2m + n. This correspondence is the main
idea in the following proof.
First, for each of the n variables xi, create a variable component, consisting of two
vertices, labeled xi and ¬xi, with an edge between them:
[Illustration: the variable component: two vertices, xi and ¬xi, joined by an edge.]
This gives us a vertex for each possible literal. Next, for each of the m clauses
(l1 ∨ l2 ∨ l3), create a clause component: three vertices connected in a triangle, and
three connecting edges (the curved lines below) from each to the corresponding
vertex already created for that literal:
[Illustration: the clause component: a triangle on three new vertices, with a connecting edge from each corner to the vertex already created for l1, l2, or l3, respectively.]
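The construction itself is mechanical; the following sketch (the input format and vertex numbering are invented for this example) builds the edge list of the constructed graph from a 3-CNF formula given as triples of signed integers, and reports the cover size 2m + n that the reduction asks about.

import java.util.*;

public class SatToVertexCover {
    public static void main(String[] args) {
        int n = 3;                                    // variables x1, x2, x3
        int[][] clauses = {{1, 2, 3}, {-1, -2, -3}};  // a sample formula; +i means xi, -i means ¬xi
        List<int[]> edges = new ArrayList<>();
        // Vertex numbering: xi -> 2i-2, ¬xi -> 2i-1 (for i = 1..n);
        // the j-th clause triangle uses vertices 2n+3j, 2n+3j+1, 2n+3j+2.
        for (int i = 1; i <= n; i++)
            edges.add(new int[]{2 * i - 2, 2 * i - 1});            // variable component
        for (int j = 0; j < clauses.length; j++) {
            int base = 2 * n + 3 * j;
            edges.add(new int[]{base, base + 1});                  // clause component: a triangle
            edges.add(new int[]{base + 1, base + 2});
            edges.add(new int[]{base, base + 2});
            for (int c = 0; c < 3; c++) {                          // connect each corner to its literal's vertex
                int lit = clauses[j][c];
                int litVertex = lit > 0 ? 2 * lit - 2 : -2 * lit - 1;
                edges.add(new int[]{base + c, litVertex});
            }
        }
        // The graph has 2n + 3m vertices and n + 6m edges; ask for a cover of size k = n + 2m.
        int k = n + 2 * clauses.length;
        System.out.println("target cover size k = " + k);
        for (int[] e : edges) System.out.println(e[0] + " -- " + e[1]);
    }
}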
The widgets will be connected in the larger graph only by their corners. As such, there are
only two ways a widget can participate in a Hamiltonian circuit of that larger graph. The
circuit might enter at one corner, visit every vertex of the widget, and leave at the opposite
corner on the same side, like this:
We’ll call that a one-pass path. The alternative is that the circuit might make two separated
passes through the widget, visiting one side on one pass and the other side on the other, like
this:
We’ll call that a two-pass path. Our reduction will use the fact that there is no other way to
include a widget in a Hamiltonian path:
Lemma C.1: If a widget graph is connected in a larger graph only at its corners,
then a Hamiltonian circuit entering at one of the corners must exit at the
opposite corner on the same side, having visited either all the vertices in the
widget or all the vertices on that side of the widget and none on the other.
Proof: By inspection.
To convince yourself this is true, try making some paths through a widget. You’ll quickly see
why most of your choices are restricted by the need to visit every vertex exactly once.
In the following construction, there is one widget in the new graph for each edge in the
original graph. The widgets are connected together in such a way that Hamiltonian circuits
in the new graph correspond to vertex covers of the original graph. If an edge is covered from
just one end, our Hamiltonian circuit will visit its widget by a one-pass path; if it is covered
from both ends, by a two-pass path.
[Illustration: the widget for an edge (x, y): its y side runs from corner yxL to corner yxR, and its x side from corner xyL to corner xyR.]
Next, for each vertex x in G, let y1…yn be any ordering of all the vertices
connected to x in G, and add edges (xy1R, xy2L), (xy2R, xy3L), and so on through
(xyn-1R, xynL). These edges connect all the x sides (of widgets that have x sides)
into one path.
Finally, for each vertex x in G and for each of the new vertices zi in H, add
the edges from zi to the start of the x-side path (that is, from zi to xy1L) and
from the end of the x-side path to zi (that is, from xynR to zi ). These edges
complete paths leading from any zi, through any x-side path, to any zk. That
completes the construction. The size of H and the time to construct it are clearly
polynomial in the size of G.
Now we claim that the constructed graph H has a Hamiltonian circuit if and
only if G has a vertex cover of size k or less. First, suppose H has a Hamiltonian
circuit. This must include every zi. By construction the only way to get from one
zi to the next is by a ui-side path for some vertex ui, so the Hamiltonian circuit
must include k different ui-side paths. Let U = {u1, …, uk} be the set of vertices
whose ui-side paths are used in the circuit. Every vertex of every widget must be
included in the circuit, so every widget must occur on one or two ui-side paths
for some ui ∈ U. Therefore, every edge in G must have one or two vertices in U.
So U is a vertex cover for G.
On the other hand, suppose U = {u1, …, uk } is some vertex cover for G. Then
we can make a circuit in H as follows. Start at z1; take the u1-side path to z2;
then take the u2-side path to z3; and so on. At the end of the uk-side path, return
to z1. Because U is a vertex cover for G, every edge has either one or two ends in
U; therefore every widget occurs either once or twice in this circuit. If once, use
the one-pass path through that widget; if twice, use the two-pass paths. In this
way we visit every vertex exactly once, so this is a Hamiltonian circuit.
We have shown that Vertex Cover ≤P Hamiltonian Circuit. Because Vertex Cover
is NP-hard, we conclude that Hamiltonian Circuit is NP-hard.
Perhaps a quick example of this construction will help to elucidate the proof. Here is the
graph G for an instance of Vertex Cover; next to it is the graph H constructed by the proof,
with k = 1.
[Illustration: on the left, a graph G with vertices a, b, c, and d and edges (a, b), (b, c), and (c, d); on the right, the graph H constructed with k = 1, containing a single vertex z1 and one widget per edge, with corner labels such as baL, baR, bcL, bcR, dcL, and dcR.]
As you can see, G has no vertex cover of size k = 1, just as H has no Hamiltonian circuit. Any
set containing a single vertex fails to cover all the edges in G and corresponds to a path that
fails to visit all vertices in H. For example, the set {c} covers only two of the three edges in G,
just as the path from z1 through the c sides and back visits only two of the three widgets in
H.
On the other hand, if we allow k = 2, then G has a cover and H has a Hamiltonian
circuit:
[Illustration: the size-2 cover {b, c} circled in G, and the corresponding Hamiltonian circuit drawn in H, which now contains two vertices z1 and z2.]
The illustration shows the cover for G (with circled vertices) and shows (with solid lines) a
corresponding Hamiltonian circuit in H: from z1 through the b sides to z2, then through the
c sides and back to z1. Notice how the widgets for the edges (a, b) and (c, d ) are visited using
the one-pass path, since those edges are covered from only one end, while the widget for
(b, c) is visited using the two-pass paths, since it is covered from both ends.
Index
A B
abstract syntax tree, 152 backtracking search, 60–62, 96, 200
algorithm Backus, John, 150
approximation, 350, 359–360 Backus-Naur form. See BNF
Cocke-Kasami-Younger (CKY), 207 base case, 29
for deciding Primality, 316, 324 BNF (Backus-Naur form), 150–151, 215–216
for DFA minimization, 108, 113 bottom-up parsing, 205–207, 209
for edge cover minimization, 343
for finding an Euler circuit, 345 C
for finding minimum star height, 112 capturing parentheses, 94
for regular expression equivalence, 111 Cartesian product, 25
parsing. See parsing certificate of language membership, 332–337
tractable, 316 CFG (context-free grammar). See also grammar
using reduction, 265 conversion from stack machine, 173–176
verification, 353, 372 conversion to stack machine, 171–173
vs. effective computational procedure, 244 definition, 146
alphabet for regular languages, 147–148
binary, 229, 373 in the Chomsky hierarchy, 279
definition, 2 undecidable questions for, 280
in a 3-tape TM, 227 with balanced pairs, 148
in Java, 36 with concatenations, 148–149
input, in a stack machine, 166 with unions, 149–150
input, of a TM, 222 CFL (context-free language)
Kleene closure of, 4 closure under concatenation, 187–188
nonterminal, for a grammar, 121 closure under intersection with regular language, 188
of a DFA, 13 closure under Kleene star, 188
of a regular expression, 76 closure under union, 187
of an NFA, 51, 52 definition, 146
renaming, 174–175 not closed for complement, 189–191
stack, in a stack machine, 166 not closed for intersection, 189
stacked, 238–240 checkers, 323
tape, of a TM, 222 chess, 323
terminal, for a grammar, 121 Chomsky hierarchy, 279–280
ambiguity Chomsky, Noam, 150, 279, 281
inherent, in context-free languages, 154, 158 Church, Alonzo, 243, 244, 283
in grammars, 153–154 Church-Turing Thesis, 244
approximation problems, 349–350 Clay Mathematics Institute, 355
asymptotic bounds closure
common, 294–295 Kleene, of a language, 188
little-o, 367 Kleene, of an alphabet, 4
lower (big- ), 290 of CFLs under concatenation, 187–188
simplification of, 292–294 of CFLs under intersection with regular languages, 188
tight (big- ), 291 of CFLs under Kleene star, 188
upper (big-O), 290 of CFLs under union, 187
of regular languages under complement, 23 Diophantine equations, 283
of regular languages under intersection, 23–25 disjoint sets, 121
of regular languages under union, 26–27 doubly exponential time. See 2EXPTIME
properties of DCFLs, 209 DPDA (deterministic PDA), 208
properties of P, 329 draughts, 323
properties of PSPACE, 329 Dyck language, 158
CNF (conjunctive normal form), 338
CNF Satisfiability (CNFSAT), 338 E
Cocke-Kasami-Younger (CKY) algorithm, 207 EBNF (Extended BNF), 154
complement in regular expressions, 112, 324 Edge Cover, 343
complement of a language, 22 effective computational procedure, 243
complete axiom system, 282 egrep, 92–94
complexity class, 312 ELEMENTARY TIME, 112, 319, 324
compositionality, 50 empty set, 2
computable function, 244, 278 empty string, 2
concatenation enumerator, 271–275
of languages, 76, 148 ε-transitions
of strings, 3 cycles of, 224
conjunctive normal form (CNF), 338 for NFA composition, 79
consistent axiom system, 282 in NFAs, 47
context-free grammar. See CFG in stack machines, 161, 167
context-free language. See CFL in the subset construction, 68
context-sensitive grammar, 279 Euler Circuit, 345
context-sensitive language (CSL), 280 exponential growth, 295
Cook, Stephen, 337 EXPSPACE (exponential space)
Cook’s Theorem, 337, 346, 372–374 completeness, 111, 320, 323
CSL (context-sensitive language), 280 definition, 320
cubic growth, 295 hardness, 320
Curry, Haskell, 243 EXPTIME (exponential time)
completeness, 319, 323
D definition, 317
DCFL (deterministic CFL), 208–209 hardness, 319
decidable property, 244, 278
decision method, 256 F
declaration before use, 196 finite languages, 142
derivation relations, 122 finite-state transducer, 109–111
deterministic CFL (DCFL), 208–209 flex, 99
deterministic finite automaton. See DFA
deterministic PDA (DPDA), 208 G
deterministic stack machine, 211–212 go, 323
DFA (deterministic finite automaton) Gödel, Kurt, 282, 354
applications, 36 grammar. See also CFG (context-free grammar)
encoded as a string, 229–231 conversion from NFA to, 122–124
formal definition, 13–14 derivation in, 122
informal definition, 10–11 formal definition, 121
language accepted by, 14–15 language generated by, 121–122
minimization, 106–108 left-linear form, 127
simulated by a TM, 229–231 regular, 127
difference of sets, 33 right-linear form, 126–127, 279