Once Upon An Algorithm - How Stories Explain Computing
Martin Erwig
This book was set in Garamond by the author using the LaTeX document preparation
system. Printed and bound in the United States of America.
Contents
Preface
Acknowledgments
Introduction
Part I ALGORITHMS
Part II LANGUAGES
Glossary
Notes
Index
Preface
When people ask about my work, the conversation quickly turns to the question of
what computer science is. To say computer science is the science of computers is mis-
leading (though strictly speaking not incorrect) because most people will take computer
to mean PC or laptop and conclude that computer scientists spend their time construct-
ing hardware. On the other hand, defining computer science as the study of computa-
tion is simply kicking the can down the road, since it immediately raises the question
of what computation is.
Over the years, I have come to realize that teaching by just introducing concept after
concept doesn’t work very well; it is simply too abstract. Nowadays, I typically start
by describing computer science as the study of systematic problem solving. Everybody
knows what a problem is, and everybody has seen solutions as well. After explain-
ing this view through an example, I often get the chance to introduce the concept of
an algorithm, which allows me to point out important differences between computer
science and mathematics. Most of the time, I don’t need to talk about programming
languages, computers, and related technical matters, but even if it comes to that, the
concrete problem makes it easy to illustrate these concepts. Once Upon an Algorithm is
an elaboration of this approach.
Computer science is a relatively new member of the science club, and sometimes
it seems it has not yet earned the respect granted to serious scientific disciplines like
physics, chemistry, and biology. Think of a movie scene involving a physicist. You
will probably see someone discussing complicated formulas written on a blackboard or
supervising an experiment wearing a lab coat. The physicist is shown as a reputable sci-
entist whose knowledge is treasured. Now imagine a similar scene involving a computer
scientist. Here you will likely see some nerdy guy sitting in a dark, messy room and
staring at a computer screen. He is frantically typing on a keyboard, probably trying to
break some code or password. In both scenes, an important problem is being solved, but
while the physicist might provide some plausible explanation of how it can be solved,
the solution of the computer problem remains mysterious, often magical, and much
Table 1
Story                Chapters   Topics
Part I
Hansel and Gretel    1, 2       Computation and Algorithms
Sherlock Holmes      3, 4       Representation and Data Structures
Indiana Jones        5, 6, 7    Problem Solving and Its Limitations
Part II
Over the Rainbow     8, 9       Language and Meaning
Groundhog Day        10, 11     Control Structures and Loops
Back to the Future   12, 13     Recursion
Harry Potter         14, 15     Types and Abstraction
The Myth of Sisyphus raises the question of how to face the absurdity of life and also how
to detect a never-ending computation.
Stories have multiple layers of meaning. They often include a computational layer.
Once Upon an Algorithm is an effort to unveil this layer and offer readers a new per-
spective on stories and computation. I hope that stories will be appreciated for their
computational content and that this novel point of view will spark interest in computer
science.
Acknowledgments
The idea for Once Upon an Algorithm arose from many conversations with friends,
students, colleagues, and people I talk to on the bus on my way to work. I thank all of
them for their patience in listening to my explanations about computer science and for
their friendly impatience when the explanations became too long and complicated. The
goal of writing a widely accessible book about computer science was fueled to a large
degree by these experiences.
Over the last decade I’ve had the opportunity to work with many high school stu-
dents as summer interns, which provided additional encouragement. These internships
were supported by several grants from the National Science Foundation, which I
thank for its support of scientific research and science education in the United States.
In researching the material for Once Upon an Algorithm I have relied on the inter-
net, in particular, Wikipedia (wikipedia.org) and the TV Tropes website (tvtropes.org).
Thanks to all those contributors for their commitment and enthusiasm in sharing their
knowledge with the world.
While I was writing this book, Eric Walkingshaw, Paul Cull, and Karl Smeltzer read
some of the chapters and provided expert feedback on content and writing style. I want
to thank them for their helpful advice. Thanks also to Jennifer Parham-Mocello for
reading some of the chapters and testing several of the examples with her students in a
college freshman class. I also thank my son, Alexander, for proofreading the manuscript
and providing expert advice on questions related to Harry Potter. Most of this book
was written during a sabbatical from my position at Oregon State University. I thank
my department and the university for their support for this project.
Turning the idea for this book into reality was a much greater challenge than I
had anticipated. My sincere thanks to Marie Lufkin-Lee, Katherine Almeida, Kathleen
Hensley, and Christine Savage at MIT Press for supporting this project and helping me
with it along the way.
Finally, I am fortunate to be married to my most patient and candid reader. My
wife, Anja, encouraged me throughout the adventure of this book project. She always
had an open ear for my questions, which more often than not were quite nerdy and
abstract. She read many drafts and patiently tried to rein me in whenever my writing
was too academic and relied too much on technical jargon. The completion of Once
Upon an Algorithm owes more to her than to anyone else, and I dedicate this book to
her.
Introduction
Computation plays a prominent role in society now. But unless you want to become
a computer scientist, why should this prompt you to learn more about it? You could
simply enjoy the technology powered by computation and take advantage of its benefits.
You likely also don’t study avionics to make use of air travel or seek a medical degree to
benefit from the achievements of modern health care.
However, the world we live in consists of more than man-made technology. We
still have to interact with nonautonomous objects ruled by physical laws. Therefore,
it is beneficial to understand the basics of mechanics simply to predict the behavior
of objects and to safely navigate one’s environment. A similar case can be made for
the benefits of studying computation and its related concepts. Computation is not only
revealed in computers and electronic gadgets but also occurs outside of machines. In the
following I briefly discuss some of the major computer science principles and explain
why they matter.
(such as pencil, paper, and a ruler) and takes time to carry out. Here again, having an
algorithmic description is important because it helps in analyzing the resource require-
ments for computations.
Such techniques also apply to many other areas, from simple assembly instructions for furniture
to organizational processes for running an office or managing a factory floor.
In the realm of technology, algorithms control basically all the computations in
computers. One prominent example is data compression, without which the trans-
mission of music and movies over the internet would be nearly impossible. A data
compression algorithm identifies frequent patterns and replaces them with small codes.
Data compression directly addresses the resource problem of computation by reducing
the space required to represent songs and movies and thus also the time it takes to load
them over the internet. Another example is Google’s page-rank algorithm, which deter-
mines the order in which search results are presented to a user. It works by assessing the
importance of a web page through counting how many links point to it and weighing
how important those links are.
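To make this concrete, here is a minimal sketch of the page-rank idea in Python. The link graph is invented for illustration, and the weighting constants are common textbook choices; this is a toy version of the idea, not Google's actual algorithm.

    # A toy page-rank computation: a page is important if important
    # pages link to it. The link graph below is made up for illustration.
    links = {
        "A": ["B", "C"],   # page A links to pages B and C
        "B": ["C"],
        "C": ["A"],
        "D": ["C"],
    }

    rank = {page: 1.0 for page in links}          # start all pages as equals

    for _ in range(50):                           # repeatedly pass importance along links
        new_rank = {page: 0.15 for page in links}
        for page, targets in links.items():
            share = 0.85 * rank[page] / len(targets)
            for target in targets:
                new_rank[target] += share         # each link hands over a share
        rank = new_rank

    print(sorted(rank, key=rank.get, reverse=True))   # most important page first

Running this repeatedly redistributes importance until the ranks settle; a page like C, which collects links from several pages, ends up first.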
the way your fridge or pantry is organized has an impact on how fast you can retrieve all
the ingredients required for the recipe. Or consider how the question of representation
applies to the recipe itself. It may be given as a textual description, through a series of
pictures, or even in the form of a YouTube video. The choice of representation often
makes a big difference in the effectiveness of algorithms.
In particular, the question of how to arrange a collection of items or people has
many applications, for instance, how to organize your desk or garage to help you find
items more quickly or how to organize the book shelves in a library. Or consider the
different ways you wait in line: in a grocery store (standing in a queue) or at a doctor’s
office (sitting in a waiting room having picked a number) or when boarding an airplane
(multiple queues).
In the realm of technology, spreadsheets are among the most successful program-
ming tools. The organization of data in tabular form has contributed to much of
their success because it supports the quick and easy formulation of sums over rows
and columns and represents the data and computation results for them in one place.
On the other hand, the internet, one of the most transformative inventions of the late
twentieth century, organizes web pages, computers, and the connections between them
as a network. This representation supports flexible access to information and efficient
transmission of data.
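To see why the tabular form is so convenient, consider a minimal sketch in Python with a made-up table of numbers; the row and column sums fall out almost for free.

    # A made-up 'spreadsheet': rows of numbers in tabular form.
    table = [
        [10, 20, 30],
        [ 5,  5,  5],
        [ 1,  2,  3],
    ]

    row_sums = [sum(row) for row in table]         # one sum per row
    col_sums = [sum(col) for col in zip(*table)]   # one sum per column

    print(row_sums)   # [60, 15, 6]
    print(col_sums)   # [16, 27, 38]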
• that some computations can support others (for example, sorting can support
searching),
• that algorithms with exponential runtimes cannot really be considered solutions
to problems.
In the realm of technology, control structures are used wherever algorithms are
used, and thus they are everywhere. Any information sent over the internet is transmit-
ted in a loop repeatedly until it has been properly received. Traffic lights are controlled
by endlessly repeating loops, and many manufacturing processes contain tasks that are
repeated until a quality measure is met. Predicting the behavior of algorithms for un-
known future inputs has many applications in security. For example, one would like
to know if a system is vulnerable to attacks by hackers. It also applies to rescue robots
that have to be used in situations different from the ones they are trained in. Accurately
predicting robot behavior in unknown situations can mean the difference between life
and death.
Recursion
The principle of reduction—the process of explaining or implementing a complex sys-
tem by simpler parts—plays an important role in much of science and technology. Re-
cursion is a special form of reduction that refers to itself. Many algorithms are recur-
sive. Consider, for example, the instructions for looking up a word in a dictionary
that contains one entry per page: “Open the dictionary. If you can see the word, stop.
Otherwise, look up the word in the dictionary part before or after the current page.”
Notice how the look-up instruction in the last sentence is a recursive reference to the
whole process that brings you back to the beginning of the instructions. There is no
need for adding something like “repeat this until the word is found” to the description.
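This recursive look-up can also be written as a short program. Here is a sketch in Python, under the simplifying assumptions that the dictionary is an alphabetically sorted list of words, one per page, and that we always open it in the middle:

    def look_up(word, pages):
        # pages is an alphabetically sorted list of words, one per page
        if not pages:
            return False                    # the word is not in the dictionary
        middle = len(pages) // 2
        if pages[middle] == word:           # "If you can see the word, stop."
            return True
        if word < pages[middle]:            # otherwise, look up the word in the
            return look_up(word, pages[:middle])       # part before the page...
        return look_up(word, pages[middle + 1:])       # ...or the part after it

    print(look_up("pebble", ["algorithm", "forest", "pebble", "witch"]))  # True

Note that, just as in the prose description, there is no explicit "repeat": the function simply refers to itself on a smaller part of the dictionary.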
In chapter 12, I explain recursion, which is a control structure but is also used in
the definition of data organization. In chapter 13, I illustrate different approaches for
understanding recursion.
action can be described recursively by using the action to be repeated (here "season to
taste") in its description, together with a condition that says when to stop.
Recursion is an essential principle for obtaining finite descriptions of potentially
infinite data and computations. Recursion in the grammar of a language facilitates an
infinite number of sentences, and recursion in an algorithm allows it to process inputs
of arbitrary size.
Since recursion is a general control structure and a mechanism for organizing data,
it is part of many software systems. In addition, there are several direct applications of
recursion. For example, the Droste effect, in which a picture contains a smaller version
of itself, can be obtained as a result of a feedback loop between a signal (a picture) and
a receiver (a camera). The feedback loop is a recursive description of the repetitious ef-
fect. Fractals are self-similar geometric patterns that can be described through recursive
equations. Fractals can be found in nature, for example, in snowflakes and crystals, and
are also used in analyzing protein and DNA structures. Moreover, fractals are employed
in nanotechnology for designing self-assembling nanocircuits. Self-replicating machines
are a recursive concept because once they are operating, they reproduce copies of them-
selves that reproduce further copies, and so on. Self-replicating machines are investi-
gated for space exploration.
details is called abstraction, which is the subject of chapter 15, where I explain why
abstraction is central to computer science and how it applies not only to types but also
to algorithms, and even computers and languages.
[Figure 2: a map of the topics covered in this book and the chapters in which they are discussed: computation and algorithms; representation (chapters 3 & 4); problems (chapters 5 & 6); intractable and unsolvable problems (chapter 7); loops (chapter 10); termination (chapter 11); recursion (chapter 12); interpretation (chapter 13); and abstraction.]
I suggest reading chapters 1 and 2 first, since they introduce fundamental concepts of
computing that appear throughout the book, such as algorithm, parameter, computer,
and runtime complexity. These two chapters should be easy to follow.
The other six topic areas (light-shaded boxes in figure 2) are largely independent of
one another, but within each topic area the chapters should be read in order. Chapter 4
introduces several data structures, so it should be read before chapters 5, 6, 8, 12,
and 13.
Finally, the Glossary at the end of the book provides additional information about
how the chapters relate by grouping definitions into topic areas and cross-linking their
entries.
Part I
ALGORITHMS
[A map of Part I's topics and chapters: computation, algorithms, computers, and resources (chapters 1 & 2); representation (chapters 3 & 4); problems (chapters 5 & 6); and intractable problems (chapter 7).]
Computation and Algorithms
It is early in the morning. The alarm clock rings. You move your body out of bed—
eventually. You get dressed. This simple daily getting-up routine solves a recurring
problem through a series of well-defined steps. In computer science such a routine is
called an algorithm. Taking a shower, brushing your teeth, having breakfast, and so on,
are more examples of algorithms that solve specific problems.
But wait a minute. Except perhaps for not getting enough sleep, what is the problem
here? We usually do not perceive mundane everyday activities as producing solutions to
problems. Maybe this is because these problems have obvious solutions or the solutions
are easily obtained. However, the word problem is commonly used for situations or
questions that have well-known solutions. Think of exams, which contain problems
with well-defined answers. A problem is thus any question or situation that demands a
solution, even if it is clear how to get it. In this sense, having to get up in the morning
is a problem that has a well-known method for producing a solution.
Once we know how to solve a problem, we seldom wonder how the corresponding
method was conceived. In particular, when the method is obvious and simple to use,
there seems to be no point in reflecting on it. But thinking about how we can solve
problems can help us solve unknown problems in the future. Solutions to problems are
not always obvious. In hindsight, most solutions seem evident, but what if you didn't
know how to solve the getting-up problem? How would you approach it?
One crucial observation is that nontrivial problems can be decomposed into sub-
problems and that the solutions to the subproblems can be combined into a solution
for the original problem. The getting-up problem consists of the two subproblems:
getting out of bed and getting dressed. We have algorithms for solving both of these
problems, that is, moving our body out of the bed and putting on clothes, respectively,
and we can combine these algorithms into an algorithm for getting up, but we have to
be careful to do it in the right order. Since it is rather difficult to get dressed in bed, we
should take the step of getting out of bed first. If you don’t find this example convinc-
ing, think about the order in which to take a shower and to get dressed. In this simple
example there is only one ordering of the steps that leads to a viable solution, but this
is not always the case.
Problem decomposition is not limited to one level. For example, the problem of
getting dressed can be further decomposed into several subproblems, such as putting on
pants, a shirt, shoes, and so on. The nice thing about problem decomposition is that
it helps us to modularize the process of finding solutions, which means that solutions
to different subproblems can be developed independently of one another. Modularity
is important because it enables the parallel development of solutions to problems by
teams.
Finding an algorithm for solving a problem is not the end of the story. The algo-
rithm has to be put to work to actually solve the problem. Knowing how to do some-
thing and actually doing it are two different things. Some of us are made painfully aware
of this difference each morning when the alarm clock rings. Thus there is a difference
between an algorithm and its use.
In computer science each use of an algorithm is called a computation. So does that
mean, when we actually do get up, that we are computing ourselves out of bed and into
our clothes? That sounds crazy, but what would we say about a robot that did the same
thing? The robot needs to be programmed to accomplish the task. In other words,
the robot is told the algorithm in a language it understands. When the robot executes
its program to get up, wouldn’t we say it carries out a computation? This is not to
say that humans are robots, but it does show that people compute when they execute
algorithms.
The power of algorithms derives from the fact that they can be executed many
times. Like the proverbial wheel that should not be reinvented, a good algorithm, once
developed, is there to stay and serve us forever. It will be reused by many people in many
situations to reliably compute solutions to recurring problems. This is why algorithms
play a central role in computer science and why the design of algorithms is one of the
most important and exciting tasks for computer scientists.
Computer science could be called the science of problem solving. Even though
you won’t find this definition in many textbooks, this perspective is a useful reminder
of why computer science affects more and more areas of our lives. Moreover, many
useful computations happen outside of machines and are performed by people (without
a computer science education) to solve problems. Chapter 1 introduces the concept
of computation and, through the story of Hansel and Gretel, highlights its problem-
solving and human aspects.
1 A Path to Understanding Computation
What is computation? This question lies at the core of computer science. This chapter
provides an answer—at least a tentative one—and connects the notion of computation
to some closely related concepts. In particular, I explain the relationship between com-
putation and the concepts of problem solving and algorithms. To this end, I describe
two complementary aspects of computation: what it does, and what it is.
The first view, computation solves problems, emphasizes that a problem can be solved
through computation once it is suitably represented and broken down into subprob-
lems. It not only reflects the tremendous impact computer science has had in so many
different areas of society but also explains why computation is an essential part of all
kinds of human activities, independent of the use of computing machines.
However, the problem-solving perspective leaves out some important aspects of
computation. A closer look at the differences between computation and problem solv-
ing leads to a second view, computation is algorithm execution. An algorithm is a precise
description of computation and makes it possible to automate and analyze computa-
tion. This view portrays computation as a process consisting of several steps, which
helps explain how and why it is so effective in solving problems.
The key to harnessing computation lies in grouping similar problems into one class
and designing an algorithm that solves each problem in that class. This makes an algo-
rithm similar to a skill. A skill such as baking a cake or repairing a car can be invoked
at different times and thus can be employed repeatedly to solve different instances of
a particular problem class. Skills can also be taught to and shared with others, which
gives them an even wider impact. Similarly, we can execute an algorithm repeatedly for
different problem instances and generate with each execution a computation that solves
the problem at hand.
the next are easy to achieve. When combined, they yield a move-
ment from the starting location in the forest to the home. This movement solves Hansel
and Gretel’s problem of survival in a systematic way. Systematic problem solving is one
key characteristic of computation.
As this example illustrates, a computation usually consists of not just one but many
steps. Each of these steps solves a subproblem and changes the problem situation a
little bit. For example, each move by Hansel and Gretel to the next pebble is a step in
the computation that changes their position in the forest, which corresponds to solving
the subproblem of reaching the next target on the path home. While in most cases
each individual step will bring the computation closer to the solution, this does not
necessarily have to be the case for every step. Only all steps taken together have to yield
the solution. In the story, while each position that Hansel and Gretel go through will
generally be closer to home, it is also likely that the path is not a straight line. Some
pebbles may even cause detours, for example, to move around obstacles or to cross a
river using a bridge, but this does not change the effect of the combined movement.
The important lesson is that a solution is obtained through a systematic problem
decomposition. While decomposition is a key strategy to obtaining a solution to a prob-
lem, it is not always sufficient by itself, and solutions may depend on supplementary
items—in the case of Hansel and Gretel, the pebbles.
Figure 1.1 Computation is a process for solving a particular problem. Usually, a computation
consists of several steps. Starting with a representation of the problem, each step transforms the
representation until the solution is obtained. Hansel and Gretel solve the problem of surviving
through a process of changing their position step-by-step and pebble-by-pebble from within the
forest to their home.
computer, that is, Hansel and Gretel, to actually carry out the steps of the computa-
tion. It is common to have several layers of representation. In this case we have one
that defines the problem (locations) and one that makes it possible to compute a so-
lution (pebbles). In addition, all the pebbles taken together constitute another level
of representation, since they represent the path out of the forest back home. These
representations are summarized in table 1.1.
Figure 1.1 summarizes the problem-solving picture of computation; it shows Hansel
and Gretel’s way-finding as an instance of the view that computation manipulates repre-
sentation in a sequence of steps. In the getting-up problem we also find representations,
for example, as location (in bed, out of bed) and as an alarm clock representing time.
Representations can take many different forms. They are discussed in more detail in
chapter 3.
Table 1.1

Computation Representation             Problem Representation
Object         Represents              Concept               Represents
One pebble     Location in forest      Location in forest    Danger
Home           Home                    Home                  Safety
All pebbles    Path out of forest      Path out of forest    Problem solution
Figure 1.2 Distinguishing problem solving from computation. A computation whose effect
carries no meaning in the real world does not solve any problems. An ad hoc solution to a
problem that is not repeatable is not a computation.
finger. This idea is not the result of a systematic computation, but it solves the problem:
it postpones the witch’s eating Hansel.
Another example of noncomputational problem solving occurs right after Hansel
and Gretel have returned home. The parents plan to lead them into the forest again
the next day, but this time the stepmother locks the doors the night before to prevent
Hansel from collecting any pebbles. The problem is that Hansel cannot get access to
the pebbles that served him so well the first time and that he relied on for finding
the way back home. His solution is to find a substitute in the form of breadcrumbs.
The important point here is how Hansel arrives at this solution—he has an idea, a
creative thought. A solution that involves a eureka moment is generally very difficult,
if not impossible, to derive systematically through a computation because it requires
reasoning on a subtle level about objects and their properties.
Unfortunately for Hansel and Gretel, the breadcrumbs solution doesn’t work as
expected:
When the moon came, they set out, but they found no crumbs, for the many
thousands of birds which fly about in the woods and fields had picked them all
up.1
Since the breadcrumbs are gone, Hansel and Gretel can’t find their way home, and the
rest of the story unfolds.
However, let us assume for a moment that Hansel and Gretel could somehow have
found their way back home again and that the parents would have tried a third time
to leave them behind in the forest. Hansel and Gretel would have had to think of
yet another means to mark their way home. They would have had to find something
else to drop along the way, or maybe they would have tried to mark trees or bushes.
Whatever the solution, it would have been produced by thinking about the problem
and having another creative idea, but not by systematically applying a method. This
highlights the other criterion for computation, which is its ability to be repeated and
solve many similar problems. The method for solving the way-finding problem by
following pebbles is different in this regard because it can be executed repeatedly for
many different pebble placements.
To summarize, while the problem-solving perspective shows that computation is a
systematic and decomposable process, it is not sufficient to give a comprehensive and
precise picture of computation. Viewing computation as problem solving demonstrates
how computation can be employed in all kinds of situations and thus illustrates its
importance but ignores some important attributes that explain how computation works
and why it can be successfully applied in so many different ways.
problem. For example, a recipe for a cake will produce the same cake over and over
again. Since the output of the algorithm is transient—the cake gets eaten—reproducing
the same result is very useful. The same holds for the problem of getting out of bed
and dressed; the effect of the algorithm has to be reproduced every day, although likely
with different clothes and at a different time on weekends. This also applies to Hansel
and Gretel. Even if they are brought to the same place in the forest as on the first day,
getting home has to be recomputed by repeating the algorithm to solve the exact same
problem.
The situation is different for algorithms that produce nonphysical, abstract results,
such as numbers. In such a case one can simply write down the result and look it up the
next time it is needed instead of executing the algorithm again. For an algorithm to be
useful in such situations it must be able to solve a whole class of problems, which means
that it must be possible to apply the method to several different, but related, problems.2
In the story the method is general enough to solve many different way-finding prob-
lems, since the exact positions of the pebbles do not matter. No matter where exactly
in the forest the parents lead the children to, the algorithm will work in every case3 and
consequently will cause a computation that solves Hansel and Gretel’s survival problem.
Much of the power and impact of algorithms comes from the fact that one algorithm
gives rise to many computations.
The notion of an algorithm is one of the most important concepts in computer
science because it provides the foundation for the systematic study of computation.
Accordingly, many different aspects of algorithms are discussed throughout this book.
machine, then the proliferation of computation seems almost limitless, bounded only
by the resources needed to build and operate computers.
Does the getting-up algorithm need a description in a language? Probably not.
Through repeated execution we all have internalized the steps to the point that we
execute them unconsciously and don’t need a description. However, for some parts of
this algorithm there do exist descriptions, often given as a sequence of pictures. Think
about tying a tie or arranging a sophisticated braided hairstyle. If you do this for the
first time and no one is available to demonstrate the technique, you can learn the skill
from such a description.
The ability to express an algorithm in a language has another important effect. It
allows the systematic analysis and formal manipulation of algorithms, which is the sub-
ject of computer science theory and programming languages research.
An algorithm must be expressible in a language that can be understood by the com-
puter to execute it. Moreover, the description must be finite, that is, it is bounded and
doesn’t go on forever. Finally, each of its individual steps must be effective, that is,
whoever executes the algorithm must be able to understand and perform all the steps.
Hansel and Gretel’s algorithm is clearly finite, since it contains only a few instructions,
and the individual steps are also effective, at least when we assume that the pebbles are
placed within viewing distance of one another. One might have some doubts about
the requirement to always find a pebble that was not visited before; this can be dif-
ficult because of the need to remember all the previously encountered pebbles. This
requirement could be easily realized, though, by simply picking up each pebble imme-
diately after it has been visited. However, this would then be a different algorithm.
Incidentally, that algorithm would have made it easy for Hansel and Gretel to find their
way back on the second day, since Hansel would have held onto all the pebbles. The
slightly changed algorithm would have made the story as told by the Brothers Grimm
impossible (depriving us, alas, of a classic fairy tale).
A Wish List
In addition to the defining characteristics, there are several desirable features for algo-
rithms. For example, an algorithm should always produce a computation that termi-
nates and that delivers correct results. Since Hansel has placed a finite number of pebbles
that mark the way to the parents’ house, an execution of the described algorithm will
terminate because it visits each pebble no more than once. Surprisingly, however, it
might not produce a correct result in all cases because the process could get stuck.
Figure 1.3 A path that illustrates a possible dead end in an algorithm. Left: Visiting the pebbles
in the reverse order leads to Hansel and Gretel's home. Right: Since pebbles B, C, and D are
all within viewing distance from one another, Hansel and Gretel could choose to go from D to
B and then to C. At that point, however, they are stuck because no pebble that they haven't
visited before is visible from C. Specifically, they cannot reach A, which is the next pebble on
the path home.
As stated, the algorithm does not say which pebble exactly to go to. If the parents
are leading Hansel and Gretel not in a straight line but, say, along a zig-zag path into the
forest, it could happen that from one pebble several other pebbles are visible. Which
pebble should Hansel and Gretel go to in such a case? The algorithm doesn’t say. Under
the assumption that all pebbles are placed in viewing distance from one another, we can
encounter the following situation, illustrated in figure 1.3.
Imagine a series of pebbles A, B, C, and D, placed by Hansel on their way into the
forest. Suppose that A can be seen from B, and B can be seen from C, but A is too far
away to be visible from C. (This is indicated in the figure by the circles of visibility
around pebbles B and C.) Moreover, suppose that D is within viewing distance from
both B and C. This means that when Hansel and Gretel arrive at D, they can see the
two pebbles B and C and have a choice to make. If they choose to go to C, then they
will find B next and finally A, and everything will be fine (see left part of figure 1.3).
However, if they choose B instead of C—which is possible according to the algorithm,
since B is a visible pebble not visited before—they can get into trouble because if they
next choose C—which again is visible and has not been visited yet—they will be stuck
at C. This is because the only pebbles that they can see from C are B and D, both
of which have already been visited and therefore cannot be chosen, according to the
algorithm (see right part of figure 1.3).
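The dead end can be reproduced mechanically. The following Python sketch encodes the visibility relation of figure 1.3 and enumerates every walk that the rule "go to a visible pebble not visited before" permits, starting at pebble D and treating A as the pebble next to home:

    # Which pebbles are visible from which, following figure 1.3.
    visible = {
        "A": {"B"},
        "B": {"A", "C", "D"},
        "C": {"B", "D"},
        "D": {"B", "C"},
    }

    def walks(position, path):
        # Yield every walk the rule allows, ending either 'home' or 'stuck'.
        if position == "A":                 # A is the pebble next to home
            yield path, "home"
            return
        options = sorted(p for p in visible[position] if p not in path)
        if not options:
            yield path, "stuck"             # no admissible pebble in sight
            return
        for pebble in options:
            yield from walks(pebble, path + [pebble])

    for path, outcome in walks("D", ["D"]):
        print(" -> ".join(path), outcome)
    # D -> B -> A home
    # D -> B -> C stuck    (the dead end described above)
    # D -> C -> B -> A home

Of the three walks the algorithm permits, one ends stuck at C, exactly as described in the text.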
Of course, we could try to fix the algorithm by adding instructions to backtrack
in cases like this and choose a different alternative, but the point of this example is to
illustrate a case in which a given algorithm does not produce a correct result. It also
shows that the behavior of an algorithm is not always easy to predict, which makes the
design of algorithms a challenging and interesting endeavor.
The termination of algorithms is not an easy-to-recognize property either. If we
remove the condition in the algorithm to find only nonvisited pebbles, a computation
can easily slip into a nonterminating back-and-forth movement between two pebbles.
One might object that Hansel and Gretel would never do such a silly thing and would
recognize such a repeating pattern. This might be true, but then they would not be
following the exact algorithm and would, in fact, be deliberately avoiding a previously
visited pebble.
Whereas the case of a nonterminating back-and-forth between
two pebbles would be easy to spot, the problem can be much more
difficult generally. Imagine a path taken by the parents into the
forest that crosses itself a few times. The resulting pebble placement
would include several loops, each of which Hansel and Gretel could be caught in; only
by remembering visited pebbles could they be certain to avoid such loops. Chapter 11
considers the problem of termination in more detail.
Questions of correctness and termination do not seem that important for the
getting-up algorithm, but people have been known to put on nonmatching socks or
to violate the rules for correctly buttoning a shirt. And if you persist in repeatedly
hitting the snooze button, the getting-up algorithm will not even terminate.
Starting the Day
Most people’s day doesn’t really start before they’ve had breakfast. Cereal, fruit, eggs
with bacon, juice, coffee—whatever is on the menu, chances are that breakfast needs
some form of preparation. Some of these preparations can be described by algorithms.
If you like to vary your breakfast, for example, by adding different toppings to
your cereal or brewing different amounts of coffee, the algorithm that describes the
preparation must be able to reflect this flexibility. The key to providing controlled
variability is to employ one or more placeholders, called parameters, that are replaced
by concrete values whenever the algorithm is executed. Using different values for a
placeholder causes the algorithm to produce different computations. For example, a
parameter “fruit” can be substituted by different fruits on different days, allowing the
execution of the algorithm to produce blueberry cereal as well as banana cereal. The
getting-up algorithm also contains parameters so that we are not forced to wake up at
the same time and wear the same shirt every day.
If you are grabbing coffee from a coffee shop on your way to work, or if you are or-
dering breakfast in a restaurant, algorithms are still employed in producing your break-
fast. It is just that other people have to do the work for you. The person or machine that
is executing an algorithm is called a computer and has a profound effect on the outcome
of a computation. It may happen that a computer is unable to execute the algorithm if it
doesn’t understand the language in which the algorithm is given or if it is unable to per-
form one of its steps. Imagine you are a guest on a farm, and the algorithm for getting
your morning milk involves your milking a cow. This step might prove prohibitive.
But even if a computer can execute all the steps of an algorithm, the time it takes to
do so matters. In particular, the execution time can vary substantially between different
computers. For example, an experienced cow milker can extract a glass of milk faster
than a novice. However, computer science mostly ignores these differences because they
are transient and not very meaningful, since the speed of electronic computers increases
over time—and novice cow milkers can get more experienced and thus faster. What
is of great importance, however, is the difference in execution time between different
algorithms for the same problem. For example, if you want to get a glass of milk
for everyone in your family, you could fetch each glass separately, or you could fetch
a milk can once and then fill all glasses at the breakfast table. In the latter case you
have to walk the distance to the stable only twice, whereas in the former case you have
to walk it ten times for a family of five. This difference between the two algorithms
exists independently of how fast you can milk or walk. It is thus an indicator for the
complexity of the two algorithms and can be the basis for choosing between them.
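A back-of-the-envelope count shows the difference between the two fetching methods. The functions below are hypothetical and measure cost in one-way walks between house and stable:

    def walks_glass_by_glass(family_size):
        return 2 * family_size    # one round trip to the stable per glass

    def walks_with_milk_can(family_size):
        return 2                  # a single round trip, whatever the family size

    print(walks_glass_by_glass(5), walks_with_milk_can(5))   # 10 versus 2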
In addition to execution time, algorithms may also differ with regard to other re-
sources that are required for their execution. Say your morning drink is coffee and
not milk, and you have the option to brew coffee with a coffee maker or use a French
press. Both methods require water and ground coffee, but the first method additionally
requires coffee filters. The resource requirements for different milking algorithms are
even more pronounced. Getting fresh milk requires a cow, whereas milk bought at a
grocery store requires a fridge for storing it. This example also shows that computation
results can be saved for later use and that computation can be sometimes traded for stor-
age space. We can save the effort of milking a cow by storing previously drawn milk in
a fridge.
The execution of an algorithm has to pay for its effect with the use of resources.
Therefore, in order to compare different algorithms for the same problem, it is impor-
tant to be able to measure the resources they consume. Sometimes we may even want
to sacrifice correctness for efficiency. Suppose you are on your way to your office and
you have to grab a few items from the grocery store. Since you are in a hurry, you leave
the change behind instead of storing the coins returned to you. The correct algorithm
would exchange the exact amount of money for the bought items, but the approxima-
tion algorithm that rounds up finishes your transaction faster.
Studying the properties of algorithms and their computations, including the re-
source requirements, is an important task of computer science. It facilitates judging
whether a particular algorithm is a viable solution to a specific problem. Continuing
with the story about Hansel and Gretel, I explain in chapter 2 how different computa-
tions can be produced with one algorithm and how to measure the resources required.
2 Walk the Walk: When Computation Really Happens
In the previous chapter we saw how Hansel and Gretel solved a problem of survival
by computing their way back home. This computation transformed their position sys-
tematically in individual steps, and it solved the problem by moving from a position in
the forest, representing danger, to the final position at home, representing safety. The
back-to-home computation was the result of executing an algorithm to follow a path of
pebbles. Computation happens when algorithms get to work.
While we now have a good picture of what computation is, we have only seen one
aspect of what computation actually does, namely, the transformation of representation.
But there are additional details that deserve attention. Therefore, to expand understand-
ing beyond the static representation through algorithms, I discuss the dynamic behavior
of computation.
The great thing about an algorithm is that it can be used repeatedly to solve dif-
ferent problems. How does this work? And how is it actually possible that one fixed
algorithm description can produce different computations? Moreover, we have said that
computation results from executing an algorithm, but who or what is executing the al-
gorithm? What are the skills needed to execute an algorithm? Can anybody do it? And
finally, while it is great to have an algorithm for solving a problem, we have to ask at
what cost. Only when an algorithm can deliver a solution to a problem fast enough and
with the resources allocated is it a viable option.
Creating Variety
Getting back to Hansel and Gretel, we’ve seen that their pebble-tracing algorithm can
be used repeatedly and in different situations. Let us now take a closer look at how this
actually works. Since the description in the algorithm is fixed, some part of this descrip-
tion must account for the variability in computations. This part is called a parameter.
A parameter in an algorithm stands for some concrete value, and when an algorithm is
executed, a concrete value must be substituted for the parameter in the algorithm. Such
a value is called an input value, or just input, for the algorithm.
For example, an algorithm for making coffee may use a parameter number to repre-
sent the number of cups to be brewed so that the algorithm’s instructions can refer to
this parameter. Here is an excerpt from such an algorithm:1
To execute this algorithm for three cups of coffee, you have to substitute the input value
3 for the parameter number in the algorithm’s instructions, which yields the following
specialized version of the algorithm:
The use of the parameter makes the algorithm applicable in a variety of different sit-
uations. Each situation is represented by a different input value (in the example, the
number of cups of coffee to be brewed) to be substituted for the parameter, and the
substitution adapts the algorithm to the situation represented by the input value.
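As a sketch of how this substitution looks in a program, here is a hypothetical make_coffee function in Python. The parameter number plays the role of the placeholder, and the call supplies the input value that replaces it:

    def make_coffee(number):                     # 'number' is the parameter
        print(f"Fill the tank with {number} cups of water.")
        print(f"Add {number} scoops of ground coffee.")
        print("Switch the coffee maker on.")

    make_coffee(3)    # substitute the input value 3 for the parameter

Calling make_coffee(3) yields the specialized three-cup behavior; calling it with a different input value adapts the same fixed description to a different situation.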
Hansel and Gretel's algorithm uses a parameter for the pebbles placed in the forest.
I did not specify the parameter exactly before because the instruction "Find a pebble
not visited before" clearly referred to the pebbles placed by Hansel. The parameter can
be made explicit by using the following instruction instead: "Find a pebble from the
pebbles-placed-by-Hansel not visited before." For each execution of the algorithm, the
parameter pebbles-placed-by-Hansel is then replaced by the pebbles that Hansel has
dropped—at least we can think of it this way. Since we obviously cannot physically
place pebbles into the algorithm description, we treat the parameter as a reference or
pointer to the
input value. A pointer is a mechanism to access the input value; it tells us where to
look for the input value when the algorithm needs it. In the way-finding algorithm, the
input value is to be found on the forest floor. In the coffee-making algorithm, we have
the input value in our mind, and whenever the parameter refers to it, we retrieve it.
Nonetheless, the idea of substitution provides a useful analogy that helps to make the
relationship between an algorithm and a computation concrete.
By introducing a parameter and using it to replace concrete values we can generalize
an algorithm to work in many situations. For example, if a particular getting-up algo-
rithm contains the instruction “Wake up at 6:30 a.m.,” we can replace the concrete time
value by a parameter wake-up-time, which leads to the generalized instruction “Wake
up at wake-up-time.” Similarly, the cereal experience can be expanded by making the
parameter fruit part of the preparation algorithm.
The flip side is that in order to execute the getting-up algorithm we now need to
supply it with an input value so that the instructions can be made concrete by replacing
the parameter with a time value. This is generally not a problem, but it requires a
decision and is a potential source of mistakes. It all depends how valuable the variability
is. An alarm clock that can wake you up at only one preset time that can never be
changed is probably unacceptable, but many people can do without a parameter for
selecting different alarm sounds.
Finally, an algorithm that has no parameters and thus cannot take different input
values will always produce the same computation. As mentioned before, this is not
a problem for algorithms that have a transient physical effect, as in the case of cake
recipes, where the produced cakes get eaten, or for fetching milk, when the milk is
drunk. The recomputation of the same effect makes sense in these cases. However,
algorithms for computing values that can be stored and reused at later times need one
or more parameters to be reusable.
Parameters are a crucial component of algorithms, but the question of how general
or specific an algorithm should be does not have a simple answer. It is discussed in
chapter 15.
Who Is Performing?
As we have seen, a computation is the result of executing an algorithm. This raises the
question of who or what can execute an algorithm, and how. The examples have shown
that people can certainly execute algorithms, but so can (electronic) computers. Are
there other possibilities? And what are the requirements for executing an algorithm?
Figure 2.1 The execution of an algorithm generates a computation. The algorithm describes
a method that works for a whole class of problems, and an execution operates on the represen-
tation of a particular example problem. The execution must be performed by a computer, for
example, a person or a machine, that can understand the language in which the algorithm is
given.
The word for somebody or something that can perform a computation is, of course,
computer. In fact, the original meaning of the word referred to people who carried out
computations.2 In the following, I use the term in a general sense that refers to any
natural or artificial agent that can perform computations.
We can distinguish two principal cases of algorithm execution based on the abilities
of the performing computer. On the one hand, there are universal computers, such
as people or laptops or smart phones. A universal computer can in principle execute
any algorithm as long as it is expressed in a language understandable by the computer.
Universal computers establish an execution relationship between algorithms and com-
putation. Whenever the computer executes an algorithm for a particular problem, it
performs steps that change some representation (see figure 2.1).
On the other hand, there are computers that execute just one
algorithm (or perhaps a set of predefined algorithms). For exam-
ple, a pocket calculator contains hard-wired electronic circuits for
executing algorithms to perform arithmetic computations, as does
an alarm clock for making sounds at particular times. Another
intriguing example can be found in cell biology.
Think about what happens in your cells millions of times while you are reading
this sentence. Ribosomes are producing proteins to support your cells’ functions. Ribo-
somes are little machines that assemble proteins as described by RNA molecules. These
are sequences of nucleotides that tell the ribosome which amino acids to chain together
into specific proteins. You are
alive thanks to the computation that results from the ribosomal computers in your cells
reliably executing an algorithm to translate RNA molecules into proteins. But even
though the algorithm used by a ribosome can produce a huge variety of proteins, this is
the only algorithm that ribosomes can ever execute. While very useful, ribosomes are
limited; they cannot get you dressed or find a way out of the forest.
In contrast to computers that consist of hard-wired algorithms, an important re-
quirement for universal computers is that they understand the language in which their
algorithms are given. If the computer is a machine, the algorithm is also called a pro-
gram, and the language in which it is given is called a programming language.
If Hansel and Gretel wrote their memoirs and included a description of the algorithm
that saved their lives, other children with access to that book could execute the
algorithm only if they understood the language in which the book was written. This
requirement does not apply to nonuniversal computers that carry out only fixed,
hard-wired algorithms.
A requirement that applies to every kind of computer is the ability to access the
representation that is used by the algorithm. In particular, a computer must be able to
effect the required changes to the representation. If Hansel and Gretel were chained to a
tree, the algorithm would be of no help to them, since they wouldn’t be able to change
their position, which is what an execution of the way-finding algorithm requires.
To summarize, any computer has to be able to read and manipulate the representa-
tions that the algorithms operate on. In addition, a universal computer has to under-
stand the language in which algorithms are expressed. From now on, I use the term
computer to mean a universal computer.
enough is an entirely different matter. Related is the question of whether the computer
in charge has enough resources available to perform the computation in the first place.
For example, when Hansel and Gretel follow the pebble trace back home, the entire
computation takes as many steps as the number of pebbles that Hansel dropped.3 Note
that step here means "step of the algorithm" and not "step as a part of walking." In
particular, one step of the algorithm generally corresponds to several steps taken by
Hansel and Gretel through the forest. Thus the number of pebbles is a measure for the
execution time
of the algorithm, since one step of the algorithm is required for each pebble. Thinking
about the number of steps an algorithm needs to perform its task is judging its runtime
complexity.
Moreover, the algorithm works only if Hansel and Gretel have
enough pebbles to cover the path from their home to the place in the
forest where they are left by their parents. This is an example of a re-
source constraint. A shortfall of pebbles could be due either to a limited
availability of pebbles, which would be a limit of external resources, or to the limited
space offered by Hansel’s pockets, which would be a limit of the computer. To judge an
algorithm’s space complexity means to ask how much space a computer needs to execute
the algorithm. In the example, this amounts to asking how many pebbles are needed to
find a path of a particular length and whether Hansel’s pockets are big enough to carry
them all.
Therefore, while the algorithm may work in theory for any place in the forest, it is
not clear ahead of time that a computation will succeed in practice because it may take
too much time or require an amount of resources exceeding what is available. Before
examining computation resources more closely, I explain two important assumptions
about measuring computing costs that make this kind of analysis practical at all. In the
following, I focus on the runtime aspect, but the discussion also applies to the question
of space resources.
an algorithm—a description that does not just apply to a particular computation but
captures all computations. In other words, we are looking for a generalization of cost
descriptions. This generalization can be achieved by using parameters that can make the
number of steps required for executing an algorithm dependent on the size of its input.
Thus runtime complexity is a function that yields the number of steps in a computation
for an input of a given size.
For example, the number of computation steps, and thus the time,
required to execute the pebble-tracing algorithm is roughly equal to the
number of dropped pebbles. Since paths to different places in the forest
generally contain different numbers of pebbles, computations for these
paths also require different numbers of steps. This fact is reflected in
expressing runtime complexity as a function of the size of input. For the pebble-tracing
algorithm it is easy to derive a precise measure for each computation, since the number
of computation steps seems to be in one-to-one correspondence with the number of
pebbles. For example, for a path with 87 pebbles, the computation requires 87 steps.
However, this is not always the case. Take another look at the path depicted in
figure 1.3. That example was used to illustrate how the algorithm can get stuck, but we
can also use it to demonstrate how the algorithm can produce computations that take
fewer steps than there are pebbles. Since B and C are visible from D, we can pick B,
and since both A and C are visible from B, we can next pick A; that is, the path D, B,
A is a valid path, and since it bypasses C, the computation contains at least one less
step than the number of pebbles in the path.
Note that in this case the number of steps in the computation is actually lower than
predicted by the measure for the algorithm, which means that the cost of the computa-
tion is overestimated. Thus, runtime complexity reports the complexity a computation
can have in the worst case. This helps us decide whether or not to execute the algorithm
for a particular input. If the estimated runtime is acceptable, then the algorithm can
be executed. If the computation actually performs faster and takes fewer steps, then all
the better, but the worst-case complexity provides a guarantee that the algorithm will
not take more time. In the case of getting up, your worst-case estimate for the time it
takes to take a shower in the morning might be 5 minutes. Since this includes the time
it takes for the water to get warm, your actual shower time might be shorter if someone
has taken a shower before you.
Since runtime analysis is applied on the level of algorithms, only algorithms (not
individual computations) are amenable to an analysis. This also means that it is possible
to assess the runtime of a computation before it happens because the analysis is based on
the algorithm’s description.
Another assumption for runtime complexity is that one step in the algorithm gen-
erally corresponds to several steps that are executed by the performer of the algorithm.
This is obvious in the example. The pebbles are probably not placed one footstep apart,
so it will take Hansel and Gretel several footsteps to get from one pebble to the next.
However, each algorithm step must not cause an arbitrarily large number of computer
steps. This number must be constant and relatively small compared to the number of
steps taken by the algorithm. Otherwise the information about the runtime of the al-
gorithm would become meaningless: the number of algorithm steps would not be an
accurate measure for the actual runtime. A related aspect is that different computers
have different performance characteristics. In the example, Hansel may have longer legs
than Gretel and thus may need fewer footsteps to move between pebbles, but Gretel
might be a faster walker than Hansel and thus take a specific number of footsteps in less
time. All these factors can be ignored by focusing on the runtime of algorithms.
Cost Growth
Since information about runtime complexity for an algorithm is given as a function, it
can capture the differences in runtime for different computations. This approach reflects
the fact that algorithms typically require more time for bigger inputs.
The complexity of Hansel and Gretel’s algorithm can be characterized by the rule
“The runtime is proportional to the number of pebbles,” which means that the ratio of
the number of footsteps to pebbles is constant. In other words, if a path were doubled in
length and thus had twice as many pebbles, the runtime would double as well. Note that
this does not mean that the number of footsteps is identical to the number of pebbles,
only that it increases and decreases in the same way as the input.
This relationship is called linear, and it presents itself as a straight line in a graph
that plots the number of footsteps needed for any number of pebbles. In cases like this
we say that the algorithm has linear runtime complexity. We also sometimes say, for
short, the algorithm is linear.
Linear algorithms are very good, and in many cases the best one can hope for. To
see an example of a different runtime complexity, consider the algorithm that Hansel
executes when dropping the pebbles. In the original version of the story he has all the
pebbles in his pocket and thus can drop them as they go into the forest. This is clearly
a linear algorithm (with respect to the number of pebbles), since Hansel only takes a
constant number of footsteps to reach the location for dropping the next pebble.
But now suppose that Hansel has no way of storing and con-
cealing the pebbles. In that case, he has to go back home and
fetch a new pebble for each pebble he wants to drop, which takes
roughly twice the number of steps needed to get to that pebble.
The total number of steps is then the sum of the steps required
for each pebble. Since the distance from home increases with
each dropped pebble, the total number of steps is proportional
to the sum 1 + 2 + 3 + 4 + 5 + · · · , which is proportional to the
square of the number of pebbles placed.
This relationship can be illustrated by considering the distance that Hansel must
travel, measured in number of pebbles. For dropping two pebbles, Hansel has to get
to the place where he can drop the first pebble, go back to fetch another pebble, and
then go via the first pebble to the location where he can drop the second pebble. This
causes him to travel a total distance of four pebbles. For dropping three pebbles, Hansel
first needs to traverse the distance required for dropping two pebbles, which we already
know to be four. Then he has to go back to fetch the third pebble, which means traveling
a distance of two pebbles. To place the third pebble, he then needs to go the distance
of another three pebbles, which yields a total distance corresponding to 4 + 2 + 3 = 9
pebbles.
Let us consider one more case. For the fourth pebble, Hansel has traveled the
distance to drop three pebbles, goes back (three pebbles), and then needs to go the
distance of another four pebbles to get to the place for the fourth pebble, for a total
distance of 9 + 3 + 4 = 16 pebbles. In a similar fashion, we can compute the distance
required for placing five pebbles (16 + 4 + 5 = 25), six pebbles (25 + 5 + 6 = 36), and so
on.
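The arithmetic just carried out can be checked with a few lines of code. Each new
pebble k adds (k − 1) + k = 2k − 1 steps to the total, and the sum of the first n odd
numbers, 1 + 3 + 5 + · · · + (2n − 1), is exactly n². Here is a minimal sketch in Python
(mine, not the story's), measuring all distances in pebbles:

    def total_distance(n):
        """Distance walked to drop n pebbles, fetching each one from home."""
        distance = 0
        for k in range(1, n + 1):
            distance += k - 1   # walk back home from the spot of pebble k-1
            distance += k       # walk out to the spot for pebble k
        return distance

    for n in [2, 3, 4, 5, 6]:
        print(n, total_distance(n))   # prints 4, 9, 16, 25, 36, that is, n*n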
We can clearly see a certain pattern emerging, namely, that the number of steps
needed by Hansel is proportional to the square of the number of pebbles placed. Algo-
rithms that have this complexity pattern are said to have quadratic runtime, or for
short, to be quadratic. Since the runtime of an algorithm is determined only up to a
constant factor, a linear algorithm whose steps correspond to 2, 3, or 14 footsteps per
pebble may therefore take 20 or 30 or 140 steps, respectively, for a path of, say, 10
pebbles.
The same is true for a quadratic algorithm, whose number of steps may also need to
be multiplied by a factor. This indicates that a linear algorithm is not necessarily faster
than a quadratic algorithm in all cases. With a large constant factor it may take more
steps than a quadratic algorithm with a small constant factor, at least for small enough
inputs. For example, a linear algorithm that takes 14 steps per pebble takes 140 steps for
an input of 10 pebbles, which is more than the 100 steps a quadratic algorithm with 1
step per pebble will take. However, we can also see that with larger and larger inputs,
the impact of a constant factor fades and the growth of the quadratic algorithm takes
over. For example, for 100 pebbles this linear algorithm takes 1,400 steps, whereas the
quadratic one already takes 10,000.
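The break-even point can even be computed exactly. With the constant factors from
this example, the linear algorithm takes 14n steps and the quadratic one n² steps, and
the two agree at 14n = n², that is, at n = 14 pebbles. A small sketch (the numbers come
from the text; everything else is mine):

    def linear_steps(n):
        return 14 * n   # linear algorithm, 14 steps per pebble

    def quadratic_steps(n):
        return n * n    # quadratic algorithm, constant factor 1

    for n in [10, 14, 100]:
        print(n, linear_steps(n), quadratic_steps(n))
    # 10  ->   140 vs.    100: the quadratic algorithm is still ahead
    # 14  ->   196 vs.    196: the break-even point
    # 100 -> 1,400 vs. 10,000: from here on the linear algorithm wins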
In the story, the quadratic algorithm is prohibitive and wouldn’t work. Think about
the time it would take to place the last pebble. Hansel would have to walk all the way
home and then back into the forest again, thus basically covering the distance into the
forest three times. The parents were already impatient when he was placing pebbles
using the linear algorithm.
His father said: “Hansel, what are you looking at there and staying behind for?
Pay attention, and do not forget how to use your legs.”
They surely wouldn’t have waited for Hansel to go back home every time he needed a
new pebble. Thus an algorithm’s runtime really matters. If the algorithm is too slow, it
can become useless from a practical perspective (see chapter 7).
This example also illustrates that space and time efficiency are often interdependent.
In this case one can improve the runtime efficiency of the algorithm from quadratic to
linear at the expense of storage capacity, that is, using linear storage space, assuming that
Hansel’s pockets can store all the pebbles.
When two algorithms solve the same problem, but one has a lower runtime com-
plexity than the other, then the faster algorithm is said to be more efficient (with respect
to runtime). Similarly, when one algorithm uses less memory than the other, it is said
to be more space efficient. In the example, the linear pebble-placing algorithm is more
runtime efficient than the quadratic one, but less space efficient because it fills Hansel’s
pockets with pebbles.
Further Exploration
The pebbles in the story of Hansel and Gretel were representations to be used by a way-
finding algorithm. Marking a path can have different applications. On the one hand, it
can help you find your way back when exploring an unknown territory. This is what
happens in the story of Hansel and Gretel. On the other hand, it can help others follow
you. This occurs, for example, in J.R.R. Tolkien’s The Lord of the Rings: The Two Towers
when Pippin, who was taken prisoner by the orcs with Merry, drops a brooch as a sign
for Aragorn, Legolas, and Gimli. Similarly, in the movie Indiana Jones and the Kingdom
of the Crystal Skull, Mac secretly drops radio beacons so that he can be followed.
In all three examples, the markers are placed on a more or less open terrain where
movements can occur in almost any direction. In contrast, there are also situations
when the movement is restricted to a fixed number of connections between junctions.
This occurs in Mark Twain’s The Adventures of Tom Sawyer when Tom and Becky ex-
plore a cave and place smoke marks on the walls to find their way back out. But they
still get lost in the cave. When after several days Becky is too weak to go any further,
Tom continues to explore the cave and uses, as a more reliable method, a thread from
a kite to always find his way back to Becky. Probably the most famous (and oldest)
example of using a thread to not get lost in a labyrinth can be found in the Greek myth
of the Minotaur, in which Theseus uses a thread given to him by Ariadne to find his
way out of the labyrinth. The same method is used by the Benedictine novice Adso in
Umberto Eco’s The Name of the Rose to find his way back through the labyrinth of the
monastic library.
It is interesting to compare the different kinds of markers used in the stories, and
the implications for the corresponding way-finding algorithms. For example, the use
of pebbles, a brooch, or smoke marks still requires some search to get from one marker
to the next, since they only appear at a few places. In contrast, a thread provides a
continuous guide that can simply be followed without any search. Moreover, the use
of a thread avoids the possibility of a dead-end situation, such as the one described in
chapter 1, that can occur when using pebbles or other discrete markers.
Hansel and Gretel’s method is used in modern user interfaces of file systems or
query systems, where it is actually known as breadcrumb navigation. For example,
browsers in a file system often show a list of the parent or enclosing folders of the
current folder. Moreover, search interfaces for email programs or databases often show a
list of search terms that apply to the currently shown selection. The action of going back
to a parent folder or removing the last search term to get a wider selection corresponds
to going to a pebble and picking it up.
Representation and Data Structures
Sherlock Holmes
On Your Way
You are on your way to work. Whether you are driving your car, riding your bike,
or walking, you will encounter traffic signs and traffic lights that regulate how you and
your fellow commuters share the road. Some of the rules associated with traffic signs are
algorithms. For example, a stop sign at a four-way stop street tells you to stop, wait for
all other vehicles that have arrived before you to cross the intersection, and then cross
the intersection yourself.1 Following the rules associated with traffic signs amounts to
executing the corresponding algorithms, which means that the resulting motions are
examples of computation. Since many drivers and vehicles are engaged in this activity
and since they share the road as a common resource, this is actually an example of
distributed computing, but that is not the main point here.
It is remarkable that every day millions of people with completely different goals
can effectively coordinate their actions and successfully navigate their ways around one
another. Sure, traffic jams and accidents happen on a regular basis, but overall traffic
works quite successfully. Even more remarkable is the fact that all this becomes possible
through a small set of signs. Placing a red, octagonal sign containing the word
“STOP” at all entries to an intersection enables the coordinated crossing of countless
vehicles.
How is it possible that signs have such profound effects? The key observation is
that signs carry meaning. For example, a direction sign provides information about
the direction of a place of interest. The fact represented by such a sign supports the
traveler in making decisions about which turns or exits to take. Other signs provide
warnings (for example, of obstacles or curves), prohibit certain actions (such as limiting
the maximum speed), and regulate access to shared traffic spaces (such as intersections).
In chapter 1 I used the term representation for signs that stand for something else (such
as the pebbles that represent locations). From this perspective, signs derive their power
from the fact that they are representations.
The effect of a sign does not somehow appear magically by itself but needs to be
extracted by some agent. This process is called interpretation, and different agents may
interpret a sign in different ways. For example, while a typical traffic participant inter-
prets traffic signs as information or instruction, traffic signs have also become collector’s
items. Interpretation is required to understand traffic signs and is also required for all
representations used in computer science.
Signs are related to computation in several different ways, emphasizing their im-
portance. First, signs can directly represent computation, as in the case of the stop
sign, which designates a particular algorithm for traffic participants to execute. Even
if the computation represented by an individual sign is trivial, a combination of such
signs has the potential to produce a significant computation as an aggregate. Hansel and
Gretel’s pebbles provide an example of this. A single pebble triggers the simple action
“Come here if you haven’t visited me before,” whereas all pebbles taken together effect
the life-saving movement out of the forest.
Second, a systematic transformation of a sign is a compu-
tation.2 For example, crossing out a sign suspends or negates its
meaning; this is often done to prohibit actions, such as a bent ar-
row in a red circle with a red diagonal over it, which forbids a turn. Another example
is a traffic light: when it changes from red to green, the meaning changes accordingly
from “stop” to “go.”
Finally, the process of interpreting a sign is a computation. This isn’t obvious for
simple signs such as pebbles and stop signs but becomes clear when looking at composite
signs. One example is the crossed-out sign whose meaning is obtained from the original
sign, to which then the meaning of crossing-out has to be applied. Other examples are
food-exit signs, whose meanings are obtained as a combination of direction plus the
meaning of the restaurant logos that indicate different kinds of food. Interpretation is
discussed in chapters 9 and 13. The point here is that signs are entangled in computation
in multiple ways. It is therefore a good idea to understand what they are, how they
work, and what roles they play in computation. This is the subject of chapter 3.
3 The Mystery of Signs
As illustrated in the first two chapters, computation does its work by manipulating rep-
resentations, which are symbols or signs that stand for meaningful things. We have seen
how Hansel and Gretel employed pebbles as representations for locations in support
of their way-finding algorithm. For that the pebbles had to fulfill a number of require-
ments that we took for granted. Taking a close look at these requirements will provide
a better understanding of what a representation is and how it supports computation.
A representation consists of at least two parts, namely, something that represents
and something that is represented. This fact is captured by the notion of a sign. The
following are three aspects of signs to keep in mind: signs can operate on multiple levels;
signs can be ambiguous (a single sign can represent different things); and one thing can
be represented by different signs. This chapter also discusses how different mechanisms
enable signs as representations.
Signs of Representation
Do you have any doubt that 1 + 1 equals 2? Probably not, unless you are a time traveler
from ancient Rome. In that case the number symbols would look funny to you, and
you might instead agree that I + I equals II, that is, if somebody explained to you the
meaning of the + symbol, which was not known to the Romans (it was first used in the
fifteenth century). And if you could ask an electronic computer, whose number system
is based on binary numbers, it would tell you that 1 + 1 equals 10.1 What’s going on
here?
This example illustrates that a conversation about even very simple facts of arith-
metic requires an agreement about the symbols that represent quantities. This is of
course also true for computations with quantities. Doubling 11 yields 22 in the decimal
system, which is based on Hindu-Arabic numerals. In ancient Rome one would double
II, which yields IV (not IIII),2 and the result from an electronic computer would be 110,
since 11 in binary notation represents the number 3, and 110 represents 6.3
What this shows is that the meaning of a computation depends on the meaning
of the representation it transforms. For example, the computation that transforms 11
into 110 means doubling if the numbers are interpreted as binary numbers, whereas
it means multiplying by 10 if the numbers are interpreted as decimal numbers. And
the transformation is meaningless under Roman numeral interpretation because the
Romans didn’t have a representation for zero.
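This dependence of meaning on interpretation is easy to reproduce. The following
sketch (mine, not part of the story) applies the same transformation, appending a 0, to
the signifier “11” and then reads both signifiers as binary and as decimal numbers:

    s = "11"
    t = s + "0"                     # the transformation: append a 0

    print(int(s, 2), int(t, 2))     # read as binary:  3 -> 6    (doubling)
    print(int(s, 10), int(t, 10))   # read as decimal: 11 -> 110 (multiplying by ten)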
Since representation plays such a crucial role in computation, it is important to
understand what it really is. And since the word representation is used in many different
ways, it is important to be clear about its meaning in computer science. To this end, I
solicit help from Sherlock Holmes, the famous detective whose crime-solving methods
can reveal much about the way representations work in support of computation. It is
typical of Sherlock Holmes to make keen observations of minute details and interpret
them in surprising ways. Often these inferences help solve a crime, but sometimes they
are just an entertaining way of revealing information for advancing the story. In either
case, Sherlock Holmes’s inferences are often based on interpreting representations.
Representations play an important part in one of the most popular and famous
Sherlock Holmes adventures, The Hound of the Baskervilles. In typical Sherlock Holmes
style, the story begins with a number of observations about a walking stick left be-
hind by a visitor, Dr. Mortimer. Holmes and Watson interpret the engraving on the
walking stick, which reads as follows: “To James Mortimer, MRCS, from his friends
of the CCH.” Living in England, Holmes and Watson know that “MRCS” stands for,
or represents, Member of the Royal College of Surgeons, and from that and with the
help of a medical directory, Holmes deduces that “CCH” must stand for Charing Cross
Hospital, since Dr. Mortimer has worked there for some time. He also infers that the
walking stick must have been given to Dr. Mortimer in appreciation of his service when
he left the hospital to become a country practitioner, although it turns out later that
this inference is incorrect and that Dr. Mortimer was given the walking stick on the
occasion of his wedding anniversary.
Talking about signs is complicated by the fact that, on the one hand, text and diagrams
that are used to write about signs are signs themselves and, on the other hand, abstract
concepts or ideas in the mind can never be shown directly but have to
be ultimately represented by signs as well. In the literature about semiotics, the theory
of signs and their meaning, the idea of a sign is often illustrated by a diagram containing
the word “tree” as an example of a signifier and a drawing of a tree as the thing signified
by “tree.” However, the drawing is itself a sign for the concept of a tree, and therefore
the diagram can be misleading, since “tree” is not a signifier of the drawing but of what
the drawing signifies, which is the concept of a tree.
Since we are bound to using language to talk about language and representation,
there is no way out of this dilemma, and we can never present ideas or concepts other
than through some linguistic means. Whether we want to talk about a signifier or a
signified, we always have to use signifiers to do that. Fortunately, we can get around
this problem most of the time by putting quote marks around a word or phrase that is
used as a signifier or by typesetting it in a special form such as italics. The quote marks
have the effect of referring to the word or phrase as such, without interpreting it. In
contrast, a word or phrase used without quote marks is interpreted as what it stands for,
that is, a signified concept.
Thus “tree” refers to the four-letter word, whereas the word without the quote
marks refers to the concept of a tree. The distinction between a quoted word stand-
ing for itself and its unquoted use standing for what it means is referred to as the use-
mention distinction in analytic philosophy. The unquoted word is actually used and
denotes what it represents, while the quoted word is only mentioned and does not refer
to what it stands for. Quoting stops interpretation from acting on the quoted part and
thus allows us to clearly distinguish between talk about a word and its meaning. For
example, we can say that “tree” has four letters, whereas a tree does not have letters but
branches and leaves.
The seemingly simple concept of a sign contains a lot of flexibility. For example,
signs can operate on multiple levels, signs can have multiple meanings, and the link
between signifier and signified can be established in different ways. I describe these
three aspects in the following sections.
Signs can operate on multiple levels. Since the signified of “CCH” is the phrase
“Charing Cross Hospital,” which is itself a signifier for a London hospital, “CCH” can
represent the hospital as well. In computing, this layering shows up as two levels: a
signifier stands for a value on the level of computation representation, and that value
in turn can stand for a fact about the problem on the level of problem representation.

Viewed as a binary number, the signifier “1” represents the number one on the level
of computation representation. This number can represent different facts in different
contexts and thus has different problem representations. If you are playing roulette, this
could be, for example, the amount of money you bet on black. The transformation that
appends a 0 to the 1 means, on the level of computation representation, to double the
number one to two. In the context of the problem representation the transformation
can also mean that black did come up and that you won your bet and now have twice
the amount of money available.
Similarly, the pebbles in the forest are signifiers that represent locations for Hansel
and Gretel and belong to the computation representation. In addition, the individual
locations represent positions of danger in the problem representation. To distinguish
between different locations, one could further quantify the degree of danger by the
distance of a location from Hansel and Gretel’s home. Moving from pebble to pebble
means simply a change of location in the computation representation but also means
a decrease in danger in the problem representation if the location moved to is closer
to home. The transitivity of signs is the reason “He worked at CCH” means that he
worked at Charing Cross Hospital, not that he worked at “Charing Cross Hospital,”
which would not make any sense, since one cannot work at the name of a hospital.
One thing can also be represented by different signifiers. The signifier “10,” for
example, represents the number ten in the decimal representation and the number two
in the binary representation. Moreover, the number two is represented by the signifier
“2” in the decimal representation and by the signifier “10” in the binary representation.
Of course, multiple representations exist also on the level of problem representation.
Clearly, the number one can be used to represent other things than a bet on black at
the roulette table.
Figure 3.1 Signs are the basis for representation. A sign consists of a signifier that represents
some concept, called the signified. One signifier can stand for different concepts.
The link between a signifier and its signified can be established in different ways, and
signs can be classified accordingly. The logician, scientist, and philosopher Charles
Sanders Peirce identified three different kinds of signs.
First, an icon represents an object based on its similarity or likeness to the
object. An example is a drawing of a person that represents the person by
highlighting specific features. An obvious example of an iconic representation
in The Hound of the Baskervilles is the portrait of Sir Hugo Baskerville, which repre-
sents him through likeness. The portrait also looks similar to the murderer and is thus
another example of a signifier that can stand for different signifieds. The fact that the
portrait is effectively two signs helps Sherlock Holmes solve the case. Further examples
are the abbreviations CCH and MRCS when they are used to represent the phrases they
stand for. Here the likeness is established by the characters that a phrase and its abbre-
viation have in common. Finally, Sherlock Holmes employs a map of the Devonshire
moor to understand the location where the murder has happened. A map is iconic,
since the features it contains (paths, rivers, forest, etc.) resemble, mostly in shape and
position, the objects they represent.
Second, an index represents an object through some lawlike relationship
that lets the viewer of the index infer the object through this relationship.
An example is a weather vane from whose direction the wind direction can be
inferred. Other examples are all kinds of gauges that have been engineered as indexes for
different physical phenomena (temperature, pressure, speed, etc.). The saying “Where
there is smoke, there is fire” is based on smoke being an index for fire. An index sign
is determined by the object it signifies through the lawlike relationship between them.
Other important index signs in The Hound of the Baskervilles (and in other Sherlock
Holmes stories) are footprints. For example, the dog footprints found near the deceased
Charles Baskerville signify a gigantic hound. Moreover, the fact that the footprints stop
at some distance from Sir Charles is interpreted by Sherlock Holmes to indicate that
the hound did not have physical contact with him. The special form of Sir Charles’s
footprints indicates that he was fleeing the hound. Another index is the amount of ashes
from Sir Charles’s cigar found at the crime scene, indicating the time he was waiting at
the location of his death. Incidentally, Peirce himself used the example of a murderer
and his victim as an example of an index. Applied to the story, this means the dead Sir
Charles is an index for his murderer.
Third, a symbol represents an object by convention only; no likeness or
lawlike connection is involved in the representation. Since the link between
the signifier and the signified is completely arbitrary, the creator and the user
of the sign must agree on the definition and interpretation of the sign for it to work.
Most modern languages are symbolic. The fact that the word “tree” stands for a tree
cannot be inferred but is a fact that has to be learned. Similarly, that “11” is a symbol for
eleven as well as three and that the pebbles are symbols for locations are conventions.
Of the signs we have mentioned from The Hound of the Baskervilles, the abbreviations
MRCS and CCH are symbols if used to represent the surgeon society membership and
the hospital, respectively, because they bear no resemblance nor do they result from any
lawlike relationship. Also, the symbol 2704 represents a cab that Holmes and Watson
follow to identify a person suspected to be threatening the Baskerville heir Sir Henry.
Signs do not only stand for things; they can also take part in computations. A compu-
tation with an iconic sign happens when Sherlock Holmes recognizes the murderer in
the portrait of Sir Hugo Baskerville.
As an example of a computation with an index sign, recall the map of the De-
vonshire moor. To find the location where a particular road crosses a river, Sherlock
Holmes could compute the intersection of the two lines that represent the river and the
road, respectively. In fact, the map representation has the point effectively computed
already, so that it could be simply read off the map. Assuming that the map is accurate,
the resulting point then would represent the sought location.4 Sherlock Holmes could
also compute the length of a path on the map and, using the scale of the map, determine
the length of the path in the moor and also the time needed to traverse it. Again, this
works if the map is drawn to scale. Index computations exploit the lawlike relationship
between sign and signified to move from one signified to another via a transformation
of the index.
Computations with symbols are probably the most common in computer science,
because symbols enable the representation of arbitrary problems. The most obvious
symbol-based computations involve numbers and arithmetic, and we can find such an
example in Sherlock Holmes’s first adventure, A Study in Scarlet,5 where he computes
the height of a suspect from the length of the suspect’s stride. This is a very simple
computation, consisting of only one multiplication, and the corresponding algorithm
consists of simply one step.
More interesting from a computational perspective are Sherlock Holmes’s attempts
to decode encrypted messages. In The Valley of Fear he tries to understand a message
sent to him that starts as follows: 534 c2 13 127 36 . . . This code signifies a message. The
first task for Sherlock Holmes is to figure out the algorithm that was used to generate
the code, since this allows him to decode the message. He concludes that 534 must be
a page number of some book, that c2 means “second column,” and that the following
numbers represent words in that column.
But is the code really a symbolic sign? Since the code is generated by an algorithm
from a given message, it seems that the encoding algorithm establishes a lawlike rela-
tionship between the message and its code. Therefore, the code is not a symbol but
an index. This illustrates another way in which computation is involved with signs.
Whereas interpretation produces the signified for a given signifier, an algorithm for gen-
erating index values operates in the opposite direction and generates a signifier from a
given signified.
The bottom line is this: Representation is the basis for computation. Its nature and
basic properties can be understood through the lens of signs. And much like works
of art can be based on many different materials (clay, marble, paint, etc.), computation
can be based on many different representations.
You have arrived at your office and face the task of working through a collection of
documents. Before you can start the actual work, you have to decide on the order in
which to process the documents and how to manage the document collection to ensure
that order. These questions are also relevant in other contexts. Think, for example, of
a car mechanic who has to repair several different cars or a physician who has to treat a
number of patients in the waiting room.
The order in which the elements of a collection (documents or cars or people) are
to be processed is often determined by a policy, such as first come, first serve, which
means that the elements are dealt with in the order in which they arrive. Such a policy
requires that the maintenance of a collection follow a particular pattern that defines
the interplay between adding, accessing, and removing elements. In computer science
a collection with a specific access pattern is called a data type, and the data type that
captures the first come, first serve principle is called a queue.
While queues are widely used to determine the order in which the elements of a
collection are to be processed, there are also other strategies. For example, if elements
are processed according to some priority instead of in the order they arrive, the collec-
tion is a priority queue data type. Some of the documents in your office might be of
that type, for example, an urgent inquiry that has to be answered immediately, a memo
that you have to respond to by lunch time, or an offer that must go out today. Other
examples are patients in the emergency room, who are treated in order of the severity
of their symptoms, or travelers who can board an airplane according to their frequent
flier status.
Another pattern is the processing of requests in the opposite order of their appear-
ance. While this may seem odd at first, this situation arises quite frequently. For ex-
ample, imagine you are working on your tax return. You may start with the main tax
form, but when one of the fields asks for your deductions, this requires you to fill out
another form first. To do this, you have to retrieve the corresponding receipts and add
the amounts. You finish dealing with the three kinds of documents in the opposite or-
der in which they became relevant: You first enter the amounts from the receipts and
put them away. Then you can complete the deductions form, and finally you go back to
working on the main tax form. A collection whose elements are processed in this order
is referred to as a stack data type, since it behaves like a stack of pancakes: the pancake
that ended up last on the stack will be eaten first, and the bottom-most pancake, which
is the first element put on the stack, gets eaten last. The pattern described by a stack
data type occurs in a variety of tasks, ranging from baking to assembling furniture. For
example, the egg whites have to be whisked before being added to the batter, and the
drawer has to be screwed together before being inserted into the cupboard.
Knowing the access pattern for processing the elements of a collection leads to the
second question of how to arrange the elements to best support that pattern. Such an
arrangement of elements is called a data structure. Let’s take the queue data type as an
example and see how it can be implemented in different ways. If you have enough desk
space (which may be too optimistic an assumption), you can line up the documents,
add them at one end and remove them from the other. Every now and then you shift
all documents to move the empty space at the front of the queue to its end. This is
similar to how people line up one after another at a coffee shop. Each person enters the
line at one end and makes their way to the front after all other people in front of them
have left the line. A different method, employed in many government offices, is for
everybody to pull a number and then wait to be called for their turn. You can also use
this system for the documents in your office by attaching sticky notes with consecutive
numbers to them.
The sequence of documents on your desk and the line in a coffee shop are examples of
a list data structure. Here the physical arrangement of the people in line ensures the or-
dering required for a queue. In contrast, the method for assigning consecutive numbers
doesn’t require you to physically maintain the order of people or documents; they can
be placed anywhere, since the numbers capture the correct ordering. This arrangement
of assigning elements to numbered slots is called an array data structure. In addition
to the numbered slots (realized by people carrying the numbers they have pulled), one
also has to maintain two counters, one for the next available slot and one for the next
number to be served.
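To make the two counters concrete, here is a minimal sketch of the take-a-number
arrangement (the class and method names are mine, and it assumes a fixed capacity,
just as a pad of tickets holds a fixed number of tickets):

    class TicketQueue:
        """A queue realized as numbered slots plus two counters."""
        def __init__(self, capacity):
            self.slots = [None] * capacity  # one slot per ticket number
            self.next_ticket = 0            # next available slot
            self.now_serving = 0            # next number to be served

        def enqueue(self, item):
            self.slots[self.next_ticket] = item
            self.next_ticket += 1

        def dequeue(self):
            item = self.slots[self.now_serving]
            self.now_serving += 1
            return item

Both operations take only constant time; the price is the fixed capacity, a trade-off
that reappears in chapter 4 when arrays are compared with lists.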
By representing collections as data structures they become accessible to computa-
tion. The choice of data structure is important for the efficiency of algorithms and is
sometimes influenced by other considerations, such as the available space. When Sher-
lock Holmes maintains information about a case, such as a collection of suspects, he is
essentially using data types and data structures. I therefore continue using the story of
The Hound of the Baskervilles to motivate and explain these concepts.
4 Detective’s Notebook:
Accessory after the Fact
Computation is particularly useful when we have to deal with large amounts of data
that cannot be handled in a few individual steps. In such cases an appropriate algorithm
can ensure that all data are processed systematically and, in many cases, also efficiently.
Signs, discussed in chapter 3, illustrate how representation works for individual
pieces of information and how this representation becomes part of computation. For
example, Hansel and Gretel’s movements from pebble to pebble mean that they are in
danger until they make the move from the last pebble to their home. But even though
collections of signs are signs themselves, it is not clear how to compute with such col-
lections. In the case of Hansel and Gretel, an individual pebble is a signifier for one
location, and the collection of all pebbles signifies a path from danger to safety, but how
is such a collection built and used systematically? The maintenance of a collection of
data raises two questions.
First, in which order will data be inserted into, looked up in, and removed from a
collection? The answer depends, of course, on the computational task that the collec-
tion is involved in, but we can observe that specific patterns of accessing the elements
in a collection reoccur. Such a data access pattern is called a data type. For example, the
pebbles employed by Hansel and Gretel are visited in the opposite order in which they
were placed; such an access pattern is called a stack.
Second, how can a collection be stored so that the access pattern, or data type, is
supported most efficiently? Here the answer depends on a wide variety of factors. For
example, how many elements have to be stored? Is this number known in advance?
How much space does each element require for storage? Do all elements have the same
size? Any particular way of storing a collection is called a data structure. A data struc-
ture makes a collection amenable to computation. One data type can be implemented
by different data structures, which means that a particular access pattern can be imple-
mented through different ways of storing data. The difference between data structures
lies in how efficiently they support specific operations on the collection. Moreover, one
data structure can implement different data types.
This chapter discusses several data types, the data structures to implement them,
and how they are used as part of computations.
The order of the elements in a list matters, and the elements are typically accessed one
by one, starting at one end and proceeding to the other. Lists are often written down
vertically, one element per line, with the first element at the top. Computer scientists,
however, write lists horizontally, presenting the elements from left to right, connected
by arrows to indicate the order of the elements.2 Using this notation, Sherlock Holmes
can write down his list of suspects as follows:
The arrows are called pointers and make the connection between list elements explicit,
which becomes important when one considers how lists are updated. Assume Sherlock
Holmes’s suspect list is Mortimer → Beryl and that he wants to add Jack between the
two.
If the elements are written down as a vertical list of items without any empty space
between them, he has to resort to some extra notation to make the position of the
new element clear. The alternative is to
simply write down a fresh copy of the complete new list. That would be, however, a
huge waste of time and space. It would require time and space that is in the worst case
quadratic in the size of the final list.
Pointers give us the flexibility to write down new elements wherever we find space
and still place them at the proper position in the list by connecting them to their
neighboring list elements. For example, we can place Jack at the end of the list, redirect
the outgoing pointer from Mortimer to Jack, and add a pointer from Jack to Beryl.
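The pointer manipulation just described can be spelled out in code. A minimal sketch,
with an illustrative Node type that is not from the book:

    class Node:
        """A list element: a name and a pointer to the next element."""
        def __init__(self, name, next=None):
            self.name = name
            self.next = next

    beryl = Node("Beryl")
    mortimer = Node("Mortimer", beryl)   # the list Mortimer -> Beryl

    jack = Node("Jack", beryl)           # write Jack down anywhere, pointing at Beryl
    mortimer.next = jack                 # redirect Mortimer's pointer to Jack
    # The list now reads Mortimer -> Jack -> Beryl.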
In the context of the story, the order of the suspects in this list is arbitrary and has
no meaning, but we can see that the creation of a list forces us to pick some order for
its elements. It is a defining characteristic of a list that its elements are kept in a specific
order.
The inspection of the list’s elements happens in the particular order given by the
list. Thus to find out whether Selden is a suspect, we have to start at the beginning of
the list and check the elements one by one by following the pointers. While it seems we
can spot the element Selden directly in the list, this works only for relatively small lists.
Since our visual field is limited, we cannot immediately recognize a specific element in
a long list and thus have to resort to the one-element-at-a-time traversal of the list.
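Continuing the Node sketch from above, the one-element-at-a-time traversal looks as
follows; in the worst case it follows every pointer, so checking membership in a list of
n elements takes on the order of n steps:

    def contains(front, name):
        """Check membership by following pointers from the front of the list."""
        node = front
        while node is not None:
            if node.name == name:
                return True
            node = node.next
        return False

    print(contains(mortimer, "Selden"))   # -> False, after visiting all three nodes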
(A simple way of remembering the meaning of the two terms: A data type describes a task; a data structure
describes a solution.) A data type is a more abstract description of data management
than a data structure and has the advantage that some details can be left unspecified and
thus can lead to succinct and general descriptions. In The Hound of the Baskervilles the
set data type reflects the task of maintaining a collection of suspects without having to
say in detail how to implement it. In the Hansel and Gretel story a set data type to
remember visited pebbles is enough to describe the algorithm. However, to actually
execute the operations stipulated by a data type, a computer needs to employ a concrete
data structure that defines how the operations operate on the representation offered by
the data structure. Also, only when a concrete data structure is selected for an algorithm
can we determine the algorithm’s runtime complexity.
Since one data type can be implemented by different data structures, the question
is which data structure one should choose. One would want the data structure that
implements the data type operations with the best possible runtime so that the algo-
rithm using the data structure runs as fast as possible. However, this is not always an
easy decision because a data structure may support some operations well but not others.
Moreover, data structures also have different space requirements. The situation is simi-
lar to choosing a vehicle for a particular transportation task. A bike is environmentally
friendly, and you can’t beat the mileage it gets per gallon. But it is comparatively slow,
can only transport one or two people, and has a limited range. You may need a van or
even a bus for traveling with many people over longer distances. You pick a truck to
transport large items, a sedan for traveling comfortably, and maybe a sports car when
you’re a male in your fifties.
Coming back to the question of how to implement the set data type, two popular
alternatives to using lists are the array and the binary search tree data structures. Binary
search trees are discussed in detail in chapter 5; here I focus on arrays.
If a list is like a ring binder, an array is like a notebook, which has a fixed number
of pages that are each uniquely identified. An individual field in an array data structure
is called a cell, and a cell’s identifier is also called its index. Identifying (or indexing)
cells is usually done using numbers, but we can also use letters or names as identifiers
as long as we can directly open a particular page using the identifier.4 The importance
of an array data structure lies in the fast access it provides to each individual cell. No
matter how many cells the array contains, it takes just one step to access a cell. Any
such operation that requires only one or a few steps, irrespective of the size of the data
structure, is said to run in constant time.
[An array of cells, one per potential suspect (Mortimer, Desmond, Jack, Beryl, Selden, . . . ),
each marked + or − to record whether that person currently belongs to the set of suspects.]
The important difference between arrays and lists is that one can locate individual cells
immediately in an array, whereas one has to scan the elements of a list from the begin-
ning to find the element (or reach the end of the list in case the element is not in the
list).
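As a sketch of this difference (the names are invented, and a Python dict stands in for
the notebook's thumb index), the set of suspects becomes an array of marks, and adding,
removing, and checking a suspect each touch exactly one cell:

    SUSPECTS = ["Beryl", "Desmond", "Jack", "Mortimer", "Selden"]
    CELL = {name: i for i, name in enumerate(SUSPECTS)}   # the "thumb index"

    marks = [False] * len(SUSPECTS)   # one cell per potential suspect

    def insert(name):
        marks[CELL[name]] = True      # one step: mark the cell

    def remove(name):
        marks[CELL[name]] = False     # one step: clear the cell

    def member(name):
        return marks[CELL[name]]      # one step: read the cell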
Since we can directly open a particular page in the notebook (or access a cell in an
array), all three operations—adding, removing, and finding suspects—can be performed
in constant time, which is optimal: we cannot possibly do any of this faster. Since the
list data structure requires linear time for checking and removing suspects, the array
data structure seems to be the winner hands down. So why are we even talking about
lists?
The problem with arrays is their fixed size, that is, a notebook has a specific number
of pages and cannot grow over time. This has two important implications. First, the
notebook has to be chosen big enough to contain all possible suspects right from the
beginning, even if many of them never become suspects. Thus we may waste a lot
of space, and we may be carrying around a huge book with pages for hundreds or
thousands of potential suspects, whereas in reality the set of suspects may be very small;
it may very well contain fewer than ten suspects at any given time. This also means that
the preparation of the notebook’s thumb index at the start can take a long time because
it requires writing the name of each potential suspect on a different page. Second—
and this is an even more serious problem—it might not be clear at the beginning of
the mystery who all the potential suspects are. In particular, new potential suspects
may become known as the story unfolds, which is certainly the case in The Hound of
the Baskervilles. This lack of information prohibits the use of a notebook, since its
initialization is impossible.

Figure 4.1 A data type can be implemented by different data structures. Inserting an element
into a list can be done by simply adding it to the front of the list, but removal requires traversing
the list to find it. With an array, insertion and removal are done by directly accessing the array
cell indexed by an element and changing the mark accordingly. Arrays allow faster
implementation, but lists are more space efficient.
This weakness of the bulky array is the strength of the nimble list, which can grow
and shrink over time as needed and which never stores more elements than necessary.
In choosing a data structure for implementing a set data type we have to be mindful
of the following trade-off. An array provides a very fast implementation of the set
operations but potentially wastes space and might not work in all situations. A list is
more space efficient than an array and works under all circumstances, but it implements
some operations less efficiently. The situation is summarized in figure 4.1.
Information Gathering
Identifying suspects is only one step in solving a murder mystery. To narrow down
the set of suspects Sherlock Holmes and Dr. Watson need to gather specific information
about them, such as motive or potential alibi. In the case of Selden, for example, this in-
formation includes the fact that he is an escaped convict. All this additional information
should be stored together with each corresponding suspect. When using a notebook,
Sherlock Holmes would add information about a person on the page reserved for that
person.
The operations of the set data type cannot do that, but a few small changes to the
operations for adding and finding elements are sufficient to make it happen. First, the
operation for inserting elements takes two pieces of information: a key for identifying
the information plus the additional information associated with the key. The key for
information about a suspect is his or her name. Second, the operations for finding and
removing suspects will only take the key as input. In the case of removing, a person’s
name and all the additional information stored with it will be removed. In the case of
finding a person, the information stored for that name will be returned as a result.
This minor but important extension of the set data type is called a dictionary because
like a real dictionary it allows us to look up information based on a keyword, as when
Sherlock Holmes uses a medical directory to find out about Dr. Mortimer’s professional
history at the beginning of The Hound of the Baskervilles. A dictionary can be viewed as
a collection of signs, and each key is a signifier for the information stored with it. The
dictionary data type differs in two ways from a traditional printed dictionary. First, the
content of a printed dictionary is fixed, whereas a dictionary data type can change—
new definitions can be added, obsolete ones can be deleted, and existing ones can be
updated. Second, the entries in a printed dictionary are alphabetically sorted by keys,
whereas that is not required for a dictionary data type. The ordering of entries in a
printed dictionary is necessary because the large number of entries makes direct access
to particular pages impossible. Since a thumb index with access to each page would
contain too many entries and would have to be too tiny to be of practical use, the
sorted keys enable a user of the dictionary to find the entry using a search algorithm
(see chapter 5).
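As a rough sketch, the three dictionary operations can be written as follows; a Python
dict stands in for the underlying data structure, and all names are mine:

    case_notes = {}                   # key -> associated information

    def insert(key, info):
        case_notes[key] = info

    def find(key):
        return case_notes.get(key)    # the information stored for the key, if any

    def remove(key):
        case_notes.pop(key, None)     # drop the key and its information

    insert("Selden", "escaped convict")
    print(find("Selden"))             # -> escaped convict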
Both restrictions of physical dictionaries, the need for sorted keys and the fixed size
and content, do not apply to electronic dictionaries. A widely used dynamic dictionary
is Wikipedia,5 which not only lets you use it but also extend and update information
in it. In fact, Wikipedia’s content has been assembled by its users, a remarkable success
of crowdsourcing and a testimony to the power of collaboration. If Sherlock Holmes
and Watson were working on The Hound of the Baskervilles case these days, they
might very well, instead of sending letters back and forth, use a wiki6 for maintaining
information about the suspects and the case.
The dynamic nature of the dictionary is not limited to only inserting and removing
information about suspects; it also allows updating that information. For example, the
fact that the escaped convict, Selden, is Eliza Barrymore’s brother is not known when
Selden becomes a suspect and has to be added later to an already existing dictionary en-
try for him. But how can this be done? Since we have only three operations for adding,
removing, and finding entries in a dictionary, how can we update information once it is
stored with a key in the dictionary? We can achieve this by combining operations: find
the entry using the key, take the returned information, modify it as needed, remove the
entry from the dictionary, and finally add it back with the updated information.
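Continuing the dictionary sketch from above, updating is exactly this combination of
the three basic operations:

    def update(key, change):
        info = find(key)              # find the entry using the key
        remove(key)                   # remove the old entry
        insert(key, change(info))     # add it back with the modified information

    update("Selden", lambda info: info + ", Eliza Barrymore's brother")
    print(find("Selden"))             # -> escaped convict, Eliza Barrymore's brother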
We can add new operations to the set data type in a similar fashion. For example,
if Sherlock Holmes maintains a set of people who benefit from Sir Charles’s death, he
might want to add a motive for some of the suspects. To do that he could compute the
intersection of the set of beneficiaries with the set of suspects. Or perhaps he wants
to identify new suspects by determining the beneficiaries who are not in the set of sus-
pects. To do that he could compute the set difference between the two sets. Assuming
that the set data type has an operation for reporting all elements in a set, an algorithm
for computing the intersection or set difference of two sets can simply go through all
elements of one set and for each element check whether it is in the second set. If it is, it
reports the element as a result of set intersection. If it is not, it reports the element as a
result of set difference. Such computations are quite common, since they correspond to
the combination of predicates. For example, the intersection of the suspect and benefi-
ciary sets corresponds to the predicate “is a suspect and a beneficiary,” and the difference
between beneficiaries and suspects corresponds to the predicate “is a beneficiary and not
a suspect.”
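A sketch of this algorithm, assuming an operation that reports all elements of a set
(the example data are invented):

    def intersection(a, b):
        return [x for x in a if x in b]       # "is in a and in b"

    def difference(a, b):
        return [x for x in a if x not in b]   # "is in a and not in b"

    beneficiaries = ["Stapleton", "Henry", "Eliza"]
    suspects = ["Selden", "Stapleton"]
    print(intersection(beneficiaries, suspects))   # -> ['Stapleton']
    print(difference(beneficiaries, suspects))     # -> ['Henry', 'Eliza']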
Finally, we need a data structure to implement dictionaries so that we can actually
compute with them. Since the dictionary data type differs from the set data type only
by associating additional information with elements, most data structures for sets, in-
cluding arrays and lists, can be extended to also implement dictionaries. This is the case
for all those data structures that explicitly represent each element of the set, because in
that case we can simply add the additional information to the key. This also means that
any data structure for implementing a dictionary could be used to implement a set; one
can simply store an empty or meaningless piece of information together with the key.
For other tasks the order of items in a data representation does matter. Consider,
for example, the heirs of the deceased Sir Charles. The difference between the first
and second in line is the entitlement to a heritage worth one million pounds. This
information is not only important for determining who gets rich and who doesn’t but
also provides Holmes and Watson with clues about potential motives of their suspects.
In fact, the murderer Stapleton is second in line and tries to kill the first heir, Sir Henry.
While the succession of heirs matters, the ordering of heirs is not determined by the
time people enter the collection of heirs. For example, when a child is born to the
bequeather, it does not become the last in line but takes precedence in the ordering
over, say, nephews. A data type in which the ordering of elements is not determined
by the time of entry but by some other criterion is called a priority queue. The name
indicates that the position of the element in the queue is governed by some priority,
such as the relationship to the bequeather in the case of heirs, or the severity of the
injury of patients in an emergency room.
In contrast, if the time of entry does determine the position in a collection, the data
type is called a queue if the elements are removed in the order they were added or a stack
if they are removed in reverse order. A queue is what you go through in a supermarket,
a coffee shop, or a security check at the airport. You enter at one end and come out
at the other. People are served (and then leave the queue) in the order they entered. A
queue data type therefore establishes a first come, first serve policy. The order in which
elements enter and leave a queue is also called FIFO for first in, first out.
In a stack, on the other hand, elements leave in the opposite order in which they
were added. A good example is a stack of books on your desk. The topmost book,
which was placed on the stack last, has to be removed before any other book can be
accessed. When you have a window seat in an airplane, you are effectively on the bottom
of a stack when it comes to going to the bathroom. The person in the middle seat who
sat down after you has to leave the row before you can, and they have to wait for the
person in the aisle seat, who sat down last and is on top of the stack, to get out first.
Another example is the order in which Hansel and Gretel place and visit the pebbles.
The pebble that is placed last is the one they visit first. And if they use the improved
algorithm for avoiding loops, the last-placed pebble is the first they pick up.
At first, using a stack data type seems like a strange way of processing data, but
stacks are really good for staying organized. For Hansel and Gretel the stack of peb-
bles allows them to systematically go to the place where they have been before, which
ultimately leads all the way back to their home. In a similar way, Theseus escaped the
Minotaur’s labyrinth with the help of a thread given to him by Ariadne: unraveling the
thread amounts to adding it inch by inch to a stack, and coming out of the labyrinth
by rewinding the thread amounts to removing the thread from the stack. If in our daily
lives we are interrupted in one task by, say, a phone call, which is then interrupted by,
say, someone knocking at the door, we mentally keep those tasks on a stack, and we
finish the last one first and then go back and pick up where we left the previous one.
The order in which elements enter and leave a stack is therefore also called LIFO for
last in, first out. And to complete the mnemonics, we could call the order in which
elements enter and leave a priority queue HIFO for highest (priority) in, first out.
As for the set and dictionary data types, we would like to know which data struc-
tures can be used to implement the (priority) queue and stack data types. It is easy to
see that a stack can be well implemented by a list: by simply adding and removing ele-
ments always at the front of a list we implement the LIFO ordering, and this takes only
constant time. Similarly, by adding elements at the end of a list and removing them
at the front, we obtain the FIFO behavior of a queue. It is also possible to implement
queues and stacks with arrays.
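A sketch of both implementations in Python: the built-in list supports constant-time
addition and removal at one end, which is all a stack needs, and collections.deque
provides constant-time operations at both ends, as a queue requires:

    from collections import deque

    stack = []
    stack.append("pancake 1")
    stack.append("pancake 2")
    print(stack.pop())        # -> pancake 2: last in, first out (LIFO)

    queue = deque()
    queue.append("first customer")
    queue.append("second customer")
    print(queue.popleft())    # -> first customer: first in, first out (FIFO)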
Queues and stacks preserve the ordering of the elements during their stay in the
data structure and thus exhibit a predictable pattern of elements leaving them. Waiting
in the security line at an airport is usually rather boring. Things get more exciting when
people cut into line or when special lanes for priority customers exist. This behavior is
reflected in the priority queue data type.
A list of heirs computed ahead of time remains valid only as long as the family tree
does not change, for example, when children are born. In that case the family tree
would change, and the list computed from the old tree would not reflect the correct
inheritance ordering.
The information provided in The Hound of the Baskervilles indicates that the old
Hugo Baskerville had four children. Charles was the oldest child and inherited from
Hugo. The next-oldest brother, John, had one son named Henry, and the youngest
brother, Rodger, had a son who is Stapleton. The story mentions that Hugo Baskerville
had a daughter, Elizabeth, whom I assume to be the youngest child. The names in the
tree are called nodes, and as in a family tree, when a node B (such as John) is connected
to node A above it (such as Hugo), B is called A’s child, and A is called B’s parent. The
topmost node in a tree, which has no parent, is called its root, and nodes that don’t have
any children are called leaves.
[Figure: the Baskerville family tree, with Hugo at the root, his children Charles, John, Rodger, and Elizabeth below him, Henry below John, and Stapleton below Rodger.]
The inheritance rule applied to the Baskerville family tree says that Charles, John,
Rodger, and Elizabeth should inherit from Hugo, in that particular order. Since the
rule also says that children should inherit before siblings, this means that Henry is in
line before Rodger, and Stapleton precedes Elizabeth. In other words, the ordered list
of heirs should be as follows:
Hugo → Charles → John → Henry → Rodger → Stapleton → Elizabeth
This inheritance list can be computed by traversing the tree in a particular order, which
visits each node before its children. The traversal also visits the grandchildren of older
children before visiting younger children (and their grandchildren). The algorithm for
computing an inheritance list for a node in a tree can be described as follows:
To compute the inheritance list for a node N, compute and append the
inheritance lists for all of its children (oldest to youngest), and place the
node N at the beginning of the result.
This description implies that the inheritance list for a node that does not have children
simply consists of that node alone, so the inheritance list for that tree is obtained by
computing the inheritance list for its root. It might seem peculiar that the algorithm
refers to itself in its own description. Such a description is called recursive (see chapters
12 and 13).
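For readers who like to see the recursion spelled out, here is a minimal sketch in Python, assuming a tree node is represented as a pair of a name and a list of children ordered from oldest to youngest.

```python
def heirs(name, children):
    result = [name]                # the node itself comes first
    for child in children:         # children from oldest to youngest
        result += heirs(*child)    # append each child's inheritance list
    return result

baskerville = ("Hugo", [("Charles", []),
                        ("John", [("Henry", [])]),
                        ("Rodger", [("Stapleton", [])]),
                        ("Elizabeth", [])])
print(" -> ".join(heirs(*baskerville)))
# Hugo -> Charles -> John -> Henry -> Rodger -> Stapleton -> Elizabeth
```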
The following illustrates how the algorithm works by executing it on an example
tree, one with a few more members. Assume Henry had two children, Jack and Jill,
and a younger sister, Mary:
[Figure: the extended family tree, in which Henry has children Jack and Jill and a younger sister, Mary.]
If we execute the algorithm on this tree, starting with Hugo, we have to compute the
inheritance list for each of Hugo’s children. Beginning with the oldest child, the inher-
itance list for Charles consists of only himself, since he has no children. The situation
is more interesting for John. To compute his inheritance list, we need to compute the
inheritance lists for Henry and Mary and append them. Henry’s inheritance list con-
sists of his children plus himself, and since Mary has no children, her inheritance list
contains only herself. So far we have computed the following inheritance lists:

Charles
Henry → Jack → Jill
Mary
John → Henry → Jack → Jill → Mary
The inheritance list for a node always starts with that node itself, and the inheritance
list for a node without children is a list that only contains that node. Moreover, the
inheritance lists for Henry and John demonstrate how the inheritance lists for nodes
with children are obtained by appending their children’s inheritance lists. The inheri-
tance lists for Rodger and Elizabeth are computed in a similar way, and if we append
the inheritance lists for Hugo’s four children, adding him at the front, we obtain the
following ordered list of heirs:

Hugo → Charles → John → Henry → Jack → Jill → Mary → Rodger → Stapleton → Elizabeth
The examples in The Hound of the Baskervilles have illustrated how signs work as repre-
sentations and how data types and data structures organize collections of data.
Many Sherlock Holmes stories are replete with signs and their interpretation, in-
cluding the analysis of fingerprints, footprints, and handwriting. Even in The Hound
of the Baskervilles, there are several additional signs that were not discussed. For ex-
ample, Dr. Mortimer’s stick contains scratches that indicate its use as a walking stick,
and it also has bite marks indicating that Dr. Mortimer has a pet dog. Or, examining
the anonymous message sent to Sir Henry, Sherlock Holmes concludes that based on
their typeface, the letters used must have been cut out from the Times newspaper. He
also concludes from the size of the cut marks that a small pair of scissors must have
been used. For a detective, the conclusions to be drawn can differ when signs have
multiple interpretations or when some signs are ignored. This is illustrated in Pierre
Bayard’s Sherlock Holmes Was Wrong, where he argues that Sherlock Holmes identified
the wrong person as a murderer in The Hound of the Baskervilles. Of course, many other
detective stories, movies, and popular TV series, such as Columbo or CSI, provide ex-
amples of signs and representations and demonstrate how they are used in computing
conclusions about unsolved crimes.
Umberto Eco’s The Name of the Rose is the story of a murder mystery that takes
place in a monastery in the fourteenth century. It embeds semiotics into a detective
story. The protagonist William of Baskerville (the last name is actually a signifier for
the Sherlock Holmes story), who is in charge of the murder investigation, is open-
minded in his interpretation of signs and thus encourages the reader to play an active
role in analyzing the events in the story. More recently, Dan Brown employed many
signs in the popular novels The Da Vinci Code and The Lost Symbol.
In Gulliver’s Travels, by Jonathan Swift, we are told about a project in the grand
academy of Lagado that aims at avoiding the use of words, and thus signs, because they
only stand for things. Instead of uttering words, it is suggested that people carry objects
that they produce whenever they need to talk about them. Swift’s satire illustrates how
impractical communication becomes without signs.
Indiana Jones
Lost and Found
Where is that note you created a few months ago? You know you wrote it on a sheet
of paper that was related to the project that just came up again. You searched all the
places where that sheet of paper could possibly be—at least you think you did. Yet, you
can’t find it. You observe yourself searching some of the places repeatedly—maybe you
didn’t look carefully enough the first time. You also find notes you were looking for
desperately and unsuccessfully some time ago. Darn it.
Does that scenario sound familiar? I certainly have experienced it, and not just once.
Without doubt, finding something can be difficult, and an unsuccessful search can be
quite frustrating. Even in the days of the seemingly all-powerful Google search engine,
finding the right information on the internet can be difficult if you don’t know the right
keywords or if the keywords are too general and match too many sources.
Fortunately, we are not doomed to be helpless victims of data deluge. The challenge
of finding something efficiently can be met by organizing the space in which to search.
After all, your mother was right when she told you to clean up your room.1 An effec-
tive organization of the search space involves one or both of the following principles:
(1) partition the space into disjoint areas and place the searchable items into these areas,
and (2) arrange the searchable items in some order (within their areas).
For example, the partitioning principle may lead to placing all books on a shelf, all
papers into a file drawer, and all notes into a ring binder. If you then have to find a note,
you know that you only have to search in the ring binder and not in the file drawer or
on the bookshelf. Partitioning is important because it allows you to effectively limit the
search space to a more manageable size. This principle can also be applied on multiple
levels to make a search even more focused. For example, books can be further grouped
according to their subject, or papers can be grouped by the year in which they were
written.
Of course, partitioning works only under two important conditions. First, the cat-
egories used for subdividing the search space must be discernible for the item you are
looking for. Suppose, for example, you are looking for Franz Kafka’s The Metamorphosis.
Did you place it under “Fiction” or under “Philosophy”? Or maybe you placed it
with your other entomology books, or with comics about transforming superheroes,
such as The Hulk or some of the X-Men. Second, this strategy can work only if the
search partition is maintained correctly at all times. For example, if you take a note out
of the ring binder, and instead of putting it back immediately after using it, keep it on
your desk or stuff it in the bookshelf, you won’t be able to find it in the binder the next
time you’re looking for it.
Maintaining an order, the second principle to support searching, finds applications
in many different situations, from using printed dictionaries to holding a hand of play-
ing cards. The order among the searchable items makes it easier and faster to find items.
The method we employ effortlessly in searching a sorted collection is called binary
search. We pick an item somewhere in the middle of the collection and then continue
searching to the left or right, depending on whether the item we’re looking for is smaller
or larger, respectively. It is also possible to combine the partitioning and ordering prin-
ciples, which often provides some helpful flexibility. For example, while we can keep
the books on the shelf ordered by author name, we can order the papers by date or the
notes by their main subject.
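A minimal sketch of binary search in Python, assuming the collection is a sorted list of author names:

```python
def binary_search(items, target):
    low, high = 0, len(items) - 1
    while low <= high:
        mid = (low + high) // 2       # pick an item in the middle
        if items[mid] == target:
            return mid                # found it
        elif target < items[mid]:
            high = mid - 1            # continue searching to the left
        else:
            low = mid + 1             # continue searching to the right
    return -1                         # the target is not in the collection

authors = ["Austen", "Doyle", "Eco", "Kafka", "Swift", "Yeats"]
print(binary_search(authors, "Kafka"))   # 3
```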
A closer look at the two principles reveals that they are related. Specifically, the
idea of keeping things in order implies the idea of partitioning the search space, applied
rigorously and recursively. Each element divides the collection space into two areas
containing all smaller and larger elements, and each of those areas is again organized in
the same way. Maintaining an ordered arrangement requires real effort. The interesting
question from a computing perspective is whether the payoff of faster search justifies
the effort of maintaining order. The answer depends on how large the collection is and
how often one has to search it.
The need for searching arises not only in an office. Other environments, such as
kitchens, garages, or hobby rooms, are subject to the same challenges, frustrations, and
corresponding solutions. Searching is also called for in situations not usually perceived
as such. In chapter 5, Indiana Jones’s quest for the Holy Grail in The Last Crusade
provides several obvious and not-so-obvious examples of searching.
5 The Search for the Perfect
Data Structure
Data types discussed in chapter 4 capture specific access patterns to collections of data.
Since collections can grow very large, an important practical consideration is how to
manage them efficiently. We have already seen that different data structures can support
the efficiency of specific operations and that they have different space requirements.
While building and transforming collections are important tasks, finding items in a
collection is probably the most frequently needed operation.
We are searching for things all the time. Often this happens unconsciously, but
sometimes we are made painfully aware of it, such as when a mundane activity like
getting your car keys can turn into an agonizing search. The more places we have to
search and the more items we have to go through, the more difficult it is to find what
we’re looking for. And we tend to accumulate lots of artifacts over the years—after all,
we are descendants of hunters and gatherers. In addition to collector’s items such as
stamps, coins, or sports cards, we also hoard lots of books, pictures, or clothes over
time. Sometimes this happens as a side effect of some hobby or passion—I know several
home improvement aficionados who have amassed an impressive collection of tools.
If your bookshelf is organized alphabetically or by topic, or if your picture collec-
tion is electronic and has location and time tags, finding a particular book or picture
might be easy, but if the number of items is large and lacks any form of organization,
such a search can become arduous.
The situation gets much worse when it comes to electronically stored data. Since
we have access to basically unlimited storage space, the size of stored data grows rapidly.
For example, according to YouTube, 300 hours of video are uploaded to its site every
minute.1
Searching is a prevalent problem in real life, and it is also an important topic in
computer science. Algorithms and data structures can speed up the search process con-
siderably. And what works with data can sometimes help with storing and retrieving
physical artifacts. There is indeed a very simple method that relieves you from ever
having to search for your car keys again—if you follow the method meticulously, that
is.
How does a search for an archaeological artifact—or a person for that matter—work? If
the location of the sought object is known, no search is of course required. Simply go
there and find it. Otherwise, the search process depends on two things: the search space
and potential clues or keys that can narrow down the search space.
In The Last Crusade, Indiana Jones receives his father’s notebook containing infor-
mation about the Holy Grail by mail from Venice, which causes him to start his search
there. The notebook’s origin is a clue for Indiana Jones that narrows down the initial
search space considerably—from all of earth to one city. In this example the search
space is literally the two-dimensional geometric space, but in general the term search
space means something more abstract. For example, any data structure for representing
a collection of items can be viewed as a search space in which one can search for partic-
ular items. Sherlock Holmes’s list of suspects is such a search space in which one can
look for a particular name.
How well are lists suited for searching? As I have discussed in chapter 4, in the
worst case we have to inspect all elements in the list before we find the desired element
or learn that the element is not contained in the list. This means that searching through
a list makes no effective use of clues for narrowing down the search space.
Consider, as an example, searching for The Hound of the Baskervilles in a library where
the shelves are labeled with the ranges of the authors’ names they carry; the clue to use
is the author’s name, Doyle. The search through the shelves can be carried out in
different ways. One simple
approach is to inspect the shelves one by one until the shelf with the range containing
Doyle is found. This approach treats the shelves effectively as a list and consequently has
the same shortcomings. In the case of Doyle, the search might not take too long, but if
we are looking for a book by Yeats, it will take much longer. Few people would actually
start their search for Yeats at the A shelf but instead begin closer to the Z shelf and thus
find the correct shelf more quickly. This approach is based on the assumption that all
shelves are ordered by the names of authors whose books they carry and that if there
were 26 shelves (and assuming one shelf per letter), one would start looking for Yeats
on the twenty-fifth shelf and for Doyle on the fourth. In other words, one would treat
the shelves as an array that is indexed by letters and whose cells are shelves. Of course,
it is unlikely that the library contains exactly 26 shelves, but the approach scales to any
number of shelves; one would simply start the search for Yeats in the last one-thirteenth
of the shelves. A problem with this approach is that the number of authors whose names
start with the same letter varies greatly (there are more authors whose names start with
an S than with an X). In other words, the distribution of books is not uniform across
all shelves, and thus this strategy is not exact. Therefore, one generally has to go back
and forth to locate the target shelf. Exploiting knowledge about the distribution of
author names, one could increase the precision of this method substantially, but even
the simple strategy works well in practice and is effective in excluding as “outside” a
large number of shelves that need not be considered at all.
The search for the book within one shelf can proceed again in different ways. One
can scan the books one by one or apply a similar strategy to the one for locating shelves
by estimating the location of the book based on the clue’s position in the range of
authors for that particular shelf. Overall, the two-phase method works pretty well and
is much faster than the naive approach of looking at all books one by one. I performed
an experiment and searched for The Hound of the Baskervilles in the Corvallis Public
Library. I was able to locate the correct shelf in five steps and the book on the shelf in
another seven steps. This is quite an improvement over the naive method, given that at
the time the library carried in its adult fiction section 44,679 books on 36 shelves.
The decision by Indiana Jones to travel to Venice in search of his father is based on
a similar strategy. In this case the world is viewed as an array whose cells correspond
to geographic regions that are indexed by city names. The return address on the mail
containing Henry Jones, Sr.’s notebook serves as a clue that picks out the cell indexed
by “Venice” to continue the search. In Venice, the search leads Indiana Jones to a library,
where he is looking not for a book but for the tomb of Sir Richard, one of the knights
of the first crusade. He finds the tomb after crushing a floor tile marked with an X,
which is not without irony, given his earlier proclamation to his students that “X never,
ever marks the spot.”
The key to a fast search is to have a structure that allows the search space to be effectively
narrowed down quickly. The smaller the “inside” identified by a key, the better, because
it allows the search to converge faster. In the case of the book search there are two search
phases separated by one major narrowing-down step.
Survival by Boggle
Search often comes in disguise and in the most unexpected circumstances. Toward the
end of The Last Crusade, Indiana Jones arrives in the Temple of the Sun, where he has
to overcome three challenges before he can finally enter the room that contains the
Holy Grail. The second challenge requires him to cross a tiled floor over an abyss.
The problem is that only some of the tiles are safe while others crumble and lead to
a gruesome death when stepped on. The floor consists of 50 or so tiles, arranged in a
nonregular grid, each marked with a letter of the alphabet. Life as an archaeologist is
tricky: one day you have to crush a floor tile to make progress; on other days you have
to avoid it at all costs.
Finding those tiles that are safe to step on is not an easy task, since without any fur-
ther restrictions the number of possibilities is huge: more than one thousand trillion—1
followed by 15 zeros. The clue to finding a viable sequence of tiles that leads safely to
the other side of the floor is that the tiles should spell the name Iehova. Although this
information basically solves the puzzle, the identification of the correct sequence of tiles
still requires some work. Surprisingly, it involves a search that systematically narrows
down the space of possibilities.
This task is similar to playing Boggle, where the goal is to find
strings of connected letters on a grid that form words.3 Indiana
Jones’s task seems to be much easier because he knows the word
already. However, in Boggle, consecutive characters must be on adjacent tiles, which is
not a constraint for the tile floor challenge; this leaves more possibilities for Indiana
Jones to consider and thus makes his task harder again.
To illustrate the search problem to be solved, let us assume for simplicity that the tile
floor consists of six rows, each containing eight different letters, which yields a grid of 48
tiles. If the correct path consists of one tile in each row, there are 8 × 8 × 8 × 8 × 8 × 8 =
262,144 possible paths through all possible combinations of tiles in the six rows. Of
these, only one is viable.
How does Indiana Jones now find the path? Guided by the clue word, he finds the
tile that contains the letter I in the first row and steps on it. This identification is itself
a search process that involves multiple steps. If the letters on the tiles do not appear in
alphabetical order, Indiana Jones has to look at them one by one until he finds the tile
with the letter I. He then steps on it, and continues searching for the tile with the letter
e in the second row of tiles, and so forth.
If the one-by-one search reminds you of the search for an element in a list, you are
correct. This is exactly what’s happening here. The difference is that the list search is
applied repeatedly for each row, that is, for each character of the clue word. And therein
lies the power of this method. Consider what happens to the search space, that is, the
set of all 262,144 possible paths across the tile grid. Each tile in the first row identifies a
different starting point that can be continued in 8 × 8 × 8 × 8 × 8 = 32,768 different ways
by selecting the different combinations of letters from the five remaining rows. When
Indiana Jones looks at the first tile, which is labeled, say, K, he will of course not step
on it, since it doesn’t match the required I. This single decision has removed in one fell
swoop a total of 32,768 paths from the search space, namely, all those paths that can be
formed by starting with the K tile. And the same reduction happens with each further
rejected tile in the first row.
Once Indiana Jones reaches the correct tile, the reduction of the search space is even
more dramatic, because as soon as he steps on the tile, the decision process for the first
row is completed, and the search space is immediately reduced to the 32,768 paths that
can be formed with the remaining five rows. All in all, with at most seven decisions
(which are required when the I tile comes last) Indiana Jones has reduced the search
space by a factor of eight. And so he continues with the second row of tiles and the
letter e. Again, the search space is reduced by a factor of eight to 4,096 with at most
seven decisions. Once Indiana Jones has reached the last row, only eight possible paths
are left, and again with no more than seven decisions he can complete the path. In the
worst case this search requires 6 × 7 = 42 “steps” (only six literal steps)—a remarkably
efficient way of finding one path among 262,144.
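The row-by-row search is straightforward to express in code. Here is a minimal sketch in Python; the grid of letters is made up, except that each row contains the required letter of the clue word.

```python
def find_path(grid, word):
    path, steps = [], 0
    for row, letter in zip(grid, word):
        for position, tile in enumerate(row):
            steps += 1                 # one decision per inspected tile
            if tile == letter:
                path.append(position)  # step on this tile
                break                  # continue with the next row
    return path, steps

grid = ["KBIWMQTZ",                    # a made-up 6-by-8 tile floor
        "AEXLOPRS",
        "CHNUVDGF",
        "OYQMKTBW",
        "SVRELAXP",
        "DAGNUZHC"]
print(find_path(grid, "IEHOVA"))       # ([2, 1, 1, 0, 1, 1], 12)
```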
It might not be obvious, but the challenge mastered by Indiana Jones bears some
similarity to Hansel and Gretel’s. In both cases the protagonists have to find a path to
safety, and in both cases the path consists of a sequence of locations that are marked—
in the case of Hansel and Gretel by pebbles, in the case of Indiana Jones by letters.
The search for the next location in the paths is different, though: Hansel and Gretel
just have to find any pebble (although they have to make sure not to revisit pebbles),
whereas Indiana Jones has to find a specific letter. Both examples again emphasize the
role of representation in computation. Indiana Jones’s challenge specifically illustrates
that the sequence of signifiers (letters) for individual tiles is itself another signifier (the
word Iehova) for a path. Moreover, the fact that the word Iehova stands for a path
illustrates how a computation that simply searches for a word becomes meaningful and
important in the world.
The way in which Indiana Jones finds the clue word on the grid is exactly how one
can search efficiently for a word in a dictionary: first narrow down the group of pages
that contain words that start with the same letter as the clue word, then narrow down
the group of pages further to those that match the first and second letters, and so on,
until the word is found.
To better understand the data structure that lurks behind the tile grid and that makes
Indiana Jones’s search so efficient, we look at another way of representing words and
dictionaries that uses trees to organize the search space.
Suppose Indiana Jones were unsure of the exact spelling of the clue word and had to
consider five candidate words: God, Iehova, Jehova, Yahweh, and Yehova. A tile whose
letter occurs in more of these words is then more likely to be safe, and since all words
are considered to be equally likely correct, the chance of survival increases to three out
of five for the v tile, whose letter occurs in three of the five words.4
Therefore, a good strategy is to first compute the frequency of letters among the
five words and then try to step on tiles with the highest occurrence rate. Such a
mapping of letters to frequencies is called a histogram. To compute a letter histogram,
Indiana Jones needs to maintain a frequency counter for each letter. By scanning
through all the words he can increment the counter for each encountered letter.

[Figure: a histogram of the letters a, d, e, G, h, I, J, o, v, w, and Y, with frequencies between 0 and 4.]

Maintaining counters for different letters is a task that is solved by the
dictionary data type (see chapter 4). For this application the keys are individual letters,
and the information stored with each key is the letter frequency.
We could use an array with letters as indices as a data structure to implement this
dictionary, but this would waste over 50% of the array’s space, since we have only a
total of eleven different letters to count. Alternatively, we could also use a list. But that
wouldn’t be very efficient. To see this, let’s suppose we scan the words in alphabetical
order. Thus we start with the word God and add each of its letters into the list, together
with the initial count of 1. We obtain the following list:

G:1 → o:1 → d:1
Note that the insertion of G took one step, the insertion of o took two because we had
to add it after the G, and the insertion of d took three steps because we had to add it
after both G and o. But couldn’t we instead insert the new letters at the front of the
list? Unfortunately, that wouldn’t work because we have to make sure a letter is not in
the list before we add it, and so we have to look at all existing letters in the list before
we can add a new one. If a letter is already contained in the list, we instead increment
its counter. Thus we have already spent 1+2+3 = 6 steps on the first word.
For the next word Iehova we need 4, 5, and 6 steps to insert I, e, and h, respectively,
which brings us to a total of 21 steps. The next letter is an o, which already exists. We
only need 2 steps to find it and update its count to 2. Letters v and a are new and need
an additional 7+8 = 15 steps to be inserted at the end of the list. At this point our list
has taken 38 steps to build and looks as follows:

G:1 → o:2 → d:1 → I:1 → e:1 → h:1 → v:1 → a:1
Updating the dictionary with the counts from the word Jehova then takes 9 steps for
adding the J and 5, 6, 2, 7, and 8 steps for incrementing the counts for the remaining
letters that are already present in the list, which brings the total number of steps to
75. After processing the last two words, Yahweh and Yehova, using 40 and 38 steps,
respectively, we finally end up with the following list, which took a total of 153 steps to
construct:
G:1 → o:4 → d:1 → I:1 → e:4 → h:4 → v:3 → a:4 → J:1 → Y:2 → w:1
Note that we must not add the second h in Yahweh, since counting it twice for one word
would increase, incorrectly, the odds of the letter (in this case from 80% to 100%).5 A
final scan of the list shows that the letters o, e, h, and a are the safest to begin with, all
having a probability of 80% of being contained in the correct word.
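A minimal sketch of this histogram computation in Python, using its built-in dictionary type and taking care to count each letter at most once per word:

```python
words = ["God", "Iehova", "Jehova", "Yahweh", "Yehova"]

histogram = {}
for word in words:
    for letter in set(word):    # ignore repeated letters within a word
        histogram[letter] = histogram.get(letter, 0) + 1

best = max(histogram.values()) # 4 out of 5 words, that is, 80%
print(sorted(l for l, count in histogram.items() if count == best))
# ['a', 'e', 'h', 'o']
```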
[Figure: three small binary search trees built from letters: a single node G; a node G with right child o; and a node G whose left subtree has root d with children a and e, and whose right child is o.]
Figure 5.1 Three example binary trees. Left: A tree of only one node. Middle: A tree whose
root has a right subtree of a single node. Right: A tree whose root has two subtrees. All three
trees have the binary search property, which says that nodes in left subtrees are smaller than the
root, and nodes in right subtrees are larger than the root.
If the sought value is equal to the value in the root, we have found the value and the
search is over. Otherwise, if the value is smaller than the root, we can limit the further
search exclusively to the left subtree. We do not have to look at any of the nodes in the
right subtree because all those nodes are known to be larger than the root and thus
larger than the sought value. In each case, the internal node defines a boundary that
separates the “inside” (given by the left subtree) from the “outside” (given by the right
subtree), just as the return address of the Grail diary defines the “inside” of the next
step of Indiana Jones’s search for his father.
The three trees shown in figure 5.1 are all binary search trees. They store letters as
values that can be compared according to their position in the alphabet. To find a value
in such a tree works by repeatedly comparing the sought value to the values in nodes in
the tree and correspondingly descending into the right or left subtrees.
Suppose we want to find out whether e is contained in the right tree. We start at
the root of the tree and compare e with G. Since e precedes G in the alphabet and is
thus “smaller” than G, we continue the search in the left subtree, which contains all
values smaller than G that are stored in this tree. Comparing e with the root of the
left subtree, d, we find that e is larger and therefore continue the search in the right
subtree, which is a tree of only one node. Comparing e with that node finishes the
search successfully. Had we searched for, say, f instead, the search would have led to
the same node e, but since e does not have a right subtree, the search would have ended
there with the conclusion that f is not contained in the tree.
Any search proceeds along a path in the tree, and so the time it takes to find an
element or to conclude that it does not exist in the tree never takes longer than the
longest path from the root to a leaf in the tree. We can see that the right tree in figure
5.1, which stores five elements, contains paths to leaves that have a length of 2 or 3 only.
This means we can find any element in at most three steps. Compare this with a list of
five elements, where a search for the last element of the list always takes five steps.
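A minimal sketch of this search in Python, representing a node as a triple of left subtree, value, and right subtree (with None for a missing subtree) and, as in the figure, comparing letters alphabetically regardless of case:

```python
def contains(tree, letter):
    if tree is None:
        return False                   # ran off the tree: not found
    left, root, right = tree
    if letter.lower() == root.lower():
        return True                    # found it
    elif letter.lower() < root.lower():
        return contains(left, letter)  # can only be in the left subtree
    else:
        return contains(right, letter) # can only be in the right subtree

# The right tree of figure 5.1: root G, left subtree d (with
# children a and e), right child o.
tree = (((None, "a", None), "d", (None, "e", None)), "G", (None, "o", None))
print(contains(tree, "e"), contains(tree, "f"))   # True False
```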
We are now ready to follow the computation of the letter histogram using a binary
search tree to represent a dictionary. Again, we start with the word God and add each
of its letters into the tree, together with the initial count of 1. We obtain the following
tree, in which the ordering of the letters is reflected by where they appear:
G:1
d:1 o:1
In this case the insertion of G took one step, as it did for the list, but the insertion of o
and d both took only two steps, since they are both children of G. Thus we have saved
one step for the first word (1+2+2 = 5 steps for the tree versus 1+2+3 = 6 for the list).
Inserting the next word, Iehova, requires three steps for each of the letters, except
for o and h, which require two and four steps, respectively. As with lists, processing the
o does not create a new element but increments the count of the existing element by
1. Processing this word therefore takes 18 steps and brings the total to 23 steps, com-
pared to 38 steps that were needed for the list data structure. The word Jehova changes
the tree structure only slightly by adding a node for J, which takes four steps. With
3+4+2+3+3 = 15 more steps the counts for the existing letters are updated, leading to
a total of 42 steps, compared to 75 for the list:

[Figure: the binary search trees after adding the node for J and after updating the counts for the remaining letters of Jehova.]

Finally, adding Yahwe(h) and Yehova, each requiring 19 steps, we obtain the following
tree, which took a total of 80 steps to construct. This is about half of the 153 steps that
were needed for the list data structure.
[Figure: the final binary search tree, storing the same letter counts as the final list above.]
This binary search tree represents the same dictionary as the list, but in a form that sup-
ports searching and updating faster—in general, at least. The spatial shapes of lists and
trees explain some of the differences in their efficiency. The long and narrow structure
of the list often forces the search to go long distances and to look at elements that are
not relevant. In contrast, the wide and shallow form of the tree directs the search ef-
fectively and limits the elements to be considered and the distance traversed. However,
a fair comparison between lists and binary search trees requires us to consider a few
additional aspects.
Since each insertion may have to traverse all the elements already in the list, the time
for building the whole list becomes quadratic, which can be prohibitive. Recall the strategy that had Hansel go all
the way back home for fetching each new pebble.
What, in comparison, is the runtime for inserting an element into a tree and search-
ing for an element in a tree? The final search tree contains 11 elements, and searching
and inserting takes between three and five steps. In fact, the time to find or insert an
element is bounded by the height of the tree, which is often much smaller than the
number of its elements. In a balanced tree, that is, a tree in which all paths from the
root to a leaf have the same length (±1), the height of the tree is logarithmic in its size, which
means the height grows by 1 only whenever the number of nodes in the tree doubles.
For example, a balanced tree with 15 nodes has a height of 4, a balanced tree with 1,000
nodes has height 10, and 1,000,000 nodes fit comfortably into a balanced tree of height
20. Logarithmic runtime is much better than linear runtime, and as the size of a dictio-
nary gets really large, the tree data structure gets better and better compared to the list
data structure.
This analysis is based on balanced binary search trees. Can we actually guarantee
that constructed trees are balanced, and if not, what is the runtime in the case of unbal-
anced search trees? The final tree in this example is not balanced, since the path to leaf
a is of length 3 while the path to leaf w is of length 5. The order in which the letters are
inserted into the tree matters and can lead to different trees. For example, if we insert
the letters in the order given by the word doG, we obtain the following binary search
tree, which is not balanced at all:
[Figure: the tree with root d:1, right child o:1, and G:1 as o’s left child, forming a chain.]
This tree is extremely unbalanced—it is not really different from a list. And this is not
some quirky exception. Of the six possible sequences of the three letters, two lead to
a balanced tree (God and Gdo) and the other four lead to lists. Fortunately, there are
techniques to balance search trees that get out of shape after insertions, but these add to
the cost of insertion operations, although the cost can still be kept logarithmic.
Finally, binary search trees work only for those kinds of elements that can always be
ordered, that is, for any two elements where we can say which is larger and which is
smaller. Such a comparison is required to decide in which direction to move in a tree
to find or store an element. However, for some kinds of data, such comparisons are
not possible. Suppose, for example, you are keeping notes on quilting patterns. For
each pattern you want to record what fabrics and tools you need, how difficult it is
to make, and how much time it takes. To store information about those patterns in a
binary search tree, how could you decide whether one pattern is smaller or larger than
another? Since patterns differ in the number of pieces they contain, as well as the shapes
and colors of the pieces, it’s not obvious how to define an order among those patterns.
This is not an impossible task—one could decompose a pattern into its constituting
parts and describe it by a list of its features (for example, the number of pieces or their
colors and shapes) and then compare patterns by comparing their lists of features, but
this requires effort, and it may not be very practical. Thus a binary search tree might
not be well suited for storing a quilting pattern dictionary. However, you could still use
a list, since all you need to do for a list is to decide whether two patterns are the same,
which is easier than ordering them.
To summarize: Binary search trees exploit a strategy employed naturally and ef-
fortlessly by people to decompose search problems into smaller ones. In fact, binary
search trees have systematized and perfected the idea. As a consequence, binary search
trees can be much more efficient than lists in representing dictionaries. However, they
require more effort to guarantee the balancing, and they do not work for every kind of
data. And this should match your experience with searching your desk, office, kitchen,
or garage. If you regularly devote effort to keeping things in order, for example, by
placing documents or tools back after their use, you have a much easier time finding
objects than by having to search through a pile of unorganized stuff.
Try a Trie
Binary search trees are an efficient alternative to lists when it comes to computing a
histogram of letter frequencies in words. But this is only one of the computations that
help Indiana Jones solve the tile floor challenge. The other computation is to identify a
sequence of tiles that spell a particular word and lead safely across the tile floor.
We have already determined that a grid consisting of six rows, each with eight letters,
contains 262,144 different six-tile paths. Since each path corresponds to one word, we
could represent the association of words to paths with a dictionary. A list would be an
inefficient representation, since the clue word Iehova might be located near the end of
the list and thus might take quite some time to locate. A balanced binary search tree
would be much better because its height would be 18, ensuring the clue word can be
found relatively quickly. But we don’t have such a balanced search tree available, and
constructing it is a time-consuming endeavor. As with the search tree containing the
letters, we have to insert each element separately, and to do so we have to repeatedly
traverse paths in the tree from the root to the leaves. Without analyzing this process in
detail, it should be clear that building the tree takes more than linear time.6
Still, we can identify the sequence of tiles quite efficiently (with no more than 42
steps) without any additional data structure. How is this possible? The answer is that
the tile floor with the letters on the tiles is itself a data structure that supports par-
ticularly well the kind of search Indiana Jones has to perform. It is called a trie,7 a
data structure that is somewhat similar to a binary search tree but also different in an
important way.
Recall how Indiana Jones with each step on a tile reduces the search space by a factor
of eight. This is similar to what happens when by descending into a balanced binary
search tree the search space is cut in half: by taking one of two tree branches, half the
nodes in the tree are excluded from the search. Similarly, by not taking a tile, the search
space is reduced by one-eighth, and by selecting a tile, the search space is reduced to one-
eighth of its previous size, because selecting one tile means not selecting seven others
and thus reducing the search space by seven-eighth.
But something else happens here as well that is different from how a binary search
tree works. In a binary search tree each element is stored in a separate node. By contrast,
a trie stores only single letters in each node and represents words as paths connecting
different letters. To understand the difference between a binary search tree and a trie,
consider the following example. Suppose we want to represent the set of words bat, bag,
beg, bet, mag, mat, meg, and met. A balanced binary search tree containing these words
looks as follows:
[Figure: a balanced binary search tree containing the eight words, with bet at the root, bat as the root of its left subtree, and mat as the root of its right subtree.]
To find the word bag in this tree, we compare it with the root bet, which tells us to
continue the search in the left subtree. This comparison takes two steps for comparing
the first two characters of both words. Next we compare bag with the root of the
left subtree, bat, which tells us again to continue the search in the left subtree. This
comparison takes three steps, since we have to compare all three characters of the two
words. Finally, we compare bag with the leftmost node in the tree, which results in the
successful completion of the search. This last comparison also takes three steps, and the
whole search requires 8 comparisons altogether.
We can represent the same collection of words by a 2-by-3 tile floor, where each row
contains tiles for the letters that can occur at the corresponding position in any of the
words. For example, since each word begins with either b or m, the first row needs two
tiles for these two letters. Similarly, the second row needs two tiles for a and e, and the
third row needs tiles for g and t.
b m
a e
g t
Finding a word on the tile floor works by systematically traversing the tile floor row-
by-row. For each letter of the word the corresponding tile row is traversed from left to
right until the letter is found. For example, to find the word bag on this tile floor, we
start searching for the first letter, b, in the first row. We find the tile in one step. Next
we search for the second letter a in the second row, which again takes one step. Finally,
we can complete the search successfully by finding g in the third row, which also takes
only one step. Altogether this search takes 3 comparisons.
The search on the tile floor requires fewer steps than the binary search tree because
we have to compare each letter only once, whereas the search in the binary tree causes
repeated comparisons of initial parts of the word. The word bag represents the best case
scenario for the tile floor, since each letter can be found on the first tile. By contrast,
the word met requires six steps, since each of its letters is contained on the last tile in a
row. But it cannot get worse than this because we have to check each tile at most once.
(For comparison, finding met in the binary search tree requires 2+3+3+3 = 11 steps.)
The best case for the binary search tree is finding the word bet, which requires only
three steps. However, with growing distance from the root, binary search leads to more
and more comparisons. Since most words are located toward the leaves of the tree, at
a greater distance from the root,8 we generally have to repeatedly perform comparisons
of initial parts of the word, which indicates that in most cases search in a trie is faster
than in a binary search tree.
The tile floor analogy suggests that a trie can be represented as a table, but this is not
always the case. The example works out nicely because all words have the same length
and each position in a word can contain the same letters. However, suppose we want
to also represent the words bit and big. In this case, the second row needs an additional
tile for the letter i, which breaks the rectangular shape. Adding the two words reveals
another regularity in the example that does not exist in general. The example words
are chosen so that different prefixes of the same length have the same possible suffixes,
which allows the sharing of information. For example, ba and be can both be completed
by adding either g or t, which means that the possible continuations of both prefixes
can be described by a single tile row containing the letters g and t. However, this is not
true anymore when we add the words bit and big. While b can be continued with 3
possible letters, namely, a, e, and i, m can be continued only with a and e, which means
that we need two different continuations.
Therefore, a trie is typically represented as a special kind of binary tree in which
nodes carry single letters, left subtrees represent word continuations, and right subtrees
represent letter alternatives (plus their continuations). For example, the trie for the
words bat, bag, beg, and bet looks as follows:
[Figure: the trie for bat, bag, beg, and bet, with root b, a left edge from b to a, a right edge from a to its alternative e, and a left edge from a to g, which has the right child t.]
Since the root has no right edge, all words start with b. The left edge from b to the
subtree with root a leads to the possible continuations, which start with either a or e,
the alternative to a pointed to by the right edge from a. The fact that ba and be can be
continued by either g or t is represented by a left edge from a to a subtree with root g,
which has a right child t. The fact that the left subtrees of a and e are identical means
that they can be shared, which is exploited in the tile floor representation by having a
single tile row. Because of this sharing, a trie typically requires less storage than a binary
search tree.
Whereas in the binary search tree each key is completely contained in a separate
node, the keys stored in a trie are distributed over the trie’s nodes. Any sequence of
nodes from the root of a trie is a prefix of some key in the trie, and in the case of the
tile floor, the selected tiles are a prefix of the final path. This is why a trie is also called
a prefix tree. The trie data structure representing the tile floor facing Indiana Jones is
illustrated in figure 5.2.
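For illustration, here is a minimal sketch of a trie in Python. It uses nested dictionaries, in which each node maps a letter to the trie of its possible continuations; this is a common representation that differs from the binary left-child/right-sibling form described above, and the special key "end" marking a complete word is an invented convention.

```python
def insert(trie, word):
    for letter in word:
        trie = trie.setdefault(letter, {})  # extend the path for this letter
    trie["end"] = {}                        # mark a complete word

def contains(trie, word):
    for letter in word:
        if letter not in trie:
            return False                    # no continuation with this letter
        trie = trie[letter]                 # follow the prefix one letter further
    return "end" in trie

trie = {}
for word in ["bat", "bag", "beg", "bet"]:
    insert(trie, word)
print(contains(trie, "bag"), contains(trie, "bad"))   # True False
```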
Like binary search trees and lists, tries have their advantages and their problems.
For example, they work only for keys that can be decomposed into sequences of items.
As with Indiana Jones, who in the end was not able to hold on to the Holy Grail, there
is no Holy Grail of data structures.

[Figure 5.2 A trie data structure and how to search it. Left: A trie representation where left
subtrees represent possible word continuations of the letter in the parent node (represented by
the rounded rectangles) and right subtrees represent alternatives to the parent node. Middle top:
The tile floor representation of the trie on the left in which the common subtrees are shared
through single tile rows. Middle bottom: Three selected tiles that spell the beginning of the word
Iehova. Right: A path through the trie marked by bold edges for the selected tiles. The circled
nodes correspond to selected tiles.]

The best choice of a data structure always depends
on the details of the application.
When we watch the movie scene, we consider the real challenge to be finding the
clue word or jumping precisely onto the target tiles, but hardly the identification of
the tiles according to the letters of the clue word. This seems so obvious that we don’t
think about it at all, which again demonstrates how natural the execution of efficient
algorithms is to us. As with Hansel and Gretel and Sherlock Holmes, Indiana Jones’s
adventures and his survival are based on core computing principles.
And finally, if you’re still wondering about the method to never lose your car keys
again, it is very simple: When you come home, always place them at one particular
location. This location is the key to the keys. But you probably knew that already.
Getting Your Ducks in a Row
If you are a teacher, a significant part of your job is to grade essays or exams for the
students in your classes. The task involves more than just reading and grading individual
exams. For example, before deciding on the final grades you may want to get a sense
of the grade distribution and potentially curve the grades. Moreover, once the grades
are fixed, you have to enter them into the class roster, and finally, you have to hand the
exams back to the students. In some cases each of these tasks can be performed more
efficiently by employing some form of sorting.
Let’s first consider entering grades into a roster. Even for this simple task one can
identify three different algorithms with different runtime performances. First, you can
go through the pile of graded exams and enter each grade into the roster. Each exam
is obtained in constant time from the pile, but it takes logarithmic time to find the
student’s name in the roster, assuming it is sorted by names and you use binary search
to find it. For entering all grades into the roster this adds up to linearithmic runtime,1
which is much better than quadratic but not quite as good as linear.
Second, you can go through the roster name by name and find for each student her
or his exam in the pile. Again, getting to the next name in the roster takes constant time,
but finding the exam for a particular student in the pile requires, on average, traversing
half the pile, resulting in quadratic runtime.2 Thus this algorithm should not be used.
Third, you could sort the exams by name and then go through the roster and the
sorted pile in parallel. Since the sorted pile and the roster are aligned, entering each
grade takes only constant time. Altogether this leads to linear time plus the time it takes
to sort the list of exams, which can be done in linearithmic time. The total runtime for
this method is dominated by the linearithmic time required for sorting. Therefore, the
total runtime of this last algorithm is also linearithmic. But that’s not better than the
first approach, so why go to the extra trouble of sorting the exams in the first place?
Since there are situations in which we can actually sort faster than linearithmic time,
the third method can in these cases improve the performance and be the most efficient
approach.
The second task, curving the exam grades, requires creating a point distribution,
that is, a map with point scores and the number of students who have obtained that
score. This task is similar to Indiana Jones’s task of computing a histogram of letters.
A grade distribution is also a histogram. Computing a histogram using a list takes
quadratic runtime while using a binary search tree takes linearithmic runtime, which
suggests using the latter. You could also compute a histogram by first sorting the pile
of exams by points, followed by a simple scan of the sorted pile where you count how
often each point score repeats. This works, since in a sorted list all the exams with the
same points will follow one another. As in the case of entering the grades into the class
roster, sorting does not increase the runtime but can potentially shorten it. However,
the situation is different if the point range is small. In that case, updating an array of
point scores is probably the fastest method.
Finally, handing back exams in a classroom can become an embarrassingly slow
endeavor. Imagine standing in front of a student and trying to find her or his exam
in a large pile. Even using binary search this takes too long. Repeating the search for
a large number of students makes this an inconveniently slow process (even though it
gets faster with each step as the pile of exams gets smaller). An alternative is to sort the
exams by position of students in the class. If seats are numbered and you have a seating
chart (or just happen to know where students sit), you can sort the exams by seat
numbers. Handing back the exams then becomes a linear algorithm where, similarly
to entering the grades into the class roster, you can traverse the seats and the pile of
exams in parallel and hand back each exam in one step. In this case, even if the sorting
takes linearithmic time, it is worth the effort, since it saves precious class time. This
is an example of precomputing, where some data needed for an algorithm is computed
before the algorithm is executed. It is what the mail carrier does before delivering mail
or what you do when arranging your shopping list according to how items are placed
in the aisles at the grocery store.
Sorting is a more general activity than one might think. In addition to sorting col-
lections of items, we regularly have to sort tasks based on their dependencies. Consider
the algorithm for getting dressed: you have to figure out that socks go on before shoes
and underpants before pants. Assembling furniture, repairing or maintaining machines,
filling out paperwork—most of these activities require us to perform some actions in
the correct order. Even preschoolers are given problems to sort pictures into a coherent
story.
6 Sorting out Sorting
example, searching). More generally, computation is a resource that can be saved and
reused through data structures. This interplay between data structures and algorithms
shows how closely related the two are.
Figure 6.1 Selection sort repeatedly finds the smallest element in an unsorted list (left of verti-
cal bar) and appends it to the sorted list (right). Elements are not sorted by name but according
to the dependency of tasks they represent.
Both methods start with an unsorted list and repeatedly move elements around until
the list is sorted. The difference between the two methods can be best illustrated by
describing how to move elements from an unsorted list into a sorted list.
The first method, illustrated in figure 6.1, is to repeatedly find the smallest element
in the unsorted list and append it to the sorted list. To compare actions, one action is
considered to be smaller than another if it can come before. Consequently, the smallest
action is one that does not have to be preceded by any other action. At the beginning
all elements are still to be processed, and the sorted list is empty. Since the first task is
to travel to Nepal, this means Nepal is the smallest element and is thus the first element
to be selected and appended to the sorted list. The next step is to find the possessor of
the disc, that is, to find Marion. Next Indiana Jones has to obtain the disc, travel to
Tanis, and uncover the map room, which is reflected in the second-to-last line in figure
6.1. Finally, Indiana Jones has to focus the sunbeam using the disc to reveal the location
of the Well of Souls, where he can then find the ark. The resulting sorted list represents
the plan for Indiana Jones’s adventure.
As the figure illustrates, the algorithm is finished when the list of unsorted elements
becomes empty, since in that case all elements have been moved to the sorted list. This
sorting algorithm is called selection sort because it is based on repeatedly selecting an
element from an unsorted list. Note that the method works just as well the other way
around, that is, by repeatedly finding the largest element and adding it to the beginning
of the sorted list.
Figure 6.2 Insertion sort repeatedly inserts the next element from the unsorted list (left of the
vertical bar) into the sorted list (right).
It may seem obvious, but how can we actually find the smallest element in a list?
A simple method is to remember the value of the first element and compare it with
the second, third, and so on, until we find an element that is smaller. In that case, we
take the value of that element and continue comparing and remembering the currently
smallest element until we reach the end of the list. This method takes time linear in the
length of the list, since every element has to be inspected before we can be certain to
have found the smallest.
Most of the effort in selection sort is expended on finding the smallest element in
an unsorted list. Even though the unsorted list shrinks by one element in each step, the
overall runtime of the sorting algorithm is still quadratic, since we have to traverse, on
average, a list that contains half the elements. We saw the same pattern when analyzing
Hansel’s algorithm in chapter 2 for fetching the next pebble from home: the sum of the
first n numbers is proportional to the square of n.1
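A minimal sketch of selection sort in Python; the rank dictionary encoding the dependency order of Indiana Jones’s tasks is my own reconstruction of figure 6.1.

```python
rank = {"Nepal": 1, "Marion": 2, "Disc": 3, "Tanis": 4,
        "Map": 5, "Sunbeam": 6, "Well": 7, "Ark": 8}

def selection_sort(unsorted):
    result = []
    while unsorted:
        smallest = min(unsorted, key=rank.get)  # linear scan for the smallest
        unsorted.remove(smallest)               # remove it from the unsorted list
        result.append(smallest)                 # append it to the sorted list
    return result

tasks = ["Ark", "Disc", "Map", "Marion", "Nepal", "Sunbeam", "Tanis", "Well"]
print(selection_sort(tasks))
# ['Nepal', 'Marion', 'Disc', 'Tanis', 'Map', 'Sunbeam', 'Well', 'Ark']
```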
The other popular method for sorting takes an arbitrary element (usually the first)
from the unsorted list and places it at the correct position in the sorted list. The element
is inserted after the largest element in the sorted list that is smaller than the element to
be inserted. In other words, the insertion step traverses the sorted list to find the last
element that has to precede the element to be inserted.
The effort in this method is spent on the insertion of elements, not their selection,
which gives the method the name insertion sort (see figure 6.2). Insertion sort is the
preferred sorting method of many people who play cards. Having a pile of dealt cards
in front of them, they pick up one after the other and insert it into an already sorted
hand.
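A matching minimal sketch of insertion sort, reusing the rank dictionary from the selection sort sketch above:

```python
def insertion_sort(unsorted):
    result = []
    for element in unsorted:               # take elements in arbitrary order
        position = 0
        while position < len(result) and rank[result[position]] < rank[element]:
            position += 1                  # walk past all smaller elements
        result.insert(position, element)   # insert at the proper position
    return result

print(insertion_sort(["Ark", "Disc", "Map", "Marion",
                      "Nepal", "Sunbeam", "Tanis", "Well"]))
```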
The difference between insertion sort and selection sort can best be seen when the
Sunbeam element is moved from the unsorted list into the sorted list. As figure 6.2
shows, the element is simply removed from the unsorted list, without the need for
any search, and is inserted into the sorted list by traversing it until its proper position
between Map and Well is found.
This example also demonstrates a subtle difference in the runtime of both algo-
rithms. While selection sort has to traverse one of the two lists completely for every
selected element, insertion sort has to do this only when the element to be inserted is
larger than all elements currently in the sorted list. In the worst case, when the list to
be sorted is already sorted to begin with, every element will be inserted at the end of
the sorted list. In this case insertion sort has the same runtime as selection sort. In
contrast, when the list is inversely sorted, that is, from the largest to smallest element,
every element will be inserted at the front of the sorted list, which leads to a linear run-
time for insertion sort. One can show that on average insertion sort still has quadratic
runtime. Insertion sort can be much faster than selection sort in some cases and never
does worse.
Why does insertion sort have a better runtime in some cases than selection sort, even
though they both operate similarly? The key difference is that insertion sort makes use
of the result of its own computation. Since new elements are inserted into an already
sorted list, insertions do not always have to traverse the whole list. In contrast, selection
sort always appends to the sorted list, and since the selection process cannot exploit the
sorting, it always has to scan the whole unsorted list. This comparison illustrates the
important computer science design principle of reuse.
Apart from efficiency, do the two sorting methods differ in how well they are suited
for the problem of arranging actions into a plan? Neither method is ideal, because
a particular difficulty in this example problem is not the process of sorting but the
decision about which actions precede other actions. If there is uncertainty about the
exact ordering of elements, selection sort seems least attractive, since in the very first
step one already has to decide how a tentative smallest element compares to all other
elements. Insertion sort is better, since one can pick an arbitrary element in the first
step without the need to perform any comparisons and thus already obtains a sorted
one-element list. However, in the steps that follow, each selected element has to be
compared to more and more elements in the growing sorted list to find its correct
position. While in the beginning the number of difficult comparisons might be smaller
for insertion sort than for selection sort, the algorithm might force decisions that one
might not be able to make yet. Is there a method that gives a sorter more control over
which elements to compare?
Figure 6.3 Quicksort splits a list into two sublists of elements that are smaller and larger,
respectively, than a chosen pivot element. Then those two lists are sorted, and the results are
appended together with the pivot element to form the resulting sorted list.
The third method, quicksort, picks an arbitrary element of the list, called the pivot, and
splits the remaining elements into those that are smaller and those that are larger than the
pivot. In figure 6.3 the pivot is Tanis, which yields the sublists Nepal, Marion, and Disc
(the elements that must come before Tanis) and Map, Sunbeam, and Well (those that must
come after). Each sublist is then sorted in the same way. To sort the first sublist, suppose
we pick Nepal as the pivot. The sublist of elements smaller than Nepal is empty, which is
trivially sorted already; the elements larger than Nepal form a two-element list. To sort this
two-element list, we simply have to compare the two elements and invert their position,
which yields the sorted list Marion → Disc. Then we append the empty list, Nepal,
and Marion → Disc and obtain the sorted sublist Nepal → Marion → Disc. The sorting
works similarly if we pick any of the other elements as pivots. If we pick Disc, we
again obtain an empty list and a two-element list that needs to be sorted, and if we pick
Marion, we obtain two single-element lists that are both already sorted. Sorting the
sublist of elements larger than Tanis is analogous.
Once the sublists are sorted, we can append them with Tanis in the middle and
obtain the final result. As figure 6.3 shows, quicksort converges surprisingly quickly.
But is this always the case? What is the runtime of quicksort in general? It seems
we got lucky by picking Tanis as the pivot in the first step, since Tanis divides the
list into two lists of equal size. If we can always pick a pivot with that property, the
sublists will always be cut in half, which means that the number of division levels will
be proportional to the logarithm of the original list’s length. What does this mean for
the overall runtime of quicksort in this and the general case?
The first iteration, in which we split one list into two, takes linear time, since we
have to inspect all the elements in the list. In the next iteration we have to split two
sublists, which again takes altogether linear time because the total number of elements
in both lists is 1 less than in the original list,2 no matter where we split and how long each
of the two sublists is. And this scheme continues: each level has no more elements
than the previous one and thus takes linear time to split. Altogether we accumulate
linear time for each level, which means that the runtime of quicksort depends on the
number of levels needed for splitting sublists. Once the original list has been completely
decomposed into single elements, the splitting process ensures that all these elements are
in the correct order and can simply be concatenated into a sorted result list. This takes,
Figure 6.4 Growth of linear, linearithmic, and quadratic runtimes as the size of the input grows.
once more, linear effort, so the total runtime for quicksort is given by the sum of the
linear times for all levels.
In the best case, when we can split each sublist roughly in half, we obtain a loga-
rithmic number of levels. For example, a list with 100 elements leads to seven levels, a
list with 1,000 elements leads to ten levels, and a list with 1,000,000 can be completely
decomposed in only 20 levels.3 For the total runtime this means that in the case of
1,000,000 elements, we have to spend the linear effort of scanning a list of 1,000,000 el-
ements only 20 times. This leads to a runtime on the order of tens of millions of steps,
which is much better than a quadratic runtime, which for this input size would mean
about a trillion steps. This runtime behavior, where the runtime grows proportionally to
the size of the input times the logarithm of the size, is called linearithmic. It is not quite
as good as linear runtime, but it is much better than quadratic runtime (see figure 6.4).
Quicksort has linearithmic runtime in the best case. However, if we pick the pivots
unwisely, the situation is quite different. For example, if we choose Nepal instead of
Tanis as the first pivot, the sublist of smaller elements will be empty, and the sublist
of larger elements will contain all elements except Nepal. If we next choose Marion as
the pivot of that sublist, we run into the same situation where one list is empty and
the other is only one element shorter than the sublist that is split. We can see that in
this situation quicksort effectively behaves just like selection sort, which also repeatedly
removes the smallest element from the unsorted list. And like selection sort, quicksort
will in this case also have quadratic runtime. The efficiency of quicksort depends on
the selection of the pivot: if we can always find an element in the middle, the splitting
process returns evenly divided lists. But how can we find a good pivot? While it is not
easy to guarantee good pivots, it turns out that picking a random element or the median
of the first, middle, and last element works well on average.
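A Python sketch of quicksort using the random-pivot strategy just mentioned (elements equal to the pivot are kept next to it):

```python
import random

def quicksort(items):
    """Sort by splitting around a pivot and sorting the parts (quicksort)."""
    if len(items) <= 1:               # empty and one-element lists are sorted
        return list(items)
    pivot = random.choice(items)      # a random pivot works well on average
    smaller = [x for x in items if x < pivot]
    equal   = [x for x in items if x == pivot]
    larger  = [x for x in items if x > pivot]
    return quicksort(smaller) + equal + quicksort(larger)

print(quicksort([3, 1, 2]))  # [1, 2, 3]
```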
The importance and impact of the pivot is closely related to the notion of bound-
ary that was used in chapter 5 to explain the essence of effective search. In the case of
searching, the purpose of the boundary is to divide the search space into an outside and
an inside so that the inside is as small as possible to make the remainder of the search
easier. In the case of sorting, the boundary should divide the sorting space into equal
parts so that the sorting of each part is sufficiently simplified through the decomposi-
tion. Therefore, if the pivot is not chosen well, quicksort’s runtime deteriorates. But
while quicksort has quadratic runtime in the worst case, it has linearithmic runtime on
average, and it performs well in practice.
Figure 6.5 Mergesort splits a list into two sublists of equal size, sorts them, and merges the
sorted results into one sorted list. The parentheses indicate the order in which lists must be
merged. In line 4 the decomposition is complete when only single-element lists are obtained.
In line 5 three pairs of single-element lists have been merged into three sorted two-element lists,
and in line 6 those lists are merged again into one four-element and one three-element list, which
are merged in the last step to produce the final result.
A related method for sorting a pile of exams by student name is to first distribute
the exams into buckets according to the first letter of each name and then to sort each
bucket separately, for example, using insertion sort. After all exams have been placed
into their buckets, the buckets are appended in alphabetical order to produce an ordered
list. Bucket sort, as this method is called, is similar to counting sort (discussed later).
On the face of it, mergesort looks more complicated than quicksort, but this may
be due to the fact that some steps were skipped in the description of quicksort. Still,
it seems that the repeated merging of longer and longer lists is inefficient. However,
this intuition is deceptive. Since the decomposition is systematic and halves the size of
lists in every step, the overall runtime performance is quite good: in the worst case the
runtime of mergesort is linearithmic. This can be seen as follows. First, since we always
split lists in half, the number of times lists need to be split is logarithmic. Second, the
merging on each level takes only linear time because we have to process each element
once (see figure 6.5). Finally, since merging happens on each level once, we altogether
obtain linearithmic runtime.
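Both phases of mergesort, the splitting and the merging, can be sketched in Python as follows (again with invented names and directly comparable elements):

```python
def merge(left, right):
    """Combine two sorted lists into one sorted list, like a zipper."""
    result = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:       # take the smaller front element
            result.append(left[i])
            i += 1
        else:
            result.append(right[j])
            j += 1
    return result + left[i:] + right[j:]  # one list may have elements left

def mergesort(items):
    """Split the list in half, sort both halves, and merge the results."""
    if len(items) <= 1:
        return list(items)
    mid = len(items) // 2
    return merge(mergesort(items[:mid]), mergesort(items[mid:]))

print(mergesort([3, 1, 2]))  # [1, 2, 3]
```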
Mergesort bears some similarity to quicksort. In particular, both algorithms have a
phase for splitting lists, followed by a recursive sorting step for each of the smaller lists,
and finally a phase for combining the sorted sublists into longer sorted lists. In fact,
both quicksort and mergesort are examples of divide-and-conquer algorithms, which all
are instances of the following general schema: if the problem is trivial, solve it directly;
otherwise, divide it into smaller subproblems, solve the subproblems recursively, and
merge their solutions into a solution for the whole problem.
The case for solving a nontrivial problem shows how divide-and-conquer works. The
first step is the divide step; it reduces the problem’s complexity. The second step is
the recursive application of the method to the subproblems. If the subproblems thus
obtained are small enough, they can be solved directly. Otherwise, they will be de-
composed further until they are small enough to be solved directly. The third step is
the merge step; it is assembling a solution for the problem from the solutions of the
subproblems.
Quicksort does most of its work in the divide step, where all the element com-
parisons happen: by ensuring that the elements in the two lists are separated by the
pivot, it allows the merge step to be a simple concatenation of lists. In contrast, merge-
sort’s divide step is very simple and does not contain any element comparisons. Most
of mergesort’s work happens in the merge step, where the sorted lists are combined,
similar to a zipper.
Consider, for example, sorting a list of numbers between 1 and 100. We can traverse the
list once and count in an array, indexed from 1 to 100, how often each number occurs.
For a list containing the numbers 2, 2, 4, 4, 5, and 6 in some order, the array looks as follows:

index:    1  2  3  4  5  6  7  ···
counter:  0  2  0  2  1  1  0  ···
By traversing the array we find that 1 does not occur in the list, since its counter is
still 0. A 1 will therefore not be part of the sorted result list. In contrast, the 2 occurs
twice and will therefore be put twice into the list, and so on. The resulting list will
be 2→2→4→4→5→6. This method is called counting sort, since the array maintains
a counter for how often an element occurs in the list to be sorted. The runtime of
counting sort results from the combined cost of traversing the list and array. Since
either step is linear in the size of the respective data structure (list and array), counting
sort runs in linear time in the size of the list or array, whichever is larger.
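A Python sketch of counting sort, assuming the elements are integers between 1 and some known maximum (here 100):

```python
def counting_sort(items, max_value=100):
    """Sort integers between 1 and max_value by tallying occurrences."""
    counter = [0] * (max_value + 1)       # one counter cell per possible value
    for x in items:                       # first pass: traverse the list
        counter[x] += 1
    result = []
    for value in range(1, max_value + 1): # second pass: traverse the array
        result.extend([value] * counter[value])
    return result

print(counting_sort([5, 2, 4, 6, 2, 4]))  # [2, 2, 4, 4, 5, 6]
```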
The downside of counting sort is that it can waste a lot of space. In the example all
cells with index 7 through 100 are never used. Moreover, it only works if the elements
can be used to index an array and if the range of the elements of the list is known
and is not too large. For example, we cannot sort a list of names using counting sort
because names are sequences of characters and cannot be used to index arrays.5 There
are other specialized sorting algorithms for lists of strings. For example, the trie data
structure (see chapter 5) can be used to sort lists of strings, but those methods make
other assumptions about the elements to be sorted.
As Good as It Gets
Without exploiting special properties of the data, we cannot sort faster than mergesort.
While this fact may seem disappointing at first, it is also good news because it gives
us the assurance that we have found the fastest algorithm possible. In other words,
mergesort is the best possible solution for the problem of sorting. It is therefore an
optimal algorithm. Computer science researchers can consider the problem solved and
spend their time and energy on solving other problems.
The optimality of mergesort rests on two related but distinct facts. Mergesort has
linearithmic runtime, and any sorting algorithm for the general case must have at least
linearithmic runtime. It is this second part that justifies the conclusion about merge-
sort’s optimality and, for that matter, about the optimality of any other sorting algo-
rithm that has linearithmic runtime in the worst case. The ultimate goal in designing
algorithms and data structures is to find an optimal algorithm for a problem, that is,
an algorithm whose worst-case runtime complexity matches the intrinsic complexity
of the problem it solves. Any such algorithm could be considered the Holy Grail for
that problem. And like Indiana Jones in The Last Crusade, John von Neumann found
in mergesort the Holy Grail of sorting.6
It is important to distinguish between the runtime of an algorithm and the com-
plexity of a problem. The latter says that a correct solution must take at least that many
steps. In contrast, the runtime of an algorithm says that the particular algorithm takes
at most that many steps. A statement about the minimal complexity of a problem is
called the problem’s lower bound. A lower bound provides an estimate of how much
work a problem will need at minimum and thus characterizes the problem’s inherent
complexity. It is similar to the geometric distance between two points, which is a lower
bound for any path connecting the points. Any such path may be longer than the dis-
tance because of obstacles, but it cannot be shorter than the distance. The possible
disappointment one may feel about the lower bound of sorting and the related limi-
tation of sorting algorithms should be offset by the deep insight that this knowledge
provides about the problem of sorting. Compare this with similar results from other
disciplines. For example, in physics we know that we cannot travel faster than light and
that we cannot create energy out of nothing.
But how can we be so certain about the lower bound of sorting? Maybe there is
an algorithm no one has thought of yet that actually does run faster than linearithmic
time. Proving a negative is not easy, since it requires us to show that any algorithm,
existing or yet to be invented, has to perform a certain minimum number of steps. The
argument for the lower bound of sorting counts the number of possible lists of a specific
length7 and shows that the number of comparisons required to identify the sorted list
is linearithmic.8 Since every algorithm needs to perform this minimum number of
comparisons, it follows that any algorithm requires at least a linearithmic number of
steps, which proves the lower bound.
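The gist of that counting argument can be compressed into one calculation. A list of n elements can appear in n! different orders, and each comparison can at best cut the number of orders still consistent with the answers in half, so any sorting algorithm needs at least log₂(n!) comparisons:

```latex
\log_2(n!) \;=\; \log_2 1 + \log_2 2 + \cdots + \log_2 n
           \;\geq\; \frac{n}{2} \, \log_2 \frac{n}{2}
```

The inequality holds because the largest n/2 terms of the sum are each at least log₂(n/2), and the right-hand side grows proportionally to n times the logarithm of n, that is, linearithmically.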
The reasoning about the runtime of algorithms and lower bounds makes assump-
tions about the capabilities of the computer that is executing algorithms. For example,
a typical assumption is that the steps of an algorithm are carried out in sequence and
that it takes one time unit to perform one computational step. These assumptions also
underlie the analysis of the sorting algorithms discussed in this chapter. If we assume,
however, that we can perform comparisons in parallel, the analysis changes, and we
obtain different results for runtimes and lower bounds.
Computation Preservation
It seems that Indiana Jones is pretty organized about going on an adventure: he always
packs his hat and whip into his travel bag. But instead of making a comprehensive plan
of all the steps, he could also approach an adventure by determining his next step just
right before it is necessary to do so. Both approaches, planning ahead and living in
the moment, have their respective advantages and disadvantages. A trip to the North
Pole requires different clothing and equipment than one to the Amazon. Here planning
ahead seems like a good idea. On the other hand, changing circumstances can make
prior plans obsolete and render the planning effort useless. In particular, during an
adventure the unexpected may happen, which often requires taking different actions
than anticipated.
If Indiana Jones’s strategy for finding the Lost Ark were to determine the next action
whenever it is needed, he would essentially be performing a selection sort and would
always search for the smallest element among the actions left to be performed, because
smallest here means “has to come before all other actions.” As discussed, this is not
very efficient, since selection sort is a quadratic algorithm. Indiana can do significantly
better if he creates a plan in advance by employing a linearithmic sorting algorithm
such as mergesort. This strategy of computing information ahead of time is called
precomputation. In the case of the plan for locating the Lost Ark the precomputed
information is not the set of individual steps but the arrangement of the steps into the
correct order.
The ordering information is kept in a sorted list. The crucial aspect of precomputation
is that computational effort is expended at one time and the computed result is used at
a later time. The precomputed result is preserved in a data structure—in this case a
sorted list. This sorted list acts like a computational battery that can be charged through
sorting. The runtime spent by an algorithm to establish the sorted list amounts to the
energy used to charge the data structure battery. This energy can be used to power
computation, such as finding the next smallest element. Here, to power means to speed
up: without the sorted-list battery, it takes linear time to find the next smallest element;
with the battery, it only takes constant time.
Some ways are more efficient than others in charging data structure batteries, which
is reflected in the different runtimes of the different sorting algorithms. For example,
insertion sort is less efficient than mergesort, and the fact that mergesort is an optimal
sorting algorithm means that a most efficient method exists for charging the sorted-list
battery. Unlike electrical batteries, whose energy can be spent only once, data structure
batteries have the nice property that, once charged, they can be discharged repeatedly
without ever having to recharge them. This feature is an important point in favor of
precomputing. Situations in which one can expect to make use of a data structure
repeatedly provide a strong incentive to expend the precomputing effort because the
cost can be amortized over several uses. On the other hand, a data structure battery
must be fully charged to work properly. An almost sorted list is not sufficient because
it does not guarantee that the smallest element will be at the front and can therefore
produce incorrect results. Essentially, this means that a data structure battery has two
states: it is either completely charged and useful, or not.
Precomputation seems like a great idea—like the squirrel that gathers nuts to save
them for the winter. There are many situations, however, when it’s not clear whether
the precomputation effort will pay off. We know many circumstances when acting
early can be beneficial, but this isn’t guaranteed and could actually turn out to be a
disadvantage. If you buy a plane ticket early or book a hotel room at a nonreimbursable
hotel rate, you may get a good deal, but if you fall ill and can’t make the trip, you can
lose your money and do worse than you would have done by waiting to buy.
In cases like these, the value of acting early, or precomputing, is called into ques-
tion by uncertainty about the future. Since the benefit of precomputation depends on
a specific outcome of future events, it reflects a rather optimistic computation attitude
with a confident outlook on the future. But what if you’re skeptical about the future?
You may think that people who file their tax returns early are masochists, and you al-
ways delay yours until April 14, because the IRS could be dissolved any day now, or
you could die before. However, if, like Ben Franklin, you are certain that taxes are as
certain as death, then precomputation may seem a prudent thing to do.
A skeptical attitude toward the future calls for a radically different strategy for
scheduling computation, namely, a strategy that tries to delay costly operations as much
as possible until they cannot be avoided anymore. The hope or expectation is that
something might happen that makes the costly operation obsolete and thus saves its
runtime (and potentially other resources). In real life this behavior would be called pro-
crastination; in computer science it is called lazy evaluation. Lazy evaluation can save
computation effort whenever the information that would have been obtained by the
saved computation is not needed anymore. In the case of Indiana Jones’s adventures,
it happens quite often that an initial plan has to be changed or abandoned because of
unforeseen events or complications. In such cases, all the effort that went into creating
the plan is wasted and could have been saved by not making a plan in the first place.
Whereas the precomputing advocate’s motto is “A stitch in time saves nine” or “The
early bird catches the worm,” the champion of lazy evaluation might respond with “Yes,
but the second mouse gets the cheese.” While lazy evaluation seems attractive in its
promise to not waste effort, it is problematic when an action that becomes unavoidable
takes longer than it would have under precomputing, or worse, longer than there is
time available. In particular, when several delayed actions become due at the same time,
this might present a serious resource problem. Therefore, an overall more sensible
strategy is to distribute work evenly over time. While this might waste some effort on
precomputing, it avoids crises that a lazy evaluation strategy might bring on.
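In programming languages, lazy evaluation is often supported directly; in Python, for example, a generator computes its next result only when asked for it. A small sketch (with invented names, echoing the plan example from above):

```python
def next_actions(actions):
    """Lazily determine the next action only when it is requested,
    instead of sorting the whole plan up front (a Python generator)."""
    remaining = list(actions)
    while remaining:
        step = min(remaining)   # find the next action on demand
        remaining.remove(step)
        yield step              # pause until the next step is requested

plan = next_actions([3, 1, 2])
print(next(plan))  # 1 -- the rest of the work is delayed, perhaps forever
```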
Getting Lunch
It’s Wednesday, the day of the week when you and some of your colleagues go out
to lunch together. You decide to try out the new Italian restaurant, but when you
arrive you learn that their credit card reader is not working and they are accepting only
cash today. Before you order, you determine how much cash your lunch party has.
When it comes to ordering, you are faced with the problem of selecting a set of items
(appetizers, entrées, side dishes, and drinks) that will feed all members of the lunch
party and satisfy their preferences as much as possible but that also doesn’t exceed the
budget given by the amount of cash available. Of course, if you have enough money,
the menu selection doesn’t pose a problem, but that is not necessarily a safe assumption
these days, as few people carry cash at all, expecting to be able to pay by debit or credit
card or smartphone.
How do you go about selecting the menu items to order? You could start with
everybody’s preferred order. If that fits within the budget, then the problem is solved.
But what if the total exceeds the available cash amount? In that case, people could
offer to order cheaper alternatives or to not order appetizers or drinks until the total
is within the limit. This approach depends on your ability to determine the overall
value of each lunch order based on all participants’ preferences. Here value means the
combined satisfaction of the lunch party with a particular lunch selection.
This is probably not an easy task, but let’s assume that we can determine the value
of lunch orders. Now the goal is to find a lunch order of maximum satisfaction value
whose total cost doesn’t exceed the cash limit. Perhaps a good strategy for this is to
gradually trade value for cost. However, this strategy isn’t as simple as it looks. Even
if it is better for Bob to refrain from ordering a drink than for Alice to not order an
appetizer, it is not clear that going along with this selection yields the better overall
value. This is because Alice’s appetizer might be more expensive than Bob’s drink and
thus might save enough so that Carol could have her second entrée preference instead
of her third. Now Bob’s and Carol’s preferences together might be more valuable to
satisfy than Alice’s. When it gets down to the details, it is not so clear which selections
to change and in which order.
We encounter this problem of selecting a number of items on a limited budget in
many other situations as well, including planning a vacation with more costly or less
costly events, choosing different options for traveling, or selecting among the upgrades
for a new car or some electronic gadget. It might not seem like it, but this problem
is surprisingly difficult in general. All known algorithms to solve these problems have
runtimes that are exponential in the number of items to select from. For instance,
consider writing down all possible lunch selections on small napkins of size 10 cm by
10 cm that each fit ten selections. Assuming that each person selects between one and
four items out of a menu of ten items (which gives 5,860 menu options for each person),
the napkins needed for writing down all possibilities for a lunch party of five would be
enough to cover the surface of the earth over 13 times.
The lunch selection problem has two characteristic features: (1) the number of pos-
sibilities to be considered grows very quickly, and (2) known algorithms work only if
they examine all or most of these possibilities. Problems like these are called intractable
because algorithms take too long to be of practical use for all but very simple cases.
But this doesn’t stop us from ordering lunches, planning vacations, and making
other selections we are often quite happy with. We employ approximation algorithms,
which are efficient algorithms that do not necessarily compute an exact solution for the
problem but a solution that is just good enough. For example, a very simple approxima-
tion for the lunch problem is to split the total budget among all members of the lunch
party and have each person find a selection that fits.
The difficulty of finding optimal selections can also be exploited. Insurance compa-
nies can offer overwhelmingly many choices that make it difficult for customers to find
the optimal choice and make more money for the company.
7 Mission Intractable
Some tasks could be accomplished in principle, but the time (and other resources) it
would take to actually do that makes it close to impossible in practice.
Similarly, there are computational problems that have solutions in principle but take
too long to compute in practice. This chapter discusses examples of that sort. I present
some problems that can be solved (so far) only by algorithms that have exponential
runtime, that is, algorithms whose runtimes grow exponentially with the size of the
input. Algorithms with exponential runtime are unusable in practice for all but very
small inputs, which makes the question of lower bounds and whether faster algorithms
exist particularly important. This question is at the core of the P = NP problem, a
prominent problem in computer science that is still open.
As in the case of sorting, a result about the limits of computer science that looks
at first like a disappointment doesn’t have to be. Even if no efficient algorithms can be
developed for a particular problem, that doesn’t mean we have to give up on such prob-
lems altogether. In particular, we can devise approximation algorithms that compute
solutions that are not exact but good enough. Moreover, the fact that a particular
problem cannot be solved in practice can sometimes be exploited for solutions to other
problems.
While large inputs reveal the difference between a quadratic or a linearithmic algo-
rithm, a quadratic algorithm may work fine for small inputs. For example, sorting a list
of 10,000 elements with selection sort takes about one-tenth of a second on a computer
that can perform a billion operations per second. While the runtime is barely notice-
able in this case, an increase in the list’s size by a factor of ten means an increase of the
algorithm’s runtime by a factor of one hundred. Thus, the runtime for sorting lists of
100,000 elements increases to about 10 seconds. This may not be acceptable anymore,
in particular, in situations where users are interacting with a system and expect instant
responses. However, the next generation of computers may remedy the situation, and
the algorithm may again become usable. Technological advances may suffice to push
the limit of a quadratic algorithm. Unfortunately, we cannot rely on this effect to make
an algorithm with exponential runtime usable.
At the beginning of Raiders of the Lost Ark, Indiana Jones tries to swap the golden idol
for a bag of sand that approximates the idol's weight. Alas, the bag is too heavy and
triggers the traps, which leads to a spectacular escape from the deadly cave.
Had Indiana Jones known the exact weight of the idol, he could
have filled the bag with the precise amount of sand, and his exit from
the cave would have been much less dramatic. But since he was proba-
bly not carrying around a scale, he needed to gauge the weight in some
other way. Fortunately, it is not too difficult to build a balance scale. Basically all you
need is a stick; attach the sand bag to one end and the exact weight at the other, and
then fill the sand bag with sand until the stick is balanced. Assuming that Indiana Jones
doesn’t have an object that has the exact same weight as the idol, he has to approximate
the weight with a collection of objects. This doesn’t seem like such a difficult problem.
If the idol has an estimated weight of, say, 42 ounces (which is about 2.6 pounds or 1.2
kg) and Indiana Jones has six objects that weigh 5, 8, 9, 11, 13, and 15 ounces, respec-
tively, then after trying several combinations he will find that the 5, 9, 13, and 15 ounce
objects add up exactly to 42 ounces.
But how exactly does the approach of trying several combinations work, and how
long does it take? In the example, the heaviest object weighs less than half of the idol,
which means we need at least three objects to get to the idol’s weight. However, it is
not immediately clear which three objects to pick and whether we need a fourth or fifth
object as well. Moreover, an algorithm must work for any possible situation and thus
be able to handle as input different numbers of objects of different weights as well as
arbitrary target weights.
A straightforward method for solving the weighing problem is to systematically
form all combinations of objects and check for each combination whether the total
weight is equal to the target weight. This strategy is also called generate-and-test, since
it consists of the repeated execution of the following two steps: (1) generate a potential
solution, and (2) test whether the potential solution actually is a solution. In this case,
the generating step produces a combination of objects, and the testing step adds the
weights of the objects and compares the sum with the target weight. It is important
that the generating step be repeated systematically and cover all cases because otherwise
the algorithm could miss the solution.
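A Python sketch of generate-and-test for the weighing problem, using the weights from the example (the function name is invented):

```python
from itertools import combinations

def weigh(weights, target):
    """Generate every combination of weights and test whether its
    sum equals the target (generate-and-test)."""
    for size in range(len(weights) + 1):            # combinations of each size
        for combo in combinations(weights, size):   # generate a candidate ...
            if sum(combo) == target:                # ... and test it
                return combo
    return None                                     # no combination fits

print(weigh([5, 8, 9, 11, 13, 15], 42))  # (5, 9, 13, 15)
```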
The runtime of the generate-and-test method depends on how many combinations
of objects can be formed. To understand how many combinations exist, let us consider
how any one particular combination is formed from all the available objects. To this
end, take an arbitrary combination such as 5, 9, and 11. For every available object we can
ask whether it is contained in this particular combination. For example, 5 is contained
in it, but 8 is not; 9 and 11 are elements of the combination, but the elements 13 and
15 are not. In other words, any possible combination is given by a decision for each
element to either include or exclude it, and all these decisions are independent of each
other.
We can compare the process of forming a combination with that of filling out a
questionnaire that contains the available objects, each with a check box next to it. One
combination corresponds to a questionnaire in which the boxes of the selected objects
are checked. The number of combinations is then identical to the number of different
ways the questionnaire can be filled out. And here we see that each box can be checked
or not, independently of all the other boxes.
Therefore, the number of possible combinations is given by the product of the
available choices, which is 2 for selecting or not selecting an object (or equivalently,
checking or not checking a box). Since Indiana Jones has six objects to choose from, the
number of combinations is 2 multiplied by itself six times, that is, 2 × 2 × 2 × 2 × 2 × 2 =
2⁶ = 64.1 Since the generate-and-test algorithm has to test each combination, its runtime
grows at least at the same rate. In fact, it is even more time consuming, since testing
one combination requires adding all the weights and comparing the sum with the target
weight.
Table 7.1 Approximate runtimes for different sizes of input on a computer that can perform a
billion steps per second.

Input Size    Linear          Linearithmic    Quadratic      Exponential
20            —               —               —              0.001 second
50            —               —               —              13 days
100           —               —               —              ∗
1,000         —               —               —              ∗
10,000        —               —               0.1 second     ∗
1 million     0.001 second    0.02 second     16 minutes     ∗
1 billion     1 second        30 seconds      32 years       ∗

Note: Dashes mark runtimes of less than 1 millisecond, too small to be meaningful for human
perception of time, and thus of no practical significance. Asterisks mark runtimes too immense
to comprehend.
Shared Destiny
The generate-and-test algorithm for solving the weighing problem works only for rel-
atively small inputs (less than 30 or so), which is fine for the specific problem Indiana
Jones faces. But because of the exponential growth of its runtime, the algorithm will
never be able to deal with inputs of size 100 or larger. This particular algorithm’s expo-
nential runtime does not mean that there can’t be other, more efficient algorithms with
nonexponential runtimes. Currently, however, no such algorithm is known.
A problem that can only be solved by algorithms with exponential (or worse) run-
time is called intractable. Since we only know algorithms with exponential runtime for
the weighing problem, it may seem that it is intractable, but we don’t really know for
sure. Maybe there is a nonexponential algorithm that just hasn’t been discovered yet.
If we could prove that no such nonexponential algorithm can exist for the problem,
then we would know the problem is intractable. A lower bound could provide us with
certainty.
But why should we care about this? Maybe computer scientists should just let go
of this problem and move on to investigate other questions. However, the weighing
problem is very similar to a range of other problems that all share the following two
intriguing properties. First, the only known algorithms for solving the problem have
exponential runtime, and second, if a nonexponential algorithm were found for any one
of the problems, it would immediately lead to nonexponential algorithms for all the
others. Any problem that is a member of this exclusive club is called NP-complete.5 The
significance of NP-complete problems comes from the fact that many practical problems
are indeed NP-complete and they all share the fate of potentially being intractable.
Toward the end of the adventure The Kingdom of the Crystal Skull, Indiana Jones’s
companion Mac collects treasure items in the temple of the crystal skull. He tries
to carry as many items as he possibly can while maximizing their total value. This
problem, the weighing problem, and the group lunch problem are all instances of the
knapsack problem, which is named for the task of filling a knapsack of limited capacity
with as many items as possible while maximizing some value or utility of the packed
items. In the weighing problem the knapsack is the scale, its limit is the weight to
be measured, the packing of items amounts to placing objects on the scale, and the
optimization objective is to get as close as possible to the target weight. For Mac’s
selection problem the limit is what he can carry, packing means to select items, and the
optimization is maximizing the items’ total value. In the lunch problem the limit is the
total amount of cash available to purchase items, the packing is the selection of items,
and the optimization is maximizing the value of the menu selection.
Incidentally, Mac and Indiana Jones both fail: Mac spends too much time selecting
the items and dies in the crumbling temple, and Indiana Jones’s sandbag triggers the
deadly traps, although he ultimately manages to escape them. It’s unclear whether their
failure is due to the NP-completeness of the problems they try to solve, but it is a nice
reminder of intractability.
The knapsack problem with its applications is only one of many NP-complete prob-
lems. Another well-known example is the traveling salesman problem, which asks for a
round trip connecting a number of cities that minimizes the length of the trip. A simple
method is to generate all possible round trips and find the shortest one. The complexity
of this algorithm is also exponential, but it is much worse than for the weighing problem,
since the number of possible round trips grows with the factorial of the number of cities.
For example, while the weighing algorithm can complete a problem of size 20 in 1 mil-
lisecond, computing a round trip for 20 cities would take 77 years. Finding round trips
is not just useful for traveling salesmen but has many other applications, ranging from
planning school bus routes to scheduling tours of cruise ships. Many other optimization
problems are also NP-complete.
The intriguing property of NP-complete problems is that they all are either in-
tractable or have algorithms with nonexponential runtimes as solutions. One way of
showing that a problem is NP-complete is to demonstrate that a solution for the prob-
lem could be transformed (in nonexponential time) into a solution for another problem
that is already known to be NP-complete. Such a transformation of a solution is called a
reduction. Reductions are clever ways of problem solving by transforming a solution for
one problem into a solution for another. Reductions occur quite frequently, and some-
times implicitly, as when Indiana Jones reduces the problem of finding safe tiles to the
spelling of the name Iehova, or when Sherlock Holmes reduces the task of maintaining
information about suspects to operations on a dictionary.
Reductions are themselves computations, and they make a handy addition to the
repertoire of computational techniques, since they can be combined with other com-
putations. For example, the problem of finding the smallest element in a list can be
reduced to sorting the list and then taking the first element of the sorted list. Specifi-
cally, a reduction transforms the input (here, an unsorted list) into a specific form (here,
a sorted list) on which another algorithm (here, taking the first element) can operate.
Combining both computations, sorting followed by taking the first element, yields the
computation that is a solution to the original problem of finding a minimum in an ar-
bitrary list. However, it is important to take into account the runtime of reductions.
For example, since the sorting step requires linearithmic time, the method obtained
by this reduction does not have the same runtime as simply scanning the list for the
minimum, which takes only linear time. This reduction is therefore not very useful
because it leads to a less efficient algorithm.6 Therefore, the reductions between NP-
complete problems must be achieved by nonexponential algorithms because otherwise
a nonexponential solution to one problem would become exponential because of the
reduction’s exponential runtime.
The nonexponential reduction between all NP-complete problems provides enor-
mous leverage. Just like the wheel had to be invented by only one individual and could
then be used by all humans, the solution to one NP-complete problem solves all others.
And this fact leads to the question that is summarized by the famous equation:
P = NP ?
The problem was first brought up by the Austrian-American logician Kurt Gödel and
was made precise in 1971 by the American-Canadian computer scientist Stephen Cook.
The question is whether the class P of problems that can be solved in nonexponential
(that is, polynomial) time is equal to the class NP of problems whose solutions can be verified in poly-
nomial time. Finding an exponential lower bound for NP-complete problems would
provide “no” as the answer to the equation question. On the other hand, finding a
nonexponential algorithm for any of the problems would yield “yes” as the answer.
The tractability of a huge number of practical problems depends on the answer to the
P = NP question, which emphasizes its great importance. This problem has been vex-
ing computer scientists for over four decades and is probably the most important open
problem in computer science. Most computer scientists today believe that NP-complete
problems are indeed intractable and that nonexponential algorithms do not exist for the
problems, but no one really knows for sure.
For the weighing problem, a simple approximation algorithm initially sorts the objects
by weight in descending order. It then finds the first object that weighs less than the
target weight and keeps adding objects as long as the total does not exceed the target
weight. Start by adding 15, 13, and 11, which
makes a total of 39. Adding the next weight, 9, would exceed the target weight, and
thus we skip it and try the next one. But both 8 and 5 are also too heavy, and thus the
approximate result obtained is 15, 13, and 11, which is only 3 off the optimal solution.
In other words, the solution is within 7% of the optimal solution.
This strategy of repeatedly picking the largest possible value is called a greedy al-
gorithm, since it always jumps at the first opportunity. Greedy algorithms are simple
and work well for many problems, but for some problems they miss exact solutions, as
in this case. It was the greedy action of taking the 11 instead of waiting for the 9 that
made an optimal solution unachievable in this example. This greedy algorithm takes
linearithmic runtime (caused by the initial sorting step) and is therefore quite efficient.
An important question for an approximation algorithm is how good an approxima-
tion it can produce in the worst case. For the greedy weighing algorithm one can show
that it is always within 50% of the optimal solution.7
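A Python sketch of this greedy algorithm (the call to sorted is the linearithmic sorting step):

```python
def greedy_weigh(weights, target):
    """Approximate the target weight greedily: consider the objects
    from heaviest to lightest and add each one that still fits."""
    chosen, total = [], 0
    for w in sorted(weights, reverse=True):  # the linearithmic sorting step
        if total + w <= target:
            chosen.append(w)
            total += w
    return chosen, total

print(greedy_weigh([5, 8, 9, 11, 13, 15], 42))  # ([15, 13, 11], 39)
```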
Approximations are solutions to problems that are good enough. They are not as
good as an optimal solution but better than no solution at all. In Indiana Jones and
the Temple of Doom, Indiana Jones and his two companions face the problem that their
plane is about to crash into a mountain. The two pilots have abandoned the plane,
drained the remaining fuel, and have not left any parachutes on board. Indiana Jones
solves this problem by using an inflatable boat to sail the passengers to the ground, ap-
proximating the effect of a parachute. Having any approximation algorithm, no matter
how crude, is often better than having no practical algorithm at all.
Make Lemonade
Approximation algorithms can ameliorate the problem of inefficiency caused by expo-
nential algorithms. That is good news, but there is more. The fact that solutions to a
particular problem cannot be computed efficiently can actually be a good thing. For
example, the generate-and-test algorithm for solving the weighing problem is similar to
what we have to do when trying to unlock a number lock after having forgotten its
combination: We have to go through all combinations, of which there are 10 × 10 × 10 =
1,000 in the case of three single-digit dials. It is the fact that it takes a long time to try all
1,000 combinations that makes number locks somewhat effective in protecting access to
bags and lockers. Of course, locks can be broken, but that is more like bypassing the
problem, not solving it.
To check 1,000 combinations with an electronic computer is trivial, but for a slow
human it takes long enough, which shows that efficiency is a relative notion and depends
on the ability of the computer. But since algorithmic runtime is really about the growth
of an algorithm’s runtime in relation to its growing input, faster computers cannot
make up for the immense increase in runtime of exponential algorithms that is caused
by minor increases in the size of the input. This fact is exploited in cryptography to
facilitate the secure exchange of messages. Whenever you access a website whose address
starts with https://, the padlock in your web browser’s address bar locks to indicate that
a secure communication channel to the website has been established.
One approach to sending and receiving encrypted messages works by encoding and
decoding messages with two related keys, one public key and one private key. Each
participant in the communication has one such pair of keys. The public key is known
to everybody, but each participant’s private key is exclusively known by the participant.
The two keys are related in such a way that a message encoded by the public key can be
decoded only through the use of the corresponding private key. This makes it possible
to send a secure message to someone by encoding it using that person’s public key,
which is publicly known. Since only the recipient of the message knows the private
key, only the recipient can decode the message. For example, if you want to check
the current balance of your bank account via the internet, your web browser transmits
your public key to the bank’s computer, which uses it to encrypt the amount and sends
it back to your browser. Anyone who sees this message can’t decipher it, since it is
encrypted. Only you can decode it in your browser using your private key.
If we didn’t have the internet and had to do this with postal mail, this would be
similar to sending a box with an unlocked padlock (to which only you have the key) to
your bank. The bank writes the amount on a piece of paper, puts it into the box, locks
the box with the padlock, and sends the locked box back to you. When you receive
the box, you can unlock the padlock with your key and see the amount. Since nobody
else can unlock the box, the information is protected from unauthorized access during
the transportation. In this example, the box with the open padlock corresponds to the
public key, and the bank’s action of putting the sheet with the amount into the box
corresponds to the encryption of the message.
An encrypted message is practically safe from unauthorized access, since in order to
decode it without knowing the private key one has to compute the prime factors of a
large number. While it is not known whether the computation of prime factors is an
NP-complete problem, nonexponential algorithms are not known at this time, and thus
it takes too long in practice to solve. Thus the difficulty of solving a problem can be
utilized for protecting the transmission of information from unauthorized access.
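The arithmetic behind this kind of public-key encryption can be illustrated with a toy sketch of the RSA scheme, the textbook example of such a system. The numbers below are chosen so the calculation is visible, not secure, and the modular inverse via pow requires Python 3.8 or later:

```python
# Toy RSA: the primes are tiny so the arithmetic is visible; real keys
# use primes with hundreds of digits, so that factoring n is infeasible.
p, q = 61, 53                        # private: two prime numbers
n = p * q                            # public modulus (3233)
e = 17                               # public exponent
d = pow(e, -1, (p - 1) * (q - 1))    # private exponent; computing it
                                     # requires knowing the factors p and q

balance = 1234                       # the message: an account balance
ciphertext = pow(balance, e, n)      # anyone can encrypt with (n, e)
plaintext = pow(ciphertext, d, n)    # only the key owner can decrypt
print(plaintext == balance)          # True
```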
This is not really a new idea. Moats, fences, walls, and other
protective mechanisms are all based on this principle. But many
of these protections can be broken. Currently, sending and receiv-
ing encrypted messages can be considered safe because of the lack
of a nonexponential algorithm for prime factorization. However, as
soon as anyone finds a nonexponential algorithm, that safety evapo-
rates in an instant. If, on the other hand, someone could establish an
exponential lower bound on the problem, we would be certain that
encrypted messages are safe. In the meantime, our ignorance about
this question keeps the method working.
Further Exploration
The adventures of Indiana Jones usually concern searches for artifacts, locations, and
people. Sometimes a quest is guided by items or information gathered along the way. In
these cases the path that is traveled is determined dynamically. In such stories the search
leads to a list of places visited, and this list is a search path toward the final goal. Movies
such as National Treasure or Dan Brown’s The Da Vinci Code follow this pattern as well.
At other times, the basis for a search is a treasure map. In this case the path is clear
right from the beginning, and the search is about locating landmarks shown on the
map in the real world. The book Treasure Island, by Robert Louis Stevenson, has made
treasure maps famous. A treasure map is an algorithm for finding a specific location, but
the algorithm can be given in quite different ways. In some stories the treasure map does
not contain direct instructions for locating the treasure but rather contains clues, codes,
or riddles that need to be decoded or solved. Such treasure maps do not really describe
algorithms but rather problems to be solved. This is the case, for example, in the movie
National Treasure, where a treasure map is hidden on the back of the Declaration of
Independence that contains a code for locating spectacles that reveal further clues on
the map.
The movie National Treasure also contains a riddle similar to the tile challenge faced
by Indiana Jones. Ben Gates, one of the protagonists, has to infer from the keys pressed
on a keyboard the password that was typed in, which requires finding the right order-
ing of the letters. A word or phrase whose letters denote another phrase is called an
anagram. Solving anagrams is not really about sorting but about finding among all
possible letter orderings a particular one. Anagrams appear, for example, in the Harry
Potter stories, where “I am Lord Voldemort” is an anagram of his birth name Tom Mar-
volo Riddle, and in the movie Sneakers, where “Setec Astronomy,” a code name for a
decrypting device, is an anagram of “too many secrets.”
In the Brothers Grimm version of the fairy tale Cinderella (in German, Aschenputtel),
the bad stepmother presents Aschenputtel with the challenge of sorting out lentils
from the ashes, which is a simple form of bucket sort. Sometimes the events of a story
also require sorting when they are narrated in nonchronological order. One extreme
is the movie Memento, which presents much of its story in reverse. Something similar
happens in the movie Eternal Sunshine of the Spotless Mind, where a couple undergoes a
procedure to erase their memories after a breakup. The memories of one of them are
then presented in reverse order. The movie Vantage Point presents the events leading up
to an attempt to assassinate the U.S. President from different perspectives. Each descrip-
tion is itself incomplete but adds more details and facts, and the viewer must merge all
accounts together to understand the story.
David Mitchell’s book Cloud Atlas consists of several stories that are nested within
one another. To obtain a completed version of the stories, one has to reorder different
parts of the book. An interesting variation of the ordering challenge is presented in
Julio Cortázar’s Hopscotch, which contains two different sets of explicit instructions
about the order in which to read the chapters of the book.
Part II
LANGUAGES
(Part II opener: a concept map connecting the topics Problems, Representation, Loops,
Recursion, Abstraction, Termination, Interpretation, Intractable, and Unsolvable with
their chapters: 3 & 4, 5 & 6, 7, and 10 through 13.)
Language and Meaning
After lunch you have a doctor’s appointment. Following the exam, the physician writes
a prescription and fills out a form for ordering some blood tests. You take the form to
the lab, get blood drawn, and then take the prescription to the pharmacy. At this
point it should come as no surprise that the lab technician, in subjecting the blood to
several tests, is executing an algorithm, as is the pharmacist in preparing the prescription
according to the doctor’s instructions. A noteworthy feature of this scenario is that the
algorithm is defined by one person and executed by another.
This division of labor is possible only because the algorithm is written down in a
language. Although the physician could also have called and instructed the pharmacist
directly, this would have complicated the process, since it would have required the physi-
cian and the pharmacist to work simultaneously. The representation of the algorithm
in a written form allows them to work independently of one another, and at different
times. Moreover, once written, a prescription can be used multiple times.
The separation of algorithm definition and execution makes the need for a precise
definition of a language evident. The physician and the pharmacist must agree on what
a prescription is, what it may contain, and what it means. Your health as a patient de-
pends on this. For example, a dosage must state its unit to avoid ambiguity. If physician
and pharmacist interpret dosage in different units, ambiguity could cause the medica-
tion dosage to be ineffectively low or dangerously high. Similar agreements are required
between the physician and the lab technician.
The pharmacist and lab technician are computers who have to execute the algo-
rithms given to them by the physician. Only if they know how to read and interpret
the physician’s instructions can they successfully execute them to support the physi-
cian’s goal of helping you, the patient.
The first step in executing a written algorithm is to parse it, which means to extract
its underlying structure. This structure identifies the key components of an algorithm
(in the case of a prescription, drug names, quantities, and frequency of intake) and how
they are related (for example, whether the quantity refers to each intake or the total
amount of the drug). Once the structure is identified, the meaning of the language in
which the algorithm is expressed defines what exactly the pharmacist and lab technician
have to do. The process of parsing is itself an algorithm for extracting the underlying
structure of any given sentence.
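As a tiny illustration, here is a Python sketch that parses a made-up, simplified prescription format (the format and all names are invented for the example):

```python
import re

# A made-up prescription format: "drug dose unit n times daily".
PATTERN = re.compile(
    r"(?P<drug>\w+) (?P<dose>\d+) (?P<unit>\w+) (?P<freq>\d+) times daily")

def parse_prescription(text):
    """Extract the key components of a prescription and how they relate."""
    match = PATTERN.fullmatch(text)
    if match is None:
        raise ValueError("not a well-formed prescription")
    return match.groupdict()

print(parse_prescription("amoxicillin 500 mg 3 times daily"))
# {'drug': 'amoxicillin', 'dose': '500', 'unit': 'mg', 'freq': '3'}
```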
The languages used for ordering blood work and for prescriptions are quite different.
In addition to their content, they also have different external appearances. While a
prescription typically consists of a sequence of words for drugs, amounts, frequency of
intake, and so on, an order for blood work is routinely given by a form with several
checked boxes. Often the particular format of a language is due to historical reasons,
but it can also be the result of a conscious design decision to better support some of the
language’s goals. For example, a form with check boxes reflects the internal structure of
the language and simplifies the task of writing and interpreting (and billing) orders. It
can also avoid potential ambiguities. For example, the form can be designed to prevent
the selection of redundant tests, such as ordering a basic and a comprehensive test.
Language plays a central role in computer science. Without language, we could not
talk about computation, algorithms, or their properties. Many domains have developed
specialized languages with their own terminology and notation. In addition to using
languages, computer scientists study language itself, as linguists and philosophers of
language do. In particular, computer scientists address questions of how languages can
be precisely defined, what features languages (should) have, and how new languages
can be designed. Computer scientists study formalisms for specifying, analyzing, and
translating languages, and they develop algorithms to automate these tasks.
The philosopher Ludwig Wittgenstein famously wrote: “The limits of my language
mean the limits of my world.”1 In chapter 8 I use the language of music to illustrate
concepts of language relevant to computer science and to show that the application of
language has few limits in this world.
8 The Prism of Language
We use language effortlessly every day, unaware of the complicated mechanisms in-
volved when language does its work. It is similar to walking. Once it is learned, we can
easily wade through water, plow through sand, climb up stairs, and step over obstacles.
The complexity of the task becomes apparent when we try to build a robot that mimics
the behavior. The same is true of languages. Once you try to instruct machines how to
use language effectively, you realize how difficult this actually is. Did you ever get frus-
trated using Siri or failing to find what you were looking for on Google? The Turing
test for machine intelligence reflects this state of affairs perfectly. According to this test,
a machine is deemed intelligent if a user cannot distinguish it from a human being in a
conversation. In other words, language competence is used as a benchmark for artificial
intelligence.
The phenomenon of language has been studied in several disciplines, including phi-
losophy, linguistics, and sociology, so there is no single agreed-upon definition of what
a language is. From a computer science point of view, a language is a precise and ef-
fective means to communicate meaning. Chapter 3 showed how signs form the basis
for representation and thus can give meaning to computations by linking symbols to
the concepts they stand for. While signs can represent merely individual concepts, a
language defines the meaningful combination of signs into sentences and represents re-
lationships between such concepts. Signs and languages are both representations, but
while signs are sufficient to represent objects of interest for a particular discourse, it
takes a language to represent computations through algorithms.
This chapter explains what a language is and how languages can be defined. I begin
by illustrating how languages enable the representation of algorithms and thus compu-
tation, which accounts for the importance of languages as a subject in computer science.
I then show how languages can be defined through grammars. A language consists of
sentences, and while a sentence is often considered to be simply a list of symbols or
words, this is too narrow a view. It is like looking at the drawing of a scene and only
recognizing the objects in it but ignoring their relationships. Imagine a drawing that
shows a man, a dog, and a stick. It matters whether the dog is carrying the stick and is
walking toward the man or whether the man is holding the stick and the dog is biting
the man. Similarly, each sentence of a language has an internal structure that plays an
important role in assembling its meaning. To illustrate this aspect, I demonstrate how
a grammar not only defines the external appearance of sentences as sequences of words
(concrete syntax) but also their internal structure (abstract syntax).
\ \ \
This chapter is not about computation per se but about the description of computation,
or rather about the description of the description of computation. What does that
mean? An algorithm is a description of a computation, but this chapter is not about
specific algorithms; it’s about how to describe algorithms. It turns out that such a
description amounts to a language.
When we have to explain what a car or a flower is, we do not talk just about a few
specific cars or flowers but rather about what all cars or flowers have in common, the
essence of cars or flowers, so to speak. Individual specimens of cars and flowers will
likely be mentioned as examples, but to talk about all possible cars or flowers requires
the development of a model of what a car or a flower is. A model defines the type of all
cars or flowers (see chapter 14).
Templates, which consist of a fixed structure with some variable parts, can be effec-
tive models for cars and flowers, since much of their inherent structure is fixed. While
templates work well to describe some kinds of algorithms, for example, spreadsheets or
the forms for ordering blood work, a template-based model would be too rigid and not
expressive enough for algorithms in general. The flexibility required for the description
of arbitrary algorithms is provided by languages.
In the story accompanying this chapter, language itself plays a prominent role. I
chose music as the domain in which to study the concepts of language because musical
languages are simple and structured enough so that the concepts to be explained are
easily appreciable. The domain of music is narrow enough to lend itself to a specialized
notation, yet it is universally understood and thus can be taken for granted while the
discussion focuses directly on musical languages.
Regarding music as a language is not a new idea. For example, using ideas that
go back to Pythagoras, the astronomer Johannes Kepler employed musical concepts
to explain the laws governing the orbiting frequencies of the planets. Francis Godwin
describes in the novel Man in the Moone (1638) how the inhabitants of the moon use
musical language to communicate. In the movie Close Encounters of the Third Kind, hu-
mans communicate with aliens using a five-tone scale. Since music is a highly structured
yet concrete and culture-transcendent medium, it is well suited to explain the concepts
of language.
Don’t worry if you don’t know music notation. I explain the necessary concepts over
the course of this chapter. At this point, it is sufficient to know that the balloon-shaped
symbols represent individual notes of the melody, their vertical position determines
the pitch of the note, and that the duration of a note is indicated by the kind of stem
attached and whether the note oval is black or empty. Pitch and duration are the basic
elements to form melodies. By singing or listening to the tune and following the score
one can get a pretty good understanding of the meaning of the individual notes and the
notation in general.
This score can be regarded as a sentence in the language of music notation. What’s
more, the score is the description of an algorithm to produce music. Anyone who
Figure 8.1 Playing (executing) a music score (an algorithm) generates a performance (a com-
putation), which transforms silence into music. The performance is produced by a musician
(a computer), for example, a person or a machine that can understand the music notation (lan-
guage) in which the piece is written. See figure 2.1 on page 36.
understands this language can execute this sentence, which effectively turns the musician
into a computer. The analogy with computing is depicted in figure 8.1. The generated
computation produces representations of sounds, which can take quite different forms.
A vocalist will move her vocal cords, and a pianist, guitarist, or violinist
will start and stop string vibrations, using keys and hammers, fingers, or a bow.
The remarkable thing about music notation is that it facilitates the faithful repro-
duction of a melody, even by people who have never heard it before. The composer
Harold Arlen, who invented the melody and then coded it by writing it down in music
notation, could have simply sent the score to Judy Garland, and this would have been
enough to ensure that she could sing the song properly. Without such a notation, the
only way to share music would be through a recording or by hearsay (hearplay?), that
is, to perform it in front of others who would then have to remember it to reproduce
it. If you have ever played the telephone game, you know how unreliable such a process
is.3
Pondering the design of the staff notation, one can notice how arbitrary it is. Even
though a note is an indexical sign (see chapter 3) whose height reflects the pitch of the
represented note, there seems to be no particular reason for having five staff lines, for
the elliptical shape of notes, or for the particular length and direction of stems. This
could all be done quite differently. In fact, it is sometimes done differently.
For example, the tablature notation for guitar is based on a very different
representation. [Tablature figure: six horizontal lines carrying fret numbers
such as 1 0 0 1, 0 2 2 0, and 3 3.] It shows how
to interact directly with the strings of the guitar and thus
avoids an abstract concept of notes. A number on a line indicates where (on which fret)
to hold down the string while plucking it. One advantage of this notation lies in its
directness: it allows beginners to quickly play tunes without first having to learn an
abstract music notation. A disadvantage is that the notation does not precisely reflect
the duration of notes. Moreover, it is limited to one kind of instrument.
The tablature notation reflects the physical structure of the guitar and is thus less
arbitrary. For example, since a guitar usually has six strings, the design of the notation
requires that it have six horizontal lines and that numbers appear only on those lines.
By contrast, the five lines of the score notation are an arbitrary choice, and the staff can
be extended when needed. In fact, the very first note in the staff notation example does
just that and employs an auxiliary line.
Tablature and staff notation are two different languages for representing the domain
of music. Each language is defined by a set of rules that specify what kind of symbols
can be used and how they can be combined. The rules define the syntax of the language;
they are used to distinguish proper elements of the language, called sentences, from
gibberish. A properly constructed music score or tablature is a sentence. Any notation
that violates the rules cannot be properly executed as an algorithm. For example, it
would make no sense to use negative numbers in tablature. A guitar player would
not know how to interpret them and what to do. Similarly, if we had a note in the
staff notation that was spread out over several staff lines, Judy Garland would not have
known which note to sing.
A music performer can reproduce music only if the music notation is clear and
unambiguous.4 This is another way of saying that the music notation, like any other
algorithmic notation, must represent effectively executable steps that can be interpreted
unambiguously. Thus, to ensure the effectiveness of any musical (and other algorithmic)
language we first need a precise definition of what counts as a sentence of that language.
Grammar Rules
The syntax of a language can be defined with a grammar, which is understood as a set of
rules for constructing sentences of the language. Using natural languages like Spanish
or English to describe syntax is not a good alternative, because the descriptions tend
to be long and imprecise. This is why math equations or physics laws are generally
not expressed in prose but in some specialized notation. In fact, many scientific and
technical fields have developed their own terminology and notation to enable effective
communication about the ideas in their respective domains. And the same is true for
the study of languages, whether it is in the field of linguistics or computer science, and
one special notation to define languages is that of grammars.
One problem in describing a language is how to represent in a finite way a poten-
tially infinite set of sentences. Science faces a similar challenge by having to describe
an infinite set of facts through a small set of laws. Science’s solution to this problem is
helpful in explaining some concepts of grammars. Consider, for example, the famous
physics equation E = mc², which relates the energy (E) that an object contains to its
mass (m). The exact meaning of this equation is not really important here; what mat-
ters is that the equation contains a constant c and two variables, m and E, which can
stand for arbitrary positive numbers. In particular, the two variables in this equation
allow the computation of the energy content of any object, no matter how large or
small, how simple or complicated. The use of variables enables the single equation to
represent an infinite number of physical facts. A variable in an equation serves the same
purpose as a parameter in an algorithm.
Similar to equations, a grammar also contains constants and variables. The con-
stants are called terminal symbols, or terminals for short, and the variables are called
nonterminal symbols, or nonterminals for short. The reason for these names will be-
come clear shortly. A sentence of a language is given by a sequence of terminal symbols
and does not contain any nonterminals, which play only an auxiliary role in construct-
ing sentences. In the case of staff notation, terminals are, for example, individual notes
appearing on staffs and bars that group notes into measures. Of course, the notation
contains other terminal symbols, but notes and bars are sufficient to illustrate the gram-
mar concepts needed to define simple melodies.
For example, the terminals that constitute the first measure of “Over the Rainbow”
(the two notes sung to “Some-where”) are the two note symbols, followed by a bar.
Similar to a constant in an equation, a terminal symbol represents a fixed,
unchangeable part of a sentence, and a sentence corresponds to one particular
scientific fact that results from an equation by substituting numbers for the
variables.
In contrast, a nonterminal symbol acts like a variable in an equation that can take
on different values. A nonterminal can be substituted by a sequence of other (terminal
and nonterminal) symbols. However, substitution is not arbitrary. Instead, all the
possible substitutions are defined by a set of rules. A nonterminal is like a placeholder
and is typically used to represent a particular part of a language. The substitution rules
define what those parts look like in terms of sequences of terminals. For example, in
the definition of a simplified grammar for the staff notation we find a nonterminal note
that can stand for an arbitrary note terminal symbol.5
Table 1.1
Grammars      Equations            Algorithms
-----------   ------------------   --------------------------
Nonterminal   Variable             Parameter
Terminal      Constant, operation  Value, instruction
Sentence      Fact                 Algorithm with only values
Rule                               Instruction
Figure 8.2 Grammar rules defining the possible substitutions for the note nonterminal. Since
an arbitrary note is represented by one nonterminal, a separate rule is required for each combi-
nation of pitch and duration. Each column shows the rules for notes for a particular duration:
left, half notes (notes that last one half of a measure); middle, quarter notes; right, eighth notes.
Together with a nonterminal melody for representing sequences of notes, we can
define melodies by the following three rules, where | stands for the bar terminal.
(Each rule is given a name for later reference.)

melody → note melody       (NewNote)
melody → note | melody     (NewMeasure)
melody → note              (LastNote)
The first rule, NewNote, says that a melody starts with some note and is followed by
another melody. This may seem strange at first, but if we consider that a melody is a se-
quence of notes, this rule says that a sequence of notes starts with a note and is followed
by another sequence of notes. The form of the rule looks peculiar because it replaces a
symbol by an RHS that contains that very symbol. Such a rule is said to be recursive. It
seems that a recursive rule does not really achieve the substitution that it is supposed to
accomplish. And if we had only recursive rules for melody , the grammar would indeed
be problematic because we could never get rid of the nonterminal. However, in the
example, the third rule, LastNote, is not recursive and can always be employed to re-
place a melody nonterminal by a note nonterminal, which could then be replaced by a
terminal note symbol. Instead of the rule LastNote we could also employ the following
alternative third rule that has an empty RHS:
melody → (EndMelody)
This rule says to replace melody by an empty sequence of symbols (that is, nothing).
Such a rule can be used to effectively remove a melody nonterminal from a sentential
form.
Recursive rules are typically used to generate some number of symbols through
repeated applications.
The second rule, NewMeasure, is similar to the first and also allows the repeated
generation of note nonterminals. In addition, however, it generates a bar terminal sym-
bol , which indicates the end of one measure and the beginning of a new one.
Starting with a nonterminal such as melody , we can generate sequences of symbols
by repeatedly replacing nonterminals through the application of grammar rules. For
example, the first measure of “Over the Rainbow” can be produced as follows:
melody  --NewNote-->     note melody
        --NoteC1/2-->    C1/2 melody
        --NewMeasure-->  C1/2 note | melody
        --NoteC'1/2-->   C1/2 C'1/2 | melody

(Here C1/2 denotes the terminal for a half note of pitch C, C'1/2 the terminal
for a half note of pitch C', and | the bar terminal.)
The labeled arrow in each line indicates which rule was applied. For example, first the
rule NewNote is applied to generate the first note nonterminal. Then this nonterminal
is immediately replaced by a terminal that represents the first note of the song. The
specific rule from figure 8.2 that is used is indicated by the pitch and duration annota-
tions of the rule NoteC1/2 (C for the pitch and 1/2 for a note that lasts for one half of a
whole measure). The process then continues by using the NewMeasure rule to generate
another note nonterminal, followed by a bar terminal that ends the current measure.
The new note nonterminal is then also substituted in the next step by a note terminal.
The decisions about which rules to use determine the melody that is generated.
Notice that the order in which the rules are applied is flexible to some degree. For
example, we could have swapped the application of the rules NoteC1/2 and NewMeasure
and still obtained the same sentential form:
melody  --NewNote-->     note melody
        --NewMeasure-->  note note | melody
        --NoteC1/2-->    C1/2 note | melody
        --NoteC'1/2-->   C1/2 C'1/2 | melody
We could have also swapped the order in which the two note nonterminals have been
substituted and obtained the same result.
We can terminate the series of rule applications by applying the third rule (either
LastNote or EndMelody), which eliminates the remaining melody nonterminal. (If we
use LastNote, we also have to apply one more Note rule to eliminate the resulting note
nonterminal.) The produced sequence of terminals is a sentence of the language, and the
sequence of sentential forms and rule applications, from the initial melody nonterminal
to the final sentence, is called a derivation. A sequence of terminals is a sentence of
the language only if such a derivation exists, and deciding a sentence’s membership in a
language boils down to finding a corresponding derivation. A derivation is proof that
its resulting sequence of terminals is an element of the language.
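For readers who want to experiment, the melody grammar and the preceding derivation can be sketched in a few lines of Python (my choice here, not the book's notation). Nonterminals are the strings melody and note; the terminals C1/2, C'1/2, and | stand in for the graphical note and bar symbols:

    # A grammar as data: each rule maps a name to a pair of its
    # left-hand side (a nonterminal) and its right-hand side.
    RULES = {
        "NewNote":    ("melody", ["note", "melody"]),
        "NewMeasure": ("melody", ["note", "|", "melody"]),
        "LastNote":   ("melody", ["note"]),
        "NoteC1/2":   ("note",   ["C1/2"]),
        "NoteC'1/2":  ("note",   ["C'1/2"]),
    }

    def apply_rule(form, rule_name):
        # Replace the leftmost occurrence of the rule's nonterminal.
        lhs, rhs = RULES[rule_name]
        i = form.index(lhs)
        return form[:i] + rhs + form[i + 1:]

    # The derivation of the first measure, step by step:
    form = ["melody"]
    for rule in ["NewNote", "NoteC1/2", "NewMeasure", "NoteC'1/2"]:
        form = apply_rule(form, rule)
        print(f"{rule:12} -> {' '.join(form)}")
    # The final sentential form is: C1/2 C'1/2 | melody

Applying LastNote (or EndMelody) and one more Note rule would eliminate the remaining nonterminals and complete the derivation into a sentence.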
One of the nonterminals of a grammar is designated as the start symbol, which repre-
sents the main category of sentences that are defined by the grammar. This nonterminal
also gives the grammar its name. For example, since the purpose of this grammar is to
define melodies, the start symbol of the grammar should be melody , and we can refer
to the grammar as the melody grammar. The language defined by the melody grammar
is the set of all sentences that can be derived from the melody start symbol.
These two steps are simple and straightforward. The next two steps lead to a somewhat
unexpected result because the parse tree does not reveal the structure of the song. The
parse tree does not show how a melody consists of measures and how measures consist
of notes:
[Figure: a flat parse tree in which melody expands directly into a sequence of
notes, with no measure nodes.]
The lack of structure is a consequence of how the earlier grammar was defined: its rules
expanded a melody simply into a sequence of notes. Therefore, it is not surprising that
the parse tree does not mention any measures. By changing the grammar definition
to account for measures, we can remedy the situation. Figure 8.3 shows parts of the
parse and abstract syntax trees. In computer science trees are drawn upside down with
the root at the top and the leaves at the bottom. Nonterminals are the locations of
branching in the tree, and terminals occur as the leaves of the tree.
The parse tree in the figure is the immediate result of turning a derivation into a
tree; it retains all the details of the derivation, even those parts that are not required for
translating the tree into a different notation. In contrast, the abstract syntax tree ignores
Figure 8.3 The structure of a melody represented by syntax trees. A syntax tree captures the
structural result of a derivation but ignores details, such as which rules have been applied in
which order. Left: A parse tree, which captures all the details of the derivation. Right: An
abstract syntax tree, which omits unnecessary details and retains only the structurally relevant
information.
terminals and nonterminals that are not essential for the structure of the sentence. For
example, the bar terminals are redundant, since the grouping of notes into measures
is already captured by measure nonterminals. The note nonterminals are not needed
either, since each one of them is always expanded into exactly one note terminal. Adding
the note terminals directly as children to the measure nonterminals captures the same
structural information and thus justifies the removal of the note nonterminals. The
parse tree and the abstract syntax tree share a common structure: both roots are given
by the melody nonterminal, and all the leaves are terminal symbols. The abstract syntax
tree reflects the structure of the sentence more directly and is the basis for analysis and
translation; it also facilitates the definition of a sentence’s meaning.
The process of constructing a parse tree or an abstract syntax tree for a given sen-
tence is called parsing. The relationships between sentences, syntax trees, and parsing
are illustrated in figure 8.4. Parsing is a computation for which several different algo-
rithms exist. While a grammar provides a clear definition of which sentences belong
to a language, parsing also needs a strategy for turning a given sentence into a syntax
tree. The difficulty in parsing lies in deciding which grammar rules to select during the
analysis of a sentence. There are different strategies for parsing sentences of a gram-
mar. We have seen one, called top-down parsing, which begins with the start symbol of
the grammar and repeatedly applies rules to gradually build a syntax tree by expanding
nonterminals. This process is repeated until the sentence appears as the sequence of
terminals in the leaves of the tree. In contrast, bottom-up parsing tries to match right
[Figure: a sentence in staff notation and one in tablature, connected to a shared
abstract syntax tree (melody, measures, notes). A Parsing arrow leads from the
staff sentence to the tree, and Pretty printing arrows lead from the tree to both
notations.]
Figure 8.4 Parsing is the process of identifying the structure of a sentence and representing
it as a syntax tree. Pretty printing turns a syntax tree into a sentence. Since an abstract syntax
tree may omit certain terminal symbols (for example, bars), pretty printing cannot just collect
the terminals in the leaves of the syntax tree but must generally employ grammar rules to insert
additional terminal symbols. The parsing arrow is missing for the tablature notation because it
is an ambiguous notation and thus doesn’t allow the construction of a unique abstract syntax
tree.
sides of rules in the sentence and applies rules in reverse. The goal is to build the syntax
tree by adding the nonterminals of the left side of matched rules as parents to the tree.
This is repeated until a tree with a single root node is obtained.
The opposite process, turning a syntax tree into concrete syntax, is called pretty
printing. This is a mostly straightforward computation, since the structure of the sen-
tence is already given and provides guidance for the creation of the concrete representa-
tion.
With parsing and pretty printing we have the necessary tools to translate between
languages. A translation between two languages that have the same abstract syntax is
obtained by parsing a sentence in the first language and applying pretty printing for the
second. For example, in figure 8.4 we can see how to get from staff notation to tablature
by first parsing a sentence in the staff notation and then applying the tablature pretty
printer to the resulting abstract syntax tree. To translate between languages that do
not share the same abstract syntax requires an additional transformation between the
abstract syntax trees.
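The round trip from sentence to tree and back can also be sketched in code. In the following Python fragment (with invented note tokens; the abstract syntax is simply a list of measures, each a list of notes), parsing groups notes into measures, and pretty printing reinserts the bar terminals that the tree omits:

    # Parsing and pretty printing for a toy melody language.
    def parse(tokens):
        # Group note tokens into measures; bars guide the grouping
        # but are not kept in the abstract syntax tree.
        melody, measure = [], []
        for token in tokens:
            if token == "|":
                melody.append(measure)
                measure = []
            else:
                measure.append(token)
        if measure:
            melody.append(measure)
        return melody

    def pretty_print(melody):
        # Reinsert the bar terminals omitted by the abstract syntax.
        return " | ".join(" ".join(measure) for measure in melody)

    tokens = ["C1/2", "C'1/2", "|", "B1/4", "G1/8", "A1/8", "B1/4", "C'1/4"]
    tree = parse(tokens)       # [["C1/2", "C'1/2"], ["B1/4", ..., "C'1/4"]]
    print(pretty_print(tree))  # back to the concrete syntax

A second pretty printer over the same tree, one that prints fret numbers instead of note symbols, would yield a translation into a tablature-like notation.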
You are at the pharmacy to pick up your medication, but there is a problem. The
prescription lacks information about the dosage form—capsules or liquid? Since the
prescription is ambiguous and lacks a precise meaning, it does not represent an algo-
rithm. The prescription does not describe a set of effective steps that would allow the
pharmacist to assemble a medication.
Ambiguities can occur for several reasons. In this case the lack of information is
reflected in the abstract syntax tree by a nonterminal for the dosage form that is unex-
panded. Since the pharmacist cannot execute the prescription provided by the physi-
cian, she has to call and ask for clarification. Once the ambiguity is resolved, she has an
algorithm to execute and can successfully prepare your medication.
The separation of work between physician and pharmacist is successful because both
use a shared language of prescriptions. But that alone is not enough. Both of them also
have to understand the sentences of the language in the same way. This is not a trivial
requirement, evidenced by the fact that abbreviations of drug names and dosing have
actually led to dispensing mistakes. A naive idea for defining the meaning, or semantics,
for a language is to explicitly assign a meaning for each possible sentence. This cannot
work for languages that contain an infinite number of sentences, but even for finite
languages such an approach is highly impractical.
Instead the semantics definition has to work in two related steps. First, it assigns
semantics to individual words, and then it defines rules for how to derive the semantics
for a sentence from the semantics of its parts. For example, in the form for ordering
blood work, the semantics of an individual check box is to order one particular blood
test. The semantics of a group of such check boxes is defined as a collection of orders,
that is, to carry out all the orders that are obtained from the semantics of each individual
check box in the group. This compositional approach requires the syntax definition of
the language to make the structure of sentences available to the semantics definition,
which it does in the form of abstract syntax trees.
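A compositional semantics can be surprisingly small when written down. Here is a sketch for a toy check-box form (the test names are invented for illustration): the meaning of one box is one order, and the meaning of the form is the collection of the orders of its checked boxes:

    # First step: the meaning of individual words (here, boxes).
    BOX_MEANING = {
        "cbc":     "order a complete blood count",
        "glucose": "order a glucose test",
        "lipids":  "order a lipid panel",
    }

    # Second step: the meaning of a sentence (a filled-in form) is
    # derived from the meanings of its parts.
    def form_meaning(checked_boxes):
        return [BOX_MEANING[box] for box in checked_boxes]

    print(form_meaning(["cbc", "lipids"]))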
The languages for blood work and prescriptions show that the semantics of a lan-
guage can take different forms. For example, a prescription defines an algorithm for
filling prescriptions, whereas the form for blood work provides instructions to draw
blood and perform lab tests. The computers for executing the algorithms in these dif-
ferent languages need to understand the different semantics and consequently need dif-
ferent skill sets to do their work successfully. The form language for blood work also
illustrates that even a single language can have different semantics, namely, instructions
to draw a specific amount of blood and to perform different lab tests. Thus one sentence
in one language can be executed differently, often by different computers (for instance, a
phlebotomist and a lab technician) and produce completely unrelated results. This can
be beneficial and supports the division of labor.
One important application of having different semantics for one language is the
analysis of algorithms to detect and eliminate potential mistakes, which can help prevent
the computation of incorrect results. Compare this with a document that can be read
by different people to find typos or grammar mistakes, to judge the content, or to check
for compliance with typesetting conventions. All of these tasks have different goals and
can be performed before the document is published. Similarly, a pharmacist should
double-check a prescription before filling it to prevent any unintended side effects for
the patient.
Languages are everywhere. In addition to prescriptions, lab work, music, and nu-
merous other specific domains, computer science itself is full of languages, not even
counting the thousands of programming languages. We have already seen a language for
describing grammars. There are also languages to define the semantics of languages, to
define parsers and pretty printers, and many others. Languages are an essential com-
puter science tool for representing data and computation. The effective use of languages
depends on having a precise semantics, just as the health and safety of patients depends
on a clear semantics for prescriptions. Based on the song “Over the Rainbow,” I will
illustrate in chapter 9 how the semantics of a language can be defined and some of the
challenges involved in establishing such a definition.
9 Finding the Right Tone:
Sound Meaning
In chapter 8 we saw that the score for the song “Over the Rainbow” is an algorithm
that can be executed by musicians to produce music. The algorithm was written (that
is, invented and coded) by the composer Harold Arlen in the language of staff notation.
The syntax of this language can be defined by a grammar, which defines the appearance
of a sentence as a score and its internal structure as an abstract syntax tree.
We also saw that one language can be defined by different grammars, which may
at first seem like an oddity. However, differences in grammars lead to differences in
the abstract syntax and represent different views of music structure. These differences
matter because when Judy Garland wants to perform “Over the Rainbow,” she first
has to parse the music notation to uncover the song’s structure. In other words, the
meaning of a language builds on its abstract syntax.
This is why ambiguity poses a problem for any language. If a sentence can have
more than one abstract syntax tree, then it is not clear which structure to follow when
applying the semantics definition. Consequently, an ambiguous sentence may have
more than one potential meaning, which is how the word ambiguous is usually defined.
This chapter looks more closely at the problem of ambiguity and addresses the ques-
tion of how sentences acquire meaning. A key insight is that a systematic definition of
language meaning hinges on the concept of compositionality, which says that the seman-
tics of a sentence is obtained by combining the semantics of its parts in a systematic
way, as defined by the sentence’s structure. This shows that the structure of a language
plays a pivotal role in defining its meaning.
[Score: nine notes sung to the lyrics “Some day I’ll wish u-pon a star and.”]
If you know the song and try to sing the score, you notice that the emphasis should
be on the second note. It will probably not be easy to sing the song by emphasizing
the first note. Usually, however, it is the first note in each bar that is emphasized. This
means if Judy Garland had been given the previous score instead of the original, she
would probably have assumed, incorrectly, that the emphasis should be on the first
note. These two different interpretations are reflected in two different abstract syntax
trees, shown in figure 9.1. The first interpretation groups the first eight notes into
the first measure and the last note into the second measure. The second interpretation
groups the first note into the first measure and the remaining eight notes into the second
measure.
Both these syntax trees can be derived if we modify the grammar to not include bar
symbols. The difference in the two abstract syntax trees results from a different decision
Figure 9.1 Ambiguity in a grammar can cause a sentence to have different abstract syntax
trees. These trees present a different hierarchical structure for the same sentence. It is generally
impossible to determine the meaning of such a sentence, since its structure affects its meaning.
on which rule to use to expand the first measure nonterminal. The tree on the left in
the figure results when we expand it into eight note nonterminals, whereas the tree on
the right results if we expand it into only one note nonterminal. Both trees represent
the same sequence of notes, but their internal structures are different. The left-hand
tree emphasizes “Some,” whereas the right-hand tree emphasizes “day.” This sequence of
notes cannot actually be properly parsed, since the total time of the nine notes adds up
to 11/8, which is more than fits into one measure.
To correctly express the sentence one has to place
a bar somewhere to split the notes into two measures.
The correct placement of the bar is after the first note.
This will put the emphasis on the second note.
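The two readings can be made tangible with a small sketch. Assuming placeholder pitches for the nine notes, the two abstract syntax trees correspond to two groupings of one and the same flat sequence:

    # One sentence, two trees: without bar terminals, the grammar
    # allows the same note sequence to be split into measures in
    # more than one way. (The pitches are placeholders.)
    notes = ["G", "E", "G", "E", "G", "E", "G", "E", "C"]

    left_tree  = [notes[:8], notes[8:]]  # eight notes, then one
    right_tree = [notes[:1], notes[1:]]  # one note, then eight

    # Both trees flatten to the same sentence ...
    assert sum(left_tree, []) == sum(right_tree, []) == notes
    # ... but the emphasized first note of each second measure differs:
    print(left_tree[1][0], right_tree[1][0])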
The Wizard of Oz contains another example of an ambiguity, in English, which
occurs as the Wicked Witch skywrites "Surrender Dorothy" with her broom when
Dorothy and her friends are in Emerald City. This message can mean either a call to
the inhabitants of Emerald City to surrender Dorothy or a call to Dorothy to surrender
herself. In the latter case, there should be a comma between “Surrender” and “Dorothy.”
Just as bars in music notation help to clarify the correct structure and emphasis of
melodies, commas and periods serve this purpose in written natural languages. And
punctuation symbols and keywords serve the same purpose in programming languages.
Sometimes languages offer constructs that leave some
choice to the interpreter of the algorithm as to what steps
to perform. For instance, the improvisation notation
here provides the musician with a choice for the pitch of the second, third, and fourth
notes. An algorithm that employs such constructs is called nondeterministic. In mu-
sic notation this is sometimes used to provide room for improvisation, that is, to give
musicians more freedom in their interpretation and performance of a music piece. But
nondeterminism must not be confused with ambiguity. From an algorithmic point
of view, a music score provides instructions for a musician to produce a sequence of
sounds, each of a particular pitch and duration. In contrast, an ambiguous notation
would leave the musician wondering what to do. Thus, while nondeterminism is a fea-
ture of a language, ambiguity is a bug in a language’s syntax definition. It may come as a
surprise, but algorithms that leave some choices open are quite common. For example,
in Hansel and Gretel’s simple way-finding algorithm, the next pebble to be chosen was
not uniquely specified.
Ambiguity is a pervasive phenomenon in languages. It reminds us that a language
is not just a collection of its well-formed sentences but also maps these sentences to
their structure in the form of abstract syntax trees. Ambiguities can be fun; they are
sometimes exploited in music to create clever effects, for example, by starting a melody
that is interpreted by the listener in some way and then adding a different rhythm that
creates a surprise and forces the listener to change the interpretation.
In our example you can simulate this as follows.
(This is best done with two people. If you have a key-
board available, this can also be done by one person,
but the effect is not as strong.) One person starts by humming or playing (but not
singing—this is probably too difficult) the alternating G and E eighth notes with the
(wrong) emphasis on the Gs. Then, after some time, the second person starts humming
quarter notes (for example, C) that start together with each E hummed by the first
person. After a few iterations, you will perceive a shift in the rhythm and the song will
suddenly sound like the well-known part from “Over the Rainbow.”
Ambiguities occur in other notations as well. You have probably
seen the Necker cube. If you look long enough at the picture, your per-
ception shifts between looking down on a cube from the right and looking
up at a cube from the left. Again, there is one visual representation that
has two different structural interpretations. This ambiguity matters. Suppose you want
to touch the cube at its top right front corner; then the position you reach out to de-
pends on your interpretation of the figure.
Related are figures like the Penrose triangle (inspired by M. C.
Escher drawings). However, the issue with these figures is not ambi-
guity between different possible interpretations but rather the fact that
no interpretation exists that is consistent with physical reality as expe-
rienced by people (see chapter 12).
Ambiguity is an intriguing concept and a great source for humor in natural lan-
guages, as in playing on the difference between “Let’s eat, grandma” and “Let’s eat
grandma.” However, ambiguity presents a serious problem for algorithmic languages,
since it can prevent the clear representation of a sentence’s intended meaning. There
is a subtle interplay between the syntax of a language and its meaning. On the one
hand, one needs to think carefully about the syntax of a language to be able to define its
meaning. On the other hand, one has to understand the meaning before one can define
the syntax properly.
Take on Meaning
The ambiguity example demonstrates that without knowing the structure of a sentence
it is impossible to understand its proper meaning. But what really is the meaning of a
sentence?
Since natural languages such as English are used to talk about everything, it is very
difficult to narrow down the domain of their meaning other than saying “everything
that can be talked about.” For music languages this is much easier: The meaning of a
sentence, that is, a music piece, is the sound you can hear when the piece is performed.
Before diving more deeply into questions about the meaning of a language and how to
define it, I want to point out two different senses of the word meaning. On the one hand,
we have the meaning of an individual sentence. On the other hand, we have the meaning
of a language. To distinguish between the two, I mostly use the word semantics when
talking about languages and the word meaning when referring to individual sentences.
An overview of how sentences, languages, and meaning are related is given in fig-
ure 9.2. In a nutshell, the semantics of a language is given by the meaning of all its
sentences, and the meaning of an individual sentence is given by associating it with a
value of the domain the language talks about.
The meaning of one particular sentence in a music language is the music you hear
when somebody performs it. However, since different musicians will interpret the
song differently—some will sing it, some will play it on an instrument, others will vary
harmonization or speed—there doesn’t seem to be one unique sound that can be defined
to be the meaning of the song. This problem can be addressed either by postulating the
performance with one particular instrument by one particular musician or by taking
the performance by a MIDI synthesizer as the standard. Alternatively, we could say
that the meaning of “Over the Rainbow” is given by the set of all possible sounds
that result from proper performances of the song. Of course, this raises the further
question of what counts as a proper performance. To avoid a full discussion of this
Figure 9.2 The semantics of a language is given by a mapping that associates each sentence’s
structure, represented by its abstract syntax tree, with an element of the semantic domain. This
view of meaning is called denotational semantics, since it is based on assigning denotations to
sentences of a language.
thorny philosophical issue, I refer to the words of former U.S. Supreme Court Justice
Potter Stewart, who, when asked what his criterion for obscenity was, famously replied,
“I know it when I see it.” Thus, if you’ve heard the original version of “Over the
Rainbow” in The Wizard of Oz, you can tell a proper performance if you hear one.
After all, music is art, and we shouldn’t be surprised or concerned that it can’t be
captured completely through formal definitions. The important point here is that the
meaning of the “Over the Rainbow” score is a sound that somebody who knows the
song well would recognize as its performance.
If we now take all the sounds that could result from performing any music score
imaginable, we obtain a semantic domain. A semantic domain for a language such as
staff notation or tablature is the set of all meanings that any sentence of that language
can have. A semantic domain surely is a large set, but there are many things it does not
contain, such as cars, animals, thoughts, movies, traffic laws, and so on. Therefore, this
set still is useful in characterizing a language. It does that by describing what a user of
the language can expect a sentence of that language to denote.
The collection of all individual sentences of a language and their associated meanings
constitutes the semantics of a language. If a language is nonambiguous, each of its
sentences has only one meaning,1 and so picking out a sentence from the language will
always lead to its meaning. In an ambiguous language, the fact that one sentence can
have more than one meaning is reflected by its having multiple syntax trees, each of
which can have a different meaning. The described view of meaning is referred to as
denotational semantics, which is based on the idea of assigning meanings to sentences.
There are several other approaches in computer science to defining the semantics of
languages, but denotational semantics is probably the closest to our intuition and also
easier to understand than some of the alternatives.
The major purpose of a language is the communication of meaning, but invented
languages have no a priori obvious semantics. Therefore, for a language to be useful, it
needs to be assigned a semantics. Since most languages consist of an infinite number of
sentences, we cannot simply list all sentences and their meanings. There must be some
other systematic way of defining the semantics of a language. The problem ultimately
boils down to finding an algorithm for defining the meaning of individual sentences.
Such an algorithmic denotational semantics definition for a language consists of two
parts. The first part is a mapping of terminal symbols to basic elements of the semantic
domain. For the music notation, this means mapping individual notes to sounds of a particular
pitch and duration. The second part is given by rules that say for each nonterminal
how to construct its meaning from the meanings of its children in the syntax tree. In
this example, there are three nonterminals. The rule for note is trivial, since each note
nonterminal has exactly one child, which means that its sound should be identical to
that of its child. The meaning of measure is obtained by concatenating the sounds of
its children in the order they appear. Finally, the meaning of melody is obtained in the
same way as for measure , namely, through the concatenation of the sounds obtained as
meaning for the melody’s measures.
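Written as code, the two parts of such a semantics definition are plainly visible. In the following Python sketch, notes are modeled as (pitch, duration) pairs and sounds as lists of such events (both invented simplifications); the first function gives the meaning of terminals, and the other two combine the meanings of children:

    # Part one: map terminal symbols to elements of the semantic domain.
    def note_meaning(note):
        pitch, duration = note       # e.g., ("C", 0.5) for a C half note
        return [(pitch, duration)]   # a "sound" of one event

    # Part two: combine the meanings of children in the syntax tree.
    def measure_meaning(measure):
        return [e for note in measure for e in note_meaning(note)]

    def melody_meaning(melody):
        return [e for measure in melody for e in measure_meaning(measure)]

    tune = [[("C", 0.5), ("C'", 0.5)],
            [("B", 0.25), ("A", 0.25), ("B", 0.25), ("C'", 0.25)]]
    print(melody_meaning(tune))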
This example illustrates how the meaning of a sentence is systematically constructed
via its abstract syntax tree by combining the meanings of leaves into meanings of their
parents, followed by combining their meanings, and so on, until the meaning for the
root of the tree is obtained. A semantics definition that has this form is said to be
compositional. A compositional definition is appealing because it mirrors the way a
grammar defines the syntax of an infinite language through a finite set of rules. A
language whose semantics can be defined in a compositional way is also called compo-
sitional. The principle of compositionality was identified by the mathematician and
philosopher Gottlob Frege during his investigation of languages and ways to formally
define language semantics. Some degree of compositionality is required in principle to
obtain a finite description of the meaning for an infinite number of sentences given by a
grammar. If a language lacks compositionality, then it is generally problematic to define
its semantics, since the meaning of an arbitrary sentence cannot be obtained by com-
posing the meanings of its parts but must be described separately. Such descriptions are
exceptions that override the general rules for determining meaning.
The simplified music language described here is com-
positional, but many other languages are not, or only
partly so. English is one such example (as are most other
natural languages). For very simple examples of noncom-
positionality one has to look no further than compound
words. While a firefighter is somebody who fights fires, a
hot dog does not refer to a high-temperature canine, and
a red herring does not refer to a fish of a specific color.
Other examples are idiomatic expressions such as “kicking
the bucket” or “spilling the beans,” each of whose meanings
is not obtained through the combination of the expression’s words.
Staff notation also contains noncompositional elements. For example, a tie is a
symbol that connects two consecutive notes of the same pitch, typically the last
note of a measure with the first note of the next measure. This occurs in
“Over the Rainbow” at the very end of the song in the phrase “why oh why can’t I?”
where “I” is sung as one note that lasts for two whole measures. According to the rule
for finding the meaning of a melody, one has to determine the meanings of the indi-
vidual measures and then concatenate those. This would lead to two one-measure-long
sounds instead of one sound that lasts two measures. Therefore, to properly get the
meaning of notes tied across two (or more) measures, the rule for finding the meaning
of a melody must be overridden by a rule that treats groups of measures whose notes
are tied as one.
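In code, such an exception shows up as a special case that overrides the general concatenation rule. A sketch, building on the simple sound model above (representing ties as a set of measure indices is an invention for this example):

    # Ties are noncompositional: at a tied boundary, extend the last
    # sound of the previous measure instead of starting a new one.
    def melody_meaning_with_ties(melody, tied_at):
        events = []
        for i, measure in enumerate(melody):
            sounds = list(measure)   # (pitch, duration) events
            if i in tied_at and events and events[-1][0] == sounds[0][0]:
                pitch, duration = events.pop()       # the override
                sounds[0] = (pitch, duration + sounds[0][1])
            events.extend(sounds)
        return events

    # "I" at the end of the song: one pitch tied across two measures
    # becomes a single sound lasting two whole measures.
    print(melody_meaning_with_ties([[("C", 1.0)], [("C", 1.0)]], {1}))
    # [('C', 2.0)]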
The rules for determining the meaning of a sentence are remarkably similar to the
rules that musicians follow when interpreting music notation. This is not an accident
because a denotational semantics for a language is an algorithm to compute the meaning
for a given sentence from that language. A computer that understands the language in
which the denotational semantics is given can execute the semantics and thus compute
the meaning of sentences. A computer or algorithm that can execute semantics is called
an interpreter. While it may seem strange to view Hansel and Gretel as interpreters
of pebble-following instructions, calling Judy Garland an interpreter of Harold Arlen’s
music probably won’t raise any objections. Just as an algorithm makes computation
repeatable, a language definition can be used to create computers for executing sentences
of the language and thus makes language execution repeatable. In the case of the music
notation, anybody who understands the semantics for the music notation can learn
how to interpret music notation. In other words, the semantics can be used to teach
musicians without a music teacher; it allows people to teach themselves.
A question related to interpreters is what happens if we create a recording of the
music performance. Do recordings somehow embody the meaning of a music piece?
Not really. What happens is that the recording of music produces merely a different
representation of the music. The first recordings of “Over the Rainbow” were produced
on analog records, which contain grooves that are interpreted by the needle of a record
player and transformed into sound waves. Later, sound waves were encoded digitally on
a CD as a sequence of bits (zeros and ones) that are read by a laser and then interpreted
by a digital-to-analog converter to reproduce the sound waves. Today, most music is
represented using software formats, such as MP3, designed to support the streaming of
music over the internet. Whatever the representation is, it is still a representation of the
music in some specific format, or language, that requires a performer or computer for
executing the language instructions to produce the intended sound effect. Thus, playing
and recording a music piece is actually a form of translation from one language, say,
staff notation, into another, say, bit representation of sound waves.
Further Exploration
The song “Over the Rainbow” was used to illustrate that each sentence of a language
has a structure that is important for understanding its meaning. Any other music piece
could have been used as well to investigate the structure of songs and the potential ambi-
guity in notation, provided it could be described by some sufficiently expressive music
notation. To understand the significance of notes, bars, and other structural elements of
the notation, it is instructive to examine alternative notation systems for music and un-
derstand their limitations. In particular, the systematic restriction of existing notations
results in new languages. For example, restricting the available pitches of notes leads to
percussion notation, and ignoring pitches altogether leads to a language of rhythms. If
we further reduce the available note lengths to only two (“short” and “long”), we obtain
Morse code. Chord symbols ignore the melody and only show the harmonic progression
of a music piece, while chord charts combine chord symbols and rhythm notation to
show the harmonic and rhythmic progression of a music piece.
Staff notation is more visual than a typical textual language, yet a sentence (that is,
music piece) is still a linear sequence of note symbols that can be adapted by changing
their position and appearance. This principle forms the basis for many notations that
are not traditional textual languages. For example, in the universal graphical language
Bliss, a sentence consists of sequences of words, which are symbols that can be composed
out of simpler symbols. This is similar to Egyptian hieroglyphs, in which symbols
can be combined in different ways, depending on their size and the intent of what
is being expressed. Bliss and hieroglyphs are general languages, intended to express
arbitrary thoughts. In contrast, the staff notation is targeted at a narrow domain and is
thus much simpler. An example of a special-purpose notation is chemical formulas that
describe the composition of molecules out of atoms (for example, the formula H2 O, for
water, says that each water molecule consists of one oxygen and two hydrogen atoms).
The previous examples are still mostly linear notations, that is, a sentence in these
languages is a sequence of symbols. Visual languages can be more expressive if they
employ a two-dimensional layout. For example, chemical formulas only represent the
proportions of atoms that constitute a molecule and ignore the spatial arrangement
of the atoms. In contrast, structural formulas provide a description of the geometric
arrangement of atoms. Similarly, Feynman diagrams are a two-dimensional language,
used in physics to describe the behavior of subatomic particles.
All of these languages employ a static one- or two-dimensional representation,
which means one can take a picture of a sentence and send it to somebody else, who
can then interpret it. Other languages transcend this snapshot limitation and employ a
temporal dimension as well. For example, gestures are a form of language. Movements
of body parts cannot be captured well by a single picture. To define a language of body
movements one typically needs videos or sequences of pictures. In user interfaces for
computers, tablets, and cell phones, hand movements such as swiping or pinching are
interpreted as actions to be performed. In addition to gestures, there are also notations
for describing dance, for example, Labanotation. A sentence in that language is an algo-
rithm to be executed by a dancer, which produces a dance as a computation. Origami
instructions are a similar kind of language for describing algorithms to fold paper ob-
jects. Language is not even restricted to humans. For example, in the series of books
about the adventures of Doctor Dolittle, animals communicate by movements of noses,
ears, and tails. But even outside of fiction we find, for example, the waggle dance of bees
to communicate the location of food sources.
The meaning of language is defined by rules that translate the abstract syntax of a
sentence into values of a semantic domain. Communication can be successful only if the
communicating partners agree on those rules. Lewis Carroll’s books Alice’s Adventures
in Wonderland and Through the Looking-Glass illustrate what happens if the rules are
violated or interpreted differently by the communicating partners.
Control Structures and Loops
Groundhog Day
Force of Habit
Back at the office, your first task is to send out several letters. Without thinking about
it, you fold each letter twice horizontally to reduce its height to about one-third and
then put it into an envelope. This works perfectly every time, since the paper and the
envelope you are using have a fixed size, and you figured out long ago that this particular
folding scheme makes the letter fit into the envelope.
The folding process is a computation described by an algorithm—a basic example
of origami. Despite its simplicity, the paper-folding algorithm illustrates a number of
points about languages.
The first is that the algorithm can be described in two slightly different ways. If fold
means to fold a piece of paper a specific distance from the top (for example, one-third of
the total paper length), then we could say either “fold and fold” or “fold twice.” It may
not seem like a big difference, but these are two fundamentally different descriptions
of a repeated action. The first lists an action to be repeated explicitly as many times
as needed, whereas the second merely says how often an action is to be repeated. The
difference becomes clearer when you have to send a larger sheet of paper that needs
additional folding steps. For example, for folding three times, the two approaches would
read “fold and fold and fold” and “fold three times,” respectively. Next, envision the case
for 500 or more repetitions, and you can see which approach remains practical.
The first description is a sequential composition of individual steps, whereas the sec-
ond is a loop. Both are examples of control structures, which are components of an
algorithm that don’t do any actual algorithmic work but rather organize the work of
other steps. This is similar to workers and managers in a factory. Managers don’t man-
ufacture anything directly but coordinate the actions of the workers who do. Control
structures are essential components of any language for describing algorithms, and loops
(or recursion) in particular are employed in most nontrivial algorithms.
There is actually a third algorithm for folding a piece of paper, which simply says
“fold until paper fits.” This algorithm also employs a loop control structure, but this
loop is different from the previous one in an important way. It does not explicitly
say how many times an action should be repeated, but instead defines a condition that
must be fulfilled for the repetition to end. This algorithm is actually more general
than the previous two, since it works for paper of arbitrary size, and thus it solves a
more general problem. It would be impossible to express this algorithm without a loop
(or recursion). In contrast, a loop with a fixed number of repetitions can always be
rewritten as a sequence of that many steps.
Assume you have to send not only a single letter but a whole bunch of documents.
Whenever you have more than five sheets of paper, instead of folding them all and
stuffing them into a small envelope, you use a larger envelope into which you can place
the stack of papers directly without folding. Making this decision, to fold or not to fold,
is an example of another control structure, the conditional, which executes one of two
possible actions depending on a condition. In this case the condition is that the number
of pages is five or less, in which case you fold the paper and use the small envelope.
Otherwise, you don’t fold the paper and use the large envelope.
A condition is also needed for determining when to end the repetition of a loop.
The condition to end the loop “fold three times” is expressed with a counter having
reached a particular value, namely, 3. In contrast, the condition of the loop “fold until
paper fits” tests a property of the paper that is transformed by the actions in the loop.
The latter kind of condition is more powerful, as evidenced by the fact that it expresses a
more general algorithm. But with this increased expressiveness also comes a cost. While
the counter-based loop can easily be shown to terminate (since the counter does not
depend on what the actions in the loop do), this is not so obvious for the more general
kind of loop, and we cannot be certain whether an algorithm using this kind of loop
will ever stop (see chapter 11).
Much of what we do is repetitive and can often be described by loops. Sometimes
whole days seem repetitive. This view is taken to the extreme in the movie Groundhog
Day, which guides the exploration of loops and other control structures in chapters 10
and 11.
10 Weather, Rinse, Repeat
Any algorithm is given in some language, whose semantics determine the computation
represented by the algorithm. We have seen that different languages can have different
semantics and can be targeted for different computers. For example, an algorithm ex-
pressed in a programming language is executed on an electronic computer and typically
denotes the computation of data. On the other hand, an algorithm in the language of
music notation is executed by a musician and denotes sounds. Despite these differences,
most nontrivial languages share an intriguing property, namely, that they consist of two
kinds of instructions: (1) operations that have a direct effect, and (2) control structures
for organizing the order, application, and repetition of operations.
Control structures not only play a crucial role in formulating algorithms, but also
determine the expressiveness of languages. This means that the definition of control
structures and the decision as to which control structures to include in a language de-
termine which algorithms can be expressed in a language and thus which problems can
be solved using it.
One such control structure is the so-called loop, which allows the description of
repetitions. We have seen many examples of loops already, as in Hansel and Gretel’s
way-finding algorithm that instructs them to repeatedly find the next unvisited pebble,
or in selection sort, which works by repeatedly finding the smallest element in a list.
Even music notation contains control structures for expressing loops. Though I have
employed loops extensively already, I haven’t discussed them in detail yet. This is done
in this chapter. Following the weather reporter Phil Connors, who relives Groundhog
Day over and over again, I explain how loops and other control structures work. I
show different ways of describing loops and the differences in the computations they
can express.
Phil: What would you do if you were stuck in one place and every day was
exactly the same and nothing that you did mattered?
Ralph: That about sums it up for me.
What makes Phil Connors’s daily routine so maddening to him—and so funny to the
viewer of the movie—is that everything happens in exactly the same way as the day
before, that is, everything that is not a direct response to his actions during the day.
Otherwise, the movie would quickly become boring, like a monotonously repeating
refrain at the end of a song.
Our days are typically less frustrating than the ever-repeating Groundhog Day be-
cause our actions yesterday have an effect on today. Therefore, each day, no matter how
repetitive it may seem, happens in a different context, and the knowledge that what we
and others do makes a difference provides us with a sense of continuous change and
progress. These observations suggest a distinction between two kinds of loops, namely,
loops that produce on each iteration the same outcome and loops that yield different
outcomes. An example of the first kind is a loop that prints the name of the city you
were born in, say, New York. Barring any renaming of the city, this loop will produce a
steady stream of the same outputs, such as “New York” “New York” “New York” . . . .
In contrast, a loop to report whether it is raining will produce a stream that contains a
mix of yes and no values—unless you live in Cherrapunji,1 where it’s likely to produce
a constant stream of yes values.
Note that even in a loop that produces varying outcomes, the loop body is the same
for each iteration. The variation is achieved by variables. A variable is a name that
points to part of the world. Through the variable name the algorithm can observe and
manipulate the world. For example, the variable weather points to the current weather
condition, say, sunny, and an algorithm that checks the current weather by accessing
this variable would obtain the corresponding value. The variable weather cannot be
changed by an algorithm, but the variable pebble that refers to the next pebble Hansel
and Gretel walk toward changes with each pebble visited. Similarly, in the instruction
to find the smallest element in a list that is part of selection sort, the variable smallest
element changes in each iteration unless the list contains duplicates.
Since the term loop is sometimes used for both the description of the loop (the
algorithm) and the computation it generates, this is a good time to recall the differ-
ence between the description of a computation and its execution. A loop for the daily
weather report looks as follows:
repeat report weather until forever
The result of executing this loop is a sequence of values, which is a very different thing
from the loop description itself. For example, the execution of the algorithm could lead
to the following sequence:
rainy cloudy sunny sunny thunderstorm sunny . . .
This weather-reporting loop describes the kind of loop that Phil Connors finds himself
in: each morning he has to prepare a report on the forecast of his colleague, the ground-
hog Punxsutawney Phil. (Interestingly, despite all the dramatic variations he creates for
the day—at one time he even tries to kill Punxsutawney Phil—he never seems to miss
his report about the groundhog’s forecast.) Of course, Phil Connors hopes that the
loop he is in does not repeat forever. In fact, he assumes the loop is of the following
form:
repeat report weather until ‘some hidden condition’
His initial attempts to escape this loop are essentially efforts to discover the hidden
condition, which is a critical component of any loop, called its termination condition.
Toward the end of the story we learn that the termination condition is his becoming
a good person, somebody who cares about and helps other people. His reincarnation
every day provides an opportunity to perfect his karma, which is the key to his leaving
the purgatory-like Groundhog Day.
The Groundhog Day loop is an instance of the following general loop schema:2
repeat step until condition
The termination condition appears at the end and is evaluated after the body of the loop
has been executed. If the condition is true, the loop stops. Otherwise, the body will be
executed again, after which the condition is also checked again to determine whether to
continue or end the loop, and so on.
It is clear that a termination condition that is false at one time can become true
later only if it contains a variable that can be changed by the actions of the loop body.
For example, efforts to start a car that has run out of gas will not succeed until gas is
refilled. Thus, if the repeated actions include only turning the ignition key or maybe
kicking the tires or the incantation of a magic spell, they won’t help. The termination
condition thus lets us distinguish between terminating loops whose termination condi-
tion eventually becomes true, and nonterminating loops whose termination condition
always remains false.
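Python has no built-in repeat loop, but the schema can be emulated with an unconditional loop that checks the termination condition after each execution of the body; the rule by which the state changes below is, of course, made up for illustration:

    import random

    good_person = False
    while True:
        print("experience Groundhog Day")        # the loop body
        good_person = random.random() < 0.01     # the body changes the state
        if good_person:                          # termination condition
            break                                # true: the loop stops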
It has been said that insanity is doing the same thing over and over again and expect-
ing different results.3 We might be tempted therefore to brand nonterminating loops
as insane, but there are examples of perfectly valid loops that do not terminate. For
example, a web service is a loop that takes a request, processes it, and then continues to
take the next request, ad infinitum. However, when a loop is part of an algorithm—in
particular, when it is followed by other steps—we expect it to end eventually because
otherwise steps that follow the loop could never be executed, and the algorithm would
not terminate either.
Sequential composition combines steps by executing them one after another, in the given order. Its general form is:

step ; step
Here step is a nonterminal that can be substituted by any simple or compound step. In
particular, step can be another sequential composition of other steps. Thus, if you want
to squeeze in a shower between getting up and having breakfast, you can use sequential
composition twice by expanding the first step to get up; take a shower and the second
one to have breakfast, which altogether produces get up; take a shower; have breakfast.4
As with the example of folding a piece of paper twice, “fold and fold,” which we
now write as fold; fold, the steps connected by ; need not be different. Executing a loop
repeat fold until paper fits (or repeat fold three times) yields the same computation as
the sequence fold; fold; fold, which illustrates that loops are a tool to describe sequences
of actions. The important contribution of a loop is that it produces arbitrarily long
sequences of steps while having to mention the actions to be repeated only once.
The conditional selects one of two (groups of) steps for execution depending on a
condition. Like a loop it uses a condition to make the decision. The general form of a
conditional is as follows:

if condition then step else step

The decision that Punxsutawney Phil faces every year is an instance of this schema:

if sunny then announce six more weeks of winter else announce early spring
The conditional is a control structure that allows algorithms to make choices and select
between alternative steps. The preceding conditional is part of a yearly loop for Punx-
sutawney Phil and a daily loop for Phil Connors. This illustrates that control structures
can be combined in arbitrary ways, that is, conditionals can be part of loops or sequen-
tial compositions, loops can occur in the alternatives of a conditional or as part of a
sequence of steps, and so on.
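A sketch of such a nesting, a conditional inside a loop, in Python (the shadow test here is just a stand-in for the groundhog's actual forecast):

    for year in range(2020, 2024):       # a yearly loop
        sunny = year % 2 == 0            # stand-in for "does Phil see his shadow?"
        if sunny:
            print(year, "six more weeks of winter")
        else:
            print(year, "early spring")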
The basic steps of an algorithm are like the moves in a game (for example, passing
the ball or taking a shot on a goal in soccer, or attacking a piece or castling in chess).
Then the control structures define strategies in those games, for example, repeat pass
until in front of goal (or if you are Lionel Messi, repeat dribble until in front of goal),
that is, control structures compose basic moves into bigger plays.
As shown in chapter 8, there are many different notations to describe music. Sim-
ilarly, there are different notations for algorithms. Each programming language is an
example of a specific notation for algorithms, and while languages can differ signifi-
cantly in which specific control structures they offer, most provide some form of loop,
conditional, and composition.5 A notation that nicely illustrates the difference between
control structures is a flowchart. A flowchart depicts an algorithm as boxes connected
by arrows. Basic actions are displayed inside the boxes, and decisions are enclosed in diamond shapes. The arrows indicate how the computation progresses. For sequential composition this means to follow single arrows from box to box, but for conditionals and loops, whose conditions have two outgoing arrows, which arrow to follow depends on the condition. Some examples of the flowchart notation are shown in figure 10.1.

Figure 10.1 Flowchart notation for control structures. Left: A sequence of steps Phil Connors takes every morning (get up; take a shower; have breakfast). Middle: A conditional showing the decision that Punxsutawney Phil faces on each Groundhog Day (is it sunny? yes: announce winter; no: announce early spring). Right: A loop expressing the life of Phil Connors during the movie Groundhog Day (experience Groundhog Day until good person). The "no" arrow and the arrow to the condition form a cycle through the two nodes.
It is striking how similar the notations for conditionals and loops are. Both consist
of a condition with two possible continuations. The only crucial difference is that the
“no” path from the condition in the loop leads to a step that continues back to the
condition. The cycle thus formed provides a nice visual explanation of the name loop
for this control structure.
Flowcharts are a visual language. In contrast to a textual language that presents an
algorithm as a linear sequence of words and symbols, a visual language presents symbols
in two- (or three-)dimensional space, connected by spatial relationships. Flowcharts use
arrows to express the “execute next” relationship between actions, which are repre-
sented as boxes. A flowchart looks similar to a transportation network. The junctions
are places where actions take place, and the connections lead from one action to an-
other. Think of moving through an amusement park. The park can be regarded as an
algorithm for having fun. Different people will visit individual attractions in a different
order and a different number of times, depending on their predispositions and experi-
ences with the attractions. Or envision the aisles of a supermarket that connect different
sections and shelves. The supermarket can be viewed as an algorithm for different shop-
ping experiences.
Flowcharts were quite popular in the 1970s, and they are sometimes still employed
for the documentation of software, but they are rarely used these days as a programming
notation. One reason is that the notation does not scale well. Even moderate-sized
flowcharts are difficult to read—the profusion of arrows has been called spaghetti code.
Moreover, the similarity between the notation for conditional and loop, while quite
helpful to illustrate their relationship, makes it hard to identify and distinguish between
control structures in flowcharts.
The control structures presented here are those used by algorithms for single com-
puters. Modern microprocessors come with multiple cores that can execute operations
in parallel. People, too, can compute in parallel, especially when they are in a team.
To make use of parallelism, an algorithm needs control structures that are specifically
dedicated for this purpose. For example, we could write walk ∥ chew gum to instruct
somebody to walk and chew gum at the same time. This is different from walk; chew
gum, which means first walk and then chew gum.
Parallel composition is useful when two results that do not depend on each other
are needed by another computation. For example, Sherlock Holmes and Watson often
split their investigative work in solving a crime. On the other hand, one cannot get
up and take a shower at the same time. Those two actions must be executed strictly in
sequence (and the order matters).
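In Python, parallel composition can be approximated with threads; the sketch below contrasts walk ∥ chew gum with walk; chew gum (both actions are just illustrative print statements):

    from threading import Thread

    def walk():
        print("walking")

    def chew_gum():
        print("chewing gum")

    # Parallel composition: walk ∥ chew gum
    t1, t2 = Thread(target=walk), Thread(target=chew_gum)
    t1.start(); t2.start()
    t1.join(); t2.join()

    # Sequential composition: walk; chew gum
    walk()
    chew_gum()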
Related to parallel computing is distributed computing, in which computation hap-
pens through communication among interacting agents. For example, when Phil Con-
nors and his team create a report about the groundhog’s prognostication, they need to
synchronize the operation of the camera with Phil Connors’s talking. An algorithm to
describe this coordination requires its own control structures, in particular, for sending
and receiving messages.
In general, any domain-specific language, that is, a language for
a specialized application area, may have its own control structures.
For example, music notation contains control structures for repetition and jumps, and
languages for recipes contain constructs for choice that allow for the creation of variety
in recipes. Control structures are the glue to connect primitive operations into larger
algorithms for describing meaningful computations.
Besides the repeat loop, most languages also offer a while loop, which has the general form while condition do step and evaluates its condition before each execution of the body.
Even though both loops are controlled by a condition, the role of the condition is
different in each loop. Whereas the condition controls the exit of a repeat loop, it
controls the (re-)entry of a while loop. In other words, if the condition is true, the
repeat loop ends, whereas the while loop continues; if the condition is false, the repeat
loop continues, whereas the while loop ends.6 This difference is also highlighted by the
flowchart notation for the two loops, shown in figure 10.2.
Despite their apparently different behaviors, one can express a repeat loop using a
while loop, and vice versa. For this, one has to negate the condition, that is, transform
the condition so that it is true in exactly those cases when the original condition is false.
For example, the termination condition for the Groundhog Day repeat loop, “being a
good person,” becomes the entry condition “not being a good person” or “being a bad
person” for the corresponding while loop. Moreover, one must be careful to ensure the
same number of iterations. For example, the repeat loop lets Phil Connors experience
Groundhog Day at least once, no matter what, whereas the while loop does that only
if he is a bad person. Since this is the case in the story, both loops behave the same.
The language representation lets us express the equivalence between the loops more
formally through a simple equation:
repeat step until condition = step; while not condition do step

Figure 10.2 Flowchart notation illustrating the different behavior of repeat and while loops. Left: Flowchart of a repeat loop (experience Groundhog Day; then test good person?, looping back on "no"). Right: Flowchart of a while loop (test bad person?; on "yes", experience Groundhog Day and loop back to the test).
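The same equivalence can be sketched in Python, where the repeat loop is emulated as before; note that the repeat version executes the body at least once, while the while version may not execute it at all (the state-changing rule is again hypothetical):

    import random

    def experience_day(good_person):
        # Hypothetical state change: some days make Phil a better person.
        return good_person or random.random() < 0.01

    def repeat_until(good_person):
        # repeat "experience the day" until good person
        while True:
            good_person = experience_day(good_person)
            if good_person:
                return

    def while_not(good_person):
        # while not good person do "experience the day"
        while not good_person:
            good_person = experience_day(good_person)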
A loop can also be expressed through recursion, by giving it a name and using that name in its own definition. The Groundhog Day loop then reads:

GroundhogDay = experience the day; if good person? then do nothing else GroundhogDay
This definition effectively emulates a repeat loop: after living through the day, the conditional checks the termination condition. If it is true, the computation simply terminates by doing nothing. Otherwise, the algorithm is executed again. The recursive execution of the algorithm is like a jump to the beginning of the sequence and triggers the reexecution of the loop.
All the different loop descriptions encountered so far (repeat, while, and recursion)
have in common that their termination is controlled by a condition that is reevaluated
before or after each execution of the body. The termination of the loop depends on the body having an effect that ultimately makes the termination condition true (or the entry condition false, in the case of a while loop). This means that it is not known
in advance how many iterations a loop will go through; it is not even clear that any
such loop will terminate at all. This uncertainty is actually an important part of the
Groundhog Day loop that Phil Connors experiences.
For some computations described by loops, however, it is clear how many times the
loop should be executed. For example, if the task is to compute the square of the first
ten natural numbers, it is clear that this computation can be achieved by a loop that
repeats the squaring operation exactly ten times. Or, recall the algorithm to fold a piece
of paper to fit into an envelope, which is described by a loop that is executed exactly
twice. For cases like these, we employ for loops, which have the following general form:7

for number times do step

Using this schema, the paper-folding loop would be expressed as for 2 times do fold.
The advantage of a for loop is that it is absolutely clear even before it is executed how
many iterations will be performed. This is not the case for the other loops because one
finds out only when executing the loop. This is an immensely important difference,
since the for loop is guaranteed to terminate, while the other loops may run forever
(see chapter 11).
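In Python, the example of squaring the first ten natural numbers is a for loop whose iteration count is fixed before execution begins, so it is guaranteed to terminate:

    # for 10 times do square: the number of iterations is known in advance.
    for n in range(1, 11):
        print(n * n)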
Closely related is the question of the runtime of loops. It is clear that a loop that
is executed, say, 100 times takes at least 100 steps. In other words, a loop is linear in
the number of its iterations, and this is the case for every kind of loop. In addition to
the number of iterations we also have to consider the runtime of the loop’s body. The
runtime of a loop is the number of its iterations times the runtime of the body. For
example, selection sort is a loop whose body consists of finding a minimum in a list. The
loop is linear in the size of the list, and the body takes, on average, time proportional
to half of the list length. Therefore, the runtime of selection sort is quadratic in the size
of the list.
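The runtime argument can be read off directly from a sketch of selection sort in Python: the outer loop runs once per list element, and its body scans the remaining elements to find the minimum, which together gives the quadratic runtime:

    def selection_sort(items):
        for i in range(len(items)):              # linear number of iterations
            smallest = i
            for j in range(i + 1, len(items)):   # body: find the minimum
                if items[j] < items[smallest]:
                    smallest = j
            items[i], items[smallest] = items[smallest], items[i]
        return items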
Given that a for loop seems to behave so much more predictably than all the other
kinds of loops, why not use for loops exclusively? The reason is that for loops are less
expressive than while loops and repeat loops (and recursion), that is, there are problems
that can be solved using a while loop or a repeat loop that can’t be solved by a for loop.
The Groundhog Day loop is an example where it is not known at the beginning (at
least, not to Phil Connors) how many iterations it will take. It is easy to see that any
for loop can be represented by a while (or repeat) loop by explicitly maintaining the for
loop counter, but the reverse is not true because one cannot immediately see how many
iterations a while loop or a repeat loop needs before it terminates.
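The translation of a for loop into a while loop is mechanical, as this Python sketch shows; the reverse direction has no such recipe:

    def fold():
        print("fold")

    # for 3 times do fold ...
    for _ in range(3):
        fold()

    # ... becomes a while loop that maintains the counter explicitly.
    count = 0
    while count < 3:
        fold()
        count += 1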
Predictability has its price. While uncertainty about the duration or outcome of
adventures is welcome, we would rather know in advance how long a computation
takes before we use it and rely on it—in particular, if it might run forever.
Stop at Nothing
After you’ve folded your letters and put them into envelopes, it is time to welcome your
new colleague who recently moved into an office one floor above you. Your walk to
her office is the result of executing an algorithm that repeats taking steps until you have
reached her office. But what if you don’t know the exact location of her office? In that
case, you have to walk around the floor and try to find her name on the office door. But
what if her start date was pushed back and she hasn’t even moved in yet? In that case,
your simple algorithm that takes steps until the target office is reached won’t terminate;
rather, it will continue forever.
Of course, you wouldn’t actually do that. You would execute a different algorithm,
which, after searching the whole floor (maybe repeatedly, just to be sure), would aban-
don the search and lead back to your own office. This algorithm has a more sophisti-
cated termination condition. It would not require only finding the office but would also
allow for termination after a certain time.
Loops that do not terminate seem like a silly idea. Although some computations
are the result of executing a loop without a termination condition (for instance, a web
service or even a simple timer or counter), a computation to produce one fixed result
would only use loops that are intended to terminate because a nonterminating loop
would prevent the result from being constructed.
In general, algorithms are expected to terminate because otherwise they would not
be an effective method for solving problems. It would therefore be helpful if we had a
way of telling whether an algorithm terminates when executed. Since the only reason
for an algorithm not to terminate is that one of its loops doesn’t terminate,1 judging the
termination of an algorithm boils down to determining whether the loops employed by
it actually terminate. The different algorithms for folding paper all do in fact terminate.
This is clear for the for loop fold three times, since it explicitly mentions the number
of iterations of the loop. The repeat loop fold until paper fits also terminates, but this
may not immediately be obvious. Even if we know that each folding step reduces the
size of the paper, we also have to assume that folding happens along different axes when
needed. In that case, the size of the folded paper must eventually become smaller than
that of the envelope, since it halves with every folding step. While it is not clear how
many iterations the loop will go through and thus how long the algorithm will take, it
is clear that the algorithm terminates eventually.
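A short Python sketch of this argument, with illustrative sizes: since every fold halves the paper, the loop condition must eventually become false, even though the number of iterations is not obvious in advance:

    paper, envelope = 29.7, 11.0    # heights in centimeters (illustrative)
    folds = 0
    while paper > envelope:         # fold until paper fits
        paper = paper / 2           # each fold halves the size
        folds += 1
    print(folds)                    # finite, though not evident beforehand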
However, this is not the case for loops in general. The walking loop to find your
colleague’s office will not terminate if the termination condition is to find an office that
doesn’t exist. The reason you wouldn’t search for a nonexisting office forever is that
the office-finding problem is only a small part of a much larger set of goals for your
workday, and if the algorithm is unable to solve the subproblem, it will be abandoned,
and either a different method will be adopted or the goal will be replaced by another
goal. If you consider programming a robot that simply executes the office-finding loop
and that does not understand it as part of a bigger mission, the robot would go on
forever (or until it runs out of energy or is stopped by a human).
So how can we distinguish between a terminating and a nonterminating loop? As
we have seen, the answer is trivial in the case of for loops, which have a fixed number of
iterations and thus always terminate. In the case of repeat loops (and while loops) we
have to understand the relationship between the termination condition and the steps in
the body of the loop. An interesting question to be investigated in chapter 11 is whether
there is an algorithm that we can use to determine the termination behavior of loops.
The answer may surprise you.
The runtimes of algorithms matter. A linear algorithm is preferable to a quadratic
one, and an exponential algorithm is impractical for all but trivial input sizes. How-
ever, an exponential algorithm is still better than one that never terminates even for
small inputs. The question of termination of an algorithm comes down to the ques-
tion of termination of loops; it is one of utmost importance. This is also the question
that haunts Phil Connors as he desperately tries to identify and satisfy the termination
condition of the Groundhog Day loop.
11 Happy Ending
Not Guaranteed
Loops and recursion inject power into algorithms. Loops enable algorithms to process
inputs of arbitrary size and complexity. Without loops, algorithms could deal only with
small and simple cases of inputs. Loops get algorithms off the ground. They are to al-
gorithms what wings are to airplanes: without wings, airplanes still can move, but their
full transportation potential cannot be realized. Similarly, there are some computations
that algorithms can describe without loops, but the full power of computation can be
realized only with loops. Such power also needs to be controlled, however. As the
Groundhog Day story illustrates vividly, a control structure over which one has no con-
trol is not a blessing but a curse. Goethe’s poem “The Sorcerer’s Apprentice” may also
come to mind (which may be better known in the United States through the Mickey
Mouse sequence in the film Fantasia). The key to controlling a loop is to understand
the condition that decides when it ends. Phil Connors can eventually end the loop he is
in, leading to a happy ending. But this happens more as a coincidence than by following
a plan.
Since algorithms are generally expected to terminate, one would like to know before
executing an algorithm whether all the loops in it will terminate. Trying to figure this
out can be quite a tedious and complicated endeavor, and therefore we ideally would
like to delegate this task to an algorithm itself. The quest for an algorithm that can de-
cide whether another algorithm will terminate is a famous problem in computer science
called the halting problem. This chapter explains and discusses this problem and shows
that it is actually unsolvable, which means that no such algorithm can exist. This is a
rather surprising fact that reveals a lot about the limitations of algorithms and compu-
tation in general.
Even though the events that are unfolding in the Groundhog Day story look like a
loop, the script of the story does not contain a loop. Instead, all repetitions of actions
are spelled out in detail. Thus, it is not hard to determine that the story does end. After
all, a movie producer must know the length of a movie before starting the project. But
we can imagine a version of the story as a play in which the acting is not prescribed but
improvised. That description could include a loop with a termination condition, but for
the improv performance of Groundhog Day, it is not clear how long it would take. And
if it weren’t for the ultimate exhaustion of the actors (and the audience), we wouldn’t
know in advance if it terminated at all.
Out of Control
On his second day in Punxsutawney, Phil Connors starts to suspect that he might be
reliving the previous day, after he wakes up to the same song on the radio and meets
the same people in the same situations. When someone on the phone puts him off to the next day, he replies:

Well, what if there is no tomorrow? There wasn't one today!
During the many repetitions of Groundhog Day, most events and encounters are iden-
tical to the day before. However, some details vary because Phil Connors reacts differ-
ently each time. Initially, as he gets used to his new situation, he tries to take advantage
of and manipulate people by accumulating information over successive iterations of the
Groundhog Day. This is most prominent in his attempt to learn everything about his
producer-turned-lover Rita.
His strategy works, in principle at least, because of the fundamentally different way
in which he and other people experience the Groundhog Day loop. Most important,
while he perceives the repetition of the day, all other people are not aware of it at all.
In fact, when he tries to share his plight with others, for example, with Rita or later
with a psychiatrist, they think he is crazy. This fact provides him with an important
advantage, since, in contrast to other people, he has the ability to remember information
that happened in previous iterations.
A similar situation exists in the loops of algorithms. Some things remain constant
over different loop iterations while others change. For example, in the loop to find an
element in a list (see chapter 5) the list of elements doesn’t change, and neither does
the element that we are looking for, but the current position in the list and thus the
currently inspected element changes in every iteration until the element is found or the
end of the list is reached.
A closer look reveals that the loop body and the termination condition have access
to the state of the world, also just called state. As explained in chapter 10, this access
happens through variables. The instructions of the loop body can both read the vari-
ables and change their values. In contrast, the termination condition can only read the
variables to produce a true or false value to end or continue the loop, respectively.
In the element-finding algorithm the state consists of the list to be searched, the
element to be found, and the position in the list that is currently being searched. To
make the discussion more concrete, I use the following analogy.
Suppose you have taken a picture of a flower on one of your recent hikes and want
to know what kind of flower it is. To this end, you want to look it up in a book
of plants that contains on each page a description for a particular flower, including a
picture. Assuming that the pictures in the book are not given in any particular order,
you have to go through all the pages of the book to find the page that describes your
flower. To make sure you find an entry for your flower, you have to look at each page
until you have found it or exhausted all the pages. A simple method is to start with the
first page, then look at the second, and so on.
The state for this search consists of three items: the picture of your flower, the book,
and the current page in the book, given, for example, by a bookmark. This state is read
by the termination condition, which checks whether the bookmark for the current
page has reached the end of the book or whether the picture on the current page shows
the same flower as your picture. In either of these two cases, the loop ends, but only
in the latter case with a successful search. The loop must also contain an instruction
to turn the page if the picture has not been found yet and when there are more pages
with pictures to inspect. This step modifies the state of the loop because it changes the
current page. This is also the only modification of the state that occurs in the loop.
It is clear that the state modification is critically important for the search algorithm
because without turning pages you would not be able to find the picture (unless it
happens to be on the first page).
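Representing the book as a list of pictures, the search loop and its state can be sketched in Python; the bookmark is the variable page, and turning the page is the only state change:

    def find_flower(book, picture):
        page = 0                                   # the bookmark
        # Termination condition: end of book reached, or picture found.
        while page < len(book) and book[page] != picture:
            page += 1                              # turn the page
        return page if page < len(book) else None  # None: flower not in book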
This fact is clear to Phil Connors, and he therefore tries to change things around
Punxsutawney to let the Groundhog Day termination condition become true so that
he can exit the loop. But not only is the state of the Groundhog Day loop much larger—
it includes all the people (and Punxsutawney Phil), their thoughts and attitudes, and so
on—it is not even clear what the state consists of. And so it is rather by coincidence
that he is able to finally escape the loop.
Figure 11.1 Unfolding a loop means to create a sequence of copies of its body. For each
iteration of the loop a separate copy of the loop body is created. The figure shows the state
that is modified and how it changes (if at all). If the state never changes in a way that fulfills the termination condition, the sequence becomes infinite.
You may object that the Groundhog Day loop analogy to computing is invalid be-
cause the computer that executes it and the rules by which it operates are completely
fictional—some kind of metaphysical purgatory engine to produce good people. While
the Groundhog Day story is certainly fictional, its loop metaphor is apt. It also illus-
trates that computation can happen under very different circumstances, with computers
that operate under a vast variety of rules. In fact, computation is often used to simulate
hypothetical worlds and scenarios. As long as the rules for computation are logical and
consistent, there is no limit to the computational scenarios that our imagination can
create.
Phil Connors spends each iteration of the loop living through the day, interacting with the people he encounters, and reacting to the encounters he makes. Since we don't have a precise description
of this behavior, we can’t call it an algorithm, but the execution of the Groundhog Day
loop still unfolds the actions from each day into a long sequence that eventually leads
to the satisfaction of the termination condition. We cannot pin down a specific action
that changed the state so that the termination condition was fulfilled. This is different
from the example of finding a picture in a book, where it is clear that turning the page
is a necessary action to change the state so that the search algorithm can terminate.
For example, suppose that we alternate between flipping a page forward and flipping it backward. While this is changing the state by turning pages, it obviously will prevent the algorithm from terminating (unless the book has only two pages). Of course, few
people would do such a thing, but this example illustrates that changes to the state are
not enough to ensure the termination of an algorithm. Rather, an algorithm must make
the right changes to achieve termination.
A related aspect is that we want the algorithm not only to terminate but also to
produce a correct result. Correctness in this example means that we either stop on the
page that contains the sought flower (if it is in the book), or we stop on the last page
knowing that the flower is not contained in the book. But now consider the following
variation of the algorithm, which turns not one single page but two or more pages at
a time. In this case, we may miss the picture in the book. The algorithm will still
terminate, but when we have reached the last page, we can’t be sure that the flower is
not contained in the book.
The termination condition of the Groundhog Day loop is Phil Connors’s being a
good person. Since he doesn’t know this, he initially tries out all sorts of actions to
change the state to achieve termination. This includes different forms of suicide and
even the murder of Punxsutawney Phil—all to no avail. As he realizes that he has no
control over the loop, he changes his attitude from cynical to making the best of the
day by helping other people. Ultimately, as his transformation into a good person
is complete, he can successfully exit the moral loop. And to make the happy ending
perfect, Rita reciprocates his love for her.
The task that Phil Connors faces is especially hard, since he is flying blind. He has
to satisfy the termination condition of the Groundhog Day loop without even knowing
what it is. Moreover, he doesn’t know what the underlying state is and how to change
it. A truly daunting task. Surely, finding out whether an algorithm terminates must be
much easier because we can see the termination condition, know what the underlying
state is, and can see how the actions in the loop body might change the state.
No End in Sight
Suppose you are presented with an algorithm and have to decide whether the algorithm
is worth running, that is, whether it produces a result in a finite amount of time. How
would you go about it? Knowing that the only source of nonterminating behavior
is loops, you would first identify all the loops in the algorithm. Then, for each loop
you would try to understand the relationship between the termination condition and
the instructions of the loop body, because the termination of a loop depends on the
termination condition and how the values on which it depends are transformed by the
loop body. This analysis will allow you to decide whether a particular loop terminates
and whether the algorithm has a chance of terminating. To ensure the termination of
the algorithm, you have to perform this analysis for each loop in the algorithm.
Since termination is such an important property of algorithms—it separates algo-
rithms that actually solve a problem from those that don’t—it would be nice to know
the termination property for any algorithm that one may want to use. However, per-
forming a termination analysis for an algorithm is not an easy task, and it might take a
considerable amount of time to carry out. Thus, we are tempted to automate this task,
that is, to create an algorithm, say, Halts, that does the termination analysis automati-
cally. Since we have many other algorithms for analyzing algorithms (for example, for
parsing, see chapter 8), this doesn’t seem like such an outlandish idea.
Unfortunately, it is impossible to construct the algorithm Halts. It is not that this
is too difficult at the moment or that computer scientists have not thought about the
problem long and hard enough. No, we know that it is impossible in principle to craft
an algorithm Halts. It is impossible now, and it will never be possible. This fact is often
referred to as the unsolvability of the halting problem. The halting problem is a major
problem in computer science. It was conceived by Alan Turing in 1936 as an example
of an undecidable problem.
But why can’t we create the algorithm Halts? Since any algorithm to be analyzed by
Halts is itself given by a finite description, it seems we would have to inspect only a finite
number of instructions and how they affect the state that determines the termination
condition.
To understand why it is indeed impossible to define Halts, let us first construct
an algorithm, Loop, whose termination behavior is clear (see figure 11.2). Loop takes
a number as a parameter and assigns it to the variable x. Application of Loop to the
number 1—written as Loop(1)—yields a computation that assigns 1 to the variable x
and stops, since the termination condition of the repeat loop is true (middle graphic in
the figure). Otherwise, for any other number, it loops back to the assignment of the variable x and evaluates the termination condition again. Since x never changes, the condition never becomes true, and the computation runs forever (right graphic in the figure).
Figure 11.2 The algorithm Loop stops when called with the number 1 as an argument but
loops forever when called with any other argument. Left: The definition of the algorithm Loop.
Middle: Application of Loop to the number 1. Right: Application of Loop to the number 2.
Figure 11.3 Left: The structure of the algorithm Halts. It takes two parameters, an algorithm
(Alg) and its input (Inp), and tests whether the algorithm applied to the input Alg(Inp) termi-
nates. Middle: Application of Halts to the algorithm Loop and number 1, which results in “yes.”
Right: Application of Halts to Loop and the number 2, which results in “no.”
Now suppose the algorithm Halts existed. Then we could define another algorithm, Selfie, that takes an algorithm Alg as a parameter and does the opposite of what Halts predicts for Alg applied to itself: it enters a nonterminating loop if Halts(Alg,Alg) is true and stops otherwise (see figure 11.4). We can get a glimpse of what's going on by unfolding the definition of Selfie when applied to
itself. To do so, we substitute Selfie for the parameter Alg in its definition. As shown in
the middle graphic of figure 11.4, this leads to a loop that terminates when Selfie applied
to its own definition does not halt, because if Halts(Selfie,Selfie) is true, the algorithm
loops back to test the condition again, and otherwise stops.
Figure 11.4 Left: The definition of the algorithm Selfie. It takes an algorithm (Alg) as a param-
eter and tests whether the algorithm applied to itself terminates. In that case Selfie will enter a
nonterminating loop. Otherwise, it will stop. Middle: Application of Selfie to itself leads to a
contradiction: If Selfie applied to itself terminates, then it goes into a nonterminating loop, that
is, it doesn’t terminate; if it doesn’t terminate, it stops, that is, it terminates. Right: Expanding
the definition of Halts provides a slightly different view of the paradox.
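The construction can be sketched in Python; halts below stands for the assumed algorithm Halts, which, as the argument shows, cannot actually be implemented:

    def loop(x):
        # Terminates only for the argument 1; loops forever otherwise.
        while x != 1:
            pass

    def halts(alg, inp):
        # The assumed termination tester; its body cannot be filled in.
        raise NotImplementedError("provably impossible")

    def selfie(alg):
        # Do the opposite of what halts predicts for alg applied to itself.
        while halts(alg, alg):   # alg(alg) terminates: loop forever
            pass                 # alg(alg) loops: fall through and stop

    # selfie(selfie) contradicts any answer halts could give: if it
    # terminates, it must loop forever, and if it loops, it must stop.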
The movie Groundhog Day has illustrated different aspects of a loop. A crucial feature
in the movie is the question of whether the loop will ever end. In the movie, the
protagonist Phil Connors tries to find a combination of actions that will end the loop.
Similarly, in the movie Edge of Tomorrow, an Army public relations officer named Bill
Cage, who dies while reporting about an alien invasion of earth, is repeatedly sent back
in time to the previous day to relive the experience. In contrast to Groundhog Day,
where every loop iteration takes exactly one day, later iterations in Edge of Tomorrow
take Bill Cage further into the day and delay his death as he is trying out alternative
actions in every iteration. (The movie is also known under the title Live. Die. Repeat.,
making the loop algorithm obvious.) This is similar to replaying a video game over
and over again, learning from past mistakes and trying to reach a higher level each
time. Dark Souls is a video game that exploits this phenomenon and makes it a feature
of its game play. The novel Replay, by Ken Grimwood, also contains characters who
relive parts of their lives. However, the relived episodes get shorter and shorter, ending
always at the same time and starting closer to the end point. Moreover, the people have
no control over their replays. This is also the case for the protagonist in the movie
Source Code, where a passenger on a commuter train into Chicago is in an eight-minute
time loop, trying to prevent a bomb attack. In this story, the protagonist himself has
no control over the loop, since it is a computer simulation managed from the outside.
A loop works by changing in each iteration parts of an underlying state until the
termination condition for the loop is fulfilled. Relevant for the effect and termination
of a loop are only those parts of the state that transcend each iteration. In the case
of Groundhog Day, the state that can be changed is only the person Phil Connors.
Everything else (the physical surroundings, the memories of other people, etc.) is reset
on each iteration and cannot be affected by Phil Connors. Analyzing a story from this
perspective helps uncover implicit assumptions about the effect of actions, memories of
people, and so on, and thus supports the understanding of the story. It also helps with
discovering inconsistencies and plot holes.
When the actions in a loop don’t affect the parts of the state that can make the termi-
nation condition true, a loop will not terminate. The movie Triangle seems essentially
to consist of one big nonterminating loop. However, since some of the actors have
doppelgängers, the loop structure is actually more intricate and indicates the existence
of several interleaved loops. A nonterminating loop is displayed most prominently in
the story of Sisyphus. According to Greek mythology, Sisyphus was punished by Zeus, who forced him to endlessly push a boulder up a hill. When he reaches the top of the
hill, the boulder rolls down again, and Sisyphus has to begin all over again. The story of
Sisyphus is interpreted philosophically in Albert Camus’s The Myth of Sisyphus, which
contemplates how to live in a world that is ultimately devoid of meaning and where no
action can have any lasting effects.
Recursion
You are back in your office and have several phone calls to make. Calling someone is
simply a matter of a few clicks on your smartphone, or you might ask an electronic
assistant to initiate the call for you. But without a computer to help, you would have
to find the number in a phone book. How do you do that? You probably won’t start
on the first page and search through all the names one by one, that is, you won’t treat
the phone book as a list. Instead you will probably open the phone book somewhere
in the middle and compare the name you are looking for with the one you see. You
may get lucky and immediately find the name on the current page. If so, you are done.
Otherwise, you will continue your search either before or after the current position,
again by opening a page somewhere in the middle of the identified unexplored page
range. In other words, you will employ binary search to find the phone number. The
same algorithm is used to find a word in a printed dictionary or a book among the
shelves in a library.
This algorithm, discussed in some detail in chapter 5, is recursive, since it occurs as
part of itself. This becomes clear when you try to explain the algorithm to somebody
else. After describing the steps of the first round (opening the book, comparing the
name, deciding whether to search forward or backward), you need to explain that the
same steps have to be repeated again. You could say, for example, “Then repeat these
actions.” It is the compound noun “these actions” that refers to what needs to be re-
peated. If we decide to give a name to the whole algorithm, say BinarySearch, then we
can actually use this name in the instruction: “Then repeat BinarySearch” to express the
repetition in the algorithm. Employing the name BinarySearch in this way results in a
definition of the algorithm that uses that name in its own definition, and this is what
descriptive recursion is: it uses a name or symbol or some other mechanism to express
self-reference, that is, to refer to itself in its definition.
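A recursive Python sketch of this algorithm, with the phone book represented as a sorted list of names, makes the self-reference explicit: the name binary_search appears in its own definition:

    def binary_search(book, name):
        if not book:                        # empty range: name is absent
            return None
        middle = len(book) // 2
        if book[middle] == name:            # found on the current "page"
            return middle
        if name < book[middle]:             # continue before the middle
            return binary_search(book[:middle], name)
        rest = binary_search(book[middle + 1:], name)   # or after it
        return None if rest is None else middle + 1 + rest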
When executed, the algorithm leads to the repetition of a few steps (for example,
opening a book on a particular page or comparing names) potentially many times.
In this respect, recursion is similar to a loop, which also causes the repetition of the steps in its body. Suppose, for example, you are given a task whose only instruction is to execute this very task again before reading on.
If you are reading this sentence now, you didn’t follow the instructions of the task; oth-
erwise you wouldn’t have come this far. Just as you aborted the fruitless search for your
colleague’s office, you have made a decision outside of the algorithm (the preceding
sentence in this case) to stop executing the algorithm. Just as loops are prone to nonter-
mination, so are recursive descriptions, and detecting nontermination is algorithmically
impossible for recursion, as it is for loops.
Chapter 12 explains different forms of recursion using time travel as a metaphor.
By taking a close look at paradoxes—in time travel and in recursion—we can better
understand what the meaning of a recursive definition is.
12 A Stitch in Time
Computes Fine
The word recursion has two different meanings, which can be a source of confusion
about the concept. Since recursion plays an important role in the description of com-
putation, it should be understood well. While recursion in algorithms can be replaced
by loops, recursion is actually more fundamental than loops because, in addition to
defining computations, it is also used for defining data. The definitions of lists, trees,
and grammars all require recursion; there is no alternative version that is based on loops.
This means if one had to decide between the two concepts and pick only one, it would
have to be recursion, because many data structures depend on it.
Deriving from the Latin verb recurrere, which roughly means “run back,” the word
recursion is used to indicate some form of self-similarity or self-reference. These two
different meanings lead to different notions of recursion.
Self-similarity can be found in pictures that contain themselves in smaller form, such
as the picture of a room that contains a TV that displays the same view of the room,
including a smaller version of the TV that shows the room, and so on. In contrast, self-
reference occurs when a definition of a concept contains a reference to itself, usually
through the use of a name or symbol. As an example consider the definition of descen-
dant (which was used as a basis for an algorithm for computing descendants in chapter
4). Your descendants are all your children and the descendants of your children. Here
the definition of what your descendants are includes a reference to the word descendant.
In this chapter I present different forms of recursion and specifically explain the re-
lationship between the self-similarity and self-reference forms of recursion. I use the
movie trilogy Back to the Future and the idea of time travel to illustrate several aspects
concerning recursion. Time travel can be viewed as a mechanism that works like re-
cursion in describing sequences of events. Starting from the observation that recursive
definitions can be understood as time travel instructions, I explain the problem of time
paradoxes and how it is related to the question of making sense of recursive definitions
through the concept of fixed points.
This chapter also points out several features of recursion. For example, the TV room
recursion is direct in the sense that the picture of the room is immediately contained as
part of itself. Moreover, the recursion is unbounded, that is, the containment goes on
forever, so that if we could repeatedly zoom in and expand the TV screen to the size of
the picture, we would never reach an end of the containment. This chapter examines
examples of indirect and bounded recursions and explores their impact on the concept
of recursion.
Finally, I explain the close relationship between loops and recursion by demon-
strating how algorithms involving loops, such as Hansel and Gretel’s pebble-tracing
algorithm, can also be described using recursion. I show that loops and recursion are
equivalent in the sense that any algorithm that uses a loop can always be transformed
into an algorithm that computes the same result using recursion, and vice versa.
To restore the reality of 1985 to what it was before they left for 2015, Marty and Doc Brown travel
back to 1955 to take the sports almanac away from Biff.
Doc Brown: Obviously the time continuum has been disrupted, creating a new
temporal event sequence resulting in this alternate reality.
Marty: English, Doc!
All this jumping back and forth in time sounds quite confusing, and Doc Brown has to explain some of the consequences of a planned trip with the time machine to Marty using a drawing on a blackboard, similar to the following timeline relating events in 1955 to the resulting reality of 1985:

Biff doesn't have almanac (1955) → Peaceful Hill Valley (1985)
Biff gives himself the almanac; Biff has almanac (1955) → Violent Hill Valley (1985)
Marty destroys almanac; Biff doesn't have almanac (1955) → Peaceful Hill Valley (1985)

In the third part of the trilogy, Marty learns that Doc Brown, stranded in 1885, will be shot by the outlaw Buford "Mad Dog" Tannen shortly after he had arrived there. He therefore follows
Doc back to 1885, using the time machine that Doc had hidden in an old gold mine
in 1885 for Marty to recover in 1955. In 1885 he helps Doc avoid being shot by Buford
Tannen and ultimately manages to return to 1985.
Recursive algorithms do something very similar and can be understood as inserting
instructions into a stream of instructions. Each step of an algorithm is either some basic
instruction, or it can be an instruction to execute another algorithm, which effectively
causes the instructions of that algorithm to be inserted into the current sequence of
instructions. In the case of recursion, this means that the instructions of the current
algorithm itself are inserted where the algorithm is called.
Since the execution of each algorithmic step produces some intermediate value or
has some other effect, a recursive execution of an algorithm makes its value or effect
available at the place where it is called. In other words, a recursive call of an algorithm
can be viewed as an instruction to travel back in time to start the computation that
makes the required result available now.
As a first example, let’s describe Marty’s actions by a recursive algorithm. Specifi-
cally, we can define an algorithm ToDo through several equations that tell Marty what to do at what time:2

ToDo(1985) = ToDo(1955); go camping with Jennifer
ToDo(1955) = take the sports almanac from Biff; ToDo(1885)
ToDo(1885) = save Doc Brown; return to 1985
We could extend this algorithm to include the time travel to 2015, but the three shown
cases suffice to illustrate a number of points about recursion. First, the equation for
ToDo(1985) reveals that the actions to be taken in 1985 require actions to be taken in 1955,
since in order to be able to go camping with Jennifer the world needs to be in a different
state. This requirement is expressed using recursion in the algorithm ToDo. One of
the steps in the algorithm ToDo is an execution of ToDo itself. Second, the recursive
execution ToDo(1955) uses a different argument than the equation for ToDo(1985), of
which it is a part. This means that the recursion does not lead to an exact replication
(unlike the picture of the room containing a TV). This is significant for the termination
behavior of computations.
Let’s consider how the computation unfolds when the algorithm is executed with
the argument 1985. The first step of the computation ToDo(1985) is the execution of
ToDo(1955), which means that before Marty can go camping, he has to travel back in
time to 1955 to retrieve the sports almanac from Biff. But before he can return to 1985,
he has to take the steps described by ToDo(1885), that is, he has to travel further back in
time, to 1885, to save Doc Brown. After that he returns to 1985, when he finally can go
on the long-planned camping trip with his girlfriend.
When we examine the third equation carefully, we observe something peculiar. In-
stead of returning to 1955, the time Marty came from when the computation ToDo(1885)
was started, the algorithm directly returns to 1985. This is what actually happens in the
third Back to the Future movie. This makes a lot of sense, because jumping back to 1955,
only to then directly jump back to 1985 would not be very useful. (Most people prefer
direct flights over layovers.)
However, this behavior is not how recursion typically works. When a recursive
computation is completed, it automatically returns to the point where it left off, and
the computation then continues right after it. In this example, it would have meant
the jump to 1885 returning to 1955. The reason for this behavior is that the recursive
computation generally doesn’t know how the computation is being continued, and thus
the safe thing to do is to return to the point of departure in order to not miss anything
important, even though in this particular case, since the next step is to return to 1985,
the direct jump is fine. Whether there are two consecutive jumps or just a single one is
a matter of efficiency. In Back to the Future this really matters because the flux capacitor
that makes time travel possible in this story requires a lot of energy for each time jump.
Doc Brown laments in 1955 about his car design from 1985:
How could I have been so careless? 1.21 gigawatts! Tom [Thomas Edison], how
am I gonna generate that kind of power? It can’t be done, it can’t!
No Matter When
To use Doc Brown’s time machine one needs to provide the exact target date and time of
the time travel. However, since the purpose of time travel is to change the causal chain
of events, the exact date/time doesn’t really matter as long as one arrives before the
event that is to be changed. Assuming that we have a table containing the dates/times
for all the events we are interested in changing, we could express the ToDo algorithm
differently, employing intended causal changes instead of explicit time jumps. In fact,
the algorithmic style in ToDo of using direct jumps is quite old. It is how low-level
languages for programming microprocessors operate, namely, by labeling pieces of code
and then using jump instructions to move between pieces of code. All the control
structures discussed in chapter 10 can be realized using such jumps. However, programs
that employ jumps are difficult to understand and reason about, especially if the jumps
are used in a haphazard way, in which case the resulting code is often called spaghetti
code. Jumps have therefore been mostly retired as a way of expressing algorithms, even
though they are still used in the low-level representation of code that runs on computer
hardware. Instead, algorithms employ conditionals, loops, and recursion.
Since explicit jumps are a deprecated control structure, how can we express the
ToDo algorithm without them? In this case, instead of using specific years for label-
ing sequences of actions, we can employ names for goals the algorithm is supposed to
achieve. When an algorithm is called with a particular goal, we can find the equation
for that goal among the equations. Moreover, once a recursive execution is finished, we
automatically return to the point where it was started. While the exact years are very
important for the movie to establish different cultural contexts, they do not matter for
establishing the correct sequence of steps, which only depends on a relative ordering
that respects the causal dependencies of the actions. We can therefore replace the algo-
rithm ToDo with a new algorithm Goal, defined as follows:

Goal(live now) = Goal(restore world); go camping with Jennifer
Goal(restore world) = retrieve sports almanac from Biff; Goal(save Doc)
Goal(save Doc) = help Doc avoid Buford Tannen
The computation for this algorithm unfolds in the same way as for the algorithm ToDo,
except that the times are not made explicit and that the return to 1985 happens in two
steps.
Just in Time
The computation that results from executing the recursive ToDo or Goal algorithm has
an effect on the state of the world and is not directly visible by tracing the steps of the
algorithms as they unfold. To illustrate the connection between recursive execution
and computation more clearly, let us look at another example and consider a simple
algorithm for counting the elements in a list. As a concrete analogy, think of counting
the cards in a deck.
This algorithm has to distinguish two cases. First, if the deck is empty, it contains
0 cards. Second, if the deck is not empty, the number of cards can be determined by
adding 1 to the number of cards in the deck without the card on top. If we represent the
deck of cards as a list data structure, the algorithm has to correspondingly distinguish
between an empty and a nonempty list. In the latter case, it adds 1 to the result of
applying the algorithm to the list’s tail (which is the list without its first element).
Since any recursive call to a nonempty list will itself lead to the addition of 1, the
algorithm will add as many 1s as there are elements in the list. The algorithm Count
can be described by the following two equations that deal with the cases empty list and
nonempty list:
Count( ) = 0
Count(x→rest) = Count(rest) + 1
The two cases are distinguished by displaying different forms of arguments for Count
on the left sides of the equations. In the first equation a blank space indicates that this
equation applies to empty lists that don’t contain any elements. In the second equation
the pattern x→rest represents a nonempty list, where x stands for the first element and
rest stands for the tail of the list. The definition in this case counts the elements in the
list’s tail by Count(rest) and then adds 1 to the result. This way of selecting between
different cases for an algorithm is called pattern matching. Pattern matching can often
replace the use of conditionals and leads to a clear separation of the different cases to be
considered by the algorithm. Pattern matching also provides direct access to parts of the
data structure that the algorithm is working on, which sometimes can make definitions
shorter. In this case x refers to the first element of the list, but it is not used in the
definition on the right side of the equation. But since rest names the tail of the list, it
can be employed as the argument for the recursive call to Count. Another nice benefit
of pattern matching is that it clearly separates the recursive from the nonrecursive part
in the definition.
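Readers who want to execute Count can translate the two equations almost verbatim into a programming language with pattern matching. Here is a sketch in Haskell; the lowercase name count and the bracket notation for lists are Haskell conventions, not part of the notation used here:

-- counting the elements of a list:
-- the empty list contains 0 elements; otherwise,
-- the count is 1 more than the count of the tail
count :: [a] -> Int
count []       = 0
count (_:rest) = count rest + 1

The underscore in the second equation plays the role of x: it matches the first element, which is not needed for counting.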
The recursive equation for Count can be understood as the following hypothetical
statement: If we knew the number of elements in the list’s tail, the total number of
elements could be obtained by simply adding 1 to that number. Imagine somebody had
already counted the number of cards in the deck without the top card and had left a
sticky note with the number on it. In that case, the total number of cards could be
determined by simply adding 1 to the number on the sticky note.
However, since this information is not available, we have to compute it—
recursively—by applying Count to the tail of the list, rest. And here is the connection to
time travel. Consider the times when the different computation steps occur. If we want
to add 1 now, the computation of Count(rest) must already be completed and therefore
must have been started sometime in the past. Similarly, if we want to compute the total
number of cards immediately, we could have asked somebody else to count the number
of cards in the deck without the top card a few minutes ago to have the result avail-
able now. Thus we can regard a recursive call of an algorithm as a trip into the past to
create computational steps that will be completed in the present so that the remaining
operations can be carried out immediately.
To see an example computation of a recursive algorithm, let’s count the different
things Marty takes with him to 1885, which are cowboy boots (B), a pair of walkie
talkies (W), and the hoverboard (H). When we apply Count to the list B→W→H, we
have to use the second equation, since the list is not empty. To perform this application,
we have to match the pattern x→rest against the list, which causes x to refer to B, and
rest to refer to W→H. The defining equation for Count then instructs to add 1 to the
recursive call of Count to rest:
Count(B→W→H) = Count(W→H) + 1
In order to carry out this addition, we need the result of Count(W→H), which we know
to be 2, but the corresponding computation with Count must have started earlier if we
want to be able to use the result now.
In order to get a more precise understanding of the timing, we assume that a basic
computation step, such as adding two numbers, takes one time unit. We can then talk
about the duration of a computation in terms of such time units. The start and com-
pletion of a computation relative to the present can be expressed as a distance of time
units. If we designate the present as time 0, a basic step taken now will be completed at
time +1. Similarly, a computation that takes two steps and completes now must have
been started at time −2.
It is not obvious how long a recursive computation will take because it very much
depends on how often a recursive step occurs, and this depends in general on the input.
In the example Count, the number of recursions, and thus the runtime, depends on
the length of the list. As with algorithms that use loops, the runtime for recursive
algorithms can be described only as a function of the size of the input. This observation
raises a question for the travel-into-the-past view of recursion: How far back in time
does the recursive call of Count have to travel to complete its work just in time and
deliver its result for the addition? It seems that we have to travel far enough back into
the past to have enough time for all the required expansions and additions, but since we
don’t know the length of the list, we don’t know how far back we should go.
Fortunately, we don’t have to know the size of the input to employ time travel effec-
tively. The key observation is that it is sufficient to start a recursive computation only
one time unit into the past, because no matter how long a recursive computation takes,
any additional time that would be required to execute further recursive computations
is gained by sending the corresponding recursive calls even further into the past. How
this works is illustrated in figure 12.1.
As shown, the execution of Count(B→W→H) generates the computation of
Count(W→H) + 1. Once the recursive computation is completed, Count takes exactly
one step to add 1 to the result of the recursive call, and we have the final result 3 available
at time +1. To have the result of a recursive call available at some specific time, it is suf-
ficient to start the computation one time unit earlier. For example, to obtain the value
[Figure: a timeline running from −3 (Past) through 0 (Present) to +1 (Future), showing Count( ) = 0 computed at time −3, Count(H) = 0 + 1 at −2, Count(W→H) = 1 + 1 at −1, and Count(B→W→H) = 2 + 1 at 0, whose result 3 becomes available at +1.]
Figure 12.1 Recursively counting the number of elements in a list by traveling into the past.
The execution of the algorithm Count on a nonempty list triggers a computation that adds 1 to
the result of the computation of Count for the list’s tail. Executing the computation of the tail
one step in the past makes its result available in the present, and the result of the addition will
be available one step in the future. Executing Count on a nonempty list in the past leads to a
further execution of Count even further back in the past.
of Count(W→H) in the present, that is, at time 0, we need to send the computation
only one time unit into the past. As shown in the figure, this computation results in the
expression 1 + 1 at time −1, which then evaluates in one step to 2 at time 0, exactly when
it's needed. And how is it possible for Count(W→H) to produce 1 + 1 instantaneously?
This is achieved, again, by sending the involved recursive call Count(H) one step into
the past, that is, to time −2, where it produces the expression 0 + 1, which also evaluates
in one step and makes its result 1 available one time unit later, at time −1. The 0 in the
expression is the result of the call Count( ) that was sent back to time −3. Altogether
we can see that by repeatedly starting recursive computations in the past, we can finish
the original computation only one time unit into the future. This is true regardless of how
long the list is; a longer list would simply generate more time travel further back in
time.
Note that the time travel analogy might make recursion look more complicated than
it really is. When a recursive algorithm is executed by a computer, no timing tricks and
fancy scheduling are required. This was evident with binary search in chapter 4 and the
quicksort and mergesort algorithms in chapter 6.
The time travel analogy illustrates two forms of recursion and how they are related.
On the one hand, algorithms such as Goal or Count employ recursion in their defini-
tions through self-reference. I call this form of recursion descriptive, since the recursion
occurs in the description of the algorithm. On the other hand, when a recursive algo-
rithm is executed, the resulting sequence of similar events or computations comprises
instantiations of the recursive description with different values used for the parameters.
I call this form of recursion unfolded. As figure 12.1 illustrates, the repeated expansion of
recursive algorithm applications transforms a descriptive recursion into an unfolded re-
cursion. In other words, executing a descriptive recursion (that is, a recursive algorithm)
yields a corresponding unfolded recursive computation, a view that can be summarized
in the following equation:
Execution(Descriptive Recursion) = Unfolded Recursion
The recursive picture of the room with a TV showing this room is an example of an
unfolded recursion. Given the above relationship, is there a descriptive recursion of it
that, when executed, produces it? Yes. One could consider the instructions to take a still
image with a video camera of the room and feed the picture into the TV that is located in
the room. Carrying out the instructions would produce the unfolded recursive picture.
In Back to the Future the descriptive recursion is captured in the algorithms ToDo and
Goal. In particular, the goals and plans for changing the past are a form of descriptive
recursion. And when the plans are executed, the story unfolds into events that are often
quite similar to one another, such as the déjà vu cafe/saloon and skateboard/hoverboard
scenes.
It is not only fictional characters in time-traveling movies who formulate goals and
plans that contemplate changes in the past. In fact, we all sometimes engage in descrip-
tive recursion by asking, What would have happened if I had done that differently?
However, unlike characters in a movie, we cannot execute such plans.
Time travel into the past invites paradoxes. By changing the past, a time traveler could,
for example, prevent his own existence (the famous grandfather paradox) and destroy the
entire universe! Granted, that's a worst-case scenario. The destruction might in fact be
very localized, limited to merely our own galaxy.
One response to the problem of paradoxes is to assume that it’s actually not possible to
take actions that will cause a paradox. For example, even if you could travel into the
past, it would be impossible for you to kill your grandfather. Concretely, suppose you
try to kill your grandfather with a gun. Then maybe you would fail to get hold of a
gun, or if you tried to use the gun, it would jam, or your grandfather would dodge the
bullet, or even if it hit him, he would only be injured and recover. Maybe the nature
of the space-time continuum is such that it permits only actions in the past that are
consistent with what is certain to happen in the future.
Does the equivalent of the grandfather paradox exist in the realm of computation?
Yes. For example, any nonterminating execution of a recursive algorithm can be con-
sidered such a paradox. In the following I explain this view in more detail, using the
concept of a fixed point.
To produce a paradox, it seems that we need to identify a recursive computation that
triggers a trip into the past and changes it so that the triggering computation disappears
or becomes impossible. Stopping short of actually going back in time, the closest we
can probably get is an algorithm that deletes itself or destroys the computer on which
it is running. In such a situation, the execution of the algorithm would simply stop,
so there is no real paradox to be resolved in that case. But this example only looks at
unfolded recursion. There are many cases of descriptive recursion that are paradoxes.
Recall the recursive definition of Count, which was given by an equation that said that
the number of elements in a list is obtained by adding 1 to the number of elements in
the list’s tail. There is no paradox here, but suppose we changed the definition slightly
by applying Count recursively to the whole list and not the list’s tail:
Count(list) = Count(list) + 1
This seems to be defining the number of elements as one more than the number of
elements, a clear contradiction. A similar example is the following equation, which
tries to define a number n that is 1 larger than itself:
n = n + 1
Again, this is a contradiction, or paradox, and the equation has no solution. In terms of
the time-traveling analogy, the attempt to go back in time to compute a value for n or
Count(list) will not terminate and thus can never reach the point where we could add
1. In other words, there is no way that the desired actions in the past can be carried out
to be consistent with the action in the present, which amounts to the paradox that the
number of elements in a list should be two different values at the same time.
In reality, apparent paradoxes are often resolved by the constraints of physics. For
example, the seemingly infinite recursion in the picture of the room with the TV stops
with the resolution of the camera. Once a deeply nested TV picture gets as small as one
pixel, the recursion stops and will not present a picture of the room as part of that pixel.
Similarly, in the case of audio feedback when the output of an amplifier is input (fed
back) to a microphone connected to it, the amplification does not proceed infinitely.
The paradox in this case is resolved by the physical limitations of the microphone and
amplifier, which are both limited with respect to the amplitude of signals they can
handle.
Chapter 9 discussed the fact that languages have semantics and that the semantics of
an algorithm expressed in some programming language is a computation that is carried
out when the algorithm is executed. From this perspective, the meaning of algorithms
that are contradictory and lead to paradoxes is undefined, that is, there is no computa-
tion that would do what the algorithm demands. How can we tell whether a recursive
algorithm is contradictory and entails a paradox?
The example of a recursive number definition can help illuminate the problem.
Consider the following equation, which says the number being defined is equal to its
own square:
n = n × n
This is actually not a contradiction. There are two natural numbers for which the equa-
tion is true, namely, 1 and 0. And this is the key to understanding recursive definitions.
The defined number is a solution to the equation, that is, it is a number that when
substituted for the variable n will yield a true statement and not a contradiction. In the
case of this number equation, we obtain, for example, 1 = 1 × 1, which is true, so 1 is a
solution for that equation. In contrast, the equation n = n + 1 does not have a solution.
Similarly, the equation Count(list) = Count(list) + 1 does not have a solution either, but
the original equation does. In that case, it is a computation that computes the number
of elements contained in a list.
An equation that contains the same variable on both sides can be viewed as being
defined by a transformation. For example, the equation n = n + 1 says that n is
defined by adding 1 to itself, or n = n × n says that n is defined by multiplying it by
itself. Any number n that is unaffected by a transformation is called a fixed point of that
transformation. As the name suggests, the value does not change and remains fixed.
Examples of fixed points that make the term point more fitting arise
for geometric transformations. Consider, for example, the rotation of
a picture around its center. All points change their position except for
the center, which stays at its place. The center is a fixed point of the
rotation transformation. In fact, the center is the only fixed point of the rotation. Or
consider reflecting a picture along one of its diagonals. In this case all of the points on
the diagonal are fixed points of the reflection transformation. Finally, shifting a picture
to the left does not have any fixed point, since all points are affected. For the number
examples, the “adding 1” transformation has no fixed points, whereas the “multiply by
itself” transformation has two fixed points, 1 and 0.
What is the transformation that corresponds to the equation defining Count? First,
we observe that the changed definition Count(list) = Count(list) + 1 is very similar to
the definition n = n+1 except that it has a parameter. This definition corresponds to the
transformation that says Count applied to list is defined by adding 1 to this application.
This transformation, like the one for the equation for n, does not have a fixed point. In
contrast, the transformation for the original definition Count(x→rest) = Count(rest) + 1
says that Count applied to list is defined by removing the first element of list in that
application and then adding 1. This transformation (together with the equation defining
the case for the empty list) does have a fixed point, which turns out to be the function
that counts the elements in a list.
The meaning of a recursive equation is the fixed point of its underlying
transformation—that is, if such a fixed point exists.3 When the recursive equation de-
scribes an algorithm, the fixed point describes a computation that is stable under a
transformation that makes it applicable to a large number of cases. The transformation
typically adapts the argument of the algorithm, and it might also modify the result of
the recursive call. In the case of Count, the argument list has its first element removed,
and the result is increased by 1. In the case of Goal, the recursive equations execute Goal
with different goals and add other activities that are pertinent to each case.
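The fixed-point view can be made concrete in a language with first-class functions. In the following Haskell sketch, the transformation underlying Count is written as a function step, and the counting function is obtained with the standard fixed-point operator fix from Data.Function; the names step and count are choices of this sketch:

import Data.Function (fix)

-- the transformation underlying Count: given an approximation rec
-- of the counting function, produce an improved version of it
step :: ([a] -> Int) -> ([a] -> Int)
step _   []       = 0
step rec (_:rest) = rec rest + 1

-- the counting function is a fixed point of that transformation
count :: [a] -> Int
count = fix step

Since count = step count, the function count is indeed unaffected by the transformation step.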
Viewed from the time-traveling perspective, the fixed point of a recursive algo-
rithm describes a computation whose effects in the past are consistent with those in
the present. This is why fixed points are relevant for recursion. Recursive algorithms,
like time travelers, must behave properly and avoid paradoxes to be successful. If a re-
cursive algorithm has a fixed point, it denotes a meaningful computation. Otherwise,
it amounts to a paradox. However, unlike time travel paradoxes, it doesn’t destroy the
universe; it simply doesn’t compute what you want. Understanding the meaning of a
recursive algorithm as a fixed point is not easy. Chapter 13 demonstrates another way
of making sense of recursive algorithm descriptions.
Recursion can also express repetition. The predicament of Groundhog Day, in which the protagonist must relive the same day until he has become a good person, can be described by the following recursive algorithm:
GroundhogDay(true) = do nothing
GroundhogDay(false) = experience the day; GroundhogDay(good person?)
Compared with the loop repeat experience the day until good person, this seems to be
more complicated. But it would be incorrect to conclude that loops are always easier
to program with than recursion. Recursion is particularly well suited for problems that
can be decomposed into subproblems (see divide-and-conquer algorithms in chapter 6).
Recursive algorithms for those problems are often simpler and clearer than their loop-
based counterparts. It is an instructive exercise to try to implement an algorithm such
as quicksort without recursion to appreciate this point.
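To see why, here is a sketch of that exercise in Haskell: quicksort written with a single loop that manages the pending subproblems on an explicit stack. The names Task, qsortLoop, and go are inventions of this sketch; partition is Haskell's standard function for splitting a list:

import Data.List (partition)

-- a pending task: either a sublist still to be sorted, or an
-- element whose final position is already known
data Task a = Sort [a] | Emit a

-- quicksort without recursive calls on sublists: the single
-- tail-recursive loop go plays the role of a while loop
qsortLoop :: Ord a => [a] -> [a]
qsortLoop xs = go [Sort xs] []
  where
    go []                      out = reverse out
    go (Emit x : stack)        out = go stack (x : out)
    go (Sort [] : stack)       out = go stack out
    go (Sort (x:rest) : stack) out =
      let (smaller, larger) = partition (< x) rest
      in  go (Sort smaller : Emit x : Sort larger : stack) out

The explicit stack does by hand exactly the bookkeeping that recursion provides for free, which is why the recursive definition is so much shorter and clearer.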
We can ask what the requirements for descriptive recursion are to produce bounded
recursion. One condition is that the description of the recursion must contain some
part that does not itself invoke recursion, such as the definitions for Goal(save Doc) or
Count( ). Such equations are called base cases. While a base case is always required to
end the recursion, it is not a guarantee for termination because the recursive case(s)
could be such that the base case is never reached. Recall the definition Count(list) =
Count(list) + 1, which will never terminate when applied to a nonempty list, even if it
has a base case for the empty list.
Is unbounded recursion useful at all? It would seem that recursive computations
that do not terminate could not produce any result and would therefore be meaning-
less. This may be so if such computations are considered in isolation, but as components
of other computations unbounded recursions can be quite useful. Suppose a computa-
tion produces an infinite stream of random numbers. Such a stream can be useful for
implementing simulations. As long as only a finite portion of the infinite stream is
consumed, the computation can behave well and simply ignore the infinite parts. As
another example, consider the following definition of an infinite list of 1s, which says
that the list has 1 as its first element and is followed by a list of 1s:
Ones = 1→Ones
Executing this definition will lead to an infinite sequence of 1s:
1→1→1→1→1→1→ · · ·
Like the picture of the room with the TV, this list contains itself as a part. The equation
shows this, as does the unfolded list. The infinite list of 1s starts with a 1 and is followed
by a list of 1s, which is also infinite.
This view of self-containment also helps to explain self-similarity as a result of re-
cursion. If we write the list computed by Ones on one line, and the list computed by
1→Ones on the line beneath it, we will see two lists that are absolutely identical. Since
both lists are infinite, the one on the second line does not contain an additional element.
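In a lazy language such as Haskell, the definition of Ones is not just a thought experiment; it can be written down verbatim and executed, as long as only a finite part of the infinite list is ever demanded (take is Haskell's standard function for extracting a prefix of a list):

-- the equation Ones = 1→Ones, written in Haskell
ones :: [Integer]
ones = 1 : ones

-- demanding a finite prefix terminates:
-- take 5 ones yields [1,1,1,1,1]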
Unbounded recursion can also be found in the arts, as in never-ending songs (“99
bottles of beer on the wall” and the like) or in the drawings of M. C. Escher, such as
“Drawing Hands” and “Print Gallery.”6 The work “Drawing Hands” shows one hand
drawing a second hand, which in turn draws the first hand. There is no base case,
and the recursion does not end. Similarly, “Print Gallery” displays a nonterminating
recursion. It is the picture of a town with a gallery in which a man looks at a picture of
that same town that includes him in the gallery looking at the picture.
The two Escher drawings illustrate the difference between direct and indirect re-
cursion. In “Drawing Hands” the recursion is indirect, since instead of drawing itself,
each hand draws a different hand—the one by which it is drawn. In contrast, the “Print
Gallery” picture contains itself directly, since it immediately shows the town with the
print gallery in which that man looks at the picture. “Drawing Hands” also demon-
strates that indirect recursion does not guarantee termination. This situation is similar
to that for base cases: they are required for termination but are no guarantee for it.
A popular example of indirect recursion is the definition of the algorithms Even and
Odd to determine whether a number is divisible by 2. The definition of Even says that 0
is an even number and any other number is even if its predecessor is odd. In the second
equation the definition of Even refers to the algorithm Odd. The definition of Odd says
that 0 is not an odd number and any other number is odd if its predecessor is even. In
the second equation the definition of Odd refers to the algorithm Even.

Even(0) = true
Even(n) = Odd(n − 1)

Odd(0) = false
Odd(n) = Even(n − 1)
Thus, Even refers recursively to itself indirectly through the reference to Odd (and vice
versa). This can be seen when evaluating a simple example:

Even(2) −−Even2−−→ Odd(1) −−Odd2−−→ Even(0) −−Even1−−→ true
We see that the call Even(2) is reduced to the call Even(0), but only indirectly via
Odd. The definitions of Even and Odd are similar to “Drawing Hands” in that each
one defines the other. An important difference, however, is that the recursion in the
algorithms is bounded (any computation will terminate with one of the base cases),7
whereas the recursion in the painting is unbounded.
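The indirect recursion of Even and Odd can likewise be executed directly. In this Haskell sketch the names isEven and isOdd are chosen to avoid Haskell's predefined even and odd, which are defined differently, by computing remainders:

-- 0 is even; any other (nonnegative) number is even
-- if its predecessor is odd
isEven :: Integer -> Bool
isEven 0 = True
isEven n = isOdd (n - 1)

-- 0 is not odd; any other (nonnegative) number is odd
-- if its predecessor is even
isOdd :: Integer -> Bool
isOdd 0 = False
isOdd n = isEven (n - 1)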
Another example of direct recursion is the following dictionary-style definition of
recursion:8

recursion. See recursion.
You are done with your work for the day and get back home. Before dinner you have
some time to work on your latest sewing project. The quilting pattern you have chosen
details the different fabrics and how much of them you need. Since you started the
project a few weeks ago, you have bought the fabrics, cut and ironed most of the pieces,
and have already begun to sew together some of the quilt blocks.
The quilting pattern with its associated instructions is an algorithm. Since making
a quilt is a lot of work and typically can’t be finished in one session, the execution of
the algorithm must be repeatedly interrupted and continued at a later time. Despite the
care and attention it takes to make a quilt, it is surprisingly easy to interrupt quilting
and pick it up again at a later time, because the state of the quilting process is perfectly
represented at each stage by its current results. If you don’t have the fabrics, the next
thing to do is buy the fabrics; if you have all the fabrics but don’t have the pieces
yet, you have to cut the fabrics next; and so on. The situation is similar for other
crafting projects, such as building a birdhouse or folding an origami figure, but there
are some tasks that require extra effort to represent the state of computation. Suppose,
for example, that you are counting the number of baseball cards in a box of collectibles,
and you are interrupted by a phone call. In order to continue the counting, you have
to make sure to separate the counted items from the yet-to-be-counted ones and to
remember the number of items counted so far.
In addition to supporting the interruption of work, intermediate results can serve
as an explanation of the computation produced by an algorithm, since they keep track
of computational steps and provide a trace of what has been done so far. A trace is
given by a sequence of computation states that starts with a simple item (for example, a
collection of fabrics in quilting, or a plain sheet of paper in origami) and then contains
increasingly more accurate approximations of the final result. Each of the approxima-
tions is obtained from a preceding one by performing changes to it as described by the
algorithm that defines the computation. The sequence of the initial item, all interme-
diate steps, and the final result is the trace of the computation. And just as a trace of
footsteps in the sand explains the movement of a person and how they got from one
place to another, the trace of a computation explains the process of how the initial item
was transformed into the final result.
Building a trace is also an effective approach to understand-
ing a recursive description. Even though most quilting pat-
terns are not recursive, we can find some fascinating recur-
sive designs, for example, those that employ Sierpinski tri-
angles. The quilt illustrates nicely the recursion on which it
is based. An upright triangle is built of three mostly light-
colored upright triangles that enclose one upside-down dark-
colored triangle; and an upside-down triangle consists of three
upside-down triangles enclosing one upright triangle. Each of
the smaller upright triangles is again composed out of three
smaller upright triangles enclosing an upside-down triangle, and so on. Note the sim-
ilarity to the indirect recursion in Escher’s “Drawing Hands” and the Even and Odd
algorithms (see chapter 12).
There are different kinds of traces. In some cases, the algorithm is completely sepa-
rated from the trace. For example, many assembly descriptions contain numbered steps
that describe what to do and then have a separate, correspondingly numbered sequence
of pictures that show what the results of the steps are. But there are also traces that con-
sist of pictures containing instructions directly. Both kinds of traces have their strengths
and shortcomings, especially when it comes to traces for recursive algorithms. One chal-
lenge with executing recursive algorithms is to keep track of all the different executions
and their different parameters. For example, the execution of Count(B→W→H) leads
to three more executions of Count, all with different lists as parameters. The approach
that keeps instructions as part of traces does this very well and without any additional
help. The individual steps of the trace are the only representations of the computation
and contain all relevant information to continue the computation. However, instruc-
tions within a trace can be confusing, and this method can also lead to huge traces, full
of redundancy and containing too many snapshots. In contrast, the approach that keeps
instructions separate from traces has to manage the correspondence between algorithm
and trace but can produce more succinct traces.
The meaning of an algorithm is given by the set of all computations it can produce.1
Traces make computations concrete and therefore contribute to the understanding of
algorithms. Methods for producing traces are thus important tools for elucidating the
relationship between algorithms and their computations.
13 A Matter of Interpretation
The focus of chapter 12 was on explaining what recursion is, the different forms of re-
cursion, and how recursion is related to loops. The executions of the algorithms ToDo
and Count demonstrated that the execution of descriptive recursion yields unfolded re-
cursion, revealing that the connection between self-reference and self-similarity is com-
putation. However, we have not yet looked in detail at how recursive computation
works.
This chapter illustrates how recursive algorithms can be executed. One intriguing
aspect is that the execution of an algorithm leads, through recursion, to many executions
of the same algorithm. The dynamic behavior of recursive algorithms can be illustrated
in two ways.
First, the use of substitution allows the construction of computation traces out of
recursive definitions. The substitution of an argument for a parameter is a fundamental
activity that is invoked whenever we execute an algorithm. For executing a recursive
algorithm, substitution additionally is employed to replace the call of an algorithm by its
definition. In this way, substitution can eliminate descriptive recursion and transform
it into a trace that can serve as an explanation of a recursive algorithm.
Second, the notion of an interpreter provides an alternative approach to explain-
ing recursive algorithms. An interpreter is a specific kind of computer that executes
algorithms using a stack data type (see chapter 4) to keep track of recursive (and nonre-
cursive) calls and multiple copies of arguments that arise as a consequence of recursive
algorithm executions. The operation of an interpreter is more complicated than sub-
226 Once Upon an Algorithm
Rewriting History
An algorithm as a tool for solving problems is of interest only if it can solve a number of
related problems (see chapter 2). If a specific algorithm could solve only one problem,
such as finding the shortest route from your home to work, you could execute the
algorithm once and then remember the route and forget the algorithm. If, on the other
hand, the algorithm is parameterized and can find shortest routes between different
places of interest, it becomes very useful, since it is applicable in many situations.
When an algorithm is executed, the resulting computation works on input values
that are substituted for the parameters. The getting-up algorithm in chapter 2 consists
of the instruction “Wake up at wake-up-time.” To execute that algorithm, a concrete
time value such as 6:30 a.m. must be supplied (for example, by setting the alarm), and
so the instruction becomes “Wake up at 6:30 a.m.,” obtained by substituting the value
6:30 a.m. for the parameter wake-up-time in the algorithm.
The substitution mechanism applies to all algorithms and their parameters: cups of
water for making coffee, pebbles for finding a path, weather conditions for weather re-
ports, and so on. Of course, parameter substitution also applies to recursive algorithms.
For example, quicksort and mergesort require the list that is to be sorted as input; bi-
nary search has two parameters, the item to be found and the data structure (tree or
array) to perform the search in; and the algorithm Count (see chapter 12) takes the list
to be counted as input for its parameter.
In addition, another kind of substitution is at play in the execution of recursive
algorithms, namely, the substitution of the definition of an algorithm for its name, for
instance, in the demonstration of how Count computes the number of items Marty
took with him on his trip to 1885. Let’s take another look at the equation that defines
what the Count algorithm does for nonempty lists:
Count(x→rest) = Count(rest) + 1
First, there is the substitution of the argument list for the parameter. Executing Count
on the list B→W→H means to substitute the list for the parameter of Count. Since the
parameter of Count is represented in the equation as a pattern that consists of two parts,
the process of matching the list against this pattern produces two substitutions, B for x
and W→H for rest. The substitution affects the right-hand side of the equation, which
defines the steps of the algorithm—in this case the execution of the algorithm Count on
rest and the addition of 1 to the resulting number. This leads to the following equation:
Count(B→W→H) = Count(W→H) + 1
This equation can be understood as the definition of the algorithm instantiated for a
particular example case, but it can also be viewed as a substitution of the call of the
algorithm by its definition.
This becomes clearer if we employ the notation for derivations first mentioned in
chapter 8. As you may recall, an arrow was used to indicate how a nonterminal grammar
symbol is expanded by its defining right-hand side. A sequence of such expansions
can be used to derive a string or syntax tree using the grammar rules for a language.
We can view the defining equations of a recursive algorithm in the same way as rules
for deriving a computation. Using the arrow notation we can rewrite the preceding
equation as follows:
Count(B→W→H) −−Count2−−→ Count(W→H) + 1
             −−Count2−−→ Count(H) + 1 + 1
This last step shows that substitutions generally happen within a context that is unaf-
fected by the substitution. In other words, the substitution replaces a part of a bigger
expression and makes a change only locally. This is very much like replacing a light
bulb. The old light bulb is removed, and in its place a new one is inserted, leaving the
lamp and other parts of the environment unchanged. In the example, the substitution
of Count(H) + 1 for Count(W→H) occurs in the context of “ + 1.” We need two more
substitution steps to complete the expansion and remove all recursive occurrences of
Count:
[Figure: Picture = a drawing of a room containing a TV whose screen shows Picture.]
Figure 13.1 Recursive picture definition. A name is given to a picture, which contains the
name and thus a reference to itself. The meaning of such a self-referential definition can be ob-
tained by repeatedly substituting a scaled-down copy of the picture for its name, thus producing
step-by-step an unfolded recursion.
Count(B→W→H) −−Count2−−→ Count(W→H) + 1
             −−Count2−−→ Count(H) + 1 + 1
             −−Count2−−→ Count( ) + 1 + 1 + 1
             −−Count1−−→ 0 + 1 + 1 + 1
Note that the last substitution step employs the first rule for Count, which applies to
the empty list. Now that we have eliminated all recursion and obtained an arithmetic
expression, we can evaluate it and obtain the result.
We can apply the same strategy to the algorithms ToDo and Goal and use substitu-
tion to trace the execution of a recursive time-traveling algorithm, producing a sequence
of actions:
Goal(live now) −−Goal1−−→ Goal(restore world); go camping
               −−Goal2−−→ retrieve sports almanac; Goal(save Doc); go camping
               −−Goal3−−→ retrieve sports almanac; help Doc avoid Buford Tannen; go camping
These examples show how the potentially mystifying self-reference in descriptive re-
cursion can be resolved by the repeated substitution of a name by its definition. The
repeated substitution of definitions works even for the recursive picture that shows a
room containing a TV showing a picture of that room (see figure 13.1).
Here are the first few steps that show how repeated substitution transforms the
descriptive recursion into an unfolded one:
Picture −−Picture−−→ [room with a TV showing Picture] −−Picture−−→ [room with a TV showing a room with a TV showing Picture] −−Picture−−→ · · ·
Of course, this substitution process doesn’t end, since the recursion is unbounded and
has no base case. This situation is similar to the definition of the infinite list of 1s:
Ones = 1→Ones
When executing this definition, substitution will produce an ever-growing list. In each
step, another 1 is added to the list:
Ones −−Ones−−→ 1→Ones −−Ones−−→ 1→1→Ones −−Ones−−→ 1→1→1→Ones · · ·
A Smaller Footprint
Substitution is a simple mechanism for producing a trace of a computation, which is ba-
sically a sequence of snapshots of intermediate results or states. It works for nonrecur-
sive and recursive algorithms alike, but it is particularly useful for recursive algorithms
because it eliminates self-reference and systematically turns a descriptive recursion into
a corresponding unfolded one. When we are only interested in the result of a com-
putation and not in the intermediate steps, substitution is doing more than is needed,
but the value of a substitution trace lies in its ability to provide an explanation of the
computation that has taken place. The Count trace provides such an example.
However, while a substitution trace can be illuminating, it can also be distracting
when it gets too large. Consider again the insertion sort algorithm (see chapter 6). Here
is a recursive definition of the algorithm Isort, which uses two lists, the list of elements
still to be sorted and the list of elements already in sorted order:

Isort( , list) = list
Isort(x→rest, list) = Isort(rest, Insert(x, list))

Insert(w, ) = w
Insert(w, x→rest) = if w ≤ x then w→x→rest else x→Insert(w, rest)
Isort(B→W→H, ) −−Isort2−−→ Isort(W→H, Insert(B, ))
               −−Insert1−−→ Isort(W→H, B)
               −−Isort2−−→ Isort(H, Insert(W, B))
               −−Insert2−−→ Isort(H, if W ≤ B then W→B else B→Insert(W, ))
               −−else−−→ Isort(H, B→Insert(W, ))
               −−Insert1−−→ Isort(H, B→W)
               −−Isort2−−→ Isort( , Insert(H, B→W))
               −−Isort1−−→ Insert(H, B→W)
               −−Insert2−−→ if H ≤ B then H→B→W else B→Insert(H, W)
               −−else−−→ B→Insert(H, W)
               −−Insert2−−→ B→(if H ≤ W then H→W else W→Insert(H, ))
               −−then−−→ B→H→W

Figure 13.2 Tracing the computation of Isort(B→W→H, ) by substitution.
The algorithm Isort has two arguments. It traverses its first list and executes the auxiliary
algorithm Insert for each element of the list. If the list of elements to be sorted is
empty, no more sorting is needed, and the second parameter, list, contains the final
result. Otherwise, the algorithm Insert moves an element w from the unsorted list
to the correct position in the sorted list. If that list is empty, w alone constitutes the
resulting sorted list. Otherwise, Insert compares w with the first element (x) of the list
into which it is to be inserted. If w is smaller than or equal to x, the correct position
has been found, and w is placed at the beginning of the list. Otherwise, Insert keeps x
in place and tries to insert w into the remaining list rest. Of course, this works only if
the list into which it is to be inserted is itself already sorted, which is indeed the case
because that list is built exclusively using the Insert algorithm.
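As with Count, the equations for Isort and Insert carry over almost verbatim to Haskell. In this sketch the name insertS is chosen to avoid clashing with the standard library's insert:

-- move each element of the first list into its correct
-- position in the second, already sorted, list
isort :: Ord a => [a] -> [a] -> [a]
isort []       sorted = sorted
isort (x:rest) sorted = isort rest (insertS x sorted)

-- insert an element into a sorted list, keeping it sorted
insertS :: Ord a => a -> [a] -> [a]
insertS w [] = [w]
insertS w (x:rest)
  | w <= x    = w : x : rest
  | otherwise = x : insertS w rest

-- for example, isort "BWH" "" yields "BHW"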
As figure 13.2 demonstrates, the actual construction of the sorted list takes many
steps, and the effect of the different executions of the Insert algorithm is partly obscured
by the presence of the conditional and the fact that intermediate lists are temporarily
represented twice in the alternatives of the conditional. While the trace produced by
substitution is precise and shows exactly what the algorithm does, it also requires a lot
of attention and focus to slog through all the details and to distinguish the data from
the instructions.
Another aspect of the substitution approach that can be confusing is that in many
cases different substitutions are possible, and while the choice does not generally affect
the result, it can affect the size of the trace and its understandability. For example,
figure 13.2 shows that the first substitution yields Isort(W→H, Insert(B, )), to which
two substitutions are applicable: we can either use the first equation for Insert or the
second equation for Isort.
Chapter 6 showed a data-only trace to illustrate different sorting algorithms: for each
element that was moved, only the unsorted and sorted lists were shown (see illustration
of insertion sort in figure 6.2). If we apply the same visualization to the example here,
we obtain a trace that is much shorter and more concise than the one in figure 13.2:

unsorted list    sorted list
B→W→H
W→H              B
H                B→W
                 B→H→W
The data-only trace looks so much simpler because it does not contain any instructions
from the algorithm. (This reveals how an interpreter typically works: it keeps the
algorithm or program description and the data to be manipulated separate.) Also, the
data-only trace shows only the effects of Isort on the two lists, leaving out the details of
how Insert moves elements within the second list. Moreover, the program is represented
only once and is not changed at all. When an algorithm is interpreted, a data-only trace
presents only the data as it evolves.
Whereas the trace in the substitution approach serves the double duty of tracking
the evolution of the data and the progress of the computation, an interpreter uses a
stack data type for each of these two tasks. In particular, for a recursive algorithm, an
interpreter must keep track for each recursive call where it left off to be able to go back
and continue the computation after the recursive call is finished. Since each recursive
execution of an algorithm has its own argument, the interpreter also has to be able to
maintain multiple versions of parameters.
Both these needs can be fulfilled by storing program addresses and parameter values
on a stack. Let’s see how this works by executing the algorithm ToDo. To facilitate
jumps back from recursive calls, we have to mark positions in the algorithm, for which
we use numbers. Since the parameter of the algorithm ToDo is not used in any of
the definitions, we can ignore it and only store positions on the stack. The algorithm
Figure 13.3 Interpretation of ToDo(1985). If the current instruction is a recursive call, the
address following it is remembered on the stack to allow the continuation of the computation
after it is completed. This happens whenever a return instruction is encountered. After the
jump back, the address is removed from the stack.
is reproduced in a slightly modified way, where numbers mark the positions between
instructions:
ToDo(1985) = 1 ToDo(1955) 2 go camping 3 return
ToDo(1955) = 4 destroy almanac 5 ToDo(1885) 6 return
ToDo(1885) = 7 save Doc 8 return
An interpreter executes an algorithm applied to an argument such as ToDo(1985) by
performing the instructions one by one and by keeping addresses to jump back to on a
stack. Whenever an instruction is a recursive execution of the algorithm, the address fol-
lowing that instruction is pushed onto the top of the stack before the interpreter jumps
to the instruction indicated by the recursive call. In the example, the first instruction is
such a jump that causes the following address 2 to be pushed onto the stack and makes
4 the next instruction.
This is illustrated in the first two lines of figure 13.3, which shows how the current
instruction and the stack evolve during the execution of the algorithm. The second
column displays the stack with its top on the left and bottom on the right. The figure
also shows part of the world as it is changed through the execution of the instructions,
for example, the current year or the possession of the sports almanac. This particular
fact about the world changes after Marty has destroyed the sports almanac in 1955. Note
that the fact that Doc Brown is in danger is not a consequence of the current algorithm
execution; changing it, however, is part of the algorithm's next steps.
After traveling to 1885, which causes another return address, 6 , to be pushed onto the
stack, the action to save Doc Brown changes the fact about his being in danger. The
next instruction in the algorithm is to return to where the algorithm left off when the
recursive jump was made. The target for the return jump is the last address pushed
onto the stack and can thus be found on top of the stack. Therefore, the execution of
the return instruction causes the next instruction to be the instruction at the address 6 ,
which is another return that also pops the return address off the stack. This reveals the
target address for the next return instruction, which is 2 , the point where Marty and
Jennifer can finally go camping.
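This bookkeeping is mechanical enough to be written as a program itself. The following Haskell sketch interprets the labeled version of ToDo; the instruction type, the program table, and the function run are inventions of this sketch and stand for no particular real interpreter:

import qualified Data.Map as Map

-- the three kinds of instructions in the labeled algorithm
data Instr = Call Int     -- recursive call: remember where to resume, then jump
           | Act String   -- a primitive action that changes the world
           | Ret          -- jump back to the address on top of the stack

-- the labeled ToDo algorithm: positions 1-3 define ToDo(1985),
-- positions 4-6 define ToDo(1955), and positions 7-8 define ToDo(1885)
program :: Map.Map Int Instr
program = Map.fromList
  [ (1, Call 4), (2, Act "go camping"),      (3, Ret)
  , (4, Act "destroy almanac"), (5, Call 7), (6, Ret)
  , (7, Act "save Doc"),                     (8, Ret) ]

-- execute from position pc, keeping return addresses on a stack
run :: Int -> [Int] -> IO ()
run pc stack = case Map.lookup pc program of
  Just (Act a)     -> putStrLn a >> run (pc + 1) stack
  Just (Call addr) -> run addr (pc + 1 : stack)  -- push the return address
  Just Ret         -> case stack of
                        (r:rest) -> run r rest   -- pop it and jump back
                        []       -> return ()    -- outermost return: done
  Nothing          -> return ()

main :: IO ()
main = run 1 []  -- prints: destroy almanac, save Doc, go camping

Running it reproduces the sequence of actions and the stack evolution shown in figure 13.3.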
The ToDo example demonstrates how nested calls of an algorithm lead to return
addresses being stored on the stack, but it did not require storing parameter values on
the stack. To illustrate this aspect of an interpreter, we consider again the algorithm
Isort, which does require this. However, we don’t need to store return addresses because
each algorithmic step is given by just one expression, not a sequence of several, as in the
case of ToDo.
To sort the list of Marty’s items, an interpreter starts evaluating the call
Isort(B→W→H, ) with an empty stack. Matching the arguments against the param-
eter patterns of Isort leads to bindings, which are associations of parameter names used
in the patterns and values. These bindings are pushed onto the stack, resulting in the
stack shown in figure 13.4a. It may seem strange that while the Isort algorithm takes two
inputs, the application to a nonempty first input produces bindings for three parameters
on the stack. This is because the first argument is matched against a pattern, x→rest,
which contains two parameters that have the purpose of splitting the argument list into
two parts, the first element and the tail of the list. The first equation only produces
a binding for one parameter, because the first input is known to be the empty list and
need not be referred to by a name.
After the pattern matching, which produces parameter bindings on the stack, the
algorithm instructs to compute Isort(rest, Insert(x, list)). As in the substitution method,
the interpreter now has two possible routes to proceed: either continue with the outer
Isort call, or first deal with the nested Insert call. Most programming languages evaluate
arguments before executing an algorithm.1 Following this strategy, the interpreter eval-
uates Insert(x, list) before further evaluating the Isort call. The values for x and list can
be retrieved from the stack and lead to the evaluation of Insert(B, ). This evaluation can
be carried out with a separate stack and produces the resulting list B, which means that
the evaluation of the Isort call becomes Isort(rest, B).
The interpreter evaluates Isort(rest, B) with the stack shown in figure 13.4a. First,
the value of rest is retrieved from the stack, which means that the interpreter actually
evaluates the call Isort(W→H, B). Then pattern matching produces new bindings for
the parameters of Isort to be pushed onto the stack, as shown in figure 13.4b.
(a) x=B, rest=W→H, list=
(b) x=W, rest=H, list=B | x=B, rest=W→H, list=
(c) x=H, rest= , list=B→W | x=W, rest=H, list=B | x=B, rest=W→H, list=
(d) list=B→H→W | x=H, rest= , list=B→W | x=W, rest=H, list=B | x=B, rest=W→H, list=

Figure 13.4 The stack of parameter bindings during the interpretation of Isort(B→W→H, ), with the top of the stack on the left: (a) after the first call of Isort, (b) after the second, (c) after the third, (d) after the final call.
It is striking that the parameters x, rest, and list appear twice on the stack. This is
because the recursive call of Isort operates with different arguments for its parameters
and thus requires separate storage for them. Once the nested call of Isort has been pro-
cessed by the interpreter, the bindings for the parameters are removed from (“popped
off”) the stack, and the computation of the previous call of Isort can continue, having
access to its own arguments.2
However, before it can be completed, the call of Isort leads again to an execution of
the second equation for Isort, which prompts the evaluation of Isort(rest, Insert(x, list)).
The bindings are again found on the stack. But since there are multiple bind-
ings available for each parameter name, the question is which of these should be
used, and how to find them. It is here where the stack data type comes into play
again. Since the parameter values for the latest Isort call have been pushed onto the
stack most recently, they are found on top of the stack. The resulting call is thus
Isort(H, Insert(W, B)). Since Insert(W, B) results in B→W, the next call of Isort to be
evaluated is Isort(H, B→W), which triggers again the second equation for Isort and leads
to another call Isort(rest, Insert(x, list)) with the stack shown in figure 13.4c.
Finding the arguments for the parameters on top of the stack, this call leads to
evaluating Isort( , Insert(H, B→W)), which in turn results in Isort( , B→H→W) after
the evaluation of Insert(H, B→W). Since Isort's first argument is now the empty list,
the first equation for Isort gives the interpreter list to evaluate. The evaluation
happens in the context of the stack shown in figure 13.4d.
At this point, the value for list is found on the stack and returned as the final result.
Finally, the end of each Isort call is accompanied by removing its parameter bindings
from the stack, leaving the stack empty. We can observe how a stack grows with each
(recursive) call and shrinks after a call is completed.
The two-list trace for insertion sort shown earlier can be systematically recon-
structed from the stacks in figure 13.4. In fact, the stack that represents the complete
nesting of all Isort calls (figure 13.4d) is sufficient for that purpose. Specifically, each
group of bindings in the stack produces one step of the trace. Recall that each nonempty
input list to Isort is split when Isort is called to produce the bindings for the two param-
eters x and rest. This means that the input list for the first three Isort calls is given by
the list x→rest, and the output is given by list. For the final call of Isort, when the input
list is empty, the input list is the empty list, which is not represented by a parameter,
and the output list is given by list. Thus the bottom entry of the stack yields the pairs
of lists B→W→H and the empty list, the second entry yields W→H and B, the third
yields H and B→W, and the top entry gives us the empty list (the input list, not bound
to a parameter) and B→H→W.
Substitution and interpretation are two methods to understand the execution of al-
gorithms, in particular, of recursive algorithms. Substitution is simpler, since it only
works with a trace that is rewritten step-by-step, whereas interpretation employs an
auxiliary stack. Substitution mixes code and data, whereas interpretation keeps them
cleanly separated, which simplifies the extraction of simple traces. In the case of un-
bounded recursion, substitution produces something useful, whereas interpretation
does not terminate.
machine must exist in 1955. In fact, three time machines must exist at the same time in
1955 because the first Marty has also come to 1955 with the time machine.
This demonstrates that an inevitable consequence of time travel into the past is
the duplication of the time-traveling objects and people—at least, when multiple travels
happen to that same time in the past. The situation with recursion is very similar.
Since the second equation for Count contains only one recursive occurrence of Count,
its execution creates only one instance at any time in the past, because each recursive
call travels one time unit into the past from the moment it occurs. This form of
recursion, in which the defined algorithm is referred to only once in its definition, is called linear
recursion. Linear recursion in algorithms leads to isolated occurrences of algorithm calls
in the past. Linear recursion can be easily transformed into loops, and it generally does
not lead to opportunities for concurrent execution.
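For example, the linear recursion of Count corresponds to the following accumulator loop, a standard transformation; the names countLoop and go are choices of this sketch:

-- counting with a running total: the tail call of go
-- corresponds to one iteration of a loop
countLoop :: [a] -> Int
countLoop = go 0
  where
    go n []       = n
    go n (_:rest) = go (n + 1) rest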
In contrast, the case when a definition refers to the defined more than once is called
nonlinear recursion. Since all the calls happen at the same time, the corresponding
executions are also started at the same time in the past and thus occur concurrently.
This does not mean that a computer (electronic or human) actually has to execute the
calls in parallel. It only means that they could be executed in parallel. And this is
a stunning aspect of well-designed divide-and-conquer algorithms. Not only do they
divide the problem quickly so that a problem can be solved with few steps, they also
support the parallel execution by many computers.
Quicksort and mergesort are two examples of such algorithms (see chapter 6). The
definition of quicksort is as follows. The first equation says that an empty list is already
sorted, and the second equation says that to sort a nonempty list one should take all
elements from the tail of the list (rest) that are smaller than x, sort them, and put them
in front of x; the result of sorting all larger or equal elements is similarly placed after x:
Qsort( ) =
Qsort(x→rest) = Qsort(Smaller(rest, x))→x→Qsort(Larger(rest, x))
Notice that the second equation reveals the nonlinear recursion of Qsort. Quicksort
performs best when x repeatedly allows the tail to be split into two sublists of roughly
equal size. This may not happen in the worst case, when the list is already (almost)
sorted, but quicksort performs very well on average.
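In Haskell the two equations, including the auxiliary algorithms Smaller and Larger, can be sketched with list comprehensions taking the place of Smaller and Larger:

qsort :: Ord a => [a] -> [a]
qsort []       = []
qsort (x:rest) = qsort smaller ++ [x] ++ qsort larger
  where
    smaller = [y | y <- rest, y < x]   -- Smaller(rest, x)
    larger  = [y | y <- rest, y >= x]  -- Larger(rest, x)

The two recursive calls do not depend on each other, which is exactly the nonlinearity that makes parallel execution possible.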
It’s actually fun to execute quicksort or mergesort with the help of a group of peo-
ple. To execute quicksort, all people line up in a queue, and the first person starts sorting
by applying the second equation and splitting the list of all elements into two sublists
depending on whether they are smaller than the first element. She keeps the first ele-
ment and hands each sublist to a new person from the queue who performs quicksort
on the list handed to him and possibly also recruits new people from the queue for the
sorting of sublists. A person who is handed an empty list is immediately done and can
return that empty list, following the definition of the first equation.4 Once a person has
finished sorting her list, she hands the sorted list back to the person who gave her the
list. And everyone who receives a sorted sublist as a result creates his own sorted list by
placing the list with the smaller elements before x and the list with the larger elements
after x. If we stop the recursion when lists have only one element, then we need as many
people to sort the list as the list contains elements, because each person hangs on to only
one element. This strategy may seem a waste of resources for simply sorting a list, but
with ever-decreasing computing costs and increasing computing power, it illustrates the
power of divide-and-conquer and shows that many hands make light work.
Further Exploration
A recursive algorithm for solving a problem tries to first solve the same problem for
smaller input. The time travel in Back to the Future illustrates how solving a problem
in the past is a prerequisite for proceeding in the present. The same situation can be
found in the movies Twelve Monkeys and Déjà Vu, where time travel is employed to
solve a problem in the present by first solving a related problem in the past. With a
similar premise but a changed perspective, in the Terminator movies robots are sent from the future to the present to change reality now and thus create a different future. The future-to-present time travel is also the basis of the movie Looper.
The problem of time paradoxes is often skirted in time travel stories because reading
an inconsistent story is not very satisfying. Attempts to deal with paradoxes, and thus
with undefined recursions, can be found in Stephen King’s 11/22/63, in which a teacher
travels through a time portal back in time to prevent JFK’s assassination, which causes
reality in the present to break down. A similar story line can also be found in Gregory
Benford’s Timescape.
The idea of making sense of recursive definitions through fixed points is dramati-
cally illustrated in Robert Heinlein’s All You Zombies, which tells the story of a woman
who falls in love with the turned-male version of herself traveling back in time from
the future. She gets pregnant, and her baby girl is abducted and brought back in time,
where she grows up to be the woman the story starts out with. The interesting aspect
is that all the strange events can be reconciled, since the woman is her own mother
and father, a fact that amounts to the fixed point of the story’s causal constraints. This
story has been made into the movie Predestination. In the movie Los cronocrímenes
(Timecrimes), the fixed point is obtained by creating doppelgängers of the protagonist.
When recursive definitions are executed, they unfold into nested structures. This is
clear for Sierpinski triangle quilts, Matryoshka dolls, or nested TV pictures. But it also
happens in stories. Perhaps one of the oldest examples is the collection of stories One
Thousand and One Nights, also known as Arabian Nights. A narrator tells the story of
Queen Scheherazade, who tells stories in which people tell stories about people who
tell stories, and so on. Douglas Hofstadter’s Gödel, Escher, Bach contains a dialogue that
tells a nested story, which, among other things, illustrates the stack model of recursion.
The book contains a lot of material on recursion, including several drawings by M. C.
Escher.
David Mitchell’s book Cloud Atlas consists of several stories that are nested within
one another. Nested stories play a central role in the movie Inception, where a group
of people steal ideas from other people’s minds by infiltrating and manipulating their
dreams. For a difficult mission, they have to perform their dream manipulation recursively, that is, once inside the dream of their victim, they have to perform the manipulation on a person within that dream.
Mutual recursion occurs in Richard Cowper’s Worlds Apart, which tells two stories,
one of a married school teacher on earth who invents a world with a married couple,
and the other of a married man on another planet who invents a story about a married
school teacher on earth. A similar story unfolds in the movie The Frame, which is
about a paramedic who watches a TV show about a thief who himself watches a TV
show about the life of the paramedic. Mutual recursion also appears in Lewis Carroll’s
Through the Looking-Glass when Alice encounters a Unicorn and both consider each
other fictional creatures.
Types and Abstraction
Harry Potter
Time for Dinner
After finishing the work on your quilt, it is time to prepare dinner. Which silverware
you need to set the dinner table depends on what will be served. If you prepare soup,
you will use spoons, for spaghetti you will also need forks, and for meat you will need
both knives and forks. The properties of different dishes call for different features of
eating utensils. The shape of the spoon supports carrying liquids, the tines of the fork
help with catching and holding on to the spaghetti strands, and the blade of the knife
makes it easy to cut the meat into pieces. The rules about when to use spoons, forks,
and knives illustrate several important aspects about language.
First, the rules talk about types of eating utensils, that is, they do not distinguish
between different individual spoons, forks, or knives. All those are grouped together
into categories, and the rules talk about arbitrary members of those categories. This
is important, since it allows for an economical description that employs as few rules
as possible. Suppose the word fork were not available to collectively describe all the
forks in your kitchen drawer, and instead you had names for each fork, say, Pointy,
Spiky, Sharpy, and so on. To express a rule about needing a fork for eating spaghetti,
you would need to mention each individual fork. If you had to talk in the same way
about spoons, knives, plates, and so on, you would have a hard time coming up with
names and remembering them all, and the rules would become quite complicated. If
this sounds ridiculous, that’s because it is—and it illustrates how important types are in
making language effective. The words soup, spaghetti, and meat are also used as types
in the rules, since they refer to general food categories, not to a particular dish made at
a particular time and place. You may notice that the word food is also a type, which
encompasses soup, spaghetti, and meat; it is a type at a higher level. This shows that
types and individual objects can organize concepts into hierarchies, which make lan-
guages more efficient through generalizations. Types are a powerful linguistic tool for
organizing knowledge in support of effective reasoning.
Second, the rules express relationships between different types, such as food and
eating utensils. Rules put to work the knowledge about objects represented in types.
In particular, they facilitate reasoning about objects and allow us to draw conclusions
about objects’ behavior. The rule about using a fork to eat spaghetti indicates that using
a fork for that purpose will be successful but using a spoon probably won’t be. The
rule encodes prior experiences about the interaction of objects, and by using types it
can represent these experiences in a succinct form.
Third, the rules are predictive, which means that you can pick the silverware and set
the table even before the meal is ready and still be certain that the chosen eating utensil
will be appropriate for the dish. (It also means that by violating the rules, you may go
hungry.) This matters, since it makes the dinner algorithm more efficient: it allows you
to set the table while the meal is still cooking. Since the rule is a reflection of previous
eating experiences, it saves you from having to find out every time you have soup that
a fork or knife won’t do the job.
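The dinner rules can be written down with types in a programming language. Here is a sketch in Haskell; the types Food and Utensil and the function utensilsFor are illustrative names, not anything prescribed by the rules themselves:

data Food    = Soup | Spaghetti | Meat
data Utensil = Spoon | Fork | Knife deriving Show

utensilsFor :: Food -> [Utensil]        -- one rule per food type,
utensilsFor Soup      = [Spoon]         -- not one per individual spoon
utensilsFor Spaghetti = [Fork, Spoon]   -- forks (also) for spaghetti
utensilsFor Meat      = [Knife, Fork]

Because the rules mention only types, they apply to every fork in the drawer, and they can be consulted before the meal is ready.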
Using a fork or knife to eat soup would be a mistake. This insight can be derived
from the rules about proper eating utensils. Since the rules talk about the types of
objects, irrespective of which individual spoon or fork or soup is used, any action that
violates such a rule is called a type error. There are two kinds of type errors. First, there
are errors that have failure as an immediate consequence. For example, using a fork to
eat soup is such a failure. These errors are real in the sense that they make an intended
action, such as eating, impossible. The algorithm gets stuck, and since no progress is
possible, it has to be aborted. Second, there are errors that do not lead to failure but
describe situations that are nevertheless considered wrong or ill advised. For example,
it is possible to use a spoon to drink water, but few people would normally do that,
probably because it’s inefficient and doesn’t provide a satisfying drinking experience.
Similarly, while it is not unusual to use a straw for drinking water, this would be rather
weird for wine. Again, this is not a mistake that makes the action impossible, but it is
nevertheless considered foolish.
In computation, operations are applied to values. With their ability to impose struc-
ture, type-based rules about the composition of computation can help make sense of
algorithms, predict their behavior, and identify errors in their execution. Just as phone
jacks and electrical outlets have different shapes to protect appliances from damage and
people from injuries, rules about types in algorithms can prevent computations from
causing bad consequences. Chapter 14 examines the magical power of types through
some of the adventures of Harry Potter and his friends.
14 The Magical Type
Of all the stories in this book, the tale of Harry Potter may be the one most widely
known. Among the many factors that contribute to its popularity is that the story
revolves around magic, which obeys different laws than the rest of the physical universe.
Therefore, these stories must develop rules about what is and what is not possible in the
realm of magic and indicate how these rules interact with the ordinary laws of nature.
This last aspect is particularly important for the Harry Potter adventures because they,
unlike many other stories involving wizards and magicians, take place in present-day
England, not in some distant place or time.
If the magic in the Harry Potter books were arbitrary and not governed by laws, the
stories would quickly become pointless. When readers cannot form reasonable expec-
tations of what happens next or what the unknown cause for a particular event could
be, they are less motivated to read on. Rules that capture the laws of nature, the “laws”
of magic, and other regularities in our lives are important for understanding events and
their causes. They are also essential for planning ahead. These laws are necessarily gen-
eral and refer to types of things, not individuals, because a law’s power lies in its ability
to represent a large number of individual cases. Types not only characterize objects but
also categorize actions. For example, teleportation, a train ride, or Hansel and Gretel’s
walk through the forest are all examples of movements. A simple law for movements
is that moving something changes its position. Since this is true for any movement and
any object, this law is about types, not individuals.
In the realm of computing, laws that describe regularities of computations are called
typing rules. These constrain the admissible inputs and outputs of algorithms and can thereby expose mistakes in the execution of algorithms. Moreover, since types operate on a fine-grained level, that is, on the operations and their arguments in each step of an algorithm, typing rules can be employed to find type errors in algorithms. This is such an important task that it is itself performed by algorithms, called type checkers. If an
algorithm doesn’t violate any of the typing rules, it is said to be type correct, and its ex-
ecution is guaranteed to be free of certain errors. Types and typing rules go a long way
toward ensuring that algorithms behave as intended, and they can serve as guidelines for
building reliable algorithms.
However, experienced wizards can also cast nonverbal spells without wands. The effect
of a spell is usually temporally bound, and it can also sometimes be protected against
through a counterspell. Magic can also be conserved in potions, and this enables even
muggles to perform magic if they have access to potions. That magic is not a trivial mat-
ter is also reflected in the fact that young wizards and witches have to attend wizarding
school for seven years to master the subject.
The classification of people or objects into different categories according to partic-
ular properties or abilities applies, of course, not only to magic; it happens in virtually
all domains of life. Classification is ubiquitous in science, and it is a basic component of
our everyday understanding of the world; we employ it unconsciously in our reasoning
all the time. Examples range from rather mundane tasks, such as selecting silverware
for dinner or clothes according to the weather, to more abstract realms, such as classify-
ing and reasoning about philosophical and political ideas. The process of classification
itself is studied systematically in computer science because it can greatly enhance our
understanding of computation and help us create more reliable software in practice.
In computer science, a class of objects that behave in a certain way is called a type. We
have encountered types already in several different ways. In chapter 4 we saw different
data types (set, stack, and queue) for storing and maintaining collections of objects. Each
such data type has a distinctive way of inserting, retrieving, and removing elements from
a collection. For example, a stack retrieves objects in the reverse order in which they
were put into the stack (last in, first out), and a queue retrieves them in the order they
were entered (first in, first out). Thus, a particular data type encapsulates a specific
behavior for manipulating collections of elements. The behavior of a data type makes it
suitable to support specific computation tasks. For example, in chapter 13, I used a stack
data type to explain how an interpreter works and keeps track of different arguments
when algorithms are invoked recursively.
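The two behaviors can be sketched in Haskell with plain lists standing in for both data types (an illustration, not an efficient implementation):

pushStack :: a -> [a] -> [a]
pushStack x s = x : s          -- a new element goes on top

popStack :: [a] -> Maybe (a, [a])
popStack []    = Nothing
popStack (x:s) = Just (x, s)   -- the top comes off first: last in, first out

enqueue :: a -> [a] -> [a]
enqueue x q = q ++ [x]         -- a new element joins at the back

dequeue :: [a] -> Maybe (a, [a])
dequeue []    = Nothing
dequeue (x:q) = Just (x, q)    -- the front leaves first: first in, first out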
Another use of types is in describing the required or expected input and output
for algorithms. For example, the type of an algorithm for adding two numbers can be
described as follows:
(Number, Number) → Number
The type of the algorithm’s argument(s) is shown to the left of the arrow and the
result type to its right. Thus the type says that the algorithm takes a pair of numbers
and produces a number as a result. Note that the type of algorithms for subtracting,
multiplying, or any other binary operation on numbers would be the same as that for
addition. This shows that a type is not tied to a particular algorithm or computation,
which is consistent with types being descriptions of classes of things. In this case, the
type describes a wide range of computations.
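In Haskell syntax, with Int standing in for Number, addition and subtraction share one and the same type (subtract' is so named only to avoid a standard function called subtract):

add, subtract' :: (Int, Int) -> Int
add       (x, y) = x + y
subtract' (x, y) = x - y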
As another example, consider the getting-up algorithm in chapter 2, which has the
parameter wake-up-time to tell the algorithm when to sound the alarm. This parameter
needs as input a time value, that is, a pair of numbers indicating the hour and minute
of the wake-up time. Moreover, the two numbers cannot be arbitrary: the hour value
must be a number between 0 and 23,1 and the minute value must be a number between
0 and 59. Numbers outside these ranges are meaningless and would cause the algorithm
to behave in unexpected ways. We could therefore describe the type of the getting-up
algorithm as follows:
(Hour, Minute) → Alarm
Here Hour and Minute are types for subsets of numbers as described, and Alarm is a
type for alarm behaviors. An alarm behavior is making a particular sound at a specific
time, and just as the type Minute contains the numbers 0 through 59, the type Alarm
contains 24 × 60 = 1,440 different alarm behaviors, one for each minute of a day. If the
alarm sound is configurable, then the algorithm needs another parameter, which must be reflected in the type, and the result type Alarm also has to become more general to incorporate different sounds.
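A sketch of this in Haskell, with Hour, Minute, and Alarm as distinct types and the range checks made explicit (the names are illustrative):

newtype Hour   = Hour Int                 -- meaningful range: 0 to 23
newtype Minute = Minute Int               -- meaningful range: 0 to 59
data Alarm     = SoundAlarmAt Hour Minute

hour :: Int -> Maybe Hour                 -- reject meaningless hour values
hour h | 0 <= h && h <= 23 = Just (Hour h)
       | otherwise         = Nothing

minute :: Int -> Maybe Minute             -- reject meaningless minute values
minute m | 0 <= m && m <= 59 = Just (Minute m)
         | otherwise         = Nothing

getUp :: (Hour, Minute) -> Alarm          -- the type (Hour, Minute) → Alarm
getUp (h, m) = SoundAlarmAt h m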
We can see that the type of an algorithm tells us something about the algorithm. It
doesn’t tell us precisely what the algorithm does, but it narrows down its functionality.
This is often enough to choose between algorithms. One would obviously not use an
algorithm for addition for the problem of sounding an alarm at a specific time. Without
even looking at the details of the addition and getting-up algorithms, one can tell this
already from their different types. The type of an algorithm contains an arrow to
separate the type of its arguments (to the left) from the type of its result (to the right).
It says that the corresponding computation transforms input of one type into results of
the other. The result type can be a value, as in the case of addition, or an effect, as in
the case of the getting-up algorithm.
Since spells are transformations too, we can apply types to char-
acterize their effects as well. For example, in the Harry Potter
books, objects can be levitated using the Wingardium Leviosa spell,
in which the wizard has to move the wand, point it toward the ob-
ject, and use the incantation “Wingardium Leviosa.” Assuming that the types Object and
Levitating denote arbitrary and levitating objects, respectively, the type of the spell can
be described as follows:
(Wand, Incantation, Object) → Levitating
Types are a useful notation to talk and reason about properties of algorithms and com-
putations. The arrow to separate arguments from result types is one critical part of that
notation. The use of type parameters is another. As an example, consider the sorting al-
gorithms discussed in chapter 6. A sorting algorithm needs as input a list of items, and
it produces a list of the same items as a result. Moreover, the items must be comparable
in some way, that is, we must be able to determine for two items whether they are equal
or whether one is larger than the other. Since numbers are comparable, the following
type is a valid type for any sorting algorithm that sorts lists of numbers:
List(Number) → List(Number)
Writing the type for lists of numbers as List(Number) indicates that different list types
can be obtained by replacing Number. For example, since we can also sort textual
information, a sorting algorithm can also have the type List(Text) → List(Text). To
express that sorting algorithms can have a number of different though related types, we
can replace any specific item type such as Number or Text by a type parameter, which
can be substituted by a specific type. We can thus express the type of sorting algorithms
by the following template, where comparable stands for any type whose elements can
be compared:2
List(comparable) → List(comparable)
Any specific type, such as List(Number) → List(Number), can be obtained from this
type template by substituting an item type for the type parameter comparable.
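This template corresponds directly to what programming languages call a parametric or generic type. In Haskell, for example, the standard sorting function has the type Ord a => [a] -> [a], where the type parameter a, constrained by Ord to comparable types, plays the role of comparable:

import Data.List (sort)            -- sort :: Ord a => [a] -> [a]

sortNumbers :: [Int] -> [Int]      -- the instance List(Number) → List(Number)
sortNumbers = sort

sortTexts :: [String] -> [String]  -- the instance List(Text) → List(Text)
sortTexts = sort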
By attaching a type to the name of an algorithm with a colon, one can assert that
the algorithm has that particular type. For example, to assert that the sorting algorithm
Qsort has this type, we would write the following:

Qsort : List(comparable) → List(comparable)
The Count algorithm in chapter 12 also takes a list of items as inputs, but it doesn’t
require the items to be comparable, and it returns a number as a result. Thus the type
of Count can be described as follows, where any is a type parameter that stands for an
arbitrary type:

Count : List(any) → Number
Knowing the type of something tells us what it can do and what can be done with it.
Knowing the types of algorithms helps us select and apply them appropriately. The
type information can help with that in several different ways.
First, if we want to execute a particular algorithm, its input type tells us what kind
of arguments we need to supply. For example, a spell needs an incantation and a wand
Rules Rule
In addition to supporting the efficient derivation of information about particular ob-
jects, types enable reasoning about the interaction of objects and thus provide a way to
make predictions about the world. This can be achieved through a set of rules, called
typing rules, that describe whether and how objects of different types can interact. For
example, if something fragile drops onto a hard floor, it will likely break. In this case,
the prediction can be derived from a rule that invokes several types: the types of fragile
and of broken things, the type of accelerated movements, and the type of hard surfaces.
The rule says that something belonging to the type of fragile things will likely belong
to the type of broken things after it has experienced an accelerated movement against
a hard surface. This general rule can be employed to make predictions in many differ-
ent scenarios involving all kinds of fragile things (eggs, glasses, ice cream cones), hard
surfaces (paved roads, brick walls, driveways), and accelerated movements (dropping,
throwing). The rule compresses a large set of facts into one compact description. The
compression is achieved by using types instead of individual names to represent the
pertinent properties.
A typing rule describes the type of an object under certain conditions. The con-
ditions that must be fulfilled for a rule to be applicable are called its premises, and the
statement about the type of an object that the rule derives from the premises is called
its conclusion. The rule about the breaking of fragile objects has three premises, namely,
that the object is fragile, that it is dropped, and that the surface it lands on is hard. The
rule’s conclusion is that the object is broken. A typing rule that has only a conclusion
but no premises is called an axiom to indicate that it is unconditionally true. Examples
are “Harry Potter is a wizard,” “3 is a number,” or “thin glass is fragile.”
The identification of types and the assignment of types to objects are a matter of design and are guided by particular situations and purposes. For example, the distinction
between muggles and wizards is important in the context of the Harry Potter story
but is not useful in contexts where magic does not exist. Most rules about objects in
everyday life originally evolved to help us with survival in the natural world. They help
us to avoid lethal mistakes. With new technology and culture, new rules are required to effectively navigate all areas of modern life, such as how to operate an elevator or a cell phone or how to sign up for and use health insurance. These examples demonstrate that
the design of types and typing rules is an ongoing process driven by specific goals.
The most important and fundamental typing rule for computation is the application
rule. It relates the type of an algorithm to the type of the argument it is applied to and
the type of the result it produces. The rule requires that an algorithm can be applied
only to arguments whose type is identical to the algorithm’s input type. In that case,
the type of the result delivered by the algorithm is the same as the algorithm’s output
type. Typing rules are often shown by displaying the premises above a horizontal line and the conclusion beneath it. In this style, the rule for applying an algorithm to its arguments is as follows:

Alg : Input → Output      Arg : Input
-------------------------------------
Alg(Arg) : Output
The first premise of the rule requires an algorithm to have an input and output type.
This requirement is closely related to Gamp’s Law of Elemental Transfiguration, pre-
sented in Harry Potter and the Deathly Hallows, which says that food cannot be created
out of thin air; it can only be summoned from a different place or enlarged in size.
The application rule reflects a fundamental property of algorithms, namely, that they
depend on input to produce variable output.
Only if all the premises of a rule are fulfilled can the conclusion of the rule be drawn.
For example, an egg dropped onto a driveway will satisfy the rule about the breaking
of fragile things, and thus the conclusion that the egg will be broken is warranted.
Moreover, since 6 is a valid hour and 30 is a valid minute, and since the argument
type of a wake-up algorithm is (Hour, Minute), we can apply this algorithm to the two
arguments and conclude that the result is a valid behavior of type Alarm. An application
of a typing rule is obtained by substituting a specific algorithm for Alg and argument(s)
for Arg together with the corresponding types for Input and Output. The application
of the algorithm typing rule to a wake-up algorithm looks as follows:

Wake-up : (Hour, Minute) → Alarm      (6, 30) : (Hour, Minute)
--------------------------------------------------------------
Wake-up(6, 30) : Alarm
Similarly, if L is a list of numbers, we can use the rule with the type List(Number) →
List(Number) for Qsort to conclude that applying Qsort to L will produce a list of
numbers:
Qsort : List(Number) → List(Number)      L : List(Number)
---------------------------------------------------------
Qsort(L) : List(Number)
The ability to predict the outcome of situations from just the types of the involved
objects is a powerful reasoning mechanism that causes rational people to use a spoon
for eating soup or to undress before taking a shower. Types and typing rules provide
us with a compact representation of similar objects and events that makes this process
efficient. And this does not work just for reasoning about everyday situations but also
in domains such as the magic world of Harry Potter or the world of computing.
In the realm of Harry Potter the power of magic overrides many of the laws of
the natural world. Therefore, wizards sometimes reason differently about everyday
objects and act accordingly. For example, wizards can avoid much of the tedious work
that muggles are forced to do, as when Ron Weasley’s mother uses magic to aid her
with cooking and sweeping. For wizards it simply doesn’t make sense to do this by
hand. Another example is driving in a regular car, which is shunned by the wizarding
community because it isn’t economical compared to magical forms of transportation,
such as teleportation, floo powder, portkeys, and, of course, flying.
from Harry Potter, even if Ron doesn’t perform the Wingardium Leviosa spell correctly,
the feather could still start to levitate because somebody else could apply a spell to it
simultaneously. Or the dropped hammer might still break when it hits the floor because
its handle was already fractured to begin with. Therefore, when a rule is not applicable,
no conclusion can be drawn either way.
Thus it may not seem like such a big deal when typing rules don’t apply. However,
that really depends on how important it is to ensure that a particular conclusion is true.
While levitating feathers or the breaking of a dropped object are mere curiosities, there
are many situations where the conclusion really matters. Think of all the “famous last
word” jokes that play on incorrect conclusions, such as “The red wire is safe to cut,”
“Nice doggy,” “These are the good kind of mushrooms,” and so on. For a computation,
getting the types correct is similarly important. While seldom lethal, operating on a
value of the wrong type will in most cases cause a computation to fail. Here is why.
Each step of an algorithm transforms one or more values, and each operation that is
applied in this process relies on its arguments to be of a particular type because it is
only defined to do something meaningful for such values. Remember, we can’t sort a
wake-up time, and we can’t compute the square root of a list. If any step of an algorithm
encounters a value of a type that is not expected, it doesn’t know what to do with it and
gets stuck. As a consequence, the computation cannot be completed successfully and
cannot deliver a meaningful result. In short, the success of a computation depends on
submitting only values of correct types to operations.
Therefore, typing rules are a crucial component of computation because they can
ensure that operations don’t get stuck on wrong values and that the computation as
a whole can be completed successfully.3 Just as typing rules can be used to identify
meaningless sentences, such as “Harry Potter drank a spell,” they can spot nonsensical
applications of algorithms, such as Qsort(6:30am). And since an algorithm consists of
many steps that involve the application of operations and other algorithms to values,
typing rules can be used to identify mistakes within the definition of an algorithm,
and they can actively support the construction of type-correct algorithms, a process
sometimes called type-directed programming.
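In Haskell, for instance, such a nonsensical application is rejected before anything is executed; wakeUpTime is a made-up value for illustration:

import Data.List (sort)

wakeUpTime :: (Int, Int)
wakeUpTime = (6, 30)

sorted :: [Int]
sorted = sort [3, 1, 2]    -- type correct: a list goes in, a list comes out

-- bad = sort wakeUpTime   -- rejected by the type checker:
--                            a pair of numbers is not a list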
Typing rules prevent not only impossible applications of operations to values, which would get stuck, but also applications that don't make sense in a certain context. For example, your height and age can both be expressed as numbers, but it doesn't make sense to add these
two numbers. The addition makes no sense in this case because the two numbers are
representations of different things (see chapter 3), and thus the sum of the two numbers
is not a representation of anything and therefore meaningless.4 We find examples in
Harry Potter as well. For example, in the game of Quidditch two teams try to score
goals by tossing a ball, the quaffle, into one of the opponent team’s goal posts. Now
this would be rather easy using a teleportation spell, but it would make the game quite boring, and it would undermine the goal of the game, which is to determine the
better Quidditch team. Therefore, employing magic other than flying a broomstick is
not allowed for the players of the game. Such restrictions are ubiquitous in games. The
requirement to follow suit in a card game or move a bishop in chess only diagonally
does not reflect the impossibility of playing a different card or moving the bishop in a
straight line; it is a restriction in a particular context in support of the goals of the game.
Types can be used to track these representations alongside computations to ensure that
operations respect the rules of combination for the represented values.
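Such representation tracking can be sketched in Haskell by wrapping plain numbers in distinct types (an illustrative sketch of the idea):

newtype Age    = Age Int
newtype Height = Height Int

birthday :: Age -> Age             -- meaningful: an age plus one year
birthday (Age a) = Age (a + 1)

-- nonsense = Age 35 + Height 180  -- rejected: the two types don't mix,
--                                    even though both wrap numbers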
A violation of a typing rule in an algorithm is called a type error. It indicates an
inconsistency in the combination of operations and values. An algorithm that does not
contain any type errors is said to be type correct. A type error in an algorithm can have
different kinds of effects. First, it can cause the computation to get stuck, which in
turn may lead to aborting the computation, potentially followed by reporting an error.
The attempt to divide a number by zero typically has such an effect. In Harry Potter
and the Chamber of Secrets, Ron’s wand breaks and causes all his spells to malfunction.
For example, the Eat Slugs spell that he casts on Malfoy backfires, and his attempt to
transform a rat into a goblet results in a furry cup with a tail. While the exact outcome
might be a surprise, the lack of success in Ron’s magic was predictable, since his broken
wand violates one of the premises of the typing rule for magic. In these cases, the
effects of the nonworking magic are immediately visible. Second, a type error might
not lead to any immediately noticeable effect, which means the computation continues
and eventually finishes. But it does so with a meaningless value and thus ultimately
produces an incorrect result. Adding the age and height of a person is such an example.
While an abrupt interruption of a computation without a final result seems bad, a
computation that continues but leads to a meaningless result can be much worse. The
problem in that case is that one might not recognize that the result is incorrect and
base an important decision on that wrong result. An example of this is when Harry
Potter mispronounces “Diagon Alley” as “Diagonally” when attempting to travel by
floo powder, which causes him to end up incorrectly in Knockturn Alley instead. The
travel was not aborted, and the floo powder still worked, but it produced a wrong result.
It would have been better for Harry if the trip had simply aborted, because he nearly
gets kidnapped and has a run-in with some dark artifacts in the store he ends up in.
Law Enforcement
Since type correctness is an important prerequisite for the correct functioning of any
algorithm, it is a good idea to check for type correctness automatically using an algo-
rithm. Such an algorithm is called a type checker. A type checker determines whether
the steps in an algorithm conform to the typing rules about control structures, vari-
ables, and values. A type checker can spot type errors in an algorithm and thus prevent
erroneous computations.
The types in an algorithm can be checked in two different ways. One possibility is to
check the types during the execution of the algorithm. This approach is called dynamic
type checking, since it happens while the dynamic behavior of an algorithm comes into
effect. Right before an operation is performed, it is determined whether the types of the
arguments match the typing requirements of the operation to be applied. A problem
with this approach is that it offers no real help when a type error is detected. In most cases, the only possibility is to abort the computation, which can be quite
frustrating, in particular, in cases when much of the intended computation has already
happened and the algorithm aborts right before the final result is obtained. It would
be much better to know ahead of time whether the computation can be successfully
completed without encountering a type error and not spend resources on a computation
that is doomed to fail.
The alternative approach inspects the algorithm without executing it and checks
whether all steps respect the typing rules. The algorithm is then executed only if no
type errors are found. In this case, one can be sure that the algorithm will not abort
with a type error. This approach is called static type checking, since it happens without
considering the dynamic behavior of executing an algorithm, and it requires only the
static description of the algorithm. Static type checking would have told Ron not to try
any magic with his broken wand.
A further advantage of static type checking is that an algorithm needs to be checked
only once. After that it can be executed repeatedly for different arguments without any
more type checking. Moreover, while dynamic type checking has to check the opera-
tions in a loop repeatedly—in fact, as often as the loop is repeated—static type checking
has to check any operation only once, which leads to faster execution of the algorithm.
In contrast, dynamic type checking must be performed every time an algorithm is exe-
cuted.
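The difference can be sketched in Haskell by simulating a dynamically checked interpreter with tagged values (an illustration, not a real interpreter):

data Value = NumV Int | TextV String   -- values carry their type as a tag

addDynamic :: Value -> Value -> Value  -- the tags are checked on every call
addDynamic (NumV a) (NumV b) = NumV (a + b)
addDynamic _        _        = error "run-time type error: not numbers"

addStatic :: Int -> Int -> Int         -- checked once, before any execution;
addStatic = (+)                        -- no tags and no repeated checks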
However, dynamic type checking has the advantage that it is generally more precise,
because it takes into account the values processed by the algorithm. Whereas static
type checking only knows the types of parameters, dynamic type checking knows the
concrete values. Static type checking would have prevented Ron from using his broken
wand at all, and thus would have prevented any failures, while dynamic type checking
lets him try out spells, and succeed and fail in different situations. For example, when
Ron tries to transfigure a rat into a cup, he only partly succeeds: the cup still has a tail.
Determining the precise typing behavior of an algorithm is, like determining its ter-
mination behavior, an undecidable problem (see chapter 11). Consider, for example, an
algorithm in which only the “then” branch of a conditional contains a type error. This
algorithm contains a type error only when the condition of the conditional evaluates
to true and selects that branch. The problem is that figuring out the value of the condi-
tion might require an arbitrarily complicated computation (for example, the condition
could be a variable whose value is computed in some loop). Since we cannot be sure
whether such a computation terminates (since we cannot solve the halting problem),
we cannot know in advance whether the condition will be true, and thus we cannot
know whether the program will exhibit a type error.
To deal with this inherent inability to predict the behavior of an algorithm, static
type checking approximates the types within an algorithm. To avoid computations that
could lead to nontermination, static type checking is overcautious and reports a type
error whenever there could be one. In the case of the conditional this means that a
static type checker would require both alternatives to not contain a type error, since it
can’t compute the value of the condition. It would thus report a type error even if only
one branch of the conditional contained one. Even though the algorithm might not
exhibit any type error when executed, a static type checker would flag the algorithm
as type-incorrect and not execute it. This is the price to be paid for the safety of static
type checking: Some algorithms will be rejected even though they might not actually
produce an error when executed.
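In Haskell, for example, a conditional whose two branches have different types is rejected outright, no matter what the condition would evaluate to (an illustrative definition):

price :: Bool -> Int
price cheap = if cheap then 1 else 100   -- accepted: both branches are numbers

-- priceBad cheap = if cheap then 1 else "a lot"
--   rejected: one branch is a number, the other is text, even if
--   cheap would never select the offending branch at run time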
Static type checking trades precision for safety. While a reported error might not necessarily happen later, it could. If the computation is important enough to not risk failure,
it is better to fix the potential error in the algorithm before executing it. The cautious
approach of static type checking is not unique to computing. There are many other
domains in which checks are performed in advance. If you are boarding a plane, you
rely on the pilots making sure there is enough fuel on board and that all important
systems function. You are expecting static type checking of the rules for a successful
flight. Consider the alternative of checking the systems after take-off. Detecting a prob-
lem at that time could be too late. Or, if you are to receive a medical procedure or
prescription, the doctor should make sure that any contraindication is determined be-
fore the treatment to prevent any harm. Static type checking follows the maxim “Better
safe than sorry.” Following static typing, Ron should not have used the Eat Slugs spell;
because of his broken wand the rule for applying magic is not applicable, which foreshadows a mistake during the execution of the spell. In contrast, dynamic typing is a less
aggressive approach for enforcing typing rules that doesn’t let any opportunity for com-
putation pass, thereby running the risk of encountering errors during the execution of
an algorithm. This is what Ron did. He took a chance, and it didn’t work.
Building Code
The violation of a typing rule is a sign that something is wrong. A type error in an
algorithm indicates that it probably won’t work correctly. Viewed positively, when
all parts of an algorithm satisfy the typing rules, then some kinds of mistakes cannot
happen, and the algorithm is correct to a certain extent. Of course, the algorithm
can still produce incorrect results. For example, if we need an algorithm for adding
two numbers, which has the type (Number, Number) → Number, but we accidentally
create an algorithm for subtracting numbers, then this algorithm still has the correct
type, but it doesn’t compute the right values. Nevertheless, in a type-correct program
a large number of errors have been ruled out, and one can be sure that the steps of the
algorithm conform to an important level of consistency.
Shielding against incorrect or meaningless computation is an important role of types
and typing rules, but it is not the only benefit they offer. Types also act as explanations
of the steps of an algorithm. Without understanding the details of every step in an
algorithm, we can get a high-level picture of what is being computed when we just
look at the types. The ability to differentiate between computations by their types was
illustrated by the task of picking the envelope with the right algorithm. The ability
of types to summarize a large number of computations provides insights that cannot
be gleaned by just looking at a few examples. Thus, types and typing rules also have
explanatory value.
To illustrate, let’s take another look at the game of Quidditch. The game can be
explained without showing an actual game by describing the general rules and the roles
of different players. This is, in fact, how Harry learns it from Oliver Wood in Harry
Potter and the Sorcerer’s Stone.5 The rules of the game constrain the valid actions in the
game and define how the score is kept. The important aspect is that the rules of the
game are typing rules that use types for the roles of players (such as “seeker”) and kinds
of objects (such as “quaffle”). While seeing an example game would be helpful, it is not
enough to understand the game. In particular, even after having watched many games,
someone who doesn’t know the rules can be easily surprised. For example, a Quidditch
game ends immediately when a seeker catches the Golden Snitch. This may not have
occurred in the examples that have been watched, so it would surprise an observer
the first time it happened. Other games have special rules that can be surprising, for
example, the offside rule in soccer or the en passant move in chess. It is difficult to learn
a game by just watching examples, and it can take a very long time for all the rules to
be acted out.
Similarly, it can be quite hard to understand what an algorithm does just by watch-
ing its inputs being transformed into corresponding outputs. While types are generally
not sufficient to precisely describe the effect of an algorithm, they explain some of its
behavior on a high level, and they support the understanding of how its parts (the
individual steps and control structures) fit and work together.
Types structure the objects of a domain, and typing rules explain how they can be
combined in meaningful ways. In computer science, types and typing rules ensure the
meaningful interaction of different parts of an algorithm. They are thus important tools
for building larger systems out of smaller ones. This is explored in chapter 15.
At the End of the Day
As a long day of computing comes to an end, you reflect on all that has happened,
and you note some of the events in your diary. Getting up in the morning wasn’t re-
ally different from other days in any way. Nothing extraordinary happened that would
warrant mention in the diary. Brushing your teeth a little longer than normal or even
running out of toothpaste—these are not events that you would like to be reminded of
years from now. The same is true for most of what happened during the day. Since you
have breakfast every day and commute to work every workday, starting the diary entry
with something like “This was mostly another ordinary workday” implies a standard
getting-up and breakfast routine. Specifically, the name workday refers to a whole de-
tailed description of getting up, having breakfast, commuting to work, and all the other
routines that occur during an ordinary workday.
Assigning a (short) name to a (long) description and then using the name to refer to
the description is a form of abstraction. Like representing different cities by the same
kind of point on a map, the name workday ignores differences between weekdays and
regards them as all the same. However, since the points on the map appear at different
locations, they are not identical. Similarly, different mentions of workday occur at
different times, so they are not identical either. The different positions in space and
time provide context and additional meaning to references. For example, one city is
located by the sea or can be reached by a particular highway, and one workday falls on
election day or follows a holiday.
While effective in many situations, a plain name or symbol can sometimes be too
abstract a representation. To supply different references with additional information,
names and symbols can be extended by parameters. For example, points for repre-
senting cities are sometimes parameterized by size or color to distinguish larger from
smaller cities. Or the shape of the symbol is varied to distinguish the capitals of states,
and so on. Similarly, the name referring to a day can be parameterized. In fact, the
name workday already distinguishes that day from, say, a holiday, on which you would
not commute to work.
The full potential of parameters unfolds when they are referenced in the abbreviated
descriptions. For example, suppose you want to add to your diary an entry about
an important meeting or your visit to the doctor’s office. Like workday, the terms
meeting and doctor’s appointment are abstractions that stand for a number of things that
typically happen during such events. Most likely, in addition to writing that you had
an important meeting, you would add whom you met with and what the purpose of
the meeting was, and for the doctor’s appointment you would probably mention what
kind of doctor you had to consult and for what reason. This information would be
represented by a parameter for the abstraction and used in its description. Just as when
a bakery offers to customize birthday cakes with decorations for name and age, the
abstraction meeting is extended by the parameters who and purpose, which then are
referred to in the description of the abstraction. When you use an abstraction as in
“meeting with Jill about hiring,” the parameters are replaced by “Jill” and “hiring,” and
the description reads as if it were written just for this particular meeting.
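A parameterized abstraction of this kind is exactly what a function with parameters provides. A sketch in Haskell, with illustrative names:

meeting :: String -> String -> String    -- parameters who and purpose
meeting who purpose =
  "Had an important meeting with " ++ who ++ " about " ++ purpose ++ "."

jillEntry :: String
jillEntry = meeting "Jill" "hiring"
-- yields: "Had an important meeting with Jill about hiring."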
Abstractions identify patterns and make them reusable. Through parameters, ab-
stractions can be flexibly adapted to specific situations and still convey a lot of informa-
tion in a concise form. Assigning a name to an abstraction and identifying its parameters
defines its interface, which determines the proper use of the abstraction. Much of what
computer scientists do on a daily basis is identifying, creating, and using abstractions to
describe and reason about computation. Computer science studies the nature of abstrac-
tions and formalizes their definitions and uses to empower programmers and software
engineers to develop better software.
In the final chapter of this book I explain what abstractions are and the role they
play in computing. Since abstractions play an important role in natural languages and
computer science, it is not surprising that stories can explain a lot about computing.
15 A Bird’s Eye View:
Abstracting from Details
The examples of computation in this book were all rather small, which is good for
illustrating concepts and principles, but do they scale up? The question of scalability
arises in different places.
First, there is the question of whether algorithms work for large inputs. This prob-
lem is addressed by analyzing the runtime and space complexity of algorithms, discussed
in the first part of this book, in particular, in chapters 2, 4, 5, 6, and 7. While some algo-
rithms scale quite well (following a path, making coffee, searching, and sorting), others
don’t (optimizing a lunch selection with a limited budget). As explained in chapter 7,
the runtime of exponential algorithms effectively prohibits the solution of some prob-
lems.
Second, there is the question of how to create, understand, and maintain large soft-
ware systems. Designing and writing a small program is relatively easy, but producing
large software systems still poses a major challenge for software engineers.
To see what the problem is, imagine a map of the city or country you live in.
What would be a good scale? As explained by Lewis Carroll in his last novel Sylvie
and Bruno Concluded, a scale of one to one is useless since, if the map were spread
out, “it would cover the whole country and shut out the sunlight!” Thus any useful
map has to be much smaller than what it represents, and it therefore has to omit many
details. An important question in creating a map is, What scale is small enough to
be manageable yet large enough to represent sufficient detail? Moreover, which details
should be ignored and which should be kept? The answer to the latter question often
depends on the context in which the map is used. Sometimes we need to see the roads
and parking garages; in other situations we’re interested in the bike paths and coffee
shops. This means that the map should be configurable to individual needs.
Any description that is shorter than what it talks about faces these challenges of
finding the right level of generalization and providing means for configuration. Such
descriptions are called abstractions. Much of computer science is concerned with the
question of how to define and effectively use abstractions. Most prominently, an algo-
rithm is an abstraction of many different computations, and its parameters determine
what particular computation will unfold when the algorithm is executed. An algorithm
manipulates representations, which are also abstractions that preserve details to be ex-
ploited by the algorithm and ignore others. The level of abstraction in an algorithm and
its arguments are related. Finding the right level of generality for an algorithm often
involves a trade-off between generality and efficiency. To handle a wider range of inputs
often requires a higher level of abstraction for the inputs, which means the algorithm
has fewer details to exploit.
An algorithm is expressed in a language and is executed by a computer. All these
concepts make use of abstractions too. And finally, the abstraction from individual com-
putations achieved by algorithms requires abstraction also for the concepts of runtime
and space efficiency because an effective characterization of an algorithm’s efficiency
needs to be independent of the size of the different inputs.
This chapter illustrates that abstraction permeates all major concepts of computer
science. I first discuss the issues that arise in defining and using abstractions by contem-
plating an abstraction for the stories used in this book. Then I explain how abstraction
applies to algorithms, representations, runtime, computers, and languages.
of detail provided and the time required to understand the description because more
details lead to longer descriptions. This problem can be addressed by introducing names
for specific descriptions, such as detective story, in which the protagonist is a detective
who investigates a crime. Tag lines, movie previews, and other short summaries are
important because they are an effective way to deliver relevant information about what
to expect of a particular story, and they help us make faster decisions. For example,
if you don’t like detective stories, you can anticipate that you probably won’t enjoy
reading The Hound of the Baskervilles.
How would you go about generating a summary statement for a collection of sto-
ries? You could start by comparing two of the stories and remembering all the aspects they
have in common. Then you could compare the result with what is contained in the
third story and keep those things all three stories have in common, and so on. This
process filters out, step by step, aspects that do not apply to a particular story and keeps
those that are common to all of them. This elimination of distinguishing details is called
abstraction, and one also sometimes talks about the “abstraction from details.”1
In computer science, the description that is the result of an abstraction is itself also
called an abstraction, and the examples are called instances of the abstraction. Using the
same name, abstraction, for the process and the result might be confusing. Why don’t
we just use a similar term, such as generalization? This seems to be fitting, since a gener-
alization is also a summary that matches a number of specific instances. However, the
term abstraction in computer science means more than generalization. In addition to the
summarizing description, it typically has a name and contains one or more parameters
that can be identified with concrete values in the instances. The name and the parame-
ters are called the abstraction’s interface. The interface provides a mechanism for using
an abstraction, and the parameters link key elements of the instances to the abstraction.
While generalization is solely driven by the instances, the need for defining an interface
makes abstraction a more deliberate process that involves decisions about what details
to omit. For example, we can elevate a story generalization “How protagonists over-
come a challenge” to a story abstraction by giving it the name Story and identifying the
parameters protagonist and challenge:

Story(protagonist(s), challenge) = How protagonist(s) solve(s) the problem of challenge
Like the parameters of an algorithm, the parameters of the Story abstraction are used in
the description as placeholders to be substituted by whatever arguments Story will be
applied to. (The potential singular/plural in protagonist(s) and solve(s) is used to have
the description match both single and multiple protagonists, and replacing “overcome
a” with “solve(s) the problem of” makes the following examples easier to read.)
In the story of Hansel and Gretel the protagonists are Hansel and Gretel, and the
challenge is to find their way home. We can express this fact through an application of
the Story abstraction to the corresponding values for the parameters:
Story(Hansel and Gretel, finding their way home)
This application denotes a story of two protagonists, Hansel and Gretel, who solve the
problem of finding their way home.
It’s worthwhile reflecting on what has been accomplished here. Suppose you want
to quickly explain to somebody the gist of the story of Hansel and Gretel. You can
do this by first saying that it is a story, that is, by referring to the Story abstraction.
This works, of course, only if the other person knows what a story is, that is, if they
understand the Story abstraction. In that case, a reference to the abstraction invokes
in the other person the description of what a story is. Then you provide details to fill
in the roles represented by the parameters protagonist and challenge, which turns the
generic description of a story into a more specific one.
Technically, the application of the Story abstraction leads to replacement of the
abstraction name by its definition and the substitution of the values “Hansel and Gretel”
and “finding their way home” for the two parameters in the definition (see chapter 2).
This substitution leads to the following instance:
How Hansel and Gretel solve(s) the problem of finding their way home
The relationship between instance and abstraction is illustrated in figure 15.1.
Once we have given a summary for the story of Hansel and Gretel in terms of
the Story abstraction, we may want to refer to it even more succinctly by assigning
it a name. In this case, the name of the story happens to be the same as that of the
protagonists:
Hansel and Gretel = Story(Hansel and Gretel, finding their way home)
This equation says that Hansel and Gretel is the story of Hansel and Gretel solving
the problem of finding their way home. That the story name and the names of the
protagonists agree is purely coincidental, even though this actually happens fairly often.
Here is an example where it doesn’t occur:
Groundhog Day = Story(Phil Connors, escaping an endlessly repeating day)
Again, the instance represented by the application of the abstraction can be obtained
by replacing the abstraction name with its definition and by substituting values for the
parameters:

How Phil Connors solve(s) the problem of escaping an endlessly repeating day
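The Story abstraction behaves exactly like a function of two parameters and can be sketched as such in Haskell:

story :: String -> String -> String
story protagonists challenge =
  "How " ++ protagonists ++ " solve(s) the problem of " ++ challenge

hanselAndGretel :: String
hanselAndGretel = story "Hansel and Gretel" "finding their way home"

groundhogDay :: String
groundhogDay = story "Phil Connors" "escaping an endlessly repeating day"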
[Figure 15.1 shows the general scheme, Name(parameter, ..., parameter) = Definition, and its use, Name(argument, ..., argument), alongside the Story example and the instance that substitution produces.]
Figure 15.1 Definition and use of abstractions. The definition of an abstraction assigns it a
name and identifies parameters that are referred to by its definition. The name and parameters
are the abstraction’s interface, which prescribes how the abstraction is to be used: through its
name and by supplying arguments for its parameters. Such a use generates an instance of the
abstraction, which is obtained by substituting the arguments for the parameters in the definition
of the abstraction.
An application of the Story abstraction also commits to a single challenge (at least, for stories that contain several). For example, describing Hansel
and Gretel with the Story abstraction leaves at least one challenge unmentioned, which
raises the question of whether we could define the Story abstraction so that it accounts
for multiple challenges in a story. This is not difficult, but what is the right level of
detail for such an abstraction?
Say When
When is an abstraction general enough to cover enough instances? When does it leave
out too many details and lack precision? Whenever we are forming an abstraction,
we have to decide how general it should be and how much detail it should provide.
For example, we could use the description “sequence of events” instead of the Story
abstraction to characterize the stories in this book. This would be accurate, but it
would leave out some important aspects and be less precise. On the other hand, we
could use more specific abstractions, such as “fairy tale” or “comedy.” However, while
these provide more details than the Story abstraction, they are not general enough to
apply to all stories. In fact, even the fairly general Story abstraction is not general
enough to cover stories in which the protagonists fail to solve the problem. We could
address this shortcoming by adding another parameter that can be substituted by either
“solve” or “don’t solve,” depending on the situation. Whether generalizing Story in this
way is a good idea depends on the usage of the abstraction. If the “don’t solve” case
never comes up, the extra generality is not needed, and the current simpler definition is
preferable. However, the usage context for abstractions may change, and thus one can
never be absolutely sure about the chosen level of generality.
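If the "don't solve" case were ever needed, the generalization might look like the following sketch (hypothetical; a default argument keeps all existing uses working unchanged):

    def story(protagonist, challenge, outcome="solve"):
        # outcome is either "solve" or "don't solve"; defaulting to "solve"
        # means existing applications need not supply the extra argument.
        return f"How {protagonist} {outcome}(s) the problem of {challenge}"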
In addition to finding the right level of generality, another problem in defining an
abstraction is deciding how many details to provide in the description and how many
parameters to use. For example, we could add a parameter to the Story abstraction that
reflects how the challenge was overcome by the protagonists. On the one hand, adding
parameters to an abstraction makes it more expressive, since it provides a mechanism
to expose more subtle differences between different instances. On the other hand, it
makes the interface of the abstraction more complicated and requires more arguments
when the abstraction is used. A more complex interface not only makes an abstraction
more difficult to use but also makes applications of the abstraction more difficult to
understand, because more arguments have to be substituted for parameters in more
places. This counteracts one of the major reasons for using abstractions, which is to
provide concise and easy-to-grasp summaries.
Of course, there are many other challenges for programmers, and some can be neatly
summarized using the Story abstraction as well. Here is one more struggle that every
programmer can empathize with:
Story(programmers, finding the right level of abstraction)
Finding the right level of abstraction is also an issue for the Story abstraction. There are
two different ways to define Hansel and Gretel as an instance of the Story abstraction.
Picking one instance that focuses on one challenge means abstracting from (ignoring)
the other. But what if we want to use both instances to provide a more comprehensive
account of the Hansel and Gretel story? This can be done in different ways. First, we
can simply mention both instances side by side:
Story(Hansel and Gretel, finding their way home) and
Story(Hansel and Gretel, escaping the witch)
This looks a bit clunky. In particular, the repeated mention of the protagonists and the
Story abstraction seems redundant. This is apparent when we perform the substitution
and generate the following instance:
How Hansel and Gretel solve(s) the problem of finding their way home,
and how Hansel and Gretel solve(s) the problem of escaping the witch
Alternatively, we could simply combine both challenges into one argument for the
challenge parameter:
Story(Hansel and Gretel, finding their way home and escaping the witch)
This works reasonably well. You may have noticed that the same thing was done with
the protagonists: Hansel and Gretel were grouped together and substituted as one text
unit for the protagonist parameter. But we can also see that the flexibility of using
single or multiple protagonists does not come for free. In the definition of the Story
abstraction, the use of protagonist(s) and solve(s) had to work grammatically with both
singular and plural subjects. (To allow for multiple challenges this also had to be done
with problem(s).) It would be nice if the Story abstraction could generate separate,
grammatically correct instances for each case.
This can be achieved by defining the Story abstraction so that its first parameter is
a list of protagonists. Then two slightly different definitions are given, depending on
whether the parameter is a single protagonist or a list of two:2
Story(protagonist, challenge) = How protagonist solves the problem of challenge
Story(protagonist₁→protagonist₂, challenge) = How protagonist₁ and protagonist₂ solve the problem of challenge
Now if Story is applied to a single protagonist, such as Phil Connors, the first definition
using the singular verb is chosen, whereas when Story is applied to a list of two protag-
onists, such as Hansel→Gretel, the second definition with a plural verb is selected. In
that case the second definition decomposes the list into its two elements and inserts and
between them.
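In a programming language, this case distinction can be expressed directly. Here is a sketch using Python's match statement (Python 3.10 or later; the names are hypothetical):

    def story(protagonists, challenge):
        # Two definitions, selected by the shape of the protagonist list,
        # mirror the two equations above.
        match protagonists:
            case [name]:
                return f"How {name} solves the problem of {challenge}"
            case [name1, name2]:
                return f"How {name1} and {name2} solve the problem of {challenge}"

    story(["Phil Connors"], "escaping an endlessly repeating day")
    story(["Hansel", "Gretel"], "finding their way home")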
It seems that the Story abstraction provides a means to create English sentences. In
chapter 8, I demonstrated grammars as a mechanism to do just that. So couldn’t we
alternatively define a grammar for story summaries? Yes. Here is one possible gram-
mar that corresponds to the latest definition of Story.3 Apart from minor notational
differences, such as using an arrow instead of an equal sign or using dotted boxes to
mark nonterminals rather than dotted lines to mark parameters, the two mechanisms
basically operate in the same way by substituting values (or terminals) for parameters
(or nonterminals).
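A toy generator makes this correspondence tangible. The following sketch (hypothetical rules and names) treats nonterminals as dictionary keys and substitutes until only terminals remain:

    import random

    RULES = {
        "<summary>": ["How <protagonists> solve the problem of <challenge>"],
        "<protagonists>": ["Hansel and Gretel", "Marty McFly and Doc Brown"],
        "<challenge>": ["finding their way home", "returning to the present"],
    }

    def generate(symbol):
        # Terminals are returned as is; a nonterminal is replaced by one of
        # its productions, whose words are expanded recursively.
        if symbol not in RULES:
            return symbol
        production = random.choice(RULES[symbol])
        return " ".join(generate(word) for word in production.split())

    generate("<summary>")
    # e.g., "How Hansel and Gretel solve the problem of finding their way home"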
Table 8.1 in chapter 8 compares grammars, equations, and algorithms. It shows the
common roles of the constituting parts of the different formalisms and thus empha-
sizes that grammars, equations, and algorithms are different but similar mechanisms for
describing abstractions.
The preceding discussion shows that the design of an abstraction is not a straight-
forward task. By encountering new use cases one may recognize changing requirements
that force changes in the definition of the abstraction. Such changes can lead to a more
general abstraction or to one that exposes different details. In some cases, this might
cause a change to the interface, for example, when a new parameter is added or when
its type is changed. An example is the change of the protagonist parameter from a single
value to a list. When the interface of an abstraction changes, all existing uses of the
abstraction have to be changed as well to conform to the new interface. This can be a
lot of work and might have a ripple effect of causing changes to other interfaces as well.
Therefore, software engineers try to avoid changing interfaces whenever possible and
often consider such changes only as a last resort.
A Run of Abstractions
Consider the last definition of Story that uses a list of protagonists. While the equations
work only for lists that contain either one or two elements, the definition can be easily
extended to work for lists with an arbitrary number of elements using loops or recur-
sion, as demonstrated in chapters 10 and 12. Using separate equations to distinguish
between different cases and the idea of processing lists indicate that the Story abstrac-
tion might actually be an algorithm for producing short story descriptions. As it turns
out, there is more to this than meets the eye, and I next discuss the relationship between
algorithms and abstractions in more detail.
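One way such an extension might look, sketched with recursion (a hypothetical helper that assumes at least one protagonist):

    def join_protagonists(names):
        # Base case: a single protagonist ends the recursion.
        if len(names) == 1:
            return names[0]
        # Recursive case: the first name, "and", then the rest of the list.
        return names[0] + " and " + join_protagonists(names[1:])

    join_protagonists(["Harry", "Ron", "Hermione"])  # -> "Harry and Ron and Hermione"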
Given the importance of abstraction in computer science, it isn’t surprising that
one of its central concepts, the algorithm, is itself an example of an abstraction. An
algorithm describes the commonality of a number of similar computations, whether
following pebbles or sorting lists; see figure 15.2.4 Different individual computations
result when the algorithm is executed with different arguments substituted for its pa-
rameters.
The Story abstraction seems to be a post hoc generalization, that is, the summa-
rizing description is created after having seen many existing stories. This may happen
occasionally for algorithms as well. For example, after you have repeatedly prepared
a certain meal, changing the ingredients and modifying the method to improve the
outcome, you may finally decide to write down the recipe to ensure that the culinary
experience can be repeated in the future. However, in many other cases an algorithm
is a solution to an unsolved problem and is created before any computation is ever run.
And this is what makes the algorithm abstraction so powerful. In addition to describ-
ing computations that have already happened, an algorithm can generate completely
new computations as needed. An algorithm has the power to solve new problems not
encountered before. In the context of the Story abstraction, think of some arbitrary
protagonists and a specific problem, and you have the beginning of a new story.
[Figure 15.2 diagram: an Algorithm is an abstraction of many Computations, just as a Type is an abstraction of many Representations.]
A representation identifies some features of signs to stand for something (see chapter 3) and by doing so
actively ignores other features; that is, it abstracts from those features.
When Hansel and Gretel find their way home by tracing pebbles, the size and color
of the pebbles don’t matter. The use of the pebbles as a representation for locations
creates an abstraction that ignores the differences in size and color and focuses instead
on their feature of reflecting the moonlight. When we say that Harry Potter is a wizard,
we emphasize the fact that he can perform magic. In contrast, we do not care about his
age or that he wears glasses or any other interesting piece of information about him.
By calling Harry Potter a wizard we abstract from all those details. The same applies
when we point out that Sherlock Holmes is a detective or that Doc Brown is a scientist.
We highlight features commonly associated with those terms and temporarily ignore
everything else about the person.
Terms such as wizard, detective, and scientist are, of course, types. They carry with
them connotations of properties that are generally thought to be true of any one indi-
vidual member of such a type. The notions of protagonist and challenge in the Story
abstraction are also types, since they evoke a particular imagery that is required for the
Story abstraction to convey its meaning. It seems that protagonist is a more general type
than, say, wizard or detective because it provides fewer details. The fact that the latter
two can be substituted for the former also supports this view. However, this is only part
of the story. Take Lord Voldemort, for example. He is a wizard, but he is not a pro-
tagonist. Instead, he is a major antagonist in the Harry Potter stories. Since both Harry
Potter, a protagonist, and Voldemort, an antagonist, are wizards, it now seems that the
type wizard is more general because it ignores details on which the distinction between
a protagonist and an antagonist relies. Therefore, neither protagonist nor wizard can be
considered more abstract in general, which is not too surprising, since these types come
from different domains, namely, stories and magic.
Within one domain, types are often arranged more clearly by placing them in hier-
archies. For example, Harry Potter, Draco Malfoy, and Severus Snape are all members
of Hogwarts, but only Harry and Draco are students at Hogwarts. If you are a student
at Hogwarts, you are clearly also a member of Hogwarts, which means that the type
member of Hogwarts is a more general abstraction than student at Hogwarts. Moreover,
since Harry, but not Draco, is a member of the house of Gryffindor, the type student at
Hogwarts is a more general abstraction than student in the house of Gryffindor. Similarly,
magic is more abstract than a spell, which in turn is more abstract than a teleportation
spell or a patronus charm.
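Such hierarchies are exactly what subtyping in programming languages models. A sketch with Python classes (hypothetical names), where a more specific type is a subclass of a more general one:

    class HogwartsMember: ...
    class HogwartsStudent(HogwartsMember): ...
    class GryffindorStudent(HogwartsStudent): ...

    harry = GryffindorStudent()
    # Membership in a specific type implies membership in every more general one.
    isinstance(harry, HogwartsMember)  # True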
Types found in programming languages are probably the most obvious form of data
abstraction. The numbers 2 and 6 are different, but they have many things in common.
We can divide them by 2, we can add them to other numbers, and so on. We can
therefore ignore their differences and group them together with other numbers in the
type Number. This type abstracts from the properties of individual numbers and ex-
poses the commonalities between all its members. In particular, the type can be used to
characterize parameters in algorithms, which can then be exploited to check the consis-
tency of the algorithm with a type checker (see chapter 14). Thus data abstraction goes
hand-in-hand with functional abstraction: an algorithm abstracts from individual values
by using parameters. However, in many cases the parameters cannot be substituted by
anything imaginable but require as arguments representations that can be manipulated
by the algorithm. For example, a parameter that is multiplied by 2 must be a number,
which is where types as data abstraction come into play. The type Number can also be
regarded as more abstract than many more specialized number types, such as the type
of all even numbers.
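Sketched with Python's optional type annotations (which are checked by an external tool such as mypy rather than by Python itself):

    def double(x: float) -> float:
        # The annotation restricts the parameter to numbers, so a type
        # checker can reject a call like double("pebble") before it runs.
        return x * 2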
Finally, in addition to (plain) types such as Number, data abstraction applies specif-
ically to data types (see chapter 4). A data type is defined solely by the operations it
offers and their properties. The details of the representation are ignored, that is, ab-
stracted from, which means that a data type is more abstract than a data structure that
implements it. For example, a stack can be implemented by a list or an array, but the
details of those structures and thus the differences between them are not visible when
they are used to implement stacks.
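A sketch of this separation (hypothetical class): users of push and pop never see that a Python list happens to be the underlying data structure, so it could be swapped for an array without affecting them.

    class Stack:
        def __init__(self):
            self._items = []  # hidden representation: a Python list

        def push(self, item):
            self._items.append(item)

        def pop(self):
            # LIFO: the most recently pushed element is removed first.
            return self._items.pop()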
Any well-chosen data abstraction highlights the features of the representation that
support computations with them. Moreover, such an abstraction ignores and hides
features that could interfere with the computation.
Time to Abstract
As explained in chapter 2, the runtime of an algorithm is not reported in the same way as
a fitness tracker reports the time of your most recent six-mile run. Reporting runtimes
in seconds (or minutes or hours) would not be a very helpful piece of information,
since it depends on a particular computer. When the same algorithm is run on a faster
or slower computer, the runtime of the same algorithm would be different. Comparing
your time for running six miles with that of your friend makes sense, since it provides
information about the relative efficiency of two computers, that is, runners. However,
the time doesn’t say anything meaningful about the efficiency of the running algorithm
itself, because both you and your friend are executing the same algorithm.
It is therefore a good idea to abstract from concrete times and to measure the com-
plexity of algorithms in number of steps taken. Such a measure is independent of the
speed of any particular computer.
Figure 15.3 Time abstraction. To abstract from the different speeds of computers, we use the
number of steps an algorithm needs during execution as a measure for its runtime. To abstract
from the different number of steps needed for different inputs, we measure the runtime of an
algorithm in terms of how fast it grows for larger input.
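Counting steps instead of seconds might look like this sketch (a hypothetical function), which finds the smallest element of a list and reports how many comparisons that took, one per remaining element, which is linear runtime:

    def minimum_with_steps(items):
        smallest = items[0]
        steps = 0
        for item in items[1:]:
            steps += 1  # one comparison per element: linear in len(items)
            if item < smallest:
                smallest = item
        return smallest, steps

    minimum_with_steps([7, 3, 9, 1])  # -> (1, 3)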
A computer may fail to execute an algorithm altogether if even one of its instructions has been slightly
changed. The brittle language tie between algorithm and computer seems to make it
difficult to share algorithms. Fortunately, software does not have to be rewritten every
time a new computer enters the market. Two forms of abstraction can be credited for
this: language translation and abstract machines.
To illustrate the idea of an abstract machine, let’s consider the algorithm for driving
a car. You probably learned how to drive one specific car, but still you are able to drive a
variety of different cars. The driving skills you acquired are not tied to a specific model
and make but are more abstract and can be described using concepts such as steering
wheel, gas pedal, and brakes. The abstract car model is realized by various actual cars
that differ in details but provide access to its functionality using the common general
language of driving.
Abstraction applies to all kinds of machines. For example, we can abstract from
the details of a coffee maker, a French press, or an espresso machine by saying that a
machine for making coffee needs the ability to mix hot water with ground coffee for
some time and then to separate the particles from the fluid. The algorithm for making
coffee can be described in terms of this coffee-making abstraction, which is still concrete
enough to be instantiated with different coffee-making machines. Of course, machine
abstraction has its limits. We cannot use a coffee maker to execute a driving algorithm,
and we cannot make coffee using a car. Yet abstract machines are an important way to
decouple languages from specific computer architectures (see figure 15.4).
The most famous machine abstraction for computing is the Turing machine, named
after the British mathematician and pioneer of computer science Alan Turing,
who invented it in 1936 and used it to formalize the concepts of computation and algo-
rithm. A Turing machine consists of a tape that is divided into cells, each containing
a symbol. The tape is accessed through a read/write head that can also move the tape
forward and backward. The machine is always in some particular state and is controlled
by a program, given by a set of rules that say which symbol to write on the current cell
of the tape, in which direction to move the tape, and which new state to go into, de-
pending on what symbol is currently visible and what the current state is. The Turing
machine has been used to prove the unsolvability of the halting problem (see chapter
11). Any program can be translated into a Turing machine program, which means that
the Turing machine is an abstraction of all currently existing (electronic) computers.
The importance of this insight is that whatever general property we can prove for the
Turing machine also holds for any other existing computer.
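A Turing machine is simple enough to simulate in a few lines. This sketch (hypothetical names, not the historical formulation; it moves the head rather than the tape, which is equivalent) runs a machine that inverts a string of bits:

    def run_turing_machine(rules, tape, state, halt_state="halt"):
        # rules maps (state, symbol) to (symbol to write, head move, next state).
        cells = dict(enumerate(tape))  # sparse tape; unvisited cells are blank
        head = 0
        while state != halt_state:
            symbol = cells.get(head, "_")
            write, move, state = rules[(state, symbol)]
            cells[head] = write
            head += move
        return "".join(cells[i] for i in sorted(cells))

    invert = {
        ("scan", "0"): ("1", +1, "scan"),
        ("scan", "1"): ("0", +1, "scan"),
        ("scan", "_"): ("_", +1, "halt"),  # blank cell: stop
    }
    run_turing_machine(invert, "1011", "scan")  # -> "0100_"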
[Figure 15.4 diagram: many algorithms are expressed in a language, which is realized by an abstract machine on many different computers.]
The other strategy for abstracting from particular computers is to use language
translation. For example, we can translate the pebble-tracing algorithm from German
to French or English, which eliminates the language barrier and makes the algorithm
accessible to a wider range of people. The same works, of course, for computer lan-
guages. In fact, almost all programs written today are translated in some way before
they are executed by a machine, which means that almost none of the programming
languages in use today are directly understood by a computer and that every algorithm
has to be translated. A programming language abstracts from the details of a particular
computer and provides uniform access for programming a wide range of computers
in one language. A programming language is therefore an abstraction of computers and
makes the design of algorithms independent of specific computers. If a new computer
is produced, all it takes to run existing algorithms on this new computer is to adapt
a translator to produce changed code for that computer. The translation abstraction
has made the design of programming languages to a large degree independent of the
computers on which they will be executed.
There are different ways to define a Translate abstraction. As with any abstraction,
the question is which details to abstract from and which to expose as parameters in
the interface. The following definition abstracts from the program to be translated, the
language it is given in, and the language it should be translated into:
Translate(program, source language, target language) =
"Translate program from the source language into the target language"
Note that I use quotes around “Translate . . .” because translation is a sophisticated algo-
rithm, much too long and complicated to be presented here. For example, the automatic
translation of natural languages is still an open problem. In contrast, the translation of
computer languages is a well understood and solved problem. Still, translators are long
and complicated algorithms, which is why no details are shown here.
As an example of how to use the Translate abstraction, here is how the instruction
to find a pebble can be translated from German into English:
Translate(Finde Kieselstein, German, English) = Find pebble
The instruction Finde Kieselstein is an element of the source language German, and the
result of the translation is the instruction Find pebble, which is an element of the target
language English.
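For such single instructions, a sketch of Translate could be as simple as word-by-word dictionary lookup (hypothetical; real translators are far more sophisticated than this):

    GERMAN_TO_ENGLISH = {"Finde": "Find", "Kieselstein": "pebble"}

    def translate(instruction, dictionary):
        # Substitute each word for which the dictionary has an entry.
        return " ".join(dictionary.get(word, word) for word in instruction.split())

    translate("Finde Kieselstein", GERMAN_TO_ENGLISH)  # -> "Find pebble"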
In Harry Potter the language of spells and charms needs a wizard for execution.
Some spells can be translated into a corresponding potion, which can then be executed
by muggles as well. For example, the effect of some transforming spells can be captured
in the Polyjuice Potion, which allows the drinker to change his or her appearance.
While the Polyjuice Potion is apparently very difficult to produce, even for experienced
wizards, some other spells have a straightforward translation. For example, the killing
curse Avada Kedavra translates into any ordinary lethal poison.
Since every Translate abstraction is itself an algorithm, we can abstract from all
translations through a language in which they can be expressed, just as we abstracted
from all algorithms using a language. The top left part of figure 15.5 illustrates this case.
Since every language corresponds to a computer or abstract machine that can execute
its programs, this means that language can abstract from computers, as shown at the top
right of figure 15.5.
Just as the Turing machine is the ultimate abstraction of any computing machine,
lambda calculus is the ultimate abstraction of any programming language. Lambda cal-
culus was invented around the same time as the Turing machine by the American math-
ematician Alonzo Church. It consists only of three constructs for defining abstractions,
referencing parameters in definitions, and creating instances of abstractions by supply-
ing arguments for parameters, very similar to what is shown in figure 15.1. Any program
in any algorithmic language can be translated into a lambda calculus program. Now it
seems that we have two different ultimate abstractions from computers: lambda calculus
and the Turing machine. How can that be? It turns out that these two abstractions are
equivalent, which means that any program for a Turing machine can be translated into
an equivalent lambda calculus program, and vice versa. Moreover, every known formal-
ism for expressing algorithms6 has been shown to be no more expressive than Turing
machines or lambda calculus.
Figure 15.5 The tower of abstractions. An algorithm is a (functional) abstraction from computations. An algorithm transforms representations, whose (data) abstractions are types. The acceptable inputs and outputs for an algorithm are also expressed as types. Each algorithm is expressed in a language, which is an abstraction from algorithms. A translation algorithm can transform algorithms from one language into another and therefore makes algorithms independent of a particular computer or abstract machine that understands the language in which it is expressed. Language is also an abstraction of computers, since translation can effectively eliminate the differences between computers. Types are also expressed as part of a language. The abstraction hierarchy illustrates that all abstractions in computer science are expressed in some language.
Therefore, it seems that any algorithm can be expressed
as a Turing machine or a lambda calculus program. This observation is known as the
Church-Turing thesis, named after the two pioneers of computer science. The Church-
Turing thesis is about the expressiveness and reach of algorithms. Since the definition
of an algorithm is based on the idea of an effective instruction, which is an intuitive
notion tied to human abilities, an algorithm cannot be formalized mathematically. The
Church-Turing thesis is not a theorem that could be proved but rather an observation
about the intuitive concept of an algorithm. The Church-Turing thesis is so important
because it implies that everything that can be known about algorithms can be found
out by studying Turing machines and lambda calculus. The Church-Turing thesis is
accepted by most computer scientists.
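The three constructs of lambda calculus survive almost unchanged in modern languages. A sketch using Python's lambda (numbers appear only for readability; pure lambda calculus encodes even those as abstractions):

    twice = lambda f: lambda x: f(f(x))  # abstraction with parameters f and x
    add_one = lambda n: n + 1            # another abstraction
    twice(add_one)(3)                    # application substitutes arguments; yields 5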
* * *
Computation is used for the systematic solving of problems. Although electronic com-
puters have facilitated the unprecedented growth and reach of computation, they are just
one tool of computing. The concept of computing is more general and more widely ap-
plicable. As we have seen, Hansel and Gretel already knew how to execute an algorithm
and how to successfully employ abstractions in representing paths through pebbles.
Sherlock Holmes was a master of signs and representations, and he manipulated data
structures to solve crimes. And Indiana Jones didn’t employ an electronic computer for
any of his exciting searches. The language of music might not have solved any problem
directly, but it incorporates every bit of syntax and semantics you can imagine. Phil
Connors may not have known any theoretical computer science, yet he faced the same
problem that illustrates the fundamental limits of computation: the unsolvability of
the halting problem. Marty McFly and Doc Brown lived recursion, and Harry Potter
revealed to us the magic power of types and abstraction.
The heroes of these stories may not be heroes of computing, but their stories have
told us a lot about what computation is. As this book ends, I want to leave you with
one more story—the story of computer science. This is a story about abstractions for con-
quering the concept of computation. Its main protagonist is the algorithm, which can
systematically solve problems through the transformation of representations. It does that
by skillfully applying its basic tools, control structures and recursion. Although problems
can be solved in different ways, the algorithm doesn’t settle for any specific solution but
aspires to be as general as possible by employing its secret weapon, the parameter. How-
ever, any of its concrete missions is a constant struggle, since the algorithm always faces
serious opposition from its arch enemy complexity.
The story unfolds as new and larger problems push the algorithm to its limits. But in its
struggle it enjoys the friendship and support of its relatives from the abstraction family:
its sister efficiency, who counsels it about the prudent use of precious resources; its
brother type, who shields it relentlessly from programming errors and incorrect inputs;
and its wise grandmother language, who bestows on it expressiveness and makes sure it
is understood by its trusted companion the computer, on whom it relies for executing
any of its plans.
The algorithm is not all-powerful and cannot solve all problems, and it is particularly
vulnerable to inefficiency and errors. But it is well aware of this fact. And this knowl-
edge about its own limitations makes it strong and confident about the adventures that
lie ahead.
Further Exploration
The magic in Harry Potter illustrates the concepts of types and typing rules and how
they can help make predictions about the future. A rigorous system of magic can be
found in L. E. Modesitt Jr.’s series of fantasy novels The Saga of Recluce, where magic
is a person's ability to control the chaos and order that are inherent in all matter. The
protagonist in Jim Butcher’s book series The Dresden Files is a private detective and
wizard who investigates cases involving supernatural phenomena. The stories contain
different types of magic, laws of magic, and tools for magic.
Many examples of creatures with specific abilities, which can be described as types,
can be found in the world created by J. R. R. Tolkien in his novels The Lord of the Rings
and The Hobbit. The superheroes in the X-Men comic series and corresponding movies
have special abilities that are well defined and interact in a precisely defined way with
the natural world. Some of the superheroes define a type that has exactly one element,
namely, themselves. Other types have multiple members.
Using a device incorrectly often amounts to a type error and can lead to malfunc-
tion. This is regularly exploited for dramatic effect, as in the movie The Fly, where the
improper use of a teleportation device causes mixing of the DNA of a human and a fly.
Type errors also occur when people change or switch roles, playing on stereotypes asso-
ciated with the roles. For example, in the movie Freaky Friday, the minds of a teenager
and her mother switch bodies, which leads to behaviors that violate the expecta-
tions associated with the roles of teenager and adult. In Mark Twain’s The Prince and
the Pauper, a prince switches roles with a poor boy.
Abstractions come in many different forms. Dioramas are abstractions often used
to represent important events in history. Some examples occur in the movie Dinner for
Schmucks. A voodoo doll is an abstraction of a person and is used to inflict pain on the
person over a distance. Examples occur in Indiana Jones and the Temple of Doom and in
Pirates of the Caribbean: On Stranger Tides. In Jim Butcher’s The Dresden Files, voodoo
dolls are used several times. Similarly, an avatar is a representation of oneself that can
act in distant environments, which is a major theme of the movie Avatar. The movie
Inside Out represents the mind as a space inhabited by five people that personify basic
emotions.
Money is an abstraction of value and is an essential tool for exchanging goods in
economies. The fact that money is a socially constructed abstraction and works only
when all participants agree on its value is nicely illustrated in stories that use non-
traditional currencies, as in Mad Max 2, where gasoline is the major currency, or in the
movie In Time, where one’s own life span is used as a currency. In Frank Herbert’s
Dune, water and spice are currencies, and Douglas Adams's The Hitchhiker's Guide to the
Galaxy contains a number of weird currency examples.
Abstractions of behavioral rules can be found in Isaac Asimov’s I, Robot collection
of short stories in the form of three laws of robotics that are intended to ensure that
robots serve humans well without harming them. These laws represent an abstraction
of morality. The stories illustrate the application of these laws and their limitations,
which has led to modifications and extensions of the laws. And in Max Barry's Lexicon
a secret society has developed a novel language that is not a tool for communicating
ideas but for controlling the actions of other people. While natural languages typically
provide abstractions for describing the world, the special language in Lexicon is based
on abstractions for neurochemical reactions that steer human actions, similar to how
some magic spells can directly affect behavior.
Glossary
This glossary summarizes important terms in the book. The entries are grouped in
sections by story and major concepts. Some entries appear in more than one section.
Important terms used in definitions are labeled with a section letter indicating the
section where they themselves are defined. For instance, algorithma indicates that the
term algorithm is defined in section A.
input. The variable part of a problema that distinguishes different problem examples. The
input is part of the problem representationa and can consist of several parts. An algorithma
applied to a specific input leads to a computationa for the problem instance for that input. In
the algorithm, the input is represented by parametersa .
parameter. A name used by an algorithma to refer to its inputa .
problem. A representationa of a situation, deemed in need of a solution, that facilitates so-
lutions through computationa . Refers to both individual problem examples that are subject to
transformation by computationa and the class of all such problems that can be the inputa for an
algorithma .
program. An algorithma that can be understood and executeda by a machine.
representation. An entity that stands for something in the real world. A representation must
be modifiable by the steps of a computationa . See also representationb .
runtime. A measure for the number of steps an executiona of an algorithma is expected to take.
Expressed as a rule that captures how the runtime is related to the size of the inputa . Commonly
occurring runtime measures are linearc , quadraticc , logarithmicc , and exponentialc .
worst case. The longest possible runtimea of a computationa generated by an algorithma . The
worst case serves as an estimate to judge whether an algorithma is efficient enough to produce a
solution to a problema .
queue. A data typeb for a collection from which elements are removed in the same order in
which they were inserted. Realizes the FIFO (first in, first out) principle for accessing elements
in a collection.
representation. A signb that stands for something in the real world. See also representationa .
set. A data typeb for representing a collection of elements that provides operations for insertion,
removal, and finding of elements. Can be represented as a dictionaryb in which the elements are
the keysb . The set data type is important, since it facilitates the representation of properties.
sign. A specific form of representationb consisting of a signifierb (its visible part) and a signifiedb
(what the signifier stands for).
signified. The part of a signb represented by the signifierb . Not a concrete object in the world
but a concept about objects that people have in their minds.
signifier. The part of a signb that is perceived and that stands for the signifiedb . One signifier
can represent different concepts.
stack. A data typeb for a collection from which elements are removed in the opposite order in
which they were inserted. Realizes the LIFO (last in, first out) principle for accessing elements
in a collection.
symbol. A signb that stands for the signifiedb based on an arbitrary convention.
tree. A data structureb for representing a collection of elements as a hierarchy. Its elements are
called nodesb . Any node higher up in the hierarchy is connected to zero or more nodes lower
in the hierarchy and to at most one node higher in the hierarchy.
binary search tree. A binary treec with the property that for each nodeb in the treeb , all the
nodes in its left subtree are smaller, and all the nodes in its right subtree are larger.
binary tree. A treeb in which each nodeb has at most two children.
bucket sort. A sorting algorithma that scans the listb to be sorted and places each element
into one of a number of specifically reserved and initially empty spaces (buckets). After all list
elements have been placed into buckets, the buckets are inspected in order, and all the elements
from nonempty buckets are put into the result list. Bucket sort typically uses an arrayb to
represent the buckets. It requires that the set of elements that may possibly occur in the list be
not too large, so that each potential bucket can be assigned only a few or even only one element.
In that case bucket sort has linear runtimec .
divide-and-conquer. An algorithma schema in which the solution is obtained by splitting the
input into separate parts, solving the problems for the parts independently of one another, and
then combining the solutions for the parts into a solution for the original problema . Divide-
and-conquer is not an algorithm itself, but an algorithm structure. Mergesortc and quicksortc
are examples.
exponential runtime. An algorithma has exponential runtime when its execution time dou-
bles (or increases by a factor greater than 1) whenever the size of the inputa grows by 1. This
means that if the size of the input increases by 10, the algorithm will take about 1,000 times longer. Al-
gorithms with exponential runtime are not practical solutions, since they work only for inputs
of small size.
generate-and-test. An algorithma schema that consists of two major phases. In the first phase,
it produces potential solutions, which are then systematically checked during the second phase.
Generate-and-test is not an algorithm itself, but an algorithm structure. Trying out all combi-
nations for a combination lock is an example, as is trying out all combinations for solving a
knapsack problemc .
greedy algorithm. An algorithma is called greedy if it always picks the currently best available
option. Picking elements from a sorted list for solving a knapsack problemc is an example.
insertion sort. A sorting algorithma that repeatedly picks the next element from an unsorted
listb and inserts it at the right place in the sorted list. Insertion sort has quadratic runtimec in
the worst casea .
intractable problem. A problema for which only algorithmsa with exponential runtimec are
known. Examples are the knapsack problemc and the traveling salesman problem.
knapsack problem. An example of an intractable problemc . The task is to fill a knapsack of
limited capacity with as many items as possible to maximize the total value of all items that fit
in the knapsack.
linear runtime. An algorithma has linear runtimea when its execution time grows proportion-
ally to the size of the inputa . Linear runtime is better than linearithmic runtimec but not as
good as logarithmic runtimec . Finding the smallest element in a list has linear runtime in the
worst casea .
linearithmic runtime. An algorithma has linearithmic runtimea when its execution time
grows proportionally to the size of inputa multiplied by the logarithm of the size. Linearith-
mic runtime is not quite as good as linear runtimec but much better than quadratic runtimec .
Mergesortc has linearithmic runtime in the worst casea .
logarithmic runtime. If an algorithma has logarithmic runtime, its runtimea increases
by only one step for inputa twice the size. Logarithmic runtime is much faster than linear runtimec. For
example, binary searchc in a balancedc binary search treec has logarithmic runtime.
lower bound. The complexity of a problema . Gives the growth rate of the minimum num-
ber of steps any algorithma must take to solve a problem. Any algorithm whose worst-casea
runtimea is the same as the lower bound is an optimal algorithmc .
mergesort. A divide-and-conquerc sorting algorithma that first splits the listb to be sorted into
two sublists of equal length, then sorts these sublists, and finally merges the sorted sublists into
the sorted result list. Mergesort has linearithmic runtimec in the worst casea . Since the lower
boundc for sorting is also linearithmic, mergesort is an optimal algorithmc for sorting.
optimal algorithm. An algorithma whose worst-casea runtimea is the same as the lower
boundc of the problema it is solving.
quadratic runtime. An algorithma has quadratic runtimea when its execution time grows
proportionally to the square of the size of inputa . Quadratic runtime is worse than linearithmic
runtimec but much better than exponential runtimec . Insertion sortc and selection sortc have
quadratic runtime in the worst casea .
quicksort. A divide-and-conquerc sorting algorithma that first splits a listb to be sorted into two
sublists that each contain all the elements that are, respectively, smaller and larger than a chosen
pivot element. These two lists are then themselves sorted, and the results are concatenated, with
the pivot element in the middle, into the sorted result list. Quicksort has a quadratic runtimec
in the worst casea , but performs very well in practice and has linearithmic runtimec in the
average case.
search key. A piece of information that identifies something to be found. In some cases it
is derived (then it is an indexb ); in other cases it is a separate, unrelated value (then it is a
symbolb ). Identifies the boundary that separates relevant from irrelevant elements with regard
to the current search. See also keyb .
selection sort. A sorting algorithma that repeatedly finds the minimum from an unsorted listb
and puts it at the end of the sorted list. Selection sort has quadratic runtimea in the worst casea .
trie. A treeb data structureb for implementing setsb or dictionariesb that represents each keyb as
a sequence of nodesb . A trie is particularly efficient when the stored keys share common parts.
control structure. A part of an algorithma for organizing the order, application, and repetition
of operations. The three major control structures are loopse , conditionalse , and recursionf .
for loop. A loope whose bodye is executed a fixed number of times.
halting problem. The problem of deciding whether an algorithma will terminatee for any given inputa.
loop. A control structuree for repeating the bodye of the loop. The number of repetitions is
determined by the termination conditione , which is checked after each executiona of the loop’s
body and which depends on the program state that can be changed by the operations of the
loop’s body. In general, it is not clear how many repetitions the loop generates and thus whether
the loop terminates. Two major kinds of loops are the repeat loope and the while loope . For
these loops the termination condition determines how often the body will be repeated, which
is generally not known before executing the loop. In contrast, the number of repetitions of a
for loope is fixed at the beginning of the loop, which guarantees the terminatione of for loopse .
repeat loop. A loope whose termination conditione is always checked after the operations of
the loop's bodye have been executed.
termination. A property of an algorithma or computationa that holds when the computa-
tion ends. The termination of an algorithma cannot be determined automatically by another
algorithm; it is an undecidablee problema .
termination condition. A condition that determines whether a loope ends or continues to
run.
undecidability. A property of a problema . A problem is undecidable if it cannot be solved by
an algorithma . A famous undecidable problem is the halting probleme .
while loop. A loope whose termination conditione is always checked before the operations of
the loop’s bodye are executed.
F. Recursion
base case. The nonrecursive part of a recursive definitionf . When a recursive algorithma reaches
a base case, the recursionf will stop at that point. A base case is required, but not sufficient, for
a recursion to terminatee .
bounded recursion. A recursionf that terminatese .
descriptive recursion. When the definition of a concept contains a reference to itself, usually
through the use of a name or symbol. Sometimes called self-reference.
direct recursion. When the definition of a name or object contains a reference to itself.
fixed point. A solution to a recursive equation.
indirect recursion. When the definition of a name or object does not contain a reference to
itself but rather contains another name that contains the reference.
interpreter. A computera that executesa an algorithma using a stackb data typeb to keep track of
recursivef (and nonrecursive) calls and multiple copies of arguments that arise as a consequence
of recursive callsf .
paradox. A logical contradiction, which is caused by a descriptive recursionf that cannot be
executed to yield a boundedf unfolded recursionf . A recursive equation describes a paradox if it
doesn’t have a fixed pointf .
recursion. Refers either to descriptive recursionf , unfolded recursionf , or the executiona of a
recursive definitionf .
recursive call. A call to an algorithma from within its own definition (direct recursionf ) or
from the definition of another algorithm that it calls (indirect recursionf ).
recursive definition. When the definition of a name or object contains a reference to itself. In
a recursive definition of an algorithma , the reference is also called a recursive callf .
substitution. The process of replacing parametersa by concrete values. Replacing the call of an
algorithma by its definition while substituting the arguments for the parameters.
trace. A sequence that captures the different states a computationa went through. Helpful in
illustrating the executiona of an algorithma that is given by a recursive definitionf .
unbounded recursion. A recursionf that does not terminatee . This happens when the recursive
definitionf lacks a base casef or when the base case is not reached by the recursive callsf . Exam-
ples are never-ending songs, infinite listsb of numbers, or some nonterminating computationsa .
unfolded recursion. When an artifact, typically a picture or text, contains a (sometimes scaled)
copy of itself. Sometimes called self-similarity. Fractals and geometric patterns, such as Sierpin-
ski triangles, are examples of unfolded recursion in pictures, and a never-ending song or poem
is an example of unfolded recursion in text. Matryoshka dolls are an example of a recursive
physical object.
Notes
Introduction
1. John Dewey, “The Pattern of Inquiry,” in Logic: Theory of Inquiry (1938).
On Your Way
1. If you have never driven in the United States, Canada, or South Africa, you may not know
this traffic rule. For somebody who first learned how to drive in Germany, four-way stop
streets are a truly amazing concept, in particular, the fact that they cause so few accidents
and quarrels about who came first.
2. This follows, since signs are representations and since we have identified computation as the
systematic transformation of representations.
6. A wiki is a program for collaboratively editing and sharing information over the internet.
7. Their ultimate goal is, of course, to trim the set down to contain exactly one element.
In other words, since √2 ≈ 1.4142, the algorithm can process input that is approximately 1.4
times as large.
4. About 30 billion atoms would have to be split to produce the energy needed to light up a 1
watt lamp for 1 second.
5. The P stands for polynomial and refers to the class of problems for which a solution can be
constructed by an algorithm whose runtime is proportional to a polynomial such as n² or n³.
The N stands for nondeterministic, and NP refers to the class of problems for which a
solution can be constructed by an algorithm with polynomial runtime on a nondeterministic
machine, which is a hypothetical machine that can always make a correct guess for each
decision needed in the algorithm. For example, we can generate a solution for the weighing
problem in linear time if we can correctly guess for each weight whether it is included in the
solution. Equivalently, NP refers to the class of problems whose solutions can be verified by
an algorithm with polynomial runtime. For example, a proposed solution to the weighing
problem can be verified in linear time, since it requires only the summation of the weights
and the comparison with the target weight.
6. However, if one has to repeatedly find and remove the smallest element from a list, the
sorting-based method becomes more efficient at some point.
7. Consider the first object added. It either weighs more than half of the target weight (which
proves the case) or it weighs less than half, which means that we can add the next object,
since its added weight (which is less than the first and thus also less than half the target
weight) does not exceed the target weight.
Now we can again distinguish two cases. If the two objects together weigh more than half
of the target, their weight is less than 50% off the target weight. Otherwise, if they weigh
less than half, each object must weigh less than one-fourth of the target, and so does the
next object, since it weighs even less and can thus be added. Following this line of reasoning
shows that we can continue to add objects at least until we have reached half the target
weight. This argument assumes there are enough objects available to reach the target weight,
but this assumption is already contained in the assumption that we can find an optimal
solution.
Doctor’s Orders
1. Ludwig Wittgenstein, Tractatus Logico-Philosophicus (1921), para 5.6, trans. C. K. Ogden.
www.gutenberg.org/files/5740/5740-pdf.pdf
4. Unfortunately, tablature is ambiguous with regard to the duration of notes and can therefore
only effectively be used when the performer knows the piece already.
5. To clearly distinguish nonterminal from terminal symbols, the former are always given by
a name with a dotted frame around it, similar to the dotted underlining used to represent
parameters in chapter 2. This notation evokes the notion of a placeholder that is to be
substituted.
6. To represent n different pitches and m different durations, we need n × m rules. This number
can be drastically reduced to about n + m rules by decomposing a note into two nontermi-
nals for pitch and duration plus rules for producing these two aspects independently of one
another.
7. Idiomatic expressions are an exception to this rule (see chapter 9).
8. Oops. I forgot the last two words “it doesn’t,” which prevented you from successfully parsing
the sentence. In this case, you could probably fix the sentence by adding the missing part,
but sentences for which that doesn’t work will be left without meaning—a very frustrating
experience for the reader or listener.
6. The condition of a repeat loop is appropriately called a termination condition; the condition
of the while loop is better called an entry condition.
7. The for loops that can be found in most programming languages actually have the following
slightly more general form, which provides the body of the loop with information about
the number of iterations that have passed:
for name := number to number do step
This form introduces a nonterminal name that acts as a counter bound to the current iter-
ation number. It can be used in the body of the loop, as in for n := 1 to 10 do compute the
square of n.
Stop at Nothing
1. Recursion can also be a source of nontermination. However, since any recursive algorithm
can be transformed into a nonrecursive one that uses loops, it is sufficient to consider only
loops.
2. Remember that the semicolon is the control structure that sequentially composes steps of
an algorithm.
3. The situation is a bit more complicated than this, but the analogy is still valid.
4. The recursive description is longer because the name of the algorithm has to be mentioned
explicitly.
5. One can also distinguish between generative and structural recursion, but that distinction
is not so important for the fundamental understanding of the concept of recursion itself.
Moreover, one can distinguish between linear and nonlinear recursion (see chapter 13).
6. See en.wikipedia.org/wiki/Drawing_Hands and en.wikipedia.org/wiki/Print_Gallery_(M.
_C._Escher).
7. Only when the algorithms are applied to nonnegative numbers.
8. From David Hunter, Essentials of Discrete Mathematics (2011).
4. This all depends on the context. Usually, one would also say that it doesn’t make sense to
multiply the height of a person by itself, but the square of a person's height is actually
used to divide the weight of a person to compute the so-called body-mass index.
5. Published in the United Kingdom as Harry Potter and the Philosopher’s Stone.
Index

  see also data type; last in, first out
start symbol (grammar), 150
state, 191, 223
static ↔ dynamic type checking, 256–258
storage space, 32
  linear, 42
substitution, 34, 147, 226–229, 230, 266
  see also parameter
symbol (sign), 56
synonym, 54
syntax, 6, 145
  abstract, 142, 151
  concrete, 142, 151
  see also grammar; parse tree
syntax tree, see abstract syntax tree; parse tree
terminal, 146, 147
termination, 178–179, 185, 187
termination condition, 178, 179, 183, 191–195, 309n6
  see also loop, condition
textual language ↔ visual language, 181
time abstraction, see runtime abstraction
trace, 223, 230, 231
tree, 73
  ancestor, 73
  child, 74
  descendant, 73
  height, 95
  internal node, 91
  leaf, 74
  node, 74
  parent, 74
  path, 92
  root, 74
  subtree, 91
  see also data structure
trie, 97–99, 100
  see also data structure
Turing machine, 278, 281
Turing test, 141
Turing, Alan, 194, 278
type, 9, 142, 243, 247, 274
  algorithms, for, 247, 248, 253
  see also data type
type checker, 246, 256
  see also dynamic ↔ static type checking
type correctness, 246, 255
type error, 10, 244, 246, 255
type parameter, 249
type-directed programming, 254
typing rule, 9, 246, 251
unbounded ↔ bounded recursion, 219
uncomputable problem, see problem, undecidable
undecidable problem, see problem, undecidable
unfolded ↔ descriptive recursion, 214
unsolvable problem, see problem, undecidable
use-mention distinction, 51
variable, 146, 177, 191
  see also parameter
visual language ↔ textual language, 181
von Neumann, John, 111, 115, 306n6
way finding, see problem examples, path finding; algorithm examples, path finding
while loop ↔ repeat loop, 183, 184
worst-case complexity, see runtime, worst case
Zuse, Konrad, 303n6