Algorithm Reading Materials
What are algorithms? Why is the study of algorithms worthwhile? What is the role
of algorithms relative to other technologies used in computers? This chapter will
answer these questions.
1.1 Algorithms
Thus, given the input sequence ⟨31, 41, 59, 26, 41, 58⟩, a correct sorting algorithm
returns as output the sequence ⟨26, 31, 41, 41, 58, 59⟩. Such an input sequence is
6 Chapter 1 The Role of Algorithms in Computing
1 Sometimes, when the problem context is known, problem instances are themselves simply called
“problems.”
Given a mechanical design in terms of a library of parts, where each part may
include instances of other parts, list the parts in order so that each part appears
before any part that uses it. If the design comprises n parts, then there are n!
possible orders, where n! denotes the factorial function. Because the factorial
function grows faster than even an exponential function, you cannot feasibly
generate each possible order and then verify that, within that order, each part
appears before the parts using it (unless you have only a few parts). This problem
is an instance of topological sorting, and Chapter 20 shows how to solve
this problem efficiently.
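To make the parts-ordering problem concrete, here is a minimal Python sketch that orders parts without enumerating all n! permutations. It uses Kahn's algorithm, which repeatedly emits a part whose components have all been placed; this is one standard approach and not necessarily the method presented in Chapter 20. The function name `topological_order`, the dictionary layout, and the sample design are assumptions made for illustration.

```python
from collections import deque


def topological_order(uses):
    """Return parts ordered so each part precedes every part that uses it.

    `uses` maps a part name to the list of parts it directly includes
    (a hypothetical representation chosen for this sketch).
    """
    # Collect every part, including ones that appear only as components.
    parts = set(uses)
    for components in uses.values():
        parts.update(components)

    # used_by[c] lists the parts that include c; pending[p] counts the
    # distinct components of p that have not been placed yet.
    used_by = {p: [] for p in parts}
    pending = {p: 0 for p in parts}
    for part, components in uses.items():
        for c in set(components):
            used_by[c].append(part)
            pending[part] += 1

    # Parts with no components can be listed immediately.
    ready = deque(sorted(p for p in parts if pending[p] == 0))
    order = []
    while ready:
        c = ready.popleft()
        order.append(c)
        for part in used_by[c]:
            pending[part] -= 1
            if pending[part] == 0:
                ready.append(part)

    if len(order) != len(parts):
        raise ValueError("design contains a cycle")
    return order


# Hypothetical design: a bicycle uses a frame and a wheel, and so on.
design = {
    "bicycle": ["frame", "wheel"],
    "wheel": ["rim", "spoke"],
    "frame": [],
}
print(topological_order(design))
```

Each part and each "uses" relationship is handled a constant number of times, so the work grows linearly with the size of the design rather than factorially.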
A doctor needs to determine whether an image represents a cancerous tumor or
a benign one. The doctor has available images of many other tumors, some of
which are known to be cancerous and some of which are known to be benign.
A cancerous tumor is likely to be more similar to other cancerous tumors than
to benign tumors, and a benign tumor is more likely to be similar to other be-
nign tumors. By using a clustering algorithm, as in Chapter 33, the doctor can
identify which outcome is more likely.
You need to compress a large file containing text so that it occupies less space.
Many ways to do so are known, including “LZW compression,” which looks for
repeating character sequences. Chapter 15 studies a different approach, “Huffman
coding,” which encodes characters by bit sequences of various lengths,
with characters occurring more frequently encoded by shorter bit sequences.
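The idea that frequent characters receive shorter bit sequences can be sketched in a few lines of Python. The following is an illustrative greedy construction using a binary heap, assuming the text fits in memory; the function name `huffman_codes` and the tie-breaking scheme are choices made here, not details taken from Chapter 15.

```python
import heapq
from collections import Counter


def huffman_codes(text):
    """Build a prefix code in which frequent characters get shorter bit strings."""
    freq = Counter(text)
    if not freq:
        return {}
    if len(freq) == 1:
        # A single distinct character still needs one bit per occurrence.
        return {next(iter(freq)): "0"}

    # Heap entries are (weight, tie_breaker, {char: code_so_far}).
    # The unique tie_breaker keeps the dictionaries from ever being compared.
    heap = [(w, i, {ch: ""}) for i, (ch, w) in enumerate(sorted(freq.items()))]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        # Merge the two lightest subtrees, prefixing their codes with 0 and 1.
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        merged = {ch: "0" + code for ch, code in left.items()}
        merged.update({ch: "1" + code for ch, code in right.items()})
        heapq.heappush(heap, (w1 + w2, next_id, merged))
        next_id += 1
    return heap[0][2]


codes = huffman_codes("aaaaaaaabbbc")
encoded = "".join(codes[ch] for ch in "aaaaaaaabbbc")
print(codes, len(encoded), "bits")
```

Because no code word is a prefix of another, the encoded bit string can be decoded unambiguously from left to right.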
These lists are far from exhaustive (as you again have probably surmised from
this book’s heft), but they exhibit two characteristics common to many interesting
algorithmic problems:
1. They have many candidate solutions, the overwhelming majority of which do
not solve the problem at hand. Finding one that does, or one that is “best,” without
explicitly examining each possible solution, can present quite a challenge.
2. They have practical applications. Of the problems in the above list, finding the
shortest path provides the easiest examples. A transportation firm, such as a
trucking or railroad company, has a financial interest in finding shortest paths
through a road or rail network because taking shorter paths results in lower
labor and fuel costs. Or a routing node on the internet might need to find the
shortest path through the network in order to route a message quickly. Or a
person wishing to drive from New York to Boston might want to find driving
directions using a navigation app.
Not every problem solved by algorithms has an easily identified set of candidate
solutions. For example, given a set of numerical values representing samples
of a signal taken at regular time intervals, the discrete Fourier transform converts
the time domain to the frequency domain. That is, it approximates the signal as a
weighted sum of sinusoids, producing the strength of various frequencies which,
when summed, approximate the sampled signal. In addition to lying at the heart of
signal processing, discrete Fourier transforms have applications in data compression
and multiplying large polynomials and integers. Chapter 30 gives an efficient
algorithm, the fast Fourier transform (commonly called the FFT), for this problem.
The chapter also sketches out the design of a hardware FFT circuit.
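As a rough illustration of what the transform computes, the following Python sketch evaluates the discrete Fourier transform directly from its definition. It takes time proportional to the square of the number of samples, so it is not the fast Fourier transform of Chapter 30. The function name `dft`, the sign convention in the exponent (the one common in signal processing), and the test signal are assumptions made for this example.

```python
import cmath
import math


def dft(samples):
    """Naive discrete Fourier transform, straight from the definition."""
    n = len(samples)
    return [
        sum(samples[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


# Hypothetical signal: 16 samples of a cosine completing 3 cycles.
n = 16
signal = [math.cos(2 * math.pi * 3 * t / n) for t in range(n)]
spectrum = dft(signal)

# The strength of each frequency is the magnitude of its coefficient.
for k, coefficient in enumerate(spectrum):
    print(k, round(abs(coefficient), 6))
```

For this signal the magnitudes are concentrated at frequency indices 3 and 13 (the mirror image of 3), which is how the transform exposes the frequency content of the samples.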
Data structures
This book also presents several data structures. A data structure is a way to store
and organize data in order to facilitate access and modifications. Using the appropriate
data structure or structures is an important part of algorithm design. No single
data structure works well for all purposes, and so you should know the strengths
and limitations of several of them.
Technique
Although you can use this book as a “cookbook” for algorithms, you might someday
encounter a problem for which you cannot readily find a published algorithm
(many of the exercises and problems in this book, for example). This book will
teach you techniques of algorithm design and analysis so that you can develop
algorithms on your own, show that they give the correct answer, and analyze their
efficiency. Different chapters address different aspects of algorithmic problem solving.
Some chapters address specific problems, such as finding medians and order
statistics in Chapter 9, computing minimum spanning trees in Chapter 21, and
determining a maximum flow in a network in Chapter 24. Other chapters introduce
techniques, such as divide-and-conquer in Chapters 2 and 4, dynamic programming
in Chapter 14, and amortized analysis in Chapter 16.
Hard problems
Most of this book is about efficient algorithms. Our usual measure of efficiency
is speed: how long does an algorithm take to produce its result? There are some
problems, however, for which we know of no algorithm that runs in a reasonable
amount of time. Chapter 34 studies an interesting subset of these problems, which
are known as NP-complete.
Why are NP-complete problems interesting? First, although no efficient algorithm
for an NP-complete problem has ever been found, nobody has ever proven
that an efficient algorithm for one cannot exist. In other words, no one knows
whether efficient algorithms exist for NP-complete problems. Second, the set of
Exercises
1.1-1
Describe your own real-world example that requires sorting. Describe one that
requires finding the shortest distance between two points.
1.1-2
Other than speed, what other measures of efficiency might you need to consider in
a real-world setting?
1.1-3
Select a data structure that you have seen, and discuss its strengths and limitations.
1.1-4
How are the shortest-path and traveling-salesperson problems given above similar?
How are they different?
1.1-5
Suggest a real-world problem in which only the best solution will do. Then come
up with one in which “approximately” the best solution is good enough.
1.1-6
Describe a real-world problem in which sometimes the entire input is available
before you need to solve the problem, but other times the input is not entirely
available in advance and arrives over time.
If computers were infinitely fast and computer memory were free, would you have
any reason to study algorithms? The answer is yes, if for no other reason than that
you would still like to be certain that your solution method terminates and does so
with the correct answer.
If computers were infinitely fast, any correct method for solving a problem
would do. You would probably want your implementation to be within the bounds
of good software engineering practice (for example, your implementation should
be well designed and documented), but you would most often use whichever
method was the easiest to implement.
Of course, computers may be fast, but they are not infinitely fast. Computing
time is therefore a bounded resource, which makes it precious. Although the saying
goes, “Time is money,” time is even more valuable than money: you can get back
money after you spend it, but once time is spent, you can never get it back. Memory
may be inexpensive, but it is neither infinite nor free. You should choose algorithms
that use the resources of time and space efficiently.
Efficiency
Different algorithms devised to solve the same problem often differ dramatically in
their efficiency. These differences can be much more significant than differences
due to hardware and software.
As an example, Chapter 2 introduces two algorithms for sorting. The first,
known as insertion sort, takes time roughly equal to $c_1 n^2$ to sort $n$ items, where $c_1$
is a constant that does not depend on $n$. That is, it takes time roughly proportional
to $n^2$. The second, merge sort, takes time roughly equal to $c_2 n \lg n$, where $\lg n$
stands for $\log_2 n$ and $c_2$ is another constant that also does not depend on $n$. Insertion
sort typically has a smaller constant factor than merge sort, so that $c_1 < c_2$.
We’ll see that the constant factors can have far less of an impact on the running
time than the dependence on the input size $n$. Let’s write insertion sort’s running
time as $c_1 n \cdot n$ and merge sort’s running time as $c_2 n \cdot \lg n$. Then we see that where
insertion sort has a factor of $n$ in its running time, merge sort has a factor of $\lg n$,
which is much smaller. For example, when $n$ is 1000, $\lg n$ is approximately 10, and
when $n$ is 1,000,000, $\lg n$ is approximately only 20. Although insertion sort usually
runs faster than merge sort for small input sizes, once the input size $n$ becomes
large enough, merge sort’s advantage of $\lg n$ versus $n$ more than compensates for
the difference in constant factors. No matter how much smaller $c_1$ is than $c_2$, there
is always a crossover point beyond which merge sort is faster.
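The crossover point can be located numerically for any particular pair of constants. The sketch below searches for the smallest input size at which the insertion-sort cost model exceeds the merge-sort cost model. The function names, the search limit, and the sample constants (1 and 20) are hypothetical values chosen for illustration, not figures from the text.

```python
import math


def insertion_cost(n, c1):
    """Cost model c1 * n^2 for insertion sort."""
    return c1 * n * n


def merge_cost(n, c2):
    """Cost model c2 * n * lg n for merge sort."""
    return c2 * n * math.log2(n)


def crossover(c1, c2, limit=10**7):
    """Smallest n >= 2 with c1*n^2 > c2*n*lg n, or None if none up to limit."""
    for n in range(2, limit + 1):
        if insertion_cost(n, c1) > merge_cost(n, c2):
            return n
    return None


# Hypothetical constants: merge sort's constant is 20 times larger.
n_star = crossover(c1=1, c2=20)
print("merge sort wins from n =", n_star)
```

Making the second constant larger pushes the crossover point further out, but because $n$ eventually outgrows any fixed multiple of $\lg n$, the search always succeeds for a large enough limit.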
1.2 Algorithms as a technology 13
For a concrete example, let us pit a faster computer (computer A) running inser-
tion sort against a slower computer (computer B) running merge sort. They each
must sort an array of 10 million numbers. (Although 10 million numbers might
seem like a lot, if the numbers are eight-byte integers, then the input occupies
about 80 megabytes, which fits in the memory of even an inexpensive laptop computer
many times over.) Suppose that computer A executes 10 billion instructions
per second (faster than any single sequential computer at the time of this writing)
and computer B executes only 10 million instructions per second (much slower
than most contemporary computers), so that computer A is 1000 times faster than
computer B in raw computing power. To make the difference even more dramatic,
suppose that the world’s craftiest programmer codes insertion sort in machine language
for computer A, and the resulting code requires $2n^2$ instructions to sort $n$
numbers. Suppose further that just an average programmer implements merge
sort, using a high-level language with an inefficient compiler, with the resulting
code taking $50 n \lg n$ instructions. To sort 10 million numbers, computer A takes
$$\frac{2 \cdot (10^7)^2 \text{ instructions}}{10^{10} \text{ instructions/second}} = 20{,}000 \text{ seconds (more than 5.5 hours)},$$
while computer B takes
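Under the figures stated above ($50 n \lg n$ instructions at 10 million instructions per second for computer B, and $2n^2$ instructions at 10 billion instructions per second for computer A), both running times can be worked out with a short script. This is only a sketch of the arithmetic; the helper name `seconds_needed` is illustrative.

```python
import math


def seconds_needed(instructions, instructions_per_second):
    """Time to execute a given number of instructions at a given speed."""
    return instructions / instructions_per_second


n = 10**7  # 10 million numbers to sort

# Computer A: insertion sort, 2n^2 instructions, 10^10 instructions/second.
time_a = seconds_needed(2 * n**2, 10**10)

# Computer B: merge sort, 50 n lg n instructions, 10^7 instructions/second.
time_b = seconds_needed(50 * n * math.log2(n), 10**7)

print(f"computer A: {time_a:,.0f} seconds")
print(f"computer B: {time_b:,.0f} seconds")
print(f"B finishes {time_a / time_b:.1f} times sooner")
```

With these inputs computer B needs under 20 minutes, so the slower machine running the better algorithm finishes more than 17 times sooner.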
from statistics, computer science, and optimization. The design and analysis of
algorithms is fundamental to the field. The core techniques of data science, which
overlap significantly with those in machine learning, include many of the algorithms
in this book.
Furthermore, with the ever-increasing capacities of computers, we use them to
solve larger problems than ever before. As we saw in the above comparison be-
tween insertion sort and merge sort, it is at larger problem sizes that the differences
in efficiency between algorithms become particularly prominent.
Having a solid base of algorithmic knowledge and technique is one characteristic
that defines the truly skilled programmer. With modern computing technology, you
can accomplish some tasks without knowing much about algorithms, but with a
good background in algorithms, you can do much, much more.
Exercises
1.2-1
Give an example of an application that requires algorithmic content at the applica-
tion level, and discuss the function of the algorithms involved.
1.2-2
Suppose that for inputs of size $n$ on a particular computer, insertion sort runs in $8n^2$
steps and merge sort runs in $64 n \lg n$ steps. For which values of $n$ does insertion
sort beat merge sort?
1.2-3
What is the smallest value of $n$ such that an algorithm whose running time is $100n^2$
runs faster than an algorithm whose running time is $2^n$ on the same machine?
Problems
              1 second   1 minute   1 hour   1 day   1 month   1 year   1 century
$\lg n$
$\sqrt{n}$
$n$
$n \lg n$
$n^2$
$n^3$
$2^n$
$n!$
Chapter notes
There are many excellent texts on the general topic of algorithms, including those
by Aho, Hopcroft, and Ullman [5, 6], Dasgupta, Papadimitriou, and Vazirani [107],
Edmonds [133], Erickson [135], Goodrich and Tamassia [195, 196], Kleinberg
and Tardos [257], Knuth [259, 260, 261, 262, 263], Levitin [298], Louridas [305],
Mehlhorn and Sanders [325], Mitzenmacher and Upfal [331], Neapolitan [342],
Roughgarden [385, 386, 387, 388], Sanders, Mehlhorn, Dietzfelbinger, and De-
mentiev [393], Sedgewick and Wayne [402], Skiena [414], Soltys-Kulinicz [419],
Wilf [455], and Williamson and Shmoys [459]. Some of the more practical as-
pects of algorithm design are discussed by Bentley [49, 50, 51], Bhargava [54],
Kochenderfer and Wheeler [268], and McGeoch [321]. Surveys of the field of algorithms
can also be found in books by Atallah and Blanton [27, 28] and Mehta and
Sahni [326]. For less technical material, see the books by Christian and Griffiths
[92], Cormen [104], Erwig [136], MacCormick [307], and Vöcking et al. [448].
Overviews of the algorithms used in computational biology can be found in books
by Jones and Pevzner [240], Elloumi and Zomaya [134], and Marchisio [315].