An Open Guide To Data Structures and Algorithms
Publisher's Note
This textbook was peer-reviewed, copyedited, and published
through the Private Academic Library Network of Indiana (PALNI)
PALSave Textbook Creation Grants Program, which is funded by
the Lilly Endowment Inc. For more information about the PALSave:
PALNI Affordable Learning Program, visit the PALSave website.
If you have comments, suggestions, or corrections for this
textbook, please send them to [email protected].
1. Algorithms, Big-O, and Complexity
Learning Objectives
Introduction
What Is an Algorithm?
If you are reading this book, you have at least some experience with
programming. What if you are asked to express these procedures
using a computer program? Converting procedure A is daunting
from the start. What does it mean to spread the deck out on the
table? With all “cards” strewn across a “table,” we have no
programmatic way of looking at each card. How do you know when
you have found the ace of spades? In most programming languages
there is no comparison that allows you to determine if a list of items
is equal to a single value. However, procedure B can (with relative
ease) be expressed in any programming language. With a couple of
common programming constructs (lists, iteration, and conditionals),
someone with even modest experience can typically express this
procedure in a programming language.
The issue is further complicated if we attempt to be too
explicit, providing details that obscure the desired intent. What if
instead of saying “Then hand him the card,” you say, “Using your
index finger on the top of the card and your thumb on the bottom,
apply pressure to the card long enough to lift the card 15
centimeters from the table and 40 centimeters to the right to hand
it to your professor.” In this case, your instructions have become
more explicit, but at the cost of obfuscating your true objective.
Striking a balance between explicit and sufficient requires practice.
As a rule, pay attention to how others specify algorithms and
generally lean toward being more explicit than less.
Constraints
Scalability
Measuring Scalability
Algorithm Analysis
f(n) = 1n + 1n + 1
f(n) = 2n + 1.
This implies that, regardless of how many cards we have in our deck,
we have a small cost of 1 (namely, the amount of time to return the
matching card). However, if we add two jokers, we have increased
our deck size by 2, thus increasing our cost by 4. Notice that the
first term in our function is variable (2n changes as n changes),
but the second term is fixed. As n gets larger, the variable 2n term
has a much greater impact on the overall cost than the fixed cost of
returning the matching card.
Big-O Notation
f(n) = O(g(n)) if f(n) ≤ cg(n) for some c > 0 and all n > n0.
2n + 1 ≤ cn
2n + 1 ≤ 3n
1 ≤ n
If m(n) = n^2 + 1 and g(n) = n^2, then we can show m(n) = O(n^2):
n^2 + 1 ≤ cn^2
n^2 + 1 ≤ 2n^2
Notably, it is possible to show that f(n) = O(n^2). However, it
is not true that m(n) = O(n). Showing this has been left as an exercise
at the end of the chapter.
Another Example
s(n) = 64n
t(n) = n^2.
Other Notations
The remainder of this book (and most books for that matter)
typically uses Big-O notation. However, other sources often
reference Big-Theta notation and a few also use Big-Omega. While
we will not use either of these extensively, you should be familiar
with them. Additional notations outside these three do exist but are
encountered so infrequently that they need not be addressed here.
Big-Omega Notation
Big-Theta Notation
Exercises
a. f(n) = 8n + 4n
b. f(n) = n(n+1)/2
c. f(n) = 12
d. f(n) = 1000n^2
Learning Objectives
Introduction
This function takes a number, n, as an input parameter
and then defines a procedure to repeatedly add 2 to a sum. This
approach to calculation uses a for-loop explicitly. Such an approach
is sometimes referred to as an iterative process. This means that
the calculation repeatedly modifies a fixed number of variables that
change the current “state” of the calculation. Each pass of the loop
updates these state variables, and they evolve toward the correct
value. Imagine how this process will evolve as a computer executes
the function call multiplyBy2(3). A “call” asks the computer to
evaluate the function by executing its code.
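As a rough Python sketch (the function name multiplyBy2 comes from the text; the exact loop structure is an assumption, not the book's original listing), the iterative procedure might look like this:

def multiplyBy2(n):
    # Repeatedly add 2 to a running sum, once for each of the n loop passes.
    sum = 0
    for i in range(n):
        sum = sum + 2
    return sum

# multiplyBy2(3) evaluates (2 + 2 + 2) and returns 6.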
When the process starts, sum is 0. Then the process
iteratively adds 2 to the sum. The process generated is equivalent to
the mathematical expression (2 + 2 + 2). The following table shows
the value of each variable (i, sum, and n) at each time step:
Figure 2.1
Recursive Multiplication
We now have some understanding of the features of all recursive
procedures.
Features of Recursive Procedures
Recursive Exponentiation
2^3 = 2 * 2^2
    = 2 * 2 * 2^1
    = 2 * 2 * 2 * 2^0
    = 2 * 2 * 2 * 1 = 8.
The explicit 2^0 is shown to help us think about the
recursive and base cases. From this example, we can formulate two
general rules about exponentiation using 2 as the base:
2^0 = 1
2^n = 2 * 2^(n−1).
The sections that follow describe the features of recursive
procedures and give some background on the motivation for
recursion.
Before we begin, recall from chapter 1 that a procedure can
be thought of as a specific implementation of an algorithm. While
these are indeed two separate concepts, they can often be used
interchangeably when considering pseudocode. This is because the
use of an algorithm in practice should be made as simple as possible.
Often this is accomplished by wrapping the algorithm in a simple
procedure. This design simplifies the interface, allowing the
programmer to easily use the algorithm in some other software
context such as a program or application. In this chapter, you can
treat algorithm and procedure as the same idea.
F0 = 0
F1 = 1
F2 = 1
F3 = 2
F4 = 3.
Fn = ?.
F0=0
F1=1
Fn=Fn−1 + Fn−2.
F8=F8−1 + F8−2
=F7 + F6
…and so on.
This may be a long process if we are doing this by hand. This could
be an especially long process if we fail to notice that F7−2 is the same
as F6−1.
rewritten to be more efficient. Unfortunately, the efficiency of the
implementation comes with the cost of losing the simplicity of the
recursive code. Sacrificing simplicity leads to a more difficult
implementation. Difficult implementations allow for more bugs.
Recursive Structure
Base Cases
F0 = 0
F1 = 1
Recursive Case
Fn = Fn−1 + Fn−2
the recursive case handles the calculation, and two more calls to
the procedure are executed. You may be thinking, “Wait, for every
function call, it calls itself two more times? That doesn’t sound
efficient.” You would be right. This is not an efficient way to
calculate the Fibonacci numbers. We will revisit this idea in more
detail later in the chapter.
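As a hedged Python sketch (the name fib and its exact structure are assumptions, not the book's listing), the recurrence translates directly into a recursive procedure:

def fib(n):
    # Base cases: F0 = 0 and F1 = 1.
    if n == 0:
        return 0
    if n == 1:
        return 1
    # Recursive case: Fn = Fn-1 + Fn-2, so each call spawns two more calls.
    return fib(n - 1) + fib(n - 2)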
Before we move on, let us revisit the two examples from
the introduction, multiplication and exponentiation. We can define
these concepts using recurrence relations too. We will use the
generic letter a for these definitions.
Multiplication by 2
Base Case
a0 = 0
Recursive Case
an = 2 + an−1
Powers of 2
Base Case
a0 = 1
Recursive Case
an = 2 * an−1
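As a rough Python sketch (names assumed), these two recurrence relations map directly onto recursive procedures:

def recursiveMultiplyBy2(n):
    # Base case: a0 = 0.
    if n == 0:
        return 0
    # Recursive case: an = 2 + a(n-1).
    return 2 + recursiveMultiplyBy2(n - 1)

def recursivePowerOf2(n):
    # Base case: a0 = 1.
    if n == 0:
        return 1
    # Recursive case: an = 2 * a(n-1).
    return 2 * recursivePowerOf2(n - 1)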
Recursion and the Runtime Stack
The runtime stack is a section of memory that stores data
associated with the dynamic execution of many function calls.
When a function is called at runtime, all the data associated with
that function call, including input arguments and locally defined
variables, are placed on the stack. This is known as “pushing” to
the stack. The data that was pushed onto the stack represents
the state of the function call as it is executing. You can think of
this as the function’s local environment during execution. We call
this data stored on the stack a stack frame. The stack frame is a
contiguous section of the stack that is associated with a specific call
to a procedure. There may be many separate stack frames for the
same procedure. This is the case with recursion, where the same
function is called many times with distinct inputs.
Once the execution of a function completes, its data are no
longer needed. At the time of completion, the function’s data are
removed from the top of the stack or “popped.” Popping data from
the runtime stack frees it, in a sense, and allows that memory to be
used for other function calls that may happen later in the program
execution. As the final two steps in the execution of the procedure,
the function’s stack frame is popped, and its return value (if the
function returns a value) is passed to the caller of the function.
The calling function may be a “main” procedure that is driving the
program, or it may be another call to the same function as in
recursion. This allows the calling function to proceed with its
execution. Let’s trace a simple recursive algorithm to better
understand this process.
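The traced program itself is not quoted in the discussion that follows; a plausible Python sketch consistent with it (a main procedure whose result variable receives the value of a recursive call and is then printed) is given below. The line numbers mentioned in the trace refer to the original listing, not to this sketch.

def multiplyBy2(n):
    # Each call pushes a new stack frame holding its own copy of n.
    if n == 0:
        return 0
    return 2 + multiplyBy2(n - 1)

def main():
    # main's frame reserves space for result before the call is made.
    result = multiplyBy2(3)
    # By the time this line runs, every recursive frame has been popped.
    print(result)  # prints 6

main()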
The computer executes programs in a step-by-step
manner, executing one instruction after another. In this example,
suppose that the computer begins execution on line 8 at the start
of the main procedure. As main needs space to store the resulting
value, we can imagine that a place has been reserved for main’s
result variable and that it is stored on the stack. The following figure
shows the stack at the time just before the function is called on line
8.
Figure 2.2
Figure 2.3
Figure 2.4
Figure 2.5
Figure 2.6
Figure 2.7
Figure 2.8
When the program executes the print command, “6” would appear on the
screen. This concludes our trace of the dynamic execution of our
program.
You may now be thinking, “This seems like a lot of detail. Why do
we need to know all this stuff?” There are two main reasons why
we want you to better understand the runtime stack. First, it should
be noted that function calls are not free. There is overhead involved
in making a call to a procedure. This involves copying data to the
stack, which takes time. The second concern is that making function
calls, especially recursive calls, consumes memory. Understanding
how algorithms consume the precious resources of time and space
is fundamental to understanding computer science. We need to
understand how the runtime stack works to be able to effectively
reason about memory usage of our recursive algorithms.
Recursive Reverse
Suppose you are given the text string “HELLO,” and we wish to print
its letters in reverse order. We can construct a recursive algorithm
that prints a specific letter in the string after calling the algorithm
again on the next position. Before we dive into this algorithm, let’s
explain a few conventions that we will use.
First, we will treat strings as lists or arrays of characters.
Characters are any symbols such as letters, numbers, or other type
symbols like “*,” “+,” or “&.” Saying they are lists or arrays means
that we can access distinct positions of a string using an index.
For example, let message be a string variable, and let its value be
“HELLO.” If we access the elements of this array using zero-based
indexing, then message[0] is the first letter of the string, or “H.”
We will be using zero-based indexing (or 0-based indexing) in this
textbook. Switching between 0-based and 1-based indexing is usually
straightforward, although it requires some thought when
converting complex algorithms. Next, saying that a string is an array
is incorrect in most programming languages. An array is specifically
a fixed-size, contiguous block of memory designed to store multiple
values of the same type. Most programming languages provide a
string type that is more robust. Strings are usually represented as
data structures that provide more functionality than just a block
of memory. We will treat strings as a data structure that is like an
array with a little more functionality. For example, we will assume
that the functionality exists to determine the size of a string from
its variable in constant time or O(1). Specifically, we will use the
following function.
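As a Python sketch (the helper name length and the exact structure are assumptions, not the book's original listings), the O(1) size function and the recursive reverse-printing procedure might look like this:

def length(message):
    # Assumed to report the number of characters in constant time, O(1).
    return len(message)

def recReverse(message, index):
    # Base case: index has moved past the last character; nothing to print.
    if index >= length(message):
        return
    # Recurse on the next position first...
    recReverse(message, index + 1)
    # ...then print this character, so the output appears in reverse order.
    print(message[index])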
Calling recReverse(“HELLO”, 0) should give the following
text printed to the screen:
O
L
L
E
H
This demonstrates that the recursive algorithm can print
the characters of a string in reverse order without using excessive
index manipulation. Notice the order of the print statement and
the recursive call. If the order of lines 7 and 8 were switched, the
characters would print in their normal order. This algorithm
resembles another recursive algorithm for visiting the nodes of a
tree data structure that you will see in a later chapter.
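A minimal sketch of such a wrapper (the name reverse follows the discussion below; its exact form is an assumption):

def reverse(message):
    # Public-facing wrapper: hides the starting index from the caller.
    recReverse(message, 0)

# reverse("HELLO") prints O, L, L, E, H on separate lines.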
Now we may call reverse(“HELLO”), which, in turn, makes
a call that is equivalent to recReverse(“HELLO”, 0). This design
method is sometimes called wrapping. The recursive algorithm is
wrapped in a function that simplifies the interface to the recursive
algorithm. This is a very common pattern for dealing with recursive
algorithms that need to carry some state of the calculation as input
parameters. Some programmers may call reverse the wrapper and
recReverse the helper. If your language supports access specifiers
like “public” and “private,” you should make the wrapper a “public”
function and restrict the helper function by making it “private.” This
practice prevents programmers from accidentally mishandling the
index by restricting the use of the helper function.
The equations below illustrate this process. Here, let a
equal 28, and let b equal 16:
b / a = 16 / 28
(16 / 4) / (28 / 4) = 4 / 7.
This process will continue to reduce a and b in sequence
until the remainder is 0, ultimately finding the greatest common
divisor. This provides a good example of a recursive numerical
algorithm that has a practical use, which is for simplifying fractions.
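A standard recursive Euclidean sketch in Python, consistent with the reduce-until-the-remainder-is-0 description (the function name gcd is an assumption), is:

def gcd(a, b):
    # Base case: when the remainder b reaches 0, a is the greatest common divisor.
    if b == 0:
        return a
    # Recursive case: replace (a, b) with (b, a mod b).
    return gcd(b, a % b)

# gcd(28, 16) -> gcd(16, 12) -> gcd(12, 4) -> gcd(4, 0) -> 4,
# so 16/28 simplifies to 4/7 after dividing both parts by 4.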
For the recMin algorithm, which recursively finds the minimum value
in an array, the base case occurs when the recursive process reaches
the last position of the array. This signals the end
of recursion, and the currentMin value is returned. In the recursive
case, we compare the value of the current minimum with the value
at the current position. Next, a recursive call is made that uses the
appropriate current minimum and increases the position.
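A Python sketch consistent with this description (recMin is named in the exercises; the wrapper name and the exact parameter handling are assumptions):

def recMin(values, position, currentMin):
    # Base case: the last position has been reached; return the smaller of
    # currentMin and the final value.
    if position == len(values) - 1:
        if values[position] < currentMin:
            return values[position]
        return currentMin
    # Recursive case: compare, then recurse with the updated minimum and
    # the next position.
    if values[position] < currentMin:
        currentMin = values[position]
    return recMin(values, position + 1, currentMin)

def findMin(values):
    # Wrapper supplying the starting position and initial minimum.
    return recMin(values, 0, values[0])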
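The discussion now turns to an iterative powers-of-2 procedure. As a rough Python sketch (name and structure assumed), such a procedure initializes a product to 1 and doubles it once per loop pass, performing n multiplications in total:

def powerOf2(n):
    # product starts at 1 (2 to the power 0).
    product = 1
    for i in range(n):
        # One multiplication per pass, n multiplications in total.
        product = product * 2
    return product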
Calculating 2^6 by hand involves the following sequence of 5 multiplications: 2 * 2 * 2 * 2 * 2 * 2. If we
added 1 to n, we would have 6 multiplications for an n of 7. Actually,
according to the algorithm as written, the correct sequence for an n
equal to 6 would be 1 * 2 * 2 * 2 * 2 * 2 * 2. When n is 6, this sequence
has exactly n multiplications counting the first multiplication by 1.
This could be optimized away, but it is a good demonstration that
with or without the optimization the time complexity would be O(n).
As a reminder, O(n − 1) is equivalent to O(n). The time complexity
O(n) is also known as linear time complexity.
For this algorithm, what is the space complexity? First, let
us ask how many variables are used. Well, space is needed for the
input parameter n and the product variable. Are any other variables
needed? In a technical sense, perhaps some space would be needed
to store the literal numerical values of 1 and 2, but this may depend
on the specific computer architecture in use. We ignore these issues
for now. That leaves the variables n and product. If the input
parameter n was assigned the value of 10, how many variables would
be needed? Just two variables would be needed. If n equaled 100,
how many variables would be needed? Still, only two variables would
be needed. This leads us to the conclusion that only a constant
number of variables are needed regardless of the size of n.
Therefore, the space complexity of this algorithm is O(1), also known
as constant space complexity.
In summary, this iterative procedure takes O(n) time and
uses O(1) space to arrive at the calculated result. Note: This analysis
is a bit of a simplification. In reality, the size of the input is
proportional to log(n) of the value, as the number n itself is
represented in approximately log2(n) bits. This detail is often
ignored, but it is worth mentioning. The point is to build your
intuition for Big-O analysis by thinking about how processes need
to grow with larger inputs.
Recursive Powers of 2
T(0) = 1
T(n) = 1 + T(n − 1)
     = 1 + 1 + T(n − 2)
     = 2 + T(n − 2)
     = 2 + 1 + T(n − 3)
     = 3 + T(n − 3).
T(n) = n + T(n − n)
     = n + T(0)
     = n + 1.
Figure 2.9
Tail-Call Optimization
For this algorithm to work correctly, any external call
should be made with product set to 1. This ensures that an exponent
of n equal to 0 returns 1. This means it would be a good idea to wrap
this algorithm in a wrapper function to avoid unnecessary errors
when a programmer mistakenly calls the algorithm without product
set to 1. This simple wrapper function is presented below:
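As a hedged Python sketch (names assumed, not the book's original listing), the tail-recursive procedure and its wrapper might look like this:

def tailPowerOf2(n, product):
    # The recursive call is the last action performed, so the frame's work is
    # finished before the call is made (a tail call).
    if n == 0:
        return product
    return tailPowerOf2(n - 1, 2 * product)

def powerOf2(n):
    # Wrapper: guarantees that the accumulator product starts at 1.
    return tailPowerOf2(n, 1)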
Figure 2.10
Figure 2.11
Powers of 2 in O(log n) Time
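A Python sketch of fastPowerOf2 consistent with the trace below (square and fastPowerOf2 are named in the text; the exact structure is an assumption):

def square(x):
    return x * x

def fastPowerOf2(n):
    # Base case.
    if n == 0:
        return 1
    # Odd exponent: peel off one factor of 2.
    if n % 2 == 1:
        return 2 * fastPowerOf2(n - 1)
    # Even exponent: compute half the power once and square it,
    # roughly halving n on each recursive step.
    return square(fastPowerOf2(n // 2))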
Let us think about the process this algorithm uses to calculate 2^9.
This process starts with a call to fastPowerOf2(9). Since
9 is odd, we would multiply 2 * fastPowerOf2(8). This expression
then expands to 2 * square(fastPowerOf2(4)). Let us look at this in
more detail.
fastPowerOf2(9) -> 2 * fastPowerOf2(8)
                -> 2 * square(fastPowerOf2(4))
                -> 2 * square(square(fastPowerOf2(2)))
                -> 2 * square(square(square(fastPowerOf2(1))))
                -> 2 * square(square(square(2 * fastPowerOf2(0))))
                -> 2 * square(square(square(2 * 1)))
                -> 2 * square(square(4))
                -> 2 * square(16)
                -> 2 * 256
                -> 512
It may not be clear that this algorithm is faster than the
previous recursivePowerOf2, which was bounded by O(n). The linear
calculation of 2^9 would look like this: 2 * 2 * 2 * 2 * 2 * 2 * 2 * 2 * 2 * 1,
with 9 multiplications. For this algorithm,
we could start thinking about the calculation that is equivalent to
2 * square(square(square(2 * 1))). Our first multiplication is 2 * 1.
Next, the square procedure multiplies 2 * 2 to get 4. Then 4 is
squared, and then 16 is squared for 2 more multiplications. Finally,
we get 2 * 256, giving the solution of 512 after only 5 multiply
operations. At least for an n of 9, fastPowerOf2 uses fewer multiply
operations. From this, we could imagine that for larger values of
n, the fastPowerOf2 algorithm would yield even more savings with
fewer total operations compared to the linear algorithm.
Let us now examine the time complexity of fastPowerOf2
using a recurrence relation. The key insight is that roughly every
time the recursive algorithm is called, n is cut in half. We could
model this as follows: Let us assume that there is a small constant
number of operations associated with each call to the recursive
procedure. We will represent this as c. Again, we will use the
function T(n) to represent the worst-case number of operations for
the algorithm. So for this algorithm, we have T(n) = c + T(n/2). Let us
write this and the base case as a recurrence relation:
T(0) = c
T(n) = c + T(n/2).
T(n) = c + T(n/2)
     = c + c + T(n/4)
     = c + c + c + T(n/8).
T(n) = c + T(n/2^1)
     = 2c + T(n/2^2)
     = 3c + T(n/2^3).
T(n) = k * c + T(n/2^k).
So when will n/2^k be 1?
We can solve for k in the following equation:
n/2^k = 1
2^k * (n/2^k) = 2^k * 1
n = 2^k.
log2 n = k.
Now we can rewrite the original formula:
T(n)=log2 n * c + T(1).
T(n)=log2 n * c + T(1)
=log2 n * c + 2c
=c * (log2 n + 2).
these calls will consume memory by storing data on the runtime
stack. As n is divided each time, we will reach a base case after O(log
n) recursive calls. This means that in the worst case, the algorithm
will have O(log n) stack frames taking up memory. Put simply, its
space complexity is O(log n).
Exercises
3. Modify the recMin algorithm to create a function
called recMinIndex. Your algorithm should return the
index of the minimum value in the array rather than the
minimum value itself. Hint: You should add another
parameter to the helper function that keeps track of the
minimum value’s index.
4. Write an algorithm called simplify that prints the
simplified output of a fraction. Have the procedure accept
two integers representing the numerator and
denominator of a fraction. Have the algorithm print a
simplified representation of the fraction. For example,
simplify(8, 16) should print “1 / 2.” Use the GCD algorithm
as a subroutine in your procedure.
5. Implement the three powerOf2 algorithms
(iterative, recursive, and fastPowerOf2) in your language
of choice. Calculate the runtime of the different
algorithms, and determine which algorithms perform best
for which values of n. Using your data, create a “super”
algorithm that checks the size of n and calls the most
efficient algorithm depending on the value of n.
References
3. Sorting
Learning Objectives
Introduction
To achieve these goals, some form of sorting algorithm must be
used. A key observation is that these sorting problems rely on a
specific comparison operator that imposes an ordering (“a” comes
before “b” in alphabetical order, and 10 < 12 in numerical order). As
a terminology note, alphabetical ordering is also known as lexical,
lexicographic, or dictionary ordering. Alphabetical and numerical
orderings are usually the most common orderings, but date or
calendar ordering is also common.
In this chapter, we will explore several sorting algorithms.
Sorting is a classic problem in computer science, not because you
will often need to write sorting algorithms in your professional life,
but because it offers an easy-to-understand problem with a diverse
set of solutions, making it an excellent starting point for the
analysis of algorithms.
To begin our study, let us take a simple example sorting
problem and explore a straightforward algorithm to solve it.
Figure 3.1
Our human mind could easily order these numbers from smallest
to largest without much effort. What if we had 20 values? 200? We
would quickly get tired and start making mistakes. For these values,
the correct ordering is 22, 24, 27, 35, 43, 45, 47, 48. Give yourself
some time to think about how you would solve this problem. Don’t
consider arrays or indexes or algorithms. Think about doing it just
by looking at the numbers. Try it now.
Reflect on how you solved the problem. Did you use your
fingers to mark the positions? Did you scan over all the values
multiple times? Taking some time to think about your process may
help you understand how a computer could solve this problem.
One simple solution would be to move the smallest value
in the list to the leftmost position, then attempt to place the next
smallest value in the next available position, and so on until reaching
the last value in the list. This approach is called Selection Sort.
Selection Sort
The algorithm works by repeatedly selecting the smallest
value in the given range of the array and then placing it in its proper
position along the array starting in the first position. With a little
thought for our design, we can construct this algorithm in a way
that greatly simplifies its logic.
Figure 3.2
Figure 3.3
For this algorithm, we can set any start coordinate and find
the index of the smallest value from the start to the end of the array.
This simple procedure gives us a lot of power, as we will learn.
Now that we have our tools created, we can write Selection
Sort. This leads to a simple implementation thanks to the design
that decomposed the problem into smaller tasks.
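As a hedged Python sketch (findMinIndex and selectionSort are named in the text; swap and the exact loop bounds are assumptions):

def findMinIndex(values, start):
    # Index of the smallest value from start to the end of the array.
    minIndex = start
    for i in range(start + 1, len(values)):
        if values[i] < values[minIndex]:
            minIndex = i
    return minIndex

def swap(values, i, j):
    values[i], values[j] = values[j], values[i]

def selectionSort(values):
    # Repeatedly select the smallest remaining value and place it at index.
    for index in range(len(values)):
        minIndex = findMinIndex(values, index)
        swap(values, index, minIndex)

# selectionSort([43, 27, 45, 24, 35, 47, 22, 48]) leaves the list sorted as
# [22, 24, 27, 35, 43, 45, 47, 48].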
Selection Sort Complexity
It should be clear that the sorting of the array using Selection Sort
does not use any extra space other than the original array and a few
extra variables. The space complexity of Selection Sort is O(n).
Analyzing the time complexity of Selection Sort is a little
trickier. We may already know that the complexity of finding the
minimum value from an array of size n is O(n) because we cannot
avoid checking every value in the array. We might reason that there
is a loop that goes from 1 to n in the algorithm, and our findMinIndex
should also be O(n). This idea leads us to think that calling an O(n)
function n times would lead to O(n^2). Is this correct? How can we
be sure? Toward the end of the algorithm’s execution, we are only
looking for the minimum value’s index from among 3, 2, or 1 values.
This seems close to O(1) or constant time. Calling an O(1) function
n times would lead to O(n), right? Practicing this type of reasoning
and asking these questions will help develop your algorithm analysis
skills. These are both reasonable arguments, and they have helped
establish a bound for our algorithm’s complexity. It would be safe
to assume that the actual runtime is somewhere between O(n) and
O(n^2). Let us try to tackle this question more rigorously.
When our algorithm begins, nothing is in sorted order
(assume a random ordering). Our index from line 3 of selectionSort
starts at 0. Next, findMinIndex searches all n elements from 0 to n −
1. Then we have the smallest value in position 0, and index becomes
1. With index 1, findMinIndex searches n − 1 values from 1 to n − 1.
This continues until index becomes n − 1 and the algorithm finishes
with all values sorted.
We have the following pattern:
With index at 0, n comparison operations are performed by findMinIndex.
With index at 1, n − 1 comparison operations are performed by findMinIndex.
With index at 2, n − 2 comparison operations are performed by findMinIndex.
…
With index at n − 2, 2 comparison operations are performed by findMinIndex.
With index at n − 1, 1 comparison operation is performed by findMinIndex.
Our runtime is represented by the sum of all these
operations. We could rewrite this in terms of the sum over the
number of comparison operations at each step:
n + (n − 1) + (n − 2) + … + 3 + 2 + 1.
Let S = n + (n − 1) + (n − 2) + … + 3 + 2 + 1.
Multiplying S by 2 gives
2 * S=[n + (n − 1) + (n − 2) + … + 3 + 2 + 1] + [n + (n − 1) +
(n − 2) + … + 3 + 2 + 1].
2 * S=[n + (n − 1) + (n − 2) + … + 3 + 2 + 1]
+ [1 + 2 + 3 + … + (n − 2) + (n − 1) + n].
2 * S=(n + 1) + (n + 1) + (n + 1) + … + (n + 1) + (n + 1) + (n +
1)
=n * (n + 1).
S=[n * (n + 1)]/2.
S = (½) * [n^2 + n]
  = (½) n^2 + (½) n.
Insertion Sort
Figure 3.4
Currently, the books are not organized alphabetically.
Insertion Sort starts by considering the first book as sorted and
placing an imaginary separator between the sorted and unsorted
books. The algorithm then considers the first book in the unsorted
portion.
Figure 3.5
Figure 3.6
Figure 3.7
Figure 3.8
Figure 3.9
Figure 3.10
For our implementation, we will again work with
arrays of integers, as with Selection Sort. Specifically, we will assume
that the values of positions in the array are comparable and will
lead to the correct ordering. The process will work equally well with
alphabetical characters as with numbers provided the relational
operators are defined for these and other orderable types. We will
examine a way to make comparisons more flexible later in the
chapter.
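The trace that follows refers to the variables endSorted, currentValue, and index. A Python sketch consistent with that trace (the exact structure is an assumption, not the book's original listing) is:

def insertionSort(values):
    end = len(values)
    endSorted = 1  # everything to the left of endSorted is already sorted
    while endSorted < end:
        currentValue = values[endSorted]
        index = endSorted
        # Inner loop: shift sorted values larger than currentValue one spot right.
        while index > 0 and values[index - 1] > currentValue:
            values[index] = values[index - 1]
            index = index - 1
        # Copy currentValue into the gap that was opened up.
        values[index] = currentValue
        endSorted = endSorted + 1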
Figure 3.11
Suppose the algorithm begins with endSorted set to
1 and end set to 8. Entering the body of the while-loop, we set
currentValue to the value 27, and index is set to endSorted, which is
1. The image below gives an illustration of this scenario:
Figure 3.12
Figure 3.13
Figure 3.14
Figure 3.15
Figure 3.16
Figure 3.17
Then…
Figure 3.18
Now with index 0, the inner loop’s body will not execute.
We will copy the currentValue into the index position and begin the
outer loop’s execution again. At the start of the next inner loop, we
have the scene below:
Figure 3.19
From this example execution, you may have noticed that sometimes
Insertion Sort does a lot of work, but other times it seems that
very little needs to be done. This observation allows us to consider
a different way to analyze algorithms—namely, the best-case time
complexity. Before we address this question, let us analyze the
worst-case space complexity and the worst-case time complexity of
Insertion Sort.
The space complexity of Insertion Sort should be easy to
determine. We only need space for the array itself and a few other
index- and value-storage variables. This means that our memory
usage is O(n) for the array and a small constant number of other
values where c << n (a constant much smaller than n). This means
our memory usage is bounded by n + c, and we have an O(n) space
complexity for Insertion Sort. The memory usage of Insertion Sort
takes O(n) for the array itself and O(1) for the other necessary
variables. This O(1) memory cost for the indexes and other values
is sometimes referred to as the auxiliary memory. It is the extra
memory needed for the algorithm to function in addition to the
storage cost of the array values themselves. This auxiliary memory
could be freed after the algorithm completes while keeping the
sorted array intact. We will revisit auxiliary memory later in the
chapter.
When considering the time complexity, we are generally
interested in the worst-case scenario. When we talk about a “case,”
we mean a particular instance of the problem that has some special
features that impact the algorithm’s performance. What special
features of the way our values are organized might lead to a good or
bad case for our algorithm? In what situation would we encounter
the absolute largest number of operations? As we observed in our
trace, the value 24 resulted in a lot of comparisons and move
operations. We continued to check each value and move all the
values greater than 24 one space to the right. In contrast, value 45
was nearly in the right place. For 45, we only “moved” it back into
the same place from which it came. Take a moment to think about
what these observations might mean for our worst-case and best-
case analysis.
Let us think about the case of 24 first. Why did 24 require so
many operations? Well, it was smaller than all the values that came
before it. In a sense, it was “maximally out of place.” Suppose the
value 23 came next in the endSorted position. This would require
us to move all the other values, including 24, over again to make
room for 23 in the first position. What if 22 came next? There may
be a pattern developing. We considered some values followed by
24, 23, then 22. These values would lead to a lot of work each time.
The original starting order of our previous array was 43, 27, 45, 24,
35, 47, 22, 48. For Insertion Sort, what would be the worst starting
order? Or put another way, which starting order would lead to the
absolute highest number of comparison and move operations? Take
a moment to think about it.
When sorting in increasing order, the worst scenario for
Insertion Sort would be an array where all values are in decreasing
order. This means that every value being considered for placement
is “maximally out of place.” Consider the case where our values were
ordered as 48, 47, 45, 43, 35, 27, 24, 22, and we needed to place them
in increasing order. The positions of 48 and 47 must be swapped.
Next, 45 must move to position 0. Next, 43 moves to position 0
resulting in 3 comparisons and 3 move (or assignment) operations.
Then, 35 results in 4 comparisons and 4 move operations to take its
place at the front. This process continues for smaller and smaller
values that need to be moved all the way to the front of the array.
From this pattern, we see that for this worst-case scenario,
the first value considered takes 1 comparison and 1 move operation.
The second value requires 2 comparisons and 2 moves. The third
takes 3 comparisons and 3 moves, and so on. As the total runtime is
the sum of the operations for all the values, we see that a function
for the worst-case runtime would look like the following equation.
The 2 accounts for an equal number of comparisons and moves:
T(n) = 2 * 1 + 2 * 2 + 2 * 3 + 2 * 4 + … + 2 * (n − 1).
T(n) = 2 * (1 + 2 + 3 + 4 + … + (n − 1)).
T(n) = 2 * {(½)[n * (n − 1)]}.
T(n) = n * (n − 1).
T(n) = n^2 − n.
This means that our worst-case time complexity is O(n^2).
This is the same as Selection Sort’s worst-case time complexity.
Now let us consider the best case. Recall that 45 required only a
single comparison against the sorted
portion of the array. Specifically, 45 is larger than the other values in
the sorted portion, meaning it is “already sorted.” Suppose we next
considered 46. Well, 46 would be larger than 45, which is already
larger than the other previous values. This means 46 is already
sorted as well, resulting in 1 additional comparison and 1 additional
move operation. We now know that the best-case scenario for
Insertion Sort is a correctly sorted array.
For our example array, this would be 22, 24, 27, 35, 43, 45, 47,
48. Think about how Insertion Sort would proceed with this array.
We first consider 24 with respect to 22. This gives 1 comparison and
1 move operation. Next, we consider 27, adding 1 comparison and 1
move, and so on until we reach 48 at the end of the array. Following
this pattern, 2 operations are needed for each of the n − 1 values to
the right of 22 in the array. Therefore, we have 2*(n − 1) operations
leading to a bound of O(n) operations for the best-case scenario.
This means that when the array is already sorted, Insertion Sort
will execute in O(n) time. This could be a significant cost savings
compared to the O(n^2) case.
The fact that Insertion Sort has a best-case time
complexity of O(n) and a worst-case time complexity of O(n^2) may
be hard to interpret. To better understand these features, let us
consider the best- and worst-case time complexity of Selection
Sort. Suppose that we attempt to sort an array with Selection Sort,
and that array is already sorted in increasing order. Selection Sort
will begin by finding the minimum value in the array and placing it
in the first position. The first value is already in its correct position,
but Selection Sort still performs n comparisons and 1 move. The
next value is considered, and n − 1 comparisons are executed. This
progression leads to another variation of the arithmetic series
(n + (n − 1) + (n − 2) + …), leading to O(n^2) time complexity. Suppose now
that we have the opposite scenario, where the array is sorted in
descending order. Selection Sort performs the same. It searches for
the minimum and moves it to the first position. Then it searches
for the second smallest value, moves it to position 1, and continues
with the remaining values. This again leads to O(n^2) time complexity.
We have now considered the already-sorted array and the
reverse-order-sorted array, and both cases led to O(n^2) time complexity.
Regardless of the orderings of the input array, Selection
Sort always takes O(n^2) operations. Depending on the input
configuration, Insertion Sort may take O(n^2) operations, but in other
cases, the time complexity may be closer to O(n). This gives
Insertion Sort a definite advantage over Selection Sort in terms of
time complexity. You may rightly ask, “How big is this advantage?”
Constant factors can be large, after all. The answer is “It depends.”
We may wish to ask, “How likely are we to encounter our best-case
scenario?” This question may only be answered by making some
assumptions about how the algorithms will be used or assumptions
about the types of value sets that we will be sorting. Is it likely we
will encounter data sets that are nearly sorted? Would it be more
likely that the values are in a roughly random order? The answers
to these questions will be highly context-dependent. For now, we
will only highlight that Insertion Sort has a better best-case time
complexity bound than Selection Sort.
Merge Sort
Student: “My new algorithm will
only take 98 years to complete compared to the 99 years of the
other algorithm.” Professor: “I would like to have the problem solved
before I pass away. Preferably, before I retire.” This is an extreme
example. Often, minor improvements to an algorithm can make a
very real impact, especially on real-time systems with small input
sizes. On the other hand, reducing the runtime bound by more than
a constant factor can have a drastic impact on performance. In
this section, we will present Merge Sort, an algorithm that greatly
improves on the runtime efficiency of Insertion Sort and Selection
Sort.
Figure 3.20
Merge Sort would first split the array in half and then make
a recursive call on the two subarrays. This would in turn split
repeatedly until each array is only a single element. For this
example, the process would look something like the image below.
Figure 3.21
Figure 3.22
Figure 3.23
Figure 3.24
Figure 3.25
Using this approach simplifies the code and makes sure the data are
sorted and put back into the original array.
These are the key ideas behind Merge Sort. Now we will
examine the implementation.
This function will take a split segment of an array, identified
by indexes, and merge the values of the left and right halves in order.
This will place their values into their proper sorted order in the
original array. Merge is the most complex function that we will write
for Merge Sort. This function is complex in a general sense because
it relies on the careful manipulation of indexes. This is a very error-
prone process that leads to many off-by-one errors. If you forget
to add 1 or subtract 1 in a specific place, your algorithm may be
completely broken. One advantage of a modular design is that these
functions can be tested independently. At this time, we may wish to
create some tests for the merge function. This is a good practice,
but it is outside of the scope of this text. You are encouraged to
write a test of your newly created function with a simple example
such as 27, 43, 24, 45 from the diagrams above.
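A Python sketch of a merge routine consistent with this description (inclusive start, mid, and end indexes are an assumption, not the book's exact interface):

def merge(values, start, mid, end):
    # Copy the two sorted halves into temporary arrays.
    left = values[start:mid + 1]
    right = values[mid + 1:end + 1]
    i = 0      # current position in left
    j = 0      # current position in right
    k = start  # position to write back into the original array
    # Repeatedly take the smaller front value from the two halves.
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            values[k] = left[i]
            i = i + 1
        else:
            values[k] = right[j]
            j = j + 1
        k = k + 1
    # Copy any leftover values from either half.
    while i < len(left):
        values[k] = left[i]
        i = i + 1
        k = k + 1
    while j < len(right):
        values[k] = right[j]
        j = j + 1
        k = k + 1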
Now that the merge implementation is complete, we will
move on to writing the recursive function that will complete Merge
Sort. Here we will use the mathematical “floor” function. This is
equivalent to integer division in C-like languages or truncation in
languages with fixed-size integers.
From this implementation, we see that the recursion
continues while the start index is less than the end index. Recursion
ends once the start and end index have the same value. In other
words, our process will continue splitting and splitting the data until
it reaches a level of a single-array element. At this point, recursion
ends, a single value is sorted by default, and the merging process
can begin. This process continues until the final two halves are
sorted and merged into their new positions within the starting
array, completing the algorithm.
Once more, we have a recursive algorithm requiring some
starting values. In cases like these, we should use a wrapper to
provide a more user-friendly interface. A wrapper can be
constructed as follows:
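A sketch of the recursive procedure and its wrapper, reusing the merge sketch above (the names are assumptions):

from math import floor

def mergeSortHelper(values, start, end):
    # Recursion continues while start is less than end; a single element
    # (start == end) is sorted by default.
    if start < end:
        mid = floor((start + end) / 2)
        mergeSortHelper(values, start, mid)
        mergeSortHelper(values, mid + 1, end)
        merge(values, start, mid, end)

def mergeSort(values):
    # Wrapper: hides the start and end indexes from the caller.
    mergeSortHelper(values, 0, len(values) - 1)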
Let us now analyze the complexity of Merge Sort, starting with the
cost of merging. Merging takes n/2 individual copies for each half
of the array. Then another
n copies back into the original array. This gives n/2 + n/2 + n total
copy operations, giving 2n. This would be O(n) or linear time.
Now that we know merging is O(n) we can start to think
about Merge Sort. Thinking about the top-level case at the start of
the algorithm, we can set up a function for the time cost of Merge
Sort:
T(n)=2*T(n/2) + c*n.
This captures the cost of Merge Sorting the two halves of the array
and the merge cost, which we determined would be O(n) or n times
some constant c. Substituting this equation into itself for T(n/2)
gives the following:
T(n) = 2*(2*T(n/4) + c*(n/2)) + c*n
     = 4*T(n/4) + c*n + c*n.
Each substitution produces another of these c*n
terms. Instead of continuing the recurrence, we will instead draw a
diagram to show how many of these we can expect.
Figure 3.26
Figure 3.27
Figure 3.28
Recall that the recursion stops when the start index
is equal to the end index. In other words, how many times can we
split before we reach the single-element level of the array? We just
need to solve 1 = n/2^k for the value k. Setting k to log2 n solves this
equation. Therefore, we have log2 n occurrences of the c*n terms.
If we assume that n is a power of 2, the overall time cost gives us
T(n) = n*T(1) + log2 n * c * n. For T(1), our wrapper function would
make one check and return. We can safely assume that T(1) = c, a
small constant. We could then write it as T(n) = n*c + log2 n * c *
n, or c*(n * log2 n + n). In Big-O terms, we arrive at O(n log2 n).
A consequence of Big-O is that constants are ignored. Logs in any
base can be related to each other by a constant factor (log2(n) =
log3(n)/log3(2), and note that 1/log3(2) is a constant), so the base
is usually dropped in computer science discussions. We can now
state the proper worst-case runtime complexity of Merge Sort is
O(n log n). It may not be obvious, but this improvement leads to a
fundamental improvement over O(n^2). For example, at n = 100, n^2 =
10000, but n*log2 n is approximately 665, which is less than a tenth
of the n^2 value. Merge Sort guarantees a runtime bounded by O(n
log n), as the best case and worst case are equivalent (much like
Selection Sort).
For space complexity, we will need at least enough memory
as we have elements of the array. So we need at least O(n) space.
Remember that we also needed some temporary storage for copying
the subarrays before merging them again. We need the most
memory for the case near the end of the algorithm. At this stage, we
have two n/2-sized storage arrays in addition to the original array.
This leads us to space for the original n values and another n value’s
worth of storage for the temporary values. That gives 2*n near the
end of the calculation, so overall memory usage seems to be O(n).
This is not the whole story though.
Since we are using a recursive algorithm, we may also
reason that we need stack space to store the sequence of calls.
Each stack frame does not need to hold a copy of the array. Usually,
arrays are treated as references. This means that each stack frame is
likely small, containing a link to the array’s location and the indexes
needed to keep our place in the array. This means that each
recursive call will take up a constant amount of memory. The other
question we need to address is “How large will the stack of frames
grow during execution?” We can expect as many recursive calls as
the depth of the treelike structure in the diagrams above. We know
now that that depth is approximately log n. Now we have all the
pieces to think about the overall space complexity of Merge Sort.
First, we need n values for the original array. Next, we will
need another n value for storage in the worst case (near the end of
the algorithm). Finally, we can expect the stack to take up around log
n stack frames. This gives the following formula using c1 and c2 to
account for a small number of extra variables associated with each
category (indexes for the temporary arrays, start and end indexes
for recursive calls, etc.): S(n) = n + c1*n + c2*log n. This leads to O(2*n
+ log n), which simplifies to just O(n) in Big-O notation. The overall
space complexity of Merge Sort is O(n).
We now revisit the idea of auxiliary memory. To measure it, we remove
the storage required for the data itself
from the equation and think about how much “extra” memory is
needed.
Let us try to think about auxiliary storage for Selection
Sort. We said that Selection Sort only uses the memory needed for
the array plus a few extra variables. Removing the array storage, we
are left with the “few extra variables” part. It means that a constant
number of “auxiliary” variables are needed, leading to an auxiliary
space cost of O(1) or constant auxiliary memory usage. Insertion
Sort falls into the same category, needing only a few index variables
in addition to storage used for the array itself. Insertion Sort has an
auxiliary memory cost of O(1).
An algorithm of this kind that requires only a constant
amount of extra memory is called an “in-place” algorithm. The
algorithm keeps array data as a whole within its original place in
memory (even if specific values are rearranged). Historically, this
was a very important feature of algorithms when memory was
expensive. Both Selection Sort and Insertion Sort are in-place
sorting algorithms.
Coming back to Merge Sort, we can roughly estimate the
memory usage with the function from before: S(n) = n + c1*n + c2*log n.
Now we remove n for the storage of the array to think about the
auxiliary memory. That leaves us with c1*n + c2*log n. This means
our auxiliary memory usage is bounded by O(n + log n) or just
O(n). This means that Merge Sort potentially needs quite a bit of
extra memory, and it grows proportionally to the size of the input.
This represents the major drawback of Merge Sort. On modern
computers, which have sizable memory, the extra memory cost is
usually worth the speed up, although the only way to know for sure
is to test it on your machine.
Quick Sort
The general idea of Quick Sort is to choose a pivot key value and
move any array element less than the pivot to the left side of the
array (for increasing or ascending order). Similarly, any value greater
than the pivot should move to the right. Now on either side of the
pivot, there are two smaller unsorted portions of the array. This
might look something like this: [all numbers less than pivot, the
pivot value, all numbers greater than pivot]. Now the pivot is in
its correct place, and the higher and lower values have all moved
closer to their final positions. The next step recursively sorts these
two portions of the array in place. The process of moving values to
the left and right of a pivot is called “partitioning” in this context.
There are many variations on Quick Sort, and many of them focus
on clever ways to choose the pivot. We will focus on a simple version
to make the runtime complexity easier to understand.
Quick Sort Implementation
This partition function does the bulk of the work for the
algorithm. First, the pivot is assumed to be the first value in the
array. The algorithm then places any value less than the pivot on
the left of the eventual position of the pivot value. This goal is
accomplished using the smallIndex value that holds the position
of the last value that was smaller than the pivot. When the loop
advances to a position that holds a value smaller than the pivot, the
algorithm exchanges the smaller value with the one to the right of
smallIndex (the rightmost value considered so far that is smaller
than the pivot). Finally, the algorithm exchanges the first value, the
pivot, with the small value at smallIndex to put the pivot in its final
position, and smallIndex is returned. The final return provides the
pivot value’s index for the recursive process that we will examine
next.
Using recursion, the remainder of the algorithm is simple
to implement. We will recursively sort by first partitioning the
values between start and end. By calling partition, we are
guaranteed to have the pivot in its correct position in the array.
Next, we recursively sort all the remaining values to the left and
right of the pivot location. This completes the algorithm, but we may
wish to create a nice wrapper for this function to avoid so much
index passing.
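A Python sketch of partition, the recursive procedure, and a wrapper, consistent with the description above (the exact details are assumptions, not the book's original listings):

def partition(values, start, end):
    pivot = values[start]   # the pivot is assumed to be the first value
    smallIndex = start      # last position known to hold a value smaller than the pivot
    for index in range(start + 1, end + 1):
        if values[index] < pivot:
            # Exchange the smaller value with the one just to the right of
            # the previous small values.
            smallIndex = smallIndex + 1
            values[smallIndex], values[index] = values[index], values[smallIndex]
    # Put the pivot into its final position and report that position.
    values[start], values[smallIndex] = values[smallIndex], values[start]
    return smallIndex

def quickSortHelper(values, start, end):
    if start < end:
        pivotIndex = partition(values, start, end)
        # Recursively sort the values on each side of the pivot.
        quickSortHelper(values, start, pivotIndex - 1)
        quickSortHelper(values, pivotIndex + 1, end)

def quickSort(values):
    # Wrapper to avoid passing indexes at every call site.
    quickSortHelper(values, 0, len(values) - 1)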
Figure 3.29
Figure 3.30
Figure 3.31
The next value, 24, is smaller than our pivot. This means the
algorithm updates smallIndex and exchanges the values.
Figure 3.32
Now the array is in the following state just before the loop
finishes. Next, the index will be incremented, and position 4 will be
examined. Notice how smallIndex always points to the rightmost
value smaller than the pivot.
Figure 3.33
Figure 3.34
Figure 3.35
Figure 3.36
Notice that in this example the pivot 22
is the smallest value in its partition. This leads to an uneven split.
When Quick Sort runs recursively on these parts, the left side is
split unevenly, but the right side is split evenly in half.
Figure 3.37
Figure 3.38
Figure 3.39
In practice, Quick Sort often performs comparably to, or even better than,
Merge Sort in many real-world settings, and it has the advantage
of being an “in-place” sorting algorithm. Let’s next explore some of
these ideas and try to understand why Quick Sort is a great and
highly used algorithm in practice even with a worst-case complexity
of O(n^2).
We now know that bad choices of the pivot can lead to poor
performance. Consider the example Quick Sort execution above.
The first pivot of 43 was near the middle, but 22 was a bad choice in
the second iteration. The choice of 47 on the right side of the second
iteration was a good choice. Let us assume that the values to be
sorted are randomly distributed. This means that the probability of
choosing the worst pivot should be 1/n. The probability of choosing
the worst pivot twice in a row would be 1/n * 1/(n − 1). This
probability is shrinking rapidly.
Here is another way to think about it. The idea of
repeatedly choosing a bad pivot by chance is the same as
encountering an already sorted array by chance. So the already
sorted order is one ordering out of all possible orderings. How many
possible orderings exist? Well, in our example we have 8 values. We
can choose any of the 8 as the first value. Once the first value is
chosen, we can choose any of the remaining 7 values for the second
value. This means that for the first two values, we already have 8*7
choices. Continuing this process, we get 8*7*6*5*4*3*2*1. There is a
special mathematical function for this called factorial. We represent
factorial with an exclamation point (!). So we say that there are 8!
or 8 factorial possible orderings for our 8 values. That is a total of
40,320 possible orderings with just 8 values! That means that the
probability of encountering by chance an already sorted list of 8
values is 1/8! or 1/40,320, which is 0.00002480158. Now imagine the
perfectly reasonable task of sorting 100 values. The value of 100! is
greater than the estimated number of atoms in the universe! This
makes the probability of paying the high cost of O(n^2) extremely
unlikely for even relatively small arrays.
To think about the average-case complexity, we need to
consider the complexity across all cases. We could reason that
making the absolute best choice for a pivot is just as unlikely as
making the absolute worst choice. This means that the vast majority
of cases will be somewhere in the middle. Researchers have studied
Quick Sort and determined that the average complexity is O(n log n).
We won’t try to formally prove the average case, but we will provide
some intuition for why this might be. Sequences of bad choices for
a pivot are unlikely. When a pivot is chosen that partitions the array
unevenly, one part is smaller than the other. The smaller subset will
then terminate more quickly during recursion. The larger part has
another chance to choose a decent pivot, moving closer to the case
of a balanced partition.
As we mentioned, the choice of pivot can further improve
the performance of Quick Sort by working harder to avoid choosing
a bad pivot value. Some example extensions are to choose the pivot
randomly or to select 3 values from the list and choose the median.
These can offer some improvement over choosing the first element
as the pivot. Other variations switch to Insertion Sort once the
size of the partitions becomes sufficiently small, taking advantage
of the fact that small data sets may often be almost sorted, and
small partitions can take advantage of CPU cache efficiency. These
modifications can help improve the practical measured runtime but
do not change the overall Big-O complexity.
We should also discuss the space complexity for Quick Sort. Quick
Sort is an “in-place” sorting algorithm. So we do not require any
extra copies of the data. The tricky part about considering the
space complexity of Quick Sort is recognizing that it is a recursive
algorithm and therefore requires stack space. As with the worst-
case time complexity, it is possible that recursive calls to Quick
Sort will require stack space proportional to n. This leads to O(n)
elements stored in the array and O(n) extra data stored in the stack
frames. In the worst case, we have O(n + n) total space, which is
just O(n). Of the total space, we need O(n) auxiliary space for stack
data during the recursive execution of the algorithm. This scenario
is unlikely though. The average case leads to O(n + log n) space if our
recursion only reaches a depth of log n rather than n. This is still just
O(n) space complexity, but the auxiliary space is now only O(log n).
Exercises
the sort.
f. Consider creating functions to repeatedly run
an algorithm and record the average sorting time.
2. Implement Selection Sort and Insertion Sort in
your language of choice. Using randomly generated array
data, try to find the number of values where Insertion
Sort begins to improve on Selection Sort. Remember to
repeat the sorting several times to calculate an average
time.
3. Implement Merge Sort and repeat the analysis for
Merge Sort and Insertion Sort. For what n does Merge
Sort begin to substantially improve on Insertion Sort? Or
does it seem to improve at all?
4. Implement Quick Sort in your language of choice.
Next, determine the time for sorting 100 values that are
already sorted using Quick Sort (complete the 1.d
exercise). Next, randomly generate 1,000 arrays of size
100 and sort them while calculating the runtime. Are any
as slow as Quick Sort on the sorted array? If so, how
many? If not, what is the closest time?
References
JaJa, Joseph. “A Perspective on Quicksort.” Computing
in Science & Engineering 2, no. 1 (2000): 43–49.
4. Search
Learning Objectives
Introduction
Figure 4.1
Linear Search
Linear Search examines each value in the array, one position at a
time, until the key is found. If we are lucky and the key happens to
be at the very front of the array, the search amounts
to an O(1) or constant number of operations. In algorithm analysis
(as in stock market investing), luck is not a strategy. We still need
to consider the worst-case behavior of the algorithm, as this
characteristic makes a better tool for evaluating one algorithm
against another. In general, we cannot choose the problems we
encounter, and our methods should be robust against all types of
problems that are thrown at us. The worst case for Linear Search
would be a problem where our key is found at the end of the array
or isn’t found at all. For inputs with this feature, our time complexity
bound is O(n).
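A minimal Python sketch of Linear Search (returning the index, or −1 when the key is absent, mirrors the design used for Binary Search later in the chapter; the exact interface is an assumption):

def linearSearch(values, key):
    # Check each position in turn and return the index where the key is found.
    for index in range(len(values)):
        if values[index] == key:
            return index
    # Use -1 as an invalid index to signal that the key was not found.
    return -1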
Now think about storing an array of Student object
instances in memory. The diagram below is one way to visualize this
data structure:
Figure 4.2
An example use of this implementation is given below. The
programmer could access a student object safely (only if it was
found) using the array index after the array has been searched.
Figure 4.3
Using this implementation as before may look something
like this:
Binary Search
To build intuition for Binary Search, consider the number-guessing
game; you may have played
this game as a kid. The first player chooses a number between 1 and
100, and the second player tries to guess the number. The guesser
guesses a number, and the chooser reports one of the following
three scenarios:
the guess is correct, the guess is too high (“lower”), or the guess is
too low (“higher”). Each response eliminates some candidate numbers.
What guessing strategy
would maximize the number that we eliminate each time? Maybe
you have thought of the strategy by now.
The optimal strategy would be to start with 50, which
eliminates half of the numbers with one guess. If the chooser
responds “lower,” the next guess should be 25, which again halves
the number of possible guesses. This process continues to split
the remaining values in half each time. This is the principle behind
Binary Search, and the “binary” name refers to the binary split of the
candidate values. This strategy works because the numbers from 1
to 100 have a natural order.
A precondition for Binary Search is that the elements of the
array are sorted. The sorting allows each comparison in the array
to be oriented, and it indicates in which direction to continue the
search. Each check adds some new information for our algorithm
and allows the calculation to proceed efficiently.
We will present an illustration below of an example
execution of the algorithm. Suppose we are searching for the key 27
in the sorted array below:
Figure 4.4
Since the key 27 is less than the value at the middle position,
we will update the high end of the range. The high variable will be
set to mid − 1, and we will recalculate the mid.
Figure 4.5
Figure 4.6
The game description and array example should give you
an idea of how Binary Search can efficiently find keys in a sorted
data structure. Let us examine an implementation of this algorithm.
For our design, we will return the index of the value if it is found
or −1 as an invalid index to indicate the key was not found. We will
consider an array of integer keys, but it will work equally well with
objects assuming they are sorted by their relevant keys.
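A minimal iterative sketch of that design is below; the book’s own listing may differ in details, but the low, mid, and high bookkeeping is the standard approach:

def binary_search(sorted_values, key):
    """Return the index of key in sorted_values, or -1 if it is not found."""
    low, high = 0, len(sorted_values) - 1
    while low <= high:
        mid = (low + high) // 2
        if sorted_values[mid] == key:
            return mid
        elif key < sorted_values[mid]:
            high = mid - 1     # the key can only be in the lower half
        else:
            low = mid + 1      # the key can only be in the upper half
    return -1                  # the range is empty: the key is not present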
Binary Search Complexity
T(n) = c + T(n/2).
T(n) = c + (c + T(n/4))
= 2c + T(n/2^2)
= 2c + (c + T(n/2^3))
= 3c + T(n/2^3).
T(n) = k*c + T(n/2^k).
T(n) = (log n)*c + T(n/2^(log n))
= (log n)*c + c.
n)—all is lost! Well, let us use our analysis skills to try to determine
why and when Binary Search would be more useful than Linear
Search.
The important realization is that sorting is a one-time cost.
Once the array is sorted, all subsequent searches can be done in
O(log n). Let us think about how this compares to Linear Search,
which always has a cost of O(n) regardless of the number of times
the array is searched. The act of searching is also called a query.
A query is a question, and we are asking
the data structure the question “Do you have the information we
need?” Suppose that the variable Q is the number of queries that are
made of the data structure.
Querying our array using Linear Search Q times would give
the following time cost with c being a constant associated with O(n):
T_LS(n, Q) = Q * c * n.
Querying our array using Binary Search Q times would give the
following cost:
T′_LS(n) = n * c * n
= c * n^2.
This leads to a time complexity of O(n^2) for searching with
approximately n different queries.
For Binary Search, we have the following adjusted formula:
T′_BS(n) = c * (n log n) + n * c * log n = 2 * c * (n log n).
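As a rough worked example (treating the constant c as 1), suppose n = 1,000 and Q is also about 1,000. Linear Search performs roughly Q * n = 1,000,000 operations, while sorting once and answering every query with Binary Search costs roughly 2 * n * log2(n), which is about 20,000 operations. A small sketch to reproduce these estimates:

import math

n = 1_000                               # number of elements (illustrative)
q = n                                   # number of queries, roughly n

linear_total = q * n                    # Q queries at O(n) each
binary_total = 2 * n * math.log2(n)     # one sort plus Q queries at O(log n) each

print(f"Linear Search total:  {linear_total:,.0f}")
print(f"Sort + Binary Search: {binary_total:,.0f}")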
Exercises
sorting time for your Binary Search database before
calculating the total time for all queries. Compare your
result to the Linear Search total query time. Next, repeat
this process for n, 2*n, and 4*n queries. At what number
of queries does Sorting + Binary Search start to show an
advantage over Linear Search?
References
5. Linked Lists
Learning Objectives
Introduction
You have a case of cola you wish to add to your refrigerator. Your
initial approach is to add all colas to the refrigerator while still in
the box. That way, when you want to retrieve a drink, they are all
in the same place, making them easier to find. You will also know
when you are running low because the box will be nearly empty.
Storing them while still in the box clearly has some benefits. It does
come with one major issue though. Consider a refrigerator filled
with groceries. You may not have an empty spot large enough to
accommodate the entire case of cola. However, if you open the case
and store each can individually, you can probably find some spot for
each of the 12 cans. You have now found a way to keep all cans cold,
even though they are no longer stored together in one place.
When working with the linked list, the next element in the
structure, starting from a given element, is determined by following
the reference to the next node. In the example below, node a
references (or points to) node b. To determine the elements in the
structure, you can inspect the payloads as you follow the references.
We follow these references until we find a null reference (or a
reference that points to nothing). In this case, we have a linked list
of length 2, which has the value 1 followed by the value 12.
Figure 5.3
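A node of this kind can be sketched as follows (the field names payload and next are illustrative; the book’s figures may label them differently):

class Node:
    def __init__(self, payload, next_node=None):
        self.payload = payload
        self.next = next_node

# The two-node list from the example: the value 1 followed by the value 12.
b = Node(12)          # b's next reference is None (a null reference)
a = Node(1, b)        # a references (points to) b

current = a
while current is not None:     # follow the references until we hit null
    print(current.payload)     # prints 1, then 12
    current = current.next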
Continuing the comparison with arrays, our first task will be to look
up the value at an arbitrary position relative to a particular node
in a linked list. You may consider a position in a linked list as the
analogue of an array’s index. It is the ordinal number for a specific
value or node within the data structure. For the sake of consistency,
we will use 0-based positions.
While it is the case that most list traversals are implemented with
for-loops, there are occasions where other styles of traversal are
more appropriate. For example, for-loops are ill-suited for scenarios
where we do not know exactly how many times we must loop.
As a result, while-loops are often used whenever all nodes must
be visited. Consider the function below, which returns an integer
representing the number of elements in the list starting at
rootNode:
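The book’s listing is not reproduced in this excerpt; a minimal sketch of such a counting function, reusing the Node sketch above, might be:

def count(root_node):
    """Return the number of nodes in the list that begins at root_node."""
    total = 0
    current = root_node
    while current is not None:    # we do not know in advance how many iterations
        total += 1
        current = current.next
    return total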
Insert
Remove
Now that we have seen how to traverse the list via the Next
reference and rearrange those references to insert a new node,
we are ready to address removal of nodes. We will continue to
provide the root node and a position to the function. Also, because
we might choose to remove the element at the 0 position, we will
continue to return the root node of the resulting list. As with the
insertAtPosition, we assume that the value for position is valid for
this list.
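A sketch of that removal function follows (again assuming the Node sketch above and a valid position; the book’s own listing may differ):

def remove_at_position(root_node, position):
    """Remove the node at the given 0-based position and return the new root."""
    if position == 0:
        return root_node.next            # the old second node becomes the root
    previous = root_node
    for _ in range(position - 1):        # walk to the node just before the target
        previous = previous.next
    previous.next = previous.next.next   # unlink the target node
    return root_node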
Figure 5.6
For each insert into the list, we must now maintain some
new values on the list object. Lines 7, 8, 11, and 22 help keep track
of when we have changed the head or tail of the list. Line 24 runs
regardless of where the value was inserted because we now have
one more element in the list.
This extra work was not in vain. Consider what would be
required to write a function that returns the last element
of a linked list given only its root node, compared to running it
on a list object that tracks its tail.
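The contrast can be sketched like this: with only a root node, finding the last element requires a full O(n) traversal, while a list object that maintains a tail reference answers in O(1) (names below are illustrative):

def last_from_root(root_node):
    """O(n): walk the entire list to reach the final node."""
    current = root_node
    while current.next is not None:
        current = current.next
    return current.payload

class LinkedList:
    def __init__(self):
        self.head = None
        self.tail = None
        self.count = 0

    def last(self):
        """O(1): the tail reference is kept up to date by every insert."""
        return self.tail.payload if self.tail is not None else None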
Exercises
References
Learning Objectives
Introduction
Stack
Linked lists are well suited for both queues and stacks. First, adding
and removing elements is simply a matter of creating a node and
setting a reference or removing a reference. Second, we have a
fairly intuitive means of tracking both ends of the data structure.
For purposes of this section, we will assume a singly linked node
discussed earlier in this book.
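A minimal linked-node stack sketch illustrates the point: push and pop only touch the reference at the top of the structure (this is a sketch, not the book’s listing):

class Node:
    def __init__(self, payload, next_node=None):
        self.payload = payload
        self.next = next_node

class Stack:
    def __init__(self):
        self.top = None

    def push(self, payload):
        self.top = Node(payload, self.top)   # the new node points at the old top

    def pop(self):
        payload = self.top.payload
        self.top = self.top.next             # the old top node is unlinked
        return payload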
How can we ensure we have enough space in our array for the next
element to be added?
Practical Considerations
Exercises
References
Learning Objectives
Introduction
Hash Functions
If you have ever eaten breakfast at a diner in the USA, you were
probably served some hash browns. These are potatoes that have
been very finely chopped up and cooked. In fact, this is where the
“hash” of the hash function gets its name. A hash function takes a
number, the key, and generates a new number using information in
the original key value. So at some level, it takes information stored
in the key, chops the bits or digits into parts, then rearranges or
combines them to generate a new number. The important part,
though, is that the hash function will always generate the same
output number given the input key. There are many different types
of hash functions. Let’s look at a simple one that hashes the key 137.
We will use a small subscript of 2 when indicating binary numbers.
Figure 7.2
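The specific function pictured in Figure 7.2 is not reproduced here, but the chop-and-recombine idea can be illustrated with a sketch like the one below, which splits the binary form of 137 into two halves and adds them (purely illustrative, not necessarily the book’s example):

def fold_hash(key):
    """Split an 8-bit key into two 4-bit halves and combine them."""
    binary = format(key, "08b")          # 137 becomes "10001001" (base 2)
    high, low = binary[:4], binary[4:]   # "1000" and "1001"
    return int(high, 2) + int(low, 2)    # 8 + 9 = 17

print(fold_hash(137))   # always 17: the same input key gives the same output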
Hash Tables
Once you have finished reading this chapter, you will understand
the idea behind hash tables. A hash table is essentially a lookup table
that allows extremely fast search operations. This data structure
is also known as a hash map, associative array, or dictionary. Or
more accurately, a hash table may be used to implement associative
arrays and dictionaries. There may be some dispute on these exact
terms, but the general idea is this: we want a data structure that
associates a key and some value, and it must efficiently find the
value when given the key. It may be helpful to think of a hash table
as a generalization of an array where any data type can be used as
an index. This is made possible by applying a hash function to the
key value.
For this chapter, we will keep things simple and only use
integer keys. Nearly all modern programming languages provide a
built-in hash function or several hash functions. These language
This hash function maps the key to a valid array index. This
can be done in constant time, O(1). When searching for a student in
our database, we could do something like this:
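The book’s listing is not shown in this excerpt; a sketch of the idea, assuming the common mapping hash(key) = key mod size, might look like:

class Student:
    def __init__(self, student_id, name):
        self.student_id = student_id
        self.name = name

size = 13
table = [None] * size              # each slot holds a Student record or None

def hash_index(key):
    return key % size              # maps any integer key to a valid index in O(1)

def find_student(key):
    record = table[hash_index(key)]
    if record is not None and record.student_id == key:
        return record
    return None                    # empty slot: the student is not in the table

table[hash_index(42)] = Student(42, "Ada")
print(find_student(42).name)       # constant-time lookup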
Let us begin by specifying our hash table data structure. This class
will need a few class functions that we will specify below, but first
let’s give our hash table some data. Our table will need an array of
Student records and a size variable.
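A skeleton of that data structure might look like the following sketch (the operations are filled in as the chapter progresses; this is not the book’s exact listing):

class HashTable:
    def __init__(self, size=13):
        self.size = size                 # number of slots in the table
        self.records = [None] * size     # array of Student records (or None)

    def hash_index(self, key):
        return key % self.size           # simple modular hash function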
Probing
Figure 7.5
Figure 7.6
Figure 7.7
We saw that adding more students to the hash table can lead to
collisions. When we have collisions, the probing sequence places the
colliding student near the original student record. Think about the
situation below that builds off one of our previous examples:
Figure 7.8
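A sketch of linear probing, where a collision moves forward to the next open slot and wraps around the end of the array, is shown below; the book’s listing may differ:

def linear_probe_add(table, key, record):
    """Place record in the first open slot at or after key's home index."""
    size = len(table)
    home = key % size
    for step in range(size):                 # at most size probes
        probe = (home + step) % size         # wrap around the end of the array
        if table[probe] is None:
            table[probe] = record
            return probe
    raise RuntimeError("hash table is full")

Because colliding records land in neighboring slots, runs of occupied slots (clusters) tend to grow, which is exactly the crowding described above.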
Quadratic Probing
Figure 7.10
Figure 7.11
Double Hashing
As the name implies, double hashing uses two hash functions rather
than one. Let’s look at the specific problem this addresses. Suppose
we are using the good practice of having size be a prime number.
This still cannot overcome the problem in probing methods of
having the same initial hash index. Consider the following situation.
Suppose k1 is 13 and k2 is 26. Both keys will generate a hashed value
of 0 using mod 13. The probing sequence for k1 in linear probing is
this:
hQ(k, c) = (k mod size + c^2) mod size
h1(k) = k mod p1
Let’s let p1 = 13 = size and p2 = 11 for our example. How would this
change the probe sequence for our keys 13 and 26? In this case h1(13)
= h1(26) = 0, but h2(13) = 2, h2(26) = 4.
Consider the following table:
Figure 7.14
Figure 7.15
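A sketch of the resulting probe sequences is below. It assumes the second hash has the form h2(k) = k mod p2, which matches the values h2(13) = 2 and h2(26) = 4 given above, although the book may define h2 slightly differently:

p1 = 13        # table size, a prime
p2 = 11        # second prime used by the second hash function

def h1(k):
    return k % p1

def h2(k):
    return k % p2          # assumed form, consistent with h2(13) = 2 and h2(26) = 4

def probe_sequence(k, probes=5):
    """The first few slots visited by double hashing for key k."""
    return [(h1(k) + c * h2(k)) % p1 for c in range(probes)]

print(probe_sequence(13))   # [0, 2, 4, 6, 8]
print(probe_sequence(26))   # [0, 4, 8, 12, 3]

Even though both keys share the initial index 0, their probe sequences now diverge, which is the point of using a second hash function.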
Chaining
Figure 7.16
Figure 7.17
Our list will just keep track of references to the head and
tail Nodes in the list. To start thinking about using this list, let’s
cover the add function for our LinkedList. We will add new students
to the end of our list in constant time using the tail reference. We
need to handle two cases for add. First, adding to an empty list
means we need to set our head and tail variables correctly. All other
cases will simply append to the tail and update the tail reference.
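A sketch of that add function, with a head, a tail, and a simple Node, follows; the book’s listing may differ in naming:

class Node:
    def __init__(self, student):
        self.student = student
        self.next = None

class LinkedList:
    def __init__(self):
        self.head = None
        self.tail = None

    def add(self, student):
        """Append a student to the end of the chain in O(1) using the tail."""
        node = Node(student)
        if self.head is None:       # case 1: the list is empty
            self.head = node
            self.tail = node
        else:                       # case 2: append after the current tail
            self.tail.next = node
            self.tail = node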
Exercises
2. Implement a hash table using linear probing as
described in the chapter using your language of choice,
but substitute the Student class for an integer type. Also,
implement a utility function to print a representation of
your table and the status associated with each open slot.
Once your implementation is complete, execute the
sequence of operations described in exercise 1, and print
the table. Do your results match the paper results from
exercise 1?
3. Extend your linear probing hash table to have a
load variable. Every time a record is added or removed,
recalculate the load based on the size and the number of
records. Add a procedure to create a new array that has
size*2 as its new size, and add all the records to the new array.
References
Learning Objectives
Introduction
Figure 8.1
Figure 8.2
Searching
Insertion
No Children
Figure 8.5
Figure 8.6
Self-Balancing Trees
AVL Trees
AVL trees are named after the computer scientists who developed
them (G. M. Adelson-Velsky and E. M. Landis). After insertions that
leave the tree in an unbalanced state, we achieve balance by
performing small constant-time adjustments called rotations.
First, we must determine whether an insertion has resulted
in an unbalanced tree. To determine this, we use a metric called
the balance factor. This integer is the difference in the heights of
a node’s left and right subtrees. Below is the simplest possible tree
where we can witness such an imbalance. As usual, values are stored
inside the node. Subtree heights are stored at the upper right of each node.
Figure 8.8
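A sketch of computing heights and balance factors on a simple node type is below. The sign convention (left height minus right height) and the height of an empty subtree being -1 are assumptions; some texts use the opposite conventions:

class TreeNode:
    def __init__(self, value, left=None, right=None):
        self.value = value
        self.left = left
        self.right = right

def height(node):
    """Height of the subtree rooted at node; an empty subtree has height -1."""
    if node is None:
        return -1
    return 1 + max(height(node.left), height(node.right))

def balance_factor(node):
    """Difference between the heights of the left and right subtrees."""
    return height(node.left) - height(node.right)

# The simplest unbalanced shape: 1, then 2, then 3 chained to the right.
root = TreeNode(1, right=TreeNode(2, right=TreeNode(3)))
print(balance_factor(root))   # -2: the tree leans too far to the right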
Exercises
2. In the language of your choice, implement BST
deletes. Rather than solving the entire problem at once,
break your code into three distinct cases:
References
Learning Objectives
Introduction
Heaps
Figure 9.1
Insertion
Consider inserting the number 8 into the prior binary heap example.
Imagine if we simply added that 8 in the array after the 2 (in position
6).
Figure 9.3
What can we now claim about the state of our binary heap?
Subheaps starting at indexes 1, 3, 4, and 5 are all still valid subheaps
because the heap property is preserved. In other words, our
erroneous insertion of 8 under 4 does not alter the descendants
of these 4 nodes. As a result, we could hypothetically leave these
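The repair step that follows, repeatedly swapping the new value with its parent while it is out of order, can be sketched as below. The sketch assumes a max-heap stored in a 0-based array, which appears to match the chapter’s example, and it is not the book’s listing:

def sift_up(heap, index):
    """Move heap[index] up until its parent is at least as large (max-heap)."""
    while index > 0:
        parent = (index - 1) // 2
        if heap[parent] >= heap[index]:
            break                                        # heap property holds again
        heap[parent], heap[index] = heap[index], heap[parent]
        index = parent

def insert(heap, value):
    heap.append(value)               # place the new value in the next open position
    sift_up(heap, len(heap) - 1)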
Extraction
Figure 9.5
Binomial Heaps
Figure 9.7
Figure 9.10
Figure 9.13
Figure 9.15
Figure 9.16
Merging Heaps
Now we will tackle the union function. This function performs the
final step of combining two binomial heaps by traversing the
merged list and combining any pairs of trees that have equal
degrees. The algorithm is given below:
Summary
Learning Objectives
Introduction
F1 = 1
Fn = Fn−1 + Fn−2.
F8=F8−1 + F8−2
=F7 + F6
…and so on.
There are two key thoughts we can learn from this expansion for
calculating F8. The first thought is that things are getting out of
hand and fast! Every term expands into two terms. This leads to
eight rounds of doubling. Our complexity looks like O(2^n), which
should be scary. Already at n = 20, 2^20 is in the millions, and it only
gets worse from there. The second thought that comes to mind in
observing this expansion is that many of these terms are repeated.
Let’s look at the last line again.
Figure 10.2
Figure 10.3
Figure 10.4
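The repeated terms are exactly what a table of remembered results can exploit. A sketch contrasting the direct recursion with a memoized version (assuming F1 = F2 = 1) follows:

def fib_naive(n):
    """Direct translation of Fn = Fn-1 + Fn-2: roughly O(2^n) calls."""
    if n <= 2:
        return 1
    return fib_naive(n - 1) + fib_naive(n - 2)

def fib_memo(n, memo=None):
    """Remember each Fn the first time it is computed: O(n) overall."""
    if memo is None:
        memo = {1: 1, 2: 1}
    if n not in memo:
        memo[n] = fib_memo(n - 1, memo) + fib_memo(n - 2, memo)
    return memo[n]

print(fib_naive(8), fib_memo(8))   # both print 21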
Now we have seen two algorithms for solving the optimal matrix
chain multiplication problem. The recursive formulation proved to
be exponential time (O(2^n)) with each recursive call potentially
branching n − 1 times. The dynamic programming algorithm should
improve upon this cost; otherwise, it would not be very useful.
One way to reason about the complexity is to think about how the
tables get filled in. Ultimately, we are filling in about one-half of a
2D array or table. This amounts to filling in the upper triangular
portion of a matrix in mathematical terms. Our table is n by n,
and we are filling in n(n+1)/2 values (a little over half of the n-by-
n matrix). So you may think the time complexity should be O(n^2).
This is not the full story though. For every start-end pair, we must
try all the split indexes. This could be as bad as n − 1. So all these
pairs need to evaluate up to n − 1 options for a split. We can reason
that this requirement would lead to some multiple of n^2 * n or n^3
operations. This provides a good explanation of the time complexity,
which is O(n^3). This may seem expensive, but O(n^3) is profoundly
better than O(3^n). Moreover, consider the difference between the
1. ck−1 = am−1 and ck−1 = bn−1. This means that am−1 = bn−1 and Ck−2
is an LCS of Am−2 and Bn−2.
2. am−1 is not equal to bn−1, and ck−1 is not equal to am−1. This must
mean that C is an LCS of Am−2 and B.
3. am−1 is not equal to bn−1, and ck−1 is not equal to bn−1. This must
mean that C is an LCS of A and Bn−2.
Now that we have seen the algorithm and an example, let’s consider
the time complexity of the algorithm. The nested loops for the A
and B indexes should be a clue. In the worst case, every prefix of one
input string must be compared against every prefix of the other. The
algorithm fills every cell of the n-by-m table (ignoring the first row
and column of zeros, which are initialized with a minor time cost).
This gives us n * m cells, so the complexity of the algorithm would
be considered O(mn). It might be reasonable to assume that m and
n are roughly equal in size. This would lead to a time complexity of
O(n^2). This represents a huge cost savings over the O(2^n) time cost
of the recursive algorithm.
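A sketch of the table-filling algorithm is below; it returns the length of the longest common subsequence and matches the O(mn) analysis above (the book’s own listing may use different names):

def lcs_length(a, b):
    """Length of the longest common subsequence of sequences a and b."""
    m, n = len(a), len(b)
    # (m + 1) by (n + 1) table; the extra row and column of zeros handle empty prefixes.
    table = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if a[i - 1] == b[j - 1]:
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])
    return table[m][n]

print(lcs_length("ABCBDAB", "BDCABA"))   # 4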
A Note on Memoization
Exercises
References
Learning Objectives
Introduction
• circuit design
travel from one node to another. A cycle is a path that starts and
ends at the same node. Sometimes cycles are useful, but they often
represent challenges in graph algorithms. Failing to detect a cycle
in graph algorithms often results in implementations falling into
endless loops.
Figure 11.1
Figure 11.2
Representations of Graphs
language, or computing environment. For simplicity, we will only
address the two main strategies.
Adjacency Matrices
Figure 11.3
on the type of graph we are modeling. Some considerations for
adjacency matrices include the following:
Adjacency Lists
Figure 11.4
will likely need some kinds of composite types such as objects
or structs. Unweighted graphs are easily stored using the
identity of the node and may not require additional types.
• Directed or Undirected: As with adjacency matrices,
undirected graphs tend to lead toward redundancy in data. If
an undirected graph has an edge between A and B, then A’s
secondary list stores a reference to B, and B’s secondary list
stores a reference to A. Directed graphs have no such concern.
• Underlying Data Structures: As with adjacency matrices, it may
be convenient to identify each node using an integer. This
allows us to leverage constant-time lookups when looking for
nodes in primary or secondary lists. Unlike adjacency matrices,
using arrays for secondary lists may pose additional challenges
if edges are frequently added or removed (due to the fixed size
of arrays).
Algorithms
Traversal
Two frequent questions with graphs are (1) how we can visit each
node and (2) whether we can find a path between two nodes. They arise
whenever we want to broadcast on a network, find a route between
two cities, or help a virtual actor through a maze. We rely on two
related algorithms to accomplish this task: breadth first traversal
and depth first traversal. If we wish to perform a search, we simply
terminate the traversal once the target node has been located.
Both algorithms depend on knowing some start node. If we
are attempting to traverse all nodes, a start node may be chosen
arbitrarily. If we wish to find a path from one node to another, we
obviously must choose our start node deliberately. Both algorithms
work from the same basic principle: if we wish to visit every node
originating from some start node, we first visit its neighbors, its
neighbors’ neighbors, and so on until all nodes have been visited.
The primary distinction between the two is the order in which we
consider the next node. Breadth first spreads slowly, favoring nodes
closest to the start node. Depth first reaches as deep as possible
quickly. Below are examples of both traversals. Note that there is
no unique breadth first or depth first traversal, but rather they
are dependent on a precise implementation. The examples below
represent possible traversals. There are other possibilities.
Figure 11.5
Because breadth first starting from A will consider all of A’s neighbors first, we will
encounter C before we encounter F. Depth first will instead
prioritize B’s neighbors before completing all of A’s neighbors. As a
result, depth first reaches F before breadth first does.
In addition to the order in which we visit neighbors, we
must also pay close attention to cycles. Recall that cycles are paths
that start and end at the same node. In the depth first example
above, what happens when we finally visit C only to find its neighbor
is A? If we fail to recognize this as a cycle, we will again traverse
A B D F E C and continue to do so indefinitely. We must avoid
visiting already visited nodes. We will need to incorporate this into
the algorithm as well.
Below is the pseudocode for a breadth first traversal. The
function call Visit is simply a placeholder for some meaningful
action you might take at each node (the simplest of which is simply
printing the node identifier). It also assumes that node identifiers
are integers.
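The book’s numbered listing (the one whose line numbers are referenced below) is not reproduced in this excerpt. A minimal sketch of the same idea, using a queue and an adjacency structure indexed by integer node identifiers, is:

from collections import deque

def visit(node):
    print(node)                        # placeholder for some meaningful action

def breadth_first(adjacency, start):
    """adjacency[i] is the list of neighbors of node i."""
    visited = [False] * len(adjacency)
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if not visited[node]:          # skip nodes we have already visited (cycles)
            visit(node)
            visited[node] = True
            for neighbor in adjacency[node]:
                queue.append(neighbor)

breadth_first({0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2]}, 0)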
On each iteration we dequeue a node, mark it as visited, then enqueue its neighbors. Once we have visited
a node and enqueued its neighbors, the conditional on line 9 will
prevent us from doing the same again. This is our mechanism for
avoiding cycles.
This pseudocode describes a breadth first traversal but
only requires a nominal change to make it a depth first traversal.
Recall that after we visited A, we visited B and C. This was due to
the first-in-first-out nature of queues. Consider what happens if
we swap our queue for a stack. We visit A and now push B and C.
C was the last node pushed, so a pop returns it. We then push A
and E, which will eventually be popped before B. As a result, we
prioritize nodes deep in the graph before ever considering B. In fact,
we eventually consider B due to its adjacency to D rather than its
adjacency to A.
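A sketch of that change is below: the only difference from the breadth first sketch is that a stack replaces the queue (the visit placeholder plays the same role as before):

def visit(node):
    print(node)                        # placeholder for some meaningful action

def depth_first(adjacency, start):
    visited = [False] * len(adjacency)
    stack = [start]
    while stack:
        node = stack.pop()             # last in, first out
        if not visited[node]:
            visit(node)
            visited[node] = True
            for neighbor in adjacency[node]:
                stack.append(neighbor)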
Three aspects of graph algorithms make runtime analysis
difficult. Because of these challenges, runtime analysis in this
chapter will be less precise than in others but still descriptive as to
roughly how much effort is required to perform the task at hand.
know that the number of edges is relatively low compared to
the number of nodes, then use the sum of the two. If we know
the graph to be highly connected, it is better to recognize the
runtime as quadratic.
• Algorithms are typically presented conceptually without
regard to precise implementations of graphs or auxiliary data
structures. For example, if we are working in an object-
oriented system, we may leverage adjacency lists, which limit
our ability to perform constant-time indexing into the adjacent
nodes. If we have no mapping from nodes to integers, we may
have to perform Linear Searches to determine if nodes have
been visited.
While breadth first and depth first searches provide a path from
a source to a destination, they do not guarantee a minimum-weight path.
Accomplishing such a task requires that the algorithm consider the
weights of the edges that it traverses as well as the cumulative
weights of edges already traversed. Numerous algorithms exist to
find the shortest path from one node to another, but this section
will focus on Edsger Dijkstra’s algorithm.
Dijkstra’s algorithm determines the shortest path between
a source and destination node by maintaining a list of minimum
distances required to reach each node already visited by the
algorithm. It is an example of a greedy algorithm. This is a general
strategy employed in algorithms, akin to the divide-and-conquer
strategy employed in Binary Search or Merge Sort. Greedy
algorithms make locally optimal choices that trend toward globally
optimal solutions. In the case of Dijkstra’s (and Prim’s to follow), we
only consider a single node at a time and use the information at
that location in the graph to update the global state. If we carry this
strategy out in clever ways, we can indeed determine the shortest
path between two nodes without each step considering the entire
graph.
In the following graph, consider finding the shortest path
from A to D. Note that in the initial state, we acknowledge that the
distance from the source node to itself is 0. This is analogous to
enqueuing or pushing the source node in breadth first and depth
first traversals. The primary control flow will again be a loop, which
selects the next node to consider. Initializing some state again
provides the loop with a logical place to begin. Also note that we
maintain predecessors for each node whenever we update the
distance to that node. This helps traverse the shortest path after the
algorithm has been completed.
Figure 11.6
path from A to C. What we are claiming is that of the nodes visited
so far, the minimum weight paths to B or C are 4 and 6, respectively.
We then start the next iteration by carefully choosing the next node
to visit. It should be one that has not yet been visited so that we do
not create cycles. Additionally, regarding the shortest path, we must
choose the next node based on which has the minimum distance
from the starting node. We then repeat this process until each node
is visited or we reach some desired destination node.
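A sketch of the algorithm as described, repeatedly selecting the unvisited node with the smallest known distance and then relaxing its outgoing edges, is below. The adjacency structure and names are illustrative, not the book’s listing:

import math

def dijkstra(adjacency, source):
    """adjacency[u] maps each neighbor of u to the weight of the edge from u."""
    distance = {node: math.inf for node in adjacency}
    predecessor = {node: None for node in adjacency}
    visited = set()
    distance[source] = 0                       # the source is 0 away from itself

    while len(visited) < len(adjacency):
        # Greedy choice: the unvisited node with the smallest known distance.
        node = min((n for n in adjacency if n not in visited),
                   key=lambda n: distance[n])
        visited.add(node)
        for neighbor, weight in adjacency[node].items():
            if neighbor not in visited and distance[node] + weight < distance[neighbor]:
                distance[neighbor] = distance[node] + weight
                predecessor[neighbor] = node   # remember how we reached neighbor
    return distance, predecessor

# Illustrative graph (not necessarily the one pictured in the figures).
graph = {"A": {"B": 4, "C": 6}, "B": {"C": 1, "D": 5}, "C": {"D": 2}, "D": {}}
print(dijkstra(graph, "A")[0])   # {'A': 0, 'B': 4, 'C': 5, 'D': 7}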
Figure 11.7
As with breadth first and depth first traversals, the precise
runtime cannot be determined without specifying exactly how
visited, distance, predecessors, and edges are structured. What we
can determine with certainty is that the while-loop will iterate the
same number of times as the number of nodes in the graph. This is
evident because each iteration marks a node as visited, and the loop
terminates when all nodes are visited. Assuming we are looking for
all shortest paths (or our destination node is the last to be visited),
we will perform the body of the inner loop once for each edge in
the graph. This very closely approximates the runtime behavior of
the traversal algorithms earlier and results in a likely worst-case
runtime of O(N^2). However, Dijkstra’s algorithm is well researched,
and known improvements can be made to this runtime by choosing
clever data structures to represent different components.
only the edges in the subgraph.
3. It is possible to have more than one spanning tree for a given
graph. Of those possible spanning trees, the MST is the one
with the lowest cumulative edge weight.
Exercises
1. Imagine you have a sparse graph with a large
number of nodes but a relatively small number of edges.
You are working in a system with strict constraints on
how much memory you can consume. How does this
impact your decision between an adjacency matrix and an
adjacency list?
2. The breadth first example and pseudocode assume
an unweighted and undirected graph. Does this pseudocode
change if the graph is weighted? What if it is
directed?
3. Consider an array of numbers. Devise a way of
sorting these numbers using the graph algorithms from
this chapter. The challenging part here is to determine
how to model the array as a graph. Hint: What if the
numbers are nodes, each node is connected to each other
node, and the weight is the difference between the two
nodes’ values?
References
http://discrete.openmathbooks.org/dmoi3/ch_graphtheory.html
12. Hard Problems
Learning Objectives
Introduction
Figure 12.1
Figure 12.2
Figure 12.3
(x ∧ y) ∨ (¬x ∧ z).
Figure 12.5
Figure 12.6
Given the boxes and their sizes, is there a way to pack all
the boxes into the minimum number of bins? It might seem simple,
but solving this problem optimally can, in general, require a great
deal of time. One approach to finding the optimal number of bins would be
to try all orderings of the items: take the items in each ordering and
open a new bin whenever the current bin cannot hold the next item.
By trying all possible orderings of the items, the optimal number of bins
would be found, but this would take O(n!) time.
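A sketch of that brute-force idea follows: for each of the n! orderings, place items left to right, opening a new bin whenever the current bin cannot hold the next item (the capacity parameter and names are illustrative):

from itertools import permutations

def bins_for_ordering(items, capacity):
    """Pack items in the given order, opening a new bin whenever needed."""
    bins = 0
    remaining = 0                      # space left in the current bin
    for size in items:
        if size > remaining:           # the current bin cannot hold this item
            bins += 1                  # open a new bin
            remaining = capacity
        remaining -= size
    return bins

def optimal_bins_brute_force(items, capacity):
    """Try every ordering: guaranteed to find the optimum, but O(n!) time."""
    return min(bins_for_ordering(order, capacity)
               for order in permutations(items))

print(optimal_bins_brute_force([4, 8, 1, 4, 2, 1], capacity=10))   # 2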
Figure 12.7
Figure 12.11
Figure 12.12
Exercises
References
Paul W. Bible
DEPAUW UNIVERSITY
Lucas Moser
MARIAN UNIVERSITY
that a rich education plays a major role in the development of
problem-solving skills. His experiences in software engineering,
management, and teaching bring a unique perspective to both
project teams and students.
https://orcid.org/0009-0006-9452-1246
Reviewers
Aaron Boudreaux
UNIVERSITY OF LOUISIANA AT LAFAYETTE
Joshua Kiers
MARIAN UNIVERSITY
Illustrator
Mia M. Scarlato