
CMSC 420

Data Structures1

David M. Mount
Department of Computer Science
University of Maryland
Fall 2020

1 Copyright, David M. Mount, 2020, Dept. of Computer Science, University of Maryland, College Park, MD, 20742.
These lecture notes were prepared by David Mount for the course CMSC 420, Data Structures, at the University of
Maryland. Permission to use, copy, modify, and distribute these notes for educational purposes and without fee is
hereby granted, provided that this copyright notice appear in all copies.



Lecture 1: Course Introduction and Background
Algorithms and Data Structures: The study of data structures and the algorithms that ma-
nipulate them is among the most fundamental topics in computer science. Most of what
computer systems spend their time doing is storing, accessing, and manipulating data in one
form or another. Some examples from computer science include:

Information Retrieval: List the 10 most informative Web pages on the subject of “how to
treat high blood pressure?” Identify possible suspects of a crime based on fingerprints or
DNA evidence. Find movies that a Netflix subscriber may like based on the movies that
this person has already viewed. Find images on the Internet containing both kangaroos
and horses.
Geographic Information Systems: How many people in the USA live within 25 miles of
the Mississippi River? List the 10 movie theaters that are closest to my current location.
If sea levels rise 10 meters, what fraction of Florida will be under water?
Compilers: You need to store a set of variable names along with their associated types.
Given an assignment between two variables, we need to look them up in a symbol table,
determine their types, and determine whether it is possible to cast from one type to the
other.
Networking: Suppose you need to multicast a message from one source node to many other
machines on the network. Along what paths should the message be sent, and how can
this set of paths be determined from the network’s structure?
Computer Graphics: Given a virtual reality system for an architectural building walk-
through, what portions of the building are visible to a viewer at a particular location?
(Visibility culling)

In many areas of computer science, much of the content deals with the questions of how to
store, access, and manipulate the data of importance for that area. In this course we will
deal with the first two tasks of storage and access at a very general level. (The last issue of
manipulation is further subdivided into two areas, manipulation of numeric or floating point
data, which is the subject of numerical analysis, and the manipulation of discrete data, which
is the subject of discrete algorithm design.) A good understanding of data structures is
fundamental to all of these areas.
What is a data structure? Whenever we deal with the representation of real world objects in
a computer program we must first consider a number of issues:

Modeling: The manner in which objects in the real world are modeled as abstract mathe-
matical entities and basic data types
Operations: The actions that are used to store, query, and manipulate these entities along
with the formal meaning of these operations
Representation: The manner in which these entities are represented concretely in a com-
puter’s memory
Algorithms: The algorithms that are used to perform these operations

Note that the first two items above are essentially mathematical in nature, and deal with
the “what” of a data structure, whereas the last two items involve the implementation issues
and the “how” of the data structure. The first two essentially encapsulate the essence of
an abstract data type (or ADT). In contrast, the second two items, the concrete issues of
implementation, will be the focus of this course.
For example, you are all familiar with the concept of a stack from basic programming classes.
This is a sequence of objects (of unspecified type). Objects can be inserted into the stack
by pushing and removed from the stack by popping. The pop operation removes the last
unremoved object that was pushed. Stacks may be implemented in many ways, for example
using arrays or using linked lists. Which representation is the fastest? Which is the most
space efficient? Which is the most flexible? What are the tradeoffs involved with the use of
one representation over another? In the case of a stack, the answers are all rather mundane.
However, as data structures grow in complexity and sophistication, the answers are far from
obvious.
In this course we will explore a number of different data structures, study their implemen-
tations, and analyze their efficiency (both in time and space). One of our goals will be to
provide you with the tools and design principles that will help you to design and implement
your own data structures to solve your own data storage and retrieval problems efficiently.

Course Overview: In this course we will consider many different abstract data types and various
data structures for implementing each of them. There is not always a single “best” data
structure for a given task. For example, there are many common sorting algorithms: Bubble-
Sort is easy to code but slow, Quick-Sort is very fast but not stable, Merge-Sort is stable but
needs additional memory, and Heap-Sort needs no additional memory but is hard to code
(relative to Quick-Sort). It will be important for you, as a designer of the data structure, to
understand each structure well enough to know the circumstances where one data structure
is to be preferred over another.
How important is the choice of a data structure? There are numerous examples from all areas
of computer science where a relatively simple application of good data structure techniques
resulted in massive savings in computation time and, hence, money.
Perhaps a more important aspect of this course is a sense of how to design new data structures,
how to implement these designs, and how to evaluate how good your design is. The data
structures we will cover in this course have grown out of the standard applications of computer
science. But new applications will demand the creation of new domains of objects (which we
cannot foresee at this time) and this will demand the creation of new data structures. It will
fall on the students of today to create these data structures of the future. We will see that
there are a few important elements which are shared by all good data structures. We will
also discuss how one can apply simple mathematics and common sense to quickly ascertain
the weaknesses or strengths of one data structure relative to another.

Our Approach: We will consider the design of data structures from two different perspectives:
theoretical and practical. Our theoretical analysis of data structures will be similar in style
to the approach taken in algorithms courses (such as CMSC 351). The emphasis will be
on deriving asymptotic (so called, “big-O”) bounds on the space, query time, and cost of
operations for a given data structure. On the practical side, you will be writing programs
to implement a number of classical data structures. This will acquaint you with the skills
needed to develop clean designs and debug them.
The remainder of the lecture will review some material from your earlier algorithms course,
which hopefully you still remember.



Algorithmics: This is background from CMSC 351. Review it and familiarize yourself with this material.
It is easy to see that the topics of algorithms and data structures cannot be separated since
the two are inextricably intertwined. So before we begin talking about data structures, we
must begin with a quick review of the basics of algorithms, and in particular, how to measure
the relative efficiency of algorithms. The main issue in studying the efficiency of algorithms
is the amount of resources they use, usually measured in either the space or time used. There
are usually two ways of measuring these quantities. One is a mathematical analysis of the
general algorithm being used, called an asymptotic analysis, which can capture gross aspects
of efficiency for all possible inputs but not exact execution times. The second is an empirical
analysis of an actual implementation to determine exact running times for a sample of specific
inputs, but it cannot predict the performance of the algorithm on all inputs. In class we will
deal mostly with the former, but the latter is important also.2
Running time: For now let us concentrate on running time. (What we are saying can also be
applied to space, but space is somewhat easier to deal with than time.) Given a program, its
running time is not a fixed number, but rather a function. For each input (or instance of the
data structure), there may be a different running time. Presumably as input size increases so
does running time, so we often describe running time as a function of input/data structure
size n, denoted T (n). We want our notion of time to be largely machine-independent, so
rather than measuring CPU seconds, it is more common to measure basic “steps” that the
algorithm makes (e.g. the number of statements executed or the number of memory accesses).
This will not exactly predict the true running time, since some compilers do a better job of
optimization than others, but it will get us within a small constant factor of the true running
time most of the time.
When discussing data structures, we do not usually talk about the running time of the program
as a whole, but rather about the running time of the individual operations. For example:
Query time: How long does it take to answer a query on a data structure containing n
objects?
Update time: How long does it take to insert (or delete) an object from a data structure
containing n objects?
Bulk load time: Given a collection of n objects, how long does it take to construct a data
structure containing all of them?
Even measuring running time as a function of input size is not really well defined because,
for example, it may be faster to sort a list that is already sorted than one that
is randomly permuted. For this reason, we usually talk about worst-case running time: over
all possible inputs of size n, what is the maximum running time? It is often more reasonable
to consider expected case running time where we average over all inputs of size n. When
dealing with randomized algorithms (where the execution depends on random choices), it is
common to focus on the worst case over all inputs (of a given size) and expected case over
all random choices. Another example that is often used in data structure design is called
amortized analysis, where the average is taken over a series of operations. Any one operation
might be costly, but the overall average in any long sequence cannot be. We will usually do
worst-case analysis, except where it is clear that the worst case is significantly different from
the expected case.
2 Of course, there is another aspect of complexity that we will not discuss at length (but needs to be considered), and that is the software-engineering issues regarding the complexity of programming a correct implementation.



Review of Asymptotics: There is a particular bag of tricks that most algorithm analyzers use
to study the running time of algorithms. For this class we will try to stick to the basics. The
first element is the notion of asymptotic notation. Suppose that we have already performed
an analysis of an algorithm and we have discovered through our worst-case analysis that

T (n) = 13n^3 + 42n^2 + 2n log n + 3√n.

(This function was just made up as an illustration.) Unless we say otherwise, assume that
logarithms are taken base 2. When the value n is small, we do not worry too much about
this function since it will not be too large, but as n increases in size, we will have to worry
about the running time. Observe that as n grows larger, n^3 is much larger than n^2,
which is much larger than n log n (note that 0 < log n < n whenever n > 1), which is much
larger than √n. Thus the n^3 term dominates for large n. Also note that the leading factor 13
is a constant. Such constant factors can be affected by the machine speed or compiler, so we
may ignore them (as long as they are relatively small). We could summarize this function succinctly
by saying that the running time grows “roughly on the order of n^3”, and this is written
notationally as T (n) = O(n^3). Informally, the statement T (n) = O(n^3) means, “when you
ignore constant multiplicative factors and consider the leading (i.e., fastest growing) term,
you get n^3”. This intuition can be made more formal, however. We refer you back to your
algorithms course for how to do this.

Digression about Notation: Let us pause for a moment to explain the “=” in the assertion
that “T (n) = O(n3 )”. It is not being used in the usual mathematical sense. By definition,
“a = b” implies “b = a”. The use of “=” in asymptotic notation is just a lazy way of saying
that T (n) “is” on the order of n^3. It would not make sense to express this as O(n^3) = T (n).
A more proper way of writing this expression would be “T (n) ∈ O(n^3)”, that is, T (n)
is a member of the set of functions whose asymptotic growth rate is at most n^3. But,
most of the world simply uses the “=” notation, and so shall we.

Common Complexity Classes: To get a feeling for what various growth rates mean, here is a
summary. In data structures, our objective is usually to answer a query in time that is less than
the size of the data set n, so in this class we will be primarily interested in the following
complexity classes, which are collectively called sublinear.

T (n) = O(1) : Great. This means your algorithm takes only constant time. You can’t beat
this. (Example: Popping a stack.)
T (n) = O(α(n)) : The function α is the inverse of the famous Ackerman’s function. Acker-
man’s function grows insanely rapidly, and hence its inverse grows insanely slowly. How
slowly? If n is less than the number of atoms in the visible universe, then α(n) ≤ 5.
Thus, α(n) is a constant “for all practical purposes”. However, formally it is not a con-
stant. In the limit, as n tends to infinity, α(n) also tends to infinity. It just gets there
extremely slowly! Believe it or not, there are actually a number of data structures whose
running times are O(α(n)), but they are not O(1). (Example: Disjoint-set union-find.)
T (n) = O(log log n) : Super fast! For most practical purposes, this is as fast as a constant
time. (Example: We will probably not show any examples this semester, but one example
is a data structure for storing sets of integers, called a Van Emde Boas tree.)
T (n) = O(log n) : Very good. This is called logarithmic time, and is the “gold standard”
for data structures based on making binary comparisons. It is the running time of
binary search and the height of a balanced binary tree. This is about the best that
can be achieved for data structures based on binary trees. Note that log 1000 ≈ 10 and
log 1,000,000 ≈ 20 (logs base 2). (Example: Binary search.)
T (n) = O((log n)^k) : (where k is a constant). This is called polylogarithmic time. Not bad,
when simple logarithmic time is not achievable. We will often write this as O(log^k n).
(Example: Orthogonal range searching, that is, counting the number of points in a
d-dimensional axis-parallel rectangle.)

Fig. 1: (a) A point set, (b) orthogonal range search query, and (c) nearest-neighbor query.

T (n) = O(n^p) : (where 0 < p < 1 is a constant). An example is O(√n). This is slower than
polylogarithmic (no matter how big k is or how small p is), but is still faster than linear
time, which is acceptable for data structure use. (Example: Nearest neighbor searching
in d-dimensional space.)

In an algorithms course, it is more common to focus on running times that grow at least
linearly. These are described below.

T (n) = O(n) : This is called linear time. It is about the best that one can hope for if your
algorithm has to look at all the data. (Example: Enumerating the elements of a linked
list.)
T (n) = O(n log n) : This one is famous, because this is the time needed to sort a list of
numbers by means of comparisons. It arises in a number of other problems as well.
(Example: Sorting, of course.)
T (n) = O(n^2) : Called quadratic time. Okay if n is in the thousands, but rough when n gets
into the millions. (Example: 3Sum: Given a list of n numbers (positive and negative),
do any three sum to zero?)
T (n) = O(n^k) : (where k is a constant). Called polynomial time. Practical if k is not too
large. (Example: Matrix multiplication.)
T (n) = O(2^n), O(n^n), O(n!) : Called exponential time. Algorithms taking this much time are
only practical for the smallest values of n (e.g. n ≤ 10 or maybe n ≤ 20). (Example:
Your favorite NP-complete problem. . . as far as anyone knows!)

Lecture 2: Some Basic Data Structures


Basic Data Structures: Before we go into our coverage of complex data structures, it is good to
remember that in many applications, simple data structures are sufficient. This is true, for
example, if the number of data objects is small enough that efficiency is not so much an issue,
and hence a complex data structure is not called for. In many instances where you need a
data structure for the purposes of prototyping an application, these simple data structures
are quick and easy to implement.

Abstract Data Types: An important element to good data structure design is to distinguish
between the functional definition of a data structure and its implementation. By an abstract
data type (ADT) we mean a set of objects and a set of operations defined on these
objects. For example, a stack ADT is a structure which supports operations such as push
and pop (whose definition you are no doubt familiar with). A stack may be implemented in
a number of ways, for example using an array or using a linked list. An important aspect of
object-oriented languages, like Java, is the capability to present the user of a data structure
with an abstract view of its function without revealing the methods with which it operates.
Java’s interface/implements mechanism is an example. To a large extent, this course will be
concerned with the various approaches for implementing simple abstract data types and the
tradeoffs between these options.

Linear Lists: A linear list or simply list is perhaps the most basic of abstract data types. A list
is simply an ordered sequence of elements ⟨a1, a2, . . . , an⟩. We will not specify the actual type
of these elements here, since it is not relevant to our presentation. (In Java this would be
handled through generics.)
The size or length of such a list is n. Here is a very simple, minimalist specification of a list
(a sketch of it as a Java interface appears after the list):

init(): Initialize an empty list


get(i): Returns element ai
set(i,x): Sets the ith element to x
length(): Returns the number of elements currently in the list
insert(i,x): Insert element x just prior to element ai (causing the index of all subsequent
items to be increased by one).
delete(i): Delete the ith element (causing the indices of all subsequent elements to be
decreased by 1).
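
To make this specification concrete, here is a minimal sketch of the operations above as a Java generic interface. The interface name LinearList and the use of int indices are our own choices for illustration; init() corresponds to a constructor in Java, and any implementation (sequential or linked) would supply the method bodies.

Linear List Interface (sketch)
public interface LinearList<E> {
    E get(int i);             // returns element a_i
    void set(int i, E x);     // sets the ith element to x
    int length();             // returns the number of elements currently in the list
    void insert(int i, E x);  // inserts x just prior to element a_i
    void delete(int i);       // deletes the ith element
}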

I am sure that you can imagine many other useful operations, for example searching the list
for an item, splitting or concatenating lists, generating an iterator object for enumerating the
elements of the list.
There are a number of possible implementations of lists. The most basic question is whether
to use sequential allocation (meaning storing the elements sequentially in an array) or linked
allocation (meaning storing the elements in a linked list). (See Fig. 2.) With linked allocation
there are many other options to be considered. Is the list singly linked (each node pointing
to its successor in the list), doubly linked (each node pointing to both its successor and
predecessor), circularly linked (with the last node pointing back to the first)?

Stacks, Queues, and Deques: There are a few very special types of lists. The most well known
are of course stacks and queues. We’ll also discuss an interesting generalization, called the
deque (see Fig. 3):

Stack: Supports insertion (push) and removal (pop) from only one end of the list, called the
stack’s top. Stacks are among the most widely used of all data structures, and we will
see many applications of them throughout the semester.



Fig. 2: Common types of list allocation: sequential allocation, singly linked list, and doubly linked list.

Queue: Supports insertion (called enqueue) and removal (called dequeue), each from opposite
ends of the list. The end where insertion takes place is called the tail, and the end where
removals occur is called the head.
Deque: This data structure is a generalization of stacks and queues, called a double-ended
queue or “deque” for short. It supports insertions and removals from either end of the
list.
The name is actually a play on words. It is written like “d-e-que” for a “double-ended
queue”, but it is pronounced like deck, because it behaves like a deck of cards, since you
can deal off the top or the bottom.

Fig. 3: Stack (push/pop at the top), Queue (enqueue at the tail, dequeue at the head), and Deque (pushFront/popFront and pushBack/popBack at either end).

Both stacks and queues can be implemented efficiently as arrays or as linked lists. Note
that when a queue is implemented using sequential allocation (as an array) the head and tail
pointers chase each other around the array. When each reaches the end of the array it wraps
back around to the beginning of the array.
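
To illustrate how the head and tail indices wrap around, here is a minimal sketch (ours, not the notes' code) of a queue stored in a fixed-size array. The class and field names are assumptions for illustration; for simplicity it reports an error when the array is full (the reallocation strategy discussed next would remove this limitation).

Circular-Array Queue (sketch)
public class CircularQueue<E> {
    private Object[] data;        // the elements, stored circularly
    private int head = 0;         // index of the next element to dequeue
    private int size = 0;         // number of elements currently stored

    public CircularQueue(int capacity) { data = new Object[capacity]; }

    public void enqueue(E x) {                      // insert at the tail
        if (size == data.length) throw new IllegalStateException("queue is full");
        data[(head + size) % data.length] = x;      // wrap around the end of the array
        size++;
    }

    @SuppressWarnings("unchecked")
    public E dequeue() {                            // remove from the head
        if (size == 0) throw new IllegalStateException("queue is empty");
        E x = (E) data[head];
        head = (head + 1) % data.length;            // wrap around the end of the array
        size--;
        return x;
    }
}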

Dynamic Storage Reallocation: When sequential allocation is used for stacks and queues, an
important issue is what to do when an attempt is made to insert an element into an array
that is full. When this occurs, the usual practice is to allocate a new array of twice the
size of the existing array, and then copy the elements of the old array into the new one. For
example, if the initial stack or queue has 8 elements, then when an attempt is made to insert
a 9th element, we allocate an array of size 16, copy the existing 8 elements to this new array,
and then add the new element. When we fill this up, we then allocate an array of size 32,
and when it is filled an array of size 64, and so on.
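
The following is a minimal sketch (ours, not the notes' official code) of a sequentially allocated stack that applies this doubling strategy; the initial capacity of 8 matches the example above.

Dynamically Reallocated Stack (sketch)
public class DynArrayStack<E> {
    private Object[] data = new Object[8];   // current array (initial capacity 8)
    private int size = 0;                    // number of elements on the stack

    public void push(E x) {
        if (size == data.length) {                         // array is full?
            Object[] bigger = new Object[2 * data.length]; // allocate an array of twice the size
            System.arraycopy(data, 0, bigger, 0, size);    // copy the old elements over
            data = bigger;
        }
        data[size++] = x;
    }

    @SuppressWarnings("unchecked")
    public E pop() {
        if (size == 0) throw new java.util.EmptyStackException();
        return (E) data[--size];
    }
}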
To understand whether such an approach is efficient, we will view the problem from the
perspective of an amortized analysis.

Amortized Analysis: In an amortized analysis of a data structure, rather than the (worst-case)
cost of each individual operation, we consider the average cost of maintaining the data structure
over a long sequence of operations. The idea is that many operations are very cheap
to perform, but occasionally we need to perform a major reorganization, which can be quite
costly. More formally, suppose that the running time of any sequence of m data structure
operations (starting from an empty structure) is given by a function T (m). Then the amortized
cost of each operation is defined to be T (m)/m. Amortization is a way of establishing
that, when taken as a whole, the costly reorganizations do not dominate the cheap ones.
(The term comes from economics. It refers to spreading out expensive payments over small
installments.)
To make this more formal in the case of a dynamic stack, let’s fix a cost model, called the
actual cost:

Cheap: Each time we are asked to perform a push or pop, and the array is not reallocated,
it costs one work unit for the operation.
Expensive: If a push would cause the array to overflow, we reallocate the current array of
size n into a new array of size 2n, and it costs 2n work units to perform the reallocation
(which involves allocating the new array, initializing it, and copying the n elements over).
Finally, it costs 1 additional work unit to perform the actual push operation.

For example, suppose we start with an array of size n = 1, and we perform a sequence of m
pushes and pops. What is the total cost of all these operations? If we never did a reallocation
(just alternated one push and one pop), the overall cost would be exactly m, and this is the
lowest possible. But how high might the cost be? Suppose we just keep pushing elements.
Suppose, for example, that we push 17 elements. When we push the second element, we
overflow the array of size 1 and double the array size to n = 2. When we push the 3rd
element, we overflow the array of size 2 and double the array size to n = 4. Following
this pattern, we see that we will overflow with push numbers 5, 9, and 17, resulting in the
creation of new arrays of sizes 8, 16, and 32. The overall cost is 17 for the individual pushes
and 62 = 2 + 4 + 8 + 16 + 32 for the reallocations, for a total of 17 + 62 = 79.
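
As a sanity check on this arithmetic, here is a small sketch (ours) that simulates a sequence of pushes under the actual-cost model defined above: 1 work unit per push, plus 2n units whenever an array of size n is reallocated to size 2n. For 17 pushes it reports 79, matching the computation above; the class and method names are just for illustration.

Actual-Cost Simulation (sketch)
public class DoublingCost {
    static long actualCost(int pushes) {
        int capacity = 1;              // start with an array of size 1
        int size = 0;                  // number of elements currently stored
        long cost = 0;
        for (int i = 0; i < pushes; i++) {
            if (size == capacity) {    // this push would overflow?
                capacity *= 2;         // reallocate to twice the size...
                cost += capacity;      // ...paying 2n work units
            }
            size++;
            cost += 1;                 // 1 work unit for the push itself
        }
        return cost;
    }

    public static void main(String[] args) {
        System.out.println(actualCost(17));   // prints 79
    }
}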

Tokens, Charging, and Amortized Cost: How can we analyze the overall cost for an arbitrary
sequence of pushes and pops? We will show that in any sequence of m pushes and pops, the
overall cost cannot exceed 5m. (Intuitively, we pay a factor of at most 5 for the luxury of
never running out of space in our stack.)
To show this, we will employ a charging argument. Here is a high-level overview. We will play
the role of an accountant. Whenever the user asks us to perform a push or pop operation, we
will assess the user a fixed charge of 5 work tokens. We will store our work tokens in a bank
account, from which we will make withdrawals to pay for the actual costs (as defined above).
Most of the time, the operation will not involve any reallocation. We will take one of the
5 work tokens to pay the actual cost and put the remaining 4 tokens in our bank account.
We will show that, whenever it is time to perform a reallocation, we will always have enough
work tokens in our bank account to pay the actual costs. Since we charge 5 tokens for each
operation, it follows that the actual costs for any sequence of m operations never exceeds 5m.
An example is shown in Fig. 4. Let us suppose that the current array has size 8 and it contains
5 elements. We are asked to perform 4 consecutive pushes. The first 3 are all cheap and cost
1 + 1 + 1 = 3. The last one causes the array to overflow to 9 elements, which necessitates
that we reallocate to an array of size 16 for a total cost of 16 + 1 = 17. The overall actual
cost is 3 + 17 = 20. We charged the user 5 tokens for each operation. Thus, the total in our
bank account is 4 · 5 = 20, exactly what we need to pay for the actual costs! Did we just get
lucky, or is this always true? The following theorem shows that this is always true.

Fig. 4: Amortized analysis of doubling reallocation (actual cost: 1 + 1 + 1 + 17 = 20; collected tokens: 5 + 5 + 5 + 5 = 20).

Theorem: Starting with an empty stack, any sequence of m push/pop operations to our
dynamically allocated stack has a total actual cost of at most 5m. (That is, the amortized
cost of each stack operation is 5.)
Proof: Let’s take the sequence of length m and break it into subsequences by splitting it
just after each array reallocation. Let’s call each subsequence a run. Each run, except
possibly the last, consists of a sequence of cheap operations, and its last operation causes
a reallocation. We will show that the number of tokens collected during each run suffices
to pay the actual costs of the run. To simplify the analysis, we will ignore the first and
last runs. (We will leave the full proof as an exercise.)
Consider the state of affairs at the start of a run. We have just reallocated an array
of some size n, creating an array of size 2n, and (because we just overflowed an array
of size n) we know that the current array must have n + 1 actual elements. When the
run ends, we have just overflowed the 2n array, so there must be 2n + 1 elements in the
stack. Thus, there must be at least (2n + 1) − (n + 1) = n push and pop operations in
this run. This means that we have collected a total of at least 5n work tokens in the
run. We take away one token to pay for the actual cost of each operation, which leaves
us at least 4n tokens in our bank account. The next reallocation results in doubling our
array size to 4n for an actual cost of 4n, but since we have at least 4n tokens in our
bank account, we can pay for it.
In summary, by charging 5 work tokens for each of the m operations, we have enough
accumulated tokens to pay the total actual costs for all these operations. This implies
that the total actual costs cannot be more than 5m, as desired.

Okay, we have just shown that doubling works. What about other possible strategies?

Linear growth: What if, rather than doubling, we reallocate by adding a fixed increment,
say going from n elements to n + c elements for some large integer constant c ≥ 1? So,
the allocated stack sizes would be multiples of c: ⟨c, 2c, 3c, . . . , kc, . . .⟩.
Exponential growth: What if, rather than doubling, we reallocate by expanding by a fixed
constant factor, say going from n elements to ⌈cn⌉ for some constant c > 1? So, the allocated
stack sizes would be powers of c: ⟨1, c, c^2, c^3, . . . , c^k, . . .⟩.



Doubly-exponential growth: What if, rather than doubling, we squared the size of the
array, going from n to n^2? So, the allocated stack sizes would be ⟨c, c^2, c^4, c^8, . . . , c^(2^k), . . .⟩.

We will leave the analyses of these cases as an exercise, but (spoiler alert!) the fixed-increment
method will not yield a constant amortized cost, the constant-factor expansion will yield a
constant amortized cost (but the factor 5 will change), and the repeated squaring will yield
an amortized cost that gets arbitrarily close to 1 as m → ∞. (Unfortunately, it is rather
space-wasteful, which is why it is usually not used.)

Multilists and Sparse Matrices: Although lists are very basic structures, they can be combined
in numerous nontrivial ways. A multilist is a structure in which a number of lists are combined
to form a single aggregate structure. Java’s ArrayList is a simple example, in which a
sequence of lists are combined into an array structure. A more interesting example of this
concept is its use to represent a sparse matrix.
Recall from linear algebra that a matrix is a structure consisting of n rows and m columns,
whose entries are drawn from some numeric field, say the real numbers. In practice, n and m
can be very large, say on the order of tens to hundreds of thousands. For example, a physicist
who wants to study the dynamics of a galaxy might model the n stars of the galaxy using an
n × n matrix, where entry A[i, j] stores the gravitational force that star i exerts on star j.
The number of entries of such a matrix is n^2 (and generally nm for an n × m matrix). This
may be impractical if n is very large.
The physicist knows that most stars are so far apart from each other that (due to the inverse-
square law of gravity) only a small number of entries are significant, and all the others could
be set to zero. If, for example, n = 10,000 and a star typically exerts a significant gravitational
pull on only its 20 nearest stellar neighbors, then only 20/10,000 = 0.2% of the matrix
entries are nonzero. Such a matrix, in which only a small fraction of the entries are nonzero,
is called sparse.
We can use a multilist representation to store sparse matrices. The idea is to create 2n linked
lists, one for each row and one for each column. Each entry of each list stores five things: its
row and column indices, its numeric value, and links to the next items in the current row and
current column (see Fig. 5). We will not discuss the technical details, but all the standard
matrix operations (such as matrix multiplication, vector-matrix multiplication, transposition)
can be performed efficiently using this representation.
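
A node in this multilist might look something like the following minimal sketch; the field names are assumptions of ours, chosen to match the description above (row and column indices, the value, and the next-in-row and next-in-column links).

Sparse Matrix Node (sketch)
class SparseMatrixNode {
    int row, col;                   // row and column indices of this entry
    double val;                     // the numeric value stored here
    SparseMatrixNode nextInRow;     // next nonzero entry in the same row
    SparseMatrixNode nextInCol;     // next nonzero entry in the same column

    SparseMatrixNode(int row, int col, double val) {
        this.row = row; this.col = col; this.val = val;
    }
}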
As a challenge, you might think of how you would write a program to perform matrix multi-
plication using such a representation. The running time should be “sensitive” to the number
of entries in the two matrices.

Lecture 3: Rooted Trees and Binary Trees


Tree Definition and Notation: Trees and their variants are among the most fundamental data
structures. A tree is a special class of graph. Recall from your previous courses that a
graph G = (V, E) consists of a finite set of vertices (or nodes) V and a finite set of edges E.
Each edge is a pair of nodes. In an undirected graph (or simply “graph”), the edge pairs are
unordered, and in a directed graph (or “digraph”), the edge pairs are ordered. An undirected
graph is connected if there is a path between any pair of nodes.



Fig. 5: Sparse matrix representation using a multilist structure; each node stores its row and column indices, its value, and next-in-row and next-in-column links.

Fig. 6: Graphs: (a) undirected, (b) directed.



The most general form of a tree, called a free tree, is simply a connected, undirected graph
that has no cycles (see Fig. 7(a)). An example of a free tree is the minimum cost spanning
tree (MST) of a graph.

Fig. 7: Trees: (a) free tree, (b) rooted tree, (c) recursive definition.

Since we will want to use trees for applications in searching, it will be useful to assign some
sense of order and direction to our trees. This will be done by designating a special node,
called the root. In the same manner as a family tree, we think of the root as the ancestor of
all the other nodes in the tree, or equivalently, the other nodes are descendants of the root.
Nodes that have no descendants are called leaves (see Fig. 7(b)). All the others are called
internal nodes. Each node of the tree can be viewed as the root of a subtree, consisting of
this node and all of its descendants. Here is a formal (recursive) definition of a rooted tree:

Rooted tree: Is either:

• A single node, or
• A collection of one or more rooted trees {T1, . . . , Tk} joined under a common root
node (see Fig. 7(c)).

Since we will be dealing with rooted trees almost exclusively for the rest of the semester,
when we say “tree” we will mean “rooted tree.” We will use the term “free tree” otherwise.

Terminology: There is a lot of notation involving trees. Most terms are easily understood from
the family-tree analogy. Every non-root node is descended from its parent, and the direct
descendants of any node are called its children (see Fig. 8(a)). Two nodes that share a
common parent are called siblings, and other relations (e.g., grandparent, grandchild) follow
analogously.
The depth of a node in the tree is the length (number of edges) of the (unique) path from
the root to that node. Thus, the root is at depth 0. The height of a tree is the maximum
depth of any of its nodes (see Fig. 8(b)). For example, the tree of Fig. 7(b) has height 3, as
evidenced by nodes v9 and v10, which are at this depth. As we defined it, there is no special
ordering among the children of a node. When the ordering among a node's children is significant,
the tree is called an ordered tree.

Representing Rooted Trees: Rooted trees arise in many applications in which hierarchies exist.
Examples include computer file systems, hierarchically-based organizations (e.g., military and
corporate), documents and reports (volume → chapter → section . . . paragraph).

Fig. 8: (a) “Family” relations, (b) depth and height.

There are a number of ways of representing rooted trees. Later we will discuss specialized representations
that are tailored for special classes of trees (e.g., binary search trees), but for now let’s consider
how to represent a “generic” rooted tree.
The idea is to think of each node of the tree as containing a pointer to the head of a linked list
of its children, called its firstChild. Since a node may generally have siblings, in addition
it will have a pointer to its next sibling, called nextSibling. Finally, each node will contain a
data element (which we will also refer to as its entry), which will depend on the application
involved. This is illustrated in Fig. 9(a). Fig. 9(b) illustrates how the tree in Fig. 7(b) would
be represented using this technique.

Fig. 9: The “binary” representation of the rooted tree from Fig. 7(b).

Because each node stores two pointers (references), we will often refer to this as the binary
representation of a rooted tree. Note that this is a minimal representation. In practice, we may
wish to add additional information. (For example, we may also wish to include a reference
to each node’s parent.)
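
For concreteness, here is a minimal sketch of such a node in Java, in the same style as the binary-tree node given later; the class name RTNode is our own, and the optional parent link mentioned above is included as a comment.

Rooted Tree Node (sketch)
class RTNode<E> {
    E entry;                 // this node's data
    RTNode<E> firstChild;    // head of the linked list of this node's children
    RTNode<E> nextSibling;   // next node in the parent's child list
    // RTNode<E> parent;     // optional: reference to this node's parent
    // ... remaining details omitted
}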

Binary Trees: Among rooted trees, by far the most popular in the context of data structures is
the binary tree. A binary tree is a rooted, ordered tree in which every non-leaf node has two
children, called left and right (see Fig. 10(a)). We allow for a binary tree to be empty. (We will
see that, like the empty string, it is convenient to allow for empty trees.)
Binary trees can be defined more formally as follows. First, an empty tree is a binary tree.



Fig. 10: Binary trees: (a) standard definition, (b) full binary tree, (c) extended binary tree.

Second, if TL and TR are two binary trees (possibly empty) then the structure formed by
making TL and TR the left and right children of a node is also a binary tree. TL and TR are
called the subtrees of the root. If both children are empty, then the resulting node is a leaf.
Note that, unlike standard rooted trees, there is a difference between a node that has just
one child on its left side as opposed to a node that has just one child on its right side. All the
definitions from rooted trees (parent, sibling, depth, height) apply as well to binary trees.
Allowing for empty subtrees can make coding tricky. In some cases, we would like to forbid
such binary trees. We say that a binary tree is full if every node has either zero children (a
leaf) or exactly two (an internal node). An example is shown in Fig. 10(b).
Another approach to dealing with empty subtrees is through a process called extension. This
is most easily understood in the context of the tree shown in Fig. 10(a). We extend the tree
by adding a special external node to replace all the empty subtrees at the bottom of the tree.
The result is called an extended tree. (In Fig. 10(c) the external nodes are shown as squares.)
This has the effect of converting an arbitrary binary tree to a full binary tree.

Java Representation: The typical Java representation of a tree as a data structure is given
below. The entry field contains the data for the node and is of some generic type E. The
left field is a pointer to the left child (or null if this tree is empty) and the right field is
analogous for the right child.
Binary Tree Node
class BTNode<E> {
    E entry;           // this node's data
    BTNode<E> left;    // left child
    BTNode<E> right;   // right child
    // ... remaining details omitted
}

As with our rooted-tree representation, this is a minimal representation. Perhaps the most
useful augmentation would be a parent link.
Binary trees come up in many applications. One that we will see a lot of this semester is
for representing ordered sets of objects, a binary search tree. Another is an expression tree,
which is used in compiler design in representing a parsed arithmetic expression (see Fig. 11).

Traversals: There are a number of natural ways of visiting or enumerating every node of a tree.
For rooted trees, the three best known are preorder, postorder, and (for binary trees) inorder.



Let T be a tree whose root is r and whose subtrees are T1 , . . . , Tk for k ≥ 0. They are all
most naturally defined recursively. (Fig. 11 illustrates these in the context of an expression
tree.)

Preorder: Visit the root r, then recursively do a preorder traversal of T1 , . . . , Tk .


Postorder: Recursively do a postorder traversal of T1 , . . . , Tk and then visit r. (Note that
this is not the same as reversing the preorder traversal.)
Inorder: (for binary trees) Do an inorder traversal of TL , visit r, do an inorder traversal of
TR .

Fig. 11: Expression tree for ((a + b) ∗ c)/(d − e) and common traversals:
Preorder: / * + a b c - d e
Postorder: a b + c * d e - /
Inorder: a + b * c / d - e

These traversals are most easily coded using recursion. The code block below shows a possible
way of implementing the preorder traversal in Java. The procedure visit would depend on
the specific application. The algorithm is quite efficient in that its running time is proportional
to the size of the tree. That is, if the tree has n nodes, then the running times of these traversal
algorithms are all O(n).
Preorder Traversal
void preorder(BTNode v) {
    if (v == null) return;    // empty subtree - do nothing
    visit(v);                 // visit (depends on the application)
    preorder(v.left);         // recursively visit left subtree
    preorder(v.right);        // recursively visit right subtree
}

These are not the only ways of traversing a tree. For example, another option would be
breadth-first, which visits the nodes level by level: “/ * - + c d e a b.” An interesting
question is whether a traversal uniquely determines the tree’s shape. The short answer is no,
but if you have an extended tree and you know which nodes are internal and which are leaves
(as is the case in the expression tree example from Fig. 11), then such a reconstruction is
possible. Think about this.
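
For example, the breadth-first (level-by-level) traversal just mentioned can be implemented with a queue of nodes, as in the following sketch. This is our own illustration, using Java's built-in queue and the same conventions as the preorder code above (a raw BTNode and an application-dependent visit).

Breadth-First (Level-Order) Traversal (sketch)
void breadthFirst(BTNode root) {
    java.util.Queue<BTNode> queue = new java.util.LinkedList<>();
    if (root != null) queue.add(root);
    while (!queue.isEmpty()) {
        BTNode v = queue.remove();               // next node in level order
        visit(v);                                // visit (depends on the application)
        if (v.left != null) queue.add(v.left);   // children are visited on the next level
        if (v.right != null) queue.add(v.right);
    }
}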

Extended Binary Trees: To motivate our next topic, let’s consider the utilization of space in
our trees. Recall the binary tree shown in Fig. 10(a). It has nine nodes, and each node has
two child links, for a total of 18 links. We have eight actual links in the tree, which means
that the remaining 10 links are null. Thus, nearly half of the child links are null! This is not
unusual, as the following claim states. (Take note of the proof structure, since it is common
for proofs involving binary trees.)



Claim: A binary tree with n nodes has n + 1 null child links.
Proof: (by induction on the size of the tree) Let x(n) denote the number of null child links
in a binary tree of n nodes. We want to show that for all n ≥ 0, x(n) = n + 1. We’ll make
the convention of defining x(0) = 1. (If you like, you can think of the null pointer to
the root node as this link.)
For the basis case, n = 0, by our convention x(0) = 1, which satisfies the desired relation.
For the induction step, let’s assume that n ≥ 1. The induction hypothesis states that,
for all n′ < n, x(n′) = n′ + 1. A binary tree with at least one node consists of a root
node and two (possibly empty) subtrees, TL and TR . Let nL and nR denote the numbers
of nodes in the left and right subtrees, respectively. Together with the root, these must
sum to n, so we have n = 1+nL +nR . By the induction hypothesis, the numbers of null
links in the left and right subtrees are x(nL ) = nL + 1 and x(nR ) = nR + 1. Together,
these constitute all the null links. Thus, the total number of null links is

x(n) = x(nL ) + x(nR ) = (nL + 1) + (nR + 1) = (1 + nL + nR ) + 1 = n + 1,

as desired.

In large data structures, these null pointers represent quite a significant wastage of space.
What can we do about this? One idea is to distinguish two different node types, one for
internal nodes, which have (non-null) child links and another for leaves, which need no child
links. One way to construct such a tree is called “extension.” An extended binary tree is
constructed by replacing each null link with a special leaf node, called an external node.
(The other nodes are called internal nodes.) An example of an extended binary tree is shown
in Fig. 10(c). Because we replace each null link with a leaf node, we have the following direct
corollary from our previous theorem.

Corollary: An extended binary tree with n internal nodes has n + 1 external nodes, and
hence 2n + 1 nodes altogether.

The key “take-away” from this proof is that over half of the nodes in an extended binary tree
are leaf nodes. In fact, it is generally true that if the degree of a tree is two or greater, leaves
constitute the majority of the nodes.

Threaded Binary Trees: We have seen that extended binary trees provide one way to deal with
the wasted space caused by null pointers in the nodes of a binary tree. In this section we
will consider another rather cute use of these pointers.
Recall that binary tree traversals are naturally defined recursively. Therefore a straightfor-
ward implementation would require extra space to store the stack for the recursion. Is there some
way to traverse the tree without this additional storage? The answer is yes, and the trick
is to employ each null pointer to encode some additional information to aid in the traversal.
Each left-child null pointer stores a reference to the node’s inorder predecessor, and each
right-child null pointer stores a reference to the node’s inorder successor. The resulting rep-
resentation is called a threaded binary tree. (For example, in Fig. 12(a), we show a threaded
version of the tree in Fig. 10(b)).
We also need to add a special “mark bit” to each child link, which indicates whether the link
is a thread or a standard parent-child link. The node structure is shown below.



Fig. 12: A threaded tree: (a) the threaded tree, (b)–(c) finding an inorder successor.


Threaded Binary Tree Node
class ThreadBTNode<E> {
    E entry;                   // this node's data
    boolean leftIsThread;      // left child link is a thread
    boolean rightIsThread;     // right child link is a thread
    ThreadBTNode<E> left;      // left child
    ThreadBTNode<E> right;     // right child
    // ... remaining details omitted
}

Let us consider how to do an inorder traversal in a threaded-tree representation. Suppose
that we are currently visiting a node u. How do we find the inorder successor of u? First,
if u's right-child link is a thread, then we just follow it (see Fig. 12(b)). Otherwise, we go to
the node's right child, and then traverse left-child links until reaching the bottom of the tree,
that is, a threaded link (see Fig. 12(c)).
Inorder Successor in a Threaded Tree
ThreadBTNode inorderSuccessor(ThreadBTNode v) {
    ThreadBTNode u = v.right;          // go to right child (or follow thread)
    if (v.rightIsThread) return u;     // if a thread, then done
    while (!u.leftIsThread) {          // else follow left-child links...
        u = u.left;
    }                                  // ... until hitting a thread
    return u;
}

For example, in Fig. 12(b), if we start at d, the thread takes us directly to a, which is d’s
inorder successor. In Fig. 12(c), if we start at a, then we follow the right-child link to b, and
then follow left-links until arriving at d, which is the inorder successor. Of course, to turn
this into a complete traversal function, we would need to start by finding the first inorder
node in the tree. We’ll leave this as an exercise.
Threading is more of a “cute trick” than a common implementation technique with binary
trees. Nonetheless, it is representative of the number of clever ideas that have been developed
over the years for representing and processing binary trees.

Complete Binary Trees: We have discussed linked allocation strategies for rooted and binary
trees. Is it possible to allocate trees using sequential (that is, array) allocation? In general

Trees 18 CMSC 420


it is not possible because of the somewhat unpredictable structure of trees (unless you are
willing to waste a lot of space). However, there is a very important case where sequential
allocation is possible.

Complete Binary Tree: Every level of the tree is completely filled, except possibly the
bottom level, which is filled from left to right.

It is easy to verify that a complete binary tree of height h has between 2^h and 2^(h+1) − 1
nodes, implying that a tree with n nodes has height O(log n) (see Fig. 13). (We leave these
as exercises involving geometric series.)
Fig. 13: A complete binary tree, with nodes numbered in level order and stored sequentially in an array.

The extreme regularity of complete binary trees allows them to be stored in arrays, so no
additional space is wasted on pointers. Consider an indexing of nodes of a complete tree from
1 to n in increasing level order (so that the root is numbered 1 and the last leaf is numbered
n). Observe that there is a simple mathematical relationship between the index of a node
and the indices of its children and parents. In particular:

leftChild(i): if (2i ≤ n) then 2i, else null.
rightChild(i): if (2i + 1 ≤ n) then 2i + 1, else null.
parent(i): if (i ≥ 2) then ⌊i/2⌋, else null.

As an exercise, see if you can also compute the sibling of node i and the depth of node i.
Observe that the last leaf in the tree is at position n, so adding a new leaf simply means
inserting a value at position n + 1 in the list and updating n. Since arrays in Java are indexed
from 0, omitting the 0th entry of the array is a bit of wastage. Of course, the above rules
can be adapted to work even if we start indexing at zero, but they are not quite so simple.
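
Here is a minimal sketch (ours) of the index arithmetic above in Java, using 1-based indices as in the figure and returning 0 to play the role of null; n is the number of nodes.

Complete Binary Tree Index Arithmetic (sketch)
int leftChild(int i, int n)  { return (2*i <= n)   ? 2*i     : 0; }   // left child of node i
int rightChild(int i, int n) { return (2*i+1 <= n) ? 2*i + 1 : 0; }   // right child of node i
int parent(int i)            { return (i >= 2)     ? i / 2   : 0; }   // integer division gives floor(i/2)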

Lecture 4: Binary Search Trees


Searching: Searching is among the most fundamental problems in data structure design. Our
objective is to store a set of entries {e1 , . . . , en }, where each ei is a pair (xi , vi ), where xi
is a key value drawn from some totally ordered domain (e.g., integers or strings) and vi is
an associated data value. The data value is not used in the search itself, but is needed by
whatever application is using our data structure.
We assume that keys are unique, meaning that no two entries share the same key. Given an
arbitrary search key x, the basic search problem is determining whether there exists an entry

Binary Search Trees 19 CMSC 420


matching this key value. To implement this, we will assume that we are given two types, Key
and Value. We will also assume that key values can be compared using the usual comparison
operators, such as <, >, ==. We make no assumptions about Value, since it is used by the
application only. (We will discuss Java implementation at the end of the lecture.)

The Dictionary (Map) ADT: A dictionary (also called a map) is an ADT that supports the
operations insert, delete, and find:

void insert(Key x, Value v): Stores an entry with the key-value pair (x, v). We assume
that keys are unique, and so if this key already exists, an error condition will be signaled
(e.g., an exception will be thrown).
void delete(Key x): Delete the entry with x's key from the dictionary. If this key does not
appear in the dictionary, then an error condition is signaled.
Value find(Key x): Determine whether there is an entry matching x's key in the dictionary.
If so, return a reference to the associated value. Otherwise, return a null reference. (A
sketch of this ADT as a Java interface appears below.)
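
Here is a minimal sketch of this ADT as a Java interface. The interface name and the requirement that Key be Comparable are our own assumptions; the operations themselves are as described above.

Dictionary (Map) ADT (sketch)
public interface Dictionary<Key extends Comparable<Key>, Value> {
    void insert(Key x, Value v);  // error (e.g., exception) if an entry with key x already exists
    void delete(Key x);           // error if no entry with key x exists
    Value find(Key x);            // the associated value, or null if no entry with key x exists
}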

There are a number of additional operations that one may like to support, such as iterating
the entries, answering range queries (that is, reporting or counting all objects between some
minimum and maximum key values), returning the kth smallest key value, and computing
set operations such as union and intersection.
There are three common methods for storing dictionaries: sorted arrays, search trees, and
hashing. We will discuss the first two in this and subsequent lectures. (Hashing will be presented
later this semester.)

Sequential Allocation: The most naive approach for implementing a dictionary data structure
is to simply store the entries in a linear array without any sorting. To find a key value, we
simply run sequentially through the list until we find the desired key. Although this is simple,
it is not efficient. Searching and deletion each take O(n) time in the worst case, which is very
bad if n (the number of items in the dictionary) is large. Although insertion involves only
O(1) time to insert a new item at the end of the array (assuming we don’t overflow), it would
require O(n) time to check that we haven’t inserted a duplicate element.
An alternative is to sort the entries by key value. Now, binary search can be used to locate
a key in O(log n) time, which is much more efficient. (For example, if n = 10^6, log2 n is only
around 20.) While searches are fast, updates are slow. Insertion and deletion require O(n)
time, since the elements of the array must be moved around to make space.
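
For reference, here is a minimal sketch (ours) of the O(log n) binary search mentioned above, on a sorted array of integer keys; it returns the index of x, or -1 if x is not present. Insertion and deletion in the array still require O(n) time for shifting.

Binary Search in a Sorted Array (sketch)
int binarySearch(int[] keys, int x) {
    int lo = 0, hi = keys.length - 1;             // current search range
    while (lo <= hi) {
        int mid = lo + (hi - lo) / 2;             // midpoint of the range
        if      (keys[mid] == x) return mid;      // found it
        else if (keys[mid] <  x) lo = mid + 1;    // search the upper half
        else                     hi = mid - 1;    // search the lower half
    }
    return -1;                                    // fell out of the range: not found
}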

Binary Search Trees: In order to provide the type of rapid access that binary search offers, but
at the same time allow efficient insertion and deletion of keys, the simplest generalization
is called a binary search tree. The idea is to store the records in the nodes of a binary tree.
Each node, called a BSTNode, stores a key, key, and an associated value, value.
These nodes are organized so that an inorder traversal visits the nodes in increasing key order.
In particular, if x is the key stored in some node, then the left subtree contains all keys that
are less than x, and the right subtree stores all keys that are greater than x (see Fig. 14(a)).
(Recall that we assume that keys are distinct, so no other key can be equal to x.)

Search in Binary Search Trees: The search for a key x proceeds as follows. We start by as-
signing a pointer p to the root of the tree. We compare x to p’s key, that is, p.key. If they
are equal, we have found it and we are done. Otherwise, if x is smaller, we recursively search
p’s left subtree, and if x is larger, we recursively visit p’s right subtree. The search proceeds
until we either find the key (as with find(5) in Fig. 14) or we fall out of the tree (as with
find(14) in Fig. 14). The code block below shows a possible pseudocode implementation of the
find operation. The initial call is find(x, root), where root is the root of the tree. It is easy
to see, based on the definition of a binary search tree, why this is correct.

Fig. 14: A possible binary search tree for the key set {1, 2, 4, 5, 7, 8, 9, 11, 13, 15, 16, 17, 18, 21, 22}
and the search paths for find(5) and find(14).
Recursive Binary Tree Search
Value find(Key x, BSTNode p) {
if (p == null) return null; // unsuccessful search
else if (x < p.key) // x is smaller?
return find(x, p.left); // ... search left
else if (x > p.key) // x is larger?
return find(x, p.right); // ... search right
else return p.value; // successful search
}

Query time: What is the running time of the search algorithm? Well, it depends on the key you
are searching for. In the worst case, the search time is proportional to the height of the tree.
The height of a binary search tree with n entries can be as low as O(log n) in the case of a
balanced tree (see Fig. 15, left) or as large as O(n) in the case of a degenerate tree (see Fig. 15,
right). However, we shall see that if the keys are inserted in random order, the expected
height of the tree is just O(log n).

Fig. 15: Balanced (height O(log n)) and degenerate (height O(n)) binary trees.

Can we devise ways to force the tree height to be O(log n)? The answer is yes, and in future
lectures we will see numerous approaches for guaranteeing this.



Insertion: To insert a new key-value entry (x, v) in a binary search tree, we first try to locate the
key in the tree. If we find it, then the attempt to insert a duplicate key is an error. If not,
we effectively “fall out” of the tree at some node p. We insert a new leaf node containing the
desired entry as a child of p. It turns out that this is always the right place to put the new
node. (For example, in Fig. 16, we fall out of the tree at the left child of node 15, and we
insert the new node with key 14 here.)

Fig. 16: Binary tree insertion (insert(14)).

This can naturally be implemented recursively. If the inserted key is smaller than the current
node’s key, we insert recursively on the left. If it is greater, we insert on the right. (It cannot
be equal, or we have a duplicate key!) When we fall out of the tree (a null link), we create a
new node here. The pseudocode is presented in the code block below. The initial call is root
= insert(x, v, root). We assume that there is a constructor for the BSTNode, which is
given the key, value, and the initial values of the left and right child pointers.
Recursive Binary Tree Insertion
BSTNode insert(Key x, Value v, BSTNode p) {
if (p == null) // fell out of the tree?
p = new BSTNode(x, v, null, null); // ... create a new leaf node here
else if (x < p.key) // x is smaller?
p.left = insert(x, v, p.left); // ...insert left
else if (x > p.key) // x is larger?
p.right = insert(x, v, p.right); // ...insert right
else throw DuplicateKeyException; // x is equal ...duplicate key!
return p; // return ref to current node (sneaky!)
}

Coding Trick: (Be sure you understand this!) You will see one technical oddity in the above
pseudocode implementation. Consider the line: “p.left = insert(x, v, p.left)”. Why
does the insert function return a value, and why do we replace the left-child link with this
value?
Here is the issue. Whenever we create the new node, we need to “reach up” and modify
the appropriate child link in the parent’s node. Unfortunately, this is not easy to do in our
recursive formulation, since the parent node is not a local variable. To do this, our insert
function will return a reference to the modified subtree after the insertion. We store this
value in the appropriate child pointer for the parent.
To better understand how our coding trick works, let’s refer back to Fig. 16. Let p1 denote
the node containing 15. Since 14 < 15, we effectively invoked the command “p1.left =
insert(x, v, p1.left)”. But since this node has no left child (p1.left == null), this
recursive call creates a new BSTNode containing 14 and returns a pointer to it, call it p2.
Since this is the return value from the recursive call, we effectively have executed “p1.left
= p2”, thus linking the 14 node as the left child of the 15 node. Slick!
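If the trick of returning a reference feels too sneaky, note that the same effect can be obtained by tracking the parent explicitly and patching its child link directly. The following iterative sketch (illustrative only, written in the notes’ pseudocode style; insertIterative and the root field are assumed names) shows this alternative design; the recursive version trades this case analysis for the extra return value.

Iterative Insertion with an Explicit Parent (sketch)
void insertIterative(Key x, Value v) {
    BSTNode q = new BSTNode(x, v, null, null);   // the new leaf node
    if (root == null) { root = q; return; }      // empty tree: new node becomes the root
    BSTNode p = root;                            // p plays the role of the parent
    while (true) {
        if (x < p.key) {                         // x belongs in the left subtree
            if (p.left == null) { p.left = q; return; }
            p = p.left;
        } else if (x > p.key) {                  // x belongs in the right subtree
            if (p.right == null) { p.right = q; return; }
            p = p.right;
        } else throw DuplicateKeyException;      // duplicate key!
    }
}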

Deletion: Next, let us consider how to delete an entry from the tree. Deletion is more involved than insertion. While insertion adds nodes only at the leaves of the tree, deletions can occur
at any place within the tree. Deleting a leaf node is relatively easy, since it effectively involves
“undoing” the insertion process (see Fig. 17(a)). Deleting an internal node requires that we
“fill the hole” left when this node goes away. The easiest case is when the node has just
a single child, since we can effectively slide this child up to replace the deleted node (see
Fig. 17(b)).

Fig. 17: Deletion: (a) the leaf case and (b) the single-child case.

The hardest case is when the deleted node has two children. Let p denote the node to
be deleted (see Fig. 18(a)):

• First off, we need to find a suitable replacement node, whose key/value will fill the hole left by deletion. We can either take the largest key from p’s left subtree (its inorder predecessor) or the smallest key from its right subtree (its inorder successor). Let’s arbitrarily decide to do the latter. Call this node r (see Fig. 18(b)). Note that because p has two children, its inorder successor is the “leftmost” node of p’s right subtree. Call r the replacement node.
• Copy the contents of r (both key and value) to p (see Fig. 18(c)).
• Recursively delete node r from p’s right subtree (see Fig. 18(d)).

It may seem that we have made no progress, because we have just replaced one deletion
problem (for p) with another (for r). However, the task of deleting r is much simpler. The
reason is, since r is p’s inorder successor, r is the leftmost node of p’s right subtree. It follows
that r has no left child. Therefore, r is either a leaf or it has a single child, implying that it
is one of the two “easy” deletion cases that we discussed earlier.

Deletion Implementation: Before giving the code for deletion, we first present a utility function, findReplacement(), which returns a pointer to the node that will replace p in the two-child case. As mentioned above, this is the inorder successor of p, that is, the leftmost node in p’s right subtree.


Fig. 18: Deletion: the two-child case.

As with the insertion method, the initial call is made to the root of the
tree, delete(x, root). Again, we will employ the sneaky trick of returning a pointer to the
revised subtree after deletion, and store this value in the child link. See the code fragment
below.
Replacement Node for the Two-child Case
BSTNode findReplacement(BSTNode p) { // find p’s replacement node
BSTNode r = p.right; // start in p’s right subtree
while (r.left != null) r = r.left; // go to the leftmost node
return r;
}

The full deletion code is given in the following code fragment. As with insertion, the code
is quite tricky. For example, can you see where the leaf and single-child cases are handled
in the code? We do not have a conditional that distinguishes between these cases. How can
that be correct? (But it is!)

Analysis of Binary Search Trees: It is not hard to see that all of the procedures find(),
insert(), and delete() run in time that is proportional to the height of the tree being
considered. (The delete() procedure is the only one for which this is not obvious. Because
the replacement node is the inorder successor of the deleted node, it is the leftmost node of
the right subtree. This implies that the replacement node has no left child, and so it will fall
into one of the easy cases, which do not require a recursive call.)
The question is, given a binary search tree T containing n keys, what can be said about the
height of the tree? It is easy to devise “terrible” insertion orders so that the tree has the worst
possible height of n − 1. Conversely, it is possible to devise “perfect” insertion orders, which
result in a balanced tree of height ⌊log2 n⌋. (As an exercise, think about how you would do
either of these.)
Since the worst case is obviously very bad, let’s consider the expected case. Suppose that the
keys are inserted in random order. To be more precise, given n keys, let us assume that each
of the n! insertion orders is equally likely. What is the expected height of the resulting tree?



Binary Tree Deletion
BSTNode delete(Key x, BSTNode p) {
if (p == null) // fell out of tree?
throw KeyNotFoundException; // ...error - no such key
else {
if (x < p.key) // look in left subtree
p.left = delete(x, p.left);
else if (x > p.key) // look in right subtree
p.right = delete(x, p.right);
// found it!
else if (p.left == null || p.right == null) { // either child empty?
if (p.left == null) return p.right; // return replacement node
else return p.left;
}
else { // both children present
BSTNode r = findReplacement(p); // find replacement node
copy r’s contents to p; // copy its contents to p
p.right = delete(r.key, p.right); // delete the replacement
}
}
return p;
}

It turns out that the expected height is pretty good!

Theorem: Given a set of n keys x1 < x2 < . . . < xn , let H(n) denote the expected height
of a binary search tree, under the assumption that every one of the n! insertion orders
is equally likely. Then H(n) = O(log n).

Proving this is not a trivial exercise. (If you saw the expected-case analysis for Quicksort in
CMSC 351, it is quite similar.)
Quick and Dirty Analysis: (Optional)
Rather than give the full proof, we will provide a much simpler one, which will hopefully
convince you that the assertion is reasonable. Instead of proving that every node of the tree
is at expected depth O(log n), we will prove that the leftmost node of the tree (that is, the
node associated with the smallest key value) will be at expected depth O(log n).

Theorem: Given a set of n keys x1 < x2 < . . . < xn , let D(n) denote the expected depth
of leftmost node x1 after inserting all these keys in a binary search tree, under the
assumption that all n! insertion orders are equally likely. Then D(n) ≤ 1 + ln n, where
ln denotes the natural logarithm.
Proof: We will track the depth of the leftmost node of the tree through the sequence of
insertions. Suppose that we have already inserted i − 1 keys from the sequence, and we
are in the process of inserting the ith key. The only way that the leftmost node changes
is when the ith key is the lowest key value that has been seen so far in the sequence.
That is, the ith element to be inserted is the new minimum value among all the keys in the tree.
For example, consider the insertion sequence S = ⟨9, 5, 10, 6, 3, 4, 2⟩. Observe that the
minimum value changes three times, when 5, 3, and 2 are inserted. If we look at the
binary tree that results from this insertion sequence we see that the depth of the leftmost
node also increases by one with each of these insertions.

Fig. 19: Length of the leftmost chain for the insertion order ⟨9, 5, 10, 6, 3, 4, 2⟩.

To complete the analysis, it suffices to determine (in expectation) the number of times
that the minimum in a sequence of n random values changes. To make this formal, for
2 ≤ i ≤ n, let Xi denote the random variable that is 1 if the ith element of the random
sequence is the minimum among the first i elements, and 0 otherwise. (In our sequence
S above, X2 = X5 = X7 = 1, because the minimum changed when the second, fifth, and
seventh elements were added. The remaining Xi ’s are zero.)
To analyze Xi , let’s just focus on the first i elements and ignore the rest. Since every
permutation of the numbers is equally likely, the minimum among the first i is equally
likely to come at any of the positions first, second, . . . , up to ith. The minimum changes
only if it comes last out of the first i. Thus, Pr(Xi = 1) = 1/i and Pr(Xi = 0) = 1 − 1/i.
Whenever this random event occurs (Xi = 1), the minimum has changed one more time.
Therefore, to obtain the expected number of times that the minimum changes, we just
need to sum the probabilities that Xi = 1, for i = 2, . . . , n. Thus we have
D(n) = Σ_{i=2}^{n} 1/i = (Σ_{i=1}^{n} 1/i) − 1.

This summation is among the most famous in mathematics. It is called the Harmonic Series. Unlike the geometric series (with terms 1/2^i), the Harmonic Series does not converge. But it is known that when n is large, its value is very close to ln n, the natural log of n. (In fact, it is never more than 1 + ln n, since Σ_{i=2}^{n} 1/i ≤ ∫_1^n (1/x) dx = ln n.)
Therefore, we conclude that the expected depth of the leftmost node in a binary search
tree under n random insertions is at most 1 + ln n = O(log n), as desired.
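The bound is easy to check empirically. The following small Java sketch (illustrative; the class and variable names are our own) counts how many times the running minimum changes in a random permutation, which by the argument above equals the depth of the leftmost node, and compares the average against the 1 + ln n bound.

Estimating the Expected Depth of the Leftmost Node (sketch)
import java.util.Random;

public class LeftmostDepthExperiment {
    public static void main(String[] args) {
        int n = 1000, trials = 10000;
        Random rand = new Random();
        double total = 0;
        for (int t = 0; t < trials; t++) {
            int[] perm = new int[n];                     // random permutation of 0..n-1
            for (int i = 0; i < n; i++) perm[i] = i;
            for (int i = n - 1; i > 0; i--) {            // Fisher-Yates shuffle
                int j = rand.nextInt(i + 1);
                int tmp = perm[i]; perm[i] = perm[j]; perm[j] = tmp;
            }
            int depth = 0, min = perm[0];                // count changes of the running minimum
            for (int i = 1; i < n; i++)
                if (perm[i] < min) { min = perm[i]; depth++; }
            total += depth;
        }
        System.out.println("average depth of leftmost node = " + total / trials);
        System.out.println("1 + ln n                       = " + (1 + Math.log(n)));
    }
}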

Random Insertions and Deletions: Interestingly, the analysis of the expected-case tree height
breaks down if we perform both insertions and deletions. Suppose that we consider a very
long sequence of insertions and deletions, which occur at roughly the same rate so that, in
steady state, the tree has roughly n nodes. Let us also assume that insertions are random
(drawn say from some large domain of candidate elements) and deletions are random in the
sense that a random element from the tree is deleted each time.
It is natural to suppose that the O(log n) bound should apply, but remarkably it does not! It
can be shown that over a long sequence, the height of the tree will converge to a significantly
larger value of O(√n).3
The reason has to do with the fact that the replacement element was chosen in a “biased”
manner, always taking the inorder successor. Over the course of many deletions, this repeated
bias causes the tree’s structure to skew away from the ideal. This can be remedied by
selecting the replacement node in an unbiased manner, choosing randomly between the inorder
successor or inorder predecessor. It has been shown experimentally that this resolves the issue,
but (to the best of my knowledge) it is not known whether the expected height of this unbiased
version of deletion matches the expected height for the insertion-only case (see Culberson and
Munro, Algorithmica, 1990).
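In code, the unbiased variant amounts to a coin flip in the two-child case of delete(). The following sketch (an assumption about how this might be written, in the notes’ pseudocode style; coinFlip() and findPredecessorReplacement() are hypothetical utilities) illustrates the idea.

Unbiased Choice of Replacement Node (sketch)
BSTNode findPredecessorReplacement(BSTNode p) {  // mirror image of findReplacement()
    BSTNode r = p.left;                          // start in p's left subtree
    while (r.right != null) r = r.right;         // go to the rightmost node
    return r;
}

// ... inside delete(), when both children are present:
if (coinFlip()) {                                // coinFlip(): assumed random-boolean utility
    BSTNode r = findReplacement(p);              // use the inorder successor
    copy r's contents to p;
    p.right = delete(r.key, p.right);
} else {
    BSTNode r = findPredecessorReplacement(p);   // use the inorder predecessor
    copy r's contents to p;
    p.left = delete(r.key, p.left);
}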

Java Implementation: So far, we have been expressing our functions in pseudocode. Defining
a binary search tree object in a modern object-oriented language like Java would be done
in a slightly different manner. First, rather than fixing the types Key and Value, we would
make them parameterized (generic) types. Let’s call these K and V, respectively. We should not
assume that we can simply apply operators such as “<”, “>”, and “==” for comparing keys.
For this reason, we will assume that our K object implements the Comparable<K> interface,
which means that it defines a method compareTo, which compares two objects of type K.
In order to implement nodes, the class BinarySearchTree can define an inner class, called
BSTNode. In Java, we can use the default access modifier. This means that the BSTNode mem-
bers are accessible within the same package but hidden from the outside. Public functions,
like find, insert, and delete, each invoke corresponding local functions, each of which is
invoked on the root of the tree. A partial example (showing just the find function) is shown
in the following code fragment.

Lecture 5: AVL Trees


Balanced Binary Trees: The binary search trees described in the previous lecture are easy to
implement, but they suffer from the fact that if nodes are inserted in a poor order (e.g.,
increasing or decreasing) then the height of the tree can be much higher than the ideal height
of O(log n). This raises the question of whether we can design a binary search tree that is
guaranteed to have O(log n) height, irrespective of the order of insertions and deletions.
Today we will consider the oldest, and perhaps best known, example of such a data structure: the famous AVL tree, which was discovered way back in 1962 by G. Adelson-Velskii and E. Landis (hence the name “AVL”).

AVL Trees: AVL trees are height-balanced binary search trees. In an absolutely ideal height-
balanced tree, the two children of any internal node would have equal heights, but it is
not generally possible to achieve this goal. The most natural relaxation of this condition is
expressed in the following invariant:
3
There is an interesting history regarding this question. It was believed for a number of years that random
deletions did not alter the structure of the tree. A theorem by T. N. Hibbard in 1962 proved that the tree structure
was probabilistically unaffected by deletions. The first edition of D. E. Knuth’s famous book on data structures quotes
this result. In the mid 1970’s, Gary Knott, a Ph.D. student of Knuth and later a professor at UMD, discovered a
subtle flaw in Hibbard’s result. While the structure of the tree is probabilistically the same, the distribution of keys
is not. However, Knott could not resolve the asymptotic running time. The analysis showing the O(√n) bound was
due to Culberson and Munro in the mid 1980’s.



(Partial) Java Binary Tree
public class BinarySearchTree<K extends Comparable<K>, V> {

class BSTNode { // A node of the tree


K key;
V value;
BSTNode left;
BSTNode right;

// ... other utility methods omitted


}

V find(K x, BSTNode p) { // local find function


if (p == null) return null; // unsuccessful search
else if (x.compareTo(p.key) < 0) // x is smaller?
return find(x, p.left); // ... search left
else if (x.compareTo(p.key) > 0) // x is larger?
return find(x, p.right); // ... search right
else return p.value; // successful search
}

// ... other methods (insert, delete, ...) omitted

BSTNode root; // root of tree

public V find(K x) { // public find key


return find(x, root); // invoke local find
}
}



AVL balance condition: For every node in the tree, the absolute difference in the heights
of its left and right subtrees is at most 1.
AVL Tree: A binary search tree that satisfies the AVL balance condition.

For any node v of the tree, let height(v) denote the height of the subtree rooted at v (shown
in blue in Fig. 20(a)). It will be convenient to define the height of an empty tree (that is, a
null pointer) to be −1. Define the balance factor of v, denoted balance(v) to be

balance(v) = height(v.right) − height(v.left)

(see Fig. 20(b)). The AVL balance condition is equivalent to the requirement that balance(v) ∈
{−1, 0, +1} for all nodes v of the tree. (Thus, Fig. 20(b) is an AVL tree, but the tree of
Fig. 20(c) is not because node 10 has a balance factor of +2.)
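For intuition, the balance condition can be verified by a simple recursive check. The following sketch (illustrative only; checkAVL and the −2 failure value are our own naming choices) returns the height of a subtree, or −2 if the AVL balance condition is violated somewhere within it.

Checking the AVL Balance Condition (sketch)
int checkAVL(BSTNode v) {                  // returns height of v, or -2 if unbalanced
    if (v == null) return -1;              // an empty tree has height -1
    int hL = checkAVL(v.left);             // height of left subtree
    int hR = checkAVL(v.right);            // height of right subtree
    if (hL == -2 || hR == -2) return -2;   // a subtree is already unbalanced
    if (Math.abs(hR - hL) > 1) return -2;  // balance factor not in {-1, 0, +1}
    return 1 + Math.max(hL, hR);           // height of the subtree rooted at v
}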

Fig. 20: The AVL-tree balance condition: (a) subtree heights, (b) balance factors, (c) not an AVL tree.

Worst-case Height: Before discussing how we maintain this balance condition we should consider
the question of whether this condition is strong enough to guarantee that the height of an
AVL tree with n nodes is O(log n). Interestingly, the famous Fibonacci numbers will arise in
the analysis. Recall that for h ≥ 0, the hth Fibonacci number, denoted Fh, is defined by the
following recurrence:

F0 = 0, F1 = 1, and Fh = Fh−1 + Fh−2 , for h ≥ 2.

An important and well-known property of the Fibonacci numbers is that they grow exponentially. In particular, Fh ≈ ϕ^h/√5, where ϕ = (1 + √5)/2 ≈ 1.618 is the famous Golden Ratio.4

Lemma: An AVL tree of height h ≥ 0 has Ω(ϕ^h) nodes, where ϕ = (1 + √5)/2.
Proof: For h ≥ 0, let N(h) denote the minimum possible number of nodes in a binary tree of height h that satisfies the AVL balance condition. We will prove that N(h) = Fh+3 − 1 (see Fig. 21). The result will then follow from the fact that Fh+3 ≈ ϕ^(h+3)/√5, which is equal to ϕ^h up to constant factors (since ϕ itself is a constant).
4
Here is a sketch of a proof. Let us conjecture that Fh ≈ ϕ^h for some constant ϕ. Since the function grows very (exponentially) fast, we may ignore the tiny contribution of the +1 in the definition for large h. Substituting our conjectured value for Fh into the above recurrence, we find that ϕ satisfies ϕ^h = ϕ^(h−1) + ϕ^(h−2). Removing the common factor of ϕ^(h−2), we have ϕ^2 = ϕ + 1, that is, ϕ^2 − ϕ − 1 = 0. This is a quadratic equation, and by applying the quadratic formula, we conclude that ϕ = (1 + √5)/2.



h        0    1    2    3    4    5
Fh+3     2    3    5    8   13   21
N(h)     1    2    4    7   12   20

Fig. 21: Minimal AVL trees and Fibonacci numbers.

Our proof is by induction on h. First, observe that a tree of height zero consists of a
single root node, so N (0) = 1. Also, the smallest possible AVL tree of height one consists
of a root and a single child, so N (1) = 2. By definition of the Fibonacci numbers, we
have F0+3 = 2 and F1+3 = 3, and thus N (i) = Fi+3 − 1, for these two basis cases.
For h ≥ 2, let hL and hR denote the heights of the left and right subtrees, respectively.
Since the tree has height h, one of the two subtrees must have height h − 1, say, hL .
To minimize the overall number of nodes, we should make the other subtree as short as
possible. By the AVL balance condition, this implies that hR = h − 2. Adding a +1
for the root, we have N (h) = 1 + N (h − 1) + N (h − 2). We may apply our induction
hypothesis to conclude that

N(h) = 1 + N(h − 1) + N(h − 2) = 1 + (Fh+2 − 1) + (Fh+1 − 1)
     = Fh+2 + Fh+1 − 1 = Fh+3 − 1,

as desired.
Corollary: An AVL tree with n nodes has height O(log n).
Proof: Let lg denote the logarithm base 2. From the above lemma, up to constant factors we have n ≥ ϕ^h, which implies that h ≤ log_ϕ n = lg n / lg ϕ. Since ϕ > 1 is a constant, so is lg ϕ. Therefore, h is O(log n). (If you work through the math, the actual bound on the height is roughly 1.44 lg n.)
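The recurrence in the proof is easy to tabulate. The following sketch (illustrative; the class and array names are our own) computes N(h) directly from N(h) = 1 + N(h − 1) + N(h − 2) and checks it against Fh+3 − 1.

Tabulating N(h) and Fh+3 − 1 (sketch)
public class MinAVLNodes {
    public static void main(String[] args) {
        int H = 20;
        long[] F = new long[H + 4];                 // Fibonacci numbers F[0..H+3]
        F[0] = 0; F[1] = 1;
        for (int i = 2; i <= H + 3; i++) F[i] = F[i - 1] + F[i - 2];
        long[] N = new long[H + 1];                 // fewest nodes in an AVL tree of height h
        N[0] = 1; N[1] = 2;
        for (int h = 2; h <= H; h++) N[h] = 1 + N[h - 1] + N[h - 2];
        for (int h = 0; h <= H; h++)
            System.out.println("h=" + h + "  N(h)=" + N[h] + "  F(h+3)-1=" + (F[h + 3] - 1));
    }
}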

Since the height of the AVL tree is O(log n), it follows that the find operation takes this
much time. All that remains is to show how to perform insertions and deletions in AVL trees,
and how to restore the AVL balance condition efficiently after each insertion or deletion.

Rotation: In order to maintain the tree’s balance, we will employ a simple operation that locally
modifies subtree heights, while preserving the tree’s inorder properties. This operation is
called rotation. It comes in two symmetrical forms, called a right rotation and a left rotation
(see Fig. 22(a) and (b)).
We have intentionally labeled the elements of Fig. 22 to emphasize the fact that the inorder
properties of the tree are preserved. That is, A < b < C < d < E. The code fragment below
shows how to apply a right or a left rotation to a generic node p of a binary search tree (a BSTNode). As has been our practice, we return a pointer to the modified subtree (in order
to modify the child link pointing into this subtree).



Fig. 22: (Single) rotations: (a) right rotation and (b) left rotation. (Triangles denote subtrees, which may be null.)
Binary-Tree Rotations
BSTNode rotateRight(BSTNode p) { // right rotation at p
BSTNode q = p.left;
p.left = q.right;
q.right = p;
return q;
}

BSTNode rotateLeft(BSTNode p) { ... symmetrical ... }
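For completeness, the omitted symmetric case might look like the following sketch (obtained simply by swapping the roles of left and right):

Left Rotation (sketch of the symmetric case)
BSTNode rotateLeft(BSTNode p) {            // left rotation at p
    BSTNode q = p.right;
    p.right = q.left;                      // move q's inner subtree under p
    q.left = p;                            // bring q above p
    return q;                              // q replaces p
}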

Unfortunately, a single rotation is not always sufficient to rectify a node that is out of
balance. To see why, observe that the single rotation does not alter the height of subtree C.
If it is too heavy, we need to do something else to fix matters. This is done by combining
two rotations, called a double rotation. They come in two forms, left-right rotation and right-
left rotation (Fig. 23). To help remember the name, note that the left-right rotation, called
rotateLeftRight(p), is equivalent to performing a left rotation to the p.left (labeled b in
Fig. 23(a)) followed by a right rotation to p (labeled d in Fig. 23(a)). The right-left rotation
is symmetrical (see Fig. 23(b)).

Fig. 23: Double rotations: (a) rotateLeftRight(p) and (b) rotateRightLeft(p).

Insertion: The insertion routine for AVL trees starts exactly the same as the insertion routine for
standard (unbalanced) binary search trees. In particular, we search for the key and insert a
new node at the point that we fall out of the tree. After the insertion of the node, we must
update the subtree heights, and if the AVL balance condition is violated at any node, we then
apply rotations as needed to restore the balance.
The manner in which rotations are applied depends on the nature of the imbalance. An
insertion results in the addition of a new leaf node, and so the balance factors of the ancestors
can be altered by at most ±1. Suppose that after the insertion, we find that some node has a
balance factor of −2. For concreteness, let us consider the naming of the nodes and subtrees
shown in Fig. 24, and let the node in question be d. Note that this node must be along the search path for the inserted node, since these are the only nodes whose subtree heights may have changed. Clearly, d’s left subtree is too deep relative to d’s right subtree E. Let b
denote the root of d’s left subtree.
At this point there are two cases to consider. Either b’s left child is deeper or its right child
is deeper. (The subtree that is deeper will be the one into which the insertion took place.)
Let’s first consider the case where the insertion took place in the subtree A (see Fig. 24(b)).
In this case, we can restore balance by performing a right rotation at node d. This operation
pulls the deep subtree A up by one level, and it pushes the shallow subtree E down by one
level (see Fig. 24(c)). Observe that the depth of subtree C is unaffected by the operation. It
follows that the balance factors of the resulting subtrees rooted at b and d are now both zero.
The AVL balance condition is satisfied at all nodes, and we are in good shape.
Fig. 24: Restoring balance after insertion through a single rotation.

Next, let us consider the case where the insertion occurs within subtree C (see Fig. 25(b)). As
observed earlier, the rotation at d does not alter C’s depth, so we will need to do something
else to fix this case. Let c be the root of the subtree C, and let C′ and C″ be its two subtrees (either of these might be null). The insertion took place into either C′ or C″. (We don’t care which, but the “?” marks in the figure indicate our uncertainty.) We restore balance by performing
two rotations, first a left rotation at b and then a right rotation at d (see Fig. 25(c)). This
double rotation has the effect of moving the subtree E down one level, leaving A’s level
unchanged, and pulling both C′ and C″ up by one level.
The balance factors at nodes b and d will depend on whether the insertion took place into C′ or C″, but irrespective of which, they will be in the range from −1 to +1. The balance factor
at the new root node c is now 0. So, again we are all good with respect to the AVL balance
condition.
Insertion Implementation: The entire insertion procedure for AVL trees is shown in the following code fragment. It starts with a few utilities.



Fig. 25: Restoring balance after insertion through a double rotation.

We assume that we store the height of each node in a field p.height, which contains the height of the subtree rooted at this node. We define a utility function height(p), which returns p.height if p is non-null and −1 otherwise. Based on this we provide a utility function updateHeight, which is used for updating the heights of nodes (assuming that their children’s heights are properly computed). We also
provide a utility for computing balance factors and the rotation functions. We omit half of
the rotation functions since they are symmetrical, just with left and right swapped.
An interesting feature of the insertion algorithm (which is not at all obvious) is that whenever
rebalancing is required, the height of the modified subtree is the same as it was before the
insertion. This implies that no further rotations are required. (This is not the case for
deletion, however.)
Deletion: After having put all the infrastructure together for rebalancing trees, deletion is actually
relatively easy to implement. As with insertion, deletion starts by applying the deletion
algorithm for standard (unbalanced) binary search trees. Recall that this breaks into three
cases, leaf, single child, and two children. This part of the deletion process is identical to
the standard case. The only change is that (as in insertion) we restore balance to the tree
by invoking the function rebalance(p) just prior to returning from each node (starting with
the parent of the replacement node). Even though we designed this piece of code to work in the case of insertion, it can be shown that it works just as well for deletion.
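The notes do not spell out the AVL deletion code. A plausible sketch, under the description above (this is an assumption, not the official implementation; it reuses findReplacement() from the BST deletion code and the pseudocode convention “copy r’s contents to p”), is the standard deletion with rebalance(p) applied on the way back up:

AVL Tree Deletion (sketch)
AvlNode delete(Key x, AvlNode p) {
    if (p == null)                                 // fell out of tree?
        throw KeyNotFoundException;                // ...error - no such key
    if (x < p.key)                                 // look in left subtree
        p.left = delete(x, p.left);
    else if (x > p.key)                            // look in right subtree
        p.right = delete(x, p.right);
    else if (p.left == null || p.right == null) {  // found it; at most one child?
        if (p.left == null) return p.right;        // splice p out; the surviving
        else return p.left;                        // subtree is already balanced
    }
    else {                                         // both children present
        AvlNode r = findReplacement(p);            // inorder successor (as for BSTs)
        copy r's contents to p;                    // copy its key/value to p
        p.right = delete(r.key, p.right);          // delete the replacement
    }
    return rebalance(p);                           // rebalance on the way back up
}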
In Fig. 26, we illustrate a deletion of a node (from the subtree E) which can be remedied by a
single rotation. This happens because the left subtree (rooted at b) is too heavy following the
deletion, and the left child of the left subtree (A) is at least as heavy as the right child (C).
(The “?” in our figures illustrate places where the subtree’s height is not fully determined.
For this reason, some of the balance factors are listed as “x or y” to indicate the possible
options.)
In Fig. 27, we illustrate an instance where a double-rotation is needed. In this case, the left
subtree (rooted at b) is too heavy following the deletion, but the right child of the left subtree
(C) is strictly heavier than the left child (A).
Note that in the case of the double rotation, the height of the entire tree rooted at d has de-
creased by 1. This means that further ancestors need to be checked for the balance condition.
Unlike insertion, where at most one rebalancing operation is needed, deletion could result in
a cascade of O(log n) rebalancing operations.
Lazy Deletion: The deletion code for a standard binary search tree (and, by extension, AVL trees and other balanced search trees) is generally more complicated than the insertion code.



AVL Tree Utilities and Insertion
int height(AvlNode p) { return p == null ? -1 : p.height; }
void updateHeight(AvlNode p) { p.height = 1 + max(height(p.left), height(p.right));}
int balanceFactor(AvlNode p) { return height(p.right) - height(p.left); }

AvlNode rotateRight(AvlNode p) // right single rotation


{
AvlNode q = p.left;
p.left = q.right; // swap inner child
q.right = p; // bring q above p
updateHeight(p); // update subtree heights
updateHeight(q);
return q; // q replaces p
}

AvlNode rotateLeft(AvlNode p) { ... symmetrical to rotateRight ... }

AvlNode rotateLeftRight(AvlNode p) // left-right double rotation


{
p.left = rotateLeft(p.left);
return rotateRight(p);
}
AvlNode rotateRightLeft(AvlNode p) { ... symmetrical to rotateLeftRight ... }

AvlNode insert(Key x, Value v, AvlNode p) {


if (p == null) { // fell out of tree; create new node
p = new AvlNode(x, v, null, null);
}
else if (x < p.key) { // x is smaller?
p.left = insert(x, v, p.left); // ... insert left
} else if (x > p.key) { // x is larger?
p.right = insert(x, v, p.right); // ... insert right
}
else throw DuplicateKeyException; // key already in the tree?
return rebalance(p); // rebalance if needed
}

AvlNode rebalance(AvlNode p) {
if (p == null) return p; // null - nothing to do
if (balanceFactor(p) < -1) { // too heavy on the left?
if (height(p.left.left) >= height(p.left.right)) // left-left heavy?
p = rotateRight(p); // fix with single rotation
else // left-right heavy?
p = rotateLeftRight(p); // fix with double rotation
} else if (balanceFactor(p) > +1) { // too heavy on the right?
if (height(p.right.right) >= height(p.right.left)) // right-right heavy?
p = rotateLeft(p); // fix with single rotation
else // right-left heavy?
p = rotateRightLeft(p); // fix with double rotation
}
updateHeight(p); // update p’s height
return p; // return link to updated subtree
}



Fig. 26: Restoring balance after deletion with a single rotation.

Fig. 27: Restoring balance after deletion with a double rotation.

An intriguing alternative, which avoids coding up the deletion algorithm, is called lazy deletion.
For each node, we maintain a boolean value indicating whether this element is alive or dead.
When a key is deleted, we simply declare it to be dead, but leave it in the tree. If an attempt
is made to insert a value that comes in the same relative order as a dead key, we store the
new key-value pair in the dead node and declare it to now be alive. Of course, your tree
may generally fill up with lots of dead nodes, so lazy deletion is usually applied only in
circumstances where the number of deletions is expected to be significantly smaller than the
number of insertions. Alternatively, if the number of dead nodes gets too high, you can invoke a
garbage collection process, which builds an entirely new search tree containing just the alive
nodes.
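A minimal sketch of how lazy deletion might be layered on the existing routines (illustrative only; the alive flag and the lazyDelete name are assumptions consistent with the description above):

Lazy Deletion (sketch)
// Augment each BSTNode with one extra (assumed) field:
//     boolean alive = true;                 // false means the entry has been "deleted"

Value find(Key x, BSTNode p) {               // find, modified to ignore dead nodes
    if (p == null) return null;
    else if (x < p.key) return find(x, p.left);
    else if (x > p.key) return find(x, p.right);
    else return (p.alive ? p.value : null);  // found the key, but report it only if alive
}

void lazyDelete(Key x, BSTNode p) {          // mark the node dead, but leave it in the tree
    if (p == null) throw KeyNotFoundException;
    else if (x < p.key) lazyDelete(x, p.left);
    else if (x > p.key) lazyDelete(x, p.right);
    else p.alive = false;
}

// Insertion changes in one place: if the search for x reaches a dead node that can
// legally store x, we overwrite its key/value and set alive = true rather than
// creating a new leaf.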

Lecture 6: 2-3 Trees


2-3 Trees: In the previous lecture, we presented one way to establish balance in a binary search
tree, namely through the AVL tree’s height-balance condition. Today, we will explore an
alternative approach, which is achieved by allowing nodes to have variable “widths.” In particular,
we will allow internal nodes to have either two or three children (see Fig. 28(a) and (b)).
When a node has three children, it stores two keys. Given the two key values b and d, the
three subtrees A, C, and E must satisfy the generalized inorder property that

A < b < C < d < E.

These are called 2-nodes and 3-nodes, respectively.



Fig. 28: (a) a 2-node, (b) a 3-node, and (c) a possible 2-3 tree of height 2.

A 2-3 tree of height h ≥ −1 is defined recursively as follows (see Fig. 28(c)). It is either:

• empty (i.e., null) when h = −1, or


• its root is a 2-node, and its two subtrees are each 2-3 trees of height h − 1, or
• its root is a 3-node, and its three subtrees are each 2-3 trees of height h − 1.

Since all the leaves are at the same level, the sparsest possible 2-3 tree (of a given height) is a complete binary tree, which implies the following:

Theorem: A 2-3 tree with n nodes has height O(log n).

It is easy to see how to perform the operation find(x) in a 2-3 tree. We apply the usual
recursive descent as for a standard binary tree, but whenever we come to a 3-node, we need
to check the relationship between x and the two key values in this node in order to decide
which of the three subtrees to visit. The important issues are insertion and deletion, which
we discuss next.
For conceptual purposes, it will be convenient to temporarily allow for the existence of “in-
valid” 1-nodes and 4-nodes. As soon as one of these exceptional nodes comes into existence,
we will take action to replace them with proper 2-nodes and 3-nodes.

Insertion into a 2-3 tree: The insertion procedure follows the general structure that we have
established with prior binary search trees. We first search for the insertion key, and make a
note of the last node we visited just before falling out of the tree. Because all leaf nodes are
at the same level, we always fall out at the lowest level of the tree. We insert the new key
into this leaf node. If the node was a 2-node, it now becomes a 3-node, and we are fine. If it
was a 3-node, it now becomes a 4-node, and we need to fix it.
While the initial insertion takes place at a leaf node, we will see that the restructuring process
can propagate to internal nodes. So, let us consider how to remedy the problem of a 4-node
in a general setting. A 4-node has three keys, say b, d, and f and four children, say A, C, E,
and G (see Fig. 29(a)).
To resolve the problem we split this node into two 2-nodes: one for b with A and C as
subtrees, and the other for f with E and G as subtrees. We then promote the middle key
d by inserting it (recursively) into the parent node. What if there is no parent, because the
current node is the root? We create a new root node storing d whose two children are b
and f (see Fig. 29(b)). It is easy to check that this process preserves the 2-3 tree structural
properties, and so we won’t bother giving a proof.



Fig. 29: 2-3 tree insertion: (a) splitting a 4-node into two 2-nodes and (b) creating a new root.

In the figure below, we present an example of the result of inserting key 6 into a 2-3 tree. It
requires two splits to restore the tree’s structure.

Fig. 30: 2-3 tree insertion (insert(6)) involving two splits.

Deletion from a 2-3 tree: Consistent with our experience with binary search trees, deletion is
more complicated than insertion. The general process follows the usual pattern. First, we
find the key to be deleted. If it does not appear in a leaf node, then we identify a replacement
key as the inorder successor. (The inorder predecessor would work equally well.) We copy the
replacement key-value pair to replace the deleted key entry, and then we recursively delete
the replacement key from its subtree. In this manner, we can always assume that we are
deleting a key from a leaf node. So, let us focus on this.
As you might imagine, since insertion resulted in an overfull 4-node being split into two 2-
nodes, the process of deletion will involve merging “underfull” nodes. This is indeed the case,
but it will also be necessary to consider another restructuring primitive in which keys are
taken or “adopted” from a sibling. This adoption process is also called key rotation, and I
will use both terms interchangeably.
More formally, let us consider the deletion of an arbitrary key from an arbitrary node in the
tree. If this is a 3-node, then the deletion results in a 2-node, which is fine. However, if this
is a 2-node, the deletion results in an illegal 1-node, which has one subtree and zero keys. We
remedy the situation in one of two ways.
Adoption (Key Rotation): Consider the left and right siblings of this node (if they exist).
If either sibling is a 3-node, then it gives up an appropriate key (and subtree) to convert us back to a 2-node. The tree is now properly structured.
Suppose, for the sake of illustration, that the current node is an underfull 1-node, and
it has a right sibling that is a 3-node (see Fig. 31(a)). Then, we adopt the leftmost key
and subtree from this sibling, resulting in two 2-nodes. (A convenient mnemonic is the
equation 1 + 3 = 2 + 2.)
Merging: On the other hand, if neither sibling can offer a key, then it follows that at least
one sibling is a 2-node. Suppose for the sake of illustration that it is the right sibling (see Fig. 31(b)).



Fig. 31: 2-3 tree deletion: (a) adopting a key/subtree from a sibling and (b) merging two nodes.

In this case, we merge the 1-node with this 2-node to form a 3-node.
(A convenient mnemonic is the equation 1 + 2 = 3.)
But, we need a key to complete this 3-node. We take this key from our parent. If the
parent was a 3-node, it now becomes a 2-node, and we are fine. If the parent was a
2-node, it has now become a 1-node (as in the figure), and the restructuring process
continues on up the tree. Finally, if we ever arrive at a situation where the 1-node is the
root of the tree, we remove this root and make its only child the new root.
An example of the deletion of a key is shown in Fig. 32. In this case, the initial deletion
was from a 2-node, which left it as a 1-node. We merged it with its sibling to form a 3-node
(1 + 2 = 3). This involved demoting the key 6 from the parent, which caused the parent to
decrease from a 2-node to a 1-node. Since the parent has a 3-node sibling, we can adopt from
it. (By the way, there is also a sibling which is a 2-node, containing the key 2. Could we have
instead merged with this node? The answer is “yes”, but it is not in our interest to do this.
This is because merging results in more disruptions to ancestors of the tree, whereas a single
adoption terminates the restructuring process.)

Fig. 32: 2-3 tree deletion involving a merge and an adoption.

In Fig. 33 we illustrate two more examples of deletions from a 2-3 tree. In the upper example,
we delete the key 3. Since its only sibling is a 2-node, we cannot perform an adoption, so we
merge with the parent. (You might wonder, can’t we adopt from our cousin, the (6 : 7) node?
The answer is “no.” According to the 2-3 deletion algorithm, you can only adopt from your
siblings. If we were to allow adoptions from arbitrarily distant cousins, the adoption process
would take more than O(1) time.) The parent node is now critical, and so we merge with its
parent.
In the second example in Fig. 33, we illustrate a case where the root node becomes critical.
In this case, we remove the old root, and make its only child the new root.
Implementation? We will not discuss the 2-3 tree implementation in detail, since we will be presenting a number of related data structures, including red-black trees, AA trees, and B-trees in later lectures.



Fig. 33: Two more examples of 2-3 tree deletions.

The fact that there are two types of nodes might suggest using two different node classes (e.g., TwoNode and ThreeNode, which are both subclasses of a
generic node class). However, because we often convert between one node type and the
other (changing a 2-node into a 3-node and vice versa), it may be preferable to have just one
node type that is capable of representing both, and adjusting the number of children/keys
dynamically. An example is shown in the following code fragment.
2-3 Tree Node
class TwoThreeNode {
int nChildren; // number of children (2 or 3)
TwoThreeNode child[3]; // children pointers
Key key[2]; // keys
Value value[2]; // values
}
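Using this representation, the find operation described earlier might look like the following sketch (illustrative only; it assumes the keys within a node are stored in sorted order and that leaf nodes have null child pointers):

2-3 Tree Find (sketch)
Value find(Key x, TwoThreeNode p) {
    if (p == null) return null;                    // fell out of the tree
    int i = 0;                                     // index of the subtree to descend into
    while (i < p.nChildren - 1 && x > p.key[i]) i++;
    if (i < p.nChildren - 1 && x == p.key[i])      // found the key in this node?
        return p.value[i];
    else
        return find(x, p.child[i]);                // descend into the proper subtree
}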

In our examples, we temporarily created a 4-node, which we immediately split into two 2-
nodes. This is really just a conceptual trick. In reality, a 4-node would never be generated.
Instead, as soon as you see that you are about to create a 4-node, you would immediately
invoke a split procedure to remedy the situation.
Another tricky implementation issue is that the current node will need to access its siblings
on the left and right sides. To make this operation easier, it may be desirable to add a parent
link to our node structure. Alternatively, the various recursive functions can be passed two
node pointers as arguments, one to the current node and the other to its parent.

Lecture 7: Red-black and AA trees


“A rose by any other name . . . ”: In the previous lecture, we presented the 2-3 tree, which
allows nodes of variable widths. In this lecture, we will explore a closely related binary-tree
data structure, called a red-black tree, and we will discuss a particular implementation of it, called the AA tree. (Before reading this lecture, please review the material on 2-3 trees.)

Red-Black Trees: While a 2-3 tree provides an interesting alternative to AVL trees, the fact that
it is not a binary tree is a bit annoying. As we saw earlier in the semester, there are ways
of representing arbitrary trees as binary trees. This inspires the question, “Can we encode a
2-3 tree as an equivalent binary tree?” Unfortunately, the first-child, next-sibling approach
presented earlier in the semester will not work. (Can you see why not? At issue is whether
the inorder properties of the tree hold under this representation.)
Here is a simple approach that works, however. First, there is no need to modify 2-nodes,
since they are already binary-tree nodes. To represent a 3-node as a binary-tree node, we
create a two-node combination, as shown in Fig. 34(a) below. The 2-3 tree shown in the inset
would be encoded in the manner shown in Fig. 34(b).
Fig. 34: Encoding a 2-3 tree as an equivalent binary tree.

If we label each of the “second nodes” of the 3-nodes as red and label all the other nodes as black,
we obtain a binary tree with both red and black nodes. It is easy to see that the resulting
binary tree satisfies the following properties:

(1) Each node is either red or black.


(2) The root is black.
(3) All null pointers are treated as if they point to black nodes (a conceptual convenience).
(4) If a node is red, then both its children are black.
(5) Every path from a given node to any of its null descendants contains the same number
of black nodes.

A binary search tree that satisfies these conditions is called a red-black tree. It is easy to see
that the above encoding of any 2-3 tree to this binary form satisfies all of these properties.
(On the other hand, if you just saw this list of properties without having seen the 2-3 tree,
it would seem to be very arcane!) Because 2-3 trees have O(log n) height, the following is an
immediate consequence:

Lemma: A red-black tree with n nodes has height O(log n).
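These conditions are easy to verify mechanically. The following sketch (illustrative only; it assumes a node type RBNode with a boolean isRed field) returns the number of black nodes on every root-to-null path of a subtree, or −1 if condition (4) or (5) is violated; a valid red-black tree additionally requires that the root be black.

Checking the Red-Black Conditions (sketch)
int blackHeight(RBNode p) {                    // black height of p, or -1 if invalid
    if (p == null) return 0;                   // null pointers are treated as black (rule 3)
    int hL = blackHeight(p.left);
    int hR = blackHeight(p.right);
    if (hL < 0 || hR < 0 || hL != hR)          // rule (5): equal black counts on all paths
        return -1;
    if (p.isRed && ((p.left != null && p.left.isRed) ||
                    (p.right != null && p.right.isRed)))
        return -1;                             // rule (4): a red node has only black children
    return hL + (p.isRed ? 0 : 1);             // count this node if it is black
}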

Note that not every red-black tree is the binary encoding of a 2-3 tree. There are two issues.
First, the red-black conditions do not distinguish between left and right children, so a 3-node
could be encoded in two different ways in a red-black tree (see Fig. 35(a)). More seriously,
the red-black condition allows for the sort of structure in Fig. 35(b), which clearly does not
correspond to a node of a 2-3 tree.
It is interesting to observe that this three-node combination can be seen as a way of modeling
a node with four children. Indeed, there is a generalization of the 2-3 tree, called a 2-3-4
tree, which allows 2-, 3-, and 4-nodes. Red-black trees as defined above correspond 1–1 with
2-3-4 trees. Red-black trees are the basis of the TreeMap class in the java.util package.



Fig. 35: Color combinations allowed by the red-black tree rules.

The principal drawback of red-black trees is that they are rather complicated to implement. For
this reason, we will introduce a variant of the red-black tree below, called an AA tree, which
is easier to code.

AA trees (Red-Black trees simplified): In an effort to simplify the complicated cases that
arise with the red-black tree, in 1993 Arne Anderson developed a restriction of the red-black
tree. He called his data structure a BB tree (for “Binary B-tree”), but over time the name has
evolved into AA trees, named for the inventor (and to avoid confusion with another popular
but unrelated data structure called a BB[α] tree).
Anderson’s idea was to allow the conversion described above between 2-3 trees and red-black
trees but add a rule that forbids the alternate configurations shown in Fig. 35. The additional
rule is:

(6) Each red node can arise only as the right child of a black node.

The edge between a red node and its black parent is called a red edge, and is shown as a
dashed red edge in our figures. Note that, while AA-trees are simpler to code, experiments
show that they are a bit slower than red-black trees in practice.
The implementation of the AA tree has the following two noteworthy features, which further
simplify the coding:

We do not use null pointers: Instead, we create a special sentinel node, called nil (see
Fig. 36(a)), and every null pointer is replaced with a pointer to nil. (Although the tree
may have many null pointers, there is only one nil node allocated, with potentially
many incoming pointers.) This node is considered to be black.
Why do this? Observe that nil.left == nil and nil.right == nil. This simplifies
the code because we can always de-reference a pointer, without having to check first
whether it is null.
Fig. 36: AA trees: (a) the nil sentinel node, (b) the AA tree for a 2-3 tree.



We do not store node colors: Instead, each node p stores a level number, denoted p.level
(see Fig. 36(b)). Intuitively, the level number encodes the level of the associated node
in the 2-3 tree. Formally, the nil node is at level 0, and if q is a black child of some node p,
then p.level = q.level + 1, and if q is a red child of p, then they have the same level
numbers.
Thus, we do not need to store node colors explicitly. For example, to determine whether a node p’s right child is red, it suffices to test whether p.right.level == p.level.
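The notes do not spell out the AANode class itself. A minimal sketch consistent with these conventions (the field names and the way nil is set up are assumptions) might be:

A Possible AANode Class (sketch)
class AANode {
    Key key;
    Value value;
    int level;                                  // level number (the nil sentinel is at level 0)
    AANode left, right;

    AANode(Key x, Value v, int lev, AANode l, AANode r)
        { key = x; value = v; level = lev; left = l; right = r; }
}

// The single sentinel node; after construction, both of its child links point to itself:
//     nil = new AANode(null, null, 0, null, null);  nil.left = nil;  nil.right = nil;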

AA tree operations: Since an AA tree is essentially a binary search tree, the find operation
is exactly the same as for any binary search tree. Insertions and deletions are performed in
essentially the same way as for AVL trees: first the key is inserted or deleted at the leaf level,
and then we retrace the search path back to the root and restructure the tree as we go. As
with AVL trees, restructuring essentially involves rotations. For AA trees the two rotation
operations go under the special names skew and split. They are defined as follows:

Fig. 37: AA restructuring operations: (a) skew and (b) split. (Afterwards, q may be red or black.)

skew(p): If p is black and has a red left child, rotate so that the red child is now on the
right (see Fig. 37(a)). The levels of these two nodes are unchanged. Return a pointer to the upper node of the resulting subtree.
split(p): If p is black and has a chain of two consecutive red nodes to its right (that is,
p.level == p.right.level == p.right.right.level), split this triple by performing
a left rotation at p and promoting p’s right child, call it q, to the next higher level (see
Fig. 37(b)).

In the figure, we have shown p as a black node, but in the context of restructuring p may be
either red or black. As a result, the node q that is returned from the operations may either be
red or black. The implementation of these two operations is shown in the code block below.

AA-tree insertion: As mentioned above, we insert a node just as for a standard binary-search
tree and then work back up the tree restructuring as we go. What sort of restructuring is
needed? Recall first that (following the policies of 2-3 trees) all leaves should be at the same
level of the tree. To achieve this, when the new node is inserted, we assign it the same level
number as its parent. This is equivalent to saying that the newly inserted node is red (see
Fig. 38(a)).
The first problem that might arise is that this newly inserted red node is a left child, which is
not allowed (see Fig. 38(b)). Letting p denote the node’s parent, this is easily remedied by
performing skew(p) (see Fig. 38(c)). Let q be the pointer to the resulting subtree.



AA tree skew and split utilities
AANode skew(AANode p) {
if (p == nil) return p;
if (p.left.level == p.level) { // red node to our left?
AANode q = p.left; // do a right rotation at p
p.left = q.right;
q.right = p;
return q; // return pointer to new upper node
}
else return p; // else, no change needed
}

AANode split(AANode p) {
if (p == nil) return p;
if (p.right.right.level == p.level) { // right-right red chain?
AANode q = p.right; // do a left rotation at p
p.right = q.left;
q.left = p;
q.level += 1; // promote q to next higher level
return q; // return pointer to new upper node
}
else return p; // else, no change needed
}

Fig. 38: AA insertion: (a) initial tree, (b) after insertion, (c) after skewing, (d) after splitting.



Next, it might be that p already had a right child that was red, and the skew could have
resulted in a right-right chain starting from q. (This is equivalent to having a 4-node in a 2-3
tree.) We remedy this by invoking the split operation on q (see Fig. 38(d)). Note that the split
operation moves the middle node of the chain up to the next level of the tree. The problems
that we just experienced may occur with this promoted node, and so the skewing/splitting
process generally propagates up the tree to the root.
The insertion function is provided in the following code block. Observe that (as with the AVL
tree) the function is almost the same as the standard (unbalanced) binary tree insertion except
for the final rebalancing step, which is performed by the call “return split(skew(p))”.
(This simplicity is the principal appeal of AA-trees over traditional red-black trees.)
AA Tree Insertion
AANode insert(Key x, Value v, AANode p) {
if (p == nil) // fell out of the tree?
p = new AANode(x, v, 1, nil, nil); // ... create a new leaf node here
else if (x < p.key) // x is smaller?
p.left = insert(x, v, p.left); // ...insert left
else if (x > p.key) // x is larger?
p.right = insert(x, v, p.right); // ...insert right
else
throw DuplicateKeyException; // duplicate key!
return split(skew(p)); // restructure and return result
}

An example of insertion is shown in Fig. 39. (See the lecture on 2-3 trees for the analogous
process.)

Fig. 39: Example of AA-tree insertion. (Remember, a node is red if it is at the same level as its parent.)

AA-tree deletion: As usual, deletion is more complex than insertion. If the deleted node is not a leaf,
we find a suitable replacement node. (This will either be the inorder predecessor or inorder
successor, depending on the tree’s structure.) We copy the contents of the replacement
node to the deleted node and then we proceed to delete the replacement. After deleting the
replacement node (which must be a leaf), we retrace the search path towards the root and
restructure as we go.
Before discussing deletion, let’s first consider a useful utility function. In the process of
deletion, a node can lose one of its children. As a result, we may need to decrease this
node’s level in the tree. To assist in this process we define two functions. The first, called
updateLevel(p), updates the level of a node p based on the levels of its children. Every node
has at least one black child, and therefore, the ideal level of any node is one more than the
minimum level of its two children. If we discover that p’s current level is higher than this ideal
value, we set it to its proper value. If p’s right child is a red node (that is, p.right.level ==
p.level prior to the adjustment), then the level of p.right needs to be decreased as well.
AA-Tree update level utility
void updateLevel(AANode p) { // update p’s level
int idealLevel = 1 + min(p.left.level, p.right.level);
if (p.level > idealLevel) { // is p’s level too high?
p.level = idealLevel; // decrease its level
if (p.right.level > idealLevel) // is p’s right child red?
p.right.level = idealLevel; // ...pull it down as well
}
}

When the restructuring process arrives at a node p, we first fix its level using updateLevel(p).
Next we need to skew to make sure that any red children are to its right. Deletion is com-
plicated in that we may generally need to perform up to three skew operations to achieve
this: one on p, one on p.right, and one on p.right.right (see Fig. 40). After this, p may
generally be at the top of a right-leaning chain consisting of p followed by four red nodes.
To remedy this, we perform two splits, one at p, and the other to its right-right grandchild,
which becomes its right child after the first split (see Fig. 40). Whew! These splits may
not be needed, but remember that the split function only modifies the tree if needed. The
restructuring function, called fixAfterDelete, is presented in the following code fragment.
As an exercise, you might draw the equivalent 2-3 tree both before and after the deletion.
You will discover that, although the intermediate results differ, the final tree is the encoding
of the 2-3 tree after the deletion.
AA-Tree Deletion Utility
AANode fixAfterDelete(AANode p) {
updateLevel(p); // update p’s level
p = skew(p); // skew p
p.right = skew(p.right); // ...and p’s right child
p.right.right = skew(p.right.right); // ...and p’s right-right grandchild
p = split(p); // split p
p.right = split(p.right); // ...and p’s (new) right child
return p;
}

Finally, we can present the full deletion code. It looks almost the same as the deletion code for
the standard binary search tree, but after deleting the leaf node, we invoke fixAfterDelete
to restructure the tree. We will omit the (messy) details showing that after this restructuring,
the tree is in valid AA-form. (We refer you to Anderson’s original paper.)



Fig. 40: Example of AA-tree deletion. (Remember, a node is red if it is at the same level as its parent.)

AA Tree Deletion
AANode delete(Key x, AANode p) {
if (p == nil) // fell out of tree?
throw KeyNotFoundException; // ...error - no such key
else {
if (x < p.key) // look in left subtree
p.left = delete(x, p.left);
else if (x > p.key) // look in right subtree
p.right = delete(x, p.right);
else { // found it!
if (p.left == nil && p.right == nil)// leaf node?
return nil; // just unlink the node
else if (p.left == nil) { // no left child?
AANode r = inorderSuccessor(p); // get replacement from right
p.copyContentsFrom(r); // copy replacement contents here
p.right = delete(r.key, p.right); // delete replacement
}
else { // no right child?
AANode r = inorderPredecessor(p); // get replacement from left
p.copyContentsFrom(r); // copy replacement contents here
p.left = delete(r.key, p.left); // delete replacement
}
}
return fixAfterDelete(p); // fix structure after deletion
}
}



Analysis: All of these algorithms take O(1) time per level of the tree, which implies that the running time of all the dictionary operations is O(h), where h is the height of the tree. As we saw above, the tree's height is O(log n), which implies that all the dictionary operations run in O(log n) time.

Lecture 8: Treaps
Randomized Data Structures: A common technique in the field of algorithm design involves the notion of using randomization. A randomized algorithm employs a pseudo-random number generator to inform some of its decisions. Randomization has proved to be a remarkably useful technique, and randomized algorithms are often the fastest and simplest algorithms for a given application.
This may seem perplexing at first. Shouldn’t an intelligent, clever algorithm designer be
able to make better decisions than a simple random number generator? The issue is that a
deterministic decision-making process may be susceptible to systematic biases, which in turn
can result in unbalanced data structures. Randomness creates a layer of “independence,”
which can alleviate these systematic biases.
In this lecture, we will consider among the first randomized data structures for dictionary
operations, called a treap. This data structure’s name is a portmanteau (combination) of
“tree” and “heap.” It was developed by Raimund Seidel and Cecilia Aragon in 1989. (This
1-dimensional data structure is closely related to two 2-dimensional data structures, the
Cartesian tree by Jean Vuillemin and the priority search tree of Edward McCreight, both
discovered in 1980.)
Because the treap is a randomized data structure, its running time depends on the random choices made by the algorithm. We will see that all the standard dictionary operations take O(log n) expected time. The expectation is taken over all possible random choices that
the algorithm may make. You might protest, since this allows for rare instances where the
performance is very bad. While this is always a possibility, a more refined analysis shows
that (assuming n is fairly large) the probability of poor performance is so insanely small that
it is not worth worrying about.

Treaps: The intuition behind the treap is easy to understand. Recall back when we discussed stan-
dard (unbalanced) binary search trees that if keys are inserted in random order, the expected
height of the tree is O(log n). The problem is that your user may not be so accommodating
to insert keys in this order. A treap is a binary search tree whose structure arises “as if” the
keys had been inserted in random order.
Let’s recall how standard binary tree insertion works. When a new key is inserted into such
a tree, it is inserted at the leaf level. If we were to label each node with a “timestamp”
indicating its insertion time, as we follow any path from the root to a leaf, the timestamp
values must increase monotonically (see Fig. 41(b)). From your earlier courses you should
know a data structure that has this very property—such a tree is generally called a heap.
This suggests the following simple idea: When first inserted, each key is assigned a random
priority, call it p.priority. As in a standard binary tree, keys are sorted according to an
inorder traversal. But, the priorities are maintained according to heap order. Since the
priorities are random, it follows that the tree’s structure is consistent with a tree resulting
from a sequence of random insertions. Thus, we have the following:



Insertion order: k, e, b, o, f, h, w, m, c, a, s

Fig. 41: (a) A binary search tree and (b) associating insertion timestamps with each node.

Theorem: A treap storing n nodes has height O(log n) in expectation (over all n! possible
orderings of the random priorities present in the tree).

Since priorities are random, you might wonder about the possibility of two priorities being equal. This might happen, but if the domain of random numbers is much larger than n (say at least n²), then these events are so rare that they won't affect the tree's performance. We will show that it is possible to maintain this structure quite easily.
Geometric Interpretation: While Seidel and Aragon designed the treap as a 1-dimensional
search structure, the introduction of numeric priorities suggests that we can interpret each
key-priority pair as a point in 2-dimensional space. We can visualize a treap as a subdivision
of 2-dimensional space as follows. Place all the points in a rectangle, where the y-coordinates (ordered top to bottom) are the priorities and the x-coordinates are the keys, suitably mapped to numbers (see Fig. 42). Now, draw a horizontal line through the root. Because there are no points of lower priority, all the other points lie in the lower rectangle. Now, shoot a vertical ray down from this point. This splits the rectangle in two, with the points of the left subtree lying in the left rectangle and the points of the right subtree lying in the right rectangle. Now, repeat the process recursively on each of the two halves.
Fig. 42: (a) A treap and (b) geometric interpretation as a (randomized) priority search tree.

The resulting subdivision is closely related to two 2-dimensional data structures, one called the Cartesian tree and the other called the priority search tree. These structures can be used
for answering various geometric range-searching queries.

Treap Insertion: Insertion into the treap is remarkably simple. First, we apply the standard
binary-search-tree insertion procedure. When we “fall out” of the tree, we create a new node
p, and set its priority, p.priority, to a random integer. We then retrace the path back up to
the root (as we return from the recursive calls). Whenever we come to a node p whose child’s
priority is smaller than p’s, we apply an appropriate single rotation (left or right, depending
on which child it is), thus reversing their parent-child relationship. We continue doing this
until the newly inserted key node is lifted up to its proper position in heap order. The code
is very simple and is given below.

Fig. 43: Treap insertion.

Treap Deletion: Deletion is also quite easy, but as usual it is a bit more involved than insertion.
If the deleted node is a leaf or has a single child, then we can remove it in the same manner
that we did for binary trees, since the removal of the node preserves the heap order. However,
if the node has two children, then normally we would have to find the replacement node, say
its inorder successor and copy its contents to this node. The newly copied node will then be
out of priority order, and rotations will be needed to restore it to its proper heap order. (The
code presented at the end of the lecture notes uses this approach.)
There is, however, a cute trick for performing deletions. We first locate the node in the tree
and then set its priority to ∞ (see Fig. 44). We then apply rotations to “sift it down” the
tree to the leaf level, where we can easily unlink it from the tree.

Fig. 44: Treap deletion.
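The following is a minimal sketch of this sift-down approach (it is not the code given at the end of these notes). It assumes the TreapNode fields and the getPriority, rotateLeft, and rotateRight helpers described below, with MAX_PRIORITY playing the role of ∞; the name deleteByDemotion is hypothetical.
Treap Deletion by Sift-Down (sketch)
TreapNode deleteByDemotion(Key x, TreapNode p) {
    if (p == null) Error - Nonexistent key!                      // key not found
    else if (x < p.key) p.left = deleteByDemotion(x, p.left)     // search left subtree
    else if (x > p.key) p.right = deleteByDemotion(x, p.right)   // search right subtree
    else {                                                       // found it
        if (p.left == null && p.right == null) return null       // leaf - just unlink it
        p.priority = MAX_PRIORITY                                // pretend its priority is infinity
        if (getPriority(p.left) < getPriority(p.right)) {        // rotate lower-priority child up
            p = rotateRight(p)
            p.right = deleteByDemotion(x, p.right)               // keep sifting x down
        } else {
            p = rotateLeft(p)
            p.left = deleteByDemotion(x, p.left)
        }
    }
    return p
}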

The treap is particularly easy to implement because we never have to worry about adjusting
the priority field. For this reason, treaps are among the fastest tree-based dictionary structures.



Implementation: We can implement a treap by modifying our implementation of other binary
search trees. First, the node structure is similar to that of a standard (unbalanced) binary
search tree, but we include the priority value, called priority:
Treap Node Structure
private class TreapNode {
Key key // key
Value value // value
int priority // random priority (set when node is created)
TreapNode left // left child
TreapNode right // right child
}
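For concreteness, here is one way the constructor might assign the random priority at creation time (a sketch; the random-number generator rand is an assumed name, not part of the notes' code):
Treap Node Constructor (sketch)
TreapNode(Key x, Value v, TreapNode lt, TreapNode rt) {
    key = x; value = v; left = lt; right = rt;
    priority = rand.nextInt(Integer.MAX_VALUE);   // random priority, fixed for the node's lifetime
}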

The right and left rotation functions are the same as for AVL trees (and we omit them). We
introduce three utility functions:

• getPriority(p): If p is not null, it returns the node's priority, and ∞ otherwise.
• lowestPriority(p): Returns whichever of p, p.left, and p.right has the lowest priority.
• restructure(p): Restructures the tree locally about p by rotating up a child having lower priority than p.

Treap Restructuring Operations


int getPriority(TreapNode p) { return (p == null ? MAX_PRIORITY : p.priority) }

TreapNode lowestPriority(TreapNode p) { // lowest priority of p, p.left, p.right


TreapNode q = p
if (getPriority(p.left) < getPriority(q)) q = p.left
if (getPriority(p.right) < getPriority(q)) q = p.right
return q
}

TreapNode restructure(TreapNode p) { // restore priority at p


if (p == null) return p // nothing to do
TreapNode q = lowestPriority(p) // get child to rotate
if (q == p.left) p = rotateRight(p) // rotate as needed
else if (q == p.right) p = rotateLeft(p)
return p // return updated subtree
}

The insert function has exactly the same form as the insert function for standard (unbalanced) binary search trees, but it invokes restructure as it returns up the tree. Notice that once we get to a node where a rotation is not needed, it will not be needed at any higher node. The deletion function also follows the standard template. As in the case of insertion, the only difference from standard (unbalanced) search trees is the invocation of restructure(p) at the last line.



Treap Insertion
TreapNode insert(Key x, Value v, TreapNode p) {
if (p == null) // fell out of the tree?
p = new TreapNode(x, v, null, null) // ... create a new leaf node here
else if (x < p.key) // x is smaller?
p.left = insert(x, v, p.left) // ...insert left
else if (x > p.key) // x is larger?
p.right = insert(x, v, p.right) // ...insert right
else
Error - Duplicate key!
return restructure(p) // restructure (if needed)
}

Treap Deletion
TreapNode delete(Key x, TreapNode p) {
if (p == null) Error - Nonexistent key! // key not found
else {
if (x < p.key) p.left = delete(x, p.left) // delete from left
else if (x > p.key) p.right = delete(x, p.right) // delete from right
// found it
else if (p.left == null || p.right == null) { // either child empty?
if (p.left == null) return p.right // return the other child
else return p.left
} else { // both children present
TreapNode r = findReplacement(p.right) // find replacement node
copy contents from r to p
p.right = delete(r.key, p.right) // delete the replacement
}
}
return restructure(p) // restructure (if needed)
}



Lecture 9: Randomized Search Structures: Skip Lists
More Randomized Search Structures: In this lecture, we continue our discussion of random-
ized search structures. In the previous lecture, we introduced the treap, which uses random-
ization to simulate the structure of a binary tree where keys are inserted in random order.
Recall that such a data structure uses a random number generator to guide its operations and
structure, and the running time is expressed as the expected running time for an arbitrary
sequence of operations, averaged over all possible outcomes of the random number generator.
Since the random number generator is not under the control of the data structure’s user, even
a malicious user cannot engineer a sequence of operations that will guarantee a higher than
expected running time. The only thing that could go wrong is having bad luck with respect
to the random number generator.
In this lecture we will discuss a data structure that is reputed to be among the fastest data
structures for ordered dictionaries, the skip list. (It also has the feature that it was designed
by Bill Pugh, a former professor at the University of Maryland!)

Skip Lists: Skip lists began with the idea, “how can we make sorted linked lists better?” It is
easy to do operations like insertion and deletion into linked lists, but it is costly to locate
items efficiently because we have to walk through the list one item at a time. If we could
“skip” over multiple items at a time, however, then we could perform searches efficiently.
Intuitively, a skip list is a data structure that encodes a collection of sorted linked lists, where
links skip over 2, then 4, then 8, and so on, elements with each link.
To make this more concrete, imagine a linked list, sorted by key value. There are two nodes
at either end of the list, called head and tail. Take every other entry of this linked list (say
the even numbered entries) and extend it up to a new linked list with 1/2 as many entries.
Now take every other entry of this linked list and extend it up to another linked list with 1/4 as many entries as the original list, and continue this until no elements remain. The head and tail nodes are always lifted (see Fig. 45). Clearly, we can repeat this process ⌈lg n⌉ times. (Recall that “lg” denotes log base 2.)
The result is something that we will call the ideal skip list. Unlike the standard linked list,
where you may need to traverse O(n) links to reach a given node, in this list any node can
be reached with O(log n) links from the head.
Fig. 45: The “ideal” skip list.

To search for a key x, we start at the highest level of head. We scan linearly along the list at
the current level i, until we are about to jump to a node whose key value is strictly greater than x. Since tail is associated with ∞, we will always succeed in finding such a node.
Let p point to the node just before this step. If p’s data value is equal to x then we stop.



Otherwise, if we are not yet at the lowest level, we descend to the next lower level i − 1 and
continue the search there. Finally, if we are at the lowest level and have not found x, we
announce that x is not in the list (see Fig. 46).
Fig. 46: Searching the ideal skip list.

How long would this search require in the worst case? Observe that we need never traverse
more than one link at any given level in the path to the desired node. We will generally need to access two nodes at each level, however, because of the need to determine the node whose key is greater than x's. As mentioned earlier, the number of levels is ⌈lg n⌉. Therefore, the total number of nodes accessed is at most 2⌈lg n⌉ = O(log n).
Randomized Skip Lists: Unfortunately, like a perfectly balanced binary tree, the ideal skip list
is too pure to be able to use for a dynamic data structure. As soon as a single node was
added to the middle of the lists, all the heights following it would need to be modified. But
we can relax this requirement to achieve an efficient data structure. In the ideal skip list,
every other node from level i is extended up to level i + 1. Instead, how about if we did this
randomly?
Suppose that we have built the skip list up to some level i, and we want to extend this to level
i + 1. Imagine a node at level i tossing a coin. If the coin comes up heads (with probability
1/2) this node promotes itself to level i + 1, and otherwise it stops here. By the nature of
randomization, the expected number of nodes at level i + 1 will be half the number of nodes
at level i. Thus, the expected number of nodes at level k will be n/2k , which means that the
expected number of nodes at level dlg ne is a constant. Fig. 47 shows what such a randomized
skip list, or simply skip list, will look like.
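The following small sketch illustrates this coin-tossing promotion process; the method name randomLevel and the random-number generator rand are assumed names, not part of the notes' code.
Random Level Generation (sketch)
int randomLevel(int maxLevel) {
    int level = 0;
    while (level < maxLevel && rand.nextBoolean())   // heads: promote to the next level
        level++;
    return level;                                    // node will appear on levels 0..level
}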
Fig. 47: A (randomized) skip list.

Implementation: As we saw with 2-3 and 2-3-4 trees, skip lists have variable-sized nodes. In the skip list, there is no a priori limit on a node's size. However, this is not an issue in languages like Java, where it is easy to dynamically allocate arrays of arbitrary length. For example, our skip list node might have the following structure:



Node in a Skip List
class SkipNode {
Key key
Value value
SkipNode[] next

SkipNode(Key x, Value v, int size) {


key = x; value = v; next = new SkipNode[size];
}
}

Here is sample code to find a key in a skip list. Notice that it is very simple (and does not involve recursion!). Intuitively, we start at the topmost nonempty level of the skip list, and we keep moving along the current level as long as the key in the next node is less than or equal to the key x we are looking for. Whenever we get stuck, we drop down a level. When we drop out of the skip list (the level becomes negative), we know that the current node's key is at most x and the next node's key is strictly greater than x. So, we check whether this node's key matches x. If it does, we have found the key; if not, we know that the key is not present.
Find in a Skip List
Value find(Key x) {
int i = topmostLevel // start at topmost nonempty level
SkipNode p = head // start at head node
while (i >= 0) { // while levels remain
if (p.next[i].key <= x) p = p.next[i] // advance along same level
else i-- // drop down a level
}
return (p.key == x ? p.value : null) // return value if found
}

Height and Space Analysis: Unlike binary search trees whose nodes are all of the same size,
the nodes of a skip list have variable sizes. How many levels will the skip list have? The
process we described earlier could create an arbitrarily large number of levels (if your coin
keeps coming up heads). We can show that the maximum level in the skip list will be O(log n)
with high probability.

Theorem: The expected number of levels in a skip list with n entries is O(log n).
Proof: We will show the stronger result that, with high probability, the number of levels in a skip list with n entries does not exceed O(log n). Let ℓ ≥ 1 denote some level number. Suppose that we have n entries and we observe that our skip list contains at least one node at level ℓ. In order for an arbitrary entry to make it to level ℓ, its coin toss came up heads at least ℓ times consecutively. The probability of this happening is clearly at most 1/2^ℓ. Since this holds independently for all of the n entries, the probability that any of the entries survives to level ℓ is at most n/2^ℓ. (This uses the union bound, which states that the probability that any among n events occurs is at most n times the maximum probability of each individual event.)
Setting ℓ = c · lg n, it follows that the probability that the maximum height exceeds ℓ is at most
$$\frac{n}{2^{\ell}} \;=\; \frac{n}{2^{c \lg n}} \;=\; \frac{n}{(2^{\lg n})^{c}} \;=\; \frac{n}{n^{c}} \;=\; n^{1-c}.$$
So, if we take even a moderate value of c, say c = 3, the probability that the maximum height exceeds 3 lg n is at most 1/n². So, if n is large (say over 1000), the probability of exceeding 3 lg n is less than one in a million!

This implies that, if you have an upper bound on the value of n, you are safe (with high probability) to allocate head and tail arrays of size ⌈3 lg n⌉. Of course, even if a node were to attempt to exceed this level, you could just arbitrarily limit it. The analysis that follows would not change as a result.
Next, let’s consider the storage space. Under the assumption that the maximum number of
levels is O(log n), then in the worst case every node contributes to every level, and the skip
list would have total storage of O(n log n). This is too high! The following lemma indicates
that the expected space used is just linear in n.

Theorem: The expected storage for a skip list with n entries is O(n).
Proof: For i ≥ 0, let n_i denote the number of nodes that contribute to level i of the skip list. This is a random variable, meaning that its value varies with our random choices. The quantity that we are interested in is its expected value E(n_i). The probability that an arbitrary entry contributes to level i is just the probability of tossing i consecutive heads using a fair coin, which is clearly 1/2^i. Thus, the expected number of entries that survive to this level is just the total population size (n) times the probability of contributing (1/2^i). Thus, E(n_i) = n/2^i.
By basic facts about the geometric series, we know that 1 + 1/2 + 1/4 + 1/8 + · · · = 2, and hence
$$\sum_{i=0}^{\infty} E(n_i) \;=\; \sum_{i=0}^{\infty} \frac{n}{2^i} \;=\; n \sum_{i=0}^{\infty} \frac{1}{2^i} \;=\; 2n.$$
Finally, by the linearity of expectation (the expected value of a sum of random variables is the sum of the expected values of the random variables), it follows that the total expected storage needed is the sum of E(n_i) over all levels:
$$E\!\left(\sum_{i=0}^{\infty} n_i\right) \;=\; \sum_{i=0}^{\infty} E(n_i) \;=\; 2n \;=\; O(n),$$
and this completes the proof.

Notice that the proof did not make use of the fact that we arbitrarily limit the maximum
level. Clearly, adding such an upper limit can only reduce the storage requirements.

Search-Time Analysis: Earlier, we argued that the worst-case search time in an ideal skip list
is O(log n). Now, we will show that the expected case search time in the randomized skip list
will be O(log n). It is important to note that the analysis to follow will not depend on the
choice of keys in the data structure nor the order in which they were inserted. Rather, it will
depend solely on the randomized (coin-flipping) process used to build the data structure.



The analysis of skip lists is an example of a probabilistic analysis. As observed earlier, the
expected number of levels in a skip list is O(log n). We will show that for any fixed node, the
length of the search path leading here is O(log n) in expectation. Our analysis will be based
on walking backwards along the search path. (This is sometimes called a backwards analysis.)
Observe that the forward search path drops down a level whenever the next link would have
taken us “beyond” the node we are searching for. Thus, when we consider the reversed search
path, it will always take a step up if it can (i.e., if the node it is visiting contributes to the
next higher level), otherwise it will take a step to the left.

Fig. 48: The search path (blue) to x = 25 and the reverse search path (red).

Theorem: The expected number of nodes visited in a search in a skip list of n keys is
O(log n).
Proof: We will prove this for the more general case, where the probability that a node is promoted to the next higher level is p, for some constant 0 < p < 1. The analysis for our coin-flipping version of the skip list follows by setting p = 1/2.
For 0 ≤ i ≤ O(log n), let E(i) denote the expected number of nodes visited in the top i levels of the skip list. (For example, in Fig. 48, the skip list's top level is 5. In this case E(2) would be the expected number of steps taken at levels 4 and 5, E(3) would be the expected time spent in levels 3–5, and E(6) would be the expected number of steps at all the levels.)
As mentioned above, our analysis is based on walking backwards along the search path. Whenever we arrive at some node of level i, the probability that it contributes to the next higher level is exactly p, and if so, the search path came from above. On the other hand, with the remaining probability 1 − p, this node stopped at this level, and the backwards search stays at the same level. Counting the current node we are visiting (+1), we can express E(i) by the recurrence
$$E(i) \;=\; 1 + p \cdot E(i-1) + (1-p) \cdot E(i).$$
This is a weird recurrence, because we are defining E(i) in terms of itself! But this is not circular, since each subsequent invocation occurs with a lower probability value, and so this converges. In particular, subtracting (1 − p)E(i) from both sides and dividing by p yields
$$E(i) \;=\; \frac{1}{p} + E(i-1).$$
By expansion, it is easy to verify that E(i) = i/p. Thus, if our skip list has ℓ levels, the expected search time is E(ℓ) = ℓ/p. By our assumption that p is a constant, we have E(ℓ) = O(ℓ). By our earlier theorem on the maximum number of levels, we know that ℓ = O(log n). Therefore, the expected search time for skip lists is O(log n).



Clearly, there is more math involved in establishing expected-case bounds for randomized data
structures. (This is the bane of students of data structures!) The payoff is that these data
structures are usually faster than traditional structures because they are so simple. There is
no need for rotating nodes, merging or splitting nodes, skewing and splitting, as we have seen
with tree-based data structures.

Insertion and Deletion: Insertion into a skip list is almost as easy as insertion into a standard
linked list. Given a key x to insert, we first do a search on key x to find its immediate
predecessors in the skip list at each level of the structure. Next, we create a new node x. To
determine the height of this node, we toss a coin repeatedly until it comes up tails. (More
practically, we generate a random number until its parity is odd.) Letting k denote the
number of tosses needed, we create a node who height is the minimum of k + 1 and the
maximum height of the skip list. We then link this node in to its k + 1 lowest predecessors
in the list (see Fig. 49).

Fig. 49: Inserting a new key 24.
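Here is a minimal sketch of this insertion procedure, written against the SkipNode class above. The update array of per-level predecessors, the randomLevel helper sketched earlier, and the fields maxLevel and topmostLevel are assumed names; duplicate keys are not handled.
Insertion into a Skip List (sketch)
void insert(Key x, Value v) {
    SkipNode[] update = new SkipNode[maxLevel + 1];   // predecessor of x at each level
    for (int i = 0; i <= maxLevel; i++) update[i] = head;
    SkipNode p = head;
    for (int i = topmostLevel; i >= 0; i--) {         // standard search, remembering the
        while (p.next[i].key < x) p = p.next[i];      // ...last node visited on each level
        update[i] = p;
    }
    int k = randomLevel(maxLevel);                    // toss coins to pick the node's height
    SkipNode q = new SkipNode(x, v, k + 1);           // new node spans levels 0..k
    for (int i = 0; i <= k; i++) {                    // splice q in after its predecessors
        q.next[i] = update[i].next[i];
        update[i].next[i] = q;
    }
    if (k > topmostLevel) topmostLevel = k;           // track the highest nonempty level
}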

Deletion is quite similar. Again, we search for the node containing the key to delete, and we
keep track of all its predecessors at various levels in the skip list. On finding it we unlink
the node from each level, exactly as we would do in a standard linked-list deletion. Both
operations take O(log n) time in expectation.

Implementation Notes: One of the appeals of skip lists is their ease of implementation. Most of the procedures that operate on skip lists make use of the same simple code that is used for linked-list operations. One additional element is that you need to keep track of the level that you are currently on when searching.
Skip-list nodes have variable size, which is a bit unusual. This is not a problem in programming languages like Java that allow us to dynamically allocate arrays of variable size. Thus, each node of the skip list will generally contain the key-value pair associated with this entry and a variable-sized array of next pointers (so that p.next[i] points to the next node in the skip list from node p at level i). Finally, the structure has two special “sentinel nodes,” head and tail. We assume that tail.key is set to some incredibly large value so that searches always stop here.

Overall Performance: From a practical perspective, skip lists can do pretty much everything
that standard binary trees structures can do. In expectation, they require O(n) storage
space, and all dictionary operations can be performed in time O(log n) in expectation. Given
their simple linear structure, they are arguably easier to visualize and program. Experimental
studies show that skip lists are among the fastest data structures for sorted dictionaries (along with treaps). This is largely because the power of randomization keeps us from having to maintain more complex balance information, which simplifies the code and processing.

Lecture 10: Splay Trees


Recap: We have discussed a number of different search structures for performing the basic ordered-
dictionary operations (insert, delete, and find). Here is a brief review:

(Standard) Binary Search Trees: Very simple, but no effort is made to balance the tree.
Height is O(log n) if keys are inserted in random order, but can be as bad as Ω(n).
(Random deletions are tricky, but if they are done in an unbiased manner, the expected
height is O(log n) as well.)
AVL Trees: These are height-balanced trees. Height is guaranteed to be O(log n), and all
dictionary operations run in O(log n) time. Each node stores height information (or
the balance factor), and the tree is rebalanced by rotations. This is a very well known
“classic” data structure, but AVL trees are neither the simplest nor fastest in practice.
2-3, Red-Black, and AA Trees: 2-3 trees were based on the notion of having a variable-
width node, but the tree was perfectly balanced. Red-black and AA trees were binary
tree encodings of 2-3 trees. For all three, the height is guaranteed to be O(log n), and
all dictionary operations run in O(log n) time.
Treaps and Skip Lists: These are randomized structures, which means that they rely on a
random-number generator to determine their structures. All operations run in O(log n)
time in expectation over the random choices. (The user’s actions cannot impact the per-
formance.) The skip list has expected space O(n), but its worst case space is O(n log n).

Beyond Dictionary Operations: There are many other operations, beyond the standard ordered-
dictionary operations (insert, delete, find) that a user might want the data structure to sup-
port. Here are a few examples:

Order-statistic queries: Given an integer k, where 1 ≤ k ≤ n, find the kth smallest element
in the set.
Range queries: Given two keys x0 and x1 , compute the sum (or any associative operation)
on all the keys of the dictionary that lie between x0 and x1 .
Split/Merge: Given a search tree T and a key x, split T into two search trees T1 and T2
such that all the keys of T1 are ≤ x and all the keys of T2 are > x. Conversely, given two
search trees T1 and T2 such that every key of T1 is smaller than every key of T2 , merge
them into a single search tree. The target is to solve both problems in O(log n) time.



Finger-Search Queries: We have just performed a find and located the node containing a
key x. Now, we want to access a key y that is close to x in order. (E.g., you have a
pointer to “Brendan” and now you want to find “Brendon.”) Can this be done faster
than restarting the search from the root?

Beyond Worst-Case Complexity: In the above data structures, our analysis was based on
worst-case performance. In practice, some queries are more likely than others. If so, expected-
case performance may be more appropriate.

Expected-Case Optimal Search Tree: You are given n keys, {x_1, . . . , x_n}. Let p_i denote the probability of accessing key x_i. Thus, 0 ≤ p_i ≤ 1 and p_1 + · · · + p_n = 1. Suppose a binary search tree T stores x_i in a node at depth d_i from the root. The expected search time for this tree is
$$E(T) \;=\; \sum_{i=1}^{n} p_i \, d_i.$$
Given the p_i's, can we compute the binary search tree T that minimizes E(T)?
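As a quick hypothetical illustration, suppose n = 3 with keys x_1 < x_2 < x_3 and access probabilities p_1 = 0.7, p_2 = 0.2, p_3 = 0.1. The perfectly balanced tree with x_2 at the root has expected search time
$$E(T) \;=\; 0.2 \cdot 0 + 0.7 \cdot 1 + 0.1 \cdot 1 \;=\; 0.8,$$
while the skewed tree with x_1 at the root, x_2 as its right child, and x_3 below x_2 has
$$E(T) \;=\; 0.7 \cdot 0 + 0.2 \cdot 1 + 0.1 \cdot 2 \;=\; 0.4.$$
When access probabilities are highly nonuniform, the optimal tree need not be balanced.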

The structures we have seen so far do not address this question. There does exist an efficient
algorithm for computing the optimum binary-search tree. (It is a nice exercise in dynamic
programming.) However, in order to compute this data structure, we assume that we know what the access probabilities are. What if they are not known? What if they are known at some time, but they change over time? In such a dynamic setting, a better solution would
be a self-adjusting tree. This is a tree that dynamically adjusts its structure according to a
dynamically changing set of access probabilities. Intuitively, keys that are frequently accessed
will filter up near the root, and keys that are rarely accessed will slowly fall to the deeper
levels of the search tree.
Achieving this goal in a manner that can be made theoretically rigorous is a challenging
problem. It was solved by Daniel Sleator and Robert Tarjan in 1985 by a data structure,
called a splay tree. The splay tree itself is an amazing idea. While the tree is provably
(theoretically) optimal with respect to a number of different criteria, the efficiency comes at
a cost. Individual operations may take a long time, and efficiency is in the amortized sense.
Splay Trees and Amortization: All the balanced binary tree structures we have seen so far
have two things in common: (1) they use rotations to maintain structure and (2) each node
stores additional information to allow the tree to maintain balance. A splay tree is a binary
search tree, and it uses rotations to maintain its structure, but unlike the others no additional
storage is needed for balance information. (Thus, each node is just a node of a standard binary
search tree. It stores a key, value, left child, and right child. That is all!)
Because a splay tree has no balance information, it is possible for the tree to become very
unbalanced. Splay trees are remarkable in that they are self-adjusting. Having nodes at great depth in a binary tree is not a problem per se, until such an element is accessed.
Splay trees employ a clever trick so that whenever a very deep node is accessed, the tree will
restructure itself so that the tree becomes significantly more balanced. This is really quite
clever, when you consider the fact that the tree has no idea whether it is balanced or not!
This means that, as with standard binary search trees, it is possible that a single access
operation could take as long as Ω(n) time (and not the O(log n) that we would like). However,
splay trees are efficient in the amortized sense:

Splay Tree Performance Bound: Starting with an empty tree, the total time needed to
perform any sequence of m insert/delete/find operations on a splay tree is O(m log n),
where n is the maximum number of nodes in the tree.



Thus, although any individual operation may be quite costly (as high as Ω(n) time), the
average cost of any operation is at most O(log n). This type of analysis (averaging over a
sequence of operations) is called an amortized analysis. In the business world, amortization
refers to the process of paying off a large payment over time in small installments. Here we
are paying for the total running time of the data structure’s algorithms over a sequence of
operations in small installments. Even though each individual operation may be costly, the
overall average is small.
As mentioned above, splay trees tend to bring frequently accessed keys up near the root.
Indeed, it can be shown that if the access probabilities are stable, the cost of splay-tree
operations asymptotically matches the performance of an optimal binary search tree.
No balance information, optimal expected search times, self-adjusting behavior? Splay trees
sound amazing—and they are. However, they are not used that often in practice. In spite of
their cool properties, they tend to be slower in practice than red-black trees, treaps, and skip
lists. So, these other structures tend to be used more often than splay trees.

Splaying: The key to any self-adjusting data structure is the operation that incrementally modifies
the organization of objects in the structure. In the case of a splay tree, this operation is called
splaying. Given a key value x and a splay tree T the operation T.splay(x) searches for the
key x within T , and reorganizes T while rotating the node with key x up to the root of the
tree. If x is not in the tree, either the inorder predecessor or inorder successor of x will be
brought to the root instead.
Here is how T.splay(x) works. We start with the normal binary search descent from the
root of T to find the node p containing key x, or the last node visited before we fall out of
the tree. (Observe that in the latter case, p is either the inorder successor or predecessor of
x, depending on whether we fell out along a null left child link or a null right child link.) Our objective is to bring the node p up to the root.

An idea that doesn’t work: At this point you may see an obvious strategy to bring p to
the root. We walk up the tree to p’s ancestors, applying a rotation at each. While this
will satisfy one of our requirements of moving p to the root, it will not do a good job of
reorganizing the tree. To see why, suppose that we attempt to apply rotations to node
a in Fig. 50. Observe that while a is brought up to the root, the tree is still very skewed
and unbalanced.
Fig. 50: Single rotations up to the root—the tree is still poorly balanced.

A better idea: The poor performance of the single-rotation method suggests that we try
something that “stirs things up” a bit more. Our next idea is to go two nodes at a time
and apply rotations at each of these nodes. For example, in Fig. 51, we see that applying two rotations at a time, first at the grandparent and then at the parent, has a dramatically better result on the tree height, cutting it roughly in half. (But will it work in general?)

Fig. 51: Two-at-a-time rotations up to the root—the tree height is cut almost in half.

The above exercise suggests that we work two levels at a time. Here is a more formal description of the splay operation at a single node p of the tree:
• If p has both a parent and a grandparent, let q and r be the parent and grandparent, respectively. We consider two cases:
  – Zig-zig: If p and q are both right children or both left children, we apply a rotation at r followed by a rotation at q, to bring p to the top of this 3-node ensemble (see Fig. 52(a)), and continue up the tree.
  – Zig-zag: If p and q are left-right or right-left children, we apply a rotation at q followed by a rotation at r, to bring p to the top of this 3-node ensemble (see Fig. 52(b)), and continue up the tree.

Zig-zig Zig-zag Zig


p p root p
r f b r f d d b
q 1 q q 2 q r p 1
d d b b f b d
p r p
2 G A 1 G E A
b f d
E C A A C E G A C C E
A C E G C E
(a) (b) (c)

Fig. 52: Splaying cases: (a) Zig-Zig, (b) Zig-Zag and (c) Zig.

• Zig: If p is a child of the root, we do a single rotation at the root of T, making p the new root (see Fig. 52(c)), and we are now done.
• If p is the root of T, we are done.

A full example is shown in Fig. 53. Note that the tree’s inorder structure is preserved. Also
observe that nodes lying on or near the search path to p (such as 1) tend to be lifted much
closer to the root through this operation.
Fig. 53: The full splay operation on p.

You might wonder why we performed these particular rotations in this particular order. The situation that we are most concerned about is where the tree is highly imbalanced and we
repeatedly attempt to access elements that are unusually deep in the tree, say much deeper
than O(log n) depth. Such an operation will be expensive. We want to be sure that we cannot
repeat it many times.
Because we rotate as we are backing up the tree, we may assume that the key we sought was
in one of p’s two children (shaded in blue in the figure). Observe in Fig. 52(a) and (b) that
both of these subtrees are lifted up at least one level in the tree following the zig-zig or zig-zag
rotation. Thus, if this were a long search path, then after repeating this operation all the way
to the root, the nodes along this long path would be lifted up to roughly half of their original
level. (The reason we say “halved” is that for every two levels we perform an operation that
lifts each subtree up by at least one level.) Thus, splaying has the desirable effect that it
tends to significantly reduce the length of long search paths, whenever we attempt to access
something within one of these long paths.
You may protest at this point. What about the negative impact splaying has on the levels of
other subtrees (like G in Fig. 52(a))? This is true. Clearly, there must be winners and losers.
But the accessed node was in p's subtree, and our principal concern is that expensive accesses deep in the tree cannot be allowed to repeat excessively. A certain amount of damage to the rest of
the tree’s structure is the price that we pay for this. But to make this convincing, you should
read the full amortized proof. We will not cover it since it is quite mathematically involved.

Splay Tree Operations: Now that we know how to perform a single splay operation, how do we
use this to perform the basic dictionary operations, insert, delete, and find?

find(x): To find key x in tree T , we simply invoke T.splay(x). If x is in the tree it will
be transported to the root. If after the operation, the root key is not x, we know that
x is not in the dictionary. (Why bother moving x to the root? Recall that in the
case of expected-case optimal trees, a small number of nodes may have very high access
probabilities. The splaying operation has the tendency to keep these nodes nearest to
the root.)
insert(x, v): To insert the key-value pair (x, v), we first invoke T.splay(x). If x is already
in the tree, it will be transported to the root, and we can take appropriate action (e.g.,
throw an exception). Otherwise, the root will consist of some key y that is either the key immediately before x or immediately after x in T. Let us consider the former case
(y < x), since the other case is symmetrical. Let R denote the right subtree of the
root. We know that all the keys in R are greater than x so we create a new root node
containing x and v, and we make R its right subtree (see Fig. 54). The remaining nodes
are hung off as the left subtree L of this node.
Fig. 54: Splay-tree insertion of x.

delete(x): To delete x, we first invoke T.splay(x) to bring the deleted node to the root.
If the root’s key is not x, then x is not in the tree, and we can take appropriate error
action. Otherwise, let L and R be the left and right subtrees of the resulting tree (see
Fig. 55). If L is empty, then x is the new smallest key in the tree. We remove x and R
becomes the new tree. We can do the symmetrical thing if R is empty.
Fig. 55: Splay-tree deletion of x.

Otherwise, both subtrees are nonempty, and we next find an appropriate replacement node. To do this, we perform R.splay(x). (It might be smarter to flip a coin and do L.splay(x) half the time for the sake of symmetry. This will not affect the theoretical analysis of the tree's performance.) Let's call the resulting subtree R′. Since x is not in R and all the keys in this subtree are greater than x, this will bring the smallest key y of R to the root. This implies that y is x's inorder successor from the original tree. It follows that y can have no left child in R′. To complete the operation, we simply link L as the left child of y (the root of R′).
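Here is a compact sketch of these three operations in terms of the splay procedure (hypothetical names; root is the tree's root pointer, and splay(x, p) is assumed to return the new root of p's subtree with x, or its closest neighbor, on top):
Splay-Tree Dictionary Operations (sketch)
Value find(Key x) {
    root = splay(x, root);
    return (root != null && root.key == x) ? root.value : null;
}

void insert(Key x, Value v) {
    root = splay(x, root);
    if (root != null && root.key == x) Error - Duplicate key!
    SplayNode p = new SplayNode(x, v);
    if (root == null) { root = p; return; }       // tree was empty
    if (root.key < x) {                           // root y is x's inorder predecessor
        p.right = root.right; root.right = null;  // keys greater than x go to x's right
        p.left = root;                            // y and its left subtree go to x's left
    } else {                                      // root y is x's inorder successor
        p.left = root.left; root.left = null;
        p.right = root;
    }
    root = p;                                     // x becomes the new root
}

void delete(Key x) {
    root = splay(x, root);
    if (root == null || root.key != x) Error - Nonexistent key!
    if (root.left == null)       root = root.right;   // x was the smallest key
    else if (root.right == null) root = root.left;    // x was the largest key
    else {
        SplayNode R = splay(x, root.right);       // brings x's successor to the top of R...
        R.left = root.left;                       // ...and it has no left child, so hang L there
        root = R;
    }
}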

Why are Splay Trees Great? Splay trees are amazing in the sense that they satisfy (at least theoretically) many optimality properties that none of the other search trees we have seen can match. Here is a partial list.

Balance Theorem: The cost of applying any sequence of m accesses (insert, delete, find)
on a splay tree with n elements is O(m log n + n log n).



Static Optimality Theorem: Let q_x denote the number of times that an element x is accessed in a sequence of m accesses to a splay tree. (Think of q_x/m = p_x as an empirical measure of the probability of accessing x.) Then the cost of performing these accesses is
$$O\!\left(m + \sum_{x} q_x \log \frac{m}{q_x}\right).$$

The expression in the summation is the entropy of the access probability distribution,
and this is a theoretical lower bound on the performance of any decision-based data
structure.
Static Finger Theorem: Assume that the items are numbered 1 through n in ascending order. Let f be any fixed element (the “finger”). Then the cost of performing any sequence of operations is
$$O\!\left(m + n \log n + \sum_{x} \log(|x - f| + 1)\right).$$

Intuitively, this says that the cost of any operation is the logarithm of the number of
elements between the accessed element and the finger.
Dynamic Finger Theorem: Assume that the finger for each step, in accessing an element y, is the location of the element x accessed in the previous step. Then the cost of performing any sequence of operations is
$$O\!\left(m + n \log n + \sum_{x,y} \log(|y - x| + 1)\right).$$

Intuitively, this says that if you start a search from some element y to another element x, the cost of the operation is the log of the number of elements between them.
Working-Set Theorem: Each time an element x is accessed, let t(x) be the number of elements that were accessed since x's last access. Then the cost of performing the sequence is
$$O\!\left(m + n \log n + \sum_{x} \log(t(x) + 1)\right).$$

Intuitively, this says that if we accessed an element t steps ago, then the time to access
it now is roughly log t.
Scanning Theorem: If each element of a splay tree is accessed in ascending (or descending)
order, the total time for all these accesses is O(n). (A naive bound would be O(n log n).)

Why Splay Trees Work: (Optional) Sleator and Tarjan proved that, if you start with an empty
tree and perform any sequence of m splay tree operations (insert, delete, find), then the total
running time will be O(m log n), where n is the maximum number of elements in the tree at
any time. Thus the average time per operation is O(log n), as we would like. Their amortized
analysis involves a potential-based argument. We will sketch the idea without getting into the
details.
The intuition is to associate a potential with any tree. Intuitively, the potential is a value
that informs you how badly unbalanced the tree is. In financial terms, we think of potential as money in a bank account. As it accrues, we can use it to pay for the cost of rebalancing
the tree. The argument relies on showing that no matter what sequence of operations occurs,
there is always a nonnegative potential (“money in the bank”). The amortized cost of any
operation is defined to be the sum of the actual cost of the operation (e.g., the number of
rotations performed) and the change in potential. The objective is to show that the amortized
cost of every operation is O(log n). Some operations may be very costly (e.g., splaying along
a path of length n), but the resulting decrease in the tree’s potential will be large enough to
compensate for this.
So, what is the potential function used for proving that splay trees have good amortized performance? First, for each node p of the tree, define size(p) to be the number of nodes in the subtree rooted at p. Define rank(p) = lg size(p). Intuitively, the rank of a node is the “ideal height” of this subtree in a perfectly balanced tree. The potential function of a tree is defined to be
$$\Phi(T) \;=\; \sum_{p \in T} \mathrm{rank}(p) \;=\; \sum_{p \in T} \lg \mathrm{size}(p).$$

The following is the key to the analysis. It bounds the amortized cost of each rotation
operation.

Rotation Lemma: Given any node p, let rank(p) and rank′(p) denote its rank before and after applying a rotation step. The amortized cost of a zig rotation at p is at most 1 + 3(rank′(p) − rank(p)), and the amortized cost of a zig-zig or zig-zag rotation at any node p is at most 3(rank′(p) − rank(p)).
(Partial) Proof: We will only prove the case of the zig-zig rotation. The other cases follow by a similar sort of derivation, which we leave as an exercise.
To simplify notation, let's write rank(x) as r(x) and size(x) as s(x). Suppose that we perform a zig-zig rotation involving three nodes x, y, and z, as shown in Fig. 52(a) (where x, y, and z play the roles of p, q, and r, respectively). Let r(x), r(y), and r(z) denote the ranks of these items before the rotation, and let r′(x), r′(y), and r′(z) denote these ranks after the rotation. The actual cost of the zig-zig is 2 rotations, and since these are the only changes made in the tree, the change in potential is ∆Φ = (r′(x) + r′(y) + r′(z)) − (r(x) + r(y) + r(z)). Thus, the amortized cost of this operation is
$$A \;=\; 2 + \Delta\Phi \;=\; 2 + (r'(x) + r'(y) + r'(z)) - (r(x) + r(y) + r(z)).$$
Rearranging terms we have
$$A \;=\; 2 + (r'(x) - r(z)) + r'(y) + r'(z) - r(x) - r(y).$$
Observe that after the rotation x's subtree is the same as z's subtree before the rotation, so r′(x) = r(z). Also, observe that before the rotation y's subtree contains x's subtree, so r(y) ≥ r(x), or equivalently −r(y) ≤ −r(x). After the rotation y's subtree is contained in x's subtree, so r′(y) ≤ r′(x). Thus, we have
$$A \;\le\; 2 + 0 + r'(x) + r'(z) - r(x) - r(x) \;=\; 2 + r'(x) + r'(z) - 2r(x).$$

Next, we will employ an observation about the logarithm function. It is a concave function, which implies that for any a and b, (lg a + lg b)/2 ≤ lg((a + b)/2). Also, observe that s(x) + s′(z) ≤ s′(x). Using these and the fact that r(x) = lg s(x), we have
$$\frac{r(x) + r'(z)}{2} \;=\; \frac{\lg s(x) + \lg s'(z)}{2} \;\le\; \lg \frac{s(x) + s'(z)}{2} \;\le\; \lg \frac{s'(x)}{2} \;=\; (\lg s'(x)) - 1 \;=\; r'(x) - 1.$$
This implies that r′(z) ≤ 2r′(x) − r(x) − 2. Plugging this into our expression for A, we have
$$A \;\le\; 2 + r'(x) + (2r'(x) - r(x) - 2) - 2r(x) \;=\; 3r'(x) - 3r(x) \;=\; 3(r'(x) - r(x)),$$

which is what we set out to prove. Whew!

It is rather difficult to see what the benefit of this symbol manipulation is, but the key is that
we have completely eliminated the +2 term in the zig-zig and zig-zag cases. This means that
we can perform any number of these and the only terms that accumulate will be of the form
r0 (x) − r(x), which is just the potential change. By applying this all the way up the rotation
path, we obtain a telescoping series (and a final +1 for the last zig rotation). This implies
the following:

Splay Lemma: The amortized cost of T.splay(p) is at most 1 + 3(rank(root) − rank(p)).

Since the rank of the root cannot exceed lg n and the rank of p is nonnegative, we immediately
have:

Corollary: The amortized cost of T.splay(p) is O(log n).

Finally, since each dictionary operation involves a constant number of splaying operations,
we obtain the final result.

Theorem: The amortized cost of each dictionary operation in a splay tree is O(log n).

Wow! That is one impressive bit of data structure analysis. You are not responsible for
knowing it, but this is great example of how amortized analyses are performed.

Lecture 11: B-Trees


B-trees: While binary trees are great data structures for ordered dictionaries stored in main
memory, these data structures are really not appropriate for data stored on external memory
systems (i.e., disks). When accessing data on a disk, an entire block (or page) is input at
once. So it makes sense to design the tree so that each node of the tree essentially occupies
one entire block. This idea applies more generally to modern memory hierarchies, where
memory is divided into various levels of caches. The data structure we will discuss today is
appropriate whenever memory can be accessed efficiently in blocks.
An alternative strategy is to use multiway search trees, where each node is sized so that it coincides with a memory block. A standard binary search tree node stores a single key value and two children, left and right, storing keys that are smaller than and greater than this key, respectively. In a j-ary multiway search tree, a node stores references to j different subtrees, T1, . . . , Tj, and contains j − 1 key values, a1 < · · · < aj−1, such that subtree Ti stores nodes whose key values x satisfy ai−1 < x < ai. (To handle the boundary cases, let's make the convention that a0 = −∞ and aj = +∞, but these are not stored in the node.)

Fig. 56: Binary and 4-ary search tree nodes.
B-trees are multiway search trees, in which we achieve balance by constraining the “width”
of each node. Indeed, B-trees are the natural generalization of 2-3 trees to higher widths.
B-trees were first introduced way back in 1970 by Rudolf Bayer and Edward McCreight. They
have proven to be very popular, but with popularity comes variety. Numerous modifications
and adaptations of B-trees have been developed over the years. We will present one, fairly
simple, formulation. (Later in the lecture we will discuss a particularly popular variant, called
B+ trees.)
For any integer m ≥ 3, a B-tree of order m is a multiway search tree that has the following properties (see Fig. 57):

• The root is either a leaf or has between two and m children.
• Each node except the root has between ⌈m/2⌉ and m children (which may be empty, that is, null). A node with j children contains j − 1 keys.
• All leaves are at the same level of the tree.

Fig. 57: B-tree of order 5. (Each node, except possibly the root, has 3–5 children.)

The 2-3 tree that we presented earlier is an example of a B-tree of order 3. The typical fan-out values for B-trees are quite large. For example, B-trees of order around 100 are common in practice. A node in such a tree has between 50 and 100 children and holds between 49 and 99 keys. Of course, with such high fan-outs, the heights of such trees are quite small.

Height Analysis: The following theorem shows that as the fan-out of a B-tree grows, the height of the tree decreases.

Theorem: A B-tree of order m containing n keys has height at most (lg n)/γ, where γ = lg(m/2).



Proof: The formal proof is a bit messy, so we will instead prove a simpler result. Let's let n denote the number of keys just at the leaf level. (This is reasonable, because in any multiway tree, a constant fraction of the nodes are leaves, and as the degree increases, this fraction increases.) Let h ≥ 1 denote the height of our tree, and let N(h) denote the smallest possible number of keys in the leaves of this tree. Since N(h) ≤ n, it suffices to show that h ≤ (lg N(h))/γ.
Recall that the degree of a node is its number of children. The root node has degree at least two, and all other nodes have degree at least d = m/2. Thus, there are at least two nodes at depth 1, and with each additional level the number increases by a factor of d. Thus, there are at least 2d nodes at depth 2, 2d² at depth 3, and generally 2d^(h−1) at depth h. Each of these leaf nodes contains at least d keys, implying the total number of keys at the leaf level is at least 2d^h. Thus, we have
$$N(h) \;\ge\; 2d^{h} \;\ge\; d^{h} \;\;\Longrightarrow\;\; \lg N(h) \;\ge\; \lg(d^{h}) \;=\; h \lg d \;\;\Longrightarrow\;\; \frac{\lg N(h)}{\lg d} \;\ge\; h.$$
Recalling that γ = lg(m/2) = lg d, a tree with n keys has height at most h ≤ (lg N(h))/γ ≤ (lg n)/γ, as desired.

For example, when m = 100, this implies that the height of the B-tree is not greater than
(lg n)/5.6, that is, it is 5.6 times smaller than that of a binary search tree. For example, this means that you can store over 100 million keys in a search structure of height roughly five!

Node structure: Although B-tree nodes can hold a variable number of items, this number gener-
ally changes dynamically as keys are inserted and deleted. Therefore, every node is allocated
with the maximum possible size, but most nodes will not be fully utilized. (Experimental
studies show that B-tree nodes are on average about 2/3 utilized.)
The code block below shows a possible Java implementation of a B-tree node. In this case, we store key-value pairs as the elements. (We could place twice as many elements in each leaf node as in each internal node, since leaves do not need the storage for child pointers.)
B-Tree Node
final int M = ... // order of the B-tree

class BTreeNode {
    int nChildren;                        // number of children (from ceil(M/2) to M)
    BTreeNode[] child = new BTreeNode[M]; // child pointers
    Key[] key = new Key[M-1];             // keys
    Value[] value = new Value[M-1];       // associated values
}

Note that 2-3 trees and 2-3-4 trees discussed in earlier lectures are special cases (when M = 3
and M = 4, respectively).

Search: Searching a B-tree for a key x is a straightforward generalization of binary tree searching.
When you arrive at an internal node with keys a1 < a2 < · · · < aj−1, search (either linearly or
by binary search) for x in this list. If you find x in the list, then we have found x. Otherwise,
determine the index i such that ai−1 < x < ai. (Recall that a0 = −∞ and aj = +∞.) Then
recursively search the subtree Ti. When you arrive at a leaf, search all the keys in this node.
If it is not here, then x is not in the B-tree.
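Here is a minimal sketch of this search, assuming the BTreeNode layout from the code block above and that Key implements Comparable<Key>. It uses a simple linear scan within each node; the helper is my own, not code from the original notes.

Searching a B-tree (a sketch)
Value find(Key x, BTreeNode p) {
    if (p == null) return null;                       // fell off the tree: x is not present
    int i = 0;                                        // find the first key >= x (linear scan)
    while (i < p.nChildren - 1 && x.compareTo(p.key[i]) > 0) i++;
    if (i < p.nChildren - 1 && x.compareTo(p.key[i]) == 0)
        return p.value[i];                            // found x in this node
    return find(x, p.child[i]);                       // otherwise recurse into subtree T_i
}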

Restructuring: In an earlier lecture, we showed how to restructure 2-3 trees. We had three
mechanisms: splitting nodes, merging nodes, and subtree adoption. We will generalize each
of these operations to general B-trees.

Key Rotation (Adoption): Recall that a node in a B-tree can have from ⌈m/2⌉ up to m
children, and the number of keys is smaller by one. As a result of insertion or deletion,
a node may acquire one too many (m + 1 children and hence, m keys) or one too few
(⌈m/2⌉ − 1 children and hence, ⌈m/2⌉ − 2 keys).

Fig. 58: Key rotation for a B-tree of order m = 5.

The easiest way to remedy the imbalance is to move a child into or from one
of your siblings, assuming that you have a sibling that can absorb this change. This is called
key rotation (or as I call it, adoption). For example, in Fig. 58, the node in red has too
few children, and since its left sibling can spare a child, we move the sibling's rightmost
child over, slide the sibling's largest key up to the parent, and bring the parent's separating
key down into the underfull node.
This operation is not always possible, because it depends on the existence of a sibling
that has a key to spare (or room to absorb one). Because allocating and deallocating nodes is a relatively
expensive operation, we prefer this operation whenever it is available.
Node Splitting: As the result of insertion, a node may acquire one too many children (m + 1
children and hence, m keys). When this happens and key rotation is not available, we
split the node into two nodes, one having m′ = ⌈m/2⌉ children and the other having the
remaining m″ = m + 1 − ⌈m/2⌉ children. Clearly, the first node has an acceptable number
of children. The following lemma demonstrates that the other node has an acceptable
number of children as well.
Lemma: For all m ≥ 2, ⌈m/2⌉ ≤ m + 1 − ⌈m/2⌉ ≤ m.
Proof: If m is even, then ⌈m/2⌉ = m/2, and the middle expression in the inequality
reduces to m + 1 − m/2 = m/2 + 1. Thus, the claim is equivalent to

    m/2 ≤ m/2 + 1 ≤ m,

which is clearly true for any m ≥ 2. On the other hand, if m is odd then ⌈m/2⌉ =
(m + 1)/2, and the middle expression in the inequality reduces to m + 1 − (m + 1)/2 =
(m + 1)/2. Thus, the claim is equivalent to

    (m + 1)/2 ≤ (m + 1)/2 ≤ m,

which is also clearly true for any m ≥ 1.

Fig. 59: Node splitting for a B-tree of order m = 5.

Returning to node splitting, we create two nodes and distribute the smallest m′ subtrees
to the first and the remaining m″ to the second node (see Fig. 59). Among the m
keys, the m′ − 1 smallest keys go with the first node and the m″ − 1 largest keys go with
the other node. Since (m′ − 1) + (m″ − 1) = m − 1, we have one extra key that does
not fit into either of these nodes. This key is promoted to the parent node. (As with
2-3 trees, if we do not have a parent, we create a new root node with this single key and
just two children. By the way, this is the reason that we allowed the root to have fewer
than ⌈m/2⌉ children.)
Since the parent acquires an extra key and extra child, the splitting process may prop-
agate to the parent node.
Node Merging: As the result of deletion, a node may have one too few children (⌈m/2⌉ − 1
children and hence, ⌈m/2⌉ − 2 keys). When this happens and key rotation is not available,
we may infer that its siblings have the minimum number of ⌈m/2⌉ children. We merge this
node with either of its siblings into a single node having a total of m′ = (⌈m/2⌉ − 1) +
⌈m/2⌉ = 2⌈m/2⌉ − 1 children. The following lemma demonstrates that the resulting
node has an acceptable number of children.
Lemma: For all m ≥ 2, ⌈m/2⌉ ≤ 2⌈m/2⌉ − 1 ≤ m.
Proof: If m is even, then ⌈m/2⌉ = m/2, and the middle expression in the inequality
reduces to 2(m/2) − 1 = m − 1. Thus, the claim is equivalent to

    m/2 ≤ m − 1 ≤ m,

which is easily true for any m ≥ 2. On the other hand, if m is odd then ⌈m/2⌉ =
(m + 1)/2, and the middle expression in the inequality reduces to m. Thus, the
claim is equivalent to

    (m + 1)/2 ≤ m ≤ m,

which is easily true for any m ≥ 1.



Fig. 60: Node merging for a B-tree of order m = 5.

Returning to node merging, we merge the two nodes into a single node having m′ children
(see Fig. 60). The number of keys from the two initial nodes is (⌈m/2⌉ − 2) + (⌈m/2⌉ − 1) =
2⌈m/2⌉ − 3 = m′ − 2, which is one too few (a node with m′ children needs m′ − 1 keys). We
demote the appropriate key from the parent's node to yield the desired number of keys.
Since the parent has lost a key and a child, the merging process may propagate to the
parent node.

Given these operations, we can now describe how to perform the various dictionary operations.

Insertion: In the case of 2-3 trees, we would always split a node when it had too many keys. With
B-trees, creating nodes is a more expensive operation. So, whenever possible we will try to
employ key rotation to resolve nodes that are too full, and we will fall back on node splitting
only when necessary.
To insert a key into a B-tree of order m, we perform a search to find the appropriate leaf into
which to insert the key. If we find the key, then we signal a duplicate-key error. Otherwise,
if the leaf is not at full capacity (it has fewer than m − 1 keys) then we simply insert it and
are done. Note that this will involve sliding keys around within the leaf node to make room
for the new entry, but since m is assumed to be a constant (e.g., the size of one disk page),
we ignore this extra cost.
Otherwise the node overflows and to remedy the situation, we first check whether either
sibling is less than full. If so, we perform a rotation moving the extra key and child into
this sibling. Otherwise, we perform a node split as described above (see Fig. 61). When this
happens, the parent acquires a new child and new key, and thus the splitting process may
continue with the parent node.
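To make the splitting step used here concrete, the following sketch builds the two replacement nodes for an overfull node and reports the promoted key. It assumes the BTreeNode layout and the order constant M from the earlier code block; the temporary arrays, the SplitResult class, and the wiring into the parent (omitted) are my own, not part of the original notes.

Splitting an Overfull Node (a sketch)
// keys[0..M-1], vals[0..M-1] and child[0..M] hold the M keys and M+1 children of a
// node that has overflowed, in sorted order.
class SplitResult { Key promotedKey; Value promotedValue; BTreeNode left, right; }

SplitResult split(Key[] keys, Value[] vals, BTreeNode[] child) {
    int mLeft = (M + 1) / 2;                      // m' = ceil(M/2) children go to the left node
    int mRight = M + 1 - mLeft;                   // m'' children go to the right node
    SplitResult r = new SplitResult();
    r.left = new BTreeNode();   r.left.nChildren = mLeft;
    r.right = new BTreeNode();  r.right.nChildren = mRight;
    for (int i = 0; i < mLeft; i++) r.left.child[i] = child[i];
    for (int i = 0; i < mLeft - 1; i++) { r.left.key[i] = keys[i]; r.left.value[i] = vals[i]; }
    for (int i = 0; i < mRight; i++) r.right.child[i] = child[mLeft + i];
    for (int i = 0; i < mRight - 1; i++) { r.right.key[i] = keys[mLeft + i]; r.right.value[i] = vals[mLeft + i]; }
    r.promotedKey = keys[mLeft - 1];              // the one leftover key moves up to the parent
    r.promotedValue = vals[mLeft - 1];
    return r;
}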

Deletion: As in binary search tree deletion, we begin by finding the node containing the key to be
deleted. We need to find a suitable replacement for this key. This is done by finding the largest key
in the left child (or equivalently the smallest key in the right child), and moving this key up to fill the hole.
This key will always be at the leaf level. This creates a hole at the leaf node. If this leaf node
still has sufficient capacity (at least ⌈m/2⌉ − 1 keys) then we are done.
Otherwise, we have an underflow situation at this node. As with insertion, we first check
whether a key rotation is possible. If one of the two siblings has at least one key more than
the minimum, then we rotate the extra key into this node, and we are done (see Fig. 62).
If this is not possible, then any siblings of ours must have the minimum number of ⌈m/2⌉
children, and so we can apply a node merge (see Fig. 63).



Fig. 61: Insertion of key 29 (order m = 5).

Fig. 62: Deletion of key 23 (order m = 5).



Fig. 63: Deletion of key 30 (order m = 5).

The removal of a key from the parent’s node may cause it to underflow. Thus, the process
may need to be repeated recursively up to the root. If the root is left with only one child,
we make this single child the new root of the B-tree.

B+ trees: B-trees have been very successful, and a number of variants have been proposed. A
particularly popular one for disk storage is called a B+ tree. The key differences with the
standard B-tree are the following:

• Internal and leaf nodes are different in structure:
  – Internal nodes store keys only, no values. The keys in the internal nodes are used
    solely for locating the leaf node containing the actual data, so it is not necessary
    that every key appearing in an internal node correspond to an actual key-value
    pair.
  – All the key-value pairs are stored in the leaf nodes. There is no need for child
    pointers. (This also saves space.)
• Each leaf node has a next-leaf pointer, which points to the next leaf in sorted order.

Storing keys only in the internal nodes saves space, and allows for increased fan-out. This
means the tree height is lower, which reduces the number of disk accesses. Thus, the internal
nodes are merely an index for locating the actual data, which resides at the leaf level. (The
policy regarding which keys a subtree contains is changed. Given an internal node with keys
⟨a1, . . . , aj−1⟩, subtree Ti contains keys x such that ai−1 < x ≤ ai.)
The next-leaf links enable efficient range reporting queries. In such a query, we are asked to
list all the keys in a range [xmin , xmax ]. We simply find the leaf node for xmin and then follow
next-leaf links until reaching xmax .
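Here is a sketch of this range-reporting walk. The leaf-node class, the findLeaf helper (the usual top-down search, omitted here), and the assumption that Key implements Comparable<Key> are all mine, not prescribed by the notes.

B+ Tree Range Reporting (a sketch)
// Requires java.util.List and java.util.ArrayList.
class BPlusLeaf {
    int nKeys;                         // number of key-value pairs stored in this leaf
    Key[] key; Value[] value;          // entries in sorted order
    BPlusLeaf nextLeaf;                // next leaf in sorted order
}

List<Value> rangeReport(Key xMin, Key xMax) {
    List<Value> result = new ArrayList<>();
    BPlusLeaf leaf = findLeaf(xMin);   // locate the leaf where xMin would reside (search omitted)
    while (leaf != null) {
        for (int i = 0; i < leaf.nKeys; i++) {
            if (leaf.key[i].compareTo(xMax) > 0) return result;       // passed the top of the range
            if (leaf.key[i].compareTo(xMin) >= 0) result.add(leaf.value[i]);
        }
        leaf = leaf.nextLeaf;          // follow the next-leaf link
    }
    return result;
}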



Lecture 12: Balancing by Rebuilding – Scapegoat Trees
Scapegoat Trees: We have previously studied the splay tree, a data structure that supports
dictionary operations in O(log n) amortized time. Recall that this means that, over a series of
operations, the average cost per operation is O(log n), even though the cost of any individual
operation can be as high as O(n). We will now study another example of a binary search tree
that has good amortized efficiency, called a scapegoat tree. The idea underlying the scapegoat
tree was due initially to Arne Andersson (of AA trees) in 1989. This idea was rediscovered by
Galperin and Rivest in 1993, who made some refinements and gave it the name “scapegoat
tree” (which we will explain below).
While amortized data structures are often interesting in their own right, there is a particular
motivation for studying the scapegoat tree. So far, all of the binary search trees that we have
studied achieve balance through the use of the rotation operation. Scapegoat trees are unique in
that they do not rely on rotation. This is significant because there exist binary trees that
cannot be balanced through the use of rotations. (This is often true of multidimensional data
structures in geometry.) As we shall see, the scapegoat tree achieves good balance in a very
different way—when a subtree is imbalanced, it is rebuilt from scratch.
Below, we will discuss the details of how the scapegoat tree works. Here is a high-level
overview. A scapegoat tree is a binary search tree, which does not need to store any addi-
tional information in the nodes, other than the key, value, and left and right child pointers.
Nonetheless, the height of a tree with n nodes will always be O(log n). (Note that this is not
the case for splay trees, whose height can grow to as high as O(n).) Insertion and deletions
work roughly as follows.

Insertion:
• The key is first inserted just as in a standard (unbalanced) binary search tree.
• We monitor the depth of the inserted node after each insertion, and if it is too high,
  there must be at least one node on the search path that has poor weight balance
  (that is, its left and right children have very different sizes).
• In such a case, we find such a node, called the scapegoat,6 and we completely rebuild
  the subtree rooted at this node so that it is perfectly balanced.
Deletion:
• The key is first deleted just as in a standard (unbalanced) binary search tree.
• Once the number of deletions performed is sufficiently large relative to the entire
  tree size, rebuild the entire tree so it is perfectly balanced.

You might wonder why there is a notable asymmetry between the rebuilding rules for insertion
and deletion. The existence of a single very deep node is proof that a tree is out of balance.
Thus, for insertion, we can use the fact that the inserted node is too deep to trigger rebuilding.
However, observe that the converse does not work for deletion. The natural counterpart
would be “if the depth of the leaf node containing the deleted key is too small, then trigger a
rebuilding operation.” However, the fact that a single node has a low depth does not imply
that the rest of the tree is out of balance. (It may just be that a single search path has low
depth, but the rest of the tree is perfectly balanced.) Could we instead apply the deletion
6 The colorful term “scapegoat” refers to an individual who is assigned the blame when something goes wrong. In
this case, the unbalanced node takes the blame for the tree’s height being too great.



rebuilding trigger to work for insertion? Again, this will not work. The natural counterpart
would be, “given a newly rebuilt tree with n keys, we will rebuild it after inserting roughly n/2
new keys.” However, if we are very unlucky, all these keys may fall along a single search path,
and the tree’s height would be as bad as O((log n) + n/2) = O(n), and this is unacceptably
high.

How to Rebuild a Subtree: Before getting to the details of how the scapegoat tree works, let’s
consider the basic operation that is needed to maintain balance, namely rebuilding subtrees
into balanced form. We shall see that if the subtree contains n keys, this operation can be
performed in O(n) time (see Fig. 64). Letting p denote the root node of the subtree to rebuild,
call this function rebuild(p):

• Perform an inorder traversal of p’s subtree, copying the keys to an array A[0..k-1],
  where k denotes the number of nodes in this subtree. Note that the elements of A are
  sorted.
• Invoke the following recursive subtree-building function: buildSubtree(A)
  – Let k = A.length.
  – If k == 0, return an empty tree, that is, null.
  – Otherwise, let x be the median key, that is, A[j], where j = ⌊k/2⌋. Recursively in-
    voke L = buildSubtree(A[0..j-1]) and R = buildSubtree(A[j+1..k-1]). Fi-
    nally, create a node containing x with left subtree L and right subtree R.
    Return a pointer to this node.

Note that if A is implemented as a Java ArrayList, there is a handy function called subList
for performing the above splits (for a plain array, Arrays.copyOfRange serves the same purpose).
The buildSubtree function is given in the code block below.

Fig. 64: Recursively building a balanced tree from a sorted array of keys.

Building a Balanced Tree from an Array


// (Uses java.util.Arrays.copyOfRange for the array slices.)
Node buildSubtree(Key[] A) {                      // A is a sorted array of keys
    int k = A.length;
    if (k == 0) return null;                      // empty array - empty tree
    int j = k / 2;                                // index of the median key
    Node p = new Node(A[j]);                      // ...this is the root
    p.left  = buildSubtree(Arrays.copyOfRange(A, 0, j));       // build left subtree recursively
    p.right = buildSubtree(Arrays.copyOfRange(A, j + 1, k));   // build right subtree recursively
    return p;                                     // return root of the subtree
}



Ignoring the cost of the recursive calls themselves, we spend O(1) time in each invocation, so the
overall time is proportional to the number of nodes in the subtree, which is k. Thus the total time is O(k).
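For completeness, here is a sketch of the flattening step and the top-level rebuild(p) described above, in the same keys-only style as buildSubtree. The flattenTo helper and the use of a Java ArrayList are my own choices, not part of the original notes.

Rebuilding a Subtree (a sketch of the flattening step)
// Requires java.util.List and java.util.ArrayList.
void flattenTo(Node p, List<Key> A) {              // inorder traversal: keys come out sorted
    if (p == null) return;
    flattenTo(p.left, A);
    A.add(p.key);
    flattenTo(p.right, A);
}

Node rebuild(Node p) {                             // rebuild p's subtree in perfectly balanced form
    List<Key> A = new ArrayList<>();
    flattenTo(p, A);
    return buildSubtree(A.toArray(new Key[0]));    // hand off to buildSubtree above
}

The caller is responsible for relinking the returned subtree root in place of p (or making it the root of the whole tree).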
Scapegoat Tree Operations: In addition to the nodes themselves, the scapegoat tree maintains
two integer values. The first, denoted by n, is just the actual number of keys in the tree. The
second, denoted by m, is a special parameter, which is used to trigger the event of rebuilding
the entire tree.
In particular, whenever we insert a key, we increment m, but whenever we delete a key we
do not decrement m. Thus, m ≥ n. The difference m − n intuitively represents the number
of deletions. When we reach a point where m > 2n (or equivalently m − n > n) we can infer
that the number of deletions exceeds the number of keys remaining in the tree. In this case,
we rebuild the entire tree in balanced form.
We are now in a position to describe how to perform the dictionary operations for a scapegoat
tree.

find(Key x): The find operation is performed exactly as in a standard (unbalanced) binary
search tree. We will show that the height of the tree never exceeds log3/2 m ≈ 1.7 · lg m,
which is O(log n), so this is guaranteed to run in O(log n) time.
delete(Key x): This operates exactly the same as deletion in a standard binary search tree.
After deleting the node, decrement n (but do not change m). If m > 2n, rebuild the
entire tree by invoking rebuild(root), and set m ← n.
insert(Key x, Value v): First, increment both n and m. The insertion begins exactly as it
does for a standard binary search tree. But, as we are tracing the search path to the
insertion point, we keep track of our depth in the tree. (Recall that depth is the number
of edges to the root.) If the depth of the inserted node exceeds log3/2 m, then we trigger a
rebuilding event (a code sketch of this step is given below). This involves the following:
• Walk back up along the insertion search path towards the root. Let p be the current
  node that is visited, and let p.child be the child of p that lies on the search path.
• Let size(p) denote the size of the subtree rooted at p, that is, the number of nodes
  in this subtree.
• If size(p.child)/size(p) > 2/3, then rebuild the subtree rooted at p by invoking
  rebuild(p). The node p is the scapegoat.
An example of insertion is shown in Fig. 65. After inserting 5, the tree has n = 11 nodes.
The newly inserted node is at depth 6, and since 6 > log3/2 11 (which is approximately
5.9), we trigger the rebuilding event. We walk back up the search path. We find node 9
whose size is 7, but the child on the search path has size 6, and 6/7 > 2/3, so we invoke
rebuild on the node containing 9.
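The following sketch shows this scapegoat-finding walk. It assumes that the (standard, unbalanced) insertion records the search path, root first, in a java.util.List, that the tree keeps a root field, and that rebuild is the function sketched earlier; these helper names are mine, not from the notes. The brute-force size computation is discussed further below.

Finding the Scapegoat after a Deep Insertion (a sketch)
int size(Node p) {                                   // brute-force subtree size
    return (p == null) ? 0 : 1 + size(p.left) + size(p.right);
}

// path holds the nodes on the insertion search path, root first, ending at the newly
// inserted node.  Called only when the new node's depth exceeds log_{3/2} m.
void rebuildScapegoat(List<Node> path) {
    int childSize = 1;                               // size of the subtree hanging below the current node
    for (int i = path.size() - 2; i >= 0; i--) {     // walk back up toward the root
        Node u = path.get(i);
        Node onPath  = path.get(i + 1);              // u's child that lies on the search path
        Node offPath = (onPath == u.left) ? u.right : u.left;
        int uSize = 1 + childSize + size(offPath);   // size(u) = 1 + sizes of both subtrees
        if (3 * childSize > 2 * uSize) {             // size(u.child)/size(u) > 2/3: u is the scapegoat
            Node balanced = rebuild(u);              // rebuild u's subtree in balanced form
            if (i == 0) root = balanced;             // u was the root of the whole tree
            else if (path.get(i - 1).left == u) path.get(i - 1).left  = balanced;
            else                                path.get(i - 1).right = balanced;
            return;
        }
        childSize = uSize;                           // u's size becomes the child size one level up
    }
}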

Must there be a scapegoat? The fact that a child has over 2/3 of the nodes of the entire subtree
intuitively means that this subtree has (roughly) more than twice as many nodes as its sibling.
We call such a node on the search path a scapegoat candidate. A short way to summarize the
above process is “rebuild the scapegoat candidate that is closest to the insertion point.”
You might wonder whether we will necessarily encounter a scapegoat candidate when we
trace back along the search path. The following lemma shows that this is always the case.



Fig. 65: Inserting a key into a scapegoat tree, which triggers a rebuilding event. The node containing
9 is the first scapegoat candidate encountered while backtracking up the search path and is rebuilt.

Lemma: Given a binary search tree of n nodes, if there exists a node p such that depth(p) >
log3/2 n, then p has an ancestor (possibly p itself) that is a scapegoat candidate.
Proof: The proof is by contradiction. Suppose to the contrary that no node from p to the
root is a scapegoat candidate. This means that for every ancestor node u from p to the
root, we have size(u.child) ≤ (2/3) · size(u). We know that the root has a size of n. It follows
that if p is at depth k in the tree, then

    size(p) ≤ (2/3)^k · n.

We know that size(p) ≥ 1 (since the subtree contains p itself, if nothing else), so it follows
that 1 ≤ (2/3)^k · n. With some simple manipulations, we have

    (3/2)^k ≤ n,

which implies that k ≤ log3/2 n. However, this violates our hypothesis that p’s depth
exceeds log3/2 n, yielding the desired contradiction.

Recall that m ≥ n, and so if a rebuilding event is triggered, the insertion depth exceeds
log3/2 m ≥ log3/2 n. Therefore, by the above lemma,
there must be a scapegoat candidate along the search path.
How to Compute Subtree Sizes? We mentioned earlier that the scapegoat tree does not store
any information in the nodes other than the key, value, and left and right child pointers.
So how can we compute size(u) for a node u during the insertion process, without this
information? There is a clever trick for doing this on the fly.
Since we are doing this as we back up the search path, we may assume that we already know
the value of s′ = size(u.child), where this is the child that lies along the insertion search
path. So, to compute size(u), it suffices to compute the size of u’s other child. To do this,
we perform a traversal of this child’s subtree to determine its size s″. Given this, we have
size(u) = 1 + s′ + s″, where the +1 counts the node u itself.
You might wonder, how can we possibly expect to achieve O(log n) amortized time for inser-
tion if we are using brute force (which may take as much as O(n) time) to compute the sizes



of the subtrees? The reason is that we do not need to compute subtree sizes
unless a rebuild event has been triggered. Every node that we are visiting in the counting
process will need to be visited again in the rebuilding process. Thus, the cost of this counting
process can be “charged for” in the cost of the rebuilding process, and hence it essentially
comes for free!
Amortized Analysis: We will not present a formal amortized analysis of the
scapegoat tree. The following theorem (and the rather sketchy proof that follows) provides
the main result, however.

Theorem: Starting with an empty tree, any sequence of m dictionary operations (find, insert,
and delete) to a scapegoat tree can be performed in time O(m log m). (Don’t confuse m
here with the m used in the algorithm.)
Proof: The proof is quite detailed, so we will just sketch the key ideas. In order for each
rebuild operation to be triggered, you need to perform a lot of cheap (rebuild-free)
operations to get into an unbalanced state. We will explain the results in terms of n,
the number of items in the tree, but note that m is generally larger.
Find: Because the tree’s height is at most log3/2 m ≤ log3/2 (2n) = O(log n), the cost
of a find operation is O(log n) (unconditionally).
Delete: In order to rebuild the entire tree due to deletions, at least half the entries
since the last full rebuild must have been deleted. (The value m − n is the number
of deletions, and a rebuild is triggered when m > 2n, implying that m − n > n.) By
token-based analyses, it follows that the O(n) cost of rebuilding the entire tree can
be amortized against the time spent processing the (inexpensive) deletions.
Insert: This is analyzed by a potential argument. This is complicated by the fact that
subtrees of various sizes can be rebuilt. Intuitively, after any subtree of size k is
rebuilt, it takes an additional O(k) (inexpensive) operations to force this subtree
to become unbalanced and hence to be rebuilt again. We charge the expense of
rebuilding against these “cheap” insertions.

Lecture 13: Hashing


Hashing: We have seen various data structures (e.g., binary trees, AVL trees, splay trees, skip
lists) that can perform the dictionary operations insert(), delete() and find(). We know
that these data structures provide O(log n) time access. It is unreasonable to expect any
type of comparison-based structure to do better than this in the worst case. Using binary
decisions, there is a lower bound of Ω(log n) (and more precisely, 1 + ⌊lg n⌋) on the worst case
search time.
Remarkably, there is a better method, assuming that we are willing to give up on the idea of
using comparisons to locate keys. The best known method is called hashing. Hashing and its
variants support all the dictionary operations in O(1) (i.e. constant) expected time. Hashing is
considered so good, that in contexts where just these operations are being performed, hashing
is the method of choice. The price we pay is that we cannot perform dictionary operations
based on search order, such as range queries (finding the keys x such that x1 ≤ x ≤ x2 ) or
nearest-neighbor queries (find the key closest to a given key x).
The idea behind hashing is very simple. We have a table of given size m, called the table size.
We will assume that m is at least a small constant factor larger than n. We select a hash function



h(x), which is an easily computable function that maps a key x to a “virtually random” index
in the range [0..m-1]. We then attempt to store x (and its associated value) in index h(x)
in the table. Of course, it may be that different keys are mapped to the same location. Such
events are called collisions, and a key element in the design of a good hashing system is how
collisions are to be handled. Of course, if the table size is large (relative to the total number
of entries) and the hashing function has been well designed, collisions should be relatively
rare.
Hashing is quite a versatile technique. One way to think about hashing is as a means of
implementing a content-addressable array. We know that arrays can be addressed by an
integer index. But it is often convenient to have a look-up table in which the elements
are addressed by a key value, which may be of any discrete type: strings, for example, or
integers drawn from such a large range of values that allocating an array of this size would
be impractical. Note that hashing is not usually used for continuous data, such as floating
point values, because similar keys 3.14159 and 3.14158 may be mapped to entirely different
locations.
There are two important issues that need to be addressed in the design of any hashing system,
the hash function and the method of collision resolution. Let’s discuss each of these in turn.

Hash Functions: A good hashing function should have the following properties:

• It should be efficiently computable, say in constant time and using simple arithmetic
  operations.
• It should produce few collisions. Two additional aspects of a hash function implied by
  this are:
  – It should be a function of every bit of the key (otherwise keys that differ only in
    the ignored bits will collide).
  – It should break up (scatter) naturally occurring clusters of key values.

As an example of the last rule, observe that in writing programs it is not uncommon to use
very similar variable names, “temp1”, “temp2”, and “temp3”. It is important that such similar
names be mapped to very different locations in the hash output space. By the way, the origin
of the name “hashing” is from this mixing aspect of hash functions (thinking of “hash” in
food preparation as a mixture of things).
We will think of hash functions as being applied to nonnegative integer keys. Keys that are
not integers will generally need to be converted into this form (e.g., by converting the key into
a bit string, such as an ASCII or Unicode representation of a string) and then interpreting the
bit string as an integer. Since the hash function’s output is the range [0..m − 1], an obvious
(but not very good) choice for a hash function is:

h(x) = x mod m.

This is called division hashing. It satisfies our first criterion of efficiency, but consecutive keys
are mapped to consecutive entries, and this does not do a good job of breaking up clusters.

Some Common Hash Functions: Many different hash functions have been proposed. The topic
is quite deep, and we will not claim to have a definitive answer for the best hash function.
Here are three simple, commonly used hash functions:



Multiplicative Hash Function: Uses the hash function

h(x) = (ax) mod m,

where a is a large prime number (or at least, sharing no common factors with m).
Linear Hash Function: Enhances the multiplicative hash function with an added constant
term
h(x) = (ax + b) mod m.
Polynomial Hash Function: We can further extend the linear hash function to a polyno-
mial. This is often handy with keys that consist of a sequence of objects, such as strings
or the coordinates of points in a multi-dimensional space.
Suppose that the key being hashed involves a sequence of numbers x = (c0, c1, . . . , ck−1).
We map them to a single number by computing a polynomial function (in some constant p)
whose coefficients are these values:

    h(x) = (c0 + c1 p + c2 p² + · · · + ck−1 p^(k−1)) mod m.

For example, if k = 4 and p = 37, the associated polynomial would be c0 + c1·37 + c2·37² + c3·37³.
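As an illustration, here is a sketch of a polynomial hash applied to the characters of a string, evaluated by Horner's rule. The base p = 37 and the use of Math.floorMod are my own choices, not prescribed by the notes.

Polynomial Hashing of a String (a sketch)
int hash(String s, int m) {
    final int p = 37;                      // base of the polynomial (an arbitrary choice)
    int h = 0;
    for (int i = s.length() - 1; i >= 0; i--)
        h = h * p + s.charAt(i);           // Horner's rule: c0 + p(c1 + p(c2 + ...))
    return Math.floorMod(h, m);            // reduce to the table range [0..m-1]
}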

Randomization and Universal Hashing: Any deterministic hashing scheme runs the risk that
we may (very rarely) come across a set of keys that behaves badly for this choice. As we have
seen before, one way to evade attacks by a clever adversary is to employ randomization. Any
given hash function might be bad for a particular set of keys. So, it would seem that we can
never declare any one hash function to be “ideal.”
This gives rise to the notion of universal hashing. A hashing scheme is said to be universal if
the hash function is selected randomly from a large class of functions, and the probability of
a collision between any two fixed keys is 1/m. Since there are m entries in the table, this is
the best you might hope for.
There are many different universal hash functions. Here is a simple one (which was first
proposed by the inventors of universal hashing, Carter and Wegman). First, let p be any
prime number that is chosen to be larger than any input key x to the hash function. Next,
select two integers a and b at random where

a ∈ {1, 2, . . . , p − 1} and b ∈ {0, 1, . . . , p − 1}.

(Note that a ≠ 0.) Finally, consider the following linear hash function, which depends on the
choice of a and b.
ha,b (x) = ((ax + b) mod p) mod m.
As a and b vary, this defines a family of functions. Let Hp denote the class of hash functions
that arise by considering all possible choices of a and b (subject to the above restrictions).
We will not prove that this hash function is universal. The proof involves some basic number
theory.
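Here is a sketch of the Carter–Wegman scheme just described. It assumes keys are nonnegative integers smaller than the prime p = 2,147,483,647, uses long arithmetic to avoid overflow, and draws a and b from java.util.Random; the class name and these concrete choices are mine.

A Universal Hash Function (a sketch)
import java.util.Random;

class UniversalHash {
    static final long P = 2147483647L;            // a prime larger than any key we will hash
    final long a, b;                              // chosen at random, then fixed
    final int m;                                  // table size

    UniversalHash(int m, Random rng) {
        this.m = m;
        this.a = 1 + rng.nextInt((int) P - 1);    // a in {1, 2, ..., p-1}
        this.b = rng.nextInt((int) P);            // b in {0, 1, ..., p-1}
    }

    int hash(long x) {                            // assumes 0 <= x < P
        return (int) (((a * x + b) % P) % m);
    }
}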
Like the other randomized structures we have seen this year, universal hash functions are both
simple and provide good guarantees on the expected-case performance of hashing systems.
We will pick this topic up in our next lecture, focusing on methods for collision resolution,
under the assumption that our hashing function has a low probability of collisions.



Collision Resolution: We have discussed how to design a hash function in order to achieve good
scattering properties. But, given even the best hash function, it is possible that distinct keys
can map to the same location, that is, h(x) = h(y), even though x ≠ y. Such events are called
collisions, and a fundamental aspect in the design of a good hashing system is how collisions
are handled. We focus on this aspect of hashing in this lecture, called collision resolution.

Separate Chaining: If we have additional memory at our disposal, a simple approach to collision
resolution, called separate chaining, is to store the colliding entries in a separate linked list,
one for each table entry. More formally, each table entry stores a reference to a list data
structure that contains all the dictionary entries that hash to this location.
To make this more concrete, let h be the hash function, and let table[] be an m-element
array, such that each element table[i] is a linked list containing the key-value pairs (x, v),
such that h(x) = i. We will set the value of m so that each linked list is expected to contain
just a constant number of entries, so there is no need to be clever by trying to sort the
elements of the list. The dictionary operations reduce to applying the associated linked-list
operation on the appropriate entry of the hash table.

• insert(x,v): Compute i = h(x), and then invoke table[i].insert(x,v) to insert
  (x, v) into the associated linked list. If x is already in the list, signal a duplicate-key
  error (see Fig. 66).

Fig. 66: Collision resolution by separate chaining.

• delete(x): Compute i = h(x), and then invoke table[i].delete(x) to remove x’s
  entry from the associated linked list. If x is not in the list, signal a missing-key error.
• find(x): Compute i = h(x), and then invoke table[i].find(x) to determine (by
  simple brute-force search) whether x is in the list.
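Here is a sketch of a chained hash table along these lines, using a Java LinkedList of key-value entries per slot. The class layout and the use of hashCode in h are my own choices; the notes do not prescribe a particular list implementation or hash function.

Hash Table with Separate Chaining (a sketch)
import java.util.LinkedList;

class ChainedHashTable {
    class Entry { Key key; Value value; Entry(Key k, Value v) { key = k; value = v; } }
    LinkedList<Entry>[] table;
    int m;                                         // table size

    @SuppressWarnings("unchecked")
    ChainedHashTable(int m) {
        this.m = m;
        table = new LinkedList[m];
        for (int i = 0; i < m; i++) table[i] = new LinkedList<>();
    }

    void insert(Key x, Value v) {
        int i = h(x);                              // hash to a slot
        for (Entry e : table[i])
            if (e.key.equals(x)) throw new RuntimeException("duplicate key");
        table[i].add(new Entry(x, v));
    }

    Value find(Key x) {
        for (Entry e : table[h(x)])                // brute-force scan of one (short) list
            if (e.key.equals(x)) return e.value;
        return null;                               // not present
    }

    void delete(Key x) {
        if (!table[h(x)].removeIf(e -> e.key.equals(x)))
            throw new RuntimeException("missing key");
    }

    int h(Key x) { return Math.floorMod(x.hashCode(), m); }   // any good hash function
}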

Clearly, the running time of this procedure depends on the number of entries that are stored
in the given table entry. To get a handle on this, consider a hash table of size m containing
n keys. Define its load factor to be λ = n/m. If we assume that our hash function has done
a good job of scattering keys uniformly about the table entries, it follows that the expected
number of entries in each list is λ.
We say that a search find(x) is successful if x is in the table, and otherwise it is unsuccessful.
Assuming that the entries appear in each linked list in random order, we would expect that we
need to search roughly half the list before finding the item being sought after. It follows that
the expected running time of a successful search with separate chaining is roughly 1 + λ/2.



(The initial “+1” accounts for the fact that we need to check one more entry than the list
contains, if just to check the null pointer at the end of the list.) On the other hand, if the
search is unsuccessful, we need to enumerate the entire list, and so the expected running time
of an unsuccessful search with separate chaining is roughly 1 + λ. In summary, the successful
and unsuccessful search times for separate chaining are:

    S_SC = 1 + λ/2     and     U_SC = 1 + λ.
Observe that both are O(1) under our assumption that λ is O(1). Since we can insert and
delete into a linked list in constant time, it follows that the expected time for all dictionary
operations is O(1 + λ).
Note the “in expectation” condition is not based on any assumptions about the insertion or
deletion order. It depends simply on the assumption that the hash function uniformly scatters
the keys. Assuming that we use universal hashing (see the previous lecture), this uniformity
assumption is very reasonable, since the user cannot predict which random hash function will
be used. It has been borne out through many empirical studies that hashing is indeed very
efficient.
The principal drawback of separate chaining is that additional storage is required for linked-
list pointers. It would be nice to avoid this additional wasted space. The remaining methods
that we will discuss have this property. Before discussing them, we should discuss the issue
of controlling the load factor.

Controlling the Load Factor and Rehashing: Recall that the load factor of a hashing scheme
is λ = n/m, and the expected running time of hashing operations using separate chaining is
O(1 + λ). We will see below that other popular collision-resolution methods have running
times that grow as O(λ/(1 − λ)). Clearly, we would like λ to be small and in fact strictly
smaller than 1. Making λ too small is wasteful, however, since it means that our table size
is significantly larger than the number of keys. This suggests that we define two constants
0 < λmin < λmax < 1, and maintain the invariant that λmin ≤ λ ≤ λmax . This is equivalent
to saying that n ≤ λmax m (that is, the table is never too close to being full) and m ≤ n/λmin
(that is, the table size is not significantly larger than the number of entries). Define the ideal
load factor to be the mean of these two, λ0 = (λmin + λmax )/2.
Now, as we insert new entries, if the load factor ever exceeds λmax (that is, n > λmax m),
we replace the hash table with a larger one, devise a new hash function (suited to the larger
size), and then insert the elements from the old table into the new one, using the new hash
function. This is called rehashing (see Fig. 67). More formally:

• Allocate a new hash table of size m′ = ⌈n/λ0⌉.
• Generate a new hash function h′ based on the new table size.
• For each entry (x, v) in the old hash table, insert it into the new table using h′.
• Remove the old table.

Observe that after rehashing the new load factor is roughly n/m′ ≈ λ0, thus we have restored
the table to the ideal load factor. (The ceiling is a bit of an algebraic inconvenience. Through-
out, we will assume that n is sufficiently large that floors and ceilings are not significant.)

Fig. 67: Controlling the load factor by rehashing, where λmin = 0.25, λmax = 0.75, and λ0 = 0.5.

Symmetrically, as we delete entries, if the load factor ever falls below λmin (that is, n <
λmin m), we replace the hash table with a smaller one of size ⌈n/λ0⌉, generate a new hash
function for this table, and we rehash entries into this new table. Note that in both cases
(expanding and contracting) the hash table changes by a constant fraction of its current size.
This is significant in the analysis.
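The rehashing step itself can be sketched as follows, continuing the chained table sketched earlier; the same pattern applies to open addressing. The method name and the idealLoadFactor parameter are mine.

Rehashing (a sketch, continuing the chained table above)
// Assumes a field "int n" that counts the stored keys (maintained by insert/delete).
@SuppressWarnings("unchecked")
void rehash(double idealLoadFactor) {                 // idealLoadFactor plays the role of lambda_0
    LinkedList<Entry>[] oldTable = table;
    m = (int) Math.ceil(n / idealLoadFactor);         // new table size m' = ceil(n / lambda_0)
    table = new LinkedList[m];                        // h() now hashes into the new range [0..m-1]
    for (int i = 0; i < m; i++) table[i] = new LinkedList<>();
    for (LinkedList<Entry> bucket : oldTable)         // reinsert every old entry with the new table size
        for (Entry e : bucket)
            table[h(e.key)].add(e);
}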
Open Addressing: Let us return to the question of collision-resolution methods that do not
require additional storage. Our objective is to store all the keys within the hash table.
(Therefore, we will need to assume that the load factor is never greater than 1.) To know
which table entries store a value and which do not, we will store a special value, called empty,
in the empty table entries. The value of empty must be such that it matches no valid key.
Whenever we attempt to insert a new entry and find that its position is already occupied, we
will begin probing other table entries until we discover an empty location where we can place
the new key. In its most general form, an open addressing system involves a secondary search
function, f . If we discover that location h(x) is occupied, we next try locations

(h(x) + f (1)) mod m, (h(x) + f (2)) mod m, (h(x) + f (3)) mod m, . . . .

until finding an open location. (To make this a bit more elegant, let us assume that f (0) = 0,
so even the first probe fits within the general pattern.) This is called a probe sequence, and
ideally it should be capable of searching the entire table. How is this function f chosen? There
are a number of alternatives, which we consider below.
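The probing pattern itself can be sketched generically as follows, where f(i, x) is whichever probe function is chosen (linear, quadratic, or double hashing, as discussed below, with f(0, x) = 0). The cell layout with a null key marking an empty slot is my own convention, not code from the notes.

Open Addressing: The Generic Probing Pattern (a sketch)
void insert(Key x, Value v) {
    for (int i = 0; i < m; i++) {                     // try at most m probe positions
        int c = Math.floorMod(h(x) + f(i, x), m);     // i-th probe location
        if (table[c].key == null) {                   // empty: claim this cell
            table[c].key = x;  table[c].value = v;
            return;
        }
        if (table[c].key.equals(x)) throw new RuntimeException("duplicate key");
    }
    throw new RuntimeException("table is full (or the probe sequence cycled)");
}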
Linear Probing: The simplest idea is to simply search sequential locations until finding one that
is open. In other words, the probe function is f (i) = i. Although this approach is very simple,
it only works well for fairly small load factor. As the table starts to get full, and the load
factor approaches 1, the performance of linear probing becomes very bad.
To see what is happening consider the example shown in Fig 68. Suppose that we insert four
keys, two that hash to table[0] and two that hash to table[2]. Because of the collisions,
we will fill the table entries table[1] and table[3] as well. Now, suppose that the fifth key
(“t”) hashes to location table[1]. This is the first key to arrive at this entry, and so it is
not involved any collisions. However, because of the previous collisions, it needs to slide down
three positions to be entered into table[4].



Fig. 68: Linear probing.

This phenomenon is called primary clustering. Primary clustering occurs when the collision
resolution algorithm causes keys that hash to nearby locations to form into clumps. Linear
probing is especially susceptible to primary clustering. As the load factor approaches 1,
primary clustering becomes more and more pronounced, and probe sequences may become
unacceptably long.
While we will not present it, a careful analysis shows that the expected costs for successful
and unsuccessful searches using linear probing are, respectively:

    S_LP = (1/2) (1 + 1/(1 − λ))     and     U_LP = (1/2) (1 + (1/(1 − λ))²).

The proof is quite sophisticated, and we will skip it. Observe, however, that in the limit as
λ → 1 (a full table) the running times (especially for unsuccessful searches) rapidly grow to
infinity. A rule of thumb is that as long as the table remains less than 75% full, linear probing
performs fairly well. Nonetheless, the issue of primary clustering is a major shortcoming, and
the methods given below do significantly better in this regard.

Quadratic Probing: To avoid primary clustering, one idea is to use a nonlinear probing function
which scatters subsequent probes around more effectively. One such method is called quadratic
probing, which works as follows. If the index hashed to h(x) is full, then we consider next
h(x) + 1, h(x) + 4, h(x) + 9, . . . (again taking indices mod m). Thus, the probing function is
f (i) = i2 . Alert: Note that the quadratic terms are added to the initial hash location, not
to the previously probed location.
The find function is shown in the following code block. Rather than computing h(x) + i², we
use a cute trick to update the probe location. Observe that i² = (i − 1)² + 2i − 1. Thus, we
can advance to the next position in the probe sequence (i²) by incrementing the old position
((i − 1)²) by the value 2i − 1. We assume that each table entry table[i] contains two
elements, table[i].key and table[i].value. If found, the function returns the associated
value, and otherwise it returns null.
Experience shows that this succeeds in breaking up the clusters that arise from linear probing,
but it is far from perfect. Quadratic probing also exhibits a milder form of clumping, called
secondary clustering. This happens because any two keys that hash to the same location follow
exactly the same probe sequence, and keys that hash to nearby locations follow probe sequences
that remain similarly nearby.
In addition, quadratic probing suffers from a rather knotty problem. Unlike linear probing,
which is guaranteed to try every entry in your table, quadratic probing bounces around
less predictably. Might it miss some entries? The answer, unfortunately, is yes! To see why,



Find Operation with Quadratic Probing
Value find(Key x) {                                        // find x
    int c = h(x);                                          // initial probe location
    int i = 0;                                             // probe offset
    while ((table[c].key != empty) && (!table[c].key.equals(x))) {
        c += 2*(++i) - 1;                                  // next position: i^2 = (i-1)^2 + 2i - 1
        c = c % m;                                         // wrap around if needed
    }
    return table[c].value;                                 // return associated value (or null if empty)
}

consider the rather trivial case where m = 4. Suppose that h(x) = 0 and your table has empty
slots at table[2] and table[3]. The quadratic probe sequence will inspect the following
indices:

    1² mod 4 = 1,   2² mod 4 = 0,   3² mod 4 = 1,   4² mod 4 = 0,   . . .

It can be shown that it will only check table entries 0 and 1. This means that you cannot find
a slot to insert this key, even though your table is only half full! Similar gaps in the probe
sequence arise for larger composite table sizes, such as m = 105.
The following theorem shows that, if you choose your table size m to be a prime number, then
quadratic probing is guaranteed to visit at least half of the table entries before repeating.
This means that it will succeed in finding an empty slot, provided that m is prime and your
load factor is smaller than 1/2.

Theorem: If quadratic probing is used, and the table size m is a prime number, the first
⌊m/2⌋ probe locations are distinct.
Proof: Suppose by way of contradiction that for 0 ≤ i < j ≤ ⌊m/2⌋, both h(x) + i² and
h(x) + j² are equivalent modulo m. Then the following equivalences hold modulo m:

    i² ≡ j²   ⇔   i² − j² ≡ 0   ⇔   (i − j)(i + j) ≡ 0   (mod m)

This means that the quantity (i − j)(i + j) is a multiple of m. But this cannot be, since
m is prime and both i − j and i + j are nonzero and strictly smaller than m. (The fact
that i < j ≤ bm/2c implies that their sum is strictly smaller than m.) Thus, we have
the desired contradiction.

This is a rather weak result, however, since people usually want their hash tables to be more
than half full. You can do better by being more careful in the choice of the table size and/or
the quadratic increment. Here are two examples, which I will present without proof.

• If the table size m is a prime number of the form 4k + 3, then quadratic probing will
  succeed in probing every table entry before repeating an entry.
• If the table size m is a power of two, and the increment is chosen to be (i² + i)/2 (thus,
  you probe locations h(x), h(x) + 1, h(x) + 3, h(x) + 6, and so on) then you will succeed
  in probing every table entry before repeating an entry.

Double Hashing: Both linear probing and quadratic probing have shortcomings. Our final
method overcomes both of these limitations. Recall that in any open-addressing scheme,



we are accessing the probe sequence h(x) + f (1), h(x) + f (2), and so on. How about if we
make the increment function f (i) a function of the search key? Indeed, to make it as random
as possible, let’s use another hash function! This leads to the concept of double hashing.
More formally, we define two hash functions h(x) and g(x). We use h(x) to determine the
first probe location. If this entry is occupied, we then try:

h(x) + g(x), h(x) + 2g(x), h(x) + 3g(x), ...

More formally, the probe sequence is defined by the function f (i) = i · g(x). In order to be
sure that we do not cycle, it should be the case that m and g(x) are relatively prime, that
is, they share no common factors. There are lots of ways to achieve this. For example, select
g(x) to be a prime that is strictly larger than m or the product of primes that are larger than
m. Another approach would be to set m to be a power of 2, and then to generate g(x) as
the product of prime numbers other than 2. In short, we should be careful in the design of a
double-hashing scheme, but there is a lot of room for adjustment.
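For concreteness, here is a sketch of the probe computation under one such arrangement (m a power of two and g(x) forced to be odd, so that g(x) and m share no common factor). It assumes h and g are given; this particular recipe is one of many possibilities, not the one the notes mandate.

Double Hashing Probe Sequence (a sketch)
// Assumes hash functions h(x) and g(x) are given, with g(x) chosen to share no
// common factor with m (e.g., m a power of two and g(x) always forced to be odd).
int probe(Key x, int i) {                             // i-th location probed for key x
    long pos = (long) h(x) + (long) i * g(x);         // h(x) + i*g(x), using long to avoid overflow
    return (int) Math.floorMod(pos, (long) m);        // wrap into [0..m-1]
}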
Fig. 69 provides an illustration of how the various open-addressing probing methods work.

Fig. 69: Various open-addressing systems. (Shaded squares are occupied and the black square
indicates where the key is inserted.)

Theoretical running-time analysis shows that double hashing is the most efficient among
the open-addressing methods of hashing, and it is competitive with separate chaining. The
running times of successful and unsuccessful searches for open addressing using double hashing
are
    S_DH = (1/λ) ln(1/(1 − λ))     and     U_DH = 1/(1 − λ).
To get some feeling for what these quantities mean, consider the following table:

    λ       0.50   0.75   0.90   0.95   0.99
    U(λ)    2.00   4.00   10.0   20.0   100.
    S(λ)    1.39   1.89   2.56   3.15   4.65



Note that, unlike tree-based search structures where the search time grows with n, these
search times depend only on the load factor. For example, if you were storing 100,000 items
in your data structure, the above search times (except for the very highest load factors) are
superior to a running time of O(log n).
Deletions: Deletions are a bit tricky with open-addressing schemes. Can you see why?
The issue is illustrated in Fig. 70. When we inserted “a”, an existing key “f” was on the probe
path, so “a” was placed beyond “f”. Now suppose we delete “f” and then search for “a”. The
problem is that with “f” no longer on the probe path, we arrive at an empty slot and take
this to mean that “a” is not in the dictionary, which is not correct.

h("a")
insert("a")

f a

delete("f")

h("a")
find("a")

a
"a" not found!

Fig. 70: The problem with deletion in open addressing systems.

To handle this we create a new special value (in addition to empty) for cells whose keys have
been deleted, called, say “deleted”. If the entry is marked deleted this means that the
slot is available for future insertions, but if the find function comes across such an entry, it
should keep searching. The searching stops when it either finds the key or arrives at a cell
marked “empty” (key not found).
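Here is a sketch of find and delete with such “deleted” markers, for a linear-probing table whose cells hold a key, a value, and a state; the State/Cell layout and the use of linear probing here are my own choices for illustration.

Open Addressing with "deleted" Markers (a sketch)
enum State { EMPTY, OCCUPIED, DELETED }
class Cell { Key key; Value value; State state = State.EMPTY; }

Value find(Key x) {
    for (int i = 0; i < m; i++) {
        int c = Math.floorMod(h(x) + i, m);                  // linear probing, for concreteness
        if (table[c].state == State.EMPTY) return null;      // a true gap: x cannot lie beyond it
        if (table[c].state == State.OCCUPIED && table[c].key.equals(x))
            return table[c].value;
        // otherwise (DELETED, or a different key): keep searching past this cell
    }
    return null;
}

void delete(Key x) {
    for (int i = 0; i < m; i++) {
        int c = Math.floorMod(h(x) + i, m);
        if (table[c].state == State.EMPTY) break;            // not found
        if (table[c].state == State.OCCUPIED && table[c].key.equals(x)) {
            table[c].state = State.DELETED;                  // the slot may be reused by insertions
            return;
        }
    }
    throw new RuntimeException("missing key");               // signal a missing-key error
}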

Fig. 71: Deletion in open addressing using a special “deleted” entry.

Using the “deleted” entry is a rather quick-and-dirty fix. It suffers from the shortcoming
that as keys are deleted, the search paths are unnaturally long. (The load factor has come



down, but the search paths are just as long as before.) A more clever solution would involve
moving keys that were pushed down in the probe sequence up to fill the vacated entries.
Doing this, however, makes deletion times longer.

Further refinements: Hashing is a very well studied topic. We have hit the major points, but
there are a number of interesting refinements that can be applied. One example is a technique
called Brent’s method. This approach is used to reduce the search times when double hashing
is used. It exploits the fact that any given cell of the table may lie at the intersection of
two or more probe sequences. If one of these probe sequences is significantly longer than the
other, we can reduce the average search time by changing which key is placed at this point
of overlap. Brent’s algorithm optimizes this selection of which keys occupy these locations in
the hash table.

Amortized Analysis of Rehashing: (Optional) Here we show that if a hash table is maintained
using rehashing when it gets too full (as described earlier), the amortized cost per operation
is a constant. The proof is quite similar to the proof given earlier in the semester for the
dynamically expanding stack.
Observe that whenever we rehash, the running time is proportional to the number of keys n.
If n is large, rehashing clearly takes a lot of time. But once we have rehashed, we will need
to do a significant number of insertions or deletions before we need to rehash again.
To make this concrete, let’s consider a specific example. Suppose that λmin = 1/4 and
λmax = 3/4, and hence λ0 = 1/2. Also suppose that the current table size is m = 1000.
Suppose the most recent insertion caused the load factor to exceed our upper bound, that is
n > λmax m = 750. We allocate a new table of size m0 = n/λ0 = 2n = 1500, and rehash all
the old elements into this new table. In order to overflow this new table, we will need for n
to increase to some higher value n0 such that n0 /m0 > λmax , that is n0 > (3/4)1500 = 1125.
In order to grow from the current 750 keys to 1125 keys, we needed to have at least 375 more
insertions (and perhaps many more operations if finds and deletions were included as well).
This means that we can amortize the (expensive) cost of rehashing 750 keys against the 375
(cheap) insertions.
Recall that the amortized cost of a series of operations is the total cost divided by the number
of operations.

Theorem: Assuming that individual hashing operations take O(1) time each, if we start with
an empty hash table, the amortized complexity of hashing using the above rehashing
method with minimum and maximum load factors of λmin and λmax , respectively, is at
most 1 + 2λmax /(λmax − λmin ).
Proof: Our proof is based on the same token-based argument that we used in the earlier
lecture. Let us assume that each standard hashing operation takes exactly 1 unit of
time, and rehashing takes time n, where n is the number of entries currently in the table.
Whenever we perform a hashing operation, we assess 1 unit to the actual operation, and
save 2λmax /(λmax − λmin ) work tokens to pay for future rehashings. (Where did this
“magic” number of tokens come from? The answer is to work through the analysis
below treating the number of tokens as an unknown quantity x. Then figure out what
value x needs to be to make the inequalities work out.)
There are two ways to trigger rehashing: expansion due to insertion, and contraction due
to deletion. Let us consider insertion first. Suppose that our most recent insertion has



triggered rehashing. This implies that the current table contains roughly n ≈ λmax m
entries. (Again, to avoid worrying about floors and ceilings, let’s assume that n is
quite large.) The last time the table was rehashed, the table contained n0 = λ0 m
entries immediately after the rehashing finished. This implies that we inserted at least
n−n0 = (λmax −λ0 )m entries. Therefore, the number of work tokens we have accumulated
since then is at least
 
2λmax λmax + λmin 2λmax
(λmax − λ0 )m = λmax − m
λmax − λmin 2 λmax − λmin
 
λmax − λmin 2λmax
= m
2 λmax − λmin
= λmax m ≈ n,

which implies that we have accumulated enough work tokens to pay the cost of n to
rehash.
Next, suppose that our most recent deletion has triggered rehashing. This implies that
the current table contains roughly n ≈ λmin m entries. (Again, to avoid worrying about
floors and ceilings, let’s assume that n is quite large.) The last time the table was
rehashed, the table contained n0 = λ0 m entries immediately after the rehashing finished.
This implies that we deleted at least n0 −n = (λ0 −λmin )m entries. Therefore, the number
of work tokens we have accumulated since then is at least

    (2λmax/(λmax − λmin)) · (λ0 − λmin)m = (2λmax/(λmax − λmin)) · ((λmax + λmin)/2 − λmin)m
                                         = (2λmax/(λmax − λmin)) · ((λmax − λmin)/2)m
                                         = λmax m ≥ λmin m ≈ n,

again implying that we have accumulated enough work tokens to pay the cost of n to
rehash.

To make this a bit more concrete, suppose that we set λmin = 1/4 and λmax = 3/4, so
that λ0 = 1/2 (see Fig. 67). Then the amortized cost of each hashing operation is at most
1 + 2λmax/(λmax − λmin) = 1 + 2(3/4)/(1/2) = 4. Thus, we pay just an additional factor of four
due to rehashing. Of course, this is a worst case bound. When the number of insertions and
deletions is relatively well balanced, we do not need to rehash very often, and the amortized cost
is even smaller.

Lecture 14: Point quadtrees and kd-trees


Geometric Data Structures: In today’s lecture we move in a new direction by covering a num-
ber of data structures designed for storing multi-dimensional geometric data. Geometric data
structures are fundamental to the efficient processing of data sets arising from myriad applica-
tions, including spatial databases, automated cartography (maps) and navigation, computer
graphics, robotics and motion planning, solid modeling and industrial engineering, particle
and fluid dynamics, molecular dynamics and drug design in computational biology, machine
learning, image processing and pattern recognition, and computer vision.



Fundamentally, our objective is to store large data sets consisting of geometric objects (e.g.,
points, lines and line segments, simple shapes (such as balls, rectangles, triangles), and com-
plex shapes such as surface meshes) in order to answer queries on these data sets efficiently.
While some of our explorations will involve delving into geometry and linear algebra, fortu-
nately most of what we will cover assumes no deep knowledge of geometric objects or their
representations. Given a collection of geometric objects, there are numerous types of queries
that we may wish to answer.

Nearest-Neighbor Searching: Store a set of points so that given a query point q, it is
possible to find the closest point of the set (or generally the closest k objects) to the
query point (see Fig. 72(a)).
Range Searching: Store a set of points so that given a query region R (e.g., a rectangle
or circle), it is possible to report (or count) all the points of the set that lie inside this
region (see Fig. 72(b)).
Point location: Store a subdivision of space into disjoint regions (e.g., the subdivision of
the globe into countries) so that given a query point q, it is possible to determine the region
of the subdivision containing this point efficiently (see Fig. 72(c)).
Intersection Searching: Store a collection of geometric objects (e.g., rectangles), so that
given a query consisting of an object R of this same type, it is possible to report (or
count) all of the objects of the set that intersect the query object (see Fig. 72(d)).
Ray Shooting: Store a collection of objects so that given any query ray, it is possible to
determine whether the ray hits any object of the set, and if so, which object it hits
first.


Fig. 72: Common geometric queries: (a) nearest-neighbor searching, (b) range searching, (c) point
location, (d) intersection searching.

In all cases, you should imagine that the size n of the set is huge, consisting, for example, of millions
of objects, and the objective is to answer the query in time that is significantly smaller than
n, ideally O(log n). We shall see that it is not always possible to achieve efficient query times
with storage that grows linearly with n. In such instances, we would like the storage to grow slowly,
for example, O(n log n). As with 1-dimensional data structures, it will also be desirable to
provide dynamic updates, allowing for the insertion and deletion of objects.
No Total Ordering: While we shall see that many of the ideas that we applied in the design of 1-
dimensional data structures can be adapted to the design of multi-dimensional data structures,
there is one fundamental challenge that we will face. Almost all 1-dimensional data structures
exploit the fact that the data are drawn from a total order. The existence of such a total
ordering is critical to all the tree-based search structures we studied, as well as skip lists.



The only exception to this is hashing. But hashing is applicable only when we are searching for
exact matches. In typical geometric queries (such as all the ones described above) exact matching
does not apply. Instead, we are interested in notions such as “close to,” “contained within,”
or “overlapping with,” none of which are amenable to hashing.

Point representations: Let’s first say a bit about representation and notation. We will assume
that each point pi is expressed as a d-element vector, that is, pi = (pi,1 , . . . , pi,d ). To simplify
our presentation, we will usually describe our data structures in a 2-dimensional context,
but the generalization to higher dimensions will be straightforward. For this reason, we may
sometimes refer to a point in terms of its (x, y)-coordinates, for example, p = (px , py ), rather
than p = (p1 , p2 ).
While mathematicians label indices starting with 1, programming languages like Java prefer
to index starting with 0. Therefore, in Java, each point p is represented as a d-element vector:
float[][] p = new float[n][d]; // array of n points, each a d-element vector

In this example, the points are p[0], p[1], ..., p[n-1], and the coordinates of the ith point are
given by p[i][0], p[i][1], ..., p[i][d-1].
A better approach would be to define a class that represents a point object. An example
of a simple Point object can be found in the code block below. We will assume this in our
examples. Java defines a 2-dimensional point object, called Point2d.
Simple Point class
public class Point {
private float[] coord; // coordinate storage

public Point(int dim) { /* construct a zero point */ }

public int getDim() { return coord.length; }


public float get(int i) { return coord[i]; }

public void set(int i, float x) { coord[i] = x; }

public boolean equals(Point other) { /* compare with another point */ }


public float distanceTo(Point other) { /* compute distance to another point */ }

public String toString() { /* convert to string */ }


}

Now, your point set could be defined as an array of points, for example, Point[] pointSet
= new Point[n]. Note that although we should use x.get(i) to get the ith coordinate of a
point x, we will often be lazy in our code samples, and just write x[i] instead.

Point quadtree: Let us first consider a natural way of generalizing unbalanced binary trees in
the 1-dimensional case to a d-dimensional context. Suppose that we wish to store a set
P = {p1 , . . . , pn } of n points in d-dimensional space. In binary trees, each point naturally
splits the real line in two. In two dimensions if we run a vertical and horizontal line through
the point, it naturally subdivides the plane into four quadrants about this point. (In general
d-dimensional space, we consider d axis-parallel hyperplanes passing through the point. These
subdivide space into 2d orthants.)



To simplify the presentation, let us assume that we are working in 2-dimensional space. The
resulting data structure is called a point quadtree. (In dimension three, the corresponding
structure is naturally called an octtree. As the dimension grows, it is too complicated to figure
out the proper term for the number of children, and so the term quadtree is often used in
arbitrary dimensions, even though the outdegree of each node is 2d , not four.)
Each node has four (possibly null) children, corresponding to the four quadrants defined by
the 4-way subdivision. We label these according to the compass directions, as NW, NE, SW, and
SE. In terms of implementation, you can think of assigning these the values 0, 1, 2, 3, and
use them as indices to a 4-element array of children pointers.
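For concreteness, a point quadtree node might be declared as in the following sketch. The naming is our own; it uses the Point class described above and assumes 2-dimensional points with the y-coordinate increasing upward.

Sketch of a point quadtree node
class QuadNode {                                       // node in a point quadtree (sketch)
    static final int NW = 0, NE = 1, SW = 2, SE = 3;   // quadrant indices
    Point point;                                       // the point stored at this node
    QuadNode[] child = new QuadNode[4];                // four children, indexed by quadrant

    QuadNode(Point point) { this.point = point; }

    int quadrant(Point x) {                            // which quadrant of this node's cell contains x?
        if (x.get(0) < point.get(0))                   // to the west of the splitting point?
            return (x.get(1) < point.get(1)) ? SW : NW;
        else                                           // to the east
            return (x.get(1) < point.get(1)) ? SE : NE;
    }
}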
As with standard (unbalanced) binary trees, points are inserted one by one. We descend
through the tree structure in a natural way. For example, we compare the newly inserted
point’s x and y coordinates to those of the root. If the x is larger and the y is smaller, we
recurse on the SE child. The insertion of each point results in a subdivision of a rectangular
region into four smaller rectangles. Consider the insertion of the following points (see Fig. 73):
(35, 40), (50, 10), (60, 75), (80, 65), (85, 15), (5, 45), (25, 35), (90, 5).


Fig. 73: Point quadtree subdivision.

The final subdivision and tree structure are shown in Fig. 74.


Fig. 74: Point quadtree.

Each node in the tree is naturally associated with a rectangular region of space, which we
call its cell. Note that some rectangles are special in that they extend to infinity. Since semi-



infinite rectangles sometimes bother people, it is not uncommon to assume that everything
is contained within one large bounding rectangle, which may be provided by the user when
the tree is first constructed.
We will not discuss algorithms for the point quad-tree in detail. Instead, we will defer this
discussion to point kd-trees, and simply note that for each operation on a kd-tree, there is a
similar algorithm for quadtrees.

Point kd-tree: As observed above, point quadtrees can be generalized to higher dimensions, but the
number of children grows exponentially with the dimension, as 2^d. For example, if you are
working in 20-dimensional space, every node has 2^20, or roughly a million, children! Clearly,
the simple quadtree idea is not scalable to very high dimensions. Next, we describe an
alternative data structure that always results in a binary tree.
As in the case of a quadtree, the cell associated with each node is an axis-aligned rectangle
(assuming the planar case) or a hyper-rectangle in d-dimensional space. When a new point is
inserted into some node (equivalently into some cell), we will split the cell by a horizontal or
vertical splitting line, which passes through this point. In higher dimensions, we split the cell
by a (d − 1)-dimensional hyperplane that is orthogonal to one of the coordinate axes. In any
dimension, such a split can be specified by giving the cutting dimension (which can be represented
as an integer from 0 to d − 1) and the cutting value. Following the approach used
in point quadtrees, the cutting value will be taken from the coordinates of the point being
stored in this node. Thus, along with its left and right child pointers, we can think of every
node as storing two items, an integer cutting dimension and a point. The following code
shows a possible node structure. We add a utility method to determine whether a point lies
in the left subtree, that is, whether it is smaller along the cutting dimension. Of course, if
it is in the right subtree, this returns false. Throughout, we will use the terms “left” and
“right” to be synonymous with being smaller than or larger than the splitter along the cutting
dimension, respectively.

class KDNode { // node in a kd-tree


Point point; // splitting point
int cutDim; // cutting dimension
KDNode left; // children
KDNode right;

KDNode(Point point, int cutDim) { // constructor


this.point = point;
this.cutDim = cutDim;
left = right = null;
}

boolean inLeftSubtree(Point x) { // is x in left subtree?


return x[cutDim] < point[cutDim];
}
}

The resulting data structure is called a point kd-tree. Actually, this is a bit of a misnomer.
The data structure was named by its inventor Jon Bentley to be a 2-d tree in the plane, a
3-d tree in 3-space, and a k-d tree in dimension k. However, over time the name “kd-tree”
became commonly used irrespective of dimension. Thus it is common to say a “kd-tree in
dimension 3” rather than a “3-d tree”.



How is the cutting dimension chosen? There are a number of ways, but the most common
is just to alternate (or generally cycle) among the possible axes at each new level of the
tree.7 For example, at the root node we cut orthogonal to the x-axis (or 0th coordinate), for
its children we cut orthogonal to y (or 1st coordinate), for the grandchildren we cut again
along x. In general, we cycle through the various axes, setting the cutting dimension of the
child to 1 plus the cutting dimension of the parent all taken modulo the dimension. An
example is shown in Fig. 75 (given the same points as in the previous example). Again we
show both the tree and the spatial subdivision. We will assume this method of choosing
the cutting dimension in our examples, but there are better ways, for example selecting the
cutting dimension based on the direction in which the points of the subtree are most widely
distributed.


Fig. 75: Point kd-tree decomposition.

The tree representation is much the same as it is for quadtrees, except that now every node has
only two children. The left subtree of a node contains the points x such that
x[cutDim] < point[cutDim], and the right subtree contains the points x such that
x[cutDim] >= point[cutDim]. (How you break ties is rather arbitrary.)
As with unbalanced binary search trees, it is possible to prove that if keys are inserted in
random order, then the expected height of the tree is O(log n), where n is the number of
points in the tree.

Insertion into kd-trees: Insertion operates as it would for a regular binary search tree. We
descend the tree until falling out, and then we create a node containing the point and assign
its cutting dimension by whatever policy is used by the tree. The principal utility function
is presented in the following code block. The function takes three arguments, the point x
being inserted, the current node p, and the cutting dimension of the newly created node. The
initial call is root = insert(x, root, 0).
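(As a side note, the recursive helper is typically wrapped in a small tree class that stores the root and issues this initial call. The sketch below is our own; the helper bodies are exactly the insert and delete procedures given in this lecture's code blocks.)

Sketch of a kd-tree wrapper class
class KDTree {                                    // minimal wrapper class (sketch)
    private KDNode root = null;                   // root of the kd-tree

    public void insert(Point x) {                 // public insertion entry point
        root = insert(x, root, 0);                // the root cuts on dimension 0
    }

    public void delete(Point x) {                 // public deletion entry point
        root = delete(x, root);
    }

    // The recursive helpers below are the insert() and delete() procedures
    // presented in this lecture's code blocks.
    private KDNode insert(Point x, KDNode p, int cutDim) { /* see the code block below */ return p; }
    private KDNode delete(Point x, KDNode p) { /* see the deletion code block below */ return p; }
}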
Observe that whenever we make a recursive call, we advance the cutting dimension by one
(p.cutDim + 1), but we wrap around by taking this value modulo the dimension of the space
(x.getDim()). An example is shown in Fig. 76, where we insert the point (50, 90) into the
kd-tree of Fig. 75.
7 An alternative is to select the cutting dimension based on the distribution of the points. For example, if the
points are more widely dispersed along the x-axis than along the y-axis, it makes sense to cut orthogonal to the
x-axis.



kd-tree Insertion
KDNode insert(Point x, KDNode p, int cutDim) {
if (p == null) { // fell out of tree
p = new KDNode(x, cutDim); // create new leaf
} else if (p.point.equals(x)) {                    // duplicate data point!
throw new RuntimeException("duplicate point");
} else if (p.inLeftSubtree(x)) { // insert into left subtree
p.left = insert(x, p.left, (p.cutDim + 1) % x.getDim());
} else { // insert into right subtree
p.right = insert(x, p.right, (p.cutDim + 1) % x.getDim());
}
return p;
}

We descend the tree until we fall out on the left subtree of node
(60, 75). We create a new node at this point, and the cutting dimension cycles from the
parent’s x-cutting dimension (cutDim = 0) to a y-cutting dimension (cutDim = 1).


Fig. 76: Insertion into the point kd-tree of Fig. 75.

Analysis: The space needed by the kd-tree to store n points in d-dimensional space is O(n), which
is optimal. (We treat d like a constant, independent of n. In fact, it takes O(dn) space to
store the coordinates, but since d is a constant, we can ignore the d factor.)
The height analysis of the kd-tree is essentially the same as that of the unbalanced binary
tree. If n points are inserted in random order, then the height of the tree will be O(log n) in
expectation. (Below, we discuss deletion. Because we choose replacement nodes in a biased
way, always from the right subtree, it is reasonable to expect that the same systematic bias
that leads to O(√n) tree height over a long series of random insertions and deletions will
arise here as well.)
Probably the best way to maintain balance in a kd-tree is to simply rebuild unbalanced
subtrees, similar to what we did with scapegoat trees. This raises the question of how to build
a well-balanced tree for a fixed set of points. Suppose that we use the method of cycling the
cutting dimension from level to level of the tree to build a kd-tree for a point set P . At the root
level, we could choose the splitting point to be the median of P according to x-coordinates.
Then, after partitioning the set about this point, into say PL and PR , the splitting value for
each set would be the respective medians but according to the y-coordinates. By doing this,
we guarantee that the number of points in each subtree is essentially half that of its parent,
and this implies that the overall tree height is O(log n).
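The following sketch illustrates this median-splitting construction (the names are our own). For simplicity it sorts along the cutting dimension at every level, giving O(n log² n) construction time rather than the O(n log n) bound discussed next, and it assumes distinct coordinate values along each dimension so that the tie-breaking convention is not an issue.

Sketch: building a balanced kd-tree by median splitting
import java.util.Arrays;

class KDTreeBuilder {
    static KDNode build(Point[] pts, int lo, int hi, int cutDim, int dim) {
        if (lo > hi) return null;                               // empty subarray
        Arrays.sort(pts, lo, hi + 1,                            // sort along the cutting dimension
                    (a, b) -> Float.compare(a.get(cutDim), b.get(cutDim)));
        int mid = (lo + hi) / 2;                                // the median becomes the splitter
        KDNode node = new KDNode(pts[mid], cutDim);
        node.left = build(pts, lo, mid - 1, (cutDim + 1) % dim, dim);
        node.right = build(pts, mid + 1, hi, (cutDim + 1) % dim, dim);
        return node;
    }

    static KDNode build(Point[] pts) {                          // build over the whole array
        return build(pts, 0, pts.length - 1, 0, pts[0].getDim());
    }
}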
By the way, this raises an interesting computational question. We know that it is possible to



build a 1-dimensional tree from a sorted point set in O(n log n) time, by repeatedly splitting
on the median. Can you generalize this to construct a perfectly balanced 2-dimensional kd-tree,
also in O(n log n) time? The tricky issue is that sorting on x does not help you in finding
the y-splits, and sorting on y does not help you with the x-splits. This is an interesting
computational problem to think about. (The answer is that it is possible to build such a tree
in O(n log n) time, but it takes a bit of cleverness. We will leave this as an exercise.)

Deletion and Replacement Points: (Optional) We will next discuss deletion from kd-trees.
As we saw with deletion in standard binary search trees, an issue that will arise when
deleting a point from the middle of the tree is what to use in place of this node, that is, the
replacement point. It is not as simple as selecting the next point in an inorder traversal of
the tree, since we need a point that satisfies the necessary geometric conditions.
Suppose that we wish to delete a point in some node p, and suppose further that p.cutDim
== 0, that is, the x-coordinate. An appropriate choice for the replacement point is the point
of p.right that has the smallest x-coordinate. (What do we do if the right child is null?
We’ll come to this later.) Finding such a point is a nice exercise, since it illustrates how
programming is done with kd-trees.
Let us derive a procedure findMin(p, i, cutDim) that computes the point in the subtree
rooted at p that has the smallest ith coordinate. The procedure operates recursively. When
we arrive at a node p, if the cutting dimension matches i, then we know that the subtrees are
ordered according to the ith coordinate. If the left subtree is nonempty, then the desired
point is the minimum from this subtree. Otherwise, we return p’s associated point. (There
is never a need to search the right subtree. Do you see why?)
On the other hand, if p’s cutting dimension differs from i then we cannot infer which subtree
may contain the point with the minimum ith coordinate. So, we will have to try the right
subtree, the left subtree, and the point at this node. We include a function minAlongDim(p1,
p2, i), which returns whichever point p1 or p2 is smallest along coordinate i. (The function
is written so that the second point might be null, which happens when we attempt to extract
the minimum point from an empty subtree.) The function is given in the code block below.
Fig. 77 presents an example of the execution of this algorithm to find the point with the
minimum x-coordinate in the subtree rooted at (55, 40). Since this node splits horizontally,
we need to visit both of its subtrees to find their minimum x values. (These will be (15, 10)
for the left subtree and (10, 65) for the right subtree.) These values are then compared with
the point at the root to obtain the overall x-minimum point, namely (10, 65). Observe that
because the subtrees at (45, 20) and (35, 75) both split on the x-coordinate, and we are looking
for the point with the minimum x-coordinate, we do not need to search their right subtrees.
The nodes visited in the search are shaded in blue.

Deletion from a kd-tree: As with insertion we can model the deletion code after the deletion
code for unbalanced binary search trees. However, there is an interesting twist here. Recall
that in the 1-dimensional case we needed to consider a number of different cases. If the node
is a leaf we just delete the node. Otherwise, its deletion would result in a “hole” in the tree.
We need to find an appropriate replacement element. In the 1-dimensional case, we were
able to simplify this if the node has a single child (by making this child the new child of
our parent). However, this would move the child from an even level to an odd level, or vice
versa, and this would violate our assumption that the cutting dimensions cycle among the
coordinates. (Note this might not be an issue for some implementations of the kd-tree, where



Find the minimum point in subtree along ith coordinate
Point findMin(KDNode p, int i) { // get min point along dim i
if (p == null) { // fell out of tree?
return null;
}
if (p.cutDim == i) { // cutting dimension matches i?
if (p.left == null) // no left child?
return p.point; // use this point
else
return findMin(p.left, i); // get min from left subtree
} else { // it may be in either side
Point q = minAlongDim(p.point, findMin(p.left, i), i);
return minAlongDim(q, findMin(p.right, i), i);
}
}

Point minAlongDim(Point p1, Point p2, int i) { // return smaller point on dim i
if (p2 == null || p1[i] <= p2[i]) // p1[i] is short for p1.get(i)
return p1;
else
return p2;
}


Fig. 77: Example of findMin when i = 0 (the x-coordinate) on the subtree rooted at (55, 40). The
function returns (10, 65).



the cutting dimension is selected by some other policy, but to keep things as clean as possible,
our deletion procedure will not alter a node’s cutting dimension.)
Let us assume first that the right subtree is non-empty. Recall that in the 1-dimensional case,
the replacement key was taken to be the smallest key from the right child, and after using the
replacement to fill the hole, we recursively deleted the replacement. How do we generalize
this to the multi-dimensional case? What does it mean to find the “smallest” element in
such a set? The proper thing to do is to find the point whose coordinate along the current
cutting dimension is minimum. Thus, if the cutting dimension is the x-axis, say, then the
replacement key is the point with the smallest x-coordinate in the right subtree. We use the
findMin() function (given above) to do this.
On the other hand, what if the right subtree is empty? At first, it might seem that the
right thing to do is to select the maximum node from the left subtree. However, there is a
subtle trap here. Recall that we maintain the invariant that points whose coordinates are
equal to the cutting dimension are stored in the right subtree. If we select the replacement
point to be the point with the maximum coordinate from the left subtree, and if there are
other points with the same coordinate value in this subtree, then we will have violated our
invariant. There is a clever trick for getting around this though. For the replacement element
we will select the minimum (not maximum) point from the left subtree, and we move the left
subtree over and becomes the new right subtree. The left child pointer is set to null. The
code is given in the following code block.
kd-tree Deletion
KDNode delete(Point x, KDNode p) {
if (p == null) { // fell out of tree?
throw new RuntimeException("point does not exist");
} else if (p.point.equals(x)) { // found it
if (p.right != null) { // take replacement from right
p.point = findMin(p.right, p.cutDim);
p.right = delete(p.point, p.right);
} else if (p.left != null) { // take replacement from left
p.point = findMin(p.left, p.cutDim);
p.right = delete(p.point, p.left); // move left subtree to right!
p.left = null; // left subtree is now empty
} else { // deleted point in leaf
p = null; // remove this leaf
}
} else if (p.inLeftSubtree(x)) {
p.left = delete(x, p.left); // delete from left subtree
} else { // delete from right subtree
p.right = delete(x, p.right);
}
return p;
}

An example of the operation of this deletion algorithm is presented in Fig. 78. The original
objective is to delete the point (35, 60). This is at the root of the tree. Because the cutting
dimension is vertical, we search its right subtree to find the point with the minimum x-
coordinate, which is (50, 30). The point is copied to the root, and we then recursively delete
(50, 30) from the root’s right subtree. This recursive call then seeks the node p containing
(50, 30). Note that this node has no right child, but unlike standard binary search trees, we




Fig. 78: Deletion from a kd-tree.

cannot simply unlink it from the tree (for the reasons described above). Instead, we observe
that its cutting dimension is horizontal, and we search for the point with the minimum y-
coordinate in p’s left subtree, which is (60, 10). We copy (60, 10) to p, and then recursively
delete (60, 10) from p’s left subtree. It is a leaf, so it may simply be unlinked from the tree.
Finally, we move p’s left subtree over to become p’s right subtree. (Whew!)
Recall that in the 1-dimensional case, in the 2-child case the replacement node was guaranteed
to have either zero or one child. However this is not necessarily the case here. Thus we may
do many 2-child deletions. As with insertion the running time is proportional to the height
of the tree.

Lecture 15: Answering Queries with kd-trees


Recap: In our previous lecture we introduced kd-trees, a multi-dimensional binary partition tree
that is based on axis-aligned splits. We have shown how to perform the operations of insertion
and deletion from kd-trees. In this lecture, we will investigate an important geometric query
using kd-trees: orthogonal range search queries.

Range Queries: Given any point set, a fundamental type of query is called a range query or,
more properly, an orthogonal range query. To motivate this sort of query, suppose that you
are querying a biomedical database with millions of records. Each medical record is encoded as
a vector of health statistics, where each coordinate is the numeric value of some statistic, such
as a person’s height, weight, or blood pressure.



Suppose that you want to answer queries of the form “how many patients have weights in the range 70–80
kilograms, heights in the range 160–170 centimeters, etc.?” This is equivalent to finding the
number of points in the database that lie within an axis-orthogonal rectangle, defined by the
intersection of these intervals (see Fig. 79).


Fig. 79: Orthogonal range query.

More formally, given a set P of points in d-dimensional real space, Rd , we wish to store these
points in a kd-tree so that, given a query consisting of an axis-aligned rectangle, denoted R,
we can efficiently count or report the points of P lying within R. Listing all the points lying
in the range is called a range reporting query, and counting all the points in the range is called
a range counting query. The solutions for the two problems are often similar, but some tricks
can be employed when counting, that do not apply when reporting.
A Rectangle Class: Before we get into a description of how to answer orthogonal range queries
with the kd-tree tree, let us first define a simple class for storing a multi-dimensional rectangle,
or hyper-rectangle for short. The private data consists of two points low and high. A point
q lies within the rectangle if low[i] ≤ q[i] ≤ high[i], for 0 ≤ i ≤ d − 1 (assuming Java-like
indexing). In addition to a constructor, the class provides a few useful geometric primitives
(illustrated in Fig. 80).
boolean contains(Point q): Returns true if and only if point q is contained within this
rectangle (using the above inequalities).
boolean contains(Rectangle c): Returns true if and only if this rectangle contains rect-
angle c. This boils down to testing containment on all the intervals defining each of the
rectangles’ sides:
   
c.low[i], c.high[i] ⊆ low[i], high[i] , for all 0 ≤ i ≤ d − 1.

boolean isDisjointFrom(Rectangle c): Returns true if and only if rectangle c is disjoint


from this rectangle. This boils down to testing whether any of the defining intervals are
disjoint, that is

r.high[i] < c.low[i] or r.low[i] > c.high[i], for any 0 ≤ i ≤ d − 1.

float distanceTo(Point q): Returns the minimum Euclidean distance from q to any point
of this rectangle. This can be computed by computing the distance from the coordinate




Fig. 80: Axis-parallel rectangle methods: new Rectangle(low, high), r.contains(q), r.contains(c), r.isDisjointFrom(c), and r.distanceTo(q).

q[i] to this rectangle’s ith defining interval, taking the sum of the squares of these distances,
and then taking the square root of this sum:
\[
\sqrt{\sum_{i=0}^{d-1} \big(\mathrm{distance}(q[i],\, low[i],\, high[i])\big)^2}
\]

There is one additional function worth discussing, because it is used in many algorithms that
involve kd-trees. The function is given a rectangle r and a splitting point s lying within
the rectangle. We want to cut the rectangle into two sub-rectangles by a line that passes
through the splitting point. These are used in a context where the rectangle r represents the
cell associated with a given kd-tree node, and by cutting the cell through the splitter, we
generate the cells associated with the node’s left and right children.
Rectangle leftPart(int cd, Point s): (and rightPart(int cd, Point s)) These are
both given a cutting dimension cd and a point s that lies within the rectangle. The
first returns the subrectangle lying to the left (below) of s with respect to the cutting
dimension, and the other returns the subrectangle lying to the right (above) of s with
respect to the cutting dimension (see Fig. 81). More formally, leftPart(cd, s) returns
a rectangle whose low point is the same as r.low and whose high point is the same as
r.high except that the cd-th coordinate is set to s[cd]. Similarly, rightPart(cd, s),
returns a rectangle whose high point is the same as r.high and whose low point is the
same as r.low except that the cd-th coordinate is set to s[cd].

r.leftPart(cd, s) r.rightPart(cd, s)

Fig. 81: The functions leftPart and rightPart.

The following code block provides a high-level overview of the Rectangle class (without
defining any of the functions).
Answering the Range Query: In order to answer range counting queries, let us first assume that
each node p of the tree has been augmented with a member p.size, indicating the number of



A skeleton of a simple Rectangle class
public class Rectangle {
Point low; // lower left corner
Point high; // upper right corner

public Rectangle(Point low, Point high) // constructor


public boolean contains(Point q) // do we contain q?
public boolean contains(Rectangle c) // do we contain rectangle c?
public boolean isDisjointFrom(Rectangle c) // disjoint from rectangle c?
public float distanceTo(Point q) // minimum distance to point q
public Rectangle leftPart(int cd, Point s) // left part from s
public Rectangle rightPart(int cd, Point s) // right part from s
}
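For reference, here is one possible (unofficial) way these bodies might be filled in, following the formulas above directly. It is a sketch, not the course's implementation, and it uses only the Point operations (getDim, get, set) introduced earlier.

Sketch of possible Rectangle method implementations
public class Rectangle {
    Point low;                                        // lower-left corner
    Point high;                                       // upper-right corner

    public Rectangle(Point low, Point high) { this.low = low; this.high = high; }

    public boolean contains(Point q) {                // low[i] <= q[i] <= high[i] for all i
        for (int i = 0; i < low.getDim(); i++)
            if (q.get(i) < low.get(i) || q.get(i) > high.get(i)) return false;
        return true;
    }

    public boolean contains(Rectangle c) {            // [c.low[i], c.high[i]] within [low[i], high[i]]
        for (int i = 0; i < low.getDim(); i++)
            if (c.low.get(i) < low.get(i) || c.high.get(i) > high.get(i)) return false;
        return true;
    }

    public boolean isDisjointFrom(Rectangle c) {      // some pair of defining intervals is disjoint
        for (int i = 0; i < low.getDim(); i++)
            if (high.get(i) < c.low.get(i) || low.get(i) > c.high.get(i)) return true;
        return false;
    }

    public float distanceTo(Point q) {                // minimum Euclidean distance from q
        float sum = 0;
        for (int i = 0; i < low.getDim(); i++) {
            float d = 0;                              // distance from q[i] to [low[i], high[i]]
            if (q.get(i) < low.get(i)) d = low.get(i) - q.get(i);
            else if (q.get(i) > high.get(i)) d = q.get(i) - high.get(i);
            sum += d * d;
        }
        return (float) Math.sqrt(sum);
    }

    private Point copyOf(Point p) {                   // helper: copy a point coordinate by coordinate
        Point c = new Point(p.getDim());
        for (int i = 0; i < p.getDim(); i++) c.set(i, p.get(i));
        return c;
    }

    public Rectangle leftPart(int cd, Point s) {      // clip high[cd] down to s[cd]
        Point newHigh = copyOf(high);
        newHigh.set(cd, s.get(cd));
        return new Rectangle(low, newHigh);
    }

    public Rectangle rightPart(int cd, Point s) {     // clip low[cd] up to s[cd]
        Point newLow = copyOf(low);
        newLow.set(cd, s.get(cd));
        return new Rectangle(newLow, high);
    }
}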

points lying within the subtree rooted at p. This can easily be updated as points are inserted
to and deleted from the tree. The counting function, rangeCount(R, p, cell) operates
recursively. The first argument R is the query range, the second argument p is the node
currently visited, and cell is its associated cell. It returns a count of the number of points
within p’s subtree that lie within R. The initial call is rangeCount(R, root, boundingBox),
where boundingBox is the bounding box of the entire kd-tree.
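For example, the sizes might be maintained during insertion as in the sketch below. It assumes KDNode has been augmented with an int size field initialized to 1 when a node is created; deletion would decrement the sizes along the search path analogously. This is a sketch of the idea, not the official course code.

Sketch: insertion that maintains p.size
KDNode insert(Point x, KDNode p, int cutDim) {
    if (p == null) {                                   // fell out of tree
        p = new KDNode(x, cutDim);                     // new leaf; its size field starts at 1
    } else if (p.point.equals(x)) {
        throw new RuntimeException("duplicate point"); // duplicate data point!
    } else if (p.inLeftSubtree(x)) {                   // insert into left subtree
        p.left = insert(x, p.left, (p.cutDim + 1) % x.getDim());
        p.size++;                                      // one more point now lies below p
    } else {                                           // insert into right subtree
        p.right = insert(x, p.right, (p.cutDim + 1) % x.getDim());
        p.size++;                                      // one more point now lies below p
    }
    return p;
}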


Fig. 82: Cases arising in orthogonal range searching: (a) the range is disjoint from the cell, (b) the range contains the cell, (c) the range overlaps (“stabs”) the cell.

The function operates recursively, working from the root down to the leaves. First, if we fall
out of the tree then there is nothing to count. Second, if the current node’s cell and the range
are completely disjoint, we may return 0, because none of this node’s points lie within the
range (see Fig. 82). Next, if the query range completely contains the current cell, we can
count all the points of p as lying within the range, and so we return p.size. Otherwise, the
range partially overlaps the cell. We say that the range stabs the cell. In this case, we apply
the function recursively to each of our two children. The function is presented in the code
block below.
An Example: Fig. 83 shows an example of a range search. Next to each node we store the size of
the associated subtree in blue. We say that a node is visited if a call to rangeCount() is made
on this node. We say that a node is processed if both of its children are visited. Observe that
for a node to be processed, its cell must overlap the range without being contained within the
range. In the example, the shaded nodes are those that are not processed. For example the
subtree rooted at h is entirely contained within the range, and any points in the subtree can be
safely included in the count. (In this case, this includes the two points p and h.) The subtrees
rooted at k and g are entirely disjoint from the query, and the subtrees rooted at these nodes
can be completely ignored. The nodes with red squares surrounding them are those whose points
have been added individually to the count (by the condition R.contains(p.point)). There



kd-tree Range Counting Query
int rangeCount(Rectangle R, KDNode p, Rectangle cell) {
    if (p == null) return 0;                           // empty subtree
    else if (R.isDisjointFrom(cell))                   // no overlap with range?
        return 0;
    else if (R.contains(cell))                         // the range contains our entire cell?
        return p.size;                                 // include all points in the count
    else {                                             // the range stabs this cell
        int count = 0;
        if (R.contains(p.point))                       // consider this point
            count += 1;
        // apply recursively to children
        count += rangeCount(R, p.left, cell.leftPart(p.cutDim, p.point));
        count += rangeCount(R, p.right, cell.rightPart(p.cutDim, p.point));
        return count;
    }
}

are four such nodes d, f , l, and q. Combined with the two points of h’s subtree, the total
count returned is 6.

Fig. 83: Range search in kd-trees. The subtree rooted at h is counted entirely. The subtrees rooted
at k and g are excluded entirely. The other points are checked individually.

Analysis of query time: How many nodes does this method visit altogether? We claim that
the total number of nodes is O(√n), assuming a balanced kd-tree (which is a reasonable
assumption in the average case).

Theorem: Given a balanced kd-tree with n points in R² (where the cutting dimension alternates
between x and y), orthogonal range counting queries can be answered in O(√n) time.

Recall from the discussion above that a node is processed (both children visited) if and only
if the range partially overlaps or “stabs” the cell. To bound the total number of nodes that
are processed in the search, it suffices to count the total number of nodes whose cells are
stabbed by the query rectangle. Rather than prove the above theorem directly, we will prove



a simpler result, which illustrates the essential ideas behind the proof. Rather than using a 4-
sided rectangle, we consider an orthogonal range having only one side, that is, an orthogonal
halfplane. In this case, the query stabs a cell if the vertical or horizontal line that defines the
halfplane intersects the cell.

Lemma: Given a balanced kd-tree with n points in R² (where the cutting dimension alternates
between x and y), any vertical or horizontal line stabs the cells of O(√n) nodes of
the tree.
Proof: It will simplify the analysis to assume that the tree is “perfectly balanced”, which
means that if a subtree contains m points then each of its two subtrees contains at most
m/2 points. (The proof generally works as long as the height of the tree is O(log n), but
it is a bit more complicated.)
By symmetry, it suffices to consider a horizontal line. Consider a processed node which
has a cutting dimension along x. A horizontal line can stab the cells of both its children.
On the other hand, if the cutting dimension is along y, a horizontal line either stabs
the upper cell or the lower cell, but not both. (If our horizontal line coincides with the
cutting line, then we consider it to stab the upper cell and not the lower cell.)

Fig. 84: An axis-parallel line in 2D can stab at most two out of four cells in two levels of the kd-tree
decomposition. In general, it stabs 2i cells at level 2i.

Since we alternate splitting on x then y, this means that after descending two levels
of the tree, we may stab the cells of at most two of the possible four grandchildren of
the original node. (This is illustrated in Fig. 84.) By our assumption that the tree is
balanced, if the parent node has n points, each of its two children has at most n/2 points,
and each of the four grandchildren has at most n/4 points. Therefore, the total number
of nodes whose cells are stabbed satisfies the following recurrence:
\[
T(n) = \begin{cases} 2 + 2\,T(n/4) & \text{if } n \ge 2,\\ 1 & \text{if } n = 1.\end{cases}
\]
We could solve this recurrence by appealing to the Master Theorem (see the CLRS
Algorithms book), but it is easy enough to solve directly. By expanding the recurrence
and observing the trend, we obtain:
\[
\begin{aligned}
T(n) &= 2 + 2T(n/4)\\
     &= 2 + 2(2 + 2T(n/16)) \;=\; 2 + 4 + 4T(n/16)\\
     &= (2 + 4) + 4(2 + 2T(n/64)) \;=\; (2 + 4 + 8) + 8T(n/64)\\
     &= \;\cdots\\
     &= \sum_{i=1}^{k} 2^i + 2^k\, T(n/4^k).
\end{aligned}
\]



To get to the basis case of T(1), we set n/4^k = 1, which yields k = log₄ n = (lg n)/(lg 4) =
(lg n)/2. The summation term (which is the dominant term) in T(n) is:
\[
\sum_{i=1}^{k} 2^i \;\approx\; 2 \cdot 2^k \;=\; 2 \cdot 2^{(\lg n)/2} \;=\; 2 \cdot (2^{\lg n})^{1/2} \;=\; 2\, n^{1/2} \;=\; 2\sqrt{n} \;=\; O(\sqrt{n}).
\]

This completes the proof.



We have shown that any (infinite) vertical or horizontal line can stab only O(√n) cells of the
tree. This upper bound clearly holds for any finite vertical or horizontal line segment. Thus,
if we apply it to the four line segments that bound R, it follows that the total number of
cells stabbed by the query range R is O(4√n) = O(√n). The total query time is determined
by the number of nodes visited, which is dominated by the number of nodes whose cells are
stabbed by the query. Therefore, the overall running time (assuming a balanced kd-tree and
alternating cutting dimensions) is O(√n). This establishes the theorem stated above.
To see whether you understand this, you might try generalizing this analysis to arbitrary
dimensions d (where d is a constant). As a hint, the query time in general will be O(n^(1−1/d)).
In the case where d = 2, this is O(√n). Observe that as d gets larger and larger, the query
time approaches O(n). Unfortunately, O(n) is the same time as brute-force search (since we
can simply test every point one-by-one to see whether it lies in the range). Thus, kd-trees
are efficient only for fairly small values of d.

Lecture 16: Range Trees


Range Queries: The objective of range searching is to count or report the set of points of some
point set that lie within a given shape. The most well-known instance of range searching
is orthogonal range searching, where the shape is an axis-aligned rectangle. In our previous
lecture, we discussed the use of kd-trees for answering orthogonal range queries, where we
showed that if the tree is balanced, then the running time is close to O(√n) to count the
points in the range. Generally, if there are k points in the range, then we can report them
in total time O(k + √n). (The modification is that whenever we find a point or subtree that
lies entirely within the range, we traverse the subtree and add all its points to the output.)
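A sketch of that reporting variant for kd-trees (using the KDNode and Rectangle classes from the previous lectures; the traversal helper and output list are our own) might look as follows.

Sketch: range reporting with a kd-tree
import java.util.List;

class KDRangeReport {
    static void rangeReport(Rectangle R, KDNode p, Rectangle cell, List<Point> out) {
        if (p == null) return;                                   // empty subtree
        if (R.isDisjointFrom(cell)) return;                      // no overlap with the range
        if (R.contains(cell)) {                                  // range contains the entire cell
            reportSubtree(p, out);                               // output every point below p
        } else {                                                 // the range stabs this cell
            if (R.contains(p.point)) out.add(p.point);           // report this point individually
            rangeReport(R, p.left, cell.leftPart(p.cutDim, p.point), out);
            rangeReport(R, p.right, cell.rightPart(p.cutDim, p.point), out);
        }
    }

    static void reportSubtree(KDNode p, List<Point> out) {       // add all points of p's subtree
        if (p == null) return;
        out.add(p.point);
        reportSubtree(p.left, out);
        reportSubtree(p.right, out);
    }
}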


Fig. 85: 2-Dimensional orthogonal range counting query.

In this lecture we will present a faster solution for answering these queries. We will consider the
problem in 2-dimensional space, but generalizations to higher dimensions are straightforward.
The data structure is called a range tree. Unlike kd-trees, which are general purpose and
can be applied to many different types of queries, range trees are designed especially for



orthogonal range queries. We will show that a range tree can answer orthogonal counting
range queries in the plane in time O(log² n). (Recall that log² n means (log n)².) If there are
k points in the range, it can also report these points in O(k + log² n) time. It uses O(n log n)
space, which is somewhat larger than the O(n) space used by kd-trees. (There are actually
two versions of range trees. We will present the simpler version. There is a significantly more
complex version that can answer queries in O(k + log n) time, thus shaving off a log factor
in the running time.) Range trees can be generalized to higher dimensions. In any constant
dimension d, they answer orthogonal range queries in O(log^d n) time.

Layering: Range trees are of additional interest because they illustrate a more general technique
used in data structure design, called layering. Suppose that you want to design a data
structure that answers queries based on multiple criteria, all of which must be satisfied. (For
example, find all medical records where the patient’s age is between a_lo and a_hi, whose
weight is between w_lo and w_hi, and whose blood pressure is between b_lo and b_hi.) Suppose
we had data structures for answering each type of query individually. How can we use these
to build a data structure for finding the entries that satisfy all three criteria? This can
be done by “layering” the individual data structures (which we will describe below). This
is exactly how range trees work: they layer multiple 1-dimensional range trees to answer
multi-dimensional range queries. These are called multi-layer search trees.

1-dimensional Range Tree: Before discussing 2-dimensional range trees, let us first consider
what a 1-dimensional range tree would look like. Given a set S of scalar (say, real-valued)
keys, we wish to preprocess these points so that given a 1-dimensional interval Q = [lo, hi]
along the x-axis, we can count (or report) all the points that lie in this interval (see Fig. 86(a)).
There are a number of simple solutions to this, but we will consider a tree-based method,
because it readily generalizes to higher dimensions.

Fig. 86: 1-dimensional range query.

Let us begin by storing all the points of our data set in the external nodes (leaves) of any
balanced binary search tree sorted by x-coordinates (e.g., an AVL tree). The data values in
the internal nodes will just be used for searching purposes. They may or may not correspond
to actual data values stored in the leaves. We assume that if an internal node contains a
value x0 then the leaves in the left subtree are strictly less than x0 , and the leaves in the right
subtree are greater than or equal to x0 (see Fig. 86(b)). Each node p in this tree is implicitly
associated with a subset S(p) ⊆ S of elements of S that are in the leaves descended from p.
(For example S(root) = S.) We assume that for each node p, we store the number of leaves
that are descended from p, denoted p.size. Thus p.size is equal to the number of elements
in S(p).
Let us introduce a few definitions before continuing. Given the interval Q = [Qlo , Qhi ], we say



that a node p is relevant to the query if S(p) ⊆ Q. That is, all the descendents of p lie within
the interval. If p is relevant, then clearly all of the nodes descended from p are also relevant. A
relevant node p is canonical if p is relevant, but its parent is not. The canonical nodes are the
roots of the maximal subtrees that are contained within Q (see Fig. 87(a)). For each canonical
node p, the subset S(p) is called a canonical subset. Because of the hierarchical structure of
the tree, it is easy to see that the canonical subsets are disjoint from each other, and they
cover the interval Q. In other words, the subset of points of S lying within the interval Q
is equal to the disjoint union of the canonical subsets. Thus, solving a range counting query
reduces to finding the canonical nodes for the query range, and returning the sum of their
sizes.


Fig. 87: The canonical subtrees for 1-dimensional range query.

We claim that the canonical subsets corresponding to any range can be identified in O(log n)
time from this structure. Intuitively, given any interval [Qlo , Qhi ], we search the tree to find
the leftmost leaf u whose key is greater than or equal to Qlo and the rightmost leaf v whose
key is less than or equal to Qhi . Clearly all the leaves between u and v (including u and v)
constitute the points that lie within the range. Since these two paths are of length at most
O(log n), there are at most O(2 log n) such trees possible, which is O(log n). To form the
canonical subsets, we take the subsets of all the maximal subtrees lying between u and v. In
Fig. 87(b), we illustrate this for the interval [2, 23].
There are a few different ways to map this intuition into an algorithm. Our approach will be
modeled after the approach used for range searching in kd-trees. We will maintain for each
node a cell C, which in this 1-dimensional case is just an interval [Clo , Chi ]. As with kd-trees,
the cell for node p contains all the points in S(p).
The arguments to the procedure are the current node, the range Q, and the current cell. Let
C0 = [−∞, +∞] be the initial cell for the root node (or generally, any interval large enough
to contain all the points). The initial call is range1D(root, Q, C0).
Let p.x denote the key associated with the current node p, and let C = [x0 , x1 ] denote the
current cell for node p. We assume that given two ranges A and B, we have utility functions
A.contains(B), which determines whether interval A contains interval B, and there is a
similar function A.contains(x) that determines whether A contains a single point x.
The processing considers the following cases. If we arrive at an external node, we check
whether the point stored in this external node is contained in Q, and if so, we count it.
Otherwise we are at an internal node. First, if Q contains the entire cell C, then all the points



of this subtree lie within Q and we add p.size to the count (see Fig. 88(a)). Next, if Q is
entirely disjoint from C (that is, Q ∩ C = ∅), then none of p’s descendants can contribute to
the query and we return 0. Otherwise, Q and C partially overlap, and we recursively invoke
the search procedure on each subtree (see Fig. 88(c)). When we invoke the procedure on the
left subtree, we trim the cell to the left part [x0 , p.x] and when we invoke the procedure on
the right subtree, we trim the cell to the right part [p.x, x1 ]. The recursive helper function
range1D(p, Q, C) is shown in the code block below.


Fig. 88: Range search cases: (a) Q contains p’s cell, (b) Q is disjoint from p’s cell, (c) Q partially
overlaps p’s cell.

1-Dimensional Range Counting Query


int range1Dx(Node p, Range Q, Interval C=[x0,x1]) {
if (p.isExternal) // hit the leaf level?
return (Q.contains(p.point) ? 1 : 0) // count if point in range
else if (Q.contains(C)) // Q contains entire cell?
return p.size // return entire subtree size
else if (Q.isDisjointFrom(C)) // no overlap
return 0
else
return range1Dx(p.left, Q, [x0, p.x]) + // count left side
range1Dx(p.right, Q, [p.x, x1]) // and right side
}

The external nodes counted in the second line and the internal nodes for which we return
p.size are the canonical nodes. (To extend this from counting to reporting, we simply
replace the step that counts the points in the subtree with a procedure that traverses the
subtree and prints the data in the leaves. Each tree can be traversed in time proportional to
the number of leaves in each subtree.) Combining the observations of this section we have
the following results.

Lemma: Given a (balanced) 1-dimensional range tree and any query range Q, in O(log n)
time we can compute a set of O(log n) canonical nodes p, such that the answer to the
query is the disjoint union of the associated canonical subsets S(p).
Theorem: 1-dimensional range counting queries can be answered in O(log n) time and range
reporting queries can be answered in O(k + log n) time, where k is the number of values
reported.



Range Trees: Now let us consider how to answer range queries in 2-dimensional space. We first
create a 1-dimensional tree T as described in the previous section, sorted by the x-coordinate
(see the left side of Fig. 89). For each internal node p of T, recall that S(p) denotes the
points associated with the leaves descended from p. For each node p of this tree we build a
1-dimensional range tree for the points of S(p), but sorted on y-coordinates (see the right side
of Fig. 89). This is called the auxiliary tree associated with p. Thus, there are n − 1 auxiliary
trees, one for each internal node of T.
Fig. 89: 2-Dimensional Range tree.

Notice that there is a significant amount of duplication here. Each point in a leaf of the
x-range tree arises in the sets S(p) for all of its ancestors p in the x-range tree. Since the tree
is assumed to be balanced, it has height O(log n), and therefore, each of the n points appears
in O(log n) auxiliary trees. Thus, the sum of the sizes of all the auxiliary trees is O(n log n).
The original tree itself contributes space of O(n). Thus, we have the following total space for
the 2-dimensional range tree.

Claim: The total size of a 2-dimensional range tree storing n keys is O(n log n).

Now, when a 2-dimensional range is presented we do the following. First, we invoke a variant
of the 1-dimensional range search algorithm to identify the O(log n) canonical nodes. (These
are shown in blue in the left side of Fig. 89.) For each such node p, we know that all the
points of the set lie within the x portion of the range, but not necessarily in the y part of
the range. So, for each of the nodes p of the canonical subtrees, we search the associated
1-dimensional auxiliary y-range tree and return a count of the resulting points. These counts are
summed up over all the auxiliary subtrees to obtain the final answer.
The algorithm given in the code block below is almost identical to the previous one, except
that we make explicit reference to the x-coordinates in the search, and rather than adding
p.size to the count, we invoke a 1-dimensional version of the above procedure using the
y-coordinate instead. Let Q.x denote the x-portion of Q’s range, consisting of the interval



[Qlo .x, Qhi .x]. The function call Q.contains(p.point) is applied on both coordinates, but
the call Q.x.contains(C) only checks the x-part of Q’s range. The procedure range1Dy()
is the same procedure described above, except that it searches on y rather than x.
2-Dimensional Range Counting Query
int range2D(Node p, Range2D Q, Range1D C=[x0,x1]) {
if (p.isExternal) // hit the leaf level?
return (Q.contains(p.point) ? 1 : 0) // count if point in range
else if (Q.x.contains(C)) { // Q’s x-range contains C
[y0,y1] = [-infinity, +infinity] // initial y-cell
return range1Dy(p.aux.root, Q, [y0, y1]) // search auxiliary tree
}
else if (Q.x.isDisjointFrom(C)) // no overlap
return 0
else
return range2D(p.left, Q, [x0, p.x]) + // count left side
range2D(p.right, Q, [p.x, x1]) // and right side
}

Analysis: It takes O(log n) time to identify the canonical nodes in the x-range tree. For each of
these O(log n) nodes we make a call to a 1-dimensional y-range tree. When we invoke this
on the subtree rooted at a node p, the running time is O(log |S(p)|). But, |S(p)| ≤ n, so this
takes O(log n) time for each auxiliary tree search. Since we are performing O(log n) searches,
each taking O(log n) time, the total search time is O(log² n). As above, we can replace the
counting code in range1Dy() with code that traverses the tree and reports the
points. This results in a total time of O(k + log² n), assuming k points are reported.
Thus, each node of the 2-dimensional range tree has a pointer to an auxiliary 1-dimensional
range tree. We can extend this to any number of dimensions. At the highest level the d-
dimensional range tree consists of a 1-dimensional tree based on the first coordinate. Each
node of this tree has an auxiliary tree, which is a (d − 1)-dimensional range tree based on the
remaining coordinates. A straightforward generalization of the arguments presented here
shows that the resulting data structure requires O(n log^d n) space and can answer queries in
O(log^d n) time.

Theorem: Given an n-element point set in d-dimensional space (for any constant d), orthogonal
range counting queries can be answered in O(log^d n) time, and orthogonal range
reporting queries can be answered in O(k + log^d n) time, where k is the number of entries
reported.

Lecture 17: Tries and Digital Search Trees


Strings and Digital Data: In earlier lectures, we studied binary search trees, which store keys
from an ordered domain. Each node stores a splitter value, and we decide whether to visit the
left or right subtree based on a comparison with the splitter. Many times, data is presented
in digital form, that is, as a sequence of binary bits. These may be organized into groups,
for example as characters in a string. When data is presented in this form, an alternative
is to design trees that branch in a radix-based manner on these digital values. This can
be advantageous with respect to the data structure’s speed and the simplicity of performing



update operations. Generically, we refer to these structures as digital search trees. In this
lecture, we will investigate a few variations on this idea.

Tries: The trie (pronounced “try”) and its variations are widely used for storing string data sets.
Tries were introduced by René de la Briandais in 1959, and the term “trie” was later coined
by Edward Fredkin, derived from the middle syllable of the word “retrieval.” (It is said that
Fredkin pronounced it the same as “tree,” but it was changed to “try” to avoid confusion.)
Let us assume that characters are drawn from an alphabet Σ of size k. In its simplest form,
each internal node of a trie has k children, one for each letter of the alphabet. We cast each
character to an integer in the range {0, . . . , k − 1}, which we can use as an index to an array
to access the next child. This way, we can search for a word of length m by visiting m nodes
of our tree. Let us also assume Java’s convention that we index the characters of a string
starting with 0 as the first (leftmost) character of the string. Given a string str, we will refer
to its leftmost character as str[0], and in Java this would be str.charAt(0).
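A minimal sketch of such a trie is given below. The class names, the value field, and the charToIndex() mapping are our own, and for simplicity we mark where keys end with a stored value rather than appending an explicit terminal character (that convention is discussed below).

Sketch of a simple trie
class TrieNode {
    TrieNode[] child;                              // one child slot per letter of the alphabet
    Object value;                                  // non-null only where a key terminates

    TrieNode(int k) { child = new TrieNode[k]; }
}

class Trie {
    private final int k;                           // alphabet size |Sigma|
    private final TrieNode root;

    Trie(int k) { this.k = k; root = new TrieNode(k); }

    private int charToIndex(char c) { return c - 'a'; }   // assumes a lower-case alphabet

    Object find(String str) {                      // visit one node per character of str
        TrieNode p = root;
        for (int i = 0; i < str.length(); i++) {
            p = p.child[charToIndex(str.charAt(i))];
            if (p == null) return null;            // ran off the trie: not found
        }
        return p.value;                            // null if no key ends exactly here
    }

    void insert(String str, Object value) {        // create any missing nodes along the path
        TrieNode p = root;
        for (int i = 0; i < str.length(); i++) {
            int j = charToIndex(str.charAt(i));
            if (p.child[j] == null) p.child[j] = new TrieNode(k);
            p = p.child[j];
        }
        p.value = value;                           // mark the end of this key
    }
}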
An example is shown in Fig. 90, where we store a set of strings over the alphabet Σ = {a, b, c}
where a = 0, b = 1, and c = 2.


Fig. 90: A trie containing the set {aab, aba, abc, bcba, bcbb, caa, cab, cac, cbc}.

We would like each distinct search path to terminate at a different leaf node. This is not
generally possible when one string is a prefix of another. A simple remedy is to add a special
terminal character to the end of each string. (Later, we will use $ as our terminal character,
but for now we will simply avoid storing keys where one is a prefix of another.)
The drawing shown in Fig. 90 is true to the trie’s implementation, but this form of drawing is
rather verbose. When drawing tries and other digital search trees, we will adopt the manner
shown below, where we label edges with the character. But remember that this is not a
different representation or a different data structure, it is just a different way of drawing the
tree.

Analysis: It is easy to see that we can search for a string in time proportional to the number
of characters in the string. This can be a significant savings over binary search trees. For
example, suppose that you wanted to store a dictionary of English words. A large dictionary
can have around 500,000 entries, and log2 500,000 ≈ 19, but the length of the average English
word is well less than 10.
Space is an issue, however. In the worst case, the number of nodes in the data structure can
be as high as the total number of characters in all the strings. Observe, for example, that
if we have one key containing 100 characters, we must devote a search path for this string,
which will require 100 nodes and 100k total space, where k = |Σ|. Clearly, this is an absurd




Fig. 91: Alternative method of drawing a trie.

situation, but it points to the main disadvantage of tries, namely their relatively high space
requirements. (By the way, if you wanted to perform exact-match searches, hashing would
be both more time and space efficient.)
de la Briandais Trees: One idea for saving space is to convert the k-order trie into a form that
is similar to the first-child/next-sibling representation that we introduced for multiway trees.
A node in this structure stores a single character. If the next character of the string matches,
we descend to the next lower level and advance to the next character of the search string.
Otherwise, we visit the next node at this same level. Eventually, we will either reach the
desired leaf node or else we will run off the end of some level. If the latter happens, then we
report that the key is not found. These are called de la Briandais trees (see Fig. 92).

Fig. 92: A de la Briandais tree containing the same strings as in Fig. 90.

The de la Briandais tree trades off a factor of k in the space for a factor of k in the search
time. While the number of nodes can be as large as the total number of characters in all the
strings, the size of each node is just a constant, independent of k. In contrast, the search time
can be as high as O(k) per level of the tree, in contrast to O(1) per level for the traditional
trie.
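The following Java sketch illustrates this search rule (the field names child, sibling, and value are our own, chosen only for illustration):

class DLBNode {
    char ch;            // character stored at this node
    DLBNode child;      // first node at the next level (next character position)
    DLBNode sibling;    // next alternative node at this level (same position)
    String value;       // non-null if a complete key ends at this node

    // Search following the rule from the text: on a match, descend and advance
    // to the next character of the query; otherwise move to the next sibling.
    // Running off the end of a level (a null link) means the key is not present.
    static boolean contains(DLBNode root, String str) {
        DLBNode p = root;
        int i = 0;
        while (p != null && i < str.length()) {
            if (p.ch == str.charAt(i)) {
                if (i == str.length() - 1) return p.value != null;
                p = p.child;
                i++;
            } else {
                p = p.sibling;
            }
        }
        return false;
    }
}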
Patricia Tries: One issue that arises with tries and digital search trees is that we may have long
paths where the contents of two strings are the same. Consider a very degenerate situation
where we have a small number of very long keys. Suppose, for example, that we have the keys “disfunctional”
and “disfunctioning” (see Fig. 93). The standard trie structure would have a long sequence
where there is no branching. This is very wasteful considering that each of these nodes requires
O(k) space, where k is the size of our alphabet.
When keys are closely clustered (in the sense that they share lengthy common prefixes), digi-
tal search trees can suffer from this phenomenon. The solution is to perform path compression,




Fig. 93: Space wastage due to long degenerate paths in traditional tries storing “disfunctional”
and “disfunctioning”.

which succinctly encodes long degenerate paths. This is the idea behind a trie variant called
a patricia trie. (The word ‘patricia’ is an acronym for Practical Algorithm To Retrieve Infor-
mation Coded In Alphanumeric.) This concept was introduced in the late 1960’s by Donald
R. Morrison, and was independently discovered by Gernot Gwehenberger.
A patricia trie uses the same node structure as the standard trie, but in addition each node
contains an index field. This field is the index of the discriminating character, which is used
to determine how to branch at this node. This index field value increases as we descend the
tree. (A traditional trie can be viewed as a special case where this index increases by one
with each level.) In the patricia trie, the index field is set to the next index at which there
is a nontrivial split (that is, the next level where the node has two or more children). Note
that once the search path has uniquely determined the string of the set, we just store a link
directly to the leaf node. An example is shown in Fig. 94. Observe that we start by testing the
0th character. All the strings that start with “e” are on the left branch from the root node.
Since they all share the next symbol in common, we continue the search with the character
at index 2.
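A sketch of the search procedure is given below, under the assumption that each internal node stores its discriminating index and an array of child links while each leaf stores its complete key (the names here are illustrative only). Because the characters between discriminating indices are skipped during the descent, the query must be compared against the key stored at the leaf before reporting success.

class PatriciaNode {
    int index;                // index of the discriminating character (internal nodes)
    PatriciaNode[] child;     // one child per letter of the alphabet (internal nodes)
    String key;               // the complete key (non-null at leaf nodes only)

    boolean isLeaf() { return key != null; }

    // Branch on the discriminating index at each internal node; once a leaf is
    // reached, compare against the full stored key, since the characters that
    // were skipped along the way were never examined.
    static boolean contains(PatriciaNode root, String q) {
        PatriciaNode p = root;
        while (p != null && !p.isLeaf()) {
            if (p.index >= q.length()) return false;   // query too short to branch on
            p = p.child[q.charAt(p.index) - 'a'];      // assumes alphabet {'a', 'b', ...}
        }
        return p != null && q.equals(p.key);
    }
}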
(The strings stored in this example are essence, essential, estimate, estimation, sublease, sublimate, sublime, and subliminal.)

Fig. 94: A patricia trie for a set of strings. Each node is labeled with the index of the character
and each edge is labeled with the matching character for this branch.

As we did with standard tries, there is a more intuitive way of drawing patricia tries. Rather than labeling each node with the index of the discriminating character used for branching, we instead label each edge with the entire substring matched along that branch (see Fig. 95). This is convenient because we can read the substrings
from the drawing without having to refer back to the original strings. As in the previous case,
this is not a different representation or a different data structure, just a different drawing.

Analysis: The patricia trie is superior to the standard trie in terms of both worst-case space




Fig. 95: Alternative method of drawing a patricia tree.

and worst-case query time. For the space, observe that because the tree splits at least two
ways with each node, an easy induction argument shows that the total number of nodes is
proportional to the number of strings (not the total number of characters in these strings,
which was the case for the standard trie). As with the standard trie, the worst-case search
time is equal to the length of the query string, but note that it may generally be much smaller,
since we only query character positions that are relevant to distinguishing between two keys
of the set.

Suffix trees: So far, we have been considering data structures for storing a set of strings. Another
common application involves storing a single, very long string, for which we want to perform
substring queries. For example: given the string “abracadabra”, how often does the substring “ab” occur? Such queries are useful in applications such as genomics, where queries
are applied to a genome sequence.
Consider a string S = “a_0 a_1 . . . a_{n−1} $”, which we call the text. For 0 ≤ i ≤ n, define the ith suffix to be the substring S_i = “a_i a_{i+1} . . . a_{n−1} $”. We have intentionally placed a special
terminator character “$” at the end of the string so that every suffix is distinct from any
other substring appearing in S.
For each position i, 0 ≤ i ≤ n, there is a minimum-length substring starting at index i that uniquely identifies S_i. For example, in the string “yabbadabbadoo$”, there are two suffixes (S_1 and S_6) that start with “abbad”, and so this substring does not uniquely identify a suffix. But there is only one suffix (namely S_6) that starts with “abbado”, and therefore this substring is the minimum-length unique identifier of a suffix.
To make this more formal, for 0 ≤ i ≤ n, define the ith substring identifier, denoted id_i, to be the shortest substring starting at index i that is unique among all substrings in the text string. Note that because the end of string character is unique, every position
has a unique substring identifier. An example is shown in the following figure. The set of all
substring identifiers for this string are shown on the left side of Fig. 96.
A suffix tree for a text S is a patricia trie in which we store each of the n + 1 substring
identifiers for the string S. We illustrate this on the right side of Fig. 96. We have adopted
the convention from Fig. 95 for drawing the patricia trie. Each leaf of the tree is associated
with the suffix S_i it identifies, and it is labeled with the associated index i.

Efficient Construction: Generally, the size and construction time of a patricia trie are both
proportional to the total length of the strings being inserted. If you are given a text S of
length n, there are n + 1 suffixes of lengths 0, 1, 2, . . . , n (not counting the terminator). Thus, the total number of characters over all suffixes is 0 + 1 + 2 + · · · + n = O(n^2). This would suggest that the suffix tree has size O(n^2) and takes O(n^2) time to compute. While we will not present the proof, it can be shown that, due



    Index  Substring ID        Index  Substring ID
      0    y                     7    bbado
      1    abbada                8    bado
      2    bbada                 9    ado
      3    bada                 10    do
      4    ada                  11    oo
      5    da                   12    o$
      6    abbado               13    $

Fig. 96: A suffix tree for the string “yabbadabbadoo$”. (Labels are placed on the edges, but this
can be implemented using a standard trie or patricia trie.)

to the special structure of the suffix tree, it has O(n) size, and it can be built in O(n)
time.8
By the way, suffix trees are not usually the method of choice when storing suffix information.
There is a related data structure, called a suffix array, which encodes the same information
more succinctly (by a constant factor), and is the method that is usually used in practice
when answering the same sorts of queries.

Suffix-Tree Queries: As an example of answering a query, suppose that we want to know how
many times the substring “abb” occurs within the text string S. To do this we search for the
string “abb” in the suffix tree. If we fall out of the tree, then it does not occur. Otherwise the
search ends at some node u. The number of leaves descended from u is equal to the number
of times “abb” occurs within S.
In our example from Fig. 96, the search ends on the leftmost path in the tree, midway along
the edge labeled “bbad”. Since the subtree rooted here has two leaves, the answer to the
query is 2. By traversing the entire subtree, we can report that the two instances start at
indices 1 and 6. (Recall that we index the string starting with 0.)
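The following sketch shows one way this query could be implemented, assuming a patricia-style node in which each edge carries a string label and each leaf records the starting index of its suffix (the class and field names here are illustrative, not fixed by these notes):

import java.util.Map;

class SuffixTreeNode {
    String edgeLabel;                      // label of the edge entering this node ("" at the root)
    Map<Character, SuffixTreeNode> child;  // children indexed by the first character of their edge
    int suffixIndex = -1;                  // at a leaf, the starting index i of the suffix S_i

    // Count how many times pattern occurs in the text: descend by matching edge
    // labels; if we fall out of the tree the answer is 0, and otherwise the
    // answer is the number of leaves below the point where the match ends.
    static int countOccurrences(SuffixTreeNode root, String pattern) {
        SuffixTreeNode p = root;
        int i = 0;                                   // characters of pattern matched so far
        while (i < pattern.length()) {
            p = (p.child == null) ? null : p.child.get(pattern.charAt(i));
            if (p == null) return 0;                 // fell out of the tree
            for (int j = 0; j < p.edgeLabel.length() && i < pattern.length(); j++, i++) {
                if (p.edgeLabel.charAt(j) != pattern.charAt(i)) return 0;
            }
        }
        return countLeaves(p);
    }

    static int countLeaves(SuffixTreeNode p) {       // one leaf per occurrence
        if (p.suffixIndex >= 0) return 1;
        int total = 0;
        for (SuffixTreeNode c : p.child.values()) total += countLeaves(c);
        return total;
    }
}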

Geometric Digital Search Trees: Before leaving the topic of digital search trees, we should
mention an interesting connection to geometric data structures. In an earlier lecture we
discussed point quadtrees and point kd-trees as two ways of storing geometric point sets. We
can extend the idea of digital search trees to the geometric setting. We will do this in a
2-dimensional setting, but the generalization to arbitrary dimensions is straightforward.
Our approach will be to define a transformation that maps a 2-dimensional point (x, y) into a
string, in a manner that preserves geometric structure. To do this, let us first assume that we
have applied a scaling transformation so our point coordinates lie within the interval [0, 1).
(For example, we can add a sufficiently large number that our coordinates are positive, and
then we can scale by dividing by the largest possible coordinate value.) By doing this, we
may assume that each point (x, y) satisfies 0 ≤ x, y < 1.
Our next step is to represent each coordinate as a binary fraction. For example, the point (x, y) = (0.356, 0.753), written in decimal, can be represented in binary form as (0.01011 . . . , 0.11000 . . .). We can
treat the binary strings x = 01011 . . . and y = 11000 . . . as strings over the 2-symbol alphabet
Σ = {0, 1}.
8
Recall that the size of a patricia tree is proportional to the number of strings, irrespective of their sizes. This
directly bounds the suffix tree’s size. The trick for constructing the tree efficiently is to start with the last suffix
Sn = “$” and work backwards to the first suffix, using the partially built tree to assist in the construction.



But, how do we convert two coordinates into a single digital string? Recall that in kd-trees,
we alternated splitting on x and then y. This suggests that we can map two coordinates
x and y into a single binary string by alternating their binary bits. More formally, if x =
0.a1 a2 a3 . . . and y = 0.b1 b2 b3 . . . in binary, we create the binary string a1 b1 a2 b2 a3 b3 . . .. In
our previous example, given our point (x, y) = (0.356, 0.753), we would interleave the bits
of their binary fractions to obtain the binary string “0111001010...”. We refer to this
as the bit interleaving transformation. Let us apply this transformation to every point in
our 2-dimensional point data set. We can then store the resulting binary strings in a digital
search tree, like a patricia trie.
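The transformation itself is easy to compute directly on the coordinates. The following sketch peels off the bits of the two binary fractions one at a time and appends them alternately; the precision of 16 bits per coordinate is an arbitrary choice for illustration.

class BitInterleave {
    // Interleave the leading BITS bits of the binary fractions of x and y (both
    // assumed to lie in [0,1)), producing the string a1 b1 a2 b2 ... from the text.
    static String interleave(double x, double y) {
        final int BITS = 16;
        StringBuilder code = new StringBuilder();
        for (int i = 0; i < BITS; i++) {
            x *= 2; int a = (int) x; x -= a;   // next bit of x's binary fraction
            y *= 2; int b = (int) y; y -= b;   // next bit of y's binary fraction
            code.append(a).append(b);
        }
        return code.toString();
    }
    // Example: interleave(0.356, 0.753) begins "0111001010", as in the text.
}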
Wow! Is this crazy transformation from geometric data to digital data meaningful in any
way? It turns out that not only is it meaningful, it actually corresponds to one of the most fundamental geometric data structures, called a PR kd-tree. In a 2-dimensional PR
kd-tree, we assume that the data lies within a square. Let’s assume that this square is our unit
square [0, 1)2 . At the root, we split vertically through the midpoint of this square, creating
two children (West and East). Each child is associated with a rectangular cell. For each child,
if its cell contains two or more points, we split the cell horizontally through its midpoint into
two cells (South and North). We repeat the process, splitting alternately between x and y,
always splitting through the midpoint of the current cell, until the cell has either zero or one
points. An example of the resulting subdivision of space is shown in Fig. 97(a).

Fig. 97: The PR kd-tree as a digital search tree using bit interleaving.

Observe that after two levels of splitting in our PR kd-tree, we have subdivided the original
square into four quadrants, each of half the side length of the original square. We can label
these SW, NW, SE, and NE (see Fig. 97(b)). Let’s consider what can be said about the
points lying in any one of these quadrants. For concreteness let’s consider the point with
coordinates (0.356, 0.753) in the NW quadrant (see Fig. 97(b)). Since it lies in this quadrant
we know that the leading bit of its x-coordinate is 0 (smaller than half) and the leading bit
of its y-coordinate is 1 (larger than half). Therefore, its digital code starts with “01...”.
Observe that our digital tree will put this point and all points in the NW quadrant into the
01 grandchild subtree of the root. It is easy to verify that each of the four grandchildren of
the root correspond to each of the four quadrants. Furthermore, as we descend the tree, with
each two levels, the digital search tree partitions the points into subtrees in exactly the same
manner as the PR kd-tree does.



In summary, the PR kd-tree data structure is equivalent to a digital search tree for the points after applying the bit-interleaving transformation! While we think of a patricia tree as a data structure for storing strings, it turns out that this data structure can also be interpreted as a geometric data structure, and it can be used for answering most of the same queries (such as
orthogonal range and nearest neighbor queries) that the kd-tree can answer.

Lecture 18: Memory Management


Memory Management: One of the major systems issues that arises when dealing with data
structures is how storage is allocated and deallocated as objects are created and destroyed.
Although memory management is really an operating systems issue, we will discuss this topic because there are a number of interesting data structure issues that arise. Sometimes for the
sake of efficiency, it is desirable to design a special-purpose memory management system for
your data structure application, rather than using the system’s memory manager.

System-Level: We will not discuss the issue of how the runtime system maintains memory in
great detail. Basically what you need to know is that there are two principal components of
dynamic memory allocation. The first is the stack. When procedures are called, arguments
and local variables are pushed onto the stack, and when a procedure returns these variables
are popped. The stack grows and shrinks in a very predictable way.
In contrast, whenever objects are allocated through the Java “new” operator, they are stored
in a different section of memory called the heap. (In spite of the similarity of names, this heap
has nothing to do with the binary heap data structure, which is used for priority queues.)
As elements are allocated and deallocated, the heap storage becomes fragmented into pieces.
How to maintain the heap efficiently is the main issue that we will consider.

Approaches: There are two basic approaches to memory management. This has to do with
whether storage deallocation is done explicitly or implicitly. Explicit deallocation is used
in languages like C (resp., C++). Memory is allocated through malloc (resp., new), and
is explicitly released to the system by invoking free (resp., delete). While these systems
provide the user a high degree of control, they also impose a strong burden on the programmer.
Blocks of memory may be leaked in the sense that the memory is inaccessible, and has not been
deallocated. A more significant programming bug is the result of aliasing, where (unknown
to the programmer) two pointers reference the same block of memory. (This often happens
when a shallow copy results in a pointer being copied, rather than making a copy of the
underlying object.) Now, when one of these pointers is used to release the block, the other
pointer still references this chunk of released memory.
In contrast, languages like Java use implicit deallocation. It is up to the system to determine
which objects are no longer accessible and reclaim their storage. This process is called garbage
collection. In both cases there are a number of choices that can be made, and these choices can
have significant impacts on the performance of the memory allocation system. Unfortunately,
there is not one system that is best for all circumstances. We begin by discussing explicit
deallocation systems.

Explicit Allocation/Deallocation: There is one case in which explicit deallocation is very easy
to handle. This is when all the objects being allocated are of the same size. A large contiguous
portion of memory is used for the heap, and we partition this into blocks of size b, where b is



the size of each object. For each unallocated block, we use one word of the block to act as
a next pointer, and simply link these blocks together in a linked list, called the available space
list. For each alloc request, we extract a block from the available space list and for each
dealloc we return the block to the list.

Fig. 98: Dynamic memory allocation block structure.
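The following Java sketch illustrates this equal-sized-block scheme by simulating the heap as an int array, in which block “addresses” are array indices and the first word of each available block holds the index of the next block on the list (the class and method names are ours, chosen only for illustration):

class FixedBlockPool {
    private final int[] heap;                // simulated memory
    private final int blockSize;
    private int avail;                       // head of the available space list (-1 = empty)

    FixedBlockPool(int numBlocks, int blockSize) {
        this.blockSize = blockSize;
        heap = new int[numBlocks * blockSize];
        avail = -1;
        for (int b = numBlocks - 1; b >= 0; b--) {   // link every block onto the list
            int addr = b * blockSize;
            heap[addr] = avail;                      // first word acts as the next pointer
            avail = addr;
        }
    }

    int alloc() {                            // returns a block address, or -1 if exhausted
        if (avail == -1) return -1;
        int addr = avail;
        avail = heap[addr];                  // unlink from the available space list
        return addr;
    }

    void dealloc(int addr) {                 // push the block back onto the list
        heap[addr] = avail;
        avail = addr;
    }
}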

If the records are of different sizes then things become much trickier. We will partition
memory into blocks of varying sizes. Each block (allocated or available) contains information
indicating how large it is. Available blocks are linked together to form an available space list.
The main questions are: (1) what is the best way to allocate blocks for each alloc request,
and (2) what is the fastest way to deallocate blocks for each dealloc request.
The biggest problem in such systems is the fact that after a series of allocation and deallo-
cation requests the memory space tends to become fragmented into small blocks of memory.
This is called external fragmentation, and is inherent to all dynamic memory allocators. Frag-
mentation increases the memory allocation system’s running time by increasing the size of
the available space list, and when a request comes for a large block, it may not be possible to
satisfy this request, even though there is enough total memory available in these small blocks.
Observe that it is not usually feasible to compact memory by moving fragments around. This
is because there may be pointers stored in local variables that point into the heap. Moving
blocks around would require finding these pointers and updating them, which is a very ex-
pensive proposition. We will consider it later in the context of garbage collection. A good
memory manager is one that does a good job of controlling external fragmentation.

Overview: When allocating a block we must search through the list of available blocks of memory
for one that is large enough to satisfy the request. The first question is, assuming that there
does exist a block that is large enough to satisfy the request, what is the best block to select?
There are two common but conflicting strategies:

First-fit: Search the available blocks sequentially until finding the first one that is large
enough to satisfy the request.
Best-fit: Search all available blocks and select the smallest block that is large enough to
fulfill the request.

Both methods work well in some instances and poorly in others. Remarkably, in spite of its
name, Best-fit often performs worse in practice than First-fit. The reasons are twofold. First,
First-fit is faster to execute, since the search can stop at the first block of sufficiently large
size, rather than having to search the entire available list. Second, best-fit tends to produce a good deal of fragmentation by selecting blocks that are just barely larger than the requested size; this leaves a small residual block, or sliver, and there may not be any future request small enough for this sliver to be used.



One method that is commonly used to reduce fragmentation is the following: when a request can just barely be filled by an available block that is slightly larger than the request, we allocate the entire block (more than was requested) to avoid creating a sliver. This keeps the list of available blocks free of large numbers of tiny fragments, which would increase the search time. The
additional waste of space that results because we allocate a larger block of memory than the
user requested is called internal fragmentation (since the waste is inside the allocated block).
When deallocating a block, it is important that if there is available storage we should merge
the newly deallocated block with any neighboring available blocks to create large blocks of
free space. This process is called merging. Merging is trickier than one might first imagine.
For example, we want to know whether the preceding or following block is available. How
would we do this? We could walk along the available space list and see whether we find it,
but this would be very slow. We might store a special bit pattern at the end and start of each
block to indicate whether it is available or not, but what if the block is allocated and its data contents happen to match this bit pattern by accident? Let us consider the implementation
of this scheme in greater detail.

Notation and Assumptions: Memory is usually allocated in bytes, but it is common to align all
allocated blocks to a word (typically 32-bits) or a double-word (typically 64 bits) boundary.
This way, we don’t need to know what the data is being used for (bytes, ints, floats, doubles,
etc.). In our examples, we will make the simplifying assumption that requests for memory
allocation are given in terms of a number of words, and the block of memory that we return
will be aligned at a word boundary. Given a pointer p to memory, we will use *p to refer
to the contents of the word of memory at this address. Given a pointer p and integer i, we
will use p + i to refer to the memory location that is i words beyond p. We will use the
expression void* to indicate the “type” of a pointer to a location of physical memory.

Block Structure: The main issues are how do we store the available blocks so we can search them
quickly (skipping over used blocks), and how do we determine whether we can merge with
neighboring blocks or not. We want the operations to be efficient, but we do not want to use
up excessive pointer space (especially in used blocks). Here is a sketch of a solution (one of
many possible).


Fig. 99: Block structure for dynamic storage allocation.

Allocated blocks: For each block of used memory we record the following information. It
can all fit in the first word of the block.
size: The size of the block of storage (including space for the following fields).



inUse: A single bit that is set to 1 (true) to indicate that this block is in use.
prevInUse: A single bit that is set to 1 (true) if the previous block is in use and 0
(false) otherwise. (Later we will see why this is useful.)
Available blocks: For an available block we store more information, which is okay because
the user is not using this space. These blocks are stored in a doubly-linked circular list,
called avail. Note that the available space list need not be sorted by physical memory
addresses. For example, it might be ordered by some other criteria, such as the sizes of
the blocks.
size: The size of the block of storage (including space for the following fields).
inUse: A bit that is set to 0 (false) to indicate that this block is not in use.
prevInUse: A bit that is set to 1 (true) if the immediately preceding block (not the
same as prev) is in use (which should always be true, since we should never have
two consecutive unused blocks).
prev: A pointer to the previous block on the available space list. (It need not be the
block immediately preceding in memory.)
next: A pointer to the next block on the available space list. (It need not be the block
immediately following in memory.)
size2: Contains the same value as size at the head of the block. This field is stored in
the last word of the block. For block p it can be accessed as *(p + p.size - 1).

Note that available blocks require more space for all this extra information. The system
will never allow the creation of a fragment that is so small that it cannot contain all this
information.
You will observe that the run-time system trusts that the user’s program will not overwrite
any of the block header fields or previous/next pointers. What is to keep this from happening?
The answer (at least with C and C++) is nothing! If the program accidentally overwrites a
memory location outside of the allocated block, it can destroy the memory system’s integrity,
and very soon everything fails. Usually, the result is an abort with just a cryptic message,
such as “Segmentation fault.” Failure to check for these faults can result in undetected
buffer-overflow writes, a common technique for hacking into software systems.

Allocation: To allocate a block we search through the linked list of available blocks until finding
one of sufficient size (see Fig. 100). If the request is about the same size (or perhaps slightly
smaller) as the block, we remove the block from the list of available blocks (performing the
necessary relinkings) and return a pointer to it. We also may need to update the prevInUse
bit of the next block since this block is no longer available. Otherwise we split the block into
two smaller blocks, return one to the user, and leave the other on the available space list.
The following code block presents an algorithm for allocating a new block of memory. Note
that we make use of pointer arithmetic here. The argument b is the desired size of the
allocation. Because we reserve one word of storage for our own use we increment this value on
entry to the procedure. We keep a constant TOO SMALL, which indicates the smallest allowable
fragment size. If the allocation would result in a fragment of size less than this value, we
return the entire block. The procedure returns a generic pointer to the newly allocated block.
The utility function avail.unlink(p) simply unlinks block p from the doubly-linked available
space list. An example is provided in Fig. 100. Shaded blocks are available.




Fig. 100: An example of block allocation.

Allocate a block of storage


(void*) alloc(int b) { // allocate block with b words
b += 1; // extra space for system overhead
p = search available space list for block of size at least b;
if (p == null) { ...Error! Insufficient memory...}
if (p.size - b < TOO_SMALL) { // remaining fragment too small?
avail.unlink(p); // remove entire block from avail list
q = p; // this is block to return
}
else { // split the block
p.size -= b; // decrease size by b
*(p + p.size - 1) = p.size; // set new block’s size2 field
q = p + p.size; // offset of start of new block
q.size = b; // size of new block
q.prevInUse = 0; // previous block is unused
}
q.inUse = 1; // new block is used
(q + q.size).prevInUse = 1; // adjust prevInUse for following block
return q + 1; // offset the link (to avoid header)
}



Deallocation: To deallocate a block, we check whether the next block or the preceding blocks are
available. For the next block we can find its first word and check its inUse field. For the
preceding block we use our own prevInUse field. (This is why this field is present). If the
previous block is not in use, then we use the size value stored in the last word to find the
block’s header. If either of these blocks is available, we merge the two blocks and update the
header values appropriately. If both the preceding and next blocks are available, then this results in one of these blocks being deleted from the available space list (since we do not want to have two consecutive available blocks). If both the preceding and next blocks are in use,
we simply link this block into the list of available blocks (e.g. at the head of the list).
The deletion function is shown in the following code block. Let avail denote the head of
the available space list. There are four different cases depending on whether the blocks that
immediate precede or follow p are allocated or available.
• If the following block q is available, then we merge it with p. (See Fig. 101, upper
half.) To do this we update p’s size, and move q’s record in the available space list to
p, using a utility function move(q, p). This copies q’s previous and next fields to p
and appropriately update the entries in the available space list that point to q. (E.g.,
q.prev.next = p.) Otherwise we add p to the available space list, which we assume
will set its previous and next pointers. Now we may assume that p is on the available
space list.

Fig. 101: An example of block deallocation and merging.

• If the preceding block is not in use, we merge this block with p (see Fig. 101, lower half),
and unlink p from the available space list. Note that we do need to alter p.prevInUse.
If the previous block was available, then we merged p with it and p has gone away, and
otherwise p’s original value was correct.

Analysis: There is not much theoretical analysis of this method of dynamic storage allocation.
Because the system has no knowledge of the future sequence of allocation and deallocation
requests, it is possible to contrive situations in which either first fit or best fit (or virtually any



Deallocating a block of storage
void dealloc(void* p) { // deallocate block at p
p--; // back up to the header
q = p + p.size; // the following block
if (!q.inUse) { // is following available?
p.size += q.size; // merge q into p
avail.move(q, p); // move q to p in avail space list
}
else avail.insert(p); // insert p into avail space list
// p is now in avail space list
p.inUse = 0; // p is now available
*(p + p.size - 1) = p.size; // set our size2 value

if (!p.prevInUse) { // previous is available?
q = p - *(p-1); // get previous block using size2
q.size += p.size; // merge p into q
*(q + q.size - 1) = q.size; // store new size2 value
avail.unlink(p); // unlink p from avail space list
(q + q.size).prevInUse = 0; // notify next that we are avail
}
}

other method you can imagine) will perform poorly. Empirical studies based on simulations
have shown that this method achieves utilizations of around 2/3 of the total available storage
before failing to satisfy a request. Even higher utilizations can be achieved if the blocks are
small on average and block sizes are similar (since this limits fragmentation). A rule of thumb
is to allocate a heap that is at least 10 times larger than the largest block to be allocated.

Buddy System: The dynamic storage allocation method described last time suffers from the
problem that long sequences of allocations and deallocations of objects of various sizes tends
to result in a highly fragmented space. The buddy system is an alternative allocation system
which limits the possible sizes of blocks and their positions, and so tends to produce a more
well-structured allocation. Because it limits block sizes, internal fragmentation (the waste
caused when an allocation request is mapped to a larger block size) becomes an issue.
The buddy system works by starting with a block of memory whose size is a power of 2 and
then hierarchically subdivides each block into blocks of equal sizes (see Fig. 103). To make
this intuition more formal, we introduce the two key elements of the buddy system:

(i) The sizes of all blocks (both allocated and available) are powers of 2. When a request
comes for an allocation, the request (including the overhead space needed for storing
block size information) is artificially rounded up to the next higher power of 2. Note
that the allocated size is never more than twice the size of the request.
(ii) Blocks of size 2k start at memory addresses that are multiples of 2k . (We assume that
addressing starts at 0, but it is easy to update this scheme to start at any arbitrary
address by simply shifting addresses by an appropriate offset.)

Note that the above requirements limit the ways in which blocks may be merged. For example, Fig. 103 below illustrates a buddy system allocation of blocks, where the blocks of size 2^k are shown at the same level. Available blocks are shaded blue and allocated blocks are shaded red. The two available blocks at addresses 5 and 6 (the two white blocks between 4




Fig. 102: Buddy system block structure.

and 8) cannot be merged because the result would be a block of length 2, starting at address
5, which is not a multiple of 2. For each size group, there is a separate available space list.

Fig. 103: Buddy system example. Allocated blocks are shaded red and available blocks are shaded
blue. Available blocks are linked into the available-block list of the appropriate size.

For every block there is exactly one other block with which it can be merged. This is called its buddy. In general, if a block b is of size 2^k and is located at address x, then its buddy is the block of size 2^k located at address

    buddy_k(x) = x + 2^k   if 2^(k+1) divides x,
    buddy_k(x) = x − 2^k   otherwise.

This is easy to compute via bit manipulation. Basically the buddy’s address is formed by
complementing bit k in the binary representation of x (where the lowest order bit is bit 0). In
Java, this can be expressed as buddy(k,x) = (1<<k)^x. For example, for k = 3 the blocks of length 8 at addresses 80 and 88 are buddies. If we look at their binary representations
we see that they differ in bit position 3 (four bits from the right). Because they must be
multiples of 8, bits 0–2 are zero.

80 = 1010000_2
88 = 1011000_2.

So, buddy3 (80) = 88 and vice versa.

Putting the Pieces Together: As we mentioned earlier, one principal advantage of the buddy
system is that we can exploit the regular sizes of blocks to search efficiently for available
blocks. We maintain an array of linked lists, one for the available block list for each size



group. In particular, avail[k] is the header pointer to a doubly linked list of available
blocks of size 2^k.
Here is how the basic operations work. We assume that each block has the same structure
as described in the dynamic storage allocation example from last time. The prevInUse bit
and the size field at the end of each available block are not needed given the extra structure
provided in the buddy system. Each block stores its size (actually the log base 2 of its size is
sufficient) and a bit indicating whether it is allocated or not. Also each available block has
links to the previous and next entries on the available space list. There is not just one available
space list, but rather there are multiple lists, one for each level of the hierarchy (something like
a skip list). This makes it possible to search quickly for a block of a given size.

Buddy System Allocation: We will give a high-level description of allocation and deallocation. To allocate a block of size b, let k = ⌈lg(b + 1)⌉. (Recall that we need to include one extra
word for the block size. However, to simplify our figures we will ignore this extra word.)
We will allocate a block of size 2^k. In general there may not be a block of exactly this size available, so we find the smallest j ≥ k such that there is an available block of size 2^j. If j > k, we repeatedly split this block until we create a block of size 2^k. In the process we will create one
or more new blocks, which are added to the appropriate available space lists.
For example, in Fig. 104 we request a block of length 2. There are no available blocks of
this size, so we use the smallest available block of the next larger size, that is, the block of
length 16 at address 32. We remove it from its available space list. We recursively split it
into subblocks of sizes 8, 4, 2, 2, until we get to a block of the desired size. We return one of
the blocks of size 2, and we insert the remaining blocks (of sizes 2, 4, and 8) into the available
space lists of the appropriate sizes.
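The split loop can be sketched as follows. Here avail[k] is assumed to be a list of the starting addresses of the free blocks of size 2^k; the sketch shows only the address arithmetic and ignores the header word and inUse bookkeeping described earlier.

import java.util.ArrayDeque;

class BuddyAllocator {
    // avail[k] holds the start addresses of the free blocks of size 2^k.
    private final ArrayDeque<Integer>[] avail;

    @SuppressWarnings("unchecked")
    BuddyAllocator(int levels) {
        avail = new ArrayDeque[levels + 1];
        for (int k = 0; k <= levels; k++) avail[k] = new ArrayDeque<>();
        avail[levels].add(0);                  // initially one free block of size 2^levels
    }

    // Allocate a block of size 2^k; returns its address, or -1 on failure.
    int alloc(int k) {
        int j = k;
        while (j < avail.length && avail[j].isEmpty()) j++;   // smallest j >= k with a free block
        if (j == avail.length) return -1;
        int addr = avail[j].removeFirst();
        while (j > k) {                        // split repeatedly, freeing the upper halves
            j--;
            avail[j].addFirst(addr + (1 << j));
        }
        return addr;
    }
}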

Fig. 104: Example of allocating a block in the buddy system.

Deallocation: To deallocate a block, we first mark this block as being available. We then check
to see whether its buddy is available. This can be done in constant time. If so, we remove
the buddy from its available space list, and merge them together into a single free block of
twice the size. This process is repeated until we find that the buddy is allocated.



Fig. 105 shows the deallocation of the block of size 1 at address 21. It is merged with its
buddy of size 1 at address 20, thus forming a block of size 2 at 20. This is then merged
with its size-2 buddy at 22, forming a block of size 4 at 20. Finally, this is merged with its
size-4 buddy at 16, forming a block of size 8 at 16. This block’s buddy (the block of size 8
starting at 0) is not available, so the merging process stops. We insert this final block into
the appropriate level of the available space list.

Fig. 105: Example of deallocating a block in the buddy system.

Fibonacci Buddy System: An interesting variant of the buddy system is designed to reduce
fragmentation by using a hierarchy that is “denser” with respect to the sizes available. The
Fibonacci Buddy System uses Fibonacci numbers, rather than powers of 2.
Recall that F (0) = 0, F (1) = 1, and generally F (i) = F (i − 1) + F (i − 2). The first few
Fibonacci numbers are: 0, 1, 1, 2, 3, 5, 8, 13, 21, 34, . . .. In this system avail[k] stores available
blocks of size F (k). Each allocation request (after adding +1 for the header) is rounded up
to the next larger Fibonacci number, say F (k). If no block of size F (k) is available, we find
the next larger available size, say F (j). We then repeatedly split F (j) using Fibonacci numbers
to obtain a block of the desired size. For example, if F (k) = F (3) = 2 is the desired size and
F (j) = F (9) = 34 is the smallest available, we split the F (j) block up as:

F (9) = 34 = F (7)+F (8) = F (5)+F (6)+F (8) = F (3)+F (4)+F (6)+F (8) = 2+3+8+21.

We remove the block of size 34 from avail[9], and split it up as described above. We return
the block of size 2, and insert the blocks of sizes 3, 8, and 21 into avail[4], avail[6],
avail[8], respectively.
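The splitting rule can be expressed as a short routine that computes which Fibonacci-sized leftover pieces are produced (a sketch of the index arithmetic only; the available space lists themselves are not shown):

import java.util.ArrayList;
import java.util.List;

class FibonacciSplit {
    // Split a free block of size F(j) to satisfy a request of size F(k), k <= j.
    // Returns the indices of the leftover Fibonacci-sized pieces; each returned
    // index i would be inserted into avail[i].
    static List<Integer> fibSplit(int j, int k) {
        List<Integer> leftovers = new ArrayList<>();
        while (j > k) {
            // F(j) = F(j-1) + F(j-2): keep the piece that still contains the
            // requested size and set the other piece aside as a leftover.
            if (k <= j - 2) { leftovers.add(j - 1); j -= 2; }
            else            { leftovers.add(j - 2); j -= 1; }
        }
        return leftovers;   // e.g., fibSplit(9, 3) returns [8, 6, 4], matching the example above
    }
}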



Supplemental Lectures



Supplemental Lecture 1: Mathematical Preliminaries
Mathematics: In this lecture we will present a brief overview of the mathematical tools that will
be needed to reason about the data structures and algorithms we will be working with. A
good understanding of mathematics helps greatly in the ability to design good data structures,
since through mathematics it is possible to get a clearer understanding of the nature of the
data structures, and a general feeling for their efficiency in time and space. Last time we
gave a brief introduction to asymptotics (big-“Oh” notation), and later this semester we will
see how to apply that. Today we consider a few other preliminary notions: summations and
proofs by induction.
Summations: Summations are important in the analysis of programs that operate iteratively. For
example, in the following code fragment

for (i = 0; i < n; i++) { ... }

where the loop body (the “...”) takes f(i) time to run, the total running time is given by the summation

    T(n) = Σ_{i=0}^{n−1} f(i).
Observe that nested loops naturally lead to nested sums. Solving summations breaks down
into two basic steps. First simplify the summation as much as possible by removing constant
terms (note that a constant here means anything that is independent of the loop variable,
i) and separating individual terms into separate summations. Then each of the remaining
simplified sums can be solved. Some important sums to know are
    Σ_{i=1}^{n} 1 = n                               (The constant series)
    Σ_{i=1}^{n} i = n(n + 1)/2                      (The arithmetic series)
    Σ_{i=1}^{n} 1/i = ln n + O(1)                   (The harmonic series)
    Σ_{i=0}^{n} c^i = (c^{n+1} − 1)/(c − 1), c ≠ 1  (The geometric series)

Note that complex sums can often be broken down into simpler terms, which can then be
solved. For example
    T(n) = Σ_{i=n}^{2n−1} (3 + 4i) = Σ_{i=0}^{2n−1} (3 + 4i) − Σ_{i=0}^{n−1} (3 + 4i)
         = (3 Σ_{i=0}^{2n−1} 1 + 4 Σ_{i=0}^{2n−1} i) − (3 Σ_{i=0}^{n−1} 1 + 4 Σ_{i=0}^{n−1} i)
         = (3(2n) + 4 · 2n(2n − 1)/2) − (3n + 4 · n(n − 1)/2) = n + 6n².



The last summation is probably the most important one for data structures. For example, suppose you want to know how many nodes are in a complete 3-ary tree of height h. (The height of a tree is the maximum number of edges from the root to a leaf.) One way to break this computation down is to look at the tree level by level. At the top level (level 0) there is 1 node, at level 1 there are 3 nodes, at level 2, 9 nodes, and in general at level i there are 3^i nodes. To find the total number of nodes we sum over all levels, 0 through h. Plugging into the geometric series formula (with c = 3 and n = h) we have:

    Σ_{i=0}^{h} 3^i = (3^{h+1} − 1)/2 ∈ O(3^h).

Conversely, if someone told you that they had a complete 3-ary tree with n nodes, you could determine the height by inverting this. Since n = (3^{h+1} − 1)/2, we have

    3^{h+1} = 2n + 1,

implying that

    h = log3(2n + 1) − 1 ∈ O(log n).

Another important fact to keep in mind about summations is that they can be approximated
using integrals.
    Σ_{i=a}^{b} f(i) ≈ ∫_{x=a}^{b} f(x) dx.

Given an obscure summation, it is often possible to find it in a book on integrals, and use
the formula to approximate the sum.

Recurrences: A second mathematical construct that arises when studying recursive programs (as
are many described in this class) is that of a recurrence. A recurrence is a mathematical
formula that is defined recursively. For example, let’s go back to our example of a 3-ary tree
of height h. There is another way to describe the number of nodes in a complete 3-ary tree.
If h = 0 then the tree consists of a single node. Otherwise, the tree consists of a root
node and 3 copies of a 3-ary tree of height h − 1. This suggests the following recurrence which
defines the number of nodes N (h) in a 3-ary tree of height h:

N (0) = 1
N (h) = 3N (h − 1) + 1 if h ≥ 1.

Although the definition appears circular, it is well grounded since we eventually reduce to
N (0).

N (1) = 3N (0) + 1 = 3 · 1 + 1 = 4
N (2) = 3N (1) + 1 = 3 · 4 + 1 = 13
N (3) = 3N (2) + 1 = 3 · 13 + 1 = 40,

and so on.
There are two common methods for solving recurrences. One (which works well for simple
regular recurrences) is to repeatedly expand the recurrence definition, eventually reducing



it to a summation, and the other is to just guess an answer and use induction. Here is an
example of the former technique.

    N(h) = 3N(h − 1) + 1
         = 3(3N(h − 2) + 1) + 1 = 9N(h − 2) + 3 + 1
         = 9(3N(h − 3) + 1) + 3 + 1 = 27N(h − 3) + 9 + 3 + 1
         ...
         = 3^k N(h − k) + (3^{k−1} + · · · + 9 + 3 + 1)

When does this all end? We know that N(0) = 1, so let's set k = h, implying that

    N(h) = 3^h N(0) + (3^{h−1} + · · · + 3 + 1) = 3^h + 3^{h−1} + · · · + 3 + 1 = Σ_{i=0}^{h} 3^i.

This is the same thing we saw before, just derived in a different way.

Proofs by Induction: The last mathematical technique of importance is that of proofs by induc-
tion. Induction proofs are critical to all aspects of computer science and data structures, not
just efficiency proofs. In particular, virtually all correctness arguments are based on induc-
tion. From courses on discrete mathematics you have probably learned about the standard
approach to induction. You have some theorem that you want to prove that is of the form,
“For all integers n ≥ 1, blah, blah, blah”, where the statement of the theorem involves n in
some way. The idea is to prove the theorem for some basis set of n-values (e.g. n = 1 in this
case), and then show that if the theorem holds when you plug in a specific value n − 1 into
the theorem then it holds when you plug in n itself. (You may be more familiar with going
from n to n + 1 but obviously the two are equivalent.)
In data structures, and especially when dealing with trees, this type of induction is not
particularly helpful. Instead a slight variant called strong induction seems to be more relevant.
The idea is to assume that if the theorem holds for all values strictly less than n,
then it is true for n. As the semester goes on we will see examples of strong induction proofs.
Let’s go back to our previous example problem. Suppose we want to prove the following
theorem.

Theorem: Let T be a complete 3-ary tree with n ≥ 1 nodes. Let H(n) denote the height of
this tree. Then
H(n) = (log3 (2n + 1)) − 1.
Basis Case: (Take the smallest legal value of n, n = 1 in this case.) A tree with a single
node has height 0, so H(1) = 0. Plugging n = 1 into the formula gives (log3 (2 · 1 + 1)) − 1
which is equal to (log3 3) − 1 or 0, as desired.
Induction Step: We want to prove the theorem for the specific value n > 1. Note that we
cannot apply standard induction here, because there is no complete 3-ary tree with 2
nodes in it (the next larger one has 4 nodes).
We will assume the induction hypothesis, that for all smaller n′, 1 ≤ n′ < n, H(n′) is
given by the formula above. (This is sometimes called strong induction, and it is good
to learn since most induction proofs in data structures work this way.)



Let's consider a complete 3-ary tree with n > 1 nodes. Since n > 1, it must consist of a root node plus 3 identical subtrees, each being a complete 3-ary tree of n′ < n nodes. How many nodes are in these subtrees? Since they are identical, if we exclude the root node, each subtree has one third of the remaining number of nodes, so n′ = (n − 1)/3. Since n′ < n we can apply the induction hypothesis. This tells us that

    H(n′) = log3(2n′ + 1) − 1 = log3(2(n − 1)/3 + 1) − 1
          = log3((2(n − 1) + 3)/3) − 1 = log3((2n + 1)/3) − 1
          = log3(2n + 1) − log3(3) − 1 = log3(2n + 1) − 2.

Note that the height of the entire tree is one more than the heights of the subtrees, so H(n) = H(n′) + 1. Thus we have:

    H(n) = log3(2n + 1) − 2 + 1 = log3(2n + 1) − 1,
as desired.

This may seem like an awfully long-winded way of proving such a simple fact. But induction
is a very powerful technique for proving many more complex facts that arise in data structure
analyses.
You need to be careful when attempting proofs by induction that involve O(n) notation. Here
is an example of a common error.

(False) Theorem: For n ≥ 1, let T (n) be given by the following summation


    T(n) = Σ_{i=0}^{n} i,

then T(n) ∈ O(n). (We know from the formula for the arithmetic series above that T(n) = n(n + 1)/2 ∈ O(n²). So this must be false. Can you spot the error in the following
“proof”?)
Basis Case: For n = 1 we have T (1) = 1, and 1 is O(1).
Induction Step: We want to prove the theorem for the specific value n > 1. Suppose that
for any n′ < n, T(n′) ∈ O(n′). Now, for the case n, we have (by definition)

    T(n) = Σ_{i=0}^{n} i = (Σ_{i=0}^{n−1} i) + n = T(n − 1) + n.

Now, since n−1 < n, we can apply the induction hypothesis, giving T (n−1) ∈ O(n−1).
Plugging this back in we have
T (n) ∈ O(n − 1) + n.
But (n − 1) + n ≤ 2n − 1 ∈ O(n), so we have T (n) ∈ O(n).

What is the error? Recall that asymptotic notation applies to arbitrarily large n (in the limit). However, induction proofs by their very nature only apply to specific values of n. The
proper way to prove this by induction would be to come up with a concrete expression, which
does not involve O-notation. For example, try to prove that for all n ≥ 1, T (n) ≤ 50n. If you
attempt to prove this by induction (try it!) you will see that it fails.



Supplemental Lecture 2: Leftist and Skew Heaps
Leftist Heaps: The standard binary heap data structure is a simple and efficient data structure
for the basic priority queue operations insert(x) and x = extractMin(). It is often the
case in data structure design that the user of the data structure wants to add additional
capabilities to the abstract data structure. When this happens, it may be necessary to
redesign components of the data structure to achieve efficient performance.
For example, consider an application in which in addition to insert and extractMin, we want
to be able to merge the contents of two different queues into one queue. As an application,
suppose that the jobs in a computer system are held in separate queues, each waiting for the use of one of
two resources. If one of the resources fails, we need to merge these two queues into a single
queue.
We introduce a new operation Q = merge(Q1, Q2), which takes two existing priority queues
Q1 and Q2 , and merges them into a new priority queue, Q. (Duplicate keys are allowed.)
This operation is destructive, which means that the priority queues Q1 and Q2 are destroyed
in order to form Q. (Destructiveness is quite common for operations that map two data
structures into a single combined data structure, since we can simply reuse the same nodes
without having to create duplicate copies.)
We would like to be able to implement merge() in O(log n) time, where n is the total number
of keys in priority queues Q1 and Q2 . Unfortunately, it does not seem to be possible to do
this with the standard binary heap data structure (because of its highly rigid structure and
the fact that it is stored in an array, without the use of pointers).
We introduce a new data structure called a leftist heap, which is fairly simple, and can provide
the operations insert(x), extractMin(), and merge(Q1,Q2). This data structure has many features in common with binary heaps. It is a binary tree that is partially ordered, which means
that the key value in each parent node is less than or equal to the key values in its children’s
nodes. However, unlike a binary heap, we will not require that the tree is complete, or even
balanced. In fact, it is entirely possible that the tree may be quite unbalanced.

Leftist Heap Property: Define the null path length, npl(v), of any node v to be the length of the shortest path to a descendant that has either 0 or 1 children. The value of npl(v) can be defined recursively as follows:

    npl(v) = −1                                     if v = null,
    npl(v) = 1 + min(npl(v.left), npl(v.right))     otherwise.

We will assume that each node has an extra field, v.npl that contains the node’s npl value.
The leftist property states that for every node v in the tree, the npl of its left child is at least
as large as the npl of its right child. We say that the keys of a tree are partially ordered if
each node’s key is greater than or equal to its parent’s key.
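For concreteness, the definition can be evaluated directly by the following recursive routine, written in terms of the LeftHeapNode class introduced below; in the data structure itself the value is stored in each node and maintained incrementally rather than recomputed.

// Direct recursive evaluation of the definition of npl(v); in the leftist heap
// the value is stored in v.npl and updated during merge instead.
static int npl(LeftHeapNode v) {
    if (v == null) return -1;
    return 1 + Math.min(npl(v.left), npl(v.right));
}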

Leftist heap: Is a binary tree whose keys are partially ordered (parent is less than or equal
to child) and which satisfies the leftist property (npl(left) ≥ npl(right)). An example is
shown in the figure below, where the npl values are shown next to each node.

Note that any tree that does not satisfy the leftist property can always be made to do so by
swapping left and right subtrees at any nodes that violate the property. Observe that this




Fig. 106: Leftist heap structure (with npl values shown).

does not affect the partial ordering property. Also observe that satisfying the leftist heap
property does not imply that the tree is balanced. Indeed, a degenerate binary tree in which
is formed from a chain of nodes each attached as the left child of its parent does satisfy this
property.
The key to the efficiency of leftist heap operations is that there exists a short (O(log n) length)
path in every leftist heap (namely the rightmost path). We prove the following lemma, which
implies that the rightmost path in the tree cannot be of length greater than O(log n).

Lemma: A leftist heap with r ≥ 1 nodes on its rightmost path has at least 2^r − 1 nodes.
Proof: The proof is by induction on the size of the rightmost path. Before beginning the
proof, we begin with two observations, which are easy to see: (1) the shortest path in
any leftist heap is the rightmost path in the heap, and (2) any subtree of a leftist heap
is a leftist heap. For the basis case, if there is only one node on the rightmost path, then
the tree has at least one node. Since 1 ≥ 2^1 − 1, the basis case is satisfied.
For the induction step, let us suppose that the lemma is true for any leftist heap with
strictly fewer than r nodes on its rightmost path, and we will prove it for a binary tree
with exactly r nodes on its rightmost path. Remove the root of the tree, resulting in
two subtrees. The right subtree has exactly r − 1 nodes on its rightmost path (since we
have eliminated only the root), and the left subtree must have at least r − 1 nodes
on its rightmost path (since otherwise the rightmost path in the original tree would not
be the shortest, violating (1)). Thus, by applying the induction hypothesis, it follows
that the right and left subtrees have at least 2^(r−1) − 1 nodes each, and summing them,
together with the root node we get a total of at least

2(2^(r−1) − 1) + 1 = 2^r − 1

nodes in the entire tree.

Leftist Heap Operations: The basic operation upon which leftist heaps are based is the merge
operation. Observe, for example, that both the operations insert() and extractMin() can
be implemented by using the operation merge(). (Note the similarity with splay trees, in
which all operations were centered around the splay operation.)
The basic node of the leftist heap is a LeftHeapNode. Each such node contains a data field of type Element (upon which comparisons can be made), a left and right child, and an



npl value. The constructor is given each of these values. The leftist heap class consists of
a single root, which points to the root node of the heap. Later we will describe the main
procedure merge(LeftHeapNode h1, LeftHeapNode h2), which merges the two (sub)heaps
rooted at h1 and h2 . For now, assuming that we have this operation we define the main heap
operations. Recall that merge is a destructive operation.
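Concretely, the node class might look like the following sketch in Java. The field names follow the description above, and the constructor signature matches the call new LeftHeapNode(x, null, null, 0) used in insert() below; the exact class layout is an assumption of this sketch, and Element stands for the comparable element type used throughout these notes.
Leftist heap node (sketch)
class LeftHeapNode {
    Element data;                 // the element stored in this node
    LeftHeapNode left, right;     // left and right children
    int npl;                      // null path length of this node

    LeftHeapNode(Element data, LeftHeapNode left, LeftHeapNode right, int npl) {
        this.data = data;
        this.left = left;
        this.right = right;
        this.npl = npl;
    }
}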
Leftist Operations
void insert(Element x) {
root = merge(root, new LeftHeapNode(x, null, null, 0))
}

Element extractMin() {
if (root == null) return null // empty heap
Element minItem = root.data // minItem is root’s element
root = merge(root.left, root.right)
return minItem
}

void merge(LeftistHeap Q1, LeftistHeap Q2) {
root = merge(Q1.root, Q2.root)
}

Leftist Heap Merge: All that remains, is to show how merge(h1, h2) is implemented. The
formal description of the procedure is recursive. However it is somewhat easier to understand
in its nonrecursive form. Let h1 and h2 be the two leftist heaps to be merged. Consider
the rightmost paths of both heaps. The keys along each of these paths form an increasing
sequence. We could merge these paths into a single sorted path (as in merging two lists of
sorted keys in the merge-sort algorithm). However the resulting tree might not satisfy the
leftist property. Thus we update the npl values, and swap left and right children at each
node along this path where the leftist property is violated. A recursive implementation of
the algorithm is given below. It is essentially the same as the one given in Weiss, with some
minor coding changes.
For the analysis, observe that because the recursive algorithm spends O(1) time for each node
on the rightmost path of either h1 or h2 , the total running time is O(log n), where n is the
total number of nodes in both heaps.
This recursive procedure is a bit hard to understand. A somewhat easier 2-step interpretation
is as follows. First, merge the two right paths, and update the npl values as you do so. Second,
walk back up along this path and swap children whenever the leftist condition is violated.
The figure below illustrates this way of thinking about the algorithm. The recursive algorithm
just combines these two steps, rather than performing them separately.

Skew Heaps: Recall that a splay tree is a self-adjusting balanced binary tree. Unlike the AVL
tree, which maintains balance information at each node and performs rotations to maintain
balance, the splay tree performs a series of rotations to continuously “mix-up” the tree’s
structure, and hence achieve a sort of balance from an amortized perspective.
A skew heap is a self-organizing heap, which applies the same idea as splay trees, but to
leftist heaps instead. As with splay trees, we store no balance information with each node (no
npl values). We perform the merge operation as described above, but we swap the left and



Merge two leftist heaps
LeftHeapNode merge(LeftHeapNode h1, LeftHeapNode h2) {
if (h1 == null) return h2 // if one is empty, return the other
if (h2 == null) return h1
if (h1.data > h2.data) // swap so h1 has the smaller root
swap(h1, h2)

if (h1.left == null) // h1 must be a leaf in this case
h1.left = h2
else { // merge h2 on right and swap if needed
h1.right = merge(h1.right, h2)
if (h1.left.npl < h1.right.npl) {
swap(h1.left, h1.right) // swap children to make leftist
}
h1.npl = h1.right.npl + 1 // update npl value
}
return h1
}

Fig. 107: Example of merging two leftist heaps (2-step interpretation): first merge the rightmost paths and update the npl values, then walk back up swapping subtrees wherever left.npl < right.npl.



right children from every node along the rightmost path. (Note that in the process the old
rightmost path gets flipped over to become the leftmost path. An example is given in Weiss’s
book, Section 6.7.)
It can be shown that if the heap contains at most n elements, then a sequence of m heap
operations (insert, extractMin, merge) starting from empty heaps takes O(m log n) time.
Thus the average time per operation is O(log n). Because no balance information is needed,
the code is quite simple.
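Weiss's code is not reproduced here, but the following sketch shows how the skew-heap merge might look. The class and field names (SkewHeapNode, data, left, right) are assumptions of this sketch, and keys are taken to be ints for simplicity; note that the children are swapped unconditionally and no npl values are stored.
Skew heap merge (sketch)
class SkewHeapNode {
    int data;                          // key value (any comparable type would do)
    SkewHeapNode left, right;
    SkewHeapNode(int data) { this.data = data; }
}

SkewHeapNode skewMerge(SkewHeapNode h1, SkewHeapNode h2) {
    if (h1 == null) return h2;         // if one heap is empty, return the other
    if (h2 == null) return h1;
    if (h1.data > h2.data) {           // make h1 the heap with the smaller root
        SkewHeapNode tmp = h1; h1 = h2; h2 = tmp;
    }
    SkewHeapNode merged = skewMerge(h1.right, h2);  // merge h2 into h1's right subtree
    h1.right = h1.left;                // swap children unconditionally
    h1.left = merged;
    return h1;
}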

Supplemental Lecture 3: Disjoint Set Union-Find


Equivalence relations: An equivalence relation over some set S is a relation that satisfies the
following properties for all elements of a, b, c ∈ S.

Reflexive: a ≡ a.
Symmetric: if a ≡ b then b ≡ a.
Transitive: if a ≡ b and b ≡ c then a ≡ c.

Equivalence relations arise in numerous applications. An example includes any sort of “group-
ing” operation, where every object belongs to some group (perhaps in a group by itself) and
no object belongs to more than one group. More formally, these groups are called equivalence
classes, and the subdivision of the set into such classes is called a partition. For example,
suppose we are maintaining a bidirectional communication network. The ability to commu-
nicate is an equivalence relation, since if machine a can communicate with machine b, and
machine b can communicate with machine c, then machine a can communicate with machine
c (e.g. by sending messages through b). Now suppose that a new link is created between two
groups, which previously were unable to communicate. This has the effect of merging two
equivalence classes into one class.
We discuss a data structure that can be used for maintaining equivalence partitions with
two operations: (1) union, which merges two groups together, and (2) find, which determines which
group an element belongs to. This data structure should not be thought of as a general
purpose data structure for storing sets. In particular, it cannot perform many other important set
operations, such as splitting a set or computing set operations such as intersection and
complementation. Its structure is tailored to handle just these two operations. However,
there are many applications for which this structure is useful. As we shall see, the data
structure is simple and amazingly efficient.

Union-Find ADT: We assume that we have an underlying finite set of elements S. We want to
maintain a partition of the set. In addition to the constructor, the (abstract) data structure
supports the following operations.

Set s = find(Element x): Return a set identifier of the set s that contains the element
x. A set identifier is simply a special value (of unspecified type) with the property that
find(x) == find(y) if and only if x and y are in the same set.
Set r = union(Set s, Set t): Merge two sets named s and t into a single set r containing
their union. We assume that s, t, and r are given as set identifiers. This operation
destroys the sets s and t.



Note that there are no key values used here. The arguments to the find and union operations
are pointers to objects stored in the data structure. The constructor for the data structure
is given the elements in the set S and produces a structure in which every element x ∈ S is
in a singleton set {x} containing just x.
Inverted Tree Implementation: We will derive our implementation of a data structure for the
union-find ADT by starting with a simple structure based on a forest of inverted trees. You can
think of an inverted tree as a multiway tree in which we only store parent links (no child or
sibling links). The root’s parent pointer is null. There is no limit on how many children a
node can have. The sets are represented by storing the elements of each set in separate tree.
In this implementation a set identifier is simply a pointer to the root of an inverted tree.
Thus the type Set is just an alias for the type Element (which is perhaps not very good
practice, but is very convenient). We will assume that each element x of the set has a pointer
x.parent, which points to its parent in this tree. To perform the operation find(x), we walk
along parent links and return the root of the tree. This root element is the set identifier for the
set containing x. Notice that this satisfies the above requirement, since two elements are in
the same set if and only if they have the same root. We call this find1(). Later we will
propose an improvement.
The Find Operation (First version)
Set find1(Element x) {
while (x.parent != null) x = x.parent; // follow chain to root
return x; // return the root
}

For example, suppose that S = {a, b, c, . . . , m} and the current partition is:
{a, f, g, h, k, l}, {b}, {c, d, e, m}, {i, j}.
This might be stored in an inverted tree as shown in the following figure. The operation
find(k) would return a pointer to node g.

Fig. 108: Union-find tree example.

Note that there is no particular order to how the individual trees are structured, as long as
they contain the proper elements. (Again recall that unlike existing data structures that we
have discussed in which there are key values, here the arguments are pointers to the nodes
of the inverted tree. Thus, there is never a question of “what” is in the data structure, the
issue is “which” tree are you in.) Initially each element is in its own set. To do this we just
set all parent pointers to null.
A union is straightforward to implement. To perform the union of two sets, e.g. to take the
union of the set containing {b} with the set {i, j} we just link the root of one tree into the



root of the other tree. But here is where it pays to be smart. If we link the root of {i, j} into
{b}, the height of the resulting tree is 2, whereas if we do it the other way the height of the
tree is only 1. (Recall that height of a tree is the maximum number of edges from any leaf to
the root.) It is best to keep the tree's height small, because in doing so we make the finds
run faster.
In order to perform unions intelligently, we maintain an extra piece of information, which
records the height of each tree. For historic reasons we call this height the rank of the tree.
We assume that the rank is stored as a field in each element. The intelligent rule for doing
unions is to link the root of the set of smaller rank as a child of the root of the set of larger
rank.
Observe that we only need the rank information for each tree root, and each tree root has no need
for a parent pointer. So in a really tricky implementation, the ranks and parent pointers can
share the same storage field, provided that you have some way of distinguishing them. For
example, in our text, parent pointers (actually indices) are stored with positive integers and
ranks are stored as negative integers. We will not worry about being so clever.
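Concretely, the per-element bookkeeping and the initialization might look like the following sketch; the field names parent and rank match the discussion above, but the Element class layout and the makeSets helper are our own.
Union-find element and initialization (sketch)
class Element {
    Element parent = null;    // parent in the inverted tree; null means this element is a root
    int rank = 0;             // rank (height bound); meaningful only at roots
}

// Initialization: every element starts out in its own singleton set.
Element[] makeSets(int n) {
    Element[] S = new Element[n];
    for (int i = 0; i < n; i++) S[i] = new Element();
    return S;
}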

Fig. 109: Union-find with ranks (after union(g,d) and union(b,i)).

The code for the union operation is shown below. When updating the ranks there
are two cases. If the trees have the same ranks, then the rank of the result is one larger than
this value. If not, then the rank of the result is the same as the rank of the higher ranked
tree. (You should convince yourself of this.)
Union operation
Set union(Set s, Set t) {
if (s.rank > t.rank) { // s has strictly higher rank
t.parent = s // make s the root (rank does not change)
return s
}
else { // t has equal or higher rank
if (s.rank == t.rank) t.rank++
s.parent = t // make t the root
return t
}
}

Analysis of Running Time: Consider a sequence of m union-find operations, performed on a


domain with n total elements. Observe that the running time of the initialization is propor-
tional to n, the number of elements in the set, but this is done only once. Each union takes
only constant time, O(1).



In the worst case, find takes time proportional to the height of the tree. The key to the
efficiency of this procedure is the following observation, which implies that the tree height is
never greater than lg m. (Recall that lg m denotes the logarithm base 2 of m.)

Lemma: Using the union-find procedures described above, any tree with height h has at least
2^h elements.
Proof: Given a union-find tree T , let h denote the height of T , let n denote the number of
elements in T , and let u denote the number of unions needed to build T . We prove the
lemma by strong induction on the number of unions performed to build the tree. For
the basis (no unions) we have a tree with 1 element of height 0. Since 2^0 = 1, the basis
case is established.
For the induction step, assume that the hypothesis is true for all trees built with strictly
fewer than u union operations, and we want to prove the lemma for a union-find tree
built with exactly u union operations. Let T be such a tree. Let T′ and T′′ be the two
subtrees that were joined as part of the final union. Let h′ and h′′ be the heights of T′
and T′′, respectively, and define n′ and n′′ similarly. Each of T′ and T′′ was built with
strictly fewer than u union operations. By the induction hypothesis we have

n′ ≥ 2^h′ and n′′ ≥ 2^h′′.

Let us assume that T′ was made the child of T′′ (the other case is symmetrical). This
implies that h′ ≤ h′′, and h = max(h′ + 1, h′′). There are two cases. First, if h′ = h′′,
then h = h′ + 1 and we have

n = n′ + n′′ ≥ 2^h′ + 2^h′′ = 2^(h′+1) = 2^h.

Second, if h′ < h′′, then h = h′′ and we have

n = n′ + n′′ ≥ n′′ ≥ 2^h′′ = 2^h.

In either case we get the desired result.

Since the unions take O(1) time each, we have the following.

Theorem: After initialization, any sequence of m unions and finds can be performed in
time O(m log m).

Path Compression: It is possible to apply a very simple heuristic improvement to this data
structure which provides a significant improvement in the running time. Here is the intuition.
If the user of your data structure repeatedly performs finds on a leaf at a very low level in
the tree then each such find takes as much as O(log n) time. Can we improve on this?
Once we know the result of the find, we can go back and “short-cut” each pointer along the
path to point directly to the root of the tree. This only increases the time needed to perform
the find by a constant factor, but any subsequent find on this node (or any of its ancestors)
will take only O(1) time. The operation of short-cutting the pointers so they all point directly
to the root is called path-compression and an example is shown below. Notice that only the
pointers along the path to the root are altered. We present a slick recursive version below as
well. Trace it to see how it works.



Fig. 110: Find using path compression (find(a)).


Find Operation using Path Compression
Set find2(Element x) {
if (x.parent == null) return x // return root
else return x.parent = find2(x.parent) // find root and update parent
}
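If recursion is a concern, the same path compression can be performed iteratively in two passes. The following is a sketch, equivalent in effect to find2; recall that the text treats Set as an alias for Element, so the return type here is Element.
Find with path compression (iterative sketch)
Element find2Iterative(Element x) {
    Element root = x;
    while (root.parent != null) root = root.parent;   // pass 1: walk up to the root
    while (x != root) {                               // pass 2: redirect every node on the path
        Element next = x.parent;
        x.parent = root;
        x = next;
    }
    return root;                                      // the set identifier (a root element)
}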

The running time of find2 is still proportional to the depth of the node being found, but observe
that each time you spend a lot of time in a find, you flatten the search path. Thus the work
you do provides a benefit for later find operations. (This is the sort of thing that we observed
in earlier amortized analyses.)
Does the savings really amount to anything? The answer is yes. It was actually believed at
one time that if path compression is used, then (after initialization) the running time of the
algorithm for a sequence of m unions and finds was O(m), and hence the amortized cost of
each operation is O(1). However, this turned out to be false, but the amortized time is much
less than O(log m).
Analyzing this algorithm is quite tricky. (Our text gives the full analysis if you are interested.)
In order to create a bad situation you need to do lots of unions to build up a tree with some
real depth. But as soon as you start doing finds on this tree, it very quickly becomes very
flat again. In the worst case we need to perform an immense number of unions to get high costs for
the finds.
To give the analysis (which we won't prove) we introduce two new functions, A(m, n) and
α(m, n). The function A(m, n) is called Ackermann's function. It is famous for being just about
the fastest growing function imaginable.

A(1, j) = 2^j for j ≥ 1,
A(i, 1) = A(i − 1, 2) for i ≥ 2,
A(i, j) = A(i − 1, A(i, j − 1)) for i, j ≥ 2.

In spite of its innocuous appearance, this function is a real monster. To get a feeling for how
fast this function grows, observe that
A(2, j) = 2^(2^(···^2)),



where the tower of powers of 2 is j high. (Try expanding the recurrence to prove this.) Hence,
A(4, 1) = A(3, 2) = A(2, A(3, 1)) = A(2, A(2, 2)) = A(2, 2^(2^2)) = A(2, 16),

which is already far greater than 10^80, the estimated number of atoms in the observable universe.
Since Ackermann's function grows so fast, its inverse, called α, grows incredibly slowly. Define

α(m, n) = min{i ≥ 1 | A(i, ⌊m/n⌋) > lg n}.

This definition is somewhat hard to interpret, but the important bottom line is that, assuming
⌊m/n⌋ ≥ 1, we have α(m, n) ≤ 4 as long as m is less than the number of atoms in the universe
(which is certainly true for most input sets!). The following result (which we present without
proof) shows that a sequence of union-find operations takes amortized time O(α(m, n)), and
so each operation (for all practical purposes) takes amortized constant time.

Theorem: After initialization, any sequence of m unions and finds (using path compression)
on an initial set of n elements can be performed in O(m α(m, n)) time. Thus the
amortized cost of each union-find operation is O(α(m, n)).

Supplemental Lecture 4: Geometric Preliminaries


Geometric Information: A large number of modern data structures and algorithms problems
involve geometric data. The reason is that rapidly growing fields such as computer graphics,
robotics, computer vision, computer-aided design, visualization, human-computer interaction,
virtual reality, and others deal primarily with geometric data. Geometric applications give
rise to new data structures problems, which we will be studying for a number of lectures.
Before discussing geometric data structures, we need to provide some background on what
geometric data is, and how we compute with it. With nongeometric data we stated that
we are storing records, and each record is associated with an identifying key value. These
key values served a number of functions for us. They served the function of identifying
the objects of the data structure. Because of the underlying ordering relationship, they also
provided a means for searching for objects in the data structure, by giving us a way to direct
the search by subdividing the space of keys into subsets that are greater than or less than the
key.
In geometric data structures we will need to generalize the notion of a key. A geometric point
object in the plane can be identified by its (x, y) coordinates, which can be thought of as
a type of key value. However, if we are storing more complex objects, such as rectangles,
line segments, spheres, and polygons, the notion of an identifying key is not as appropriate.
As with one-dimensional data, we also have associated application data. For example, in a
graphics system, the geometric description of an object is augmented with information about
the object’s color, texture, and its surface reflectivity properties. Since our interest will be in
storing and retrieving objects based on their geometric properties, we will not discuss these
associated data values.

Primitive Objects: Before we start discussing geometric data structures, we will digress to dis-
cuss a bit about geometric objects, their representation, and manipulation. Here is a list of
common geometric objects and possible representations. The list is far from complete. Let
R^d denote d-dimensional space with real coordinates.



Scalar: This is a single (1-dimensional) real number. It is represented as float or double.
Point: Points are locations in space. A typical representation is as a d-tuple of scalars, e.g.
P = (p0, p1, . . . , pd−1) ∈ R^d. Is it better to represent a point as an array of scalars, or
as an object with data members labeled x, y, and z? The array representation is more
general and often more convenient, since it is easier to generalize to higher dimensions
and coordinates can be parameterized. You can define “names” for the coordinates. If
the dimension might vary then the array representation is necessary.
For example, in Java we might represent a point in 3-space using something like the
following. (Note that “static final” essentially means “constant” in this context.)
class Point {
public static final int DIM = 3;
protected float[] coord = new float[DIM];
...
}
Vector: Vectors are used to denote direction and magnitude in space. Vectors and points
are represented in essentially the same way, as a d-tuple of scalars, ~v = (v0 , v1 , . . . , vd−1 ).
It is often convenient to think of vectors as free vectors, meaning that they are not tied
down to a particular origin, but instead are free to roam around space. The reason for
distinguishing vectors from points is that they often serve significantly different functions.
For example, velocities are frequently described as vectors, but locations are usually
described as points. This provides the reader of your program a bit more insight into
your intent.
Line Segment: A line segment can be represented by giving its two endpoints (p1 , p2 ). In
some applications it is important to distinguish between the segment directed from p1 to p2 and the segment directed from p2 to p1.
In this case they would be called directed line segments.
Ray: Directed lines in 3- and higher dimensions are not usually represented by equations but
as rays. A ray can be represented by storing an origin point P and a nonzero directional
vector ~v .

Fig. 111: Basic geometric objects: vectors, points, line segment, ray, and line.

Line: A line in the plane can be represented by a line equation


y = ax + b or ax + by = c.
The former definition is the familiar slope-intercept representation, and the latter takes
an extra coefficient but is more general since it can easily represent a vertical line. The
representation consists of just storing the pair (a, b) or the triple (a, b, c).
Another way to represent a line is by giving two points through which the line passes, or
by giving a point and a directional vector. These latter two methods have the advantage
that they generalize to higher dimensions.



Hyperplanes and halfspaces: In general, in dimension d, a linear equation of the form

a0 p0 + a1 p1 + . . . + ad−1 pd−1 = c

defines a (d − 1)-dimensional hyperplane. It can be represented by the d-tuple (a0, a1, . . . , ad−1)
together with c. Note that the vector (a0 , a1 , . . . , ad−1 ) is orthogonal to the hyperplane.
The set of points that lie on one side or the other of a hyperplane is called a halfspace.
The formula is the same as above, but the “=” is replaced with an inequality such as
“<” or “≥”.
Fig. 112: Planes and halfspaces (a halfspace 2x + y > 1/2, and an orthogonal halfspace with cutDim = x and cutVal = 1.5).

Orthogonal Hyperplane: In many data structures it is common to use hyperplanes that


are orthogonal to one of the coordinate axes, called orthogonal hyperplanes. In this case
it is much easier to store such a hyperplane by storing (1) an integer cutDim from the
set {0, 1, . . . , d − 1}, which indicates which axis the plane is perpendicular to and (2) a
scalar cutVal, which indicates where the plane cuts this axis.
Simple Polygon: Solid objects can be represented as polygons (in dimension 2) and poly-
hedra (in higher dimensions). A polygon is a cycle of line segments joined end-to-end.
It is said to be simple if its boundary does not self-intersect. It is convex if it is simple
and no internal angle is greater than π.

Fig. 113: Polygons (nonsimple, simple, convex), rectangles, and spheres.

The standard representation of a simple polygon is just a list of points (stored either in
an array or a circularly linked list). In general representing solids in higher dimensions
involves more complex structures, which we will discuss later.
Orthogonal Rectangle: Rectangles can be represented as polygons, but rectangles whose
sides are parallel to the coordinate axes are common. A couple of representations are
often used. One is to store the two points of opposite corners (e.g. lower left and upper
right).
Circle/Sphere: A d-dimensional sphere can be represented by a point P indicating the center
of the sphere and a positive scalar r representing its radius. A point X lies within the


sphere if

(x0 − p0)^2 + (x1 − p1)^2 + · · · + (xd−1 − pd−1)^2 ≤ r^2.
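As an illustration of how these representations translate into code, here is a sketch of two simple tests: whether a point lies inside a sphere, and on which side of an orthogonal hyperplane a point lies. The coordinate-array representation of points described above is assumed, and the method names are our own.
Simple membership tests (sketch)
// Does point x lie inside (or on) the sphere with the given center and radius r?
boolean inSphere(float[] x, float[] center, float r) {
    float sumSq = 0;
    for (int i = 0; i < x.length; i++) {       // squared Euclidean distance to the center
        float diff = x[i] - center[i];
        sumSq += diff * diff;
    }
    return sumSq <= r * r;                     // compare squared distance against r^2
}

// Does point x lie on the high side of the orthogonal hyperplane (cutDim, cutVal)?
boolean onHighSide(float[] x, int cutDim, float cutVal) {
    return x[cutDim] >= cutVal;
}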

Topological Notions: When discussing geometric objects such as circles and polygons, it is often
important to distinguish between what is inside, what is outside and what is on the boundary.
For example, given a triangle T in the plane we can talk about the points that are in the
interior (int(T)) of the triangle, the points that lie on the boundary (bnd(T)) of the
triangle, and the points that lie on the exterior (ext(T)) of the triangle. A set is closed
if it includes its boundary, and open if it does not. Sometimes it is convenient to define a set
as being semi-open, meaning that some parts of the boundary are included and some are not.
Making these notions precise is tricky, so we will just leave this on an intuitive level.

Fig. 114: Topological notions: int(T), bnd(T), ext(T); open, closed, and semi-open sets.

Operations on Primitive Objects: When dealing with simple numeric objects in 1-dimensional
data structures, the set of possible operations needed to be performed on each primitive object
was quite simple, e.g. compare one key with another, add or subtract keys, print keys, etc.
With geometric objects, there are many more operations that one can imagine.

Basic point/vector operators: Let α be any scalar, let P and Q be points, and let u, v, w be
vectors. We think of vectors as being free to roam about the space whereas points are
tied down to a particular location. We can do all the standard operations on vectors you
learned in linear algebra (adding, subtracting, etc.) The difference of two points P − Q
results in the vector directed from Q to P . The sum of a point P and vector ~u is the
point lying on the head of the vector when its tail is placed on P .

Fig. 115: Point-vector operators: vector addition, point subtraction, and point-vector addition.

As a programming note, observe that C++ has a very nice mechanism for handling
these operations on Points and Vectors using operator overloading. Java does not allow
overloading of operators.
Affine combinations: Given two points P and Q, the point (1 − α)P + αQ is a point on the
line joining P and Q. We can think of this as a weighted average, so that as α approaches
0 the point is closer to P and as α approaches 1 the point is closer to Q. This is called



Fig. 116: Affine combinations (the point (2/3)Q + (1/3)P, and the cases α < 0, 0 < α < 1, and α > 1).

an affine combination of P and Q. If 0 ≤ α ≤ 1, then the point lies on the line segment
P Q.
Length and distance: The length of a vector v is defined to be

‖v‖ = sqrt(v0^2 + v1^2 + · · · + vd−1^2).

The distance between two points P and Q is the length of the vector between them, that is,

dist(P, Q) = ‖P − Q‖.
Computing distances between other geometric objects is also important. For example,
what is the distance between two triangles? When discussing complex objects, distance
usually means the closest distance between objects.
Orientation and Membership: There are a number of geometric operations that deal with
the relationship between geometric objects. Some are easy to solve. (E.g., does a point
P lie within a rectangle R?) Some are harder. (E.g., given points A, B, C, and D, does
D lie within the unique circle defined by the other three points?) We will discuss these
as the need arises.
Intersections: The other sort of question that is important is whether two geometric objects
intersect each other. Again, some of these are easy to answer, and others can be quite
complex.
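The sketch below shows how a few of the operations just described (point subtraction, affine combinations, and distance) might be coded using the array-based point representation from earlier. Treating both points and vectors as plain float arrays is a simplification of this sketch, and the method names are our own.
Point and vector operations (sketch)
// Difference of two points P - Q: the vector directed from Q to P.
float[] subtract(float[] P, float[] Q) {
    float[] v = new float[P.length];
    for (int i = 0; i < P.length; i++) v[i] = P[i] - Q[i];
    return v;
}

// Affine combination (1 - alpha)*P + alpha*Q; for 0 <= alpha <= 1 it lies on segment PQ.
float[] affine(float[] P, float[] Q, float alpha) {
    float[] result = new float[P.length];
    for (int i = 0; i < P.length; i++) result[i] = (1 - alpha) * P[i] + alpha * Q[i];
    return result;
}

// Distance between two points: the length of the vector P - Q.
float dist(float[] P, float[] Q) {
    float sumSq = 0;
    for (int i = 0; i < P.length; i++) {
        float d = P[i] - Q[i];
        sumSq += d * d;
    }
    return (float) Math.sqrt(sumSq);
}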

Example: Circle/Rectangle Intersection: As an example of a typical problem involving ge-
ometric primitives, let us consider the question of whether a circle in the plane intersects a
rectangle. Let us represent the circle by its center point C and radius r and represent the
rectangle R by its lower left and upper right corner points, Rlo and Rhi (for low and high).
Problems involving circles are often more easily recast as problems involving distances. So,
instead, let us consider the problem of determining the distance d from C to its closest point
on the rectangle R. If d > r then the circle does not intersect the rectangle, and otherwise it
does.
In order to determine the distance from C to the rectangle, we first observe that if C lies inside
R, then the distance is 0. Otherwise we need to determine which point of the boundary of R
is closest to C. Observe that we can subdivide the exterior of the rectangle into 8 regions, as
shown in the figure below. If C lies in one of the four corner regions, then C is closest to the
corresponding vertex of R and otherwise C is closest to one of the four sides of R. Thus, all
we need to do is to classify which region C lies is, and compute the corresponding distance.
This would usually lead to a lengthy collection of if-then-else statements, involving 9 different
cases.



Fig. 117: Distance from a point to a rectangle, and squared distance contributions for each region.

We will take a different approach. Sometimes things are actually easier to code if you consider
the problem in its general d-dimensional form. Rather than computing the distance, let us
first concentrate on computing the squared distance instead. The squared distance is the sum of the
squared distances along the x-axis and along the y-axis. Considering just the
x-coordinates, if C lies to the left of the rectangle then the contribution is (Rlo,x − cx)^2,
if C lies to the right then the contribution is (cx − Rhi,x)^2, and otherwise there is no x-
contribution to the distance. A similar analysis applies for y. This suggests the following
code, which works in all dimensions.
Distance from Point C to Rectangle R
float distance(Point C, Rectangle R) {
sumSq = 0 // sum of squares
for (int i = 0; i < Point.DIM; i++) {
if (C[i] < R.lo[i]) // left of rectangle
sumSq += square(R.lo[i] - C[i])
else if (C[i] > R.hi[i]) // right of rectangle
sumSq += square(C[i] - R.hi[i])
}
return sqrt(sumSq)
}

Finally, once the distance has been computed, we test whether it is less than the radius r. If
so the circle intersects the rectangle and otherwise it does not.
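Putting the pieces together, the final test is just a comparison against the radius (a sketch; intersects is our own name, and distance is the function given above):
Circle/rectangle intersection test (sketch)
boolean intersects(Point C, float r, Rectangle R) {
    return distance(C, R) <= r;    // the circle and rectangle intersect iff d <= r
}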

Supplemental Lecture 5: Interval Trees


Segment Data: So far we have considered geometric data structures for storing points. However,
there are many other types of geometric data that we may want to store in a data structure.
Today we consider how to store orthogonal (horizontal and vertical) line segments in the
plane. We assume that a line segment is represented by giving its pair of endpoints. The
segments are allowed to intersect one another.
As a basic motivating query, we consider the following window query. Given a set of orthogonal
line segments S, which have been preprocessed, and given an orthogonal query rectangle W ,
count or report all the line segments of S that intersect W. We will assume that W is a closed
and solid rectangle, so that even if a line segment lies entirely inside of W or intersects only



the boundary of W , it is still reported. For example, given the window below, the query
would report the segments that are shown with solid lines, and segments with broken lines
would not be reported.

Fig. 118: Window Query.

Window Queries for Orthogonal Segments: We will present a data structure, called the in-
terval tree, which (combined with a range tree) can answer window counting queries for
orthogonal line segments in O(log^2 n) time, where n is the number of line segments. It can
report these segments in O(k + log^2 n) time, where k is the total number of segments
reported. The interval tree uses O(n log n) storage and can be built in O(n log n) time.
We will consider the case of range reporting queries. (There are some subtleties in making
this work for counting queries.) We will derive our solution in steps, starting with easier
subproblems and working up to the final solution. To begin with, observe that the set of
segments that intersect the window can be partitioned into three types: those that have no
endpoint in W , those that have one endpoint in W , and those that have two endpoints in W .
We already have a way to report segments of the second and third types. In particular, we
may build a range tree just for the 2n endpoints of the segments. We assume that each
endpoint has a cross-link indicating the line segment with which it is associated. Now, by
applying a range reporting query to W we can report all these endpoints, and follow the cross-
links to report the associated segments. Note that segments that have both endpoints in the
window will be reported twice, which is somewhat unpleasant. We could fix this either by
sorting the segments in some manner and removing duplicates, or by marking each segment
as it is reported and ignoring segments that have already been marked. (If we use marking,
after the query is finished we will need to go back and “unmark” all the reported segments in
preparation for the next query.)
All that remains is how to report the segments that have no endpoint inside the rectangular
window. We will do this by building two separate data structures, one for horizontal and
one for vertical segments. A horizontal segment that intersects the window but neither of its
endpoints intersects the window must pass entirely through the window. Observe that such
a segment intersects any vertical line passing from the top of the window to the bottom. In
particular, we could simply ask to report all horizontal segments that intersect the left side
of W . This is called a vertical segment stabbing query. In summary, it suffices to solve the
following subproblems (and remove duplicates):

Endpoint inside: Report all the segments of S that have at least one endpoint inside W .
(This can be done using a range query.)
Horizontal through segments: Report all the horizontal segments of S that intersect the
left side of W . (This reduces to a vertical segment stabbing query.)



Vertical through segments: Report all the vertical segments of S that intersect the bot-
tom side of W . (This reduces to a horizontal segment stabbing query.)

We will present a solution to the problem of vertical segment stabbing queries. Before dealing
with this, we will first consider a somewhat simpler problem, and then modify this simple
solution to deal with the general problem.

Vertical Line Stabbing Queries: Let us consider how to answer the following query, which is
interesting in its own right. Suppose that we are given a collection of horizontal line segments
S in the plane and are given an (infinite) vertical query line ℓq : x = xq. We want to report
all the line segments of S that intersect ℓq. Notice that for the purposes of this query, the
y-coordinates are really irrelevant, and may be ignored. We can think of each horizontal line
segment as being a closed interval along the x-axis. We show an example in the figure below
on the left.
Fig. 119: Line stabbing query (the vertical line x = xq stabs intervals b, c, d, and e).

As is true for all our data structures, we want some balanced way to decompose the set of
intervals into subsets. Since it is difficult to define some notion of order on intervals, we instead
will order the endpoints. Sort the interval endpoints along the x-axis. Let ⟨x1, x2, . . . , x2n⟩
be the resulting sorted sequence. Let xmed be the median of these 2n endpoints. Split the
intervals into three groups, L, those that lie strictly to the left of xmed , R those that lie strictly
to the right of xmed , and M those that contain the point xmed . We can then define a binary
tree by putting the intervals of L in the left subtree and recursing, putting the intervals of R
in the right subtree and recursing. Note that if xq < xmed we can eliminate the right subtree
and if xq > xmed we can eliminate the left subtree. (See the right side of the figure above.)
But how do we handle the intervals of M that contain xmed? We want to know which of these
intervals intersects the vertical line ℓq. At first it may seem that we have made no progress,
since it appears that we are back to the same problem that we started with. However, we
have gained the information that all these intervals intersect the vertical line x = xmed . How
can we use this to our advantage?
Let us suppose for now that xq ≤ xmed. How can we store the intervals of M to make it easier
to report those that intersect ℓq? The simple trick is to sort these intervals in increasing order of
their left endpoints. Let ML denote the resulting sorted list. Observe that if some interval in
ML does not intersect ℓq, then its left endpoint must be to the right of xq, and hence none of



the subsequent intervals intersects ℓq. Thus, to report all the segments of ML that intersect
ℓq, we simply traverse the sorted list and report elements until we find one that does not intersect
ℓq, that is, whose left endpoint lies to the right of xq. As soon as this happens we terminate.
If k′ denotes the total number of segments of M that intersect ℓq, then clearly this can be
done in O(k′ + 1) time.
On the other hand, what do we do if xq > xmed ? This case is symmetrical. We simply sort
all the segments of M in a sequence, MR , which is sorted from right to left based on the right
endpoint of each segment. Thus each element of M is stored twice, but this will not affect the
size of the final data structure by more than a constant factor. The resulting data structure
is called an interval tree.

Interval Trees: The general structure of the interval tree was derived above. Each node of the
interval tree has a left child, right child, and itself contains the median x-value used to split
the set, xmed , and the two sorted sets ML and MR (represented either as arrays or as linked
lists) of intervals that overlap xmed . We assume that there is a constructor that builds a node
given these three entities. The following high-level pseudocode describes the basic recursive
step in the construction of the interval tree. The initial call is root = IntTree(S), where
S is the initial set of intervals. Unlike most of the data structures we have seen so far, this
one is not built by the successive insertion of intervals (although it would be possible to do
so). Rather we assume that a set of intervals S is given as part of the constructor, and the
entire structure is built all at once. We assume that each interval in S is represented as a
pair (xlo , xhi ). An example is shown in the following figure.
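Before giving the construction code, here is a sketch of what the node and interval types might look like; the field names mirror the description above, and the use of arrays for ML and MR (and the Interval class itself) are assumptions of this sketch.
Interval tree node (sketch)
class Interval {
    float lo, hi;                     // left and right endpoints [xlo, xhi]
}

class IntTreeNode {
    float xMed;                       // median x-value used to split the intervals
    Interval[] ML;                    // intervals containing xMed, sorted by increasing lo
    Interval[] MR;                    // the same intervals, sorted by decreasing hi
    IntTreeNode left, right;          // intervals strictly left/right of xMed

    IntTreeNode(float xMed, Interval[] ML, Interval[] MR) {
        this.xMed = xMed;
        this.ML = ML;
        this.MR = MR;
    }
}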
Interval tree construction
IntTreeNode IntTree(IntervalSet S) {
if (|S| == 0) return null // no more

xMed = median endpoint of intervals in S // median endpoint

L = {[xlo, xhi] in S | xhi < xMed} // left of median


R = {[xlo, xhi] in S | xlo > xMed} // right of median
M = {[xlo, xhi] in S | xlo <= xMed <= xhi} // contains median
ML = sort M in increasing order of xlo // sort M
MR = sort M in decreasing order of xhi

t = new IntTreeNode(xMed, ML, MR) // this node


t.left = IntTree(L) // left subtree
t.right = IntTree(R) // right subtree
return t
}

We assert that the height of the tree is O(log n). To see this observe that there are 2n
endpoints. Each time through the recursion we split this into two subsets L and R of sizes
at most half the original size (minus the elements of M ). Thus after at most lg(2n) levels we
will reduce the set sizes to 1, after which the recursion bottoms out. Thus the height of the
tree is O(log n).
Implementing this constructor efficiently is a bit subtle. We need to compute the median of
the set of all endpoints, and we also need to sort intervals by left endpoint and right endpoint.
The fastest way to do this is to presort all these values and store them in three separate lists.



Fig. 120: Interval tree (each node stores xmed together with the sorted lists ML and MR).

Then as the sets L, R, and M are computed, we simply copy items from these sorted lists to
the appropriate sorted lists, maintaining their order as we go. If we do so, it can be shown
that this procedure builds the entire tree in O(n log n) time. The algorithm for answering a
stabbing query was derived above. We summarize this algorithm below. Let xq denote the
x-coordinate of the query line.
Line Stabbing Queries for an Interval Tree
stab(IntTreeNode t, Scalar xq) {
if (t == null) return // fell out of tree
if (xq < t.xMed) { // left of median?
for (i = 0; i < t.ML.length; i++) { // traverse ML
if (t.ML[i].lo <= xq) print(t.ML[i])// ..report if in range
else break // ..else done
}
stab(t.left, xq) // recurse on left
}
else { // right of median
for (i = 0; i < t.MR.length; i++) { // traverse MR
if (t.MR[i].hi >= xq) print(t.MR[i])// ..report if in range
else break // ..else done
}
stab(t.right, xq) // recurse on right
}
}

This procedure actually has one small source of inefficiency, which was intentionally included
to make the code look more symmetric. Can you spot it? Suppose that xq = t.xMed. In this case
we will recursively search the right subtree. However, this subtree contains only intervals that
are strictly to the right of xmed, and so the recursive call is a waste of effort. However, it does not affect the
asymptotic running time.
As mentioned earlier, the time spent processing each node is O(1 + k′), where k′ is the total
number of intervals that were reported at this node. Summing over all nodes, the total reporting
time is O(k + v), where k is the total number of intervals reported, and v is the total number
of nodes visited. Since at each node we recurse on only one child or the other, the total
number of nodes visited v is O(log n), the height of the tree. Thus the total reporting time
is O(k + log n).



Vertical Segment Stabbing Queries: Now let us return to the question that brought us here.
Given a set of horizontal line segments in the plane, we want to know how many of these
segments intersect a vertical line segment. Our approach will be exactly the same as in
the interval tree, except for how the elements of M (those that intersect the splitting line
x = xmed ) are handled.
Going back to our interval tree solution, let us consider the set M of horizontal line segments
that intersect the splitting line x = xmed and as before let us consider the case where the
query segment q with endpoints (xq , ylo ) and (xq , yhi ) lies to the left of the splitting line.
The simple trick of sorting the segments of M by their left endpoints is not sufficient here,
because we need to consider the y-coordinates as well. Observe that a segment of M stabs the
query segment q if and only if the left endpoint of a segment lies in the following semi-infinite
rectangular region.
{(x, y) | x ≤ xq and ylo ≤ y ≤ yhi }.
This is illustrated in the figure below. Observe that this is just an orthogonal range query.
(It is easy to generalize the procedure given last time to handle semi-infinite rectangles.) The
case where q lies to the right of xmed is symmetrical.
Fig. 121: The segments that stab q lie within the shaded semi-infinite rectangle.

So the solution is that rather than storing ML as a list sorted by the left endpoint, instead
we store the left endpoints in a 2-dimensional range tree (with cross-links to the associated
segments). Similarly, we create a range tree for the right endpoints and represent MR using
this structure.
The segment stabbing queries are answered exactly as above for line stabbing queries, except
that the part that searches ML and MR (the for-loops) is replaced by searches in the appropriate
range tree, using the semi-infinite range given above.
We will not discuss construction time for the tree. (It can be done in O(n log n) time, but this
involves some thought as to how to build all the range trees efficiently). The space needed is
O(n log n), dominated primarily from the O(n log n) space needed for the range trees. The
query time is O(k + log^3 n), since we need to answer O(log n) range queries and each takes
O(log^2 n) time plus the time for reporting. If we use the spiffy version of range trees (which
we mentioned but never discussed) that can answer queries in O(k + log n) time, then we can
reduce the total time to O(k + log^2 n).



Supplemental Lecture 6: Garbage Collection
Garbage Collection: In contrast to the explicit deallocation methods discussed in the previous
lectures, in some memory management systems such as Java, there is no explicit deallocation
of memory. In such systems, when memory is exhausted it must perform garbage collection
to reclaim storage and sometimes to reorganize memory for better future performance. We
will consider some of the issues involved in the implementation of such systems.
Any garbage collection system must do two basic things. First, it must detect which blocks
of memory are unreachable, and hence are “garbage”. Second, it must reclaim the space used
by these objects and make it available for future allocation requests. Garbage detection is
typically performed by defining a set of roots, e.g., local variables that point to objects in the
heap, and then finding everything that is reachable from these roots. An object is reachable
(or live) if there is some path of pointers or references from the roots by which the executing
program can access the object. The roots are always accessible to the program. Objects that
are not reachable are considered garbage, because they can no longer affect the future course
of program execution.
Reference counts: How do we know when a block of storage is able to be deleted? One simple
way to do this is to maintain a reference count for each block. This is a counter associated
with the block. It is set to one when the block is first allocated. Whenever the pointer to
this block is assigned to another variable, we increase the reference count. (For example, the
compiler can overload the assignment operation to achieve this.) When a variable containing
a pointer to the block is modified, deallocated or goes out of scope, we decrease the reference
count. If the reference count ever equals 0, then we know that no references to this object
remain, and the object can be deallocated.
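To make the bookkeeping concrete, here is a small sketch of reference-count maintenance as it might look if written by hand; in a real system these updates are generated automatically (for example, by the compiled assignment code), and the names used here, including the deallocate routine, are our own.
Reference counting (sketch)
class RefCounted {
    int refCount = 1;                  // set to one when the block is first allocated
}

void addReference(RefCounted block) {
    if (block != null) block.refCount++;          // another pointer now refers to this block
}

void removeReference(RefCounted block) {
    if (block != null && --block.refCount == 0)   // the last reference has gone away
        deallocate(block);                        // hypothetical routine returning the block to storage
}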
Reference counts have two significant shortcomings. First, there is a considerable overhead
in maintaining reference counts, since each assignment needs to modify the reference counts.
Second, there are situations where the reference count method may fail to recognize unreach-
able objects. For example, if there is a circular list of objects, where X points to Y and
Y points to X, then the reference counts of these objects may never go to zero, even though
the entire list is unreachable.
Mark-and-sweep: A more complete alternative to reference counts involves waiting until space
runs out, and then scavenging memory for unreachable cells. Then these unreachable regions
of memory are returned to available storage. These available blocks can be stored in an
available space list using the same method described in the previous lectures for dynamic
storage allocation. This method works by finding all immediately accessible pointers, and
then tracing them down and marking the blocks they reach as accessible. Then we sweep through
memory adding all unmarked blocks to the available space list. Hence this method is called
mark-and-sweep. An example is illustrated in the figure below.
How do we implement the marking phase? One way is by performing simple depth-first
traversal of the “directed graph” defined by all the pointers in the program. We start from
all the root pointers t (those that are immediately accessible from the program) and invoke
the following procedure for each. Let us assume that for each pointer t we know its type
and hence we know the number of pointers this object contains, denoted t.nChild and these
pointers are denoted t.child[i]. (Note that although we use the term “child” as if pointers
form a tree, this is not the case, since there may be cycles.) We assume that each block has
a bit value t.isMarked which indicates whether the object has been marked.



Fig. 122: Mark-and-sweep garbage collection.


Recursive Marking Algorithm
mark(Pointer t) {
if (t == null || t.isMarked) return // null or already visited
t.isMarked = true // mark t visited
for (i = 0; i < t.nChild; i++) // consider t’s pointers
mark(t.child[i]) // recursively visit each one
}

The recursive calls persist in visiting everything that is reachable, and only back off when
we come to a null pointer or something that has already been marked. Note that we do not
need to store the t.nChild field in each object. It is a function of t's type, which presumably the
run-time system is aware of (especially in languages like Java that support dynamic casting).
However we definitely need to allocate an extra bit in each object to store the mark.
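The sweep phase is not shown above, but conceptually it is just a linear scan over all of memory. The following sketch assumes, purely for illustration, that the heap can be traversed block by block (firstBlock, nextBlock) and that unmarked blocks are appended to an available space list availList; none of these names come from the text.
Sweep phase (sketch)
void sweep() {
    for (Block b = firstBlock(); b != null; b = nextBlock(b)) {
        if (b.isMarked)
            b.isMarked = false;        // clear the mark in preparation for the next collection
        else
            availList.add(b);          // unreachable block: return it to the available space list
    }
}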

Marking using Link Redirection: There is a significant problem in the above marking algo-
rithm. This procedure is necessarily recursive, implying that we need a stack to keep track
of the recursive calls. However, we only invoke this procedure if we have run out of space.
So we do not have enough space to allocate a stack. This poses the problem of how can we
traverse space without the use of recursion or a stack. There is no method known that is
very efficient and does not use a stack. However there is a rather cute idea which allows us to
dispense with the stack, provided that we allocate a few extra bits of storage in each object.
The method is called link redirection or the Schorr-Deutsch-Waite method. Let us think of
our objects as being nodes in a multiway tree. (Recall that we have cycles, but because of we
never revisit marked nodes, so we can think of pointers to marked nodes as if they are null
pointers.) Normally the stack would hold the parent of the current node. Instead, when we
traverse the link to the ith child, we redirect this link so that it points to the parent. When
we return to a node and want to proceed to its next child, we fix the redirected child link and
redirect the next child.
Pseudocode for this procedure is given below. As before, in the initial call the argument t
is a root pointer. In general t is the current node. The variable p points to t’s parent. We
have a new field t.currChild which is the index of the current child of t that we are visiting.



Whenever the search ascends to t from one of its children, we increment t.currChild and
visit this next child. When t.currChild == t.nChild then we are done with t and ascend
to its parent. We include two utilities for pointer redirection. The call descend(p, t, t.c)
moves us from t to its child t.c and saves p in the pointer field containing t.c. The call
ascend(p, t, p.c) moves us from t to its parent p and restores the contents of p's child
p.c. Each is implemented by performing a cyclic rotation of these three quantities. (Note
that the arguments are reference arguments.) An example is shown in the figure below, in
which we print the state after each descend or ascend.
       
descend(p, t, t.c):  (p, t, t.c) ← (t, t.c, p)        ascend(p, t, p.c):  (p, t, p.c) ← (p.c, p, t).

Marking using Link Redirection


markByRedirect(Pointer t) {
p = null // parent pointer
while (true) {
if (t != null && !t.isMarked) { // first time at t?
t.isMarked = true // mark as visited
if (t.nChild > 0) { // t has children
t.currChild = 0 // start with child 0
descend(p, t, t.child[0]) // descend via child 0
}
}
else if (p != null) { // returning to t
j = p.currChild // parent’s current child
ascend(p, t, p.child[j]) // ascend via child j
j = ++t.currChild // next child
if (j < t.nChild) { // more children left
descend(p, t, t.child[j]) // descend via child j
}
}
else return // no parent? we’re done
}
}

As we have described it, this method requires that we reserve enough spare bits in each object
to be able to keep track of which child we are visiting in order to store the t.currChild value.
If an object has k children, then ⌈lg k⌉ bits would be needed (in addition to the mark bit).
Since most objects have a fairly small number of pointers, this is a small number of bits.
Since we only need to use these bits for the elements along the current search path, rather
than storing them in the object, they could instead be packed together in the form of a stack.
Because we only need a few bits per object, this stack would require much less storage than
the one needed for the recursive version of mark.

Stop-and-Copy: The alternative strategy to mark-and-sweep is called stop-and-copy. This method


achieves much lower memory utilization, but provides for very efficient allocation. Stop-and-
copy divides memory into two large banks of equal size. One of these banks is active and
contains all the allocated blocks, and the other is entirely unused, or dormant. Rather than
maintaining an available space list, the allocated cells are packed contiguously, one after the



Fig. 123: Marking using link redirection.

other at the front of the current bank. When the current bank is full, we stop and determine
which blocks in the current bank are reachable or alive. For each such live block we find, we
copy it from the active bank to the dormant bank (and mark it so it is not copied again).
The copying process packs these live blocks one after the next, so that memory is compacted in
the process, and hence there is no fragmentation. Once all the reachable blocks have been
copied, the roles of the two banks are swapped, and then control is returned to the program.
Because storage is always compacted, there is no need to maintain an available space list.
Free space consists of one large contiguous chunk in the current bank. Another nice feature
of this method is that it only accesses reachable blocks, that is, it does not need to touch
the garbage. (In mark-and-sweep the sweeping process needs to run through all of memory.)
However, one shortcoming of the method is that only half of the available memory is usable
at any given time, and hence memory utilization cannot exceed one half.
The trickiest issue in stop-and-copy is how to deal with pointers. When a block is copied
from one bank to the other, there may be pointers to this object, which would need to be
redirected. In order to handle this, whenever a block is copied, we store a forwarding link in
the first word of the old block, which points to the new location of the block. Then, whenever
we detect a pointer in the current object that is pointing into the bank of memory that is
about to become dormant, we redirect this link by accessing the forwarding link. An example
is shown in the figure below. The forwarding links are shown as broken lines.
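A sketch of the copying step with forwarding links is given below. The Block fields (isCopied, forward, nChild, child) and the copyToDormantBank routine are illustrative assumptions, not part of any particular system; this simple version is also recursive, whereas practical collectors (such as Cheney's algorithm) organize the traversal to avoid recursion.
Copying with forwarding links (sketch)
Block copy(Block b) {
    if (b == null) return null;
    if (b.isCopied) return b.forward;         // already moved: just follow the forwarding link
    Block newB = copyToDormantBank(b);        // append a copy at the end of the dormant bank
    b.isCopied = true;
    b.forward = newB;                         // leave a forwarding link in the old block
    for (int i = 0; i < newB.nChild; i++)     // copy (and redirect) everything it points to
        newB.child[i] = copy(newB.child[i]);
    return newB;
}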
Which is better, mark-and-sweep or stop-and-copy? There is no consensus as to which is best
in all circumstances. The stop-and-copy method seems to be popular in systems where it is
easy to determine which words are pointers and which are not (as in Java). In languages such
as C++ where a pointer can be cast to an integer and then back to a pointer, it may be very
hard to determine what really is a pointer and what is not, and so mark-and-sweep is more
popular here. Stop-and-copy suffers from lots of data movement. To save time needed for
copying long-lived objects (say those that have survived 3 or more garbage collections), we
may declare them to be immortal and assign them to a special area of memory that is never
garbage collected (unless we are really in dire need of space).



Fig. 124: Stop-and-copy and pointer redirection.

