CS168: The Modern Algorithmic Toolbox Lecture #1: Introduction and Consistent Hashing
© 2015–2016, Tim Roughgarden and Gregory Valiant. Not to be sold, published, or distributed without the authors' consent.
1 Consistent Hashing
1.1 Meta-Discussion
We'll talk about the course in general in Section 2, but first let's discuss a representative
technical topic: consistent hashing. This topic is representative in the following respects:
1. As you could guess from the word "hashing," the topic builds on central algorithmic ideas
that you've already learned (e.g., in CS161) and adapts them to some very real-world
applications.
2. The topic is modern, in the sense that it is motivated by issues in present-day systems
that were not present in the applications of yore: consistent hashing is not in your
parents' algorithms textbook, because back then it wasn't needed. The original idea
isn't that new anymore (it dates from 1997), but it has been repurposed for new technologies
several times since.
3. Consistent hashing is a tool in the sense that it is a non-obvious idea but, once you
know it, it's general and flexible enough to potentially prove useful for other problems.
In this course, we'll be looking for the following trifecta: (i) ideas that are non-obvious,
even to the well-trained computer scientist, so that we're not wasting your time; (ii)
conceptually simple (realistically, these are the only ideas that you might remember
a year or more from now, when you're a start-up founder, senior software engineer, or
PhD student); and (iii) fundamental, meaning that there is some chance that the idea will
prove useful to you in the future.
4. The idea has real applications. Consistent hashing gave birth to Akamai, which to this
day is a major player in the Internet (market cap $10B), managing the Web presence
of tons of major companies. More recently, consistent hashing has been repurposed
to solve basic problems in peer-to-peer networks (initially in [4]), including parts of
BitTorrent. These days, all the cool kids are using consistent hashing for distributed
storage: made popular by Amazon's Dynamo [1], the idea is to have a lightweight
alternative to a database where all the data resides in main memory across multiple
machines, rather than on disk.
Figure 1: A hash function maps elements from a (generally large) universe U to a list of
buckets, such as 32-bit values.
you look for a cached copy of the Web page? You could poll all 100 caches for a copy, but
that feels pretty dumb. And with lots of users and caches, this solution crosses the line
from dumb to infeasible. Wouldn't it be nice if, instead, given a URL (like amazon.com) we
magically knew which cache (like #23) we should look to for a copy?
2. For all practical purposes, h behaves like a totally random function, spreading data
out evenly and without noticeable correlation across the possible buckets.
Designing good hash functions is not easy (hopefully you won't need to do it yourself),
but you can regard it as a solved problem. A common approach in practice is to use a
well-known and well-crafted hash function like MD5 (footnote 2); it's overwhelmingly likely that this
function will behave randomly for whatever data set you're working with. Theoretical
guarantees are possible only for families of hash functions (footnote 3), which motivates picking a hash
function at random from a universal family (see CS161 for details).

Footnote 1: We'll assume that you've seen hashing before, probably multiple times. See the course site for
review videos on the topic.

Footnote 2: This is built in to most programming languages, or you can just copy the code for it from the
Web. Or you might want to use something faster and more lightweight (but still well tested), like one from
the FarmHash family.
Taking the existence of a good hash function h for granted, we can solve the problem of
mapping URLs to caches. Say there are n caches, named {0, 1, 2, . . . , n − 1}. Then we can
just store the Web page with URL x at the server named

    h(x) mod n.                                                            (1)

Note that h(x) is probably something like a 32-bit value, representing an integer that is way,
way bigger than n; this is the reason we apply the "mod n" operation to recover the name
of one of the caches.
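To make rule (1) concrete, here is a minimal Python sketch (not from the lecture) that uses MD5, mentioned above, as the hash function h; the helper names h and cache_for, and the example URL, are invented for illustration.

# A minimal sketch of assignment rule (1): store object x at cache h(x) mod n.
# MD5 (via hashlib) stands in for the hash function h; names are illustrative.
import hashlib

def h(x: str) -> int:
    """Hash a string to a large integer (here, 128 bits from MD5)."""
    return int.from_bytes(hashlib.md5(x.encode()).digest(), "big")

def cache_for(url: str, n: int) -> int:
    """Name of the cache responsible for this URL, under rule (1)."""
    return h(url) % n

print(cache_for("amazon.com", 100))  # one of the caches 0, 1, ..., 99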
The solution (1) of mapping URLs to caches is an excellent first cut, and it works great
in many cases. To motivate why we might need a different solution, suppose the number n
of servers is not static, but rather is changing over time. For example, in Akamai's early
days, they were focused on adding as many caches as possible all over the Internet, so n
was constantly increasing. Web caches can also fail or lose connection to the network, which
causes n to decrease. In a peer-to-peer context (see Section 1.5), n corresponds to the number
of nodes of the network, which is constantly changing as nodes join and depart.
Suppose we add a new cache and thereby bump up n from 100 to 101. For an object x, it
is very unlikely that h(x) mod 100 and h(x) mod 101 are the same number. Thus, changing
n forces almost all objects to relocate. This is a disaster for applications where n is constantly
changing. (footnote 4)
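For a quick sanity check of this claim, the following illustrative Python snippet counts, over a batch of made-up object names, how many keep the same cache when n goes from 100 to 101 under rule (1); with a good hash function, only about 1 in 101 stay put.

# Illustrative experiment: how many objects keep their cache when n goes 100 -> 101?
import hashlib

def h(x: str) -> int:
    return int.from_bytes(hashlib.md5(x.encode()).digest(), "big")

urls = [f"object-{i}.example.com" for i in range(10_000)]  # synthetic names
stay = sum(1 for x in urls if h(x) % 100 == h(x) % 101)
print(f"{stay / len(urls):.1%} of objects keep the same cache")  # roughly 1%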
Figure 2: Each element of the array above is a bucket of the hash table. Each object x is
assigned to the first server s on its right.
Figure 3: (Left) We glue 0 and 2^32 − 1 together, so that objects are instead assigned to the
server that is closest in the clockwise direction. This solves the problem of the last object
being to the right of the last server. (Right) Adding a new server s3. Object x2 moves from
s0 to s3.
The key idea is: in addition to hashing the names of all objects (URLs) x, like before,
we also hash the names of all the servers s. The object and server names need to be hashed
to the same range, such as 32-bit values.
To understand which objects are assigned to which servers, consider the array shown in
Figure 2, indexed by the possible hash values. (This array might be very big and it exists
only in our minds; we'll discuss the actual implementation shortly.) Imagine that we've
already hashed all the server names and made a note of them in the corresponding buckets.
Given an object x that hashes to the bucket h(x), we scan buckets to the right of h(x) until
we find a bucket h(s) to which the name of some server s hashes. (We wrap around the
array, if need be.) We then designate s as the server responsible for the object x.
This approach to consistent hashing can also be visualized on a circle, with points on the
circle corresponding to the possible hash values (Figure 3(left)). Servers and objects both
hash to points on this circle; an object is stored on the server that is closest in the clockwise
direction. Thus n servers partition the circle into n segments, with each server responsible
for all objects in one of these segments.
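The following Python sketch spells out this assignment rule by brute force: it hashes server and object names to the same 32-bit range and assigns each object to the first server hash at or after its own, wrapping around if necessary. It is deliberately naive (a linear scan over the servers); the efficient implementation is discussed below. The server names and helper functions are invented for illustration.

# Naive sketch of the consistent hashing rule: object x goes to the server s
# whose hash is the first one encountered clockwise from h(x).
import hashlib

def h(name: str) -> int:
    """Hash a name to a 32-bit value (the first 4 bytes of its MD5 digest)."""
    return int.from_bytes(hashlib.md5(name.encode()).digest()[:4], "big")

def responsible_server(obj: str, servers: list[str]) -> str:
    hx = h(obj)
    # Servers whose hashes lie clockwise from h(x) before wrapping around.
    ahead = [s for s in servers if h(s) >= hx]
    # If there are none, wrap around to the server with the smallest hash.
    return min(ahead, key=h) if ahead else min(servers, key=h)

servers = ["cache-0", "cache-1", "cache-2"]
print(responsible_server("amazon.com", servers))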
This simple idea leads to some nice properties. First, assuming reasonable hash functions,
by symmetry, the expected load on each of the n servers is exactly a 1/n fraction of the objects.
(There is non-trivial variance; below we explain how to reduce it via replication.) Second,
and more importantly, suppose we add a new server s: which objects have to move? Only
the objects stored at s. See Figure 3(right). Combined, these two observations imply that,
in expectation, adding the nth server causes only a 1/n fraction of the objects to relocate.
This is the best-case scenario if we want the load to be distributed evenly: clearly, the
objects on the new server have to move from where they were before. By contrast, with the
solution (1), on average only a 1/n fraction of the objects don't move when the nth server is
added! (footnote 6)
So how do we actually implement the standard hash table operations Lookup and Insert?
Given an object x, both operations boil down to the problem of efficiently implementing the
rightward/clockwise scan for the server s that minimizes h(s) subject to h(s) ≥ h(x). (footnote 7) Thus,
we want a data structure for storing the server names, with the corresponding hash values
as keys, that supports a fast Successor operation. A hash table isn't good enough (it doesn't
maintain any order information at all); a heap isn't good enough (it only maintains a partial
order so that identifying the minimum is fast); but recall that binary search trees, which
maintain a total ordering of the stored elements, do export a Successor function. (footnote 8) Since the
running time of this operation is linear in the depth of the tree, it's a good idea to use a
balanced binary search tree, such as a Red-Black tree. Finding the server responsible for
storing a given object x then takes O(log n) time, where n is the number of servers. (footnote 9)
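As a rough sketch of how this might look in code, the snippet below keeps the server hashes in a sorted Python list and uses binary search (the bisect module) as a stand-in for the balanced search tree's Successor operation; lookups are O(log n), although insertions into a plain list cost O(n), so a production system would use a tree or similar structure. The class and method names are invented for illustration.

# Sketch: successor lookups over server hashes via a sorted list + binary search.
# (A balanced BST would also make Insert O(log n); bisect.insort is O(n).)
import bisect
import hashlib

def h(name: str) -> int:
    return int.from_bytes(hashlib.md5(name.encode()).digest()[:4], "big")

class Ring:
    def __init__(self, servers):
        self._keys = []    # sorted hash values of the servers
        self._names = {}   # hash value -> server name
        for s in servers:
            self.insert(s)

    def insert(self, server: str) -> None:
        k = h(server)
        bisect.insort(self._keys, k)
        self._names[k] = server

    def lookup(self, obj: str) -> str:
        # Index of the first server hash >= h(obj); wrap around past the end.
        i = bisect.bisect_left(self._keys, h(obj))
        return self._names[self._keys[i % len(self._keys)]]

ring = Ring(["cache-0", "cache-1", "cache-2"])
print(ring.lookup("amazon.com"))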
Reducing the variance: While the expected load of each server is a 1/n fraction of the
objects, the realized load of each server will vary. Pictorially, if you pick n random points
on the circle, you're very unlikely to get a perfect partition of the circle into equal-sized
segments.
An easy way to decrease this variance is to make k virtual copies of each server s,
implemented by hashing its name with k different hash functions to get h1(s), . . . , hk(s).
(More on using multiple hash functions next lecture.) For example, with servers {0, 1, 2}
and k = 4, we choose 12 points on the circle: 4 labeled 0, 4 labeled 1, and 4 labeled 2.
(See Figure 4.) Objects are assigned as before: from h(x), we scan rightward/clockwise
until we encounter one of the hash values of some server s, and s is responsible for storing x.
Footnote 6: You might wonder how the objects actually get moved. There are several ways to do this, and the best
one depends on the context. For example, the new server could identify its successor and send a request
for the objects that hash to the relevant range. In the original Web caching context, one can get away with
doing nothing: a request for a Web page that was re-assigned from an original cache s to the new cache s'
will initially result in a cache miss, causing s' to download the page from the appropriate Web server and
cache it locally to service future requests. The copies of these Web pages that are at s will never be used
again (requests for these pages now go to s' instead), so they will eventually time out and be deleted from s.
Footnote 7: We ignore the wraparound case, which can be handled separately as an edge case.
Footnote 8: This operation is usually given short shrift in lectures on search trees, but it's exactly what we want
here!
Footnote 9: Our description, and the course in general, emphasizes fundamental concepts rather than the details
of an implementation. Our assumption is that you're perfectly capable of translating high-level ideas into
working code. A quick Web search for "consistent hashing python" or "consistent hashing java" yields some
example implementations.
Figure 4: Decreasing the variance by assigning each server multiple hash values.
By symmetry, each server still expects to get a 1/n fraction of the objects. This replication
increases the number of keys stored in the balanced binary search tree by a factor of k, but it
reduces the variance in load across servers significantly. Intuitively, some copies of a server
will get more objects than expected (more than a 1/(kn) fraction), but this will be largely
canceled out by other copies that get fewer objects than expected. Choosing k ≈ log2 n
is large enough to obtain reasonably balanced loads. We'll teach you some methods for
reasoning mathematically about such replication vs. variance trade-offs later in the course.
Virtual copies are also useful for dealing with heterogeneous servers that have different
capacities. The sensible approach is to make the number of virtual copies of a server propor-
tional to the server capacity; for example, if one server is twice as big as another, it should
have twice as many virtual copies.
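Here is a sketch of virtual copies along these lines. Instead of k genuinely different hash functions h1, ..., hk, it hashes the strings "s#0", ..., "s#(k-1)" (a common practical stand-in with a similar effect), and it gives each server a number of virtual copies proportional to a made-up capacity.

# Sketch of virtual copies with capacity weighting: server s with capacity k
# contributes k points on the circle, via the hashes of "s#0", ..., "s#(k-1)".
import bisect
import hashlib
from collections import Counter

def h(name: str) -> int:
    return int.from_bytes(hashlib.md5(name.encode()).digest()[:4], "big")

class WeightedRing:
    def __init__(self, copies: dict[str, int]):
        self._keys = []    # sorted hash values of all virtual copies
        self._names = {}   # hash value -> real server name
        for server, k in copies.items():
            for i in range(k):
                key = h(f"{server}#{i}")
                bisect.insort(self._keys, key)
                self._names[key] = server

    def lookup(self, obj: str) -> str:
        i = bisect.bisect_left(self._keys, h(obj))
        return self._names[self._keys[i % len(self._keys)]]

# A server with twice the capacity gets twice as many virtual copies.
ring = WeightedRing({"small-cache": 100, "big-cache": 200})
loads = Counter(ring.lookup(f"object-{i}") for i in range(30_000))
print(loads)  # expect big-cache to hold roughly twice as many objects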
3. March 31, 1999: A trailer for Star Wars: The Phantom Menace is released online,
with Apple the exclusive official distributor. apple.com goes down almost immediately
due to the overwhelming number of download requests. For a good part of the day,
the only place to watch (an unauthorized copy of?) the trailer is via Akamai's Web
caches. This put Akamai on the map.

Footnote 10: The concept of consistent hashing was also invented, more or less simultaneously, in [5]. The
implementation in [5] is different from and incomparable to the one in [2].
4. April 1, 1999: Steve Jobs, having noticed Akamai's performance the day before, calls
Akamai's President Paul Sagan to talk. Sagan hangs up on Jobs, thinking it's an April
Fools' prank by one of the co-founders, Danny Lewin or Tom Leighton.
5. September 11, 2001: Tragically, co-founder Danny Lewin is killed aboard the first
airplane that crashes into the World Trade Center. (Akamai remains highly relevant
to this day, however.)
6. 2001: Consistent hashing is re-purposed in [4] to address technical challenges that arise
in peer-to-peer (P2P) networks. A key issue in P2P networks is how to keep track of
where to look for a file, such as an mp3. This functionality is often called a distributed
hash table (DHT). DHTs were a very hot topic of research in the early years of the
21st century.
First-generation P2P networks (like Napster) solved this problem by having a cen-
tralized server keep track of where everything is. Such a network has a single point
of failure, and thus is also easy to shut down. Second-generation P2P networks (like
Gnutella) used broadcasting protocols so that everyone could keep track of where ev-
erything is. This is an expensive solution that does not scale well with the number
of nodes. Third-generation P2P networks, like Chord [4], use consistent hashing to
keep track of what's where. The key challenge is to implement the successor operation
discussed in Section 1.4 even though nobody is keeping track of the full set of servers.
The high-level idea in [4], which has been copied or refined in several subsequent P2P
networks, is that each machine should be responsible for keeping track of a small num-
ber of other machines in the network. An object search is then sent to the appropriate
machine using a clever routing protocol.
Consistent hashing remains in use in modern P2P networks, including for some features
of the BitTorrent protocol.
7. 2006: Amazon implements its internal Dynamo system using consistent hashing [1].
The goal of this system is to store tons of stuff using commodity hardware while
maintaining a very fast response time. As much data as possible is stored in main
memory, and consistent hashing is used to keep track of what's where.
This idea is now widely copied in modern lightweight alternatives to traditional databases
(the latter of which tend to reside on disk). Such alternatives generally support few
operations (e.g., no secondary keys) and relax traditional consistency requirements in
exchange for speed. As you can imagine, this is a big win for lots of modern Internet
companies.
2 About CS168
For the nuts and bolts of coursework, grading, etc., see the course Web site. Read on for an
overview of the course and our overarching goals and philosophy.
1. Modern hashing. This lecture's topic of consistent hashing is one example. Next lecture
we'll discuss how hash functions can be used to perform lossy compression through
data structures like Bloom filters and count-min sketches. The goal is to compress a
data set while approximately preserving properties such as set membership or frequency
counts.
2. The nearest neighbor problem and dimension reduction. Dimension reduction contin-
ues the theme of lossy compression: it's about compressing data while approximately
preserving similarity information (represented using distances). In the nearest neigh-
bor problem, you are given a point set (e.g., representing documents) and want to
preprocess it so that, given a query (e.g., representing a keyword search query), you
can quickly determine which point is closest to the query. This problem offers our first
method of understanding and exploring a data set.
give you the geometric intuition to make gradient descent obvious in hindsight, and
will see the method in action in the context of regression.
4. Linear algebra and spectral techniques. One could also call this topic "the unreasonable
effectiveness of sophomore-level linear algebra." This is a major topic, and it will
occupy us for three weeks. Many data sets are usefully interpreted as points in space
(and hence matrices, with the vectors forming the rows or the columns of a matrix).
For example, a document can be mapped to a vector of word frequencies. Graphs
(social networks, etc.) can also usefully be viewed as matrices in various ways. We'll
see that linear algebraic methods are incredibly useful for exposing the geometry
of a data set, and this allows one to see patterns in the data that would be otherwise
undetectable. Exhibit A is principal component analysis (PCA), which identifies
the most meaningful dimensions of a data set. We'll also cover the singular value
decomposition (SVD), which identifies low-rank structure and is useful for denoising
data and recovering missing data. Finally, we'll see how eigenvalues and eigenvectors
have shockingly meaningful interpretations in network data. Linear algebra is a much-
maligned topic in computer science circles, but hopefully the geometric intuition and
real-world applications we provide will bring the subject to life.
5. Sampling and estimation. It's often useful to view a data set as a sample from some
distribution or population. How many samples are necessary and sufficient before you
can make accurate inferences about the population? How can you estimate what you
dont know? Well also study the Markov Chain Monte Carlo method, by which you
can estimate what you cannot compute.
6. Alternative bases and the Fourier perspective. This topic continues the theme of how
a shift in perspective can illuminate otherwise undetectable patterns in data. For
example, some data has a temporal component (like audio or time-series data). Other
data has locality (nearby pixels of an image are often similar, same for measurements
by sensors). A naive representation of such data might have one point per moment
in time or per point in space. It can be far more informative to transform the data
into a dual representation, which rephrases the data in terms of patterns that occur
across time or across space. This is the point of the Fourier transform and other
similar-in-spirit transforms.
2.3 Course Goals and Themes
1. Our ambition is for this to be the coolest computer science course you've ever taken.
Seriously!
2. We also think of CS168 as a capstone course, meaning a course you take at the
conclusion of your major, after which the seemingly disparate skills and ideas learned
in previous courses can be recognized as a coherent and powerful whole. Even before
taking CS168 your computer science toolbox is rich enough to tackle timely and chal-
lenging problems, with consistent hashing being a fine example. After the course, your
toolbox will be richer still. Capstone courses in computer science are traditionally soft-
ware engineering courses; in comparison, CS168 will have a much stronger algorithmic
and conceptual bent.
3. We focus on general-purpose ideas that are not overly wedded to a particular appli-
cation domain, and are therefore potentially useful to as many of you as possible,
whatever your future trajectory (software engineer, data scientist, start-up founder,
PhD student, etc.).
4. If you forced us to pick the most prominent theme of the course, it would probably
be how to be smart with your data. This has several aspects: how to be smart about
storing it (like in this lecture), about what to throw out and what to retain, about how
to transform it, visualize it, analyze it, etc. After completing CS168, you will be an
educated client of the modern tools for performing all of these tasks.
References
[1] G. DeCandia, D. Hastorun, M. Jampani, G. Kakulapati, A. Lakshman, A. Pilchin,
S. Sivasubramanian, P. Vosshall, and W. Vogels. Dynamo: Amazon's highly available
key-value store. SIGOPS Operating Systems Review, 41(6):205–220, 2007.

[2] D. Karger, E. Lehman, T. Leighton, R. Panigrahy, M. Levine, and D. Lewin. Consistent
hashing and random trees: Distributed caching protocols for relieving hot spots on the
World Wide Web. In Proceedings of the Twenty-Ninth Annual ACM Symposium on Theory
of Computing (STOC), pages 654–663, 1997.

[3] J. Lamping and E. Veach. A fast, minimal memory, consistent hash algorithm.
arXiv:1406.2294, 2014.

[4] I. Stoica, R. Morris, D. Liben-Nowell, D. R. Karger, M. F. Kaashoek, F. Dabek, and
H. Balakrishnan. Chord: A scalable peer-to-peer lookup protocol for Internet applica-
tions. IEEE/ACM Transactions on Networking, 11(1):17–32, 2003.

[5] D. G. Thaler and C. V. Ravishankar. Using name-based mappings to increase hit rates.
IEEE/ACM Transactions on Networking, 6(1):1–14, 1998.