0% found this document useful (0 votes)

10 views28 pages

Consistent Hashihhhhng Explained

Uploaded by

Krishna Kumar

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

10 views28 pages

Consistent Hashihhhhng Explained

Uploaded by

Krishna Kumar

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 28

Consistent Hashing Explained

systemdesign.one/consistent-hashing-explained

Distributed Hashing Technique

NK included in Algorithm

Consistent hashing is used in the system design of distributed systems such as the URL
shortener, and Pastebin

How does consistent hashing work?

At a high level, consistent hashing performs the following operations:

1. The output of the hash function is placed on a virtual ring structure (known as the
hash ring)
2. The hashed IP addresses of the nodes are used to assign a position for the nodes
on the hash ring
3. The key of a data object is hashed using the same hash function to find the position
of the key on the hash ring
4. The hash ring is traversed in the clockwise direction starting from the position of the
key until a node is found

1/28
5. The data object is stored or retrieved from the node that was found

Terminology
The following terminology might be useful for you:

Node: a server that provides functionality to other services

Hash function: a mathematical function used to map data of arbitrary size to fixed-
size values
Data partitioning: a technique of distributing data across multiple nodes to improve
the performance and scalability of the system
Data replication: a technique of storing multiple copies of the same data on different
nodes to improve the availability and durability of the system
Hotspot: A performance-degraded node in a distributed system due to a large share
of data storage and a high volume of retrieval or storage requests
Gossip protocol: peer-to-peer communication technique used by nodes to
periodically exchange state information

Requirements

Functional Requirements

Design an algorithm to horizontally scale the cache servers

The algorithm must minimize the occurrence of hot spots in the network
The algorithm must be able to handle internet-scale dynamic load
The algorithm must reuse existing network protocols such as TCP/IP

Non-Functional Requirements
Scalable
High availability
Low latency
Reliable

Further system design learning resources

Books are a powerful medium to gather new knowledge. Check out the following books to
set yourself up for success in the system design interview:

Introduction
A website can become extremely popular in a relatively short time frame. The increased
load might swamp and degrade the performance of the website. The cache server is used
to improve the latency and reduce the load on the system. The cache servers must scale

2/28
to meet the dynamic demand as a fixed collection of cache servers will not be able to
handle the dynamic load. In addition, the occurrence of multiple cache misses might
swamp the origin server.

Figure 1: Cache replication

The replication of the cache improves the availability of the system. However, replication
of the cache does not solve the dynamic load problem as only a limited data set can be
cached1. The tradeoffs of the cache replication approach are the following:

only a limited data set is cached

consistency between cache replicas is expensive to maintain

The spread is the number of cache servers holding the same key-value pair (data
object). The load is the number of distinct data objects assigned to a cache server. The
optimal configuration for the high performance of a cache server is to keep the spread
and the load at a minimum2.

3/28
Figure 2: Dynamic hashing

The data set must be partitioned (shard) among multiple cache servers (nodes) to
horizontally scale. The replication and partitioning of nodes are orthogonal to each other.
Multiple data partitions are stored on a single node for improved fault tolerance and
increased throughput. The reasons for partitioning are the following1:

a cache server is memory bound

increased throughput

Partitioning
The data set is partitioned among multiple nodes to horizontally scale out. The different
techniques for partitioning the cache servers are the following1:

Random assignment
Single global cache
Key range partitioning
Static hash partitioning
Consistent hashing

Random assignment

4/28
Figure 3: Partitioning; Random assignment

The server distributes the data objects randomly across the cache servers. The random
assignment of a large data set results in a relatively uniform distribution of data. However,
the client cannot easily identify the node to retrieve the data due to random distribution. In
conclusion, the random assignment solution will not scale to handle the dynamic load.

Single global cache

5/28
Figure 4: Partitioning; Single global cache

The server stores the whole data set on a single global cache server. The data objects
are easily retrieved by the client at the expense of degraded performance and decreased
availability of the system. In conclusion, the single global cache solution will not scale to
handle the dynamic load.

Key range partitioning

Figure 5: Key range partitioning

The cache servers are partitioned using the key range of the data set. The client can
easily retrieve the data from cache servers. The data set is not necessarily uniformly
distributed among the cache servers as there might be more keys in a certain key range.
In conclusion, the key range partitioning solution will not scale to handle the dynamic
load.

6/28
Static hash partitioning

Figure 6: Static hash partitioning

The identifiers (internet protocol address or domain name) of the nodes are placed on an
array of length N. The modulo hash service computes the hash of the data key and
executes a modulo N operation to locate the array index (node identifier) to store or
retrieve a key. The time complexity to locate a node identifier (ID) in static hash
partitioning is constant O(1).

Static hash partitioning

node ID = hash(key) mod N

where N is the array’s length and the key is the key of the data object.

A collision occurs when multiple nodes are assigned to the same position on the array.
The techniques to resolve a collision are open addressing and chaining. The occurrence
of a collision degrades the time complexity of the cache nodes.

7/28
Figure 7: Static hash partitioning; Node failure

The static hash partitioning is not horizontally scalable. The removal of a node (due to a
server crash) breaks the existing mappings between the keys and nodes. The keys must
be rehashed to restore mapping between keys and nodes3.

8/28
Figure 8: Static hash partitioning; Node added

New nodes must be provisioned to handle the increasing load. The addition of a node
breaks the existing mappings between the keys and nodes. The following are the
drawbacks of static hash partitioning:

nodes will not horizontally scale to handle the dynamic load

the addition or removal of a node breaks the mapping between keys and nodes
massive data movement when the number of nodes changes

Figure 9: Static hash partitioning; Data movement due to node failure

In conclusion, the data set must be rehashed or moved between nodes when the number
of nodes changes. The majority of the requests in the meantime will result in cache
misses. The requests are delegated to the origin server on cache misses. The heavy load
on the origin server might swamp and degrade the service3.

Consistent hashing

9/28
Figure 10: Consistent hashing

Consistent hashing is a distributed systems technique that operates by assigning the data
objects and nodes a position on a virtual ring structure (hash ring). Consistent hashing
minimizes the number of keys to be remapped when the total number of nodes changes4.

Figure 11: Hash function mapping

The basic gist behind the consistent hashing algorithm is to hash both node identifiers
and data keys using the same hash function. A uniform and independent hashing function
such as message-digest 5 (MD5) is used to find the position of the nodes and keys (data

10/28
objects) on the hash ring. The output range of the hash function must be of reasonable
size to prevent collisions.

Figure 12: Consistent hash ring

The output space of the hash function is treated as a fixed circular space to form the hash
ring. The largest hash value wraps around the smallest hash value. The hash ring is
considered to have a finite number of positions5.

Figure 13: Consistent hashing; Positioning the nodes on the hash ring

The following operations are executed to locate the position of a node on the hash ring4:

11/28
1. Hash the internet protocol (IP) address or domain name of the node using a hash
function
2. The hash code is base converted
3. Modulo the hash code with the total number of available positions on the hash ring

Figure 14: Consistent hashing; Node position

Suppose the hash function produces an output space size of 10 bits (2¹⁰ = 1024), the
hash ring formed is a virtual circle with a number range starting from 0 to 1023. The
hashed value of the IP address of a node is used to assign a location for the node on the
hash ring.

12/28
Figure 15: Consistent hashing; Storing a data object (key)

The key of the data object is hashed using the same hash function to locate the position
of the key on the hash ring. The hash ring is traversed in the clockwise direction starting
from the position of the key until a node is found. The data object is stored on the node
that was found. In simple words, the first node with a position value greater than the
position of the key stores the data object6.

13/28
Figure 16: Consistent hashing; Retrieving a data object (key)

The key of the data object is hashed using the same hash function to locate the position
of the key on the hash ring. The hash ring is traversed in the clockwise direction starting
from the position of the key until a node is found. The data object is retrieved from the
node that was found. In simple words, the first node with a position value greater than the
position of the key must hold the data object.

Each node is responsible for the region on the ring between the node and its predecessor
node on the hash ring. The origin server must be queried on a cache miss. In conclusion,
the following operations are performed for consistent hashing7:

1. The output of the hash function such as MD5 is placed on the hash ring
2. The IP address of the nodes is hashed to find the position of the nodes on the hash
ring
3. The key of the data object is hashed using the same hash function to locate the
position of the key on the hash ring
4. Traverse the hash ring in the clockwise direction starting from the position of the key
until the next node to identify the correct node to store or retrieve the data object

14/28
Figure 17: Consistent hashing; Deletion of a node

The failure (crash) of a node results in the movement of data objects from the failed node
to the immediate neighboring node in the clockwise direction. The remaining nodes on
the hash ring are unaffected5.

Figure 18: Consistent hashing; Addition of a node

When a new node is provisioned and added to the hash ring, the keys (data objects) that
fall within the range of the new node are moved out from the immediate neighboring node
in the clockwise direction.

Consistent hashing

Average number of keys stored on a node = k/N

where k is the total number of keys (data objects) and N is the number of nodes.

15/28
The deletion or addition of a node results in the movement of an average number of keys
stored on a single node. Consistent hashing aid cloud computing by minimizing the
movement of data when the total number of nodes changes due to dynamic load8.

Figure 19: Consistent hashing; Non-uniform positioning of nodes

There is a chance that nodes are not uniformly distributed on the consistent hash ring.
The nodes that receive a huge amount of traffic become hot spots resulting in cascading
failure of the nodes.

16/28
Figure 20: Consistent hashing; Virtual nodes

The nodes are assigned to multiple positions on the hash ring by hashing the node IDs
through distinct hash functions to ensure uniform distribution of keys among the nodes.
The technique of assigning multiple positions to a node is known as a virtual node. The
virtual nodes improve the load balancing of the system and prevent hot spots. The
number of positions for a node is decided by the heterogeneity of the node. In other
words, the nodes with a higher capacity are assigned more positions on the hash ring5.

The data objects can be replicated on adjacent nodes to minimize the data movement
when a node crashes or when a node is added to the hash ring. In conclusion, consistent
hashing resolves the problem of dynamic load.

Consistent hashing implementation

Figure 21: Consistent hashing implementation; Binary search tree storing the node positions

17/28
The self-balancing binary search tree (BST) data structure is used to store the positions
of the nodes on the hash ring. The BST offers logarithmic O(log n) time complexity for
search, insert, and delete operations. The keys of the BST contain the positions of the
nodes on the hash ring.

The BST data structure is stored on a centralized highly available service. As an

alternative, the BST data structure is stored on each node, and the state information
between the nodes is synchronized through the gossip protocol7.

Figure 22: Consistent hashing implementation; Insertion of a data object (key)

In the diagram, suppose the hash of an arbitrary key ‘xyz’ yields the hash code output 5.
The successor BST node is 6 and the data object with the key ‘xyz’ is stored on the node
that is at position 6. In general, the following operations are executed to insert a key (data
object):

1. Hash the key of the data object

2. Search the BST in logarithmic time to find the BST node immediately greater than
the hashed output
3. Store the data object in the successor node

18/28
Figure 23: Consistent hashing implementation; Insertion of a node

The insertion of a new node results in the movement of data objects that fall within the
range of the new node from the successor node. Each node might store an internal or an
external BST to track the keys allocated in the node. The following operations are
executed to insert a node on the hash ring:

1. Insert the hash of the node ID in BST in logarithmic time

2. Identify the keys that fall within the subrange of the new node from the successor
node on BST
3. Move the keys to the new node

Figure 24: Consistent hashing implementation; Deletion of a node

The deletion of a node results in the movement of data objects that fall within the range of
the decommissioned node to the successor node. An additional external BST can be
used to track the keys allocated in the node. The following operations are executed to
delete a node on the hash ring:

1. Delete the hash of the decommissioned node ID in BST in logarithmic time

2. Identify the keys that fall within the range of the decommissioned node
3. Move the keys to the successor node

19/28
What is the asymptotic complexity of consistent hashing?
The asymptotic complexity of consistent hashing operations are the following:

Time
Operation Complexity Description

Add a node O(k/n + logn) O(k/n) for redistribution of keys O(logn) for binary
search tree traversal

Remove a O(k/n + logn) O(k/n) for redistribution of keys O(logn) for binary
node search tree traversal

Add a key O(logn) O(logn) for binary search tree traversal

Remove a O(logn) O(logn) for binary search tree traversal

key

where k = total number of keys, n = total number of nodes 2, 7.

How to handle concurrency in consistent hashing?

The BST that stores the positions of the nodes is a mutable data structure that must be
synchronized when multiple nodes are added or removed at the same time on the hash
ring. The readers-writer lock is used to synchronize BST at the expense of a slight
increase in latency.

What hash functions are used in consistent hashing?

An optimal hash function for consistent hashing must be fast and produce uniform output.
The cryptographic hash functions such as MD5, and the secure hash algorithms SHA-1
and SHA-256 are not relatively fast. MurmurHash is a relatively cheaper hash function.
The non-cryptographic hash functions like xxHash, MetroHash, or SipHash1–3 are other
potential candidates6.

What are the benefits of consistent hashing?

The following are the advantages of consistent hashing3:

horizontally scalable
minimized data movement when the number of nodes changes
quick replication and partitioning of data

The following are the advantages of virtual nodes5:

load handled by a node is uniformly distributed across the remaining available

nodes during downtime

20/28
the newly provisioned node accepts an equivalent amount of load from the available
nodes
fair distribution of load among heterogenous nodes

What are the drawbacks of consistent hashing?

The following are the disadvantages of consistent hashing5:

cascading failure due to hot spots

non-uniform distribution of nodes and data
oblivious to the heterogeneity in the performance of nodes

The following are the disadvantages of virtual nodes 5, 6, 8:

when a specific data object becomes extremely popular, consistent hashing will still
send all the requests for the popular data object to the same subset of nodes
resulting in a degradation of the service
capacity planning is trickier with virtual nodes
memory costs and operational complexity increase due to the maintenance of BST
replication of data objects is challenging due to the additional logic to identify the
distinct physical nodes
downtime of a virtual node affects multiple nodes on the ring

What are the consistent hashing examples?

21/28
Figure 25: Consistent hashing example: Discord

The discord server (discord space or chat room) is hosted on a set of nodes. The client of
the discord chat application identifies the set of nodes that hosts a specific discord server
using consistent hashing9.

22/28
Figure 26: Consistent hashing example: Amazon Dynamo

The distributed NoSQL data stores such as Amazon DynamoDB, Apache Cassandra,
and Riak use consistent hashing to dynamically partition the data set across the set of
nodes. The data is partitioned for incremental scalability5.

23/28
Figure 27: Consistent hashing example: Vimeo

The video storage and streaming service Vimeo uses consistent hashing for load
balancing the traffic to stream videos8.

24/28
Figure 28: Consistent hashing example: Netflix

The video streaming service Netflix uses consistent hashing to distribute the uploaded
video content across the content delivery network (CDN)10.

Consistent hashing algorithm real-world implementation

The clients of Memcached (Ketama), and Amazon Dynamo support consistent hashing
out of the box 3, 11. The HAProxy includes the bounded-load consistent hashing algorithm
for load balancing the traffic8. As an alternative, the consistent hashing algorithm can be
implemented, in the language of choice.

Consistent hashing optimization

Some of the popular variants of consistent hashing are the following:

Multi-probe consistent hashing

Consistent hashing with bounded loads

Multi-probe consistent hashing

25/28
Figure 29: Consistent hashing optimization; Multi-probe consistent hashing

The Multi-probe consistent hashing offers linear O(n) space complexity to store the
positions of nodes on the hash ring. There are no virtual nodes but a node is assigned
only a single position on the hash ring. The amortized time complexity for the addition and
removal of nodes is constant O(1). However, the key (data object) lookups are relatively
slower.

The basic gist of multi-probe consistent hashing is to hash the key (data object) multiple
times using distinct hash functions on lookup and the closest node in the clockwise
direction returns the data object12.

Consistent hashing with bounded loads

26/28
Figure 30: Consistent hashing optimization; Bounded-load consistent hashing

The consistent hashing with bounded load puts an upper limit on the load received by a
node on the hash ring, relative to the average load of the whole hash ring. The
distribution of requests is the same as consistent hashing as long as the nodes are not
overloaded13.

When a specific data object becomes extremely popular, the node hosting the data object
receives a significant amount of traffic resulting in the degradation of the service. If a node
is overloaded, the incoming request is delegated to a fallback node. The list of fallback
nodes will be the same for the same request hash. In simple words, the same node(s) will
consistently be the “second choice” for a popular data object. The fallback nodes resolve
the popular data object caching problem.

If a node is overloaded, the list of the fallback nodes will usually be different for different
request hashes. In other words, the requests to an overloaded node are distributed
among the available nodes instead of a single fallback node.

Summary
Consistent hashing is popular among distributed systems. The most common use cases
of consistent hashing are data partitioning and load balancing.

References

27/28
1. Lindsey Kuper, UC Santa Cruz CSE138 (Distributed Systems) Lecture 15:
introduction to sharding; consistent hashing (2021), youtube.com ↩︎

2. David Karger, Eric Lehman, Tom Leighton, Rina Panigrahy, Matthew Levine, Daniel
Lewin, Consistent Hashing and Random Trees: Distributed Caching Protocols for
Relieving Hot Spots on the World Wide Web (1997), GitHub.com ↩︎

3. Tom White, Consistent Hashing (2007), tom-e-white.com ↩︎

4. Srushtika Neelakantam, Consistent hashing explained (2018), ably.com ↩︎

5. Giuseppe DeCandia, Deniz Hastorun, Madan Jampani, Gunavardhan Kakulapati,

Avinash Lakshman, Alex Pilchin, Swaminathan Sivasubramanian, Peter Vosshall
and Werner Vogels, Dynamo: Amazon’s Highly Available Key-value Store (2007),
GitHub.com ↩︎

6. Damian Gryski, Consistent Hashing: Algorithmic Tradeoffs (2018), medium.com ↩︎

7. MIT 6.854 Spring 2016 Lecture 3: Consistent Hashing and Random Trees (2016),
mit.edu ↩︎

8. Improving load balancing with a new consistent-hashing algorithm (2016), Vimeo

Engineering Blog ↩︎

9. Stanislav Vishnevskiy, How discord scaled elixir to 5,000,000 concurrent users

(2017), discord.com ↩︎

10. Mohit Vora, Andrew Berglund, Videsh Sadafal, David Pfitzner, and Ellen Livengood,
Distributing Content to Open Connect (2017), netflixtechblog.com ↩︎

11. libketama - a consistent hashing algo for Memcache clients (2007), last.fm ↩︎

12. Ben Appleton, Michael O’Reilly, Multi-Probe Consistent Hashing (2015), arxiv.org ↩︎

13. Vahab Mirrokni, Mikkel Thorup, and Morteza Zadimoghaddam, Consistent Hashing
with Bounded Loads (2017), arxiv.org ↩︎

28/28

The Elephant in The Fridge - Gui - John Giles
100% (2)
The Elephant in The Fridge - Gui - John Giles
323 pages
Answer: D: Exam Name: Exam Type: Exam Code: Total Questions
0% (1)
Answer: D: Exam Name: Exam Type: Exam Code: Total Questions
39 pages
AWS Certified Solutions Architect - Professional
From Everand
AWS Certified Solutions Architect - Professional
VB Dev
No ratings yet
Consistent Hashing
No ratings yet
Consistent Hashing
2 pages
28 Consistent Hashing
No ratings yet
28 Consistent Hashing
6 pages
06 Design Consistent Hashing
No ratings yet
06 Design Consistent Hashing
13 pages
Consistent Hashing With Bounded Load
No ratings yet
Consistent Hashing With Bounded Load
9 pages
Consistent Hashing - Explanation and Implementation
No ratings yet
Consistent Hashing - Explanation and Implementation
7 pages
Consistent Hashing
No ratings yet
Consistent Hashing
19 pages
Data Partitioning
No ratings yet
Data Partitioning
39 pages
Design Consistent Hashing: The Rehashing Problem
No ratings yet
Design Consistent Hashing: The Rehashing Problem
2 pages
Presentation 7
No ratings yet
Presentation 7
16 pages
Dxhash: A Scalable Consistent Hashing Based On The Pseudo-Random Sequence
No ratings yet
Dxhash: A Scalable Consistent Hashing Based On The Pseudo-Random Sequence
12 pages
15-440 Distributed Systems: Hashing and Cdns
No ratings yet
15-440 Distributed Systems: Hashing and Cdns
38 pages
005 - Data Partitioning - Grokking The Advanced System Design Interview - WWW - Educative.io
No ratings yet
005 - Data Partitioning - Grokking The Advanced System Design Interview - WWW - Educative.io
5 pages
Lec14 Dhts
No ratings yet
Lec14 Dhts
22 pages
Binomial
No ratings yet
Binomial
14 pages
Memcached & Consistent Hashing: Distributing Keys & Values
No ratings yet
Memcached & Consistent Hashing: Distributing Keys & Values
3 pages
Class 7 - Scaling, Sharding, Consistent Hashing
No ratings yet
Class 7 - Scaling, Sharding, Consistent Hashing
4 pages
It Is A Very Efficient Method To Search The Exact Data Items Based On Hash Table
No ratings yet
It Is A Very Efficient Method To Search The Exact Data Items Based On Hash Table
49 pages
ADBMS - Unit3 (Autosaved)
No ratings yet
ADBMS - Unit3 (Autosaved)
12 pages
Hashing
No ratings yet
Hashing
4 pages
Mod 5
No ratings yet
Mod 5
13 pages
Data and File Structures: Hashing
No ratings yet
Data and File Structures: Hashing
24 pages
Hashing Muddasir
No ratings yet
Hashing Muddasir
43 pages
DFS OS Final
No ratings yet
DFS OS Final
28 pages
ECS781P-9-Cloud Data Management
No ratings yet
ECS781P-9-Cloud Data Management
79 pages
Unit Iv
No ratings yet
Unit Iv
6 pages
CS168: The Modern Algorithmic Toolbox Lecture #1: Introduction and Consistent Hashing
No ratings yet
CS168: The Modern Algorithmic Toolbox Lecture #1: Introduction and Consistent Hashing
11 pages
A Crash Course in Caching - Part 2 - by Alex Xu
No ratings yet
A Crash Course in Caching - Part 2 - by Alex Xu
9 pages
MGS 655: Distributed Computing: Consistent Hashing
No ratings yet
MGS 655: Distributed Computing: Consistent Hashing
8 pages
Foundational Concepts in System Design - Part 2
No ratings yet
Foundational Concepts in System Design - Part 2
25 pages
Hashing in Networked Systems: Mike Freedman
No ratings yet
Hashing in Networked Systems: Mike Freedman
24 pages
DSAD Dynamic Hashing
No ratings yet
DSAD Dynamic Hashing
79 pages
Unit Iv Implementation Techniques
No ratings yet
Unit Iv Implementation Techniques
91 pages
System Design
No ratings yet
System Design
32 pages
Assignment 6
No ratings yet
Assignment 6
5 pages
Unit 3.docx Dbms
No ratings yet
Unit 3.docx Dbms
25 pages
System Design Importnat Concepts
No ratings yet
System Design Importnat Concepts
16 pages
Hashing
No ratings yet
Hashing
16 pages
System Caching - Part 2
No ratings yet
System Caching - Part 2
7 pages
DHTLookup
No ratings yet
DHTLookup
56 pages
Hashing in DBMS
No ratings yet
Hashing in DBMS
6 pages
DBMS Chapter 4 Record Organization and Dile Management
No ratings yet
DBMS Chapter 4 Record Organization and Dile Management
36 pages
Where To Leave The Data ?: - Parallel Systems - Scalable Distributed Data Structures - Dynamic Hash Table (P2P)
No ratings yet
Where To Leave The Data ?: - Parallel Systems - Scalable Distributed Data Structures - Dynamic Hash Table (P2P)
39 pages
Where To Leave The Data ?: - Parallel Systems - Scalable Distributed Data Structures - Dynamic Hash Table (P2P)
No ratings yet
Where To Leave The Data ?: - Parallel Systems - Scalable Distributed Data Structures - Dynamic Hash Table (P2P)
39 pages
Unit 3 File Organization
No ratings yet
Unit 3 File Organization
19 pages
Anchorhash: A Scalable Consistent Hash
No ratings yet
Anchorhash: A Scalable Consistent Hash
12 pages
Unit V
No ratings yet
Unit V
14 pages
02 235211 042 11951087560 09062024 104438pm
No ratings yet
02 235211 042 11951087560 09062024 104438pm
7 pages
CNG351 Lecture 11 Part 2
No ratings yet
CNG351 Lecture 11 Part 2
32 pages
Edge Computing
No ratings yet
Edge Computing
5 pages
Data Structure Unit II
No ratings yet
Data Structure Unit II
25 pages
Ch11 Hash Indexes 1perpage Annotated
No ratings yet
Ch11 Hash Indexes 1perpage Annotated
28 pages
Hashing
No ratings yet
Hashing
34 pages
File Organization in DBMS
No ratings yet
File Organization in DBMS
10 pages
Leveraging Consistent Hashing in Your Python Applications
No ratings yet
Leveraging Consistent Hashing in Your Python Applications
39 pages
Assignment 3
No ratings yet
Assignment 3
53 pages
Hashing
No ratings yet
Hashing
4 pages
Unit-3 Hashing Storage Btree
No ratings yet
Unit-3 Hashing Storage Btree
26 pages
Algomasterio System Design Interview Handbook
No ratings yet
Algomasterio System Design Interview Handbook
19 pages
The Ceph Handbook: Building and Managing Scalable Distributed Storage Systems
From Everand
The Ceph Handbook: Building and Managing Scalable Distributed Storage Systems
Robert Johnson
No ratings yet
Becoming The Mentor You Wish You Had A Guide For Software Engineers
No ratings yet
Becoming The Mentor You Wish You Had A Guide For Software Engineers
7 pages
Bhargava2 v-4 ScrambledConteer-11
No ratings yet
Bhargava2 v-4 ScrambledConteer-11
10 pages
Roadmap of UML
No ratings yet
Roadmap of UML
39 pages
Java Enums PDF
No ratings yet
Java Enums PDF
3 pages
Surreptitious Software Book
No ratings yet
Surreptitious Software Book
13 pages
Unit-4: Hashing & File Structure (Hashing)
100% (1)
Unit-4: Hashing & File Structure (Hashing)
26 pages
Ethereum Yellow Paper
No ratings yet
Ethereum Yellow Paper
41 pages
Kaspersky Lab Whitepaper Machine Learning
No ratings yet
Kaspersky Lab Whitepaper Machine Learning
17 pages
Analysis of Algorithms CS 477/677: Hashing Instructor: George Bebis
No ratings yet
Analysis of Algorithms CS 477/677: Hashing Instructor: George Bebis
53 pages
Practical Assignment 1
No ratings yet
Practical Assignment 1
5 pages
Realeft Developer'S Guide With XML Definitions
No ratings yet
Realeft Developer'S Guide With XML Definitions
39 pages
HashCode and Equals
No ratings yet
HashCode and Equals
3 pages
Task 3 Hashing Quadratic Probing
No ratings yet
Task 3 Hashing Quadratic Probing
7 pages
Mathematics Behind Blockchain - Pooja Warke
No ratings yet
Mathematics Behind Blockchain - Pooja Warke
42 pages
HANA Memory OutOfMemoryEvents 1.00.120+
No ratings yet
HANA Memory OutOfMemoryEvents 1.00.120+
9 pages
E395923X TM112 TMAParthShah E395923X TM112
No ratings yet
E395923X TM112 TMAParthShah E395923X TM112
15 pages
Literature Review On Orange
100% (2)
Literature Review On Orange
6 pages
File Management (Plattech)
100% (1)
File Management (Plattech)
3 pages
Unit I NS
No ratings yet
Unit I NS
59 pages
Data Quality and Data Cleaning: An Overview
0% (1)
Data Quality and Data Cleaning: An Overview
132 pages
Winnowing
No ratings yet
Winnowing
15 pages
Quantum Operating Systems
No ratings yet
Quantum Operating Systems
6 pages
Data Structure Full Book
100% (2)
Data Structure Full Book
361 pages
AADS Exam Note 2022
No ratings yet
AADS Exam Note 2022
12 pages
Chapter 6 - DS
No ratings yet
Chapter 6 - DS
67 pages
Oracle Partitioning Interview Questions and Answers
0% (1)
Oracle Partitioning Interview Questions and Answers
3 pages
Windows Powershell 4.0 Language Quick Reference: Useful Commands
No ratings yet
Windows Powershell 4.0 Language Quick Reference: Useful Commands
4 pages
Lecture 15
No ratings yet
Lecture 15
4 pages
Inmetro Software Requirements-586-12
No ratings yet
Inmetro Software Requirements-586-12
13 pages
DBMS End Term Question Paper April 2025
No ratings yet
DBMS End Term Question Paper April 2025
15 pages
Weak and Bright Student Assignment
No ratings yet
Weak and Bright Student Assignment
4 pages
Java Tutorial Part 3
No ratings yet
Java Tutorial Part 3
46 pages

Consistent Hashihhhhng Explained

Uploaded by

Consistent Hashihhhhng Explained

Uploaded by

Consistent Hashing Explained

Distributed Hashing Technique

How does consistent hashing work?

Node: a server that provides functionality to other services

Design an algorithm to horizontally scale the cache servers

Further system design learning resources

Figure 1: Cache replication

only a limited data set is cached

a cache server is memory bound

Single global cache

Key range partitioning

Figure 5: Key range partitioning

Figure 6: Static hash partitioning

Static hash partitioning

node ID = hash(key) mod N

nodes will not horizontally scale to handle the dynamic load

Figure 9: Static hash partitioning; Data movement due to node failure

Figure 11: Hash function mapping

Figure 12: Consistent hash ring

Figure 14: Consistent hashing; Node position

Figure 18: Consistent hashing; Addition of a node

Average number of keys stored on a node = k/N

Figure 19: Consistent hashing; Non-uniform positioning of nodes

Consistent hashing implementation

The BST data structure is stored on a centralized highly available service. As an

Figure 22: Consistent hashing implementation; Insertion of a data object (key)

1. Hash the key of the data object

1. Insert the hash of the node ID in BST in logarithmic time

Figure 24: Consistent hashing implementation; Deletion of a node

1. Delete the hash of the decommissioned node ID in BST in logarithmic time

Add a key O(logn) O(logn) for binary search tree traversal

Remove a O(logn) O(logn) for binary search tree traversal

where k = total number of keys, n = total number of nodes 2, 7.

How to handle concurrency in consistent hashing?

What hash functions are used in consistent hashing?

What are the benefits of consistent hashing?

The following are the advantages of virtual nodes5:

load handled by a node is uniformly distributed across the remaining available

What are the drawbacks of consistent hashing?

cascading failure due to hot spots

The following are the disadvantages of virtual nodes 5, 6, 8:

What are the consistent hashing examples?

Consistent hashing algorithm real-world implementation

Consistent hashing optimization

Multi-probe consistent hashing

Multi-probe consistent hashing

Consistent hashing with bounded loads

3. Tom White, Consistent Hashing (2007), tom-e-white.com ↩︎

4. Srushtika Neelakantam, Consistent hashing explained (2018), ably.com ↩︎

5. Giuseppe DeCandia, Deniz Hastorun, Madan Jampani, Gunavardhan Kakulapati,

6. Damian Gryski, Consistent Hashing: Algorithmic Tradeoffs (2018), medium.com ↩︎

8. Improving load balancing with a new consistent-hashing algorithm (2016), Vimeo

9. Stanislav Vishnevskiy, How discord scaled elixir to 5,000,000 concurrent users

You might also like