28 Consistent Hashing
A hash function maps each key to the location (index) where its value is stored: index = hash_function(key).
Suppose we are designing a distributed caching system. Given ‘n’ cache servers,
an intuitive hash function would be ‘key % n’. It is simple and commonly used.
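But it has two major drawbacks:
1. It is not horizontally scalable. Whenever a cache server is added or removed, ‘n’ changes and almost every key maps to a different server, so nearly the whole cache must be remapped.
2. It may not be load balanced, especially when the data is not uniformly distributed: some caches can end up hot and saturated while others sit almost empty.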
In Consistent Hashing, when the hash table is resized (e.g., a new cache host is
added to the system), only about ‘k/n’ keys need to be remapped on average, where
‘k’ is the total number of keys and ‘n’ is the total number of servers. Recall that
in a caching system using ‘mod’ as the hash function, nearly all keys need to be remapped.
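A quick way to see the ‘mod’ problem is to count how many keys change servers when a fifth cache joins four. This sketch assumes MD5 truncated to 64 bits as the hash (so runs are repeatable) and made-up key names; ‘stable_hash’, ‘mod_server’ and ‘fraction_remapped’ are illustrative helpers, not part of any library.

```python
import hashlib


def stable_hash(key: str) -> int:
    """Deterministic 64-bit hash (Python's built-in hash() of str is salted per process)."""
    return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")


def mod_server(key: str, n: int) -> int:
    """Naive placement from the text: server index = hash(key) % n."""
    return stable_hash(key) % n


def fraction_remapped(keys: list[str], n_before: int, n_after: int) -> float:
    """Share of keys whose server index changes when the server count changes."""
    moved = sum(1 for k in keys if mod_server(k, n_before) != mod_server(k, n_after))
    return moved / len(keys)


keys = [f"key-{i}" for i in range(20_000)]
print(f"4 -> 5 servers: {fraction_remapped(keys, 4, 5):.1%} of keys move")
```

A key stays on its server only when hash % 4 equals hash % 5, which holds for 4 of every 20 residues, so about 80% of the keys should move; in general, going from n to n + 1 servers moves about n/(n + 1) of them.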
In Consistent Hashing, objects are mapped to the same host whenever possible. When a
host is removed from the system, its objects are taken over by other hosts; when a
new host is added, it takes its share from a few hosts without touching the
others’ shares.
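Consistent Hashing places both the keys and the cache servers on a ring of hash
values; each key is stored on the first server reached by moving clockwise from the
key's position. When a new server, say D, is added, it takes over part of the arc
owned by the next server clockwise (C in this example): the keys originally residing
at C are split, with some of them shifting to D while the others are not touched.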
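Below is a minimal sketch of this ring with one point per server, assuming MD5 truncated to 64 bits as the hash and treating increasing hash values as "clockwise"; the helper names are hypothetical. It adds D to a ring of A, B and C and checks that every key that moved came from D's clockwise successor and went to D. Which server plays the role of C depends on where the hash places D, so it may not be C here.

```python
import bisect
import hashlib


def stable_hash(key: str) -> int:
    """Deterministic 64-bit position on the ring for a key or server name."""
    return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")


def build_ring(servers):
    """Return (sorted positions, owner of each position), one point per server."""
    points = sorted((stable_hash(s), s) for s in servers)
    return [p for p, _ in points], [s for _, s in points]


def lookup(ring, key: str) -> str:
    """First server at or clockwise after the key's position, wrapping past the end."""
    positions, owners = ring
    i = bisect.bisect_left(positions, stable_hash(key)) % len(positions)
    return owners[i]


keys = [f"key-{i}" for i in range(20_000)]
before = build_ring(["A", "B", "C"])
after = build_ring(["A", "B", "C", "D"])

# Looking up D's own name in the old ring lands on D's clockwise successor,
# which is the server whose keys get split.
donor = lookup(before, "D")
moved = [k for k in keys if lookup(before, k) != lookup(after, k)]
print(f"{len(moved)} of {len(keys)} keys moved, all from {donor} to D")
```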
When a cache, say A, is removed or fails, all keys originally mapped to A fall to
the next server clockwise, B, and only those keys need to be moved; other keys are
not affected.
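The same kind of sketch (same assumptions: MD5 truncated to 64 bits, one point per server, hypothetical helper names) can check the removal case: with A gone, its keys should all land on A's clockwise successor and no other key should move. Here the successor is whichever server the hash places after A, which may or may not be B.

```python
import bisect
import hashlib


def stable_hash(key: str) -> int:
    """Deterministic 64-bit position on the ring for a key or server name."""
    return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")


def build_ring(servers):
    """Return (sorted positions, owner of each position), one point per server."""
    points = sorted((stable_hash(s), s) for s in servers)
    return [p for p, _ in points], [s for _, s in points]


def lookup(ring, key: str) -> str:
    """First server at or clockwise after the key's position, wrapping past the end."""
    positions, owners = ring
    i = bisect.bisect_left(positions, stable_hash(key)) % len(positions)
    return owners[i]


keys = [f"key-{i}" for i in range(20_000)]
before = build_ring(["A", "B", "C"])
after = build_ring(["B", "C"])  # A removed or failed

# A's old position now falls to its clockwise successor, which inherits A's keys.
heir = lookup(after, "A")
for k in keys:
    old, new = lookup(before, k), lookup(after, k)
    # Keys that lived on A move to the heir; every other key stays where it was.
    assert new == (heir if old == "A" else old)
print(f"A's keys moved to {heir}; no other key changed servers")
```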
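As for load balancing, as discussed at the beginning, real data is essentially
randomly distributed and thus may not be uniform, which can leave the keys unevenly
spread across the caches. With only one point per cache, the arcs between neighboring
caches can also differ greatly in length, so some caches own far more of the ring
than others.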
To handle this issue, we add “virtual replicas” for caches. Instead of mapping
each cache to a single point on the ring, we map it to multiple points on the
ring, i.e., replicas. This way, each cache is associated with multiple portions of
the ring.
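A minimal sketch of a ring with virtual replicas, assuming MD5 truncated to 64 bits as the hash and a hypothetical naming scheme in which virtual point i of cache "A" is hashed from the string "A#i". Lookups work exactly as before; the only change is that each cache now owns many short arcs instead of one long one, and removing a cache deletes all of its virtual points.

```python
import bisect
import hashlib


def stable_hash(key: str) -> int:
    """Deterministic 64-bit position on the ring."""
    return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")


class ReplicatedRing:
    """Consistent-hash ring where each cache owns `replicas` virtual points.

    Assumes no two virtual points hash to the same position, which is
    overwhelmingly likely with 64-bit positions.
    """

    def __init__(self, servers=(), replicas: int = 100):
        self.replicas = replicas
        self._positions = []  # sorted virtual-point positions
        self._owner = {}      # position -> real cache name
        for server in servers:
            self.add(server)

    def _points(self, server: str) -> list[int]:
        # Hypothetical naming scheme: virtual point i of "A" is hashed from "A#i".
        return [stable_hash(f"{server}#{i}") for i in range(self.replicas)]

    def add(self, server: str) -> None:
        for pos in self._points(server):
            bisect.insort(self._positions, pos)
            self._owner[pos] = server

    def remove(self, server: str) -> None:
        for pos in self._points(server):
            del self._positions[bisect.bisect_left(self._positions, pos)]
            del self._owner[pos]

    def lookup(self, key: str) -> str:
        if not self._positions:
            raise LookupError("no caches on the ring")
        # First virtual point at or clockwise after the key, wrapping past the end.
        i = bisect.bisect_left(self._positions, stable_hash(key)) % len(self._positions)
        return self._owner[self._positions[i]]


ring = ReplicatedRing(["A", "B", "C"], replicas=100)
print(ring.lookup("user:42"))
```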
If the hash function “mixes well,” as the number of replicas increases, the keys
will be more balanced.
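To see the effect of the replica count, this sketch hashes 20,000 made-up keys onto five caches and reports how loaded the busiest cache is relative to the average. MD5 truncated to 64 bits stands in for a hash that mixes well, and the cache names, key names, the "s#i" naming of virtual points and the replica counts are arbitrary choices for illustration.

```python
import bisect
import hashlib
from collections import Counter


def stable_hash(key: str) -> int:
    """MD5 truncated to 64 bits, used here as a hash that mixes well."""
    return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")


def build_ring(servers, replicas):
    """Sorted virtual points; point i of server s is hashed from the string 's#i'."""
    points = sorted(
        (stable_hash(f"{s}#{i}"), s) for s in servers for i in range(replicas)
    )
    return [p for p, _ in points], [s for _, s in points]


def lookup(ring, key):
    """First virtual point at or clockwise after the key, wrapping past the end."""
    positions, owners = ring
    return owners[bisect.bisect_left(positions, stable_hash(key)) % len(positions)]


def peak_to_mean(servers, keys, replicas):
    """Load of the busiest cache divided by the average load (1.0 = perfectly even)."""
    ring = build_ring(servers, replicas)
    counts = Counter(lookup(ring, k) for k in keys)
    return max(counts[s] for s in servers) / (len(keys) / len(servers))


servers = [f"cache-{i}" for i in range(5)]
keys = [f"key-{i}" for i in range(20_000)]
for replicas in (1, 10, 100, 500):
    ratio = peak_to_mean(servers, keys, replicas)
    print(f"replicas={replicas:>3}: busiest cache = {ratio:.2f}x average")
```

Exact numbers depend on the hash and the names; the expected trend is that the ratio drops toward 1.0 as the number of replicas grows.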