Leveraging Consistent Hashing in Your Python Applications
● Amazon DynamoDB
● Cassandra / ScyllaDB
● Riak
● CockroachDB
MAPPING
referential -> information
Phonebook
name -> phone number
Map logic
lookup efficiency
MAP
key -> value
Python dict()
{key: value}
Python dict() is a Hash Table
Hash Table logic
implementation
Python dict() implementation
Array (in memory)
hash(key) & (size of array - 1) = array index
(Diagram: in-memory array with slots 1 to 11)
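The index formula above can be tried directly. This is a minimal sketch, assuming a table size that is a power of two; the `slot_index` helper is hypothetical, and CPython's real dict adds collision probing on top of this step.

```python
def slot_index(key, table_size=8):
    """Map a key to an array slot with hash(key) & (size of array - 1)."""
    # The mask only behaves like a modulo when table_size is a power of two.
    assert table_size > 0 and table_size & (table_size - 1) == 0
    return hash(key) & (table_size - 1)


# Small integers hash to themselves in CPython, so 1 and 9 land in the
# same slot of an 8-slot table: a collision that probing has to resolve.
for key in (1, 9, 42):
    print(key, "->", slot_index(key))
```

The mask is a cheap replacement for `hash(key) % table_size`, which is why the table size is kept at a power of two.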
Key factors to consider
efficiency
scaling
Python dict efficiency & scaling
hash(key) & (size of array - 1) = array index
Optimized for fast lookups: O(1) on average
Memory inefficient (probing needs spare slots)
(Diagram: the array, slots 1 to 11, held in MEMORY)
Distributed Hash Tables
(DHT)
Split your key space into buckets
What’s the best function to find the server hosting the bucket for my key?
Naive DHT implementation
md5(key) % (number of buckets) = server
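To see why the modulo approach is called naive, the sketch below counts how many keys change server when a 3-server cluster grows to 4. The `naive_server` helper and the `key-N` names are made up for illustration.

```python
import hashlib


def naive_server(key, num_servers):
    """md5(key) % (number of buckets) = server."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_servers


keys = ["key-%d" % i for i in range(10000)]

# Compare the placement of every key before and after adding one server.
moved = sum(1 for k in keys if naive_server(k, 3) != naive_server(k, 4))
print("remapped: %.0f%%" % (100.0 * moved / len(keys)))
```

With a uniform hash, a key keeps its server only when `h % 3 == h % 4`, which holds for 3 out of every 12 hash values, so roughly three quarters of the keys move.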
(Diagram: hash(server 0), hash(server 1) and hash(server 2) placed on a ring)
Keys’ bucket is on the next server in the ring
(Diagram: each hash(key) lands on the ring between SERVER 0, SERVER 1 and SERVER 2 and belongs to the next server)
~ 1/n: fraction of keys remapped when one of n servers is added or removed
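The ring lookup can be sketched in a few lines. This is a minimal illustration, assuming md5 as the ring hash and a single point per server; the `HashRing` class and the `key-N` names are hypothetical, not taken from a specific library.

```python
import bisect
import hashlib


def ring_hash(value):
    """Place a string on the ring as a 128-bit integer."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class HashRing:
    def __init__(self, servers):
        # Sorted (position, server) pairs describe the ring.
        self._points = sorted((ring_hash(s), s) for s in servers)
        self._hashes = [h for h, _ in self._points]

    def get_server(self, key):
        # First server position after hash(key), wrapping past the end.
        idx = bisect.bisect_right(self._hashes, ring_hash(key))
        return self._points[idx % len(self._points)][1]


keys = ["key-%d" % i for i in range(10000)]
before = HashRing(["server 0", "server 1", "server 2"])
after = HashRing(["server 0", "server 1", "server 2", "server 3"])

moved = [k for k in keys if before.get_server(k) != after.get_server(k)]
print("remapped: %.0f%%" % (100.0 * len(moved) / len(keys)))
```

Only the keys that fall on the arc claimed by the new server move, and they all move to that server. The 1/n figure is an average: with one point per server the actual share depends on where the new server hashes.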
Uneven partitions lead to hotspots
(Diagram: ring split into arcs of unequal size owned by server 0, server 2 and server 1)
hash functions are not perfect
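How uneven the partitions are can be measured by computing the share of the ring each server owns. This is a sketch under the assumption of an md5-based ring with one point per server; `ring_shares` is a hypothetical helper.

```python
import hashlib

RING_SIZE = 2 ** 128  # md5 digests are 128 bits wide


def ring_hash(value):
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


def ring_shares(servers):
    """Return the fraction of the ring owned by each server."""
    points = sorted((ring_hash(s), s) for s in servers)
    shares = {}
    for i, (point, server) in enumerate(points):
        # A server owns the arc from the previous point up to its own.
        # points[i - 1] wraps to the last point when i == 0.
        previous = points[i - 1][0]
        arc = (point - previous) % RING_SIZE or RING_SIZE
        shares[server] = arc / RING_SIZE
    return shares


for server, share in sorted(ring_shares(["server 0", "server 1", "server 2"]).items()):
    print("%s owns %.1f%% of the ring" % (server, 100 * share))
```

An even split would give each of the three servers about 33%; any server that owns a much larger arc receives proportionally more keys and becomes the hotspot.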
Which hash function to use?
Cryptographic hash functions: adoption
Non-cryptographic hash functions: fast
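The trade-off can be checked with a rough timing. In this sketch md5 stands for the cryptographic side and `zlib.crc32` is used as a standard-library stand-in for a non-cryptographic hash; both choices are assumptions made for illustration, and timings vary by machine.

```python
import hashlib
import timeit
import zlib

KEY = b"user_id A"


def md5_hash(data):
    """Cryptographic digest, converted to a 128-bit integer."""
    return int(hashlib.md5(data).hexdigest(), 16)


def crc32_hash(data):
    """Non-cryptographic checksum, a 32-bit integer."""
    return zlib.crc32(data)


for name, func in (("md5", md5_hash), ("crc32", crc32_hash)):
    seconds = timeit.timeit(lambda: func(KEY), number=100000)
    print("%-6s %.3fs for 100000 hashes" % (name, seconds))
```

Note that the output width differs as well: a 32-bit hash gives a much smaller ring than a 128-bit one.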
Example use case #1
Database instances distribution
(Diagram: client A, client B, client C and client D mapped to DB1, DB2, DB3 and DB4)
Example use case #2
Disk & network I/O distribution
(Diagram: task A, task B, task C and task D mapped to disk 1, disk 2, disk 3 and disk 4)
Example use case #3
Log & tracing consistency
(Diagram: user_id A, user_id B, user_id C and user_id D mapped to worker 1, worker 2, worker 3 and worker 4)
Example use case #4
python-memcached consolidation
(Diagram: keys ‘potato’, ‘coconut’, ‘tomato’ and ‘raspberry’ mapped to cache 1, cache 2, cache 3 and cache 4)
Live demo raffle
List of GIFs
One of the GIFs is the winner
Every participant is a node (bucket)
hash(WINNER_GIF_URL) picks the winner node
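The raffle logic can be sketched as a ring lookup. This is an illustration only, not the code behind the demo: the participant names, the `pick_winner` helper and the example URL are all hypothetical, and md5 is an assumed choice of hash.

```python
import bisect
import hashlib


def ring_hash(value):
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


def pick_winner(participants, winner_gif_url):
    """Every participant is a node; the GIF URL is the key to look up."""
    points = sorted((ring_hash(name), name) for name in participants)
    hashes = [h for h, _ in points]
    # The winner is the next node on the ring after hash(WINNER_GIF_URL).
    idx = bisect.bisect_right(hashes, ring_hash(winner_gif_url))
    return points[idx % len(points)][1]


participants = ["alice", "bob", "carol", "dave"]  # hypothetical names
WINNER_GIF_URL = "https://example.com/winner.gif"  # hypothetical URL

print("winner:", pick_winner(participants, WINNER_GIF_URL))
```

Because the ring is sorted by hash, the result does not depend on the order in which participants joined.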
http://ep17.nbly.co