Data Partitioning
ACK: Slides use some material from Scott Shenker (UC Berkeley) and Jim Kurose (UMass)
Agenda
⚫ Back to the Application Layer
⚫ P2P networks
⚫ Distributed Storage of Data
⚫ Next Lecture
⚫ Preview
Note: Change in our topic schedule
⚫ We will be discussing topics related to PA4 first, then move back to the
link layer and other topics
How are Internet apps architected?
Client-Server Architecture
[Figure: user devices accessing remote services over the Internet]
How are Internet apps architected?
Peer-to-Peer Architecture
[Figure: peer devices exchanging data directly with one another over the Internet]
The course so far …
⚫ Studied fundamental networking concepts
The next few lectures of the course
⚫ Study fundamental concepts in building systems for distributed
applications
Next Few Classes: Fundamental Questions
⚫ How to distribute data for access by application components?
⚫ DHTs, consistent hashing
Today’s Lecture
⚫ How to partition data for a large distributed application?
Motivation
⚫ Present-day internet applications often store vast amounts of data and
need fast access to it
⚫ Examples:
⚫ E-commerce websites like Amazon store shopping carts for millions of users during
peak times
⚫ Social media and social networking websites store pictures, friend lists, reels, and
conversation threads for billions of users
⚫ A Content Distribution Network like Akamai caches more than 10% of the world’s web content
Distributed Hash Tables
Hash Tables
⚫ A simple database that stores (key, value) pairs
⚫ Operations?
⚫ put(key, value)
⚫ get(key): returns the value
⚫ delete(key): deletes the (key, value) pair
⚫ Expected time for operations: O(1)
⚫ Example: (ID, Name)
⚫ Another example:
⚫ Key: movie title
⚫ Value: list of IP addresses

Key          Value
132-54-3570  Tariq Usman
761-55-3791  Hina Akram
385-41-0902  Rida Ali
441-89-1956  Ali Ahmed
217-66-5609  Afshan Khan
…            …
177-23-0199  Naeem Ullah
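To make the operations concrete, here is a minimal sketch (not from the slides) of a hash-table-style key/value store in Python; the class and method names are illustrative.

class SimpleKVStore:
    """A minimal in-memory (key, value) store backed by Python's dict,
    which is itself a hash table with expected O(1) operations."""

    def __init__(self):
        self._table = {}

    def put(self, key, value):
        # Insert or overwrite the (key, value) pair.
        self._table[key] = value

    def get(self, key):
        # Return the value for key, or None if the key is absent.
        return self._table.get(key)

    def delete(self, key):
        # Remove the (key, value) pair if it exists.
        self._table.pop(key, None)


store = SimpleKVStore()
store.put("132-54-3570", "Tariq Usman")
print(store.get("132-54-3570"))   # -> Tariq Usman
store.delete("132-54-3570")
print(store.get("132-54-3570"))   # -> None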
Why is it called a “HASH” table?
⚫ Often the key stored is a numerical representation
⚫ A number instead of a movie name. Why?
⚫ More convenient and efficient to store and search
⚫ Typically the internal (stored) key is the hash of the original key

Original Key         Key       Value
The Prestige         8962458   15.10.2.5, 171.64.90.121
Godfather            7800356   16.5.211.45
Heat                 1567109   24.121.38.7
The English Patient  2360012   95.67.5.114, 18.4.44.132
Jerry McGuire        5430938   143.21.34.156
Interstellar         9290124   27.43.3.241
…                    …         …
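As a rough illustration of the idea above, the sketch below hashes a movie title with SHA-1 and truncates it to an n-bit integer; the resulting numbers are illustrative and will not match the example values in the table.

import hashlib

def numeric_key(original_key: str, n_bits: int = 32) -> int:
    """Hash an arbitrary string key (e.g., a movie title) into an n-bit integer using SHA-1."""
    digest = hashlib.sha1(original_key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (2 ** n_bits)

for title in ["The Prestige", "Godfather", "Interstellar"]:
    print(title, "->", numeric_key(title))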
Problem
⚫ Many applications (e.g., Facebook, Amazon apps) require access to
billions of (key, value) pairs
⚫ Applications need fast access to these (key, value) pairs
⚫ Example: Memcached (an in-memory key/value store)
⚫ If all data is stored on a single server, it can become a bottleneck
⚫ Too many entries in one place
⚫ Too much query traffic to handle
⚫ The data may not even fit on a single machine
How to partition (key,value) pairs
across servers?
Desired Properties
⚫ Balanced load distribution
⚫ (key, value) pairs stored over millions of peers
⚫ No server has “too many” data items – even distribution is ideal!
⚫ The number of queries is also evenly distributed (roughly)
⚫ Smoothness
⚫ On addition/removal of servers, minimize the number of keys that need to be
relocated – robust to peers coming and going (churn)
Solution#1
⚫ Use random (hash-based) assignment to map a movie to a server
⚫ What is the key and what is the value in our (key, value) pair?
⚫ Hash(movie-name) is the key
⚫ The server ID is the value
⚫ Total number of keys >> number of servers
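A minimal sketch of this hash-then-assign scheme, assuming SHA-1 and the server ID = key % N rule used on the next slide; the number of servers is a hypothetical value.

import hashlib

NUM_SERVERS = 10  # assumed number of servers for illustration

def assign_server(movie_name: str, num_servers: int = NUM_SERVERS) -> int:
    """Map a movie to a server ID using key = hash(movie-name), server ID = key % N."""
    key = int.from_bytes(hashlib.sha1(movie_name.encode()).digest(), "big")
    return key % num_servers

print(assign_server("The Prestige"))   # some server ID in [0, 9]
print(assign_server("Interstellar"))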
Issue
⚫ The solution doesn’t ensure the smoothness property
⚫ Example: 𝑁 = 10 servers, 𝐾 = 1000 pairs, use server ID = key % 𝑁
⚫ Add one server ➜ roughly 90% of the keys need to move
⚫ In general, about 𝑁/(𝑁+1) of the keys must be relocated on average
⚫ Is the solution load balanced?
⚫ For (key, value) storage: Yes, for a good hash function
⚫ For the number of queries per node: Yes, if key popularity is roughly even
⚫ Is the solution scalable?
⚫ A small number of internal messages to resolve a query?
⚫ Yes – given the key, the server ID can be computed directly
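A small simulation of the example above (growing from 10 to 11 servers, 1000 synthetic keys), illustrating that the fraction of relocated keys is roughly 𝑁/(𝑁+1); the key names are made up.

import hashlib

def server_for(key: str, num_servers: int) -> int:
    h = int.from_bytes(hashlib.sha1(key.encode()).digest(), "big")
    return h % num_servers

keys = [f"movie-{i}" for i in range(1000)]   # 1000 synthetic keys

# Compare assignments before (N = 10) and after (N = 11) adding one server.
moved = sum(1 for k in keys if server_for(k, 10) != server_for(k, 11))
print(f"{moved / len(keys):.1%} of keys moved")   # roughly 10/11 ≈ 91%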
What do we need?
⚫ A key-to-server mapping that doesn’t change drastically with the number of servers
⚫ When adding or removing servers, the number of keys that need to be relocated is
minimized
Solution#3: Consistent Hashing
⚫ Provides nice smoothness and data balancing properties
⚫ Widely used in industry:
⚫ Amazon’s Dynamo data store
⚫ Used for Amazon’s e-commerce website
Consistent Hashing: Construction
⚫ Use an n-bit identifier for:
⚫ Keys: the hash of, for example, the movie name
⚫ Servers: the hash of, for example, the server IP
⚫ Use a standard hash function such as SHA-1
⚫ Servers and keys both get mapped to a number
⚫ A number from 0 to 2^n − 1
⚫ Where and how to store the (k,v) pairs?
⚫ Store (k,v) at the server that is the immediate successor of k
⚫ The closest clockwise server with ID greater than or equal to k
⚫ Servers and keys are mapped onto an “abstract” circle or hash ring (see the sketch below)
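A minimal consistent-hashing sketch in Python, assuming 32-bit ring identifiers derived from SHA-1 and a sorted ring searched with bisect; the class name and server IPs are illustrative only.

import bisect
import hashlib

RING_BITS = 32                      # n-bit identifier space, assumed 32 here

def ring_hash(name: str) -> int:
    """Map a server name or key onto the ring [0, 2^n - 1]."""
    digest = hashlib.sha1(name.encode()).digest()
    return int.from_bytes(digest, "big") % (2 ** RING_BITS)

class ConsistentHashRing:
    def __init__(self, servers):
        # Sorted list of (server_id, server_name) pairs around the ring.
        self._ring = sorted((ring_hash(s), s) for s in servers)
        self._ids = [sid for sid, _ in self._ring]

    def server_for(self, key: str) -> str:
        """Return the immediate clockwise successor of hash(key)."""
        kid = ring_hash(key)
        idx = bisect.bisect_left(self._ids, kid) % len(self._ring)  # wrap around the ring
        return self._ring[idx][1]

ring = ConsistentHashRing(["10.0.0.1", "10.0.0.2", "10.0.0.3"])
for movie in ["The Prestige", "Heat", "Interstellar"]:
    print(movie, "->", ring.server_for(movie))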
Example
[Figure: a hash ring with identifiers 1, 12, 13, 25, 32, 40, 48, and 60; each key is stored at its closest clockwise server, and the servers together form an “overlay network”]
Consistent Hashing: Summary
⚫ Partitions the key-space among servers
Questions?
Consistent Hashing: Load Balancing
⚫ Each server owns 1/Nth of the ID space in expectation
⚫ Where N is the number of servers
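As a rough check of this claim, the sketch below places N = 10 hypothetical server IDs on a 32-bit ring and measures each server’s share of the ID space; the average share is exactly 1/N, though individual shares can vary quite a bit, which motivates the virtual nodes discussed next.

import hashlib

RING = 2 ** 32

def ring_hash(name: str) -> int:
    return int.from_bytes(hashlib.sha1(name.encode()).digest(), "big") % RING

servers = [f"10.0.0.{i}" for i in range(1, 11)]        # N = 10 hypothetical servers
ids = sorted(ring_hash(s) for s in servers)

# The arc owned by the server at ids[i] is the gap back to its predecessor on the ring.
arcs = [(ids[i] - ids[i - 1]) % RING for i in range(len(ids))]
shares = [a / RING for a in arcs]
print(f"average share = {sum(shares) / len(shares):.3f}")   # = 1/N = 0.100
print(f"min = {min(shares):.3f}, max = {max(shares):.3f}")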
Virtual Nodes in DHT
How to implement?
Normally, we use hash(IP) to get the server ID
For V virtual servers, use:
Hash(IP + ”1”)
Hash(IP + ”2”)
…
Hash(IP + ”V”)
If Server 1 now physically fails (so Server 1-1 and Server 1-2 both fail), the keys stored at
Server 1-1 and Server 1-2 get relocated to their two immediate successors
[Figure: hash ring with virtual nodes Server 1-1, Server 1-2, Server 2-1, Server 2-2, Server 3-1,
and Server 3-2, and keys movie1–movie6 stored at their clockwise successors]
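A sketch of virtual nodes layered on the ring idea above, assuming V hash positions per physical server derived from Hash(IP + "i"); the server IPs, movie names, and the value of V are illustrative.

import bisect
import hashlib

RING = 2 ** 32
V = 4                                           # virtual nodes per physical server (assumed)

def ring_hash(name: str) -> int:
    return int.from_bytes(hashlib.sha1(name.encode()).digest(), "big") % RING

def build_ring(servers, v=V):
    """Each physical server appears v times on the ring, once per virtual node."""
    ring = []
    for ip in servers:
        for i in range(1, v + 1):
            ring.append((ring_hash(ip + str(i)), ip))   # Hash(IP + "i") -> physical server
    return sorted(ring)

def server_for(ring, key):
    ids = [sid for sid, _ in ring]
    idx = bisect.bisect_left(ids, ring_hash(key)) % len(ring)
    return ring[idx][1]

servers = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]
movies = [f"movie{i}" for i in range(1, 7)]

# If one physical server fails, its keys scatter across several successors
# instead of all landing on a single neighboring server.
before = {m: server_for(build_ring(servers), m) for m in movies}
after = {m: server_for(build_ring([s for s in servers if s != "10.0.0.1"]), m) for m in movies}
print(before)
print(after)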
Virtual Nodes: Summary
⚫ Idea: Each physical node now maintains V > 1 virtual nodes
⚫ The number of virtual nodes that a node is responsible for can be decided
based on its capacity
Theoretical Results
⚫ For any set of 𝑁 nodes and 𝐾 keys, with high probability:
⚫ Each node is responsible for at most (1 + 𝜖) 𝐾/𝑁 keys
⚫ 𝜖 can be reduced to an arbitrarily small constant by having each node
run Ο(log 𝑁) virtual nodes
⚫ When an (𝑁 + 1)st node joins or leaves the network, responsibility for
Ο(𝐾/𝑁) keys changes hands (and only to and from the joining or leaving node)
Proven by D. Lewin in his work “Consistent hashing and random trees”, Master’s thesis, MIT, 1998
Summary
⚫ How do we partition data in distributed systems?
⚫ With a goal to achieve balance and smoothness
⚫ Virtual nodes
⚫ Can help with load imbalance caused by failures/additions
⚫ Also handles different server capacities
Next Lecture
⚫ How to efficiently locate where a data item is stored in a distributed
system?
Resolving a Query
[Figure: a query forwarded around a hash ring with identifiers 13, 25, 32, 40, 48, and 60]
O(N) messages on average to resolve a query, when there are N peers
Questions?