DHT Lookup
ACK: Slides use some material from Scott Shenker (UC Berkeley) and Jim Kurose (UMass)
Agenda
⚫ Recap
⚫ Distributed Storage of Data
⚫ Circular DHT and Consistent Hashing
⚫ Distributed Hashtable
⚫ Key Lookup
⚫ Failures and Node Additions
Recap: DHT
⚫ Previous lecture:
⚫ How to partition data for a large distributed application?
⚫ Where should we store a specific (key, value) pair?
⚫ Consistent Hashing and Circular DHT
⚫ Today:
⚫ Consistent hashing: Load balancing, smoothness, and scalability
⚫ Virtual servers
⚫ DHT lookup services
Circular DHT and Consistent Hashing
Recap: Problem
⚫ We intend to store billions of (key, value) pairs
⚫ A single server is not an option
⚫ Distribute the (key, value) pairs over many servers
⚫ Any server can be queried
⚫ Provide the key and get the value
⚫ The queried server may need to ask others
Example
[Figure: example hash ring with IDs 1 and 9]
Consistent Hashing: Load Balancing
⚫ Each server owns 1/N of the ID space in expectation
⚫ Where N is the number of servers (see the sketch below)
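To make this concrete, here is a minimal consistent-hashing sketch, not from the slides: the SHA-1 hash, the 32-bit ring size, and the IP addresses are arbitrary choices. Servers and keys are hashed onto the same circular ID space, and each key belongs to the first server clockwise from it.

```python
# Minimal consistent-hashing sketch (illustrative names and parameters).
import bisect
import hashlib

def ring_hash(s: str) -> int:
    """Map an arbitrary string onto the ring [0, 2**32)."""
    return int.from_bytes(hashlib.sha1(s.encode()).digest()[:4], "big")

class HashRing:
    def __init__(self, server_ips):
        # Sorted (id, ip) pairs: the ring read clockwise.
        self.nodes = sorted((ring_hash(ip), ip) for ip in server_ips)

    def owner(self, key: str) -> str:
        """The server immediately succeeding the key's position on the ring."""
        i = bisect.bisect_left(self.nodes, (ring_hash(key),))
        return self.nodes[i % len(self.nodes)][1]  # wrap around at the top

ring = HashRing(["10.0.0.1", "10.0.0.2", "10.0.0.3"])
print(ring.owner("movie1"))  # each server owns ~1/N of the space in expectation
```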
Virtual Nodes in DHT
How to implement?
⚫ Normally, we use hash(IP) to get the server ID
⚫ For V virtual servers, use: Hash(IP+”1”), Hash(IP+”2”), …, Hash(IP+”V”)
⚫ Suppose Server 1 physically fails now (so Server 1-1 and Server 1-2 both fail)
⚫ The storage of Server 1-1 and Server 1-2 gets relocated to their two immediate successors. What will get relocated, and where? (Sketched below)
[Figure: ring with virtual nodes Server1-1, Server1-2, Server2-1, Server2-2, Server3-1, Server3-2 and keys movie1 … movie6]
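A small sketch of the slide's recipe; the IP addresses and the Server1-1, Server1-2, … labels are illustrative:

```python
# Virtual servers per the slide: server `ip` sits at hash(ip+"1") ... hash(ip+"V").
import hashlib

def ring_hash(s: str) -> int:
    return int.from_bytes(hashlib.sha1(s.encode()).digest()[:4], "big")

V = 2
servers = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]
ring = sorted((ring_hash(ip + str(i)), f"Server{ip[-1]}-{i}")
              for ip in servers for i in range(1, V + 1))

# Server 1 fails: both Server1-1 and Server1-2 leave the ring, and each
# one's keys are relocated to the next virtual node clockwise. Because the
# two virtual nodes sit at unrelated ring positions, the relocated load
# usually lands on two different physical servers.
ring = [(h, name) for h, name in ring if not name.startswith("Server1-")]
```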
Virtual Nodes: Summary
⚫ Idea: Each physical node now maintains V > 1 virtual nodes
⚫ Each node is responsible for at most (1 + 𝜖) 𝐾/𝑁 keys, where 𝐾 is the total number of keys and 𝑁 the number of nodes
⚫ 𝜖 can be reduced to an arbitrarily small constant by having each node run Ο(log 𝑁) virtual nodes → Load Balanced!
Proven by D. Lewin in his work “Consistent hashing and random trees”, Master's thesis, MIT, 1998
Summary
⚫ How do we partition data in distributed systems?
⚫ To achieve balance and smoothness
⚫ Virtual nodes
⚫ Can help with load imbalance caused by failures/additions
⚫ Also handles different server capacities
Handling Peer Churn (servers come and go)
⚫ Each server/peer knows the address of its two successors
⚫ Periodically pings them to check aliveness
⚫ If the immediate successor leaves, choose the next server as the new immediate successor
⚫ Example: Peer 5 abruptly fails
⚫ Peer 4 notices and makes 8 its immediate successor
⚫ Peer 4 then asks 8 who its immediate successor is and makes that peer (10) its second successor
⚫ Server 3 also updates its 2nd successor (see the sketch below)
[Figure: ring with peers 1, 3, 4, 5, 8, 10, 12, 15]
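A minimal sketch of this repair rule, assuming a message layer where `is_alive` and `ask_successor` stand in for the pings and queries (both names are made up):

```python
# Successor-list repair after churn, following the rule on the slide.
class Peer:
    def __init__(self, ident, succ1, succ2):
        self.ident = ident
        self.succ1 = succ1   # immediate successor
        self.succ2 = succ2   # second successor

    def check_successors(self, is_alive, ask_successor):
        """Called periodically: repair successor pointers after a failure."""
        if not is_alive(self.succ1):
            # Immediate successor left: promote the second successor ...
            self.succ1 = self.succ2
            # ... then ask it who follows it, to refill the second slot.
            self.succ2 = ask_successor(self.succ1)

# Peer 5 fails: peer 4 promotes 8, then asks 8 for its successor (10).
p4 = Peer(4, succ1=5, succ2=8)
p4.check_successors(is_alive=lambda n: n != 5,
                    ask_successor={8: 10}.get)
assert (p4.succ1, p4.succ2) == (8, 10)
```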
New Node Joining?
⚫ See the book for an example (a full walkthrough also appears at the end of this deck)
Next Question
⚫ How to efficiently locate where a data item is stored in a distributed hash
table (DHT)?
⚫ For Scalability:
⚫ A query should not take too long to resolve
⚫ A node should not keep track of too many other nodes
Designing a Lookup Service
[Figure: an application component issues get(key)? against a pool of servers server1, server2, …, servern]
Resolving a Query
⚫ O(N) messages on average to resolve the query, when there are N peers (see the sketch below)
[Figure: ring with peers 13, 25, 32, 40, 48, 60; the query hops from successor to successor around the ring]
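A toy version of this walk, using the peer IDs from the figure (the routine names are mine): each peer knows only its successor, so the query hops clockwise until it lands on the key's owner.

```python
# Successor-only routing: a query walks the ring one peer at a time.
peers = [13, 25, 32, 40, 48, 60]            # peer IDs from the figure
succ = {p: peers[(i + 1) % len(peers)]      # each peer knows only its successor
        for i, p in enumerate(peers)}

def in_segment(x, a, b):
    """True if x lies in the clockwise ring interval (a, b]."""
    return a < x <= b if a < b else x > a or x <= b

def lookup(start, key_id):
    """Forward hop by hop; return (owning peer, messages used)."""
    node, hops = start, 0
    while not in_segment(key_id, node, succ[node]):
        node, hops = succ[node], hops + 1   # one message per hop
    return succ[node], hops + 1             # final hop reaches the owner

print(lookup(13, 45))  # (48, 4): on average ~N/2 messages for N peers
```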
Circular DHT with shortcuts
Centralized Lookup Service
⚫ A central server keeps an updated hash ring: which servers are up and how they are mapped on the ring
⚫ The application component sends lookup(key)? to the central server, then sends get(key) directly to the server it names (sketched below)
[Figure: application component → central server (lookup(key)?), which answers server2; the component then issues get(key) to server2 among server1, server2, …, servern]
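A toy sketch of this design (class and method names are mine): the central server is the only place the ring lives, and clients pay one extra round trip per get(key) to consult it.

```python
# Centralized lookup: one directory process owns the authoritative ring.
import bisect
import hashlib

def ring_hash(s: str) -> int:
    return int.from_bytes(hashlib.sha1(s.encode()).digest()[:4], "big")

class CentralDirectory:
    def __init__(self, server_ips):
        self.ring = sorted((ring_hash(ip), ip) for ip in server_ips)

    def lookup(self, key: str) -> str:
        """Answer lookup(key)?: the server succeeding the key on the ring."""
        i = bisect.bisect_left(self.ring, (ring_hash(key),))
        return self.ring[i % len(self.ring)][1]

    def remove(self, ip: str):
        # Only the directory's copy must be updated when a server dies.
        self.ring = [e for e in self.ring if e[1] != ip]

directory = CentralDirectory(["10.0.0.1", "10.0.0.2", "10.0.0.3"])
target = directory.lookup("movie1")  # then send get(movie1) to `target`
```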
Zero-hop DHTs
⚫ Each node tries to maintain an updated hash ring
⚫ All nodes send directly to the server immediately succeeding the key on
the hash ring
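In code, the only difference from the centralized design is where the ring lives. A sketch, assuming each node somehow keeps its local copy fresh (how it does so is the subject of the next slides):

```python
# Zero-hop routing: every node holds its own copy of the ring and sends
# get(key) directly to the owner, with no lookup round trip or forwarding.
import bisect
import hashlib

def ring_hash(s: str) -> int:
    return int.from_bytes(hashlib.sha1(s.encode()).digest()[:4], "big")

class ZeroHopNode:
    def __init__(self, server_ips):
        self.local_ring = sorted((ring_hash(ip), ip) for ip in server_ips)

    def route(self, key: str) -> str:
        """Hash locally, pick the succeeding server, talk to it directly."""
        i = bisect.bisect_left(self.local_ring, (ring_hash(key),))
        return self.local_ring[i % len(self.local_ring)][1]
```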
Implementing Zero-hop DHTs
⚫ One possible strategy
⚫ Each node in the system periodically pings (sends small messages to) every other node
⚫ To learn whether that node is up or not
⚫ If a node does not reply to k consecutive pings, we assume that the node is down
⚫ If the node is down, remove it, and the virtual nodes associated with it, from the hash ring
⚫ To also learn which virtual node IDs belong to that node
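A sketch of this strategy's bookkeeping (the constant k = 3 and the class are mine): per node, count consecutive missed pings and evict after k misses.

```python
# Ping-based membership: evict a node after k consecutive missed pings.
K = 3

class Membership:
    def __init__(self, node_ids):
        self.alive = set(node_ids)
        self.missed = {n: 0 for n in node_ids}

    def record_ping(self, node, replied):
        if node not in self.alive:
            return
        self.missed[node] = 0 if replied else self.missed[node] + 1
        if self.missed[node] >= K:
            # Assume the node is down; in a real DHT we would also remove
            # every virtual node it owns from the local hash ring.
            self.alive.discard(node)

m = Membership(["A", "B", "C"])
for _ in range(K):
    m.record_ping("B", replied=False)  # three misses in a row
assert "B" not in m.alive
```

Note the cost: with N nodes this is O(N²) ping messages per period, which is what the gossip variant on the next slide avoids.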
Zero-hop DHTs through Gossip
⚫ Periodically (e.g., once per second), each node contacts a randomly chosen
other node
⚫ Nodes exchange their lists of known nodes (including virtual node IDs)
⚫ Each node learns about the key ranges handled by other nodes
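A minimal gossip sketch (the periodic exchange is from the slide; the class and the toy eight-node run are illustrative): each round, a node merges membership lists with one random peer, so knowledge of joins and leaves spreads exponentially fast.

```python
# Gossip-based membership: pairwise merges of "who I know" lists.
import random

class GossipNode:
    def __init__(self, ident):
        self.ident = ident
        self.known = {ident}        # node IDs (or virtual node IDs) known so far

    def gossip_round(self, nodes):
        peer = random.choice([n for n in nodes if n is not self])
        merged = self.known | peer.known    # exchange lists, take the union
        self.known, peer.known = set(merged), merged

nodes = [GossipNode(i) for i in range(8)]
rounds = 0
while not all(len(n.known) == len(nodes) for n in nodes):
    for n in nodes:
        n.gossip_round(nodes)
    rounds += 1
print(rounds)  # typically ~log2(8) = 3 full rounds until everyone knows all
```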
Analyzing Gossip-based Zero-hop DHTs
⚫ What if there are a large number of highly unreliable nodes?
⚫ E.g., on average, a node is available only for one hour
⚫ In other words, nodes exhibit high churn; they constantly join and leave
⚫ Becomes difficult for a single node to have an accurate picture of all the
nodes in the system
Questions?
DHT Node Joining Example
New Node Joining
• Assume each node knows only its predecessor and its (just one) successor
• Node with id=98 wants to join the ring
• Node 98 needs to know at least one DHT node
• Assume the known node is 10
[Figure: ring segment with node 90 (Succ=105, Pred=60), the joining node N98 (Succ=null, Pred=null), and node 105 (Succ=120, Pred=90)]
New Node Joining
• Node 98 sends join(98) to node 10, which forwards the request around the ring
• Node 90 returns node 105 (the successor of ID 98)
• Node 98 updates its successor to 105
[Figure: Join(98) in flight; N98 (Succ=105, Pred=null), node 90 (Succ=105, Pred=60), node 105 (Succ=120, Pred=90)]
New Node Joining
• Node 98 sends a periodic message to node 105
• Node 105 updates its predecessor to 98
[Figure: node 105 now has Succ=120, Pred=98; N98 (Succ=105, Pred=null); node 90 (Succ=105, Pred=60)]
New Node Joining
• Node 90 sends a periodic message to its successor, 105
• Learning that 105's predecessor is now 98, node 90 updates its successor to 98
[Figure: node 90 now has Succ=98, Pred=60; N98 (Succ=105, Pred=null); node 105 (Succ=120, Pred=98)]
New Node Joining
• Node 90 sends a periodic message to 98, its new successor
• Node 98 updates its predecessor to 90
• This completes the join (see the sketch below)
[Figure: final state: node 90 (Succ=98, Pred=60), N98 (Succ=105, Pred=90), node 105 (Succ=120, Pred=98)]
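The whole walkthrough compresses into a Chord-style stabilization sketch (class and method names are mine, not from the slides): `stabilize` is the "periodic message" to the successor, and `notify` is how a node learns of a closer predecessor.

```python
# Chord-style join/stabilization matching the walkthrough above.
class Node:
    def __init__(self, ident):
        self.ident, self.succ, self.pred = ident, None, None

    @staticmethod
    def between(x, a, b):
        """True if x lies strictly inside the clockwise interval (a, b)."""
        return a < x < b if a < b else x > a or x < b

    def stabilize(self):
        """Periodic: check my successor's predecessor, then notify my successor."""
        p = self.succ.pred
        if p is not None and self.between(p.ident, self.ident, self.succ.ident):
            self.succ = p              # e.g., node 90 adopts 98 as successor
        self.succ.notify(self)

    def notify(self, cand):
        """cand says: 'I think I am your predecessor.'"""
        if self.pred is None or self.between(cand.ident, self.pred.ident, self.ident):
            self.pred = cand           # e.g., node 105 adopts 98 as predecessor

# Ring fragment ... 60 -> 90 -> 105 -> 120 ...; node 98 joins knowing succ=105.
n90, n98, n105 = Node(90), Node(98), Node(105)
n90.succ, n90.pred = n105, Node(60)
n105.succ, n105.pred = Node(120), n90
n98.succ = n105                        # the answer to join(98)

for _ in range(2):                     # two periodic rounds settle all pointers
    n98.stabilize()
    n90.stabilize()
assert n90.succ is n98 and n98.pred is n90 and n105.pred is n98
```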
Case Study: BitTorrent
P2P file distribution: BitTorrent
▪ file divided into 256 KB chunks
▪ peers in the torrent send/receive file chunks
▪ tracker: tracks peers participating in the torrent
▪ torrent: group of peers exchanging chunks of a file
▪ Alice arrives, obtains a list of peers from the tracker, and begins exchanging file chunks with peers in the torrent
P2P file distribution: BitTorrent
⚫ peer joining torrent:
⚫ has no chunks, but will accumulate them
over time from other peers
⚫ registers with tracker to get list of peers,
connects to subset of peers (“neighbors”)
⚫ while downloading, peer uploads chunks to other peers
⚫ peer may change peers with whom it exchanges chunks
⚫ churn: peers may come and go
⚫ once peer has entire file, it may (selfishly) leave or (altruistically) remain
in torrent
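A toy sketch of the join step (tracker protocol details omitted; function names are mine):

```python
# Joining a torrent: register with the tracker, then pick neighbors.
import random

def join_torrent(tracker_peer_list, max_neighbors=4):
    """Connect to a random subset of the tracker's peer list."""
    k = min(max_neighbors, len(tracker_peer_list))
    return random.sample(tracker_peer_list, k)

neighbors = join_torrent(["peerA", "peerB", "peerC", "peerD", "peerE"])
# The new peer starts with no chunks: it requests chunks from these
# neighbors while uploading whatever chunks it has accumulated so far.
```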
BitTorrent: requesting, sending file chunks
BitTorrent: tit-for-tat
(1) Alice “optimistically unchokes” Bob
(2) Alice becomes one of Bob’s top-four providers; Bob reciprocates
(3) Bob becomes one of Alice’s top-four providers
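A sketch of the choking logic behind this exchange (the function is illustrative; BitTorrent clients typically re-rank every 10 seconds and optimistically unchoke every 30 seconds): a peer keeps its top four uploaders unchoked and periodically unchokes one random extra peer, which is the optimistic unchoke that lets Alice and Bob discover each other.

```python
# Tit-for-tat neighbor selection: top-4 uploaders + one optimistic unchoke.
import random

def choose_unchoked(upload_rate_from, optimistic=True):
    """upload_rate_from: peer -> rate at which that peer uploads to us."""
    ranked = sorted(upload_rate_from, key=upload_rate_from.get, reverse=True)
    unchoked = set(ranked[:4])             # reciprocate the best providers
    rest = ranked[4:]
    if optimistic and rest:
        unchoked.add(random.choice(rest))  # give another peer a chance (Bob!)
    return unchoked

rates = {"bob": 0, "carol": 9, "dan": 8, "eve": 7, "fred": 6}
print(choose_unchoked(rates))  # the four best providers plus one optimistic pick
```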
DHT BitTorrent: bootstrap
⚫ Several options
⚫ Embedded in the torrent file
⚫ Recommended practice when a torrent file is created
⚫ Hardcoding
⚫ Bootstrap nodes run by orgs with long-running servers
⚫ dht.transmissionbt.com
⚫ router.utorrent.com
⚫ router.bittorrent.com
⚫ Peer exchange (PEX)
⚫ Ask the peers (probably non-DHT) that you are already downloading other torrents from
DHT Node vs. BitTorrent Peer
⚫ A DHT node participates in the DHT
⚫ A peer participates in one or more torrents
⚫ Being a node and being a peer are independent
⚫ You can be just a peer (and not a DHT node)
⚫ Do not respond to DHT queries, or disable DHT in the client
⚫ You may still be able to query the DHT network
⚫ You can also be just a DHT node (and not a peer)
⚫ Enable DHT in the client but do not participate in any torrent
⚫ Helps the global DHT network by storing key-value pairs for random torrents
⚫ Or you can be both: a DHT node and a BT peer
Questions?
Next Topic
Time Synchronization
Next …
⚫ Why time synchronization?
⚫ Why challenging?
⚫ Cristian’s algorithm
Why time synchronization?
⚫ You have to appear for a quiz, but your watch is off by 15 minutes
⚫ What if your watch is late (behind) by 15 minutes?
⚫ You will miss the quiz
⚫ What if your watch is fast (ahead) by 15 minutes?
⚫ You will end up waiting longer than you intended
⚫ Some airlines could not process any reservations or check-ins for several hours. What happened?
⚫ More generally:
⚫ We use timestamps to order events in a distributed system
⚫ This requires the system clocks to be synchronized with one another
Questions?