
CS 382: Network-Centric Computing

Virtual Servers and Lookup in DHT

Zartash Afzal Uzmi


Spring 2023-24

ACK: Slides use some material from Scott Shenker (UC Berkeley) and Jim Kurose (UMass)
Agenda
⚫ Recap
⚫ Distributed Storage of Data
⚫ Circular DHT and Consistent Hashing

⚫ Distributed Hashtable
⚫ Key Lookup
⚫ Failures and Node Additions

2
Recap: DHT
⚫ Previous lecture:
⚫ How to partition data for a large distributed application?
⚫ Where should we store a specific (key, value) pair?
⚫ Consistent Hashing and Circular DHT

⚫ Today:
⚫ Consistent hashing: Load balancing, smoothness, and scalability
⚫ Virtual servers
⚫ DHT lookup services

3
Circular DHT and Consistent Hashing

4
Recap: Problem
⚫ We intend to store billions of (key, value) pairs
⚫ A single server is not an option
⚫ Distribute the (key, value) pairs over many servers
⚫ Any server can be queried
⚫ Provide the key and get the value
⚫ The queried server may need to ask others

⚫ Need load balancing, smoothness, and scalability


⚫ Load balancing: No server stores too many pairs (or gets queried too often!)
⚫ Smoothness: Few key relocations on adding/removing servers
⚫ Scalability: small #messages to resolve query, small #connections
5
Recap: Solutions
⚫ Solution #1: Random Assignment
⚫ Load balanced, smooth, scalable in storage
⚫ Not scalable in the lookup (no idea where the pair is!)

⚫ Solution #2: Server ID = hash(movie-name) % num_servers


⚫ Load balanced, scalable in storage and lookup
⚫ Not smooth (Ex: 1000 keys, 10 servers; roughly 90% of keys move on adding an 11th server)

⚫ Solution #3: Consistent Hashing and Circular DHT


⚫ Server ID and Key ID in the same n-bit number space (0 to 2ⁿ − 1)
⚫ Load Balanced? Scalable? Smooth?

6
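The smoothness problem with modulo hashing can be checked directly. Below is a minimal sketch (not from the slides; the key names and the use of SHA-1 are assumptions): with mod-based placement, going from 10 to 11 servers relocates roughly 90% of the keys.

```python
# Hypothetical check of mod-hashing smoothness: how many of 1000 keys move
# when an 11th server is added? (SHA-1 and the key names are illustrative.)
import hashlib

def server_for(key: str, num_servers: int) -> int:
    h = int(hashlib.sha1(key.encode()).hexdigest(), 16)
    return h % num_servers

keys = [f"movie{i}" for i in range(1000)]
moved = sum(server_for(k, 10) != server_for(k, 11) for k in keys)
print(f"{moved / len(keys):.0%} of keys relocated")   # about 90%
```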
Example
⚫ Let’s use a 6-bit hash function
⚫ Assume 8 servers are available
⚫ Take a 6-bit hash of IP addresses
⚫ Get: 1, 12, 13, 25, 32, 40, 48, 60
⚫ Create the hash ring
⚫ For each key, use a 6-bit hash to get the key identifier k (say 35)
⚫ Store (k,v) at the immediate successor of k
⚫ Resolving query: how many servers should a server be connected to?
[Figure: 6-bit hash ring with servers at 1, 12, 13, 25, 32, 40, 48, 60; key 35 is stored at its successor, server 40]
7
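A minimal sketch of the lookup rule on this slide, assuming the 6-bit ring and the server IDs shown; the successor of key 35 among {1, 12, 13, 25, 32, 40, 48, 60} is server 40.

```python
# Consistent-hashing successor lookup on the 6-bit example ring.
from bisect import bisect_left

RING_SIZE = 2 ** 6
ring = sorted([1, 12, 13, 25, 32, 40, 48, 60])   # server IDs from the slide

def successor(key_id: int) -> int:
    """Return the server that stores key_id: the first server ID >= key_id,
    wrapping around to the smallest server ID if necessary."""
    idx = bisect_left(ring, key_id % RING_SIZE)
    return ring[idx % len(ring)]

print(successor(35))   # -> 40
print(successor(61))   # -> 1 (wraps around the ring)
```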
Consistent Hashing: Summary
⚫ Partitions the key-space among servers
⚫ Keys randomly distributed in ID‐space: hash(movie-name)
⚫ Servers choose random identifiers: for example, hash(IP) – same ID-space
⚫ Smooth?
⚫ Percent of keys relocated on addition/removal of a server node
⚫ Load balanced?
⚫ Spreads ownership of keys evenly across servers
⚫ Does the key distribution remain even with failures and additions?
⚫ Scalable? In #connections and in #messages to resolve a query?
⚫ Need to evaluate the lookup process
8
Questions?

9
Consistent Hashing: Load Balancing
⚫ Each server owns 1/N of the ID space in expectation
⚫ Where N is the number of servers

⚫ What happens if a server fails?


⚫ If a server fails, its successor takes over the space
⚫ Smoothness goal achieved: only the failed server’s keys get relocated
⚫ But now the successor owns 2/N of the key space
⚫ Failures can upset the load balance. Is there a solution?
Virtual Servers
⚫ What if servers have different capacities?
⚫ The basic algorithm is oblivious to node heterogeneity
10
How to better load balance?
⚫ Identify the core reason for the load-balancing issue
⚫ When a server fails, all its storage falls onto the successor
⚫ Can we increase server storage granularity along the ring?
⚫ Try representing each server as multiple (V) virtual servers
⚫ Spread the virtual servers along the ring
⚫ Failure of a physical node
⚫ All virtual instances (of this server) will fail
⚫ The number of (key,value) pairs to be relocated will remain the same…but
⚫ Will now fall onto V successors (instead of just one)
⚫ Better load balancing!!!

11
Virtual Nodes in DHT
How to implement?
Normally, we use hash(IP) to get the server ID
For V virtual servers, use:
Hash(IP+”1”)
Hash(IP+”2”)
…
Hash(IP+”V”)
Server 1 physically fails now (Server 1-1, Server 1-2 failed)
The storage of Server 1-1 and Server 1-2 gets relocated to the two immediate successors
What will get relocated and where?
[Figure: ring with virtual servers Server1-1, Server1-2, Server2-1, Server2-2, Server3-1, Server3-2 and keys movie1–movie6 placed between them]
12
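A minimal sketch of the virtual-server trick on this slide: hash each server's IP address V times, once per virtual instance, by appending an index. The 6-bit truncation, SHA-1, and the IP addresses are assumptions for illustration.

```python
import hashlib

RING_SIZE = 2 ** 6     # 6-bit ring, as in the earlier example
V = 2                  # virtual servers per physical server, as in the figure

def node_id(text: str) -> int:
    """Hash a string onto the 6-bit ring."""
    return int(hashlib.sha1(text.encode()).hexdigest(), 16) % RING_SIZE

def virtual_ids(ip: str, v: int = V) -> list:
    """Place one physical server at v ring positions: hash(IP+"1"), ..., hash(IP+"v")."""
    return [node_id(ip + str(i)) for i in range(1, v + 1)]

# If a physical server fails, all of its virtual IDs leave the ring and their
# keys spill onto up to v different successors instead of a single one.
ring = {vid: ip for ip in ["10.0.0.1", "10.0.0.2", "10.0.0.3"]
        for vid in virtual_ids(ip)}
print(sorted(ring.items()))
```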
Virtual Nodes: Summary
⚫ Idea: Each physical node now maintains V > 1 virtual nodes

⚫ Each virtual node owns an expected 1/(VN) of the ID space

⚫ Upon a physical node’s failure, V successors take over

⚫ Result: Better load balance with larger V

⚫ Handling servers with various capacities


⚫ The number of virtual nodes that a node is responsible for can be decided based on its
capacity
13
Theoretical Results
⚫ For any set of 𝑁 nodes and 𝐾 keys, with high probability:

⚫ Each node is responsible for at most (1 + 𝜖)𝐾/𝑁 keys
⚫ 𝜖 can be reduced to an arbitrarily small constant by having each node
run Ο(log 𝑁) virtual nodes (Load Balanced!)

⚫ When an (𝑁 + 1)st node joins or leaves the network, responsibility for
Ο(𝐾/𝑁) keys changes hands (and only to and from the joining or leaving
node) (Smooth!)

14
Proven by D. Lewin in his work “Consistent hashing and random trees”, Master’s thesis, MIT, 1998
Summary
⚫ How do we partition data in distributed systems?
⚫ To achieve balance and smoothness

⚫ Consistent hashing is widely used


⚫ Provides smoothness and load balancing
⚫ Load balancing can be impacted under node removal or addition
⚫ What about scalability? Need to consider the lookup mechanism

⚫ Virtual nodes
⚫ Can help with load imbalance caused by failures/additions
⚫ Also handles different server capacities
15
Handling Peer Churn (servers come and go)
⚫ Each server/peer knows the address of its two successors
⚫ Periodically pings to check aliveness
⚫ If the immediate successor leaves, choose the next server as the new immediate successor
⚫ Example: Peer 5 abruptly fails
⚫ Peer 4 notices, makes 8 its immediate successor
⚫ Asks 8 who its immediate successor is and makes it the second successor
⚫ Server 3 also updates its 2nd successor
[Figure: ring with peers 1, 3, 4, 5, 8, 10, 12, 15]
16
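A minimal sketch of the repair step described on this slide; the callback names stand in for the ping and the "who is your successor?" messages and are assumptions, not a prescribed API.

```python
# Each peer keeps two successors; if the first stops answering pings, promote
# the second and ask it for its own successor to refill the list.
class Peer:
    def __init__(self, ident, succ1, succ2):
        self.id, self.succ1, self.succ2 = ident, succ1, succ2

    def check_successor(self, is_alive, ask_successor_of):
        if not is_alive(self.succ1):
            self.succ1 = self.succ2
            self.succ2 = ask_successor_of(self.succ1)

# The slide's example: peer 5 fails, so peer 4 makes 8 its immediate successor
# and 8's successor (10) its second successor.
p4 = Peer(4, succ1=5, succ2=8)
p4.check_successor(is_alive=lambda pid: pid != 5,
                   ask_successor_of=lambda pid: {8: 10}[pid])
print(p4.succ1, p4.succ2)   # -> 8 10
```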
New Node Joining?
⚫ See book for an example

17
Next Question
⚫ How to efficiently locate where a data item is stored in a distributed hash
table (DHT)?

⚫ For Scalability:
⚫ A query should not take too long to resolve
⚫ A node should not keep track of too many other nodes

⚫ Any node can be asked “the question” – where is this key?


⚫ Actually: “What is the value corresponding to this key?”

18
Designing a Lookup Service
[Diagram: an application component issues get(key)? to one of server1, server2, …, serverN]

Data organized as (key, value) pairs distributed across servers

19
Resolving a Query

“What is the value associated with key 53?”
O(N) messages on average to resolve the query, when there are N peers
[Figure: the query for key 53 is forwarded peer to peer around the ring (peers 1, 12, 13, 25, 32, 40, 48, 60) until it reaches the responsible server, which returns the value]
20
Circular DHT with shortcuts
⚫ Each peer keeps track of the IP addresses of a predecessor, a successor, and shortcuts
⚫ Reduced from 6 to 3 internal messages for resolving the query for key 53
⚫ A compromise between the #shortcuts and the #messages
⚫ Possible to design shortcuts with O(log N) neighbors, O(log N) messages per query
[Figure: same ring (peers 1, 12, 13, 25, 32, 40, 48, 60), where each server has just one shortcut; the query “What is the value for key 53?” now takes 3 hops]
21
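The slide only states that O(log N) shortcuts and O(log N) messages are achievable. One standard way to pick such shortcuts is Chord-style fingers, sketched below on the example ring; this is an illustration, not necessarily the scheme the course has in mind.

```python
from bisect import bisect_left

RING_SIZE = 2 ** 6
ring = sorted([1, 12, 13, 25, 32, 40, 48, 60])

def successor(key_id: int) -> int:
    idx = bisect_left(ring, key_id % RING_SIZE)
    return ring[idx % len(ring)]

def shortcuts(node: int) -> list:
    """Shortcut i points at the successor of node + 2^i, giving O(log N) neighbors."""
    return sorted({successor((node + 2 ** i) % RING_SIZE) for i in range(6)})

print(shortcuts(1))   # node 1 keeps a handful of shortcuts instead of one successor
```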
DHT Lookup Schemes
⚫ Routed DHTs (what we just saw)

⚫ Centralized lookup service


⚫ Storage is still distributed

⚫ Zero-hop DHTs and Gossip

22
Centralized Lookup Service

23
Designing a Lookup Service
[Diagram: an application component issues get(key)? to one of server1, server2, …, serverN]

Data organized as (key, value) pairs distributed across servers

24
Centralized Lookup Service
[Diagram: the application component sends lookup(key)? to a central server, which replies “server2”; the component then sends get(key) to server2 among server1, server2, …, serverN]
The central server keeps an updated hash ring: which servers are up and how they are mapped on the ring

Data organized as (key, value) pairs

This design is used in the Google File System


25
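A minimal sketch of the two-step path on this slide (lookup at the central server, then get at the owning server); the class and method names are assumptions and this is not the GFS API.

```python
from bisect import bisect_left

class CentralServer:
    """Keeps the updated hash ring: which servers are up and where they sit."""
    def __init__(self, ring_positions):            # {ring position: server address}
        self.ring = dict(ring_positions)

    def lookup(self, key_id: int) -> str:
        positions = sorted(self.ring)
        idx = bisect_left(positions, key_id)
        return self.ring[positions[idx % len(positions)]]

class AppComponent:
    def __init__(self, central, servers):           # servers: {address: {key: value}}
        self.central, self.servers = central, servers

    def get(self, key_id):
        owner = self.central.lookup(key_id)          # first round trip: lookup(key)?
        return self.servers[owner].get(key_id)       # second round trip: get(key)

central = CentralServer({12: "server1", 40: "server2", 60: "server3"})
app = AppComponent(central, {"server1": {}, "server2": {35: "some value"}, "server3": {}})
print(app.get(35))    # key 35 -> successor 40 -> server2
```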
Analyzing Centralized Lookups
• Simple to implement
• Only a single master node keeps the updated hash ring

• Total delay in serving a get request


• ≥ RTT(app_comp ↔ central_server) + RTT(app_comp ↔ server)

• Master node is a potential bottleneck


• If many concurrent requests

• Single point of failure


26
app_comp = Application component
Zero-hop DHTs

27
Zero-hop DHTs
⚫ Each node tries to maintain an updated hash ring

⚫ All nodes send directly to the server immediately succeeding the key on
the hash ring

⚫ Reduces response times;


⚫ Delay ≈ RTT(app_comp ↔ server)

28
Implementing Zero-hop DHTs
⚫ One possible strategy

⚫ Each node in the system periodically pings (sends small msgs) every other
node
⚫ To learn whether the node is up or not
⚫ If a node does not reply to k consecutive pings, we assume that the node is down
⚫ If the node is down, remove it (and any virtual nodes associated with it) from the hash ring
⚫ To learn about the virtual node IDs that belong to the node

⚫ Do you see any issues with this scheme?

29
Zero-hop DHTs through Gossip
⚫ Periodically (e.g., once per second), each node contacts a randomly chosen
other node

⚫ Nodes exchange their lists of known nodes (including virtual node IDs)

⚫ Each node learns about the key ranges handled by other nodes

⚫ This design is used by Amazon

30
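A rough sketch of one gossip round as described above; the data kept per node and the method names are assumptions.

```python
import random

class GossipNode:
    def __init__(self, name, virtual_ids):
        # Known membership: node name -> that node's virtual IDs on the ring.
        self.name = name
        self.known = {name: set(virtual_ids)}

    def gossip_with(self, other):
        """One round: the two nodes exchange and merge their membership lists."""
        merged = {**self.known, **other.known}
        self.known, other.known = dict(merged), dict(merged)

nodes = [GossipNode(f"n{i}", virtual_ids=[10 * i, 10 * i + 5]) for i in range(4)]
for _ in range(5):                       # a few rounds of random pairwise gossip
    a, b = random.sample(nodes, 2)
    a.gossip_with(b)
print(sorted(nodes[0].known))            # the membership view spreads epidemically
```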
Analyzing Gossip based Zero-hop DHTs
⚫ What if there are a large number of highly unreliable nodes?
⚫ E.g., on average, a node is available only for one hour
⚫ In other words, nodes exhibit high churn; they constantly join and leave

⚫ Becomes difficult for a single node to have an accurate picture of all the
nodes in the system

31
Questions?

32
DHT Node Joining Example

33
New Node Joining
• Assume each node only knows about its predecessor and its (just one) successor
• Node with id=98 wants to join the ring
• Node 98 needs to know at least one DHT node
• Assume the known node is 10
[Figure: node 90 (Succ=105, Pred=60), node 105 (Succ=120, Pred=90), joining node N98 (Succ=null, Pred=null)]
34
New Node Joining
• Node 98 sends join(98) to node 10, which routes the request around the ring
• Node 90 returns node 105
• Node 98 updates its successor to 105
[Figure: node 90 (Succ=105, Pred=60), node 105 (Succ=120, Pred=90), N98 (Succ=105, Pred=null)]
35
New Node Joining
• Node 98 sends a periodic message to node 105
• Node 105 updates its predecessor to 98
[Figure: node 90 (Succ=105, Pred=60), node 105 (Succ=120, Pred=98), N98 (Succ=105, Pred=null)]
36
New Node Joining
• Node 90 sends a periodic message to 105 and learns that 105’s predecessor is now 98
• Node 90 updates its successor to 98
[Figure: node 90 (Succ=98, Pred=60), node 105 (Succ=120, Pred=98), N98 (Succ=105, Pred=null)]
37
New Node Joining
• Node 90 sends a periodic message to 98
• Node 98 updates its predecessor to 90
• This completes the joining
[Figure: node 90 (Succ=98, Pred=60), node 105 (Succ=120, Pred=98), N98 (Succ=105, Pred=90)]
38
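A minimal sketch of the joining sequence walked through above, with the "periodic message to the successor" modeled as a stabilize/notify pair; the routing of join(98) via node 10 is elided, and the class and method names are assumptions.

```python
# Node 98 joins between 90 and 105, then periodic stabilization fixes the links.
class Node:
    def __init__(self, ident, succ=None, pred=None):
        self.id, self.succ, self.pred = ident, succ, pred

    def join(self, successor: "Node"):
        """Adopt the successor returned by join(); predecessor is still unknown."""
        self.succ, self.pred = successor, None

    def stabilize(self):
        """Periodic message to the successor: learn its predecessor, adopt it as a
        closer successor if one has appeared, then notify the successor about us."""
        x = self.succ.pred
        if x is not None and in_between(x.id, self.id, self.succ.id):
            self.succ = x
        self.succ.notify(self)

    def notify(self, candidate: "Node"):
        if self.pred is None or in_between(candidate.id, self.pred.id, self.id):
            self.pred = candidate

def in_between(x, a, b):          # is x on the ring segment (a, b)?
    return a < x < b if a < b else x > a or x < b

n60, n90, n105, n120 = Node(60), Node(90), Node(105), Node(120)
n90.succ, n90.pred = n105, n60
n105.succ, n105.pred = n120, n90

n98 = Node(98)
n98.join(n105)        # join(98), routed via node 10, eventually returns node 105
n98.stabilize()       # node 105 updates its predecessor to 98
n90.stabilize()       # node 90 updates its successor to 98; node 98 learns pred=90
print(n90.succ.id, n98.pred.id, n105.pred.id)   # -> 98 90 98
```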
Case Study: BitTorrent

39
P2P file distribution: BitTorrent
▪ file divided into 256Kb chunks
▪ peers in torrent send/receive file chunks
tracker: tracks peers participating in torrent
torrent: group of peers exchanging chunks of a file

Alice arrives …
… obtains list of peers from tracker
… and begins exchanging file chunks with peers in torrent
40
P2P file distribution: BitTorrent
⚫ peer joining torrent:
⚫ has no chunks, but will accumulate them
over time from other peers
⚫ registers with tracker to get list of peers,
connects to subset of peers (“neighbors”)
⚫ while downloading, peer uploads chunks to other peers
⚫ peer may change peers with whom it exchanges chunks
⚫ churn: peers may come and go
⚫ once peer has entire file, it may (selfishly) leave or (altruistically) remain
in torrent
41
BitTorrent: requesting, sending file chunks

Requesting chunks:
⚫ At any given time, different peers have different subsets of file chunks
⚫ Periodically, Alice asks each peer for a list of chunks that they have
⚫ Alice requests missing chunks from peers, rarest first

Sending chunks: tit-for-tat
⚫ Alice sends chunks to those four peers currently sending her chunks at the highest rate
⚫ other peers are choked by Alice (do not receive chunks from her)
⚫ re-evaluate top 4 every 10 secs
⚫ every 30 secs: randomly select another peer, start sending chunks
⚫ “optimistically unchoke” this peer
⚫ newly chosen peer may join top 4
42
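An illustrative sketch of the two policies on this slide (rarest-first requesting and top-4 tit-for-tat with optimistic unchoke); the function names and data shapes are assumptions, not BitTorrent wire-protocol code.

```python
from collections import Counter
import random

def rarest_first(my_chunks: set, peer_chunks: dict) -> list:
    """Order the chunks I am missing by how few peers hold them (rarest first)."""
    counts = Counter(c for chunks in peer_chunks.values() for c in chunks)
    missing = [c for c in counts if c not in my_chunks]
    return sorted(missing, key=lambda c: counts[c])

def choose_unchoked(upload_rate_to_me: dict, optimistic=True) -> set:
    """Unchoke the four peers sending to me fastest, plus one random other peer."""
    top4 = sorted(upload_rate_to_me, key=upload_rate_to_me.get, reverse=True)[:4]
    unchoked = set(top4)
    others = [p for p in upload_rate_to_me if p not in unchoked]
    if optimistic and others:
        unchoked.add(random.choice(others))      # optimistic unchoke (every 30 s)
    return unchoked

peers = {"bob": {1, 2}, "carol": {2, 3}, "dave": {2}}
print(rarest_first(my_chunks={2}, peer_chunks=peers))    # e.g. [1, 3] (both equally rare)
print(choose_unchoked({"bob": 50, "carol": 90, "dave": 10, "eve": 70, "frank": 30}))
```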
BitTorrent: tit-for-tat
(1) Alice “optimistically unchokes” Bob
(2) Alice becomes one of Bob’s top-four providers; Bob reciprocates
(3) Bob becomes one of Alice’s top-four providers

higher upload rate: find better trading partners, get file faster!
43
Tracker Dependence
⚫ Heavy dependence on tracker
⚫ Single point of failure

⚫ Methods to reduce tracker dependency


⚫ Multi-tracker torrents
⚫ Peer Exchange (PEX)
⚫ Distributed Hashtable (DHT)

⚫ PEX may not be very useful when DHT is used


⚫ Private torrents normally disable DHT; PEX still works
44
Distributed Hash Table (DHT)
⚫ Trackerless torrents use DHT

⚫ Many, many other applications of DHT

45
DHT BitTorrent: bootstrap
⚫ Several options
⚫ Embedded in the torrent file
⚫ Recommended when a torrent file is created
⚫ Hardcoding
⚫ Nodes run by orgs with long-running servers
⚫ dht.transmissionbt.com
⚫ router.utorrent.com
⚫ router.bittorrent.com
⚫ Peer conversations / PEX
⚫ Ask the peers (probably, non-DHT) that you are already downloading other
torrents from

46
DHT Node vs. BitTorrent Peer
⚫ DHT Node participates in DHT
⚫ Peers participate in one or more torrents
⚫ Being a node and a peer is independent
⚫ You can just be a peer (and not a DHT node)
⚫ Not respond to DHT queries or Disable DHT in client
⚫ May still be able to query the DHT network
⚫ You can also just be a DHT node (and not a peer)
⚫ Enable DHT in client & Not participate in any torrent
⚫ Helps global DHT network
⚫ storing key-value pairs for random torrents
⚫ Can also be both – a DHT node and a BT peer
47
Questions?

48
Next Topic

49
Time Synchronization

50
Next …
⚫ Why time synchronization?

⚫ Why challenging?

⚫ Cristian’s algorithm
Why time synchronization?
⚫ You have to appear for a quiz but your watch is off by 15 mins
⚫ What if your watch is late by 15 mins?
⚫ You will miss the quiz
⚫ What if your watch is fast by 15 mins?
⚫ You will end up waiting longer than you intended

⚫ Time synchronization is required for:


⚫ Correctness
⚫ Fairness
Example: Cloud airline reservation system
⚫ Server A receives a request to purchase last ticket on flight ABC 123
⚫ Server A timestamps purchase using local clock 9h:15m:32.45s, and logs it
⚫ Replies ok to client
⚫ That was the last seat. Server A sends a message to Server B saying “flight
full”
⚫ Server B enters “Flight ABC 123 full” + its own local clock value (which reads
9h:10m:10.11s) into its log
⚫ Server C queries A’s and B’s logs. Is confused that a client purchased a ticket
at A after the flight became full at B
⚫ This may lead to further incorrect actions by Server C
53
Story of an Internet crash in 2012
⚫ In the night from 30 June to 1 July 2012, many online services and systems
around the world crashed simultaneously

⚫ Servers locked up and stopped responding

⚫ Some airlines could not process any reservations or check-ins for several hours.
What happened?

⚫ Read more about it at:


⚫ https://www.somebits.com/weblog/tech/bad/leap-second-2012.html
More examples
⚫ At what day and time did Ali transfer money to Ayesha?
⚫ Require accurate clocks (synchronized with a central authority)

⚫ More generally:
⚫ Use timestamps to order events in a distributed system
⚫ Requires the system clocks to be synchronized with one another
Questions?

56
