
Computer Networks

Distributed Hash Tables


Chapter

1. Introduction
2. Protocols Top-Down-Approach
3. Application layer
4. Web services
5. Distributed hash tables
n DHT
n Chord
6. Time synchronization
7. Transport layer
8. UDP and TCP
9. TCP performance
10. Network layer
11. Internet protocol
12. Data link layer

[Figure: protocol stack – Application, Presentation, Session, Transport, Network, Data link, Physical layer]

2 Computer Networks – Distributed Hash Tables


Distributed Hash Tables (DHT)

3 Computer Networks – Distributed Hash Tables


Reminder

n We have discussed the mapping: name → address


n The solution has been DNS
n Hierarchical organization
n Distributed
n Using redundancy
n Challenge:
n How to provide storage of pairs like the above one in a distributed way, even
if there is no strong hierarchy?

4 Computer Networks – Distributed Hash Tables


Hash Table

n Items: [Key, Value] are stored


n The key is hashed, i.e., transformed (using a hash function) so that the
result – the hash – can be used to locate a bucket in which the pair is
stored.
n The bucket is identified by an index.
n The bucket might contain multiple such items (pairs)
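As an illustration (not from the slides), a minimal sketch of how a key is hashed to a bucket index; the hash function and the bucket count are arbitrary choices for the example:

import hashlib

NUM_BUCKETS = 8  # arbitrary bucket count, chosen only for the example

def bucket_index(key: str) -> int:
    # Hash the key and reduce the digest to an index into the bucket array.
    digest = hashlib.sha1(key.encode()).digest()
    return int.from_bytes(digest, "big") % NUM_BUCKETS

print(bucket_index("Beattles"))  # an example key (the same one used in later figures)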

5 Computer Networks – Distributed Hash Tables


Hash Table - Collision Resolution

n Hash collisions are practically unavoidable when hashing a random subset of a


large set of possible keys.
n E.g., if 2450 keys are hashed into 10^6 buckets, even with a perfectly uniform random
distribution there is a ~95% chance of at least two of the keys being hashed to the same
slot → birthday paradox

n Widely used collision resolution strategies (a chaining sketch follows below):


n Separate chaining: each bucket is independent & holds some sort of list of entries with
the same index
n Open addressing: all entry records are stored in the bucket array itself & a probe
sequence is used

[Figure: probability of at least two people sharing a birthday, as a function of group size; src: Wikipedia]
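To make the separate-chaining idea concrete, a minimal sketch (not part of the original slides); the bucket count and Python's built-in hash() are arbitrary choices:

class ChainedHashTable:
    """Minimal separate-chaining hash table (illustrative sketch)."""

    def __init__(self, num_buckets: int = 8):
        # Each bucket holds a list of (key, value) pairs that share the same index.
        self.buckets = [[] for _ in range(num_buckets)]

    def _index(self, key) -> int:
        return hash(key) % len(self.buckets)

    def insert(self, key, value):
        bucket = self.buckets[self._index(key)]
        for i, (k, _) in enumerate(bucket):
            if k == key:               # key already present: overwrite its value
                bucket[i] = (key, value)
                return
        bucket.append((key, value))    # empty slot or collision: chain the new pair

    def lookup(self, key):
        for k, v in self.buckets[self._index(key)]:
            if k == key:
                return v
        return None

table = ChainedHashTable()
table.insert("Beattles", "some value")
print(table.lookup("Beattles"))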

6 Computer Networks – Distributed Hash Tables


A Distributed Hash Table (DHT)

n Remember the mapping of names to IP addresses?
Could we use hash tables? Remember the scaling issue …
n Distributed Hash Tables (DHT) spread the pairs across a number of
computers (buckets) located arbitrarily across the world.
n Note: copies of a single pair can be stored in one or in multiple locations!
n When a user queries the system, i.e., provides the key, the system uses
the hash to find the pair from one of the computers where it’s stored and
returns the result.
n All the nodes are assumed to be reachable by some kind of unicast
communication.
n DHTs possess the features of scaling, robustness, and self-organization.

[Figure: the DHT runs as an overlay above the transport layer (stack: DHT over TCP over IP)]

7 Computer Networks – Distributed Hash Tables


Hash Table vs. DHT

n The key is hashed to find the proper bucket in a hash table


n In a Distributed Hash Table (DHT), nodes are the hash buckets


n The key is hashed to find the hash bucket (node) responsible for it

n Pairs are distributed among the nodes with respect to load balancing

[Figure: lookup(key) → value and insert(key, value) on a plain hash table vs. a DHT;
the key “Beattles” is mapped by h(key) % N to position 2, selecting either bucket 2
of the local bucket array (0 … N-1) or the responsible node out of the N nodes]

[Khalifeh, op. cit.]

8 Computer Networks – Distributed Hash Tables


DHT Interface

n Minimal interface (data-centric)


n lookup(key) → value
n insert(key, value)
n delete(key)
n Supports a wide range of applications, because it imposes few restrictions
n Value is application dependent
n Keys have no semantic meaning
n Note: DHTs do not have to store data useful to end users, e.g., data files
… data storage can be built on top of DHTs
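A minimal sketch of this data-centric interface; the method names follow the slide, everything else (the types and the abstract base class) is an illustrative assumption:

from abc import ABC, abstractmethod

class DHT(ABC):
    """Minimal data-centric DHT interface as listed above."""

    @abstractmethod
    def lookup(self, key: bytes):
        """Return the value stored under key, or None if absent."""

    @abstractmethod
    def insert(self, key: bytes, value: bytes) -> None:
        """Store the (key, value) pair; the value is application dependent."""

    @abstractmethod
    def delete(self, key: bytes) -> None:
        """Remove the pair stored under key."""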

9 Computer Networks – Distributed Hash Tables


DHTs: Problems

n Problem 1 (dynamicity): adding or removing nodes


n With hash mod m (= no. of nodes), virtually every key will change its location
when the number of nodes changes (see the small demo after this list)!
h(k) mod m ≠ h(k) mod (m+1) ≠ h(k) mod (m-1)
n Solution: use consistent hashing
n Define a fixed hash space
n All hash values fall within that space and do not depend on the number of
peers (hash bucket)
n Each key goes to peer closest to its ID in hash space (according to some
proximity metric)
n Problem 2 (size): all nodes must be known (in order to insert or lookup
items!)
n This only works with small and static server populations
n Solution: each peer knows of only a few “neighbors”
n Messages are routed through neighbors via multiple hops
[Felber, op. cit.]
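The small demo below (not from the slides) illustrates both points: with hash mod m nearly every key moves when a node is added, whereas with consistent hashing over a fixed hash space only the keys between the new node and its predecessor move. The hash space size and node names are arbitrary choices:

import hashlib
from bisect import bisect_left

SPACE = 2 ** 16                     # fixed hash space, size chosen only for the example

def h(s: str) -> int:
    return int.from_bytes(hashlib.sha1(s.encode()).digest(), "big") % SPACE

keys = [f"key-{i}" for i in range(1000)]

# Problem 1: with hash mod m, adding one node relocates almost every key.
def naive_owner(key: str, m: int) -> int:
    return h(key) % m

moved = sum(naive_owner(k, 10) != naive_owner(k, 11) for k in keys)
print(f"hash mod m: {moved}/1000 keys move when m grows from 10 to 11")

# Solution: consistent hashing; a key belongs to the first node whose id is >= h(key),
# wrapping around the ring, independent of how many nodes there are.
def ring_owner(key: str, node_ids) -> int:
    ids = sorted(node_ids)
    i = bisect_left(ids, h(key))
    return ids[i % len(ids)]

nodes = [h(f"node-{i}") for i in range(10)]
moved = sum(ring_owner(k, nodes) != ring_owner(k, nodes + [h("node-10")]) for k in keys)
print(f"consistent hashing: {moved}/1000 keys move when one node joins")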

10 Computer Networks – Distributed Hash Tables


Chord: Identifier to Node Mapping

n Chord is a DHT developed in 2001
n Associate to each node and item a unique id in a unidimensional
space 0 to 2^m - 1
n Node 8 maps [5, 8]
n Node 15 maps [9, 15]
n Node 20 maps [16, 20]
n …
n Node 4 maps [59, 4]
n Each node maintains a pointer to its successor

[Figure: Chord ring with the example nodes 4, 8, 15, 20, 32, 35, 44, 58]
[Shenker, op. cit.]
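A minimal sketch, assuming the node ids from the figure, of how an id is mapped to its successor on the ring:

from bisect import bisect_left

NODES = sorted([4, 8, 15, 20, 32, 35, 44, 58])   # node ids from the figure

def successor(ident: int) -> int:
    # First node clockwise from ident, wrapping around the ring.
    i = bisect_left(NODES, ident)
    return NODES[i % len(NODES)]

print(successor(37))   # -> 44, matching the lookup(37) example on the next slide
print(successor(61))   # -> 4, because the interval [59, 4] wraps around the ring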

11 Computer Networks – Distributed Hash Tables


Chord: Lookup

n Each node maintains its successor


n Route packet (ID, data) to the node responsible for ID using successor pointers

[Figure: lookup(37) on the ring of nodes 4, 8, 15, 20, 32, 35, 44, 58; the request is
forwarded along successor pointers until it reaches node 44, which is responsible for id 37]
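A hedged sketch of this hop-by-hop routing over successor pointers, reusing the ring of the figure (a local simulation, not a networked implementation):

NODES = [4, 8, 15, 20, 32, 35, 44, 58]            # node ids from the figure
SUCC = {n: NODES[(i + 1) % len(NODES)] for i, n in enumerate(NODES)}

def in_interval(x: int, a: int, b: int) -> bool:
    """True if x lies in the half-open ring interval (a, b], wrapping around."""
    return (a < x <= b) if a < b else (x > a or x <= b)

def lookup(start_node: int, ident: int) -> int:
    """Forward the request along successor pointers until the node responsible for ident."""
    node = start_node
    while not in_interval(ident, node, SUCC[node]):
        node = SUCC[node]            # one hop per unmatched interval: O(N) hops worst case
    return SUCC[node]

print(lookup(8, 37))   # 8 -> 15 -> 20 -> 32 -> 35; node 44 is responsible for id 37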

12 Computer Networks – Distributed Hash Tables


Chord: Joining Operation

n Each node A periodically sends a stabilize() message to its successor B


n Upon receiving a stabilize() message, node B
n returns its predecessor B’= pred(B) to A by sending a notify(B’) message
n Upon receiving notify(B’) from B,
n if B’ is between A and B, A updates its successor to B’
n Otherwise, A does nothing
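A compact sketch of this periodic stabilization, modeled with in-process node objects; the on_stabilize() helper and the bookkeeping details are assumptions for illustration, real nodes would exchange the stabilize()/notify() messages over the network:

class Node:
    """Chord-style node keeping only successor/predecessor pointers (sketch)."""

    def __init__(self, ident: int):
        self.id = ident
        self.successor = self        # a ring with a single node points at itself
        self.predecessor = None

    @staticmethod
    def between(x, a, b):
        """True if x lies strictly between a and b on the ring (wrapping around)."""
        return (a < x < b) if a < b else (x > a or x < b)

    def stabilize(self):
        """A periodically sends stabilize() to its successor B."""
        b_prime = self.successor.on_stabilize(self)   # B answers with notify(B')
        if b_prime is not None and Node.between(b_prime.id, self.id, self.successor.id):
            self.successor = b_prime                  # B' sits between A and B

    def on_stabilize(self, sender: "Node"):
        """B's side: possibly adopt the sender as predecessor, reply with notify(pred(B))."""
        reply = self.predecessor
        if self.predecessor is None or Node.between(sender.id, self.predecessor.id, self.id):
            self.predecessor = sender
        return reply

# Reproduce the example of the next slides: node 50 joins between 44 and 58.
n44, n50, n58 = Node(44), Node(50), Node(58)
n44.successor, n58.predecessor = n58, n44        # initial ring segment 44 -> 58
n50.successor, n58.predecessor = n58, n50        # join(50) has reached node 58
n44.stabilize()                                  # 44 learns about 50 from notify()
n44.stabilize()                                  # ... then 50 learns that 44 precedes it
print(n44.successor.id, n50.predecessor.id)      # -> 50 44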

13 Computer Networks – Distributed Hash Tables


Chord: Joining Operation (II)

n Node with id=50 joins the ring


n Node 50 needs to know at least one node already in the system
n Assume the known node is 15

[Figure: ring state before the join – node 58 has succ=4, pred=44; node 44 has succ=58,
pred=35; the joining node 50 starts with succ=nil, pred=nil]

14 Computer Networks – Distributed Hash Tables


Chord: Joining Operation

n Node 50 asks node 15 to forward the join message to successor(50), which is 58


n When join(50) reaches the destination (i.e., node 58), node 58
n updates its predecessor to 50,
n returns a notify message to node 50
n Node 50 updates its successor to 58

[Figure: node 58 changes pred=44 to pred=50 and sends notify() to node 50, which changes
succ=nil to succ=58; node 44 still has succ=58, pred=35]

15 Computer Networks – Distributed Hash Tables


Chord: Joining Operation (cont’d)

n Node 44 sends a stabilize message to its successor, node 58
n Node 58 replies with a notify(predecessor=50) message
n Node 44 updates its successor to 50

[Figure: node 44 (succ=58, pred=35) sends stabilize() to node 58 (succ=4, pred=50);
node 58 answers notify(predecessor=50); node 44 changes its successor to 50;
node 50 still has succ=58, pred=nil]

16 Computer Networks – Distributed Hash Tables


Chord: Joining Operation (cont’d)

n Node 44 sends a stabilize message to its new successor, node 50
n Node 50 sets its predecessor to node 44

[Figure: node 44 (succ=50, pred=35) sends stabilize() to node 50 (succ=58), which changes
pred=nil to pred=44; node 58 keeps succ=4, pred=50]

17 Computer Networks – Distributed Hash Tables


Chord: Joining Operation (cont’d)

n This completes the joining operation!

[Figure: final ring state – node 58 has pred=50; node 50 has succ=58, pred=44;
node 44 has succ=50]

18 Computer Networks – Distributed Hash Tables


Achieving Efficiency: Finger Tables

Say m=7
Finger Table at k=80
start[i] = (k + 2^i) mod 2^m
ft[i] = successor(start[i])
e.g., (80 + 2^6) mod 2^7 = 16

i   start   ft[i]
0   81      96
1   82      96
2   84      96
3   88      96
4   96      96
5   112     112
6   16      20

[Figure: Chord ring (m = 7) with nodes including 20, 32, 45, 80, 96, 112; arrows show
node 80's fingers at 80 + 2^0 … 80 + 2^6]

The i-th entry at the peer with id n is the first peer with id >= (n + 2^i) mod 2^m
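A small sketch (not from the slides) that reproduces the table above; the set of node ids is read off the figure and is an assumption of the example:

from bisect import bisect_left

M = 7
NODES = sorted([20, 32, 45, 80, 96, 112])   # node ids visible in the figure (assumed)

def successor(ident: int) -> int:
    # First node with id >= ident, wrapping around the 2^M id space.
    i = bisect_left(NODES, ident % 2 ** M)
    return NODES[i % len(NODES)]

def finger_table(k: int):
    return [((k + 2 ** i) % 2 ** M, successor(k + 2 ** i)) for i in range(M)]

for i, (start, ft) in enumerate(finger_table(80)):
    print(i, start, ft)   # matches the table: ft = 96, 96, 96, 96, 96, 112, 20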

19 Computer Networks – Distributed Hash Tables


Lookups using Finger Tables (II)

n Ex. 1: key=3 at node 1:


n Node 1 knows that 3 lies between it &
its successor (4) → desired node is 4

n Ex. 2: key=16 at node 1:


n Using FT we see that closest
predecessor to 16 is 9 → request is
forwarded to node 12
n Node 12 uses FT to find out that
closest predecessor to 16 is 14 →
request is forwarded to node 15
n Node 15 observes that 16 lies
between it and its successor (20) → it
returns address of node 20 to caller
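A hedged sketch of finger-table routing (jump via the closest preceding finger), reusing the m = 7 ring of the previous slide rather than the figure of this one:

from bisect import bisect_left

M = 7
NODES = sorted([20, 32, 45, 80, 96, 112])   # reusing the finger-table example's ring

def successor(ident: int) -> int:
    i = bisect_left(NODES, ident % 2 ** M)
    return NODES[i % len(NODES)]

def fingers(n: int):
    """Finger table of node n: successor(n + 2^i) for i = 0 .. m-1."""
    return [successor(n + 2 ** i) for i in range(M)]

def between(x: int, a: int, b: int, inclusive: bool = False) -> bool:
    """x in ring interval (a, b), or (a, b] when inclusive, wrapping around."""
    if inclusive and x == b:
        return True
    return (a < x < b) if a < b else (x > a or x < b)

def closest_preceding_finger(n: int, key: int) -> int:
    for f in reversed(fingers(n)):       # scan fingers from farthest to nearest
        if between(f, n, key):
            return f
    return n

def find_successor(n: int, key: int) -> int:
    """Jump via the closest preceding finger until key falls in (node, successor]."""
    while not between(key, n, fingers(n)[0], inclusive=True):
        n = closest_preceding_finger(n, key)
    return fingers(n)[0]

print(find_successor(20, 100))   # routed 20 -> 96; node 112 is responsible for key 100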

20 Computer Networks – Distributed Hash Tables


Chord Performance (Improvements)

n Chord properties
n Routing table size O(log(N)), where N is the total number of nodes
n Guarantees that a file is found in O(log(N)) steps
n Reducing latency
n Choose the finger that reduces the expected time to reach the destination
n Choose the network-closest node from the range [n + 2^(i-1), n + 2^i) as the i-th finger
n Stretch is another parameter:

stretch = latency for each lookup on the overlay topology / average latency on the underlying topology

n Nodes may be close on the ring, but far away in the Internet


n Goal: put nodes in the routing table that result in few hops and low latency

21 Computer Networks – Distributed Hash Tables


Latency Stretch in Chord

[Figure: a Chord network with N (= 8) nodes and an m (= 8)-bit key space (ids 0 … 255);
the nodes are spread between the U.S.A. and China, so overlay hops between ring neighbors
can cross long physical links. Legend: network node, data, overlay routing, physical link]

[Ratnasamy, op. cit.]

22 Computer Networks – Distributed Hash Tables


Achieving Robustness

n To improve robustness, each node maintains its k (> 1) immediate


successors instead of only one successor
n In the notify() message, node A can send its k-1 successors to its
predecessor B
n Upon receiving the notify() message, B can update its successor list by
concatenating the successor list received from A with A itself
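A hedged sketch of how B could refresh its successor list from such a notify() message; the value of k and the data layout are assumptions:

K = 3   # assumed number of successors kept per node

def refresh_successor_list(a_id, a_successors, k=K):
    """B's view: its new successor list is A followed by A's first k-1 successors."""
    return ([a_id] + list(a_successors))[:k]

# Example: A (node 58) reports successors [4, 8]; B (node 44) now keeps [58, 4, 8].
print(refresh_successor_list(58, [4, 8]))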

23 Computer Networks – Distributed Hash Tables


The Problem of Membership Churn

n In a system with 1,000s of machines, some machines are failing / recovering


at all times
n This process is called churn
n Without repair, the quality of the overlay network degrades over time
n A significant problem in deployed DHT systems

Observation: in more than 50 % of the cases, the mean time between failures (MTBF) is on


the order of minutes.

[Rhea, op. cit.]

24 Computer Networks – Distributed Hash Tables


What Makes a Good DHT Design?

n The number of neighbors for each node should remain “reasonable”


(small degree)
n DHT routing mechanisms should be decentralized (no single point of
failure or bottleneck)
n Should gracefully handle nodes joining and leaving
n Repartition the affected keys over existing nodes
n Reorganize the neighbor sets
n Bootstrap mechanisms to connect new nodes into the DHT
n DHT must provide low stretch
n Minimize ratio of DHT routing vs. unicast latency between two nodes

[Felber, op. cit.]

25 Computer Networks – Distributed Hash Tables


Multiple Solutions

n Chord
n Tapestry
n Uses locally optimal routing tables to reduce routing stretch
n Pastry
n Routing overlay network to reduce cost of routing a packet
n CAN
n Virtual multi-dimensional Cartesian coordinate space

26 Computer Networks – Distributed Hash Tables


DHTs Support Many Applications

n File sharing [CFS, OceanStore, PAST, …]


n Web cache [Squirrel, …]
n Censor-resistant stores [Eternity, FreeNet, …]
n Application-layer multicast [Narada, …]
n Event notification [Scribe]
n Naming systems [ChordDNS, INS, …]
n Query and indexing [Kademlia, …]
n Communication primitives [I3, …]
n Backup store [HiveNet]
n Web archive [Herodotus]

27 Computer Networks – Distributed Hash Tables


DHT is a Good Shared Infrastructure

n A single DHT (namely, Open DHT) is shared across multiple applications,


thus amortizing the cost of deployment.
n Applications inherit some security and robustness from DHT
n DHT replicates data
n Resistant to malicious participants
n Low-cost deployment
n Self-organizing across administrative domains
n Can be shared among applications

[Kaashoek, op. cit.]

28 Computer Networks – Distributed Hash Tables


DHT as an Infrastructure

Open DHT Architecture

[Rhea, op. cit.]

29 Computer Networks – Distributed Hash Tables


Open DHT Deployment Model

n A single DHT (namely, Open DHT) is shared across multiple applications,


thus amortizing the cost of deployment.
n Each DHT node serves as a gateway into the DHT for clients.
n Any Internet-connected computer can act as a client:
n Clients of Open DHT do not need to run a DHT node
n They use the DHT services, i.e., they can store or put key-value pairs in Open DHT,
and can retrieve or get the value stored under a particular key
n An Open DHT client communicates with the DHT through the gateway of
its choice using an RPC over TCP. The gateway processes the
operations on the client's behalf.
n Because of this, the service is easy to access from virtually every
programming language.
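As a purely hypothetical sketch of this model, a client that asks a gateway to put and get pairs via an RPC over TCP; the gateway address, port, and method names below are illustrative assumptions, not the actual Open DHT API:

import xmlrpc.client
from typing import Optional

# Hypothetical gateway endpoint; host, port, and method names are made up for the example.
GATEWAY = "http://dht-gateway.example.org:5851/"
gateway = xmlrpc.client.ServerProxy(GATEWAY)

def put(key: bytes, value: bytes) -> None:
    # Ship the pair to the gateway, which performs the DHT insert on the client's behalf.
    gateway.put(xmlrpc.client.Binary(key), xmlrpc.client.Binary(value))

def get(key: bytes) -> Optional[bytes]:
    # Ask the gateway to look the key up in the DHT and return the stored value.
    result = gateway.get(xmlrpc.client.Binary(key))
    return bytes(result.data) if result is not None else None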

30 Computer Networks – Distributed Hash Tables
