
People's Democratic Republic of Algeria

Ministry of Higher Education and Scientific Research

Université Akli Mohand Oulhadj de Bouira

Faculty of Sciences and Applied Sciences

Department of Computer Science

SDA Project in Computer Science

Speciality: Information Systems and Software Engineering

Topic

Distributed Hash Tables

Supervised by: D. Mourad Amad

Carried out by:
- Nacer-Bey Sara
- Kahlouche Nour-EL-houda
- Yahoui Soheib

2022/2023

Table of Contents

1 Introduction
2 Concept of a DHT
3 Properties of a DHT
4 Structure of a DHT
5 Popular DHT protocols and implementations
   5.1 Chord
      5.1.1 Example: Chord
   5.2 Kademlia
      5.2.1 Example: Kademlia
   5.3 Apache Cassandra
   5.4 TomP2P
6 Example (creating the network)
7 DHT Algorithms
   7.1 Overview
   7.2 Kademlia
8 Circular DHT
9 Advantages of DHT
10 Disadvantages of DHT
11 Conclusion
Bibliography

1. Introduction:

With the rise of cryptocurrencies in recent years, the topic of
"decentralization" in general has gained a substantial amount of
popularity. As decentralization finds its way into the vocabulary of
more and more technology enthusiasts, the probability grows that
those people become misinformed about the underlying concepts of
that particular technology. These topics should be handled with
care; they are not something that can be explained in a ~10 min.
video, so a substantial amount of determination and logical rigor
is required from the consumer of such information.

In this article, I will try to explain the concept of the Distributed
Hash Table and will present the advantages and disadvantages of
implementing such structures in software systems.

2. Concept of a DHT:

A distributed hash table (DHT) is a distributed system that


provides a lookup service similar to a hash table: key-value pairs are
stored in a DHT, and any participating node can efficiently retrieve the
value associated with a given key. The main advantage of a DHT is
that nodes can be added or removed with minimum work around
re-distributing keys. Responsibility for maintaining the mapping
from keys to values is distributed among the nodes, in such a way that a
change in the set of participants causes a minimal amount of
disruption. This allows a DHT to scale to extremely large numbers of
nodes and to handle continual node arrivals, departures, and failures.

A Distributed Hash Table is a decentralized data store that looks up


data based on key-value pairs. Every node in a DHT is responsible for a
set of keys and their associated values/resources. A key may be assigned
to a hash value of a specific variable or the hashed contents of a
particular file resource on the system. Furthermore, the key is a unique
identifier for its associated data value, which can be any form of data
[file, variable, etc...], created by running the value through a hashing
function. A node may be responsible for one or many keys, depending on
the architecture of the network of nodes.
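To make the key-creation step concrete, here is a minimal sketch of deriving a key from a value by hashing it. The function name `make_key` and the 16-bit keyspace are illustrative assumptions, not part of any particular DHT; SHA-1 is used because the text mentions it later.

```python
import hashlib

def make_key(value: bytes, key_bits: int = 16) -> int:
    """Derive a fixed-size key from arbitrary data (illustrative helper).

    The SHA-1 digest is reduced modulo 2**key_bits to fit a small keyspace.
    """
    digest = hashlib.sha1(value).digest()
    return int.from_bytes(digest, "big") % (2 ** key_bits)

# The same content always maps to the same key, so any node can recompute
# where a value lives without asking a central authority.
key = make_key(b"contents of file1")
```

Hashing the value, rather than numbering items sequentially, is what lets every participant independently agree on where a given piece of data belongs.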

3. Properties of a DHT:

- Autonomy and decentralization — nodes collectively form the system without any central coordination.

- Fault tolerance — the system should be reliable (in some sense) even with nodes continuously joining, leaving, and failing.

- Scalability — the system should function efficiently even with thousands or millions of nodes.

- Reliability — nodes fail and new ones are added.

- Performance — fast retrieval of information on a larger scale.

4. Structure of a DHT:

- A key-space partitioning scheme that splits the keyspace among the participating nodes.

- An overlay network that connects the nodes, allowing them to find the owner of any given key in the keyspace.

- A hash algorithm (e.g. SHA-1).

- Consistent hashing, which ensures that the removal or addition of one node changes only the set of keys owned by the nodes with adjacent IDs, and leaves all other nodes unaffected.
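A minimal sketch of the consistent-hashing property, assuming the common convention that a key is owned by the first node clockwise whose ID is greater than or equal to the key; the class name `Ring` and helper `hash_id` are hypothetical.

```python
import bisect
import hashlib

def hash_id(name: str, bits: int = 8) -> int:
    """Place a node or key on the ring by hashing its name (e.g. an IP)."""
    digest = hashlib.sha1(name.encode()).digest()
    return int.from_bytes(digest, "big") % (2 ** bits)

class Ring:
    """Consistent hashing: key k is owned by the first node ID >= k (wrapping)."""
    def __init__(self, node_ids):
        self.nodes = sorted(node_ids)

    def owner(self, key: int) -> int:
        i = bisect.bisect_left(self.nodes, key)
        return self.nodes[i % len(self.nodes)]  # wrap past the largest ID

# Removing node 20 only moves the keys node 20 owned; all others are untouched.
before = Ring([10, 20, 30])
after = Ring([10, 30])
```

Here `before.owner(15)` is 20 while `after.owner(15)` is 30, but a key such as 5, owned by node 10, is unaffected by the removal: only keys with adjacent IDs move.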

5. Popular DHT protocols and implementations:

5.1 Chord:
In computing, Chord is a protocol and
algorithm for a peer-to-peer distributed hash table. A distributed hash
table stores key-value pairs by assigning keys to different computers
(known as “nodes”); a node will store the values for all the keys for which
it is responsible. Chord specifies how keys are assigned to nodes, and how
a node can discover the value for a given key by first locating the node
responsible for that key.

Chord is one of the four original distributed hash table protocols, along
with CAN, Tapestry, and Pastry. It was introduced in 2001 by Ion Stoica,
Robert Morris, David Karger, Frans Kaashoek, and Hari Balakrishnan,
and was developed at MIT. For more information, one can refer to the
original paper. Chord is based on consistent hashing, something
which I’ve implemented in a previous article.

 4.1.1DHT :Example – Chord :

- Associate with each node and each file a unique ID in a one-dimensional space (a ring).

- E.g., pick IDs from the range [0 … 2^m − 1].

- Usually the ID is the hash of the file or of the node's IP address.

Properties:

- Routing table size is O(log N), where N is the total number of nodes.
- Guarantees that a file is found in O(log N) hops.
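The O(log N) routing table comes from Chord's finger table: entry i of node n points at the successor of (n + 2^i) mod 2^m. This small sketch computes only the targets, not a full Chord implementation; the function name is illustrative.

```python
def finger_targets(node_id: int, m: int) -> list[int]:
    """IDs that node `node_id`'s m finger-table entries point toward.

    Entry i targets (node_id + 2**i) mod 2**m; the actual table stores the
    successor node of each target. Because successive targets double the
    distance, each routed hop can roughly halve the remaining gap to the
    key, giving O(log N) lookups.
    """
    return [(node_id + 2 ** i) % (2 ** m) for i in range(m)]
```

For instance, with m = 3 the node with ID 1 keeps fingers toward IDs 2, 3, and 5, i.e. log-many entries instead of one entry per node.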

5.2 Kademlia:

Kademlia is a P2P distributed hash table implementation that distributes the
key-value stores across nodes in a network and retrieves them without
any central authority/database.

5.2.1 Example: Kademlia

Let us say that all IDs and keys are in a keyspace of [0, 1, …, 2^3 − 1] and are
represented in binary. We can represent this space as a complete binary tree
where each leaf node is a key. Circled leaves are IDs that correspond to a
participating computer in the network. In the example (Figure 3), three
computers are participating in the protocol with IDs of 000, 110, and 111
respectively.

Keys are assigned to the computer whose ID is closest under the XOR metric
d(x, y) = x ⊕ y. First, consider key 100 and computer ID 000: the two differ
already at bit 0, and 100 ⊕ 000 = 100 (distance 4). Next, consider key 100 and
computer IDs 110 and 111. All three share a common prefix of 1, and bit 1 of
110 and 111 are equal, so we consider the first bit at which they differ: bit 2.
Since 100 ⊕ 110 = 010 (distance 2) while 100 ⊕ 111 = 011 (distance 3),
key 100 should be assigned to the computer whose ID is 110.

Figure 2: the correct assignments of 100 to 110 and 101 to 111.
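The assignment can be checked directly with Kademlia's XOR metric; this sketch (function names are illustrative) reproduces the assignments from Figure 2.

```python
def xor_distance(a: int, b: int) -> int:
    """Kademlia's distance between two IDs is simply their bitwise XOR."""
    return a ^ b

def closest_node(key: int, node_ids) -> int:
    """A key is stored on the participating node whose ID is XOR-closest."""
    return min(node_ids, key=lambda node: xor_distance(key, node))

nodes = [0b000, 0b110, 0b111]
```

`closest_node(0b100, nodes)` is `0b110` (distance 2 beats 3 and 4) and `closest_node(0b101, nodes)` is `0b111`, matching Figure 2.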

5.3 Apache Cassandra:

The Apache Cassandra project develops a highly scalable, eventually
consistent, distributed, structured second-generation distributed
database. Cassandra is a key-value store that brings together the data
model of Google's BigTable. Like Dynamo, Cassandra is eventually
consistent. Like BigTable, Cassandra provides a Column Family-based data
model richer than typical key-value systems.

In this section, I briefly illustrate the technology behind Cassandra and
how it works.

5.4 TomP2P:

TomP2P is a P2P library and a distributed hash table (DHT)
implementation which provides a decentralized key-value infrastructure for
distributed applications. Each peer has a table that can be configured to be
either disk-based or memory-based to store its values.

TomP2P stores key-value pairs in a distributed manner. To find the peers on
which to store the data in the distributed hash table, TomP2P uses iterative
routing to find the closest peers. Since TomP2P uses non-blocking
communication, a future object is required to keep track of future results.
This key concept is used for all communication in TomP2P (iterative routing
and DHT operations, such as storing a value on multiple peers) and is also
exposed in the API. Thus, an operation such as get or put will return
immediately, and the user can either block to wait for completion or add a
listener that gets notified when the operation completes.
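TomP2P itself is a Java library, but the future-based pattern described above can be sketched in a few lines of Python (the class `AsyncStore` is a toy stand-in, not TomP2P's API): operations return a future immediately, and the caller either blocks on the result or attaches a listener.

```python
from concurrent.futures import Future, ThreadPoolExecutor

class AsyncStore:
    """Toy non-blocking key-value store: put/get return futures immediately."""

    def __init__(self) -> None:
        self._data: dict = {}
        self._pool = ThreadPoolExecutor(max_workers=2)

    def put(self, key, value) -> Future:
        # Returns at once; the actual write happens on a worker thread.
        return self._pool.submit(self._data.__setitem__, key, value)

    def get(self, key) -> Future:
        return self._pool.submit(self._data.get, key)

store = AsyncStore()
store.put("answer", 42).result()               # block until completion...
store.get("answer").add_done_callback(         # ...or register a listener
    lambda fut: print("got", fut.result()))
```

The two ways of consuming the future at the end mirror the choice the text describes: blocking on `result()` versus being notified via a listener.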


6. Example (creating the network):

0. Assigning Node IDs :

In this example we will create a DHT with 8 nodes. These 8 nodes will
be connected such that the network conceptually forms a "circle"
starting from the first node. Each node will be assigned an ID from the
following array of IDs: {10, 20, 30, 40, 50, 60, 70, 80}. The first node
will have an ID of 10, the second an ID of 20, and so forth. (Figure 03)

Figure 03:

Each node will have a successor (the node immediately after the current
node that is connected in the network, i.e. the next node) and a
predecessor (the node immediately before the current node that is connected
in the network, i.e. the previous node). (Figure 04)

Figure 04:

1. Resource assignment based on Node ID :

Let us consider that the main purpose of the DHT in this example is to
store the contents of a particular set of files. The specific content of each
file does not matter for this example; however, I would like to use file
contents in order to make the example seem closer to a real-world
problem.

Suppose that we have 8 files that contain very important information:
file1, file2, … In a normal hash table we would hash either the absolute
path to the file, the relative path to the file, or even all the file
contents. Regardless of the choice, the output of the hash will be the "key"
in the key-value pair in our hash table. In this example, for simplicity,
the outputs of hashing our files will be the following array of
keys: {15, 25, 35, 45, 55, 65, 75, 85}. (Figure 05)

Figure 05:

Now that the key-value pairs are available, a "resource distribution" scheme
must be chosen. In this example I will keep things simple and will choose
the resource distribution rule to be as follows:

"Resource with key N will be managed by the node with ID M, where M <
N and M is the biggest ID number in the set." If there is a resource with
N = 25 and there are 3 nodes with IDs 10, 20 and 30, this particular
resource will be managed by the node with ID M = 20: not 30, because 30
> 25, and not 10, because 20 exists and 20 > 10.
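The distribution rule above can be written down directly. One detail the example leaves unspecified is the wrap-around case (a key smaller than every node ID); assigning such a key to the largest node is my assumption, flagged in the code.

```python
def responsible_node(key: int, node_ids) -> int:
    """Rule from the example: the node with the biggest ID M such that M < key."""
    below = [n for n in sorted(node_ids) if n < key]
    # Assumption: if no node ID lies below the key, wrap to the largest node.
    return below[-1] if below else max(node_ids)

nodes = [10, 20, 30, 40, 50, 60, 70, 80]
keys = [15, 25, 35, 45, 55, 65, 75, 85]
assignment = {k: responsible_node(k, nodes) for k in keys}
```

`responsible_node(25, [10, 20, 30])` returns 20, reproducing the worked case in the text, and each key in the array lands on the node whose ID sits just below it.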

The resulting simple DHT can be seen in Figure 06:

Figure 06:

2. Removing a Node from the network :

In order to remove a node from the network, we first need to decide
where its keys are going to be stored after the removal of the
node. If we want to remove the node with ID 80, we need to assign its
currently stored key to some other node. In this example we will follow a
basic rule: when removing a node, all of its stored keys shall
be assigned to its successor.

This rule was arbitrarily chosen for this example; there are many other
ways to manage the keys of nodes that are going to be removed.

Figure 07 : Figure 08 :

Therefore, in the process of removing node 80, node 10 will
additionally be assigned key 85; hence after the removal of node 80, node
10 will store key 15 and key 85. (Fig. 7, Fig. 8)
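The keys-to-successor rule can be sketched as follows (the data shapes are illustrative: a dict from key to owning node ID, plus the list of node IDs forming the ring):

```python
def remove_node(assignments: dict, node: int, ring: list) -> dict:
    """Reassign all keys owned by `node` to its successor on the ring."""
    ordered = sorted(ring)
    successor = ordered[(ordered.index(node) + 1) % len(ordered)]
    return {key: (successor if owner == node else owner)
            for key, owner in assignments.items()}

ring = [10, 20, 30, 40, 50, 60, 70, 80]
after = remove_node({15: 10, 85: 80}, node=80, ring=ring)
```

Node 80's successor wraps around to node 10, so after the call node 10 owns both key 15 and key 85, exactly as in Figures 7 and 8.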

3. Adding a Node to the network :

When a new node wants to join the DHT network, depending on its
position relative to its successor/predecessor, the node will acquire
a different key. In this example, we will bring the node with ID 80 back
into the DHT network. The constraint that we will use in this example is
that, if on leaving a node's keys were assigned to its successor, then when
bringing that node back into the network, the new node will acquire a new
key from its predecessor, relative to its new position.

Here I am not applying the rule "Resource with key N will be managed
by the node with ID M, where M < N and M is the biggest ID number in the
set". If we had to apply this rule, there would be only one correct
position for a node with ID 80.

Following the above rule, we get 7 possible positions, marked
in green, for node 80. We can observe that, depending on the position at
which the node is inserted into the network, the assigned key
varies. When node 80 goes between nodes 70 and 10, it will receive and
be responsible for storing key 75; when node 80 goes between nodes 10
and 20, it will receive and be responsible for either key 15 or 85. (Fig. 9)

Figure 09 :

4. Querying the network for a particular resource [O(n)]:

When performing a query for a specific key on the network, the parameter
with the greatest weight is "connectivity". Connectivity here means the
number of direct connections a node has to other nodes, i.e. how many
successors/predecessors a given node has.

In some cases, the number of predecessors is not of great
importance, simply because the term predecessor is only used
theoretically to simplify the concept of this particular data structure. In
reality, a distributed hash table can be implemented with as little as a
linked list (a circular linked list) composed of node objects that contain
fields for an ID, a key, and a successor pointer. Simply said, the term
predecessor would then be defined as: the number of nodes that have their
successor pointer pointing to a particular node. In that case, each node in
our example would have only one node pointing to it, and therefore
only one predecessor (only node 10 would point to node 20, only
node 20 would point to node 30, and so on). This is not very practical,
because the direction in which the query propagates would always be the
same, so the complexity of this type of algorithm would stay constant.

Executing a query on the network in this example would
have O(n) complexity, for the following reason:

Let us consider that node 10 would like to obtain the value for key 85. It
first needs to locate where that particular resource is stored, i.e. node 80.
Initially, node 10 queries its successor with the theoretical
question "Do you have key 85?". If its successor, in this case node 20,
does not have key 85, it will pass the question on to its own
successor, and so on, until node 80 gets asked. This process illustrates
the forward propagation of the query, shown in the figure as red arrows.

Once node 80 receives the query and confirms that it manages key 85, node 80
returns the message to its caller; the answer propagates backwards to the
initial node that sent the query.

Because we traverse the whole network linearly in both directions,
the complexity of this implementation of a DHT is O(n).

Figure 10:
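The linear walk described above can be sketched with a circular list of node objects (a toy in-memory model, not a real networked implementation); the hop count makes the O(n) cost visible.

```python
class Node:
    """A DHT node holding some keys and a pointer to its successor."""
    def __init__(self, node_id: int) -> None:
        self.id = node_id
        self.keys: dict = {}
        self.successor: "Node" = self  # re-linked once the ring is built

def query(start: Node, key):
    """Walk successor-by-successor until some node holds `key`; O(n) hops."""
    node, hops = start, 0
    while key not in node.keys:
        node = node.successor
        hops += 1
        if node is start:          # full loop: the key is not in the network
            return None, hops
    return node.keys[key], hops

# Build the 8-node ring from the example and store key 85 on node 80.
nodes = [Node(i) for i in (10, 20, 30, 40, 50, 60, 70, 80)]
for i, n in enumerate(nodes):
    n.successor = nodes[(i + 1) % len(nodes)]
nodes[-1].keys[85] = "value for key 85"
```

`query(nodes[0], 85)` reaches node 80 after 7 hops, i.e. it visits nearly the whole 8-node ring, which is exactly the linear behavior described above.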

7. DHT Algorithms:

7.1 Overview:

The following table is replicated and simplified from its original source.
Degree is the number of neighbors with which a node must maintain contact.

Parameter | CAN | Chord | Kademlia | Koorde | Pastry | Tapestry | Viceroy
Foundation | d-dimensional torus | Circular space | XOR metric | de Bruijn graph | Plaxton-style mesh | Plaxton-style mesh | Butterfly network
Routing function | Map key-value pairs to coordinate space | Matching key and node ID | Matching key to node ID | Matching key to node ID | Matching key and prefix in node ID | Suffix matching | Levels of tree, vicinity search
Routing (network size n) | O(d·n^(1/d)) | O(log(n)) | O(log(n)) + c, c is small | Between O(log(log(n))) and O(log(n)) | O(log(n)) | O(log(n)) | O(log(n))
Degree | 2d | O(log(n)) | O(log(n)) | Between constant and log(n) | O(2^b·log(n)) | O(log(n)) | Constant
Join/Leave | 2d | log(n)² | O(log(n)) + c, c is small | O(log(n)) | O(log(n)) | O(log(n)) | O(log(n))
Implementations | – | OpenChord, OverSIM | Mainline DHT (BitTorrent), I2P, Kad Network, Ethereum [3] | – | FreePastry | OceanStore, Mnemosyne [4] | –

The popularity of Kademlia over other DHTs is likely due to its relative simplicity
and performance. The rest of this section dives deeper into Kademlia.

7.2 Kademlia:

Kademlia is designed to be an efficient means for storing and finding


content in a distributed peer-to-peer (P2P) network. It has a number of
core features that are not simultaneously offered by other DHTs, such
as:

- The number of messages necessary for nodes to learn about each other is minimized.
- Nodes have enough information to route traffic through low-latency paths.
- Parallel and asynchronous queries are made to avoid timeout delays from failed nodes.
- The node-existence algorithm resists certain basic distributed denial-of-service (DDoS) attacks.

8. Circular DHT:

In a circular DHT, each peer is aware only of its immediate successor and predecessor.

- Each peer keeps track of the IP addresses of its predecessor, its successor, and its shortcuts.
- With shortcuts, the number of messages in the example query is reduced from 6 to 3.
- It is possible to design the shortcuts so that each peer has O(log N) neighbors and a query takes O(log N) messages.

9. Advantages of DHT:

- Resilience to changes in the number of active nodes — the network is highly resilient to nodes leaving or joining and to other big changes; changes in the number of active nodes are handled very well.

- Automatic data distribution — data is automatically distributed according to specific configuration options; how the data is eventually organized depends on the strategy chosen in the design.

- Data-loss resilience — data is replicated across nodes, so it is difficult, though not impossible, to permanently lose a particular set of data.

- Decentralization — there is no central entity responsible for data management and query handling.

10. Disadvantages of DHT:

- Probable data loss — the probability of data loss exists, as a DHT does not provide absolute guarantees on data consistency and integrity.

- Decentralization — there is no authority in the network that can handle and prioritize queries, which can lead to network overload.

- Speed of a query — the speed of a particular query depends on the specific architectural implementation of the DHT; it can range from O(n) to O(log N).

11. Conclusion:

In this chapter, we have defined the notion of distributed hash table
systems. We reviewed the structure of DHTs and popular DHT protocols
(Chord, Kademlia, Apache Cassandra, TomP2P), together with some of their
properties and DHT algorithms. We ended the chapter with some examples
that illustrate this project.

Bibliography:

https://tlu.tarilabs.com/protocols/distributed-hash-tables
https://medium.com/the-code-vault/data-structures-distributed-hash-table-febfd01fc0af
https://medium.com/techlog/chord-building-a-dht-distributed-hash-table-in-golang-67c3ce17417b
https://fr.slideshare.net/atefbentahar/les-protocoles-de-routage-dans-les-rseaux-pair-apair-master-informatiquesr-2016
https://slideplayer.fr/amp/1817817/
