LDHT: Locality-Aware Distributed Hash Tables

LDHT is a proposed locality-aware distributed hash table (DHT) that aims to improve performance in DHT-based systems by exploiting network locality. It assigns node identifiers according to their autonomous system numbers (ASNs) so that nodes close in the network topology will also be close in the identifier space. This results in nodes having more nearby than distant neighbors. LDHT was evaluated on typical DHT protocols like Chord, Symphony and Kademlia across different topologies, showing significant improvements in end-to-end latency and other metrics without adding overhead.


LDHT: Locality-aware Distributed Hash Tables*

Weiyu Wu#1, Yang Chen#, Xinyi Zhang*, Xiaohui Shi#, Lin Cong#, Beixing Deng#, Xing Li#
#Department of Electronic Engineering, Tsinghua University, China
*Department of Electrical Engineering, University of California, Los Angeles, USA
1
[email protected]

* This work is supported by the National Basic Research Program of China (No. 2007CB310806) and the National Science Foundation of China (No. 60473087).

Abstract – As the substrate of structured peer-to-peer (P2P) systems, the Distributed Hash Table (DHT) plays a key role in P2P routing infrastructures. A traditional DHT does not consider the location of nodes when assigning identifiers, which results in high end-to-end latency on DHT-based overlay networks. In this paper, we propose a locality-aware DHT design called LDHT, which exploits network locality in DHT-based systems. Instead of the uniform random node identifiers of a traditional DHT, nodes in LDHT are assigned locality-aware identifiers according to their Autonomous System Numbers (ASNs). As a result, each node has more nearby neighbors than faraway neighbors in the overlay. We evaluate the performance of LDHT on several typical DHT protocols and on different topologies. The results show that LDHT significantly reduces the end-to-end latency of traditional DHT protocols without introducing additional overhead. They also indicate that LDHT suits different kinds of DHT protocols and works effectively on structured P2P systems including Chord, Symphony and Kademlia.

I. INTRODUCTION
The Distributed Hash Table (DHT) is the substrate of structured P2P systems. It supports scalable storage and retrieval of {key, value} pairs on the overlay network. DHT-based systems are an important class of P2P routing infrastructures. In DHT-based systems, nodes are assigned uniform random identifiers from a large identifier space. A data object (or value) is placed at the node whose identifier corresponds to the object's unique key, which is chosen from the same identifier space. Lookup queries are forwarded progressively along overlay paths to nodes whose identifiers are closer to the key in the identifier space.

DHT-based systems can guarantee that any data object is located within O(log N) overlay hops on average, where N is the number of nodes in the system. However, overlay hop count alone is not enough to evaluate the performance of DHT-based systems. Another important metric is the end-to-end latency of the overlay path. Routing algorithms that ignore the latency of individual hops produce high-latency paths. When the DHT does not consider network locality, the underlying network path between two nodes can differ significantly from the path on the overlay network. Therefore, the lookup latency in the overlay network can be quite high and adversely affect the performance of applications running over the DHT.

In this paper, we propose a design of an ASN-based locality-aware DHT called LDHT, which exploits network locality in DHT-based systems. We assign node identifiers in a geographic layout manner to ensure that nodes close in the network topology are also close in the identifier space. We use a node's ASN to generate the prefix of its identifier so that nodes in the same AS have close identifiers. As a result, nodes in LDHT-based systems have more close neighbors than faraway neighbors in the network topology, and the end-to-end latency of a query on the overlay network is reduced. We use three typical DHT-based systems, Chord [1], Symphony [2] and Kademlia [3], as the basic DHT protocols to evaluate our design. Simulation results on different topologies indicate that LDHT significantly improves both the path length and the Relative Delay Penalty (RDP) of DHT-based systems without adding overlay hops.

The rest of this paper is organized as follows. We review related work in Section II, present the design of LDHT in detail in Section III, and evaluate its performance in Section IV. We conclude the paper in Section V.

II. RELATED WORK
Three basic approaches have been suggested for exploiting network locality in typical DHT protocols [4].

A. Proximity Routing
In proximity routing, the routing choice is based not only on which neighboring node makes the most progress towards the key, but also on which neighboring node is closest in terms of latency. At each hop, a nearby node is chosen from among the routing table entries. This approach strikes a balance between making progress towards the destination in the identifier space and choosing the closest routing table entry according to network locality.

Proximity routing has been used in a version of Chord [1]. A set of alternate nodes, rather than a single node, is maintained for each finger table entry, and queries are routed by selecting the closest of the alternate nodes according to some network proximity metric.

B. Proximity Neighbor Selection
This is a variant of the above idea, but the proximity criterion is applied when choosing neighbors, not just when choosing the next hop. Routing table entries are chosen to
refer to nodes nearby in the network topology, among all live nodes with appropriate identifiers.

Proximity neighbor selection has been used in the routing table construction of Tapestry [5] and Pastry [6]. Among the nodes whose identifiers have the appropriate prefix, they choose the node closest in the network topology according to some network proximity metric.

C. Geographic Layout
Geographic layout embeds network locality into the node identifiers themselves. In this approach, identifiers are assigned so that nodes close to each other in the network topology are also close in the identifier space.

In [7], the authors propose a hierarchical location-based node ID assignment that encodes the physical topology. A location-based node ID is the concatenation of a hierarchical prefix assigned to the node's region and a suffix of randomly generated bits. The scheme is based on geography, that is, different prefixes are assigned to different geographical regions.

Chord6 [8] is an IPv6-based variant of Chord that uses the geographic layout approach. It exploits the hierarchical structure of IPv6 addresses. In Chord6, a node's identifier contains two parts: the higher bits are obtained by hashing a fixed-length prefix of the node's IPv6 address, while the remaining lower bits are the hash of the rest of that IPv6 address.

III. DESIGN OF LOCALITY-AWARE DHT

A. Basic Idea
The basic idea of LDHT is to exploit network locality in DHT-based systems in a geographic layout manner. Different DHT-based systems have different routing strategies and neighbor selection schemes, but they can share the same method of node identifier assignment. Once the routing strategy and neighbor selection scheme are fixed, nodes choose neighbors based only on each other's identifiers. Our goal is to make LDHT compatible with all DHT-based systems, regardless of the routing strategies and neighbor selection schemes they use. Proximity routing and proximity neighbor selection, by contrast, must be adapted to each DHT, so we choose the geographic layout approach to exploit network locality.

In a traditional DHT, no information about a node's network location or its proximity to other nodes can be deduced from its random identifier. Randomness in node identifiers is likely to lead to high end-to-end routing latency. In LDHT, we construct a structured identifier space. Each node is assigned a locality-aware identifier, so that its network topology information is embedded in its identifier.

When nodes choose neighbors in DHT-based systems, they tend to choose more nodes whose identifiers are close to their own. So if we assign close identifiers to nodes that are close to each other in the network topology, they will have more close neighbors than faraway neighbors in the network topology. In LDHT, neighborhood relations of regions along the identifier ring reflect their proximity relations in the network topology. When DHT routing makes progress in the identifier space, similar progress is made in the network topology, and thus overlay path costs are bounded.

B. Identifier Assignment
We use a node's ASN to represent its network locality for two reasons. First, a node can easily obtain its own ASN using WHOIS, a TCP-based query/response protocol widely used for querying databases to determine the owner of a domain name, an IP address, or an ASN on the Internet. Because many WHOIS databases are available on the Internet, this approach does not create a single point of failure. In the worst case, if a node cannot access any WHOIS database, it can generate a random number as its ASN, which does not affect the normal operation of the whole system. If we instead used geographical information, as in the scheme of [7], we would need either to deploy and maintain a dedicated centralized database to partition the regions and assign prefixes, or to have each end host maintain an up-to-date copy of such a database. A centralized database is a single point of failure, and maintaining the database at each end host is too costly. Second, by using ASNs, LDHT can work on both IPv4 and IPv6 networks, whereas Chord6 [8], which depends on the hierarchical structure of IPv6 addresses, can only work on IPv6 networks.

We divide each node's identifier into two parts, the Global Part and the Local Part. Assuming that the identifier is n bits long, the Global Part covers the highest m bits of the identifier, and the Local Part covers the remaining n-m bits. The Local Part is the prefix of the hash of the node's IP address, the same hash that most traditional DHTs use for the identifier. The Global Part is generated from the node's ASN. We assign the same Global Part to all nodes in the same AS, in order to make them close to each other in the identifier space.

The length m of the Global Part is a tradeoff between end-to-end performance and load balancing among nodes, and it can be adjusted according to the scale of the application system. In the simulation described in Section IV, we construct the node identifier with m = 7. A 16-bit ASN is an integer between 0 and 65535. We use the node's ASN modulo 2^m, converted to binary, as its Global Part.

We concatenate the Global Part and the Local Part to form the whole locality-aware identifier.

C. Workflow of LDHT
Fig. 1 shows the workflow of LDHT. When a node joins LDHT, it first obtains its ASN and generates the m-bit Global Part from it. It then generates its Local Part, which is the (n-m)-bit prefix of the hash of its IP address. Next, the node concatenates the two parts to form the whole identifier. With this locality-aware identifier, it joins the DHT-based system and then operates exactly as in the original DHT protocol, including neighbor selection, message routing, etc.
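The identifier construction of Section III-B and the join steps of Section III-C can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function name `ldht_identifier`, the choice to hash the packed address bytes, and the example ASNs and IP addresses (taken from private and documentation ranges) are all assumptions made for illustration. Only SHA-1, n = 160 and m = 7 come from the paper.

```python
import hashlib
import ipaddress

ID_BITS = 160      # n: identifier length (SHA-1 output size, as in Section IV)
GLOBAL_BITS = 7    # m: Global Part length used in the paper's simulation


def ldht_identifier(asn: int, ip: str, n: int = ID_BITS, m: int = GLOBAL_BITS) -> int:
    """Return an n-bit identifier: m-bit Global Part followed by (n - m)-bit Local Part."""
    if not 0 < m < n <= 160:
        raise ValueError("this sketch needs 0 < m < n <= 160")
    # Global Part: the node's ASN modulo 2^m.
    global_part = asn % (1 << m)
    # Local Part: the leading (n - m) bits of SHA-1(IP address).
    # Hashing the packed address bytes is an assumption of this sketch.
    digest = hashlib.sha1(ipaddress.ip_address(ip).packed).digest()
    local_part = int.from_bytes(digest, "big") >> (160 - (n - m))
    # Concatenate: Global Part occupies the highest m bits.
    return (global_part << (n - m)) | local_part


# Hypothetical nodes: two in the same AS, one in a different AS.
node_a = ldht_identifier(64513, "192.0.2.10")
node_b = ldht_identifier(64513, "192.0.2.77")
node_c = ldht_identifier(64600, "198.51.100.5")
```

Here `node_a` and `node_b` share their top 7 bits, so they sit in the same region of the identifier ring, while `node_c` falls in a different region. Because only 2^7 = 128 distinct Global Parts exist when m = 7, two different ASes can map to the same prefix; the sketch does nothing to avoid that.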
With the API provided by our locality-aware DHT, distributed structured P2P applications obtain better end-to-end performance. We evaluate the performance of LDHT in the next section.

Fig. 1 Workflow of LDHT

IV. PERFORMANCE EVALUATION
We use Chord, Symphony and Kademlia as the basic DHT protocols and apply our approach to each of them to form LDHT-based systems. The performance of LDHT is evaluated and compared with that of the three original DHT protocols on two representative network topologies, one generated by GT-ITM [9] and the other collected from the real-world Internet.

A. Simulation Setup
We use our own simulator to construct Chord, Symphony and Kademlia, and add our LDHT design to each of them. To evaluate the effectiveness of our scheme, we implement two network topologies, Topology1 and Topology2.

Topology1 is generated by GT-ITM [9] and contains 4000 nodes. It is a two-level hierarchical topology. The top level of Topology1 consists of 200 ASes placed on a 150 by 150 grid. The bottom level consists of a random number of nodes in the range [13, 26] within each AS, placed on a 10 by 10 grid.

We use a real-world Internet distance dataset of 226 PlanetLab [10] nodes to construct Topology2, which contains 4520 nodes. The dataset contains latencies between PlanetLab nodes measured with ping over the real Internet. These 226 PlanetLab nodes are dispersed across 80 different ASes. We use the locations of the 226 PlanetLab nodes to generate a larger topology that still reflects the distribution of nodes in the real Internet. The 226 PlanetLab nodes serve as transit nodes, and 20 stub nodes are assigned to each transit node. We assign different distances to the edges in Topology2: the distance of an intra-stub edge is 1; the distance of an edge between a transit node and a stub node is a random integer within [5, 15]; and the distance between transit nodes is taken from the distance dataset. Topology2 consists of all the stub nodes (226 × 20 = 4520).

We use SHA-1 as the hash algorithm to generate the 160-bit hash of the IP address. For each system, we perform 4×10^4 random queries to obtain the statistical and average simulation results. (In other words, we insert 4×10^4 random keys into the overlay network.)

In the evaluation, we consider the following metrics:
• Path length: the latency of the end-to-end overlay path of each query. It is an efficient yet accurate metric for measuring network structure and data delivery performance in different overlays.
• Relative Delay Penalty (RDP): for each query, the ratio of the end-to-end routing delay between a pair of nodes on the overlay to the delay of the direct IP path between them. RDP represents the relative cost of routing on the overlay. The smaller it is, the better the path on the overlay network fits the path on the IP network.
• Hop count: the number of overlay hops in the end-to-end path of each query.

B. Simulation Results
We run our simulations on Topology1 and Topology2 described in Section IV-A.

Fig. 2, Fig. 3 and Fig. 4 show the CDF of the path length of both original and LDHT-based Chord, Symphony and Kademlia. We also calculate the average path length for each protocol and topology and show the results in Table I. We use short names because of limited space: "TP1" and "TP2" mean Topology1 and Topology2, "Orig" means the original protocol, and "LDHT" means the LDHT-based protocol.

Fig. 2 Path length per query of Chord and LDHT-based Chord (CDF versus path length per query in ms; curves for Chord and LDHT-based Chord on Topology1 and Topology2)

Fig. 3 Path length per query of Symphony and LDHT-based Symphony (CDF versus path length per query in ms; curves for Symphony and LDHT-based Symphony on Topology1 and Topology2)
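The three metrics defined in Section IV-A can be made concrete with a short Python sketch. It is only an illustration, not the simulator used in the paper. The `Latency` callback, the function names, and the three-node example are hypothetical.

```python
from typing import Callable, Dict, Sequence, Tuple

# Hypothetical interface: latency(u, v) returns the delay in ms of the
# direct IP path from node u to node v.
Latency = Callable[[str, str], float]


def path_length(path: Sequence[str], latency: Latency) -> float:
    """Path length: total latency (ms) of the overlay path, summed hop by hop."""
    return sum(latency(u, v) for u, v in zip(path, path[1:]))


def hop_count(path: Sequence[str]) -> int:
    """Hop count: number of overlay hops between source and destination."""
    return max(len(path) - 1, 0)


def rdp(path: Sequence[str], latency: Latency) -> float:
    """RDP: overlay path latency divided by the direct IP latency of the same pair."""
    direct = latency(path[0], path[-1])
    if direct <= 0:
        raise ValueError("direct latency must be positive")
    return path_length(path, latency) / direct


# Illustrative three-node example (made-up delays, in ms).
EXAMPLE_DELAYS: Dict[Tuple[str, str], float] = {
    ("a", "b"): 10.0,
    ("b", "c"): 30.0,
    ("a", "c"): 20.0,
}


def example_latency(u: str, v: str) -> float:
    return EXAMPLE_DELAYS[(u, v)]


EXAMPLE_PATH = ["a", "b", "c"]
```

For `EXAMPLE_PATH`, the query takes 2 overlay hops with a path length of 40 ms, while the direct path from `a` to `c` takes 20 ms, so the RDP is 2. An RDP of 1 would mean the overlay route costs no more than the direct IP path.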
Fig. 4 Path length per query of Kademlia and LDHT-based Kademlia (CDF versus path length per query in ms; curves for Kademlia and LDHT-based Kademlia on Topology1 and Topology2)

TABLE I AVERAGE PATH LENGTH (MS)

        Chord           Symphony        Kademlia
        Orig    LDHT    Orig    LDHT    Orig    LDHT
TP1     525     407     595     443     449     383
TP2     1024    869     1083    897     884     853

From the figures and the table, we can see that the LDHT-based systems have smaller path lengths than the original ones for all three DHT protocols on both topologies. This indicates that LDHT is much more efficient in terms of end-to-end latency. The query path on the LDHT overlay network contains many intra-domain connections between neighbors, which have much lower latency than inter-domain connections. Because the original DHT overlay network does not take network locality into account, many of its neighbor connections are high-latency inter-domain links instead.

Fig. 5, Fig. 6 and Fig. 7 show the CDF of the Relative Delay Penalty (RDP) of both original and LDHT-based Chord, Symphony and Kademlia. Table II shows the average RDP for each protocol and topology. The short names have the same meanings as in Table I.

Fig. 5 RDP per query of Chord and LDHT-based Chord (CDF versus RDP per query; curves for Chord and LDHT-based Chord on Topology1 and Topology2)

Fig. 6 RDP per query of Symphony and LDHT-based Symphony (CDF versus RDP per query; curves for Symphony and LDHT-based Symphony on Topology1 and Topology2)

Fig. 7 RDP per query of Kademlia and LDHT-based Kademlia (CDF versus RDP per query; curves for Kademlia and LDHT-based Kademlia on Topology1 and Topology2)

TABLE II AVERAGE RDP

        Chord           Symphony        Kademlia
        Orig    LDHT    Orig    LDHT    Orig    LDHT
TP1     10.71   8.22    12.19   8.64    9.11    7.50
TP2     14.24   13.16   15.48   12.80   13.54   10.82

On both topologies, the RDP of each of the three LDHT-based systems is smaller than that of the corresponding original system. This indicates that the end-to-end path between two nodes on the LDHT overlay network is closer to the underlying IP network path than the corresponding path on the original DHT overlay. The relative routing cost of the LDHT overlay network is therefore smaller than that of the original overlay network.

Fig. 8, Fig. 9 and Fig. 10 compare the hop count per query of both original and LDHT-based Chord, Symphony and Kademlia on the two topologies. We present the results for the 10th, 50th and 90th percentiles of nodes. The results indicate that the hop count distribution of each LDHT-based system is the same as that of the original DHT. The reason is that our design only changes the manner in which identifiers are assigned; it does not change the original DHT's routing strategy or neighbor selection scheme at all.
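To read the averages in Tables I and II in relative terms, the following Python sketch computes the percentage reduction of LDHT over the original protocol for each cell. The numbers are copied from the two tables. The helper names `reduction_percent` and `summarize` are introduced here for illustration only.

```python
from typing import Dict, Tuple

Key = Tuple[str, str]                 # (topology, protocol)
Pair = Tuple[float, float]            # (Orig, LDHT)

# Averages copied from Table I (path length in ms).
AVG_PATH_LENGTH_MS: Dict[Key, Pair] = {
    ("TP1", "Chord"): (525, 407),
    ("TP1", "Symphony"): (595, 443),
    ("TP1", "Kademlia"): (449, 383),
    ("TP2", "Chord"): (1024, 869),
    ("TP2", "Symphony"): (1083, 897),
    ("TP2", "Kademlia"): (884, 853),
}

# Averages copied from Table II (RDP, dimensionless).
AVG_RDP: Dict[Key, Pair] = {
    ("TP1", "Chord"): (10.71, 8.22),
    ("TP1", "Symphony"): (12.19, 8.64),
    ("TP1", "Kademlia"): (9.11, 7.50),
    ("TP2", "Chord"): (14.24, 13.16),
    ("TP2", "Symphony"): (15.48, 12.80),
    ("TP2", "Kademlia"): (13.54, 10.82),
}


def reduction_percent(orig: float, ldht: float) -> float:
    """Relative reduction of the LDHT value with respect to the original, in percent."""
    return 100.0 * (orig - ldht) / orig


def summarize(table: Dict[Key, Pair]) -> Dict[Key, float]:
    """Percentage reduction for every (topology, protocol) cell, rounded to 0.1."""
    return {key: round(reduction_percent(o, l), 1) for key, (o, l) in table.items()}


if __name__ == "__main__":
    for name, table in (("path length", AVG_PATH_LENGTH_MS), ("RDP", AVG_RDP)):
        for (topology, protocol), pct in summarize(table).items():
            print(f"{name:11s} {topology} {protocol:9s} {pct:5.1f}% lower with LDHT")
```

For example, the average path length of Chord on Topology1 drops from 525 ms to 407 ms, a reduction of about 22.5%. Across Table I the reductions range from about 3.5% (Kademlia, TP2) to about 25.5% (Symphony, TP1).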
Fig. 8 Hop count per query of Chord and LDHT-based Chord (hop count per query at the 10th, 50th and 90th percentiles of nodes; Chord and LDHT-based Chord on Topology1 and Topology2)

Fig. 9 Hop count per query of Symphony and LDHT-based Symphony (hop count per query at the 10th, 50th and 90th percentiles of nodes; Symphony and LDHT-based Symphony on Topology1 and Topology2)

Fig. 10 Hop count per query of Kademlia and LDHT-based Kademlia (hop count per query at the 10th, 50th and 90th percentiles of nodes; Kademlia and LDHT-based Kademlia on Topology1 and Topology2)

C. Simulation Conclusion
The results above show that LDHT is applicable to different DHT protocols and topologies. Compared with the original DHT, LDHT achieves lower end-to-end latency without adding overlay hops.

V. CONCLUSION AND FUTURE WORK
In this paper, we propose a design of an ASN-based locality-aware DHT called LDHT, which exploits network locality in DHT-based systems. We assign node identifiers in a geographic layout manner, so that nodes with close identifiers in the identifier space are also close in the network topology. Nodes therefore have more close neighbors than faraway neighbors, and the end-to-end latency of a query is reduced. We use a node's ASN to generate the Global Part of the identifier, which gives all nodes in the same AS the same identifier prefix. As a result, the paths on an LDHT-based overlay network contain more intra-domain neighbor connections.

We develop LDHT-based Chord, Symphony and Kademlia to evaluate the performance of our design on three metrics. Our simulations are run on two topologies, one generated by GT-ITM and one derived from the real-world Internet. The simulation results demonstrate the effectiveness of LDHT. The advantage of LDHT over a traditional DHT lies in its better end-to-end latency, measured by path length and RDP, without adding overlay hops. Moreover, LDHT is applicable to different kinds of basic DHT protocols and works well on various topologies.

As for future work, first, we would like to deploy a publicly accessible DHT service like OpenDHT [11], so that people can use the LDHT service by issuing put and get operations to any DHT node without running an LDHT client. Second, we will consider the proximity among ASes to improve the performance of LDHT. We have already done some work on this in [12] and hope to use this kind of scheme to strengthen LDHT.

REFERENCES
[1] I. Stoica, R. Morris, D. Karger, M. F. Kaashoek, and H. Balakrishnan, “Chord: A scalable peer-to-peer lookup protocol for internet applications,” IEEE/ACM Transactions on Networking, vol. 11, pp. 17–32, 2003.
[2] G. S. Manku, M. Bawa, and P. Raghavan, “Symphony: Distributed hashing in a small world,” in Proc. USITS’03, 2003.
[3] P. Maymounkov and D. Mazières, “Kademlia: A peer-to-peer information system based on the xor metric,” in Proc. IPTPS’02, 2002.
[4] M. Castro, P. Druschel, and Y. C. Hu, “Exploiting network proximity in Distributed Hash Tables,” in Proc. IPTPS’02, 2002.
[5] B. Y. Zhao, L. Huang, J. Stribling, S. C. Rhea, A. D. Joseph, and J. D. Kubiatowicz, “Tapestry: A resilient global-scale overlay for service deployment,” IEEE Journal on Selected Areas in Communications, vol. 22, pp. 41–53, January 2004.
[6] A. Rowstron and P. Druschel, “Pastry: Scalable, decentralized object location and routing for large-scale peer-to-peer systems,” in Proc. IFIP/ACM International Conference on Distributed Systems Platforms (Middleware’01), 2001.
[7] S. Zhou, G. R. Ganger, and P. Steenkiste, “Location-based node IDs: enabling explicit locality in DHTs,” Carnegie Mellon University, Tech. Rep. CMU-CS-03-171, 2003.
[8] J. Xiong, Y. Zhang, P. Hong, and J. Li, “Chord6: IPv6 based topology-aware Chord,” in Proc. ICNS’05, 2005.
[9] (2007) The GT-ITM homepage. [Online]. Available: http://www.cc.gatech.edu/projects/gtitm/.
[10] (2007) The PlanetLab homepage. [Online]. Available: http://www.planet-lab.org.
[11] S. Rhea, B. Godfrey, B. Karp, J. Kubiatowicz, S. Ratnasamy, S. Shenker, I. Stoica, and H. Yu, “OpenDHT: A public DHT service and its uses,” in Proc. ACM SIGCOMM’05, August 2005.
[12] L. Cong, B. Yang, Y. Chen, G. Lu, B. Deng, X. Li, and Y. Wang, “NTS6: IPv6 based network topology service system of CERNET2,” in Proc. MUE’07, April 2007.
