LDHT: Locality-Aware Distributed Hash Tables
Weiyu Wu#1, Yang Chen#, Xinyi Zhang*, Xiaohui Shi#, Lin Cong#, Beixing Deng#, Xing Li#
#Department of Electronic Engineering, Tsinghua University, China
*Department of Electrical Engineering, University of California, Los Angeles, USA
¹[email protected]
This work is supported by National Basic Research Program of China (No. 2007CB310806) and National Science Foundation of China (No. 60473087).

Abstract – As the substrate of structured peer-to-peer systems, the Distributed Hash Table (DHT) plays a key role in P2P routing infrastructures. Traditional DHTs do not consider the location of nodes when assigning identifiers, which results in high end-to-end latency on DHT-based overlay networks. In this paper, we propose a design of a locality-aware DHT called LDHT, which exploits network locality in DHT-based systems. Instead of the uniform random node identifiers used in traditional DHTs, nodes in LDHT are assigned locality-aware identifiers according to their Autonomous System Numbers (ASNs). As a result, each node has more nearby neighbors than faraway neighbors in the overlay. We evaluate the performance of LDHT on different kinds of typical DHT protocols and on various topologies. The results show that LDHT substantially reduces the end-to-end latency of the traditional DHT protocols without introducing additional overhead. These results indicate that LDHT suits different kinds of DHT protocols and can work effectively on structured P2P systems, including Chord, Symphony and Kademlia.

I. INTRODUCTION
Distributed Hash Table (DHT) is the substrate of structured P2P systems. It supports scalable storage and retrieval of {key, value} pairs on the overlay network, and DHT-based systems are an important class of P2P routing infrastructures. In DHT-based systems, nodes are assigned uniform random identifiers from a large identifier space. Each data object (or value) is placed at the node whose identifier corresponds to the object's unique key, which is drawn from the same identifier space. Lookup queries are forwarded progressively across the overlay to nodes whose identifiers are closer to the key in the identifier space.

DHT-based systems guarantee that any data object can be located within O(log N) overlay hops on average, where N is the number of nodes in the system. However, overlay hop count alone is not sufficient to evaluate the performance of DHT-based systems. Another important metric is the end-to-end latency of the overlay path: routing algorithms that ignore the latency of individual hops produce high-latency paths. Without considering network locality in the DHT, the underlying network path between two nodes can differ significantly from the path on the overlay network. Therefore, the lookup latency on the overlay can be quite high, which adversely affects the performance of applications running over the DHT.

In this paper, we propose a design of an ASN-based locality-aware DHT called LDHT, which exploits network locality in DHT-based systems. We assign node identifiers in a geographic layout manner to ensure that nodes close in the network topology are also close in the identifier space. We use a node's ASN to generate the prefix of its identifier so that nodes in the same AS have close identifiers. As a result, nodes in LDHT-based systems have more close neighbors than faraway neighbors in the network topology, and the end-to-end latency of queries on the overlay network is thus reduced. We use three typical DHT-based systems, Chord [1], Symphony [2] and Kademlia [3], as the basic DHT protocols to evaluate our design. Simulation results on different topologies indicate that LDHT significantly improves the performance of DHT-based systems in terms of both path length and Relative Delay Penalty (RDP), without adding overlay hops.

The rest of this paper is organized as follows. We review related work in Section II, present the design of LDHT in detail in Section III, and evaluate its performance in Section IV. We conclude the paper in Section V.

II. RELATED WORK
Three basic approaches have been suggested for exploiting network locality in typical DHT protocols [4].

A. Proximity Routing
In proximity routing, the routing choice is based not only on which neighboring node makes the "most" progress towards the key, but also on which neighboring node is "closest" in terms of latency. At each hop, a nearby node is chosen among the entries in the routing table. This approach strikes a balance between making progress towards the destination in the identifier space and choosing the closest routing table entry according to network locality. Proximity routing has been used in a version of Chord [1]: a set of alternate nodes, rather than a single node, is maintained for each finger table entry, and queries are routed by selecting the closest node among the alternates according to some network proximity metric.

B. Proximity Neighbor Selection
This is a variant of the above idea, but the proximity criterion is applied when choosing neighbors, not just when choosing the next hop. Routing table entries are chosen to
refer to nodes nearby in the network topology, among all live nodes with appropriate identifiers. Proximity neighbor selection has been used in the routing table construction of Tapestry [5] and Pastry [6]. Among the nodes whose identifiers have the appropriate prefix, they choose the closest one in the network topology according to some network proximity metric.

C. Geographic Layout
Geographic layout exploits network locality by encoding it into node identifiers. In this approach, node identifiers are assigned in a manner that ensures nodes close to each other in the network topology are also close in the identifier space. In [7], the authors propose a hierarchical location-based node ID assignment that encodes the physical topology. A location-based node ID is a concatenation of a hierarchical prefix assigned to the node's region and a suffix of randomly generated bits. The scheme is based on geography; that is, different prefixes are assigned to different geographical regions.

Chord6 [8] is an IPv6-based modified version of Chord that follows the geographic layout approach and exploits the hierarchical structure of IPv6 addresses. In Chord6, a node's identifier contains two parts: the higher bits are obtained by hashing a prefix of specific length of the node's IPv6 address, while the remaining lower bits are the hash of the rest of that IPv6 address.

III. DESIGN OF LOCALITY-AWARE DHT

A. Basic Idea
The basic idea of LDHT is to exploit network locality in DHT-based systems in a geographic layout manner. Different DHT-based systems have different routing strategies and neighbor selection schemes, but they can share the same way of assigning node identifiers. Once the routing strategy and neighbor selection scheme are fixed, nodes choose neighbors solely according to each other's identifiers. Our purpose is to make LDHT compatible with all DHT-based systems, regardless of the routing strategies and neighbor selection schemes they use. Proximity routing and proximity neighbor selection, by contrast, must be adapted to each specific DHT, so we choose the geographic layout approach to exploit network locality.

In a traditional DHT, no information about a node's network location or its proximity to other nodes can be deduced from its random identifier, and this randomness is likely to lead to high end-to-end routing latency. In LDHT, we construct a structured identifier space in which each node is assigned a locality-aware identifier, so that the node's network topology information is embedded in its identifier.

When choosing neighbors, nodes in DHT-based systems favor nodes whose identifiers are close to their own. Therefore, if we assign close identifiers to nodes that are close to each other in the network topology, each node will have more close neighbors than faraway neighbors in the network topology. In LDHT, the neighborhood relations of regions along the identifier ring reflect their proximity relations in the network topology. When DHT routing makes progress in the identifier space, similar progress is made in the network topology, and thus overlay path costs are bounded.

B. Identifier Assignment
We use a node's ASN to represent its network locality for two reasons. First, a node can easily obtain its own ASN using WHOIS, a TCP-based query/response protocol widely used for querying databases to determine the owner of a domain name, an IP address, or an ASN on the Internet. Because many WHOIS databases are available on the Internet, this approach does not introduce a single point of failure. In the worst case, if a node cannot access any WHOIS database, it can generate a random number as its ASN, which does not affect the normal operation of the whole system. In contrast, using geographical information as in the scheme of [7] would require either deploying and maintaining a dedicated centralized database to partition the regions and assign prefixes, or having each end host maintain such an up-to-date database by itself. A centralized database would be a single point of failure, and maintaining the database at each end host is too costly. Second, by using ASNs, LDHT can work on both IPv4 and IPv6 networks, whereas Chord6 [8], which depends on the hierarchical structure of IPv6 addresses, can only work on IPv6 networks.

We divide each node's identifier into two parts, the Global Part and the Local Part. Assuming that the identifier is n bits long, the Global Part covers the highest m bits of the identifier, and the Local Part covers the remaining n-m bits. The Local Part is the prefix of the hash of the node's IP address, as in most traditional DHTs. The Global Part is generated according to the node's ASN: all nodes in the same AS are assigned the same Global Part, which makes them close to each other in the identifier space.

The length m of the Global Part is a tradeoff between end-to-end performance and load balancing among nodes, and it can be adjusted according to the scale of the application system. In our simulation described in Section IV, we construct node identifiers with m = 7. An ASN is an integer between 0 and 65535 (a 16-bit value). We take a node's ASN modulo 2^m, converted to binary, as its Global Part. We then concatenate the Global Part and the Local Part to form the complete locality-aware identifier.

C. Workflow of LDHT
Fig. 1 shows the workflow of LDHT. When a node joins LDHT, it first obtains its ASN and generates its m-bit Global Part from the ASN. It then generates its (n-m)-bit Local Part, which is the prefix of the hash of its IP address. Next, the node concatenates the two parts to form a complete identifier. With this locality-aware identifier, it joins the DHT-based system and operates exactly as in the original DHT protocol, including neighbor selection, message routing, etc.
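The identifier construction described in Sections III-B and III-C can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the identifier length n = 160 and the use of SHA-1 for the Local Part are assumptions (the text only says "the hash of the node's IP address"), while m = 7 follows the simulation setting. The ASN is assumed to have been obtained already (for example via WHOIS); passing `None` stands for the case where no WHOIS database was reachable, in which case a random 16-bit value is used, as the text suggests. The helper names `global_part`, `local_part` and `ldht_id` are hypothetical.

```python
import hashlib
import random
from typing import Optional

N_BITS = 160  # assumed identifier length n (SHA-1 sized, as in Chord)
M_BITS = 7    # Global Part length m used in the paper's simulation


def global_part(asn: Optional[int], m: int = M_BITS,
                rng: Optional[random.Random] = None) -> int:
    """Global Part: the node's ASN modulo 2^m (an m-bit value).

    If the ASN could not be obtained (asn is None), a random 16-bit
    number is used in its place, as the text suggests.
    """
    if asn is None:
        rng = rng or random.Random()
        asn = rng.randrange(1 << 16)
    return asn % (1 << m)


def local_part(ip: str, n: int = N_BITS, m: int = M_BITS) -> int:
    """Local Part: the highest (n - m) bits of SHA-1(ip) (hash choice assumed)."""
    if not 0 < n - m <= 160:
        raise ValueError("n - m must be between 1 and 160 bits for SHA-1")
    digest = int.from_bytes(hashlib.sha1(ip.encode("ascii")).digest(), "big")
    return digest >> (160 - (n - m))


def ldht_id(asn: Optional[int], ip: str, n: int = N_BITS, m: int = M_BITS,
            rng: Optional[random.Random] = None) -> int:
    """Concatenate the Global Part (high m bits) with the Local Part (low n - m bits)."""
    return (global_part(asn, m, rng) << (n - m)) | local_part(ip, n, m)


if __name__ == "__main__":
    # Hypothetical ASNs and documentation-range IP addresses.
    a = ldht_id(4538, "192.0.2.1")
    b = ldht_id(4538, "192.0.2.2")      # same AS as a
    c = ldht_id(4539, "198.51.100.7")   # different AS
    shift = N_BITS - M_BITS
    print(a >> shift, b >> shift, c >> shift)  # Global Parts: 58 58 59
    # Kademlia-style XOR distance: same-AS nodes share the top m bits.
    print((a ^ b) < (1 << shift), (a ^ c) < (1 << shift))  # True False
```

Because only the ASN modulo 2^m is kept, distinct ASes can share a Global Part (for example, hypothetical ASNs 4538 and 4666 both map to 58 when m = 7). This is one side of the tradeoff between locality and load balancing that the text attributes to the choice of m.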
With the APIs provided by our locality-aware DHT, distributed structured P2P applications can achieve better end-to-end performance. We evaluate the performance of LDHT in the next section.

In the evaluation, we consider the following metrics:
• Path length: the end-to-end latency of the overlay path of each query. It is an effective and accurate metric for measuring network structure and data delivery performance in different overlays.
• Relative Delay Penalty (RDP): the ratio of the end-to-end overlay routing delay between a pair of nodes to the delay of the direct IP path, per query. RDP represents the relative cost of routing on the overlay: the smaller it is, the better the overlay path fits the path on the IP network.
• Hop count: the number of overlay hops in the end-to-end path of each query.
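For concreteness, the three metrics can be computed per query as below. This sketch assumes that the latency of each overlay hop and the direct IP-path latency between source and destination are already known (for example, as shortest-path distances in the simulated topology); the function names and the sample latencies are hypothetical.

```python
from statistics import mean
from typing import Sequence


def path_length(hop_latencies_ms: Sequence[float]) -> float:
    """Path length: end-to-end latency of the overlay path (sum of per-hop latencies)."""
    return float(sum(hop_latencies_ms))


def hop_count(hop_latencies_ms: Sequence[float]) -> int:
    """Hop count: number of overlay hops on the query path."""
    return len(hop_latencies_ms)


def rdp(hop_latencies_ms: Sequence[float], direct_latency_ms: float) -> float:
    """RDP: overlay routing delay divided by the direct IP-path delay."""
    if direct_latency_ms <= 0:
        raise ValueError("direct IP latency must be positive")
    return path_length(hop_latencies_ms) / direct_latency_ms


if __name__ == "__main__":
    # Hypothetical query: three overlay hops, 40 ms direct IP latency.
    hops = [30.0, 45.0, 25.0]
    print(path_length(hops), hop_count(hops), rdp(hops, 40.0))  # 100.0 3 2.5
    # Average RDP over several hypothetical queries.
    queries = [([30.0, 45.0, 25.0], 40.0), ([20.0, 20.0], 40.0)]
    print(mean(rdp(h, d) for h, d in queries))  # 1.75
```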
… and Kademlia, and add our LDHT design to them respectively.

… generate a larger-scale topology that still reflects the node distribution of the real Internet. The 226 PlanetLab nodes serve as transit nodes, and 20 stub nodes are assigned to each transit node. We assign different distances to the edges in Topology2: the distance of intra-stub edges is 1; the distance of an edge between a transit node and a stub node is a random integer within [5, 15]; and the distance between transit nodes is taken from the distance dataset. Topology2 consists of all …

[Figure: CDF of path length per query (ms) for Symphony and LDHT-based Symphony on Topology1 and Topology2]
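The Topology2 edge-distance rules can be sketched as follows, together with a shortest-path computation that turns edge distances into the network distance between two nodes. This is a hypothetical illustration: the text does not say here how stub nodes are interconnected or which transit nodes are linked, so the tiny graph in `toy_topology` is invented, and the value 80 for the transit-transit edge is a placeholder for a value from the distance dataset. Using shortest paths (Dijkstra's algorithm) to derive network latency is our assumption.

```python
import heapq
import random
from typing import Dict, List, Tuple

Graph = Dict[str, List[Tuple[str, int]]]


def topology2_distance(kind: str, rng: random.Random, measured: int = 0) -> int:
    """Edge distance in Topology2 according to the rules in the text.

    kind is "intra-stub", "transit-stub" or "transit-transit"; for
    transit-transit edges the caller passes the dataset value as `measured`.
    """
    if kind == "intra-stub":
        return 1
    if kind == "transit-stub":
        return rng.randint(5, 15)  # random integer in [5, 15], inclusive
    if kind == "transit-transit":
        return measured
    raise ValueError(f"unknown edge kind: {kind}")


def add_edge(g: Graph, u: str, v: str, w: int) -> None:
    """Add an undirected weighted edge."""
    g.setdefault(u, []).append((v, w))
    g.setdefault(v, []).append((u, w))


def shortest_distances(g: Graph, src: str) -> Dict[str, int]:
    """Dijkstra's algorithm: network distance from src to every reachable node."""
    dist: Dict[str, int] = {src: 0}
    heap = [(0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in g.get(u, []):
            nd = d + w
            if v not in dist or nd < dist[v]:
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


def toy_topology(rng: random.Random, transit_dist: int = 80) -> Graph:
    """A hypothetical miniature: two transit nodes and three stub nodes."""
    g: Graph = {}
    add_edge(g, "T0", "T1", topology2_distance("transit-transit", rng, transit_dist))
    for stub, transit in [("S0a", "T0"), ("S0b", "T0"), ("S1a", "T1")]:
        add_edge(g, stub, transit, topology2_distance("transit-stub", rng))
    add_edge(g, "S0a", "S0b", topology2_distance("intra-stub", rng))
    return g


if __name__ == "__main__":
    d = shortest_distances(toy_topology(random.Random(0)), "S0a")
    print(d["S0b"], d["S1a"])  # 1, then a value between 90 and 110
```

In this toy graph the distance from S0a to S1a is at least 5 + 80 + 5 = 90 and at most 15 + 80 + 15 = 110, whatever values the random transit-stub edges take.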
[Fig. 4: Path length per query of Kademlia and LDHT-based Kademlia (CDF over path length per query, ms)]

[Fig. 5: RDP per query of Chord and LDHT-based Chord (CDF over RDP per query)]
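TABLE I
PATH LENGTH (ms) OF EACH PROTOCOL AND TOPOLOGY (TP1/TP2: Topology1/Topology2; Orig: original protocol; LDHT: LDHT-based protocol)

         Chord         Symphony      Kademlia
         Orig   LDHT   Orig   LDHT   Orig   LDHT
TP1       525    407    595    443    449    383
TP2      1024    869   1083    897    884    853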
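The relative improvement implied by the path-length values of Table I can be checked with a few lines of Python. The numbers are copied from the table; the percentage reduction (Orig - LDHT) / Orig is simply one convenient way to summarize them and is not a metric defined in the text.

```python
# Path length (ms) from Table I as (original, LDHT-based) pairs.
TABLE_I = {
    ("Chord", "TP1"): (525, 407),
    ("Symphony", "TP1"): (595, 443),
    ("Kademlia", "TP1"): (449, 383),
    ("Chord", "TP2"): (1024, 869),
    ("Symphony", "TP2"): (1083, 897),
    ("Kademlia", "TP2"): (884, 853),
}


def relative_reduction(orig: float, ldht: float) -> float:
    """Fraction by which LDHT lowers the path length relative to the original DHT."""
    return (orig - ldht) / orig


if __name__ == "__main__":
    for (proto, topo), (orig, ldht) in TABLE_I.items():
        pct = 100 * relative_reduction(orig, ldht)
        print(f"{proto:<9} {topo}: {orig:>5} -> {ldht:>4} ms ({pct:.1f}% lower)")
```

Run as a script, this reports reductions of 22.5%, 25.5% and 14.7% on TP1 and 15.1%, 17.2% and 3.5% on TP2 for Chord, Symphony and Kademlia, respectively; the gain is smallest for Kademlia on TP2.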
From the figures and the table, we can see that the LDHT-based systems have a smaller path length than the original ones for all three DHT protocols on both topologies. It …

… end latency. In fact, the query path on the LDHT overlay …

[Figure: CDF of RDP per query for Symphony and LDHT-based Symphony on Topology1 and Topology2]

… Symphony and Kademlia. Table II shows the average RDP of each protocol and topology; the short names have the same meanings as in Table I.

[Figure: CDF for Kademlia and LDHT-based Kademlia on Topology1 and Topology2]

We can see that on both topologies, RDP of the three …
[Fig. 9: Hop count per query of Symphony and LDHT-based Symphony at the 10th, 50th and 90th percentiles of nodes]

[Fig. 10: Hop count per query of Kademlia and LDHT-based Kademlia on Topology1 and Topology2 at the 10th, 50th and 90th percentiles of nodes]

C. Simulation Conclusion
The results above show that LDHT is applicable to different DHT protocols and topologies. Compared with the original DHTs, LDHT achieves lower end-to-end latency without adding overlay hops.

V. CONCLUSION AND FUTURE WORK
In this paper, we propose a design of an ASN-based locality-aware DHT called LDHT, which exploits network locality in DHT-based systems. We assign node identifiers in a geographic layout manner, so that nodes with close identifiers in the identifier space are close in the network topology and will therefore have more close neighbors than faraway

… easily issue put and get operations to any DHT node without running an LDHT client in order to use the LDHT service. Second, we will consider the proximity among ASes to improve the performance of LDHT. We have already done some work in [12] and hope to use this kind of scheme to make LDHT stronger.

REFERENCES
[1] I. Stoica, R. Morris, D. Karger, M. F. Kaashoek, and H. Balakrishnan, "Chord: A scalable peer-to-peer lookup protocol for internet applications," IEEE/ACM Transactions on Networking, vol. 11, pp. 17–32, 2003.
[2] G. S. Manku, M. Bawa, and P. Raghavan, "Symphony: Distributed hashing in a small world," in Proc. USITS'03, 2003.
[3] P. Maymounkov and D. Mazières, "Kademlia: A peer-to-peer information system based on the XOR metric," in Proc. IPTPS'02, 2002.
[4] M. Castro, P. Druschel, and Y. C. Hu, "Exploiting network proximity in distributed hash tables," in Proc. IPTPS'02, 2002.
[5] B. Y. Zhao, L. Huang, J. Stribling, S. C. Rhea, A. D. Joseph, and J. D. Kubiatowicz, "Tapestry: A resilient global-scale overlay for service deployment," IEEE Journal on Selected Areas in Communications, vol. 22, pp. 41–53, Jan. 2004.
[6] A. Rowstron and P. Druschel, "Pastry: Scalable, decentralized object location and routing for large-scale peer-to-peer systems," in Proc. IFIP/ACM International Conference on Distributed Systems Platforms (Middleware'01), 2001.
[7] S. Zhou, G. R. Ganger, and P. Steenkiste, "Location-based node IDs: Enabling explicit locality in DHTs," Carnegie Mellon University, Tech. Rep. CMU-CS-03-171, 2003.
[8] J. Xiong, Y. Zhang, P. Hong, and J. Li, "Chord6: IPv6 based topology-aware Chord," in Proc. ICNS'05, 2005.
[9] (2007) The GT-ITM homepage. [Online]. Available: http://www.cc.gatech.edu/projects/gtitm/
[10] (2007) The PlanetLab homepage. [Online]. Available: http://www.planet-lab.org
[11] S. Rhea, B. Godfrey, B. Karp, J. Kubiatowicz, S. Ratnasamy, S. Shenker, I. Stoica, and H. Yu, "OpenDHT: A public DHT service and its uses," in Proc. ACM SIGCOMM'05, Aug. 2005.
[12] L. Cong, B. Yang, Y. Chen, G. Lu, B. Deng, X. Li, and Y. Wang, "NTS6: IPv6 based network topology service system of CERNET2," in Proc. MUE'07, Apr. 2007.