Routing Algorithms For DHTS: Some Open Questions

The document discusses open questions about routing algorithms for distributed hash tables (DHTs). It briefly reviews existing DHT routing algorithms such as Tapestry, Pastry, Chord, and CAN, which employ different routing approaches. The document argues that instead of competitively comparing algorithms, researchers should seek to combine insights from different algorithms to develop better routing techniques. It outlines some issues for further study, such as balancing routing efficiency and state requirements, and adapting routing to heterogeneous and dynamic networks. The goal is to promote discussion and collaboration to advance the design of DHT routing algorithms.

Copyright
© Attribution Non-Commercial (BY-NC)

Routing Algorithms for DHTs: Some Open Questions

Sylvia Ratnasamy Scott Shenker Ion Stoica


([email protected]) ([email protected]) ([email protected])

1 Introduction

Even though they were introduced only a few years ago, peer-to-peer (P2P) filesharing systems are now one of the most popular Internet applications and have become a major source of Internet traffic. Thus, it is extremely important that these systems be scalable. Unfortunately, the initial designs for P2P systems have significant scaling problems; for example, Napster has a centralized directory service, and Gnutella employs a flooding-based search mechanism that is not suitable for large systems.

In response to these scaling problems, several research groups have (independently) proposed a new generation of scalable P2P systems that support distributed hash table (DHT) functionality; among them are Tapestry [15], Pastry [6], Chord [14], and Content-Addressable Networks (CAN) [10]. In these systems, which we will call DHTs, files are associated with a key (produced, for instance, by hashing the file name) and each node in the system is responsible for storing a certain range of keys. There is one basic operation in these DHT systems, lookup(key), which returns the identity (e.g., the IP address) of the node storing the object with that key. This operation allows nodes to put and get files based on their key, thereby supporting the hash-table-like interface.¹

¹ The interfaces of these systems are not all identical; some reveal only the put and get interface while others reveal the lookup(key) function directly. However, the above discussion refers to the underlying functionality and not the details of the API.

This DHT functionality has proved to be a useful substrate for large distributed systems; a number of projects are proposing to build Internet-scale facilities layered above DHTs, including distributed file systems [5, 7, 4], application-layer multicast [11, 16], event notification services [3, 1], and chat services [2]. With so many applications being developed in so short a time, we expect the DHT functionality to become an integral part of the future P2P landscape.

The core of these DHT systems is the routing algorithm. The DHT nodes form an overlay network with each node having several other nodes as neighbors. When a lookup(key) is issued, the lookup is routed through the overlay network to the node responsible for that key. The scalability of these DHT algorithms is tied directly to the efficiency of their routing algorithms.

Each of the proposed DHT systems listed above (Tapestry, Pastry, Chord, and CAN) employs a different routing algorithm. Usually, discussion of DHT routing issues is in the context of one particular algorithm. And, when more than one is mentioned, they are often compared in competitive terms in an effort to determine which is “best”. We think both of these trends are wrong. The algorithms have more commonality than differences, and each algorithm embodies some insights about routing in overlay networks. Rather than always working in the context of a single algorithm, or comparing the algorithms competitively, a more appropriate goal would be to combine these insights, and seek new insights, to produce even better algorithms. In that spirit we describe some issues relevant to routing algorithms and identify some open research questions. Of course, our list of questions is not intended to be exhaustive, merely illustrative.

As should be clear from our description, this paper is not about finished work, but instead is about a research agenda for future work (by us and others). We hope that presenting such a discussion to this audience will promote synergy between research groups in this area and help clarify some of the underlying issues. We should note that there are many other interesting issues that remain to be resolved in these DHT systems, such as security and robustness to attacks, system monitoring and maintenance, and indexing and keyword searching. These issues will doubtless be discussed elsewhere in this workshop. Our focus on routing algorithms is not intended to imply that these other issues are of secondary importance.

We first (very) briefly review the routing algorithms used in the various DHT systems in Section 2. We then,
in the following sections, discuss various issues relevant to routing: state-efficiency tradeoff, resilience to failures, routing hotspots, geography, and heterogeneity.

2 Review of Existing Algorithms

In this section we review some of the existing routing algorithms. All of them take, as input, a key and, in response, route a message to the node responsible for that key. The keys are strings of digits of some length. Nodes have identifiers, taken from the same space as the keys (i.e., same number of digits). Each node maintains a routing table consisting of a small subset of nodes in the system. When a node receives a query for a key for which it is not responsible, the node routes the query to the neighbor node that makes the most “progress” towards resolving the query. The notion of progress differs from algorithm to algorithm, but in general is defined in terms of some distance between the identifier of the current node and the identifier of the queried key.

Plaxton et al.: Plaxton et al. [9] developed perhaps the first routing algorithm that could be scalably used by DHTs. While not intended for use in P2P systems, because it assumes a relatively static node population, it does provide very efficient routing of lookups. The routing algorithm works by “correcting” a single digit at a time: if node number […] received a lookup query with key […], which matches the first two digits, then the routing algorithm forwards the query to a node which matches the first three digits (e.g., node […]). To do this, a node needs to have, as neighbors, nodes that match each prefix of its own identifier but differ in the next digit. For a system of $n$ nodes, each node has $O(\log n)$ neighbors. Since one digit is corrected each time the query is forwarded, the routing path is at most $O(\log n)$ overlay (or application-level) hops.

This algorithm has the additional property that if the $n^2$ node-node latencies (or “distances” according to some metric) are known, the routing tables can be chosen to minimize the expected path latency and, moreover, the latency of the overlay path between two nodes is within a constant factor of the latency of the direct underlying network path between them.

Tapestry: Tapestry [15] uses a variant of the Plaxton et al. algorithm. The modifications are to ensure that the design, originally intended for static environments, can adapt to a dynamic node population. The modifications are too involved to describe in this short review. However, the algorithm maintains the properties of having $O(\log n)$ neighbors and routing with path lengths of $O(\log n)$ hops.

Pastry: In Pastry [6], nodes are responsible for keys that are the closest numerically (with the keyspace considered as a circle). The neighbors consist of a Leaf Set $L$, which is the set of the $|L|$ closest nodes (half larger, half smaller). Correct, not necessarily efficient, routing can be achieved with this leaf set. To achieve more efficient routing, Pastry has another set of neighbors spread out in the key space (in a manner we don’t describe here). Routing consists of forwarding the query to the neighboring node that has the longest shared prefix with the key (and, in the case of ties, to the node with identifier closest numerically to the key). Pastry has $O(\log n)$ neighbors and routes within $O(\log n)$ hops.

Chord: Chord [14] also uses a one-dimensional circular key space. The node responsible for the key is the node whose identifier most closely follows the key (numerically); that node is called the key’s successor. Chord maintains two sets of neighbors. Each node has a successor list of […] nodes that immediately follow it in the key space. Routing correctness is achieved with these lists. Routing efficiency is achieved with the finger list of $O(\log n)$ nodes spaced exponentially around the key space. Routing consists of forwarding to the node closest, but not past, the key; pathlengths are $O(\log n)$ hops.

CAN: CAN chooses its keys from a $d$-dimensional toroidal space. Each node is associated with a hypercubal region of this key space, and its neighbors are the nodes that “own” the contiguous hypercubes. Routing consists of forwarding to a neighbor that is closer to the key. CAN has a different performance profile than the other algorithms; nodes have $O(d)$ neighbors and pathlengths are $O(d\,n^{1/d})$ hops. Note, however, that when $d = \log n$, CAN has $O(\log n)$ neighbors and $O(\log n)$ pathlengths like the other algorithms.

3 State-Efficiency Tradeoff

The most obvious measure of the efficiency of these routing algorithms is the resulting pathlength. Most of the algorithms have pathlengths of $O(\log n)$ hops.
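
To see where the $O(\log n)$ figure comes from for the prefix-based schemes reviewed in Section 2 (Plaxton et al., Tapestry, and Pastry's prefix routing), note that each hop fixes at least one more leading digit of the key. The Python sketch below illustrates only that idea, under assumptions made for illustration: a static node population, fixed-length decimal identifiers (`DIGITS = 4`), keys that coincide with node identifiers, and an idealized table that keeps the first qualifying node for each slot rather than the closest one. The names `shared_prefix_len`, `build_tables`, and `route` are hypothetical; this is not code from any of the cited systems.

```python
import random

DIGITS = 4  # identifier length in decimal digits (assumed for illustration)


def shared_prefix_len(a: str, b: str) -> int:
    """Return how many leading digits identifiers a and b have in common."""
    n = 0
    while n < min(len(a), len(b)) and a[n] == b[n]:
        n += 1
    return n


def build_tables(ids):
    """Build an idealized prefix routing table for every node.

    Slot (row, digit) of a node's table holds some node that shares the
    node's first `row` digits but has `digit` in position `row`.  With
    decimal digits, each row has at most 9 slots.
    """
    tables = {}
    for me in ids:
        table = {}
        for other in ids:
            if other == me:
                continue
            row = shared_prefix_len(me, other)
            table.setdefault((row, other[row]), other)  # keep first candidate
        tables[me] = table
    return tables


def route(tables, src: str, key: str):
    """Route from src to the node whose identifier equals key.

    Each hop forwards to a node matching the key in at least one more
    leading digit, so the path has at most len(key) hops.
    """
    path = [src]
    cur = src
    while cur != key:
        row = shared_prefix_len(cur, key)
        cur = tables[cur][(row, key[row])]  # slot exists: key itself qualifies
        path.append(cur)
    return path


if __name__ == "__main__":
    rng = random.Random(1)
    ids = sorted({"".join(rng.choice("0123456789") for _ in range(DIGITS))
                  for _ in range(500)})
    tables = build_tables(ids)
    longest = max(len(route(tables, s, k)) - 1 for s in ids[:50] for k in ids[:50])
    print(f"{len(ids)} nodes, longest path {longest} hops (at most {DIGITS})")
```

Because the prefix shared with the key grows on every hop, no path in the sketch can exceed `DIGITS` hops. The logarithmic bounds for the real algorithms rest on a related observation: with $n$ nodes spread over the identifier space and base-$b$ digits, only about $\log_b n$ leading digits are typically needed before a single node remains. A deployed system would also need a rule for keys that match no node exactly, such as Pastry's numerically-closest tie-break.
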
CAN has longer paths of $O(d\,n^{1/d})$ hops. The most obvious measure of the overhead associated with keeping routing tables is the number of neighbors. This isn’t just a measure of the state required to do routing, but it is also a measure of how much state needs to be adjusted when nodes join or leave. Given the prevalence of inexpensive memory and the highly transient user populations in P2P systems, this second issue is likely to be much more important than the first. Most of the algorithms require $O(\log n)$ neighbors, while CAN requires only $O(d)$ neighbors.

Ideally, one would like to combine the best of these two classes of algorithms in hybrid algorithms that achieve short pathlengths with a fixed number of neighbors.

Question 1 Can one achieve $O(\log n)$ pathlengths (or better) with $O(1)$ neighbors?

One would expect that, if this were possible, some other aspects of routing would get worse.

Question 2 If so, are there other properties (such as those described in the following sections) that are made worse in these hybrid routing algorithms?

4 Resilience to Failures

The above routing results refer to a perfectly functioning system with all nodes operational. However, P2P nodes are notoriously transient, and the resilience of routing to failures is a very important consideration. There are (at least) three different aspects to resilience. First, one needs to evaluate whether routing can continue to function (and with what efficiency) as nodes fail without any time for other nodes to establish other neighbors to compensate; that is, the neighboring nodes know that a node has failed, but they don’t establish any new neighbor relations with other nodes. We will call this static resilience and measure it in terms of the percentage of reachable key locations and of the resulting average path length.

Question 3 Can one characterize the static resilience of the various algorithms? What aspects of these algorithms lead to good resilience?

Second, one can investigate the resilience when nodes have a chance to establish some neighbors, but not all. That is, when nodes have certain “special” neighbors, such as the successor list or the Leaf Set, and these are re-established after a failure, but no other neighbors are re-established (such as the finger set). The presence of these special neighbors allows one to prove the correctness of routing, but the following question remains:

Question 4 To what extent are the observed path lengths better than the rather pessimistic bounds provided by the presence of these special neighbors?

Finally, one can ask how long it takes various algorithms to fully recover their routing state, and at what cost (measured, for example, by the number of nodes participating in the recovery or the number of control messages generated for recovery).

Question 5 How long does it take, on average, to recover complete routing state? And what is the cost of doing so?

A related question is:

Question 6 Can one identify design rules that lead to shorter and/or cheaper recoveries?

For instance, is symmetry (where the node neighbor relation is symmetric) important in restoring state easily? One could also argue that in the face of node failure, having the routing automatically send messages to the correct alternate node (i.e., the node that takes over the range of the identifier space that was previously held by the failed node) leads to quicker recovery.

5 Routing Hot Spots

When there is a hotspot in the query pattern, with a certain key being requested extremely often, then the node holding that key may become overloaded. Various caching and replication schemes have been proposed to overcome this query hotspot problem; the effectiveness of these schemes may vary between algorithms based on the fan-in at the node and other factors, but this seems to be a manageable problem. More problematic, however, is the case where a node is overloaded with too much routing traffic. These routing hotspots are harder to deal with since there is no local action the node can take to redirect the routing load. Some of the proximity techniques we describe below might be used to help here, but otherwise this remains an open problem.

Question 7 Do routing hotspots exist and, if so, how can one deal with them?

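
Question 7 is largely empirical, and a natural first step is to count how much forwarding work each node does. The sketch below is a minimal, hypothetical Python harness for that: a static Chord-like ring with ideal finger tables, where each hop goes to the farthest finger that does not pass the key, following the Chord description in Section 2. It deliberately omits successor lists, joins, and failures, and its parameters (16-bit identifiers, 256 nodes, 20,000 uniformly random lookups) are illustrative assumptions rather than values from any study. `Ring`, `lookup`, and `routing_load` are names invented for this example.

```python
import bisect
import random
from collections import Counter

M = 16          # identifier bits (assumed for illustration)
RING = 1 << M   # size of the circular identifier space


def in_half_open(x: int, a: int, b: int) -> bool:
    """True if x lies in the clockwise ring interval (a, b]."""
    if a < b:
        return a < x <= b
    return x > a or x <= b  # interval wraps past zero (a == b: whole ring)


def in_open(x: int, a: int, b: int) -> bool:
    """True if x lies in the clockwise ring interval (a, b)."""
    if a < b:
        return a < x < b
    return x > a or x < b


class Ring:
    """Static Chord-like ring with ideal finger tables (no churn, no successor lists)."""

    def __init__(self, ids):
        self.ids = sorted(ids)
        # finger[i] of node n is the successor of n + 2^i; finger[0] is n's successor
        self.fingers = {
            n: [self.successor((n + (1 << i)) % RING) for i in range(M)]
            for n in self.ids
        }

    def successor(self, key: int) -> int:
        """Node responsible for key: first node ID >= key, wrapping around."""
        i = bisect.bisect_left(self.ids, key)
        return self.ids[i % len(self.ids)]

    def lookup(self, src: int, key: int) -> list:
        """Route greedily toward key; return the list of nodes visited."""
        path = [src]
        cur = src
        while cur != self.successor(key):
            succ = self.fingers[cur][0]
            if in_half_open(key, cur, succ):
                cur = succ  # key lies between cur and its successor: final hop
            else:
                nxt = succ
                for f in reversed(self.fingers[cur]):
                    if in_open(f, cur, key):  # farthest finger not past the key
                        nxt = f
                        break
                cur = nxt
            path.append(cur)
        return path


def routing_load(ring: Ring, lookups: int, rng: random.Random) -> Counter:
    """Count, per node, how many lookups it forwarded (uniformly random keys)."""
    load = Counter()
    for _ in range(lookups):
        path = ring.lookup(rng.choice(ring.ids), rng.randrange(RING))
        for node in path[:-1]:  # every node but the last passed the query on
            load[node] += 1
    return load


if __name__ == "__main__":
    rng = random.Random(7)
    ring = Ring(rng.sample(range(RING), 256))
    load = routing_load(ring, 20000, rng)
    counts = [load[n] for n in ring.ids]
    print(f"mean forwards per node {sum(counts) / len(counts):.1f}, max {max(counts)}")
```

Comparing the maximum count with the mean gives a crude signal of routing-load imbalance under uniform keys. Substituting a skewed key distribution, or one of the other routing rules from Section 2, would be natural next experiments.
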
6 Incorporating Geography

The efficiency measure used above was the number of application-level hops taken on the path. However, the true efficiency measure is the end-to-end latency of the path. Because the nodes could be geographically dispersed, some of these application-level hops could involve transcontinental links, and others merely trips across a LAN; routing algorithms that ignore the latencies of individual hops are likely to result in high-latency paths. While the original “vanilla” versions of some of these routing algorithms did not take these hop latencies into account, almost all of the “full” versions of the algorithms make some attempt to deal with the geographic proximity of nodes. There are (at least) three ways of coping with geography.

Proximity Routing: Proximity routing is when the routing choice is based not just on which neighboring node makes the “most” progress towards the key, but also on which neighboring node is “closest” in the sense of latency. Various algorithms implement proximity routing differently, but they all adopt the same basic approach of weighing progress in identifier space against cost in latency (or geography). Simulations have shown this to be a very effective tool in reducing the average path latency.

Question 8 Can one formally characterize the effectiveness of these proximity routing approaches?

Proximity Neighbor Selection: This is a variant of the idea above, but now the proximity criterion is applied when choosing neighbors, not just when choosing the next hop.

Question 9 Can one show that proximity neighbor selection is always better than proximity routing? Is this difference significant?

As mentioned earlier, if the $n^2$ node-pair distances (as measured by latency) are known, the Plaxton/Tapestry algorithm can choose the neighbors so as to minimize the expected overlay path latency. This is an extremely important property, which is (so far) the exclusive domain of the Plaxton/Tapestry algorithms. We don’t know whether other algorithms can adopt similar approaches.

Question 10 If one had the full $n^2$ distance matrix, could one do optimal neighbor selection in algorithms other than Plaxton/Tapestry?

Geographic Layout: In most of the algorithms, the node identifiers are chosen randomly (e.g., hash functions of the IP address) and the neighbor relations are established based solely on these node identifiers. One could instead attempt to choose node identifiers in a geographically informed manner.² An initial attempt to do so in the context of CAN was reported in [12]; this approach was quite successful in reducing the latency of paths. There was little in the layout method specific to CAN, but the high dimensionality of the key space may have played an important role; recent work [8] suggests that latencies in the Internet can be reasonably modeled by a $d$-dimensional geometric space with $d \ge 2$. This raises the question of whether systems that use a one-dimensional key set can adequately mimic the geographic layout of the nodes.

² Note that geographic layout differs from the two above proximity methods in that here there is an attempt to affect the global layout of the node identifiers, whereas the proximity methods merely affect the local choices of neighbors and forwarding nodes.

Question 11 Can one choose identifiers in a one-dimensional key space that will adequately capture the geographic layout of nodes?

However, this may not matter because the geographic layout may not offer significant advantages over the two proximity methods.

Question 12 Can the two local techniques of proximity routing and proximity neighbor selection achieve most of the benefit of global geographic layout?

Moreover, these geographically-informed layout methods may interfere with the robustness, hotspot, and other properties mentioned in previous sections.

Question 13 Does geographic layout have an impact on resilience, hotspots, and other aspects of performance?

7 Extreme Heterogeneity

All of the algorithms start by assuming that all nodes have the same capacity to process messages and then, only later, add on techniques for coping with heterogeneity.³ However, the heterogeneity observed in current P2P populations [13] is quite extreme, with differences of several orders of magnitude in bandwidth.

³ The authors of [13] deserve credit for bringing the issue of heterogeneity to our attention.

One can ask whether the routing algorithms, rather than merely coping with heterogeneity, should instead use
it to their advantage. At the extreme, a star topology with all queries passing through a single hub node and then routed to their destination would be extremely efficient, but would require a very highly capable hub node (and would have a single point of failure). But perhaps one could use the very highly capable nodes as mini-hubs to improve routing. In another position paper here, some of us argue that heterogeneity can be used to make Gnutella-like systems more scalable. The question is whether one could similarly modify the current DHT routing algorithms to exploit heterogeneity:

Question 14 Can one redesign these routing algorithms to exploit heterogeneity?

It may be that no sophisticated modifications are needed to leverage heterogeneity. Perhaps the simplest technique to cope with heterogeneity, one that has already been mentioned in the literature, is to clone highly capable nodes so that they could serve as multiple nodes; i.e., a node that was $k$ times more powerful than other nodes could function as $k$ virtual nodes.⁴ When combined with proximity routing and neighbor selection, cloning would allow nodes to route to themselves and thereby “jump” in key space without any forwarding hops.

⁴ This technique has already been suggested for some of the algorithms, and could easily be applied to the others. However, in some algorithms it would require alteration in the way the node identifiers were chosen so that they weren’t tied to the IP address of the node.

Question 15 Does cloning plus proximity routing and neighbor selection lead to significantly improved performance when the node capabilities are extremely heterogeneous?

References

[1] Rowstron, A., Kermarrec, A.-M., Castro, M., and Druschel, P. Scribe: The design of a large-scale event notification infrastructure. In Proceedings of NGC 2001 (Nov. 2001).

[2] Based Chat, C. http://jxme.jxta.org/demo.html, 2001.

[3] Cabrera, L. F., Jones, M. B., and Theimer, M. Herald: Achieving a global event notification service. In Proceedings of the 8th IEEE Workshop on Hot Topics in Operating Systems (HotOS-VIII) (Elmau/Oberbayern, Germany, May 2001).

[4] Dabek, F., Kaashoek, M. F., Karger, D., Morris, R., and Stoica, I. Wide-area cooperative storage with CFS. In Proceedings of the 18th ACM Symposium on Operating Systems Principles (SOSP ’01) (To appear; Banff, Canada, Oct. 2001).

[5] Druschel, P., and Rowstron, A. Past: Persistent and anonymous storage in a peer-to-peer networking environment. In Proceedings of the 8th IEEE Workshop on Hot Topics in Operating Systems (HotOS 2001) (Elmau/Oberbayern, Germany, May 2001), pp. 65–70.

[6] Druschel, P., and Rowstron, A. Pastry: Scalable, distributed object location and routing for large-scale peer-to-peer systems. In Proceedings of the 18th IFIP/ACM International Conference on Distributed Systems Platforms (Middleware 2001) (Nov. 2001).

[7] Kubiatowicz, J., Bindel, D., Chen, Y., Czerwinski, S., Eaton, P., Geels, D., Gummadi, R., Rhea, S., Weatherspoon, H., Weimer, W., Wells, C., and Zhao, B. OceanStore: An architecture for global-scale persistent storage. In Proceedings of the Ninth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS 2000) (Boston, MA, November 2000), pp. 190–201.

[8] Ng, E., and Zhang, H. Towards global network positioning. In Proceedings of ACM SIGCOMM Internet Measurement Workshop 2001 (Nov. 2001).

[9] Plaxton, C., Rajaraman, R., and Richa, A. Accessing nearby copies of replicated objects in a distributed environment. In Proceedings of the ACM SPAA (Newport, Rhode Island, June 1997), pp. 311–320.

[10] Ratnasamy, S., Francis, P., Handley, M., Karp, R., and Shenker, S. A scalable content-addressable network. In Proc. ACM SIGCOMM (San Diego, CA, August 2001), pp. 161–172.

[11] Ratnasamy, S., Handley, M., Karp, R., and Shenker, S. Application-level Multicast using Content-Addressable Networks. In Proceedings of NGC 2001 (Nov. 2001).

[12] Ratnasamy, S., Handley, M., Karp, R., and Shenker, S. Topologically-aware overlay construction and server selection. In Proceedings of Infocom 2002 (Mar. 2002).

[13] Saroiu, S., Gummadi, K., and Gribble, S. A measurement study of peer-to-peer file sharing systems. In Proceedings of Multimedia Conferencing and Networking (San Jose, Jan. 2002).

[14] Stoica, I., Morris, R., Karger, D., Kaashoek, M. F., and Balakrishnan, H. Chord: A scalable peer-to-peer lookup service for internet applications. In Proceedings of the ACM SIGCOMM ’01 Conference (San Diego, California, August 2001).

[15] Zhao, B. Y., Kubiatowicz, J., and Joseph, A. Tapestry: An infrastructure for fault-tolerant wide-area location and routing. Tech. Rep. UCB/CSD-01-1141, University of California at Berkeley, Computer Science Department, 2001.

[16] Zhuang, S., Zhao, B., Joseph, A. D., Katz, R. H., and Kubiatowicz, J. Bayeux: An architecture for wide-area, fault-tolerant data dissemination. In Proceedings of NOSSDAV’01 (Port Jefferson, NY, June 2001).
