Usenix CR PDF
Abstract

This paper addresses the problem of churn (the continuous process of node arrival and departure) in distributed hash tables (DHTs). We argue that DHTs should perform lookups quickly and consistently under churn rates at least as high as those observed in deployed P2P systems such as Kazaa. We then show through experiments on an emulated network that current DHT implementations cannot handle such churn rates. Next, we identify and explore three factors affecting DHT performance under churn: reactive versus periodic failure recovery, message timeout calculation, and proximity neighbor selection. We work in the context of a mature DHT implementation called Bamboo, using the ModelNet network emulator, which models in-network queuing, cross-traffic, and packet loss. These factors are typically missing in earlier simulation-based DHT studies. We show that careful attention to them in Bamboo's design allows it to function effectively at churn rates as high as or higher than those observed in P2P file-sharing applications, while using lower maintenance bandwidth than other DHT implementations.

1 Introduction

The popularity of widely-deployed file-sharing services has recently motivated considerable research into peer-to-peer systems. Along one line, this research has focused on the design of better peer-to-peer algorithms, especially in the area of structured peer-to-peer overlay networks or distributed hash tables (e.g. [20, 22, 24, 27, 30]), which we will simply call DHTs. These systems map a large identifier space onto the set of nodes in the system in a deterministic and distributed fashion, a function we alternately call routing or lookup. DHTs generally perform these lookups using only $O(\log N)$ overlay hops in a network of $N$ nodes where every node maintains only $O(\log N)$ neighbor links, although recent research has explored the tradeoffs in storing more or less state.

A second line of research into P2P systems has focused on observing deployed networks (e.g. [5, 9, 13, 25]). A significant result of this research is that such networks are characterized by a high degree of churn. One metric of churn is node session time: the time from when a node joins the network until the next time it leaves. Median session times observed in deployed networks range from as long as an hour to as short as a few minutes.

In this paper we explore the performance of DHTs in such dynamic environments. DHTs may be better able to locate rare files than existing unstructured peer-to-peer networks [18]. Moreover, it is not hard to imagine that other proposed uses for DHTs will show churn rates similar to those of file-sharing networks (application-level multicast of a low-budget radio stream, for example). In spite of this promise, we show that short session times cause a variety of negative effects on two mature DHT implementations we tested. Both systems exhibit dramatic latency growth when subjected to increasing churn, and in one implementation the network eventually partitions, causing subsequent lookups to return inconsistent results. The remainder of this paper is dedicated to determining whether a DHT can be built such that it continues to perform well as churn rates increase.

We demonstrate that DHTs can in fact handle high churn rates, and we identify and explore several factors that affect the behavior of DHTs under churn. The three most important factors we identify are:

• reactive versus periodic recovery from failures
• calculation of message timeouts during lookups
• choice of nearby over distant neighbors

By reactive recovery, we mean the strategy whereby a DHT node tries to find a replacement neighbor immediately upon noticing that an existing neighbor has failed. We show that under bandwidth-limited conditions, reactive recovery can lead to a positive feedback cycle that overloads the network, causing lookups to have high latency or to return inconsistent results. In contrast, a DHT node may recover from neighbor failure at a fixed, periodic rate. We show that this strategy improves performance under churn by allowing the system to avoid positive feedback cycles.

The manner in which a DHT chooses timeout values during lookups can also greatly affect its performance under churn. If a node performing a lookup sends a message to a node that has left the network, it must eventually time out the request and try another neighbor. We demonstrate that such timeouts are a significant component of lookup latency under churn, and we explore several methods of computing good timeout values, including virtual coordinate schemes as used in the Chord DHT.
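One way to compute timeouts is the method the conclusion calls TCP-style timeout calculation. It keeps a smoothed round-trip time and its mean deviation for each neighbor, following Jacobson and Karels [15]. The Python sketch below is a minimal illustration under stated assumptions, not Bamboo's code. The class name `RttEstimator` and the `initial_rto` and `min_rto` parameters are hypothetical. The gains (1/8 and 1/4) and the factor of 4 on the deviation are the standard TCP values from RFC 6298; this paper does not specify them.

```python
class RttEstimator:
    """TCP-style (Jacobson/Karels) timeout estimate for a single neighbor.

    Hypothetical helper for illustration; constants follow RFC 6298.
    """

    ALPHA = 1 / 8   # gain applied to the smoothed RTT
    BETA = 1 / 4    # gain applied to the RTT mean deviation
    K = 4           # number of deviations added on top of the smoothed RTT

    def __init__(self, initial_rto: float = 1.0, min_rto: float = 0.0):
        self.srtt = None            # smoothed RTT in seconds; None until a sample arrives
        self.rttvar = None          # mean deviation of the RTT in seconds
        self.initial_rto = initial_rto
        self.min_rto = min_rto      # TCP's RFC 6298 floor is 1 s; configurable here

    def observe(self, rtt: float) -> None:
        """Fold in one RTT sample taken from an acknowledged message."""
        if self.srtt is None:
            self.srtt = rtt
            self.rttvar = rtt / 2
        else:
            # Update the deviation using the old SRTT, then update the SRTT.
            self.rttvar = (1 - self.BETA) * self.rttvar + self.BETA * abs(self.srtt - rtt)
            self.srtt = (1 - self.ALPHA) * self.srtt + self.ALPHA * rtt

    def rto(self) -> float:
        """Timeout to use for the next message sent to this neighbor."""
        if self.srtt is None:
            return self.initial_rto
        return max(self.min_rto, self.srtt + self.K * self.rttvar)


if __name__ == "__main__":
    est = RttEstimator()
    for sample in (0.10, 0.10, 0.12):   # RTTs (seconds) of acknowledged messages
        est.observe(sample)
    print(f"timeout for next message: {est.rto():.3f} s")
```

Each estimate needs RTT samples from one particular neighbor. The approach therefore suits recursive routing, where a node sends most messages to its own neighbors. The conclusion makes the same point.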
Finally, we consider proximity neighbor selection (PNS), where a DHT node with a choice of neighbors tries to select those that are nearest to itself in network latency. We compare several algorithms for discovering nearby neighbors, including algorithms similar to those used in the Chord, Pastry, and Tapestry DHTs, to show the tradeoffs they offer between latency reduction and added bandwidth.

Figure 1: Neighbors in Pastry and Bamboo. A node's neighbors are divided into its leaf set, shown as dashed arrows, and its routing table, shown as solid arrows.

We have augmented the Bamboo DHT [23] such that it can be configured to use any of the design choices described above. As such, we can examine each design decision independently of the others. Moreover, we examine the performance of each configuration by running it on a large cluster with an emulated wide-area network. This methodology is particularly important with regard to the choice of reactive versus periodic recovery as described above. Existing studies of churn in DHTs (e.g. [7, 8, 16, 19]) have used simulations that, unlike our emulated network, did not model the effects of network queuing, cross traffic, or message loss. In our experience, these effects are primary factors contributing to DHTs' inability to handle churn. Moreover, our measurements are conducted on an isolated network, where the only sources of queuing, cross traffic, and loss are the DHTs themselves. In the presence of heavy background traffic, we expect that such network realities will further reduce the ability of DHTs to handle even lower levels of churn.

Of course, this study has limitations. Building and testing a complete DHT implementation on an emulated network is a major effort. Consequently, we have limited ourselves to studying a single DHT on a single network topology using a relatively simple churn model. Furthermore, we have not yet studied the effects of some implementation decisions that might affect the performance of a DHT under churn. These include the use of alternate routing table neighbors as in Kademlia and Tapestry, and the use of iterative versus recursive routing. Nevertheless, we believe that the effects of the factors we have studied are dramatic enough to present them as an important early study in the effort to build a DHT that successfully handles churn.

The rest of this paper is structured as follows. In the next section we review how DHTs perform routing or lookup, with particular reference to Pastry, whose routing algorithm Bamboo also uses. In Section 3, we review existing studies of churn in deployed file-sharing networks, describe the way we model such churn in our emulated network, and quantify the performance of mature DHT implementations under such churn. In Section 4, we study each of the factors listed above in isolation, and describe how Bamboo uses these techniques. In Section 5, we survey related work, and in Section 6 we discuss important future work. We conclude in Section 7.

2 Introduction to DHT Routing

In this section we present a brief review of DHT routing, using Pastry [24] as an example. The geometry and routing algorithm of Bamboo are identical to Pastry's. The difference (and the main contribution of this paper) lies in how Bamboo maintains the geometry as nodes join and leave the network and as network conditions vary.

DHTs are structured graphs, and we use the term geometry to mean the pattern of neighbor links in the overlay network, independent of the routing algorithms or state management algorithms used [12].

Each node in Pastry is assigned a numeric identifier in $[0, 2^{160})$. The identifier is derived either from the SHA-1 hash of the IP address and port on which the node receives packets, or from the SHA-1 hash of a public key. As such, identifiers are well-distributed throughout the identifier space.

In Pastry, a node maintains two sets of neighbors, the leaf set and the routing table (see Figure 1). A node's leaf set is the set of $2k$ nodes immediately preceding and following it in the circular identifier space. We denote this set by $L$, and we use the notation $L_i$ with $-k \le i \le k$ to denote the members of $L$, where $L_0$ is the node itself.

In contrast, the routing table is a set of nodes whose identifiers share successively longer prefixes with the source node's identifier. Treating each identifier as a sequence of digits of base $2^b$ and denoting the routing table entry at row $l$ and column $i$ by $R_l[i]$, a node chooses its neighbors such that the entry at $R_l[i]$ is a node whose identifier matches its own in exactly $l$ digits and whose
    if (L_{−k} ≤ D ≤ L_k)
        next hop = L_i s.t. |D − L_i| is minimal
    else if (R_l[D[l]] ≠ null)
        next hop = R_l[D[l]]
    else
        next hop = L_i s.t. |D − L_i| is minimal
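The routing step above can be made concrete with a small Python sketch. It assumes the following:

- D is the destination key.
- Following Pastry's convention, l is the number of leading digits D shares with the local node's identifier.
- The identifier space is a hypothetical 16-bit ring with b = 4 (the paper uses 160-bit identifiers).
- The leaf set is passed in ring order from L_{−k} to L_k.
- The routing table is a dictionary keyed by (row, column).

The pseudocode leaves the comparisons implicit. This sketch performs the range test and the distances on the circular identifier space so that leaf sets wrapping past zero work. As in the pseudocode, the last branch falls back to the numerically closest leaf-set member. All names here are illustrative, not Bamboo's API.

```python
ID_BITS = 16          # illustrative only; Pastry identifiers are 160 bits
B = 4                 # bits per digit, so digits are base 2^b = 16
NUM_DIGITS = ID_BITS // B
RING = 1 << ID_BITS   # size of the circular identifier space


def digit(x: int, l: int) -> int:
    """Return the l-th base-2^b digit of x, most significant digit first."""
    shift = ID_BITS - B * (l + 1)
    return (x >> shift) & ((1 << B) - 1)


def shared_prefix_len(a: int, b: int) -> int:
    """Number of leading digits that a and b have in common."""
    l = 0
    while l < NUM_DIGITS and digit(a, l) == digit(b, l):
        l += 1
    return l


def ring_dist(a: int, b: int) -> int:
    """Distance between two identifiers on the circular identifier space."""
    d = (a - b) % RING
    return min(d, RING - d)


def in_leaf_range(d: int, leaf_set: list[int]) -> bool:
    """True if d lies on the arc running clockwise from L_{-k} to L_k."""
    lo, hi = leaf_set[0], leaf_set[-1]
    return (d - lo) % RING <= (hi - lo) % RING


def next_hop(self_id: int, leaf_set: list[int],
             routing_table: dict[tuple[int, int], int], d: int) -> int:
    """One Pastry-style routing step toward key d.

    leaf_set lists L_{-k}, ..., L_0 (= self_id), ..., L_k in ring order;
    routing_table maps (row l, column i) to the node stored at R_l[i].
    """
    closest_leaf = min(leaf_set, key=lambda n: ring_dist(d, n))
    if in_leaf_range(d, leaf_set):
        return closest_leaf                   # key falls within the leaf set
    l = shared_prefix_len(self_id, d)
    entry = routing_table.get((l, digit(d, l)))
    if entry is not None:
        return entry                          # extends the shared prefix by one digit
    return closest_leaf                       # fall back to the closest leaf


# Hypothetical example state for node 0x1234 with k = 2.
LEAF_SET = [0x1200, 0x1220, 0x1234, 0x1250, 0x1260]
ROUTING_TABLE = {(0, 0x8): 0x8000, (1, 0x5): 0x1500}
print(hex(next_hop(0x1234, LEAF_SET, ROUTING_TABLE, 0x8ABC)))  # prints 0x8000
```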
Figure 5: Metrics of churn. With respect to the routing and lookup functionality of a DHT, the session times of nodes are more relevant than their lifetimes.

Table 1: Observed session times in various peer-to-peer systems. The median session time ranges from an hour to a minute.
all [17]. In a Poisson process, an event rate $\lambda$ corresponds to a median inter-event period of $\ln 2/\lambda$. For each event we select a node to die uniformly at random, so each node's session time is expected to span $N$ events, where $N$ is the network size. More precisely, each event removes a given node with probability $1/N$, so session times are exponentially distributed with rate $\lambda/N$. Therefore a churn rate of $\lambda$ corresponds to a median node session time of

$$t_{\mathrm{med}} = N \ln 2 / \lambda.$$

For example, a 1000-node network churning with median session times of one hour will see one node arrive (and one leave) every 5.2 seconds. In our experiments, we used churn rates ranging from 8/second to 4/minute, equal to median session times from 1.4 minutes to 3 hours.

[Figure 6 plot: Percent of Lookups (Completed and Consistent) versus Time (minutes), 0 to 200; arrows mark median session times of 6.2 h, 3.1 h, 1.6 h, 47 min, and 23 min.]
Figure 6: FreePastry under churn. The percentage of successful lookups in a 1000-node FreePastry network under churn. Session times for each 30-minute churn period are indicated by arrows, and each churn period is separated from the next by 10 minutes of no churn. The churn rate doubles with each successive period.

Each live node continually performs lookups for identifiers chosen uniformly at random, timed by a Poisson process with rate 0.1/second, for an aggregate system load of 100 lookups/second in a 1000-node network. Each lookup is simultaneously performed by ten nodes, and we report both whether it completes and whether it is consistent with the others for the same key. If there is a majority among the ten results for a given key, all nodes in the majority are said to see a consistent result, and all others are considered inconsistent. If there is no majority, all nodes are said to see inconsistent results. This metric of consistency is stricter than that required by some DHT applications. However, both MIT's Chord and our Bamboo implementation show at least 99.9% consistency under 47-minute median session times [23], so it does not seem unreasonable.

There are two ways in which lookups fail in our tests. First, we do not perform end-to-end retries, so a lookup may fail to complete if a node in the middle of the lookup path leaves the network before forwarding the lookup request to the next node. We observed this behavior primarily in FreePastry, as described below. Second, a lookup may return inconsistent results. Such failures occur either because a node is not aware of the correct node to forward the lookup to, or because it erroneously believes the correct node has left the network (because of congestion or poorly chosen timeouts). All DHT implementations we have tested show some inconsistencies under churn, but carefully chosen timeouts and judicious bandwidth usage can minimize them.

3.3 Existing DHTs

In this section we report the results of testing two mature DHT implementations under churn. Our intent here is not to place a definitive bound on the performance of either implementation. Rather, it is to motivate our work by demonstrating that handling churn in DHTs is both an important and a non-trivial problem. While we have discussed these experiments extensively with the authors of both systems, it is still possible that alternative configurations could have improved their performance. Moreover, both systems have seen subsequent development, and newer versions may show improved resilience under churn.

FreePastry We tested FreePastry 1.3, the Rice University implementation of Pastry [1]. Figure 6 shows one effect of churn on a network of 1000 FreePastry nodes, which we ran using the default 24-node leaf sets and logarithm base of 16. We do not enforce proximity between a new node and its gateway, as suggested for best FreePastry performance. This decision only affects the proximity of a node's neighbors, not the efficiency of its routing.

It is clear from Figure 6 that while successful lookups are mostly consistent, FreePastry fails to complete a majority of lookup requests under heavy churn. A likely explanation for this failure is that nodes wait so long for lookup requests to time out that they frequently leave the network with several requests still in their queues. This behavior is probably exacerbated by two factors: FreePastry's use of Java RMI over TCP as its message transport, and the way that FreePastry nodes handle the loss of their neighbors. We present evidence to support these ideas in Section 4.1.

We make a final comment on this graph. FreePastry generally recovers well between churn periods, once again correctly completing all lookups. The difficulty with real systems is that there is no such quiet period; the network is in a continual state of churn.

MIT Chord We tested MIT's Chord implementation [4] using a CVS snapshot from 8/4/2003, with the default 10-node successor lists and with the location cache disabled (using the -F option), since the cache causes poor performance under churn.
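The experimental methodology described above has two parts: the Poisson churn model and the majority-based consistency metric. The Python sketch below illustrates both. It is a minimal illustration, not the authors' test harness, and it rests on several assumptions:

- A replacement node joins at the same instant a node dies, so the network size stays at N.
- A lookup result is represented by the identifier of the node it returned.
- A lookup that never completed is represented by `None`. It still counts toward the ten lookups when deciding whether a majority exists, but it can never be part of one.

All function names are hypothetical.

```python
import math
import random
import statistics
from collections import Counter
from typing import List, Optional


# --- Churn model: Poisson arrivals/departures, constant network size ---

def rate_for_median_session(n_nodes: int, median_session_s: float) -> float:
    """Churn rate lambda (events/s) giving the requested median session time,
    by inverting t_med = N ln 2 / lambda."""
    return n_nodes * math.log(2) / median_session_s


def median_session_for_rate(n_nodes: int, rate_per_s: float) -> float:
    """Median session time (s) implied by churn rate lambda."""
    return n_nodes * math.log(2) / rate_per_s


def simulate_session_times(n_nodes: int, rate_per_s: float,
                           n_events: int, seed: int = 0) -> List[float]:
    """At each Poisson event a uniformly chosen node dies and a new node joins
    in its place at the same instant. Returns every session that ended."""
    rng = random.Random(seed)
    join_time = [0.0] * n_nodes              # all initial nodes join at t = 0
    now = 0.0
    sessions = []
    for _ in range(n_events):
        now += rng.expovariate(rate_per_s)   # exponential inter-event gap
        victim = rng.randrange(n_nodes)      # node to remove, chosen uniformly
        sessions.append(now - join_time[victim])
        join_time[victim] = now              # replacement's session starts now
    return sessions


# --- Consistency metric: majority vote over parallel lookups of one key ---

def classify_lookups(results: List[Optional[str]]) -> List[bool]:
    """results[j] is the node returned by lookup j, or None if it never
    completed. Returns whether each lookup saw a consistent result."""
    completed = Counter(r for r in results if r is not None)
    if not completed:
        return [False] * len(results)
    winner, votes = completed.most_common(1)[0]
    if 2 * votes <= len(results):            # no strict majority of all lookups
        return [False] * len(results)
    return [r == winner for r in results]    # majority members are consistent


if __name__ == "__main__":
    print(round(1 / rate_for_median_session(1000, 3600), 1))        # prints 5.2 (s)
    print(round(median_session_for_rate(1000, 8) / 60, 2))         # prints 1.44 (min)
    print(round(median_session_for_rate(1000, 4 / 60) / 3600, 2))  # prints 2.89 (h)
    sims = simulate_session_times(100, 1.0, 50_000, seed=1)
    print(statistics.median(sims), 100 * math.log(2))  # empirical vs. predicted
    print(classify_lookups(["n42"] * 7 + ["n17", "n17", None]).count(True))  # prints 7
```

The first three printed values reproduce the arithmetic in the text: 5.2 seconds between arrivals for one-hour median sessions, and roughly 1.4 minutes and 3 hours at the two ends of the churn range. The simulation checks the closed form empirically.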
4 Handling Churn

[Figure 8 plots: two panels over Time (minutes), 0 to 50, comparing reactive and periodic recovery; one panel's y-axis is Bandwidth (kB/s/node); churn periods with 47 min and 23 min median session times are marked.]
Figure 8: Reactive versus periodic recovery. Without churn, reactive recovery is very efficient, as messages are only sent in response to actual changes. At reasonable churn rates, however, periodic recovery uses less bandwidth, and lower contention for the network leads to lower latencies.
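A deliberately crude message-count model can illustrate the tradeoff summarized in Figure 8 and the positive feedback cycle described in the introduction. The Python sketch below is a toy, not Bamboo's recovery protocol. It assumes the following:

- A reactive node pushes its updated leaf set to every leaf-set member each time it detects a failure.
- A periodic node performs one exchange per period regardless of churn.
- Congestion causes a fixed number of false suspicions of failure per message sent. This is the hypothetical parameter `false_suspicions_per_msg`.

Under this linear assumption, reactive traffic settles at a finite level only while `leaf_set_size * false_suspicions_per_msg` is below 1. Above that, each round of recovery messages triggers more detections than the last, while periodic traffic stays at one exchange per period.

```python
def periodic_msgs_per_s(period_s: float) -> float:
    """Periodic recovery: one leaf-set exchange per period, independent of churn."""
    return 1.0 / period_s


def reactive_msgs_per_s(true_failures_per_s: float, leaf_set_size: int,
                        false_suspicions_per_msg: float, rounds: int = 50) -> float:
    """Reactive recovery with congestion-induced false suspicions (toy model).

    Each round, detected failures = real failures + false suspicions caused by
    the previous round's traffic, and every detection triggers a push of the
    leaf set to all `leaf_set_size` members.
    """
    traffic = 0.0
    for _ in range(rounds):
        detected = true_failures_per_s + false_suspicions_per_msg * traffic
        traffic = detected * leaf_set_size
    return traffic


if __name__ == "__main__":
    # Hypothetical numbers: 32-node leaf set, one periodic exchange per second,
    # and one real failure detected every 100 seconds.
    print("periodic:", periodic_msgs_per_s(1.0), "msgs/s")
    for k in (0.0, 0.01, 0.05):
        rate = reactive_msgs_per_s(0.01, 32, k)
        print(f"reactive, {k} false suspicions per msg: {rate:.3g} msgs/s")
```

With no false suspicions, reactive recovery sends fewer messages than periodic recovery, matching the first sentence of the Figure 8 caption. Once congestion-induced suspicions are common enough, the reactive loop diverges. The cap of 50 rounds in the sketch only keeps the printed number finite.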
[Figure 13 plots: panels (a) and (b), each with Bandwidth (bytes/s/node), 600 to 1400, on the x-axis.]
Figure 13: Comparison of PNS techniques. “No PNS” is the control case, where proximity is ignored. “Global Sampling” uses the lookup function to sample all nodes in the DHT. “NN” is sampling our neighbor's neighbors, and “NIN” is sampling their inverse neighbors. The recursive versions of “NN” and “NIN” mimic the nearest-neighbor algorithms of Pastry and Tapestry, respectively. Note that the scales are different between the two figures.
configuration that is not using PNS. With virtually no increase in bandwidth, global sampling drops the mean latency from 450 ms to 340 ms.

Next, much to our surprise, we find that simple sampling of our neighbor's neighbors or inverse neighbors is not terribly effective. As we argued above, this result may be in part due to the constraints of the routing table, but we did not expect the effect to be so dramatic. On the other hand, the recursive versions of both algorithms are at least as effective as global sampling, but not much more so. This result agrees with the contention of Gummadi et al. that only a small amount of global sampling is necessary to achieve near-optimal PNS.

Figure 13(b) shows several combinations of the various algorithms. Global sampling plus sampling of neighbors' neighbors (the combination used in our earlier work [23]) does well, offering a small decrease in latency without much additional bandwidth. However, the other combinations offer similar results. At this point, it seems prudent to say that the most effective technique is to combine global sampling with any other technique. While there may be other differences between the techniques not revealed by this analysis, we see no clear reason to prefer one over another as yet.

5 Related Work

As we noted at the start of this paper, while DHTs have been the subject of much research in the last 4 years or so, there have been few studies of the resilience of real implementations at scale, perhaps because of the difficulty of deploying, instrumenting, and creating workloads for such deployments. However, there has been a substantial amount of theoretical and simulation-based work.

Gummadi et al. [12] present a comprehensive analysis of the static resilience of the various DHT geometries. As we have argued earlier in this work, static resilience is an important first step in a DHT's ability to handle failures in general and churn in particular.

Liben-Nowell et al. [17] present a theoretical analysis of structured peer-to-peer overlays from the point of view of churn as a continuous process. They prove a lower bound on the maintenance traffic needed to keep such networks consistent under churn, and show that Chord's algorithms are within a logarithmic factor of this bound. This paper, in contrast, has focused more on the systems issues that arise in handling churn in a DHT. For example, we have observed what they call “false suspicions of failure”, the appearance that a functioning node has failed, and shown how reactive failure recovery can exacerbate such conditions.

Mahajan et al. [19] present a simulation-based analysis of Pastry in which they study the probability that a DHT node will forward a lookup message to a failed node as a function of the rate of maintenance traffic. They also present an algorithm for automatically tuning that rate for a given failure rate. Since this algorithm increases the rate of maintenance traffic in response to losses, we are concerned that it may cause positive feedback cycles like those we have observed in reactive recovery. Moreover, we believe their failure model is pessimistic, as they do not consider hop-by-hop retransmissions of lookup messages. By acknowledging lookup messages on each hop, a DHT can route around failed nodes in the middle of a lookup path, and in this work we have shown that good timeout values can be computed to minimize the cost of such retransmissions.

Castro et al. [7] presented a number of optimizations
they have performed in MSPastry, the Microsoft Research implementation of Pastry, using simulations. Also, Li et al. [16] performed a detailed simulation-based analysis of several different DHTs under churn, varying their parameters to explore the latency-bandwidth tradeoffs presented. It was their work that inspired our analysis of different PNS techniques.

As opposed to the emulated network used in this study, simulations do not usually consider such network issues as queuing, packet loss, etc. By not doing so, they either simulate far larger networks than we have studied here, as in [7, 19], or they are able to explore a far larger space of possible DHT configurations, as in [16]. On the other hand, they do not reveal subtle issues in DHT design, such as the tradeoffs between reactive and periodic recovery. Also, they do not reveal the interactions of lookup traffic and maintenance traffic in competing for network bandwidth. We are interested in whether a useful middle ground exists between these approaches.

Finally, a number of useful features for handling churn have been proposed but are not implemented by Bamboo. For example, Kademlia [20] maintains several neighbors for each routing table entry, ordered by the length of time they have been neighbors. Newer nodes replace existing neighbors only after failure of the latter. This design decision is aimed at mitigating the effects of the high “infant mortality” observed in peer-to-peer networks.

Another approach to handling churn is to introduce a hierarchy into the system, through stable “superpeers” [2, 29]. While an explicit hierarchy is a viable strategy for handling load in some cases, this work has shown that a fully decentralized, non-hierarchical DHT can in fact handle high rates of churn at the routing layer.

6 Future Work

As discussed in the introduction, there are several other limitations of this study that we think provide for important future work. At an algorithmic level, we would like to study the effects of alternate routing table neighbors as in Kademlia and Tapestry. We would also like to continue our study of iterative versus recursive routing. As discussed by others [11], congestion control for iterative lookups is a challenging problem. We have implemented Chord's STP congestion control algorithm and are currently investigating its behavior under churn, but we do not yet have definitive results about its performance.

At a methodological level, we would like to broaden our study to include better models of network topology and churn. We have so far used only a single network topology in our work, and so our results should not be taken as the last word on PNS. In particular, the distribution of internode latencies in our ModelNet topology is more Gaussian than the distribution of latencies measured on the Internet. Unfortunately for our purposes, these measured latency distributions do not include topology information, and thus cannot be used to simulate the kind of network cross traffic that we have found important in this study. The existence of better topologies would be most welcome.

In addition to more realistic network models, we would also like to include more realistic models of churn in our future work. One idea that was suggested to us by an anonymous reviewer was to scale traces of session times collected from deployed networks to produce a range of churn rates with a more realistic distribution. We would like to explore this approach. Nevertheless, we believe that the effects of the factors we have studied are dramatic enough that they will remain important even as our models improve.

Finally, in this work we have only shown the resistance of the Bamboo routing layer to churn. This is an important first step in verifying that DHTs are ready to become the dominant building block for peer-to-peer systems, but a limited one. Clearly other issues remain. Security and possibly anonymity are two such issues, but we are unclear about how they relate to churn. We are currently studying the resilience to churn of the algorithms used by the DHT storage layer. We hope that the existence of a routing layer that is robust under churn will provide a useful substrate on which these remaining issues may be studied.

7 Conclusion

In this work we have summarized the rates of churn observed in deployed peer-to-peer systems and shown that existing DHTs exhibit less than desirable performance at the higher end of these churn rates. We have presented Bamboo and explored various design tradeoffs and their effects on its ability to handle churn.

The design tradeoffs we studied in this work fall into three broad categories: reactive versus periodic recovery from neighbor failure, the calculation of timeouts on lookup messages, and proximity neighbor selection. We have presented the danger of positive feedback cycles in reactive recovery and discussed two ways to break such cycles. First, we can make the DHT much more cautious about declaring neighbors failed, in order to limit the possibility that network congestion will trick us into recovering a non-faulty node. Second, we presented the technique of periodic recovery. Finally, we demonstrated that reactive recovery is less efficient than periodic recovery under reasonable churn rates when leaf sets are large, as they would be in a large system.

With respect to timeout calculation, we have shown that TCP-style timeout calculation performs best, but argued
that it is only appropriate for lookups performed recursively. It has long been known that recursive routing provides lower latency lookups than iterative routing, but this result presents a further argument for recursive routing where the lowest latency is important. However, we have also shown that timeouts based on virtual coordinates, while not as effective as TCP-style timeouts, are quite reasonable under moderate rates of churn. This result indicates that, at least with respect to timeouts, iterative routing should not be infeasible under moderate churn.

Concerning proximity neighbor selection, we have shown that global sampling can provide a 24% reduction in latency for virtually no increase in bandwidth used. By using 40% more bandwidth, a 42% decrease in latency can be achieved. Other techniques are also effective, especially our adaptations of the Pastry and Tapestry nearest-neighbor algorithms, but not much more so than simple global sampling. Merely sampling our neighbors' neighbors or inverse neighbors is not very effective in comparison. Some combination of global sampling and any of the other techniques seems to provide the best performance at the least cost.

8 Acknowledgments

We would like to thank a number of people for their help with this work. Our shepherd, Atul Adya, and the anonymous reviewers all provided valuable comments and guidance. Frank Dabek helped us tune our Vivaldi implementation, and he and Emil Sit helped us get Chord up and running. Likewise, Peter Druschel provided valuable debugging insight for FreePastry. David Becker helped us with ModelNet. Sylvia Ratnasamy, Scott Shenker, and Ion Stoica provided valuable guidance at several stages of this paper's development.

References

[1] FreePastry 1.3. http://www.cs.rice.edu/CS/Systems/Pastry/.
[2] Gnutella. http://www.gnutella.com/.
[3] Inet topology generator. http://topology.eecs.umich.edu/inet/.
[4] MIT Chord. http://www.pdos.lcs.mit.edu/chord/.
[5] R. Bhagwan, S. Savage, and G. Voelker. Understanding availability. In Proc. IPTPS, Feb. 2003.
[6] C. Blake and R. Rodrigues. High availability, scalable storage, dynamic peer networks: Pick two. 2003.
[7] M. Castro, M. Costa, and A. Rowstron. Performance and dependability of structured peer-to-peer overlays. Technical Report MSR-TR-2003-94, Microsoft, 2003.
[8] M. Castro, M. B. Jones, A.-M. Kermarrec, A. Rowstron, M. Theimer, H. Wang, and A. Wolman. An evaluation of scalable application-level multicast built using peer-to-peer overlays. Apr. 2003.
[9] J. Chu, K. Labonte, and B. N. Levine. Availability and locality measurements of peer-to-peer file systems. In Proc. of ITCom: Scalability and Traffic Control in IP Networks, July 2002.
[10] F. Dabek, M. F. Kaashoek, D. Karger, R. Morris, and I. Stoica. Wide-area cooperative storage with CFS. In Proc. ACM SOSP, Oct. 2001.
[11] F. Dabek, J. Li, E. Sit, J. Robertson, M. F. Kaashoek, and R. Morris. Designing a DHT for low latency and high throughput. In Proc. NSDI, 2004.
[12] K. Gummadi, R. Gummadi, S. Gribble, S. Ratnasamy, S. Shenker, and I. Stoica. The impact of DHT routing geometry on resilience and proximity. In Proc. ACM SIGCOMM, Aug. 2003.
[13] K. P. Gummadi, R. J. Dunn, S. Saroiu, S. D. Gribble, H. M. Levy, and J. Zahorjan. Measurement, modeling, and analysis of a peer-to-peer file-sharing workload. In Proc. ACM SOSP, Oct. 2003.
[14] K. Hildrum, J. D. Kubiatowicz, S. Rao, and B. Y. Zhao. Distributed object location in a dynamic network. In Proc. SPAA, 2002.
[15] V. Jacobson and M. J. Karels. Congestion avoidance and control. In Proc. ACM SIGCOMM, 1988.
[16] J. Li, J. Stribling, T. M. Gil, R. Morris, and F. Kaashoek. Comparing the performance of distributed hash tables under churn. In Proc. IPTPS, 2004.
[17] D. Liben-Nowell, H. Balakrishnan, and D. Karger. Analysis of the evolution of peer-to-peer systems. In Proc. ACM PODC, July 2002.
[18] B. T. Loo, R. Huebsch, I. Stoica, and J. Hellerstein. The case for a hybrid P2P search infrastructure. In Proc. IPTPS, 2004.
[19] R. Mahajan, M. Castro, and A. Rowstron. Controlling the cost of reliability in peer-to-peer overlays. In Proc. IPTPS, Feb. 2003.
[20] P. Maymounkov and D. Mazieres. Kademlia: A peer-to-peer information system based on the XOR metric. In Proc. IPTPS, 2002.
[21] C. Plaxton, R. Rajaraman, and A. Richa. Accessing nearby copies of replicated objects in a distributed environment. In Proc. ACM SPAA, June 1997.
[22] S. Ratnasamy, P. Francis, M. Handley, R. Karp, and S. Shenker. A scalable content-addressable network. In Proc. ACM SIGCOMM, Aug. 2001.
[23] S. Rhea, D. Geels, T. Roscoe, and J. Kubiatowicz. Handling churn in a DHT. Technical Report UCB//CSD-03-1299, University of California, Berkeley, Dec. 2003.
[24] A. Rowstron and P. Druschel. Pastry: Scalable, distributed object location and routing for large scale peer-to-peer systems. In Proc. IFIP/ACM Middleware, Nov. 2001.
[25] S. Saroiu, P. K. Gummadi, and S. D. Gribble. A measurement study of peer-to-peer file sharing systems. In Proc. MMCN, Jan. 2002.
[26] S. Sen and J. Wang. Analyzing peer-to-peer traffic across large networks. In Proc. ACM SIGCOMM Internet Measurement Workshop, Nov. 2002.
[27] I. Stoica, R. Morris, D. Karger, M. F. Kaashoek, and H. Balakrishnan. Chord: A scalable peer-to-peer lookup service for Internet applications. In Proc. ACM SIGCOMM, Aug. 2001.
[28] A. Vahdat, K. Yocum, K. Walsh, P. Mahadevan, D. Kostic, J. Chase, and D. Becker. Scalability and accuracy in a large-scale network emulator. In Proc. OSDI, Dec. 2002.
[29] B. Y. Zhao, Y. Duan, L. Huang, A. D. Joseph, and J. D. Kubiatowicz. Brocade: Landmark routing on overlay networks. In Proc. IPTPS, Mar. 2002.
[30] B. Y. Zhao, L. Huang, J. Stribling, S. C. Rhea, A. D. Joseph, and J. D. Kubiatowicz. Tapestry: A resilient global-scale overlay for service deployment. IEEE JSAC, 22(1):41–53, Jan. 2004.