A Statistical Theory of Chord Under Churn: Abstract


A Statistical Theory of Chord under Churn ∗

Supriya Krishnamurthy (1), Sameh El-Ansary (1), Erik Aurell (1,2) and Seif Haridi (1,3)

(1) Swedish Institute of Computer Science (SICS), Sweden
(2) Department of Physics, KTH-Royal Institute of Technology, Sweden
(3) IMIT, KTH-Royal Institute of Technology, Sweden
{supriya,sameh,eaurell,seif}@sics.se

∗ This work is funded by the Swedish VINNOVA AMRAM and PPC projects, the European IST-FET PEPITO and 6th FP EVERGROW projects.

Abstract. Most earlier studies of DHTs under churn have either depended on simulations as the primary investigation tool, or on establishing bounds for DHTs to function. In this paper, we present a complete analytical study of churn using a master-equation-based approach, used traditionally in non-equilibrium statistical mechanics to describe steady-state or transient phenomena. Simulations are used to verify all theoretical predictions. We demonstrate the application of our methodology to the Chord system. For any rate of churn and stabilization rates, and any system size, we accurately predict the fraction of failed or incorrect successor and finger pointers and show how we can use these quantities to predict the performance and consistency of lookups under churn. We also discuss briefly how churn may actually be of different ’types’ and the implications this will have for the functioning of DHTs in general.

1 Introduction

Theoretical studies of asymptotic performance bounds of DHTs under churn have been conducted in works like [6, 2]. However, within these bounds, performance can vary substantially as a function of different design decisions and configuration parameters. Hence simulation-based studies such as [5, 8, 3] often provide more realistic insights into the performance of DHTs. Relying on an understanding based on simulations alone is however not satisfactory either, since in this case the DHT is treated as a black box and is only empirically evaluated, under certain operating conditions. In this paper we present an alternative theoretical approach to analyzing and understanding DHTs, which aims for an accurate prediction of performance, rather than at placing asymptotic performance bounds. Simulations are then used to verify all theoretical predictions.

Our approach is based on constructing and working with master equations, a widely used tool wherever the mathematical theory of stochastic processes is applied to real-world phenomena [7]. We demonstrate the applicability of this approach to one specific DHT: Chord [9]. For Chord, it is natural to define the state of the system as the state of all its nodes, where the state of an alive node is specified by the states of all its pointers. These pointers (either fingers or successors) are then in one of three states: alive and correct, alive and incorrect, or failed. A master equation for this system is simply an equation for the time evolution of the probability that the system is in a particular state. Writing such an equation involves keeping track of all the gain/loss terms which add/detract from this probability, given the details of the dynamics. This approach is applicable to any P2P system (or indeed any system with a discrete set of states).

Our main result is that, for every outgoing pointer of a Chord node, we systematically compute the probability that it is in any one of the three possible states, by computing all the gain and loss terms that arise from the details of the Chord protocol under churn. This probability is different for each of the successor and finger pointers. We then use this information to predict both lookup consistency (number of failed lookups) as well as lookup performance (latency) as a function of the parameters involved. All our results are verified by simulations.

The main novelty of our analysis is that it is carried out entirely from first principles, i.e. all quantities are predicted solely as a function of the parameters of the problem: the churn rate, the stabilization rate and the number of nodes in the system. It thus differs from earlier related theoretical studies where quantities similar to those we predict were either assumed to be given [10], or measured numerically [1].

Closest in spirit to our work is the informal derivation in the original Chord paper [9] of the average number of timeouts encountered by a lookup. This quantity was approximated there by the product of the average number of fingers used in a lookup times the probability that a given finger points to a departed node. Our methodology not only allows us to derive the latter quantity rigorously but also demonstrates how this probability depends on which finger (or successor) is involved. Further, we are able to derive an exact relation relating this probability to lookup performance and consistency accurately at any value of the system parameters.

2 Assumptions & Definitions

Basic Notation. In what follows, we assume that the reader is familiar with Chord. However we introduce the notation used below. We use $K$ to mean the size of the Chord key space and $N$ the number of nodes. Let $M = \log_2 K$ be the number of fingers of a node and $S$ the length of the immediate successor list, usually set to a value $= O(\log(N))$. We refer to nodes by their keys, so a node $n$ implies a node with key $n \in 0 \cdots K-1$. We use $p$ to refer to the predecessor, $s$ for referring to the successor list as a whole, and $s_i$ for the $i$th successor. Data structures of different nodes are distinguished by prefixing them with a node key, e.g. $n'.s_1$, etc. Let $fin_i.start$ denote the start of the $i$th finger (where for a node $n$, $\forall i \in 1..M$, $n.fin_i.start = n + 2^{i-1}$) and $fin_i.node$ denote the actual node pointed to by that finger.

Steady State Assumption. $\lambda_j$ is the rate of joins per node, $\lambda_f$ the rate of failures per node and $\lambda_s$ the rate of stabilizations per node. We carry out our analysis for the general case when the rate of doing successor stabilizations, $\alpha\lambda_s$, is not necessarily the same as the rate at which finger stabilizations, $(1-\alpha)\lambda_s$, are performed. In all that follows, we impose the steady state condition $\lambda_j = \lambda_f$. Further it is useful to define $r \equiv \lambda_s/\lambda_f$, which is the relevant ratio on which all the quantities we are interested in will depend. For example, $r = 50$ means that a join/fail event takes place every half an hour for a stabilization which takes place once every 36 seconds.

Parameters. The parameters of the problem are hence: $K$, $N$, $\alpha$ and $r$. All relevant measurable quantities should be entirely expressible in terms of these parameters.

Chord Simulation. We use our own discrete event simulation environment implemented in Java which can be retrieved from [4]. We assume the familiarity of the reader with Chord, however an exact analysis necessitates the provision of a few details. Successor stabilizations performed by a node $n$ on $n.s_1$ accomplish two main goals: i) Retrieving the predecessor and successor list of $n.s_1$ and reconciling with $n$’s state. ii) Informing $n.s_1$ that $n$ is alive/newly joined. A finger stabilization picks one finger at random and looks up its start. Lookups do not use the optimization of checking the successor list before using the fingers. However, the successor list is used as a last resort if fingers could not provide progress. Lookups are assumed not to change the state of a node. For joins, a new node $u$ finds its successor $v$ through some initial random contact and performs successor stabilization on that successor. All fingers of $u$ that have $v$ as an acceptable finger node are set to $v$. The rest of the fingers are computed as best estimates from $v$’s routing table. All failures are ungraceful. We make the simplifying assumption that communication delays due to a limited number of hops are much smaller than the average time interval between joins, failures or stabilization events. However, we do not expect that the results will change much even if this were not satisfied.

Averaging. Since we are collecting statistics like the probability of a particular finger pointer to be wrong, we need to repeat each experiment 100 times before obtaining well-averaged results. The total sequential real time of simulation for obtaining the results of this paper was about 1800 hours, parallelized on a cluster of 14 nodes, where we had $N = 1000$, $K = 2^{20}$, $S = 6$, $200 \le r \le 2000$ and $0.25 \le \alpha \le 0.75$.

3 The Analysis

3.1 Distribution of Inter-Node Distances

During churn, the inter-node distance (the difference between the keys of two consecutive nodes) is a fluctuating variable. An important quantity used throughout the analysis is the pdf of inter-node distances. We define this quantity below and state a theorem giving its functional form. We then mention three properties of this distribution which are needed in the ensuing analysis. Due to space limitations, we omit the proof of this theorem and the properties here and provide them in [4].

Definition 3.1 Let $Int(x)$ be the number of intervals of length $x$, i.e. the number of pairs of consecutive nodes which are separated by a distance of $x$ keys on the ring.

Theorem 3.1 For a process in which nodes join or leave with equal rates (and the number of nodes in the network is almost constant) independently of each other and uniformly on the ring, the probability $P(x) \equiv Int(x)/N$ of finding an interval of length $x$ is:

$$P(x) = \rho^{x-1}(1-\rho) \quad \text{where} \quad \rho = \frac{K-N}{K} \quad \text{and} \quad 1-\rho = \frac{N}{K}$$

The derivation of the distribution $P(x)$ is independent of any details of the Chord implementation and depends solely on the join and leave process. It is hence applicable to any DHT that deploys a ring.

Property 3.1 For any two keys $u$ and $v$, where $v = u + x$, let $b_i$ be the probability that the first node encountered in between these two keys is at $u + i$ (where $0 \le i < x - 1$). Then $b_i \equiv \rho^i(1-\rho)$. The probability that there is definitely at least one node between $u$ and $v$ is $a(x) \equiv 1 - \rho^x$. Hence the conditional probability that the first node is at a distance $i$, given that there is at least one node in the interval, is $bc(i, x) \equiv b(i)/a(x)$.

Property 3.2 The probability that a node and at least one of its immediate predecessors share the same $k$th finger is $p_1(k) \equiv \frac{\rho}{1+\rho}\left(1 - \rho^{2^k - 2}\right)$. This is $\sim 1/2$ for $K \gg 1$ and $N \ll K$. Clearly $p_1 = 0$ for $k = 1$. It is straightforward (though tedious) to derive similar expressions for $p_2(k)$, the probability that a node and at least two of its immediate predecessors share the same $k$th finger, $p_3(k)$ and so on.

Property 3.3 We can similarly assess the probability that the join protocol (see previous section) results in further replication of the $k$th pointer. That is, the probability that a newly joined node will choose the $k$th entry of its successor’s finger table as its own $k$th entry is

$$p_{join}(k) \sim \rho\left(1 - \rho^{2^{k-2}-2}\right) + (1-\rho)\left(1 - \rho^{2^{k-2}-2}\right) - (1-\rho)\,\rho\,(2^{k-2} - 2)\,\rho^{2^{k-2}-3}.$$

The function $p_{join}(k) = 0$ for small $k$ and 1 for large $k$.

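The quantities of Theorem 3.1 and Property 3.1 are simple enough to evaluate directly. The following is a minimal Python sketch, not the authors' Java simulator; the function names are ours. As a rough stand-in for the churn process it places N nodes uniformly at random on the ring (an assumption made only for this illustration) and collects the gaps between consecutive nodes, so that the fraction of gaps of length at most x can be compared with $a(x) = \sum_{y=1}^{x} P(y) = 1 - \rho^x$.

```python
import random

def rho(K, N):
    """Probability that a given key is unoccupied: rho = (K - N) / K."""
    return (K - N) / K

def interval_pdf(x, K, N):
    """P(x) = rho^(x-1) * (1 - rho), the form stated in Theorem 3.1 (x >= 1)."""
    r = rho(K, N)
    return r ** (x - 1) * (1 - r)

def a(x, K, N):
    """a(x) = 1 - rho^x: probability of at least one node among x keys."""
    return 1 - rho(K, N) ** x

def b(i, K, N):
    """b_i = rho^i * (1 - rho): first node encountered is at offset i."""
    r = rho(K, N)
    return r ** i * (1 - r)

def bc(i, x, K, N):
    """Conditional form bc(i, x) = b(i) / a(x) from Property 3.1."""
    return b(i, K, N) / a(x, K, N)

def ring_gaps(K, N, seed=0):
    """Illustrative stand-in for churn: N distinct keys chosen uniformly,
    returning the N distances between consecutive nodes on the ring."""
    rng = random.Random(seed)
    nodes = sorted(rng.sample(range(K), N))
    return [(nodes[(j + 1) % N] - nodes[j]) % K for j in range(N)]
```

For instance, with N = 1000 and K = 2^20 one can compare `sum(g <= x for g in ring_gaps(K, N)) / N` against `a(x, K, N)` for a few values of x; agreement is only expected up to sampling noise.
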
Figure 1: Theory and Simulation for $w_1(r, \alpha)$, $d_1(r, \alpha)$, $I(r, \alpha)$. Left panel: $w_1(r, \alpha)$ for $\alpha = 0.25, 0.5, 0.75$ and $d_1(r, 0.75)$, plotted against the rate of stabilisation / rate of failure ($r = \lambda_s/\lambda_f$). Right panel: $I(r, \alpha)$ for $\alpha = 0.25, 0.5, 0.75$, plotted against the rate of stabilisation of successors / rate of failure ($\alpha r = \alpha\lambda_s/\lambda_f$).
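
The theory curves in Figure 1 are the closed forms obtained in Section 3.2: $w_1 = 2/(3 + r\alpha)$ from equation (1), $d_1 \approx w_1/2$, and $I = w_1 - d_1$. A minimal Python sketch for tabulating them follows; the function names are ours. The last function is only a numerical cross-check: it assumes the right-hand side of the rate equation for $W_1$ can be read as the per-node rate of change of $w_1$, sets $\lambda_j = \lambda_f$, and relaxes it to its fixed point with explicit Euler steps.

```python
def w1(r, alpha):
    """Fraction of wrong first-successor pointers, equation (1)."""
    return 2.0 / (3.0 + r * alpha)

def d1(r, alpha):
    """Fraction of failed first-successor pointers, about half of w1."""
    return 0.5 * w1(r, alpha)

def inconsistency(r, alpha):
    """Probability that a lookup is inconsistent: I = w1 - d1."""
    return w1(r, alpha) - d1(r, alpha)

def w1_by_relaxation(r, alpha, lam_f=1.0, steps=500):
    """Relax dw/dt = lam_f*[(1-w) + (1-w)^2 - w^2] - alpha*lam_s*w to its
    fixed point (lam_j = lam_f, lam_s = r * lam_f), starting from w = 0."""
    lam_s = r * lam_f
    # Step size chosen well below the relaxation time 1 / (lam_f * (3 + alpha*r)).
    dt = 0.1 / (lam_f * (3.0 + alpha * r))
    w = 0.0
    for _ in range(steps):
        gain = lam_f * (1.0 - w) + lam_f * (1.0 - w) ** 2
        loss = lam_f * w ** 2 + alpha * lam_s * w
        w += dt * (gain - loss)
    return w
```

Evaluating `w1(r, alpha)` for r between 200 and 2000 and alpha in {0.25, 0.5, 0.75} covers the parameter range quoted in Section 2.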

Figure 2: Changes in $W_1$, the number of wrong (failed or outdated) $s_1$ pointers, due to joins, failures and stabilizations.

3.2 Successor Pointers

In order to get a master-equation description which keeps all the details of the system and is still tractable, we make the ansatz that the state of the system is the product of the states of its nodes, which in turn is the product of the states of all its pointers. As we will see this ansatz works very well. Now we need only consider how many kinds of pointers there are in the system and the states these can be in. Consider first the successor pointers.

Let $w_k(r, \alpha)$, $d_k(r, \alpha)$ denote the fraction of nodes having a wrong $k$th successor pointer or a failed one respectively, and $W_k(r, \alpha)$, $D_k(r, \alpha)$ be the respective numbers. A failed pointer is one which points to a departed node and a wrong pointer points either to an incorrect node (alive but not correct) or a dead one. As we will see, both these quantities play a role in predicting lookup consistency and lookup length.

By the protocol for stabilizing successors in Chord, a node periodically contacts its first successor, possibly correcting it and reconciling with its successor list. Therefore, the numbers of wrong $k$th successor pointers are not independent quantities but depend on the number of wrong first successor pointers. We consider only $s_1$ here.

We write an equation for $W_1(r, \alpha)$ by accounting for all the events that can change it in a micro event of time $\Delta t$. An illustration of the different cases in which changes in $W_1$ take place due to joins, failures and stabilizations is provided in figure 2. In some cases $W_1$ increases/decreases while in others it stays unchanged. For each increase/decrease, table 1 provides the corresponding probability.

Change in $W_1(r, \alpha)$            Rate of Change
$W_1(t + \Delta t) = W_1(t) + 1$      $c_1 = (\lambda_j \Delta t)(1 - w_1)$
$W_1(t + \Delta t) = W_1(t) + 1$      $c_2 = \lambda_f (1 - w_1)^2 \Delta t$
$W_1(t + \Delta t) = W_1(t) - 1$      $c_3 = \lambda_f w_1^2 \Delta t$
$W_1(t + \Delta t) = W_1(t) - 1$      $c_4 = \alpha \lambda_s w_1 \Delta t$
$W_1(t + \Delta t) = W_1(t)$          $1 - (c_1 + c_2 + c_3 + c_4)$

Table 1: Gain and loss terms for $W_1(r, \alpha)$: the number of wrong first successors as a function of $r$ and $\alpha$.

By our implementation of the join protocol, a new node $n_y$, joining between two nodes $n_x$ and $n_z$, has its $s_1$ pointer always correct after the join. However the state of $n_x.s_1$ before the join makes a difference. If $n_x.s_1$ was correct (pointing to $n_z$) before the join, then after the join it will be wrong and therefore $W_1$ increases by 1. If $n_x.s_1$ was wrong before the join, then it will remain wrong after the join and $W_1$ is unaffected. Thus, we need to account for the former case only. The probability that $n_x.s_1$ is correct is $1 - w_1$ and from that follows the term $c_1$.

For failures, we have 4 cases. To illustrate them we use nodes $n_x$, $n_y$, $n_z$ and assume that $n_y$ is going to fail. First, if both $n_x.s_1$ and $n_y.s_1$ were correct, then the failure of $n_y$ will make $n_x.s_1$ wrong and hence $W_1$ increases by 1. Second, if $n_x.s_1$ and $n_y.s_1$ were both wrong, then the failure of $n_y$ will decrease $W_1$ by one, since one wrong pointer disappears. Third, if $n_x.s_1$ was wrong and $n_y.s_1$ was correct, then $W_1$ is unaffected. Fourth, if $n_x.s_1$ was correct and $n_y.s_1$ was wrong, then the wrong pointer of $n_y$ disappeared and $n_x.s_1$ became wrong, therefore $W_1$ is unaffected. For the first case to happen, we need to pick two nodes with correct pointers; the probability of this is $(1 - w_1)^2$. For the second case to happen, we need to pick two nodes with wrong pointers; the probability of this is $w_1^2$. From these probabilities follow the terms $c_2$ and $c_3$.

Finally, a successor stabilization does not affect $W_1$, unless the stabilizing node had a wrong pointer. The probability of picking such a node is $w_1$. From this follows the term $c_4$.

Hence the equation for $W_1(r, \alpha)$ is:

$$\frac{dW_1}{dt} = \lambda_j (1 - w_1) + \lambda_f (1 - w_1)^2 - \lambda_f w_1^2 - \alpha \lambda_s w_1$$

Solving for $w_1$ in the steady state and putting $\lambda_j = \lambda_f$ (the quadratic terms cancel, leaving $2 - 3w_1 - r\alpha w_1 = 0$), we get:

$$w_1(r, \alpha) = \frac{2}{3 + r\alpha} \approx \frac{2}{r\alpha} \qquad (1)$$

This expression matches well with the simulation results as shown in figure 1. $d_1(r, \alpha)$ is then $\approx \frac{1}{2} w_1(r, \alpha)$ since when $\lambda_j = \lambda_f$, about half the number of wrong pointers are incorrect and about half point to dead nodes. Thus $d_1(r, \alpha) \approx \frac{1}{r\alpha}$, which also matches the simulations well, as shown in figure 1. We can also use the above reasoning to iteratively get $w_k(r, \alpha)$ for any $k$.

Lookup Consistency. By the lookup protocol, a lookup is inconsistent if the immediate predecessor of the sought key has a wrong $s_1$ pointer. However, we need only consider the case when the $s_1$ pointer is pointing to an alive (but incorrect) node since our implementation of the protocol always requires the lookup to return an alive node as an answer to the query. The probability that a lookup is inconsistent, $I(r, \alpha)$, is hence $w_1(r, \alpha) - d_1(r, \alpha)$. This prediction matches the simulation results very well, as shown in figure 1.

3.3 Failure of Fingers

We now turn to estimating the fraction of finger pointers which point to failed nodes. As we will see this is an important quantity for predicting lookups. Unlike members of the successor list, alive fingers, even if outdated, always bring a query closer to the destination and do not affect consistency. Therefore we consider fingers in only two states, alive or dead (failed).

Let $f_k(r, \alpha)$ denote the fraction of nodes having their $k$th finger pointing to a failed node and $F_k(r, \alpha)$ denote the respective number. For notational simplicity, we write these as simply $F_k$ and $f_k$. We can predict this function for any $k$ by again estimating the gain and loss terms for this quantity, caused by a join, failure or stabilization event, and keeping only the most relevant terms. These are listed in table 2.

Figure 4: Changes in $F_k$, the number of failed $fin_k$ pointers, due to joins, failures and stabilizations.

$F_k(t + \Delta t)$      Rate of Change
$= F_k(t) + 1$           $c_1 = (\lambda_j \Delta t)\, p_{join}(k)\, f_k$
$= F_k(t) - 1$           $c_2 = (1 - \alpha) \frac{1}{M} f_k (\lambda_s \Delta t)$
$= F_k(t) + 1$           $c_3 = (1 - f_k)^2 [1 - p_1(k)] (\lambda_f \Delta t)$
$= F_k(t) + 2$           $c_4 = (1 - f_k)^2 (p_1(k) - p_2(k)) (\lambda_f \Delta t)$
$= F_k(t) + 3$           $c_5 = (1 - f_k)^2 (p_2(k) - p_3(k)) (\lambda_f \Delta t)$
$= F_k(t)$               $1 - (c_1 + c_2 + c_3 + c_4 + c_5)$

Table 2: Some of the relevant gain and loss terms for $F_k$, the number of nodes whose $k$th fingers are pointing to a failed node, for $k > 1$.

A join event can play a role here by increasing the number of $F_k$ pointers if the successor of the joinee had a failed $k$th pointer (occurs with probability $f_k$) and the joinee replicated this from the successor (occurs with probability $p_{join}(k)$ from property 3.3).

A stabilization evicts a failed pointer if there was one to begin with. The stabilization rate is divided by $M$, since a node stabilizes any one finger randomly, every time it decides to stabilize a finger at rate $(1 - \alpha)\lambda_s$.

Given a node $n$ with an alive $k$th finger (occurs with probability $1 - f_k$), when the node pointed to by that finger fails, the number of failed $k$th fingers ($F_k$) increases. The amount of this increase depends on the number of immediate predecessors of $n$ that were pointing to the failed node with their $k$th finger. That number of predecessors could be 0, 1, 2, etc. Using property 3.2 the respective probabilities of those cases are: $1 - p_1(k)$, $p_1(k) - p_2(k)$, $p_2(k) - p_3(k)$, etc.

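The terms of Table 2 can be balanced against each other to obtain the steady-state value of $f_k$. Weighting each failure term by the size of its jump gives $(1 - p_1) + 2(p_1 - p_2) + 3(p_2 - p_3) + \cdots = 1 + p_1 + p_2 + \cdots$, so with $\lambda_j = \lambda_f$ the gains equal the losses when $p_{join}(k) f_k + (1 + \sum_i p_i(k))(1 - f_k)^2 = \frac{r(1-\alpha)}{M} f_k$, a quadratic in $f_k$. The following minimal Python sketch takes the smaller root of that quadratic; the function names are ours, and `p_join` and `p_rep` (the sum of the $p_i(k)$) are treated as given inputs rather than computed, so any values passed in are illustrative assumptions.

```python
import math

def f_k(r, alpha, M, p_join, p_rep):
    """Smaller root of (1 + p_rep) f^2 - B f + (1 + p_rep) = 0, where
    B = 2 p_rep + 2 - p_join + r (1 - alpha) / M.
    Requires r (1 - alpha) / M >= p_join so that the root is real."""
    A = 1.0 + p_rep
    B = 2.0 * p_rep + 2.0 - p_join + r * (1.0 - alpha) / M
    disc = B * B - 4.0 * A * A
    if disc < 0.0:
        raise ValueError("no real steady state for these parameters")
    return (B - math.sqrt(disc)) / (2.0 * A)

def balance_residual(f, r, alpha, M, p_join, p_rep):
    """Gain minus loss per failure event, built from the terms of Table 2;
    this is zero at the steady state."""
    gain = p_join * f + (1.0 + p_rep) * (1.0 - f) ** 2
    loss = r * (1.0 - alpha) / M * f
    return gain - loss
```

For example, `f_k(400, 0.5, 20, 1.0, 0.5)` evaluates the root for M = 20 fingers with the illustrative choices `p_join = 1.0` and `p_rep = 0.5`; the product of the two roots of the quadratic is 1, which is why the smaller one is the physically meaningful fraction.
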
Figure 3: Theory and Simulation for $f_k(r, \alpha)$, and $L(r, \alpha)$. Left panel: $f_k(r, 0.5)$ for $k = 7, 9, 11, 14$. Right panel: lookup latency (hops + timeouts) $L(r, 0.5)$. Both are plotted against the rate of stabilisation of fingers / rate of failure ($(1-\alpha)r = (1-\alpha)\lambda_s/\lambda_f$).

Solving for $f_k$ in the steady state, we get:

$$f_k = \frac{\left[2\tilde{P}_{rep}(k) + 2 - p_{join}(k) + \frac{r(1-\alpha)}{M}\right]}{2(1 + \tilde{P}_{rep}(k))} - \frac{\sqrt{\left[2\tilde{P}_{rep}(k) + 2 - p_{join}(k) + \frac{r(1-\alpha)}{M}\right]^2 - 4(1 + \tilde{P}_{rep}(k))^2}}{2(1 + \tilde{P}_{rep}(k))} \qquad (2)$$

where $\tilde{P}_{rep}(k) = \sum_i p_i(k)$. In principle it is enough to keep even three terms in the sum. The above expressions match very well with the simulation results (figure 3).

3.4 Cost of Finger Stabilizations and Lookups

In this section, we demonstrate how the information about the failed fingers and successors can be used to predict the cost of stabilizations, lookups or in general the cost for reaching any key in the id space. By cost we mean the number of hops needed to reach the destination including the number of timeouts encountered en route. For this analysis, we consider timeouts and hops to add equally to the cost. We can easily generalize this analysis to investigate the case when a timeout costs some factor $n$ times the cost of a hop.

Define $C_t(r, \alpha)$ (also denoted $C_t$) to be the expected cost for a given node to reach some target key which is $t$ keys away from it (which means reaching the first successor of this key). For example, $C_1$ would then be the cost of looking up the adjacent key (1 key away). Since the adjacent key is always stored at the first alive successor, if the first successor is alive (occurs with probability $1 - d_1$), the cost will be 1 hop. If the first successor is dead but the second is alive (occurs with probability $d_1(1 - d_2)$), the cost will be 1 hop + 1 timeout = 2 and the expected cost is $2 \times d_1(1 - d_2)$, and so forth. Therefore, we have $C_1 = 1 - d_1 + 2 \times d_1(1 - d_2) + 3 \times d_1 d_2 (1 - d_3) + \cdots \approx 1 + d_1 = 1 + 1/(\alpha r)$.

For finding the expected cost of reaching a general distance $t$ we need to follow closely the Chord protocol, which would look up $t$ by first finding the closest preceding finger. For notational simplicity, let us define $\xi$ to be the start of the finger (say the $k$th) that most closely precedes $t$. Thus $t = \xi + m$, i.e. there are $m$ keys between the sought target $t$ and the start of the most closely preceding finger. With that, we can write a recursion relation for $C_{\xi+m}$ as follows:

$$
\begin{aligned}
C_{\xi+m} ={}& C_\xi\,[1 - a(m)] \\
&+ (1 - f_k)\left[a(m) + \sum_{i=1}^{m} b_{m+1-i}\, C_i\right] \\
&+ f_k\, a(m)\left[1 + \sum_{i=1}^{k-1} h_k(i) \sum_{l=1}^{\xi/2^i} bc(l, \xi/2^i)\left(1 + C_{\xi_i + 1 - l + m}\right) + 2 h_k(k)\right]
\end{aligned}
\qquad (3)
$$

where $\xi_i \equiv \sum_{m=1,i} \xi/2^m$ and $h_k(i)$ is the probability that a node is forced to use its $(k-i)$th finger owing to the death of its $k$th finger. The probabilities $a$, $b$, $bc$ have already been introduced in section 3.

The lookup equation, though rather complicated at first sight, merely accounts for all the possibilities that a Chord lookup will encounter, and deals with them exactly as the protocol dictates. The first term accounts for the eventuality that there is no node intervening between $\xi$ and $\xi + m$ (occurs with probability $1 - a(m)$). In this case, the cost of looking for $\xi + m$ is the same as the cost for looking for $\xi$. The second term accounts for the situation when a node does intervene in between (with probability $a(m)$), and this node is alive (with probability $1 - f_k$). Then the query is passed on to this node (with 1 added to register the increase in the number of hops) and then the cost depends on the length of the distance between this node and $t$. The third term accounts for the case when the intervening node is dead (with probability $f_k$). Then the cost increases by 1 (for a timeout) and the query needs to be passed back to the closest preceding finger. We hence compute the probability $h_k(i)$ that it is passed back to the $(k-i)$th finger either because the intervening fingers are dead or share the same finger table entry as the $k$th finger. The cost of the lookup now depends on the remaining distance to the sought key. The expression for $h_k(i)$ is easy to compute using theorem 3.1 and the expression for the $f_k$’s [4].

The cost for general lookups is hence

$$L(r, \alpha) = \frac{\sum_{i=1}^{K-1} C_i(r, \alpha)}{K}$$

The lookup equation is solved recursively, given the coefficients and $C_1$. We plot the result in Fig 3. The theoretical result matches the simulation very well.

4 Discussion and Conclusion

We now discuss a broader issue, connected with churn, which arises naturally in the context of our analysis. As we mentioned earlier, all our analysis is performed in the steady state where the rate of joins is the same as the rate of departures. However this rate itself can be chosen in different ways. While we expect the mean behaviour to be the same in all these cases, the fluctuations are very different with consequent implications for the functioning of DHTs. The case where fluctuations play the least role is when the join rate is “per-network” (the number of joinees does not depend on the current number of nodes in the network) and the failure rate is “per-node” (the number of failures does depend on the current number of occupied nodes). In this case, the steady state condition is $\lambda_j/N = \lambda_f$, guaranteeing that $N$ cannot deviate too much from the steady state value. In the two other cases where the join and failure rate are both per-network or (as in the case considered in this paper) both per-node, there is no such “repair” mechanism, and a large fluctuation can (and will) drive the number of nodes to extinction, causing the DHT to die. In the former case, the time-to-die scales with the number of nodes as $\sim N^3$ while in the latter case it scales as $\sim N^2$ [4]. Which of these ’types’ of churn is the most relevant? We imagine that this depends on the application and it is hence probably of importance to study all of them in detail.

To summarize, in this paper, we have presented a detailed theoretical analysis of a DHT-based P2P system, Chord, using a Master-equation formalism. This analysis differs from existing theoretical work done on DHTs in that it aims not at establishing bounds, but at precise determination of the relevant quantities in this dynamically evolving system. From the match of our theory and the simulations, it can be seen that we can predict with an accuracy better than 1% in most cases.

Apart from the usefulness of this approach for its own sake, we can also gain some new insights into the system from it. For example, we see that the fraction of dead finger pointers $f_k$ is an increasing function of the length of the finger. In fact for large enough $K$, all the long fingers will be dead most of the time, making routing very inefficient. This implies that we need to consider a different stabilization scheme for the fingers (such as, perhaps, stabilizing the longer fingers more often than the smaller ones), in order that the DHT continues to function at high churn rates. We also expect that we can use this analysis to understand and analyze other DHTs.

References

[1] Karl Aberer, Anwitaman Datta, and Manfred Hauswirth, Efficient, self-contained handling of identity in peer-to-peer systems, IEEE Transactions on Knowledge and Data Engineering 16 (2004), no. 7, 858–869.
[2] James Aspnes, Zoë Diamadi, and Gauri Shah, Fault-tolerant routing in peer-to-peer systems, Proceedings of the twenty-first annual symposium on Principles of distributed computing, ACM Press, 2002, pp. 223–232.
[3] Miguel Castro, Manuel Costa, and Antony Rowstron, Performance and dependability of structured peer-to-peer overlays, Proceedings of the 2004 International Conference on Dependable Systems and Networks (DSN’04), IEEE Computer Society, 2004.
[4] Sameh El-Ansary, Supriya Krishnamurthy, Erik Aurell, and Seif Haridi, An analytical study of consistency and performance of DHTs under churn (draft), Tech. Report TR-2004-12, Swedish Institute of Computer Science, October 2004, http://www.sics.se/~sameh/pubs/TR2004_12.
[5] Jinyang Li, Jeremy Stribling, Thomer M. Gil, Robert Morris, and Frans Kaashoek, Comparing the performance of distributed hash tables under churn, The 3rd International Workshop on Peer-to-Peer Systems (IPTPS’02) (San Diego, CA), Feb 2004.
[6] David Liben-Nowell, Hari Balakrishnan, and David Karger, Analysis of the evolution of peer-to-peer systems, ACM Conf. on Principles of Distributed Computing (PODC) (Monterey, CA), July 2002.
[7] N.G. van Kampen, Stochastic Processes in Physics and Chemistry, North-Holland Publishing Company, 1981, ISBN-0-444-86200-5.
[8] Sean Rhea, Dennis Geels, Timothy Roscoe, and John Kubiatowicz, Handling churn in a DHT, Proceedings of the 2004 USENIX Annual Technical Conference (USENIX ’04) (Boston, Massachusetts, USA), June 2004.
[9] Ion Stoica, Robert Morris, David Liben-Nowell, David Karger, M. Frans Kaashoek, Frank Dabek, and Hari Balakrishnan, Chord: A scalable peer-to-peer lookup service for internet applications, IEEE Transactions on Networking 11 (2003).
[10] Shengquan Wang, Dong Xuan, and Wei Zhao, On resilience of structured peer-to-peer systems, GLOBECOM 2003 - IEEE Global Telecommunications Conference, Dec 2003, pp. 3851–3856.
