A Statistical Theory of Chord Under Churn: Abstract
Abstract. Most earlier studies of DHTs under churn have either depended on simulations as the primary investigation tool, or on establishing bounds for DHTs to function. In this paper, we present a complete analytical study of churn using a master-equation-based approach, used traditionally in non-equilibrium statistical mechanics to describe steady-state or transient phenomena. Simulations are used to verify all theoretical predictions. We demonstrate the application of our methodology to the Chord system. For any rate of churn and stabilization rates, and any system size, we accurately predict the fraction of failed or incorrect successor and finger pointers and show how we can use these quantities to predict the performance and consistency of lookups under churn. We also discuss briefly how churn may actually be of different 'types' and the implications this will have for the functioning of DHTs in general.

∗ This work is funded by the Swedish VINNOVA AMRAM and PPC projects, the European IST-FET PEPITO and 6th FP EVERGROW projects.

1 Introduction

Theoretical studies of asymptotic performance bounds of DHTs under churn have been conducted in works like [6, 2]. However, within these bounds, performance can vary substantially as a function of different design decisions and configuration parameters. Hence simulation-based studies such as [5, 8, 3] often provide more realistic insights into the performance of DHTs. Relying on an understanding based on simulations alone is however not satisfactory either, since in this case the DHT is treated as a black box and is only empirically evaluated, under certain operating conditions. In this paper we present an alternative theoretical approach to analyzing and understanding DHTs, which aims for an accurate prediction of performance rather than asymptotic performance bounds. Simulations are then used to verify all theoretical predictions.

Our approach is based on constructing and working with master equations, a widely used tool wherever the mathematical theory of stochastic processes is applied to real-world phenomena [7]. We demonstrate the applicability of this approach to one specific DHT: Chord [9]. For Chord, it is natural to define the state of the system as the state of all its nodes, where the state of an alive node is specified by the states of all its pointers. These pointers (either fingers or successors) are then in one of three states: alive and correct, alive and incorrect, or failed. A master equation for this system is simply an equation for the time evolution of the probability that the system is in a particular state. Writing such an equation involves keeping track of all the gain/loss terms which add to or detract from this probability, given the details of the dynamics. This approach is applicable to any P2P system (or indeed any system with a discrete set of states).

Our main result is that, for every outgoing pointer of a Chord node, we systematically compute the probability that it is in any one of the three possible states, by computing all the gain and loss terms that arise from the details of the Chord protocol under churn. This probability is different for each of the successor and finger pointers. We then use this information to predict both lookup consistency (number of failed lookups) and lookup performance (latency) as a function of the parameters involved. All our results are verified by simulations.

The main novelty of our analysis is that it is carried out entirely from first principles, i.e., all quantities are predicted solely as a function of the parameters of the problem: the churn rate, the stabilization rate and the number of nodes in the system. It thus differs from earlier related theoretical studies, where quantities similar to those we predict were either assumed to be given [10] or measured numerically [1].

Closest in spirit to our work is the informal derivation in the original Chord paper [9] of the average number of timeouts encountered by a lookup. This quantity was approximated there by the product of the average number of fingers used in a lookup and the probability that a given finger points to a departed node. Our methodology not only allows us to derive the latter quantity rigorously but also demonstrates how this probability depends on which finger (or successor) is involved. Further, we are able to derive an exact relation linking this probability to lookup performance and consistency, accurate at any value of the system parameters.

2 Assumptions & Definitions

Basic Notation. In what follows, we assume that the reader is familiar with Chord. However, we introduce the notation used below. We use $K$ to mean the size of the Chord key space and $N$ the number of nodes. Let $M = \log_2 K$ be the number of fingers of a node and $S$ the length of the immediate successor list, usually set to a value of $O(\log N)$. We refer to nodes by their
keys, so a node $n$ implies a node with key $n \in 0 \cdots K-1$. We use $p$ to refer to the predecessor, $s$ to refer to the successor list as a whole, and $s_i$ for the $i$th successor. Data structures of different nodes are distinguished by prefixing them with a node key, e.g. $n'.s_1$. Let $fin_i.start$ denote the start of the $i$th finger (where for a node $n$, $\forall i \in 1..M$, $n.fin_i.start = n + 2^{i-1}$) and $fin_i.node$ denote the actual node pointed to by that finger.

Steady State Assumption. $\lambda_j$ is the rate of joins per node, $\lambda_f$ the rate of failures per node and $\lambda_s$ the rate of stabilizations per node. We carry out our analysis for the general case in which the rate of successor stabilizations, $\alpha\lambda_s$, is not necessarily the same as the rate at which finger stabilizations, $(1-\alpha)\lambda_s$, are performed. In all that follows, we impose the steady state condition $\lambda_j = \lambda_f$. Further, it is useful to define $r \equiv \lambda_s/\lambda_f$, which is the relevant ratio on which all the quantities we are interested in will depend. For example, $r = 50$ means that a join/fail event takes place every half an hour for a stabilization which takes place once every 36 seconds (1800 s / 36 s = 50).

Parameters. The parameters of the problem are hence: $K$, $N$, $\alpha$ and $r$. All relevant measurable quantities should be entirely expressible in terms of these parameters.

Chord Simulation. We use our own discrete event simulation environment implemented in Java, which can be retrieved from [4]. We assume that the reader is familiar with Chord; however, an exact analysis requires a few details. Successor stabilizations performed by a node $n$ on $n.s_1$ accomplish two main goals: i) retrieving the predecessor and successor list of $n.s_1$ and reconciling them with $n$'s state; ii) informing $n.s_1$ that $n$ is alive/newly joined. A finger stabilization picks one finger at random and looks up its start. Lookups do not use the optimization of checking the successor list before using the fingers. However, the successor list is used as a last resort if the fingers cannot provide progress. Lookups are assumed not to change the state of a node. For joins, a new node $u$ finds its successor $v$ through some initial random contact and performs successor stabilization on that successor. All fingers of $u$ that have $v$ as an acceptable finger node are set to $v$. The rest of the fingers are computed as best estimates from $v$'s routing table. All failures are ungraceful. We make the simplifying assumption that communication delays due to a limited number of hops are much smaller than the average time interval between joins, failures or stabilization events. However, we do not expect that the results will change much even if this were not satisfied.

Averaging. Since we are collecting statistics like the probability of a particular finger pointer being wrong, we need to repeat each experiment 100 times before obtaining well-averaged results. The total sequential real time of the simulations for obtaining the results of this paper was about 1800 hours, parallelized on a cluster of 14 nodes, with $N = 1000$, $K = 2^{20}$, $S = 6$, $200 \le r \le 2000$ and $0.25 \le \alpha \le 0.75$.

3 The Analysis

3.1 Distribution of Inter-Node Distances

During churn, the inter-node distance (the difference between the keys of two consecutive nodes) is a fluctuating variable. An important quantity used throughout the analysis is the pdf of inter-node distances. We define this quantity below and state a theorem giving its functional form. We then mention three properties of this distribution which are needed in the ensuing analysis. Due to space limitations, we omit the proof of this theorem and the properties here and provide them in [4].

Definition 3.1 Let $Int(x)$ be the number of intervals of length $x$, i.e. the number of pairs of consecutive nodes which are separated by a distance of $x$ keys on the ring.

Theorem 3.1 For a process in which nodes join or leave with equal rates (and the number of nodes in the network is almost constant) independently of each other and uniformly on the ring, the probability $P(x) \equiv Int(x)/N$ of finding an interval of length $x$ is:

$$P(x) = \rho^{x-1}(1-\rho), \quad \text{where } \rho = \frac{K-N}{K} \text{ and } 1-\rho = \frac{N}{K}.$$

The derivation of the distribution $P(x)$ is independent of any details of the Chord implementation and depends solely on the join and leave process. It is hence applicable to any DHT that deploys a ring.

Property 3.1 For any two keys $u$ and $v$, where $v = u + x$, let $b_i$ be the probability that the first node encountered between these two keys is at $u + i$ (where $0 \le i < x - 1$). Then $b_i \equiv \rho^i(1-\rho)$. The probability that there is at least one node between $u$ and $v$ is $a(x) \equiv 1 - \rho^x$. Hence the conditional probability that the first node is at a distance $i$, given that there is at least one node in the interval, is $bc(i, x) \equiv b_i/a(x)$.

Property 3.2 The probability that a node and at least one of its immediate predecessors share the same $k$th finger is $p_1(k) \equiv \frac{\rho}{1+\rho}\left(1 - \rho^{2^{k}-2}\right)$. This is $\sim 1/2$ for $K \gg 1$ and $N \ll K$. Clearly $p_1 = 0$ for $k = 1$. It is straightforward (though tedious) to derive similar expressions for $p_2(k)$, the probability that a node and at least two of its immediate predecessors share the same $k$th finger, $p_3(k)$, and so on.

Property 3.3 We can similarly assess the probability that the join protocol (see previous section) results in further replication of the $k$th pointer. That is, the probability that a newly joined node will choose the $k$th entry of its successor's finger table as its own $k$th entry is

$$p_{join}(k) \sim \rho\left(1 - \rho^{2^{k-2}-2}\right) + (1-\rho)\left(1 - \rho^{2^{k-2}-2}\right) - (1-\rho)\,\rho\,(2^{k-2}-2)\,\rho^{2^{k-2}-3}.$$

The function $p_{join}(k) = 0$ for small $k$ and $1$ for large $k$.
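As a rough numerical illustration of Theorem 3.1 and Property 3.1, the following Python sketch compares the geometric form of P(x) with the gap statistics of a ring. It is not part of our Java simulator: the function names are hypothetical, the demo values of N and K are chosen only to keep the statistics readable, and, as a simplifying assumption, the nodes are placed uniformly at random on a static ring instead of being driven through the join/leave process.

```python
import random
from collections import Counter


def interval_pdf_theory(x, n_nodes, key_space):
    """P(x) = rho**(x-1) * (1 - rho), with rho = (K - N) / K (Theorem 3.1)."""
    rho = (key_space - n_nodes) / key_space
    return rho ** (x - 1) * (1 - rho)


def first_node_given_nonempty(i, x, n_nodes, key_space):
    """bc(i, x) = b_i / a(x) from Property 3.1."""
    rho = (key_space - n_nodes) / key_space
    b_i = rho ** i * (1 - rho)   # first node sits at offset i
    a_x = 1 - rho ** x           # at least one node within x keys
    return b_i / a_x


def empirical_interval_pdf(n_nodes, key_space, rng):
    """Fraction Int(x)/N of consecutive-node gaps of length x.

    Assumption: a static ring with n_nodes distinct keys drawn uniformly
    at random, used here as a stand-in for the steady state under churn.
    """
    keys = sorted(rng.sample(range(key_space), n_nodes))
    gaps = [
        (keys[(j + 1) % n_nodes] - keys[j]) % key_space
        for j in range(n_nodes)
    ]
    counts = Counter(gaps)
    return {x: c / n_nodes for x, c in counts.items()}


if __name__ == "__main__":
    # Illustrative values only (not the N = 1000, K = 2**20 of the paper).
    n_nodes, key_space = 2 ** 14, 2 ** 16
    rng = random.Random(0)
    measured = empirical_interval_pdf(n_nodes, key_space, rng)
    for x in range(1, 6):
        predicted = interval_pdf_theory(x, n_nodes, key_space)
        print(x, round(measured.get(x, 0.0), 4), round(predicted, 4))
```

With a dense ring such as the one in the demo, the measured fractions should lie close to the predicted values for small x. For the sparse setting used in our experiments the individual probabilities are of order N/K, so many independent rings would have to be averaged before the comparison becomes meaningful.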
Figure 1: Theory and Simulation for $w_1(r, \alpha)$, $d_1(r, \alpha)$, $I(r, \alpha)$. Left panel: $w_1(r,\alpha)$, $d_1(r,\alpha)$ against the rate of stabilisation / rate of failure ($r = \lambda_s/\lambda_f$). Right panel: $I(r,\alpha)$ against the rate of stabilisation of successors / rate of failure ($\alpha r = \alpha\lambda_s/\lambda_f$). Each panel shows simulation and theory curves for $\alpha = 0.25$, $0.5$ and $0.75$.

Figure 3: Theory and Simulation for $f_k(r, \alpha)$, and $L(r, \alpha)$. Both panels are plotted against the rate of stabilisation of fingers / rate of failure ($(1-\alpha)r = (1-\alpha)\lambda_s/\lambda_f$).