
Cost Based Analysis of Hierarchical p2p

This document analyzes the costs of hierarchical distributed hash table (DHT) designs compared to flat DHT designs. It presents a cost model for a specific hierarchical DHT system with superpeers and leafnodes. The analysis shows that the total costs of running the network are minimized for some non-zero ratio of leafnodes to superpeers, indicating hierarchical designs can be optimal in certain situations over flat designs.

Uploaded by

youcef moualkia
Cost-Based Analysis of Hierarchical DHT Design

Stefan Zoels¹, Zoran Despotovic², Wolfgang Kellerer²

¹ Institute of Communication Networks, Munich University of Technology, Germany
² DoCoMo Communications Laboratories Europe, Munich, Germany
[email protected], {despotovic, kellerer}@docomolab-euro.com

Abstract

Flat DHT architectures have been the main focus of the research on DHT design so far. However, there have also been a number of works proposing hierarchical DHT organizations and pointing out their advantages. They mostly rely on the intuitive understanding that hierarchy is desirable in any complex system. In this paper we formalize this intuition within a general cost-based framework. We provide a cost model of a specific hierarchical DHT organization composed of superpeers and leafnodes, and show that the costs of running the network are not necessarily minimized for a flat DHT organization, thus providing a formal motivation for hierarchical DHTs. We further hint at what distributed algorithms can be applied in practice to reach an optimal operating point of the network.

1. Introduction

Most current research on Distributed Hash Table (DHT) design concentrates on so-called flat DHT designs, where all participating peers are considered equal in functionality. Chord [1], Pastry [2], Kademlia [3] or P-Grid [4] are some examples of flat DHT designs. Recently, however, there have been a number of works presenting advantages of hierarchical DHT designs. [5] points out better fault isolation, more effective bandwidth utilization, and better adaptation to the underlying physical network as the main reasons for choosing hierarchical systems. [6] demonstrates that hierarchical systems offer a reduction of the lookup path length. Further, in our earlier paper [7] we have shown the benefits of a hierarchical DHT design for mobile environments, characterized by high churn rates, high failure probabilities and resource-constrained mobile peers.

In this paper we take a more general view by considering the Peer-to-Peer (P2P) network as a whole having its own costs of running. Based on the cost model of [8], we calculate the costs in one specific type of hierarchical system in which so-called superpeers run a DHT protocol (Chord in this case) and at the same time serve as proxies for so-called leafnodes. The flat DHT design can be viewed as a special case where the fraction of leafnodes equals zero.

Our analysis shows that, depending on the properties of the constituent peers, costs can be minimized for a non-zero number of leafnodes. In such cases flat designs are not an optimal DHT design solution. Instead, there are one or more optimal points corresponding to true hierarchical designs with non-zero numbers of leafnodes and superpeers.

Our motivation to consider this type of hierarchical system was driven by our wish to provide a P2P solution for highly heterogeneous environments, i.e. with large variations among the performances of the involved peers. However, we emphasize that the analysis below holds for homogeneous settings too, where peers have similar capabilities.

Note also that we do not claim that the specific hierarchical design we consider in this paper is optimal across all possible hierarchical P2P systems. The purpose of the paper and our cost-based analysis is rather to formally investigate whether and when hierarchical designs are better than flat designs. In this way we want to provide strong justifications as well as an analytical framework for investigating hierarchical P2P systems in general, and determining optimal values of the involved parameters in particular. Generalizations of the work presented in this paper targeting any general hierarchical system are an important item of our future work.

The rest of the paper is structured as follows. Section 2 introduces the hierarchical system design that underlies the analysis in this work. In Section 3 we derive expressions for all costs generated in the analyzed overlay network and show that total network costs increase with decentralization. Section 4 evaluates the costs of centralization, provides a methodology for determining optimal superpeer ratios and gives an illustrative example. In Section 5 we describe how these optima can be discovered and maintained in a distributed fashion. Section 6 concludes and gives an outlook for future work.

2. System Architecture

The system architecture we analyze in this paper defines two different classes of peers: superpeers and leafnodes. Every superpeer acts as a proxy for its leafnodes. The leafnodes communicate, apart from uploads and downloads, only with their superpeer. In contrast, superpeers establish a structured DHT-based overlay in the form of a Chord ring.¹ Figure 1 depicts the architecture of the analyzed overlay network.

¹ Although we use Chord here as DHT between the superpeers, any other structured P2P protocol could also be used. We believe this would result in only minor changes for the analysis in Section 3.

[Figure 1: Hierarchical P2P overlay network with superpeers (SP) forming a Chord ring and leafnodes (LN) attached to them]

Our wish to provide a P2P solution for heterogeneous environments led us to select this form of hierarchical system. Heterogeneous environments are normally characterized by large differences among the performances of the involved peers, e.g. highly mobile unreliable peers versus static and reliable ones. Intuitively, the above architecture would fit these settings better than a flat DHT design if reliable high performance peers take the role of superpeers while low performance peers operate as leafnodes. But, as we will see below, it does make sense to propose the above hierarchical architecture even in homogeneous settings, where the differences among peers are negligible.

Tasks Performed by Peers. The tasks performed by the two groups of peers can be briefly described as follows.

Leafnodes maintain only an overlay connection to their superpeer. To be able to recognize and react to a superpeer failure, they periodically run a simple PING/PONG algorithm. Moreover, they store a list containing other available superpeers in the system, in order to be able to rejoin the overlay network after a superpeer failure.

In contrast, superpeers perform multiple other tasks. We assume that on joining the network a leafnode transfers its list of pointers to the objects it shares to its corresponding superpeer. The superpeer then inserts these references into the overlay network and acts as their owner. When a leafnode performs a lookup, e.g. to insert or query for an object, the superpeer it is connected to resolves the lookup by using the search functionality of the Chord overlay, determines the responsible superpeer (based on the object's key), and forwards the result to the leafnode. Additionally, since superpeers establish a conventional Chord ring, they periodically run Chord's STABILIZE and FIXFINGER algorithms for overlay maintenance. We analyze here the original STABILIZE algorithm that is proposed in [1] and do not consider extensions of [9]. We use a slight modification of the FIXFINGER algorithm that we will explain in Section 3.2. Finally, the superpeers periodically refresh all references they maintain in order to keep them up-to-date.

3. Cost-Based Analysis of the Superpeer Ratio

An important design parameter for the architecture proposed in Section 2 is the ratio between superpeers and the total number of peers; we denote it $\alpha$ and call it simply the superpeer ratio. In this section, we analyze how the total network traffic costs depend on this ratio. To start, we make a number of definitions, introduce our notation and explain our assumptions:
– Number of peers: $N$
– Number of superpeers: $N_{SP} = \alpha \cdot N$
– Number of leafnodes: $N_{LN} = (1-\alpha) \cdot N$
– Costs for peer $k$ for sending a message: $c^s_k$
– Costs for peer $k$ for receiving a message: $c^r_k$
– Lookup rate of peer $k$: $r_{LKP,k} = 1 / T_{LKP,k}$
– Number of shared objects provided by peer $k$: $f_k$

We assume that every message sent or received by peer $k$ creates the same costs $c^s_k$ or $c^r_k$, respectively, regardless of the message size. We further assume that the system is in a steady state, i.e. no churn occurs, that every superpeer knows the shared objects of its leafnodes at any time, and that the superpeers' Chord ring is fully populated.
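The notation of Section 3 can be captured in a small data model. The following is a minimal Python sketch, not part of the original analysis: the names `Peer` and `split_population` are hypothetical, and rounding $\alpha \cdot N$ to the nearest integer is an assumption made only so that the peer counts are whole numbers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Peer:
    """One peer k, described by the quantities from the notation list."""
    c_send: float         # c^s_k: cost of sending one message
    c_recv: float         # c^r_k: cost of receiving one message
    lookup_period: float  # T_LKP,k in seconds
    shared_objects: int   # f_k

    @property
    def lookup_rate(self) -> float:
        # r_LKP,k = 1 / T_LKP,k
        return 1.0 / self.lookup_period


def split_population(n_peers: int, alpha: float) -> tuple[int, int]:
    """Return (N_SP, N_LN) for superpeer ratio alpha.

    Rounding alpha * N to the nearest integer (and keeping at least one
    superpeer) is an assumption of this sketch.
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must lie in (0, 1]")
    n_sp = max(1, round(alpha * n_peers))
    return n_sp, n_peers - n_sp
```

For instance, `split_population(100, 0.1)` describes a network of 100 peers in which 10 act as superpeers and 90 as leafnodes.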
We begin our analysis with the total traffic costs for a hierarchical overlay network with the above parameters. The total traffic costs $C$ consist of costs for lookup traffic (see Section 3.1) and of costs for maintenance traffic (see Section 3.2):

$$C = C_{LKP} + C_{MAINT}$$

3.1. Lookup Traffic Costs

When regarding lookups, we differentiate between lookup costs for superpeers and lookup costs for leafnodes. The total costs for lookup traffic $C_{LKP}$ then consist of lookup costs for all leafnodes $C_{LKP,LN}$ and of lookup costs for all superpeers $C_{LKP,SP}$:

$$C_{LKP} = C_{LKP,LN} + C_{LKP,SP}$$

Lookup Costs for Leafnodes. When a leafnode performs a lookup, it sends a QUERY message to its superpeer. The superpeer resolves the lookup in the superpeers' Chord overlay and returns the result to the leafnode. Therefore, the number of sent and received messages $m^s$ and $m^r$ for any leafnode $j$ is given by

$$m^s_{LN\,j} = m^r_{LN\,j} = r_{LKP,LN\,j}$$

To obtain the lookup costs for leafnode $j$ we multiply the number of messages with the respective costs:

$$C_{LKP,LN\,j} = c^s_{LN\,j} \cdot m^s_{LN\,j} + c^r_{LN\,j} \cdot m^r_{LN\,j} = \left(c^s_{LN\,j} + c^r_{LN\,j}\right) \cdot r_{LKP,LN\,j}$$

The total lookup costs for leafnodes are thus given by

$$C_{LKP,LN} = \sum_{j=1}^{N_{LN}} C_{LKP,LN\,j} = \sum_{j=1}^{N_{LN}} \left(c^s_{LN\,j} + c^r_{LN\,j}\right) \cdot r_{LKP,LN\,j}$$

Lookup Costs for Superpeers. To calculate lookup costs for superpeers, we separate costs generated by superpeer lookups and costs generated by leafnode lookups. Lookups performed by any superpeer $i$ generate a mean amount of

$$m_{SP\,i} = r_{LKP,SP\,i} \cdot \log_2 N_{SP} = r_{LKP,SP\,i} \cdot \log_2 \alpha N$$

messages in the superpeers' Chord ring. (Note that the average number of hops to resolve a lookup in a fully populated ring is $\tfrac{1}{2} \log_2 N_{SP}$, and that two messages are needed per hop.) Lookups performed by any leafnode $j$ generate a mean amount of

$$m_{LN\,j} = r_{LKP,LN\,j} \cdot \left[\log_2 N_{SP} + 1\right] = r_{LKP,LN\,j} \cdot \left[\log_2 \alpha N + 1\right]$$

messages sent and received by superpeers, because the leafnode's superpeer receives the lookup request (one additional message received) and, after resolving the lookup in the superpeers' Chord overlay, sends the result back to the leafnode (one additional message sent). Thus, the total number $M$ of messages that are sent and received by all superpeers due to lookups accumulates to

$$
\begin{aligned}
M &= \sum_{i=1}^{N_{SP}} m_{SP\,i} + \sum_{j=1}^{N_{LN}} m_{LN\,j} \\
  &= \sum_{i=1}^{N_{SP}} r_{LKP,SP\,i} \cdot \log_2 \alpha N + \sum_{j=1}^{N_{LN}} r_{LKP,LN\,j} \cdot \left[\log_2 \alpha N + 1\right] \\
  &= \bar{r}_{LKP,SP} \cdot N_{SP} \cdot \log_2 \alpha N + \bar{r}_{LKP,LN} \cdot N_{LN} \cdot \left[\log_2 \alpha N + 1\right] \\
  &= N \cdot \left[\, \bar{r}_{LKP,SP} \cdot \alpha \cdot \log_2 \alpha N + \bar{r}_{LKP,LN} \cdot (1-\alpha) \cdot \left[\log_2 \alpha N + 1\right] \right]
\end{aligned}
$$

with $\bar{r}_{LKP,SP}$ and $\bar{r}_{LKP,LN}$ being the mean lookup rates of all superpeers and leafnodes, respectively.

We assume that every superpeer manages an equal number of leafnodes² and that the ID space is fully populated. Hence every superpeer processes an equal proportion of $M$, and the number of sent and received messages $m^s$ and $m^r$ for any superpeer $i$ is given by

$$m^s_{SP\,i} = m^r_{SP\,i} = \frac{M}{N_{SP}} = \frac{M}{\alpha \cdot N} = \bar{r}_{LKP,SP} \cdot \log_2 \alpha N + \bar{r}_{LKP,LN} \cdot \frac{1-\alpha}{\alpha} \cdot \left[\log_2 \alpha N + 1\right]$$

² We believe this can be achieved by using an appropriate load balancing algorithm.

To obtain the lookup costs for superpeer $i$ we multiply the number of messages with the respective costs:

$$
\begin{aligned}
C_{LKP,SP\,i} &= c^s_{SP\,i} \cdot m^s_{SP\,i} + c^r_{SP\,i} \cdot m^r_{SP\,i} \\
 &= \left(c^s_{SP\,i} + c^r_{SP\,i}\right) \cdot \left[\, \bar{r}_{LKP,SP} \cdot \log_2 \alpha N + \bar{r}_{LKP,LN} \cdot \frac{1-\alpha}{\alpha} \cdot \left[\log_2 \alpha N + 1\right] \right]
\end{aligned}
$$

The total lookup costs for superpeers are thus given by

$$C_{LKP,SP} = \sum_{i=1}^{N_{SP}} C_{LKP,SP\,i} = \left[\, \bar{r}_{LKP,SP} \cdot \log_2 \alpha N + \bar{r}_{LKP,LN} \cdot \frac{1-\alpha}{\alpha} \cdot \left[\log_2 \alpha N + 1\right] \right] \cdot \sum_{i=1}^{N_{SP}} \left(c^s_{SP\,i} + c^r_{SP\,i}\right)$$

By summing up the lookup costs for leafnodes and the lookup costs for superpeers, we get the total costs for lookup traffic $C_{LKP}$.

3.2. Maintenance Traffic Costs

Maintenance traffic is generated by the PING/PONG algorithm between leafnodes and their superpeer and by the STABILIZE, FIXFINGER and REPUBLISH algorithms in the superpeers' Chord overlay:

$$C_{MAINT} = C_{PING} + C_{STAB} + C_{FIX} + C_{REP}$$

PING Costs. Every leafnode runs the PING/PONG algorithm periodically every $T_{PING}$ seconds. It sends a PING message to its superpeer, and the superpeer answers with a PONG message. Therefore, the number of sent and received PING/PONG messages $m^s$ and $m^r$ for any leafnode $j$ is given by

$$m^s_{LN\,j} = m^r_{LN\,j} = \frac{1}{T_{PING}}$$

To obtain the PING costs for leafnode $j$ we multiply the number of messages with the respective costs:

$$C_{PING,LN\,j} = \frac{c^s_{LN\,j} + c^r_{LN\,j}}{T_{PING}}$$

As in the previous section, we assume an appropriate load balancing algorithm that spreads all leafnodes uniformly over all superpeers. Thus, the number of sent and received PING/PONG messages $m^s$ and $m^r$ for any superpeer $i$ is given by

$$m^s_{SP\,i} = m^r_{SP\,i} = \frac{1}{T_{PING}} \cdot \frac{N_{LN}}{N_{SP}} = \frac{1}{T_{PING}} \cdot \frac{1-\alpha}{\alpha}$$

To obtain the PING costs for superpeer $i$ we multiply the number of messages with the respective costs:

$$C_{PING,SP\,i} = \left(c^s_{SP\,i} + c^r_{SP\,i}\right) \cdot \frac{1}{T_{PING}} \cdot \frac{1-\alpha}{\alpha}$$

The total PING costs for the overlay network are thus given by

$$C_{PING} = \sum_{j=1}^{N_{LN}} C_{PING,LN\,j} + \sum_{i=1}^{N_{SP}} C_{PING,SP\,i} = \frac{1}{T_{PING}} \cdot \left[\, \sum_{j=1}^{N_{LN}} \left(c^s_{LN\,j} + c^r_{LN\,j}\right) + \frac{1-\alpha}{\alpha} \cdot \sum_{i=1}^{N_{SP}} \left(c^s_{SP\,i} + c^r_{SP\,i}\right) \right]$$

STABILIZE Costs. Every superpeer runs the STABILIZE algorithm periodically every $T_{STAB}$ seconds. STABILIZE requires three messages: the initiating superpeer sends a REQUESTPREDECESSOR message to its successor, the successor responds with a RESPONSEPREDECESSOR message, and finally the initiating peer sends a NOTIFY message. Therefore, the number of sent and received STABILIZE messages $m^s$ and $m^r$ for any superpeer $i$ is given by

$$m^s_{SP\,i} = m^r_{SP\,i} = \frac{3}{T_{STAB}}$$

To obtain the STABILIZE costs for superpeer $i$ we multiply the number of messages with the respective costs:

$$C_{STAB,SP\,i} = \left(c^s_{SP\,i} + c^r_{SP\,i}\right) \cdot \frac{3}{T_{STAB}}$$

The total STABILIZE costs for the overlay network are thus given by

$$C_{STAB} = \sum_{i=1}^{N_{SP}} C_{STAB,SP\,i} = \frac{3}{T_{STAB}} \cdot \sum_{i=1}^{N_{SP}} \left(c^s_{SP\,i} + c^r_{SP\,i}\right)$$

FIXFINGER Costs. Every superpeer runs the FIXFINGER algorithm periodically every $T_{FIX}$ seconds for each of its $\log_2 \alpha N$ fingers (assuming a fully populated ID space). Fixing a finger usually corresponds to a Chord lookup of the finger's ID. However, we use here an improved FIXFINGER algorithm that sends a PING message to a finger peer, and initiates a finger lookup only when no PONG message is received or when the finger peer indicates a new peer being responsible for this finger ID. As a result, finger lookups can be avoided when the system is in a steady state, and the number of sent and received FIXFINGER messages $m^s$ and $m^r$ for any superpeer $i$ is given by

$$m^s_{SP\,i} = m^r_{SP\,i} = \log_2 \alpha N \cdot \frac{2}{T_{FIX}}$$

Here we assume that every superpeer receives the same number of FIXFINGER PINGs from other superpeers as it sends to them. (This is the case for a fully populated ID space.) To obtain the FIXFINGER costs for superpeer $i$ we multiply the number of messages with the respective costs:

$$C_{FIX,SP\,i} = \left(c^s_{SP\,i} + c^r_{SP\,i}\right) \cdot \log_2 \alpha N \cdot \frac{2}{T_{FIX}}$$

The total FIXFINGER costs for the overlay network are thus given by

$$C_{FIX} = \frac{\log_2 \alpha N \cdot 2}{T_{FIX}} \cdot \sum_{i=1}^{N_{SP}} \left(c^s_{SP\,i} + c^r_{SP\,i}\right)$$

REPUBLISH Costs. Every superpeer runs the REPUBLISH algorithm periodically every $T_{REP}$ seconds for every shared object of the superpeer itself and its leafnodes. Republishing a shared object corresponds to a Chord lookup of the object's ID. Therefore, it generates on average $\log_2 N_{SP}$ messages. For the analysis of REPUBLISH costs, we assume that every superpeer manages the same number of objects shared by the superpeer itself and its leafnodes, i.e.

$$\text{Shared objects managed by one superpeer} = \frac{1}{N_{SP}} \cdot \sum_{k=1}^{N} f_k$$

Thus, republishing shared objects generates a total number of REPUBLISH messages $M$ sent and received by all superpeers of

$$M = N_{SP} \cdot \frac{1}{T_{REP}} \cdot \frac{\sum_{k=1}^{N} f_k}{N_{SP}} \cdot \log_2 N_{SP} = \frac{\sum_{k=1}^{N} f_k}{T_{REP}} \cdot \log_2 \alpha N$$

As we assume a fully populated Chord ring, every superpeer processes an equal proportion of $M$. Therefore, the number of sent and received REPUBLISH messages $m^s$ and $m^r$ for any superpeer $i$ is given by

$$m^s = m^r = \frac{M}{N_{SP}} = \frac{\sum_{k=1}^{N} f_k}{T_{REP} \cdot \alpha \cdot N} \cdot \log_2 \alpha N$$

To obtain the REPUBLISH costs for superpeer $i$ we multiply the number of messages with the respective costs:

$$C_{REP,SP\,i} = \left(c^s_{SP\,i} + c^r_{SP\,i}\right) \cdot \frac{\sum_{k=1}^{N} f_k}{T_{REP} \cdot \alpha \cdot N} \cdot \log_2 \alpha N$$

The total REPUBLISH costs for the overlay network are thus given by

$$C_{REP} = \frac{\sum_{k=1}^{N} f_k}{T_{REP} \cdot \alpha \cdot N} \cdot \log_2 \alpha N \cdot \sum_{i=1}^{N_{SP}} \left(c^s_{SP\,i} + c^r_{SP\,i}\right)$$

3.3. Example: Homogeneous Peers

With the expressions developed in the previous sections we can compute the total network costs for the proposed system architecture with $N_{SP}$ superpeers and $N_{LN}$ leafnodes, and their specific values for message costs, lookup rate and number of shared objects.

The following example considers a small overlay network with 100 homogeneous peers $p_0 \ldots p_{99}$. Every peer $p_k$ ($k \in \{0 \ldots 99\}$) has the same message costs $c^s_k = c^r_k = 1$, the same number of shared objects $f_k = 20$ and the same lookup rate $r_{LKP,k} = 1/(30\,\mathrm{s})$. Further we define system-wide timer values $T_{PING} = 10\,\mathrm{s}$, $T_{STAB} = 10\,\mathrm{s}$, $T_{FIX} = 60\,\mathrm{s}$ and $T_{REP} = 300\,\mathrm{s}$. Therefore, the total network costs are given by

$$C = C_{LKP,LN} + C_{LKP,SP} + C_{PING} + C_{STAB} + C_{FIX} + C_{REP}$$

with

$$
\begin{aligned}
C_{LKP,LN} &= \tfrac{20}{3} \cdot (1-\alpha) \\
C_{LKP,SP} &= \tfrac{20}{3} \cdot \left[\log_2 100\alpha + 1 - \alpha\right] \\
C_{PING} &= 40 \cdot (1-\alpha) \\
C_{STAB} &= 60\alpha \\
C_{FIX} &= \tfrac{20}{3} \cdot \alpha \cdot \log_2 100\alpha \\
C_{REP} &= \tfrac{40}{3} \cdot \log_2 100\alpha
\end{aligned}
$$

Figure 2 shows the total network costs as a function of the superpeer ratio $\alpha = N_{SP} / N$.

[Figure 2: Total network costs against superpeer ratio α. x-axis: superpeer ratio α (1% to 100%); y-axis: total network costs (0 to 300); cost components shown: LKP, REP, FIX, PING, STAB.]

Obviously, a centralized overlay network with only one superpeer (an index server) generates the lowest network traffic costs, because only lookup and PING/PONG messages are exchanged between the superpeer and its leafnodes. Moreover, we notice increased network traffic costs if the number of superpeers increases, mostly caused by maintenance traffic in the superpeers' Chord ring. The highest network costs can be observed if all peers are superpeers and are thus forming a conventional Chord overlay. This is the price for a completely flat overlay topology.

4. Costs of Centralization

Based on the conclusions made in the previous paragraph it is obvious that, with regard to the total network traffic costs, a centralized overlay topology is the optimal solution and that a small number of superpeers is preferable. However, such centralization imposes certain costs on individual peers. Every peer may have a maximum value of costs, i.e. a cost limit it is prepared to spend for participation. In our analyzed system, most of the arising costs fall on superpeers, and the fewer superpeers there are, the higher the costs every superpeer has to bear. As a result, superpeers may be overloaded, i.e. bear higher costs than their cost limits, and thus we face the risk of breaking system stability. From this point of view a high number of superpeers is preferable, as shown below.

4.1. Individual Costs for Superpeers

The costs for any superpeer $i$ consist of all costs determined in Section 3, i.e. lookup costs, PING costs, STABILIZE costs, FIXFINGER costs, and REPUBLISH costs:

$$C_{SP\,i} = C_{LKP,SP\,i} + C_{PING,SP\,i} + C_{STAB,SP\,i} + C_{FIX,SP\,i} + C_{REP,SP\,i}$$

If we apply this formula to our homogeneous example scenario of Section 3.3, the costs per superpeer are given by

$$C_{SP\,i} = \frac{1}{15\alpha} \cdot \left[(\alpha + 3) \cdot \log_2 100\alpha + 4 + 5\alpha\right]$$
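The closed-form expressions of the homogeneous example can be cross-checked against the general formulas of Sections 3.1 and 3.2. The following Python sketch is an illustration under stated assumptions, not the authors' implementation: the function names and the tuple layout `(c_send, c_recv, lookup_rate, shared_objects)` are hypothetical, and the PING costs are split into a leafnode part and a superpeer part so that the per-superpeer costs of Section 4.1 can be recovered.

```python
import math


def total_costs(superpeers, leafnodes, t_ping, t_stab, t_fix, t_rep):
    """Evaluate the cost terms of Section 3 for explicit peer lists.

    Each peer is a tuple (c_send, c_recv, lookup_rate, shared_objects).
    Returns a dict with one entry per cost component.
    """
    n_sp, n_ln = len(superpeers), len(leafnodes)
    log_sp = math.log2(n_sp)  # log2(alpha * N)
    sp_msg_cost = sum(cs + cr for cs, cr, _, _ in superpeers)
    ln_msg_cost = sum(cs + cr for cs, cr, _, _ in leafnodes)
    objects = sum(f for _, _, _, f in superpeers + leafnodes)

    # Lookup messages handled by all superpeers (M), then the share of one.
    m_total = (sum(r for _, _, r, _ in superpeers) * log_sp
               + sum(r for _, _, r, _ in leafnodes) * (log_sp + 1))
    m_per_sp = m_total / n_sp

    return {
        "LKP_LN": sum((cs + cr) * r for cs, cr, r, _ in leafnodes),
        "LKP_SP": m_per_sp * sp_msg_cost,
        "PING_LN": ln_msg_cost / t_ping,
        "PING_SP": (n_ln / n_sp) * sp_msg_cost / t_ping,
        "STAB": 3.0 / t_stab * sp_msg_cost,
        "FIX": 2.0 * log_sp / t_fix * sp_msg_cost,
        "REP": objects / (t_rep * n_sp) * log_sp * sp_msg_cost,
    }


def homogeneous_example(n_sp, n=100):
    """General model evaluated with the parameters of Section 3.3."""
    peer = (1.0, 1.0, 1.0 / 30.0, 20)
    return total_costs([peer] * n_sp, [peer] * (n - n_sp),
                       t_ping=10.0, t_stab=10.0, t_fix=60.0, t_rep=300.0)


def closed_form(alpha):
    """Closed-form components of Section 3.3 (valid for N = 100 only)."""
    log_sp = math.log2(100 * alpha)
    return {
        "LKP_LN": 20 / 3 * (1 - alpha),
        "LKP_SP": 20 / 3 * (log_sp + 1 - alpha),
        "PING": 40 * (1 - alpha),
        "STAB": 60 * alpha,
        "FIX": 20 / 3 * alpha * log_sp,
        "REP": 40 / 3 * log_sp,
    }


def closed_form_per_superpeer(alpha):
    """Closed-form cost per superpeer from Section 4.1 (N = 100 only)."""
    return ((alpha + 3) * math.log2(100 * alpha) + 4 + 5 * alpha) / (15 * alpha)
```

Evaluating `homogeneous_example(n_sp)` for any number of superpeers between 1 and 100 and comparing it with `closed_form(n_sp / 100)` is a quick way to confirm that the simplified expressions follow from the general ones.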
Figure 3 depicts the costs per superpeer for different superpeer ratios $\alpha = N_{SP} / N$. As expected, the costs per superpeer decrease as the number of superpeers increases.

[Figure 3: Costs per superpeer against superpeer ratio α. x-axis: superpeer ratio α (1% to 100%); y-axis: costs per superpeer (0 to 30).]

4.2. Load Factor

To determine an optimal value for the number of superpeers taking into account their individual costs, we define a load factor LF for every participating peer. The load factor of peer $k$ specifies the ratio between the costs $C_k$ for peer $k$ in a given scenario and the maximum value of costs peer $k$ is willing to accept (i.e. $k$'s cost limit $C_{k,max}$):

$$LF_k = \frac{C_k}{C_{k,max}}$$

For evaluating a given scenario we focus on the highest load factor HLF that can be observed across all peers:

$$HLF = \max_k \left(LF_k\right)$$

Let us again have a look at our small example scenario. If we define a cost limit $C_{k,max} = 10$ for every peer $k \in \{0 \ldots 99\}$, we get the dependence of the HLF on the superpeer ratio $\alpha = N_{SP} / N$ as shown in Figure 4. Because we have a fully homogeneous setting in this example, the shown curve is smooth and basically coincides with the one from Figure 3.

[Figure 4: Highest Load Factor against superpeer ratio α. x-axis: superpeer ratio α (1%, 10%, 100%); y-axis: HLF (0% to 300%); point A marks HLF = 100%, point B marks the minimum HLF.]

Nevertheless, Figure 4 shows three interesting facts. First of all, if the ratio of superpeers is below 10%, we obtain HLFs of more than 100%. Consequently, one or more peers are overloaded, because they bear higher costs than they accept. In general, HLFs higher than 100% should be avoided in order to ensure system stability.

Secondly, if we want to minimize total network costs without overloading any peer in the system, a superpeer ratio of 10% (point A) should be chosen. Point A can be seen as the optimal value for $\alpha$ from the network's point of view.

Thirdly, we can also focus on minimizing the HLF (point B), i.e. costs for the highest loaded peer are minimized in relation to the peer's cost limit. In our small example scenario, the HLF is minimized if all peers are superpeers and build a conventional Chord overlay. We regard point B as an optimal value for $\alpha$ from the peers' point of view.

We believe that the HLF can play an important role in determining optimal operating points of a network in a more general sense. As a simple example, assume that the system designer has to strike a balance between minimizing the total network costs and minimizing the HLF in the system. For this purpose, the designer can specify a weighted average

$$c = \omega_1 \cdot \text{Total Network Costs} + \omega_2 \cdot HLF,$$

and define an optimal ratio $\alpha_{opt} = \arg\min(c)$. As mentioned above, setting $\omega_1 = 1$ and $\omega_2 = 0$ results in choosing point A in the example from Figure 4 as the optimal value, provided that overloaded configurations (HLF above 100%) are excluded, while setting $\omega_1 = 0$ and $\omega_2 = 1$ yields point B as the optimum.

Further, the load factor of a peer can be viewed as an indicator of the probability that the peer will fail. Consequently, distributions of load factors across the peers may be indicative of what we could generally call the quality of the overlay network. We do not attempt here to define the concept of overlay network quality, but we firmly believe that this can be done by extending the analysis of Gummadi et al. [10] by the above ideas. A serious difficulty in applying this would be the necessity to determine the distributions of load factors across the peers. However, we believe that in practice the situation is not so complicated, as the example in the following section illustrates.
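The choice between points A and B in the homogeneous example can be reproduced numerically. The sketch below is a minimal illustration in Python under stated assumptions: it uses the closed-form costs of Sections 3.3 and 4.1 for N = 100, searches over whole numbers of superpeers, and treats the exclusion of overloaded configurations (HLF above 100%) as an explicit constraint. All function names are hypothetical.

```python
import math

N = 100          # peers in the homogeneous example of Section 3.3
COST_LIMIT = 10  # C_k,max used in Section 4.2


def total_network_costs(n_sp):
    """Total network costs C for n_sp superpeers (closed form, N = 100)."""
    alpha = n_sp / N
    log_sp = math.log2(n_sp)  # equals log2(100 * alpha)
    return (20 / 3 * (1 - alpha) + 20 / 3 * (log_sp + 1 - alpha)
            + 40 * (1 - alpha) + 60 * alpha
            + 20 / 3 * alpha * log_sp + 40 / 3 * log_sp)


def highest_load_factor(n_sp):
    """HLF = max over all peers of C_k / C_k,max."""
    alpha = n_sp / N
    superpeer_cost = ((alpha + 3) * math.log2(n_sp) + 4 + 5 * alpha) / (15 * alpha)
    # Leafnode cost: lookups (c^s + c^r) * r plus PING/PONG (c^s + c^r) / T_PING.
    leafnode_cost = 2 / 30 + 2 / 10 if n_sp < N else 0.0
    return max(superpeer_cost, leafnode_cost) / COST_LIMIT


def optimal_superpeer_count(w_cost, w_hlf, forbid_overload=True):
    """argmin of w_cost * C + w_hlf * HLF over 1..N superpeers.

    forbid_overload=True drops configurations with HLF > 100%; this
    constraint is an assumption of the sketch.
    """
    candidates = [n for n in range(1, N + 1)
                  if not forbid_overload or highest_load_factor(n) <= 1.0]
    return min(candidates,
               key=lambda n: (w_cost * total_network_costs(n)
                              + w_hlf * highest_load_factor(n)))
```

With weights (1, 0) the search returns 10 superpeers, i.e. the 10% ratio of point A, and with weights (0, 1) it returns 100 superpeers, i.e. point B. Without the overload constraint, weights (1, 0) select the fully centralized network with a single superpeer.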
4.3. Example: Heterogeneous Peers

Consider an overlay network with 90,000 peers in total, of which 30,000 peers are DSL subscribers, 30,000 peers are UMTS-connected PDAs, and 30,000 peers are GPRS-connected cell phones. All peers are modeled according to the parameters given in Table 1.

Table 1: Modeling of peers

                     DSL       UMTS      GPRS
Lookup rate          1 / 60s   1 / 30s   1 / 30s
Shared objects       500       100       50
Upstream [kbit/s]    256       92        50

To define a cost limit for every peer, we here focus on the upstream bit rate of every participating peer. We set $C_{k,max}$ to 10% of the upstream bit rate of peer $k$. Consequently, DSL peers provide 25.6 kbit/s for P2P network participation, UMTS peers 9.2 kbit/s, and GPRS peers 5.0 kbit/s. In addition, we set message costs $c^s_k = 1$ and $c^r_k = 0$ for all $k$, thus taking only sent messages into account.³ Further system parameters are defined according to Table 2.

³ We focus on the upstream bit rate of peers here because this is mainly the bottleneck in today's overlay networks, e.g. due to an asymmetric down- and upstream in DSL.

Table 2: System parameters

Parameter            Value
T_PING               10s
T_STAB               10s
T_FIX                60s
T_REP                600s
Mean message size    1000 Bits

When increasing the number of superpeers in the following analysis, we first promote DSL peers to superpeers. For scenarios with more than 30,000 superpeers we also take UMTS peers. Finally, if we have more than 60,000 superpeers, even GPRS peers will act as superpeers. Figures 5 and 6 show the total network costs and the highest load factor for the given overlay network against the number of superpeers in the system.

[Figure 5: Total network costs. x-axis: number of superpeers (1 to 90000), divided into DSL, UMTS and GPRS ranges; y-axis: total network costs in Gbit/s (0.2 to 0.5); points A and B marked.]

[Figure 6: Highest Load Factor. x-axis: number of superpeers (1 to 90000), divided into DSL, UMTS and GPRS ranges; y-axis: HLF (0% to 300%); points A and B marked.]

As expected, we again notice increasing total network costs for an increasing number of superpeers. This corresponds directly to our argumentation in Section 3.3.

With regard to the HLF diagram, we can draw the following conclusions: When the number of superpeers is less than 13,775 (number of superpeers in point A), we observe HLFs of more than 100%, i.e. overloaded DSL superpeers. This is also the case for a number of superpeers between 30,001 and 45,291 (UMTS superpeers overloaded) and whenever GPRS peers act as superpeers. As stated above, these superpeer ratios should be avoided to ensure system stability. Interestingly, this also implies that neither a centralized system (due to an overloaded index server) nor a flat Chord overlay (due to overloaded GPRS peers) is an applicable solution for the given system; thus only a hierarchical architecture fulfils the system requirements.

The optimal number of superpeers for the given system lies between 13,775 (point A) and 30,000 (point B). As we can see from Figures 5 and 6, at point A the total network costs are minimized without overloading any participating peer in the system. At point B the HLF is minimized. In this case all DSL peers act as superpeers with a load factor of 51%, while all UMTS and GPRS peers are leafnodes and therefore have much lower load factors.
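The set-up of the heterogeneous example, i.e. the per-class cost limits and the order in which peers are promoted to superpeers, can be written down in a few lines. The Python sketch below only encodes the parameters stated in Section 4.3 and Tables 1 and 2; it does not attempt to reproduce Figures 5 and 6. The function names are hypothetical, and the conversion from kbit/s to messages per second assumes 1 kbit = 1000 bits.

```python
# (class name, number of peers, upstream bit rate in kbit/s), in promotion order
PEER_CLASSES = [("DSL", 30_000, 256), ("UMTS", 30_000, 92), ("GPRS", 30_000, 50)]
LIMIT_PERCENT = 10        # share of the upstream rate offered to the overlay
MESSAGE_SIZE_BITS = 1000  # mean message size from Table 2


def cost_limit_kbit(upstream_kbit):
    """Cost limit C_k,max in kbit/s: 10% of the upstream bit rate."""
    return upstream_kbit * LIMIT_PERCENT / 100


def max_sent_messages_per_second(upstream_kbit):
    """Cost limit expressed in sent messages per second.

    With c^s = 1 and c^r = 0, only sent messages count against the limit.
    Assumes 1 kbit = 1000 bits.
    """
    return cost_limit_kbit(upstream_kbit) * 1000 / MESSAGE_SIZE_BITS


def superpeer_mix(n_superpeers):
    """Superpeers drawn from each class: DSL first, then UMTS, then GPRS."""
    mix, remaining = {}, n_superpeers
    for name, count, _ in PEER_CLASSES:
        mix[name] = min(count, remaining)
        remaining -= mix[name]
    if remaining > 0:
        raise ValueError("more superpeers requested than peers available")
    return mix
```

For example, `superpeer_mix(45291)` yields all 30,000 DSL peers plus 15,291 UMTS peers as superpeers, which is the upper end of the range in which the text reports overloaded UMTS superpeers.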
5. On Distributed Algorithms to Determine Optimal Superpeer Ratios

There are two instances of the problem of determining optimal operating points of a network in a distributed way. How can an optimal point be reached from any operating state, and how can it be maintained once reached? Answering these questions in detail is an important item of our future work; here we just hint at a possible answer to the second question in order to back our belief that distributed implementations of the presented ideas are feasible.

Under the assumption of sequential peer arrivals an optimal superpeer ratio can be maintained roughly as follows. When a new peer joins the network it can communicate its cost limit to the superpeer through which it is about to join. The superpeer is responsible for deciding which role the arriving peer should take and whether it should replace one or more superpeers that are already in the network. To be able to do this, all superpeers need to maintain approximations of the current state of the network. In particular, good approximations of the sizes of the superpeer population and the entire network as well as an estimate of the load distribution across peers are necessary. We believe that the DHT size estimation from [11] can be extended and applied to our problem. On the other hand, it seems to be hard to obtain an accurate estimate of the load distribution. We wonder whether it suffices to maintain samples made up of routing neighbors only or whether larger sets are necessary. In case they are needed, these sets can be spread among neighbors, either by broadcasting them periodically or by piggybacking them on other exchanged messages.

6. Conclusion and Future Work

We showed in this paper that hierarchical DHT organizations provide a plausible approach to building P2P overlay networks. We presented an analytical framework to analyze a specific type of hierarchical system, where superpeers build a conventional Chord overlay and leafnodes use them as proxies. We evaluated the costs for running the whole network as well as the costs for every single participant, in order to determine an optimal superpeer ratio for a given system. We found that total network costs decrease with centralization, while on the other hand centralization may overload peers and therefore endanger system stability. As our main results, we showed that a hierarchical DHT design can be better than a flat design and that there is a trade-off between minimizing total network costs and minimizing the costs for the highest loaded peer in the system.

In our future work we plan to extend our analytical framework to other hierarchical DHT designs, in order to compare them to the system architecture proposed in this paper. We also intend to evaluate the impact of churn on our analysis. Further, we are currently working on distributed algorithms to balance leafnodes and shared objects uniformly over superpeers, and to maintain optimal superpeer ratios.

References

[1] I. Stoica, R. Morris, D. Karger, M. Kaashoek, and H. Balakrishnan, "Chord: A Scalable Peer-to-Peer Lookup Service for Internet Applications", ACM SIGCOMM Conference, 2001.
[2] A. Rowstron and P. Druschel, "Pastry: Scalable, Distributed Object Location and Routing for Large-Scale Peer-to-Peer Systems", IFIP/ACM International Conference on Distributed Systems Platforms (Middleware), 2001.
[3] P. Maymounkov and D. Mazieres, "Kademlia: A Peer-to-Peer Information System Based on the XOR Metric", International Workshop on Peer-to-Peer Systems (IPTPS'02), 2002.
[4] K. Aberer, P. Cudré-Mauroux, A. Datta, Z. Despotovic, M. Hauswirth, M. Punceva, and R. Schmidt, "P-Grid: A Self-organizing Structured P2P System", SIGMOD Record, vol. 32, 2003.
[5] P. Ganesan, K. Gummadi, and H. Garcia-Molina, "Canon in G Major: Designing DHTs with Hierarchical Structure", International Conference on Distributed Computing Systems (ICDCS 2004), 2004.
[6] L. Garces-Erice, E. W. Biersack, K. W. Ross, P. A. Felber, and G. Urvoy-Keller, "Hierarchical P2P Systems", ACM/IFIP International Conference on Parallel and Distributed Computing (Euro-Par), 2003.
[7] S. Zoels, S. Schubert, W. Kellerer, and Z. Despotovic, "Hybrid DHT Design for Mobile Environments", International Workshop on Agents and Peer-to-Peer Computing (AP2PC 2006), 2006.
[8] N. Christin and J. Chuang, "A Cost-Based Analysis of Overlay Routing Geometries", IEEE INFOCOM'05, 2005.
[9] S. Rhea, D. Geels, T. Roscoe, and J. Kubiatowicz, "Handling Churn in a DHT", USENIX 2004 Annual Technical Conference, 2004.
[10] P. K. Gummadi, R. Gummadi, S. D. Gribble, S. Ratnasamy, S. Shenker, and I. Stoica, "The Impact of DHT Routing Geometry on Resilience and Proximity", ACM SIGCOMM Conference, 2003.
[11] G. S. Manku, "Routing Networks for Distributed Hash Tables", 22nd ACM Symposium on Principles of Distributed Computing (PODC), 2003.
