Debunking Some Myths About Structured and Unstructured Overlays.

Conference Paper · January 2005



Miguel Castro Manuel Costa Antony Rowstron


Microsoft Research, 7 J J Thomson Avenue, Cambridge, UK

Abstract

We present a comparison of structured and unstructured overlays that decouples overlay topology maintenance from query mechanism. Structured overlays provide efficient support for simple exact-match queries but they constrain overlay topology to achieve this. Unstructured overlays do not constrain overlay topology or query complexity because they use flooding or random walks to discover data. It is commonly believed that structured overlays are more expensive to maintain, that their topology constraints make it harder to exploit heterogeneity, and that they cannot support complex queries efficiently. We performed a detailed comparison study using simulations driven by real-world traces that debunks these widespread myths. We describe techniques that exploit structural constraints to achieve low maintenance overhead and we present a modified neighbour selection algorithm that can exploit heterogeneity effectively. We also describe techniques to perform floods and random walks on structured topologies. These techniques exploit structural constraints to support complex queries with better performance than unstructured overlays.

1 Introduction

There has been much interest in peer-to-peer data sharing applications. They are used by millions of users and they represent a large fraction of the traffic in the Internet [31]. These applications are built on top of large-scale network overlays that provide mechanisms to discover data stored by overlay nodes. There is an ongoing debate in the research community on the relative merits of two types of overlays: unstructured and structured. This paper presents a comparison study of unstructured and structured overlays that contributes to this debate by debunking some widespread myths.

Unstructured overlays, for example Gnutella [1], organize nodes into a random graph topology and use floods or random walks to discover data stored by overlay nodes. Each node visited during a flood or random walk evaluates the query locally on the data items that it stores. This approach supports arbitrarily complex queries and it does not impose any constraints on the overlay topology or on data placement, for example, each node can choose any other node to be its neighbour in the overlay and it can store the data it owns. There has been a large amount of work on improving unstructured overlays, for example [10, 13, 24].

Structured overlays, like Tapestry [35], CAN [25], Chord [32] and Pastry [29], were developed to improve the performance of data discovery. They impose constraints both on the topology of the overlay and on data placement to enable efficient discovery of data. Each data item is identified by a key and nodes are organized into a structured graph topology that maps each key to a responsible node. The data or a pointer to the data is stored at the node responsible for its key. These constraints provide efficient support for exact-match queries; they enable discovery of a data item given its key in typically only O(log N) hops with only O(log N) neighbours per node. It is possible to support more complex queries by building indices on top of structured overlays but current solutions perform worse than unstructured overlays [20].

It is commonly believed that structured overlays are more expensive to maintain in the presence of churn, that their topology constraints remove the flexibility necessary to exploit heterogeneity, and that they cannot support complex queries efficiently (see for example, [10]). This paper presents a detailed comparison of structured and unstructured overlays that contradicts these myths. We explore the design space by decoupling overlay topology maintenance from query mechanisms.

• We evaluate a technique that exploits structure to reduce maintenance overhead. It eliminates redundant failure detection probes by using structure to partition failure detection responsibility and to locate nodes that need to be informed about failures

USENIX Association NSDI ’05: 2nd Symposium on Networked Systems Design & Implementation 85
and new node arrivals. We show that this technique can achieve robustness to high rates of churn with overhead lower than unstructured overlays.

• We describe how to exploit heterogeneity by modifying any proximity neighbour selection algorithm [8, 35, 16] to adapt the topology such that the indegree of nodes matches their capacity.

• We introduce techniques to support complex queries efficiently on structured topologies without constraints on data placement. These techniques perform floods or random walks on structured topologies but exploit structural constraints to ensure that nodes are visited only once during a query, the number of visited nodes is controlled accurately, and the average capacity of nodes visited during a query is increased to better exploit heterogeneity. Additionally, they remove the need to maintain both a structured and an unstructured overlay to implement hybrid search strategies [22].

The paper presents results of detailed comparisons between several representative structured and unstructured overlay topology maintenance algorithms. These results were obtained using simulations driven by real-world traces of node arrivals and departures in the Gnutella file sharing application [30]. The results show that our techniques enable structured overlays to cope with high rates of churn and exploit heterogeneity effectively with a maintenance overhead comparable to that achieved by state-of-the-art unstructured overlays.

We also compared the performance of data discovery using several representative unstructured overlays and using our techniques to perform floods and random walks on structured overlays. We used a real trace of content distribution across nodes in the eDonkey peer-to-peer file sharing application [12] to drive the simulations. The results show that our techniques can discover data more often, faster, or with lower overhead.

The additional functionality provided by structured overlays has proven important to achieve scalability and efficiency in a wide range of applications. Structured overlays can emulate the functionality of unstructured overlays with comparable or even better performance.

In Section 2, we describe and compare structured and unstructured topology maintenance protocols assuming a homogeneous setting. Section 3 extends the structured topology maintenance protocol to exploit heterogeneity in peers’ resources and compares this with unstructured topology maintenance protocols which exploit heterogeneity. Section 4 compares the performance of content discovery using random walks and flooding on both structured and unstructured topologies, and Section 5 presents our conclusions.

2 Topology maintenance with churn

Measurement studies of deployed peer-to-peer overlays have observed a high rate of churn [4, 17, 30]; nodes join and leave these overlays constantly. Therefore, peer-to-peer overlays should be able to cope with a high rate of churn.

Can unstructured overlays cope with churn better than structured overlays?

Each node maintains a set of neighbours to form an overlay. Structured overlays impose constraints on the overlay topology; nodes have identifiers and two nodes can be neighbours only if their identifiers satisfy certain constraints. Unstructured overlays do not impose constraints on neighbours. Both types of overlay can improve robustness to churn at the expense of increased maintenance overhead by increasing the number of neighbours per node and probing them more frequently to detect and replace failed neighbours.

It is believed that maintaining a structured overlay in the presence of churn is more expensive than maintaining an unstructured overlay because of the constraints on neighbour selection. This section shows that this is not necessarily the case. It is possible to use structure to achieve better robustness with lower maintenance overhead in a structured overlay.

Structured overlays also impose constraints on data placement that can result in high overhead under churn for some applications [5]. We study structured overlays without these constraints to keep the evaluation independent of any particular application. Data placement constraints do not result in significant overhead in several applications (for example, content distribution [9] and Web caching [19]) and the search technique in Section 4 does not constrain data placement at all.

This section describes the implementation of structured and unstructured overlay maintenance protocols in a homogeneous setting and compares their performance. The next section explains how to exploit heterogeneity.

2.1 Unstructured overlays

We implemented an unstructured overlay maintenance protocol based on the specification of Gnutella version 0.4 [15] but we added many optimizations to the protocol to ensure a fair comparison.

Gnutella 0.4 organizes overlay nodes into a random graph. Each node in the overlay maintains a neighbour table with the network addresses of its neighbours in the overlay. The neighbour tables are symmetric; if node x has node y in its neighbour table then node y has node x in its neighbour table. There is an upper and lower bound on the number of entries in each node’s neighbour table.
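
As an illustration only, a symmetric neighbour table with an upper and lower bound on its size can be sketched in Python as follows. The class name `Node`, its method names, and the default bounds (4 and 12, borrowed from the Gnutella 0.4 (4) configuration evaluated in Section 2.3) are assumptions made for this sketch; they are not taken from the Gnutella specification or from the simulator used in the paper.

```python
class Node:
    """Overlay node with a symmetric, bounded neighbour table (illustrative sketch)."""

    def __init__(self, address, min_neighbours=4, max_neighbours=12):
        self.address = address
        self.min_neighbours = min_neighbours  # lower bound on table size (assumed default)
        self.max_neighbours = max_neighbours  # upper bound on table size (assumed default)
        self.neighbours = set()               # addresses of current neighbours

    def has_room(self):
        """True while the table is below its upper bound."""
        return len(self.neighbours) < self.max_neighbours

    def needs_neighbours(self):
        """True while the table is below its lower bound."""
        return len(self.neighbours) < self.min_neighbours

    def link(self, other):
        """Add the link to both tables or to neither, so symmetry always holds."""
        if other is self or other.address in self.neighbours:
            return False
        if not (self.has_room() and other.has_room()):
            return False
        self.neighbours.add(other.address)
        other.neighbours.add(self.address)
        return True

    def unlink(self, other):
        """Remove the link from both tables, for example after a detected failure."""
        self.neighbours.discard(other.address)
        other.neighbours.discard(self.address)
```

In this sketch a node whose `needs_neighbours()` is true would start neighbour discovery, and a node whose `has_room()` is false would decline further invitations.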

A joining node uses a random walk starting from a bootstrap node, which is randomly chosen from the set of nodes already in the overlay, to find other nodes to fill its neighbour table. It sends the bootstrap node a neighbour discovery message with a counter that is initialized to the number of nodes required to fill its neighbour table. Upon receiving a discovery message, a node checks whether it has fewer neighbours than the upper bound. If this is the case, the node sends a message to the joining node inviting it to become a neighbour and decrements the counter in the neighbour discovery message. In either case, the neighbour discovery message is forwarded to a randomly chosen neighbour if the counter is still greater than zero. To increase resilience to node and network failures, all neighbour discovery messages are acknowledged. If a node does not receive an acknowledgement within a timeout, it selects another neighbour at random and forwards the neighbour discovery message to that neighbour.

In addition to joins, nodes need to detect failures and replace faulty neighbours. Every t seconds each node sends an I’m alive message to every node in its neighbour table. Since all nodes do the same and neighbour tables are symmetric, each node should receive a message from each neighbour in each t second period. If a node does not receive a message from a neighbour, it explicitly probes that neighbour and, if no reply is received, the neighbour is assumed to be faulty. We used t = 30 seconds in this paper. Nodes maintain a cache of other nodes that they use to replace failed neighbours. If the cache is empty, they obtain new neighbours by sending a neighbour discovery message to a randomly chosen neighbour. All messages sent between the nodes are used to replace explicit I’m alive messages.

Simulation results show that this protocol leads to poor query performance because the neighbour table of a joining node and those of its neighbours are likely to share a significant fraction of nodes. This reduces the effectiveness of floods and random walks to discover data. We overcome this problem by forwarding the neighbour discovery message over a number of random hops after each neighbour invitation is sent. We add a hop counter to discovery messages that is set to R by every node that replies with a neighbour invitation. Nodes decrement the hop counter when they forward a discovery message and they only consider sending a neighbour invitation when the counter is less than or equal to zero. We used R = 5 in this paper as, from experimental evaluation, this provided good query performance with a small increase in maintenance overhead.

We use unbiased random walks because we found that biasing the random walk to nodes with low degree reduces overhead but results in poor query performance. We also experimented with flooding of discovery messages (as specified in the Gnutella 0.4 protocol) but this results in additional overhead without improved robustness or query performance.

2.2 Structured overlays

There are several structured overlay maintenance protocols. We chose an implementation of Pastry [29] called MS Pastry [6] because it has good performance under churn and has an efficient implementation of proximity neighbour selection [8]. We modified it to exploit heterogeneity (as described in the next section). Studies have shown that other structured overlay maintenance protocols [21, 28] also perform well under churn.

Structured overlays map keys to overlay nodes. Overlay nodes are assigned nodeIds selected from a large identifier space and application objects are identified by keys selected from the same identifier space. Pastry selects nodeIds and keys uniformly at random from the set of 128-bit unsigned integers and it maps a key k to the node whose identifier is numerically closest to k modulo 2^128. This node is called the key’s root. Given a message and a destination key, Pastry routes the message to the key’s root node. Each node maintains a routing table and a leaf set to route messages.

NodeIds and keys are interpreted as a sequence of digits in base 2^b. We use b = 1 in this paper to minimize the maintenance overhead. The routing table is a matrix with 128/b rows and 2^b columns. The entry in row r and column c of the routing table contains a random nodeId that shares the first r digits with the local node’s nodeId, and has the (r + 1)th digit equal to c. If there is no such nodeId, the entry is left empty. The uniform random distribution of nodeIds ensures that only log_{2^b} N rows have non-empty entries on average. Additionally, the column in row r corresponding to the value of the (r + 1)th digit of the local node’s nodeId remains empty.

Nodes use a neighbour selection function to select between two candidates for the same routing table slot. Given two candidates y and z for slot (r, c) in node x’s routing table, x selects z if z’s nodeId is numerically closer than y’s to the nodeId obtained by replacing the (r + 1)th digit of x’s nodeId by c. This neighbour selection function promotes stability in routing tables while distributing load. We chose not to use proximity neighbour selection because it increases overhead slightly and low delay routes do not seem important for the applications we study in this paper.

The leaf set connects nodes in a ring. It contains the l/2 closest nodeIds clockwise from the local nodeId and the l/2 closest nodeIds counter clockwise. The leaf set ensures reliable message delivery. We use l = 32 in this paper, which provides high robustness to large scale failures and high churn rates.

At each routing step, the local node normally forwards the message to a node whose nodeId shares a prefix with

the key that is at least one digit longer than the prefix that the key shares with the local node’s nodeId. If no such node is known, the message is forwarded to a node whose nodeId is numerically closer to the key and shares a prefix with the key at least as long. The leaf set is used to determine the destination node in the last hop.

Exploiting structure to reduce maintenance overhead

Structured overlays can use structure to reduce maintenance overhead in several ways. First, several structured overlays use structure to initialize the routing tables of joining nodes efficiently and to announce their arrival.

Node joining in Pastry exploits the topology structure as follows. A joining node x picks a random nodeId X and asks a bootstrap node a to route a special join message using X as the destination key. This message is routed to the node z with nodeId numerically closest to X. The nodes along the overlay route add routing table rows to the message; node x obtains the r-th row of its routing table from the node encountered along the route whose nodeId matches x’s in the first r − 1 digits and its leaf set from z. After initializing its routing table, x sends the r-th row of the table to each node in that row. This serves both to announce x’s presence and to gossip information about nodes that joined previously. Each node that receives a row considers using the new nodes to replace entries in its routing table.

Additionally, structured overlays can eliminate redundant failure detection probes by using structure to partition failure detection responsibility and to locate nodes that need to be informed when a failure is detected. For example, MS Pastry uses this technique to reduce the number of liveness probes in the leaf set by a factor of 32. Each node sends a single I’m alive message every t_l seconds to its left neighbour in the id space. If a node does not receive a message from its right neighbour, it probes the neighbour and marks it faulty if it does not reply. When it marks the neighbour faulty, it discovers the new member of its leaf set by querying the right neighbour of the failed node and informs all the members of the new leaf set about the failed node. If several consecutive nodes in the ring fail, the left neighbour of the leftmost node will detect the failure and repair provided the number of consecutive nodes that failed is less than l/2 − 1. We use t_l = 30 seconds in this paper, which is equal to the period between I’m alive messages in the unstructured overlays. This technique is readily applicable to systems that organize nodes into a logical ring, for example [32, 29, 28], but harder to apply to other systems, for example [25, 35].

The technique can be extended to eliminate fault detection probes sent to routing table entries. This can be done in routing tables that constrain each node x to point to nodes whose identifiers are the closest to specific points in the identifier space derived from x’s nodeId, for example, the original Chord [32] finger table and Pastry’s constrained routing table [7]. For example, Pastry’s constrained routing table enables a node that detects the failure of its right neighbour to locate all nodes with routing table entries pointing to the failed node with an expected cost of O(log N) messages. We chose not to use the constrained routing table because it eliminates the flexibility necessary to cope with heterogeneous peers as described in the next section.

MS Pastry uses a different strategy to detect failures in the routing table. Since the routing table is not symmetrical, a node explicitly probes every member every t_r seconds to detect failures. The routing table probing period t_r is set dynamically by each node based on the node failure rate in the overlay observed by the node [6]. We configured MS Pastry to achieve a 1% loss rate, i.e., a message routed between a pair of nodes has a probability of 99% of reaching the destination even in the absence of retransmissions.

Pastry also has a periodic routing table maintenance protocol to repair failed entries. Each node x asks a node in each row of the routing table for the corresponding row in its routing table. x chooses between the new entries in received rows and the entries in its routing table using the neighbour selection function defined above. This is repeated periodically, for example, every 20 minutes in the current implementation. Additionally, Pastry has a passive routing table repair protocol: when a routing table slot is found empty during routing, the next hop node is asked to return any entry it may have for that slot.

These techniques used to reduce overhead in MS Pastry are described in detail in [6] and are applicable to other structured overlays.

2.3 Experimental comparison

We compare the maintenance overhead of the different overlays using a packet-level discrete-event simulator. We simulated a transit-stub network topology [34] with 5050 routers. There are 10 transit domains at the top level with an average of 5 routers in each. Each transit router has an average of 10 stub domains attached, and each stub has an average of 10 routers. Routing is performed using the routing policy weights of the topology generator [34]. The simulator models the propagation delay on the physical links. The average delay of router-router links was 40.7 ms. In the experiments, each end system node was attached to a randomly selected stub router with a link delay of 1 ms.

The simulation is driven using a real-world trace of node arrivals and failures from a measurement study of Gnutella [30]. The study monitored 17,000 unique nodes in the Gnutella overlay over a period of 60 hours. It probed each node every seven minutes to check if it was still part of the overlay. The average session time over

the trace was approximately 2.3 hours and the number of active nodes in the overlay varied between 1,300 and 2,700. The failure rate and arrival rates are similar but there are large daily variations (more than a factor of 3). There was no application-level traffic during this experiment to isolate the overlay maintenance overhead.

Figure 1: Maintenance overhead in messages per second per node over time for the Gnutella 0.4 and Pastry overlays. [Plot: x-axis Time (hours), 0 to 60; y-axis Messages / second / node, 0 to 0.9; series Gnutella 0.4 (8), Gnutella 0.4 (4), Pastry.]

We opted for a simulation study because scalability is an important attribute of these overlays and the testbeds we have available cannot cope with the overlay sizes that we simulate in this and later sections. The code that runs in the simulator is complete and realistic; it can run in a real deployment by simply relinking with a different communication library. The simulator also appears to be accurate as shown by the validation study presented in [6], which compares the simulator output with values measured in a real deployment.

We compare the maintenance overhead of Gnutella 0.4 and Pastry. We used two configurations of Gnutella 0.4: Gnutella 0.4 (4) bounds the number of neighbours to be at least 4 and no more than 12; Gnutella 0.4 (8) bounds the number of neighbours to be at least 8 and no more than 32. In the experiments, we observed that Gnutella 0.4 (4) has on average 5.8 neighbours and Gnutella 0.4 (8) has on average 11.0 neighbours.

These parameters were chosen because Gnutella 0.4 (4) has maintenance overhead lower than Pastry whereas Gnutella 0.4 (8) has higher overhead. It is important to note that both configurations have lower resilience to churn than Pastry. Each Pastry node has 32 neighbours in the leaf set alone and it detects and repairs failures of leaf set neighbours as fast as the Gnutella overlays detect and repair their neighbour failures. A node only gets partitioned from the overlay if 32 nodes fail before being replaced in Pastry whereas it only takes 6 nodes to fail in Gnutella 0.4 (4) and 11 in Gnutella 0.4 (8).

Figure 1 shows the maintenance overhead measured as the average number of messages per second per node. The x-axis represents simulation time.

Most of the overhead is due to fault detection messages in the three overlays. In the Gnutella overlay, nodes send I’m alive messages to each of their neighbours every 30 seconds. The average number of links per node over the trace is 5.8 in Gnutella 0.4 (4) and 11.0 in Gnutella 0.4 (8). Therefore, the expected overhead due to fault detection is 0.19 and 0.37 messages per second per node in Gnutella 0.4 (4) and Gnutella 0.4 (8), respectively. Pastry’s maintenance overhead is between the overhead of Gnutella 0.4 (4) and Gnutella 0.4 (8) most of the time.

Pastry is able to achieve low maintenance overhead because it exploits structure. The overhead for fault detection of leaf set members is only 0.03 messages per second per node even though there are 32 nodes in each node’s leaf set. Additionally, Pastry tunes the routing table probing period to achieve 1% loss rate (using the techniques described in [6]). This ensures that it uses the minimum probe rate that achieves the desired reliability. Pastry’s maintenance overhead varies with the failure rate observed during the trace because the self-tuning technique increases the probe rate when the node failure rate increases. The spikes in maintenance overhead at approximately 44 hours and after 50 hours are due to spikes in the node failure rate in the trace. These spikes in failure rate are probably caused by temporary loss of network connectivity between the site issuing the pings and a large fraction of its targets during the collection of the trace.

It is possible to lower the overhead of Gnutella by reducing the rate of I’m alive messages or the number of neighbours but doing this decreases resilience to churn and degrades search efficiency. It might also be possible to use techniques similar to Pastry’s to reduce maintenance overhead in Gnutella overlays without decreasing resilience but this would require introducing a structure similar to Pastry’s. However, this is not the point.

The important point is that the maintenance overhead is negligible in all three systems and that structured overlays provide additional functionality that has proven useful in a number of applications. For example, the average number of messages per second per node over the trace is only 0.26 in Pastry. Furthermore, the vast majority of these messages are smaller than 100 bytes on the wire. Therefore, the overhead is less than 26 bytes per second, which is negligible even for users with slow dialup connections. For comparison, the latest Gnutella specification [2] recommends a probing period that results in an estimated 131 bytes per second per neighbour.

The maintenance overhead is constant in the unstructured overlays but grows with N in the structured overlay. However, it grows very slowly. The fault detection traffic, which accounts for most of the maintenance overhead, is constant for leaf set members and it is proportional to log_2(N) for routing table entries. For example, increasing N to one billion nodes with a similar pattern of node arrivals and departures would increase maintenance traffic in the structured overlay to less than 0.69

messages per second per node (or less than 69 bytes per second per node), which is still negligible.

3 Exploiting heterogeneity

Nodes in deployed peer-to-peer overlays are heterogeneous [30]; they have different bandwidth, storage, and processing capacities. An overlay that ignores the different node capacities must bound the load on any node to be below the load that the least capable nodes are able to sustain; otherwise, it risks congestion collapse. It is important to exploit heterogeneity to improve scalability.

Can unstructured overlays exploit heterogeneity more effectively than structured overlays?

Structured overlays have constraints on the graph topology that reduce flexibility to adapt the topology to exploit heterogeneity. However, some structured overlays have significant flexibility in the choice of some overlay neighbours, which is important to implement proximity neighbour selection [35, 29, 16, 28]. These structured overlays can exploit heterogeneity by modifying the proximity neighbour selection algorithm to choose nodes with high capacity as overlay neighbours. We show that this is as effective as recent proposals to adapt unstructured overlay topologies [10].

This section describes the implementation of several structured and unstructured overlay maintenance protocols that exploit heterogeneity and compares their performance.

3.1 Unstructured overlays

We implemented two unstructured overlay maintenance algorithms that exploit heterogeneity: a version of Gnutella 0.6 [2] and a version of Gia [10].

Gnutella 0.6 extends the Gnutella 0.4 protocol by adding the concept of super-peers [3]. Nodes that are capable of contributing enough resources to the overlay are classified as super-peers and organized into a random graph using the optimized version of the Gnutella 0.4 protocol (which was described in the previous section). Ordinary nodes are not part of the random graph. Instead, each ordinary node attaches to a small number of randomly selected super-peers and proxies its data discovery queries through them. Ordinary nodes select super-peers to attach to using a random walk with a modified neighbour discovery message and they exchange I’m alive messages with the selected super-peers to detect failures. This topology places most of the search and overlay maintenance load on super-peers.

lay topology such that nodes with higher capacity have higher degree. Since high-degree nodes receive a larger fraction of the traffic, this ensures that they have the capacity to handle this traffic. Gia’s fine-grained approach to exploit heterogeneity can perform better than simply using super-peers [10].

We implemented Gia exactly as described in [10]. Node discovery is implemented using a random walk (as described for Gnutella 0.4) but the nodes use Gia’s pick_neighbor_to_drop function [10] to decide whether to send back a neighbour invitation message. Topology adaptation is driven by Gia’s satisfaction_level function, which increases with the sum of the ratio between the capacity and degree of each neighbour. This function is evaluated periodically and nodes with a low satisfaction level attempt to find a new neighbour to increase the level. The adaptation interval is computed as in Gia (with the parameters K = 256 and T = 10 seconds).

3.2 Structured overlays

We implemented two structured overlay maintenance protocols based on Pastry that exploit heterogeneity: SuperPastry uses super-peers like Gnutella 0.6 and HeteroPastry uses topology adaptation like Gia.

It is simple to exploit the super-peers concept in a structured overlay. The super-peers are organized into a structured overlay using the Pastry algorithm described in the previous section. Ordinary peers do not join this overlay. Instead they attach to a small number of super-peers as in Gnutella 0.6. Ordinary peers select super-peers to attach to by routing to random destination keys through a bootstrap super-peer. They exchange I’m alive messages with the selected super-peers to detect failures as in Gnutella 0.6.

The implementation of capacity-aware topology adaptation in structured overlays is less obvious. We propose a simple solution based on existing proximity neighbour selection algorithms [29, 35, 16]. These algorithms select the closest neighbours in the underlying network subject to the structural constraints on the topology. They can be modified to provide capacity-aware topology adaptation by using a proximity metric that reflects node capacity.

HeteroPastry uses the Pastry algorithm described in the previous section except that it achieves capacity-aware topology adaptation by modifying the neighbour selection function to take node capacity into account. Given two candidates y and z for slot (r, c) in node x’s routing table, x selects z if it has capacity greater than y or if z and y have the same capacity and z’s nodeId is
Gia [10] provides a more fine-grained adaptation to numerically closer than y’s to the nodeId obtained by re-
heterogeneity. Each node selects a numerical capacity placing the (r + 1)th digit of x’s nodeId by c. We assume
value that abstracts the amount of resources that it is that node capacities are quantized into a few discrete val-
willing to contribute to the overlay. Gia adapts the over- ues for the randomization based on nodeIds to be effec-

90 NSDI ’05: 2nd Symposium on Networked Systems Design & Implementation USENIX Association
0.45
tive at distributing load. It is possible to design neighbour 0.4
Gnutella 0.6
SuperPastry

Messages / second / node


selection functions that combine several capacity metrics 0.35

and even network proximity. 0.3
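The capacity-aware comparison between two candidates for a routing table slot can be sketched as follows. This is a minimal illustration, not the authors' implementation: the `Node` type, the helper names, the binary digit base, and the use of a plain absolute difference as "numerical closeness" (ignoring any wrap-around of the identifier space) are all assumptions made for the example. The sketch also assumes that both candidates already satisfy the structural constraint for slot (r, c).

```python
from dataclasses import dataclass

BASE = 2  # assumed digit base for nodeIds (illustrative only)


@dataclass(frozen=True)
class Node:
    node_id: tuple  # nodeId digits, most significant first (hypothetical layout)
    capacity: int   # quantized capacity value


def to_int(digits, base=BASE):
    """Interpret a digit tuple as an integer for numerical comparison."""
    value = 0
    for d in digits:
        value = value * base + d
    return value


def slot_target(x_id, r, c):
    """nodeId obtained by replacing the (r + 1)th digit of x's nodeId by c."""
    digits = list(x_id)
    digits[r] = c  # row r (0-based) is the (r + 1)th digit
    return tuple(digits)


def prefer(x, r, c, y, z):
    """Return the candidate that x keeps for slot (r, c).

    Higher capacity wins; with equal capacity, z wins only if its nodeId
    is numerically closer to the slot target than y's.
    """
    if z.capacity != y.capacity:
        return z if z.capacity > y.capacity else y
    target = to_int(slot_target(x.node_id, r, c))
    dist_z = abs(to_int(z.node_id) - target)
    dist_y = abs(to_int(y.node_id) - target)
    return z if dist_z < dist_y else y
```

Because capacities are compared first, the nodeId tie-break only matters among candidates in the same capacity class, which is why the text assumes that capacities are quantized into a few discrete values.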


In addition to specifying capacity, nodes can specify an upper bound on their indegree, i.e., the number of nodes with routing table entries pointing to them. This bound is likely to be a function of their capacity. We modified Pastry to ensure that the number of routing table entries pointing to a node does not exceed the specified bound. Each node x keeps track of nodes with routing table entries that point to x (backpointers) and sends backoff messages when the number of backpointers exceeds the indegree bound. It is necessary to keep track of backpointers because neighbour links in Pastry routing tables are not symmetric. Neighbour links in the leaf set are symmetric and their number is fixed at 32 in this paper. They are not counted as part of the indegree of x unless they also have a routing table entry pointing to x.

Nodes keep track of backpointers by passively monitoring messages received from other nodes. They add a node to the backpointer set when they receive a message from the node and, every D seconds, they remove nodes from which they did not receive messages for more than 2D seconds. D is set to the routing table probing period because nodes send probes to their routing table entries every routing table probing period.

If the number of backpointers exceeds the bound after adding a new node, the local node x selects one of the backpointers for removal and sends that node a backoff message. For each backpointer y with x in slot (r, c) of its routing table, the numerical distance between x’s nodeId and the nodeId obtained by replacing the (r + 1)th digit of y’s nodeId by c is computed. x selects the node with the maximal distance for eviction. This policy is the dual of the neighbour selection function (except that it is oblivious to capacity) to provide stability.

Nodes that receive a backoff message remove the sender from their routing tables and insert the sender in a backoff cache. We modified the neighbour selection function to ensure that it never selects nodes in the backoff cache. The current implementation removes entries from the backoff cache after four routing table probing periods.

Our solution is not applicable to some structured overlays that provide no flexibility at all in the selection of neighbours, for example, the original Chord [32] and CAN [25]. It is possible to use virtual nodes [32] to adapt these structured overlays to different node capacities. Each physical node can simulate a number of virtual overlay nodes proportional to its capacity. The problem is that node capacities can vary by several orders of magnitude. Therefore, the number of virtual nodes must be much larger than the number of physical nodes, which results in a large increase in maintenance traffic that can render this solution impractical.

3.3 Experimental comparison

We compared the maintenance overhead of the different overlay maintenance algorithms that exploit heterogeneity to achieve scalability. We used the experimental setup in Section 2.3, which does not include any query traffic, to isolate the maintenance overheads.

Gnutella 0.6 and SuperPastry were configured with similar parameters to allow a fair comparison. Each ordinary node selected 3 super-peers as proxies and each super-peer acted as a proxy for up to 30 ordinary nodes. Each super-peer in Gnutella 0.6 had at least 10 super-peer neighbours and at most 32. The indegree bound of super-peers in SuperPastry was also 32. The simulator provided each joining node with a randomly selected super-peer to bootstrap the joining process and joining nodes were marked super-peers with a probability of 0.2. Figure 2 shows the maintenance overhead measured as the number of messages sent per second per node.

Figure 2: Maintenance overhead in messages per second per node over time for the two overlays using super-peers.

The maintenance overhead is dominated by the cost of failure detection as before. In Gnutella 0.6, a node has 7.5 neighbours on average, which results in 0.25 I’m alive messages per second per node on average. This accounts for most of the control traffic, as shown in Figure 2. Both systems incur the same communication overhead between ordinary peers and super-peers. SuperPastry achieves lower overhead than Gnutella 0.6 because it exploits structure to reduce failure detection overhead. The overhead is negligible in both systems.

We also ran experiments to compare the maintenance overhead of Gia and HeteroPastry. Gia was configured using the parameters in [10]. The lower bound on the number of neighbours in Gia is 3 and the upper bound is max(3, min(128, C/4)) [10], where C is the capacity of the node. We use the same bounds on the indegree of nodes in HeteroPastry. The capacity of a node (in both overlays) is selected when it joins according to the probabilities in Table 1, which were taken from [10].

Table 1: Node capacity distribution

Capacity    Probability
1           0.2
10          0.45
100         0.3
1000        0.049
10000       0.001

Figure 3 plots the maintenance overhead in messages per second per node against time for Gia and HeteroPastry. Failure detection messages account for most of the overhead as in previous experiments. Nodes in Gia have 15.6 neighbours on average, which results in 0.52 I’m alive messages per second per node. The overhead of HeteroPastry is almost identical to the overhead incurred by the version of Pastry that does not exploit heterogeneity and does not bound indegrees (which is shown in Figure 1).

Figure 3: Maintenance overhead in messages per second per node over time for Gia and HeteroPastry.

Figure 3 shows that the overhead of topology adaptation in both Gia and HeteroPastry is negligible. The next set of results shows that topology adaptation in HeteroPastry is also effective.

We examined the routing tables of live HeteroPastry nodes five hours into the trace and calculated the average capacity of the nodes in routing table entries at each routing table level across the 2627 live nodes. Figure 4 shows the results.

Figure 4: Average capacity of nodes in routing table entries at each level in HeteroPastry.

Topology adaptation fills routing tables with high capacity nodes. The average capacity of nodes in levels up to 5 is above 897. The capacity decreases when the level increases because of stronger structural constraints. A node in level l of the routing table must match the nodeId of the local node in the first l digits. The size of the set of nodes that can be selected to fill slots at level l + 1 is half the size of the set of nodes that can fill slots at level l. Therefore, the probability that these sets include high capacity nodes decreases as the level increases. Since most nodes have fewer than 12 (log2(2627)) levels in their routing tables, there is some noise for levels above 12.

We also measured the average indegree of nodes with each capacity value at the same point in time. The results are in Figure 5. The average indegree of the two nodes with capacity 10000 is above the indegree bound of 128. This happens because nodes are very likely to select nodes with capacity 10000 for the top levels of their routing tables and these pointers are only removed after the node receives a backoff message. The results show that topology adaptation in HeteroPastry is effective at distributing the indegree according to capacity.

Figure 5: Average indegree of nodes with each capacity value.

4 Data queries

Complex queries are important in mass-market data sharing applications [10]. Since users do not know the exact names of the files they want to retrieve, the exact-match queries offered by structured overlays are not directly useful in these applications. Users discover data with keyword searches, which are readily supported by unstructured overlays that visit a subset of random nodes in the overlay and execute the search query locally at each visited node.

Can unstructured overlays support complex queries more efficiently than structured overlays?

Several research prototypes support keyword searches using the exact-match queries of structured overlays [27, 33, 14, 18] to implement inverted indices. The basic idea is to use the structured overlay to map keywords to overlay nodes. The node responsible for a keyword stores an index with the location of all documents that contain the keyword. When a file is added to the system, the nodes
responsible for the keywords in the file are contacted to update the appropriate indices. A query for documents containing a set of keywords contacts the nodes responsible for those keywords and intersects their indices.

Unfortunately, this approach has several problems. Maintaining the indices in the presence of churn is expensive and popular keywords may be mapped to low capacity nodes that cannot cope with the load [10]. Additionally, the queries can be expensive because they require computing the intersection of large indices. The analysis in [20] shows that this approach performs worse than flooding queries to 60,000 nodes in a random graph. Therefore, this approach performs significantly worse than recent unstructured overlays like Gia [10]. Additionally, unstructured overlays can support even more sophisticated queries that are not supported by the inverted indices approach, for example, regular expressions and range queries on multiple attributes.

This section explores a different approach to supporting complex queries in structured overlays. We developed a hybrid system that uses the topology from structured overlays with the data placement and data discovery strategies of unstructured overlays. We introduce new techniques to perform floods or random walks over structured topologies that provide support for arbitrarily complex queries. These techniques take advantage of structural constraints on the topology to ensure that nodes are visited only once during a query, to control the number of nodes that are visited accurately, and to increase the average capacity of nodes visited during a query to exploit heterogeneity more effectively.

The results in the previous sections show that it is possible to maintain a structured overlay that exploits heterogeneity with low maintenance overhead. Additionally, the hybrid system does not constrain data placement; nodes do not have to incur the overhead of updating distributed indices for each keyword in their files. This section compares the performance of random walks and floods on the overlays that were described in the previous section.

4.1 Unstructured overlays

We used random walks to discover data because they have been shown to induce lower overhead than the constrained floods [23] used by current versions of Gnutella. These random walks are biased to prefer nodes with higher degree in Gia and are unbiased in the other unstructured overlays. The original Gia [10] biased the random walks to prefer nodes with higher capacity but our experimental results indicate that preferring nodes with higher degree yields both higher success rate and lower delay. We present results for this optimized version of Gia.

We observed that random walks in Gia were likely to visit the same node more than once, which resulted in worse search performance. We added a list to each query with all the nodes already visited by the query to prevent this. Nodes do not forward a query to a node that is in this list.

All unstructured overlays use one hop replication, which has been shown to improve search performance in unstructured overlays [10]. A node replicates an index of its content at each of its neighbours. In Gnutella 0.6, these indices are only replicated at super-peers.

4.2 Structured overlays

The hybrid system exploits structure to implement random walks and constrained floods more efficiently.

Flooding in random graphs is inefficient because each node is likely to be visited more than once. In a graph with an average degree of k, a flood that visits all nodes will send on average (k − 1) × N messages (where N is the size of the overlay). Additionally, it is difficult to control the number of nodes visited during a constrained flood. Floods are constrained using a time-to-live field in the query message that is decremented every time the query is forwarded. The query is not forwarded when the time-to-live field drops to zero. This provides very coarse control over the number of nodes visited.

The hybrid system can do better by replacing flooding with the broadcast mechanisms that have been proposed for structured overlays [26, 9, 11]. We use Pastry’s broadcast mechanism [9] to flood queries to overlay nodes. A node y broadcasts a query by sending the query to all the nodes x in its routing table. Each query is tagged with the routing table row r of node x. When a node receives a query tagged with r, it forwards the query to all nodes in its routing table in rows greater than r, if any.

A node may have a missing entry in a slot in its routing table, for example, because it pointed to a node that failed. The broadcast overcomes this problem by using Pastry to route the query to a node with the appropriate nodeId to fill the slot (if there is any) [9]. Almost all nodes receive the query only once but the technique to deal with empty routing table slots may result in a small number of duplicates.

We place an upper bound on the row number of entries to which the query is forwarded to constrain the flood. This bounds the number of nodes visited to a power of two. It is simple to extend this mechanism to provide arbitrarily fine-grained control over the number of nodes visited.

This mechanism can easily be modified to perform random walks rather than floods by performing a breadth-first traversal of the tree used for flooding. This can be done by adding a set of nodes to visit in the query message. A random walk query message includes the tag r,
an array q with queues of nodes indexed by routing table row, and a bound d on the maximum row number to traverse. When the query is received at node x, it appends the nodes in each routing table row r′ to queue q[r′] provided that r < r′ ≤ d. Then, if queue q[r] is not empty, x removes the next node from the queue and forwards the query to this node. If q[r] is empty, the query is forwarded to the first node in queue q[r + 1] and r is incremented. If all queues are empty, the random walk is complete.

The results in the previous section show that the average capacity of the nodes in routing table entries in HeteroPastry decreases as the row number increases. Therefore, the mechanism that we use to bound the floods and random walks biases them to visit nodes with higher capacity in HeteroPastry.

We also implement one hop replication in the hybrid system. Each node replicates an index of its local content on the nodes in its routing table. Therefore, it is expected to replicate its index in log2(N) other nodes.

4.3 Experimental comparison

We compared the performance of random walks on structured and unstructured overlays. We used the basic experimental setup described in the previous sections but we simulated queries and node file stores.

We used a real-world trace of files stored by eDonkey [12] peers to model the sets of files stored by simulated nodes. There are 37,000 peers in the trace and, for each peer, there is a record with the identifiers of the files stored by the peer. Figure 6 shows the distribution of the number of files stored by each peer. It excludes the 25,172 peers that have no files. We model the set of files stored by each node as follows: when a node joins, the simulator chooses a random unused record from the trace and assigns the files in the record to the node.

Figure 6: Distribution of the number of files per node for the eDonkey file trace [12].

There are approximately 923,000 unique files. File copies exhibit a heavy-tailed Zipf-like distribution as shown in Figure 7. Full details about the trace can be found in [12].

Figure 7: Number of files versus file rank for the eDonkey file trace [12].

The eDonkey trace does not include queries but the number of copies of a file is strongly correlated with the number of queries that it satisfies. Therefore, our query distribution matches the distribution of the number of copies of files.

Each node generates 0.01 query messages per second using a Poisson process and each query searches for a file in the trace. The simulator maintains the distribution of the number of copies of files stored by nodes that are currently in the overlay. The target file for each query is chosen from this distribution (which is a sample of the distribution in Figure 7). This ensures that at least one copy of the target file is stored in the overlay when the query is initiated.

In all the experiments, we bound random walks to visit at most 128 nodes. When a node x receives a query, it checks if the target file is stored locally or if it is stored by nodes whose indices are replicated locally. In the first case, the query is satisfied and x does not forward the query further. In the second case, x contacts a random node y which it believes has a copy of the file. If y has the file, the query is satisfied and y sends an acknowledgment back to x. If x receives the acknowledgment before a timeout, it stops forwarding the query. Otherwise, x contacts another random node that it believes has the file or it forwards the query if there are no more such nodes.

We measured the fraction of queries that are satisfied and the delay from the moment a query is initiated until it is satisfied. We also measured the load by counting the number of messages sent per second per node.

4.3.1 Gnutella trace

We compared the performance of data discovery on the overlays that exploit heterogeneity. Figure 8 shows the query success rate, Figure 9 shows the delay for successful queries, and Figure 10 shows the overhead in messages per second per node. The results show that fine-grained topology adaptation performs better than using super-peers. HeteroPastry achieves significantly higher
success rate, and lower delay and overhead than SuperPastry and Pastry. We also ran experiments with overlays that do not exploit heterogeneity and found that they perform significantly worse.

Figure 8: Query success rate.

Figure 9: Query delay for successful queries.

Figure 10: Messages per second per node.

SuperPastry and Gnutella 0.6 achieve very similar performance by all metrics. But HeteroPastry achieves significantly better performance than all the others. It achieves the highest success rate, the lowest delay, and the lowest overhead. This demonstrates that HeteroPastry can exploit heterogeneity effectively to improve scalability; the high success rate indicates that the bound on the length of random walks can be small and the low delay shows that they are likely to terminate early, which results in low overhead. The other systems would require longer random walks to achieve the success rate of HeteroPastry, which would increase their overhead.

All the overlay maintenance algorithms benefit from suppression of failure detection traffic by query traffic. For example, Gia’s overhead without queries is approximately twice the overhead of Gnutella 0.6. The overheads of the two are comparable with queries because of the suppression of failure detection traffic and shorter random walks.

Figure 11: Cumulative distribution of messages per second per node for each capacity value in HeteroPastry.

Figure 12: Cumulative distribution of messages per second per node for each capacity value in Gia.

So far we have considered the overhead averaged over all live nodes in each 10 minute window in the trace. Since both Gia and HeteroPastry adapt the topology to distribute load according to node capacity, we looked at the distribution of the number of messages per second per node in the 10 minutes preceding the 5 hour mark in the trace. The total number of messages received in this 10 minute window was 2.4 times higher for Gia than for HeteroPastry. Figures 11 and 12 show the cumulative distribution of the number of messages per second per node for each capacity value in HeteroPastry and Gia. The maximum message rate observed was only 42.63 for Gia and 26.48 for HeteroPastry. Both systems do a good job of distributing message load according to capacity; nodes with higher capacity receive more messages. The message rate for nodes with capacity 1 is low; the median is only 0.17 and the 95th percentile is only 0.30 in HeteroPastry, and the median is 0.11 and the 95th percentile is 0.13 in Gia. For the nodes with capacity 10 in HeteroPastry, the median is also 0.17 and the 95th percentile is 0.32, and the median is 0.11 and the 95th percentile is 0.14 in Gia. Since the indegree of 1-
and 10-capacity nodes is bounded to the same value, this is not surprising. In both Gia and HeteroPastry, the 100-capacity nodes incur a higher overhead than the 1- and 10-capacity nodes but a lower overhead than the 1000-capacity nodes.

The figures also show that the load on any node is sufficiently low (with a query rate of 0.01 queries per second per node) that flow control is not necessary. Gia’s flow control mechanism [10] can be applied to HeteroPastry to enable scaling to higher query rates.

We also studied the distribution of replicas of node indices, which is another indicator of the effectiveness of both systems in adapting the topology to different node capacities. Table 2 summarises the distribution of replicas of indices for each capacity value in both systems. The total number of index replicas is 27,707 in HeteroPastry and 38,153 in Gia. Both systems do a good job at distributing index replicas (and indegree) according to node capacity. Gia replicates more because it is more effective at pushing replicas to nodes with capacity 100 and 1000.

Table 2: Distribution of replicas of node indices for different capacity values in Gia and HeteroPastry.

Capacity                 1      10     100     1000     10000
Gia           Mean       3      3      23.56   126.02   128
              Median     3      3      24      128      128
              95th       3      3      25      128      128
HeteroPastry  Mean       2.15   2.38   14.50   104.66   128
              Median     2      3      15      125      128
              95th       3      3      24      128      128

HeteroPastry maintains significantly fewer index replicas than Gia but it performs better because its random walks visit nodes with more index replicas and more diverse index replicas than those visited by random walks in Gia. In Gia, nodes that are close in the overlay topology tend to share the same high capacity neighbours. This reduces the number of unique files known by a node and its neighbours and it forces biased random walks to visit low capacity nodes before they can find new high capacity nodes to visit. Since the number of index replicas stored by a node is proportional to its capacity, this results in poor performance. The topology adaptation and random walk mechanisms in HeteroPastry exploit structure to prevent this problem; the constraints on the node identifiers of neighbours and nodes visited during a random walk ensure that the initial set of nodes visited has high capacity and knows about more unique files. This results in HeteroPastry visiting significantly fewer nodes with capacity 100 during random walks than Gia (as shown in Figure 11).

4.3.2 Poisson traces

The experiments described so far use a trace of node arrivals and departures collected in a real Gnutella deployment. The next set of experiments compares the performance of Gia and HeteroPastry using artificial traces with more nodes and different rates of churn. These traces have Poisson node arrivals and an exponential distribution of node session times with the same rate. We generated traces with session times of 5, 15, 30, 60, 120 and 600 minutes and in all cases the average number of nodes was 10,000. We used the same data and query distribution as in the previous experiments. It is important to note that a session time of 5 minutes is short; indeed, it is 28 times shorter than the average session time of 2.3 hours observed in the Gnutella trace.

Figure 13 shows the total number of messages per second per node for the different session times. Both Gia and HeteroPastry have low overhead across all session times.

Figure 13: Messages per second per node for Gia and HeteroPastry versus session time.

Gia’s overhead is almost constant across all session times. Short session times increase Gia’s overhead because of increased retransmissions and traffic to fill neighbour tables. However, this is offset by a decrease in failure detection traffic due to a decrease in the average number of neighbours; there are 15.1 neighbours when the session time is 600 minutes and 10.7 when it is 5 minutes.

HeteroPastry has a lower message overhead than Gia for session times of 30 minutes or greater. This overhead decreases between 60 and 600 minutes because HeteroPastry adapts the routing table probing rate to match the failure rate. HeteroPastry incurs a higher message overhead than Gia for extremely high churn rates mostly due to the overhead of maintaining the leaf set. This overhead could be reduced without impacting query success rate and delay by using a smaller leaf set or disabling the mechanisms to ensure strong leaf set consistency [6], which are not important in this application.

Figure 14 shows the lookup success rate for the different session times. As in previous experiments, HeteroPastry achieves a success rate higher than Gia across all session times.
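The artificial churn traces used in this subsection can be approximated with a short generator. The sketch below is one reading of the description, not the authors' trace generator: it assumes that "the same rate" means arrivals and departures balance in steady state, so the arrival rate is set to the target population divided by the mean session time (Little's law). The function names, the seed parameter, and the empty starting population are assumptions of the example.

```python
import random


def poisson_trace(mean_session_min, avg_nodes, duration_min, seed=0):
    """Return a list of (arrival, departure) times in minutes.

    Arrivals form a Poisson process and session times are exponential.
    The arrival rate is chosen so that the steady-state population is
    about avg_nodes (assumption: avg_nodes = rate * mean session time).
    """
    rng = random.Random(seed)
    arrival_rate = avg_nodes / mean_session_min  # nodes per minute
    sessions = []
    t = 0.0
    while True:
        # Exponential inter-arrival times yield Poisson arrivals.
        t += rng.expovariate(arrival_rate)
        if t >= duration_min:
            break
        session = rng.expovariate(1.0 / mean_session_min)
        sessions.append((t, t + session))
    return sessions


def live_nodes(sessions, at_time):
    """Count the nodes whose session covers at_time."""
    return sum(1 for arrive, depart in sessions if arrive <= at_time < depart)
```

Since this sketch starts from an empty overlay, the population only approaches the target after a warm-up of several mean session times. For a 5 minute session time and 10,000 nodes, the assumed arrival rate is 2,000 nodes per minute.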

Figure 14: Query success rate for Gia and HeteroPastry versus session time. (Plot: success rate versus session time in minutes; series: HeteroPastry, Gia.)

The success rates with 10,000 nodes are lower than those observed before because there are more nodes and the random walk length is still bounded at 128. There are at most 2,700 active nodes at any time in the Gnutella trace. This also results in higher message overhead with 10,000 nodes even with a session time of 600 minutes.

The delay incurred for successful lookups is similar in both HeteroPastry and Gia. HeteroPastry achieves a lower average delay per lookup because it has a higher success rate and failed lookups take longer to complete on average than successful lookups. Therefore, HeteroPastry achieves a delay at least 12% lower than Gia with 5-minute session times and at least 43% lower with 600-minute session times.

4.3.3 Constrained floods

We also compared the performance of constrained flooding and random walks in HeteroPastry. We configured constrained floods to visit at most 128 nodes, as with the random walks. Both algorithms visit exactly the same 128 nodes when the query fails, so they have the same success rate.

Figure 15 shows the delay for successful queries using both constrained floods and random walks. It shows that constrained flooding can locate content faster than random walks. This is not surprising because constrained flooding visits nodes in parallel; all 128 nodes are visited after only 7 hops. It takes 128 hops to visit all the nodes with the random walk. Additionally, random walks use acknowledgments and retransmissions to recover when the query is forwarded to a node that fails. This introduces delays that increase when the failure rate in the trace increases (as shown in Figure 15). The delay of constrained floods remains constant because we do not use acknowledgments and retransmissions and instead rely on redundancy to cope with node failures. We observed the same success rate for both flooding and random walks, which demonstrates the effectiveness of using redundancy to cope with node failure during constrained floods.

Figure 15: Query delay when using constrained flooding and random walks in HeteroPastry. (Plot: delay in ms versus time in hours; series: random walks, flooding.)

Figure 16 shows the number of messages per second per node when using constrained floods and random walks in HeteroPastry. It demonstrates the advantage of random walks over flooding; random walks result in lower overhead because they stop when they find a copy of the file and visit fewer nodes than constrained floods on average. It is interesting to note that the overhead with constrained floods is comparable to the overhead in the unstructured overlays. Additionally, some peer-to-peer applications discover multiple nodes with matching content, for example, to enable more efficient downloads with some form of striping. The benefit of random walks over constrained floods decreases in this case. Constrained floods are likely to be the best strategy for many applications.

Figure 16: Messages per second per node when using constrained floods and random walks in HeteroPastry. (Plot: messages per second per node versus time in hours; series: flooding, random walks.)

5 Conclusion

It is commonly believed that unstructured overlays cope with churn better, exploit heterogeneity more effectively, and support complex queries more efficiently than structured overlays. This paper shows that coping with churn, exploiting heterogeneity and supporting complex queries are not fundamental problems for structured overlays. We describe how to exploit structure to achieve high resilience to churn with maintenance overhead as low as that of unstructured overlays and how to modify proximity neighbour selection to exploit heterogeneity effectively to improve scalability. Additionally, we present a hybrid
system that uses the search and data placement strategies of unstructured overlays on a structured overlay topology. Simulation results using a real-world trace show that the hybrid system can support complex queries with lower message overhead while providing higher query success rates and lower response times than the state of the art in unstructured overlays.

The additional functionality provided by structured overlays has proven important to achieve scalability and efficiency in a wide range of applications. Structured overlays can emulate the functionality of unstructured overlays with comparable or even better performance. Interestingly, it is not clear that unstructured overlays can efficiently emulate the same functionality as structured overlays.
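
Section 4.3.3 reports that a constrained flood reaches all 128 nodes after only 7 hops, while a random walk needs 128 hops. The Python sketch below reproduces that arithmetic under a simplifying assumption that is not taken from the paper: the set of nodes holding the query (including the querying node) doubles on every hop of the flood, and the walk forwards the query to exactly one node per hop. The names flood_hops, walk_hops and growth_per_hop are hypothetical and chosen only for illustration.

```python
def flood_hops(nodes_to_reach: int, growth_per_hop: int = 2) -> int:
    """Hops needed for a flood to cover nodes_to_reach nodes.

    Assumption (not from the paper): the covered set, including the
    querying node, multiplies by growth_per_hop on every hop.
    """
    if nodes_to_reach < 1 or growth_per_hop < 2:
        raise ValueError("need nodes_to_reach >= 1 and growth_per_hop >= 2")
    covered, hops = 1, 0
    while covered < nodes_to_reach:
        covered *= growth_per_hop  # every covered node forwards in parallel
        hops += 1
    return hops


def walk_hops(nodes_to_reach: int) -> int:
    """A walk forwards the query to one node per hop: one hop per node."""
    return nodes_to_reach


if __name__ == "__main__":
    limit = 128  # node limit used for both strategies in Section 4.3.3
    print(flood_hops(limit), walk_hops(limit))  # prints: 7 128
```

Under these assumptions the flood's hop count grows logarithmically with the node limit and the walk's grows linearly, which is consistent with the lower delay the section reports for constrained floods. The sketch ignores per-hop latency, acknowledgments and retransmissions.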
