Debunking Some Myths About Structured and Unstruct
All content following this page was uploaded by Miguel Castro on 02 May 2014.
USENIX Association NSDI ’05: 2nd Symposium on Networked Systems Design & Implementation 85
and new node arrivals. We show that this technique can achieve robustness to high rates of churn with overhead lower than unstructured overlays.

• We describe how to exploit heterogeneity by modifying any proximity neighbour selection algorithm [8, 35, 16] to adapt the topology such that the indegree of nodes matches their capacity.

• We introduce techniques to support complex queries efficiently on structured topologies without constraints on data placement. These techniques perform floods or random walks on structured topologies but exploit structural constraints to ensure that nodes are visited only once during a query, the number of visited nodes is controlled accurately, and the average capacity of nodes visited during a query is increased to better exploit heterogeneity. Additionally, they remove the need to maintain both a structured and an unstructured overlay to implement hybrid search strategies [22].

The paper presents results of detailed comparisons between several representative structured and unstructured overlay topology maintenance algorithms. These results were obtained using simulations driven by real-world traces of node arrivals and departures in the Gnutella file sharing application [30]. The results show that our techniques enable structured overlays to cope with high rates of churn and exploit heterogeneity effectively with a maintenance overhead comparable to that achieved by state-of-the-art unstructured overlays.

We also compared the performance of data discovery using several representative unstructured overlays and using our techniques to perform floods and random walks on structured overlays. We used a real trace of content distribution across nodes in the eDonkey peer-to-peer file sharing application [12] to drive the simulations. The results show that our techniques can discover data more often, faster, or with lower overhead.

The additional functionality provided by structured overlays has proven important to achieve scalability and efficiency in a wide range of applications. Structured overlays can emulate the functionality of unstructured overlays with comparable or even better performance.

In Section 2, we describe and compare structured and unstructured topology maintenance protocols assuming a homogeneous setting. Section 3 extends the structured topology maintenance protocol to exploit heterogeneity in peers’ resources and compares this with unstructured topology maintenance protocols which exploit heterogeneity. Section 4 compares the performance of content discovery using random walks and flooding on both structured and unstructured topologies, and Section 5 presents our conclusions.

2 Topology maintenance with churn

Measurement studies of deployed peer-to-peer overlays have observed a high rate of churn [4, 17, 30]; nodes join and leave these overlays constantly. Therefore, peer-to-peer overlays should be able to cope with a high rate of churn.

Can unstructured overlays cope with churn better than structured overlays?

Each node maintains a set of neighbours to form an overlay. Structured overlays impose constraints on the overlay topology; nodes have identifiers and two nodes can be neighbours only if their identifiers satisfy certain constraints. Unstructured overlays do not impose constraints on neighbours. Both types of overlay can improve robustness to churn at the expense of increased maintenance overhead by increasing the number of neighbours per node and probing them more frequently to detect and replace failed neighbours.

It is believed that maintaining a structured overlay in the presence of churn is more expensive than maintaining an unstructured overlay because of the constraints on neighbour selection. This section shows that this is not necessarily the case. It is possible to use structure to achieve better robustness with lower maintenance overhead in a structured overlay.

Structured overlays also impose constraints on data placement that can result in high overhead under churn for some applications [5]. We study structured overlays without these constraints to keep the evaluation independent of any particular application. Data placement constraints do not result in significant overhead in several applications (for example, content distribution [9] and Web caching [19]) and the search technique in Section 4 does not constrain data placement at all.

This section describes the implementation of structured and unstructured overlay maintenance protocols in a homogeneous setting and compares their performance. The next section explains how to exploit heterogeneity.

2.1 Unstructured overlays

We implemented an unstructured overlay maintenance protocol based on the specification of Gnutella version 0.4 [15] but we added many optimizations to the protocol to ensure a fair comparison.

Gnutella 0.4 organizes overlay nodes into a random graph. Each node in the overlay maintains a neighbour table with the network addresses of its neighbours in the overlay. The neighbour tables are symmetric; if node x has node y in its neighbour table then node y has node x in its neighbour table. There is an upper and lower bound on the number of entries in each node’s neighbour table.
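The two neighbour-table invariants just described (symmetry and a bounded number of entries) can be sketched as follows. This is a minimal illustration rather than the simulator code used in the paper: the names `Node`, `connect` and `disconnect` are hypothetical, and the bound values 4 and 8 are assumptions chosen for the example because the text does not state them.

```python
from dataclasses import dataclass, field

# Assumed bounds for illustration only; the paper does not give the values.
LOWER_BOUND = 4
UPPER_BOUND = 8


@dataclass
class Node:
    address: str
    # Network addresses of the current overlay neighbours.
    neighbours: set = field(default_factory=set)

    def has_room(self) -> bool:
        """True if the table is below the upper bound."""
        return len(self.neighbours) < UPPER_BOUND

    def needs_neighbours(self) -> bool:
        """True if the table is below the lower bound."""
        return len(self.neighbours) < LOWER_BOUND


def connect(x: Node, y: Node) -> bool:
    """Add a link only if both tables have room, keeping the tables symmetric."""
    if x is y or y.address in x.neighbours:
        return False
    if not (x.has_room() and y.has_room()):
        return False
    x.neighbours.add(y.address)
    y.neighbours.add(x.address)
    return True


def disconnect(x: Node, y: Node) -> None:
    """Remove a link from both tables, for example after y is found faulty."""
    x.neighbours.discard(y.address)
    y.neighbours.discard(x.address)
```

Updating both tables inside a single function is one simple way to keep the symmetry invariant; in a real deployment the two updates happen on different machines and need a message exchange.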
A joining node uses a random walk starting from a bootstrap node, which is randomly chosen from the set of nodes already in the overlay, to find other nodes to fill its neighbour table. It sends the bootstrap node a neighbour discovery message with a counter that is initialized to the number of nodes required to fill its neighbour table. Upon receiving a discovery message, a node checks whether it has fewer neighbours than the upper bound. If this is the case, the node sends a message to the joining node inviting it to become a neighbour and decrements the counter in the neighbour discovery message. In either case, the neighbour discovery message is forwarded to a randomly chosen neighbour if the counter is still greater than zero. To increase resilience to node and network failures, all neighbour discovery messages are acknowledged. If a node does not receive an acknowledgement within a timeout, it selects another neighbour at random and forwards the neighbour discovery message to that neighbour.

In addition to joins, nodes need to detect failures and replace faulty neighbours. Every t seconds each node sends an I’m alive message to every node in its neighbour table. Since all nodes do the same and neighbour tables are symmetric, each node should receive a message from each neighbour in each t second period. If a node does not receive a message from a neighbour, it explicitly probes that neighbour and, if no reply is received, the neighbour is assumed to be faulty. We used t = 30 seconds in this paper. Nodes maintain a cache of other nodes that they use to replace failed neighbours. If the cache is empty, they obtain new neighbours by sending a neighbour discovery message to a randomly chosen neighbour. Any message sent between two nodes serves in place of an explicit I’m alive message.

Simulation results show that this protocol leads to poor query performance because the neighbour table of a joining node and those of its neighbours are likely to share a significant fraction of nodes. This reduces the effectiveness of floods and random walks to discover data. We overcome this problem by forwarding the neighbour discovery message over a number of random hops after each neighbour invitation is sent. We add a hop counter to discovery messages that is set to R by every node that replies with a neighbour invitation. Nodes decrement the hop counter when they forward a discovery message and they only consider sending a neighbour invitation when the counter is less than or equal to zero. We used R = 5 in this paper because, in our experimental evaluation, this provided good query performance with a small increase in maintenance overhead.

We use unbiased random walks because we found that biasing the random walk to nodes with low degree reduces overhead but results in poor query performance. We also experimented with flooding of discovery messages (as specified in the Gnutella 0.4 protocol) but this results in additional overhead without improved robustness or query performance.

2.2 Structured overlays

There are several structured overlay maintenance protocols. We chose an implementation of Pastry [29] called MS Pastry [6] because it has good performance under churn and has an efficient implementation of proximity neighbour selection [8]. We modified it to exploit heterogeneity (as described in the next section). Studies have shown that other structured overlay maintenance protocols [21, 28] also perform well under churn.

Structured overlays map keys to overlay nodes. Overlay nodes are assigned nodeIds selected from a large identifier space and application objects are identified by keys selected from the same identifier space. Pastry selects nodeIds and keys uniformly at random from the set of 128-bit unsigned integers and it maps a key k to the node whose identifier is numerically closest to k modulo $2^{128}$. This node is called the key’s root. Given a message and a destination key, Pastry routes the message to the key’s root node. Each node maintains a routing table and a leaf set to route messages.

NodeIds and keys are interpreted as a sequence of digits in base $2^b$. We use b = 1 in this paper to minimize the maintenance overhead. The routing table is a matrix with 128/b rows and $2^b$ columns. The entry in row r and column c of the routing table contains a random nodeId that shares the first r digits with the local node’s nodeId, and has the (r + 1)th digit equal to c. If there is no such nodeId, the entry is left empty. The uniform random distribution of nodeIds ensures that only $\log_{2^b} N$ rows have non-empty entries on average. Additionally, the column in row r corresponding to the value of the (r + 1)th digit of the local node’s nodeId remains empty.

Nodes use a neighbour selection function to select between two candidates for the same routing table slot. Given two candidates y and z for slot (r, c) in node x’s routing table, x selects z if z’s nodeId is numerically closer than y’s to the nodeId obtained by replacing the (r + 1)th digit of x’s nodeId by c. This neighbour selection function promotes stability in routing tables while distributing load. We chose not to use proximity neighbour selection because it increases overhead slightly and low delay routes do not seem important for the applications we study in this paper.

The leaf set connects nodes in a ring. It contains the l/2 closest nodeIds clockwise from the local nodeId and the l/2 closest nodeIds counter clockwise. The leaf set ensures reliable message delivery. We use l = 32 in this paper, which provides high robustness to large scale failures and high churn rates.

At each routing step, the local node normally forwards the message to a node whose nodeId shares a prefix with
the key that is at least one digit longer than the prefix that the key shares with the local node’s nodeId. If no such node is known, the message is forwarded to a node whose nodeId is numerically closer to the key and shares a prefix with the key at least as long. The leaf set is used to determine the destination node in the last hop.

Exploiting structure to reduce maintenance overhead

Structured overlays can use structure to reduce maintenance overhead in several ways. First, several structured overlays use structure to initialize the routing tables of joining nodes efficiently and to announce their arrival. Node joining in Pastry exploits the topology structure as follows. A joining node x picks a random nodeId X and asks a bootstrap node a to route a special join message using X as the destination key. This message is routed to the node z with nodeId numerically closest to X. The nodes along the overlay route add routing table rows to the message; node x obtains the rth row of its routing table from the node encountered along the route whose nodeId matches x’s in the first r − 1 digits and its leaf set from z. After initializing its routing table, x sends the rth row of the table to each node in that row. This serves both to announce x’s presence and to gossip information about nodes that joined previously. Each node that receives a row considers using the new nodes to replace entries in its routing table.

Additionally, structured overlays can eliminate redundant failure detection probes by using structure to partition failure detection responsibility and to locate nodes that need to be informed when a failure is detected. For example, MS Pastry uses this technique to reduce the number of liveness probes in the leaf set by a factor of 32. Each node sends a single I’m alive message every $t_l$ seconds to its left neighbour in the id space. If a node does not receive a message from its right neighbour, it probes the neighbour and marks it faulty if it does not reply. When it marks the neighbour faulty, it discovers the new member of its leaf set by querying the right neighbour of the failed node and informs all the members of the new leaf set about the failed node. If several consecutive nodes in the ring fail, the left neighbour of the leftmost node will detect and repair the failure, provided the number of consecutive nodes that failed is less than l/2 − 1. We use $t_l$ = 30 seconds in this paper, which is equal to the period between I’m alive messages in the unstructured overlays. This technique is readily applicable to systems that organize nodes into a logical ring, for example [32, 29, 28], but harder to apply to other systems, for example [25, 35].

The technique can be extended to eliminate fault detection probes sent to routing table entries. This can be done in routing tables that constrain each node x to point to nodes whose identifiers are the closest to specific points in the identifier space derived from x’s nodeId, for example, the original Chord [32] finger table and Pastry’s constrained routing table [7]. For example, Pastry’s constrained routing table enables a node that detects the failure of its right neighbour to locate all nodes with routing table entries pointing to the failed node with an expected cost of O(log N) messages. We chose not to use the constrained routing table because it eliminates the flexibility necessary to cope with heterogeneous peers as described in the next section.

MS Pastry uses a different strategy to detect failures in the routing table. Since the routing table is not symmetrical, a node explicitly probes every member every $t_r$ seconds to detect failures. The routing table probing period $t_r$ is set dynamically by each node based on the node failure rate in the overlay observed by the node [6]. We configured MS Pastry to achieve a 1% loss rate, i.e., a message routed between a pair of nodes has a probability of 99% of reaching the destination even in the absence of retransmissions.

Pastry also has a periodic routing table maintenance protocol to repair failed entries. Each node x asks a node in each row of the routing table for the corresponding row in its routing table. x chooses between the new entries in received rows and the entries in its routing table using the neighbour selection function defined above. This is repeated periodically, for example, every 20 minutes in the current implementation. Additionally, Pastry has a passive routing table repair protocol: when a routing table slot is found empty during routing, the next hop node is asked to return any entry it may have for that slot.

The techniques used to reduce overhead in MS Pastry are described in detail in [6] and are applicable to other structured overlays.

2.3 Experimental comparison

We compare the maintenance overhead of the different overlays using a packet-level discrete-event simulator. We simulated a transit-stub network topology [34] with 5050 routers. There are 10 transit domains at the top level with an average of 5 routers in each. Each transit router has an average of 10 stub domains attached, and each stub has an average of 10 routers. Routing is performed using the routing policy weights of the topology generator [34]. The simulator models the propagation delay on the physical links. The average delay of router-router links was 40.7 ms. In the experiments, each end system node was attached to a randomly selected stub router with a link delay of 1 ms.

The simulation is driven using a real-world trace of node arrivals and failures from a measurement study of Gnutella [30]. The study monitored 17,000 unique nodes in the Gnutella overlay over a period of 60 hours. It probed each node every seven minutes to check if it was still part of the overlay.

The average session time over
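The figures quoted for the simulated topology and the trace in Section 2.3 can be checked with a little arithmetic. The sketch below is only a consistency check under the stated averages; the function names are hypothetical and nothing here reproduces the topology generator of [34].

```python
def transit_stub_router_count(
    transit_domains: int = 10,
    routers_per_transit_domain: int = 5,
    stub_domains_per_transit_router: int = 10,
    routers_per_stub_domain: int = 10,
):
    """Return (transit routers, stub routers, total) for the average-case topology."""
    transit = transit_domains * routers_per_transit_domain
    stub = transit * stub_domains_per_transit_router * routers_per_stub_domain
    return transit, stub, transit + stub


def probes_per_node(trace_hours: int = 60, probe_interval_minutes: int = 7) -> int:
    """Number of complete probe rounds that fit in the measurement period."""
    return (trace_hours * 60) // probe_interval_minutes


print(transit_stub_router_count())  # (50, 5000, 5050)
print(probes_per_node())            # 514
```

The total of 5050 routers therefore breaks down into 50 transit routers and 5000 stub routers, and each monitored node was probed roughly 514 times over the 60 hours.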
send I’m alive messages to each of their neighbours every
messages per second per node (or less than 69 bytes per second per node), which is still negligible.

3 Exploiting heterogeneity

Nodes in deployed peer-to-peer overlays are heterogeneous [30]; they have different bandwidth, storage, and processing capacities. An overlay that ignores the different node capacities must bound the load on any node to be below the load that the least capable nodes are able to sustain; otherwise, it risks congestion collapse. It is important to exploit heterogeneity to improve scalability.

Can unstructured overlays exploit heterogeneity more effectively than structured overlays?

Structured overlays have constraints on the graph topology that reduce flexibility to adapt the topology to exploit heterogeneity. However, some structured overlays have significant flexibility in the choice of some overlay neighbours, which is important to implement proximity neighbour selection [35, 29, 16, 28]. These structured overlays can exploit heterogeneity by modifying the proximity neighbour selection algorithm to choose nodes with high capacity as overlay neighbours. We show that this is as effective as recent proposals to adapt unstructured overlay topologies [10].

This section describes the implementation of several structured and unstructured overlay maintenance protocols that exploit heterogeneity and compares their performance.

3.1 Unstructured overlays

We implemented two unstructured overlay maintenance algorithms that exploit heterogeneity: a version of Gnutella 0.6 [2] and a version of Gia [10].

Gnutella 0.6 extends the Gnutella 0.4 protocol by adding the concept of super-peers [3]. Nodes that are capable of contributing enough resources to the overlay are classified as super-peers and organized into a random graph using the optimized version of the Gnutella 0.4 protocol (which was described in the previous section). Ordinary nodes are not part of the random graph. Instead, each ordinary node attaches to a small number of randomly selected super-peers and proxies its data discovery queries through them. Ordinary nodes select super-peers to attach to using a random walk with a modified neighbour discovery message and they exchange I’m alive messages with the selected super-peers to detect failures. This topology places most of the search and overlay maintenance load on super-peers.

Gia [10] provides a more fine-grained adaptation to heterogeneity. Each node selects a numerical capacity value that abstracts the amount of resources that it is willing to contribute to the overlay. Gia adapts the overlay topology such that nodes with higher capacity have higher degree. Since high-degree nodes receive a larger fraction of the traffic, this ensures that they have the capacity to handle this traffic. Gia’s fine-grained approach to exploit heterogeneity can perform better than simply using super-peers [10].

We implemented Gia exactly as described in [10]. Node discovery is implemented using a random walk (as described for Gnutella 0.4) but the nodes use Gia’s pick neighbor to drop function [10] to decide whether to send back a neighbour invitation message. Topology adaptation is driven by Gia’s satisfaction level function, which increases with the sum, over all neighbours, of the ratio between the capacity and the degree of each neighbour. This function is evaluated periodically and nodes with a low satisfaction level attempt to find a new neighbour to increase the level. The adaptation interval is computed as in Gia (with the parameters K = 256 and T = 10 seconds).

3.2 Structured overlays

We implemented two structured overlay maintenance protocols based on Pastry that exploit heterogeneity: SuperPastry uses super-peers like Gnutella 0.6 and HeteroPastry uses topology adaptation like Gia.

It is simple to exploit the super-peers concept in a structured overlay. The super-peers are organized into a structured overlay using the Pastry algorithm described in the previous section. Ordinary peers do not join this overlay. Instead they attach to a small number of super-peers as in Gnutella 0.6. Ordinary peers select super-peers to attach to by routing to random destination keys through a bootstrap super-peer. They exchange I’m alive messages with the selected super-peers to detect failures as in Gnutella 0.6.

The implementation of capacity-aware topology adaptation in structured overlays is less obvious. We propose a simple solution based on existing proximity neighbour selection algorithms [29, 35, 16]. These algorithms select the closest neighbours in the underlying network subject to the structural constraints on the topology. They can be modified to provide capacity-aware topology adaptation by using a proximity metric that reflects node capacity.

HeteroPastry uses the Pastry algorithm described in the previous section except that it achieves capacity-aware topology adaptation by modifying the neighbour selection function to take node capacity into account. Given two candidates y and z for slot (r, c) in node x’s routing table, x selects z if z has greater capacity than y, or if z and y have the same capacity and z’s nodeId is numerically closer than y’s to the nodeId obtained by replacing the (r + 1)th digit of x’s nodeId by c. We assume that node capacities are quantized into a few discrete values for the randomization based on nodeIds to be effective at distributing load.

It is possible to design neighbour
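The HeteroPastry selection rule described above can be sketched as follows. This is an illustrative sketch under stated assumptions, not the MS Pastry code: the names `Candidate`, `slot_target`, `ring_distance` and `select_neighbour` are hypothetical, "numerically closer" is assumed here to mean distance modulo $2^{128}$, and both candidates are assumed to be valid for the slot already.

```python
from dataclasses import dataclass

ID_BITS = 128
ID_SPACE = 1 << ID_BITS


@dataclass(frozen=True)
class Candidate:
    node_id: int
    capacity: int  # quantized capacity value, e.g. 1, 10, 100


def slot_target(x_id: int, r: int, c: int, b: int = 1) -> int:
    """nodeId obtained by replacing the (r + 1)th base-2^b digit of x_id by c.

    Digits are counted from the most significant end; r is zero-based.
    """
    shift = ID_BITS - (r + 1) * b
    mask = ((1 << b) - 1) << shift
    return (x_id & ~mask) | (c << shift)


def ring_distance(p: int, q: int) -> int:
    """Distance between two identifiers modulo 2^128 (an assumption of this sketch)."""
    d = abs(p - q)
    return min(d, ID_SPACE - d)


def select_neighbour(x_id: int, r: int, c: int, y: Candidate, z: Candidate,
                     b: int = 1) -> Candidate:
    """Choose between candidates y and z for slot (r, c) of node x."""
    # Capacity decides first.
    if z.capacity != y.capacity:
        return z if z.capacity > y.capacity else y
    # Equal capacity: fall back to numerical closeness to the slot target.
    target = slot_target(x_id, r, c, b)
    if ring_distance(z.node_id, target) < ring_distance(y.node_id, target):
        return z
    return y
```

With equal capacities the function reduces to the plain Pastry rule of Section 2.2, which is why quantizing capacities into a few values keeps the nodeId-based tie-break, and hence load distribution, effective.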
Capacity    Probability
1           0.2
10          0.45
10000       0.001

Table 1: Node capacity distribution

[Plot: messages / second / node for Gia and HeteroPastry]

Figure 4: Average capacity of nodes in routing table entries at each level in HeteroPastry. (x-axis: level of routing table)

Figure 5: Average indegree of nodes with each capacity value. (x-axis: capacity; y-axis: average indegree)

the overlay and execute the search query locally at each visited node.

Can unstructured overlays support complex queries more efficiently than structured overlays?

Several research prototypes support keyword searches using the exact-match queries of structured overlays [27, 33, 14, 18] to implement inverted indices. The basic idea is to use the structured overlay to map keywords to overlay nodes. The node responsible for a keyword stores an index with the location of all documents that contain the keyword. When a file is added to the system, the nodes
responsible for the keywords in the file are contacted to update the appropriate indices. A query for documents containing a set of keywords contacts the nodes responsible for those keywords and intersects their indices.

Unfortunately, this approach has several problems. Maintaining the indices in the presence of churn is expensive and popular keywords may be mapped to low capacity nodes that cannot cope with the load [10]. Additionally, the queries can be expensive because they require computing the intersection of large indices. The analysis in [20] shows that this approach performs worse than flooding queries to 60,000 nodes in a random graph. Therefore, this approach performs significantly worse than recent unstructured overlays like Gia [10]. Additionally, unstructured overlays can support even more sophisticated queries that are not supported by the inverted indices approach, for example, regular expressions and range queries on multiple attributes.

This section explores a different approach to supporting complex queries in structured overlays. We developed a hybrid system that uses the topology from structured overlays with the data placement and data discovery strategies of unstructured overlays. We introduce new techniques to perform floods or random walks over structured topologies that provide support for arbitrarily complex queries. These techniques take advantage of structural constraints on the topology to ensure that nodes are visited only once during a query, to control the number of nodes that are visited accurately, and to increase the average capacity of nodes visited during a query to exploit heterogeneity more effectively.

The results in the previous sections show that it is possible to maintain a structured overlay that exploits heterogeneity with low maintenance overhead. Additionally, the hybrid system does not constrain data placement; nodes do not have to incur the overhead of updating distributed indices for each keyword in their files. This section compares the performance of random walks and floods on the overlays that were described in the previous section.

4.1 Unstructured overlays

We used random walks to discover data because they have been shown to induce lower overhead than the constrained floods [23] used by current versions of Gnutella. These random walks are biased to prefer nodes with higher degree in Gia and are unbiased in the other unstructured overlays. The original Gia [10] biased the random walks to prefer nodes with higher capacity but our experimental results indicate that preferring nodes with higher degree yields both higher success rate and lower delay. We present results for this optimized version of Gia.

We observed that random walks in Gia were likely to visit the same node more than once, which resulted in worse search performance. We added a list to each query with all the nodes already visited by the query to prevent this. Nodes do not forward a query to a node that is in this list.

All unstructured overlays use one hop replication, which has been shown to improve search performance in unstructured overlays [10]. A node replicates an index of its content at each of its neighbours. In Gnutella 0.6, these indices are only replicated at super-peers.

4.2 Structured overlays

The hybrid system exploits structure to implement random walks and constrained floods more efficiently. Flooding in random graphs is inefficient because each node is likely to be visited more than once. In a graph with an average degree of k, a flood that visits all nodes will send on average (k − 1) × N messages (where N is the size of the overlay). Additionally, it is difficult to control the number of nodes visited during a constrained flood. Floods are constrained using a time-to-live field in the query message that is decremented every time the query is forwarded. The query is not forwarded when the time-to-live field drops to zero. This provides very coarse control over the number of nodes visited.

The hybrid system can do better by replacing flooding with the broadcast mechanisms that have been proposed for structured overlays [26, 9, 11]. We use Pastry’s broadcast mechanism [9] to flood queries to overlay nodes. A node y broadcasts a query by sending the query to all the nodes x in its routing table. Each query is tagged with the routing table row r of node x. When a node receives a query tagged with r, it forwards the query to all nodes in its routing table in rows greater than r, if any.

A node may have a missing entry in a slot in its routing table, for example, because it pointed to a node that failed. The broadcast overcomes this problem by using Pastry to route the query to a node with the appropriate nodeId to fill the slot (if there is any) [9]. Almost all nodes receive the query only once but the technique to deal with empty routing table slots may result in a small number of duplicates.

We place an upper bound on the row number of entries to which the query is forwarded to constrain the flood. This bounds the number of nodes visited to a power of two. It is simple to extend this mechanism to provide arbitrarily fine-grained control over the number of nodes visited.

This mechanism can easily be modified to perform random walks rather than floods by performing a breadth-first traversal of the tree used for flooding. This can be done by adding a set of nodes to visit in the query message. A random walk query message includes the tag r,
450 400
400 350
350
300
Number of nodes
Number of copies
300
250
250
200 200
150 150
100 100
50
50
0
1 10 100 1000 10000 0
1 10 100 1000 10000 100000 1000000
Number of cached files Popularity ranking
Figure 6: Distribution of the number of files per node for Figure 7: Number of files versus file rank for the eDon-
the eDonkey file trace [12]. key file trace [12].
an array q with queues of nodes indexed by routing table row, and a bound d on the maximum row number to traverse. When the query is received at node x, it appends the nodes in each routing table row r′ to queue q[r′] provided that r < r′ ≤ d. Then, if queue q[r] is not empty, x removes the next node from the queue and forwards the query to this node. If q[r] is empty, the query is forwarded to the first node in queue q[r + 1] and r is incremented. If all queues are empty, the random walk is complete.

The results in the previous section show that the average capacity of the nodes in routing table entries in HeteroPastry decreases as the row number increases. Therefore, the mechanism that we use to bound the floods and random walks biases them to visit nodes with higher capacity in HeteroPastry.

We also implement one hop replication in the hybrid system. Each node replicates an index of its local content on the nodes in its routing table. Therefore, it is expected to replicate its index in log2(N) other nodes.

4.3 Experimental comparison

We compared the performance of random walks on structured and unstructured overlays. We used the basic experimental setup described in the previous sections but we simulated queries and node file stores.

We used a real-world trace of files stored by eDonkey [12] peers to model the sets of files stored by simulated nodes. There are 37,000 peers in the trace and, for each peer, there is a record with the identifiers of the files stored by the peer. Figure 6 shows the distribution of the number of files stored by each peer. It excludes the 25,172 peers that have no files. We model the set of files stored by each node as follows: when a node joins, the simulator chooses a random unused record from the trace and assigns the files in the record to the node.

There are approximately 923,000 unique files. File copies exhibit a heavy-tailed zipf-like distribution as shown in Figure 7. Full details about the trace can be found in [12].

The eDonkey trace does not include queries but the number of copies of a file is strongly correlated with the number of queries that it satisfies. Therefore, our query distribution matches the distribution of the number of copies of files.

Each node generates 0.01 query messages per second using a Poisson process and each query searches for a file in the trace. The simulator maintains the distribution of the number of copies of files stored by nodes that are currently in the overlay. The target file for each query is chosen from this distribution (which is a sample of the distribution in Figure 7). This ensures that at least one copy of the target file is stored in the overlay when the query is initiated.

In all the experiments, we bound random walks to visit at most 128 nodes. When a node x receives a query, it checks if the target file is stored locally or if it is stored by nodes whose indices are replicated locally. In the first case, the query is satisfied and x does not forward the query further. In the second case, x contacts a random node y which it believes has a copy of the file. If y has the file, the query is satisfied and y sends an acknowledgment back to x. If x receives the acknowledgment before a timeout, it stops forwarding the query. Otherwise, x contacts another random node that it believes has the file or it forwards the query if there are no more such nodes.

We measured the fraction of queries that are satisfied and the delay from the moment a query is initiated until it is satisfied. We also measured the load by counting the number of messages sent per second per node.

4.3.1 Gnutella trace

We compared the performance of data discovery on the overlays that exploit heterogeneity. Figure 8 shows the query success rate, Figure 9 shows the delay for successful queries, and Figure 10 shows the overhead in messages per second per node. The results show that fine-grained topology adaptation performs better than using super-peers. HeteroPastry achieves significantly higher
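The queue-based random walk described at the start of this section (an array q of queues indexed by routing table row, a row bound d, and a current row r) can be illustrated with a small Python sketch. This is only a toy model under stated assumptions, not the authors' simulator: identifiers are 3-bit numbers, each routing table row holds the single node whose identifier differs in that bit, the initiator seeds the queues for rows 0 to d, and several empty queues in a row are skipped at once. The names `routing_row`, `forward`, and `random_walk` are hypothetical.

```python
from collections import deque

BITS = 3  # assumption: toy identifiers with BITS binary digits (2**BITS nodes)


def routing_row(node, row):
    """Toy routing table: row `row` holds the node whose id differs in bit `row`."""
    return [node ^ (1 << (BITS - 1 - row))]


def forward(node, r, d, q):
    """Processing at `node`: extend the queues, then choose the next hop.

    Returns (next_node, r), or (None, r) once every queue is empty.
    """
    for row in range(r + 1, d + 1):      # rows r' with r < r' <= d
        q[row].extend(routing_row(node, row))
    r = max(r, 0)                        # the initiator passes r = -1 (assumption)
    while r <= d:
        if q[r]:
            return q[r].popleft(), r     # next node from the current row's queue
        r += 1                           # q[r] is empty: move on to the next row
    return None, r


def random_walk(start, d, max_visits=128):
    """Return the nodes visited, in order, starting from `start`."""
    q = [deque() for _ in range(d + 1)]
    visited = [start]
    node, r = forward(start, -1, d, q)   # assumption: initiator queues rows 0..d
    while node is not None and len(visited) < max_visits:
        visited.append(node)
        node, r = forward(node, r, d, q)
    return visited
```

In this toy topology, `random_walk(0, 2)` visits all eight nodes exactly once, while `random_walk(0, 1)` stops after four nodes, which shows how the bound d limits the number of nodes visited. The `max_visits` parameter mirrors the 128-node bound used in the experiments.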
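The query workload of Section 4.3 (Poisson query generation at 0.01 queries per second per node, with targets drawn in proportion to the number of copies currently stored in the overlay) might be modelled along the following lines. This is a minimal sketch, not the simulator used in the paper; the `stores` mapping from node name to file set and the function names are assumptions made for illustration.

```python
import random
from collections import Counter

QUERY_RATE = 0.01  # queries per second per node, as stated in the text


def next_query_delay(rng):
    """Inter-arrival time of a Poisson process: exponential with rate QUERY_RATE."""
    return rng.expovariate(QUERY_RATE)


def copy_counts(stores):
    """Count the copies of each file held by nodes currently in the overlay."""
    counts = Counter()
    for files in stores.values():
        counts.update(files)
    return counts


def pick_target(stores, rng):
    """Choose a target file with probability proportional to its number of copies."""
    counts = copy_counts(stores)
    files = sorted(counts)               # fixed order keeps runs reproducible
    weights = [counts[f] for f in files]
    return rng.choices(files, weights=weights, k=1)[0]
```

Because only files with at least one live copy appear in `copy_counts`, every target chosen by `pick_target` is present in the overlay when the query starts, which matches the guarantee stated in the text.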
Figure 8: Query success rate. (x-axis: time in hours; y-axis: success rate; series: Gia, Gnutella 0.6, SuperPastry, HeteroPastry.)
Figure 9: Query delay for successful queries. (x-axis: time in hours; y-axis: delay in ms; series: Gnutella 0.6, SuperPastry, Gia, HeteroPastry.)
Figure 10: Messages per second per node. (x-axis: time in hours; series: Gia, Gnutella 0.6, SuperPastry, HeteroPastry.)
Figure 11: Cumulative distribution of messages per second per node for each capacity value in HeteroPastry. (x-axis: messages per second; y-axis: fraction of nodes; series: capacities 1, 10, 100, 1000, 10000.)
heads of the two are comparable with queries because of the suppression of failure detection traffic and shorter random walks.
Capacity    1    10    100      1000      10000
Gia Mean    3    3     23.56    126.02    128

Figure 13: Messages per second per node for Gia and HeteroPastry versus session time.
and 10-capacity nodes is bounded to the same value, this is not surprising. In both Gia and HeteroPastry, the 100-capacity nodes incur a higher overhead than the 1- and 10-capacity nodes but a lower overhead than the 1000-capacity nodes.

The figures also show that the load on any node is sufficiently low (with a query rate of 0.01 queries per second per node) that flow control is not necessary. Gia’s flow control mechanism [10] can be applied to HeteroPastry to enable scaling to higher query rates.

We also studied the distribution of replicas of node indices, which is another indicator of the effectiveness of both systems in adapting the topology to different node capacities. Table 2 summarises the distribution of replicas of indices for each capacity value in both systems. The total number of index replicas is 27,707 in HeteroPastry and 38,153 in Gia. Both systems do a good job at distributing index replicas (and indegree) according to node capacity. Gia replicates more because it is more effective at pushing replicas to nodes with capacity 100 and 1000.

HeteroPastry maintains significantly fewer index replicas than Gia but it performs better because its random walks visit nodes with more index replicas and more diverse index replicas than those visited by random walks in Gia. In Gia, nodes that are close in the overlay topology tend to share the same high capacity neighbours. This reduces the number of unique files known by a node and its neighbours and it forces biased random walks to visit low capacity nodes before they can find new high capacity nodes to visit. Since the number of index replicas stored by a node is proportional to its capacity, this results in poor performance. The topology adaptation and random walk mechanisms in HeteroPastry exploit structure to prevent this problem; the constraints on the node identifiers of neighbours and nodes visited during a random walk ensure that the initial set of nodes visited has high capacity and knows about more unique files. This results in HeteroPastry visiting significantly fewer nodes with capacity 100 during random walks than Gia (as shown in Figure 11).

4.3.2 Poisson traces

The experiments described so far use a trace of node arrivals and departures collected in a real Gnutella deployment. The next set of experiments compares the performance of Gia and HeteroPastry using artificial traces with more nodes and different rates of churn. These traces have Poisson node arrivals and an exponential distribution of node session times with the same rate. We generated traces with session times of 5, 15, 30, 60, 120 and 600 minutes and in all cases the average number of nodes was 10,000. We used the same data and query distribution as in the previous experiments. It is important to note that a session time of 5 minutes is short; indeed, it is 28 times shorter than the average session time of 2.3 hours observed in the Gnutella trace.

Figure 13 shows the total number of messages per second per node for the different session times. Both Gia and HeteroPastry have low overhead across all session times.

Gia’s overhead is almost constant across all session times. Short session times increase Gia’s overhead because of increased retransmissions and traffic to fill neighbour tables. However, this is offset by a decrease in fault detection traffic due to a decrease in the average number of neighbours; there are 15.1 neighbours when the session time is 600 minutes and 10.7 when it is 5 minutes.

HeteroPastry has a lower message overhead than Gia for session times of 30 minutes or greater. This overhead decreases between 60 and 600 minutes because HeteroPastry adapts the routing table probing rate to match the failure rate. HeteroPastry incurs a higher message overhead than Gia for extremely high churn rates mostly due to the overhead of maintaining the leaf set. This overhead could be reduced without impacting query success rate and delay by using a smaller leaf set or disabling the mechanisms to ensure strong leaf set consistency [6], which are not important in this application.

Figure 14 shows the lookup success rate for the different session times. As in previous experiments, HeteroPastry achieves a success rate higher than Gia across all session times.
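The artificial traces of Section 4.3.2 can be approximated with a short Python sketch. It assumes that the arrival rate is chosen by Little's law so that the steady-state population equals the target average (arrival rate = average number of nodes / mean session time); the text does not spell this out, so it is an interpretation. The function names are hypothetical, and the trace starts from an empty overlay, so the population only reaches its average after a warm-up of several session times.

```python
import random


def arrival_rate(mean_nodes, session_minutes):
    """Arrivals per second that sustain `mean_nodes` on average (Little's law)."""
    return mean_nodes / (session_minutes * 60.0)


def generate_trace(mean_nodes, session_minutes, duration_s, rng):
    """Return a list of (join_time, leave_time) pairs, in seconds."""
    lam = arrival_rate(mean_nodes, session_minutes)
    mean_session_s = session_minutes * 60.0
    trace, t = [], 0.0
    while True:
        t += rng.expovariate(lam)                        # Poisson arrivals
        if t >= duration_s:
            return trace
        session = rng.expovariate(1.0 / mean_session_s)  # exponential session time
        trace.append((t, t + session))


def nodes_online(trace, at_time):
    """Number of nodes that have joined and not yet left at `at_time`."""
    return sum(1 for join, leave in trace if join <= at_time < leave)
```

With 10,000 nodes and 5-minute sessions, `arrival_rate(10000, 5)` gives roughly 33.3 arrivals per second. The "28 times shorter" figure in the text follows from 2.3 hours = 138 minutes and 138 / 5 = 27.6, which rounds to 28.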
Figure 14: Query success rate for Gia and HeteroPastry versus session time. (x-axis: session time in minutes; y-axis: success rate.)
Figure 16: Messages per second per node when using constrained floods and random walks in HeteroPastry. (x-axis: time in hours; series: flooding, random walks.)

the nodes with the random walk. Additionally, random walks use acknowledgments and retransmissions to re-
system that uses the search and data placement strategies of unstructured overlays on a structured overlay topology. Simulation results using a real-world trace show that the hybrid system can support complex queries with lower message overhead while providing higher query success rates and lower response times than the state of the art in unstructured overlays.

The additional functionality provided by structured overlays has proven important to achieve scalability and efficiency in a wide range of applications. Structured overlays can emulate the functionality of unstructured overlays with comparable or even better performance. Interestingly, it is not clear that unstructured overlays can efficiently emulate the same functionality as structured overlays.

References

[1] The Gnutella 0.4 protocol specification, 2000. http://dss.clip2.com/GnutellaProtocol04.pdf.
[2] The Gnutella 0.6 protocol specification, 2002. http://www.limewire.org/.
[3] Kazaa, 2002. http://www.kazaa.com/.
[4] Bhagwan, R., Savage, S., and Voelker, G. Understanding availability. In IPTPS’03 (Feb. 2003).
[5] Blake, C., and Rodrigues, R. High Availability, Scalable Storage, Dynamic Peer Networks: Pick Two. In HotOS IX (May 2003).
[6] Castro, M., Costa, M., and Rowstron, A. Performance and dependability of structured peer-to-peer overlays. In DSN’04 (June 2004).
[7] Castro, M., Druschel, P., Ganesh, A., Rowstron, A., and Wallach, D. S. Security for structured peer-to-peer overlay networks. In OSDI’02 (Dec. 2002).
[8] Castro, M., Druschel, P., Hu, Y. C., and Rowstron, A. Proximity neighbor selection in tree-based structured peer-to-peer overlays. Tech. Rep. MSR-TR-2003-52, Microsoft Research, Aug. 2003.
[9] Castro, M., Jones, M. B., Kermarrec, A.-M., Rowstron, A., Theimer, M., Wang, H., and Wolman, A. An evaluation of scalable application-level multicast built using peer-to-peer overlays. In Infocom’03 (Apr. 2003).
[10] Chawathe, Y., Ratnasamy, S., Breslau, L., Lanham, N., and Shenker, S. Making Gnutella-like p2p systems scalable. In SIGCOMM’03 (Aug. 2003).
[11] El-Ansary, S., Alima, L. O., Brand, P., and Haridi, S. Efficient broadcast in structured p2p networks. In IPTPS’03 (Feb. 2003).
[12] Fessant, F. L., Handurukande, S., Kermarrec, A.-M., and Massoulie, L. Clustering in peer-to-peer file sharing workloads. In IPTPS’04 (Feb. 2004).
[13] Ganesan, P., Sun, Q., and Garcia-Molina, H. Yappers: A peer-to-peer lookup service over arbitrary topology. In Infocom’03 (Apr. 2003).
[14] Gnawali, O. A keyword set search system for peer-to-peer networks, 2002. Master Thesis, MIT.
[15] The Gnutella protocol specification, 2000. http://dss.clip2.com/GnutellaProtocol04.pdf.
[16] Gummadi, K. P., Gummadi, R., Gribble, S. D., Ratnasamy, S., Shenker, S., and Stoica, I. The impact of DHT routing geometry on resilience and proximity. In SIGCOMM’03 (Aug. 2003).
[17] Gummadi, P. K., Dunn, R. J., Saroiu, S., Gribble, S. D., Levy, H. M., and Zahorjan, J. Measurement, modeling, and analysis of a peer-to-peer file-sharing workload. In SOSP’03 (Oct. 2003).
[18] Harren, M., Hellerstein, J. M., Huebsch, R., Loo, B. T., Shenker, S., and Stoica, I. Complex queries in DHT-based peer-to-peer networks. In IPTPS’02 (Mar. 2002).
[19] Iyer, S., Rowstron, A., and Druschel, P. Squirrel: A decentralized peer-to-peer web cache. In PODC’02 (July 2002).
[20] Li, J., Loo, B. T., Hellerstein, J., Kaashoek, F., Karger, D. R., and Morris, R. On the feasibility of peer-to-peer web indexing and search. In IPTPS’03 (Feb. 2003).
[21] Li, J., Stribling, J., Gil, T. M., Morris, R., and Kaashoek, M. F. Comparing the performance of distributed hash tables under churn. In IPTPS’04 (Feb. 2004).
[22] Loo, B. T., Hellerstein, J. M., Huebsch, R., Shenker, S., and Stoica, I. Enhancing P2P file sharing with an Internet-scale query processor. In VLDB’04 (Sept. 2004).
[23] Lv, Q., Cao, P., Cohen, E., Li, K., and Shenker, S. Search and replication in unstructured peer-to-peer networks. In ICS’02 (June 2002).
[24] Lv, Q., Ratnasamy, S., and Shenker, S. Can heterogeneity make Gnutella scalable? In IPTPS’02 (Feb. 2002).
[25] Ratnasamy, S., Francis, P., Handley, M., Karp, R., and Shenker, S. A scalable content-addressable network. In SIGCOMM’01 (Aug. 2001).
[26] Ratnasamy, S., Handley, M., Karp, R., and Shenker, S. Application-level multicast using content-addressable networks. In NGC’01 (Nov. 2001).
[27] Reynolds, P., and Vahdat, A. Efficient peer-to-peer keyword searching. In Middleware’03 (Nov. 2003).
[28] Rhea, S., Geels, D., Roscoe, T., and Kubiatowicz, J. Handling churn in a DHT. In USENIX’04 (June 2004).
[29] Rowstron, A., and Druschel, P. Pastry: Scalable, distributed object location and routing for large-scale peer-to-peer systems. In Middleware’01 (Nov. 2001).
[30] Saroiu, S., Gummadi, K., and Gribble, S. A measurement study of peer-to-peer file sharing systems. In MMCN’02 (Jan. 2002).
[31] Sen, S., and Wang, J. Analyzing peer-to-peer traffic across large networks. In Internet Measurement Workshop (Nov. 2002).
[32] Stoica, I., Morris, R., Karger, D., Kaashoek, M. F., and Balakrishnan, H. Chord: A scalable peer-to-peer lookup service for Internet applications. In SIGCOMM’01 (Aug. 2001).
[33] Tang, C., Xu, Z., and Dwarkadas, S. Peer-to-peer information retrieval using self-organizing semantic overlay networks. In SIGCOMM’03 (Aug. 2003).
[34] Zegura, E., Calvert, K., and Bhattacharjee, S. How to model an internetwork. In INFOCOM’96 (1996).
[35] Zhao, B. Y., Kubiatowicz, J. D., and Joseph, A. D. Tapestry: An infrastructure for fault-resilient wide-area location and routing. Tech. Rep. UCB-CSD-01-1141, U. C. Berkeley, Apr. 2001.