A Survey and Comparison of Peer-To-Peer Overlay Network Schemes
00 © 2005 IEEE
www.comsoc.org/pubs/surveys
ABSTRACT
Over the Internet today, computing and communications environments are
significantly more complex and chaotic than classical distributed systems, lacking any
centralized organization or hierarchical control. There has been much interest in
emerging Peer-to-Peer (P2P) network overlays because they provide a good substrate
for creating large-scale data sharing, content distribution, and application-level
multicast applications. These P2P overlay networks attempt to provide a long list of
features, such as: selection of nearby peers, redundant storage, efficient search/location
of data items, data permanence or guarantees, hierarchical naming, trust and
authentication, and anonymity. P2P networks potentially offer an efficient routing
architecture that is self-organizing, massively scalable, and robust in the wide-area,
combining fault tolerance, load balancing, and an explicit notion of locality. In this
article we present a survey and comparison of various Structured and Unstructured
P2P overlay networks. We categorize the various schemes into these two groups in
the design spectrum, and discuss the application-level network performance
of each group.
Authorized licensed use limited to: UNIVERSITAT OBERTA DE CATALUNYA. Downloaded on November 21,2023 at 09:01:27 UTC from IEEE Xplore. Restrictions apply.
74 IEEE Communications Surveys & Tutorials • Second Quarter 2005
retrieves an entry corresponding to key K, and any peer can apply the same deterministic hash function to map K onto point P and then retrieve the corresponding value V from the point P. If the requesting peer or its immediate neighbors do not own the point P, the request must be routed through the CAN infrastructure until it reaches the peer in whose zone P lies. A peer maintains the IP addresses of those peers that hold coordinate zones adjoining its zone. This set of immediate neighbors in the coordinate space serves as a coordinate routing table that enables efficient routing between points in this space.
A new peer that joins the system must have its own portion of the coordinate space allocated. This can be achieved by splitting an existing peer’s zone in half, retaining half for the peer and allocating the other half to the new peer. CAN has an associated DNS domain name that is resolved into the IP address of one or more CAN bootstrap peers (which maintain a partial list of CAN peers). For a new peer to join a CAN network, the peer looks up a CAN domain name in the DNS to retrieve a bootstrap peer’s IP address, similar to the bootstrap mechanism in [16]. The bootstrap peer supplies the IP addresses of some randomly chosen peers in the system. The new peer randomly chooses a point P and sends a JOIN request destined for point P. Each CAN peer uses the CAN routing mechanism to forward the message until it reaches the peer in whose zone P lies. The current peer in zone P then splits its zone in half and assigns the other half to the new peer. For example, in a 2-dimensional space, a zone would first be split along the X dimension, then the Y, and the splitting continues. The {K,V} pairs from the half zone to be handed over are also transferred to the new peer. After obtaining its zone, the new peer learns the IP addresses of its neighbor set from the previous peer in point P, and adds to that the previous peer itself.
When a peer leaves the CAN network, an immediate takeover algorithm ensures that one of the failed peer’s neighbors takes over the zone and starts a takeover timer. The peer updates its neighbor set to eliminate those peers that are no longer its neighbors. Every peer in the system then sends soft-state updates to ensure that all of their neighbors will learn about the change and update their own neighbor sets. The number of neighbors a peer maintains depends only on the dimensionality of the coordinate space (i.e., 2 × d) and is independent of the total number of peers in the system.
The Fig. 3 example illustrated a simple routing path from peer X to point E and a new peer Z joining the CAN network. For a d-dimensional space partitioned into n equal zones, the average routing path length is (d/4) × n^(1/d) hops and individual peers maintain a list of 2 × d neighbors. Thus, the growth of peers (or zones) can be achieved without increasing per-peer state while the average path length grows as O(n^(1/d)). Since there are many different paths between two points in the space, when one or more of a peer’s neighbors fail, this peer can still route along the next best available path.
Improvement of the CAN algorithm can be accomplished by maintaining multiple, independent coordinate spaces, with each peer in the system being assigned a different zone in each coordinate space, called a reality. For a CAN with r realities, a single peer is assigned r coordinate zones, one on each reality available, and this peer holds r independent neighbor sets. The contents of the hash table are replicated on every reality, thus improving data availability. For further data availability improvement, CAN could use k different hash functions to map a given key onto k points in the coordinate space. This results in the replication of a single {key,value} pair at k distinct peers in the system. A {key,value} pair is then unavailable only when all the k replicas are simultaneously unavailable. Thus, queries for a particular hash table entry could be forwarded to all k peers in parallel, thereby reducing the average query latency, and reliability and fault resiliency properties are enhanced.
CAN could be used in large-scale storage management systems such as OceanStore [17], Farsite [18], and Publius [19]. These systems require efficient insertion and retrieval of content in a large distributed storage network with a scalable indexing mechanism. Another potential application for CANs is in the construction of wide-area name resolution services that decouple the naming scheme from the name resolution process. This enables an arbitrary and location-independent naming scheme.
CHORD
Chord [6] uses consistent hashing [20] to assign keys to its peers. Consistent hashing is designed to let peers enter and leave the network with minimal interruption. This decentralized scheme tends to balance the load on the system, since each peer receives roughly the same number of keys, and there is little movement of keys when peers join and leave the system. In a steady state, for a total of N peers in the system, each peer maintains routing state information for about O(log N) other peers. This is efficient, and performance degrades gracefully when that information is out of date.
The consistent hash functions assign peers and data keys an m-bit identifier using SHA-1 [21] as the base hash function. A peer’s identifier is chosen by hashing the peer’s IP address, while a key identifier is produced by hashing the data key. The length of the identifier m must be large enough to make the probability of keys hashing to the same identifier negligible. Identifiers are ordered on an identifier circle modulo 2^m. Key k is assigned to the first peer whose identifier is equal to or follows k in the identifier space. This peer is called the successor peer of key k, denoted by successor(k). If identifiers are represented as a circle of numbers from 0 to 2^m - 1, then successor(k) is the first peer clockwise from k. The identifier circle is called the Chord ring. To maintain the consistent hashing mapping when a peer n joins the network, certain keys previously assigned to n’s successor now need to be reassigned to n. When peer n leaves the Chord system, all of its assigned keys are reassigned to n’s successor. Therefore, peers join and leave the system with (log N)^2 performance. No other changes in the assignment of keys to peers need to occur. In Fig. 4 (adapted from [6]), the Chord ring is depicted with m = 6. This particular ring has 10 peers and stores five keys. The successor of the identifier 10 is peer 14, so key 10 will be located at NodeID 14. Similarly, if a peer were to join with identifier 26, it would take over the key with identifier 24 from the peer with identifier 32.
Each peer in the Chord ring needs to know how to contact its current successor peer on the identifier circle. Lookup queries involve the matching of key and NodeID. Queries for a given identifier could be passed around the circle via these successor pointers until they encounter a pair of peers that include the desired identifier; the second peer in the pair is the peer the query maps to. An example is presented in Fig. 4, whereby peer 8 performs a lookup for key 54. Peer 8 invokes the find successor operation for this key, which eventually returns the successor of that key, i.e. peer 56. The query visits every peer on the circle between peer 8 and peer 56. The response is returned along the reverse of the path.
As m is the number of bits in the key/NodeID space, each peer n maintains a routing table with up to m entries, called the finger table. The ith entry in the table at peer n contains the identity of the first peer s that succeeds n by at least 2^(i-1)
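The successor rule and the finger-table rule described for Chord can be sketched in a few lines. This is only an illustrative sketch, not Chord's implementation: the text names peers 8, 14, 32, and 56 on the m = 6 ring, and the remaining identifiers in `PEERS` are assumptions chosen to give a ring of 10 peers. The function names are likewise hypothetical.

```python
from bisect import bisect_left

M = 6            # identifier bits, as in the m = 6 example
RING = 2 ** M    # identifiers live on a circle modulo 2^m

# Assumed peer identifiers: only 8, 14, 32 and 56 are named in the text;
# the other six are illustrative placeholders.
PEERS = sorted([1, 8, 14, 21, 32, 38, 42, 48, 51, 56])


def successor(k, peers=PEERS):
    """Return the first peer whose identifier is equal to or follows k."""
    i = bisect_left(peers, k % RING)
    # Past the largest identifier, wrap around to the smallest one.
    return peers[i % len(peers)]


def finger_table(n, peers=PEERS):
    """Entry i (1-based) is the first peer that succeeds n by at least 2^(i-1)."""
    return [successor((n + 2 ** (i - 1)) % RING, peers) for i in range(1, M + 1)]


if __name__ == "__main__":
    print(successor(10))    # key 10 is stored at peer 14
    print(successor(54))    # the lookup for key 54 resolves to peer 56
    print(finger_table(8))  # fingers of peer 8 under the assumed ring
```

With the assumed ring, adding a peer with identifier 26 changes `successor(24)` from 32 to 26, which mirrors the key hand-over described in the text.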
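For the CAN scaling figures quoted earlier, namely an average path length of (d/4) × n^(1/d) hops and 2 × d neighbors per peer, a small numeric sketch may help show the trade-off between dimensionality and path length. The function names are hypothetical, and the formula assumes the idealized case of n equal zones stated in the text.

```python
def can_avg_path_length(n, d):
    """Average routing path length (d/4) * n^(1/d) for n equal zones in d dimensions."""
    return (d / 4) * n ** (1 / d)


def can_neighbors(d):
    """Per-peer routing state: 2 * d neighbors, independent of n."""
    return 2 * d


if __name__ == "__main__":
    n = 4096  # illustrative network size, not taken from the survey
    for d in (2, 4, 8):
        hops = can_avg_path_length(n, d)
        print(f"d={d}: {can_neighbors(d)} neighbors, about {hops:.1f} hops on average")
```

Raising d shortens the average path at the cost of more neighbors per peer, while the per-peer state never depends on n.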
digit, e.g., ***7 ⇒ **97 ⇒ *297 ⇒ 3297, where * is the wildcard, similar to the longest prefix routing in the CIDR IP address allocation architecture [24]. The resolution of digits from right to left or left to right is arbitrary. A peer’s local routing map has multiple levels, where each of them represents a match of the suffix with a digit position in the ID space. The nth peer that a message reaches shares a suffix of at least length n with the destination ID. To locate the next router, the (n + 1)th level map is examined to locate the entry matching the value of the next digit in the destination ID. This routing method guarantees that any existing unique peer in the system can be located within at most log_B N logical hops, in a system with N peers using NodeIDs of base B. Since the peer’s local routing map assumes that the preceding digits all match the current peer’s suffix, the peer needs only to keep a small constant size (B) entry at each route level, yielding a routing map of fixed constant size: (entries/map) × no. of maps = B × log_B N.
The lookup and routing mechanisms of Tapestry are based on matching the suffix in NodeID as described above. Routing maps are organized into routing levels, where each level contains entries that point to a set of peers closest in distance that matches the suffix for that level. Also, each peer holds a list of pointers to peers referred to as neighbors. Tapestry stores the locations of all data object replicas to increase semantic flexibility and allow the application level to choose from a set of data object replicas based on some selection criteria, such as date. Each data object may include an optional application-specific metric in addition to a distance metric. For example, the OceanStore [17] global storage architecture finds the closest cached document replica that satisfies the closest distance metric. These queries deviate from the simple find first semantics, and Tapestry will route the message to the closest k distinct data objects.
Tapestry handles the problem of a single point of failure due to a single data object’s root peer by assigning multiple roots to each object. Tapestry makes use of surrogate routing to select root peers incrementally during the publishing process to insert location information into Tapestry. Surrogate routing provides a technique by which any identifier can be uniquely mapped to an existing peer in the network. A data object’s root or surrogate peer is chosen as the peer that matches the data object’s ID, X. This is unlikely to happen, given the sparse nature of the NodeID space. Nevertheless, Tapestry assumes peer X exists by attempting to route a message to it. A route to a non-existent identifier will encounter empty neighbor entries at various positions along the way. The goal is to select an existing link that can act as an alternative to the desired link, i.e. the one associated with a digit X. Routing terminates when a map is reached where the only non-empty routing entry belongs to the current peer. That peer is then designated as the surrogate root for the data object. While surrogate routing may take additional hops to reach a root compared with the Plaxton algorithm, the additional number of hops is small. Thus, surrogate routing in Tapestry has minimal routing overhead relative to the static global Plaxton algorithm.
Tapestry addresses the issue of fault adaptation and maintains cached content for fault recovery by relying on TCP timeouts and periodic UDP heartbeat packets to detect link and server failures during normal operations, and by rerouting through its neighbors. During fault operation each entry in the neighbor map maintains two backup neighbors in addition to the closest/primary neighbor. On a testbed of 100 machines with 1000 peer simulations, the results in [7] show good routing rates and maintenance bandwidths during instantaneous failures and continuing churn.
A variety of different applications have been designed and implemented on Tapestry. Tapestry is self-organizing, fault tolerant, resilient under load, and is a fundamental component of the OceanStore system [17, 25]. OceanStore is a global-scale, highly available storage utility deployed on the PlanetLab [26] testbed. OceanStore servers use Tapestry to disseminate encoded file blocks efficiently, and clients can quickly locate and retrieve nearby file blocks by their ID, despite server and network failures. Other Tapestry applications include Bayeux [27], an efficient self-organizing application-level multicast system, and SpamWatch [28], a decentralized spam-filtering system that uses a similarity search engine implemented on Tapestry.
PASTRY
Pastry [4], like Tapestry, makes use of Plaxton-like prefix routing to build a self-organizing decentralized overlay network, where each peer routes client requests and interacts with local instances of one or more applications. Each peer in Pastry is assigned a 128-bit peer identifier (NodeID). The NodeID is used to give a peer’s position in a circular NodeID space, which ranges from 0 to 2^128 - 1. The NodeID is assigned randomly when a peer joins the system, and it is assumed to be generated such that the resulting set of NodeIDs is uniformly distributed in the 128-bit space. For a network of N peers, Pastry routes to the numerically closest peer to a given key in less than log_B N steps under normal operation (where B = 2^b is a configuration parameter with typical value of b = 4). The NodeIDs and keys are considered a sequence of digits with base B. Pastry routes messages to the peer whose NodeID is numerically closest to the given key. A peer normally forwards the message to a peer whose NodeID shares with the key a prefix that is at least one digit (or b bits) longer than the prefix that the key shares with the current peer’s NodeID.
As shown in Fig. 5, each Pastry peer maintains a routing table, a neighborhood set, and a leaf set. A peer’s routing table is designed with log_B N rows, where each row holds B - 1 entries. The B - 1 entries at row n of the routing table each refer to a peer whose NodeID shares the current peer’s NodeID in the first n digits, but whose (n+1)th digit has one of the B - 1 possible values other than the (n+1)th digit in the current peer’s NodeID. Each entry in the routing table contains the IP address of a peer whose NodeID has the appropriate prefix, and it is chosen according to a close proximity metric. The value of b could be chosen with a tradeoff between the size of the populated portion of the routing table [approximately (log_B N) × (B - 1) entries] and the maximum number of hops required to route between any pair of peers (log_B N). The neighborhood set, M, contains the NodeIDs and IP addresses of the M peers that are closest in proximity to the local peer. The network proximity that Pastry uses is based on a scalar proximity metric such as the number of IP routing hops or geographic distance. The leaf set, L, is the set of peers with the L/2 numerically closest larger NodeIDs and the L/2 numerically closest smaller NodeIDs, in relation to the current peer’s NodeID. Typical values for L and M are B or 2 × B. Even with concurrent peer failure, eventual delivery is guaranteed with good reliability and fault resiliency, unless L/2 peers with adjacent NodeIDs fail simultaneously (L is a configuration parameter with a typical value of 16 or 32).
When a new peer (NodeID is X) joins the network, it needs to initialize its routing table and inform other peers of its presence. This new peer needs to know the address of a contact or bootstrap peer in the network. A small list of contact peers, based on a proximity metric (e.g., the RTT to each peer) to provide better performance, could be provided as a
■ Figure 5. Pastry peer’s routing table, leaf set, and neighbor set (shown for NodeID 37A0F1), and an example of a routing path from peer 37A0F1 with key B57B2D.
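The prefix-matching rule that Fig. 5 illustrates can be sketched as follows. This is a simplified sketch under stated assumptions, not Pastry's routing procedure: it ignores the leaf set and the fallback to numerically closer peers, and the helper names and the lists of known peers are hypothetical. NodeIDs are written as hexadecimal strings (B = 16), as in the figure.

```python
from typing import List, Optional


def shared_prefix_len(a: str, b: str) -> int:
    """Number of leading digits that two NodeIDs have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def next_hop(current: str, key: str, known_peers: List[str]) -> Optional[str]:
    """Pick a known peer that shares a longer prefix with the key than we do.

    Returns None when no known peer improves on the current peer's prefix;
    the real protocol would then consult the leaf set (not modelled here).
    """
    have = shared_prefix_len(current, key)
    better = [p for p in known_peers if shared_prefix_len(p, key) > have]
    if not better:
        return None
    return max(better, key=lambda p: shared_prefix_len(p, key))


if __name__ == "__main__":
    key = "B57B2D"
    # Hypothetical routing knowledge at peer 37A0F1.
    print(next_hop("37A0F1", key, ["1A223B", "B24EA3"]))  # B24EA3
```

Each hop extends the matched prefix by at least one digit, which is where the bound of roughly log_B N hops comes from.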
service in the network, and the new peer could select at random one of the peers for contact. As a result, this new peer will initially know about a nearby Pastry peer A. Peer X then asks A to route a join message with the key equal to X. Pastry routes the join message to the existing peer Z whose NodeID is numerically closest to X. Upon receiving the join request, peers A, Z and all peers encountered on the path from A to Z send their routing tables to X. Finally, X informs any peers that need to be aware of its arrival. This ensures that X initializes its routing table with appropriate information and the routing tables in all other affected peers are updated based on the information received. As peer A is topologically close to the new peer X, A’s neighborhood set is used to initialize X’s neighborhood set.
A Pastry peer is considered to have left the overlay network when its immediate neighbors in the NodeID space can no longer communicate with the peer. To replace this failed peer in the leaf set of its neighbors, its neighbors in the NodeID space contact the live peer with the largest index on the side of the failed peer, and request its leaf table. Let the received leaf set be L′, which overlaps the current peer’s leaf set L, and it contains peers with nearby NodeIDs not residing in L. The appropriate peer is chosen to insert into L, verifying that the peer is actually still alive by contacting it. The neighborhood set is not used in the routing of messages, but it is still kept fresh because this set plays an important role in exchanging information about nearby peers. Therefore, a peer contacts each member of the neighborhood set periodically to test if it is still alive. If the peer is not responding, the contacting peer asks other members for their neighborhood sets, checks the proximity of each of the newly contacted peers, and updates its own neighborhood set.
Pastry is being used in the implementation of a scalable application-level multicast infrastructure called Scribe [29, 30]. Instead of relying on a multicast infrastructure in the network that is not widely available, the participating peers route and distribute multicast messages using only unicast network services. It supports a large number of groups with large numbers of members per group. Scribe is built on top of Pastry, which is used to create and manage groups and to build efficient multicast trees for dissemination of messages to each group. Scribe builds a multicast tree formed by joining Pastry routes from each group member to a rendezvous point associated with a group. Membership maintenance and message dissemination in Scribe leverage the robustness, self-organization, locality, and reliability properties of Pastry.
SplitStream [31] allows a cooperative multicasting environment where peers contribute resources in exchange for using the service. The key idea is to split the content into k stripes and to multicast each stripe using a separate tree. Peers join as many trees as there are stripes they wish to receive and they specify an upper bound on the number of stripes that they are willing to forward. The challenge is to construct this forest of multicast trees such that an interior peer in one tree is a leaf peer in all the remaining trees and the bandwidth constraints specified by the peers are satisfied. This ensures that the forwarding load can be spread across all participating peers. For example, if all peers wish to receive k stripes and they are willing to forward k stripes, SplitStream will construct a forest such that the forwarding load is evenly balanced across all peers while achieving low delay and link stress across the network.
Squirrel [32] uses Pastry as its data object location service, to identify and route to peers that cache copies of a requested data object. It facilitates mutual sharing of Web data objects among client peers, and enables the peers to export their local caches to other peers in the network, thus creating a large shared virtual Web cache. Each peer then performs both Web browsing and Web caching, without the need for expensive and dedicated hardware for centralized Web caching. Squirrel faces a new challenge whereby peers in a decentralized cache incur the overhead of having to serve each other’s requests, and this extra load must be kept low.
PAST [33, 34] is a large-scale P2P persistent storage utility that is based on Pastry. The PAST system is composed of peers connected to the Internet such that each peer is capable of initiating and routing client requests to insert or retrieve files. Peers may also contribute storage to the system. A storage system like PAST is attractive because it exploits the multitude and diversity of peers in the Internet to achieve strong persistence and high availability. This eradicates the need for physical transport of storage media to protect backup and archival data, and the need for explicit mirroring to ensure high availability and throughput for shared data. A global storage utility also facilitates the sharing of storage and bandwidth, thus permitting a group of peers to jointly store or publish content that would exceed the capacity or bandwidth of any individual peer.
Pastiche [35] is a simple and inexpensive backup system that exploits excess disk capacity to perform P2P backup with no administrative costs. The cost and inconvenience of backup are unavoidable and often prohibitive. Small-scale solutions require significant administrative efforts. Large-scale solutions require aggregation of substantial demand to justify the capital costs of a large, centralized repository. Pastiche builds on three architectures: Pastry, which provides the scalable P2P network with self-administered routing and peer location; content-based indexing [36, 37], which provides flexible discovery of redundant data for similar files; and convergent encryption [18], which allows hosts to use the same encrypted representation for common data without sharing keys.
KADEMLIA
The Kademlia [14] P2P decentralized overlay network takes the basic approach of assigning each peer a NodeID in the 160-bit key space, and {key,value} pairs are stored on peers with IDs close to the key. A NodeID-based routing algorithm is used to locate peers near a destination key. One of the key features of Kademlia’s architecture is the use of a novel XOR metric for distance between points in the key space. XOR is symmetric and it allows peers to receive lookup queries from precisely the same distribution of peers contained in their routing tables. Kademlia can send a query to any peer within an interval, allowing it to select routes based on latency or send parallel asynchronous queries. It uses a single routing algorithm throughout the process to locate peers near a particular ID.
Every message being transmitted by a peer includes its peer ID, permitting the recipient to record the sender peer’s existence. Data keys are also 160-bit identifiers. To locate {key,value} pairs, Kademlia relies on the notion of distance between two identifiers. Given two 160-bit identifiers, a and b, it defines the distance between them as their bitwise exclusive OR (XOR) interpreted as an integer, d(a, b) = a ⊕ b = d(b, a) for all a, b, and this is a non-Euclidean metric. Thus, d(a, a) = 0, d(a, b) > 0 (if a ≠ b), and for all a, b: d(a, b) = d(b, a). XOR also offers the triangle inequality property: d(a, b) + d(b, c) ≥ d(a, c), since d(a, c) = d(a, b) ⊕ d(b, c) and a + b ≥ a ⊕ b for all a, b ≥ 0. Similar to Chord’s clockwise circle metric, XOR is unidirectional. For any given point x and distance d > 0, there is exactly one point y such that d(x, y) = d. The unidirectional approach makes sure that all lookups for the same key converge along the same path, regardless of the originating peer. Hence, caching {key,value} pairs along the lookup path alleviates hot spots.
Each peer in the network stores, for each i with 0 ≤ i < 160, a list of {IP address,UDP port,NodeID} triples for peers at distance between 2^i and 2^(i+1) from itself. These lists are called k-buckets. Each k-bucket is kept sorted by last time seen, i.e. least recently accessed peer at the head, most recently accessed at the tail. The Kademlia routing protocol consists of the following steps:
• PING probes a peer to check if it is active.
• STORE instructs a peer to store a {key,value} pair for later retrieval.
• FIND_NODE takes a 160-bit ID, and returns {IP address,UDP port,NodeID} triples for the k peers it knows that are closest to the target ID.
• FIND_VALUE is similar to FIND_NODE: it returns {IP address,UDP port,NodeID} triples, except in the case when a peer has received a STORE for the key, in which case it just returns the stored value.
Importantly, a Kademlia peer must be able to locate the k closest peers to some given NodeID. The lookup initiator starts by picking X peers from its closest non-empty k-bucket, and then sends parallel asynchronous FIND_NODE requests to the X peers it has chosen. If FIND_NODE fails to return a peer that is any closer than the closest peers already seen, the initiator resends the FIND_NODE to all of the k closest peers it has not already queried. It can route for lower latency because it has the flexibility to choose any one of k peers to forward a request. To find a {key,value} pair, a peer starts by performing a FIND_VALUE lookup to find the k peers with IDs closest to the key. To join the network, a peer n must have contact with an already participating peer m. Peer n inserts peer m into the appropriate k-bucket, and then performs a peer lookup for its own peer ID. Peer n refreshes all k-buckets farther away than its closest neighbor, and during this refresh, peer n populates its own k-buckets and inserts itself into other peers’ k-buckets, if needed.
VICEROY
The Viceroy [15] P2P decentralized overlay network is designed to handle the discovery and location of data and resources in a dynamic butterfly fashion. Viceroy employs
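Returning to the Kademlia XOR metric and k-buckets described in the preceding section, the following minimal sketch shows how the distance, the bucket index, and the "k closest peers" selection relate to one another. It uses small integers instead of 160-bit identifiers, and the function names are hypothetical; it is not Kademlia's implementation and omits the network messages entirely.

```python
def xor_distance(a: int, b: int) -> int:
    """Distance between two identifiers: their bitwise XOR read as an integer."""
    return a ^ b


def bucket_index(self_id: int, other_id: int) -> int:
    """Index i of the k-bucket holding other_id, i.e. 2^i <= d < 2^(i+1)."""
    d = xor_distance(self_id, other_id)
    if d == 0:
        raise ValueError("a peer does not keep itself in a k-bucket")
    return d.bit_length() - 1


def k_closest(target: int, known_ids, k: int):
    """Return the k known identifiers closest to target under the XOR metric."""
    return sorted(known_ids, key=lambda p: xor_distance(p, target))[:k]


if __name__ == "__main__":
    me, other = 0b1010, 0b0110
    print(xor_distance(me, other))   # 12
    print(bucket_index(me, other))   # 3, since 8 <= 12 < 16
    print(k_closest(0b1000, [0b1001, 0b0000, 0b1100, 0b1011], 2))  # [9, 11]
```

The unidirectional property follows directly from the definition: for a point x and a distance d, the only y with `xor_distance(x, y) == d` is `x ^ d`.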
ences will move to maintain load balancing. Since the Chord lookup service presents a solution where each peer maintains a logarithmic number of long-range links, it gives a logarithmic join/leave update. In Chord, the network is maintained appropriately by a background maintenance process, i.e. a periodic stabilization procedure that updates predecessor and successor pointers to cater to newly joined peers. Liben-Nowell et al. [43] ask how often the stabilization procedure needs to run to ensure the success of Chord’s lookups, and whether determining the optimum involves the measurement of peers’ behavior. Stoica et al. [6] demonstrate the advantage of recursive lookups over iterative lookups, but future work is proposed to improve resiliency to network partitions using a small set of known peers, and to reduce the number of messages in lookups by increasing the size of each step around the ring with a larger finger in each peer. Alima et al. [44] propose a correction-on-use mechanism in their Distributed K-ary Search (DKS), which is similar to Chord, to reduce the communication costs incurred by Chord’s stabilization procedure. The mechanism makes corrections to the expired routing entries by piggybacking lookups and insertions.
CAN uses a constant-degree network for routing lookup requests. It organizes the overlay peers into a d-dimensional Cartesian coordinate space, with each peer taking ownership of a specific hyper-rectangular shape in the space. The key motivation of the CAN design is based on the argument that Plaxton-based schemes would not perform well under churn, given that peer departures and arrivals would affect a logarithmic number of peers. Each peer maintains a routing table with its adjacent immediate neighbors. A peer joining the CAN causes the peer owning the region of space to split, giving half to the new peer and retaining half. A peer leaving the CAN passes its NodeID, its neighbors’ NodeIDs and IP addresses, and its {key,value} pairs to a takeover peer. CAN has a number of tunable parameters to improve routing performance: dimensionality of the hypercube; network-aware routing by choosing the neighbor closest to the destination in CAN space; multiple peers in a zone, allowing CAN to deliver messages to any one of the peers in the zone in an anycast manner; uniform partitioning, made possible by comparing the volume of a region with the volumes of neighboring regions when a peer joins; and landmark-based placement, which causes peers, at join time, to probe a set of well-known landmark hosts, estimating each of their network distances. There are open research questions on CAN’s resiliency, load balancing, locality, and latency/hopcount costs.
Kademlia’s XOR topology-based routing resembles very much the first phase in the routing algorithms of Pastry,
cient network construction proved formally in [15], and maintains constant degree networks in a dynamic environment, similar to CAN. Viceroy has logarithmic diameter, similar to Chord, Pastry, and Tapestry. Viceroy’s diameter is proven to be better than CAN’s and its degree is better than that of Chord, Pastry, and Tapestry. Its routing is achieved in O(log N) hops (where N is the number of peers) and with nearly optimal congestion. Peers joining and leaving the system induce O(log N) hops and require only O(1) peers to change their states. Li et al. [45] suggest in their paper that limited degree may increase the risk of network partition or limitations in the use of local neighbors. However, its advantage is the constant-degree overlay properties. Kaashoek et al. [46] highlight its fault-tolerance blind spots and its complexity.
Further work was done by Viceroy’s authors with the proposal of a two-tier, locality-aware DHT [47], which gives lower degree properties in each lower-tier peer, and the bounded-degree P2P overlay using de Bruijn graphs [48]. Since de Bruijn graphs give very short average routing distances and high resilience to peer failure, they are well suited for Structured P2P overlay networks. The P2P overlays discussed above are greedy, and for a given degree, the algorithms are suboptimal because the routing distance is longer. There are increasing improvements to de Bruijn P2P overlay proposals [46, 49–52]. The de Bruijn graph of degree k (k can be varied) could achieve an asymptotically optimum diameter (maximum hop count between any two peers in the graph) of log_k N, where N is the total number of peers in the system. Given O(log N) neighbors in each peer, the de Bruijn graphs’ hop count is O(log N / log log N). A good comparison study has been done by Loguinov et al. [50], where they use examples of Chord, CAN, and de Bruijn to study routing performance and resilience of P2P overlay networks, including graph expansion and clustering properties. They confirmed that de Bruijn graphs for a given degree k offer the best diameter and average distance between all pairs of peers (this determines the expected response time in number of hops), optimal resilience (k-peer connectivity), large bisection width (the bisection width of a graph provides tight upper bounds on the achievable capacity of the graph), and good node (peer) expansion, which guarantees little overlap between parallel paths to any destination peer. (If there is a peer failure, very few alternative paths to a destination peer are affected.)
P2P DHT-based overlay systems are susceptible to security breaches from malicious peers’ attacks. One simple attack on a DHT-based overlay system is when the malicious peer returns wrong data objects to the lookup queries. The authenticity of the data objects can be handled by using cryptographic techniques through some cost-effective public keys and/or
Tapestry, and Plaxton. For these three algorithms, there is a content hashes to securely link together different pieces of
need for an additional algorithmic structure for discovering data objects. Such techniques can neither prevent undesirable
the target peer within the peers that share the same prefix but data objects from polluting the search results, nor prevent
differ in the next b-bit digit. It was argued in [14] that Pastry denial of attacks. Malicious peers may still be able to corrupt,
and Tapestry algorithms require secondary routing tables of deny access, or respond to lookup queries of replicas of a data
size O(2b) in addition to the main tables of size O(2blog2bN), object, and impersonate so that replicas may be stored on ille-
which increases the cost of bootstrapping and maintenance. gitimate peers. Sit et al. [53] provide a very clear description
Kademlia resolves in their distinctive ways through the use of of security considerations that involve the adversaries that are
XOR metrics for the distance between 160-bit NodeIDs, and peers in the DHT overlay lookup system that do not follow
each peer maintains a list of contact peers, of which longer- the protocol correctly: malicious peers are able to eavesdrop
lived peers are given preference on this list. Kademlia can the communication between other nodes; malicious peers can
easily be optimized with a base other than 2, by configuring only receive data objects addressed to its IP address, and thus,
the bucket table so that it approaches the target b bits per the IP address can be a weak form of peer identity; and mali-
hop. This requiress having one bucket for each range of peers cious peers can collude together, giving believable false infor-
at a distance [j × (2160–(i+1)b), (j + 1) × (2160–(i+1)b)], for each mation. They presented a taxonomy of possible attacks
0 < j < 2 b and 0 ≤ i < 160/b. This expects no more than involving routing deficiencies due to corrupted lookup routing
(2b – 1) × (log2bN) buckets. and updates; vulnerability to partitioning and virtualization
The Viceroy overlay network (butterfly) presents an effi- into incorrect networks when new peers join and contact mali-
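As a minimal sketch of the XOR metric described above for Kademlia, the following Python computes the distance between two NodeIDs, the base-2 bucket a contact would fall into, and the contacts closest to a target. The helper names (`node_id`, `xor_distance`, `bucket_index`, `closest_contacts`) and the use of SHA-1 to derive example IDs are assumptions made for illustration; they are not taken from the cited protocol description.

```python
import hashlib

ID_BITS = 160  # NodeID width stated in the text


def node_id(label: str) -> int:
    """Hypothetical helper: derive a 160-bit example ID by hashing a label."""
    return int.from_bytes(hashlib.sha1(label.encode("utf-8")).digest(), "big")


def xor_distance(a: int, b: int) -> int:
    """Distance between two IDs under the XOR metric."""
    return a ^ b


def bucket_index(own: int, other: int) -> int:
    """Index i such that 2^i <= distance < 2^(i+1), i.e. base-2 buckets."""
    distance = xor_distance(own, other)
    if distance == 0:
        raise ValueError("a peer does not keep itself in a bucket")
    return distance.bit_length() - 1


def closest_contacts(target: int, contacts: list[int], count: int = 3) -> list[int]:
    """Return the `count` known contacts with the smallest XOR distance to target."""
    return sorted(contacts, key=lambda c: xor_distance(c, target))[:count]
```

Since a ^ b equals b ^ a, two peers always compute the same distance to each other, which is what lets a single table serve both lookups and maintenance in this sketch.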
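The log_k N diameter quoted above for de Bruijn graphs can be illustrated by how routing works in them: with N = k^D peers labeled by D base-k digits, each hop shifts one digit of the destination label into the current label, so at most D = log_k N hops are needed. The sketch below assumes this standard digit-shifting construction; the function name `de_bruijn_route` and the tuple encoding of labels are illustrative choices, not taken from the cited proposals.

```python
def de_bruijn_route(src: tuple, dst: tuple, k: int) -> list:
    """Return the list of labels visited when routing from src to dst.

    Labels are equal-length tuples of digits in range(k). A peer
    (x1, ..., xD) links to (x2, ..., xD, a) for every digit a.
    """
    if len(src) != len(dst):
        raise ValueError("labels must have the same number of digits")
    if any(not 0 <= digit < k for digit in src + dst):
        raise ValueError("digits must lie in range(k)")
    length = len(src)
    # Longest suffix of src that is already a prefix of dst; those digits
    # do not need to be shifted in again.
    overlap = next(
        size for size in range(length, -1, -1) if src[length - size:] == dst[:size]
    )
    path = [src]
    for digit in dst[overlap:]:
        path.append(path[-1][1:] + (digit,))
    return path
```

For example, `de_bruijn_route((0, 0, 0), (1, 1, 1), 2)` visits four labels, i.e. three hops, which equals D for this 8-peer graph.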
Structured P2P Overlay Network Comparisons

Algorithm taxonomy: CAN, Chord, Tapestry, Pastry, Kademlia, Viceroy

Architecture
  CAN: Multi-dimensional ID coordinate space.
  Chord: Uni-directional and circular NodeID space.
  Tapestry: Plaxton-style global mesh network.
  Pastry: Plaxton-style global mesh network.
  Kademlia: XOR metric for distance between points in the key space.
  Viceroy: Butterfly network with connected ring of predecessor and successor links; data managed by servers.

Lookup protocol
  CAN: {key, value} pairs to map a point P in the coordinate space using uniform hash function.
  Chord: Matching key and NodeID.
  Tapestry: Matching suffix in NodeID.
  Pastry: Matching key and prefix in NodeID.
  Kademlia: Matching key and NodeID-based routing.
  Viceroy: Routing through levels of tree until a peer is reached with no downlinks; vicinity search performed using ring and level-ring links.

System parameters
  CAN: N = number of peers in network; d = number of dimensions.
  Chord: N = number of peers in network.
  Tapestry: N = number of peers in network; B = base of the chosen peer identifier.
  Pastry: N = number of peers in network; b = number of bits (B = 2^b) used for the base of the chosen identifier.
  Kademlia: N = number of peers in network; b = number of bits (B = 2^b) of NodeID.
  Viceroy: N = number of peers in network.

Routing performance
  CAN: O(d · N^(1/d))
  Chord: O(log N)
  Tapestry: O(log_B N)
  Pastry: O(log_B N)
  Kademlia: O(log_B N) + c, where c = small constant
  Viceroy: O(log N)
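To get a feel for the routing-performance row of the table, the following sketch evaluates each asymptotic expression for a concrete network size. It drops the constant factors that O-notation hides, takes O(log N) as log2 N, uses the same B = 2^b for Tapestry, Pastry, and Kademlia, and leaves out Kademlia's additive constant c, so the numbers are only rough relative indicators. The function name and the default values d = 4 and b = 4 are assumptions.

```python
import math


def hop_estimates(n: int, d: int = 4, b: int = 4) -> dict[str, float]:
    """Evaluate the table's routing-performance expressions for n peers."""
    log2_n = math.log2(n)
    log_base_b_n = log2_n / b  # log to the base B = 2^b
    return {
        "CAN": d * n ** (1.0 / d),
        "Chord": log2_n,
        "Tapestry": log_base_b_n,
        "Pastry": log_base_b_n,
        "Kademlia": log_base_b_n,  # additive constant c omitted
        "Viceroy": log2_n,
    }


if __name__ == "__main__":
    for scheme, hops in hop_estimates(2 ** 16).items():
        print(f"{scheme:9s} ~{hops:.1f}")
```

Raising d lowers the CAN estimate at the cost of more neighbors per peer, which is the trade-off behind the "dimensionality" tuning parameter.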
are the ability to maintain locally a set of files in accordance with the maximum disk space allocated by the network operator, and to provide security mechanisms against malicious peers. The basic model is that requests for keys are passed along from peer to peer through a chain of proxy requests in which each peer makes a local decision about the location to send the request next, similar to Internet Protocol (IP) routing. Freenet also enables users to share unused disk space, thus allowing a logical extension to their own local storage devices.

The basic architecture consists of data items being identified by binary file keys obtained by applying the 160-bit SHA-1 hash function [70]. The simplest type of file key is the Keyword-Signed Key (KSK), which is derived from a short descriptive text string chosen by the user, e.g., /music/Britney.Spears. The descriptive text string is used as the input to deterministically generate a public/private key pair, and the public half is then hashed to yield the data file key. The private half of the asymmetric key pair is used to sign the data file, thus providing a minimal integrity check that a retrieved data file matches its data file key. The data file is also encrypted using the descriptive string itself as a key, so as to perform an explicit lookup protocol to access the contents of their data-stores.

However, nothing prevents two users from independently choosing the same descriptive string for different files. These problems are addressed by the Signed-Subspace Key (SSK), which enables personal namespaces. The public namespace key
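The name clash noted above can be seen in a simplified sketch of how a KSK file key depends only on the descriptive string. This is not Freenet's actual construction: the asymmetric key-pair generation step is replaced by a hash-based stand-in (`public_half_stand_in`, a hypothetical helper) purely to show that the file key is a deterministic function of the string.

```python
import hashlib


def public_half_stand_in(descriptive_string: str) -> bytes:
    """Stand-in for the public half of the deterministically generated key pair.

    Assumption: a real system would derive an asymmetric key pair here;
    a hash is used only so the example stays self-contained.
    """
    return hashlib.sha1(b"public:" + descriptive_string.encode("utf-8")).digest()


def ksk_file_key(descriptive_string: str) -> str:
    """Hash the (stand-in) public half with SHA-1 to get a 160-bit file key."""
    return hashlib.sha1(public_half_stand_in(descriptive_string)).hexdigest()


if __name__ == "__main__":
    alice = ksk_file_key("/music/Britney.Spears")
    bob = ksk_file_key("/music/Britney.Spears")
    print(alice == bob)  # both users end up with the same file key
```

Two users who pick the same string for different files therefore collide on the same key, which is the problem that motivates personal namespaces.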
■ Figure 8. Gnutella utilizes a decentralized architecture for document location and retrieval. (Diagram labels: Peer, Query, Response, Download.)

able, e.g., list of peers available from http://gnutellahosts.com. Once connected to the network, peers send messages to interact with each other. These messages are broadcast (i.e., sent to all peers with which the sender has open TCP connections), or simply back-propagated (i.e., sent on a specific connection on the reverse of the path taken by an initial, broadcast message). First, each message has a randomly generated identifier. Second, each peer keeps a short memory of the recently routed messages, used to prevent re-broadcasting and to implement back-propagation. Third, messages are flagged with TTL and “hops passed” fields. The messages that are allowed in the network are:
• Group Membership (PING and PONG) Messages. A peer joining the network initiates a broadcast PING message to announce its presence. The PING message is then forwarded to its neighbors and initiates a back-propagated PONG message, which contains information about the peer, such as the IP address, number and size of the data items.
• Search (QUERY and QUERY RESPONSE) Messages. QUERY contains a user-specified search string that each receiving peer matches against locally stored file names, and it is broadcast. QUERY RESPONSE messages are back-propagated replies to QUERY messages and include information necessary to download a file.
• File Transfer (GET and PUSH) Messages. File downloads are performed directly between two peers using these types of messages.

Therefore, to become a member of the network, a servent (peer) has to open one or many connections with other peers that are already in the network. With such a dynamic network environment, to cope with the unreliability after joining the network, a peer periodically PINGs its neighbors to discover other participating peers. Peers decide where to connect in the network based only on local information. Thus, the entire application-level network has servents as its peers and open TCP connections as its links, forming a dynamic, self-organizing network of independent entities.

The latest versions of Gnutella use the notion of super-peers or ultra-peers [11] (peers with better bandwidth connectivity), to help improve the routing performance of the network. However, it is still limited by the flooding mechanism used for communications across ultra-peers. Moreover, the ultra-peer approach makes a binary decision about a peer’s capacity (ultra-peer or not) and to our knowledge, it has no mechanism to dynamically adapt the ultra-peer-client topologies as the system evolves. Ultra-peers perform query processing on behalf of their leaf peers. When a peer joins the network as a leaf, it selects a number of ultra-peers, and then it publishes its file list to those ultra-peers. A query from a leaf peer is sent to an ultra-peer, which floods the query to its ultra-peer neighbors up to a limited number of hops. Dynamic querying [74] is a search technique whereby queries that return fewer results are re-flooded deeper into the network.

Saroiu et al. [75] examined the bandwidth, latency, availability, and file sharing patterns of the peers in Gnutella and Napster, and highlighted the existence of significant heterogeneity in both systems. Krishnamurthy et al. [76] propose a cluster-based architecture for P2P systems (CAP), which uses a network-aware clustering technique (based on a central clustering server) to group peers into clusters. Each cluster has one or more delegate peers that act as directory servers for objects stored at peers within the same cluster. Chawathe et al. [73] propose a model called Gia, by modifying Gnutella’s algorithm to include flow control, dynamic topology adaptation, one-hop replication, and careful attention to peer heterogeneity. The simulation results suggest that these modifications provide three to five orders of magnitude improvement in the total capacity of the system while retaining significant robustness to failures. Thus, making a few simple changes to Gnutella’s search operations would result in dramatic improvements in its scalability.

FASTTRACK/KAZAA

FastTrack [65] P2P is a decentralized file-sharing system that supports meta-data searching. Peers form a structured overlay of super-peer architectures to make search more efficient, as shown in Fig. 9. Super-peers are peers with high bandwidth, disk space, and processing power, and have volunteered to be elected to facilitate search by caching the meta-data. The ordinary peers transmit the meta-data of the data files they are sharing to the super-peers. All the queries are also forwarded to the super-peer. Then, Gnutella-type broadcast-based search is performed in a highly pruned overlay network of super-peers. The P2P system can exist without any super-peer, but this would result in worse query latency. However, this approach still consumes bandwidth so as to maintain the index at the super-peers on behalf of the peers that are connected. The super-peers still use a broadcast protocol for search, and the lookup queries are routed to peers and super-peers that have no relevant information to the query. KaZaA [66] and Grokster [77] are both FastTrack applications.

As mentioned, KaZaA is based on the proprietary FastTrack protocol which uses specially designated super-peers that have higher bandwidth connectivity. Pointers to each peer’s data are stored on an associated super-peer, and all queries are routed to the super-peers. Although this approach seems to offer better scaling properties than Gnutella, its design has not been analyzed. There have been proposals to incorporate this approach into the Gnutella network [11]. The KaZaA peer-to-peer file sharing network client supports a similar behavior, allowing powerful peers to opt out of network support roles that consume CPU and bandwidth.

KaZaA file transfer traffic consists of unencrypted HTTP transfers; all transfers include KaZaA-specific HTTP headers (e.g., X-KaZaA-IP). These headers make it simple to distinguish between KaZaA activity and other HTTP activity. The KaZaA application has an auto-update feature, meaning a running instance of KaZaA will periodically check for updated versions of itself. If one is found, it downloads the new executable over the KaZaA network.

A power-law topology, commonly found in many practical
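The three Gnutella message-handling rules described above (a randomly generated message identifier, a short memory of recently routed messages, and a TTL with a hops counter) can be sketched as a small in-memory simulation. The class and function names, the substring match on file names, and the choice to keep the "memory" as an unbounded dictionary are assumptions for illustration; no real network I/O or wire format is modeled.

```python
import uuid
from collections import deque


class Peer:
    def __init__(self, name, files=()):
        self.name = name
        self.files = list(files)
        self.neighbors = []
        # message id -> neighbor the message arrived from (None at the origin);
        # used both to drop duplicates and to back-propagate responses.
        self.seen = {}

    def connect(self, other):
        self.neighbors.append(other)
        other.neighbors.append(self)


def flood_query(origin, search_string, ttl=3):
    """Broadcast a QUERY and return (peer name, hops, reverse path) per hit."""
    msg_id = uuid.uuid4().hex  # randomly generated identifier
    origin.seen[msg_id] = None
    responses = []
    pending = deque((n, origin, ttl, 1) for n in origin.neighbors)
    while pending:
        peer, sender, ttl_left, hops = pending.popleft()
        if msg_id in peer.seen:  # already routed: do not re-broadcast
            continue
        peer.seen[msg_id] = sender
        if any(search_string in name for name in peer.files):
            # Back-propagate: follow the remembered senders to the origin.
            path, node = [peer.name], peer
            while node.seen[msg_id] is not None:
                node = node.seen[msg_id]
                path.append(node.name)
            responses.append((peer.name, hops, path))
        if ttl_left > 1:  # TTL decreases by one at every hop
            for neighbor in peer.neighbors:
                if neighbor is not sender:
                    pending.append((neighbor, peer, ttl_left - 1, hops + 1))
    return responses
```

With peers chained a-b-c-d and a matching file at d, a query from a with a TTL of 3 reaches d, while a TTL of 2 does not, which is the limited lookup horizon that TTL imposes.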
downloader has a complete file, it uses its upload rate rather than its download rate to decide which to unchoke. For optimistic unchoking, at any one time there is a single peer that is unchoked regardless of its upload rate. If this peer is interested, it counts as one of the four allowed downloaders. Peers that are optimistically unchoked rotate every 30 seconds.

OVERNET/EDONKEY2000

Overnet/eDonkey [68, 69] is a hybrid two-layer P2P information storage network composed of client and server, which are used to publish and retrieve small pieces of data by creating a file-sharing network. This architecture provides features such as concurrent download of a file from multiple peers, detection of file corruption using hashing, partial sharing of files during downloading, and expressive querying methods for file search. To join the network, the peer (client) needs to know the IP address and port of another peer (server) in the network. It then bootstraps from the other peer. The clients connect to a server and register the object files that they are sharing by providing the meta-data describing the object files. After registration, the clients can either search by querying the meta-data or request a particular file through its unique network identifier, thus providing guaranteed service to locate popular objects. Servers provide the locations of object files when requested by clients, so that clients can download the files directly from the indicated locations.

DISCUSSION ON UNSTRUCTURED P2P OVERLAY NETWORK

The Unstructured P2P centralized overlay model was first popularized by Napster. This model requires some managed infrastructure (the directory server) and shows some scalability limits. A flooding-requests model for decentralized P2P overlay systems such as Gnutella, whereby each peer keeps a user-driven neighbor table to locate data objects, is quite effective in locating popular data objects, thanks to the power-law property of user-driven characteristics. However, it can lead to excessive network bandwidth consumption, and remote or unpopular data objects may not be found due to the limit of lookup horizon typically imposed by TTL.

The argument is that DHT-based systems, while more efficient at many tasks and offering strong theoretical fundamentals to guarantee a key to be found if it exists, are not well suited for mass-market file sharing. They do not capture the semantic relationships between an object's name and its content or metadata. In particular, the DHT-based ability to find exceedingly rare objects is not required in a mass-market file sharing environment, and their ability to efficiently implement keyword search is still not proven. In addition, they use precise placement algorithms and specific routing protocols to make searching efficient. However, these Structured P2P overlay systems have not been widely deployed, and their ability to handle unreliable peers has not been tested. Thus, in the research community, efforts are being made to improve the lookup properties of Unstructured P2P overlays to include flow control, dynamic geometric topology adaptation [82, 83], one-hop replication, peer heterogeneity, etc.

Freenet, unlike Chord, does not assign responsibility for data to specific peers, and its lookups take the form of searches for cached copies. This prevents it from guaranteeing retrieval of existing data or from providing low bounds on retrieval costs. But Freenet provides anonymity and it introduces a novel indexing scheme whereby files are identified by content-hash keys and by secured signed-subspace keys to ensure that only one object owner can write to a file and anyone can read it. P2P overlay designs using DHTs share similar characteristics with Freenet: an exact query yields an exact response. This is not surprising since Freenet uses a hash function to generate keys. Recent research in [84] shows that changing Freenet’s routing table cache replacement scheme from LRU to enforcing clustering in the key space can significantly improve performance. This idea is based on the intuition from the small-world models [39] and theoretical results by Kleinberg [39].

Version 0.6 of the Gnutella protocol [9, 10] adopted the concept of ultra-peers, which are high-capacity peers that act as proxies for lower-capacity peers. One of the main enhancements is the Query Routing Protocol (QRP), which allows the leaf peers to forward an index of object name keywords to their ultra-peers [85]. This allows the ultra-peers to have their leaves receive lookup queries only when they have a match, and subsequently, it reduces the lookup query traffic at the leaves. A shortcoming of QRP is that the lookup query propagation is independent of the popularity of the objects. The Dynamic Query Protocol [86] addressed this by letting the leaf peers send single queries to high-degree ultra-peers, which adjust the lookup queries’ TTL bounds in accordance with the number of received lookup query results. The Gnutella UDP Extension for Scalable Searches (GUESS) [87] also aimed to reduce the number of lookup queries by repeatedly querying single ultra-peers with a TTL of 1, to limit the load on each lookup query.

As described earlier, Chawathe et al. [73] improve the Gnutella design in their Gia system, by incorporating an adaptation algorithm so that peers are attached to high-degree peers, and by providing a receiver-based token flow control for sending lookup queries to neighbors. Instead of flooding, they make use of a random walk search algorithm as the system keeps pointers to objects in neighboring peers. However, in [87] they proposed that Unstructured P2P overlays such as Gnutella can be built on top of Structured P2P overlays to help reduce the lookup query overhead and overlay maintenance traffic. They used the collapse point lookup query rate (defined as the per node query rate at which the successful query rate falls below 90 percent) and the average hopcounts prior to collapse. However, the comparison was done in a static network scenario with the older Gnutella and not the enhanced version of Gnutella.

BitTorrent, a second-generation P2P overlay system, achieves higher levels of robustness and resource utilization based on its incentives cooperation technique for file distribution. The longest and most comprehensive measurement study of a BitTorrent P2P system [88] provides more insight by comparing a detailed measurement study of BitTorrent with other popular P2P file-sharing systems, such as FastTrack/KaZaA, Gnutella, Overnet/eDonkey, and DirectConnect, based on five characteristics:
1 Popularity: Total number of users participating over a certain period of time.
2 Availability: System availability depending on contributed resources.
3 Download Performance: Contrast between size of data and the time required for download.
4 Content Lifetime: Time period when data is injected into the system until no peer is willing to share the data anymore.
5 Pollution Level: Fraction of corrupted content spread throughout the system.
FastTrack/KaZaA has the largest file sharing community, with Overnet/eDonkey and BitTorrent gaining popularity. The popularity of the BitTorrent system is influenced by the avail-
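The BitTorrent unchoking rule described at the start of this section (four allowed downloaders, one optimistically unchoked peer that counts toward the four only if it is interested, and rotation every 30 seconds) can be sketched as follows. The function names, the round-robin rotation order, and the representation of rates as a dictionary are assumptions; the text only states that the optimistic choice rotates, not how the next peer is picked.

```python
def choose_unchoked(rates, interested, optimistic, slots=4):
    """Return the peers to unchoke.

    rates: peer -> observed transfer rate used for ranking
    interested: set of peers that want to download from us
    optimistic: the single peer unchoked regardless of its rate
    """
    # The optimistic peer uses up a regular slot only if it is interested.
    regular_slots = slots - 1 if optimistic in interested else slots
    candidates = [p for p in interested if p != optimistic]
    candidates.sort(key=lambda p: rates.get(p, 0.0), reverse=True)
    return [optimistic] + candidates[:regular_slots]


def optimistic_peer(peers, elapsed_seconds, period=30):
    """Assumed round-robin rotation of the optimistic unchoke every `period` s."""
    return peers[int(elapsed_seconds // period) % len(peers)]
```

Giving every peer an occasional turn, independent of its rate, is what lets newly joined peers obtain their first pieces and lets a peer discover better partners than its current ones.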
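The collapse point defined in the discussion above (the per-node query rate at which the successful query rate falls below 90 percent) is straightforward to extract from measurements. The sketch below assumes the measurements are available as (query rate, success fraction) pairs; the function name and the sample values in the usage note are hypothetical and are not results from the cited study.

```python
def collapse_point(samples, threshold=0.90):
    """Return the lowest per-node query rate whose success fraction is below
    the threshold, or None if the system never collapses in the samples.

    samples: iterable of (per_node_query_rate, success_fraction) pairs
    """
    for rate, success in sorted(samples):
        if success < threshold:
            return rate
    return None
```

For instance, with the made-up samples `[(1, 0.99), (10, 0.95), (100, 0.85), (1000, 0.20)]` the function returns 100.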
Unstructured P2P Overlay Network Comparisons

Algorithm taxonomy: Freenet, Gnutella, FastTrack/KaZaA, BitTorrent, Overnet/eDonkey 2000

Lookup protocol
  Freenet: Keys, Descriptive Text String search from peer to peer.
  Gnutella: Query flooding.
  FastTrack/KaZaA: Super-Peers.
  BitTorrent: Tracker.
  Overnet/eDonkey 2000: Client-server peers.

System parameters
  Freenet: None.
  Gnutella: None.
  FastTrack/KaZaA: None.
  BitTorrent: .torrent file.
  Overnet/eDonkey 2000: None.

Routing performance
  Freenet: Guarantee to locate data using Key search until the requests exceed the Hops-To-Live (HTL) limits.
  Gnutella: No guarantee to locate data; improvements made in adapting ultrapeer-client topologies; good performance for popular content.
  FastTrack/KaZaA: Some degree of guarantee to locate data, since queries are routed to the Super-Peers, which have better scaling; good performance for popular content.
  BitTorrent: Guarantee to locate data and guarantee performance for popular content.
  Overnet/eDonkey 2000: Guarantee to locate data and guarantee performance for popular content.
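The Query Routing Protocol idea from the discussion above, in which leaf peers forward an index of object-name keywords to their ultra-peer so that only leaves with a match receive a query, can be sketched as below. The class name, the tokenization rule, and the plain set-of-keywords index are assumptions made for readability; the representation of the index in a deployed protocol may well differ.

```python
import re


class UltraPeer:
    def __init__(self):
        # leaf id -> set of lowercase keywords taken from its file names
        self.leaf_keywords = {}

    @staticmethod
    def _tokens(text):
        return {t for t in re.split(r"[^a-z0-9]+", text.lower()) if t}

    def register_leaf(self, leaf_id, file_names):
        """Store the keyword index that a leaf forwards for its shared files."""
        keywords = set()
        for name in file_names:
            keywords |= self._tokens(name)
        self.leaf_keywords[leaf_id] = keywords

    def route_query(self, query):
        """Return only the leaves whose index contains every query term."""
        terms = self._tokens(query)
        return sorted(
            leaf for leaf, keywords in self.leaf_keywords.items() if terms <= keywords
        )
```

A query that matches no leaf index is simply not forwarded to any leaf, which is where the reduction in leaf-side query traffic comes from.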
improve global routing properties. There is ongoing research in this area based on mapping the peers into geometric coordinate-based space [100, 103–112] and heuristic proximity routing optimizations [113–117]. Taking heterogeneity of the peers and its geometric properties [82, 83] into account when delegating responsibility across peers, P2P overlays will improve the routing scalability. Future research would aim to reduce the stretch (ratio of overlay path to underlying network path) routing metric based on scalable and robust proximity calculations (e.g., in geometric space). This leads to improved P2P overlay operations performance globally. A mixed set of metrics which include delay, throughput, available bandwidth, and packet loss would provide a more efficient global routing optimization.
• Cross-application of Internet P2P overlay networking models in mobile, wireless, or ad-hoc networks. Because of
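The stretch metric mentioned above, the ratio of the overlay path to the underlying network path, can be written down in a few lines. Using latency as the path cost is an assumption (the text defines only the ratio, not the unit), and the function name is illustrative.

```python
def stretch(overlay_hop_costs, direct_cost):
    """Ratio of the total overlay path cost to the direct network path cost."""
    if direct_cost <= 0:
        raise ValueError("direct path cost must be positive")
    return sum(overlay_hop_costs) / direct_cost
```

As a worked example with made-up numbers, an overlay route of three hops costing 20, 30, and 25 ms between two peers that are 50 ms apart directly has a stretch of 75 / 50 = 1.5; a stretch of 1 would mean the overlay adds no cost.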
REFERENCES

[1] C. Plaxton, R. Rajaraman, and A. Richa, “Accessing Nearby Copies of Replicated Objects in a Distributed Environment,” Proc. 9th Annual ACM Symp. Parallel Algorithms and Architectures, 1997.
[2] L. Breslau et al., “Web Caching and zipf-like Distribution: Evidence and Implications,” Proc. IEEE INFOCOM, 1999.
[3] D. R. Karger et al., “Consistent Hashing and Random Trees: Distributed Caching Protocols for Relieving Hot Spots on the World Wide Web,” Proc. ACM Symp. Theory of Comp., May 1997, pp. 654–63.
[4] A. Rowstron and P. Druschel, “Pastry: Scalable, Distributed Object Location and Routing for Large-scale Peer-to-peer Systems,” Proc. Middleware, 2001.
[5] S. Ratnasamy et al., “A Scalable Content Addressable Network,” Proc. ACM SIGCOMM, 2001, pp. 161–72.
[6] I. Stoica, R. Morris et al., “Chord: A Scalable Peer-to-Peer Lookup Protocol for Internet Applications,” IEEE/ACM Trans. Net., vol. 11, no. 1, 2003, pp. 17–32.
[7] B. Y. Zhao et al., “Tapestry: A Resilient Global-Scale Overlay for Service Deployment,” IEEE JSAC, vol. 22, no. 1, Jan. 2004, pp. 41–53.
[8] Napster, available at http://www.napster.com/
[9] (2001) Gnutella development forum, the Gnutella v0.6 protocol, available at http://groups.yahoo.com/group/the gdf/files/
[10] (2002) Gnucleus, the Gnutella Web caching system, available at http://www.gnucleus.net/gwebcache/
[11] (2002) Gnutella ultrapeers, available at http://rfc-gnutella.sourceforge.net/Proposals/Ultrapeer/Ultrapeers.htm/
[12] F. Dabek et al., “Towards a Common API for Structured Peer-to-peer Overlays,” Proc. 2nd Int’l. Wksp. Peer-to-Peer Systems (IPTPS 2003), Berkeley, California, USA, Feb. 20–21, 2003.
[13] B. Karp et al., “Spurring Adoption of DHTs with OpenHash, a Public DHT Service,” Proc. 3rd Int’l. Wksp. Peer-to-Peer Systems (IPTPS 2004), Berkeley, California, USA, Feb. 26–27, 2004.
[14] P. Maymounkov and D. Mazieres, “Kademlia: A Peer-to-Peer Information System Based on the XOR Metric,” Proc. IPTPS, Cambridge, MA, USA, Feb. 2002, pp. 53–65.
[15] D. Malkhi, M. Naor, and D. Ratajczak, “Viceroy: A Scalable and Dynamic Emulation of the Butterfly,” Proc. ACM PODC 2002, Monterey, CA, USA, July 2002, pp. 183–92.
[16] P. Francis, “Yoid: Extending the Internet Multicast Architecture,” unpublished, Apr. 2000. http://www.aciri.org/yoid/docs/index.html
Oceanstore,” IEEE Internet Comp., 2001.
[26] L. Peterson et al., “A Blueprint for Introducing Disruptive Technology into the Internet,” SIGCOMM Comp. Commun. Rev., vol. 33, no. 1, 2003, pp. 59–64.
[27] S. Q. Zhuang et al., “Bayeux: An Architecture for Scalable and Fault-Tolerant Wide-Area Data Dissemination,” Proc. 11th Int’l. Wksp. Network and Op. Sys. Support for Digital Audio and Video, 2001, pp. 11–20.
[28] F. Zhou et al., “Approximate Object Location and Spam Filtering on Peer-to-Peer Systems,” Proc. Middleware, June 2003.
[29] A. Rowstron et al., “SCRIBE: The Design of a Large-Scale Event Notification Infrastructure,” Proc. 3rd Int’l. Wksp. Networked Group Commun. (NGC2001), London, UK, Nov. 2001, pp. 30–43.
[30] M. Castro et al., “SCRIBE: A Large-Scale and Decentralized Application-Level Multicast Infrastructure,” IEEE JSAC (special issue on Network Support for Multicast Communications), October 2002.
[31] M. Castro et al., “Splitstream: High-Bandwidth Multicast in Cooperative Environments,” Proc. 19th ACM Symp. Operating Systems Principles, Oct. 19–20, 2003, pp. 298–313.
[32] S. Iyer, A. Rowstron, and P. Druschel, “Squirrel: A Decentralized Peer-to-Peer Web Cache,” Proc. 21st Symp. Principles of Distributed Computing (PODC), Monterey, California, USA, July 21–24 2002.
[33] P. Druschel and A. Rowstron, “PAST: A Large-Scale, Persistent Peer-to-Peer Storage Utility,” Proc. 8th Wksp. Hot Topics in Op. Sys. (HotOS-VIII). Schloss Elmau, Germany: IEEE CompSoc, May 2001.
[34] A. Rowstron and P. Druschel, “Storage Management and Caching in Past, a Large-Scale, Persistent Peer-to-Peer Storage Utility,” Proc. 18th ACM Symp. Operating Systems Principles, Oct. 2001, pp. 188–201.
[35] L. P. Cox, C. D. Murray, and B. D. Noble, “Pastiche: Making Backup Cheap and Easy,” SIGOPS Op. Sys. Rev., vol. 36, no. SI, 2002, pp. 285–98.
[36] A. Muthitacharoen, B. Chen, and D. Mazières, “A Low-Bandwidth Network File System,” Proc. 18th ACM Symp. Op. Sys. Principles, 2001, pp. 174–87.
[37] U. Manber, “Finding Similar Files in a Large File System,” Proc. USENIX Winter 1994 Conf., Jan. 1994, pp. 1–10.
[38] H. J. Siegel, “Interconnection Networks for SIMD Machines,” Computer, vol. 12, no. 6, 1979, pp. 57–65.
[39] J. Kleinberg, “The Small-World Phenomenon: An Algorithm Perspective,” Proc. 32nd Annual ACM Symp. Theory of Comp., 2000, pp. 163–70.
[40] L. Barriére et al., “Efficient Routing in Networks with Long Range Contacts,” Proc. 15th Int’l. Conf. Distributed Computing, vol. 2180, 2001, pp. 270–84.
[41] S. Rhea et al., “Handling Churn in a DHT,” Proc. 2nd Int’l. Wksp. Peer-to-Peer (IPTPS 2003), Feb. 2003.
[42] M. Castro, M. Costa, and A. Rowstron, “Performance and Dependability of Structured Peer-to-Peer Overlays,” Proc. 2004 Int’l. Conf. Dependable Sys. and Net., Palazzo dei Congressi, Florence, Italy, June 28–July 1 2004.
[43] D. Liben-Nowell, H. Balakrishnan, and D. Karger, “Analysis of the Evolution of Peer-to-Peer Systems,” Proc. Annual ACM Symp. Principles of Distributed Comp., Monterey, California, USA, 2002.
[44] L. Alima et al., “Dks(n,k,f): A Family of Low Communication, Scalable and Fault-Tolerant Infrastructures for P2P Applications,” Proc. 3rd IEEE/ACM Int’l. Symp. Cluster Comp. and the Grid, Monterey, California, USA, 2003, pp. 344–50.
[45] X. Li and C. Plaxton, “On Name Resolution in Peer-to-Peer Networks,” Proc. 2nd ACM Int’l. Wksp. Principles of Mobile Comp., Monterey, California, USA, 2002, pp. 82–89.
[46] F. Kaashoek and D. Karger, “Koorde: A Simple Degree-Optimal Hash Table,” Proc. 2nd Int’l. Wksp. Peer-to-Peer Systems (IPTPS’03), Berkeley, CA, USA, Feb. 20–21, 2003.
[47] I. Abraham, D. Malkhi, and O. Dubzinski, “Land: Stretch (1+epsilon) Locality Aware Networks for DHTS,” Proc. ACM-SIAM Symp. Discrete Algorithms (SODA 2004), New Orleans, LA., USA, 2004.
[48] N. D. de Bruijn, “A Combinatorial Problem,” Koninklijke Netherlands: Academe Van Wetenschappen, vol. 49, 1946, pp. 758–64.
[49] M. Naor and U. Wieder, “Novel Architectures for P2P Applications: The Continuous-Discrete Approach,” Proc. 15th Annual ACM Symp. Parallel Algorithms and Architectures (SPAA 2003), San Diego, California, USA, June 7–9 2003, pp. 50–59.
[50] D. Loguinov, A. Kumar, and S. Ganesh, “Graph-Theoretic Analysis of Structured Peer-to-Peer Systems: Routing Distances and Fault Resilience,” Proc. ACM SIGCOMM, Karlsruhe, Germany, Aug. 25–29 2003, pp. 395–406.
[51] M. Naor and U. Wieder, “A Simple Fault Tolerant Distributed Hash Table,” Proc. 2nd Int’l. Wksp. Peer-to-Peer Systems (IPTPS ’03), Berkeley, California, USA, Feb. 20–21 2003.
[52] P. Fraigniaud and P. Gauron, “The Content-Addressable Networks D2B,” Laboratoire de Recherche en Informatique, Université de Paris Sud, Tech. Rep. Technical Report 1349, Jan. 2003.
[53] E. Sit and R. Morris, “Security Considerations for Peer-to-Peer Distributed Hash Tables,” Proc. 1st Int’l. Wksp. Peer-to-Peer Systems (IPTPS), Cambridge, MA, USA, Mar. 2002.
[54] M. Castro et al., “Secure Routing for Structured Peer-to-Peer Overlay Networks,” SIGOPS Oper. Syst. Rev., vol. 36, no. SI, 2002, pp. 299–314.
[55] A. Singh et al., “Defending Against Eclipse Attacks on Over-
[63] Z. Despotovic and K. Aberer, “A Probabilistic Approach to Predict Peers’ Performance in P2P Networks,” Proc. 8th Int’l. Wksp. Cooperative Information Agents (CIA 2004), Erfurt, Germany, Sept. 27–29 2004.
[64] I. Clarke et al., Freenet: A Distributed Anonymous Information Storage and Retrieval System, available at http://freenetproject.org/freenet.pdf, 1999.
[65] Fasttrack Peer-to-Peer Technology Company, available at http://www.fasttrack.nu/, 2001.
[66] Kazaa Media Desktop, available at http://www.kazaa.com/, 2001.
[67] Bittorrent, available at http://bitconjurer.org/BitTorrent/, 2003.
[68] The Overnet File-sharing Network, available at http://www.overnet.com/, 2002.
[69] Overnet/edonkey2000, available at http://www.edonkey2000.com/, 2000.
[70] American National Standard Institute (ANSI), “Public Key Cryptography Using Irreversible Algorithms - Part 2: The Secure Hash Algorithm (SHA-1),” ANSI Standards, Tech. Rep. ANSI X9.30.2-1997, 1997.
[71] P. Ganesan, Q. Sun, and H. Garcia-Molina, “Yappers: A Peer-to-Peer Lookup Service over Arbitrary Topology,” Proc. IEEE INFOCOM 2003, San Francisco, USA, Mar. 30–Apr. 1, 2003.
[72] Q. Lv, S. Ratnasamy, and S. Shenker, “Can Heterogeneity Make Gnutella Scalable?” Proc. 1st Int’l. Wksp. Peer-to-Peer Systems (IPTPS), Cambridge, MA, USA, Feb. 2002.
[73] Y. Chawathe et al., “Making Gnutella-like P2P Systems Scalable,” Proc. ACM SIGCOMM, Karlsruhe, Germany, Aug. 25–29 2003.
[74] Gnutella Proposals for Dynamic Querying, available at http://www9.limewire.com/developer/dynamicquery.html/
[75] S. Saroiu, P. K. Gummadi, and S. D. Gribble, “A Measurement Study of Peer-to-Peer File Sharing Systems,” Proc. Multimedia Comp. and Net. (MMCN), San Jose, California, USA, Jan. 2002.
[76] B. Krishnamurthy, J. Wang, and Y. Xie, “Early Measurement of a Cluster-Based Architecture for P2P Systems,” Proc. ACM SIGCOMM Internet Measurement Wksp., San Francisco, USA, Nov. 2001.
[77] Grokster, available at http://www.grokster.com/
[78] A.-L. Barabási et al., “Power-Law Distribution of the World Wide Web,” Science, vol. 287, 2000.
[79] R. Albert, H. Jeong, and A.-L. Barabási, “Diameter of the World Wide Web,” Nature, vol. 401, 1999, pp. 130–31.
[80] M. Faloutsos, P. Faloutsos, and C. Faloutsos, “On Power-Law Relationships of the Internet Topology,” Proc. SIGCOMM 1999, 1999.
[81] K. P. Gummadi et al., “Measurement, Modeling, and Analysis of a Peer-to-Peer File Sharing Workload,” Proc. SOSP, Bolton Landing, New York, USA, Oct. 19–22 2003.
[82] M. Kleis, E. K. Lua, and X. Zhou, “Hierarchical Peer-to-Peer
lay Networks,” Proc. SIGOPS European Wksp., Leuven, Bel- Networks using Lightweight Superpeer Topologies,” Proc. 10th
gium, Sept. 2004. IEEE Symp. Comp. and Commun. (ISCC 2005), La Manga del
[56] D. S. Wallach, “A Survey of Peer-to-Peer Security Issues,” Mar Menor, Cartagena, Spain, June 27–30 2005.
Proc. Int’l. Symp. Software Security, Tokyo, Japan, November [83] M. Kleis, E. K. Lua, and X. Zhou, “A Case for Lightweight
2002. Superpeer Topologies.” KiVS Kurzbeiträge und Wksp., Mar.
[57] E. K. Lua et al., “Barterroam: A Novel Mobile and Wireless 2005, pp. 185–88.
Roaming Settlement Model,” Proc. QofIS, 2004, pp. 348–57. [84] A. Goel and R. Govindan, “Using the Small-World Model to
[58] C. Buragohain, D. Agrawal, and S. Suri, “A Game-Theoretic Improve Freenet Performance,” Comp. Net. Journal, vol. 46,
Framework for Incentives in P2P Systems,” Proc. IEEE P2P no. 4, Nov. 2004, pp. 555–74.
2003, Linkoping, Sweden, Sept. 1–3 2003. [85] A. Singla and C. Rohrs, Ultrapeers: Another Step Towards
[59] P. Golle et al., “Incentives for Sharing in Peer-to-Peer Net- Gnutella Scalability, available at http://groups.yahoo.com/
works,” Lecture Notes in Computer Science, vol. 2232, pp. group/thegdf/files/Proposals/Working Proposals/Ultrapeer/
75+, 2001. [86] A. Fisk, Gnutella Ultrapeer Query Protocol v0.1, available at
[60] K. Lai et al., “Incentives for Cooperation in Peer-to-Peer Net- http://groups.yahoo.com/group/the gdf/files/Proposals/Working
works,” Proc. Wksp. Economics of Peer-to-Peer Systems, Proposals/search/Dynamic Querying/
Linkoping, Sweden, June 2003. [87] S. Daswani and A. Fisk, Gnutella UDP Extension for Scalable
[61] J. R. Douceur, “The Sybil Attack,” Proc. 1st Int’l. Wksp. Peer- Searches (GUESS) v0.1, available athttp://www.limewire.org/
to-Peer Systems, Mar. 7–8 2002, pp. 251- 260. fisheye/viewrep/~raw,r=1.2/limecvs/core/guess 01.html
[62] R. Dingledine, M. J. Freedman, and D. Molnar, “Accountabili- [88] J. A. Pouwelse et al., “A Measurement Study of the BitTor-
ty Measures for Peer-to-Peer Systems,” Peer-to-Peer: Harness- rent Peer-to-Peer File Sharing System,” Delft University of Tech-
ing the Power of Disruptive Technologies, D. Derickson, Ed. nology Parallel and Distributed Systems Report Series, Tech.
O’Reilly and Associates, Nov. 2000. Rep. Technical Report PDS-2004-007, 2004.
[93] A.-L. Barabási and R. Albert, “Emergence of Scaling in Random Networks,” Science, vol. 286, no. 509, 1999.
[94] L. Adamic et al., “Search in Power-Law Networks,” Physical Review E, vol. 64, 2001.
[95] B. Yang and H. Garcia-Molina, “Efficient Search in Peer-to-Peer Networks,” Proc. 22nd IEEE Int’l. Conf. Distributed Computing Systems (ICDCS), July 2002.
[96] R. Albert, H. Jeong, and A.-L. Barabási, “Attack and Tolerance in Complex Networks,” Nature, vol. 406, no. 378, 2000.
[97] N. J. A. Harvey et al., “Skipnet: A Scalable Overlay Network with Practical Locality Properties,” Proc. 4th USENIX Symp. Internet Tech. and Sys. (USITS), Seattle, WA, USA, Mar. 2003.
[98] B. T. Loo et al., “The Case for a Hybrid P2P Search Infrastructure,” Proc. 3rd Int’l. Wksp. Peer-to-Peer Systems (IPTPS), San Diego, California, USA, Feb. 26–27, 2004.
[99] A. C.-C. Yao, “On Constructing Minimum Spanning Trees in k-dimensional Space and Related Problems,” SIAM J. Comp., vol. 11, 1982, pp. 721–36.
[100] E. K. Lua, J. Crowcroft, and M. Pias, “Highways: Proximity Clustering for Scalable Peer-to-Peer Networks,” Proc. IEEE 4th Int’l. Conf. Peer-to-Peer Computing (P2P 2004), Aug. 25–28 2004, pp. 266–267.
[101] S. Bellovin, “Security Aspects of Napster and Gnutella,” Proc. 2001 Usenix Annual Technical Conf., Boston, Massachusetts, USA, June 2001.
[102] G. Hardin, “The Tragedy of the Commons,” Science, vol. 162, 1968, pp. 1243–48.
[103] E. K. Lua, T. Griffin, and M. Pias, “On the Accuracy of Embeddings for Internet Coordinate Systems,” submission, 2005.
[104] M. Costa et al., “PIC: Practical Internet Coordinates for Distance Estimation,” Proc. 24th IEEE Int’l. Conf. Distributed Computing Systems (ICDCS 2004), Tokyo, Japan, Mar. 2004.
[105] F. Dabek et al., “Vivaldi: A Decentralized Network Coordinate System,” Proc. ACM SIGCOMM 2004 Conf., Portland, Oregon, Aug. 2004.
[106] Y. Shavitt and T. Tankel, “Big-Bang Simulation for Embedding Network Distances in Euclidean Space,” Proc. IEEE INFOCOM 2003 Conf., San Francisco, California, USA, Mar. 30–Apr. 3 2003.
[107] Y. Shavitt and T. Tankel, “On the Curvature of the Internet and its Usage for Overlay Construction and Distance Estimation,” Proc. IEEE INFOCOM 2004 Conf., Hong Kong, Mar. 7–11 2004.
[108] T. E. Ng and H. Zhang, “Predicting Internet Network Distance with Coordinates-Based Approaches,” Proc. IEEE INFOCOM 2002, New York, USA, June 2002.
[109] M. Pias et al., “Lighthouses for Scalable Distributed Location,” Proc. 2nd Int’l. Wksp. Peer-to-Peer Systems, Feb. 2003.
[110] L. Tang and M. Crovella, “Virtual Landmarks for the Internet,” Proc. ACM SIGCOMM Internet Measurement Conf. (IMC 2003), Miami (FL), USA, Oct. 2003.
[111] H. Lim, J. Hou, and C. Choi, “Constructing Internet Coordinate System Based on Delay Measurement,” Proc. ACM SIGCOMM Internet Measurement Conf. (IMC 2003), Miami (FL), USA, Oct. 2003.
[112] Y. Mao and L. K. Saul, “Modeling Distances in Large-Scale Networks by Matrix Factorization,” Proc. 4th ACM SIGCOMM Conf. on Internet Measurement, 2004, pp. 278–287.

BIOGRAPHIES

ENG KEONG LUA ([email protected]) is currently a Ph.D. research candidate in the Computer Laboratory of the University of Cambridge. He is sponsored by Microsoft Research and EPSRC e-Science, and has completed a research internship at the Intel Research Laboratory, Cambridge. Prior to this, he was the Assistant Director in the Information Infrastructure Development Division of the Infocomm Development Authority of Singapore, where he was involved in the developmental and regulatory aspects of cutting-edge technologies for Singapore’s National Information Infrastructure. He was formerly the Consulting Project Manager in the regional solution development and integration center of HP Consulting. In his consulting engagements, he served as telco industry advisor and delivered large turn-key projects to major telecommunication companies worldwide. He also spent several years as a faculty staff member in academia, and served on various strategic and technical advisory committees. His current research interests include accurate distributed geometric location services for Peer-to-Peer networked systems, and applying modelling and analysis methods to efficient wired and wireless network protocol design and network communications security. He holds a filed patent. He received his M.Sc. (Telecommunications) with Distinction from the Department of Electronic and Electrical Engineering, University College London (UCL), in 1997. He is a Member of the IEEE Computer and Communications Societies, IEE and IES. He holds Cisco Certified Academic Instructor (CCAI), Cisco Certified Network Professional (CCNP) and Cisco Certified Network Associate (CCNA) professional certifications. He was the inaugural recipient of the prestigious Australia-Asia Award in 2003, presented by the Australian Government.

JON CROWCROFT is the Marconi Professor of Networked Systems in the Computer Laboratory of the University of Cambridge. Prior to that he was professor of networked systems at UCL in the Computer Science Department. He is a Fellow of the ACM, a Fellow of the British Computer Society, a Fellow of the IEE and a Fellow of the Royal Academy of Engineering, as well as a Fellow of the IEEE. He was a member of the IAB and was general chair for the ACM SIGCOMM 95-99. He is on the editorial team for COMNET, and on the program committee for ACM SIGCOMM and IEEE INFOCOM. He has published 5 books; the latest is the Linux TCP/IP Implementation, published by Wiley in 2001. Currently he is the Principal Investigator for the CMI funded Communications Research Network, which is a 3M pound government/industry/academia multidisciplinary collaboration to automate the successful exploitation of disruptive communications technologies.

MARCELO PIAS obtained a B.Eng. in Computer Engineering from Fundacao Universidade Federal do Rio Grande (FURG, Brazil) in 1999 and a Ph.D. degree in Computer Science from University College London (UCL) in February 2004, where he worked in the field of distributed metering of networked services in end-user devices. This project was funded by British Telecom (BT Exact). From Aug 2003 to Aug 2004, he worked as a post-doc researcher at Intel Labs in Cambridge in the area of decentralised peer-to-peer location systems for the Internet. He is currently in the Computer Laboratory, University of Cambridge, working on two wireless sensor networks (WSNs) projects. The Sentient Sports project aims at tracking the performance of athletes in sports events, and the EU funded Embedded WiSeNts project is preparing a research roadmap in the area of WSNs and cooperating objects for the European Commission.

RAVI S. SHARMA is presently Deputy Head of International Relations and Director (India Strategy) at the Nanyang Technological University, Singapore. He is concurrently Adjunct Associate Professor of Communication and Information. Prior to this, he was the Asean Communications Industry Principal at IBM Global Services and before that, Director of the Multimedia Competency Centre of Deutsche Telekom Asia. His teaching, consulting and research interests are in telecommunications best practices and strategies. His work has appeared in leading journals, conferences, trade publications and the broadcast media. He has been teaching IT and management courses since 1985 and enjoys this role as educator and mentor immensely. In consulting engagements, he has served as telco industry advisor to Frost & Sullivan’s technology practice in the Asia Pacific, Vice-President of the Homeportal Inc. presence in Asia, and business advisor to several local startups. He received his Ph.D. in engineering from the University of Waterloo and is a Chartered Engineer (UK) and a Senior Member of the IEEE. He is an associate editor of the IEEE Communications Surveys and sits on various technical and professional advisory boards in the region.

STEVEN LIM is currently the Program Director for ST Electronics (Info-Soft), a large info-communication company within Singapore's Temasek Group of Companies. In his present role, he covers e-government and enterprise IT solutions delivery. Prior to this, he was Regional Principal Consultant of Microsoft Regional Services in APAC, Greater China Region, focusing on the telecommunication industry. Steven was formerly the Consulting Manager in the regional solution and integration center with HP Consulting. In his consulting engagements, he served as telco industry advisor and delivered large projects to major telecommunication companies in Asia Pacific as well as worldwide. Before this he held key management and technical positions in the IT industry in the shipping, logistics and training industries. He has over 17 years of IT experience. During the span of his career, he has served in various IT committees to support the business, both internally and externally. He received his B.Sc. in Computing Science from the University of Newcastle upon Tyne. He is a Member of the Project Management Institute and Singapore Computer Society.