Dynamic Search Algorithm in Unstructured Peer-to-Peer Networks
IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, VOL. 20, NO. 5, MAY 2009
Abstract—Designing efficient search algorithms is a key challenge in unstructured peer-to-peer networks. Flooding and random walk (RW) are two typical search algorithms. Flooding searches aggressively and covers the most nodes. However, it generates a large number of query messages and, thus, does not scale. In contrast, RW searches conservatively: it generates only a fixed number of query messages at each hop but takes longer to search. We propose the dynamic search (DS) algorithm, which is a generalization of flooding and RW. DS takes advantage of the contexts under which each of these algorithms performs well: it resembles flooding for short-term search and RW for long-term search. Moreover, DS can be further combined with knowledge-based search mechanisms to improve search performance. We analyze the performance of DS using the following metrics: success rate, search time, query hits, query messages, query efficiency, and search efficiency. Numerical results show that DS provides a good tradeoff between search performance and cost. On average, DS performs about 25 times better than flooding and 58 times better than RW in power-law graphs, and about 186 times better than flooding and 120 times better than RW in bimodal topologies.
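The forwarding rule summarized above can be illustrated with a small simulation. This is a minimal sketch under our own assumptions, not the authors' simulator: the overlay is a plain adjacency list, a node forwards a query only the first time it receives it and never back to the neighbor it came from, and every node reached at hop n continues as one random walker, so the number of walkers equals the coverage at hop n (as in the analysis of Section 4.2). The function `dynamic_search` and its parameters are hypothetical names.

```python
import random
from typing import Dict, List, Optional, Tuple

Graph = Dict[int, List[int]]  # adjacency list: node -> neighbors


def dynamic_search(graph: Graph, source: int, n: int, p: float, ttl: int,
                   rng: random.Random) -> Tuple[int, int]:
    """Simulate one DS query; return (coverage, query_messages).

    Hops 1..n (MBFS phase): each node first reached at the previous hop
    forwards the query to every neighbor except its sender, independently
    with probability p (p = 1 is flooding). Hops n+1..ttl (RW phase): each
    node reached at hop n continues as one random walker.
    """
    visited = {source}
    # (node, node it was reached from); the source has no sender.
    frontier: List[Tuple[int, Optional[int]]] = [(source, None)]
    messages = 0
    for _ in range(min(n, ttl)):
        next_frontier: List[Tuple[int, Optional[int]]] = []
        for node, sender in frontier:
            for nb in graph[node]:
                if nb == sender or rng.random() >= p:
                    continue
                messages += 1
                if nb not in visited:  # duplicates are not forwarded again
                    visited.add(nb)
                    next_frontier.append((nb, node))
        frontier = next_frontier
    walkers = [node for node, _ in frontier]  # one walker per node reached at hop n
    for _ in range(max(0, ttl - n)):
        for idx, node in enumerate(walkers):
            if graph[node]:
                walkers[idx] = rng.choice(graph[node])
                messages += 1
                visited.add(walkers[idx])
    return len(visited) - 1, messages  # the source itself is not counted


if __name__ == "__main__":
    ring = {i: [(i - 1) % 10, (i + 1) % 10] for i in range(10)}
    print(dynamic_search(ring, 0, n=2, p=1.0, ttl=5, rng=random.Random(7)))
```

Setting n equal to ttl with p = 1 reproduces flooding, while n = 0 degenerates to a single random walker started at the source.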
1 INTRODUCTION
Based on the generating function, the average degree of a randomly chosen vertex is given by

$$z_1 = \langle k \rangle = \sum_{k=1}^{m} k\,p_k = G_0'(1). \qquad (2)$$

The average number of second neighbors is

$$z_2 = \left[\frac{d}{dx}\,G_0\big(G_1(x)\big)\right]_{x=1} = G_0'(1)\,G_1'(1), \qquad (3)$$

where $G_1(x)$ is given by

$$G_1(x) = \frac{G_0'(x)}{G_0'(1)}. \qquad (4)$$

Because operational P2P networks are difficult to measure and sample accurately, only limited real data about their topologies are available. In this paper, we use the two most common topologies, power-law graphs and bimodal topologies, to evaluate the search performance.

where $R$ is the replication ratio, and $C$ is the coverage. This formula shows that SR depends heavily on the coverage of the search algorithm. We use (8) to obtain an important performance metric, the search time (ST), in the following.

4.2.2 Search Time (ST)
To represent the capability of a search algorithm to find the queried resource in time with a given probability, we define the search time (ST) as the time needed to guarantee query success under the success rate requirement $SR_{req}$. ST is the hop count at which a search succeeds with this probabilistic guarantee. Using (8), ST is obtained when the coverage $C$ equals $\log_{(1-R)}(1 - SR_{req})$. For MBFS search algorithms, this occurs when

$$p\,G_0'(1) + p^2 G_0'(1)\,G_1'(1) + p^3 G_0'(1)\,G_1'(1)^2 + \cdots + p^{ST_{MBFS}}\,G_0'(1)\,G_1'(1)^{ST_{MBFS}-1} = \log_{(1-R)}(1 - SR_{req}). \qquad (9)$$
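Equations (2) to (4) and (9) can be evaluated directly from a degree distribution. The sketch below is illustrative only: it assumes that (8) has the form $SR = 1 - (1-R)^C$, which is consistent with the stated condition $C = \log_{(1-R)}(1 - SR_{req})$, and it takes ST as the smallest integer hop count at which the left-hand side of (9) reaches that coverage. The function names are hypothetical. For example, with $p_k = 0.5$ for $k = 2$ and $k = 4$, $G_0'(1) = 3$ and $G_1'(1) = 7/3$, so $z_2 = G_0'(1)\,G_1'(1) = 7$.

```python
import math
from typing import Dict, Optional, Tuple


def gf_slopes(pk: Dict[int, float]) -> Tuple[float, float]:
    """Return (G0'(1), G1'(1)) for a degree distribution {k: p_k}."""
    g0_prime = sum(k * prob for k, prob in pk.items())             # (2): z1 = <k>
    g0_second = sum(k * (k - 1) * prob for k, prob in pk.items())  # G0''(1)
    return g0_prime, g0_second / g0_prime                           # (4): G1'(1) = G0''(1) / G0'(1)


def required_coverage(R: float, sr_req: float) -> float:
    """Coverage C at which 1 - (1 - R)**C reaches sr_req, i.e. log_{1-R}(1 - sr_req)."""
    return math.log(1.0 - sr_req) / math.log(1.0 - R)


def search_time_mbfs(pk: Dict[int, float], p: float, R: float, sr_req: float,
                     max_hops: int = 64) -> Optional[int]:
    """Smallest hop count whose cumulative MBFS coverage (left side of (9))
    reaches the required coverage; None if max_hops is not enough."""
    g0_prime, g1_prime = gf_slopes(pk)
    target = required_coverage(R, sr_req)
    covered = 0.0
    for h in range(1, max_hops + 1):
        covered += p ** h * g0_prime * g1_prime ** (h - 1)  # h-th term of (9)
        if covered >= target:
            return h
    return None
```

Rounding up to a whole hop is our choice here; (9) itself is written as an equality over a continuous coverage value.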
We compare ST's for DS and RW with one walker. The improvement ratio is

$$\frac{ST_{RW} - ST_{DS}}{ST_{RW}} \cong 1 - \frac{G_1'(1)}{G_0'(1)\,\big(p\,G_1'(1)\big)^{n}}. \qquad (16)$$

In (16), the last term on the right significantly affects the performance improvement. It decreases exponentially with $n$, as $O\big((p\,G_1'(1))^{-n}\big)$, so the ST of DS relative to that of RW shrinks exponentially as $n$ grows. A larger $p$ also improves performance, but its effect is slow compared with that of $n$. In the extreme case, $n$ is set to TTL, i.e., DS performs as flooding or MBFS. In this case, ST is the shortest, but DS also generates a huge number of query messages. The tradeoff between search performance and cost should therefore be taken into consideration. In the following paragraphs, we analyze the number of query hits and the number of query messages, and then combine these metrics into the query efficiency and search efficiency.

4.2.3 Query Hits (QH)
The number of query hits depends heavily on the coverage, i.e., the total number of visited nodes. Assume that the queried resources are uniformly distributed in the network with replication ratio $R$, and that the coverage is $C$. The number of query hits is then $R \cdot C$. The coverage $C$ can be regarded as the sum of the coverage at each hop. Therefore, we first analyze the coverage $C_h$ at the $h$th hop. Let $V_h$ be the event that a vertex is visited at the $h$th hop, and let $P_i(V_h)$ be the probability that vertex $i$ is visited at the $h$th hop. When $h = 1$, $C_h$ is the expected number of vertices visited at the first hop. When $h > 1$, the calculation of $C_h$ must exclude vertices that have already been visited in previous hops. Therefore, the coverage $C_h$ at the $h$th hop can be written as

$$C_h = \begin{cases} \sum_{i=1}^{N} P_i(V_h), & h = 1,\\[4pt] \sum_{i=1}^{N} \left[\prod_{j=1}^{h-1} \big(1 - P_i(V_j)\big)\right] P_i(V_h), & h \ge 2, \end{cases} \qquad (17)$$

where $N$ is the total number of vertices in the network.

Next, we analyze the visiting probability $P_i(V_h)$ for flooding, MBFS, RW, and DS, respectively. First, we consider the flooding and MBFS cases. The visiting probability $P_i(V_h)$ of flooding or MBFS is

$$P_i(V_h) = \begin{cases} p\,p_i\,G_0'(1), & h = 1,\\[4pt] 1 - (1 - p\,p_i)^{G_1'(1)\,C_{h-1}}, & h \ge 2, \end{cases} \qquad (18)$$

where $p_i$ is the probability that vertex $i$ is reached by a given edge. Aiello et al. [39] show that $p_i$ can be written as

Then, the average number of candidates of RW at hop $h$ is

$$r_h = \sum_{i=1}^{N} P_i(R_h). \qquad (21)$$

Hence, the probability that vertex $i$ is visited at hop $h$ for RW is

$$P_i(V_h) = P_i(R_h)\left[1 - \left(1 - \frac{1}{r_h}\right)^{k}\right], \qquad (22)$$

where $k$ is the number of walkers.

The calculation of the visiting probability $P_i(V_h)$ for DS depends on the relation between $h$ and $n$. When $h \le n$, $P_i(V_h)$ is given by (18). When $h > n$, (20), (21), and (22) are used to obtain $P_i(V_h)$, where $k$ in (22) is set to $C_n$, i.e., the coverage at the $n$th hop. Therefore, the visiting probability $P_i(V_h)$ of DS is given by

$$P_i(V_h) = \begin{cases} p\,p_i\,G_0'(1), & h = 1,\\[4pt] 1 - (1 - p\,p_i)^{G_1'(1)\,C_{h-1}}, & 2 \le h \le n,\\[4pt] P_i(R_h)\left[1 - \left(1 - \frac{1}{r_h}\right)^{C_n}\right], & h > n. \end{cases} \qquad (23)$$

4.2.4 Query Messages (QM)
In the flooding and MBFS cases, the number of query messages $e_h$ generated at hop $h$ is given by

$$e_h = \begin{cases} p\,G_0'(1), & h = 1,\\[4pt] p\,G_1'(1)\,C_{h-1}, & h \ge 2. \end{cases} \qquad (24)$$

In the RW case, the number of query messages at each hop stays fixed at $k$, the number of walkers. Therefore, the total number of query messages for RW is $k \cdot TTL$.

The calculation of query messages for DS depends on $h$ and $n$. The number of query messages $e_h$ generated at hop $h$ for DS can be written as

$$e_h = \begin{cases} p\,G_0'(1), & h = 1,\\[4pt] p\,G_1'(1)\,C_{h-1}, & 2 \le h \le n,\\[4pt] C_n, & h > n. \end{cases} \qquad (25)$$

4.2.5 Query Efficiency (QE)
The number of query hits (QH) and the number of query messages (QM) are well-known performance metrics for evaluating search algorithms. Generally speaking, the objective of a search algorithm is to obtain the most query hits with the fewest query messages, but these two metrics often conflict with each other. Therefore, a more objective metric is required to evaluate search performance.
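Equations (17), (18), (24), and (25) can be evaluated numerically hop by hop. The sketch below is a minimal illustration under stated assumptions: the per-vertex edge-reach probabilities $p_i$ are supplied by the caller (the expression from Aiello et al. [39] is not reproduced in this section, so the test uses $p_i = k_i / \sum_j k_j$ as an assumed stand-in), and only the MBFS/flooding coverage is computed, which by (23) is also the DS coverage for $h \le n$. The DS coverage for $h > n$ would additionally need $P_i(R_h)$ from (20), which is not shown here. All function names are hypothetical.

```python
from typing import List, Sequence


def mbfs_coverage(p_i: Sequence[float], p: float, g0_prime: float,
                  g1_prime: float, hops: int) -> List[float]:
    """Per-hop coverage [C_1, ..., C_hops] from (17) with the MBFS/flooding
    visiting probabilities of (18). p_i[i] is the edge-reach probability of vertex i."""
    never_visited = [1.0] * len(p_i)  # prod_{j<h} (1 - P_i(V_j)) for each vertex
    coverage: List[float] = []
    for h in range(1, hops + 1):
        if h == 1:
            visit = [p * q * g0_prime for q in p_i]
        else:
            exponent = g1_prime * coverage[-1]  # G1'(1) * C_{h-1}
            visit = [1.0 - (1.0 - p * q) ** exponent for q in p_i]
        coverage.append(sum(s * v for s, v in zip(never_visited, visit)))
        never_visited = [s * (1.0 - v) for s, v in zip(never_visited, visit)]
    return coverage


def ds_messages(coverage: Sequence[float], p: float, g0_prime: float,
                g1_prime: float, n: int, ttl: int) -> List[float]:
    """Per-hop query messages [e_1, ..., e_ttl] for DS following (25).
    coverage[h-1] holds C_h; with n >= ttl this is the MBFS case (24)."""
    messages: List[float] = []
    for h in range(1, ttl + 1):
        if h == 1:
            messages.append(p * g0_prime)
        elif h <= n:
            messages.append(p * g1_prime * coverage[h - 2])  # uses C_{h-1}
        else:
            messages.append(coverage[n - 1])  # C_n walkers, one message each
    return messages


def rw_total_messages(k: int, ttl: int) -> int:
    """RW sends k messages per hop, so k * TTL in total."""
    return k * ttl
```

For a 4-regular graph of 1,000 vertices with $p = 1$, the first hop covers $G_0'(1) = 4$ vertices and the second slightly fewer than $G_0'(1)\,G_1'(1) = 12$, because (17) discounts vertices that were already visited.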
LIN ET AL.: DYNAMIC SEARCH ALGORITHM IN UNSTRUCTURED PEER-TO-PEER NETWORKS 659
We adopt the performance metrics proposed in [15], query efficiency (QE) and search efficiency (SE), which consider both search performance and cost. A similar criterion can also be found in [9]. First, we calculate QE. In [15], QE is defined as

$$QE = \frac{\sum_{h=1}^{TTL} QH(h)}{QM} \cdot \frac{1}{R}, \qquad (26)$$

where $QH(h)$ is the number of query hits at the $h$th hop, $QM$ is the total number of query messages generated during the query, and $R$ is the replication ratio of the queried object. Since a search that obtains hits faster delivers a better user experience and should be rated higher, we modify (26) and consider two types of QE. $QE_1$ is calculated as in (26), and $QE_2$ penalizes search results coming from far away, i.e.,

$$QE_2 = \frac{\sum_{h=1}^{TTL} QH(h)/h}{QM} \cdot \frac{1}{R}. \qquad (27)$$

4.2.6 Search Efficiency (SE)
The search efficiency (SE) is proposed as a unified performance metric for search algorithms [15]. A similar criterion can be found in [9]. Whereas the query efficiency QE does not consider the success rate SR, SE is defined as

$$SE = \frac{\sum_{h=1}^{TTL} \big(QH(h)/h\big) \cdot SR}{QM} \cdot \frac{1}{R}, \qquad (28)$$

where $QH(h)/h$ is the number of query hits at the $h$th hop weighted by the hop count, $QM$ is the total number of query messages generated during the query, $SR$ is the probability that the query is successful (i.e., there is at least one query hit), and $R$ is the replication ratio of the queried object. Thus, the success rate SR is taken into consideration. Assume that the object is uniformly distributed in the network. Then, the number of query hits at the $h$th hop equals the coverage at the $h$th hop multiplied by the replication ratio $R$. Therefore, (28) can be written as

$$SE = \frac{\left(\sum_{h=1}^{TTL} \frac{C_h R}{h}\right)\left[1 - (1 - R)^{\sum_{h=1}^{TTL} C_h}\right]}{\sum_{h=1}^{TTL} e_h} \cdot \frac{1}{R}, \qquad (29)$$

where $C_h$ is the coverage at the $h$th hop, $e_h$ is the number of query messages generated at the $h$th hop, and $R$ is the replication ratio. We consider two types of SE. $SE_1$ does not penalize search results coming from far away, i.e.,

$$SE_1 = \frac{\left(\sum_{h=1}^{TTL} C_h R\right)\left[1 - (1 - R)^{\sum_{h=1}^{TTL} C_h}\right]}{\sum_{h=1}^{TTL} e_h} \cdot \frac{1}{R}, \qquad (30)$$

and $SE_2$ is calculated as in (29).

4.3 Experimental Environment
We construct the experimental environment to evaluate the performance of the knowledge-based DS algorithm. For the network topology, we model the P2P network as Gnutella to provide a network context in which peers can perform their intended activities. The measurements in [17] and [20] suggest that the topology of the Gnutella network has a two-segment power-law link distribution. Thus, we construct a P2P network of 100,000 peers in our simulator, in which the link distribution follows the reported two-segment power law. We set the first power-law slope to 0.2316 and the second to 1.1373, which are similar to those used in [17]. The resulting topology in our simulator has a maximum link degree of 632, a mean of 11.73, and a standard deviation of 17.09. Once the node (peer) degrees are chosen, we connect the peers randomly and ensure that every peer is connected properly (each peer has at least one link).

For the object distribution of the network, we assume there are 100 distinct objects with a replication ratio of R = 1 percent; in total, there are 100,000 objects in the network (100 objects, each held by 1 percent of the 100,000 peers). The distribution of the 100,000 objects over the network follows the measurement characteristics reported in [21]. In addition, because of the dynamic environment described in the following section, in which peers join and leave dynamically, the total number of objects available in the network fluctuates with the network size (number of online peers), but the replication ratio remains roughly constant.

Our dynamic peer behavior modeling largely follows the idea of the peer cycle [18], which includes joining, querying, idling, leaving, and joining again to form a cycle. The joining and leaving operations of peers (including idling) are inferred and then modeled by the uptime and session duration distributions measured in [21] and [22]. These measurement studies show similar results for the peer uptime distribution: half of the peers have an uptime percentage below 10 percent, and the best 20 percent of peers have 45 percent uptime or more. We use the log-quadratic distribution suggested in [22] to rebuild the uptime distribution, which is plotted in Fig. 4. For the session duration distribution, however, the two studies lead to different results: the median session time is about 15 minutes in [22] but 60 minutes in [21]. In our model, we choose a median session duration of 20 minutes.

From these two rebuilt distributions, we generate a probability model that decides when a peer should join or leave the network and how long it should remain online. The basic rule for assigning peers' attributes is that peers with higher link degrees are assigned higher uptime percentages and longer session durations, and vice versa. Under these conditions, we map a 2-hour-long dynamic join/leave pattern for peers. On average, 10 peers join or leave simultaneously. Since the mean of the uptime distribution is about 18 percent, the resulting average number of online peers is 18,152. The maximum number of online nodes is 24,218, and the minimum is 4,886.

We model query generation as a Poisson process with a mean idle time of 50 minutes; that is, each peer initiates a search every 50 minutes on average. Since there is no direct measurement of the idle time, we use an empirical value. Our search performance evaluation is insensitive to the choice of this parameter.
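The efficiency metrics (26), (27), (29), and (30) reduce to a few sums. This sketch assumes that per-hop query hits (for QE) or per-hop coverage $C_h$ and messages $e_h$ (for SE) are already available, for example from a simulator or from (17) and (25); the function names and the `penalize_distance` flag are our own illustrative choices.

```python
from typing import Sequence


def query_efficiency(hits: Sequence[float], qm: float, R: float,
                     penalize_distance: bool = False) -> float:
    """QE_1 as in (26), or QE_2 as in (27) when penalize_distance is True.
    hits[h-1] is QH(h); qm is the total number of query messages."""
    weighted = sum(qh / h if penalize_distance else qh
                   for h, qh in enumerate(hits, start=1))
    return weighted / qm / R


def search_efficiency(coverage: Sequence[float], messages: Sequence[float],
                      R: float, penalize_distance: bool = True) -> float:
    """SE_2 as in (29) by default, or SE_1 as in (30).
    coverage[h-1] is C_h and messages[h-1] is e_h."""
    success_rate = 1.0 - (1.0 - R) ** sum(coverage)  # SR for uniformly placed replicas
    hits = sum(c * R / h if penalize_distance else c * R
               for h, c in enumerate(coverage, start=1))
    return hits * success_rate / sum(messages) / R
```

The only difference between the two variants of each metric is the $1/h$ weight, which rewards hits found closer to the requester.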
660 IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, VOL. 20, NO. 5, MAY 2009
Fig. 7. ST versus SR requirement. R is set to 0.01 in this case. The number of walkers K for RW is set to 1 and 32, respectively. DS is evaluated with n = 2, 3, and 7, and p is set to 1. TTL is set to 7 in this case, so DS with n = 7 is equivalent to flooding.

Fig. 8. Search efficiency for different numbers of nodes N in the network. This figure shows the scalability of the DS algorithm.
flooding, while it does not generate as many query messages.

In summary, DS with n = 2 and p = 1 obtains the best SE and significantly improves ST in this case. When n is increased to 3, SE degrades slightly, but the shortest ST is obtained.

5.1.3 Comparison with Other Advanced Search Algorithms
We also compare the performance of DS with that of other advanced search algorithms, including Hybrid Search [12] and Expanding Ring [17]. The number of nodes N is set to 10,000, the power-law exponent to 2.1, and the replication ratio R to 0.01. Fig. 4 shows the SE's of these search algorithms. The SE of Hybrid Search is similar to that of RW; both increase slowly with the hop count. The SE of Expanding Ring is similar to, but slightly worse than, that of flooding. This is because Expanding Ring revisits nodes it has already visited and thus generates redundant messages. The SE of DS is better than those of Hybrid Search and Expanding Ring for all hop counts.

Fig. 7 shows the ST's of these search algorithms. The operation of Hybrid Search is similar to that of RW with K = G_1'(1). Under our simulation parameters, G_1'(1) is roughly 16. Thus, the ST of Hybrid Search is better than that of RW(1) but worse than that of RW(32). The ST of Expanding Ring is almost one hop worse than that of flooding: when flooding reaches the second neighbors at the second hop, Expanding Ring merely revisits the first neighbors, so coverage does not increase. For SR requirements smaller than 0.7, the ST of DS(2) is shorter than that of Expanding Ring, whereas for SR requirements larger than 0.7 it is longer.

5.1.4 Scalability
To validate the scalability of our DS algorithm, we show the search efficiency for different numbers of nodes in Fig. 8. The number of nodes N is set to 10,000, 50,000, 100,000, and 500,000, the replication ratio R to 0.01, and TTL to 7. The figure shows that our DS algorithm always performs better than flooding and RW, regardless of the number of nodes.

5.1.5 Performance under Various Network Topologies and Replication Ratios
Tables 2 and 3 show the search performance under power-law random graphs and bimodal topologies, respectively. The replication ratio R is set to 0.01 percent, 0.1 percent, and 1 percent. The tables list the success rate (SR), search time (ST), number of query hits (QH), number of query messages (QM), query efficiency (QE), and search efficiency (SE). Two types of QE and SE are shown: those without the penalty for search results that come from far away (QE_1 and SE_1), and those with the penalty (QE_2 and SE_2), as described in Section 4.2. For QE_1 and QE_2, RW performs the best because it covers the fewest redundant nodes. Although RW generates the fewest query messages, its SR, ST, QH, and resulting SE are poor. In most cases, DS performs close to flooding in terms of SR and ST without generating as many query messages as flooding does. In summary, DS obtains satisfactory performance regardless of the number of nodes, the replication ratio, and the network topology. On average, it performs about 25 times better than flooding and 58 times better than RW in power-law graphs, and about 186 times better than flooding and 120 times better than RW in bimodal topologies.

TABLE 2a
Performance of Flooding in Power-Law Graphs
TABLE 2b
Performance of RW in Power-Law Graphs

5.2 Performance of Knowledge-Based Dynamic Search
In this section, we evaluate the search performance in a network where every node can build knowledge with respect to the target through some learning mechanism. Any forwarding mechanism can improve search performance by leveraging this knowledge. For example, APS [27] uses an adaptive probability learning mechanism and adopts RW as the forwarding mechanism. Other forwarding mechanisms, e.g., MBFS or our dynamic forwarding, are also applicable to this learning mechanism. To evaluate search performance, we adopt the APS learning mechanism to build the knowledge. APS learning builds a probability table for each neighbor and each object. When a query for a certain object forwarded to a certain neighbor succeeds, the relative probability (or weight) of the entry for that neighbor and that object is increased; otherwise, it is decreased. Since flooding forwards messages to all neighbors, the learning mechanism is useless for it, so we do not evaluate flooding here. For MBFS with APS learning, the transmission probability p is set to 0.2, which is chosen to keep the same number of query messages as the other search algorithms. The initial number of walkers for APS is 10, the same as in [27].

The experimental results for different search algorithms with the knowledge building mechanism are shown in Fig. 9. With the APS knowledge building mechanism, all search algorithms perform much better than they do without it.
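The APS-style knowledge table described above can be sketched as follows. This is a simplified illustration, not the APS algorithm of [27]: we assume additive weight updates with a lower floor, and the values of `initial_weight`, `step`, and `floor` (like the class name `ApsStyleTable`) are hypothetical choices. Forwarding picks one neighbor with probability proportional to its weight for the queried object, as an RW walker would; pairing the table with MBFS or DS forwarding is not shown.

```python
import random
from collections import defaultdict
from typing import DefaultDict, Hashable, List, Tuple


class ApsStyleTable:
    """Per-(neighbor, object) weights updated from query outcomes.

    Hypothetical parameters: initial_weight, step, and floor are illustrative
    values, not taken from the paper or from APS [27].
    """

    def __init__(self, neighbors: List[Hashable], initial_weight: float = 30.0,
                 step: float = 10.0, floor: float = 1.0) -> None:
        self.neighbors = list(neighbors)
        self.step = step
        self.floor = floor
        # Unseen (neighbor, object) entries start at initial_weight.
        self.weights: DefaultDict[Tuple[Hashable, Hashable], float] = defaultdict(
            lambda: initial_weight)

    def probability(self, neighbor: Hashable, obj: Hashable) -> float:
        """Relative probability of forwarding a query for obj to neighbor."""
        total = sum(self.weights[(nb, obj)] for nb in self.neighbors)
        return self.weights[(neighbor, obj)] / total

    def choose(self, obj: Hashable, rng: random.Random) -> Hashable:
        """Pick one neighbor with probability proportional to its weight."""
        w = [self.weights[(nb, obj)] for nb in self.neighbors]
        return rng.choices(self.neighbors, weights=w, k=1)[0]

    def update(self, neighbor: Hashable, obj: Hashable, succeeded: bool) -> None:
        """Increase the entry after a successful forward, decrease it otherwise."""
        key = (neighbor, obj)
        if succeeded:
            self.weights[key] += self.step
        else:
            self.weights[key] = max(self.floor, self.weights[key] - self.step)
```

Keeping a floor above zero means a neighbor that failed repeatedly can still be chosen occasionally, so the table can recover if the object later becomes reachable through it.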
TABLE 2c
Performance of DS in Power-Law Graphs
TABLE 3a
Performance of Flooding in Bimodal Topologies
TABLE 3b
Performance of RW in Bimodal Topologies
TABLE 3c
Performance of DS in Bimodal Topologies
performs well. It resembles flooding or MBFS for short-term search and RW for long-term search. We analyze the performance of DS based on several metrics, including the success rate, search time, number of query hits, number of query messages, query efficiency, and search efficiency. Numerical results show that a proper setting of the parameters of DS can achieve a short search time and provide a good tradeoff between search performance and cost. Under different contexts, DS always performs well. When combined with knowledge-based search algorithms, its search performance can be further improved.
Tsungnan Lin received the BS degree in electrical engineering from the National Taiwan University, Taipei, in 1989 and the MA and PhD degrees in electrical engineering from Princeton University, Princeton, New Jersey, in 1993 and 1996, respectively. He was a teaching assistant in the Department of Electrical Engineering, Princeton University, from 1991 to 1992. From 1992 to 1996, he was a research assistant with NEC Research Institute. He has been with EPSON R&D and EMC. Since February 2002, he has been with the Department of Electrical Engineering and the Graduate Institute of Communication Engineering, National Taiwan University. He is a member of Phi Tau Phi Scholastic Honor Society and a senior member of the IEEE.

Pochiang Lin received the BS degree in communication engineering from the National Chiao Tung University, Hsinchu, Taiwan, in 1996 and the MS degree in communication engineering from the National Taiwan University, Taipei, in 2005. He is currently working toward the PhD degree in the Graduate Institute of Communication Engineering, National Taiwan University. His current research interests are in wireless multimedia networking including handoff design, performance optimization, and QoS issues. He is a student member of the IEEE.

Hsinping Wang received the BS degree in electrical engineering and the MS degree in communication engineering from the National Taiwan University, Taipei, in 2002 and 2004, respectively. He is currently with the Graduate Institute of Communication Engineering, National Taiwan University. He is also starting up a social enterprise, called MEntoring, peering, to Opportunity (MEPO) Humanity Technology, initiating a social campaign toward collective deconfusions through Web 2.0 mentoring and peering while collecting goodwills in Taiwan, bearing the duty and attempting to strike a balance between commercial entity and nonprofit charity.

Chiahung Chen received the BS degree in electrical engineering from the National Taiwan University, Taipei, in 2005. He is currently working toward the MS degree in the Department of Electrical Engineering, National Taiwan University. His main research interests include Internet topology and hardware acceleration of network protocol.