Bees Swarm Optimization Based Approach For Web Information Retrieval
1. Introduction
With the exponentially growing amount of information on the web, the classic search process lacks efficiency. Innovative tools for information retrieval (IR) have become necessary to cope with the complexity induced by this tremendous volume of information. Many research directions contribute to handling the complexity of the problem; distributed information retrieval and personalized information source selection are two examples of these research axes. Recent works consider user and source profiles in order to restrict the search to the sources whose profile matches the user's [6,7]. In this manner, a lot of information is pruned and therefore the response time of such systems becomes short.
The weight of a term in a document is computed using the expression tf*idf, where tf is the term frequency in the document and idf is the inverse document frequency, usually computed as idf = log(m/df), where m is the number of documents and df is the number of documents that contain the term. The component tf indicates the importance of the term for the document, while idf expresses the discriminating power of this term. In this way, a term with a high tf*idf value is at the same time important in the document and infrequent in the others. The weight for a query is computed in the same manner. The similarity of a document d and a query q is then computed using the Cosine formula:

f(d, q) = Σi (ai * bi) / (Σi ai² * Σi bi²)^(1/2)    (Cosine)

where ai and bi are the weights of term ti in the document and in the query respectively.
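As a concrete illustration, the tf*idf weighting and Cosine similarity above can be sketched in a few lines; the sparse-dictionary representation of documents is our own choice for the sketch, not prescribed by the text:

```python
import math

def tfidf_weights(doc_tf, df, m):
    """Weight each term by tf * idf, with idf = log(m / df) as in the text.

    doc_tf: term -> frequency in the document; df: term -> document frequency;
    m: number of documents in the collection."""
    return {t: tf * math.log(m / df[t]) for t, tf in doc_tf.items()}

def cosine(a, b):
    """Cosine similarity between two sparse weight vectors (dicts)."""
    num = sum(a[t] * b[t] for t in a if t in b)
    den = (math.sqrt(sum(w * w for w in a.values()))
           * math.sqrt(sum(w * w for w in b.values())))
    return num / den if den else 0.0
```

A query is scored against a document by weighting both with `tfidf_weights` and taking their `cosine`.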
2.2. Discussion
For information retrieval, the landscape of the search space is not homogeneous: similar documents may exist in a region that is hard to access because it is surrounded by barren regions. In other words, the promising region can be isolated like an oasis. This situation does not arise for all instances but can happen, especially when the number of terms greatly exceeds the number of documents. Since in the real world the terms we use are well defined and stored in a dictionary, while documents can be created in unlimited quantity, the number of documents determines the shape of the search space. Two kinds of situation therefore appear. When the number of documents is small, the landscape of the search space cannot be handled easily by heuristic search techniques, and an exact approach is better suited. On the contrary, when the number of documents is huge, the search space is more compact and heuristic search can then perform very well. Heuristic search methods can even help distributed information retrieval systems, where user and source profiles are used to direct the search, to speed up their response time. This observation led us to construct corpuses from CACM, which is a small collection, to which we added a considerable number of documents in order to test our approach.
3. BSO meta-heuristic

In 1946, Karl von Frisch, while decoding the language of bees, observed that it is through the dance that a bee, upon its return to the hive, communicates to its fellows the distance, the direction and the wealth of a food source. Bees of a same colony visit more than a dozen potential exploitation areas, but the colony concentrates its harvesting effort on a small number of them, the richest and the easiest to access. In addition, numerous observations show that a colony can quickly shift its exploitation from one source to another. In an experiment conducted in 1991, Seeley, Camazine and Sneyd showed that when a colony of bees can choose between two food sources whose sugar concentrations are very unequal, situated diametrically opposite with respect to the hive, one to the north and the other to the south, the colony moves towards the richer one to concentrate its harvesting effort there. In that phenomenon, the swarm follows the bee that performs the most vigorous dance, therefore the one that indicates the place of the richest source of food [3].

The meta-heuristic Bees Swarm Optimization is inspired by the collective bee behaviour described above. It handles artificial bees that imitate the feeding and working style of real bees when solving problems. First, a bee named BeeInit settles down to find a solution with good features, called Sref, from which the other solutions of the search space are determined via a certain strategy. The set of these solutions is called SearchArea. Then, every bee takes a solution from SearchArea as its starting point in the search. After accomplishing its search, every bee communicates the best solution it visited to its fellows through a structure named Dance. One of the solutions of this list becomes the new reference solution for the next iteration of the process. In order to avoid cycles, the reference solution is stored each time in a taboo list. The reference solution is chosen first according to a quality criterion. However, if after a period of time the swarm observes that the solution does not progress in quality, it integrates a second criterion of diversity that allows it to escape from the region where it is possibly trapped. The BSO algorithm is therefore outlined as follows:

begin
  let Sref be the solution found by BeeInit;
  while (MaxIter not reached) do
    insert Sref in a taboo list;
    determine SearchArea from Sref;
    assign a solution of SearchArea to each bee;
    for each Bee k do
      search starting with the assigned solution;
      store the result in Dance;
    endfor;
    compute the new solution of reference Sref;
  endwhile;
end;
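As an illustration only, the general BSO loop might be sketched as follows in Python. The fitness function, neighbourhood generator and local search are placeholders supplied by the caller; the encoding and default parameter values are our own assumptions, not the paper's exact implementation:

```python
import random

def bso(fitness, random_solution, neighbourhood, local_search,
        n_bees=10, max_iter=50, seed=0):
    """Generic BSO skeleton: reference solution, search area, dance table."""
    rng = random.Random(seed)
    sref = random_solution(rng)          # plays the role of BeeInit
    best = sref
    taboo = []
    for _ in range(max_iter):
        taboo.append(sref)
        area = neighbourhood(sref, n_bees, rng)       # SearchArea from Sref
        dance = [local_search(s, rng) for s in area]  # each bee searches
        # next reference solution: best danced solution not in the taboo list
        candidates = [s for s in dance if s not in taboo] or dance
        sref = max(candidates, key=fitness)
        if fitness(sref) > fitness(best):
            best = sref
    return best
```

The quality/diversity switch described above is omitted here for brevity; this skeleton selects on quality alone.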
4. BSO-IR Algorithm
In this section, we present the bee swarm optimization algorithm called BSO-IR designed for information retrieval. The adaptation of the meta-heuristic to IR requires the design of the following components: the artificial world where the bees live, the fitness function that evaluates solutions, the initial solution Sref, the strategies to determine the set of solutions SearchArea from Sref, the search procedure performed by each artificial bee, the quality and the diversity strategies and the choice
rules of the reference solution Sref allowing the iteration of the process. Let us first start with the description of the problem modelling.
number implies that Sref is probably close to the local optimum of the new exploitation region, so the probability of an improvement is very weak. On the other hand, if this value is too large, the swarm will move away from the region containing Sref, with the risk of losing good solutions. To perform the changes, we propose two strategies ensuring that the obtained solutions are as distinct as possible. If the number of generated solutions proves to be insufficient, a random technique is used to complete the rest.

The first strategy. The solution s is generated by flipping terms ti of Sref as follows:

begin
  h = 0;
  while size of SearchArea not reached and h < Flip do
    s = Sref; p = 0;
    repeat
      if the term Flip*p+h exists in s then remove it from s
      else insert it in s;
      p = p+1;
    until Flip*p+h > k;
    SearchArea = SearchArea ∪ {s}; (* set of all solutions s *)
    h = h+1;
  endwhile
end

The second strategy. Here we consider Sref as a set of contiguous packets of terms. The solution s is generated by changing the terms ti of one packet of Sref, that is:

begin
  h = 0;
  while size of SearchArea not reached and h < Flip do
    s = Sref; p = 0;
    repeat
      if the term (k/Flip)*h+p exists in s then remove it from s
      else insert it in s;
      p = p+1;
    until p >= k/Flip;
    SearchArea = SearchArea ∪ {s};
    h = h+1;
  endwhile;
end;

Let k = 20 be the number of terms and Flip = 5. If the terms are subscripted from 1 to 20, then the first strategy flips the terms (1,6,11,16), (2,7,12,17), (3,8,13,18), (4,9,14,19) and (5,10,15,20), while in the second strategy the following terms are inverted: (1,2,3,4), (5,6,7,8), (9,10,11,12), (13,14,15,16) and (17,18,19,20). Flipping or inverting a term of the solution s consists in removing it from s if it is present, or adding it otherwise.
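Assuming k is divisible by Flip, the index patterns produced by the two strategies can be reproduced with a short sketch; the 1-based subscripts follow the worked example above:

```python
def flip_strategy_1(k, flip):
    """Strategy 1: solution h flips positions h, h+flip, h+2*flip, ... (1-based)."""
    return [[flip * p + h for p in range(k // flip)] for h in range(1, flip + 1)]

def flip_strategy_2(k, flip):
    """Strategy 2: solution h flips one contiguous packet of k/flip terms."""
    size = k // flip
    return [list(range(h * size + 1, (h + 1) * size + 1)) for h in range(flip)]
```

With k = 20 and Flip = 5, strategy 1 yields the interleaved groups (1,6,11,16) ... (5,10,15,20) and strategy 2 the contiguous packets (1,2,3,4) ... (17,18,19,20), as in the text.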
    endif;
  endif;
end;

Remarks. (Sref is better in quality) is equivalent to (f(Sref) = Max f(s)), where s belongs to Dance and not to the taboo list. (Sref is better in diversity) is equivalent to (diversity(Sref) = Max diversity(s)), where s belongs to Dance. If two solutions s1 and s2 are equal in quality, that is, if they have the same value of the objective function, then the one with the larger degree of diversity is preferred. In the same way, if two solutions s1 and s2 present the same degree of diversity, the one that improves the fitness function is chosen. It can happen, although very rarely, that all solutions of Dance exist in the taboo list; to palliate this problem, the reference solution is then generated at random. maxchances is an empirical parameter designating the maximum number of chances accorded to create a search region SearchArea.
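A minimal sketch of this selection rule, with fitness and diversity supplied as caller-provided functions; as a simplification, the random fallback here simply re-draws from Dance rather than generating a fresh random solution as the text describes:

```python
import random

def choose_sref(dance, taboo, fitness, diversity, rng=None):
    """Pick the next reference solution from the Dance table.

    Quality decides first; diversity breaks ties. If every danced
    solution is already taboo (a rare case, per the text), fall back
    to a random pick from Dance."""
    rng = rng or random.Random()
    candidates = [s for s in dance if s not in taboo]
    if not candidates:
        return rng.choice(dance)   # simplified stand-in for random generation
    return max(candidates, key=lambda s: (fitness(s), diversity(s)))
```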
4.6. The bee search process

The bee search process is iterative; the number of iterations, MaxIter, is an empirical parameter. The process consists of two phases:
- a simple local search;
- an improvement technique that flips as many terms as possible of the solution found in the first phase, so that the fitness of the new solution is greater than or equal to it.

5. Experimental Results

In order to test the performance of the designed algorithm, we performed a series of extensive experiments. The first consists in setting the empirical parameters that yield high solution quality, such as the bee colony size, the maximum number of iterations and the maximum number of changes in the improvement procedure. A second step tests the performance of the algorithm. The algorithms were implemented in C# on a personal computer. The tested collections are the well-known CACM and RCV1, plus large-scale collections generated from CACM by the following process. CACM has 3204 documents over 6468 terms, whereas RCV1 possesses 804 414 documents over 47 236 terms.
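The improvement phase can be illustrated as follows, assuming the binary term-vector encoding used earlier; the left-to-right sweep order is our own assumption:

```python
def improve(solution, fitness):
    """Second phase of the bee search: flip each term in turn, keeping
    every flip that does not degrade the fitness, so the result is at
    least as good as the input solution."""
    current = list(solution)
    for i in range(len(current)):
        candidate = list(current)
        candidate[i] ^= 1                 # flip term i in the binary encoding
        if fitness(candidate) >= fitness(current):
            current = candidate
    return current
```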
close similarity. In the second step, new documents are generated by concatenating the documents of each cluster in all possible ways, that is, by 2, by 3, and so on until concatenating all the documents together. The concatenation of documents is performed by merely merging their respective indexes.

5.1.1. Clustering of documents

The algorithm that clusters the documents of a collection on the basis of their similarity is a BSO-based algorithm. It is designed within the same framework as the one described previously for information retrieval. The main difference is threefold: first, the classes are initially created by a diversification generator of documents; second, BSO is called for each class to fill the class with documents; third, each time an artificial bee finds a good solution it inserts it in a dynamic list representing the cluster. This list, managed as a FIFO (First In First Out) queue, is sorted according to the similarity of the documents, and its size is restricted to r, the number of documents per cluster. With this constraint, once the cluster is full, only documents whose similarity is greater than that of the element located at the end of the queue are inserted. The algorithm is outlined as follows:

Input: a collection C of m documents
Output: p clusters of r documents each
begin
  C' = C;
  initialize p queues to empty;
  determine p diversified documents from Sref; (* same process as SearchArea determination *)
  assign a document to each group;
  for each group p do
    create SearchArea of size equal to k;
    assign a solution to each bee;
    for each Bee k do
      search starting with the assigned solution;
      let s be the result of the search;
      store s in Dance list;
    endfor;
    if f(s, p) > f(queue.end, p) then
      if queue full then remove queue.end; endif;
      insert s in queue;
    endif;
  endfor;
end;
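The bounded, similarity-sorted queue at the heart of this clustering step can be sketched as follows; the class and method names are ours:

```python
import bisect

class Cluster:
    """Fixed-capacity cluster kept sorted by similarity, best first.

    Once full, a document enters only if its similarity beats that of
    the element at the end of the queue, which is then evicted."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.docs = []          # list of (similarity, doc), best first

    def offer(self, doc, similarity):
        if len(self.docs) == self.capacity and similarity <= self.docs[-1][0]:
            return False
        # insert while keeping descending order of similarity
        keys = [-s for s, _ in self.docs]
        i = bisect.bisect_left(keys, -similarity)
        self.docs.insert(i, (similarity, doc))
        if len(self.docs) > self.capacity:
            self.docs.pop()     # evict the least similar document
        return True
```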
5.1.2. Corpus generation

The second phase deals with the construction of the benchmark. The idea is to create documents from similar documents belonging to the same class. The algorithm is as follows:

Input: p clusters of r documents each
Output: a collection of documents
begin
  for each cluster p do
    for each document d of the cluster do
      merge d with the other documents of class p, 2 by 2, then 3 by 3, and so on;
      suppress d from the cluster p;
    endfor
  endfor
end;

The CACM collection, called CACM1 in Table I, was transformed into four collections called CACM2, CACM3, CACM4 and CACM5, as shown in Table I.
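Assuming a document is represented by its index, i.e. a term-to-frequency map, the merging step can be sketched as:

```python
from itertools import combinations

def generate_corpus(cluster):
    """Merge a cluster's documents in all combinations of size 2, 3, ...,
    up to the whole cluster. Merging two indexes adds their term frequencies."""
    new_docs = []
    for size in range(2, len(cluster) + 1):
        for group in combinations(cluster, size):
            merged = {}
            for doc in group:
                for term, tf in doc.items():
                    merged[term] = merged.get(term, 0) + tf
            new_docs.append(merged)
    return new_docs
```

A cluster of r documents thus yields 2^r - r - 1 new documents, which explains the explosive growth reported in Table I.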
TABLE I. GENERATED COLLECTIONS

Collection     CACM1   CACM2    CACM3     CACM4       CACM5
Cluster size   1       5        10        15          20
#doc           3 204   19 844   327 364   6 979 380   167 772 004
Figure 1 shows that the size of the generated corpuses grows exponentially with the cluster size. The aim of producing large-scale collections is thus achieved. In the following we describe the experimental study undertaken to demonstrate the efficiency of BSO on such tremendous corpuses.
The advantage of the evolutionary approach lies in the response time, where it shows its superiority over the exact algorithm. The time factor is very important since information retrieval is performed online and thus requires fast reactivity from the system.
Figure 3. Comparison of BSO-IR and the exact algorithm performances for CACM1
TABLE II. (caption and column headers not recovered)

CACM1   30   5   40   30
CACM5   50   8   70   55
Figure 4. Comparison of BSO-IR and the exact algorithm performances for CACM5
Table III exhibits the similarity and running-time values of the exact and BSO algorithms on the RCV1 collection, for four queries. Both algorithms achieve almost the same similarity, while BSO is faster than the exact algorithm. Another experiment shows this gap to be less significant for CACM, because of its smaller size. Figure 5 and Figure 6 show the runtime of both algorithms when processing CACM1 and CACM5 respectively. Note that the time increases exponentially for the classic algorithm whereas it grows almost linearly for BSO-IR.
TABLE III. COMPARISON BETWEEN EXACT AND BSO RUNTIME FOR RCV1 COLLECTION
Algorithm   exact            BSO
Query       1   2   3   4    1   2   3   4
6. Conclusions

In this paper, a bee swarm optimization algorithm named BSO-IR has been designed for information retrieval. The aim of this study is the adaptation of heuristic search techniques to large-scale IR and their comparison with classical approaches. Experimental tests have been conducted on the well-known CACM and RCV1 collections, and also on very large benchmarks generated from CACM for test purposes. The approach designed to construct the large collections is original and makes it possible to increase the scale of any collection. Through the undertaken experiments we have observed that, concerning solution quality, the exact algorithm achieved slightly better results than BSO-IR for the small collection, while for large-scale corpuses both algorithms are comparably effective. In terms of running time, however, BSO-IR exceeded the performance of the exact algorithm. We can therefore conclude that BSO-IR is better suited to large-scale information retrieval than the classical IR method. As future work, we plan to hybridize meta-heuristics with distributed information retrieval approaches to better address web information retrieval. Another intent is to design and develop a meta-optimization generator that automatically sets the empirical parameters for any meta-heuristic in general, and for BSO-IR in particular, in order to improve its performance. Manual tuning of parameters is a hard and tedious task, and may not reach optimality in spite of the extensive and numerous experiments one can perform.

References

[1] A. A. R. Ahmed, A. A. L. Bahgat, A. A. Abdel Mgeid and A. S. Osman, Using Genetic Algorithm to Improve Information Retrieval Systems, World Academy of Science, Engineering and Technology WASET06, vol. 17, pp. 6-12, 2006.
[2] R. Baeza-Yates, B. Ribeiro-Neto, Modern Information Retrieval, Addison Wesley Longman Publishing Co. Inc., 1999.
[3] E. Bonabeau, M. Dorigo, G. Theraulaz, Swarm Intelligence: From Natural to Artificial Systems, Oxford University Press, 1999.
[4] H. Drias, S. Sadeg, S. Yahi, Cooperative Bees Swarm for Solving the Maximum Weighted Satisfiability Problem, IWANN 2005, pp. 318-325.
[5] C. Hsinchun, Machine Learning for Information Retrieval: Neural Networks, Symbolic Learning and Genetic Algorithms, Journal of the American Society for Information Science, 46, 3, pp. 194-216, 1995.
[6] S. Kechid, H. Drias, Personalizing the Source Selection and the Result Merging Process, International Journal on Artificial Intelligence Tools, 18(2), pp. 331-354, 2009.
[7] S. Kechid, H. Drias, Multi-agent System for Personalizing Information Source Selection, Web Intelligence 2009, pp. 588-595.
[8] P. Kromer, V. Snasel, J. Platos, A. Abraham, Implicit User Modelling Using Hybrid Meta-Heuristics, IEEE HIS, pp. 42-47, 2008.
[9] C. D. Manning, P. Raghavan, H. Schutze, Introduction to Information Retrieval, Cambridge University Press, 2008.
[10] P. Pathak, M. Gordon and W. Fan, Effective Information Retrieval using Genetic Algorithms based Matching Functions Adaptation, 33rd IEEE HICSS, 2000.
[11] G. Salton, C. Buckley, Term-weighting approaches in automatic text retrieval, Information Processing and Management, 24, 5, pp. 513-523, 1988.
[12] C. J. Van Rijsbergen, Information Retrieval, Information Retrieval Group, University of Glasgow, 1979.
[13] D. Vrajitoru, Crossover Improvement for the Genetic Algorithm in Information Retrieval, Information Processing and Management, 34, 4, pp. 405-415, 1998.